RE: [PATCH 52/55] scsi: Move prototype declaration to header file megaraid/megaraid_sas.h from megaraid/megaraid_sas_fusion.c
>-Original Message- >From: Rashika Kheria [mailto:rashika.khe...@gmail.com] >Sent: Sunday, March 30, 2014 12:19 AM >To: linux-ker...@vger.kernel.org >Cc: DL-MegaRAID Linux; James E.J. Bottomley; linux-scsi@vger.kernel.org; >j...@joshtriplett.org >Subject: [PATCH 52/55] scsi: Move prototype declaration to header file >megaraid/megaraid_sas.h from megaraid/megaraid_sas_fusion.c > >Move prototype declaration of function to header file >megaraid/megaraid_sas.h from megaraid/megaraid_sas_fusion.c because it is >used by more than one file. > >This eliminates the following warning in megaraid/megaraid_sas_fp.c: >drivers/scsi/megaraid/megaraid_sas_fp.c:1223:5: warning: no previous >prototype for ‘get_updated_dev_handle’ [-Wmissing-prototypes] > >Signed-off-by: Rashika Kheria >Reviewed-by: Josh Triplett >--- > drivers/scsi/megaraid/megaraid_sas.h|3 +++ > drivers/scsi/megaraid/megaraid_sas_fusion.c |2 -- > 2 files changed, 3 insertions(+), 2 deletions(-) > >diff --git a/drivers/scsi/megaraid/megaraid_sas.h >b/drivers/scsi/megaraid/megaraid_sas.h >index 3b0afb4..17fe706 100644 >--- a/drivers/scsi/megaraid/megaraid_sas.h >+++ b/drivers/scsi/megaraid/megaraid_sas.h >@@ -1737,6 +1737,9 @@ megasas_check_and_restore_queue_depth(struct >megasas_instance *instance); void megasas_free_cmds(struct >megasas_instance *instance); int megasas_alloc_cmds(struct >megasas_instance *instance); > >+u16 get_updated_dev_handle(struct LD_LOAD_BALANCE_INFO *lbInfo, >+ struct IO_REQUEST_INFO *in_info); >+ > u8 > MR_BuildRaidContext(struct megasas_instance *instance, > struct IO_REQUEST_INFO *io_info, >diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c >b/drivers/scsi/megaraid/megaraid_sas_fusion.c >index ce6219c..b3d79f4 100644 >--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c >+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c >@@ -63,8 +63,6 @@ wait_and_poll(struct megasas_instance *instance, struct >megasas_cmd *cmd); int megasas_clear_intr_fusion(struct >megasas_register_set __iomem 
*regs); > >-u16 get_updated_dev_handle(struct LD_LOAD_BALANCE_INFO *lbInfo, >- struct IO_REQUEST_INFO *in_info); > int megasas_transition_to_ready(struct megasas_instance *instance, int ocr); > > extern u32 megasas_dbg_lvl; Acked-by: Sumit Saxena -Sumit >-- >1.7.9.5 >
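For readers unfamiliar with the warning being fixed here: GCC's -Wmissing-prototypes fires when a non-static function is defined with no earlier declaration in scope, and the usual cure for a function shared between files is a single prototype in a common header. A minimal single-file sketch of that pattern follows. The names `lb_info`, `io_info` and `pick_dev_handle` are hypothetical stand-ins chosen for illustration, not the driver's real types or function.

```c
#include <stdint.h>

/* --- what would live in the shared header (a hypothetical foo.h) --- */
struct lb_info { uint16_t handle[2]; };	/* hypothetical, not the driver's type */
struct io_info { unsigned arm; };	/* hypothetical, not the driver's type */

/* One prototype, visible to every file that includes the header. */
uint16_t pick_dev_handle(const struct lb_info *lb, const struct io_info *io);

/* --- what would live in the defining .c file --- */
/* The prototype above is in scope, so -Wmissing-prototypes stays quiet and
 * the compiler checks that declaration and definition agree. */
uint16_t pick_dev_handle(const struct lb_info *lb, const struct io_info *io)
{
	return lb->handle[io->arm & 1];
}
```

Keeping the declaration in one header rather than repeating it in each .c file means a later signature change is caught at compile time in every user.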
RE: [PATCH 51/55] scsi: Move prototype declaration to header file megaraid/megaraid_sas_fusion.h from megaraid/megaraid_sas_base.c
>-Original Message- >From: Rashika Kheria [mailto:rashika.khe...@gmail.com] >Sent: Sunday, March 30, 2014 12:18 AM >To: linux-ker...@vger.kernel.org >Cc: DL-MegaRAID Linux; James E.J. Bottomley; linux-scsi@vger.kernel.org; >j...@joshtriplett.org >Subject: [PATCH 51/55] scsi: Move prototype declaration to header file >megaraid/megaraid_sas_fusion.h from megaraid/megaraid_sas_base.c > >Move prototype declarations of functions to header file >megaraid/megaraid_sas_fusion.h from megaraid/megaraid_sas_base.c >because they are used by more than one file. > >This eliminates the following type of warnings in >megaraid/megaraid_sas_fusion.c: >drivers/scsi/megaraid/megaraid_sas_fusion.c:2170:1: warning: no previous >prototype for ‘megasas_release_fusion’ [-Wmissing-prototypes] > >Signed-off-by: Rashika Kheria >Reviewed-by: Josh Triplett >--- > drivers/scsi/megaraid/megaraid_sas_base.c | 13 - > drivers/scsi/megaraid/megaraid_sas_fusion.h | 14 ++ > 2 files changed, 14 insertions(+), 13 deletions(-) > >diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >b/drivers/scsi/megaraid/megaraid_sas_base.c >index 9768deee..0ad386b 100644 >--- a/drivers/scsi/megaraid/megaraid_sas_base.c >+++ b/drivers/scsi/megaraid/megaraid_sas_base.c >@@ -160,21 +160,8 @@ u32 > megasas_build_and_issue_cmd(struct megasas_instance *instance, > struct scsi_cmnd *scmd); > static void megasas_complete_cmd_dpc(unsigned long instance_addr); -void >-megasas_release_fusion(struct megasas_instance *instance); -int - >megasas_ioc_init_fusion(struct megasas_instance *instance); -void - >megasas_free_cmds_fusion(struct megasas_instance *instance); >-u8 >-megasas_get_map_info(struct megasas_instance *instance); -int - >megasas_sync_map_info(struct megasas_instance *instance); int >wait_and_poll(struct megasas_instance *instance, struct megasas_cmd >*cmd); -void megasas_reset_reply_desc(struct megasas_instance *instance); >-int megasas_reset_fusion(struct Scsi_Host *shost); -void >megasas_fusion_ocr_wq(struct 
work_struct *work); > > static void > megasas_issue_dcmd(struct megasas_instance *instance, struct >megasas_cmd *cmd) diff --git >a/drivers/scsi/megaraid/megaraid_sas_fusion.h >b/drivers/scsi/megaraid/megaraid_sas_fusion.h >index 35a5139..01e5ab3 100644 >--- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >@@ -760,4 +760,18 @@ union desc_value { > } u; > }; > >+void >+megasas_release_fusion(struct megasas_instance *instance); int >+megasas_ioc_init_fusion(struct megasas_instance *instance); void >+megasas_free_cmds_fusion(struct megasas_instance *instance); >+u8 >+megasas_get_map_info(struct megasas_instance *instance); int >+megasas_sync_map_info(struct megasas_instance *instance); void >+megasas_reset_reply_desc(struct megasas_instance *instance); int >+megasas_reset_fusion(struct Scsi_Host *shost); void >+megasas_fusion_ocr_wq(struct work_struct *work); >+ > #endif /* _MEGARAID_SAS_FUSION_H_ */ Acked-by: Sumit Saxena -Sumit >-- >1.7.9.5 >
RE: [PATCH 30/55] scsi: Mark functions as static in megaraid/megaraid_sas_fp.c
>-Original Message- >From: Rashika Kheria [mailto:rashika.khe...@gmail.com] >Sent: Saturday, March 29, 2014 11:48 PM >To: linux-ker...@vger.kernel.org >Cc: DL-MegaRAID Linux; James E.J. Bottomley; linux-scsi@vger.kernel.org; >j...@joshtriplett.org >Subject: [PATCH 30/55] scsi: Mark functions as static in >megaraid/megaraid_sas_fp.c > >Mark functions as static in megaraid/megaraid_sas_fp.c because they are not >used outside this file. > >This eliminates the following warning in megaraid/megaraid_sas_fp.c: >drivers/scsi/megaraid/megaraid_sas_fp.c:80:5: warning: no previous >prototype for ‘mega_mod64’ [-Wmissing-prototypes] >drivers/scsi/megaraid/megaraid_sas_fp.c:98:5: warning: no previous >prototype for ‘mega_div64_32’ [-Wmissing-prototypes] >drivers/scsi/megaraid/megaraid_sas_fp.c:206:5: warning: no previous >prototype for ‘MR_GetSpanBlock’ [-Wmissing-prototypes] >drivers/scsi/megaraid/megaraid_sas_fp.c:341:5: warning: no previous >prototype for ‘mr_spanset_get_span_block’ [-Wmissing-prototypes] >drivers/scsi/megaraid/megaraid_sas_fp.c:582:4: warning: no previous >prototype for ‘get_arm’ [-Wmissing-prototypes] >drivers/scsi/megaraid/megaraid_sas_fp.c:705:4: warning: no previous >prototype for ‘MR_GetPhyParams’ [-Wmissing-prototypes] >drivers/scsi/megaraid/megaraid_sas_fp.c:1196:4: warning: no previous >prototype for ‘megasas_get_best_arm’ [-Wmissing-prototypes] > >Signed-off-by: Rashika Kheria >Reviewed-by: Josh Triplett >--- > drivers/scsi/megaraid/megaraid_sas_fp.c | 26 ++ > 1 file changed, 14 insertions(+), 12 deletions(-) > >diff --git a/drivers/scsi/megaraid/megaraid_sas_fp.c >b/drivers/scsi/megaraid/megaraid_sas_fp.c >index e24b6eb..83d5f74 100644 >--- a/drivers/scsi/megaraid/megaraid_sas_fp.c >+++ b/drivers/scsi/megaraid/megaraid_sas_fp.c >@@ -77,7 +77,7 @@ static u8 mr_spanset_get_phy_params(struct >megasas_instance *instance, u32 ld, static u64 get_row_from_strip(struct >megasas_instance *instance, u32 ld, > u64 strip, struct MR_FW_RAID_MAP_ALL *map); > 
>-u32 mega_mod64(u64 dividend, u32 divisor) >+static u32 mega_mod64(u64 dividend, u32 divisor) > { > u64 d; > u32 remainder; >@@ -95,7 +95,7 @@ u32 mega_mod64(u64 dividend, u32 divisor) > * > * @return quotient > **/ >-u64 mega_div64_32(uint64_t dividend, uint32_t divisor) >+static u64 mega_div64_32(uint64_t dividend, uint32_t divisor) > { > u32 remainder; > u64 d; >@@ -203,7 +203,7 @@ u8 MR_ValidateMapInfo(struct megasas_instance >*instance) > return 1; > } > >-u32 MR_GetSpanBlock(u32 ld, u64 row, u64 *span_blk, >+static u32 MR_GetSpanBlock(u32 ld, u64 row, u64 *span_blk, > struct MR_FW_RAID_MAP_ALL *map) > { > struct MR_SPAN_BLOCK_INFO *pSpanBlock = MR_LdSpanInfoGet(ld, >map); @@ -338,7 +338,7 @@ static int getSpanInfo(struct >MR_FW_RAID_MAP_ALL *map, PLD_SPAN_INFO ldSpanInfo) > *div_error - Devide error code. > */ > >-u32 mr_spanset_get_span_block(struct megasas_instance *instance, >+static u32 mr_spanset_get_span_block(struct megasas_instance *instance, > u32 ld, u64 row, u64 *span_blk, struct >MR_FW_RAID_MAP_ALL *map) { > struct fusion_context *fusion = instance->ctrl_context; @@ -579,8 >+579,8 @@ static u32 get_arm_from_strip(struct megasas_instance >*instance, } > > /* This Function will return Phys arm */ >-u8 get_arm(struct megasas_instance *instance, u32 ld, u8 span, u64 stripe, >- struct MR_FW_RAID_MAP_ALL *map) >+static u8 get_arm(struct megasas_instance *instance, u32 ld, u8 span, >+u64 stripe, struct MR_FW_RAID_MAP_ALL *map) > { > struct MR_LD_RAID *raid = MR_LdRaidGet(ld, map); > /* Need to check correct default value */ @@ -702,10 +702,11 @@ >static u8 mr_spanset_get_phy_params(struct megasas_instance *instance, >u32 ld, > *span - Span number > *block - Absolute Block number in the physical disk > */ >-u8 MR_GetPhyParams(struct megasas_instance *instance, u32 ld, u64 >stripRow, >- u16 stripRef, struct IO_REQUEST_INFO *io_info, >- struct RAID_CONTEXT *pRAID_Context, >- struct MR_FW_RAID_MAP_ALL *map) >+static u8 MR_GetPhyParams(struct 
megasas_instance *instance, u32 ld, >+u64 stripRow, u16 stripRef, >+struct IO_REQUEST_INFO *io_info, >+struct RAID_CONTEXT *pRAID_Context, >+struct MR_FW_RAID_MAP_ALL *map) > { > struct MR_LD_RAID *raid = MR_LdRaidGet(ld, map); > u32 pd, arRef; >@@ -1193,8 +1194,9 @@ mr_update_load_balance_params(struct >MR_FW_RAID_MAP_ALL *map, > } > } > >-u8 megasas_get_best_arm(struct LD_LOAD_BALANCE_INFO *lbInfo, u8 arm, >u64 block, >- u32 count) >+static u8 megasas_get_best_arm(struct LD_LOAD_BALANCE_INFO *lbInfo, >u8 arm, >+ u64 block, >+ u32 count) > { > u16 pend0, pe
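The other way to satisfy -Wmissing-prototypes, used by this patch, is to give file-local helpers internal linkage with `static`. Below is a userspace sketch in the spirit of the remainder helper touched above. The function name `mod64_sketch` and its body are assumptions for illustration; the quoted hunk shows only the driver function's signature and local variables, not its implementation.

```c
#include <stdint.h>

/* static: internal linkage, invisible to other translation units, so
 * -Wmissing-prototypes has nothing to complain about. */
static uint32_t mod64_sketch(uint64_t dividend, uint32_t divisor)
{
	if (divisor == 0)
		return 0;	/* assumption: treat a zero divisor as "no remainder" */
	return (uint32_t)(dividend % divisor);
}
```

A side benefit of `static` is that the compiler can warn about helpers that are no longer used and is free to inline them.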
RE: [PATCH 29/55] scsi: Mark functions as static in megaraid/megaraid_sas_fusion.c
>-Original Message- >From: Rashika Kheria [mailto:rashika.khe...@gmail.com] >Sent: Saturday, March 29, 2014 11:47 PM >To: linux-ker...@vger.kernel.org >Cc: DL-MegaRAID Linux; James E.J. Bottomley; linux-scsi@vger.kernel.org; >j...@joshtriplett.org >Subject: [PATCH 29/55] scsi: Mark functions as static in >megaraid/megaraid_sas_fusion.c > >Mark functions as static in megaraid/megaraid_sas_fusion.c because they are >not used outside this file. > >This eliminates the warnings of following type in >megaraid/megaraid_sas_fusion.c: >drivers/scsi/megaraid/megaraid_sas_fusion.c:91:1: warning: no previous >prototype for ‘megasas_enable_intr_fusion’ [-Wmissing-prototypes] > >Signed-off-by: Rashika Kheria >Reviewed-by: Josh Triplett >--- > drivers/scsi/megaraid/megaraid_sas_fusion.c | 39 ++- > > 1 file changed, 20 insertions(+), 19 deletions(-) > >diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c >b/drivers/scsi/megaraid/megaraid_sas_fusion.c >index 2806d6d..ce6219c 100644 >--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c >+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c >@@ -74,7 +74,7 @@ extern int resetwaittime; > * megasas_enable_intr_fusion - Enables interrupts > * @regs: MFI register set > */ >-void >+static void > megasas_enable_intr_fusion(struct megasas_instance *instance) { > struct megasas_register_set __iomem *regs; @@ -94,7 +94,7 @@ >megasas_enable_intr_fusion(struct megasas_instance *instance) > * megasas_disable_intr_fusion - Disables interrupt > * @regs: MFI register set > */ >-void >+static void > megasas_disable_intr_fusion(struct megasas_instance *instance) { > u32 mask = 0x; >@@ -134,8 +134,8 @@ megasas_clear_intr_fusion(struct >megasas_register_set __iomem *regs) > * > * Returns a free command from the pool > */ >-struct megasas_cmd_fusion *megasas_get_cmd_fusion(struct >megasas_instance >-*instance) >+static struct megasas_cmd_fusion *megasas_get_cmd_fusion( >+struct megasas_instance *instance) > { > unsigned long flags; > struct fusion_context 
*fusion = >@@ -363,7 +363,7 @@ static int megasas_create_frame_pool_fusion(struct >megasas_instance *instance) > * and is used as SMID of the cmd. > * SMID value range is from 1 to max_fw_cmds. > */ >-int >+static int > megasas_alloc_cmds_fusion(struct megasas_instance *instance) { > int i, j, count; >@@ -919,7 +919,7 @@ megasas_display_intel_branding(struct >megasas_instance *instance) > * > * This is the main function for initializing firmware. > */ >-u32 >+static u32 > megasas_init_adapter_fusion(struct megasas_instance *instance) { > struct megasas_register_set __iomem *reg_set; @@ -1037,7 +1037,7 >@@ fail_alloc_mfi_cmds: > * @frame_count : Number of frames for the command > * @regs :MFI register set > */ >-void >+static void > megasas_fire_cmd_fusion(struct megasas_instance *instance, > dma_addr_t req_desc_lo, > u32 req_desc_hi, >@@ -1059,7 +1059,7 @@ megasas_fire_cmd_fusion(struct megasas_instance >*instance, > * @ext_status : ext status of cmd returned by FW > */ > >-void >+static void > map_cmd_status(struct megasas_cmd_fusion *cmd, u8 status, u8 >ext_status) { > >@@ -1199,7 +1199,7 @@ megasas_make_sgl_fusion(struct megasas_instance >*instance, > * > * Used to set the PD LBA in CDB for FP IOs > */ >-void >+static void > megasas_set_pd_lba(struct MPI2_RAID_SCSI_IO_REQUEST *io_request, u8 >cdb_len, > struct IO_REQUEST_INFO *io_info, struct scsi_cmnd *scp, > struct MR_FW_RAID_MAP_ALL *local_map_ptr, u32 >ref_tag) @@ -1376,7 +1376,7 @@ megasas_set_pd_lba(struct >MPI2_RAID_SCSI_IO_REQUEST *io_request, u8 cdb_len, > * Prepares the io_request and chain elements (sg_frame) for IO > * The IO can be for PD (Fast Path) or LD > */ >-void >+static void > megasas_build_ldio_fusion(struct megasas_instance *instance, > struct scsi_cmnd *scp, > struct megasas_cmd_fusion *cmd) >@@ -1678,7 +1678,7 @@ NonFastPath: > * Invokes helper functions to prepare request frames > * and sets flags appropriate for IO/Non-IO cmd > */ >-int >+static int > megasas_build_io_fusion(struct 
megasas_instance *instance, > struct scsi_cmnd *scp, > struct megasas_cmd_fusion *cmd) >@@ -1749,7 +1749,7 @@ megasas_build_io_fusion(struct megasas_instance >*instance, > return 0; > } > >-union MEGASAS_REQUEST_DESCRIPTOR_UNION * >+static union MEGASAS_REQUEST_DESCRIPTOR_UNION * > megasas_get_request_descriptor(struct megasas_instance *instance, u16 >index) { > u8 *p; >@@ -1829,7 +1829,7 @@ megasas_build_and_issue_cmd_fusion(struct >megasas_instance *instance, > * @instance: Adapter soft state > * Completes all comman
RE: [PATCH trivial 1/3] megaraid_sas: Spelling s/intance/instance/
>-Original Message- >From: Geert Uytterhoeven [mailto:ge...@linux-m68k.org] >Sent: Tuesday, March 25, 2014 2:07 AM >To: Jiri Kosina >Cc: linux-ker...@vger.kernel.org; Geert Uytterhoeven; DL-MegaRAID Linux; >linux-scsi@vger.kernel.org >Subject: [PATCH trivial 1/3] megaraid_sas: Spelling s/intance/instance/ > >From: Geert Uytterhoeven > >Signed-off-by: Geert Uytterhoeven >Cc: Neela Syam Kolli >Cc: linux-scsi@vger.kernel.org >--- > drivers/scsi/megaraid/megaraid_sas_base.c |2 +- > drivers/scsi/megaraid/megaraid_sas_fusion.c |2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > >diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >b/drivers/scsi/megaraid/megaraid_sas_base.c >index 3b7ad10497fe..10082678077c 100644 >--- a/drivers/scsi/megaraid/megaraid_sas_base.c >+++ b/drivers/scsi/megaraid/megaraid_sas_base.c >@@ -3865,7 +3865,7 @@ fail_ready_state: > > /** > * megasas_release_mfi - Reverses the FW initialization >- * @intance: Adapter soft state >+ * @instance: Adapter soft state > */ > static void megasas_release_mfi(struct megasas_instance *instance) { diff -- >git a/drivers/scsi/megaraid/megaraid_sas_fusion.c >b/drivers/scsi/megaraid/megaraid_sas_fusion.c >index f6555921fd7a..f7d68f65f974 100644 >--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c >+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c >@@ -2164,7 +2164,7 @@ megasas_issue_dcmd_fusion(struct >megasas_instance *instance, > > /** > * megasas_release_fusion - Reverses the FW initialization >- * @intance: Adapter soft state >+ * @instance: Adapter soft state > */ > void > megasas_release_fusion(struct megasas_instance *instance) Acked-by: Sumit Saxena -Sumit >-- >1.7.9.5 > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
On 04/10/2014 10:36 PM, James Bottomley wrote: > On Thu, 2014-04-10 at 19:52 +0200, Hannes Reinecke wrote: >> On 04/10/2014 05:31 PM, Alan Stern wrote: >>> On Thu, 10 Apr 2014, Hannes Reinecke wrote: >>> On 04/10/2014 12:58 PM, Andreas Reis wrote: > That patch appears to work in preventing the crashes, judged on one > repeated appearance of the bug. > > dmesg had the usual > [ 215.229903] usb 4-2: usb_disable_lpm called, do nothing > [ 215.336941] usb 4-2: reset SuperSpeed USB device number 3 using > xhci_hcd > [ 215.350296] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called > with disabled ep 880427b829c0 > [ 215.350305] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called > with disabled ep 880427b82a08 > [ 215.350621] usb 4-2: usb_enable_lpm called, do nothing > > repeated five times, followed by one > [ 282.795801] sd 8:0:0:0: Device offlined - not ready after error > recovery > > and then as often as something tried to read from it: > [ 295.585472] sd 8:0:0:0: rejecting I/O to offline device > > The stick could then be properly un- and remounted (the latter if it > had been physically replugged) without issue - for the bug to > reoccur after one to three minutes. I tried this three times, no > dmesg difference except the ep addresses varied on two of that. > Was this just that patch you've tested with or the entire patch series? If the latter, Alan, is this the expected outcome? >>> >>> Yes, it is. The same thing should happen with the entire patch series. >>> I would've thought the error recover should _not_ run into offlining devices here, but rather the device should be recovered eventually. >>> >>> The command times out, it is aborted, and the command is retried. The >>> same thing happens, and we repeat five times. Eventually the SCSI core >>> gives up and declares the device to be offline. >>> >> Hmm. Ok. If you are fine with it who am I to argue here. >> James, shall I resent the patch series? > > You mean the one patch? No, it's OK, I have it. 
> > It's still not complete, though, as I've said a couple of times. The > problem is that we have abort memory on any eh command as well, which > this doesn't fix. > > The scenario is abort command, set flag, abort completes, send TUR, TUR > doesn't return, so we now try to abort the TUR, but scsi_abort_eh_cmnd() > will skip the abort because the flag is set and move straight to reset. > > The fix is this, I can just add it as well. > > James > > --- > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index 771c16b..7516e2c 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -920,6 +920,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct > scsi_eh_save *ses, > ses->prot_op = scmd->prot_op; > > scmd->prot_op = SCSI_PROT_NORMAL; > + scmd->eh_eflags = 0; > scmd->cmnd = ses->eh_cmnd; > memset(scmd->cmnd, 0, BLK_MAX_CDB); > memset(&scmd->sdb, 0, sizeof(scmd->sdb)); > > Oh yes, that is correct. Acked-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
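The "abort memory" scenario James describes above can be modelled in a few lines. This is a toy sketch only: `toy_cmd`, `toy_abort`, `toy_prep_eh_cmd` and the flag value are hypothetical and do not reproduce the SCSI midlayer's structures. It shows why a flag left set from the first abort makes the second abort (of the TUR) get skipped, and how clearing the flags when the command is reused restores it.

```c
#include <stdbool.h>

#define EH_ABORT_SCHEDULED 0x1	/* hypothetical flag bit for this model */

struct toy_cmd {
	unsigned eh_eflags;	/* error-handler state carried by the command */
	int aborts_issued;	/* how many aborts the toy "driver" actually saw */
};

/* Try to abort; returns true if an abort was issued, false if the
 * handler skipped it and would escalate straight to reset. */
static bool toy_abort(struct toy_cmd *c)
{
	if (c->eh_eflags & EH_ABORT_SCHEDULED)
		return false;	/* "abort memory": already aborted once */
	c->eh_eflags |= EH_ABORT_SCHEDULED;
	c->aborts_issued++;
	return true;
}

/* Reuse the command for an EH-internal request such as TEST UNIT READY.
 * clear_flags models the one-line fix (eh_eflags = 0). */
static void toy_prep_eh_cmd(struct toy_cmd *c, bool clear_flags)
{
	if (clear_flags)
		c->eh_eflags = 0;
}
```

Calling `toy_abort`, then `toy_prep_eh_cmd(c, false)`, then `toy_abort` again skips the second abort; with `clear_flags` set to true the second abort is issued.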
aic94xx: maybe uninitialized variable in asd_process_ctrl_a_user
Hi James,

While building a recent kernel with -Werror I found this warning:

drivers/scsi/aic94xx/aic94xx_sds.c: In function 'asd_read_flash':
drivers/scsi/aic94xx/aic94xx_sds.c:597:21: error: 'offs' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/scsi/aic94xx/aic94xx_sds.c:985:6: note: 'offs' was declared here

This looks like a valid complaint from the compiler: in asd_process_ctrl_a_user, if the call to asd_find_flash_de fails (and returns -ENOENT), offs is never set, yet nothing prevents the variable from being passed to asd_read_flash_seg later in that same function.

Would you please have a look at it? Let me know if there's a more appropriate way to report these issues (e.g. a bug tracker).

Thanks, Filipe
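The pattern behind this report can be illustrated with a small userspace sketch: a lookup that writes its output parameter only on success, and a caller that checks the return value before using it. `toy_find_entry` and `toy_read_entry` are hypothetical names, and whether the driver should bail out or do something else on failure is for the maintainers to decide; this only shows one conventional shape of a fix.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical lookup: sets *offs only on success, as in the report. */
static int toy_find_entry(int wanted, uint32_t *offs)
{
	if (wanted != 1)
		return -ENOENT;	/* *offs is left untouched on this path */
	*offs = 0x40;
	return 0;
}

/* Caller that never reads offs unless the lookup succeeded. */
static int toy_read_entry(int wanted, uint32_t *out)
{
	uint32_t offs = 0;	/* defensive initialisation */
	int err = toy_find_entry(wanted, &offs);

	if (err)
		return err;	/* bail out before offs is used */
	*out = offs;
	return 0;
}
```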
Re: hpsa driver bug crack kernel down!
On 04/10/14 at 04:34pm, Jiang Liu wrote: > Hi Baoquan, > Could you please help to give output of "lspci -"? > Is device "hpsa :03:00.0" a legacy PCI device(non-PCIe)? > It may have relationship with IOMMU driver. > Thanks! > Gerry Well, the machine the bug was reported on is an AMD machine, and it doesn't have the IOMMU problem. David saw some DMAR errors, so his should be an Intel machine, which uses VT-d. > > On 2014/4/10 12:03, Bjorn Helgaas wrote: > > [+cc Joerg, iommu list] > > > > On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso wrote: > >> On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote: > >>> On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote: > On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote: > > On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote: > >> [+linux-scsi] > >> On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote: > >>> On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote: > Hi, > > The kernel is 3.14.0+ which is pulled just now. > >>> > >>> Cc'ing more people. > >>> > >>> While the hpsa driver appears to be involved in some way, I'm sure if > >>> this is a related issue, but as of today's pull I'm getting another > >>> problem that causes my DL980 not to come up. > >>> > >>> *Massive* amounts of: > >>> > >>> DMAR:[fault reason 02] Present bit in context entry is clear > >>> dmar: DRHD: handling fault status reg 602 > >>> dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000 > >>> > >>> Then: > >>> > >>> hpsa :03:00.0: Controller lockup detected: 0x > >>> ... > >>> Workqueue: events hpsa_monitor_ctlr_worker [hpsa] > >>> ... > >>> > >>> Screenshot of the actual LOCKUP: > >>> http://stgolabs.net/hpsa-hard-lockup-3.14+.png > >>> > >>> While I haven't bisected, things worked fine until at least until > >>> commit > >>> 39de65aa2c3e (April 2nd). > >>> > >>> Any ideas? > >> > >> Well, it's either a DMA remapping issue or a hpsa one. 
Your assertion > >> that everything worked fine until 39de65aa2c3e would tend to vindicate > >> hpsa, > > Hmm here you mean DMA, right? > >>> > >>> No, it vindicates the hpsa changes ... they don't seem to be causing > >>> problems until something goes wrong with dma remapping. > >>> > > because all the hpsa changes went in before that under > > Missing crucial info: > > > > commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1 > > > >> Merge: 3e75c6d b2bff6c > >> Author: Linus Torvalds > >> Date: Tue Apr 1 18:49:04 2014 -0700 > >> > >> Merge tag 'scsi-misc' of > >> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi > >> > >> can you revalidate that this commit works OK just to make sure? > > Ok so I don't see those DMA messages and system starts just fine. I'm > thinking perhaps something broke after the IO mmu stuff in commit > 3f583bc21977a608908b83d03ee2250426a5695c... could this be indirectly > causing the CPU stalls and just blame hpsa in the path as a side effect? > > /me goes out to try the commit. > >>> > >>> That's my guess. The DMAR messages are DMA remapping issues caused in > >>> the IOMMU. If I had to guess, I'd say the DMAR fault message is > >>> indicating the IOMMU is calling for a mapping address before it can > >>> satisfy the driver read request, which is causing the hang apparently in > >>> the hpsa driver. > >>> > >>> I've added linux-pci to the cc; I think they deal with iommu issues on > >>> x86. > >> > >> So that merge commit appears to be the culprit, I see both the DMA > >> messages and the lockup blaming hpsa... 
> > > > My understanding so far (please correct me if I'm wrong): > > > > 39de65aa2c3e OK ("Merge branch 'i2c/for-next'") > > 1a0b6abaea78 OK ("Merge tag 'scsi-misc'") > > 3f583bc21977 BAD ("Merge tag 'iommu-updates-v3.15'") > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > >
Re: [PATCH] hpsa: fix uninitialized trans_support in hpsa_put_ctlr_into_performant_mode()
This patch works for me. Tested-by: Baoquan He Thanks Baoquan On 04/10/14 at 05:17pm, scame...@beardog.cce.hp.com wrote: > > Without this, you'll see a null pointer dereference in > hpsa_enter_performant_mode(). > > Signed-off-by: Stephen M. Cameron > --- > drivers/scsi/hpsa.c |4 > 1 files changed, 4 insertions(+), 0 deletions(-) > > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c > index 8cf4a0c..ef4dfdd 100644 > --- a/drivers/scsi/hpsa.c > +++ b/drivers/scsi/hpsa.c > @@ -7463,6 +7463,10 @@ static void hpsa_put_ctlr_into_performant_mode(struct > ctlr_info *h) > if (hpsa_simple_mode) > return; > > + trans_support = readl(&(h->cfgtable->TransportSupport)); > + if (!(trans_support & PERFORMANT_MODE)) > + return; > + > /* Check for I/O accelerator mode support */ > if (trans_support & CFGTBL_Trans_io_accel1) { > transMethod |= CFGTBL_Trans_io_accel1 | > -- > 1.7.1 >
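The shape of the fix in the quoted patch is "load the capability word before any test that depends on it". A userspace sketch of that ordering is below. Everything in it is an assumption made for illustration: `toy_cfgtable`, `toy_pick_trans_method` and the bit values are hypothetical, and a plain struct read stands in for the driver's `readl()` on the controller's config table.

```c
#include <stdint.h>

#define TOY_PERFORMANT_MODE 0x4u	/* hypothetical bit value */
#define TOY_IO_ACCEL1       0x80u	/* hypothetical bit value */

struct toy_cfgtable { uint32_t transport_support; };

/* Returns the transport method bits the caller would request, or 0 when
 * performant mode is skipped. */
static uint32_t toy_pick_trans_method(const struct toy_cfgtable *cfg,
				      int simple_mode)
{
	uint32_t trans_support;
	uint32_t method = TOY_PERFORMANT_MODE;

	if (simple_mode)
		return 0;

	/* Load the capability word before any test that depends on it. */
	trans_support = cfg->transport_support;
	if (!(trans_support & TOY_PERFORMANT_MODE))
		return 0;

	/* Only now is it safe to look at the optional accelerator bit. */
	if (trans_support & TOY_IO_ACCEL1)
		method |= TOY_IO_ACCEL1;
	return method;
}
```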
Re: [PATCH 3/4] blk-mq: move request structures into struct blk_mq_tags
On 2014-04-10 04:01, Christoph Hellwig wrote: On Wed, Apr 09, 2014 at 10:23:32AM -0600, Jens Axboe wrote: This should go into block/blk-mq-tag.h. Ok. We might as well leave this, the mtip32xx conversion ends up using it. So if we pull it now, it'll just be reintroduced shortly. It's back in the latest revision of the patch, just taking a struct blk_mq_tag pointer now so that it can be used by SCSI as well. I've also changed an opencode variant of it to use the helper. Great. Will you send out an updated patchset? -- Jens Axboe
Re: esp_scsi QTAG in FAS216
Hello Kars, >> > I've never seen a formula for any ESP or FAS chip for the timeout >> > other than the one mentioned in huge comment in >> > esp_set_clock_params(), although I do see the 7668 instead of 8192 >> > factor being used in the old NCR53C9x driver. >> >> I haven't gone far enough back in the 53C9x revision history to be >> certain, but it would seem to me that Kars de Jong added that FAS >> special case. >> >> Can you confirm that, Kars? Any recollection as to the reason? > > That is the value that's in the data manual of the Symbios Logic > SYM53CF94/96-2 (the actual chip that's in my Amiga SCSI controller). > > Funny, according to the QLogic FAS2x6 manual the value should be 7682 > for FAS216/216U/236/236U chips... > > I don't think it's all that important. It only means that the actual > selection timeout used by the chip will be slightly shorter than it is > supposed to be. Thanks for checking that - I agree that it might not amount to much. The more important issue is the one about the one-byte reconnect message. Does the manual speak to that particular issue? Any hint on how we could enable SCSI-2 features on chip init? Can you point me to a source for the manuals if possible? Cheers, Michael
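To put numbers on "slightly shorter", here is a sketch of the selection-timeout arithmetic. It assumes the commonly cited form of the formula, register value = (timeout period x input clock) / (factor x clock conversion factor), with the factor being 8192, 7668 or 7682 depending on which manual is followed. The function names and the example inputs (250 ms, 40 MHz, conversion factor 8) are illustrative assumptions, not values taken from this thread.

```c
#include <stdint.h>

/* Register value for a desired timeout:
 *   STO = (timeout * input clock) / (factor * clock conversion factor)
 * timeout in ms and clock in kHz, so the units cancel. */
static uint32_t esp_sto_sketch(uint32_t timeout_ms, uint32_t clock_khz,
			       uint32_t factor, uint32_t ccf)
{
	return (uint32_t)(((uint64_t)timeout_ms * clock_khz) /
			  ((uint64_t)factor * ccf));
}

/* Timeout in microseconds the chip would apply for a given register
 * value if its internal factor is chip_factor. */
static uint32_t esp_actual_timeout_us(uint32_t sto, uint32_t clock_khz,
				      uint32_t chip_factor, uint32_t ccf)
{
	return (uint32_t)(((uint64_t)sto * chip_factor * ccf * 1000u) /
			  clock_khz);
}
```

With those example inputs the register value comes out as 152 when computed with 8192 and 163 when computed with 7668. Programming 152 into a chip that internally uses 7668 yields roughly 233 ms instead of the intended 250 ms, which matches the "slightly shorter" remark.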
Re: hpsa driver bug crack kernel down!
On 04/10/14 at 04:34pm, Jiang Liu wrote: > Hi Baoquan, > Could you please help to give output of "lspci -"? > Is device "hpsa :03:00.0" a legacy PCI device(non-PCIe)? > It may have relationship with IOMMU driver. > Thanks! > Gerry Hi, I just saw your mail now. Do you still need the output of "lspci -" on my test machine? In fact, I didn't see the DMAR errors related to the Intel VT-d issue. If the output would help, I can make a fresh build and collect it. Thanks Baoquan > > On 2014/4/10 12:03, Bjorn Helgaas wrote: > > [+cc Joerg, iommu list] > > > > On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso wrote: > >> On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote: > >>> On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote: > On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote: > > On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote: > >> [+linux-scsi] > >> On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote: > >>> On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote: > Hi, > > The kernel is 3.14.0+ which is pulled just now. > >>> > >>> Cc'ing more people. > >>> > >>> While the hpsa driver appears to be involved in some way, I'm sure if > >>> this is a related issue, but as of today's pull I'm getting another > >>> problem that causes my DL980 not to come up. > >>> > >>> *Massive* amounts of: > >>> > >>> DMAR:[fault reason 02] Present bit in context entry is clear > >>> dmar: DRHD: handling fault status reg 602 > >>> dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000 > >>> > >>> Then: > >>> > >>> hpsa :03:00.0: Controller lockup detected: 0x > >>> ... > >>> Workqueue: events hpsa_monitor_ctlr_worker [hpsa] > >>> ... > >>> > >>> Screenshot of the actual LOCKUP: > >>> http://stgolabs.net/hpsa-hard-lockup-3.14+.png > >>> > >>> While I haven't bisected, things worked fine until at least until > >>> commit > >>> 39de65aa2c3e (April 2nd). > >>> > >>> Any ideas? > >> > >> Well, it's either a DMA remapping issue or a hpsa one. 
Your assertion > >> that everything worked fine until 39de65aa2c3e would tend to vindicate > >> hpsa, > > Hmm here you mean DMA, right? > >>> > >>> No, it vindicates the hpsa changes ... they don't seem to be causing > >>> problems until something goes wrong with dma remapping. > >>> > > because all the hpsa changes went in before that under > > Missing crucial info: > > > > commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1 > > > >> Merge: 3e75c6d b2bff6c > >> Author: Linus Torvalds > >> Date: Tue Apr 1 18:49:04 2014 -0700 > >> > >> Merge tag 'scsi-misc' of > >> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi > >> > >> can you revalidate that this commit works OK just to make sure? > > Ok so I don't see those DMA messages and system starts just fine. I'm > thinking perhaps something broke after the IO mmu stuff in commit > 3f583bc21977a608908b83d03ee2250426a5695c... could this be indirectly > causing the CPU stalls and just blame hpsa in the path as a side effect? > > /me goes out to try the commit. > >>> > >>> That's my guess. The DMAR messages are DMA remapping issues caused in > >>> the IOMMU. If I had to guess, I'd say the DMAR fault message is > >>> indicating the IOMMU is calling for a mapping address before it can > >>> satisfy the driver read request, which is causing the hang apparently in > >>> the hpsa driver. > >>> > >>> I've added linux-pci to the cc; I think they deal with iommu issues on > >>> x86. > >> > >> So that merge commit appears to be the culprit, I see both the DMA > >> messages and the lockup blaming hpsa... 
> > > > My understanding so far (please correct me if I'm wrong): > > > > 39de65aa2c3e OK ("Merge branch 'i2c/for-next'") > > 1a0b6abaea78 OK ("Merge tag 'scsi-misc'") > > 3f583bc21977 BAD ("Merge tag 'iommu-updates-v3.15'") > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] async scsi resume for 3.15
Hi Linus,

James might still be in the process of sending this your way. However, given the proximity to -rc1, my reasoning for sending this directly is:

1/ It provides a tangible speed up for a non-esoteric use case (laptop resume):
   https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

2/ You already pulled the first half of this enabling from Tejun. Quoting Tejun's ATA pull request:
   "Dan finishes the patchset to make libata PM operations asynchronous. Combined with one patch being routed through scsi, this should speed resume measurably."

3/ As far as I can tell it is acceptable to James:
   http://marc.info/?l=linux-scsi&m=139499409510791&w=2
   http://marc.info/?l=linux-scsi&m=139508044602605&w=2
   http://marc.info/?l=linux-scsi&m=139536062515216&w=2

4/ I promised Todd I would get it upstream before he returns from vacation.

Please pull, thank you.

--
Dan

The following changes since commit 455c6fdbd219161bd09b1165f11699d6d73de11c:

  Linux 3.14 (2014-03-30 20:40:15 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/djbw/isci async-scsi-resume

for you to fetch changes up to 3c31b52f96f7b559d950b16113c0f68c72a1985e:

  scsi: async sd resume (2014-04-10 15:30:35 -0700)

Dan Williams (1):
      scsi: async sd resume

 drivers/scsi/Kconfig     |   3 ++
 drivers/scsi/scsi.c      |   9
 drivers/scsi/scsi_pm.c   | 128 ---
 drivers/scsi/scsi_priv.h |   2 +
 drivers/scsi/scsi_scan.c |   2 +-
 drivers/scsi/sd.c        |   1 +
 6 files changed, 115 insertions(+), 30 deletions(-)

8<-
>From 3c31b52f96f7b559d950b16113c0f68c72a1985e Mon Sep 17 00:00:00 2001
From: Dan Williams
Date: Thu, 10 Apr 2014 15:30:35 -0700
Subject: [PATCH] scsi: async sd resume

async_schedule() sd resume work to allow disks and other devices to resume in parallel.

This moves the entirety of scsi_device resume to an async context to ensure that scsi_device_resume() remains ordered with respect to the completion of the start/stop command. For the duration of the resume, new command submissions (that do not originate from the scsi-core) will be deferred (BLKPREP_DEFER).

It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container of these operations. Like scsi_sd_probe_domain it is flushed at sd_remove() time to ensure async ops do not continue past the end-of-life of the sdev. The implementation explicitly refrains from reusing scsi_sd_probe_domain directly for this purpose as it is flushed at the end of dpm_resume(), potentially defeating some of the benefit. Given sdevs are quiesced it is permissible for these resume operations to bleed past the async_synchronize_full() calls made by the driver core.

We defer the resolution of which pm callback to call until scsi_dev_type_{suspend|resume} time and guarantee that the callback parameter is never NULL. With this in place the type of resume operation is encoded in the async function identifier.

There is a concern that async resume could trigger PSU overload. In the enterprise, storage enclosures enforce staggered spin-up regardless of what the kernel does making async scanning safe by default. Outside of that context a user can disable asynchronous scanning via a kernel command line or CONFIG_SCSI_SCAN_ASYNC. Honor that setting when deciding whether to do resume asynchronously.

Inspired by Todd's analysis and initial proposal [2]:
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

Cc: Len Brown
Cc: Phillip Susi
[alan: bug fix and clean up suggestion]
Acked-by: Alan Stern
Suggested-by: Todd Brandt
[djbw: kick all resume work to the async queue]
Signed-off-by: Dan Williams
---
 drivers/scsi/Kconfig     |   3 ++
 drivers/scsi/scsi.c      |   9
 drivers/scsi/scsi_pm.c   | 128 ---
 drivers/scsi/scsi_priv.h |   2 +
 drivers/scsi/scsi_scan.c |   2 +-
 drivers/scsi/sd.c        |   1 +
 6 files changed, 115 insertions(+), 30 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index c8bd092..02832d6 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -263,6 +263,9 @@ config SCSI_SCAN_ASYNC
 	  You can override this choice by specifying "scsi_mod.scan=sync"
 	  or async on the kernel's command line.
 
+	  Note that this setting also affects whether resuming from
+	  system suspend will be performed asynchronously.
+
 menu "SCSI Transports"
 	depends on SCSI
 
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d8afec8..1b345bf 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -91,6 +91,15 @@ EXPORT_SYMBOL(scsi_logging_level);
 ASYNC_DOMAIN(scsi_sd_probe_domain);
 EXPORT_SYMBOL(scsi_sd_probe_domain);
 
+/*
+ * Separate domain (from scsi_sd_probe_domain) to maximize the benef
Re: hpsa driver bug crack kernel down!
On Thu, Apr 10, 2014 at 2:45 PM, wrote:
>> > 3f583bc21977 BAD ("Merge tag 'iommu-updates-v3.15'")
>>
>> Yes, specifically (finally done bisecting):
>>
>> commit 2e45528930388658603ea24d49cf52867b928d3e
>> Author: Jiang Liu
>> Date: Wed Feb 19 14:07:36 2014 +0800
>>
>> iommu/vt-d: Unify the way to process DMAR device scope array
>>
>> Now we have a PCI bus notification based mechanism to update DMAR
>> device scope array, we could extend the mechanism to support boot
>> time initialization too, which will help to unify and simplify
>> the implementation.
>>
>> Signed-off-by: Jiang Liu
>> Signed-off-by: Joerg Roedel
>
> My git bisect appears to be converging on something else, something
> within the hpsa patches that I sent up recently, unfortunately for
> me. Will let you all know when it converges.
>

This smells very much like the problem that was solved couple of years ago for SI domain. It is likely that path is broken with the DMAR device scope array change. Please take a look to see if the following no longer occurs. Looks like BIOS could be expecting this RMRR to be still mapped.

	/*
	 * We want to prevent any device associated with an RMRR from
	 * getting placed into the SI Domain. This is done because
	 * problems exist when devices are moved in and out of domains
	 * and their respective RMRR info is lost. We exempt USB devices
	 * from this process due to their usage of RMRRs that are known
	 * to not be needed after BIOS hand-off to OS.
	 */
	if (device_has_rmrr(dev) &&
	    (pdev->class >> 8) != PCI_CLASS_SERIAL_USB)
		return 0;

-- Shuah
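For readers who want to poke at the class test in the snippet quoted above, here is a minimal userspace sketch of the same predicate. It makes no claim about the kernel's real data structures: struct fake_pci_dev, its class_code field (standing in for pdev->class), its has_rmrr field (standing in for device_has_rmrr()), and excluded_from_si_domain() are all hypothetical names made up for illustration. The only value taken from the kernel is PCI_CLASS_SERIAL_USB (0x0c03), which covers base class and subclass, so the shift by 8 drops the programming-interface byte before the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of PCI_CLASS_SERIAL_USB in the kernel's pci_ids.h. */
#define PCI_CLASS_SERIAL_USB 0x0c03

/* Hypothetical stand-in for struct pci_dev; only what the check needs. */
struct fake_pci_dev {
	uint32_t class_code; /* 24 bits: base class, subclass, prog-if */
	bool has_rmrr;       /* stand-in for device_has_rmrr(dev) */
};

/* Returns true when the device must be kept out of the SI domain. */
static bool excluded_from_si_domain(const struct fake_pci_dev *pdev)
{
	assert(pdev != NULL);
	/* Shift away prog-if so only base class and subclass are compared. */
	return pdev->has_rmrr &&
	       (pdev->class_code >> 8) != PCI_CLASS_SERIAL_USB;
}
```

Under these assumptions, a RAID controller (class code 0x010400) that has an RMRR is kept out of the SI domain, while an EHCI controller (0x0c0320) with an RMRR is exempt, and a device without an RMRR is never excluded by this test.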
Re: [PATCH] hpsa: fix uninitialized trans_support in hpsa_put_ctlr_into_performant_mode()
On Thu, 2014-04-10 at 17:17 -0500, scame...@beardog.cce.hp.com wrote:
> Without this, you'll see a null pointer dereference in
> hpsa_enter_performant_mode().

So I'm not surprised that this patch doesn't solve the problem I am seeing with DMAR and the hpsa driver hard lockup. In any case it should address Baoquan's original report, so it confirms that it is in fact two different sets of issues.
[PATCH] hpsa: fix uninitialized trans_support in hpsa_put_ctlr_into_performant_mode()
Without this, you'll see a null pointer dereference in
hpsa_enter_performant_mode().

Signed-off-by: Stephen M. Cameron
---
 drivers/scsi/hpsa.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 8cf4a0c..ef4dfdd 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -7463,6 +7463,10 @@ static void hpsa_put_ctlr_into_performant_mode(struct ctlr_info *h)
 	if (hpsa_simple_mode)
 		return;
 
+	trans_support = readl(&(h->cfgtable->TransportSupport));
+	if (!(trans_support & PERFORMANT_MODE))
+		return;
+
 	/* Check for I/O accelerator mode support */
 	if (trans_support & CFGTBL_Trans_io_accel1) {
 		transMethod |= CFGTBL_Trans_io_accel1 |
-- 
1.7.1
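The ordering the patch restores (load the capability word, then test its bits) can be tried out in userspace with the sketch below. Everything in it is an assumption for illustration: struct sketch_cfgtable, sketch_readl() and sketch_pick_transport() are hypothetical names, and the SKETCH_* bit values are placeholders rather than the driver's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values; the real ones live in the hpsa headers. */
#define SKETCH_PERFORMANT_MODE 0x04u
#define SKETCH_IO_ACCEL1       0x80u

/* Hypothetical stand-in for the controller's config table. */
struct sketch_cfgtable {
	uint32_t transport_support;
};

/* Stand-in for readl(): a single volatile 32-bit load. */
static uint32_t sketch_readl(const volatile uint32_t *reg)
{
	return *reg;
}

/* Returns the transport bits to request, or 0 to stay in simple mode. */
static uint32_t sketch_pick_transport(const struct sketch_cfgtable *cfg,
				      int simple_mode)
{
	uint32_t trans_support;
	uint32_t method = SKETCH_PERFORMANT_MODE;

	assert(cfg != NULL);
	if (simple_mode)
		return 0;

	/* Load the capability word before any bit test looks at it. */
	trans_support = sketch_readl(&cfg->transport_support);
	if (!(trans_support & SKETCH_PERFORMANT_MODE))
		return 0;

	/* Only now is it safe to test the optional accelerator bit. */
	if (trans_support & SKETCH_IO_ACCEL1)
		method |= SKETCH_IO_ACCEL1;
	return method;
}
```

Without the load, the accelerator test would run on whatever happened to be in the local variable, so the branch taken would not reflect what the controller actually advertises.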
Re: hpsa NULL pointer in hpsa_enter_performant_mode()
On Thu, Apr 10, 2014 at 04:20:46PM -0500, scame...@beardog.cce.hp.com wrote: > On Thu, Apr 10, 2014 at 02:53:30PM -0600, Bjorn Helgaas wrote: > > [subject changed] > > > > On Thu, Apr 10, 2014 at 2:45 PM, wrote: > > > On Wed, Apr 09, 2014 at 11:32:37PM -0700, Davidlohr Bueso wrote: > > >> On Wed, 2014-04-09 at 22:03 -0600, Bjorn Helgaas wrote: > > >> > [+cc Joerg, iommu list] > > >> > > > >> > On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso > > >> > wrote: > > >> > > On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote: > > >> > >> On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote: > > >> > >> > On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote: > > >> > >> > > On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote: > > >> > >> > > > [+linux-scsi] > > >> > >> > > > On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote: > > >> > >> > > > > On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote: > > >> > >> > > > > > Hi, > > >> > >> > > > > > > > >> > >> > > > > > The kernel is 3.14.0+ which is pulled just now. > > >> > >> > > > > > > >> > >> > > > > Cc'ing more people. > > >> > >> > > > > > > >> > >> > > > > While the hpsa driver appears to be involved in some way, > > >> > >> > > > > I'm sure if > > >> > >> > > > > this is a related issue, but as of today's pull I'm getting > > >> > >> > > > > another > > >> > >> > > > > problem that causes my DL980 not to come up. > > >> > >> > > > > > > >> > >> > > > > *Massive* amounts of: > > >> > >> > > > > > > >> > >> > > > > DMAR:[fault reason 02] Present bit in context entry is clear > > >> > >> > > > > dmar: DRHD: handling fault status reg 602 > > >> > >> > > > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr > > >> > >> > > > > 7f61e000 > > >> > >> > > > > > > >> > >> > > > > Then: > > >> > >> > > > > > > >> > >> > > > > hpsa :03:00.0: Controller lockup detected: 0x > > >> > >> > > > > ... > > >> > >> > > > > Workqueue: events hpsa_monitor_ctlr_worker [hpsa] > > >> > >> > > > > ... 
> > >> > >> > > > > > > >> > >> > > > > Screenshot of the actual LOCKUP: > > >> > >> > > > > http://stgolabs.net/hpsa-hard-lockup-3.14+.png > > >> > >> > > > > > > >> > >> > > > > While I haven't bisected, things worked fine until at least > > >> > >> > > > > until commit > > >> > >> > > > > 39de65aa2c3e (April 2nd). > > >> > >> > > > > > > >> > >> > > > > Any ideas? > > >> > >> > > > > > >> > >> > > > Well, it's either a DMA remapping issue or a hpsa one. Your > > >> > >> > > > assertion > > >> > >> > > > that everything worked fine until 39de65aa2c3e would tend to > > >> > >> > > > vindicate > > >> > >> > > > hpsa, > > >> > >> > > > >> > >> > Hmm here you mean DMA, right? > > >> > >> > > >> > >> No, it vindicates the hpsa changes ... they don't seem to be causing > > >> > >> problems until something goes wrong with dma remapping. > > >> > >> > > >> > >> > > because all the hpsa changes went in before that under > > >> > >> > > Missing crucial info: > > >> > >> > > > > >> > >> > > commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1 > > >> > >> > > > > >> > >> > > > Merge: 3e75c6d b2bff6c > > >> > >> > > > Author: Linus Torvalds > > >> > >> > > > Date: Tue Apr 1 18:49:04 2014 -0700 > > >> > >> > > > > > >> > >> > > > Merge tag 'scsi-misc' of > > >> > >> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi > > >> > >> > > > > > >> > >> > > > can you revalidate that this commit works OK just to make > > >> > >> > > > sure? > > >> > >> > > > >> > >> > Ok so I don't see those DMA messages and system starts just fine. > > >> > >> > I'm > > >> > >> > thinking perhaps something broke after the IO mmu stuff in commit > > >> > >> > 3f583bc21977a608908b83d03ee2250426a5695c... could this be > > >> > >> > indirectly > > >> > >> > causing the CPU stalls and just blame hpsa in the path as a side > > >> > >> > effect? > > >> > >> > > > >> > >> > /me goes out to try the commit. > > >> > >> > > >> > >> That's my guess. 
The DMAR messages are DMA remapping issues caused > > >> > >> in > > >> > >> the IOMMU. If I had to guess, I'd say the DMAR fault message is > > >> > >> indicating the IOMMU is calling for a mapping address before it can > > >> > >> satisfy the driver read request, which is causing the hang > > >> > >> apparently in > > >> > >> the hpsa driver. > > >> > >> > > >> > >> I've added linux-pci to the cc; I think they deal with iommu issues > > >> > >> on > > >> > >> x86. > > >> > > > > >> > > So that merge commit appears to be the culprit, I see both the DMA > > >> > > messages and the lockup blaming hpsa... > > >> > > > >> > My understanding so far (please correct me if I'm wrong): > > >> > > > >> > 39de65aa2c3e OK ("Merge branch 'i2c/for-next'") > > >> > 1a0b6abaea78 OK ("Merge tag 'scsi-misc'") > > > > > > ^^^ this one, 1a0b6abaea78, did not work for me, crashing in > > > hpsa_enter_performant mode() which was surprsing to me as I am > > > pretty sure I tried on this very same machine I'm using now > > > (DL360p wi
Re: hpsa NULL pointer in hpsa_enter_performant_mode()
On Thu, Apr 10, 2014 at 02:53:30PM -0600, Bjorn Helgaas wrote: > [subject changed] > > On Thu, Apr 10, 2014 at 2:45 PM, wrote: > > On Wed, Apr 09, 2014 at 11:32:37PM -0700, Davidlohr Bueso wrote: > >> On Wed, 2014-04-09 at 22:03 -0600, Bjorn Helgaas wrote: > >> > [+cc Joerg, iommu list] > >> > > >> > On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso wrote: > >> > > On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote: > >> > >> On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote: > >> > >> > On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote: > >> > >> > > On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote: > >> > >> > > > [+linux-scsi] > >> > >> > > > On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote: > >> > >> > > > > On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote: > >> > >> > > > > > Hi, > >> > >> > > > > > > >> > >> > > > > > The kernel is 3.14.0+ which is pulled just now. > >> > >> > > > > > >> > >> > > > > Cc'ing more people. > >> > >> > > > > > >> > >> > > > > While the hpsa driver appears to be involved in some way, I'm > >> > >> > > > > sure if > >> > >> > > > > this is a related issue, but as of today's pull I'm getting > >> > >> > > > > another > >> > >> > > > > problem that causes my DL980 not to come up. > >> > >> > > > > > >> > >> > > > > *Massive* amounts of: > >> > >> > > > > > >> > >> > > > > DMAR:[fault reason 02] Present bit in context entry is clear > >> > >> > > > > dmar: DRHD: handling fault status reg 602 > >> > >> > > > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr > >> > >> > > > > 7f61e000 > >> > >> > > > > > >> > >> > > > > Then: > >> > >> > > > > > >> > >> > > > > hpsa :03:00.0: Controller lockup detected: 0x > >> > >> > > > > ... > >> > >> > > > > Workqueue: events hpsa_monitor_ctlr_worker [hpsa] > >> > >> > > > > ... 
> >> > >> > > > > > >> > >> > > > > Screenshot of the actual LOCKUP: > >> > >> > > > > http://stgolabs.net/hpsa-hard-lockup-3.14+.png > >> > >> > > > > > >> > >> > > > > While I haven't bisected, things worked fine until at least > >> > >> > > > > until commit > >> > >> > > > > 39de65aa2c3e (April 2nd). > >> > >> > > > > > >> > >> > > > > Any ideas? > >> > >> > > > > >> > >> > > > Well, it's either a DMA remapping issue or a hpsa one. Your > >> > >> > > > assertion > >> > >> > > > that everything worked fine until 39de65aa2c3e would tend to > >> > >> > > > vindicate > >> > >> > > > hpsa, > >> > >> > > >> > >> > Hmm here you mean DMA, right? > >> > >> > >> > >> No, it vindicates the hpsa changes ... they don't seem to be causing > >> > >> problems until something goes wrong with dma remapping. > >> > >> > >> > >> > > because all the hpsa changes went in before that under > >> > >> > > Missing crucial info: > >> > >> > > > >> > >> > > commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1 > >> > >> > > > >> > >> > > > Merge: 3e75c6d b2bff6c > >> > >> > > > Author: Linus Torvalds > >> > >> > > > Date: Tue Apr 1 18:49:04 2014 -0700 > >> > >> > > > > >> > >> > > > Merge tag 'scsi-misc' of > >> > >> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi > >> > >> > > > > >> > >> > > > can you revalidate that this commit works OK just to make sure? > >> > >> > > >> > >> > Ok so I don't see those DMA messages and system starts just fine. > >> > >> > I'm > >> > >> > thinking perhaps something broke after the IO mmu stuff in commit > >> > >> > 3f583bc21977a608908b83d03ee2250426a5695c... could this be indirectly > >> > >> > causing the CPU stalls and just blame hpsa in the path as a side > >> > >> > effect? > >> > >> > > >> > >> > /me goes out to try the commit. > >> > >> > >> > >> That's my guess. The DMAR messages are DMA remapping issues caused in > >> > >> the IOMMU. 
If I had to guess, I'd say the DMAR fault message is > >> > >> indicating the IOMMU is calling for a mapping address before it can > >> > >> satisfy the driver read request, which is causing the hang apparently > >> > >> in > >> > >> the hpsa driver. > >> > >> > >> > >> I've added linux-pci to the cc; I think they deal with iommu issues on > >> > >> x86. > >> > > > >> > > So that merge commit appears to be the culprit, I see both the DMA > >> > > messages and the lockup blaming hpsa... > >> > > >> > My understanding so far (please correct me if I'm wrong): > >> > > >> > 39de65aa2c3e OK ("Merge branch 'i2c/for-next'") > >> > 1a0b6abaea78 OK ("Merge tag 'scsi-misc'") > > > > ^^^ this one, 1a0b6abaea78, did not work for me, crashing in > > hpsa_enter_performant mode() which was surprsing to me as I am > > pretty sure I tried on this very same machine I'm using now > > (DL360p with P420, P430 and P420i) with 3.14-rc-something plus > > all the hpsa patches that I thought were merged in. > > I think we have to completely different issues mixed together in this > thread, so I changed the subject here. Thanks. > > The reports above for 39de65aa2c3e, 1a0b6abaea78, were for a DMA fault. > > The original message from Baoquan He was for
hpsa NULL pointer in hpsa_enter_performant_mode()
[subject changed] On Thu, Apr 10, 2014 at 2:45 PM, wrote: > On Wed, Apr 09, 2014 at 11:32:37PM -0700, Davidlohr Bueso wrote: >> On Wed, 2014-04-09 at 22:03 -0600, Bjorn Helgaas wrote: >> > [+cc Joerg, iommu list] >> > >> > On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso wrote: >> > > On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote: >> > >> On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote: >> > >> > On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote: >> > >> > > On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote: >> > >> > > > [+linux-scsi] >> > >> > > > On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote: >> > >> > > > > On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote: >> > >> > > > > > Hi, >> > >> > > > > > >> > >> > > > > > The kernel is 3.14.0+ which is pulled just now. >> > >> > > > > >> > >> > > > > Cc'ing more people. >> > >> > > > > >> > >> > > > > While the hpsa driver appears to be involved in some way, I'm >> > >> > > > > sure if >> > >> > > > > this is a related issue, but as of today's pull I'm getting >> > >> > > > > another >> > >> > > > > problem that causes my DL980 not to come up. >> > >> > > > > >> > >> > > > > *Massive* amounts of: >> > >> > > > > >> > >> > > > > DMAR:[fault reason 02] Present bit in context entry is clear >> > >> > > > > dmar: DRHD: handling fault status reg 602 >> > >> > > > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr >> > >> > > > > 7f61e000 >> > >> > > > > >> > >> > > > > Then: >> > >> > > > > >> > >> > > > > hpsa :03:00.0: Controller lockup detected: 0x >> > >> > > > > ... >> > >> > > > > Workqueue: events hpsa_monitor_ctlr_worker [hpsa] >> > >> > > > > ... >> > >> > > > > >> > >> > > > > Screenshot of the actual LOCKUP: >> > >> > > > > http://stgolabs.net/hpsa-hard-lockup-3.14+.png >> > >> > > > > >> > >> > > > > While I haven't bisected, things worked fine until at least >> > >> > > > > until commit >> > >> > > > > 39de65aa2c3e (April 2nd). 
>> > >> > > > > >> > >> > > > > Any ideas? >> > >> > > > >> > >> > > > Well, it's either a DMA remapping issue or a hpsa one. Your >> > >> > > > assertion >> > >> > > > that everything worked fine until 39de65aa2c3e would tend to >> > >> > > > vindicate >> > >> > > > hpsa, >> > >> > >> > >> > Hmm here you mean DMA, right? >> > >> >> > >> No, it vindicates the hpsa changes ... they don't seem to be causing >> > >> problems until something goes wrong with dma remapping. >> > >> >> > >> > > because all the hpsa changes went in before that under >> > >> > > Missing crucial info: >> > >> > > >> > >> > > commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1 >> > >> > > >> > >> > > > Merge: 3e75c6d b2bff6c >> > >> > > > Author: Linus Torvalds >> > >> > > > Date: Tue Apr 1 18:49:04 2014 -0700 >> > >> > > > >> > >> > > > Merge tag 'scsi-misc' of >> > >> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi >> > >> > > > >> > >> > > > can you revalidate that this commit works OK just to make sure? >> > >> > >> > >> > Ok so I don't see those DMA messages and system starts just fine. I'm >> > >> > thinking perhaps something broke after the IO mmu stuff in commit >> > >> > 3f583bc21977a608908b83d03ee2250426a5695c... could this be indirectly >> > >> > causing the CPU stalls and just blame hpsa in the path as a side >> > >> > effect? >> > >> > >> > >> > /me goes out to try the commit. >> > >> >> > >> That's my guess. The DMAR messages are DMA remapping issues caused in >> > >> the IOMMU. If I had to guess, I'd say the DMAR fault message is >> > >> indicating the IOMMU is calling for a mapping address before it can >> > >> satisfy the driver read request, which is causing the hang apparently in >> > >> the hpsa driver. >> > >> >> > >> I've added linux-pci to the cc; I think they deal with iommu issues on >> > >> x86. >> > > >> > > So that merge commit appears to be the culprit, I see both the DMA >> > > messages and the lockup blaming hpsa... 
>> > >> > My understanding so far (please correct me if I'm wrong): >> > >> > 39de65aa2c3e OK ("Merge branch 'i2c/for-next'") >> > 1a0b6abaea78 OK ("Merge tag 'scsi-misc'") > > ^^^ this one, 1a0b6abaea78, did not work for me, crashing in > hpsa_enter_performant mode() which was surprsing to me as I am > pretty sure I tried on this very same machine I'm using now > (DL360p with P420, P430 and P420i) with 3.14-rc-something plus > all the hpsa patches that I thought were merged in. I think we have to completely different issues mixed together in this thread, so I changed the subject here. The reports above for 39de65aa2c3e, 1a0b6abaea78, were for a DMA fault. The original message from Baoquan He was for a NULL pointer dereference in hpsa_enter_performant_mode(), which is very likely the same problem you're seeing, Steve. I changed the subject to "hpsa NULL pointer in hpsa_enter_performant_mode()", so hopefully we can chase the NULL pointer issue there and leave the original, already long thread, for the DMA fault iss
Re: hpsa driver bug crack kernel down!
On Wed, Apr 09, 2014 at 11:32:37PM -0700, Davidlohr Bueso wrote: > On Wed, 2014-04-09 at 22:03 -0600, Bjorn Helgaas wrote: > > [+cc Joerg, iommu list] > > > > On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso wrote: > > > On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote: > > >> On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote: > > >> > On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote: > > >> > > On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote: > > >> > > > [+linux-scsi] > > >> > > > On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote: > > >> > > > > On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote: > > >> > > > > > Hi, > > >> > > > > > > > >> > > > > > The kernel is 3.14.0+ which is pulled just now. > > >> > > > > > > >> > > > > Cc'ing more people. > > >> > > > > > > >> > > > > While the hpsa driver appears to be involved in some way, I'm > > >> > > > > sure if > > >> > > > > this is a related issue, but as of today's pull I'm getting > > >> > > > > another > > >> > > > > problem that causes my DL980 not to come up. > > >> > > > > > > >> > > > > *Massive* amounts of: > > >> > > > > > > >> > > > > DMAR:[fault reason 02] Present bit in context entry is clear > > >> > > > > dmar: DRHD: handling fault status reg 602 > > >> > > > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr > > >> > > > > 7f61e000 > > >> > > > > > > >> > > > > Then: > > >> > > > > > > >> > > > > hpsa :03:00.0: Controller lockup detected: 0x > > >> > > > > ... > > >> > > > > Workqueue: events hpsa_monitor_ctlr_worker [hpsa] > > >> > > > > ... > > >> > > > > > > >> > > > > Screenshot of the actual LOCKUP: > > >> > > > > http://stgolabs.net/hpsa-hard-lockup-3.14+.png > > >> > > > > > > >> > > > > While I haven't bisected, things worked fine until at least > > >> > > > > until commit > > >> > > > > 39de65aa2c3e (April 2nd). > > >> > > > > > > >> > > > > Any ideas? > > >> > > > > > >> > > > Well, it's either a DMA remapping issue or a hpsa one. 
Your > > >> > > > assertion > > >> > > > that everything worked fine until 39de65aa2c3e would tend to > > >> > > > vindicate > > >> > > > hpsa, > > >> > > > >> > Hmm here you mean DMA, right? > > >> > > >> No, it vindicates the hpsa changes ... they don't seem to be causing > > >> problems until something goes wrong with dma remapping. > > >> > > >> > > because all the hpsa changes went in before that under > > >> > > Missing crucial info: > > >> > > > > >> > > commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1 > > >> > > > > >> > > > Merge: 3e75c6d b2bff6c > > >> > > > Author: Linus Torvalds > > >> > > > Date: Tue Apr 1 18:49:04 2014 -0700 > > >> > > > > > >> > > > Merge tag 'scsi-misc' of > > >> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi > > >> > > > > > >> > > > can you revalidate that this commit works OK just to make sure? > > >> > > > >> > Ok so I don't see those DMA messages and system starts just fine. I'm > > >> > thinking perhaps something broke after the IO mmu stuff in commit > > >> > 3f583bc21977a608908b83d03ee2250426a5695c... could this be indirectly > > >> > causing the CPU stalls and just blame hpsa in the path as a side > > >> > effect? > > >> > > > >> > /me goes out to try the commit. > > >> > > >> That's my guess. The DMAR messages are DMA remapping issues caused in > > >> the IOMMU. If I had to guess, I'd say the DMAR fault message is > > >> indicating the IOMMU is calling for a mapping address before it can > > >> satisfy the driver read request, which is causing the hang apparently in > > >> the hpsa driver. > > >> > > >> I've added linux-pci to the cc; I think they deal with iommu issues on > > >> x86. > > > > > > So that merge commit appears to be the culprit, I see both the DMA > > > messages and the lockup blaming hpsa... 
> > > > My understanding so far (please correct me if I'm wrong): > > > > 39de65aa2c3e OK ("Merge branch 'i2c/for-next'") > > 1a0b6abaea78 OK ("Merge tag 'scsi-misc'") ^^^ this one, 1a0b6abaea78, did not work for me, crashing in hpsa_enter_performant mode() which was surprsing to me as I am pretty sure I tried on this very same machine I'm using now (DL360p with P420, P430 and P420i) with 3.14-rc-something plus all the hpsa patches that I thought were merged in. But now I am seeing: [] hpsa_enter_performant_mode+0x4c0/0x540 [hpsa] RSP: 0018:88042c515a78 EFLAGS: 00010297 RAX: RBX: 88042c65 RCX: 0004 RDX: RSI: 0001 RDI: RBP: 88042c515b48 R08: R09: 8af03cc0 R10: R11: 0001 R12: 88042c515a98 R13: 6104 R14: 88042c515ad8 R15: a0001630 FS: 7f86f7a38700() GS:88043f56() knlGS: CS: 0010 DS: ES: CR0: 80050033 usb 1-1.6: new low-speed USB device number 3 using ehci-pci CR2: CR3: 00042c4c3000 CR4: 000407e0 Stack: 00
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
On Thu, 2014-04-10 at 19:52 +0200, Hannes Reinecke wrote:
> On 04/10/2014 05:31 PM, Alan Stern wrote:
> > On Thu, 10 Apr 2014, Hannes Reinecke wrote:
> >
> >> On 04/10/2014 12:58 PM, Andreas Reis wrote:
> >>> That patch appears to work in preventing the crashes, judged on one
> >>> repeated appearance of the bug.
> >>>
> >>> dmesg had the usual
> >>> [ 215.229903] usb 4-2: usb_disable_lpm called, do nothing
> >>> [ 215.336941] usb 4-2: reset SuperSpeed USB device number 3 using
> >>> xhci_hcd
> >>> [ 215.350296] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called
> >>> with disabled ep 880427b829c0
> >>> [ 215.350305] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called
> >>> with disabled ep 880427b82a08
> >>> [ 215.350621] usb 4-2: usb_enable_lpm called, do nothing
> >>>
> >>> repeated five times, followed by one
> >>> [ 282.795801] sd 8:0:0:0: Device offlined - not ready after error
> >>> recovery
> >>>
> >>> and then as often as something tried to read from it:
> >>> [ 295.585472] sd 8:0:0:0: rejecting I/O to offline device
> >>>
> >>> The stick could then be properly un- and remounted (the latter if it
> >>> had been physically replugged) without issue - for the bug to
> >>> reoccur after one to three minutes. I tried this three times, no
> >>> dmesg difference except the ep addresses varied on two of that.
> >>>
> >> Was this just that patch you've tested with or the entire patch series?
> >>
> >> If the latter, Alan, is this the expected outcome?
> >
> > Yes, it is. The same thing should happen with the entire patch series.
> >
> >> I would've thought the error recover should _not_ run into
> >> offlining devices here, but rather the device should be recovered
> >> eventually.
> >
> > The command times out, it is aborted, and the command is retried. The
> > same thing happens, and we repeat five times. Eventually the SCSI core
> > gives up and declares the device to be offline.
> >
> Hmm. Ok. If you are fine with it who am I to argue here.
> James, shall I resent the patch series?

You mean the one patch? No, it's OK, I have it.

It's still not complete, though, as I've said a couple of times. The problem is that we have abort memory on any eh command as well, which this doesn't fix. The scenario is abort command, set flag, abort completes, send TUR, TUR doesn't return, so we now try to abort the TUR, but scsi_abort_eh_cmnd() will skip the abort because the flag is set and move straight to reset.

The fix is this, I can just add it as well.

James

---

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 771c16b..7516e2c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -920,6 +920,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
 	ses->prot_op = scmd->prot_op;
 
 	scmd->prot_op = SCSI_PROT_NORMAL;
+	scmd->eh_eflags = 0;
 	scmd->cmnd = ses->eh_cmnd;
 	memset(scmd->cmnd, 0, BLK_MAX_CDB);
 	memset(&scmd->sdb, 0, sizeof(scmd->sdb));
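The "abort memory" scenario described in the message above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the SCSI midlayer's code: struct sketch_cmnd, SKETCH_EH_ABORT_TRIED, sketch_recover() and sketch_prep_eh_cmnd() are hypothetical names, and the recovery ladder is reduced to two steps (abort, then reset).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for the "abort already tried" marker. */
#define SKETCH_EH_ABORT_TRIED 0x2

/* Hypothetical stand-in for struct scsi_cmnd; only the flag word is kept. */
struct sketch_cmnd {
	int eh_eflags;
};

enum sketch_action { SKETCH_ABORT, SKETCH_RESET };

/* First recovery step: abort, unless the flag says an abort was tried. */
static enum sketch_action sketch_recover(struct sketch_cmnd *scmd)
{
	assert(scmd != NULL);
	if (scmd->eh_eflags & SKETCH_EH_ABORT_TRIED)
		return SKETCH_RESET; /* skip the abort, escalate at once */
	scmd->eh_eflags |= SKETCH_EH_ABORT_TRIED;
	return SKETCH_ABORT;
}

/* Reuse the command for an error-handler request such as a TUR. */
static void sketch_prep_eh_cmnd(struct sketch_cmnd *scmd, bool clear_flags)
{
	assert(scmd != NULL);
	if (clear_flags)
		scmd->eh_eflags = 0; /* mirrors the one-line fix in the diff */
}
```

In this model, preparing the TUR without clearing the flags makes the next recovery attempt go straight to reset, while clearing them lets the TUR get its own abort attempt first.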
Re: [PATCH] Update Maintainers for IBM Power 842, vscsi, and vfc drivers
On 04/09/2014 01:32 PM, Nathan Fontenot wrote:
> Update the MAINTAINERS file to indicate the current maintainers
> for the IBM Power 842 Compression driver, IBM Power Virtual SCSI
> driver and the IBM Power Virtual FC Driver.
>
> Signed-off-by: Nathan Fontenot
> ---

Acked-by: Brian King

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
On 04/10/2014 05:31 PM, Alan Stern wrote:
> On Thu, 10 Apr 2014, Hannes Reinecke wrote:
>
>> On 04/10/2014 12:58 PM, Andreas Reis wrote:
>>> That patch appears to work in preventing the crashes, judged on one
>>> repeated appearance of the bug.
>>>
>>> dmesg had the usual
>>> [ 215.229903] usb 4-2: usb_disable_lpm called, do nothing
>>> [ 215.336941] usb 4-2: reset SuperSpeed USB device number 3 using
>>> xhci_hcd
>>> [ 215.350296] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called
>>> with disabled ep 880427b829c0
>>> [ 215.350305] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called
>>> with disabled ep 880427b82a08
>>> [ 215.350621] usb 4-2: usb_enable_lpm called, do nothing
>>>
>>> repeated five times, followed by one
>>> [ 282.795801] sd 8:0:0:0: Device offlined - not ready after error
>>> recovery
>>>
>>> and then as often as something tried to read from it:
>>> [ 295.585472] sd 8:0:0:0: rejecting I/O to offline device
>>>
>>> The stick could then be properly un- and remounted (the latter if it
>>> had been physically replugged) without issue -- for the bug to
>>> reoccur after one to three minutes. I tried this three times, no
>>> dmesg difference except the ep addresses varied on two of that.
>>>
>> Was this just that patch you've tested with or the entire patch series?
>>
>> If the latter, Alan, is this the expected outcome?
>
> Yes, it is. The same thing should happen with the entire patch series.
>
>> I would've thought the error recover should _not_ run into
>> offlining devices here, but rather the device should be recovered
>> eventually.
>
> The command times out, it is aborted, and the command is retried. The
> same thing happens, and we repeat five times. Eventually the SCSI core
> gives up and declares the device to be offline.
>
Hmm. Ok. If you are fine with it who am I to argue here.
James, shall I resend the patch series?

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
Re: hpsa driver bug crack kernel down!
On Thu, 2014-04-10 at 09:19 -0700, Davidlohr Bueso wrote:
> > > > > >> > > > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr
> > > > > >> > > > > 7f61e000
> >
> > That "Present bit in context entry is clear" fault means that we have
> > not set up *any* mappings for this PCI device… on this IOMMU.
> >
> > > > Yes, specifically (finally done bisecting):
> > > >
> > > > commit 2e45528930388658603ea24d49cf52867b928d3e
> > > > Author: Jiang Liu
> > > > Date: Wed Feb 19 14:07:36 2014 +0800
> > > >
> > > > iommu/vt-d: Unify the way to process DMAR device scope array
> >
> > This commit is about how we decide which IOMMU a given PCI device is
> > attached to.
> >
> > Thus, my first guess would be that we are quite happily setting up the
> > requested DMA maps on the *wrong* IOMMU, and then taking faults when the
> > device actually tries to do DMA.
> >
> > However, I'm not 100% convinced of that. The fault address looks
> > suspiciously like a true physical address, not a virtual bus address of
> > the type that we'd normally allocate for a dma_map_* operation. Those
> > would start at 0xf000 and work downwards, typically.
> >
> > Do you have 'iommu=pt' on the kernel command line?
>
> No.
>
> > Can I see the full
> > dmesg as this system boots, and also a copy of the DMAR table?
>
> Attaching a dmesg from one of the kernels that boots. It doesn't appear
> to have much of the related information...

It shows us that the address 0x7f61e000 is in an E820-reserved region,
and that there's an RMRR covering that region for an unspecified PCI
device, but that's going to be the hpsa.

So if it isn't just a simple case of us assigning this device to the
wrong IOMMU, *perhaps* it's that we lose the RMRR when the driver takes
control of the device.

RMRRs are generally expected to be a boot-time thing, for things like
legacy keyboard/mouse emulation via USB. Using them while the system is
*active* is... horrid. We've often not quite handled that right.

--
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation
Re: hpsa driver bug crack kernel down!
On Thu, 2014-04-10 at 16:34 +0800, Jiang Liu wrote: > Hi Baoquan, > Could you please help to give output of "lspci -"? Attached. > Is device "hpsa :03:00.0" a legacy PCI device(non-PCIe)? > It may have relationship with IOMMU driver. I honestly don't know. PCI is way out of my area of knowledge. 00:00.0 Host bridge: Intel Corporation 5520/5500/X58 I/O Hub to ESI Port (rev 22) Subsystem: Hewlett-Packard Company Device 330b Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp 00:02.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 2 (rev 22) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp 00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp 00:04.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 4 (rev 22) 
(prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp 00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 22) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp 00:06.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 6 (rev 22) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp 00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 22) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp 00:08.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 8 (rev 22) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- 
DiscTmrSERREn- Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp 00:09.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 22) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- Dis
Re: hpsa driver bug crack kernel down!
On Thu, 2014-04-10 at 09:14 -0600, Bjorn Helgaas wrote:
> > Thus, my first guess would be that we are quite happily setting up the
> > requested DMA maps on the *wrong* IOMMU, and then taking faults when the
> > device actually tries to do DMA.
> >
> I like the "wrong IOMMU (or no IOMMU at all)" theory. If we didn't
> connect the device with an IOMMU at all, that would explain the device
> DMAing directly to a physical address, wouldn't it?

An unlikely failure mode. We're much more likely to see *wrong* IOMMU
than no IOMMU. And thus we'd still see the distinctive virtual addresses
just below 4GiB.

However, Rob's answer may solve that puzzle. If this is one of those
abominations where the device continues to do DMA to system memory even
after the OS is up and running and *thinks* it has control of the
hardware, then the offending address will be listed in an RMRR entry
(which tells the OS to set up a 1:1 mapping for access to certain memory
ranges for a given device). And will be inside an E820 reserved region.

A little odd that such an error would trigger only when we're actually
trying to initialise the device from the Linux driver, not as soon as we
enable the IOMMU. But all things are possible.

But the DMAR table and dmesg that I asked for would give us a bit more
information and hopefully let us stop speculating...

> > We should also rate-limit DMA faults, which would avoid the lockup
> > failure mode. Bjorn, what should an IOMMU driver *do* when it detects
> > that a device is creating an endless stream of DMA faults and isn't
> > aborting the transaction?
>
> You mentioned that POWER with EEH does something intelligent in this
> case, but I'm not familiar with that code. We have AER support, which
> can result in resetting a device, but I think DMA faults are reported
> differently, and I don't think there's any nice existing way for PCI
> to deal with them. Maybe there should be, though.

Quite frankly, I don't care how *you* deal with them, or even if you
can. All I want to know is how I tell you about the problem, because *I*
sure as hell don't want to be trying to deal with it in the IOMMU code.
That's a generic PCI layer thing. :)

--
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation
Re: hpsa driver bug crack kernel down!
On 4/10/2014 11:14 AM, Bjorn Helgaas wrote:
> On Thu, Apr 10, 2014 at 2:46 AM, Woodhouse, David
> wrote:
>
>>> DMAR:[fault reason 02] Present bit in context entry is clear
>>> dmar: DRHD: handling fault status reg 602
>>> dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
>>
>> That "Present bit in context entry is clear" fault means that we have
>> not set up *any* mappings for this PCI device… on this IOMMU.
>>
>>>> Yes, specifically (finally done bisecting):
>>>>
>>>> commit 2e45528930388658603ea24d49cf52867b928d3e
>>>> Author: Jiang Liu
>>>> Date: Wed Feb 19 14:07:36 2014 +0800
>>>>
>>>> iommu/vt-d: Unify the way to process DMAR device scope array
>>
>> This commit is about how we decide which IOMMU a given PCI device is
>> attached to.
>>
>> Thus, my first guess would be that we are quite happily setting up the
>> requested DMA maps on the *wrong* IOMMU, and then taking faults when the
>> device actually tries to do DMA.
>>
>> However, I'm not 100% convinced of that. The fault address looks
>> suspiciously like a true physical address, not a virtual bus address of
>> the type that we'd normally allocate for a dma_map_* operation. Those
>> would start at 0xf000 and work downwards, typically.
>
> I like the "wrong IOMMU (or no IOMMU at all)" theory. If we didn't
> connect the device with an IOMMU at all, that would explain the device
> DMAing directly to a physical address, wouldn't it?
>
>> Do you have 'iommu=pt' on the kernel command line? Can I see the full
>> dmesg as this system boots, and also a copy of the DMAR table?

This will be really helpful information. This box has devices with RMRR
records and if they're not set up correctly, DMAR faults can occur.

>>
>> We should also rate-limit DMA faults, which would avoid the lockup
>> failure mode. Bjorn, what should an IOMMU driver *do* when it detects
>> that a device is creating an endless stream of DMA faults and isn't
>> aborting the transaction?
>
> You mentioned that POWER with EEH does something intelligent in this
> case, but I'm not familiar with that code. We have AER support, which
> can result in resetting a device, but I think DMA faults are reported
> differently, and I don't think there's any nice existing way for PCI
> to deal with them. Maybe there should be, though.
>
> Bjorn
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
On Thu, 10 Apr 2014, Hannes Reinecke wrote:
> On 04/10/2014 12:58 PM, Andreas Reis wrote:
> > That patch appears to work in preventing the crashes, judged on one
> > repeated appearance of the bug.
> >
> > dmesg had the usual
> > [ 215.229903] usb 4-2: usb_disable_lpm called, do nothing
> > [ 215.336941] usb 4-2: reset SuperSpeed USB device number 3 using
> > xhci_hcd
> > [ 215.350296] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called
> > with disabled ep 880427b829c0
> > [ 215.350305] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called
> > with disabled ep 880427b82a08
> > [ 215.350621] usb 4-2: usb_enable_lpm called, do nothing
> >
> > repeated five times, followed by one
> > [ 282.795801] sd 8:0:0:0: Device offlined - not ready after error
> > recovery
> >
> > and then as often as something tried to read from it:
> > [ 295.585472] sd 8:0:0:0: rejecting I/O to offline device
> >
> > The stick could then be properly un- and remounted (the latter if it
> > had been physically replugged) without issue -- for the bug to
> > reoccur after one to three minutes. I tried this three times, no
> > dmesg difference except the ep addresses varied on two of that.
> >
> Was this just that patch you've tested with or the entire patch series?
>
> If the latter, Alan, is this the expected outcome?

Yes, it is. The same thing should happen with the entire patch series.

> I would've thought the error recover should _not_ run into
> offlining devices here, but rather the device should be recovered
> eventually.

The command times out, it is aborted, and the command is retried. The
same thing happens, and we repeat five times. Eventually the SCSI core
gives up and declares the device to be offline.

Alan Stern
Re: hpsa driver bug crack kernel down!
On Thu, Apr 10, 2014 at 2:46 AM, Woodhouse, David wrote:
>> > > >> > > > > DMAR:[fault reason 02] Present bit in context entry is clear
>> > > >> > > > > dmar: DRHD: handling fault status reg 602
>> > > >> > > > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr
>> > > >> > > > > 7f61e000
>
> That "Present bit in context entry is clear" fault means that we have
> not set up *any* mappings for this PCI device… on this IOMMU.
>
>> > Yes, specifically (finally done bisecting):
>> >
>> > commit 2e45528930388658603ea24d49cf52867b928d3e
>> > Author: Jiang Liu
>> > Date: Wed Feb 19 14:07:36 2014 +0800
>> >
>> > iommu/vt-d: Unify the way to process DMAR device scope array
>
> This commit is about how we decide which IOMMU a given PCI device is
> attached to.
>
> Thus, my first guess would be that we are quite happily setting up the
> requested DMA maps on the *wrong* IOMMU, and then taking faults when the
> device actually tries to do DMA.
>
> However, I'm not 100% convinced of that. The fault address looks
> suspiciously like a true physical address, not a virtual bus address of
> the type that we'd normally allocate for a dma_map_* operation. Those
> would start at 0xf000 and work downwards, typically.

I like the "wrong IOMMU (or no IOMMU at all)" theory. If we didn't
connect the device with an IOMMU at all, that would explain the device
DMAing directly to a physical address, wouldn't it?

> Do you have 'iommu=pt' on the kernel command line? Can I see the full
> dmesg as this system boots, and also a copy of the DMAR table?
>
> We should also rate-limit DMA faults, which would avoid the lockup
> failure mode. Bjorn, what should an IOMMU driver *do* when it detects
> that a device is creating an endless stream of DMA faults and isn't
> aborting the transaction?

You mentioned that POWER with EEH does something intelligent in this
case, but I'm not familiar with that code. We have AER support, which
can result in resetting a device, but I think DMA faults are reported
differently, and I don't think there's any nice existing way for PCI
to deal with them. Maybe there should be, though.

Bjorn
[PATCH] sd: medium access timeout counter fails to reset
There is an error with the medium access timeout feature of the sd
driver. The sdkp->medium_access_timed_out value is reset to zero in
sd_done() in the wrong place. Currently it is reset to zero only when a
command returns sense data. This can result in cases where the medium
access check falsely triggers from timed out commands which are hours or
days apart.

For example, an I/O command times out and is aborted. It then retries
and succeeds. But with no sense data generated and returned, the
medium_access_timed_out value is not reset. If no sd command returns
sense data, then the next command to time out (however far in time from
the first failure) will trigger the medium access timeout and put the
device offline.

The resetting of sdkp->medium_access_timed_out should occur before the
check for sense data.

Signed-off-by: David Jeffery

---

To reproduce using scsi_debug, use SCSI_DEBUG_OPT_TIMEOUT or
SCSI_DEBUG_OPT_MAC_TIMEOUT to force an I/O command to timeout. Then,
remove the opt value so the I/O will succeed on retry. Perform more I/O
as desired. Finally, repeat the process to make a new I/O command time
out. Without the patch, the device will be marked offline even though
many I/O commands have succeeded between the 2 instances of timed out
commands.

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 470954a..a41e68e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1689,12 +1689,12 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 						   sshdr.ascq));
 	}
 #endif
+	sdkp->medium_access_timed_out = 0;
+
 	if (driver_byte(result) != DRIVER_SENSE &&
 	    (!sense_valid || sense_deferred))
 		goto out;

-	sdkp->medium_access_timed_out = 0;
-
 	switch (sshdr.sense_key) {
 	case HARDWARE_ERROR:
 	case MEDIUM_ERROR:
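The effect of moving the reset can be shown with a small userspace model. This is only a sketch under assumptions: struct model_disk, model_timeout() and model_done() are hypothetical names, and the offline threshold is passed in as a plain field instead of being read from the driver.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-disk state discussed in the patch. */
struct model_disk {
	unsigned int medium_access_timed_out;    /* timeouts seen so far */
	unsigned int max_medium_access_timeouts; /* assumed offline threshold */
	bool offline;
};

/* A command timed out: count it and offline the disk at the threshold. */
void model_timeout(struct model_disk *d)
{
	d->medium_access_timed_out++;
	if (d->medium_access_timed_out >= d->max_medium_access_timeouts)
		d->offline = true;
}

/*
 * A command completed. reset_before_check == true models the patched
 * order (reset first); false models the old order, where the reset is
 * only reached when sense data was returned.
 */
void model_done(struct model_disk *d, bool has_sense, bool reset_before_check)
{
	if (reset_before_check)
		d->medium_access_timed_out = 0;
	if (!has_sense)
		return; /* mirrors the early "goto out" */
	if (!reset_before_check)
		d->medium_access_timed_out = 0;
}
```

With a threshold of 2, the sequence timeout, successful completion without sense data, timeout takes the modelled disk offline in the old order but not in the patched order.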
Re: esp_scsi QTAG in FAS216
2014-04-06 22:33 GMT+02:00 Michael Schmitz:
>
> Hello Dave, Tuomas,
>
> >> Also, looking at the timeout formulae in the old NCR53C9x.c driver,
> >> the values would be different for FAS216. Why was this dropped from
> >> the modern esp_scsi?
> >
> > I've never seen a formula for any ESP or FAS chip for the timeout
> > other than the one mentioned in huge comment in
> > esp_set_clock_params(), although I do see the 7668 instead of 8192
> > factor being used in the old NCR53C9x driver.
>
> I haven't gone far enough back in the 53C9x revision history to be
> certain, but it would seem to me that Kars de Jong added that FAS
> special case.
>
> Can you confirm that, Kars? Any recollection as to the reason?

That is the value that's in the data manual of the Symbios Logic
SYM53CF94/96-2 (the actual chip that's in my Amiga SCSI controller).

Funny, according to the QLogic FAS2x6 manual the value should be 7682
for FAS216/216U/236/236U chips...

I don't think it's all that important. It only means that the actual
selection timeout used by the chip will be slightly shorter than it is
supposed to be.

Kind regards,

Kars.
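As a rough illustration of why the factor only shifts the timeout slightly, here is a sketch that assumes the usual relationship from the chip manuals: register value = timeout * clock / (factor * clock conversion factor). The helper names are hypothetical, and the 40 MHz clock, conversion factor of 8 and 250 ms target used in the note below are made-up example values, not figures from this thread.

```c
/*
 * Selection timeout register value for a desired timeout, rounded up.
 * Assumed formula: stp = timeout * clk / (factor * ccf).
 */
unsigned int stp_for_timeout(unsigned long long timeout_us,
			     unsigned long long clk_hz,
			     unsigned int ccf, unsigned int factor)
{
	unsigned long long num = timeout_us * clk_hz;
	unsigned long long den = 1000000ULL * factor * ccf;

	return (unsigned int)((num + den - 1) / den);
}

/* Timeout in microseconds that a chip with the given factor produces. */
unsigned long long timeout_us_for_stp(unsigned int stp,
				      unsigned long long clk_hz,
				      unsigned int ccf, unsigned int factor)
{
	return (unsigned long long)stp * factor * ccf * 1000000ULL / clk_hz;
}
```

Under those example values, a register value computed with 8192 comes out as 153; a chip that really counts with 7668 would then time out after about 234.6 ms instead of the intended 250 ms, which matches the "slightly shorter" remark.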
Re: [PATCH] ch: add refcounting
On 04/10/2014 01:51 PM, Christoph Hellwig wrote:
>>  static int
>>  ch_release(struct inode *inode, struct file *file)
>>  {
>>  	scsi_changer *ch = file->private_data;
>>
>>  	scsi_device_put(ch->device);
>> +	ch->device = NULL;
>>  	file->private_data = NULL;
>> +	kref_put(&ch->ref, ch_destroy);
>
> Any reason you need to put the scsi_device here already? Deferring
> this would give you much easier lifetime rules, and no need to
> deal with a NULL ch->device ever.
>
Sure. But this would require a far more in-depth analysis of the
lifetime of the ch object, and most likely a far more intrusive patch.
You're welcome to do so :-)

This patch is just a minimal fix; I didn't dare to change too much of
the internals.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
On 04/10/2014 02:26 PM, Andreas Reis wrote:
> Only your 0/3 patch to which Alan linked, along with two other
> patches by Mathias Nyman ("disable usb3 on intel hosts" and "disable
> all lpm related control transfers", one of which is the source of
> the "do nothing"s).
>
> I'll revert the latter two and apply the rest of the set. Which I'm
> guessing currently consists of said 0/3 patch —
> http://www.spinics.net/lists/linux-scsi/msg73502.html
> — plus 2/3 and 3/3?
>
Yes, that is correct.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
Only your 0/3 patch to which Alan linked, along with two other patches
by Mathias Nyman ("disable usb3 on intel hosts" and "disable all lpm
related control transfers", one of which is the source of the "do
nothing"s).

I'll revert the latter two and apply the rest of the set. Which I'm
guessing currently consists of said 0/3 patch —
http://www.spinics.net/lists/linux-scsi/msg73502.html
— plus 2/3 and 3/3? Or should I just omit 0/3 and try whichever of the
two in 1/3 "works best"? Rather confusing ATM.

Anyway, for whatever reason the bug is happening rather frequently now.
I've spotted the following occurring after the "Device offlined" line
two times now:

[ 206.901385] sd 11:0:0:0: [sdg] Unhandled error code
[ 206.901394] sd 11:0:0:0: [sdg]
[ 206.901397] Result: hostbyte=0x01 driverbyte=0x00
[ 206.901400] sd 11:0:0:0: [sdg] CDB:
[ 206.901403] cdb[0]=0x2a: 2a 00 02 25 1b 50 00 00 08 00
[ 206.901419] end_request: I/O error, dev sdg, sector 35986256

The second time had "sd 12:0:0:0", "cdb[0]=0x28: 28 00 03 94 77 20 00 00
08 00" and a different sector.

Andreas Reis

On 10.04.2014 13:37, Hannes Reinecke wrote:
> On 04/10/2014 12:58 PM, Andreas Reis wrote:
>> That patch appears to work in preventing the crashes, judged on one
>> repeated appearance of the bug.
>>
>> dmesg had the usual
>> [ 215.229903] usb 4-2: usb_disable_lpm called, do nothing
>> [ 215.336941] usb 4-2: reset SuperSpeed USB device number 3 using
>> xhci_hcd
>> [ 215.350296] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called
>> with disabled ep 880427b829c0
>> [ 215.350305] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called
>> with disabled ep 880427b82a08
>> [ 215.350621] usb 4-2: usb_enable_lpm called, do nothing
>>
>> repeated five times, followed by one
>> [ 282.795801] sd 8:0:0:0: Device offlined - not ready after error
>> recovery
>>
>> and then as often as something tried to read from it:
>> [ 295.585472] sd 8:0:0:0: rejecting I/O to offline device
>>
>> The stick could then be properly un- and remounted (the latter if it
>> had been physically replugged) without issue — for the bug to
>> reoccur after one to three minutes. I tried this three times, no
>> dmesg difference except the ep addresses varied on two of that.
>>
> Was this just that patch you've tested with or the entire patch series?
>
> If the latter, Alan, is this the expected outcome?
> I would've thought the error recover should _not_ run into
> offlining devices here, but rather the device should be recovered
> eventually.
>
> Andreas, can you test with the entire patch series and enable
>
> 'scsi_logging_level -s -E 5'
>
> prior to running the tests?
>
> THX.
>
> Cheers,
>
> Hannes
Re: [PATCH] ch: add refcounting
>  static int
>  ch_release(struct inode *inode, struct file *file)
>  {
>  	scsi_changer *ch = file->private_data;
>
>  	scsi_device_put(ch->device);
> +	ch->device = NULL;
>  	file->private_data = NULL;
> +	kref_put(&ch->ref, ch_destroy);

Any reason you need to put the scsi_device here already? Deferring
this would give you much easier lifetime rules, and no need to
deal with a NULL ch->device ever.
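The lifetime rule suggested in this reply, dropping the device reference only when the last reference to the changer goes away, can be sketched in userspace as follows. The structures and helper names are hypothetical, the counters are plain integers with no atomics or locking, and nothing here is the actual ch driver code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the SCSI device and its user count. */
struct model_device {
	int users;
};

/* Hypothetical stand-in for the changer object with its own refcount. */
struct model_changer {
	int refs;
	struct model_device *device;
};

void changer_get(struct model_changer *ch)
{
	ch->refs++;
}

/* The only place where the device reference is dropped. */
void changer_destroy(struct model_changer *ch)
{
	ch->device->users--;
	ch->device = NULL;
}

/* Drop one reference; returns 1 if this was the last one. */
int changer_put(struct model_changer *ch)
{
	if (--ch->refs == 0) {
		changer_destroy(ch);
		return 1;
	}
	return 0;
}
```

In this model a release() that only calls changer_put() never leaves a live changer with a NULL device pointer, because the pointer is cleared at the same moment the object dies.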
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
On 04/10/2014 12:58 PM, Andreas Reis wrote:
> That patch appears to work in preventing the crashes, judged on one
> repeated appearance of the bug.
>
> dmesg had the usual
> [ 215.229903] usb 4-2: usb_disable_lpm called, do nothing
> [ 215.336941] usb 4-2: reset SuperSpeed USB device number 3 using
> xhci_hcd
> [ 215.350296] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called
> with disabled ep 880427b829c0
> [ 215.350305] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called
> with disabled ep 880427b82a08
> [ 215.350621] usb 4-2: usb_enable_lpm called, do nothing
>
> repeated five times, followed by one
> [ 282.795801] sd 8:0:0:0: Device offlined - not ready after error
> recovery
>
> and then as often as something tried to read from it:
> [ 295.585472] sd 8:0:0:0: rejecting I/O to offline device
>
> The stick could then be properly un- and remounted (the latter if it
> had been physically replugged) without issue — for the bug to
> reoccur after one to three minutes. I tried this three times, no
> dmesg difference except the ep addresses varied on two of that.
>
Was this just that patch you've tested with or the entire patch series?

If the latter, Alan, is this the expected outcome?
I would've thought the error recover should _not_ run into
offlining devices here, but rather the device should be recovered
eventually.

Andreas, can you test with the entire patch series and enable

'scsi_logging_level -s -E 5'

prior to running the tests?

THX.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
That patch appears to work in preventing the crashes, judged on one
repeated appearance of the bug.

dmesg had the usual
[ 215.229903] usb 4-2: usb_disable_lpm called, do nothing
[ 215.336941] usb 4-2: reset SuperSpeed USB device number 3 using
xhci_hcd
[ 215.350296] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called
with disabled ep 880427b829c0
[ 215.350305] xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called
with disabled ep 880427b82a08
[ 215.350621] usb 4-2: usb_enable_lpm called, do nothing

repeated five times, followed by one
[ 282.795801] sd 8:0:0:0: Device offlined - not ready after error
recovery

and then as often as something tried to read from it:
[ 295.585472] sd 8:0:0:0: rejecting I/O to offline device

The stick could then be properly un- and remounted (the latter if it
had been physically replugged) without issue — for the bug to
reoccur after one to three minutes. I tried this three times, no
dmesg difference except the ep addresses varied on two of that.

Andreas Reis

On 09.04.2014 20:02, Alan Stern wrote:
> On Wed, 9 Apr 2014, Hannes Reinecke wrote:
>
>>> I finally got a chance to try it out. It does seem to do what we
>>> want. I didn't track the flow of control in complete detail, but the
>>> command definitely got aborted both times it was issued.
>>>
>> Good, so it is as I thought.
>>
>> James, can we include this patch instead of your prior solution?
>
> First, we should have the original bug reporter try it out.
>
> Andreas, the patch in question can be found here:
>
> http://marc.info/?l=linux-usb&m=13962706597&w=2
>
> Can you try this in place of the 1/3 patch posted by James? It should
> have the same effect, of preventing your system from crashing when the
> READ command fails.
>
> Alan Stern
Re: [PATCH 3/4] blk-mq: move request structures into struct blk_mq_tags
On Wed, Apr 09, 2014 at 10:23:32AM -0600, Jens Axboe wrote:
> This should go into block/blk-mq-tag.h.

Ok.

> We might as well leave this, the mtip32xx conversion ends up using it. So
> if we pull it now, it'll just be reintroduced shortly.

It's back in the latest revision of the patch, just taking a
struct blk_mq_tag pointer now so that it can be used by SCSI as well.
I've also changed an open-coded variant of it to use the helper.

Pointer:

	http://git.infradead.org/users/hch/scsi.git/commitdiff/b0f1ed35bbeb6d0177fc0cc0bf5c880c3c5d1817
Re: hpsa driver bug crack kernel down!
On Thu, 2014-04-10 at 09:15 +0200, Joerg Roedel wrote:
> [+ David, VT-d maintainer ]
>
> Jiang, David, can you please have a look into this issue?
>
> > > >> > > > > DMAR:[fault reason 02] Present bit in context entry is clear
> > > >> > > > > dmar: DRHD: handling fault status reg 602
> > > >> > > > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr
> > > >> > > > > 7f61e000

That "Present bit in context entry is clear" fault means that we have
not set up *any* mappings for this PCI device… on this IOMMU.

> > Yes, specifically (finally done bisecting):
> >
> > commit 2e45528930388658603ea24d49cf52867b928d3e
> > Author: Jiang Liu
> > Date: Wed Feb 19 14:07:36 2014 +0800
> >
> > iommu/vt-d: Unify the way to process DMAR device scope array

This commit is about how we decide which IOMMU a given PCI device is
attached to.

Thus, my first guess would be that we are quite happily setting up the
requested DMA maps on the *wrong* IOMMU, and then taking faults when the
device actually tries to do DMA.

However, I'm not 100% convinced of that. The fault address looks
suspiciously like a true physical address, not a virtual bus address of
the type that we'd normally allocate for a dma_map_* operation. Those
would start at 0xf000 and work downwards, typically.

Do you have 'iommu=pt' on the kernel command line?

Can I see the full dmesg as this system boots, and also a copy of the
DMAR table?

We should also rate-limit DMA faults, which would avoid the lockup
failure mode.

Bjorn, what should an IOMMU driver *do* when it detects that a device is
creating an endless stream of DMA faults and isn't aborting the
transaction? I can set it to silent so that it just stops *reporting*
the DMA faults for that device... and I suppose I can re-enable them
when I next see a DMA mapping for it (although actually it'd be better
to have a hook to do that on FLR or something like that). But there must
be a better answer than that, surely? And I don't want to hack it up
locally in *one* specific IOMMU driver, any more than I have to.

On a POWER system with EEH, the kernel would end up isolating the
offending device completely, and subsequently resetting it...

--
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation
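The rate-limiting idea raised above (stop reporting a device's DMA faults once it is clearly flooding, then re-arm later) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the VT-d driver's code: `struct fault_limiter`, `fault_limiter_init()`, `fault_should_report()` and `fault_limiter_rearm()` are hypothetical names, time is passed in as a plain tick counter, and the burst-per-window policy is an assumption. In the kernel itself one would more likely reach for the existing `printk_ratelimited()` or `__ratelimit()` helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device limiter; not a kernel structure. */
struct fault_limiter {
    uint64_t interval;       /* window length, in caller-defined ticks */
    unsigned int burst;      /* reports allowed per window */
    uint64_t window_start;   /* tick at which the current window began */
    unsigned int reported;   /* reports emitted in the current window */
    unsigned int suppressed; /* faults dropped in the current window */
};

void fault_limiter_init(struct fault_limiter *lim, uint64_t interval,
                        unsigned int burst)
{
    assert(lim != NULL);
    lim->interval = interval;
    lim->burst = burst;
    lim->window_start = 0;
    lim->reported = 0;
    lim->suppressed = 0;
}

/* Returns true if this fault should be logged, false if it is dropped. */
bool fault_should_report(struct fault_limiter *lim, uint64_t now)
{
    assert(lim != NULL);

    /* Start a fresh window once the old one has expired. */
    if (now - lim->window_start >= lim->interval) {
        lim->window_start = now;
        lim->reported = 0;
        lim->suppressed = 0;
    }

    if (lim->reported < lim->burst) {
        lim->reported++;
        return true;
    }

    /* Over budget for this window: count the fault but stay silent. */
    lim->suppressed++;
    return false;
}

/*
 * Re-arm reporting early, for example when a new DMA mapping is set up
 * for the device (the re-enable point is an assumption taken from the
 * discussion, not a settled design).
 */
void fault_limiter_rearm(struct fault_limiter *lim, uint64_t now)
{
    assert(lim != NULL);
    lim->window_start = now;
    lim->reported = 0;
    lim->suppressed = 0;
}
```

A caller would check `fault_should_report()` before logging each fault and call `fault_limiter_rearm()` at whatever point is chosen for re-enabling reports. Note that this only quiets the log; it does not stop the device from faulting, which is the open question in the message above.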
Re: hpsa driver bug crack kernel down!
Hi Baoquan,

Could you please provide the output of "lspci -"? Is the device
"hpsa :03:00.0" a legacy PCI device (non-PCIe)? It may be related to
the IOMMU driver.

Thanks!
Gerry

On 2014/4/10 12:03, Bjorn Helgaas wrote:
> [+cc Joerg, iommu list]
>
> On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso wrote:
>> On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote:
>>> On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote:
>>>> On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote:
>>>>> On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote:
>>>>>> [+linux-scsi]
>>>>>> On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote:
>>>>>>> On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The kernel is 3.14.0+ which is pulled just now.
>>>>>>>
>>>>>>> Cc'ing more people.
>>>>>>>
>>>>>>> While the hpsa driver appears to be involved in some way, I'm sure if
>>>>>>> this is a related issue, but as of today's pull I'm getting another
>>>>>>> problem that causes my DL980 not to come up.
>>>>>>>
>>>>>>> *Massive* amounts of:
>>>>>>>
>>>>>>> DMAR:[fault reason 02] Present bit in context entry is clear
>>>>>>> dmar: DRHD: handling fault status reg 602
>>>>>>> dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
>>>>>>>
>>>>>>> Then:
>>>>>>>
>>>>>>> hpsa :03:00.0: Controller lockup detected: 0x
>>>>>>> ...
>>>>>>> Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
>>>>>>> ...
>>>>>>>
>>>>>>> Screenshot of the actual LOCKUP:
>>>>>>> http://stgolabs.net/hpsa-hard-lockup-3.14+.png
>>>>>>>
>>>>>>> While I haven't bisected, things worked fine until at least until commit
>>>>>>> 39de65aa2c3e (April 2nd).
>>>>>>>
>>>>>>> Any ideas?
>>>>>>
>>>>>> Well, it's either a DMA remapping issue or a hpsa one. Your assertion
>>>>>> that everything worked fine until 39de65aa2c3e would tend to vindicate
>>>>>> hpsa,
>>>>
>>>> Hmm here you mean DMA, right?
>>>
>>> No, it vindicates the hpsa changes ... they don't seem to be causing
>>> problems until something goes wrong with dma remapping.
>>>
>>>>> because all the hpsa changes went in before that under
>>>>> Missing crucial info:
>>>>>
>>>>> commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1
>>>>>
>>>>>> Merge: 3e75c6d b2bff6c
>>>>>> Author: Linus Torvalds
>>>>>> Date: Tue Apr 1 18:49:04 2014 -0700
>>>>>>
>>>>>> Merge tag 'scsi-misc' of
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>>>>>>
>>>>>> can you revalidate that this commit works OK just to make sure?
>>>>
>>>> Ok so I don't see those DMA messages and system starts just fine. I'm
>>>> thinking perhaps something broke after the IO mmu stuff in commit
>>>> 3f583bc21977a608908b83d03ee2250426a5695c... could this be indirectly
>>>> causing the CPU stalls and just blame hpsa in the path as a side
>>>> effect?
>>>>
>>>> /me goes out to try the commit.
>>>
>>> That's my guess. The DMAR messages are DMA remapping issues caused in
>>> the IOMMU. If I had to guess, I'd say the DMAR fault message is
>>> indicating the IOMMU is calling for a mapping address before it can
>>> satisfy the driver read request, which is causing the hang apparently in
>>> the hpsa driver.
>>>
>>> I've added linux-pci to the cc; I think they deal with iommu issues on
>>> x86.
>>
>> So that merge commit appears to be the culprit, I see both the DMA
>> messages and the lockup blaming hpsa...
>
> My understanding so far (please correct me if I'm wrong):
>
> 39de65aa2c3e OK ("Merge branch 'i2c/for-next'")
> 1a0b6abaea78 OK ("Merge tag 'scsi-misc'")
> 3f583bc21977 BAD ("Merge tag 'iommu-updates-v3.15'")
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
Re: hpsa driver bug crack kernel down!
[+ David, VT-d maintainer ]

Jiang, David, can you please have a look into this issue?

Thanks,

Joerg

On Wed, Apr 09, 2014 at 11:32:37PM -0700, Davidlohr Bueso wrote:
> On Wed, 2014-04-09 at 22:03 -0600, Bjorn Helgaas wrote:
> > [+cc Joerg, iommu list]
> >
> > On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso wrote:
> > > On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote:
> > >> On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote:
> > >> > On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote:
> > >> > > On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote:
> > >> > > > [+linux-scsi]
> > >> > > > On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote:
> > >> > > > > On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote:
> > >> > > > > > Hi,
> > >> > > > > >
> > >> > > > > > The kernel is 3.14.0+ which is pulled just now.
> > >> > > > >
> > >> > > > > Cc'ing more people.
> > >> > > > >
> > >> > > > > While the hpsa driver appears to be involved in some way, I'm
> > >> > > > > sure if
> > >> > > > > this is a related issue, but as of today's pull I'm getting
> > >> > > > > another
> > >> > > > > problem that causes my DL980 not to come up.
> > >> > > > >
> > >> > > > > *Massive* amounts of:
> > >> > > > >
> > >> > > > > DMAR:[fault reason 02] Present bit in context entry is clear
> > >> > > > > dmar: DRHD: handling fault status reg 602
> > >> > > > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr
> > >> > > > > 7f61e000
> > >> > > > >
> > >> > > > > Then:
> > >> > > > >
> > >> > > > > hpsa :03:00.0: Controller lockup detected: 0x
> > >> > > > > ...
> > >> > > > > Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
> > >> > > > > ...
> > >> > > > >
> > >> > > > > Screenshot of the actual LOCKUP:
> > >> > > > > http://stgolabs.net/hpsa-hard-lockup-3.14+.png
> > >> > > > >
> > >> > > > > While I haven't bisected, things worked fine until at least
> > >> > > > > until commit
> > >> > > > > 39de65aa2c3e (April 2nd).
> > >> > > > >
> > >> > > > > Any ideas?
> > >> > > >
> > >> > > > Well, it's either a DMA remapping issue or a hpsa one. Your
> > >> > > > assertion
> > >> > > > that everything worked fine until 39de65aa2c3e would tend to
> > >> > > > vindicate
> > >> > > > hpsa,
> > >> >
> > >> > Hmm here you mean DMA, right?
> > >>
> > >> No, it vindicates the hpsa changes ... they don't seem to be causing
> > >> problems until something goes wrong with dma remapping.
> > >>
> > >> > > because all the hpsa changes went in before that under
> > >> > > Missing crucial info:
> > >> > >
> > >> > > commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1
> > >> > >
> > >> > > > Merge: 3e75c6d b2bff6c
> > >> > > > Author: Linus Torvalds
> > >> > > > Date: Tue Apr 1 18:49:04 2014 -0700
> > >> > > >
> > >> > > > Merge tag 'scsi-misc' of
> > >> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> > >> > > >
> > >> > > > can you revalidate that this commit works OK just to make sure?
> > >> >
> > >> > Ok so I don't see those DMA messages and system starts just fine. I'm
> > >> > thinking perhaps something broke after the IO mmu stuff in commit
> > >> > 3f583bc21977a608908b83d03ee2250426a5695c... could this be indirectly
> > >> > causing the CPU stalls and just blame hpsa in the path as a side
> > >> > effect?
> > >> >
> > >> > /me goes out to try the commit.
> > >>
> > >> That's my guess. The DMAR messages are DMA remapping issues caused in
> > >> the IOMMU. If I had to guess, I'd say the DMAR fault message is
> > >> indicating the IOMMU is calling for a mapping address before it can
> > >> satisfy the driver read request, which is causing the hang apparently in
> > >> the hpsa driver.
> > >>
> > >> I've added linux-pci to the cc; I think they deal with iommu issues on
> > >> x86.
> > >
> > > So that merge commit appears to be the culprit, I see both the DMA
> > > messages and the lockup blaming hpsa...
> >
> > My understanding so far (please correct me if I'm wrong):
> >
> > 39de65aa2c3e OK ("Merge branch 'i2c/for-next'")
> > 1a0b6abaea78 OK ("Merge tag 'scsi-misc'")
> > 3f583bc21977 BAD ("Merge tag 'iommu-updates-v3.15'")
>
> Yes, specifically (finally done bisecting):
>
> commit 2e45528930388658603ea24d49cf52867b928d3e
> Author: Jiang Liu
> Date: Wed Feb 19 14:07:36 2014 +0800
>
> iommu/vt-d: Unify the way to process DMAR device scope array
>
> Now we have a PCI bus notification based mechanism to update DMAR
> device scope array, we could extend the mechanism to support boot
> time initialization too, which will help to unify and simplify
> the implementation.
>
> Signed-off-by: Jiang Liu
> Signed-off-by: Joerg Roedel