Re: [RFC] SCSI EH document
Hello, Jeff James. Jeff Garzik wrote: Tejun Heo wrote: Hello, fellow SCSI/ATA developers. This is the first draft of SCSI EH document. This document tries to describe how SCSI EH works and what choirs should be done to maintain SCSI midlayer integrity. It's intended that this document can be used as reference for implementing either fine-grained EH callbacks or single eh_strategy_handler() callback. I'm pretty sure that I've screwed up in (hopefully) several places, so please correct me. Also, I have several places where I'm not sure or have questions, those are marked with *VERIFY* and *QUESTION* respectively. If you know the answer, please let me know. Thanks. SCSI EH Although it's up to the SCSI maintainer ultimately, I think it would be nice to stick this in Documentation/DocBook/ or Documentation/scsi/ If James agrees, I'll reformat it to DocBook and submit the patch. James, what do you think? -- tejun - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] SCSI EH document
On 09/07/05 07:22, Tejun Heo wrote: Hello, Jeff James. Jeff Garzik wrote: Tejun Heo wrote: Hello, fellow SCSI/ATA developers. This is the first draft of SCSI EH document. This document tries to describe how SCSI EH works and what choirs should be done to maintain SCSI midlayer integrity. It's intended that this document can be used as reference for implementing either fine-grained EH callbacks or single eh_strategy_handler() callback. I'm pretty sure that I've screwed up in (hopefully) several places, so please correct me. Also, I have several places where I'm not sure or have questions, those are marked with *VERIFY* and *QUESTION* respectively. If you know the answer, please let me know. Thanks. SCSI EH Although it's up to the SCSI maintainer ultimately, I think it would be nice to stick this in Documentation/DocBook/ or Documentation/scsi/ If James agrees, I'll reformat it to DocBook and submit the patch. James, what do you think? Add to this the vendor treatment of linux-scsi and lo and behold, our current situation. Again: Documentation/ManagamentStyle, Section 1: Decisions. Any work is good work. If _you_ think that it should go into Documentation/scsi, then it probably should. Luben - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] SCSI EH document
On Wed, 2005-09-07 at 20:22 +0900, Tejun Heo wrote: If James agrees, I'll reformat it to DocBook and submit the patch. James, what do you think? Certainly ... just submit it as a patch and I'll put it in. As far as docbook goes, that's fine ... it's just that docbook has to be built and it's supposed to derive its API from the actual files its describing (although you get to do the sections) ... not that all docbook files do this, of course. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] SCSI EH document
On 08/30/05 06:47, Tejun Heo wrote: Hi, Luben. Luben Tuikov wrote: On 08/29/05 05:14, Tejun Heo wrote: Both all the list-heads need to be cleared, otherwise there may be list corruption next time the element is added to the list_head. scmd-eh_entry is never used as list head. It's always used as list entry. So, technically, it needs not be cleared, I think. No? The problem we had was w/ shost-eh_cmd_q not being cleared. In your strategy routine: ... spin_lock_irqsave(shost-host_lock, flags); list_splice_init(shost-eh_cmd_q, error_q); spin_unlock_irqrestore(shost-host_lock, flags); ... loop { ... list_del_init(cmd-eh_entry); ... } A good policy to follow is: 1. Never leave prev/next pointing somewhere where - you don't belong, or - where you don't know existance is in place. 2. Someone (memory release?) may do: if (!list_empty(cmd-eh_entry)) Refuse to free the memory. Which is often the case to check if the object belongs to a list. (You shouldn't have to do this but case pointed only for illustrational purposes.) Luben The reason why I explicitly stated that clearing scmd-eh_entry was not currently necessary was that libata had infinite loop bug due to eh_cmd_q related memory corruption and I wanted to make sure that not clearing scmd-eh_entry wasn't the cause. Previously, libata didn't clear both shost-eh_cmd_q and scmd-eh_entry. I posted an one liner which cleared shost-eh_cmd_q and I believe that's the fix for the problem. However, Mark Lord is still having lockup problems with libata which, I suspect, is because libata doesn't handle PM's properly. But, as I'm not very sure, I wanted to make sure that libata's not clearing scmd-eh_cmd_q is not causing the lockup. Ah, ok, thanks for clarifying. I agree that, as a policy, always clearing list_head's are nice if it's not in the *real* hot path where reducing several assignments matter, but as it's not strictly/technically necessary, it might be difficult to enforce as long as functions which don't clear list_head are there. Unless you're running on anything but a 6502, one or 10 assignments would make absolutely no difference. Been there, done that (including the 6502 ;-) ). Plus the fact that today's compilers are so much more advanced than 20 years ago, you'll see no speedup difference from the number of assignments. Also compare the processor time to assign a value to the processor time it does anything else, while the disk is doing IO. What matters is complexity, big-oh notation. Now, as a matter of practice and experience, programmers develop certain _patterns_ of programming certain constructs, like for example linked list manipulation. Those practices, (patterns), have proven bugless over many years of programming. This is what Jeff is talking about. While it is true that list_del(cmd-eh_entry); release command (cmd) is equivalent to list_del_init(cmd-eh_entry); release command(cmd), you will find that after time, when this code is augmented and some block written between the list_del() and release command, that block may take the wrong assumption causing you to have mysterious bugs. Luben - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] SCSI EH document
On 08/29/05 05:14, Tejun Heo wrote: Both all the list-heads need to be cleared, otherwise there may be list corruption next time the element is added to the list_head. scmd-eh_entry is never used as list head. It's always used as list entry. So, technically, it needs not be cleared, I think. No? The problem we had was w/ shost-eh_cmd_q not being cleared. In your strategy routine: ... spin_lock_irqsave(shost-host_lock, flags); list_splice_init(shost-eh_cmd_q, error_q); spin_unlock_irqrestore(shost-host_lock, flags); ... loop { ... list_del_init(cmd-eh_entry); ... } A good policy to follow is: 1. Never leave prev/next pointing somewhere where - you don't belong, or - where you don't know existance is in place. 2. Someone (memory release?) may do: if (!list_empty(cmd-eh_entry)) Refuse to free the memory. Which is often the case to check if the object belongs to a list. (You shouldn't have to do this but case pointed only for illustrational purposes.) Luben - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] SCSI EH document
Tejun Heo wrote: Hi, Jeff. Jeff Garzik wrote: Tejun Heo wrote: Both all the list-heads need to be cleared, otherwise there may be list corruption next time the element is added to the list_head. scmd-eh_entry is never used as list head. It's always used as list entry. So, technically, it needs not be cleared, I think. No? The problem we had was w/ shost-eh_cmd_q not being cleared. Every node is a list_head. You want all pointers for all nodes pointing to something useful, even if they are not actively present on a list, so that they may be easily and corrected added to a list at a later time. Read list_del_init() for example. Jeff Okay... to make things clearer. A struct list_head can have two roles. list head : other list_head gets added to it list entry : gets added to other list_head When a struct list_head acts as list head, it always needs to be in clean (pointing to valid things) state before being used. When a struct list_head acts as list entry, its current content doesn't matter. AFAICS, scmd-eh_entry is currently not used as list head in any place, neither do we use list_empty() test on it to determine whether or not it's on a list. The original code does clear scmd-eh_entry all the time and I am very okay with that. I just wanted to make sure that I didn't miss something regarding scmd-eh_entry's usage. If I'm missing such a usage, can you please point out? I'm mainly talking at a higher level. While it strictly doesn't matter in this instance, in general its a bad idea to clear a list using a list = (null) operation. Its a good way to leak references, leak memory, etc. In this specific case it seems to be OK. Jeff - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] SCSI EH document
On 08/25/05 23:53, Tejun Heo wrote: Hello, fellow SCSI/ATA developers. This is the first draft of SCSI EH document. This document tries to describe how SCSI EH works and what choirs should be done to maintain SCSI midlayer integrity. It's intended that this document can be used as reference for implementing either fine-grained EH callbacks or single eh_strategy_handler() callback. Very good stuff, Tejun! I'll have to print it and read it. At first glance, good job! Thanks, Luben I'm pretty sure that I've screwed up in (hopefully) several places, so please correct me. Also, I have several places where I'm not sure or have questions, those are marked with *VERIFY* and *QUESTION* respectively. If you know the answer, please let me know. Thanks. SCSI EH == TABLE OF CONTENTS [1] How SCSI commands travel through the midlayer and to EH [1-1] struct scsi_cmnd [1-2] How do scmd's get completed? [1-2-1] Completing a scmd w/ scsi_done [1-2-2] Completing a scmd w/ timeout [1-3] How EH takes over [2] How SCSI EH works [2-1] EH through fine-grained callbacks [2-1-1] Overview [2-1-2] Flow of scmds through EH [2-1-3] Flow of control [2-2] EH through hostt-eh_strategy_handler() [2-2-1] Pre hostt-eh_strategy_handler() SCSI midlayer conditions [2-2-2] Post hostt-eh_strategy_handler() SCSI midlayer conditions [2-2-3] Things to consider [1] How SCSI commands travel through the midlayer and to EH [1-1] struct scsi_cmnd Each SCSI command is represented with struct scsi_cmnd (== scmd). A scmd has two list_head's to link itself into lists. The two are scmd-list and scmd-eh_entry. The former is used for free list or per-device allocated scmd list and not of much interest to this EH discussion. The latter is used for completion and EH lists. [1-2] How do scmd's get completed? Once LLDD gets hold of a scmd, either the LLDD will complete the command by calling scsi_done callback passed from midlayer when invoking hostt-queuecommand() or SCSI midlayer will time it out. [1-2-1] Completing a scmd w/ scsi_done For all non-EH commands, scsi_done() is the completion callback. It does the following. 1. Delete timeout timer. If it fails, it means that timeout timer has expired and is going to finish the command. Just return. 2. Link scmd to per-cpu scsi_done_q using scmd-en_entry 3. Raise SCSI_SOFTIRQ SCSI_SOFTIRQ handler scsi_softirq calls scsi_decide_disposition() to determine what to do with the command. scsi_decide_disposition() looks at the scmd-result value and sense data to determine what to do with the command. - SUCCESS scsi_finish_command() is invoked for the command. The function does some maintenance choirs and notify completion by calling scmd-done() callback, which, for fs requests, would be HLD completion callback - sd:sd_rw_intr, sr:rw_intr, st:st_intr. - NEEDS_RETRY - ADD_TO_MLQUEUE scmd is requeued to blk queue. - otherwise scsi_eh_scmd_add(scmd, 0) is invoked for the command. See [1-3] for details of this funciton. [1-2-2] Completing a scmd w/ timeout The timeout handler is scsi_times_out(). When a timeout occurs, this function 1. invokes optional hostt-eh_timedout() callback. Return value can be one of - EH_HANDLED This indicates that eh_timedout() dealt with the timeout. The scmd is passed to __scsi_done() and thus linked into per-cpu scsi_done_q. Normal command completion described in [1-2-1] follows. - EH_RESET_TIMER This indicates that more time is required to finish the command. Timer is restarted. This action is counted as a retry and only allowed scmd-allowed + 1(!) times. Once the limit is reached, EH_NOT_HANDLED action is taken. *NOTE* This action is racy as the LLDD could finish the scmd after the timeout has expired but before it's added back. In such cases, scsi_done() would think that timeout has occurred and return without doing anything. We lose completion and the command will time out again. - EH_NOT_HANDLED This is the same as when eh_timedout() callback doesn't exist. Step #2 is taken. 2. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the command. See [1-3] for more information. [1-3] How EH takes over scmds enter EH via scsi_eh_scmd_add(), which does the following. 1. Turns on scmd-eh_eflags as requested. It's 0 for error completions and SCSI_EH_CANCEL_CMD for timeouts. 2. Links scmd-eh_entry to shost-eh_cmd_q 3. Sets SHOST_RECOVERY bit in shost-shost_state 4. Increments shost-host_failed 5. Wakes up SCSI EH thread if shost-host_busy == shost-host_failed As can be seen above, once any scmd