Re: [RFC] Re: broken userland ABI in configfs binary attributes
On Mon, Aug 26, 2019 at 05:29:49PM +0100, Al Viro wrote: > On Mon, Aug 26, 2019 at 03:48:38AM +0100, Al Viro wrote: > > > We might be able to paper over that mess by doing what /dev/st does - > > checking that file_count(file) == 1 in ->flush() instance and doing commit > > there in such case. It's not entirely reliable, though, and it's definitely > > not something I'd like to see spreading. > > This "not entirely reliable" turns out to be an understatement. > If you have /proc/*/fdinfo/* being read from at the time of final close(2), > you'll get file_count(file) > 1 the last time ->flush() is called. In other > words, we'd get the data not committed at all. How about always doing the write in ->flush instead of ->release? Yes, that means that calling close(dup(fd)) is going to flush the write, but you shouldn't be doing that. I think there'll also be extra flushes done if you fork() during one of these writes ... but, again, don't do that. It's not like these are common things. Why does the prototype of file_operations::release suggest that it can return an int? __fput doesn't pay any attention to the return value. Changing that to return void might help some future programmers avoid this mistake.
Re: [PATCH] scsi: advansys: use struct_size() in kzalloc()
On Fri, Jan 11, 2019 at 08:41:43AM -0800, James Bottomley wrote: > On Fri, 2019-01-11 at 16:46 +0100, Hannes Reinecke wrote: > > > - asc_sg_head = kzalloc(sizeof(asc_scsi_q->sg_head) > > > + > > > - use_sg * sizeof(struct asc_sg_list), > > > GFP_ATOMIC); > > > + asc_sg_head = kzalloc(struct_size(asc_sg_head, > > > sg_list, use_sg), > > > + GFP_ATOMIC); > > If you want ... > > Are we sure there's a benefit to this? It's obvious that the current > code is correct but no-one's likely to test the new code for quite some > time, so changing the code introduces risk. What's the benefit of > making the change in legacy drivers? Just because we have a new, shiny > macro doesn't mean we have to force its use everywhere. > > I would recommend we have a rational needs test: so run the coccinelle > script over all the drivers to find out where this construct is used, > but only update those that are actually buggy with the new macro. It's hard to tell whether they're buggy. The problem being defended against here is integer overflow. So can 'use_sg' ever get large enough that sizeof(asc_scsi_q->sg_head) + use_sg * sizeof(struct asc_sg_list) is larger than 4 billion? Probably not; I imagine there's some rational sane limit elsewhere that says "No more than 256 SG elements" or something. But I don't know without checking. Is there some device-specific ioctl where the user can specify 2^31 scatterlist entries and somebody forgot to check? This macro is a defense-in-depth strategy, so using it as widely as possible makes more sense than arguing about whether there are already adequate safeguards in place.
Re: [PATCH v4 9/9] dmapool: debug: prevent endless loop in case of corruption
On Tue, Dec 04, 2018 at 12:28:54PM -0800, Andrew Morton wrote: > On Tue, 4 Dec 2018 12:18:01 -0800 Matthew Wilcox wrote: > > I only had a review comment on 8/9, which I then withdrew during my review > > of patch 9/9. Unless I missed something during my re-review of my > > responses? > > And in 0/9, that 1.3MB allocation. > > Maybe it's using kvmalloc, I didn't look. Oh! That's the mptsas driver doing something utterly awful. Not the fault of this patchset, in any way.
Re: [PATCH v4 9/9] dmapool: debug: prevent endless loop in case of corruption
On Tue, Dec 04, 2018 at 12:14:43PM -0800, Andrew Morton wrote: > On Tue, 4 Dec 2018 11:22:34 -0500 Tony Battersby > wrote: > > > On 11/13/18 1:36 AM, Matthew Wilcox wrote: > > > On Mon, Nov 12, 2018 at 10:46:35AM -0500, Tony Battersby wrote: > > >> Prevent a possible endless loop with DMAPOOL_DEBUG enabled if a buggy > > >> driver corrupts DMA pool memory. > > >> > > >> Signed-off-by: Tony Battersby > > > I like it! Also, here you're using blks_per_alloc in a way which isn't > > > normally in the performance path, but might be with the right config > > > options. With that, I withdraw my objection to the previous patch and > > > > > > Acked-by: Matthew Wilcox > > > > > > Andrew, can you funnel these in through your tree? If you'd rather not, > > > I don't mind stuffing them into a git tree and asking Linus to pull > > > for 4.21. > > > > > No reply for 3 weeks, so adding Andrew Morton to recipient list. > > > > Andrew, I have 9 dmapool patches ready for merging in 4.21. See Matthew > > Wilcox's request above. > > > > I'll take a look, but I see that this v4 series has several review > comments from Matthew which remain unresponded to. Please attend to > that. I only had a review comment on 8/9, which I then withdrew during my review of patch 9/9. Unless I missed something during my re-review of my responses? > Also, Andy had issues with the v2 series so it would be good to hear an > update from him? Certainly.
Re: [PATCH v4 0/9] mpt3sas and dmapool scalability
On Mon, Nov 12, 2018 at 10:40:57AM -0500, Tony Battersby wrote: > I posted v3 on August 7. Nobody acked or merged the patches, and then > I got too busy with other stuff to repost until now. Thanks for resending. They were in my pile of things to look at, but that's an ever-growing pile. > I believe these patches are ready for merging. I agree. > cat /sys/devices/pci:80/:80:07.0/:85:00.0/pools > (manually cleaned up column alignment) > poolinfo - 0.1 > reply_post_free_array pool 1 21 192 1 > reply_free pool 1 1 41728 1 > reply pool 1 1 1335296 1 > sense pool 1 1 970272 1 > chain pool 373959 386048 128 12064 > reply_post_free pool12 12 166528 12 That reply pool ... 1 object of 1.3MB? That's a lot of strain to put on the page allocator. I wonder if anything can be done about that. (I'm equally non-thrilled about the sense pool, the reply_post_free pool and the reply_free pool, but they seem a little less stressful than the reply pool)
Re: [PATCH v4 9/9] dmapool: debug: prevent endless loop in case of corruption
On Mon, Nov 12, 2018 at 10:46:35AM -0500, Tony Battersby wrote: > Prevent a possible endless loop with DMAPOOL_DEBUG enabled if a buggy > driver corrupts DMA pool memory. > > Signed-off-by: Tony Battersby I like it! Also, here you're using blks_per_alloc in a way which isn't normally in the performance path, but might be with the right config options. With that, I withdraw my objection to the previous patch and Acked-by: Matthew Wilcox Andrew, can you funnel these in through your tree? If you'd rather not, I don't mind stuffing them into a git tree and asking Linus to pull for 4.21.
Re: [PATCH v4 8/9] dmapool: improve accuracy of debug statistics
On Mon, Nov 12, 2018 at 10:45:58AM -0500, Tony Battersby wrote: > +++ linux/mm/dmapool.c2018-08-06 17:52:53.0 -0400 > @@ -61,6 +61,7 @@ struct dma_pool { /* the pool */ > struct device *dev; > unsigned int allocation; > unsigned int boundary; > + unsigned int blks_per_alloc; > char name[32]; > struct list_head pools; > }; This one I'm not totally happy with. You're storing this value when it could be easily calculated each time through the show_pools() code. I appreciate this is a topic where reasonable people might have different opinions about which solution is preferable. > @@ -182,6 +182,9 @@ struct dma_pool *dma_pool_create(const c > retval->size = size; > retval->boundary = boundary; > retval->allocation = allocation; > + retval->blks_per_alloc = > + (allocation / boundary) * (boundary / size) + > + (allocation % boundary) / size; > > INIT_LIST_HEAD(&retval->pools); > >
Re: [PATCH v4 7/9] dmapool: cleanup integer types
On Mon, Nov 12, 2018 at 10:45:21AM -0500, Tony Battersby wrote: > To represent the size of a single allocation, dmapool currently uses > 'unsigned int' in some places and 'size_t' in other places. Standardize > on 'unsigned int' to reduce overhead, but use 'size_t' when counting all > the blocks in the entire pool. > > Signed-off-by: Tony Battersby Acked-by: Matthew Wilcox
Re: [PATCH v4 6/9] dmapool: improve scalability of dma_pool_free
On Mon, Nov 12, 2018 at 10:44:40AM -0500, Tony Battersby wrote: > dma_pool_free() scales poorly when the pool contains many pages because > pool_find_page() does a linear scan of all allocated pages. Improve its > scalability by replacing the linear scan with virt_to_page() and storing > dmapool private data directly in 'struct page', thereby eliminating > 'struct dma_page'. In big O notation, this improves the algorithm from > O(n^2) to O(n) while also reducing memory usage. > > Thanks to Matthew Wilcox for the suggestion to use struct page. > > Signed-off-by: Tony Battersby Acked-by: Matthew Wilcox
Re: [PATCH v4 5/9] dmapool: rename fields in dma_page
On Mon, Nov 12, 2018 at 10:44:02AM -0500, Tony Battersby wrote: > Rename fields in 'struct dma_page' in preparation for moving them into > 'struct page'. No functional changes. > > in_use -> dma_in_use > offset -> dma_free_off > > Signed-off-by: Tony Battersby Acked-by: Matthew Wilcox
Re: [PATCH v4 4/9] dmapool: improve scalability of dma_pool_alloc
On Mon, Nov 12, 2018 at 10:43:25AM -0500, Tony Battersby wrote: > dma_pool_alloc() scales poorly when allocating a large number of pages > because it does a linear scan of all previously-allocated pages before > allocating a new one. Improve its scalability by maintaining a separate > list of pages that have free blocks ready to (re)allocate. In big O > notation, this improves the algorithm from O(n^2) to O(n). > > Signed-off-by: Tony Battersby Acked-by: Matthew Wilcox
Re: [PATCH v4 3/9] dmapool: cleanup dma_pool_destroy
On Mon, Nov 12, 2018 at 10:42:48AM -0500, Tony Battersby wrote: > Remove a small amount of code duplication between dma_pool_destroy() and > pool_free_page() in preparation for adding more code without having to > duplicate it. No functional changes. > > Signed-off-by: Tony Battersby Acked-by: Matthew Wilcox
Re: [PATCH v4 2/9] dmapool: remove checks for dev == NULL
On Mon, Nov 12, 2018 at 10:42:12AM -0500, Tony Battersby wrote: > dmapool originally tried to support pools without a device because > dma_alloc_coherent() supports allocations without a device. But nobody > ended up using dma pools without a device, so the current checks in > dmapool.c for pool->dev == NULL are both insufficient and causing bloat. > Remove them. > > Signed-off-by: Tony Battersby Acked-by: Matthew Wilcox
Re: [PATCH v4 1/9] dmapool: fix boundary comparison
On Mon, Nov 12, 2018 at 10:41:34AM -0500, Tony Battersby wrote: > Fixes: e34f44b3517f ("pool: Improve memory usage for devices which can't > cross boundaries") > Signed-off-by: Tony Battersby Acked-by: Matthew Wilcox
Re: [PATCH -next] advansys: remove set but not used variable 'srb_tag' in adv_isr_callback
On Wed, Oct 17, 2018 at 02:49:50PM +0200, Johannes Thumshirn wrote: > On 17/10/18 14:34, YueHaibing wrote: > > - srb_tag = le32_to_cpu(scsiqp->srb_tag); > > scp = scsi_host_find_tag(boardp->shost, scsiqp->srb_tag); > > Shouldn't this be: > scp = scsi_host_find_tag(boardp->shost, > le32_to_cpu(scsiqp->srb_tag)); I don't think so. Look at how scsiqp->srb_tag is set: adv_build_req(struct asc_board *boardp, struct scsi_cmnd *scp, ADV_SCSI_REQ_Q **adv_scsiqpp) { + u32 srb_tag = scp->request->tag; ... + scsiqp->srb_tag = srb_tag; If we're not converting it to le32 on the way in, we probably don't want to convert it back on the way out. Please don't make me power up my PA-RISC machine with an Advansys card in it, just to find out ...
Re: [PATCH 1/2] iscsi target: Set conn->sess to NULL when iscsi_login_set_conn_values fails
On Mon, Aug 27, 2018 at 02:45:15PM -0500, Mike Christie wrote: > Signed-off-by: Vincent Pelletier > [rebased against idr/ida changes and to handle ret review comments from > Matthew] > Signed-off-by: Mike Christie > Cc: Matthew Wilcox Reviewed-by: Matthew Wilcox
Re: [PATCH 0/3]: iscsi target login fixes
On Fri, Aug 24, 2018 at 01:37:10PM -0500, Mike Christie wrote: > The following patchset is a round up of login fixes that have been > on the list and in Mathew's tree. They fix a couple of bugs in the > iscsi login failure handling path. > > The patches were made against Martin's 4.19/scsi-queue branch. > > Matthew, the first patch is one that I had sent to you that had a fix > for the idr issue you asked about on the list. There was a conflict so > I had to update the patch. I was not sure if you wanted me to keep > your signed off so to be safe I dropped it. > > Also, it sounded like your ida/idr patches were not going to make 4.19, > so hopefully the idr iscsi login fix will be merged already when your > patches get merged for 4.20, so you do not have to carry it anymore. Christmas came early, and Linus decided to pull the updated ida patches. commit 26abc916a898d34c5ad159315a2f683def3c Author: Mike Christie Date: Thu Jul 26 12:13:49 2018 -0500 iscsi target: fix session creation failure handling so patch 1 does not need to be added to v4.19/scsi-queue
Re: [PATCH 2/3] iscsi target: Set conn->sess to NULL when iscsi_login_set_conn_values fails
On Fri, Aug 24, 2018 at 01:37:12PM -0500, Mike Christie wrote: > Also, let idr_alloc return value through instead of replacing it with -ENOMEM, > as it is already a negative value and caller checks sign, not exact value. I bet it's less code to just return -ENOMEM in all situations instead of making the compiler remember the return value from idr_alloc(). The patch becomes smaller, at least ;-)
Re: [PATCH 0/3]: iscsi target login fixes
On Fri, Aug 24, 2018 at 01:37:10PM -0500, Mike Christie wrote: > The following patchset is a round up of login fixes that have been > on the list and in Mathew's tree. They fix a couple of bugs in the > iscsi login failure handling path. > > The patches were made against Martin's 4.19/scsi-queue branch. > > Matthew, the first patch is one that I had sent to you that had a fix > for the idr issue you asked about on the list. There was a conflict so > I had to update the patch. I was not sure if you wanted me to keep > your signed off so to be safe I dropped it. It should at least have my Reported-by: on it ;-) But I did review the patch, and it LGTM, so Martin, please include my Signed-off-by: on patch 1/3. > Also, it sounded like your ida/idr patches were not going to make 4.19, > so hopefully the idr iscsi login fix will be merged already when your > patches get merged for 4.20, so you do not have to carry it anymore. I'll leave it in my tree for now and drop it again once these fixes are merged, otherwise I'll have to resolve a marge conflict twice.
target/iscsi: Allocate session IDs from an IDA
Ping? It'd be nice to get some answers to these questions rather than merging this conversion with the questions still in the changelog ... - Forwarded message from Matthew Wilcox - Date: Thu, 21 Jun 2018 14:28:27 -0700 From: Matthew Wilcox To: linux-ker...@vger.kernel.org Cc: Matthew Wilcox , "Nicholas A. Bellinger" , Bart Van Assche , Hannes Reinecke , Kees Cook , Varun Prakash , Sagi Grimberg , Philippe Ombredanne , Greg Kroah-Hartman , Kate Stewart , Thomas Gleixner , "David S. Miller" , Denys Vlasenko , linux-scsi@vger.kernel.org, target-de...@vger.kernel.org Subject: [PATCH 18/26] target/iscsi: Allocate session IDs from an IDA X-Mailer: git-send-email 2.14.3 Since the session is never looked up by ID, we can use the more space-efficient IDA instead of the IDR. I think I found a bug where we never reuse session ID 0, but there may be a good reason for it, so I've merely documented it in this patch. Not reusing session ID 0 doesn't even leak memory. Signed-off-by: Matthew Wilcox --- drivers/target/iscsi/iscsi_target.c | 10 ++ drivers/target/iscsi/iscsi_target.h | 4 +--- drivers/target/iscsi/iscsi_target_login.c | 20 ++-- 3 files changed, 9 insertions(+), 25 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 8e223799347a..94bad43c41ff 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -57,9 +57,8 @@ static DEFINE_SPINLOCK(tiqn_lock); static DEFINE_MUTEX(np_lock); static struct idr tiqn_idr; -struct idr sess_idr; +DEFINE_IDA(sess_ida); struct mutex auth_id_lock; -spinlock_t sess_idr_lock; struct iscsit_global *iscsit_global; @@ -700,9 +699,7 @@ static int __init iscsi_target_init_module(void) spin_lock_init(&iscsit_global->ts_bitmap_lock); mutex_init(&auth_id_lock); - spin_lock_init(&sess_idr_lock); idr_init(&tiqn_idr); - idr_init(&sess_idr); ret = target_register_template(&iscsi_ops); if (ret) @@ -4375,10 +4372,7 @@ int iscsit_close_session(struct iscsi_session *sess) pr_debug("Decremented number of active iSCSI Sessions on" " iSCSI TPG: %hu to %u\n", tpg->tpgt, tpg->nsessions); - spin_lock(&sess_idr_lock); - idr_remove(&sess_idr, sess->session_index); - spin_unlock(&sess_idr_lock); - + ida_free(&sess_ida, sess->session_index); kfree(sess->sess_ops); sess->sess_ops = NULL; spin_unlock_bh(&se_tpg->session_lock); diff --git a/drivers/target/iscsi/iscsi_target.h b/drivers/target/iscsi/iscsi_target.h index 42de1843aa40..48bac0acf8c7 100644 --- a/drivers/target/iscsi/iscsi_target.h +++ b/drivers/target/iscsi/iscsi_target.h @@ -55,9 +55,7 @@ extern struct kmem_cache *lio_ooo_cache; extern struct kmem_cache *lio_qr_cache; extern struct kmem_cache *lio_r2t_cache; -extern struct idr sess_idr; +extern struct ida sess_ida; extern struct mutex auth_id_lock; -extern spinlock_t sess_idr_lock; - #endif /*** ISCSI_TARGET_H ***/ diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c index 99501785cdc1..561d2ad38989 100644 --- a/drivers/target/iscsi/iscsi_target_login.c +++ b/drivers/target/iscsi/iscsi_target_login.c @@ -336,22 +336,16 @@ static int iscsi_login_zero_tsih_s1( timer_setup(&sess->time2retain_timer, iscsit_handle_time2retain_timeout, 0); - idr_preload(GFP_KERNEL); - spin_lock_bh(&sess_idr_lock); - ret = idr_alloc(&sess_idr, NULL, 0, 0, GFP_NOWAIT); - if (ret >= 0) - sess->session_index = ret; - spin_unlock_bh(&sess_idr_lock); - idr_preload_end(); - + ret = ida_alloc(&sess_ida, GFP_KERNEL); if (ret < 0) { - pr_err("idr_alloc() for sess_idr failed\n"); + pr_err("Session ID allocation failed %d\n", ret); iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_TARGET_ERR, ISCSI_LOGIN_STATUS_NO_RESOURCES); kfree(sess); return -ENOMEM; } + sess->session_index = ret; sess->creation_time = get_jiffies_64(); /* * The FFP CmdSN window values will be allocated from the TPG's @@ -1163,11 +1157,9 @@ void iscsi_target_login_sess_out(struct iscsi_conn *conn, goto old_sess_out; if (conn->sess->se_sess) transport_free_session(conn->sess->se_sess); - if (conn->sess->session_index != 0) { - spin_lock_bh(&sess_idr_lock); - idr_remove(&sess_idr, conn->sess->session_index); -
[PATCH 1/2] Convert target drivers to use sbitmap
From: Matthew Wilcox The sbitmap and the percpu_ida perform essentially the same task, allocating tags for commands. Since the sbitmap is more used than the percpu_ida, convert the percpu_ida users to the sbitmap API. Signed-off-by: Matthew Wilcox --- drivers/scsi/qla2xxx/qla_target.c| 16 ++- drivers/target/iscsi/iscsi_target_util.c | 34 +--- drivers/target/sbp/sbp_target.c | 8 +++--- drivers/target/target_core_transport.c | 5 ++-- drivers/target/tcm_fc/tfc_cmd.c | 11 drivers/usb/gadget/function/f_tcm.c | 8 +++--- drivers/vhost/scsi.c | 9 --- drivers/xen/xen-scsiback.c | 8 +++--- include/target/iscsi/iscsi_target_core.h | 1 + include/target/target_core_base.h| 5 ++-- 10 files changed, 73 insertions(+), 32 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c index 025dc2d3f3de..cdf671c2af61 100644 --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -3719,7 +3719,8 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd) return; } cmd->jiffies_at_free = get_jiffies_64(); - percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag); + sbitmap_queue_clear(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag, + cmd->se_cmd.map_cpu); } EXPORT_SYMBOL(qlt_free_cmd); @@ -4084,7 +4085,8 @@ static void __qlt_do_work(struct qla_tgt_cmd *cmd) qlt_send_term_exchange(qpair, NULL, &cmd->atio, 1, 0); qlt_decr_num_pend_cmds(vha); - percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag); + sbitmap_queue_clear(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag, + cmd->se_cmd.map_cpu); spin_unlock_irqrestore(qpair->qp_lock_ptr, flags); spin_lock_irqsave(&ha->tgt.sess_lock, flags); @@ -4215,9 +4217,9 @@ static struct qla_tgt_cmd *qlt_get_tag(scsi_qla_host_t *vha, { struct se_session *se_sess = sess->se_sess; struct qla_tgt_cmd *cmd; - int tag; + int tag, cpu; - tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING); + tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu); if (tag < 0) return NULL; @@ -4230,6 +4232,7 @@ static struct qla_tgt_cmd *qlt_get_tag(scsi_qla_host_t *vha, qlt_incr_num_pend_cmds(vha); cmd->vha = vha; cmd->se_cmd.map_tag = tag; + cmd->se_cmd.map_cpu = cpu; cmd->sess = sess; cmd->loop_id = sess->loop_id; cmd->conf_compl_supported = sess->conf_compl_supported; @@ -5212,7 +5215,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha, struct fc_port *sess; struct se_session *se_sess; struct qla_tgt_cmd *cmd; - int tag; + int tag, cpu; unsigned long flags; if (unlikely(tgt->tgt_stop)) { @@ -5244,7 +5247,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha, se_sess = sess->se_sess; - tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING); + tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu); if (tag < 0) return; @@ -5275,6 +5278,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha, cmd->reset_count = ha->base_qpair->chip_reset; cmd->q_full = 1; cmd->qpair = ha->base_qpair; + cmd->se_cmd.map_cpu = cpu; if (qfull) { cmd->q_full = 1; diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c index 4435bf374d2d..28bcffae609f 100644 --- a/drivers/target/iscsi/iscsi_target_util.c +++ b/drivers/target/iscsi/iscsi_target_util.c @@ -17,7 +17,7 @@ **/ #include -#include +#include #include /* ipv6_addr_equal() */ #include #include @@ -147,6 +147,28 @@ void iscsit_free_r2ts_from_list(struct iscsi_cmd *cmd) spin_unlock_bh(&cmd->r2t_lock); } +int iscsit_wait_for_tag(struct se_session *se_sess, int state, int *cpup) +{ + int tag = -1; + DEFINE_WAIT(wait); + struct sbq_wait_state *ws; + + if (state == TASK_RUNNING) + return tag; + + ws = &se_sess->sess_tag_pool.ws[0]; + for (;;) { + prepare_to_wait_exclusive(&ws->wait, &wait, state); + if (signal_pending_state(state, current)) + break; + schedule(); + tag = sbitmap_queue_get(&se_sess->sess_tag_pool, cpup); + } + + finish_wait(&ws->wait, &wait); + return tag; +} + /* * May be called from software interrupt (timer) context
[PATCH 0/2] Use sbitmap instead of percpu_ida
From: Matthew Wilcox This is a pretty rough-and-ready conversion of the target drivers from using percpu_ida to sbitmap. It compiles; I don't have a target setup, so it's completely untested. I haven't tried to do anything particularly clever here, so it's possible that, for example, the wait queue in iscsi_target_util could be more clever, like the block layer uses multiple wait queues to avoid pingpongs. Or maybe we could figure out a way to not store the CPU that the ID was allocated on, or perhaps the options I specified to sbitmap_queue_init() are suboptimal. Patch 2 isn't interesting; it just deletes the implementation. Patch 1 will be where all the action is. Matthew Wilcox (2): Convert target drivers to use sbitmap Remove percpu_ida drivers/scsi/qla2xxx/qla_target.c| 16 +- drivers/target/iscsi/iscsi_target_util.c | 34 +- drivers/target/sbp/sbp_target.c | 8 +- drivers/target/target_core_transport.c | 5 +- drivers/target/tcm_fc/tfc_cmd.c | 11 +- drivers/usb/gadget/function/f_tcm.c | 8 +- drivers/vhost/scsi.c | 9 +- drivers/xen/xen-scsiback.c | 8 +- include/linux/percpu_ida.h | 83 - include/target/iscsi/iscsi_target_core.h | 1 + include/target/target_core_base.h| 5 +- lib/Makefile | 2 +- lib/percpu_ida.c | 391 --- 13 files changed, 74 insertions(+), 507 deletions(-) delete mode 100644 include/linux/percpu_ida.h delete mode 100644 lib/percpu_ida.c -- 2.17.0
[PATCH 2/2] Remove percpu_ida
From: Matthew Wilcox With its one user gone, remove the library code. Signed-off-by: Matthew Wilcox --- include/linux/percpu_ida.h | 83 lib/Makefile | 2 +- lib/percpu_ida.c | 391 - 3 files changed, 1 insertion(+), 475 deletions(-) delete mode 100644 include/linux/percpu_ida.h delete mode 100644 lib/percpu_ida.c diff --git a/include/linux/percpu_ida.h b/include/linux/percpu_ida.h deleted file mode 100644 index 07d78e4653bc.. --- a/include/linux/percpu_ida.h +++ /dev/null @@ -1,83 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __PERCPU_IDA_H__ -#define __PERCPU_IDA_H__ - -#include -#include -#include -#include -#include -#include -#include - -struct percpu_ida_cpu; - -struct percpu_ida { - /* -* number of tags available to be allocated, as passed to -* percpu_ida_init() -*/ - unsignednr_tags; - unsignedpercpu_max_size; - unsignedpercpu_batch_size; - - struct percpu_ida_cpu __percpu *tag_cpu; - - /* -* Bitmap of cpus that (may) have tags on their percpu freelists: -* steal_tags() uses this to decide when to steal tags, and which cpus -* to try stealing from. -* -* It's ok for a freelist to be empty when its bit is set - steal_tags() -* will just keep looking - but the bitmap _must_ be set whenever a -* percpu freelist does have tags. -*/ - cpumask_t cpus_have_tags; - - struct { - spinlock_t lock; - /* -* When we go to steal tags from another cpu (see steal_tags()), -* we want to pick a cpu at random. Cycling through them every -* time we steal is a bit easier and more or less equivalent: -*/ - unsignedcpu_last_stolen; - - /* For sleeping on allocation failure */ - wait_queue_head_t wait; - - /* -* Global freelist - it's a stack where nr_free points to the -* top -*/ - unsignednr_free; - unsigned*freelist; - } cacheline_aligned_in_smp; -}; - -/* - * Number of tags we move between the percpu freelist and the global freelist at - * a time - */ -#define IDA_DEFAULT_PCPU_BATCH_MOVE32U -/* Max size of percpu freelist, */ -#define IDA_DEFAULT_PCPU_SIZE ((IDA_DEFAULT_PCPU_BATCH_MOVE * 3) / 2) - -int percpu_ida_alloc(struct percpu_ida *pool, int state); -void percpu_ida_free(struct percpu_ida *pool, unsigned tag); - -void percpu_ida_destroy(struct percpu_ida *pool); -int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags, - unsigned long max_size, unsigned long batch_size); -static inline int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags) -{ - return __percpu_ida_init(pool, nr_tags, IDA_DEFAULT_PCPU_SIZE, - IDA_DEFAULT_PCPU_BATCH_MOVE); -} - -typedef int (*percpu_ida_cb)(unsigned, void *); -int percpu_ida_for_each_free(struct percpu_ida *pool, percpu_ida_cb fn, - void *data); - -unsigned percpu_ida_free_tags(struct percpu_ida *pool, int cpu); -#endif /* __PERCPU_IDA_H__ */ diff --git a/lib/Makefile b/lib/Makefile index ce20696d5a92..7626dece1d27 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -39,7 +39,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \ bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \ gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \ bsearch.o find_bit.o llist.o memweight.o kfifo.o \ -percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \ +percpu-refcount.o rhashtable.o reciprocal_div.o \ once.o refcount.o usercopy.o errseq.o bucket_locks.o obj-$(CONFIG_STRING_SELFTEST) += test_string.o obj-y += string_helpers.o diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c deleted file mode 100644 index 6016f1deb1f5.. --- a/lib/percpu_ida.c +++ /dev/null @@ -1,391 +0,0 @@ -/* - * Percpu IDA library - * - * Copyright (C) 2013 Datera, Inc. Kent Overstreet - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation; either version 2, or (at - * your option) any later version. - * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Sat, Apr 28, 2018 at 09:46:52PM +0200, Julia Lawall wrote: > FWIW, here is my semantic patch and the output - it reports on things that > appear to be too small and things that it doesn't know about. > > What are the relevant pci wrappers? I didn't find them. Basically all of the functions in include/linux/pci-dma-compat.h > too small: drivers/gpu/drm/i915/i915_drv.c:1138: 30 > too small: drivers/net/wireless/broadcom/b43/dma.c:1068: 30 > unknown: sound/pci/ctxfi/cthw20k2.c:2033: DMA_BIT_MASK(dma_bits) > unknown: sound/pci/ctxfi/cthw20k2.c:2034: DMA_BIT_MASK(dma_bits) This one's good: const unsigned int dma_bits = BITS_PER_LONG; > unknown: drivers/scsi/megaraid/megaraid_sas_base.c:6036: consistent_mask and this one: consistent_mask = (instance->adapter_type == VENTURA_SERIES) ? DMA_BIT_MASK(64) : DMA_BIT_MASK(32); > unknown: drivers/net/wireless/ath/wil6210/txrx.c:200: > DMA_BIT_MASK(wil->dma_addr_size) if (wil->dma_addr_size > 32) dma_set_mask_and_coherent(dev, DMA_BIT_MASK(wil->dma_addr_size)); > unknown: drivers/net/ethernet/netronome/nfp/nfp_main.c:452: > DMA_BIT_MASK(NFP_NET_MAX_DMA_BITS) drivers/net/ethernet/netronome/nfp/nfp_net.h:#define NFP_NET_MAX_DMA_BITS 40 > unknown: drivers/gpu/host1x/dev.c:199: host->info->dma_mask Looks safe ... drivers/gpu/host1x/bus.c: device->dev.coherent_dma_mask = host1x->dev->coherent_dma_mask; drivers/gpu/host1x/bus.c: device->dev.dma_mask = &device->dev.coherent_dma_mask; drivers/gpu/host1x/dev.c: .dma_mask = DMA_BIT_MASK(32), drivers/gpu/host1x/dev.c: .dma_mask = DMA_BIT_MASK(32), drivers/gpu/host1x/dev.c: .dma_mask = DMA_BIT_MASK(34), drivers/gpu/host1x/dev.c: .dma_mask = DMA_BIT_MASK(34), drivers/gpu/host1x/dev.c: .dma_mask = DMA_BIT_MASK(34), drivers/gpu/host1x/dev.c: dma_set_mask_and_coherent(host->dev, host->info->dma_mask); drivers/gpu/host1x/dev.h: u64 dma_mask; /* mask of addressable memory */ ... but that reminds us that maybe some drivers aren't using dma_set_mask() but rather touching dma_mask directly. ... 57 more to look at ...
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Fri, Apr 27, 2018 at 04:14:56PM +, Luis R. Rodriguez wrote: > > Not really. We have unchecked_isa_dma to support about 4 drivers, > > Ah very neat: > > * CONFIG_CHR_DEV_OSST - "SCSI OnStream SC-x0 tape support" That's an upper level driver, like cdrom, disk and regular tapes. > * CONFIG_SCSI_ADVANSYS - "AdvanSys SCSI support" If we ditch support for the ISA boards, this can go away. > * CONFIG_SCSI_AHA1542 - "Adaptec AHA1542 support" Probably true. > * CONFIG_SCSI_ESAS2R - "ATTO Technology's ExpressSAS RAID adapter driver" That's being set to 0. You missed BusLogic.c and gdth.c
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Fri, Apr 27, 2018 at 11:07:07AM -0500, Christopher Lameter wrote: > Well it looks like what we are using it for is to force allocation from > low physical memory if we fail to obtain proper memory through a normal > channel. The use of ZONE_DMA is only there for emergency purposes. > I think we could subsitute ZONE_DMA32 on x87 without a problem. > > Which means that ZONE_DMA has no purpose anymore. > > Can we make ZONE_DMA on x86 refer to the low 32 bit physical addresses > instead and remove ZONE_DMA32? > > That would actually improve the fallback because you have more memory for > the old devices. Some devices have incredibly bogus hardware like 28 bit addressing or 39 bit addressing. We don't have a good way to allocate memory by physical address other than than saying "GFP_DMA for anything less than 32, GFP_DMA32 (or GFP_KERNEL on 32-bit) for anything less than 64 bit". Even CMA doesn't have a "cma_alloc_phys()". Maybe that's the right place to put such an allocation API.
Re: [Lsf] [LSF/MM TOPIC] block-mq issues with FC
On Fri, Apr 08, 2016 at 01:29:26PM +0200, Hannes Reinecke wrote: > I'd like to propose a topic on block-mq issues with FC. > During my performance testing using block/scsi-mq with FC I've hit > several issues I'd like to discuss: If there's a general block-mq bitching session, I have some ideas :-) - Inability to use all queues supported by a device. Intel's P3700 supports 31 queues, but block-mq insists on assigning an even multiple of CPUs to each queue. So if you have 48 CPUs, it will use 24 queues. If you have 128 CPUs, it will only use 16 of the queues. - Interrupt steering needs to be controlled by block-mq instead of the driver. It's pointless to have each driver implement its own policies on interrupt steering, irqbalanced remains a source of end-user frustration, and block-mq can change the queue<->cpu mapping without the driver's knowledge. (thanks to Keith for his input on the first and suggestion of the second). -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: struct request cleanups
On Fri, Apr 17, 2015 at 10:37:15PM +0200, Christoph Hellwig wrote: > The real RFC is the last one which allocates the block_pc specific > data separately in the callers instead of bloating every struct > request with it. I always hated what we did, but with the upcoming > split of nvme into transports and command sets we'll need a NVME > equivalent of BLOCK_PC, and as NVMe was designed by crackmonkeys > dreaming of an ATA controller the "command block" for NVME is even > bigger than what we have to deal with in SCSI. Umm, the NVMe command was never supposed to be "transported". The crackmonkeys are the ones who are trying to ship it across the network instead of designing a proper network protocol. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] brd: Fix brd_direct_access with partitions
On Wed, Jul 30, 2014 at 05:18:47PM +0300, Boaz Harrosh wrote: > When brd_direct_access() is called on a partition-bdev > it would access the wrong sector. And caller would then > corrupt the device's data. > > This is a preliminary fix, Matthew Wilcox has a patch > in his DAX patchset which will define a global wrapper > to bdev->bd_disk->fops->direct_access that will do the > proper checks and translations before calling a driver > global member. (The way it is done at the rest of the > block stack) Uh, no, let's just focus on getting the DAX code in instead of putting in interim patches that will conflict. Patch 4/22 is uncontroversial, fixes this problem, has no dependencies, and is key to the rest of the DAX patchset. If this problem wants to be fixed, then put 4/22 in instead. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] brd.ko Never supported partitions should we remove the dead code ?
On Tue, Jul 29, 2014 at 07:37:49PM +0300, Boaz Harrosh wrote: > But before we are running to fix this bug. Could we please do better and just > remove the support for partitions > all together. > Since it *never* worked anyway, so probably no one needs it! Surly no > one used it I fixed this in patch 4/22 of the v8 series. The correct place to handle partitioning is in the block layer, not the individual drivers (nor the filesystems). -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/7] mpt3sas: Remove use of DEF_SCSI_QCMD
On Wed, Apr 02, 2014 at 03:41:15AM -0700, Christoph Hellwig wrote: > On Thu, Mar 27, 2014 at 04:40:31PM -0400, Matthew Wilcox wrote: > > Removing the host_lock from the I/O submission path gives a huge > > scalability improvement. > > This looks reasonable to me, but did you do a full audit that all state > is properly synchronized in the functions called from _scsih_qcmd_lck? > > If so that should be stated in the patch description. I did one for mpt2sas back in July 2011 when I first submitted this patch. http://marc.info/?l=linux-scsi&m=130980763617589&w=2 I must confess to having not done so for this iteration. The equivalent to this patch has been in RHEL since 6.2, so I hope that gives us some amount of confidence. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/7] fusion: Add free msg frames to the head, not tail of list
Reusing a msg frame quickly means it's still cache-hot. This yields a small but noticable performance improvement in a well-known database benchmark. This improvement is already present in the mpt3sas driver. Signed-off-by: Matthew Wilcox --- drivers/message/fusion/mptbase.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c index 570b18a..ebc0af7 100644 --- a/drivers/message/fusion/mptbase.c +++ b/drivers/message/fusion/mptbase.c @@ -1037,7 +1037,7 @@ mpt_free_msg_frame(MPT_ADAPTER *ioc, MPT_FRAME_HDR *mf) goto out; /* signature to know if this mf is freed */ mf->u.frame.linkage.arg1 = cpu_to_le32(0xdeadbeaf); - list_add_tail(&mf->u.frame.linkage.list, &ioc->FreeQ); + list_add(&mf->u.frame.linkage.list, &ioc->FreeQ); #ifdef MFCNT ioc->mfcnt--; #endif -- 1.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/7] Performance improvements for LSI SCSI cards
The host lock is a serious scalability problem on 2-socket and larger systems which are doing a lot of I/O. Before removing the temporary usgae of DEF_SCSI_QCMD, we need to remove all uses of serial_number. An unrelated performance issue is that reusing the most recent driver-specific data structure to track the I/O instead of the least recently used keeps the cache-hot lines in use, which is a nice performance improvement. It's already present in the mpt3sas driver, it just didn't make it into the fusion or mpt2sas drivers yet. Matthew Wilcox (7): mpt3sas: Remove uses of serial_number mpt3sas: Remove use of DEF_SCSI_QCMD mpt2sas: Remove uses of serial_number mpt2sas: Remove use of DEF_SCSI_QCMD mpt2sas: Add free smids to the head, not tail of list fusion: Add free msg frames to the head, not tail of list fusion: Remove use of DEF_SCSI_QCMD drivers/message/fusion/mptbase.c | 2 +- drivers/message/fusion/mptfc.c | 12 +--- drivers/message/fusion/mptsas.c | 10 -- drivers/message/fusion/mptscsih.c| 8 +++- drivers/message/fusion/mptscsih.h| 2 +- drivers/message/fusion/mptspi.c | 12 +--- drivers/scsi/mpt2sas/mpt2sas_base.c | 8 drivers/scsi/mpt2sas/mpt2sas_base.h | 2 +- drivers/scsi/mpt2sas/mpt2sas_ctl.c | 2 +- drivers/scsi/mpt2sas/mpt2sas_scsih.c | 24 +--- drivers/scsi/mpt3sas/mpt3sas_base.h | 2 +- drivers/scsi/mpt3sas/mpt3sas_ctl.c | 2 +- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 24 +--- 13 files changed, 45 insertions(+), 65 deletions(-) -- 1.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/7] mpt2sas: Remove uses of serial_number
The mpt2sas_scsih_issue_tm() function does not use the 'serial_number' argument passed to it. Removing it removes the last vestiges of the scsi_cmnd's serial_number field from this driver. Signed-off-by: Matthew Wilcox --- drivers/scsi/mpt2sas/mpt2sas_base.h | 2 +- drivers/scsi/mpt2sas/mpt2sas_ctl.c | 2 +- drivers/scsi/mpt2sas/mpt2sas_scsih.c | 15 ++- 3 files changed, 8 insertions(+), 11 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h index 1f2ac3a..fd3b998 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.h +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h @@ -1065,7 +1065,7 @@ void mpt2sas_scsih_event_callback(struct MPT2SAS_ADAPTER *ioc, u8 msix_index, u32 reply); int mpt2sas_scsih_issue_tm(struct MPT2SAS_ADAPTER *ioc, u16 handle, uint channel, uint id, uint lun, u8 type, u16 smid_task, - ulong timeout, unsigned long serial_number, enum mutex_type m_type); + ulong timeout, enum mutex_type m_type); void mpt2sas_scsih_set_tm_flag(struct MPT2SAS_ADAPTER *ioc, u16 handle); void mpt2sas_scsih_clear_tm_flag(struct MPT2SAS_ADAPTER *ioc, u16 handle); void mpt2sas_expander_remove(struct MPT2SAS_ADAPTER *ioc, u64 sas_address); diff --git a/drivers/scsi/mpt2sas/mpt2sas_ctl.c b/drivers/scsi/mpt2sas/mpt2sas_ctl.c index b7f887c..62df8f9 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_ctl.c +++ b/drivers/scsi/mpt2sas/mpt2sas_ctl.c @@ -987,7 +987,7 @@ _ctl_do_mpt_command(struct MPT2SAS_ADAPTER *ioc, struct mpt2_ioctl_command karg, mpt2sas_scsih_issue_tm(ioc, le16_to_cpu(mpi_request->FunctionDependent1), 0, 0, 0, MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET, 0, 10, - 0, TM_MUTEX_ON); + TM_MUTEX_ON); ioc->tm_cmds.status = MPT2_CMD_NOT_USED; } else mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP, diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c index 7f0af4f..33586a3 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c @@ -2368,7 +2368,6 @@ mpt2sas_scsih_clear_tm_flag(struct MPT2SAS_ADAPTER *ioc, u16 handle) * @type: MPI2_SCSITASKMGMT_TASKTYPE__XXX (defined in mpi2_init.h) * @smid_task: smid assigned to the task * @timeout: timeout in seconds - * @serial_number: the serial_number from scmd * @m_type: TM_MUTEX_ON or TM_MUTEX_OFF * Context: user * @@ -2381,7 +2380,7 @@ mpt2sas_scsih_clear_tm_flag(struct MPT2SAS_ADAPTER *ioc, u16 handle) int mpt2sas_scsih_issue_tm(struct MPT2SAS_ADAPTER *ioc, u16 handle, uint channel, uint id, uint lun, u8 type, u16 smid_task, ulong timeout, - unsigned long serial_number, enum mutex_type m_type) + enum mutex_type m_type) { Mpi2SCSITaskManagementRequest_t *mpi_request; Mpi2SCSITaskManagementReply_t *mpi_reply; @@ -2634,8 +2633,7 @@ _scsih_abort(struct scsi_cmnd *scmd) handle = sas_device_priv_data->sas_target->handle; r = mpt2sas_scsih_issue_tm(ioc, handle, scmd->device->channel, scmd->device->id, scmd->device->lun, - MPI2_SCSITASKMGMT_TASKTYPE_ABORT_TASK, smid, 30, - scmd->serial_number, TM_MUTEX_ON); + MPI2_SCSITASKMGMT_TASKTYPE_ABORT_TASK, smid, 30, TM_MUTEX_ON); out: sdev_printk(KERN_INFO, scmd->device, "task abort: %s scmd(%p)\n", @@ -2696,8 +2694,7 @@ _scsih_dev_reset(struct scsi_cmnd *scmd) r = mpt2sas_scsih_issue_tm(ioc, handle, scmd->device->channel, scmd->device->id, scmd->device->lun, - MPI2_SCSITASKMGMT_TASKTYPE_LOGICAL_UNIT_RESET, 0, 30, 0, - TM_MUTEX_ON); + MPI2_SCSITASKMGMT_TASKTYPE_LOGICAL_UNIT_RESET, 0, 30, TM_MUTEX_ON); out: sdev_printk(KERN_INFO, scmd->device, "device reset: %s scmd(%p)\n", @@ -2757,7 +2754,7 @@ _scsih_target_reset(struct scsi_cmnd *scmd) r = mpt2sas_scsih_issue_tm(ioc, handle, scmd->device->channel, scmd->device->id, 0, MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET, 0, - 30, 0, TM_MUTEX_ON); + 30, TM_MUTEX_ON); out: starget_printk(KERN_INFO, starget, "target reset: %s scmd(%p)\n", @@ -5880,7 +5877,7 @@ broadcast_aen_retry: spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags); r = mpt2sas_scsih_issue_tm(ioc, handle, 0, 0, lun, - MPI2_SCSITASKMGMT_TASKTYPE_QUERY_TASK, smid, 30, 0, + MPI2_SCSITASKMGMT_TASKTYPE_QUERY_TASK, smid, 30, TM_MUTEX_OFF); if (r == FAILED) { sdev_printk(KERN_WARNING, sdev, @@ -5922,7 +5919,7 @@ broadcast_aen_retry: r = mpt2sas_scsih_issue_tm(ioc, handle, sdev->channel, sdev-
[PATCH 2/7] mpt3sas: Remove use of DEF_SCSI_QCMD
Removing the host_lock from the I/O submission path gives a huge scalability improvement. Signed-off-by: Matthew Wilcox --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 952f6e0..18e713d 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -3515,7 +3515,7 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 ioc_status) /** - * _scsih_qcmd_lck - main scsi request entry point + * _scsih_qcmd - main scsi request entry point * @scmd: pointer to scsi command object * @done: function pointer to be invoked on completion * @@ -3526,9 +3526,9 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 ioc_status) * SCSI_MLQUEUE_HOST_BUSY if the entire host queue is full */ static int -_scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *)) +_scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd) { - struct MPT3SAS_ADAPTER *ioc = shost_priv(scmd->device->host); + struct MPT3SAS_ADAPTER *ioc = shost_priv(shost); struct MPT3SAS_DEVICE *sas_device_priv_data; struct MPT3SAS_TARGET *sas_target_priv_data; Mpi2SCSIIORequest_t *mpi_request; @@ -3541,7 +3541,6 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *)) scsi_print_command(scmd); #endif - scmd->scsi_done = done; sas_device_priv_data = scmd->device->hostdata; if (!sas_device_priv_data || !sas_device_priv_data->sas_target) { scmd->result = DID_NO_CONNECT << 16; @@ -3656,8 +3655,6 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *)) out: return SCSI_MLQUEUE_HOST_BUSY; } -static DEF_SCSI_QCMD(_scsih_qcmd) - /** * _scsih_normalize_sense - normalize descriptor and fixed format sense data -- 1.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/7] fusion: Remove use of DEF_SCSI_QCMD
Removing the host_lock from the I/O submission path gives a huge scalability improvement. Signed-off-by: Matthew Wilcox --- drivers/message/fusion/mptfc.c| 12 +--- drivers/message/fusion/mptsas.c | 10 -- drivers/message/fusion/mptscsih.c | 8 +++- drivers/message/fusion/mptscsih.h | 2 +- drivers/message/fusion/mptspi.c | 12 +--- 5 files changed, 18 insertions(+), 26 deletions(-) diff --git a/drivers/message/fusion/mptfc.c b/drivers/message/fusion/mptfc.c index fd75108..02a3eef 100644 --- a/drivers/message/fusion/mptfc.c +++ b/drivers/message/fusion/mptfc.c @@ -649,7 +649,7 @@ mptfc_slave_alloc(struct scsi_device *sdev) } static int -mptfc_qcmd_lck(struct scsi_cmnd *SCpnt, void (*done)(struct scsi_cmnd *)) +mptfc_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *SCpnt) { struct mptfc_rport_info *ri; struct fc_rport *rport = starget_to_rport(scsi_target(SCpnt->device)); @@ -658,14 +658,14 @@ mptfc_qcmd_lck(struct scsi_cmnd *SCpnt, void (*done)(struct scsi_cmnd *)) if (!vdevice || !vdevice->vtarget) { SCpnt->result = DID_NO_CONNECT << 16; - done(SCpnt); + SCpnt->scsi_done(SCpnt); return 0; } err = fc_remote_port_chkready(rport); if (unlikely(err)) { SCpnt->result = err; - done(SCpnt); + SCpnt->scsi_done(SCpnt); return 0; } @@ -673,15 +673,13 @@ mptfc_qcmd_lck(struct scsi_cmnd *SCpnt, void (*done)(struct scsi_cmnd *)) ri = *((struct mptfc_rport_info **)rport->dd_data); if (unlikely(!ri)) { SCpnt->result = DID_IMM_RETRY << 16; - done(SCpnt); + SCpnt->scsi_done(SCpnt); return 0; } - return mptscsih_qcmd(SCpnt,done); + return mptscsih_qcmd(SCpnt); } -static DEF_SCSI_QCMD(mptfc_qcmd) - /* * mptfc_display_port_link_speed - displaying link speed * @ioc: Pointer to MPT_ADAPTER structure diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index 00d339c..711fcb5 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -1896,7 +1896,7 @@ mptsas_slave_alloc(struct scsi_device *sdev) } static int -mptsas_qcmd_lck(struct scsi_cmnd *SCpnt, void (*done)(struct scsi_cmnd *)) +mptsas_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *SCpnt) { MPT_SCSI_HOST *hd; MPT_ADAPTER *ioc; @@ -1904,11 +1904,11 @@ mptsas_qcmd_lck(struct scsi_cmnd *SCpnt, void (*done)(struct scsi_cmnd *)) if (!vdevice || !vdevice->vtarget || vdevice->vtarget->deleted) { SCpnt->result = DID_NO_CONNECT << 16; - done(SCpnt); + SCpnt->scsi_done(SCpnt); return 0; } - hd = shost_priv(SCpnt->device->host); + hd = shost_priv(shost); ioc = hd->ioc; if (ioc->sas_discovery_quiesce_io) @@ -1917,11 +1917,9 @@ mptsas_qcmd_lck(struct scsi_cmnd *SCpnt, void (*done)(struct scsi_cmnd *)) if (ioc->debug_level & MPT_DEBUG_SCSI) scsi_print_command(SCpnt); - return mptscsih_qcmd(SCpnt,done); + return mptscsih_qcmd(SCpnt); } -static DEF_SCSI_QCMD(mptsas_qcmd) - /** * mptsas_mptsas_eh_timed_out - resets the scsi_cmnd timeout * if the device under question is currently in the diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c index 727819c..2a1c6f2 100644 --- a/drivers/message/fusion/mptscsih.c +++ b/drivers/message/fusion/mptscsih.c @@ -1304,7 +1304,6 @@ int mptscsih_show_info(struct seq_file *m, struct Scsi_Host *host) /** * mptscsih_qcmd - Primary Fusion MPT SCSI initiator IO start routine. * @SCpnt: Pointer to scsi_cmnd structure - * @done: Pointer SCSI mid-layer IO completion function * * (linux scsi_host_template.queuecommand routine) * This is the primary SCSI IO start routine. Create a MPI SCSIIORequest @@ -1313,7 +1312,7 @@ int mptscsih_show_info(struct seq_file *m, struct Scsi_Host *host) * Returns 0. (rtn value discarded by linux scsi mid-layer) */ int -mptscsih_qcmd(struct scsi_cmnd *SCpnt, void (*done)(struct scsi_cmnd *)) +mptscsih_qcmd(struct scsi_cmnd *SCpnt) { MPT_SCSI_HOST *hd; MPT_FRAME_HDR *mf; @@ -1329,10 +1328,9 @@ mptscsih_qcmd(struct scsi_cmnd *SCpnt, void (*done)(struct scsi_cmnd *)) hd = shost_priv(SCpnt->device->host); ioc = hd->ioc; - SCpnt->scsi_done = done; - dmfprintk(ioc, printk(MYIOC_s_DEBUG_FMT "qcmd: SCpnt=%p, done()=%p\n", - ioc->name, SCpnt, done)); + dmfprintk(ioc, printk(MYIOC_s_DEBUG_FMT "qcmd: SCpnt=%p\n", + ioc->name, SCpnt)); if (ioc->taskm
[PATCH 5/7] mpt2sas: Add free smids to the head, not tail of list
Reusing a smid quickly means it's still cache-hot. This yields a small but noticable performance improvement in a well-known database benchmark. This improvement is already present in the mpt3sas driver. Signed-off-by: Matthew Wilcox --- drivers/scsi/mpt2sas/mpt2sas_base.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c index bde63f7..8b88118 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.c +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c @@ -1739,14 +1739,14 @@ mpt2sas_base_free_smid(struct MPT2SAS_ADAPTER *ioc, u16 smid) list_for_each_entry_safe(chain_req, next, &ioc->scsi_lookup[i].chain_list, tracker_list) { list_del_init(&chain_req->tracker_list); - list_add_tail(&chain_req->tracker_list, + list_add(&chain_req->tracker_list, &ioc->free_chain_list); } } ioc->scsi_lookup[i].cb_idx = 0xFF; ioc->scsi_lookup[i].scmd = NULL; ioc->scsi_lookup[i].direct_io = 0; - list_add_tail(&ioc->scsi_lookup[i].tracker_list, + list_add(&ioc->scsi_lookup[i].tracker_list, &ioc->free_list); spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags); @@ -1764,13 +1764,13 @@ mpt2sas_base_free_smid(struct MPT2SAS_ADAPTER *ioc, u16 smid) /* hi-priority */ i = smid - ioc->hi_priority_smid; ioc->hpr_lookup[i].cb_idx = 0xFF; - list_add_tail(&ioc->hpr_lookup[i].tracker_list, + list_add(&ioc->hpr_lookup[i].tracker_list, &ioc->hpr_free_list); } else if (smid <= ioc->hba_queue_depth) { /* internal queue */ i = smid - ioc->internal_smid; ioc->internal_lookup[i].cb_idx = 0xFF; - list_add_tail(&ioc->internal_lookup[i].tracker_list, + list_add(&ioc->internal_lookup[i].tracker_list, &ioc->internal_free_list); } spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags); -- 1.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/7] mpt2sas: Remove use of DEF_SCSI_QCMD
Removing the host_lock from the I/O submission path gives a huge scalability improvement. Signed-off-by: Matthew Wilcox --- drivers/scsi/mpt2sas/mpt2sas_scsih.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c index 33586a3..7351843 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c @@ -3950,9 +3950,9 @@ _scsih_setup_direct_io(struct MPT2SAS_ADAPTER *ioc, struct scsi_cmnd *scmd, * SCSI_MLQUEUE_HOST_BUSY if the entire host queue is full */ static int -_scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *)) +_scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd) { - struct MPT2SAS_ADAPTER *ioc = shost_priv(scmd->device->host); + struct MPT2SAS_ADAPTER *ioc = shost_priv(shost); struct MPT2SAS_DEVICE *sas_device_priv_data; struct MPT2SAS_TARGET *sas_target_priv_data; struct _raid_device *raid_device; @@ -3960,7 +3960,6 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *)) u32 mpi_control; u16 smid; - scmd->scsi_done = done; sas_device_priv_data = scmd->device->hostdata; if (!sas_device_priv_data || !sas_device_priv_data->sas_target) { scmd->result = DID_NO_CONNECT << 16; @@ -4036,7 +4035,7 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *)) MPT_TARGET_FLAGS_RAID_COMPONENT) mpi_request->Function = MPI2_FUNCTION_RAID_SCSI_IO_PASSTHROUGH; else - mpi_request->Function = MPI2_FUNCTION_SCSI_IO_REQUEST; + mpi_request->Function = MPI2_FUNCTION_SCSI_IO_REQUEST; mpi_request->DevHandle = cpu_to_le16(sas_device_priv_data->sas_target->handle); mpi_request->DataLength = cpu_to_le32(scsi_bufflen(scmd)); @@ -4080,8 +4079,6 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *)) return SCSI_MLQUEUE_HOST_BUSY; } -static DEF_SCSI_QCMD(_scsih_qcmd) - /** * _scsih_normalize_sense - normalize descriptor and fixed format sense data * @sense_buffer: sense data returned by target -- 1.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/7] mpt3sas: Remove uses of serial_number
The mpt3sas_scsih_issue_tm() function does not use the 'serial_number' argument passed to it. Removing it removes the last vestiges of the scsi_cmnd's serial_number field from this driver. Signed-off-by: Matthew Wilcox --- drivers/scsi/mpt3sas/mpt3sas_base.h | 2 +- drivers/scsi/mpt3sas/mpt3sas_ctl.c | 2 +- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 15 ++- 3 files changed, 8 insertions(+), 11 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h index 0ebf5d9..9b90a6f 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.h +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h @@ -993,7 +993,7 @@ void mpt3sas_scsih_reset_handler(struct MPT3SAS_ADAPTER *ioc, int reset_phase); int mpt3sas_scsih_issue_tm(struct MPT3SAS_ADAPTER *ioc, u16 handle, uint channel, uint id, uint lun, u8 type, u16 smid_task, - ulong timeout, unsigned long serial_number, enum mutex_type m_type); + ulong timeout, enum mutex_type m_type); void mpt3sas_scsih_set_tm_flag(struct MPT3SAS_ADAPTER *ioc, u16 handle); void mpt3sas_scsih_clear_tm_flag(struct MPT3SAS_ADAPTER *ioc, u16 handle); void mpt3sas_expander_remove(struct MPT3SAS_ADAPTER *ioc, u64 sas_address); diff --git a/drivers/scsi/mpt3sas/mpt3sas_ctl.c b/drivers/scsi/mpt3sas/mpt3sas_ctl.c index 9b89de1..ba9cbe5 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_ctl.c +++ b/drivers/scsi/mpt3sas/mpt3sas_ctl.c @@ -980,7 +980,7 @@ _ctl_do_mpt_command(struct MPT3SAS_ADAPTER *ioc, struct mpt3_ioctl_command karg, mpt3sas_scsih_issue_tm(ioc, le16_to_cpu(mpi_request->FunctionDependent1), 0, 0, 0, MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET, 0, 30, - 0, TM_MUTEX_ON); + TM_MUTEX_ON); } else mpt3sas_base_hard_reset_handler(ioc, CAN_SLEEP, FORCE_BIG_HAMMER); diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index a961fe1..952f6e0 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -2029,7 +2029,6 @@ mpt3sas_scsih_clear_tm_flag(struct MPT3SAS_ADAPTER *ioc, u16 handle) * @type: MPI2_SCSITASKMGMT_TASKTYPE__XXX (defined in mpi2_init.h) * @smid_task: smid assigned to the task * @timeout: timeout in seconds - * @serial_number: the serial_number from scmd * @m_type: TM_MUTEX_ON or TM_MUTEX_OFF * Context: user * @@ -2042,7 +2041,7 @@ mpt3sas_scsih_clear_tm_flag(struct MPT3SAS_ADAPTER *ioc, u16 handle) int mpt3sas_scsih_issue_tm(struct MPT3SAS_ADAPTER *ioc, u16 handle, uint channel, uint id, uint lun, u8 type, u16 smid_task, ulong timeout, - unsigned long serial_number, enum mutex_type m_type) + enum mutex_type m_type) { Mpi2SCSITaskManagementRequest_t *mpi_request; Mpi2SCSITaskManagementReply_t *mpi_reply; @@ -2293,8 +2292,7 @@ _scsih_abort(struct scsi_cmnd *scmd) handle = sas_device_priv_data->sas_target->handle; r = mpt3sas_scsih_issue_tm(ioc, handle, scmd->device->channel, scmd->device->id, scmd->device->lun, - MPI2_SCSITASKMGMT_TASKTYPE_ABORT_TASK, smid, 30, - scmd->serial_number, TM_MUTEX_ON); + MPI2_SCSITASKMGMT_TASKTYPE_ABORT_TASK, smid, 30, TM_MUTEX_ON); out: sdev_printk(KERN_INFO, scmd->device, "task abort: %s scmd(%p)\n", @@ -2353,8 +2351,7 @@ _scsih_dev_reset(struct scsi_cmnd *scmd) r = mpt3sas_scsih_issue_tm(ioc, handle, scmd->device->channel, scmd->device->id, scmd->device->lun, - MPI2_SCSITASKMGMT_TASKTYPE_LOGICAL_UNIT_RESET, 0, 30, 0, - TM_MUTEX_ON); + MPI2_SCSITASKMGMT_TASKTYPE_LOGICAL_UNIT_RESET, 0, 30, TM_MUTEX_ON); out: sdev_printk(KERN_INFO, scmd->device, "device reset: %s scmd(%p)\n", @@ -2414,7 +2411,7 @@ _scsih_target_reset(struct scsi_cmnd *scmd) r = mpt3sas_scsih_issue_tm(ioc, handle, scmd->device->channel, scmd->device->id, 0, MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET, 0, - 30, 0, TM_MUTEX_ON); + 30, TM_MUTEX_ON); out: starget_printk(KERN_INFO, starget, "target reset: %s scmd(%p)\n", @@ -5425,7 +5422,7 @@ _scsih_sas_broadcast_primitive_event(struct MPT3SAS_ADAPTER *ioc, spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags); r = mpt3sas_scsih_issue_tm(ioc, handle, 0, 0, lun, - MPI2_SCSITASKMGMT_TASKTYPE_QUERY_TASK, smid, 30, 0, + MPI2_SCSITASKMGMT_TASKTYPE_QUERY_TASK, smid, 30, TM_MUTEX_OFF); if (r == FAILED) { sdev_printk(KERN_WARNING, sdev, @@ -5467,7 +5464,7 @@ _scsih_sas_broadcast_primitive_event(struct MPT3SAS_ADAPTER
Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes
On Fri, Jan 24, 2014 at 10:57:48AM +, Mel Gorman wrote: > So far on the table is > > 1. major filesystem overhawl > 2. major vm overhawl > 3. use compound pages as they are today and hope it does not go >completely to hell, reboot when it does Is the below paragraph an exposition of option 2, or is it an option 4, change the VM unit of allocation? Other than the names you're using, this is basically what I said to Kirill in an earlier thread; either scrap the difference between PAGE_SIZE and PAGE_CACHE_SIZE, or start making use of it. The fact that EVERYBODY in this thread has been using PAGE_SIZE when they should have been using PAGE_CACHE_SIZE makes me wonder if part of the problem is that the split in naming went the wrong way. ie use PTE_SIZE for 'the amount of memory pointed to by a pte_t' and use PAGE_SIZE for 'the amount of memory described by a struct page'. (we need to remove the current users of PTE_SIZE; sparc32 and powerpc32, but that's just a detail) And we need to fix all the places that are currently getting the distinction wrong. SMOP ... ;-) What would help is correct typing of variables, possibly with sparse support to help us out. Big Job. > That's why I suggested that it may be necessary to change the basic unit of > allocation the kernel uses to be larger than the MMU page size and restrict > how the sub pages are used. The requirement is to preserve the property that > "with the exception of slab reclaim that any reclaim action will result > in K-sized allocation succeeding" where K is the largest blocksize used by > any underlying storage device. From an FS perspective then certain things > would look similar to what they do today. Block data would be on physically > contiguous pages, buffer_heads would still manage the case where block_size > <= PAGEALLOC_PAGE_SIZE (as opposed to MMU_PAGE_SIZE), particularly for > dirty tracking and so on. The VM perspective is different because now it > has to handle MMU_PAGE_SIZE in a very different way, page reclaim of a page > becomes multiple unmap events and so on. There would also be anomalies such > as mlock of a range smaller than PAGEALLOC_PAGE_SIZE becomes difficult if > not impossible to sensibly manage because mlock of a 4K page effectively > pins the rest and it's not obvious how we would deal with the VMAs in that > case. It would get more than just the storage gains though. Some of the > scalability problems that deal with massive amount of struct pages may > magically go away if the base unit of allocation and management changes. -- Matthew Wilcox Intel Open Source Technology Centre "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: status of block-integrity
On Tue, Jan 07, 2014 at 02:33:10PM +0100, Hannes Reinecke wrote: > I would indeed like to have a discussion at LSF about the future of > DIX. DIF is not an issue, as most HBAs support it already and we > actually need it for proper connectivity. > > DIX, OTOH, has been left dormant since time immemorial, with the > only known (supposed) user being Oracle. > (I actually talked to the DB/2 folks about it, and the response > was a polite feigned interest ...) I think there's a terminology confusion here; you seem to be using DIX to mean the TCP CRC and DIF to mean T10DIF. I've seen other people use DIX to mean separate SGLs for metadata and DIF to mean interleaved data. Can you confirm which thing you mean here? -- Matthew Wilcox Intel Open Source Technology Centre "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: Allow block drivers to poll for I/O instead of sleeping
On Mon, Jun 24, 2013 at 10:07:51AM +0200, Ingo Molnar wrote: > I'm wondering, how will this scheme work if the IO completion latency is a > lot more than the 5 usecs in the testcase? What if it takes 20 usecs or > 100 usecs or more? There's clearly a threshold at which it stops making sense, and our current NAND-based SSDs are almost certainly on the wrong side of that threshold! I can't wait for one of the "post-NAND" technologies to make it to market in some form that makes it economical to use in an SSD. The problem is that some of the people who are looking at those technologies are crazy. They want to "bypass the kernel" and "do user space I/O" because "the kernel is too slow". This patch is part of an effort to show them how crazy they are. And even if it doesn't convince them, at least users who refuse to rewrite their applications to take advantage of magical userspace I/O libraries will see real performance benefits. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: Allow block drivers to poll for I/O instead of sleeping
On Mon, Jun 24, 2013 at 08:11:02PM -0400, Steven Rostedt wrote: > What about hooking into the idle_balance code? That happens if we are > about to go to idle but before the full schedule switch to the idle > task. > > > In __schedule(void): > > if (unlikely(!rq->nr_running)) > idle_balance(cpu, rq); That may be a great place to try it from the PoV of the scheduler, but are you OK with me threading a struct backing_dev_info * all the way through the scheduler to idle_balance()? :-) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: Allow block drivers to poll for I/O instead of sleeping
On Mon, Jun 24, 2013 at 09:15:45AM +0200, Jens Axboe wrote: > Willy, I think the general design is fine, hooking in via the bdi is the > only way to get back to the right place from where you need to sleep. > Some thoughts: > > - This should be hooked in via blk-iopoll, both of them should call into > the same driver hook for polling completions. I actually started working on this, then I realised that it's actually a bad idea. blk-iopoll's poll function is to poll the single I/O queue closest to this CPU. The iowait poll function is to poll all queues that the I/O for this address_space might complete on. I'm reluctant to ask drivers to define two poll functions, but I'm even more reluctant to ask them to define one function with two purposes. > - It needs to be more intelligent in when you want to poll and when you > want regular irq driven IO. Oh yeah, absolutely. While the example patch didn't show it, I wouldn't enable it for all NVMe devices; only ones with sufficiently low latency. There's also the ability for the driver to look at the number of outstanding I/Os and return an error (eg -EBUSY) to stop spinning. > - With the former note, the app either needs to opt in (and hence > willingly sacrifice CPU cycles of its scheduling slice) or it needs to > be nicer in when it gives up and goes back to irq driven IO. Yup. I like the way you framed it. If the task *wants* to spend its CPU cycles on polling for I/O instead of giving up the remainder of its time slice, then it should be able to do that. After all, it already can; it can submit an I/O request via AIO, and then call io_getevents in a tight loop. So maybe the right way to do this is with a task flag? If we go that route, I'd like to further develop this option to allow I/Os to be designated as "low latency" vs "normal". Taking a page fault would be "low latency" for all tasks, not just ones that choose to spin for I/O. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RFC: Allow block drivers to poll for I/O instead of sleeping
A paper at FAST2012 (http://static.usenix.org/events/fast12/tech/full_papers/Yang.pdf) pointed out the performance overhead of taking interrupts for low-latency block I/Os. The solution the author investigated was to spin waiting for each I/O to complete. This is inefficient as Linux submits many I/Os which are not latency-sensitive, and even when we do submit latency-sensitive I/Os (eg swap-in), we frequently submit several I/Os before waiting. This RFC takes a different approach, only spinning when we would otherwise sleep. To implement this, I add an 'io_poll' function pointer to backing_dev_info. I include a sample implementation for the NVMe driver. Next, I add an io_wait() function which will call io_poll() if it is set. It falls back to calling io_schedule() if anything goes wrong with io_poll() or the task exceeds its timeslice. Finally, all that is left is to judiciously replace calls to io_schedule() with calls to io_wait(). I think I've covered the main contenders with sleep_on_page(), sleep_on_buffer() and the DIO path. I've measured the performance benefits of this with a Chatham NVMe prototype device and a simple # dd if=/dev/nvme0n1 of=/dev/null iflag=direct bs=512 count=100 The latency of each I/O reduces by about 2.5us (from around 8.0us to around 5.5us). This matches up quite well with the performance numbers shown in the FAST2012 paper (which used a similar device). Is backing_dev_info the right place for this function pointer? Have I made any bad assumptions about the correct way to retrieve the backing_dev_info from (eg) a struct page, buffer_head or kiocb? Should I pass a 'state' into io_wait() instead of retrieving it from 'current'? Is kernel/sched/core.c the right place for io_wait()? Should it be bdi_wait() instead? Should it be marked with __sched? Where should I add documentation for the io_poll() function pointer? NB: The NVMe driver piece of this is for illustrative purposes only and should not be applied. I've rearranged the diff so that the interesting pieces appear first. diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index c388155..97f8fde 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -68,6 +68,7 @@ struct backing_dev_info { unsigned int capabilities; /* Device capabilities */ congested_fn *congested_fn; /* Function pointer if device is md/dm */ void *congested_data; /* Pointer to aux data for congested func */ + int (*io_poll)(struct backing_dev_info *); char *name; @@ -126,6 +127,8 @@ int bdi_has_dirty_io(struct backing_dev_info *bdi); void bdi_wakeup_thread_delayed(struct backing_dev_info *bdi); void bdi_lock_two(struct bdi_writeback *wb1, struct bdi_writeback *wb2); +void io_wait(struct backing_dev_info *bdi); + extern spinlock_t bdi_lock; extern struct list_head bdi_list; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 58453b8..4840065 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4527,6 +4527,36 @@ long __sched io_schedule_timeout(long timeout) return ret; } +/* + * Wait for an I/O to complete against this backing_dev_info. If the + * task exhausts its timeslice polling for completions, call io_schedule() + * anyway. If a signal comes pending, return so the task can handle it. + * If the io_poll returns an error, give up and call io_schedule(), but + * swallow the error. We may miss an I/O completion (eg if the interrupt + * handler gets to it first). Guard against this possibility by returning + * if we've been set back to TASK_RUNNING. + */ +void io_wait(struct backing_dev_info *bdi) +{ + if (bdi && bdi->io_poll) { + long state = current->state; + while (!need_resched()) { + int ret = bdi->io_poll(bdi); + if ((ret > 0) || signal_pending_state(state, current)) { + set_current_state(TASK_RUNNING); + return; + } + if (current->state == TASK_RUNNING) + return; + if (ret < 0) + break; + cpu_relax(); + } + } + + io_schedule(); +} + /** * sys_sched_get_priority_max - return maximum RT priority. * @policy: scheduling class. diff --git a/fs/aio.c b/fs/aio.c index 2bbcacf..7b20397 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -453,11 +453,15 @@ static void kill_ioctx(struct kioctx *ctx) */ ssize_t wait_on_sync_kiocb(struct kiocb *iocb) { + struct backing_dev_info *bdi = NULL; + + if (iocb->ki_filp) + bdi = iocb->ki_filp->f_mapping->backing_dev_info; while (atomic_read(&iocb->ki_users)) { set_current_state(TASK_UNINTERRUPTIBLE); if (!atomic_read(&iocb->ki_users)) break; - io_schedule(); +
Re: [PATCH][SCSI] mpt3sas: Paer 1 of MPI API headers
On Sat, Sep 29, 2012 at 10:52:50PM +0200, Bj??rn Mork wrote: > writes: > > This patch contains MPI API headers > > Why can't this and the other headers be shared between the mpt2sas and > mpt3sas drivers? Looks like you are duplicating a lot of code already > present in drivers/scsi/mpt2sas. Why? That's a recipe for maintenance > hell... I would go further and ask why this needs a new driver; why can't mpt2sas be enhanced to drive the 12Gbit cards? -- Matthew Wilcox Intel Open Source Technology Centre "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] ipr: Reduce queuecommand lock time
On Mon, Jul 16, 2012 at 03:48:08PM -0500, Brian King wrote: > +static int ipr_queuecommand(struct Scsi_Host *shost, > + struct scsi_cmnd *scsi_cmd) > { > struct ipr_ioa_cfg *ioa_cfg; > struct ipr_resource_entry *res; > struct ipr_ioarcb *ioarcb; > struct ipr_cmnd *ipr_cmd; > + unsigned long lock_flags = 0; You don't need to initialise lock_flags. Looking at the rest of the code, you drop the lock in the middle, then re-acquire it. That'll help with hold time, but I'm not convinced it'll help with performance. Have you done performance testing with these changes? I seem to remember we used an eight-socket box to show host_lock problems in the past. -- Matthew Wilcox Intel Open Source Technology Centre "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Revert "IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs"
On Tue, Feb 26, 2008 at 11:23:01AM -0800, Benny Halevy wrote: > Pete, the subject says "PATCH 1/2" but I didn't see any follow-up message > for PATCH 2/2. Just wondering :) I think the problem's on your end ... I got it and so did marc: http://marc.info/?l=linux-scsi&m=120405067313933&w=2 -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [3/22] Remove unchecked_isa_dma in advansys.c
On Tue, Feb 26, 2008 at 07:18:18AM -0800, James Bottomley wrote: > On Tue, 2008-02-26 at 04:44 +0100, Andi Kleen wrote: > > No pci_alloc_coherent works, but pci_map_* will not. > > Yes, it does ... dma_map is a flush/virt_to_phys on x86. so of course it > works. If you mean it doesn't transform from > 24 bit phys to < 24 bit > phys, then yes; but that's not an API requirement; that's what bounce > buffering is all about, so for ISA devices we always only call dma_map > on < 24 bit phys addresses and it all works. Actually, tomo's patch violates that ... asc_dvc_varp->overrun_buf = kzalloc(ASC_OVERRUN_BSIZE, GFP_KERNEL); [...] asc_dvc->overrun_dma = dma_map_single(board->dev, asc_dvc->overrun_buf, ASC_OVERRUN_BSIZE, DMA_FROM_DEVICE); There's a missing check for ISA boards and GFP_DMA. Honestly, I'd rather over-allocate asc_dvc_varp->overrun_buf by 4 bytes and map the appropriate alignment. Patch next week, probably. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [3/22] Remove unchecked_isa_dma in advansys.c
On Mon, Feb 25, 2008 at 11:40:35PM +0100, Andi Kleen wrote: > On Mon, Feb 25, 2008 at 02:47:46PM -0700, Matthew Wilcox wrote: > > I have to say I really don't like this patch. I'll look up my current > > Why do you not like it? What would you do better? > > > patch queue for this driver next week -- I think I fixed this differently. > > (I must have fixed it somehow because it works on parisc, which is most > > unforgiving of drivers which do DMA without the DMA API). This patch is unnecessary. The driver does not access the cmnd field using DMA. See AscPutReadyQueue where it calls: AscMemWordCopyPtrToLram(iop_base, q_addr + ASC_SCSIQ_CDB_BEG, (uchar *)scsiq->cdbptr, scsiq->q2.cdb_len >> 1); Despite the name, it doesn't actually copy a pointer to Lram. It copies the thing pointed at to Lram. (ie it's memcpy_toio with a minor twist). -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [3/22] Remove unchecked_isa_dma in advansys.c
On Mon, Feb 25, 2008 at 11:40:35PM +0100, Andi Kleen wrote: > > (I must have fixed it somehow because it works on parisc, which is most > > unforgiving of drivers which do DMA without the DMA API). > > At least on x86 the DMA API cannot do ISA bouncing. You're saying that if I set a 24-bit DMA mask, and then do a pci_alloc_coherent(), x86 might hand me back something that's not accessible? That would be just broken. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [3/22] Remove unchecked_isa_dma in advansys.c
On Mon, Feb 25, 2008 at 12:35:16AM +0100, Andi Kleen wrote: > That patch is a little more complicated than the others. advansys > was the only ISA driver who actually passed ->cmnd to the firmware. > So I implemented a simple own bounce buffer scheme for this case. > Also did sense_buffer bouncing in the driver while I was at it; > which means it doesn't require the mid layer to do this anymore. I have to say I really don't like this patch. I'll look up my current patch queue for this driver next week -- I think I fixed this differently. (I must have fixed it somehow because it works on parisc, which is most unforgiving of drivers which do DMA without the DMA API). -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] [SCSI] gdth: misc cleanups, preparation for ISA/EISA hotplug API
On Sun, Feb 24, 2008 at 12:31:17AM -0500, Christoph Hellwig wrote: > On Sun, Feb 24, 2008 at 12:18:23AM -0500, Jeff Garzik wrote: > > hm. We'll see how it plays out... on the remove side, the above is > > exact what happens in gdth_remove_one() without my patch, thus > > consolidating two cases of the same code into one. There is a less-strong > > argument for doing the allocation that way, but it may turn out to be > > useful anyway once the ISA/EISA API conversion is complete. > > EISA ->remove has a different prototype from PCI ->remove from ISA > ->remove, so gdth_remove_one will be split up eventually. Having the > scsi_host_put duplicated in each shouldn't be too much of a problem :) Shouldn't need to duplicate it ... free bus-specific things in the ->remove method, and call a common helper. See advansys_release() and its callers in advansys.c for how I did it. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] scsi: megaraid_sas - Fix random failure of DCDB cmds with sense info
On Fri, Feb 22, 2008 at 09:17:33AM -0700, Yang, Bo wrote: > James, > > What is the status for this patch? We need to submit more patches based > on the acceptance of this patch. I've NAKed it. You need to use compat_ioctl. That way your code will work even if you run a 32-bit x86 application on ia64 (or a 64-bit application on x86). -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New sparse warning in lpfc_sli.c
On Wed, Feb 20, 2008 at 11:30:19PM -0800, Harvey Harrison wrote: > James, somewhere between 2.6.25-rc1 and -rc2 this new warning appeared, care > to give the locking a quick once-over here? > > drivers/scsi/lpfc/lpfc_sli.c:645:1: warning: context imbalance in > 'lpfc_sli_hbqbuf_fill_hbqs' - wrong count at exit Easy enough to spot by hand: /* Check whether HBQ is still in use */ spin_lock_irqsave(&phba->hbalock, flags); if (!phba->hbq_in_use) { spin_unlock_irqrestore(&phba->hbalock, flags); return 0; } /* Populate HBQ entries */ for (i = start; i < end; i++) { hbq_buffer = (phba->hbqs[hbqno].hbq_alloc_buffer)(phba); if (!hbq_buffer) --->return 1; Here's a patch: lpfc: Balance locking Commit 3163f725a5d071eea1830bbbfab78cfe3fc9baaf introduced locking in lpfc_sli_hbqbuf_fill_hbqs, but missed unlocking on one exit. Reported-by: Harvey Harrison <[EMAIL PROTECTED]> Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]> Cc: James Smart <[EMAIL PROTECTED]> diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c index f532064..fc0d950 100644 --- a/drivers/scsi/lpfc/lpfc_sli.c +++ b/drivers/scsi/lpfc/lpfc_sli.c @@ -648,28 +648,24 @@ lpfc_sli_hbqbuf_fill_hbqs(struct lpfc_hba *phba, uint32_t hbqno, uint32_t count) unsigned long flags; struct hbq_dmabuf *hbq_buffer; - if (!phba->hbqs[hbqno].hbq_alloc_buffer) { + if (!phba->hbqs[hbqno].hbq_alloc_buffer) return 0; - } start = phba->hbqs[hbqno].buffer_count; end = count + start; - if (end > lpfc_hbq_defs[hbqno]->entry_count) { + if (end > lpfc_hbq_defs[hbqno]->entry_count) end = lpfc_hbq_defs[hbqno]->entry_count; - } /* Check whether HBQ is still in use */ spin_lock_irqsave(&phba->hbalock, flags); - if (!phba->hbq_in_use) { - spin_unlock_irqrestore(&phba->hbalock, flags); - return 0; - } + if (!phba->hbq_in_use) + goto out; /* Populate HBQ entries */ for (i = start; i < end; i++) { hbq_buffer = (phba->hbqs[hbqno].hbq_alloc_buffer)(phba); if (!hbq_buffer) - return 1; + goto err; hbq_buffer->tag = (i | (hbqno << 16)); if (lpfc_sli_hbq_to_firmware(phba, hbqno, hbq_buffer)) phba->hbqs[hbqno].buffer_count++; @@ -677,8 +673,12 @@ lpfc_sli_hbqbuf_fill_hbqs(struct lpfc_hba *phba, uint32_t hbqno, uint32_t count) (phba->hbqs[hbqno].hbq_free_buffer)(phba, hbq_buffer); } + out: spin_unlock_irqrestore(&phba->hbalock, flags); return 0; + err: + spin_unlock_irqrestore(&phba->hbalock, flags); + return 1; } int -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_lib: Convert GFP_ATOMIC to gfp_flags
Sorry, $SUBJECT should have read 'gfp_mask' instead of 'gfp_flags'. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] scsi_lib: Convert GFP_ATOMIC to gfp_flags
scsi_init_io() is already passed a gfp_mask, so we should use that instead of an explicit GFP_ATOMIC Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index f243fc3..b81170d 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1048,8 +1048,8 @@ int scsi_init_io(struct scsi_cmnd *cmd, gfp_t gfp_mask) goto err_exit; if (blk_bidi_rq(cmd->request)) { - struct scsi_data_buffer *bidi_sdb = kmem_cache_zalloc( - scsi_bidi_sdb_cache, GFP_ATOMIC); + struct scsi_data_buffer *bidi_sdb; + bidi_sdb = kmem_cache_zalloc(scsi_bidi_sdb_cache, gfp_mask); if (!bidi_sdb) { error = BLKPREP_DEFER; goto err_exit; @@ -1057,7 +1057,7 @@ int scsi_init_io(struct scsi_cmnd *cmd, gfp_t gfp_mask) cmd->request->next_rq->special = bidi_sdb; error = scsi_init_sgtable(cmd->request->next_rq, bidi_sdb, - GFP_ATOMIC); + gfp_mask); if (error) goto err_exit; } -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] scsi_scan: Convert GFP_ATOMIC to GFP_KERNEL
There is no need to allocate this memory atomically. There is no problem with sleeping during a bus scan, and it's actually causing problems for libata. Also remove the GFP_DMA flag. If it's needed, the block layer will bounce-buffer it. Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 1dc165a..98c66bd 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -242,7 +242,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget, extern void scsi_evt_thread(struct work_struct *work); sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size, - GFP_ATOMIC); + GFP_KERNEL); if (!sdev) goto out; @@ -747,7 +747,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result, */ sdev->inquiry = kmemdup(inq_result, max_t(size_t, sdev->inquiry_len, 36), - GFP_ATOMIC); + GFP_KERNEL); if (sdev->inquiry == NULL) return SCSI_SCAN_NO_RESPONSE; @@ -1010,8 +1010,7 @@ static int scsi_probe_and_add_lun(struct scsi_target *starget, if (!sdev) goto out; - result = kmalloc(result_len, GFP_ATOMIC | - ((shost->unchecked_isa_dma) ? __GFP_DMA : 0)); + result = kmalloc(result_len, GFP_KERNEL); if (!result) goto out_free_sdev; @@ -1328,8 +1327,7 @@ static int scsi_report_lun_scan(struct scsi_target *starget, int bflags, * prevent us from finding any LUNs on this target. */ length = (max_scsi_report_luns + 1) * sizeof(struct scsi_lun); - lun_data = kmalloc(length, GFP_ATOMIC | - (sdev->host->unchecked_isa_dma ? __GFP_DMA : 0)); + lun_data = kmalloc(length, GFP_KERNEL); if (!lun_data) { printk(ALLOC_FAILURE_MSG, __FUNCTION__); goto out; -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: DMA API Re: Remove dma_coherent_mem interface
On Wed, Oct 31, 2007 at 09:03:24PM +, ian wrote: > I'd like to ask that this not happen - there are several users of this > code in the handhelds.org tree, which we hope will merge into mainline. > this code is _essential_ to allow support of these devices. So ... it's been 4 months. Any new users merged? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI RAM driver
On Tue, Feb 19, 2008 at 10:14:53PM +0900, FUJITA Tomonori wrote: > I see that two drivers have very different objectives but if we add > use_thread option to scsi_debug (we can do easily), it seems that > scsi_debug can provide all the features that scsi_ram does. It's not just use_thread. It's also discard_read/discard_write. And scsi_ram has a different data storage model from scsi_debug -- scsi_debug simulates an arbitrarily sized disc by wrapping around some small (virtually) contiguous allocation of pages; scsi_ram actually allocates the amount of ram that it's told to. This can be solved with another module parameter, of course. I'm in no way opposed to merging the two; it's a question of whether Doug will mind me doing some surgery on his driver. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
SCSI RAM driver
This is the first release of a driver we've been using internally at Intel for a few weeks which is as low-latency as possible. It's designed to let us find latency issues elsewhere in the storage stack (eg filesystem, block layer, scsi layer) There are a few different options, controlled through module parameters. The sector size and disc capacity are load-time parameters, but the parameters affecting performance are tweakable at runtime. Arguably, this driver should be merged into scsi_debug. I didn't want to trip over Doug's toes while developing this driver, and the two drivers do have very different objectives. scsi_debug is obviously a lot more fully-featured and implements lots of bits of the spec I simply haven't bothered with. Like MODE SENSE ;-) diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig index a5f0aaa..e8dd685 100644 --- a/drivers/scsi/Kconfig +++ b/drivers/scsi/Kconfig @@ -1499,6 +1499,16 @@ config SCSI_DEBUG information. This driver is primarily of use to those testing the SCSI and block subsystems. If unsure, say N. +config SCSI_RAM + tristate "SCSI RAM-based host" + depends on SCSI + help + This driver simulates a host adapter with one disc attached. + The primary purpose of this driver is for performance testing + of the fs/block/scsi stack; as such it attempts to simulate a + disc which is as fast as RAM. If you are unsure how to answer + this question, say N. + config SCSI_MESH tristate "MESH (Power Mac internal SCSI) support" depends on PPC32 && PPC_PMAC && SCSI diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile index 925c26b..95466d8 100644 --- a/drivers/scsi/Makefile +++ b/drivers/scsi/Makefile @@ -133,6 +133,7 @@ obj-$(CONFIG_SCSI_ENCLOSURE)+= ses.o # This goes last, so that "real" scsi devices probe earlier obj-$(CONFIG_SCSI_DEBUG) += scsi_debug.o +obj-$(CONFIG_SCSI_RAM) += scsi_ram.o obj-$(CONFIG_SCSI_WAIT_SCAN) += scsi_wait_scan.o diff --git a/drivers/scsi/scsi_ram.c b/drivers/scsi/scsi_ram.c new file mode 100644 index 000..f097aee --- /dev/null +++ b/drivers/scsi/scsi_ram.c @@ -0,0 +1,618 @@ +/* + * scsi_ram.c - A RAM-based SCSI driver for Linux. + * + * This driver is intended to run as fast as possible, hence the options + * to discard writes and reads. + * By default, it'll allocate half a gigabyte of RAM to use as a ramdisc; + * you can change this with the `capacity' module parameter. + * + * (C) Copyright 2007-2008 Intel Corporation + * Author: Matthew Wilcox <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#undef DEBUG + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +MODULE_LICENSE("GPL v2"); +MODULE_AUTHOR("Matthew Wilcox <[EMAIL PROTECTED]>"); + +#define DRV_NAME "scsi_ram" + +static unsigned int sector_size = 512; +module_param(sector_size, uint, 0444); +MODULE_PARM_DESC(sector_size, "Size of sectors"); + +static unsigned int capacity = 1024 * 1024; +module_param(capacity, uint, 0444); +MODULE_PARM_DESC(capacity, "Number of logical blocks in device"); + +static int throw_away_writes; +module_param(throw_away_writes, int, 0644); +MODULE_PARM_DESC(throw_away_writes, "Discard all writes to the device"); + +static int throw_away_reads; +module_param(throw_away_reads, int, 0644); +MODULE_PARM_DESC(throw_away_reads, "Don't actually read data from the device"); + +static int use_thread = 1; +module_param(use_thread, int, 0644); +MODULE_PARM_DESC(use_thread, "Use a separate thread to do data accesses"); + +static void copy_buffer(struct scsi_cmnd *cmnd, char *buf, int len) +{ + char *p; + struct scatterlist *sg; + int i; + + scsi_for_each_sg(cmnd, sg, scsi_sg_count(cmnd), i) { + int tocopy = sg->length; + if (tocopy > len) + tocopy = len; + + p = kmap_atomic(sg_page(sg), KM_USER0); + memcpy(p + sg->offset, buf, tocopy); + kunmap_atomic(p, KM_USER0); + + len -= tocopy; + if (!len) + break; + buf += tocopy; + } + + scsi_set_resid(cmnd, len); +} + +static char inquiry[57] = { + 0, 0, 5, 0x22, 52, 0, 0, 0x0a, /* 0-7 */ + 'L', 'i', 'n', 'u', 'x', ' ', ' ', ' ', /* 8-15 */ + 'R', 'A', 'M', ' ', 'D',
Re: [PATCH] scsi_debug: disable clustering
On Sun, Feb 17, 2008 at 08:18:11AM -0600, James Bottomley wrote: > No, he means that kmap_atomic can only map a page of data. This makes > single page only sg list entries and input assumption into this loop. > with ENABLE_CLUSTERING, that's potentially not true. Of course, this > accidentally works most of the time because of the way kmap functions. Ah, right. I'm on the verge of releasing a ram-based scsi driver I've been working on ... this loop should work fine with clustering as it takes account of the sg potentially having multiple pages: scsi_for_each_sg(cmnd, sg, scsi_sg_count(cmnd), i) { struct page *sgpage = sg_page(sg); unsigned int to_off = sg->offset; unsigned int sg_copy = sg->length; if (sg_copy > len) sg_copy = len; len -= sg_copy; while (sg_copy > 0) { char *vto, *vfrom; unsigned int page_copy; if (from_off > to_off) page_copy = PAGE_SIZE - from_off; else page_copy = PAGE_SIZE - to_off; if (page_copy > sg_copy) page_copy = sg_copy; vfrom = get_data_page(data_pfn); vto = get_sg_page(sgpage); memcpy(vto + to_off, vfrom + from_off, page_copy); put_sg_page(vto); put_data_page(vfrom); from_off += page_copy; if (from_off == PAGE_SIZE) { from_off = 0; data_pfn++; } to_off += page_copy; if (to_off == PAGE_SIZE) { to_off = 0; sgpage++; } sg_copy -= page_copy; } if (!len) break; } get_data_page() and get_sg_page() each do a kmap_atomic, so I was concerned that I might be out of KM_USER* pages. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: disable clustering
On Sat, Feb 16, 2008 at 11:57:15PM +0900, FUJITA Tomonori wrote: > scsi_debug does at several places: > > for_each_sg(sdb->table.sgl, sg, sdb->table.nents, k) { > kaddr = (unsigned char *) > kmap_atomic(sg_page(sg), KM_USER0); > > > We cannot do something like that with the clustering enabled (or we > can use scsi_kmap_atomic_sg). Why not? Is KM_USER0 used for something else with clstering enabled? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] 2.6.25-rc1-git2: GDT SCSI: change drivers/scsi/gdth.c into using pci_get device
On Wed, Feb 13, 2008 at 10:57:37AM +0200, Boaz Harrosh wrote: > I still don't have a card for testing myself. Again anyone > wants to send me a card. Intel people anybody home? Apparently Intel sold this line of cards to Adaptec. The copyright notice in the file backs this up: * Copyright (C) 1995-06 ICP vortex GmbH, Achim Leubner * * Copyright (C) 2002-04 Intel Corporation * * Copyright (C) 2003-06 Adaptec Inc. * * <[EMAIL PROTECTED]> * -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] gdth: convert to PCI hotplug API
On Fri, Feb 15, 2008 at 10:44:52AM -0500, Jeff Garzik wrote: > I kept those array for one reason: you need it to preserve the existing > in-driver PCI device sort. Just get rid of it. I got rid of it for sym2 during 2.5 and very few people have complained. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/9] scsi_dh: Add support for SDEV_PASSIVE
On Mon, Feb 11, 2008 at 10:27:46AM -0800, Chandra Seetharaman wrote: > On Sat, 2008-02-09 at 05:45 -0700, Matthew Wilcox wrote: > > On Mon, Feb 04, 2008 at 01:19:30PM -0800, Chandra Seetharaman wrote: > > > The device does send these error messages currently, but it takes some > > > time to get the check condition back, which adds up the time to boot > > > especially when the # of LUNS is huge. > > > > > > For example, in my test configuration, I had 40 luns, and the time > > > difference (with this patch and without it) to boot is 171 seconds and > > > 1426 seconds. > > > > Was that with sync or async SCSI bus scanning? > > I didn't change anything, IOW, i did default scanning, which I would > guess sync ?! That would depend on your CONFIG_SCSI_SCAN_ASYNC setting. Try booting with 'scsi_mod.scan=async' and without this patch, and see how long it takes. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_error: Fix language abuse.
On Sat, Feb 09, 2008 at 01:50:20PM +, Alan Cox wrote: > On Fri, 08 Feb 2008 20:32:54 -0500 Douglas Gilbert <[EMAIL PROTECTED]> wrote: > > Alan Cox wrote: > > > The word "illegal" has a precise dictionary meaning of "prohibited by > > > law". > > > > Also "contrary to or forbidden by official rules, regulations, etc". > > The OED I have here doesn't seem to think so, however if the words are > the ones used in the T10 documentation then I'm happy to drop the patch. Not everyone believes in the OED's definitions. I prefer Chambers (in general), which has the (online) definition: illegal adj 1 against the law; not legal. Also called unlawful. 2 not authorized by law or by specific rules which apply. ETYMOLOGY: 17c: from Latin illegalis. I believe the second definition would apply here. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can all 53c7xx-related stuff be removed?
On Sat, Feb 09, 2008 at 02:12:07PM -0500, Robert P. J. Day wrote: > On Sat, 9 Feb 2008, Matthew Wilcox wrote: > > On Sat, Feb 09, 2008 at 11:54:43AM -0500, Robert P. J. Day wrote: > > > just an observation that, even though 53c7xx content has been > > > officially removed, there are still a few droppings hanging around the > > > tree: > > > > Erm. It's not just a driver, it's also the name of some chips. > > > > > $ grep -r 53c7xx * > > > arch/m68k/Makefile:# without -fno-strength-reduce the 53c7xx.c driver > > > fails ;-( > > > > Ask the m68k guys. Obviously, the comment can go away, but maybe they > > want to remove -fno-strength-reduce now ...? > > > > > Documentation/scsi/sym53c8xx_2.txt:by the 53c7xx and 53c8xx family. > > > Documentation/scsi/ncr53c8xx.txt:by the 53c7xx and 53c8xx family. > > > drivers/scsi/zalon.c: * Zalon 53c7xx device driver. > > > > Here it's being used as the name of the chip. > > > > > drivers/scsi/a4000t.c: * plus modifications of the 53c7xx.c driver to > > > support the Amiga. > > > drivers/scsi/zorro7xx.c: * plus modifications of the 53c7xx.c driver to > > > support the Amiga. > > > > This is part of the credit being given to Alan Hourihane. I don't see > > how you can remove it. > > sorry, i wasn't suggesting removing *all* of that, just doing a > mindless grep and posting the results. whether anyone wants to > process any of that is their call. I was suggesting ways you could turn that mindless grep into a useful contribution. Whether you want to follow those suggestions is up to you. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can all 53c7xx-related stuff be removed?
On Sat, Feb 09, 2008 at 11:54:43AM -0500, Robert P. J. Day wrote: > just an observation that, even though 53c7xx content has been > officially removed, there are still a few droppings hanging around the > tree: Erm. It's not just a driver, it's also the name of some chips. > $ grep -r 53c7xx * > arch/m68k/Makefile:# without -fno-strength-reduce the 53c7xx.c driver fails > ;-( Ask the m68k guys. Obviously, the comment can go away, but maybe they want to remove -fno-strength-reduce now ...? > Documentation/scsi/sym53c8xx_2.txt:by the 53c7xx and 53c8xx family. > Documentation/scsi/ncr53c8xx.txt:by the 53c7xx and 53c8xx family. > drivers/scsi/zalon.c: * Zalon 53c7xx device driver. Here it's being used as the name of the chip. > drivers/scsi/a4000t.c: * plus modifications of the 53c7xx.c driver to support > the Amiga. > drivers/scsi/zorro7xx.c: * plus modifications of the 53c7xx.c driver to > support the Amiga. This is part of the credit being given to Alan Hourihane. I don't see how you can remove it. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/9] scsi_dh: Add support for SDEV_PASSIVE
On Mon, Feb 04, 2008 at 01:19:30PM -0800, Chandra Seetharaman wrote: > The device does send these error messages currently, but it takes some > time to get the check condition back, which adds up the time to boot > especially when the # of LUNS is huge. > > For example, in my test configuration, I had 40 luns, and the time > difference (with this patch and without it) to boot is 171 seconds and > 1426 seconds. Was that with sync or async SCSI bus scanning? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: DMA mapping on SCSI device?
On Tue, Jan 29, 2008 at 02:09:52PM -0800, Luben Tuikov wrote: > > The ideal solution would be to do mapping against a > > different struct > > device for each port, so that we could maintain the proper > > DMA mask for > > each of them at all times. However I'm not sure if > > that's possible. The > > thought of using the SCSI struct device for DMA mapping was > > brought up > > at one point.. any thoughts on that? > > The reason for this is that the object that a struct scsi_dev > represents has nothing to do with HW DMA engines. It really would work, once the few remaining architectures move away from asserting that the 'struct device' passed in is a pci device. It seems like the best way forward to me. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: DMA mapping on SCSI device?
On Mon, Jan 28, 2008 at 06:08:44PM -0600, Robert Hancock wrote: > The > thought of using the SCSI struct device for DMA mapping was brought up > at one point.. any thoughts on that? I believe this will work on some architectures and not others. Anything that uses include/asm-generic/dma-mapping.h will break, for example. It would be nice for those architectures to get fixed ... -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Megaraid: Use of outb_p
On Tue, Jan 22, 2008 at 02:30:12PM -0700, Yang, Bo wrote: > Alan/Matthew, > > I found inb_p/outb_p are defined as inb/outb in kernel src. So it > should not have problems to change inb_p/outb_p to inb/outb. That's not true for x86-32. Please, can you look up the documentation for this chip and find out what sort of delay it needs? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Warning: sym53c8xx calls dma_free_coherent() with irqs disabled
On Tue, Jan 22, 2008 at 01:19:09PM -0500, Tony Battersby wrote: > When unloading, sym53c8xx calls dma_free_coherent() while holding a > spinlock (sym53c8xx_lock) with irqs disabled, which produces the > following warning with 2.6.24: Ugh. I'm slightly torn. On the one hand, it's probably possible to fix sym_malloc. On the other hand, it's really long past time that sym2 dropped its custom allocator and started using dma pools. I'll take a look at converting to dma pools once I've got half a dozen other projects that're higher priority out of the way ;-( In any case thanks for the report. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] use dynamically allocated sense buffer
On Mon, Jan 21, 2008 at 01:08:58PM +0900, FUJITA Tomonori wrote: > On Sun, 20 Jan 2008 09:40:11 -0700 > Matthew Wilcox <[EMAIL PROTECTED]> wrote: > > Longer-term, I want to allow low-level drivers to allocate the > > sense_buffer themselves so they can DMA directly into it (ie grown-up dma > > mapping, rather than this quaint x86 __GFP_DMA). This patch doesn't get > > Yeah, I think that the approach is one of candidates. > > If we go with it, I think that the major issue is that LLDs don't know > when they can reclaim sense_buffer from scsi-ml; scsi-ml uses > sense_buffer after scmd->scsi_done. The midlayer would call a function in the scsi_host_template to free the command. The sense_buffer would be freed at the same time. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] use dynamically allocated sense buffer
On Wed, Jan 16, 2008 at 01:32:17PM +0900, FUJITA Tomonori wrote: > This removes static array sense_buffer in scsi_cmnd and uses > dynamically allocated sense_buffer (with GFP_DMA). > > The reason for doing this is that some architectures need cacheline > aligned buffer for DMA: > > http://lkml.org/lkml/2007/11/19/2 > > The problems are that scsi_eh_prep_cmnd puts scsi_cmnd::sense_buffer > to sglist and some LLDs directly DMA to scsi_cmnd::sense_buffer. It's > necessary to DMA to scsi_cmnd::sense_buffer safely. This patch solves > these issues. > > __scsi_get_command allocates sense_buffer via kmem_cache_alloc and > attaches it to a scsi_cmnd so everything just work as before. I think this is fine for the moment. Longer-term, I want to allow low-level drivers to allocate the sense_buffer themselves so they can DMA directly into it (ie grown-up dma mapping, rather than this quaint x86 __GFP_DMA). This patch doesn't get us any closer to that, but it doesn't get us further away from it either. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
sym53c8xx ISTAT1 feature never set
I got an interesting mail from Mike Anderson yesterday detailing how he managed to provoke an error recovery path that didn't work so well. In sym_soft_reset(), we check to see if the chip has an ISTAT1 register, and if it does, and the scripts are running, we take additional action to abort the running script before doing a full reset. The comments for this function discuss a possible hang that can occur if we don't do this. The only problem is ... we never set the FE_ISTAT1 bit. Mike tells me that when he sets it, it makes his error recovery work. I think we need a patch somewhat along these lines: diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_ index 463f119..37575c1 100644 --- a/drivers/scsi/sym53c8xx_2/sym_hipd.c +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c @@ -1226,61 +1226,61 @@ static struct sym_chip sym_dev_table[] = { , {PCI_DEVICE_ID_NCR_53C875, 0x01, "875", 6, 16, 5, 2, FE_WIDE|FE_ULTRA|FE_CACHE0_SET|FE_BOF|FE_DFS|FE_LDSTR|FE_PFEN| - FE_RAM|FE_DIFF|FE_VARCLK} + FE_RAM|FE_DIFF|FE_VARCLK|FE_ISTAT1} , {PCI_DEVICE_ID_NCR_53C875, 0xff, "875", 6, 16, 5, 2, FE_WIDE|FE_ULTRA|FE_DBLR|FE_CACHE0_SET|FE_BOF|FE_DFS|FE_LDSTR|FE_PFEN| - FE_RAM|FE_DIFF|FE_VARCLK} + FE_RAM|FE_DIFF|FE_VARCLK|FE_ISTAT1} , {PCI_DEVICE_ID_NCR_53C875J, 0xff, "875J", 6, 16, 5, 2, FE_WIDE|FE_ULTRA|FE_DBLR|FE_CACHE0_SET|FE_BOF|FE_DFS|FE_LDSTR|FE_PFEN| - FE_RAM|FE_DIFF|FE_VARCLK} + FE_RAM|FE_DIFF|FE_VARCLK|FE_ISTAT1} , {PCI_DEVICE_ID_NCR_53C885, 0xff, "885", 6, 16, 5, 2, FE_WIDE|FE_ULTRA|FE_DBLR|FE_CACHE0_SET|FE_BOF|FE_DFS|FE_LDSTR|FE_PFEN| - FE_RAM|FE_DIFF|FE_VARCLK} + FE_RAM|FE_DIFF|FE_VARCLK|FE_ISTAT1} , #ifdef SYM_DEBUG_GENERIC_SUPPORT {PCI_DEVICE_ID_NCR_53C895, 0xff, "895", 6, 31, 7, 2, FE_WIDE|FE_ULTRA2|FE_QUAD|FE_CACHE_SET|FE_BOF|FE_DFS| - FE_RAM|FE_LCKFRQ} + FE_RAM|FE_LCKFRQ|FE_ISTAT1} , #else {PCI_DEVICE_ID_NCR_53C895, 0xff, "895", 6, 31, 7, 2, FE_WIDE|FE_ULTRA2|FE_QUAD|FE_CACHE_SET|FE_BOF|FE_DFS|FE_LDSTR|FE_PFEN| - FE_RAM|FE_LCKFRQ} + FE_RAM|FE_LCKFRQ|FE_ISTAT1} , #endif {PCI_DEVICE_ID_NCR_53C896, 0xff, "896", 6, 31, 7, 4, FE_WIDE|FE_ULTRA2|FE_QUAD|FE_CACHE_SET|FE_BOF|FE_DFS|FE_LDSTR|FE_PFEN| - FE_RAM|FE_RAM8K|FE_64BIT|FE_DAC|FE_IO256|FE_NOPM|FE_LEDC|FE_LCKFRQ} + FE_RAM|FE_RAM8K|FE_64BIT|FE_DAC|FE_IO256|FE_NOPM|FE_LEDC|FE_LCKFRQ|FE_ISTAT1} , {PCI_DEVICE_ID_LSI_53C895A, 0xff, "895a", 6, 31, 7, 4, FE_WIDE|FE_ULTRA2|FE_QUAD|FE_CACHE_SET|FE_BOF|FE_DFS|FE_LDSTR|FE_PFEN| - FE_RAM|FE_RAM8K|FE_DAC|FE_IO256|FE_NOPM|FE_LEDC|FE_LCKFRQ} + FE_RAM|FE_RAM8K|FE_DAC|FE_IO256|FE_NOPM|FE_LEDC|FE_LCKFRQ|FE_ISTAT1} , {PCI_DEVICE_ID_LSI_53C875A, 0xff, "875a", 6, 31, 7, 4, FE_WIDE|FE_ULTRA|FE_QUAD|FE_CACHE_SET|FE_BOF|FE_DFS|FE_LDSTR|FE_PFEN| - FE_RAM|FE_DAC|FE_IO256|FE_NOPM|FE_LEDC|FE_LCKFRQ} + FE_RAM|FE_DAC|FE_IO256|FE_NOPM|FE_LEDC|FE_LCKFRQ|FE_ISTAT1} , {PCI_DEVICE_ID_LSI_53C1010_33, 0x00, "1010-33", 6, 31, 7, 8, FE_WIDE|FE_ULTRA3|FE_QUAD|FE_CACHE_SET|FE_BOF|FE_DFBC|FE_LDSTR|FE_PFEN| FE_RAM|FE_RAM8K|FE_64BIT|FE_DAC|FE_IO256|FE_NOPM|FE_LEDC|FE_CRC| - FE_C10} + FE_C10|FE_ISTAT1} , {PCI_DEVICE_ID_LSI_53C1010_33, 0xff, "1010-33", 6, 31, 7, 8, FE_WIDE|FE_ULTRA3|FE_QUAD|FE_CACHE_SET|FE_BOF|FE_DFBC|FE_LDSTR|FE_PFEN| FE_RAM|FE_RAM8K|FE_64BIT|FE_DAC|FE_IO256|FE_NOPM|FE_LEDC|FE_CRC| - FE_C10|FE_U3EN} + FE_C10|FE_U3EN|FE_ISTAT1} , {PCI_DEVICE_ID_LSI_53C1010_66, 0xff, "1010-66", 6, 31, 7, 8, FE_WIDE|FE_ULTRA3|FE_QUAD|FE_CACHE_SET|FE_BOF|FE_DFBC|FE_LDSTR|FE_PFEN| FE_RAM|FE_RAM8K|FE_64BIT|FE_DAC|FE_IO256|FE_NOPM|FE_LEDC|FE_66MHZ|FE_CRC| - FE_C10|FE_U3EN} + FE_C10|FE_U3EN|FE_ISTAT1} , {PCI_DEVICE_ID_LSI_53C1510, 0xff, "1510d", 6, 31, 7, 4, FE_WIDE|FE_ULTRA2|FE_QUAD|FE_CACHE_SET|FE_BOF|FE_DFS|FE_LDSTR|FE_PFEN| - FE_RAM|FE_IO256|FE_LEDC} + FE_RAM|FE_IO256|FE_LEDC|FE_ISTAT1} }; #define sym_num_devs (ARRAY_SIZE(sym_dev_table)) But I'd welcome advice from anyone who knows these chips better than I do (ie Kai, James, etc). Note that the only place we currently _check_ FE_ISTAT1 is in the sym_soft_reset(). We do use nc_istat1 in a couple of other places, but those places are guarded by FE_C10 (ie 53c1010 chips) which are a superset of chips which implement ISTAT1. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Megaraid: Use of outb_p
On Fri, Jan 18, 2008 at 01:32:12PM -0700, Yang, Bo wrote: > Alan, > > The in/outb_p in MegaRAID scsi driver is used for our old io mapped > megaraid controller. There are still some customers are using those old > controller. Please keep them. Hi Bo, I think you've misunderstood the question. The '_p' versions of inb/outb introduce a delay after each access. The current Linux implementation of _p provokes bugs on some chipsets, so Alan is looking for alternatives. Let's ask the question like this: Replacing outb_p() with outb(); udelay(x); How large does 'x' need to be for these megaraid controllers? > Thanks. > > Bo Yang > > -Original Message- > From: Alan Cox [mailto:[EMAIL PROTECTED] > Sent: Friday, January 18, 2008 7:36 AM > To: linux-scsi@vger.kernel.org; DL-MegaRAID Linux > Subject: Megaraid: Use of outb_p > > > I notice the MegaRAID driver uses outb_p. Can someone at LSI confirm > that the delays between each I/O are required, and if so how long they > must be. > > I'm trying to sort out the use of in/outb_p and where it is unneccessary > or used for non ISA devices. > > (Please cc me on the reply) > Alan > > - > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] A SCSI fault injection framework using SystemTap.
On Tue, Jan 15, 2008 at 12:04:09PM +0900, K.Tanaka wrote: > I would like to introduce a SCSI fault injection framework using SystemTap. > > Currently, kernel has Fault-injection framework and Faulty mode for md, > which can also be used for testing the error handling. But, they could > only produce fixed type of errors stochastically. In order to simulate > more realistic scsi disk faults, I have created a new flexible fault > injection > framework using SystemTap. How does it compare to using scsi_debug, which I believe can do all of the above and more? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Convert drivers/scsi/ch.c to use unlocked_ioctl
On Mon, Jan 14, 2008 at 03:31:17PM +0100, Mathieu Segaud wrote: > + .owner = THIS_MODULE, > + .open = ch_open, > + .release= ch_release, > + .unlocked_ioctl = ch_ioctl, Do you have a tab setting that's not 8? You seem to have used one tab too many for the first three lines. Please correct that one, and then add my: Reviewed-by: Matthew Wilcox <[EMAIL PROTECTED]> Thanks! -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Convert drivers/scsi/ch.c to use unlocked_ioctl
On Mon, Jan 14, 2008 at 02:32:13PM +0100, Mathieu Segaud wrote: > +#include You don't add any uses of lock_kernel() and there are none in the driver currently. > - .owner= THIS_MODULE, > - .open = ch_open, > - .release = ch_release, > - .ioctl= ch_ioctl, > + .owner = THIS_MODULE, > + .open = ch_open, > + .release= ch_release, > + .unlocked_ioctl = ch_ioctl, If you're going to do the gratuitous reformatting, at least use tabs instead of spaces. Other than that, should be fine. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] sym53_8xx_2: fixes two bugs related to chip reset
On Fri, Jan 11, 2008 at 09:50:46PM +0100, Krzysztof Helt wrote: > + BUG_ON(!sym_data->io_reset); > + sym_data->io_reset = &eh_done; Isn't that BUG_ON still the wrong sense? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 1/1] Switch ioctl functions of drivers/scsi/sg.c to unlocked_ioctl
On Thu, Jan 10, 2008 at 08:32:27PM +0100, Andi Kleen wrote: > Not many really in the core kernel. Hardly any. Grep for > lock_kernel to be sure, but there is not much. > > It's mostly drivers that still need it. > > How about the low level SCSI drivers that might called from the high > level SCSI code? > > Anyways starting these kinds of discussions was the goal of the proposal. > Even if Andre's patch ends up not being merged it would have reached > its goal. If your goal is to get rid of the BKL everywhere, sure. It's not clear to me this is the most productive way of spending our time though. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 1/1] Switch ioctl functions of drivers/scsi/sg.c to unlocked_ioctl
On Thu, Jan 10, 2008 at 07:59:44PM +0100, Andi Kleen wrote: > > Really, all this is doing is open coding what the ioctl handler is doing > > anyway, isn't it? in which case, why bother to change it at all? > > Because once it's open coded it is visible and can then be eliminated. > Does SCSI need the BKL at all? > > But perhaps for such a long ioctl handler it would be better to move > the lock/unlock_kernel()s into the individual case ...: statements; > then it could be eliminated step by step. This style of conversion is going to cause a lot of churn -- re-architecting this function to be single-exit, then presumably when the lock_kernel calls are pushed further down or eliminated, turning it back into a multiple-exit function. I suggest that for complex ioctl handlers, it be left to the maintainers to handle, and handle it properly all at once, rather than a gradual pushdown. You could argue that unlocked_ioctl has been around for a long time and people haven't made that move yet. But there's been no pressure before now to do so, and I think people would rather convert their own code than have somebody else do it. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: memory allocation in sg_io()
On Thu, Jan 10, 2008 at 02:19:08PM +0100, Oliver Neukum wrote: > Am Donnerstag, 10. Januar 2008 14:05:25 schrieb Boaz Harrosh: > > On Thu, Jan 10 2008 at 14:33 +0200, Oliver Neukum <[EMAIL PROTECTED]> wrote: > > > Hi, > > > > > > could you explain to me why this code can get away with allocating the > > > sense buffer on the stack? > > > > > > static int sg_io(struct file *file, struct request_queue *q, > > > struct gendisk *bd_disk, struct sg_io_hdr *hdr) > > > { > > > unsigned long start_time; > > > int writing = 0, ret = 0, has_write_perm = 0; > > > struct request *rq; > > > char sense[SCSI_SENSE_BUFFERSIZE]; > > Yes, you are doing DMA on the stack. Nobody does DMA to rq->sense. ub does a memcpy to it. cciss does a memcpy to it. scsi_lib assigns its own buffer to req->sense, ignoring the one passed down. That's how this code gets away with it. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sata_nv does not function in kernel > 2.6.20.21
On Thu, Jan 10, 2008 at 02:47:33AM +, Matthew Hall wrote: > I am using the Supermicro H8DCE motherboard. Some (not all) of the SATA > channels quit working due to some kind of resource conflict when I > upgrade to any kernel above 2.6.20.xx series, in my case I am running > 2.6.20.21 SMP x86_64 presently. Could you provide an 'lspci -v' and a 'cat /proc/iomem' for both kernels please? This doesn't seem to be a scsi problem per se, so I'm adding linux-kernel and linux-ide -- can you drop linux-scsi from any reply please. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AHCI finds disks; no /dev/sd inodes bound?
On Wed, Jan 09, 2008 at 12:36:26PM -0800, Jon Watte wrote: > Stefan Richter wrote: > >>Those systems (servers) typically have enough memory to tolerate a few > >>extra KB of code without problems. In fact most PCs these days have. > >> > > > >It would be a stupid solution nevertheless. > > > >(We also don't "select EXT3".) > > > Not selecting EXT3 is a little more understandable, because there are > many options -- cramfs, xfs, reiserfs, etc, depending on target. > However, the number of people who DO want SATA support but DO NOT want > SD block device support is... uh.. anyone? > > Solving the problem bigger and better, by factoring "SD" into a > mid-level menu, and maybe calling it something non-SCSI, would probably > be even better. And even more work. OK, how about this? config BLK_DEV_ATA_SD tristate "ATA disc support" select BLK_DEV_SD config BLK_DEV_ATA_SR tristate "ATA CDROM support" select BLK_DEV_SR Help text left as an exercise for the reader. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Number of devices that SCSI can support
On Wed, Jan 09, 2008 at 09:05:52AM -0600, James Bottomley wrote: > On Tue, 2008-01-08 at 19:22 -0700, Matthew Wilcox wrote: > > On Tue, Jan 08, 2008 at 04:55:46PM -0800, Vinay Venkataraghavan wrote: > > > Is there a limit on the number of devices that SCSI supports. In other > > > words, I have a QLogic HBA card, and I am connecting to a SAN which has > > > 64 targets. > > > > I've personally had over five hundred LUNs. You shouldn't be hitting a > > limit here. > > I believe the largest test that's been run was the old OSDL CGL > workgroup ... they went up to 4096. > > However, LUN support depends on the driver and HBA parameters as well > (some choose to have arbitrary limits). I was using a qlogic HBA for my tests, so I don't think this is the problem -- although the original poster claims to have 64 targets, and I had only one target with 128 luns (attached 4 times). -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Number of devices that SCSI can support
On Tue, Jan 08, 2008 at 04:55:46PM -0800, Vinay Venkataraghavan wrote: > Is there a limit on the number of devices that SCSI supports. In other words, > I have a QLogic HBA card, and I am connecting to a SAN which has 64 targets. I've personally had over five hundred LUNs. You shouldn't be hitting a limit here. > I am running 2.6.9-42. That sounds like a Red Hat Enterprise kernel. Perhaps you should contact them for support? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] aacraid: big endian issues
On Tue, Jan 08, 2008 at 01:17:15PM -0800, Grant Grundler wrote: > Why apply le32_to_cpu() to the constant instead of the variable? > On systems were le32_to_cpu() is doing something, can gcc or > preprocessor optimize the constant? > I've always assumed it could not but that might be wrong. $ grep constant_p include/linux/byteorder/* include/linux/byteorder/swabb.h:(__builtin_constant_p((__u32)(x)) ? \ include/linux/byteorder/swabb.h:(__builtin_constant_p((__u32)(x)) ? \ include/linux/byteorder/swab.h:(__builtin_constant_p((__u16)(x)) ? \ include/linux/byteorder/swab.h:(__builtin_constant_p((__u32)(x)) ? \ include/linux/byteorder/swab.h:(__builtin_constant_p((__u64)(x)) ? \ -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Buggy Firewire bridge 'Prolific PL3507'
On Tue, Jan 08, 2008 at 06:56:11PM +0100, Stefan Richter wrote: > It's hald or something like that. Can't we get hald to look at the cached results of the inquiry command, handily available through sysfs? > Matthew Wilcox wrote: > > I have a hard time thinking this needs to be handled generically > > because READ CAPACITY isn't in the Primary Command Set, unlike INQUIRY. > > It's mandatory for SBC devices to implement it, but other devices > > shouldn't even implement it. > > It is also mandatory in RBC, which most SBP-2 to IDE or SBP-2 to SATA > bridges implement (besides MMC, if a CD/DVD-ROM/R/W is attached). According to SPC4r05a, READ CAPACITY(10) (opcode 25) is Mandatory for D, W and O. READ CAPACITY (opcode 25) is Optional for R. READ CARD CAPACITY (also opcode 25) is Mandatory for K. D - DIRECT ACCESS BLOCK DEVICE (SBC-3) W - WRITE ONCE BLOCK DEVICE (SBC) O - OPTICAL MEMORY BLOCK DEVICE (SBC) R - CD/DVD DEVICE (MMC-6) K - OPTICAL CARD READER/WRITER DEVICE (OCRW) According to this table (D.2 for anyone following along at home), RBC devices are not supposed to implement it. > I would not at all like to see a respective workaround added to the sbp2 > driver. Sbp2 is a transport layer which does not touch commands. Maybe > an in-kernel workaround would be considered by whoever maintains the > point where that extra INQUIRY is injected (via sd or sg, I don't know > which). It would be thoroughly inappropriate to put this workaround in the midlayer. sbp2 isn't a good place either, but it's better than sg/sd/scsi_ioctl/... -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Buggy Firewire bridge 'Prolific PL3507'
On Tue, Jan 08, 2008 at 02:43:34PM +0100, Hannes Reinecke wrote: > Hi all, > > I have here a buggy firewire bridge (Prolific PL3507) which requires that > each 'INQUIRY' command is followed by a 'READ CAPACITY' command. Otherwise > any read will return invalid data (the payload is preceded by 36 empty bytes). > > How to fix this? Sure one could add a hack to sbp2.c to always issue a READ > CAPACITY after INQUIRY, but this somehow feels wrong ... There's only one place in the scsi stack that issues INQUIRY -- scsi_scan.c [1]. sd is going to issue READ CAPACITY before it issues any READ commands, so I don't see where you're having this problem. Is it with some program issuing INQUIRY through SG_IO or something? I have a hard time thinking this needs to be handled generically because READ CAPACITY isn't in the Primary Command Set, unlike INQUIRY. It's mandatory for SBC devices to implement it, but other devices shouldn't even implement it. My opinion is that it needs to be handled either in the application issuing sg commands or in sbp2. [1] yes, there are others, such as drivers and scsi_transport_spi, but none relevant to sbp2. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: An MCA ESP driver
On Sat, Jan 05, 2008 at 03:18:12PM -0600, James Bottomley wrote: > Just a check up on this: Matthew were you ever going to complete the > mca_94x conversion? It's quite topical because it would be another > example driver for the m68k people to look at. > > If not, I can probably complete the bits you haven't yet done, but it > would be nice to know. I wasn't planning on completing the conversion, no. I don't have the hardware myself, so can't test it. I started working on it when I was between jobs, and once I found gainful employment, the amount of free time I have to hack on obscure hardware that I don't have declined to nil ;-) Thanks for volunteering to complete the conversion. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bad escriptions in Kconfig
On Mon, Dec 31, 2007 at 10:16:43AM -0500, Douglas Gilbert wrote: > Bodo Eggert wrote: > > --- > > SCSI target support (SCSI_TGT) [N/m/y/?] (NEW) ? > > > > If you want to use SCSI target mode drivers enable this option. > > If you choose M, the module will be called scsi_tgt. > > --- > > > > What TF is a SCSI target mode, what is a target mode driver? > > Heard of google :-) > > For explanations of SCSI (and other storage) terminology > reference could be made to SAM-3 or SAM-4 drafts (because > the real standards cost money) at www.t10.org . > > Perhaps many other subsections in the kernel could have > similar references. I think that's an appalling idea. Someone's trying to configure their kernel, not research hundreds of new ideas on the internet. Here's a better description: help The SCSI target code allows your computer to appear as a SCSI device. This is useful in a SAN or NAS environment where you want other computers to be able to treat this computer as a disc. To compile this driver as a module, choose M here: the module will be called scsi_tgt. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PROBLEM: BUG: null pointer deref., segfaults
>From the backtrace, this doesn't seem to be a scsi or ide problem. It might be a block-layer bug, or a VM problem. I've cc'd the VM people to see what they think. On Mon, Dec 31, 2007 at 12:06:00AM +0100, Erno Kovacs wrote: > [1.] One line summary of the problem: > > using dd on a broken hdd causes kernel NULL pointer dereference > > > [2.] Full description of the problem/report: > > I have a broken hdd (unreadable sector). While dd-ing it into another same > size hdd, > I get kernel-level error. First time it is a NULL pointer dereference then a > few minutes > later its a BUG in mm/slab.c, no IO operation anymore shown by iostat, and dd > gets > kernel-space process (shown as [dd] by ps). System becomes unstable, > processes get > segfaulted, even reboot is unoperational. > > Dmesg output: > BUG: unable to handle kernel NULL pointer dereference at virtual address > 002a > printing eip: > c016b8ce > *pde = > Oops: 0002 [#1] > Modules linked in: ac battery dm_crypt dm_snapshot dm_mirror dm_mod joydev > tsdev usbhid sd_mod fan rtc evdev thermal processor button psmouse > > serio_raw pcspkr via_agp agpgart i2c_viapro ehci_hcd hpt366 uhci_hcd i2c_core > usbcore sata_sil > CPU:0 > EIP:0060:[]Not tainted VLI > EFLAGS: 00010213 (2.6.23.12 #1) > EIP is at drop_buffers+0x5a/0xd5 > eax: 002e ebx: d6e300f0 ecx: 0026 edx: d6e30118 > esi: c126e140 edi: d6e300f0 ebp: d6e300f0 esp: c149bdd8 > ds: 007b es: 007b fs: gs: ss: 0068 > Process kswapd0 (pid: 163, ti=c149a000 task=dff6f030 task.ti=c149a000) > Stack: c1274860 c1117260 df1152fc c149be00 c126e140 0001 >c149bf84 c016b987 c126e140 df1152fc c0141f72 043e 0011 > c149bf14 0120 0004 0004 0001 c12dc440 > Call Trace: > [] try_to_free_buffers+0x3e/0x6c > [] shrink_page_list+0x414/0x4fc > [] isolate_lru_pages+0x44/0x17f > [] shrink_inactive_list+0xce/0x265 > [] check_preempt_curr_fair+0x52/0x56 > [] shrink_zone+0xbe/0xe2 > [] kswapd+0x251/0x3b8 > [] autoremove_wake_function+0x0/0x35 > [] kswapd+0x0/0x3b8 > [] kthread+0x36/0x5b > [] kthread+0x0/0x5b > [] kernel_thread_helper+0x7/0x10 > === > Code: 14 8b 02 83 e0 06 0b 42 34 74 07 31 c0 e9 8c 00 00 00 8b 52 04 39 fa 75 > d5 89 fb 8b 4b 28 8d 53 28 8b 6b 04 39 d1 74 53 8b 42 04 <89> 41 04 89 > > 08 89 52 04 83 7b 30 00 89 53 28 75 29 c7 44 24 0c > EIP: [] drop_buffers+0x5a/0xd5 SS:ESP 0068:c149bdd8 > > then later > > [ cut here ] > kernel BUG at mm/slab.c:2983! > invalid opcode: [#2] > Modules linked in: ac battery dm_crypt dm_snapshot dm_mirror dm_mod joydev > tsdev usbhid sd_mod fan rtc evdev thermal processor button psmouse > > serio_raw pcspkr via_agp agpgart i2c_viapro ehci_hcd hpt366 uhci_hcd i2c_core > usbcore sata_sil > CPU:0 > EIP:0060:[]Tainted: G D VLI > EFLAGS: 00010046 (2.6.23.12 #1) > EIP is at cache_alloc_refill+0xe0/0x3ec > eax: 0043 ebx: 0020 ecx: dffe7da0 edx: dffe7da0 > esi: d4686000 edi: dffedd20 ebp: dffe9200 esp: de49dcc8 > ds: 007b es: 007b fs: gs: 0033 ss: 0068 > Process dd (pid: 1888, ti=de49c000 task=df7dd030 task.ti=de49c000) > Stack: 0043 8050 dffe7da0 001a c0262880 d67d79e0 c013d8d3 >c016b777 c1126900 dffe7da0 0282 8050 c01500e2 c1126900 > 1000 c016b75d c1126900 c016bda5 0001 c1126900 f000 > Call Trace: > [] submit_bio+0xa5/0xac > [] mempool_alloc+0x1c/0x93 > [] alloc_buffer_head+0x2a/0x2e > [] kmem_cache_alloc+0x2b/0x65 > [] alloc_buffer_head+0x10/0x2e > [] alloc_page_buffers+0x2d/0xbb > [] create_empty_buffers+0x10/0x6b > [] block_read_full_page+0x40/0x296 > [] blkdev_get_block+0x0/0x42 > [] __do_page_cache_readahead+0x16c/0x1bf > [] ondemand_readahead+0x48/0xf1 > [] do_generic_mapping_read+0x10d/0x3c8 > [] generic_file_aio_read+0x12d/0x159 > [] file_read_actor+0x0/0xca > [] do_sync_read+0xc6/0x109 > [] handle_IRQ_event+0x1a/0x3f > [] autoremove_wake_function+0x0/0x35 > [] clockevents_program_event+0x9c/0xa3 > [] do_sync_read+0x0/0x109 > [] vfs_read+0xa6/0x128 > [] sys_read+0x41/0x67 > [] sysenter_past_esp+0x5f/0x85 > === > Code: be 00 00 00 8b 37 39 fe 75 15 8b 77 10 8d 47 10 c7 47 30 01 00 00 00 39 > c6 0f 84 9a 00 00 00 8b 54 24 08 8b 42 1c 39 46 10 72 2d <0f> 0b eb fe > > 8b 44 24 08 8b 5e 14 8b 4d 00 8b 50 10 8b 04 24 0f > EIP: [] cache_alloc_refill+0xe0/0x3ec SS:ESP 0068:de49dcc8 > > > > [3.] Keywords (i.e., modules, networking, kernel): > > dd null pointer dereference segfault > > > [4.] Kernel version (from /proc/version): > > Linux version 2.6.23.12 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 > (prerelease) (Debian 4.1.1-21)) #1 Thu Dec 27 13:22:18 CET 2007 > > > [5.] Output of Oops.. message (if applicable) with symbolic information > resolved (see Documentatio
Re: [PATCH 2/2] scsi: Use new __dma_buffer to align sense buffer in scsi_cmnd
On Fri, Dec 21, 2007 at 02:30:28PM +0100, Thomas Bogendoerfer wrote: > there are SCSI host drivers, which also DMA to the sense buffer like > sgiwd93.c for example. Yes ... and there are others which don't, for example qla2xxx and sym53c8xx. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] scsi: Use new __dma_buffer to align sense buffer in scsi_cmnd
On Fri, Dec 21, 2007 at 10:33:26AM +, Alan Cox wrote: > On Fri, 21 Dec 2007 13:30:08 +1100 > Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote: > > > The sense buffer ins scsi_cmnd can nowadays be DMA'ed into directly > > by some low level drivers (that typically happens with USB mass > > storage). > > Should that not be fixed in USB storage by using pci_alloc_coherent on the > PCI device of the hub not peeing directly into kernel space ? That's what I said, but Ben seems fixated on this particular fix. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix page_alloc for larger I/O segments (improved)
On Fri, Dec 14, 2007 at 11:13:40AM -0700, Matthew Wilcox wrote: > I'll send it to our DB team to see if this improves our numbers at all. It does, by approximately 0.67%. This is about double the margin of error, and a significant improvement. Thanks! -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: INITIO scsi driver fails to work properly
On Wed, Dec 19, 2007 at 10:50:40AM -0600, James Bottomley wrote: > So, to get the best of both worlds, file a bugzilla and note the bugid. > Then email a complete report to the relevant list, but add [BUG ] > to the subject line and cc [EMAIL PROTECTED] If you do > this, bugzilla will keep track of the entire discussion as it progresses > and allow those who track bugs through bugzilla to get a pretty accurate > idea of the status. You should never need to touch bugzilla again once > the initial bug report is filed: all future information flow is via the > mailing lists. The problem is that it appears to the casual observer as if they can then add information to the bug through the web interface. But that information will never be forwarded to the mailing list. Unless there's a way of marking bugs as 'unchangable through the web interface' or 'all messages appended to this bug need to be forwarded', Bugzilla just doesn't fit our needs. The Debian BTS fits our way of working much better. Perhaps somebody should investigate a migration. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: INITIO scsi driver fails to work properly
On Wed, Dec 19, 2007 at 10:48:27AM +0200, Filippos Papadopoulos wrote: > Would it be better to open a bug report at bugzilla? No, it wouldn't. Bugzilla is a place where bug reports go to be ignored. Witness 9370 where despite my best efforts to move discussion to the mailing list, it's been thoroughly ignored because the original reporte insists on posting additional information there instead of to the mailing list. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html