On 07/25/2014 09:28 PM, scame...@beardog.cce.hp.com wrote:
>
> hpsa: Work In Progress: "lockless monster" patches
>
> To be clear, I am not suggesting that these patches be merged at this time.
>
> This patchset is vs. Christoph Hellwig's scsi-mq.4 branch which
> may be found here: git://git.infradead.org/users/hch/scsi.git
>
> We've been working for a long time on a patchset for hpsa to remove
> all the locks from the main i/o path in pursuit of high IOPS.  Some
> of that work is already upstream, but a lot more of it is not quite
> yet ready to be merged.  However, I think we've "gone dark" for a bit
> too long on this, and even though the patches aren't really ready to
> be merged just yet, I thought I should let other people who might be
> interested have a look anyway, as things are starting to be at least
> more stable than unstable.  Be warned though, there are still some
> problems, esp. around error recovery.
>
> That being said, with the right hardware (fast SAS SSDs, recent Smart
> Arrays e.g. P430, with up-to-date firmware, attached to recent disk 
> enclosures)
> with these patches and the scsi-mq patches, it is possible to get around
> ~970k IOPS from a single logical drive on a single controller.
>
> There are about 150 patches in this set.  Rather than bomb the list
> with that, here is a link to a tarball of the patches in the form of
> an stgit patch series:

Hi Stephen,
I've looked only at the alloc functions - I'm not sure if I got it right,
it seems it uses the same cmd_pool for both alloc function,
from (0 - reserved_cmds) it's cmd_alloc and above that it's handled
by cmd_tagged_alloc.
The logic in both functions seems to be valid for me.

In cmd_tagged_alloc "Thus, there should never be a collision here between
two requests" if this is true you don't need the refcount and just a simple
flag were sufficient for your other needs. (And maybe you get to ~971k IOPS..)

cmd_alloc some minor changes are possible 
- try to find free entries only to reserved_cmds
  (the bitmap might be shorter too)
- next thread should start + 1 from the last result

- when nr_cmds is read from hw it should be tested (nr_cmds > reserved_cmds)
- what is the correct pronunciation of benignhly?
@@ -5634,7 +5634,8 @@ static struct CommandList *cmd_alloc(struct ctlr_info *h)
 
        offset = h->last_allocation; /* benighly racy */
        for (;;) {
-               i = find_next_zero_bit(h->cmd_pool_bits, h->nr_cmds, offset);
+               offset %=  h->reserved_cmds;
+               i = find_next_zero_bit(h->cmd_pool_bits, h->reserved_cmds, 
offset);
                if (unlikely(i >= h->reserved_cmds)) {
                        offset = 0;
                        continue;
@@ -5642,15 +5643,16 @@ static struct CommandList *cmd_alloc(struct ctlr_info 
*h)
                c = h->cmd_pool + i;
                refcount = atomic_inc_return(&c->refcount);
                if (unlikely(refcount > 1)) {
-                       cmd_free(h, c); /* already in use */
-                       offset = (i + 1) % h->nr_cmds;
+                       atomic_dec(&c->refcount); /* already in use, since we
+                        don't own the command just decrease the refcount */
+                       offset = i + 1;
                        continue;
                }
                set_bit(i & (BITS_PER_LONG - 1),
                        h->cmd_pool_bits + (i / BITS_PER_LONG));
                break; /* it's ours now. */
        }
-       h->last_allocation = i; /* benignly racy */
+       h->last_allocation = i + 1; /* benignly racy */
        hpsa_cmd_partial_init(h, i, c);
        return c;
 }
@@ -5670,6 +5672,7 @@ static void cmd_free(struct ctlr_info *h, struct 
CommandList *c)
                clear_bit(i & (BITS_PER_LONG - 1),
                          h->cmd_pool_bits + (i / BITS_PER_LONG));
        }
+       else ; /*BUG ? or just atomic_dec and no if */
 }
------------------------------------------------------------------------

And please have a look at "[PATCH] hpsa: refine the pci enble/disable handling"
I posted in June and add it to your series or review in the list.

Thanks,
Tomas


>
> https://github.com/smcameron/hpsa-lockless-patches-work-in-progress/blob/master/hpsa-lockless-vs-hch-scsi-mq.4-2014-07-25-1415CDT.tar.bz2?raw=true
>
> In some cases, I have probably erred on the side of having too many too
> small patches, on the theory that it is easier to bake a cake than to 
> unbake a cake.  Before these are submitted "for reals", there will be
> some squashing of patches, along with other cleaning up.
>
> There are some patches in this set which are already upstream in
> James's tree which do not happen to be in Christoph's tree
> (most of which are named "*_sent_to_james").  There are also
> quite a few patches which are strictly for debugging and are not
> ever intended to be merged. 
>
>
> Here is a list of all the patches sorted by author:
>
> Arnd Bergmann (1):
>   [SCSI] hpsa: fix non-x86 builds
>
> Christoph Hellwig (1):
>   reserve block tags in scsi host
>
> Joe Handzik (9):
>   hpsa: add masked physical devices into h->dev[] array
>   hpsa: add hpsa_bmic_id_physical_device function
>   hpsa: get queue depth for physical devices
>   hpsa: use ioaccel2 path to submit IOs to physical drives in HBA mode.
>   hpsa: Get queue depth from identify physical bmic for physical disks.
>   hpsa: add ioaccel sg chaining for the ioaccel2 path
>   Set the phys_disk value for a CommandList structure that is
>     submitted. Squash
>   hpsa: unmap ioaccel2 commands before, not after adding to resubmit
>     workqueue
>   hpsa: add more ioaccel2 error handling, including underrun statuses.
>
> Nicholas Bellinger (1):
>   hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation
>
> Robert Elliott (44):
>   Change scsi.c scsi_log_completion() to print strings for QUEUED,
>   Set scsi_logging_level to be more verbose to get better messages
>   hpsa: do not unconditionally copy sense data
>   hpsa: optimize cmd_alloc function by remembering last allocation
>   From a154642aeed291c7cfe4b9ea9da932156030f7d1 Mon Sep 17 00:00:00
>     2001
>   hpsa: Clean up host, channel, target, lun prints
>   hpsa: do not check cmd_alloc return value - it cannnot return NULL
>   hpsa: return -1 rather than -ENOMEM in hpsa_get_raid_map if fill_cmd
>     fails
>   hpsa: return -ENOMEM not 1 from ioaccel2_alloc_cmds_and_bft on
>     allocation failure
>   hpsa: return -ENOMEM not 1 from hpsa_alloc_ioaccel_cmd_and_bft on
>     allocation failure
>   hpsa: return -ENOMEM not -EFAULT from hpsa_passthru_ioctl on kmalloc
>     failure
>   hpsa: pass error from pci_set_consistent_dma_mask intact from
>     hpsa_message
>   hpsa: detect and report failures changing controller transport modes
>   hpsa: add hpsa_disable_interrupt_mode
>   hpsa: rename free_irqs to hpsa_free_irqs and move before
>     hpsa_request_irq
>   hpsa: rename hpsa_request_irq to hpsa_request_irqs
>   hpsa: on failure of request_irq, free the irqs that we did get
>   hpsa: make hpsa_pci_init disable interrupts and pci_disable_device on
>     critical failures
>   hpsa: fix a string constant broken into two lines
>   hpsa: avoid unneccesary calls to resource freeing functions in
>     hpsa_init_one
>   hpsa: add hpsa_free_cfgtables function to undo what
>     hpsa_find_cfgtables does
>   hpsa: report failure to ioremap config table
>   hpsa: add hpsa_free_pci_init function
>   hpsa: clean up error handling in hpsa_pci_init
>   hpsa: make hpsa_remove_one use hpsa_pci_free_init
>   hpsa: rename hpsa_alloc_ioaccel_cmd_and_bft to
>     hpsa_alloc_ioaccel1_cmd_and_bft
>   hpsa: rename hpsa_allocate_cmd_pool to hpsa_alloc_cmd_pool
>   hpsa: rename ioaccel2_alloc_cmds_and_bft to
>     hpsa_alloc_ioaccel2_cmd_and_bft
>   hpsa: fix memory leak in hpsa_alloc_cmd_pool
>   hpsa: return status not void from hpsa_put_ctlr_into_performant_mode
>   hpsa: clean up error handling in hpsa_init_one
>   hpsa: report allocation failures while allocating SG chain blocks
>   hpsa: rename hpsa_allocate_sg_chain_blocks to
>     hpsa_alloc_sg_chain_blocks
>   hpsa: improve allocation failure messages from hpsa_init_one
>   hpsa: fix minor if-statement style issue in hpsa_init_one
>   hpsa: fix wrong indent level in hpsa_init_one
>   Add hpsa_free_performant_mode and call it from hpsa_init_one
>   hpsa: shorten the wait for the CISS doorbell mode change
>     acknowledgement
>   hpsa: clean up some error reporting output in abort handler
>   hpsa: try to detect controller lockup in abort handler
>   hpsa: Return DID_NO_CONNECT rather than DID_ERR on lockup
>   hpsa: check for lockup on all simple_cmd submissions
>   hpsa: Support 64-bit lun values in printks
>   hpsa: do not print ioaccel2 warning messages about unusual
>     completions.
>
> Stephen M. Cameron (95):
>   scsi: break from queue depth adjusting loops when device found
>   hpsa: remove online devices from offline device list
>   hpsa: fix bad -ENOMEM return value in hpsa_big_passthru_ioctl
>   hpsa: make hpsa_init_one return -ENOMEM if allocation of
>     h->lockup_detected fails
>   hpsa: fix 6-byte READ/WRITE with 0 length data xfer
>   hpsa: remove spin lock around command allocation
>   hpsa: use atomics for commands_outstanding instead of spin locks
>   hpsa: reserve some commands for use by driver
>   hpsa: get rid of cmd_special_alloc and cmd_special_free
>   hpsa: do not queue commands internally in driver
>   hpsa: remove unused interrupt handler code
>   hpsa: remove direct lookup bit from command tags
>   hpsa: fix race between abort handler and main i/o path
>   hpsa: allow lockup detector to be turned off
>   hpsa: do not request device rescan on every ioaccel path error
>   hpsa: factor out hpsa_ciss_submit function
>   hpsa: use workqueue to resubmit failed ioaccel commands
>   hpsa: use driver instead of system work queue for command
>     resubmission
>   hpsa: remove 'action required' phrasing
>   hpsa: Count passthru cmds with atomics, not a spin locked int
>   hpsa: remove unused fifo_full function
>   hpsa: slightly optimize SA5_performant_completed
>   hpsa: do not check for msi(x) in interrupt_pending
>   hpsa: fix fail_all_outstanding_cmds to use reference counting
>   hpsa: fix endianness issue with scatter gather elements
>   hpsa: get rid of type/attribute/direction bit field where possible
>   hpsa: do not use function pointers in fast path command submission
>   hpsa: do not wait for aborted commands to complete after abort
>   hpsa: fix hpsa_drain_accel_commands to use reference counting instead
>     of bitmap
>   hpsa: fix race when updating raid map data.
>   hpsa: allow commands to be completed on specified reply queue
>   hpsa: fix completion race in abort handler
>   hpsa: fix aborting of reused commands
>   hpsa: lay groundwork for handling driver initiated command timeouts
>   hpsa: do not free command twice in hpsa_get_raid_map
>   hpsa: do not send aborts to logical devices that do not support
>     aborts
>   hpsa: remember whether devices support aborts across rescans
>   hpsa: do not send two aborts with swizzled tags.
>   hpsa: decrement h->commands_outstanding in fail_all_outstanding_cmds
>   hpsa: flush work queue hpsa_wq in fail_all_outstanding_cmds
>   hpsa: use per controller not per driver work queue for command
>     resubmission
>   hpsa: fix allocation sizes for CISS_REPORT_LUNs commands
>   hpsa: always use extended report physical LUNs for physical devices
>   hpsa: prevent manually adding masked physical devices
>   hpsa: fix values for expose status
>   hpsa: avoid unnecessary io in hpsa_get_pdisk_of_ioaccel2()
>   hpsa: do not be so noisy about check conditions
>   hpsa: handle descriptor format sense data where needed
>   hpsa: print CDBs instead of kernel virtual addresses for uncommon
>     errors
>   hpsa: handle Command Status TMF function status
>   hpsa: do not use a void pointer for scsi_cmd field of struct
>     CommandList
>   hpsa: return FAILED from abort handler if controller locked up.
>   hpsa: if controller locked up return failed from device reset handler
>   hpsa: check for ctlr lockup after command allocation in main io path
>   hpsa: mark masked devices as masked in output
>   hpsa: limit number of concurrent abort requests
>   hpsa: separate lockup detection and device rescanning
>   hpsa: factor out hpsa_init_cmd function
>   hpsa: reinitialize commands on resubmitting failed ioaccel commands
>   hpsa: fully initialize commands once at driver load not every time
>   hpsa: change driver version to 3.4.6
>   hpsa: allow lockup detected to be viewed via sysfs
>   hpsa: do not ignore return value of hpsa_register_scsi
>   hpsa: DEBUG ONLY - allow triggering command resubmitting via sysfs
>   hpsa: try resubmitting down raid path on task set full
>   hpsa: factor out hpsa_ioaccel_submit function
>   hpsa: try resubmitting down ioaccelerated path on task set full
>   hpsa: honor queue depth of physical devices
>   hpsa: use cancel_delayed_work_sync where needed
>   hpsa: clean up queue depth getting code
>   hpsa: try to fix ioaccel command accounting code
>   hpsa: fix hpsa_figure_phys_disk_ptrs to handle layout_map_count other
>     than 1
>   hpsa: close race in ioaccel_cmds_out accounting
>   hpsa: guard against overflowing raid map array
>   hpsa: test ioaccel_cmds_out vs. queue depth earlier to avoid wasting
>     time
>   hpsa: DEBUG ONLY - fix missing atomic-dec of ioaccel_cmds_out in
>     debug command resubmission code
>   hpsa: fix missing atomic_dec of dev->ioaccel_cmds_out
>   hpsa: do not ack controller events on controllers that do not support
>     it
>   hpsa: fix code that updates h->dev[]->phys_disk[]
>   hpsa: remove bug-on that triggers in hba mode by mistake
>   hpsa: remove unused temp64 variable from hpsa_cmd_init
>   hpsa bump driver version again
>   hpsa break hpsa_free_irqs_and_disable_msix into two functions
>   separate hpsa_free_cmd_pool into three separate function
>   hpsa: fix missing return in hpsa_command_resubmit_worker
>   hpsa: fix command leaks in hpsa_resubmit_command_worker
>   hpsa: fix bad interpretations of hpsa_ioaccel_submit return value
>   hpsa: do not touch phys_disk[] for ioaccel enabled logical drives
>     during rescan
>   hpsa: add change_queue_type function
>   hpsa: make hpsa_change_queue_depth handle reason SCSI_QDEPTH_QFULL
>   hpsa: initialize queue depth of logical drives reasonably
>   hpsa: attempt to fix queue depth calculations for logical drives
>   hpsa: add support sending aborts to physical devices via the ioaccel2
>     path
>   hpsa: fix problem with abort support checking happening too early
>   hpsa: remove incorrect bug on in hpsa_scsi_ioaccel_raid_map
>
> Webb Scales (3):
>   Initialize reference counts to zero on initial command entry
>     allocation
>   hpsa: make hpsa_send_reset_as_abort_ioaccel2() use the specified
>     reply queue.
>   hpsa:  use block layer tag for command allocation
>
>  drivers/scsi/hpsa.c       | 3642 
> ++++++++++++++++++++++++++++-----------------
>  drivers/scsi/hpsa.h       |  103 +-
>  drivers/scsi/hpsa_cmd.h   |  222 +++-
>  drivers/scsi/scsi.c       |   64 +-
>  drivers/scsi/scsi_error.c |    2 +
>  drivers/scsi/scsi_lib.c   |    1 +
>  include/scsi/scsi_host.h  |    9 +
>  7 files changed, 2561 insertions(+), 1482 deletions(-)
>
> -- steve
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to