Re: [PATCH] scsi disk: Use its own buffer for the vpd request

2013-08-03 Thread Nick Alcock
On 1 Aug 2013, Bernd Schubert stated:

 Once I noticed that scsi_get_vpd_page() works fine from other function
 calls and that it is not 0x89, but already 0x0 that fails, fixing it became
 easy.

 Nix, any chance you could verify it also works for you?

Confirmed, thank you!

 Somehow older Areca firmware versions have issues with
 scsi_get_vpd_page() and a large buffer.

I wonder if they're using math modulo SD_BUF_SIZE-1 by mistake, so they
misinterpret this as zero? (Still, doing math modulo 511 seems very
odd, even if this firmware *does* only support 512-byte sectors.)
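
As a quick check of that arithmetic, here is a minimal sketch. It assumes
SD_BUF_SIZE is 512 (its value in drivers/scsi/sd.h); the function name is
made up for illustration and says nothing about what the firmware really
does. Under this guess, only a length that is a multiple of 511 would be
seen as zero:

```python
SD_BUF_SIZE = 512  # assumption: the value defined in drivers/scsi/sd.h


def length_seen_by_firmware(requested_len, modulus=SD_BUF_SIZE - 1):
    """Hypothetical: the length a firmware would see if it wrongly
    reduced the requested length modulo SD_BUF_SIZE - 1 (511)."""
    return requested_len % modulus


# Print what a few request lengths would turn into under this guess.
for requested in (64, 255, 511, 512):
    print(requested, "->", length_seen_by_firmware(requested))
```

So a 511-byte request would collapse to zero under this guess, while a
512-byte one would be seen as a single byte.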

-- 
NULL  (void)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi disk: Use its own buffer for the vpd request

2013-08-02 Thread Nick Alcock
On 1 Aug 2013, Bernd Schubert told this:

 Once I noticed that scsi_get_vpd_page() works fine from other function
 calls and that it is not 0x89, but already 0x0 that fails, fixing it became
 easy.

 Nix, any chance you could verify it also works for you?

Sorry for the delay: it's hard for me to verify this during the working
week.

I'll check it tomorrow -- after I've run a backup! :} (why yes, bugs of
this nature do frighten me a bit. I know it's superstition, but I'm
always wondering whether the SCSI controller will come back again
whenever that post-error bus reset happens.)

-- 
NULL  (void)


Re: [SCSI REGRESSION] 3.10.2 or 3.10.3: arcmsr failure at bootup / early userspace transition

2013-07-30 Thread Nick Alcock
On 30 Jul 2013, Bernd Schubert told this:

 On 07/30/2013 01:34 AM, Martin K. Petersen wrote:
 (wheezy)fslab1:~# sg_inq -v /dev/sdc
 inquiry cdb: 12 00 00 00 24 00
 standard INQUIRY:
 inquiry cdb: 12 00 00 00 60 00
   PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
   [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=2
   SCCS=0  ACC=0  TPGS=0  3PC=0  Protect=0  BQue=0
   EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=1
   [RelAdr=0]  WBus16=1  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
   [SPI: Clocking=0x3  QAS=0  IUS=0]
 length=96 (0x60)   Peripheral device type: disk
  Vendor identification: Hitachi
  Product identification: HDS724040KLSA80
  Product revision level: R001
 inquiry cdb: 12 01 00 00 fc 00
 inquiry cdb: 12 01 80 00 fc 00
  Unit serial number: KRFS2CRAHXJZVD

 Besides the firmware, the difference might be that I'm exporting single disks
 without any areca-raidset in between.
 I can try to confirm that tomorrow; I just need the system as it is until
 tomorrow noon.

Aaah. Yeah, it looks like in JBOD mode it's just passing things straight
on to the disk: that vendor ID is a dead giveaway. For all I know my
earlier firmware does the same, but for obvious reasons I can't really
test that! Quite possibly it's passing *everything* on to the disk,
including all SCSI commands, in which case we don't actually know that
your Areca controller supports the VPD page we thought it did: quite
possibly only this underlying disk does.
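
For anyone reading the sg_inq trace above, the INQUIRY CDBs can be decoded
with a minimal sketch like this one. It assumes the standard 6-byte INQUIRY
layout from SPC (opcode 0x12, EVPD in bit 0 of byte 1, page code in byte 2,
big-endian allocation length in bytes 3-4); decode_inquiry_cdb is a
hypothetical helper, not part of sg3_utils.

```python
def decode_inquiry_cdb(hex_bytes):
    """Decode a 6-byte INQUIRY CDB given as hex, e.g. '12 01 80 00 fc 00'."""
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) != 6 or cdb[0] != 0x12:
        raise ValueError("not a 6-byte INQUIRY CDB")
    return {
        "evpd": bool(cdb[1] & 0x01),                  # 1 = VPD page request
        "page_code": cdb[2],                          # only meaningful if evpd
        "allocation_length": (cdb[3] << 8) | cdb[4],  # big-endian, bytes 3-4
    }


# The four CDBs shown in the trace above.
for line in ("12 00 00 00 24 00", "12 00 00 00 60 00",
             "12 01 00 00 fc 00", "12 01 80 00 fc 00"):
    print(line, decode_inquiry_cdb(line))
```

On that trace, the first two CDBs are standard INQUIRYs for 36 and 96
bytes, and the last two are EVPD requests for pages 0x00 and 0x80 with a
252-byte allocation length. Nothing in the trace asks for page 0x89.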

You can get a degree of info on the underlying disks in the array even
if it's in RAID mode -- smartctl does it, for instance -- but it takes
Areca-specific code and chattering to the sg devices directly. I bet
that in JBOD mode, the sg device is the only exposure the controller has
to the world, and *all* the /dev/sd* devices are just passthroughs.

-- 
NULL  (void)


[SCSI REGRESSION] 3.10.2 or 3.10.3: arcmsr failure at bootup / early userspace transition

2013-07-29 Thread Nick Alcock
My server's ARC-1210 has been working fine for years, but when I
upgraded from 3.10.1 to 3.10.3, it started failing:

Instead of

[0.784044] Areca RAID Controller0: F/W V1.46 2009-01-06  Model ARC-1210
[0.804028] scsi0 : Areca SATA Host Adapter RAID Controller
 Driver Version 1.20.00.15 2010/08/05
[...]

[4.111770] sd 7:0:0:1: [sdd] Assuming drive cache: write through
[4.115399] sd 7:0:0:1: [sdd] No Caching mode page present
[4.115401] sd 7:0:0:1: [sdd] Assuming drive cache: write through
[4.118081]  sdd: sdd1
[4.124363] sd 7:0:0:1: [sdd] No Caching mode page present
[4.124601] sd 7:0:0:1: [sdd] Assuming drive cache: write through
[4.124867] sd 7:0:0:1: [sdd] Attached SCSI removable disk

I now see the following (timestamps and some of the right edge are
chopped off because they were not captured by my camera; there is no
netconsole, as this machine has all my storage and is my loghost, and
with this bug it can't get at any of that storage):

sd 7:0:0:1: [sdd] Assuming drive cache: write through
sd 7:0:0:1: [sdd] No Caching mode page present
sd 7:0:0:1: [sdd] Assuming drive cache: write through
 sdd: sdd1
sd 7:0:0:1: [sdd] No Caching mode page present
sd 7:0:0:1: [sdd] Assuming drive cache: write through
sd 7:0:0:1: [sdd] Attached SCSI removable disk
arcmsr0: abort device command of scsi id = 0 lun = 1
arcmsr0: abort device command of scsi id = 0 lun = 0
arcmsr: executing bus reset eh.num_resets=0, num_[...]

arcmsr0: wait 'abort all outstanding command' timeout
arcmsr0: executing hw bus reset 
arcmsr0: waiting for hw bus reset return, retry=0
arcmsr0: waiting for hw bus reset return, retry=1
Areca RAID Controller0: F/W V1.46 2009-01-06  Model ARC-1210
arcmsr: scsi  bus reset eh returns with success
[and back to the top of the error messages again, apparently forever,
 not that the machine would be much use without its RAID array even
 if this loop terminated at some point, so I only gave it a couple
 of minutes]

The failure happens precisely at the moment we transition to early
userspace, so presumably userspace I/O is failing (or something related
to raw device access, perhaps, since the first thing it does is a
vgscan).

I haven't bisected yet (sorry, I have work to do which means this
machine must be running right now), but nothing has changed in the
arcmsr driver, nor in SCSI-land excepting

commit 98dcc2946adbe4349ef1ef9b99873b912831edd4
Author: Martin K. Petersen martin.peter...@oracle.com
Date:   Thu Jun 6 22:15:55 2013 -0400

SCSI: sd: Update WRITE SAME heuristics

so my, admittedly largely baseless, suspicions currently fall there.


Obviously, at this point, this machine has no modules loaded (it has
almost none loaded even when fully operational).

.config, unchanged from 3.10.1 to 3.10.3:

CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT=elf64-x86-64
CONFIG_ARCH_DEFCONFIG=arch/x86/configs/x86_64_defconfig
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_CPU_AUTOPROBE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_ARCH_HWEIGHT_CFLAGS=-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=
CONFIG_LOCALVERSION=
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_KERNEL_LZMA=y
CONFIG_DEFAULT_HOSTNAME=spindle
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_FHANDLE=y
CONFIG_AUDIT=y
CONFIG_HAVE_GENERIC_HARDIRQS=y

CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y