RE: [PATCH 0/7] aacraid driver updates

2015-03-16 Thread Mahesh Rajashekhara
Hi James,

Can you please let me know the status of this patch set, which I submitted
some time back?

Thanks,
Mahesh

-Original Message-
From: Mahesh Rajashekhara 
Sent: Wednesday, March 04, 2015 2:08 PM
To: jbottom...@parallels.com; linux-scsi@vger.kernel.org
Cc: aacr...@pmc-sierra.com; Harry Yang; Achim Leubner; Rajinikanth Pandurangan; 
Rich Bono; Mahesh Rajashekhara
Subject: [PATCH 0/7] aacraid driver updates

This patch set includes some important bug fixes and new feature support.

Mahesh Rajashekhara (7):
  aacraid: AIF support for SES device add/remove
  aacraid: IOCTL pass-through command fix
  aacraid: 4KB sector support
  aacraid: MSI-x support
  aacraid: vpd page code 0x83 support
  aacraid: performance improvement changes
  aacraid: AIF raw device remove support

 drivers/scsi/aacraid/aachba.c   |  355 +--
 drivers/scsi/aacraid/aacraid.h  |  102 +-
 drivers/scsi/aacraid/commctrl.c |   10 +-
 drivers/scsi/aacraid/comminit.c |  106 ++-
 drivers/scsi/aacraid/commsup.c  |   78 ++--
 drivers/scsi/aacraid/dpcsup.c   |   13 +-
 drivers/scsi/aacraid/linit.c|   48 +++--
 drivers/scsi/aacraid/rx.c   |   14 +-
 drivers/scsi/aacraid/src.c  |  404 +--
 9 files changed, 894 insertions(+), 236 deletions(-)

-- 
1.7.7.3



avoid disk spinup at system resume

2015-03-16 Thread Peter Münster
Hi,

My SATA disk (sdb) spins up at every system resume. How can I avoid
that, please?

Some details:

- sdb not mounted
- echo 0 >/sys/block/sdb/device/scsi_disk/2:0:0:0/manage_start_stop
- hdparm -C /dev/sdb :   "drive state is:  standby"
- s2ram
- after wake up: "drive state is:  active/idle"
- kernel message at wakeup: "sd 2:0:0:0: [sdb] Synchronizing SCSI cache"
- when doing "echo scsi remove-single-device 2 0 0 0 >/proc/scsi/scsi"
  before s2ram, sdb keeps sleeping after resume

I need the disk only about once per week. I use s2ram several times per
day, and I prefer that sdb keeps sleeping.

TIA for any hints,
-- 
   Peter
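
[A sketch for anyone wanting to automate the workaround above: a tiny helper
that detaches the disk before suspend and rescans the SCSI host after resume.
It only wraps the /proc/scsi/scsi command quoted in the mail plus the standard
/sys/class/scsi_host/hostN/scan interface; the 2:0:0:0 address and host2 are
taken from the example, and the suspend/resume hook that would call it is an
assumption, not something from this thread.]

/*
 * Minimal helper (assumed usage: invoked from a pm-utils or systemd sleep
 * hook with "suspend" before s2ram and "resume" afterwards when the disk
 * is actually needed again).
 */
#include <stdio.h>
#include <string.h>

static int write_str(const char *path, const char *msg)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return 1;
	}
	fputs(msg, f);
	return fclose(f) ? 1 : 0;
}

int main(int argc, char **argv)
{
	if (argc == 2 && !strcmp(argv[1], "suspend"))
		/* same as: echo scsi remove-single-device 2 0 0 0 >/proc/scsi/scsi */
		return write_str("/proc/scsi/scsi",
				 "scsi remove-single-device 2 0 0 0\n");

	if (argc == 2 && !strcmp(argv[1], "resume"))
		/* re-add the device later by asking host2 to rescan all targets */
		return write_str("/sys/class/scsi_host/host2/scan", "- - -\n");

	fprintf(stderr, "usage: %s suspend|resume\n", argv[0]);
	return 1;
}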



Re: [PATCH 0/7] aacraid driver updates

2015-03-16 Thread James Bottomley
On Mon, 2015-03-16 at 11:24 +, Mahesh Rajashekhara wrote:
> Hi James,
> 
> Can you please let me know the status of this patch set, which I
> submitted some time back?

We're following this:

http://marc.info/?l=linux-scsi&m=142556689315114

So you need a review.

James




Re: [Xen-devel] [PATCH v4] xen-scsiback: define a pr_fmt macro with xen-pvscsi

2015-03-16 Thread David Vrabel
On 10/03/15 20:49, Tao Chen wrote:
> Add the {xen-pvscsi: } prefix in pr_fmt, remove the DPRINTK macro, and
> replace all DPRINTK calls with pr_debug.
> 
> Also fix up some comments, eliminate redundant whitespace, and tidy up
> the formatting.
> 
> These will make the code easier to read.

Applied to devel/for-linus-4.1, thanks.

David
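
[For reference, a minimal sketch of the pr_fmt pattern the commit message
describes; the example function is invented, only the "xen-pvscsi: " prefix
and the DPRINTK-to-pr_debug conversion come from the patch.]

/* pr_fmt must be defined before the first include so that every pr_*()
 * call in the file picks up the prefix automatically. */
#define pr_fmt(fmt) "xen-pvscsi: " fmt

#include <linux/kernel.h>

/* Illustrative only -- not a function from xen-scsiback.c. */
static void example_debug(unsigned int rqid)
{
	/* formerly DPRINTK(...); now logs "xen-pvscsi: completing rqid 5" */
	pr_debug("completing rqid %u\n", rqid);
}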


[PATCH RESEND] scsi: qla4xxx: drop duplicate init_completion

2015-03-16 Thread Nicholas Mc Guire
The double call to init_completion(&ha->disable_acb_comp) is unnecessary.
Checking drivers/scsi/qla4xxx/ql4_def.h, struct scsi_qla_host defines only
these four struct completions, so the second call looks like a simple
copy-and-paste slip. Drop the duplicate call.

Signed-off-by: Nicholas Mc Guire 
---

Originally posted as http://lkml.org/lkml/2014/12/23/355

Patch was only compile-tested with x86_64_defconfig + CONFIG_SCSI_LOWLEVEL=y,
CONFIG_SCSI_QLA_ISCSI=m

Patch is against 4.0-rc3 (localversion-next is -next-20150316).

 drivers/scsi/qla4xxx/ql4_os.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
index 6d25879..2723bd9 100644
--- a/drivers/scsi/qla4xxx/ql4_os.c
+++ b/drivers/scsi/qla4xxx/ql4_os.c
@@ -8669,7 +8669,6 @@ static int qla4xxx_probe_adapter(struct pci_dev *pdev,
mutex_init(&ha->mbox_sem);
mutex_init(&ha->chap_sem);
init_completion(&ha->mbx_intr_comp);
-   init_completion(&ha->disable_acb_comp);
init_completion(&ha->idc_comp);
init_completion(&ha->link_up_comp);
init_completion(&ha->disable_acb_comp);
-- 
1.7.10.4
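
[For context, a rough sketch of the completion members implied by the
init_completion() calls in the hunk above; only the four field names come
from the diff, the layout and comments are guesses. It shows why the fifth
call was redundant: disable_acb_comp was simply being initialised twice.]

#include <linux/completion.h>

/* Hypothetical excerpt -- the real struct scsi_qla_host in ql4_def.h
 * contains many more members. */
struct scsi_qla_host_sketch {
	/* ... */
	struct completion mbx_intr_comp;	/* mailbox interrupt */
	struct completion disable_acb_comp;	/* disable ACB done */
	struct completion idc_comp;		/* inter-driver communication */
	struct completion link_up_comp;		/* link-up event */
	/* ... */
};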



[PATCH] uas: Add US_FL_NO_ATA_1X for Initio Corporation controllers / devices

2015-03-16 Thread Hans de Goede
A new uas-compatible controller has shown up in some people's devices from
the manufacturer Initio Corporation. This controller needs the US_FL_NO_ATA_1X
quirk to work properly with uas, so add it to the uas quirks table.

Reported-and-tested-by: Benjamin Tissoires 
Cc: Benjamin Tissoires 
Cc: sta...@vger.kernel.org # 3.16
Signed-off-by: Hans de Goede 
---
 drivers/usb/storage/unusual_uas.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/usb/storage/unusual_uas.h b/drivers/usb/storage/unusual_uas.h
index 8257042..c85ea53 100644
--- a/drivers/usb/storage/unusual_uas.h
+++ b/drivers/usb/storage/unusual_uas.h
@@ -113,6 +113,13 @@ UNUSUAL_DEV(0x0bc2, 0xab2a, 0x, 0x,
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_NO_ATA_1X),
 
+/* Reported-by: Benjamin Tissoires  */
+UNUSUAL_DEV(0x13fd, 0x3940, 0x, 0x,
+   "Initio Corporation",
+   "",
+   USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+   US_FL_NO_ATA_1X),
+
 /* Reported-by: Tom Arild Naess  */
 UNUSUAL_DEV(0x152d, 0x0539, 0x, 0x,
"JMicron",
-- 
2.3.1



Re: [ANNOUNCE] xfs: Supporting Host Aware SMR Drives

2015-03-16 Thread James Bottomley
[cc to linux-scsi added since this seems relevant]
On Mon, 2015-03-16 at 17:00 +1100, Dave Chinner wrote:
> Hi Folks,
> 
> As I told many people at Vault last week, I wrote a document
> outlining how we should modify the on-disk structures of XFS to
> support host aware SMR drives on the (long) plane flights to Boston.
> 
> TL;DR: not a lot of change to the XFS kernel code is required, no
> specific SMR awareness is needed by the kernel code.  Only
> relatively minor tweaks to the on-disk format will be needed and
> most of the userspace changes are relatively straight forward, too.
> 
> The source for that document can be found in this git tree here:
> 
> git://git.kernel.org/pub/scm/fs/xfs/xfs-documentation
> 
> in the file design/xfs-smr-structure.asciidoc. Alternatively,
> pull it straight from cgit:
> 
> https://git.kernel.org/cgit/fs/xfs/xfs-documentation.git/tree/design/xfs-smr-structure.asciidoc
> 
> Or there is a pdf version built from the current TOT on the xfs.org
> wiki here:
> 
> http://xfs.org/index.php/Host_Aware_SMR_architecture
> 
> Happy reading!

I don't think it would have caused too much heartache to post the entire
doc to the list, but anyway

The first is a meta question: What happened to the idea of separating
the fs block allocator from filesystems?  It looks like a lot of the
updates could be duplicated into other filesystems, so it might be a
very opportune time to think about this.


> == Data zones
> 
> What we need is a mechanism for tracking the location of zones (i.e. start 
> LBA),
> free space/write pointers within each zone, and some way of keeping track of
> that information across mounts. If we assign a real time bitmap/summary inode
> pair to each zone, we have a method of tracking free space in the zone. We can
> use the existing bitmap allocator with a small tweak (sequentially ascending,
> packed extent allocation only) to ensure that newly written blocks are 
> allocated
> in a sane manner.
> 
> We're going to need userspace to be able to see the contents of these inodes;
> read only access will be needed to analyse the contents of the zone, so we're
> going to need a special directory to expose this information. It would be 
> useful
> to have a ".zones" directory hanging off the root directory that contains all
> the zone allocation inodes so userspace can simply open them.

The ZBC standard is being constructed.  However, all revisions agree
that the drive is perfectly capable of tracking the zone pointers (and
even the zone status).  Rather than having you duplicate the information
within the XFS metadata, surely it's better with us to come up with some
block way of reading it from the disk (and caching it for faster
access)?
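
[A hypothetical sketch of what such a block-layer cache entry could hold;
none of these names exist in the kernel, they only illustrate the "read it
from the disk and cache it for faster access" idea.]

#include <linux/types.h>

/* Invented for illustration: one cached record per zone, refreshed from a
 * REPORT ZONES query and handed out to filesystems/DM/userspace instead of
 * each consumer duplicating write-pointer state in its own metadata. */
enum zone_cond_sketch {
	ZONE_EMPTY,
	ZONE_OPEN,
	ZONE_FULL,
	ZONE_READ_ONLY,
};

struct zone_info_sketch {
	sector_t		start;		/* first LBA of the zone        */
	sector_t		len;		/* zone length in sectors       */
	sector_t		write_pointer;	/* next writable LBA            */
	enum zone_cond_sketch	cond;		/* condition reported by drive */
	bool			seq_write_required;
};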


> == Quantification of Random Write Zone Capacity
> 
> A basic guideline is that for 4k blocks and zones of 256MB, we'll need 8kB of
> bitmap space and two inodes, so call it 10kB per 256MB zone. That's 40MB per 
> TB
> for free space bitmaps. We'll want to support at least 1 million inodes per TB,
> so that's another 512MB per TB, plus another 256MB per TB for directory
> structures. There's other bits and pieces of metadata as well (attribute 
> space,
> internal freespace btrees, reverse map btrees, etc.
> 
> So, at minimum we will probably need at least 2GB of random write space per TB
> of SMR zone data space. Plus a couple of GB for the journal if we want the 
> easy
> option. For those drive vendors out there that are listening and want good
> performance, replace the CMR region with a SSD

This seems to be a place where standards work is still needed.  Right at
the moment for Host Managed, the physical layout of the drives makes it
reasonably simple to convert edge zones from SMR to CMR and vice versa
at the expense of changing capacity.  It really sounds like we need a
simple, programmatic way of doing this.  The question I'd have is: are
you happy with just telling manufacturers ahead of time how much CMR
space you need and hoping they comply, or should we push for a standards
way of flipping end zones to CMR?
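
[A back-of-envelope check of the figures quoted above, assuming 4 KiB
filesystem blocks and 256 MiB zones; editorial, not from the original
document.]

#include <stdio.h>

int main(void)
{
	unsigned long long zone_bytes   = 256ULL << 20;		/* 256 MiB zone */
	unsigned long long blocks       = zone_bytes / 4096;	/* 65536 blocks */
	unsigned long long bitmap_bytes = blocks / 8;		/* 1 bit/block  */
	unsigned long long zones_per_tb = (1ULL << 40) / zone_bytes;

	printf("bitmap per zone: %llu KiB\n", bitmap_bytes >> 10);   /* 8 KiB */
	/* ~10 KiB per zone (bitmap + two inodes) x 4096 zones ~= 40 MiB/TB   */
	printf("free-space metadata per TB: ~%llu MiB\n",
	       zones_per_tb * 10 * 1024 >> 20);
	return 0;
}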


> === Crash recovery
> 
> Write pointer location is undefined after power failure. It could be at an old
> location, the current location or anywhere in between. The only guarantee that
> we have is that if we flushed the cache (i.e. fsync'd a file) then they will 
> at
> least be in a position at or past the location of the fsync.
> 
> Hence before a filesystem runs journal recovery, all its zone allocation write
> pointers need to be set to what the drive thinks they are, and all of the zone
> allocation beyond the write pointer needs to be cleared. We could do this 
> during
> log recovery in kernel, but that means we need full ZBC awareness in log
> recovery to iterate and query all the zones.

If you just use a cached zone pointer provided by block, this should
never be a problem because you'd always know where the drive thought the
pointer was.


> === RAID on SMR
> 
> How does RAID work with SMR, and exactly what does that look like to
> the filesystem?

Re: [PATCH 1/3] scsi: serialize ->rescan against ->remove

2015-03-16 Thread Paolo Bonzini


On 05/03/2015 14:37, Paolo Bonzini wrote:
> 
> 
> On 05/03/2015 14:33, Christoph Hellwig wrote:
>> Any chance to get reviews for this series?  Also we should at least
>> expedite this first patch into 4.0-rc as it fixes scanning races
>> in virtio_scsi.
> 
> I reviewed 1 and 3, but I'm not really qualified for patch 2.

Christoph,

any news about these patches?

Thanks,

Paolo


Re: [ANNOUNCE] xfs: Supporting Host Aware SMR Drives

2015-03-16 Thread Adrian Palmer
Thanks for the document!  I think we are off to a good start going in
a common direction.  We have quite a few details to iron out, but I
feel that we are getting there by everyone simply expressing what's
needed.

My additions are in-line.


Adrian Palmer
Firmware Engineer II
R&D Firmware
Seagate, Longmont Colorado
720-684-1307
adrian.pal...@seagate.com


On Mon, Mar 16, 2015 at 9:28 AM, James Bottomley
 wrote:
> [cc to linux-scsi added since this seems relevant]
> On Mon, 2015-03-16 at 17:00 +1100, Dave Chinner wrote:
>> Hi Folks,
>>
>> As I told many people at Vault last week, I wrote a document
>> outlining how we should modify the on-disk structures of XFS to
>> support host aware SMR drives on the (long) plane flights to Boston.
>>
>> TL;DR: not a lot of change to the XFS kernel code is required, no
>> specific SMR awareness is needed by the kernel code.  Only
>> relatively minor tweaks to the on-disk format will be needed and
>> most of the userspace changes are relatively straight forward, too.
>>
>> The source for that document can be found in this git tree here:
>>
>> git://git.kernel.org/pub/scm/fs/xfs/xfs-documentation
>>
>> in the file design/xfs-smr-structure.asciidoc. Alternatively,
>> pull it straight from cgit:
>>
>> https://git.kernel.org/cgit/fs/xfs/xfs-documentation.git/tree/design/xfs-smr-structure.asciidoc
>>
>> Or there is a pdf version built from the current TOT on the xfs.org
>> wiki here:
>>
>> http://xfs.org/index.php/Host_Aware_SMR_architecture
>>
>> Happy reading!
>
> I don't think it would have caused too much heartache to post the entire
> doc to the list, but anyway
>
> The first is a meta question: What happened to the idea of separating
> the fs block allocator from filesystems?  It looks like a lot of the
> updates could be duplicated into other filesystems, so it might be a
> very opportune time to think about this.
>

That's not a half-bad idea.  In speaking to the EXT4 dev group, we're
already looking at pulling the block allocator out and making it
pluggable.  I'm looking at doing a clean rewrite anyway for SMR.
However, the question I have is about CoW vs non-CoW system differences
in allocation preferences, and what other changes need to be made in
*all* the file systems.

>
>> == Data zones
>>
>> What we need is a mechanism for tracking the location of zones (i.e. start 
>> LBA),
>> free space/write pointers within each zone, and some way of keeping track of
>> that information across mounts. If we assign a real time bitmap/summary inode
>> pair to each zone, we have a method of tracking free space in the zone. We 
>> can
>> use the existing bitmap allocator with a small tweak (sequentially ascending,
>> packed extent allocation only) to ensure that newly written blocks are 
>> allocated
>> in a sane manner.
>>
>> We're going to need userspace to be able to see the contents of these inodes;
>> read only access will be needed to analyse the contents of the zone, so we're
>> going to need a special directory to expose this information. It would be 
>> useful
>> to have a ".zones" directory hanging off the root directory that contains all
>> the zone allocation inodes so userspace can simply open them.
>
> The ZBC standard is being constructed.  However, all revisions agree
> that the drive is perfectly capable of tracking the zone pointers (and
> even the zone status).  Rather than having you duplicate the information
> within the XFS metadata, surely it's better with us to come up with some
> block way of reading it from the disk (and caching it for faster
> access)?
>

In discussions with Dr. Reinecke, it seems extremely prudent to have a
kernel cache somewhere.  The SD driver would be the base for updating
the cache, but it would need to be available to the allocators, the
/sys fs for userspace utilities, and possibly other processes.  In
EXT4, I don't think it's feasible to have the cache -- however, the
metadata will MIRROR the cache ( BG# = Zone#, databitmap = WP, etc)

>
>> == Quantification of Random Write Zone Capacity
>>
>> A basic guideline is that for 4k blocks and zones of 256MB, we'll need 8kB of
>> bitmap space and two inodes, so call it 10kB per 256MB zone. That's 40MB per 
>> TB
>> for free space bitmaps. We'll want to support at least 1 million inodes per 
>> TB,
>> so that's another 512MB per TB, plus another 256MB per TB for directory
>> structures. There's other bits and pieces of metadata as well (attribute 
>> space,
>> internal freespace btrees, reverse map btrees, etc.
>>
>> So, at minimum we will probably need at least 2GB of random write space per 
>> TB
>> of SMR zone data space. Plus a couple of GB for the journal if we want the 
>> easy
>> option. For those drive vendors out there that are listening and want good
>> performance, replace the CMR region with a SSD
>
> This seems to be a place where standards work is still needed.  Right at
> the moment for Host Managed, the physical layout of the drives makes it
> reasonably simple to convert edge zones from SMR to CMR and vice versa
> at the expense of changing capacity.

Re: [ANNOUNCE] xfs: Supporting Host Aware SMR Drives

2015-03-16 Thread James Bottomley
On Mon, 2015-03-16 at 12:23 -0600, Adrian Palmer wrote:
[...]
> >> == Data zones
> >>
> >> What we need is a mechanism for tracking the location of zones (i.e. start 
> >> LBA),
> >> free space/write pointers within each zone, and some way of keeping track 
> >> of
> >> that information across mounts. If we assign a real time bitmap/summary 
> >> inode
> >> pair to each zone, we have a method of tracking free space in the zone. We 
> >> can
> >> use the existing bitmap allocator with a small tweak (sequentially 
> >> ascending,
> >> packed extent allocation only) to ensure that newly written blocks are 
> >> allocated
> >> in a sane manner.
> >>
> >> We're going to need userspace to be able to see the contents of these 
> >> inodes;
> >> read only access will be needed to analyse the contents of the zone, so 
> >> we're
> >> going to need a special directory to expose this information. It would be 
> >> useful
> >> to have a ".zones" directory hanging off the root directory that contains 
> >> all
> >> the zone allocation inodes so userspace can simply open them.
> >
> > The ZBC standard is being constructed.  However, all revisions agree
> > that the drive is perfectly capable of tracking the zone pointers (and
> > even the zone status).  Rather than having you duplicate the information
> > within the XFS metadata, surely it's better with us to come up with some
> > block way of reading it from the disk (and caching it for faster
> > access)?
> >
> 
> In discussions with Dr. Reinecke, it seems extremely prudent to have a
> kernel cache somewhere.  The SD driver would be the base for updating
> the cache, but it would need to be available to the allocators, the
> /sys fs for userspace utilities, and possibly other processes.  In
> EXT4, I don't think it's feasible to have the cache -- however, the
> metadata will MIRROR the cache ( BG# = Zone#, databitmap = WP, etc)

I think I've got two points: if we're caching it, we should have a
single cache and everyone should use it.  There may be a good reason why
we can't do this, but I'd like to see it explained before everyone goes
off and invents their own zone pointer cache.  If we do it in one place,
we can make the cache properly shrinkable (the information can be purged
under memory pressure and re-fetched if requested).
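
[A sketch of the "properly shrinkable" part using the kernel's shrinker
interface: a hypothetical zone-pointer cache whose entries can be dropped
under memory pressure and re-fetched from the drive (REPORT ZONES) on the
next lookup. All zone_cache_* names are invented; only the shrinker hookup
itself is the standard mechanism.]

#include <linux/shrinker.h>
#include <linux/atomic.h>

static atomic_long_t zone_cache_nr = ATOMIC_LONG_INIT(0);  /* cached records */

static unsigned long zone_cache_count(struct shrinker *s,
				      struct shrink_control *sc)
{
	return atomic_long_read(&zone_cache_nr);
}

static unsigned long zone_cache_scan(struct shrinker *s,
				     struct shrink_control *sc)
{
	/* Walk an LRU of cached zone records and free up to sc->nr_to_scan
	 * clean entries here; this stub frees nothing. */
	return 0;
}

static struct shrinker zone_cache_shrinker = {
	.count_objects	= zone_cache_count,
	.scan_objects	= zone_cache_scan,
	.seeks		= DEFAULT_SEEKS,
};

/* register_shrinker(&zone_cache_shrinker) at init,
 * unregister_shrinker(&zone_cache_shrinker) on teardown. */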

> >
> >> == Quantification of Random Write Zone Capacity
> >>
> >> A basic guideline is that for 4k blocks and zones of 256MB, we'll need 8kB 
> >> of
> >> bitmap space and two inodes, so call it 10kB per 256MB zone. That's 40MB 
> >> per TB
> >> for free space bitmaps. We'll want to support at least 1 million inodes per 
> >> TB,
> >> so that's another 512MB per TB, plus another 256MB per TB for directory
> >> structures. There's other bits and pieces of metadata as well (attribute 
> >> space,
> >> internal freespace btrees, reverse map btrees, etc.
> >>
> >> So, at minimum we will probably need at least 2GB of random write space 
> >> per TB
> >> of SMR zone data space. Plus a couple of GB for the journal if we want the 
> >> easy
> >> option. For those drive vendors out there that are listening and want good
> >> performance, replace the CMR region with a SSD
> >
> > This seems to be a place where standards work is still needed.  Right at
> > the moment for Host Managed, the physical layout of the drives makes it
> > reasonably simple to convert edge zones from SMR to CMR and vice versa
> > at the expense of changing capacity.  It really sounds like we need a
> > simple, programmatic way of doing this.  The question I'd have is: are
> > you happy with just telling manufacturers ahead of time how much CMR
> > space you need and hoping they comply, or should we push for a standards
> > way of flipping end zones to CMR?
> >
> 
> I agree this is an issue, but for HA (and less for HM), there is a lot
> of flexibility needed for this.  In our BoFs at Vault, we talked about
> partitioning needs.  We cannot assume that there is 1 partition per
> disk, and that it has absolute boundaries.  Sure a data disk can have
> 1 partition from LBA 0 to end of disk, but an OS disk can't.  For
> example, GPT and EFI cause problems.  On the other end, gamers and
> hobbyists tend to dual/triple boot.  There cannot be a one-size-fits-all
> partition for all purposes.
> 
> The conversion between CMR and SMR zones is not simple.  That's a
> hardware format.  Any change in the LBA space would be non-linear.
> 
> One idea that I came up with in our BoFs is using flash with an FTL.
> If the manufacturers put in enough flash to cover 8 or so zones, then
> a command could be implemented to allow the flash to be assigned to
> zones.  That way, a limited number of CMR zones can be placed anywhere
> on the disk without disrupting format or LBA space.  However, ZAC/ZBC
> is to be applied to flash also...

Perhaps we need to step back a bit.  The problem is that most
filesystems will require some CMR space for metadata that is
continuously updated in place.  The amount will pro

Re: [ANNOUNCE] xfs: Supporting Host Aware SMR Drives

2015-03-16 Thread Dave Chinner
On Mon, Mar 16, 2015 at 03:06:27PM -0400, James Bottomley wrote:
> On Mon, 2015-03-16 at 12:23 -0600, Adrian Palmer wrote:
> [...]
> > >> == Data zones
> > >>
> > >> What we need is a mechanism for tracking the location of zones (i.e. 
> > >> start LBA),
> > >> free space/write pointers within each zone, and some way of keeping 
> > >> track of
> > >> that information across mounts. If we assign a real time bitmap/summary 
> > >> inode
> > >> pair to each zone, we have a method of tracking free space in the zone. 
> > >> We can
> > >> use the existing bitmap allocator with a small tweak (sequentially 
> > >> ascending,
> > >> packed extent allocation only) to ensure that newly written blocks are 
> > >> allocated
> > >> in a sane manner.
> > >>
> > >> We're going to need userspace to be able to see the contents of these 
> > >> inodes;
> > >> read only access will be needed to analyse the contents of the zone, so 
> > >> we're
> > >> going to need a special directory to expose this information. It would 
> > >> be useful
> > >> to have a ".zones" directory hanging off the root directory that 
> > >> contains all
> > >> the zone allocation inodes so userspace can simply open them.
> > >
> > > The ZBC standard is being constructed.  However, all revisions agree
> > > that the drive is perfectly capable of tracking the zone pointers (and
> > > even the zone status).  Rather than having you duplicate the information
> > > within the XFS metadata, surely it's better with us to come up with some
> > > block way of reading it from the disk (and caching it for faster
> > > access)?

You misunderstand my proposal - XFS doesn't track the write pointer
in its metadata at all. It tracks a sequential allocation target
block in each zone via the per-zone allocation bitmap inode. The
assumption is that this will match the underlying zone write
pointer, as long as we verify they match when we first go to
allocate from the zone.
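
[A sketch of that verify-on-first-allocation step; every name here is
invented for illustration and this is not actual XFS code.]

#include <linux/types.h>

struct smr_zone_sketch {
	sector_t start;		/* first block of the zone            */
	sector_t alloc_target;	/* next block the fs intends to write */
	bool	 verified;	/* checked against the drive yet?     */
};

/* Hypothetical query of the drive-reported write pointer, e.g. via a
 * block-layer zone cache or a REPORT ZONES call (stand-in prototype). */
sector_t smr_query_write_pointer(struct smr_zone_sketch *z);

static int smr_zone_prepare_alloc(struct smr_zone_sketch *z)
{
	if (!z->verified) {
		sector_t wp = smr_query_write_pointer(z);

		if (wp != z->alloc_target) {
			/* e.g. after a crash: resync to the drive's view and
			 * treat blocks beyond wp as unwritten. */
			z->alloc_target = wp;
		}
		z->verified = true;
	}
	return 0;
}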

> > In discussions with Dr. Reinecke, it seems extremely prudent to have a
> > kernel cache somewhere.  The SD driver would be the base for updating
> > the cache, but it would need to be available to the allocators, the
> > /sys fs for userspace utilities, and possibly other processes.  In
> > EXT4, I don't think it's feasible to have the cache -- however, the
> > metadata will MIRROR the cache ( BG# = Zone#, databitmap = WP, etc)
> 
> I think I've got two points: if we're caching it, we should have a
> single cache and everyone should use it.  There may be a good reason why
> we can't do this, but I'd like to see it explained before everyone goes
> off and invents their own zone pointer cache.  If we do it in one place,
> we can make the cache properly shrinkable (the information can be purged
> under memory pressure and re-fetched if requested).

Sure, but XFS won't have its own cache, so what the kernel does
here when we occasionally query the location of the write pointer is
irrelevant to me...

> > >> == Quantification of Random Write Zone Capacity
> > >>
> > >> A basic guideline is that for 4k blocks and zones of 256MB, we'll need 
> > >> 8kB of
> > >> bitmap space and two inodes, so call it 10kB per 256MB zone. That's 40MB 
> > >> per TB
> > >> for free space bitmaps. We'll want to support at least 1 million inodes 
> > >> per TB,
> > >> so that's another 512MB per TB, plus another 256MB per TB for directory
> > >> structures. There's other bits and pieces of metadata as well (attribute 
> > >> space,
> > >> internal freespace btrees, reverse map btrees, etc.
> > >>
> > >> So, at minimum we will probably need at least 2GB of random write space 
> > >> per TB
> > >> of SMR zone data space. Plus a couple of GB for the journal if we want 
> > >> the easy
> > >> option. For those drive vendors out there that are listening and want 
> > >> good
> > >> performance, replace the CMR region with a SSD
> > >
> > > This seems to be a place where standards work is still needed.  Right at
> > > the moment for Host Managed, the physical layout of the drives makes it
> > > reasonably simple to convert edge zones from SMR to CMR and vice versa
> > > at the expense of changing capacity.  It really sounds like we need a
> > > simple, programmatic way of doing this.  The question I'd have is: are
> > > you happy with just telling manufacturers ahead of time how much CMR
> > > space you need and hoping they comply, or should we push for a standards
> > > way of flipping end zones to CMR?

I've taken what manufacturers are already shipping and found that it
is sufficient for our purposes. They've already set the precedent;
we'll be dependent on them maintaining that same percentage of
CMR:SMR regions in their drives. Otherwise, they won't have
filesystems that run on their drives and they won't sell any of
them.

i.e. we don't need to standardise anything here - the problem is
already solved.

> possibly btrfs) will need some.  One possibility is that we let the
> drives be reformatted in place, say as p

Re: [ANNOUNCE] xfs: Supporting Host Aware SMR Drives

2015-03-16 Thread Dave Chinner
On Mon, Mar 16, 2015 at 11:28:53AM -0400, James Bottomley wrote:
> [cc to linux-scsi added since this seems relevant]
> On Mon, 2015-03-16 at 17:00 +1100, Dave Chinner wrote:
> > Hi Folks,
> > 
> > As I told many people at Vault last week, I wrote a document
> > outlining how we should modify the on-disk structures of XFS to
> > support host aware SMR drives on the (long) plane flights to Boston.
> > 
> > TL;DR: not a lot of change to the XFS kernel code is required, no
> > specific SMR awareness is needed by the kernel code.  Only
> > relatively minor tweaks to the on-disk format will be needed and
> > most of the userspace changes are relatively straight forward, too.
> > 
> > The source for that document can be found in this git tree here:
> > 
> > git://git.kernel.org/pub/scm/fs/xfs/xfs-documentation
> > 
> > in the file design/xfs-smr-structure.asciidoc. Alternatively,
> > pull it straight from cgit:
> > 
> > https://git.kernel.org/cgit/fs/xfs/xfs-documentation.git/tree/design/xfs-smr-structure.asciidoc
> > 
> > Or there is a pdf version built from the current TOT on the xfs.org
> > wiki here:
> > 
> > http://xfs.org/index.php/Host_Aware_SMR_architecture
> > 
> > Happy reading!
> 
> I don't think it would have caused too much heartache to post the entire
> doc to the list, but anyway
> 
> The first is a meta question: What happened to the idea of separating
> the fs block allocator from filesystems?  It looks like a lot of the
> updates could be duplicated into other filesystems, so it might be a
> very opportune time to think about this.

Which requires a complete rework of the fs/block layer. That's the
long term goal, but we aren't going to be there for a few years yet.
Just look at how long it's taken for copy offload (which is trivial
compared to allocation offload) to be implemented

> > === RAID on SMR
> > 
> > How does RAID work with SMR, and exactly what does that look like to
> > the filesystem?
> > 
> > How does libzbc work with RAID given it is implemented through the scsi 
> > ioctl
> > interface?
> 
> Probably need to cc dm-devel here.  However, I think we're all agreed
> this is RAID across multiple devices, rather than within a single
> device?  In which case we just need a way of ensuring identical zoning
> on the raided devices and what you get is either a standard zone (for
> mirror) or a larger zone (for hamming etc).

Any sort of RAID is a bloody hard problem, hence the fact that I'm
designing a solution for a filesystem on top of an entire bare
drive. I'm not trying to solve every use case in the world, just the
one where the drive manufacturers think SMR will be mostly used: the
back end of "never delete" distributed storage environments

We can't wait for years for infrastructure layers to catch up in the
brave new world of shipping SMR drives. We may not like them, but we
have to make stuff work. I'm not trying to solve every problem - I'm
just trying to address the biggest use case I see for SMR devices
and it just so happens that XFS is already used pervasively in that
same use case, mostly within the same "no raid, fs per entire
device" constraints as I've documented for this proposal...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


RE: [PATCH 0/3] scsi: storvsc: Increase the tablesize based on host's capabilities

2015-03-16 Thread KY Srinivasan


> -Original Message-
> From: K. Y. Srinivasan [mailto:k...@microsoft.com]
> Sent: Monday, March 9, 2015 8:42 PM
> To: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; oher...@suse.com;
> jbottom...@parallels.com; h...@infradead.org; linux-scsi@vger.kernel.org;
> a...@canonical.com; vkuzn...@redhat.com
> Cc: KY Srinivasan
> Subject: [PATCH 0/3] scsi: storvsc: Increase the tablesize based on host's
> capabilities
> 
> Presently, storvsc limits the I/O size arbitrarily. Make this configurable 
> based
> on what the host advertises.
> 
> K. Y. Srinivasan (3):
>   scsi: storvsc: Retrieve information about the capability of the
> target
>   scsi: storvsc: Set the tablesize based on the information given by
> the host
>   scsi: storvsc: Enable clustering
> 
>  drivers/scsi/storvsc_drv.c |   78
> +++-
>  1 files changed, 62 insertions(+), 16 deletions(-)

Christoph,

Let me know if I should resend these patches.

Regards,

K. Y
> 
> --
> 1.7.4.1



Re: [ANNOUNCE] xfs: Supporting Host Aware SMR Drives

2015-03-16 Thread Alireza Haghdoost
On Mon, Mar 16, 2015 at 3:32 PM, Dave Chinner  wrote:
> On Mon, Mar 16, 2015 at 11:28:53AM -0400, James Bottomley wrote:
>> [cc to linux-scsi added since this seems relevant]
>> On Mon, 2015-03-16 at 17:00 +1100, Dave Chinner wrote:
>> > Hi Folks,
>> >
>> > As I told many people at Vault last week, I wrote a document
>> > outlining how we should modify the on-disk structures of XFS to
>> > support host aware SMR drives on the (long) plane flights to Boston.
>> >
>> > TL;DR: not a lot of change to the XFS kernel code is required, no
>> > specific SMR awareness is needed by the kernel code.  Only
>> > relatively minor tweaks to the on-disk format will be needed and
>> > most of the userspace changes are relatively straight forward, too.
>> >
>> > The source for that document can be found in this git tree here:
>> >
>> > git://git.kernel.org/pub/scm/fs/xfs/xfs-documentation
>> >
>> > in the file design/xfs-smr-structure.asciidoc. Alternatively,
>> > pull it straight from cgit:
>> >
>> > https://git.kernel.org/cgit/fs/xfs/xfs-documentation.git/tree/design/xfs-smr-structure.asciidoc
>> >
>> > Or there is a pdf version built from the current TOT on the xfs.org
>> > wiki here:
>> >
>> > http://xfs.org/index.php/Host_Aware_SMR_architecture
>> >
>> > Happy reading!
>>
>> I don't think it would have caused too much heartache to post the entire
>> doc to the list, but anyway
>>
>> The first is a meta question: What happened to the idea of separating
>> the fs block allocator from filesystems?  It looks like a lot of the
>> updates could be duplicated into other filesystems, so it might be a
>> very opportune time to think about this.
>
> Which requires a complete rework of the fs/block layer. That's the
> long term goal, but we aren't going to be there for a few years yet.
> Just look at how long it's taken for copy offload (which is trivial
> compared to allocation offload) to be implemented
>
>> > === RAID on SMR
>> >
>> > How does RAID work with SMR, and exactly what does that look like to
>> > the filesystem?
>> >
>> > How does libzbc work with RAID given it is implemented through the scsi 
>> > ioctl
>> > interface?
>>
>> Probably need to cc dm-devel here.  However, I think we're all agreed
>> this is RAID across multiple devices, rather than within a single
>> device?  In which case we just need a way of ensuring identical zoning
>> on the raided devices and what you get is either a standard zone (for
>> mirror) or a larger zone (for hamming etc).
>
> Any sort of RAID is a bloody hard problem, hence the fact that I'm
> designing a solution for a filesystem on top of an entire bare
> drive. I'm not trying to solve every use case in the world, just the
> one where the drive manufacturers think SMR will be mostly used: the
> back end of "never delete" distributed storage environments
> We can't wait for years for infrastructure layers to catch up in the
> brave new world of shipping SMR drives. We may not like them, but we
> have to make stuff work. I'm not trying to solve every problem - I'm
> just trying to address the biggest use case I see for SMR devices
> and it just so happens that XFS is already used pervasively in that
> same use case, mostly within the same "no raid, fs per entire
> device" constraints as I've documented for this proposal...
>
> Cheers,
>
> Dave.


I am confused about what kind of application you are referring to for this
"back end, no raid, fs per entire device" model. Are you going to rely on the
application to do replication for disk failure protection?

I think it is a good idea to devise the file system changes with a little
concern for their negative impact on RAID. My impression is that these
changes push more in-place parity updates if the file system is deployed on
top of a parity-based RAID array, since they would convert most random I/Os
into sequential I/Os that might happen to land in the same parity stripe.

--Alireza


€950,000.00 EURO

2015-03-16 Thread Qatar Foundation
Dear beneficiary,

You have been selected to receive €950,000.00 EURO as a charity donation /
aid from the Qatar Foundation. Reply back for further information.

Kind regards,
Engineer Saad Al Muhannadi.
Reply to: qf.qa...@gmail.com
President of the Qatar Foundation.





Re: [ANNOUNCE] xfs: Supporting Host Aware SMR Drives

2015-03-16 Thread Dave Chinner
On Mon, Mar 16, 2015 at 08:12:16PM -0500, Alireza Haghdoost wrote:
> On Mon, Mar 16, 2015 at 3:32 PM, Dave Chinner  wrote:
> > On Mon, Mar 16, 2015 at 11:28:53AM -0400, James Bottomley wrote:
> >> Probably need to cc dm-devel here.  However, I think we're all agreed
> >> this is RAID across multiple devices, rather than within a single
> >> device?  In which case we just need a way of ensuring identical zoning
> >> on the raided devices and what you get is either a standard zone (for
> >> mirror) or a larger zone (for hamming etc).
> >
> > Any sort of RAID is a bloody hard problem, hence the fact that I'm
> > designing a solution for a filesystem on top of an entire bare
> > drive. I'm not trying to solve every use case in the world, just the
> > one where the drive manufactures think SMR will be mostly used: the
> > back end of "never delete" distributed storage environments
> > We can't wait for years for infrastructure layers to catch up in the
> > brave new world of shipping SMR drives. We may not like them, but we
> > have to make stuff work. I'm not trying to solve every problem - I'm
> > just trying to address the biggest use case I see for SMR devices
> > and it just so happens that XFS is already used pervasively in that
> > same use case, mostly within the same "no raid, fs per entire
> > device" constraints as I've documented for this proposal...
> 
> I am confused about what kind of application you are referring to for this
> "back end, no raid, fs per entire device" model. Are you going to rely on the
> application to do replication for disk failure protection?

Exactly. Think distributed storage such as Ceph and gluster where
the data redundancy and failure recovery algorithms are in layers
*above* the local filesystem, not in the storage below the fs.  The
"no raid, fs per device" model is already a very common back end
storage configuration for such deployments.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com