Re: [CentOS] OT: Caching synchronous writes

2010-04-24 Thread Ross Walker
On Apr 24, 2010, at 4:53 PM, Les Mikesell wrote:

> Ross Walker wrote:
>> On Apr 24, 2010, at 4:34 PM, Les Mikesell wrote:
>>
>>> Ross Walker wrote:
>>>> On Apr 24, 2010, at 12:43 PM, Les Mikesell wrote:
>>>>
>>>>> Ross Walker wrote:
>>>>>> NFS should always be 'sync' if performance isn't good, then your
>>>>>> storage isn't good.
>>>>> Why demand sync on remote storage when you typically don't have it
>>>>> locally?
>>>>> Programs that need transactional integrity should know when to
>>>>> fsync() and for anything else there's not much difference whether
>>>>> you crash before or after a write() was issued in terms of it not
>>>>> completing.
>>>> Yes, but 'async' ignores those fsyncs and returns immediately.
>>> That sounds like a bug in the nfs client code if fsync() doesn't
>>> block until all of the data is committed to disk.
>>
>> It's not the client side I'm talking about, but the server side. We
>> were talking NFS servers and exporting sync (obey fsyncs) vs async
>> (ignore fsyncs).
>>
>> The client always mounts async, that's not the problem.
>
> That's different.  I thought the nfs spec was always sync on the
> server side and the client says when async is OK.  And there's some
> special case response to handle the case where the server rebooted
> between the async writes and the subsequent fsync().

All the NFS info you wanted, but were afraid to ask:

http://nfs.sourceforge.net/

-Ross

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] OT: Caching synchronous writes

2010-04-24 Thread Les Mikesell
Ross Walker wrote:
> On Apr 24, 2010, at 4:34 PM, Les Mikesell wrote:
>
>> Ross Walker wrote:
>>> On Apr 24, 2010, at 12:43 PM, Les Mikesell wrote:
>>>
>>>> Ross Walker wrote:
>>>>> NFS should always be 'sync' if performance isn't good, then your
>>>>> storage isn't good.
>>>> Why demand sync on remote storage when you typically don't have it
>>>> locally?
>>>> Programs that need transactional integrity should know when to
>>>> fsync() and for anything else there's not much difference whether
>>>> you crash before or after a write() was issued in terms of it not
>>>> completing.
>>> Yes, but 'async' ignores those fsyncs and returns immediately.
>> That sounds like a bug in the nfs client code if fsync() doesn't
>> block until all of the data is committed to disk.
> 
> It's not the client side I'm talking about, but the server side. We  
> were talking NFS servers and exporting sync (obey fsyncs) vs async  
> (ignore fsyncs).
> 
> The client always mounts async, that's not the problem.

That's different.  I thought the nfs spec was always sync on the server side
and the client says when async is OK.  And there's some special case response
to handle the case where the server rebooted between the async writes and the
subsequent fsync().

-- 
   Les Mikesell
lesmikes...@gmail.com


Re: [CentOS] OT: Caching synchronous writes

2010-04-24 Thread Ross Walker
On Apr 24, 2010, at 4:34 PM, Les Mikesell wrote:

> Ross Walker wrote:
>> On Apr 24, 2010, at 12:43 PM, Les Mikesell wrote:
>>
>>> Ross Walker wrote:
>>>> NFS should always be 'sync' if performance isn't good, then your
>>>> storage isn't good.
>>> Why demand sync on remote storage when you typically don't have it
>>> locally?
>>> Programs that need transactional integrity should know when to
>>> fsync() and for anything else there's not much difference whether
>>> you crash before or after a write() was issued in terms of it not
>>> completing.
>>
>> Yes, but 'async' ignores those fsyncs and returns immediately.
>
> That sounds like a bug in the nfs client code if fsync() doesn't
> block until all of the data is committed to disk.

It's not the client side I'm talking about, but the server side. We  
were talking NFS servers and exporting sync (obey fsyncs) vs async  
(ignore fsyncs).
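
To see which mode a given export uses, here is a rough Python sketch that reads one line in /etc/exports format and reports 'sync' or 'async' per client. The function name and the example hosts are made up for illustration, and it assumes an nfs-utils release newer than 1.0.0, where 'sync' is the default when neither option is given. It only handles the simple "path client(options)" layout, not every form exports(5) allows.

```python
import re


def export_write_modes(line: str) -> dict:
    """Map each client in one /etc/exports line to 'sync' or 'async'."""
    # Drop comments and surrounding whitespace.
    line = line.split("#", 1)[0].strip()
    if not line:
        return {}
    fields = line.split()
    modes = {}
    # fields[0] is the exported path; the rest are client(options) entries.
    for client_spec in fields[1:]:
        match = re.fullmatch(r"([^()]*)\(([^()]*)\)", client_spec)
        if match:
            client, options = match.group(1), match.group(2).split(",")
        else:
            client, options = client_spec, []
        # Assumption: 'sync' is the default unless 'async' is listed explicitly.
        modes[client or "*"] = "async" if "async" in options else "sync"
    return modes


example = "/srv/vmstore 10.0.0.0/24(rw,sync,no_root_squash) backup.example.com(rw,async)"
print(export_write_modes(example))
```

Any client that comes back as 'async' is one where the server may reply before the data has reached stable storage, which is the case I mean above.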

The client always mounts async, that's not the problem.

-Ross



Re: [CentOS] OT: Caching synchronous writes

2010-04-24 Thread Les Mikesell
Ross Walker wrote:
> On Apr 24, 2010, at 12:43 PM, Les Mikesell wrote:
>
>> Ross Walker wrote:
>>> NFS should always be 'sync' if performance isn't good, then your
>>> storage isn't good.
>> Why demand sync on remote storage when you typically don't have it
>> locally?
>> Programs that need transactional integrity should know when to
>> fsync() and for anything else there's not much difference whether
>> you crash before or after a write() was issued in terms of it not
>> completing.
>
> Yes, but 'async' ignores those fsyncs and returns immediately.

That sounds like a bug in the nfs client code if fsync() doesn't block until
all of the data is committed to disk.

-- 
   Les Mikesell
lesmikes...@gmail.com


Re: [CentOS] OT: Caching synchronous writes

2010-04-24 Thread Ross Walker
On Apr 24, 2010, at 12:43 PM, Les Mikesell wrote:

> Ross Walker wrote:
>>
>> NFS should always be 'sync' if performance isn't good, then your
>> storage isn't good.
>
> Why demand sync on remote storage when you typically don't have it
> locally?
> Programs that need transactional integrity should know when to
> fsync() and for anything else there's not much difference whether
> you crash before or after a write() was issued in terms of it not
> completing.

Yes, but 'async' ignores those fsyncs and returns immediately.

-Ross



Re: [CentOS] OT: Caching synchronous writes

2010-04-24 Thread Les Mikesell
Ross Walker wrote:
> 
> NFS should always be 'sync' if performance isn't good, then your  
> storage isn't good.

Why demand sync on remote storage when you typically don't have it locally? 
Programs that need transactional integrity should know when to fsync() and for 
anything else there's not much difference whether you crash before or after a 
write() was issued in terms of it not completing.
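
As a rough illustration of what "knowing when to fsync()" looks like in an application, here is a minimal Python sketch. The function name durable_write and the temp-file-plus-rename pattern are my own choices for the example, not something any particular program in this thread is known to use.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and return only after fsync() has succeeded."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's userspace buffer to the kernel
            os.fsync(tmp.fileno())  # block until the kernel reports the data as stable
        os.replace(tmp_path, path)  # atomically put the new file in place
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory too, so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Everything written with a plain write() and no fsync() is only as safe as the page cache, locally or over NFS; the guarantee above is only as good as the storage stack's honesty about "stable".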

-- 
   Les Mikesell
lesmikes...@gmail.com



Re: [CentOS] OT: Caching synchronous writes

2010-04-24 Thread Ross Walker
On Apr 22, 2010, at 8:08 PM, Ray Van Dolson wrote:

> On Thu, Apr 22, 2010 at 03:57:01PM -0700, nate wrote:
>> John R Pierce wrote:
>>> Ray Van Dolson wrote:
>>>>> I think what you want is a proper storage array with mirrored
>>>>> write cache.
>>>>>
>>>>
>>>> Which is what we have with ZFS + SSD-based ZIL for far less money
>>>> than a NetApp.
>>>>
>>>
>>> not unless you have a pair of them configured as an active/standby
>>> HA cluster, sharing dual port disk storage, and somehow (magic?)
>>> mirroring the cache pool so that if the active storage
>>> controller/server fails, the standby can take over without losing a
>>> single write.
>>>
>>
>> OT too but really thought this was a good post/thread on ZFS
>>
>> http://www.mail-archive.com/zfs-disc...@opensolaris.org/msg18898.html
>>
>> "ZFS is designed for high *reliability*"
>> [..]
>> "You want something  completely different. You expect it to deliver
>> *availability*.
>>
>> And availability is something ZFS doesn't promise. It simply can't
>> deliver this."
>
> Yep... and something you of course know going in.
>
> Don't want to get off on a tangent on that -- am still interested in
> what type of solutions in the Linux world are out there that can
> approximate what an SSD based ZIL does for ZFS.
>
> Kent Overstreet (from lkml) mentioned that his bcache patch is
> intended to do something very similar.
>
> So I guess that's my answer -- it's not here yet, so sounds like the
> controller is the only way to achieve this currently.

How about locating the XFS journal on SSDs and using a HW RAID controller
with a big NVRAM cache?

That should be a lot faster than ZFS with SSD ZIL.
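
For anyone wanting to try it, a small Python sketch that just prints the commands involved (it does not run them). The device names, mount point and log size are placeholders; the logdev and size options are the ones documented in mkfs.xfs(8) and the xfs mount options.

```python
import shlex


def xfs_external_log_commands(data_dev, log_dev, mount_point, log_size="128m"):
    """Return (without running) the commands for XFS with its log on a separate device."""
    return [
        # Create the filesystem with its journal (the XFS "log") on log_dev.
        ["mkfs.xfs", "-l", f"logdev={log_dev},size={log_size}", data_dev],
        # An external log has to be named again at mount time.
        ["mount", "-t", "xfs", "-o", f"logdev={log_dev}", data_dev, mount_point],
    ]


# Hypothetical devices: /dev/sdb1 on the RAID controller, /dev/sdc1 on the SSD.
for argv in xfs_external_log_commands("/dev/sdb1", "/dev/sdc1", "/srv/vmstore"):
    print(shlex.join(argv))
```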

NFS should always be 'sync'; if performance isn't good, then your
storage isn't good.

-Ross



Re: [CentOS] OT: Caching synchronous writes

2010-04-23 Thread Les Mikesell
On 4/23/2010 11:17 AM, Ray Van Dolson wrote:
> On Fri, Apr 23, 2010 at 10:20:01AM +0200, Jure Pečar wrote:
>>
>>>> Ray Van Dolson wrote:
>>>>>> I think what you want is a proper storage array with mirrored write
>>>>>> cache.
>>
>> When ext3 came into widespread use, a popular method to "cache"
>> frequent fsyncs was to run it in a full data journaling mode, with
>> external journal on a separate disk.  This turned all random writes
>> to a sequential write, limited to a very small piece of disk and a
>> periodical journal flush to the real file system.  This worked
>> amazingly well for busy mail queues - throughput went up 10x and
>> more. People were also reporting improvements in NFS scenarios. Don't
>> know how this is relevant today in times of SSD, but it should be
>> worth to test it.
>
> Interesting.  As long as the requirements of O_SYNC are met once the
> data is written to the journal (I imagine it would be), then I could
> definitely see this speeding up NFS...
>
> On the other hand, if no write confirmation is sent until the data
> actually flushes out of the journal and onto disk, then the wins
> probably aren't as significant.

Do any linux filesystems actually get this right now?  In the past, the 
filesystem cache was somewhat divorced from file writes so fsync() and 
probably any write with O_SYNC would wait until the entire filesystem 
cache was flushed to disk, not just the related file buffer.

-- 
   Les Mikesell
lesmikes...@gmail.com


Re: [CentOS] OT: Caching synchronous writes

2010-04-23 Thread Ray Van Dolson
On Fri, Apr 23, 2010 at 10:20:01AM +0200, Jure Pečar wrote:
> 
> > > Ray Van Dolson wrote:
> > > >> I think what you want is a proper storage array with mirrored write
> > > >> cache.
> 
> When ext3 came into widespread use, a popular method to "cache"
> frequent fsyncs was to run it in a full data journaling mode, with
> external journal on a separate disk.  This turned all random writes
> to a sequential write, limited to a very small piece of disk and a
> periodical journal flush to the real file system.  This worked
> amazingly well for busy mail queues - throughput went up 10x and
> more. People were also reporting improvements in NFS scenarios. Don't
> know how this is relevant today in times of SSD, but it should be
> worth to test it.

Interesting.  As long as the requirements of O_SYNC are met once the
data is written to the journal (I imagine it would be), then I could
definitely see this speeding up NFS...

On the other hand, if no write confirmation is sent until the data
actually flushes out of the journal and onto disk, then the wins
probably aren't as significant.

Sounds like it'd be worth trying though, thanks.
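
One way to compare setups is to time small O_SYNC writes on the filesystem under test. A minimal Python sketch, assuming a Unix-like system where os.O_SYNC is available; the function name, file path and block size are arbitrary choices for the example:

```python
import os
import time


def mean_sync_write_latency(path: str, count: int = 50, size: int = 4096) -> float:
    """Return the mean seconds per write() on a file opened with O_SYNC."""
    block = b"\0" * size
    # With O_SYNC, each write() returns only once the data is reported as stable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed / count


if __name__ == "__main__":
    latency = mean_sync_write_latency("sync_probe.dat")
    print(f"mean O_SYNC write latency: {latency * 1000:.3f} ms")
```

Running it once on a plain ext3 mount and once on the data=journal mount with the external journal should show whether the confirmation comes back after the journal write or only after the final flush.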

Ray


Re: [CentOS] OT: Caching synchronous writes

2010-04-23 Thread Chan Chung Hang Christopher
Jure Pečar wrote:
>>> Ray Van Dolson wrote:
>>>>> I think what you want is a proper storage array with mirrored write
>>>>> cache.
> 
> When ext3 came into widespread use, a popular method to "cache" frequent 
> fsyncs was to run it in a full data journaling mode, with external journal on 
> a separate disk.
> This turned all random writes to a sequential write, limited to a very small 
> piece of disk and a periodical journal flush to the real file system.
> This worked amazingly well for busy mail queues - throughput went up 10x and 
> more. People were also reporting improvements in NFS scenarios. Don't know 
> how this is relevant today in times of SSD, but it should be worth to test it.
> 
> 

Separate disk only? Don't forget NVRAM sticks or BBU ramdrives.


Re: [CentOS] OT: Caching synchronous writes

2010-04-23 Thread Jure Pečar

> > Ray Van Dolson wrote:
> > >> I think what you want is a proper storage array with mirrored write
> > >> cache.

When ext3 came into widespread use, a popular method to "cache" frequent fsyncs
was to run it in full data journaling mode (data=journal), with an external
journal on a separate disk.
This turned all random writes into sequential writes, confined to a very small
piece of disk, plus a periodic journal flush to the real file system.
This worked amazingly well for busy mail queues - throughput went up 10x and
more. People were also reporting improvements in NFS scenarios. I don't know how
relevant this is today in times of SSDs, but it should be worth testing.
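
A sketch of the setup in Python, which only prints the commands rather than running them. The device names and mount point are placeholders, and the 4096-byte block size is just an example; the flags are the ones documented in mke2fs(8) and for the ext3 mount options.

```python
import shlex


def ext3_external_journal_commands(data_dev, journal_dev, mount_point, block_size=4096):
    """Return (without running) the commands for ext3 with data=journal and an external journal."""
    bs = str(block_size)
    return [
        # Format the small, fast device as a dedicated journal device.
        ["mke2fs", "-b", bs, "-O", "journal_dev", journal_dev],
        # Create the ext3 filesystem and attach the external journal (block sizes must match).
        ["mke2fs", "-b", bs, "-j", "-J", f"device={journal_dev}", data_dev],
        # Mount with full data journaling so file data, not just metadata, goes through the journal.
        ["mount", "-t", "ext3", "-o", "data=journal", data_dev, mount_point],
    ]


# Hypothetical layout: mail queue on /dev/sdb1, journal on a separate disk /dev/sdc1.
for argv in ext3_external_journal_commands("/dev/sdb1", "/dev/sdc1", "/var/spool/mqueue"):
    print(shlex.join(argv))
```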


-- 

Jure Pečar
http://jure.pecar.org


Re: [CentOS] OT: Caching synchronous writes

2010-04-22 Thread Ray Van Dolson
On Thu, Apr 22, 2010 at 03:50:11PM -0700, John R Pierce wrote:
> Ray Van Dolson wrote:
> >> I think what you want is a proper storage array with mirrored write
> >> cache.
> >>
> >
> > Which is what we have with ZFS + SSD-based ZIL for far less money than
> > a NetApp.
> >
>
> not unless you have a pair of them configured as an active/standby HA
> cluster, sharing dual port disk storage, and somehow (magic?) mirroring
> the cache pool so that if the active storage controller/server fails,
> the standby can take over without losing a single write.

This is definitely tangential to what I was originally asking. :)

I'm not suggesting this perfectly replaces (or even comes close to) a
clustered NetApp setup.  But it can provide similar NFS write
performance and I can buy three of them and replicate data for DR needs
for far less than the price of a NetApp SnapMirror setup.

Ray


Re: [CentOS] OT: Caching synchronous writes

2010-04-22 Thread Ray Van Dolson
On Thu, Apr 22, 2010 at 03:57:01PM -0700, nate wrote:
> John R Pierce wrote:
> > Ray Van Dolson wrote:
> >>> I think what you want is a proper storage array with mirrored write
> >>> cache.
> >>>
> >>
> >> Which is what we have with ZFS + SSD-based ZIL for far less money than
> >> a NetApp.
> >>
> >
> > not unless you have a pair of them configured as an active/standby HA
> > cluster, sharing dual port disk storage, and somehow (magic?) mirroring
> > the cache pool so that if the active storage controller/server fails,
> > the standby can take over without losing a single write.
> >
>
> OT too but really thought this was a good post/thread on ZFS
>
> http://www.mail-archive.com/zfs-disc...@opensolaris.org/msg18898.html
>
> "ZFS is designed for high *reliability*"
> [..]
> "You want something  completely different. You expect it to deliver
> *availability*.
>
> And availability is something ZFS doesn't promise. It simply can't
> deliver this."

Yep... and something you of course know going in.

Don't want to get off on a tangent on that -- am still interested in what
type of solutions in the Linux world are out there that can approximate
what an SSD based ZIL does for ZFS.

Kent Overstreet (from lkml) mentioned that his bcache patch is intended
to do something very similar.

So I guess that's my answer -- it's not here yet, so sounds like the
controller is the only way to achieve this currently.

Ray


Re: [CentOS] OT: Caching synchronous writes

2010-04-22 Thread nate
John R Pierce wrote:
> Ray Van Dolson wrote:
>>> I think what you want is a proper storage array with mirrored write
>>> cache.
>>>
>>
>> Which is what we have with ZFS + SSD-based ZIL for far less money than
>> a NetApp.
>>
>
> not unless you have a pair of them configured as an active/standby HA
> cluster, sharing dual port disk storage, and somehow (magic?) mirroring
> the cache pool so that if the active storage controller/server fails,
> the standby can take over without losing a single write.
>

OT too but really thought this was a good post/thread on ZFS

http://www.mail-archive.com/zfs-disc...@opensolaris.org/msg18898.html

"ZFS is designed for high *reliability*"
[..]
"You want something  completely different. You expect it to deliver
*availability*.

And availability is something ZFS doesn't promise. It simply can't
deliver this."

--


nate




Re: [CentOS] OT: Caching synchronous writes

2010-04-22 Thread John R Pierce
Ray Van Dolson wrote:
>> I think what you want is a proper storage array with mirrored write
>> cache.
>> 
>
> Which is what we have with ZFS + SSD-based ZIL for far less money than
> a NetApp.
>   

not unless you have a pair of them configured as an active/standby HA
cluster, sharing dual port disk storage, and somehow (magic?) mirroring
the cache pool so that if the active storage controller/server fails,
the standby can take over without losing a single write.




Re: [CentOS] OT: Caching synchronous writes

2010-04-22 Thread Ray Van Dolson
On Thu, Apr 22, 2010 at 02:06:47PM -0700, nate wrote:
> Ray Van Dolson wrote:
> 
> > The "delayed allocation" features in ext4 (and xfs, reiser4) sound
> > interesting.  Might give a little performance boost for synchronous
> > write workloads
> 
> Doesn't delayed allocation defeat the purpose of a synchronous write?

I don't know for sure.  From reading, it sounds like as far as data
integrity is concerned it would fall somewhere between complete
write-through synchronous writes and asynchronous writes.

> I think what you want is a proper storage array with mirrored write
> cache.

Which is what we have with ZFS + SSD-based ZIL for far less money than
a NetApp.

This[1] sounds interesting...

Ray

[1] http://lkml.org/lkml/2010/4/5/41


Re: [CentOS] OT: Caching synchronous writes

2010-04-22 Thread nate
Ray Van Dolson wrote:

> The "delayed allocation" features in ext4 (and xfs, reiser4) sound
> interesting.  Might give a little performance boost for synchronous
> write workloads

Doesn't delayed allocation defeat the purpose of a synchronous write?

I think what you want is a proper storage array with mirrored write
cache.

nate




Re: [CentOS] OT: Caching synchronous writes

2010-04-22 Thread Ray Van Dolson
On Thu, Apr 22, 2010 at 03:37:41PM -0500, Les Mikesell wrote:
> On 4/22/2010 3:20 PM, Ray Van Dolson wrote:
> > [ Wish there was a generic, active Linux "storage" mailing list out
> >there -- something other than the kernel lists I mean ]
> >
> > To frame the discussion, we use VMware ESX (vSphere) quite a bit with
> > NFS datastores.  Often times with NetApp, but lately, more often with
> > Solaris 10 + ZFS + SSD's for ZIL (intent log or write cache).
> >
> > The ZIL lets us use synchronous writes (safer) without the normal
> > delay.  Were we to try and get the same level of performance with
> > Linux, we'd need to use async mode for our NFS shares -- and we'd lose
> > some reliability.
> >
> > However, given the latest rumblings and ruminations about Oracle
> > potentially no longer selling entitlements for Solaris 10 on non-Sun
> > hardware -- and then turning around and no longer allowing you to run
> > Solaris 10 "freely", we're left with either OpenSolaris or looking at
> > Linux again (we run Solaris 10 on Silicon Mechanics hardware).
> 
> Is there some problem with OpenSolaris or NexentaStor?
> 

Maybe not, but am trying to see what options there are on the Linux
side.

The "delayed allocation" features in ext4 (and xfs, reiser4) sound
interesting.  Might give a little performance boost for synchronous
write workloads.

[ We like Nexenta and OpenSolaris just fine, but really like the
  stability guarantee Solaris gives us -- much like RHEL.  Would rather
  not have to worry (as much) about needing to reboot storage boxes and
  even though I have confidence in OpenSolaris, it's still more of a
  moving / changing target.  Not to say we won't ultimately go in that
  direction though. ]

Thanks,
Ray


Re: [CentOS] OT: Caching synchronous writes

2010-04-22 Thread Les Mikesell
On 4/22/2010 3:20 PM, Ray Van Dolson wrote:
> [ Wish there was a generic, active Linux "storage" mailing list out
>there -- something other than the kernel lists I mean ]
>
> To frame the discussion, we use VMware ESX (vSphere) quite a bit with
> NFS datastores.  Often times with NetApp, but lately, more often with
> Solaris 10 + ZFS + SSD's for ZIL (intent log or write cache).
>
> The ZIL lets us use synchronous writes (safer) without the normal
> delay.  Were we to try and get the same level of performance with
> Linux, we'd need to use async mode for our NFS shares -- and we'd lose
> some reliability.
>
> However, given the latest rumblings and ruminations about Oracle
> potentially no longer selling entitlements for Solaris 10 on non-Sun
> hardware -- and then turning around and no longer allowing you to run
> Solaris 10 "freely", we're left with either OpenSolaris or looking at
> Linux again (we run Solaris 10 on Silicon Mechanics hardware).

Is there some problem with OpenSolaris or NexentaStor?

-- 
   Les Mikesell
lesmikes...@gmail.com