Hello,

On Wed, 17 Feb 2016 10:04:11 +0100 Piotr Wachowicz wrote:

> Thanks for your reply.
> 
> 
> > > Let's consider both cases:
> > > Journals on SSDs - for writes, the write operation returns right
> > > after data lands on the Journal's SSDs, but before it's written to
> > > the backing HDD. So, for writes, SSD journal approach should be
> > > comparable to having a SSD cache tier.
> > Not quite, see below.
> >
> >
> Could you elaborate a bit more?
> 
> Are you saying that with a Journal on a SSD writes from clients, before
> they can return from the operation to the client, must end up on both the
> SSD (Journal) *and* HDD (actual data store behind that journal)? 

No, your initial statement is correct. 

However, that burst of speed doesn't last indefinitely.

Aside from the size of the journal (which is incidentally NOT the most
limiting factor), there are various "filestore" parameters in Ceph that
govern this, in particular the sync interval ones ("filestore min sync
interval" and "filestore max sync interval").
A developer posted a more in-depth explanation of this on this ML;
try your google-fu.

For short bursts of activity, the journal helps a LOT.
If you send a huge number of, for example, 4KB writes to your cluster, the
speed will eventually (after a few seconds) drop to what your backing
storage (HDDs) is capable of sustaining.
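
To make that burst-then-settle behaviour concrete, here is a toy model,
not a description of Ceph internals. It treats whatever limits the
un-flushed data (journal size, filestore sync and queue settings) as a
single bounded buffer. The function name, the parameters and all numbers
are made up for illustration.

```python
def simulate_burst(incoming_mb_s, hdd_mb_s, buffer_mb, seconds):
    """Return the MB accepted from clients in each one-second step.

    Toy model (an assumption, not Ceph's actual logic): writes are
    ACKed as soon as they fit into a bounded SSD-side buffer, while
    the HDD drains that buffer at a fixed rate.
    """
    backlog = 0.0   # MB journaled but not yet flushed to the HDD
    accepted = []
    for _ in range(seconds):
        # Space available this second: free buffer plus what the HDD drains.
        room = buffer_mb - backlog + hdd_mb_s
        take = min(incoming_mb_s, room)
        backlog = max(0.0, backlog + take - hdd_mb_s)
        accepted.append(take)
    return accepted


if __name__ == "__main__":
    # Hypothetical numbers: 400 MB/s offered, 100 MB/s HDD, 1000 MB buffer.
    print(simulate_burst(400, 100, 1000, 6))
```

In this sketch the clients see the full offered rate only until the
buffer is full; after that, the accepted rate equals the HDD rate.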

> I was
> under the impression that one of the benefits of having a journal on a
> SSD is deferring the write to the slow HDD to a later time, until after
> the write call returns to the client. Is that not the case? If so, that
> would mean SSD cache tier should be much faster in terms of write
> latency than SSD journal.
> 
> 
> > In your specific case writes to the OSDs (HDDs) will be (at least) 50%
> > slower if your journals are on disk instead of the SSD.
> >
> 
> Is that because of the above -- with Journal on the same disk (HDD) as
> the data, writes have to be written twice (assuming no btrfs/zfs cow) to
> the HDD (journal, and data). Whereas with a Journal on the SSD write to
> the Journal and disk can be done in parallel with write to the HDD? 
Yes, as far as the doubling of the I/O and thus the halving of speed is
concerned. Even with disk-based journals, the ACK of course happens once
ALL OSDs involved have finished writing to their journals.
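
The halving can be written down as back-of-the-envelope arithmetic. This
is only a sketch under the assumption that the journal write and the data
write are the same size and share the HDD's bandwidth evenly. It ignores
the extra seeks between the journal and data areas, which is why the
real-world penalty is "at least" 50%. The function name and the numbers
are hypothetical.

```python
def effective_write_mb_s(hdd_mb_s, journal_on_hdd):
    """Upper bound on client write throughput for one HDD-backed OSD.

    Assumption: with the journal on the same HDD, every client write
    hits that disk twice (journal + data); with the journal on an SSD,
    the HDD only sees the data write.
    """
    writes_per_client_write = 2 if journal_on_hdd else 1
    return hdd_mb_s / writes_per_client_write


if __name__ == "__main__":
    # Hypothetical HDD that sustains 120 MB/s of writes.
    print(effective_write_mb_s(120, journal_on_hdd=True))
    print(effective_write_mb_s(120, journal_on_hdd=False))
```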

>(But
> still both of those have to be completed before the write operation
> returns to the client).
>
See above; eventually, kinda-sorta.
> 
> 
> > (Which SSDs do you plan to use anyway?)
> >
> 
> Intel DC S3700
> 
Good choice. With the 200GB model, prefer the 3700 over the 3710 (higher
sequential write speed).

Christian
> 
> Thanks,
> Piotr


-- 
Christian Balzer        Network/Systems Engineer                
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
