Re: [ceph-users] CEPH cache layer. Very slow

2015-08-14 Thread Voloshanenko Igor
72 osd, 60 hdd, 12 ssd
Primary workload - rbd, kvm

On Friday, 14 August 2015, Ben Hines wrote:

> Nice to hear that you have no SSD failures yet in 10 months.
>
> How many OSDs are you running, and what is your primary ceph workload?
> (RBD, rgw, etc?)
>
> -Ben
>
> On Fri, Aug 14, 2015 at 2:23 AM, Межов Игорь Александрович wrote:
> > Hi!
> >
> >
> > Of course, it isn't cheap at all, but we use the Intel DC S3700 200GB for
> > Ceph journals and the DC S3700 400GB in the SSD pool: same hosts, separate
> > root in the crushmap.
> >
> > The SSD pool is not yet in production; the journalling SSDs have worked
> > under production load for 10 months. They're in good condition - no faults,
> > no degradation.
> >
> > We deliberately chose 200GB SSDs for the journals to reduce costs, and we
> > also run a higher-than-recommended OSD/SSD ratio: 1 SSD per 10-12 OSDs,
> > while the recommendation is 1:3 to 1:6.
> >
> > So, in conclusion: I'd recommend getting a bigger budget and buying
> > durable, fast SSDs for Ceph.
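
For reference, with filestore a single SSD is shared by several OSDs simply by
passing it as the journal device when each OSD is prepared; ceph-disk then
carves one journal partition per OSD out of it. A rough sketch (device names
are just examples, not taken from this cluster):

ceph-disk prepare /dev/sdb /dev/sdm   # data on sdb, journal partition carved from sdm
ceph-disk prepare /dev/sdc /dev/sdm   # second OSD, second journal partition on the same SSD
ceph-disk activate /dev/sdb1
ceph-disk activate /dev/sdc1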
> >
> > Megov Igor
> > CIO, Yuterra
> >
> > ________
> > From: ceph-users on behalf of Voloshanenko Igor
> > Sent: 13 August 2015, 15:54
> > To: Jan Schermer
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] CEPH cache layer. Very slow
> >
> > So, good, but the price for the 845 DC PRO 400 GB is about 2x higher than
> > the Intel S3500 240G (((
> >
> > Any other models? (((
> >
> > 2015-08-13 15:45 GMT+03:00 Jan Schermer:
> >>
> >> I tested and can recommend the Samsung 845 DC PRO (make sure it is DC PRO
> >> and not just "PRO" or "DC EVO"!).
> >> Those were very cheap but are out of stock at the moment (here).
> >> Faster than Intels, cheaper, and slightly different technology (3D V-NAND),
> >> which IMO makes them superior without needing many tricks to do their job.
> >>
> >> Jan
> >>
> >> On 13 Aug 2015, at 14:40, Voloshanenko Igor <igor.voloshane...@gmail.com> wrote:
> >>
> >> Tnx, Irek! Will try!
> >>
> >> But another question to all: which SSDs are good enough for Ceph now?
> >>
> >> I'm looking into the S3500 240G (I have some S3500 120G drives which show
> >> great results, around 8x better than the Samsung).
> >>
> >> Could you possibly give advice on other vendors/models at the same or
> >> lower price level as the S3500 240G?
> >>
> >> 2015-08-13 12:11 GMT+03:00 Irek Fasikhov:
> >>>
> >>> Hi, Igor.
> >>> Try applying the patch from here:
> >>> http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov
> >>>
> >>> P.S. I no longer track changes in this direction (kernel), because we
> >>> already use the recommended SSDs.
> >>>
> >>> Best regards, Irek Fasikhov
> >>> Mobile: +79229045757
> >>>
> >>> 2015-08-13 11:56 GMT+03:00 Voloshanenko Igor:
> >>>>
> >>>> So, after testing the SSD (I wiped one SSD and used it for tests):
> >>>>
> >>>> root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1
> >>>> --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
> >>>> --group_reporting --name=journal-test
> >>>> journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
> >>>> iodepth=1
> >>>> fio-2.1.3
> >>>> Starting 1 process
> >>>> Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops]
> [eta
> >>>> 00m:00s]
> >>>> journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13
> >>>> 10:46:42 2015
> >>>>   write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
> >>>> clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
> >>>>  lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
> >>>> clat percentiles (usec):
> >>>>  |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[
> >>>> 2928],
> >>>>  | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[
> >>>> 3408],
> >>>>  | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[
> 

Re: [ceph-users] CEPH cache layer. Very slow

2015-08-13 Thread Voloshanenko Igor
So, good, but the price for the 845 DC PRO 400 GB is about 2x higher than the
Intel S3500 240G (((

Any other models? (((

2015-08-13 15:45 GMT+03:00 Jan Schermer :

> I tested and can recommend the Samsung 845 DC PRO (make sure it is DC PRO
> and not just "PRO" or "DC EVO"!).
> Those were very cheap but are out of stock at the moment (here).
> Faster than Intels, cheaper, and slightly different technology (3D V-NAND)
> which IMO makes them superior without needing many tricks to do their job.
>
> Jan
>
> On 13 Aug 2015, at 14:40, Voloshanenko Igor 
> wrote:
>
> Tnx, Irek! Will try!
>
> But another question to all: which SSDs are good enough for Ceph now?
>
> I'm looking into the S3500 240G (I have some S3500 120G drives which show
> great results, around 8x better than the Samsung).
>
> Could you possibly give advice on other vendors/models at the same or lower
> price level as the S3500 240G?
>
> 2015-08-13 12:11 GMT+03:00 Irek Fasikhov :
>
>> Hi, Igor.
>> Try applying the patch from here:
>>
>> http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov
>>
>> P.S. I no longer track changes in this direction (kernel), because we
>> already use the recommended SSDs.
>>
>> Best regards, Irek Fasikhov
>> Mobile: +79229045757
>>
>> 2015-08-13 11:56 GMT+03:00 Voloshanenko Igor:
>>
>>> So, after testing the SSD (I wiped one SSD and used it for tests):
>>>
>>> root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1
>>> --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
>>> --group_reporting --name=journal-test
>>> journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
>>> iodepth=1
>>> fio-2.1.3
>>> Starting 1 process
>>> Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta
>>> 00m:00s]
>>> journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13
>>> 10:46:42 2015
>>>   write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
>>> clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
>>>  lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
>>> clat percentiles (usec):
>>>  |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[
>>> 2928],
>>>  | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[
>>> 3408],
>>>  | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[
>>> 4016],
>>>  | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792],
>>> 99.95th=[10048],
>>>  | 99.99th=[14912]
>>> bw (KB  /s): min= 1064, max= 1213, per=100.00%, avg=1150.07,
>>> stdev=34.31
>>> lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
>>>   cpu  : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
>>>   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>>> >=64=0.0%
>>>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>>> >=64=0.0%
>>>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>>> >=64=0.0%
>>>  issued: total=r=0/w=17243/d=0, short=r=0/w=0/d=0
>>>
>>> Run status group 0 (all jobs):
>>>   WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s,
>>> mint=60001msec, maxt=60001msec
>>>
>>> Disk stats (read/write):
>>>   sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576, util=99.30%
>>>
>>> So, it's painful... the SSD does only 287 IOPS at 4K... 1.1 MB/s.
>>>
>>> I tried to change the cache mode:
>>> echo temporary write through > /sys/class/scsi_disk/2:0:0:0/cache_type
>>> echo temporary write through > /sys/class/scsi_disk/3:0:0:0/cache_type
>>>
>>> No luck, still the same poor results. I also found this article:
>>> https://lkml.org/lkml/2013/11/20/264 pointing to an old, very simple patch
>>> which disables CMD_FLUSH:
>>> https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba
>>>
>>> Does anybody have better ideas on how to improve this? (Or how to disable
>>> CMD_FLUSH without recompiling the kernel? I use Ubuntu with kernel 4.0.4
>>> for now - the 4.x branch, because the SSD 850 Pro has an issue with NCQ
>>> TRIM, and before 4.0.4 this exception was not included in libata-core.c.)
>>>
>>> 2015-08-12 19:17 GMT+03:00 Pieter Koorts :
>>>
 Hi Igor

 I suspect you have very much the same problem as me.

 https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22260.html

 Basically Samsung drives (like many SATA SSD's) are very much hit and
 miss so you will need to test them like described here to see if they are
 any good.
 http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

 To give you an idea my average performance went from 11MB/s (with
 Samsung SSD) to 30MB/s (without any SSD) on write performance. This is a
 very small cluster.

 Pieter

 On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor <
 igor.voloshane...@gmail.com> wrote:

 Hi all, we have set up a Ceph cluster with 60 OSDs (2 diff types): 5 nodes,
 12 disks on each (10 HDD, 2 SSD).

 We also cover this with a custom crushmap with two roots:

 ID   WEIGHT  TYPE NAM

Re: [ceph-users] CEPH cache layer. Very slow

2015-08-13 Thread Jan Schermer
I tested and can recommend the Samsung 845 DC PRO (make sure it is DC PRO and 
not just "PRO" or "DC EVO"!).
Those were very cheap but are out of stock at the moment (here).
Faster than Intels, cheaper, and slightly different technology (3D V-NAND) 
which IMO makes them superior without needing many tricks to do their job.

Jan

> On 13 Aug 2015, at 14:40, Voloshanenko Igor  
> wrote:
> 
> Tnx, Irek! Will try!
> 
> But another question to all: which SSDs are good enough for Ceph now?
>
> I'm looking into the S3500 240G (I have some S3500 120G drives which show
> great results, around 8x better than the Samsung).
>
> Could you possibly give advice on other vendors/models at the same or lower
> price level as the S3500 240G?
> 
> 2015-08-13 12:11 GMT+03:00 Irek Fasikhov:
> Hi, Igor.
> Try applying the patch from here:
> http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov
>
> P.S. I no longer track changes in this direction (kernel), because we
> already use the recommended SSDs.
>
> Best regards, Irek Fasikhov
> Mobile: +79229045757
> 
> 2015-08-13 11:56 GMT+03:00 Voloshanenko Igor:
> So, after testing the SSD (I wiped one SSD and used it for tests):
>
> root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1 --rw=write
> --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
> --group_reporting --name=journal-test
> journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
> fio-2.1.3
> Starting 1 process
> Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta 
> 00m:00s]
> journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13 10:46:42 
> 2015
>   write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
> clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
>  lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
> clat percentiles (usec):
>  |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[ 2928],
>  | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[ 3408],
>  | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[ 4016],
>  | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792], 99.95th=[10048],
>  | 99.99th=[14912]
> bw (KB  /s): min= 1064, max= 1213, per=100.00%, avg=1150.07, stdev=34.31
> lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
>   cpu  : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
>   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
> >=64=0.0%
>  issued: total=r=0/w=17243/d=0, short=r=0/w=0/d=0
> 
> Run status group 0 (all jobs):
>   WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s, 
> mint=60001msec, maxt=60001msec
> 
> Disk stats (read/write):
>   sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576, util=99.30%
> 
> So, it's painful... the SSD does only 287 IOPS at 4K... 1.1 MB/s.
>
> I tried to change the cache mode:
> echo temporary write through > /sys/class/scsi_disk/2:0:0:0/cache_type
> echo temporary write through > /sys/class/scsi_disk/3:0:0:0/cache_type
>
> No luck, still the same poor results. I also found this article:
> https://lkml.org/lkml/2013/11/20/264 pointing to an old, very simple patch
> which disables CMD_FLUSH:
> https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba
>
> Does anybody have better ideas on how to improve this? (Or how to disable
> CMD_FLUSH without recompiling the kernel? I use Ubuntu with kernel 4.0.4 for
> now - the 4.x branch, because the SSD 850 Pro has an issue with NCQ TRIM,
> and before 4.0.4 this exception was not included in libata-core.c.)
> 
> 2015-08-12 19:17 GMT+03:00 Pieter Koorts  >:
> Hi Igor
> 
> I suspect you have very much the same problem as me.
> 
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22260.html 
> 
> 
> Basically Samsung drives (like many SATA SSD's) are very much hit and miss so 
> you will need to test them like described here to see if they are any good. 
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>  
> 
> 
> To give you an idea my average performance went from 11MB/s (with Samsung 
> SSD) to 30MB/s (without any SSD) on write performance. This is a very small 
> cluster.
> 
> Pieter
> 
> On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor  > wrote:
> 
>> Hi all, we have set up a Ceph cluster with 60 OSDs (2 diff types) (5 node

Re: [ceph-users] CEPH cache layer. Very slow

2015-08-13 Thread Voloshanenko Igor
Tnx, Irek! Will try!

But another question to all: which SSDs are good enough for Ceph now?

I'm looking into the S3500 240G (I have some S3500 120G drives which show
great results, around 8x better than the Samsung).

Could you possibly give advice on other vendors/models at the same or lower
price level as the S3500 240G?

2015-08-13 12:11 GMT+03:00 Irek Fasikhov :

> Hi, Igor.
> Try applying the patch from here:
>
> http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov
>
> P.S. I no longer track changes in this direction (kernel), because we
> already use the recommended SSDs.
>
> Best regards, Irek Fasikhov
> Mobile: +79229045757
>
> 2015-08-13 11:56 GMT+03:00 Voloshanenko Igor:
>
>> So, after testing the SSD (I wiped one SSD and used it for tests):
>>
>> root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1
>> --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
>> --group_reporting --name=journal-test
>> journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
>> iodepth=1
>> fio-2.1.3
>> Starting 1 process
>> Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta
>> 00m:00s]
>> journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13
>> 10:46:42 2015
>>   write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
>> clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
>>  lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
>> clat percentiles (usec):
>>  |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[ 2928],
>>  | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[ 3408],
>>  | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[ 4016],
>>  | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792], 99.95th=[10048],
>>  | 99.99th=[14912]
>> bw (KB  /s): min= 1064, max= 1213, per=100.00%, avg=1150.07,
>> stdev=34.31
>> lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
>>   cpu  : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
>>   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>> >=64=0.0%
>>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>>  issued: total=r=0/w=17243/d=0, short=r=0/w=0/d=0
>>
>> Run status group 0 (all jobs):
>>   WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s,
>> mint=60001msec, maxt=60001msec
>>
>> Disk stats (read/write):
>>   sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576, util=99.30%
>>
>> So, it's painful... the SSD does only 287 IOPS at 4K... 1.1 MB/s.
>>
>> I tried to change the cache mode:
>> echo temporary write through > /sys/class/scsi_disk/2:0:0:0/cache_type
>> echo temporary write through > /sys/class/scsi_disk/3:0:0:0/cache_type
>>
>> No luck, still the same poor results. I also found this article:
>> https://lkml.org/lkml/2013/11/20/264 pointing to an old, very simple patch
>> which disables CMD_FLUSH:
>> https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba
>>
>> Does anybody have better ideas on how to improve this? (Or how to disable
>> CMD_FLUSH without recompiling the kernel? I use Ubuntu with kernel 4.0.4
>> for now - the 4.x branch, because the SSD 850 Pro has an issue with NCQ
>> TRIM, and before 4.0.4 this exception was not included in libata-core.c.)
>>
>> 2015-08-12 19:17 GMT+03:00 Pieter Koorts :
>>
>>> Hi Igor
>>>
>>> I suspect you have very much the same problem as me.
>>>
>>> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22260.html
>>>
>>> Basically Samsung drives (like many SATA SSD's) are very much hit and
>>> miss so you will need to test them like described here to see if they are
>>> any good.
>>> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>>>
>>> To give you an idea my average performance went from 11MB/s (with
>>> Samsung SSD) to 30MB/s (without any SSD) on write performance. This is a
>>> very small cluster.
>>>
>>> Pieter
>>>
>>> On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor <
>>> igor.voloshane...@gmail.com> wrote:
>>>
>>> Hi all, we have set up a Ceph cluster with 60 OSDs (2 diff types): 5 nodes,
>>> 12 disks on each (10 HDD, 2 SSD).
>>>
>>> We also cover this with a custom crushmap with two roots:
>>>
>>> ID   WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>> -100 5.0 root ssd
>>> -102 1.0 host ix-s2-ssd
>>>2 1.0 osd.2   up  1.0  1.0
>>>9 1.0 osd.9   up  1.0  1.0
>>> -103 1.0 host ix-s3-ssd
>>>3 1.0 osd.3   up  1.0  1.0
>>>7 1.0 osd.7   up  1.0  1.0
>>> -104 1.0 host ix-s5-ssd
>>>1 1.0 osd.1   up  1.0  1.0
>>>6 1.0 osd.6   up  1.0  1.0
>>> -105 1.0 host ix-s6-ssd
>>>4 1.0 osd.4

Re: [ceph-users] CEPH cache layer. Very slow

2015-08-13 Thread Irek Fasikhov
Hi, Igor.
Try applying the patch from here:
http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov

P.S. I no longer track changes in this direction (kernel), because we
already use the recommended SSDs.

Best regards, Irek Fasikhov
Mobile: +79229045757

2015-08-13 11:56 GMT+03:00 Voloshanenko Igor :

> So, after testing the SSD (I wiped one SSD and used it for tests):
>
> root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1 --rw=write
> --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
> --group_reporting --name=journal-test
> journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
> iodepth=1
> fio-2.1.3
> Starting 1 process
> Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta
> 00m:00s]
> journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13
> 10:46:42 2015
>   write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
> clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
>  lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
> clat percentiles (usec):
>  |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[ 2928],
>  | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[ 3408],
>  | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[ 4016],
>  | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792], 99.95th=[10048],
>  | 99.99th=[14912]
> bw (KB  /s): min= 1064, max= 1213, per=100.00%, avg=1150.07,
> stdev=34.31
> lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
>   cpu  : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
>   IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>  issued: total=r=0/w=17243/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
>   WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s,
> mint=60001msec, maxt=60001msec
>
> Disk stats (read/write):
>   sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576, util=99.30%
>
> So, it's painful... the SSD does only 287 IOPS at 4K... 1.1 MB/s.
>
> I tried to change the cache mode:
> echo temporary write through > /sys/class/scsi_disk/2:0:0:0/cache_type
> echo temporary write through > /sys/class/scsi_disk/3:0:0:0/cache_type
>
> No luck, still the same poor results. I also found this article:
> https://lkml.org/lkml/2013/11/20/264 pointing to an old, very simple patch
> which disables CMD_FLUSH:
> https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba
>
> Does anybody have better ideas on how to improve this? (Or how to disable
> CMD_FLUSH without recompiling the kernel? I use Ubuntu with kernel 4.0.4 for
> now - the 4.x branch, because the SSD 850 Pro has an issue with NCQ TRIM,
> and before 4.0.4 this exception was not included in libata-core.c.)
>
> 2015-08-12 19:17 GMT+03:00 Pieter Koorts :
>
>> Hi Igor
>>
>> I suspect you have very much the same problem as me.
>>
>> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22260.html
>>
>> Basically Samsung drives (like many SATA SSD's) are very much hit and
>> miss so you will need to test them like described here to see if they are
>> any good.
>> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>>
>> To give you an idea my average performance went from 11MB/s (with Samsung
>> SSD) to 30MB/s (without any SSD) on write performance. This is a very small
>> cluster.
>>
>> Pieter
>>
>> On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor <
>> igor.voloshane...@gmail.com> wrote:
>>
>> Hi all, we have set up a Ceph cluster with 60 OSDs (2 diff types): 5 nodes,
>> 12 disks on each (10 HDD, 2 SSD).
>>
>> We also cover this with a custom crushmap with two roots:
>>
>> ID   WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -100 5.0 root ssd
>> -102 1.0 host ix-s2-ssd
>>2 1.0 osd.2   up  1.0  1.0
>>9 1.0 osd.9   up  1.0  1.0
>> -103 1.0 host ix-s3-ssd
>>3 1.0 osd.3   up  1.0  1.0
>>7 1.0 osd.7   up  1.0  1.0
>> -104 1.0 host ix-s5-ssd
>>1 1.0 osd.1   up  1.0  1.0
>>6 1.0 osd.6   up  1.0  1.0
>> -105 1.0 host ix-s6-ssd
>>4 1.0 osd.4   up  1.0  1.0
>>8 1.0 osd.8   up  1.0  1.0
>> -106 1.0 host ix-s7-ssd
>>0 1.0 osd.0   up  1.0  1.0
>>5 1.0 osd.5   up  1.0  1.0
>>   -1 5.0 root platter
>>   -2 1.0 host ix-s2-platter
>>   13 1.0 osd.13  up  1.0  1.0
>>   17 1.0 osd.17   

Re: [ceph-users] CEPH cache layer. Very slow

2015-08-13 Thread Voloshanenko Igor
So, after testing the SSD (I wiped one SSD and used it for tests):

root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1 --rw=write
--bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
--group_reporting --name=journal-test
journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
iodepth=1
fio-2.1.3
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta
00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13 10:46:42
2015
  write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
 lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
clat percentiles (usec):
 |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[ 2928],
 | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[ 3408],
 | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[ 4016],
 | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792], 99.95th=[10048],
 | 99.99th=[14912]
bw (KB  /s): min= 1064, max= 1213, per=100.00%, avg=1150.07, stdev=34.31
lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
  cpu  : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 issued: total=r=0/w=17243/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s,
mint=60001msec, maxt=60001msec

Disk stats (read/write):
  sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576, util=99.30%

So, it's painful... the SSD does only 287 IOPS at 4K... 1.1 MB/s.

I tried to change the cache mode:
echo temporary write through > /sys/class/scsi_disk/2:0:0:0/cache_type
echo temporary write through > /sys/class/scsi_disk/3:0:0:0/cache_type
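
The setting can be read back from the same sysfs attribute to confirm it took
effect (same SCSI addresses as above):

cat /sys/class/scsi_disk/2:0:0:0/cache_type
cat /sys/class/scsi_disk/3:0:0:0/cache_type

As far as I know, the "temporary" prefix only changes the kernel's view of the
cache mode and does not send a MODE SELECT to the drive itself.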

No luck, still the same poor results. I also found this article:
https://lkml.org/lkml/2013/11/20/264 pointing to an old, very simple patch
which disables CMD_FLUSH:
https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba

Does anybody have better ideas on how to improve this? (Or how to disable
CMD_FLUSH without recompiling the kernel? I use Ubuntu with kernel 4.0.4 for
now - the 4.x branch, because the SSD 850 Pro has an issue with NCQ TRIM, and
before 4.0.4 this exception was not included in libata-core.c.)
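
One way to see how much of this is the flush path rather than the flash itself
is to repeat the same run without sync (a sketch; same caveat as above, this
writes to the raw device, so only on a disk you can wipe):

# baseline: O_DSYNC 4k writes (journal-like pattern), as above
fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 \
    --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
# same run without sync, to isolate the cost of the per-write cache flush
fio --filename=/dev/sda --direct=1 --sync=0 --rw=write --bs=4k --numjobs=1 \
    --iodepth=1 --runtime=60 --time_based --group_reporting --name=nosync-test

If the second run is dramatically faster, the drive is choking on the per-write
cache flush (CMD_FLUSH) rather than on the 4k writes themselves.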

2015-08-12 19:17 GMT+03:00 Pieter Koorts :

> Hi Igor
>
> I suspect you have very much the same problem as me.
>
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22260.html
>
> Basically Samsung drives (like many SATA SSD's) are very much hit and miss
> so you will need to test them like described here to see if they are any
> good.
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>
> To give you an idea my average performance went from 11MB/s (with Samsung
> SSD) to 30MB/s (without any SSD) on write performance. This is a very small
> cluster.
>
> Pieter
>
> On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor <
> igor.voloshane...@gmail.com> wrote:
>
> Hi all, we have set up a Ceph cluster with 60 OSDs (2 diff types): 5 nodes,
> 12 disks on each (10 HDD, 2 SSD).
>
> We also cover this with a custom crushmap with two roots:
>
> ID   WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -100 5.0 root ssd
> -102 1.0 host ix-s2-ssd
>2 1.0 osd.2   up  1.0  1.0
>9 1.0 osd.9   up  1.0  1.0
> -103 1.0 host ix-s3-ssd
>3 1.0 osd.3   up  1.0  1.0
>7 1.0 osd.7   up  1.0  1.0
> -104 1.0 host ix-s5-ssd
>1 1.0 osd.1   up  1.0  1.0
>6 1.0 osd.6   up  1.0  1.0
> -105 1.0 host ix-s6-ssd
>4 1.0 osd.4   up  1.0  1.0
>8 1.0 osd.8   up  1.0  1.0
> -106 1.0 host ix-s7-ssd
>0 1.0 osd.0   up  1.0  1.0
>5 1.0 osd.5   up  1.0  1.0
>   -1 5.0 root platter
>   -2 1.0 host ix-s2-platter
>   13 1.0 osd.13  up  1.0  1.0
>   17 1.0 osd.17  up  1.0  1.0
>   21 1.0 osd.21  up  1.0  1.0
>   27 1.0 osd.27  up  1.0  1.0
>   32 1.0 osd.32  up  1.0  1.0
>   37 1.0 osd.37  up  1.0  1.0
>   44 1.0 osd.44  up  1.0  1.0
>   48 1.0 osd.48  up  1.0  1.0
>   55 1.0 osd.55

Re: [ceph-users] CEPH cache layer. Very slow

2015-08-12 Thread Pieter Koorts

Hi Igor

I suspect you have very much the same problem as me.

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22260.html

Basically Samsung drives (like many SATA SSDs) are very much hit and miss, so
you will need to test them as described here to see if they are any good:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
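
A quick first pass in the same spirit is a single-threaded dd with O_DIRECT and
O_DSYNC (this writes to the raw device, so only use a disk you can wipe; the
device name is just an example):

dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync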

To give you an idea, my average write performance went from 11 MB/s (with the
Samsung SSD) to 30 MB/s (without any SSD). This is a very small cluster.

Pieter

On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor  
wrote:

Hi all, we have set up a Ceph cluster with 60 OSDs (2 different types): 5 nodes,
12 disks on each (10 HDD, 2 SSD).

We also cover this with a custom crushmap with two roots:

ID   WEIGHT  TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY
-100 5.0 root ssd
-102 1.0     host ix-s2-ssd
   2 1.0         osd.2               up  1.0          1.0
   9 1.0         osd.9               up  1.0          1.0
-103 1.0     host ix-s3-ssd
   3 1.0         osd.3               up  1.0          1.0
   7 1.0         osd.7               up  1.0          1.0
-104 1.0     host ix-s5-ssd
   1 1.0         osd.1               up  1.0          1.0
   6 1.0         osd.6               up  1.0          1.0
-105 1.0     host ix-s6-ssd
   4 1.0         osd.4               up  1.0          1.0
   8 1.0         osd.8               up  1.0          1.0
-106 1.0     host ix-s7-ssd
   0 1.0         osd.0               up  1.0          1.0
   5 1.0         osd.5               up  1.0          1.0
  -1 5.0 root platter
  -2 1.0     host ix-s2-platter
  13 1.0         osd.13              up  1.0          1.0
  17 1.0         osd.17              up  1.0          1.0
  21 1.0         osd.21              up  1.0          1.0
  27 1.0         osd.27              up  1.0          1.0
  32 1.0         osd.32              up  1.0          1.0
  37 1.0         osd.37              up  1.0          1.0
  44 1.0         osd.44              up  1.0          1.0
  48 1.0         osd.48              up  1.0          1.0
  55 1.0         osd.55              up  1.0          1.0
  59 1.0         osd.59              up  1.0          1.0
  -3 1.0     host ix-s3-platter
  14 1.0         osd.14              up  1.0          1.0
  18 1.0         osd.18              up  1.0          1.0
  23 1.0         osd.23              up  1.0          1.0
  28 1.0         osd.28              up  1.0          1.0
  33 1.0         osd.33              up  1.0          1.0
  39 1.0         osd.39              up  1.0          1.0
  43 1.0         osd.43              up  1.0          1.0
  47 1.0         osd.47              up  1.0          1.0
  54 1.0         osd.54              up  1.0          1.0
  58 1.0         osd.58              up  1.0          1.0
  -4 1.0     host ix-s5-platter
  11 1.0         osd.11              up  1.0          1.0
  16 1.0         osd.16              up  1.0          1.0
  22 1.0         osd.22              up  1.0          1.0
  26 1.0         osd.26              up  1.0          1.0
  31 1.0         osd.31              up  1.0          1.0
  36 1.0         osd.36              up  1.0          1.0
  41 1.0         osd.41              up  1.0          1.0
  46 1.0         osd.46              up  1.0          1.0
  51 1.0         osd.51              up  1.0          1.0
  56 1.0         osd.56              up  1.0          1.0
  -5 1.0     host ix-s6-platter
  12 1.0         osd.12              up  1.0          1.0
  19 1.0         osd.19              up  1.0          1.0
 24 1.0         osd.24              up  1.0          1.0
  29 1.0         osd.29              up  1.0          1.0
  34 1.0         osd.34              up  1.0          1.0
  38 1.0         osd.38              up  1.0          1.0
  42 1.0         osd.42              up  1.0          1.0
  50 1.0         osd.50              up  1.0          1.0
  53 1.0         osd.53              up  1.0          1.0
  57 1.0         osd.57              up  1.0          1.0
  -6 1.0     host ix-s7-platter
  10 1.0         osd.10              up  1.0          1.0
  15 1.0         osd.15              up  1.0          1.0
  20 1.0         osd.20              up  1.0          1.0
  25 1.0         osd.25              up

[ceph-users] CEPH cache layer. Very slow

2015-08-12 Thread Voloshanenko Igor
Hi all, we have set up a Ceph cluster with 60 OSDs (2 different types): 5 nodes,
12 disks on each (10 HDD, 2 SSD).

We also cover this with a custom crushmap with two roots:

ID   WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-100 5.0 root ssd
-102 1.0 host ix-s2-ssd
   2 1.0 osd.2   up  1.0  1.0
   9 1.0 osd.9   up  1.0  1.0
-103 1.0 host ix-s3-ssd
   3 1.0 osd.3   up  1.0  1.0
   7 1.0 osd.7   up  1.0  1.0
-104 1.0 host ix-s5-ssd
   1 1.0 osd.1   up  1.0  1.0
   6 1.0 osd.6   up  1.0  1.0
-105 1.0 host ix-s6-ssd
   4 1.0 osd.4   up  1.0  1.0
   8 1.0 osd.8   up  1.0  1.0
-106 1.0 host ix-s7-ssd
   0 1.0 osd.0   up  1.0  1.0
   5 1.0 osd.5   up  1.0  1.0
  -1 5.0 root platter
  -2 1.0 host ix-s2-platter
  13 1.0 osd.13  up  1.0  1.0
  17 1.0 osd.17  up  1.0  1.0
  21 1.0 osd.21  up  1.0  1.0
  27 1.0 osd.27  up  1.0  1.0
  32 1.0 osd.32  up  1.0  1.0
  37 1.0 osd.37  up  1.0  1.0
  44 1.0 osd.44  up  1.0  1.0
  48 1.0 osd.48  up  1.0  1.0
  55 1.0 osd.55  up  1.0  1.0
  59 1.0 osd.59  up  1.0  1.0
  -3 1.0 host ix-s3-platter
  14 1.0 osd.14  up  1.0  1.0
  18 1.0 osd.18  up  1.0  1.0
  23 1.0 osd.23  up  1.0  1.0
  28 1.0 osd.28  up  1.0  1.0
  33 1.0 osd.33  up  1.0  1.0
  39 1.0 osd.39  up  1.0  1.0
  43 1.0 osd.43  up  1.0  1.0
  47 1.0 osd.47  up  1.0  1.0
  54 1.0 osd.54  up  1.0  1.0
  58 1.0 osd.58  up  1.0  1.0
  -4 1.0 host ix-s5-platter
  11 1.0 osd.11  up  1.0  1.0
  16 1.0 osd.16  up  1.0  1.0
  22 1.0 osd.22  up  1.0  1.0
  26 1.0 osd.26  up  1.0  1.0
  31 1.0 osd.31  up  1.0  1.0
  36 1.0 osd.36  up  1.0  1.0
  41 1.0 osd.41  up  1.0  1.0
  46 1.0 osd.46  up  1.0  1.0
  51 1.0 osd.51  up  1.0  1.0
  56 1.0 osd.56  up  1.0  1.0
  -5 1.0 host ix-s6-platter
  12 1.0 osd.12  up  1.0  1.0
  19 1.0 osd.19  up  1.0  1.0
 24 1.0 osd.24  up  1.0  1.0
  29 1.0 osd.29  up  1.0  1.0
  34 1.0 osd.34  up  1.0  1.0
  38 1.0 osd.38  up  1.0  1.0
  42 1.0 osd.42  up  1.0  1.0
  50 1.0 osd.50  up  1.0  1.0
  53 1.0 osd.53  up  1.0  1.0
  57 1.0 osd.57  up  1.0  1.0
  -6 1.0 host ix-s7-platter
  10 1.0 osd.10  up  1.0  1.0
  15 1.0 osd.15  up  1.0  1.0
  20 1.0 osd.20  up  1.0  1.0
  25 1.0 osd.25  up  1.0  1.0
  30 1.0 osd.30  up  1.0  1.0
  35 1.0 osd.35  up  1.0  1.0
  40 1.0 osd.40  up  1.0  1.0
  45 1.0 osd.45  up  1.0  1.0
  49 1.0 osd.49  up  1.0  1.0
  52 1.0 osd.52  up  1.0  1.0
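
The CRUSH rules that point each pool at one of these roots aren't shown above;
they can be created with something along these lines (the rule names here are
made up, not the ones we actually use):

ceph osd crush rule create-simple ssd-rule ssd host
ceph osd crush rule create-simple platter-rule platter host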


Then we create 2 pools, 1 on HDD (platters) and 1 on SSD,
and put the SSD pool in front of the HDD pool (cache tier).
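
For reference, that tiering setup is typically created with commands along
these lines (pool names, PG counts and the flush threshold are placeholders,
not our exact values):

ceph osd pool create cold-storage 2048 2048 replicated platter-rule
ceph osd pool create hot-storage 512 512 replicated ssd-rule
ceph osd tier add cold-storage hot-storage
ceph osd tier cache-mode hot-storage writeback
ceph osd tier set-overlay cold-storage hot-storage
ceph osd pool set hot-storage hit_set_type bloom
ceph osd pool set hot-storage hit_set_count 1
ceph osd pool set hot-storage hit_set_period 3600
ceph osd pool set hot-storage target_max_bytes 200000000000

With writeback mode, client writes land in the SSD pool first and are flushed
or evicted down to the platter pool based on the hit_set and target_max settings.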

Now we receive very bad performance results from the cluster.
Even with rados