Re: [ceph-users] Benchmark performance when using SSD as the journal

2018-11-14 Thread Dave.Chen
Hi Roos,

I will try that configuration, thank you very much!

Best Regards,
Dave Chen

-Original Message-
From: Marc Roos  
Sent: Wednesday, November 14, 2018 4:37 PM
To: ceph-users; Chen2, Dave
Subject: RE: [ceph-users] Benchmark performance when using SSD as the journal




 

Try comparing results from something like this fio job file:


[global]
ioengine=posixaio
invalidate=1
ramp_time=30
iodepth=1
runtime=180
time_based
direct=1
filename=/mnt/cephfs/ssd/fio-bench.img

[write-4k-seq]
stonewall
bs=4k
rw=write
#write_bw_log=sdx-4k-write-seq.results
#write_iops_log=sdx-4k-write-seq.results

[randwrite-4k-seq]
stonewall
bs=4k
rw=randwrite
#write_bw_log=sdx-4k-randwrite-seq.results
#write_iops_log=sdx-4k-randwrite-seq.results

[read-4k-seq]
stonewall
bs=4k
rw=read
#write_bw_log=sdx-4k-read-seq.results
#write_iops_log=sdx-4k-read-seq.results

[randread-4k-seq]
stonewall
bs=4k
rw=randread
#write_bw_log=sdx-4k-randread-seq.results
#write_iops_log=sdx-4k-randread-seq.results

[rw-4k-seq]
stonewall
bs=4k
rw=rw
#write_bw_log=sdx-4k-rw-seq.results
#write_iops_log=sdx-4k-rw-seq.results

[randrw-4k-seq]
stonewall
bs=4k
rw=randrw
#write_bw_log=sdx-4k-randrw-seq.results
#write_iops_log=sdx-4k-randrw-seq.results

[write-128k-seq]
stonewall
bs=128k
rw=write
#write_bw_log=sdx-128k-write-seq.results
#write_iops_log=sdx-128k-write-seq.results

[randwrite-128k-seq]
stonewall
bs=128k
rw=randwrite
#write_bw_log=sdx-128k-randwrite-seq.results
#write_iops_log=sdx-128k-randwrite-seq.results

[read-128k-seq]
stonewall
bs=128k
rw=read
#write_bw_log=sdx-128k-read-seq.results
#write_iops_log=sdx-128k-read-seq.results

[randread-128k-seq]
stonewall
bs=128k
rw=randread
#write_bw_log=sdx-128k-randread-seq.results
#write_iops_log=sdx-128k-randread-seq.results

[rw-128k-seq]
stonewall
bs=128k
rw=rw
#write_bw_log=sdx-128k-rw-seq.results
#write_iops_log=sdx-128k-rw-seq.results

[randrw-128k-seq]
stonewall
bs=128k
rw=randrw
#write_bw_log=sdx-128k-randrw-seq.results
#write_iops_log=sdx-128k-randrw-seq.results

[write-1024k-seq]
stonewall
bs=1024k
rw=write
#write_bw_log=sdx-1024k-write-seq.results
#write_iops_log=sdx-1024k-write-seq.results

[randwrite-1024k-seq]
stonewall
bs=1024k
rw=randwrite
#write_bw_log=sdx-1024k-randwrite-seq.results
#write_iops_log=sdx-1024k-randwrite-seq.results

[read-1024k-seq]
stonewall
bs=1024k
rw=read
#write_bw_log=sdx-1024k-read-seq.results
#write_iops_log=sdx-1024k-read-seq.results

[randread-1024k-seq]
stonewall
bs=1024k
rw=randread
#write_bw_log=sdx-1024k-randread-seq.results
#write_iops_log=sdx-1024k-randread-seq.results

[rw-1024k-seq]
stonewall
bs=1024k
rw=rw
#write_bw_log=sdx-1024k-rw-seq.results
#write_iops_log=sdx-1024k-rw-seq.results

[randrw-1024k-seq]
stonewall
bs=1024k
rw=randrw
#write_bw_log=sdx-1024k-randrw-seq.results
#write_iops_log=sdx-1024k-randrw-seq.results

[write-4096k-seq]
stonewall
bs=4096k
rw=write
#write_bw_log=sdx-4096k-write-seq.results
#write_iops_log=sdx-4096k-write-seq.results

[randwrite-4096k-seq]
stonewall
bs=4096k
rw=randwrite
#write_bw_log=sdx-4096k-randwrite-seq.results
#write_iops_log=sdx-4096k-randwrite-seq.results

[read-4096k-seq]
stonewall
bs=4096k
rw=read
#write_bw_log=sdx-4096k-read-seq.results
#write_iops_log=sdx-4096k-read-seq.results

[randread-4096k-seq]
stonewall
bs=4096k
rw=randread
#write_bw_log=sdx-4096k-randread-seq.results
#write_iops_log=sdx-4096k-randread-seq.results

[rw-4096k-seq]
stonewall
bs=4096k
rw=rw
#write_bw_log=sdx-4096k-rw-seq.results
#write_iops_log=sdx-4096k-rw-seq.results

[randrw-4096k-seq]
stonewall
bs=4096k
rw=randrw
#write_bw_log=sdx-4096k-randrw-seq.results
#write_iops_log=sdx-4096k-randrw-seq.results
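If the job file above is saved as, say, cephfs-bench.fio (the name is just a placeholder), it can be run and its output captured with something like:

fio cephfs-bench.fio --output=cephfs-bench.txt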



-Original Message-
From: dave.c...@dell.com [mailto:dave.c...@dell.com] 
Sent: Wednesday, 14 November 2018 5:21
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Benchmark performance when using SSD as the 
journal

Hi all,

 

We want to compare the performance of an HDD partition as the journal (inline on the OSD disk) versus an SSD partition as the journal. Here is what we have done: we have 3 nodes used as Ceph OSD hosts, each with 3 OSDs. First, we created the OSDs with the journal on a partition of the OSD disk and ran the "rados bench" utility to test performance; then we migrated the journal from the HDD to an SSD (Intel S4500) and ran "rados bench" again. We expected the SSD journal to perform much better than the HDD journal, but the results show nearly no change.

 

The configuration of Ceph is as below,

pool size: 3

osd size: 3*3

pg (pgp) num: 300

osd nodes are separated across three different nodes

rbd image size: 10G (10240M)

 

The utility I used is,

rados bench -p rbd $duration write

rados bench -p rbd $duration seq

rados bench -p rbd $duration rand

 

Is there anything wrong with what I did? Could anyone give me some suggestions?

 

 

Best Regards,

Dave Chen

 



Re: [ceph-users] Benchmark performance when using SSD as the journal

2018-11-14 Thread Dave.Chen
Thanks Mokhtar! This is what I am looking for, thanks for your explanation!


Best Regards,
Dave Chen

From: Maged Mokhtar 
Sent: Wednesday, November 14, 2018 3:36 PM
To: Chen2, Dave; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Benchmark performance when using SSD as the journal



Hi Dave,

The SSD journal will help boost IOPS and reduce latency, which will be more apparent at small block sizes. The rados bench default block size is 4 MB; use the -b option to specify the size. Try 4k, 32k, 64k, and so on.
As a side note, this is a RADOS-level test, so the RBD image size is not relevant here.
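For example, a 4k run could look something like this (the pool name and duration are just placeholders; --no-cleanup keeps the benchmark objects around so the seq/rand reads have something to read, and the last command removes them afterwards):

rados bench -p rbd 60 write -b 4096 --no-cleanup
rados bench -p rbd 60 seq
rados bench -p rbd 60 rand
rados -p rbd cleanup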

Maged.
On 14/11/18 06:21, dave.c...@dell.com wrote:
Hi all,

We want to compare the performance of an HDD partition as the journal (inline on the OSD disk) versus an SSD partition as the journal. Here is what we have done: we have 3 nodes used as Ceph OSD hosts, each with 3 OSDs. First, we created the OSDs with the journal on a partition of the OSD disk and ran the "rados bench" utility to test performance; then we migrated the journal from the HDD to an SSD (Intel S4500) and ran "rados bench" again. We expected the SSD journal to perform much better than the HDD journal, but the results show nearly no change.

The configuration of Ceph is as below,
pool size: 3
osd size: 3*3
pg (pgp) num: 300
osd nodes are separated across three different nodes
rbd image size: 10G (10240M)

The utility I used is,
rados bench -p rbd $duration write
rados bench -p rbd $duration seq
rados bench -p rbd $duration rand

Is there anything wrong with what I did? Could anyone give me some suggestions?


Best Regards,
Dave Chen







Re: [ceph-users] Benchmark performance when using SSD as the journal

2018-11-13 Thread Dave.Chen
Thanks Martin for your suggestion!
I will definitely try bluestore later. The version of Ceph I am using is v10.2.10 (Jewel); do you think bluestore on Jewel is stable enough, or should I upgrade Ceph to Luminous?


Best Regards,
Dave Chen

From: Martin Verges 
Sent: Wednesday, November 14, 2018 1:49 PM
To: Chen2, Dave
Cc: singap...@amerrick.co.uk; ceph-users
Subject: Re: [ceph-users] Benchmark performance when using SSD as the journal


Please never use the datasheet values to select your SSD. We have never had a single one that delivers the advertised performance in a Ceph journal use case.

However, do not use filestore anymore, especially with newer kernel versions. Use bluestore instead.
--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx

On Wed, Nov 14, 2018 at 05:46, dave.c...@dell.com wrote:
Thanks Merrick!

I checked the Intel spec [1]; the performance Intel claims is:

•  Sequential Read (up to) 500 MB/s
•  Sequential Write (up to) 330 MB/s
•  Random Read (100% Span) 72000 IOPS
•  Random Write (100% Span) 2 IOPS

I think these figures should be much better than a typical HDD, and I have run the read/write commands with "rados bench" respectively, so there should be some difference.

And is there any kind of configuration that could give us a performance gain with this SSD (Intel S4500)?

[1] 
https://ark.intel.com/products/120521/Intel-SSD-DC-S4500-Series-480GB-2-5in-SATA-6Gb-s-3D1-TLC-

Best Regards,
Dave Chen

From: Ashley Merrick <singap...@amerrick.co.uk>
Sent: Wednesday, November 14, 2018 12:30 PM
To: Chen2, Dave
Cc: ceph-users
Subject: Re: [ceph-users] Benchmark performance when using SSD as the journal


Only certain SSDs are good for Ceph journals, as can be seen at
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

The SSD you're using isn't listed, but from a quick search online it appears to be an SSD designed for read workloads as an "upgrade" from an HDD, so it is probably not designed for the high write requirements a journal demands.
Therefore, once it is being hit by the workload of 3 OSDs, you're not going to get much more performance out of it than you would by just using the disk, which is what you're seeing.
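For what it's worth, the test in that post boils down to single-threaded 4k O_DSYNC writes, roughly along these lines (the device name is a placeholder, and writing to the raw device destroys whatever is on it):

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test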

On Wed, Nov 14, 2018 at 12:21 PM dave.c...@dell.com wrote:
Hi all,

We want to compare the performance of an HDD partition as the journal (inline on the OSD disk) versus an SSD partition as the journal. Here is what we have done: we have 3 nodes used as Ceph OSD hosts, each with 3 OSDs. First, we created the OSDs with the journal on a partition of the OSD disk and ran the "rados bench" utility to test performance; then we migrated the journal from the HDD to an SSD (Intel S4500) and ran "rados bench" again. We expected the SSD journal to perform much better than the HDD journal, but the results show nearly no change.

The configuration of Ceph is as below,
pool size: 3
osd size: 3*3
pg (pgp) num: 300
osd nodes are separated across three different nodes
rbd image size: 10G (10240M)

The utility I used is,
rados bench -p rbd $duration write
rados bench -p rbd $duration seq
rados bench -p rbd $duration rand

Is there anything wrong with what I did? Could anyone give me some suggestions?


Best Regards,
Dave Chen



Re: [ceph-users] Benchmark performance when using SSD as the journal

2018-11-13 Thread Dave.Chen
Thanks Merrick!

I haven't tried bluestore yet, but I believe what you said. I tried again with "rbd bench-write" on filestore, and the result shows more than a 50% performance increase with the SSD as the journal, so I still cannot understand why "rados bench" does not show any difference. What is the rationale behind that? Do you know?
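For reference, a typical "rbd bench-write" invocation looks something like the line below; the pool/image name, I/O size and thread count are placeholders:

rbd bench-write rbdbench/test-image --io-size 4096 --io-threads 16 --io-pattern rand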


Best Regards,
Dave Chen

From: Ashley Merrick 
Sent: Wednesday, November 14, 2018 12:49 PM
To: Chen2, Dave
Cc: ceph-users
Subject: Re: [ceph-users] Benchmark performance when using SSD as the journal


Well, as you mentioned journals, I guess you were using filestore in your test?

You could go down the route of bluestore and put the WAL + DB onto the SSD and the bluestore data onto the HDD; you should notice an increase in performance over both methods you have tried on filestore.
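On Luminous, a sketch of that layout with ceph-volume would look roughly like the following (device names are placeholders; without a separate --block.wal the WAL is colocated with the DB on the SSD):

ceph-volume lvm create --bluestore --data /dev/sdX --block.db /dev/sdY1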

On Wed, Nov 14, 2018 at 12:45 PM dave.c...@dell.com wrote:
Thanks Merrick!

I checked the Intel spec [1]; the performance Intel claims is:

•  Sequential Read (up to) 500 MB/s
•  Sequential Write (up to) 330 MB/s
•  Random Read (100% Span) 72000 IOPS
•  Random Write (100% Span) 2 IOPS

I think these figures should be much better than a typical HDD, and I have run the read/write commands with "rados bench" respectively, so there should be some difference.

And is there any kind of configuration that could give us a performance gain with this SSD (Intel S4500)?

[1] 
https://ark.intel.com/products/120521/Intel-SSD-DC-S4500-Series-480GB-2-5in-SATA-6Gb-s-3D1-TLC-

Best Regards,
Dave Chen

From: Ashley Merrick <singap...@amerrick.co.uk>
Sent: Wednesday, November 14, 2018 12:30 PM
To: Chen2, Dave
Cc: ceph-users
Subject: Re: [ceph-users] Benchmark performance when using SSD as the journal


Only certain SSDs are good for Ceph journals, as can be seen at
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

The SSD you're using isn't listed, but from a quick search online it appears to be an SSD designed for read workloads as an "upgrade" from an HDD, so it is probably not designed for the high write requirements a journal demands.
Therefore, once it is being hit by the workload of 3 OSDs, you're not going to get much more performance out of it than you would by just using the disk, which is what you're seeing.

On Wed, Nov 14, 2018 at 12:21 PM dave.c...@dell.com wrote:
Hi all,

We want to compare the performance of an HDD partition as the journal (inline on the OSD disk) versus an SSD partition as the journal. Here is what we have done: we have 3 nodes used as Ceph OSD hosts, each with 3 OSDs. First, we created the OSDs with the journal on a partition of the OSD disk and ran the "rados bench" utility to test performance; then we migrated the journal from the HDD to an SSD (Intel S4500) and ran "rados bench" again. We expected the SSD journal to perform much better than the HDD journal, but the results show nearly no change.

The configuration of Ceph is as below,
pool size: 3
osd size: 3*3
pg (pgp) num: 300
osd nodes are separated across three different nodes
rbd image size: 10G (10240M)

The utility I used is,
rados bench -p rbd $duration write
rados bench -p rbd $duration seq
rados bench -p rbd $duration rand

Is there anything wrong with what I did? Could anyone give me some suggestions?


Best Regards,
Dave Chen



Re: [ceph-users] Benchmark performance when using SSD as the journal

2018-11-13 Thread Dave.Chen
Thanks Merrick!

I checked the Intel spec [1]; the performance Intel claims is:

•  Sequential Read (up to) 500 MB/s
•  Sequential Write (up to) 330 MB/s
•  Random Read (100% Span) 72000 IOPS
•  Random Write (100% Span) 2 IOPS

I think these figures should be much better than a typical HDD, and I have run the read/write commands with "rados bench" respectively, so there should be some difference.

And is there any kind of configuration that could give us a performance gain with this SSD (Intel S4500)?

[1] 
https://ark.intel.com/products/120521/Intel-SSD-DC-S4500-Series-480GB-2-5in-SATA-6Gb-s-3D1-TLC-

Best Regards,
Dave Chen

From: Ashley Merrick 
Sent: Wednesday, November 14, 2018 12:30 PM
To: Chen2, Dave
Cc: ceph-users
Subject: Re: [ceph-users] Benchmark performance when using SSD as the journal


Only certain SSDs are good for Ceph journals, as can be seen at
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

The SSD you're using isn't listed, but from a quick search online it appears to be an SSD designed for read workloads as an "upgrade" from an HDD, so it is probably not designed for the high write requirements a journal demands.
Therefore, once it is being hit by the workload of 3 OSDs, you're not going to get much more performance out of it than you would by just using the disk, which is what you're seeing.

On Wed, Nov 14, 2018 at 12:21 PM dave.c...@dell.com wrote:
Hi all,

We want to compare the performance of an HDD partition as the journal (inline on the OSD disk) versus an SSD partition as the journal. Here is what we have done: we have 3 nodes used as Ceph OSD hosts, each with 3 OSDs. First, we created the OSDs with the journal on a partition of the OSD disk and ran the "rados bench" utility to test performance; then we migrated the journal from the HDD to an SSD (Intel S4500) and ran "rados bench" again. We expected the SSD journal to perform much better than the HDD journal, but the results show nearly no change.

The configuration of Ceph is as below,
pool size: 3
osd size: 3*3
pg (pgp) num: 300
osd nodes are separated across three different nodes
rbd image size: 10G (10240M)

The utility I used is,
rados bench -p rbd $duration write
rados bench -p rbd $duration seq
rados bench -p rbd $duration rand

Is there anything wrong with what I did? Could anyone give me some suggestions?


Best Regards,
Dave Chen



[ceph-users] Benchmark performance when using SSD as the journal

2018-11-13 Thread Dave.Chen
Hi all,

We want to compare the performance of an HDD partition as the journal (inline on the OSD disk) versus an SSD partition as the journal. Here is what we have done: we have 3 nodes used as Ceph OSD hosts, each with 3 OSDs. First, we created the OSDs with the journal on a partition of the OSD disk and ran the "rados bench" utility to test performance; then we migrated the journal from the HDD to an SSD (Intel S4500) and ran "rados bench" again. We expected the SSD journal to perform much better than the HDD journal, but the results show nearly no change.

The configuration of Ceph is as below,
pool size: 3
osd size: 3*3
pg (pgp) num: 300
osd nodes are separated across three different nodes
rbd image size: 10G (10240M)

The utility I used is,
rados bench -p rbd $duration write
rados bench -p rbd $duration seq
rados bench -p rbd $duration rand

Is there anything wrong with what I did? Could anyone give me some suggestions?


Best Regards,
Dave Chen



[ceph-users] Migrate OSD journal to SSD partition

2018-11-07 Thread Dave.Chen
Hi all,

I have been trying to migrate the journal to an SSD partition for a while. Basically I followed the guide here [1], and I have the configuration below defined in ceph.conf:

[osd.0]
osd_journal = /dev/disk/by-partlabel/journal-1

And then created the journal this way:
# ceph-osd -i 0 -mkjournal

After that, I started the OSD, and from the log printed to the console I saw that the service started successfully:
08 14:03:35 ceph1 ceph-osd[5111]: starting osd.0 at :/0 osd_data 
/var/lib/ceph/osd/ceph-0 /dev/disk/by-partlabel/journal-1
08 14:03:35 ceph1 ceph-osd[5111]: 2018-11-08 14:03:35.618247 7fe8b54b28c0 -1 
osd.0 766 log_to_monitors {default=true}

But I am not sure whether the new journal is actually in effect; it looks like the OSD is still using the old partition (/dev/sdc2) for the journal, and the new partition, which is actually /dev/sde1, shows no journal information:

# ceph-disk list

/dev/sdc :
/dev/sdc2 ceph journal, for /dev/sdc1
/dev/sdc1 ceph data, active, cluster ceph, osd.0, journal /dev/sdc2
/dev/sdd :
/dev/sdd2 ceph journal, for /dev/sdd1
/dev/sdd1 ceph data, active, cluster ceph, osd.1, journal /dev/sdd2
/dev/sde :
/dev/sde1 other, 0fc63daf-8483-4772-8e79-3d69d8477de4
/dev/sdf other, unknown

# ls -l /var/lib/ceph/osd/ceph-0/journal
lrwxrwxrwx 1 ceph ceph 58  21  2018 /var/lib/ceph/osd/ceph-0/journal -> 
/dev/disk/by-partuuid/5b5cd6f6-5de4-44f3-9d33-e8a7f4b59f61

# ls -l /dev/disk/by-partuuid/5b5cd6f6-5de4-44f3-9d33-e8a7f4b59f61
lrwxrwxrwx 1 root root 10 8 13:59 
/dev/disk/by-partuuid/5b5cd6f6-5de4-44f3-9d33-e8a7f4b59f61 -> ../../sdc2


My question is: how do I know which partition is taking the role of the journal? Where can I see which partition the journal is linked to?

Any comments are highly appreciated!


[1] https://fatmin.com/2015/08/11/ceph-show-osd-to-journal-mapping/


Best Regards,
Dave Chen
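For anyone else hitting this: judging by the startup line above, the daemon is opening /dev/disk/by-partlabel/journal-1, while ceph-disk list only reports the partition metadata written when the OSD was originally created. The more common way to migrate a filestore journal keeps the journal symlink in the data directory consistent instead of overriding osd_journal in ceph.conf; a rough sketch, assuming systemd unit names and the journal-1 partlabel above (double-check device names before running anything):

systemctl stop ceph-osd@0
ceph-osd -i 0 --flush-journal
ln -sfn /dev/disk/by-partlabel/journal-1 /var/lib/ceph/osd/ceph-0/journal
ceph-osd -i 0 --mkjournal
systemctl start ceph-osd@0
ls -l /var/lib/ceph/osd/ceph-0/journal   # should now point at the new partition

If you go this route, it is probably cleaner to drop the osd_journal override from ceph.conf so there is only one source of truth.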



Re: [ceph-users] PG status is "active+undersized+degraded"

2018-06-22 Thread Dave.Chen
I saw this statement at this link ( http://docs.ceph.com/docs/master/rados/operations/crush-map/ ); is that the reason that leads to the warning?

" This, combined with the default CRUSH failure domain, ensures that replicas 
or erasure code shards are separated across hosts and a single host failure 
will not affect availability."

Best Regards,
Dave Chen

-Original Message-
From: Chen2, Dave 
Sent: Friday, June 22, 2018 1:59 PM
To: 'Burkhard Linke'; ceph-users@lists.ceph.com
Cc: Chen2, Dave
Subject: RE: [ceph-users] PG status is "active+undersized+degraded"

Hi Burkhard,

Thanks for your explanation. I created a new 2 TB OSD on another node, and it indeed solved the issue; the status of the Ceph cluster is "health HEALTH_OK" now.

Another question: if the three homogeneous OSDs are spread across only 2 nodes, I still get the warning message and the status "active+undersized+degraded". Is spreading the three OSDs across 3 nodes a mandatory rule for Ceph? Is that only an HA consideration? Do any official Ceph documents give guidance on this?


$ ceph osd tree
ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 7.25439 root default
-2 1.81360 host ceph3
 2 1.81360 osd.2   up  1.0  1.0
-4 3.62720 host ceph1
 0 1.81360 osd.0   up  1.0  1.0
 1 1.81360 osd.1   up  1.0  1.0


Best Regards,
Dave Chen

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Burkhard Linke
Sent: Thursday, June 21, 2018 2:39 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG status is "active+undersized+degraded"

Hi,


On 06/21/2018 05:14 AM, dave.c...@dell.com wrote:
> Hi all,
>
> I have setup a ceph cluster in my lab recently, the configuration per my 
> understanding should be okay, 4 OSD across 3 nodes, 3 replicas, but couple of 
> PG stuck with state "active+undersized+degraded", I think this should be very 
> generic issue, could anyone help me out?
>
> Here is the details about the ceph cluster,
>
> $ ceph -v  (jewel)
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>
> # ceph osd tree
> ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 5.89049 root default
> -2 1.81360 host ceph3
> 2 1.81360 osd.2   up  1.0  1.0
> -3 0.44969 host ceph4
> 3 0.44969 osd.3   up  1.0  1.0
> -4 3.62720 host ceph1
> 0 1.81360 osd.0   up  1.0  1.0
> 1 1.81360 osd.1   up  1.0  1.0

*snipsnap*

You have a large difference in the capacities of the nodes. This results in different host weights, which in turn can lead to problems with the CRUSH algorithm: it is not able to find three different hosts for OSD placement for some of the PGs.

Ceph and CRUSH do not cope well with heterogeneous setups. I would suggest moving one of the OSDs from host ceph1 to ceph4 to equalize the host weights.
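If the disk is physically moved to ceph4, its CRUSH location can be updated along these lines (ID and weight taken from the tree above; normally the startup hook does this automatically when osd_crush_update_on_start is true):

ceph osd crush create-or-move osd.1 1.81360 root=default host=ceph4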

Regards,
Burkhard


Re: [ceph-users] PG status is "active+undersized+degraded"

2018-06-22 Thread Dave.Chen
Hi Burkhard,

Thanks for your explanation. I created a new 2 TB OSD on another node, and it indeed solved the issue; the status of the Ceph cluster is "health HEALTH_OK" now.

Another question: if the three homogeneous OSDs are spread across only 2 nodes, I still get the warning message and the status "active+undersized+degraded". Is spreading the three OSDs across 3 nodes a mandatory rule for Ceph? Is that only an HA consideration? Do any official Ceph documents give guidance on this?


$ ceph osd tree
ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 7.25439 root default
-2 1.81360 host ceph3
 2 1.81360 osd.2   up  1.0  1.0
-4 3.62720 host ceph1
 0 1.81360 osd.0   up  1.0  1.0
 1 1.81360 osd.1   up  1.0  1.0


Best Regards,
Dave Chen

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Burkhard Linke
Sent: Thursday, June 21, 2018 2:39 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG status is "active+undersized+degraded"

Hi,


On 06/21/2018 05:14 AM, dave.c...@dell.com wrote:
> Hi all,
>
> I have setup a ceph cluster in my lab recently, the configuration per my 
> understanding should be okay, 4 OSD across 3 nodes, 3 replicas, but couple of 
> PG stuck with state "active+undersized+degraded", I think this should be very 
> generic issue, could anyone help me out?
>
> Here is the details about the ceph cluster,
>
> $ ceph -v  (jewel)
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>
> # ceph osd tree
> ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 5.89049 root default
> -2 1.81360 host ceph3
> 2 1.81360 osd.2   up  1.0  1.0
> -3 0.44969 host ceph4
> 3 0.44969 osd.3   up  1.0  1.0
> -4 3.62720 host ceph1
> 0 1.81360 osd.0   up  1.0  1.0
> 1 1.81360 osd.1   up  1.0  1.0

*snipsnap*

You have a large difference in the capacities of the nodes. This results in different host weights, which in turn can lead to problems with the CRUSH algorithm: it is not able to find three different hosts for OSD placement for some of the PGs.

Ceph and CRUSH do not cope well with heterogeneous setups. I would suggest moving one of the OSDs from host ceph1 to ceph4 to equalize the host weights.

Regards,
Burkhard


[ceph-users] PG status is "active+undersized+degraded"

2018-06-20 Thread Dave.Chen
Hi all,

I have set up a Ceph cluster in my lab recently. The configuration, per my understanding, should be okay: 4 OSDs across 3 nodes, 3 replicas. But a couple of PGs are stuck in the state "active+undersized+degraded"; I think this should be a very common issue, so could anyone help me out?

Here is the details about the ceph cluster,

$ ceph -v  (jewel)
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)

# ceph osd tree
ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 5.89049 root default
-2 1.81360 host ceph3
2 1.81360 osd.2   up  1.0  1.0
-3 0.44969 host ceph4
3 0.44969 osd.3   up  1.0  1.0
-4 3.62720 host ceph1
0 1.81360 osd.0   up  1.0  1.0
1 1.81360 osd.1   up  1.0  1.0


# ceph health detail
HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2 pgs 
stuck undersized; 2 pgs undersized
pg 17.58 is stuck unclean for 61033.947719, current state 
active+undersized+degraded, last acting [2,0]
pg 17.16 is stuck unclean for 61033.948201, current state 
active+undersized+degraded, last acting [0,2]
pg 17.58 is stuck undersized for 61033.343824, current state 
active+undersized+degraded, last acting [2,0]
pg 17.16 is stuck undersized for 61033.327566, current state 
active+undersized+degraded, last acting [0,2]
pg 17.58 is stuck degraded for 61033.343835, current state 
active+undersized+degraded, last acting [2,0]
pg 17.16 is stuck degraded for 61033.327576, current state 
active+undersized+degraded, last acting [0,2]
pg 17.16 is active+undersized+degraded, acting [0,2]
pg 17.58 is active+undersized+degraded, acting [2,0]



# rados lspools
rbdbench


$ ceph osd pool get rbdbench size
size: 3



Where can I get more details about the issue? Any comments are appreciated!

Best Regards,
Dave Chen
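For anyone digging into a similar state, the per-PG detail (acting set, up set and recovery state, i.e. why a PG stays undersized) can be pulled with, for example:

ceph pg dump_stuck unclean
ceph pg 17.58 query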



[ceph-users] OSD down with Ceph version of Kraken

2017-12-05 Thread Dave.Chen
Hi,

Our Ceph version is Kraken, and each storage node has up to 90 hard disks that can be used as OSDs. We configured the messenger type as "simple". I noticed that the "simple" type can create a lot of threads and hence consume a lot of resources, and we have observed that this configuration causes frequent OSD failures. Is there any configuration that could help work around these OSD failures?

Thanks in advance!

Best Regards,
Dave Chen
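For reference, the messenger type is set in ceph.conf, and moving away from "simple" (the async messenger uses a small fixed worker pool instead of two threads per connection) looks roughly like this; whether it is enough to stop the OSD failures here is something only testing will tell:

[global]
ms type = async

With that many disks per node it is probably also worth checking kernel.pid_max and the open file limits, since thread and file-descriptor exhaustion are common causes of OSDs dying on dense nodes.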

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com