Re: [ceph-users] ceph 12.2.5 - atop DB/WAL SSD usage 0%

2018-04-30 Thread Hans van den Bogert
Shouldn't Steven see some data being written to the block.db/WAL for object
metadata? Though that might be negligible with 4 MB objects.
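A quick way to check this during a bench run (just a sketch; it assumes the OSD
admin sockets are reachable on the OSD host and that osd.0 is one of the OSDs in
question) is to watch the BlueFS counters and the raw SSD:

  # BlueFS statistics; look at the bytes written to the WAL/DB devices
  ceph daemon osd.0 perf dump | grep -A 40 '"bluefs"'

  # raw traffic on the SSD that holds the WAL/DB partitions
  iostat -x 1 /dev/sda

If those counters and the device stay flat while the HDDs are at 100%, the
DB/WAL device is effectively idle for that workload.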




On 27-04-18 16:04, Serkan Çoban wrote:

rados bench uses a 4MB block size for I/O. Try with an I/O size of 4KB;
you will see the SSD being used for write operations.
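For example, a run along these lines (a sketch; -b sets the per-op object size
for rados bench) should generate visible WAL/DB traffic:

  rados bench -p rbd 50 write -t 32 -b 4096 --no-cleanup

Small writes are typically deferred via the WAL and produce relatively more
RocksDB metadata traffic, so the SSD should show activity.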

On Fri, Apr 27, 2018 at 4:54 PM, Steven Vacaroaia  wrote:

Hi

During rados bench tests, I noticed that HDD usage goes to 100% but the SSD
stays at (or very close to) 0%.

Since I created the OSDs with block.db/WAL on the SSD, shouldn't I see some
activity on the SSD?

How can I be sure Ceph is actually using the SSD for the WAL/DB?


Note: I only have 2 HDDs and one SSD per server for now.


Commands used

rados bench -p rbd 50 write -t 32 --no-cleanup && rados bench -p rbd -t 32
50 rand


/usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdc
--block.wal /dev/disk/by-partuuid/32ffde6f-7249-40b9-9bc5-2b70f0c3f7ad
--block.db /dev/disk/by-partuuid/2d9ab913-7553-46fc-8f96-5ffee028098a

(the partitions are on the SSD; see below)

  sgdisk -p /dev/sda
Disk /dev/sda: 780140544 sectors, 372.0 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 5FE0EA74-7E65-45B8-A356-62240333491E
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 780140510
Partitions will be aligned on 2048-sector boundaries
Total free space is 520093629 sectors (248.0 GiB)

Number  Start (sector)    End (sector)  Size        Code  Name
   1         251660288       253757439  1024.0 MiB        ceph WAL
   2              2048        62916607  30.0 GiB          ceph DB
   3         253757440       255854591  1024.0 MiB        ceph WAL
   4          62916608       125831167  30.0 GiB          ceph DB
   5         255854592       257951743  1024.0 MiB        ceph WAL
   6         125831168       188745727  30.0 GiB          ceph DB
   7         257951744       260048895  1024.0 MiB        ceph WAL
   8         188745728       251660287  30.0 GiB          ceph DB
[root@osd04 ~]# ls -al /dev/disk/by-partuuid/
total 0
drwxr-xr-x 2 root root 200 Apr 26 15:39 .
drwxr-xr-x 8 root root 160 Apr 27 08:45 ..
lrwxrwxrwx 1 root root  10 Apr 27 09:38 0baf986d-f786-4c1a-8962-834743b33e3a
-> ../../sda8
lrwxrwxrwx 1 root root  10 Apr 27 09:38 2d9ab913-7553-46fc-8f96-5ffee028098a
-> ../../sda2
lrwxrwxrwx 1 root root  10 Apr 27 09:38 32ffde6f-7249-40b9-9bc5-2b70f0c3f7ad
-> ../../sda3
lrwxrwxrwx 1 root root  10 Apr 27 09:38 3f4e2d47-d553-4809-9d4e-06ba37b4c384
-> ../../sda6
lrwxrwxrwx 1 root root  10 Apr 27 09:38 3fc98512-a92e-4e3b-9de7-556b8e206786
-> ../../sda1
lrwxrwxrwx 1 root root  10 Apr 27 09:38 64b8ae66-cf37-4676-bf9f-9c4894788a7f
-> ../../sda7
lrwxrwxrwx 1 root root  10 Apr 27 09:38 96254af9-7fe4-4ce0-886e-2e25356eff81
-> ../../sda5
lrwxrwxrwx 1 root root  10 Apr 27 09:38 ae616b82-35ab-4f7f-9e6f-3c65326d76a8
-> ../../sda4






LVM |  dm-0 | busy 90% | read 2516 | write 0 | KiB/r 512 | KiB/w  0 | MBr/s 125.8 | MBw/s 0.0 | avq 10.65 | avio 3.57 ms |
LVM |  dm-1 | busy 80% | read 2406 | write 0 | KiB/r 512 | KiB/w  0 | MBr/s 120.3 | MBw/s 0.0 | avq 12.59 | avio 3.30 ms |
DSK |   sdc | busy 90% | read 5044 | write 0 | KiB/r 256 | KiB/w  0 | MBr/s 126.1 | MBw/s 0.0 | avq 19.53 | avio 1.78 ms |
DSK |   sdd | busy 80% | read 4805 | write 0 | KiB/r 256 | KiB/w  0 | MBr/s 120.1 | MBw/s 0.0 | avq 23.97 | avio 1.65 ms |
DSK |   sda | busy  0% | read    0 | write 7 | KiB/r   0 | KiB/w 10 | MBr/s   0.0 | MBw/s 0.0 | avq  0.00 | avio 0.00 ms |

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph luminous 12.2.4 - 2 servers better than 3 ?

2018-04-19 Thread Hans van den Bogert
The last thing I can come up with is doing a 2-node scenario with at least one
of the nodes being a different one. Maybe you've already done that...

But again, even the read performance in your shown bench of the 2 node
cluster is pretty bad.

The premise of this thread, that a 2-node cluster does work well, is not
true (IMO).

Hans

On Thu, Apr 19, 2018, 19:28 Steven Vacaroaia <ste...@gmail.com> wrote:

> fio is fine and megacli settings are as below (the device with WT is the SSD)
>
>
>  Vendor Id  : TOSHIBA
>
> Product Id : PX05SMB040Y
>
> Capacity   : 372.0 GB
>
>
>
> Results
>
> Jobs: 20 (f=20): [W(20)] [100.0% done] [0KB/447.1MB/0KB /s] [0/115K/0
> iops] [eta 00m:00s]
>
>
>
> Vendor Id  : SEAGATE
>
> Product Id : ST600MM0006
>
> Capacity   : 558.375 GB
>
>
>
> Results
>
> Jobs: 10 (f=10): [W(10)] [100.0% done] [0KB/100.5MB/0KB /s] [0/25.8K/0
> iops] [eta 00m:00s]
>
>
>
>
>  megacli -LDGetProp -cache -Lall -a0
>
> Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone,
> Direct, Write Cache OK if bad BBU
> Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive,
> Cached, No Write Cache if bad BBU
> Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive,
> Cached, No Write Cache if bad BBU
> Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive,
> Cached, No Write Cache if bad BBU
>
> Exit Code: 0x00
> [root@osd01 ~]# megacli -LDGetProp -dskcache -Lall -a0
>
> Adapter 0-VD 0(target id: 0): Disk Write Cache : Disk's Default
> Adapter 0-VD 1(target id: 1): Disk Write Cache : Disk's Default
> Adapter 0-VD 2(target id: 2): Disk Write Cache : Disk's Default
> Adapter 0-VD 3(target id: 3): Disk Write Cache : Disk's Default
>
>
> On Thu, 19 Apr 2018 at 14:22, Hans van den Bogert <hansbog...@gmail.com>
> wrote:
>
>> I see, the second one is the read bench. Even in the 2 node scenario the
>> read performance is pretty bad. Have you verified the hardware with micro
>> benchmarks such as 'fio'? Also try to review storage controller settings.
>>
>> On Apr 19, 2018 5:13 PM, "Steven Vacaroaia" <ste...@gmail.com> wrote:
>>
>> replication size is always 2
>>
>> DB/WAL on HDD in this case
>>
>> I tried with  OSDs with WAL/DB on SSD - they exhibit the same symptoms  (
>> cur MB/s 0 )
>>
>> In summary, it does not matter
>> - which server ( any 2 will work better than any 3 or 4)
>> - replication size (I tried with size 2 and 3)
>> - location of WAL/DB ( on separate SSD or same HDD)
>>
>>
>> Thanks
>> Steven
>>
>> On Thu, 19 Apr 2018 at 12:06, Hans van den Bogert <hansbog...@gmail.com>
>> wrote:
>>
>>> I take it that the first bench is with replication size 2, the second
>>> bench is with replication size 3? Same for the 4 node OSD scenario?
>>>
>>> Also please let us know how you set up block.db and WAL; are they on the
>>> SSD?
>>>
>>> On Thu, Apr 19, 2018, 14:40 Steven Vacaroaia <ste...@gmail.com> wrote:
>>>
>>>> Sure ..thanks for your willingness to help
>>>>
>>>> Identical servers
>>>>
>>>> Hardware
>>>> DELL R620, 6 cores, 64GB RAM, 2 x 10 GB ports,
>>>> Enterprise HDD 600GB( Seagate ST600MM0006), Enterprise grade SSD 340GB
>>>> (Toshiba PX05SMB040Y)
>>>>
>>>>
>>>> All tests done with the following command
>>>> rados bench -p rbd 50 write --no-cleanup && rados bench -p rbd 50 seq
>>>>
>>>>
>>>> ceph osd pool ls detail
>>>> "pool_name": "rbd",
>>>> "flags": 1,
>>>> "flags_names": "hashpspool",
>>>> "type": 1,
>>>> "size": 2,
>>>> "min_size": 1,
>>>> "crush_rule": 1,
>>>> "object_hash": 2,
>>>> "pg_num": 64,
>>>> "pg_placement_num": 64,
>>>> "crash_replay_interval": 0,
>>>> "last_change": "354",
>>>> "last_force_op_resend": "0",
>>>> "last_force_op_resend_preluminous": "0",
>>>> "auid": 0,
>>>> "snap_mode": "selfmanaged&quo

Re: [ceph-users] ceph luminous 12.2.4 - 2 servers better than 3 ?

2018-04-19 Thread Hans van den Bogert
I see, the second one is the read bench. Even in the 2 node scenario the
read performance is pretty bad. Have you verified the hardware with micro
benchmarks such as 'fio'? Also try to review storage controller settings.
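
For example, a quick 4K random-write micro benchmark on a raw device could look
like this (only a sketch; adjust the device and runtime, and note that it
overwrites data on that device):

  fio --name=4k-randwrite --filename=/dev/sdX --rw=randwrite --bs=4k \
      --ioengine=libaio --direct=1 --iodepth=32 --numjobs=10 \
      --runtime=60 --time_based --group_reporting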

On Apr 19, 2018 5:13 PM, "Steven Vacaroaia" <ste...@gmail.com> wrote:

replication size is always 2

DB/WAL on HDD in this case

I tried with  OSDs with WAL/DB on SSD - they exhibit the same symptoms  (
cur MB/s 0 )

In summary, it does not matter
- which server ( any 2 will work better than any 3 or 4)
- replication size (I tried with size 2 and 3)
- location of WAL/DB ( on separate SSD or same HDD)


Thanks
Steven

On Thu, 19 Apr 2018 at 12:06, Hans van den Bogert <hansbog...@gmail.com>
wrote:

> I take it that the first bench is with replication size 2, the second
> bench is with replication size 3? Same for the 4 node OSD scenario?
>
> Also please let us know how you set up block.db and WAL; are they on the
> SSD?
>
> On Thu, Apr 19, 2018, 14:40 Steven Vacaroaia <ste...@gmail.com> wrote:
>
>> Sure ..thanks for your willingness to help
>>
>> Identical servers
>>
>> Hardware
>> DELL R620, 6 cores, 64GB RAM, 2 x 10 GB ports,
>> Enterprise HDD 600GB( Seagate ST600MM0006), Enterprise grade SSD 340GB
>> (Toshiba PX05SMB040Y)
>>
>>
>> All tests done with the following command
>> rados bench -p rbd 50 write --no-cleanup && rados bench -p rbd 50 seq
>>
>>
>> ceph osd pool ls detail
>> "pool_name": "rbd",
>> "flags": 1,
>> "flags_names": "hashpspool",
>> "type": 1,
>> "size": 2,
>> "min_size": 1,
>> "crush_rule": 1,
>> "object_hash": 2,
>> "pg_num": 64,
>> "pg_placement_num": 64,
>> "crash_replay_interval": 0,
>> "last_change": "354",
>> "last_force_op_resend": "0",
>> "last_force_op_resend_preluminous": "0",
>> "auid": 0,
>> "snap_mode": "selfmanaged",
>> "snap_seq": 0,
>> "snap_epoch": 0,
>> "pool_snaps": [],
>> "removed_snaps": "[]",
>> "quota_max_bytes": 0,
>> "quota_max_objects": 0,
>> "tiers": [],
>> "tier_of": -1,
>> "read_tier": -1,
>> "write_tier": -1,
>> "cache_mode": "none",
>> "target_max_bytes": 0,
>> "target_max_objects": 0,
>> "cache_target_dirty_ratio_micro": 40,
>> "cache_target_dirty_high_ratio_micro": 60,
>> "cache_target_full_ratio_micro": 80,
>> "cache_min_flush_age": 0,
>> "cache_min_evict_age": 0,
>> "erasure_code_profile": "",
>> "hit_set_params": {
>> "type": "none"
>> },
>> "hit_set_period": 0,
>> "hit_set_count": 0,
>> "use_gmt_hitset": true,
>> "min_read_recency_for_promote": 0,
>> "min_write_recency_for_promote": 0,
>> "hit_set_grade_decay_rate": 0,
>> "hit_set_search_last_n": 0,
>> "grade_table": [],
>> "stripe_width": 0,
>> "expected_num_objects": 0,
>> "fast_read": false,
>> "options": {},
>> "application_metadata": {}
>> }
>>
>>
>> ceph osd crush rule dump
>> [
>> {
>> "rule_id": 0,
>> "rule_name": "replicated_rule",
>> "ruleset": 0,
>> "type": 1,
>> "min_size": 1,
>> "max_size": 10,
>> "steps": [
>> {
>> "op": "take",
>> "item": -1,
>> "item_name": "default"
>> },
>> {
>> "op": "chooseleaf_firstn",
>> "num": 0,
>> "type": &qu

Re: [ceph-users] ceph luminous 12.2.4 - 2 servers better than 3 ?

2018-04-19 Thread Hans van den Bogert
>                           466   46.5956         0           -     1.25618
>    41      16       488       472   46.0444        12   0.0175485     1.25181
>    42      16       488       472   44.9481         0           -     1.25181
>    43      16       488       472   43.9028         0           -     1.25181
>    44      16       562       546   49.6316   98.6667   0.0150341     1.26385
>    45      16       569       553   49.1508        28   0.0151556     1.25516
>    46      16       569       553   48.0823         0           -     1.25516
>    47      16       569       553   47.0593         0           -     1.25516
>    48      16       569       553   46.0789         0           -     1.25516
>    49      16       569       553   45.1386         0           -     1.25516
>    50      16       569       553   44.2358         0           -     1.25516
>    51      16       569       553   43.3684         0           -     1.25516
> Total time run: 51.724920
> Total writes made:  570
> Write size: 4194304
> Object size:4194304
> Bandwidth (MB/sec): 44.0793
> Stddev Bandwidth:   55.3843
> Max bandwidth (MB/sec): 232
> Min bandwidth (MB/sec): 0
> Average IOPS:   11
> Stddev IOPS:13
> Max IOPS:   58
> Min IOPS:   0
> Average Latency(s): 1.45175
> Stddev Latency(s):  2.9411
> Max latency(s): 11.3013
> Min latency(s): 0.0141657
>
>
>
> 2018-04-19 09:36:35.633624 min lat: 0.00804825 max lat: 10.2583 avg lat: 1.03388
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>    40      16       479       463   46.2955         0            -     1.03388
>    41      16       540       524   51.1169      24.4   0.00913275     1.23193
>    42      16       540       524   49.8999         0            -     1.23193
>    43      16       541       525   48.8324         2      2.31401     1.23399
>    44      16       541       525   47.7226         0            -     1.23399
>    45      16       541       525   46.6621         0            -     1.23399
>    46      16       541       525   45.6477         0            -     1.23399
>    47      16       541       525   44.6765         0            -     1.23399
>    48      16       541       525   43.7458         0            -     1.23399
>    49      16       541       525    42.853         0            -     1.23399
>    50      16       541       525    41.996         0            -     1.23399
>    51      16       541       525   41.1725         0            -     1.23399
> Total time run:   51.530655
> Total reads made: 542
> Read size:4194304
> Object size:  4194304
> Bandwidth (MB/sec):   42.072
> Average IOPS: 10
> Stddev IOPS:  15
> Max IOPS: 62
> Min IOPS: 0
> Average Latency(s):   1.5204
> Max latency(s):   11.4841
> Min latency(s):   0.00627081
>
>
>
> Many thanks
> Steven
>
>
>
>
> On Thu, 19 Apr 2018 at 08:42, Hans van den Bogert <hansbog...@gmail.com>
> wrote:
>
>> Hi Steven,
>>
>> There is only one bench. Could you show multiple benches of the different
>> scenarios you discussed? Also provide hardware details.
>>
>> Hans
>>
>> On Apr 19, 2018 13:11, "Steven Vacaroaia" <ste...@gmail.com> wrote:
>>
>> Hi,
>>
>> Any idea why 2 servers with one OSD each will provide better performance
>> than 3 ?
>>
>> Servers are identical
>> Performance is impacted irrespective of whether I used SSD for WAL/DB or not.
>> Basically, I am getting lots of "cur MB/s" zero.
>>
>> The network is separate: 10 Gb for public and private.
>> I tested it with iperf and I am getting 9.3 Gb/s.
>>
>> I have tried replication of 2 and 3 with the same results (much better for 2
>> servers than 3).
>>
>> I reinstalled Ceph multiple times.
>> ceph.conf is very simple - no major customization (see below).
>> I am out of ideas - any hint will be TRULY appreciated.
>>
>> Steven
>>
>>
>>
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>>
>>
>> public_network = 10.10.30.0/24
>> cluster_network = 192.168.0.0/24
>>
>>
>> osd_pool_default_size = 2
>> osd_pool_default_min_size = 1 # Allow writing 1 copy in a degraded state
>> osd_crush_chooseleaf_type = 1
>>
>>
>> [mon]
>> mon_allow_pool_delete = true
>> mon_osd_min_down_reporters = 1
>>
>> [osd]
>> osd_mkfs_type = xfs
>> osd_mount_options_xfs =
>> "rw,noatime,nodiratime,attr2,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=4M"
>> osd_mkfs_options_xfs = "-f -i size=2048"
>> bluestore_block_db_size = 32212254720
>> bluestore_block_wal_size = 1073741824
>>
>> rados bench -p rbd 120 write --no-cleanup && rados bench -p rbd 120 seq
>> hints = 1
>> Maintaining 16 concurrent writes of 4194304 bytes to objects of size
>> 4194304 for up to 120 seconds or 0 objects
>> Object prefix: benchmark_data_osd01_383626
>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>>     0       0         0         0         0         0            -           0
>>     1      16        57        41   163.991       164     0.197929    0.065543
>>     2      16        57        41    81.992         0            -    0.065543
>>     3      16        67        51   67.9936        20    0.0164632    0.249939
>>     4      16        67        51   50.9951         0            -    0.249939
>>     5      16        71        55   43.9958         8    0.0171439    0.319973
>>     6      16       181       165   109.989       440    0.0159057    0.563746
>>     7      16       182       166   94.8476         4     0.221421    0.561684
>>     8      16       182       166   82.9917         0            -    0.561684
>>     9      16       240       224   99.5458       116    0.0232989    0.638292
>>    10      16       264       248   99.1901        96    0.0222669    0.583336
>>    11      16       264       248   90.1729         0            -    0.583336
>>    12      16       285       269   89.6579        42    0.0165706    0.600606
>>    13      16       285       269   82.7611         0            -    0.600606
>>    14      16       310       294   83.9918        50    0.0254241    0.756351
>>
>>
>>
>
>
>
>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph luminous 12.2.4 - 2 servers better than 3 ?

2018-04-19 Thread Hans van den Bogert
Hi Steven,

There is only one bench. Could you show multiple benches of the different
scenarios you discussed? Also provide hardware details.

Hans

On Apr 19, 2018 13:11, "Steven Vacaroaia"  wrote:

Hi,

Any idea why 2 servers with one OSD each will provide better performance
than 3 ?

Servers are identical
Performance is impacted irrespective of whether I used SSD for WAL/DB or not.
Basically, I am getting lots of "cur MB/s" zero.

The network is separate: 10 Gb for public and private.
I tested it with iperf and I am getting 9.3 Gb/s.

I have tried replication of 2 and 3 with the same results (much better for 2
servers than 3).

I reinstalled Ceph multiple times.
ceph.conf is very simple - no major customization (see below).
I am out of ideas - any hint will be TRULY appreciated.

Steven



auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx


public_network = 10.10.30.0/24
cluster_network = 192.168.0.0/24


osd_pool_default_size = 2
osd_pool_default_min_size = 1 # Allow writing 1 copy in a degraded state
osd_crush_chooseleaf_type = 1


[mon]
mon_allow_pool_delete = true
mon_osd_min_down_reporters = 1

[osd]
osd_mkfs_type = xfs
osd_mount_options_xfs =
"rw,noatime,nodiratime,attr2,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=4M"
osd_mkfs_options_xfs = "-f -i size=2048"
bluestore_block_db_size = 32212254720
bluestore_block_wal_size = 1073741824

rados bench -p rbd 120 write --no-cleanup && rados bench -p rbd 120 seq
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size
4194304 for up to 120 seconds or 0 objects
Object prefix: benchmark_data_osd01_383626
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16        57        41   163.991       164     0.197929    0.065543
    2      16        57        41    81.992         0            -    0.065543
    3      16        67        51   67.9936        20    0.0164632    0.249939
    4      16        67        51   50.9951         0            -    0.249939
    5      16        71        55   43.9958         8    0.0171439    0.319973
    6      16       181       165   109.989       440    0.0159057    0.563746
    7      16       182       166   94.8476         4     0.221421    0.561684
    8      16       182       166   82.9917         0            -    0.561684
    9      16       240       224   99.5458       116    0.0232989    0.638292
   10      16       264       248   99.1901        96    0.0222669    0.583336
   11      16       264       248   90.1729         0            -    0.583336
   12      16       285       269   89.6579        42    0.0165706    0.600606
   13      16       285       269   82.7611         0            -    0.600606
   14      16       310       294   83.9918        50    0.0254241    0.756351


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] scalability new node to the existing cluster

2018-04-18 Thread Hans van den Bogert
I keep seeing these threads where adding nodes has such an impact on the 
cluster as a whole, that I wonder what the rest of the cluster looks like. 
Normally I’d just advise someone to put a limit on the concurrent backfills 
that can be done, and `osd max backfills` by default already is 1. Could it be 
that the real culprit here is that the hardware is heavily overbooked? 68 OSDs 
per node sounds an order of magnitude above what you should be doing, unless 
you have vast experience with Ceph and its memory requirements under stress. 
I wonder if this cluster would even come online after an outage, or would also 
crumble due to peering and possible backfilling.

To be honest I don’t even get why using the weight option would solve this. The
same amount of data needs to be transferred anyway at some point; it seems like
a poor man’s throttling mechanism. And if memory shortage is the issue here, due
to, again, the many OSDs, then the reweight strategy will only give you slightly
better odds.

So
1) I would keep track of memory usage on the nodes to see if that increases 
under peering/backfilling, 
  - If this is the case, and you’re using bluestore: try lowering 
bluestore_cache_size* params, to give you some leeway.
2) If using bluestore, try throttling by changing the following params, 
depending on your environment:
  - osd recovery sleep
  - osd recovery sleep hdd
  - osd recovery sleep ssd

There are other throttling params you can change, though most defaults are just 
fine in my environment, and I don’t have experience with them.
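
As an illustration only (values depend entirely on the environment), the
recovery/backfill throttles can be changed at runtime with injectargs and made
permanent in the [osd] section of ceph.conf afterwards:

  ceph tell osd.* injectargs '--osd-recovery-sleep-hdd 0.2'
  ceph tell osd.* injectargs '--osd-max-backfills 1'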

Good luck, 

Hans


> On Apr 18, 2018, at 1:32 PM, Serkan Çoban  wrote:
> 
> You can add new OSDs with 0 weight and edit below script to increase
> the osd weights instead of decreasing.
> 
> https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight
> 
> 
> On Wed, Apr 18, 2018 at 2:16 PM, nokia ceph  wrote:
>> Hi All,
>> 
>> We are having 5 node cluster with EC 4+1 . Each node has 68 HDD . Now we are
>> trying to add new node with 68 disks to the cluster .
>> 
>> We tried to add new node and created all OSDs in one go , the cluster
>> stopped all client traffic and does only backfilling .
>> 
>> Any procedure to add the new node without affecting the client traffic ?
>> 
>> If we create  OSDs one by one , then there is no issue in client traffic
>> however  time taken to add new node with 68 disks will be several months.
>> 
>> Please provide your suggestions..
>> 
>> Thanks,
>> Muthu
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous 12.2.3 release date?

2018-02-12 Thread Hans van den Bogert
Hi Wido,

Did you ever get an answer? I'm eager to know as well.


Hans

On Tue, Jan 30, 2018 at 10:35 AM, Wido den Hollander  wrote:
> Hi,
>
> Is there a ETA yet for 12.2.3? Looking at the tracker there aren't that many
> outstanding issues: http://tracker.ceph.com/projects/ceph/roadmap
>
> On Github we have more outstanding PRs though for the Luminous milestone:
> https://github.com/ceph/ceph/milestone/10
>
> Are we expecting 12.2.3 in Feb? I'm asking because there are some Mgr
> related fixes I'm backporting now for a few people which are in 12.2.3.
>
> Wido
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Retrieving ceph health from restful manager plugin

2018-02-05 Thread Hans van den Bogert
Hi All,

I might really be bad at searching, but I can't seem to find the ceph
health status through the new(ish) restful api. Is that right? I know
how I could retrieve it through a Python script, however I'm trying to
keep our monitoring application as layer cake free as possible -- as
such a restful API call would be preferred.

Regards,

Hans
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Redirect for restful API in manager

2018-02-05 Thread Hans van den Bogert
Hi all,

In the release notes of 12.2.2 the following is stated:

> Standby ceph-mgr daemons now redirect requests to the active
messenger, easing configuration for tools & users accessing the web
dashboard, restful API, or other ceph-mgr module services.

However, it doesn't seem to be the case that the restful API redirects
the client. Can anybody verify that? If it doesn't redirect, will this
be added in the near future?

Regards,

Hans
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Ceph team involvement in Rook (Deploying Ceph in Kubernetes)

2018-01-21 Thread Hans van den Bogert
Should I summarize this as ceph-helm being EOL? If I'm spinning up a
toy cluster for a homelab, should I invest time in Rook, or stay with
ceph-helm for now?

On Fri, Jan 19, 2018 at 11:55 AM, Kai Wagner  wrote:

> Just for those of you who are not subscribed to ceph-users.
>
>
>  Forwarded Message 
> Subject: Ceph team involvement in Rook (Deploying Ceph in Kubernetes)
> Date: Fri, 19 Jan 2018 11:49:05 +0100
> From: Sebastien Han  
> To: ceph-users  ,
> Squid Cybernetic  ,
> Dan Mick  , Chen, Huamin
>  , John Spray 
> , Sage Weil  ,
> bas...@tabbara.com
>
> Everyone,
>
> Kubernetes is getting bigger and bigger. It has become the platform of
> choice to run microservices applications in containers, just like
> OpenStack did for Cloud applications in virtual machines.
>
> When it comes to container storage there are three key aspects:
>
> * Providing persistent storage to containers, Ceph has drivers in
> Kuberntes already with kRBD and CephFS
> * Containerizing the storage itself, so efficiently running Ceph
> services in Containers. Currently, we have ceph-container
> (https://github.com/ceph/ceph-container)
> * Deploying the containerized storage in Kubernetes, we wrote
> ceph-helm charts (https://github.com/ceph/ceph-helm)
>
> The third piece although it's working great has a particular goal and
> doesn't aim to run Ceph just like any other applications in Kuberntes.
> We were also looking for a better abstraction/ease of use for
> end-users, multi-cluster support, operability, life-cycle management,
> centralized operations; to learn more you can read
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021918.html.
> As a consequence, we decided to look at what the ecosystem had to
> offer. As a result, Rook came out, as a pleasant surprise. For those
> who are not familiar with Rook, please visit https://rook.io but in a
> nutshell, Rook is an open source orchestrator for distributed storage
> systems running in cloud-native environments. Under the hood, Rook is
> deploying, operating and managing Ceph life cycle in Kubernetes. Rook
> has a vibrant community and committed developers.
>
> Even if Rook is not perfect (yet), it has firm foundations, and we are
> planning on helping to make it better. We already opened issues for
> that and started doing work with Rook's core developers. We are
> looking at reconciling what is available today
> (rook/ceph-container/helm), reduce the overlap/duplication and all
> work together toward a single and common goal. With this
> collaboration, through Rook, we hope to make Ceph the de facto Open
> Source storage solution for Kubernetes.
>
> These are exciting times, so if you're a user, a developer, or merely
> curious, have a look at Rook and send us feedback!
>
> Thanks!
> --
> Cheers
>
> ––
> Sébastien Han
> Principal Software Engineer, Storage Architect
>
> "Always give 100%. Unless you're giving blood."
>
> Mail: s...@redhat.com
> Address: 11 bis, rue Roquépine - 75008 Paris
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increasing PG number

2018-01-02 Thread Hans van den Bogert
Please refer to standard documentation as much as possible, 


http://docs.ceph.com/docs/jewel/rados/operations/placement-groups/#set-the-number-of-placement-groups
 


Han’s blog post is also incomplete, since you also need to change ‘pgp_num’ as well.
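
For example (the pool name and target count are placeholders here):

  ceph osd pool set rbd pg_num 256
  ceph osd pool set rbd pgp_num 256

pg_num creates the new placement groups; pgp_num is what actually lets data be
rebalanced onto them, so both need to be raised.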

Regards,

Hans

> On Jan 2, 2018, at 4:41 PM, Vladimir Prokofev  wrote:
> 
> Increased number of PGs in multiple pools in a production cluster on 12.2.2 
> recently - zero issues.
> CEPH claims that increasing pg_num and pgp_num are safe operations, which are 
> essential for it's ability to scale, and this sounds pretty reasonable to me. 
> [1]
> 
> 
> [1] 
> https://www.sebastien-han.fr/blog/2013/03/12/ceph-change-pg-number-on-the-fly/
>  
> 
> 
> 2018-01-02 18:21 GMT+03:00 Karun Josy  >:
> Hi,
> 
>  Initial PG count was not properly planned while setting up the cluster, so 
> now there are only less than 50 PGs per OSDs.
> 
> What are the best practises to increase PG number of a pool ?
> We have replicated pools as well as EC pools.
> 
> Or is it better to create a new pool with higher PG numbers?
> 
> 
> Karun 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] The way to minimize osd memory usage?

2017-12-11 Thread Hans van den Bogert
There are probably multiple reasons. However, I just wanted to chime in that I set
my cache size to 1G and I constantly see OSD memory converge to ~2.5GB.

In [1] you can see the difference between a node with 4 OSDs, v12.2.2, on the
left, and a node with 4 OSDs, v12.2.1, on the right. I really hoped that v12.2.2
would bring the memory usage a bit closer to the cache parameter; almost 2.5x,
in contrast to the 3x of 12.2.1, is still quite far off IMO.
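
(For reference, the 1G cache mentioned above is the BlueStore cache option; one
way to set it, which may or may not be exactly what was used here, is in
ceph.conf:

  [osd]
  bluestore_cache_size = 1073741824
)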

Practically, I think it’s not quite possible to have 2 OSDs on your 2GB server, 
let alone have some leeway memory.


[1] https://pasteboard.co/GXHO5eF.png 

> On Dec 11, 2017, at 3:44 AM, shadow_lin  wrote:
> 
> My workload is mainly seq write (for surveillance usage). I am not sure how
> the cache would affect the write performance and why the memory usage keeps
> increasing as more data is written into Ceph storage.
>  
> 2017-12-11 
> lin.yunfan
> From: Peter Woodman
> Sent: 2017-12-11 05:04
> Subject: Re: [ceph-users] The way to minimize osd memory usage?
> To: "David Turner"
> Cc: "shadow_lin", "ceph-users", "Konstantin Shalygin"
>  
> I've had some success in this configuration by cutting the bluestore 
> cache size down to 512mb and only one OSD on an 8tb drive. Still get 
> occasional OOMs, but not terrible. Don't expect wonderful performance, 
> though. 
>  
> Two OSDs would really be pushing it. 
>  
> On Sun, Dec 10, 2017 at 10:05 AM, David Turner  wrote: 
> > The docs recommend 1GB/TB of OSDs. I saw people asking if this was still 
> > accurate for bluestore and the answer was that it is more true for 
> > bluestore 
> > than filestore. There might be a way to get this working at the cost of 
> > performance. I would look at Linux kernel memory settings as much as ceph 
> > and bluestore settings. Cache pressure is one that comes to mind that an 
> > aggressive setting might help. 
> > 
> > 
> > On Sat, Dec 9, 2017, 11:33 PM shadow_lin  wrote: 
> >> 
> >> The 12.2.1(12.2.1-249-g42172a4 (42172a443183ffe6b36e85770e53fe678db293bf) 
> >> we are running is with the memory issues fix.And we are working on to 
> >> upgrade to 12.2.2 release to see if there is any furthermore improvement. 
> >> 
> >> 2017-12-10 
> >>  
> >> lin.yunfan 
> >>  
> >> 
> >> From: Konstantin Shalygin
> >> Sent: 2017-12-10 12:29
> >> Subject: Re: [ceph-users] The way to minimize osd memory usage?
> >> To: "ceph-users"
> >> Cc: "shadow_lin"
> >> 
> >> 
> >> > I am testing running ceph luminous(12.2.1-249-g42172a4 
> >> > (42172a443183ffe6b36e85770e53fe678db293bf) on ARM server. 
> >> Try new 12.2.2 - this release should fix memory issues with Bluestore. 
> >> 
> >> ___ 
> >> ceph-users mailing list 
> >> ceph-users@lists.ceph.com 
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> > 
> > 
> > ___ 
> > ceph-users mailing list 
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> > 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd/bluestore: Get block.db usage

2017-12-04 Thread Hans van den Bogert
Hi all,

Is there a way to get the current usage of the bluestore's block.db?
I'd really like to monitor this as we have a relatively high number of
objects per OSD.
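
(One place this appears to be exposed is the BlueFS perf counters on the OSD's
admin socket, e.g.:

  ceph daemon osd.0 perf dump | grep -A 40 '"bluefs"'

which reports fields like db_total_bytes/db_used_bytes and slow_used_bytes for
data that has spilled over to the main device, assuming those counters are
available in the running version.)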

A second question related to the above: are there mechanisms to
influence which objects' metadata gets spilled once the block.db is
full? For instance, I would not care about the extra latency when
object metadata gets spilled to the backing disk if it is for RGW-related
data, in contrast to RBD object metadata, which should remain on the
faster SSD-based block.db.

Regards,

Hans
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceps-deploy won't install luminous

2017-11-15 Thread Hans van den Bogert
Never mind, you already said you are on the latest ceph-deploy, so that can’t 
be it.
I’m not familiar with deploying on Centos, but I can imagine that the last part 
of the checklist is important:

http://docs.ceph.com/docs/luminous/start/quick-start-preflight/#priorities-preferences

Can you verify that you did that part?
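
If I remember that page correctly, it boils down to something like:

  sudo yum install yum-plugin-priorities

and making sure the entries in /etc/yum.repos.d/ceph.repo carry a higher
priority (e.g. priority=2) than the distro repos, so yum prefers the Luminous
packages over whatever else provides ceph.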

> On Nov 15, 2017, at 10:41 AM, Hans van den Bogert <hansbog...@gmail.com> 
> wrote:
> 
> Hi,
> 
> Can you show the contents of the file, /etc/yum.repos.d/ceph.repo ?
> 
> Regards,
> 
> Hans
>> On Nov 15, 2017, at 10:27 AM, Ragan, Tj (Dr.) <tj.ra...@leicester.ac.uk> 
>> wrote:
>> 
>> Hi All,
>> 
>> I feel like I’m doing something silly.  I’m spinning up a new cluster, and 
>> followed the instructions on the pre-flight and quick start here:
>> 
>> http://docs.ceph.com/docs/luminous/start/quick-start-preflight/
>> http://docs.ceph.com/docs/luminous/start/quick-ceph-deploy/
>> 
>> but ceph-deploy always installs Jewel.   
>> 
>> ceph-deploy is version 1.5.39 and I’m running CentOS 7 (7-4.1708)
>> 
>> Any help would be appreciated.
>> 
>> -TJ Ragan
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceps-deploy won't install luminous

2017-11-15 Thread Hans van den Bogert
Hi,

Can you show the contents of the file, /etc/yum.repos.d/ceph.repo ?

Regards,

Hans
> On Nov 15, 2017, at 10:27 AM, Ragan, Tj (Dr.)  
> wrote:
> 
> Hi All,
> 
> I feel like I’m doing something silly.  I’m spinning up a new cluster, and 
> followed the instructions on the pre-flight and quick start here:
> 
> http://docs.ceph.com/docs/luminous/start/quick-start-preflight/
> http://docs.ceph.com/docs/luminous/start/quick-ceph-deploy/
> 
> but ceph-deploy always installs Jewel.   
> 
> ceph-deploy is version 1.5.39 and I’m running CentOS 7 (7-4.1708)
> 
> Any help would be appreciated.
> 
> -TJ Ragan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Luminous RadosGW issue

2017-11-09 Thread Hans van den Bogert
> On Nov 9, 2017, at 5:25 AM, Sam Huracan <nowitzki.sa...@gmail.com> wrote:
> 
> root@radosgw system]# ceph --admin-daemon 
> /var/run/ceph/ceph-client.rgw.radosgw.asok config show | grep log_file
> "log_file": "/var/log/ceph/ceph-client.rgw.radosgw.log”,

The .asok filename reflects what should be used in your config. If I'm right,
you should use ‘client.rgw.radosgw’ in your ceph.conf.
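
I.e., something along these lines in ceph.conf (a sketch based on the section
name above, reusing your existing options):

  [client.rgw.radosgw]
  host = radosgw
  rgw dns name = radosgw.demo.com
  log file = /var/log/ceph/ceph-client.rgw.radosgw.log

so that the section name matches the daemon that is actually running.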



> On Nov 9, 2017, at 5:25 AM, Sam Huracan <nowitzki.sa...@gmail.com> wrote:
> 
> @Hans: Yes, I tried to redeploy RGW, and ensure client.radosgw.gateway is the 
> same in ceph.conf.
> Everything goes well, the radosgw service is running, port 7480 is open, but all my 
> config of radosgw in ceph.conf can't be set: rgw_dns_name is still empty, and 
> the log file keeps the default value.
> 
> [root@radosgw system]# ceph --admin-daemon 
> /var/run/ceph/ceph-client.rgw.radosgw.asok config show | grep log_file
> "log_file": "/var/log/ceph/ceph-client.rgw.radosgw.log",
> 
> 
> [root@radosgw system]# cat /etc/ceph/ceph.client.radosgw.keyring 
> [client.radosgw.gateway]
> key = AQCsywNaqQdDHxAAC24O8CJ0A9Gn6qeiPalEYg==
> caps mon = "allow rwx"
> caps osd = "allow rwx"
> 
> 
> 2017-11-09 6:11 GMT+07:00 Hans van den Bogert <hansbog...@gmail.com 
> <mailto:hansbog...@gmail.com>>:
> Are you sure you deployed it with the client.radosgw.gateway name as
> well? Try to redeploy the RGW and make sure the name you give it
> corresponds to the name you give in the ceph.conf. Also, do not forget
> to push the ceph.conf to the RGW machine.
> 
> On Wed, Nov 8, 2017 at 11:44 PM, Sam Huracan <nowitzki.sa...@gmail.com 
> <mailto:nowitzki.sa...@gmail.com>> wrote:
> >
> >
> > Hi Cephers,
> >
> > I'm testing RadosGW in Luminous version.  I've already installed done in 
> > separate host, service is running but RadosGW did not accept any my 
> > configuration in ceph.conf.
> >
> > My Config:
> > [client.radosgw.gateway]
> > host = radosgw
> > keyring = /etc/ceph/ceph.client.radosgw.keyring
> > rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> > log file = /var/log/radosgw/client.radosgw.gateway.log
> > rgw dns name = radosgw.demo.com <http://radosgw.demo.com/>
> > rgw print continue = false
> >
> >
> > When I show config of radosgw socket:
> > [root@radosgw ~]# ceph --admin-daemon 
> > /var/run/ceph/ceph-client.rgw.radosgw.asok config show | grep dns
> > "mon_dns_srv_name": "",
> > "rgw_dns_name": "",
> > "rgw_dns_s3website_name": "",
> >
> > rgw_dns_name is empty, hence S3 API is unable to access Ceph Object Storage.
> >
> >
> > Do anyone meet this issue?
> >
> > My ceph version I'm  using is ceph-radosgw-12.2.1-0.el7.x86_64
> >
> > Thanks in advance
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> > <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
> >
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Luminous RadosGW issue

2017-11-08 Thread Hans van den Bogert
Are you sure you deployed it with the client.radosgw.gateway name as
well? Try to redeploy the RGW and make sure the name you give it
corresponds to the name you give in the ceph.conf. Also, do not forget
to push the ceph.conf to the RGW machine.

On Wed, Nov 8, 2017 at 11:44 PM, Sam Huracan  wrote:
>
>
> Hi Cephers,
>
> I'm testing RadosGW in the Luminous version. I've already installed it on a 
> separate host; the service is running but RadosGW did not accept any of my 
> configuration in ceph.conf.
>
> My Config:
> [client.radosgw.gateway]
> host = radosgw
> keyring = /etc/ceph/ceph.client.radosgw.keyring
> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> log file = /var/log/radosgw/client.radosgw.gateway.log
> rgw dns name = radosgw.demo.com
> rgw print continue = false
>
>
> When I show config of radosgw socket:
> [root@radosgw ~]# ceph --admin-daemon 
> /var/run/ceph/ceph-client.rgw.radosgw.asok config show | grep dns
> "mon_dns_srv_name": "",
> "rgw_dns_name": "",
> "rgw_dns_s3website_name": "",
>
> rgw_dns_name is empty, hence S3 API is unable to access Ceph Object Storage.
>
>
> Do anyone meet this issue?
>
> My ceph version I'm  using is ceph-radosgw-12.2.1-0.el7.x86_64
>
> Thanks in advance
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph versions not showing RGW

2017-11-02 Thread Hans van den Bogert
Just to get this really straight, Jewel OSDs do send this metadata?
Otherwise I'm probably mistaken that I ever saw 10.2.x versions in the
output.

Thanks,

Hans

On 2 Nov 2017 12:31 PM, "John Spray" <jsp...@redhat.com> wrote:

> On Thu, Nov 2, 2017 at 11:16 AM, Hans van den Bogert
> <hansbog...@gmail.com> wrote:
> > Hi all,
> >
> > During our upgrade from Jewel to Luminous I saw the following behaviour,
> if
> > my memory serves me right:
> >
> > When upgrading for example monitors and OSDs, we saw that the `ceph
> > versions` command correctly showed at one that some OSDs were still on
> Jewel
> > (10.2.x) and some were already upgraded and thus showed a version of
> 12.2.0.
> > All as expected -- however, for the RGWs, those only showed up until they
> > were upgraded to luminous. So the command gives a false sense of a
> complete
> > overview of the Ceph cluster, e.g., in my case this resulted that I
> forgot
> > about 4 out of 6 RGW instances which were still on Jewel.
> >
> > What are the semantics of the `ceph versions` ? -- Was I wrong in
> expecting
> > that Jewel RGWs should show up there?
>
> RGW daemons reporting metadata to the mon/mgr daemons is a new
> features in luminous -- older RGW daemons are effectively invisible to
> us, so Jewel daemons will not show up at all in the versions output.
>
> John
>
> >
> > Thanks,
> >
> > Hans
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph versions not showing RGW

2017-11-02 Thread Hans van den Bogert
Hi all,

During our upgrade from Jewel to Luminous I saw the following behaviour, if
my memory serves me right:

When upgrading, for example, monitors and OSDs, we saw that the `ceph
versions` command correctly showed at one point that some OSDs were still on
Jewel (10.2.x) and some were already upgraded and thus showed a version of
12.2.0.
All as expected -- however, the RGWs only showed up once they
were upgraded to Luminous. So the command gives a false sense of a complete
overview of the Ceph cluster; e.g., in my case this resulted in me forgetting
about 4 out of 6 RGW instances which were still on Jewel.

What are the semantics of the `ceph versions` ? -- Was I wrong in expecting
that Jewel RGWs should show up there?

Thanks,

Hans
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Hans van den Bogert
Never mind, I should’ve read the whole thread first.
> On Nov 2, 2017, at 10:50 AM, Hans van den Bogert <hansbog...@gmail.com> wrote:
> 
> 
>> On Nov 1, 2017, at 4:45 PM, David Turner <drakonst...@gmail.com 
>> <mailto:drakonst...@gmail.com>> wrote:
>> 
>> All it takes for data loss is that an osd on server 1 is marked down and a 
>> write happens to an osd on server 2.  Now the osd on server 2 goes down 
>> before the osd on server 1 has finished backfilling and the first osd 
>> receives a request to modify data in the object that it doesn't know the 
>> current state of.  Tada, you have data loss.
> 
> I’m probably misunderstanding, but if a osd on server 1 is backfilling, and 
> its only candidate to backfill from is an osd on server 2, and the latter 
> goes down; then wouldn’t the osd on server 1 block, i.e., not accept requests 
> to modify, until server 1 comes up again?
> Or is there a ‘hole' here somewhere where server 1 *thinks* it’s done 
> backfilling whereas the osdmap it used to backfill with was out of date?
> 
> Thanks, 
> 
> Hans

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Hans van den Bogert

> On Nov 1, 2017, at 4:45 PM, David Turner  wrote:
> 
> All it takes for data loss is that an osd on server 1 is marked down and a 
> write happens to an osd on server 2.  Now the osd on server 2 goes down 
> before the osd on server 1 has finished backfilling and the first osd 
> receives a request to modify data in the object that it doesn't know the 
> current state of.  Tada, you have data loss.

I’m probably misunderstanding, but if an OSD on server 1 is backfilling, and its
only candidate to backfill from is an OSD on server 2, and the latter goes
down, then wouldn’t the OSD on server 1 block, i.e., not accept requests to
modify, until server 2 comes up again?
Or is there a ‘hole’ here somewhere where server 1 *thinks* it’s done
backfilling whereas the osdmap it used to backfill with was out of date?

Thanks, 

Hans
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] announcing ceph-helm (ceph on kubernetes orchestration)

2017-10-25 Thread Hans van den Bogert
Very interesting.
I've been toying around with Rook.io [1]. Did you know of this project, and
if so can you tell if ceph-helm and Rook.io have similar goals?

Regards,

Hans

[1] https://rook.io/

On 25 Oct 2017 21:09, "Sage Weil"  wrote:

> There is a new repo under the ceph org, ceph-helm, which includes helm
> charts for deploying ceph on kubernetes.  The code is based on the ceph
> charts from openstack-helm, but we've moved them into their own upstream
> repo here so that they can be developed more quickly and independently
> from the openstack-helm work.  The code has already evolved a fair bit,
> mostly to support luminous and fix a range of issues:
>
> https://github.com/ceph/ceph-helm/tree/master/ceph/ceph
>
> The repo is a fork of the upstream kubernetes/charts.git repo with an eye
> toward eventually merging the chart upstream into that repo.  How useful
> that would be in practice is not entirely clear to me since the version in
> the ceph-helm repo will presumably always be more up to date and users
> have to point to *some* source for the chart either way.  Also the current
> structure of the files in the repo is carried over from openstack-helm,
> which uses the helm-toolkit stuff and isn't in the correct form for the
> upstream charts.git.  Suggestions/input here on what direction makes more
> sense would be welcome!
>
> There are also some docs on getting a ceph cluster up in kubernetes using
> these charts at
>
> https://github.com/ceph/ceph/pull/18520
> http://docs.ceph.com/ceph-prs/18520/start/kube-helm/
>
> that should be merged shortly.  Not terribly detailed and we're not
> covering much on the operations side yet, but that all is coming.
>
> A very rough sketch of the direction currently being considered from
> running ceph in kubernetes is here:
>
> http://pad.ceph.com/p/containers
>
> and there is a trello board here
>
> https://trello.com/b/kcXOllJp/kubehelm
>
> All of this builds on the container image that Sebastien has been working
> on for some time, that has recently been renamed from ceph-docker ->
> ceph-container
>
> https://github.com/ceph/ceph-container
>
> Dan is working on getting an image registry up at registry.ceph.com so
> that we can publish test build images, releases, or both.
>
> We also have a daily sync up call for the folks who are actively working
> on this.
>
> That's all for now!  :)
> sage
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Drive write cache recommendations for Luminous/Bluestore

2017-10-23 Thread Hans van den Bogert
Hi All,

For Jewel there is this page about drive cache:
http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/#hard-drive-prep

For Bluestore I can't find any documentation or discussions about drive
write cache, while I can imagine that revisiting this subject might be
necessary.

For our cluster specifically, we use HP Gen9 servers with a B140i controller where
disks are directly attached (i.e., not RAID). Often with the Linux kernel,
controllers automatically enable the drive write cache for directly attached
hard disks, since this *should* be safe as long as the disks correctly adhere
to flush semantics. In the case of the B140i controller, I can confirm
with `hdparm -W /dev/sdx` that the drive write cache is *not* enabled by
default.
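
For reference, toggling it per disk looks something like:

  hdparm -W  /dev/sdc    # query the current drive write cache setting
  hdparm -W1 /dev/sdc    # enable the drive write cache
  hdparm -W0 /dev/sdc    # disable it again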

So my main two questions are:

1. Did anybody do extensive testing, with luminous in combination with
drive write cache enabled, or at least elaborate on the subject.
2. Depending on item 1., could and should I enable drive write cache for
the disks attached to a HP b140i controller.

Thanks!

Hans
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph delete files and status

2017-10-20 Thread Hans van den Bogert
My experience with RGW is that the actual freeing up of space is asynchronous to 
an S3 client’s command to delete an object. I.e., it might take a while 
before it’s actually freed up.
Can you redo your little experiment and simply wait for an hour to let the 
garbage collector do its thing, or force the garbage collector with 
something like:

$ radosgw-admin gc process

I haven’t used this command to actually test if this would have the intended 
result of freeing up space. But it wouldn’t hurt anything.
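
To see what is still queued for garbage collection, something like this should
work as well:

  radosgw-admin gc list --include-all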

Regards,

Hans

> On Oct 19, 2017, at 11:06 PM, nigel davies  wrote:
> 
> I am using RGW, with an S3 bucket setup. 
> 
> The live vershion also uses rbd as well 
> 
> On 19 Oct 2017 10:04 pm, "David Turner"  > wrote:
> How are you uploading a file?  RGW, librados, CephFS, or RBD?  There are 
> multiple reasons that the space might not be updating or cleaning itself up.  
> The more information you can give us about how you're testing, the more we 
> can help you.
> 
> On Thu, Oct 19, 2017 at 5:00 PM nigel davies  > wrote:
> Hay
> 
> I somehow got the space back by tweaking the reweights. 
> 
> But I am a tad confused: I uploaded a file (200MB), then removed the file, and 
> the space has not changed. I am not sure why that happens and what I can do.
> 
> On Thu, Oct 19, 2017 at 6:42 PM, nigel davies  > wrote:
> PS was not aware of fstrim
> 
> On 19 Oct 2017 6:41 pm, "nigel davies"  > wrote:
> Hay
> 
> My ceph cluster is connted to ceph gateway for a S3 server I was uploading 
> the file and removing it from the bucket using s3cmd.
> 
> I upload and removed the file a few times now the cluster so close to full. 
> 
> I was told their is a way to clear the any deleted files out or something.
> 
> On 19 Oct 2017 5:09 pm, "Jamie Fargen"  > wrote:
> Nigel-
> 
> What method did you use to upload and delete the file? How did you check the 
> space utilization? I believe the reason that you are still seeing the space 
> being utilized when you issue your ceph -df is because even after the file is 
> deleted, the file system doesn't actually delete the file, it just removes 
> the file inode entry pointing to the file. The file will still be on the disk 
> until the blocks are re-allocated to another file.
> 
> -Jamie
> 
> On Thu, Oct 19, 2017 at 11:54 AM, nigel davies  > wrote:
> Hay all
> 
> I am looking at my small test Ceph cluster, i have uploaded a 200MB iso and 
> checked the space on "ceph status" and see it incress.
> 
> But when i delete the file the space used does not go down. 
> 
> Have i missed a configuration somewhere or something?
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
> 
> 
> 
> 
> -- 
> Jamie Fargen
> Consultant
> jfar...@redhat.com 
> 813-817-4430 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High mem with Luminous/Bluestore

2017-10-19 Thread Hans van den Bogert
> Memory usage is still quite high here even with a large onode cache! 
> Are you using erasure coding?  I recently was able to reproduce a bug in 
> bluestore causing excessive memory usage during large writes with EC, 
> but have not tracked down exactly what's going on yet.
> 
> Mark
No, this is a 3-way replicated cluster for all pools.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High mem with Luminous/Bluestore

2017-10-18 Thread Hans van den Bogert
Indeed it shows ssd in the OSD's metadata.

"bluestore_bdev_type": "ssd",


Then I misunderstood the role of the device class in CRUSH; I expected the
OSD to actually configure itself according to the CRUSH device class.

I'll try to force the OSDs to behave like HDDs and monitor the memory usage.


Thanks,

Hans

On Wed, Oct 18, 2017 at 11:56 AM, Wido den Hollander <w...@42on.com> wrote:

>
> > Op 18 oktober 2017 om 11:41 schreef Hans van den Bogert <
> hansbog...@gmail.com>:
> >
> >
> > Hi All,
> >
> > I've converted 2 nodes with 4 HDD/OSDs each from Filestore to Bluestore.
> I
> > expected somewhat higher memory usage/RSS values, however I see, imo, a
> > huge memory usage for all OSDs on both nodes.
> >
> > Small snippet from `top`
> > PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> > COMMAND
> > 4652 ceph  20   0 9840236 8.443g  21364 S   0.7 27.1  31:21.15
> > /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
> >
> >
> > The only deviation from a conventional install is that we use bcache for
> > our HDDs. Bcache by default is recognized as an 'ssd' in CRUSH. I've
> > manually set the class to 'hdd'.
> >
> > Small snippet from `ceph osd tree`
> >   -37.27399 host osd02
> >  5   hdd  1.81850 osd.5  up  1.0 1.0
> >
> > So I would expect around 2GB of usage according to rules of thumb in the
> > documentation and Sage's comments about the bluestore cache parameters
> for
> > HDDs; yet we're now seeing a usage of more than 8GB after less than 1 day
> > of runtime for this OSD. Is this a memory leak?
>
> Although you've set the class to HDD it's the OSD which probably sees
> itself as an SSD backed OSD.
>
> Test with:
>
> $ ceph osd metadata 5
>
> It will show:
>
> "bluestore_bdev_rotational": "0",
> "bluestore_bdev_type": "ssd",
>
> The default for SSD OSDs is 3GB, see: http://docs.ceph.com/docs/
> master/rados/configuration/bluestore-config-ref/
>
> bluestore_cache_size_ssd is set to 3GB, so it will use at least 3GB.
>
> I agree, 5GB above the 3GB is a lot of memory, but could you check the OSD
> metadata first?
>
> >
> > Having read the other threads Sage recommends to also send the mempool
> dump
> >
> > {
> > "bloom_filter": {
> > "items": 0,
> > "bytes": 0
> > },
> > "bluestore_alloc": {
> > "items": 5732656,
> > "bytes": 5732656
> > },
> > "bluestore_cache_data": {
> > "items": 10659,
> > "bytes": 481820672
> > },
> > "bluestore_cache_onode": {
> > "items": 1106714,
> > "bytes": 752565520
> > },
> > "bluestore_cache_other": {
> > "items": 412675997,
> > "bytes": 1388849420
> > },
> > "bluestore_fsck": {
> > "items": 0,
> > "bytes": 0
> > },
> > "bluestore_txc": {
> > "items": 5,
> > "bytes": 3600
> > },
> > "bluestore_writing_deferred": {
> > "items": 21,
> > "bytes": 225280
> > },
> > "bluestore_writing": {
> > "items": 2,
> > "bytes": 188146
> > },
> > "bluefs": {
> > "items": 951,
> > "bytes": 50432
> > },
> > "buffer_anon": {
> > "items": 14440810,
> > "bytes": 1804695070
> > },
> > "buffer_meta": {
> > "items": 10754,
> > "bytes": 946352
> > },
> > "osd": {
> > "items": 155,
> > "bytes": 1869920
> > },
> > "osd_mapbl": {
> > "items": 16,
> > "bytes": 288280
> > },
> > "osd_pglog": {
> > "items": 284680,
> > "bytes": 91233440
> > },
> > "osdmap": {
> > "items": 14287,
> > "bytes": 731680
> > },
> > "osdmap_mapping": {
> > "items": 0,
> > "bytes": 0
> > },
> > "pgmap": {
> > "items": 0,
> > "bytes": 0
> > },
> > "mds_co": {
> > "items": 0,
> > "bytes": 0
> > },
> > "unittest_1": {
> > "items": 0,
> > "bytes": 0
> > },
> > "unittest_2": {
> > "items": 0,
> > "bytes": 0
> > },
> > "total": {
> > "items": 434277707,
> > "bytes": 4529200468
> > }
> > }
> >
> > Regards,
> >
> > Hans
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] High mem with Luminous/Bluestore

2017-10-18 Thread Hans van den Bogert
Hi All,

I've converted 2 nodes with 4 HDD/OSDs each from Filestore to Bluestore. I
expected somewhat higher memory usage/RSS values; however, I see, IMO, a
huge memory usage for all OSDs on both nodes.

Small snippet from `top`
PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
COMMAND
4652 ceph  20   0 9840236 8.443g  21364 S   0.7 27.1  31:21.15
/usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph


The only deviation from a conventional install is that we use bcache for
our HDDs. Bcache by default is recognized as an 'ssd' in CRUSH. I've
manually set the class to 'hdd'.

Small snippet from `ceph osd tree`
  -37.27399 host osd02
 5   hdd  1.81850 osd.5  up  1.0 1.0

So I would expect around 2GB of usage according to rules of thumb in the
documentation and Sage's comments about the bluestore cache parameters for
HDDs; yet we're now seeing a usage of more than 8GB after less than 1 day
of runtime for this OSD. Is this a memory leak?

Having read the other threads Sage recommends to also send the mempool dump
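
(Presumably obtained with something like:

  ceph daemon osd.5 dump_mempools
)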

{
"bloom_filter": {
"items": 0,
"bytes": 0
},
"bluestore_alloc": {
"items": 5732656,
"bytes": 5732656
},
"bluestore_cache_data": {
"items": 10659,
"bytes": 481820672
},
"bluestore_cache_onode": {
"items": 1106714,
"bytes": 752565520
},
"bluestore_cache_other": {
"items": 412675997,
"bytes": 1388849420
},
"bluestore_fsck": {
"items": 0,
"bytes": 0
},
"bluestore_txc": {
"items": 5,
"bytes": 3600
},
"bluestore_writing_deferred": {
"items": 21,
"bytes": 225280
},
"bluestore_writing": {
"items": 2,
"bytes": 188146
},
"bluefs": {
"items": 951,
"bytes": 50432
},
"buffer_anon": {
"items": 14440810,
"bytes": 1804695070
},
"buffer_meta": {
"items": 10754,
"bytes": 946352
},
"osd": {
"items": 155,
"bytes": 1869920
},
"osd_mapbl": {
"items": 16,
"bytes": 288280
},
"osd_pglog": {
"items": 284680,
"bytes": 91233440
},
"osdmap": {
"items": 14287,
"bytes": 731680
},
"osdmap_mapping": {
"items": 0,
"bytes": 0
},
"pgmap": {
"items": 0,
"bytes": 0
},
"mds_co": {
"items": 0,
"bytes": 0
},
"unittest_1": {
"items": 0,
"bytes": 0
},
"unittest_2": {
"items": 0,
"bytes": 0
},
"total": {
"items": 434277707,
"bytes": 4529200468
}
}

Regards,

Hans
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to get current min-compat-client setting

2017-10-13 Thread Hans van den Bogert
Hi, 

I’m in the middle of debugging some incompatibilities with an upgrade of 
Proxmox which uses Ceph. At this point I’d like to know what my current value 
is for the min-compat-client setting, which would’ve been set by:

ceph osd set-require-min-compat-client …

AFAIK, there is no direct get-* variant of the above command. Does anybody know
how I can retrieve the current setting, perhaps with lower-level commands/tools?
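
For what it's worth, one lower-level place it seems to show up is the OSD map
dump, e.g.:

  ceph osd dump | grep min_compat_client

which should print the require_min_compat_client (and min_compat_client) values
for the cluster.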

Thanks, 

Hans
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Gracefully reboot OSD node

2017-08-03 Thread Hans van den Bogert
Thanks for answering even before I asked the questions :)

So, bottom line, a HEALTH_ERR state is simply part of taking a (bunch of) OSDs
down? Is a HEALTH_ERR period of 2-4 seconds within normal bounds? For
context, the CPUs are 2609v3 (one per 4 OSDs). (I know; they're far from the
fastest CPUs.)
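
(For reference, the flag juggling discussed here is just:

  ceph osd set noout
  ceph osd set nodown
  # ... reboot the node ...
  ceph osd unset nodown
  ceph osd unset noout
)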

On Thu, Aug 3, 2017 at 1:55 PM, Hans van den Bogert <hansbog...@gmail.com>
wrote:

> What are the implications of this? Because I can see a lot of blocked
> requests piling up when using 'noout' and 'nodown'. That probably makes
> sense though.
> Another thing, no when the OSDs come back online, I again see multiple
> periods of HEALTH_ERR state. Is that to be expected?
>
> On Thu, Aug 3, 2017 at 1:36 PM, linghucongsong <linghucongs...@163.com>
> wrote:
>
>>
>>
>> set the osd noout nodown
>>
>>
>>
>>
>> At 2017-08-03 18:29:47, "Hans van den Bogert" <hansbog...@gmail.com>
>> wrote:
>>
>> Hi all,
>>
>> One thing which has bothered since the beginning of using ceph is that a
>> reboot of a single OSD causes a HEALTH_ERR state for the cluster for at
>> least a couple of seconds.
>>
>> In the case of planned reboot of a OSD node, should I do some extra
>> commands in order not to go to HEALTH_ERR state?
>>
>> Thanks,
>>
>> Hans
>>
>>
>>
>>
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Gracefully reboot OSD node

2017-08-03 Thread Hans van den Bogert
What are the implications of this? Because I can see a lot of blocked
requests piling up when using 'noout' and 'nodown'. That probably makes
sense though.
Another thing: now when the OSDs come back online, I again see multiple
periods of HEALTH_ERR state. Is that to be expected?

On Thu, Aug 3, 2017 at 1:36 PM, linghucongsong <linghucongs...@163.com>
wrote:

>
>
> set the osd noout nodown
>
>
>
>
> At 2017-08-03 18:29:47, "Hans van den Bogert" <hansbog...@gmail.com>
> wrote:
>
> Hi all,
>
> One thing which has bothered since the beginning of using ceph is that a
> reboot of a single OSD causes a HEALTH_ERR state for the cluster for at
> least a couple of seconds.
>
> In the case of planned reboot of a OSD node, should I do some extra
> commands in order not to go to HEALTH_ERR state?
>
> Thanks,
>
> Hans
>
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Gracefully reboot OSD node

2017-08-03 Thread Hans van den Bogert
Hi all,

One thing which has bothered me since the beginning of using Ceph is that a
reboot of a single OSD causes a HEALTH_ERR state for the cluster for at
least a couple of seconds.

In the case of a planned reboot of an OSD node, should I run some extra
commands in order not to go into the HEALTH_ERR state?

Thanks,

Hans
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Linear space complexity or memory leak in `Radosgw-admin bucket check --fix`

2017-07-25 Thread Hans van den Bogert
Hi All,

I don't seem to be able to fix a bucket, a bucket which has become
inconsistent due to the use of the `inconsistent-index` flag 8).

My ceph-admin VM has 4GB of RAM, but that doesn't seem to be enough to do a
`radosgw-admin bucket check --fix` on a bucket which holds 6M items, as the
radosgw-admin process is eventually killed by the OOM killer. Is
this high RAM usage to be expected, or should I file a bug?

Regards,

Hans
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Crash on startup

2017-02-01 Thread Hans van den Bogert
Hi All, 

I'm clueless as to why an OSD crashed. I have a log at [1]. If anyone can 
explain how this should be interpreted, then please let me know. I can only see 
generic errors, probably triggered by a failed assert. Restarting the OSD fails 
with the same errors as in [1]. It seems, though correct me if I'm wrong, 
that replaying the journal fails. 
Is this something which can just happen, and should I just wipe the whole OSD 
and recreate a new one? Or is this a symptom of a bigger issue?

Regards,

Hans

[1] http://pastebin.com/yBqkAqix
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com