[ceph-users] Re: active+remapped+backfilling keeps going .. and going

2020-04-24 Thread Eugen Block

I would start with mgr logs, maybe increase the debug level if necessary.
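
For example, a rough sketch (assuming a Nautilus cluster with centralized config and default log locations; vis-ivb-07 is your active mgr):

# ceph config set mgr debug_mgr 10
# tail -f /var/log/ceph/ceph-mgr.vis-ivb-07.log
# ceph config rm mgr debug_mgr

The last command puts the debug level back to the default once you have what you need.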


Zitat von "Kyriazis, George" :

pg_autoscaler is on, but the number of PGs is stable.  I’ve seen  
subsequent calls to “ceph -s” list the same number of total PGs, but  
the number of PGs in remapped+backfilling increased.


I haven’t seen anything in the logs, but perhaps I’m not looking in  
the right place.  Is there any place in particular I should be looking?


Thanks!

George


# ceph -s
  cluster:
id: ec2c9542-dc1b-4af6-9f21-0adbcabb9452
health: HEALTH_WARN
603 pgs not deep-scrubbed in time
603 pgs not scrubbed in time
2 daemons have recently crashed

  services:
mon: 5 daemons, quorum  
vis-ivb-07,vis-ivb-10,vis-hsw-01,vis-clx-01,vis-clx-05 (age 2h)
mgr: vis-ivb-07(active, since 2h), standbys: vis-hsw-01,  
vis-ivb-10, vis-clx-01, vis-clx-05

mds: cephfs:1 {0=vis-hsw-01=up:active} 2 up:standby
osd: 15 osds: 15 up (since 2d), 15 in (since 8d); 100 remapped pgs

  data:
pools:   5 pools, 608 pgs
objects: 46.32M objects, 49 TiB
usage:   129 TiB used, 75 TiB / 204 TiB avail
pgs: 8985854/172482064 objects misplaced (5.210%)
 508 active+clean
 100 active+remapped+backfilling

  io:
client:   102 KiB/s wr, 0 op/s rd, 4 op/s wr
recovery: 117 MiB/s, 86 objects/s

# ceph -s
  cluster:
id: ec2c9542-dc1b-4af6-9f21-0adbcabb9452
health: HEALTH_WARN
603 pgs not deep-scrubbed in time
603 pgs not scrubbed in time
2 daemons have recently crashed

  services:
mon: 5 daemons, quorum  
vis-ivb-07,vis-ivb-10,vis-hsw-01,vis-clx-01,vis-clx-05 (age 5h)
mgr: vis-ivb-07(active, since 5h), standbys: vis-hsw-01,  
vis-ivb-10, vis-clx-01, vis-clx-05

mds: cephfs:1 {0=vis-hsw-01=up:active} 2 up:standby
osd: 15 osds: 15 up (since 2d), 15 in (since 8d); 103 remapped pgs

  data:
pools:   5 pools, 608 pgs
objects: 46.32M objects, 49 TiB
usage:   128 TiB used, 75 TiB / 204 TiB avail
pgs: 8681394/172482064 objects misplaced (5.033%)
 505 active+clean
 103 active+remapped+backfilling

  io:
recovery: 70 MiB/s, 54 objects/s

#



On Apr 24, 2020, at 1:52 PM, Eugen Block <ebl...@nde.ag> wrote:


Yes, that means it's off. Can you see anything in the logs? They  
should show that something triggers the rebalancing. Could it be the  
pg_autoscaler? Is that enabled?



Zitat von "Kyriazis, George"  
mailto:george.kyria...@intel.com>>:


Here is the status of my balancer:

# ceph balancer status
{
   "last_optimize_duration": "",
   "plans": [],
   "mode": "none",
   "active": false,
   "optimize_result": "",
   "last_optimize_started": ""
}
#

Doesn’t that mean it’s “off”?

Thanks,

George


On Apr 24, 2020, at 1:49 AM, Lomayani S. Laizer <lomlai...@gmail.com> wrote:


I had a similar problem  when upgraded to octopus and the solution  
is to turn off  autobalancing.


You can try to turn off if enabled

ceph balancer off


On Fri, Apr 24, 2020 at 8:51 AM Eugen Block <ebl...@nde.ag> wrote:

Hi,
the balancer is probably running, which mode? I changed the mode to
none in our own cluster because it also never finished rebalancing and
we didn’t have a bad pg distribution. Maybe it’s supposed to be like
that, I don’t know.

Regards
Eugen


Zitat von "Kyriazis, George"  
mailto:george.kyria...@intel.com>>:


Hello,

I have a Proxmox ceph cluster with 5 nodes and 3 OSDs each (total 15
OSDs), on a 10G network.

The cluster started small, and I’ve progressively added OSDs over
time.  Problem is…. The cluster never rebalances completely.  There
is always progress on backfilling, but PGs that used to be in
active+clean state jump back into the active+remapped+backfilling
(or active+remapped+backfill_wait) state, to be moved to different
OSDs.

Initially I had a 1G network (recently upgraded to 10G), and I was
holding back on the backfill settings (osd_max_backfills and
osd_recovery_sleep_hdd).  I just recently (last few weeks) upgraded
to 10G, with osd_max_backfills = 50 and osd_recovery_sleep_hdd = 0
(only HDDs, no SSDs).  Cluster has been backfilling for months now
with no end in sight.

Is this normal behavior?  Is there any setting that I can look at
that will give me an idea as to why PGs are jumping back into
remapped from clean?

Below is output of “ceph osd tree” and “ceph osd df”:

# ceph osd tree
ID  CLASS WEIGHTTYPE NAME   STATUS REWEIGHT PRI-AFF
-1   203.72472 root default
-940.01666 host vis-hsw-01
 3   hdd  10.91309 osd.3   up  1.0 1.0
 6   hdd  14.55179 osd.6   up  1.0 1.0
10   hdd  14.55179 osd.10  up  1.0 1.0
-1340.01666 host vis-hsw-02
 0   hdd  10.91309 osd.0   up  1.0 1.0
 7   hdd  14.55179 osd.7   up 

[ceph-users] Re: How to debug ssh: ceph orch host add ceph01 10.10.1.1

2020-04-24 Thread Dimitri Savineau
Did you take a look at the cephadm logs (/var/log/ceph/ceph.cephadm.log)?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: active+remapped+backfilling keeps going .. and going

2020-04-24 Thread Kyriazis, George
pg_autoscaler is on, but the number of PGs is stable.  I’ve seen subsequent 
calls to “ceph -s” list the same number of total PGs, but the number of PGs in 
remapped+backfilling increased.

I haven’t seen anything in the logs, but perhaps I’m not looking in the right 
place.  Is there any place in particular I should be looking?

Thanks!

George


# ceph -s
  cluster:
id: ec2c9542-dc1b-4af6-9f21-0adbcabb9452
health: HEALTH_WARN
603 pgs not deep-scrubbed in time
603 pgs not scrubbed in time
2 daemons have recently crashed

  services:
mon: 5 daemons, quorum 
vis-ivb-07,vis-ivb-10,vis-hsw-01,vis-clx-01,vis-clx-05 (age 2h)
mgr: vis-ivb-07(active, since 2h), standbys: vis-hsw-01, vis-ivb-10, 
vis-clx-01, vis-clx-05
mds: cephfs:1 {0=vis-hsw-01=up:active} 2 up:standby
osd: 15 osds: 15 up (since 2d), 15 in (since 8d); 100 remapped pgs

  data:
pools:   5 pools, 608 pgs
objects: 46.32M objects, 49 TiB
usage:   129 TiB used, 75 TiB / 204 TiB avail
pgs: 8985854/172482064 objects misplaced (5.210%)
 508 active+clean
 100 active+remapped+backfilling

  io:
client:   102 KiB/s wr, 0 op/s rd, 4 op/s wr
recovery: 117 MiB/s, 86 objects/s

# ceph -s
  cluster:
id: ec2c9542-dc1b-4af6-9f21-0adbcabb9452
health: HEALTH_WARN
603 pgs not deep-scrubbed in time
603 pgs not scrubbed in time
2 daemons have recently crashed

  services:
mon: 5 daemons, quorum 
vis-ivb-07,vis-ivb-10,vis-hsw-01,vis-clx-01,vis-clx-05 (age 5h)
mgr: vis-ivb-07(active, since 5h), standbys: vis-hsw-01, vis-ivb-10, 
vis-clx-01, vis-clx-05
mds: cephfs:1 {0=vis-hsw-01=up:active} 2 up:standby
osd: 15 osds: 15 up (since 2d), 15 in (since 8d); 103 remapped pgs

  data:
pools:   5 pools, 608 pgs
objects: 46.32M objects, 49 TiB
usage:   128 TiB used, 75 TiB / 204 TiB avail
pgs: 8681394/172482064 objects misplaced (5.033%)
 505 active+clean
 103 active+remapped+backfilling

  io:
recovery: 70 MiB/s, 54 objects/s

#



On Apr 24, 2020, at 1:52 PM, Eugen Block <ebl...@nde.ag> wrote:

Yes, that means it's off. Can you see anything in the logs? They should show 
that something triggers the rebalancing. Could it be the pg_autoscaler? Is that 
enabled?


Zitat von "Kyriazis, George" 
mailto:george.kyria...@intel.com>>:

Here is the status of my balancer:

# ceph balancer status
{
   "last_optimize_duration": "",
   "plans": [],
   "mode": "none",
   "active": false,
   "optimize_result": "",
   "last_optimize_started": ""
}
#

Doesn’t that mean it’s “off”?

Thanks,

George


On Apr 24, 2020, at 1:49 AM, Lomayani S. Laizer <lomlai...@gmail.com> wrote:

I had a similar problem  when upgraded to octopus and the solution is to turn 
off  autobalancing.

You can try to turn off if enabled

ceph balancer off


On Fri, Apr 24, 2020 at 8:51 AM Eugen Block <ebl...@nde.ag> wrote:
Hi,
the balancer is probably running, which mode? I changed the mode to
none in our own cluster because it also never finished rebalancing and
we didn’t have a bad pg distribution. Maybe it’s supposed to be like
that, I don’t know.

Regards
Eugen


Zitat von "Kyriazis, George" 
mailto:george.kyria...@intel.com>>:

Hello,

I have a Proxmox ceph cluster with 5 nodes and 3 OSDs each (total 15
OSDs), on a 10G network.

The cluster started small, and I’ve progressively added OSDs over
time.  Problem is…. The cluster never rebalances completely.  There
is always progress on backfilling, but PGs that used to be in
active+clean state jump back into the active+remapped+backfilling
(or active+remapped+backfill_wait) state, to be moved to different
OSDs.

Initially I had a 1G network (recently upgraded to 10G), and I was
holding on the backfill settings (osd_max_backfills and
osd_recovery_sleep_hdd).  I just recently (last few weeks) upgraded
to 10G, with osd_max_backfills = 50 and osd_recovery_sleep_hdd = 0
(only HDDs, no SSDs).  Cluster has been backfilling for months now
with no end in sight.

Is this normal behavior?  Is there any setting that I can look at
that till give me an idea as to why PGs are jumping back into
remapped from clean?

Below is output of “ceph osd tree” and “ceph osd df”:

# ceph osd tree
ID  CLASS WEIGHTTYPE NAME   STATUS REWEIGHT PRI-AFF
-1   203.72472 root default
-940.01666 host vis-hsw-01
 3   hdd  10.91309 osd.3   up  1.0 1.0
 6   hdd  14.55179 osd.6   up  1.0 1.0
10   hdd  14.55179 osd.10  up  1.0 1.0
-1340.01666 host vis-hsw-02
 0   hdd  10.91309 osd.0   up  1.0 1.0
 7   hdd  14.55179 osd.7   up  1.0 1.0
11   hdd  14.55179 osd.11  up  1.0 1.0
-1140.01666 host vis-hsw-03
 4   hdd  

[ceph-users] Re: active+remapped+backfilling keeps going .. and going

2020-04-24 Thread Eugen Block
Yes, that means it's off. Can you see anything in the logs? They  
should show that something triggers the rebalancing. Could it be the  
pg_autoscaler? Is that enabled?
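
A quick way to check, as a sketch (this assumes the Nautilus/Octopus pg_autoscaler  
mgr module; replace <pool> with your pool names):

# ceph osd pool autoscale-status
# ceph osd pool get <pool> pg_autoscale_mode
# ceph osd pool set <pool> pg_autoscale_mode off

autoscale-status lists the current and target PG counts per pool; setting the  
mode to off per pool stops the autoscaler from changing pg_num.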



Zitat von "Kyriazis, George" :


Here is the status of my balancer:

# ceph balancer status
{
"last_optimize_duration": "",
"plans": [],
"mode": "none",
"active": false,
"optimize_result": "",
"last_optimize_started": ""
}
#

Doesn’t that mean it’s “off”?

Thanks,

George


On Apr 24, 2020, at 1:49 AM, Lomayani S. Laizer <lomlai...@gmail.com> wrote:


I had a similar problem  when upgraded to octopus and the solution  
is to turn off  autobalancing.


You can try to turn off if enabled

ceph balancer off


On Fri, Apr 24, 2020 at 8:51 AM Eugen Block <ebl...@nde.ag> wrote:

Hi,
the balancer is probably running, which mode? I changed the mode to
none in our own cluster because it also never finished rebalancing and
we didn’t have a bad pg distribution. Maybe it’s supposed to be like
that, I don’t know.

Regards
Eugen


Zitat von "Kyriazis, George"  
mailto:george.kyria...@intel.com>>:



Hello,

I have a Proxmox ceph cluster with 5 nodes and 3 OSDs each (total 15
OSDs), on a 10G network.

The cluster started small, and I’ve progressively added OSDs over
time.  Problem is…. The cluster never rebalances completely.  There
is always progress on backfilling, but PGs that used to be in
active+clean state jump back into the active+remapped+backfilling
(or active+remapped+backfill_wait) state, to be moved to different
OSDs.

Initially I had a 1G network (recently upgraded to 10G), and I was
holding on the backfill settings (osd_max_backfills and
osd_recovery_sleep_hdd).  I just recently (last few weeks) upgraded
to 10G, with osd_max_backfills = 50 and osd_recovery_sleep_hdd = 0
(only HDDs, no SSDs).  Cluster has been backfilling for months now
with no end in sight.

Is this normal behavior?  Is there any setting that I can look at
that till give me an idea as to why PGs are jumping back into
remapped from clean?

Below is output of “ceph osd tree” and “ceph osd df”:

# ceph osd tree
ID  CLASS WEIGHTTYPE NAME   STATUS REWEIGHT PRI-AFF
 -1   203.72472 root default
 -940.01666 host vis-hsw-01
  3   hdd  10.91309 osd.3   up  1.0 1.0
  6   hdd  14.55179 osd.6   up  1.0 1.0
 10   hdd  14.55179 osd.10  up  1.0 1.0
-1340.01666 host vis-hsw-02
  0   hdd  10.91309 osd.0   up  1.0 1.0
  7   hdd  14.55179 osd.7   up  1.0 1.0
 11   hdd  14.55179 osd.11  up  1.0 1.0
-1140.01666 host vis-hsw-03
  4   hdd  10.91309 osd.4   up  1.0 1.0
  8   hdd  14.55179 osd.8   up  1.0 1.0
 12   hdd  14.55179 osd.12  up  1.0 1.0
 -340.01666 host vis-hsw-04
  5   hdd  10.91309 osd.5   up  1.0 1.0
  9   hdd  14.55179 osd.9   up  1.0 1.0
 13   hdd  14.55179 osd.13  up  1.0 1.0
-1543.65807 host vis-hsw-05
  1   hdd  14.55269 osd.1   up  1.0 1.0
  2   hdd  14.55269 osd.2   up  1.0 1.0
 14   hdd  14.55269 osd.14  up  1.0 1.0
 -5   0 host vis-ivb-07
 -7   0 host vis-ivb-10
#

# ceph osd df
ID CLASS WEIGHT   REWEIGHT SIZERAW USE DATAOMAPMETA
AVAIL   %USE  VAR  PGS STATUS
 3   hdd 10.91309  1.0  11 TiB 8.2 TiB 8.2 TiB 552 MiB  25 GiB
2.7 TiB 75.08 1.19 131 up
 6   hdd 14.55179  1.0  15 TiB 9.1 TiB 9.1 TiB 1.2 GiB  30 GiB
5.5 TiB 62.47 0.99 148 up
10   hdd 14.55179  1.0  15 TiB 8.1 TiB 8.1 TiB 1.5 GiB  20 GiB
6.4 TiB 55.98 0.89 142 up
 0   hdd 10.91309  1.0  11 TiB 7.5 TiB 7.4 TiB 504 MiB  24 GiB
3.5 TiB 68.34 1.09 120 up
 7   hdd 14.55179  1.0  15 TiB 8.7 TiB 8.7 TiB 1.0 GiB  31 GiB
5.8 TiB 60.07 0.95 144 up
11   hdd 14.55179  1.0  15 TiB 9.4 TiB 9.3 TiB 819 MiB  20 GiB
5.2 TiB 64.31 1.02 147 up
 4   hdd 10.91309  1.0  11 TiB 7.0 TiB 7.0 TiB 284 MiB  25 GiB
3.9 TiB 64.35 1.02 112 up
 8   hdd 14.55179  1.0  15 TiB 9.3 TiB 9.2 TiB 1.8 GiB  29 GiB
5.3 TiB 63.65 1.01 157 up
12   hdd 14.55179  1.0  15 TiB 8.6 TiB 8.6 TiB 623 MiB  19 GiB
5.9 TiB 59.14 0.94 136 up
 5   hdd 10.91309  1.0  11 TiB 8.6 TiB 8.6 TiB 542 MiB  29 GiB
2.3 TiB 79.01 1.26 134 up
 9   hdd 14.55179  1.0  15 TiB 8.2 TiB 8.2 TiB 707 MiB  27 GiB
6.3 TiB 56.56 0.90 138 up
13   hdd 14.55179  1.0  15 TiB 8.7 TiB 8.7 TiB 741 MiB  18 GiB
5.8 TiB 59.85 0.95 134 up
 1   hdd 14.55269  1.0  15 TiB 9.8 TiB 9.8 TiB 1.3 GiB  20 GiB
4.8 TiB 67.18 1.07 158 up
 2   hdd 14.55269  1.0  15 TiB 8.7 TiB 8.7 TiB 936 MiB  18 GiB
5.8 TiB 60.04 0.95 148 up
14   hdd 14.55269  1.0  15 TiB 8.3 TiB 8.3 TiB 673 MiB  18 GiB
6.3 

[ceph-users] Re: adding block.db to OSD

2020-04-24 Thread Stefan Priebe - Profihost AG
Hi Igor,

could it be due to those 64 KB of spilled-over metadata that I
can't get rid of?
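
For reference, this is roughly how I check for spillover (a sketch; the
counters come from the bluefs section of the OSD perf dump):

# ceph health detail | grep -i spillover
# ceph daemon osd.0 perf dump | grep -E '"db_used_bytes"|"slow_used_bytes"'

A non-zero slow_used_bytes means part of the DB still sits on the slow device.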

Stefan

Am 24.04.20 um 13:08 schrieb Igor Fedotov:
> Hi Stefan,
> 
> that's not 100% pure experiment. Fresh OSD might be faster by itself.
> E.g. due to lack of space fragmentation and/or empty lookup tables.
> 
> You might want to recreate OSD.0 without DB and attach DB manually. Then
> benchmark resulting OSD.
> 
> Different experiment if you have another slow OSD with recently added DB
> would be to:
> 
> Compare benchmark results for both bitmap and stupid allocators for this
> specific OSD. I.e. benchmark it as-is then change
> bluestore_allocator/bluefs_allocator to stupid and benchmark again.
> 
> 
> And just in case - I presume all the benchmark results are persistent,
> i.e. you can see the same results for multiple runs.
> 
> 
> Thanks,
> 
> Igor
> 
> 
> 
> On 4/24/2020 12:32 PM, Stefan Priebe - Profihost AG wrote:
>> Hi Igor,
>>
>> there must be a difference. I purged osd.0 and recreated it.
>>
>> Now it gives:
>> ceph tell osd.0 bench
>> {
>>     "bytes_written": 1073741824,
>>     "blocksize": 4194304,
>>     "elapsed_sec": 8.155473563993,
>>     "bytes_per_sec": 131659040.46819863,
>>     "iops": 31.389961354303033
>> }
>>
>> What's wrong wiht adding a block.db device later?
>>
>> Stefan
>>
>> Am 23.04.20 um 20:34 schrieb Stefan Priebe - Profihost AG:
>>> Hi,
>>>
>>> if the OSDs are idle the difference is even more worse:
>>>
>>> # ceph tell osd.0 bench
>>> {
>>>  "bytes_written": 1073741824,
>>>  "blocksize": 4194304,
>>>  "elapsed_sec": 15.39670787501,
>>>  "bytes_per_sec": 69738403.346825853,
>>>  "iops": 16.626931034761871
>>> }
>>>
>>> # ceph tell osd.38 bench
>>> {
>>>  "bytes_written": 1073741824,
>>>  "blocksize": 4194304,
>>>  "elapsed_sec": 6.890398517004,
>>>  "bytes_per_sec": 155831599.77624846,
>>>  "iops": 37.153148597776521
>>> }
>>>
>>> Stefan
>>>
>>> Am 23.04.20 um 14:39 schrieb Stefan Priebe - Profihost AG:
 Hi,
 Am 23.04.20 um 14:06 schrieb Igor Fedotov:
> I don't recall any additional tuning to be applied to new DB
> volume. And assume the hardware is pretty the same...
>
> Do you still have any significant amount of data spilled over for
> these updated OSDs? If not I don't have any valid explanation for
> the phenomena.

 just the 64k from here:
 https://tracker.ceph.com/issues/44509

> You might want to try "ceph osd bench" to compare OSDs under pretty
> the same load. Any difference observed

 Servers are the same HW. OSD Bench is:
 # ceph tell osd.0 bench
 {
  "bytes_written": 1073741824,
  "blocksize": 4194304,
  "elapsed_sec": 16.09141478101,
  "bytes_per_sec": 66727620.822242722,
  "iops": 15.909104543266945
 }

 # ceph tell osd.36 bench
 {
  "bytes_written": 1073741824,
  "blocksize": 4194304,
  "elapsed_sec": 10.023828538,
  "bytes_per_sec": 107118933.6419194,
  "iops": 25.539143953780986
 }


 OSD 0 is a Toshiba MG07SCA12TA SAS 12G
 OSD 36 is a Seagate ST12000NM0008-2H SATA 6G

 SSDs are all the same like the rest of the HW. But both drives
 should give the same performance from their specs. The only other
 difference is that OSD 36 was directly created with the block.db
 device (Nautilus 14.2.7) and OSD 0 (14.2.8) does not.

 Stefan

>
> On 4/23/2020 8:35 AM, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> is there anything else needed beside running:
>> ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD}
>> bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1
>>
>> I did so some weeks ago and currently i'm seeing that all osds
>> originally deployed with --block-db show 10-20% I/O waits while
>> all those got converted using ceph-bluestore-tool show 80-100% I/O
>> waits.
>>
>> Also is there some tuning available to use more of the SSD? The
>> SSD (block-db) is only saturated at 0-2%.
>>
>> Greets,
>> Stefan
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Newbie question - how do I add more monitors?

2020-04-24 Thread jhamster
The easiest way I have found to add monitors is to add them to the initial 
monitors list, which will create them all at the same time.
After running "ceph-deploy new {primary-node-name}", you will need to edit the 
ceph.conf file to add the other additional monitors. Edit the lines 
“mon_initial_members” and “mon_host” to contain the hostnames/IPs of all the 
nodes that will be hosting monitors:
ex: mon_initial_members = icephnode01, icephnode04
ex: mon_host = node01.IP, node04.IP

Then after installing ceph on each of the nodes with ceph-deploy you need only 
run
"ceph-deploy mon create-initial"

This will deploy the monitor daemon to all hosts in the mon_initial_members 
group and they will then be able to communicate with each other.
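
After that you can verify that the monitors actually formed quorum, roughly like this:

# ceph-deploy mon create-initial
# ceph quorum_status --format json-pretty
# ceph mon stat

quorum_status and mon stat should list every monitor you put into mon_initial_members.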
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Not able to start object gateway

2020-04-24 Thread Sailaja Yedugundla
I am trying to set up a single-node cluster using cephadm. I was able to start 
the cluster with 1 monitor and 3 OSDs. When I try to create users for the RADOS 
gateway using the command radosgw-admin user create, the command hangs. Here is 
my cluster status.
 cluster:
id: 5a03d7e2-85e4-11ea-bba9-021f94750a41
health: HEALTH_WARN
Reduced data availability: 44 pgs inactive
Degraded data redundancy: 6/15 objects degraded (40.000%), 2 pgs 
degraded, 80 pgs undersized

  services:
mon: 1 daemons, quorum ip-172-31-9-253.ec2.internal (age 9h)
mgr: ip-172-31-9-253.ec2.internal.umudgc(active, since 9h)
osd: 3 osds: 3 up (since 74m), 3 in (since 74m); 41 remapped pgs

  data:
pools:   4 pools, 97 pgs
objects: 5 objects, 1.2 KiB
usage:   3.0 GiB used, 27 GiB / 30 GiB avail
pgs: 45.361% pgs not active
 6/15 objects degraded (40.000%)
 2/15 objects misplaced (13.333%)
 44 undersized+peered
 24 active+undersized+remapped
 17 active+clean
 10 active+undersized
 2  active+undersized+degraded

  progress:
Rebalancing after osd.1 marked in (74m)
  [=...] (remaining: 2h)
Rebalancing after osd.2 marked in (74m)
  []

Can someone help me resolve this issue?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: adding block.db to OSD

2020-04-24 Thread Stefan Priebe - Profihost AG
Hi Igor,

> On 24.04.2020 at 13:09, Igor Fedotov wrote:
> 
> that's not 100% pure experiment. Fresh OSD might be faster by itself. E.g. 
> due to lack of space fragmentation and/or empty lookup tables.

Also, the migrated ones were just 3 weeks old, with a usage of 5%.

> You might want to recreate OSD.0 without DB and attach DB manually. Then 
> benchmark resulting OSD.

I have 24 OSDs where I migrated the DB and 8 which were initially created with a DB. 
All of them show the same symptoms.

> Different experiment if you have another slow OSD with recently added DB 
> would be to:
Yes, see above, 24 of them ;-)

> Compare benchmark results for both bitmap and stupid allocators for this 
> specific OSD. I.e. benchmark it as-is then change 
> bluestore_allocator/bluefs_allocator to stupid and benchmark again.

I can try in a few hours.

> And just in case - I presume all the benchmark results are persistent, i.e. 
> you can see the same results for multiple runs.

Yes I did 10 runs for each posted benchmark.

Thanks,
Stefan

> 
> 
> Thanks,
> 
> Igor
> 
> 
> 
>> On 4/24/2020 12:32 PM, Stefan Priebe - Profihost AG wrote:
>> Hi Igor,
>> 
>> there must be a difference. I purged osd.0 and recreated it.
>> 
>> Now it gives:
>> ceph tell osd.0 bench
>> {
>> "bytes_written": 1073741824,
>> "blocksize": 4194304,
>> "elapsed_sec": 8.155473563993,
>> "bytes_per_sec": 131659040.46819863,
>> "iops": 31.389961354303033
>> }
>> 
>> What's wrong wiht adding a block.db device later?
>> 
>> Stefan
>> 
>>> Am 23.04.20 um 20:34 schrieb Stefan Priebe - Profihost AG:
>>> Hi,
>>> 
>>> if the OSDs are idle the difference is even more worse:
>>> 
>>> # ceph tell osd.0 bench
>>> {
>>>  "bytes_written": 1073741824,
>>>  "blocksize": 4194304,
>>>  "elapsed_sec": 15.39670787501,
>>>  "bytes_per_sec": 69738403.346825853,
>>>  "iops": 16.626931034761871
>>> }
>>> 
>>> # ceph tell osd.38 bench
>>> {
>>>  "bytes_written": 1073741824,
>>>  "blocksize": 4194304,
>>>  "elapsed_sec": 6.890398517004,
>>>  "bytes_per_sec": 155831599.77624846,
>>>  "iops": 37.153148597776521
>>> }
>>> 
>>> Stefan
>>> 
>>> Am 23.04.20 um 14:39 schrieb Stefan Priebe - Profihost AG:
 Hi,
 Am 23.04.20 um 14:06 schrieb Igor Fedotov:
> I don't recall any additional tuning to be applied to new DB volume. And 
> assume the hardware is pretty the same...
> 
> Do you still have any significant amount of data spilled over for these 
> updated OSDs? If not I don't have any valid explanation for the phenomena.
 
 just the 64k from here:
 https://tracker.ceph.com/issues/44509
 
> You might want to try "ceph osd bench" to compare OSDs under pretty the 
> same load. Any difference observed
 
 Servers are the same HW. OSD Bench is:
 # ceph tell osd.0 bench
 {
  "bytes_written": 1073741824,
  "blocksize": 4194304,
  "elapsed_sec": 16.09141478101,
  "bytes_per_sec": 66727620.822242722,
  "iops": 15.909104543266945
 }
 
 # ceph tell osd.36 bench
 {
  "bytes_written": 1073741824,
  "blocksize": 4194304,
  "elapsed_sec": 10.023828538,
  "bytes_per_sec": 107118933.6419194,
  "iops": 25.539143953780986
 }
 
 
 OSD 0 is a Toshiba MG07SCA12TA SAS 12G
 OSD 36 is a Seagate ST12000NM0008-2H SATA 6G
 
 SSDs are all the same like the rest of the HW. But both drives should give 
 the same performance from their specs. The only other difference is that 
 OSD 36 was directly created with the block.db device (Nautilus 14.2.7) and 
 OSD 0 (14.2.8) does not.
 
 Stefan
 
> 
> On 4/23/2020 8:35 AM, Stefan Priebe - Profihost AG wrote:
>> Hello,
>> 
>> is there anything else needed beside running:
>> ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} 
>> bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1
>> 
>> I did so some weeks ago and currently i'm seeing that all osds 
>> originally deployed with --block-db show 10-20% I/O waits while all 
>> those got converted using ceph-bluestore-tool show 80-100% I/O waits.
>> 
>> Also is there some tuning available to use more of the SSD? The SSD 
>> (block-db) is only saturated at 0-2%.
>> 
>> Greets,
>> Stefan
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: adding block.db to OSD

2020-04-24 Thread Stefan Priebe - Profihost AG
No, not a standalone WAL. I wanted to ask whether bluefs-bdev-new-db migrates both 
the DB and the WAL from HDD to SSD.

Stefan

> On 24.04.2020 at 13:01, Igor Fedotov wrote:
> 
> 
> Unless you have 3 different types of disks beyond OSD (e.g. HDD, SSD, NVMe) 
> standalone WAL makes no sense.
> 
> 
> 
> On 4/24/2020 1:58 PM, Stefan Priebe - Profihost AG wrote:
>> Is Wal device missing? Do I need to run bluefs-bdev-new-db and Wal?
>> 
>> Greets,
>> Stefan
>> 
>>> Am 24.04.2020 um 11:32 schrieb Stefan Priebe - Profihost AG 
>>> :
>>> 
>>> Hi Igor,
>>> 
>>> there must be a difference. I purged osd.0 and recreated it.
>>> 
>>> Now it gives:
>>> ceph tell osd.0 bench
>>> {
>>>"bytes_written": 1073741824,
>>>"blocksize": 4194304,
>>>"elapsed_sec": 8.155473563993,
>>>"bytes_per_sec": 131659040.46819863,
>>>"iops": 31.389961354303033
>>> }
>>> 
>>> What's wrong wiht adding a block.db device later?
>>> 
>>> Stefan
>>> 
>>> Am 23.04.20 um 20:34 schrieb Stefan Priebe - Profihost AG:
 Hi,
 if the OSDs are idle the difference is even more worse:
 # ceph tell osd.0 bench
 {
 "bytes_written": 1073741824,
 "blocksize": 4194304,
 "elapsed_sec": 15.39670787501,
 "bytes_per_sec": 69738403.346825853,
 "iops": 16.626931034761871
 }
 # ceph tell osd.38 bench
 {
 "bytes_written": 1073741824,
 "blocksize": 4194304,
 "elapsed_sec": 6.890398517004,
 "bytes_per_sec": 155831599.77624846,
 "iops": 37.153148597776521
 }
 Stefan
 Am 23.04.20 um 14:39 schrieb Stefan Priebe - Profihost AG:
> Hi,
>> Am 23.04.20 um 14:06 schrieb Igor Fedotov:
>> I don't recall any additional tuning to be applied to new DB volume. And 
>> assume the hardware is pretty the same...
>> 
>> Do you still have any significant amount of data spilled over for these 
>> updated OSDs? If not I don't have any valid explanation for the 
>> phenomena.
> 
> just the 64k from here:
> https://tracker.ceph.com/issues/44509
> 
>> You might want to try "ceph osd bench" to compare OSDs under pretty the 
>> same load. Any difference observed
> 
> Servers are the same HW. OSD Bench is:
> # ceph tell osd.0 bench
> {
>  "bytes_written": 1073741824,
>  "blocksize": 4194304,
>  "elapsed_sec": 16.09141478101,
>  "bytes_per_sec": 66727620.822242722,
>  "iops": 15.909104543266945
> }
> 
> # ceph tell osd.36 bench
> {
>  "bytes_written": 1073741824,
>  "blocksize": 4194304,
>  "elapsed_sec": 10.023828538,
>  "bytes_per_sec": 107118933.6419194,
>  "iops": 25.539143953780986
> }
> 
> 
> OSD 0 is a Toshiba MG07SCA12TA SAS 12G
> OSD 36 is a Seagate ST12000NM0008-2H SATA 6G
> 
> SSDs are all the same like the rest of the HW. But both drives should 
> give the same performance from their specs. The only other difference is 
> that OSD 36 was directly created with the block.db device (Nautilus 
> 14.2.7) and OSD 0 (14.2.8) does not.
> 
> Stefan
> 
>> 
>>> On 4/23/2020 8:35 AM, Stefan Priebe - Profihost AG wrote:
>>> Hello,
>>> 
>>> is there anything else needed beside running:
>>> ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} 
>>> bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1
>>> 
>>> I did so some weeks ago and currently i'm seeing that all osds 
>>> originally deployed with --block-db show 10-20% I/O waits while all 
>>> those got converted using ceph-bluestore-tool show 80-100% I/O waits.
>>> 
>>> Also is there some tuning available to use more of the SSD? The SSD 
>>> (block-db) is only saturated at 0-2%.
>>> 
>>> Greets,
>>> Stefan
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: adding block.db to OSD

2020-04-24 Thread Igor Fedotov

Hi Stefan,

that's not 100% pure experiment. Fresh OSD might be faster by itself. 
E.g. due to lack of space fragmentation and/or empty lookup tables.


You might want to recreate OSD.0 without DB and attach DB manually. Then 
benchmark resulting OSD.


A different experiment, if you have another slow OSD with a recently added DB, 
would be to:


Compare benchmark results for both bitmap and stupid allocators for this 
specific OSD. I.e. benchmark it as-is then change 
bluestore_allocator/bluefs_allocator to stupid and benchmark again.
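
Roughly like this (a sketch; the allocator settings only take effect after an 
OSD restart):

# ceph tell osd.0 bench
# ceph config set osd.0 bluestore_allocator stupid
# ceph config set osd.0 bluefs_allocator stupid
# systemctl restart ceph-osd@0
# ceph tell osd.0 bench

Then set both options back to bitmap and restart once more to return to the default.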



And just in case - I presume all the benchmark results are persistent, 
i.e. you can see the same results for multiple runs.



Thanks,

Igor



On 4/24/2020 12:32 PM, Stefan Priebe - Profihost AG wrote:

Hi Igor,

there must be a difference. I purged osd.0 and recreated it.

Now it gives:
ceph tell osd.0 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 8.155473563993,
    "bytes_per_sec": 131659040.46819863,
    "iops": 31.389961354303033
}

What's wrong wiht adding a block.db device later?

Stefan

Am 23.04.20 um 20:34 schrieb Stefan Priebe - Profihost AG:

Hi,

if the OSDs are idle the difference is even more worse:

# ceph tell osd.0 bench
{
 "bytes_written": 1073741824,
 "blocksize": 4194304,
 "elapsed_sec": 15.39670787501,
 "bytes_per_sec": 69738403.346825853,
 "iops": 16.626931034761871
}

# ceph tell osd.38 bench
{
 "bytes_written": 1073741824,
 "blocksize": 4194304,
 "elapsed_sec": 6.890398517004,
 "bytes_per_sec": 155831599.77624846,
 "iops": 37.153148597776521
}

Stefan

Am 23.04.20 um 14:39 schrieb Stefan Priebe - Profihost AG:

Hi,
Am 23.04.20 um 14:06 schrieb Igor Fedotov:
I don't recall any additional tuning to be applied to new DB 
volume. And assume the hardware is pretty the same...


Do you still have any significant amount of data spilled over for 
these updated OSDs? If not I don't have any valid explanation for 
the phenomena.


just the 64k from here:
https://tracker.ceph.com/issues/44509

You might want to try "ceph osd bench" to compare OSDs under pretty 
the same load. Any difference observed


Servers are the same HW. OSD Bench is:
# ceph tell osd.0 bench
{
 "bytes_written": 1073741824,
 "blocksize": 4194304,
 "elapsed_sec": 16.09141478101,
 "bytes_per_sec": 66727620.822242722,
 "iops": 15.909104543266945
}

# ceph tell osd.36 bench
{
 "bytes_written": 1073741824,
 "blocksize": 4194304,
 "elapsed_sec": 10.023828538,
 "bytes_per_sec": 107118933.6419194,
 "iops": 25.539143953780986
}


OSD 0 is a Toshiba MG07SCA12TA SAS 12G
OSD 36 is a Seagate ST12000NM0008-2H SATA 6G

SSDs are all the same like the rest of the HW. But both drives 
should give the same performance from their specs. The only other 
difference is that OSD 36 was directly created with the block.db 
device (Nautilus 14.2.7) and OSD 0 (14.2.8) does not.


Stefan



On 4/23/2020 8:35 AM, Stefan Priebe - Profihost AG wrote:

Hello,

is there anything else needed beside running:
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} 
bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1


I did so some weeks ago and currently i'm seeing that all osds 
originally deployed with --block-db show 10-20% I/O waits while 
all those got converted using ceph-bluestore-tool show 80-100% I/O 
waits.


Also is there some tuning available to use more of the SSD? The 
SSD (block-db) is only saturated at 0-2%.


Greets,
Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: adding block.db to OSD

2020-04-24 Thread Igor Fedotov
Unless you have 3 different types of disks behind the OSD (e.g. HDD, SSD, 
NVMe), a standalone WAL makes no sense.



On 4/24/2020 1:58 PM, Stefan Priebe - Profihost AG wrote:

Is the WAL device missing? Do I need to run bluefs-bdev-new-db and WAL?

Greets,
Stefan

On 24.04.2020 at 11:32, Stefan Priebe - Profihost AG wrote:


Hi Igor,

there must be a difference. I purged osd.0 and recreated it.

Now it gives:
ceph tell osd.0 bench
{
   "bytes_written": 1073741824,
   "blocksize": 4194304,
   "elapsed_sec": 8.155473563993,
   "bytes_per_sec": 131659040.46819863,
   "iops": 31.389961354303033
}

What's wrong wiht adding a block.db device later?

Stefan

Am 23.04.20 um 20:34 schrieb Stefan Priebe - Profihost AG:

Hi,
if the OSDs are idle the difference is even more worse:
# ceph tell osd.0 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 15.39670787501,
    "bytes_per_sec": 69738403.346825853,
    "iops": 16.626931034761871
}
# ceph tell osd.38 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 6.890398517004,
    "bytes_per_sec": 155831599.77624846,
    "iops": 37.153148597776521
}
Stefan
Am 23.04.20 um 14:39 schrieb Stefan Priebe - Profihost AG:

Hi,
Am 23.04.20 um 14:06 schrieb Igor Fedotov:
I don't recall any additional tuning to be applied to new DB 
volume. And assume the hardware is pretty the same...


Do you still have any significant amount of data spilled over for 
these updated OSDs? If not I don't have any valid explanation for 
the phenomena.


just the 64k from here:
https://tracker.ceph.com/issues/44509

You might want to try "ceph osd bench" to compare OSDs under 
pretty the same load. Any difference observed


Servers are the same HW. OSD Bench is:
# ceph tell osd.0 bench
{
 "bytes_written": 1073741824,
 "blocksize": 4194304,
 "elapsed_sec": 16.09141478101,
 "bytes_per_sec": 66727620.822242722,
 "iops": 15.909104543266945
}

# ceph tell osd.36 bench
{
 "bytes_written": 1073741824,
 "blocksize": 4194304,
 "elapsed_sec": 10.023828538,
 "bytes_per_sec": 107118933.6419194,
 "iops": 25.539143953780986
}


OSD 0 is a Toshiba MG07SCA12TA SAS 12G
OSD 36 is a Seagate ST12000NM0008-2H SATA 6G

SSDs are all the same like the rest of the HW. But both drives 
should give the same performance from their specs. The only other 
difference is that OSD 36 was directly created with the block.db 
device (Nautilus 14.2.7) and OSD 0 (14.2.8) does not.


Stefan



On 4/23/2020 8:35 AM, Stefan Priebe - Profihost AG wrote:

Hello,

is there anything else needed beside running:
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} 
bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1


I did so some weeks ago and currently i'm seeing that all osds 
originally deployed with --block-db show 10-20% I/O waits while 
all those got converted using ceph-bluestore-tool show 80-100% 
I/O waits.


Also is there some tuning available to use more of the SSD? The 
SSD (block-db) is only saturated at 0-2%.


Greets,
Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: adding block.db to OSD

2020-04-24 Thread Stefan Priebe - Profihost AG
Is the WAL device missing? Do I need to run bluefs-bdev-new-db and WAL?

Greets,
Stefan

> On 24.04.2020 at 11:32, Stefan Priebe - Profihost AG wrote:
> 
> Hi Igor,
> 
> there must be a difference. I purged osd.0 and recreated it.
> 
> Now it gives:
> ceph tell osd.0 bench
> {
>"bytes_written": 1073741824,
>"blocksize": 4194304,
>"elapsed_sec": 8.155473563993,
>"bytes_per_sec": 131659040.46819863,
>"iops": 31.389961354303033
> }
> 
> What's wrong wiht adding a block.db device later?
> 
> Stefan
> 
>> Am 23.04.20 um 20:34 schrieb Stefan Priebe - Profihost AG:
>> Hi,
>> if the OSDs are idle the difference is even more worse:
>> # ceph tell osd.0 bench
>> {
>> "bytes_written": 1073741824,
>> "blocksize": 4194304,
>> "elapsed_sec": 15.39670787501,
>> "bytes_per_sec": 69738403.346825853,
>> "iops": 16.626931034761871
>> }
>> # ceph tell osd.38 bench
>> {
>> "bytes_written": 1073741824,
>> "blocksize": 4194304,
>> "elapsed_sec": 6.890398517004,
>> "bytes_per_sec": 155831599.77624846,
>> "iops": 37.153148597776521
>> }
>> Stefan
>>> Am 23.04.20 um 14:39 schrieb Stefan Priebe - Profihost AG:
>>> Hi,
>>> Am 23.04.20 um 14:06 schrieb Igor Fedotov:
 I don't recall any additional tuning to be applied to new DB volume. And 
 assume the hardware is pretty the same...
 
 Do you still have any significant amount of data spilled over for these 
 updated OSDs? If not I don't have any valid explanation for the phenomena.
>>> 
>>> just the 64k from here:
>>> https://tracker.ceph.com/issues/44509
>>> 
 You might want to try "ceph osd bench" to compare OSDs under pretty the 
 same load. Any difference observed
>>> 
>>> Servers are the same HW. OSD Bench is:
>>> # ceph tell osd.0 bench
>>> {
>>>  "bytes_written": 1073741824,
>>>  "blocksize": 4194304,
>>>  "elapsed_sec": 16.09141478101,
>>>  "bytes_per_sec": 66727620.822242722,
>>>  "iops": 15.909104543266945
>>> }
>>> 
>>> # ceph tell osd.36 bench
>>> {
>>>  "bytes_written": 1073741824,
>>>  "blocksize": 4194304,
>>>  "elapsed_sec": 10.023828538,
>>>  "bytes_per_sec": 107118933.6419194,
>>>  "iops": 25.539143953780986
>>> }
>>> 
>>> 
>>> OSD 0 is a Toshiba MG07SCA12TA SAS 12G
>>> OSD 36 is a Seagate ST12000NM0008-2H SATA 6G
>>> 
>>> SSDs are all the same like the rest of the HW. But both drives should give 
>>> the same performance from their specs. The only other difference is that 
>>> OSD 36 was directly created with the block.db device (Nautilus 14.2.7) and 
>>> OSD 0 (14.2.8) does not.
>>> 
>>> Stefan
>>> 
 
 On 4/23/2020 8:35 AM, Stefan Priebe - Profihost AG wrote:
> Hello,
> 
> is there anything else needed beside running:
> ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} 
> bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1
> 
> I did so some weeks ago and currently i'm seeing that all osds originally 
> deployed with --block-db show 10-20% I/O waits while all those got 
> converted using ceph-bluestore-tool show 80-100% I/O waits.
> 
> Also is there some tuning available to use more of the SSD? The SSD 
> (block-db) is only saturated at 0-2%.
> 
> Greets,
> Stefan
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Newbie question - how do I add more monitors?

2020-04-24 Thread Mason-Williams, Gabryel (DLSLtd,RAL,LSCI)
Hello Chris,

When using ceph-deploy, I use `ceph-deploy mon create $HOSTNAME`, which should 
then add that host to the cluster as a monitor.

Here is a link to the documentation: 
https://docs.ceph.com/docs/master/rados/deployment/ceph-deploy-mon/

Hope that helps

Kind regards


Gabryel Mason-Williams, Placement Student


Address: Diamond Light Source Ltd., Diamond House, Harwell Science & Innovation 
Campus, Didcot, Oxfordshire OX11 0DE


Email: gabryel.mason-willi...@diamond.ac.uk


From: c.stodd...@sheffield.ac.uk 
Sent: 24 April 2020 11:01
To: ceph-users@ceph.io 
Subject: [ceph-users] Newbie question - how do I add more monitors?

version 14.2.9 nautilus

I have just set up my first cluster with one admin/monitor host and six OSDs. 
Everything appears to be working fine. I wanted to add a couple more monitors 
on the OSDs, and following the quick ceph deploy documentation I did:

ceph-deploy mon add icephnode01
ceph-deploy mon add icephnode04

This installs and starts the monitor daemons on the two nodes, but they are 
stuck at probing. They have the right networking and firewall exemptions as far 
as I can tell, so I don't think it's that. If I get a copy of the monmap with 
'ceph mon getmap' and use monmaptool  to look at it I can only see that one 
host. Likewise ceph.conf only contains the one initial host.

Should I have done something before the ceph-deploy mon add to modify the 
monmap? Or am I way off base?

Thanks for your patience with a beginner question!

Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
This e-mail and any attachments may contain confidential, copyright and or 
privileged material, and are for the use of the intended addressee only. If you 
are not the intended addressee or an authorised recipient of the addressee 
please notify us of receipt by returning the e-mail and do not use, copy, 
retain, distribute or disclose the information in or attached to the e-mail.
Any opinions expressed within this e-mail are those of the individual and not 
necessarily of Diamond Light Source Ltd. 
Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments 
are free from viruses and we cannot accept liability for any damage which you 
may sustain as a result of software viruses which may be transmitted in or with 
the message.
Diamond Light Source Limited (company no. 4375679). Registered in England and 
Wales with its registered office at Diamond House, Harwell Science and 
Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Newbie question - how do I add more monitors?

2020-04-24 Thread c . stoddart
version 14.2.9 nautilus

I have just set up my first cluster with one admin/monitor host and six OSDs. 
Everything appears to be working fine. I wanted to add a couple more monitors 
on the OSDs, and following the quick ceph deploy documentation I did:

ceph-deploy mon add icephnode01 
ceph-deploy mon add icephnode04

This installs and starts the monitor daemons on the two nodes, but they are 
stuck at probing. They have the right networking and firewall exemptions as far 
as I can tell, so I don't think it's that. If I get a copy of the monmap with 
'ceph mon getmap' and use monmaptool to look at it, I can only see that one 
host. Likewise, ceph.conf only contains the one initial host.
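
In case it helps, this is roughly what I ran (paths are just examples):

# ceph mon getmap -o /tmp/monmap
# monmaptool --print /tmp/monmap

which only lists the initial monitor. Is something like 
'monmaptool --add icephnode01 <mon-ip>:6789 /tmp/monmap' supposed to happen 
first, or does ceph-deploy handle that?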

Should I have done something before the ceph-deploy mon add to modify the 
monmap? Or am I way off base?

Thanks for your patience with a beginner question!

Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: adding block.db to OSD

2020-04-24 Thread Stefan Priebe - Profihost AG

Hi Igor,

there must be a difference. I purged osd.0 and recreated it.

Now it gives:
ceph tell osd.0 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 8.155473563993,
"bytes_per_sec": 131659040.46819863,
"iops": 31.389961354303033
}

What's wrong with adding a block.db device later?

Stefan

On 23.04.20 at 20:34, Stefan Priebe - Profihost AG wrote:

Hi,

if the OSDs are idle the difference is even worse:

# ceph tell osd.0 bench
{
     "bytes_written": 1073741824,
     "blocksize": 4194304,
     "elapsed_sec": 15.39670787501,
     "bytes_per_sec": 69738403.346825853,
     "iops": 16.626931034761871
}

# ceph tell osd.38 bench
{
     "bytes_written": 1073741824,
     "blocksize": 4194304,
     "elapsed_sec": 6.890398517004,
     "bytes_per_sec": 155831599.77624846,
     "iops": 37.153148597776521
}

Stefan

Am 23.04.20 um 14:39 schrieb Stefan Priebe - Profihost AG:

Hi,
Am 23.04.20 um 14:06 schrieb Igor Fedotov:
I don't recall any additional tuning to be applied to new DB volume. 
And assume the hardware is pretty the same...


Do you still have any significant amount of data spilled over for 
these updated OSDs? If not I don't have any valid explanation for the 
phenomena.


just the 64k from here:
https://tracker.ceph.com/issues/44509

You might want to try "ceph osd bench" to compare OSDs under pretty 
the same load. Any difference observed?


Servers are the same HW. OSD Bench is:
# ceph tell osd.0 bench
{
 "bytes_written": 1073741824,
 "blocksize": 4194304,
 "elapsed_sec": 16.09141478101,
 "bytes_per_sec": 66727620.822242722,
 "iops": 15.909104543266945
}

# ceph tell osd.36 bench
{
 "bytes_written": 1073741824,
 "blocksize": 4194304,
 "elapsed_sec": 10.023828538,
 "bytes_per_sec": 107118933.6419194,
 "iops": 25.539143953780986
}


OSD 0 is a Toshiba MG07SCA12TA SAS 12G
OSD 36 is a Seagate ST12000NM0008-2H SATA 6G

SSDs are all the same like the rest of the HW. But both drives should 
give the same performance from their specs. The only other difference 
is that OSD 36 was directly created with the block.db device (Nautilus 
14.2.7) and OSD 0 (14.2.8) does not.


Stefan



On 4/23/2020 8:35 AM, Stefan Priebe - Profihost AG wrote:

Hello,

is there anything else needed beside running:
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} 
bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1


I did so some weeks ago and currently i'm seeing that all osds 
originally deployed with --block-db show 10-20% I/O waits while all 
those got converted using ceph-bluestore-tool show 80-100% I/O waits.


Also is there some tuning available to use more of the SSD? The SSD 
(block-db) is only saturated at 0-2%.


Greets,
Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Check if upmap is supported by client?

2020-04-24 Thread Konstantin Shalygin

On 4/13/20 4:52 PM, Frank Schilder wrote:

Is there a way to check if a client supports upmap?


Yes, and it's actually not hard. Example:

# echo 0x27018fb86aa42ada | python detect_upmap.py
Upmap is supported

The Gist: https://gist.github.com/k0ste/96905ebd1c73c5411dd8d03a9c14b0ea
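
The feature mask itself you can take straight from the cluster, for example (a sketch):

# ceph features
# ceph daemon mon.<id> sessions

'ceph features' groups the connected clients by release and feature bitmask; any 
of those masks can be piped into the script above.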



k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: HBase/HDFS on Ceph/CephFS

2020-04-24 Thread Marc Roos
 
I think the idea behind a pool size of 1 is that Hadoop already writes 
copies to 2 other pools(?).

However, that leaves the possibility that PGs of these 3 pools may 
share an OSD, and if that OSD fails, you lose data in these pools. I 
have no idea what the chances are that the same data of different pools 
can end up on the same OSD.
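
You can at least spot-check where things land, something like (pool and object 
names are just examples):

# ceph osd map hadoop1 someobject
# ceph pg ls-by-pool hadoop1

osd map prints the PG and the up/acting OSD set for a given object, so you can 
compare the OSD sets used by the different pools.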


-Original Message-
To: ceph-users@ceph.io
Subject: [ceph-users] HBase/HDFS on Ceph/CephFS

Hi

We have an 3 year old Hadoop cluster - up for refresh - so it is time to 
evaluate options. The "only" usecase is running an HBase installation 
which is important for us and migrating out of HBase would be a hazzle.

Our Ceph usage has expanded and in general - we really like what we see.

Thus - Can this be "sanely" consolidated somehow? I have seen this:
https://docs.ceph.com/docs/jewel/cephfs/hadoop/
But it seem really-really bogus to me.

It recommends that you set:
pool 3 'hadoop1' rep size 1 min_size 1

Which would - if I understand correct - be disastrous. The Hadoop end 
would replicated in 3 across - but within Ceph the replication would be 
1.
The 1 replication in ceph means pulling the OSD node would "gaurantee" 
the pg's to go inactive - which could be ok - but there is nothing 
gauranteeing that the other Hadoop replicas are not served out of the 
same OSD-node/pg? In which case - rebooting an OSD node would bring the 
hadoop cluster unavailable.

Is anyone serving HBase out of Ceph - how does the stadck and 
configuration look? If I went for 3 x replication in both Ceph and HDFS 
then it would definately work, but 9x copies of the dataset is a bit 
more than what looks feasible at the moment.

Thanks for your reflections/input.

Jesper
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: HBase/HDFS on Ceph/CephFS

2020-04-24 Thread Serkan Çoban
You do not want to mix Ceph with Hadoop, because you'll lose data
locality, which is the main point of Hadoop systems.
Every read/write request will go over the network, which is not optimal.

On Fri, Apr 24, 2020 at 9:04 AM  wrote:
>
> Hi
>
> We have an 3 year old Hadoop cluster - up for refresh - so it is time
> to evaluate options. The "only" usecase is running an HBase installation
> which is important for us and migrating out of HBase would be a hazzle.
>
> Our Ceph usage has expanded and in general - we really like what we see.
>
> Thus - Can this be "sanely" consolidated somehow? I have seen this:
> https://docs.ceph.com/docs/jewel/cephfs/hadoop/
> But it seem really-really bogus to me.
>
> It recommends that you set:
> pool 3 'hadoop1' rep size 1 min_size 1
>
> Which would - if I understand correct - be disastrous. The Hadoop end would
> replicated in 3 across - but within Ceph the replication would be 1.
> The 1 replication in ceph means pulling the OSD node would "gaurantee" the
> pg's to go inactive - which could be ok - but there is nothing
> gauranteeing that the other Hadoop replicas are not served out of the same
> OSD-node/pg? In which case - rebooting an OSD node would bring the hadoop
> cluster unavailable.
>
> Is anyone serving HBase out of Ceph - how does the stadck and
> configuration look? If I went for 3 x replication in both Ceph and HDFS
> then it would definately work, but 9x copies of the dataset is a bit more
> than what looks feasible at the moment.
>
> Thanks for your reflections/input.
>
> Jesper
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Repeatedly OSD crashes in PrimaryLogPG::hit_set_trim()

2020-04-24 Thread KOT MATPOCKuH
Hello all!

I'm getting repeated OSD crashes for 3 of my OSDs, with this stack trace:

 ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
 1: (()+0x911e70) [0x564d0067fe70]
 2: (()+0xf5d0) [0x7f1272dad5d0]
 3: (gsignal()+0x37) [0x7f1271dce2c7]
 4: (abort()+0x148) [0x7f1271dcf9b8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x242) [0x7f12762252b2]
 6: (()+0x25a337) [0x7f1276225337]
 7: (PrimaryLogPG::hit_set_trim(std::unique_ptr >&, unsigned int)+0x930)
[0x564d002ab480]
 8: (PrimaryLogPG::hit_set_persist()+0xa0c) [0x564d002afafc]
 9: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x2989)
[0x564d002c5f09]
 10: (PrimaryLogPG::do_request(boost::intrusive_ptr&,
ThreadPool::TPHandle&)+0xc99) [0x564d002cac09]
 11: (OSD::dequeue_op(boost::intrusive_ptr,
boost::intrusive_ptr, ThreadPool::TPHandle&)+0x1b7)
[0x564d00124c87]
 12: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr&,
ThreadPool::TPHandle&)+0x62) [0x564d0039d8c2]
 13: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x592) [0x564d00144ae2]
 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x3d3)
[0x7f127622aec3]
 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f127622bab0]
 16: (()+0x7dd5) [0x7f1272da5dd5]
 17: (clone()+0x6d) [0x7f1271e95f6d]

I think this happened after some network trouble: one PG was marked
recovery_unfound, and to solve that problem this command was executed:
ceph pg 2.f8 mark_unfound_lost revert

Also, the last lines in the debug output before the crash are related to this PG:
-10001> 2020-04-23 14:16:18.790 7f534915a700 10 osd.12 pg_epoch: 10476
pg[2.f8( v 8673'30651498 (6953'30648450,8673'30651498]
local-lis/les=10475/10476 n=14 ec=66/66 lis/c 10475/10427 les/c/f
10476/10428/201 10475/10475/10473) [12,13,17] r=0 lpr=10475
pi=[10427,10475)/2 crt=8673'30651498 lcod 0'0 mlcod 0'0 active mbc={}]
get_object_context: obc NOT found in cache:
2:1f00:.ceph-internal::hit_set_2.f8_archive_2020-04-22
02%3a57%3a10.496532Z_2020-04-22 03%3a57%3a11.211949Z:head
-10001> 2020-04-23 14:16:18.790 7f534915a700 10 osd.12 pg_epoch: 10476
pg[2.f8( v 8673'30651498 (6953'30648450,8673'30651498]
local-lis/les=10475/10476 n=14 ec=66/66 lis/c 10475/10427 les/c/f
10476/10428/201 10475/10475/10473) [12,13,17] r=0 lpr=10475
pi=[10427,10475)/2 crt=8673'30651498 lcod 0'0 mlcod 0'0 active mbc={}]
get_object_context: no obc for soid
2:1f00:.ceph-internal::hit_set_2.f8_archive_2020-04-22
02%3a57%3a10.496532Z_2020-04-22 03%3a57%3a11.211949Z:head and !can_create

I've evicted most of the PGs from the cache pool, and currently we have only two
rados objects in this PG:
# rados --pgid 2.f8 ls
rbd_data.10d4416b8b4567.1dfb
rbd_header.1db946b8b4567

I tried to remove one of them:
rados -p vms-cache rm rbd_header.1db946b8b4567

But the command has not completed for the last ~8 hours, because of the repeated OSD crashes.

Do we have any way to solve this problem without data loss?

-- 
MATPOCKuH
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: active+remapped+backfilling keeps going .. and going

2020-04-24 Thread Lomayani S. Laizer
I had a similar problem when I upgraded to Octopus, and the solution was to
turn off autobalancing.

You can try turning it off if it is enabled:

ceph balancer off
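
To check first and then disable it, roughly:

# ceph balancer status
# ceph balancer eval
# ceph balancer off

status shows the mode and whether it is active; eval prints the score of the 
current distribution that the balancer is working from.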



On Fri, Apr 24, 2020 at 8:51 AM Eugen Block  wrote:

> Hi,
> the balancer is probably running, which mode? I changed the mode to
> none in our own cluster because it also never finished rebalancing and
> we didn’t have a bad pg distribution. Maybe it’s supposed to be like
> that, I don’t know.
>
> Regards
> Eugen
>
>
> Quoting "Kyriazis, George":
>
> > Hello,
> >
> > I have a Proxmox ceph cluster with 5 nodes and 3 OSDs each (total 15
> > OSDs), on a 10G network.
> >
> > The cluster started small, and I’ve progressively added OSDs over
> > time.  Problem is…. The cluster never rebalances completely.  There
> > is always progress on backfilling, but PGs that used to be in
> > active+clean state jump back into the active+remapped+backfilling
> > (or active+remapped+backfill_wait) state, to be moved to different
> > OSDs.
> >
> > Initially I had a 1G network (recently upgraded to 10G), and I was
> > holding back on the backfill settings (osd_max_backfills and
> > osd_recovery_sleep_hdd).  I just recently (last few weeks) upgraded
> > to 10G, with osd_max_backfills = 50 and osd_recovery_sleep_hdd = 0
> > (only HDDs, no SSDs).  Cluster has been backfilling for months now
> > with no end in sight.
> >
> > Is this normal behavior?  Is there any setting that I can look at
> > that will give me an idea as to why PGs are jumping back into
> > remapped from clean?
> >
> > Below is output of “ceph osd tree” and “ceph osd df”:
> >
> > # ceph osd tree
> > ID  CLASS WEIGHTTYPE NAME   STATUS REWEIGHT PRI-AFF
> >  -1   203.72472 root default
> >  -940.01666 host vis-hsw-01
> >   3   hdd  10.91309 osd.3   up  1.0 1.0
> >   6   hdd  14.55179 osd.6   up  1.0 1.0
> >  10   hdd  14.55179 osd.10  up  1.0 1.0
> > -1340.01666 host vis-hsw-02
> >   0   hdd  10.91309 osd.0   up  1.0 1.0
> >   7   hdd  14.55179 osd.7   up  1.0 1.0
> >  11   hdd  14.55179 osd.11  up  1.0 1.0
> > -1140.01666 host vis-hsw-03
> >   4   hdd  10.91309 osd.4   up  1.0 1.0
> >   8   hdd  14.55179 osd.8   up  1.0 1.0
> >  12   hdd  14.55179 osd.12  up  1.0 1.0
> >  -340.01666 host vis-hsw-04
> >   5   hdd  10.91309 osd.5   up  1.0 1.0
> >   9   hdd  14.55179 osd.9   up  1.0 1.0
> >  13   hdd  14.55179 osd.13  up  1.0 1.0
> > -1543.65807 host vis-hsw-05
> >   1   hdd  14.55269 osd.1   up  1.0 1.0
> >   2   hdd  14.55269 osd.2   up  1.0 1.0
> >  14   hdd  14.55269 osd.14  up  1.0 1.0
> >  -5   0 host vis-ivb-07
> >  -7   0 host vis-ivb-10
> > #
> >
> > # ceph osd df
> > ID CLASS WEIGHT   REWEIGHT SIZERAW USE DATAOMAPMETA
> > AVAIL   %USE  VAR  PGS STATUS
> >  3   hdd 10.91309  1.0  11 TiB 8.2 TiB 8.2 TiB 552 MiB  25 GiB
> > 2.7 TiB 75.08 1.19 131 up
> >  6   hdd 14.55179  1.0  15 TiB 9.1 TiB 9.1 TiB 1.2 GiB  30 GiB
> > 5.5 TiB 62.47 0.99 148 up
> > 10   hdd 14.55179  1.0  15 TiB 8.1 TiB 8.1 TiB 1.5 GiB  20 GiB
> > 6.4 TiB 55.98 0.89 142 up
> >  0   hdd 10.91309  1.0  11 TiB 7.5 TiB 7.4 TiB 504 MiB  24 GiB
> > 3.5 TiB 68.34 1.09 120 up
> >  7   hdd 14.55179  1.0  15 TiB 8.7 TiB 8.7 TiB 1.0 GiB  31 GiB
> > 5.8 TiB 60.07 0.95 144 up
> > 11   hdd 14.55179  1.0  15 TiB 9.4 TiB 9.3 TiB 819 MiB  20 GiB
> > 5.2 TiB 64.31 1.02 147 up
> >  4   hdd 10.91309  1.0  11 TiB 7.0 TiB 7.0 TiB 284 MiB  25 GiB
> > 3.9 TiB 64.35 1.02 112 up
> >  8   hdd 14.55179  1.0  15 TiB 9.3 TiB 9.2 TiB 1.8 GiB  29 GiB
> > 5.3 TiB 63.65 1.01 157 up
> > 12   hdd 14.55179  1.0  15 TiB 8.6 TiB 8.6 TiB 623 MiB  19 GiB
> > 5.9 TiB 59.14 0.94 136 up
> >  5   hdd 10.91309  1.0  11 TiB 8.6 TiB 8.6 TiB 542 MiB  29 GiB
> > 2.3 TiB 79.01 1.26 134 up
> >  9   hdd 14.55179  1.0  15 TiB 8.2 TiB 8.2 TiB 707 MiB  27 GiB
> > 6.3 TiB 56.56 0.90 138 up
> > 13   hdd 14.55179  1.0  15 TiB 8.7 TiB 8.7 TiB 741 MiB  18 GiB
> > 5.8 TiB 59.85 0.95 134 up
> >  1   hdd 14.55269  1.0  15 TiB 9.8 TiB 9.8 TiB 1.3 GiB  20 GiB
> > 4.8 TiB 67.18 1.07 158 up
> >  2   hdd 14.55269  1.0  15 TiB 8.7 TiB 8.7 TiB 936 MiB  18 GiB
> > 5.8 TiB 60.04 0.95 148 up
> > 14   hdd 14.55269  1.0  15 TiB 8.3 TiB 8.3 TiB 673 MiB  18 GiB
> > 6.3 TiB 56.97 0.90 131 up
> >  TOTAL 204 TiB 128 TiB 128 TiB  13 GiB 350 GiB
> > 75 TiB 62.95
> > MIN/MAX VAR: 0.89/1.26  STDDEV: 6.44
> > #
> >
> >
> > Thank you!
> >
> > George
> >
> > ___
> > ceph-users 

[ceph-users] HBase/HDFS on Ceph/CephFS

2020-04-24 Thread jesper
Hi

We have a 3-year-old Hadoop cluster - up for refresh - so it is time
to evaluate options. The "only" use case is running an HBase installation
which is important for us, and migrating out of HBase would be a hassle.

Our Ceph usage has expanded and in general - we really like what we see.

Thus - can this be "sanely" consolidated somehow? I have seen this:
https://docs.ceph.com/docs/jewel/cephfs/hadoop/
But it seems really, really bogus to me.

It recommends that you set:
pool 3 'hadoop1' rep size 1 min_size 1

Which would - if I understand correctly - be disastrous. The Hadoop end would
replicate 3 times across nodes - but within Ceph the replication would be 1.
The 1x replication in Ceph means pulling an OSD node would "guarantee" the
PGs go inactive - which could be OK - but there is nothing
guaranteeing that the other Hadoop replicas are not served out of the same
OSD node/PG. In which case, rebooting an OSD node would make the Hadoop
cluster unavailable.
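
What I would have expected instead is a normal replicated pool, something like 
(just a sketch, the PG count is only an example):

# ceph osd pool create hadoop1 128 128 replicated
# ceph osd pool set hadoop1 size 3
# ceph osd pool set hadoop1 min_size 2

with a CRUSH rule whose failure domain is host, so the Ceph copies never share 
an OSD node.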

Is anyone serving HBase out of Ceph - how does the stack and
configuration look? If I went for 3x replication in both Ceph and HDFS
then it would definitely work, but 9x copies of the dataset is a bit more
than what looks feasible at the moment.

Thanks for your reflections/input.

Jesper
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io