[ceph-users] issue with monitors

2020-08-27 Thread techno10
I'm running the following on Fedora 32, installed from the built-in repos:

[root@node1 ~]# ceph --version
ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)

I'm running into a simple issue that's rather frustrating. Here is a set of
commands I'm running and their output:
[root@node1 ~]# ceph orch daemon add mon node3:[v2:172.16.0.47:3000,v1:172.16.0.47:6789]
Deployed mon.node3 on host 'node3'

[root@node1 ~]# ceph mon dump
dumped monmap epoch 4
epoch 4
fsid c43406b4-e8c2-11ea-b934-001b21d6d88c
last_changed 2020-08-28T00:13:27.040659+
created 2020-08-28T00:10:04.265004+
min_mon_release 15 (octopus)
0: [v2:172.16.0.45:3300/0,v1:172.16.0.45:6789/0] mon.node1
1: [v2:172.16.0.46:3300/0,v1:172.16.0.46:6789/0] mon.node2
It said it added the mon, but it fails to show up in the monmap. I attempt to re-add it:
[root@node1 ~]# ceph orch daemon add mon node3:[v2:172.16.0.47:3000,v1:172.16.0.47:6789]
Error ENOENT: ('name %s already in use', 'node3')
But again, it's not there. Checking with a different command:
[root@node1 ~]# ceph mon stat
e4: 2 mons at {node1=[v2:172.16.0.45:3300/0,v1:172.16.0.45:6789/0],node2=[v2:172.16.0.46:3300/0,v1:172.16.0.46:6789/0]}, election epoch 16, leader 0 node1, quorum 0,1 node1,node2
This is a very basic operation that I think I'm doing correctly, but it seems
like it's not working properly.
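
For reference, a hedged way to compare what the orchestrator thinks it deployed against the monmap, and to clean up the stray daemon before re-adding it (note that the add command above uses v2 port 3000 while the existing mons listen on 3300):

ceph orch ps | grep mon        # does mon.node3 show up as a deployed daemon?
ceph mon dump                  # vs. what the monitors themselves agree on

# If the daemon was deployed but never joined quorum, remove and re-add it:
ceph orch daemon rm mon.node3 --force
ceph orch daemon add mon node3:[v2:172.16.0.47:3300,v1:172.16.0.47:6789]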
Thanks,
Mike


[ceph-users] Re: Is it possible to mount a cephfs within a container?

2020-08-27 Thread steven prothero
Hello,

octopus 15.2.4

Just as a test, I put each of my OSDs inside an LXD container, set up
CephFS, and mounted it inside an LXD container, and it works.


[ceph-users] Fwd: Ceph Upgrade Issue - Luminous to Nautilus (14.2.11 ) using ceph-ansible

2020-08-27 Thread Suresh Rama
Hi All,

We encountered an issue while upgrading our Ceph cluster from Luminous
12.2.12 to Nautilus 14.2.11.   We used
https://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-mimic-or-luminous
and ceph-ansible to upgrade the cluster.  We use HDD for data and NVME for
WAL and DB.

*Cluster Background:*
HP DL360
24 x 3.6T SATA
2x1.6T NVME for Journal
osd_scenario: non-collocated
current version: Luminous  12.2.12  & 12.2.5
type: bluestore

The upgrade went well for the MONs (though I had to overcome the systemd
masking issues).  While testing the OSD upgrade on one OSD node, we
encountered an issue with the OSD daemons failing quickly after startup. After
comparing and checking the block device mappings, everything looks fine.
The node had been up for more than 700 days, so I decided to do a clean
reboot.  After that I noticed the mount points were completely missing, and
ceph-disk is no longer part of Nautilus. I had to manually mount the
partitions after checking the disk partitions and whoami information.  After
manually mounting osd.108, it now throws a permission error which I'm
still reviewing (bdev(0xd1be000 /var/lib/ceph/osd/ceph-108/block) open open
got: (13) Permission denied).  Enclosed the OSD log for full review
- https://pastebin.com/7k0xBfDV.

*Questions*:
What could have gone wrong here, and how can we fix it?
Do we need to migrate the Luminous cluster from ceph-disk to ceph-volume
before attempting the upgrade, or is there another best practice we should follow?
What's the best upgrade method using ceph-ansible to move from Luminous to
Nautilus?  A manual upgrade of ceph-ansible?

We have also started thinking about the Octopus release, which uses containers:
what is the best transition path for the long run?  We don't want to destroy
and rebuild the entire cluster. We could do one node at a time, but that would
be a very lengthy process for 2500+ systems across 13 clusters.  Looking for
help and expert comments on the transition path.

Any help would be greatly appreciated.
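
For what it's worth, a hedged sketch of how the missing ceph-disk mounts and the permission error are often handled after such an upgrade (paths and the OSD id follow the log above; treat it as a starting point and test on a single OSD first):

# ceph-disk is gone in Nautilus; convert its OSDs to ceph-volume "simple" mode
# so they are discovered, mounted and started on boot again.
ceph-volume simple scan /var/lib/ceph/osd/ceph-108
ceph-volume simple activate --all

# "(13) Permission denied" on the block device usually means the device nodes
# are still owned by root; hand them back to the ceph user.
chown -R ceph:ceph /var/lib/ceph/osd/ceph-108
chown ceph:ceph "$(readlink -f /var/lib/ceph/osd/ceph-108/block)"
chown ceph:ceph "$(readlink -f /var/lib/ceph/osd/ceph-108/block.wal)"
chown ceph:ceph "$(readlink -f /var/lib/ceph/osd/ceph-108/block.db)"
systemctl restart ceph-osd@108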

2020-08-27 14:41:01.132 7f0e0ebf2c00  0 bdev(0xb7e2a80
/var/lib/ceph/osd/ceph-108/block.wal) ioctl(F_SET_FILE_RW_HINT) on
/var/lib/ceph/osd/ceph-108/block.wal failed: (22) Invalid argument
2020-08-27 14:41:01.132 7f0e0ebf2c00  1 bdev(0xb7e2a80
/var/lib/ceph/osd/ceph-108/block.wal) open size 1073741824 (0x4000, 1
GiB) block_size 4096 (4 KiB) non-rotational discard supported
2020-08-27 14:41:01.132 7f0e0ebf2c00  1 bluefs add_block_device bdev 0 path
/var/lib/ceph/osd/ceph-108/block.wal size 1 GiB
2020-08-27 14:41:01.132 7f0e0ebf2c00  0  set rocksdb option
compaction_style = kCompactionStyleLevel
2020-08-27 14:41:01.132 7f0e0ebf2c00 -1 rocksdb: Invalid argument: Can't
parse option compaction_threads
2020-08-27 14:41:01.136 7f0e0ebf2c00 -1
/build/ceph-14.2.11/src/os/bluestore/BlueStore.cc: In function 'int
BlueStore::_upgrade_super()' thread 7f0e0ebf2c00 time 2020-08-27
14:41:01.135973
/build/ceph-14.2.11/src/os/bluestore/BlueStore.cc: 10249: FAILED
ceph_assert(ondisk_format > 0)

 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x152) [0x846368]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*,
char const*, ...)+0) [0x846543]
 3: (BlueStore::_upgrade_super()+0x4b6) [0xd62346]
 4: (BlueStore::_mount(bool, bool)+0x592) [0xdb0b52]
 5: (OSD::init()+0x3f3) [0x8f5483]
 6: (main()+0x27e2) [0x84c462]
 7: (__libc_start_main()+0xf0) [0x7f0e0bda3830]
 8: (_start()+0x29) [0x880389]

Journalctl -xu log

Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)
_read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)
_read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)
_read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)
_read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)
_read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309
7fe9410bfc00 -1 bdev(0xd1be000 /var/lib/ceph/osd/ceph-108/block) open open
got: (13) Perm
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317
7fe9410bfc00 -1 bdev(0xd1be000 /var/lib/ceph/osd/ceph-108/block) open open
got: (13) Perm
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)
_read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)

[ceph-users] Re: [cephadm] Deploy Ceph in a closed environment

2020-08-27 Thread Tony Liu
Please discard this question, I figured it out.

Tony
> -Original Message-
> From: Tony Liu 
> Sent: Thursday, August 27, 2020 1:55 PM
> To: ceph-users@ceph.io
> Subject: [ceph-users] [cephadm] Deploy Ceph in a closed environment
> 
> Hi,
> 
> I'd like to deploy Ceph in a closed environment (no connectivity to
> public). I will build repository and registry to hold required packages
> and container images. How do I specify the private registry when running
> "cephadm bootstrap"? The same question for adding OSD.
> 
> Thanks!
> Tony
> 


[ceph-users] [cephadm] Deploy Ceph in a closed environment

2020-08-27 Thread Tony Liu
Hi,

I'd like to deploy Ceph in a closed environment (no connectivity
to public). I will build repository and registry to hold required
packages and container images. How do I specify the private
registry when running "cephadm bootstrap"? The same question for
adding OSD.

Thanks!
Tony
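
For anyone reading this in the archive, a hedged sketch of one way to do it (registry name, tag and IP are placeholders, not values from this thread):

# Bootstrap pulling the Ceph image from a private registry instead of docker.io
cephadm --image registry.example.local:5000/ceph/ceph:v15 bootstrap --mon-ip 10.0.0.11

# Make daemons added later (mons, mgrs, OSDs via "ceph orch") use the same registry
ceph config set global container_image registry.example.local:5000/ceph/ceph:v15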



[ceph-users] Re: ceph auth ls

2020-08-27 Thread Marc Roos
 
This is what I mean: this guy is just posting all his keys.

https://www.mail-archive.com/ceph-devel@vger.kernel.org/msg26140.html


-Original Message-
To: ceph-users
Subject: [ceph-users] ceph auth ls


Am I the only one that thinks it is not necessary to dump these keys 
with every command (ls and get)? Either remove these keys from auth ls 
and auth get. Or remove the commands "auth print_key" "auth print-key" 
and "auth get-key"







[ceph-users] Is it possible to mount a cephfs within a container?

2020-08-27 Thread Marc Roos


I am getting the error below; on an OSD node I am able to mount the path.

adding ceph secret key to kernel failed: Operation not permitted
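
A hedged note: this error is typically the container runtime's default seccomp profile blocking the add_key syscall that mount.ceph uses, plus the missing CAP_SYS_ADMIN for the mount itself. A sketch of a privileged test (image name, monitor and secret path are placeholders):

docker run --rm -it --privileged \
    -v /etc/ceph:/etc/ceph:ro \
    my-ceph-client-image \
    mount -t ceph mon1:6789:/ /mnt -o name=admin,secretfile=/etc/ceph/admin.secret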


[ceph-users] ceph auth ls

2020-08-27 Thread Marc Roos


Am I the only one that thinks it is not necessary to dump these keys 
with every command (ls and get)? Either remove these keys from auth ls 
and auth get. Or remove the commands "auth print_key" "auth print-key" 
and "auth get-key"







[ceph-users] Re: Fwd: Upgrade Path Advice Nautilus (CentOS 7) -> Octopus (new OS)

2020-08-27 Thread Cloud Guy
On Thu, 27 Aug 2020 at 13:21, Anthony D'Atri 
wrote:

>
>
> >
> > Looking for a bit of guidance / approach to upgrading from Nautilus to
> > Octopus considering CentOS and Ceph-Ansible.
> >
> > We're presently running a Nautilus cluster (all nodes / daemons 14.2.11
> as
> > of this post).
> > - There are 4 monitor-hosts with mon, mgr, and dashboard functions
> > consolidated;
>
> You want an odd number of mons.  Add or remove one.
>

Agreed.   Odd number is the target end state.


>
> > - 4 RGW hosts
> > - 4 ODS costs, with 10 OSDs each.   This is planned to scale to 7 nodes
> > with additional OSDs and capacity (considering to do this as part of
> > upgrade process)
>
>  Don’t tempt fate.  One thing at a time. Not three.
>

Never said I was doing all three.  One at a time as per suggested proc.
We would be upgrading MGRs, MONs in one go given they are collocated on the
same nodes.

>
> > - Currently using ceph-ansible (however it's a process to maintain
> scripts
> > / configs between playbook versions - although a great framework, not
> ideal
> > in our case;
>
> ^ Kefu  ;)
>

??  Not sure I follow.  Our question is around Ceph Orchestrator vs
Ansible.   The idea of having something managed by the Ceph project vs. a
bolt-on.   There are valid arguments for both.   My comments were not
intended to offend.  Our objective is to reduce complexity / moving parts
in managing ceph as a whole.   Given the project has native orchestrator it
would be preferred to leverage / transition into that (for our deployment).



>
> > Octopus support on CentOS 7 is limited due to python dependencies, as a
> > result we want to move to CentOS 8 or Ubuntu 20.04.
>
> Do you have a compelling reason to go to Octopus today?
>

Is there a compelling reason not to proceed?  Is it not the next stable
release?  4 updates since release so far.  Specifically, I'm after object
lock and other performance efficiencies.


>
> >   The other outlier is CentOS native Kernel support for LSI2008 (eg.
> 9211)  HBAs which some of our
> > OSD nodes use.
>
> How is this a factor, do newer kernels drop support for that old HBA?
>

It's a RHEL / CentOS thing.   Mainline and Ubuntu kernel support is just
fine.   It's a mature HBA :) extensively deployed and used in scale-out
storage clusters.


> > Here's an upgrade path scenario that is being considered.   At a
> high-level:
>
> I suggest that if you are set on doing this, you do one step at a time and
> don’t try to get fancy.  Especially since you only have one cluster.
>

That's the intent.   I'm looking for validation and experiences from others
and their upgrades.


>
> I believe there are Nautilus packages available for CentOS 8 now, so
> perhaps:
>
> * Update each node — serially — to CentOS 8 + new Ceph packages
> * Update to Octopus via the documented method
> * Add your new nodes
>
>
Noted.   It's a valid scenario as well.


[ceph-users] Re: Cluster degraded after adding OSDs to increase capacity

2020-08-27 Thread DHilsbos
Dallas;

It looks to me like you will need to wait until data movement naturally 
resolves the near-full issue.

So long as you continue to have this:
  io:
recovery: 477 KiB/s, 330 keys/s, 29 objects/s
the cluster is working.

That said, there are some things you can do.
1)  The near-full ratio is configurable.  I don't have those commands 
immediately to hand, but Googling, or searching the archives of this list, should 
show you how to change this value from its default of 85% (see the sketch after 
this list).  Make sure you set it back when the data movement is complete, or 
almost complete.  You need to be careful with this, as Ceph will happily run up 
to the new near-full ratio and error again.  You also need to keep track of the 
other full ratios (I believe there are 2 others: backfill-full and full).
2)  Adjust performance settings to allow the data movement to go faster.  
Again, I don't have those setting immediately to hand, but Googling something 
like 'ceph recovery tuning,' or searching this list, should point you in the 
right direction. Notice that you only have 6 PGs trying to move at a time, with 
2 blocked on your near-full OSDs (8 & 19).  I believe; by default, each OSD 
daemon is only involved in 1 data movement at a time.  The tradeoff here is 
user activity suffers if you adjust to favor recovery, however, with the 
cluster in ERROR status, I suspect user activity is already suffering.
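
A hedged sketch of the commands referenced in both points (Nautilus-era syntax; double-check the values against your cluster and revert them once backfill completes):

# 1) Temporarily relax the full thresholds so the backfill_toofull PGs can move
ceph osd set-nearfull-ratio 0.87
ceph osd set-backfillfull-ratio 0.92
# the hard limit is "ceph osd set-full-ratio" (default 0.95); raise it only with great care

# 2) Let each OSD work on more backfills at once (trade-off: client I/O impact)
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_max_active 4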

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Dallas Jones [mailto:djo...@tech4learning.com] 
Sent: Thursday, August 27, 2020 9:02 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Cluster degraded after adding OSDs to increase 
capacity

The new drives are larger capacity than the first drives I added to the
cluster, but they're all SAS HDDs.



cephuser@ceph01:~$ ceph osd df tree
ID CLASS WEIGHTREWEIGHT SIZERAW USE DATAOMAPMETAAVAIL
 %USE  VAR  PGS STATUS TYPE NAME
-1   122.79410- 123 TiB  42 TiB  41 TiB 217 GiB 466 GiB   81
TiB 33.86 1.00   -root default
-340.93137-  41 TiB  14 TiB  14 TiB  72 GiB 154 GiB   27
TiB 33.86 1.00   -host ceph01
 0   hdd   2.72849  0.95001 2.7 TiB 2.2 TiB 2.1 TiB 7.4 GiB  24 GiB  569
GiB 79.64 2.35 218 up osd.0
 1   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.0 TiB 7.6 GiB  23 GiB  694
GiB 75.16 2.22 196 up osd.1
 2   hdd   2.72849  1.0 2.7 TiB 1.6 TiB 1.6 TiB 8.8 GiB  18 GiB  1.1
TiB 60.39 1.78 199 up osd.2
 3   hdd   2.72849  0.95001 2.7 TiB 2.2 TiB 2.1 TiB 8.3 GiB  23 GiB  583
GiB 79.13 2.34 202 up osd.3
 4   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.0 TiB 8.4 GiB  22 GiB  692
GiB 75.22 2.22 214 up osd.4
 5   hdd   2.72849  1.0 2.7 TiB 1.7 TiB 1.7 TiB 8.5 GiB  19 GiB  1.0
TiB 62.39 1.84 195 up osd.5
 6   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 2.0 TiB 8.5 GiB  21 GiB  709
GiB 74.62 2.20 217 up osd.6
22   hdd   5.45799  1.0 5.5 TiB 4.2 GiB 165 MiB 2.0 GiB 2.1 GiB  5.5
TiB  0.08 0.00  23 up osd.22
23   hdd   5.45799  1.0 5.5 TiB 2.7 GiB 161 MiB 1.5 GiB 1.0 GiB  5.5
TiB  0.05 0.00  23 up osd.23
27   hdd   5.45799  1.0 5.5 TiB  23 GiB  17 GiB 5.0 GiB 1.3 GiB  5.4
TiB  0.42 0.01  63 up osd.27
28   hdd   5.45799  1.0 5.5 TiB  10 GiB 2.8 GiB 6.0 GiB 1.3 GiB  5.4
TiB  0.18 0.01  82 up osd.28
-540.93137-  41 TiB  14 TiB  14 TiB  71 GiB 157 GiB   27
TiB 33.89 1.00   -host ceph02
 7   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.1 TiB 9.6 GiB  23 GiB  652
GiB 76.66 2.26 221 up osd.7
 8   hdd   2.72849  0.95001 2.7 TiB 2.4 TiB 2.4 TiB 7.6 GiB  26 GiB  308
GiB 88.98 2.63 220 up osd.8
 9   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.0 TiB 8.5 GiB  23 GiB  679
GiB 75.71 2.24 214 up osd.9
10   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 1.9 TiB 7.5 GiB  21 GiB  777
GiB 72.18 2.13 208 up osd.10
11   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 2.0 TiB 6.1 GiB  22 GiB  752
GiB 73.10 2.16 191 up osd.11
12   hdd   2.72849  1.0 2.7 TiB 1.5 TiB 1.5 TiB 9.1 GiB  18 GiB  1.2
TiB 56.45 1.67 188 up osd.12
13   hdd   2.72849  1.0 2.7 TiB 1.7 TiB 1.7 TiB 7.9 GiB  19 GiB 1024
GiB 63.37 1.87 193 up osd.13
25   hdd   5.45799  1.0 5.5 TiB 4.9 GiB 165 MiB 3.7 GiB 1.0 GiB  5.5
TiB  0.09 0.00  42 up osd.25
26   hdd   5.45799  1.0 5.5 TiB 2.9 GiB 157 MiB 1.6 GiB 1.2 GiB  5.5
TiB  0.05 0.00  26 up osd.26
29   hdd   5.45799  1.0 5.5 TiB  24 GiB  18 GiB 4.2 GiB 1.2 GiB  5.4
TiB  0.43 0.01  58 up osd.29
30   hdd   5.45799  1.0 5.5 TiB  21 GiB  14 GiB 5.6 GiB 1.3 GiB  5.4
TiB  0.38 0.01  71 up osd.30
-740.93137-  41 TiB  14 TiB  14 TiB  73 GiB 156 GiB   27
TiB 33.83 1.00   -host ceph03
14   hdd   2.72849  1.0 2.7 

[ceph-users] Re: Add OSD with primary on HDD, WAL and DB on SSD

2020-08-27 Thread Tony Liu
How does the WAL utilize the disk when it shares the same device with the DB?
Say the device size is 50G, 100G or 200G: it makes no difference to the DB,
because the DB will take 30G anyway. Does it make any difference to the WAL?

Thanks!
Tony
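
For context, a hedged sketch of how such an OSD is commonly created with an explicit DB LV on the shared SSD (the VG/LV names are made up; sizing follows the 3/30/300 GB discussion quoted below):

# One ~64G LV per HDD OSD on the SSD volume group "ceph-db"
lvcreate -L 64G -n db-sdb ceph-db
ceph-volume lvm create --bluestore --data /dev/sdb --block.db ceph-db/db-sdb
# With no separate --block.wal, the WAL simply lives on the block.db device and
# only needs its fixed few hundred MB; the rest of the LV is available to the DB.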
> -Original Message-
> From: Zhenshi Zhou 
> Sent: Wednesday, August 26, 2020 11:16 PM
> To: Tony Liu 
> Cc: Anthony D'Atri ; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Add OSD with primary on HDD, WAL and DB on
> SSD
> 
> Official document says that you should allocate 4% of the slow device
> space for block.db.
> 
> But the main problem is that Bluestore uses RocksDB and RocksDB puts a
> file on the fast device only if it thinks that the whole layer will fit
> there.
> 
> As for RocksDB, L1 is about 300M, L2 is about 3G, L3 is near 30G, and L4
> is about 300G.
> For instance, RocksDB puts L2 files to block.db only if it’s at least 3G
> there.
> As a result, 30G is a acceptable value.
> 
> Tony Liu <tonyliu0...@hotmail.com> wrote on Tue, 25 Aug 2020 at 10:49:
> 
> 
>   > -Original Message-
>   > From: Anthony D'Atri   >
>   > Sent: Monday, August 24, 2020 7:30 PM
>   > To: Tony Liu   >
>   > Subject: Re: [ceph-users] Re: Add OSD with primary on HDD, WAL
> and DB on
>   > SSD
>   >
>   > Why such small HDDs?  Kinda not worth the drive bays and power,
> instead
>   > of the complexity of putting WAL+DB on a shared SSD, might you
> have been
>   > able to just buy SSDs and not split? ymmv.
> 
>   2TB is for testing, it will bump up to 10TB for production.
> 
>   > The limit is a function of the way the DB levels work, it’s not
>   > intentional.
>   >
>   > WAL by default takes a fixed size, like 512 MB or something.
>   >
>   > 64 GB is a reasonable size, it accomodates the WAL and allows
> space for
>   > DB compaction without overflowing.
> 
>   For each 10TB HDD, what's the recommended DB device size for both
>   DB and WAL? The doc recommends 1% - 4%, meaning 100GB - 400GB for
>   each 10TB HDD. But given the WAL data size and DB data size, I am
>   not sure if that 100GB - 400GB will be used efficiently.
> 
>   > With this commit the situation should be improved, though you
> don’t
>   > mention what release you’re running
>   >
>   > https://github.com/ceph/ceph/pull/29687
> 
>   I am using ceph version 15.2.4 octopus (stable).
> 
>   Thanks!
>   Tony
> 
>   > >>>  I don't need to create
>   > >>> WAL device, just primary on HDD and DB on SSD, and WAL will
> be using
>   > >>> DB device cause it's faster. Is that correct?
>   > >>
>   > >> Yes.
>   > >>
>   > >>
>   > >> But be aware that the DB sizes are limited to 3GB, 30GB and
> 300GB.
>   > >> Anything less than those sizes will have a lot of untilised
> space,
>   > >> e.g a 20GB device will only utilise 3GB.
>   > >
>   > > I have 1 480GB SSD and 7 2TB HDDs. 7 LVs are created on SSD,
> each is
>   > > about 64GB, for 7 OSDs.
>   > >
>   > > Since it's shared by DB and WAL, DB will take 30GB and WAL will
> take
>   > > the rest 34GB. Is that correct?
>   > >
>   > > Is that size of DB and WAL good for 2TB HDD (block store and
> object
>   > > store cases)?
>   > >
>   > > Could you share a bit more about the intention of such limit?
>   > >
>   > >
>   > > Thanks!
>   > > Tony


[ceph-users] Re: Fwd: Upgrade Path Advice Nautilus (CentOS 7) -> Octopus (new OS)

2020-08-27 Thread Anthony D'Atri


> 
> Looking for a bit of guidance / approach to upgrading from Nautilus to
> Octopus considering CentOS and Ceph-Ansible.
> 
> We're presently running a Nautilus cluster (all nodes / daemons 14.2.11 as
> of this post).
> - There are 4 monitor-hosts with mon, mgr, and dashboard functions
> consolidated;

You want an odd number of mons.  Add or remove one.

> - 4 RGW hosts
> - 4 ODS costs, with 10 OSDs each.   This is planned to scale to 7 nodes
> with additional OSDs and capacity (considering to do this as part of
> upgrade process)

 Don’t tempt fate.  One thing at a time. Not three.

> - Currently using ceph-ansible (however it's a process to maintain scripts
> / configs between playbook versions - although a great framework, not ideal
> in our case;

^ Kefu  ;)

> Octopus support on CentOS 7 is limited due to python dependencies, as a
> result we want to move to CentOS 8 or Ubuntu 20.04.

Do you have a compelling reason to go to Octopus today?

>   The other outlier is CentOS native Kernel support for LSI2008 (eg. 9211)  
> HBAs which some of our
> OSD nodes use.

How is this a factor, do newer kernels drop support for that old HBA?

> Here's an upgrade path scenario that is being considered.   At a high-level:

I suggest that if you are set on doing this, you do one step at a time and 
don’t try to get fancy.  Especially since you only have one cluster.

I believe there are Nautilus packages available for CentOS 8 now, so perhaps:

* Update each node — serially — to CentOS 8 + new Ceph packages
* Update to Octopus via the documented method
* Add your new nodes
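
For the middle step, a hedged outline of the documented rolling upgrade (verify it against the official Nautilus-to-Octopus upgrade notes before running anything):

ceph osd set noout                    # before touching any node

# per node, serially: upgrade OS + Ceph packages, then restart daemons
systemctl restart ceph-mon.target     # mon nodes first
systemctl restart ceph-mgr.target     # then mgrs
systemctl restart ceph-osd.target     # then OSD hosts, one at a time
ceph versions                         # confirm every daemon reports the new release

ceph osd require-osd-release octopus  # once everything runs Octopus
ceph osd unset noout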


[ceph-users] Re: Ceph Tech Talk: Secure Token Service in the Rados Gateway

2020-08-27 Thread Mike Perez
Hi everyone,

In 30 minutes join us for this month's Ceph Tech Talk: Secure Token Service
in RGW:
https://ceph.io/ceph-tech-talks/

On Thu, Aug 13, 2020 at 1:11 PM Mike Perez  wrote:

> Hi everyone,
>
> Join us August 27th at 17:00 UTC to hear Pritha Srivastava present on this
> month's Ceph Tech Talk: Secure Token Service in the Rados Gateway. Calendar
> invite and archive can be found here:
>
> https://ceph.io/ceph-tech-talks/
>
> If you're interested or know someone who can present September 24th, or
> October 22nd please let me know!
> --
>
> Mike Perez
>
> He/Him
>
> Ceph Community Manager
>
> Red Hat Los Angeles 
>
> thin...@redhat.com
> M: 1-951-572-2633 IM: IRC Freenode/OFTC: thingee
>
> 494C 5D25 2968 D361 65FB 3829 94BC D781 ADA8 8AEA
> @Thingee 
> 
> 
>


-- 

Mike Perez

he/him

Ceph Community Manager


M: +1-951-572-2633

494C 5D25 2968 D361 65FB 3829 94BC D781 ADA8 8AEA
@Thingee



[ceph-users] Re: Cluster degraded after adding OSDs to increase capacity

2020-08-27 Thread Anthony D'Atri
Is your MUA wrapping lines, or is the list software?

As predicted.  Look at the VAR column and the STDDEV of 37.27

> On Aug 27, 2020, at 9:02 AM, Dallas Jones  wrote:
> 
> 1   122.79410- 123 TiB  42 TiB  41 TiB 217 GiB 466 GiB   81
> TiB 33.86 1.00   -root default
> -340.93137-  41 TiB  14 TiB  14 TiB  72 GiB 154 GiB   27
> TiB 33.86 1.00   -host ceph01
> 0   hdd   2.72849  0.95001 2.7 TiB 2.2 TiB 2.1 TiB 7.4 GiB  24 GiB  569
> GiB 79.64 2.35 218 up osd.0
> 1   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.0 TiB 7.6 GiB  23 GiB  694
> GiB 75.16 2.22 196 up osd.1
> 2   hdd   2.72849  1.0 2.7 TiB 1.6 TiB 1.6 TiB 8.8 GiB  18 GiB  1.1
> TiB 60.39 1.78 199 up osd.2
> 3   hdd   2.72849  0.95001 2.7 TiB 2.2 TiB 2.1 TiB 8.3 GiB  23 GiB  583
> GiB 79.13 2.34 202 up osd.3
> 4   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.0 TiB 8.4 GiB  22 GiB  692
> GiB 75.22 2.22 214 up osd.4
> 5   hdd   2.72849  1.0 2.7 TiB 1.7 TiB 1.7 TiB 8.5 GiB  19 GiB  1.0
> TiB 62.39 1.84 195 up osd.5
> 6   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 2.0 TiB 8.5 GiB  21 GiB  709
> GiB 74.62 2.20 217 up osd.6
> 22   hdd   5.45799  1.0 5.5 TiB 4.2 GiB 165 MiB 2.0 GiB 2.1 GiB  5.5
> TiB  0.08 0.00  23 up osd.22
> 23   hdd   5.45799  1.0 5.5 TiB 2.7 GiB 161 MiB 1.5 GiB 1.0 GiB  5.5
> TiB  0.05 0.00  23 up osd.23
> 27   hdd   5.45799  1.0 5.5 TiB  23 GiB  17 GiB 5.0 GiB 1.3 GiB  5.4
> TiB  0.42 0.01  63 up osd.27
> 28   hdd   5.45799  1.0 5.5 TiB  10 GiB 2.8 GiB 6.0 GiB 1.3 GiB  5.4
> TiB  0.18 0.01  82 up osd.28
> -540.93137-  41 TiB  14 TiB  14 TiB  71 GiB 157 GiB   27
> TiB 33.89 1.00   -host ceph02
> 7   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.1 TiB 9.6 GiB  23 GiB  652
> GiB 76.66 2.26 221 up osd.7
> 8   hdd   2.72849  0.95001 2.7 TiB 2.4 TiB 2.4 TiB 7.6 GiB  26 GiB  308
> GiB 88.98 2.63 220 up osd.8
> 9   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.0 TiB 8.5 GiB  23 GiB  679
> GiB 75.71 2.24 214 up osd.9
> 10   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 1.9 TiB 7.5 GiB  21 GiB  777
> GiB 72.18 2.13 208 up osd.10
> 11   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 2.0 TiB 6.1 GiB  22 GiB  752
> GiB 73.10 2.16 191 up osd.11
> 12   hdd   2.72849  1.0 2.7 TiB 1.5 TiB 1.5 TiB 9.1 GiB  18 GiB  1.2
> TiB 56.45 1.67 188 up osd.12
> 13   hdd   2.72849  1.0 2.7 TiB 1.7 TiB 1.7 TiB 7.9 GiB  19 GiB 1024
> GiB 63.37 1.87 193 up osd.13
> 25   hdd   5.45799  1.0 5.5 TiB 4.9 GiB 165 MiB 3.7 GiB 1.0 GiB  5.5
> TiB  0.09 0.00  42 up osd.25
> 26   hdd   5.45799  1.0 5.5 TiB 2.9 GiB 157 MiB 1.6 GiB 1.2 GiB  5.5
> TiB  0.05 0.00  26 up osd.26
> 29   hdd   5.45799  1.0 5.5 TiB  24 GiB  18 GiB 4.2 GiB 1.2 GiB  5.4
> TiB  0.43 0.01  58 up osd.29
> 30   hdd   5.45799  1.0 5.5 TiB  21 GiB  14 GiB 5.6 GiB 1.3 GiB  5.4
> TiB  0.38 0.01  71 up osd.30
> -740.93137-  41 TiB  14 TiB  14 TiB  73 GiB 156 GiB   27
> TiB 33.83 1.00   -host ceph03
> 14   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.1 TiB 6.9 GiB  23 GiB  627
> GiB 77.56 2.29 205 up osd.14
> 15   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 1.9 TiB 6.8 GiB  21 GiB  793
> GiB 71.62 2.12 189 up osd.15
> 16   hdd   2.72849  1.0 2.7 TiB 1.9 TiB 1.9 TiB 8.7 GiB  21 GiB  813
> GiB 70.89 2.09 209 up osd.16
> 17   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.1 TiB 8.6 GiB  23 GiB  609
> GiB 78.19 2.31 216 up osd.17
> 18   hdd   2.72849  1.0 2.7 TiB 1.7 TiB 1.7 TiB 9.1 GiB  19 GiB  1.0
> TiB 62.40 1.84 209 up osd.18
> 19   hdd   2.72849  0.95001 2.7 TiB 2.2 TiB 2.2 TiB 9.1 GiB  24 GiB  541
> GiB 80.65 2.38 210 up osd.19
> 20   hdd   2.72849  1.0 2.7 TiB 1.8 TiB 1.8 TiB 8.4 GiB  19 GiB  969
> GiB 65.32 1.93 200 up osd.20
> 21   hdd   5.45799  1.0 5.5 TiB 3.7 GiB 161 MiB 2.2 GiB 1.3 GiB  5.5
> TiB  0.07 0.00  28 up osd.21
> 24   hdd   5.45799  1.0 5.5 TiB 4.9 GiB 177 MiB 3.6 GiB 1.1 GiB  5.5
> TiB  0.09 0.00  37 up osd.24
> 31   hdd   5.45799  1.0 5.5 TiB 8.9 GiB 2.7 GiB 5.0 GiB 1.2 GiB  5.4
> TiB  0.16 0.00  59 up osd.31
> 32   hdd   5.45799  1.0 5.5 TiB 6.0 GiB 182 MiB 4.7 GiB 1.1 GiB  5.5
> TiB  0.11 0.00  70 up osd.32
>  TOTAL 123 TiB  42 TiB  41 TiB 217 GiB 466 GiB   81


[ceph-users] Re: Cluster degraded after adding OSDs to increase capacity

2020-08-27 Thread Dallas Jones
The new drives are larger capacity than the first drives I added to the
cluster, but they're all SAS HDDs.



cephuser@ceph01:~$ ceph osd df tree
ID CLASS WEIGHTREWEIGHT SIZERAW USE DATAOMAPMETAAVAIL
 %USE  VAR  PGS STATUS TYPE NAME
-1   122.79410- 123 TiB  42 TiB  41 TiB 217 GiB 466 GiB   81
TiB 33.86 1.00   -root default
-340.93137-  41 TiB  14 TiB  14 TiB  72 GiB 154 GiB   27
TiB 33.86 1.00   -host ceph01
 0   hdd   2.72849  0.95001 2.7 TiB 2.2 TiB 2.1 TiB 7.4 GiB  24 GiB  569
GiB 79.64 2.35 218 up osd.0
 1   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.0 TiB 7.6 GiB  23 GiB  694
GiB 75.16 2.22 196 up osd.1
 2   hdd   2.72849  1.0 2.7 TiB 1.6 TiB 1.6 TiB 8.8 GiB  18 GiB  1.1
TiB 60.39 1.78 199 up osd.2
 3   hdd   2.72849  0.95001 2.7 TiB 2.2 TiB 2.1 TiB 8.3 GiB  23 GiB  583
GiB 79.13 2.34 202 up osd.3
 4   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.0 TiB 8.4 GiB  22 GiB  692
GiB 75.22 2.22 214 up osd.4
 5   hdd   2.72849  1.0 2.7 TiB 1.7 TiB 1.7 TiB 8.5 GiB  19 GiB  1.0
TiB 62.39 1.84 195 up osd.5
 6   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 2.0 TiB 8.5 GiB  21 GiB  709
GiB 74.62 2.20 217 up osd.6
22   hdd   5.45799  1.0 5.5 TiB 4.2 GiB 165 MiB 2.0 GiB 2.1 GiB  5.5
TiB  0.08 0.00  23 up osd.22
23   hdd   5.45799  1.0 5.5 TiB 2.7 GiB 161 MiB 1.5 GiB 1.0 GiB  5.5
TiB  0.05 0.00  23 up osd.23
27   hdd   5.45799  1.0 5.5 TiB  23 GiB  17 GiB 5.0 GiB 1.3 GiB  5.4
TiB  0.42 0.01  63 up osd.27
28   hdd   5.45799  1.0 5.5 TiB  10 GiB 2.8 GiB 6.0 GiB 1.3 GiB  5.4
TiB  0.18 0.01  82 up osd.28
-540.93137-  41 TiB  14 TiB  14 TiB  71 GiB 157 GiB   27
TiB 33.89 1.00   -host ceph02
 7   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.1 TiB 9.6 GiB  23 GiB  652
GiB 76.66 2.26 221 up osd.7
 8   hdd   2.72849  0.95001 2.7 TiB 2.4 TiB 2.4 TiB 7.6 GiB  26 GiB  308
GiB 88.98 2.63 220 up osd.8
 9   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.0 TiB 8.5 GiB  23 GiB  679
GiB 75.71 2.24 214 up osd.9
10   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 1.9 TiB 7.5 GiB  21 GiB  777
GiB 72.18 2.13 208 up osd.10
11   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 2.0 TiB 6.1 GiB  22 GiB  752
GiB 73.10 2.16 191 up osd.11
12   hdd   2.72849  1.0 2.7 TiB 1.5 TiB 1.5 TiB 9.1 GiB  18 GiB  1.2
TiB 56.45 1.67 188 up osd.12
13   hdd   2.72849  1.0 2.7 TiB 1.7 TiB 1.7 TiB 7.9 GiB  19 GiB 1024
GiB 63.37 1.87 193 up osd.13
25   hdd   5.45799  1.0 5.5 TiB 4.9 GiB 165 MiB 3.7 GiB 1.0 GiB  5.5
TiB  0.09 0.00  42 up osd.25
26   hdd   5.45799  1.0 5.5 TiB 2.9 GiB 157 MiB 1.6 GiB 1.2 GiB  5.5
TiB  0.05 0.00  26 up osd.26
29   hdd   5.45799  1.0 5.5 TiB  24 GiB  18 GiB 4.2 GiB 1.2 GiB  5.4
TiB  0.43 0.01  58 up osd.29
30   hdd   5.45799  1.0 5.5 TiB  21 GiB  14 GiB 5.6 GiB 1.3 GiB  5.4
TiB  0.38 0.01  71 up osd.30
-740.93137-  41 TiB  14 TiB  14 TiB  73 GiB 156 GiB   27
TiB 33.83 1.00   -host ceph03
14   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.1 TiB 6.9 GiB  23 GiB  627
GiB 77.56 2.29 205 up osd.14
15   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 1.9 TiB 6.8 GiB  21 GiB  793
GiB 71.62 2.12 189 up osd.15
16   hdd   2.72849  1.0 2.7 TiB 1.9 TiB 1.9 TiB 8.7 GiB  21 GiB  813
GiB 70.89 2.09 209 up osd.16
17   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.1 TiB 8.6 GiB  23 GiB  609
GiB 78.19 2.31 216 up osd.17
18   hdd   2.72849  1.0 2.7 TiB 1.7 TiB 1.7 TiB 9.1 GiB  19 GiB  1.0
TiB 62.40 1.84 209 up osd.18
19   hdd   2.72849  0.95001 2.7 TiB 2.2 TiB 2.2 TiB 9.1 GiB  24 GiB  541
GiB 80.65 2.38 210 up osd.19
20   hdd   2.72849  1.0 2.7 TiB 1.8 TiB 1.8 TiB 8.4 GiB  19 GiB  969
GiB 65.32 1.93 200 up osd.20
21   hdd   5.45799  1.0 5.5 TiB 3.7 GiB 161 MiB 2.2 GiB 1.3 GiB  5.5
TiB  0.07 0.00  28 up osd.21
24   hdd   5.45799  1.0 5.5 TiB 4.9 GiB 177 MiB 3.6 GiB 1.1 GiB  5.5
TiB  0.09 0.00  37 up osd.24
31   hdd   5.45799  1.0 5.5 TiB 8.9 GiB 2.7 GiB 5.0 GiB 1.2 GiB  5.4
TiB  0.16 0.00  59 up osd.31
32   hdd   5.45799  1.0 5.5 TiB 6.0 GiB 182 MiB 4.7 GiB 1.1 GiB  5.5
TiB  0.11 0.00  70 up osd.32
  TOTAL 123 TiB  42 TiB  41 TiB 217 GiB 466 GiB   81
TiB 33.86
MIN/MAX VAR: 0.00/2.63  STDDEV: 37.27

On Thu, Aug 27, 2020 at 8:43 AM Eugen Block  wrote:

> Hi,
>
> are the new OSDs in the same root and is it the same device class? Can
> you share the output of ‚ceph osd df tree‘?
>
>
> Zitat von Dallas Jones :
>
> > My 3-node Ceph cluster (14.2.4) has been running fine for months.
> However,
> > my data pool became close to full a couple of weeks ago, so I added 12
> new
> > OSDs, roughly doubling the capacity 

[ceph-users] Re: Cluster degraded after adding OSDs to increase capacity

2020-08-27 Thread Anthony D'Atri
Doubling the capacity in one shot was a big topology change, hence the 53% 
misplaced.

OSD fullness will naturally reflect a bell curve; there will be a tail of 
under-full and over-full OSDs.  If you’d not said that your cluster was very 
full before expansion I would have predicted it from the full / nearfull OSDs.

Think of CRUSH as a hash function that can experience collisions.  When you 
change the topology, some collisions are removed, and sometimes PGs newly land 
on OSDs that they were previously redirected from, which can result in 
additional fillage.   This can also occur as a natural result of data 
moving onto a given OSD before it's moved off, especially as Ceph makes copies 
before deleting the old during a move, to maintain full redundancy along the 
way.


`ceph osd df | sort -nk8`


Couple of ways to recover, depending on the unspecified release that you’re 
running.  You need to squeeze the most-full outliers down on a continual basis 
going forward.

* Balance OSDs with either the ceph-mgr pg-upmap balancer (if all clients are 
Luminous or better)
* Balance OSDs with reweight-by-utilization
* Balance OSDs with override weights `ceph osd reweight osd.666 0.xx`
* Raise the osd full ratio and backfill full ratio a few percentage points to 
let the 3 affected OSDs drain.  You may need to restart them serially for the 
new setting to take effect.
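
Hedged examples of those options (pick one approach rather than mixing them; the OSD ids match the nearfull ones mentioned earlier in the thread):

# pg-upmap balancer (only if every client is Luminous or newer)
ceph balancer mode upmap
ceph balancer on

# or a one-shot utilization-based reweight
ceph osd test-reweight-by-utilization   # dry run first
ceph osd reweight-by-utilization

# or manual override weights on the worst outliers
ceph osd reweight osd.8 0.90
ceph osd reweight osd.19 0.90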


> On Aug 27, 2020, at 8:28 AM, Dallas Jones  wrote:
> 
> My 3-node Ceph cluster (14.2.4) has been running fine for months. However,
> my data pool became close to full a couple of weeks ago, so I added 12 new
> OSDs, roughly doubling the capacity of the cluster. However, the pool size
> has not changed, and the health of the cluster has changed for the worse.
> The dashboard shows the following cluster status:
> 
>   - PG_DEGRADED_FULL: Degraded data redundancy (low space): 2 pgs
>   backfill_toofull
>   - POOL_NEARFULL: 6 pool(s) nearfull
>   - OSD_NEARFULL: 1 nearfull osd(s)
> 
> Output from ceph -s:
> 
>  cluster:
>id: e5a47160-a302-462a-8fa4-1e533e1edd4e
>health: HEALTH_ERR
>1 nearfull osd(s)
>6 pool(s) nearfull
>Degraded data redundancy (low space): 2 pgs backfill_toofull
> 
>  services:
>mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 5w)
>mgr: ceph01(active, since 4w), standbys: ceph03, ceph02
>mds: cephfs:1 {0=ceph01=up:active} 2 up:standby
>osd: 33 osds: 33 up (since 43h), 33 in (since 43h); 1094 remapped pgs
>rgw: 3 daemons active (ceph01, ceph02, ceph03)
> 
>  data:
>pools:   6 pools, 1632 pgs
>objects: 134.50M objects, 7.8 TiB
>usage:   42 TiB used, 81 TiB / 123 TiB avail
>pgs: 213786007/403501920 objects misplaced (52.983%)
> 1088 active+remapped+backfill_wait
> 538  active+clean
> 4active+remapped+backfilling
> 2active+remapped+backfill_wait+backfill_toofull
> 
>  io:
>recovery: 477 KiB/s, 330 keys/s, 29 objects/s
> 
> Can someone steer me in the right direction for how to get my cluster
> healthy again?
> 
> Thanks in advance!
> 
> -Dallas


[ceph-users] Re: Cluster degraded after adding OSDs to increase capacity

2020-08-27 Thread Eugen Block

Hi,

are the new OSDs in the same root and is it the same device class? Can  
you share the output of ‚ceph osd df tree‘?



Zitat von Dallas Jones :


My 3-node Ceph cluster (14.2.4) has been running fine for months. However,
my data pool became close to full a couple of weeks ago, so I added 12 new
OSDs, roughly doubling the capacity of the cluster. However, the pool size
has not changed, and the health of the cluster has changed for the worse.
The dashboard shows the following cluster status:

   - PG_DEGRADED_FULL: Degraded data redundancy (low space): 2 pgs
   backfill_toofull
   - POOL_NEARFULL: 6 pool(s) nearfull
   - OSD_NEARFULL: 1 nearfull osd(s)

Output from ceph -s:

  cluster:
id: e5a47160-a302-462a-8fa4-1e533e1edd4e
health: HEALTH_ERR
1 nearfull osd(s)
6 pool(s) nearfull
Degraded data redundancy (low space): 2 pgs backfill_toofull

  services:
mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 5w)
mgr: ceph01(active, since 4w), standbys: ceph03, ceph02
mds: cephfs:1 {0=ceph01=up:active} 2 up:standby
osd: 33 osds: 33 up (since 43h), 33 in (since 43h); 1094 remapped pgs
rgw: 3 daemons active (ceph01, ceph02, ceph03)

  data:
pools:   6 pools, 1632 pgs
objects: 134.50M objects, 7.8 TiB
usage:   42 TiB used, 81 TiB / 123 TiB avail
pgs: 213786007/403501920 objects misplaced (52.983%)
 1088 active+remapped+backfill_wait
 538  active+clean
 4active+remapped+backfilling
 2active+remapped+backfill_wait+backfill_toofull

  io:
recovery: 477 KiB/s, 330 keys/s, 29 objects/s

Can someone steer me in the right direction for how to get my cluster
healthy again?

Thanks in advance!

-Dallas


[ceph-users] Cluster degraded after adding OSDs to increase capacity

2020-08-27 Thread Dallas Jones
My 3-node Ceph cluster (14.2.4) has been running fine for months. However,
my data pool became close to full a couple of weeks ago, so I added 12 new
OSDs, roughly doubling the capacity of the cluster. However, the pool size
has not changed, and the health of the cluster has changed for the worse.
The dashboard shows the following cluster status:

   - PG_DEGRADED_FULL: Degraded data redundancy (low space): 2 pgs
   backfill_toofull
   - POOL_NEARFULL: 6 pool(s) nearfull
   - OSD_NEARFULL: 1 nearfull osd(s)

Output from ceph -s:

  cluster:
id: e5a47160-a302-462a-8fa4-1e533e1edd4e
health: HEALTH_ERR
1 nearfull osd(s)
6 pool(s) nearfull
Degraded data redundancy (low space): 2 pgs backfill_toofull

  services:
mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 5w)
mgr: ceph01(active, since 4w), standbys: ceph03, ceph02
mds: cephfs:1 {0=ceph01=up:active} 2 up:standby
osd: 33 osds: 33 up (since 43h), 33 in (since 43h); 1094 remapped pgs
rgw: 3 daemons active (ceph01, ceph02, ceph03)

  data:
pools:   6 pools, 1632 pgs
objects: 134.50M objects, 7.8 TiB
usage:   42 TiB used, 81 TiB / 123 TiB avail
pgs: 213786007/403501920 objects misplaced (52.983%)
 1088 active+remapped+backfill_wait
 538  active+clean
 4active+remapped+backfilling
 2active+remapped+backfill_wait+backfill_toofull

  io:
recovery: 477 KiB/s, 330 keys/s, 29 objects/s

Can someone steer me in the right direction for how to get my cluster
healthy again?

Thanks in advance!

-Dallas


[ceph-users] Re: Recover pgs from failed osds

2020-08-27 Thread Eugen Block
What is the memory_target for your OSDs? Can you share more details
about your setup? You write about high memory usage; are the OSD nodes
affected by the OOM killer? You could try to reduce the osd_memory_target
and see if that helps bring the OSDs back up. Splitting the PGs is a
very heavy operation.
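
A hedged example of lowering the memory target while the cluster recovers (the 2 GiB value is arbitrary; revert it afterwards):

ceph config set osd osd_memory_target 2147483648        # all OSDs
ceph config set osd.34 osd_memory_target 2147483648     # or only the failing ones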



Zitat von Vahideh Alinouri :


The Ceph cluster was updated from Nautilus to Octopus. On the ceph-osd nodes we
have high I/O wait.

After increasing one pool's pg_num from 64 to 128 according to the warning
message (too many objects per PG), this led to high CPU load and RAM usage on
the ceph-osd nodes and finally crashed the whole cluster. Three OSDs, one on
each host, are stuck in the down state (osd.34, osd.35, osd.40).

Starting a down OSD's service causes high RAM usage and CPU load, and the
ceph-osd node crashes until the OSD service fails.

The active mgr service on each mon host will crash after consuming almost
all available RAM on the physical hosts.

I need to recover the PGs and resolve the corruption. How can I recover unknown
and down PGs? Is there any way to start up the failed OSDs?


The steps below have already been done:

1- The OSD nodes' kernel was upgraded to 5.4.2 before the Ceph cluster upgrade.
Reverting to the previous kernel 4.2.1 was tested to reduce iowait, but it
had no effect.

2- Recovered 11 PGs from the failed OSDs by exporting them with the
ceph-objectstore-tool utility and importing them on other OSDs. The result:
9 PGs are "down" and 2 PGs are "unknown".

2-1) The 9 PGs export and import successfully, but their status is "down" because
"peering_blocked_by" points at the 3 failed OSDs. I cannot mark the OSDs lost,
to prevent the unknown PGs from getting lost. These PGs' sizes are in the KB and MB range.

"peering_blocked_by": [

{

"osd": 34,

"current_lost_at": 0,

"comment": "starting or marking this osd lost may let us proceed"

},

{

"osd": 35,

"current_lost_at": 0,

"comment": "starting or marking this osd lost may let us proceed"

},

{

"osd": 40,

"current_lost_at": 0,

"comment": "starting or marking this osd lost may let us proceed"

}

]


2-2) 1 PG (2.39) exports and imports successfully, but after starting the OSD
service it was imported to, the ceph-osd node's RAM and CPU consumption increase
and cause the node to crash until the OSD service fails. Other OSDs on that
ceph-osd node then become "down". The PG status is "unknown". I cannot use
"force-create-pg" because of data loss. PG 2.39's size is 19G.

# ceph pg map 2.39

osdmap e40347 pg 2.39 (2.39) -> up [32,37] acting [32,37]

# ceph pg 2.39 query

Error ENOENT: i don't have pgid 2.39


*pg 2.39 info on failed osd:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/*ceph-34* --op info
--pgid 2.39

{

"pgid": "2.39",

"last_update": "35344'6456084",

"last_complete": "35344'6456084",

"log_tail": "35344'6453182",

"last_user_version": 10595821,

"last_backfill": "MAX",

"purged_snaps": [],

"history": {

"epoch_created": 146,

"epoch_pool_created": 79,

"last_epoch_started": 25208,

"last_interval_started": 25207,

"last_epoch_clean": 25208,

"last_interval_clean": 25207,

"last_epoch_split": 370,

"last_epoch_marked_full": 0,

"same_up_since": 8347,

"same_interval_since": 25207,

"same_primary_since": 8321,

"last_scrub": "35328'6440139",

"last_scrub_stamp": "2020-08-19T12:00:59.377593+0430",

"last_deep_scrub": "35261'6031075",

"last_deep_scrub_stamp": "2020-08-17T01:59:26.606037+0430",

"last_clean_scrub_stamp": "2020-08-19T12:00:59.377593+0430",

"prior_readable_until_ub": 0

},

"stats": {

"version": "35344'6456082",

"reported_seq": "11733156",

"reported_epoch": "35344",

"state": "active+clean",

"last_fresh": "2020-08-19T14:16:18.587435+0430",

"last_change": "2020-08-19T12:00:59.377747+0430",

"last_active": "2020-08-19T14:16:18.587435+0430",

"last_peered": "2020-08-19T14:16:18.587435+0430",

"last_clean": "2020-08-19T14:16:18.587435+0430",

"last_became_active": "2020-08-06T00:23:51.016769+0430",

"last_became_peered": "2020-08-06T00:23:51.016769+0430",

"last_unstale": "2020-08-19T14:16:18.587435+0430",

"last_undegraded": "2020-08-19T14:16:18.587435+0430",

"last_fullsized": "2020-08-19T14:16:18.587435+0430",

"mapping_epoch": 8347,

"log_start": "35344'6453182",

"ondisk_log_start": "35344'6453182",

"created": 146,

"last_epoch_clean": 25208,

"parent": "0.0",

"parent_split_bits": 7,

"last_scrub": "35328'6440139",

"last_scrub_stamp": "2020-08-19T12:00:59.377593+0430",

"last_deep_scrub": "35261'6031075",

"last_deep_scrub_stamp": "2020-08-17T01:59:26.606037+0430",

"last_clean_scrub_stamp": "2020-08-19T12:00:59.377593+0430",

"log_size": 2900,

"ondisk_log_size": 2900,

"stats_invalid": false,

"dirty_stats_invalid": false,

"omap_stats_invalid": false,

"hitset_stats_invalid": false,

"hitset_bytes_stats_invalid": false,

"pin_stats_invalid": false,

"manifest_stats_invalid": false,

"snaptrimq_len": 0,

"stat_sum": {

"num_bytes": 19749578960,

"num_objects": 2442,

"num_object_clones": 20,

"num_object_copies": 7326,

"num_objects_missing_on_primary": 0,

"num_objects_missing": 0,

"num_objects_degraded": 0,


[ceph-users] Re: kernel: ceph: mdsmap_decode got incorrect state(up:standby-replay)

2020-08-27 Thread Stefan Kooman
Hi list (and cephfs devs :-)),

On 2020-04-29 17:43, Jake Grimmett wrote:
> ...the "mdsmap_decode" errors stopped suddenly on all our clients...
> 
> Not exactly sure what the problem was, but restarting our standby mds
> demons seems to have been the fix.
> 
> Here's the log on the standby mds exactly when the errors stopped:
> 
> 2020-04-29 15:41:22.944 7f3d04e06700  1 mds.ceph-s2 Map has assigned me
> to become a standby
> 2020-04-29 15:43:05.621 7f3d04e06700  1 mds.ceph-s2 Updating MDS map to
> version 394712 from mon.0
> 2020-04-29 15:43:05.623 7f3d04e06700  1 mds.0.0 handle_mds_map i am now
> mds.34541673.0 replaying mds.0.0
> 2020-04-29 15:43:05.623 7f3d04e06700  1 mds.0.0 handle_mds_map state
> change up:boot --> up:standby-replay
> 2020-04-29 15:43:05.623 7f3d04e06700  1 mds.0.0 replay_start
> 2020-04-29 15:43:05.623 7f3d04e06700  1 mds.0.0  recovery set is
> 2020-04-29 15:43:05.655 7f3cfe5f9700  0 mds.0.cache creating system
> inode with ino:0x100
> 2020-04-29 15:43:05.656 7f3cfe5f9700  0 mds.0.cache creating system
> inode with ino:0x1

So, we got some HEALTH_WARN on our cluster because of this issue.

Cluster: 13.2.8
client: cephfs kernel client 5.7.9-050709-generic with 13.2.10 (Ubuntu
18.04)

The standby mds, and only the standby, is logging about this:

> 2020-08-27 06:25:01.086 7efc10cad700 -1 received  signal: Hangup from pkill 
> -1 -x ceph-mon|ceph-mgr|ceph-mds|ceph-osd|ceph-fuse|radosgw  (PID: 21705) 
> UID: 0
> 2020-08-27 08:42:25.340 7efc0d2be700  0 log_channel(cluster) log [WRN] : 1 
> slow requests, 1 included below; oldest blocked for > 30.497840 secs
> 2020-08-27 08:42:25.340 7efc0d2be700  0 log_channel(cluster) log [WRN] : slow 
> request 30.497839 seconds old, received at 2020-08-27 08:41:54.847218: 
> client_request(client.133487514:37390263 getattr AsLsXsFs #0x10050572c4e 
> 2020-08-27 08:41:54.840824 caller_uid=3860, caller_gid=3860{}) currently 
> failed to rdlock, waiting
> 2020-08-27 11:06:55.492 7efc0d2be700  0 log_channel(cluster) log [WRN] : 
> client.134430768 isn't responding to mclientcaps(revoke), ino 0x1005081be30 
> pending pAsLsXsFscr issued pAsLsXsFscr, sent 64.583827 seconds ago
> 2020-08-27 11:07:55.502 7efc0d2be700  0 log_channel(cluster) log [WRN] : 
> client.134430768 isn't responding to mclientcaps(revoke), ino 0x1005081be30 
> pending pAsLsXsFscr issued pAsLsXsFscr, sent 124.593098 seconds ago
> 2020-08-27 11:09:55.561 7efc0d2be700  0 log_channel(cluster) log [WRN] : 
> client.134430768 isn't responding to mclientcaps(revoke), ino 0x1005081be30 
> pending pAsLsXsFscr issued pAsLsXsFscr, sent 244.651434 seconds ago
> 2020-08-27 11:13:55.505 7efc0d2be700  0 log_channel(cluster) log [WRN] : 
> client.134430768 isn't responding to mclientcaps(revoke), ino 0x1005081be30 
> pending pAsLsXsFscr issued pAsLsXsFscr, sent 484.596083 seconds ago
> 2020-08-27 11:21:55.500 7efc0d2be700  0 log_channel(cluster) log [WRN] : 
> client.134430768 isn't responding to mclientcaps(revoke), ino 0x1005081be30 
> pending pAsLsXsFscr issued pAsLsXsFscr, sent 964.592686 seconds ago

On the clients we get the "mdsmap_decode got incorrect
state(up:standby-replay)" logging at exactly the times mds2 is logging.

No logging of this on the active mds.


I would expect exactly the opposite. Why is the standby mds logging this?

Sometimes the "client.$id isn't responding to mclientcaps(revoke)"
warnings resolve itself. But it can also take a considerable amount of time.
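
A hedged way to look at the client that is holding the caps (the client id comes from the warning above; the active MDS name is a placeholder):

# On the active MDS host, list sessions and look for id 134430768
ceph daemon mds.<active-mds> session ls

# Evicting it is a last resort and will break that client's mount:
# ceph tell mds.<active-mds> client evict id=134430768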

I of course could restart the standby mds ... but that's not my first
choice. If this is a software defect, I would like to get it fixed.

Gr. Stefan


[ceph-users] Re: [Ceph Octopus 15.2.3 ] MDS crashed suddenly

2020-08-27 Thread carlimeunier
Hello,

Same issue with another cluster. 
Here is the coredump tag 41659448-bc1b-4f8a-b563-d1599e84c0ab 

Thanks,
Carl


[ceph-users] Re: rados df with nautilus / bluestore

2020-08-27 Thread Igor Fedotov

Hi Manuel,

this behavior was primarily updated in Nautilus by 
https://github.com/ceph/ceph/pull/19454


Per-pool stats under "POOLS" section are now the most precise means to 
answer various questions about space utilization.


'STORED" column provides net amount of data for a specific pool.

You can use "ceph df detail --format json|json-pretty" to obtain a 
report in json format for further processing../.
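
For example, a hedged one-liner to pull the per-pool net usage out of that report (assumes jq is available; field names as found in the Nautilus JSON output):

ceph df detail --format json | jq '.pools[] | {pool: .name, stored: .stats.stored, used: .stats.bytes_used}'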



Hope this helps,

Igor


On 8/27/2020 12:02 PM, Manuel Lausch wrote:

Hi

we found a very ugly issue in rados df

I have several clusters, all running ceph nautilus (14.2.11), We have
there replicated pools with replica size 4.

On the older clusters "rados df" shows in the used column the net used
space. On our new cluster, rados df shows in the used column the gross
used space.

The older clusters was upgraded from luminous (and before) and uses
filestore and the new cluster is initally deployed with nautilus and
bluestore.

why are the outputs different? Is this related with nautilus or with
bluestore? For our reporting this values have a significant relevance
and now I am running in such discrepancies.

What commands/metrics can I use to get more reliable values. Maybe "ceph
df detail"?


Manuel


[ceph-users] Re: radowsgw still needs dedicated clientid?

2020-08-27 Thread Wido den Hollander




On 27/08/2020 14:23, Marc Roos wrote:
  
Can someone shed some light on this? Because it makes the difference between
running multiple instances of one task and running multiple different tasks.


As far as I know this is still required, because the clients talk to each
other using RADOS notifies and thus require different client IDs.
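
A hedged illustration of what a dedicated client id means in practice (instance names are examples, not from this thread):

# Two rgw daemons, each started under its own client name / ceph.conf section:
radosgw -n client.rgw.rgw1 -f     # uses [client.rgw.rgw1]
radosgw -n client.rgw.rgw2 -f     # uses [client.rgw.rgw2], not a second rgw1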


Wido





-Original Message-
To: ceph-users
Subject: [ceph-users] radowsgw still needs dedicated clientid?


I think I can remember reading somewhere that every radosgw is required
to run with their own clientid. Is this still necessary? Or can I run
multiple instances of radosgw with the same clientid?

So can have something like

rgw: 2 daemons active (rgw1, rgw1, rgw1)





[ceph-users] Recover pgs from failed osds

2020-08-27 Thread Vahideh Alinouri
The Ceph cluster was updated from Nautilus to Octopus. On the ceph-osd nodes we
have high I/O wait.

After increasing one pool's pg_num from 64 to 128 according to the warning
message (too many objects per PG), this led to high CPU load and RAM usage on
the ceph-osd nodes and finally crashed the whole cluster. Three OSDs, one on
each host, are stuck in the down state (osd.34, osd.35, osd.40).

Starting a down OSD's service causes high RAM usage and CPU load, and the
ceph-osd node crashes until the OSD service fails.

The active mgr service on each mon host will crash after consuming almost
all available RAM on the physical hosts.

I need to recover the PGs and resolve the corruption. How can I recover unknown
and down PGs? Is there any way to start up the failed OSDs?


The steps below have already been done:

1- The OSD nodes' kernel was upgraded to 5.4.2 before the Ceph cluster upgrade.
Reverting to the previous kernel 4.2.1 was tested to reduce iowait, but it
had no effect.

2- Recovered 11 PGs from the failed OSDs by exporting them with the
ceph-objectstore-tool utility and importing them on other OSDs. The result:
9 PGs are "down" and 2 PGs are "unknown".

2-1) The 9 PGs export and import successfully, but their status is "down" because
"peering_blocked_by" points at the 3 failed OSDs. I cannot mark the OSDs lost,
to prevent the unknown PGs from getting lost. These PGs' sizes are in the KB and MB range.

"peering_blocked_by": [

{

"osd": 34,

"current_lost_at": 0,

"comment": "starting or marking this osd lost may let us proceed"

},

{

"osd": 35,

"current_lost_at": 0,

"comment": "starting or marking this osd lost may let us proceed"

},

{

"osd": 40,

"current_lost_at": 0,

"comment": "starting or marking this osd lost may let us proceed"

}

]


2-2) 1 PG (2.39) exports and imports successfully, but after starting the OSD
service it was imported to, the ceph-osd node's RAM and CPU consumption increase
and cause the node to crash until the OSD service fails. Other OSDs on that
ceph-osd node then become "down". The PG status is "unknown". I cannot use
"force-create-pg" because of data loss. PG 2.39's size is 19G.

# ceph pg map 2.39

osdmap e40347 pg 2.39 (2.39) -> up [32,37] acting [32,37]

# ceph pg 2.39 query

Error ENOENT: i don't have pgid 2.39


*pg 2.39 info on failed osd:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/*ceph-34* --op info
--pgid 2.39

{

"pgid": "2.39",

"last_update": "35344'6456084",

"last_complete": "35344'6456084",

"log_tail": "35344'6453182",

"last_user_version": 10595821,

"last_backfill": "MAX",

"purged_snaps": [],

"history": {

"epoch_created": 146,

"epoch_pool_created": 79,

"last_epoch_started": 25208,

"last_interval_started": 25207,

"last_epoch_clean": 25208,

"last_interval_clean": 25207,

"last_epoch_split": 370,

"last_epoch_marked_full": 0,

"same_up_since": 8347,

"same_interval_since": 25207,

"same_primary_since": 8321,

"last_scrub": "35328'6440139",

"last_scrub_stamp": "2020-08-19T12:00:59.377593+0430",

"last_deep_scrub": "35261'6031075",

"last_deep_scrub_stamp": "2020-08-17T01:59:26.606037+0430",

"last_clean_scrub_stamp": "2020-08-19T12:00:59.377593+0430",

"prior_readable_until_ub": 0

},

"stats": {

"version": "35344'6456082",

"reported_seq": "11733156",

"reported_epoch": "35344",

"state": "active+clean",

"last_fresh": "2020-08-19T14:16:18.587435+0430",

"last_change": "2020-08-19T12:00:59.377747+0430",

"last_active": "2020-08-19T14:16:18.587435+0430",

"last_peered": "2020-08-19T14:16:18.587435+0430",

"last_clean": "2020-08-19T14:16:18.587435+0430",

"last_became_active": "2020-08-06T00:23:51.016769+0430",

"last_became_peered": "2020-08-06T00:23:51.016769+0430",

"last_unstale": "2020-08-19T14:16:18.587435+0430",

"last_undegraded": "2020-08-19T14:16:18.587435+0430",

"last_fullsized": "2020-08-19T14:16:18.587435+0430",

"mapping_epoch": 8347,

"log_start": "35344'6453182",

"ondisk_log_start": "35344'6453182",

"created": 146,

"last_epoch_clean": 25208,

"parent": "0.0",

"parent_split_bits": 7,

"last_scrub": "35328'6440139",

"last_scrub_stamp": "2020-08-19T12:00:59.377593+0430",

"last_deep_scrub": "35261'6031075",

"last_deep_scrub_stamp": "2020-08-17T01:59:26.606037+0430",

"last_clean_scrub_stamp": "2020-08-19T12:00:59.377593+0430",

"log_size": 2900,

"ondisk_log_size": 2900,

"stats_invalid": false,

"dirty_stats_invalid": false,

"omap_stats_invalid": false,

"hitset_stats_invalid": false,

"hitset_bytes_stats_invalid": false,

"pin_stats_invalid": false,

"manifest_stats_invalid": false,

"snaptrimq_len": 0,

"stat_sum": {

"num_bytes": 19749578960,

"num_objects": 2442,

"num_object_clones": 20,

"num_object_copies": 7326,

"num_objects_missing_on_primary": 0,

"num_objects_missing": 0,

"num_objects_degraded": 0,

"num_objects_misplaced": 0,

"num_objects_unfound": 0,

"num_objects_dirty": 2442,

"num_whiteouts": 0,

"num_read": 16120686,

"num_read_kb": 82264126,

"num_write": 19731882,

"num_write_kb": 379030181,

"num_scrub_errors": 0,

"num_shallow_scrub_errors": 0,

"num_deep_scrub_errors": 0,

"num_objects_recovered": 2861,


[ceph-users] Re: radowsgw still needs dedicated clientid?

2020-08-27 Thread Marc Roos
 
Can someone shed some light on this? It makes the difference between 
running multiple instances of one task and running multiple different 
tasks.



-Original Message-
To: ceph-users
Subject: [ceph-users] radowsgw still needs dedicated clientid?


I think I can remember reading somewhere that every radosgw is required 
to run with its own clientid. Is this still necessary? Or can I run 
multiple instances of radosgw with the same clientid?

So I could have something like 

rgw: 2 daemons active (rgw1, rgw1, rgw1)
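
(For what it's worth, the conventional layout I know of gives each radosgw
instance its own client name in ceph.conf, roughly like the sketch below; the
host names and port are made up for illustration, and each instance then runs
as its own systemd unit, e.g. ceph-radosgw@rgw.node1.)

[client.rgw.node1]
    rgw_frontends = beast port=7480

[client.rgw.node2]
    rgw_frontends = beast port=7480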

___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RandomCrashes on OSDs Attached to Mon Hosts with Octopus

2020-08-27 Thread Denis Krienbühl
Hi Igor

Just to clarify:

>> I grepped the logs for "checksum mismatch" and "_verify_csum". The only
>> occurrences I could find were the ones that precede the crashes.
> 
> Are you able to find multiple _verify_csum precisely?

There are no “_verify_csum” entries whatsoever. I wrote that wrongly.
I could only find “checksum mismatch” right when the crash happens.

Sorry for the confusion.

I will keep tracking those counters and have a look at monitor/osd memory 
tracking.

Cheers,

Denis

> On 27 Aug 2020, at 13:39, Igor Fedotov  wrote:
> 
> Hi Denis
> 
> please see my comments inline.
> 
> 
> Thanks,
> 
> Igor
> 
> On 8/27/2020 10:06 AM, Denis Krienbühl wrote:
>> Hi Igor,
>> 
>> Thanks for your input. I tried to gather as much information as I could to
>> answer your questions. Hopefully we can get to the bottom of this.
>> 
>>> 0) What is backing disks layout for OSDs in question (main device type?, 
>>> additional DB/WAL devices?).
>> Everything is on a single Intel NVMe P4510 using dmcrypt with 2 OSDs per NVMe
>> device. There is no additional DB/WAL device and there are no HDDs involved.
>> 
>> Also note that we use 40 OSDs per host with a memory target of 6'174'015'488.
>> 
>>> 1) Please check all the existing logs for OSDs at "failing" nodes for other 
>>> checksum errors (as per my comment #38)
>> I grepped the logs for "checksum mismatch" and "_verify_csum". The only
> occurrences I could find were the ones that precede the crashes.
> 
> Are you able to find multiple _verify_csum precisely?
> 
> If so, this means data read failures were observed in user data, not RocksDB,
> which backs the hypothesis of interim disk read errors as a root cause. User
> data reading has quite a different access stack and is able to retry after
> such errors, hence they aren't that visible.
> 
> But having checksum failures for both DB and user data points to the same 
> root cause at lower layers (kernel, I/O stack etc).
> 
> It might be interesting whether _verify_csum and RocksDB csum were happening 
> nearly at the same period of time. Not even for a single OSD but for 
> different OSDs of the same node.
> 
> This might indicate that the node was suffering from some disease at that time. 
> Anything suspicious from system-wide logs for this time period?
> 
>> 
>>> 2) Check if BlueFS spillover is observed for any failing OSDs.
>> As everything is on the same device, there can be no spillover, right?
> Right
>> 
>>> 3) Check "bluestore_reads_with_retries" performance counters for all OSDs 
>>> at nodes in question. See comments 38-42 on the details. Any non-zero 
>>> values?
>> I monitored this over night by repeatedly polling this performance counter 
>> over
>> all OSDs on the mons. Only one OSD, which has crashed in the past, has had a
>> value of 1 since I started measuring. All the other OSDs, including the ones
>> that crashed over night, have a value of 0. Before and after the crash.
> 
> Even a single occurrence isn't expected - this counter should always be equal 
> to 0. And presumably peak hours are when the cluster is exposed to the issue 
> the most. Night is likely not to be the peak period though. So 
> please keep tracking...
> 
> 
>> 
>>> 4) Start monitoring RAM usage and swapping for these nodes. Comment 39.
>> The memory use of those nodes is pretty constant with ~6GB free, ~25GB 
>> available of 256GB.
>> There are also only a handful of pages being swapped, if at all.
>> 
>>> a hypothesis why mon hosts are affected only  - higher memory utilization 
>>> at these nodes is what causes disk reading failures to appear. RAM leakage 
>>> (or excessive utilization) in MON processes or something?
>> Since the memory usage is rather constant I'm not sure this is the case, I 
>> think
>> we would see more of an up/down pattern. However we are not yet monitoring 
>> all
>> processes, and that would be something I'd like to get some data on, but I'm 
>> not
>> sure this is the right course of action at the moment.
> 
> Given the fact that colocation with monitors is probably the clue - I suggest 
> tracking MON and OSD processes at least.
> 
> And high memory pressure is just a working hypothesis for these disk failures' 
> root cause. Something else (e.g. high disk utilization) might be another 
> trigger, or it might just be wrong...
> 
> So please just pay some attention to this.
> 
>> 
>> What do you think, is it still plausible that we see a memory utilization
>> problem, even though there's little variance in the memory usage patterns?
>> 
>> The approaches we are currently considering are to upgrade our kernel and to 
>> lower the memory target somewhat.
>> 
>> Cheers,
>> 
>> Denis
>> 
>> 
>>> On 26 Aug 2020, at 15:29, Igor Fedotov  wrote:
>>> 
>>> Hi Denis,
>>> 
>>> this reminds me the following ticket: https://tracker.ceph.com/issues/37282
>>> 
>>> Please note they mentioned co-location with mon in comment #29.
>>> 
>>> 
>>> Working hypothesis for this ticket is the interim disk read 

[ceph-users] Re: slow "rados ls"

2020-08-27 Thread Marcel Kuiper
Sorry, that should have been Wido/Stefan.

Another question is: how do I use the ceph-kvstore-tool to compact the
rocksdb? (I can't find a lot of examples)

The WAL and DB are on a separate NVMe. The directory structure for an osd
looks like:

root@se-rc3-st8vfr2t2:/var/lib/ceph/osd# ls -l ceph-174
total 24
lrwxrwxrwx 1 ceph ceph 93 Aug 27 10:12 block ->
/dev/ceph-97d39775-65ef-41a6-a9fe-94a108c0816d/osd-block-7f83916e-7250-4935-89af-d678a9bb9f29
lrwxrwxrwx 1 ceph ceph 27 Aug 27 10:12 block.db ->
/dev/ceph-db-nvme0n1/db-sdd
-rw--- 1 ceph ceph 37 Aug 27 10:12 ceph_fsid
-rw--- 1 ceph ceph 37 Aug 27 10:12 fsid
-rw--- 1 ceph ceph 57 Aug 27 10:12 keyring
-rw--- 1 ceph ceph  6 Aug 27 10:12 ready
-rw--- 1 ceph ceph 10 Aug 27 10:12 type
-rw--- 1 ceph ceph  4 Aug 27 10:12 whoami
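
(Regarding the compaction question above: what I have seen used is either
online compaction through the OSD admin interface or offline compaction with
the OSD stopped; a rough sketch, using osd.174 from the listing above.)

# ceph tell osd.174 compact                                  (online)
# systemctl stop ceph-osd@174                                (offline variant)
# ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-174 compact
# systemctl start ceph-osd@174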

Kind Regards

Marcel Kuiper


> Hi Wido/Joost
>
> pg_num is 64. It is not that we use 'rados ls' for operations. We just
> noticed as a difference that on this cluster it takes about 15 seconds to
> return on pool .rgw.root or rc3-se.rgw.buckets.index and our other
> clusters return almost instantaneously
>
> Is there a way that I can determine from statistics that manual compaction
> might help (besides doing the compaction and noticing the difference in
> behaviour). Any pointers in investigating this further would be much
> appreciated
>
> Is there operational impact to be expected when compacting manually?
>
> Kind Regards
>
> Marcel Kuiper
>
>>
>>
>> On 26/08/2020 15:59, Stefan Kooman wrote:
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: slow "rados ls"

2020-08-27 Thread Marcel Kuiper
Hi Wido/Joost

pg_num is 64. It is not that we use 'rados ls' for operations. We just
noticed as a difference that on this cluster it takes about 15 seconds to
return on pool .rgw.root or rc3-se.rgw.buckets.index and our other
clusters return almost instantaneously
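
(For anyone reproducing this, the listing time can be measured with something
like the following; the pool names are the ones mentioned above.)

# time rados -p .rgw.root ls > /dev/null
# time rados -p rc3-se.rgw.buckets.index ls > /dev/null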

Is there a way that I can determine from statistics that manual compaction
might help (besides doing the compaction and noticing the difference in
behaviour). Any pointers in investigating this further would be much
appreciated

Is there operational impact to be expected when compacting manually?

Kind Regards

Marcel Kuiper

>
>
> On 26/08/2020 15:59, Stefan Kooman wrote:
>> On 2020-08-26 15:20, Marcel Kuiper wrote:
>>> Hi Vladimir,
>>>
>>> no it is the same on all monitors. Actually I got triggered because I
>>> got
>>> slow responses on my rados gateway with the radosgw-admin command and
>>> narrowed it down to slow responses for rados commands anywhere in the
>>> cluster.
>>
>> Do you have a very large amount of objects. And / or a lot of OMAP data
>> and thus large rocksdb databases? We have seen slowness (and slow ops)
>> from having very large rocksdb databases due to a lot of OMAP data
>> concentrated on only a few nodes (cephfs metadata only). You might
>> suffer from the same thing.
>>
>> Manual rocksdb compaction on the OSDs might help.
>
> In addition: Keep in mind that RADOS was never designed to list objects
> fast. The more Placement Groups you have, the slower a listing will be.
>
> Wido
>
>>
>> Gr. Stefan
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Infiniband support

2020-08-27 Thread Max Krasilnikov
Good day!

 Wed, Aug 26, 2020 at 10:08:57AM -0300, quaglio wrote: 

>Hi,
> I could not see in the doc if Ceph has infiniband support. Is there
>someone using it?
> Also, is there any rdma support working natively?
> 
> Can anyone point me where to find more information about it?

We're using it as a RoCE LAG on Nautilus. Looking back over one year, there is no
visible difference compared with TCP mode. Moreover, rbd can't use RDMA, but rbd
is our primary use case. So when creating a new cluster I will not configure RDMA
in the future, until things change.
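
(For completeness: switching the messenger to RDMA is done via ceph.conf options
roughly like the sketch below; the device name is only an example, and kernel
clients such as krbd will still speak TCP.)

[global]
    ms_type = async+rdma
    ms_async_rdma_device_name = mlx5_0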
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] export administration regulations issue for ceph community edition

2020-08-27 Thread Peter Parker
Does anyone know of any new statements from the ceph community or foundation 
regarding EAR?
I read the legal page of ceph.com, which mentions some information.
https://ceph.com/legal-page/terms-of-service/

But I am still not sure whether, if my clients and I are within the scope of the 
entity list, the use of the Ceph community edition complies with the corresponding 
laws, and whether it affects the current release and future releases.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rados df with nautilus / bluestore

2020-08-27 Thread Manuel Lausch
Hi

we found a very ugly issue with "rados df"

I have several clusters, all running Ceph Nautilus (14.2.11). We have
replicated pools there with replica size 4.

On the older clusters "rados df" shows the net used space in the used
column. On our new cluster, rados df shows the gross used space in the
used column.

The older clusters were upgraded from Luminous (and earlier) and use
filestore, while the new cluster was initially deployed with Nautilus and
bluestore.

Why are the outputs different? Is this related to Nautilus or to
bluestore? For our reporting these values are quite significant, and now
I am running into these discrepancies.

What commands/metrics can I use to get more reliable values? Maybe "ceph
df detail"?


Manuel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RandomCrashes on OSDs Attached to Mon Hosts with Octopus

2020-08-27 Thread Denis Krienbühl
Hi Igor,

Thanks for your input. I tried to gather as much information as I could to
answer your questions. Hopefully we can get to the bottom of this.

> 0) What is backing disks layout for OSDs in question (main device type?, 
> additional DB/WAL devices?).

Everything is on a single Intel NVMe P4510 using dmcrypt with 2 OSDs per NVMe
device. There is no additional DB/WAL device and there are no HDDs involved.

Also note that we use 40 OSDs per host with a memory target of 6'174'015'488.

> 1) Please check all the existing logs for OSDs at "failing" nodes for other 
> checksum errors (as per my comment #38)

I grepped the logs for "checksum mismatch" and "_verify_csum". The only
occurrences I could find were the ones that precede the crashes.

> 2) Check if BlueFS spillover is observed for any failing OSDs.

As everything is on the same device, there can be no spillover, right?

> 3) Check "bluestore_reads_with_retries" performance counters for all OSDs at 
> nodes in question. See comments 38-42 on the details. Any non-zero values?

I monitored this overnight by repeatedly polling this performance counter over
all OSDs on the mons. Only one OSD, which has crashed in the past, has had a
value of 1 since I started measuring. All the other OSDs, including the ones
that crashed over night, have a value of 0. Before and after the crash.
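
(The counter can be read per OSD through the admin socket, e.g. for a
hypothetical osd.12:)

# ceph daemon osd.12 perf dump | grep reads_with_retries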

> 4) Start monitoring RAM usage and swapping for these nodes. Comment 39.

The memory use of those nodes is pretty constant with ~6GB free, ~25GB available 
of 256GB.
There are also only a handful of pages being swapped, if at all.

> a hypothesis why mon hosts are affected only  - higher memory utilization at 
> these nodes is what causes disk reading failures to appear. RAM leakage (or 
> excessive utilization) in MON processes or something?

Since the memory usage is rather constant I'm not sure this is the case, I think
we would see more of an up/down pattern. However we are not yet monitoring all
processes, and that would be something I'd like to get some data on, but I'm not
sure this is the right course of action at the moment.

What do you think, is it still plausible that we see a memory utilization
problem, even though there's little variance in the memory usage patterns?

The approaches we are currently considering are to upgrade our kernel and to
lower the memory target somewhat.
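
(For the memory target, I assume that would be set centrally through the config
database, something along these lines, with 4 GiB as an example value:)

# ceph config set osd osd_memory_target 4294967296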

Cheers,

Denis


> On 26 Aug 2020, at 15:29, Igor Fedotov  wrote:
> 
> Hi Denis,
> 
> this reminds me the following ticket: https://tracker.ceph.com/issues/37282
> 
> Please note they mentioned co-location with mon in comment #29.
> 
> 
> Working hypothesis for this ticket is the interim disk read failures which 
> cause RocksDB checksum failures. Earlier we observed such a problem for main 
> device. Presumably it's heavy memory pressure which causes kernel to be 
> failing this way.  See my comment #38 there.
> 
> So I'd like to see answers/comments for the following questions:
> 
> 0) What is backing disks layout for OSDs in question (main device type?, 
> additional DB/WAL devices?).
> 
> 1) Please check all the existing logs for OSDs at "failing" nodes for other 
> checksum errors (as per my comment #38)
> 
> 2) Check if BlueFS spillover is observed for any failing OSDs.
> 
> 3) Check "bluestore_reads_with_retries" performance counters for all OSDs at 
> nodes in question. See comments 38-42 on the details. Any non-zero values?
> 
> 4) Start monitoring RAM usage and swapping for these nodes. Comment 39.
> 
> 
> Thanks,
> 
> Igor
> 
> 
> 
> 
> 
> 
> On 8/26/2020 3:47 PM, Denis Krienbühl wrote:
>> Hi!
>> 
>> We've recently upgraded all our clusters from Mimic to Octopus (15.2.4). 
>> Since
>> then, our largest cluster is experiencing random crashes on OSDs attached to 
>> the
>> mon hosts.
>> 
>> This is the crash we are seeing (cut for brevity, see links in post 
>> scriptum):
>> 
>>    {
>>    "ceph_version": "15.2.4",
>>    "utsname_release": "4.15.0-72-generic",
>>    "assert_condition": "r == 0",
>>    "assert_func": "void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)",
>>    "assert_file": "/build/ceph-15.2.4/src/os/bluestore/BlueStore.cc",
>>    "assert_line": 11430,
>>    "assert_thread_name": "bstore_kv_sync",
>>    "assert_msg": "/build/ceph-15.2.4/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7fc56311a700 time 2020-08-26T08:52:24.917083+0200\n/build/ceph-15.2.4/src/os/bluestore/BlueStore.cc: 11430: FAILED ceph_assert(r == 0)\n",
>>    "backtrace": [
>>    "(()+0x12890) [0x7fc576875890]",
>>    "(gsignal()+0xc7) [0x7fc575527e97]",
>>    "(abort()+0x141) [0x7fc575529801]",
>>    "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a5) [0x559ef9ae97b5]",
>>    "(ceph::__ceph_assertf_fail(char const*, char 

[ceph-users] How To Configure Bellsouth Email Settings in a Right Way?

2020-08-27 Thread sofi Hayat
It is required to use the right server and port settings to enjoy all the 
benefits of the Bellsouth email service. It is also recommended everyone to 
configure Bellsouth Email Settings and correctly and appropriately. There are 
few users unable to setup Bellsouth email on Android phone, iPhone, or computer 
device. For such helpless candidates, we provide helpline number by which they 
connect with top-most technicians for quality assistance. Once you contact to 
tech-savvy, your Bellsouth email settings will easily be configured in a 
second.   
https://www.emailsupport.us/blog/bellsouth-email-settings-for-outlook/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Add OSD with primary on HDD, WAL and DB on SSD

2020-08-27 Thread Zhenshi Zhou
The official documentation says that you should allocate 4% of the slow device
space for block.db. But the main problem is that BlueStore uses RocksDB, and
RocksDB puts a level on the fast device only if it thinks that the whole level
will fit there. For RocksDB, L1 is about 300M, L2 is about 3G, L3 is near 30G,
and L4 is about 300G. For instance, RocksDB puts L2 files on block.db only if
there is at least 3G available there.
As a result, 30G is an acceptable value.
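
(As a concrete sketch of that sizing, assuming a hypothetical VG named vg_nvme
on the fast device and a data disk at /dev/sdd, the DB LV and the OSD could be
created roughly like this; names and sizes are illustrative only.)

# lvcreate -L 30G -n db-sdd vg_nvme
# ceph-volume lvm create --bluestore --data /dev/sdd --block.db vg_nvme/db-sdd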

Tony Liu  wrote on Tue, Aug 25, 2020 at 10:49 AM:

> > -Original Message-
> > From: Anthony D'Atri 
> > Sent: Monday, August 24, 2020 7:30 PM
> > To: Tony Liu 
> > Subject: Re: [ceph-users] Re: Add OSD with primary on HDD, WAL and DB on
> > SSD
> >
> > Why such small HDDs?  Kinda not worth the drive bays and power, instead
> > of the complexity of putting WAL+DB on a shared SSD, might you have been
> > able to just buy SSDs and not split? ymmv.
>
> 2TB is for testing, it will bump up to 10TB for production.
>
> > The limit is a function of the way the DB levels work, it’s not
> > intentional.
> >
> > WAL by default takes a fixed size, like 512 MB or something.
> >
> > 64 GB is a reasonable size, it accomodates the WAL and allows space for
> > DB compaction without overflowing.
>
> For each 10TB HDD, what's the recommended DB device size for both
> DB and WAL? The doc recommends 1% - 4%, meaning 100GB - 400GB for
> each 10TB HDD. But given the WAL data size and DB data size, I am
> not sure if that 100GB - 400GB will be used efficiently.
>
> > With this commit the situation should be improved, though you don’t
> > mention what release you’re running
> >
> > https://github.com/ceph/ceph/pull/29687
>
> I am using ceph version 15.2.4 octopus (stable).
>
> Thanks!
> Tony
>
> > >>>  I don't need to create
> > >>> WAL device, just primary on HDD and DB on SSD, and WAL will be using
> > >>> DB device cause it's faster. Is that correct?
> > >>
> > >> Yes.
> > >>
> > >>
> > >> But be aware that the DB sizes are limited to 3GB, 30GB and 300GB.
> > >> Anything less than those sizes will have a lot of unutilised space,
> > >> e.g. a 20GB device will only utilise 3GB.
> > >
> > > I have 1 480GB SSD and 7 2TB HDDs. 7 LVs are created on SSD, each is
> > > about 64GB, for 7 OSDs.
> > >
> > > Since it's shared by DB and WAL, DB will take 30GB and WAL will take
> > > the rest 34GB. Is that correct?
> > >
> > > Is that size of DB and WAL good for 2TB HDD (block store and object
> > > store cases)?
> > >
> > > Could you share a bit more about the intention of such limit?
> > >
> > >
> > > Thanks!
> > > Tony
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> > > email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io