[ceph-users] Re: have buckets with low number of shards

2021-11-23 Thread DHilsbos
Manoosh;

You can't reshard a bucket without downtime.  During a reshard RGW creates new 
RADOS index objects to match the new shard number.  Then all of the bucket's 
index entries are copied from the old RADOS index objects to the new ones, and 
the original index objects are destroyed.  The reshard locks the bucket for the 
duration.
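
For reference, a manual reshard looks roughly like this (a sketch; the bucket 
name and target shard count are placeholders you need to pick for your 
environment):

# kick off a manual reshard; the bucket is locked while this runs
radosgw-admin bucket reshard --bucket=<bucket-name> --num-shards=<new-shard-count>
# check pending/ongoing reshards
radosgw-admin reshard list
radosgw-admin reshard status --bucket=<bucket-name>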

Thank you,

Dominic L. Hilsbos, MBA
Vice President - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


-Original Message-
From: mahnoosh shahidi [mailto:mahnooosh@gmail.com] 
Sent: Tuesday, November 23, 2021 8:20 AM
To: Josh Baergen
Cc: Ceph Users
Subject: [ceph-users] Re: have buckets with low number of shards

Hi Josh

Thanks for your response. Do you have any advice on how to reshard these big
buckets so it doesn't cause any downtime in our cluster? Resharding these
buckets causes a lot of slow ops during the delete-old-shards phase, and the
cluster can't respond to any requests until resharding is completely done.

Regards,
Mahnoosh

On Tue, Nov 23, 2021, 5:28 PM Josh Baergen 
wrote:

> Hey Mahnoosh,
>
> > We are running a cluster on Octopus 15.2.12. We have a big bucket with about
> > 800M objects, and resharding this bucket causes many slow ops on our bucket
> > index osds. I want to know what happens if I don't reshard this bucket any
> > more. How does it affect the performance? Would the performance problem be
> > only for that bucket, or would it affect the entire bucket index pool?
>
> Unfortunately, if you don't reshard the bucket, it's likely that
> you'll see widespread index pool performance and stability issues,
> generally manifesting as one or more OSDs becoming very busy to the
> point of holding up traffic for multiple buckets or even flapping (the
> OSD briefly gets marked down), leading to recovery. Recovering large
> index shards can itself cause issues like this to occur. Although the
> official recommendation, IIRC, is 100K objects per index shard, the
> exact objects per shard count at which one starts to experience these
> sorts of issues highly depends on the hardware involved and user
> workload.
>
> Josh
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: Best way to add multiple nodes to a cluster?

2021-11-02 Thread DHilsbos
Zakhar;

When adding nodes I usually set the following:
noin (OSDs register as up, but stay out)
norebalance (new placements shouldn't be calculated when the cluster layout 
changes; I've been bitten by this not working as expected, so I also set the flag below)
nobackfill (PGs don't move)

I then remove noin, and wait until all OSDs show as in.
Then I remove norebalance, which will cause a bunch of PGs to become misplaced, 
as it recalculates PG locations.
Finally remove nobackfill, and allow the cluster to recover completely.

The first two reach steady state very quickly.
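
As a concrete sketch, the flag handling looks like this (standard ceph CLI 
flags; adjust to your environment):

ceph osd set noin
ceph osd set norebalance
ceph osd set nobackfill
# ... add the new OSDs ...
ceph osd unset noin          # wait until all new OSDs show as in
ceph osd unset norebalance   # PG locations are recalculated; PGs become misplaced
ceph osd unset nobackfill    # data starts moving and the cluster recovers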

Thank you,

Dominic L. Hilsbos, MBA
Vice President - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com

-Original Message-
From: Zakhar Kirpichenko [mailto:zak...@gmail.com] 
Sent: Monday, November 1, 2021 11:21 PM
To: ceph-users
Subject: [ceph-users] Best way to add multiple nodes to a cluster?

Hi!

I have a 3-node 16.2.6 cluster with 33 OSDs, and plan to add another 3
nodes of the same configuration to it. What is the best way to add the new
nodes and OSDs so that I can avoid a massive rebalance and performance hit
until all new nodes and OSDs are in place and operational?

I would very much appreciate any advice.

Best regards,
Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: Rebooting one node immediately blocks IO via RGW

2021-10-25 Thread DHilsbos
Troels;

This sounds like a failure domain issue.  If I remember correctly, Ceph 
defaults to a failure domain of disk (osd), while you need a failure domain of 
host.
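
A quick way to check this (a sketch; pool and rule names will differ on your 
cluster) is to look at the CRUSH rule behind the affected pools and at their 
min_size:

# which rule does each pool use?
ceph osd pool ls detail
# look at the chooseleaf step in the rule: type "host" vs type "osd"
ceph osd crush rule dump
# replica settings for a given pool
ceph osd pool get <pool-name> size
ceph osd pool get <pool-name> min_size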

Could you do a ceph -s while one of the hosts is offline?  You're looking for 
the HEALTH_ status line, and any errors other than slow ops.

Also, what major version of Ceph are you running?

Thank you,

Dominic L. Hilsbos, MBA
Vice President - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


-Original Message-
From: Troels Hansen [mailto:t...@miracle.dk] 
Sent: Monday, October 25, 2021 12:55 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Rebooting one node immediately blocks IO via RGW

I have a strange issue.
It's a 3-node cluster, deployed on Ubuntu, on containers, running version
15.2.4, docker.io/ceph/ceph:v15

It's only running RGW, everything seems fine, and everything works.
No errors, and the cluster is healthy.

As soon as one node is restarted, all IO is blocked, apparently because of
slow ops, but I see no reason for it.

It's running as simply as possible, with a replica count of 3.

The second the OSDs on the halted node disappear I see slow ops, and it blocks
everything; there is no IO to the cluster.

The slow requests are spread across all of the remaining OSDs.

2021-10-20T05:07:02.554282+0200 mon.prodceph-mon1 [WRN] Health check
failed: 0 slow ops, oldest one blocked for 30 sec, osd.4 has slow ops
(SLOW_OPS)
2021-10-20T05:07:04.652756+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:05.585995+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084) initiated 2021-10-20T03:06:35.106815+ currently delayed
2021-10-20T05:07:05.629622+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:05.629660+0200 osd.13 [WRN] slow request
osd_op(client.394158.0:62776924 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94141521019648] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.165999+ currently delayed
2021-10-20T05:07:05.629690+0200 osd.13 [WRN] slow request
osd_op(client.305099.0:3244269 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94522369776384] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.402403+ currently delayed
2021-10-20T05:07:06.555735+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084) initiated 2021-10-20T03:06:35.106815+ currently delayed
2021-10-20T05:07:06.677696+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:06.677732+0200 osd.13 [WRN] slow request
osd_op(client.394158.0:62776924 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94141521019648] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.165999+ currently delayed
2021-10-20T05:07:06.677750+0200 osd.13 [WRN] slow request
osd_op(client.305099.0:3244269 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94522369776384] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.402403+ currently delayed
2021-10-20T05:07:07.553717+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084) initiated 2021-10-20T03:06:35.106815+ currently delayed
2021-10-20T05:07:07.643135+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:07.643159+0200 osd.13 [WRN] slow request
osd_op(client.394158.0:62776924 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94141521019648] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.165999+ currently delayed
2021-10-20T05:07:07.643175+0200 osd.13 [WRN] slow request
osd_op(client.3050

[ceph-users] Re: failing dkim

2021-10-25 Thread DHilsbos
MJ;

A lot of mailing lists "rewrite" the origin address to one that matches the 
mailing list server.
Here's an example from the Samba mailing list: "samba 
; on behalf of; Rowland Penny via samba 
".

This mailing list relays the email without modifying the sender or the 
envelope address.  For this email, you see a @performair.com email coming from 
a ceph.io (RedHat?) server.

I don't know if that's the cause, but that's a significant difference between 
this and other mailing lists.

Thank you,

Dominic L. Hilsbos, MBA
Vice President - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


-Original Message-
From: mj [mailto:li...@merit.unu.edu] 
Sent: Monday, October 25, 2021 7:10 AM
To: ceph-users
Subject: [ceph-users] failing dkim

Hi,

This is not about ceph, but about this ceph-users mailinglist.

We have recently started using DKIM/DMARC/SPF etc., and since then we 
notice that the emails from this ceph-users mailing list come with either a
- failing DKIM signature
or
- no DKIM signature
at all.

Many of the other mailing lists I am subscribed to (like postfix, samba, 
sogo) generally pass DKIM verification.

Does this say something about how this particular ceph-users mailing list 
is set up, or is there something we can do about it?

Sorry for being off-topic; please reply privately if this is not allowed 
and/or appreciated here.

MJ
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: How to make HEALTH_ERR quickly and pain-free

2021-10-25 Thread DHilsbos
MJ;

Assuming that you have a replicated pool with 3 replicas and min_size = 2, I 
would think stopping 2 OSD daemons, or 2 OSD containers would guarantee 
HEALTH_ERR.  Similarly, if you have a replicated pool with 2 replicas, still 
with min_size = 2, stopping 1 OSD should do the trick.
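
As a sketch (assuming a package-based systemd deployment; for containerized 
OSDs you would stop the corresponding containers instead, and the OSD IDs 
below are placeholders):

# stop two OSDs that back the same size=3 / min_size=2 pool
systemctl stop ceph-osd@1 ceph-osd@2
# watch the health status change
ceph -s
# revert
systemctl start ceph-osd@1 ceph-osd@2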

Thank you,

Dominic L. Hilsbos, MBA
Vice President - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


-Original Message-
From: mj [mailto:li...@merit.unu.edu] 
Sent: Saturday, October 23, 2021 4:06 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: How to make HEALTH_ERR quickly and pain-free



Op 21-01-2021 om 11:57 schreef George Shuklin:
> I have hell of the question: how to make HEALTH_ERR status for a cluster 
> without consequences?
> 
> I'm working on CI tests and I need to check if our reaction to 
> HEALTH_ERR is good. For this I need to take an empty cluster with an 
> empty pool and do something. Preferably quick and reversible.
> 
> For HEALTH_WARN the best thing I found is to change pool size to 1, it 
> raises "1 pool(s) have no replicas configured" warning almost instantly 
> and it can be reverted very quickly for empty pool.

To get HEALTH_WARN we always simply set something like noout, but we 
also wonder if there's a nice way to set HEALTH_ERR, for the same purpose.

Anyone..?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: OSD's fail to start after power loss

2021-10-13 Thread DHilsbos
Todd;

What version of ceph are you running?  Are you running containers or packages?  
Was the cluster installed manually, or using a deployment tool?

The logs provided are for OSD ID 31; is ID 31 appropriate for that server?  Have 
you verified that the ceph.conf on that server is intact and correct?

Your log snippet references /var/lib/ceph/osd/ceph-31/keyring; does this file 
exist?  Does the /var/lib/ceph/osd/ceph-31/ folder exist?  If both exist, are 
the ownership and permissions correct / appropriate?
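
A few commands that can help answer those questions (a sketch; on a 
BlueStore/LVM deployment, ceph-volume normally recreates the tmpfs-backed OSD 
directories when the OSDs are activated):

# does the OSD directory exist, and who owns it?
ls -ld /var/lib/ceph/osd/ceph-31
ls -l /var/lib/ceph/osd/ceph-31/
# which OSDs does ceph-volume know about on this host?
ceph-volume lvm list
# re-activate all OSDs found on local LVs (recreates the tmpfs mounts)
ceph-volume lvm activate --all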

Thank you,

Dominic L. Hilsbos, MBA
Vice President - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


-Original Message-
From: Orbiting Code, Inc. [mailto:supp...@orbitingcode.com] 
Sent: Wednesday, October 13, 2021 7:21 AM
To: ceph-users@ceph.io
Subject: [ceph-users] OSD's fail to start after power loss

Hello Everyone,

I have 3 OSD hosts with 12 OSD's each. After a power failure on 1 host, 
all 12 OSD's fail to start on that host. The other 2 hosts did not lose 
power, and are functioning. Obviously I don't want to restart the 
working hosts at this time. Syslog shows:

Oct 12 17:24:07 osd3 systemd[1]: 
ceph-volume@lvm-31-cae13d9a-1d3d-4003-a57f-6ffac21a682e.service: Main 
process exited, code
=exited, status=1/FAILURE
Oct 12 17:24:07 osd3 systemd[1]: 
ceph-volume@lvm-31-cae13d9a-1d3d-4003-a57f-6ffac21a682e.service: Failed 
with result 'exit-
code'.
Oct 12 17:24:07 osd3 systemd[1]: Failed to start Ceph Volume activation: 
lvm-31-cae13d9a-1d3d-4003-a57f-6ffac21a682e.

This is repeated for all 12 OSDs on the failed host. Running the 
following command shows additional errors.

root@osd3:/var/log# /usr/bin/ceph-osd -f --cluster ceph --id 31 
--setuser ceph --setgroup ceph
2021-10-12 17:50:23.117 7fce92e6ac00 -1 auth: unable to find a keyring 
on /var/lib/ceph/osd/ceph-31/keyring: (2) No such file or directory
2021-10-12 17:50:23.117 7fce92e6ac00 -1 AuthRegistry(0x55c4ec50aa40) no 
keyring found at /var/lib/ceph/osd/ceph-31/keyring, disabling cephx
2021-10-12 17:50:23.117 7fce92e6ac00 -1 auth: unable to find a keyring 
on /var/lib/ceph/osd/ceph-31/keyring: (2) No such file or directory
2021-10-12 17:50:23.117 7fce92e6ac00 -1 AuthRegistry(0x7ffe9b64eb08) no 
keyring found at /var/lib/ceph/osd/ceph-31/keyring, disabling cephx
failed to fetch mon config (--no-mon-config to skip)

No tmpfs mounts exist for any directories in /var/lib/ceph/osd/ceph-**

Any assistance helping with this situation would be greatly appreciated.

Thank you,
Todd
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: Cluster down

2021-10-13 Thread DHilsbos
Jorge;

This sounds, to me, like something to discuss with the proxmox folks.

Unless there was an IP conflict between the rebooted server and one of the 
existing mons, I can't see the ceph cluster going unavailable.  Further, I 
don't see how anything ceph related would cause hypervisors on other hosts 
to restart.

Thank you,

Dominic L. Hilsbos, MBA
Vice President - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


-Original Message-
From: Jorge JP [mailto:jorg...@outlook.es] 
Sent: Wednesday, October 13, 2021 4:07 AM
To: Marc; ceph-users@ceph.io
Subject: [ceph-users] Re: Cluster down

Hello Marc,

To add a node to the Ceph cluster with Proxmox, I first have to install Proxmox hehe; 
that is not the problem.

The configuration file has been reviewed and is correct. I understand your point, but 
this is not a configuration problem.

I can understand that the cluster can have problems if any server or any switch port is 
not configured correctly. But this server never became a member of the cluster.

I extracted a part of the log file from when Ceph went down.

A few weeks ago, I had a problem with a port configuration: removing MTU 9216 caused 
several hypervisors in the Proxmox cluster to reboot. But today's server is not related 
to the Ceph cluster; it only has public and private IPs in the same network, and the 
switch ports are not configured.


From: Marc 
Sent: Wednesday, October 13, 2021 12:49
To: Jorge JP ; ceph-users@ceph.io 
Subject: RE: Cluster down

>
> We currently have a ceph cluster in Proxmox, with 5 ceph nodes with the
> public and private network correctly configured and without problems.
> The state of ceph was optimal.
>
> We had prepared a new server to add to the ceph cluster. We did the
> first step of installing Proxmox with the same version. I was at the
> point where I was setting up the network.

I am not using proxmox, just libvirt. But I would say the most important part 
is your ceph cluster. So before doing anything I would make sure to add the 
ceph node first and then install other things.

> For this step, I did was connect by SSH to the new server and copy the
> network configuration of one of the ceph nodes to this new one. Of
> course, changing the ip addresses.

I would not copy at all. Just change the files manually. If you did not edit a 
file correctly, or the server reboots before you change the IP addresses, you can 
get into all kinds of problems.

> What happened when restarting the network service is that I lost access
> to the cluster. I couldn't access any of the 5 servers that are part of
> the  ceph cluster. Also, 2 of 3 hypervisors
> that we have in the proxmox cluster were restarted directly.

So now you know: you first have to configure networking, then ceph and then 
proxmox. Take your time adding a server. I guess the main reason you are in the 
current situation is that you tried to do it too quickly.

> Why has this happened if the new server is not yet inside the ceph
> cluster on the proxmox cluster and I don't even have the ports
> configured on my switch?

Without logs nobody is able to tell.

> Do you have any idea?
>
> I do not understand, if now I go and take any server and configure an IP
> of the cluster network and even if the ports are not even configured,
> will the cluster knock me down?

Nothing should happen if you install an OS and use ip addresses in the same 
space as your cluster/client network. Do this first.

> I recovered the cluster by phisically removing the cables from the new
> server.

So wipe it, and start over.

> Thanks a lot and sorry for my english...

No worries, your english is much better than my spanish ;)

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: Ceph cluster Sync

2021-10-12 Thread DHilsbos
Michel;

I am neither a Ceph evangelist, nor a Ceph expert, but here is my current 
understanding:
Ceph clusters do not have in-built cross cluster synchronization.  That said, 
there are several things which might meet your needs.

1) If you're just planning your Ceph deployment, then the latest release 
(Pacific) introduced the concept of a stretch cluster, essentially a cluster 
which is stretched across datacenters (i.e. a relatively low-bandwidth, 
high-latency link)[1].

2) RADOSGW allows for uni-directional as well as bi-directional synchronization 
of the data that it handles.[2]

3) RBD provides mirroring functionality for the data it handles.[3]
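
For the RBD case, a minimal sketch of what enabling mirroring looks like 
(assumptions: a pool named "rbd", snapshot-based mirroring, rbd-mirror daemons 
deployed on both clusters, and the site/image names are placeholders; see [3] 
for the full procedure):

# on the primary cluster: enable per-image mirroring on the pool
rbd mirror pool enable rbd image
# create a bootstrap token on the primary ...
rbd mirror pool peer bootstrap create --site-name site-a rbd > /tmp/bootstrap-token
# ... and import it on the DR cluster
rbd mirror pool peer bootstrap import --site-name site-b --direction rx-only rbd /tmp/bootstrap-token
# enable mirroring for an individual image
rbd mirror image enable rbd/myimage snapshot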

Thank you,

Dominic L. Hilsbos, MBA
Vice President - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com

[1] https://docs.ceph.com/en/latest/rados/operations/stretch-mode/
[2] https://docs.ceph.com/en/latest/radosgw/sync-modules/
[3] https://docs.ceph.com/en/latest/rbd/rbd-mirroring/


-Original Message-
From: Michel Niyoyita [mailto:mico...@gmail.com] 
Sent: Tuesday, October 12, 2021 8:35 AM
To: ceph-users
Subject: [ceph-users] Ceph cluster Sync

Dear team

I want to build two different clusters: one for the primary site and the second
for a DR site. I would like to ask whether these two clusters can
communicate (synchronize) with each other, so that data written to the primary
site is synchronized to the DR site, and so that if we run into trouble on the
primary site the DR site automatically takes over.

Please help me with a solution, or advise me on how to proceed.

Best Regards
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: urgent question about rdb mirror

2021-10-01 Thread DHilsbos
Ignazio;

If your first attempt at asking a question results in no responses, you might 
consider why, before reposting.

I don't use RBD mirroring, so I can only supply theoretical information.

Googling RBD mirroring (for me) results in the below as the first result:
https://docs.ceph.com/en/latest/rbd/rbd-mirroring/

You're also mixing terminology, which confuses the request, and you're overly 
complicating the request.

You ask about mirroring RBD, then you ask about mirroring pools; these are 
different.  A basic Ceph cluster doesn't have the ability to replicate, so 
pools can't replicate / mirror, but the overlay technologies might (RGW and RBD 
do).  Replication is between 2 clusters, however, so the question about 3 
clusters is confusing.

To summarize, my understanding is: You can replicate RBD from A to C (or B), 
and you can replicate B to C (or A), I don't believe that you can replicate A 
to B to C, if that's what you're asking. You CANNOT replicate -pools- on A (or 
B) to C (or any other cluster).

A final thought; when asking about specific functionality, it is usually wise 
to indicate what version you are running, as capabilities are being added all 
the time.

For instance, if you're running Pacific (16.2.x), then you might consider the 
capabilities of a stretched cluster, rather than RBD mirroring.
https://docs.ceph.com/en/pacific/rados/operations/stretch-mode/

Thank you,

Dominic L. Hilsbos, MBA
Vice President - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com

-Original Message-
From: Ignazio Cassano [mailto:ignaziocass...@gmail.com] 
Sent: Thursday, September 30, 2021 11:38 PM
To: ceph-users
Subject: [ceph-users] urgent question about rdb mirror

Hello All,
Please, I would like to know whether it is possible for two clusters to mirror rbd
to a third cluster.
In other words, I have 3 separate Ceph clusters: A, B, C.
I would like cluster A and cluster B to mirror some pools to cluster C.
Is it possible?
Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: Leader election loop reappears

2021-09-29 Thread DHilsbos
Manuel;

Reading through this mailing list this morning, I can't help but mentally 
connect your issue to Javier's issue.  In part because you're both running 
16.2.6.

Javier's issue seems to be that OSDs aren't registering public / cluster 
network addresses correctly.  His most recent message indicates that he pulled 
the OSD metadata, and found the addresses incorrect there.

I wonder if your rogue MON might have its IP addresses registered wrong.  I don't 
know how to get that metadata, but if you could, it might provide insight.  It might 
also be interesting to extract the current monmap and see what that says.
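
For what it's worth, a sketch of the commands that should show this (standard 
ceph CLI; the mon name is the one from your output):

# registered addresses and other metadata for one (or all) monitors
ceph mon metadata osd-1
ceph mon metadata
# the monmap as the cluster currently sees it
ceph mon dump
# or extract it to a file and print it
ceph mon getmap -o /tmp/monmap
monmaptool --print /tmp/monmap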

My thoughts, probably not even worth 2 cents, but there you go.

Thank you,

Dominic L. Hilsbos, MBA
Vice President - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com

-Original Message-
From: Manuel Holtgrewe [mailto:zyklenf...@gmail.com] 
Sent: Wednesday, September 29, 2021 6:43 AM
To: ceph-users
Subject: [ceph-users] Leader election loop reappears

Dear all,

I was a bit too optimistic in my previous email. It looks like the leader
election loop reappeared. I could fix it by stopping the rogue mon daemon
but I don't know how to fix it for good.

I'm running a 16.2.6 Ceph cluster on CentOS 7.9 servers (6 servers in
total). I have about 35 HDDs in each server and 4 SSDs. The servers have
about 250 GB of RAM, there is no memory pressure on any daemon. I have an
identical mirror cluster that does not have the issue (but that one does
not have its file system mounted elsewhere and is running no rgws). I have
migrated both clusters recently to cephadm and then from octopus to pacific.

The primary cluster has problems (pulled from the cluster before
fixing/restarting mon daemon):

- `ceph -s` and other commands feel pretty sluggish
- `ceph -s` shows inconsistent results in the "health" section and
"services" overview
- cephfs clients hang and after rebooting the clients, mounting is not
possible any more
- `ceph config dump` prints "monclient: get_monmap_and_config failed to get
config"
- I have a mon leader election loop shown in its journalctl output on the
bottom.
- the primary mds daemon says things like "skipping upkeep work because
connection to Monitors appears laggy" and "ms_deliver_dispatch: unhandled
message 0x55ecdec1d340 client_session(request_renewcaps seq 88463) from
client.60591566 v1:172.16.59.39:0/3197981635" in their journalctl output

I tried to reboot the client that is supposedly not reacting to cache
pressure but that did not help either. The servers are connected to the
same VLT switch pair and use LACP 2x40GbE for cluster and 2x10GbE for
public network. I have disabled firewalld on the nodes but that did not fix
the problem either. I suspect that "laggy monitors" are caused more
probable on the software side than on the network side.

I took down the rogue mon.osd-1 with `docker stop` and it looks like the
problem disappears then.

To summarize: I suspect the cause to be connected to the mon daemons. I
have found that similar problems have been reported a couple of times.

What is the best way forward? It seems that the general suggestion for such
cases is to just "ceph orch redeploy mon", so I did this.

Is there any way to find out the root cause to get rid of it?

Best wishes,
Manuel

osd-1 # ceph -s
  cluster:
id: 55633ec3-6c0c-4a02-990c-0f87e0f7a01f
health: HEALTH_WARN
1 clients failing to respond to cache pressure
1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
Low space hindering backfill (add storage if this doesn't
resolve itself): 5 pgs backfill_toofull

  services:
mon: 5 daemons, quorum  (age 4h), out of quorum: osd-1, osd-2, osd-5,
osd-4, osd-3
mgr: osd-4.oylrhe(active, since 2h), standbys: osd-1, osd-3,
osd-5.jcfyqe, osd-2
mds: 1/1 daemons up, 1 standby
osd: 180 osds: 180 up (since 4h), 164 in (since 6h); 285 remapped pgs
rgw: 12 daemons active (6 hosts, 2 zones)

  data:
volumes: 1/1 healthy
pools:   14 pools, 5322 pgs
objects: 263.18M objects, 944 TiB
usage:   1.4 PiB used, 639 TiB / 2.0 PiB avail
pgs: 25576348/789544299 objects misplaced (3.239%)
 5026 active+clean
 291  active+remapped+backfilling
 5active+remapped+backfill_toofull

  io:
client:   165 B/s wr, 0 op/s rd, 0 op/s wr
recovery: 2.3 GiB/s, 652 objects/s

  progress:
Global Recovery Event (53m)
  [==..] (remaining: 3m)

osd-1 # ceph health detail
HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons down,
quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage
if this doesn't resolve itself): 5 pgs backfill_toofull
[WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to
respond to cache pressure client_id: 56229355
[WRN] MON_DOWN: 1/5 mons down, quorum osd-1,

[ceph-users] Re: The reason of recovery_unfound pg

2021-08-20 Thread DHilsbos
Satoru;

Ok.  What your cluster is telling you, then, is that it doesn't know which 
replica is the "most current" or "correct" replica.  You will need to determine 
that, and let ceph know which one to use as the "good" replica.  Unfortunately, 
I can't help you with this.  In fact, if this is critical data, I'd seriously 
consider engaging a contractor to help you recover the data, and help your 
cluster return to a fully operational status.
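
For completeness, a sketch of the commands usually involved in deciding this 
(pg IDs are placeholders taken from ceph health detail; the mark_unfound_lost 
commands discard data, so treat them strictly as a last resort):

# which PGs have unfound objects, and which objects are they?
ceph health detail
ceph pg <pgid> query
ceph pg <pgid> list_unfound
# last resort only: tell Ceph to give up on the unfound objects
# "revert" rolls back to a previous version if one exists, "delete" forgets them
ceph pg <pgid> mark_unfound_lost revert
ceph pg <pgid> mark_unfound_lost delete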

I have found it helpful to set noout, and norebalance, when I intend to reboot 
or offline any of my OSDs.

It's also critical to allow the cluster to return to a cluster state of 
HEALTH_OK in between reboots.

Thank you,

Dominic L. Hilsbos, MBA
Vice President – Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


From: Satoru Takeuchi [mailto:satoru.takeu...@gmail.com] 
Sent: Friday, August 20, 2021 2:48 PM
To: Dominic Hilsbos
Cc: ceph-users
Subject: Re: [ceph-users] Re: The reason of recovery_unfound pg

Hi Dominic,

On Sat, Aug 21, 2021 at 1:05 :
Satoru;

You said " after restarting all nodes one by one."  After each reboot, did you 
allow the cluster the time necessary to come back to a "HEALTH_OK" status?


No, we rebooted with the following policy.

1. Reboot one machine.
2. Wait until the reboot completes at the Kubernetes level (not the Ceph cluster level).
3. If there are other nodes to be rebooted, go to step 1.

I should have explained this logic to you as well.
I realize now that the above logic is wrong and that I should wait for the cluster to come back to HEALTH_OK.
Unfortunately I don't understand the meaning of the pg states well, and there seem
to be several states which mean "pg might be lost".

https://docs.ceph.com/en/latest/rados/operations/pg-states/

Could you tell me whether a pg can end up in the `recovery_unfound` state in this case?

Thanks,
Satoru
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: The reason of recovery_unfound pg

2021-08-20 Thread DHilsbos
Satoru;

You said " after restarting all nodes one by one."  After each reboot, did you 
allow the cluster the time necessary to come back to a "HEALTH_OK" status?

Thank you,

Dominic L. Hilsbos, MBA
Vice President – Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


-Original Message-
From: Satoru Takeuchi [mailto:satoru.takeu...@gmail.com] 
Sent: Friday, August 20, 2021 8:37 AM
To: ceph-users
Subject: [ceph-users] Re: The reason of recovery_unfound pg

On Sat, Aug 21, 2021 at 0:25, Satoru Takeuchi wrote:
...
> # additional information
>

I forgot to include an important piece of information: all my data has 3 copies.

Regards,
Satoru
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Luminous won't fully recover

2021-07-23 Thread DHilsbos
Sean;

These lines look bad:
14 scrub errors
Reduced data availability: 2 pgs inactive
Possible data damage: 8 pgs inconsistent
osd.95 (root=default,host=hqosd8) is down

I suspect you ran into a hardware issue with one or more drives in some of the 
servers that did not go offline.

osd.95 is offline, you need to resolve this.
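
Once osd.95 is back (or removed from the cluster), the scrub errors can be 
inspected and repaired; a sketch of the usual commands (pg IDs are placeholders, 
take them from ceph health detail):

# find the inconsistent PGs
ceph health detail | grep inconsistent
# see exactly which objects/shards disagree
rados list-inconsistent-obj <pgid> --format=json-pretty
# ask the primary OSD to repair the PG
ceph pg repair <pgid>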

You should fix your tunables, when you can (probably not part of your current 
issues).

Thank you,

Dominic L. Hilsbos, MBA 
Vice President – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-Original Message-
From: Shain Miley [mailto:smi...@npr.org] 
Sent: Friday, July 23, 2021 10:48 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Luminous won't fully recover

We recently had a few Ceph nodes go offline, which required a reboot.  I have 
been able to get the cluster back to the state listed below; however, it does not 
seem like it will progress past the point of 23473/287823588 objects misplaced.



Yesterday about 13% of the data was misplaced; however, this morning it has 
gotten to 0.008% and has not moved past this point in about an hour.



Does anyone see anything in the output below that points to the problem and/or 
are there any suggestions that I can follow in order to figure out why the 
cluster health is not moving beyond this point?





---

root@rbd1:~# ceph -s
  cluster:
    id: 504b5794-34bd-44e7-a8c3-0494cf800c23
    health: HEALTH_ERR
            crush map has legacy tunables (require argonaut, min is firefly)
            23473/287823588 objects misplaced (0.008%)
            14 scrub errors
            Reduced data availability: 2 pgs inactive
            Possible data damage: 8 pgs inconsistent

  services:
    mon: 3 daemons, quorum hqceph1,hqceph2,hqceph3
    mgr: hqceph2(active), standbys: hqceph3
    osd: 288 osds: 270 up, 270 in; 2 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   17 pools, 9411 pgs
    objects: 95.95M objects, 309TiB
    usage:   936TiB used, 627TiB / 1.53PiB avail
    pgs:     0.021% pgs not active
             23473/287823588 objects misplaced (0.008%)
             9369 active+clean
             30   active+clean+scrubbing+deep
             8    active+clean+inconsistent
             2    activating+remapped
             2    active+clean+scrubbing

  io:
    client:   1000B/s rd, 0B/s wr, 0op/s rd, 0op/s wr

root@rbd1:~# ceph health detail
HEALTH_ERR crush map has legacy tunables (require argonaut, min is firefly); 1 
osds down; 23473/287823588 objects misplaced (0.008%); 14 scrub errors; Reduced 
data availability: 3 pgs inactive, 13 pgs peering; Possible data damage: 8 pgs 
inconsistent; Degraded data redundancy: 408658/287823588 objects degraded 
(0.142%), 38 pgs degraded
OLD_CRUSH_TUNABLES crush map has legacy tunables (require argonaut, min is firefly)
    see http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables
OSD_DOWN 1 osds down
    osd.95 (root=default,host=hqosd8) is down
OBJECT_MISPLACED 23473/287823588 objects misplaced (0.008%)
OSD_SCRUB_ERRORS 14 scrub errors
PG_AVAILABILITY Reduced data availability: 3 pgs inactive, 13 pgs peering
    pg 3.b41 is stuck peering for 106.682058, current state peering, last acting [204,190]
    pg 3.c33 is stuck peering for 103.403643, current state peering, last acting [228,274]
    pg 3.d15 is stuck peering for 128.537454, current state peering, last acting [286,24]
    pg 3.fa9 is stuck peering for 106.526146, current state peering, last acting [286,47]
    pg 3.fb7 is stuck peering for 105.878878, current state peering, last acting [62,97]
    pg 3.13a2 is stuck peering for 106.491138, current state peering, last acting [270,219]
    pg 3.1521 is stuck inactive for 170180.165265, current state activating+remapped, last acting [94,186,188]
    pg 3.1565 is stuck peering for 106.782784, current state peering, last acting [121,60]
    pg 3.157c is stuck peering for 128.557448, current state peering, last acting [128,268]
    pg 3.1744 is stuck peering for 106.639603, current state peering, last acting [192,142]
    pg 3.1ac8 is stuck peering for 127.839550, current state peering, last acting [221,190]
    pg 3.1e24 is stuck peering for 128.201670, current state peering, last acting [118,158]
    pg 3.1e46 is stuck inactive for 169121.764376, current state activating+remapped, last acting [87,199,170]
    pg 18.36 is stuck peering for 128.554121, current state peering, last acting [204]
    pg 21.1ce is stuck peering for 106.582584, current state peering, last acting [266,192]
PG_DAMAGED Possible data damage: 8 pgs inconsistent
    pg 3.1ca is active+clean+inconsistent, acting [201,8,180]
    pg 3.56a is active+clean+inconsistent, acting [148,240,8]
    pg 3.b0f is active+clean+inconsistent, acting [148,260,8]
    pg 3.b56 is active+clean+inconsistent, acting [218,8,240]

[ceph-users] Re: Issue with Nautilus upgrade from Luminous

2021-07-09 Thread DHilsbos
Suresh;

I don't believe we use tunables, so I'm not terribly familiar with them.

A quick Google search ("ceph tunable") supplied the following pages:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.2.3/html/storage_strategies/crush_tunables
https://docs.ceph.com/en/latest/rados/operations/crush-map/#tunables
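
If it comes to actually changing the profile, the relevant command is the one 
below (hedged: raising the CRUSH tunables profile can trigger a significant 
amount of data movement, so plan for it):

# show the current profile
ceph osd crush show-tunables
# switch the crush map to the hammer profile (clears the legacy-tunables
# warning at that level, but expect rebalancing while PGs are remapped)
ceph osd crush tunables hammer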

Thank you,

Dominic L. Hilsbos, MBA 
Vice President - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-Original Message-
From: Suresh Rama [mailto:sstk...@gmail.com] 
Sent: Thursday, July 8, 2021 7:25 PM
To: ceph-users
Subject: [ceph-users] Issue with Nautilus upgrade from Luminous

Dear All,

We have 13 Ceph clusters and we started upgrading them one by one from Luminous
to Nautilus. Post upgrade we started fixing the warning alerts and ran into an
issue: setting "ceph config set mon mon_crush_min_required_version firefly" yielded
no results.  We updated the mon config and restarted the daemons, but the warning
didn't go away.

I have also tried setting it to hammer, to no avail.  The warning is still
there.  Do you have any recommendations?  I thought of changing it to
hammer so I can use straw2, but I was stuck with the warning message.  I have
also bounced the nodes and the issue remains the same.

Please review and share your inputs.

  cluster:
id: xxx
health: HEALTH_WARN
crush map has legacy tunables (require firefly, min is hammer)
1 pools have many more objects per pg than average
15252 pgs not deep-scrubbed in time
21399 pgs not scrubbed in time
clients are using insecure global_id reclaim
mons are allowing insecure global_id reclaim
3 monitors have not enabled msgr2


ceph daemon mon.$(hostname -s) config show |grep -i
mon_crush_min_required_version
"mon_crush_min_required_version": "firefly",

ceph osd crush show-tunables
{
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"chooseleaf_stable": 0,
"straw_calc_version": 1,
"allowed_bucket_algs": 22,
"profile": "firefly",
"optimal_tunables": 0,
"legacy_tunables": 0,
"minimum_required_version": "firefly",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 0,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 0,
"require_feature_tunables5": 0,
"has_v5_rules": 0
}

ceph config dump
WHO   MASK LEVELOPTION VALUE   RO
  mon  advanced mon_crush_min_required_version firefly *

ceph versions
{
"mon": {
"ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351)
nautilus (stable)": 3
},
"mgr": {
"ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351)
nautilus (stable)": 3
},
"osd": {
"ceph version 14.2.21 (5ef401921d7a88aea18ec7558f7f9374ebd8f5a6)
nautilus (stable)": 549,
"ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351)
nautilus (stable)": 226
},
"mds": {},
"rgw": {
"ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351)
nautilus (stable)": 2
},
"overall": {
"ceph version 14.2.21 (5ef401921d7a88aea18ec7558f7f9374ebd8f5a6)
nautilus (stable)": 549,
"ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351)
nautilus (stable)": 234
}
}

ceph -s
  cluster:
id:xx
health: HEALTH_WARN
crush map has legacy tunables (require firefly, min is hammer)
1 pools have many more objects per pg than average
13811 pgs not deep-scrubbed in time
19994 pgs not scrubbed in time
clients are using insecure global_id reclaim
mons are allowing insecure global_id reclaim
3 monitors have not enabled msgr2

  services:
mon: 3 daemons, quorum
pistoremon-ho-c01,pistoremon-ho-c02,pistoremon-ho-c03 (age 24s)
mgr: pistoremon-ho-c02(active, since 2m), standbys: pistoremon-ho-c01,
pistoremon-ho-c03
osd: 800 osds: 775 up (since 105m), 775 in
rgw: 2 daemons active (pistorergw-ho-c01, pistorergw-ho-c02)

  task status:

  data:
pools:   28 pools, 27336 pgs
objects: 107.19M objects, 428 TiB
usage:   1.3 PiB used, 1.5 PiB / 2.8 PiB avail
pgs: 27177 active+clean
 142   active+clean+scrubbing+deep
 17active+clean+scrubbing

  io:
client:   220 MiB/s rd, 1.9 GiB/s wr, 7.07k op/s rd, 25.42k op/s wr

-- 
Regards,
Suresh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: rgw multisite sync not syncing data, error: RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote data log shards

2021-06-25 Thread DHilsbos
Christian;

Do the second site's RGW instance(s) have access to the first site's OSDs?  Is 
the reverse true?

It's been a while since I set up the multi-site sync between our clusters, but 
I seem to remember that, while metadata is exchanged RGW1<-->RGW2, data is 
exchanged OSD1<-->RGW2.

Anyone else on the list, PLEASE correct me if I'm wrong.

Thank you,

Dominic L. Hilsbos, MBA 
Vice President – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Christian Rohmann [mailto:christian.rohm...@inovex.de] 
Sent: Friday, June 25, 2021 9:25 AM
To: ceph-users@ceph.io
Subject: [ceph-users] rgw multisite sync not syncing data, error: 
RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote data log 
shards

Hey ceph-users,


I setup a multisite sync between two freshly setup Octopus clusters.
In the first cluster I created a bucket with some data just to test the 
replication of actual data later.

I then followed the instructions on 
https://docs.ceph.com/en/octopus/radosgw/multisite/#migrating-a-single-site-system-to-multi-site
 
to add a second zone.

Things went well and both zones are now happily reaching each other and 
the API endpoints are talking.
Also the metadata is in sync already - both sides are happy and I can 
see bucket listings and users are "in sync":


> # radosgw-admin sync status
>   realm 13d1b8cb-dc76-4aed-8578-2ce5d3d010e8 (obst)
>   zonegroup 17a06c15-2665-484e-8c61-cbbb806e11d2 (obst-fra)
>    zone 6d2c1275-527e-432f-a57a-9614930deb61 (obst-rgn)
>   metadata sync no sync (zone is master)
>   data sync source: c07447eb-f93a-4d8f-bf7a-e52fade399f3 (obst-az1)
>     init
>     full sync: 128/128 shards
>     full sync: 0 buckets to sync
>     incremental sync: 0/128 shards
>     data is behind on 128 shards
>     behind shards: [0...127]
>

and on the other side ...

> # radosgw-admin sync status
>   realm 13d1b8cb-dc76-4aed-8578-2ce5d3d010e8 (obst)
>   zonegroup 17a06c15-2665-484e-8c61-cbbb806e11d2 (obst-fra)
>    zone c07447eb-f93a-4d8f-bf7a-e52fade399f3 (obst-az1)
>   metadata sync syncing
>     full sync: 0/64 shards
>     incremental sync: 64/64 shards
>     metadata is caught up with master
>   data sync source: 6d2c1275-527e-432f-a57a-9614930deb61 (obst-rgn)
>     init
>     full sync: 128/128 shards
>     full sync: 0 buckets to sync
>     incremental sync: 0/128 shards
>     data is behind on 128 shards
>     behind shards: [0...127]
>


Also the newly created buckets (read: their metadata) are synced.



What is apparently not working is the sync of actual data.

Upon startup the radosgw on the second site shows:

> 2021-06-25T16:15:06.445+ 7fe71eff5700  1 RGW-SYNC:meta: start
> 2021-06-25T16:15:06.445+ 7fe71eff5700  1 RGW-SYNC:meta: realm 
> epoch=2 period id=f4553d7c-5cc5-4759-9253-9a22b051e736
> 2021-06-25T16:15:11.525+ 7fe71dff3700  0 
> RGW-SYNC:data:sync:init_data_sync_status: ERROR: failed to read remote 
> data log shards
>

also when issuing

# radosgw-admin data sync init --source-zone obst-rgn

it throws

> 2021-06-25T16:20:29.167+ 7f87c2aec080 0 
> RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote data 
> log shards





Does anybody have any hints on where to look for what could be broken here?

Thanks a bunch,
Regards


Christian





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Strategy for add new osds

2021-06-15 Thread DHilsbos
Personally, when adding drives like this, I set noin (ceph osd set noin), and 
norebalance (ceph osd set norebalance).  Like your situation, we run smaller 
clusters; our largest cluster only has 18 OSDs.

That keeps the cluster from starting data moves until all new drives are in 
place.  Don't forget to unset these values (ceph osd unset noin, ceph osd unset 
norebalance).

There are also values you can tune to control whether user traffic or recovery 
traffic gets precedence while data is moving.
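
For example (a sketch; these are the standard OSD recovery/backfill knobs, and 
the right values depend on your hardware and release):

# throttle backfill/recovery so client IO keeps priority
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1
ceph config set osd osd_recovery_sleep 0.1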

Thank you,

Dominic L. Hilsbos, MBA 
Vice President - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-Original Message-
From: Kai Börnert [mailto:kai.boern...@posteo.de] 
Sent: Tuesday, June 15, 2021 8:20 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Strategy for add new osds

Hi,

as far as I understand it,

you get no real benefit from doing them one by one, as each OSD addition can 
cause a lot of data to be moved to a different OSD, even though you just 
rebalanced it.

The algorithm determining the placement of PGs does not take the 
current/historic placement into account, so each change 
could cause any amount of data to migrate.

Greetings,

Kai

On 6/15/21 5:06 PM, Jorge JP wrote:
> Hello,
>
> I have a ceph cluster with 5 nodes (1 hdd each node). I want to add 5 more 
> drives (hdd) to expand my cluster. What is the best strategy for this?
>
> I will add one drive to each node, but is it a good strategy to add one drive and 
> wait for the data to rebalance to the new OSD before adding the next one? Or maybe 
> I should add all 5 drives without waiting for rebalancing, and let Ceph rebalance 
> the data to all the new OSDs?
>
> Thank you.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-02 Thread DHilsbos
Only if you also look at why containers are bad in general, as that applies to 
ceph as well.

Dominic L. Hilsbos, MBA 
Vice President - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-Original Message-
From: Fox, Kevin M [mailto:kevin@pnnl.gov] 
Sent: Wednesday, June 2, 2021 9:27 AM
To: Sasha Litvak; ceph-users@ceph.io
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph 
deployments

Debating containers vs packages is like debating systemd vs initrd. There are 
lots of reasons why containers (and container orchestration) are good for 
deploying things, including ceph. Repeating them in each project every time it 
comes up is not really productive. I'd recommend looking at why containers are 
good in general. It applies to ceph too.


From: Sasha Litvak 
Sent: Wednesday, June 2, 2021 7:56 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph 
deployments

Check twice before you click! This email originated from outside PNNL.


Is there a link to the talk I can use as a reference?  I would like to
look at the pro-container points, as this post is getting a little bit one
sided.  I understand that most people prefer things to be stable, especially
with the underlying storage systems.  To me personally, use of
containers in general adds great flexibility because it
detaches the underlying OS from the running software.  All the points about
adding complexity to a complex system are fair, but one thing is missing.
Every time developers decide to introduce some new more efficient libraries
or frameworks we hit a distribution dependency hell.  Because of that, ceph
sometimes abandons entire OS versions before their actual lifetime is
over.  My resources are limited and  I don't want to debug / troubleshoot
/ upgrade OS in addition to ceph itself, hence the containers.  Yes  it
took me a while to warm up to the idea in general but now I don't even
think too much about it.  I went from Nautilus to Pacific (Centos 7 to
Centos 8) within a few hours without needing to upgrade my Ubuntu bare
metal nodes.

This said,  I am for giving people a choice to use packages + ansible /
manual install and also allowing manual install of containers.  Forcing
users' hands too much may make people avoid upgrading their ceph clusters.


On Wed, Jun 2, 2021 at 9:27 AM Dave Hall  wrote:

> I'd like to pick up on something that Matthew alluded to, although what I'm
> saying may not be popular.  I agree that containers are compelling for
> cloud deployment and application scaling, and we can all be glad that
> container technology has progressed from the days of docker privilege
> escalation.  I also agree that for Ceph users switching from native Ceph
> processes to containers carries a learning curve that could be as
> intimidating as learning Ceph itself.
>
> However, here is where I disagree with containerized Ceph:  I worked for 19
> years as a software developer for a major world-wide company.  In that
> time, I noticed that most of the usability issues experienced by customers
> were due to the natural and understandable tendency for software developers
> to program in a way that's easy for the programmer, and in the process to
> lose sight of what's easy for the user.
>
> In the case of Ceph, containers make it easier for the developers to
> produce and ship releases.  It reduces dependency complexities and testing
> time.  But the developers aren't out in the field with their deployments
> when something weird impacts a cluster and the standard approaches don't
> resolve it.  And let's face it:  Ceph is a marvelously robust solution for
> large scale storage, but it is also an amazingly intricate matrix of
> layered interdependent processes, and you haven't got all of the bugs
> worked out yet.
>
> Just note that the beauty of software (or really of anything that is
> 'designed') is that a few people (developers) can produce something that a
> large number of people (storage administrators, or 'users') will want to
> use.
>
> Please remember the ratio of users (cluster administrators) to developers
> and don't lose sight of the users in working to ease and simplify
> development.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
>
> On Wed, Jun 2, 2021 at 5:37 AM Matthew Vernon  wrote:
>
> > Hi,
> >
> > In the discussion after the Ceph Month talks yesterday, there was a bit
> > of chat about cephadm / containers / packages. IIRC, Sage observed that
> > a common reason in the recent user survey for not using cephadm was that
> > it only worked on containerised deployments. I think he then went on to
> > say that he hadn't heard any compelling reasons why not to use
> > containers, and suggested that resistance was essentially a user
> > education question[0].
> >
> > I'd like to suggest, briefly, that:
> >
> > * containerised 

[ceph-users] Re: Revisit Large OMAP Objects

2021-04-14 Thread DHilsbos
Casey;

That makes sense, and I appreciate the explanation.

If I were to shut down all uses of RGW, and wait for replication to catch up, 
would this then address most known issues with running this command in a 
multi-site environment?  Can I offline RADOSGW daemons as an added precaution?

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Casey Bodley [mailto:cbod...@redhat.com] 
Sent: Wednesday, April 14, 2021 9:03 AM
To: Dominic Hilsbos
Cc: k0...@k0ste.ru; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Revisit Large OMAP Objects

On Wed, Apr 14, 2021 at 11:44 AM  wrote:
>
> Konstantin;
>
> Dynamic resharding is disabled in multisite environments.
>
> I believe you mean radosgw-admin reshard stale-instances rm.
>
> Documentation suggests this shouldn't be run in a multisite environment.  
> Does anyone know the reason for this?

say there's a bucket with 10 objects in it, and that's been fully
replicated to a secondary zone. if you want to remove the bucket, you
delete its objects then delete the bucket

when the bucket is deleted, rgw can't delete its bucket instance yet
because the secondary zone may not be caught up with sync - it
requires access to the bucket instance (and its index) to sync those
last 10 object deletions

so the risk with 'stale-instances rm' in multisite is that you might
delete instances before other zones catch up, which can lead to
orphaned objects

>
> Is it, in fact, safe, even in a multisite environment?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director – Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
> -Original Message-
> From: Konstantin Shalygin [mailto:k0...@k0ste.ru]
> Sent: Wednesday, April 14, 2021 12:15 AM
> To: Dominic Hilsbos
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Revisit Large OMAP Objects
>
> Run reshard instances rm
> And reshard your bucket by hand or leave dynamic resharding process to do 
> this work
>
>
> k
>
> Sent from my iPhone
>
> > On 13 Apr 2021, at 19:33, dhils...@performair.com wrote:
> >
> > All;
> >
> > We run 2 Nautilus clusters, with RADOSGW replication (14.2.11 --> 14.2.16).
> >
> > Initially our bucket grew very quickly, as I was loading old data into it 
> > and we quickly ran into Large OMAP Object warnings.
> >
> > I have since done a couple manual reshards, which has fixed the warning on 
> > the primary cluster.  I have never been able to get rid of the issue on the 
> > cluster with the replica.
> >
> > I prior conversation on this list led me to this command:
> > radosgw-admin reshard stale-instances list --yes-i-really-mean-it
> >
> > The results of which look like this:
> > [
> >"nextcloud-ra:f91aeff8-a365-47b4-a1c8-928cd66134e8.185262.1",
> >"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.6",
> >"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.2",
> >"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.5",
> >"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.4",
> >"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.3",
> >"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.1",
> >"3520ae821f974340afd018110c1065b8/OS 
> > Development:f91aeff8-a365-47b4-a1c8-928cd66134e8.4298264.1",
> >
> > "10dfdfadb7374ea1ba37bee1435d87ad/volumebackups:f91aeff8-a365-47b4-a1c8-928cd66134e8.4298264.2",
> >"WorkOrder:f91aeff8-a365-47b4-a1c8-928cd66134e8.44130.1"
> > ]
> >
> > I find this particularly interesting, as nextcloud-ra, /OS 
> > Development, /volumbackups, and WorkOrder buckets no longer exist.
> >
> > When I run:
> > for obj in $(rados -p 300.rgw.buckets.index ls | grep 
> > f91aeff8-a365-47b4-a1c8-928cd66134e8.3512190.1);   do   printf "%-60s 
> > %7d\n" $obj $(rados -p 300.rgw.buckets.index listomapkeys $obj | wc -l);   
> > done
> >
> > I get the expected 64 entries, with counts around 2 +/- 1000.
> >
> > Are the above listed stale instances ok to delete?  If so, how do I go 
> > about doing so?
> >
> > Thank you,
> >
> > Dominic L. Hilsbos, MBA
> > Director - Information Technology
> > Perform Air International Inc.
> > dhils...@performair.com
> > www.PerformAir.com
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Revisit Large OMAP Objects

2021-04-14 Thread DHilsbos
Konstantin;

Dynamic resharding is disabled in multisite environments.

I believe you mean radosgw-admin reshard stale-instances rm.

Documentation suggests this shouldn't be run in a multisite environment.  Does 
anyone know the reason for this?

Is it, in fact, safe, even in a multisite environment?

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Konstantin Shalygin [mailto:k0...@k0ste.ru] 
Sent: Wednesday, April 14, 2021 12:15 AM
To: Dominic Hilsbos
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Revisit Large OMAP Objects

Run reshard instances rm
And reshard your bucket by hand or leave dynamic resharding process to do this 
work


k

Sent from my iPhone

> On 13 Apr 2021, at 19:33, dhils...@performair.com wrote:
> 
> All;
> 
> We run 2 Nautilus clusters, with RADOSGW replication (14.2.11 --> 14.2.16).
> 
> Initially our bucket grew very quickly, as I was loading old data into it and 
> we quickly ran into Large OMAP Object warnings.
> 
> I have since done a couple manual reshards, which has fixed the warning on 
> the primary cluster.  I have never been able to get rid of the issue on the 
> cluster with the replica.
> 
> A prior conversation on this list led me to this command:
> radosgw-admin reshard stale-instances list --yes-i-really-mean-it
> 
> The results of which look like this:
> [
>"nextcloud-ra:f91aeff8-a365-47b4-a1c8-928cd66134e8.185262.1",
>"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.6",
>"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.2",
>"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.5",
>"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.4",
>"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.3",
>"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.1",
>"3520ae821f974340afd018110c1065b8/OS 
> Development:f91aeff8-a365-47b4-a1c8-928cd66134e8.4298264.1",
>
> "10dfdfadb7374ea1ba37bee1435d87ad/volumebackups:f91aeff8-a365-47b4-a1c8-928cd66134e8.4298264.2",
>"WorkOrder:f91aeff8-a365-47b4-a1c8-928cd66134e8.44130.1"
> ]
> 
> I find this particularly interesting, as nextcloud-ra, /OS 
> Development, /volumebackups, and WorkOrder buckets no longer exist.
> 
> When I run:
> for obj in $(rados -p 300.rgw.buckets.index ls | grep 
> f91aeff8-a365-47b4-a1c8-928cd66134e8.3512190.1);   do   printf "%-60s %7d\n" 
> $obj $(rados -p 300.rgw.buckets.index listomapkeys $obj | wc -l);   done
> 
> I get the expected 64 entries, with counts around 2 +/- 1000.
> 
> Are the above listed stale instances ok to delete?  If so, how do I go about 
> doing so?
> 
> Thank you,
> 
> Dominic L. Hilsbos, MBA 
> Director - Information Technology 
> Perform Air International Inc.
> dhils...@performair.com 
> www.PerformAir.com
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Revisit Large OMAP Objects

2021-04-13 Thread DHilsbos
All;

We run 2 Nautilus clusters, with RADOSGW replication (14.2.11 --> 14.2.16).

Initially our bucket grew very quickly, as I was loading old data into it and 
we quickly ran into Large OMAP Object warnings.

I have since done a couple manual reshards, which has fixed the warning on the 
primary cluster.  I have never been able to get rid of the issue on the cluster 
with the replica.

A prior conversation on this list led me to this command:
radosgw-admin reshard stale-instances list --yes-i-really-mean-it

The results of which look like this:
[
"nextcloud-ra:f91aeff8-a365-47b4-a1c8-928cd66134e8.185262.1",
"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.6",
"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.2",
"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.5",
"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.4",
"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.3",
"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.1",
"3520ae821f974340afd018110c1065b8/OS 
Development:f91aeff8-a365-47b4-a1c8-928cd66134e8.4298264.1",

"10dfdfadb7374ea1ba37bee1435d87ad/volumebackups:f91aeff8-a365-47b4-a1c8-928cd66134e8.4298264.2",
"WorkOrder:f91aeff8-a365-47b4-a1c8-928cd66134e8.44130.1"
]

I find this particularly interesting, as nextcloud-ra, /OS Development, 
/volumebackups, and WorkOrder buckets no longer exist.

When I run:
for obj in $(rados -p 300.rgw.buckets.index ls | grep 
f91aeff8-a365-47b4-a1c8-928cd66134e8.3512190.1);   do   printf "%-60s %7d\n" 
$obj $(rados -p 300.rgw.buckets.index listomapkeys $obj | wc -l);   done

I get the expected 64 entries, with counts around 2 +/- 1000.

Are the above listed stale instances ok to delete?  If so, how do I go about 
doing so?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs RocksDB corrupted when upgrading nautilus->octopus: unknown WriteBatch tag

2021-04-12 Thread DHilsbos
Igor;

Does this only impact CephFS then?

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Igor Fedotov [mailto:ifedo...@suse.de] 
Sent: Monday, April 12, 2021 9:16 AM
To: Dominic Hilsbos; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: OSDs RocksDB corrupted when upgrading 
nautilus->octopus: unknown WriteBatch tag

The workaround would be to disable bluestore_fsck_quick_fix_on_mount, do 
an upgrade and then do a regular fsck.

Depending on fsck  results either proceed with a repair or not.
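
A minimal sketch of that sequence for a single OSD (assuming a non-containerized
deployment with the OSD data under /var/lib/ceph/osd/ceph-<id>; the id below is
an example):

ceph config set osd bluestore_fsck_quick_fix_on_mount false   # before the upgrade
# ...upgrade packages and restart the OSDs, then check one OSD offline...
systemctl stop ceph-osd@2
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-2
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-2    # only if fsck reports errors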


Thanks,

Igor


On 4/12/2021 6:35 PM, dhils...@performair.com wrote:
> Is there a way to check for these zombie blobs, and other issues needing 
> repair, prior to the upgrade?  That would allow us to know that issues might 
> be coming, and perhaps address them before they result in corrupt OSDs.
>
> I'm considering upgrading our clusters from 14 to 15, and would really like 
> to avoid these kinds of issues.
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
> -Original Message-
> From: Igor Fedotov [mailto:ifedo...@suse.de]
> Sent: Monday, April 12, 2021 7:55 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: OSDs RocksDB corrupted when upgrading 
> nautilus->octopus: unknown WriteBatch tag
>
> Sorry for being too late to the party...
>
> I think the root cause is related to the high amount of repairs made
> during the first post-upgrade fsck run.
>
> The check (and fix) for zombie spanning blobs has been backported to
> v15.2.9 (here is the PR https://github.com/ceph/ceph/pull/39256). And I
> presume it's the one which causes BlueFS data corruption due to huge
> transaction happening during such a repair.
>
> I haven't seen this exact issue (as having that many zombie blobs is a
> rarely met bug by itself) but we had a somewhat similar issue with
> upgrading omap names, see: https://github.com/ceph/ceph/pull/39377
>
> A huge resulting transaction could cause too big a write to the WAL, which in
> turn caused data corruption (see https://github.com/ceph/ceph/pull/39701)
>
> Although the fix for the latter has been merged for 15.2.10 some
> additional issues with huge transactions might still exist...
>
>
> If someone can afford another OSD loss it could be interesting to get an
> OSD log for such a repair with debug-bluefs set to 20...
>
> I'm planning to make a fix to cap transaction size for repair in the
> near future anyway, though...
>
>
> Thanks,
>
> Igor
>
>
> On 4/12/2021 5:15 PM, Dan van der Ster wrote:
>> Too bad. Let me continue trying to invoke Cunningham's Law for you ... ;)
>>
>> Have you excluded any possible hardware issues?
>>
>> 15.2.10 has a new option to check for all zero reads; maybe try it with true?
>>
>>   Option("bluefs_check_for_zeros", Option::TYPE_BOOL, Option::LEVEL_DEV)
>>   .set_default(false)
>>   .set_flag(Option::FLAG_RUNTIME)
>>   .set_description("Check data read for suspicious pages")
>>   .set_long_description("Looks into data read to check if there is a
>> 4K block entirely filled with zeros. "
>>   "If this happens, we re-read data. If there is
>> difference, we print error to log.")
>>   .add_see_also("bluestore_retry_disk_reads"),
>>
>> The "fix zombie spanning blobs" feature was added in 15.2.9. Does
>> 15.2.8 work for you?
>>
>> Cheers, Dan
>>
>> On Sun, Apr 11, 2021 at 10:17 PM Jonas Jelten  wrote:
>>> Thanks for the idea, I've tried it with 1 thread, and it shredded another 
>>> OSD.
>>> I've updated the tracker ticket :)
>>>
>>> At least non-racecondition bugs are hopefully easier to spot...
>>>
>>> I wouldn't just disable the fsck and upgrade anyway until the cause is 
>>> rooted out.
>>>
>>> -- Jonas
>>>
>>>
>>> On 29/03/2021 14.34, Dan van der Ster wrote:
 Hi,

 Saw that, looks scary!

 I have no experience with that particular crash, but I was thinking
 that if you have already backfilled the degraded PGs, and can afford
 to try another OSD, you could try:

   "bluestore_fsck_quick_fix_threads": "1",  # because
 https://github.com/facebook/rocksdb/issues/5068 showed a similar crash
 and the dev said it occurs because WriteBatch is not thread safe.

   "bluestore_fsck_quick_fix_on_mount": "false", # should disable the
 fsck during upgrade. See https://github.com/ceph/ceph/pull/40198

 -- Dan

 On Mon, Mar 29, 2021 at 2:23 PM Jonas Jelten  wrote:
> Hi!
>
> After upgrading MONs and MGRs successfully, the first OSD host I upgraded 
> on Ubuntu Bionic from 14.2.16 to 15.2.10
> shredded all OSDs on it by corrupting RocksDB, and they now refuse to 
> boot.
> RocksDB complains "Corruption: unknown WriteBatch tag".
>
> The initial crash/corruption occured when t

[ceph-users] Re: OSDs RocksDB corrupted when upgrading nautilus->octopus: unknown WriteBatch tag

2021-04-12 Thread DHilsbos
Is there a way to check for these zombie blobs, and other issues needing 
repair, prior to the upgrade?  That would allow us to know that issues might be 
coming, and perhaps address them before they result in corrupt OSDs.

I'm considering upgrading our clusters from 14 to 15, and would really like to 
avoid these kinds of issues.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-Original Message-
From: Igor Fedotov [mailto:ifedo...@suse.de] 
Sent: Monday, April 12, 2021 7:55 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: OSDs RocksDB corrupted when upgrading 
nautilus->octopus: unknown WriteBatch tag

Sorry for being too late to the party...

I think the root cause is related to the high amount of repairs made 
during the first post-upgrade fsck run.

The check (and fix) for zombie spanning blobs has been backported to 
v15.2.9 (here is the PR https://github.com/ceph/ceph/pull/39256). And I 
presume it's the one which causes BlueFS data corruption due to huge 
transaction happening during such a repair.

I haven't seen this exact issue (as having that many zombie blobs is a 
rarely met bug by itself) but we had a somewhat similar issue with 
upgrading omap names, see: https://github.com/ceph/ceph/pull/39377

A huge resulting transaction could cause too big a write to the WAL, which in 
turn caused data corruption (see https://github.com/ceph/ceph/pull/39701)

Although the fix for the latter has been merged for 15.2.10 some 
additional issues with huge transactions might still exist...


If someone can afford another OSD loss it could be interesting to get an 
OSD log for such a repair with debug-bluefs set to 20...
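
(Something like the following in ceph.conf on the affected host, before
restarting the OSD, should capture that; a sketch only:)

[osd]
    debug bluefs = 20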

I'm planning to make a fix to cap transaction size for repair in the 
near future anyway, though...


Thanks,

Igor


On 4/12/2021 5:15 PM, Dan van der Ster wrote:
> Too bad. Let me continue trying to invoke Cunningham's Law for you ... ;)
>
> Have you excluded any possible hardware issues?
>
> 15.2.10 has a new option to check for all zero reads; maybe try it with true?
>
>  Option("bluefs_check_for_zeros", Option::TYPE_BOOL, Option::LEVEL_DEV)
>  .set_default(false)
>  .set_flag(Option::FLAG_RUNTIME)
>  .set_description("Check data read for suspicious pages")
>  .set_long_description("Looks into data read to check if there is a
> 4K block entirely filled with zeros. "
>  "If this happens, we re-read data. If there is
> difference, we print error to log.")
>  .add_see_also("bluestore_retry_disk_reads"),
>
> The "fix zombie spanning blobs" feature was added in 15.2.9. Does
> 15.2.8 work for you?
>
> Cheers, Dan
>
> On Sun, Apr 11, 2021 at 10:17 PM Jonas Jelten  wrote:
>> Thanks for the idea, I've tried it with 1 thread, and it shredded another 
>> OSD.
>> I've updated the tracker ticket :)
>>
>> At least non-racecondition bugs are hopefully easier to spot...
>>
>> I wouldn't just disable the fsck and upgrade anyway until the cause is 
>> rooted out.
>>
>> -- Jonas
>>
>>
>> On 29/03/2021 14.34, Dan van der Ster wrote:
>>> Hi,
>>>
>>> Saw that, looks scary!
>>>
>>> I have no experience with that particular crash, but I was thinking
>>> that if you have already backfilled the degraded PGs, and can afford
>>> to try another OSD, you could try:
>>>
>>>  "bluestore_fsck_quick_fix_threads": "1",  # because
>>> https://github.com/facebook/rocksdb/issues/5068 showed a similar crash
>>> and the dev said it occurs because WriteBatch is not thread safe.
>>>
>>>  "bluestore_fsck_quick_fix_on_mount": "false", # should disable the
>>> fsck during upgrade. See https://github.com/ceph/ceph/pull/40198
>>>
>>> -- Dan
>>>
>>> On Mon, Mar 29, 2021 at 2:23 PM Jonas Jelten  wrote:
 Hi!

 After upgrading MONs and MGRs successfully, the first OSD host I upgraded 
 on Ubuntu Bionic from 14.2.16 to 15.2.10
 shredded all OSDs on it by corrupting RocksDB, and they now refuse to boot.
 RocksDB complains "Corruption: unknown WriteBatch tag".

 The initial crash/corruption occured when the automatic fsck was ran, and 
 when it committed the changes for a lot of "zombie spanning blobs".

 Tracker issue with logs: https://tracker.ceph.com/issues/50017


 Anyone else encountered this error? I've "suspended" the upgrade for now :)

 -- Jonas
 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To u

[ceph-users] Re: First 6 nodes cluster with Octopus

2021-03-30 Thread DHilsbos
Mabi;

We're running Nautilus, and I am not wholly convinced of the "everything in 
containers" view of the world, so take this with a small grain  of salt...

1) We don't run Ubuntu, sorry.  I suspect the documentation highlights 18.04 
because it's the current LTS release.  Personally, if I had a preference of 
20.04 over 18.04, I would attempt to build a cluster on 20.04, and see how it 
goes.  You might also look at this: 
https://www.server-world.info/en/note?os=Ubuntu_20.04&p=ceph15&f=1

2) Containers are the preferred way of doing things in Octopus, so yes it's 
considered stable.

3) Our first evaluation cluster was 3 Intel Atom C3000 nodes, with each node 
running all the daemons (MON, MGR, MDS, 2 x OSD).  Worked fine, and allowed me 
to demonstrate the concepts in a size I could carry around.

4) Yes, and No...  When the Cluster is happy, everything is generally happy.  
In certain Warning and Error situations, MONs can chew through the HD space 
fairly quickly.  I'm not familiar with the HD usage of the daemons.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-Original Message-
From: mabi [mailto:m...@protonmail.ch] 
Sent: Tuesday, March 30, 2021 12:03 PM
To: ceph-users@ceph.io
Subject: [ceph-users] First 6 nodes cluster with Octopus

Hello,

I am planning to set up a small Ceph cluster for testing purposes with 6 Ubuntu 
nodes and have a few questions mostly regarding planning of the infra.

1) Based on the documentation the OS requirements mentions Ubuntu 18.04 LTS, is 
it ok to use Ubuntu 20.04 instead or should I stick with 18.04?

2) The documentation recommends using Cephadm for new deployments, so I will 
use that but I read that with Cephadm everything is running in containers, so 
is this the new way to go? Or is Ceph in containers kind of still experimental?

3) As I will be needing cephfs I will also need MDS servers so with a total of 
6 nodes I am planning the following layout:

Node 1: MGR+MON+MDS
Node 2: MGR+MON+MDS
Node 3: MGR+MON+MDS
Node 4: OSD
Node 5: OSD
Node 6: OSD

Does this make sense? I am mostly interested in stability and HA with this 
setup.

4) Is there any special kind of demand in terms of disks on the MGR+MON+MDS 
nodes? Or can I just use my OS disks on these nodes? As far as I understand the 
MDS will create a metadata pool on the OSDs.

Thanks for the hints.

Best,
Mabi



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 10G stackabe lacp switches

2021-02-16 Thread DHilsbos
Sorry; Netgear M4300 switches, not M4100.

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: dhils...@performair.com [mailto:dhils...@performair.com] 
Sent: Monday, February 15, 2021 8:39 AM
To: li...@merit.unu.edu; ceph-users@ceph.io
Subject: [ceph-users] Re: 10G stackabe lacp switches

MJ;

I was looking at something similar, and reached out to one of my VARs, and they 
recommended Netgear M4100 series switches.  We don't use any of them yet, so I 
can't provide first-hand experience.

On the subject of UTP vs SFP+; I'm told that SFP+ with DAC cables experience 
lower latencies than 10 Gig over copper.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-Original Message-
From: mj [mailto:li...@merit.unu.edu] 
Sent: Monday, February 15, 2021 4:16 AM
To: ceph-users@ceph.io
Subject: [ceph-users] 10G stackabe lacp switches

Hi,

Happy to report that we recently upgraded our three-host 24 OSD cluster 
from HDD filestore to SSD BlueStore. After a few months of use, their 
WEAR is still at 1%, and the cluster performance ("rados bench" etc) has 
dramatically improved. So all in all: yes, we're happy Samsung PM883 
ceph users. :-)

We currently have a "meshed" ceph setup, with the three hosts connected 
directly to each other over 10G ethernet, as described here:

https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Method_2_.28routed.29

As we would like to be able to add more storage hosts, we need to lose 
the meshed network setup.

My idea is to add two stacked 10G ethernet switches to the setup, so we 
can start using lacp bonded networking over two physical switches.

Looking around, we can get refurb Cisco Small Business 550X for around 
1300 euro. We also noticed that mikrotik and TP-Link have some even 
nicer-priced 10G switches, but those all lack bonding. :-(

Therefore I'm asking here: anyone here with suggestions on what to look 
at, for nice-priced 10G stackable switches..?

We would like to continue using ethernet, as we use that everywhere, and 
also performance-wise we're happy with what we currently have.

Last december I wrote to mikrotik support, asking if they will support 
stacking / LACP any time soon, and their answer was: probably 2nd half 
of 2021.

So, anyone here with interesting insights to share for ceph 10G ethernet 
storage networking?

Thanks,
MJ
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 10G stackabe lacp switches

2021-02-15 Thread DHilsbos
MJ;

I was looking at something similar, and reached out to one of my VARs, and they 
recommended Netgear M4100 series switches.  We don't use any of them yet, so I 
can't provide first-hand experience.

On the subject of UTP vs SFP+; I'm told that SFP+ with DAC cables experience 
lower latencies than 10 Gig over copper.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-Original Message-
From: mj [mailto:li...@merit.unu.edu] 
Sent: Monday, February 15, 2021 4:16 AM
To: ceph-users@ceph.io
Subject: [ceph-users] 10G stackabe lacp switches

Hi,

Happy to report that we recently upgraded our three-host 24 OSD cluster 
from HDD filestore to SSD BlueStore. After a few months of use, their 
WEAR is still at 1%, and the cluster performance ("rados bench" etc) has 
dramatically improved. So all in all: yes, we're happy Samsung PM883 
ceph users. :-)

We currently have a "meshed" ceph setup, with the three hosts connected 
directly to each other over 10G ethernet, as described here:

https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Method_2_.28routed.29

As we would like to be able to add more storage hosts, we need to lose 
the meshed network setup.

My idea is to add two stacked 10G ethernet switches to the setup, so we 
can start using lacp bonded networking over two physical switches.

Looking around, we can get refurb Cisco Small Business 550X for around 
1300 euro. We also noticed that mikrotik and TP-Link have some even 
nicer-priced 10G switches, but those all lack bonding. :-(

Therefore I'm asking here: anyone here with suggestions on what to look 
at, for nice-priced 10G stackable switches..?

We would like to continue using ethernet, as we use that everywhere, and 
also performance-wise we're happy with what we currently have.

Last december I wrote to mikrotik support, asking if they will support 
stacking / LACP any time soon, and their answer was: probably 2nd half 
of 2021.

So, anyone here with interesting insights to share for ceph 10G ethernet 
storage networking?

Thanks,
MJ
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread DHilsbos
My impression is that cost/TB for a drive may be approaching parity, but the 
TB/drive is still well below (or, at densities approaching parity, the cost/TB 
is still quite high).  I can get a Micron 15TB SSD for $2600, but why would I 
when I can get an 18TB Seagate IronWolf for <$600, an 18TB Seagate Exos for 
<$500, or an 18TB WD Gold for <$600?  Personally I wouldn't use drives that 
big, in our little tiny clusters, but it exemplifies the issues around 
discussing cost parity.

As such, a cluster needs more drives for the same total size (thus more nodes), 
which drives up the cost / TB for a cluster.

My 2 cents.

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Adam Boyhan [mailto:ad...@medent.com] 
Sent: Thursday, February 4, 2021 10:58 AM
To: Anthony D'Atri
Cc: ceph-users
Subject: [ceph-users] Re: NVMe and 2x Replica

All great input and points guys. 

Helps me lean towards 3 copes a bit more. 

I mean honestly NVMe cost per TB isn't that much more than SATA SSD now. 
Somewhat surprised the salesmen aren't pitching 3x replication as it makes them 
more money. 



From: "Anthony D'Atri"  
To: "ceph-users"  
Sent: Thursday, February 4, 2021 12:47:27 PM 
Subject: [ceph-users] Re: NVMe and 2x Replica 

> I searched each to find the section where 2x was discussed. What I found was 
> interesting. First, there are really only 2 positions here: Micron's and Red 
> Hat's. Supermicro copies Micron's position paragraph word for word. Not 
> surprising considering that they are advertising a Supermicro / Micron 
> solution. 

FWIW, at Cephalocon another vendor made a similar claim during a talk. 

* Failure rates are averages, not minima. Some drives will always fail sooner 
* Firmware and other design flaws can result in much higher rates of failure or 
insidious UREs that can result in partial data unavailability or loss 
* Latent soft failures may not be detected until a deep scrub succeeds, which 
could be weeks later 
* In a distributed system, there are up/down/failure scenarios where the 
location of even one good / canonical / latest copy of data is unclear, 
especially when drive or HBA cache is in play. 
* One of these is a power failure. Sure PDU / PSU redundancy helps, but stuff 
happens, like a DC underprovisioning amps, so that a spike in user traffic 
results in the whole row going down :-x Various unpleasant things can happen. 

I was championing R3 even pre-Ceph when I was using ZFS or HBA RAID. As others 
have written, as drives get larger the time to fill them with replica data 
increases, as does the chance of overlapping failures. I’ve experienced R2 
overlapping failures more than once, with and before Ceph. 

My sense has been that not many people run R2 for data they care about, and as 
has been written recently 2,2 EC is safer with the same raw:usable ratio. I’ve 
figured that vendors make R2 statements like these as a selling point to assert 
lower TCO. My first response is often “How much would it cost you directly, and 
indirectly in terms of user / customer goodwill, to lose data?”. 

> Personally, this looks like marketing BS to me. SSD shops want to sell SSDs, 
> but because of the cost difference they have to convince buyers that their 
> products are competitive. 

^this. I’m watching the QLC arena with interest for the potential to narrow the 
CapEx gap. Durability has been one concern, though I’m seeing newer products 
claiming that eg. ZNS improves that. It also seems that there are something 
like what, *4* separate EDSFF / ruler form factors, I really want to embrace 
those eg. for object clusters, but I’m VERY wary of the longevity of competing 
standards and any single-source for chassis or drives. 

> Our products cost twice as much, but LOOK you only need 2/3 as many, and you 
> get all these other benefits (performance). Plus, if you replace everything 
> in 2 or 3 years anyway, then you won't have to worry about them failing. 

Refresh timelines. You’re funny ;) Every time, every single time, that I’ve 
worked in an organization that claims a 3 (or 5, or whatever) year hardware refresh 
cycle, it hasn’t happened. When you start getting close, the capex doesn’t 
materialize, or the opex cost of DC hands and operational oversight. “How do 
you know that the drives will start failing or getting slower? Let’s revisit 
this in 6 months”. Etc. 

___ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread DHilsbos
Adam;

Earlier this week, another thread presented 3 white papers in support of 
running 2x on NVMe for Ceph.

I searched each to find the section where 2x was discussed.  What I found was 
interesting.  First, there are really only 2 positions here: Micron's and Red 
Hat's.  Supermicro copies Micron's position paragraph word for word.  Not 
surprising considering that they are advertising a Supermicro / Micron solution.

This is Micron's statement:
" NVMe SSDs have high reliability with high MTBR and low bit error rate. 2x 
replication is recommended in production when deploying OSDs on NVMe versus the 
3x replication common with legacy storage."

This is Red Hat's statement:
" Given the better MTBF and MTTR of flash-based media, many Ceph customers have 
chosen to run 2x replications in
production when deploying OSDs on flash. This differs from magnetic media 
deployments, which typically use 3x replication."

Looking at these statements, these acronyms pop out at me: MTBR and MTTR.  MTBR 
is Mean Time Between Replacements, while MTTR is Mean Time Till Replacement.  
Essentially, this is saying that most companies replace these drives before 
they have to worry about large numbers failing.

Regarding MTBF; I can't find any data to support Red Hat's assertion that MTBF 
is better for flash.  I looked at both Western Digital Gold, and Seagate Exos 
12 TB drives, and found they both list a MTBF of 2.5 million hours.  I was 
unable to find any information on the MTBF of Micron drives, but the MTBF of 
Kingston's DC1000B 240GB drive is 2 million hours.

Personally, this looks like marketing BS to me.  SSD shops want to sell SSDs, 
but because of the cost difference they have to convince buyers that their 
products are competitive.

Pitch is thus:
Our products cost twice as much, but LOOK you only need 2/3 as many, and you 
get all these other benefits (performance).  Plus, if you replace everything in 
2 or 3 years anyway, then you won't have to worry about them failing.

I'll address general concerns of 2x replication in another email.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Adam Boyhan [mailto:ad...@medent.com] 
Sent: Thursday, February 4, 2021 4:38 AM
To: ceph-users
Subject: [ceph-users] NVMe and 2x Replica

I know there are already a few threads about 2x replication but I wanted to 
start one dedicated to discussion on NVMe. There are some older threads, but 
nothing recent that addresses how the vendors are now pushing the idea of 2x. 

We are in the process of considering Ceph to replace our Nimble setup. We will 
have two completely separate clusters at two different sites that we are using 
rbd-mirror snapshot replication. The plan would be to run 2x replication on 
each cluster. 3x is still an option, but for obvious reasons 2x is enticing. 

Both clusters will be spot on to the super micro example in the white paper 
below. 

It seems all the big vendors feel 2x is safe with NVMe but I get the feeling 
this community feels otherwise. Trying to wrap my head around where the 
disconnect is between the big players and the community. I could be missing 
something, but even our Supermicro contact that we worked the config out with 
was in agreement with 2x on NVMe. 

Appreciate the input! 

[ https://www.supermicro.com/white_paper/white_paper_Ceph-Ultra.pdf | 
https://www.supermicro.com/white_paper/white_paper_Ceph-Ultra.pdf ] 

[ 
https://www.redhat.com/cms/managed-files/st-micron-ceph-performance-reference-architecture-f17294-201904-en.pdf
 ] 
[ 
https://www.redhat.com/cms/managed-files/st-micron-ceph-performance-reference-architecture-f17294-201904-en.pdf
 | 
https://www.redhat.com/cms/managed-files/st-micron-ceph-performance-reference-architecture-f17294-201904-en.pdf
 ] 

[ 
https://www.samsung.com/semiconductor/global.semi/file/resource/2020/05/redhat-ceph-whitepaper-0521.pdf
 | 
https://www.samsung.com/semiconductor/global.semi/file/resource/2020/05/redhat-ceph-whitepaper-0521.pdf
 ] 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-03 Thread DHilsbos
Adam;

I'd like to see that / those white papers.

I suspect what they're advocating is multiple OSD daemon processes per NVMe 
device.  This is something which can improve performance.  Though I've never 
done it, I believe you partition the device, and then create your OSD pointing 
at a partition.
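
For what it's worth, ceph-volume can also do the splitting itself, so manual
partitioning may not be needed; a rough sketch (the device names are examples):

# two OSDs carved out of one NVMe device via LVM
ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1
# or, with manual partitions, one OSD per partition
ceph-volume lvm create --data /dev/nvme0n1p1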

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-Original Message-
From: Adam Boyhan [mailto:ad...@medent.com] 
Sent: Wednesday, February 3, 2021 8:50 AM
To: Magnus HAGDORN
Cc: ceph-users
Subject: [ceph-users] Re: Worst thing that can happen if I have size= 2

Isn't this somewhat reliant on the OSD type? 

Redhat/Micron/Samsung/Supermicro have all put out white papers backing the idea 
of 2 copies on NVMe's as safe for production. 


From: "Magnus HAGDORN"  
To: pse...@avalon.org.ua 
Cc: "ceph-users"  
Sent: Wednesday, February 3, 2021 4:43:08 AM 
Subject: [ceph-users] Re: Worst thing that can happen if I have size= 2 

On Wed, 2021-02-03 at 09:39 +, Max Krasilnikov wrote: 
> > if an OSD becomes unavailable (broken disk, rebooting server) then all 
> > I/O to the PGs stored on that OSD will block until replication level of 
> > 2 is reached again. So, for a highly available cluster you need a 
> > replication level of 3 
> 
> 
> AFAIK, with min_size 1 it is possible to write even to the only active 
> OSD serving 
> 
yes, that's correct but then you seriously risk trashing your data 

The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336. 
___ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw-admin sync status takes ages to print output

2021-01-14 Thread DHilsbos
Istvan;

What version of Ceph are you running?  Another email chain indicates you're 
running on CentOS 8, which suggests Octopus (15).

We're running multisite replicated radosgw on Nautilus.  I don't see the long 
running time that you are suggesting, though we only have ~35k objects.

I generally don't worry about sync unless the "oldest incremental change not 
applied" is several minutes or more in the past.  Our work day has just 
started, so use isn't very high yet.  This afternoon, when anticipated use 
peaks, I'll set a watch to see how behind the clusters get.
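
The watch itself is nothing fancy; something like the following (the interval is
arbitrary) is what I have in mind:

watch -n 300 'radosgw-admin sync status | egrep "behind|oldest incremental"'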

According to the command output, you have 64 shards in the metadata, and 128 
shards in the data.  That seems low, as that's the same number of shards we're 
running, with our significantly lower object count.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Szabo, Istvan (Agoda) [mailto:istvan.sz...@agoda.com] 
Sent: Wednesday, January 13, 2021 11:18 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: radosgw-admin sync status takes ages to print output

UPDATE: Finally got back the master sync command output:

radosgw-admin sync status
  realm 5fd28798-9195-44ac-b48d-ef3e95caee48 (realm)
  zonegroup 31a5ea05-c87a-436d-9ca0-ccfcbad481e3 (data)
   zone 9213182a-14ba-48ad-bde9-289a1c0c0de8 (hkg)
  metadata sync no sync (zone is master)
  data sync source: 61c9d940-fde4-4bed-9389-edc8d7741817 (sin)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
source: f20ddd64-924b-4f78-8d2d-dd6c65f98ba9 (ash)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is behind on 128 shards
behind shards: 
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]
oldest incremental change not applied: 
2021-01-14T13:03:17.807529+0700 [20]
45 shards are recovering
recovering shards: 
[5,14,23,25,26,34,36,37,38,45,46,47,49,50,51,52,54,55,57,58,60,61,62,67,68,69,71,77,79,80,88,89,90,95,97,100,108,110,111,117,118,120,121,125,126]

Sorry for the two emails.


-Original Message-
From: Szabo, Istvan (Agoda)  
Sent: Thursday, January 14, 2021 12:57 PM
To: ceph-users@ceph.io
Subject: [ceph-users] radosgw-admin sync status takes ages to print output

Email received from outside the company. If in doubt don't click links nor open 
attachments!


Hello,

I have a 3 DC octopus Multisite setup with bucket sync policy applied.

I have 2 buckets where I’ve set the shard count to 24,000 on one and 9,000 on the 
other, because they want to use a single bucket with a huge number of objects 
(2,400,000,000 and 900,000,000), and in the case of multisite we need to preshard 
the buckets as described in the documentation.

Do I need to fine tune something on the syncing to make this query faster?
This is the output after 5-10 minutes of query time; I’m not sure whether it is 
healthy or not, to be honest, and I haven’t really found any good explanation of 
the output in the ceph documentation.

From the master zone I can’t really even query because it times out, but in the 
secondary zone I can see this:


radosgw-admin sync status
  realm 5fd28798-9195-44ac-b48d-ef3e95caee48 (realm)
  zonegroup 31a5ea05-c87a-436d-9ca0-ccfcbad481e3 (data)
   zone 61c9d940-fde4-4bed-9389-edc8d7741817 (sin)
  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
  data sync source: 9213182a-14ba-48ad-bde9-289a1c0c0de8 (hkg)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is behind on 128 shards
behind shards: 
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]
oldest incremental change not applied: 
2021-01-14T12:01:00.131104+0700 [11]
source: f20ddd64-924b-4f78-8d2

[ceph-users] Re: Global AVAIL vs Pool MAX AVAIL

2021-01-12 Thread DHilsbos
Mark;

Just to clarify; when you say you have "1 replica," does that mean that 
Replica Size = 2, or Replica Size = 1?

Neither of these is good.

With Replica Size = 1; if one hard drive (which contains a PG) fails, the 
entire pool fails.  Not just refuses writes, but stops accepting reads until 
(possibly) professionally recovered.

With Replica Size = 2; the cluster can't automatically heal if it finds a 
replica that doesn't match its master; which copy is correct?

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Mark Johnson [mailto:ma...@iovox.com] 
Sent: Monday, January 11, 2021 7:54 PM
To: anthony.da...@gmail.com
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Global AVAIL vs Pool MAX AVAIL

Thanks Anthony,

Shortly after I made that post, I found a Server Fault post where someone had 
asked the exact same question.  The reply was this - "The 'MAX AVAIL' column 
represents the amount of data that can be used before the first OSD becomes 
full. It takes into account the projected distribution of data across disks 
from the CRUSH map and uses the 'first OSD to fill up' as the target."

To answer your question, yes we have a rather unbalanced cluster which is 
something I'm working on.  When I saw these figures, I got scared that I had 
less time to work on it than I thought.  There are about 10 pools in the 
cluster, but we primarily use one for almost all of our storage and it only has 
64 pgs & 1 replica across 20 OSDs.  So, as data has grown, it works out that 
each PG in this cluster accounts for about 148GB, and the OSDs are about 1.4TB 
each, so it's easy to see how it's found itself way out of balance.

Anyway, once I've added the OSDs and data has rebalanced, I'm going to start 
the process of incrementally increasing the PG count for this pool in a staged 
process to reduce the amount of data per PG and (hopefully) balance out the 
data distribution better than it is.
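
The commands for each step are simple enough (a sketch; the pool name and target
are examples, and on pre-Nautilus releases pgp_num has to be raised to match by
hand):

ceph osd pool set volumes pg_num 128
ceph osd pool set volumes pgp_num 128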

This is one big learning process - I just wish I wasn't learning in production 
so much.



On Mon, 2021-01-11 at 15:58 -0800, Anthony D'Atri wrote:

Either you have multiple CRUSH roots or device classes, or you have unbalanced 
OSD utilization.  What version of Ceph?  Do you have any balancing enabled?


Do


ceph osd df | sort -nk8 | head

ceph osd df | sort -nk8 | tail


and I’ll bet you have OSDs way more full than others.  The STDDEV value that 
ceph df reports I suspect is accordingly high


On Jan 11, 2021, at 2:07 PM, Mark Johnson  wrote:


Can someone please explain to me the difference between the Global "AVAIL" and 
the "MAX AVAIL" in the pools table when I do a "ceph df detail"?  The reason 
being that we have a total of 14 pools, however almost all of our data exists 
in one pool.  A "ceph df detail" shows the following:


GLOBAL:
    SIZE    AVAIL   RAW USED   %RAW USED   OBJECTS
    28219G  6840G   19945G     70.68       36112k


But the POOLS table from the same output shows the MAX AVAIL for each pool as 
498G and the pool with all the data shows 9472G used with a %USED of 95.00.  If 
it matters, the pool size is set to 2 so my guess is the global available 
figure is raw, meaning I should still have approx. 3.4TB available, but that 
95% used has me concerned.  I'm going to be adding some OSDs soon but still 
would like to understand the difference and how much trouble I'm in at this 
point.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Compression of data in existing cephfs EC pool

2021-01-04 Thread DHilsbos
Paul;

I'm not familiar with rsync, but is it possible you're running into a system 
issue of the copies being shallow?

In other words, is it possible that you're ending up with a hard-link (2 
directory entries pointing to the same initial inode), instead of a deep copy?
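
A quick way to rule that out (a sketch; the paths are placeholders) is to compare
inode numbers and link counts between the source and the copy:

stat -c '%i %h %n' /mnt/cephfs/source/file.bin /mnt/cephfs/copy/file.bin

If both show the same inode number and a link count of 2, the "copy" is just a
second directory entry and no new data objects were written.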

I believe CephFS is implemented such that directories and their entries are 
omaps, while inodes are data objects.  If your operating system / filesystem / 
copy mechanism isn't creating new inodes, and deleting the old ones, they 
wouldn't get compressed.

Confirmation from a Ceph dev on the above implementation assumptions would be 
appreciated.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-Original Message-
From: Paul Mezzanini [mailto:pfm...@rit.edu] 
Sent: Monday, January 4, 2021 11:23 AM
To: Burkhard Linke; ceph-users@ceph.io
Subject: [ceph-users] Re: Compression of data in existing cephfs EC pool

That does make sense, and I wish it were true; however, what I'm seeing doesn't 
support your hypothesis.  I've had several drives die and be replaced since the 
go-live date and I'm actually in the home stretch on reducing the pg_num on 
that pool so pretty much every PG has already been moved several times over.

It's also possible that my method for checking compression is flawed.  Spot 
checks from what I can see in an OSD stat dump and ceph df detail seem to line 
up so I don't believe this is the case.

The only time I see the counters move is when someone puts new data in via 
globus or migration from a cluster job.

I will test what you proposed though by draining an OSD and refilling it then 
checking the stat dump to see what lives under compression and what does not.   

-paul 

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu

CONFIDENTIALITY NOTE: The information transmitted, including attachments, is
intended only for the person(s) or entity to which it is addressed and may
contain confidential and/or privileged material. Any review, retransmission,
dissemination or other use of, or taking of any action in reliance upon this
information by persons or entities other than the intended recipient is
prohibited. If you received this in error, please contact the sender and
destroy any copies of this information.






Just my two cents:

Compression is an OSD-level operation, and the OSDs involved in a PG do
not know about each other's compression settings. And they probably also
do not care, considering the OSD to be a black box.


I would propose to drain OSDs (one by one or host by host by setting osd
weights) to move the uncompressed data off. Reset the weights to the
former values later to move the data back, and upon writing the data it
should be compressed.

Compression should also happen while writing the data to other OSDs
when it is moved off an OSD, but you will end up with a mix of compressed
and uncompressed data on the same OSD. (You will have to process all OSDs.)


If this is working as expected, you do not have to touch the data on the
filesystem level at all. The operation happens solely on the underlying
storage.
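
A rough sketch for a single OSD (the id and weight are examples; note the
original CRUSH weight first and restore it afterwards):

ceph osd crush reweight osd.7 0        # drain; wait for backfill to finish
ceph osd crush reweight osd.7 1.819    # refill; data written back should now be compressed
ceph daemon osd.7 perf dump | grep -i compress   # on the OSD's host, spot-check the counters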


Regards,

Burkhard

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Nautilus Health Metrics

2020-12-28 Thread DHilsbos
All;

I turned on device health metrics in one of our Nautilus clusters.  
Unfortunately, it doesn't seem to be collecting any information.

When I do "ceph device get-health-metrics , I get the following;
{
"20200821-223626": {
"dev": "/dev/sdc",
"error": "smartctl failed",
"nvme_smart_health_information_add_log_error": "nvme returned an error: 
sudo: exit status: 1",
"nvme_smart_health_information_add_log_error_code": -22,
"nvme_vendor": "samsung_ssd_860_evo_4tb",
"smartctl_error_code": -22,
"smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit 
status: 1\nstdout:\n"
}
}

The cluster is Nautilus 14.2.16 (updated from 14.2.11 just after turning on 
health metrics).  Smartctl is release 7.0 dated 2018-12-30 at 14:47:55 UTC.
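
My next step is probably to reproduce the call by hand as the ceph user to see
the underlying sudo error; roughly (I'm not certain of the exact flags the OSD
passes to smartctl):

sudo -u ceph sudo smartctl -a /dev/sdc; echo "exit: $?"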

Thoughts?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CentOS

2020-12-08 Thread DHilsbos
Marc;

As if that's not enough confusion (from the FAQ):
"Security issues will be updated in CentOS Stream after they are solved in the 
current RHEL release. Obviously, embargoed security releases can not be 
publicly released until after the embargo is lifted."

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Marc Roos [mailto:m.r...@f1-outsourcing.eu] 
Sent: Tuesday, December 8, 2020 3:19 PM
To: Dominic Hilsbos; mozes
Cc: aKrishna; ceph-users
Subject: [ceph-users] Re: CentOS


I am confused about that page
 
"Does this mean that CentOS Stream is the RHEL BETA test platform now?"
"No, CentOS Stream will be getting fixes and features ahead of RHEL"

However this is how wikipedia describes beta:
Beta version software is often useful for demonstrations and previews 
within an organization and to prospective customers. 

"we expect CentOS Stream to have fewer bugs ... than RHEL until those 
packages make it into the RHEL release" 
That also looks contradictory to me. 



-Original Message-
Subject: *SPAM* Re: [ceph-users] Re: CentOS

Marc,

That video may be out of date.

https://centos.org/distro-faq/#q6-will-there-be-separateparallelsimultaneous-streams-for-8-9-10-etc

--
Adam

On Tue, Dec 8, 2020 at 3:50 PM  wrote:
>
> Marc;
>
> I'm not happy about this, but RedHat is suggesting that those of us 
running CentOS for production should move to CentOS Stream.  As such, I 
need to determine if the software I'm running on top of it can be run on 
Stream.
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
> -Original Message-
> From: Marc Roos [mailto:m.r...@f1-outsourcing.eu]
> Sent: Tuesday, December 8, 2020 2:02 PM
> To: ceph-users; Dominic Hilsbos
> Cc: aKrishna
> Subject: [ceph-users] Re: CentOS
>
>
> I did not. Thanks for the info. But if I understand this[1] 
> explanation correctly, CentOS Stream is some sort of trial environment 
> for RHEL. So who is ever going to put SDS on such an OS?
>
> Last post on this blog "But if you read the FAQ, you also learn that 
> once they start work on RHEL 9, CentOS Stream 8 ceases to exist..."
>
> [1]
> https://www.youtube.com/watch?v=IEEdOogPMY8
>
>
>
>
>
>
> -Original Message-
> To: ceph-users@ceph.io
> Subject: [ceph-users] CentOS
>
> All;
>
> As you may or may not know; this morning RedHat announced the end of 
> CentOS as a rebuild distribution[1].  "CentOS" will be retired in 
> favor of the recently announced "CentOS Stream."
>
> Can Ceph be installed on CentOS Stream?
>
> Since CentOS Stream is currently at 8, the question really is: Can 
> Ceph Octopus be installed on CentOS Stream 8?  How about Nautilus?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
> [1: https://blog.centos.org/2020/12/future-is-centos-stream/]
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io 
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CentOS

2020-12-08 Thread DHilsbos
Marc;

I'm not happy about this, but RedHat is suggesting that those of us running 
CentOS for production should move to CentOS Stream.  As such, I need to 
determine if the software I'm running on top of it can be run on Stream.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-Original Message-
From: Marc Roos [mailto:m.r...@f1-outsourcing.eu] 
Sent: Tuesday, December 8, 2020 2:02 PM
To: ceph-users; Dominic Hilsbos
Cc: aKrishna
Subject: [ceph-users] Re: CentOS

 
I did not. Thanks for the info. But if I understand this[1] explanation 
correctly, CentOS Stream is some sort of trial environment for RHEL. So 
who is ever going to put SDS on such an OS?

Last post on this blog "But if you read the FAQ, you also learn that 
once they start work on RHEL 9, CentOS Stream 8 ceases to exist..."

[1]
https://www.youtube.com/watch?v=IEEdOogPMY8






-Original Message-
To: ceph-users@ceph.io
Subject: [ceph-users] CentOS

All;

As you may or may not know; this morning RedHat announced the end of 
CentOS as a rebuild distribution[1].  "CentOS" will be retired in favor 
of the recently announced "CentOS Stream."

Can Ceph be installed on CentOS Stream?

Since CentOS Stream is currently at 8, the question really is: Can Ceph 
Octopus be installed on CentOS Stream 8?  How about Nautilus?

Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com

[1: https://blog.centos.org/2020/12/future-is-centos-stream/]
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CentOS

2020-12-08 Thread DHilsbos
All;

As you may or may not know; this morning RedHat announced the end of CentOS as 
a rebuild distribution[1].  "CentOS" will be retired in favor of the recently 
announced "CentOS Stream."

Can Ceph be installed on CentOS Stream?

Since CentOS Stream is currently at 8, the question really is: Can Ceph Octopus 
be installed on CentOS Stream 8?  How about Nautilus?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

[1: https://blog.centos.org/2020/12/future-is-centos-stream/]
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph on ARM ?

2020-11-24 Thread DHilsbos
Adrian;

I've always considered the advantage of ARM to be the reduction in the failure 
domain.  Instead of one server with 2 processors, and 2 power supplies, in 1 
case, running 48 disks, you can do  4 cases containing 8 power supplies, and 32 
processors running 32 (or 64...) disks.

The architecture is different with ARM; you pair an ARM SoC up with just one or 
2 disks, and you only run the OSD software.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Robert Sander [mailto:r.san...@heinlein-support.de] 
Sent: Tuesday, November 24, 2020 5:56 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Ceph on ARM ?

On 24.11.20 at 13:12, Adrian Nicolae wrote:

>     Has anyone tested Ceph in such a scenario?  Is the Ceph software 
> really optimised for the ARM architecture?

Personally I have not run Ceph on ARM, but there are companies selling such 
setups:

https://softiron.com/
https://www.ambedded.com.tw/

Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs snapshots and previous version

2020-11-24 Thread DHilsbos
Oliver;

You might consider asking this question of the CentOS folks.  Possibly at 
cen...@centos.org.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-Original Message-
From: Oliver Weinmann [mailto:oliver.weinm...@me.com] 
Sent: Monday, November 23, 2020 3:20 PM
To: ceph-users
Subject: [ceph-users] Cephfs snapshots and previous version

Today I played with a samba gateway and cephfs. I couldn’t get previous 
versions displayed on a Windows client and found very little info on the net 
about how to accomplish this. It seems that I need a vfs module called 
ceph_snapshots. It’s not included in the latest Samba version on CentOS 8. By 
this I also noticed that there is no vfs ceph module. Are these modules not 
stable and therefore not included in centos8? I can compile them but I would 
like to know why they are not included. And one more question. Are there any 
plans to add samba gateway support to cephadm?

Best regards,
Oliver

Von meinem iPhone gesendet
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Accessing Ceph Storage Data via Ceph Block Storage

2020-11-17 Thread DHilsbos
Vaughan;

An absolute minimal Ceph cluster really needs to be 3 servers, and at that 
usable space should be 1/3 of raw space (see the archives of this mailing list 
for many discussions of why size=2 is bad).

While it is possible to run other tasks on Ceph servers, memory utilization of 
Ceph processes can be quite large, so it's often discouraged, especially on 
memory constrained servers.

Would it be feasible to acquire a system with sufficient RAM to run both VMs?

I believe RBD can be cached, but I can't speak to how it's configured, or how 
well it works.  I believe you would want a really fast drive (SSD) to store the 
cache on.

Depending on your performance and storage volume needs, you might be able to 
get away with building a micro-cluster, based on ARM CPUs.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-Original Message-
From: Vaughan Beckwith [mailto:vaughan.beckw...@bluesphere.co.za] 
Sent: Tuesday, November 17, 2020 3:54 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Accessing Ceph Storage Data via Ceph Block Storage

Hi All,

I'm not sure if this is the correct place to ask this question, I have tried 
the channels, but have received very little help there.

I am currently very new to Ceph and am investigating it as a possible 
replacement for a legacy application which used to provide us with replication.

At the moment my company has three servers: two primary servers running Ubuntu 
and a backup server also running Ubuntu. The two primary servers each host a 
virtual machine, and it is these virtual machines that the office workers use 
for shared folder access, email and as a domain server; the office workers are 
not aware of the underlying Linux servers.  In the past the legacy software 
would replicate the running VM files on both primary servers to the backup 
server.  The replication is done at the underlying Linux host level and not 
from within the guest VMs.  I was hoping that I could get Ceph to do this as 
well.

From what I have read (and I speak under correction), the best Ceph client 
type for this would be block access, whereby I would then mount the block 
device and start up the VMs.  As I would be running the VMs as per normal 
routine, would Ceph then have to retrieve the large VM files from the storage 
nodes across the LAN and bring the data back to the client to run the VM? 
Is there an option to cache certain parts of the data on certain clients?

Also, none of the primary servers as they currently stand have the capacity to 
run both VMs together, so each primary has a dedicated VM which it runs. The 
backup server currently keeps replicated copies of both VM images from each 
primary; the replication is provided by the legacy application.  I'm also 
wondering if I need to get a fourth server, so I have 2 clients and 2 storage 
nodes.

Any suggestions or help would be greatly appreciated.

Yours sincerely

Vaughan Beckwith
Bluesphere Technologies
BSC I.T. (Honours)

vaughan.beckw...@bluesphere.co.za
Telephone: 011 675 6354
Fax: (011) 675 6423

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Reduncancy, all PGs degraded, undersized, not scrubbed in time

2020-11-17 Thread DHilsbos
Phil;

I'm probably going to get crucified for this, but I put a year of testing into 
this before determining it was sufficient to the needs of my organization...

If the primary concerns are capability and cost (not top of the line 
performance), then I can tell you that we have had great success utilizing 
Intel Atom C3000 series CPUs.  We have built 2 clusters with capacities on the 
order of 130TiB, for less than $30,000 each.  The initial clusters cost $20,000 
each, for half the capacity.  Our testing cluster cost $8,000 to build, and 
most of that hardware could have been wrapped into the first production cluster 
build.

For those keeping track, no that is not the lowest cost / unit space.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Phil Merricks [mailto:seffyr...@gmail.com] 
Sent: Monday, November 16, 2020 5:52 PM
To: Janne Johansson
Cc: Hans van den Bogert; ceph-users
Subject: [ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - 
Degraded Data Reduncancy, all PGs degraded, undersized, not scrubbed in time

Thanks for all the replies folks.  I think it's testament to the
versatility of Ceph that there are some differences of opinion and
experience here.

With regards to the purpose of this cluster, it is providing distributed
storage for stateful workloads of containers.  The data produced is
somewhat immutable, it can be regenerated over time, however that does
cause some slowdown for the teams that use the data as part of their
development pipeline.  To the best of my understanding the goals here were
to provide a data loss safety net but still make efficient use of the block
devices assigned to the cluster, which is I imagine where the EC direction
came from.  The cluster is 3 nodes with the OSDs themselves mainly housed
in two of those.  Additionally there was an initiative to 'use what we
have' (or as I like to put it, 'cobble it together') with commodity
hardware that was immediately available to hand.  The departure of my
predecessor has left some unanswered questions so I am not going to bother
second guessing beyond what I already know.  As I understand it my steps
are:

1:  Move off the data and scrap the cluster as it stands currently.
(already under way)
2:  Group the block devices into pools of the same geometry and type (and
maybe do some tiering?)
3. Spread the OSDs across all 3 nodes so recovery scope isn't so easily
compromised by a loss at the bare metal level
4. Add more hosts/OSDs if EC is the right solution (this may be outside of
the scope of this implementation, but I'll keep a-cobblin'!)

The additional ceph outputs follow:
ceph osd tree 
ceph osd erasure-code-profile get cephfs-media-ec 

I am fully prepared to do away with EC to keep things simple and efficient
in terms of CPU occupancy.



On Mon, 16 Nov 2020 at 02:32, Janne Johansson  wrote:

> Den mån 16 nov. 2020 kl 10:54 skrev Hans van den Bogert <
> hansbog...@gmail.com>:
>
> > > With this profile you can only loose one OSD at a time, which is really
> > > not that redundant.
> > That's rather situation dependent. I don't have really large disks, so
> > the repair time isn't that large.
> > Further, my SLO isn't that high that I need 99.xxx% uptime, if 2 disks
> > break in the same repair window, that would be unfortunate, but I'd just
> > grab a backup from a mirroring cluster. Looking at it from another
> > perspective, I came from a single host RAID5 scenario, I'd argue this is
> > better since I can survive a host failure.
> >
> > Also this is a sliding problem right? Someone with K+3 could argue K+2
> >   is not enough as well.
> >
>
> There are a few situations like when you are moving data or when a scrub
> found a bad PG where you are suddenly out of copies in case something bad
> happens. I think Raid5 operators also found this out, when your cold spare
> disk kicks in, you find that old undetected error on one of the other disks
> and think repairs are bad or stress your raid too much.
>
> As with raids, the cheapest resource is often the actual disks and not
> operator time, restore-wait-times and so on, so that is why many on this
> list advocates for K+2-or-more, or Repl=3 because we have seen the errors
> one normally didn't expect. Yes, a double surprise of two disks failing in
> the same night after running for years is uncommon, but it is not as
> uncommon to resize pools, move PGs around or find a scrub error or two some
> day.
>
> So while one could always say "one more drive is better than your amount",
> there are people losing data with repl=2 or K+1 because some more normal
> operation was in flight and _then_ a single surprise happens.  So you can
> have a weird reboot, causing those PGs needing backfill later, and if one
> of the uptodate hosts have any single surprise during the recove

[ceph-users] Re: safest way to re-crush a pool

2020-11-10 Thread DHilsbos
Michael;

I run a Nautilus cluster, but all I had to do was change the rule associated 
with the pool, and ceph moved the data.
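
For example, assuming an ssd device class exists in your CRUSH map (the rule and 
pool names below are just placeholders, and I haven't tested this on Octopus), 
something like:

  ceph osd crush rule create-replicated replicated_ssd default host ssd
  ceph osd pool set default.rgw.buckets.index crush_rule replicated_ssd

Repeat the "pool set" for each RGW pool you want moved; Ceph migrates the 
existing PGs on its own once the rule changes.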

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Michael Thomas [mailto:w...@caltech.edu] 
Sent: Tuesday, November 10, 2020 1:32 PM
To: ceph-users@ceph.io
Subject: [ceph-users] safest way to re-crush a pool

I'm setting up a radosgw for my ceph Octopus cluster.  As soon as I 
started the radosgw service, I notice that it created a handful of new 
pools.  These pools were assigned the 'replicated_data' crush rule 
automatically.

I have a mixed hdd/ssd/nvme cluster, and this 'replicated_data' crush 
rule spans all device types.  I would like radosgw to use a replicated 
SSD pool and avoid the HDDs.  What is the recommended way to change the 
crush device class for these pools without risking the loss of any data 
in the pools?  I will note that I have not yet written any user data to 
the pools.  Everything in them was added by the radosgw process 
automatically.

--Mike
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Fix PGs states

2020-10-30 Thread DHilsbos
This line is telling:
 1 osds down
This is likely the cause of everything else.

Why is one of your OSDs down?
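
If it helps, a quick sketch for tracking that down (assuming systemd-managed 
OSDs; the id is a placeholder):

  ceph osd tree down            # which OSD(s) are down, and on which host
  ceph health detail            # usually names the down OSD and how long it has been down
  journalctl -u ceph-osd@<id>   # on that host, check why the daemon stopped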

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Ing. Luis Felipe Domínguez Vega [mailto:luis.doming...@desoft.cu] 
Sent: Thursday, October 29, 2020 7:46 PM
To: Ceph Users
Subject: [ceph-users] Fix PGs states

Hi:

I have this ceph status:
-
cluster:
 id: 039bf268-b5a6-11e9-bbb7-d06726ca4a78
 health: HEALTH_WARN
 noout flag(s) set
 1 osds down
 Reduced data availability: 191 pgs inactive, 2 pgs down, 35 
pgs incomplete, 290 pgs stale
 5 pgs not deep-scrubbed in time
 7 pgs not scrubbed in time
 327 slow ops, oldest one blocked for 233398 sec, daemons 
[osd.12,osd.36,osd.5] have slow ops.

   services:
 mon: 1 daemons, quorum fond-beagle (age 23h)
 mgr: fond-beagle(active, since 7h)
 osd: 48 osds: 45 up (since 95s), 46 in (since 8h); 4 remapped pgs
  flags noout

   data:
 pools:   7 pools, 2305 pgs
 objects: 350.37k objects, 1.5 TiB
 usage:   3.0 TiB used, 38 TiB / 41 TiB avail
 pgs: 6.681% pgs unknown
  1.605% pgs not active
  1835 active+clean
  279  stale+active+clean
  154  unknown
  22   incomplete
  10   stale+incomplete
  2down
  2remapped+incomplete
  1stale+remapped+incomplete


How can I fix all of the unknown, incomplete, remapped+incomplete, etc. PGs? I 
don't care if I need to remove PGs.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large map object found

2020-10-23 Thread DHilsbos
Peter;

As with many things in Ceph, I don't believe it's a hard and fast rule (i.e. a 
non-power of 2 will work).  I believe the issues are performance and balance. 
 I can't confirm that.  Perhaps someone else on the list will add their 
thoughts.

Has your warning gone away?

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


From: Peter Eisch [mailto:peter.ei...@virginpulse.com] 
Sent: Friday, October 23, 2020 5:41 AM
To: Dominic Hilsbos; ceph-users@ceph.io
Subject: Re: Large map object found

Perfect -- many thanks Dominic!

I haven't found a doc which notes that --num-shards needs to be a power of two. 
It isn't that I don't believe you -- I just haven't seen that anywhere.

peter


Peter Eisch
Senior Site Reliability Engineer
virginpulse.com

On 10/22/20, 10:24 AM, "dhils...@performair.com"  
wrote:

Peter;

I believe shard counts should be powers of two.

Also, resharding makes the buckets unavailable, but occurs very quickly. As 
such it is not done in the background, but in the foreground, for a manual 
reshard.

Notice the statement: "reshard of bucket  from  
to  completed successfully." It's done.

The warning notice won't go away until a scrub is completed to determine that a 
large OMAP object no longer exists.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


From: Peter Eisch [mailto:peter.ei...@virginpulse.com] 
Sent: Thursday, October 22, 2020 8:04 AM
To: Dominic Hilsbos; ceph-users@ceph.io
Subject: Re: Large map object found

Thank you! This was helpful.

I opted for a manual reshard:

[root@cephmon-s03 ~]# radosgw-admin bucket reshard 
--bucket=d2ff913f5b6542cda307c9cd6a95a214/NAME_segments --num-shards=3
tenant: d2ff913f5b6542cda307c9cd6a95a214
bucket name: backups_sql_dswhseloadrepl_segments
old bucket instance id: 80bdfc66-d1fd-418d-b87d-5c8518a0b707.340850308.51
new bucket instance id: 80bdfc66-d1fd-418d-b87d-5c8518a0b707.948621036.1
total entries: 1000 2000 3000 3228
2020-10-22 08:40:26.353 7fb197fc66c0 1 execute INFO: reshard of bucket 
"backups_sql_dswhseloadrepl_segments" from 
"d2ff913f5b6542cda307c9cd6a95a214/backups_sql_dswhseloadrepl_segments:80bdfc66-d1fd-418d-b87d-5c8518a0b707.340850308.51"
 to 
"d2ff913f5b6542cda307c9cd6a95a214/backups_sql_dswhseloadrepl_segments:80bdfc66-d1fd-418d-b87d-5c8518a0b707.948621036.1"
 completed successfully

[root@cephmon-s03 ~]# radosgw-admin buckets reshard list
[] 
[root@cephmon-s03 ~]# radosgw-admin buckets reshard status 
--bucket=d2ff913f5b6542cda307c9cd6a95a214/NAME_segments
[
{
"reshard_status": "not-resharding",
"new_bucket_instance_id": "",
"num_shards": -1
},
{
"reshard_status": "not-resharding",
"new_bucket_instance_id": "",
"num_shards": -1
},
{
"reshard_status": "not-resharding",
"new_bucket_instance_id": "",
"num_shards": -1
}
]
[root@cephmon-s03 ~]#

This kicked off an autoscale event. Would the reshard presumably start after the 
autoscaling is complete?

peter



Peter Eisch
Senior Site Reliability Engineer
virginpulse.com

[ceph-users] Re: Large map object found

2020-10-22 Thread DHilsbos
Peter;

I believe shard counts should be powers of two.

Also, resharding makes the buckets unavailable, but occurs very quickly.  As 
such it is not done in the background, but in the foreground, for a manual 
reshard.

Notice the statement: "reshard of bucket   from  
to  completed successfully."  It's done.

The warning notice won't go away until a scrub is completed to determine that a 
large OMAP object no longer exists.
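
If you don't want to wait for the next scheduled deep scrub, you can kick one 
off manually.  A minimal sketch, assuming the warning points at the index PG 
shown elsewhere in this thread (43.d):

  ceph pg deep-scrub 43.d
  ceph health detail    # the LARGE_OMAP_OBJECTS warning should clear once the scrub finishes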

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


From: Peter Eisch [mailto:peter.ei...@virginpulse.com] 
Sent: Thursday, October 22, 2020 8:04 AM
To: Dominic Hilsbos; ceph-users@ceph.io
Subject: Re: Large map object found

Thank you! This was helpful.

I opted for a manual reshard:

[root@cephmon-s03 ~]# radosgw-admin bucket reshard 
--bucket=d2ff913f5b6542cda307c9cd6a95a214/NAME_segments --num-shards=3
tenant: d2ff913f5b6542cda307c9cd6a95a214
bucket name: backups_sql_dswhseloadrepl_segments
old bucket instance id: 80bdfc66-d1fd-418d-b87d-5c8518a0b707.340850308.51
new bucket instance id: 80bdfc66-d1fd-418d-b87d-5c8518a0b707.948621036.1
total entries: 1000 2000 3000 3228
2020-10-22 08:40:26.353 7fb197fc66c0 1 execute INFO: reshard of bucket 
"backups_sql_dswhseloadrepl_segments" from 
"d2ff913f5b6542cda307c9cd6a95a214/backups_sql_dswhseloadrepl_segments:80bdfc66-d1fd-418d-b87d-5c8518a0b707.340850308.51"
 to 
"d2ff913f5b6542cda307c9cd6a95a214/backups_sql_dswhseloadrepl_segments:80bdfc66-d1fd-418d-b87d-5c8518a0b707.948621036.1"
 completed successfully

[root@cephmon-s03 ~]# radosgw-admin buckets reshard list
[] 
[root@cephmon-s03 ~]# radosgw-admin buckets reshard status 
--bucket=d2ff913f5b6542cda307c9cd6a95a214/NAME_segments
[
{
"reshard_status": "not-resharding",
"new_bucket_instance_id": "",
"num_shards": -1
},
{
"reshard_status": "not-resharding",
"new_bucket_instance_id": "",
"num_shards": -1
},
{
"reshard_status": "not-resharding",
"new_bucket_instance_id": "",
"num_shards": -1
}
]
[root@cephmon-s03 ~]#

This kicked off an autoscale event. Would the reshard presumably start after the 
autoscaling is complete?

peter



Peter Eisch
Senior Site Reliability Engineer
virginpulse.com

On 10/21/20, 3:19 PM, "dhils...@performair.com"  wrote:

This email originates outside Virgin Pulse.


Peter;

Look into bucket sharding.

Thank you,

Dominic L. Hilsbos, MBA
Director – Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


From: Peter Eisch [mailto:peter.ei...@virginpulse.com]
Sent: Wednesday, October 21, 2020 12:39 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Large map object found

Hi,

My rgw.buckets.index has the cluster in WARN. I'm either not understanding the 
real issue or I'm making it worse, or both.

OMAP_BYTES: 70461524
OMAP_KEYS: 250874

I thought I'd head this off by deleting rgw objects which would normally get 
deleted in the near future but this only seemed to make the values grow. Before 
I deleted lots of objects the values were:

OMAP_BYTES: 65450132
OMAP_KEYS: 209843

I read the default is 200k but I haven't read the proper way to manage this 
situation. What reading should I dive into? I could probably craft up a command 
to increase the value to clear the warning but I'm guessing this might not be 
great long-term.

Other errata which might matter:
Size: 3
Pool: nvme
CLASS SIZE AVAIL USED RAW USED %RAW USED
nvme 256 TiB 165 TiB 91 TiB 91 TiB 35.53

Errata: the complete statements:

PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE 
SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP
43.d 2 0 0 0 0 70461524 250

[ceph-users] Re: Large map object found

2020-10-21 Thread DHilsbos
Peter;

Look into bucket sharding.
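
As a starting point (commands from memory, so double-check them against the 
docs for your release; the bucket name and shard count are placeholders):

  radosgw-admin bucket limit check             # objects per shard vs. the configured limit
  radosgw-admin bucket stats --bucket=<name>   # current shard count and object count
  radosgw-admin reshard add --bucket=<name> --num-shards=<n>
  radosgw-admin reshard process                # run the queued reshard now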

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


From: Peter Eisch [mailto:peter.ei...@virginpulse.com] 
Sent: Wednesday, October 21, 2020 12:39 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Large map object found

Hi,

My rgw.buckets.index has the cluster in WARN. I'm either not understanding the 
real issue or I'm making it worse, or both.

OMAP_BYTES: 70461524
OMAP_KEYS: 250874

I thought I'd head this off by deleting rgw objects which would normally get 
deleted in the near future but this only seemed to make the values grow. Before 
I deleted lots of objects the values were:

OMAP_BYTES: 65450132
OMAP_KEYS: 209843

I read the default is 200k but I haven't read the proper way to manage this 
situation. What reading should I dive into? I could probably craft up a command 
to increase the value to clear the warning but I'm guessing this might not be 
great long-term. 

Other errata which might matter:
Size: 3
Pool: nvme
CLASS SIZE AVAIL USED RAW USED %RAW USED 
nvme 256 TiB 165 TiB 91 TiB 91 TiB 35.53

Errata: the complete statements:

PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE 
SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP 
43.d 2 0 0 0 0 70461524 250874 3070 active+clean 36m 185904'456870 
185904:1357091 [99,90,48]p99 [99,90,48]p99 2020-10-21 13:53:42.102363 
2020-10-21 13:53:42.102363 

Thanks!

peter
Peter Eisch
Senior Site Reliability Engineer
virginpulse.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph iSCSI Performance

2020-10-06 Thread DHilsbos
Mark;

Are you suggesting some other means to configure iSCSI targets with Ceph?

If so, how do you configure for non-tcmu?

The iSCSI clients are not RBD aware, and I can't really make them RBD aware.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Mark Nelson [mailto:mnel...@redhat.com] 
Sent: Monday, October 5, 2020 3:40 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Ceph iSCSI Performance

I don't have super recent results, but we do have some test data from 
last year looking at kernel rbd, rbd-nbd, rbd+tcmu, fuse, etc:


https://docs.google.com/spreadsheets/d/1oJZ036QDbJQgv2gXts1oKKhMOKXrOI2XLTkvlsl9bUs/edit?usp=sharing


Generally speaking going through the tcmu layer was slower than kernel 
rbd or librbd directly (sometimes by quite a bit!).  There was also more 
client side CPU usage per unit performance as well (which makes sense 
since there's additional work being done).  You may be able to get some 
of that performance back with more clients as I do remember there being 
some issues with iodepth and tcmu. The only setup that I remember being 
slower at the time though was rbd-fuse which I don't think is even 
really maintained.


Mark


On 10/5/20 4:43 PM, dhils...@performair.com wrote:
> All;
>
> I've finally gotten around to setting up iSCSI gateways on my primary 
> production cluster, and performance is terrible.
>
> We're talking 1/4 to 1/3 of our current solution.
>
> I see no evidence of network congestion on any involved network link.  I see 
> no evidence CPU or memory being a problem on any involved server (MON / OSD / 
> gateway /client).
>
> What can I look at to tune this, preferably on the iSCSI gateways?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph iSCSI Performance

2020-10-05 Thread DHilsbos
All;

I've finally gotten around to setting up iSCSI gateways on my primary 
production cluster, and performance is terrible.

We're talking 1/4 to 1/3 of our current solution.

I see no evidence of network congestion on any involved network link.  I see no 
evidence of CPU or memory being a problem on any involved server (MON / OSD / 
gateway / client).

What can I look at to tune this, preferably on the iSCSI gateways?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RadosGW and DNS Round-Robin

2020-09-04 Thread DHilsbos
All;

We've been running RadosGW on our nautilus cluster for a while, and we're going 
to be adding iSCSI capabilities to our cluster, via 2 additional servers.

I intend to also run RadosGW on these servers.  That begs the question of how 
to "load balance" these servers.  I don't believe that we need true load 
balancing (i.e. through a dedicated proxy), and I'd rather not add the 
complexity and single point of failure.

The question then is:  Does RadosGW play nice with round-robin DNS?  The real 
question here is whether RadosGW maintains internal client state locally 
between connections.  I would expect it's safe, given that it is HTTP, but I'd 
prefer to verify. 

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph iSCSI Questions

2020-09-04 Thread DHilsbos
All;

We've used iSCSI to support virtualization for a while, and have used 
multi-pathing almost the entire time.  Now, I'm looking to move from our single 
box iSCSI hosts to iSCSI on Ceph.

We have 2 independent, non-routed, subnets assigned to iSCSI (let's call them 
192.168.250.0/24 and 192.168.251.0/24).  These subnets are hosted in VLANs 250 
and 251, respectively, on our switches.  Currently; each target and each 
initiator have a dedicated network port for each subnet (i.e. 2 NIC  per target 
& 2 NIC per initiator).

I have 2 server prepared to setup as Ceph iSCSI targets (let's call them 
ceph-iscsi1 & cpeh-iscsi2), and I'm wondering about their network 
configurations.  My initial plan is to configure one on the 250 network, and 
the other on the 251 network.

Would it be possible to have both servers on both networks?  In other words, 
can I give ceph-iscsi1 both 192.168.250.200 and 192.168.251.200, and 
ceph-iscsi2 192.168.250.201 and 192.168.251.201?

If that works, I would expect the initiators to see 4 paths to each portal, 
correct?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster degraded after adding OSDs to increase capacity

2020-08-31 Thread DHilsbos
Dallas;

First, I should point out that you have an issue with your units.  Your cluster 
is reporting 81 TiB (1024^4 bytes) of available space, not 81 TB (1000^4 bytes).  Similarly, 
it's reporting 22.8 TiB free space in the pool, not 22.8 TB.  For comparison, 
each of your 5.5 TB drives (TB is the correct unit here) is only about 5.0 TiB.  Hard 
drive manufacturers market in one set of units, while software systems report 
in another.  Thus, while you added 66 TB to your cluster, that is only 60 TiB.  
For background information, these pages are interesting:
https://en.wikipedia.org/wiki/Tebibyte
https://en.wikipedia.org/wiki/Binary_prefix#Consumer_confusion

It looks like you're using a replicated rule (size 3) for your cephfs_data pool.  With 
81.2 TiB available in the cluster, the maximum free space you can expect is 
27.06 TiB (81.2 / 3 = 27.06).  As we've seen, you can't actually fill a cluster 
to 100%.  It might be worth noting that the discrepancy (81.2 - 22.8 * 3 = 12.8 TiB 
raw) is roughly 10% of your entire 122.8 TiB cluster.

From your previously provided OSD map, I'm seeing some reweights that aren't 1. 
 It's possible that has some impact.

It's also possible that your cluster is "reserving" space on your HDDs for DB 
and WAL operations.

It would take someone that is more familiar with the CephFS and Dashboard code 
than I am, to answer your question definitively.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com


From: Dallas Jones [mailto:djo...@tech4learning.com] 
Sent: Monday, August 31, 2020 2:59 PM
To: Dominic Hilsbos
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Cluster degraded after adding OSDs to increase 
capacity

Thanks to everyone who replied. After setting osd_recovery_sleep_hdd to 0 and 
changing osd-max-backfills to 16, my recovery throughput increased from < 1 MB/s 
to 40-60 MB/s, and recovery finished up late last night.

The cluster is mopping up a bunch of queued deep scrubs, but is otherwise now 
healthy.

I do have one remaining question - the cluster now shows 81TB of free space, 
but the data pool only shows 22.8TB of free space. I was expecting/hoping to 
see the free space value for the pool
grow more after doubling the capacity of the cluster (it previously had 21 OSDs 
w/ 2.7TB SAS drives; I just added 12 more OSDs w/ 5.5TB drives).

Are my expectations flawed, or is there something I can do to prod Ceph into 
growing the data pool free space?








On Fri, Aug 28, 2020 at 9:37 AM  wrote:
Dallas;

I would expect so, yes.

I wouldn't be surprised to see the used percentage slowly drop as the recovery 
/ rebalance progresses.  I believe that the pool free space number is based on 
the free space of the most filled OSD under any of the PGs, so I expect the 
free space will go up as your near-full OSDs drain.

I've added OSDs to one of our clusters, once, and the recovery / rebalance 
completed fairly quickly.  I don't remember how the pool sizes progressed.  I'm 
going to need to expand our other cluster in the next couple of months, so 
follow up on how this proceeds would be appreciated.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com


From: Dallas Jones [mailto:djo...@tech4learning.com] 
Sent: Friday, August 28, 2020 7:58 AM
To: Florian Pritz
Cc: ceph-users@ceph.io; Dominic Hilsbos
Subject: Re: [ceph-users] Re: Cluster degraded after adding OSDs to increase 
capacity

Thanks for the reply. I dialed up the value for max backfills yesterday, which 
increased my recovery throughput from about 1mbps to 5ish. After tweaking 
osd_recovery_sleep_hdd, I'm seeing 50-60MBPS - which is fairly epic. No clients 
are currently using this cluster, so I'm not worried about tanking client 
performance.

One remaining question: Will the pool sizes begin to adjust once the recovery 
process is complete? Per the following screenshot, my data pool is ~94% full...



On Fri, Aug 28, 2020 at 4:31 AM Florian Pritz  
wrote:
On Thu, Aug 27, 2020 at 05:56:22PM +, dhils...@performair.com wrote:
> 2)  Adjust performance settings to allow the data movement to go faster.  
> Again, I don't have those setting immediately to hand, but Googling something 
> like 'ceph recovery tuning,' or searching this list, should point you in the 
> right direction. Notice that you only have 6 PGs trying to move at a time, 
> with 2 blocked on your near-full OSDs (8 & 19).  I believe; by default, each 
> OSD daemon is only involved in 1 data movement at a time.  The tradeoff here 
> is user activity suffers if you adjust to favor recovery, however, with the 
> cluster in ERROR status, I suspect user activity is already suffering.

We've set osd_max_backfills to 16 in the config and when necessary we
manually change the runtime value of osd_recovery_sleep_hdd. It defaults
to 0.1 seconds of wait time between objects (I think?). If you really
want fast recovery tr

[ceph-users] Re: Cluster degraded after adding OSDs to increase capacity

2020-08-28 Thread DHilsbos
Dallas;

I would expect so, yes.

I wouldn't be surprised to see the used percentage slowly drop as the recovery 
/ rebalance progresses.  I believe that the pool free space number is based on 
the free space of the most filled OSD under any of the PGs, so I expect the 
free space will go up as your near-full OSDs drain.
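
You can watch this happen while the backfill runs; for example:

  ceph df detail     # MAX AVAIL per pool, recalculated as PGs move
  ceph osd df tree   # per-OSD %USE; the near-full OSDs should slowly drain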

I've added OSDs to one of our clusters, once, and the recovery / rebalance 
completed fairly quickly.  I don't remember how the pool sizes progressed.  I'm 
going to need to expand our other cluster in the next couple of months, so 
follow up on how this proceeds would be appreciated.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com


From: Dallas Jones [mailto:djo...@tech4learning.com] 
Sent: Friday, August 28, 2020 7:58 AM
To: Florian Pritz
Cc: ceph-users@ceph.io; Dominic Hilsbos
Subject: Re: [ceph-users] Re: Cluster degraded after adding OSDs to increase 
capacity

Thanks for the reply. I dialed up the value for max backfills yesterday, which 
increased my recovery throughput from about 1mbps to 5ish. After tweaking 
osd_recovery_sleep_hdd, I'm seeing 50-60MBPS - which is fairly epic. No clients 
are currently using this cluster, so I'm not worried about tanking client 
performance.

One remaining question: Will the pool sizes begin to adjust once the recovery 
process is complete? Per the following screenshot, my data pool is ~94% full...



On Fri, Aug 28, 2020 at 4:31 AM Florian Pritz  
wrote:
On Thu, Aug 27, 2020 at 05:56:22PM +, dhils...@performair.com wrote:
> 2)  Adjust performance settings to allow the data movement to go faster.  
> Again, I don't have those setting immediately to hand, but Googling something 
> like 'ceph recovery tuning,' or searching this list, should point you in the 
> right direction. Notice that you only have 6 PGs trying to move at a time, 
> with 2 blocked on your near-full OSDs (8 & 19).  I believe; by default, each 
> OSD daemon is only involved in 1 data movement at a time.  The tradeoff here 
> is user activity suffers if you adjust to favor recovery, however, with the 
> cluster in ERROR status, I suspect user activity is already suffering.

We've set osd_max_backfills to 16 in the config and when necessary we
manually change the runtime value of osd_recovery_sleep_hdd. It defaults
to 0.1 seconds of wait time between objects (I think?). If you really
want fast recovery try this additional change:

ceph tell osd.\* config set osd_recovery_sleep_hdd 0

Be warned though, this will seriously affect client performance. Then
again it can bump your recovery speed by multiple orders of magnitude.
If you want to go back to how things were, set it back to 0.1 instead of
0. It may take a couple of seconds (maybe a minute) until performance
for clients starts to improve. I guess the OSDs are too busy with
recovery to instantly accept the changed value.

Florian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster degraded after adding OSDs to increase capacity

2020-08-27 Thread DHilsbos
Dallas;

It looks to me like you will need to wait until data movement naturally 
resolves the near-full issue.

So long as you continue to have this:
  io:
recovery: 477 KiB/s, 330 keys/s, 29 objects/s
the cluster is working.

That said, there are some things you can do.
1)  The near-full ratio is configurable.  I don't have those commands 
immediately to hand, but Googling, or searching the archives of this list, should 
show you how to change this value from its default of 85%.  Make sure you set 
it back when the data movement is complete, or almost complete.  You need to be 
careful with this, as Ceph will happily run up to the new near-full ratio, and 
error again.  You also need to keep track of the other full ratios (I believe 
there are 2 others).
2)  Adjust performance settings to allow the data movement to go faster.  
Again, I don't have those settings immediately to hand, but Googling something 
like 'ceph recovery tuning,' or searching this list, should point you in the 
right direction.  Notice that you only have 6 PGs trying to move at a time, with 
2 blocked on your near-full OSDs (8 & 19).  I believe, by default, each OSD 
daemon is only involved in 1 data movement at a time.  The tradeoff here is that 
user activity suffers if you adjust to favor recovery; however, with the 
cluster in ERROR status, I suspect user activity is already suffering.  (A rough 
sketch of both adjustments follows below.)
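
The values below are examples only, not recommendations, and I haven't verified 
them against your release; set the ratios back to their defaults once recovery 
finishes:

  ceph osd set-nearfull-ratio 0.87       # default 0.85
  ceph osd set-backfillfull-ratio 0.92   # default 0.90; keep it below the full ratio (0.95)
  ceph config set osd osd_max_backfills 4         # more concurrent backfills per OSD
  ceph config set osd osd_recovery_max_active 4   # more concurrent recovery ops per OSD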

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Dallas Jones [mailto:djo...@tech4learning.com] 
Sent: Thursday, August 27, 2020 9:02 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Cluster degraded after adding OSDs to increase 
capacity

The new drives are larger capacity than the first drives I added to the
cluster, but they're all SAS HDDs.



cephuser@ceph01:~$ ceph osd df tree
ID CLASS WEIGHTREWEIGHT SIZERAW USE DATAOMAPMETAAVAIL
 %USE  VAR  PGS STATUS TYPE NAME
-1   122.79410- 123 TiB  42 TiB  41 TiB 217 GiB 466 GiB   81
TiB 33.86 1.00   -root default
-340.93137-  41 TiB  14 TiB  14 TiB  72 GiB 154 GiB   27
TiB 33.86 1.00   -host ceph01
 0   hdd   2.72849  0.95001 2.7 TiB 2.2 TiB 2.1 TiB 7.4 GiB  24 GiB  569
GiB 79.64 2.35 218 up osd.0
 1   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.0 TiB 7.6 GiB  23 GiB  694
GiB 75.16 2.22 196 up osd.1
 2   hdd   2.72849  1.0 2.7 TiB 1.6 TiB 1.6 TiB 8.8 GiB  18 GiB  1.1
TiB 60.39 1.78 199 up osd.2
 3   hdd   2.72849  0.95001 2.7 TiB 2.2 TiB 2.1 TiB 8.3 GiB  23 GiB  583
GiB 79.13 2.34 202 up osd.3
 4   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.0 TiB 8.4 GiB  22 GiB  692
GiB 75.22 2.22 214 up osd.4
 5   hdd   2.72849  1.0 2.7 TiB 1.7 TiB 1.7 TiB 8.5 GiB  19 GiB  1.0
TiB 62.39 1.84 195 up osd.5
 6   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 2.0 TiB 8.5 GiB  21 GiB  709
GiB 74.62 2.20 217 up osd.6
22   hdd   5.45799  1.0 5.5 TiB 4.2 GiB 165 MiB 2.0 GiB 2.1 GiB  5.5
TiB  0.08 0.00  23 up osd.22
23   hdd   5.45799  1.0 5.5 TiB 2.7 GiB 161 MiB 1.5 GiB 1.0 GiB  5.5
TiB  0.05 0.00  23 up osd.23
27   hdd   5.45799  1.0 5.5 TiB  23 GiB  17 GiB 5.0 GiB 1.3 GiB  5.4
TiB  0.42 0.01  63 up osd.27
28   hdd   5.45799  1.0 5.5 TiB  10 GiB 2.8 GiB 6.0 GiB 1.3 GiB  5.4
TiB  0.18 0.01  82 up osd.28
-540.93137-  41 TiB  14 TiB  14 TiB  71 GiB 157 GiB   27
TiB 33.89 1.00   -host ceph02
 7   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.1 TiB 9.6 GiB  23 GiB  652
GiB 76.66 2.26 221 up osd.7
 8   hdd   2.72849  0.95001 2.7 TiB 2.4 TiB 2.4 TiB 7.6 GiB  26 GiB  308
GiB 88.98 2.63 220 up osd.8
 9   hdd   2.72849  1.0 2.7 TiB 2.1 TiB 2.0 TiB 8.5 GiB  23 GiB  679
GiB 75.71 2.24 214 up osd.9
10   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 1.9 TiB 7.5 GiB  21 GiB  777
GiB 72.18 2.13 208 up osd.10
11   hdd   2.72849  1.0 2.7 TiB 2.0 TiB 2.0 TiB 6.1 GiB  22 GiB  752
GiB 73.10 2.16 191 up osd.11
12   hdd   2.72849  1.0 2.7 TiB 1.5 TiB 1.5 TiB 9.1 GiB  18 GiB  1.2
TiB 56.45 1.67 188 up osd.12
13   hdd   2.72849  1.0 2.7 TiB 1.7 TiB 1.7 TiB 7.9 GiB  19 GiB 1024
GiB 63.37 1.87 193 up osd.13
25   hdd   5.45799  1.0 5.5 TiB 4.9 GiB 165 MiB 3.7 GiB 1.0 GiB  5.5
TiB  0.09 0.00  42 up osd.25
26   hdd   5.45799  1.0 5.5 TiB 2.9 GiB 157 MiB 1.6 GiB 1.2 GiB  5.5
TiB  0.05 0.00  26 up osd.26
29   hdd   5.45799  1.0 5.5 TiB  24 GiB  18 GiB 4.2 GiB 1.2 GiB  5.4
TiB  0.43 0.01  58 up osd.29
30   hdd   5.45799  1.0 5.5 TiB  21 GiB  14 GiB 5.6 GiB 1.3 GiB  5.4
TiB  0.38 0.01  71 up osd.30
-740.93137-  41 TiB  14 TiB  14 TiB  73 GiB 156 GiB   27
TiB 33.83 1.00   -host ceph03
14   hdd   2.72849  1.0 2.7 T

[ceph-users] Re: Help

2020-08-17 Thread DHilsbos
Randy;

Nextcloud is easy; it has a "standard" S3 client capability, though it also has 
Swift client capability.  As an S3 client, it does look for the older path style 
(host/bucket), rather than Amazon's newer DNS style (bucket.host).

You can find information on configuring Nextcloud's primary storage here: 
https://docs.nextcloud.com/server/18/admin_manual/configuration_files/primary_storage.html

And configuring for S3 here:
https://docs.nextcloud.com/server/18/admin_manual/configuration_files/primary_storage.html#simple-storage-service-s3

Note that Nextcloud still requires a database.
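
The S3 credentials Nextcloud needs come from a RadosGW user.  A minimal sketch 
(the uid and display name below are just examples):

  radosgw-admin user create --uid=nextcloud --display-name="Nextcloud primary storage"

The access_key and secret_key in the output are what go into the objectstore 
section of Nextcloud's config.php.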

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Randy Morgan [mailto:ran...@chem.byu.edu] 
Sent: Friday, April 17, 2020 11:14 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Help

We are seeking information on configuring Ceph to work with Noobaa and 
NextCloud.

Randy

-- 
Randy Morgan
CSR
Department of Chemistry/BioChemistry
Brigham Young University
ran...@chem.byu.edu
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to see files in buckets in radosgw object storage in ceph dashboard.?

2020-08-17 Thread DHilsbos
I would expect that most S3 compatible clients would work with RadosGW.

As to adding it to the Ceph dashboard, I don't think that's a good idea.  A 
bucket is a flat namespace.  Amazon (and then others as well) added semantics 
that allow for pseudo-hierarchical behavior, but it's still based on a flat 
namespace.
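
For illustration (the endpoint and bucket name are made up): an object stored as 
"photos/2020/img.jpg" is a single key in that flat namespace; it only looks like 
a folder because listings with a prefix and delimiter group keys on the fly:

  aws s3api list-objects-v2 --endpoint-url http://rgw.example.com \
      --bucket mybucket --prefix "photos/2020/" --delimiter "/"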

We haven't been running Ceph very long, but I'm certain we already have several 
thousand objects in our RadosGW instance; others out there have tens and even 
hundreds of thousands of objects.  Even with my relatively modest cluster I 
don't want the dashboard loading the object information for my buckets.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: sathvik vutukuri [mailto:7vik.sath...@gmail.com] 
Sent: Saturday, August 15, 2020 5:52 PM
To: ceph-users
Subject: [ceph-users] How to see files in buckets in radosgw object storage in 
ceph dashboard.?

Hi All,

Is there any way to see the list of files under buckets in ceph dashboard
for rados object storage. At present I can only see buckets details.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd out vs crush reweight]

2020-07-21 Thread DHilsbos
Marcel;

Yep, you're right.  I focused in on the last op, and missed the ones above it.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Marcel Kuiper [mailto:c...@mknet.nl] 
Sent: Tuesday, July 21, 2020 11:49 AM
To: ceph-users@ceph.io
Cc: Dominic Hilsbos
Subject: RE: [ceph-users] Re: osd out vs crush reweight]

Hi Dominiq

I must say that I inherited this cluster and did not develop the crush
rule used. The rule reads:

"rule_id": 1,
"rule_name": "hdd",
"ruleset": 1,
"type": 1,
"min_size": 2,
"max_size": 3,
"steps": [
{
"op": "take",
"item": -31,
"item_name": "DC3"
},
{
"op": "choose_firstn",
"num": 0,
"type": "room"
},
{
"op": "chooseleaf_firstn",
"num": 1,
"type": "host"
},

Doesn't that say it will choose DC3, then a room within DC3, and then a
host? (I agree that racks in the tree are superfluous, but they do no
harm either.)

Anyway, thanks for your effort. I hope someone else can explain why setting
the crush weight of an OSD to 0 results in surprisingly many PGs going to
other OSDs on the same node instead of to other nodes.

Marcel

> Marcel;
>
> To answer your question, I don't see anything that would be keeping these
> PGs on the same node.  Someone with more knowledge of how the Crush rules
> are applied, and the code around these operations, would need to weigh in.
>
> I am somewhat curious though; you define racks, and even rooms in your
> tree, but your failure domain is set to host.  Is that intentional?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> -Original Message-
> From: Marcel Kuiper [mailto:c...@mknet.nl]
> Sent: Tuesday, July 21, 2020 10:14 AM
> To: ceph-users@ceph.io
> Cc: Dominic Hilsbos
> Subject: Re: [ceph-users] Re: osd out vs crush reweight]
>
> Dominic
>
> The crush rule dump and tree are attached (hope that works). All pools use
> crush_rule 1
>
> Marcel
>
>> Marcel;
>>
>> Sorry, could also send the output of:
>> ceph osd tree
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director - Information Technology
>> Perform Air International, Inc.
>> dhils...@performair.com
>> www.PerformAir.com
>>
>>
>>
>> -Original Message-
>> From: dhils...@performair.com [mailto:dhils...@performair.com]
>> Sent: Tuesday, July 21, 2020 9:41 AM
>> To: c...@mknet.nl; ceph-users@ceph.io
>> Subject: [ceph-users] Re: osd out vs crush reweight]
>>
>> Marcel;
>>
>> Thank you for the information.
>>
>> Could you send the output of:
>> ceph osd crush rule dump
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director - Information Technology
>> Perform Air International, Inc.
>> dhils...@performair.com
>> www.PerformAir.com
>>
>>
>>
>> -Original Message-
>> From: Marcel Kuiper [mailto:c...@mknet.nl]
>> Sent: Tuesday, July 21, 2020 9:38 AM
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] Re: osd out vs crush reweight]
>>
>>
>> Hi Dominic,
>>
>> This cluster is running 14.2.8 (nautilus) There's 172 osds divided
>> over 19 nodes.
>> There are currently 10 pools.
>> All pools have 3 replica's of data
>> There are 3968 PG's (the cluster is not yet fully in use. The number
>> of PGs is expected to grow)
>>
>> Marcel
>>
>>> Marcel;
>>>
>>> Short answer; yes, it might be expected behavior.
>>>
>>> PG placement is highly dependent on the cluster layout, and CRUSH
>>> rules.
>>> So... Some clarifying questions.
>>>
>>> What version of Ceph are you running?
>>> How many nodes do you have?
>>> How many pools do you have, and what are their failure domains?
>>>
>>> Thank you,
>>>
>>> Dominic L. Hilsbos, MBA
>>> Director - Information Technology
>>> Perform Air International, Inc.
>>> dhils...@performair.com
>>> www.PerformAir.com
>>>
>>>
>>> -Original Message-
>>> From: Marcel Kuiper [mailto:c...@mknet.nl]
>>> Sent: Tuesday, July 21, 2020 6:52 AM
>>> To: ceph-users@ceph.io
>>> Subject: [ceph-users] osd out vs crush reweight
>>>
>>> Hi list,
>>>
>>> I ran a test with marking an osd out versus setting its crush weight
>>> to 0.
>>> I compared to what osds pages were send. The crush map has 3 rooms.
>>> This is what happened.
>>>
>>> On ceph osd out 111 (first room; this node has osds 108 - 116) pg's
>>> were send to the following osds
>>>
>>> NR PG's   OSD
>>>   2   1
>>>   1   4
>>>   1   5
>>>   1   6
>>>   1   7
>>>   2   8
>>>   1   31
>>>   1   34
>>>   1   35
>>>   1   56
>>>   2   57
>>>   1   58
>>>   1   61
>>>   1   83
>>>   1   84
>>>   1   88
>>>   1   99
>>>   1   100
>>>   

[ceph-users] Re: osd out vs crush reweight]

2020-07-21 Thread DHilsbos
Marcel;

To answer your question, I don't see anything that would be keeping these PGs 
on the same node.  Someone with more knowledge of how the Crush rules are 
applied, and the code around these operations, would need to weigh in.

I am somewhat curious though; you define racks, and even rooms in your tree, 
but your failure domain is set to host.  Is that intentional?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Marcel Kuiper [mailto:c...@mknet.nl] 
Sent: Tuesday, July 21, 2020 10:14 AM
To: ceph-users@ceph.io
Cc: Dominic Hilsbos
Subject: Re: [ceph-users] Re: osd out vs crush reweight]

Dominic

The crush rule dump and tree are attached (hope that works). All pools use 
crush_rule 1

Marcel

> Marcel;
>
> Sorry, could also send the output of:
> ceph osd tree
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> -Original Message-
> From: dhils...@performair.com [mailto:dhils...@performair.com]
> Sent: Tuesday, July 21, 2020 9:41 AM
> To: c...@mknet.nl; ceph-users@ceph.io
> Subject: [ceph-users] Re: osd out vs crush reweight]
>
> Marcel;
>
> Thank you for the information.
>
> Could you send the output of:
> ceph osd crush rule dump
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> -Original Message-
> From: Marcel Kuiper [mailto:c...@mknet.nl]
> Sent: Tuesday, July 21, 2020 9:38 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: osd out vs crush reweight]
>
>
> Hi Dominic,
>
> This cluster is running 14.2.8 (nautilus) There's 172 osds divided 
> over 19 nodes.
> There are currently 10 pools.
> All pools have 3 replica's of data
> There are 3968 PG's (the cluster is not yet fully in use. The number 
> of PGs is expected to grow)
>
> Marcel
>
>> Marcel;
>>
>> Short answer; yes, it might be expected behavior.
>>
>> PG placement is highly dependent on the cluster layout, and CRUSH rules.
>> So... Some clarifying questions.
>>
>> What version of Ceph are you running?
>> How many nodes do you have?
>> How many pools do you have, and what are their failure domains?
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director - Information Technology
>> Perform Air International, Inc.
>> dhils...@performair.com
>> www.PerformAir.com
>>
>>
>> -Original Message-
>> From: Marcel Kuiper [mailto:c...@mknet.nl]
>> Sent: Tuesday, July 21, 2020 6:52 AM
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] osd out vs crush reweight
>>
>> Hi list,
>>
>> I ran a test with marking an osd out versus setting its crush weight 
>> to 0.
>> I compared to what osds pages were send. The crush map has 3 rooms. 
>> This is what happened.
>>
>> On ceph osd out 111 (first room; this node has osds 108 - 116) pg's 
>> were send to the following osds
>>
>> NR PG's   OSD
>>   2   1
>>   1   4
>>   1   5
>>   1   6
>>   1   7
>>   2   8
>>   1   31
>>   1   34
>>   1   35
>>   1   56
>>   2   57
>>   1   58
>>   1   61
>>   1   83
>>   1   84
>>   1   88
>>   1   99
>>   1   100
>>   2   107
>>   1   114
>>   2   117
>>   1   118
>>   1   119
>>   1   121
>>
>> All PG's were send to osds on other nodes in the same room, except 
>> for 1 PG on osd 114. I think this works as expected
>>
>> Now I  marked the osd in and wait until all stabilized. Then I set 
>> the crush weight to 0. ceph osd crush reweight osd.111 0. I thought 
>> this lowers the crush weight of the node so even less chances that 
>> PG's end up on an osd of the same node. However the result are
>>
>> NR PG's   OSD
>>   1   61
>>   1   83
>>   1   86
>>   3   108
>>   4   109
>>   5   110
>>   2   112
>>   5   113
>>   7   114
>>   5   115
>>   2   116
>>
>> except for 3 PG's all other PG's ended up on an osd belonging to the 
>> same node :-O. Is this expected behaviour? Can someone explain?? This 
>> is on nautilus 14.2.8.
>>
>> Thanks
>>
>> Marcel
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
>> email to ceph-users-le...@ceph.io 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
>> email to ceph-users-le...@ceph.io
>>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io 
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io 
> ___
> ceph-users mail

[ceph-users] Re: osd out vs crush reweight]

2020-07-21 Thread DHilsbos
Marcel;

Sorry, could also send the output of:
ceph osd tree

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: dhils...@performair.com [mailto:dhils...@performair.com] 
Sent: Tuesday, July 21, 2020 9:41 AM
To: c...@mknet.nl; ceph-users@ceph.io
Subject: [ceph-users] Re: osd out vs crush reweight]

Marcel;

Thank you for the information.

Could you send the output of:
ceph osd crush rule dump

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Marcel Kuiper [mailto:c...@mknet.nl] 
Sent: Tuesday, July 21, 2020 9:38 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: osd out vs crush reweight]


Hi Dominic,

This cluster is running 14.2.8 (nautilus)
There's 172 osds divided over 19 nodes.
There are currently 10 pools.
All pools have 3 replica's of data
There are 3968 PG's (the cluster is not yet fully in use. The number of
PGs is expected to grow)

Marcel

> Marcel;
>
> Short answer; yes, it might be expected behavior.
>
> PG placement is highly dependent on the cluster layout, and CRUSH rules.
> So... Some clarifying questions.
>
> What version of Ceph are you running?
> How many nodes do you have?
> How many pools do you have, and what are their failure domains?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
> -Original Message-
> From: Marcel Kuiper [mailto:c...@mknet.nl]
> Sent: Tuesday, July 21, 2020 6:52 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] osd out vs crush reweight
>
> Hi list,
>
> I ran a test with marking an osd out versus setting its crush weight to 0.
> I compared to what osds pages were send. The crush map has 3 rooms. This
> is what happened.
>
> On ceph osd out 111 (first room; this node has osds 108 - 116) pg's were
> send to the following osds
>
> NR PG's   OSD
>   2   1
>   1   4
>   1   5
>   1   6
>   1   7
>   2   8
>   1   31
>   1   34
>   1   35
>   1   56
>   2   57
>   1   58
>   1   61
>   1   83
>   1   84
>   1   88
>   1   99
>   1   100
>   2   107
>   1   114
>   2   117
>   1   118
>   1   119
>   1   121
>
> All PG's were send to osds on other nodes in the same room, except for 1
> PG on osd 114. I think this works as expected
>
> Now I  marked the osd in and wait until all stabilized. Then I set the
> crush weight to 0. ceph osd crush reweight osd.111 0. I thought this
> lowers the crush weight of the node so even less chances that PG's end up
> on an osd of the same node. However the result are
>
> NR PG's   OSD
>   1   61
>   1   83
>   1   86
>   3   108
>   4   109
>   5   110
>   2   112
>   5   113
>   7   114
>   5   115
>   2   116
>
> except for 3 PG's all other PG's ended up on an osd belonging to the same
> node :-O. Is this expected behaviour? Can someone explain?? This is on
> nautilus 14.2.8.
>
> Thanks
>
> Marcel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd out vs crush reweight]

2020-07-21 Thread DHilsbos
Marcel;

Thank you for the information.

Could you send the output of:
ceph osd crush rule dump

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Marcel Kuiper [mailto:c...@mknet.nl] 
Sent: Tuesday, July 21, 2020 9:38 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: osd out vs crush reweight]


Hi Dominic,

This cluster is running 14.2.8 (nautilus)
There's 172 osds divided over 19 nodes.
There are currently 10 pools.
All pools have 3 replica's of data
There are 3968 PG's (the cluster is not yet fully in use. The number of
PGs is expected to grow)

Marcel

> Marcel;
>
> Short answer; yes, it might be expected behavior.
>
> PG placement is highly dependent on the cluster layout, and CRUSH rules.
> So... Some clarifying questions.
>
> What version of Ceph are you running?
> How many nodes do you have?
> How many pools do you have, and what are their failure domains?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
> -Original Message-
> From: Marcel Kuiper [mailto:c...@mknet.nl]
> Sent: Tuesday, July 21, 2020 6:52 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] osd out vs crush reweight
>
> Hi list,
>
> I ran a test with marking an osd out versus setting its crush weight to 0.
> I compared to what osds pages were send. The crush map has 3 rooms. This
> is what happened.
>
> On ceph osd out 111 (first room; this node has osds 108 - 116) pg's were
> send to the following osds
>
> NR PG's   OSD
>   2   1
>   1   4
>   1   5
>   1   6
>   1   7
>   2   8
>   1   31
>   1   34
>   1   35
>   1   56
>   2   57
>   1   58
>   1   61
>   1   83
>   1   84
>   1   88
>   1   99
>   1   100
>   2   107
>   1   114
>   2   117
>   1   118
>   1   119
>   1   121
>
> All PG's were send to osds on other nodes in the same room, except for 1
> PG on osd 114. I think this works as expected
>
> Now I  marked the osd in and wait until all stabilized. Then I set the
> crush weight to 0. ceph osd crush reweight osd.111 0. I thought this
> lowers the crush weight of the node so even less chances that PG's end up
> on an osd of the same node. However the result are
>
> NR PG's   OSD
>   1   61
>   1   83
>   1   86
>   3   108
>   4   109
>   5   110
>   2   112
>   5   113
>   7   114
>   5   115
>   2   116
>
> except for 3 PG's all other PG's ended up on an osd belonging to the same
> node :-O. Is this expected behaviour? Can someone explain?? This is on
> nautilus 14.2.8.
>
> Thanks
>
> Marcel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd out vs crush reweight

2020-07-21 Thread DHilsbos
Marcel;

Short answer; yes, it might be expected behavior.

PG placement is highly dependent on the cluster layout, and CRUSH rules.  So... 
Some clarifying questions.

What version of Ceph are you running?
How many nodes do you have?
How many pools do you have, and what are their failure domains?
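
A quick sketch of commands that gather most of that information (adjust to
your environment):

ceph osd tree                # hosts / rooms and CRUSH weights
ceph osd crush rule dump     # failure domain of each CRUSH rule
ceph osd pool ls detail      # size and CRUSH rule used by each pool
ceph versions                # Ceph release(s) running in the cluster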

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Marcel Kuiper [mailto:c...@mknet.nl] 
Sent: Tuesday, July 21, 2020 6:52 AM
To: ceph-users@ceph.io
Subject: [ceph-users] osd out vs crush reweight

Hi list,

I ran a test comparing marking an OSD out versus setting its crush weight to 0,
and looked at which OSDs the PGs were sent to. The crush map has 3 rooms. This
is what happened.

On ceph osd out 111 (first room; this node has osds 108 - 116), PGs were
sent to the following OSDs:

NR PG's   OSD
  2   1
  1   4
  1   5
  1   6
  1   7
  2   8
  1   31
  1   34
  1   35
  1   56
  2   57
  1   58
  1   61
  1   83
  1   84
  1   88
  1   99
  1   100
  2   107
  1   114
  2   117
  1   118
  1   119
  1   121

All PGs were sent to OSDs on other nodes in the same room, except for 1
PG on osd 114. I think this works as expected.

Now I marked the osd in and waited until everything stabilized. Then I set the
crush weight to 0: ceph osd crush reweight osd.111 0. I thought this lowers
the crush weight of the node, so there would be even less chance that PGs end up
on an OSD of the same node. However, the results are

NR PG's   OSD
  1   61
  1   83
  1   86
  3   108
  4   109
  5   110
  2   112
  5   113
  7   114
  5   115
  2   116

Except for 3 PGs, all other PGs ended up on an OSD belonging to the same
node :-O. Is this expected behaviour? Can someone explain? This is on
nautilus 14.2.8.
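
For reference, the comparison can be reproduced roughly like this (a sketch,
not necessarily the exact commands used):

ceph pg dump pgs_brief > pgs_before.txt
ceph osd out 111                        # or: ceph osd crush reweight osd.111 0
# wait for peering/backfill to settle (watch ceph -s), then:
ceph pg dump pgs_brief > pgs_after.txt
diff pgs_before.txt pgs_after.txt | less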

Thanks

Marcel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Thank you!

2020-07-20 Thread DHilsbos
I just want to thank the Ceph community, and the Ceph developers for such a 
wonderful product.

We had a power outage on Saturday, and both Ceph clusters went offline, along 
with all of our other servers.

Bringing Ceph back to full functionality was an absolute breeze, no problems, 
no hiccups, no nothing.  Just start the servers, and watch everything sort 
itself out.

Again; Thank you!

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph/rados performace sync vs async

2020-07-17 Thread DHilsbos
Daniel;

As I said, I don't actually KNOW most of this.

As such, what I laid out was conceptual.  

Ceph would need to be implemented to perform these operations in parallel, or 
not.  Conceptually, in those areas where operations can be parallelized, making 
them parallel would improve wall-clock performance in 80% - 90% of cases, so 
making this configurable wouldn't make sense.

That said, I don't know which route the developers went.

All I know is that the client transfers each chunk to the master for its PG, 
and the master sends it on to the replicas.

I suspect that replicas must acknowledge the chunk before the master finishes 
the synchronous operation.
I suspect that all replicas are transferred (from the master) in parallel.

Given the maturity of Ceph, I suspect this has already been done, unless the 
developers ran into a significant issue, but I don't know.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Daniel Mezentsev [mailto:d...@soleks.com] 
Sent: Friday, July 17, 2020 4:14 PM
To: Dominic Hilsbos
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: ceph/rados performace sync vs async

  Hi Dominic,

The pool is configured as replicated, with replica size 2 (I know it's not
recommended, but it's a test environment, and I can't afford more nodes
running at home). What you are saying makes sense to me.

You mentioned that even for sync IO some jobs can be done in parallel.
That is above my current level of Ceph knowledge, but it sounds like a very
interesting part to look into. Can you give me an initial direction on
where to start, and are those options configurable?

> Daniel;
>
> How is your pool configured? Replica, or Erasure-Coded?
>
> I don't actually know any of this, but...
>
> I would expect that a synchronous call to a replica pool (R=3) would  
> look something like this:
> Client --> PG Master Host (data)
> PG Master Host --> Local Disk (data)
> PG Master Host --> PG Replica Host 1 (data)
> PG Master Host --> PG Replica Host 2 (data)
> PG Replica Host 1 --> Disk (data)
> PG Replica Host 2 --> Disk (data)
> PG Replica Host 1 --> PG Master Host (ack)
> PG Replica Host 2 --> PG Master Host (ack)
> PG Master Host --> Client (ack)
>
> Some of that can happen in parallel, for instance the master could  
> be transferring the file to both replica hosts, while also writing  
> it to disk.
>
> You can imagine why a synchronous call could be significantly slower  
> though, yes?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director – Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com[1]
>
> -Original Message-
> From: Daniel Mezentsev [mailto:d...@soleks.com]
> Sent: Friday, July 17, 2020 3:05 PM
> To: ceph-users@ceph.io
> Subject: [ceph-users] ceph/rados performace sync vs async
>
> Hi All,
>
> I started a small project related to metrics collection and
> processing, Ceph was chosen as a storage backend. Decided to use rados
> directly, to avoid any additional layers. I got a very simple client -
> it works fine, but performance is very low. Can't get more than
> 30-35MBsec. Rados bench shows 200MBsec for my test pool. Should be
> mentioned about the client - I'm using sbcl (yep lisp). Call to rados
> API is just cffi call. 
>  
> Did try in async mode. Wow ! Saturated network bandwidth for large
> objects (4Mb and bigger), for small objects - saturated OSD IOPS
> ~2.4KIOPS for 8 SAS disks, so ~300 IOPS per disk - that sounds pretty
> reasonable. Bottom line - issue not with the lisp client - im getting
> close to C performance, difference is sync vs async IO.
>  
> Why it's so big - sync operations are approx 2-3 times slower then async .
> Daniel Mezentsev, founder
> (+1) 604 313 8592.
> Soleks Data Group.
> Shaping the clouds.
> ___
> ceph-users mailing list -- ceph-us...@ceph.io To unsubscribe send an  
> email to ceph-users-le...@ceph.io



Links:
--
[1] http://www.PerformAir.com
  Daniel Mezentsev, founder
(+1) 604 313 8592.
Soleks Data Group.
Shaping the clouds.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph/rados performace sync vs async

2020-07-17 Thread DHilsbos
Daniel;

How is your pool configured? Replica, or Erasure-Coded?

I don't actually know any of this, but...

I would expect that a synchronous call to a replica pool (R=3) would look 
something like this:
Client --> PG Master Host (data)
PG Master Host --> Local Disk (data)
PG Master Host --> PG Replica Host 1 (data)
PG Master Host --> PG Replica Host 2 (data)
PG Replica Host 1 --> Disk (data)
PG Replica Host 2 --> Disk (data)
PG Replica Host 1 --> PG Master Host (ack)
PG Replica Host 2 --> PG Master Host (ack)
PG Master Host --> Client (ack)

Some of that can happen in parallel, for instance the master could be 
transferring the file to both replica hosts, while also writing it to disk.

You can imagine why a synchronous call could be significantly slower though, 
yes?
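
A rough way to see the effect of having more requests in flight is rados bench
with different concurrency settings (sketch only; "testpool" is just a
throwaway pool name):

ceph osd pool create testpool 32 32
rados -p testpool bench 30 write -b 4194304 -t 1    # one op in flight, roughly a synchronous client
rados -p testpool bench 30 write -b 4194304 -t 16   # 16 ops in flight, roughly an async client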

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Daniel Mezentsev [mailto:d...@soleks.com] 
Sent: Friday, July 17, 2020 3:05 PM
To: ceph-users@ceph.io
Subject: [ceph-users] ceph/rados performace sync vs async

Hi All,

  I started a small project related to metrics collection and
processing, and Ceph was chosen as the storage backend. I decided to use rados
directly, to avoid any additional layers. I have a very simple client -
it works fine, but performance is very low. I can't get more than
30-35 MB/sec, while rados bench shows 200 MB/sec for my test pool. It should be
mentioned that the client uses sbcl (yep, Lisp); calls to the rados
API are just cffi calls.

  I did try async mode. Wow! It saturated network bandwidth for large
objects (4 MB and bigger); for small objects it saturated OSD IOPS at
~2.4K IOPS for 8 SAS disks, so ~300 IOPS per disk - that sounds pretty
reasonable. Bottom line - the issue is not with the Lisp client - I'm getting
close to C performance; the difference is sync vs async IO.

  Why is the gap so big - sync operations are approx 2-3 times slower than async.
  Daniel Mezentsev, founder
(+1) 604 313 8592.
Soleks Data Group.
Shaping the clouds.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: about replica size

2020-07-10 Thread DHilsbos
This keeps coming up, which is not surprising, considering it is a core 
question.

Here's how I look at it:
The Ceph team has chosen to default to N+2 redundancy.  This is analogous to 
RAID 6 (NOT RAID 1).

The basic reasoning for N+2 in storage is as follows:
If you experience downtime (either routine, or non-routine), then you can 
survive an additional failure during recovery.

The scenario goes like this:
Update software on OSD host
Reboot
All hosted OSDs are marked down
Host come online, hosted OSDs come online
Recovery begins
Drive in another host is found to have an unreadable sector

If you only have single redundancy (N+1, i.e. R2, m=X n=1, RAID1, RAID5), then 
you have now lost data.

If you have double redundancy (N+2, R3, m=X n=2, RAID6), then there is a third 
method to get the good data, and both redundancy layers can be rebuilt.

RAID6 came into being because the longer you spend recovering, the more likely 
you are to run into an undetected failure.

Ceph takes the same view; it is built for MASSIVE storage, with LONG recovery 
times.  My pools are built on 10Tb drives, others in this list use 12Tb and 
14Tb drives.

It all comes down to this: are you absolutely certain that you will never 
encounter a latent failure, while executing routine maintenance?

If you look at it from the standpoint of Risk Management, using Dr. Reason's 
Swiss Cheese model, then each redundancy layer is a barrier against failure.
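
As a sketch, the settings that control these redundancy layers per pool look
like this (the pool name "rbd" is just an example):

ceph osd pool get rbd size        # number of copies kept
ceph osd pool get rbd min_size    # copies required before writes are blocked
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2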

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Zhenshi Zhou [mailto:deader...@gmail.com] 
Sent: Thursday, July 9, 2020 7:11 PM
To: ceph-users
Subject: [ceph-users] about replica size

Hi,

As we all know, the default replica setting of 'size' is 3, which means
there are 3 copies of an object. What are the disadvantages if I set it to 2,
other than getting fewer copies?

Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Module 'cephadm' has failed: auth get failed: failed to find client.crash.ceph0-ote in keyring retval:

2020-07-03 Thread DHilsbos
Biohazard;

This looks like a fairly simple authentication issue.  It looks like the 
keyring(s) available to the command don't contain a key which meets the 
command's needs.

Have you verified the presence and accuracy of your keys?
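
For example (a sketch; the capabilities shown follow the crash module
documentation, so verify them against your release):

ceph auth ls | grep -A3 crash                 # list crash-related keys
ceph auth get client.crash.ceph0-ote          # does the key exist at all?
# if it is missing, recreating it may help:
ceph auth get-or-create client.crash.ceph0-ote mon 'profile crash' mgr 'profile crash'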

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: bioh...@yahoo.com [mailto:bioh...@yahoo.com] 
Sent: Friday, July 3, 2020 2:29 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Module 'cephadm' has failed: auth get failed: failed to 
find client.crash.ceph0-ote in keyring retval:

anyone seen this error on the new Ceph 15.2.4 cluster using cpehadm to manage 
it ?

Module 'cephadm' has failed: auth get failed: failed to find 
client.crash.ceph0-ote in keyring retval:
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Object Gateway not working within the dashboard anymore after network change

2020-07-03 Thread DHilsbos
Hendik;

I'm assuming that s3.url.com round robin DNSed to the new interface on each 
host.

I don't see a problem with pointing the dashboard at one of the hosts directly. 
 Though there is no load balancing in that kind of setup.  I don't believe the 
dashboard represents a significant load.

If the load balancing is a concern to you, you could set up another 
round-robin DNS name, at (for instance) rgw.url.com, and direct the dashboard there.  
You could also use an IP address instead of a URL.
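
For example (the host and port here are just placeholders):

ceph dashboard set-rgw-api-host rgw.url.com   # or an IP address
ceph dashboard set-rgw-api-port 80            # whatever port the RGWs listen on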

You mentioned setting static routes between the RGW hosts and the Ceph cluster; 
you might consider moving the Ceph (old) interfaces into the Ceph Public 
network subnet.  This would reduce the complexity of your overall setup, and 
possibly improve maintainability.  Just a thought.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Hendrik Peyerl [mailto:hpey...@plusline.net] 
Sent: Friday, July 3, 2020 7:11 AM
To: Dominic Hilsbos
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Object Gateway not working within the dashboard 
anymore after network change

Hello Dominic,

thank you for your quick help. I did change the settings, but maybe to the wrong 
host:

The endpoint for the clients would be something like $bucketname.s3.url.com, so I 
set the api-host to s3.url.com (which worked before).
Now that I am writing this I realize that the dashboard hosts do not have 
access to that URL. As I have 2 RGW servers, would it be a bad idea to 
just point it to one of them?

Thanks,

Hendrik

> On 3. Jul 2020, at 15:49, dhils...@performair.com wrote:
> 
> Hendrik;
> 
> Since the hostname / FQDN for use by Ceph for you RGW server(s) changed, did 
> you adjust the rgw-api-host setting for the dashboard?
> 
> The command would be:
> ceph dashboard set-rgw-api-host 
> 
> Thank you,
> 
> Dominic L. Hilsbos, MBA 
> Director – Information Technology 
> Perform Air International Inc.
> dhils...@performair.com 
> www.PerformAir.com
> 
> 
> 
> -Original Message-
> From: Hendrik Peyerl [mailto:hpey...@plusline.net] 
> Sent: Friday, July 3, 2020 4:26 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Object Gateway not working within the dashboard anymore 
> after network change
> 
> Hi all,
> 
> we are currently experiencing a problem with the Obejct Gateway part of the 
> dashboard not working anymore:
> 
> We had a working setup were the RGW servers only had 1 network interface with 
> an IP address that was reachable by the monitor servers and the dashboard was 
> working as expected. 
> After our initial tests everything was working great and we decided to add 
> another physical link to the RGW Servers for the traffic to the clients.
> With that network change we also had to set the default gateway to the new 
> interface while adding static routes for the rest of the ceph environment.
> To avoid issues with hostnames (the old hostname now resolves to the new 
> interface) we added another hostname for the internal traffic, purged the 
> gateways from ceph and added them again via ceph-deploy rgw create with the 
> new hostname.
> 
> The S3 communication is working perfectly fine as it did before, we can reach 
> all buckets and the monitors can communicate with the Gateway. The Dashboard 
> however throws the following error whenever we navigate to any of the object 
> gateway menus:
> 
> — 
> 
> 2020-07-03 10:33:41.871 7fa0f9dbc700  0 mgr[dashboard] [03/Jul/2020:10:33:41] 
> HTTP Traceback (most recent call last):
>  File "/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py", line 656, in 
> respond
>response.body = self.handler()
>  File "/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py", line 188, 
> in __call__
>self.body = self.oldhandler(*args, **kwargs)
>  File "/usr/lib/python2.7/site-packages/cherrypy/_cptools.py", line 221, in 
> wrap
>return self.newhandler(innerfunc, *args, **kwargs)
>  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 88, in 
> dashboard_exception_handler
>return handler(*args, **kwargs)
>  File "/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py", line 34, in 
> __call__
>return self.callable(*self.args, **self.kwargs)
>  File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 661, in 
> inner
>ret = func(*args, **kwargs)
>  File "/usr/share/ceph/mgr/dashboard/controllers/rgw.py", line 28, in status
>if not instance.is_service_online():
>  File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 507, in 
> func_wrapper
>**kwargs)
>  File "/usr/share/ceph/mgr/dashboard/services/rgw_client.py", line 321, in 
> is_service_online
>_ = request({'format': 'json'})
>  File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 313, in __call__
>data, raw_content)
>  File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 445, in do_requ

[ceph-users] Re: Object Gateway not working within the dashboard anymore after network change

2020-07-03 Thread DHilsbos
Hendrik;

Since the hostname / FQDN for use by Ceph for you RGW server(s) changed, did 
you adjust the rgw-api-host setting for the dashboard?

The command would be:
ceph dashboard set-rgw-api-host 

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Hendrik Peyerl [mailto:hpey...@plusline.net] 
Sent: Friday, July 3, 2020 4:26 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Object Gateway not working within the dashboard anymore 
after network change

Hi all,

we are currently experiencing a problem with the Object Gateway part of the 
dashboard not working anymore:

We had a working setup where the RGW servers only had 1 network interface with 
an IP address that was reachable by the monitor servers and the dashboard was 
working as expected. 
After our initial tests everything was working great and we decided to add 
another physical link to the RGW Servers for the traffic to the clients.
With that network change we also had to set the default gateway to the new 
interface while adding static routes for the rest of the ceph environment.
To avoid issues with hostnames (the old hostname now resolves to the new 
interface) we added another hostname for the internal traffic, purged the 
gateways from ceph and added them again via ceph-deploy rgw create with the new 
hostname.

The S3 communication is working perfectly fine as it did before, we can reach 
all buckets and the monitors can communicate with the Gateway. The Dashboard 
however throws the following error whenever we navigate to any of the object 
gateway menus:

— 

2020-07-03 10:33:41.871 7fa0f9dbc700  0 mgr[dashboard] [03/Jul/2020:10:33:41] 
HTTP Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py", line 656, in 
respond
response.body = self.handler()
  File "/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py", line 188, 
in __call__
self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/cherrypy/_cptools.py", line 221, in 
wrap
return self.newhandler(innerfunc, *args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 88, in 
dashboard_exception_handler
return handler(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py", line 34, in 
__call__
return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 661, in 
inner
ret = func(*args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/rgw.py", line 28, in status
if not instance.is_service_online():
  File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 507, in func_wrapper
**kwargs)
  File "/usr/share/ceph/mgr/dashboard/services/rgw_client.py", line 321, in 
is_service_online
_ = request({'format': 'json'})
  File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 313, in __call__
data, raw_content)
  File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 445, in do_request
ex.args[0].reason.args[0])
  File "/usr/lib64/python2.7/re.py", line 137, in match
return _compile(pattern, flags).match(string)
TypeError: expected string or buffer

2020-07-03 10:33:41.872 7fa0f9dbc700  0 mgr[dashboard] [2a02:2e0:13::a05:42784] 
[GET] [500] [45.044s] [plusline] [1.8K] /api/rgw/status
2020-07-03 10:33:41.872 7fa0f9dbc700  0 mgr[dashboard] ['{"status": "500 
Internal Server Error", "version": "3.2.2", "traceback": "Traceback (most 
recent call last):\\n  File \\"/usr/lib/python2.7/site-
packages/cherrypy/_cprequest.py\\", line 656, in respond\\nresponse.body = 
self.handler()\\n  File 
\\"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\\", line 188, in 
__call__\\nself.b
ody = self.oldhandler(*args, **kwargs)\\n  File 
\\"/usr/lib/python2.7/site-packages/cherrypy/_cptools.py\\", line 221, in 
wrap\\nreturn self.newhandler(innerfunc, *args, **kwargs)\\n  File \\"/usr/s
hare/ceph/mgr/dashboard/services/exception.py\\", line 88, in 
dashboard_exception_handler\\nreturn handler(*args, **kwargs)\\n  File 
\\"/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py\\", l
ine 34, in __call__\\nreturn self.callable(*self.args, **self.kwargs)\\n  
File \\"/usr/share/ceph/mgr/dashboard/controllers/__init__.py\\", line 661, in 
inner\\nret = func(*args, **kwargs)\\n  F
ile \\"/usr/share/ceph/mgr/dashboard/controllers/rgw.py\\", line 28, in 
status\\nif not instance.is_service_online():\\n  File 
\\"/usr/share/ceph/mgr/dashboard/rest_client.py\\", line 507, in func_w
rapper\\n**kwargs)\\n  File 
\\"/usr/share/ceph/mgr/dashboard/services/rgw_client.py\\", line 321, in 
is_service_online\\n_ = request({\'format\': \'json\'})\\n  File 
\\"/usr/share/ceph/mgr/dashb
oard/rest_client.py\\", line 313, in __call__\\ndata, raw_content)\\n  File 
\\"/usr/share/c

[ceph-users] Re: fault tolerant about erasure code pool

2020-06-26 Thread DHilsbos
As others have pointed out; setting the failure domain to OSD is dangerous 
because then all 6 chunks for an object can end up on the same host.  6 hosts 
really seems like the minimum to mess with EC pools.

Adding a bucket type between host and osd seems like a good idea here, if you 
absolutely must use EC pools.

Perhaps something that corresponds to the HBAs / disk controllers?
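
The rough procedure for adding such a bucket type would be (sketch only; the
new type name and layout are up to you):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt: add a bucket type (e.g. "hba") between osd and host,
# define the new buckets, and point the EC rule's failure domain at it
crushtool -c crushmap.txt -o crushmap_new.bin
ceph osd setcrushmap -i crushmap_new.bin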

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Lindsay Mathieson [mailto:lindsay.mathie...@gmail.com] 
Sent: Friday, June 26, 2020 4:08 AM
To: Zhenshi Zhou
Cc: ceph-users
Subject: [ceph-users] Re: fault tolerant about erasure code pool

On 26/06/2020 8:08 pm, Zhenshi Zhou wrote:
> Hi Lindsay,
>
> I have only 3 hosts, and is there any method to set a EC pool cluster 
> in a better way

There's failure domain by OSD, which Janne knows far better than I :)

-- 
Lindsay
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

2020-06-24 Thread DHilsbos
All;

This conversation has been fascinating.

I'm throwing my hat in the ring, though I know almost nothing about systemd...

Completely non-portable, but...
Couldn't you write a script to issue the necessary commands to the desired 
drives, then create a systemd unit that calls it before OSD initialization?
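
Something along these lines might work (untested sketch; the device list,
paths and unit ordering are assumptions):

cat > /usr/local/sbin/disable-write-cache.sh <<'EOF'
#!/bin/bash
# Disable the volatile write cache on the OSD data drives (device list is site-specific).
for dev in /dev/sda /dev/sdb; do
    hdparm -W 0 "$dev"
    # For SAS drives, sdparm (clearing the WCE bit) may be needed instead of hdparm.
done
EOF
chmod +x /usr/local/sbin/disable-write-cache.sh

cat > /etc/systemd/system/disable-write-cache.service <<'EOF'
[Unit]
Description=Disable volatile write cache on OSD data drives
Before=ceph-osd.target

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/disable-write-cache.sh

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload && systemctl enable disable-write-cache.service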

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Frank Schilder [mailto:fr...@dtu.dk] 
Sent: Wednesday, June 24, 2020 9:15 AM
To: Marc Roos; paul.emmerich
Cc: bknecht; ceph-users; s.priebe
Subject: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
MG07ACA14TE HDDs

> I can remember reading this before. I was hoping you maybe had some
> setup with systemd scripts or maybe udev.

Yeah, doing this on boot up would be ideal. I was looking really hard into 
tuned and other services that claimed can do it, but required plugins or other 
stuff did/does not exist and documentation is close to non-existent.

After spending a couple of days I gave up and went with the simple 
script-command version.

If you come across something that allows easy configuration of this at 
boot-time, please let me know.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Marc Roos 
Sent: 24 June 2020 18:08:49
To: Frank Schilder; paul.emmerich
Cc: bknecht; ceph-users; s.priebe
Subject: RE: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba 
MG07ACA14TE HDDs

> Sorry for the spam, but I need to add this disclaimer:

> Although it is documented as safe to disable volatile write cache on a
disk in use, I would
> probably not do it. The required cache flush might be erroneous in the
firmware.

I can remember reading this before. I was hoping you maybe had some
setup with systemd scripts or maybe udev.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't bind mon to v1 port in Octopus.

2020-06-18 Thread DHilsbos
My understanding is that MONs only configure themselves from the config file at 
first startup.  After that all MONs use the monmap to learn about themselves, 
and their peers.

As such, adding an address to the config file for a running MON, even if you 
restart / reboot, would not achieve the expected changes, as it doesn't modify 
the monmap.
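
For example, a quick way to see what addresses the monmap actually contains:

ceph mon dump        # the mon addresses in the monmap, independent of ceph.conf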

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: mafo...@gmail.com [mailto:mafo...@gmail.com] 
Sent: Thursday, June 18, 2020 12:56 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Can't bind mon to v1 port in Octopus.

After some more testing it seems that ceph just does not pick up on some 
ceph.conf changes after being bootstrapped. It was possible to bind to the v1 port 
using `ceph mon set-addrs aio1 [v2:172.16.6.210:3300,v1:172.16.6.210:6789]`.
It was definitely not an issue with OS syscalls or permissions, just ceph not 
picking up on the new config after a restart.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW listing slower on nominally faster setup

2020-06-11 Thread DHilsbos
Stefan;

I can't find it, but I seem to remember a discussion in this mailing list that 
sharded RGW performance is significantly better if the shard count is a power 
of 2, so you might try increasing shards to 64.

Also, you might look at OSD logs while a listing is trying to run, to see if 
this illuminates anything for you.

You said: "2 x SATA SSDs for RGW index pool," but do you have the zone's index 
pool running on a rule which only targets SSDs, or only targets those SSDs?  
Are you running your RGW multi-site?  Are you running replication for RGW in 
multi-site?
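
A few things worth checking (sketch; the pool and bucket names are
placeholders, and the usual caveats around manual resharding in a multi-site
setup apply):

ceph osd pool get default.rgw.buckets.index crush_rule   # is the index pool really on the SSD rule?
radosgw-admin bucket limit check                         # current num_shards / objects per shard
radosgw-admin bucket reshard --bucket=BUCKETNAME --num-shards=64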

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Stefan Wild [mailto:sw...@tiltworks.com] 
Sent: Wednesday, June 10, 2020 6:05 PM
To: ceph-users@ceph.io
Subject: [ceph-users] RGW listing slower on nominally faster setup

Hi everyone,

We are currently transitioning from a temporary machine to our production 
hardware. Since we're starting with under 200 TB raw storage, we are currently 
on only 1–2 physical machines per cluster, eventually in 3 zones. The temporary 
machine is undersized for even that with an older single 6-core CPU and 
spinning disks only. As of now that "cluster-of-one" is running on Nautilus and 
has 3 buckets with 98K, 1.1M and 1.4M objects, respectively for a total of 9.1 
TB. As we're expecting these to grow to around 5M objects each and will be in a 
multisite configuration, I went with 50 shards per bucket.

Listing "directories" via S3 is somewhat slow (sometimes to the point of read 
timeouts) but mostly bearable. After the new production setup (dual 
8-core/16-thread Xeon Silvers, 2 x SATA SSDs for RGW index pool, on Octopus, 
with enough free memory to easily fit all bucket indexes multiple times) synced 
successfully, listings via S3 always time out on the RGW on that machine/zone.

As soon as I trigger a single listing via S3 (even on the 98K object bucket), 
reads go up to a sustained 300–500MB/s and 20–50K IOPS on the bucket index pool 
for several hours. The RGW debug log is flooded with lines like this:

{"log":"debug 2020-06-08T19:31:08.315+ 7f83d704c700  1 
RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing 
requires read #1\n","stream":"stdout","time":"2020-06-08T19:31:08.317198682Z"}

I get that sharded RGW indexes (and listing objects in S3 buckets in general) 
are not very efficient, but after getting somewhat decent results on slower 
hardware and an older Ceph version, I wasn't expecting the nominally much 
better setup to be orders of magnitude slower.

Any help or pointers would be greatly appreciated.

Thank you,
Stefan


___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io


[ceph-users] radosgw-admin sync status output

2020-06-10 Thread DHilsbos
All;

We've been running our Ceph clusters (Nautilus / 14.2.8) for a while now 
(roughly 9 months), and I've become curious about the output of the 
"radosgw-admin sync status" command.

Here's a the output from our secondary zone:
  realm  ()
  zonegroup  ()
   zone  ()
  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
  data sync source:  ()
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
3 shards are recovering
recovering shards: [39,41,66]

Radosgw is in use, so the active recovery for data doesn't really surprise me.

What I am curious about is these 2 lines:
full sync: 0/64 shards
full sync: 0/128 shards

Is this considered normal?  If so, why are those lines present in this output?
Are they relevant to a different type of replication than what we are doing 
(I'm not aware of a different type of radosgw replication, but I'm not 
omniscient)?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD backups and recovery

2020-05-29 Thread DHilsbos
Jarett;

It is and it isn't.  Replication can be thought of as continuous backups.

Backups, especially as SpiderFox is suggesting, are point-in-time, immutable 
copies of data.  Until they are written over, they don't change, even if the 
data does.

In Ceph's RadosGW (RGW) multi-site replication changes, even "bad" changes, are 
pushed to the peer as quickly as the system can manage.  Even the "replication" 
occurring within a cluster can be considered a "backup," sort of.

As SpiderFox suggested, if malware is able to delete or encrypt the files, 
either through  RGW, RADOS, or on the underlying block device, you've got 
problems.  Note though; if they bypass RGW, then (AFAIK), the changes won't be 
replicated to the peer.

That's why I talk about disaster recovery.  Backing up is one disaster recovery 
technique, and is still perfectly valid.  Perform Air maintains backups of our 
Active Directory domain controllers, for instance.

Clustering, and off-site replication, are other disaster recovery paradigms.  
Each has advantages and disadvantages.

Ultimately, as long as everything works, I believe the only wrong disaster 
recovery plan is doing nothing.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Jarett DeAngelis [mailto:jar...@reticulum.us] 
Sent: Friday, May 29, 2020 5:02 PM
To: Dominic Hilsbos
Cc: ludek.navra...@yahoo.co.uk; ceph-users@ceph.io
Subject: Re: [ceph-users] OSD backups and recovery

For some reason I’d thought replication between clusters was an “official” 
method of backing up.

> On May 29, 2020, at 4:31 PM,  
>  wrote:
> 
> Ludek;
> 
> As a cluster system, Ceph isn't really intended to be backed up.  It's 
> designed to take quite a beating, and preserve your data.
> 
> From a broader disaster recovery perspective, here's how I architected my 
> clusters:
> Our primary cluster is laid out in such a way that an entire rack can fail 
> without read / write being impacted, much less data integrity.  On top of 
> that, our RadosGW was a multi-site setup which automatically sends a copy of 
> every object to a second cluster at a different location.
> 
> Thus my disaster recovery looks like this:
> 1 rack or less: no user impact, rebuild rack
> 2 racks: users are unable to add objects, but existing data is safe, 
> rebuild cluster (or as below) Whole site: switch second site to master 
> and continue
> 
> No backup or recovery necessary.
> 
> You might look the multi-site documentation: 
> https://docs.ceph.com/docs/master/radosgw/multisite/
> 
> I had a long conversation with our owner on this same topic, and how the 
> organization would have to move from a "Backup & Recover" mindset to a 
> "Disaster Recovery" mindset.  It worked well for us, as we were looking to 
> move more towards Risk Analysis based approaches anyway.
> 
> Thank you,
> 
> Dominic L. Hilsbos, MBA
> Director – Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
> 
> 
> 
> -Original Message-
> From: Ludek Navratil [mailto:ludek.navra...@yahoo.co.uk] 
> Sent: Wednesday, February 5, 2020 6:57 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] OSD backups and recovery
> 
> HI all,
> what is the best approach for OSD backups and recovery? We use only Radosgw 
> with S3 API and I need to backup the content of S3 buckets. Currently I sync 
> s3 buckets to local filesystem and backup the content using Amanda.
> I believe that there must a better way to do this but I couldn't find it in 
> docs. 
> 
> I know that one option is to setup an archive zone, but it requires an 
> additional ceph cluster that needs to be maintained and looked after. I would 
> rather avoid that.
> 
> How can I backup an entire Ceph cluster? Or individual OSDs in the way that 
> will allow me to recover the data correctly?  
> 
> Many thanks,Ludek
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
> ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD backups and recovery

2020-05-29 Thread DHilsbos
SpiderFox;

If you're concerned about ransomware (and you should be), then you should:
a) protect the cluster from the internet AND from USERS.
b) place another technology between your cluster and your users (I use 
Nextcloud backed by RadosGW through S3 buckets)
c) turn on versioning in your buckets (a quick sketch follows below)
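
For (c), a sketch using the AWS CLI against RGW (the endpoint and bucket name
are placeholders):

aws --endpoint-url http://rgw.example.com:7480 s3api put-bucket-versioning \
    --bucket mybucket --versioning-configuration Status=Enabled
aws --endpoint-url http://rgw.example.com:7480 s3api get-bucket-versioning --bucket mybucket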

If you have a backup solution that can handle petabytes of data reliably, then 
certainly use it.  Everything I've tried fell over dead at a couple dozen 
terabytes.

Nothing is fool proof, not even the vaunted offline backup (ever try to do a 
recover and find the tape can't be read?).

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 

dhils...@performair.com 
300 S. Hamilton Pl. 
Gilbert, AZ 85233 
Phone: (480) 610-3500 
Fax: (480) 610-3501 
www.PerformAir.com



-Original Message-
From: Coding SpiderFox [mailto:codingspider...@gmail.com] 
Sent: Friday, May 29, 2020 2:45 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: OSD backups and recovery

On Fri., 29 May 2020 at 23:32, wrote:

> Ludek;
>
> As a cluster system, Ceph isn't really intended to be backed up.  It's 
> designed to take quite a beating, and preserve your data.
>
>
But that does not save me when a crypto trojan encrypts all my data. There 
should always be an offline backup that can be restored in case of a crypto trojan.



> From a broader disaster recovery perspective, here's how I architected 
> my
> clusters:
> Our primary cluster is laid out in such a way that an entire rack can 
> fail without read / write being impacted, much less data integrity.  
> On top of that, our RadosGW was a multi-site setup which automatically 
> sends a copy of every object to a second cluster at a different location.
>
> Thus my disaster recovery looks like this:
> 1 rack or less: no user impact, rebuild rack
> 2 racks: users are unable to add objects, but existing data is safe, 
> rebuild cluster (or as below) Whole site: switch second site to master 
> and continue
>
> No backup or recovery necessary.
>
> You might look the multi-site documentation:
> https://docs.ceph.com/docs/master/radosgw/multisite/
>
> I had a long conversation with our owner on this same topic, and how 
> the organization would have to move from a "Backup & Recover" mindset 
> to a "Disaster Recovery" mindset.  It worked well for us, as we were 
> looking to move more towards Risk Analysis based approaches anyway.
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director – Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> -Original Message-
> From: Ludek Navratil [mailto:ludek.navra...@yahoo.co.uk]
> Sent: Wednesday, February 5, 2020 6:57 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] OSD backups and recovery
>
> HI all,
> what is the best approach for OSD backups and recovery? We use only 
> Radosgw with S3 API and I need to backup the content of S3 buckets.
> Currently I sync s3 buckets to local filesystem and backup the content 
> using Amanda.
> I believe that there must a better way to do this but I couldn't find 
> it in docs.
>
> I know that one option is to setup an archive zone, but it requires an 
> additional ceph cluster that needs to be maintained and looked after. 
> I would rather avoid that.
>
> How can I backup an entire Ceph cluster? Or individual OSDs in the way 
> that will allow me to recover the data correctly?
>
> Many thanks,Ludek
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io 
>
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io


[ceph-users] Re: General question CephFS or RBD

2020-05-29 Thread DHilsbos
Willi;

ZFS on RBD seems like a waste, and overkill.  A redundant storage solution on 
top of a redundant storage solution?

You can have multiple file systems within CephFS; the thing to note is that 
each CephFS MUST have a SEPARATE active MDS.

For failover, each should have a secondary MDS, and these also need to be 
separate (preferably running in standby-replay mode).  Each MDS instance can 
only handle one responsibility, for one file system.  Each file system also 
uses 2 pools; one for metadata (think filenames, file properties, and the 
directory tree), and one for the file data itself.

The containerization present by default in Octopus should make running many 
MDSs easier.

We run 3 CephFS file systems from our primary cluster.  This uses 6 MDSs, and 6 
pools.  We assigned the metadata pools to our SSDs (using CRUSH rules) for 
performance.
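
As a sketch, creating an additional file system looks roughly like this (the
names, PG counts and the SSD CRUSH rule are placeholders):

ceph fs flag set enable_multiple true --yes-i-really-mean-it
ceph osd pool create backups_metadata 64 64 replicated replicated_ssd   # metadata on the SSD rule
ceph osd pool create backups_data 512 512
ceph fs new backups backups_metadata backups_data
# plus at least one dedicated MDS (ideally two, with one in standby-replay)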

You might also work with your users on switching to an Object Storage paradigm 
(think S3), as RadosGW has some nice disaster recovery features.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Willi Schiegel [mailto:willi.schie...@posteo.de] 
Sent: Sunday, January 26, 2020 4:24 AM
To: ceph-users@ceph.io
Subject: [ceph-users] General question CephFS or RBD

Hello All,

I have a HW RAID based 240 TB data pool with about 200 million files for users 
in a scientific institution. Data sizes range from tiny parameter files for 
scientific calculations and experiments to huge images of brain scans. There 
are group directories, home directories, Windows roaming profile directories 
organized in ZFS pools on Solaris operating systems, exported via NFS and Samba 
to Linux, macOS, and Windows clients.

I would like to switch to CephFS because of the flexibility and expandability 
but I cannot find any recommendations for which storage backend would be 
suitable for all the functionality we have.

Since I like the features of ZFS like immediate snapshots of very large data 
pools, quotas for each file system within hierarchical data trees and dynamic 
expandability by simply adding new disks or disk images without manual resizing 
would it be a good idea to create RBD images, map them onto the file servers 
and create zpools on the mapped images? I know that ZFS best works with raw 
disks but maybe a RBD image is close enough to a raw disk?

Or would CephFS be the way to go? Can there be multiple CephFS pools for the 
group data folders and for the user's home directory folders for example or do 
I have to have everything in one single file space?

Maybe someone can share his or her field experience?

Thank you very much.

Best regards
Willi
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io


[ceph-users] Re: OSD backups and recovery

2020-05-29 Thread DHilsbos
Ludek;

As a cluster system, Ceph isn't really intended to be backed up.  It's designed 
to take quite a beating, and preserve your data.

From a broader disaster recovery perspective, here's how I architected my 
clusters:
Our primary cluster is laid out in such a way that an entire rack can fail 
without read / write being impacted, much less data integrity.  On top of that, 
our RadosGW was a multi-site setup which automatically sends a copy of every 
object to a second cluster at a different location.

Thus my disaster recovery looks like this:
1 rack or less: no user impact, rebuild rack
2 racks: users are unable to add objects, but existing data is safe, rebuild 
cluster (or as below)
Whole site: switch second site to master and continue

No backup or recovery necessary.

You might look the multi-site documentation: 
https://docs.ceph.com/docs/master/radosgw/multisite/

I had a long conversation with our owner on this same topic, and how the 
organization would have to move from a "Backup & Recover" mindset to a 
"Disaster Recovery" mindset.  It worked well for us, as we were looking to move 
more towards Risk Analysis based approaches anyway.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Ludek Navratil [mailto:ludek.navra...@yahoo.co.uk] 
Sent: Wednesday, February 5, 2020 6:57 AM
To: ceph-users@ceph.io
Subject: [ceph-users] OSD backups and recovery

Hi all,
what is the best approach for OSD backups and recovery? We use only Radosgw 
with the S3 API and I need to back up the content of S3 buckets. Currently I sync S3 
buckets to a local filesystem and back up the content using Amanda.
I believe that there must be a better way to do this but I couldn't find it in 
the docs. 

I know that one option is to setup an archive zone, but it requires an 
additional ceph cluster that needs to be maintained and looked after. I would 
rather avoid that.

How can I backup an entire Ceph cluster? Or individual OSDs in the way that 
will allow me to recover the data correctly?  

Many thanks, Ludek
  
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io


[ceph-users] Re: Ceph and iSCSI

2020-05-29 Thread DHilsbos
BR;

I've built my own iSCSI targets (using Fedora and CentOS), and use them in 
production.  I've also built 2 different Ceph clusters.

They are completely different.  Set aside everything you know about iSCSI, it 
doesn't apply.

Ceph is a clustered object store, it can dynamically expand (nearly) without 
limit, and is (mostly) self-healing.  There are overlay technologies that allow 
Ceph clusters to pretend to be Amazon S3 or OpenStack Swift (RadosGW), block 
devices, similar to OpenStack Cinder or Amazon EBS (RBD), and file systems 
usable directly by Linux clients (CephFS).

You might look at the architecture documentation: 
https://docs.ceph.com/docs/master/architecture/

The only point of overlap is that Ceph can be coerced to provide iSCSI targets.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Bobby [mailto:italienisch1...@gmail.com] 
Sent: Tuesday, January 7, 2020 7:49 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Ceph and iSCSI

Hi all,

I am new to Ceph, but I have a good understanding of the iSCSI protocol. I 
will dive into Ceph because it looks promising. I am particularly interested in 
Ceph-RBD. I have a request: can you please tell me what similarities, if any, 
exist between iSCSI and Ceph? If someone had to work on a common model 
for iSCSI and Ceph, what significant points would you suggest to 
someone who has some understanding of iSCSI?

Looking forward to answers. Thanks in advance :-)

BR
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io


[ceph-users] Re: CEPH failure domain - power considerations

2020-05-29 Thread DHilsbos
Phil;

I like to refer to basic principles, and design assumptions / choices when 
considering things like this.  I also like to refer to more broadly understood 
technologies.  Finally; I'm still relatively new to Ceph, so here it goes...

TLDR: Ceph is (likes to be) double-redundant (like RAID-6), while dual power 
(n+1) is single-redundant.

Like RAID, Ceph (or more precisely a Ceph pool) can be in, and moves through, 
the following states:

Normal --> Partially Failed (degraded) --> Recovering --> Normal.

When talking about these systems, we often gloss over Recovery, acting as if it 
takes no time.  Recovery does take time though, and if anything ELSE happens 
while recovery is ongoing, what can the software do?

Think RAID-5; what happens if a drive fails in a RAID-5 array, and during 
recovery an unreadable block is found on another drive?  That's single 
redundancy.  If you use RAID-6, the array goes to the second redundancy level, 
and the recovery continues.

As a result of the long recovery times expected of modern large hard-drives, 
Ceph pushes for double-redundancy (3x replication, 5-2 EC).  Further, it 
decreases availability the more redundancy is degraded (i.e. when the first 
layer of redundancy is compromised, writes are still allowed.  When the second 
is lost, writes are disallowed, but reads are allowed.  Only when all three 
layers are compromised are reads disallowed).
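
As a sketch, a 5-2 EC profile and pool would be created roughly like this
(names and PG counts are placeholders):

ceph osd erasure-code-profile set ec-5-2 k=5 m=2 crush-failure-domain=host
ceph osd pool create ecpool 128 128 erasure ec-5-2
ceph osd pool get ecpool min_size    # typically k+1 = 6 on recent releases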

Dual power feeds (n+1) is only single-redundant, thus the entire system can't 
achieve better than single-redundancy.  Depending on the reliability of the 
power, and your service guarantees, this may be acceptable.

If you add ATSs, then you need to look at the failure rate (MTBF, or similar) 
to determine if your service guarantees are impacted.

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Phil Regnauld [mailto:p...@x0.dk] 
Sent: Friday, May 29, 2020 12:59 AM
To: Hans van den Bogert
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: CEPH failure domain - power considerations

Hans van den Bogert (hansbogert) writes:
> I would second that, there's no winning in this case for your 
> requirements and single PSU nodes. If there were 3 feeds,  then yes; 
> you could make an extra layer in your crushmap much like you would 
> incorporate a rack topology in the crushmap.

I'm not fully up on coffee for today, so I haven't yet worked out why
3 feeds would help ? To have a 'tie breaker' of sorts, with hosts spread
across 3 rails ?
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io


[ceph-users] Re: Maximum CephFS Filesystem Size

2020-04-01 Thread DHilsbos
All;

Another interesting piece of information: the host that mounts the CephFS shows 
it as 45% full.
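
A couple of commands that may help compare the two numbers:

ceph df detail      # raw capacity vs per-pool STORED / USED / MAX AVAIL
ceph fs status      # data and metadata pool usage as the filesystem sees it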

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air Internationl, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: dhils...@performair.com [mailto:dhils...@performair.com] 
Sent: Wednesday, April 01, 2020 8:43 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Maximum CephFS Filesystem Size

All;

We set up a CephFS on a Nautilus (14.2.8) cluster in February, to hold backups. 
 We finally have all the backups running, and are just waiting for the system 
to reach steady-state.

I'm concerned about usage numbers: in the Dashboard Capacity view it shows the 
cluster as 37% used, while under Filesystems --> [filesystem] --> Pools --> [data pool] 
--> Usage, it shows 71% used.

Does CephFS place a limit on the size of a CephFS?  Is there a limit to how 
large a pool can be in Ceph?  Where is the sizing discrepancy coming from, and 
do I need to address it?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Maximum CephFS Filesystem Size

2020-04-01 Thread DHilsbos
All;

We set up a CephFS on a Nautilus (14.2.8) cluster in February, to hold backups. 
 We finally have all the backups running, and are just waiting for the system 
to reach steady-state.

I'm concerned about usage numbers: in the Dashboard Capacity view it shows the 
cluster as 37% used, while under Filesystems --> [filesystem] --> Pools --> [data pool] 
--> Usage, it shows 71% used.

Does CephFS place a limit on the size of a CephFS?  Is there a limit to how 
large a pool can be in Ceph?  Where is the sizing discrepancy coming from, and 
do I need to address it?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: octopus upgrade stuck: Assertion `map->require_osd_release >= ceph_release_t::mimic' failed.

2020-03-26 Thread DHilsbos
This is a little beyond my understanding of Ceph, but let me take a crack at 
it...  I've found that Ceph tends to be fairly logical, mostly.

require_osd_release looks like a cluster-wide configuration value which 
controls the minimum required version for an OSD daemon to join the cluster.
check_osdmap_features looks like an upgrade validation which checks that the 
above minimum version is met.
unknown -> luminous suggests that a) require_osd_release is not set in your 
cluster, and b) because it is unset it is being assumed to be luminous.

This looks like a sane way to check, before upgrading, if the upgrade is likely 
to complete correctly.

From your previous email, "`map->require_osd_release >= ceph_release_t::mimic' 
failed." suggests that the upgrade requires the above configuration value to be 
mimic or higher.

I would suggest that you set this configuration value to mimic or nautilus, as 
appropriate for your cluster, and retry the upgrade.
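For example, something along these lines (a sketch; use the release your OSDs 
were actually running before this upgrade, and confirm the current value first):

  ceph osd dump | grep require_osd_release
  ceph osd require-osd-release nautilus

The first command shows the current value; the second sets it cluster-wide.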

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Ml Ml [mailto:mliebher...@googlemail.com] 
Sent: Wednesday, March 25, 2020 2:16 PM
To: ceph-users
Subject: [ceph-users] Re: octopus upgrade stuck: Assertion 
`map->require_osd_release >= ceph_release_t::mimic' failed.

in the logs it says:

2020-03-25T22:10:00.823+0100 7f0bd5320e00  0 
/build/ceph-15.2.0/src/cls/hello/cls_hello.cc:312: loading cls_hello
2020-03-25T22:10:00.823+0100 7f0bd5320e00  0 osd.32 57223 crush map
has features 288232576282525696, adjusting msgr requires for clients
2020-03-25T22:10:00.823+0100 7f0bd5320e00  0 osd.32 57223 crush map
has features 288232576282525696 was 8705, adjusting msgr requires for
mons
2020-03-25T22:10:00.823+0100 7f0bd5320e00  0 osd.32 57223 crush map
has features 1008808516661821440, adjusting msgr requires for osds
2020-03-25T22:10:00.823+0100 7f0bd5320e00  1 osd.32 57223
check_osdmap_features require_osd_release unknown -> luminous
2020-03-25T22:10:04.695+0100 7f0bd5320e00  0 osd.32 57223 load_pgs
2020-03-25T22:10:10.907+0100 7f0bcc01d700  4 rocksdb:
[db/compaction_job.cc:1332] [default] [JOB 3] Generated table #59886:
2107241 keys, 72886355 bytes
2020-03-25T22:10:10.907+0100 7f0bcc01d700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1585170610911598, "cf_name": "default", "job": 3,
"event": "table_file_creation", "file_number": 59886, "file_size":
72886355, "table_properties": {"data_size": 67112666, "index_size":
504659, "filter_size": 5268165, "raw_key_size": 38673953,
"raw_average_key_size": 18, "raw_value_size": 35746098,
"raw_average_value_size": 16, "num_data_blocks": 16488, "num_entries":
2107241, "filter_policy_name": "rocksdb.BuiltinBloomFilter"}}
2020-03-25T22:10:13.047+0100 7f0bd5320e00  0 osd.32 57223 load_pgs
opened 230 pgs
2020-03-25T22:10:13.047+0100 7f0bd5320e00 -1 osd.32 57223
log_to_monitors {default=true}
2020-03-25T22:10:13.107+0100 7f0bd5320e00  0 osd.32 57223 done with
init, starting boot process
2020-03-25T22:10:13.107+0100 7f0bd5320e00  1 osd.32 57223 start_boot


does the line:
  check_osdmap_features require_osd_release unknown -> luminous
mean it thinks the local osd itself is luminous?

On Wed, Mar 25, 2020 at 8:12 PM Ml Ml  wrote:
>
> Hello List,
>
> i followed:
>  https://ceph.io/releases/v15-2-0-octopus-released/
>
> I came from a healthy nautilus and i am stuck at:
>   5.) Upgrade all OSDs by installing the new packages and restarting
> the ceph-osd daemons on all OSD host
>
> When i try to start an osd like this, i get:
>   /usr/bin/ceph-osd -f --cluster ceph --id 32 --setuser ceph --setgroup ceph
> ...
> 2020-03-25T20:11:03.292+0100 7f2762874e00 -1 osd.32 57223
> log_to_monitors {default=true}
> ceph-osd: /build/ceph-15.2.0/src/osd/PeeringState.cc:109: void
> PGPool::update(ceph::common::CephContext*, OSDMapRef): Assertion
> `map->require_osd_release >= ceph_release_t::mimic' failed.
> ceph-osd: /build/ceph-15.2.0/src/osd/PeeringState.cc:109: void
> PGPool::update(ceph::common::CephContext*, OSDMapRef): Assertion
> `map->require_osd_release >= ceph_release_t::mimic' failed.
> *** Caught signal (Aborted) **
>  in thread 7f274854f700 thread_name:tp_osd_tp
> Aborted
>
>
>
> My current status:
>
> root@ceph03:~# ceph osd tree
> ID  CLASS  WEIGHTTYPE NAMESTATUS  REWEIGHT  PRI-AFF
> -1 60.70999  root default
> -2 20.25140  host ceph01
>  0hdd   1.71089  osd.0up   1.0  1.0
>  8hdd   2.67029  osd.8up   1.0  1.0
> 11hdd   1.5  osd.11   up   1.0  1.0
> 12hdd   1.5  osd.12   up   1.0  1.0
> 14hdd   2.7  osd.14   up   1.0  1.0
> 18hdd   1.5  osd.18   up   1.0  1.0
> 22hdd   2.7  osd.22   up   1.0  1.0
> 23hdd   2.7  osd.23   up   1.0  1.0
> 26hdd   

[ceph-users] Re: ceph ignoring cluster/public_network when initiating TCP connections

2020-03-23 Thread DHilsbos
Liviu;

First: what version of Ceph are you running?

Second: I don't see a cluster network option in your configuration file?

At least for us, running Nautilus, there are no underscores (_) in the options, 
so our configuration files look like this:

[global]
auth cluster required = cephx
public network = /
cluster network = /
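If you want to double-check what a running daemon actually picked up, the admin 
socket can be queried on the host where it runs, for example (assuming an osd.0 
lives on that host):

  ceph daemon osd.0 config get public_network
  ceph daemon osd.0 config get cluster_network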

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Liviu Sas [mailto:droop...@gmail.com] 
Sent: Sunday, March 22, 2020 5:03 PM
To: ceph-users@ceph.io
Subject: [ceph-users] ceph ignoring cluster/public_network when initiating TCP 
connections

Hello,

While testing our ceph cluster setup, I noticed a possible issue with the
cluster/public network configuration being ignored for TCP session
initiation.

Looks like the daemons (mon/mgr/mds/osd) are all listening on the right IP
address but are initiating TCP sessions from the wrong interfaces.
Would it be possible to force ceph daemons to use the cluster/public IP
addresses to initiate new TCP connections instead of letting the kernel
choose?

Some details below:

We set everything up to use our "10.2.1.0/24" network:
10.2.1.x (x=node number 1,2,3)
But we can see TCP sessions being initiated from "10.2.0.0/24" network.

So the daemons are listening to the right IP addresses.
root@nbs-vp-01:~# lsof -nPK i | grep ceph | grep LISTE
ceph-mds  1541648 ceph   16u IPv48169344
 0t0TCP 10.2.1.1:6800 (LISTEN)
ceph-mds  1541648 ceph   17u IPv48169346
 0t0TCP 10.2.1.1:6801 (LISTEN)
ceph-mgr  1541654 ceph   25u IPv48163039
 0t0TCP 10.2.1.1:6810 (LISTEN)
ceph-mgr  1541654 ceph   27u IPv48163051
 0t0TCP 10.2.1.1:6811 (LISTEN)
ceph-mon  1541703 ceph   27u IPv48170914
 0t0TCP 10.2.1.1:3300 (LISTEN)
ceph-mon  1541703 ceph   28u IPv48170915
 0t0TCP 10.2.1.1:6789 (LISTEN)
ceph-osd  1541711 ceph   16u IPv48169353
 0t0TCP 10.2.1.1:6802 (LISTEN)
ceph-osd  1541711 ceph   17u IPv48169357
 0t0TCP 10.2.1.1:6803 (LISTEN)
ceph-osd  1541711 ceph   18u IPv48169362
 0t0TCP 10.2.1.1:6804 (LISTEN)
ceph-osd  1541711 ceph   19u IPv48169368
 0t0TCP 10.2.1.1:6805 (LISTEN)
ceph-osd  1541711 ceph   20u IPv48169375
 0t0TCP 10.2.1.1:6806 (LISTEN)
ceph-osd  1541711 ceph   21u IPv48169383
 0t0TCP 10.2.1.1:6807 (LISTEN)
ceph-osd  1541711 ceph   22u IPv48169392
 0t0TCP 10.2.1.1:6808 (LISTEN)
ceph-osd  1541711 ceph   23u IPv48169402
 0t0TCP 10.2.1.1:6809 (LISTEN)

Sessions to the other nodes use the wrong IP address:

@nbs-vp-01:~# lsof -nPK i | grep ceph | grep 10.2.1.2
ceph-mds  1541648 ceph   28u IPv48279520
 0t0TCP 10.2.0.2:44180->10.2.1.2:6800 (ESTABLISHED)
ceph-mgr  1541654 ceph   41u IPv48289842
 0t0TCP 10.2.0.2:44146->10.2.1.2:6800 (ESTABLISHED)
ceph-mon  1541703 ceph   40u IPv48174827
 0t0TCP 10.2.0.2:40864->10.2.1.2:3300 (ESTABLISHED)
ceph-osd  1541711 ceph   65u IPv48171035
 0t0TCP 10.2.0.2:58716->10.2.1.2:6804 (ESTABLISHED)
ceph-osd  1541711 ceph   66u IPv48172960
 0t0TCP 10.2.0.2:54586->10.2.1.2:6806 (ESTABLISHED)
root@nbs-vp-01:~# lsof -nPK i | grep ceph | grep 10.2.1.3
ceph-mds  1541648 ceph   30u IPv48292421
 0t0TCP 10.2.0.2:45710->10.2.1.3:6802 (ESTABLISHED)
ceph-mon  1541703 ceph   46u IPv48173025
 0t0TCP 10.2.0.2:40164->10.2.1.3:3300 (ESTABLISHED)
ceph-osd  1541711 ceph   67u IPv48173043
 0t0TCP 10.2.0.2:56920->10.2.1.3:6804 (ESTABLISHED)
ceph-osd  1541711 ceph   68u IPv48171063
 0t0TCP 10.2.0.2:41952->10.2.1.3:6806 (ESTABLISHED)
ceph-osd  1541711 ceph   69u IPv48178891
 0t0TCP 10.2.0.2:57890->10.2.1.3:6808 (ESTABLISHED)


See below our cluster config:

[global]
 auth_client_required = cephx
 auth_cluster_required = cephx
 auth_service_required = cephx
 cluster_network = 10.2.1.0/24
 fsid = 0f19b6ff-0432-4c3f-b0cb-730e8302dc2c
 mon_allow_pool_delete = true
 mon_host = 10.2.1.1 10.2.1.2 10.2.1.3
 osd_pool_default_min_size = 2
 osd_pool_default_size = 3
 public_network = 10.2.1.0/24

[client]
 keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
 keyring = /var/lib/ceph/mds

[ceph-users] Re: Link to Nautilus upgrade

2020-03-09 Thread DHilsbos
Peter;

Or possibly this:
https://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-mimic-or-luminous
Or this:
https://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-pre-luminous-releases-like-jewel

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


From: Peter Eisch [mailto:peter.ei...@virginpulse.com] 
Sent: Monday, March 09, 2020 8:58 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Link to Nautilus upgrade

Hi,

When upgrading a cluster from Luminous to Nautilus I followed a page on 
ceph.com.  I need to do another cluster and while I tagged the link, the page 
no longer exists.

https://docs.ceph.com/master/releases/nautilus/#nautilus-old-upgrade

Might anyone have either an updated link or point me to how to find the 
contents of best steps?

Thanks,

peter



Peter Eisch
Senior Site Reliability Engineer
T: 1.612.445.5135
virginpulse.com | virginpulse.com/global-challenge
Australia | Bosnia and Herzegovina | Brazil | Canada | Singapore | Switzerland | United Kingdom | USA



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Link to Nautilus upgrade

2020-03-09 Thread DHilsbos
Peter;

Might this be what you're after:
https://docs.ceph.com/docs/nautilus/install/upgrading-ceph/#

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


From: Peter Eisch [mailto:peter.ei...@virginpulse.com] 
Sent: Monday, March 09, 2020 8:58 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Link to Nautilus upgrade

Hi,

When upgrading a cluster from Luminous to Nautilus I followed a page on 
ceph.com.  I need to do another cluster and while I tagged the link, the page 
no longer exists.

https://docs.ceph.com/master/releases/nautilus/#nautilus-old-upgrade

Might anyone have either an updated link or point me to how to find the 
contents of best steps?

Thanks,

peter



Peter Eisch
Senior Site Reliability Engineer
T: 1.612.445.5135
virginpulse.com | virginpulse.com/global-challenge
Australia | Bosnia and Herzegovina | Brazil | Canada | Singapore | Switzerland | United Kingdom | USA



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Hardware feedback before purchasing for a PoC

2020-03-09 Thread DHilsbos
Ignacio;



Personally, I like to use hardware for a proof of concept that I can roll over 
into the final system, or repurpose if the project is denied.

As such, I would recommend these:

Supermicro 5019A-12TN4 Barebones

I built our PoC around three of them (2 x 12TB Seagate Ironwolf drives, plus 1 x 
256 GB Intel M.2 SSD), then they got turned into MONs for the production 
cluster.  The production cluster ended up with larger M.2s, and smaller 
spinners, as I was concerned about recovery time for 24 - 36TB per node, with 
only 4 x 1Gb network.

Just my 2 cents.

Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com





-Original Message-
From: Ignacio Ocampo [mailto:naf...@gmail.com]
Sent: Sunday, March 08, 2020 7:00 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Hardware feedback before purchasing for a PoC



Hi team, I'm planning to invest in hardware for a PoC and I would like your 
feedback before the purchase:

The goal is to deploy a *16TB* storage cluster, with *3 replicas* thus *3 nodes*.

System configuration: https://pcpartpicker.com/list/cfDpDx ($400 USD per node)

Some notes about the configuration:

   - 4-core processor for 4 OSD daemons
   - 8GB RAM for the first 4TB of storage, which will increase to 16GB of
     RAM at 16TB of storage.
   - Motherboard:
      - 4 x SATA 6 Gb/s (one per each OSD disk)
      - 2 x PCI-E x1 Slots (1 will be used for an additional Gigabit Ethernet)
      - 1 x M.2 Slot for the host OS
      - RAM can increase up to 32 GB, and another SATA 6 Gb/s controller can
        be added on PCI-E x1 for growth up to *32TB*

As noted, the plan is to deploy nodes with *4TB* and gradually add *12TB* as 
needed; memory should also be increased to *16GB* after the *8TB* threshold.

Questions to validate before the purchase:

1. Do the hardware components make sense for the *16TB* growth projection?

2. Is it easy to gradually add more capacity to each node (*4TB* each time per node)?

Thanks for your support!

--
Ignacio Ocampo

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS Issues

2020-03-06 Thread DHilsbos
All;

When I went to check Wido's suggestion, I found the MDS daemons would start 
successfully.  I obviously found no significant time differences.

Sorry for making a mountain out of a mole-hill.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Wido den Hollander [mailto:w...@42on.com] 
Sent: Friday, March 06, 2020 12:15 PM
To: Dominic Hilsbos; ceph-users@ceph.io
Subject: [ceph-users] Re: MDS Issues



On 3/6/20 7:46 PM, dhils...@performair.com wrote:
> All;
> 
> We are in the middle of upgrading our primary cluster from 14.2.5 to 14.2.8. 
> Our cluster utilizes 6 MDSs for 3 CephFS file systems. 3 MDSs are collocated 
> with MON/MGR, and 3 MDSs are collocated with OSDs.
> 
> At this point we have upgraded all 3 of the MON/MDS/MGR servers. The MDS on 2 
> of the 3 is currently not working, and we are seeing the below log messages.
> 
> 2020-03-06 11:12:56.184 <> -1 mds. unable to obtain rotating service 
> keys; retrying
> 2020-03-06 11:13:26.184 <>  0 monclient: wait_auth_rotating timed out after 30
> 2020-03-06 11:13:26.184 <> -1 mds. ERROR: failed to refresh rotating 
> keys, maximum retry time reached.
> 2020-03-06 11:13:26.184 <>  1 mds. suicide! Wanted state up:boot
> 
> Any ideas?
> 

Double check: Is the time correct on all the machines?

cephx can have issues if there is a clock issue.
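For example, on each node (assuming chrony or another NTP client is in use):

  timedatectl status
  chronyc tracking

cephx keys rotate on a timer, so clock skew between the MDS hosts and the MONs 
can cause exactly this kind of rotating key refresh failure.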

Wido

> Thank you,
> 
> Dominic L. Hilsbos, MBA 
> Director - Information Technology 
> Perform Air International Inc.
> dhils...@performair.com 
> www.PerformAir.com
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MDS Issues

2020-03-06 Thread DHilsbos
All;

We are in the middle of upgrading our primary cluster from 14.2.5 to 14.2.8. 
Our cluster utilizes 6 MDSs for 3 CephFS file systems. 3 MDSs are collocated 
with MON/MGR, and 3 MDSs are collocated with OSDs.

At this point we have upgraded all 3 of the MON/MDS/MGR servers. The MDS on 2 
of the 3 is currently not working, and we are seeing the below log messages.

2020-03-06 11:12:56.184 <> -1 mds. unable to obtain rotating service 
keys; retrying
2020-03-06 11:13:26.184 <>  0 monclient: wait_auth_rotating timed out after 30
2020-03-06 11:13:26.184 <> -1 mds. ERROR: failed to refresh rotating 
keys, maximum retry time reached.
2020-03-06 11:13:26.184 <>  1 mds. suicide! Wanted state up:boot

Any ideas?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How can I fix "object unfound" error?

2020-03-05 Thread DHilsbos
Simone;

What is your failure domain?

If you don't know your failure domain can you provide the CRUSH ruleset for the 
pool that experienced the "object unfound" error?
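For reference, something like this (substituting your pool name) will show the 
rule and its failure domain:

  ceph osd pool get <pool-name> crush_rule
  ceph osd crush rule dump <rule-name>

The "type" on the chooseleaf step in the dump (osd, host, rack, ...) is the 
failure domain.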

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Simone Lazzaris [mailto:simone.lazza...@qcom.it] 
Sent: Thursday, March 05, 2020 6:11 AM
To: ceph-users; Chad William Seys
Subject: [ceph-users] Re: How can I fix "object unfound" error?

In data mercoledì 4 marzo 2020 18:14:31 CET, Chad William Seys ha scritto:
> > Maybe I've marked the object as "lost" and removed the failed
> > OSD.
> > 
> > The cluster now is healthy, but I'd like to understand if it's likely
> > to bother me again in the future.
> 
> Yeah, I don't know.
> 
> Within the last month there are 4 separate instances of people
> mentioning "unfound" object in their cluster.
> 
> I'm deferring as long as possible any OSD drive upgrades.  I ran into
> the problem when "draining" an OSD.
> 
> "draining" means remove OSD from crush map, wait for all PG to be stored
> elsewhere, then replace drive with larger one.  Under those
> circumstances there should be no PG unfound.
> 
> BTW, are you using cache tiering ?  The bug report mentions this, but
> some people did not have this enabled.
> 
> Chad.

No, I don't have cache tiering enabled. I also found it strange that the PG was 
marked unfound: the cluster was perfectly healthy before the kernel panic, and a 
single OSD failure shouldn't create much hassle.


*Simone Lazzaris*
*Qcom S.p.A. a Socio Unico*
 

Via Roggia Vignola, 9 | 24047 Treviglio (BG) | T +39 0363 1970352 | M +39 3938111237

simone.lazza...@qcom.it[1] | www.qcom.it[2]
* LinkedIn[3]* | *Facebook*[4]
[5] 




[1] mailto:simone.lazza...@qcom.it
[2] https://www.qcom.it
[3] https://www.linkedin.com/company/qcom-spa
[4] http://www.facebook.com/qcomspa
[5] https://www.qcom.it/includes/NUOVAemail-banner.gif
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stately MDS Transitions

2020-02-28 Thread DHilsbos
Marc;

If I understand that command correctly, it tells MDS 'c' to disappear, the 
same as rebooting would, right?

Let me just clarify something then...

When I run ceph fs dump I get the following:
110248: [v2:10.2.80.10:6800/1470324937,v1:10.2.80.10:6801/1470324937] 'S700041' 
mds.0.29 up:active seq 997315
120758: [v2:10.2.80.11:6800/2691008522,v1:10.2.80.11:6801/2691008522] 'S700042' 
mds.0.0 up:standby-replay seq 2

Doesn't that indicate that the standby-replay server is out of date, and would 
need to pause the FS to replay?

Is there something wrong in my setup, or is the above expected?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Marc Roos [mailto:m.r...@f1-outsourcing.eu] 
Sent: Friday, February 28, 2020 8:19 AM
To: ceph-users; Dominic Hilsbos
Subject: RE: [ceph-users] Stately MDS Transitions

 

ceph mds fail c?



- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -. 
F1 Outsourcing Development Sp. z o.o.
Poland 

t:  +48 (0)124466845
f:  +48 (0)124466843
e:  m...@f1-outsourcing.eu


-Original Message-
From: dhils...@performair.com [mailto:dhils...@performair.com] 
Sent: vrijdag 28 februari 2020 16:17
To: ceph-users@ceph.io
Subject: [ceph-users] Stately MDS Transitions

All;

We just started really fiddling with CephFS on our production cluster 
(Nautilus - 14.2.5 / 14.2.6), and I have a question...

Is there a command / set of commands that transitions a standby-replay 
MDS server to the active role, while swapping the active MDS to 
standby-replay, or even just standby?

I'm looking for a way to seamlessly, and without down time, prepare the 
active MDS to go offline (reboot) as part of planned /periodic 
maintenance.

Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Stately MDS Transitions

2020-02-28 Thread DHilsbos
All;

We just started really fiddling with CephFS on our production cluster (Nautilus 
- 14.2.5 / 14.2.6), and I have a question...

Is there a command / set of commands that transitions a standby-replay MDS 
server to the active role, while swapping the active MDS to standby-replay, or 
even just standby?

I'm looking for a way to seamlessly, and without down time, prepare the active 
MDS to go offline (reboot) as part of planned /periodic maintenance.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: SSD considerations for block.db and WAL

2020-02-27 Thread DHilsbos
Christian;

What is your failure domain?  If your failure domain is set to OSD / drive, and 
2 OSDs share a DB / WAL device, and that DB / WAL device dies, then portions of 
the data could drop to read-only (or be lost...).

Ceph is really set up to own the storage hardware directly.  It doesn't 
(usually) make sense to put any kind of RAID / JBOD between Ceph and the 
hardware.
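A quick, read-only way to sanity check the current layout:

  ceph osd tree             # which OSDs live on which host
  ceph osd crush rule dump  # the failure domain each rule uses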

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Christian Wahl [mailto:w...@teco.edu] 
Sent: Thursday, February 27, 2020 12:09 PM
To: ceph-users@ceph.io
Subject: [ceph-users] SSD considerations for block.db and WAL


Hi everyone,

we currently have 6 OSDs with 8TB HDDs split across 3 hosts.
The main usage is KVM-Images.

To improve speed we planned on putting the block.db and WAL onto NVMe-SSDs.
The plan was to put 2x1TB in each host.

One option I thought of was to RAID 1 them for better redundancy; I don't know 
how high the risk is of corrupting the block.db from one failed SSD block.
Or should I just use one for WAL+block.db and use the other one as fast storage?

Thank you all very much!

Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: All pgs peering indefinetely

2020-02-04 Thread DHilsbos
Rodrigo;

Best bet would be to check logs.  Check the OSD logs on the affected server.  
Check cluster logs on the MONs.  Check OSD logs on other servers.

Your Ceph version(s) and your OS distribution and version would also be useful 
to help you troubleshoot this OSD flapping issue.
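As a starting point, something like the following (paths assume a default 
package install; substitute your own OSD IDs):

  ceph versions
  journalctl -u ceph-osd@<id> --since "1 hour ago"   # on the affected server
  less /var/log/ceph/ceph-osd.<id>.log               # OSD log, same host
  less /var/log/ceph/ceph.log                        # cluster log, on a MON host

"ceph versions" also confirms whether every daemon is running the same release.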

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Rodrigo Severo - Fábrica [mailto:rodr...@fabricadeideias.com] 
Sent: Tuesday, February 04, 2020 11:05 AM
To: Wesley Dillingham
Cc: ceph-users
Subject: [ceph-users] Re: All pgs peering indefinetely

Em ter., 4 de fev. de 2020 às 14:54, Wesley Dillingham
 escreveu:
>
>
> I would guess that you have something preventing osd to osd communication on 
> ports 6800-7300 or osd to mon communication on  port 6789 and/or 3300.

The 3 servers are on the same subnet. They are connected to a
non-managed switch, and none have any firewall (iptables) rules
blocking anything. They can ping one another.

Can you think about some other way that some traffic could be blocked?
Or some other test I could do to check for connectivity?
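For example, would a port check like this from each node toward the others be 
enough (substituting the peer addresses)?

  nc -zv <peer-ip> 3300
  nc -zv <peer-ip> 6789
  nc -zv <peer-ip> 6800

plus "ss -ntlp | grep ceph" on every host to confirm the daemons are listening 
where expected.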


Regards,

Rodrigo





>
>
> Respectfully,
>
> Wes Dillingham
> w...@wesdillingham.com
> LinkedIn
>
>
> On Tue, Feb 4, 2020 at 12:44 PM Rodrigo Severo - Fábrica 
>  wrote:
>>
>> Em ter., 4 de fev. de 2020 às 12:39, Rodrigo Severo - Fábrica
>>  escreveu:
>> >
>> > Hi,
>> >
>> >
>> > I have a rather small cephfs cluster with 3 machines right now: all of
>> > them sharing MDS, MON, MGS and OSD roles.
>> >
>> > I had to move all machines to a new physical location and,
>> > unfortunately, I had to move all of them at the same time.
>> >
>> > They are already on again but ceph won't be accessible as all pgs are
>> > in peering state and OSD keep going down and up again.
>> >
>> > Here is some info about my cluster:
>> >
>> > ---
>> > # ceph -s
>> >   cluster:
>> > id: e348b63c-d239-4a15-a2ce-32f29a00431c
>> > health: HEALTH_WARN
>> > 1 filesystem is degraded
>> > 1 MDSs report slow metadata IOs
>> > 2 osds down
>> > 1 host (2 osds) down
>> > Reduced data availability: 324 pgs inactive, 324 pgs peering
>> > 7 daemons have recently crashed
>> > 10 slow ops, oldest one blocked for 206 sec, mon.a2-df has 
>> > slow ops
>> >
>> >   services:
>> > mon: 3 daemons, quorum a2-df,a3-df,a1-df (age 47m)
>> > mgr: a2-df(active, since 82m), standbys: a3-df, a1-df
>> > mds: cephfs:1/1 {0=a2-df=up:replay} 2 up:standby
>> > osd: 6 osds: 4 up (since 5s), 6 in (since 47m)
>> > rgw: 1 daemon active (a2-df)
>> >
>> >   data:
>> > pools:   7 pools, 324 pgs
>> > objects: 850.25k objects, 744 GiB
>> > usage:   2.3 TiB used, 14 TiB / 16 TiB avail
>> > pgs: 100.000% pgs not active
>> >  324 peering
>> > ---
>> >
>> > ---
>> > # ceph osd df tree
>> > ID  CLASSWEIGHT   REWEIGHT SIZERAW USE DATAOMAPMETA
>> > AVAIL   %USE  VAR  PGS STATUS TYPE NAME
>> >  -1  16.37366-  16 TiB 2.3 TiB 2.3 TiB 1.1 GiB 8.1 GiB
>> >  14 TiB 13.83 1.00   -root default
>> > -10  16.37366-  16 TiB 2.3 TiB 2.3 TiB 1.1 GiB 8.1 GiB
>> >  14 TiB 13.83 1.00   -datacenter df
>> >  -3   5.45799- 5.5 TiB 773 GiB 770 GiB 382 MiB 2.7 GiB
>> > 4.7 TiB 13.83 1.00   -host a1-df
>> >   3 hdd-slow  3.63899  1.0 3.6 TiB 1.1 GiB  90 MiB 0 B   1 GiB
>> > 3.6 TiB  0.03 0.00   0   down osd.3
>> >   0  hdd  1.81898  1.0 1.8 TiB 772 GiB 770 GiB 382 MiB 1.7 GiB
>> > 1.1 TiB 41.43 3.00   0   down osd.0
>> >  -5   5.45799- 5.5 TiB 773 GiB 770 GiB 370 MiB 2.7 GiB
>> > 4.7 TiB 13.83 1.00   -host a2-df
>> >   4 hdd-slow  3.63899  1.0 3.6 TiB 1.1 GiB  90 MiB 0 B   1 GiB
>> > 3.6 TiB  0.03 0.00 100 up osd.4
>> >   1  hdd  1.81898  1.0 1.8 TiB 772 GiB 770 GiB 370 MiB 1.7 GiB
>> > 1.1 TiB 41.42 3.00 224 up osd.1
>> >  -7   5.45767- 5.5 TiB 773 GiB 770 GiB 387 MiB 2.7 GiB
>> > 4.7 TiB 13.83 1.00   -host a3-df
>> >   5 hdd-slow  3.63869  1.0 3.6 TiB 1.1 GiB  90 MiB 0 B   1 GiB
>> > 3.6 TiB  0.03 0.00 100 up osd.5
>> >   2  hdd  1.81898  1.0 1.8 TiB 772 GiB 770 GiB 387 MiB 1.7 GiB
>> > 1.1 TiB 41.43 3.00 224 up osd.2
>> >  TOTAL  16 TiB 2.3 TiB 2.3 TiB 1.1 GiB 8.1 GiB
>> >  14 TiB 13.83
>> > MIN/MAX VAR: 0.00/3.00  STDDEV: 21.82
>> > ---
>> >
>> > At this exact moment both OSDs from server a1-df were down but that's
>> > changing. Sometimes I have only one OSD down, but most of the times I
>> > have 2. And 

[ceph-users] Re: More OMAP Issues

2020-02-04 Thread DHilsbos
Paul;

Yes, we are running a multi-site setup.

Re-sync would be acceptable at this point, as we only have 4 TiB in use right 
now.

Tearing down and reconfiguring the second site would also be acceptable, except 
that I've never been able to cleanly remove a zone from a zone group.  The only 
way I've found to remove a zone completely is to tear down the entire RADOSGW 
configuration (delete .rgw.root pool from both clusters).
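For reference, the removal sequence described in the multi-site docs is roughly 
the following (run against the master zone; a sketch only, since it has not 
worked cleanly for me in practice):

  radosgw-admin zonegroup remove --rgw-zonegroup=<zonegroup> --rgw-zone=<zone>
  radosgw-admin period update --commit
  radosgw-admin zone delete --rgw-zone=<zone>
  radosgw-admin period update --commit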

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Paul Emmerich [mailto:paul.emmer...@croit.io] 
Sent: Tuesday, February 04, 2020 9:52 AM
To: Dominic Hilsbos
Cc: ceph-users
Subject: Re: [ceph-users] More OMAP Issues

Are you running a multi-site setup?
In this case it's best to set the default shard size to large enough
number *before* enabling multi-site.

If you didn't do this: well... I think the only way is still to
completely re-sync the second site...


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Tue, Feb 4, 2020 at 5:23 PM  wrote:
>
> All;
>
> We're back to having large OMAP object warnings regarding our RGW index 
> pool.
>
> This cluster is now in production, so I can't simply dump the buckets / pools 
> and hope everything works out.
>
> I did some additional research on this issue, and it looks like I need to 
> (re)shard the bucket (index?).  I found information that suggests that, for 
> older versions of Ceph, buckets couldn't be sharded after creation[1].  Other 
> information suggests that Nautilus (which we are running) can re-shard 
> dynamically, but not when multi-site replication is configured[2].
>
> This suggests that a "manual" resharding of a Nautilus cluster should be 
> possible, but I can't find the commands to do it.  Has anyone done this?  
> Does anyone have the commands to do it?  I can schedule down time for the 
> cluster, and take the RADOSGW instance(s), and dependent user services 
> offline.
>
> [1]: https://ceph.io/geen-categorie/radosgw-big-index/
> [2]: https://docs.ceph.com/docs/master/radosgw/dynamicresharding/
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] More OMAP Issues

2020-02-04 Thread DHilsbos
All;

We're back to having large OMAP object warnings regarding our RGW index pool.

This cluster is now in production, so I can't simply dump the buckets / pools and 
hope everything works out.

I did some additional research on this issue, and it looks like I need to 
(re)shard the bucket (index?).  I found information that suggests that, for 
older versions of Ceph, buckets couldn't be sharded after creation[1].  Other 
information suggests that Nautilus (which we are running) can re-shard 
dynamically, but not when multi-site replication is configured[2].

This suggests that a "manual" resharding of a Nautilus cluster should be 
possible, but I can't find the commands to do it.  Has anyone done this?  Does 
anyone have the commands to do it?  I can schedule down time for the cluster, 
and take the RADOSGW instance(s), and dependent user services offline.

[1]: https://ceph.io/geen-categorie/radosgw-big-index/
[2]: https://docs.ceph.com/docs/master/radosgw/dynamicresharding/
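For reference, the manual path these articles appear to point at is roughly the 
following (a sketch only; how it interacts with multi-site sync is exactly my 
concern):

  radosgw-admin bucket stats --bucket=<bucket>        # current num_shards and object count
  radosgw-admin bucket reshard --bucket=<bucket> --num-shards=<new-count>
  radosgw-admin reshard stale-instances list          # leftover old index instances

with the usual target of roughly 100K objects per index shard when picking the 
new shard count.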

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: All pgs peering indefinetely

2020-02-04 Thread DHilsbos
Rodrigo;

Are all your hosts using the same IP addresses as before the move?  Is the new 
network structured the same?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Rodrigo Severo - Fábrica [mailto:rodr...@fabricadeideias.com] 
Sent: Tuesday, February 04, 2020 8:40 AM
To: ceph-users
Subject: [ceph-users] All pgs peering indefinetely

Hi,


I have a rather small cephfs cluster with 3 machines right now: all of
them sharing MDS, MON, MGS and OSD roles.

I had to move all machines to a new physical location and,
unfortunately, I had to move all of them at the same time.

They are already on again but ceph won't be accessible as all pgs are
in peering state and OSDs keep going down and up again.

Here is some info about my cluster:

---
# ceph -s
  cluster:
id: e348b63c-d239-4a15-a2ce-32f29a00431c
health: HEALTH_WARN
1 filesystem is degraded
1 MDSs report slow metadata IOs
2 osds down
1 host (2 osds) down
Reduced data availability: 324 pgs inactive, 324 pgs peering
7 daemons have recently crashed
10 slow ops, oldest one blocked for 206 sec, mon.a2-df has slow ops

  services:
mon: 3 daemons, quorum a2-df,a3-df,a1-df (age 47m)
mgr: a2-df(active, since 82m), standbys: a3-df, a1-df
mds: cephfs:1/1 {0=a2-df=up:replay} 2 up:standby
osd: 6 osds: 4 up (since 5s), 6 in (since 47m)
rgw: 1 daemon active (a2-df)

  data:
pools:   7 pools, 324 pgs
objects: 850.25k objects, 744 GiB
usage:   2.3 TiB used, 14 TiB / 16 TiB avail
pgs: 100.000% pgs not active
 324 peering
---

---
# ceph osd df tree
ID  CLASSWEIGHT   REWEIGHT SIZERAW USE DATAOMAPMETA
AVAIL   %USE  VAR  PGS STATUS TYPE NAME
 -1  16.37366-  16 TiB 2.3 TiB 2.3 TiB 1.1 GiB 8.1 GiB
 14 TiB 13.83 1.00   -root default
-10  16.37366-  16 TiB 2.3 TiB 2.3 TiB 1.1 GiB 8.1 GiB
 14 TiB 13.83 1.00   -datacenter df
 -3   5.45799- 5.5 TiB 773 GiB 770 GiB 382 MiB 2.7 GiB
4.7 TiB 13.83 1.00   -host a1-df
  3 hdd-slow  3.63899  1.0 3.6 TiB 1.1 GiB  90 MiB 0 B   1 GiB
3.6 TiB  0.03 0.00   0   down osd.3
  0  hdd  1.81898  1.0 1.8 TiB 772 GiB 770 GiB 382 MiB 1.7 GiB
1.1 TiB 41.43 3.00   0   down osd.0
 -5   5.45799- 5.5 TiB 773 GiB 770 GiB 370 MiB 2.7 GiB
4.7 TiB 13.83 1.00   -host a2-df
  4 hdd-slow  3.63899  1.0 3.6 TiB 1.1 GiB  90 MiB 0 B   1 GiB
3.6 TiB  0.03 0.00 100 up osd.4
  1  hdd  1.81898  1.0 1.8 TiB 772 GiB 770 GiB 370 MiB 1.7 GiB
1.1 TiB 41.42 3.00 224 up osd.1
 -7   5.45767- 5.5 TiB 773 GiB 770 GiB 387 MiB 2.7 GiB
4.7 TiB 13.83 1.00   -host a3-df
  5 hdd-slow  3.63869  1.0 3.6 TiB 1.1 GiB  90 MiB 0 B   1 GiB
3.6 TiB  0.03 0.00 100 up osd.5
  2  hdd  1.81898  1.0 1.8 TiB 772 GiB 770 GiB 387 MiB 1.7 GiB
1.1 TiB 41.43 3.00 224 up osd.2
 TOTAL  16 TiB 2.3 TiB 2.3 TiB 1.1 GiB 8.1 GiB
 14 TiB 13.83
MIN/MAX VAR: 0.00/3.00  STDDEV: 21.82
---

At this exact moment both OSDs from server a1-df were down but that's
changing. Sometimes I have only one OSD down, but most of the times I
have 2. And exactly which ones are actually down keeps changing.

What should I do to get my cluster back up? Just wait?


Regards,

Rodrigo Severo
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Write i/o in CephFS metadata pool

2020-01-29 Thread DHilsbos
Sammy;

I had a thought; since you say the FS has high read activity, but you're seeing 
large write I/O... is it possible that this is related to  atime (Linux last 
access time)?  If I remember my Linux FS basics, atime is stored in the file 
entry for the file in the directory, and I believe directory information is 
stored in the metadata pool (dentries?).

As a test, you might try mounting the CephFS with the noatime flag.  Then see 
if the write I/O is reduced.

I honestly don't know if CephFS supports atime, but I would expect it would.
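For the kernel client that would look something like this (monitor address, 
path, and credentials are placeholders for your own values):

  mount -t ceph <mon-ip>:6789:/ /mnt/cephfs -o name=<user>,secretfile=/etc/ceph/<user>.secret,noatime

or, for an existing fstab entry, add noatime to the options column and remount.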

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Samy Ascha [mailto:s...@xel.nl] 
Sent: Wednesday, January 29, 2020 2:25 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Write i/o in CephFS metadata pool

Hi!

I've been running CephFS for a while now and ever since setting it up, I've 
seen unexpectedly large write i/o on the CephFS metadata pool.

The filesystem is otherwise stable and I'm seeing no usage issues.

I'm in a read-intensive environment, from the clients' perspective and 
throughput for the metadata pool is consistently larger than that of the data 
pool.

For example:

# ceph osd pool stats
pool cephfs_data id 1
  client io 7.6 MiB/s rd, 19 KiB/s wr, 404 op/s rd, 1 op/s wr

pool cephfs_metadata id 2
  client io 338 KiB/s rd, 43 MiB/s wr, 84 op/s rd, 26 op/s wr

I realise, of course, that this is a momentary display of statistics, but I see 
this unbalanced r/w activity consistently when monitoring it live.

I would like some insight into what may be causing this large imbalance in r/w, 
especially since I'm in a read-intensive (web hosting) environment.

Some of it may be expected in when considering details of my environment and 
CephFS implementation specifics, so please ask away if more details are needed.

With my experience using NFS, I would start by looking at client io stats, like 
`nfsstat` and tuning e.g. mount options, but I haven't been able to find such 
statistics for CephFS clients.

Is there anything of the sort for CephFS? Are similar stats obtainable in some 
other way?

This might be a somewhat broad question and shallow description, so yeah, let 
me know if there's anything you would like more details on.

Thanks a lot,
Samy
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] No Activity?

2020-01-28 Thread DHilsbos
All;

I haven't had a single email come in from the ceph-users list at ceph.io since 
01/22/2020.

Is there just that little traffic right now?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 0B OSDs

2019-10-25 Thread DHilsbos
All;

We're setting up our second cluster, using version 14.2.4, and we've run into a 
weird issue: all of our OSDs are created with a size of 0 B.  Weights are 
appropriate for the size of the underlying drives, but ceph -s shows this:

  cluster:
id: 
health: HEALTH_WARN
Reduced data availability: 256 pgs inactive
too few PGs per OSD (28 < min 30)

  services:
mon: 3 daemons, quorum s700041,s700042,s700043 (age 4d)
mgr: s700041(active, since 3d), standbys: s700042, s700043
osd: 9 osds: 9 up (since 21m), 9 in (since 44m)

  data:
pools:   1 pools, 256 pgs
objects: 0 objects, 0 B
-->usage:   0 B used, 0 B / 0 B avail<-- (emphasis added)
pgs: 100.000% pgs unknown
 256 unknown

Thoughts?

I have ceph-volume.log, and the log from one of the OSD daemons, though it 
looks like the auth keys get printed to the ceph-volume.log.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
300 S. Hamilton Pl. 
Gilbert, AZ 85233 
Phone: (480) 610-3500 
Fax: (480) 610-3501 
www.PerformAir.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 0B OSDs?

2019-10-25 Thread DHilsbos
All;

We're setting up our second cluster, using version 14.2.4, and we've run into a 
weird issue: all of our OSDs are created with a size of 0 B.  Weights are 
appropriate for the size of the underlying drives, but ceph -s shows this:

  cluster:
id: 
health: HEALTH_WARN
Reduced data availability: 256 pgs inactive
too few PGs per OSD (28 < min 30)

  services:
mon: 3 daemons, quorum s700041,s700042,s700043 (age 4d)
mgr: s700041(active, since 3d), standbys: s700042, s700043
osd: 9 osds: 9 up (since 21m), 9 in (since 44m)

  data:
pools:   1 pools, 256 pgs
objects: 0 objects, 0 B
-->usage:   0 B used, 0 B / 0 B avail<-- (emphasis added)
pgs: 100.000% pgs unknown
 256 unknown

Thoughts?

I have ceph-volume.log, and the log from one of the OSD daemons, though it 
looks like the auth keys get printed to the ceph-volume.log.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [nautilus] Dashboard & RADOSGW

2019-09-10 Thread DHilsbos
All;

I found the problem, it was an identity issue.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: dhils...@performair.com [mailto:dhils...@performair.com] 
Sent: Tuesday, September 10, 2019 3:52 PM
To: ceph-users@ceph.io
Cc: Stephen Self
Subject: [ceph-users] [nautilus] Dashboard & RADOSGW

All;

We're trying to add a RADOSGW instance to our new production cluster, and it's 
not showing in the dashboard, or in ceph -s.

The cluster is running 14.2.2, and the RADOSGW got 14.2.3.

systemctl status ceph-radosgw@rgw.s700037 returns: active (running).

ss -ntlp does NOT show port 80.

Here's the ceph.conf on the system:
[global]
fsid = effc5134-e0cc-4628-a079-d67b60071f90
mon initial members = s700034,s700035,s700036
mon host = 
[v1:10.0.80.10:6789/0,v2:10.0.80.10:3300/0],[v1:10.0.80.11:6789/0,v2:10.0.80.11:3300/0],[v1:10.0.80.12:6789/0,v2:10.0.80.12:3300/0]
public network = 10.0.80.0/24
cluster network = 10.0.88.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 8
osd pool default pgp num = 8

[client.rgw.s700037]
host = s700037.performair.local
rgw frontends = "civetweb port=80"
rgw dns name = radosgw.performair.local

Any thoughts on what I'm missing?

I'm also seeing these in the manager's logs:
2019-09-10 15:49:43.946 7efe6eee1700  0 mgr[dashboard] [10/Sep/2019:15:49:43] 
ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", 
line 1837, in start
self.tick()
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", 
line 1902, in tick
s, ssl_env = self.ssl_adapter.wrap(s)
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/ssl_builtin.py", 
line 52, in wrap
keyfile=self.private_key, ssl_version=ssl.PROTOCOL_SSLv23)
  File "/usr/lib64/python2.7/ssl.py", line 934, in wrap_socket
ciphers=ciphers)
  File "/usr/lib64/python2.7/ssl.py", line 609, in __init__
self.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 831, in do_handshake
self._sslobj.do_handshake()
SSLError: [SSL: SSLV3_ALERT_CERTIFICATE_UNKNOWN] sslv3 alert certificate 
unknown (_ssl.c:618)

Thoughts on this?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Manager plugins issues on new ceph-mgr nodes

2019-09-10 Thread DHilsbos
Alexander;

What is your operating system?

Is it possible that the dashboard module isn't installed?

I've run into "Error ENOENT: all mgr daemons do not support module 'dashboard'" 
on my CentOS 7 machines, where the module is a separate package (I had to use 
"yum install ceph-mgr-dashboard" to get the dashboard module). 

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Alexandru Cucu [mailto:m...@alexcucu.ro] 
Sent: Tuesday, September 10, 2019 5:23 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Manager plugins issues on new ceph-mgr nodes

Hello,

Running 14.2.3, updated from 14.2.1.
Until recently I've had ceph-mgr collocated with OSDs. I've installed
ceph-mgr on separate servers and everything looks OK in Ceph status
but there are multiple issues:

1. Dashboard only runs on old mgr servers. Tried restarting the
daemons and disabling/enabling the dashboard plugin. New mgrs won't listen
on the dashboard port.
2. To (re)enable the dashboard plugin I had to use "--force"
# ceph mgr module enable dashboard
Error ENOENT: all mgr daemons do not support module 'dashboard',
pass --force to force enablement
3. When accessing the Cluster -> Manager modules menu in the dashboard
I get a 500 error message. The exact error below:


2019-09-10 15:01:39.270 7fb6d4916700  0 mgr[dashboard]
[10/Sep/2019:15:01:39] HTTP Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py", line
656, in respond
response.body = self.handler()
  File "/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py",
line 188, in __call__
self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/cherrypy/_cptools.py", line
221, in wrap
return self.newhandler(innerfunc, *args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 88,
in dashboard_exception_handler
return handler(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py",
line 34, in __call__
return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line
649, in inner
ret = func(*args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line
842, in wrapper
return func(*vpath, **params)
  File "/usr/share/ceph/mgr/dashboard/controllers/mgr_modules.py",
line 35, in list
obj['enabled'] = True
TypeError: 'NoneType' object does not support item assignment

2019-09-10 15:01:39.271 7fb6d4916700  0 mgr[dashboard]
[:::192.168.15.55:54860] [GET] [500] [0.014s] [admin] [1.3K]
/api/mgr/module
2019-09-10 15:01:39.272 7fb6d4916700  0 mgr[dashboard] ['{"status":
"500 Internal Server Error", "version": "3.2.2", "detail": "The server
encountered an unexpected condition which prevented it from fulfilling
the request.", "traceback": "Traceback (most recent call last):\\n
File \\"/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py\\",
line 656, in respond\\nresponse.body = self.handler()\\n  File
\\"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\\", line
188, in __call__\\nself.body = self.oldhandler(*args, **kwargs)\\n
 File \\"/usr/lib/python2.7/site-packages/cherrypy/_cptools.py\\",
line 221, in wrap\\nreturn self.newhandler(innerfunc, *args,
**kwargs)\\n  File
\\"/usr/share/ceph/mgr/dashboard/services/exception.py\\", line 88, in
dashboard_exception_handler\\nreturn handler(*args, **kwargs)\\n
File \\"/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py\\",
line 34, in __call__\\nreturn self.callable(*self.args,
**self.kwargs)\\n  File
\\"/usr/share/ceph/mgr/dashboard/controllers/__init__.py\\", line 649,
in inner\\nret = func(*args, **kwargs)\\n  File
\\"/usr/share/ceph/mgr/dashboard/controllers/__init__.py\\", line 842,
in wrapper\\nreturn func(*vpath, **params)\\n  File
\\"/usr/share/ceph/mgr/dashboard/controllers/mgr_modules.py\\", line
35, in list\\nobj[\'enabled\'] = True\\nTypeError: \'NoneType\'
object does not support item assignment\\n"}']


Anyone got the same problems after adding new manager nodes? Is there
something I'm missing here?

Thanks!
---
Alex Cucu
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [nautilus] Dashboard & RADOSGW

2019-09-10 Thread DHilsbos
All;

We're trying to add a RADOSGW instance to our new production cluster, and it's 
not showing in the dashboard, or in ceph -s.

The cluster is running 14.2.2, and the RADOSGW got 14.2.3.

systemctl status ceph-radosgw@rgw.s700037 returns: active (running).

ss -ntlp does NOT show port 80.

Here's the ceph.conf on the system:
[global]
fsid = effc5134-e0cc-4628-a079-d67b60071f90
mon initial members = s700034,s700035,s700036
mon host = 
[v1:10.0.80.10:6789/0,v2:10.0.80.10:3300/0],[v1:10.0.80.11:6789/0,v2:10.0.80.11:3300/0],[v1:10.0.80.12:6789/0,v2:10.0.80.12:3300/0]
public network = 10.0.80.0/24
cluster network = 10.0.88.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 8
osd pool default pgp num = 8

[client.rgw.s700037]
host = s700037.performair.local
rgw frontends = "civetweb port=80"
rgw dns name = radosgw.performair.local

Any thoughts on what I'm missing?

I'm also seeing these in the manager's logs:
2019-09-10 15:49:43.946 7efe6eee1700  0 mgr[dashboard] [10/Sep/2019:15:49:43] 
ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", 
line 1837, in start
self.tick()
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", 
line 1902, in tick
s, ssl_env = self.ssl_adapter.wrap(s)
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/ssl_builtin.py", 
line 52, in wrap
keyfile=self.private_key, ssl_version=ssl.PROTOCOL_SSLv23)
  File "/usr/lib64/python2.7/ssl.py", line 934, in wrap_socket
ciphers=ciphers)
  File "/usr/lib64/python2.7/ssl.py", line 609, in __init__
self.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 831, in do_handshake
self._sslobj.do_handshake()
SSLError: [SSL: SSLV3_ALERT_CERTIFICATE_UNKNOWN] sslv3 alert certificate 
unknown (_ssl.c:618)

Thoughts on this?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

