[ceph-users] Re: [Suspicious newsletter] RGW: Multiple Site does not sync old data

2021-02-28 Thread Szabo, Istvan (Agoda)
So-so. I had some interruption, so it failed on one site, but the other is more
or less working. This is the first time I have seen the data reported as caught
up in the radosgw-admin data sync status output on one side.
Today I will finish the other problematic site, and I'll let you know whether it
works or not.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

From: 特木勒 
Sent: Sunday, February 28, 2021 1:34 PM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users@ceph.io
Subject: Re: [Suspicious newsletter] [ceph-users] RGW: Multiple Site does not 
sync old data


Hi Istvan:

Thanks for your reply.

Does directional sync solve the problem? I tried to run `radosgw-admin sync 
init`, but it still did not work. :(

Thanks

Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote on Friday, February 26, 
2021 at 7:47 AM:
Same for me, on 15.2.8 also.
I’m trying directional sync now; it looks like symmetrical sync has an issue.
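(A sketch of what I mean by directional, in case it is useful — zone names are
placeholders, and this is just my understanding of the sync_from / sync_from_all
zone settings rather than a tested recipe:

    # secondary zone pulls only from the master zone
    radosgw-admin zone modify --rgw-zone=zone-b --sync-from-all=false --sync-from=zone-a
    # master zone does not pull from anywhere
    radosgw-admin zone modify --rgw-zone=zone-a --sync-from-all=false
    radosgw-admin period update --commit

then, to be safe, restart the radosgw daemons on both sides.)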

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

On 2021. Feb 26., at 1:03, 特木勒 <twl...@gmail.com> wrote:



Hi all:

ceph version: 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8)

I have a strange question. I just created a multisite setup for a Ceph cluster,
but I noticed that the old data in the source cluster is not synced; only new
data is synced to the second-zone cluster.

Is there anything I need to do to enable a full sync for the bucket, or is this
a bug?

Thanks


[ceph-users] Best practices for OSD on bcache

2021-02-28 Thread Norman.Kern
Hi, guys

I am testing Ceph on bcache devices, and I found the performance is not as good 
as expected. Does anyone have any best practices for it?  Thanks.
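
(For context, these are the bcache knobs I have been experimenting with while
testing — I am not sure they are best practice, which is exactly why I am
asking; the device name and values below are only examples:

    # cache mode of the bcache device backing the OSD
    cat /sys/block/bcache0/bcache/cache_mode
    echo writeback > /sys/block/bcache0/bcache/cache_mode
    # 0 disables the sequential-IO bypass, so large writes can also hit the cache
    echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
)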


[ceph-users] Re: [Suspicious newsletter] RGW: Multiple Site does not sync old data

2021-02-28 Thread 特木勒
Hi Istvan:

Thank you so much. :)

I have also opened an issue on tracker.ceph.com; hopefully
we can get some responses soon.

Issue: https://tracker.ceph.com/issues/49542
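
In case it helps anyone else hitting this, these are the commands I have been
using to check the sync state and to try to force a full resync (a sketch only
— bucket and zone names are placeholders, and I am not sure this is the
officially recommended procedure):

    # overall and per-bucket sync state, run on the secondary zone
    radosgw-admin sync status
    radosgw-admin bucket sync status --bucket=my-bucket
    # re-initialize sync for one bucket, or for all data from the master zone
    radosgw-admin bucket sync init --bucket=my-bucket
    radosgw-admin data sync init --source-zone=master-zone
    # then restart the radosgw daemons so the full sync actually starts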

Thanks

Szabo, Istvan (Agoda)  wrote on Monday, March 1, 2021 at 10:01 AM:

> So-so. I had some interruption, so it failed on one site, but the other is
> more or less working. This is the first time I have seen the data reported as
> caught up in the radosgw-admin data sync status output on one side.
>
> Today I will finish the other problematic site, and I'll let you know whether
> it works or not.
>
>
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
>
>
> From: 特木勒 
> Sent: Sunday, February 28, 2021 1:34 PM
> To: Szabo, Istvan (Agoda) 
> Cc: ceph-users@ceph.io
> Subject: Re: [Suspicious newsletter] [ceph-users] RGW: Multiple Site
> does not sync old data
>
>
>
>
> Hi Istvan:
>
>
>
> Thanks for your reply.
>
>
>
> Does directional sync solve the problem? I tried to run `radosgw-admin
> sync init`, but it still did not work. :(
>
>
>
> Thanks
>
>
>
> Szabo, Istvan (Agoda)  wrote on Friday, February 26, 2021 at 7:47 AM:
>
> Same for me, on 15.2.8 also.
> I’m trying directional sync now; it looks like symmetrical sync has an issue.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
> On 2021. Feb 26., at 1:03, 特木勒  wrote:
>
>
> Hi all:
>
> ceph version: 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8)
>
> I have a strange question. I just created a multisite setup for a Ceph
> cluster, but I noticed that the old data in the source cluster is not synced;
> only new data is synced to the second-zone cluster.
>
> Is there anything I need to do to enable a full sync for the bucket, or is
> this a bug?
>
> Thanks


[ceph-users] 'ceph df' %USED explanation

2021-02-28 Thread Mark Johnson
I'm in the middle of increasing the PG count for one of our pools by making small 
increments, waiting for the process to complete, rinse and repeat.  I'm doing 
it this way so I can control when all this activity happens and keep it 
away from the busier production traffic times.

I'm expecting some imbalance as PGs get created on already unbalanced OSDs, 
however our monitoring picked up something today that I'm not really 
understanding.  Our total utilization is just over 50%, and about 96% of our 
total data is in this one pool.  Because there are not enough PGs, the amount 
of data in each is quite large, and since they aren't evenly spread across the 
OSDs, there's a bit of imbalance.  That's all cool and to be expected, which is 
the reason for increasing the PG count in the first place.

However, as some PGs split, the new PGs are sometimes being created on 
OSDs that already have a disproportionate amount of data.  Again, not totally 
unexpected.  Our monitoring detected the usage of this pool to be >85% today as 
I neared the end of another increase in PG count.  What I'm not understanding 
is how this value is determined.  I've read other posts, and the calculations 
they suggest don't give a result that matches what shows up in my %USED column.  
I suspect it's somehow related to the MAX AVAIL value (which I believe is 
somewhat indirectly related to the amount available based on individual OSD 
utilization), but none of the posts I read mention this in their calculations, 
and I haven't been able to put together a formula from the values I have that 
I'm confident explains the %USED figure shown.

For the record, my current total utilization based on a 'ceph osd df' looks 
like this:

  TOTAL 39507G(SIZE) 19931G(USE) 17568G(AVAIL) 50.45(%USE)

My most utilised OSD (currently in the process of moving some data off this 
OSD) is 81.58% used with 188G available and a variance of 1.62.

A cut-down output of 'ceph df' looks like this:

GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED
39507G 17569G   19930G 50.45
POOLS:
NAME                      ID  USED   %USED  MAX AVAIL  OBJECTS
default.rgw.buckets.data  30  9552G  86.05      1548G  36285066

I suspect that as I get the utilization of my over-utilized OSDs down, this 
%USED value will drop.  But, I'd just love to fully understand how this value 
is calculated.
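
The closest I have got is to assume that the pool %USED is simply
USED / (USED + MAX AVAIL); that does line up with the output above, although I
have not found anything that confirms this is actually how it is computed:

    echo "scale=4; 9552 / (9552 + 1548)" | bc
    # .8605 -> 86.05%, which matches the %USED column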

Thanks,
Mark J



[ceph-users] Re: Getting started with cephadm

2021-02-28 Thread Peter Childs
The fix was to upgrade podman; the issue went away on any containers that were
restarted, so I'll just do a rolling reboot to clear it, and probably
upgrade Ceph to 15.2.9 at the same time.
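
(Presumably via the cephadm orchestrator — this is what I am planning to run,
assuming I have read the docs correctly:

    ceph orch upgrade start --ceph-version 15.2.9
    ceph orch upgrade status    # check progress
)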

It's a bit of a dev/proof-of-concept cluster currently, but it does mean
we're going to need to work out which distro to go with going forward and
which toolbox to use.

Everyone here is too used to Spectrum Scale or Lustre, and this is the first
time I've really even played with Ceph.

Peter.



On Sun, 28 Feb 2021, 18:13 David Orman,  wrote:

> Perhaps just swap out your hosts one at a time with a different
> distribution that's more current. We also use podman from the Kubic
> project instead of the OS-provided version. Just make sure to backup
> package files when you install versions from there, as they wipe their
> repos of the old version when new versions come out, leaving you with
> little ability to roll-back.
>
> That output you see is related to LVM, and will likely go away when
> you reboot. We see the same behavior even with Podman 3.0.1, but part
> of our setup process involves rebooting all hosts in order to ensure
> they behave properly. FWIW, this output doesn't impact anything
> negatively, other than being annoying on every ceph command.
>
> I suppose what I'm asking is what this: "start having problems
> starting the OSD up" means, specifically, from your initial email.
> What behavior do you see? What do logs show? Hopefully that will help
> pinpoint the root cause of your problems.
>
> On Sun, Feb 28, 2021 at 4:21 AM Peter Childs  wrote:
> >
> > Currently I'm using the default podman that comes with CentOS 7 (1.6.4),
> > which I fear is the issue.
> >
> > /bin/podman: stderr WARNING: The same type, major and minor should not
> be used for multiple devices.
> >
> > Looks to be part of the issue, and I've heard this is an issue in older
> versions of podman.
> >
> > I can't see a RAM issue or a CPU issue; it looks like it's probably an
> > issue with podman mounting overlays, so maybe upgrading podman past what's
> > available with CentOS 7 is the first plan. Shame CentOS 8 is a non-project
> > now :(
> >
> > Peter.
> >
> > On Sat, 27 Feb 2021 at 19:37, David Orman  wrote:
> >>
> >> Podman is fine (preferably 3.0+). What were those variables set to
> >> before? With most recent distributions and kernels we've not noticed a
> >> problem with the defaults. Did you notice errors that lead to you
> >> changing them? We have many clusters of 21 nodes, 24 HDDs each,
> >> multiple NVMEs serving as WAL/DB which were on 15.2.7 and prior, but
> >> now all are 15.2.9, running in podman 3.0.1 (fixes issues with the 2.2
> >> series on upgrade). We have less RAM (128G) per node without issues.
> >>
> >> On the OSDs that will not start - what error(s) do you see? You can
> >> inspect the OSDs with "podman logs " if they've started inside of
> >> podman but just aren't joining the cluster; if they haven't, then
> >> looking at the systemctl status for the service or journalctl will
> >> normally give more insight. Hopefully the root cause of your problems
> >> can be identified so it can be addressed directly.
> >>
> >> On Sat, Feb 27, 2021 at 11:34 AM Peter Childs  wrote:
> >> >
> >> > I'm new to ceph, and I've been trying to set up a new cluster with 16
> >> > computers with 30 disks each and 6 SSD (plus boot disks), 256G of memory,
> >> > IB Networking. (ok its currently 15 but never mind)
> >> >
> >> > When I take them over about 10 OSD's each they start having problems
> >> > starting the OSD up and I can normally fix this by rebooting them and it
> >> > will continue again for a while, and it is possible to get them up to the
> >> > full complement with a bit of poking around. (Once it's working it's fine
> >> > unless you start adding services or moving the OSDs around.)
> >> >
> >> > Is there anything I can change to make it a bit more stable.
> >> >
> >> > I've already set
> >> >
> >> > fs.aio-max-nr = 1048576
> >> > kernel.pid_max = 4194303
> >> > fs.file-max = 50
> >> >
> >> > which made it a bit better, but I feel it could be even better.
> >> >
> >> > I'm currently trying to upgrade to 15.2.9 from the default cephadm version
> >> > of Octopus.  The upgrade is going very, very slowly. I'm currently using
> >> > podman, if that helps; I'm not sure if docker would be better? (I've mainly
> >> > used singularity when I've handled containers before.)
> >> >
> >> > Thanks in advance
> >> >
> >> > Peter Childs


[ceph-users] Re: Getting started with cephadm

2021-02-28 Thread David Orman
Perhaps just swap out your hosts one at a time with a different
distribution that's more current. We also use podman from the Kubic
project instead of the OS-provided version. Just make sure to backup
package files when you install versions from there, as they wipe their
repos of the old version when new versions come out, leaving you with
little ability to roll-back.
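
(A sketch of what I mean by backing up the package files — this assumes a
dnf-based host with the "download" plugin available; adjust the package name to
your setup:

    mkdir -p /root/podman-rpms && cd /root/podman-rpms
    # keep local copies of the RPM and its dependencies so you can reinstall later
    dnf download --resolve podman
)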

That output you see is related to LVM, and will likely go away when
you reboot. We see the same behavior even with Podman 3.0.1, but part
of our setup process involves rebooting all hosts in order to ensure
they behave properly. FWIW, this output doesn't impact anything
negatively, other than being annoying on every ceph command.

I suppose what I'm asking is what this: "start having problems
starting the OSD up" means, specifically, from your initial email.
What behavior do you see? What do logs show? Hopefully that will help
pinpoint the root cause of your problems.
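
(With cephadm-deployed OSDs, the places I would start looking are roughly the
following — the fsid and OSD id are placeholders:

    cephadm ls                                    # containers cephadm knows about on this host
    systemctl status ceph-<fsid>@osd.12.service   # unit state for one OSD
    journalctl -u ceph-<fsid>@osd.12.service      # startup errors from systemd/podman
    cephadm logs --name osd.12                    # the OSD daemon's own log
)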

On Sun, Feb 28, 2021 at 4:21 AM Peter Childs  wrote:
>
> Currently I'm using the default podman that comes with CentOS 7 (1.6.4), which
> I fear is the issue.
>
> /bin/podman: stderr WARNING: The same type, major and minor should not be 
> used for multiple devices.
>
> Looks to be part of the issue, and I've heard this is an issue in older 
> versions of podman.
>
> I can't see a RAM issue or a CPU issue; it looks like it's probably an issue
> with podman mounting overlays, so maybe upgrading podman past what's available
> with CentOS 7 is the first plan. Shame CentOS 8 is a non-project now :(
>
> Peter.
>
> On Sat, 27 Feb 2021 at 19:37, David Orman  wrote:
>>
>> Podman is fine (preferably 3.0+). What were those variables set to
>> before? With most recent distributions and kernels we've not noticed a
>> problem with the defaults. Did you notice errors that lead to you
>> changing them? We have many clusters of 21 nodes, 24 HDDs each,
>> multiple NVMEs serving as WAL/DB which were on 15.2.7 and prior, but
>> now all are 15.2.9, running in podman 3.0.1 (fixes issues with the 2.2
>> series on upgrade). We have less RAM (128G) per node without issues.
>>
>> On the OSDs that will not start - what error(s) do you see? You can
>> inspect the OSDs with "podman logs " if they've started inside of
>> podman but just aren't joining the cluster; if they haven't, then
>> looking at the systemctl status for the service or journalctl will
>> normally give more insight. Hopefully the root cause of your problems
>> can be identified so it can be addressed directly.
>>
>> On Sat, Feb 27, 2021 at 11:34 AM Peter Childs  wrote:
>> >
>> > I'm new to ceph, and I've been trying to set up a new cluster with 16
>> > computers with 30 disks each and 6 SSD (plus boot disks), 256G of memory,
>> > IB Networking. (ok its currently 15 but never mind)
>> >
>> > When I take them over about 10 OSD's each they start having problems
>> > starting the OSD up and I can normally fix this by rebooting them and it
>> > will continue again for a while, and it is possible to get them up to the
>> > full complement with a bit of poking around. (Once it's working it's fine
>> > unless you start adding services or moving the OSDs around.)
>> >
>> > Is there anything I can change to make it a bit more stable.
>> >
>> > I've already set
>> >
>> > fs.aio-max-nr = 1048576
>> > kernel.pid_max = 4194303
>> > fs.file-max = 50
>> >
>> > which made it a bit better, but I feel it could be even better.
>> >
>> > I'm currently trying to upgrade to 15.2.9 from the default cephadm version
>> > of Octopus.  The upgrade is going very, very slowly. I'm currently using
>> > podman, if that helps; I'm not sure if docker would be better? (I've mainly
>> > used singularity when I've handled containers before.)
>> >
>> > Thanks in advance
>> >
>> > Peter Childs


[ceph-users] Re: Getting started with cephadm

2021-02-28 Thread Peter Childs
Currently I'm using the default podman that comes with CentOS 7 (1.6.4), which
I fear is the issue.

/bin/podman: stderr WARNING: The same type, major and minor should not be
used for multiple devices.

Looks to be part of the issue, and I've heard this is an issue in older
versions of podman.

I can't see a RAM issue or a CPU issue; it looks like it's probably an issue
with podman mounting overlays, so maybe upgrading podman past what's
available with CentOS 7 is the first plan. Shame CentOS 8 is a non-project
now :(

Peter.

On Sat, 27 Feb 2021 at 19:37, David Orman  wrote:

> Podman is fine (preferably 3.0+). What were those variables set to
> before? With most recent distributions and kernels we've not noticed a
> problem with the defaults. Did you notice errors that lead to you
> changing them? We have many clusters of 21 nodes, 24 HDDs each,
> multiple NVMEs serving as WAL/DB which were on 15.2.7 and prior, but
> now all are 15.2.9, running in podman 3.0.1 (fixes issues with the 2.2
> series on upgrade). We have less RAM (128G) per node without issues.
>
> On the OSDs that will not start - what error(s) do you see? You can
> inspect the OSDs with "podman logs " if they've started inside of
> podman but just aren't joining the cluster; if they haven't, then
> looking at the systemctl status for the service or journalctl will
> normally give more insight. Hopefully the root cause of your problems
> can be identified so it can be addressed directly.
>
> On Sat, Feb 27, 2021 at 11:34 AM Peter Childs  wrote:
> >
> > I'm new to ceph, and I've been trying to set up a new cluster with 16
> > computers with 30 disks each and 6 SSD (plus boot disks), 256G of memory,
> > IB Networking. (ok its currently 15 but never mind)
> >
> > When I take them over about 10 OSD's each they start having problems
> > starting the OSD up and I can normally fix this by rebooting them and it
> > will continue again for a while, and it is possible to get them up to the
> > full complement with a bit of poking around. (Once it's working it's fine
> > unless you start adding services or moving the OSDs around.)
> >
> > Is there anything I can change to make it a bit more stable.
> >
> > I've already set
> >
> > fs.aio-max-nr = 1048576
> > kernel.pid_max = 4194303
> > fs.file-max = 50
> >
> > which made it a bit better, but I feel it could be even better.
> >
> > I'm currently trying to upgrade to 15.2.9 from the default cephadm version
> > of Octopus.  The upgrade is going very, very slowly. I'm currently using
> > podman, if that helps; I'm not sure if docker would be better? (I've mainly
> > used singularity when I've handled containers before.)
> >
> > Thanks in advance
> >
> > Peter Childs