[ceph-users] Re: Recommended settings for PostgreSQL

2020-10-19 Thread Dave Hall
Another path we have been investigating is to use some NVMe on the 
database machine as a cache (bcache, cachefs, etc.). Several TB of U.2 
drives in a striped LVM volume should enhance performance for 'hot' data and 
cover for the issues of storing a large DB in Ceph.
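
As a rough sketch of the lvmcache variant of this idea (device names, sizes and 
the writeback mode are assumptions for illustration, not something we've deployed; 
a single NVMe PV is shown instead of a striped set for simplicity):

    # RBD-backed origin LV plus a local NVMe used as a dm-cache layer
    pvcreate /dev/rbd0 /dev/nvme0n1
    vgcreate pgvg /dev/rbd0 /dev/nvme0n1
    lvcreate -n pgdata -L 2T pgvg /dev/rbd0
    # attach a 500G cache LV on the NVMe in front of pgdata
    lvcreate --type cache --cachemode writeback -L 500G -n pgcache pgvg/pgdata /dev/nvme0n1
    mkfs.xfs /dev/pgvg/pgdata

Note that writeback caching on a single local NVMe is a data-loss risk if the cache 
device dies, which is part of why we haven't tried it yet.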


Note that we haven't tried this yet, but there are at least some 
discussions for MySQL.


-Dave

Dave Hall
Binghamton University

On 10/19/2020 10:49 PM, Brian Topping wrote:

Another option is to let PostgreSQL do the replication with local storage. There 
are great reasons for Ceph, but databases optimize for this kind of thing 
extremely well.

With replication in hand, run snapshots to RADOS buckets for long term storage.


On Oct 17, 2020, at 7:28 AM, Gencer W. Genç  wrote:

Hi,

I have a few existing RBDs. I would like to create a new RBD image for 
PostgreSQL. Do you have any suggestions for such use cases? For example:

Currently defaults are:

Object size (4MB) and Stripe Unit (None)
Features: Deep flatten + Layering + Exclusive Lock + Object Map + FastDiff

Should I use these as-is, or should I use a 16KB object size and a different set of 
features for PostgreSQL?
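
For reference, a minimal sketch of creating an image with a non-default object 
size and a trimmed feature set (pool name, image name and size are placeholders, 
not a recommendation):

    rbd create rbd/pgdata --size 200G --object-size 16K \
        --image-feature layering --image-feature exclusive-lock \
        --image-feature object-map --image-feature fast-diff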

Thanks,
Gencer.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: Recommended settings for PostgreSQL

2020-10-19 Thread Brian Topping
Another option is to let PostgreSQL do the replication with local storage. There 
are great reasons for Ceph, but databases optimize for this kind of thing 
extremely well.

With replication in hand, run snapshots to RADOS buckets for long term storage.

> On Oct 17, 2020, at 7:28 AM, Gencer W. Genç  wrote:
> 
> Hi,
> 
> I have a few existing RBDs. I would like to create a new RBD image for 
> PostgreSQL. Do you have any suggestions for such use cases? For example:
> 
> Currently defaults are:
> 
> Object size (4MB) and Stripe Unit (None)
> Features: Deep flatten + Layering + Exclusive Lock + Object Map + FastDiff
> 
> Should I use these as-is, or should I use a 16KB object size and a different set of 
> features for PostgreSQL?
> 
> Thanks,
> Gencer.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recommended settings for PostgreSQL

2020-10-19 Thread Gencer W . Genç
Marc Roos wrote:
> > In the past I've seen some good results (benchmarks & latencies) for MySQL
> > and PostgreSQL. However, I've always used a 4MB object size. Maybe I can
> > get much better performance with a smaller object size. Haven't tried it, actually.
> 
> Did you tune mysql / postgres for this setup? Did you have a default 
> ceph rbd setup?

Yes, I had to tune some settings in PostgreSQL, especially:

synchronous_commit = off

I have default RBD settings.

Do you have any recommendation?

Thanks,
Gencer.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mon DB compaction MON_DISK_BIG

2020-10-19 Thread Anthony D'Atri

> 
> Hi,
> 
> Yeah, sequentially, and I waited for each one to finish. It looks like it is still doing 
> something in the background, because now it is 9.5GB even though it says the 
> compaction is done.
> I think the `ceph tell ... compact` initiated a harder compaction, so I'm not sure how far it 
> will go down, but it looks promising. When I sent the email it was 13GB; now it's 9.5GB.

Online compaction isn’t as fast as offline compaction.  If you set 
mon_compact_on_start = true in ceph.conf the mons will compact more efficiently 
before joining the quorum.  This means of course that they’ll take longer to 
start up and become active.  Arguably this should 
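
For reference, a minimal sketch of that setting (assuming it goes into the [mon] 
section of ceph.conf on each mon host, followed by a sequential restart of the mons):

    [mon]
    mon_compact_on_start = true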

> 1 OSD has been down a long time, but that one I want to remove from the cluster 
> soon; all PGs are active+clean.

There’s an issue with at least some versions of Luminous where having down/out 
OSDs confounds compaction.  If you don’t end up soon with the mon DB size you 
expect, try removing or replacing that OSD and I’ll bet you have better results.
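
A rough sketch of removing such a long-down OSD on Luminous or later (the ID is a 
placeholder; double-check that all PGs are active+clean first):

    ceph osd out <ID>                              # if it isn't already out
    ceph osd purge <ID> --yes-i-really-mean-it     # removes it from the CRUSH map, auth and the OSD map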

— aad

> 
> mon stat is the same, yes.
> 
> Now that I've finished the email it is 8.7GB.
> 
> I hope I didn't break anything and it will delete everything.
> 
> Thank you
> 
> From: Anthony D'Atri 
> Sent: Tuesday, October 20, 2020 9:13 AM
> To: ceph-users@ceph.io
> Cc: Szabo, Istvan (Agoda)
> Subject: Re: [ceph-users] Mon DB compaction MON_DISK_BIG
> 
> I hope you restarted those mons sequentially, waiting between each for the 
> quorum to return.
> 
> Is there any recovery or pg autoscaling going on?
> 
> Are all OSDs up/in, ie. are the three numbers returned by `ceph osd stat` the 
> same?
> 
> — aad
> 
>> On Oct 19, 2020, at 7:05 PM, Szabo, Istvan (Agoda)  
>> wrote:
>> 
>> Hi,
>> 
>> 
>> I've received a warning today morning:
>> 
>> 
>> HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a 
>> lot of disk space
>> MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a 
>> lot of disk space
>>   mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB)
>>   mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB)
>>   mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB)
>> 
>> It hit the 15GB limit, so I restarted all 3 mons, which triggered compaction.
>> 
>> I've also run this command:
>> 
>> ceph tell mon.`hostname -s` compact on the first node, but it went down 
>> only to 13GB.
>> 
>> 
>> du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
>> 13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
>> 13G total
>> 
>> 
>> Anything else I can do to reduce it?
>> 
>> 
>> Luminous 12.2.8 is the version.
>> 
>> 
>> Thank you in advance.
>> 
>> 
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mon DB compaction MON_DISK_BIG

2020-10-19 Thread Anthony D'Atri
I hope you restarted those mons sequentially, waiting between each for the 
quorum to return.

Is there any recovery or pg autoscaling going on?

Are all OSDs up/in, ie. are the three numbers returned by `ceph osd stat` the 
same?

— aad

> On Oct 19, 2020, at 7:05 PM, Szabo, Istvan (Agoda)  
> wrote:
> 
> Hi,
> 
> 
> I've received a warning today morning:
> 
> 
> HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot 
> of disk space
> MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a 
> lot of disk space
>mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB)
>mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB)
>mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB)
> 
> It hit the 15GB limit, so I restarted all 3 mons, which triggered compaction.
> 
> I've also run this command:
> 
> ceph tell mon.`hostname -s` compact on the first node, but it went down only 
> to 13GB.
> 
> 
> du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
> 13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
> 13G total
> 
> 
> Anything else I can do to reduce it?
> 
> 
> Luminous 12.2.8 is the version.
> 
> 
> Thank you in advance.
> 
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Bucket notification is working strange

2020-10-19 Thread Krasaev
Hi everyone, I asked the same question on Stack Overflow, but will repeat it here.

I configured a bucket notification using the bucket owner's credentials, and when the owner 
performs actions I can see new events at the configured endpoint (Kafka, actually). 
However, when I perform actions in the bucket with another user's credentials, I do not see 
events in the configured notification topic. Is this expected behavior, and does each user 
have to configure their own topic (is that even possible if a user is not a system user at 
all)? Or have I missed something? Thank you.


https://stackoverflow.com/questions/64384060/enable-bucket-notifications-for-all-users-in-ceph-octopus
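
For context, a rough sketch of the setup being described, using the AWS CLI against 
the RGW endpoint (endpoint, topic name, bucket name and the ARN format are 
illustrative assumptions, not taken from this report):

    # create a topic that pushes to Kafka
    aws --endpoint-url http://rgw.example.com sns create-topic --name mytopic \
        --attributes push-endpoint=kafka://kafka.example.com:9092

    # attach the topic to the bucket for object create/remove events
    aws --endpoint-url http://rgw.example.com s3api put-bucket-notification-configuration \
        --bucket mybucket \
        --notification-configuration '{"TopicConfigurations":[{"Id":"notif1","TopicArn":"arn:aws:sns:default::mytopic","Events":["s3:ObjectCreated:*","s3:ObjectRemoved:*"]}]}'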
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD host count affecting available pool size?

2020-10-19 Thread Eugen Block

Hi,

I'm not sure I understand what your interpretation is.
If you have 30 OSDs each with 1TB you'll end up with 30TB of available 
(raw) space, no matter whether those OSDs are spread across 3 or 10 hosts.
The crush rules you define determine how the replicas are going to be 
distributed across your OSDs. Having a default replicated rule with 
"size = 3" will result in 10TB of usable space (given your example); just 
for simplification I'm not taking into account the RocksDB sizes etc.



Is Ceph limiting the max available pool size because all of my OSDs are
being hosted on just three nodes? If I had 30 OSDs running across ten nodes
instead, a node failure would result in just three OSDs dropping out
instead of ten.


So the answer would be no: the pool size is defined by the available OSDs 
(of a device class) divided by the replica count. What you gain from 
having more servers with fewer OSDs is higher failure resiliency, and 
you're more flexible in terms of data placement (e.g. you can use 
erasure coding to save space). The load during the recovery of one 
failed node with 10 OSDs is much higher than when only 3 OSDs on one 
node have to be recovered; in that case the clients probably wouldn't 
even notice. Having more nodes also improves performance if you have 
many clients, since there are more OSD nodes to talk to; Ceph scales out quite well.
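
To spell that division out as a small worked example (ignoring DB/WAL overhead 
and the near-full ratios):

    usable ≈ raw capacity of the device class / replica count
    30 OSDs x 1TB = 30TB raw  ->  30TB / 3 ≈ 10TB usable, whether on 3 hosts or 10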


If this doesn't answer your question, could you please clarify?

Regards,
Eugen


Quoting Dallas Jones :


Ah, I had overlooked this sentence in the docs before:

When you deploy OSDs they are automatically placed within the CRUSH map
under a host node named with the hostname for the host they are running on.
This, combined with the default CRUSH failure domain, ensures that replicas
or erasure code shards are separated across hosts and a single host failure
will not affect availability.

I think this means what I thought it would mean - having the OSDs
concentrated onto fewer hosts is limiting the volume size...

On Mon, Oct 19, 2020 at 9:08 AM Dallas Jones 
wrote:


Hi, Ceph brain trust:

I'm still trying to wrap my head around some capacity planning for Ceph,
and I can't find a definitive answer to this question in the docs (at least
one that penetrates my mental haze)...

Does the OSD host count affect the total available pool size? My cluster
consists of three 12-bay Dell PowerEdge machines running reflashed PERCs to
make each SAS drive individually addressable. Each node is running 10 OSDs.

Is Ceph limiting the max available pool size because all of my OSDs are
being hosted on just three nodes? If I had 30 OSDs running across ten nodes
instead, a node failure would result in just three OSDs dropping out
instead of ten.

Is there any rationale to this thinking, or am I trying to manufacture a
solution to a problem I still don't understand?

Thanks,

Dallas


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Octopus

2020-10-19 Thread Amudhan P
Hi,

I have installed a Ceph Octopus cluster using cephadm with a single network.
Now I want to add a second network and configure it as the cluster address.

How do I configure Ceph to use the second network as the cluster network?
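
As a rough sketch of one common way to do this via the config database (the subnet 
is a placeholder; the OSD daemons have to be restarted to pick up the change):

    ceph config set global cluster_network 192.168.100.0/24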

Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD host count affecting available pool size?

2020-10-19 Thread Dallas Jones
Ah, I had overlooked this sentence in the docs before:

When you deploy OSDs they are automatically placed within the CRUSH map
under a host node named with the hostname for the host they are running on.
This, combined with the default CRUSH failure domain, ensures that replicas
or erasure code shards are separated across hosts and a single host failure
will not affect availability.

I think this means what I thought it would mean - having the OSDs
concentrated onto fewer hosts is limiting the volume size...

On Mon, Oct 19, 2020 at 9:08 AM Dallas Jones 
wrote:

> Hi, Ceph brain trust:
>
> I'm still trying to wrap my head around some capacity planning for Ceph,
> and I can't find a definitive answer to this question in the docs (at least
> one that penetrates my mental haze)...
>
> Does the OSD host count affect the total available pool size? My cluster
> consists of three 12-bay Dell PowerEdge machines running reflashed PERCs to
> make each SAS drive individually addressable. Each node is running 10 OSDs.
>
> Is Ceph limiting the max available pool size because all of my OSDs are
> being hosted on just three nodes? If I had 30 OSDs running across ten nodes
> instead, a node failure would result in just three OSDs dropping out
> instead of ten.
>
> Is there any rationale to this thinking, or am I trying to manufacture a
> solution to a problem I still don't understand?
>
> Thanks,
>
> Dallas
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSD host count affecting available pool size?

2020-10-19 Thread Dallas Jones
Hi, Ceph brain trust:

I'm still trying to wrap my head around some capacity planning for Ceph,
and I can't find a definitive answer to this question in the docs (at least
one that penetrates my mental haze)...

Does the OSD host count affect the total available pool size? My cluster
consists of three 12-bay Dell PowerEdge machines running reflashed PERCs to
make each SAS drive individually addressable. Each node is running 10 OSDs.

Is Ceph limiting the max available pool size because all of my OSDs are
being hosted on just three nodes? If I had 30 OSDs running across ten nodes
instead, a node failure would result in just three OSDs dropping out
instead of ten.

Is there any rationale to this thinking, or am I trying to manufacture a
solution to a problem I still don't understand?

Thanks,

Dallas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-19 Thread Michael Thomas

Hi Frank,

I'll give both of these a try and let you know what happens.

Thanks again for your help,

--Mike

On 10/16/20 12:35 PM, Frank Schilder wrote:

Dear Michael,

this is a bit of a nut. I can't see anything obvious. I have two hypotheses 
that you might consider testing.

1) Problem with 1 incomplete PG.

In the shadow hierarchy for your cluster I can see quite a lot of nodes like

 {
 "id": -135,
 "name": "node229~hdd",
 "type_id": 1,
 "type_name": "host",
 "weight": 0,
 "alg": "straw2",
 "hash": "rjenkins1",
 "items": []
 },

I would have expected that hosts without a device of a certain device class are 
*excluded* completely from a tree instead of having weight 0. I'm wondering if 
this could cause the crush algorithm to fail in the way described here: 
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon
This might be a long shot, but could you export your crush map and play with 
the tunables as described under this link to see if more tries lead to a valid 
mapping? Note that testing this is harmless and does not change anything on the 
cluster.
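
A rough sketch of that offline test, following the linked troubleshooting page 
(the rule id, replica count and tries value are placeholders):

    ceph osd getcrushmap -o crush.map
    crushtool -i crush.map --test --show-bad-mappings --rule 0 --num-rep 3 --min-x 1 --max-x 1024
    # raise the retry limit and test again
    crushtool -i crush.map --set-choose-total-tries 100 -o crush.more-tries.map
    crushtool -i crush.more-tries.map --test --show-bad-mappings --rule 0 --num-rep 3 --min-x 1 --max-x 1024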


The hypothesis here is that buckets with weight 0 are not excluded from drawing 
a-priori, but a-posteriori. If there are too many draws of an empty bucket, a 
mapping fails. Allowing more tries should then lead to success. We should at 
least rule out this possibility.

2) About the incomplete PG.

I'm wondering if the problem is that the pool has exactly 1 PG. I don't have a 
test pool with Nautilus and cannot try this out. Can you create a test pool 
with pg_num=pgp_num=1 and see if the PG gets an OSD mapping? If not, can you 
then increase pg_num and pgp_num to, say, 10 and see if this has any effect?
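
A rough sketch of such a test (the pool and pool-id are placeholders; deleting the 
test pool afterwards requires mon_allow_pool_delete=true):

    ceph osd pool create testpool 1 1
    ceph pg dump pgs_brief | grep '^<pool-id>\.'      # did the single PG get an acting set?
    ceph osd pool set testpool pg_num 10
    ceph osd pool set testpool pgp_num 10
    ceph osd pool delete testpool testpool --yes-i-really-really-mean-it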

I'm wondering here if there needs to be a minimum number >1 of PGs in a pool. 
Again, this is more about ruling out a possibility than expecting success. As an 
extension to this test, you could increase pg_num and pgp_num of the pool 
device_health_metrics to see if this has any effect.


The crush rules and crush tree look OK to me. I can't really see why the 
missing OSDs are not assigned to the two PGs 1.0 and 7.39d.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: 16 October 2020 15:41:29
To: Michael Thomas; ceph-users@ceph.io
Subject: [ceph-users] Re: multiple OSD crash, unfound objects

Dear Michael,


Please mark OSD 41 as "in" again and wait for some slow ops to show up.


I forgot. "wait for some slow ops to show up" ... and then what?

Could you please go to the host of the affected OSD and look at the output of "ceph daemon 
osd.ID ops" or "ceph daemon osd.ID dump_historic_slow_ops" and check what type of 
operations get stuck? I'm wondering if it's administrative, like peering attempts.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder
Sent: 16 October 2020 15:09:20
To: Michael Thomas; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects

Dear Michael,

thanks for this initial work. I will need to look through the files you posted 
in more detail. In the meantime:

Please mark OSD 41 as "in" again and wait for some slow ops to show up. As far as I can 
see, marking it "out" might have cleared hanging slow ops (there were 1000 before), but 
they then started piling up again. From the OSD log it looks like an operation that is sent to/from 
PG 1.0, which doesn't respond because it is inactive. Hence, getting PG 1.0 active should resolve 
this issue (later).

It's a bit strange that I see slow ops for OSD 41 in the latest health detail 
(https://pastebin.com/3G3ij9ui). Was the OSD still out when this health report 
was created?

I think we might have misunderstood my question 6. My question was whether or 
not each host bucket corresponds to a physical host and vice versa, that is, 
each physical host has exactly 1 host bucket. I'm asking because it is possible 
to have multiple host buckets assigned to a single physical host and this has 
implications on how to manage things.

Coming back to PG 1.0 (the only PG in pool device_health_metrics as far as I 
can see), the problem is that it has no OSDs assigned. I need to look a bit 
longer at the data you uploaded to find out why. I can't see anything obvious.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Michael Thomas 
Sent: 16 October 2020 02:08:01
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects

On 10/14/20 3:49 PM, Frank Schilder wrote:

Hi Michael,

it doesn't look too bad. All degraded objects are due to the undersized PG. If 
this is an EC pool with 

[ceph-users] Re: Recommended settings for PostgreSQL

2020-10-19 Thread Gencer W . Genç
Yes, I had to tune some settings in PostgreSQL, especially:

synchronous_commit = off

I have default RBD settings.

Do you have any recommendation?

Thanks,
Gencer.

On 19.10.2020 12:49:51, Marc Roos  wrote:

> In the past I've seen some good results (benchmarks & latencies) for MySQL
> and PostgreSQL. However, I've always used a 4MB object size. Maybe I can get
> much better performance with a smaller object size. Haven't tried it, actually.

Did you tune mysql / postgres for this setup? Did you have a default
ceph rbd setup?




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OIDC Integration

2020-10-19 Thread technical
Dear Pritha, thanks a lot for your feedback and apologies for missing your 
comment about the backporting. Would you have a rough estimate on the next 
Octopus release by any chance?

On another note on the same subject, would you be able to give us some feedback 
on how the users will be created in Ceph? (For example, when we used LDAP, an 
LDAP user used to be created in Ceph for "mapping"; will it be the same in this 
case?)

If we have multiple tenants (unique usernames/emails in Keycloak), how will 
the introspection URLs be defined for the different tenants?

Thanks in advance
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW with HAProxy

2020-10-19 Thread Seena Fallah
Hi

When I use haproxy in keep-alive mode in front of the rgws, haproxy gives many
responses like this! Is there any problem with keep-alive mode in rgw?
Using Nautilus 14.2.9 with the beast frontend.
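
For context, a minimal sketch of the kind of HAProxy config in question (ports, 
addresses and timeouts are assumptions, not taken from this report):

    defaults
        mode http
        option http-keep-alive
        timeout connect 5s
        timeout client 30s
        timeout server 30s
        timeout http-keep-alive 10s

    frontend rgw_front
        bind *:80
        default_backend rgw_back

    backend rgw_back
        balance roundrobin
        server rgw1 10.0.0.11:8080 check
        server rgw2 10.0.0.12:8080 check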
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recommended settings for PostgreSQL

2020-10-19 Thread Marc Roos


> In the past I see some good results (benchmark & latencies) for MySQL 
and PostgreSQL. However, I've always used 
> 4MB object size. Maybe i can get much better performance on smaller 
object size. Haven't tried actually.

Did you tune mysql / postgres for this setup? Did you have a default 
ceph rbd setup?


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io