[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Fyodor Ustinov
Hi!

> docs.ceph.io ?  If there’s something that you’d like to see added there, 
> you’re
> welcome to submit a tracker ticket, or write to me privately.  It is not
> uncommon for documentation enhancements to be made based on mailing list
> feedback.

Documentation...

Try installing a completely new Ceph cluster from scratch on a freshly installed 
Ubuntu LTS using this doc: https://docs.ceph.com/en/latest/cephadm/install/ . Many 
interesting discoveries await you.
Nothing special - just a step-by-step installation, exactly as described in the 
documentation. No more and no less.
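
The doc boils down to roughly the following on the bootstrap node (a sketch; the 
package source and the MON IP are placeholders for your environment):

  apt install -y cephadm
  cephadm bootstrap --mon-ip 10.0.0.1
  cephadm shell -- ceph -s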

WBR,
Fyodor.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is there any way to obtain the maximum number of node failure in ceph without data loss?

2021-07-26 Thread Josh Baergen
Hi Jerry,

I think this is one of those "there must be something else going on
here" situations; marking any OSD out should affect only that one
"slot" in the acting set, at least until backfill completes (and in my
experience that has always been the case). It might be worth inspecting the
cluster log on your mons to see if any additional OSDs are flapping
(going down briefly) during this process, as that could cause them to
drop out of the acting set until backfills complete.
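
A quick sketch of what I mean (the log path and grep pattern are assumptions;
adjust them for your deployment):

  # on a mon host, look for OSDs being marked down or re-booting around the test
  grep -E 'osd\.[0-9]+ (failed|boot|marked)' /var/log/ceph/ceph.log | tail -n 50
  # or watch cluster events live while you repeat the experiment
  ceph -w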

There is quite a bit of shuffling of data going on when you fail an
OSD, and that might just be because of the width of your EC profile
given your cluster size and CRUSH rules (I believe that the
'chooseleaf' bit there is involved with that reshuffling, since EC
chunks will be moved around across hosts when an OSD is marked out in
your current configuration).

Unfortunately, that's probably the extent of the help I can offer, both
because I'm getting close to the limit of my understanding of CRUSH
rules in this sort of configuration and because I'll be OOO for a bit
soon. :) Hopefully others can chime in with further ideas.

Josh

On Mon, Jul 26, 2021 at 2:45 AM Jerry Lee  wrote:
>
> After doing more experiments, the outcome answer some of my questions:
>
> The environment is kind of different compared to the one mentioned in
> previous mail.
> 1) the `ceph osd tree`
>  -2         2.06516  root perf_osd
>  -5         0.67868      host jceph-n2-perf_osd
>   2    ssd  0.17331          osd.2   up  1.0  1.0
>   3    ssd  0.15875          osd.3   up  1.0  1.0
>   4    ssd  0.17331          osd.4   up  1.0  1.0
>   5    ssd  0.17331          osd.5   up  1.0  1.0
> -25         0.69324      host Jceph-n1-perf_osd
>   8    ssd  0.17331          osd.8   up  1.0  1.0
>   9    ssd  0.17331          osd.9   up  1.0  1.0
>  10    ssd  0.17331          osd.10  up  1.0  1.0
>  11    ssd  0.17331          osd.11  up  1.0  1.0
> -37         0.69324      host Jceph-n3-perf_osd
>  14    ssd  0.17331          osd.14  up  1.0  1.0
>  15    ssd  0.17331          osd.15  up  1.0  1.0
>  16    ssd  0.17331          osd.16  up  1.0  1.0
>  17    ssd  0.17331          osd.17  up  1.0  1.0
>
> 2) the used CRUSH rule for the EC8+3 pool for which the OSDs are
> selected by 'osd' instead.
> # ceph osd crush rule dump erasure_ruleset_by_osd
> {
> "rule_id": 9,
> "rule_name": "erasure_ruleset_by_osd",
> "ruleset": 9,
> "type": 3,
> "min_size": 1,
> "max_size": 16,
> "steps": [
> {
> "op": "take",
> "item": -2,
> "item_name": "perf_osd"
> },
> {
> "op": "choose_indep",
> "num": 0,
> "type": "osd"
> },
> {
> "op": "emit"
> }
> ]
> }
>
> 3) the erasure-code-profile used to create the EC8+3 pool (min_size = 8)
> # ceph osd erasure-code-profile get jec_8_3
> crush-device-class=ssd
> crush-failure-domain=osd
> crush-root=perf_ssd
> k=8
> m=3
> plugin=isa
> technique=reed_sol_van
>
> The following consequence of acting set after unplugging only 2 OSDs:
>
> T0:
> [3,9,10,5,16,14,8,11,2,4,15]
>
> T1: after issuing `ceph osd out 17`
> [NONE,NONE,10,5,16,14,8,11,2,4,NONE]
> state of this PG: "active+recovery_wait+undersized+degraded+remapped"
>
> T2: before recovery finishes, issuing `ceph osd out 11`
> [NONE,NONE,10,5,16,14,8,NONE,2,4,NONE]
> state of this PG: "down+remapped"
> comment: "not enough up instances of this PG to go active"
>
> With only 2 OSDs out, a PG of the EC8+3 pool enters the "down+remapped"
> state.  So, it seems that the min_size of an erasure-coded K+M pool
> should be set to K+1, which ensures that the data stays intact even if
> one more OSD breaks during recovery, although the pool may not
> serve IO.
>
> Any feedback and ideas are welcomed and appreciated!
>
> - Jerry
>
> On Mon, 26 Jul 2021 at 11:33, Jerry Lee  wrote:
> >
> > Hello Josh,
> >
> > I simulated the osd.14 failure by the following steps:
> >1. hot unplug the disk
> >2. systemctl stop ceph-osd@14
> >3. ceph osd out 14
> >
> > The used CRUSH rule to create the EC8+3 pool is described as below:
> > # ceph osd crush rule dump erasure_hdd_mhosts
> > {
> > "rule_id": 8,
> > "rule_name": "erasure_hdd_mhosts",
> > "ruleset": 8,
> > "type": 3,
> > "min_size": 1,
> > "max_size": 16,
> > "steps": [
> > {
> > "op": "take",
> > "item": -1,
> > "item_name": "default"
> > },
> > {
> > "op": "chooseleaf_indep",
> > "num": 0,
> > "type": 

[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Joshua West
I want to chime in here as well. I am a relatively new Ceph user who
learned about Ceph through my use of Proxmox.
I have two small 5 node Ceph/Proxmox clusters in two different
locations (to play with mirroring), and a mere 300TB of combined
storage.

This is a hobby for me. I find Ceph really interesting, and have been
enjoying my time learning and playing with Ceph.

Naturally, as time goes on, I have more and more time and effort
invested into this hobby, and have definitely run into issues while
learning.

With all that preamble out of the way, I whole-heartedly agree with
Yuri. Particularly when troubleshooting issues, my experience was that
Ceph felt open but closed at the same time. The docs are very
thorough, and a lifesaver, but it felt to me that there is a
goldilocks zone for knowledge.

The true basics are mostly included in the docs, but it was as I progressed
beyond simple cluster setup and began trying to optimize my cluster
that I wished there were a modern and official place to learn and
chat with other users at my relative level of expertise.

This is the first, and quite possibly the last, BBS-style mailing list I've
joined. Not because this one is anything but awesome, but because,
frankly, I wasn't even aware they still existed.

Josh


Joshua West
President
403-456-0072
CAYK.ca


On Mon, Jul 26, 2021 at 1:12 PM Yosh de Vos  wrote:
>
> Hi Marc, seems like you had a bad night's sleep right?
> There is just so much wrong with that reply.
>
> I think Yuri has a valid point, it is really hard to find support or
> solutions online. Also the documentation page is lacking explanations of
> basic Ceph concepts, which would be helpful for newcomers.
> It almost feels like Ceph is a closed-source solution, or not used by many,
> judging from the search results I get.
> Are they too focused on having Ceph consultants to fix your problems or do
> they actually want to build a community to share knowledge?
>
>
>
>
> Op ma 26 jul. 2021 om 19:48 schreef Marc :
>
> >
> > > I feel like ceph is living in 2005.
> >
> > No it is just you. Why don't you start reading
> > https://docs.ceph.com/en/latest/
> >
> > >It's quite hard to find help on
> > > issues related to ceph and it's almost impossible to get involved into
> > > helping others.
> >
> > ???, Just click the reply button, you must be able to find that, not?
> >
> > > There's a BBS aka Mailman maillist, which is from 1980 era and there's
> > > an irc channel that's dead.
> > > Why not set a Q board up or a google group at least?
> >
> > Because that is shit: you can not sign up via email, Google is putting up
> > these cookie walls, and last but not least you are forcing people to share
> > their data with Google. Maybe you do not care what data Google is grabbing
> > from you, but others might.
> >
> > > Why not open a
> > > "Discussions" tab on github so people would be able to get connected?
> >
> > Because it would spread the knowledge over different media, and therefore
> > you are likely to create a situation where on each medium your response
> > times go down. Everyone has email, not everyone has a github account.
> >
> > > Why do I have to end up on random reddit boards and servethehome and
> >
> > Wtf, reddit, servethehome?? There is nothing you can find there. Every time
> > there is a reddit link in my search results, the information there is shit. I
> > am not even opening reddit links anymore.
> >
> > > proxmox forums trying to nit pick pieces of knowledge from random
> > > persons?
> > >
> >
> > Yes if I want to get more info on linux, I am always asking at microsoft
> > technet.
> >
> > > I'm always losing hope each time I have to deal with ceph issues.
> >
> > Issues?? If you have a default setup, you hardly have (one even can say
> > no) issues.
> >
> > > But
> > > when it works, it's majestic of course. Documentation (both on redhat
> > > side and main docs) is pretty thorough, though.
> > >
> >
> > So if you read it all, you should not have any problems. I did not even
> > read it all, and do not have any issues (knock on wood, of course). But I have
> > the impression you (like many others here) are not taking enough time to
> > educate yourself.
> > If you aspire to become a brain surgeon, you do not get educated
> > via reddit either, do you? Educate yourself, so when the shit hits the fan, you can
> > fix the majority yourself. Ceph is not a wordpress project.
> >
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Anthony D'Atri


>> Are they too focused on having Ceph consultants to fix your problems or
>> do they actually want to build a community to share knowledge?
>> 
> 
> I am also a little worried about this strategy. You can also see that Red Hat
> is putting its information pages behind a login. Now they cancelled CentOS.
> What is next to come, p

CentOS isn’t canceled.

Ceph -- and thus everyone using it -- has benefited dramatically from Red Hat’s 
contributions to the Ceph codebase and community.  They have employees with 
addictions to food and shelter, who are paid by charging for Red Hat services.  
That said, last I knew one can sign up for a developer account and access many 
of Red Hat’s articles.

The community has always been Ceph’s superpower.

Mailing lists are far from dead or obsolete.  They accommodate people in 
timezones around the world, and are widely and readily indexed and archived.  
That said, a web forum might not be a bad idea, given resources to host and 
administer it, though it might be more susceptible to data loss.  

> Also the documentation page is lacking explanations of basic
> Ceph concepts, which would be helpful for newcomers.


docs.ceph.io ?  If there’s something that you’d like to see added there, you’re 
welcome to submit a tracker ticket, or write to me privately.  It is not 
uncommon for documentation enhancements to be made based on mailing list 
feedback.

I can also recommend a book for people getting started with Ceph ;)

— aad


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Виталий Филиппов
Google groups is only like 2007. Use Telegram @ceph_users and/or @ceph_ru :-) 

On 24 July 2021 at 0:27:21 GMT+03:00, y...@deepunix.net wrote:
>I feel like ceph is living in 2005. It's quite hard to find help on
>issues related to ceph and it's almost impossible to get involved into
>helping others. 
>There's a BBS aka Mailman maillist, which is from 1980 era and there's
>an irc channel that's dead. 
>
>Why not set a Q board up or a google group at least? Why not open a
>"Discussions" tab on github so people would be able to get connected?
>
>Why do I have to end up on random reddit boards and servethehome and
>proxmox forums trying to nit pick pieces of knowledge from random
>persons?
>
>
>I'm always losing hope each time I have to deal with ceph issues. But
>when it works, it's majestic of course. Documentation (both on redhat
>side and main docs) is pretty thorough, though.
>
>tnx
>___
>ceph-users mailing list -- ceph-users@ceph.io
>To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
With best regards,
  Vitaliy Filippov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Did standby dashboards stop redirecting to the active one?

2021-07-26 Thread Harry G. Coin

On 7/26/21 12:02 PM, Ernesto Puerta wrote:
> Hi Harry,
>
> No, that feature is still there. There's been a recent thread in this
> mailing list (please see "Pacific 16.2.5 Dashboard minor regression")
> about an unrelated change in cephadm that might impact this failover
> mechanism.
>
> What URL are you getting redirected to now? Are you using a reverse
> proxy/load balancer in front of the Dashboard (e.g. HAProxy)?

No redirection, nothing. Just a timeout on every manager other than the
active one.  Adding an HAProxy would be easily done, but it seems redundant
to Ceph's internal capability -- which at one time worked, anyhow.
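
A quick way to double-check what the active mgr is advertising (a sketch; no
proxy in front here):

  ceph mgr stat
  ceph mgr services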



>
> Kind Regards,
> Ernesto
>
>
> On Mon, Jul 26, 2021 at 4:06 PM Harry G. Coin  > wrote:
>
> Somewhere between Nautilus and Pacific the hosts running standby
> managers, which previously would redirect browsers to the currently
> active mgr/dashboard, seem to have stopped doing that.   Is that a
> switch somewhere?  Or was I just happily using an undocumented
> feature?
>
> Thanks
>
> Harry Coin
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> 
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Brad Hubbard
On Tue, Jul 27, 2021 at 3:49 AM Marc  wrote:
>
>
> > I feel like ceph is living in 2005.
>
> No it is just you. Why don't you start reading 
> https://docs.ceph.com/en/latest/
>
> >It's quite hard to find help on
> > issues related to ceph and it's almost impossible to get involved into
> > helping others.
>
> ???, Just click the reply button, you must be able to find that, not?
>
> > There's a BBS aka Mailman maillist, which is from 1980 era and there's
> > an irc channel that's dead.

Can you clarify which IRC channel specifically you are referring to here?

> > Why not set a Q board up or a google group at least?
>
> Because that is shit: you can not sign up via email, Google is putting up
> these cookie walls, and last but not least you are forcing people to share
> their data with Google. Maybe you do not care what data Google is grabbing
> from you, but others might.
>
> > Why not open a
> > "Discussions" tab on github so people would be able to get connected?
>
> Because it would spread the knowledge over different media, and therefore you
> are likely to create a situation where on each medium your response times go
> down. Everyone has email, not everyone has a github account.
>
> > Why do I have to end up on random reddit boards and servethehome and
>
> Wtf, reddit, servethehome?? There is nothing you can find there. Every time
> there is a reddit link in my search results, the information there is shit. I
> am not even opening reddit links anymore.
>
> > proxmox forums trying to nit pick pieces of knowledge from random
> > persons?
> >
>
> Yes if I want to get more info on linux, I am always asking at microsoft 
> technet.
>
> > I'm always losing hope each time I have to deal with ceph issues.
>
> Issues?? If you have a default setup, you hardly have (one even can say no) 
> issues.
>
> > But
> > when it works, it's majestic of course. Documentation (both on redhat
> > side and main docs) is pretty thorough, though.
> >
>
> So if you read it all, you should not have any problems. I did not even read
> it all, and do not have any issues (knock on wood, of course). But I have the
> impression you (like many others here) are not taking enough time to educate
> yourself.
> If you aspire to become a brain surgeon, you do not get educated via
> reddit either, do you? Educate yourself, so when the shit hits the fan, you can fix the
> majority yourself. Ceph is not a wordpress project.
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Cheers,
Brad

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: #ceph in Matrix [was: Re: we're living in 2005.]

2021-07-26 Thread Brad Hubbard
On Tue, Jul 27, 2021 at 5:53 AM Nico Schottelius
 wrote:
>
>
> Good evening dear mailing list,
>
> while I do think we have a great mailing list (this is one of the most
> helpful open source mailing lists I'm subscribed to), I do agree with
> the ceph IRC channel not being so helpful. The join/leave messages on
> most days significantly exceed the number of real messages.

Can you clarify which IRC channel specifically you are referring to?

>
> I am not sure what the reason for it is, but maybe IRC is not for
> everyone. Since we opened a Matrix channel on #ceph:ungleich.ch some
> time ago, I wanted to take the opportunity to invite you, in
> case you like real-time discussion but are not into IRC.
>
> In case you don't have a matrix account yet, you can find more
> information about it on https://ungleich.ch/u/projects/open-chat/.
>
> HTH and best regards,
>
> Nico
>
> --
> Sustainable and modern Infrastructures by ungleich.ch
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Cheers,
Brad

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: #ceph in Matrix [was: Re: we're living in 2005.]

2021-07-26 Thread Marc


> In case you don't have a matrix account yet, you can find more
> information about it on https://ungleich.ch/u/projects/open-chat/.

Yes! Matrix, if they finally fix the reverse proxy functionality, I will be the 
first to join :)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] #ceph in Matrix [was: Re: we're living in 2005.]

2021-07-26 Thread Nico Schottelius


Good evening dear mailing list,

while I do think we have a great mailing list (this is one of the most
helpful open source mailing lists I'm subscribed to), I do agree with
the ceph IRC channel not being so helpful. The join/leave messages on
most days significantly exceed the number of real messages.

I am not sure what the reason for it is, but maybe IRC is not for
everyone. Since we opened a Matrix channel on #ceph:ungleich.ch some
time ago, I wanted to take the opportunity to invite you, in
case you like real-time discussion but are not into IRC.

In case you don't have a matrix account yet, you can find more
information about it on https://ungleich.ch/u/projects/open-chat/.

HTH and best regards,

Nico

--
Sustainable and modern Infrastructures by ungleich.ch
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Marc
> 
> Hi Marc, seems like you had a bad night's sleep right?

No, no, no I am really like that all the time ;)

> it is really hard to find support or
> solutions online. 

What do you even mean by this?

> Also the documentation page is lacking explanations of
> basic Ceph concepts, which would be helpful for
> newcomers.

Let's say I have a Linux sysadmin background; if you have that, you have it up and 
running quickly. Recently I installed a test cluster. After reading my manual 
for the new server install, I am still surprised how easily and quickly you can 
have a 3-node cluster with 3 OSDs, 1 mon, 1 mgr and 1 MDS online.
Now they are trying to push this container solution, which I do not get. Then 
you have to learn Ceph and also learn about containers.

> it almost feels like Ceph is a closed source solution or not used by
> many from the search results I get.

Who cares how many use it? If CERN and NASA are using it, it is good enough for 
me.

> Are they too focused on having Ceph consultants to fix your problems or
> do they actually want to build a community to share knowledge?
> 

I am also a little worried about this strategy. You can also see that Red Hat is 
putting its information pages behind a login. Now they cancelled CentOS. What 
is next to come, p
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: unable to map device with krbd on el7 with ceph nautilus

2021-07-26 Thread cek+ceph
Have found the problem. All this was caused by a missing mon_host directive in 
ceph.conf. I had expected userspace to catch this, but it looks like it didn't 
care. 
We use DNS SRV in this cluster.
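
For reference, the reinstated directive looks roughly like this (the addresses
are placeholders):

  [global]
  mon_host = 10.0.0.1,10.0.0.2,10.0.0.3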

With mon_host directive reinstated, it was able to connect:
Jul 26 09:51:40 xx kernel: libceph: mon0 10.xx:6789 session established
Jul 26 09:51:40 xx kernel: libceph: client188721 fsid 
548a0823-815a-4ac5-a2e5-42cc7e8206ab
Jul 26 09:51:40 xx kernel: rbd: image blk1: image uses unsupported features: 
0x38

I'm wondering what happens in case this mon1 host goes down, will the kernel 
module go through the remaining mon directive addresses?

As re: strace, here you go:

# strace -f -e write -s 500  rbd device map test1/blk1 --user testing-rw
strace: Process 12962 attached
strace: Process 12963 attached
strace: Process 12964 attached
[pid 12964] write(7, " name=testing-rw,key=client.testing-rw test1 blk1 -", 51) 
= -1 ESRCH (No such process)
[pid 12964] write(6, "\375\377\377\377", 4) = 4
[pid 12961] write(2, "rbd: sysfs write failed", 23rbd: sysfs write failed 

[pid 12964] +++ exited with 0 +++
[pid 12961] <... write resumed>)= 23
[pid 12961] write(2, "\n", 1
)   = 1
strace: Process 12970 attached
strace: Process 12971 attached
strace: Process 12972 attached
[pid 12961] write(6, "c", 1)= 1
strace: Process 12973 attached
strace: Process 12974 attached
strace: Process 12975 attached
strace: Process 12976 attached
[pid 12961] write(9, "c", 1)= 1
[pid 12961] write(12, "c", 1)   = 1
[pid 12961] write(6, "c", 1)= 1
[pid 12971] write(6, "c", 1)= 1
[pid 12971] write(12, "c", 1)   = 1
[pid 12961] write(9, "c", 1)= 1
[pid 12976] +++ exited with 0 +++
[pid 12975] +++ exited with 0 +++
[pid 12961] write(6, "c", 1)= 1
[pid 12961] write(6, "c", 1)= 1
[pid 12961] write(9, "c", 1)= 1
[pid 12961] write(12, "c", 1)   = 1
[pid 12974] +++ exited with 0 +++
[pid 12973] +++ exited with 0 +++
[pid 12961] write(6, "c", 1)= 1
[pid 12961] write(9, "c", 1)= 1
[pid 12961] write(12, "c", 1)   = 1
strace: Process 12977 attached
[pid 12961] write(6, "c", 1)= 1
strace: Process 12978 attached
strace: Process 12979 attached
strace: Process 12980 attached
strace: Process 12981 attached
[pid 12961] write(9, "c", 1)= 1
[pid 12961] write(12, "c", 1)   = 1
[pid 12961] write(6, "c", 1)= 1
[pid 12971] write(12, "c", 1)   = 1
[pid 12971] write(6, "c", 1)= 1
[pid 12961] write(9, "c", 1)= 1
strace: Process 12982 attached
[pid 12961] write(9, "c", 1)= 1
strace: Process 12983 attached
strace: Process 12984 attached
[pid 12978] write(12, "c", 1)   = 1
strace: Process 12985 attached
strace: Process 12986 attached
[pid 12961] write(6, "c", 1)= 1
[pid 12984] write(6, "c", 1)= 1
[pid 12984] write(9, "c", 1)= 1
[pid 12984] write(9, "c", 1)= 1
[pid 12984] write(9, "c", 1)= 1
[pid 12984] write(9, "c", 1)= 1
strace: Process 12987 attached
strace: Process 12988 attached
[pid 12984] write(9, "c", 1)= 1
[pid 12984] write(9, "c", 1)= 1
[pid 12984] write(9, "c", 1)= 1
[pid 12984] write(12, "c", 1)   = 1
[pid 12984] write(9, "c", 1)= 1
strace: Process 12989 attached
strace: Process 12990 attached
[pid 12961] write(1, "In some cases useful info is found in syslog - try 
\"dmesg | tail\".\n", 67In some cases useful info is found in syslog - try 
"dmesg | tail".
) = 67
[pid 12990] +++ exited with 0 +++
[pid 12989] +++ exited with 0 +++
[pid 12984] +++ exited with 0 +++
[pid 12983] +++ exited with 0 +++
[pid 12961] write(6, "c", 1)= 1
[pid 12961] write(9, "c", 1)= 1
[pid 12961] write(12, "c", 1)   = 1
[pid 12961] write(6, "c", 1)= 1
[pid 12982] +++ exited with 0 +++
[pid 12961] write(12, "c", 1)   = 1
[pid 12961] write(9, "c", 1)= 1
[pid 12981] +++ exited with 0 +++
[pid 12980] +++ exited with 0 +++
[pid 12961] write(6, "c", 1)= 1
[pid 12961] write(6, "c", 1)= 1
[pid 12961] write(9, "c", 1)= 1
[pid 12961] write(12, "c", 1)   = 1
[pid 12979] +++ exited with 0 +++
[pid 12978] +++ exited with 0 +++
[pid 12961] write(6, "c", 1)= 1
[pid 12961] write(9, "c", 1)= 1
[pid 12961] write(12, "c", 1)   = 1
[pid 12977] +++ exited with 0 +++
[pid 12961] write(2, "rbd: map failed: ", 17rbd: map failed: ) = 17
[pid 12961] write(2, "(3) No such process", 19(3) No such process) = 19
[pid 12961] write(2, "\n", 1
)   = 1
[pid 12985] +++ exited with 0 +++
[pid 12986] +++ exited with 0 +++
[pid 12987] +++ exited with 0 +++
[pid 12988] +++ exited with 0 +++
[pid 12961] write(6, "c", 1)= 1
[pid 12970] +++ exited with 0 +++
[pid 12961] write(9, "c", 1)

[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Yosh de Vos
Hi Marc, seems like you had a bad night's sleep right?
There is just so much wrong with that reply.

I think Yuri has a valid point: it is really hard to find support or
solutions online. Also, the documentation page is lacking explanations of
basic Ceph concepts, which would be helpful for newcomers.
It almost feels like Ceph is a closed-source solution, or not used by many,
judging from the search results I get.
Are they too focused on having Ceph consultants to fix your problems or do
they actually want to build a community to share knowledge?




Op ma 26 jul. 2021 om 19:48 schreef Marc :

>
> > I feel like ceph is living in 2005.
>
> No it is just you. Why don't you start reading
> https://docs.ceph.com/en/latest/
>
> >It's quite hard to find help on
> > issues related to ceph and it's almost impossible to get involved into
> > helping others.
>
> ???, Just click the reply button, you must be able to find that, not?
>
> > There's a BBS aka Mailman maillist, which is from 1980 era and there's
> > an irc channel that's dead.
> > Why not set a Q board up or a google group at least?
>
> Because that is shit: you can not sign up via email, Google is putting up
> these cookie walls, and last but not least you are forcing people to share
> their data with Google. Maybe you do not care what data Google is grabbing
> from you, but others might.
>
> > Why not open a
> > "Discussions" tab on github so people would be able to get connected?
>
> Because it would spread the knowledge over different media, and therefore
> you are likely to create a situation where on each medium your response
> times go down. Everyone has email, not everyone has a github account.
>
> > Why do I have to end up on random reddit boards and servethehome and
>
> Wtf, reddit, servethehome?? There is nothing you can find there. Every time
> there is a reddit link in my search results, the information there is shit. I
> am not even opening reddit links anymore.
>
> > proxmox forums trying to nit pick pieces of knowledge from random
> > persons?
> >
>
> Yes if I want to get more info on linux, I am always asking at microsoft
> technet.
>
> > I'm always losing hope each time I have to deal with ceph issues.
>
> Issues?? If you have a default setup, you hardly have (one even can say
> no) issues.
>
> > But
> > when it works, it's majestic of course. Documentation (both on redhat
> > side and main docs) is pretty thorough, though.
> >
>
> So if you read it all, you should not have any problems. I did not even
> read it all, and do not have any issues (knock on wood, of course). But I have
> the impression you (like many others here) are not taking enough time to
> educate yourself.
> If you aspire to become a brain surgeon, you do not get educated
> via reddit either, do you? Educate yourself, so when the shit hits the fan, you can
> fix the majority yourself. Ceph is not a wordpress project.
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-07-26 Thread Dave Piper
Hi Igor,

Thanks for your time looking into this.

I've attached a 5 minute window of OSD logs, which includes several restart 
attempts (each one takes ~25 seconds).

When I said it looked like we were starting up in a different state, I'm 
referring to how the "Recovered from manifest file" log line appears twice, with 
different logs afterwards. This behaviour seems to repeat reliably on each 
restart of the OSD. My interpretation of this was that when the initial 
recovery attempt leads to the rocksdb shutdown, ceph is automatically trying to 
start the OSD in some alternative state but that this is also failing (with the 
bdev errors I copied in).  Possibly I'm inferring too much.

I tried turning up the logging levels for rocksdb and bluestore but they're 
both very spammy so I've not included this in the attached logs. Let me know if 
you think that would be helpful.

My ceph version is 15.2.11. We're running a containerized deployment using 
docker image ceph-daemon:v5.0.10-stable-5.0-octopus-centos-8 .

[qs-admin@condor_sc0 metaswitch]$ sudo docker exec b732f9135b42 ceph version
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)

Cheers,

Dave



-Original Message-
From: Igor Fedotov  
Sent: 23 July 2021 20:45
To: Dave Piper ; ceph-users@ceph.io
Subject: [EXTERNAL] Re: [ceph-users] OSDs flapping with "_open_alloc loaded 132 
GiB in 2930776 extents available 113 GiB"

Hi Dave,

The following log line indicates that the allocator has just completed loading 
information about free disk blocks into memory.  And it looks perfectly fine.

>_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB
  

The subsequent rocksdb shutdown looks weird without any other log output indicating 
the issue.
Curious what you mean by:

"After that we seem to try starting up in a slightly different state and get a 
different set of errors:"

The resulting errors show a lack of disk space at some point, but I'd definitely 
like to get the full startup log.

Please also specify which Octopus version you have.

Thanks,
Igor

On 7/23/2021 6:48 PM, Dave Piper wrote:
> Hi all,
>
> We've got a containerized test cluster with 3 OSDs and ~ 220GiB of data. 
> Shortly after upgrading from nautilus -> octopus, 2 of the 3 OSDs have 
> started flapping. I've also got alarms about the MDS being damaged, which 
> we've seen elsewhere and have a recovery process for, but I'm unable to run 
> this (I suspect because I've only got 1 functioning OSD). My RGWs are also 
> failing to start, again I suspect because of the bad state of OSDs. I've 
> tried restarting all OSDs, rebooting all servers, checked auth (all looks 
> fine) - but I'm still in the same state.
>
> My OSDs seem to be failing at the  "_open_alloc opening allocation metadata" 
> step; looking at logs for each OSD restart, the OSD writes this log, then no 
> logs for a few minutes and then logs:
>
>  bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc loaded 132 GiB in 
> 2930776 extents available 113 GiB
>  rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background 
> work
>
> After that we seem to try starting up in a slightly different state and get a 
> different set of errors:
>
>  bluefs _allocate failed to allocate 0x100716 on bdev 1, free 0xd; 
> fallback to bdev 2
>  bluefs _allocate unable to allocate 0x100716 on bdev 2, free 
> 0x; fallback to slow device expander
>
> and eventually crash and log a heap of stack dumps.
>
> I don't know what extents are but I seem to have a lot of them, and more than 
> I've got capacity for? Maybe I'm running out of RAM or disk space somewhere, 
> but I've got 21GB of free RAM on the server, and each OSD has a 350GiB device 
> attached to it.
>
>
>
> I'm wondering if anyone has seen anything like this before or can suggest 
> next debug steps to take?
>
> Cheers,
>
> Dave
>
>
>
> Full OSD logs surrounding the "_open_alloc opening allocation metadata" step:
>
>
> Jul 23 00:07:13 condor_sc0 container_name/ceph-osd-1[1709]: 
> 2021-07-23T00:07:13.818+ 7f3de111bf40  4 rocksdb: EVENT_LOG_v1 
> {"time_micros": 1626998833819439, "job": 1, "event": 
> "recovery_started", "log_files": [392088, 392132]}
>
> Jul 23 00:07:13 condor_sc0 container_name/ceph-osd-1[1709]: 
> 2021-07-23T00:07:13.818+ 7f3de111bf40  4 rocksdb: 
> [db/db_impl_open.cc:583] Recovering log #392088 mode 0
>
> Jul 23 00:07:17 condor_sc0 container_name/ceph-osd-1[1709]: 
> 2021-07-23T00:07:17.240+ 7f3de111bf40  4 rocksdb: 
> [db/db_impl_open.cc:583] Recovering log #392132 mode 0
>
> Jul 23 00:07:17 condor_sc0 container_name/ceph-osd-1[1709]: 
> 2021-07-23T00:07:17.486+ 7f3de111bf40  4 rocksdb: EVENT_LOG_v1 
> {"time_micros": 1626998837486404, "job": 1, "event": 
> "recovery_finished"}
>
> Jul 23 00:07:17 condor_sc0 container_name/ceph-osd-1[1709]: 
> 2021-07-23T00:07:17.486+ 7f3de111bf40  1 
> bluestore(/var/lib/ceph/osd/ceph-1) _open_db opened rocksdb path db 
> options 
> 

[ceph-users] we're living in 2005.

2021-07-26 Thread yuri
I feel like ceph is living in 2005. It's quite hard to find help on issues 
related to ceph and it's almost impossible to get involved in helping others. 
There's a BBS, aka a Mailman mailing list, which is from the 1980s era, and there's an IRC 
channel that's dead. 

Why not set up a Q&A board or a Google group at least? Why not open a 
"Discussions" tab on github so people would be able to get connected?

Why do I have to end up on random reddit boards and servethehome and proxmox 
forums trying to nitpick pieces of knowledge from random people?


I'm always losing hope each time I have to deal with ceph issues. But when it 
works, it's majestic of course. Documentation (both on redhat side and main 
docs) is pretty thorough, though.

tnx
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-07-26 Thread Dave Piper
Hi Igor, 

> So to get more verbose but less log one can set both debug-bluestore and 
> debug-bluefs to 1/20. ...

More verbose logging attached. I've trimmed the file to a single restart 
attempt to keep the filesize down; let me know if there's not enough here.

> It would be also great to collect the output for the following commands: ...

I've tried running ceph-bluestore-tool previously on this system but both 
commands fail with the following error:

[qs-admin@condor_sc0 ~]$ sudo docker exec 419d997e5a05 ceph-bluestore-tool 
--path /var/lib/ceph/osd/ceph-1 --command bluefs-stats
error from cold_open: (11) Resource temporarily unavailable
2021-07-26T10:50:15.032+ 7f9a9bf68240 -1 
bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock 
/var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still running?)(11) Resource 
temporarily unavailable
[qs-admin@condor_sc0 ~]$

There's only one OSD running on this server; should I be stopping it / the 
other OSDs in the cluster before running the `ceph-bluestore-tool` command? 
Previously when the OSDs were failing to start, /var/lib/ceph/osd/ceph-1/ was 
empty but it now contains the following:

[qs-admin@condor_sc0 ~]$ sudo docker exec 419d997e5a05 ls 
/var/lib/ceph/osd/ceph-1
block
ceph_fsid
fsid
keyring
ready
require_osd_release
type
whoami
[qs-admin@condor_sc0 ~]$


> And finally you can try to switch to bitmap allocator as a workaround ...

Switching to the bitmap allocator as you suggested has led to both failing OSDs 
starting up successfully. I've now got 3/3 OSDs up and in!  The cluster still 
has MDS issues that were blocked behind getting the OSDs running as mentioned 
in my original post, but I think these are unrelated to the OSD problem as it's 
an issue we've seen in isolation elsewhere.

So - that's a big step forward! Should I retry with my original config on the 
latest octopus release and see if this is now fixed?
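
(For the record, the workaround amounted to roughly the following; a sketch
assuming the centralized config database is in use, with the affected OSDs
restarted afterwards:)

  ceph config set osd bluestore_allocator bitmap
  ceph config set osd bluefs_allocator bitmap
  # then restart the affected OSD daemons for the change to take effect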


Cheers again,

Dave


-Original Message-
From: Igor Fedotov  
Sent: 26 July 2021 11:14
To: Dave Piper ; ceph-users@ceph.io
Subject: Re: [EXTERNAL] Re: [ceph-users] OSDs flapping with "_open_alloc loaded 
132 GiB in 2930776 extents available 113 GiB"

Hi Dave,

Some notes first:

1) The following behavior is fine: BlueStore mounts in two stages - the first 
one is read-only and, among other things, it loads the allocation map from the DB. And 
that's exactly the case here.

Jul 26 08:55:31 condor_sc0 docker[15282]: 2021-07-26T08:55:31.703+
7f0e15b3df40  1 bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc loaded
132 GiB in 2930776 extents available 113 GiB Jul 26 08:55:31 condor_sc0 
docker[15282]: 2021-07-26T08:55:31.703+
7f0e15b3df40  4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background 
work Jul 26 08:55:31 condor_sc0 docker[15282]: 2021-07-26T08:55:31.704+
7f0e15b3df40  4 rocksdb: [db/db_impl.cc:563] Shutdown complete

2) What's really broken is the following allocation attempt:

Jul 26 08:55:34 condor_sc0 docker[15282]: 2021-07-26T08:55:34.767+
7f0e15b3df40  1 bluefs _allocate failed to allocate 0x100716 on bdev 1, free 
0xd; fallback to bdev 2 Jul 26 08:55:34 condor_sc0 docker[15282]: 
2021-07-26T08:55:34.767+
7f0e15b3df40  1 bluefs _allocate unable to allocate 0x100716 on bdev 2, free 
0x; fallback to slow device expander Jul 26 08:55:35 condor_sc0 
docker[15282]: 2021-07-26T08:55:35.042+
7f0e15b3df40 -1 bluestore(/var/lib/ceph/osd/ceph-1)
allocate_bluefs_freespace failed to allocate on 0x4000 min_size
0x11 > allocated total 0x0 bluefs_shared_alloc_size 0x1 allocated 0x0 
available 0x 1c09738000 Jul 26 08:55:35 condor_sc0 docker[15282]: 
2021-07-26T08:55:35.044+
7f0e15b3df40 -1 bluefs _allocate failed to expand slow device to fit 
+0x100716
Jul 26 08:55:35 condor_sc0 docker[15282]: 2021-07-26T08:55:35.044+
7f0e15b3df40 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 
0x100716

This occurs during BlueFS recovery and that's an attempt to get more space to 
write out the bluefs log. This shouldn't fail given plenty of free space:

... available 0x 1c09738000 ...


So to get a more verbose but smaller log one can set both debug-bluestore and 
debug-bluefs to 1/20. This way just the most recent lines of the log preceding the 
crash would be at level 20, which seems sufficient for the troubleshooting.
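
A sketch of one way to apply that in ceph.conf (section placement assumed;
restart the OSD afterwards so it takes effect at startup):

  [osd]
  debug bluestore = 1/20
  debug bluefs = 1/20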

It would also be great to collect the output of the following commands:

ceph-bluestore-tool --path  --command bluefs-bdev-sizes

ceph-bluestore-tool --path  --command bluefs-stats


And finally you can try to switch to the bitmap allocator as a workaround - we've 
fixed a couple of issues in the hybrid one which prevented proper allocations 
under some circumstances. The fixes were made after the v15.2.11 release, hence this 
might be the case. So please try setting:

bluestore_allocator = bitmap

bluefs_allocator = bitmap


Thanks,

Igor


On 7/26/2021 12:14 PM, Dave Piper wrote:
> Hi Igor,
>
> Thanks for your time looking into this.
>
> I've attached 

[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Marc


> I feel like ceph is living in 2005. 

No it is just you. Why don't you start reading https://docs.ceph.com/en/latest/

>It's quite hard to find help on
> issues related to ceph and it's almost impossible to get involved into
> helping others.

???, Just click the reply button, you must be able to find that, not?

> There's a BBS aka Mailman maillist, which is from 1980 era and there's
> an irc channel that's dead.
> Why not set a Q board up or a google group at least?

Because that is shit: you can not sign up via email, Google is putting up these 
cookie walls, and last but not least you are forcing people to share their data 
with Google. Maybe you do not care what data Google is grabbing from you, but others 
might. 

> Why not open a
> "Discussions" tab on github so people would be able to get connected?

Because it would spread the knowledge over different media, and therefore you 
are likely to create a situation where on each medium your response times go 
down. Everyone has email, not everyone has a github account.

> Why do I have to end up on random reddit boards and servethehome and

Wtf, reddit, servethehome?? There is nothing you can find there. Every time 
there is a reddit link in my search results, the information there is shit. I am 
not even opening reddit links anymore.

> proxmox forums trying to nit pick pieces of knowledge from random
> persons?
> 

Yes if I want to get more info on linux, I am always asking at microsoft 
technet.

> I'm always losing hope each time I have to deal with ceph issues.

Issues?? If you have a default setup, you hardly have (one even can say no) 
issues. 

> But
> when it works, it's majestic of course. Documentation (both on redhat
> side and main docs) is pretty thorough, though.
> 

So if you read it all, you should not have any problems. I did not even read 
it all, and do not have any issues (knock on wood, of course). But I have the 
impression you (like many others here) are not taking enough time to educate 
yourself. 
If you aspire to become a brain surgeon, you do not get educated via 
reddit either, do you? Educate yourself, so when the shit hits the fan, you can fix the 
majority yourself. Ceph is not a wordpress project.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [Kolla][wallaby] add new cinder backend

2021-07-26 Thread Ignazio Cassano
Hello All,
I am playing with kolla wallaby on ubuntu 20.04.
When I add a new backend type, the volume container stops working and keeps
restarting, and all instances are stopped.
I can only solve it by restarting one controller at a time.
This morning I had cinder configured for nfs netapp with 24 instances
running.
I added the ceph backend in globals.yml and all configurations suggested in
/etc/kolla/config.
Then I launched kolla-ansible deploy.
No errors occurred in the deployment, but all instances went into a stopped state
and cinder service-list reported the nfs backend as down. The cinder volume container
remained stuck restarting.
Only after restarting one controller at a time did cinder volume start, and then I
was able to use cinder with ceph and nfs.
It seems to happen every time I modify the cinder configuration.
Any suggestion, please?
Ignazio
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Did standby dashboards stop redirecting to the active one?

2021-07-26 Thread Ernesto Puerta
Hi Harry,

No, that feature is still there. There's been a recent thread in this
mailing list (please see "Pacific 16.2.5 Dashboard minor regression")
about an unrelated change in cephadm that might impact this failover
mechanism.

What URL are you getting redirected to now? Are you using a reverse
proxy/load balancer in front of the Dashboard (e.g. HAProxy)?

Kind Regards,
Ernesto


On Mon, Jul 26, 2021 at 4:06 PM Harry G. Coin  wrote:

> Somewhere between Nautilus and Pacific the hosts running standby
> managers, which previously would redirect browsers to the currently
> active mgr/dashboard, seem to have stopped doing that.   Is that a
> switch somewhere?  Or was I just happily using an undocumented feature?
>
> Thanks
>
> Harry Coin
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: unable to map device with krbd on el7 with ceph nautilus

2021-07-26 Thread Ilya Dryomov
On Mon, Jul 26, 2021 at 5:25 PM  wrote:
>
> Have found the problem. All this was caused by missing mon_host directive in 
> ceph.conf. I have expected userspace to catch this, but it looks like it 
> didn't care.

We should probably add an explicit check for that so that the error
message is clearer.

> We use DNS SRV in this cluster.
>
> With mon_host directive reinstated, it was able to connect:
> Jul 26 09:51:40 xx kernel: libceph: mon0 10.xx:6789 session established
> Jul 26 09:51:40 xx kernel: libceph: client188721 fsid 
> 548a0823-815a-4ac5-a2e5-42cc7e8206ab
> Jul 26 09:51:40 xx kernel: rbd: image blk1: image uses unsupported features: 
> 0x38

Now you just need to disable object-map, fast-diff and deep-flatten
with "rbd feature disable" as mentioned by Marc and Dimitri.

>
> I'm wondering what happens in case this mon1 host goes down, will the kernel 
> module go through the remaining mon directive addresses?

Yes, these addresses are used to put together the initial list of
monitors (initial monmap).  The kernel picks one at random and keeps
trying until a session with one of them gets established.  After
that the real monmap received from the cluster replaces the initial
list.

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: octopus garbage collector makes slow ops

2021-07-26 Thread Mark Nelson
Yeah, I suspect that regular manual compaction might be the necessary 
workaround here if tombstones are slowing down iterator performance.  
If it is related to tombstones, it would be similar to what we saw when 
we tried to use deleterange and hit the same kind of performance issues.
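
For reference, a sketch of the offline compaction workaround (the OSD id and 
data path are placeholders, and this assumes a plain systemd deployment; the 
OSD must be stopped first):

  systemctl stop ceph-osd@60
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-60 compact
  systemctl start ceph-osd@60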


I'm a little at a loss as to why nautilus was better (other than the 
ill-fated bluefs_buffered_io change).  There has been a fair amount of 
code churn both in Ceph and in rocksdb related to some of this 
though.  Pacific is definitely more likely to get backports for this 
kind of thing IMHO.



Mark


On 7/26/21 6:19 AM, Igor Fedotov wrote:
Unfortunately I'm not an expert in RGW hence nothing to recommend from 
that side.


Apparently your issues are caused by bulk data removal - it appears 
that RocksDB can hardly sustain such things and its performance 
degrades. We've seen that plenty of times before.


So far there are two known workarounds - manual DB compaction 
using ceph-kvstore-tool, and setting bluefs_buffered_io to true. The 
latter makes sense for some Ceph releases which got that parameter set 
to false by default, v15.2.12 being one of them. And indeed that setting 
might cause high RAM usage in some cases - you might want to look for 
relevant recent PRs at github or ask Mark Nelson from RH for more 
details.


Nevertheless current upstream recommendation/default is to have it set 
to true as it greatly improves DB performance.



So you might want to try to compact RocksDB as per above but please 
note that's a temporary workaround - DB might start to degrade if 
removals are going on.


There is also a PR to address the bulk removal issue in general:

1) https://github.com/ceph/ceph/pull/37496 (still pending review and 
unlikely to be backported to Octopus).



One more question - do your HDD OSDs have additional fast (SSD/NVMe) 
drives for DB volumes? Or do their DBs reside on spinning drives only? If 
the latter is true I would strongly encourage you to fix that by 
adding respective fast disks - RocksDB tends to work badly when not 
deployed on SSDs...



Thanks,

Igor


On 7/26/2021 1:28 AM, mahnoosh shahidi wrote:

Hi Igor,
Thanks for your response.This problem happens on my osds with hdd 
disks. I set the bluefs_buffered_io to true just for these osds but 
it caused my bucket index disks (which are ssd) to produce slow ops. 
I also tried to set bluefs_buffered_io to true in bucket index osds 
but they filled the entire memory (256G) so I had to set the 
bluefs_buffered_io back to false in all osds. Is that the only way to 
handle the garbage collector problem? Do you have any ideas for the 
bucket index problem?


On Thu, Jul 22, 2021 at 3:37 AM Igor Fedotov > wrote:


    Hi Mahnoosh,

    you might want to set bluefs_buffered_io to true for every OSD.

    It looks it's false by default in v15.2.12


    Thanks,

    Igor

    On 7/18/2021 11:19 PM, mahnoosh shahidi wrote:
    > We have a ceph cluster with 408 osds, 3 mons and 3 rgws. We
    updated our
    > cluster from nautilus 14.2.14 to octopus 15.2.12 a few days ago.
    After
    > upgrading, the garbage collector process which is run after the
    lifecycle
    > process, causes slow ops and makes some osds to be restarted. In
    each
    > process the garbage collector deletes about 1 million objects.
    Below are
    > the one of the osd's logs before it restarts.
    >
    > ```
    > 2021-07-18T00:44:38.807+0430 7fd1cda76700  1 osd.60 1092400
    is_healthy
    > false -- internal heartbeat failed
    > 2021-07-18T00:44:38.807+0430 7fd1cda76700  1 osd.60 1092400 not
    > healthy; waiting to boot
    > 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 heartbeat_map
    is_healthy
    > 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
    > 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 osd.60 1092400
    is_healthy
    > false -- internal heartbeat failed
    > 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 osd.60 1092400 not
    > healthy; waiting to boot
    > 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 heartbeat_map
    is_healthy
    > 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
    > 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 osd.60 1092400
    is_healthy
    > false -- internal heartbeat failed
    > 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 osd.60 1092400 not
    > healthy; waiting to boot
    > 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 heartbeat_map
    is_healthy
    > 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
    > 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 osd.60 1092400
    is_healthy
    > false -- internal heartbeat failed
    > 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 osd.60 1092400 not
    > healthy; waiting to boot
    > 2021-07-18T00:44:42.811+0430 7fd1cda76700  1 heartbeat_map
    is_healthy
    > 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
    > 2021-07-18T00:44:42.811+0430 7fd1cda76700  1 osd.60 1092400
    is_healthy
   

[ceph-users] Deployment Method of Octopus and Pacific

2021-07-26 Thread Xiaolong Jiang
Hi Ceph Users,

We are currently deploying Nautilus using ceph-deploy in spinnaker.

However, newer versions of Ceph no longer support ceph-deploy.

Is there anyone having experience deploying Octopus/Pacific using
spinnaker?


-- 
Best regards,
Xiaolong Jiang

Senior Software Engineer at Netflix
Columbia University
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [ceph][cephadm] Cluster recovery after reboot 1 node

2021-07-26 Thread Gargano Andrea
Hi all,
we deployed a three-node Ceph Pacific cluster on Ubuntu 20.04. 
We then tried to restart one node, but when it comes up we see:

root@tst2-ceph01:~# ceph status
  cluster:
id: be115adc-edf0-11eb-8509-c5c80111fd98
health: HEALTH_WARN
6 failed cephadm daemon(s)
1 osds down
1 host (6 osds) down
Degraded data redundancy: 1 pg undersized

  services:
mon: 3 daemons, quorum tst2-ceph01.tstsddc.csi.it,tst2-ceph02,tst2-ceph03 
(age 44s)
mgr: tst2-ceph03.fmrcvf(active, since 65m), standbys: 
tst2-ceph01.tstsddc.csi.it.ydtoyd
osd: 18 osds: 12 up (since 65m), 13 in (since 18m)

  data:
pools:   1 pools, 1 pgs
objects: 0 objects, 0 B
usage:   757 GiB used, 13 TiB / 14 TiB avail
pgs: 1 active+undersized


root@tst2-ceph01:~# ceph osd tree
ID  CLASS  WEIGHTTYPE NAME STATUS  REWEIGHT  PRI-AFF
-1 20.73944  root default
-7  6.91315  host tst2-ceph01
2hdd   1.15219  osd.2   down 0  1.0
5hdd   1.15219  osd.5   down 0  1.0
8hdd   1.15219  osd.8   down 0  1.0
11hdd   1.15219  osd.11  down 0  1.0
14hdd   1.15219  osd.14  down 0  1.0
17hdd   1.15219  osd.17  down   1.0  1.0
-3  6.91315  host tst2-ceph02
0hdd   1.15219  osd.0 up   1.0  1.0
3hdd   1.15219  osd.3 up   1.0  1.0
6hdd   1.15219  osd.6 up   1.0  1.0
9hdd   1.15219  osd.9 up   1.0  1.0
12hdd   1.15219  osd.12up   1.0  1.0
15hdd   1.15219  osd.15up   1.0  1.0
-5  6.91315  host tst2-ceph03
1hdd   1.15219  osd.1 up   1.0  1.0
4hdd   1.15219  osd.4 up   1.0  1.0
7hdd   1.15219  osd.7 up   1.0  1.0
10hdd   1.15219  osd.10up   1.0  1.0
13hdd   1.15219  osd.13up   1.0  1.0
16hdd   1.15219  osd.16up   1.0  1.0

The services on node:

● 
ceph-0c7a175e-ebbc-11eb-8509-c5c80111fd98@alertmanager.tst2-ceph01.service
   loaded failed failed>
● 
ceph-0c7a175e-ebbc-11eb-8509-c5c80111fd98@crash.tst2-ceph01.service
  loaded failed failed>
● 
ceph-0c7a175e-ebbc-11eb-8509-c5c80111fd98@grafana.tst2-ceph01.service
loaded failed failed>
● 
ceph-0c7a175e-ebbc-11eb-8509-c5c80111fd98@mgr.tst2-ceph01.fthmip.service
 loaded failed failed>
● 
ceph-0c7a175e-ebbc-11eb-8509-c5c80111fd98@mon.tst2-ceph01.service
loaded failed failed>
● 
ceph-0c7a175e-ebbc-11eb-8509-c5c80111fd98@node-exporter.tst2-ceph01.service
  loaded failed failed>
● 
ceph-0c7a175e-ebbc-11eb-8509-c5c80111fd98@osd.0.service
  loaded failed failed>
● 
ceph-0c7a175e-ebbc-11eb-8509-c5c80111fd98@osd.1.service
  loaded failed failed>
● 
ceph-0c7a175e-ebbc-11eb-8509-c5c80111fd98@osd.2.service
  loaded failed failed>
● 
ceph-0c7a175e-ebbc-11eb-8509-c5c80111fd98@osd.3.service
  loaded failed failed>
● 
ceph-0c7a175e-ebbc-11eb-8509-c5c80111fd98@osd.4.service
  loaded failed failed>
● 
ceph-0c7a175e-ebbc-11eb-8509-c5c80111fd98@osd.5.service
  loaded failed failed>
● 
ceph-0c7a175e-ebbc-11eb-8509-c5c80111fd98@osd.6.service

[ceph-users] Re: 1/3 mons down! mon do not rejoin

2021-07-26 Thread Ansgar Jazdzewski
Yes, the empty DB told me that at this point I had no other choice
than to recreate the entire mon service.

* remove broken mon
  ceph mon remove $(hostname -s)

* mon preparation done
  rm -rf /var/lib/ceph/mon/ceph-$(hostname -s)
  mkdir /var/lib/ceph/mon/ceph-$(hostname -s)
  ceph auth get mon. -o /tmp/mon-keyfile
  ceph mon getmap -o /tmp/mon-monmap
  ceph-mon -i $(hostname -s) --mkfs --monmap /tmp/mon-monmap --keyring /tmp/mon-keyfile
  chown -R ceph: /var/lib/ceph/mon/ceph-$(hostname -s)

I will wait for a low-traffic time on the cluster to enable the recreated mon.
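
Roughly the plan for that step (a sketch, assuming the usual non-containerized
systemd unit):

  systemctl start ceph-mon@$(hostname -s)
  ceph mon stat
  ceph quorum_status --format json-pretty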

thanks for all the help so far
Ansgar

Am Mo., 26. Juli 2021 um 15:39 Uhr schrieb Dan van der Ster
:
>
> Your log ends with
>
> > 2021-07-25 06:46:52.078 7fe065f24700  1 mon.osd01@0(leader).osd e749666 
> > do_prune osdmap full prune enabled
>
> So mon.osd01 was still the leader at that time.
> When did it leave the cluster?
>
> > I also found that the rocksdb on osd01 is only 1MB in size and 345MB on the 
> > other mons!
>
> It sounds like mon.osd01's db has been re-initialized as empty, e.g.
> maybe the directory was lost somehow between reboots?
>
> -- dan
>
>
> On Mon, Jul 26, 2021 at 1:55 PM Ansgar Jazdzewski
>  wrote:
> >
> > Hi Dan, Hi Folks,
> >
> > this is how things started, I also found that the rocksdb on osd01 is
> > only 1MB in size and 345MB on the other mons!
> >
> > 2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [DBG]
> > : monmap e1: 3 mons at
> > {osd01=[v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0],osd02=[v2:10.152.28.172:3300/0,v1:10.152.28.172:6789/0],osd03=[v2:10.152.28.173:3300/0,v1:10.152.28.173:6789/0]}
> > 2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [DBG]
> > : fsmap cephfs:1 {0=osd01=up:active} 2 up:standby
> > 2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [DBG]
> > : osdmap e749665: 436 total, 436 up, 436 in
> > 2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [DBG]
> > : mgrmap e89: osd03(active, since 13h), standbys: osd01, osd02
> > 2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [INF]
> > : overall HEALTH_OK
> > 2021-07-25 06:46:30.805 7fe065f24700  1 mon.osd01@0(leader).osd
> > e749665 do_prune osdmap full prune enabled
> > 2021-07-25 06:46:30.957 7fe06371f700  0 mon.osd01@0(leader) e1
> > handle_command mon_command({"prefix": "status"} v 0) v1
> > 2021-07-25 06:46:30.957 7fe06371f700  0 log_channel(audit) log [DBG] :
> > from='client.? 10.152.28.171:0/3290370429' entity='client.admin'
> > cmd=[{"prefix": "status"}]: dispatch
> > 2021-07-25 06:46:51.922 7fe065f24700  1 mon.osd01@0(leader).mds e85
> > tick: resetting beacon timeouts due to mon delay (slow election?) of
> > 20.3627s seconds
> > 2021-07-25 06:46:51.922 7fe065f24700 -1 mon.osd01@0(leader) e1
> > get_health_metrics reporting 13 slow ops, oldest is pool_op(delete
> > unmanaged snap pool 3 tid 27666 name  v749664)
> > 2021-07-25 06:46:51.930 7fe06371f700  0 log_channel(cluster) log [INF]
> > : mon.osd01 calling monitor election
> > 2021-07-25 06:46:51.930 7fe06371f700  1
> > mon.osd01@0(electing).elector(173) init, last seen epoch 173,
> > mid-election, bumping
> > 2021-07-25 06:46:51.946 7fe06371f700  1 mon.osd01@0(electing) e1
> > collect_metadata :  no unique device id for : fallback method has no
> > model nor serial'
> > 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> > handle_auth_request failed to assign global_id
> > 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> > handle_auth_request failed to assign global_id
> > 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> > handle_auth_request failed to assign global_id
> > 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> > handle_auth_request failed to assign global_id
> > 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> > handle_auth_request failed to assign global_id
> > 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> > handle_auth_request failed to assign global_id
> > 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> > handle_auth_request failed to assign global_id
> > 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> > handle_auth_request failed to assign global_id
> > 2021-07-25 06:46:51.966 7fe067727700  1 mon.osd01@0(electing) e1
> > handle_auth_request failed to assign global_id
> > 2021-07-25 06:46:51.966 7fe067727700  1 mon.osd01@0(electing) e1
> > handle_auth_request failed to assign global_id
> > 2021-07-25 06:46:51.970 7fe06371f700  0 log_channel(cluster) log [INF]
> > : mon.osd01 is new leader, mons osd01,osd02,osd03 in quorum (ranks
> > 0,1,2)
> > 2021-07-25 06:46:52.002 7fe06371f700  1 mon.osd01@0(leader).osd
> > e749666 e749666: 436 total, 436 up, 436 in
> > 2021-07-25 06:46:52.026 7fe06371f700  0 log_channel(cluster) log [DBG]
> > : monmap e1: 3 mons at
> > 

[ceph-users] Re: unable to map device with krbd on el7 with ceph nautilus

2021-07-26 Thread cek+ceph
Although I appreciate the responses, they have provided zero help solving this 
issue thus far.
It seems like the kernel module doesn't even get to the stage where it reads 
the attributes/features of the device. It doesn't know where to connect and, 
presumably, is confused by the options passed by userspace. 

Obviously, I have already tried "--user", "--name" and so on with similar 
messages in dmesg. I don't feel like downgrading to 10. Any way to make el7 
work with 14 userspace utils?

I have also just realized there was 1 message missing from provided logs, which 
somehow was logged by userspace(?) and not kernel. Here's the full log:

# rbd device map test1/blk1 --user testing-rw

Jul 26 05:33:53 xx key.dns_resolver[9147]: name=testing-rw: No address 
associated with name
Jul 26 05:33:53 xx kernel: libceph: resolve 'name=testing-rw' (ret=-3): failed
Jul 26 05:33:53 xx kernel: libceph: parse_ips bad ip 
'name=testing-rw,key=client.testing-rw'

# rbd info test1/blk1 --user testing-rw
works perfectly

Thanks.


On 7/24/21 12:47 PM, Marc wrote:
> 
> If you have the default kernel you can not use all these features. I think 
> even dmesg shows you something about that when mapping.
> 
> 
>> -Original Message-
>> From: cek+c...@deepunix.net
>> Sent: Friday, 23 July 2021 23:58
>> To: ceph-users@ceph.io
>> Subject: *SPAM* [ceph-users] unable to map device with krbd on
>> el7 with ceph nautilus
>>
>> Hi.
>>
>> I've followed the installation guide and got nautilus 14.2.22 running on

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Did standby dashboards stop redirecting to the active one?

2021-07-26 Thread Harry G. Coin
Somewhere between Nautilus and Pacific, the hosts running standby
managers, which previously would redirect browsers to the currently
active mgr/dashboard, seem to have stopped doing that. Is there a switch
for that somewhere? Or was I just happily using an undocumented feature?

Thanks

Harry Coin


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: unable to map device with krbd on el7 with ceph nautilus

2021-07-26 Thread Dimitri Savineau
Hi,

> As Marc mentioned, you would need to disable unsupported features but
> you are right that the kernel doesn't make it to that point.

I remember disabling unsupported features on el7 nodes (kernel 3.10) with
Nautilus.
But the error on the map command is usually more obvious.

$ rbd feature disable test1/blk1 object-map fast-diff deep-flatten

Could you give this a try ?

Regards,

Dimitri

On Mon, Jul 26, 2021 at 8:56 AM Ilya Dryomov  wrote:

> On Mon, Jul 26, 2021 at 12:39 PM  wrote:
> >
> > Although I appreciate the responses, they have provided zero help
> solving this issue thus far.
> > It seems like the kernel module doesn't even get to the stage where it
> reads the attributes/features of the device. It doesn't know where to
> connect and, presumably, is confused by the options passed by userspace.
>
> Sorry, your "rbd info" output didn't register with me for some reason.
> --id and --user should be equivalent and --name with "client." prefix
> should be fine too so my suggestion was useless.
>
> As Marc mentioned, you would need to disable unsupported features but
> you are right that the kernel doesn't make it to that point.
>
> >
> > Obviously, I have already tried "--user", "--name" and so on with
> similar messages in dmesg. I don't feel like downgrading to 10. Any way to
> make el7 work with 14 userspace utils?
>
> It is expected to work.  Could you please strace "rbd device map"
> with something like "strace -f -e write -s 500 rbd device map ..."
> and attach the output?
>
> Thanks,
>
> Ilya
>
> >
> > I have also just realized there was 1 message missing from provided
> logs, which somehow was logged by userspace(?) and not kernel. Here's the
> full log:
> >
> > # rbd device map test1/blk1 --user testing-rw
> >
> > Jul 26 05:33:53 xx key.dns_resolver[9147]: name=testing-rw: No address
> associated with name
> > Jul 26 05:33:53 xx kernel: libceph: resolve 'name=testing-rw' (ret=-3):
> failed
> > Jul 26 05:33:53 xx kernel: libceph: parse_ips bad ip
> 'name=testing-rw,key=client.testing-rw'
> >
> > # rbd info test1/blk1 --user testing-rw
> > works perfectly
> >
> > Thanks.
> >
> >
> > On 7/24/21 12:47 PM, Marc wrote:
> > >
> > > If you have the default kernel you can not use all these features. I
> think even dmesg shows you something about that when mapping.
> > >
> > >
> > >> -Original Message-
> > >> From: cek+c...@deepunix.net
> > >> Sent: Friday, 23 July 2021 23:58
> > >> To: ceph-users@ceph.io
> > >> Subject: *SPAM* [ceph-users] unable to map device with krbd on
> > >> el7 with ceph nautilus
> > >>
> > >> Hi.
> > >>
> > >> I've followed the installation guide and got nautilus 14.2.22 running
> on
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [ceph][cephadm] Cluster recovery after reboot 1 node

2021-07-26 Thread Ignazio Cassano
We solved our issue - we had a dirty LVM configuration and cleaned it up.
Now it is working fine
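
For anyone hitting the same thing, the checks we used to spot the stale LVM
state were roughly the following (a sketch only - the exact cleanup depends
on what you find, and the commands assume ceph-volume and the LVM tools are
available on the host):

  ceph-volume lvm list                  # what ceph-volume thinks exists
  lvs -o lv_name,vg_name,lv_tags        # logical volumes and their ceph.* tags
  vgs                                   # volume groups left over from old OSDs
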
Ignazio

Il giorno lun 26 lug 2021 alle ore 13:25 Ignazio Cassano <
ignaziocass...@gmail.com> ha scritto:

> Hello, I want to add further information I found  for the issue described
> by Andrea:
> cephadm.log:2021-07-26 13:07:11,281 DEBUG /usr/bin/docker: stderr Error: No
> such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.11
> cephadm.log:2021-07-26 13:07:11,654 DEBUG /usr/bin/docker: stderr Error:
> No such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.17
> cephadm.log:2021-07-26 13:07:12,843 DEBUG /usr/bin/docker: stderr Error:
> No such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.5
> cephadm.log:2021-07-26 13:07:12,962 DEBUG /usr/bin/docker: stderr Error:
> No such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.8
> cephadm.log:2021-07-26 13:07:13,074 DEBUG /usr/bin/docker: stderr Error:
> No such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.2
> cephadm.log:2021-07-26 13:17:22,353 DEBUG /usr/bin/docker: stderr Error:
> No such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.14
> cephadm.log:2021-07-26 13:17:22,475 DEBUG /usr/bin/docker: stderr Error:
> No such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.11
> cephadm.log:2021-07-26 13:17:22,871 DEBUG /usr/bin/docker: stderr Error:
> No such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.17
> cephadm.log:2021-07-26 13:17:24,053 DEBUG /usr/bin/docker: stderr Error:
> No such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.5
> cephadm.log:2021-07-26 13:17:24,157 DEBUG /usr/bin/docker: stderr Error:
> No such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.8
> cephadm.log:2021-07-26 13:17:24,258 DEBUG /usr/bin/docker: stderr Error:
> No such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.2
>
> Any help, please ?
> Ignazio
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1/3 mons down! mon do not rejoin

2021-07-26 Thread Dan van der Ster
Your log ends with

> 2021-07-25 06:46:52.078 7fe065f24700  1 mon.osd01@0(leader).osd e749666 
> do_prune osdmap full prune enabled

So mon.osd01 was still the leader at that time.
When did it leave the cluster?

> I also found that the rocksdb on osd01 is only 1MB in size and 345MB on the 
> other mons!

It sounds like mon.osd01's db has been re-initialized as empty, e.g.
maybe the directory was lost somehow between reboots?
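
(For reference, the store sizes can be compared directly on each mon with
something like the command below - the path assumes a default,
non-containerized layout.)

  du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db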

-- dan


On Mon, Jul 26, 2021 at 1:55 PM Ansgar Jazdzewski
 wrote:
>
> Hi Dan, Hi Folks,
>
> this is how things started, I also found that the rocksdb on osd01 is
> only 1MB in size and 345MB on the other mons!
>
> 2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [DBG]
> : monmap e1: 3 mons at
> {osd01=[v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0],osd02=[v2:10.152.28.172:3300/0,v1:10.152.28.172:6789/0],osd03=[v2:10.152.28.173:3300/0,v1:10.152.28.173:6789/0]}
> 2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [DBG]
> : fsmap cephfs:1 {0=osd01=up:active} 2 up:standby
> 2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [DBG]
> : osdmap e749665: 436 total, 436 up, 436 in
> 2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [DBG]
> : mgrmap e89: osd03(active, since 13h), standbys: osd01, osd02
> 2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [INF]
> : overall HEALTH_OK
> 2021-07-25 06:46:30.805 7fe065f24700  1 mon.osd01@0(leader).osd
> e749665 do_prune osdmap full prune enabled
> 2021-07-25 06:46:30.957 7fe06371f700  0 mon.osd01@0(leader) e1
> handle_command mon_command({"prefix": "status"} v 0) v1
> 2021-07-25 06:46:30.957 7fe06371f700  0 log_channel(audit) log [DBG] :
> from='client.? 10.152.28.171:0/3290370429' entity='client.admin'
> cmd=[{"prefix": "status"}]: dispatch
> 2021-07-25 06:46:51.922 7fe065f24700  1 mon.osd01@0(leader).mds e85
> tick: resetting beacon timeouts due to mon delay (slow election?) of
> 20.3627s seconds
> 2021-07-25 06:46:51.922 7fe065f24700 -1 mon.osd01@0(leader) e1
> get_health_metrics reporting 13 slow ops, oldest is pool_op(delete
> unmanaged snap pool 3 tid 27666 name  v749664)
> 2021-07-25 06:46:51.930 7fe06371f700  0 log_channel(cluster) log [INF]
> : mon.osd01 calling monitor election
> 2021-07-25 06:46:51.930 7fe06371f700  1
> mon.osd01@0(electing).elector(173) init, last seen epoch 173,
> mid-election, bumping
> 2021-07-25 06:46:51.946 7fe06371f700  1 mon.osd01@0(electing) e1
> collect_metadata :  no unique device id for : fallback method has no
> model nor serial'
> 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> handle_auth_request failed to assign global_id
> 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> handle_auth_request failed to assign global_id
> 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> handle_auth_request failed to assign global_id
> 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> handle_auth_request failed to assign global_id
> 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> handle_auth_request failed to assign global_id
> 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> handle_auth_request failed to assign global_id
> 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> handle_auth_request failed to assign global_id
> 2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
> handle_auth_request failed to assign global_id
> 2021-07-25 06:46:51.966 7fe067727700  1 mon.osd01@0(electing) e1
> handle_auth_request failed to assign global_id
> 2021-07-25 06:46:51.966 7fe067727700  1 mon.osd01@0(electing) e1
> handle_auth_request failed to assign global_id
> 2021-07-25 06:46:51.970 7fe06371f700  0 log_channel(cluster) log [INF]
> : mon.osd01 is new leader, mons osd01,osd02,osd03 in quorum (ranks
> 0,1,2)
> 2021-07-25 06:46:52.002 7fe06371f700  1 mon.osd01@0(leader).osd
> e749666 e749666: 436 total, 436 up, 436 in
> 2021-07-25 06:46:52.026 7fe06371f700  0 log_channel(cluster) log [DBG]
> : monmap e1: 3 mons at
> {osd01=[v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0],osd02=[v2:10.152.28.172:3300/0,v1:10.152.28.172:6789/0],osd03=[v2:10.152.28.173:3300/0,v1:10.152.28.173:6789/0]}
> 2021-07-25 06:46:52.026 7fe06371f700  0 log_channel(cluster) log [DBG]
> : fsmap cephfs:1 {0=osd01=up:active} 2 up:standby
> 2021-07-25 06:46:52.026 7fe06371f700  0 log_channel(cluster) log [DBG]
> : osdmap e749666: 436 total, 436 up, 436 in
> 2021-07-25 06:46:52.026 7fe06371f700  0 log_channel(cluster) log [DBG]
> : mgrmap e89: osd03(active, since 13h), standbys: osd01, osd02
> 2021-07-25 06:46:52.026 7fe06371f700  0 log_channel(cluster) log [INF]
> : Health check cleared: MON_DOWN (was: 1/3 mons down, quorum
> osd02,osd03)
> 2021-07-25 06:46:52.042 7fe061f1c700  0 log_channel(cluster) log [WRN]
> : Health detail: HEALTH_WARN 7 slow ops, oldest one blocked for 36
> sec, daemons [mon.osd02,mon.osd03] have slow ops.
> 2021-07-25 06:46:52.042 

[ceph-users] R: [ceph] [pacific] cephadm cannot create OSD

2021-07-26 Thread Gargano Andrea
Hi Dimitri,
that works for me!
Thank you,

Andrea

Da: Gargano Andrea 
Inviato: venerdì 23 luglio 2021 17:48
A: Dimitri Savineau 
Cc: ceph-users@ceph.io
Oggetto: Re: [ceph-users] [ceph] [pacific] cephadm cannot create OSD

Hi Dimitri,
Thank you, I'll retry and I'll let you know on Monday.

Andrea

Get Outlook for Android

From: Dimitri Savineau mailto:dsavi...@redhat.com>>
Sent: Friday, July 23, 2021 5:35:22 PM
To: Gargano Andrea mailto:andrea.garg...@dgsspa.com>>
Cc: ceph-users@ceph.io 
mailto:ceph-users@ceph.io>>
Subject: Re: [ceph-users] [ceph] [pacific] cephadm cannot create OSD

Hi,

This looks similar to 
https://tracker.ceph.com/issues/46687

Since you want to use hdd devices for bluestore data and ssd devices for 
bluestore db, I would suggest using the rotational [1] filter instead of 
dealing with the size filter.

---
service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
...

Could you give this a try ?

[1] 
https://docs.ceph.com/en/latest/cephadm/osd/#rotational
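
If it helps, the spec can be previewed before it is applied (a sketch - the
osd_spec.yml filename is just an example):

  ceph orch apply -i osd_spec.yml --dry-run   # preview which devices would be consumed
  ceph orch apply -i osd_spec.yml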

Regards,

Dimitri

On Fri, Jul 23, 2021 at 7:12 AM Gargano Andrea 
mailto:andrea.garg...@dgsspa.com>> wrote:
Hi all,
we are trying to install ceph on ubuntu 20.04 but we are not able to create OSDs.
Entering in cephadm shell we can see the following:

root@tst2-ceph01:/# ceph -s
  cluster:
id: 8b937a98-eb86-11eb-8509-c5c80111fd98
health: HEALTH_ERR
Module 'cephadm' has failed: No filters applied
OSD count 0 < osd_pool_default_size 3

  services:
mon: 3 daemons, quorum tst2-ceph01,tst2-ceph03,tst2-ceph02 (age 2h)
mgr: tst2-ceph01.kwyejx(active, since 3h), standbys: tst2-ceph02.qrpuzp
osd: 0 osds: 0 up (since 115m), 0 in (since 105m)

  data:
pools:   0 pools, 0 pgs
objects: 0 objects, 0 B
usage:   0 B used, 0 B / 0 B avail
pgs:


root@tst2-ceph01:/# ceph orch device ls
Hostname     Path      Type  Serial                            Size   Health   Ident  Fault  Available
tst2-ceph01  /dev/sdb  hdd   600508b1001c1960d834c222fb64f2ea  1200G  Unknown  N/A    N/A    Yes
tst2-ceph01  /dev/sdc  hdd   600508b1001c36e812fb5d14997f5f47  1200G  Unknown  N/A    N/A    Yes
tst2-ceph01  /dev/sdd  hdd   600508b1001c01a0297ac2c5e8039063  1200G  Unknown  N/A    N/A    Yes
tst2-ceph01  /dev/sde  hdd   600508b1001cf4520d0f0155d0dd31ad  1200G  Unknown  N/A    N/A    Yes
tst2-ceph01  /dev/sdf  hdd   600508b1001cc911d4f570eba568a8d0  1200G  Unknown  N/A    N/A    Yes
tst2-ceph01  /dev/sdg  hdd   600508b1001c410bd38e6c55807bea25  1200G  Unknown  N/A    N/A    Yes
tst2-ceph01  /dev/sdh  ssd   600508b1001cdb21499020552589eadb   400G  Unknown  N/A    N/A    Yes
tst2-ceph02  /dev/sdb  hdd   600508b1001ce1f33b63f8859aeac9b4  1200G  Unknown  N/A    N/A    Yes
tst2-ceph02  /dev/sdc  hdd   600508b1001c0b4dbfa794d2b38f328e  1200G  Unknown  N/A    N/A    Yes
tst2-ceph02  /dev/sdd  hdd   600508b1001c145b8de4e4e7cc9129d5  1200G  Unknown  N/A    N/A    Yes
tst2-ceph02  /dev/sde  hdd   600508b1001c1d81d0aaacfdfd20f5f1  1200G  Unknown  N/A    N/A    Yes
tst2-ceph02  /dev/sdf  hdd   600508b1001c28d2a2c261449ca1a3cc  1200G  Unknown  N/A    N/A    Yes
tst2-ceph02  /dev/sdg  hdd   600508b1001c1f9a964b1513f70b51b3  1200G  Unknown  N/A    N/A    Yes
tst2-ceph02  /dev/sdh  ssd   600508b1001c8040dd5cf17903940177   400G  Unknown  N/A    N/A    Yes
tst2-ceph03  /dev/sdb  hdd   600508b1001c900ef43d7745db17d5cc  1200G  Unknown  N/A    N/A    Yes
tst2-ceph03  /dev/sdc  hdd   600508b1001cf1b79f7dc2f79ab2c90b  1200G  Unknown  N/A    N/A    Yes
tst2-ceph03  /dev/sdd  hdd   600508b1001c83c09fe03eb17e555f5f  1200G  Unknown  N/A    N/A    Yes
tst2-ceph03  /dev/sde  hdd   600508b1001c9c4c5db12fabf54a4ff3  1200G  Unknown  N/A    N/A    Yes
tst2-ceph03  /dev/sdf  hdd   600508b1001cdaa7dc09d751262e2cc9  1200G  Unknown  N/A    N/A    Yes
tst2-ceph03  /dev/sdg  hdd   600508b1001c8f435a08b7eae4a1323e  1200G  Unknown  N/A    N/A    Yes
tst2-ceph03  /dev/sdh  ssd   600508b1001c5e24f822d6790a5df65b   400G  Unknown  N/A    N/A    Yes


we wrote the following spec file:


[ceph-users] Re: unable to map device with krbd on el7 with ceph nautilus

2021-07-26 Thread Ilya Dryomov
On Mon, Jul 26, 2021 at 12:39 PM  wrote:
>
> Although I appreciate the responses, they have provided zero help solving 
> this issue thus far.
> It seems like the kernel module doesn't even get to the stage where it reads 
> the attributes/features of the device. It doesn't know where to connect and, 
> presumably, is confused by the options passed by userspace.

Sorry, your "rbd info" output didn't register with me for some reason.
--id and --user should be equivalent and --name with "client." prefix
should be fine too so my suggestion was useless.

As Marc mentioned, you would need to disable unsupported features but
you are right that the kernel doesn't make it to that point.

>
> Obviously, I have already tried "--user", "--name" and so on with similar 
> messages in dmesg. I don't feel like downgrading to 10. Any way to make el7 
> work with 14 userspace utils?

It is expected to work.  Could you please strace "rbd device map"
with something like "strace -f -e write -s 500 rbd device map ..."
and attach the output?

Thanks,

Ilya

>
> I have also just realized there was 1 message missing from provided logs, 
> which somehow was logged by userspace(?) and not kernel. Here's the full log:
>
> # rbd device map test1/blk1 --user testing-rw
>
> Jul 26 05:33:53 xx key.dns_resolver[9147]: name=testing-rw: No address 
> associated with name
> Jul 26 05:33:53 xx kernel: libceph: resolve 'name=testing-rw' (ret=-3): failed
> Jul 26 05:33:53 xx kernel: libceph: parse_ips bad ip 
> 'name=testing-rw,key=client.testing-rw'
>
> # rbd info test1/blk1 --user testing-rw
> works perfectly
>
> Thanks.
>
>
> On 7/24/21 12:47 PM, Marc wrote:
> >
> > If you have the default kernel you can not use all these features. I think 
> > even dmesg shows you something about that when mapping.
> >
> >
> >> -Original Message-
> >> From: cek+c...@deepunix.net
> >> Sent: Friday, 23 July 2021 23:58
> >> To: ceph-users@ceph.io
> >> Subject: *SPAM* [ceph-users] unable to map device with krbd on
> >> el7 with ceph nautilus
> >>
> >> Hi.
> >>
> >> I've followed the installation guide and got nautilus 14.2.22 running on
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-users Digest, Vol 102, Issue 52

2021-07-26 Thread Eugen Block

Hi,

I replied a couple of days ago  
(https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/IPLTK777USH2TNXM4DU4U2F2YCHWX4Z4/).



Zitat von renjianxinlover :


Does anyone have any ideas?


| |
renjianxinlover
|
|
renjianxinlo...@163.com
|
Signature customized by NetEase Mail Master
On 7/22/2021 10:12,renjianxinlover wrote:
Sorry, a point was left out yesterday. Currently, the .index pool  
with those three OSDs (.18, .19, .29) is not in use and has almost no  
data.



| |
renjianxinlover
|
|
renjianxinlo...@163.com
|
Signature customized by NetEase Mail Master
On 7/21/2021 14:46, wrote:
Send ceph-users mailing list submissions to
ceph-users@ceph.io

To subscribe or unsubscribe via email, send a message with subject or
body 'help' to
ceph-users-requ...@ceph.io

You can reach the person managing the list at
ceph-users-ow...@ceph.io

When replying, please edit your Subject line so it is more specific
than "Re: Contents of ceph-users digest..."

Today's Topics:

1. Re: inbalancing data distribution for osds with custom device class
(renjianxinlover)
2. Re: inbalancing data distribution for osds with custom device class
(Eugen Block)


--

Message: 1
Date: Wed, 21 Jul 2021 11:28:27 +0800 (GMT+08:00)
From: renjianxinlover 
Subject: [ceph-users] Re: inbalancing data distribution for osds with
custom device class
To: "ceph-users@ceph.io" 
Message-ID:
<50973ab3.6cc8.17ac71b83be.coremail.renjianxinlo...@163.com>
Content-Type: text/plain; charset=UTF-8



Ceph: ceph version 12.2.12  
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)
OS: Linux *** 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3 (2019-09-02)  
x86_64 GNU/Linux

| |
renjianxinlover
|
|
renjianxinlo...@163.com
|
Signature customized by NetEase Mail Master
On 7/21/2021 11:16,renjianxinlover wrote:
ceph cluster osds' disk occupation information as follows,


ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR   PGS
2   hdd 7.27689  1.0 7.28TiB  145GiB 7.14TiB  1.94  0.84  85
3   hdd 7.27689  1.0 7.28TiB  182GiB 7.10TiB  2.44  1.05  94
4   hdd 7.27689  1.0 7.28TiB  165GiB 7.12TiB  2.22  0.96  98
5   hdd 7.27689  1.0 7.28TiB  138GiB 7.14TiB  1.85  0.80  88
6   hdd 7.27689  1.0 7.28TiB  122GiB 7.16TiB  1.64  0.71  90
7   hdd 7.27689  1.0 7.28TiB  169GiB 7.11TiB  2.26  0.98  96
8   hdd 7.27689  1.0 7.28TiB  161GiB 7.12TiB  2.16  0.93  91
9   hdd 7.27689  1.0 7.28TiB  115GiB 7.16TiB  1.54  0.67  89
18 rgw-index-ssd 0.72769  1.0  745GiB  531GiB  215GiB 71.20 30.74  46
0   ssd 1.74660  1.0 1.75TiB 1.79GiB 1.74TiB  0.10  0.04  70
10   hdd 7.27689  1.0 7.28TiB  138GiB 7.14TiB  1.86  0.80  77
11   hdd 7.27689  1.0 7.28TiB  165GiB 7.12TiB  2.21  0.95  90
12   hdd 7.27689  1.0 7.28TiB  101GiB 7.18TiB  1.35  0.58  59
13   hdd 7.27689  1.0 7.28TiB  150GiB 7.13TiB  2.02  0.87  78
14   hdd 7.27689  1.0 7.28TiB  166GiB 7.11TiB  2.23  0.96  97
15   hdd 7.27689  1.0 7.28TiB  184GiB 7.10TiB  2.46  1.06 111
16   hdd 7.27689  1.0 7.28TiB  131GiB 7.15TiB  1.76  0.76  93
17   hdd 7.27689  1.0 7.28TiB  115GiB 7.16TiB  1.54  0.67  79
19 rgw-index-ssd 0.72769  1.0  745GiB  523GiB  223GiB 70.12 30.28  43
1   ssd 1.74660  1.0 1.75TiB 1.76GiB 1.74TiB  0.10  0.04  65
20   hdd 7.27689  1.0 7.28TiB  124GiB 7.16TiB  1.67  0.72  85
21   hdd 7.27689  1.0 7.28TiB  122GiB 7.16TiB  1.64  0.71  82
22   hdd 7.27689  1.0 7.28TiB  144GiB 7.14TiB  1.93  0.83  90
23   hdd 7.27689  1.0 7.28TiB  176GiB 7.11TiB  2.36  1.02  96
24   hdd 7.27689  1.0 7.28TiB  178GiB 7.10TiB  2.38  1.03  96
25   hdd 7.27689  1.0 7.28TiB  171GiB 7.11TiB  2.29  0.99  94
26   hdd 7.27689  1.0 7.28TiB  157GiB 7.12TiB  2.10  0.91 100
27   hdd 7.27689  1.0 7.28TiB  160GiB 7.12TiB  2.15  0.93  93
29 rgw-index-ssd 0.72769  1.0  745GiB 1.03GiB  744GiB  0.14  0.06  41
28   ssd 1.74660  1.0 1.75TiB 2.20GiB 1.74TiB  0.12  0.05  67
30   hdd 7.27689  1.0 7.28TiB  114GiB 7.17TiB  1.53  0.66  78
31   hdd 7.27689  1.0 7.28TiB  186GiB 7.10TiB  2.49  1.08  97
32   hdd 7.27689  1.0 7.28TiB  143GiB 7.14TiB  1.92  0.83  77
33   hdd 7.27689  1.0 7.28TiB  169GiB 7.11TiB  2.26  0.98  95
34   hdd 7.27689  1.0 7.28TiB  153GiB 7.13TiB  2.05  0.89  84
35   hdd 7.27689  1.0 7.28TiB  101GiB 7.18TiB  1.36  0.59  77
36   hdd 7.27689  1.0 7.28TiB  111GiB 7.17TiB  1.49  0.65  79
37   hdd 7.27689  1.0 7.28TiB  106GiB 7.17TiB  1.43  0.62  74
38   hdd 7.27689  1.0 7.28TiB  151GiB 7.13TiB  2.03  0.88  88
TOTAL  248TiB 5.73TiB  242TiB  2.32
MIN/MAX VAR: 0.04/30.74  STDDEV: 15.50


where, rgw-index-ssd is a custom device class used for rgw bucket  
index by crush rule below,



# begin crush map
tunable choose_local_tries 0
tunable 

[ceph-users] Re: [EXTERNAL] Re: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-07-26 Thread Igor Fedotov

Dave,

please see inline

On 7/26/2021 1:57 PM, Dave Piper wrote:

Hi Igor,


So to get more verbose but less log one can set both debug-bluestore and 
debug-bluefs to 1/20. ...
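
(For reference, at runtime that would be something like the following - a
sketch only, run on the host carrying the OSD; osd.3 is just an example id.)

  ceph daemon osd.3 config set debug_bluestore 1/20
  ceph daemon osd.3 config set debug_bluefs 1/20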

More verbose logging attached. I've trimmed the file to a single restart 
attempt to keep the filesize down; let me know if there's not enough here.
Jul 26 10:25:07 condor_sc0 docker[19100]: -9628> 
2021-07-26T10:25:05.512+ 7f9b3ed48f40 20 bluefs _read got 32768
Jul 26 10:25:07 condor_sc0 docker[19100]: -9627> 
2021-07-26T10:25:05.512+ 7f9b3ed48f40 10 bluefs _read h 
0x563e8bd3ff80 0xb2d~8000 from file(ino 316842 size 0xe6a476e mtime 
2021-07-14T15:54:21.751044+ allocated e6b extents 

[ceph-users] Re: ceph-users Digest, Vol 102, Issue 52

2021-07-26 Thread renjianxinlover
Does anyone have any ideas?


| |
renjianxinlover
|
|
renjianxinlo...@163.com
|
Signature customized by NetEase Mail Master
On 7/22/2021 10:12,renjianxinlover wrote:
Sorry, a point was left out yesterday. Currently, the .index pool with those 
three OSDs (.18, .19, .29) is not in use and has almost no data.


| |
renjianxinlover
|
|
renjianxinlo...@163.com
|
Signature customized by NetEase Mail Master
On 7/21/2021 14:46, wrote:
Send ceph-users mailing list submissions to
ceph-users@ceph.io

To subscribe or unsubscribe via email, send a message with subject or
body 'help' to
ceph-users-requ...@ceph.io

You can reach the person managing the list at
ceph-users-ow...@ceph.io

When replying, please edit your Subject line so it is more specific
than "Re: Contents of ceph-users digest..."

Today's Topics:

1. Re: inbalancing data distribution for osds with custom device class
(renjianxinlover)
2. Re: inbalancing data distribution for osds with custom device class
(Eugen Block)


--

Message: 1
Date: Wed, 21 Jul 2021 11:28:27 +0800 (GMT+08:00)
From: renjianxinlover 
Subject: [ceph-users] Re: inbalancing data distribution for osds with
custom device class
To: "ceph-users@ceph.io" 
Message-ID:
<50973ab3.6cc8.17ac71b83be.coremail.renjianxinlo...@163.com>
Content-Type: text/plain; charset=UTF-8



Ceph: ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous 
(stable)
OS: Linux *** 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3 (2019-09-02) x86_64 
GNU/Linux
| |
renjianxinlover
|
|
renjianxinlo...@163.com
|
Signature customized by NetEase Mail Master
On 7/21/2021 11:16,renjianxinlover wrote:
ceph cluster osds' disk occupation information as follows,


ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR   PGS
2   hdd 7.27689  1.0 7.28TiB  145GiB 7.14TiB  1.94  0.84  85
3   hdd 7.27689  1.0 7.28TiB  182GiB 7.10TiB  2.44  1.05  94
4   hdd 7.27689  1.0 7.28TiB  165GiB 7.12TiB  2.22  0.96  98
5   hdd 7.27689  1.0 7.28TiB  138GiB 7.14TiB  1.85  0.80  88
6   hdd 7.27689  1.0 7.28TiB  122GiB 7.16TiB  1.64  0.71  90
7   hdd 7.27689  1.0 7.28TiB  169GiB 7.11TiB  2.26  0.98  96
8   hdd 7.27689  1.0 7.28TiB  161GiB 7.12TiB  2.16  0.93  91
9   hdd 7.27689  1.0 7.28TiB  115GiB 7.16TiB  1.54  0.67  89
18 rgw-index-ssd 0.72769  1.0  745GiB  531GiB  215GiB 71.20 30.74  46
0   ssd 1.74660  1.0 1.75TiB 1.79GiB 1.74TiB  0.10  0.04  70
10   hdd 7.27689  1.0 7.28TiB  138GiB 7.14TiB  1.86  0.80  77
11   hdd 7.27689  1.0 7.28TiB  165GiB 7.12TiB  2.21  0.95  90
12   hdd 7.27689  1.0 7.28TiB  101GiB 7.18TiB  1.35  0.58  59
13   hdd 7.27689  1.0 7.28TiB  150GiB 7.13TiB  2.02  0.87  78
14   hdd 7.27689  1.0 7.28TiB  166GiB 7.11TiB  2.23  0.96  97
15   hdd 7.27689  1.0 7.28TiB  184GiB 7.10TiB  2.46  1.06 111
16   hdd 7.27689  1.0 7.28TiB  131GiB 7.15TiB  1.76  0.76  93
17   hdd 7.27689  1.0 7.28TiB  115GiB 7.16TiB  1.54  0.67  79
19 rgw-index-ssd 0.72769  1.0  745GiB  523GiB  223GiB 70.12 30.28  43
1   ssd 1.74660  1.0 1.75TiB 1.76GiB 1.74TiB  0.10  0.04  65
20   hdd 7.27689  1.0 7.28TiB  124GiB 7.16TiB  1.67  0.72  85
21   hdd 7.27689  1.0 7.28TiB  122GiB 7.16TiB  1.64  0.71  82
22   hdd 7.27689  1.0 7.28TiB  144GiB 7.14TiB  1.93  0.83  90
23   hdd 7.27689  1.0 7.28TiB  176GiB 7.11TiB  2.36  1.02  96
24   hdd 7.27689  1.0 7.28TiB  178GiB 7.10TiB  2.38  1.03  96
25   hdd 7.27689  1.0 7.28TiB  171GiB 7.11TiB  2.29  0.99  94
26   hdd 7.27689  1.0 7.28TiB  157GiB 7.12TiB  2.10  0.91 100
27   hdd 7.27689  1.0 7.28TiB  160GiB 7.12TiB  2.15  0.93  93
29 rgw-index-ssd 0.72769  1.0  745GiB 1.03GiB  744GiB  0.14  0.06  41
28   ssd 1.74660  1.0 1.75TiB 2.20GiB 1.74TiB  0.12  0.05  67
30   hdd 7.27689  1.0 7.28TiB  114GiB 7.17TiB  1.53  0.66  78
31   hdd 7.27689  1.0 7.28TiB  186GiB 7.10TiB  2.49  1.08  97
32   hdd 7.27689  1.0 7.28TiB  143GiB 7.14TiB  1.92  0.83  77
33   hdd 7.27689  1.0 7.28TiB  169GiB 7.11TiB  2.26  0.98  95
34   hdd 7.27689  1.0 7.28TiB  153GiB 7.13TiB  2.05  0.89  84
35   hdd 7.27689  1.0 7.28TiB  101GiB 7.18TiB  1.36  0.59  77
36   hdd 7.27689  1.0 7.28TiB  111GiB 7.17TiB  1.49  0.65  79
37   hdd 7.27689  1.0 7.28TiB  106GiB 7.17TiB  1.43  0.62  74
38   hdd 7.27689  1.0 7.28TiB  151GiB 7.13TiB  2.03  0.88  88
TOTAL  248TiB 5.73TiB  242TiB  2.32
MIN/MAX VAR: 0.04/30.74  STDDEV: 15.50


where, rgw-index-ssd is a custom device class used for rgw bucket index by 
crush rule below,


# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1

[ceph-users] Re: RGW: LC not deleting expired files

2021-07-26 Thread Paul JURCO
Hi Vidushi,
aws s3api list-object-versions shows the same files as s3cmd, so I would
say versioning is not enabled.
aws s3api get-bucket-versioning result is empty.
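
For reference, this is the kind of call meant (bucket name is the one above,
the endpoint is only an example):

  aws s3api get-bucket-versioning --bucket feeds-bucket-dev-dc418787 \
      --endpoint-url https://rgw.example.com
  # empty output here; a versioned bucket would return {"Status": "Enabled"} or "Suspended"
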
Is there any other method to check if versioning is enabled?
Thank you!
Paul

On Mon, Jul 26, 2021 at 2:42 PM Vidushi Mishra  wrote:

> Hi Paul,
>
> Are these non-current versioned objects displayed in the bucket stats?
> Also, the LC rule applied to the bucket can only delete/expire objects for
> a normal bucket.
> In the case of a versioned bucket, the LC rule applied will expire the
> current version  [create a delete-marker for every object and move the
> object version from current to non-current, thereby reflecting the same
> number of objects in bucket stats output ].
>
> Vidushi
>
> On Mon, Jul 26, 2021 at 4:55 PM Paul JURCO  wrote:
>
>> Hi!
>> I need some help understanding LC processing.
>> On latest versions of octopus installed (tested with 15.2.13 and 15.2.8)
>> we
>> have at least one bucket which is not having the files removed when
>> expiring.
>> The size of the bucket reported with radosgw-admin compared with the one
>> obtained with s3cmd is different, logs below:
>> ~3 TB in bucket stats and 11GB from s3cmd.
>> We tried to run manually several times the LC (lc process) but no success,
>> even with bucket check (including with --fix --check-objects) didn't help.
>> Configs changed are below, we have a few TB size buckets with millions of
>> objects, 6h default processing LC time was always not enough:
>> rgw_lc_debug_interval = 28800
>> rgw_lifecycle_work_time = 00:00-23:59
>> rgw_lc_max_worker = 5
>> rgw_lc_max_wp_worker = 9
>> rgw_enable_lc_threads = true
>>
>> Status of LC is always complete:
>> {
>> "bucket":
>>
>> ":feeds-bucket-dev-dc418787:3ccb869f-b0f4-4fb9-a8d7-ecf5f5e18f33.37270170.140",
>> "started": "Mon, 26 Jul 2021 07:10:14 GMT",
>> "status": "COMPLETE"
>> },
>> It looks like versioning is not supported yet, and with aws-cli I did not
>> get any response for 's3api get-bucket-versioning'.
>> So, what to do?
>>
>> $ s3cmd -c .s3cfg-feeds-bucket-dev-dc418787 du
>> s3://feeds-bucket-dev-dc418787
>> *12725854360 *192 objects s3://feeds-bucket-dev-dc418787/
>> (192 files, 11GB)
>>
>> Output of 'radosgw-admin lc get'  and 'bucket stats' is attached.
>> Thank you for any suggestion!
>>
>> Also, we found that the LC is removing files earlier than set for another
>> bucket, 6h after the file was added instead of 31 days.
>>
>> Paul
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1/3 mons down! mon do not rejoin

2021-07-26 Thread Ansgar Jazdzewski
Hi Dan, Hi Folks,

this is how things started, I also found that the rocksdb on osd01 is
only 1MB in size and 345MB on the other mons!

2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [DBG]
: monmap e1: 3 mons at
{osd01=[v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0],osd02=[v2:10.152.28.172:3300/0,v1:10.152.28.172:6789/0],osd03=[v2:10.152.28.173:3300/0,v1:10.152.28.173:6789/0]}
2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [DBG]
: fsmap cephfs:1 {0=osd01=up:active} 2 up:standby
2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [DBG]
: osdmap e749665: 436 total, 436 up, 436 in
2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [DBG]
: mgrmap e89: osd03(active, since 13h), standbys: osd01, osd02
2021-07-25 06:46:30.029 7fe061f1c700  0 log_channel(cluster) log [INF]
: overall HEALTH_OK
2021-07-25 06:46:30.805 7fe065f24700  1 mon.osd01@0(leader).osd
e749665 do_prune osdmap full prune enabled
2021-07-25 06:46:30.957 7fe06371f700  0 mon.osd01@0(leader) e1
handle_command mon_command({"prefix": "status"} v 0) v1
2021-07-25 06:46:30.957 7fe06371f700  0 log_channel(audit) log [DBG] :
from='client.? 10.152.28.171:0/3290370429' entity='client.admin'
cmd=[{"prefix": "status"}]: dispatch
2021-07-25 06:46:51.922 7fe065f24700  1 mon.osd01@0(leader).mds e85
tick: resetting beacon timeouts due to mon delay (slow election?) of
20.3627s seconds
2021-07-25 06:46:51.922 7fe065f24700 -1 mon.osd01@0(leader) e1
get_health_metrics reporting 13 slow ops, oldest is pool_op(delete
unmanaged snap pool 3 tid 27666 name  v749664)
2021-07-25 06:46:51.930 7fe06371f700  0 log_channel(cluster) log [INF]
: mon.osd01 calling monitor election
2021-07-25 06:46:51.930 7fe06371f700  1
mon.osd01@0(electing).elector(173) init, last seen epoch 173,
mid-election, bumping
2021-07-25 06:46:51.946 7fe06371f700  1 mon.osd01@0(electing) e1
collect_metadata :  no unique device id for : fallback method has no
model nor serial'
2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
handle_auth_request failed to assign global_id
2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
handle_auth_request failed to assign global_id
2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
handle_auth_request failed to assign global_id
2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
handle_auth_request failed to assign global_id
2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
handle_auth_request failed to assign global_id
2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
handle_auth_request failed to assign global_id
2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
handle_auth_request failed to assign global_id
2021-07-25 06:46:51.962 7fe067727700  1 mon.osd01@0(electing) e1
handle_auth_request failed to assign global_id
2021-07-25 06:46:51.966 7fe067727700  1 mon.osd01@0(electing) e1
handle_auth_request failed to assign global_id
2021-07-25 06:46:51.966 7fe067727700  1 mon.osd01@0(electing) e1
handle_auth_request failed to assign global_id
2021-07-25 06:46:51.970 7fe06371f700  0 log_channel(cluster) log [INF]
: mon.osd01 is new leader, mons osd01,osd02,osd03 in quorum (ranks
0,1,2)
2021-07-25 06:46:52.002 7fe06371f700  1 mon.osd01@0(leader).osd
e749666 e749666: 436 total, 436 up, 436 in
2021-07-25 06:46:52.026 7fe06371f700  0 log_channel(cluster) log [DBG]
: monmap e1: 3 mons at
{osd01=[v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0],osd02=[v2:10.152.28.172:3300/0,v1:10.152.28.172:6789/0],osd03=[v2:10.152.28.173:3300/0,v1:10.152.28.173:6789/0]}
2021-07-25 06:46:52.026 7fe06371f700  0 log_channel(cluster) log [DBG]
: fsmap cephfs:1 {0=osd01=up:active} 2 up:standby
2021-07-25 06:46:52.026 7fe06371f700  0 log_channel(cluster) log [DBG]
: osdmap e749666: 436 total, 436 up, 436 in
2021-07-25 06:46:52.026 7fe06371f700  0 log_channel(cluster) log [DBG]
: mgrmap e89: osd03(active, since 13h), standbys: osd01, osd02
2021-07-25 06:46:52.026 7fe06371f700  0 log_channel(cluster) log [INF]
: Health check cleared: MON_DOWN (was: 1/3 mons down, quorum
osd02,osd03)
2021-07-25 06:46:52.042 7fe061f1c700  0 log_channel(cluster) log [WRN]
: Health detail: HEALTH_WARN 7 slow ops, oldest one blocked for 36
sec, daemons [mon.osd02,mon.osd03] have slow ops.
2021-07-25 06:46:52.042 7fe061f1c700  0 log_channel(cluster) log [WRN]
: SLOW_OPS 7 slow ops, oldest one blocked for 36 sec, daemons
[mon.osd02,mon.osd03] have slow ops.
2021-07-25 06:46:52.078 7fe065f24700  1 mon.osd01@0(leader).osd
e749666 do_prune osdmap full prune enabled

Am Mo., 26. Juli 2021 um 09:45 Uhr schrieb Dan van der Ster
:
>
> Hi,
>
> Do you have ceph-mon logs from when mon.osd01 first failed before the
> on-call team rebooted it? They might give a clue what happened to
> start this problem, which maybe is still happening now.
>
> This looks similar but it was eventually found to be a network issue:
> https://tracker.ceph.com/issues/48033
>
> -- Dan
>
> On Sun, 

[ceph-users] Re: octopus garbage collector makes slow ops

2021-07-26 Thread Igor Fedotov

Hi Mahnoosh!

Unfortunately I'm not an expert in RGW hence nothing to recommend from 
this side.


Apparently your issues are caused by bulk data removal - it appears that 
RocksDB can hardly sustain such things and its performance degrades. 
We've seen that plenty of times before.



So far there are two known workarounds - manual DB compaction using 
ceph-kvstore-tool and setting bluefs_buffered_io to true. The latter makes 
sense for some Ceph releases which got that parameter set to false by 
default, v15.2.12 is one of them. And indeed that setting might cause high 
RAM usage in some cases - you might want to look for relevant recent PRs at 
github or ask Mark Nelson from RH for more details.



On 7/26/2021 1:28 AM, mahnoosh shahidi wrote:

Hi Igor,
Thanks for your response.This problem happens on my osds with hdd 
disks. I set the bluefs_buffered_io to true just for these osds but it 
caused my bucket index disks (which are ssd) to produce slow ops. I 
also tried to set bluefs_buffered_io to true in bucket index osds but 
they filled the entire memory (256G) so I had to set the 
bluefs_buffered_io back to false in all osds. Is that the only way to 
handle the garbage collector problem? Do you have any ideas for the 
bucket index problem?


On Thu, Jul 22, 2021 at 3:37 AM Igor Fedotov > wrote:


Hi Mahnoosh,

you might want to set bluefs_buffered_io to true for every OSD.

It looks it's false by default in v15.2.12


Thanks,

Igor

On 7/18/2021 11:19 PM, mahnoosh shahidi wrote:
> We have a ceph cluster with 408 osds, 3 mons and 3 rgws. We
updated our
> cluster from nautilus 14.2.14 to octopus 15.2.12 a few days ago.
After
> upgrading, the garbage collector process which is run after the
lifecycle
> process, causes slow ops and makes some osds to be restarted. In
each
> process the garbage collector deletes about 1 million objects.
Below are
> the one of the osd's logs before it restarts.
>
> ```
> 2021-07-18T00:44:38.807+0430 7fd1cda76700  1 osd.60 1092400
is_healthy
> false -- internal heartbeat failed
> 2021-07-18T00:44:38.807+0430 7fd1cda76700  1 osd.60 1092400 not
> healthy; waiting to boot
> 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 heartbeat_map
is_healthy
> 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 osd.60 1092400
is_healthy
> false -- internal heartbeat failed
> 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 osd.60 1092400 not
> healthy; waiting to boot
> 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 heartbeat_map
is_healthy
> 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 osd.60 1092400
is_healthy
> false -- internal heartbeat failed
> 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 osd.60 1092400 not
> healthy; waiting to boot
> 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 heartbeat_map
is_healthy
> 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 osd.60 1092400
is_healthy
> false -- internal heartbeat failed
> 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 osd.60 1092400 not
> healthy; waiting to boot
> 2021-07-18T00:44:42.811+0430 7fd1cda76700  1 heartbeat_map
is_healthy
> 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> 2021-07-18T00:44:42.811+0430 7fd1cda76700  1 osd.60 1092400
is_healthy
> false -- internal heartbeat failed
>
> ```
> what is the suitable configuration for gc in such a heavy delete
process so
> it doesn't make slow ops? We had the same delete load in
nautilus but we
> didn't have any problem with that.
> ___
> ceph-users mailing list -- ceph-users@ceph.io

> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io

To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW: LC not deleting expired files

2021-07-26 Thread Vidushi Mishra
Hi Paul,

Are these non-current versioned objects displayed in the bucket stats?
Also, the LC rule applied to the bucket can only delete/expire objects for
a normal bucket.
In the case of a versioned bucket, the LC rule applied will expire the
current version  [create a delete-marker for every object and move the
object version from current to non-current, thereby reflecting the same
number of objects in bucket stats output ].
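
(For illustration only - a rule that also expires non-current versions would
look roughly like the sketch below; the bucket name and day counts are
made-up examples, applied through the standard S3 API call.)

  aws s3api put-bucket-lifecycle-configuration --bucket example-bucket \
    --lifecycle-configuration '{
      "Rules": [{
        "ID": "cleanup",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Expiration": {"Days": 2},
        "NoncurrentVersionExpiration": {"NoncurrentDays": 2}
      }]
    }'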

Vidushi

On Mon, Jul 26, 2021 at 4:55 PM Paul JURCO  wrote:

> Hi!
> I need some help understanding LC processing.
> On latest versions of octopus installed (tested with 15.2.13 and 15.2.8) we
> have at least one bucket which is not having the files removed when
> expiring.
> The size of the bucket reported with radosgw-admin compared with the one
> obtained with s3cmd is different, logs below:
> ~3 TB in bucket stats and 11GB from s3cmd.
> We tried to run manually several times the LC (lc process) but no success,
> even with bucket check (including with --fix --check-objects) didn't help.
> Configs changed are below, we have a few TB size buckets with millions of
> objects, 6h default processing LC time was always not enough:
> rgw_lc_debug_interval = 28800
> rgw_lifecycle_work_time = 00:00-23:59
> rgw_lc_max_worker = 5
> rgw_lc_max_wp_worker = 9
> rgw_enable_lc_threads = true
>
> Status of LC is always complete:
> {
> "bucket":
>
> ":feeds-bucket-dev-dc418787:3ccb869f-b0f4-4fb9-a8d7-ecf5f5e18f33.37270170.140",
> "started": "Mon, 26 Jul 2021 07:10:14 GMT",
> "status": "COMPLETE"
> },
> It looks like versioning is not supported yet, and with aws-cli I did not
> get any response for 's3api get-bucket-versioning'.
> So, what to do?
>
> $ s3cmd -c .s3cfg-feeds-bucket-dev-dc418787 du
> s3://feeds-bucket-dev-dc418787
> *12725854360 *192 objects s3://feeds-bucket-dev-dc418787/
> (192 files, 11GB)
>
> Output of 'radosgw-admin lc get'  and 'bucket stats' is attached.
> Thank you for any suggestion!
>
> Also, we found that the LC is removing files earlier than set for another
> bucket, 6h after the file was added instead of 31 days.
>
> Paul
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [ceph][cephadm] Cluster recovery after reboot 1 node

2021-07-26 Thread Ignazio Cassano
Hello, I want to add further information I found  for the issue described
by Andrea:
cephadm.log:2021-07-26 13:07:11,281 DEBUG /usr/bin/docker: stderr Error: No
such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.11
cephadm.log:2021-07-26 13:07:11,654 DEBUG /usr/bin/docker: stderr Error: No
such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.17
cephadm.log:2021-07-26 13:07:12,843 DEBUG /usr/bin/docker: stderr Error: No
such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.5
cephadm.log:2021-07-26 13:07:12,962 DEBUG /usr/bin/docker: stderr Error: No
such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.8
cephadm.log:2021-07-26 13:07:13,074 DEBUG /usr/bin/docker: stderr Error: No
such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.2
cephadm.log:2021-07-26 13:17:22,353 DEBUG /usr/bin/docker: stderr Error: No
such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.14
cephadm.log:2021-07-26 13:17:22,475 DEBUG /usr/bin/docker: stderr Error: No
such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.11
cephadm.log:2021-07-26 13:17:22,871 DEBUG /usr/bin/docker: stderr Error: No
such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.17
cephadm.log:2021-07-26 13:17:24,053 DEBUG /usr/bin/docker: stderr Error: No
such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.5
cephadm.log:2021-07-26 13:17:24,157 DEBUG /usr/bin/docker: stderr Error: No
such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.8
cephadm.log:2021-07-26 13:17:24,258 DEBUG /usr/bin/docker: stderr Error: No
such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.2

Any help, please ?
Ignazio
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW: LC not deleting expired files

2021-07-26 Thread Paul JURCO
Hi!
I need some help understanding LC processing.
On the latest versions of Octopus (tested with 15.2.13 and 15.2.8) we
have at least one bucket whose files are not removed when they expire.
The size of the bucket reported with radosgw-admin differs from the one
obtained with s3cmd, logs below:
~3 TB in bucket stats vs. 11GB from s3cmd.
We tried running the LC manually (lc process) several times with no success;
even a bucket check (including with --fix --check-objects) didn't help.
The changed configs are below; we have a few multi-TB buckets with millions of
objects, and the default 6h LC processing window was never enough:
rgw_lc_debug_interval = 28800
rgw_lifecycle_work_time = 00:00-23:59
rgw_lc_max_worker = 5
rgw_lc_max_wp_worker = 9
rgw_enable_lc_threads = true

Status of LC is always complete:
{
"bucket":
":feeds-bucket-dev-dc418787:3ccb869f-b0f4-4fb9-a8d7-ecf5f5e18f33.37270170.140",
"started": "Mon, 26 Jul 2021 07:10:14 GMT",
"status": "COMPLETE"
},
It looks like versioning is not supported yet, and with aws-cli I did not
get any response for 's3api get-bucket-versioning'.
So, what to do?

$ s3cmd -c .s3cfg-feeds-bucket-dev-dc418787 du
s3://feeds-bucket-dev-dc418787
*12725854360 *192 objects s3://feeds-bucket-dev-dc418787/
(192 files, 11GB)

Output of 'radosgw-admin lc get'  and 'bucket stats' is attached.
Thank you for any suggestion!

Also, for another bucket we found that the LC is removing files earlier than
configured: 6h after the file was added instead of after 31 days.

Paul
Sample of headers for an object, observe the expiry date (2 days, as in LC 
rule):
{
"AcceptRanges": "bytes",
"Expiration": "expiry-date=\"Mon, 12 Jul 2021 00:00:00 GMT\", 
rule-id=\"cleanup\"",
"LastModified": "Fri, 09 Jul 2021 10:52:23 GMT",
"ContentLength": 1274290121,
"ETag": "\"d5eac62748ba7fea103f1973e563966a-244\"",
"ContentType": "text/csv; charset=utf-8",
"Metadata": {}
}



# radosgw-admin bucket stats --bucket=feeds-bucket-dev-dc418787
2021-07-26T13:28:43.776+0300 7fddc7df0080  2 all 8 watchers are set, enabling 
cache
2021-07-26T13:28:43.776+0300 7fdd88fe9700  2 
RGWDataChangesLog::ChangesRenewThread: start
{
"bucket": "feeds-bucket-dev-dc418787",
"num_shards": 30,
"tenant": "",
"zonegroup": "93c4027b-e930-473e-88e4-341ca31cfc2c",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "3ccb869f-b0f4-4fb9-a8d7-ecf5f5e18f33.46261325.1",
"marker": "3ccb869f-b0f4-4fb9-a8d7-ecf5f5e18f33.37270170.140",
"index_type": "Normal",
"owner": "user-dc418787",
"ver": 
"0#7328,1#28720,2#239,3#170,4#46497,5#6412,6#4894,7#9952,8#24841,9#22430,10#57475,11#41288,12#69788,13#44486,14#10091,15#50822,16#1364,17#50524,18#6216,19#156400,20#8052,21#660,22#42518,23#4178,24#12188,25#564,26#8123,27#548,28#42203,29#70443",
"master_ver": 
"0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0,20#0,21#0,22#0,23#0,24#0,25#0,26#0,27#0,28#0,29#0",
"mtime": "2021-07-26T06:12:23.448315Z",
"creation_time": "2020-07-08T11:01:01.929011Z",
"max_marker": 
"0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#,20#,21#,22#,23#,24#,25#,26#,27#,28#,29#",
"usage": {
"rgw.main": {
"size": 3302827555820,
"size_actual": 3302828068864,
"size_utilized": 3302827555820,
"size_kb": 3225417535,
"size_kb_actual": 3225418036,
"size_kb_utilized": 3225417535,
"num_objects": 627741
},
"rgw.multimeta": {
"size": 0,
"size_actual": 0,
"size_utilized": 34101,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 34,
"num_objects": 1263



# radosgw-admin lc get --bucket=feeds-bucket-dev-dc418787
2021-07-26T13:38:37.177+0300 7fb5a3d44080  2 all 8 watchers are set, enabling 
cache
2021-07-26T13:38:37.181+0300 7fb565fdb700  2 
RGWDataChangesLog::ChangesRenewThread: start
{
"prefix_map": {
"": {
"status": true,
"dm_expiration": false,
"expiration": 2,
"noncur_expiration": 0,
"mp_expiration": 1,
"transitions": {},
"noncur_transitions": {}
}
},
"rule_map": [
{
"id": "cleanup",
"rule": {
"id": "cleanup",
"prefix": "",
"status": "Enabled",
"expiration": {
"days": "2",
"date": ""
},
"noncur_expiration": {
"days": "",
"date": ""
},
"mp_expiration": {
"days": "1",
"date": ""
},
"filter": {
  

[ceph-users] Re: octopus garbage collector makes slow ops

2021-07-26 Thread mahnoosh shahidi
Thanks for your help. Our HDD OSDs have separate NVMe disks for DB use.

On Mon, Jul 26, 2021 at 3:49 PM Igor Fedotov  wrote:

> Unfortunately I'm not an expert in RGW hence nothing to recommend from
> that side.
>
> Apparently your issues are caused by bulk data removal - it appears that
> RocksDB can hardly sustain such things and its performance degrades. We've
> seen that plenty of times before.
>
> So far there are two known workarounds - manual DB compaction using
> ceph-kvstore-tool and setting bluefs_buffered_io to true. The latter makes
> sense for some Ceph releases which got that parameter set to false by
> default, v15.2.12 is one of them. And indeed that setting might cause high
> RAM usage in cases - you might want to look for relevant recent PRs at
> github or ask Mark Nelson from RH for more details.
>
> Nevertheless current upstream recommendation/default is to have it set to
> true as it greatly improves DB performance.
>
>
> So you might want to try to compact RocksDB as per above but please note
> that's a temporary workaround - DB might start to degrade if removals are
> going on.
>
> There is also a PR to address the bulk removal issue in general:
>
> 1) https://github.com/ceph/ceph/pull/37496 (still pending review and
> unlikely to be backported to Octopus).
>
>
> One more question - do your HDD OSDs have additional fast (SSD/NVMe)
> drives for DB volumes? Or do their DBs reside on spinning drives only? If the
> latter is true I would strongly encourage you to fix that by adding
> respective fast disks - RocksDB tends to work badly when not deployed on
> SSDs...
>
>
> Thanks,
>
> Igor
>
>
> On 7/26/2021 1:28 AM, mahnoosh shahidi wrote:
>
> Hi Igor,
> Thanks for your response.This problem happens on my osds with hdd disks. I
> set the bluefs_buffered_io to true just for these osds but it caused my
> bucket index disks (which are ssd) to produce slow ops. I also tried to set
> bluefs_buffered_io to true in bucket index osds but they filled the entire
> memory (256G) so I had to set the bluefs_buffered_io back to false in all
> osds. Is that the only way to handle the garbage collector problem? Do you
> have any ideas for the bucket index problem?
>
> On Thu, Jul 22, 2021 at 3:37 AM Igor Fedotov  wrote:
>
>> Hi Mahnoosh,
>>
>> you might want to set bluefs_buffered_io to true for every OSD.
>>
>> It looks it's false by default in v15.2.12
>>
>>
>> Thanks,
>>
>> Igor
>>
>> On 7/18/2021 11:19 PM, mahnoosh shahidi wrote:
>> > We have a ceph cluster with 408 osds, 3 mons and 3 rgws. We updated our
>> > cluster from nautilus 14.2.14 to octopus 15.2.12 a few days ago. After
>> > upgrading, the garbage collector process which is run after the
>> lifecycle
>> > process, causes slow ops and makes some osds to be restarted. In each
>> > process the garbage collector deletes about 1 million objects. Below are
>> > the one of the osd's logs before it restarts.
>> >
>> > ```
>> > 2021-07-18T00:44:38.807+0430 7fd1cda76700  1 osd.60 1092400 is_healthy
>> > false -- internal heartbeat failed
>> > 2021-07-18T00:44:38.807+0430 7fd1cda76700  1 osd.60 1092400 not
>> > healthy; waiting to boot
>> > 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 heartbeat_map is_healthy
>> > 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
>> > 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 osd.60 1092400 is_healthy
>> > false -- internal heartbeat failed
>> > 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 osd.60 1092400 not
>> > healthy; waiting to boot
>> > 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 heartbeat_map is_healthy
>> > 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
>> > 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 osd.60 1092400 is_healthy
>> > false -- internal heartbeat failed
>> > 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 osd.60 1092400 not
>> > healthy; waiting to boot
>> > 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 heartbeat_map is_healthy
>> > 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
>> > 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 osd.60 1092400 is_healthy
>> > false -- internal heartbeat failed
>> > 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 osd.60 1092400 not
>> > healthy; waiting to boot
>> > 2021-07-18T00:44:42.811+0430 7fd1cda76700  1 heartbeat_map is_healthy
>> > 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
>> > 2021-07-18T00:44:42.811+0430 7fd1cda76700  1 osd.60 1092400 is_healthy
>> > false -- internal heartbeat failed
>> >
>> > ```
>> > what is the suitable configuration for gc in such a heavy delete
>> process so
>> > it doesn't make slow ops? We had the same delete load in nautilus but we
>> > didn't have any problem with that.
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to 

[ceph-users] Re: octopus garbage collector makes slow ops

2021-07-26 Thread Igor Fedotov
Unfortunately I'm not an expert in RGW hence nothing to recommend from 
that side.


Apparently your issues are caused by bulk data removal - it appears that 
RocksDB can hardly sustain such things and its performance degrades. 
We've seen that plenty of times before.


So far there are two known workarounds - manual DB compaction using 
ceph-kvstore-tool and setting bluefs_buffered_io to true. The latter makes 
sense for some Ceph releases which got that parameter set to false by 
default, v15.2.12 is one of them. And indeed that setting might cause 
high RAM usage in cases - you might want to look for relevant recent PRs 
at github or ask Mark Nelson from RH for more details.


Nevertheless current upstream recommendation/default is to have it set 
to true as it greatly improves DB performance.



So you might want to try to compact RocksDB as per above, but please note 
that's a temporary workaround - the DB may start to degrade again if bulk 
removals continue.
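
For reference, a rough sketch of both workarounds (osd.60 is taken from 
your log; ids and paths will differ in your deployment, and the OSD must 
be stopped for the offline compaction):

systemctl stop ceph-osd@60
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-60 compact
systemctl start ceph-osd@60

and the buffered IO setting can be limited to the HDD OSDs with a 
device-class mask, e.g.:

ceph config set osd/class:hdd bluefs_buffered_io true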


There is also a PR to address the bulk removal issue in general:

1) https://github.com/ceph/ceph/pull/37496 (still pending review and 
unlikely to be backported to Octopus).



One more question - do your HDD OSDs have additional fast (SSD/NVMe) 
drives for their DB volumes, or do their DBs reside on the spinning drives 
only? If the latter is true I would strongly encourage you to fix that by 
adding the respective fast disks - RocksDB tends to work badly when not 
deployed on SSDs...



Thanks,

Igor


On 7/26/2021 1:28 AM, mahnoosh shahidi wrote:

Hi Igor,
Thanks for your response.This problem happens on my osds with hdd 
disks. I set the bluefs_buffered_io to true just for these osds but it 
caused my bucket index disks (which are ssd) to produce slow ops. I 
also tried to set bluefs_buffered_io to true in bucket index osds but 
they filled the entire memory (256G) so I had to set the 
bluefs_buffered_io back to false in all osds. Is that the only way to 
handle the garbage collector problem? Do you have any ideas for the 
bucket index problem?


On Thu, Jul 22, 2021 at 3:37 AM Igor Fedotov wrote:


Hi Mahnoosh,

you might want to set bluefs_buffered_io to true for every OSD.

It looks it's false by default in v15.2.12


Thanks,

Igor

On 7/18/2021 11:19 PM, mahnoosh shahidi wrote:
> We have a ceph cluster with 408 osds, 3 mons and 3 rgws. We
updated our
> cluster from nautilus 14.2.14 to octopus 15.2.12 a few days ago.
After
> upgrading, the garbage collector process which is run after the
lifecycle
> process, causes slow ops and makes some osds to be restarted. In
each
> process the garbage collector deletes about 1 million objects.
Below are
> the one of the osd's logs before it restarts.
>
> ```
> 2021-07-18T00:44:38.807+0430 7fd1cda76700  1 osd.60 1092400
is_healthy
> false -- internal heartbeat failed
> 2021-07-18T00:44:38.807+0430 7fd1cda76700  1 osd.60 1092400 not
> healthy; waiting to boot
> 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 heartbeat_map
is_healthy
> 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 osd.60 1092400
is_healthy
> false -- internal heartbeat failed
> 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 osd.60 1092400 not
> healthy; waiting to boot
> 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 heartbeat_map
is_healthy
> 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 osd.60 1092400
is_healthy
> false -- internal heartbeat failed
> 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 osd.60 1092400 not
> healthy; waiting to boot
> 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 heartbeat_map
is_healthy
> 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 osd.60 1092400
is_healthy
> false -- internal heartbeat failed
> 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 osd.60 1092400 not
> healthy; waiting to boot
> 2021-07-18T00:44:42.811+0430 7fd1cda76700  1 heartbeat_map
is_healthy
> 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> 2021-07-18T00:44:42.811+0430 7fd1cda76700  1 osd.60 1092400
is_healthy
> false -- internal heartbeat failed
>
> ```
> what is the suitable configuration for gc in such a heavy delete
process so
> it doesn't make slow ops? We had the same delete load in
nautilus but we
> didn't have any problem with that.
> ___
> ceph-users mailing list -- ceph-users@ceph.io

> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io

[ceph-users] Re: How to set retention on a bucket?

2021-07-26 Thread Konstantin Shalygin


> On 26 Jul 2021, at 08:05, Szabo, Istvan (Agoda)  
> wrote:
> 
> I haven't really found how to set the retention on an S3 bucket for a specific 
> day. Is there any Ceph document about it?

It is not possible to set retention for specific days, only as +days from the 
PutObject day. LC policy is well documented here [1]


[1] 
https://docs.aws.amazon.com/AmazonS3/latest/userguide/intro-lifecycle-rules.html
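
For illustration only - a minimal lifecycle rule that expires objects N days 
after they were put, applied here with the aws CLI (the bucket name, day 
count and endpoint below are placeholders):

cat > lc.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-after-30-days",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 30}
    }
  ]
}
EOF
aws --endpoint-url http://rgw.example.com s3api \
    put-bucket-lifecycle-configuration \
    --bucket mybucket --lifecycle-configuration file://lc.json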
 


k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-07-26 Thread Igor Fedotov

Hi Dave,

Some notes first:

1) The following behavior is fine: BlueStore mounts in two stages - the 
first one is read-only and, among other things, it loads the allocation map 
from the DB. That's exactly the case here.


Jul 26 08:55:31 condor_sc0 docker[15282]: 2021-07-26T08:55:31.703+ 
7f0e15b3df40  1 bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc loaded 
132 GiB in 2930776 extents available 113 GiB
Jul 26 08:55:31 condor_sc0 docker[15282]: 2021-07-26T08:55:31.703+ 
7f0e15b3df40  4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all 
background work
Jul 26 08:55:31 condor_sc0 docker[15282]: 2021-07-26T08:55:31.704+ 
7f0e15b3df40  4 rocksdb: [db/db_impl.cc:563] Shutdown complete


2) What's really broken is the following allocation attempt:

Jul 26 08:55:34 condor_sc0 docker[15282]: 2021-07-26T08:55:34.767+ 
7f0e15b3df40  1 bluefs _allocate failed to allocate 0x100716 on bdev 1, 
free 0xd; fallback to bdev 2
Jul 26 08:55:34 condor_sc0 docker[15282]: 2021-07-26T08:55:34.767+ 
7f0e15b3df40  1 bluefs _allocate unable to allocate 0x100716 on bdev 2, 
free 0x; fallback to slow device expander
Jul 26 08:55:35 condor_sc0 docker[15282]: 2021-07-26T08:55:35.042+ 
7f0e15b3df40 -1 bluestore(/var/lib/ceph/osd/ceph-1) 
allocate_bluefs_freespace failed to allocate on 0x4000 min_size 
0x11 > allocated total 0x0 bluefs_shared_alloc_size 0x1 
allocated 0x0 available 0x 1c09738000
Jul 26 08:55:35 condor_sc0 docker[15282]: 2021-07-26T08:55:35.044+ 
7f0e15b3df40 -1 bluefs _allocate failed to expand slow device to fit 
+0x100716
Jul 26 08:55:35 condor_sc0 docker[15282]: 2021-07-26T08:55:35.044+ 
7f0e15b3df40 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 
0x100716


This occurs during BlueFS recovery and it's an attempt to get more 
space to write out the bluefs log. It shouldn't fail given the plentiful 
free space:


... available 0x 1c09738000 ...


So to get more verbose but shorter logs one can set both debug-bluestore 
and debug-bluefs to 1/20. This way only the most recent portion of the log 
preceding the crash would be dumped at level 20, which seems sufficient 
for the troubleshooting.
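
A sketch of how that could be applied to the affected OSD (osd.1 from the 
log above), via the central config store or equivalently in ceph.conf:

ceph config set osd.1 debug_bluestore 1/20
ceph config set osd.1 debug_bluefs 1/20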


It would also be great to collect the output of the following commands:

ceph-bluestore-tool --path  --command bluefs-bdev-sizes

ceph-bluestore-tool --path  --command bluefs-stats
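
For example, with the OSD stopped, the invocations might look like this 
(the path is taken from the osd.1 log above; in a containerized deployment 
the tool has to be run from wherever that path is reachable):

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-1 --command bluefs-bdev-sizes
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-1 --command bluefs-stats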


And finally you can try to switch to the bitmap allocator as a workaround - 
we've fixed a couple of issues in the hybrid one which prevented proper 
allocations under some circumstances. The fixes were made after the v15.2.11 
release, hence this might be the case here. So please try setting:


bluestore_allocator = bitmap

bluefs_allocator = bitmap
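
A sketch of one way to apply that via the monitor config store (the options 
could equally go into ceph.conf); an OSD restart is needed for them to take 
effect:

ceph config set osd bluestore_allocator bitmap
ceph config set osd bluefs_allocator bitmap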


Thanks,

Igor


On 7/26/2021 12:14 PM, Dave Piper wrote:

Hi Igor,

Thanks for your time looking into this.

I've attached a 5 minute window of OSD logs, which includes several restart 
attempts (each one takes ~25 seconds).

When I said it looked like we were starting up in a different state, I was 
referring to how the "Recovered from manifest file" log line appears twice, with 
different logs afterwards. This behaviour seems to repeat reliably on each restart 
of the OSD. My interpretation of this was that when the initial recovery attempt 
leads to the rocksdb shutdown, ceph automatically tries to start the OSD in some 
alternative state, but that this is also failing (with the bdev errors I copied 
in). Possibly I'm inferring too much.

I tried turning up the logging levels for rocksdb and bluestore but they're 
both very spammy so I've not included this in the attached logs. Let me know if 
you think that would be helpful.

My ceph version is 15.2.11. We're running a containerized deployment using 
docker image ceph-daemon:v5.0.10-stable-5.0-octopus-centos-8 .

[qs-admin@condor_sc0 metaswitch]$ sudo docker exec b732f9135b42 ceph version
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)

Cheers,

Dave



-Original Message-
From: Igor Fedotov 
Sent: 23 July 2021 20:45
To: Dave Piper ; ceph-users@ceph.io
Subject: [EXTERNAL] Re: [ceph-users] OSDs flapping with "_open_alloc loaded 132 GiB 
in 2930776 extents available 113 GiB"

Hi Dave,

The follow log line indicates that allocator has just completed loading 
information about free disk blocks into memory.  And it looks perfectly fine.


_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB
   


Subsequent rocksdb shutdown looks weird without any other log output indicating 
the issue.
Curious what you mean by "After that we seem to try starting up in a slightly 
different state and get a different set of errors".
The resulting errors show a lack of disk space at some point, but I'd definitely 
like to get the full startup log.

Please also specify which Octopus version you have.

Thanks,
Igor

On 7/23/2021 6:48 PM, Dave Piper wrote:

Hi all,

We've got a containerized test cluster with 3 OSDs and ~ 220GiB of data. Shortly 
after upgrading from nautilus 

[ceph-users] How to set retention on a bucket?

2021-07-26 Thread Szabo, Istvan (Agoda)
Hi,

I haven't really found how to set the retention on an S3 bucket for a specific 
day. Is there any Ceph document about it?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is there any way to obtain the maximum number of node failure in ceph without data loss?

2021-07-26 Thread Jerry Lee
After doing more experiments, the outcome answer some of my questions:

The environment is kind of different compared to the one mentioned in
previous mail.
1) the `ceph osd tree`
 -2 2.06516  root perf_osd
 -5 0.67868  host jceph-n2-perf_osd
  2ssd  0.17331  osd.2  up
1.0  1.0
  3ssd  0.15875  osd.3  up
1.0  1.0
  4ssd  0.17331  osd.4  up
1.0  1.0
  5ssd  0.17331  osd.5  up
1.0  1.0
-25 0.69324  host Jceph-n1-perf_osd
  8ssd  0.17331  osd.8  up
1.0  1.0
  9ssd  0.17331  osd.9  up
1.0  1.0
 10ssd  0.17331  osd.10 up
1.0  1.0
 11ssd  0.17331  osd.11 up
1.0  1.0
-37 0.69324  host Jceph-n3-perf_osd
 14ssd  0.17331  osd.14 up
1.0  1.0
 15ssd  0.17331  osd.15 up
1.0  1.0
 16ssd  0.17331  osd.16 up
1.0  1.0
 17ssd  0.17331  osd.17 up
1.0  1.0

2) the used CRUSH rule for the EC8+3 pool for which the OSDs are
selected by 'osd' instead.
# ceph osd crush rule dump erasure_ruleset_by_osd
{
"rule_id": 9,
"rule_name": "erasure_ruleset_by_osd",
"ruleset": 9,
"type": 3,
"min_size": 1,
"max_size": 16,
"steps": [
{
"op": "take",
"item": -2,
"item_name": "perf_osd"
},
{
"op": "choose_indep",
"num": 0,
"type": "osd"
},
{
"op": "emit"
}
]
}

3) the erasure-code-profile used to create the EC8+3 pool (min_size = 8)
# ceph osd erasure-code-profile get jec_8_3
crush-device-class=ssd
crush-failure-domain=osd
crush-root=perf_ssd
k=8
m=3
plugin=isa
technique=reed_sol_van

The following shows how the acting set changes after unplugging only 2 OSDs:

T0:
[3,9,10,5,16,14,8,11,2,4,15]

T1: after issuing `ceph osd out 17`
[NONE,NONE,10,5,16,14,8,11,2,4,NONE]
state of this PG: "active+recovery_wait+undersized+degraded+remapped"

T2: before recovery finishes, issuing `ceph osd out 11`
[NONE,NONE,10,5,16,14,8,NONE,2,4,NONE]
state of this PG: "down+remapped"
comment: "not enough up instances of this PG to go active"

With only 2 OSDs out, a PG of the EC8+3 pool enters the "down+remapped"
state.  So, it seems that the min_size of an erasure-coded K+M pool
should be set to K+1, which ensures that the data stays intact even if one
more OSD fails during recovery, although the pool may not serve IO.
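
For reference, a sketch of checking and raising it on an existing pool (the
pool name is just a placeholder; K=8 gives min_size K+1 = 9):

# ceph osd pool get <ec-pool-name> min_size
# ceph osd pool set <ec-pool-name> min_size 9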

Any feedback and ideas are welcomed and appreciated!

- Jerry

On Mon, 26 Jul 2021 at 11:33, Jerry Lee  wrote:
>
> Hello Josh,
>
> I simulated the osd.14 failure by the following steps:
>1. hot unplug the disk
>2. systemctl stop ceph-osd@14
>3. ceph osd out 14
>
> The used CRUSH rule to create the EC8+3 pool is described as below:
> # ceph osd crush rule dump erasure_hdd_mhosts
> {
> "rule_id": 8,
> "rule_name": "erasure_hdd_mhosts",
> "ruleset": 8,
> "type": 3,
> "min_size": 1,
> "max_size": 16,
> "steps": [
> {
> "op": "take",
> "item": -1,
> "item_name": "default"
> },
> {
> "op": "chooseleaf_indep",
> "num": 0,
> "type": "host"
> },
> {
> "op": "emit"
> }
> ]
> }
>
> And the output of `ceph osd tree` is also attached:
> [~] # ceph osd tree
> ID   CLASS  WEIGHT   TYPE NAME  STATUS
> REWEIGHT  PRI-AFF
>  -132.36148  root default
> -13 2.69679  host jceph-n01
>   0hdd  0.89893  osd.0  up
> 1.0  1.0
>   1hdd  0.89893  osd.1  up
> 1.0  1.0
>   2hdd  0.89893  osd.2  up
> 1.0  1.0
> -17 2.69679  host jceph-n02
>   3hdd  0.89893  osd.3  up
> 1.0  1.0
>   4hdd  0.89893  osd.4  up
> 1.0  1.0
>   5hdd  0.89893  osd.5  up
> 1.0  1.0
> -21 2.69679  host jceph-n03
>   6hdd  0.89893  osd.6  up
> 1.0  1.0
>   7hdd  0.89893  osd.7  up
> 1.0  1.0
>   8hdd  0.89893  osd.8  up
> 1.0  1.0
> -25 2.69679  host jceph-n04
>   9hdd  0.89893  osd.9  up
> 1.0  1.0
>  10hdd  0.89893  osd.10 up
> 1.0  1.0
>  11hdd  0.89893  

[ceph-users] Re: unable to map device with krbd on el7 with ceph nautilus

2021-07-26 Thread Ilya Dryomov
On Fri, Jul 23, 2021 at 11:58 PM  wrote:
>
> Hi.
>
> I've followed the installation guide and got nautilus 14.2.22 running on el7 
> via https://download.ceph.com/rpm-nautilus/el7/x86_64/ yum repo.
> I'm now trying to map a device on an el7 and getting extremely weird errors:
>
> # rbd info test1/blk1 --name client.testing-rw
> rbd image 'blk1':
> size 50 GiB in 12800 objects
> order 22 (4 MiB objects)
> snapshot_count: 0
> id: 2e0929313a08e
> block_name_prefix: rbd_data.2e0929313a08e
> format: 2
> features: layering, exclusive-lock, object-map, fast-diff, 
> deep-flatten
> op_features:
> flags:
> create_timestamp: Fri Jul 23 15:59:12 2021
> access_timestamp: Fri Jul 23 15:59:12 2021
> modify_timestamp: Fri Jul 23 15:59:12 2021
>
> # rbd device map test1/blk1 --name client.testing-rw
> rbd: sysfs write failed
> In some cases useful info is found in syslog - try "dmesg | tail".
> rbd: map failed: (3) No such process
>
> # dmesg | tail
> [91885.624859] libceph: resolve 'name=testing-rw' (ret=-3): failed
> [91885.624863] libceph: parse_ips bad ip 
> 'name=testing-rw,key=client.testing-rw'

Hi,

I think it should be "rbd device map test1/blk1 --id testing-rw".
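
For example (sketch only; the resulting device node will differ):

rbd device map test1/blk1 --id testing-rw
rbd device list
rbd device unmap /dev/rbd0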

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Installing and Configuring RGW to an existing cluster

2021-07-26 Thread Szabo, Istvan (Agoda)
You have two ways: ceph-deploy or fully manual:

Full manual:
RGW:
on all RGW nodes: yum install ceph-radosgw -y
first RGW node:
ceph-authtool --create-keyring /etc/ceph/ceph.client.radosgw.keyring
chown ceph:ceph /etc/ceph/ceph.client.radosgw.keyring
ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n 
client.rgw.servername2001 --gen-key
ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n 
client.rgw.servername2002 --gen-key
ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n 
client.rgw.servername2003 --gen-key
ceph-authtool -n client.rgw.servername2001 --cap osd 'allow rwx' --cap mon 
'allow rwx' /etc/ceph/ceph.client.radosgw.keyring
ceph-authtool -n client.rgw.servername2002 --cap osd 'allow rwx' --cap mon 
'allow rwx' /etc/ceph/ceph.client.radosgw.keyring
ceph-authtool -n client.rgw.servername2003 --cap osd 'allow rwx' --cap mon 
'allow rwx' /etc/ceph/ceph.client.radosgw.keyring
ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.rgw.servername2001 
-i /etc/ceph/ceph.client.radosgw.keyring
ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.rgw.servername2002 
-i /etc/ceph/ceph.client.radosgw.keyring
ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.rgw.servername2003 
-i /etc/ceph/ceph.client.radosgw.keyring
for i in {2..3}; do scp -pr /etc/ceph/ceph.client.radosgw.keyring 
servername200$i:/etc/ceph/ceph.client.radosgw.keyring;done
for i in {2..4}; do scp -pr /etc/ceph/ceph.conf servername200$i:/etc/ceph/;done

on the 2nd and 3rd nodes:
chown ceph:ceph /etc/ceph/ceph.client.radosgw.keyring
systemctl start ceph-radosgw@rgw.`hostname -s` && systemctl enable 
ceph-radosgw@rgw.`hostname -s`
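
Note that the steps above assume ceph.conf already contains a section per 
gateway before it is copied out; a minimal sketch (the frontend, port and 
log path are just an example):

[client.rgw.servername2001]
host = servername2001
rgw_frontends = "beast port=7480"
log_file = /var/log/ceph/client.rgw.servername2001.log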


Ceph-deploy: 
https://docs.ceph.com/en/octopus/install/ceph-deploy/install-ceph-gateway/

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Matt Dunavant 
Sent: Thursday, July 22, 2021 11:21 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Installing and Configuring RGW to an existing cluster

Hi all,


We are currently using a ceph cluster for block storage on version 14.2.16. We 
would like to start experimenting with object storage but the ceph 
documentation doesn't seem to cover a lot of the installation or configuration 
of the RGW piece. Does anybody know where I may be able to find documentation 
about adding object storage to our existing cluster or any considerations to 
take into account while doing so?


Thanks,

Matt
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1/3 mons down! mon do not rejoin

2021-07-26 Thread Dan van der Ster
Hi,

Do you have ceph-mon logs from when mon.osd01 first failed, before the
on-call team rebooted it? They might give a clue about what happened to
start this problem, which may still be happening now.

This looks similar but it was eventually found to be a network issue:
https://tracker.ceph.com/issues/48033
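
If the logs are still on that node, something like this should pull out the
relevant window (assuming default log locations; the mon id is taken from
your output):

journalctl -u ceph-mon@osd01 --since "2 days ago" | less
grep -iE "probe|quorum|lease|election" /var/log/ceph/ceph-mon.osd01.log | less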

-- Dan

On Sun, Jul 25, 2021 at 6:36 PM Ansgar Jazdzewski
 wrote:
>
> Am So., 25. Juli 2021 um 18:02 Uhr schrieb Dan van der Ster
> :
> >
> > What do you have for the new global_id settings? Maybe set it to allow 
> > insecure global_id auth and see if that allows the mon to join?
>
>  auth_allow_insecure_global_id_reclaim is allowed as we still have
> some VM's not restarted
>
> # ceph config get mon.*
> WHO MASK LEVELOPTION VALUE RO
> mon  advanced auth_allow_insecure_global_id_reclaim  true
> mon  advanced mon_warn_on_insecure_global_id_reclaim false
> mon  advanced mon_warn_on_insecure_global_id_reclaim_allowed false
>
> > > I can try to move the /var/lib/ceph/mon/ dir and recreate it!?
> >
> > I'm not sure it will help. Running the mon with --debug_ms=1 might give 
> > clues why it's stuck probing.
>
> 2021-07-25 16:28:41.418 7fcc613d8700 10 mon.osd01@0(probing) e1
> probing other monitors
> 2021-07-25 16:28:41.418 7fcc613d8700  1 --
> [v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0] send_to--> mon
> [v2:10.152.28.172:3300/0,v1:10.152.28.172:6789/0] -- mon_probe(probe
> a6baa789-6be2-4ce0-ab2d-7c78b899d4bd name osd01 mon_release 14) v7 --
> ?+0 0x55c6b35ae780
> 2021-07-25 16:28:41.418 7fcc613d8700  1 --
> [v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0] -->
> [v2:10.152.28.172:3300/0,v1:10.152.28.172:6789/0] -- mon_probe(probe
> a6baa789-6be2-4ce0-ab2d-7c78b899d4bd name osd01 mon_release 14) v7 --
> 0x55c6b35ae780 con 0x55c6b2611180
> 2021-07-25 16:28:41.418 7fcc613d8700  1 --
> [v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0] send_to--> mon
> [v2:10.152.28.173:3300/0,v1:10.152.28.173:6789/0] -- mon_probe(probe
> a6baa789-6be2-4ce0-ab2d-7c78b899d4bd name osd01 mon_release 14) v7 --
> ?+0 0x55c6b35aea00
> 2021-07-25 16:28:41.418 7fcc613d8700  1 --
> [v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0] -->
> [v2:10.152.28.173:3300/0,v1:10.152.28.173:6789/0] -- mon_probe(probe
> a6baa789-6be2-4ce0-ab2d-7c78b899d4bd name osd01 mon_release 14) v7 --
> 0x55c6b35aea00 con 0x55c6b2611600
> 2021-07-25 16:28:41.814 7fcc5dbd1700  1 --2-
> [v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0] >>
> [v2:10.152.28.173:3300/0,v1:10.152.28.173:6789/0] conn(0x55c6b2611600
> 0x55c6b3323c00 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=1
> rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
> 2021-07-25 16:28:41.814 7fcc62bdb700  1 --2-
> [v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0] >>
> [v2:10.152.28.172:3300/0,v1:10.152.28.172:6789/0] conn(0x55c6b2611180
> 0x55c6b3323500 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=1
> rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
> 2021-07-25 16:28:41.814 7fcc62bdb700 10 mon.osd01@0(probing) e1
> ms_get_authorizer for mon
> 2021-07-25 16:28:41.814 7fcc5dbd1700 10 mon.osd01@0(probing) e1
> ms_get_authorizer for mon
> 2021-07-25 16:28:41.814 7fcc62bdb700  1 --
> [v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0] >>
> [v2:10.152.28.172:3300/0,v1:10.152.28.172:6789/0] conn(0x55c6b2611180
> msgr2=0x55c6b3323500 secure :-1 s=STATE_CONNECTION_ESTABLISHED
> l=0).read_bulk peer close file descriptor 27
> 2021-07-25 16:28:41.814 7fcc62bdb700  1 --
> [v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0] >>
> [v2:10.152.28.172:3300/0,v1:10.152.28.172:6789/0] conn(0x55c6b2611180
> msgr2=0x55c6b3323500 secure :-1 s=STATE_CONNECTION_ESTABLISHED
> l=0).read_until read failed
> 2021-07-25 16:28:41.814 7fcc62bdb700  1 --2-
> [v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0] >>
> [v2:10.152.28.172:3300/0,v1:10.152.28.172:6789/0] conn(0x55c6b2611180
> 0x55c6b3323500 secure :-1 s=SESSION_CONNECTING pgs=0 cs=0 l=0 rev1=1
> rx=0x55c6b34bbad0 tx=0x55c6b3528130).handle_read_frame_preamble_main
> read frame preamble failed r=-1 ((1) Operation not permitted)
> 2021-07-25 16:28:41.814 7fcc5dbd1700  1 --
> [v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0] >>
> [v2:10.152.28.173:3300/0,v1:10.152.28.173:6789/0] conn(0x55c6b2611600
> msgr2=0x55c6b3323c00 secure :-1 s=STATE_CONNECTION_ESTABLISHED
> l=0).read_bulk peer close file descriptor 28
> 2021-07-25 16:28:41.814 7fcc5dbd1700  1 --
> [v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0] >>
> [v2:10.152.28.173:3300/0,v1:10.152.28.173:6789/0] conn(0x55c6b2611600
> msgr2=0x55c6b3323c00 secure :-1 s=STATE_CONNECTION_ESTABLISHED
> l=0).read_until read failed
> 2021-07-25 16:28:41.814 7fcc5dbd1700  1 --2-
> [v2:10.152.28.171:3300/0,v1:10.152.28.171:6789/0] >>
> [v2:10.152.28.173:3300/0,v1:10.152.28.173:6789/0] conn(0x55c6b2611600
> 0x55c6b3323c00 secure :-1 s=SESSION_CONNECTING pgs=0 cs=0 l=0 rev1=1
> rx=0x55c6b3553830