[ceph-users] Re: Multipath and cephadm

2021-12-24 Thread David Caro
I did not really look deep, but from the last log it seems there are some UTF
characters somewhere (Greek phi?) and the code is not handling them well when
logging, trying to use ASCII.

On Thu, 23 Dec 2021, 19:02 Michal Strnad,  wrote:

> Hi all.
>
> We have problem using disks accessible via multipath. We are using
> cephadm for deployment, Pacific version for containers, CentOS 8 Stream
> on servers and following LVM configuration.
>
> devices {
>  multipath_component_detection = 1
> }
>
>
>
> We tried several methods.
>
> 1.) Direct approach.
>
> cephadm shell ceph orch daemon add osd serverX:/dev/mapper/mpatha
>
> Errors are attached in 1.output file.
>
>
>
> 2. With the help of OSD specifications where mpathX devices are used.
>
> service_type: osd
> service_id: osd-spec-serverX
> placement:
>host_pattern: 'serverX'
> spec:
>data_devices:
>  paths:
>- /dev/mapper/mpathaj
>- /dev/mapper/mpathan
>- /dev/mapper/mpatham
>db_devices:
>  paths:
>- /dev/sdc
> encrypted: true
>
> Errors are attached in 2.output file.
>
>
> 3. With the help of OSD specifications where dm-X devices are used.
>
> service_type: osd
> service_id: osd-spec-serverX
> placement:
>host_pattern: 'serverX'
> spec:
>data_devices:
>  paths:
>- /dev/dm-1
>- /dev/dm-2
>- /dev/dm-3
>- /dev/dm-X
>db_devices:
>  size: ':2TB'
> encrypted: true
>
> Errors are attached in 3.output file.
>
> What is the right method for multipath deployments? I didn't find much
> on this topic.
>
> Thank you
>
> Michal
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.6: clients being incorrectly directed to the OSDs cluster_network address

2021-09-28 Thread David Caro

Just curious, does it always happen with the same OSDs?
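
Also, in case it helps narrowing it down, this is roughly what I'd check to see
which addresses each OSD has registered (just a sketch; the osd id is an example):

```
# addresses the osd map has for a given osd
ceph osd dump | grep 'osd\.12 '

# per-daemon metadata also records the front (public) and back (cluster) addresses
ceph osd metadata 12 | grep -E 'front_addr|back_addr'

# and what the osds believe the networks to be
ceph config get osd public_network
ceph config get osd cluster_network
```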

On 09/28 16:14, Javier Cacheiro wrote:
> Interestingly enough this happens for some pools and not for others.
> 
> For example I have just realized that when trying to connect to another
> pool the client is correctly directed to the OSD public_network address:
> 
> >> strace -f -e trace=network -s 1 rbd ls --pool cinder-volumes --name
> client.cinder 2>&1| grep sin_addr
> [pid 2363212] connect(15, {sa_family=AF_INET, sin_port=htons(6816),
> sin_addr=inet_addr("*10.113.29.7*")}, 16) = 0
> 
> But the same client listing the ephemeral-vms pool is directed to the OSD
> cluster address:
> >> strace -f -e trace=network -s 1 rbd ls --pool ephemeral-vms --name
> client.cinder 2>&1| grep sin_addr
> [pid 2363485] connect(14, {sa_family=AF_INET, sin_port=htons(6806),
> sin_addr=inet_addr("*10.114.29.10*")}, 16) = -1 EINPROGRESS (Operation now
> in progress)
> 
> Very weird!
> 
> 
> 
> On Tue, 28 Sept 2021 at 16:02, Javier Cacheiro 
> wrote:
> 
> > Hi all,
> >
> > I am trying to understand an issue with ceph directing clients to connect
> > to OSDs through their cluster_network address instead of their
> > public_network address.
> >
> > I have configured a ceph cluster with a public and a cluster network:
> >
> > >> ceph config dump|grep network
> > global   advanced  cluster_network  10.114.0.0/16   *
> >   mon    advanced  public_network   10.113.0.0/16   *
> >
> > I upgraded the cluster from 16.2.4 to 16.2.6.
> >
> > After that, I am seeing that ceph is directing clients to connect to the OSDs'
> > cluster_network address instead of their public_network address:
> >
> > >> strace -f -e trace=network -s 1 rbd ls --pool ephemeral-vms --name
> > client.cinder
> > 
> > [pid 2353692] connect(14, {sa_family=AF_INET, sin_port=htons(6806),
> > sin_addr=inet_addr("*10.114.29.10*")}, 16) = -1 EINPROGRESS (Operation
> > now in progress)
> >
> > In this case the client hangs because it is not able to access the
> > address, since it's an internal address.
> >
> > This appeared after upgrading to 16.2.6, but I am not sure whether it was due
> > to the upgrade or a hidden issue that surfaced after the nodes were
> > rebooted.
> >
> > It can also be that I am missing something in the config, but this config
> > was generated by the cephadm bootstrap command and not created by hand, and
> > it worked before the upgrade/reboot so I am pretty confident with it.
> >
> > What do you think, can this be a bug or is it more a misconfiguration on my
> > side?
> >
> > Thanks,
> > Javier
> >
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD swapping on Pacific

2021-08-16 Thread David Caro

Found an option that seemed to cause some trouble in the past,
`bluefs_buffered_io`. It has been disabled/enabled by default a couple of times
(disabled in v15.2.2, enabled again in v15.2.13), and it seems it can have a big
effect on performance and swapping behavior, so it might be a lead.

On 08/16 14:10, Alexander Sporleder wrote:
> Hello David,
> 
> Unfortunately "vm.swapiness" dose not change the behavior. Tweaks on the 
> container side  (--memory-swappiness and --
> memory-swap) might make sens but I did not found any Ceph related suggestion. 
> 
> 
> On Monday, 2021-08-16 at 13:52 +0200, David Caro wrote:
> > AFAIK the swapping behavior is controlled by the kernel. There might be
> > some tweaks on the container engine side, but you might want to try
> > changing the default behavior by lowering the kernel's 'vm.swappiness':
> > 
> > https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-tunables
> > 
> > 
> > 
> > On 08/16 13:14, Alexander Sporleder wrote:
> > > Hello list! 
> > > We have a containerized Pacific (16.2.5) Cluster running CentOS 8.4 and 
> > > after a few weeks the OSDs start to use swap
> > > quite a lot despite free memory. The host has 196 GB of memory and 24 
> > > OSDs. "OSD Memory Target" is set to 6 GB. 
> > > 
> > > 
> > > 
> > > $ cat /proc/meminfo 
> > > MemTotal:   196426616 kB
> > > MemFree:    11675608 kB
> > > MemAvailable:   48940232 kB
> > > Buffers:    46757632 kB
> > > Cached:   653216 kB
> > > 
> > > 
> > > 
> > > $ smem -k
> > > Command  Swap  USS  PSS  RSS 
> > > ceph /usr/bin/ceph-osd -n osd.22 1.7G 3.7G 3.7G 3.7G 
> > > ceph /usr/bin/ceph-osd -n osd.10   853.4M 4.6G 4.6G 4.6G 
> > > ceph /usr/bin/ceph-osd -n osd.12   793.6M 4.6G 4.6G 4.6G 
> > > ceph /usr/bin/ceph-osd -n osd.92   561.3M 4.7G 4.7G 4.7G 
> > > ceph /usr/bin/ceph-osd -n osd.14   647.2M 4.9G 4.9G 4.9G 
> > > ceph /usr/bin/ceph-osd -n osd.15   567.8M 5.0G 5.0G 5.0G
> > > 
> > > 
> > > 
> > > Is that a known behavior, a bug or a configuration problem? On two hosts I
> > > turned off swap and the OSDs have been running happily
> > > now for more than 6 weeks.
> > > 
> > > Best,
> > > Alex
> > > 
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > 
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD swapping on Pacific

2021-08-16 Thread David Caro
AFAIK the swapping behavior is controlled by the kernel. There might be some
tweaks on the container engine side, but you might want to try changing the
default behavior by lowering the kernel's 'vm.swappiness':

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-tunables
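
For reference, a minimal sketch of how to try that (the value is just an example):

```
# check the current value (default is usually 60)
sysctl vm.swappiness

# lower it at runtime
sysctl -w vm.swappiness=10

# persist it across reboots
echo 'vm.swappiness = 10' > /etc/sysctl.d/90-swappiness.conf
```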



On 08/16 13:14, Alexander Sporleder wrote:
> Hello list! 
> We have a containerized Pacific (16.2.5) Cluster running CentOS 8.4 and after 
> a few weeks the OSDs start to use swap
> quite a lot despite free memory. The host has 196 GB of memory and 24 OSDs. 
> "OSD Memory Target" is set to 6 GB. 
> 
> 
> 
> $ cat /proc/meminfo 
> MemTotal:   196426616 kB
> MemFree:         11675608 kB
> MemAvailable:    48940232 kB
> Buffers:         46757632 kB
> Cached:            653216 kB
> 
> 
> 
> $ smem -k
> Command                              Swap     USS     PSS     RSS
> ceph /usr/bin/ceph-osd -n osd.22 1.7G 3.7G 3.7G 3.7G 
> ceph /usr/bin/ceph-osd -n osd.10   853.4M 4.6G 4.6G 4.6G 
> ceph /usr/bin/ceph-osd -n osd.12   793.6M 4.6G 4.6G 4.6G 
> ceph /usr/bin/ceph-osd -n osd.92   561.3M 4.7G 4.7G 4.7G 
> ceph /usr/bin/ceph-osd -n osd.14   647.2M 4.9G 4.9G 4.9G 
> ceph /usr/bin/ceph-osd -n osd.15   567.8M 5.0G 5.0G 5.0G
> 
> 
> 
> Is that a known behavior, a bug or a configuration problem? On two hosts I
> turned off swap and the OSDs have been running happily
> now for more than 6 weeks.
> 
> Best,
> Alex
> 
> _______
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: All OSDs on one host down

2021-08-06 Thread David Caro
On 08/06 07:59, Andrew Walker-Brown wrote:
> Hi Marc,
> 
> Yes, I’m probably doing just that.
> 
> The ceph admin guides aren’t exactly helpful on this.  The cluster was 
> deployed using cephadm and it’s been running perfectly until now.
> 
> Wouldn’t running “journalctl -u ceph-osd@5” on host ceph-004 show me the logs 
> for osd.5 on that host?

On my containerized setup, the services that cephadm created are:

dcaro@node1:~ $ sudo systemctl list-units | grep ceph
  ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@crash.node1.service        loaded active running  Ceph crash.node1 for d49b287a-b680-11eb-95d4-e45f010c03a8
  ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@mgr.node1.mhqltg.service   loaded active running  Ceph mgr.node1.mhqltg for d49b287a-b680-11eb-95d4-e45f010c03a8
  ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@mon.node1.service          loaded active running  Ceph mon.node1 for d49b287a-b680-11eb-95d4-e45f010c03a8
  ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@osd.3.service              loaded active running  Ceph osd.3 for d49b287a-b680-11eb-95d4-e45f010c03a8
  ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@osd.7.service              loaded active running  Ceph osd.7 for d49b287a-b680-11eb-95d4-e45f010c03a8
  system-ceph\x2dd49b287a\x2db680\x2d11eb\x2d95d4\x2de45f010c03a8.slice loaded active active   system-ceph\x2dd49b287a\x2db680\x2d11eb\x2d95d4\x2de45f010c03a8.slice
  ceph-d49b287a-b680-11eb-95d4-e45f010c03a8.target                     loaded active active   Ceph cluster d49b287a-b680-11eb-95d4-e45f010c03a8
  ceph.target                                                          loaded active active   All Ceph clusters and services

where the string after 'ceph-' is the fsid of the cluster.
Hope that helps (you can also use systemctl list-units to find the specific
ones on your hosts).
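
So for your osd.5 case something along these lines should work (a sketch; the
fsid below is the one from my cluster, substitute yours):

```
# find the exact unit name for the osd
sudo systemctl list-units | grep 'osd\.5'

# restart it
sudo systemctl restart ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@osd.5.service

# and follow its logs
sudo journalctl -fu ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@osd.5.service
```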


> 
> Cheers,
> A
> 
> 
> 
> 
> 
> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
> 
> From: Marc<mailto:m...@f1-outsourcing.eu>
> Sent: 06 August 2021 08:54
> To: Andrew Walker-Brown<mailto:andrew_jbr...@hotmail.com>; 
> ceph-users@ceph.io<mailto:ceph-users@ceph.io>
> Subject: RE: All OSDs on one host down
> 
> >
> > I’ve tried restarting one of the osds but that fails, journalctl shows
> > osd not found. Not convinced I’ve got the systemctl command right.
> >
> 
> You are mixing 'non-container' commands with 'container' commands. As in,
> if you execute this journalctl outside of the container it will not find
> anything, of course.
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph and openstack throttling experience

2021-06-10 Thread David Caro
On 06/10 14:05, Marcel Kuiper wrote:
> Hi David,
> 
> That is very helpful, thank you. When looking at the graphs I notice that the
> bandwidth used looks very low. Or am I misinterpreting the
> bandwidth graphs?

Hey, sorry for the delay, something broke :)

Which graphs specifically are you looking at?

> 
> Regards
> 
> Marcel
> 
> David Caro schreef op 2021-06-10 11:49:
> > We have a similar setup, way smaller though (~120 osds right now) :)
> > 
> > We have different capped VMs, but most have 500 write, 1000 read iops
> > cap, you can see it in effect here:
> > https://cloud-ceph-performance-tests.toolforge.org/
> > 
> > We are currently running Octopus v15.2.11.
> > 
> > It's a very 'bare' ui (under construction), but check the
> > 'after_ceph_upgrade_v2' for example, the 'vm_disk' suite, the
> > 'RunConfig(rw=randread, bs=4096, ioengine=libaio, iodepth=1)' or
> > 'RunConfig(rw=randwrite, bs=4096, ioengine=libaio, iodepth=1)' tests
> > that hit the cap.
> > 
> > From there you can also see the numbers of the tests running uncapped
> > (in the 'rbd_from_hypervisor' or 'rbd_from_osd'
> > suites).
> > 
> > You can see the current iops of our ceph cluster here:
> > https://grafana.wikimedia.org/d/7TjJENEWz/wmcs-ceph-eqiad-cluster-overview?orgId=1
> > 
> > Of our openstack setup:
> > https://grafana.wikimedia.org/d/00579/wmcs-openstack-eqiad1?orgId=1=15m
> > 
> > And some details on the traffic openstack puts on each ceph osd host
> > here:
> > https://grafana.wikimedia.org/d/wsoKtElZk/wmcs-ceph-eqiad-network-utilization?orgId=1=5m
> > 
> > We are working on revamping those graphs right now, so it might become
> > easier to see numbers in a few weeks.
> > 
> > 
> > We don't usually see slow ops with the current load, though we
> > recommend not using ceph for very latency sensitive VMs
> > (like etcd), as on the network layer there's some hardware limits we
> > can't remove right now.
> > 
> > Hope that helps.
> > 
> > On 06/10 10:54, Marcel Kuiper wrote:
> > > Hi
> > > 
> > > We're running ceph nautilus 14.2.21 (going to octopus latest in a
> > > few weeks)
> > > as volume and instance backend for our openstack vm's. Our clusters
> > > run
> > > somewhere between 500 - 1000 OSDs on SAS HDDs with NVMe's as journal
> > > and db
> > > device
> > > 
> > > Currently we do not have our vm's capped on iops and throughput. We
> > > regularly get slowops warnings (once or twice per day) and wonder
> > > whether
> > > there are more users with sort of the same setup that do throttle
> > > their
> > > openstack vm's.
> > > 
> > > - What kind of numbers are used in the field for IOPS and throughput
> > > limiting?
> > > 
> > > - As a side question, is there an easy way to get rid of the slowops
> > > warning
> > > besides restarting the involved osd. Otherwise the warning seems to
> > > stay
> > > forever
> > > 
> > > Regards
> > > 
> > > Marcel
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph and openstack throttling experience

2021-06-10 Thread David Caro
We have a similar setup, way smaller though (~120 osds right now) :)

We have VMs with different caps, but most have a 500 write / 1000 read iops cap; you
can see it in effect here:
https://cloud-ceph-performance-tests.toolforge.org/

We are currently running Octopus v15.2.11.

It's a very 'bare' ui (under construction), but check the 
'after_ceph_upgrade_v2' for example, the 'vm_disk' suite, the
'RunConfig(rw=randread, bs=4096, ioengine=libaio, iodepth=1)' or
'RunConfig(rw=randwrite, bs=4096, ioengine=libaio, iodepth=1)' tests that hit 
the cap.

From there you can also see the numbers of the tests running uncapped (in the 
'rbd_from_hypervisor' or 'rbd_from_osd'
suites).

You can see the current iops of our ceph cluster here:
https://grafana.wikimedia.org/d/7TjJENEWz/wmcs-ceph-eqiad-cluster-overview?orgId=1

Of our openstack setup:
https://grafana.wikimedia.org/d/00579/wmcs-openstack-eqiad1?orgId=1=15m

And some details on the traffic openstack puts on each ceph osd host here:
https://grafana.wikimedia.org/d/wsoKtElZk/wmcs-ceph-eqiad-network-utilization?orgId=1=5m

We are working on revamping those graphs right now, so it might become easier 
to see numbers in a few weeks.


We don't usually see slow ops with the current load, though we recommend not 
using ceph for very latency sensitive VMs
(like etcd), as on the network layer there's some hardware limits we can't 
remove right now.

Hope that helps.

On 06/10 10:54, Marcel Kuiper wrote:
> Hi
> 
> We're running ceph nautilus 14.2.21 (going to octopus latest in a few weeks)
> as volume and instance backend for our openstack vm's. Our clusters run
> somewhere between 500 - 1000 OSDs on SAS HDDs with NVMe's as journal and db
> device
> 
> Currently we do not have our vm's capped on iops and throughput. We
> regularly get slowops warnings (once or twice per day) and wonder whether
> there are more users with sort of the same setup that do throttle their
> openstack vm's.
> 
> - What kind of numbers are used in the field for IOPS and throughput
> limiting?
> 
> - As a side question, is there an easy way to get rid of the slowops warning
> besides restarting the involved osd. Otherwise the warning seems to stay
> forever
> 
> Regards
> 
> Marcel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Integration of openstack to ceph

2021-06-10 Thread David Caro

Hi, we are working on doing something similar, and there are mainly two ways we
integrate it:

* cinder (openstack project) and rbd (ceph), for volumes; this has been working
well for a while.
* swift (openstack project) and rgw (ceph), for object storage; this is under
evaluation.

You might be able to use a different integration that skips the openstack project
layer, but we have that as a requirement. The openstack project layer also allows
quota and user management on the openstack side, so it's easier
to adopt for us.

Let us know if you find another way, and how it goes for you :)
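
For the cinder+rbd side, the ceph part of the setup is roughly this (only a
sketch; pool name, pg count and caps are examples, the cinder.conf side is
documented in the OpenStack RBD driver guides):

```
# pool that cinder will use for volumes
ceph osd pool create volumes 128
ceph osd pool application enable volumes rbd

# cephx user for cinder
ceph auth get-or-create client.cinder \
    mon 'profile rbd' \
    osd 'profile rbd pool=volumes'
```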

On 06/10 10:06, Michel Niyoyita wrote:
> Dear Ceph Users,
> 
> Anyone can help on the guidance of how I can integrate ceph to openstack ?
> especially RGW.
> 
> Regards
> 
> Michel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to find out why osd crashed with cephadm/podman containers?

2021-05-06 Thread David Caro
On 05/06 14:03, mabi wrote:
> Hello,
> 
> I have a small 6-node Octopus 15.2.11 cluster installed on bare metal with
> cephadm, and I added a second OSD to one of my 3 OSD nodes. I then started
> copying data to my ceph fs mounted with a kernel mount, but then both OSDs on
> that specific node crashed.
> 
> To this topic I have the following questions:
> 
> 1) How can I find out why the two OSDs crashed? Because everything is in
> podman containers I don't know where the logs are to find out why
> this happened. From the OS itself everything looks ok, there was no
> out-of-memory error.

There should be some logs under /var/log/ceph/<fsid>/osd.<id>/ on
the host/hosts that were running the osds.
I have sometimes found myself disabling the '--rm' flag for the pod in the
'unit.run' script under /var/lib/ceph/<fsid>/osd.<id>/unit.run to make podman
persist the container and be able to do a 'podman logs' on it.
Though that's probably sensible only when debugging.
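
Roughly, the places I'd look (a sketch; <fsid> and <id> are placeholders):

```
# journal of the osd unit on the host that was running it
journalctl -u ceph-<fsid>@osd.<id>.service

# file logs, if log_to_file is enabled
ls /var/log/ceph/<fsid>/

# or the container output itself (wraps 'podman logs')
cephadm logs --name osd.<id>
```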

> 
> 2) I would assume the two OSD containers would restart on their own, but it
> looks like this is not the case. How can I manually restart these 2 OSD
> containers on that node? I believe this should be a "cephadm orch" command?

I think 'ceph orch daemon redeploy' might do it? What is the output of 'ceph 
orch ls' and 'ceph orch ps'?
> 
> The health of the cluster right now is:
> 
> CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
> PG_DEGRADED: Degraded data redundancy: 132518/397554 objects degraded 
> (33.333%), 65 pgs degraded, 65 pgs undersized
> 
> Thank your for your hints.
> 
> Best regards,
> Mabi
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-05 Thread David Caro

I think that the recovery might be blocked due to all those PGs in inactive 
state:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/administration_guide/monitoring-a-ceph-storage-cluster#identifying-stuck-placement-groups_admin

"""
 Inactive: Placement groups cannot process reads or writes because they are 
waiting for an OSD with the most up-to-date data to come back up.
"""

What is your pool configuration? And other configs?

Can you send the output of "ceph config dump" and "ceph osd pool ls detail"?
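
A few more things that could help pinpoint why those PGs are incomplete/inactive
(a sketch; the pg id is just an example):

```
ceph health detail

# list the stuck pgs and the osds they map to
ceph pg dump_stuck inactive

# query one of the incomplete pgs to see what it is waiting for
ceph pg 2.1a query
```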


On 05/05 11:00, Andres Rojas Guerrero wrote:
> Yes, the principal problem is the MDS start to report slowly and the
> information is no longer accessible, and the cluster never recover.
> 
> 
> # ceph status
>   cluster:
> id: c74da5b8-3d1b-483e-8b3a-739134db6cf8
> health: HEALTH_WARN
> 2 clients failing to respond to capability release
> 2 MDSs report slow metadata IOs
> 1 MDSs report slow requests
> 2 MDSs behind on trimming
> Reduced data availability: 238 pgs inactive, 8 pgs down, 230
> pgs incomplete
> Degraded data redundancy: 1400453/220552172 objects degraded
> (0.635%), 461 pgs degraded, 464 pgs undersized
> 241 slow ops, oldest one blocked for 638 sec, daemons
> [osd.101,osd.127,osd.155,osd.166,osd.172,osd.189,osd.200,osd.210,osd.214,osd.233]...
> have slow ops.
> 
>   services:
> mon: 3 daemons, quorum ceph2mon01,ceph2mon02,ceph2mon03 (age 25h)
> mgr: ceph2mon02(active, since 6d), standbys: ceph2mon01, ceph2mon03
> mds: nxtclfs:2 {0=ceph2mon01=up:active,1=ceph2mon02=up:active} 1
> up:standby
> osd: 768 osds: 736 up (since 11m), 736 in (since 95s); 416 remapped pgs
> 
>   data:
> pools:   2 pools, 16384 pgs
> objects: 33.40M objects, 39 TiB
> usage:   63 TiB used, 2.6 PiB / 2.6 PiB avail
> pgs: 1.489% pgs not active
>  1400453/220552172 objects degraded (0.635%)
>  15676 active+clean
>  285   active+undersized+degraded+remapped+backfill_wait
>  230   incomplete
>  176   active+undersized+degraded+remapped+backfilling
>  8 down
>  6 peering
>  3 active+undersized+remapped
> 
> On 5/5/21 at 10:54, David Caro wrote:
> > 
> > Can you share more information?
> > 
> > The output of 'ceph status' when the osd is down would help, also 'ceph 
> > health detail' could be useful.
> > 
> > On 05/05 10:48, Andres Rojas Guerrero wrote:
> >> Hi, I have a Nautilus cluster, version 14.2.6, and I have noticed that
> >> when some OSDs go down the cluster doesn't start recovering. I have checked
> >> that the noout option is unset.
> >>
> >> What could be the reason for this behavior?
> >>
> >>
> >>
> >> -- 
> >> ***
> >> Andrés Rojas Guerrero
> >> Unidad Sistemas Linux
> >> Area Arquitectura Tecnológica
> >> Secretaría General Adjunta de Informática
> >> Consejo Superior de Investigaciones Científicas (CSIC)
> >> Pinar 19
> >> 28006 - Madrid
> >> Tel: +34 915680059 -- Ext. 990059
> >> email: a.ro...@csic.es
> >> ID comunicate.csic.es: @50852720l:matrix.csic.es
> >> ***
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > 
> 
> -- 
> ***
> Andrés Rojas Guerrero
> Unidad Sistemas Linux
> Area Arquitectura Tecnológica
> Secretaría General Adjunta de Informática
> Consejo Superior de Investigaciones Científicas (CSIC)
> Pinar 19
> 28006 - Madrid
> Tel: +34 915680059 -- Ext. 990059
> email: a.ro...@csic.es
> ID comunicate.csic.es: @50852720l:matrix.csic.es
> ***

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-05 Thread David Caro

Can you share more information?

The output of 'ceph status' when the osd is down would help, also 'ceph health 
detail' could be useful.

On 05/05 10:48, Andres Rojas Guerrero wrote:
> Hi, I have a Nautilus cluster, version 14.2.6, and I have noticed that
> when some OSDs go down the cluster doesn't start recovering. I have checked
> that the noout option is unset.
> 
> What could be the reason for this behavior?
> 
> 
> 
> -- 
> ***
> Andrés Rojas Guerrero
> Unidad Sistemas Linux
> Area Arquitectura Tecnológica
> Secretaría General Adjunta de Informática
> Consejo Superior de Investigaciones Científicas (CSIC)
> Pinar 19
> 28006 - Madrid
> Tel: +34 915680059 -- Ext. 990059
> email: a.ro...@csic.es
> ID comunicate.csic.es: @50852720l:matrix.csic.es
> ***
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cannot create issue in bugtracker

2021-05-03 Thread David Caro

I created an issue during the weekend without problems:

https://tracker.ceph.com/issues/50604


On 05/03 09:36, Tobias Urdin wrote:
> Hello,
> 
> Anybody? Still getting the error?
> 
> 
> Best regards
> 
> -
> 
> 
> Internal error
> An error occurred on the page you were trying to access.
> If you continue to experience problems please contact your Redmine 
> administrator for assistance.
> 
> If you are the Redmine administrator, check your log files for details about 
> the error.
> 
> 
> From: Tobias Urdin 
> Sent: Friday, April 30, 2021 2:52:57 PM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Cannot create issue in bugtracker
> 
> Hello,
> 
> 
> Is it only me that's getting an Internal error when trying to create issues in
> the bugtracker, for the past day or two?
> 
> https://tracker.ceph.com/issues/new
> 
> 
> Best regards
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: what-does-nosuchkey-error-mean-while-subscribing-for-notification-in-ceph

2021-04-16 Thread David Caro

What does notif.xml have in it?

Looking at the docs you linked, I'd say that it does not find the `S3Key` in
that xml for whatever reason.
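
For comparison, the body for that endpoint normally follows the AWS S3
notification schema, something roughly like this (only a sketch from memory, the
topic/ids/prefix here are made up; check it against your notif.xml):

```
cat > notif.xml <<'EOF'
<NotificationConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <TopicConfiguration>
    <Id>my-notif</Id>
    <Topic>arn:aws:sns:default::my-topic</Topic>
    <Event>s3:ObjectCreated:*</Event>
    <Filter>
      <S3Key>
        <FilterRule><Name>prefix</Name><Value>images/</Value></FilterRule>
      </S3Key>
    </Filter>
  </TopicConfiguration>
</NotificationConfiguration>
EOF
```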

On 04/16 06:54, Szabo, Istvan (Agoda) wrote:
> Hi,
> 
> 
> I am trying to follow this url 
> https://docs.ceph.com/en/latest/radosgw/s3/bucketops/#create-notification
> 
> to create a publisher for my bucket into a topic.
> 
> My curl:
> 
> curl -v -H 'Date: Fri, 16 Apr 2021 05:21:14 +' -H 'Authorization: AWS 
> accessid:secretkey' -L -H 'content-type: text/xml' -H 'Content-MD5: 
> pBRX39Oo7aAUYbilIYMoAw==' -T notif.xml http://ceph:8080/vig-test?notification
> 
> and it returns me this error
> 
> 
> 
> 
> 
> <Error>
>   <Code>NoSuchKey</Code>
>   <BucketName>vig-test</BucketName>
>   <RequestId>tx0016ac570-0060791ecb-1c7e96b-hkg</RequestId>
>   <HostId>1c7e96b-hkg-data</HostId>
> </Error>
> 
> 
> 
> 
> Does anybody know what does this error mean in Ceph? How can I proceed?
> 
> 
> Thank you
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to handle bluestore fragmentation

2021-04-15 Thread David Caro

In the thread "s3 requires twice the space it should use", Boris pointed
out that the fragmentation for the osds is around 0.8-0.9:


> On Thu, Apr 15, 2021 at 8:06 PM Boris Behrens  wrote:
>> I also checked the fragmentation on the bluestore OSDs and it is around
>> 0.80 - 0.89 on most OSDs. yikes.
>> [root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block
>> {
>> "fragmentation_rating": 0.85906054329923576
>> }


And that made me wonder what the current recommended (and not recommended)
way is to handle and reduce the fragmentation of existing OSDs.

Reading around I would think of tweaking the min_alloc_size_{ssd,hdd} options and
redeploying those OSDs, but I was unable to find much else; I wonder what
people do?
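
For reference, the bits I'm looking at so far (a sketch; the option names/values
are what I understand them to be, corrections welcome):

```
# current fragmentation of an osd (as in Boris' thread)
ceph daemon osd.23 bluestore allocator score block

# the allocation size new bluestore osds would be built with
ceph config get osd bluestore_min_alloc_size_hdd
ceph config get osd bluestore_min_alloc_size_ssd

# changing it only affects newly created osds, e.g.:
#   ceph config set osd bluestore_min_alloc_size_hdd 4096
# after which the osd would have to be destroyed and redeployed to pick it up
```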


ps. There was another thread that got no replies asking something similar (and
a bunch of other things):
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/3PITWZRNX7RFRQNG33VSNKYGOO2IFMZG/


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v14.2.19 Nautilus released

2021-03-30 Thread David Caro
Thanks for the quick release! \o/

On Tue, 30 Mar 2021, 22:30 David Galloway,  wrote:

> This is the 19th update to the Ceph Nautilus release series. This is a
> hotfix release to prevent daemons from binding to loopback network
> interfaces. All nautilus users are advised to upgrade to this release.
>
> Notable Changes
> ---
>
> * This release fixes a regression introduced in v14.2.18 whereby in
> certain environments, OSDs will bind to 127.0.0.1.  See
> https://tracker.ceph.com/issues/49938.
>
> Getting Ceph
> 
> * Git at git://github.com/ceph/ceph.git
> * Tarball at http://download.ceph.com/tarballs/ceph-14.2.19.tar.gz
> * For packages, see http://docs.ceph.com/docs/master/install/get-packages/
> * Release git sha1: bb796b9b5bab9463106022eef406373182465d11
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 14.2.17 ceph-mgr module issue

2021-03-12 Thread David Caro

I might be wrong, but maybe the containers are missing something?
The easiest way to check is accessing those directly, but from the looks of it
it seems like a python packaging/installation issue.


Adding more info like 'ceph versions' and 'docker images'/'docker ps' might
also help figure out what the issue is.


On 03/12 16:33, Marc wrote:
> 
> Python3? 14.2.11 is still supporting python2; I can't imagine that a minor
> update has such a change. Furthermore, wasn't el7 officially supported?
> 
> 
> 
> > -Original Message-----
> > From: David Caro 
> > Sent: 12 March 2021 17:28
> > To: Stefan Kooman 
> > Cc: ceph-users@ceph.io
> > Subject: [ceph-users] Re: Ceph 14.2.17 ceph-mgr module issue
> > 
> > That looks like a python version issue (running python2 when it should
> > use python3).
> > Are the container images you use available for the rest of the world to
> > look into?
> > If so you can share the image, that would help debugging.
> > 
> > If not, I suggest checking the python version in the containers.
> > 
> > On 03/12 17:19, Stefan Kooman wrote:
> > > Hi,
> > >
> > > After upgrading a Ceph cluster to 14.2.17 with ceph-ansible (docker
> > > containers) the manager hits an issue:
> > >
> > > Module 'volumes' has failed dependency: No module named typing, python
> > > trace:
> > >
> > > 2021-03-12 17:04:22.358 7f299ac75e40 1 mgr[py] Loading python module
> > > 'volumes'
> > > 2021-03-12 17:04:22.458 7f299ac75e40 -1 mgr[py] Module not found:
> > 'volumes'
> > > 2021-03-12 17:04:22.458 7f299ac75e40 -1 mgr[py] Traceback (most recent
> > call
> > > last):
> > > File "/usr/share/ceph/mgr/volumes/__init__.py", line 2, in 
> > > from .module import Module
> > > File "/usr/share/ceph/mgr/volumes/module.py", line 10, in 
> > > from .fs.volume import VolumeClient
> > > File "/usr/share/ceph/mgr/volumes/fs/volume.py", line 13, in 
> > > from .operations.subvolume import open_subvol, create_subvol,
> > remove_subvol,
> > > \
> > > File "/usr/share/ceph/mgr/volumes/fs/operations/subvolume.py", line 8,
> > in
> > > 
> > > from .versions import loaded_subvolumes
> > > File "/usr/share/ceph/mgr/volumes/fs/operations/versions/__init__.py",
> > line
> > > 9, in 
> > > from .subvolume_v1 import SubvolumeV1
> > > File
> > "/usr/share/ceph/mgr/volumes/fs/operations/versions/subvolume_v1.py",
> > > line 9, in 
> > > from typing import List, Dict
> > > ImportError: No module named typing
> > >
> > > I created tracker [1] for this issue with this information. Not sure if
> > > non-containerized deployments hit this issue as well. I will find that out
> > > sometime next week.
> > >
> > > FYI,
> > >
> > > Gr. Stefan
> > >
> > > [1]: https://tracker.ceph.com/issues/49770
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > 
> > --
> > David Caro
> > SRE - Cloud Services
> > Wikimedia Foundation <https://wikimediafoundation.org/>
> > PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3
> > 
> > "Imagine a world in which every single human being can freely share in
> > the
> > sum of all knowledge. That's our commitment."

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 14.2.17 ceph-mgr module issue

2021-03-12 Thread David Caro
That looks like a python version issue (running python2 when it should use 
python3).
Are the container images you use available for the rest of the world to look 
into?
If so you can share the image, that would help debugging.

If not, I suggest checking the python version in the containers.
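
Something like this should quickly tell which python the mgr gets inside the
container (a sketch; the container name is just an example from 'docker ps'):

```
# find the mgr container
docker ps | grep mgr

# check the python it ships and whether 'typing' is importable
docker exec <mgr-container> python3 --version
docker exec <mgr-container> python3 -c 'from typing import List; print("ok")'
```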

On 03/12 17:19, Stefan Kooman wrote:
> Hi,
> 
> After upgrading a Ceph cluster to 14.2.17 with ceph-ansible (docker
> containers) the manager hits an issue:
> 
> Module 'volumes' has failed dependency: No module named typing, python
> trace:
> 
> 2021-03-12 17:04:22.358 7f299ac75e40 1 mgr[py] Loading python module
> 'volumes'
> 2021-03-12 17:04:22.458 7f299ac75e40 -1 mgr[py] Module not found: 'volumes'
> 2021-03-12 17:04:22.458 7f299ac75e40 -1 mgr[py] Traceback (most recent call
> last):
> File "/usr/share/ceph/mgr/volumes/__init__.py", line 2, in 
> from .module import Module
> File "/usr/share/ceph/mgr/volumes/module.py", line 10, in 
> from .fs.volume import VolumeClient
> File "/usr/share/ceph/mgr/volumes/fs/volume.py", line 13, in 
> from .operations.subvolume import open_subvol, create_subvol, remove_subvol,
> \
> File "/usr/share/ceph/mgr/volumes/fs/operations/subvolume.py", line 8, in
> 
> from .versions import loaded_subvolumes
> File "/usr/share/ceph/mgr/volumes/fs/operations/versions/__init__.py", line
> 9, in 
> from .subvolume_v1 import SubvolumeV1
> File "/usr/share/ceph/mgr/volumes/fs/operations/versions/subvolume_v1.py",
> line 9, in 
> from typing import List, Dict
> ImportError: No module named typing
> 
> I created tracker [1] for this issue with this information. Not sure if non
> containerized deployments hit this issue as well. I will find that out
> sometime next week.
> 
> FYI,
> 
> Gr. Stefan
> 
> [1]: https://tracker.ceph.com/issues/49770
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Metadata for LibRADOS

2021-03-04 Thread David Caro
 > > > > > 
> > > > > > > Are there any features like this in libRADOS?
> > > > > > > 
> > > > > > > Thank you
> > > > > > > ___
> > > > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > > > > > 
> > > > > > 
> > > > > > --
> > > > > > 
> > > > > > Matt Benjamin
> > > > > > Red Hat, Inc.
> > > > > > 315 West Huron Street, Suite 140A
> > > > > > Ann Arbor, Michigan 48103
> > > > > > 
> > > > > > http://www.redhat.com/en/technologies/storage
> > > > > > 
> > > > > > tel.  734-821-5101
> > > > > > fax.  734-769-8938
> > > > > > cel.  734-216-5309
> > > > > > 
> > > > 
> > > > --
> > > > 
> > > > Matt Benjamin
> > > > Red Hat, Inc.
> > > > 315 West Huron Street, Suite 140A
> > > > Ann Arbor, Michigan 48103
> > > > 
> > > > http://www.redhat.com/en/technologies/storage
> > > > 
> > > > tel.  734-821-5101
> > > > fax.  734-769-8938
> > > > cel.  734-216-5309
> > > > 
> > > > 
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > 
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Need Clarification on Maintenance Shutdown Procedure

2021-03-02 Thread David Caro
On 03/01 21:41, Dave Hall wrote:
> Hello,
> 
> I've had a look at the instructions for clean shutdown given at
> https://ceph.io/planet/how-to-do-a-ceph-cluster-maintenance-shutdown/, but
> I'm not clear about some things on the steps about shutting down the
> various Ceph components.
> 
> For my current 3-node cluster I have MONs, MDSs, MGRs, and OSDs all running
> on the same nodes.  Also, this is a non-container installation.
> 
> Since I don't have separate dedicated nodes, as described in the referenced
> web page, I think  the instructions mean that I need to issue SystemD
> commands to stop the corresponding services/targets on each node for the
> Ceph components mentioned in each step.

Yep, the systemd units are usually named 'ceph-<type>@<id>', for example
'ceph-osd@45' would be the systemd unit for osd.45.

> 
> Since we want to bring services up in the right order, I should also use
> SystemD commands to disable these services/targets so they don't
> automatically restart when I power the nodes back on.  After power-on, I
> would then re-enable and manually start services/targets in the order
> described.

Also yes, and if you use some configuration management or similar that might
bring them up automatically you might want to disable it temporarily too.
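
As a rough sketch of the whole sequence (the flags are the ones from that
procedure; adapt the unit/target names to your installation):

```
# before shutting anything down
ceph osd set noout
ceph osd set norecover
ceph osd set norebalance
ceph osd set nobackfill
ceph osd set nodown
ceph osd set pause

# then, on each node, stop (and optionally disable) the daemons
systemctl stop ceph-mds.target                    # service daemons (mds/rgw) first
systemctl stop ceph-osd.target                    # then the osds
systemctl stop ceph-mon.target ceph-mgr.target    # mons/mgrs last

# on the way back up: start mons/mgrs first, then osds, then mds/rgw,
# and unset the flags in reverse order, e.g. 'ceph osd unset pause', etc.
```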

> 
> One other specific question:  For step 4 it says to shut down my service
> nodes.  Does this mean my MDSs?  (I'm not running any Object Gateways or
> NFS, but I think these would go in this step as well?)

Yes, that is correct. Monitor would be the MONs, and admin the MGRs.

> 
> Please let me know if I've got this right.  The cluster contains 200TB of a
> researcher's data that has taken a year to collect, so caution is needed.

Can you share a bit more about your setup? Are you using replicas? How many?
Erasure coding? ('ceph osd pool ls detail', 'ceph osd status' or similar can
help too).


I would recommend trying to get the hang of the process in a test environment
first.

Cheers!

> 
> Thanks.
> 
> -Dave
> 
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-20 Thread David Caro

Have you tried just using them?
(RO, if you do RW things might go crazy, would be nice to try though).

You might be able to create a clone too, and I guess worst case just cp/deep
cp.

I'm interested in your findings btw. I'd be grateful if you share them :)
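
If you do try it, this is roughly where I'd start (just a sketch, pool/image/
snapshot names are made up; cloning may require the snapshot to be protected
depending on your clone-format settings):

```
# on the remote cluster: see the snapshots rbd-mirror created
rbd snap ls rbd/myimage

# read-only access without touching the image
rbd device map rbd/myimage@mysnap --read-only
# ...or just export it
rbd export rbd/myimage@mysnap /tmp/myimage.img

# full independent copy
rbd deep cp rbd/myimage@mysnap rbd/myimage-copy

# copy-on-write clone from the snapshot
rbd clone rbd/myimage@mysnap rbd/myimage-clone
```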


Thanks!

On 01/20 14:23, Adam Boyhan wrote:
> I have been doing some testing with RBD-Mirror Snapshots to a remote Ceph 
> cluster. 
> 
> Does anyone know if the images on the remote cluster can be utilized in
> any way? Would love the ability to clone them; even read-only would be nice.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Increase number of objects in flight during recovery

2020-12-03 Thread David Caro
Hi Frank,

out of curiosity, can you share the recovery rates you are seeing?
I would appreciate it, thanks!

On 12/03 09:44, Frank Schilder wrote:
> Hi Janne,
> 
> Looked at it already. The recovery rate is unbearably slow and I would like
> to increase it. The % of misplaced objects is decreasing unnecessarily slowly.
> 
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> 
> 
> From: Janne Johansson 
> Sent: 03 December 2020 10:41:29
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Increase number of objects in flight during recovery
> 
> On Thu, 3 Dec 2020 at 10:11, Frank Schilder
> mailto:fr...@dtu.dk>> wrote:
> I have the opposite problem as discussed in "slow down keys/s in recovery". I 
> need to increase the number of objects in flight during rebalance. It is 
> already all remapped PGs in state backfilling, but it looks like no more than 
> 8 objects/sec are transferred per PG at a time. The pool sits on 
> high-performance SSDs and could easily handle a transfer of 100 or more 
> objects/sec simultaneously. Is there any way to increase the number of 
> transfers/sec or simultaneous transfers? Increasing the options 
> osd_max_backfills and osd_recovery_max_active has no effect.
> Background: The pool in question (con-fs2-meta2) is the default data pool of 
> a ceph fs, which stores exclusively the kind of meta data that goes into this 
> pool. Storage consumption is reported as 0, but the number of objects is huge:
> 
> I don't run cephfs so it might not map 100%, but I think that pools for which 
> ceph stores file/object metadata (radosgw pools in my case) will show a 
> completely "false" numbers while recovering, which I think is because there 
> are tons of object metadata applied as metadata on 0-sized objects. This 
> means recovery will look like it does one object per second or something, 
> while in fact it does 100s of metadatas on that one object but the recovery 
> doesn't list this. Also, it made old ceph df and rados df say "this pool is 
> almost empty" but when you try to dump or move the pool it takes far longer 
> than it should take to move an almost-empty pool. And the pool dump gets huge.
> 
> I would take a look at iostat output for those OSD drives and see if there 
> are 8 iops or lots more actually.
> 
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Misleading error (osd has already bound to class) when starting osd on nautilus?

2020-11-25 Thread David Caro

Forwarding here in case anyone is seeing the same/similar issue, Amit gave
really good pointers and a workaround :)


Thanks Amit!
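
To summarize the workaround for the archive (a sketch; the device and osd id are
the ones from my case):

```
# what the kernel reports for the device backing the osd
cat /sys/block/sdd/queue/rotational

# override it so the osd detects itself as ssd on start (Amit's suggestion)
echo 0 > /sys/block/sdd/queue/rotational

# and the crush side can be fixed explicitly if it ever ends up mislabeled
ceph osd crush rm-device-class osd.44
ceph osd crush set-device-class ssd osd.44
```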


On 11/25 16:08, Amit Ghadge wrote:
> Yes, and if you want to avoid it in the future, update this flag to 0 with:
> $ echo 0 > /sys/block/sdx/queue/rotational
> 
> Thanks
> 
> On Wed, Nov 25, 2020 at 4:03 PM David Caro  wrote:
> 
> >
> > Yep, you are right:
> >
> > ```
> > # cat /sys/block/sdd/queue/rotational
> > 1
> > ```
> >
> > I was looking at the code too but you got there before me :)
> >
> > https://github.com/ceph/ceph/blob/25ac1528419371686740412616145703810a561f/src/common/blkdev.cc#L222
> >
> >
> > It might be an issue with the driver then reporting the wrong data. I'll
> > look
> > into it.
> >
> > Do you mind if I reply on the list with this info? (or if you want you
> > reply)
> > I think this might help others too (and myself in the future xd)
> >
> > Thanks Amit!
> >
> > On 11/25 15:50, Amit Ghadge wrote:
> > > This might happen when the disk defaults to 1
> > > in /sys/block/sdx/queue/rotational (1 for HDD and 0 for SSD), but we have
> > > not seen any problem till now.
> > >
> > > -AmitG
> > >
> > > On Wed, Nov 25, 2020 at 3:08 PM David Caro  wrote:
> > >
> > > >
> > > > Hi!
> > > >
> > > > I have a nautilus ceph cluster, and today I restarted one of the osd
> > > > daemons
> > > > and spent some time trying to debug an error I was seeing in the log,
> > > > though it
> > > > seems the osd is actually working.
> > > >
> > > >
> > > > The error I was seeing is:
> > > > ```
> > > > Nov 25 09:07:43 osd15 systemd[1]: Starting Ceph object storage daemon
> > > > osd.44...
> > > > Nov 25 09:07:43 osd15 systemd[1]: Started Ceph object storage daemon
> > > > osd.44.
> > > > Nov 25 09:07:47 osd15 ceph-osd[12230]: 2020-11-25 09:07:47.846
> > > > 7f55395fbc80 -1 osd.44 106947 log_to_monitors {default=true}
> > > > Nov 25 09:07:47 osd15 ceph-osd[12230]: 2020-11-25 09:07:47.850
> > > > 7f55395fbc80 -1 osd.44 106947 mon_cmd_maybe_osd_create fail: 'osd.44
> > has
> > > > already bound to class 'ssd', can not reset class to 'hdd'; use 'ceph
> > osd
> > > > crush rm-device-class ' to remove old class first': (16) Device or
> > > > resource busy
> > > > ```
> > > >
> > > > There's no other messages in the journal so at first I thought that
> > the osd
> > > > failed to start.
> > > > But it seems to be up and working correctly anyhow.
> > > >
> > > > There's no "hdd" class in my crush map:
> > > > ```
> > > > # ceph osd crush class ls
> > > > [
> > > > "ssd"
> > > > ]
> > > > ```
> > > >
> > > > And that osd is actually of the correct class:
> > > > ```
> > > > # ceph osd crush get-device-class osd.44
> > > > ssd
> > > > ```
> > > >
> > > > ```
> > > > # uname -a
> > > > Linux osd15 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1
> > (2020-06-07)
> > > > x86_64 GNU/Linux
> > > >
> > > > # ceph --version
> > > > ceph version 14.2.5-1-g23e76c7aa6
> > > > (23e76c7aa6e15817ffb6741aafbc95ca99f24cbb) nautilus (stable)
> > > > ```
> > > >
> > > > The osd shows up in the cluster and it's receiving load, so there
> > seems to
> > > > be
> > > > no problem, but does anyone know what that error is about?
> > > >
> > > >
> > > > Thanks!
> > > >
> > > >
> > > > --
> > > > David Caro
> > > > SRE - Cloud Services
> > > > Wikimedia Foundation <https://wikimediafoundation.org/>
> > > > PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3
> > > >
> > > > "Imagine a world in which every single human being can freely share in
> > the
> > > > sum of all knowledge. That's our commitment."
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> >
> > --
> > David Caro
> > SRE - Cloud Services
> > Wikimedia Foundation <https://wikimediafoundation.org/>
> > PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3
> >
> > "Imagine a world in which every single human being can freely share in the
> > sum of all knowledge. That's our commitment."
> >

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Misleading error (osd has already bound to class) when starting osd on nautilus?

2020-11-25 Thread David Caro

Hi!

I have a nautilus ceph cluster, and today I restarted one of the osd daemons
and spent some time trying to debug an error I was seeing in the log, though it
seems the osd is actually working.


The error I was seeing is:
```
Nov 25 09:07:43 osd15 systemd[1]: Starting Ceph object storage daemon osd.44...
Nov 25 09:07:43 osd15 systemd[1]: Started Ceph object storage daemon osd.44.
Nov 25 09:07:47 osd15 ceph-osd[12230]: 2020-11-25 09:07:47.846 7f55395fbc80 -1 
osd.44 106947 log_to_monitors {default=true}
Nov 25 09:07:47 osd15 ceph-osd[12230]: 2020-11-25 09:07:47.850 7f55395fbc80 -1 
osd.44 106947 mon_cmd_maybe_osd_create fail: 'osd.44 has already bound to class 
'ssd', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' 
to remove old class first': (16) Device or resource busy
```

There are no other messages in the journal, so at first I thought that the osd
failed to start.
But it seems to be up and working correctly anyhow.

There's no "hdd" class in my crush map:
```
# ceph osd crush class ls
[
"ssd"
]
```

And that osd is actually of the correct class:
```
# ceph osd crush get-device-class osd.44
ssd
```

```
# uname -a
Linux osd15 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 
GNU/Linux

# ceph --version
ceph version 14.2.5-1-g23e76c7aa6 (23e76c7aa6e15817ffb6741aafbc95ca99f24cbb) 
nautilus (stable)
```

The osd shows up in the cluster and it's receiving load, so there seems to be
no problem, but does anyone know what that error is about?


Thanks!


-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Monitor persistently out-of-quorum

2020-10-29 Thread David Caro
] **
> PriorityFiles   Size Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) 
> Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) 
> Comp(cnt) Avg(sec) KeyIn KeyDrop
> ---
> User  0/00.00 KB   0.0  0.0 0.0  0.0   0.0  0.0   
> 0.0   0.0  0.0  1.4  0.00  0.00 1
> 0.001   0  0
> Uptime(secs): 0.0 total, 0.0 interval
> Flush(GB): cumulative 0.000, interval 0.000
> AddFile(GB): cumulative 0.000, interval 0.000
> AddFile(Total Files): cumulative 0, interval 0
> AddFile(L0 Files): cumulative 0, interval 0
> AddFile(Keys): cumulative 0, interval 0
> Cumulative compaction: 0.00 GB write, 0.22 MB/s write, 0.00 GB read, 0.00 
> MB/s read, 0.0 seconds
> Interval compaction: 0.00 GB write, 0.22 MB/s write, 0.00 GB read, 0.00 MB/s 
> read, 0.0 seconds
> Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 
> level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for 
> pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 
> memtable_compaction, 0 memtable_slowdown, interval 0 total count
> 
> ** File Read Latency Histogram By Level [default] **
> 
> ** Compaction Stats [default] **
> Level    Files   Size Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) 
> Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) 
> Avg(sec) KeyIn KeyDrop
> 
>   L0  2/02.61 KB   0.5  0.0 0.0  0.0   0.0  0.0   
> 0.0   1.0  0.0  1.4  0.00  0.00 1
> 0.001   0  0
>   L6  1/02.39 KB   0.0  0.0 0.0  0.0   0.0  0.0   
> 0.0   0.0  0.0  0.0  0.00  0.00 0
> 0.000   0  0
>  Sum  3/05.00 KB   0.0  0.0 0.0  0.0   0.0  0.0   
> 0.0   1.0  0.0  1.4  0.00  0.00 1
> 0.001   0  0
>  Int  0/00.00 KB   0.0  0.0 0.0  0.0   0.0  0.0   
> 0.0   0.0  0.0  0.0  0.00  0.00 0
> 0.000   0  0
> 
> ** Compaction Stats [default] **
> Priority    Files   Size Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) 
> Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) 
> Comp(cnt) Avg(sec) KeyIn KeyDrop
> ---
> User  0/00.00 KB   0.0  0.0 0.0  0.0   0.0  0.0   
> 0.0   0.0  0.0  1.4  0.00  0.00 1
> 0.001   0  0
> Uptime(secs): 0.0 total, 0.0 interval
> Flush(GB): cumulative 0.000, interval 0.000
> AddFile(GB): cumulative 0.000, interval 0.000
> AddFile(Total Files): cumulative 0, interval 0
> AddFile(L0 Files): cumulative 0, interval 0
> AddFile(Keys): cumulative 0, interval 0
> Cumulative compaction: 0.00 GB write, 0.21 MB/s write, 0.00 GB read, 0.00 
> MB/s read, 0.0 seconds
> Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s 
> read, 0.0 seconds
> Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 
> level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for 
> pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 
> memtable_compaction, 0 memtable_slowdown, interval 0 total count
> 
> ** File Read Latency Histogram By Level [default] **
> 
> 2020-10-28 17:17:13.253 7eff1f7cd1c0  0 mon.mgmt03 does not exist in monmap, 
> will attempt to join an existing cluster
> 2020-10-28 17:17:13.254 7eff1f7cd1c0  0 using public_addr v2:10.2.1.1:0/0 -> 
> [v2:10.2.1.1:3300/0,v1:10.2.1.1:6789/0]
> 2020-10-28 17:17:13.254 7eff1f7cd1c0  0 starting mon.mgmt03 rank -1 at public 
> addrs [v2:10.2.1.1:3300/0,v1:10.2.1.1:6789/0] at bind addrs 
> [v2:10.2.1.1:3300/0,v1:10.2.1.1:6789/0] mon_data 
> /var/lib/ceph/mon/ceph-mgmt03 fsid 374aed9e-5fc1-47e1-8d29-4416f7425e76
> 2020-10-28 17:17:13.256 7eff1f7cd1c0  1 mon.mgmt03@-1(???) e2 preinit fsid 
> 374aed9e-5fc1-47e1-8d29-4416f7425e76
> 2020-10-28 17:17:13.256 7eff1f7cd1c0  1 mon.mgmt03@-1(???) e2  
> initial_members mgmt01,mgmt02,mgmt03, filtering seed monmap
> 2020-10-28 17:17:13.256 7eff1f7cd1c0  1 mon.mgmt03@-1(???) e2 preinit clean 
> up potentially inconsistent store state
> 2020-10-28 17:17:13.258 7eff1f7cd1c0  0 -- 
> [v2:10.2.1.1:3300/0,v1:10.2.1.1:6789/0] send_to message mon_probe(probe 
> 374aed9e-5fc1-47e1-8d29-4416f7425e76 name mgmt03 new mon_release 14) v7 with 
> empty dest

-- 
David Caro


[ceph-users] Re: OSD down, how to reconstruct it from its main and block.db parts ?

2020-10-28 Thread David Caro

Hi Wladimir, according to the logs you sent first, it seems that there is an
authentication issue (the OSD daemon is not able to fetch the mon config):

> жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300
> 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at
> /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
> жов 23 16:59:36 p10s ceph-osd[3987]: failed to fetch mon config
> (--no-mon-config to skip)
> жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Main process
> exited, code=exited, status=1/FAILURE


The file it fails to load the keyring from is where the auth details for the
OSD daemon should be.
Some more info here:
  https://docs.ceph.com/en/latest/man/8/ceph-authtool/
  https://docs.ceph.com/en/latest/rados/configuration/auth-config-ref/
  https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/
  (specifically step 5)

I'm not sure if you were able to fix it or not, but I'd start trying to get
that fixed before playing with ceph-volume.
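
In case it's useful, a minimal sketch of the kind of thing I mean (assuming the
key for osd.1 is still registered in the cluster; the ids and paths come from
your mails and may need adjusting):

```
# Check that the cluster still knows osd.1 and its caps
ceph auth get osd.1

# Let ceph-volume set up the tmpfs osd dir, symlinks and ownership again
ceph-volume lvm activate --all

# If the keyring file is still missing afterwards, re-export it and restart
ceph auth get osd.1 -o /var/lib/ceph/osd/ceph-1/keyring
chown ceph:ceph /var/lib/ceph/osd/ceph-1/keyring
systemctl restart ceph-osd@1
```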


On 10/27 10:24, Wladimir Mutel wrote:
> Dear David,
> 
> I assimilated most of my Ceph configuration into the cluster itself when this
> feature was announced in Mimic.
> I see some fsid in the [global] section of /etc/ceph/ceph.conf, and some key in
> the [client.admin] section of /etc/ceph/ceph.client.admin.keyring.
> The rest is pretty uninteresting, some minimal adjustments in the config file and
> the cluster's config dump.
> 
> Looking into the Python scripts of ceph-volume, I noticed that a tmpfs is mounted
> during the "ceph-volume lvm activate" run,
> and "ceph-bluestore-tool prime-osd-dir" is started from the same script
> afterwards.
> Should I try running "ceph-volume lvm activate" manually to see
> where it stumbles and why?
> 
> David Caro wrote:
> > Hi Wladimir,
> > 
> > If the "unable to find keyring" message disappeared, what was the error 
> > after that fix?
> > 
> > If it's still failing to fetch the mon config, check your authentication
> > (you might have to add the OSD key to the keyring again), and/or check that the
> > mon IPs are correct in your OSD's ceph.conf file.
> > 
> > On 23 October 2020 16:08:02 CEST, Wladimir Mutel  wrote:
> > > Dear all,
> > > 
> > > after breaking my experimental 1-host Ceph cluster and making one its
> > > pg 'incomplete' I left it in abandoned state for some time.
> > > Now I decided to bring it back into life and found that it can not
> > > start one of its OSDs (osd.1 to name it)
> > > 
> > > "ceph osd df" shows :
> > > 
> > > ID  CLASS  WEIGHT   REWEIGHT  SIZE RAW USE  DATA OMAP META
> > > AVAIL%USE   VAR   PGS  STATUS
> > > 0hdd0   1.0  2.7 TiB  1.6 TiB  1.6 TiB  113 MiB  4.7
> > > GiB  1.1 TiB  59.77  0.69  102  up
> > > 1hdd  2.84549 0  0 B  0 B  0 B  0 B  0
> > > B  0 B  0 00down
> > > 2hdd  2.84549   1.0  2.8 TiB  2.6 TiB  2.5 TiB   57 MiB  3.8
> > > GiB  275 GiB  90.58  1.05  176  up
> > > 3hdd  2.84549   1.0  2.8 TiB  2.6 TiB  2.5 TiB   57 MiB  3.9
> > > GiB  271 GiB  90.69  1.05  185  up
> > > 4hdd  2.84549   1.0  2.8 TiB  2.6 TiB  2.5 TiB   63 MiB  4.2
> > > GiB  263 GiB  90.98  1.05  184  up
> > > 5hdd  2.84549   1.0  2.8 TiB  2.6 TiB  2.5 TiB   52 MiB  3.8
> > > GiB  263 GiB  90.96  1.05  178  up
> > > 6hdd  2.53400   1.0  2.5 TiB  2.3 TiB  2.3 TiB  173 MiB  5.2
> > > GiB  228 GiB  91.21  1.05  178  up
> > > 7hdd  2.53400   1.0  2.5 TiB  2.3 TiB  2.3 TiB  147 MiB  5.2
> > > GiB  230 GiB  91.12  1.05  168  up
> > >  TOTAL   19 TiB   17 TiB   16 TiB  662 MiB   31 GiB  2.6 TiB  86.48
> > > MIN/MAX VAR: 0.69/1.05  STDDEV: 10.90
> > > 
> > > "ceph device ls" shows :
> > > 
> > > DEVICE  HOST:DEV  DAEMONS
> > >  LIFE EXPECTANCY
> > > GIGABYTE_GP-ASACNE2100TTTDR_SN191108950380  p10s:nvme0n1  osd.1 osd.2
> > > osd.3 osd.4 osd.5
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K1JJXVSTp10s:sdd  osd.1
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K1VUYPRAp10s:sda  osd.6
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K2CKX8NTp10s:sdb  osd.7
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K2UD8H74p10s:sde  osd.2
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K2VFTR1Fp10s:sdh  osd.5
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K3CYKL87p10s:sdf  osd.3
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K6FPZAJP

[ceph-users] Re: OSD down, how to reconstruct it from its main and block.db parts ?

2020-10-26 Thread David Caro
>жов 23 16:59:36 p10s systemd[1]: Starting Ceph object storage daemon
>osd.1...
>жов 23 16:59:36 p10s systemd[1]: Started Ceph object storage daemon
>osd.1.
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300
>7f513cebedc0 -1 auth: unable to find a keyring on
>/var/lib/ceph/osd/ceph-1/keyring: (2) No 
>such file or directory
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300
>7f513cebedc0 -1 auth: unable to find a keyring on
>/var/lib/ceph/osd/ceph-1/keyring: (2) No 
>such file or directory
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300
>7f513cebedc0 -1 AuthRegistry(0x560776222940) no keyring found at 
>/var/lib/ceph/osd/ceph-1/keyring, disabling cephx
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300
>7f513cebedc0 -1 AuthRegistry(0x560776222940) no keyring found at 
>/var/lib/ceph/osd/ceph-1/keyring, disabling cephx
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300
>7f513cebedc0 -1 auth: unable to find a keyring on
>/var/lib/ceph/osd/ceph-1/keyring: (2) No 
>such file or directory
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300
>7f513cebedc0 -1 auth: unable to find a keyring on
>/var/lib/ceph/osd/ceph-1/keyring: (2) No 
>such file or directory
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300
>7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at 
>/var/lib/ceph/osd/ceph-1/keyring, disabling cephx
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300
>7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at 
>/var/lib/ceph/osd/ceph-1/keyring, disabling cephx
>жов 23 16:59:36 p10s ceph-osd[3987]: failed to fetch mon config
>(--no-mon-config to skip)
>жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Main process
>exited, code=exited, status=1/FAILURE
>жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Failed with result
>'exit-code'.
>
>And so my question is, how do I make this OSD known again to the Ceph cluster
>without recreating it anew with ceph-volume?
>I see that every folder under "/var/lib/ceph/osd/" is a tmpfs mount
>point filled with the appropriate files and symlinks, except for
>"/var/lib/ceph/osd/ceph-1",
>which is just an empty folder not mounted anywhere.
>I tried to run
>
>"ceph-bluestore-tool prime-osd-dir --dev
>/dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202
>--path 
>/var/lib/ceph/osd/ceph-1"
>
>it created some files under /var/lib/ceph/osd/ceph-1, but without the tmpfs
>mount, and these files belonged to root. I changed the owner of these files
>to ceph:ceph and
>created the appropriate symlinks for block and block.db, but ceph-osd@1
>did not want to start either. Only the "unable to find keyring" messages
>disappeared.
>
>Please give any hints on where to go next.
>Thanks in advance for your help.
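
For reference, the manual steps described above roughly correspond to what
"ceph-volume lvm activate" automates. A sketch of that manual sequence, with the
device path taken from the mail (double-check it before running, and note that
ceph-volume also enables the matching systemd units for you):

```
# Mount a tmpfs for the OSD dir, as ceph-volume would
mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1

# Re-create the files in the OSD dir from the bluestore label
ceph-bluestore-tool prime-osd-dir \
  --dev /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202 \
  --path /var/lib/ceph/osd/ceph-1

# Symlink the block device (and block.db, if there is one)
ln -snf /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202 \
  /var/lib/ceph/osd/ceph-1/block

# Fix ownership: the daemon runs as ceph, so it needs the dir and the device
chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
chown ceph:ceph /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202

systemctl start ceph-osd@1
```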

-- 
David Caro


[ceph-users] Re: ceph rbox test on passive compressed pool

2020-09-06 Thread David Caro
The hints have to be given from the client side, as far as I understand. Can you
share the client code too?

Also, it seems that there's no guarantee that it will actually do anything
(best effort, I guess):
https://docs.ceph.com/docs/mimic/rados/api/librados/#c.rados_set_alloc_hint
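
Not sure it is an option in your setup, but if changing the client is hard, a
quick way to check that the OSD-side compression itself works is to flip the
pool to aggressive mode, which compresses regardless of hints. A rough sketch
(assuming the pool is called 'mail', as the object names suggest):

```
# Compress everything on this pool, no client hint needed (for testing)
ceph osd pool set mail compression_mode aggressive

# Current compression settings on the pool
ceph osd pool get mail compression_mode
ceph osd pool get mail compression_algorithm

# Per-pool compressed vs. stored bytes (USED COMPR / UNDER COMPR columns)
ceph df detail
```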

Cheers

On 6 September 2020 15:59:01 BST, Marc Roos  wrote:
>
>
>I have been inserting 10790 copies of exactly the same 64 kB text message into
>a pool with passive compression enabled. I am still counting, but it looks
>like
>only half the objects are compressed.
>
>mail/b08c3218dbf1545ff43052412a8e mtime 2020-09-06 16:27:39.00,
>
>size 63580
>mail/00f6043775f1545ff43052412a8e mtime 2020-09-06 16:25:57.00,
>
>size 525
>mail/b875f40571f1545ff43052412a8e mtime 2020-09-06 16:25:53.00,
>
>size 63580
>mail/e87c120b19f1545ff43052412a8e mtime 2020-09-06 16:24:25.00,
>
>size 525
>
>I am not sure if this should be expected from passive mode; these docs[1]
>hint that passive means 'compress if hinted COMPRESSIBLE'. From that I would
>conclude that all text messages should be compressed.
>A previous test with a 64 kB gzip attachment did not seem to compress,
>although I did not look at all object sizes.
>
>
>
>on 14.2.11
>
>[1]
>https://documentation.suse.com/ses/5.5/html/ses-all/ceph-pools.html#sec-ceph-pool-compression
>https://docs.ceph.com/docs/mimic/rados/operations/pools/
>
>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.