[ceph-users] NFS - HA and Ingress completion note?

2023-10-17 Thread andreas
NFS - HA and Ingress:  [ https://docs.ceph.com/en/latest/mgr/nfs/#ingress ] 

Referring to Note#2, is NFS high-availability functionality considered complete 
(and stable)?
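
For context, the ingress-based HA setup those docs describe boils down to
something like the sketch below (cluster name, hosts and the virtual IP are
placeholders, and the exact flag spelling may differ slightly between
releases):

ceph nfs cluster create mynfs "2 nfshost1,nfshost2" --ingress --virtual-ip 192.0.2.100/24   # placeholders
ceph nfs cluster info mynfs     # should list the backend daemons and the ingress virtual IP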
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Duplicate help statements in Prometheus metrics in 16.2.13

2023-06-05 Thread Andreas Haupt
Dear all,

after the update to Ceph 16.2.13 the Prometheus exporter is wrongly
exporting multiple metric HELP & TYPE lines for ceph_pg_objects_repaired:

[mon1] /root # curl -sS http://localhost:9283/metrics
# HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count
# TYPE ceph_pg_objects_repaired counter
ceph_pg_objects_repaired{poolid="34"} 0.0
# HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count
# TYPE ceph_pg_objects_repaired counter
ceph_pg_objects_repaired{poolid="33"} 0.0
# HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count
# TYPE ceph_pg_objects_repaired counter
ceph_pg_objects_repaired{poolid="32"} 0.0
[...]

This trips up our exporter_exporter service, which then rejects the export of
the Ceph metrics. Is this a known issue? Will this be fixed in the next update?
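
In case it helps with reproducing: the duplication is easy to confirm by
counting repeated HELP lines on the mgr exporter port used above, e.g.:

curl -sS http://localhost:9283/metrics | grep '^# HELP' | sort | uniq -cd   # prints duplicated HELP lines with counts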

Cheers,
Andreas
-- 
| Andreas Haupt| E-Mail: andreas.ha...@desy.de
|  DESY Zeuthen| WWW:http://www.zeuthen.desy.de/~ahaupt
|  Platanenallee 6 | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen | Fax:+49/33762/7-7216




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RBD snapshot mirror syncs all snapshots

2023-04-12 Thread Andreas Teuchert



Hello,

I setup two-way snapshot-based RBD mirroring between two Ceph clusters.

After enabling mirroring for an image that already had regular snapshots
(created independently of RBD mirror) on the source cluster, the image and
all of its snapshots were synced to the destination cluster.


Is there a way to avoid having all snapshots synced? We only need the latest
version of the image on the destination cluster, and the snapshots add around
200% disk space overhead on average.
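
For reference, per-image snapshot-based mirroring is enabled roughly like
this (a sketch; pool and image names are placeholders):

rbd mirror image enable <pool>/<image> snapshot    # <pool>/<image> are placeholders
rbd mirror image snapshot <pool>/<image>           # create a mirror snapshot that gets synced
rbd mirror image status <pool>/<image>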


Best regards,

Andreas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph rbd clients surrender exclusive lock in critical situation

2023-01-19 Thread Andreas Teuchert

Hi Frank,

one thing that might be relevant here: if you disable transparent lock
transitions, you cannot create snapshots of images that are in use in that
way.


This may or may not be relevant in your case. I'm just mentioning it 
because I myself was surprised by that.


Best regards,

Andreas

On 19.01.23 12:50, Frank Schilder wrote:

Hi Ilya,

thanks for the info, it did help. I agree, it's the orchestration layer's
responsibility to handle things right. I have a case open already with support
and it looks like there is indeed a bug on that side. I was mainly after a way
that ceph librbd clients could offer a safety net in case such bugs occur. It's
a bit like the four-eyes principle: having an orchestration layer do things
right is good, but having a second instance confirming the same thing is much
better. A bug in one layer will not cause a catastrophe, because the second
layer catches it.

I'm not sure if the rbd lock capabilities are sufficiently powerful to provide
a command-line interface to that. The flag RBD_LOCK_MODE_EXCLUSIVE seems to be
the only way, and if qemu is not using it, there is not a lot one can do in
scripts.
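
For completeness, there is an advisory lock CLI, but it is separate from the
exclusive-lock feature that librbd/QEMU use internally and it is not enforced
automatically (a sketch; image, lock id and locker are placeholders):

rbd lock ls rbd/myimage                        # placeholders: rbd/myimage
rbd lock add rbd/myimage mylockid
rbd lock rm rbd/myimage mylockid client.4711   # locker id as shown by 'rbd lock ls'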

Thanks for your help and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cannot create snapshots if RBD image is mapped with -oexclusive

2022-12-08 Thread Andreas Teuchert

Hello,

in case anyone finds this post while trying to find an answer to the 
same question, I believe the answer is here:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/DBJRYTMQURANFFWSS4QDCKD5KULJQ46X/

As far as I understand it:

Creating a snapshot requires acquiring the lock, and with "-oexclusive" 
the RBD client is not going to release it. So this is not a bug.


Best regards,

Andreas

On 30.11.22 12:58, Andreas Teuchert wrote:

Hello,

creating snapshots of RBD images that are mapped with -oexclusive seems 
not to be possible:


# rbd map -oexclusive rbd.blu1/andreasspielt-share11
/dev/rbd7
# rbd snap create rbd.blu1/andreasspielt-share11@ateuchert_test01
Creating snap: 0% complete...failed.
rbd: failed to create snapshot: (30) Read-only file system
# rbd unmap rbd.blu1/andreasspielt-share11
# rbd map rbd.blu1/andreasspielt-share11
/dev/rbd7
# rbd snap create rbd.blu1/andreasspielt-share11@ateuchert_test01
Creating snap: 100% complete...done.

I was surprised by this behavior and the documentation seems not to 
mention this.


Is this on purpose or a bug?

Ceph version is 17.2.5, RBD client is Ubuntu 22.04 with kernel 
5.15.0-52-generic.


Best regards,

Andreas



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cannot create snapshots if RBD image is mapped with -oexclusive

2022-11-30 Thread Andreas Teuchert

Hello,

creating snapshots of RBD images that are mapped with -oexclusive seems 
not to be possible:


# rbd map -oexclusive rbd.blu1/andreasspielt-share11
/dev/rbd7
# rbd snap create rbd.blu1/andreasspielt-share11@ateuchert_test01
Creating snap: 0% complete...failed.
rbd: failed to create snapshot: (30) Read-only file system
# rbd unmap rbd.blu1/andreasspielt-share11
# rbd map rbd.blu1/andreasspielt-share11
/dev/rbd7
# rbd snap create rbd.blu1/andreasspielt-share11@ateuchert_test01
Creating snap: 100% complete...done.

I was surprised by this behavior and the documentation seems not to 
mention this.


Is this on purpose or a bug?

Ceph version is 17.2.5, RBD client is Ubuntu 22.04 with kernel 
5.15.0-52-generic.


Best regards,

Andreas



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MGR failures and pg autoscaler

2022-10-25 Thread Andreas Haupt
Hi Giuseppe,

On Tue, 2022-10-25 at 07:54 +, Lo Re  Giuseppe wrote:
> “””
> 
> In the mgr logs I see:
> “””
> 
> debug 2022-10-20T23:09:03.859+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 2 has overlapping roots: {-60, -1}

This is unrelated, I asked the same question some days ago:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/OZTOVT2TXEA23NI2TPTWD3WU2AZM6YSH/

Starting with Pacific the autoscaler is unable to deal with mixed pools
spread over different storage device classes. Although this is documented,
I'd call it a regression - the same kind of setup still worked with
autoscaler in Octopus.

You will find the overlapping roots by listing the device-class-based
shadow entries:

ceph osd crush tree --show-shadow


Regarding your problem, you need to look for further errors. Last time an
mgr module failed here it was due to some missing python modules ...

Something suspicious in the output of "ceph crash ls" ?

Cheers,
Andreas
-- 
| Andreas Haupt| E-Mail: andreas.ha...@desy.de
|  DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt
|  Platanenallee 6 | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen | Fax:+49/33762/7-7216

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Autoscaler stopped working after upgrade Octopus -> Pacific

2022-10-11 Thread Andreas Haupt
Dear all,

just upgraded our cluster from Octopus to Pacific (16.2.10). This
introduced an error in autoscaler:

2022-10-11T14:47:40.421+0200 7f3ec2d03700  0 [pg_autoscaler ERROR root] pool 17 has overlapping roots: {-4, -1}
2022-10-11T14:47:40.423+0200 7f3ec2d03700  0 [pg_autoscaler ERROR root] pool 22 has overlapping roots: {-4, -1}
2022-10-11T14:47:40.423+0200 7f3ec2d03700  0 [pg_autoscaler ERROR root] pool 23 has overlapping roots: {-4, -1}
2022-10-11T14:47:40.427+0200 7f3ec2d03700  0 [pg_autoscaler ERROR root] pool 27 has overlapping roots: {-6, -4, -1}
2022-10-11T14:47:40.428+0200 7f3ec2d03700  0 [pg_autoscaler ERROR root] pool 28 has overlapping roots: {-6, -4, -1}

Autoscaler status is empty:

[cephmon1] /root # ceph osd pool autoscale-status
[cephmon1] /root # 


On https://forum.proxmox.com/threads/ceph-overlapping-roots.104199/ I
found something similar:

---
I assume that you have at least one pool that still has the
"replicated_rule" assigned, which does not make a distinction between the
device class of the OSDs.

This is why you see this error. The autoscaler cannot decide how many PGs
the pools need. Make sure that all pools are assigned a rule that limit
them to a device class and the errors should stop.
---

Indeed, we have a mixed cluster (hdd + ssd) with some pools spanning hdd-
only, some ssd-only and some both (ec & replicated) which don't care about
the storage device class (e.g. via default "replicated_rule"):

[cephmon1] /root # ceph osd crush rule ls
replicated_rule
ssd_only_replicated_rule
hdd_only_replicated_rule
default.rgw.buckets.data.ec42
test.ec42
[cephmon1] /root #


That worked flawlessly until Octopus. Any idea how to make the autoscaler work
again with that kind of setup? Do I really have to restrict every pool to a
single device class in Pacific in order to get a functional autoscaler?
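
For reference, assigning every pool a device-class-specific rule (what the
quoted forum post suggests) would look roughly like this (a sketch; rule and
pool names are placeholders):

ceph osd crush rule create-replicated replicated_hdd default host hdd   # rule names are placeholders
ceph osd crush rule create-replicated replicated_ssd default host ssd
ceph osd pool set mypool crush_rule replicated_hdd                      # 'mypool' is a placeholder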

Thanks,
Andreas
-- 
| Andreas Haupt| E-Mail: andreas.ha...@desy.de
|  DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt
|  Platanenallee 6 | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen | Fax:+49/33762/7-7216

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] tcmu-runner not in EPEL-8

2022-02-18 Thread Andreas Haupt
Dear all,

does anyone know, by chance, why tcmu-runner is not available in EPEL-8?
Fedora maintains an SRPM for e.g. Rawhide & 36:

https://kojipkgs.fedoraproject.org//packages/tcmu-runner/1.5.4/4.fc36/src/tcmu-runner-1.5.4-4.fc36.src.rpm

This one builds flawlessly under mock for EL8, so compiling it on our own is
actually no problem. But it would be much more convenient to have it in
EPEL-8, as probably no one will run production iSCSI gateways under Fedora ;-)
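
For anyone who wants to rebuild it locally in the meantime, a mock build of
that SRPM looks roughly like this (a sketch; the exact mock config name for
EL8 + EPEL depends on your mock-core-configs version):

mock -r epel-8-x86_64 --rebuild tcmu-runner-1.5.4-4.fc36.src.rpm   # config name may differ on your setup
ls /var/lib/mock/epel-8-x86_64/result/                             # resulting RPMs and build logs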

Cheers,
Andreas
-- 
| Andreas Haupt| E-Mail: andreas.ha...@desy.de
|  DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt
|  Platanenallee 6 | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen | Fax:+49/33762/7-7216

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to troubleshoot monitor node

2022-01-10 Thread Andreas Feile

Hi all,

I've set up a 6-node ceph cluster to learn how ceph works and what I can 
do with it. However, I'm new to ceph, so if the answer to one of my 
questions is RTFM, point me to the right place.


My problem is this:
The cluster consists of 3 mons and 3 osds. Even though the dashboard 
shows all green, mon01 has a problem: the ceph command hangs and 
never comes back:



root@mon01:~# ceph --version
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus 
(stable)


root@mon01:~# ceph -s
^CCluster connection aborted


To see what happens I tried this:

root@mon01:~# ceph -s --debug-ms=1
2022-01-10T15:51:30.434+0100 7f4a2cd7e700 1 Processor -- start
2022-01-10T15:51:30.434+0100 7f4a2cd7e700 1 -- start start
2022-01-10T15:51:30.434+0100 7f4a2cd7e700 1 --2- >> [v2:192.168.14.48:3300/0,v1:192.168.14.48:6789/0] conn(0x7f4a28066a30 0x7f4a28066e40 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).connect
2022-01-10T15:51:30.434+0100 7f4a2cd7e700 1 -- --> [v2:192.168.14.48:3300/0,v1:192.168.14.48:6789/0] -- mon_getmap magic: 0 v1 -- 0x7f4a28067330 con 0x7f4a28066a30
2022-01-10T15:51:30.434+0100 7f4a2659c700 1 -- >> [v2:192.168.14.48:3300/0,v1:192.168.14.48:6789/0] conn(0x7f4a28066a30 msgr2=0x7f4a28066e40 unknown :-1 s=STATE_CONNECTING_RE l=0).process reconnect failed to v2:192.168.14.48:3300/0

...


Indeed, both ports are closed:

root@mon01:~# nc -z 192.168.14.48 6789; echo $?
1
root@mon01:~# nc -z 192.168.14.48 3300; echo $?
1

In /var/log/ceph/cephadm.log, I cannot see any useful information about what 
might be going wrong.


I'm not aware of anything I could have done to trigger this error, and I 
wonder what I could do next to repair this monitor node.


Any hint is appreciated.

--
Andre Tann
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Huge headaches with NFS and ingress HA failover

2021-07-21 Thread Andreas Weisker

Hi,

we recently set up a new pacific cluster with cephadm.
Deployed nfs on two hosts and ingress on two other hosts. (ceph orch 
apply for nfs and ingress like on the docs page)
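
For context, the kind of ingress spec the docs page describes looks roughly
like this (a sketch; service id, hosts, ports and the virtual IP are
placeholders), applied with "ceph orch apply -i ingress.yaml":

# ingress.yaml (placeholder values throughout)
service_type: ingress
service_id: nfs.mynfs
placement:
  hosts:
    - ingresshost1
    - ingresshost2
spec:
  backend_service: nfs.mynfs
  frontend_port: 2049
  monitor_port: 9049
  virtual_ip: 192.0.2.100/24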


So far so good. ESXi with NFS41 connects, but the way ingress works 
confuses me.


It distributes clients statically to one NFS daemon by their IP addresses. 
(I know NFS won't like it if the client switches all the time, because 
of reservations.)
Three of our ESXi servers seem to connect to host1, the 4th one to the 
other. This leads to a problem in ESXi where it doesn't recognize the 
datastore as being the same as on the other hosts. I can't find out how 
exactly ESXi calculates that, but there must be different information 
coming from these NFS daemons. nfs-ganesha doesn't behave exactly the 
same on these hosts.


Besides that, I wanted to do some failover tests before the cluster 
goes live. I stopped one NFS server, but ingress (haproxy) doesn't 
seem to care.
On the haproxy stats page, both backends are listed with "no check", so 
no failover happens for the NFS clients; haproxy does not fail 
over to the other host. Datastores are disconnected and I am unable to 
connect new ones.


How is ingress supposed to detect a failed NFS server, and how do I tell 
the ganesha daemons to be identical to each other?


Bonus question: Why can't keepalived just manage nfs-ganesha on two 
hosts instead of haproxy? It would eliminate an extra network hop.


Hope someone has a few insights on that. I've spent way too much time on 
this to switch to some other solution.


Best regards,

Andreas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Nautilus CentOS-7 rpm dependencies

2021-05-31 Thread Andreas Haupt
Dear all,

ceph-mgr-dashboard-15.2.13-0.el7.noarch contains three rpm dependencies
that cannot be resolved here (not part of CentOS & EPEL 7):

python3-cherrypy
python3-routes
python3-jwt

Does anybody know where they are expected to come from?

Thanks,
Andreas
-- 
| Andreas Haupt| E-Mail: andreas.ha...@desy.de
|  DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt
|  Platanenallee 6 | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen | Fax:+49/33762/7-7216



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mon db growing. over 500Gb

2021-03-11 Thread Andreas John
Hello,

I also observed an excessively growing mon DB during recovery. Luckily
we were able to solve it by extending the mon DB disk.

Without having had the chance to re-check: the options nobackfill and
norecover might cause that behavior. It feels like the mon holds data that
cannot be flushed to an OSD.
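
For reference, the store size can be checked and a compaction triggered
online roughly like this (a sketch; mon id and data path are the defaults,
adjust to your deployment):

du -sh /var/lib/ceph/mon/*/store.db   # current on-disk size of the mon store
ceph tell mon.a compact               # trigger a RocksDB compaction on one mon ('a' is a placeholder id)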


rgds,

j.



On 11.03.21 10:47, Marc wrote:
> From what I have read here in the past, growing monitor db is related to not 
> having pg's in  'clean active' state
>
>
>> -Original Message-
>> From: ricardo.re.azev...@gmail.com 
>> Sent: 11 March 2021 00:59
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] mon db growing. over 500Gb
>>
>> Hi all,
>>
>>
>>
>> I have a fairly pressing issue. I had a monitor fall out of quorum
>> because
>> it ran out of disk space during rebalancing from switching to upmap. I
>> noticed all my monitor store.db started taking up nearly all disk space
>> so I
>> set noout, nobackfill and norecover and shutdown all the monitor
>> daemons.
>> Each store.db was at:
>>
>>
>>
>> mon.a 89GB (the one that first dropped out)
>>
>> mon.a 400GB
>>
>> mon.c 400GB
>>
>>
>> I tried setting mon_compact_on_start. This brought  mon.a down to 1GB.
>> Cool.
>> However, when I try it on the other monitors it increased the db size
>> ~1Gb/10s so I shut them down again.
>>
>> Any idea what is going on? Or how can I shrink back down the db?
>>
>>
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best practices for OSD on bcache

2021-03-02 Thread Andreas John
Hello,

we clearly understood that. But in Ceph we have the concept of putting the
OSD journal on a very fast, separate disk.

I just asked what, in theory, the advantage of caching via bcache/NVMe should
be vs. a journal on NVMe. I would not expect any performance advantage for
bcache (if the journal is reasonably sized).

I might be totally wrong, though. If you just do it because you don't want to
re-create (or modify) the OSDs, it's not worth the effort IMHO.


rgds,

derjohn


On 02.03.21 10:48, Norman.Kern wrote:
> On 2021/3/2 5:09 AM, Andreas John wrote:
>> Hallo,
>>
>> do you expect that to be better (faster), than having the OSD's Journal
>> on a different disk (ssd, nvme) ?
> No, I created the OSD storage devices using bcache devices.
>>
>> rgds,
>>
>> derjohn
>>
>>
>> On 01.03.21 05:37, Norman.Kern wrote:
>>> Hi, guys
>>>
>>> I am testing ceph on bcache devices,  I found the performance is not good 
>>> as expected. Does anyone have any best practices for it?  Thanks.
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Best practices for OSD on bcache

2021-03-01 Thread Andreas John
Hallo,

do you expect that to be better (faster) than having the OSD's journal
on a different disk (SSD, NVMe)?


rgds,

derjohn


On 01.03.21 05:37, Norman.Kern wrote:
> Hi, guys
>
> I am testing ceph on bcache devices,  I found the performance is not good as 
> expected. Does anyone have any best practices for it?  Thanks.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 10G stackabe lacp switches

2021-02-16 Thread Andreas John
Hello,

this is not a direct answer to the question, but you could consider
the following to double your bandwidth:


* Run each Ceph node with two NICs, each with its own IP, e.g. one node
has 192.0.2.10/24 and 192.0.2.11/24

* In ceph.conf you bind 50% of the OSDs to each of those IPs (a fuller sketch follows after this list):

[osd.XY]
...
public_addr = ...
cluster_addr = 192.0.2.x

* With equally distributed traffic (and enough OSDs) this should nearly double
your bandwidth

* To get redundancy, you could extend that config using keepalived/vrrp
to switch over the IP of a failing NIC/switch to the other one.
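
A slightly fuller sketch of that ceph.conf idea (hypothetical OSD ids and
example addresses):

# osd ids and addresses below are hypothetical
[osd.0]
public_addr = 192.0.2.10
cluster_addr = 192.0.2.10

[osd.1]
public_addr = 192.0.2.11
cluster_addr = 192.0.2.11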


I am well aware that we also have Linux bonding with mode slb, but in my
experience that didn't work very well with COTS switches, maybe due to ARP
learning issues. (We ended up buying Juniper QFX-5100 switches with MLAG
support.)


Best Regards,

Andreas


P.S. I haven't tried out the setup from above yet. If anyone already did, or
will, I would be happy about feedback.


On 16.02.21 16:56, Mario Giammarco wrote:
> On Mon, 15 Feb 2021 at 15:16, mj wrote:
>
>>
>> On 2/15/21 1:38 PM, Eneko Lacunza wrote:
>>> Do you really need MLAG? (the 2x10G bandwith?). If not, just use 2
>>> simple switches (Mikrotik for example) and in Proxmox use an
>>> active-pasive bond, with default interface in all nodes to the same
>> switch.
>>
>> Since we are now on SSD OSDs only, and our aim is to be able to add more
>> OSD nodes, yes: I think we should aim for more than 10G bandwidth.
>>
>> So go for 40G. LACP will not give you 2x10 magical bandwidth doubling.
> BTW: I am using mikrotik 10g switches and they have great value.
> BTW2: if you use Proxmox you do not need LACP you can use linux round robin
> support that has the same performance of LACP and it does not require
> switches support.
>
>
>> Thanks!
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to reset an OSD

2021-01-13 Thread Andreas John
Hello,

I suspect there was unwritten data in RAM which didn't make it to the
disk. This shouldn't happen, that's why the journal is in place.

If you have size=2 in your pool, there is one copy on the other host. To
delete the OSD you could probably do

ceph osd crush remove osd.x

ceph osd rm osd.x

ceph auth del osd.x

maybe "wipefs -a /dev/sdxxx"  or dd if=/dev/zero of=dev/sdxx count=1
bs=1m ...


Then you should be able to deploy the disk again with the tool that you
used originally. The disk should be "fresh".


rgds,

derjohn.






On 13.01.21 15:45, Pfannes, Fabian wrote:
> failed: (22) Invalid argument

-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Proxmox+Ceph Benchmark 2020

2020-10-14 Thread Andreas John
Hello Alwin,

do you know if it makes a difference to disable all "green computing" options
in the BIOS vs. setting the governor to "performance" in the OS?

If not, I think I will have some service cycles to set our
proxmox-ceph nodes correctly.
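
For the OS side, checking and pinning the governor looks roughly like this
(a sketch; needs the cpupower tool, package name varies by distribution):

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # show current governor
cpupower frequency-set -g performance                       # pin all CPUs to 'performance'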


Best Regards,

Andreas


On 14.10.20 08:39, Alwin Antreich wrote:
> On Tue, Oct 13, 2020 at 11:19:33AM -0500, Mark Nelson wrote:
>> Thanks for the link Alwin!
>>
>>
>> On intel platforms disabling C/P state transitions can have a really big
>> impact on IOPS (on RHEL for instance using the network or performance
>> latency tuned profile).  It would be very interesting to know if AMD EPYC
>> platforms see similar benefits.  I don't have any in house, but if you
>> happen to have a chance it would be an interesting addendum to your report.
> Thanks for the suggestion. I indeed did a run before disabling the C/P
> states in the BIOS. But unfortunately I didn't keep the results. :/
>
> As far as I remember though, there was a visible improvement after
> disabling them.
>
> I will have a look, once I have some time to do some more benchmarks.
>
> --
> Cheers,
> Alwin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph test cluster, how to estimate performance.

2020-10-13 Thread Andreas John
Hello Daniel,

yes Samsung "Pro" SSD series aren't to much "pro", especially when it's
about write IOPS. I would tend to say get some Intel S4510 if you can
afford it. It you can't you can still try to activate overprovisioning
on the SSD, I would trend to say reserve 10-30% of the SSD for wear
leveling (writing). First check the number of sectors with hdparm -N
/dev/sdX then set a permanent HPA (host protected area) to the disk. The
"p" and no space is important.

hdparm -Np${SECTORS} --yes-i-know-what-i-am-x /dev/sdX

Wait a little (!), power cycle and re-check the disk with hdparm -N
/dev/sdX. My Samsung 850 Pro are a little reluctant to accept the
setting, but after some tries or a little waiting the change gets permanent.
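
A small sketch of that sizing step (the total sector count below is a
placeholder; take the real value from hdparm -N, and reserving ~10% is just
an example):

TOTAL=1000215216                  # placeholder: native max sectors reported by 'hdparm -N /dev/sdX'
SECTORS=$(( TOTAL * 90 / 100 ))   # keep roughly 10% unprovisioned for wear leveling
hdparm -Np${SECTORS} --yes-i-know-what-i-am-doing /dev/sdX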

At least the Samsung 850 Pro stopped dying suddenly with that setting.
Without it the SSD occasionally disconnected from the bus and reappeared
after a power cycle. I suspect it ran out of wear-leveling reserve or something.


HTH,

derjohn


On 13.10.20 08:41, Martin Verges wrote:
> Hello Daniel,
>
> just throw away your crappy Samsung SSD 860 Pro. It won't work in an
> acceptable way.
>
> See
> https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit?usp=sharing
> for a performance indication of individual disks.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> On Tue, 13 Oct 2020 at 07:31, Daniel Mezentsev wrote:
>> Hi Ceph users,
>>
>> Im working on  common lisp client utilizing rados library. Got some
>> results, but don't know how to estimate if i am getting correct
>> performance. I'm running test cluster from laptop - 2 OSDs -  VM, RAM
>> 4Gb, 4 vCPU each, monitors and mgr are running from the same VM(s). As
>> for storage, i have Samsung SSD 860 Pro, 512G. Disk is splitted into 2
>> logical volumes (LVMs), and that volumes are attached to VMs. I know
>> that i can't expect too much from that layout, just want to know if im
>> getting adequate numbers. Im doing read/write operations on very small
>> objects - up to 1kb. In async write im getting ~7.5-8.0 KIOPS.
>> Synchronouse read - pretty much the same 7.5-8.0 KIOPS. Async read is
>> segfaulting don't know why. Disk itself is capable to deliver well
>> above 50 KIOPS. Difference is magnitude. Any info is more welcome.
>>   Daniel Mezentsev, founder
>> (+1) 604 313 8592.
>> Soleks Data Group.
>> Shaping the clouds.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-10 Thread Andreas John
mary: idle
>>> 2020-09-17T15:02:32.945-0500 7f39b5215700  0 log_channel(cluster) log
>>> [INF] : scrub complete with tag '1405e5c7-3ecf-4754-918e-129e9d101f7a'
>>> 2020-09-17T15:02:32.945-0500 7f39b5215700  0 log_channel(cluster) log
>>> [INF] : scrub completed for path: /frames/postO3/hoft
>>> 2020-09-17T15:02:32.945-0500 7f39b5215700  0 log_channel(cluster) log
>>> [INF] : scrub summary: idle
>>>
>>>
>>> After the scrub completed, access to the file (ls or rm) continue to
>>> hang.  The MDS reports slow reads:
>>>
>>> 2020-09-17T15:11:05.654-0500 7f39b9a1e700  0 log_channel(cluster) log
>>> [WRN] : slow request 481.867381 seconds old, received at
>>> 2020-09-17T15:03:03.788058-0500: client_request(client.451432:11309
>>> getattr pAsLsXsFs #0x105b1c0 2020-09-17T15:03:03.787602-0500
>>> caller_uid=0, caller_gid=0{}) currently dispatched
>>>
>>> Does anyone have any suggestions on how else to clean up from a
>>> permanently lost object?
>>>
>>> --Mike
>>>
>>> On 9/16/20 2:03 AM, Frank Schilder wrote:
>>>> Sounds similar to this one: https://tracker.ceph.com/issues/46847
>>>>
>>>> If you have or can reconstruct the crush map from before adding the
>>>> OSDs, you might be able to discover everything with the temporary
>>>> reversal of the crush map method.
>>>>
>>>> Not sure if there is another method, i never got a reply to my
>>>> question in the tracker.
>>>>
>>>> Best regards,
>>>> =
>>>> Frank Schilder
>>>> AIT Risø Campus
>>>> Bygning 109, rum S14
>>>>
>>>> 
>>>> From: Michael Thomas 
>>>> Sent: 16 September 2020 01:27:19
>>>> To: ceph-users@ceph.io
>>>> Subject: [ceph-users] multiple OSD crash, unfound objects
>>>>
>>>> Over the weekend I had multiple OSD servers in my Octopus cluster
>>>> (15.2.4) crash and reboot at nearly the same time.  The OSDs are
>>>> part of
>>>> an erasure coded pool.  At the time the cluster had been busy with a
>>>> long-running (~week) remapping of a large number of PGs after I
>>>> incrementally added more OSDs to the cluster.  After bringing all
>>>> of the
>>>> OSDs back up, I have 25 unfound objects and 75 degraded objects. 
>>>> There
>>>> are other problems reported, but I'm primarily concerned with these
>>>> unfound/degraded objects.
>>>>
>>>> The pool with the missing objects is a cephfs pool.  The files
>>>> stored in
>>>> the pool are backed up on tape, so I can easily restore individual
>>>> files
>>>> as needed (though I would not want to restore the entire filesystem).
>>>>
>>>> I tried following the guide at
>>>> https://docs.ceph.com/docs/octopus/rados/troubleshooting/troubleshooting-pg/#unfound-objects.
>>>>
>>>>  I found a number of OSDs that are still 'not queried'. 
>>>> Restarting a
>>>> sampling of these OSDs changed the state from 'not queried' to
>>>> 'already
>>>> probed', but that did not recover any of the unfound or degraded
>>>> objects.
>>>>
>>>> I have also tried 'ceph pg deep-scrub' on the affected PGs, but never
>>>> saw them get scrubbed.  I also tried doing a 'ceph pg
>>>> force-recovery' on
>>>> the affected PGs, but only one seems to have been tagged accordingly
>>>> (see ceph -s output below).
>>>>
>>>> The guide also says "Sometimes it simply takes some time for the
>>>> cluster
>>>> to query possible locations."  I'm not sure how long "some time" might
>>>> take, but it hasn't changed after several hours.
>>>>
>>>> My questions are:
>>>>
>>>> * Is there a way to force the cluster to query the possible locations
>>>> sooner?
>>>>
>>>> * Is it possible to identify the files in cephfs that are affected, so
>>>> that I could delete only the affected files and restore them from
>>>> backup
>>>> tapes?
>>>>
>>>> --Mike
>>>>
>>>> ceph -s:
>>>>
>>>>   cluster:
>>>>     id: 066f558c-6789-4a93-aaf1-5af1ba01a3ad
>>>>     health: HEALTH_ERR
>>>>     1 clients failing to respond to capability release
>>>>     1 MDSs report slow requests
>>>>     25/78520351 objects unfound (0.000%)
>>>>     2 nearfull osd(s)
>>>>     Reduced data availability: 1 pg inactive
>>>>     Possible data damage: 9 pgs recovery_unfound
>>>>     Degraded data redundancy: 75/626645098 objects
>>>> degraded
>>>> (0.000%), 9 pgs degraded
>>>>     1013 pgs not deep-scrubbed in time
>>>>     1013 pgs not scrubbed in time
>>>>     2 pool(s) nearfull
>>>>     1 daemons have recently crashed
>>>>     4 slow ops, oldest one blocked for 77939 sec, daemons
>>>> [osd.0,osd.41] have slow ops.
>>>>
>>>>   services:
>>>>     mon: 4 daemons, quorum ceph1,ceph2,ceph3,ceph4 (age 9d)
>>>>     mgr: ceph3(active, since 11d), standbys: ceph2, ceph4, ceph1
>>>>     mds: archive:1 {0=ceph4=up:active} 3 up:standby
>>>>     osd: 121 osds: 121 up (since 6m), 121 in (since 101m); 4
>>>> remapped pgs
>>>>
>>>>   task status:
>>>>     scrub status:
>>>>     mds.ceph4: idle
>>>>
>>>>   data:
>>>>     pools:   9 pools, 2433 pgs
>>>>     objects: 78.52M objects, 298 TiB
>>>>     usage:   412 TiB used, 545 TiB / 956 TiB avail
>>>>     pgs: 0.041% pgs unknown
>>>>  75/626645098 objects degraded (0.000%)
>>>>  135224/626645098 objects misplaced (0.022%)
>>>>  25/78520351 objects unfound (0.000%)
>>>>  2421 active+clean
>>>>  5    active+recovery_unfound+degraded
>>>>  3    active+recovery_unfound+degraded+remapped
>>>>  2    active+clean+scrubbing+deep
>>>>  1    unknown
>>>>  1    active+forced_recovery+recovery_unfound+degraded
>>>>
>>>>   progress:
>>>>     PG autoscaler decreasing pool 7 PGs from 1024 to 512 (5d)
>>>>   []
>>>> ___
>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>>
>>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Massive Mon DB Size with noout on 14.2.11

2020-10-02 Thread Andreas John
Hello *,

thx for taking care. I read "works as designed, be sure to have disk
space for the mon available". It sounds a little odd that the growth
from 50MB to ~15GB + compaction space happens within a couple of
seconds when two OSDs rejoin the cluster. Does it matter if I have
cephfs in use? Usually I would expect MDS load, but does it also
cause load on the mon with many files?

My OSD map seems to have low absolute numbers:

ceph report | grep osdmap | grep committed
report 777999536
    "osdmap_first_committed": 1276,
    "osdmap_last_committed": 1781,


If I get new disks (partitions) for the mons, is there a size
recommendation? Is there a rule of thumb? BTW: Do I still need a
filesystem on the partition for the mon DB?

Beste Regards,

derjohn


On 02.10.20 16:25, Dan van der Ster wrote:
> The important metric is the difference between these two values:
>
> # ceph report | grep osdmap | grep committed
> report 3324953770
> "osdmap_first_committed": 3441952,
> "osdmap_last_committed": 3442452,
>
> The mon stores osdmaps on disk, and trims the older versions whenever
> the PGs are clean. Trimming brings the osdmap_first_committed to be
> closer to osdmap_last_committed.
> In a cluster with no PGs backfilling or recovering, the mon should
> trim that difference to be within 500-750 epochs.
>
> If there are any PGs backfilling or recovering, then the mon will not
> trim beyond the osdmap epoch when the pools were clean.
>
> So if you are accumulating gigabytes of data in the mon dir, it
> suggests that you have unclean PGs/Pools.
>
> Cheers, dan
>
>
>
>
> On Fri, Oct 2, 2020 at 4:14 PM Marc Roos  wrote:
>>
>> Does this also count if your cluster is not healthy because of errors
>> like '2 pool(s) have no replicas configured'
>> I sometimes use these pools for testing, they are empty.
>>
>>
>>
>>
>> -Original Message-
>> Cc: ceph-users
>> Subject: [ceph-users] Re: Massive Mon DB Size with noout on 14.2.11
>>
>> As long as the cluster is not healthy, the OSD will require much more
>> space, depending on the cluster size and other factors. Yes this is
>> somewhat normal.
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Massive Mon DB Size with noout on 14.2.11

2020-10-02 Thread Andreas John
Hello,

we observed massive and sudden growth of the mon db size on disk, from
50MB to 20GB+ (GB!) and thus reaching 100% disk usage on the mountpoint.

As far as we can see, it happens if we set "noout" for a node reboot:
After the node and the OSDs come back it looks like the mon db size
increased drastically.

We have 14.2.11, 10 OSD @ 2TB and cephfs in use.

Is this a known issue? Should we avoid noout?


TIA,

derjohn


-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Doing minor version update of Ceph cluster with ceph-ansible and rolling-update playbook

2020-09-28 Thread andreas . elvers+lists . ceph . io
I want to update my mimic cluster to the latest minor version using the 
rolling-update script of ceph-ansible. The cluster was rolled out with that 
setup. 

So as long as ceph_stable_release stays on the currently installed version 
(mimic), the rolling-update script will only do a minor update. 

Is this assumption correct? The documentation 
(https://docs.ceph.com/projects/ceph-ansible/en/latest/day-2/upgrade.html) is 
short on this.

Thanks!
- Andreas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Remove separate WAL device from OSD

2020-09-22 Thread Andreas John
Hello,

isn't "ceph-osd -i osdnum... --flush-journal" and then removing the journal
enough?



On 22.09.20 21:09, Michael Fladischer wrote:
> Hi,
>
> Is it possible to remove an existing WAL device from an OSD? I saw
> that ceph-bluestore-tool has a command bluefs-bdev-migrate, but it's
> not clear to me if this can only move a WAL device or if it can be
> used to remove it ...
>
> Regards,
> Michael
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unknown PGs after osd move

2020-09-22 Thread Andreas John

On 22.09.20 22:09, Nico Schottelius wrote:
[...]
> All nodes are connected with 2x 10 Gbit/s bonded/LACP, so I'd expect at
> least a couple of hundred MB/s network bandwidth per OSD.
>
> On one server I just restarted the OSDs and now the read performance
> dropped down to 1-4 MB/s per OSD with being about 90% busy.
>
> Since nautilus we observed much longer starting times of OSDs and I
> wonder if the osd does some kind of fsck these days and delays the
> peering process because of that?
>
> The disks in question are 3.5"/10TB/6 Gbit/s SATA disks connected to an
> H800 controller - so generally speaking I do not see a reasonable
> bottleneck here.
Yes, I should! I saw in your mail:


1.)        1532 slow requests are blocked > 32 sec
    789 slow ops, oldest one blocked for 1949 sec, daemons
[osd.12,osd.14,osd.2,osd.20,osd.23,osd.25,osd.3,osd.33,osd.35,osd.50]...
have slow ops.


A request that is blocked for > 32 sec is odd! Same goes for 1949 sec.
In my experience, they will never finish. Sometimes they go away with OSD
restarts. Are those OSDs the ones you relocated?


2.) client:   91 MiB/s rd, 28 MiB/s wr, 1.76k op/s rd, 686 op/s wr
    recovery: 67 MiB/s, 17 objects/s

67 MB/s is slower than a single rotational disk can deliver. Even 67
+ 91 MB/s is not much, especially not for an 85-OSD cluster on 10G. The
~2500 IOPS of client I/O will translate to 7500 "net" IOPS with pool size
3, maybe that is the limit.

But I guess you already know that. Before tuning, you should
probably listen to Frank's advice about the placements (see other post).
As soon as the unknown OSDs come back, the speed will probably go up due to
parallelism.


rgds,

j.










___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unknown PGs after osd move

2020-09-22 Thread Andreas John
Hey Nico,

maybe you "pinned" the IP of the OSDs in question in ceph.conf to the IP
of the old chassis?


Good Luck,

derjohn


P.S. < 100 MB/s is terrible performance for recovery with 85 OSDs.
Is it rotational disks on a 1 GBit/s network? You could set "ceph osd set
nodeep-scrub" to prevent too many reads from the platters and get better
recovery performance.






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unknown PGs after osd move

2020-09-22 Thread Andreas John
Hello,

On 22.09.20 20:45, Nico Schottelius wrote:
> Hello,
>
> after having moved 4 ssds to another host (+ the ceph tell hanging issue
> - see previous mail), we ran into 241 unknown pgs:

You mean that you re-seated the OSDs into another chassis/host? Is the
crush map aware of that?

I never tried that, but don't you need to crush move it?



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mount CEPH-FS on multiple hosts with concurrent access to the same data objects?

2020-09-22 Thread Andreas John
Hello,

https://docs.ceph.com/en/latest/rados/operations/erasure-code/

but you could probably intervene manually if you want an erasure-coded
pool.


rgds,

j.


On 22.09.20 14:55, René Bartsch wrote:
> On Tuesday, 22.09.2020 at 14:43 +0200, Andreas John wrote:
>> Hello,
>>
>> yes, it does. It even comes with a GUI to manage Ceph and its own
>> basic-setup tool. No EC support.
> What do you mean with EC?
>
> Regards,
>
> Renne
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mount CEPH-FS on multiple hosts with concurrent access to the same data objects?

2020-09-22 Thread Andreas John
Hello,

yes, it does. It even comes with a GUI to manage Ceph and its own basic-setup
tool. No EC support.

The only issue is with the backup stuff, which uses "vzdump" under
the hood and can cause high load.

The reason is not really known yet, but some suspect that small block
sizes cause large readahead in Ceph. Use eve4pve-barc instead.


rgds

j.
On 22.09.20 14:31, René Bartsch wrote:
> On Tuesday, 22.09.2020 at 08:50 +0200, Robert Sander wrote:
>
>> Do you know that Proxmox is able to store VM images as RBD directly
>> in a
>> Ceph cluster?
> Does Proxmox support snapshots, backups and thin provisioning with RBD-
> VM images?
>
> Regards,
>
> Renne
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Many scrub errors after update to 14.2.10

2020-08-05 Thread Andreas Haupt
Hi *,

after updating our CEPH cluster from 14.2.9 to 14.2.10 it accumulates
scrub errors on multiple osds:

[cephmon1] /root # ceph health detail
HEALTH_ERR 6 scrub errors; Possible data damage: 6 pgs inconsistent
OSD_SCRUB_ERRORS 6 scrub errors
PG_DAMAGED Possible data damage: 6 pgs inconsistent
pg 3.69 is active+clean+inconsistent, acting [59,65,61]
pg 3.73 is active+clean+inconsistent, acting [73,88,25]
pg 12.29 is active+clean+inconsistent, acting [55,92,42]
pg 12.38 is active+clean+inconsistent, acting [150,42,13]
pg 12.46 is active+clean+inconsistent, acting [55,18,84]
pg 12.75 is active+clean+inconsistent, acting [55,155,49]

They all can easily get repaired (ceph pg repair $pg) - but I wonder
what could be the source of the problem. The cluster started with
Luminous some years ago, was updated to Mimic, then Nautilus. Never
seen this before!
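
For digging into the cause before repairing, the inconsistencies can be
inspected per PG, roughly like this (a sketch using one of the PGs above;
note that a pure stat mismatch may not show any per-object entries):

rados list-inconsistent-pg <pool>                        # <pool> is a placeholder
rados list-inconsistent-obj 12.38 --format=json-pretty   # details for one of the PGs listed above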

OSDs are a mixture of HDD/SSD, both are affected. All on Bluestore.

Any idea? Was there maybe a code change between 14.2.9 & 14.2.10 that
could explain this? Errors in syslog look like this:

Aug  5 19:21:21 krake08 ceph-osd: 2020-08-05 19:21:21.831 7fb6b2b9d700 -1 log_channel(cluster) log [ERR] : 12.38 scrub : stat mismatch, got 74/74 objects, 20/20 clones, 74/74 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 182904850/172877842 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
Aug  5 19:21:21 krake08 ceph-osd: 2020-08-05 19:21:21.831 7fb6b2b9d700 -1 log_channel(cluster) log [ERR] : 12.38 scrub 1 errors
Aug  6 08:28:44 krake08 ceph-osd: 2020-08-06 08:28:44.477 7fb6b2b9d700 -1 log_channel(cluster) log [ERR] : 12.38 repair : stat mismatch, got 76/76 objects, 22/22 clones, 76/76 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 183166994/173139986 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
Aug  6 08:28:44 krake08 ceph-osd: 2020-08-06 08:28:44.477 7fb6b2b9d700 -1 log_channel(cluster) log [ERR] : 12.38 repair 1 errors, 1 fixed

Thanks in advance,
Andreas
-- 
| Andreas Haupt| E-Mail: andreas.ha...@desy.de
|  DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt
|  Platanenallee 6 | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen | Fax:+49/33762/7-7216



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Nautilus to Octopus Upgrade mds without downtime

2020-05-27 Thread Andreas Schiefer

Hello,

if I understand correctly:
if we upgrade from a running Nautilus cluster to Octopus, we have 
downtime during the update of the MDS.


Is this correct?


Mit freundlichen Grüßen / Kind regards
Andreas Schiefer
Leiter Systemadministration / Head of systemadministration


---
HOME OF LOYALTY
CRM- & Customer Loyalty Solution

by UW Service
Gesellschaft für Direktwerbung und Marketingberatung mbH
Alter Deutzer Postweg 221
51107 Koeln (Rath/Heumar)
Deutschland

Telefon : +49 221 98696 0
Telefax : +49 221 98696 5222 


i...@uw-service.de
www.hooloy.de

Amtsgericht Koeln HRB 24 768
UST-ID: DE 164 191 706
Geschäftsführer: Ralf Heim
---
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: missing amqp-exchange on bucket-notification with AMQP endpoint

2020-04-22 Thread Andreas Unterkircher

Dear Yuval!


The message format you tried to use is the standard one (the one being
emitted from boto3, or any other AWS SDK [1]).
It passes the arguments using 'x-www-form-urlencoded'. For example:


Thank you for your clarification! I had previously tried it with an
x-www-form-urlencoded body as well, but I failed. That it then worked
using the non-standard parameters led me down the wrong 
road...

But I have to admit that I'm still failing to create a topic the S3-way.

I've tried it with curl as well as with Postman.
Even if I use your example body, Ceph keeps telling me (at least) 
method-not-allowed.


Is this maybe because I'm using an AWS Sig v4 to authenticate?

This is the request I'm sending out:

POST / HTTP/1.1
Content-Type: application/x-www-form-urlencoded; charset=utf-8
Accept-Encoding: identity
Date: Tue, 23 Apr 2020 05:00:35 GMT
X-Amz-Content-Sha256: e8d828552b412fde2cd686b0a984509bc485693a02e8c53ab84cf36d1dbb961a
Host: s3.example.com
X-Amz-Date: 20200423T050035Z
Authorization: AWS4-HMAC-SHA256 Credential=DNQXT3I8Z5MWDJ1A8YMP/20200423/de/s3/aws4_request, SignedHeaders=accept-encoding;content-type;date;host;x-amz-content-sha256;x-amz-date, Signature=fa65844ba997fe11e65be87a18f160afe1ea459892316d6060bbc663daf6eace

User-Agent: PostmanRuntime/7.24.1
Accept: */*
Connection: keep-alive

Content-Length: 303

Name=ajmmvc-1_topic_1&
Attributes.entry.2.key=amqp-exchange&
Attributes.entry.1.key=amqp-ack-level&
Attributes.entry.2.value=amqp.direct&
Version=2010-03-31&
Attributes.entry.3.value=amqp%3A%2F%2F127.0.0.1%3A7001&
Attributes.entry.1.value=none&
Action=CreateTopic&
Attributes.entry.3.key=push-endpoint


This is the response that comes back:

HTTP/1.1 405 Method Not Allowed
Content-Length: 200
x-amz-request-id: tx1-005ea12159-6e47a-s3-datacenter
Accept-Ranges: bytes
Content-Type: application/xml
Date: Thu, 23 Apr 2020 05:02:17 GMT
encoding="UTF-8"?>MethodNotAllowedtx1-005ea12159-6e47a-s3-datacenter6e47a-s3-datacenter-de



This is what radosgw is seeing at the same time:

2020-04-23T07:02:17.745+0200 7f5aab2af700 20 final domain/bucket subdomain= domain=s3.example.com in_hosted_domain=1 in_hosted_domain_s3website=0 s->info.domain=s3.example.com s->info.request_uri=/
2020-04-23T07:02:17.745+0200 7f5aab2af700 10 meta>> HTTP_X_AMZ_CONTENT_SHA256
2020-04-23T07:02:17.745+0200 7f5aab2af700 10 meta>> HTTP_X_AMZ_DATE
2020-04-23T07:02:17.745+0200 7f5aab2af700 10 x>> x-amz-content-sha256:e8d828552b412fde2cd686b0a984509bc485693a02e8c53ab84cf36d1dbb961a
2020-04-23T07:02:17.745+0200 7f5aab2af700 10 x>> x-amz-date:20200423T050035Z
2020-04-23T07:02:17.745+0200 7f5aab2af700 20 req 1 0s get_handler handler=26RGWHandler_REST_Service_S3
2020-04-23T07:02:17.745+0200 7f5aab2af700 10 handler=26RGWHandler_REST_Service_S3

2020-04-23T07:02:17.745+0200 7f5aab2af700  2 req 1 0s getting op 4
2020-04-23T07:02:17.745+0200 7f5aab2af700 10 Content of POST:
Name=ajmmvc-1_topic_1&
Attributes.entry.2.key=amqp-exchange&
Attributes.entry.1.key=amqp-ack-level&
Attributes.entry.2.value=amqp.direct&
Version=2010-03-31&
Attributes.entry.3.value=amqp%3A%2F%2F127.0.0.1%3A7001&
Attributes.entry.1.value=none&
Action=CreateTopic&
Attributes.entry.3.key=push-endpoint

2020-04-23T07:02:17.745+0200 7f5aab2af700 10 Content of POST:
Name=ajmmvc-1_topic_1&
Attributes.entry.2.key=amqp-exchange&
Attributes.entry.1.key=amqp-ack-level&
Attributes.entry.2.value=amqp.direct&
Version=2010-03-31&
Attributes.entry.3.value=amqp%3A%2F%2F127.0.0.1%3A7001&
Attributes.entry.1.value=none&
Action=CreateTopic&
Attributes.entry.3.key=push-endpoint

2020-04-23T07:02:17.745+0200 7f5aab2af700 10 Content of POST:
Name=ajmmvc-1_topic_1&
Attributes.entry.2.key=amqp-exchange&
Attributes.entry.1.key=amqp-ack-level&
Attributes.entry.2.value=amqp.direct&
Version=2010-03-31&
Attributes.entry.3.value=amqp%3A%2F%2F127.0.0.1%3A7001&
Attributes.entry.1.value=none&
Action=CreateTopic&
Attributes.entry.3.key=push-endpoint

2020-04-23T07:02:17.745+0200 7f5aab2af700  1 handler->ERRORHANDLER: 
err_no=-2003 new_err_no=-2003

2020-04-23T07:02:17.745+0200 7f5aab2af700  2 req 1 0s http status=405
2020-04-23T07:02:17.745+0200 7f5aab2af700  1 == req done 
req=0x7f5aab2a6d50 op status=0 http_status=405 latency=0s ==







Best Regards,
Andreas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: missing amqp-exchange on bucket-notification with AMQP endpoint

2020-04-20 Thread Andreas Unterkircher

I've tried to debug this a bit.


     
amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 
Attributes.entry.1.key=amqp-exchange&Attributes.entry.1.value=amqp.direct&push-endpoint=amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 
     testtopic

     


For the above I was using the following request to create the topic - 
similar as it is described here [1]:


https://ceph.example.com/?Action=CreateTopic&Name=testtopic&Attributes.entry.1.key=amqp-exchange&Attributes.entry.1.value=amqp.direct&push-endpoint=amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672

(of course endpoint then URL-encoded)

It seems to me that RGWHTTPArgs::parse() is not translating the 
"Attributes.entry.1..." strings into keys & values in its map.


This are the keys & values that can now be found in the map:


Found name:  Attributes.entry.1.key
Found value: amqp-exchange
Found name:  Attributes.entry.1.value
Found value: amqp.direct
Found name:  push-endpoint
Found value: amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672

If I simply change the request to:

https://ceph.example.com/?Action=CreateTopic&Name=testtopic&amqp-exchange=amqp.direct&push-endpoint=amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672/foobar

-> and voilà, the entries in the map are correct


Found name:  amqp-exchange
Found value: amqp.direct
Found name:  push-endpoint
Found value: amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672

And then the bucket-notification works like it should.

But I don't think the documentation is wrong, or is it?

Cheers,
Andreas


[1] https://docs.ceph.com/docs/master/radosgw/notifications/#create-a-topic



[2] Index: ceph-15.2.1/src/rgw/rgw_common.cc
===
--- ceph-15.2.1.orig/src/rgw/rgw_common.cc
+++ ceph-15.2.1/src/rgw/rgw_common.cc
@@ -810,6 +810,8 @@ int RGWHTTPArgs::parse()
   string& name = nv.get_name();
   string& val = nv.get_val();

+  cout << "Found name:  " << name << std::endl;
+  cout << "Found value: " << val << std::endl;
   append(name, val);
 }
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] missing amqp-exchange on bucket-notification with AMQP endpoint

2020-04-20 Thread Andreas Unterkircher

Hello List,

I'm trying to create a (S3-)bucket-notification into RabbitMQ via
AMQP - on Ceph v15.2.1 octopus, using the official .deb packages on 
Debian Buster.



I've created the following topic (directly via S3, not via pubsub REST API):

https://sns.amazonaws.com/doc/2010-03-31/";>
    
    
    
    testuser
    testtopic
    
    
amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672
    
Attributes.entry.1.key=amqp-exchange&Attributes.entry.1.value=amqp.direct&push-endpoint=amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672

    testtopic
    
    arn:aws:sns:de::testtopic
    
    
    
    
...


Then I've created the following bucket-notification


    
    notify-psapp
    arn:aws:sns:de::testtopic
    s3:ObjectCreated:*
    s3:ObjectRemoved:*
    



When I upload a file into the bucket, the event itself seems to get 
fired, but radosgw keeps telling me that amqp-exchange is not set:

2020-04-20T12:24:29.935+0200 7ff01c5d3700  1 == starting new request req=0x7ff01c5cad50 =

2020-04-20T12:24:30.019+0200 7ff01c5d3700  1 ERROR: failed to create push endpoint: amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 due to: pubsub endpoint configuration error: AMQP: missing amqp-exchange


But it's there in the EndpointArgs, right?
Or do I miss it somewhere else?

Best Regards,
Andreas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW do not show up in 'ceph status'

2020-02-24 Thread Andreas Haupt
Sorry for the noise - problem was introduced by a missing iptables rule
:-(
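
For anyone else hitting this: the exact rule is specific to our setup, but it
was of this general shape - an allow for the new RGW hosts towards the MON/MGR
ports on the public network (subnet and ports here are illustrative only,
adjust to your environment):

# e.g. on the MON/MGR hosts, allow the new RGW hosts in
iptables -A INPUT -p tcp -s 192.0.2.0/24 -m multiport --dports 3300,6789,6800:7300 -j ACCEPT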

On Fri, 2020-02-21 at 09:04 +0100, Andreas Haupt wrote:
> Dear all,
> 
> we recently added two additional RGWs to our CEPH cluster (version
> 14.2.7). They work flawlessly, however they do not show up in 'ceph
> status':
> 
> [cephmon1] /root # ceph -s | grep -A 6 services
>   services:
> mon: 3 daemons, quorum cephmon1,cephmon2,cephmon3 (age 14h)
> mgr: cephmon1(active, since 14h), standbys: cephmon2, cephmon3
> mds: cephfs:1 {0=cephmon1=up:active} 2 up:standby
> osd: 168 osds: 168 up (since 2w), 168 in (since 6w)
> rgw: 1 daemon active (ceph-s3)
>  
> As you can see, only the first, old RGW (ceph-s3) is listed. Is there
> any place where the RGWs need to get "announced"? Any idea, how to
> debug this?
> 
> Thanks,
> Andreas
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
-- 
| Andreas Haupt| E-Mail: andreas.ha...@desy.de
|  DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt
|  Platanenallee 6 | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen | Fax:+49/33762/7-7216



smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW do not show up in 'ceph status'

2020-02-21 Thread Andreas Haupt
On Fri, 2020-02-21 at 15:19 +0700, Konstantin Shalygin wrote:
> On 2/21/20 3:04 PM, Andreas Haupt wrote:
> > As you can see, only the first, old RGW (ceph-s3) is listed. Is there
> > any place where the RGWs need to get "announced"? Any idea, how to
> > debug this?
> 
> Did you try to restart the active mgr?

Yes, multiple times, it did not change anything.
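
For completeness, the usual ways to bounce the active mgr would be something
like this (daemon name taken from the 'ceph -s' output in the first mail):

ceph mgr fail cephmon1
# or, on the host itself:
systemctl restart ceph-mgr@cephmon1

As said, that did not change anything here.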

Cheers,
Andreas
-- 
| Andreas Haupt| E-Mail: andreas.ha...@desy.de
|  DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt
|  Platanenallee 6 | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen | Fax:+49/33762/7-7216



smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW do not show up in 'ceph status'

2020-02-21 Thread Andreas Haupt
Dear all,

we recently added two additional RGWs to our CEPH cluster (version
14.2.7). They work flawlessly, however they do not show up in 'ceph
status':

[cephmon1] /root # ceph -s | grep -A 6 services
  services:
mon: 3 daemons, quorum cephmon1,cephmon2,cephmon3 (age 14h)
mgr: cephmon1(active, since 14h), standbys: cephmon2, cephmon3
mds: cephfs:1 {0=cephmon1=up:active} 2 up:standby
osd: 168 osds: 168 up (since 2w), 168 in (since 6w)
rgw: 1 daemon active (ceph-s3)
 
As you can see, only the first, old RGW (ceph-s3) is listed. Is there
any place where the RGWs need to get "announced"? Any idea, how to
debug this?
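
One thing that might help narrow it down (a sketch, nothing specific to our
setup): the rgw line in 'ceph status' comes from the mgr's service map, so
comparing that map against the daemons actually running on each RGW host
shows which gateways never managed to register:

# what the cluster believes:
ceph service dump | jq '.services.rgw'
# versus what is running on each RGW host:
systemctl list-units 'ceph-radosgw@*'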

Thanks,
Andreas
-- 
| Andreas Haupt| E-Mail: andreas.ha...@desy.de
|  DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt
|  Platanenallee 6 | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen | Fax:+49/33762/7-7216



smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd is immidietly down and uses CPU full.

2020-02-02 Thread Andreas John
tive+recovery_wait+undersized+degraded+remapped
>    1 active+recovery_wait+degraded+remapped
> recovery io 239 MB/s, 187 objects/s
>   client io 575 kB/s wr, 0 op/s rd, 37 op/s wr
>
>  ceph osd tree
> --
>
> [root@ceph01 ceph]# ceph osd tree
> ID WEIGHT    TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 108.19864 root default
> -2  19.09381 host ceph01
>  0   2.72769 osd.0    up  1.0  1.0
>  1   2.72769 osd.1  down    0  1.0 <-- now
> down
>  2   2.72769 osd.2    up  1.0  1.0
>  5   2.72769 osd.5    up  1.0  1.0
>  6   2.72768 osd.6    up  1.0  1.0
>  3   2.72768 osd.3    up  1.0  1.0
>  4   2.72769 osd.4    up  1.0  1.0
> -3  19.09383 host ceph02
>  8   2.72769 osd.8    up  1.0  1.0
>  9   2.72769 osd.9    up  1.0  1.0
> 10   2.72769 osd.10   up  1.0  1.0
> 12   2.72769 osd.12   up  1.0  1.0
> 11   2.72769 osd.11   up  1.0  1.0
>  7   2.72768 osd.7    up  1.0  1.0
> 13   2.72769 osd.13   up  1.0  1.0
> -4  16.36626 host ceph03
> 14   2.72769 osd.14   up  1.0  1.0
> 16   2.72769 osd.16   up  1.0  1.0
> 17   2.72769 osd.17   up  1.0  1.0
> 19   2.72769 osd.19   up  1.0  1.0
> 15   1.81850 osd.15   up  1.0  1.0
> 18   1.81850 osd.18   up  1.0  1.0
> 20   1.81850 osd.20   up  1.0  1.0
> -5  15.45706 host ceph04
> 23   2.72769 osd.23   up  1.0  1.0
> 24   2.72769 osd.24   up  1.0  1.0
> 27   2.72769 osd.27 down    0  1.0 <--
> more then 3month ago
> 21   1.81850 osd.21   up  1.0  1.0
> 22   1.81850 osd.22   up  1.0  1.0
> 25   1.81850 osd.25   up  1.0  1.0
> 26   1.81850 osd.26   up  1.0  1.0
> -6  19.09384 host ceph05
> 28   2.72769 osd.28   up  1.0  1.0
> 29   2.72769 osd.29   up  1.0  1.0
> 30   2.72769 osd.30   up  1.0  1.0
> 31   2.72769 osd.31 down    0  1.0 <--
> more then 3month ago
> 32   2.72769 osd.32   up  1.0  1.0
> 34   2.72769 osd.34   up  1.0  1.0
> 33   2.72769 osd.33 down    0  1.0 <--
> more then 3month ago
> -7  19.09384 host ceph06
> 35   2.72769 osd.35   up  1.0  1.0
> 36   2.72769 osd.36   up  1.0  1.0
> 37   2.72769 osd.37   up  1.0  1.0
> 39   2.72769 osd.39   up  1.0  1.0
> 40   2.72769 osd.40   up  1.0  1.0
> 41   2.72769 osd.41   up  1.0  1.0
> 38   2.72769 osd.38 down    0  1.0 <--
> more then 3month ago
>
>
> --
>
> On 2020/02/02 11:20, 西宮 牧人 wrote:
>> Servers: 6 (include 7osds) total 42osdsl
>> OS: Centos7
>> Ceph: 10.2.5
>>
>> Hi, everyone
>>
>> The cluster is used for VM image storage and object storage.
>> And I have a bucket which has more than 20 million objects.
>>
>> Now, I have a problem that cluster blocks operation.
>>
>> Suddenly cluster blocked operations, then VMs can't read disk.
>> After a few hours, osd.1 was down.
>>
>> There is no disk fail messages in dmesg.
>> And no error is in smartctl -a /dev/sde.
>>
>> I tried to wake up osd.1, but osd.1 is down soon.
>> Just after re-waking up osd.1, VM can access to the disk.
>> But osd.1 always uses 100% CPU, then cluster marked osd.1 down and
>> the osd was dead by suicide timeout.
>>
>> I found that the osdmap epoch of osd.1 is different from other one.
>> So I think osd.1 was dead.
>>
>>
>> Question.
>> (1) Why does the epoch of osd.1 differ from other osds ones ?
>>
>>   I checked all osds oldest_map and newest_map by ~ceph daemon osd.X
>> status~
>>   All osd's ecpoch are same number except osd.1
>>
>> (2) Why does osd.1 use CPU full?
>>
>>   After the cluster marked osd.1 down, osd.1 keeps up busy.
>>   When I execute "ceph tell osd.1 injectargs --debug-ms 5/1", osd.1
>> doesn't answer.
>>
>>
>> Thank you.
>
-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Getting rid of trim_object Snap .... not in clones

2020-02-01 Thread Andreas John
Hello,

Answering myself in case someone else stumbles upon this thread in the
future. I was able to remove the unexpected snap; here is the recipe:


How to remove the unexpected snapshots:

1.) Stop the OSD
ceph-osd -i 14 --flush-journal
 ...  flushed journal /var/lib/ceph/osd/ceph-14/journal for object store
/var/lib/ceph/osd/ceph-14

2.) List the Object in question
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14
--journal-path
/dev/disk/by-partuuid/212e9db1-943b-45f9-9d83-cffaeb777db7 --op list
rbd_data.59cb9c679e2a9e3.3096
[wait ... it might take minutes]

["7.374",{"oid":"rbd_data.59cb9c679e2a9e3.3096","key":"","snapid":171076,"hash":2728045428,"max":0,"pool":7,"namespace":""}]
["7.374",{"oid":"rbd_data.59cb9c679e2a9e3.3096","key":"","snapid":171797,"hash":2728045428,"max":0,"pool":7,"namespace":""}]
["7.374",{"oid":"rbd_data.59cb9c679e2a9e3.3096","key":"","snapid":-2,"hash":2728045428,"max":0,"pool":7,"namespace":""}]

3.) Remove the snap from the object
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14
--journal-path
/dev/disk/by-partuuid/212e9db1-943b-45f9-9d83-cffaeb777db7
["7.374",{"oid":"rbd_data.59cb9c679e2a9e3.3096","key":"","snapid":171076,"hash":2728045428,"max":0,"pool":7,"namespace":""}]
remove
[wait ... it might take minutes]
remove 7/a29aab74/rbd_data.59cb9c679e2a9e3.3096/29c44

4.) Start the OSD Again

5.) Do this for all OSD on which the snap it exists. If it still exists
on one of the other OSDs, it will be synced before repair starts and
thus cause harm again.

6.) ceph pg repair 7.374
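
To double-check afterwards, one can kick off another deep-scrub on the PG and
watch the health output (pg id as above):

ceph pg deep-scrub 7.374
ceph health detail | grep 7.374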


Happy again and in need of sleep,

derjohn




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Getting rid of trim_object Snap .... not in clones

2020-02-01 Thread Andreas John
Update: When repairing the PG I get a different error:


osd.14 80.69.45.76:6813/4059849 27 : cluster [INF] 7.374 repair starts
osd.14 80.69.45.76:6813/4059849 28 : cluster [ERR] 7.374 recorded data
digest 0xebbbfb83 != on disk 0x43d61c5d on
7/a29aab74/rbd_data.59cb9c679e2a9e3.3096/29c44
osd.14 80.69.45.76:6813/4059849 29 : cluster [ERR] repair 7.374
7/a29aab74/rbd_data.59cb9c679e2a9e3.3096/29c44 is an
unexpected clone
osd.14 80.69.45.76:6813/4059849 30 : cluster [ERR] 7.374 repair stat
mismatch, got 2110/2111 objects, 131/132 clones, 2110/2111 dirty, 0/0
omap, 0/0 hit_set_archive, 0/0 whiteouts, 8304141312/8304264192
bytes,0/0 hit_set_archive bytes.
osd.14 80.69.45.76:6813/4059849 31 : cluster [ERR] 7.374 repair 3
errors, 1 fixed
osd.14 80.69.45.76:6813/4059849 32 : cluster [INF] 7.374 deep-scrub starts
osd.14 80.69.45.76:6813/4059849 33 : cluster [ERR] deep-scrub 7.374
7/a29aab74/rbd_data.59cb9c679e2a9e3.3096/29c44 is an
unexpected clone
osd.14 80.69.45.76:6813/4059849 34 : cluster [ERR] 7.374 deep-scrub 1 errors

Sorry for being so noisy on the list, but maybe someone can now recognize
what to do and give me a hint.

rgds.,
j


#On 01.02.20 10:20, Andreas John wrote:
> Hello,
>
> for those sumbling upon a similar issue: I was able to mitigate the
> issue, by setting
>
>
> === 8< ===
>
> [osd.14]
> osd_pg_max_concurrent_snap_trims = 0
>
> =
>
>
> in ceph.conf. You don't need to restart the osd, osd crash crash +
> systemd will do it for you :)
>
> Now the osd in question does no trimming anymore and thus stays up.
>
> Now I let the deep-scrubber run, and press thumbs it will clean up the
> mess.
>
>
> In case I need to clean up manually, could anyone give a hint how to
> find the rbd with that snap? The logs says:
>
>
> 7faf8f716700 -1 log_channel(cluster) log [ERR] : trim_object Snap 29c44
> not in clones
>
>
> 1.) What is the 7faf8f716700 at the beginning of the log? Is it a daemon
> id?
>
> 2.) About the Snap "ID" 29c44: In the filesystem I see
>
> ...ceph-14/current/7.374_head/DIR_4/DIR_7/DIR_B/DIR_A/rbd\udata.59cb9c679e2a9e3.3096__29c44_A29AAB74__7
>
> Do I read it correctly that in PG 7.374 there is with rbd prefix
> 59cb9c679e2a9e3 an object that ends with ..3096, which has a snap ID
> 29c44 ... ? What does the part A29AAB74__7 ?
>
> I was nit able to find in docs how the directory / filename is structured.
>
>
> Best Regrads,
>
> j.
>
>
>
> On 31.01.20 16:04, Andreas John wrote:
>> Hello,
>>
>> in my cluster one after the other OSD dies until I recognized that it
>> was simply an "abort" in the daemon caused probably by
>>
>> 2020-01-31 15:54:42.535930 7faf8f716700 -1 log_channel(cluster) log
>> [ERR] : trim_object Snap 29c44 not in clones
>>
>>
>> Close to this msg I get a stracktrace:
>>
>>
>>  ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
>>  1: /usr/bin/ceph-osd() [0xb35f7d]
>>  2: (()+0x11390) [0x7f0fec74b390]
>>  3: (gsignal()+0x38) [0x7f0feab43428]
>>  4: (abort()+0x16a) [0x7f0feab4502a]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f0feb48684d]
>>  6: (()+0x8d6b6) [0x7f0feb4846b6]
>>  7: (()+0x8d701) [0x7f0feb484701]
>>  8: (()+0x8d919) [0x7f0feb484919]
>>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x27e) [0xc3776e]
>>  10: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x10dd) [0x868cfd]
>>  11: (ReplicatedPG::repop_all_committed(ReplicatedPG::RepGather*)+0x80)
>> [0x8690e0]
>>  12: (Context::complete(int)+0x9) [0x6c8799]
>>  13: (void ReplicatedBackend::sub_op_modify_reply> 113>(std::tr1::shared_ptr)+0x21b) [0xa5ae0b]
>>  14:
>> (ReplicatedBackend::handle_message(std::tr1::shared_ptr)+0x15b)
>> [0xa53edb]
>>  15: (ReplicatedPG::do_request(std::tr1::shared_ptr&,
>> ThreadPool::TPHandle&)+0x1cb) [0x84c78b]
>>  16: (OSD::dequeue_op(boost::intrusive_ptr,
>> std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x3ef) [0x6966ff]
>>  17: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0x4e4) [0x696e14]
>>  18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x71e)
>> [0xc264fe]
>>  19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xc29950]
>>  20: (()+0x76ba) [0x7f0fec7416ba]
>>  21: (clone()+0x6d) [0x7f0feac1541d]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is
>> needed to interpret this.
>>
>>
>> Yes, I know it's still hammer, I want to upgrade soon, but I want to
>> resolve that issue first. If I lose that PG, I don't worry.
>&

[ceph-users] Getting rid of trim_object Snap .... not in clones

2020-02-01 Thread Andreas John
Hello,

for those stumbling upon a similar issue: I was able to mitigate the
issue by setting


=== 8< ===

[osd.14]
osd_pg_max_concurrent_snap_trims = 0

=


in ceph.conf. You don't need to restart the osd; the crash + systemd
restart will do it for you :)
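
The same value can also be injected into the running osd without editing
ceph.conf, provided the osd stays up long enough to take the command, e.g.:

ceph tell osd.14 injectargs '--osd_pg_max_concurrent_snap_trims 0'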

Now the osd in question does no trimming anymore and thus stays up.

Now I let the deep-scrubber run and keep my fingers crossed that it will
clean up the mess.


In case I need to clean up manually, could anyone give a hint on how to
find the rbd with that snap? The log says:


7faf8f716700 -1 log_channel(cluster) log [ERR] : trim_object Snap 29c44
not in clones


1.) What is the 7faf8f716700 at the beginning of the log? Is it a daemon
id?

2.) About the Snap "ID" 29c44: In the filesystem I see

...ceph-14/current/7.374_head/DIR_4/DIR_7/DIR_B/DIR_A/rbd\udata.59cb9c679e2a9e3.3096__29c44_A29AAB74__7

Do I read it correctly that in PG 7.374 there is an object with rbd prefix
59cb9c679e2a9e3 that ends with ..3096, which has a snap ID
29c44? And what does the part A29AAB74__7 mean?

I was not able to find in the docs how the directory / filename is structured.
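
A crude but workable way to map the rbd prefix back to an image name is to
loop over the images and grep their block_name_prefix (the pool name below is
just a guess, adjust as needed):

for img in $(rbd -p rbd ls); do
    rbd -p rbd info "$img" | grep -q 59cb9c679e2a9e3 && echo "$img"
done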


Best Regards,

j.



On 31.01.20 16:04, Andreas John wrote:
> Hello,
>
> in my cluster one after the other OSD dies until I recognized that it
> was simply an "abort" in the daemon caused probably by
>
> 2020-01-31 15:54:42.535930 7faf8f716700 -1 log_channel(cluster) log
> [ERR] : trim_object Snap 29c44 not in clones
>
>
> Close to this msg I get a stracktrace:
>
>
>  ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
>  1: /usr/bin/ceph-osd() [0xb35f7d]
>  2: (()+0x11390) [0x7f0fec74b390]
>  3: (gsignal()+0x38) [0x7f0feab43428]
>  4: (abort()+0x16a) [0x7f0feab4502a]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f0feb48684d]
>  6: (()+0x8d6b6) [0x7f0feb4846b6]
>  7: (()+0x8d701) [0x7f0feb484701]
>  8: (()+0x8d919) [0x7f0feb484919]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x27e) [0xc3776e]
>  10: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x10dd) [0x868cfd]
>  11: (ReplicatedPG::repop_all_committed(ReplicatedPG::RepGather*)+0x80)
> [0x8690e0]
>  12: (Context::complete(int)+0x9) [0x6c8799]
>  13: (void ReplicatedBackend::sub_op_modify_reply 113>(std::tr1::shared_ptr)+0x21b) [0xa5ae0b]
>  14:
> (ReplicatedBackend::handle_message(std::tr1::shared_ptr)+0x15b)
> [0xa53edb]
>  15: (ReplicatedPG::do_request(std::tr1::shared_ptr&,
> ThreadPool::TPHandle&)+0x1cb) [0x84c78b]
>  16: (OSD::dequeue_op(boost::intrusive_ptr,
> std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x3ef) [0x6966ff]
>  17: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x4e4) [0x696e14]
>  18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x71e)
> [0xc264fe]
>  19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xc29950]
>  20: (()+0x76ba) [0x7f0fec7416ba]
>  21: (clone()+0x6d) [0x7f0feac1541d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
>
>
> Yes, I know it's still hammer, I want to upgrade soon, but I want to
> resolve that issue first. If I lose that PG, I don't worry.
>
> So: What it the best approach? Can I use something like
> ceph-objectstore-tool ...  remove-clone-metadata  ? I
> assume 29c44 is my Object, but what's the clone od?
>
>
> Best regards,
>
> derjohn
>
> _______
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
Andreas John
net-lab GmbH  |  Frankfurter Str. 99  |  63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Getting rid of trim_object Snap .... not in clones

2020-01-31 Thread Andreas John
Hello,

in my cluster one OSD after the other dies, until I recognized that it
was simply an "abort" in the daemon, probably caused by

2020-01-31 15:54:42.535930 7faf8f716700 -1 log_channel(cluster) log
[ERR] : trim_object Snap 29c44 not in clones


Close to this msg I get a stack trace:


 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: /usr/bin/ceph-osd() [0xb35f7d]
 2: (()+0x11390) [0x7f0fec74b390]
 3: (gsignal()+0x38) [0x7f0feab43428]
 4: (abort()+0x16a) [0x7f0feab4502a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f0feb48684d]
 6: (()+0x8d6b6) [0x7f0feb4846b6]
 7: (()+0x8d701) [0x7f0feb484701]
 8: (()+0x8d919) [0x7f0feb484919]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x27e) [0xc3776e]
 10: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x10dd) [0x868cfd]
 11: (ReplicatedPG::repop_all_committed(ReplicatedPG::RepGather*)+0x80)
[0x8690e0]
 12: (Context::complete(int)+0x9) [0x6c8799]
 13: (void ReplicatedBackend::sub_op_modify_reply(std::tr1::shared_ptr)+0x21b) [0xa5ae0b]
 14:
(ReplicatedBackend::handle_message(std::tr1::shared_ptr)+0x15b)
[0xa53edb]
 15: (ReplicatedPG::do_request(std::tr1::shared_ptr&,
ThreadPool::TPHandle&)+0x1cb) [0x84c78b]
 16: (OSD::dequeue_op(boost::intrusive_ptr,
std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x3ef) [0x6966ff]
 17: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x4e4) [0x696e14]
 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x71e)
[0xc264fe]
 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xc29950]
 20: (()+0x76ba) [0x7f0fec7416ba]
 21: (clone()+0x6d) [0x7f0feac1541d]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.


Yes, I know it's still hammer; I want to upgrade soon, but I want to
resolve that issue first. If I lose that PG, I won't worry.

So: What is the best approach? Can I use something like
ceph-objectstore-tool ...  remove-clone-metadata  ? I
assume 29c44 is my object, but what's the clone id?


Best regards,

derjohn

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io