[ceph-users] Re: [External Email] Re: XFS on RBD on EC painfully slow

2021-07-05 Thread Reed Dier
Providing an overdue update to wrap this thread up.

It turns out I wasn't seeing the forest for the trees.
Parallelizing the copy did in fact yield much better throughput than the single-threaded copies.

In the end we used a home-brewed Python script to parallelize the copy, using cp 
rather than rsync to copy things in batches, which took about 48 hours to copy 
~35 TiB from the RBD to CephFS.
I think it could have gone a bit faster, but we throttled it somewhat in an effort 
to keep the RBD NFS export usable with respect to latency and IOPS.
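The script itself isn't included here, but a rough shell equivalent of the batched, 
parallel cp approach would be something like this (paths are hypothetical, and the 
job count is kept modest so the NFS export stays responsive):

    # copy each top-level directory in its own cp process, a few at a time
    find /mnt/rbd-nfs -mindepth 1 -maxdepth 1 -type d -print0 | parallel -0 -j 4 cp -a {} /mnt/cephfs/
    # pick up any files sitting directly at the top level
    find /mnt/rbd-nfs -maxdepth 1 -type f -exec cp -a {} /mnt/cephfs/ \;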

So all in all, when in doubt, parallelize.
Otherwise, you're stuck with painfully slow single threaded performance.

Reed

> On May 30, 2021, at 6:32 AM, Dave Hall  wrote:
> 
> Reed,
> 
> I'd like to add to Sebastian's comments - the problem is probably rsync.
> 
> I inherited a smaller setup than yours when I assumed my current 
> responsibilities - an XFS file system on a RAID, exported over NFS.  The 
> backup process is based on RSnapshot, which is based on rsync over SSH, and 
> the target is another XFS on hardware RAID.  The file system contains a 
> couple thousand user home directories for Computer Science students, so it's 
> wide and deep with lots of small files.
> 
> I was tormented by a daily backup process that was taking 3 days to run - 
> copying the entire file system in a single rsync.  What I ultimately 
> concluded is that rsync slows down dramatically - far worse than linearly - 
> as the size of the file tree to be copied increases.  To get around this, if 
> you play some games with find and GNU parallel, you can break your file 
> system into many small sub-trees and run many rsyncs in parallel.  Copying 
> this way, you will see amazing throughput.
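> Something along these lines, for example (paths and the job count are just
> placeholders):
> 
>     # one rsync per first-level sub-tree, 12 at a time
>     find /src -mindepth 1 -maxdepth 1 -type d -print0 | parallel -0 -j 12 rsync -a {}/ /dst/{/}/
>     # then catch the files that live directly under /src
>     rsync -a --exclude='*/' /src/ /dst/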
> 
> From the point of view of a programmer, I think that rsync must build a 
> representation of the source and destination file trees in memory and then 
> must traverse and re-traverse them to make sure everything got copied and 
> that nothing changed in the source tree.  I've never read the code, but I've 
> seen articles that confirm my theory.
> 
> In my case, because of inflexibility in RSnapshot I have ended up with 26 
> consecutive rsyncs (a*, b*, c*, etc.), and I still go about twice as fast as 
> I would with one large rsync.  However, when I transferred this file system 
> to a new NFS server and new storage I was able to directly rsync each user in 
> parallel.  I filled up a 10Gb pipe and copied the whole FS in an hour.
> 
> Typing in a hurry.  If my explanation is confusing, please don't hesitate to 
> ask me to explain better.  
> 
> -Dave
> 
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu 
> 
> 
> On Fri, May 28, 2021 at 11:12 AM Sebastian Knust <skn...@physik.uni-bielefeld.de> wrote:
> Hi Reed,
> 
> To add to this comment by Weiwen:
> 
> On 28.05.21 13:03, 胡 玮文 wrote:
> > Have you tried just starting multiple rsync processes simultaneously to transfer 
> > different directories? Distributed systems like Ceph often benefit from 
> > more parallelism.
> 
> When I migrated from XFS on iSCSI (legacy system, no Ceph) to CephFS a 
> few months ago, I used msrsync [1] and was quite happy with the speed. 
> For your use case, I would start with -p 12 but might experiment with up 
> to -p 24 (as you only have 6C/12T in your CPU). With many small files, 
> you also might want to increase -s from the default 1000.
> 
> Note that msrsync does not work with the --delete rsync flag. As I was 
> syncing a live system, I ended up with this workflow:
> 
> - Initial sync with msrsync (something like ./msrsync -p 12 --progress 
> --stats --rsync "-aS --numeric-ids" ...)
> - Second sync with msrsync (to sync changes during the first sync)
> - Take old storage off-line for users / read-only
> - Final rsync with --delete (i.e. rsync -aS --numeric-ids --delete ...)
> - Mount cephfs at location of old storage, adjust /etc/exports with fsid 
> entries where necessary, turn system back on-line / read-write
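> Put together, with /old-storage and /mnt/cephfs standing in for the real
> mount points, that sequence looks roughly like:
> 
>     ./msrsync -p 12 --progress --stats --rsync "-aS --numeric-ids" /old-storage/ /mnt/cephfs/
>     ./msrsync -p 12 --progress --stats --rsync "-aS --numeric-ids" /old-storage/ /mnt/cephfs/   # second pass
>     # (old storage now off-line / read-only for users)
>     rsync -aS --numeric-ids --delete /old-storage/ /mnt/cephfs/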
> 
> Cheers
> Sebastian
> 
> [1] https://github.com/jbd/msrsync 


[ceph-users] Haproxy config, multilple RGW on the same node with different ports haproxy ignore

2021-07-05 Thread Szabo, Istvan (Agoda)
Hi,

I have this config:

https://jpst.it/2yBsD

What am I missing from the backend part to make it balance on the same 
server across different ports?
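For context, the goal is for the backend to spread requests over several RGW 
instances on the same host, i.e. something along these lines (names, address and 
ports here are just placeholders, not my real config):

    backend rgw_backend
        balance roundrobin
        mode http
        server rgw1 10.0.0.10:8080 check
        server rgw2 10.0.0.10:8081 check
        server rgw3 10.0.0.10:8082 check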

Thank you




[ceph-users] Re: ceph df (octopus) shows USED is 7 times higher than STORED in erasure coded pool

2021-07-05 Thread Josh Baergen
Hey Wladimir,

I actually don't know where this is referenced in the docs, if anywhere.
Googling around shows many people discovering this overhead the hard way on
ceph-users.

I also don't know the rbd journaling mechanism in enough depth to comment
on whether it could be causing this issue for you. Are you seeing a high
allocated:stored ratio on your cluster?
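For reference, the arithmetic goes roughly like this (numbers taken from the
quoted thread below, assuming the 64 KiB bluestore_min_alloc_size_hdd default
and the 6+4 profile shown):

    # average object size: 501 GiB spread over 5.36M objects
    $ echo "501 * 1024 * 1024 / 5360000" | bc
    98
    # each object becomes k=6 data + m=4 parity shards, and every shard's
    # allocation is rounded up to 64 KiB, so 10 * 64 KiB = 640 KiB per object
    $ echo "scale=2; (10 * 64) / 98" | bc
    6.53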

Josh

On Sun, Jul 4, 2021 at 6:52 AM Wladimir Mutel  wrote:

> Dear Mr Baergen,
>
> thanks a lot for your very concise explanation.
> However, I would like to learn more about why the default Bluestore allocation
> size causes such a big storage overhead, and where in the Ceph docs it is
> explained what to watch for to avoid hitting this phenomenon again and again.
> I have a feeling this is what I am seeing on my experimental Ceph setup with
> the simplest JErasure 2+1 data pool.
> Could it be caused by journaled RBD writes to the EC data pool?
>
> Josh Baergen wrote:
> > Hey Arkadiy,
> >
> > If the OSDs are on HDDs and were created with the default
> > bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus, then in
> > effect data will be allocated from the pool in 640KiB chunks (64KiB *
> > (k+m)). 5.36M objects taking up 501GiB is an average object size of 98KiB
> > which results in a ratio of 6.53:1 allocated:stored, which is pretty close
> > to the 7:1 observed.
> >
> > If my assumption about your configuration is correct, then the only way to
> > fix this is to adjust bluestore_min_alloc_size_hdd and recreate all your
> > OSDs, which will take a while...
> >
> > Josh
> >
> > On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev  wrote:
> >
> >> The pool *default.rgw.buckets.data* has *501 GiB* stored, but USED shows
> >> *3.5 TiB* (7 times higher!):
> >>
> >> root@ceph-01:~# ceph df
> >> --- RAW STORAGE ---
> >> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
> >> hdd    196 TiB  193 TiB  3.5 TiB  3.6 TiB   1.85
> >> TOTAL  196 TiB  193 TiB  3.5 TiB  3.6 TiB   1.85
> >>
> >> --- POOLS ---
> >> POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> >> device_health_metrics       1    1   19 KiB       12   56 KiB      0     61 TiB
> >> .rgw.root                   2   32  2.6 KiB        6  1.1 MiB      0     61 TiB
> >> default.rgw.log             3   32  168 KiB      210   13 MiB      0     61 TiB
> >> default.rgw.control         4   32      0 B        8      0 B      0     61 TiB
> >> default.rgw.meta            5    8  4.8 KiB       11  1.9 MiB      0     61 TiB
> >> default.rgw.buckets.index   6    8  1.6 GiB      211  4.7 GiB      0     61 TiB
> >> default.rgw.buckets.data   10  128  501 GiB    5.36M  3.5 TiB   1.90    110 TiB
> >>
> >> The *default.rgw.buckets.data* pool is using erasure coding:
> >>
> >> root@ceph-01:~# ceph osd erasure-code-profile get EC_RGW_HOST
> >> crush-device-class=hdd
> >> crush-failure-domain=host
> >> crush-root=default
> >> jerasure-per-chunk-alignment=false
> >> k=6
> >> m=4
> >> plugin=jerasure
> >> technique=reed_sol_van
> >> w=8
> >>
> >> If anyone could help explain why it's using up 7 times more space, it would
> >> help a lot. Versioning is disabled. ceph version 15.2.13 (octopus stable).
> >>
> >> Sincerely,
> >> Ark.


[ceph-users] Re: Graphics in ceph dashboard

2021-07-05 Thread Fyodor Ustinov
Hi!

Thanks a lot for your help! 

The problem turned out to be that the prometheus container does not use the 
hosts file, only DNS. And all my servers were described only in the hosts file.

- Original Message -
> From: "Ernesto Puerta" 
> To: "Fyodor Ustinov" 
> Cc: "ceph-users" 
> Sent: Monday, 5 July, 2021 20:59:24
> Subject: Re: [ceph-users] Graphics in ceph dashboard

> Hi Fyodor,
> 
> Ceph Grafana Dashboards come from 2 different sources: Node Exporter
> metrics and the Ceph Exporter ones (a Python module running inside
> ceph-mgr).
> 
> I'd check the following steps:
> 
>   1. Ceph services are running (OSD, MDS, RGW, ...), otherwise there might
>   be no data to feed the Ceph-specific charts.
>   2. The Ceph exporter is running ("ceph mgr services") and its URL is
>   reachable from the Prometheus server.
>   3. The Prometheus server contains "ceph_.*" metrics (you may simply
>   visit the Prometheus Web UI and start typing "ceph": if nothing shows up it
>   means that there are no Ceph metrics stored).
>   4. In the Prometheus Web UI you may check the target section: the Ceph
>   target should be green.
> 
> Hope this helps you find out the issue.
> 
> Kind Regards,
> Ernesto
> 
> 
> On Mon, Jul 5, 2021 at 5:58 PM Fyodor Ustinov  wrote:
> 
>> Hi!
>>
>> I installed a fresh 16.2.4 cluster
>> as described in https://docs.ceph.com/en/latest/cephadm/#cephadm
>>
>> Everything works except for one thing: graphs only appear under hosts /
>> overall performance (only the CPU and the network ones). Everywhere else
>> it just says "no data".
>>
>> What could I have done wrong?
>>
>> WBR,
>> Fyodor.


[ceph-users] Objectstore user IO and operations monitoring

2021-07-05 Thread Szabo, Istvan (Agoda)
Hi,

I've been looking for this for a long time: I have a lot of users, and when one 
user can take down the cluster I want to know which one, but there aren't any 
bucket stats that could help.
Does anyone know anything?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---





[ceph-users] Re: Graphics in ceph dashboard

2021-07-05 Thread Ernesto Puerta
Hi Fyodor,

Ceph Grafana Dashboards come from 2 different sources: Node Exporter
metrics and the Ceph Exporter ones (a Python module running inside
ceph-mgr).

I'd check the following steps:

   1. Ceph services are running (OSD, MDS, RGW, ...), otherwise there might
   be no data to feed the Ceph-specific charts.
   2. The Ceph exporter is running ("ceph mgr services") and its URL is
   reachable from the Prometheus server.
   3. The Prometheus server contains "ceph_.*" metrics (you may simply
   visit the Prometheus Web UI and start typing "ceph": if nothing shows up it
   means that there are no Ceph metrics stored).
   4. In the Prometheus Web UI you may check the target section: the Ceph
   target should be green.
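For steps 2 and 3, a few quick checks from the command line can help (9283 is the
default port of the ceph-mgr prometheus module; substitute your own mgr host):

    ceph mgr module ls | grep prometheus    # is the exporter module enabled?
    ceph mgr services                       # shows the exporter URL, e.g. http://mgr-host:9283/
    curl -s http://mgr-host:9283/metrics | grep -c '^ceph_'   # non-zero if Ceph metrics are exported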

Hope this helps you find out the issue.

Kind Regards,
Ernesto


On Mon, Jul 5, 2021 at 5:58 PM Fyodor Ustinov  wrote:

> Hi!
>
> I installed a fresh 16.2.4 cluster
> as described in https://docs.ceph.com/en/latest/cephadm/#cephadm
>
> Everything works except for one thing: graphs only appear under hosts /
> overall performance (only the CPU and the network ones). Everywhere else
> it just says "no data".
>
> What could I have done wrong?
>
> WBR,
> Fyodor.


[ceph-users] Ceph with BGP?

2021-07-05 Thread German Anders
Hi All,

   I have an existing, functional Ceph cluster (latest Luminous release) with
two networks, one public (layer 2+3) and one for the cluster. The public
network uses VLANs and 10GbE, the cluster network uses Infiniband at 56Gb/s,
and the cluster works fine. The public network runs on Juniper QFX5100
switches with VLANs in a layer 2+3 configuration, but the network team needs
to move to full layer 3 and they want to use BGP. So the questions are: how
can we move to that scheme? What are the considerations? Is it even possible?
Is there any step-by-step way to move to that scheme? And is there anything
better than BGP, or other alternatives?

Any information will be really helpful.

Thanks in advance,

Cheers,

German


[ceph-users] Graphics in ceph dashboard

2021-07-05 Thread Fyodor Ustinov
Hi!

I installed a fresh 16.2.4 cluster
as described in https://docs.ceph.com/en/latest/cephadm/#cephadm

Everything works except for one thing: graphs only appear under hosts / 
overall performance (only the CPU and the network ones). Everywhere else 
it just says "no data".

What could I have done wrong?

WBR,
Fyodor.


[ceph-users] Re: cephadm shell fails to start due to missing config files?

2021-07-05 Thread Sebastian Wagner

Hi Vladimir,

The behavior of `cephadm shell` will be improved by 
https://github.com/ceph/ceph/pull/42028. In the meantime, as a 
workaround, you can either deploy a daemon on this host or copy 
the system's ceph.conf into the location that is shown in the error 
message.
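For the copy workaround, that would be something like the following (the path is
the one from the error message quoted below):

    mkdir -p /var/lib/ceph/79656e6e-21e2-4092-ac04-d536f25a435d/mon.ceph-1
    cp /etc/ceph/ceph.conf /var/lib/ceph/79656e6e-21e2-4092-ac04-d536f25a435d/mon.ceph-1/config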


Hope that helps,

Sebastian

On 02.07.21 at 19:04, Vladimir Brik wrote:

Hello

I am getting an error on one node in my cluster (other nodes are fine) 
when trying to run "cephadm shell". Historically this machine has been 
used as the primary Ceph management host, so it would be nice if this 
could be fixed.


ceph-1 ~ # cephadm -v shell
container_init=False
Inferring fsid 79656e6e-21e2-4092-ac04-d536f25a435d
Inferring config 
/var/lib/ceph/79656e6e-21e2-4092-ac04-d536f25a435d/mon.ceph-1/config
Running command: /usr/bin/podman images --filter label=ceph=True 
--filter dangling=false --format {{.Repository}}@{{.Digest}}
/usr/bin/podman: stdout 
docker.io/ceph/daemon-base@sha256:0810dc7db854150bc48cf8fc079875e28b3138d070990a630b8fb7cec7cd2ced
/usr/bin/podman: stdout 
docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
/usr/bin/podman: stdout 
docker.io/ceph/ceph@sha256:16d37584df43bd6545d16e5aeba527de7d6ac3da3ca7b882384839d2d86acc7d
Using recent ceph image 
docker.io/ceph/daemon-base@sha256:0810dc7db854150bc48cf8fc079875e28b3138d070990a630b8fb7cec7cd2ced
Running command: /usr/bin/podman run --rm --ipc=host --net=host 
--entrypoint stat -e 
CONTAINER_IMAGE=docker.io/ceph/daemon-base@sha256:0810dc7db854150bc48cf8fc079875e28b3138d070990a630b8fb7cec7cd2ced 
-e NODE_NAME=ceph-1 
docker.io/ceph/daemon-base@sha256:0810dc7db854150bc48cf8fc079875e28b3138d070990a630b8fb7cec7cd2ced 
-c %u %g /var/lib/ceph

stat: stdout 167 167
Running command (timeout=None): /usr/bin/podman run --rm --ipc=host 
--net=host --privileged --group-add=disk -it -e LANG=C -e PS1=[ceph: 
\u@\h \W]\$  -e 
CONTAINER_IMAGE=docker.io/ceph/daemon-base@sha256:0810dc7db854150bc48cf8fc079875e28b3138d070990a630b8fb7cec7cd2ced 
-e NODE_NAME=ceph-1 -v 
/var/run/ceph/79656e6e-21e2-4092-ac04-d536f25a435d:/var/run/ceph:z -v 
/var/log/ceph/79656e6e-21e2-4092-ac04-d536f25a435d:/var/log/ceph:z -v 
/var/lib/ceph/79656e6e-21e2-4092-ac04-d536f25a435d/crash:/var/lib/ceph/crash:z 
-v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm 
-v /run/lock/lvm:/run/lock/lvm -v 
/var/lib/ceph/79656e6e-21e2-4092-ac04-d536f25a435d/mon.ceph-1/config:/etc/ceph/ceph.conf:z 
-v /etc/ceph/ceph.client.admin.keyring:/etc/ceph/ceph.keyring:z -v 
/var/lib/ceph/79656e6e-21e2-4092-ac04-d536f25a435d/home:/root 
--entrypoint bash 
docker.io/ceph/daemon-base@sha256:0810dc7db854150bc48cf8fc079875e28b3138d070990a630b8fb7cec7cd2ced
Error: error checking path 
"/var/lib/ceph/79656e6e-21e2-4092-ac04-d536f25a435d/mon.ceph-1/config": 
stat 
/var/lib/ceph/79656e6e-21e2-4092-ac04-d536f25a435d/mon.ceph-1/config: 
no such file or directory



The machine in question doesn't run a mon daemon (but it did a long 
time ago), so I am not sure why "cephadm shell" on this particular 
machine is looking for mon.ceph-1/config



Can anybody help?

Thanks,

Vlad


[ceph-users] Remove objectstore from a RBD RGW cluster

2021-07-05 Thread Szabo, Istvan (Agoda)
Hi,

I want to remove all the objectstore related things from my cluster and keep 
only for RBD.

I've uninstalled the RGW services.
Removed the haproxy config related to that.

When I try to delete the realm, zone and zonegroup, it finishes, but after a couple 
of minutes something recreates another zonegroup. I can't figure out what.

What am I missing?

PS: The pools are still there; that will be the last step. I hope I'm not missing 
any necessary step.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---





[ceph-users] Re: Remove objectstore from a RBD RGW cluster

2021-07-05 Thread Janne Johansson
Sounds like an rgw is still running.
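A quick way to check is something like:

    ceph status | grep -i rgw    # does the services section still list rgw daemons?
    ceph versions                # any running rgw daemons also show up here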

On Mon, 5 July 2021 at 08:15, Szabo, Istvan (Agoda)  wrote:

> Hi,
>
> I want to remove all the objectstore related things from my cluster and
> keep only for RBD.
>
> I've uninstalled the RGW services.
> Removed the haproxy config related to that.
>
> When I try to delete the realm, zone and zonegroup, it finishes, but after a couple
> of minutes something recreates another zonegroup. I can't figure out what.
>
> What am I missing?
>
> PS: The pools are still there; that will be the last step. I hope I'm not missing
> any necessary step.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
>
> 


[ceph-users] pgcalc tool removed (or moved?) from ceph.com ?

2021-07-05 Thread Dominik Csapak

Hi,

just wanted to ask if it is intentional that

http://ceph.com/pgcalc/

results in a 404 error?

Is there an alternative URL?
It is still linked from the official docs.

with kind regards
Dominik