[ceph-users] Re: about replica size

2020-07-13 Thread Zhenshi Zhou
Thank you all for helping me understand clearly about the 'size'.


Ml Ml  于2020年7月10日周五 下午11:08写道:

> If size is 2 and one disk fails, you are already going to be in an error
> state with read-only access.
>
> Let's say you reboot one node, you will instantly get into trouble.
>
> If you are going to reboot one node and at the same time the other disk
> fails, then you will very likely lose data.
>
> Just never, ever use size 2. Not even temporarily :)
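
(For reference, a minimal sketch of the knobs being discussed; the pool name
is a placeholder, "size" is the replica count and "min_size" is what actually
stops I/O when too few copies are available:

ceph osd pool get <pool> size
ceph osd pool set <pool> size 3
ceph osd pool set <pool> min_size 2

With size=3/min_size=2 a single failure keeps the pool writable; with size=2
any single failure either blocks I/O (min_size=2) or leaves you one more
failure away from data loss (min_size=1).)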
>
>
> Zhenshi Zhou  schrieb am Fr., 10. Juli 2020, 04:11:
>
>> Hi,
>>
>> As we all know, the default replica setting of 'size' is 3 which means
>> there
>> are 3 copies of an object. What is the disadvantages if I set it to 2,
>> except
>> I get fewer copies?
>>
>> Thanks
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Poor Windows performance on ceph RBD.

2020-07-13 Thread Marc Roos


>> To anyone who is following this thread, we found a possible explanation for 
>> (some of) our observations.

If someone is following this, they probably want the possible 
explanation and not the knowledge of you having the possible 
explanation.
 
So you are saying that if you do e.g. a core installation (without GUI) of 
2016/2019 and disable all services, the fio test results are significantly 
different to e.g. a CentOS 7 VM doing the same fio test? Are you sure 
this is not related to other processes writing to disk? 



-Original Message-
From: Frank Schilder [mailto:fr...@dtu.dk] 
Sent: maandag 13 juli 2020 9:28
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Poor Windows performance on ceph RBD.

To anyone who is following this thread, we found a possible explanation 
for (some of) our observations.

We are running Windows servers version 2016 and 2019 as storage servers 
exporting data on an rbd image/disk. We recently found that Windows 
server 2016 runs fine. It is still not as fast as Linux + SAMBA share on 
an rbd image (ca. 50%), but runs with a reasonable sustained bandwidth. 
With Windows server 2019, however, we observe near-complete stall of 
file transfers and time-outs using standard copy tools (robocopy). We 
don't have an explanation yet and are downgrading Windows servers where 
possible.

If anyone has a hint what we can do, please let us know.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] compaction_threads and flusher_threads can not used

2020-07-13 Thread ??????
Hello! For Ceph Nautilus v14.2.10, I cannot use "compaction_threads" and
"flusher_threads".
Why is bluestore_rocksdb_options restricted in which parameters it can set?
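
(For context, the syntax in question, assuming these keys are meant to go
inside the RocksDB option string rather than being standalone ceph options,
would look something like:

[osd]
bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=4,compaction_threads=4,flusher_threads=2

This is abbreviated from the default option string and is not a verified
working setting for 14.2.10.)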
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Error on upgrading to 15.2.4 / invalid service name using containers

2020-07-13 Thread Mario J . Barchéin Molina
Hello. We finally solved the problem, we just deleted the failed service
with:

 # ceph orch rm mds.label:mds

and after that, we could finish the upgrade to 15.2.4.


El vie., 10 jul. 2020 a las 3:41, Mario J. Barchéin Molina (<
ma...@intelligenia.com>) escribió:

> Hello. I'm trying to upgrade to ceph 15.2.4 from 15.2.3. The upgrade is
> almost finished, but it has entered a service start/stop loop. I'm using
> a container deployment over Debian 10 with 4 nodes. The problem is with a
> service named literally "mds.label:mds". It has the colon character, which
> is of special use in docker. This character can't appear in the name of the
> container and also breaks the volume binding syntax.
>
> I have seen in the /var/lib/ceph/UUID/ the files for this service:
>
> root@ceph-admin:/var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca# ls
> -la
> total 48
> drwx-- 12167 167 4096 jul 10 02:54 .
> drwxr-x---  3 ceph   ceph4096 jun 24 16:36 ..
> drwx--  3 nobody nogroup 4096 jun 24 16:37 alertmanager.ceph-admin
> drwx--  3167 167 4096 jun 24 16:36 crash
> drwx--  2167 167 4096 jul 10 01:35 crash.ceph-admin
> drwx--  4998 996 4096 jun 24 16:38 grafana.ceph-admin
> drwx--  2167 167 4096 jul 10 02:55
> mds.label:mds.ceph-admin.rwmtkr
> drwx--  2167 167 4096 jul 10 01:33 mgr.ceph-admin.doljkl
> drwx--  3167 167 4096 jul 10 01:34 mon.ceph-admin
> drwx--  2 nobody nogroup 4096 jun 24 16:38 node-exporter.ceph-admin
> drwx--  4 nobody nogroup 4096 jun 24 16:38 prometheus.ceph-admin
> drwx--  4 root   root4096 jul  3 02:43 removed
>
>
> root@ceph-admin:/var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.label:mds.ceph-admin.rwmtkr#
> ls -la
> total 32
> drwx--  2  167  167 4096 jul 10 02:55 .
> drwx-- 12  167  167 4096 jul 10 02:54 ..
> -rw---  1  167  167  295 jul 10 02:55 config
> -rw---  1  167  167  152 jul 10 02:55 keyring
> -rw---  1  167  167   38 jul 10 02:55 unit.configured
> -rw---  1  167  167   48 jul 10 02:54 unit.created
> -rw---  1 root root   24 jul 10 02:55 unit.image
> -rw---  1 root root0 jul 10 02:55 unit.poststop
> -rw---  1 root root  981 jul 10 02:55 unit.run
>
> root@ceph-admin:/var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.label:mds.ceph-admin.rwmtkr#
> cat unit.run
> /usr/bin/install -d -m0770 -o 167 -g 167
> /var/run/ceph/0ce93550-b628-11ea-9484-f6dc192416ca
> /usr/bin/docker run --rm --net=host --ipc=host --name
> ceph-0ce93550-b628-11ea-9484-f6dc192416ca-mds.label:mds.ceph-admin.rwmtkr
> -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=ceph-admin -v
> /var/ru
> n/ceph/0ce93550-b628-11ea-9484-f6dc192416ca:/var/run/ceph:z -v
> /var/log/ceph/0ce93550-b628-11ea-9484-f6dc192416ca:/var/log/ceph:z -v
> /var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/crash:/var/lib/ceph/c
> rash:z -v
> /var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.label:mds.ceph-admin.rwmtkr:/var/lib/ceph/mds/ceph-label:mds.ceph-admin.rwmtkr:z
> -v /var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.l
> abel:mds.ceph-admin.rwmtkr/config:/etc/ceph/ceph.conf:z --entrypoint
> /usr/bin/ceph-mds docker.io/ceph/ceph:v15 -n
> mds.label:mds.ceph-admin.rwmtkr -f --setuser ceph --setgroup ceph
> --default-log-to-file=fal
> se --default-log-to-stderr=true --default-log-stderr-prefix="debug "
>
> If I try to manually run the docker command, this is the error:
>
> docker: Error response from daemon: Invalid container name
> (ceph-0ce93550-b628-11ea-9484-f6dc192416ca-mds.label:mds.ceph-admin.rwmtkr),
> only [a-zA-Z0-9][a-zA-Z0-9_.-] are allowed.
>
> If I try with a different container name, then the volume binding error
> rises:
>
> docker: Error response from daemon: invalid volume specification:
> '/var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.label:mds.ceph-admin.rwmtkr:/var/lib/ceph/mds/ceph-label:mds.ceph-admin.rwmtkr:z'.
>
> This mds is not needed and I would be happy simply removing it, but I
> don't know how to do it. The documentation says how to do it for "normal"
> services, but my installation is a container deployment. I have tried to
> remove the directory and restart the upgrading process but then the
> directory with this service appears again.
>
> Please, how can I remove or rename this service so I can complete the
> upgrading?
>
> Also, I think it's a bug to allow docker-forbidden characters in the
> service names when using container deployment and it should be checked.
>
> Thank you very much.
>
> --
> *Mario J. Barchéin Molina*
> *Departamento de I+D+i*
> ma...@intelligenia.com
> Madrid: +34 911 86 35 46
> US: +1 (918) 856 - 3838
> Granada: +34 958 07 70 70
> ――
> intelligenia · Intelligent Engineering · Web & APP & Intranet
> www.intelligenia.com  · @intelligenia  ·
> fb.com/intelligenia · blog.intelligenia.com
> Madrid · C/ de la Alameda 22, 28014, Madrid, Spain
> 

[ceph-users] Re: Ceph `realm pull` permission denied error

2020-07-13 Thread Zhenshi Zhou
Hi Alex,

I didn't deploy this in containers/vms, as well as ansible or other tools.
However I deployed multisite once and I remember that I restarted the
rgw on the master site before I sync realm on the secondary site.

I'm not sure if this can help.


Alex Hussein-Kershaw  于2020年7月13日周一
下午5:48写道:

> Hi Ceph Users,
>
> I'm struggling with an issue that I'm hoping someone can point me towards
> a solution.
>
> We are using Nautilus (14.2.9) deploying Ceph in containers, in VMs. The
> setup that I'm working with has 3 VMs, but of course our design expects
> this to be scaled by a user as appropriate. I have a cluster deployed and
> it's functioning happily as storage for our product, the error occurs when
> I go to setup a second cluster and pair it with the first. I'm using
> ceph-ansible to deploy.  I get the following error about 20 minutes into
> running the site-container playbook.
>
> 2020-07-09 14:21:10,966 p=2134 u=qs-admin |  TASK [ceph-rgw : fetch the
> realm]
> ***
>
> 
> 2020-07-09 14:21:10,966 p=2134 u=qs-admin |  Thursday 09 July 2020
> 14:21:10 + (0:00:00.410)   0:16:18.245 *
> 2020-07-09 14:21:11,901 p=2134 u=qs-admin |  fatal: [10.225.21.213 ->
> 10.225.21.213]: FAILED! => changed=true
>   cmd:
>   - docker
>   - exec
>   - ceph-mon-albamons_sc2
>   - radosgw-admin
>   - realm
>   - pull
>   - --url=https://10.225.36.197:7480
>   - --access-key=2CQ006Lereqpysbr0l0s
>   - --secret=JM3S5Hd49Nz03eIbTTNnEyqcXJkIOXbp0gWIUEbp
>   delta: '0:00:00.545895'
>   end: '2020-07-09 14:21:11.516539'
>   msg: non-zero return code
>   rc: 13
>   start: '2020-07-09 14:21:10.970644'
>   stderr: |-
> request failed: (13) Permission denied
> If the realm has been changed on the master zone, the master zone's
> gateway may need to be restarted to recognize this user.
>   stderr_lines: 
>   stdout: ''
>   stdout_lines: 
>
> Re-running the command manually reproduces the error. I understand that
> the permission denied error appears to indicate the keys are not valid,
> suggested by https://tracker.ceph.com/issues/36619. However, I've triple
> checked the keys are correct on the other site. I'm at a loss of where to
> look for debugging, I've turned up logs on both the local and remote site
> for RGW and MON processes but neither seem to yield anything related. I've
> tried restarting everything as suggested in the error text from all the
> processes to a full reboot of all the VMs. I've no idea why the keys are
> being declined either, as they are correct (or at least `radosgw-admin
> period get` on the primary site thinks so).
>
> Thanks for your help,
> Alex
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Error on upgrading to 15.2.4 / invalid service name using containers

2020-07-13 Thread Sebastian Wagner
Thanks! I've created https://tracker.ceph.com/issues/46497

Am 13.07.20 um 11:51 schrieb Mario J. Barchéin Molina:
> Hello. We finally solved the problem, we just deleted the failed service
> with:
> 
>  # ceph orch rm mds.label:mds
> 
> and after that, we could finish the upgrade to 15.2.4.
> 
> 
> El vie., 10 jul. 2020 a las 3:41, Mario J. Barchéin Molina (<
> ma...@intelligenia.com>) escribió:
> 
>> Hello. I'm trying to upgrade to ceph 15.2.4 from 15.2.3. The upgrade is
>> almost finished, but it has entered a service start/stop loop. I'm using
>> a container deployment over Debian 10 with 4 nodes. The problem is with a
>> service named literally "mds.label:mds". It has the colon character, which
>> is of special use in docker. This character can't appear in the name of the
>> container and also breaks the volume binding syntax.
>>
>> I have seen in the /var/lib/ceph/UUID/ the files for this service:
>>
>> root@ceph-admin:/var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca# ls
>> -la
>> total 48
>> drwx-- 12167 167 4096 jul 10 02:54 .
>> drwxr-x---  3 ceph   ceph4096 jun 24 16:36 ..
>> drwx--  3 nobody nogroup 4096 jun 24 16:37 alertmanager.ceph-admin
>> drwx--  3167 167 4096 jun 24 16:36 crash
>> drwx--  2167 167 4096 jul 10 01:35 crash.ceph-admin
>> drwx--  4998 996 4096 jun 24 16:38 grafana.ceph-admin
>> drwx--  2167 167 4096 jul 10 02:55
>> mds.label:mds.ceph-admin.rwmtkr
>> drwx--  2167 167 4096 jul 10 01:33 mgr.ceph-admin.doljkl
>> drwx--  3167 167 4096 jul 10 01:34 mon.ceph-admin
>> drwx--  2 nobody nogroup 4096 jun 24 16:38 node-exporter.ceph-admin
>> drwx--  4 nobody nogroup 4096 jun 24 16:38 prometheus.ceph-admin
>> drwx--  4 root   root4096 jul  3 02:43 removed
>>
>>
>> root@ceph-admin:/var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.label:mds.ceph-admin.rwmtkr#
>> ls -la
>> total 32
>> drwx--  2  167  167 4096 jul 10 02:55 .
>> drwx-- 12  167  167 4096 jul 10 02:54 ..
>> -rw---  1  167  167  295 jul 10 02:55 config
>> -rw---  1  167  167  152 jul 10 02:55 keyring
>> -rw---  1  167  167   38 jul 10 02:55 unit.configured
>> -rw---  1  167  167   48 jul 10 02:54 unit.created
>> -rw---  1 root root   24 jul 10 02:55 unit.image
>> -rw---  1 root root0 jul 10 02:55 unit.poststop
>> -rw---  1 root root  981 jul 10 02:55 unit.run
>>
>> root@ceph-admin:/var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.label:mds.ceph-admin.rwmtkr#
>> cat unit.run
>> /usr/bin/install -d -m0770 -o 167 -g 167
>> /var/run/ceph/0ce93550-b628-11ea-9484-f6dc192416ca
>> /usr/bin/docker run --rm --net=host --ipc=host --name
>> ceph-0ce93550-b628-11ea-9484-f6dc192416ca-mds.label:mds.ceph-admin.rwmtkr
>> -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=ceph-admin -v
>> /var/ru
>> n/ceph/0ce93550-b628-11ea-9484-f6dc192416ca:/var/run/ceph:z -v
>> /var/log/ceph/0ce93550-b628-11ea-9484-f6dc192416ca:/var/log/ceph:z -v
>> /var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/crash:/var/lib/ceph/c
>> rash:z -v
>> /var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.label:mds.ceph-admin.rwmtkr:/var/lib/ceph/mds/ceph-label:mds.ceph-admin.rwmtkr:z
>> -v /var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.l
>> abel:mds.ceph-admin.rwmtkr/config:/etc/ceph/ceph.conf:z --entrypoint
>> /usr/bin/ceph-mds docker.io/ceph/ceph:v15 -n
>> mds.label:mds.ceph-admin.rwmtkr -f --setuser ceph --setgroup ceph
>> --default-log-to-file=fal
>> se --default-log-to-stderr=true --default-log-stderr-prefix="debug "
>>
>> If I try to manually run the docker command, this is the error:
>>
>> docker: Error response from daemon: Invalid container name
>> (ceph-0ce93550-b628-11ea-9484-f6dc192416ca-mds.label:mds.ceph-admin.rwmtkr),
>> only [a-zA-Z0-9][a-zA-Z0-9_.-] are allowed.
>>
>> If I try with a different container name, then the volume binding error
>> rises:
>>
>> docker: Error response from daemon: invalid volume specification:
>> '/var/lib/ceph/0ce93550-b628-11ea-9484-f6dc192416ca/mds.label:mds.ceph-admin.rwmtkr:/var/lib/ceph/mds/ceph-label:mds.ceph-admin.rwmtkr:z'.
>>
>> This mds is not needed and I would be happy simply removing it, but I
>> don't know how to do it. The documentation says how to do it for "normal"
>> services, but my installation is a container deployment. I have tried to
>> remove the directory and restart the upgrading process but then the
>> directory with this service appears again.
>>
>> Please, how can I remove or rename this service so I can complete the
>> upgrading?
>>
>> Also, I think it's a bug to allow docker-forbidden characters in the
>> service names when using container deployment and it should be checked.
>>
>> Thank you very much.
>>
>> --
>> *Mario J. Barchéin Molina*
>> *Departamento de I+D+i*
>> ma...@intelligenia.com
>> Madrid: +34 911 86 35 46
>> US: +1 (918) 856 - 3838
>> Granada: +34 958 07 70 70
>> ――
>> intelligenia · Intelligent Engineering · Web 

[ceph-users] Re: Poor Windows performance on ceph RBD.

2020-07-13 Thread Maged Mokhtar

On 13/07/2020 10:43, Frank Schilder wrote:

To anyone who is following this thread, we found a possible explanation for
(some of) our observations.

If someone is following this, they probably want the possible
explanation and not the knowledge of you having the possible
explanation.
  

So you are saying that if you do e.g. a core installation (without GUI) of
2016/2019 and disable all services, the fio test results are significantly
different to e.g. a CentOS 7 VM doing the same fio test? Are you sure
this is not related to other processes writing to disk?

Right, it's not an explanation but rather a further observation. We don't really 
have an explanation yet.

It's an identical installation of both server versions, same services 
configured. Our operators are not really into debugging Windows, that's why we 
were asking here. Their hypothesis is that the VD driver for accessing RBD 
images has problems with Windows servers newer than 2016. I'm not a Windows 
guy, so I can't really comment on this.

The test we do is a simple copy-test of a single 10g file and we monitor the 
transfer speed. This info was cut out of this e-mail, the original report for 
reference is: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/ANHJQZLJT474B457VVM4ZZZ6HBXW4OPO/
 .

We are very sure that it is not related to other processes writing to disk, we 
monitor that too. There is also no competition on the RBD pool at the time of 
testing.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Marc Roos 
Sent: 13 July 2020 10:24
To: ceph-users; Frank Schilder
Subject: RE: [ceph-users] Re: Poor Windows performance on ceph RBD.


To anyone who is following this thread, we found a possible explanation for
(some of) our observations.

If someone is following this, they probably want the possible
explanation and not the knowledge of you having the possible
explanation.

So you are saying that if you do e.g. a core installation (without GUI) of
2016/2019 and disable all services, the fio test results are significantly
different to e.g. a CentOS 7 VM doing the same fio test? Are you sure
this is not related to other processes writing to disk?



-Original Message-
From: Frank Schilder [mailto:fr...@dtu.dk]
Sent: maandag 13 juli 2020 9:28
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Poor Windows performance on ceph RBD.

To anyone who is following this thread, we found a possible explanation
for (some of) our observations.

We are running Windows servers version 2016 and 2019 as storage servers
exporting data on an rbd image/disk. We recently found that Windows
server 2016 runs fine. It is still not as fast as Linux + SAMBA share on
an rbd image (ca. 50%), but runs with a reasonable sustained bandwidth.
With Windows server 2019, however, we observe near-complete stall of
file transfers and time-outs using standard copy tools (robocopy). We
don't have an explanation yet and are downgrading Windows servers where
possible.

If anyone has a hint what we can do, please let us know.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


I am not sure exactly how you are testing the speed on Windows, but 2 
possible factors are block size and caching.


Block size depends on the client application: a Windows file copy from 
the UI will use a 512k block size, which is different from xcopy or 
robocopy; the latter can change block size depending on flags / restart 
mode, etc. Similarly, the dd command on Linux will give different speeds 
depending on block size.


Caching: caching will make a big difference for sequential writes as it 
merges smaller blocks, but in some cases it is not obvious whether caching 
is being used, since it could happen at different layers. For example, in 
your Linux Samba export test there could be caching done at the gateway, 
while a clustered setup with high availability may explicitly turn 
caching off. What you report as initially high speed that then decreases 
could indicate writing to a cache buffer first, then slowing down when 
it fills.
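
(A quick way to check whether client-side RBD caching is in play -- assuming a
client recent enough to have the `rbd config` subcommands, otherwise look for
"rbd cache" in the [client] section of ceph.conf:

rbd config image list <pool>/<image> | grep cache

Option names and command are from memory, so treat this as a starting point
rather than a recipe.)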


It will help to quantify/compare latency (iops qd=1) via:
on Linux:
rbd bench --io-type write POOL_NAME/IMAGE_NAME --io-threads=1 --io-size 
4K  --io-pattern rand --rbd_cache=false
fio --name=xx --filename=FILE_NAME --iodepth=1 --rw=randwrite --bs=4k 
--direct=1 --runtime=30 --time_based

On Windows vm:
diskspd -b4k -d30 -o1 -t1 -r -Su -w100 -c1G  FILE_NAME


Measure/compare sequential writes with 512k block size
on Linux:
rbd bench --io-type write POOL_NAME/IMAGE_NAME --io-threads=1 --io-size 
512K  --io-pattern seq --rbd_cache=false
fio --name=xx --filename=FILE_NAME --

[ceph-users] Adding OpenStack Keystone integrated radosGWs to an existing radosGW cluster

2020-07-13 Thread Thomas Byrne - UKRI STFC
Hi all,

We run a simple single zone nautilus radosGW instance with a few gateway 
machines for some of our users. I've got some more gateway machines earmarked 
for the purpose of adding some OpenStack Keystone integrated RadosGW gateways 
to the cluster. I'm not sure how best to add them alongside the existing 
radosGW gateways/infrastructure. The options I think I have are:


1)  Add keystone integration to all radosGW gateways (a sketch of what this 
involves follows the list). Simplest, but I have (possibly unfounded) concerns 
about keystone causing problems for non-OpenStack users (added authentication 
latency), and I'm not sure I fully understand how the OpenStack users/buckets 
will interact with our existing users.

2)  Add keystone integration to separate gateways. This keeps the radosGW 
servers separate, and deals with one of my concerns above.

3)  Add a separate radosGW zone/instance (not sure what the correct term 
is), and have separate gateways for this instance. Seems very heavyweight for 
what I'm trying to achieve, but that may be my inexperience talking.

4)  Something else entirely?
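
(For context, the Keystone hook-up I have in mind for options 1/2 is roughly
the following set of rgw options on a gateway; this is a sketch with
placeholder values that would need checking against the radosgw docs for our
release:

[client.rgw.<name>]
rgw_keystone_url = https://keystone.example.com:5000
rgw_keystone_api_version = 3
rgw_keystone_admin_user = swift
rgw_keystone_admin_password = <secret>
rgw_keystone_admin_domain = Default
rgw_keystone_admin_project = service
rgw_keystone_accepted_roles = member,admin
rgw_s3_auth_use_keystone = true

My understanding is that only the gateways carrying these options consult
Keystone, which is what would make option 2 possible.)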

Any advice would be greatly appreciated!

Cheers,
Tom

This email and any attachments are intended solely for the use of the named 
recipients. If you are not the intended recipient you must not use, disclose, 
copy or distribute this email or any of its attachments and should notify the 
sender immediately and delete this email from your system. UK Research and 
Innovation (UKRI) has taken every reasonable precaution to minimise risk of 
this email or any attachments containing viruses or malware but the recipient 
should carry out its own virus and malware checks before opening the 
attachments. UKRI does not accept any liability for any losses or damages which 
the recipient may sustain due to presence of any viruses. Opinions, conclusions 
or other information in this message and attachments that are not related 
directly to UKRI business are solely those of the author and do not represent 
the views of UKRI.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] "task status" section in ceph -s output new?

2020-07-13 Thread Rainer Krienke
Hello,

today I upgraded from 14.2.9 to 14.2.10. Everything worked just fine. I
think at the time I upgraded the last of my three MDS servers something
new appeared in the output of ceph -s:

# ceph -s
  cluster:
id: xyz
health: HEALTH_OK

  services:
   ...
   ...

  task status:
scrub status:
mds.myceph3: idle

The "task status" section is new to me. Is it there to stay and what
kind of task status messages might appear there?

At the moment it displays information about the ceph fs scrub status,
telling me nothing is going on here. Since I recently started using
cephfs, reading the message made me wonder: should I scrub
cephfs regularly, does ceph do this on its own, or is scrubbing only
needed in case cephfs has been damaged?

Does anyone know more about this?

Thanks for your help
Rainer
-- 
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287
1001312
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Poor Windows performance on ceph RBD.

2020-07-13 Thread Frank Schilder
To anyone who is following this thread, we found a possible explanation for 
(some of) our observations.

We are running Windows servers version 2016 and 2019 as storage servers 
exporting data on an rbd image/disk. We recently found that Windows server 2016 
runs fine. It is still not as fast as Linux + SAMBA share on an rbd image (ca. 
50%), but runs with a reasonable sustained bandwidth. With Windows server 2019, 
however, we observe near-complete stall of file transfers and time-outs using 
standard copy tools (robocopy). We don't have an explanation yet and are 
downgrading Windows servers where possible.

If anyone has a hint what we can do, please let us know.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Poor Windows performance on ceph RBD.

2020-07-13 Thread Frank Schilder
> > To anyone who is following this thread, we found a possible explanation for
> > (some of) our observations.

> If someone is following this, they probably want the possible
> explanation and not the knowledge of you having the possible
> explanation.
 
> So you are saying that if you do e.g. a core installation (without GUI) of
> 2016/2019 and disable all services, the fio test results are significantly
> different to e.g. a CentOS 7 VM doing the same fio test? Are you sure
> this is not related to other processes writing to disk? 

Right, it's not an explanation but rather a further observation. We don't really 
have an explanation yet.

It's an identical installation of both server versions, same services 
configured. Our operators are not really into debugging Windows, that's why we 
were asking here. Their hypothesis is that the VD driver for accessing RBD 
images has problems with Windows servers newer than 2016. I'm not a Windows 
guy, so I can't really comment on this.

The test we do is a simple copy-test of a single 10g file and we monitor the 
transfer speed. This info was cut out of this e-mail, the original report for 
reference is: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/ANHJQZLJT474B457VVM4ZZZ6HBXW4OPO/
 .

We are very sure that it is not related to other processes writing to disk, we 
monitor that too. There is also no competition on the RBD pool at the time of 
testing.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Marc Roos 
Sent: 13 July 2020 10:24
To: ceph-users; Frank Schilder
Subject: RE: [ceph-users] Re: Poor Windows performance on ceph RBD.

>> To anyone who is following this thread, we found a possible explanation for
>> (some of) our observations.

If someone is following this, they probably want the possible
explanation and not the knowledge of you having the possible
explanation.

So you are saying that if you do e.g. a core installation (without GUI) of
2016/2019 and disable all services, the fio test results are significantly
different to e.g. a CentOS 7 VM doing the same fio test? Are you sure
this is not related to other processes writing to disk?



-Original Message-
From: Frank Schilder [mailto:fr...@dtu.dk]
Sent: maandag 13 juli 2020 9:28
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Poor Windows performance on ceph RBD.

To anyone who is following this thread, we found a possible explanation
for (some of) our observations.

We are running Windows servers version 2016 and 2019 as storage servers
exporting data on an rbd image/disk. We recently found that Windows
server 2016 runs fine. It is still not as fast as Linux + SAMBA share on
an rbd image (ca. 50%), but runs with a reasonable sustained bandwidth.
With Windows server 2019, however, we observe near-complete stall of
file transfers and time-outs using standard copy tools (robocopy). We
don't have an explanation yet and are downgrading Windows servers where
possible.

If anyone has a hint what we can do, please let us know.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephfs: creating two subvolumegroups with dedicated data pool...

2020-07-13 Thread Christoph Ackermann
Hello list,

I have a strange issue creating cephfs subvolumegroups with dedicated
pools.

We have four fs (volumes); one is "backup". I would like to set up two
subvolumegroups, "veeampool" and "bareospool", each with a dedicated data
pool: "cephfs.backup.veeam.data" and "cephfs.backup.bareos.data". Both are
EC42 with overwrites enabled and application cephfs. Please see below.

isceph@ceph-deploy:~$ sudo ceph fs subvolumegroup create backup
veeampool  --pool_layout cephfs.backup.veeam.data

...First one is ok.
 
isceph@ceph-deploy:~$ sudo ceph fs subvolumegroup create backup
bareospool --pool_layout cephfs.backup.bareos.data
Error EINVAL: Invalid pool layout 'cephfs.backup.bareos.data'. It must
be a valid data pool

Second one throws this error...   Is there only one data pool allowed
per volume?

Many thanks for clarification

Christoph

...pool ls detail
[..]
pool 44 'cephfs.backup.meta' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
1132068
flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16
recovery_priority 5 application cephfs

pool 45 'cephfs.backup.data' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
1132068
flags hashpspool stripe_width 0 application cephfs


pool 46 'cephfs.backup.veeam.data' erasure profile ec-42-profile size 6
min_size 5 crush_rule 9
object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn
last_change 1174754 lfor 0/0/1174708
flags hashpspool,ec_overwrites stripe_width 16384 application cephfs

pool 48 'cephfs.backup.bareos.data' erasure profile ec-42-profile size 6
min_size 5 crush_rule 10
object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn
last_change 1176205
flags hashpspool,ec_overwrites stripe_width 16384 application cephfs



-- 
Christoph Ackermann | System Engineer


INFOSERVE GmbH | Am Felsbrunnen 15 | D-66119 Saarbrücken
Fon +49 (0)681 88008-59 | Fax +49 (0)681 88008-33 | 
mailto:c.ackerm...@infoserve.de | https://www.infoserve.de

INFOSERVE Datenschutzhinweise: https://infoserve.de/datenschutz
Handelsregister: Amtsgericht Saarbrücken, HRB 11001 | Erfüllungsort: Saarbrücken
Geschäftsführer: Dr. Stefan Leinenbach | Ust-IdNr.: DE168970599

 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD memory leak?

2020-07-13 Thread Mark Nelson

Hi Frank,


So the osd_memory_target code will basically shrink the size of the 
bluestore and rocksdb caches to attempt to keep the overall mapped (not 
rss!) memory of the process below the target.  It's sort of "best 
effort" in that it can't guarantee the process will fit within a given 
target, it will just (assuming we are over target) shrink the caches up 
to some minimum value and that's it. 2GB per OSD is a pretty ambitious 
target.  It's the lowest osd_memory_target we recommend setting.  I'm a 
little surprised the OSD is consuming this much memory with a 2GB target 
though.


Looking at your mempool dump I see very little memory allocated to the 
caches.  In fact the majority is taken up by osdmap (looks like you have 
a decent number of OSDs) and pglog.  That indicates that the memory 
autotuning is probably working but simply can't do anything more to 
help.  Something else is taking up the memory. Figure you've got a 
little shy of 500MB for the mempools.  RocksDB will take up more (and 
potentially quite a bit more if you have memtables backing up waiting to 
be flushed to L0) and potentially some other things in the OSD itself 
that could take up memory.  If you feel comfortable experimenting, you 
could try changing the rocksdb WAL/memtable settings.  By default we 
have up to 4 256MB WAL buffers.  Instead you could try something like 2 
64MB buffers, but be aware this could cause slow performance or even 
temporary write stalls if you have fast storage.  Still, this would only 
give you up to ~0.9GB back.  Since you are on mimic, you might also want 
to check what your kernel's transparent huge pages configuration is.  I 
don't remember if we backported Patrick's fix to always avoid THP for 
ceph processes.  If your kernel is set to "always", you might consider 
trying it with "madvise".
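
(If you do experiment, the override would look roughly like the following;
start from whatever `ceph daemon osd.<id> config get bluestore_rocksdb_options`
reports on your cluster and only change the two values, since the rest of the
option string differs between releases:

# default is roughly ...,max_write_buffer_number=4,...,write_buffer_size=268435456,...
# 2 x 64MB memtables instead:
bluestore_rocksdb_options = ...,max_write_buffer_number=2,...,write_buffer_size=67108864,...

then restart the OSD. The "..." stands for the unchanged remainder of the
string, which should be kept as-is.)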


Alternately, have you tried the built-in tcmalloc heap profiler? You 
might be able to get a better sense of where memory is being used with 
that as well.
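
(Roughly: `ceph tell osd.101 heap start_profiler`, let it run under load for a
while, then `ceph tell osd.101 heap dump` and `ceph tell osd.101 heap stats`,
and finally `ceph tell osd.101 heap stop_profiler`. The dumps end up in the
OSD log directory and can be inspected with pprof/google-pprof. Commands from
memory; check the heap command help on your version.)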



Mark


On 7/13/20 7:07 AM, Frank Schilder wrote:

Hi all,

on a mimic 13.2.8 cluster I observe a gradual increase of memory usage by OSD 
daemons, in particular, under heavy load. For our spinners I use 
osd_memory_target=2G. The daemons overrun the 2G in virt size rather quickly 
and grow to something like 4G virtual. The real memory consumption stays more 
or less around the 2G of the target. There are some overshoots, but these go 
down again during periods with less load.

What I observe now is that the actual memory consumption slowly grows and OSDs 
start using more than 2G virtual memory. I see this as slowly growing swap 
usage despite having more RAM available (swappiness=10). This indicates 
allocated but unused memory or memory not accessed for a long time, usually a 
leak. Here some heap stats:

Before restart:
osd.101 tcmalloc heap stats:
MALLOC: 3438940768 ( 3279.6 MiB) Bytes in use by application
MALLOC: +  5611520 (5.4 MiB) Bytes in page heap freelist
MALLOC: +257307352 (  245.4 MiB) Bytes in central cache freelist
MALLOC: +   357376 (0.3 MiB) Bytes in transfer cache freelist
MALLOC: +  6727368 (6.4 MiB) Bytes in thread cache freelists
MALLOC: + 25559040 (   24.4 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: =   3734503424 ( 3561.5 MiB) Actual memory used (physical + swap)
MALLOC: +575946752 (  549.3 MiB) Bytes released to OS (aka unmapped)
MALLOC:   
MALLOC: =   4310450176 ( 4110.8 MiB) Virtual address space used
MALLOC:
MALLOC: 382884  Spans in use
MALLOC: 35  Thread heaps in use
MALLOC:   8192  Tcmalloc page size

# ceph daemon osd.101 dump_mempools
{
 "mempool": {
 "by_pool": {
 "bloom_filter": {
 "items": 0,
 "bytes": 0
 },
 "bluestore_alloc": {
 "items": 4691828,
 "bytes": 37534624
 },
 "bluestore_cache_data": {
 "items": 0,
 "bytes": 0
 },
 "bluestore_cache_onode": {
 "items": 51,
 "bytes": 28968
 },
 "bluestore_cache_other": {
 "items": 5761276,
 "bytes": 46292425
 },
 "bluestore_fsck": {
 "items": 0,
 "bytes": 0
 },
 "bluestore_txc": {
 "items": 67,
 "bytes": 46096
 },
 "bluestore_writing_deferred": {
 "items": 208,
 "bytes": 26037057
 },
 "bluestore_writing": {
 "items": 52,
 "bytes": 6789398
 },
 "bluefs": {
 "items": 9478,
 "bytes": 183720
  

[ceph-users] Re: ceph fs resize

2020-07-13 Thread Christoph Ackermann
Hello hw,

your cephfs storage limit is your RADOS cluster limit. Each OSD you add
gives you more space to allocate.

> 873 GiB data, 2.7 TiB used, 3.5 TiB / 6.2 TiB avail

>> Actually you have 873 GiB of data, which with replica 3 evaluates to 2.7 TiB used.

>> "3.5 TiB / 6.2 TiB avail" means you have 3.5 TiB raw free out of 6.2 TiB raw
(with replica 3, roughly 1 TiB of usable capacity).

Beware of filling up your cluster to more than 85%.
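
In numbers, roughly: 873 GiB of data x 3 replicas is about 2.6 TiB raw (plus
some overhead, hence the 2.7 TiB shown), and the remaining 3.5 TiB raw divided
by 3 is about 1.1 TiB of additional usable capacity, which is in the same
ballpark as the ~973 GiB MAX AVAIL that `ceph df` reports for your pools.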

Best regards,
Christoph



Am 07.07.20 um 12:23 schrieb hw:
> Hi!
>
> My ceph version is ceph version 15.2.3
> (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)
>
> I have ceph fs and I add new osd to my cluster.
>
> ceph pg stat:
>
> 289 pgs: 1 active+clean+scrubbing+deep, 288 active+clean; 873 GiB
> data, 2.7 TiB used, 3.5 TiB / 6.2 TiB avail
>
> How I can extend my ceph fs from 3.5 TiB to 6.2 TiB avail
>
> Detail information:
>
> ceph fs status
> static - 2 clients
> ==
> RANK  STATE   MDS  ACTIVITY DNS    INOS
>  0    active  static.ceph02.sgpdiv  Reqs:    0 /s   136k   128k
>   POOL TYPE USED  AVAIL
> static_metadata  metadata  10.1G   973G
>  static    data    2754G   973G
>     STANDBY MDS
> static.ceph05.aylgvy
> static.ceph04.wsljnw
> MDS version: ceph version 15.2.3
> (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)
>
> ceph osd pool autoscale-status
> POOL SIZE  TARGET SIZE  RATE  RAW CAPACITY RATIO 
> TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM AUTOSCALE
> device_health_metrics  896.1k    3.0 6399G
> 0.  1.0   1 on
> static 874.0G    3.0 6399G
> 0.4097  1.0 256 on
> static_metadata 3472M    3.0 6399G
> 0.0016  4.0  32 on
>
> ceph df
> --- RAW STORAGE ---
> CLASS  SIZE AVAIL    USED RAW USED  %RAW USED
> hdd    6.2 TiB  3.5 TiB  2.7 TiB   2.7 TiB  43.55
> TOTAL  6.2 TiB  3.5 TiB  2.7 TiB   2.7 TiB  43.55
>
> --- POOLS ---
> POOL   ID  STORED   OBJECTS  USED %USED  MAX AVAIL
> device_health_metrics   1  896 KiB   20  2.6 MiB  0    973 GiB
> static 14  874 GiB    1.44M  2.7 TiB  48.54    973 GiB
> static_metadata    15  3.4 GiB    2.53M   10 GiB   0.35    973 GiB
>
> ceph fs volume ls
> [
>     {
>     "name": "static"
>     }
> ]
>
> ceph fs subvolume ls static
> []
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 

Christoph Ackermann | System Engineer


INFOSERVE GmbH | Am Felsbrunnen 15 | D-66119 Saarbrücken
Fon +49 (0)681 88008-59 | Fax +49 (0)681 88008-33
| c.ackerm...@infoserve.de
 | _www.infoserve.de_


INFOSERVE Datenschutzhinweise: _www.infoserve.de/datenschutz_

Handelsregister: Amtsgericht Saarbrücken, HRB 11001 | Erfüllungsort:
Saarbrücken
Geschäftsführer: Dr. Stefan Leinenbach | Ust-IdNr.: DE168970599

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph `realm pull` permission denied error

2020-07-13 Thread Alex Hussein-Kershaw
Hi Ceph Users,

I'm struggling with an issue that I'm hoping someone can point me towards a 
solution.

We are using Nautilus (14.2.9) deploying Ceph in containers, in VMs. The setup 
that I'm working with has 3 VMs, but of course our design expects this to be 
scaled by a user as appropriate. I have a cluster deployed and it's functioning 
happily as storage for our product, the error occurs when I go to setup a 
second cluster and pair it with the first. I'm using ceph-ansible to deploy.  I 
get the following error about 20 minutes into running the site-container 
playbook.

2020-07-09 14:21:10,966 p=2134 u=qs-admin |  TASK [ceph-rgw : fetch the realm] 
***

2020-07-09 14:21:10,966 p=2134 u=qs-admin |  Thursday 09 July 2020  14:21:10 
+ (0:00:00.410)   0:16:18.245 *
2020-07-09 14:21:11,901 p=2134 u=qs-admin |  fatal: [10.225.21.213 -> 
10.225.21.213]: FAILED! => changed=true
  cmd:
  - docker
  - exec
  - ceph-mon-albamons_sc2
  - radosgw-admin
  - realm
  - pull
  - --url=https://10.225.36.197:7480
  - --access-key=2CQ006Lereqpysbr0l0s
  - --secret=JM3S5Hd49Nz03eIbTTNnEyqcXJkIOXbp0gWIUEbp
  delta: '0:00:00.545895'
  end: '2020-07-09 14:21:11.516539'
  msg: non-zero return code
  rc: 13
  start: '2020-07-09 14:21:10.970644'
  stderr: |-
request failed: (13) Permission denied
If the realm has been changed on the master zone, the master zone's gateway 
may need to be restarted to recognize this user.
  stderr_lines: 
  stdout: ''
  stdout_lines: 

Re-running the command manually reproduces the error. I understand that the 
permission denied error appears to indicate the keys are not valid, suggested 
by https://tracker.ceph.com/issues/36619. However, I've triple checked the keys 
are correct on the other site. I'm at a loss of where to look for debugging, 
I've turned up logs on both the local and remote site for RGW and MON processes 
but neither seem to yield anything related. I've tried restarting everything as 
suggested in the error text from all the processes to a full reboot of all the 
VMs. I've no idea why the keys are being declined either, as they are correct 
(or at least `radosgw-admin 

Thanks for your help,
Alex
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph `realm pull` permission denied error

2020-07-13 Thread Alex Hussein-Kershaw
Hi Zhenshi,

Thanks for the suggestion, unfortunately I have tried this already and had no 
luck ☹

Best wishes,
Alex

From: Zhenshi Zhou 
Sent: 13 July 2020 10:58
To: Alex Hussein-Kershaw 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Ceph `realm pull` permission denied error

NOTE: Message is from an external sender
Hi Alex,

I didn't deploy this in containers/vms, as well as ansible or other tools.
However I deployed multisite once and I remember that I restarted the
rgw on the master site before I sync realm on the secondary site.

I'm not sure if this can help.


Alex Hussein-Kershaw 
mailto:alex.hussein-kers...@metaswitch.com>>
 于2020年7月13日周一 下午5:48写道:
Hi Ceph Users,

I'm struggling with an issue that I'm hoping someone can point me towards a 
solution.

We are using Nautilus (14.2.9) deploying Ceph in containers, in VMs. The setup 
that I'm working with has 3 VMs, but of course our design expects this to be 
scaled by a user as appropriate. I have a cluster deployed and it's functioning 
happily as storage for our product, the error occurs when I go to setup a 
second cluster and pair it with the first. I'm using ceph-ansible to deploy.  I 
get the following error about 20 minutes into running the site-container 
playbook.

2020-07-09 14:21:10,966 p=2134 u=qs-admin |  TASK [ceph-rgw : fetch the realm] 
***

2020-07-09 14:21:10,966 p=2134 u=qs-admin |  Thursday 09 July 2020  14:21:10 
+ (0:00:00.410)   0:16:18.245 *
2020-07-09 14:21:11,901 p=2134 u=qs-admin |  fatal: [10.225.21.213 -> 
10.225.21.213]: FAILED! => changed=true
  cmd:
  - docker
  - exec
  - ceph-mon-albamons_sc2
  - radosgw-admin
  - realm
  - pull
  - 
--url=https://10.225.36.197:7480
  - --access-key=2CQ006Lereqpysbr0l0s
  - --secret=JM3S5Hd49Nz03eIbTTNnEyqcXJkIOXbp0gWIUEbp
  delta: '0:00:00.545895'
  end: '2020-07-09 14:21:11.516539'
  msg: non-zero return code
  rc: 13
  start: '2020-07-09 14:21:10.970644'
  stderr: |-
request failed: (13) Permission denied
If the realm has been changed on the master zone, the master zone's gateway 
may need to be restarted to recognize this user.
  stderr_lines: 
  stdout: ''
  stdout_lines: 

Re-running the command manually reproduces the error. I understand that the 
permission denied error appears to indicate the keys are not valid, suggested 
by 
https://tracker.ceph.com/issues/36619.
 However, I've triple checked the keys are correct on the other site. I'm at a 
loss of where to look for debugging, I've turned up logs on both the local and 
remote site for RGW and MON processes but neither seem to yield anything 
related. I've tried restarting everything as suggested in the error text from 
all the processes to a full reboot of all the VMs. I've no idea why the keys 
are being declined either, as they are correct (or at least `radosgw-admin 
period get` on the primary site thinks so).

Thanks for your help,
Alex
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] mon_osd_down_out_subtree_limit not working?

2020-07-13 Thread Frank Schilder
Hi all,

on a mimic 13.2.8 cluster I have set

  mon  advanced  mon_osd_down_out_subtree_limit  host

According to the documentation:

===
mon osd down out subtree limit

Description

The smallest CRUSH unit type that Ceph will not automatically mark out. For 
instance, if set to host and if all OSDs of a host are down, Ceph will not 
automatically mark out these OSDs.
===

if I shut down all OSDs on this host, these OSDs should not be marked out 
automatically after mon_osd_down_out_interval(=600) seconds. I did a test today 
and, unfortunately, the OSDs do get marked as out. Ceph status was showing 1 
host down as expected.
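
(The value was set via the centralized config, as shown above. I assume
checking the running value on each mon with something like
`ceph daemon mon.<id> config get mon_osd_down_out_subtree_limit` would be the
way to rule out that the setting simply wasn't picked up.)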

Am I doing something wrong or misreading the documentation?

Thanks!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSD memory leak?

2020-07-13 Thread Frank Schilder
Hi all,

on a mimic 13.2.8 cluster I observe a gradual increase of memory usage by OSD 
daemons, in particular, under heavy load. For our spinners I use 
osd_memory_target=2G. The daemons overrun the 2G in virt size rather quickly 
and grow to something like 4G virtual. The real memory consumption stays more 
or less around the 2G of the target. There are some overshoots, but these go 
down again during periods with less load.

What I observe now is that the actual memory consumption slowly grows and OSDs 
start using more than 2G virtual memory. I see this as slowly growing swap 
usage despite having more RAM available (swappiness=10). This indicates 
allocated but unused memory or memory not accessed for a long time, usually a 
leak. Here some heap stats:

Before restart:
osd.101 tcmalloc heap stats:
MALLOC: 3438940768 ( 3279.6 MiB) Bytes in use by application
MALLOC: +  5611520 (5.4 MiB) Bytes in page heap freelist
MALLOC: +257307352 (  245.4 MiB) Bytes in central cache freelist
MALLOC: +   357376 (0.3 MiB) Bytes in transfer cache freelist
MALLOC: +  6727368 (6.4 MiB) Bytes in thread cache freelists
MALLOC: + 25559040 (   24.4 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: =   3734503424 ( 3561.5 MiB) Actual memory used (physical + swap)
MALLOC: +575946752 (  549.3 MiB) Bytes released to OS (aka unmapped)
MALLOC:   
MALLOC: =   4310450176 ( 4110.8 MiB) Virtual address space used
MALLOC:
MALLOC: 382884  Spans in use
MALLOC: 35  Thread heaps in use
MALLOC:   8192  Tcmalloc page size

# ceph daemon osd.101 dump_mempools
{
"mempool": {
"by_pool": {
"bloom_filter": {
"items": 0,
"bytes": 0
},
"bluestore_alloc": {
"items": 4691828,
"bytes": 37534624
},
"bluestore_cache_data": {
"items": 0,
"bytes": 0
},
"bluestore_cache_onode": {
"items": 51,
"bytes": 28968
},
"bluestore_cache_other": {
"items": 5761276,
"bytes": 46292425
},
"bluestore_fsck": {
"items": 0,
"bytes": 0
},
"bluestore_txc": {
"items": 67,
"bytes": 46096
},
"bluestore_writing_deferred": {
"items": 208,
"bytes": 26037057
},
"bluestore_writing": {
"items": 52,
"bytes": 6789398
},
"bluefs": {
"items": 9478,
"bytes": 183720
},
"buffer_anon": {
"items": 291450,
"bytes": 28093473
},
"buffer_meta": {
"items": 546,
"bytes": 34944
},
"osd": {
"items": 98,
"bytes": 1139152
},
"osd_mapbl": {
"items": 78,
"bytes": 8204276
},
"osd_pglog": {
"items": 341944,
"bytes": 120607952
},
"osdmap": {
"items": 10687217,
"bytes": 186830528
},
"osdmap_mapping": {
"items": 0,
"bytes": 0
},
"pgmap": {
"items": 0,
"bytes": 0
},
"mds_co": {
"items": 0,
"bytes": 0
},
"unittest_1": {
"items": 0,
"bytes": 0
},
"unittest_2": {
"items": 0,
"bytes": 0
}
},
"total": {
"items": 21784293,
"bytes": 461822613
}
}
}

Right after restart + health_ok:
osd.101 tcmalloc heap stats:
MALLOC: 1173996280 ( 1119.6 MiB) Bytes in use by application
MALLOC: +  3727360 (3.6 MiB) Bytes in page heap freelist
MALLOC: + 25493688 (   24.3 MiB) Bytes in central cache freelist
MALLOC: + 17101824 (   16.3 MiB) Bytes in transfer cache freelist
MALLOC: + 20301904 (   19.4 MiB) Bytes in thread cache freelists
MALLOC: +  5242880 (5.0 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: =   1245863936 ( 1188.1 MiB) Actual memory used (physical + swap)
MALLOC: + 20488192 (   19.5 MiB) Bytes released to OS (aka unmapped)
MALLOC:   
MALLOC: =   1266352128 ( 1207.7 MiB) Virtual address space used
MALLOC:
MALLOC:  54160  Spans in use
MALLOC: 33

[ceph-users] Re: "task status" section in ceph -s output new?

2020-07-13 Thread Patrick Donnelly
On Mon, Jul 13, 2020 at 4:12 AM Rainer Krienke  wrote:
>
> Hello,
>
> today I upgraded from 14.2.9 to 14.2.10. Everything worked just fine. I
> think at the time I upgraded the last of my three MDS servers something
> new appeard in the output of ceph -s:
>
> # ceph -s
>   cluster:
> id: xyz
> health: HEALTH_OK
>
>   services:
>...
>...
>
>   task status:
> scrub status:
> mds.myceph3: idle
>
> The "task status" section is new to me. Is it there to stay and what
> kind of task status messages might appear there?

Any kind of cluster task. Right now (for CephFS), we just use it for
on-going scrubs. There's a bug where idle scrub is continually
reported. It will be resolved in Nautilus with this backport:
https://tracker.ceph.com/issues/46480
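
(On the scrubbing part of your question: CephFS forward scrubs are started on
demand, e.g. `ceph tell mds.<fs-name>:0 scrub start / recursive`, and checked
with `ceph tell mds.<fs-name>:0 scrub status`. They are not required for
routine operation, since RADOS-level scrubbing of the underlying pools happens
automatically, but they are useful if you suspect metadata damage. Exact tell
syntax is from memory for Nautilus.)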

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs: creating two subvolumegroups with dedicated data pool...

2020-07-13 Thread Patrick Donnelly
On Mon, Jul 13, 2020 at 6:09 AM Christoph Ackermann
 wrote:
>
> Hello list,
>
> I have a strange issue creating cephfs subvolumegroups with dedicated
> pools.
>
> We have four fs (volumes); one is "backup" I would like to set up two
> subvolumegroup  "veeampool" and "bareospool" each with dedicated data
> pools  "cephfs.backup.veeam.data" and "cephfs.backup.bareos.data".  Both
> EC42 with overwrites enabled and application cephfs. Pls see below.
>
> isceph@ceph-deploy:~$ sudo ceph fs subvolumegroup create backup
> veeampool  --pool_layout cephfs.backup.veeam.data
>
> ...First one is ok.
>
> isceph@ceph-deploy:~$ sudo ceph fs subvolumegroup create backup
> bareospool --pool_layout cephfs.backup.bareos.data
> Error EINVAL: Invalid pool layout 'cephfs.backup.bareos.data'. It must
> be a valid data pool

Did you forget to add the data pool to the volume (file system)?

https://docs.ceph.com/docs/master/cephfs/administration/#file-systems

See "add_data_pool".


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v14.2.10 Nautilus crash

2020-07-13 Thread Sven Kieske
On Fr, 2020-07-10 at 23:52 +0200, Dan van der Ster wrote:
> Otherwise, the question I ask everyone with osdmap issues these days:
> are you using bluestore compression and lz4?

Hi,

first time on this list, so hi everybody!

We saw these crashes with 14.2.9 and upgraded today to 14.2.10.
No crashes so far since the upgrade (but it's running for only a few hours now).

To answer the question: no, we don't run bluestore compression.

But to add some more questioning:

In the bug report/fix https://tracker.ceph.com/issues/46443 it is mentioned
that this is a bug in the Linux kernel's monotonic timer implementation.

However, it is not mentioned whether this was ever reported to the upstream
Linux kernel, or if there is a fix available for the kernel itself.

Does anyone have information regarding a bug report against the kernel?
Did someone verify this bug against the latest upstream kernel?

If this is not the case, we will try to reproduce it with an upstream kernel
and possibly report it as a bug, because I think it is important to fix this
upstream (if it is a problem with non-distro kernels).

-- 
Mit freundlichen Grüßen / Regards

Sven Kieske
Systementwickler
 
 
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 4-6
32339 Espelkamp
 
Tel.: 05772 / 293-900
Fax: 05772 / 293-333
 
https://www.mittwald.de
 
Geschäftsführer: Robert Meyer, Florian Jürgens
 
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

Informationen zur Datenverarbeitung im Rahmen unserer Geschäftstätigkeit 
gemäß Art. 13-14 DSGVO sind unter www.mittwald.de/ds abrufbar.



signature.asc
Description: This is a digitally signed message part
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs: creating two subvolumegroups with dedicated data pool...

2020-07-13 Thread Christoph Ackermann
D'OH!  That's the hint of the day. 

I created the first SVG some time ago, and the second one only today.  :-/

Thank you much!

Christoph


Am 13.07.20 um 18:33 schrieb Patrick Donnelly:
> On Mon, Jul 13, 2020 at 6:09 AM Christoph Ackermann
>  wrote:
>> isceph@ceph-deploy:~$ sudo ceph fs subvolumegroup create backup
>> veeampool  --pool_layout cephfs.backup.veeam.data
>>
>> ...First one is ok.
>>
>> isceph@ceph-deploy:~$ sudo ceph fs subvolumegroup create backup
>> bareospool --pool_layout cephfs.backup.bareos.data
>> Error EINVAL: Invalid pool layout 'cephfs.backup.bareos.data'. It must
>> be a valid data pool
> Did you forget to add the data pool to the volume (file system)?
>
> https://docs.ceph.com/docs/master/cephfs/administration/#file-systems
>
> See "add_data_pool".
>
>

-- 

Christoph Ackermann | System Engineer


INFOSERVE GmbH | Am Felsbrunnen 15 | D-66119 Saarbrücken
Fon +49 (0)681 88008-59 | Fax +49 (0)681 88008-33
| c.ackerm...@infoserve.de
 | _www.infoserve.de_


INFOSERVE Datenschutzhinweise: _www.infoserve.de/datenschutz_

Handelsregister: Amtsgericht Saarbrücken, HRB 11001 | Erfüllungsort:
Saarbrücken
Geschäftsführer: Dr. Stefan Leinenbach | Ust-IdNr.: DE168970599


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph stuck at: objects misplaced (0.064%)

2020-07-13 Thread ceph



Am 9. Juli 2020 08:32:32 MESZ schrieb Eugen Block :
>Do you have pg_autoscaler enabled or the balancer module?
>

AFAIK luminous does not support pg_autoscaler.

My guess: you don't have enough space for ceph to create the 3rd copy of your PGs on
node2.

Or did it resolve by itself?
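
To see what ceph is waiting for, something like `ceph pg ls remapped` (or
`ceph pg dump_stuck unclean` on luminous) together with `ceph osd df tree`
should show which PG is stuck and whether the target OSDs have room. Commands
are from memory.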

Hth
Mehmet 

>
>Zitat von Ml Ml :
>
>> Hello,
>>
>> Ceph has been stuck for 4 days with 0.064% misplaced and I don't know why. Can
>> anyone help me to get it fixed?
>> I did restart some OSDs and reweighted them again to get some data
>> moving, but that did not help.
>>
>> root@node01:~ # ceph -s
>> cluster:
>> id: 251c937e-0b55-48c1-8f34-96e84e4023d4
>> health: HEALTH_WARN
>> 1803/2799972 objects misplaced (0.064%)
>> mon node02 is low on available space
>>
>> services:
>> mon: 3 daemons, quorum node01,node02,node03
>> mgr: node03(active), standbys: node01, node02
>> osd: 16 osds: 16 up, 16 in; 1 remapped pgs
>>
>> data:
>> pools: 1 pools, 512 pgs
>> objects: 933.32k objects, 2.68TiB
>> usage: 9.54TiB used, 5.34TiB / 14.9TiB avail
>> pgs: 1803/2799972 objects misplaced (0.064%)
>> 511 active+clean
>> 1 active+clean+remapped
>>
>> io:
>> client: 131KiB/s rd, 8.57MiB/s wr, 28op/s rd, 847op/s wr
>>
>> root@node01:~ # ceph health detail
>> HEALTH_WARN 1803/2800179 objects misplaced (0.064%); mon node02 is
>low
>> on available space
>> OBJECT_MISPLACED 1803/2800179 objects misplaced (0.064%)
>> MON_DISK_LOW mon node02 is low on available space
>> mon.node02 has 28% avail
>> root@node01:~ # ceph versions
>> {
>> "mon": {
>> "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd)
>> luminous (stable)": 3
>> },
>> "mgr": {
>> "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd)
>> luminous (stable)": 3
>> },
>> "osd": {
>> "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd)
>> luminous (stable)": 16
>> },
>> "mds": {},
>> "overall": {
>> "ceph version 12.2.13 (98af9a6b9a46b2d562a0de4b09263d70aeb1c9dd)
>> luminous (stable)": 22
>> }
>> }
>>
>> root@node02:~ # df -h
>> Filesystem Size Used Avail Use% Mounted on
>> udev 63G 0 63G 0% /dev
>> tmpfs 13G 1.3G 12G 11% /run
>> /dev/sda3 46G 31G 14G 70% /
>> tmpfs 63G 57M 63G 1% /dev/shm
>> tmpfs 5.0M 0 5.0M 0% /run/lock
>> tmpfs 63G 0 63G 0% /sys/fs/cgroup
>> /dev/sda1 922M 206M 653M 24% /boot
>> /dev/fuse 30M 144K 30M 1% /etc/pve
>> /dev/sde1 93M 5.4M 88M 6% /var/lib/ceph/osd/ceph-11
>> /dev/sdf1 93M 5.4M 88M 6% /var/lib/ceph/osd/ceph-14
>> /dev/sdc1 889G 676G 214G 77% /var/lib/ceph/osd/ceph-3
>> /dev/sdb1 889G 667G 222G 76% /var/lib/ceph/osd/ceph-2
>> /dev/sdd1 93M 5.4M 88M 6% /var/lib/ceph/osd/ceph-7
>> tmpfs 13G 0 13G 0% /run/user/0
>>
>> root@node02:~ # ceph osd tree
>> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>> -1 14.34781 root default
>> -2 4.25287 host node01
>> 0 hdd 0.85999 osd.0 up 0.80005 1.0
>> 1 hdd 0.86749 osd.1 up 0.85004 1.0
>> 6 hdd 0.87270 osd.6 up 0.90002 1.0
>> 12 hdd 0.78000 osd.12 up 0.95001 1.0
>> 13 hdd 0.87270 osd.13 up 0.95001 1.0
>> -3 3.91808 host node02
>> 2 hdd 0.7 osd.2 up 0.80005 1.0
>> 3 hdd 0.5 osd.3 up 0.85004 1.0
>> 7 hdd 0.87270 osd.7 up 0.85004 1.0
>> 11 hdd 0.87270 osd.11 up 0.75006 1.0
>> 14 hdd 0.87270 osd.14 up 0.85004 1.0
>> -4 6.17686 host node03
>> 4 hdd 0.87000 osd.4 up 1.0 1.0
>> 5 hdd 0.87000 osd.5 up 1.0 1.0
>> 8 hdd 0.87270 osd.8 up 1.0 1.0
>> 10 hdd 0.87270 osd.10 up 1.0 1.0
>> 15 hdd 0.87270 osd.15 up 1.0 1.0
>> 16 hdd 1.81879 osd.16 up 1.0 1.0
>>
>> root@node01:~ # ceph osd df tree
>> ID CLASS WEIGHT   REWEIGHT SIZEUSE DATAOMAPMETA
>> AVAIL   %USE  VAR  PGS TYPE NAME
>> -1   14.55780- 14.9TiB 9.45TiB 7.46TiB 1.47GiB 23.2GiB
>> 5.43TiB 63.52 1.00   - root default
>> -24.27286- 4.35TiB 3.15TiB 2.41TiB  486MiB 7.62GiB
>> 1.21TiB 72.32 1.14   - host node01
>>  0   hdd  0.85999  0.80005  888GiB  619GiB  269GiB 92.3MiB  0B
>> 269GiB 69.72 1.10  89 osd.0
>>  1   hdd  0.86749  0.85004  888GiB  641GiB  248GiB  109MiB  0B
>> 248GiB 72.12 1.14  92 osd.1
>>  6   hdd  0.87270  0.90002  894GiB  634GiB  632GiB 98.9MiB 2.65GiB
>> 259GiB 70.99 1.12 107 osd.6
>> 12   hdd  0.7  0.95001  894GiB  664GiB  661GiB 94.4MiB 2.52GiB
>> 230GiB 74.31 1.17 112 osd.12
>> 13   hdd  0.87270  0.95001  894GiB  665GiB  663GiB 91.7MiB 2.46GiB
>> 229GiB 74.43 1.17 112 osd.13
>> -34.10808- 4.35TiB 3.17TiB 2.18TiB  479MiB 6.99GiB
>> 1.18TiB 72.86 1.15   - host node02
>>  2   hdd  0.78999  0.75006  888GiB  654GiB  235GiB 95.6MiB  0B
>> 235GiB 73.57 1.16  94 osd.2
>>  3   hdd  0.7  0.80005  888GiB  737GiB  151GiB  114MiB  0B
>> 151GiB 82.98 1.31 105 osd.3
>>  7   hdd  0.87270  0.85004  894GiB  612GiB  610GiB 88.9MiB 2.43GiB
>> 281GiB 68.50 1.08 103 osd.7
>> 11   hdd  0.87270  0.75006  894GiB  576GiB  574GiB 81.8MiB 2.19GiB
>> 317GiB 64.47 1.01  97 osd.11
>> 14   hdd  0.87270  0.85004  894GiB  66

[ceph-users] Re: Ceph `realm pull` permission denied error

2020-07-13 Thread Alex Hussein-Kershaw
I got to the bottom of this: it was caused by NTP server issues and a 1-hour time 
discrepancy between the two clusters.

This was pretty painful to track down (I didn't find any useful logs, and the most 
descriptive message Ceph gave me was the "permission denied" error), so hopefully 
some future engineer can save some time from my troubles!
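
For anyone hitting the same thing, the skew is quick to spot once you look for it. A small sketch; which of these applies depends on whether chrony or ntpd is in use on your nodes:

  timedatectl status       # is the clock NTP-synchronised at all?
  chronyc tracking         # or 'ntpq -p' when ntpd is used
  ceph time-sync-status    # clock skew as seen by the monitors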

Thanks,
Alex

From: Alex Hussein-Kershaw
Sent: 13 July 2020 12:22
To: Zhenshi Zhou 
Cc: ceph-users@ceph.io
Subject: RE: [ceph-users] Ceph `realm pull` permission denied error

Hi Zhenshi,

Thanks for the suggestion, unfortunately I have tried this already and had no 
luck ☹

Best wishes,
Alex

From: Zhenshi Zhou mailto:deader...@gmail.com>>
Sent: 13 July 2020 10:58
To: Alex Hussein-Kershaw 
mailto:alex.hussein-kers...@metaswitch.com>>
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Ceph `realm pull` permission denied error

NOTE: Message is from an external sender
Hi Alex,

I didn't deploy this in containers/vms, as well as ansible or other tools.
However I deployed multisite once and I remember that I restarted the
rgw on the master site before I sync realm on the secondary site.

I'm not sure if this can help.


Alex Hussein-Kershaw 
mailto:alex.hussein-kers...@metaswitch.com>>
 wrote on Monday, 13 July 2020 at 17:48:
Hi Ceph Users,

I'm struggling with an issue and hoping someone can point me towards a solution.

We are using Nautilus (14.2.9), deploying Ceph in containers, in VMs. The setup 
I'm working with has 3 VMs, but of course our design expects this to be scaled by 
a user as appropriate. I have a cluster deployed and it's functioning happily as 
storage for our product; the error occurs when I go to set up a second cluster 
and pair it with the first. I'm using ceph-ansible to deploy. I get the following 
error about 20 minutes into running the site-container playbook.

2020-07-09 14:21:10,966 p=2134 u=qs-admin |  TASK [ceph-rgw : fetch the realm] 
***

2020-07-09 14:21:10,966 p=2134 u=qs-admin |  Thursday 09 July 2020  14:21:10 
+ (0:00:00.410)   0:16:18.245 *
2020-07-09 14:21:11,901 p=2134 u=qs-admin |  fatal: [10.225.21.213 -> 
10.225.21.213]: FAILED! => changed=true
  cmd:
  - docker
  - exec
  - ceph-mon-albamons_sc2
  - radosgw-admin
  - realm
  - pull
  - 
--url=https://10.225.36.197:7480
  - --access-key=2CQ006Lereqpysbr0l0s
  - --secret=JM3S5Hd49Nz03eIbTTNnEyqcXJkIOXbp0gWIUEbp
  delta: '0:00:00.545895'
  end: '2020-07-09 14:21:11.516539'
  msg: non-zero return code
  rc: 13
  start: '2020-07-09 14:21:10.970644'
  stderr: |-
request failed: (13) Permission denied
If the realm has been changed on the master zone, the master zone's gateway 
may need to be restarted to recognize this user.
  stderr_lines: 
  stdout: ''
  stdout_lines: 

Re-running the command manually reproduces the error. I understand that the 
permission denied error appears to indicate the keys are not valid, as suggested 
by 
https://tracker.ceph.com/issues/36619.
 However, I've triple-checked that the keys are correct on the other site. I'm at a 
loss as to where to look for debugging; I've turned up logs for the RGW and MON 
processes on both the local and remote sites, but neither seems to yield anything 
related. I've tried restarting everything, from the individual processes (as 
suggested in the error text) to a full reboot of all the VMs. I've no idea why the 
keys are being declined either, as they are correct (or at least `radosgw-admin 
period get` on the primary site thinks so).

Thanks for your help,
Alex
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Poor Windows performance on ceph RBD.

2020-07-13 Thread Frank Schilder
> If I may ask, which version of the virtio drivers do you use?

https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/latest-virtio/virtio-win.iso

Looks like virtio-win-0.1.185.*

> And do you use caching on libvirt driver level?

In the ONE interface, we use

  DISK = [ driver = "raw" , cache = "none"]

which translates to

  <driver name='qemu' type='raw' cache='none'/>

in the XML. We have no qemu settings in the ceph.conf. Looks like caching is 
disabled. Not sure if this is the recommended way though and why caching is 
disabled by default.
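
For what it's worth, if one wanted to experiment with librbd write-back caching instead, the usual combination would look roughly like the sketch below. This assumes qemu goes through librbd; it is not a recommendation, just what the knobs are:

  # ceph.conf on the hypervisor (client side)
  [client]
      rbd cache = true
      rbd cache writethrough until flush = true

  # and in the libvirt disk definition:
  #   <driver name='qemu' type='raw' cache='writeback'/>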

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: André Gemünd 
Sent: 13 July 2020 11:18
To: Frank Schilder
Subject: Re: [ceph-users] Re: Poor Windows performance on ceph RBD.

If I may ask, which version of the virtio drivers do you use?

And do you use caching on libvirt driver level?

Greetings
André

- On 13 Jul 2020 at 10:43, Frank Schilder fr...@dtu.dk wrote:

>> > To anyone who is following this thread, we found a possible explanation for
>> > (some of) our observations.
>
>> If someone is following this, they probably want the possible
>> explanation and not the knowledge of you having the possible
>> explanation.
>
>> So you are saying if you do eg. a core installation (without gui) of
>> 2016/2019 disable all services. The fio test results are signficantly
>> different to eg. a centos 7 vm doing the same fio test? Are you sure
>> this is not related to other processes writing to disk?
>
> Right, its not an explanation but rather a further observation. We don't 
> really
> have an explanation yet.
>
> Its an identical installation of both server versions, same services 
> configured.
> Our operators are not really into debugging Windows, that's why we were asking
> here. Their hypothesis is, that the VD driver for accessing RBD images has
> problems with Windows servers newer than 2016. I'm not a Windows guy, so can't
> really comment on this.
>
> The test we do is a simple copy-test of a single 10g file and we monitor the
> transfer speed. This info was cut out of this e-mail, the original report for
> reference is:
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/ANHJQZLJT474B457VVM4ZZZ6HBXW4OPO/
> .
>
> We are very sure that it is not related to other processes writing to disk, we
> monitor that too. There is also no competition on the RBD pool at the time of
> testing.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Marc Roos 
> Sent: 13 July 2020 10:24
> To: ceph-users; Frank Schilder
> Subject: RE: [ceph-users] Re: Poor Windows performance on ceph RBD.
>
>>> To anyone who is following this thread, we found a possible
> explanation for
>>> (some of) our observations.
>
> If someone is following this, they probably want the possible
> explanation and not the knowledge of you having the possible
> explanation.
>
> So you are saying if you do eg. a core installation (without gui) of
> 2016/2019 disable all services. The fio test results are signficantly
> different to eg. a centos 7 vm doing the same fio test? Are you sure
> this is not related to other processes writing to disk?
>
>
>
> -Original Message-
> From: Frank Schilder [mailto:fr...@dtu.dk]
> Sent: maandag 13 juli 2020 9:28
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: Poor Windows performance on ceph RBD.
>
> To anyone who is following this thread, we found a possible explanation
> for (some of) our observations.
>
> We are running Windows servers version 2016 and 2019 as storage servers
> exporting data on an rbd image/disk. We recently found that Windows
> server 2016 runs fine. It is still not as fast as Linux + SAMBA share on
> an rbd image (ca. 50%), but runs with a reasonable sustained bandwidth.
> With Windows server 2019, however, we observe near-complete stall of
> file transfers and time-outs using standard copy tools (robocopy). We
> don't have an explanation yet and are downgrading Windows servers where
> possible.
>
> If anyone has a hint what we can do, please let us know.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

--
Dipl.-Inf. André Gemünd, Leiter IT / Head of IT
Fraunhofer-Institute for Algorithms and Scientific Computing
andre.gemu...@scai.fraunhofer.de
Tel: +49 2241 14-2193
/C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] multiple BLK-MQ queues for Ceph's RADOS Block Device (RBD) and CephFS

2020-07-13 Thread Bobby
Hi,

I have a question regarding support for multiple BLK-MQ queues for Ceph's
RADOS Block Device (RBD). The link below says that the driver has been
using the BLK-MQ interface for a while, but not actually multiple queues
until now, when it gained a queue per CPU. There is also a change to not
hold onto caps that aren't actually needed. These improvements and more are
part of the Ceph changes for Linux 5.7, which should be released as stable
in early June.

https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.7-Ceph-Performance

My question is: is it possible for me to develop a multi-queue driver for
Ceph by going through CephFS via FUSE (Filesystem in Userspace)? I'm asking
because this way I can avoid kernel space. (
https://docs.ceph.com/docs/nautilus/start/quick-cephfs/)
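
For reference, a fully userspace data path already exists via ceph-fuse; a minimal sketch (the monitor address and mount point are placeholders):

  sudo mkdir -p /mnt/mycephfs
  sudo ceph-fuse -m 192.168.1.10:6789 /mnt/mycephfs   # FUSE client, no kernel CephFS/RBD code involved

Whether a per-CPU multi-queue design would help there is a separate question, since FUSE has its own request path and overhead.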

Looking forward to some help

BR
Bobby
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm adoption failed

2020-07-13 Thread Tobias Gall

Hello,

I'm trying to adopt an existing cluster.
The cluster consists of 5 converged (mon, mgr, osd, mds on same host) 
servers running Octopus 15.2.4.


I've followed the guide:
https://docs.ceph.com/docs/octopus/cephadm/adoption/

Adopting the first mon I've got the following problem:

root@mulberry:/home/toga# cephadm adopt --style legacy --name mon.mulberry
INFO:cephadm:Pulling latest docker.io/ceph/ceph:v15 container...
INFO:cephadm:Stopping old systemd unit ceph-mon@mulberry...
INFO:cephadm:Disabling old systemd unit ceph-mon@mulberry...
INFO:cephadm:Moving data...
INFO:cephadm:Chowning content...
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 4761, in 
r = args.func()
  File "/usr/sbin/cephadm", line 1162, in _default_image
return func()
  File "/usr/sbin/cephadm", line 3241, in command_adopt
command_adopt_ceph(daemon_type, daemon_id, fsid);
  File "/usr/sbin/cephadm", line 3387, in command_adopt_ceph
call_throws(['chown', '-c', '-R', '%d.%d' % (uid, gid), data_dir_dst])
  File "/usr/sbin/cephadm", line 844, in call_throws
out, err, ret = call(command, **kwargs)
  File "/usr/sbin/cephadm", line 784, in call
message = message_b.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 
1023: unexpected end of data


In `cephadm ls` the old mon is gone and the new is present:

{
"style": "cephadm:v1",
"name": "mon.mulberry",
"fsid": "74307e84-e1fe-4706-8312-fe47703928a1",
"systemd_unit": 
"ceph-74307e84-e1fe-4706-8312-fe47703928a1@mon.mulberry",

"enabled": false,
"state": "stopped",
"container_id": null,
"container_image_name": null,
"container_image_id": null,
"version": null,
"started": null,
"created": null,
"deployed": null,
"configured": null
}

But there is no container running.
How can I resolve this?
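
In case it is relevant: the traceback shows the failure happened after the data move, in the chown step. A possible manual continuation (purely a sketch; the 167:167 UID/GID is an assumption based on the ceph user normally used inside the container) would be:

  chown -R 167:167 /var/lib/ceph/74307e84-e1fe-4706-8312-fe47703928a1/mon.mulberry
  systemctl enable --now ceph-74307e84-e1fe-4706-8312-fe47703928a1@mon.mulberry.service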

Regards,
Tobias
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] missing ceph-mgr-dashboard and ceph-grafana-dashboards rpms for el7 and 14.2.10

2020-07-13 Thread Joel Davidow
https://download.ceph.com/rpm-nautilus/el8/noarch/ contains 
ceph-mgr-dashboard-14.2.10-0.el8.noarch.rpm and 
ceph-grafana-dashboards-14.2.10-0.el8.noarch.rpm, but there is no 
14.2.10-0.el7.noarch.rpm for either ceph-mgr-dashboard or 
ceph-grafana-dashboards in https://download.ceph.com/rpm-nautilus/el7/noarch/.

I tried using the 14.2.9 rpms, but they depend on ceph-mgr 14.2.9, and I just 
installed 14.2.10, which I want to continue running.

# rpm -Uvh 
https://download.ceph.com/rpm-nautilus/el7/noarch/ceph-mgr-dashboard-14.2.9-0.el7.noarch.rpm
 
Retrieving 
https://download.ceph.com/rpm-nautilus/el7/noarch/ceph-mgr-dashboard-14.2.9-0.el7.noarch.rpm
error: Failed dependencies:
ceph-grafana-dashboards = 2:14.2.9-0.el7 is needed by 
ceph-mgr-dashboard-2:14.2.9-0.el7.noarch
ceph-mgr = 2:14.2.9-0.el7 is needed by 
ceph-mgr-dashboard-2:14.2.9-0.el7.noarch
python-jwt is needed by ceph-mgr-dashboard-2:14.2.9-0.el7.noarch
python-routes is needed by ceph-mgr-dashboard-2:14.2.9-0.el7.noarch
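
A quick way to double-check what the repo actually offers (a sketch; it assumes the download.ceph.com nautilus repo is enabled in yum):

  yum --showduplicates list ceph-mgr-dashboard ceph-grafana-dashboards
  # or inspect the repo listing directly:
  curl -s https://download.ceph.com/rpm-nautilus/el7/noarch/ | grep -o 'ceph-mgr-dashboard-[^"]*\.rpm' | sort -u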

Is this the appropriate forum to raise this issue or should I use a different 
process to get the missing rpms addressed?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Web UI errors

2020-07-13 Thread Will Payne
Hi,

I’ve been getting errors in the NFS section of the web interface. I’ve just 
tried upgrading to 15.2.4 to see if that helped but no joy.

The initial NFS page loads OK and when I click Add, a form loads. However, when 
this form attempts to update its values, I get a red box informing me that the 
server returned a 500 error. It makes four HTTP calls (to daemon, clients, 
filesystems and fsals) and these all fail.

Any suggestions on what might be wrong or where some useful logs might be?
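
For reference, the usual knobs for surfacing the underlying error are sketched below; the daemon name is a placeholder and the cephadm variant only applies if the cluster was deployed that way:

  ceph dashboard debug enable        # 500 responses will then carry a Python traceback
  ceph config set mgr debug_mgr 10   # more verbose mgr logging
  journalctl -u ceph-mgr@<host> -f   # or: cephadm logs --name mgr.<host> on a cephadm deployment

If NFS-Ganesha management was never configured for the dashboard, the NFS pages tend to fail like this; 'ceph dashboard set-ganesha-clusters-rados-pool-namespace <pool>[/<namespace>]' is the related setting.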

Ta,
Will
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: how to configure cephfs-shell correctly

2020-07-13 Thread Zhenshi Zhou
This error disappeared after I installed 'python3-cephfs'. However, the
cephfs-shell command gets stuck.
It stays at 'CephFS:~/>>>' and whatever subcommand I execute, it prints
an error like the one below:

EXCEPTION of type 'TypeError' occurred with message: 'onecmd() got an
unexpected keyword argument 'add_to_history''

How can I use 'cephfs-shell'?
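
From what I can tell, this error typically means the installed cmd2 module passes an argument (add_to_history) that this cephfs-shell's onecmd() does not accept, i.e. a cmd2 / cephfs-shell version mismatch. Checking and pinning cmd2 is a reasonable first step; a sketch, and the exact version is only an example:

  pip3 show cmd2                      # which cmd2 is installed right now?
  pip3 install --user 'cmd2==0.9.12'  # example pin; match it to what your Ceph release was built against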

Zhenshi Zhou  wrote on Friday, 10 July 2020 at 15:09:

> Hi all,
>
> I want to use cephfs-shell dealing with operations like directory
> creation,
> instead of mounting the root directory and create manually. But I get
> errors
> when I execute the command 'cephfs-shell'.
>
> Traceback (most recent call last):
>   File "./cephfs-shell", line 9, in 
> import cephfs as libcephfs
> ModuleNotFoundError: No module named 'cephfs'
>
> CentOS7 use python2 and I've already installed python3 in the system.
> What else should I do, reinstall libcephfs?
>
> Thanks
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v14.2.10 Nautilus crash

2020-07-13 Thread Dan van der Ster
On Mon, Jul 13, 2020 at 6:37 PM Sven Kieske  wrote:
>
> On Fr, 2020-07-10 at 23:52 +0200, Dan van der Ster wrote:
> > Otherwise, the question I ask everyone with osdmap issues these days:
> > are you using bluestore compression and lz4?
>
> Hi,
>
> first time on this list, so hi everybody!

Hi Sven, welcome!

>
> We saw these crashes with 14.2.9 and upgraded today to 14.2.10.
> No crashes so far since the upgrade (but it's running for only a few hours 
> now).
>
> To answer the question: no, we don't run bluestore compression.
>
> But to add some more questioning:
>
> In the bug report/fix https://tracker.ceph.com/issues/46443 it is mentioned
> that this is a bug in the linux kernels monotonic timer implementation.

I don't quite follow where you found a connection with the kernel...
are you sure you have the same problem as Markus?

Cheers, Dan
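
For completeness, whether compression and lz4 are in play anywhere is quick to verify; a sketch, the pool name is a placeholder:

  ceph config dump | grep -i compress
  ceph osd pool get <pool> compression_mode
  ceph osd pool get <pool> compression_algorithm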




>
> However it is not mentioned if this was ever reported to the upstream linux 
> kernel, or if there is
> a fix available for the kernel itself.
>
> Does anyone have information regarding a bug report against the kernel?
> Did someone verify this bug against the latest upstream kernel?
>
> If this is not the case, we will try to reproduce with upstream
> and possibly report it as a bug, because I think it is important to fix this
> upstream (if it is a problem with non distro kernels).
>
> --
> Mit freundlichen Grüßen / Regards
>
> Sven Kieske
> Systementwickler
>
>
> Mittwald CM Service GmbH & Co. KG
> Königsberger Straße 4-6
> 32339 Espelkamp
>
> Tel.: 05772 / 293-900
> Fax: 05772 / 293-333
>
> https://www.mittwald.de
>
> Geschäftsführer: Robert Meyer, Florian Jürgens
>
> St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
> Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen
>
> Informationen zur Datenverarbeitung im Rahmen unserer Geschäftstätigkeit
> gemäß Art. 13-14 DSGVO sind unter www.mittwald.de/ds abrufbar.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io