[ceph-users] Re: which of cpu frequency and number of threads servers osd better?

2020-11-12 Thread Tony Liu
You both mentioned the first 2 threads and another 2 threads. Could you give
more details on how an OSD works with multiple threads, or share a link if
it's already documented somewhere?

Is it always 4 threads, or does it start with 1 and grow up to 4? Is 4 the
maximum? Does each thread run a different job, or are they multiple instances
of the same job? Does the disk type affect how the threads are used, e.g. is
1 thread good enough for an HDD while an SSD requires 4?

Say I change my plan to 2 SSD OSDs and 8 HDD OSDs (with 1 SSD for
WAL and DB). If each OSD requires 4 threads, then 16C/32T 3.0GHz could
be the better choice, because it provides sufficient threads?
If an SSD OSD requires 4 threads and an HDD OSD only requires 1, then 8C/16T
3.2GHz would be better, because it provides sufficient threads as well
as stronger per-core compute?
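
For reference, a rough way to see how many threads a given OSD actually keeps
busy is to watch its threads directly; the OSD id and the pgrep pattern below
are only an example and depend on how the daemon is started:

  # per-thread CPU usage of OSD 3, refreshed live
  top -H -p "$(pgrep -f 'ceph-osd.*--id 3')"
  # or a one-off per-thread snapshot over 5 seconds
  pidstat -t -p "$(pgrep -f 'ceph-osd.*--id 3')" 5 1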

Thanks!
Tony
> -Original Message-
> From: Frank Schilder 
> Sent: Thursday, November 12, 2020 10:59 PM
> To: Tony Liu ; Nathan Fish 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: which of cpu frequency and number of
> threads servers osd better?
> 
> I think this depends on the type of backing disk. We use the following
> CPUs:
> 
> Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
> Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
> Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
> 
> My experience is that an HDD OSD hardly gets to 100% of 1 hyperthread of
> load even under heavy recovery/rebalance operations on 8+2 and 6+2 EC
> pools with compression set to aggressive. The CPU is mostly doing wait-
> IO, that is, the disk is the real bottleneck, not the processor power.
> With SSDs I have seen 2 HT at 100% and 2 more at 50% each. I guess NVMe
> might be more demanding.
> 
> A server with 12 HDD and 1 SSD should be fine with a modern CPU with 8
> cores. 16 threads sounds like an 8 core CPU. The 2nd generation Intel®
> Xeon® Silver 4209T with 8 cores should easily handle that (single socket
> system). We have the 16-core Intel silver in a dual socket system
> currently connected to 5HDD and 7SSD and I did a rebalance operation
> yesterday. The CPU user load did not exceed 2%, so it can handle the OSD
> processes easily. The server is dimensioned to run up to 12HDD and 14SSD
> OSDs (Dell R740xd2). As far as I can tell, the CPU configuration is
> overpowered for that.
> 
> Just for info, we use ganglia to record node utilisation. I use 1-year
> records and pick peak loads I observed for dimensioning the CPUs. These
> records include some very heavy recovery periods.
> 
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> 
> 
> From: Tony Liu 
> Sent: 13 November 2020 04:57:53
> To: Nathan Fish
> Cc: ceph-users@ceph.io
> Subject: [ceph-users] Re: which of cpu frequency and number of threads
> servers osd better?
> 
> Thanks Nathan!
> Tony
> > -Original Message-
> > From: Nathan Fish 
> > Sent: Thursday, November 12, 2020 7:43 PM
> > To: Tony Liu 
> > Cc: ceph-users@ceph.io
> > Subject: Re: [ceph-users] which of cpu frequency and number of threads
> > servers osd better?
> >
> > From what I've seen, OSD daemons tend to bottleneck on the first 2
> > threads, while getting some use out of another 2. So 32 threads at 3.0
> > would be a lot better. Note that you may get better performance
> > splitting off some of that SSD for block.db partitions or at least
> > block.wal for the HDDs.
> >
> > On Thu, Nov 12, 2020 at 9:57 PM Tony Liu 
> wrote:
> > >
> > > Hi,
> > >
> > > For example, 16 threads with 3.2GHz and 32 threads with 3.0GHz,
> > > which makes 11 OSDs (10x12TB HDD and 1x960GB SSD) with better
> performance?
> > >
> > >
> > > Thanks!
> > > Tony
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> > > email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: which of cpu frequency and number of threads servers osd better?

2020-11-12 Thread Martin Verges
Hello Tony,

as these are HDD-backed OSDs, your CPU won't be the bottleneck at all. Both
CPUs are overprovisioned.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


Am Fr., 13. Nov. 2020 um 03:57 Uhr schrieb Tony Liu :

> Hi,
>
> For example, 16 threads with 3.2GHz and 32 threads with 3.0GHz,
> which makes 11 OSDs (10x12TB HDD and 1x960GB SSD) with better
> performance?
>
>
> Thanks!
> Tony
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Not able to read file from ceph kernel mount

2020-11-12 Thread Amudhan P
Hi,

This issue is fixed now after setting the cluster_network on the OSDs only.
The mount works perfectly fine.

"ceph config set osd cluster_network 10.100.4.0/24"

regards
Amudhan
On Sat, Nov 7, 2020 at 10:09 PM Amudhan P  wrote:

> Hi,
>
> At last, the problem fixed for now by adding cluster network IP to the
> second interface.
>
> But It looks weird why the client wants to communicate with Cluster IP.
>
> Does anyone have an idea? why we need to provide cluster IP to client
> mounting thru kernel.
>
> Initially, when the cluster was set up it had only public network. later
> added cluster with cluster IP and it was working fine until the restart of
> the entire cluster.
>
> regards
> Amudhan P
>
> On Fri, Nov 6, 2020 at 12:02 AM Amudhan P  wrote:
>
>> Hi,
>> I am trying to read a file from my CephFS kernel mount; the read rate stays
>> at a few bytes for a very long time and I get the error message below in dmesg.
>>
>> [  167.591095] ceph: loaded (mds proto 32)
>> [  167.600010] libceph: mon0 10.0.103.1:6789 session established
>> [  167.601167] libceph: client144519 fsid f8bc7682-0d11-11eb-a332-
>> 0cc47a5ec98a
>> [  272.132787] libceph: osd1 10.0.104.1:6891 socket closed (con state
>> CONNECTING)
>>
>> The Ceph cluster status is healthy with no errors. It was working fine until
>> my entire cluster went down.
>>
>> Using Ceph octopus in debian.
>>
>> Regards
>> Amudhan P
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs Kernel client not working properly without ceph cluster IP

2020-11-12 Thread Amudhan P
Hi Eugen,

The issue looks fixed now; my kernel client mount works fine without the
cluster IP.

I have re-run "ceph config set osd cluster_network 10.100.4.0/24" and
restarted all services. Earlier it had been run with "ceph config set global
cluster_network 10.100.4.0/24".

I have run the commands you asked for; the output below is from after
applying all the changes described above.
# ceph config get mon cluster_network
output :
# ceph config get mon public_network
output : 10.100.3.0/24
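
If the old global-level setting were still present, it could be removed and
checked with something like the following before restarting the daemons:

  ceph config rm global cluster_network
  ceph config dump | grep -i network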

Still testing more on this to confirm the fix and playing around with my
ceph cluster.

regards
Amudhan P

On Wed, Nov 11, 2020 at 2:14 PM Eugen Block  wrote:

> > Do you find any issue in the below commands I have used to set cluster IP
> > in cluster.
>
> Yes I do:
>
> > ### adding public IP for ceph cluster ###
> > ceph config set global cluster_network 10.100.4.0/24
>
> I'm still not convinced that your setup is as you want it to be.
> Can you share your actual config?
>
> ceph config get mon cluster_network
> ceph config get mon public_network
>
>
>
> Zitat von Amudhan P :
>
> > Hi Eugen,
> >
> > I have only added my Public IP and relevant hostname to hosts file.
> >
> > Do you find any issue in the below commands I have used to set cluster IP
> > in cluster.
> >
> > ### adding public IP for ceph cluster ###
> > ceph config set global cluster_network 10.100.4.0/24
> >
> > ceph orch daemon reconfig mon.host1
> > ceph orch daemon reconfig mon.host2
> > ceph orch daemon reconfig mon.host3
> > ceph orch daemon reconfig osd.1
> > ceph orch daemon reconfig osd.2
> > ceph orch daemon reconfig osd.3
> >
> > restarting all daemons.
> >
> > regards
> > Amudhan
> >
> > On Tue, Nov 10, 2020 at 7:42 PM Eugen Block  wrote:
> >
> >> Could it be possible that you have some misconfiguration in the name
> >> resolution and IP mapping? I've never heard or experienced that a
> >> client requires a cluster address, that would make the whole concept
> >> of separate networks obsolete which is hard to believe, to be honest.
> >> I would recommend to double-check your setup.
> >>
> >>
> >> Zitat von Amudhan P :
> >>
> >> > Hi Nathan,
> >> >
> >> > Kernel client should be using only the public IP of the cluster to
> >> > communicate with OSD's.
> >> >
> >> > But here it requires both IP's for mount to work properly.
> >> >
> >> > regards
> >> > Amudhan
> >> >
> >> >
> >> >
> >> > On Mon, Nov 9, 2020 at 9:51 PM Nathan Fish 
> wrote:
> >> >
> >> >> It sounds like your client is able to reach the mon but not the OSD?
> >> >> It needs to be able to reach all mons and all OSDs.
> >> >>
> >> >> On Sun, Nov 8, 2020 at 4:29 AM Amudhan P 
> wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > I have mounted my cephfs (ceph octopus) thru kernel client in
> Debian.
> >> >> > I get following error in "dmesg" when I try to read any file from
> my
> >> >> mount.
> >> >> > "[  236.429897] libceph: osd1 10.100.4.1:6891 socket closed (con
> >> state
> >> >> > CONNECTING)"
> >> >> >
> >> >> > I use public IP (10.100.3.1) and cluster IP (10.100.4.1) in my ceph
> >> >> > cluster. I think public IP is enough to mount the share and work
> on it
> >> >> but
> >> >> > in my case, it needs me to assign public IP also to the client to
> work
> >> >> > properly.
> >> >> >
> >> >> > Does anyone have experience in this?
> >> >> >
> >> >> > I have earlier also mailed the ceph-user group but I didn't get any
> >> >> > response. So sending again not sure my mail went through.
> >> >> >
> >> >> > regards
> >> >> > Amudhan
> >> >> > ___
> >> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >> >>
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: which of cpu frequency and number of threads servers osd better?

2020-11-12 Thread Tony Liu
Thanks Nathan!
Tony
> -Original Message-
> From: Nathan Fish 
> Sent: Thursday, November 12, 2020 7:43 PM
> To: Tony Liu 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] which of cpu frequency and number of threads
> servers osd better?
> 
> From what I've seen, OSD daemons tend to bottleneck on the first 2
> threads, while getting some use out of another 2. So 32 threads at 3.0
> would be a lot better. Note that you may get better performance
> splitting off some of that SSD for block.db partitions or at least
> block.wal for the HDDs.
> 
> On Thu, Nov 12, 2020 at 9:57 PM Tony Liu  wrote:
> >
> > Hi,
> >
> > For example, 16 threads with 3.2GHz and 32 threads with 3.0GHz, which
> > makes 11 OSDs (10x12TB HDD and 1x960GB SSD) with better performance?
> >
> >
> > Thanks!
> > Tony
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> > email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: which of cpu frequency and number of threads servers osd better?

2020-11-12 Thread Nathan Fish
From what I've seen, OSD daemons tend to bottleneck on the first 2
threads, while getting some use out of another 2. So 32 threads at 3.0
would be a lot better. Note that you may get better performance
splitting off some of that SSD for block.db partitions or at least
block.wal for the HDDs.
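
As a rough sketch, a fresh HDD OSD with its DB on a shared SSD/NVMe partition
could be created along these lines; the device names are hypothetical and
existing OSDs would have to be recreated or migrated:

  ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p2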

On Thu, Nov 12, 2020 at 9:57 PM Tony Liu  wrote:
>
> Hi,
>
> For example, 16 threads with 3.2GHz and 32 threads with 3.0GHz,
> which makes 11 OSDs (10x12TB HDD and 1x960GB SSD) with better
> performance?
>
>
> Thanks!
> Tony
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] which of cpu frequency and number of threads servers osd better?

2020-11-12 Thread Tony Liu
Hi,

For example, 16 threads with 3.2GHz and 32 threads with 3.0GHz,
which makes 11 OSDs (10x12TB HDD and 1x960GB SSD) with better
performance?


Thanks!
Tony
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Tracing in ceph

2020-11-12 Thread Seena Fallah
Hi all,

Does this project work with the latest zipkin apis?
https://github.com/ceph/babeltrace-zipkin

Also, what would you recommend for tracing requests for RGW and RBD in Ceph?

Thanks.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: question about rgw delete speed

2020-11-12 Thread Nathan Fish
From what we have experienced, our delete speed scales with the CPU
available to the MDS. And the MDS only seems to scale to 2-4 CPUs per
daemon, so for our biggest filesystem, we have 5 active MDS daemons.
Migrations reduced performance a lot, but pinning fixed that. Even better
is just getting the fastest cores you can get.
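
For context, the pinning mentioned above is done per directory via an extended
attribute, e.g. (path and rank are just examples):

  # pin this subtree to MDS rank 2; -v -1 removes the pin again
  setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/projects/groupA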

On Thu., Nov. 12, 2020, 6:08 p.m. Brent Kennedy, 
wrote:

> Ceph is definitely a good choice for storing millions of files.  It sounds
> like you plan to use this like s3, so my first question would be:  Are the
> deletes done for a specific reason?  ( e.g. the files are used for a
> process and discarded  )  If its an age thing, you can set the files to
> expire when putting them in, then ceph will automatically clear them.
>
> The more spinners you have the more performance you will end up with.
> Network 10Gb or higher?
>
> Octopus is production stable and contains many performance enhancements.
> Depending on the OS, you may not be able to upgrade from nautilus until
> they work out that process ( e.g. centos 7/8 ).
>
> Delete speed is not that great but you would have to test it with your
> cluster to see how it performs for your use case.  If you have enough space
> present, is there a process that breaks if the files are not deleted?
>
>
> Regards,
> -Brent
>
> Existing Clusters:
> Test: Octopus 15.2.5 ( all virtual on nvme )
> US Production(HDD): Nautilus 14.2.11 with 11 osd servers, 3 mons, 4
> gateways, 2 iscsi gateways
> UK Production(HDD): Nautilus 14.2.11 with 18 osd servers, 3 mons, 4
> gateways, 2 iscsi gateways
> US Production(SSD): Nautilus 14.2.11 with 6 osd servers, 3 mons, 4
> gateways, 2 iscsi gateways
> UK Production(SSD): Octopus 15.2.5 with 5 osd servers, 3 mons, 4 gateways
>
>
>
>
> -Original Message-
> From: Adrian Nicolae 
> Sent: Wednesday, November 11, 2020 3:42 PM
> To: ceph-users 
> Subject: [ceph-users] question about rgw delete speed
>
>
> Hey guys,
>
>
> I'm in charge of a local cloud-storage service. Our primary object storage
> is a vendor-based one and I want to replace it in the near future with Ceph
> with the following setup :
>
> - 6 OSD servers with 36 SATA 16TB drives each and 3 big NVME per server
> (1 big NVME for every 12 drives so I can reserve 300GB NVME storage for
> every SATA drive), 3 MON, 2 RGW with Epyc 7402p and 128GB RAM. So in the
> end we'll have ~ 3PB of raw data and 216 SATA drives.
>
> Currently we have ~ 100 millions of files on the primary storage with the
> following distribution :
>
> - ~10% = very small files ( less than 1MB - thumbnails, text&office files
> and so on)
>
> - ~60%= small files (between 1MB and 10MB)
>
> -  20% = medium files ( between 10MB and 1GB)
>
> - 10% = big files (over 1GB).
>
> My main concern is the speed of delete operations. We have around
> 500k-600k delete ops every 24 hours so quite a lot. Our current storage is
> not deleting all the files fast enough (it's always 1 week-10 days
> behind) , I guess is not only a software issue and probably the delete
> speed will get better if we add more drives (we now have 108).
>
> What do you think about Ceph delete speed ? I read on other threads that
> it's not very fast . I wonder if this hw setup can handle our current
> delete load better than our current storage. On RGW servers I want to use
> Swift , not S3.
>
> And another question :   can I start deploying in production directly the
> latest Ceph version (Octopus) or is it safer to start with Nautilus until
> Octopus will be more stable ?
>
> Any input would be greatly appreciated !
>
>
> Thanks,
>
> Adrian.
>
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is there a way to make Cephfs kernel client to write data to ceph osd smoothly with buffer io

2020-11-12 Thread Frank Schilder
Yes, that's right. It would be nice if there was a mount option to have such 
parameters adjusted on a per-file system basis. I should mention that I 
observed a significant performance improvement for HDD throughput of the local 
disk as well when adjusting these parameters for ceph.

This is largely due to the "too much memory" problem on big servers. The kernel 
defaults are suitable for machines with 4-8G of RAM. Any enterprise server will 
beat that, with the consequence of insanely large amounts of dirty buffers and 
panicked buffer flushes that overload network file systems in particular 
(there is a nice article by SUSE: 
https://www.suse.com/support/kb/doc/?id=17857). Adjusting these parameters 
to play nice with ceph might actually improve overall performance as a side 
effect. I would give it a go.
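
As a minimal sketch (the values are only a starting point and, as said, heavily
workload dependent):

  # /etc/sysctl.d/90-writeback.conf -- example values, tune against real workloads
  # start background writeback at 256 MiB of dirty data:
  vm.dirty_background_bytes = 268435456
  # throttle writers at 1 GiB of dirty data:
  vm.dirty_bytes = 1073741824
  # apply with: sysctl --system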

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Sage Meng 
Sent: 12 November 2020 16:00:08
To: Frank Schilder
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Is there a way to make Cephfs kernel client to write 
data to ceph osd smoothly with buffer io

vm.dirty_bytes and vm.dirty_background_bytes are system-wide control 
parameters; adjusting them influences all jobs on the system. It would be 
better to have a Ceph-specific way to make the transfer smoother.

Frank Schilder <fr...@dtu.dk> wrote on Wed, Nov 11, 2020 at 3:28 PM:
These kernel parameters influence the flushing of data, and also performance:

vm.dirty_bytes
vm.dirty_background_bytes

Smaller vm.dirty_background_bytes will make the transfer more smooth and the 
ceph cluster will like that. However, it reduces the chances of merge 
operations in cache and the ceph cluster will not like that. The tuning is 
heavily workload dependent. Test with realistic workloads and a reasonably 
large spectrum of values. I got good results by tuning down 
vm.dirty_background_bytes just to the point when it reduced client performance 
of copying large files.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Sage Meng <lkke...@gmail.com>
Sent: 06 November 2020 13:45:53
To: ceph-users@ceph.io
Subject: [ceph-users] Is there a way to make Cephfs kernel client to write data 
to ceph osd smoothly with buffer io

Hi All,

  The CephFS kernel client is influenced by the kernel page cache when we write
data to it; the burst of outgoing data is huge when the OS starts flushing the
page cache. So is there a way to make the CephFS kernel client write data to
the Ceph OSDs smoothly when buffered I/O is used?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: question about rgw delete speed

2020-11-12 Thread Brent Kennedy
Ceph is definitely a good choice for storing millions of files.  It sounds like 
you plan to use this like S3, so my first question would be:  Are the deletes 
done for a specific reason?  ( e.g. the files are used for a process and 
discarded )  If it's an age thing, you can set the files to expire when putting 
them in, and Ceph will automatically clear them.
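
As a hedged example of such an expiration rule via the S3 API (bucket name,
endpoint and the 30-day window are made up):

  aws --endpoint-url https://rgw.example.com s3api put-bucket-lifecycle-configuration \
    --bucket mybucket \
    --lifecycle-configuration '{"Rules":[{"ID":"expire-30d","Status":"Enabled","Filter":{"Prefix":""},"Expiration":{"Days":30}}]}'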

The more spinners you have the more performance you will end up with.  Network 
10Gb or higher?

Octopus is production stable and contains many performance enhancements.  
Depending on the OS, you may not be able to upgrade from nautilus until they 
work out that process ( e.g. centos 7/8 ).  

Delete speed is not that great but you would have to test it with your cluster 
to see how it performs for your use case.  If you have enough space present, is 
there a process that breaks if the files are not deleted?  


Regards,
-Brent

Existing Clusters:
Test: Octopus 15.2.5 ( all virtual on nvme )
US Production(HDD): Nautilus 14.2.11 with 11 osd servers, 3 mons, 4 gateways, 2 
iscsi gateways
UK Production(HDD): Nautilus 14.2.11 with 18 osd servers, 3 mons, 4 gateways, 2 
iscsi gateways
US Production(SSD): Nautilus 14.2.11 with 6 osd servers, 3 mons, 4 gateways, 2 
iscsi gateways
UK Production(SSD): Octopus 15.2.5 with 5 osd servers, 3 mons, 4 gateways




-Original Message-
From: Adrian Nicolae  
Sent: Wednesday, November 11, 2020 3:42 PM
To: ceph-users 
Subject: [ceph-users] question about rgw delete speed


Hey guys,


I'm in charge of a local cloud-storage service. Our primary object storage is a 
vendor-based one and I want to replace it in the near future with Ceph with the 
following setup :

- 6 OSD servers with 36 SATA 16TB drives each and 3 big NVME per server
(1 big NVME for every 12 drives so I can reserve 300GB NVME storage for every 
SATA drive), 3 MON, 2 RGW with Epyc 7402p and 128GB RAM. So in the end we'll 
have ~ 3PB of raw data and 216 SATA drives.

Currently we have ~ 100 millions of files on the primary storage with the 
following distribution :

- ~10% = very small files ( less than 1MB - thumbnails, text&office files and 
so on)

- ~60%= small files (between 1MB and 10MB)

-  20% = medium files ( between 10MB and 1GB)

- 10% = big files (over 1GB).

My main concern is the speed of delete operations. We have around 500k-600k 
delete ops every 24 hours, so quite a lot. Our current storage is not deleting 
all the files fast enough (it's always 1 week-10 days behind). I guess it is 
not only a software issue, and the delete speed will probably get better if we 
add more drives (we now have 108).

What do you think about Ceph delete speed? I read on other threads that it's 
not very fast. I wonder if this hw setup can handle our current delete load 
better than our current storage does. On the RGW servers I want to use Swift, 
not S3.

And another question: can I start deploying the latest Ceph version (Octopus) 
directly in production, or is it safer to start with Nautilus until Octopus is 
more stable?

Any input would be greatly appreciated !


Thanks,

Adrian.




___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Autoscale - enable or not on main pool?

2020-11-12 Thread Brent Kennedy
I recently set up a new Octopus cluster and was testing the autoscale
feature.  I used ceph-ansible, so it's enabled by default.  Anyhow, I have three
other clusters that are on Nautilus, so I wanted to see if it made sense to
enable it there on the main pool.
 

Here is a print out of the autoscale status:

POOL                        SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
default.rgw.buckets.non-ec  0                    2.0   55859G        0.                                     1.0   32                  on
default.rgw.meta            9298                 3.0   55859G        0.                                     1.0   32                  on
default.rgw.buckets.index   18058M               3.0   55859G        0.0009                                 1.0   32                  on
default.rgw.control         0                    3.0   55859G        0.                                     1.0   32                  on
default.rgw.buckets.data    9126G                2.0   55859G        0.3268                                 1.0   4096    1024        off
.rgw.root                   3155                 3.0   55859G        0.                                     1.0   32                  on
rbd                         155.5G               2.0   55859G        0.0056                                 1.0   32                  on
default.rgw.log             374.4k               3.0   55859G        0.                                     1.0   64                  on

 

For this entry:

default.rgw.buckets.data    9126G    2.0    55859G    0.3268    1.0    4096    1024    off

 

I have it disabled because it showed a warn message, but it's recommending a
1024 PG setting.  When I use the online ceph calculator at ceph.io, it says
the 4096 setting is correct.  So why is the autoscaler saying 1024?
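
For what it's worth, my understanding is that the autoscaler bases its
suggestion on current usage unless it is told about expected growth, e.g.
something like (the ratio is made up):

  ceph osd pool set default.rgw.buckets.data target_size_ratio 0.8
  ceph osd pool autoscale-status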

 

There are 6 osd servers with 10 OSDs each ( all SSD ).  60 TB total.

 

Pool LS output:

pool 1 '.rgw.root' replicated size 3 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8800 lfor
0/0/344 flags hashpspool stripe_width 0 application rgw

pool 2 'default.rgw.control' replicated size 3 min_size 1 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8799
lfor 0/0/346 flags hashpspool stripe_width 0 application rgw

pool 3 'default.rgw.meta' replicated size 3 min_size 1 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8798
lfor 0/0/350 flags hashpspool stripe_width 0 application rgw

pool 4 'default.rgw.log' replicated size 3 min_size 1 crush_rule 0
object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 8802
lfor 0/0/298 flags hashpspool stripe_width 0 application rgw

pool 5 'default.rgw.buckets.index' replicated size 3 min_size 1 crush_rule 0
object_hash rjenkins pg_num 638 pgp_num 608 pg_num_target 32 pgp_num_target
32 autoscale_mode on last_change 10320 lfor 0/10320/10318 owner
18446744073709551615 flags hashpspool stripe_width 0 application rgw

pool 7 'default.rgw.buckets.data' replicated size 2 min_size 1 crush_rule 0
object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 9467 lfor 0/0/552
owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw

pool 8 'default.rgw.buckets.non-ec' replicated size 2 min_size 1 crush_rule
0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
8797 lfor 0/0/348 owner 18446744073709551615 flags hashpspool stripe_width 0
application rgw

pool 9 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins
pg_num 32 pgp_num 32 autoscale_mode on last_change 8801 flags
hashpspool,selfmanaged_snaps stripe_width 0 application rbd

 

 

Regards,

-Brent

 

Existing Clusters:

Test: Octopus 15.2.5 ( all virtual on nvme )

US Production(HDD): Nautilus 14.2.11 with 11 osd servers, 3 mons, 4
gateways, 2 iscsi gateways

UK Production(HDD): Nautilus 14.2.11 with 18 osd servers, 3 mons, 4
gateways, 2 iscsi gateways

US Production(SSD): Nautilus 14.2.11 with 6 osd servers, 3 mons, 4 gateways,
2 iscsi gateways

UK Production(SSD): Octopus 15.2.5 with 5 osd servers, 3 mons, 4 gateways

 

 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to clarify error using vfs_ceph (Samba gateway for CephFS)

2020-11-12 Thread Brad Hubbard
I don't know much about the vfs plugin (nor cephfs for that matter)
but I would suggest enabling client debug logging on the machine so
you can see what the libcephfs code is doing since that's likely where
the ENOENT is coming from.

https://docs.ceph.com/en/latest/rados/troubleshooting/log-and-debug/
https://docs.ceph.com/en/latest/cephfs/client-config-ref/
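
Something along these lines in ceph.conf on the client should do it (levels and
log path are just an example):

  [client]
      debug client = 20
      debug ms = 1
      log file = /var/log/ceph/$name.$pid.log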

On Fri, Nov 13, 2020 at 3:39 AM Frank Schilder  wrote:
>
> You might need to give read permissions to the ceph config and key file for 
> the user that runs the SAMBA service (samba?). Either add the SAMBA user to 
> the group ceph, or change the group of the file.
>
> The statement "/" file not found could just be an obfuscating message on an 
> actual security/permission issue.
>
> Other than that I don't really know what to look for. As I said, I gave up as 
> well. Ceph kernel client does a good job for us with an ordinary SAMBA share 
> defined on it.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Matt Larson 
> Sent: 12 November 2020 18:18:32
> To: Frank Schilder
> Cc: ceph-users
> Subject: Re: [ceph-users] Unable to clarify error using vfs_ceph (Samba 
> gateway for CephFS)
>
> Thank you Frank,
>
>  That was a good suggestion to make sure the mount wasn't the issue. I
> tried changing the `client.samba.upload` to have read access directly
> to '/' rather than '/upload' and to also change smb.conf to directly
> use 'path = /'. Still getting the same issue (log level 10 content
> below).
>
>  It appears that it is correctly reading `/etc/ceph/ceph.conf`. It
> does appear to be the ceph_mount where the failure occurs.
>
>  It would be great to have vfs_ceph working, but if I cannot I'll try
> to find other approaches.
>
> [2020/11/12 10:47:39.360943, 10, pid=2723021, effective(0, 0), real(0,
> 0), class=vfs] ../../source3/smbd/vfs.c:65(vfs_find_backend_entry)
>
>   vfs_find_backend_entry called for ceph
>   Successfully loaded vfs module [ceph] with the new modules system
> [2020/11/12 10:47:39.360966, 10, pid=2723021, effective(0, 0), real(0,
> 0), class=vfs] ../../source3/modules/vfs_ceph.c:103(cephwrap_connect)
>   cephwrap_connect: [CEPH] calling: ceph_create
> [2020/11/12 10:47:39.365668, 10, pid=2723021, effective(0, 0), real(0,
> 0), class=vfs] ../../source3/modules/vfs_ceph.c:110(cephwrap_connect)
>   cephwrap_connect: [CEPH] calling: ceph_conf_read_file with 
> /etc/ceph/ceph.conf
> [2020/11/12 10:47:39.368842, 10, pid=2723021, effective(0, 0), real(0,
> 0), class=vfs] ../../source3/modules/vfs_ceph.c:116(cephwrap_connect)
>   cephwrap_connect: [CEPH] calling: ceph_conf_get
> [2020/11/12 10:47:39.368895, 10, pid=2723021, effective(0, 0), real(0,
> 0), class=vfs] ../../source3/modules/vfs_ceph.c:133(cephwrap_connect)
>   cephwrap_connect: [CEPH] calling: ceph_mount
> [2020/11/12 10:47:39.373319, 10, pid=2723021, effective(0, 0), real(0,
> 0), class=vfs] ../../source3/modules/vfs_ceph.c:160(cephwrap_connect)
>   cephwrap_connect: [CEPH] Error return: No such file or directory
> [2020/11/12 10:47:39.373357,  1, pid=2723021, effective(0, 0), real(0,
> 0)] ../../source3/smbd/service.c:668(make_connection_snum)
>   make_connection_snum: SMB_VFS_CONNECT for service 'cryofs_upload' at
> '/' failed: No such file or directory
>
> On Thu, Nov 12, 2020 at 2:29 AM Frank Schilder  wrote:
> >
> > You might face the same issue I had. vfs_ceph wants to have a key for the 
> > root of the cephfs, it is currently not possible to restrict access to a 
> > sub-directory mount. For this reason, I decided to go for a re-export of a 
> > kernel client mount.
> >
> > I consider this a serious security issue in vfs_ceph and will not use it 
> > until it is possible to do sub-directory mounts.
> >
> > I don't think its difficult to patch the vfs_ceph source code, if you need 
> > to use vfs_ceph and cannot afford to give access to "/" of the cephfs.
> >
> > Best regards,
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: Matt Larson 
> > Sent: 12 November 2020 00:40:21
> > To: ceph-users
> > Subject: [ceph-users] Unable to clarify error using vfs_ceph (Samba gateway 
> > for CephFS)
> >
> > I am getting an error in the log.smbd from the Samba gateway that I
> > don’t understand and looking for help from anyone who has gotten the
> > vfs_ceph working.
> >
> > Background:
> >
> > I am trying to get a Samba gateway with CephFS working with the
> > vfs_ceph module. I observed that the default Samba package on CentOS
> > 7.7 did not come with the ceph.so vfs_ceph module, so I tried to
> > compile a working Samba version with vfs_ceph.
> >
> > Newer Samba versions have a requirement for GnuTLS >= 3.4.7, which is
> > not an available package on CentOS 7.7 without a custom repository. I
> > opted to build an earlier version of Samba.
> >
> > On CentOS 7.7, I built Samba 4.11.16 w

[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Reduncancy, all PGs degraded, undersized, not scrubbed in time

2020-11-12 Thread Phil Merricks
Thanks for the reply Robert.  Could you briefly explain the issue with the
current setup and "what good looks like" here, or point me to some
documentation that would help me figure that out myself?

I'm guessing it has something to do with the different sizes and
types of disk, and possibly the EC crush rule setup?
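
For anyone wanting to look at the same thing, the relevant layout can be seen
with e.g.:

  ceph osd df tree                  # per-OSD/host sizes, device classes and utilisation
  ceph osd crush rule dump          # crush rules, including any EC rule
  ceph osd pool ls detail           # pool -> crush rule / EC profile mapping
  ceph osd erasure-code-profile ls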

Best regards

Phil Merricks

On Wed., Nov. 11, 2020, 1:30 a.m. Robert Sander, <
r.san...@heinlein-support.de> wrote:

> Am 07.11.20 um 01:14 schrieb seffyr...@gmail.com:
> > I've inherited a Ceph Octopus cluster that seems like it needs urgent
> maintenance before data loss begins to happen. I'm the guy with the most
> Ceph experience on hand and that's not saying much. I'm experiencing most
> of the ops and repair tasks for the first time here.
>
> My condolences. Get the data from that cluster and put the cluster down.
>
> In the current setup it will never work.
>
> Regards
> --
> Robert Sander
> Heinlein Support GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> http://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Zwangsangaben lt. §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein -- Sitz: Berlin
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-12 Thread huxia...@horebdata.cn
This looks like a very dangerous bug for data safety. I hope the bug will be 
quickly identified and fixed.

best regards,

Samuel



huxia...@horebdata.cn
 
From: Janek Bevendorff
Date: 2020-11-12 18:17
To: huxia...@horebdata.cn; EDH - Manuel Rios; Rafael Lopez
CC: Robin H. Johnson; ceph-users
Subject: Re: [ceph-users] Re: NoSuchKey on key that is visible in s3 
list/radosgw bk
I have never seen this on Luminous. I recently upgraded to Octopus and the 
issue started occurring only few weeks later.

On 12/11/2020 16:37, huxia...@horebdata.cn wrote:
which Ceph versions are affected by this RGW bug/issues? Luminous, Mimic, 
Octopus, or the latest?

any idea?

samuel



huxia...@horebdata.cn
 
From: EDH - Manuel Rios
Date: 2020-11-12 14:27
To: Janek Bevendorff; Rafael Lopez
CC: Robin H. Johnson; ceph-users
Subject: [ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk
This same error caused us to wipe a full cluster of 300TB... will be related to 
some rados index/database bug not to s3.
 
As Janek exposed is a mayor issue, because the error silent happend and you can 
only detect it with S3, when you're going to delete/purge a S3 bucket. Dropping 
NoSuchKey. Error is not related to S3 logic ..
 
Hope this time dev's can take enought time to find and resolve the issue. Error 
happens with low ec profiles, even with replica x3 in some cases.
 
Regards
 
 
 
-Mensaje original-
De: Janek Bevendorff  
Enviado el: jueves, 12 de noviembre de 2020 14:06
Para: Rafael Lopez 
CC: Robin H. Johnson ; ceph-users 
Asunto: [ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk
 
Here is a bug report concerning (probably) this exact issue: 
https://tracker.ceph.com/issues/47866
 
I left a comment describing the situation and my (limited) experiences with it.
 
 
On 11/11/2020 10:04, Janek Bevendorff wrote:
>
> Yeah, that seems to be it. There are 239 objects prefixed 
> .8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh in my dump. However, there are none 
> of the multiparts from the other file to be found and the head object 
> is 0 bytes.
>
> I checked another multipart object with an end pointer of 11. 
> Surprisingly, it had way more than 11 parts (39 to be precise) named 
> .1, .1_1 .1_2, .1_3, etc. Not sure how Ceph identifies those, but I 
> could find them in the dump at least.
>
> I have no idea why the objects disappeared. I ran a Spark job over all 
> buckets, read 1 byte of every object and recorded errors. Of the 78 
> buckets, two are missing objects. One bucket is missing one object, 
> the other 15. So, luckily, the incidence is still quite low, but the 
> problem seems to be expanding slowly.
>
>
> On 10/11/2020 23:46, Rafael Lopez wrote:
>> Hi Janek,
>>
>> What you said sounds right - an S3 single part obj won't have an S3 
>> multipart string as part of the prefix. S3 multipart string looks 
>> like "2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme".
>>
>> From memory, single part S3 objects that don't fit in a single rados 
>> object are assigned a random prefix that has nothing to do with 
>> the object name, and the rados tail/data objects (not the head 
>> object) have that prefix.
>> As per your working example, the prefix for that would be 
>> '.8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh'. So there would be (239) "shadow" 
>> objects with names containing that prefix, and if you add up the 
>> sizes it should be the size of your S3 object.
>>
>> You should look at working and non working examples of both single 
>> and multipart S3 objects, as they are probably all a bit different 
>> when you look in rados.
>>
>> I agree it is a serious issue, because once objects are no longer in 
>> rados, they cannot be recovered. If it was a case that there was a 
>> link broken or rados objects renamed, then we could work to 
>> recover...but as far as I can tell, it looks like stuff is just 
>> vanishing from rados. The only explanation I can think of is some 
>> (rgw or rados) background process is incorrectly doing something with 
>> these objects (eg. renaming/deleting). I had thought perhaps it was a 
>> bug with the rgw garbage collector..but that is pure speculation.
>>
>> Once you can articulate the problem, I'd recommend logging a bug 
>> tracker upstream.
>>
>>
>> On Wed, 11 Nov 2020 at 06:33, Janek Bevendorff 
>> > > wrote:
>>
>> Here's something else I noticed: when I stat objects that work
>> via radosgw-admin, the stat info contains a "begin_iter" JSON
>> object with RADOS key info like this
>>
>>
>> "key": {
>> "name":
>> 
>> "29/items/WIDE-20110924034843-crawl420/WIDE-20110924065228-02544.warc.gz",
>> "instance": "",
>> "ns": ""
>> }
>>
>>
>> and then "end_iter" with key info like this:
>>
>>
>> "key": {
>> "name":
>> ".8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh

[ceph-users] Re: Unable to clarify error using vfs_ceph (Samba gateway for CephFS)

2020-11-12 Thread Frank Schilder
You might need to give read permissions to the ceph config and key file for the 
user that runs the SAMBA service (samba?). Either add the SAMBA user to the 
group ceph, or change the group of the file.

The statement "/" file not found could just be an obfuscating message on an 
actual security/permission issue.

Other than that I don't really know what to look for. As I said, I gave up as 
well. Ceph kernel client does a good job for us with an ordinary SAMBA share 
defined on it.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Matt Larson 
Sent: 12 November 2020 18:18:32
To: Frank Schilder
Cc: ceph-users
Subject: Re: [ceph-users] Unable to clarify error using vfs_ceph (Samba gateway 
for CephFS)

Thank you Frank,

 That was a good suggestion to make sure the mount wasn't the issue. I
tried changing the `client.samba.upload` to have read access directly
to '/' rather than '/upload' and to also change smb.conf to directly
use 'path = /'. Still getting the same issue (log level 10 content
below).

 It appears that it is correctly reading `/etc/ceph/ceph.conf`. It
does appear to be the ceph_mount where the failure occurs.

 It would be great to have vfs_ceph working, but if I cannot I'll try
to find other approaches.

[2020/11/12 10:47:39.360943, 10, pid=2723021, effective(0, 0), real(0,
0), class=vfs] ../../source3/smbd/vfs.c:65(vfs_find_backend_entry)

  vfs_find_backend_entry called for ceph
  Successfully loaded vfs module [ceph] with the new modules system
[2020/11/12 10:47:39.360966, 10, pid=2723021, effective(0, 0), real(0,
0), class=vfs] ../../source3/modules/vfs_ceph.c:103(cephwrap_connect)
  cephwrap_connect: [CEPH] calling: ceph_create
[2020/11/12 10:47:39.365668, 10, pid=2723021, effective(0, 0), real(0,
0), class=vfs] ../../source3/modules/vfs_ceph.c:110(cephwrap_connect)
  cephwrap_connect: [CEPH] calling: ceph_conf_read_file with /etc/ceph/ceph.conf
[2020/11/12 10:47:39.368842, 10, pid=2723021, effective(0, 0), real(0,
0), class=vfs] ../../source3/modules/vfs_ceph.c:116(cephwrap_connect)
  cephwrap_connect: [CEPH] calling: ceph_conf_get
[2020/11/12 10:47:39.368895, 10, pid=2723021, effective(0, 0), real(0,
0), class=vfs] ../../source3/modules/vfs_ceph.c:133(cephwrap_connect)
  cephwrap_connect: [CEPH] calling: ceph_mount
[2020/11/12 10:47:39.373319, 10, pid=2723021, effective(0, 0), real(0,
0), class=vfs] ../../source3/modules/vfs_ceph.c:160(cephwrap_connect)
  cephwrap_connect: [CEPH] Error return: No such file or directory
[2020/11/12 10:47:39.373357,  1, pid=2723021, effective(0, 0), real(0,
0)] ../../source3/smbd/service.c:668(make_connection_snum)
  make_connection_snum: SMB_VFS_CONNECT for service 'cryofs_upload' at
'/' failed: No such file or directory

On Thu, Nov 12, 2020 at 2:29 AM Frank Schilder  wrote:
>
> You might face the same issue I had. vfs_ceph wants to have a key for the 
> > root of the cephfs, it is currently not possible to restrict access to a 
> sub-directory mount. For this reason, I decided to go for a re-export of a 
> kernel client mount.
>
> I consider this a serious security issue in vfs_ceph and will not use it 
> until it is possible to do sub-directory mounts.
>
> I don't think its difficult to patch the vfs_ceph source code, if you need to 
> use vfs_ceph and cannot afford to give access to "/" of the cephfs.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Matt Larson 
> Sent: 12 November 2020 00:40:21
> To: ceph-users
> Subject: [ceph-users] Unable to clarify error using vfs_ceph (Samba gateway 
> for CephFS)
>
> I am getting an error in the log.smbd from the Samba gateway that I
> don’t understand and looking for help from anyone who has gotten the
> vfs_ceph working.
>
> Background:
>
> I am trying to get a Samba gateway with CephFS working with the
> vfs_ceph module. I observed that the default Samba package on CentOS
> 7.7 did not come with the ceph.so vfs_ceph module, so I tried to
> compile a working Samba version with vfs_ceph.
>
> Newer Samba versions have a requirement for GnuTLS >= 3.4.7, which is
> not an available package on CentOS 7.7 without a custom repository. I
> opted to build an earlier version of Samba.
>
> On CentOS 7.7, I built Samba 4.11.16 with
>
> [global]
> security = user
> map to guest = Bad User
> username map = /etc/samba/smbusers
> log level = 4
> load printers = no
> printing = bsd
> printcap name = /dev/null
> disable spoolss = yes
>
> [cryofs_upload]
> public = yes
> read only = yes
> guest ok = yes
> vfs objects = ceph
> path = /upload
> kernel share modes = no
> ceph:user_id = samba.upload
> ceph:config_file = /etc/ceph/ceph.conf
>
> I have a file at /etc/ceph/ceph.conf including:
> fsid = redacted
> mon_host = reda

[ceph-users] Re: Rados Crashing

2020-11-12 Thread Brent Kennedy
I didn't know there was a replacement for the radosgw role!  I saw in the
ceph-ansible project mention of a radosgw load balancer but since I use
haproxy, I didn't dig into that.  Is that what you are referring to?
Otherwise, I can't seem to find any mention of civetweb being replaced.
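
For reference, the frontend itself is a per-instance setting, e.g. something
like this in ceph.conf on the gateway (section name and port are just an
example):

  [client.rgw.gateway1]
      rgw_frontends = beast port=8080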

For the issue below, I guess the dev was using a single-threaded process
that was out of control.  They have done it a few times now and it kills all
four gateways.  I asked them to stop and so far there have been no repeats.
For deletes, they should be using the bucket item aging anyway.

-Brent

-Original Message-
From: Eugen Block  
Sent: Friday, October 23, 2020 7:00 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Rados Crashing

Hi,

I read that civetweb and radosgw have a locking issue in combination with
ssl [1], just a thought based on

> failed to acquire lock on obj_delete_at_hint.79

Since Nautilus the default rgw frontend is beast, have you thought about
switching?

Regards,
Eugen


[1] https://tracker.ceph.com/issues/22951


Zitat von Brent Kennedy :

> We are performing file maintenance( deletes essentially ) and when the 
> process gets to a certain point, all four rados gateways crash with 
> the
> following:
>
>
>
>
>
> Log output:
>
> -5> 2020-10-20 06:09:53.996 7f15f1543700  2 req 7 0.000s s3:delete_obj
> verifying op params
>
> -4> 2020-10-20 06:09:53.996 7f15f1543700  2 req 7 0.000s 
> s3:delete_obj pre-executing
>
> -3> 2020-10-20 06:09:53.996 7f15f1543700  2 req 7 0.000s 
> s3:delete_obj executing
>
> -2> 2020-10-20 06:09:53.997 7f161758f700 10 monclient: 
> get_auth_request con 0x55d2c02ff800 auth_method 0
>
> -1> 2020-10-20 06:09:54.009 7f1609d74700  5 process_single_shard():
> failed to acquire lock on obj_delete_at_hint.79
>
>  0> 2020-10-20 06:09:54.035 7f15f1543700 -1 *** Caught signal 
> (Segmentation fault) **
>
> in thread 7f15f1543700 thread_name:civetweb-worker
>
>
>
> ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) 
> nautilus
> (stable)
>
> 1: (()+0xf5d0) [0x7f161d3405d0]
>
> 2: (()+0x2bec80) [0x55d2bcd1fc80]
>
> 3: (std::string::assign(std::string const&)+0x2e) [0x55d2bcd2870e]
>
> 4: (rgw_bucket::operator=(rgw_bucket const&)+0x11) [0x55d2bce3e551]
>
> 5: (RGWObjManifest::obj_iterator::update_location()+0x184) 
> [0x55d2bced7114]
>
> 6: (RGWObjManifest::obj_iterator::operator++()+0x263) [0x55d2bd092793]
>
> 7: (RGWRados::update_gc_chain(rgw_obj&, RGWObjManifest&,
> cls_rgw_obj_chain*)+0x51a) [0x55d2bd0939ea]
>
> 8: (RGWRados::Object::complete_atomic_modification()+0x83) 
> [0x55d2bd093c63]
>
> 9: (RGWRados::Object::Delete::delete_obj()+0x74d) [0x55d2bd0a87ad]
>
> 10: (RGWDeleteObj::execute()+0x915) [0x55d2bd04b6d5]
>
> 11: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, 
> req_state*, bool)+0x915) [0x55d2bcdfbb35]
>
> 12: (process_request(RGWRados*, RGWREST*, RGWRequest*, std::string 
> const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, 
> OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, int*)+0x1cd8) 
> [0x55d2bcdfdea8]
>
> 13: (RGWCivetWebFrontend::process(mg_connection*)+0x38e) 
> [0x55d2bcd41a1e]
>
> 14: (()+0x36bace) [0x55d2bcdccace]
>
> 15: (()+0x36d76f) [0x55d2bcdce76f]
>
> 16: (()+0x36dc18) [0x55d2bcdcec18]
>
> 17: (()+0x7dd5) [0x7f161d338dd5]
>
> 18: (clone()+0x6d) [0x7f161c84302d]
>
> NOTE: a copy of the executable, or `objdump -rdS ` is 
> needed to interpret this.
>
>
>
> My guess is that we need to add more resources to the gateways?  They 
> have 2 CPUs and 12GB of memory running as virtual machines on centOS 
> 7.6 .  Any thoughts?
>
>
>
> -Brent
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email
to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-12 Thread EDH - Manuel Rios
This same error caused us to wipe a full cluster of 300TB... it will be related to 
some rados index/database bug, not to S3.

As Janek exposed, it is a major issue, because the error happens silently and you 
can only detect it via S3, when you're going to delete/purge an S3 bucket and it 
drops NoSuchKey. The error is not related to the S3 logic.

Hope this time the devs can take enough time to find and resolve the issue. The 
error happens with low EC profiles, even with replica x3 in some cases.

Regards



-Mensaje original-
De: Janek Bevendorff  
Enviado el: jueves, 12 de noviembre de 2020 14:06
Para: Rafael Lopez 
CC: Robin H. Johnson ; ceph-users 
Asunto: [ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

Here is a bug report concerning (probably) this exact issue: 
https://tracker.ceph.com/issues/47866

I left a comment describing the situation and my (limited) experiences with it.


On 11/11/2020 10:04, Janek Bevendorff wrote:
>
> Yeah, that seems to be it. There are 239 objects prefixed 
> .8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh in my dump. However, there are none 
> of the multiparts from the other file to be found and the head object 
> is 0 bytes.
>
> I checked another multipart object with an end pointer of 11. 
> Surprisingly, it had way more than 11 parts (39 to be precise) named 
> .1, .1_1 .1_2, .1_3, etc. Not sure how Ceph identifies those, but I 
> could find them in the dump at least.
>
> I have no idea why the objects disappeared. I ran a Spark job over all 
> buckets, read 1 byte of every object and recorded errors. Of the 78 
> buckets, two are missing objects. One bucket is missing one object, 
> the other 15. So, luckily, the incidence is still quite low, but the 
> problem seems to be expanding slowly.
>
>
> On 10/11/2020 23:46, Rafael Lopez wrote:
>> Hi Janek,
>>
>> What you said sounds right - an S3 single part obj won't have an S3 
>> multipart string as part of the prefix. S3 multipart string looks 
>> like "2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme".
>>
>> From memory, single part S3 objects that don't fit in a single rados 
>> object are assigned a random prefix that has nothing to do with 
>> the object name, and the rados tail/data objects (not the head 
>> object) have that prefix.
>> As per your working example, the prefix for that would be 
>> '.8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh'. So there would be (239) "shadow" 
>> objects with names containing that prefix, and if you add up the 
>> sizes it should be the size of your S3 object.
>>
>> You should look at working and non working examples of both single 
>> and multipart S3 objects, as they are probably all a bit different 
>> when you look in rados.
>>
>> I agree it is a serious issue, because once objects are no longer in 
>> rados, they cannot be recovered. If it was a case that there was a 
>> link broken or rados objects renamed, then we could work to 
>> recover...but as far as I can tell, it looks like stuff is just 
>> vanishing from rados. The only explanation I can think of is some 
>> (rgw or rados) background process is incorrectly doing something with 
>> these objects (eg. renaming/deleting). I had thought perhaps it was a 
>> bug with the rgw garbage collector..but that is pure speculation.
>>
>> Once you can articulate the problem, I'd recommend logging a bug 
>> tracker upstream.
>>
>>
>> On Wed, 11 Nov 2020 at 06:33, Janek Bevendorff 
>> > > wrote:
>>
>> Here's something else I noticed: when I stat objects that work
>> via radosgw-admin, the stat info contains a "begin_iter" JSON
>> object with RADOS key info like this
>>
>>
>>                     "key": {
>>                         "name":
>> 
>> "29/items/WIDE-20110924034843-crawl420/WIDE-20110924065228-02544.warc.gz",
>>                         "instance": "",
>>                         "ns": ""
>>                     }
>>
>>
>> and then "end_iter" with key info like this:
>>
>>
>>                     "key": {
>>                         "name":
>> ".8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh_239",
>>                         "instance": "",
>>                         "ns": "shadow"
>>                     }
>>
>> However, when I check the broken 0-byte object, the "begin_iter"
>> and "end_iter" keys look like this:
>>
>>
>>                     "key": {
>>                         "name":
>> 
>> "29/items/WIDE-20110903143858-crawl428/WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.1",
>>                         "instance": "",
>>                         "ns": "multipart"
>>                     }
>>
>> [...]
>>
>>
>>                     "key": {
>>                         "name":
>> 
>> "29/items/WIDE-20110903143858-crawl428/WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.19",
>>                         "instance": "",
>>                         "ns": "multipart"
>>           

[ceph-users] Re: Unable to clarify error using vfs_ceph (Samba gateway for CephFS)

2020-11-12 Thread Matt Larson
Thank you Frank,

 That was a good suggestion to make sure the mount wasn't the issue. I
tried changing the `client.samba.upload` to have read access directly
to '/' rather than '/upload' and to also change smb.conf to directly
use 'path = /'. Still getting the same issue (log level 10 content
below).

 It appears that it is correctly reading `/etc/ceph/ceph.conf`. It
does appear to be the ceph_mount where the failure occurs.

 It would be great to have vfs_ceph working, but if I cannot I'll try
to find other approaches.
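
For completeness, a sketch of the kind of caps involved; the filesystem name
is a placeholder and the exact commands I used may have differed:

  # restrict the client to a subtree:
  ceph fs authorize cephfs client.samba.upload /upload rw
  # what vfs_ceph appears to need instead: access rooted at '/':
  ceph fs authorize cephfs client.samba.upload / r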

[2020/11/12 10:47:39.360943, 10, pid=2723021, effective(0, 0), real(0,
0), class=vfs] ../../source3/smbd/vfs.c:65(vfs_find_backend_entry)

  vfs_find_backend_entry called for ceph
  Successfully loaded vfs module [ceph] with the new modules system
[2020/11/12 10:47:39.360966, 10, pid=2723021, effective(0, 0), real(0,
0), class=vfs] ../../source3/modules/vfs_ceph.c:103(cephwrap_connect)
  cephwrap_connect: [CEPH] calling: ceph_create
[2020/11/12 10:47:39.365668, 10, pid=2723021, effective(0, 0), real(0,
0), class=vfs] ../../source3/modules/vfs_ceph.c:110(cephwrap_connect)
  cephwrap_connect: [CEPH] calling: ceph_conf_read_file with /etc/ceph/ceph.conf
[2020/11/12 10:47:39.368842, 10, pid=2723021, effective(0, 0), real(0,
0), class=vfs] ../../source3/modules/vfs_ceph.c:116(cephwrap_connect)
  cephwrap_connect: [CEPH] calling: ceph_conf_get
[2020/11/12 10:47:39.368895, 10, pid=2723021, effective(0, 0), real(0,
0), class=vfs] ../../source3/modules/vfs_ceph.c:133(cephwrap_connect)
  cephwrap_connect: [CEPH] calling: ceph_mount
[2020/11/12 10:47:39.373319, 10, pid=2723021, effective(0, 0), real(0,
0), class=vfs] ../../source3/modules/vfs_ceph.c:160(cephwrap_connect)
  cephwrap_connect: [CEPH] Error return: No such file or directory
[2020/11/12 10:47:39.373357,  1, pid=2723021, effective(0, 0), real(0,
0)] ../../source3/smbd/service.c:668(make_connection_snum)
  make_connection_snum: SMB_VFS_CONNECT for service 'cryofs_upload' at
'/' failed: No such file or directory

On Thu, Nov 12, 2020 at 2:29 AM Frank Schilder  wrote:
>
> You might face the same issue I had. vfs_ceph wants to have a key for the 
> root of the cephfs, it is cutrently not possible to restrict access to a 
> sub-directory mount. For this reason, I decided to go for a re-export of a 
> kernel client mount.
>
> I consider this a serious security issue in vfs_ceph and will not use it 
> until it is possible to do sub-directory mounts.
>
> I don't think its difficult to patch the vfs_ceph source code, if you need to 
> use vfs_ceph and cannot afford to give access to "/" of the cephfs.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Matt Larson 
> Sent: 12 November 2020 00:40:21
> To: ceph-users
> Subject: [ceph-users] Unable to clarify error using vfs_ceph (Samba gateway 
> for CephFS)
>
> I am getting an error in the log.smbd from the Samba gateway that I
> don’t understand and looking for help from anyone who has gotten the
> vfs_ceph working.
>
> Background:
>
> I am trying to get a Samba gateway with CephFS working with the
> vfs_ceph module. I observed that the default Samba package on CentOS
> 7.7 did not come with the ceph.so vfs_ceph module, so I tried to
> compile a working Samba version with vfs_ceph.
>
> Newer Samba versions have a requirement for GnuTLS >= 3.4.7, which is
> not an available package on CentOS 7.7 without a custom repository. I
> opted to build an earlier version of Samba.
>
> On CentOS 7.7, I built Samba 4.11.16 with
>
> [global]
> security = user
> map to guest = Bad User
> username map = /etc/samba/smbusers
> log level = 4
> load printers = no
> printing = bsd
> printcap name = /dev/null
> disable spoolss = yes
>
> [cryofs_upload]
> public = yes
> read only = yes
> guest ok = yes
> vfs objects = ceph
> path = /upload
> kernel share modes = no
> ceph:user_id = samba.upload
> ceph:config_file = /etc/ceph/ceph.conf
>
> I have a file at /etc/ceph/ceph.conf including:
> fsid = redacted
> mon_host = redacted
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
>
>
> I have an /etc/ceph/client.samba.upload.keyring with a key for the user
> `samba.upload`.
>
> However, connecting fails:
>
> smbclient localhost\\cryofs_upload -U guest
> Enter guest's password:
> tree connect failed: NT_STATUS_UNSUCCESSFUL
>
>
> The log.smbd gives these errors:
>
>   Initialising custom vfs hooks from [ceph]
> [2020/11/11 17:24:37.388460,  3]
> ../../lib/util/modules.c:167(load_module_absolute_path)
>   load_module_absolute_path: Module '/usr/local/samba/lib/vfs/ceph.so' loaded
> [2020/11/11 17:24:37.402026,  1]
> ../../source3/smbd/service.c:668(make_connection_snum)
>   make_connection_snum: SMB_VFS_CONNECT for service 'cryofs_upload' at
> '/upload' failed: No such file or directory

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-12 Thread Janek Bevendorff
I have never seen this on Luminous. I recently upgraded to Octopus and 
the issue started occurring only a few weeks later.



On 12/11/2020 16:37, huxia...@horebdata.cn wrote:
Which Ceph versions are affected by this RGW bug/issue? Luminous, 
Mimic, Octopus, or the latest?


any idea?

samuel


huxia...@horebdata.cn

*From:* EDH - Manuel Rios 
*Date:* 2020-11-12 14:27
*To:* Janek Bevendorff ;
Rafael Lopez 
*CC:* Robin H. Johnson ; ceph-users

*Subject:* [ceph-users] Re: NoSuchKey on key that is visible in s3
list/radosgw bk
This same error caused us to wipe a full cluster of 300TB... it will
be related to some rados index/database bug, not to S3.
As Janek explained, this is a major issue, because the error happens
silently and you can only detect it with S3, when you're going to
delete/purge an S3 bucket and it starts returning NoSuchKey. The error
is not related to S3 logic.
Hope this time the devs can take enough time to find and resolve the
issue. The error happens with low EC profiles, even with replica x3 in
some cases.
Regards
-Mensaje original-
De: Janek Bevendorff 
Enviado el: jueves, 12 de noviembre de 2020 14:06
Para: Rafael Lopez 
CC: Robin H. Johnson ; ceph-users

Asunto: [ceph-users] Re: NoSuchKey on key that is visible in s3
list/radosgw bk
Here is a bug report concerning (probably) this exact issue:
https://tracker.ceph.com/issues/47866
I left a comment describing the situation and my (limited)
experiences with it.
On 11/11/2020 10:04, Janek Bevendorff wrote:
>
> Yeah, that seems to be it. There are 239 objects prefixed
> .8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh in my dump. However, there are
none
> of the multiparts from the other file to be found and the head
object
> is 0 bytes.
>
> I checked another multipart object with an end pointer of 11.
> Surprisingly, it had way more than 11 parts (39 to be precise)
named
> .1, .1_1 .1_2, .1_3, etc. Not sure how Ceph identifies those, but I
> could find them in the dump at least.
>
> I have no idea why the objects disappeared. I ran a Spark job
over all
> buckets, read 1 byte of every object and recorded errors. Of the 78
> buckets, two are missing objects. One bucket is missing one object,
> the other 15. So, luckily, the incidence is still quite low, but
the
> problem seems to be expanding slowly.
>
>
> On 10/11/2020 23:46, Rafael Lopez wrote:
>> Hi Janek,
>>
>> What you said sounds right - an S3 single part obj won't have
an S3
>> multipart string as part of the prefix. S3 multipart string looks
>> like "2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme".
>>
>> From memory, single part S3 objects that don't fit in a single
rados
>> object are assigned a random prefix that has nothing to do with
>> the object name, and the rados tail/data objects (not the head
>> object) have that prefix.
>> As per your working example, the prefix for that would be
>> '.8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh'. So there would be (239)
"shadow"
>> objects with names containing that prefix, and if you add up the
>> sizes it should be the size of your S3 object.
>>
>> You should look at working and non working examples of both single
>> and multipart S3 objects, as they are probably all a bit different
>> when you look in rados.
>>
>> I agree it is a serious issue, because once objects are no
longer in
>> rados, they cannot be recovered. If it was a case that there was a
>> link broken or rados objects renamed, then we could work to
>> recover...but as far as I can tell, it looks like stuff is just
>> vanishing from rados. The only explanation I can think of is some
>> (rgw or rados) background process is incorrectly doing
something with
>> these objects (eg. renaming/deleting). I had thought perhaps it
was a
>> bug with the rgw garbage collector..but that is pure speculation.
>>
>> Once you can articulate the problem, I'd recommend logging a bug in
>> the upstream tracker.
>>
>>
>> On Wed, 11 Nov 2020 at 06:33, Janek Bevendorff
>> > > wrote:
>>
>> Here's something else I noticed: when I stat objects that work
>> via radosgw-admin, the stat info contains a "begin_iter" JSON
>> object with RADOS key info like this
>>
>>
>>                     "key": {
>>                         "name": "29/items/WIDE-20110924034843-crawl420/WIDE-20110924065228-02544.warc.gz",
>>                         "instance": "",
>>              

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-12 Thread huxia...@horebdata.cn
Which Ceph versions are affected by this RGW bug/issue? Luminous, Mimic, 
Octopus, or the latest?

any idea?

samuel



huxia...@horebdata.cn
 
From: EDH - Manuel Rios
Date: 2020-11-12 14:27
To: Janek Bevendorff; Rafael Lopez
CC: Robin H. Johnson; ceph-users
Subject: [ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk
This same error caused us to wipe a full cluster of 300TB... it will be related 
to some rados index/database bug, not to S3.
 
As Janek explained, this is a major issue, because the error happens silently 
and you can only detect it with S3, when you're going to delete/purge an S3 
bucket and it starts returning NoSuchKey. The error is not related to S3 logic.
 
Hope this time the devs can take enough time to find and resolve the issue. The 
error happens with low EC profiles, even with replica x3 in some cases.
 
Regards
 
 
 
-Mensaje original-
De: Janek Bevendorff  
Enviado el: jueves, 12 de noviembre de 2020 14:06
Para: Rafael Lopez 
CC: Robin H. Johnson ; ceph-users 
Asunto: [ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk
 
Here is a bug report concerning (probably) this exact issue: 
https://tracker.ceph.com/issues/47866
 
I left a comment describing the situation and my (limited) experiences with it.
 
 
On 11/11/2020 10:04, Janek Bevendorff wrote:
>
> Yeah, that seems to be it. There are 239 objects prefixed 
> .8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh in my dump. However, there are none 
> of the multiparts from the other file to be found and the head object 
> is 0 bytes.
>
> I checked another multipart object with an end pointer of 11. 
> Surprisingly, it had way more than 11 parts (39 to be precise) named 
> .1, .1_1 .1_2, .1_3, etc. Not sure how Ceph identifies those, but I 
> could find them in the dump at least.
>
> I have no idea why the objects disappeared. I ran a Spark job over all 
> buckets, read 1 byte of every object and recorded errors. Of the 78 
> buckets, two are missing objects. One bucket is missing one object, 
> the other 15. So, luckily, the incidence is still quite low, but the 
> problem seems to be expanding slowly.
>
>
> On 10/11/2020 23:46, Rafael Lopez wrote:
>> Hi Janek,
>>
>> What you said sounds right - an S3 single part obj won't have an S3 
>> multipart string as part of the prefix. S3 multipart string looks 
>> like "2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme".
>>
>> From memory, single part S3 objects that don't fit in a single rados 
>> object are assigned a random prefix that has nothing to do with 
>> the object name, and the rados tail/data objects (not the head 
>> object) have that prefix.
>> As per your working example, the prefix for that would be 
>> '.8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh'. So there would be (239) "shadow" 
>> objects with names containing that prefix, and if you add up the 
>> sizes it should be the size of your S3 object.
>>
>> You should look at working and non working examples of both single 
>> and multipart S3 objects, as they are probably all a bit different 
>> when you look in rados.
>>
>> I agree it is a serious issue, because once objects are no longer in 
>> rados, they cannot be recovered. If it was a case that there was a 
>> link broken or rados objects renamed, then we could work to 
>> recover...but as far as I can tell, it looks like stuff is just 
>> vanishing from rados. The only explanation I can think of is some 
>> (rgw or rados) background process is incorrectly doing something with 
>> these objects (eg. renaming/deleting). I had thought perhaps it was a 
>> bug with the rgw garbage collector..but that is pure speculation.
>>
>> Once you can articulate the problem, I'd recommend logging a bug in
>> the upstream tracker.
>>
>>
>> On Wed, 11 Nov 2020 at 06:33, Janek Bevendorff 
>> > > wrote:
>>
>> Here's something else I noticed: when I stat objects that work
>> via radosgw-admin, the stat info contains a "begin_iter" JSON
>> object with RADOS key info like this
>>
>>
>> "key": {
>> "name":
>> 
>> "29/items/WIDE-20110924034843-crawl420/WIDE-20110924065228-02544.warc.gz",
>> "instance": "",
>> "ns": ""
>> }
>>
>>
>> and then "end_iter" with key info like this:
>>
>>
>> "key": {
>> "name":
>> ".8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh_239",
>> "instance": "",
>> "ns": "shadow"
>> }
>>
>> However, when I check the broken 0-byte object, the "begin_iter"
>> and "end_iter" keys look like this:
>>
>>
>> "key": {
>> "name":
>> 
>> "29/items/WIDE-20110903143858-crawl428/WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.1",
>> "instance": "",
>> "ns": "multipart"
>> 

[ceph-users] Re: Nautilus - osdmap not trimming

2020-11-12 Thread m . sliwinski

Hi

I removed the related options except for "mon_debug_block_osdmap_trim 
false".


Logs are below. I'm not sure how to extract the required information, so I 
just used grep. If that's not enough, please let me know. I can also upload 
the entire log somewhere if required.


root@monb01:~# grep trim ceph-mon.monb01.log
2020-11-12 14:54:07.884 7f39880b1700 10 mon.monb01@0(leader) e6  
trimming session 0x5a1da80 client.? 10.100.0.81:0/1898561466 (timeout 
300.00 < now 2020-11-12 14:54:07.884754)

root@monb01:~#


root@monb01:~# grep prune ceph-mon.monb01.log
"osdmap-prune",
2020-11-12 14:50:55.756 7f39858ac700  5 
mon.monb01@0(electing).elector(625)  so far i have { mon.0: features 
4611087854035861503 
mon_feature_t([kraken,luminous,mimic,osdmap-prune,nautilus]), mon.2: 
features 4611087854035861503 
mon_feature_t([kraken,luminous,mimic,osdmap-prune,nautilus]) }
2020-11-12 14:50:55.756 7f39858ac700  5 
mon.monb01@0(electing).elector(625)  so far i have { mon.0: features 
4611087854035861503 
mon_feature_t([kraken,luminous,mimic,osdmap-prune,nautilus]), mon.1: 
features 4611087854035861503 
mon_feature_t([kraken,luminous,mimic,osdmap-prune,nautilus]), mon.2: 
features 4611087854035861503 
mon_feature_t([kraken,luminous,mimic,osdmap-prune,nautilus]) }
2020-11-12 14:50:55.756 7f39858ac700 10 mon.monb01@0(electing) e6 
win_election epoch 626 quorum 0,1,2 features 4611087854035861503 
mon_features 
mon_feature_t([kraken,luminous,mimic,osdmap-prune,nautilus]) 
min_mon_release 14
2020-11-12 14:50:55.764 7f39858ac700 10 mon.monb01@0(leader).monmap v6 
apply_mon_features min_mon_release (14) and features 
(mon_feature_t([kraken,luminous,mimic,osdmap-prune,nautilus])) match
2020-11-12 14:50:57.736 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
should_prune could only prune 4978 epochs (67114..72092), which is less 
than the required minimum (1)
2020-11-12 14:50:57.736 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps max_prune 100
2020-11-12 14:50:57.736 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps actually pruned 0
2020-11-12 14:51:02.748 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
should_prune could only prune 4978 epochs (67114..72092), which is less 
than the required minimum (1)
2020-11-12 14:51:02.748 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps max_prune 100
2020-11-12 14:51:02.748 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps actually pruned 0
2020-11-12 14:51:07.748 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
should_prune could only prune 4978 epochs (67114..72092), which is less 
than the required minimum (1)
2020-11-12 14:51:07.748 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps max_prune 100
2020-11-12 14:51:07.748 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps actually pruned 0
2020-11-12 14:51:12.752 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
should_prune could only prune 4978 epochs (67114..72092), which is less 
than the required minimum (1)
2020-11-12 14:51:12.752 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps max_prune 100
2020-11-12 14:51:12.752 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps actually pruned 0
2020-11-12 14:51:17.752 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
should_prune could only prune 4978 epochs (67114..72092), which is less 
than the required minimum (1)
2020-11-12 14:51:17.752 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps max_prune 100
2020-11-12 14:51:17.752 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps actually pruned 0
2020-11-12 14:51:22.756 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
should_prune could only prune 4978 epochs (67114..72092), which is less 
than the required minimum (1)
2020-11-12 14:51:22.756 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps max_prune 100
2020-11-12 14:51:22.756 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps actually pruned 0
2020-11-12 14:51:27.768 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
should_prune could only prune 4978 epochs (67114..72092), which is less 
than the required minimum (1)
2020-11-12 14:51:27.768 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps max_prune 100
2020-11-12 14:51:27.768 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps actually pruned 0
2020-11-12 14:51:32.772 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
should_prune could only prune 4978 epochs (67114..72092), which is less 
than the required minimum (1)
2020-11-12 14:51:32.772 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps max_prune 100
2020-11-12 14:51:32.772 7f39880b1700 10 mon.monb01@0(leader).osd e72592 
try_prune_purged_snaps actually pruned 0
2020-11-12 14:51:37.771 7f39880b1700 10 mon.monb01@0(leader).osd e72

[ceph-users] Re: Ceph RBD - High IOWait during the Writes

2020-11-12 Thread athreyavc
Hi,

Thanks for the email, but we are not using RAID at all; we are using
LSI HBA 9400-8e HBAs. Each HDD is configured as an OSD.

On Thu, Nov 12, 2020 at 12:19 PM Edward kalk  wrote:

> For certain CPU architectures, disable the Spectre and Meltdown mitigations.
> (Be certain the network to the physical nodes is secure from internet access;
> use apt, http(s) and curl proxy servers.)
> Try toggling the physical on-disk cache on or off (RAID controller
> command).
> ^I had the same issue; doing both of these fixed it. In my case the disks
> needed the on-disk cache hard set to ‘on’; the RAID card default was not
> good. (Be sure to have diverse power and UPS protection if you need to run
> with the on-disk cache on; a good RAID battery, if using RAID cache,
> improves performance.)
>
> To see the perf impact of the Spectre and Meltdown mitigations vs. off,
> run: dd if=/dev/zero of=/dev/null
> ^I run it for 5 seconds and then Ctrl+C; it will show max north-bridge ops.
>
> To see the difference in await and IOPS when toggling RAID card features
> and the on-disk cache, I run: iostat -xtc 2
> and use fio to generate disk load for testing IOPS (google fio example
> commands).
> ^South bridge + RAID controller to disks: ops and latency.
>
> -Edward Kalk
> Datacenter Virtualization
> Performance Engineering
> Socket Telecom
> Columbia, MO, USA
> ek...@socket.net
>
> > On Nov 12, 2020, at 4:45 AM, athreyavc  wrote:
> >
> > Jumbo frames enabled  and MTU is 9000
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-12 Thread Janek Bevendorff
Here is a bug report concerning (probably) this exact issue: 
https://tracker.ceph.com/issues/47866


I left a comment describing the situation and my (limited) experiences 
with it.



On 11/11/2020 10:04, Janek Bevendorff wrote:


Yeah, that seems to be it. There are 239 objects prefixed 
.8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh in my dump. However, there are none 
of the multiparts from the other file to be found and the head object 
is 0 bytes.


I checked another multipart object with an end pointer of 11. 
Surprisingly, it had way more than 11 parts (39 to be precise) named 
.1, .1_1 .1_2, .1_3, etc. Not sure how Ceph identifies those, but I 
could find them in the dump at least.


I have no idea why the objects disappeared. I ran a Spark job over all 
buckets, read 1 byte of every object and recorded errors. Of the 78 
buckets, two are missing objects. One bucket is missing one object, 
the other 15. So, luckily, the incidence is still quite low, but the 
problem seems to be expanding slowly.



On 10/11/2020 23:46, Rafael Lopez wrote:

Hi Janek,

What you said sounds right - an S3 single part obj won't have an S3 
multipart string as part of the prefix. S3 multipart string looks 
like "2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme".


From memory, single part S3 objects that don't fit in a single rados 
object are assigned a random prefix that has nothing to do with 
the object name, and the rados tail/data objects (not the head 
object) have that prefix.
As per your working example, the prefix for that would be 
'.8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh'. So there would be (239) "shadow" 
objects with names containing that prefix, and if you add up the 
sizes it should be the size of your S3 object.
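As a rough illustration (the pool name below is the usual default RGW data 
pool; adjust to your setup), the tail objects for such a prefix can be listed 
and sized straight from rados:

    rados -p default.rgw.buckets.data ls | grep '8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh' > tail_objects.txt
    wc -l tail_objects.txt
    while read obj; do rados -p default.rgw.buckets.data stat "$obj"; done < tail_objects.txt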


You should look at working and non working examples of both single 
and multipart S3 objects, as they are probably all a bit different 
when you look in rados.


I agree it is a serious issue, because once objects are no longer in 
rados, they cannot be recovered. If it was a case that there was a 
link broken or rados objects renamed, then we could work to 
recover...but as far as I can tell, it looks like stuff is just 
vanishing from rados. The only explanation I can think of is some 
(rgw or rados) background process is incorrectly doing something with 
these objects (eg. renaming/deleting). I had thought perhaps it was a 
bug with the rgw garbage collector..but that is pure speculation.


Once you can articulate the problem, I'd recommend logging a bug in 
the upstream tracker.



On Wed, 11 Nov 2020 at 06:33, Janek Bevendorff 
> wrote:


Here's something else I noticed: when I stat objects that work
via radosgw-admin, the stat info contains a "begin_iter" JSON
object with RADOS key info like this


                    "key": {
                        "name":
"29/items/WIDE-20110924034843-crawl420/WIDE-20110924065228-02544.warc.gz",
                        "instance": "",
                        "ns": ""
                    }


and then "end_iter" with key info like this:


                    "key": {
                        "name":
".8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh_239",
                        "instance": "",
                        "ns": "shadow"
                    }

However, when I check the broken 0-byte object, the "begin_iter"
and "end_iter" keys look like this:


                    "key": {
                        "name":

"29/items/WIDE-20110903143858-crawl428/WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.1",
                        "instance": "",
                        "ns": "multipart"
                    }

[...]


                    "key": {
                        "name":

"29/items/WIDE-20110903143858-crawl428/WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.19",
                        "instance": "",
                        "ns": "multipart"
                    }

So, it's the full name plus a suffix and the namespace is
multipart, not shadow (or empty). This in itself may just be an
artefact of whether the object was uploaded in one go or as a
multipart object, but the second difference is that I cannot find
any of the multipart objects in my pool's object name dump. I
can, however, find the shadow RADOS object of the intact S3 object.
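(For reference, the metadata above presumably comes from something like the 
following; bucket and object names are placeholders, and the pool name is the 
usual default:)

    radosgw-admin object stat --bucket=<bucket> --object=<key>

    # check whether any multipart/shadow tail objects still exist for that upload id
    rados -p default.rgw.buckets.data ls | grep '2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme'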




--
*Rafael Lopez*
Devops Systems Engineer
Monash University eResearch Centre

T: +61 3 9905 9118 
E: rafael.lo...@monash.edu 


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus - osdmap not trimming

2020-11-12 Thread Dan van der Ster
This is weird -- afaict it should be trimming.
Can you revert your custom paxos and osdmap options to their defaults,
then restart your mon leader, then wait 5 minutes, then finally
generate some new osdmap churn (e.g. ceph osd pool set xx min_size 2,
redundantly). Then please again share the relevant trim/prune-related
logs (at debug_mon = 10).
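Roughly, that sequence could look like the following (the pool name and mon id
are placeholders, and using the centralized config is an assumption):

    ceph config set mon debug_mon 10
    systemctl restart ceph-mon@monb01      # on the current leader
    # wait ~5 minutes, then create some osdmap churn
    ceph osd pool set <pool> min_size 2    # repeat a few times, even with the same value
    grep -E 'trim|prune' /var/log/ceph/ceph-mon.monb01.log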

Cheers, Dan


On Thu, Nov 12, 2020 at 12:14 PM  wrote:
>
> Hi
>
> Thanks for the reply. Yeah, I restarted all of the mon servers in
> sequence, and yesterday just the leader alone, without any success.
>
> Reports:
>
> root@monb01:~# ceph report | grep committed
> report 4002437698
>  "monmap_first_committed": 1,
>  "monmap_last_committed": 6,
>  "osdmap_first_committed": 67114,
>  "osdmap_last_committed": 72592,
>  "mdsmap_first_committed": 1,
>  "mdsmap_last_committed": 1,
>  "first_committed": 609225,
>  "last_committed": 609251,
>  "first_committed": 180754137,
>  "last_committed": 180754777,
> root@monb01:~#
>
> root@monb01:~# ceph report | jq .osdmap_clean_epochs
> report 395175214
> {
>"min_last_epoch_clean": 72592,
>"last_epoch_clean": {
>  "per_pool": [
>{
>  "poolid": 0,
>  "floor": 72592
>},
>{
>  "poolid": 1,
>  "floor": 72592
>},
>{
>  "poolid": 2,
>  "floor": 72592
>},
>{
>  "poolid": 3,
>  "floor": 72592
>},
>{
>  "poolid": 4,
>  "floor": 72592
>},
>{
>  "poolid": 5,
>  "floor": 72592
>},
>{
>  "poolid": 26,
>  "floor": 72592
>},
>{
>  "poolid": 27,
>  "floor": 72592
>},
>{
>  "poolid": 28,
>  "floor": 72592
>}
>  ]
>},
>"osd_epochs": [
>  {
>"id": 0,
>"epoch": 72592
>  },
>  {
>"id": 1,
>"epoch": 72592
>  },
>  {
>"id": 2,
>"epoch": 72592
>  },
>  {
>"id": 3,
>"epoch": 72592
>  },
>  {
>"id": 4,
>"epoch": 72592
>  },
>  {
>"id": 5,
>"epoch": 72592
>  },
>  {
>"id": 6,
>"epoch": 72592
>  },
>  {
>"id": 7,
>"epoch": 72592
>  },
>  {
>"id": 8,
>"epoch": 72592
>  },
>  {
>"id": 9,
>"epoch": 72592
>  },
>  {
>"id": 10,
>"epoch": 72592
>  },
>  {
>"id": 11,
>"epoch": 72592
>  },
>  {
>"id": 12,
>"epoch": 72592
>  },
>  {
>"id": 13,
>"epoch": 72592
>  },
>  {
>"id": 14,
>"epoch": 72592
>  },
>  {
>"id": 15,
>"epoch": 72592
>  },
>  {
>"id": 16,
>"epoch": 72592
>  },
>  {
>"id": 17,
>"epoch": 72592
>  },
>  {
>"id": 18,
>"epoch": 72592
>  },
>  {
>"id": 19,
>"epoch": 72592
>  },
>  {
>"id": 20,
>"epoch": 72592
>  },
>  {
>"id": 21,
>"epoch": 72592
>  },
>  {
>"id": 22,
>"epoch": 72592
>  },
>  {
>"id": 23,
>"epoch": 72592
>  },
>  {
>"id": 24,
>"epoch": 72592
>  },
>  {
>"id": 25,
>"epoch": 72592
>  },
>  {
>"id": 26,
>"epoch": 72592
>  },
>  {
>"id": 27,
>"epoch": 72592
>  },
>  {
>"id": 28,
>"epoch": 72592
>  },
>  {
>"id": 29,
>"epoch": 72592
>  },
>  {
>"id": 30,
>"epoch": 72592
>  },
>  {
>"id": 31,
>"epoch": 72592
>  },
>  {
>"id": 32,
>"epoch": 72592
>  },
>  {
>"id": 33,
>"epoch": 72592
>  },
>  {
>"id": 34,
>"epoch": 72592
>  },
>  {
>"id": 35,
>"epoch": 72592
>  },
>  {
>"id": 36,
>"epoch": 72592
>  },
>  {
>"id": 37,
>"epoch": 72592
>  },
>  {
>"id": 38,
>"epoch": 72592
>  },
>  {
>"id": 39,
>"epoch": 72592
>  },
>  {
>"id": 40,
>"epoch": 72592
>  },
>  {
>"id": 41,
>"epoch": 72592
>  },
>  {
>"id": 42,
>"epoch": 72592
>  },
>  {
>"id": 43,
>"epoch": 72592
>  },
>  {
>"id": 44,
>"epoch": 72592
>  },
>  {
>"id": 45,
>"epoch": 72592
>  },
>  {
>"id": 46,
>"epoch": 72592
>  },
>  {
>"id": 47,
>"epoch": 72592
>  },
>  {
>"id": 48,
>"epoch": 72592
>  },
>  {
>"id": 49,
>"epoch": 72592
>  },
>  {
>"id": 50

[ceph-users] Re: Unable to clarify error using vfs_ceph (Samba gateway for CephFS)

2020-11-12 Thread Frank Schilder
You might face the same issue I had. vfs_ceph wants to have a key for the root 
of the cephfs; it is currently not possible to restrict access to a 
sub-directory mount. For this reason, I decided to go for a re-export of a 
kernel client mount.
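(For illustration, a minimal sketch of such a re-export; the monitor address,
mount point and share options below are placeholders:)

    # kernel client mount of only the sub-directory, with a restricted key
    mount -t ceph mon1:6789:/upload /mnt/cephfs-upload \
        -o name=samba.upload,secretfile=/etc/ceph/samba.upload.secret

    # plain smb.conf share re-exporting the kernel mount, no vfs_ceph needed
    [cryofs_upload]
    path = /mnt/cephfs-upload
    read only = yes
    guest ok = yes

The cephx caps for the client then only need to cover /upload, since only the
kernel client uses them.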

I consider this a serious security issue in vfs_ceph and will not use it until 
it is possible to do sub-directory mounts.

I don't think it's difficult to patch the vfs_ceph source code, if you need to 
use vfs_ceph and cannot afford to give access to "/" of the cephfs.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Matt Larson 
Sent: 12 November 2020 00:40:21
To: ceph-users
Subject: [ceph-users] Unable to clarify error using vfs_ceph (Samba gateway for 
CephFS)

I am getting an error in log.smbd from the Samba gateway that I
don’t understand, and I am looking for help from anyone who has gotten
vfs_ceph working.

Background:

I am trying to get a Samba gateway with CephFS working with the
vfs_ceph module. I observed that the default Samba package on CentOS
7.7 did not come with the ceph.so vfs_ceph module, so I tried to
compile a working Samba version with vfs_ceph.

Newer Samba versions have a requirement for GnuTLS >= 3.4.7, which is
not an available package on CentOS 7.7 without a custom repository. I
opted to build an earlier version of Samba.

On CentOS 7.7, I built Samba 4.11.16 with the following smb.conf:

[global]
security = user
map to guest = Bad User
username map = /etc/samba/smbusers
log level = 4
load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = yes

[cryofs_upload]
public = yes
read only = yes
guest ok = yes
vfs objects = ceph
path = /upload
kernel share modes = no
ceph:user_id = samba.upload
ceph:config_file = /etc/ceph/ceph.conf

I have a file at /etc/ceph/ceph.conf including:
fsid = redacted
mon_host = redacted
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx


I have an /etc/ceph/client.samba.upload.keyring with a key for the user
`samba.upload`.

However, connecting fails:

smbclient localhost\\cryofs_upload -U guest
Enter guest's password:
tree connect failed: NT_STATUS_UNSUCCESSFUL


The log.smbd gives these errors:

  Initialising custom vfs hooks from [ceph]
[2020/11/11 17:24:37.388460,  3]
../../lib/util/modules.c:167(load_module_absolute_path)
  load_module_absolute_path: Module '/usr/local/samba/lib/vfs/ceph.so' loaded
[2020/11/11 17:24:37.402026,  1]
../../source3/smbd/service.c:668(make_connection_snum)
  make_connection_snum: SMB_VFS_CONNECT for service 'cryofs_upload' at
'/upload' failed: No such file or directory

There is an /upload directory in the CephFS to which the samba.upload user
has read access.

What does this error, ‘no such file or directory’, mean? Is it that
vfs_ceph isn’t finding `/upload`, or is some other file that vfs_ceph
depends on not being found? I have also tried specifying a local path
rather than a CephFS path and get the same error.

Is there any good guide that describes not just the Samba smb.conf,
but also what should be in /etc/ceph/ceph.conf, and how to provide the
key for the ceph:user_id ? I am really struggling to find good
first-hand documentation for this.
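For what it's worth, a sketch of the usual cephx setup for this (the filesystem
name "cephfs" and the paths are assumptions, untested here):

    # create a client restricted to /upload and store its keyring under the
    # default name libcephfs searches for
    ceph fs authorize cephfs client.samba.upload /upload rw
    ceph auth get client.samba.upload -o /etc/ceph/ceph.client.samba.upload.keyring

    # alternatively, point ceph.conf at a non-default keyring location
    [client.samba.upload]
    keyring = /etc/ceph/client.samba.upload.keyring

Note that, as mentioned earlier in this thread, vfs_ceph currently mounts the
root of the cephfs, so a key that is only authorized for /upload may still fail
at ceph_mount with exactly this kind of error.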

Thanks,
  Matt

--
Matt Larson, PhD
Madison, WI  53705 U.S.A.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph RBD - High IOWait during the Writes

2020-11-12 Thread Edward kalk
For certain CPU architectures, disable the Spectre and Meltdown mitigations. 
(Be certain the network to the physical nodes is secure from internet access; 
use apt, http(s) and curl proxy servers.)
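A sketch of how that is usually done on kernels that support the combined
switch (older kernels need the individual flags such as nopti, nospectre_v1
and nospectre_v2 instead); again, only on hosts isolated from untrusted
networks:

    # /etc/default/grub
    GRUB_CMDLINE_LINUX="... mitigations=off"

    # then, e.g. on CentOS/RHEL:
    grub2-mkconfig -o /boot/grub2/grub.cfg && reboot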
Try toggling the physical on-disk cache on or off (RAID controller command).
^I had the same issue; doing both of these fixed it. In my case the disks 
needed the on-disk cache hard set to ‘on’; the RAID card default was not good. 
(Be sure to have diverse power and UPS protection if you need to run with the 
on-disk cache on; a good RAID battery, if using RAID cache, improves 
performance.)

To see the perf impact of the Spectre and Meltdown mitigations vs. off, run: dd 
if=/dev/zero of=/dev/null
^I run it for 5 seconds and then Ctrl+C; it will show max north-bridge ops.

To see the difference in await and IOPS when toggling RAID card features and 
the on-disk cache, I run: iostat -xtc 2
and use fio to generate disk load for testing IOPS (google fio example 
commands).
^South bridge + RAID controller to disks: ops and latency.
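A hedged example of the kind of fio load I mean (the test file path, size and
runtime are placeholders; point it at a scratch file or a disk you can wipe):

    fio --name=4k-randwrite --filename=/mnt/test/fio.dat --size=4G \
        --ioengine=libaio --direct=1 --rw=randwrite --bs=4k \
        --iodepth=32 --numjobs=1 --runtime=30 --time_based --group_reporting

Run iostat -xtc 2 in a second terminal while it runs to watch await and IOPS.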

-Edward Kalk
Datacenter Virtualization
Performance Engineering 
Socket Telecom
Columbia, MO, USA
ek...@socket.net

> On Nov 12, 2020, at 4:45 AM, athreyavc  wrote:
> 
> Jumbo frames enabled  and MTU is 9000
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus - osdmap not trimming

2020-11-12 Thread m . sliwinski

Hi

Thanks for the reply. Yeah, I restarted all of the mon servers in 
sequence, and yesterday just the leader alone, without any success.


Reports:

root@monb01:~# ceph report | grep committed
report 4002437698
"monmap_first_committed": 1,
"monmap_last_committed": 6,
"osdmap_first_committed": 67114,
"osdmap_last_committed": 72592,
"mdsmap_first_committed": 1,
"mdsmap_last_committed": 1,
"first_committed": 609225,
"last_committed": 609251,
"first_committed": 180754137,
"last_committed": 180754777,
root@monb01:~#

root@monb01:~# ceph report | jq .osdmap_clean_epochs
report 395175214
{
  "min_last_epoch_clean": 72592,
  "last_epoch_clean": {
"per_pool": [
  {
"poolid": 0,
"floor": 72592
  },
  {
"poolid": 1,
"floor": 72592
  },
  {
"poolid": 2,
"floor": 72592
  },
  {
"poolid": 3,
"floor": 72592
  },
  {
"poolid": 4,
"floor": 72592
  },
  {
"poolid": 5,
"floor": 72592
  },
  {
"poolid": 26,
"floor": 72592
  },
  {
"poolid": 27,
"floor": 72592
  },
  {
"poolid": 28,
"floor": 72592
  }
]
  },
  "osd_epochs": [
{
  "id": 0,
  "epoch": 72592
},
{
  "id": 1,
  "epoch": 72592
},
{
  "id": 2,
  "epoch": 72592
},
{
  "id": 3,
  "epoch": 72592
},
{
  "id": 4,
  "epoch": 72592
},
{
  "id": 5,
  "epoch": 72592
},
{
  "id": 6,
  "epoch": 72592
},
{
  "id": 7,
  "epoch": 72592
},
{
  "id": 8,
  "epoch": 72592
},
{
  "id": 9,
  "epoch": 72592
},
{
  "id": 10,
  "epoch": 72592
},
{
  "id": 11,
  "epoch": 72592
},
{
  "id": 12,
  "epoch": 72592
},
{
  "id": 13,
  "epoch": 72592
},
{
  "id": 14,
  "epoch": 72592
},
{
  "id": 15,
  "epoch": 72592
},
{
  "id": 16,
  "epoch": 72592
},
{
  "id": 17,
  "epoch": 72592
},
{
  "id": 18,
  "epoch": 72592
},
{
  "id": 19,
  "epoch": 72592
},
{
  "id": 20,
  "epoch": 72592
},
{
  "id": 21,
  "epoch": 72592
},
{
  "id": 22,
  "epoch": 72592
},
{
  "id": 23,
  "epoch": 72592
},
{
  "id": 24,
  "epoch": 72592
},
{
  "id": 25,
  "epoch": 72592
},
{
  "id": 26,
  "epoch": 72592
},
{
  "id": 27,
  "epoch": 72592
},
{
  "id": 28,
  "epoch": 72592
},
{
  "id": 29,
  "epoch": 72592
},
{
  "id": 30,
  "epoch": 72592
},
{
  "id": 31,
  "epoch": 72592
},
{
  "id": 32,
  "epoch": 72592
},
{
  "id": 33,
  "epoch": 72592
},
{
  "id": 34,
  "epoch": 72592
},
{
  "id": 35,
  "epoch": 72592
},
{
  "id": 36,
  "epoch": 72592
},
{
  "id": 37,
  "epoch": 72592
},
{
  "id": 38,
  "epoch": 72592
},
{
  "id": 39,
  "epoch": 72592
},
{
  "id": 40,
  "epoch": 72592
},
{
  "id": 41,
  "epoch": 72592
},
{
  "id": 42,
  "epoch": 72592
},
{
  "id": 43,
  "epoch": 72592
},
{
  "id": 44,
  "epoch": 72592
},
{
  "id": 45,
  "epoch": 72592
},
{
  "id": 46,
  "epoch": 72592
},
{
  "id": 47,
  "epoch": 72592
},
{
  "id": 48,
  "epoch": 72592
},
{
  "id": 49,
  "epoch": 72592
},
{
  "id": 50,
  "epoch": 72592
},
{
  "id": 51,
  "epoch": 72592
},
{
  "id": 52,
  "epoch": 72592
},
{
  "id": 53,
  "epoch": 72592
},
{
  "id": 54,
  "epoch": 72592
},
{
  "id": 55,
  "epoch": 72592
},
{
  "id": 56,
  "epoch": 72592
},
{
  "id": 57,
  "epoch": 72592
},
{
  "id": 58,
  "epoch": 72592
},
{
  "id": 59,
  "epoch": 72592
},
{
  "id": 60,
  "epoch": 72592
},
{
  "id": 61,
  "epoch": 72592
},
{
  "id": 62,
  "epoch": 72592
},
{
  "id": 63,
  "epoch": 72592
},
{
  "id": 64,
  "epoch": 72592
},
{
  "id": 65,
  "epoch": 72592
},
{
  "id": 66,
  "epoch": 72592
},
{
  "id": 67,
  "epoch": 72592
},
{
  "id": 68,
  "epoch": 72592
},
{
  "id": 69,
  "epoch": 72592
},
{
  "id": 70,
  "epoch": 72592
},
{
  "id": 71,
  "epoch": 72592
},
{
  "id": 72,
  "epoch": 72592
},
{
  "id": 73,
  "epoch": 72592
},
{
   

[ceph-users] Re: Ceph RBD - High IOWait during the Writes

2020-11-12 Thread athreyavc
From different search results I have read, disabling cephx can help.

Also https://static.linaro.org/connect/san19/presentations/san19-120.pdf
recommended some settings changes for the bluestore cache.

[osd]
bluestore cache autotune = 0
bluestore_cache_kv_ratio = 0.2
bluestore_cache_meta_ratio = 0.8
bluestore rocksdb options =
compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,write_buffer_size=64M,compaction_readahead_size=2M
bluestore_cache_size_hdd = 536870912 # This is size of the Cache on the HDD
osd_min_pg_log_entries = 10
osd_max_pg_log_entries = 10
osd_pg_log_dups_tracked = 10
osd_pg_log_trim_min = 10

But nothing changed much.

It looks like it is mostly an issue with small files; when I tested the same
workload with a 128k or even 64k block size, the results were much better.
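One way to confirm the small-block pattern directly against RBD is rbd bench;
the pool and image names below are placeholders, and it should be run against
a scratch image:

    rbd bench --io-type write --io-pattern rand --io-size 4K --io-threads 16 --io-total 1G <pool>/<test-image>
    rbd bench --io-type write --io-pattern rand --io-size 128K --io-threads 16 --io-total 1G <pool>/<test-image>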

Any suggestions?

Thanks and Regards,

Athreya



On Tue, Nov 10, 2020 at 8:51 PM  wrote:

> Hi,
>
> We have recently deployed a Ceph cluster with
>
> 12 OSD nodes(16 Core + 200GB RAM + 30 disks each of 14TB) Running CentOS 8
> 3 Monitoring Nodes (8 Core + 16GB RAM) Running CentOS 8
>
> We are using Ceph Octopus and we are using RBD block devices.
>
> We have three Ceph client nodes(16core + 30GB RAM, Running CentOS 8)
> across which RBDs are mapped and mounted, 25 RBDs each on each client node.
> Each RBD size is 10TB. Each RBD is formatted as EXT4 file system.
>
> From network side, we have 10Gbps Active/Passive Bond on all the Ceph
> cluster nodes, including the clients. Jumbo frames enabled  and MTU is 9000
>
> This is a new cluster and cluster health reports Ok. But we see high IO
> wait during the writes.
>
> From one of the clients,
>
> 15:14:30     CPU    %user   %nice   %system   %iowait   %steal    %idle
> 15:14:31     all     0.06    0.00      1.00     45.03     0.00    53.91
> 15:14:32     all     0.06    0.00      0.94     41.28     0.00    57.72
> 15:14:33     all     0.06    0.00      1.25     45.78     0.00    52.91
> 15:14:34     all     0.00    0.00      1.06     40.07     0.00    58.86
> 15:14:35     all     0.19    0.00      1.38     41.04     0.00    57.39
> Average:     all     0.08    0.00      1.13     42.64     0.00    56.16
>
> and the system load shows very high
>
> top - 15:19:15 up 34 days, 41 min,  2 users,  load average: 13.49, 13.62,
> 13.83
>
>
> From 'atop'
>
> one of the CPUs shows this
>
> CPU | sys 7% | user 1% | irq 2% | idle 1394% | wait 195% | steal 0% | guest 0% | ipc initial | cycl initial | curf 806MHz | curscal ?%
>
> On the OSD nodes, don't see much %utilization of the disks.
>
> RBD caching values are default.
>
> Are we overlooking some configuration item ?
>
> Thanks and Regards,
>
> At
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io