[ceph-users] Re: Ceph debian/ubuntu packages build

2022-08-10 Thread David Galloway
Well, that sounds like you ended up with actual .debs, so you should 
have been able to just run reprepro after installing it.


Were you trying to then create a repo?

BTW, it may be worthwhile to update the docs to mention reprepro being a 
prerequisite.


https://docs.ceph.com/en/latest/install/build-ceph/

We do use ccache.  If you have enough RAM, you could also build in a 
tmpfs to avoid waiting for your disk.  Believe me, we've done everything 
I'm aware of to speed up the builds.  Ceph's just big.
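Roughly, that kind of setup looks like this; the sizes and paths are just an 
example (and whether the deb build actually picks up ccache depends on how 
your environment wires it in), not a description of our builders:

```
# build in RAM to avoid waiting on the disk; adjust size to your machine
sudo mkdir -p /mnt/ceph-tmpfs
sudo mount -t tmpfs -o size=64G tmpfs /mnt/ceph-tmpfs

# give ccache enough room to hold a full Ceph build
ccache -M 30G

cp -a ~/ceph /mnt/ceph-tmpfs/ceph
cd /mnt/ceph-tmpfs/ceph
./install-deps.sh
./make-debs.sh
```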


Adding mnelson in case he has any other tips.

On 8/10/22 20:20, Zhongzhou Cai wrote:
Is there any way to accelerate that, e.g., using cache? My build failed 
due to a missing package `reprepro`. After I installed it and restarted 
the build, it started from scratch and re-did everything. Any 
optimization would be nice if possible.


Thanks,
Zhongzhou Cai


On Tue, Aug 9, 2022 at 5:35 PM David Galloway wrote:


On average, building Ubuntu packages takes about 1 to 1.5 hours on very
powerful hardware.

https://wiki.sepia.ceph.com/doku.php?id=hardware:braggi

https://wiki.sepia.ceph.com/doku.php?id=hardware:adami


It's a massive project and it has always been that way.

On 8/9/22 20:31, Zhongzhou Cai wrote:
 > Hi,
 >
 > I'm building ceph debian/ubuntu packages, but I found it can take up to
 > hours if I run the ./make-debs.sh script. Looks like it is downloading a lot of
 > things like pip and tox, which I suppose should have been done during the
 > install-deps.sh script? Some of the output ./make-debs.sh shows:
 > ```
 > Installing collected packages: pip
 >    Attempting uninstall: pip
 >      Found existing installation: pip 20.0.2
 >      Uninstalling pip-20.0.2:
 >        Successfully uninstalled pip-20.0.2
 > Successfully installed pip-22.2.2
 > ```
 >
 > I also tried to run ./do_cmake.sh, which takes significantly less time
 > than ./make-debs.sh, even with a debug build. Why is that the case? Or am I
 > missing something?
 >
 > Thanks,
 > Zhongzhou Cai
 > ___
 > ceph-users mailing list -- ceph-users@ceph.io

 > To unsubscribe send an email to ceph-users-le...@ceph.io

 >



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: linux distro requirements for reef

2022-08-10 Thread Reed Dier
I will chime in just from my ubuntu perspective: if I compare previous (LTS) 
releases of ceph to ubuntu releases, there has typically been a two-release cadence 
per ubuntu release.

version   U14  U16  U18  U20  U22  U24
jewel      X    X
luminous   X    X
mimic           X    X
nautilus        X    X
octopus              X    X
pacific              X    X
quincy                    X    I
reef                      I    X
S                              I    I

X is where debs are available, and I is implicit based on history.
So if history were carried forward, it seems like quincy should be the first 
release to drop bionic, and then that would make the S release be the first to 
drop focal.

Selfishly, I would love to see reef continue to be built for focal (as well as 
quincy built for jammy).
This cadence has worked really well for my org in the past, so count me as someone 
who would prefer to keep it, all things being equal.

Reed

> On Aug 10, 2022, at 10:07 AM, Ken Dreyer  wrote:
> 
> Hi folks,
> 
> In the Ceph Leadership Team meeting today we discussed dropping
> support for older distros in our Reef release. CentOS 9 and Ubuntu
> Jammy (22.04) have been out for a while. Recent changes in Ceph's
> main branch will make it easier to require a minimum of CentOS 9 and
> Ubuntu Jammy, with Python 3.9+ and a newer GCC.
> 
> Practically for users, this means we'd stop building and shipping RPMs
> for CentOS 8 and debs for Ubuntu Focal from download.ceph.com. We
> would make this change for the "main" branch soon, and Reef would be
> the first stable release with this change (early-to-mid 2023).
> 
> As usual we would continue to build the older distro's packages (eg
> CentOS 8 and Focal packages) for Quincy and earlier.
> 
> We have not shipped CentOS 9 packages or Ubuntu Jammy for Quincy to
> download.ceph.com yet, and we plan to do that soon.
> 
> - Ken
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Request for Info: bluestore_compression_mode?

2022-08-10 Thread Mark Nelson


On 8/10/22 10:08, Frank Schilder wrote:

Hi Mark.


I actually had no idea that you needed both the yaml option
and the pool option configured

I guess you are referring to cephadm deployments, which I'm not using. In the 
ceph config database, both options must be enabled irrespective of how this 
happens (I separate the application ceph from deployment systems, which may or 
may not have their own logic). There was a longer thread started by me some 
years ago where someone posted a matrix of how both mode settings interact and 
what the resulting mode is.
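
Concretely, by "both options" I mean roughly the following; the pool name and 
values are only an example of how we do it, not a recommendation:

```
# OSD-wide default (the config/"yaml" side):
ceph config set osd bluestore_compression_mode aggressive
ceph config set osd bluestore_compression_algorithm snappy

# per-pool side, which interacts with the OSD default:
ceph osd pool set my_ec_data compression_mode aggressive
ceph osd pool set my_ec_data compression_algorithm snappy
```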


I might have misunderstood what you were saying.  I was in fact 
referring to the yaml config and pool options.  I was under the 
impression that the pool setting overrode whatever was in the yaml and 
you didn't need to sort of chain them to be enabled in both places.  Am 
I mistaken?




Our applications are ceph fs data pools and rbd data pools, all EC pools. This 
places some heavy requirements on the compression methods in order not to kill 
IOPS performance completely. I don't know what your long-term goal with this 
is, whether it is just to simplify some internals or to achieve better storage 
utilisation. However, something like compression of entire files will probably 
kill performance to such a degree that it becomes useless.


Mostly the conversation just started out with how we could clean up some 
of the internals to be less complex.  That led to a discussion of how 
compression was implemented along with blobs early on in bluestore's 
life as part of a big write path overhaul.  That led to further 
questions regarding whether or not people actually use it and whether or 
not it's useful out in the field, hence this conversation. :)  The 
general thought process was questioning whether we might be able to 
handle compression more cleanly higher up in the stack (with the 
trade-off being losing blob level granularity), but we want to be very 
careful since as you say below it could be a lot of effort for little 
gain other than making bluestore simpler (which is a gain to be sure).




I am not sure if you can get much better results out of such changes. It looks like 
you could spend a lot of time on it and gain little. Maybe I can draw your 
attention to a problem that might lead to much more valuable improvements for 
both pool performance and disk utilisation. This is going to be a bit longer, 
as I need to go into details. I hope you find the time to read on.

Ceph has a couple of problems with its EC implementation. One problem that I 
have never seen discussed so far is the inability of its data stores to perform 
tail merging. I have opened a feature request 
(https://tracker.ceph.com/issues/56949) that describes the symptom and only 
requests a way to account for the excess usage. In the last sentence I mention 
that tail merging would be the real deal.

The example given there shows how extreme the problem can become. Here is 
the situation as of today while running my benchmark:

status usage: Filesystem Size  Used Avail Use% 
Mounted on
status usage: 10.41.24.13,10.41.24.14,10.41.24.15:/data  2.0T  276G  1.8T  14% 
/mnt/cephfs
status usage: 10.41.24.13,10.41.24.14,10.41.24.15:/  2.5T  2.1T  419G  84% 
/mnt/adm/cephfs

Only /data contains any data. The first line shows ceph.dir.rbytes=276G 
while the second line shows the pool usage of 2.1T. The discrepancy due to small 
files is more than a factor of 7. Compression is enabled, but you won't gain much 
here because most files are below compression_min_blob_size.
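
For reference, this is roughly how one can check what compression actually 
achieves and which thresholds apply (the _hdd option has an _ssd sibling, and 
the exact columns depend on the Ceph version):

```
ceph df detail        # per-pool USED COMPR / UNDER COMPR columns
ceph config get osd bluestore_compression_min_blob_size_hdd
ceph config get osd bluestore_compression_required_ratio
```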

I know that the "solution" to this (and the EC overwrite amplification problem) 
was chosen to be bluestore_min_alloc_size=4K for all types of devices, which comes with 
its own problems due to the huge RocksDB required and was therefore postponed in 
octopus. I wonder how this will work on our 18TB hard drives. I personally am not 
convinced that this is a path to success and, while it reduces the problem of not having 
tail merging, it does not really remove the need for tail merging. Even with a k=8 EC 
profile, 4*8=32K is quite a large unit of atomicity. On geo-replicated EC pools even 
larger values of k are the standard.


Yep, 4K is way better than what we had before with the 64K min_alloc 
size, but the seldom talked about reality is that if you primarily 
have small (say <8-16K) objects you might want to look at whether or not 
you are actually gaining anything with EC vs replication with the 
current implementation.
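
To put a rough number on that; this is purely back-of-the-envelope, not a 
measurement, and it assumes a k=8,m=2 profile, 4K min_alloc_size, one 
allocation unit per shard, and no tail merging:

```
# space used by one small object under EC vs 3x replication
obj=4096; k=8; m=2; alloc=4096
chunk=$(( (obj + k - 1) / k ))                          # 512 bytes per data shard
per_shard=$(( ((chunk + alloc - 1) / alloc) * alloc ))  # rounded up to 4096
echo "EC k=8,m=2: $(( per_shard * (k + m) )) bytes allocated"   # 40960
echo "3x replica: $(( obj * 3 )) bytes allocated"               # 12288
```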




Are there any discussions and/or ideas on how to address this? ... in different 
ways?

There was also a discussion about de-duplication. Is there any news in this 
direction?


I haven't seen a lot of movement specifically on the EC and de-dup 
fronts, but it's possible that someone is working on them and I'm not in 
the loop.  Going to punt on these.





The following is speculative, based on incomplete knowledge:

An idea I would consider worthwhile is a separation 

[ceph-users] Re: linux distro requirements for reef

2022-08-10 Thread Marc


> 
> The immediate driver is both a switch to newer versions of python, and
> to
> newer compilers supporting more C++20 features.

But this has been known for 'decades'; don't you incorporate this into your long-term 
development planning? It is not as if you are making some useless temporary 
photo or messaging app. Some projects choose not to use python at all, just because of 
this. 
It sounds a bit like stepping into a pool of water on purpose, and then complaining 
that you get wet from it.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Multi-active MDS cache pressure

2022-08-10 Thread Eugen Block

Hi,


This thread contains some really insightful information. Thanks Eugen for
sharing the explanation by the SUSE team. Definitely the doc can be updated
with this, it might help a lot of people indeed.
Can you help create a tracker for this? I wish to add the info to the doc and
push a PR for the same.


I agree, it's really valuable information. I'm quite busy this week  
but I'd be happy to create the tracker tomorrow or Friday.
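
For anyone who just wants the change we ended up with (the reasoning is in the 
quoted explanation below), it boils down to something like this; the 3000 is 
what worked for our cluster, not a general recommendation:

```
ceph config set mds mds_recall_max_caps 3000
# the other knobs mentioned in the explanation below, for reference:
ceph config get mds mds_recall_warning_threshold
ceph config get mds mds_session_cache_liveness_magnitude
```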



Quoting Dhairya Parmar:


Hi there,

This thread contains some really insightful information. Thanks Eugen for
sharing the explanation by the SUSE team. Definitely the doc can be updated
with this, it might help a lot of people indeed.
Can you help create a tracker for this? I wish to add the info to the doc and
push a PR for the same.

On Wed, Aug 10, 2022 at 1:45 AM Malte Stroem  wrote:


Hello Eugen,

thank you very much for the full explanation.

This fixed our cluster and I am sure this helps a lot of people around
the world since this is a problem occurring everywhere.

I think this should be added to the documentation:

https://docs.ceph.com/en/latest/cephfs/cache-configuration/#mds-recall

or better:


https://docs.ceph.com/en/quincy/cephfs/health-messages/#mds-client-recall-mds-health-client-recall-many

Best wishes!
Malte

On 09.08.22 16:34, Eugen Block wrote:
> Hi,
>
>> did you have some success with modifying the mentioned values?
>
> yes, the SUSE team helped identifying the issue, I can share the
> explanation:
>
> ---snip---
> Every second (mds_cache_trim_interval config param) the mds is running
> "cache trim" procedure. One of the steps of this procedure is "recall
> client state". During this step it checks every client (session) if it
> needs to recall caps. There are several criteria for this:
>
> 1) the cache is full (exceeds mds_cache_memory_limit) and needs some
> inodes to be released;
> 2) the client exceeds mds_max_caps_per_client (1M by default);
> 3) the client is inactive.
>
> To determine whether a client (session) is inactive, the session's cache_liveness
> parameter is checked and compared with the value:
>
>(num_caps >> mds_session_cache_liveness_magnitude)
>
> where mds_session_cache_liveness_magnitude is a config param (10 by
> default).
> If cache_liveness is smaller than this calculated value the session is
> considered inactive and the mds sends "recall caps" request for all
> cached caps (actually the recall value is `num_caps -
> mds_min_caps_per_client(100)`).
>
> And if the client is not releasing the caps fast, the next second it
> repeats again, i.e. the mds will send "recall caps" with high value
> again and so on and the "total" counter of "recall caps" for the session
> will grow, eventually exceeding the mon warning limit.
> There is a throttling mechanism, controlled by
> mds_recall_max_decay_threshold parameter (126K by default), which should
> reduce the rate of "recall caps" counter growth but it looks like it is
> not enough for this case.
>
>  From the collected sessions, I see that during that 30 minute period
> the total num_caps for that client decreased by about 3500.
> ...
> Here is an example. A client has 20k caps cached. At some moment
> the server decides the client is inactive (because the session's
> cache_liveness value is low). It starts to ask the client to release
> caps down to the mds_min_caps_per_client value (100 by default). For this,
> every second it sends recall_caps asking to release `caps_num -
> mds_min_caps_per_client` caps (but not more than `mds_recall_max_caps`,
> which is 30k by default). A client is starting to release, but is
> releasing with a rate e.g. only 100 caps per second.
>
> So in the first second the mds sends recall_caps = 20k - 100
> the second second recall_caps = (20k - 100) - 100
> the third second recall_caps = (20k - 200) - 100
> and so on
>
> And every time it sends recall_caps it updates the session's recall_caps
> value, which is calculated as how many recall_caps were sent in the last
> minute. I.e. the counter is growing quickly, eventually exceeding
> mds_recall_warning_threshold, which is 128K by default, and ceph starts
> to report "failing to respond to cache pressure" warning in the status.
>
> Now, after we set mds_recall_max_caps to 3K, in this situation the mds
> server sends only 3K recall_caps per second, and the maximum value the
> session's recall_caps value may have (if the mds is sending 3K every
> second for at least one minute) is 60 * 3K = 180K. I.e. it is still
> possible to achieve mds_recall_warning_threshold but only if a client is
> not "responding" for long period, and as your experiments show it is not
> the case.
> ---snip---
>
> So what helped us here was to decrease mds_recall_max_caps in 1k steps,
> starting with 1. This didn't reduce the warnings so I decreased it
> to 3000 and I haven't seen those warnings since then. Also I decreased
> the mds_cache_memory_limit again, it wasn't helping here.
>
> Regards,
> Eugen
>
>
> Zitat von Malte 

[ceph-users] Re: linux distro requirements for reef

2022-08-10 Thread Gregory Farnum
The immediate driver is both a switch to newer versions of python, and to
newer compilers supporting more C++20 features.

More generally, supporting multiple versions of a distribution is a lot of
work and when Reef comes out next year, CentOS9 will be over a year old. We
generally move new stable releases to the newest long-term release of any
distro we package for. That means CentOS9 for Reef.

We aren’t dropping any distros for Quincy, of course, which is our current
stable release.
-Greg

On Wed, Aug 10, 2022 at 10:27 AM Konstantin Shalygin  wrote:

> Ken, can you please describe what incompatibilities or dependencies are
> causing packages not to be built for c8s? It's not obvious from the first
> message, from the community side.
>
>
> Thanks,
> k
>
> Sent from my iPhone
>
> > On 10 Aug 2022, at 20:02, Ken Dreyer  wrote:
> >
> > On Wed, Aug 10, 2022 at 11:35 AM Konstantin Shalygin 
> wrote:
> >>
> >> Hi Ken,
> >>
> >> CentOS 8 Stream will continue to receive packages or have some barrires
> for R?
> >
> > No, starting with Reef, we will no longer build nor ship RPMs for
> > CentOS 8 Stream (and debs for Ubuntu Focal) from download.ceph.com.
> > The only CentOS Stream version for Reef+ will be CentOS 9 Stream.
> >
> > - Ken
> >
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: linux distro requirements for reef

2022-08-10 Thread Konstantin Shalygin
Ken, can you please describe what incompatibilities or dependencies are causing 
packages not to be built for c8s? It's not obvious from the first message, from 
the community side.


Thanks,
k

Sent from my iPhone

> On 10 Aug 2022, at 20:02, Ken Dreyer  wrote:
> 
> On Wed, Aug 10, 2022 at 11:35 AM Konstantin Shalygin  wrote:
>> 
>> Hi Ken,
>> 
> >> CentOS 8 Stream will continue to receive packages or have some barriers for 
> >> R?
> 
> No, starting with Reef, we will no longer build nor ship RPMs for
> CentOS 8 Stream (and debs for Ubuntu Focal) from download.ceph.com.
> The only CentOS Stream version for Reef+ will be CentOS 9 Stream.
> 
> - Ken
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: linux distro requirements for reef

2022-08-10 Thread Ken Dreyer
On Wed, Aug 10, 2022 at 11:35 AM Konstantin Shalygin  wrote:
>
> Hi Ken,
>
> > CentOS 8 Stream will continue to receive packages or have some barriers for R?

No, starting with Reef, we will no longer build nor ship RPMs for
CentOS 8 Stream (and debs for Ubuntu Focal) from download.ceph.com.
The only CentOS Stream version for Reef+ will be CentOS 9 Stream.

- Ken

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: linux distro requirements for reef

2022-08-10 Thread Konstantin Shalygin
Hi Ken,

CentOS 8 Stream will continue to receive packages or have some barriers for R?


Thanks,
k

Sent from my iPhone

> On 10 Aug 2022, at 18:08, Ken Dreyer  wrote:
> 
> Hi folks,
> 
> In the Ceph Leadership Team meeting today we discussed dropping
> support for older distros in our Reef release. CentOS 9 and Ubuntu
> Jammy (22.04) have been out for a while. Recent changes in Ceph's
> main branch will make it easier to require a minimum of CentOS 9 and
> Ubuntu Jammy, with Python 3.9+ and a newer GCC.
> 
> Practically for users, this means we'd stop building and shipping RPMs
> for CentOS 8 and debs for Ubuntu Focal from download.ceph.com. We
> would make this change for the "main" branch soon, and Reef would be
> the first stable release with this change (early-to-mid 2023).
> 
> As usual we would continue to build the older distro's packages (eg
> CentOS 8 and Focal packages) for Quincy and earlier.
> 
> We have not shipped CentOS 9 packages or Ubuntu Jammy for Quincy to
> download.ceph.com yet, and we plan to do that soon.
> 
> - Ken
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] linux distro requirements for reef

2022-08-10 Thread Ken Dreyer
Hi folks,

In the Ceph Leadership Team meeting today we discussed dropping
support for older distros in our Reef release. CentOS 9 and Ubuntu
Jammy (22.04) have been out for a while. Recent changes in Ceph's
main branch will make it easier to require a minimum of CentOS 9 and
Ubuntu Jammy, with Python 3.9+ and a newer GCC.

Practically for users, this means we'd stop building and shipping RPMs
for CentOS 8 and debs for Ubuntu Focal from download.ceph.com. We
would make this change for the "main" branch soon, and Reef would be
the first stable release with this change (early-to-mid 2023).

As usual we would continue to build the older distro's packages (eg
CentOS 8 and Focal packages) for Quincy and earlier.

We have not shipped CentOS 9 packages or Ubuntu Jammy for Quincy to
download.ceph.com yet, and we plan to do that soon.

- Ken

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.9 High rate of Segmentation fault on ceph-osd processes

2022-08-10 Thread Paul JURCO
Hi!
Everything was restarted as required by the upgrade plan, in the proper order, and all
software was upgraded on all nodes. We are on Ubuntu 18 (all nodes).
"ceph versions" output shows everything is on "16.2.9".
Thank you!

-- 
Paul Jurco


On Wed, Aug 10, 2022 at 5:43 PM Eneko Lacunza  wrote:

> Hi Paul,
>
> Did you restart OSDs after upgrading to 16.2.9 (you can just check with
> "ceph versions") ?
>
> All crashes show similar backtrace with BlueStore::Onode::put() ?
>
> Cheers
>
> El 10/8/22 a las 14:22, Paul JURCO escribió:
>
> Hi,
> We have two similar clusters in number of hosts and disks, about the same
> age with pacific 16.2.9.
> Both have a mix of hosts with 1TB and 2TB  disks (disks' capacity is not
> mixed on hosts for OSDs).
> One of the clusters has 21 osd process crashes in the last 7 days, the
> other has just 3.
> Full stack as reported in ceph-osd log:
>
> 2022-08-03T06:39:30.987+0300 7f118dd4d700 -1 *** Caught signal (Segmentation
> fault) **
>
>  in thread 7f118dd4d700 thread_name:tp_osd_tp
>
>
>  ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific
> (stable)
>
>  1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7f11b3ad2980]
>
>  2: (ceph::buffer::v15_2_0::ptr::release()+0x2d) [0x558cc32238ad]
>
>  3: (BlueStore::Onode::put()+0x1bc) [0x558cc2ea321c]
>
>  4: (BlueStore::getattr(boost::intrusive_ptr&,
> ghobject_t const&, char const*, ceph::buffer::v15_2_0::ptr&)+0x275)
> [0x558cc2ed1525]
>
>  5: (PGBackend::objects_get_attr(hobject_t const&,
> std::__cxx11::basic_string,
> std::allocator > const&, ceph::buffer::v15_2_0::list*)+0xc7)
> [0x558cc2b97ca7]
>
>  6: (PrimaryLogPG::get_snapset_context(hobject_t const&, bool,
> std::map,
> std::allocator >, ceph::buffer::v15_2_0::list,
> std::less,
> std::allocator > >,
> std::allocator std::char_traits, std::allocator > const,
> ceph::buffer::v15_2_0::list> > > const*, bool)+0x3bd) [0x558cc2ade74d]
>
>  7: (PrimaryLogPG::get_object_context(hobject_t const&, bool,
> std::map,
> std::allocator >, ceph::buffer::v15_2_0::list,
> std::less,
> std::allocator > >,
> std::allocator std::char_traits, std::allocator > const,
> ceph::buffer::v15_2_0::list> > > const*)+0x328) [0x558cc2adee48]
>
>  8: (PrimaryLogPG::find_object_context(hobject_t const&,
> std::shared_ptr*, bool, bool, hobject_t*)+0x20f)
> [0x558cc2aeac3f]
>
>  9: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x2661)
> [0x558cc2b353f1]
>
>  10: (PrimaryLogPG::do_request(boost::intrusive_ptr&,
> ThreadPool::TPHandle&)+0xcc7) [0x558cc2b42347]
>
>  11: (OSD::dequeue_op(boost::intrusive_ptr,
> boost::intrusive_ptr, ThreadPool::TPHandle&)+0x17b)
> [0x558cc29c6d9b]
>
>  12: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*,
> boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x6a) [0x558cc2c29b9a]
>
>  13: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0xd1e) [0x558cc29e4dbe]
>
>  14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac)
> [0x558cc306a75c]
>
>  15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x558cc306dc20]
>
>  16: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f11b3ac76db]
>
>  17: clone()
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
>
> There was one bug fixed in 16.2.8 related to BlueStore::Onode::put():
> "os/bluestore: avoid premature onode release (pr#44723, Igor Fedotov)"
> and in the tracker: https://tracker.ceph.com/issues/53608
> Is this segfault related to the bug? Is it new?
>
> We upgraded in May '22 from 15.2.13 to 16.2.8 and, two days later, to
> 16.2.9 on the cluster with the crashes.
> 6 seg faults are on 2TB disks, 8 are on 1TB disks. The 2TB disks are newer (below
> 2 years old).
> Could it be related to hardware?
> Thank you!
>
>
> Eneko Lacunza
> Zuzendari teknikoa | Director técnico
> Binovo IT Human Project
>
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 16.2.9 High rate of Segmentation fault on ceph-osd processes

2022-08-10 Thread Paul JURCO
Hi,
We have two similar clusters in number of hosts and disks, about the same
age with pacific 16.2.9.
Both have a mix of hosts with 1TB and 2TB  disks (disks' capacity is not
mixed on hosts for OSDs).
One of the clusters has 21 osd process crashes in the last 7 days, the
other has just 3.
Full stack as reported in ceph-osd log:

2022-08-03T06:39:30.987+0300 7f118dd4d700 -1 *** Caught signal (Segmentation
fault) **

 in thread 7f118dd4d700 thread_name:tp_osd_tp


 ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific
(stable)

 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7f11b3ad2980]

 2: (ceph::buffer::v15_2_0::ptr::release()+0x2d) [0x558cc32238ad]

 3: (BlueStore::Onode::put()+0x1bc) [0x558cc2ea321c]

 4: (BlueStore::getattr(boost::intrusive_ptr&,
ghobject_t const&, char const*, ceph::buffer::v15_2_0::ptr&)+0x275)
[0x558cc2ed1525]

 5: (PGBackend::objects_get_attr(hobject_t const&,
std::__cxx11::basic_string,
std::allocator > const&, ceph::buffer::v15_2_0::list*)+0xc7)
[0x558cc2b97ca7]

 6: (PrimaryLogPG::get_snapset_context(hobject_t const&, bool,
std::map,
std::allocator >, ceph::buffer::v15_2_0::list,
std::less,
std::allocator > >,
std::allocator, std::allocator > const,
ceph::buffer::v15_2_0::list> > > const*, bool)+0x3bd) [0x558cc2ade74d]

 7: (PrimaryLogPG::get_object_context(hobject_t const&, bool,
std::map,
std::allocator >, ceph::buffer::v15_2_0::list,
std::less,
std::allocator > >,
std::allocator, std::allocator > const,
ceph::buffer::v15_2_0::list> > > const*)+0x328) [0x558cc2adee48]

 8: (PrimaryLogPG::find_object_context(hobject_t const&,
std::shared_ptr*, bool, bool, hobject_t*)+0x20f)
[0x558cc2aeac3f]

 9: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x2661)
[0x558cc2b353f1]

 10: (PrimaryLogPG::do_request(boost::intrusive_ptr&,
ThreadPool::TPHandle&)+0xcc7) [0x558cc2b42347]

 11: (OSD::dequeue_op(boost::intrusive_ptr,
boost::intrusive_ptr, ThreadPool::TPHandle&)+0x17b)
[0x558cc29c6d9b]

 12: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*,
boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x6a) [0x558cc2c29b9a]

 13: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0xd1e) [0x558cc29e4dbe]

 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac)
[0x558cc306a75c]

 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x558cc306dc20]

 16: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f11b3ac76db]

 17: clone()
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

There was one bug fixed in 16.2.8 related to BlueStore::Onode::put():
"os/bluestore: avoid premature onode release (pr#44723, Igor Fedotov)"
and in the tracker: https://tracker.ceph.com/issues/53608
Is this segfault related to the bug? Is it new?

We upgraded in May '22 from 15.2.13 to 16.2.8 and, two days later, to
16.2.9 on the cluster with the crashes.
6 seg faults are on 2TB disks, 8 are on 1TB disks. The 2TB disks are newer (below
2 years old).
Could it be related to hardware?
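
For completeness, the same backtraces can also be pulled from the cluster via
the crash module; <crash-id> below is just a placeholder:

```
ceph crash ls                # recent daemon crashes with their IDs
ceph crash info <crash-id>   # full backtrace for one crash
ceph crash stat              # crash counts per day
```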
Thank you!
-- 
Paul Jurco
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Multi-active MDS cache pressure

2022-08-10 Thread Dhairya Parmar
Hi there,

This thread contains some really insightful information. Thanks Eugen for
sharing the explanation by the SUSE team. Definitely the doc can be updated
with this, it might help a lot of people indeed.
Can you help create a tracker for this? I wish to add the info to the doc and
push a PR for the same.

On Wed, Aug 10, 2022 at 1:45 AM Malte Stroem  wrote:

> Hello Eugen,
>
> thank you very much for the full explanation.
>
> This fixed our cluster and I am sure this helps a lot of people around
> the world since this is a problem occurring everywhere.
>
> I think this should be added to the documentation:
>
> https://docs.ceph.com/en/latest/cephfs/cache-configuration/#mds-recall
>
> or better:
>
>
> https://docs.ceph.com/en/quincy/cephfs/health-messages/#mds-client-recall-mds-health-client-recall-many
>
> Best wishes!
> Malte
>
> On 09.08.22 16:34, Eugen Block wrote:
> > Hi,
> >
> >> did you have some success with modifying the mentioned values?
> >
> > yes, the SUSE team helped identifying the issue, I can share the
> > explanation:
> >
> > ---snip---
> > Every second (mds_cache_trim_interval config param) the mds is running
> > "cache trim" procedure. One of the steps of this procedure is "recall
> > client state". During this step it checks every client (session) if it
> > needs to recall caps. There are several criteria for this:
> >
> > 1) the cache is full (exceeds mds_cache_memory_limit) and needs some
> > inodes to be released;
> > 2) the client exceeds mds_max_caps_per_client (1M by default);
> > 3) the client is inactive.
> >
> > To determine whether a client (session) is inactive, the session's cache_liveness
> > parameter is checked and compared with the value:
> >
> >(num_caps >> mds_session_cache_liveness_magnitude)
> >
> > where mds_session_cache_liveness_magnitude is a config param (10 by
> > default).
> > If cache_liveness is smaller than this calculated value the session is
> > considered inactive and the mds sends "recall caps" request for all
> > cached caps (actually the recall value is `num_caps -
> > mds_min_caps_per_client(100)`).
> >
> > And if the client is not releasing the caps fast, the next second it
> > repeats again, i.e. the mds will send "recall caps" with high value
> > again and so on and the "total" counter of "recall caps" for the session
> > will grow, eventually exceeding the mon warning limit.
> > There is a throttling mechanism, controlled by
> > mds_recall_max_decay_threshold parameter (126K by default), which should
> > reduce the rate of "recall caps" counter growth but it looks like it is
> > not enough for this case.
> >
> >  From the collected sessions, I see that during that 30 minute period
> > the total num_caps for that client decreased by about 3500.
> > ...
> > Here is an example. A client has 20k caps cached. At some moment
> > the server decides the client is inactive (because the session's
> > cache_liveness value is low). It starts to ask the client to release
> > caps down to the mds_min_caps_per_client value (100 by default). For this,
> > every second it sends recall_caps asking to release `caps_num -
> > mds_min_caps_per_client` caps (but not more than `mds_recall_max_caps`,
> > which is 30k by default). A client is starting to release, but is
> > releasing with a rate e.g. only 100 caps per second.
> >
> > So in the first second the mds sends recall_caps = 20k - 100
> > the second second recall_caps = (20k - 100) - 100
> > the third second recall_caps = (20k - 200) - 100
> > and so on
> >
> > And every time it sends recall_caps it updates the session's recall_caps
> > value, which is calculated as how many recall_caps were sent in the last
> > minute. I.e. the counter is growing quickly, eventually exceeding
> > mds_recall_warning_threshold, which is 128K by default, and ceph starts
> > to report "failing to respond to cache pressure" warning in the status.
> >
> > Now, after we set mds_recall_max_caps to 3K, in this situation the mds
> > server sends only 3K recall_caps per second, and the maximum value the
> > session's recall_caps value may have (if the mds is sending 3K every
> > second for at least one minute) is 60 * 3K = 180K. I.e. it is still
> > possible to achieve mds_recall_warning_threshold but only if a client is
> > not "responding" for long period, and as your experiments show it is not
> > the case.
> > ---snip---
> >
> > So what helped us here was to decrease mds_recall_max_caps in 1k steps,
> > starting with 1. This didn't reduce the warnings so I decreased it
> > to 3000 and I haven't seen those warnings since then. Also I decreased
> > the mds_cache_memory_limit again, it wasn't helping here.
> >
> > Regards,
> > Eugen
> >
> >
> > Quoting Malte Stroem:
> >
> >> Hello Eugen,
> >>
> >> did you have some success with modifying the mentioned values?
> >>
> >> Or some others from:
> >>
> >> https://docs.ceph.com/en/latest/cephfs/cache-configuration/
> >>
> >> Best,
> >> Malte
> >>
> >> Am 15.06.22 um 14:12 schrieb Eugen 

[ceph-users] Re: [Ceph-maintainers] Re: Re: v15.2.17 Octopus released

2022-08-10 Thread Ilya Dryomov
On Wed, Aug 10, 2022 at 3:03 AM Laura Flores  wrote:
>
> Hey Satoru and others,
>
> Try this link:
> https://ceph.io/en/news/blog/2022/v15-2-17-octopus-released/

Note that this release also includes the fix for CVE-2022-0670 [1]
(same as in v16.2.10 and v17.2.2 hotfix releases).  I have updated the
release notes PR [2] accordingly and we will update the blog entry
later today.

[1] https://docs.ceph.com/en/latest/security/CVE-2022-0670
[2] https://github.com/ceph/ceph/pull/47198

Thanks,

Ilya

>
> - Laura
>
> On Tue, Aug 9, 2022 at 7:44 PM Satoru Takeuchi  
> wrote:
>>
>> Hi,
>>
>> On Wed, Aug 10, 2022 at 7:00 David Galloway wrote:
>> >
>> > We're happy to announce the 17th and final backport release in the
>> > Octopus series. For detailed release notes with links & changelog
>> > please refer to the official blog entry at
>> > https://ceph.io/en/news/blog/2022/v15-2-17-RELEASE-released
>>
>> The link to the blog entry seems to be wrong. s/RELEASE/octopus/
>>
>> https://ceph.io/en/news/blog/2022/v15-2-17-octopus-released/
>>
>> Satoru
>>
>>
>> >
>> > Notable Changes
>> > ---
>> >
>> > * Octopus modified the SnapMapper key format from
>> > __
>> > to
>> > ___
>> >
>> > When this change was introduced, 94ebe0e also introduced a conversion
>> > with a crucial bug which essentially destroyed legacy keys by mapping
>> > them to  without the object-unique suffix. The
>> > conversion is fixed in this release. Relevant tracker:
>> > https://tracker.ceph.com/issues/5614
>> >
>> > * The ability to blend all RBD pools together into a single view by
>> > invoking "rbd perf image iostat" or "rbd perf image iotop" commands
>> > without any options or positional arguments is resurrected. Such
>> > invocations accidentally became limited to just the default pool
>> > (rbd_default_pool) in v15.2.14.
>> >
>> > Getting Ceph
>> > 
>> > * Git at git://github.com/ceph/ceph.git
>> > * Tarball at https://download.ceph.com/tarballs/ceph-15.2.17.tar.gz
>> > * Containers at https://quay.io/repository/ceph/ceph
>> > * For packages, see https://docs.ceph.com/docs/master/install/get-packages/
>> > * Release git sha1: 8a82819d84cf884bd39c17e3236e0632ac146dc4
>> >
>> > ___
>> > Dev mailing list -- d...@ceph.io
>> > To unsubscribe send an email to dev-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage
>
> Red Hat Inc.
>
> La Grange Park, IL
>
> lflo...@redhat.com
> M: +17087388804
>
>
> ___
> Ceph-maintainers mailing list -- ceph-maintain...@ceph.io
> To unsubscribe send an email to ceph-maintainers-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph debian/ubuntu packages

2022-08-10 Thread Marc




find / -mtime -1 

?
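
or, a bit more targeted (assuming the build leaves plain .deb files behind):

```
find / -name '*.deb' -mmin -120 2>/dev/null   # .debs written in the last two hours
```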

> 
> I have a naive question: after I run ./make-debs.sh to build
> debian/ubuntu
> packages, where can I find those generated artifacts?
> 


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS: permissions of the .snap directory do not inherit ACLs

2022-08-10 Thread Robert Sander

On 09.08.22 22:31, Patrick Donnelly wrote:


It sounds like a bug. Please create a tracker ticket with details
about your environment and an example.



Just created https://tracker.ceph.com/issues/57084

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph drops privilege before creating /var/run/ceph

2022-08-10 Thread Marc


> 
> I've built a ceph container image based on ubuntu and used rook to
> install
> ceph in my GKE cluster, but I found in the ceph-mon log that the run-dir
> is
> not created:
> warning: unable to create /var/run/ceph: (13) Permission denied
> debug 2022-08-05T00:38:06.472+ 7f0960c2c540 -1 asok(0x56213ef7e000)
> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed
> to
> bind the UNIX domain socket to '/var/run/ceph/ceph-mon.a.asok': (2) No
> such
> file or directory
> 
> I looked into the ceph/ceph source code. It turns out that we drop
> privilege before we create /var/run/ceph, which might explain why the
> run-dir creation failed.

Drop privilege? I assumed this container is just running as a regular user. Is 
this not the case?
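
If it is running as a regular user, the usual workaround I have seen is to create 
the run dir with the right owner before the daemon starts, something like this 
(the ceph user/group is an assumption about your image):

```
# run as root at image build time or from an init container
install -d -m 0755 -o ceph -g ceph /var/run/ceph
```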
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io