It looks like your radosgw is linking against a different version of librados. In the backtrace, the first informative frame begins with:

librados::v14_2_0

when it should be v15_2_0, matching the ceph::buffer::v15_2_0 in the same frame.

Is there an old librados lying around that didn't get cleaned up somehow?
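
A quick way to check is to compare what the binary links against with what the live process actually has mapped. A minimal sketch, assuming the stock package path /usr/bin/radosgw (adjust to your install):

# libraries the binary would resolve at load time
ldd /usr/bin/radosgw | grep librados

# libraries the running process actually has mapped
grep librados /proc/$(pidof radosgw)/maps

# librados packages installed on the Ubuntu hosts
dpkg -l | grep librados

If the maps show a leftover copy of librados.so.2 from an earlier release, that would explain the mismatched namespace in the backtrace.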

Daniel



On 1/28/21 7:27 AM, Andrei Mikhailovsky wrote:
Hello,

I am experiencing very frequent crashes of the radosgw service, multiple times every hour; over the last 12 hours alone we've had 35 crashes. Has anyone seen similar behaviour with the Octopus release of radosgw? More info below:
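
For reference, the crash count comes from the cluster's crash module; something along these lines reproduces it (the date is an example, adjust as needed):

# list recorded crashes; each ID begins with the UTC timestamp
ceph crash ls

# rough count of crashes recorded on a given day
ceph crash ls | grep -c 2021-01-28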

The radosgw service is running on two Ubuntu servers. I have tried upgrading the OS on one of them to Ubuntu 20.04 with the latest updates; the second server is still running Ubuntu 18.04. Both services crash occasionally, but the one running on Ubuntu 20.04 seems to crash far more often. The ceph cluster itself is pretty old, initially set up around 2013, and it has been updated regularly with every major release. Currently I've got Octopus 15.2.8 running on all osd, mon, mgr and radosgw servers.
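
For completeness, this can be cross-checked per daemon type; a quick look, assuming it is run from a node with an admin keyring:

# reports the running daemon versions, broken down by daemon type
ceph versions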

Crash Backtrace:

ceph crash info 2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8 | less
{
    "backtrace": [
        "(()+0x46210) [0x7f815a49a210]",
        "(gsignal()+0xcb) [0x7f815a49a18b]",
        "(abort()+0x12b) [0x7f815a479859]",
        "(()+0x9e951) [0x7f8150ee9951]",
        "(()+0xaa47c) [0x7f8150ef547c]",
        "(()+0xaa4e7) [0x7f8150ef54e7]",
        "(()+0xaa799) [0x7f8150ef5799]",
        "(()+0x344ba) [0x7f815a1404ba]",
        "(()+0x71e04) [0x7f815a17de04]",
        "(librados::v14_2_0::IoCtx::nobjects_begin(librados::v14_2_0::ObjectCursor const&, ceph::buffer::v15_2_0::list const&)+0x5d) [0x7f815a18c7bd]",
        "(RGWSI_RADOS::Pool::List::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWAccessListFilter*)+0x115) [0x7f815b0d9935]",
        "(RGWSI_SysObj_Core::pool_list_objects_init(rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWSI_SysObj::Pool::ListCtx*)+0x255) [0x7f815abd7035]",
        "(RGWSI_MetaBackend_SObj::list_init(RGWSI_MetaBackend::Context*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x206) [0x7f815b0ccfe6]",
        "(RGWMetadataHandler_GenericMetaBE::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x41) [0x7f815ad23201]",
        "(RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x71) [0x7f815ad254d1]",
        "(AsyncMetadataList::_send_request()+0x9b) [0x7f815b13c70b]",
        "(RGWAsyncRadosProcessor::handle_request(RGWAsyncRadosRequest*)+0x25) [0x7f815ae60f25]",
        "(RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)+0x11) [0x7f815ae69401]",
        "(ThreadPool::worker(ThreadPool::WorkThread*)+0x5bb) [0x7f81517b072b]",
        "(ThreadPool::WorkThread::entry()+0x15) [0x7f81517b17f5]",
        "(()+0x9609) [0x7f815130d609]",
        "(clone()+0x43) [0x7f815a576293]"
    ],
    "ceph_version": "15.2.8",
    "crash_id": "2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8",
    "entity_name": "client.radosgw1.gateway",
    "os_id": "ubuntu",
    "os_name": "Ubuntu",
    "os_version": "20.04.1 LTS (Focal Fossa)",
    "os_version_id": "20.04",
    "process_name": "radosgw",
    "stack_sig": "347474f09a756104ac2bb99d80e0c1fba3e9dc6f26e4ef68fe55946c103b274a",
    "timestamp": "2021-01-28T11:36:48.912771Z",
    "utsname_hostname": "arh-ibstorage1-ib",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-64-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#72-Ubuntu SMP Fri Jan 15 10:27:54 UTC 2021"
}





radosgw.log file (file names were redacted):


-25> 2021-01-28T11:36:48.794+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-u115134.JPG HTTP/1.1" 400 460 - -
-24> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== starting new request req=0x7f80437f5780 =====
-23> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s initializing for trans_id = tx000000000000000001431-006012a1d0-31197b5c-default
-22> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s getting op 1
-21> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj verifying requester
-20> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj normalizing buckets and tenants
-19> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj init permissions
-18> 2021-01-28T11:36:48.814+0000 7f80437fe700 0 req 5169 0s NOTICE: invalid dest placement: default-placement/REDUCED_REDUNDANCY
-17> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 op->ERRORHANDLER: err_no=-22 new_err_no=-22
-16> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj op status=0
-15> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj http status=400
-14> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== req done req=0x7f80437f5780 op status=0 http_status=400 latency=0s ======
-13> 2021-01-28T11:36:48.822+0000 7f80437fe700 1 civetweb: 0x7f814c0cf9e8: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-d20201223-u115132.JPG HTTP/1.1" 400 460 - -
-12> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== starting new request req=0x7f8043ff6780 =====
-11> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s initializing for trans_id = tx000000000000000001432-006012a1d0-31197b5c-default
-10> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s getting op 1
-9> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj verifying requester
-8> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj normalizing buckets and tenants
-7> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj init permissions
-6> 2021-01-28T11:36:48.878+0000 7f8043fff700 0 req 5170 0s NOTICE: invalid dest placement: default-placement/REDUCED_REDUNDANCY
-5> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 op->ERRORHANDLER: err_no=-22 new_err_no=-22
-4> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj op status=0
-3> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj http status=400
-2> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== req done req=0x7f8043ff6780 op status=0 http_status=400 latency=0s ======
-1> 2021-01-28T11:36:48.886+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-223-u115136.JPG HTTP/1.1" 400 460 - -
0> 2021-01-28T11:36:48.910+0000 7f8128ff9700 -1 *** Caught signal (Aborted) **
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 deferred set uid:gid to 64045:64045 (ceph:ceph)
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process radosgw, pid 30417
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework: civetweb
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework conf key: port, val: 443s
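
A side note on the repeated "NOTICE: invalid dest placement: default-placement/REDUCED_REDUNDANCY" lines above: some client is sending the S3 REDUCED_REDUNDANCY storage class, which does not appear to be defined for our placement target, hence the 400s. Whether or not that is related to the crash, the configured placement targets and storage classes can be dumped with something like:

# storage classes defined per placement target in the zonegroup
radosgw-admin zonegroup get

# placement configuration of the local zone
radosgw-admin zone placement list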


Could someone help me troubleshoot and fix the issue?

Thanks
Andrei

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
