On Tue, Oct 24, 2017 at 3:49 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
> On Mon, Oct 23, 2017 at 4:51 PM, pascal.pu...@pci-conseil.net <
> pascal.pu...@pci-conseil.net> wrote:
>
>> Hello,
>>
>> On 23/10/2017 at 02:05, Brad Hubbard wrote:
>>
>> 2017-10-22 17:32:56.031086 7f3acaff5700  1 osd.14 pg_epoch: 72024
>> pg[37.1c( v 71593'41657 (60849'38594,71593'41657] local-les=72023 n=13
>> ec=7037 les/c/f 72023/72023/66447 72022/72022/72022) [14,1,41] r=0
>> lpr=72022 crt=71593'41657 lcod 0'0 mlcod 0'0 active+clean] hit_set_trim
>> 37:38000000:.ceph-internal::hit_set_37.1c_archive_2017-08-31
>> 01%3a03%3a24.697717Z_2017-08-31 01%3a52%3a34.767197Z:head not found
>> 2017-10-22 17:32:56.033936 7f3acaff5700 -1 osd/ReplicatedPG.cc: In
>> function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&,
>> unsigned int)' thread 7f3acaff5700 time 2017-10-22 17:32:56.031105
>> osd/ReplicatedPG.cc: 11782: FAILED assert(obc)
>>
>> It appears to be looking for (and failing to find) a hitset object with
>> a timestamp from August? Does that sound right to you? Of course, it
>> appears an object for that timestamp does not exist.
>>
>> How is that possible? How can it be fixed? I am sure that if I run a lot
>> of reads, other objects like this one will crash other OSDs.
>> (The cluster is OK now; I will probably destroy OSD 14 and recreate it.)
>> How can I find this object?
>
> You should be able to do a find on the OSD's filestore and grep the output
> for 'hit_set_37.1c_archive_2017-08-31'. I'd start with the OSDs
> responsible for pg 37.1c and then move on to the others if it's feasible.

Many thanks to Kefu for correcting me on this. You'll need to use something
more like the following command to find this object:

find ${path_to_osd} -name 'hit\\uset\\u37.1c\\uarchive\\u2017-08-31 01:03:24.697717Z\\u2017-08-31 01:52:34.767197Z*'

Apologies for the confusion, it was entirely mine.

> Let us know the results.
>
>> For information: all Ceph servers are time-synchronized via NTP.
>>
>> What are the settings for this cache tier?
>>
>> Just a tier in "writeback" mode on an erasure 2+1 pool.
>>
>> # ceph osd pool get cache-nvme-data all
>> size: 3
>> min_size: 2
>> crash_replay_interval: 0
>> pg_num: 512
>> pgp_num: 512
>> crush_ruleset: 10
>> hashpspool: true
>> nodelete: false
>> nopgchange: false
>> nosizechange: false
>> write_fadvise_dontneed: false
>> noscrub: false
>> nodeep-scrub: false
>> hit_set_type: bloom
>> hit_set_period: 14400
>> hit_set_count: 12
>> hit_set_fpp: 0.05
>> use_gmt_hitset: 1
>> auid: 0
>> target_max_objects: 1000000
>> target_max_bytes: 100000000000
>> cache_target_dirty_ratio: 0.4
>> cache_target_dirty_high_ratio: 0.6
>> cache_target_full_ratio: 0.8
>> cache_min_flush_age: 600
>> cache_min_evict_age: 1800
>> min_read_recency_for_promote: 1
>> min_write_recency_for_promote: 1
>> fast_read: 0
>> hit_set_grade_decay_rate: 0
>> hit_set_search_last_n: 0
>>
>> # ceph osd pool get raid-2-1-data all
>> size: 3
>> min_size: 2
>> crash_replay_interval: 0
>> pg_num: 1024
>> pgp_num: 1024
>> crush_ruleset: 8
>> hashpspool: true
>> nodelete: false
>> nopgchange: false
>> nosizechange: false
>> write_fadvise_dontneed: false
>> noscrub: false
>> nodeep-scrub: false
>> use_gmt_hitset: 1
>> auid: 0
>> erasure_code_profile: raid-2-1
>> min_write_recency_for_promote: 0
>> fast_read: 0
>>
>> # ceph osd erasure-code-profile get raid-2-1
>> jerasure-per-chunk-alignment=false
>> k=2
>> m=1
>> plugin=jerasure
>> ruleset-failure-domain=host
>> ruleset-root=default
>> technique=reed_sol_van
>> w=8
>>
>> Could you check your logs for any errors from the 'agent_load_hit_sets'
>> function?
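To make the find above easier to run, purely as a sketch (this assumes the
default filestore layout under /var/lib/ceph/osd/ceph-<id>; adjust the paths
to your deployment), something like this should walk every OSD on a host:

for osd in /var/lib/ceph/osd/ceph-*; do
    # filestore escapes '_' in object names as '\u', hence the pattern;
    # the backslashes are doubled so find matches them literally
    find "$osd/current" -name 'hit\\uset\\u37.1c\\uarchive\\u2017-08-31*' 2>/dev/null
done

Per the pg[37.1c ...] [14,1,41] line above, the acting set is osd.14, osd.1
and osd.41, so those three filestores are the natural place to start. (Also,
with hit_set_count 12 and hit_set_period 14400, only roughly the last 48
hours of hitsets should exist, which is what makes an August archive object
so odd.)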
>>
>> Attached log:
>>
>> # pdsh -R exec -w ceph-osd-01,ceph-osd-02,ceph-osd-03,ceph-osd-04 ssh -x %h 'zgrep -B10 -A10 agent_load_hit_sets /var/log/ceph/ceph-osd.*gz' | less > log_agent_load_hit_sets.log
>>
>> On the morning of 19 October, I restarted OSD 14.
>>
>> Thanks for your help.
>>
>> Regards,
>>
>> On Mon, Oct 23, 2017 at 2:41 AM, pascal.pu...@pci-conseil.net <
>> pascal.pu...@pci-conseil.net> wrote:
>>
>>> Hello,
>>>
>>> Today I ran a lot of read IO with a simple rsync... and again, an OSD
>>> crashed:
>>>
>>> As before, I can't restart the OSD. It keeps crashing again, so the OSD
>>> is out and the cluster is recovering.
>>>
>>> I only had time to increase the OSD log level:
>>>
>>> # ceph tell osd.14 injectargs --debug-osd 5/5
>>>
>>> Attached log:
>>>
>>> # grep -B100 -A100 objdump /var/log/ceph/ceph-osd.14.log
>>>
>>> If I run another read, another OSD will probably crash.
>>>
>>> Any idea?
>>>
>>> I will probably plan to move the data from the erasure pool to a 3x
>>> replica pool. It's becoming unstable without any change.
>>>
>>> Regards,
>>>
>>> PS: Last Sunday, I lost an RBD header during removal of the cache
>>> tier... many thanks to
>>> http://fnordahl.com/2017/04/17/ceph-rbd-volume-header-recovery/ for
>>> helping me recreate it and resurrect the RBD disk :)
>>>
>>> On 19/10/2017 at 00:19, Brad Hubbard wrote:
>>>
>>> On Wed, Oct 18, 2017 at 11:16 PM, pascal.pu...@pci-conseil.net
>>> <pascal.pu...@pci-conseil.net> wrote:
>>>
>>> Hello,
>>>
>>> For 2 weeks, I have sometimes been losing OSDs. Here is the trace:
>>>
>>>      0> 2017-10-18 05:16:40.873511 7f7c1e497700 -1 osd/ReplicatedPG.cc:
>>> In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&,
>>> unsigned int)' thread 7f7c1e497700 time 2017-10-18 05:16:40.869962
>>> osd/ReplicatedPG.cc: 11782: FAILED assert(obc)
>>>
>>> Can you try to capture a log with debug_osd set to 10 or greater as
>>> per http://tracker.ceph.com/issues/19185 ?
>>>
>>> This will allow us to see the output from the
>>> PrimaryLogPG::get_object_context() function which may help identify
>>> the problem.
>>>
>>> Please also check your machines all have the same time zone set and
>>> their clocks are in sync.
>>>
>>> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x85) [0x55eec15a09e5]
>>> 2: (ReplicatedPG::hit_set_trim(std::unique_ptr<ReplicatedPG::OpContext,
>>> std::default_delete<ReplicatedPG::OpContext> >&, unsigned int)+0x6dd)
>>> [0x55eec107a52d]
>>> 3: (ReplicatedPG::hit_set_persist()+0xd7c) [0x55eec107d1bc]
>>> 4: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x1a92)
>>> [0x55eec109bbe2]
>>> 5: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&,
>>> ThreadPool::TPHandle&)+0x747) [0x55eec10588a7]
>>> 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>,
>>> ThreadPool::TPHandle&)+0x41d) [0x55eec0f0bbad]
>>> 7: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6d)
>>> [0x55eec0f0bdfd]
>>> 8: (OSD::ShardedOpWQ::_process(unsigned int,
>>> ceph::heartbeat_handle_d*)+0x77b) [0x55eec0f0f7db]
>>> 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x887)
>>> [0x55eec1590987]
>>> 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55eec15928f0]
>>> 11: (()+0x7e25) [0x7f7c4fd52e25]
>>> 12: (clone()+0x6d) [0x7f7c4e3dc34d]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>>> to interpret this.
>>>
>>> I am using Jewel 10.2.10.
>>>
>>> I am using an erasure coding pool (2+1) + an NVMe cache tier (writeback)
>>> with 3 replicas, with simple RBD disks.
>>> (12 SATA OSD disks on each of 4 nodes + 1 NVMe on each node = 48 SATA
>>> OSDs + 8 NVMe OSDs; I split each NVMe in 2.)
>>> Last week, it was only NVMe OSDs that crashed. So I unmapped all disks,
>>> destroyed the cache and recreated it.
>>> Since that day, it worked fine. Today, an OSD crashed, but it was not an
>>> NVMe OSD this time; it was a normal (SATA) OSD.
>>>
>>> Any idea? What about this 'void ReplicatedPG::hit_set_trim'?
>>>
>>> Thanks for your help,
>>>
>>> Regards,
>>>
>
> --
> Cheers,
> Brad

--
Cheers,
Brad
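P.S. On the debug_osd capture I asked for earlier in the thread, purely as a
sketch: osd.14 and the log path here are just the examples from this thread,
and debug 10 or higher is very chatty, so wind it back once you have the
crash.

ceph tell osd.14 injectargs '--debug-osd 10/10'
# ...re-run the read workload until the assert fires...
grep -B200 -A50 'FAILED assert(obc)' /var/log/ceph/ceph-osd.14.log > osd.14-assert.log
# then restore whatever level you normally run, e.g.
ceph tell osd.14 injectargs '--debug-osd 0/5'

Note that injected settings don't survive an OSD restart, so if the daemon
dies and respawns you may need to set debug_osd in ceph.conf instead.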
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com