On Tue, Oct 24, 2017 at 3:49 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
> On Mon, Oct 23, 2017 at 4:51 PM, pascal.pu...@pci-conseil.net <
> pascal.pu...@pci-conseil.net> wrote:
>
>> Hello,
>>
>> On 23/10/2017 at 02:05, Brad Hubbard wrote:
>>
>> 2017-10-22 17:32:56.031086 7f3acaff5700  1 osd.14 pg_epoch: 72024
>> pg[37.1c( v 71593'41657 (60849'38594,71593'41657] local-les=72023 n=13
>> ec=7037 les/c/f 72023/72023/66447 72022/72022/72022) [14,1,41] r=0
>> lpr=72022 crt=71593'41657 lcod 0'0 mlcod 0'0 active+clean] hit_set_trim
>> 37:38000000:.ceph-internal::hit_set_37.1c_archive_2017-08-31
>> 01%3a03%3a24.697717Z_2017-08-31 01%3a52%3a34.767197Z:head not found
>> 2017-10-22 17:32:56.033936 7f3acaff5700 -1 osd/ReplicatedPG.cc: In
>> function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&,
>> unsigned int)' thread 7f3acaff5700 time 2017-10-22 17:32:56.031105
>> osd/ReplicatedPG.cc: 11782: FAILED assert(obc)
>>
>> It appears to be looking for (and failing to find) a hitset object with
>> a timestamp from August? Does that sound right to you? Of course, it
>> appears an object for that timestamp does not exist.
>>
>> How is that possible? How can it be fixed? I am sure that if I run a lot
>> of reads, other objects like this one will crash other OSDs.
>> (The cluster is OK now; I will probably destroy OSD 14 and recreate it.)
>> How can I find this object?
>
> You should be able to do a find on the OSD's filestore and grep the output
> for 'hit_set_37.1c_archive_2017-08-31'. I'd start with the OSDs
> responsible for pg 37.1c and then move on to the others if it's feasible.

Many thanks to Kefu for correcting me on this. You'll need to use something
more like the following command to find this object:

find ${path_to_osd} -name 'hit\\uset\\u37.1c\\uarchive\\u2017-08-31 01:03:24.697717Z\\u2017-08-31 01:52:34.767197Z*'

Apologies for the confusion, it was entirely mine.

> Let us know the results.
>
>> For information: all Ceph servers are time-synchronized via NTP.
>>
>> What are the settings for this cache tier?
>>
>> Just a tier in "writeback" mode on an erasure 2+1 pool.
>>
>> # ceph osd pool get cache-nvme-data all
>> size: 3
>> min_size: 2
>> crash_replay_interval: 0
>> pg_num: 512
>> pgp_num: 512
>> crush_ruleset: 10
>> hashpspool: true
>> nodelete: false
>> nopgchange: false
>> nosizechange: false
>> write_fadvise_dontneed: false
>> noscrub: false
>> nodeep-scrub: false
>> hit_set_type: bloom
>> hit_set_period: 14400
>> hit_set_count: 12
>> hit_set_fpp: 0.05
>> use_gmt_hitset: 1
>> auid: 0
>> target_max_objects: 1000000
>> target_max_bytes: 100000000000
>> cache_target_dirty_ratio: 0.4
>> cache_target_dirty_high_ratio: 0.6
>> cache_target_full_ratio: 0.8
>> cache_min_flush_age: 600
>> cache_min_evict_age: 1800
>> min_read_recency_for_promote: 1
>> min_write_recency_for_promote: 1
>> fast_read: 0
>> hit_set_grade_decay_rate: 0
>> hit_set_search_last_n: 0
>>
>> # ceph osd pool get raid-2-1-data all
>> size: 3
>> min_size: 2
>> crash_replay_interval: 0
>> pg_num: 1024
>> pgp_num: 1024
>> crush_ruleset: 8
>> hashpspool: true
>> nodelete: false
>> nopgchange: false
>> nosizechange: false
>> write_fadvise_dontneed: false
>> noscrub: false
>> nodeep-scrub: false
>> use_gmt_hitset: 1
>> auid: 0
>> erasure_code_profile: raid-2-1
>> min_write_recency_for_promote: 0
>> fast_read: 0
>>
>> # ceph osd erasure-code-profile get raid-2-1
>> jerasure-per-chunk-alignment=false
>> k=2
>> m=1
>> plugin=jerasure
>> ruleset-failure-domain=host
>> ruleset-root=default
>> technique=reed_sol_van
>> w=8
>>
>> Could you check your logs for any errors from the 'agent_load_hit_sets'
>> function?
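To make the find above easier to run, purely as a sketch (this assumes the
default filestore layout under /var/lib/ceph/osd/ceph-<id>; adjust the paths
to your deployment), something like this should walk every OSD on a host:

for osd in /var/lib/ceph/osd/ceph-*; do
    # filestore escapes '_' in object names as '\u', hence the pattern;
    # the backslashes are doubled so find matches them literally
    find "$osd/current" -name 'hit\\uset\\u37.1c\\uarchive\\u2017-08-31*' 2>/dev/null
done

Per the pg[37.1c ...] [14,1,41] line above, the acting set is osd.14, osd.1
and osd.41, so those three filestores are the natural place to start. (Also,
with hit_set_count 12 and hit_set_period 14400, only roughly the last 48
hours of hitsets should exist, which is what makes an August archive object
so odd.)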
>>
>> Attached log:
>>
>> # pdsh -R exec -w ceph-osd-01,ceph-osd-02,ceph-osd-03,ceph-osd-04 ssh -x %h 'zgrep -B10 -A10 agent_load_hit_sets /var/log/ceph/ceph-osd.*gz' | less > log_agent_load_hit_sets.log
>>
>> On the morning of 19 October, I restarted OSD 14.
>>
>> Thanks for your help.
>>
>> Regards,
>>
>> On Mon, Oct 23, 2017 at 2:41 AM, pascal.pu...@pci-conseil.net <
>> pascal.pu...@pci-conseil.net> wrote:
>>
>>> Hello,
>>>
>>> Today I ran a lot of read IO with a simple rsync... and again, an OSD
>>> crashed:
>>>
>>> As before, I can't restart the OSD. It keeps crashing again, so the OSD
>>> is out and the cluster is recovering.
>>>
>>> I only had time to increase the OSD log level:
>>>
>>> # ceph tell osd.14 injectargs --debug-osd 5/5
>>>
>>> Attached log:
>>>
>>> # grep -B100 -A100 objdump /var/log/ceph/ceph-osd.14.log
>>>
>>> If I run another read, another OSD will probably crash.
>>>
>>> Any idea?
>>>
>>> I will probably plan to move the data from the erasure pool to a 3x
>>> replica pool. It's becoming unstable without any change.
>>>
>>> Regards,
>>>
>>> PS: Last Sunday, I lost an RBD header during removal of the cache
>>> tier... many thanks to
>>> http://fnordahl.com/2017/04/17/ceph-rbd-volume-header-recovery/ for
>>> helping me recreate it and resurrect the RBD disk :)
>>>
>>> On 19/10/2017 at 00:19, Brad Hubbard wrote:
>>>
>>> On Wed, Oct 18, 2017 at 11:16 PM, pascal.pu...@pci-conseil.net
>>> <pascal.pu...@pci-conseil.net> wrote:
>>>
>>> Hello,
>>>
>>> For 2 weeks, I have sometimes been losing OSDs. Here is the trace:
>>>
>>>      0> 2017-10-18 05:16:40.873511 7f7c1e497700 -1 osd/ReplicatedPG.cc:
>>> In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&,
>>> unsigned int)' thread 7f7c1e497700 time 2017-10-18 05:16:40.869962
>>> osd/ReplicatedPG.cc: 11782: FAILED assert(obc)
>>>
>>> Can you try to capture a log with debug_osd set to 10 or greater as
>>> per http://tracker.ceph.com/issues/19185 ?
>>>
>>> This will allow us to see the output from the
>>> PrimaryLogPG::get_object_context() function which may help identify
>>> the problem.
>>>
>>> Please also check your machines all have the same time zone set and
>>> their clocks are in sync.
>>>
>>> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x85) [0x55eec15a09e5]
>>> 2: (ReplicatedPG::hit_set_trim(std::unique_ptr<ReplicatedPG::OpContext,
>>> std::default_delete<ReplicatedPG::OpContext> >&, unsigned int)+0x6dd)
>>> [0x55eec107a52d]
>>> 3: (ReplicatedPG::hit_set_persist()+0xd7c) [0x55eec107d1bc]
>>> 4: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x1a92)
>>> [0x55eec109bbe2]
>>> 5: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&,
>>> ThreadPool::TPHandle&)+0x747) [0x55eec10588a7]
>>> 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>,
>>> ThreadPool::TPHandle&)+0x41d) [0x55eec0f0bbad]
>>> 7: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6d)
>>> [0x55eec0f0bdfd]
>>> 8: (OSD::ShardedOpWQ::_process(unsigned int,
>>> ceph::heartbeat_handle_d*)+0x77b) [0x55eec0f0f7db]
>>> 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x887)
>>> [0x55eec1590987]
>>> 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55eec15928f0]
>>> 11: (()+0x7e25) [0x7f7c4fd52e25]
>>> 12: (clone()+0x6d) [0x7f7c4e3dc34d]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>>> to interpret this.
>>>
>>> I am using Jewel 10.2.10.
>>>
>>> I am using an erasure coding pool (2+1) + an NVMe cache tier (writeback)
>>> with 3 replicas, with simple RBD disks.
>>> (12 SATA OSD disks on each of 4 nodes + 1 NVMe on each node = 48 SATA
>>> OSDs + 8 NVMe OSDs; I split each NVMe in 2.)
>>> Last week, it was only NVMe OSDs that crashed. So I unmapped all disks,
>>> destroyed the cache and recreated it.
>>> Since that day, it worked fine. Today, an OSD crashed, but it was not an
>>> NVMe OSD this time; it was a normal (SATA) OSD.
>>>
>>> Any idea? What about this 'void ReplicatedPG::hit_set_trim'?
>>>
>>> Thanks for your help,
>>>
>>> Regards,
>>>
>
> --
> Cheers,
> Brad

--
Cheers,
Brad
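P.S. On the debug_osd capture I asked for earlier in the thread, purely as a
sketch: osd.14 and the log path here are just the examples from this thread,
and debug 10 or higher is very chatty, so wind it back once you have the
crash.

ceph tell osd.14 injectargs '--debug-osd 10/10'
# ...re-run the read workload until the assert fires...
grep -B200 -A50 'FAILED assert(obc)' /var/log/ceph/ceph-osd.14.log > osd.14-assert.log
# then restore whatever level you normally run, e.g.
ceph tell osd.14 injectargs '--debug-osd 0/5'

Note that injected settings don't survive an OSD restart, so if the daemon
dies and respawns you may need to set debug_osd in ceph.conf instead.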
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com