Re: [ceph-users] Deleting an rbd image hangs

2018-05-07 Thread Jan Marquardt

On 30.04.18 at 09:26, Jan Marquardt wrote:
> On 27.04.18 at 20:48, David Turner wrote:
>> This old [1] blog post about removing super large RBDs is not relevant
>> if you're using object map on the RBDs; however, its method for manually
>> deleting an RBD is still valid.  You can see if this works for you to
>> manually remove the problem RBD you're having.
> 
> I followed the instructions, but it seems that 'rados -p rbd ls | grep
> '^rbd_data.221bf2eb141f2.' | xargs -n 200 rados -p rbd rm' gets stuck,
> too. It has been running since Friday and is still not finished. The
> rbd image is/was about 1 TB in size.
> 
> Until now the only output was:
> error removing rbd>rbd_data.221bf2eb141f2.51d2: (2) No such
> file or directory
> error removing rbd>rbd_data.221bf2eb141f2.e3f2: (2) No such
> file or directory

I am still trying to get rid of this. 'rados -p rbd ls' still shows a
lot of objects beginning with rbd_data.221bf2eb141f2, but if I try to
delete them with 'rados -p rbd rm <object>' it says 'No such file or
directory'. This is not the behaviour I'd expect. Any ideas?
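
For illustration, a way to check what state one of these objects is
actually in (a rough sketch; <object> stands for one of the names
returned by 'rados -p rbd ls'):

rados -p rbd stat <object>        # does the head object still exist?
rados -p rbd listsnaps <object>   # or are only snapshot clones left?

If listsnaps showed clones but no head, that would at least explain why
'ls' lists the object while 'rm' returns 'No such file or directory'.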

Besides this, rbd_data.221bf2eb141f2.00016379 is still causing
the OSDs to crash, which leaves the cluster unusable for us at the
moment. Even though it's just a proof of concept, I'd like to get this
fixed without destroying the whole cluster.

>>
>> [1] http://cephnotes.ksperis.com/blog/2014/07/04/remove-big-rbd-image
>>
>> On Thu, Apr 26, 2018 at 9:25 AM Jan Marquardt <j...@artfiles.de> wrote:
>>
>> Hi,
>>
>> I am currently trying to delete an rbd image which is seemingly causing
>> our OSDs to crash, but it always gets stuck at 3%.
>>
>> root@ceph4:~# rbd rm noc_tobedeleted
>> Removing image: 3% complete...
>>
>> Is there any way to force the deletion? Any other advice?
>>
>> Best Regards
>>
>> Jan





Re: [ceph-users] Deleting an rbd image hangs

2018-04-30 Thread Jan Marquardt
On 27.04.18 at 22:33, Jason Dillaman wrote:
> Do you have any reason for why the OSDs crash? Anything the logs? Can
> you provide an "rbd info noc_tobedeleted"?

The reason why they are crashing is this assert:
https://github.com/ceph/ceph/blob/luminous/src/osd/PrimaryLogPG.cc#L353

With debug 20 we see this right before the OSD crashes:

2018-04-24 13:59:38.047697 7f929ba0d700 20 osd.4 pg_epoch: 144994
pg[0.103( v 140091'469328 (125640'467824,140091'469328] lb
0:c0e04acc:::rbd_data.221bf2eb141f2.00016379:head (bitwise)
local-lis/les=137681/137682 n=9535 ec=115/115 lis/c 144979/49591 les/c/f
144980/49596/0 144978/144979/144979) [4,17,2]/[2,17] r=-1 lpr=144979
pi=[49591,144979)/3 luod=0'0 crt=140091'469328 lcod 0'0 active+remapped]
 snapset 0=[]:[] legacy_snaps []

2018-04-24 16:34:54.558159 7f1c40e32700 20 osd.11 pg_epoch: 145549
pg[0.103( v 140091'469328 (125640'467824,140091'469328] lb
0:c0e04acc:::rbd_data.221bf2eb141f2.00016379:head (bitwise)
local-lis/les=138310/138311 n=9535 ec=115/115 lis/c 145548/49591 les/c/f
145549/49596/0 145547/145548/145548) [11,17,2]/[2,17] r=-1 lpr=145548
pi=[49591,145548)/3 luod=0'0 crt=140091'469328 lcod 0'0 active+remapped]
 snapset 0=[]:[] legacy_snaps []

This is caused by the following code:
https://github.com/ceph/ceph/blob/luminous/src/osd/PrimaryLogPG.cc#L349-L350

Unfortunately, 'rbd info' is no longer available for this image, because
I already followed the instructions at
http://cephnotes.ksperis.com/blog/2014/07/04/remove-big-rbd-image up to
'Remove all rbd data', which seems to hang, too.
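
As a possible low-level fallback (only a sketch, untested on this
cluster, and destructive if used carelessly): with the affected OSD
stopped, the remaining objects can be inspected directly with
ceph-objectstore-tool, e.g.

systemctl stop ceph-osd@4
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
    --pgid 0.103 --op list | grep 221bf2eb141f2

and, as a last resort, the offending object could be removed from that
OSD by passing the JSON object spec from the list output to the tool's
'remove' operation. The paths, OSD id and PG id above are just the
examples from this thread.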

> On Thu, Apr 26, 2018 at 9:24 AM, Jan Marquardt <j...@artfiles.de> wrote:
>> Hi,
>>
>> I am currently trying to delete an rbd image which is seemingly causing
>> our OSDs to crash, but it always gets stuck at 3%.
>>
>> root@ceph4:~# rbd rm noc_tobedeleted
>> Removing image: 3% complete...
>>
>> Is there any way to force the deletion? Any other advice?
>>
>> Best Regards
>>
>> Jan


Re: [ceph-users] Deleting an rbd image hangs

2018-04-30 Thread Jan Marquardt
On 27.04.18 at 20:48, David Turner wrote:
> This old [1] blog post about removing super large RBDs is not relevant
> if you're using object map on the RBDs; however, its method for manually
> deleting an RBD is still valid.  You can see if this works for you to
> manually remove the problem RBD you're having.

I followed the instructions, but it seems that 'rados -p rbd ls | grep
'^rbd_data.221bf2eb141f2.' | xargs -n 200 rados -p rbd rm' gets stuck,
too. It has been running since Friday and is still not finished. The
rbd image is/was about 1 TB in size.

Until now the only output was:
error removing rbd>rbd_data.221bf2eb141f2.51d2: (2) No such
file or directory
error removing rbd>rbd_data.221bf2eb141f2.e3f2: (2) No such
file or directory

> 
> [1] http://cephnotes.ksperis.com/blog/2014/07/04/remove-big-rbd-image
> 
> On Thu, Apr 26, 2018 at 9:25 AM Jan Marquardt <j...@artfiles.de> wrote:
> 
> Hi,
> 
> I am currently trying to delete an rbd image which is seemingly causing
> our OSDs to crash, but it always gets stuck at 3%.
> 
> root@ceph4:~# rbd rm noc_tobedeleted
> Removing image: 3% complete...
> 
> Is there any way to force the deletion? Any other advice?
> 
> Best Regards
> 
> Jan


[ceph-users] Deleting an rbd image hangs

2018-04-26 Thread Jan Marquardt
Hi,

I am currently trying to delete an rbd image which is seemingly causing
our OSDs to crash, but it always gets stuck at 3%.

root@ceph4:~# rbd rm noc_tobedeleted
Removing image: 3% complete...

Is there any way to force the deletion? Any other advice?

Best Regards

Jan


Re: [ceph-users] Dying OSDs

2018-04-24 Thread Jan Marquardt
Hi,

It's been a while, but we are still fighting with this issue.

As suggested, we deleted all snapshots, but the errors still occur.

We were able to gather some more information:

The reason why they are crashing is this assert:
https://github.com/ceph/ceph/blob/luminous/src/osd/PrimaryLogPG.cc#L353

With debug 20 we see this right before the OSD crashes:

2018-04-24 13:59:38.047697 7f929ba0d700 20 osd.4 pg_epoch: 144994
pg[0.103( v 140091'469328 (125640'467824,140091'469328] lb
0:c0e04acc:::rbd_data.221bf2eb141f2.00016379:head (bitwise)
local-lis/les=137681/137682 n=9535 ec=115/115 lis/c 144979/49591 les/c/f
144980/49596/0 144978/144979/144979) [4,17,2]/[2,17] r=-1 lpr=144979
pi=[49591,144979)/3 luod=0'0 crt=140091'469328 lcod 0'0 active+remapped]
 snapset 0=[]:[] legacy_snaps []

2018-04-24 16:34:54.558159 7f1c40e32700 20 osd.11 pg_epoch: 145549
pg[0.103( v 140091'469328 (125640'467824,140091'469328] lb
0:c0e04acc:::rbd_data.221bf2eb141f2.00016379:head (bitwise)
local-lis/les=138310/138311 n=9535 ec=115/115 lis/c 145548/49591 les/c/f
145549/49596/0 145547/145548/145548) [11,17,2]/[2,17] r=-1 lpr=145548
pi=[49591,145548)/3 luod=0'0 crt=140091'469328 lcod 0'0 active+remapped]
 snapset 0=[]:[] legacy_snaps []

This is caused by the following code:
https://github.com/ceph/ceph/blob/luminous/src/osd/PrimaryLogPG.cc#L349-L350
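
For anyone trying to reproduce the debug output above, something along
these lines should work (osd.4 and the object name are simply taken
from the log excerpt):

ceph osd map rbd rbd_data.221bf2eb141f2.00016379   # which PG/OSDs hold the object?
ceph tell osd.4 injectargs '--debug_osd 20'        # raise the log level before the next peering attempt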

Any help would really be appreciated.

Best Regards

Jan


On 12.04.18 at 10:53, Paul Emmerich wrote:
> Hi,
> 
> thanks, but unfortunately it's not the thing I suspected :(
> Anyway, there's something wrong with your snapshots; the log also
> contains a lot of entries like this:
> 
> 2018-04-09 06:58:53.703353 7fb8931a0700 -1 osd.28 pg_epoch: 88438
> pg[0.5d( v 88438'223279 (86421'221681,88438'223279]
> local-lis/les=87450/87451 n=5634 ec=115/115 lis/c 87450/87450 les/c/f
> 87451/87451/0 87352/87450/87450) [37,6,28] r=2 lpr=87450 luod=0'0
> crt=88438'223279 lcod 88438'223278 active] _scan_snaps no head for
> 0:ba087b0f:::rbd_data.221bf2eb141f2.1436:46aa (have MIN)
> 
> The cluster I debugged with the same crash also had a lot of snapshot
> problems, including this one.
> In the end, only manually marking all snap_ids as deleted in the pool
> helped.
> 
> 
> Paul
> 
> 2018-04-10 21:48 GMT+02:00 Jan Marquardt <j...@artfiles.de>:
> 
> On 10.04.18 at 20:22, Paul Emmerich wrote:
> > Hi,
> > 
> > I encountered the same crash a few months ago, see
> > https://tracker.ceph.com/issues/23030
> > 
> > Can you post the output of
> > 
> >    ceph osd pool ls detail -f json-pretty
> > 
> > 
> > Paul
> 
> Yes, of course.
> 
> # ceph osd pool ls detail -f json-pretty
> 
> [
>     {
>         "pool_name": "rbd",
>         "flags": 1,
>         "flags_names": "hashpspool",
>         "type": 1,
>         "size": 3,
>         "min_size": 2,
>         "crush_rule": 0,
>         "object_hash": 2,
>         "pg_num": 768,
>         "pg_placement_num": 768,
>         "crash_replay_interval": 0,
>         "last_change": "91256",
>         "last_force_op_resend": "0",
>         "last_force_op_resend_preluminous": "0",
>         "auid": 0,
>         "snap_mode": "selfmanaged",
>         "snap_seq": 35020,
>         "snap_epoch": 91219,
>         "pool_snaps": [],
>         "removed_snaps":
> 
> "[1~4562,47f1~58,484a~9,4854~70,48c5~36,48fc~48,4945~d,4953~1,4957~1,495a~3,4960~1,496e~3,497a~1,4980~2,4983~3,498b~1,4997~1,49a8~1,49ae~1,49b1~2,49b4~1,49b7~1,49b9~3,49bd~5,49c3~6,49ca~5,49d1~4,49d6~1,49d8~2,49df~2,49e2~1,49e4~2,49e7~5,49ef~2,49f2~2,49f5~6,49fc~1,49fe~3,4a05~9,4a0f~4,4a14~4,4a1a~6,4a21~6,4a29~2,4a2c~3,4a30~1,4a33~5,4a39~3,4a3e~b,4a4a~1,4a4c~2,4a50~1,4a52~7,4a5a~1,4a5c~2,4a5f~4,4a64~1,4a66~2,4a69~2,4a6c~4,4a72~1,4a74~2,4a78~3,4a7c~6,4a84~2,4a87~b,4a93~4,4a99~1,4a9c~4,4aa1~7,4aa9~1,4aab~6,4ab2~2,4ab5~5,4abb~2,4abe~9,4ac8~a,4ad3~4,4ad8~13,4aec~16,4b03~6,4b0a~c,4b17~2,4b1a~3,4b1f~4,4b24~c,4b31~d,4b3f~13,4b53~1,4bfc~13ed,61e1~4a,622c~8,6235~a0,62d6~ac,63a6~2,63b2~2,63d0~2,63f7~2,6427~2,6434~10f]",
>         "quota_max_bytes": 0,
>         "quota_max_objects": 0,
>         "tiers": [],
>         "tier_of": -1,
>         "read_tier": -1,
>         "write

Re: [ceph-users] Dying OSDs

2018-04-10 Thread Jan Marquardt
On 10.04.18 at 20:22, Paul Emmerich wrote:
> Hi,
> 
> I encountered the same crash a few months ago, see
> https://tracker.ceph.com/issues/23030
> 
> Can you post the output of
> 
>    ceph osd pool ls detail -f json-pretty
> 
> 
> Paul

Yes, of course.

# ceph osd pool ls detail -f json-pretty

[
    {
        "pool_name": "rbd",
        "flags": 1,
        "flags_names": "hashpspool",
        "type": 1,
        "size": 3,
        "min_size": 2,
        "crush_rule": 0,
        "object_hash": 2,
        "pg_num": 768,
        "pg_placement_num": 768,
        "crash_replay_interval": 0,
        "last_change": "91256",
        "last_force_op_resend": "0",
        "last_force_op_resend_preluminous": "0",
        "auid": 0,
        "snap_mode": "selfmanaged",
        "snap_seq": 35020,
        "snap_epoch": 91219,
        "pool_snaps": [],
        "removed_snaps":
"[1~4562,47f1~58,484a~9,4854~70,48c5~36,48fc~48,4945~d,4953~1,4957~1,495a~3,4960~1,496e~3,497a~1,4980~2,4983~3,498b~1,4997~1,49a8~1,49ae~1,49b1~2,49b4~1,49b7~1,49b9~3,49bd~5,49c3~6,49ca~5,49d1~4,49d6~1,49d8~2,49df~2,49e2~1,49e4~2,49e7~5,49ef~2,49f2~2,49f5~6,49fc~1,49fe~3,4a05~9,4a0f~4,4a14~4,4a1a~6,4a21~6,4a29~2,4a2c~3,4a30~1,4a33~5,4a39~3,4a3e~b,4a4a~1,4a4c~2,4a50~1,4a52~7,4a5a~1,4a5c~2,4a5f~4,4a64~1,4a66~2,4a69~2,4a6c~4,4a72~1,4a74~2,4a78~3,4a7c~6,4a84~2,4a87~b,4a93~4,4a99~1,4a9c~4,4aa1~7,4aa9~1,4aab~6,4ab2~2,4ab5~5,4abb~2,4abe~9,4ac8~a,4ad3~4,4ad8~13,4aec~16,4b03~6,4b0a~c,4b17~2,4b1a~3,4b1f~4,4b24~c,4b31~d,4b3f~13,4b53~1,4bfc~13ed,61e1~4a,622c~8,6235~a0,62d6~ac,63a6~2,63b2~2,63d0~2,63f7~2,6427~2,6434~10f]",
"quota_max_bytes": 0,
"quota_max_objects": 0,
"tiers": [],
"tier_of": -1,
"read_tier": -1,
"write_tier": -1,
"cache_mode": "none",
"target_max_bytes": 0,
"target_max_objects": 0,
"cache_target_dirty_ratio_micro": 0,
"cache_target_dirty_high_ratio_micro": 0,
"cache_target_full_ratio_micro": 0,
"cache_min_flush_age": 0,
"cache_min_evict_age": 0,
"erasure_code_profile": "",
"hit_set_params": {
"type": "none"
},
"hit_set_period": 0,
"hit_set_count": 0,
"use_gmt_hitset": true,
"min_read_recency_for_promote": 0,
"min_write_recency_for_promote": 0,
"hit_set_grade_decay_rate": 0,
"hit_set_search_last_n": 0,
"grade_table": [],
"stripe_width": 0,
"expected_num_objects": 0,
"fast_read": false,
"options": {},
"application_metadata": {
"rbd": {}
}
}
]

"Unfortunately" I started the crashed OSDs again in the meantime,
because the first pgs have been down before. So currently all OSDs are
running.

Regards,

Jan




Re: [ceph-users] Dying OSDs

2018-04-10 Thread Jan Marquardt
On 10.04.18 at 15:29, Brady Deetz wrote:
> What distribution and kernel are you running?
> 
> I recently found my cluster running the 3.10 centos kernel when I
> thought it was running the elrepo kernel. After forcing it to boot
> correctly, my flapping osd issue went away. 

We are running Ubuntu 16.04 with its current linux-hwe kernel, version
4.13.0-32-generic.

Regards

Jan


[ceph-users] Dying OSDs

2018-04-10 Thread Jan Marquardt
Hi,

we are experiencing massive problems with our Ceph setup. After starting
a "pg repair" because of scrub errors, OSDs started to crash, which we
have not been able to stop so far. We are running Ceph 12.2.4. Crashed
OSDs are both bluestore and filestore.

Our cluster currently looks like this:

# ceph -s
  cluster:
id: c59e56df-2043-4c92-9492-25f05f268d9f
health: HEALTH_ERR
1 osds down
73005/17149710 objects misplaced (0.426%)
5 scrub errors
Reduced data availability: 2 pgs inactive, 2 pgs down
Possible data damage: 1 pg inconsistent
Degraded data redundancy: 611518/17149710 objects degraded
(3.566%), 86 pgs degraded, 86 pgs undersized

  services:
mon: 3 daemons, quorum head1,head2,head3
mgr: head3(active), standbys: head2, head1
osd: 34 osds: 24 up, 25 in; 18 remapped pgs

  data:
pools:   1 pools, 768 pgs
objects: 5582k objects, 19500 GB
usage:   62030 GB used, 31426 GB / 93456 GB avail
pgs: 0.260% pgs not active
 611518/17149710 objects degraded (3.566%)
 73005/17149710 objects misplaced (0.426%)
 670 active+clean
 75  active+undersized+degraded
 8   active+undersized+degraded+remapped+backfill_wait
 8   active+clean+remapped
 2   down
 2   active+undersized+degraded+remapped+backfilling
 2   active+clean+scrubbing+deep
 1   active+undersized+degraded+inconsistent

  io:
client:   10911 B/s rd, 118 kB/s wr, 0 op/s rd, 54 op/s wr
recovery: 31575 kB/s, 8 objects/s

# ceph osd tree
ID  CLASS WEIGHT    TYPE NAME      STATUS REWEIGHT PRI-AFF
 -1       124.07297 root default
 -2        29.08960     host ceph1
  0   hdd   3.63620         osd.0      up      1.0     1.0
  1   hdd   3.63620         osd.1    down        0     1.0
  2   hdd   3.63620         osd.2      up      1.0     1.0
  3   hdd   3.63620         osd.3      up      1.0     1.0
  4   hdd   3.63620         osd.4    down        0     1.0
  5   hdd   3.63620         osd.5    down        0     1.0
  6   hdd   3.63620         osd.6      up      1.0     1.0
  7   hdd   3.63620         osd.7      up      1.0     1.0
 -3         7.27240     host ceph2
 14   hdd   3.63620         osd.14     up      1.0     1.0
 15   hdd   3.63620         osd.15     up      1.0     1.0
 -4        29.11258     host ceph3
 16   hdd   3.63620         osd.16     up      1.0     1.0
 18   hdd   3.63620         osd.18   down        0     1.0
 19   hdd   3.63620         osd.19   down        0     1.0
 20   hdd   3.65749         osd.20     up      1.0     1.0
 21   hdd   3.63620         osd.21     up      1.0     1.0
 22   hdd   3.63620         osd.22     up      1.0     1.0
 23   hdd   3.63620         osd.23     up      1.0     1.0
 24   hdd   3.63789         osd.24   down        0     1.0
 -9        29.29919     host ceph4
 17   hdd   3.66240         osd.17     up      1.0     1.0
 25   hdd   3.66240         osd.25     up      1.0     1.0
 26   hdd   3.66240         osd.26   down        0     1.0
 27   hdd   3.66240         osd.27     up      1.0     1.0
 28   hdd   3.66240         osd.28   down        0     1.0
 29   hdd   3.66240         osd.29     up      1.0     1.0
 30   hdd   3.66240         osd.30     up      1.0     1.0
 31   hdd   3.66240         osd.31   down        0     1.0
-11        29.29919     host ceph5
 32   hdd   3.66240         osd.32     up      1.0     1.0
 33   hdd   3.66240         osd.33     up      1.0     1.0
 34   hdd   3.66240         osd.34     up      1.0     1.0
 35   hdd   3.66240         osd.35     up      1.0     1.0
 36   hdd   3.66240         osd.36   down      1.0     1.0
 37   hdd   3.66240         osd.37     up      1.0     1.0
 38   hdd   3.66240         osd.38     up      1.0     1.0
 39   hdd   3.66240         osd.39     up      1.0     1.0

The last OSDs that crashed are #28 and #36. Please find the
corresponding log files here:

http://af.janno.io/ceph/ceph-osd.28.log.1.gz
http://af.janno.io/ceph/ceph-osd.36.log.1.gz

The backtraces look almost the same for all crashed OSDs.
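
In case it saves someone a full download, the interesting part can be
extracted from the compressed logs with something like:

zgrep -B 5 -A 30 'FAILED assert' ceph-osd.28.log.1.gz
zgrep -B 5 -A 30 'FAILED assert' ceph-osd.36.log.1.gz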

Any help, hint or advice would really be appreciated. Please let me know
if you need any further information.

Best Regards

Jan

-- 
Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg
Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95
E-Mail: supp...@artfiles.de | Web: http://www.artfiles.de
Geschäftsführer: Harald Oltmanns | Tim Evers
Eingetragen im Handelsregister Hamburg - HRB 81478


Re: [ceph-users] Backfilling on Luminous

2018-03-15 Thread Jan Marquardt
Hi David,

On 15.03.18 at 18:03, David Turner wrote:
> I upgraded a [1] cluster from Jewel 10.2.7 to Luminous 12.2.2 and last
> week I added 2 nodes to the cluster.  The backfilling has been
> ATROCIOUS.  I have OSDs consistently [2] segfaulting during recovery. 
> There's no pattern of which OSDs are segfaulting, which hosts have
> segfaulting OSDs, etc... It's all over the cluster.  I have been trying
> variants on all of these following settings with different levels of
> success, but I cannot eliminate the blocked requests and segfaulting
> OSDs.  osd_heartbeat_grace, osd_max_backfills, osd_op_thread_suicide_timeout, 
> osd_recovery_max_active, osd_recovery_sleep_hdd, osd_recovery_sleep_hybrid, 
> osd_recovery_thread_timeout,
> and osd_scrub_during_recovery.  Except for setting nobackfilling on the
> cluster I can't stop OSDs from segfaulting during recovery.
> 
> Does anyone have any ideas for this?  I've been struggling with this for
> over a week now.  For the first couple days I rebalanced the cluster and
> had this exact same issue prior to adding new storage.  Even setting
> osd_max_backfills to 1 and recovery_sleep to 1.0, with everything else
> on defaults, doesn't help.
> 
> Backfilling caused things to slow down on Jewel, but I wasn't having
> OSDs segfault multiple times/hour like I am on Luminous.  So many OSDs
> are going down that I had to set nodown to prevent potential data
> instability of OSDs on multiple hosts going up and down all the time. 
> That blocks IO for every OSD that dies either until it comes back up or
> I manually mark it down.  I hope someone has some ideas for me here. 
> Our plan moving forward is to only use half of the capacity of the
> drives by pretending they're 5TB instead of 10TB to increase the spindle
> speed per TB.  Also migrating to bluestore will hopefully help.

Do you see segfaults in dmesg?
This sounds somewhat like the problems I experienced last week.

http://tracker.ceph.com/issues/23258?next_issue_id=23257
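
(A quick way to check the dmesg side, in case it helps:

dmesg -T | grep -i -E 'segfault|oom|killed process'

should show both segfaults and any OOM kills with readable timestamps.)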

For some reason it seems to be gone at the moment, but unfortunately I
don't know why, which is really disappointing.

Best Regards

Jan

> 
> 
> [1] 23 OSD nodes: 15x 10TB Seagate Ironwolf filestore with journals on
> Intel DC P3700, 70% full cluster, Dual Socket E5-2620 v4 @ 2.10GHz,
> 128GB RAM.
> 
> [2]    -19> 2018-03-15 16:42:17.998074 7fe661601700  5 --
> 10.130.115.25:6811/2942118  >>
> 10.130.115.48:0/372681 
> conn(0x55e3ea087000 :6811 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pg
> s=1920 cs=1 l=1). rx osd.254 seq 74507 0x55e3eb8e2e00 osd_ping(ping
> e93182 stamp 2018-03-15 16:42:17.990698) v4
>    -18> 2018-03-15 16:42:17.998091 7fe661601700  1 --
> 10.130.115.25:6811/2942118  <==
> osd.254 10.130.115.48:0/372681  74507
>  osd_ping(ping e93182 stamp 2018-03-15 16:42:17.990698)
>  v4  2004+0+0 (492539280 0 0) 0x55e3eb8e2e00 con 0x55e3ea087000
>    -17> 2018-03-15 16:42:17.998109 7fe661601700  1 heartbeat_map
> is_healthy 'OSD::osd_op_tp thread 0x7fe639772700' had timed out after 60
>    -16> 2018-03-15 16:42:17.998111 7fe661601700  1 heartbeat_map
> is_healthy 'OSD::osd_op_tp thread 0x7fe639f73700' had timed out after 60
>    -15> 2018-03-15 16:42:17.998120 7fe661601700  1 heartbeat_map
> is_healthy 'OSD::osd_op_tp thread 0x7fe63a774700' had timed out after 60
>    -14> 2018-03-15 16:42:17.998123 7fe661601700  1 heartbeat_map
> is_healthy 'OSD::osd_op_tp thread 0x7fe63af75700' had timed out after 60
>    -13> 2018-03-15 16:42:17.998126 7fe661601700  1 heartbeat_map
> is_healthy 'OSD::osd_op_tp thread 0x7fe63b776700' had timed out after 60
>    -12> 2018-03-15 16:42:17.998129 7fe661601700  1 heartbeat_map
> is_healthy 'FileStore::op_tp thread 0x7fe654854700' had timed out after 60
>    -11> 2018-03-15 16:42:18.004203 7fe661601700  5 --
> 10.130.115.25:6811/2942118  >>
> 10.130.115.33:0/3348055 
> conn(0x55e3eb5f :6811 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH p
> gs=1894 cs=1 l=1). rx osd.169 seq 74633 0x55e3eb8e2e00 osd_ping(ping
> e93182 stamp 2018-03-15 16:42:17.998828) v4
>    -10> 2018-03-15 16:42:18.004230 7fe661601700  1 --
> 10.130.115.25:6811/2942118  <==
> osd.169 10.130.115.33:0/3348055  74633
>  osd_ping(ping e93182 stamp 2018-03-15 16:42:17.998828
> ) v4  2004+0+0 (2306332339 0 0) 0x55e3eb8e2e00 con 0x55e3eb5f
>     -9> 2018-03-15 16:42:18.004241 7fe661601700  1 heartbeat_map
> is_healthy 'OSD::osd_op_tp thread 0x7fe639772700' had timed out after 60
>     -8> 2018-03-15 16:42:18.004244 7fe661601700  1 heartbeat_map
> is_healthy 'OSD::osd_op_tp thread 0x7fe639f73700' had timed out after 60
>     -7> 2018-03-15 16:42:18.004246 7fe661601700  1 heartbeat_map
> is_healthy 'OSD::osd_op_tp thread 0x7fe63a774700' had timed out 

Re: [ceph-users] Ceph newbie(?) issues

2018-03-05 Thread Jan Marquardt
On 05.03.18 at 13:13, Ronny Aasen wrote:
> I had some similar issues when I started my proof of concept; I
> remember the snapshot deletion especially well.
> 
> The rule of thumb for filestore, which I assume you are running, is
> 1 GB of RAM per TB of OSD. So with 8 x 4 TB OSDs you are looking at
> 32 GB of RAM for the OSDs, plus a few GB for the mon service, plus a
> few GB for the OS itself.
> 
> I suspect that if you inspect your dmesg log and memory graphs you will
> find that the out-of-memory killer ends your OSDs when the snapshot
> deletion (or any other high-load task) runs.
> 
> I ended up reducing the number of OSDs per node, since the old
> mainboard I used was already maxed out on memory.

Well, thanks for the broad hint. Somehow I assumed we met the
recommendations, but of course you are right. We'll check whether our
boards support 48 GB of RAM. Unfortunately, there are currently no such
messages in the logs, but I can't rule out that there have been some.

> Corruptions occurred for me as well, and they were normally associated
> with disks dying or giving read errors. Ceph often managed to fix them,
> but sometimes I had to just remove the failing OSD disk.
> 
> Have some graphs to look at. Personally I used munin/munin-node, since
> it was just an apt-get away from functioning graphs.
> 
> I also used smartmontools to send me emails about failing disks, and
> smartctl to check all disks for errors.

I'll check the S.M.A.R.T. data. I am wondering whether scrub errors are
always caused by disk problems or whether they could also be triggered
by flapping OSDs or other circumstances.
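
(Concretely, I have something like this in mind, with /dev/sdX as a
placeholder for each OSD disk:

dmesg -T | grep -i -E 'out of memory|oom'
smartctl -a /dev/sdX | grep -i -E 'reallocated|pending|uncorrect'

i.e. first rule out OOM kills, then look for the usual suspect SMART
attributes.)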

> good luck with ceph !

Thank you!


[ceph-users] Ceph newbie(?) issues

2018-03-05 Thread Jan Marquardt
Hi,

we are relatively new to Ceph and are observing some issues, and I'd
like to know how likely they are to happen when operating a Ceph
cluster.

Currently our setup consists of three servers which are acting as both
OSDs and MONs. Each server has two Intel Xeon L5420 CPUs (yes, I know,
that's not state of the art, but we thought it would be sufficient for
a proof of concept. Maybe we were wrong?) and 24 GB of RAM, and runs
8 OSDs with 4 TB hard disks. Four OSDs share one SSD for journaling.
We started on Kraken and recently upgraded to Luminous. The next two
OSD servers and three separate MONs are ready for deployment. Please
find our ceph.conf attached. Current usage looks like this:

data:
  pools:   1 pools, 768 pgs
  objects: 5240k objects, 18357 GB
  usage:   59825 GB used, 29538 GB / 89364 GB avail

We have only one pool, which is used exclusively for rbd. We started
filling it with data and creating snapshots in January and kept doing
so until mid-February. Everything was working like a charm until we
then started removing old snapshots.

While we were removing snapshots for the first time, the OSDs started
flapping. Apart from this there was no other load on the cluster.
For idle periods we solved it by adding

osd snap trim priority = 1
osd snap trim sleep = 0.1

to ceph.conf. But when there is load from other operations and we
remove big snapshots, OSD flapping still occurs.
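
(For reference, if I remember correctly these values can also be
injected at runtime, e.g.

ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1 --osd_snap_trim_priority 1'

although depending on the version the OSDs may report that a restart is
required for the change to take effect.)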

Last week our first scrub errors appeared. Repairing the first one
was no big deal. The second one, however, was, because the OSD that
was instructed to repair it started crashing: osd.17 on Friday, and
osd.11 today.

ceph1:~# ceph pg repair 0.1b2
instructing pg 0.1b2 on osd.17 to repair

ceph1:~# ceph pg repair 0.1b2
instructing pg 0.1b2 on osd.11 to repair
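
(For the record, the inconsistent objects of a PG can be listed before
repairing, which might narrow this down, e.g.

rados list-inconsistent-obj 0.1b2 --format=json-pretty

assuming 0.1b2 is still the PG reported as inconsistent.)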

I am still researching the crashes, but would already be thankful for
any input.

Any opinions, hints and advice would really be appreciated.

Best Regards

Jan
[global]
fsid = c59e56df-2043-4c92-9492-25f05f268d9f
mon_initial_members = ceph1, ceph2, ceph3
mon_host = 10.10.100.21,10.10.100.22,10.10.100.23
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

public network = 10.10.100.0/24

[osd]

osd journal size = 0
osd snap trim priority = 1
osd snap trim sleep = 0.1

[client]

rbd default features = 3


Re: [ceph-users] Ceph with Clos IP fabric

2017-05-08 Thread Jan Marquardt
Hi,

sorry for the delay, but in the meantime we were able to find a
workaround. Inspired by this:

> Side note: Configuring the loopback IP on the physical interfaces is
> workable if you set it on **all** parallel links. Example with server1:
> 
>  
> 
> “iface enp3s0f0 inet static
> 
>   address 10.10.100.21/32
> 
> iface enp3s0f1 inet static
> 
>   address 10.10.100.21/32
> 
> iface enp4s0f0 inet static
> 
>   address 10.10.100.21/32
> 
> iface enp4s0f1 inet static
> 
>   address 10.10.100.21/32”
> 
>  
> 
> This should guarantee that the loopback ip is advertised if one of the 4
> links to switch1 and switch2 is up, but I am not sure if that’s workable
> for ceph’s listening address.

We added the loopback IP on both lo and dummy0. This solves the issue
for us, and the Ceph cluster works as intended.
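
In case it helps others, a rough sketch of what such an interfaces
configuration could look like (ifupdown syntax, untested as written
here; shown for server1's loopback 10.10.100.21, and the dummy kernel
module has to be available):

auto lo
iface lo inet loopback

auto lo:0
iface lo:0 inet static
    address 10.10.100.21/32

auto dummy0
iface dummy0 inet static
    pre-up modprobe dummy
    address 10.10.100.21/32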

Regards

Jan


-- 
Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg
Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95
E-Mail: supp...@artfiles.de | Web: http://www.artfiles.de
Geschäftsführer: Harald Oltmanns | Tim Evers
Eingetragen im Handelsregister Hamburg - HRB 81478





Re: [ceph-users] Ceph with Clos IP fabric

2017-04-18 Thread Jan Marquardt
On 17.04.17 at 22:12, Richard Hesse wrote:
> A couple of questions:
> 
> 1) What is your rack topology? Are all ceph nodes in the same rack
> communicating with the same top of rack switch?

The cluster is planned for one rack with two ToR / cluster-internal
switches. The cluster will be accessed from 2-3 machines mounted in the
same rack, which have uplinks to the outside world.

> 2) Why did you choose to run the ceph nodes on loopback interfaces as
> opposed to the /24 for the "public" interface?

The fabric needs loopback IP addresses, and the plan was/is to use them
directly for Ceph. What would you suggest instead?

> 3) Are you planning on using RGW at all?

No, there won't be any RGW. It is a plain rbd cluster, which will be
used for backup purposes.

Best Regards

Jan

> On Thu, Apr 13, 2017 at 10:57 AM, Jan Marquardt <j...@artfiles.de> wrote:
> 
> Hi,
> 
> I am currently working on Ceph with an underlying Clos IP fabric and I
> am hitting some issues.
> 
> The setup looks as follows: There are 3 Ceph nodes which are running
> OSDs and MONs. Each server has one /32 loopback ip, which it announces
> via BGP to its uplink switches. Besides the loopback ip each server has
> an management interface with a public (not to be confused with ceph's
> public network) ip address. For BGP switches and servers are running
> quagga/frr.
> 
> Loopback ips:
> 
> 10.10.100.1 # switch1
> 10.10.100.2 # switch2
> 10.10.100.21# server1
> 10.10.100.22# server2
> 10.10.100.23# server3
> 
> Ceph's public network is 10.10.100.0/24.
> 
> Here comes the current main problem: There are two options for
> configuring the loopback address.
> 
> 1.) Configure it on lo. In this case the routing works as inteded, but,
> as far as I found out, Ceph can not be run on lo interface.
> 
> root@server1:~# ip route get 10.10.100.22
> 10.10.100.22 via 169.254.0.1 dev enp4s0f1  src 10.10.100.21
> cache
> 
> 2.) Configure it on dummy0. In this case Ceph is able to start, but
> quagga installs learned routes with wrong source addresses - the public
> management address from each host. This results in network problems,
> because Ceph uses the management ips to communicate to the other Ceph
> servers.
> 
> root@server1:~# ip route get 10.10.100.22
> 10.10.100.22 via 169.254.0.1 dev enp4s0f1  src a.b.c.d
> cache
> 
> (where a.b.c.d is the machine's public ip address on its management
> interface)
> 
> Has already someone done something similar?
> 
> Please let me know, if you need any further information. Any help would
> really be appreciated.
> 
> Best Regards
> 
> Jan
> 
> --
> Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg
> Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95
> E-Mail: supp...@artfiles.de | Web: http://www.artfiles.de
> Geschäftsführer: Harald Oltmanns | Tim Evers
> Eingetragen im Handelsregister Hamburg - HRB 81478
> 
> 

-- 
Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg
Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95
E-Mail: supp...@artfiles.de | Web: http://www.artfiles.de
Geschäftsführer: Harald Oltmanns | Tim Evers
Eingetragen im Handelsregister Hamburg - HRB 81478





[ceph-users] Ceph with Clos IP fabric

2017-04-13 Thread Jan Marquardt
Hi,

I am currently working on Ceph with an underlying Clos IP fabric and I
am hitting some issues.

The setup looks as follows: there are 3 Ceph nodes which are running
OSDs and MONs. Each server has one /32 loopback IP, which it announces
via BGP to its uplink switches. Besides the loopback IP, each server has
a management interface with a public IP address (not to be confused with
Ceph's public network). For BGP, switches and servers are running
quagga/frr.

Loopback ips:

10.10.100.1 # switch1
10.10.100.2 # switch2
10.10.100.21# server1
10.10.100.22# server2
10.10.100.23# server3

Ceph's public network is 10.10.100.0/24.

Here comes the current main problem: There are two options for
configuring the loopback address.

1.) Configure it on lo. In this case the routing works as intended, but,
as far as I found out, Ceph cannot be run on the lo interface.

root@server1:~# ip route get 10.10.100.22
10.10.100.22 via 169.254.0.1 dev enp4s0f1  src 10.10.100.21
cache

2.) Configure it on dummy0. In this case Ceph is able to start, but
quagga installs learned routes with the wrong source address - the
public management address of each host. This results in network
problems, because Ceph then uses the management IPs to communicate with
the other Ceph servers.

root@server1:~# ip route get 10.10.100.22
10.10.100.22 via 169.254.0.1 dev enp4s0f1  src a.b.c.d
cache

(where a.b.c.d is the machine's public ip address on its management
interface)
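
One idea I have not verified yet: zebra can apparently rewrite the
source address of the routes it installs via a route-map, which might
avoid the wrong src without touching the loopback setup. Roughly
(quagga/frr syntax, untested):

route-map SET-SRC permit 10
 set src 10.10.100.21
!
ip protocol bgp route-map SET-SRC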

Has anyone already done something similar?

Please let me know if you need any further information. Any help would
really be appreciated.

Best Regards

Jan

-- 
Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg
Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95
E-Mail: supp...@artfiles.de | Web: http://www.artfiles.de
Geschäftsführer: Harald Oltmanns | Tim Evers
Eingetragen im Handelsregister Hamburg - HRB 81478


