[ceph-users] All shards of PG missing object and inconsistent

2018-09-21 Thread Thomas White
Hi all,

 

I have recently performed a few tasks, namely purging several buckets from
our RGWs and adding additional hosts into Ceph, causing some data movement for
a rebalance. As this is now almost complete, I kicked off some deep scrubs
and one PG is now returning the following information:

 

2018-09-21 23:17:59.717286 7f2f16796700 -1 log_channel(cluster) log [ERR] :
14.1b18 shard 313 missing
14:18daa344:::default.162489536.28__shadow_24TB%2f24TB%2fDESIGNTEAM%2fPROJEC
TS%2fDLA Piper%2f_DLA_4033_ Global Thought Leadership%2fFilm%2f04
Assets%2fFootage %ef%80%a2 Audio
Sync%2fDLA_Thought_Leadership_NYC%2fCam_1%2fA13I1483.MOV.2~5v3nJDNLONBYszy54
ZXZZQgos1D4Ywp.359_6:head

2018-09-21 23:17:59.717292 7f2f16796700 -1 log_channel(cluster) log [ERR] :
14.1b18 shard 665 missing
14:18daa344:::default.162489536.28__shadow_24TB%2f24TB%2fDESIGNTEAM%2fPROJEC
TS%2fDLA Piper%2f_DLA_4033_ Global Thought Leadership%2fFilm%2f04
Assets%2fFootage %ef%80%a2 Audio
Sync%2fDLA_Thought_Leadership_NYC%2fCam_1%2fA13I1483.MOV.2~5v3nJDNLONBYszy54
ZXZZQgos1D4Ywp.359_6:head

2018-09-21 23:17:59.885884 7f2f16796700 -1 log_channel(cluster) log [ERR] :
14.1b18 shard 385 missing
14:18daa344:::default.162489536.28__shadow_24TB%2f24TB%2fDESIGNTEAM%2fPROJEC
TS%2fDLA Piper%2f_DLA_4033_ Global Thought Leadership%2fFilm%2f04
Assets%2fFootage %ef%80%a2 Audio
Sync%2fDLA_Thought_Leadership_NYC%2fCam_1%2fA13I1483.MOV.2~5v3nJDNLONBYszy54
ZXZZQgos1D4Ywp.359_6:head

2018-09-21 23:20:24.954402 7f2f16796700 -1 log_channel(cluster) log [ERR] :
14.1b18 scrub stat mismatch, got 44026/44025 objects, 0/0 clones,
44026/44025 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts,
45423386817/45419192513 bytes, 0/0 hit_set_archive bytes.

2018-09-21 23:20:24.954418 7f2f16796700 -1 log_channel(cluster) log [ERR] :
14.1b18 scrub 1 missing, 0 inconsistent objects

2018-09-21 23:20:24.954421 7f2f16796700 -1 log_channel(cluster) log [ERR] :
14.1b18 scrub 4 errors

 

I recognise the object by name as belonging to a bucket purged earlier in
the day, so it is meant to be deleted. What would be the best way to resolve
this inconsistency when the object is supposed to be absent?
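For reference, the standard inspection commands for this situation would be something
like the following (PG id taken from the log above; whether "pg repair" is even
appropriate when the object is supposed to be gone is exactly the open question here):

  rados list-inconsistent-obj 14.1b18 --format=json-pretty
  ceph pg deep-scrub 14.1b18
  ceph pg repair 14.1b18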

 

Kind Regards,

 

Thomas

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bluestore DB showing as ssd

2018-09-21 Thread Brett Chancellor
Hi all. Quick question about OSD metadata information. I have several OSDs
set up with the data dir on HDD and the DB going to a partition on SSD. But
when I look at the metadata for all the OSDs, it shows the DB as "hdd".
Does this affect anything? And is there any way to change it?

$ sudo ceph osd metadata 1
{
"id": 1,
"arch": "x86_64",
"back_addr": ":6805/2053608",
"back_iface": "eth0",
"bluefs": "1",
"bluefs_db_access_mode": "blk",
"bluefs_db_block_size": "4096",
"bluefs_db_dev": "8:80",
"bluefs_db_dev_node": "sdf",
"bluefs_db_driver": "KernelDevice",
"bluefs_db_model": "PERC H730 Mini  ",
"bluefs_db_partition_path": "/dev/sdf2",
"bluefs_db_rotational": "1",
"bluefs_db_size": "266287972352",
*"bluefs_db_type": "hdd",*
"bluefs_single_shared_device": "0",
"bluefs_slow_access_mode": "blk",
"bluefs_slow_block_size": "4096",
"bluefs_slow_dev": "253:1",
"bluefs_slow_dev_node": "dm-1",
"bluefs_slow_driver": "KernelDevice",
"bluefs_slow_model": "",
"bluefs_slow_partition_path": "/dev/dm-1",
"bluefs_slow_rotational": "1",
"bluefs_slow_size": "6000601989120",
"bluefs_slow_type": "hdd",
"bluestore_bdev_access_mode": "blk",
"bluestore_bdev_block_size": "4096",
"bluestore_bdev_dev": "253:1",
"bluestore_bdev_dev_node": "dm-1",
"bluestore_bdev_driver": "KernelDevice",
"bluestore_bdev_model": "",
"bluestore_bdev_partition_path": "/dev/dm-1",
"bluestore_bdev_rotational": "1",
"bluestore_bdev_size": "6000601989120",
"bluestore_bdev_type": "hdd",
"ceph_version": "ceph version 12.2.4
(52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)",
"cpu": "Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz",
"default_device_class": "hdd",
"distro": "centos",
"distro_description": "CentOS Linux 7 (Core)",
"distro_version": "7",
"front_addr": ":6804/2053608",
"front_iface": "eth0",
"hb_back_addr": ".78:6806/2053608",
"hb_front_addr": ".78:6807/2053608",
"hostname": "ceph0rdi-osd2-1-xrd.eng.sfdc.net",
"journal_rotational": "1",
"kernel_description": "#1 SMP Tue Jun 26 16:32:21 UTC 2018",
"kernel_version": "3.10.0-862.6.3.el7.x86_64",
"mem_swap_kb": "0",
"mem_total_kb": "131743604",
"os": "Linux",
"osd_data": "/var/lib/ceph/osd/ceph-1",
"osd_objectstore": "bluestore",
"rotational": "1"
}
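For what it's worth, BlueStore appears to take that value from the kernel's rotational
flag, which RAID controllers such as the PERC H730 often report as 1 even for SSDs. A
rough sketch of checking and overriding it (assuming sdf really is an SSD; the echo is
not persistent across reboots, so a udev rule would be needed to make it stick):

  cat /sys/block/sdf/queue/rotational          # 1 here would explain the "hdd" value above
  echo 0 | sudo tee /sys/block/sdf/queue/rotational
  sudo systemctl restart ceph-osd@1            # metadata is re-collected when the OSD starts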
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] crush map reclassifier

2018-09-21 Thread Paul Emmerich
I've used a crush location hook script to handle this before device
classes existed.
It checked the device type on startup and assigned the crush position
based on this.

I don't have that crush map anywhere any longer, but the basic version
of it looked like this: two roots, "hdd" and "ssd". The hdd root had
servers with their hostnames in it, and the ssd root had buckets of type
host with "-ssd" appended to the hostname (for a reason I don't
remember).

At some time someone consolidated the two roots under yet another root
because some tool
(I think it might have been proxmox?) couldn't handle separate roots,
especially if none of them
was named "default". We then had two roots within another bucket of
type root which (surprisingly)
worked and is probably a weird edge case.

Probably not too helpful because I don't have any IDs or anything left
from that era...
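For the archives, a sketch of what such a hook could look like (the ceph.conf option is
"osd crush location hook = /path/to/script"; the script is called with --cluster/--id/--type
and must print CRUSH location key=value pairs on stdout; the data-dir layout and the crude
partition-to-disk mapping below are assumptions):

  #!/bin/sh
  # Parse the --id argument passed by ceph-osd.
  while [ $# -gt 0 ]; do
      case "$1" in
          --id) shift; ID="$1" ;;
      esac
      shift
  done
  HOST="$(hostname -s)"
  # Find the disk behind the OSD data dir (filestore-style mount; crude partition stripping).
  PART="$(df --output=source "/var/lib/ceph/osd/ceph-${ID}" | tail -n 1)"
  DISK="$(basename "${PART}" | sed 's/[0-9]*$//')"
  if [ "$(cat "/sys/block/${DISK}/queue/rotational" 2>/dev/null)" = "0" ]; then
      echo "root=ssd host=${HOST}-ssd"
  else
      echo "root=hdd host=${HOST}"
  fi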

Paul
Am Sa., 22. Sep. 2018 um 00:39 Uhr schrieb Sage Weil :
>
> Hi everyone,
>
> In luminous we added the crush device classes that automagically
> categorize your OSDs and hdd, ssd, etc, and allow you write CRUSH rules
> that target a subset of devices.  Prior to this it was necessary to make
> custom edits to your CRUSH map with parallel hierarchies for each
> OSD type, and (similarly) to disable the osd_crush_update_on_start option.
>
> As Dan has noted previously, transitioning from a legacy map to a modern
> one using classes in the naive way will reshuffle all of your data.  He
> worked out a procedure do do this manually but it is delicate and error
> prone.  I'm working on a tool to do it in a robust/safe way now.
>
> However... I want to make sure that the tool is sufficiently general.
> Can anyone/everyone who has a customized CRUSH map to deal with different
> OSD device types please send me a copy (e.g., ceph osd getcrushmap -o
> mycrushmap) so I can test the tool against your map?
>
> Thanks!
> sage
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] crush map reclassifier

2018-09-21 Thread Sage Weil
Hi everyone,

In luminous we added the crush device classes that automagically 
categorize your OSDs as hdd, ssd, etc., and allow you to write CRUSH rules 
that target a subset of devices.  Prior to this it was necessary to make 
custom edits to your CRUSH map with parallel hierarchies for each 
OSD type, and (similarly) to disable the osd_crush_update_on_start option.

As Dan has noted previously, transitioning from a legacy map to a modern 
one using classes in the naive way will reshuffle all of your data.  He 
worked out a procedure to do this manually but it is delicate and error 
prone.  I'm working on a tool to do it in a robust/safe way now.

However... I want to make sure that the tool is sufficiently general.  
Can anyone/everyone who has a customized CRUSH map to deal with different 
OSD device types please send me a copy (e.g., ceph osd getcrushmap -o 
mycrushmap) so I can test the tool against your map?
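For reference, dumping and decompiling a map for inspection looks like this (crushtool
ships with Ceph; the file names are arbitrary):

  ceph osd getcrushmap -o mycrushmap
  crushtool -d mycrushmap -o mycrushmap.txt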

Thanks!
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw rest API to retrive rgw log entries

2018-09-21 Thread Jin Mao
I am looking for an API equivalent of 'radosgw-admin log list' and
'radosgw-admin log show'. The existing /usage API only reports bucket-level
numbers, like 'radosgw-admin usage show' does. Does anyone know if this is
possible from the REST API?

Thanks.

Jin.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet

Le vendredi 21 septembre 2018 à 19:45 +0200, Paul Emmerich a écrit :
> The cache tiering has nothing to do with the PG of the underlying
> pool
> being incomplete.
> You are just seeing these requests as stuck because it's the only
> thing trying to write to the underlying pool.

I agree, it was just to be sure that the problems on OSDs 32, 68 and 69
are related to only one "real" problem.


> What you need to fix is the PG showing incomplete.  I assume you
> already tried reducing the min_size to 4 as suggested? Or did you by
> chance always run with min_size 4 on the ec pool, which is a common
> cause for problems like this.

Yes, it has always run with min_size 4.

We use Luminous 12.2.8 here, but some (~40%) OSDs still run Luminous
12.2.7. I was hoping to "fix" this problem before continuing the
upgrade.

pool details :

pool 37 'bkp-foo-raid6' erasure size 6 min_size 4 crush_rule 20
object_hash rjenkins pg_num 256 pgp_num 256 last_change 585715 lfor
585714/585714 flags hashpspool,backfillfull stripe_width 4096 fast_read
1 application rbd
removed_snaps [1~3]




> Can you share the output of "ceph osd pool ls detail"?
> Also, which version of Ceph are you running?
> Paul
> 
> Am Fr., 21. Sep. 2018 um 19:28 Uhr schrieb Olivier Bonvalet
> :
> > 
> > So I've totally disable cache-tiering and overlay. Now OSD 68 & 69
> > are
> > fine, no more blocked.
> > 
> > But OSD 32 is still blocked, and PG 37.9c still marked incomplete
> > with
> > :
> > 
> > "recovery_state": [
> > {
> > "name": "Started/Primary/Peering/Incomplete",
> > "enter_time": "2018-09-21 18:56:01.222970",
> > "comment": "not enough complete instances of this PG"
> > },
> > 
> > But I don't see blocked requests in OSD.32 logs, should I increase
> > one
> > of the "debug_xx" flag ?
> > 
> > 
> > Le vendredi 21 septembre 2018 à 16:51 +0200, Maks Kowalik a écrit :
> > > According to the query output you pasted shards 1 and 2 are
> > > broken.
> > > But, on the other hand EC profile (4+2) should make it possible
> > > to
> > > recover from 2 shards lost simultanously...
> > > 
> > > pt., 21 wrz 2018 o 16:29 Olivier Bonvalet 
> > > napisał(a):
> > > > Well on drive, I can find thoses parts :
> > > > 
> > > > - cs0 on OSD 29 and 30
> > > > - cs1 on OSD 18 and 19
> > > > - cs2 on OSD 13
> > > > - cs3 on OSD 66
> > > > - cs4 on OSD 0
> > > > - cs5 on OSD 75
> > > > 
> > > > And I can read thoses files too.
> > > > 
> > > > And all thoses OSD are UP and IN.
> > > > 
> > > > 
> > > > Le vendredi 21 septembre 2018 à 13:10 +, Eugen Block a
> > > > écrit :
> > > > > > > I tried to flush the cache with "rados -p cache-bkp-foo
> > > > 
> > > > cache-
> > > > > > > flush-
> > > > > > > evict-all", but it blocks on the object
> > > > > > > "rbd_data.f66c92ae8944a.000f2596".
> > > > > 
> > > > > This is the object that's stuck in the cache tier (according
> > > > > to
> > > > > your
> > > > > output in https://pastebin.com/zrwu5X0w). Can you verify if
> > > > > that
> > > > > block
> > > > > device is in use and healthy or is it corrupt?
> > > > > 
> > > > > 
> > > > > Zitat von Maks Kowalik :
> > > > > 
> > > > > > Could you, please paste the output of pg 37.9c query
> > > > > > 
> > > > > > pt., 21 wrz 2018 o 14:39 Olivier Bonvalet <
> > > > > > ceph.l...@daevel.fr>
> > > > > > napisał(a):
> > > > > > 
> > > > > > > In fact, one object (only one) seem to be blocked on the
> > > > 
> > > > cache
> > > > > > > tier
> > > > > > > (writeback).
> > > > > > > 
> > > > > > > I tried to flush the cache with "rados -p cache-bkp-foo
> > > > 
> > > > cache-
> > > > > > > flush-
> > > > > > > evict-all", but it blocks on the object
> > > > > > > "rbd_data.f66c92ae8944a.000f2596".
> > > > > > > 
> > > > > > > So I reduced (a lot) the cache tier to 200MB, "rados -p
> > > > 
> > > > cache-
> > > > > > > bkp-foo
> > > > > > > ls" now show only 3 objects :
> > > > > > > 
> > > > > > > rbd_directory
> > > > > > > rbd_data.f66c92ae8944a.000f2596
> > > > > > > rbd_header.f66c92ae8944a
> > > > > > > 
> > > > > > > And "cache-flush-evict-all" still hangs.
> > > > > > > 
> > > > > > > I also switched the cache tier to "readproxy", to avoid
> > > > > > > using
> > > > > > > this
> > > > > > > cache. But, it's still blocked.
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier
> > > > > > > Bonvalet
> > > > 
> > > > a
> > > > > > > écrit :
> > > > > > > > Hello,
> > > > > > > > 
> > > > > > > > on a Luminous cluster, I have a PG incomplete and I
> > > > > > > > can't
> > > > 
> > > > find
> > > > > > > > how to
> > > > > > > > fix that.
> > > > > > > > 
> > > > > > > > It's an EC pool (4+2) :
> > > > > > > > 
> > > > > > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75]
> > > > 
> > > > (reducing
> > > > > > > > pool
> > > > > > > > bkp-sb-raid6 min_size from 4 may help; search
> > > > > > > > 

Re: [ceph-users] Proxmox/ceph upgrade and addition of a new node/OSDs

2018-09-21 Thread Fabian Grünbichler
On Fri, Sep 21, 2018 at 09:03:15AM +0200, Hervé Ballans wrote:
> Hi MJ (and all),
> 
> So we upgraded our Proxmox/Ceph cluster, and if we have to summarize the
> operation in a few words : overall, everything went well :)
> The most critical operation of all is the 'osd crush tunables optimal', I
> talk about it in more detail after...
> 
> The Proxmox documentation is really well written and accurate and, normally,
> following the documentation step by step is almost sufficient !

Glad to hear that everything worked well.

> 
> * first step : upgrade Ceph Jewel to Luminous :
> https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous
> (Note here : OSDs remain in FileStore backend, no BlueStore migration)
> 
> * second step : upgrade Proxmox version 4 to 5 :
> https://pve.proxmox.com/wiki/Upgrade_from_4.x_to_5.0
> 
> Just some numbers, observations and tips (based on our feedback, I'm not an
> expert !) :
> 
> * Before migration, make sure you are on the latest version of Proxmox 4
> (4.4-24) and Ceph Jewel (10.2.11)
> 
> * We don't use the pve repository for ceph packages but the official one
> (download.ceph.com). Thus, during the upgrade of Proxmox PVE, we don't
> replace the ceph.com repository with the proxmox.com Ceph repository...

This is not recommended (and for a reason) - our packages are almost
identical to the upstream/official ones. But we do include the
occasional bug fix much faster than the official packages do, including
reverting breakage. Furthermore, when using our repository, you know
that the packages went through our own testing to ensure compatibility
with our stack (e.g., issues like JSON output changing from one minor
release to the next breaking our integration/GUI). Also, this natural
delay between upstream releases and availability in our repository has
saved our users from lots of "serious bug noticed one day after release"
issues since we switched to providing Ceph via our own repositories.

> * When you upgrade Ceph to Luminous (without tunables optimal), there is no
> impact on Proxmox 4. VMs are still running normally.
> The side effect (non-blocking for the functioning of VMs) is located in the
> GUI, on the Ceph menu : it can't report the status of the ceph cluster as it
> has a JSON formatting error (indeed the output of the command 'ceph -s' is
> completely different, really more readable on Luminous)

Yes, this is to be expected. Backporting all of that just for the short
time window of "upgrade in progress" is too much work for too little
gain.

> 
> * It misses a little step in section 8 "Create Manager instances" of the
> upgrade ceph documentation. As the Ceph manager daemon is new since
> Luminous, the package doesn't exist on Jewel. So you have to install the
> ceph-mgr package on each node first before doing 'pveceph createmgr'.

It actually does not ;) ceph-mgr is pulled in by ceph on upgrades from
Jewel to Luminous - unless you manually removed that package at some
point.

> Otherwise :
> - verify that all your VMs are recently backed up on external storage (in
> case a disaster recovery plan is needed!)

Good idea in general :D

> - if you can, stop all your non-critical VMs (in order to limit client io
> operations)
> - if any, wait for the end of current backups then disable datacenter backup
> (in order to limit client io operations). !! do not forget to re-enable it
> when all is over !!
> - if any and if no longer needed, delete your snapshots, it removes many
> useless objects !
> - start the tunables operation outside of major activity periods (night,
> week-end, ??) and take into account that it can be very slow...

Scheduling and carefully planning rebalancing operations is always
needed on a production cluster. Note that the upgrade docs state that
switching to "tunables optimal" is recommended, but "will cause a
massive rebalance".
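For the archives, the throttling people typically apply before kicking off such a
rebalance looks roughly like this (the option names are standard Ceph settings, the
values are just conservative examples and not a recommendation for any particular
cluster):

  ceph osd crush tunables optimal
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'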

> There are probably some options to configure in ceph to avoid 'pgs stuck'
> states, but on our side, as we previously moved our critical VM's disks, we
> didn't care about that !
> 
> * Anyway, the upgrade step of Proxmox PVE is done easily and quickly (just
> follow the documentation). Note that you can upgrade Proxmox PVE before
> doing the 'tunables optimal' operation.
> 
> Hoping that you will find this information useful, good luck with your very
> next migration !

Thank you for the detailed report and feedback!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Paul Emmerich
The cache tiering has nothing to do with the PG of the underlying pool
being incomplete.
You are just seeing these requests as stuck because it's the only
thing trying to write to the underlying pool.

What you need to fix is the PG showing incomplete.  I assume you
already tried reducing the min_size to 4 as suggested? Or did you by
chance always run with min_size 4 on the ec pool, which is a common
cause for problems like this.
Can you share the output of "ceph osd pool ls detail"?
Also, which version of Ceph are you running?

Paul

Am Fr., 21. Sep. 2018 um 19:28 Uhr schrieb Olivier Bonvalet
:
>
> So I've totally disable cache-tiering and overlay. Now OSD 68 & 69 are
> fine, no more blocked.
>
> But OSD 32 is still blocked, and PG 37.9c still marked incomplete with
> :
>
> "recovery_state": [
> {
> "name": "Started/Primary/Peering/Incomplete",
> "enter_time": "2018-09-21 18:56:01.222970",
> "comment": "not enough complete instances of this PG"
> },
>
> But I don't see blocked requests in OSD.32 logs, should I increase one
> of the "debug_xx" flag ?
>
>
> Le vendredi 21 septembre 2018 à 16:51 +0200, Maks Kowalik a écrit :
> > According to the query output you pasted shards 1 and 2 are broken.
> > But, on the other hand EC profile (4+2) should make it possible to
> > recover from 2 shards lost simultanously...
> >
> > pt., 21 wrz 2018 o 16:29 Olivier Bonvalet 
> > napisał(a):
> > > Well on drive, I can find thoses parts :
> > >
> > > - cs0 on OSD 29 and 30
> > > - cs1 on OSD 18 and 19
> > > - cs2 on OSD 13
> > > - cs3 on OSD 66
> > > - cs4 on OSD 0
> > > - cs5 on OSD 75
> > >
> > > And I can read thoses files too.
> > >
> > > And all thoses OSD are UP and IN.
> > >
> > >
> > > Le vendredi 21 septembre 2018 à 13:10 +, Eugen Block a écrit :
> > > > > > I tried to flush the cache with "rados -p cache-bkp-foo
> > > cache-
> > > > > > flush-
> > > > > > evict-all", but it blocks on the object
> > > > > > "rbd_data.f66c92ae8944a.000f2596".
> > > >
> > > > This is the object that's stuck in the cache tier (according to
> > > > your
> > > > output in https://pastebin.com/zrwu5X0w). Can you verify if that
> > > > block
> > > > device is in use and healthy or is it corrupt?
> > > >
> > > >
> > > > Zitat von Maks Kowalik :
> > > >
> > > > > Could you, please paste the output of pg 37.9c query
> > > > >
> > > > > pt., 21 wrz 2018 o 14:39 Olivier Bonvalet 
> > > > > napisał(a):
> > > > >
> > > > > > In fact, one object (only one) seem to be blocked on the
> > > cache
> > > > > > tier
> > > > > > (writeback).
> > > > > >
> > > > > > I tried to flush the cache with "rados -p cache-bkp-foo
> > > cache-
> > > > > > flush-
> > > > > > evict-all", but it blocks on the object
> > > > > > "rbd_data.f66c92ae8944a.000f2596".
> > > > > >
> > > > > > So I reduced (a lot) the cache tier to 200MB, "rados -p
> > > cache-
> > > > > > bkp-foo
> > > > > > ls" now show only 3 objects :
> > > > > >
> > > > > > rbd_directory
> > > > > > rbd_data.f66c92ae8944a.000f2596
> > > > > > rbd_header.f66c92ae8944a
> > > > > >
> > > > > > And "cache-flush-evict-all" still hangs.
> > > > > >
> > > > > > I also switched the cache tier to "readproxy", to avoid using
> > > > > > this
> > > > > > cache. But, it's still blocked.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet
> > > a
> > > > > > écrit :
> > > > > > > Hello,
> > > > > > >
> > > > > > > on a Luminous cluster, I have a PG incomplete and I can't
> > > find
> > > > > > > how to
> > > > > > > fix that.
> > > > > > >
> > > > > > > It's an EC pool (4+2) :
> > > > > > >
> > > > > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75]
> > > (reducing
> > > > > > > pool
> > > > > > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs
> > > for
> > > > > > > 'incomplete')
> > > > > > >
> > > > > > > Of course, we can't reduce min_size from 4.
> > > > > > >
> > > > > > > And the full state : https://pastebin.com/zrwu5X0w
> > > > > > >
> > > > > > > So, IO are blocked, we can't access thoses damaged data.
> > > > > > > OSD blocks too :
> > > > > > > osds 32,68,69 have stuck requests > 4194.3 sec
> > > > > > >
> > > > > > > OSD 32 is the primary of this PG.
> > > > > > > And OSD 68 and 69 are for cache tiering.
> > > > > > >
> > > > > > > Any idea how can I fix that ?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Olivier
> > > > > > >
> > > > > > >
> > > > > > > ___
> > > > > > > ceph-users mailing list
> > > > > > > ceph-users@lists.ceph.com
> > > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > >
> > > > > >
> > > > > > ___
> > > > > > ceph-users mailing list
> > > > > > ceph-users@lists.ceph.com
> > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > >
> > > >
> > > >

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
So I've totally disabled cache-tiering and the overlay. Now OSDs 68 & 69 are
fine, no longer blocked.

But OSD 32 is still blocked, and PG 37.9c still marked incomplete with
:

"recovery_state": [
{
"name": "Started/Primary/Peering/Incomplete",
"enter_time": "2018-09-21 18:56:01.222970",
"comment": "not enough complete instances of this PG"
},

But I don't see blocked requests in the OSD.32 logs; should I increase one
of the "debug_xx" flags?


Le vendredi 21 septembre 2018 à 16:51 +0200, Maks Kowalik a écrit :
> According to the query output you pasted shards 1 and 2 are broken.
> But, on the other hand EC profile (4+2) should make it possible to
> recover from 2 shards lost simultanously... 
> 
> pt., 21 wrz 2018 o 16:29 Olivier Bonvalet 
> napisał(a):
> > Well on drive, I can find thoses parts :
> > 
> > - cs0 on OSD 29 and 30
> > - cs1 on OSD 18 and 19
> > - cs2 on OSD 13
> > - cs3 on OSD 66
> > - cs4 on OSD 0
> > - cs5 on OSD 75
> > 
> > And I can read thoses files too.
> > 
> > And all thoses OSD are UP and IN.
> > 
> > 
> > Le vendredi 21 septembre 2018 à 13:10 +, Eugen Block a écrit :
> > > > > I tried to flush the cache with "rados -p cache-bkp-foo
> > cache-
> > > > > flush-
> > > > > evict-all", but it blocks on the object
> > > > > "rbd_data.f66c92ae8944a.000f2596".
> > > 
> > > This is the object that's stuck in the cache tier (according to
> > > your  
> > > output in https://pastebin.com/zrwu5X0w). Can you verify if that
> > > block  
> > > device is in use and healthy or is it corrupt?
> > > 
> > > 
> > > Zitat von Maks Kowalik :
> > > 
> > > > Could you, please paste the output of pg 37.9c query
> > > > 
> > > > pt., 21 wrz 2018 o 14:39 Olivier Bonvalet 
> > > > napisał(a):
> > > > 
> > > > > In fact, one object (only one) seem to be blocked on the
> > cache
> > > > > tier
> > > > > (writeback).
> > > > > 
> > > > > I tried to flush the cache with "rados -p cache-bkp-foo
> > cache-
> > > > > flush-
> > > > > evict-all", but it blocks on the object
> > > > > "rbd_data.f66c92ae8944a.000f2596".
> > > > > 
> > > > > So I reduced (a lot) the cache tier to 200MB, "rados -p
> > cache-
> > > > > bkp-foo
> > > > > ls" now show only 3 objects :
> > > > > 
> > > > > rbd_directory
> > > > > rbd_data.f66c92ae8944a.000f2596
> > > > > rbd_header.f66c92ae8944a
> > > > > 
> > > > > And "cache-flush-evict-all" still hangs.
> > > > > 
> > > > > I also switched the cache tier to "readproxy", to avoid using
> > > > > this
> > > > > cache. But, it's still blocked.
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet
> > a
> > > > > écrit :
> > > > > > Hello,
> > > > > > 
> > > > > > on a Luminous cluster, I have a PG incomplete and I can't
> > find
> > > > > > how to
> > > > > > fix that.
> > > > > > 
> > > > > > It's an EC pool (4+2) :
> > > > > > 
> > > > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75]
> > (reducing
> > > > > > pool
> > > > > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs
> > for
> > > > > > 'incomplete')
> > > > > > 
> > > > > > Of course, we can't reduce min_size from 4.
> > > > > > 
> > > > > > And the full state : https://pastebin.com/zrwu5X0w
> > > > > > 
> > > > > > So, IO are blocked, we can't access thoses damaged data.
> > > > > > OSD blocks too :
> > > > > > osds 32,68,69 have stuck requests > 4194.3 sec
> > > > > > 
> > > > > > OSD 32 is the primary of this PG.
> > > > > > And OSD 68 and 69 are for cache tiering.
> > > > > > 
> > > > > > Any idea how can I fix that ?
> > > > > > 
> > > > > > Thanks,
> > > > > > 
> > > > > > Olivier
> > > > > > 
> > > > > > 
> > > > > > ___
> > > > > > ceph-users mailing list
> > > > > > ceph-users@lists.ceph.com
> > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > 
> > > > > 
> > > > > ___
> > > > > ceph-users mailing list
> > > > > ceph-users@lists.ceph.com
> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > 
> > > 
> > > 
> > > 
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Maks Kowalik
According to the query output you pasted, shards 1 and 2 are broken.
But, on the other hand, an EC profile (4+2) should make it possible to recover
from 2 shards lost simultaneously...

pt., 21 wrz 2018 o 16:29 Olivier Bonvalet  napisał(a):

> Well on drive, I can find thoses parts :
>
> - cs0 on OSD 29 and 30
> - cs1 on OSD 18 and 19
> - cs2 on OSD 13
> - cs3 on OSD 66
> - cs4 on OSD 0
> - cs5 on OSD 75
>
> And I can read thoses files too.
>
> And all thoses OSD are UP and IN.
>
>
> Le vendredi 21 septembre 2018 à 13:10 +, Eugen Block a écrit :
> > > > I tried to flush the cache with "rados -p cache-bkp-foo cache-
> > > > flush-
> > > > evict-all", but it blocks on the object
> > > > "rbd_data.f66c92ae8944a.000f2596".
> >
> > This is the object that's stuck in the cache tier (according to
> > your
> > output in https://pastebin.com/zrwu5X0w). Can you verify if that
> > block
> > device is in use and healthy or is it corrupt?
> >
> >
> > Zitat von Maks Kowalik :
> >
> > > Could you, please paste the output of pg 37.9c query
> > >
> > > pt., 21 wrz 2018 o 14:39 Olivier Bonvalet 
> > > napisał(a):
> > >
> > > > In fact, one object (only one) seem to be blocked on the cache
> > > > tier
> > > > (writeback).
> > > >
> > > > I tried to flush the cache with "rados -p cache-bkp-foo cache-
> > > > flush-
> > > > evict-all", but it blocks on the object
> > > > "rbd_data.f66c92ae8944a.000f2596".
> > > >
> > > > So I reduced (a lot) the cache tier to 200MB, "rados -p cache-
> > > > bkp-foo
> > > > ls" now show only 3 objects :
> > > >
> > > > rbd_directory
> > > > rbd_data.f66c92ae8944a.000f2596
> > > > rbd_header.f66c92ae8944a
> > > >
> > > > And "cache-flush-evict-all" still hangs.
> > > >
> > > > I also switched the cache tier to "readproxy", to avoid using
> > > > this
> > > > cache. But, it's still blocked.
> > > >
> > > >
> > > >
> > > >
> > > > Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet a
> > > > écrit :
> > > > > Hello,
> > > > >
> > > > > on a Luminous cluster, I have a PG incomplete and I can't find
> > > > > how to
> > > > > fix that.
> > > > >
> > > > > It's an EC pool (4+2) :
> > > > >
> > > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing
> > > > > pool
> > > > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> > > > > 'incomplete')
> > > > >
> > > > > Of course, we can't reduce min_size from 4.
> > > > >
> > > > > And the full state : https://pastebin.com/zrwu5X0w
> > > > >
> > > > > So, IO are blocked, we can't access thoses damaged data.
> > > > > OSD blocks too :
> > > > > osds 32,68,69 have stuck requests > 4194.3 sec
> > > > >
> > > > > OSD 32 is the primary of this PG.
> > > > > And OSD 68 and 69 are for cache tiering.
> > > > >
> > > > > Any idea how can I fix that ?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Olivier
> > > > >
> > > > >
> > > > > ___
> > > > > ceph-users mailing list
> > > > > ceph-users@lists.ceph.com
> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > >
> > > >
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] No fix for 0x6706be76 CRCs ? [SOLVED] (WORKAROUND)

2018-09-21 Thread Alfredo Daniel Rezinovsky

I have Ubuntu servers.

With ukuu I installed kernel 4.8.17-040817 (the last available kernel
< 4.9) and I haven't seen any 0x6706be76 CRC since.


Nor any inconsistency.


On 19/09/18 12:01, Alfredo Daniel Rezinovsky wrote:

Tried 4.17 with the same problem

Just downgraded to 4.8. Let's see if no more 0x67... appears


On 18/09/18 16:28, Alfredo Daniel Rezinovsky wrote:
I started with this after upgrade to bionic. I had Xenial with lts 
kernels (4.13) without problem.


I will try to change to ubuntu 4.13 and wait for the logs.

Thanks


On 18/09/18 16:27, Paul Emmerich wrote:

Yeah, it's very likely a kernel bug (that no one managed to reduce to
a simpler test case or even to reproduce it reliably with reasonable
effort on a test system).

4.9 and earlier aren't affected as far as we can tell, we only
encountered this after upgrading. But I think Bionic ships with a
broken kernel.
Try raising the issue with the ubuntu guys if you are using a
distribution kernel.


Paul

2018-09-18 21:23 GMT+02:00 Alfredo Daniel Rezinovsky
:

MOMENT !!!

"Some kernels (4.9+) sometime fail to return data when reading from 
a block

device under memory pressure."

I dind't knew that was the problem. Can't I just dowgrade the kernel?

There are known working versions o just need to be prior 4.9?


On 18/09/18 16:19, Paul Emmerich wrote:

We built a work-around here: https://github.com/ceph/ceph/pull/23273
Which hasn't been backported, but we'll ship 13.2.2 in our Debian
packages for the croit OS image.


Paul


2018-09-18 21:10 GMT+02:00 Alfredo Daniel Rezinovsky
:

Changed all my hardware. Now I have plenty of free ram. swap never 
needed,

low iowait and still

7fdbbb73e700 -1 bluestore(/var/lib/ceph/osd/ceph-6) _verify_csum bad
crc32c/0x1000 checksum at blob offset 0x1e000, got 0x6706be76, 
expected

0x85a3fefe, device location [0x25ac04be000~1000], logical extent
0x1e000~1000, object #2:fd955b81:::1729cdb.0006

It happens sometimes, in all my OSDs.

Bluestore OSDs with data in HDD and block.db in SSD

After running pg repair the pgs were always repaired.

running ceph in ubuntu 13.2.1-1bionic

--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo









--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
Well, on disk, I can find those parts:

- cs0 on OSD 29 and 30
- cs1 on OSD 18 and 19
- cs2 on OSD 13
- cs3 on OSD 66
- cs4 on OSD 0
- cs5 on OSD 75

And I can read those files too.

And all those OSDs are UP and IN.


Le vendredi 21 septembre 2018 à 13:10 +, Eugen Block a écrit :
> > > I tried to flush the cache with "rados -p cache-bkp-foo cache-
> > > flush-
> > > evict-all", but it blocks on the object
> > > "rbd_data.f66c92ae8944a.000f2596".
> 
> This is the object that's stuck in the cache tier (according to
> your  
> output in https://pastebin.com/zrwu5X0w). Can you verify if that
> block  
> device is in use and healthy or is it corrupt?
> 
> 
> Zitat von Maks Kowalik :
> 
> > Could you, please paste the output of pg 37.9c query
> > 
> > pt., 21 wrz 2018 o 14:39 Olivier Bonvalet 
> > napisał(a):
> > 
> > > In fact, one object (only one) seem to be blocked on the cache
> > > tier
> > > (writeback).
> > > 
> > > I tried to flush the cache with "rados -p cache-bkp-foo cache-
> > > flush-
> > > evict-all", but it blocks on the object
> > > "rbd_data.f66c92ae8944a.000f2596".
> > > 
> > > So I reduced (a lot) the cache tier to 200MB, "rados -p cache-
> > > bkp-foo
> > > ls" now show only 3 objects :
> > > 
> > > rbd_directory
> > > rbd_data.f66c92ae8944a.000f2596
> > > rbd_header.f66c92ae8944a
> > > 
> > > And "cache-flush-evict-all" still hangs.
> > > 
> > > I also switched the cache tier to "readproxy", to avoid using
> > > this
> > > cache. But, it's still blocked.
> > > 
> > > 
> > > 
> > > 
> > > Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet a
> > > écrit :
> > > > Hello,
> > > > 
> > > > on a Luminous cluster, I have a PG incomplete and I can't find
> > > > how to
> > > > fix that.
> > > > 
> > > > It's an EC pool (4+2) :
> > > > 
> > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing
> > > > pool
> > > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> > > > 'incomplete')
> > > > 
> > > > Of course, we can't reduce min_size from 4.
> > > > 
> > > > And the full state : https://pastebin.com/zrwu5X0w
> > > > 
> > > > So, IO are blocked, we can't access thoses damaged data.
> > > > OSD blocks too :
> > > > osds 32,68,69 have stuck requests > 4194.3 sec
> > > > 
> > > > OSD 32 is the primary of this PG.
> > > > And OSD 68 and 69 are for cache tiering.
> > > > 
> > > > Any idea how can I fix that ?
> > > > 
> > > > Thanks,
> > > > 
> > > > Olivier
> > > > 
> > > > 
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > 
> > > 
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd-nbd map question

2018-09-21 Thread Vikas Rana
Hi
I’m using 10.2.10

Thx

On Fri, Sep 21, 2018 at 9:14 AM Mykola Golub 
wrote:

> Vikas, could you tell what version do you observe this on?
>
> Because I can reproduce this only on jewel, and it has been fixed
> starting since luminous 12.2.1 [1].
>
> [1] http://tracker.ceph.com/issues/20426
>
> On Wed, Sep 19, 2018 at 03:48:44PM -0400, Jason Dillaman wrote:
> > Thanks for reporting this -- it looks like we broke the part where
> > command-line config overrides were parsed out from the parsing. I've
> > opened a tracker ticket against the issue [1].
> >
> > On Wed, Sep 19, 2018 at 2:49 PM Vikas Rana  wrote:
> > >
> > > Hi there,
> > >
> > > With default cluster name "ceph" I can map rbd-nbd without any issue.
> > >
> > > But for a different cluster name, i'm not able to map image using
> rbd-nbd and getting
> > >
> > > root@vtier-P-node1:/etc/ceph# rbd-nbd --cluster cephdr map
> test-pool/testvol
> > > rbd-nbd: unknown command: --cluster
> > >
> > >
> > > I looked at the man page and the syntax looks right.
> > > Can someone please help me on what I'm doing wrong?
> > >
> > > Thanks,
> > > -Vikas
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > [1] http://tracker.ceph.com/issues/36089
> >
> > --
> > Jason
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Mykola Golub
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
Yep :

pool 38 'cache-bkp-foo' replicated size 3 min_size 2 crush_rule 26
object_hash rjenkins pg_num 128 pgp_num 128 last_change 585369 lfor
68255/68255 flags hashpspool,incomplete_clones tier_of 37 cache_mode
readproxy target_bytes 209715200 hit_set
bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 300s
x2 decay_rate 0 search_last_n 0 min_read_recency_for_promote 10
min_write_recency_for_promote 2 stripe_width 0

I can't totally disable the cache tiering, because the OSDs are on FileStore
(so without the "overwrites" feature).

Le vendredi 21 septembre 2018 à 13:26 +, Eugen Block a écrit :
> > I also switched the cache tier to "readproxy", to avoid using this
> > cache. But, it's still blocked.
> 
> You could change the cache mode to "none" to disable it. Could you  
> paste the output of:
> 
> ceph osd pool ls detail | grep cache-bkp-foo
> 
> 
> 
> Zitat von Olivier Bonvalet :
> 
> > In fact, one object (only one) seem to be blocked on the cache tier
> > (writeback).
> > 
> > I tried to flush the cache with "rados -p cache-bkp-foo cache-
> > flush-
> > evict-all", but it blocks on the object
> > "rbd_data.f66c92ae8944a.000f2596".
> > 
> > So I reduced (a lot) the cache tier to 200MB, "rados -p cache-bkp-
> > foo
> > ls" now show only 3 objects :
> > 
> > rbd_directory
> > rbd_data.f66c92ae8944a.000f2596
> > rbd_header.f66c92ae8944a
> > 
> > And "cache-flush-evict-all" still hangs.
> > 
> > I also switched the cache tier to "readproxy", to avoid using this
> > cache. But, it's still blocked.
> > 
> > 
> > 
> > 
> > Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet a
> > écrit :
> > > Hello,
> > > 
> > > on a Luminous cluster, I have a PG incomplete and I can't find
> > > how to
> > > fix that.
> > > 
> > > It's an EC pool (4+2) :
> > > 
> > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing
> > > pool
> > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> > > 'incomplete')
> > > 
> > > Of course, we can't reduce min_size from 4.
> > > 
> > > And the full state : https://pastebin.com/zrwu5X0w
> > > 
> > > So, IO are blocked, we can't access thoses damaged data.
> > > OSD blocks too :
> > > osds 32,68,69 have stuck requests > 4194.3 sec
> > > 
> > > OSD 32 is the primary of this PG.
> > > And OSD 68 and 69 are for cache tiering.
> > > 
> > > Any idea how can I fix that ?
> > > 
> > > Thanks,
> > > 
> > > Olivier
> > > 
> > > 
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dashboard Object Gateway

2018-09-21 Thread Volker Theile
Hi Hendrik,

thank you for reporting the issue. I've opened a tracker issue for that,
see https://tracker.ceph.com/issues/36109.

As a workaround, could you manually configure the host and port via the CLI, using "ceph
dashboard set-rgw-api-host " and "ceph dashboard set-rgw-api-port
"?

Regards
Volker

Am 18.09.2018 um 12:57 schrieb Hendrik Peyerl:
> Hello all,
>
> we just deployed an Object Gateway to our CEPH Cluster via ceph-deploy
> in an IPv6 only Mimic Cluster. To make sure the RGW listens on IPv6 we
> set the following config:
> rgw_frontends = civetweb port=[::]:7480
>
> We now tried to enable the dashboard functionality for said gateway
> but we are running into an error 500 after trying to access it via the
> dashboard, the mgr log shows the following:
>
> {"status": "500 Internal Server Error", "version": "3.2.2", "detail":
> "The server encountered an unexpected condition which prevented it
> from fulfilling the request.", "traceback": "Traceback (most recent
> call last):\\n  File
> \\"/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py\\", line
> 656, in respond\\n    response.body = self.handler()\\n  File
> \\"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\\", line
> 188, in __call__\\n    self.body = self.oldhandler(*args, **kwargs)\\n
> File \\"/usr/lib/python2.7/site-packages/cherrypy/lib/jsontools.py\\",
> line 61, in json_handler\\n    value =
> cherrypy.serving.request._json_inner_handler(*args, **kwargs)\\n  File
> \\"/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py\\", line
> 34, in __call__\\n    return self.callable(*self.args,
> **self.kwargs)\\n File
> \\"/usr/lib64/ceph/mgr/dashboard/controllers/rgw.py\\", line 23, in
> status\\n    instance = RgwClient.admin_instance()\\n  File
> \\"/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py\\", line 138,
> in admin_instance\\n    return
> RgwClient.instance(RgwClient._SYSTEM_USERID)\\n  File
> \\"/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py\\", line 121,
> in instance\\n    RgwClient._load_settings()\\n  File
> \\"/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py\\", line 102,
> in _load_settings\\n    host, port = _determine_rgw_addr()\\n  File
> \\"/usr/lib64/ceph/mgr/dashboard/services/rgw_client.py\\", line 78,
> in _determine_rgw_addr\\n    raise LookupError(\'Failed to determine
> RGW port\')\\nLookupError: Failed to determine RGW port\\n"}']
>
>
> Any help would be greatly appreciated.
>
> Thanks,
>
> Hendrik
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-- 
Volker Theile
Software Engineer | openATTIC
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
Phone: +49 173 5876879
E-Mail: vthe...@suse.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] customized ceph cluster name by ceph-deploy

2018-09-21 Thread Paul Emmerich
Cluster names are deprecated, don't use them. I think they might have
been removed with ceph-deploy 2.x (?)
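If the goal is just a differently named configuration on the client side, pointing the
tools at a specific conf file still works; a sketch (the file name is an assumption):

  ceph --conf /etc/ceph/pescadores.conf -s
  # or for a whole shell session:
  export CEPH_CONF=/etc/ceph/pescadores.conf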


Paul
Am Fr., 21. Sep. 2018 um 15:13 Uhr schrieb Joshua Chen
:
>
> Hi all,
>   I am using ceph-deploy 2.0.1 to create my testing cluster by this command:
>
> ceph-deploy --cluster pescadores  new  --cluster-network 100.109.240.0/24 
> --public-network 10.109.240.0/24  cephmon1 cephmon2 cephmon3
>
> but the --cluster pescadores (name of the cluster) doesn't seem to work. 
> Anyone could help me on this or point out the direction? anything wrong with 
> my cli?
>
> or what is the equivelent ceph command to do the same job?
>
> Cheers
> Joshua
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Eugen Block

I also switched the cache tier to "readproxy", to avoid using this
cache. But, it's still blocked.


You could change the cache mode to "none" to disable it. Could you  
paste the output of:


ceph osd pool ls detail | grep cache-bkp-foo
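For completeness, fully detaching a writeback tier usually goes through a sequence like
the one below (pool names taken from this thread; only applicable once nothing depends
on the tier any more):

  ceph osd tier cache-mode cache-bkp-foo forward --yes-i-really-mean-it
  rados -p cache-bkp-foo cache-flush-evict-all
  ceph osd tier remove-overlay bkp-foo-raid6
  ceph osd tier remove bkp-foo-raid6 cache-bkp-foo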



Zitat von Olivier Bonvalet :


In fact, one object (only one) seem to be blocked on the cache tier
(writeback).

I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-
evict-all", but it blocks on the object
"rbd_data.f66c92ae8944a.000f2596".

So I reduced (a lot) the cache tier to 200MB, "rados -p cache-bkp-foo
ls" now show only 3 objects :

rbd_directory
rbd_data.f66c92ae8944a.000f2596
rbd_header.f66c92ae8944a

And "cache-flush-evict-all" still hangs.

I also switched the cache tier to "readproxy", to avoid using this
cache. But, it's still blocked.




Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet a écrit :

Hello,

on a Luminous cluster, I have a PG incomplete and I can't find how to
fix that.

It's an EC pool (4+2) :

pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
'incomplete')

Of course, we can't reduce min_size from 4.

And the full state : https://pastebin.com/zrwu5X0w

So, IO are blocked, we can't access thoses damaged data.
OSD blocks too :
osds 32,68,69 have stuck requests > 4194.3 sec

OSD 32 is the primary of this PG.
And OSD 68 and 69 are for cache tiering.

Any idea how can I fix that ?

Thanks,

Olivier


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd-nbd map question

2018-09-21 Thread Mykola Golub
Vikas, could you tell us what version you observe this on?

Because I can reproduce this only on jewel, and it has been fixed
starting since luminous 12.2.1 [1].

[1] http://tracker.ceph.com/issues/20426

On Wed, Sep 19, 2018 at 03:48:44PM -0400, Jason Dillaman wrote:
> Thanks for reporting this -- it looks like we broke the part where
> command-line config overrides were parsed out from the parsing. I've
> opened a tracker ticket against the issue [1].
> 
> On Wed, Sep 19, 2018 at 2:49 PM Vikas Rana  wrote:
> >
> > Hi there,
> >
> > With default cluster name "ceph" I can map rbd-nbd without any issue.
> >
> > But for a different cluster name, i'm not able to map image using rbd-nbd 
> > and getting
> >
> > root@vtier-P-node1:/etc/ceph# rbd-nbd --cluster cephdr map test-pool/testvol
> > rbd-nbd: unknown command: --cluster
> >
> >
> > I looked at the man page and the syntax looks right.
> > Can someone please help me on what I'm doing wrong?
> >
> > Thanks,
> > -Vikas
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> [1] http://tracker.ceph.com/issues/36089
> 
> -- 
> Jason
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Mykola Golub
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] customized ceph cluster name by ceph-deploy

2018-09-21 Thread Joshua Chen
Hi all,
  I am using ceph-deploy 2.0.1 to create my testing cluster by this command:

ceph-deploy --cluster pescadores  new  --cluster-network 100.109.240.0/24
--public-network 10.109.240.0/24  cephmon1 cephmon2 cephmon3

but the --cluster pescadores (name of the cluster) doesn't seem to work.
Could anyone help me with this or point me in the right direction? Is anything wrong
with my CLI?

Or what is the equivalent ceph command to do the same job?

Cheers
Joshua
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Eugen Block

I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-
evict-all", but it blocks on the object
"rbd_data.f66c92ae8944a.000f2596".


This is the object that's stuck in the cache tier (according to your  
output in https://pastebin.com/zrwu5X0w). Can you verify if that block  
device is in use and healthy or is it corrupt?
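A couple of low-level checks that could help here (object and pool names as quoted in
this thread; stat just confirms whether the copy is reachable, and listwatchers shows
whether a client still holds a watch on the image header, which can prevent eviction):

  rados -p cache-bkp-foo stat rbd_data.f66c92ae8944a.000f2596
  rados -p cache-bkp-foo listwatchers rbd_header.f66c92ae8944a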



Zitat von Maks Kowalik :


Could you, please paste the output of pg 37.9c query

pt., 21 wrz 2018 o 14:39 Olivier Bonvalet  napisał(a):


In fact, one object (only one) seem to be blocked on the cache tier
(writeback).

I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-
evict-all", but it blocks on the object
"rbd_data.f66c92ae8944a.000f2596".

So I reduced (a lot) the cache tier to 200MB, "rados -p cache-bkp-foo
ls" now show only 3 objects :

rbd_directory
rbd_data.f66c92ae8944a.000f2596
rbd_header.f66c92ae8944a

And "cache-flush-evict-all" still hangs.

I also switched the cache tier to "readproxy", to avoid using this
cache. But, it's still blocked.




Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet a écrit :
> Hello,
>
> on a Luminous cluster, I have a PG incomplete and I can't find how to
> fix that.
>
> It's an EC pool (4+2) :
>
> pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
> bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> 'incomplete')
>
> Of course, we can't reduce min_size from 4.
>
> And the full state : https://pastebin.com/zrwu5X0w
>
> So, IO are blocked, we can't access thoses damaged data.
> OSD blocks too :
> osds 32,68,69 have stuck requests > 4194.3 sec
>
> OSD 32 is the primary of this PG.
> And OSD 68 and 69 are for cache tiering.
>
> Any idea how can I fix that ?
>
> Thanks,
>
> Olivier
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Maks Kowalik
Could you, please paste the output of pg 37.9c query

pt., 21 wrz 2018 o 14:39 Olivier Bonvalet  napisał(a):

> In fact, one object (only one) seem to be blocked on the cache tier
> (writeback).
>
> I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-
> evict-all", but it blocks on the object
> "rbd_data.f66c92ae8944a.000f2596".
>
> So I reduced (a lot) the cache tier to 200MB, "rados -p cache-bkp-foo
> ls" now show only 3 objects :
>
> rbd_directory
> rbd_data.f66c92ae8944a.000f2596
> rbd_header.f66c92ae8944a
>
> And "cache-flush-evict-all" still hangs.
>
> I also switched the cache tier to "readproxy", to avoid using this
> cache. But, it's still blocked.
>
>
>
>
> Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet a écrit :
> > Hello,
> >
> > on a Luminous cluster, I have a PG incomplete and I can't find how to
> > fix that.
> >
> > It's an EC pool (4+2) :
> >
> > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
> > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> > 'incomplete')
> >
> > Of course, we can't reduce min_size from 4.
> >
> > And the full state : https://pastebin.com/zrwu5X0w
> >
> > So, IO are blocked, we can't access thoses damaged data.
> > OSD blocks too :
> > osds 32,68,69 have stuck requests > 4194.3 sec
> >
> > OSD 32 is the primary of this PG.
> > And OSD 68 and 69 are for cache tiering.
> >
> > Any idea how can I fix that ?
> >
> > Thanks,
> >
> > Olivier
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
In fact, one object (only one) seems to be blocked on the cache tier
(writeback).

I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-
evict-all", but it blocks on the object
"rbd_data.f66c92ae8944a.000f2596".

So I reduced the cache tier (a lot) to 200MB; "rados -p cache-bkp-foo
ls" now shows only 3 objects:

rbd_directory
rbd_data.f66c92ae8944a.000f2596
rbd_header.f66c92ae8944a

And "cache-flush-evict-all" still hangs.

I also switched the cache tier to "readproxy", to avoid using this
cache. But, it's still blocked.
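One way to confirm whether that blocked object actually maps onto the incomplete PG
(pool and object names as above; the command prints the PG and the acting set):

  ceph osd map cache-bkp-foo rbd_data.f66c92ae8944a.000f2596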




Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet a écrit :
> Hello,
> 
> on a Luminous cluster, I have a PG incomplete and I can't find how to
> fix that.
> 
> It's an EC pool (4+2) :
> 
> pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
> bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> 'incomplete')
> 
> Of course, we can't reduce min_size from 4.
> 
> And the full state : https://pastebin.com/zrwu5X0w
> 
> So, IO are blocked, we can't access thoses damaged data.
> OSD blocks too :
> osds 32,68,69 have stuck requests > 4194.3 sec
> 
> OSD 32 is the primary of this PG.
> And OSD 68 and 69 are for cache tiering.
> 
> Any idea how can I fix that ?
> 
> Thanks,
> 
> Olivier
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hyper-v ISCSI support

2018-09-21 Thread Jason Dillaman
On Fri, Sep 21, 2018 at 6:48 AM Glen Baars  wrote:
>
> Hello Ceph Users,
>
>
>
> We have been using ceph-iscsi-cli for some time now with vmware and it is 
> performing ok.
>
>
>
> We would like to use the same iscsi service to store our Hyper-v VMs via 
> windows clustered shared volumes. When we add the volume to windows failover 
> manager we get a device is not ready error. I am assuming this is due to 
> SCSI-3 persistent reservations.

That is correct -- the upstream kernel LIO doesn't have any support
for distributing SCSI-3 persistent reservations between iSCSI gateways
at this time. SUSE has some custom kernel patches to distribute those
reservations via the Ceph cluster but it has been previously rejected
from inclusion in the upstream kernel. There is also the PetaSAN
project which is derived from the SUSE kernel plus some other changes.

> Has anyone managed to get ceph to serve iscsi to windows clustered shared 
> volumes? If so, how?
>
> Kind regards,
>
> Glen Baars
>
> This e-mail is intended solely for the benefit of the addressee(s) and any 
> other named recipient. It is confidential and may contain legally privileged 
> or confidential information. If you are not the recipient, any use, 
> distribution, disclosure or copying of this e-mail is prohibited. The 
> confidentiality and legal privilege attached to this communication is not 
> waived or lost by reason of the mistaken transmission or delivery to you. If 
> you have received this e-mail in error, please notify us immediately.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hyper-v ISCSI support

2018-09-21 Thread Maged Mokhtar


Hi Glen,

Yes, you need clustered SCSI-3 persistent reservation support.
This is supported in the SUSE SLE kernels; you may also be interested in 
PetaSAN:

http://www.petasan.org

which is based on these kernels.

Maged


On 21/09/18 12:48, Glen Baars wrote:


Hello Ceph Users,

We have been using ceph-iscsi-cli for some time now with vmware and it 
is performing ok.


We would like to use the same iscsi service to store our Hyper-v VMs 
via windows clustered shared volumes. When we add the volume to 
windows failover manager we get a device is not ready error. I am 
assuming this is due to SCSI-3 persistent reservations.


Has anyone managed to get ceph to serve iscsi to windows clustered 
shared volumes? If so, how?


Kind regards,

*Glen Baars*

This e-mail is intended solely for the benefit of the addressee(s) and 
any other named recipient. It is confidential and may contain legally 
privileged or confidential information. If you are not the recipient, 
any use, distribution, disclosure or copying of this e-mail is 
prohibited. The confidentiality and legal privilege attached to this 
communication is not waived or lost by reason of the mistaken 
transmission or delivery to you. If you have received this e-mail in 
error, please notify us immediately.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
OK, so it's a replica-3 pool, and OSDs 68 & 69 are on the same host.
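
For the record, the quick way to double-check that (pool name and OSD ids as
in the previous mails):

ceph osd pool ls detail | grep cache-bkp-foo   # look for "replicated size ... min_size ..."
ceph osd find 68   # reports the CRUSH location (host) of the OSD
ceph osd find 69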

Le vendredi 21 septembre 2018 à 11:09 +, Eugen Block a écrit :
> > cache-tier on this pool have 26GB of data (for 5.7TB of data on the
> > EC
> > pool).
> > We tried to flush the cache tier, and restart OSD 68 & 69, without
> > any
> > success.
> 
> I meant the replication size of the pool
> 
> ceph osd pool ls detail | grep 
> 
> In the experimental state of our cluster we had a cache tier (for
> rbd  
> pool) with size 2, that can cause problems during recovery. Since
> only  
> OSDs 68 and 69 are mentioned I was wondering if your cache tier
> also  
> has size 2.
> 
> 
> Zitat von Olivier Bonvalet :
> 
> > Hi,
> > 
> > cache-tier on this pool have 26GB of data (for 5.7TB of data on the
> > EC
> > pool).
> > We tried to flush the cache tier, and restart OSD 68 & 69, without
> > any
> > success.
> > 
> > But I don't see any related data on cache-tier OSD (filestore) with
> > :
> > 
> > find /var/lib/ceph/osd/ -maxdepth 3 -name '*37.9c*'
> > 
> > 
> > I don't see any usefull information in logs. Maybe I should
> > increase
> > log level ?
> > 
> > Thanks,
> > 
> > Olivier
> > 
> > 
> > Le vendredi 21 septembre 2018 à 09:34 +, Eugen Block a écrit :
> > > Hi Olivier,
> > > 
> > > what size does the cache tier have? You could set cache-mode to
> > > forward and flush it, maybe restarting those OSDs (68, 69) helps,
> > > too.
> > > Or there could be an issue with the cache tier, what do those
> > > logs
> > > say?
> > > 
> > > Regards,
> > > Eugen
> > > 
> > > 
> > > Zitat von Olivier Bonvalet :
> > > 
> > > > Hello,
> > > > 
> > > > on a Luminous cluster, I have a PG incomplete and I can't find
> > > > how
> > > > to
> > > > fix that.
> > > > 
> > > > It's an EC pool (4+2) :
> > > > 
> > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing
> > > > pool
> > > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> > > > 'incomplete')
> > > > 
> > > > Of course, we can't reduce min_size from 4.
> > > > 
> > > > And the full state : https://pastebin.com/zrwu5X0w
> > > > 
> > > > So, IO are blocked, we can't access thoses damaged data.
> > > > OSD blocks too :
> > > > osds 32,68,69 have stuck requests > 4194.3 sec
> > > > 
> > > > OSD 32 is the primary of this PG.
> > > > And OSD 68 and 69 are for cache tiering.
> > > > 
> > > > Any idea how can I fix that ?
> > > > 
> > > > Thanks,
> > > > 
> > > > Olivier
> > > > 
> > > > 
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> > > 
> > > 
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> 
> 
> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Eugen Block

cache-tier on this pool have 26GB of data (for 5.7TB of data on the EC
pool).
We tried to flush the cache tier, and restart OSD 68 & 69, without any
success.


I meant the replication size of the pool:

ceph osd pool ls detail | grep <pool-name>

In the experimental state of our cluster we had a cache tier (for the rbd
pool) with size 2, which can cause problems during recovery. Since only
OSDs 68 and 69 are mentioned, I was wondering if your cache tier also
has size 2.
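
For example (assuming the cache pool really is the cache-bkp-foo pool from
your other mail; the fields to look at are "replicated size" and "min_size"):

ceph osd pool ls detail | grep cache-bkp-foo
ceph osd pool get cache-bkp-foo size
ceph osd pool get cache-bkp-foo min_size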



Zitat von Olivier Bonvalet :


Hi,

cache-tier on this pool have 26GB of data (for 5.7TB of data on the EC
pool).
We tried to flush the cache tier, and restart OSD 68 & 69, without any
success.

But I don't see any related data on cache-tier OSD (filestore) with :

find /var/lib/ceph/osd/ -maxdepth 3 -name '*37.9c*'


I don't see any usefull information in logs. Maybe I should increase
log level ?

Thanks,

Olivier


Le vendredi 21 septembre 2018 à 09:34 +, Eugen Block a écrit :

Hi Olivier,

what size does the cache tier have? You could set cache-mode to
forward and flush it, maybe restarting those OSDs (68, 69) helps,
too.
Or there could be an issue with the cache tier, what do those logs
say?

Regards,
Eugen


Zitat von Olivier Bonvalet :

> Hello,
>
> on a Luminous cluster, I have a PG incomplete and I can't find how
> to
> fix that.
>
> It's an EC pool (4+2) :
>
> pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
> bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> 'incomplete')
>
> Of course, we can't reduce min_size from 4.
>
> And the full state : https://pastebin.com/zrwu5X0w
>
> So, IO are blocked, we can't access thoses damaged data.
> OSD blocks too :
> osds 32,68,69 have stuck requests > 4194.3 sec
>
> OSD 32 is the primary of this PG.
> And OSD 68 and 69 are for cache tiering.
>
> Any idea how can I fix that ?
>
> Thanks,
>
> Olivier
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remotely tell an OSD to stop ?

2018-09-21 Thread Nicolas Huillard
Thanks!
I was in the process of upgrading, so "noout" was already set, probably
preventing me from setting "noin".
I thus just ran "ceph osd set noup", then "ceph osd down <id>", which
stopped activity on the disks (probably not enough to cleanly flush everything
in BlueStore, but I decided to trust its inner workings).

I now have an unbootable XFS root filesystem, some OSDs out but
probably OK with their data, and 4× redundancy. I'll pause and think
about the next steps with no urgency ;-)

Le vendredi 21 septembre 2018 à 11:09 +0200, Patrick Nawracay a écrit :
> Hi,
> 
> you'll need to set `noup` to prevent OSDs from being started
> automatically. The `noin` flags prevents that the cluster sets the
> OSD
> `in` again, after it has been set `out`.
> 
>     `ceph osd set noup` before `ceph osd down `
> 
>     `ceph osd set noin` before `ceph osd out `
> 
> Those global flags (they prevent all OSDs from being automatically
> set
> up/in again), can be disabled with unset.
> 
>     `ceph osd unset `
> 
> Please note that I'm not familiar with recovery of a Ceph cluster,
> I'm
> just trying to answer the question, but don't know if that's the best
> approach in this case.
> 
> Patrick
> 
> 
> On 21.09.2018 10:49, Nicolas Huillard wrote:
> > Hi all,
> > 
> > One of my server crashed its root filesystem, ie. the currently
> > open
> > shell just says "command not found" for any basic command (ls, df,
> > mount, dmesg, etc.)
> > ACPI soft power-off won't work because it needs scripts on /...
> > 
> > Before I reset the hardware, I'd like to cleanly stop the OSDs on
> > this
> > server (with still work because they do not need /).
> > I was able to move the MGR out of that server with "ceph mgr fail
> > [hostname]".
> > Is it possible to tell the OSD on that host to stop, from another
> > host?
> > I tried "ceph osd down [osdnumber]", but the OSD just got back "in"
> > immediately.
> > 
> > Ceph 12.2.7 on Debian
> > 
> > TIA,
> > 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-- 
Nicolas Huillard
Associé fondateur - Directeur Technique - Dolomède

nhuill...@dolomede.fr
Fixe : +33 9 52 31 06 10
Mobile : +33 6 50 27 69 08
http://www.dolomede.fr/

https://www.observatoire-climat-energie.fr/
https://reseauactionclimat.org/planetman/
https://350.org/fr/
https://reporterre.net/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
Hi,

The cache tier on this pool has 26GB of data (for 5.7TB of data on the EC
pool).
We tried to flush the cache tier and restart OSDs 68 & 69, without any
success.

But I don't see any related data on the cache-tier OSDs (filestore) with:

find /var/lib/ceph/osd/ -maxdepth 3 -name '*37.9c*'


I don't see any useful information in the logs. Maybe I should increase
the log level?
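
Something like this, I suppose (just a sketch; osd.32 being the primary of
the PG, and the values to be reverted to the defaults afterwards):

ceph tell osd.32 injectargs '--debug_osd 10/10 --debug_ms 1/1'
# ... reproduce the peering / scrub attempt, then revert:
ceph tell osd.32 injectargs '--debug_osd 1/5 --debug_ms 0/5'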

Thanks,

Olivier


Le vendredi 21 septembre 2018 à 09:34 +, Eugen Block a écrit :
> Hi Olivier,
> 
> what size does the cache tier have? You could set cache-mode to  
> forward and flush it, maybe restarting those OSDs (68, 69) helps,
> too.  
> Or there could be an issue with the cache tier, what do those logs
> say?
> 
> Regards,
> Eugen
> 
> 
> Zitat von Olivier Bonvalet :
> 
> > Hello,
> > 
> > on a Luminous cluster, I have a PG incomplete and I can't find how
> > to
> > fix that.
> > 
> > It's an EC pool (4+2) :
> > 
> > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
> > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> > 'incomplete')
> > 
> > Of course, we can't reduce min_size from 4.
> > 
> > And the full state : https://pastebin.com/zrwu5X0w
> > 
> > So, IO are blocked, we can't access thoses damaged data.
> > OSD blocks too :
> > osds 32,68,69 have stuck requests > 4194.3 sec
> > 
> > OSD 32 is the primary of this PG.
> > And OSD 68 and 69 are for cache tiering.
> > 
> > Any idea how can I fix that ?
> > 
> > Thanks,
> > 
> > Olivier
> > 
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Hyper-v ISCSI support

2018-09-21 Thread Glen Baars
Hello Ceph Users,

We have been using ceph-iscsi-cli for some time now with vmware and it is 
performing ok.

We would like to use the same iSCSI service to store our Hyper-V VMs via 
Windows clustered shared volumes. When we add the volume to Windows failover 
manager we get a "device is not ready" error. I am assuming this is due to SCSI-3 
persistent reservations.

Has anyone managed to get ceph to serve iscsi to windows clustered shared 
volumes? If so, how?
Kind regards,
Glen Baars
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-ansible

2018-09-21 Thread Alfredo Deza
On Thu, Sep 20, 2018 at 7:04 PM solarflow99  wrote:
>
> oh, was that all it was...  git clone https://github.com/ceph/ceph-ansible/
> I installed the notario  package from EPEL, 
> python2-notario-0.0.11-2.el7.noarch  and thats the newest they have

Hey Ken, I thought the latest versions were being packaged; is there
something I've missed? The tags seem to have changed format since
0.0.11.
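
In the meantime, a possible workaround (just a sketch, assuming pulling from
PyPI is acceptable on that host) is to install the dependency with pip
instead of EPEL:

pip install --upgrade 'notario>=0.0.13'
# or, from a ceph-ansible checkout, install everything it declares:
pip install -r requirements.txt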
>
>
>
>
> On Thu, Sep 20, 2018 at 3:57 PM Alfredo Deza  wrote:
>>
>> Not sure how you installed ceph-ansible, the requirements mention a
>> version of a dependency (the notario module) which needs to be 0.0.13
>> or newer, and you seem to be using an older one.
>>
>>
>> On Thu, Sep 20, 2018 at 6:53 PM solarflow99  wrote:
>> >
>> > Hi, tying to get this to do a simple deployment, and i'm getting a strange 
>> > error, has anyone seen this?  I'm using Centos 7, rel 5   ansible 2.5.3  
>> > python version = 2.7.5
>> >
>> > I've tried with mimic luninous and even jewel, no luck at all.
>> >
>> >
>> >
>> > TASK [ceph-validate : validate provided configuration] 
>> > **
>> > task path: 
>> > /home/jzygmont/ansible/ceph-ansible/roles/ceph-validate/tasks/main.yml:2
>> > Thursday 20 September 2018  14:05:18 -0700 (0:00:05.734)   0:00:37.439 
>> > 
>> > The full traceback is:
>> > Traceback (most recent call last):
>> >   File 
>> > "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 
>> > 138, in run
>> > res = self._execute()
>> >   File 
>> > "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 
>> > 561, in _execute
>> > result = self._handler.run(task_vars=variables)
>> >   File "/home/jzygmont/ansible/ceph-ansible/plugins/actions/validate.py", 
>> > line 43, in run
>> > notario.validate(host_vars, install_options, defined_keys=True)
>> > TypeError: validate() got an unexpected keyword argument 'defined_keys'
>> >
>> > fatal: [172.20.3.178]: FAILED! => {
>> > "msg": "Unexpected failure during module execution.",
>> > "stdout": ""
>> > }
>> >
>> > NO MORE HOSTS LEFT 
>> > **
>> >
>> > PLAY RECAP 
>> > **
>> > 172.20.3.178   : ok=25   changed=0unreachable=0failed=1
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remotely tell an OSD to stop ?

2018-09-21 Thread Alexandru Cucu
Hi,

You won't be able to stop them, but if the OSDs are still running I
would just set them out, wait for all data to be moved off them,
and then it should be safe to power off the host.
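
Roughly (just a sketch; the osd ids are placeholders for the OSDs on that
host):

ceph osd out 10
ceph osd out 11
ceph -s                        # wait until all PGs are active+clean again
ceph osd safe-to-destroy 10    # on Luminous, an extra sanity check before powering off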

---
Alex

On Fri, Sep 21, 2018 at 11:50 AM Nicolas Huillard  wrote:
>
> Hi all,
>
> One of my server crashed its root filesystem, ie. the currently open
> shell just says "command not found" for any basic command (ls, df,
> mount, dmesg, etc.)
> ACPI soft power-off won't work because it needs scripts on /...
>
> Before I reset the hardware, I'd like to cleanly stop the OSDs on this
> server (with still work because they do not need /).
> I was able to move the MGR out of that server with "ceph mgr fail
> [hostname]".
> Is it possible to tell the OSD on that host to stop, from another host?
> I tried "ceph osd down [osdnumber]", but the OSD just got back "in"
> immediately.
>
> Ceph 12.2.7 on Debian
>
> TIA,
>
> --
> Nicolas Huillard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Eugen Block

Hi Olivier,

what size does the cache tier have? You could set the cache-mode to
forward and flush it; maybe restarting those OSDs (68, 69) helps, too.
Or there could be an issue with the cache tier; what do those logs say?


Regards,
Eugen


Zitat von Olivier Bonvalet :


Hello,

on a Luminous cluster, I have a PG incomplete and I can't find how to
fix that.

It's an EC pool (4+2) :

pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
'incomplete')

Of course, we can't reduce min_size from 4.

And the full state : https://pastebin.com/zrwu5X0w

So, IO is blocked and we can't access the damaged data.
OSDs are blocked too:
osds 32,68,69 have stuck requests > 4194.3 sec

OSD 32 is the primary of this PG.
And OSDs 68 and 69 are used for cache tiering.

Any idea how I can fix that?

Thanks,

Olivier


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remotely tell an OSD to stop ?

2018-09-21 Thread Patrick Nawracay
Hi,

you'll need to set `noup` to prevent OSDs from being marked up again
automatically. The `noin` flag prevents the cluster from setting the OSD
`in` again after it has been set `out`.

    `ceph osd set noup` before `ceph osd down <osd-id>`

    `ceph osd set noin` before `ceph osd out <osd-id>`

Those global flags (they prevent all OSDs from being automatically set
up/in again) can be disabled with unset:

    `ceph osd unset <flag>`
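
Put together, a possible sequence for one OSD might look like this (osd id
12 is just a placeholder; repeat the down/out part for each OSD on the
failed host):

    ceph osd set noup
    ceph osd set noin
    ceph osd down 12
    ceph osd out 12
    # ... once the host has been repaired and is back:
    ceph osd unset noup
    ceph osd unset noin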

Please note that I'm not familiar with recovery of a Ceph cluster, I'm
just trying to answer the question, but don't know if that's the best
approach in this case.

Patrick


On 21.09.2018 10:49, Nicolas Huillard wrote:
> Hi all,
>
> One of my server crashed its root filesystem, ie. the currently open
> shell just says "command not found" for any basic command (ls, df,
> mount, dmesg, etc.)
> ACPI soft power-off won't work because it needs scripts on /...
>
> Before I reset the hardware, I'd like to cleanly stop the OSDs on this
> server (with still work because they do not need /).
> I was able to move the MGR out of that server with "ceph mgr fail
> [hostname]".
> Is it possible to tell the OSD on that host to stop, from another host?
> I tried "ceph osd down [osdnumber]", but the OSD just got back "in"
> immediately.
>
> Ceph 12.2.7 on Debian
>
> TIA,
>

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Remotely tell an OSD to stop ?

2018-09-21 Thread Nicolas Huillard
Hi all,

One of my servers crashed its root filesystem, i.e. the currently open
shell just says "command not found" for any basic command (ls, df,
mount, dmesg, etc.).
ACPI soft power-off won't work because it needs scripts on /...

Before I reset the hardware, I'd like to cleanly stop the OSDs on this
server (which still work because they do not need /).
I was able to move the MGR out of that server with "ceph mgr fail
[hostname]".
Is it possible to tell the OSD on that host to stop, from another host?
I tried "ceph osd down [osdnumber]", but the OSD just got back "in"
immediately.

Ceph 12.2.7 on Debian

TIA,

-- 
Nicolas Huillard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how dynamic bucket sharding works

2018-09-21 Thread Tomasz Płaza

Hi Cephers,

Could someone explain to me how dynamic bucket index sharding works?
I created a test bucket with 4 million objects on ceph 12.2.8 and 
it showed 80 shards (ver, master_ver, max_marker from 0 to 79 in bucket 
stats), and I left it for the night. The next morning I found this in 
the reshard list:

  "time": "2018-09-21 06:15:12.094792Z",
  "tenant": "",
  "bucket_name": "test",
  "bucket_id": "_id_.7827818.1",
  "new_instance_id": "test:_id_.25481437.10",
  "old_num_shards": 8,
  "new_num_shards": 16
During this reshard, bucket stats showed 16 shards (counting ver, 
master_ver, max_marker from bucket stats on marker _id_.7827818.1).
After deleting and re-adding 2 objects, resharding kicked in once more, this 
time from 16 to 80 shards.


The current bucket stats are:
{
    "bucket": "test",
    "zonegroup": "84d584b4-3e95-49f8-8285-4a704f8252e3",
    "placement_rule": "default-placement",
    "explicit_placement": {
    "data_pool": "",
    "data_extra_pool": "",
    "index_pool": ""
    },
    "id": "_id_.25481803.6",
    "marker": "_id_.7827818.1",
    "index_type": "Normal",
    "owner": "test",
    "ver": 
"0#789,1#785,2#787,3#782,4#790,5#798,6#784,7#784,8#782,9#791,10#788,11#785,12#786,13#792,14#783,15#783,16#786,17#776,18#787,19#783,20#784,21#785,22#786,23#782,24#787,25#794,26#786,27#789,28#794,29#781,30#785,31#779,32#780,33#776,34#790,35#775,36#780,37#781,38#779,39#782,40#778,41#776,42#774,43#781,44#779,45#785,46#778,47#779,48#783,49#778,50#784,51#779,52#780,53#782,54#781,55#779,56#789,57#783,58#774,59#780,60#779,61#782,62#780,63#775,64#783,65#783,66#781,67#785,68#777,69#785,70#781,71#782,72#778,73#778,74#778,75#777,76#783,77#775,78#790,79#792",
    "master_ver": 
"0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0,20#0,21#0,22#0,23#0,24#0,25#0,26#0,27#0,28#0,29#0,30#0,31#0,32#0,33#0,34#0,35#0,36#0,37#0,38#0,39#0,40#0,41#0,42#0,43#0,44#0,45#0,46#0,47#0,48#0,49#0,50#0,51#0,52#0,53#0,54#0,55#0,56#0,57#0,58#0,59#0,60#0,61#0,62#0,63#0,64#0,65#0,66#0,67#0,68#0,69#0,70#0,71#0,72#0,73#0,74#0,75#0,76#0,77#0,78#0,79#0",

    "mtime": "2018-09-21 08:40:33.652235",
    "max_marker": 
"0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#,20#,21#,22#,23#,24#,25#,26#,27#,28#,29#,30#,31#,32#,33#,34#,35#,36#,37#,38#,39#,40#,41#,42#,43#,44#,45#,46#,47#,48#,49#,50#,51#,52#,53#,54#,55#,56#,57#,58#,59#,60#,61#,62#,63#,64#,65#,66#,67#,68#,69#,70#,71#,72#,73#,74#,75#,76#,77#,78#,79#",

    "usage": {
    "rgw.none": {
    "size": 0,
    "size_actual": 0,
    "size_utilized": 0,
    "size_kb": 0,
    "size_kb_actual": 0,
    "size_kb_utilized": 0,
    "num_objects": 2
    },
    "rgw.main": {
    "size": 419286170636,
    "size_actual": 421335109632,
    "size_utilized": 0,
    "size_kb": 409459152,
    "size_kb_actual": 411460068,
    "size_kb_utilized": 0,
    "num_objects": 401
    }
    },
    "bucket_quota": {
    "enabled": false,
    "check_on_raw": false,
    "max_size": -1,
    "max_size_kb": 0,
    "max_objects": -1
    }
}

My question is: why on earth did ceph reshard this bucket to 8 shards, 
then to 16 shards, and then to 80 after re-adding 2 objects?


Additional question: why do we need rgw_reshard_bucket_lock_duration if 
https://ceph.com/community/new-luminous-rgw-dynamic-bucket-sharding/ 
states: "...Furthermore, there is no need to stop IO operations that go 
to the bucket (although some concurrent operations may experience 
additional latency when resharding is in progress)..."? From my 
experience it blocks writes completely; only reads work.
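
For reference, these are the commands relevant to inspecting and, if needed,
manually triggering a reshard (bucket name as above; the explicit shard
count in the last one is only an example):

radosgw-admin reshard list
radosgw-admin reshard status --bucket=test
radosgw-admin bucket limit check        # objects per shard and fill_status
radosgw-admin bucket reshard --bucket=test --num-shards=80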


--

Thanks
Tom

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] backup ceph

2018-09-21 Thread Oliver Freyermuth
Hi,

Am 21.09.18 um 03:28 schrieb ST Wong (ITSC):
> Hi,
> 
>>> Will the RAID 6 be mirrored to another storage in remote site for DR 
>>> purpose?
>>
>> Not yet. Our goal is to have the backup ceph to which we will replicate 
>> spread across three different buildings, with 3 replicas.
> 
> May I ask if the backup ceph is a single ceph cluster span across 3 different 
> buildings, or compose of 3 ceph clusters in 3 different buildings?   Thanks.
> 

This will be a single ceph cluster with a failure domain corresponding to the 
building, and three replicas. 
To test updates before rolling them out to the full cluster, we will also 
instantiate a small test cluster separately,
but we try to keep the number of production clusters down and rather let Ceph 
handle failover and replication than do that ourselves,
which also allows us to grow / shrink the cluster more easily as needed ;-). 
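
In CRUSH terms this will probably boil down to something like the following
(only a sketch; the bucket type used for the buildings and the pool name are
placeholders, the exact layout is not finalized yet):

# one CRUSH bucket of type "datacenter" per building, then:
ceph osd crush rule create-replicated replicated-per-building default datacenter
ceph osd pool set backup-pool crush_rule replicated-per-building
ceph osd pool set backup-pool size 3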

All the best,
Oliver

> Thanks again for your help.
> Best Regards,
> /ST Wong
> 
> -Original Message-
> From: Oliver Freyermuth  
> Sent: Thursday, September 20, 2018 2:10 AM
> To: ST Wong (ITSC) 
> Cc: Peter Wienemann ; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] backup ceph
> 
> Hi,
> 
> Am 19.09.18 um 18:32 schrieb ST Wong (ITSC):
>> Thanks for your help.
> 
> You're welcome! 
> I should also add we don't have very long-term experience with this yet - 
> Benji is pretty modern. 
> 
>>> For the moment, we use Benji to backup to a classic RAID 6.
>> Will the RAID 6 be mirrored to another storage in remote site for DR purpose?
> 
> Not yet. Our goal is to have the backup ceph to which we will replicate 
> spread across three different buildings, with 3 replicas. 
> 
>>
>>> For RBD mirroring, you do indeed need another running Ceph Cluster, but we 
>>> plan to use that in the long run (on separate hardware of course).
>> Seems this is the way to go, regardless of additional resources required? :)
>> Btw, RBD mirroring looks like a DR copy instead of a daily backup from which 
>> we can restore image of particular date ?
> 
> We would still perform daily snapshots, and keep those both in the RBD mirror 
> and in the Benji backup. Even when fading out the current RAID 6 machine at 
> some point,
> we'd probably keep Benji and direct it's output to a CephFS pool on our 
> backup Ceph cluster. If anything goes wrong with the mirroring, this still 
> leaves us
> with an independent backup approach. We also keep several days of snapshots 
> in the production RBD pool to be able to quickly roll back a VM if anything 
> goes wrong. 
> With Benji, you can also mount any of these daily snapshots via NBD in case 
> it is needed, or restore from a specific date. 
> 
> All the best,
>   Oliver
> 
>>
>> Thanks again.
>> /st wong
>>
>> -Original Message-
>> From: Oliver Freyermuth  
>> Sent: Wednesday, September 19, 2018 5:28 PM
>> To: ST Wong (ITSC) 
>> Cc: Peter Wienemann ; ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] backup ceph
>>
>> Hi,
>>
>> Am 19.09.18 um 03:24 schrieb ST Wong (ITSC):
>>> Hi,
>>>
>>> Thanks for your information.
>>> May I know more about the backup destination to use?  As the size of the 
>>> cluster will be a bit large (~70TB to start with), we're looking for some 
>>> efficient method to do that backup.   Seems RBD mirroring or incremental 
>>> snapshot s with RBD 
>>> (https://ceph.com/geen-categorie/incremental-snapshots-with-rbd/) are some 
>>> ways to go, but requires another running Ceph cluster.  Is my understanding 
>>> correct?Thanks.
>>
>> For the moment, we use Benji to backup to a classic RAID 6. With Benji, only 
>> the changed chunks are backed up, and it learns that by asking Ceph for a 
>> diff of the RBD snapshots. 
>> So that's really fast after the first backup, and especially if you do 
>> trimming (e.g. via guest agent if you run VMs) of the RBD volumes before 
>> backing them up. 
>> The same is true for Backy2, but it does not support compression (which 
>> really helps by several factors(!) in saving I/O and with zstd it does not 
>> use much CPU). 
>>
>> For RBD mirroring, you do indeed need another running Ceph Cluster, but we 
>> plan to use that in the long run (on separate hardware of course). 
>>
>>> Btw, is this one (https://benji-backup.me/) Benji you'r referring to ?  
>>> Thanks a lot.
>>
>> Exactly :-). 
>>
>> Cheers,
>>  Oliver
>>
>>>
>>>
>>>
>>> Cheers,
>>> /ST Wong
>>>
>>>
>>>
>>> -Original Message-
>>> From: Oliver Freyermuth  
>>> Sent: Tuesday, September 18, 2018 6:09 PM
>>> To: ST Wong (ITSC) 
>>> Cc: Peter Wienemann 
>>> Subject: Re: [ceph-users] backup ceph
>>>
>>> Hi,
>>>
>>> we're also just starting to collect experiences, so we have nothing to 
>>> share (yet). However, we are evaluating using Benji (a well-maintained fork 
>>> of Backy2 which can also compress) in addition, trimming and fsfreezing the 
>>> VM disks shortly before,
>>> and additionally keeping a few daily and weekly snapshots. 
>>> We may 

Re: [ceph-users] Proxmox/ceph upgrade and addition of a new node/OSDs

2018-09-21 Thread mj

Hi Hervé!

Thanks for the detailed summary, much appreciated!

Best,
MJ


On 09/21/2018 09:03 AM, Hervé Ballans wrote:

Hi MJ (and all),

So we upgraded our Proxmox/Ceph cluster, and if we have to summarize the 
operation in a few words : overall, everything went well :)
The most critical operation of all is the 'osd crush tunables optimal', 
I talk about it in more detail after...


The Proxmox documentation is really well written and accurate and, 
normally, following the documentation step by step is almost sufficient !


* first step : upgrade Ceph Jewel to Luminous : 
https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous

(Note here : OSDs remain in FileStore backend, no BlueStore migration)

* second step : upgrade Proxmox version 4 to 5 : 
https://pve.proxmox.com/wiki/Upgrade_from_4.x_to_5.0


Just some numbers, observations and tips (based on our feedback, I'm not 
an expert !) :


* Before migration, make sure you are in the lastest version of Proxmox 
4 (4.4-24) and Ceph Jewel (10.2.11)


* We don't use the pve repository for ceph packages but the official one 
(download.ceph.com). Thus, during the upgrade of Promox PVE, we don't 
replace ceph.com repository with promox.com Ceph repository...


* When you upgrade Ceph to Luminous (without tunables optimal), there is 
no impact on Proxmox 4. VMs are still running normally.
The side effect (non blocking for the functionning of VMs) is located in 
the GUI, on the Ceph menu : it can't report the status of the ceph 
cluster as it has a JSON formatting error (indeed the output of the 
command 'ceph -s' is completely different, really more readable on Luminous)


* A little step is missing in section 8 "Create Manager instances" of the 
Ceph upgrade documentation. As the Ceph manager daemon is new in 
Luminous, the package doesn't exist on Jewel. So you have to install the 
ceph-mgr package on each node first before doing 'pveceph createmgr'.

* The 'osd crush tunables optimal' operation is time consuming ! in our 
case : 5 nodes (PE R730xd), 58 OSDs, replicated (3/2) rbd pool with 2048 
pgs and 2 millions objects, 22 TB used. The tunables operation took a 
little more than 24 hours !


* Really take the right time to make the 'tunables optimal' !

We encountered some pgs stuck and blocked requests during this 
operation. In our case, the involved OSDs were those with a high numbers 
of pgs (as they are high capacity disks).
The consequences can be critical since it can freeze some VMs (I guess 
those that replicas are stored on the stuck pgs ?).

The stuck state were corrected by rebooting the involved OSDs.
If you can move the disks of your critical VMs on another storage, so 
these VMs should not be impacted by the recovery (we moved some disks on 
another Ceph cluster and keep the conf in the Proxmox cluster being 
updated and there was no impact)


Otherwise :
- verify that all your VMs are recently backuped on an external storage 
(in case of Disaster recovery Plan !)
- if you can, stop all your non-critical VMs (in order to limit client 
io operations)
- if any, wait for the end of current backups then disable datacenter 
backup (in order to limit client io operations). !! do not forget to 
re-enable it when all is over !!
- if any and if no longer needed, delete your snapshots, it removes many 
useless objects !
- start the tunables operation outside of major activity periods (night, 
week-end, ??) and take into account that it can be very slow...


There are probably some options to configure in ceph to avoid 'pgs 
stuck' states, but on our side, as we previously moved our critical VM's 
disks, we didn't care about that !


* Anyway, the upgrade step of Proxmox PVE is done easily and quickly 
(just follow the documentation). Note that you can upgrade Proxmox PVE 
before doing the 'tunables optimal' operation.


Hoping that you will find this information useful, good luck with your 
very next migration !


Hervé

Le 13/09/2018 à 22:04, mj a écrit :

Hi Hervé,

No answer from me, but just to say that I have exactly the same 
upgrade path ahead of me. :-)


Please report here any tips, trics, or things you encountered doing 
the upgrades. It could potentially save us a lot of time. :-)


Thanks!

MJ

On 09/13/2018 05:23 PM, Hervé Ballans wrote:

Dear list,

I am currently in the process of upgrading Proxmox 4/Jewel to 
Proxmox5/Luminous.


I also have a new node to add to my Proxmox cluster.

What I plan to do is the following (from 
https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous):


* upgrade Jewel to Luminous

* let the "ceph osd crush tunables optimal " command run

* upgrade my proxmox to v5

* add the new node (already up to date in v5)

* add the new OSDs

* let ceph rebalance the lot


A couple of questions I have :

* would it be a good idea to add the new node+OSDs and run the 
"tunables optimal" command immediately after, which would maybe gain 
a little time and avoid two successive pg rebalancing ?


* did I miss anything 

Re: [ceph-users] Proxmox/ceph upgrade and addition of a new node/OSDs

2018-09-21 Thread Hervé Ballans

Hi MJ (and all),

So we upgraded our Proxmox/Ceph cluster, and if we have to summarize the 
operation in a few words: overall, everything went well :)
The most critical operation of all is the 'osd crush tunables optimal'; 
I talk about it in more detail below...


The Proxmox documentation is really well written and accurate and, 
normally, following the documentation step by step is almost sufficient !


* first step : upgrade Ceph Jewel to Luminous : 
https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous

(Note here : OSDs remain in FileStore backend, no BlueStore migration)

* second step : upgrade Proxmox version 4 to 5 : 
https://pve.proxmox.com/wiki/Upgrade_from_4.x_to_5.0


Just some numbers, observations and tips (based on our feedback, I'm not 
an expert !) :


* Before migration, make sure you are on the latest version of Proxmox 
4 (4.4-24) and Ceph Jewel (10.2.11)


* We don't use the pve repository for ceph packages but the official one 
(download.ceph.com). Thus, during the upgrade of Proxmox PVE, we don't 
replace the ceph.com repository with the proxmox.com Ceph repository...


* When you upgrade Ceph to Luminous (without tunables optimal), there is 
no impact on Proxmox 4. VMs are still running normally.
The side effect (non-blocking for the functioning of VMs) is located in 
the GUI, in the Ceph menu: it can't report the status of the Ceph 
cluster as it hits a JSON formatting error (indeed the output of the 
command 'ceph -s' is completely different, and much more readable, on Luminous).


* A little step is missing in section 8 "Create Manager instances" of the 
Ceph upgrade documentation. As the Ceph manager daemon is new in 
Luminous, the package doesn't exist on Jewel. So you have to install the 
ceph-mgr package on each node first before doing 'pveceph createmgr'.
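
Concretely, on each node (the package name comes from the ceph.com
repository we use; the pveceph call is the one from the Proxmox
documentation):

apt install ceph-mgr
pveceph createmgr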

* The 'osd crush tunables optimal' operation is time-consuming! In our 
case: 5 nodes (PE R730xd), 58 OSDs, a replicated (3/2) rbd pool with 2048 
pgs and 2 million objects, 22 TB used. The tunables operation took a 
little more than 24 hours!


* Really pick the right time to run the 'tunables optimal' step!

We encountered some stuck pgs and blocked requests during this 
operation. In our case, the involved OSDs were those with a high number 
of pgs (as they are high-capacity disks).
The consequences can be critical since it can freeze some VMs (I guess 
those whose replicas are stored on the stuck pgs?).

The stuck states were corrected by restarting the involved OSDs.
If you can move the disks of your critical VMs to another storage, these 
VMs should not be impacted by the recovery (we moved some disks to 
another Ceph cluster, kept the conf in the Proxmox cluster being 
updated, and there was no impact).


Otherwise:
- verify that all your VMs have recently been backed up to an external storage 
(in case a disaster recovery plan is needed!)
- if you can, stop all your non-critical VMs (in order to limit client 
io operations)
- if any, wait for the end of the current backups then disable the datacenter 
backup job (in order to limit client io operations). !! do not forget to 
re-enable it when all is over !!
- if any are no longer needed, delete your snapshots; it removes many 
useless objects!
- start the tunables operation outside of major activity periods (night, 
weekend, ...) and take into account that it can be very slow...


There are probably some options to configure in ceph to avoid 'pgs 
stuck' states, but on our side, as we had previously moved our critical 
VMs' disks, we didn't worry about that!
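
(For what it's worth, the knobs usually mentioned for that are the backfill
and recovery throttles, e.g. something like the following, to be reverted
once the rebalance is over; we did not test this ourselves:)

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'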


* Anyway, the upgrade step of Proxmox PVE is done easily and quickly 
(just follow the documentation). Note that you can upgrade Proxmox PVE 
before doing the 'tunables optimal' operation.


Hoping that you will find this information useful, good luck with your 
upcoming migration!


Hervé

Le 13/09/2018 à 22:04, mj a écrit :

Hi Hervé,

No answer from me, but just to say that I have exactly the same 
upgrade path ahead of me. :-)


Please report here any tips, trics, or things you encountered doing 
the upgrades. It could potentially save us a lot of time. :-)


Thanks!

MJ

On 09/13/2018 05:23 PM, Hervé Ballans wrote:

Dear list,

I am currently in the process of upgrading Proxmox 4/Jewel to 
Proxmox5/Luminous.


I also have a new node to add to my Proxmox cluster.

What I plan to do is the following (from 
https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous):


* upgrade Jewel to Luminous

* let the "ceph osd crush tunables optimal " command run

* upgrade my proxmox to v5

* add the new node (already up to date in v5)

* add the new OSDs

* let ceph rebalance the lot


A couple of questions I have :

* would it be a good idea to add the new node+OSDs and run the 
"tunables optimal" command immediately after, which would maybe gain 
a little time and avoid two successive pg rebalancing ?


* did I miss anything in this plan?


Regards,
Hervé



___
ceph-users mailing list