[ceph-users] Pool Max Avail and Ceph Dashboard Pool Usage on Nautilus giving different percentages

2019-12-10 Thread David Majchrzak, ODERLAND Webbhotell AB
Hi!

While browsing /#/pool in the Nautilus ceph dashboard I noticed it said 93%
used on the single pool we have (3x replica).

ceph df detail however shows 81% used on the pool and 67% raw usage.

# ceph df detail
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    ssd       478 TiB     153 TiB     324 TiB     325 TiB          67.96
    TOTAL     478 TiB     153 TiB     324 TiB     325 TiB          67.96

POOLS:
    POOL     ID     STORED      OBJECTS     USED        %USED     MAX AVAIL     QUOTA OBJECTS     QUOTA BYTES     DIRTY      USED COMPR     UNDER COMPR
    echo      3     108 TiB     29.49M      324 TiB     81.61     24 TiB        N/A               N/A             29.49M     0 B            0 B


I know we're looking at the most full OSD (210 PGs, 79% used, 1.17 VAR)
and calculate MAX AVAIL from that. But where does the 93% full in the
dashboard come from?

My guess is that it comes from calculating:

1 - Max Avail / (Used + Max Avail) ≈ 0.93
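
Plugging in the numbers from the ceph df detail output above (USED = 324 TiB,
MAX AVAIL = 24 TiB):

1 - 24 / (324 + 24) = 1 - 0.069 ≈ 0.93

which would line up with the 93% the dashboard shows.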


Kind Regards,

David Majchrzak

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Tuning Nautilus for flash only

2019-11-28 Thread David Majchrzak, ODERLAND Webbhotell AB
Paul,

Absolutely, I said I was looking at those settings and most didn't make
any sense to me in a production environment (we've been running ceph
since Dumpling).

However, we only have one cluster on BlueStore, and I wanted to get some
opinions on whether anything other than the defaults in ceph.conf or sysctl,
or things like the C-state tuning Wido suggested, would make any difference.
(Thank you Wido!)

Yes, running benchmarks is great, and we're already doing that
ourselves.

Cheers and have a nice evening!

-- 
David Majchrzak


On tor, 2019-11-28 at 17:46 +0100, Paul Emmerich wrote:
> Please don't run this config in production.
> Disabling checksumming is a bad idea, disabling authentication is
> also
> pretty bad.
> 
> There are also a few options in there that no longer exist (osd op
> threads) or are no longer relevant (max open files). In general, you
> should not blindly copy config files you find on the Internet. Only
> set an option to a non-default value after carefully checking what
> it does and whether it applies to your use case.
> 
> Also, run benchmarks yourself. Use benchmarks that are relevant to
> your use case.
> 
> Paul
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Tuning Nautilus for flash only

2019-11-28 Thread David Majchrzak, ODERLAND Webbhotell AB
Hi!

We've deployed a new flash only ceph cluster running Nautilus and I'm
currently looking at any tunables we should set to get the most out of
our NVMe SSDs.

I've been looking a bit at the options from the blog post here:

https://ceph.io/community/bluestore-default-vs-tuned-performance-comparison/

with the conf here:
https://gist.github.com/likid0/1b52631ff5d0d649a22a3f30106ccea7

However some of them, like disabling checksumming, are for benchmarking
speed only and not really applicable in a real-life scenario with critical data.

Should we stick with defaults or is there anything that could help?

We have 256 GB of RAM on each OSD host, 8 OSD hosts with 10 SSDs each and
2 OSD daemons per SSD. Should we raise the SSD bluestore cache to 8 GB?
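
For illustration only, a minimal ceph.conf sketch (assuming Nautilus defaults;
osd_memory_target auto-tunes the BlueStore cache and largely supersedes the
static bluestore_cache_size_ssd knob; with roughly 20 OSD daemons per host,
8 GiB each would commit about 160 GB of the 256 GB):

[osd]
# 8 GiB memory budget per OSD daemon; leave headroom for the OS and page cache
osd_memory_target = 8589934592
# older static alternative:
# bluestore_cache_size_ssd = 8589934592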

Workload is about 50/50 r/w ops running qemu VMs through librbd. So
mixed block size.

3 replicas.

Appreciate any advice!

Kind Regards,
-- 
David Majchrzak


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] eu.ceph.com mirror out of sync?

2019-09-23 Thread David Majchrzak, ODERLAND Webbhotell AB
Hi,

I'll have a look at the status of se.ceph.com tomorrow morning; it's
maintained by us.

Kind Regards,

David


On mån, 2019-09-23 at 22:41 +0200, Oliver Freyermuth wrote:
> Hi together,
> 
> the EU mirror still seems to be out-of-sync - does somebody on this
> list happen to know whom to contact about this?
> Or is this mirror unmaintained and we should switch to something
> else?
> 
> Going through the list of appropriate mirrors from 
> https://docs.ceph.com/docs/master/install/mirrors/ (we are in
> Germany) I also find:
>http://de.ceph.com/
> (the mirror in Germany) to be non-resolvable.
> 
> Closest by then for us is possibly France:
>http://fr.ceph.com/rpm-nautilus/el7/x86_64/
> but also here, there's only 14.2.2, so that's also out-of-sync.
> 
> So in the EU, at least geographically, this only leaves Sweden and
> UK.
> Sweden at se.ceph.com does not load for me, but UK indeed seems fine.
> 
> Should people in the EU use that mirror, or should we all just use
> download.ceph.com instead of something geographically close-by?
> 
> Cheers,
>   Oliver
> 
> 
> On 2019-09-17 23:01, Oliver Freyermuth wrote:
> > Dear Cephalopodians,
> > 
> > I realized just now that:
> >https://eu.ceph.com/rpm-nautilus/el7/x86_64/
> > still holds only releases up to 14.2.2, and nothing is to be seen
> > of 14.2.3 or 14.2.4,
> > while the main repository at:
> >https://download.ceph.com/rpm-nautilus/el7/x86_64/
> > looks as expected.
> > 
> > Is this issue with the eu.ceph.com mirror already known?
> > 
> > Cheers,
> >  Oliver
> > 
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Testing a hypothetical crush map

2018-08-06 Thread David Majchrzak
Hi Andras,

From what I can tell you can run crushtool with --test
http://docs.ceph.com/docs/master/man/8/crushtool/
http://cephnotes.ksperis.com/blog/2015/02/02/crushmap-example-of-a-hierarchical-cluster-map
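
A rough sketch of that workflow (rule id, replica count and file names below
are just examples):

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt          # decompile, then edit crush.txt with the hypothetical changes
crushtool -c crush.txt -o crush-new.bin      # recompile
crushtool -i crush-new.bin --test --show-mappings --rule 0 --num-rep 3

osdmaptool can also test the edited CRUSH map against the live osdmap to see
how PG placements would change:

ceph osd getmap -o osdmap.bin
osdmaptool osdmap.bin --import-crush crush-new.bin --test-map-pgs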
David Majchrzak
CTO
ODERLAND Webbhotell AB
E // da...@oderland.se
P // +46.313616161
A // Östra Hamngatan 50B, 411 09 Göteborg
W // https://www.oderland.se

On aug 6 2018, at 1:56 pm, Andras Pataki  wrote:
>
> Hi cephers,
> Is there a way to see what a crush map change does to the PG mappings
> (i.e. what placement groups end up on what OSDs) without actually
> setting the crush map (and have the map take effect)? I'm looking for
> some way I could test hypothetical crush map changes without any effect
> on the running system.
>
> Andras
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error: journal specified but not allowed by osd backend

2018-08-03 Thread David Majchrzak
Thanks Eugen!
I was looking into running all the commands manually, following the docs for 
add/remove osd but tried ceph-disk first.

I actually made it work by changing the id part in ceph-disk (it was checking
the wrong journal device, which was owned by root:root). The next issue was
that I tried re-using an old journal, so I had to create a new one (parted /
sgdisk to set the ceph-journal parttype). Could I have just zapped the previous
journal?
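
For reference, something along these lines should recreate a journal partition
with the Ceph journal partition type GUID (partition number, size and device
are placeholders for this example):

sgdisk --new=8:0:+10G --change-name=8:'ceph journal' \
       --typecode=8:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sda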
After that it prepared successfully and started peering. Unsetting nobackfill
let it recover a 4TB HDD in approx 9 hours.
The best part was that I didn't have to backfill twice, thanks to reusing the
osd uuid.
I'll see if I can add to the docs after we have updated to Luminous or Mimic 
and started using ceph-volume.

Kind Regards
David Majchrzak

On aug 3 2018, at 4:16 pm, Eugen Block  wrote:
>
> Hi,
> we have a full bluestore cluster and had to deal with read errors on
> the SSD for the block.db. Something like this helped us to recreate a
> pre-existing OSD without rebalancing, just refilling the PGs. I would
> zap the journal device and let it recreate. It's very similar to your
> ceph-deploy output, but maybe you get more of it if you run it manually:
>
> ceph-osd [--cluster-uuid ] [--osd-objectstore filestore]
> --mkfs -i  --osd-journal  --osd-data
> /var/lib/ceph/osd/ceph-/ --mkjournal --setuser ceph --setgroup
> ceph --osd-uuid 
>
> Maybe after zapping the journal this will work. At least it would rule
> out the old journal as the show-stopper.
>
> Regards,
> Eugen
>
>
> Zitat von David Majchrzak :
> > Hi!
> > Trying to replace an OSD on a Jewel cluster (filestore data on HDD +
> > journal device on SSD).
> > I've set noout and removed the flapping drive (read errors) and
> > replaced it with a new one.
> >
> > I've taken down the osd UUID to be able to prepare the new disk with
> > the same osd.ID. The journal device is the same as the previous one
> > (should I delete the partition and recreate it?)
> > However, running ceph-disk prepare returns:
> > # ceph-disk -v prepare --cluster-uuid
> > c51a2683-55dc-4634-9d9d-f0fec9a6f389 --osd-uuid
> > dc49691a-2950-4028-91ea-742ffc9ed63f --journal-dev --data-dev
> > --fs-type xfs /dev/sdo /dev/sda8
> > command: Running command: /usr/bin/ceph-osd --check-allows-journal
> > -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph
> > --setuser ceph --setgroup ceph
> > command: Running command: /usr/bin/ceph-osd --check-wants-journal -i
> > 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph
> > --setuser ceph --setgroup ceph
> > command: Running command: /usr/bin/ceph-osd --check-needs-journal -i
> > 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph
> > --setuser ceph --setgroup ceph
> > Traceback (most recent call last):
> > File "/usr/sbin/ceph-disk", line 9, in 
> > load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
> > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5371, in run
> > main(sys.argv[1:])
> > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5322, in 
> > main
> > args.func(args)
> > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1900, in 
> > main
> > Prepare.factory(args).prepare()
> > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line
> > 1896, in factory
> > return PrepareFilestore(args)
> > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line
> > 1909, in __init__
> > self.journal = PrepareJournal(args)
> > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line
> > 2221, in __init__
> > raise Error('journal specified but not allowed by osd backend')
> > ceph_disk.main.Error: Error: journal specified but not allowed by osd 
> > backend
> >
> > I tried googling first of course. It COULD be that we have set
> > setuser_match_path globally in ceph.conf (like this bug report:
> > https://tracker.ceph.com/issues/19642) since the cluster was created
> > as dumpling a long time ago.
> > Best practice to fix it? Create [osd.X] configs and set
> > setuser_match_path in there instead for the old OSDs?
> > Should I do any other steps preceding this if I want to use the same
> > osd UUID? I've only stopped ceph-osd@21, removed the physical disk,
> > inserted new one and tried running prepare.
> > Kind Regards,
> > David
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error: journal specified but not allowed by osd backend

2018-08-02 Thread David Majchrzak
Hm. You are right. It seems ceph-disk hard-codes osd id 0 in main.py when
calling ceph-osd. I'll have a look in my dev cluster and see if it helps things.

/usr/lib/python2.7/dist-packages/ceph_disk/main.py
def check_journal_reqs(args):
    _, _, allows_journal = command([
        'ceph-osd', '--check-allows-journal',
        '-i', '0',
        '--log-file', '$run_dir/$cluster-osd-check.log',
        '--cluster', args.cluster,
        '--setuser', get_ceph_user(),
        '--setgroup', get_ceph_group(),
    ])
    _, _, wants_journal = command([
        'ceph-osd', '--check-wants-journal',
        '-i', '0',
        '--log-file', '$run_dir/$cluster-osd-check.log',
        '--cluster', args.cluster,
        '--setuser', get_ceph_user(),
        '--setgroup', get_ceph_group(),
    ])
    _, _, needs_journal = command([
        'ceph-osd', '--check-needs-journal',
        '-i', '0',
        '--log-file', '$run_dir/$cluster-osd-check.log',
        '--cluster', args.cluster,
        '--setuser', get_ceph_user(),
        '--setgroup', get_ceph_group(),
    ])
    return (not allows_journal, not wants_journal, not needs_journal)

# ceph-osd --help
usage: ceph-osd -i 
--osd-data PATH data directory
--osd-journal PATH
journal file or block device
--mkfs create a [new] data directory
--convert-filestore
run any pending upgrade operations
--flush-journal flush all data out of journal
--mkjournal initialize a new journal
--check-wants-journal
check whether a journal is desired
--check-allows-journal
check whether a journal is allowed
--check-needs-journal
check whether a journal is required
--debug_osd  set debug level (e.g. 10)
--get-device-fsid PATH
get OSD fsid for the given block device

--conf/-c FILE read configuration from the given configuration file
--id/-i ID set ID portion of my name
--name/-n TYPE.ID set name
--cluster NAME set cluster name (default: ceph)
--setuser USER set uid to user or uid (and gid to user's gid)
--setgroup GROUP set gid to group or gid
--version show version and quit

-d run in foreground, log to stderr.
-f run in foreground, log to usual location.
--debug_ms N set message debug level (e.g. 1)

On aug 2 2018, at 11:57 am, Konstantin Shalygin  wrote:
>
> > ceph_disk.main.Error: Error: journal specified but not allowed by osd 
> > backend
>
> I faced this issue once before.
> The problem is that the function queries osd.0 instead of your osd.21.
> Change in main.py
>
> '-i', '0',
> to 21 (your osd number)
> '-i', '21',
> and try again.
>
>
>
> k
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Error: journal specified but not allowed by osd backend

2018-08-01 Thread David Majchrzak

Hi!
Trying to replace an OSD on a Jewel cluster (filestore data on HDD + journal 
device on SSD).
I've set noout and removed the flapping drive (read errors) and replaced it 
with a new one.

I've taken down the osd UUID to be able to prepare the new disk with the same 
osd.ID. The journal device is the same as the previous one (should I delete the 
partition and recreate it?)
However, running ceph-disk prepare returns:
# ceph-disk -v prepare --cluster-uuid c51a2683-55dc-4634-9d9d-f0fec9a6f389 
--osd-uuid dc49691a-2950-4028-91ea-742ffc9ed63f --journal-dev --data-dev 
--fs-type xfs /dev/sdo /dev/sda8
command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 
--log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph 
--setgroup ceph
command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 
--log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph 
--setgroup ceph
command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 
--log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph 
--setgroup ceph
Traceback (most recent call last):
File "/usr/sbin/ceph-disk", line 9, in 
load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5371, in run
main(sys.argv[1:])
File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5322, in main
args.func(args)
File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1900, in main
Prepare.factory(args).prepare()
File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1896, in factory
return PrepareFilestore(args)
File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1909, in 
__init__
self.journal = PrepareJournal(args)
File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 2221, in 
__init__
raise Error('journal specified but not allowed by osd backend')
ceph_disk.main.Error: Error: journal specified but not allowed by osd backend

I tried googling first of course. It COULD be that we have set 
setuser_match_path globally in ceph.conf (like this bug report: 
https://tracker.ceph.com/issues/19642) since the cluster was created as 
dumpling a long time ago.
Best practice to fix it? Create [osd.X] configs and set setuser_match_path in 
there instead for the old OSDs?
Should I do any other steps preceding this if I want to use the same osd UUID? 
I've only stopped ceph-osd@21, removed the physical disk, inserted new one and 
tried running prepare.
Kind Regards,
David

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-12 Thread David Majchrzak
Hi/Hej Magnus,

We had a similar issue going from latest hammer to jewel (so it might not be
applicable for you), with PGs stuck peering / data misplaced, right after
updating all mons to the latest jewel at the time (10.2.10).
Finally setting the require_jewel_osds flag put everything back in place (we
were going to do this after restarting all OSDs, following the docs/changelogs).
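
For reference, the flag is set and then verified roughly like this (assuming
all OSDs are already running jewel):

ceph osd set require_jewel_osds
ceph osd dump | grep flags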
What does your ceph health detail look like?
Did you perform any other commands after starting your mon upgrade? Any
commands that might change the crush map can cause issues AFAIK (correct me
if I'm wrong, but I think we ran into this once) if your mons and osds are
on different versions.
// david
On jul 12 2018, at 11:45 am, Magnus Grönlund  wrote:
>
> Hi list,
>
> Things went from bad to worse, tried to upgrade some OSDs to Luminous to see 
> if that could help but that didn’t appear to make any difference.
> But for each restarted OSD there were a few PGs that the OSD seemed to 
> “forget” and the number of undersized PGs grew until some PGs had been 
> “forgotten” by all 3 acting OSDs and became stale, even though all OSDs (and 
> their disks) were available.
> Then the OSDs grew so big that the servers ran out of memory (48GB per server 
> with 10 2TB-disks per server) and started killing the OSDs…
> All OSDs were then shut down to try and preserve some data on the disks at 
> least, but maybe it is too late?
>
> /Magnus
>
> 2018-07-11 21:10 GMT+02:00 Magnus Grönlund  (mailto:mag...@gronlund.se)>:
> > Hi Paul,
> >
> > No, all OSDs are still jewel; the issue started before I had even started 
> > to upgrade the first OSD, and they don't appear to be flapping.
> > ceph -w shows a lot of slow request etc, but nothing unexpected as far as I 
> > can tell considering the state the cluster is in.
> >
> > 2018-07-11 20:40:09.396642 osd.37 [WRN] 100 slow requests, 2 included 
> > below; oldest blocked for > 25402.278824 secs
> > 2018-07-11 20:40:09.396652 osd.37 [WRN] slow request 1920.957326 seconds 
> > old, received at 2018-07-11 20:08:08.439214: 
> > osd_op(client.73540057.0:8289463 2.e57b3e32 (undecoded) 
> > ack+ondisk+retry+write+known_if_redirected e160294) currently waiting for 
> > peered
> > 2018-07-11 20:40:09.396660 osd.37 [WRN] slow request 1920.048094 seconds 
> > old, received at 2018-07-11 20:08:09.348446: 
> > osd_op(client.671628641.0:998704 2.42f88232 (undecoded) 
> > ack+ondisk+retry+write+known_if_redirected e160475) currently waiting for 
> > peered
> > 2018-07-11 20:40:10.397008 osd.37 [WRN] 100 slow requests, 2 included 
> > below; oldest blocked for > 25403.279204 secs
> > 2018-07-11 20:40:10.397017 osd.37 [WRN] slow request 1920.043860 seconds 
> > old, received at 2018-07-11 20:08:10.353060: 
> > osd_op(client.231731103.0:1007729 3.e0ff5786 (undecoded) 
> > ondisk+write+known_if_redirected e137428) currently waiting for peered
> > 2018-07-11 20:40:10.397023 osd.37 [WRN] slow request 1920.034101 seconds 
> > old, received at 2018-07-11 20:08:10.362819: 
> > osd_op(client.207458703.0:2000292 3.a8143b86 (undecoded) 
> > ondisk+write+known_if_redirected e137428) currently waiting for peered
> > 2018-07-11 20:40:10.790573 mon.0 [INF] pgmap 4104 pgs: 5 down+peering, 1142 
> > peering, 210 remapped+peering, 5 active+recovery_wait+degraded, 1551 
> > active+clean, 2 activating+undersized+degraded+remapped, 15 
> > active+remapped+backfilling, 178 unknown, 1 active+remapped, 3 
> > activating+remapped, 78 active+undersized+degraded+remapped+backfill_wait, 
> > 6 active+recovery_wait+degraded+remapped, 3 
> > undersized+degraded+remapped+backfill_wait+peered, 5 
> > active+undersized+degraded+remapped+backfilling, 295 
> > active+remapped+backfill_wait, 3 active+recovery_wait+undersized+degraded, 
> > 21 activating+undersized+degraded, 559 active+undersized+degraded, 4 
> > remapped, 17 undersized+degraded+peered, 1 
> > active+recovery_wait+undersized+degraded+remapped; 13439 GB data, 42395 GB 
> > used, 160 TB / 201 TB avail; 4069 B/s rd, 746 kB/s wr, 5 op/s; 
> > 534753/10756032 objects degraded (4.972%); 779027/10756032 objects 
> > misplaced (7.243%); 256 MB/s, 65 objects/s recovering
> >
> >
> >
> >
> > There are a lot of things in the OSD-log files that I'm unfamiliar with but 
> > so far I haven't found anything that has given me a clue on how to fix the 
> > issue.
> > BTW restarting an OSD doesn't seem to help; on the contrary, that sometimes 
> > results in PGs being stuck undersized!
> > I have attached an osd log from when a restarted OSD started up.
> >
> > Best regards
> > /Magnus
> >
> >
> > 2018-07-11 20:39 GMT+02:00 Paul Emmerich  > (mailto:paul.emmer...@croit.io)>:
> > > Did you finish the upgrade of the OSDs? Are OSDs flapping? (ceph -w) Is 
> > > there anything weird in the OSDs' log files?
> > >
> > >
> > >
> > > Paul
> > >
> > > 2018-07-11 20:30 GMT+02:00 Magnus Grönlund  > > (mailto:mag...@gronlund.se)>:
> > > > Hi,
> > > >
> > > > Started to upgrade a ceph-cluster from Jewel (10.2.10) to Luminous

[ceph-users] Any issues with old tunables (cluster/pool created at dumpling)?

2018-01-31 Thread David Majchrzak
Hi,

Upgrading an old cluster that was created with dumpling up to luminous soon 
(with a quick stop at jewel, currently upgrading deb7 -> deb8 so we can get any 
newer packages).

My idea is to keep the tunables as they are, since this pool has active data 
and I've already disabled tunable warnings in ceph.conf.
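
For reference, the warning can be silenced with something like the following
in ceph.conf (shown only as an example):

[mon]
mon_warn_on_legacy_crush_tunables = false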
Are there any "issues" running with old tunables? Disruption of service?

Kind Regards,
David Majchrzak
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reweight 0 - best way to backfill slowly?

2018-01-29 Thread David Majchrzak
Works great, and it seemed to have a lot less impact than just letting it peer
all PGs at the same time. I used an increment of 0.05 without issue, and then a
ceph tell 'osd.*' injectargs '--osd-max-backfills 2' seems to keep the HDD at
around 85-100% util without really affecting the clients.
Solid advice, cheers.
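
For reference, a sketch of the sequence described above (osd.11, the 0.05 step
and the backfill limit are just this thread's example values):

ceph osd set norecover
ceph osd set nobackfill
ceph osd reweight 11 0.05      # repeat, stepping towards 1.0 and letting peering settle each time
ceph osd unset nobackfill
ceph osd unset norecover
ceph tell 'osd.*' injectargs '--osd-max-backfills 2'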

Kind Regards,
David Majchrzak


> 29 jan. 2018 kl. 23:14 skrev David Majchrzak :
> 
> Thanks Steve!
> 
> So the peering won't actually move any blocks around, but will make sure that 
> all PGs know what state they are in? That means that when I start increasing 
> reweight, PGs will be allocated to the disk, but won't actually recover yet. 
> However, they will be set as "degraded".
> So when all of the peering is done, I'll unset the norecover/nobackfill flags 
> and backfill will commence but will be less I/O intensive than peering and 
> backfilling at the same time?
> 
> Kind Regards,
> 
> David Majchrzak
> 
>> 29 jan. 2018 kl. 22:57 skrev Steve Taylor > <mailto:steve.tay...@storagecraft.com>>:
>> 
>> There are two concerns with setting the reweight to 1.0. The first is 
>> peering and the second is backfilling. Peering is going to block client I/O 
>> on the affected OSDs, while backfilling will only potentially slow things 
>> down.
>> 
>> I don't know what your client I/O looks like, but personally I would 
>> probably set the norecover and nobackfill flags, slowly increment your 
>> reweight value by 0.01 or whatever you deem to be appropriate for your 
>> environment, waiting for peering to complete in between each step. Also 
>> allow any resulting blocked requests to clear up before incrementing your 
>> reweight again.
>> 
>> When your reweight is all the way up to 1.0, inject osd_max_backfills to 
>> whatever you like (or don't if you're happy with it as is) and unset the 
>> norecover and nobackfill flags to let backfilling begin. If you are unable 
>> to handle the impact of backfilling with osd_max_backfills set to 1, then 
>> you need to add some new OSDs to your cluster before doing any of this. They 
>> will have to backfill too, but at least you'll have more spindles to handle 
>> it.
>> 
>> 
>> 
>> 
>> Steve Taylor | Senior Software Engineer | StorageCraft Technology 
>> Corporation <https://storagecraft.com/>
>> 380 Data Drive Suite 300 | Draper | Utah | 84020
>> Office: 801.871.2799 |
>> 
>> If you are not the intended recipient of this message or received it 
>> erroneously, please notify the sender and delete it, together with any 
>> attachments, and be advised that any dissemination or copying of this 
>> message is prohibited.
>> 
>> On Mon, 2018-01-29 at 22:43 +0100, David Majchrzak wrote:
>>> And so I totally forgot to add df tree to the mail.
>>> Here's the interesting bit from two first nodes. where osd.11 has weight 
>>> but is reweighted to 0.
>>> 
>>> 
>>> root@osd1:~# ceph osd df tree
>>> ID WEIGHTREWEIGHT SIZE   USEAVAIL  %USE  VAR  TYPE NAME
>>> -1 181.7-   109T 50848G 60878G 00 root default
>>> -2  36.3- 37242G 16792G 20449G 45.09 0.99 host osd1
>>>  0   3.64000  1.0  3724G  1730G  1993G 46.48 1.02 osd.0
>>>  1   3.64000  1.0  3724G  1666G  2057G 44.75 0.98 osd.1
>>>  2   3.64000  1.0  3724G  1734G  1989G 46.57 1.02 osd.2
>>>  3   3.64000  1.0  3724G  1387G  2336G 37.25 0.82 osd.3
>>>  4   3.64000  1.0  3724G  1722G  2002G 46.24 1.01 osd.4
>>>  6   3.64000  1.0  3724G  1840G  1883G 49.43 1.08 osd.6
>>>  7   3.64000  1.0  3724G  1651G  2072G 44.34 0.97 osd.7
>>>  8   3.64000  1.0  3724G  1747G  1976G 46.93 1.03 osd.8
>>>  9   3.64000  1.0  3724G  1697G  2026G 45.58 1.00 osd.9
>>>  5   3.64000  1.0  3724G  1614G  2109G 43.34 0.95 osd.5
>>> -3  36.3-  0  0  0 00 host osd2
>>> 12   3.64000  1.0  3724G  1730G  1993G 46.46 1.02 osd.12
>>> 13   3.64000  1.0  3724G  1745G  1978G 46.88 1.03 osd.13
>>> 14   3.64000  1.0  3724G  1707G  2016G 45.84 1.01 osd.14
>>> 15   3.64000  1.0  3724G  1540G  2184G 41.35 0.91 osd.15
>>> 16   3.64000  1.0  3724G  14

Re: [ceph-users] Reweight 0 - best way to backfill slowly?

2018-01-29 Thread David Majchrzak
Thanks Steve!

So the peering won't actually move any blocks around, but will make sure that 
all PGs know what state they are in? That means that when I start increasing 
reweight, PGs will be allocated to the disk, but won't actually recover yet. 
However, they will be set as "degraded".
So when all of the peering is done, I'll unset the norecover/nobackfill flags 
and backfill will commence but will be less I/O intensive than peering and 
backfilling at the same time?

Kind Regards,

David Majchrzak

> 29 jan. 2018 kl. 22:57 skrev Steve Taylor :
> 
> There are two concerns with setting the reweight to 1.0. The first is peering 
> and the second is backfilling. Peering is going to block client I/O on the 
> affected OSDs, while backfilling will only potentially slow things down.
> 
> I don't know what your client I/O looks like, but personally I would probably 
> set the norecover and nobackfill flags, slowly increment your reweight value 
> by 0.01 or whatever you deem to be appropriate for your environment, waiting 
> for peering to complete in between each step. Also allow any resulting 
> blocked requests to clear up before incrementing your reweight again.
> 
> When your reweight is all the way up to 1.0, inject osd_max_backfills to 
> whatever you like (or don't if you're happy with it as is) and unset the 
> norecover and nobackfill flags to let backfilling begin. If you are unable to 
> handle the impact of backfilling with osd_max_backfills set to 1, then you 
> need to add some new OSDs to your cluster before doing any of this. They will 
> have to backfill too, but at least you'll have more spindles to handle it.
> 
> 
> 
> 
> Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation 
> <https://storagecraft.com/>
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 |
> 
> If you are not the intended recipient of this message or received it 
> erroneously, please notify the sender and delete it, together with any 
> attachments, and be advised that any dissemination or copying of this message 
> is prohibited.
> 
> On Mon, 2018-01-29 at 22:43 +0100, David Majchrzak wrote:
>> And so I totally forgot to add df tree to the mail.
>> Here's the interesting bit from two first nodes. where osd.11 has weight but 
>> is reweighted to 0.
>> 
>> 
>> root@osd1:~# ceph osd df tree
>> ID WEIGHTREWEIGHT SIZE   USEAVAIL  %USE  VAR  TYPE NAME
>> -1 181.7-   109T 50848G 60878G 00 root default
>> -2  36.3- 37242G 16792G 20449G 45.09 0.99 host osd1
>>  0   3.64000  1.0  3724G  1730G  1993G 46.48 1.02 osd.0
>>  1   3.64000  1.0  3724G  1666G  2057G 44.75 0.98 osd.1
>>  2   3.64000  1.0  3724G  1734G  1989G 46.57 1.02 osd.2
>>  3   3.64000  1.0  3724G  1387G  2336G 37.25 0.82 osd.3
>>  4   3.64000  1.0  3724G  1722G  2002G 46.24 1.01 osd.4
>>  6   3.64000  1.0  3724G  1840G  1883G 49.43 1.08 osd.6
>>  7   3.64000  1.0  3724G  1651G  2072G 44.34 0.97 osd.7
>>  8   3.64000  1.0  3724G  1747G  1976G 46.93 1.03 osd.8
>>  9   3.64000  1.0  3724G  1697G  2026G 45.58 1.00 osd.9
>>  5   3.64000  1.0  3724G  1614G  2109G 43.34 0.95 osd.5
>> -3  36.3-  0  0  0 00 host osd2
>> 12   3.64000  1.0  3724G  1730G  1993G 46.46 1.02 osd.12
>> 13   3.64000  1.0  3724G  1745G  1978G 46.88 1.03 osd.13
>> 14   3.64000  1.0  3724G  1707G  2016G 45.84 1.01 osd.14
>> 15   3.64000  1.0  3724G  1540G  2184G 41.35 0.91 osd.15
>> 16   3.64000  1.0  3724G  1484G  2239G 39.86 0.87 osd.16
>> 18   3.64000  1.0  3724G  1928G  1796G 51.77 1.14 osd.18
>> 20   3.64000  1.0  3724G  1767G  1956G 47.45 1.04 osd.20
>> 10   3.64000  1.0  3724G  1797G  1926G 48.27 1.06 osd.10
>> 49   3.64000  1.0  3724G  1847G  1877G 49.60 1.09 osd.49
>> 11   3.640000  0  0  0 00 osd.11
>> 
>>> 
>>> 29 jan. 2018 kl. 22:40 skrev David Majchrzak >> <mailto:da...@visions.se>>:
>>> 
>>> Hi!
>>> 
>>> Cluster: 5 HW nodes, 10 HDDs with SSD journals, filestore, 0.94.9 hammer, 
>>> debian wheezy (scheduled to upgrade once this is fixed).
>>> 
>>> I have a replaced HDD that another admin set to reweight 0 instead of 
>>> weight 0 (I can'

[ceph-users] Reweight 0 - best way to backfill slowly?

2018-01-29 Thread David Majchrzak
Hi!

Cluster: 5 HW nodes, 10 HDDs with SSD journals, filestore, 0.94.9 hammer, 
debian wheezy (scheduled to upgrade once this is fixed).

I have a replaced HDD that another admin set to reweight 0 instead of weight 0 
(I can't remember the reason).
What would be the best way to slowly backfill it? Usually I'm using weight and 
slowly growing it to max size.

I guess if I just set reweight to 1.0, it will backfill as fast as I let it, 
that is max 1 backfill / osd but it will probably disrupt client io (this being 
on hammer).

And if I set the weight on it to 0, the node will get less weight, and will 
start moving data around everywhere right?

Can I use reweight the same way as weight here, slowly increasing it up to 1.0 
by increments of say 0.01?
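
For context, the two knobs map to different commands (osd.11 and the values
here are just examples):

ceph osd crush reweight osd.11 3.64    # CRUSH weight: changes how much data CRUSH assigns to the OSD and its host
ceph osd reweight 11 0.05              # reweight override (0.0 - 1.0): scales only this OSD's share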

Kind Regards,
David Majchrzak

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reweight 0 - best way to backfill slowly?

2018-01-29 Thread David Majchrzak
And so I totally forgot to add the df tree to the mail.
Here's the interesting bit from the first two nodes, where osd.11 has weight but
is reweighted to 0.


root@osd1:~# ceph osd df tree
ID WEIGHT    REWEIGHT SIZE   USE    AVAIL  %USE  VAR  TYPE NAME
-1 181.7            -   109T 50848G 60878G     0    0 root default
-2  36.3            - 37242G 16792G 20449G 45.09 0.99     host osd1
 0   3.64000  1.0  3724G  1730G  1993G 46.48 1.02 osd.0
 1   3.64000  1.0  3724G  1666G  2057G 44.75 0.98 osd.1
 2   3.64000  1.0  3724G  1734G  1989G 46.57 1.02 osd.2
 3   3.64000  1.0  3724G  1387G  2336G 37.25 0.82 osd.3
 4   3.64000  1.0  3724G  1722G  2002G 46.24 1.01 osd.4
 6   3.64000  1.0  3724G  1840G  1883G 49.43 1.08 osd.6
 7   3.64000  1.0  3724G  1651G  2072G 44.34 0.97 osd.7
 8   3.64000  1.0  3724G  1747G  1976G 46.93 1.03 osd.8
 9   3.64000  1.0  3724G  1697G  2026G 45.58 1.00 osd.9
 5   3.64000  1.0  3724G  1614G  2109G 43.34 0.95 osd.5
-3  36.3            -      0      0      0     0    0     host osd2
12   3.64000  1.0  3724G  1730G  1993G 46.46 1.02 osd.12
13   3.64000  1.0  3724G  1745G  1978G 46.88 1.03 osd.13
14   3.64000  1.0  3724G  1707G  2016G 45.84 1.01 osd.14
15   3.64000  1.0  3724G  1540G  2184G 41.35 0.91 osd.15
16   3.64000  1.0  3724G  1484G  2239G 39.86 0.87 osd.16
18   3.64000  1.0  3724G  1928G  1796G 51.77 1.14 osd.18
20   3.64000  1.0  3724G  1767G  1956G 47.45 1.04 osd.20
10   3.64000  1.0  3724G  1797G  1926G 48.27 1.06 osd.10
49   3.64000  1.0  3724G  1847G  1877G 49.60 1.09 osd.49
11   3.640000  0  0  0 00 osd.11

> 29 jan. 2018 kl. 22:40 skrev David Majchrzak :
> 
> Hi!
> 
> Cluster: 5 HW nodes, 10 HDDs with SSD journals, filestore, 0.94.9 hammer, 
> debian wheezy (scheduled to upgrade once this is fixed).
> 
> I have a replaced HDD that another admin set to reweight 0 instead of weight 
> 0 (I can't remember the reason).
> What would be the best way to slowly backfill it? Usually I'm using weight 
> and slowly growing it to max size.
> 
> I guess if I just set reweight to 1.0, it will backfill as fast as I let it, 
> that is max 1 backfill / osd but it will probably disrupt client io (this 
> being on hammer).
> 
> And if I set the weight on it to 0, the node will get less weight, and will 
> start moving data around everywhere right?
> 
> Can I use reweight the same way as weight here, slowly increasing it up to 
> 1.0 by increments of say 0.01?
> 
> Kind Regards,
> David Majchrzak
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating filestore to bluestore using ceph-volume

2018-01-26 Thread David Majchrzak
Yeah, the next one will be without the double rebalance, I just had a lot of
time on my hands.

I never did use kill before; however, I followed the docs here. They should
probably be updated.

http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/#convert-existing-osds
 
<http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/#convert-existing-osds>

Is this the tracker for the docs?
http://tracker.ceph.com/projects/ceph-website/issues?set_filter=1&tracker_id=6 
<http://tracker.ceph.com/projects/ceph-website/issues?set_filter=1&tracker_id=6>


> 26 jan. 2018 kl. 19:22 skrev Wido den Hollander :
> 
> 
> 
> On 01/26/2018 07:09 PM, David Majchrzak wrote:
>> destroy did remove the auth key, however create didnt add the auth, I had to 
>> do it manually.
>> Then I tried to start the osd.0 again and it failed because osdmap said it 
>> was destroyed.
> 
> That seems like this bug: http://tracker.ceph.com/issues/22673
> 
>> I've summed my steps below:
>> Here are my commands prior to create:
>> root@int1:~# ceph osd out 0
> 
>> <-- wait for rebalance/recover -->
>> root@int1:~# ceph osd safe-to-destroy 0
>> OSD(s) 0 are safe to destroy without reducing data durability.
> 
> Although it's a very safe route it's not required. You'll have a double 
> rebalance here.
> 
>> root@int1:~# systemctl kill ceph-osd@0
> 
> I recommend using 'stop' and not kill. The stop is a clear and graceful 
> shutdown.
> 
> As I haven't used ceph-volume before I'm not able to tell exactly why the 
> commands underneath fail.
> 
> Wido
> 
>> root@int1:~# ceph status
>>   cluster:
>> id: efad7df8-721d-43d8-8d02-449406e70b90
>> health: HEALTH_OK
>>   services:
>> mon: 3 daemons, quorum int1,int2,int3
>> mgr: int1(active), standbys: int3, int2
>> osd: 6 osds: 5 up, 5 in
>>   data:
>> pools:   2 pools, 320 pgs
>> objects: 97038 objects, 364 GB
>> usage:   1096 GB used, 1128 GB / 2224 GB avail
>> pgs: 320 active+clean
>>   io:
>> client:   289 kB/s rd, 870 kB/s wr, 46 op/s rd, 48 op/s wr
>> root@int1:~# mount | grep /var/lib/ceph/osd/ceph-0
>> /dev/sdc1 on /var/lib/ceph/osd/ceph-0 type xfs 
>> (rw,noatime,attr2,inode64,noquota)
>> root@int1:~# umount /var/lib/ceph/osd/ceph-0
>> root@int1:~# ceph-volume lvm zap /dev/sdc
>> Zapping: /dev/sdc
>> Running command: sudo wipefs --all /dev/sdc
>>  stdout: /dev/sdc: 8 bytes were erased at offset 0x0200 (gpt): 45 46 49 
>> 20 50 41 52 54
>> /dev/sdc: 8 bytes were erased at offset 0x6fc86d5e00 (gpt): 45 46 49 20 50 
>> 41 52 54
>> /dev/sdc: 2 bytes were erased at offset 0x01fe (PMBR): 55 aa
>> /dev/sdc: calling ioctl to re-read partition table: Success
>> Running command: dd if=/dev/zero of=/dev/sdc bs=1M count=10
>>  stderr: 10+0 records in
>> 10+0 records out
>> 10485760 bytes (10 MB) copied
>>  stderr: , 0.0253999 s, 413 MB/s
>> --> Zapping successful for: /dev/sdc
>> root@int1:~# ceph osd destroy 0 --yes-i-really-mean-it
>> destroyed osd.0
>> root@int1:~# ceph status
>>   cluster:
>> id: efad7df8-721d-43d8-8d02-449406e70b90
>> health: HEALTH_OK
>>   services:
>> mon: 3 daemons, quorum int1,int2,int3
>> mgr: int1(active), standbys: int3, int2
>> osd: 6 osds: 5 up, 5 in
>>   data:
>> pools:   2 pools, 320 pgs
>> objects: 97038 objects, 364 GB
>> usage:   1096 GB used, 1128 GB / 2224 GB avail
>> pgs: 320 active+clean
>>   io:
>> client:   56910 B/s rd, 1198 kB/s wr, 15 op/s rd, 48 op/s wr
>> root@int1:~# ceph-volume create --bluestore --data /dev/sdc --osd-id 0
>> usage: ceph-volume [-h] [--cluster CLUSTER] [--log-level LOG_LEVEL]
>>[--log-path LOG_PATH]
>> ceph-volume: error: unrecognized arguments: create --bluestore --data 
>> /dev/sdc --osd-id 0
>> root@int1:~# ceph-volume lvm create --bluestore --data /dev/sdc --osd-id 0
>> Running command: sudo vgcreate --force --yes 
>> ceph-efad7df8-721d-43d8-8d02-449406e70b90 /dev/sdc
>>  stderr: WARNING: lvmetad is running but disabled. Restart lvmetad before 
>> enabling it!
>>  stdout: Physical volume "/dev/sdc" successfully created
>>  stdout: Volume group "ceph-efad7df8-721d-43d8-8d02-449406e70b90" 
>> successfully created
>> Running command: sudo lvcreate --yes -l 100%FREE -n 
>> osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9 
>> ceph-efad7df8-721d-43d8-8d02-449406e70b90
>>  stderr: WARNING: lvm

Re: [ceph-users] Migrating filestore to bluestore using ceph-volume

2018-01-26 Thread David Majchrzak
_t::decode(ceph::buffer::list::iterator&) decode past end 
of struct encoding
 stderr: 2018-01-26 14:59:10.039925 7fd7ef951cc0 -1 
bluestore(/var/lib/ceph/osd/ceph-0//block) _read_bdev_label unable to decode 
label at offset 102: buffer::malformed_input: void 
bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end 
of struct encoding
 stderr: 2018-01-26 14:59:10.039984 7fd7ef951cc0 -1 
bluestore(/var/lib/ceph/osd/ceph-0/) _read_fsid unparsable uuid
 stderr: 2018-01-26 14:59:11.359951 7fd7ef951cc0 -1 key 
AQA5Qmta9LERFhAAKU+AmT1Sm56nk7sWx2BATQ==
 stderr: 2018-01-26 14:59:11.888476 7fd7ef951cc0 -1 created object store 
/var/lib/ceph/osd/ceph-0/ for osd.0 fsid efad7df8-721d-43d8-8d02-449406e70b90
Running command: sudo ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev 
/dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
 --path /var/lib/ceph/osd/ceph-0
Running command: sudo ln -snf 
/dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
 /var/lib/ceph/osd/ceph-0/block
Running command: chown -R ceph:ceph /dev/dm-4
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: sudo systemctl enable 
ceph-volume@lvm-0-138ce507-f28a-45bf-814c-7fa124a9d9b9
 stderr: Created symlink from 
/etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-0-138ce507-f28a-45bf-814c-7fa124a9d9b9.service
 to /lib/systemd/system/ceph-volume@.service.
Running command: sudo systemctl start ceph-osd@0
root@int1:~# ceph status
  cluster:
id: efad7df8-721d-43d8-8d02-449406e70b90
health: HEALTH_OK

  services:
mon: 3 daemons, quorum int1,int2,int3
mgr: int1(active), standbys: int3, int2
osd: 6 osds: 5 up, 5 in

  data:
pools:   2 pools, 320 pgs
objects: 97038 objects, 364 GB
usage:   1095 GB used, 1128 GB / 2224 GB avail
pgs: 320 active+clean

  io:
client:   294 kB/s rd, 1827 kB/s wr, 61 op/s rd, 96 op/s wr


root@int1:~# ceph osd tree
ID CLASS WEIGHT  TYPE NAME STATUSREWEIGHT PRI-AFF
-1   2.60458 root default
-2   0.86819 host int1
 0   ssd 0.43159 osd.0 destroyed0 1.0
 3   ssd 0.43660 osd.3up  1.0 1.0
-3   0.86819 host int2
 1   ssd 0.43159 osd.1up  1.0 1.0
 4   ssd 0.43660 osd.4up  1.0 1.0
-4   0.86819 host int3
 2   ssd 0.43159 osd.2up  1.0 1.0
 5   ssd 0.43660 osd.5up  1.0 1.0


root@int1:~# ceph auth ls

Does not list osd.0

root@int1:~# ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' mgr 
'allow profile osd' -i /var/lib/ceph/osd/ceph-0/keyring
added key for osd.0

root@int1:~# systemctl start ceph-osd@0
root@int1:~# ceph status
  cluster:
id: efad7df8-721d-43d8-8d02-449406e70b90
health: HEALTH_OK

  services:
mon: 3 daemons, quorum int1,int2,int3
mgr: int1(active), standbys: int3, int2
osd: 6 osds: 5 up, 5 in

  data:
pools:   2 pools, 320 pgs
objects: 97163 objects, 365 GB
usage:   1097 GB used, 1127 GB / 2224 GB avail
pgs: 320 active+clean

  io:
client:   284 kB/s rd, 539 kB/s wr, 32 op/s rd, 30 op/s wr

root@int1:~# systemctl status ceph-osd@0
● ceph-osd@0.service - Ceph object storage daemon osd.0
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled)
  Drop-In: /lib/systemd/system/ceph-osd@.service.d
   └─ceph-after-pve-cluster.conf
   Active: inactive (dead) since Fri 2018-01-26 17:02:08 UTC; 54s ago
  Process: 6857 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i 
--setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
  Process: 6851 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster 
${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 6857 (code=exited, status=0/SUCCESS)

Jan 26 17:02:08 int1 systemd[1]: Started Ceph object storage daemon osd.0.
Jan 26 17:02:08 int1 ceph-osd[6857]: starting osd.0 at - osd_data 
/var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
Jan 26 17:02:08 int1 ceph-osd[6857]: 2018-01-26 17:02:08.801761 7fed0b5bbcc0 -1 
osd.0 0 log_to_monitors {default=true}
Jan 26 17:02:08 int1 ceph-osd[6857]: 2018-01-26 17:02:08.804600 7fecf2ee4700 -1 
osd.0 0 waiting for initial osdmap
Jan 26 17:02:08 int1 ceph-osd[6857]: 2018-01-26 17:02:08.909237 7fecf7eee700 -1 
osd.0 1040 osdmap says I am destroyed, exiting


After this I followed Reed Dier's steps somewhat, zapped the disk again, 
removed auth, crush, osd. Zapped the disk/parts and device mapper.
Could then run the create command without issues.
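
For reference, a sketch of that cleanup sequence (device, osd id and the
device-mapper name follow the example above and will differ per host):

ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm osd.0
sgdisk --zap-all /dev/sdc
dmsetup remove <ceph VG/LV mapper name>
ceph-volume lvm create --bluestore --data /dev/sdc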

Kind Regards,
David Majchrzak


> 26 jan. 2018 kl. 18:56 skrev Wido den Hollander :
> 
> 
> 
> On 01/26/2018 06:53 PM, David Majchrzak wrote:
>> I did do that.
>> It didn't add the auth key to ceph, so I had to do that manually. Then it 
>> said that osd.0 was set as destroye

Re: [ceph-users] Migrating filestore to bluestore using ceph-volume

2018-01-26 Thread David Majchrzak
I did do that.
It didn't add the auth key to ceph, so I had to do that manually. Then it said 
that osd.0 was set as destroyed, which yes, it was still in crushmap.

I followed the docs to a point.


> 26 jan. 2018 kl. 18:50 skrev Wido den Hollander :
> 
> 
> 
> On 01/26/2018 06:37 PM, David Majchrzak wrote:
>> Ran:
>> ceph auth del osd.0
>> ceph auth del osd.6
>> ceph auth del osd.7
>> ceph osd rm osd.0
>> ceph osd rm osd.6
>> ceph osd rm osd.7
>> which seems to have removed them.
> 
> Did you destroy the OSD prior to running ceph-volume?
> 
> $ ceph osd destroy 6
> 
> After you've done that you can use ceph-volume to re-create the OSD.
> 
> Wido
> 
>> Thanks for the help Reed!
>> Kind Regards,
>> David Majchrzak
>>> 26 jan. 2018 kl. 18:32 skrev David Majchrzak >> <mailto:da...@visions.se>>:
>>> 
>>> Thanks that helped!
>>> 
>>> Since I had already "halfway" created a lvm volume I wanted to start from 
>>> the beginning and zap it.
>>> 
>>> Tried to zap the raw device but failed since --destroy doesn't seem to be 
>>> in 12.2.2
>>> 
>>> http://docs.ceph.com/docs/master/ceph-volume/lvm/zap/
>>> 
>>> root@int1:~# ceph-volume lvm zap /dev/sdc --destroy
>>> usage: ceph-volume lvm zap [-h] [DEVICE]
>>> ceph-volume lvm zap: error: unrecognized arguments: --destroy
>>> 
>>> So i zapped it with the vg/lvm instead.
>>> ceph-volume lvm zap 
>>> /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
>>> 
>>> However I run create on it since the LVM was already there.
>>> So I zapped it with sgdisk and ran dmsetup remove. After that I was able to 
>>> create it again.
>>> 
>>> However - each "ceph-volume lvm create" that I ran that failed, 
>>> successfully added an osd to crush map ;)
>>> 
>>> So I've got this now:
>>> 
>>> root@int1:~# ceph osd df tree
>>> ID CLASS WEIGHT  REWEIGHT SIZE  USEAVAIL  %USE  VAR  PGS TYPE NAME
>>> -1   2.60959- 2672G  1101G  1570G 41.24 1.00   - root default
>>> -2   0.87320-  894G   369G   524G 41.36 1.00   - host int1
>>>  3   ssd 0.43660  1.0  447G   358G 90295M 80.27 1.95 301 osd.3
>>>  8   ssd 0.43660  1.0  447G 11273M   436G  2.46 0.06  19 osd.8
>>> -3   0.86819-  888G   366G   522G 41.26 1.00   - host int2
>>>  1   ssd 0.43159  1.0  441G   167G   274G 37.95 0.92 147 osd.1
>>>  4   ssd 0.43660  1.0  447G   199G   247G 44.54 1.08 173 osd.4
>>> -4   0.86819-  888G   365G   523G 41.09 1.00   - host int3
>>>  2   ssd 0.43159  1.0  441G   193G   248G 43.71 1.06 174 osd.2
>>>  5   ssd 0.43660  1.0  447G   172G   274G 38.51 0.93 146 osd.5
>>>  0 00 0  0  0 00   0 osd.0
>>>  6 00 0  0  0 00   0 osd.6
>>>  7 00 0  0  0 00   0 osd.7
>>> 
>>> I guess I can just remove them from crush,auth and rm them?
>>> 
>>> Kind Regards,
>>> 
>>> David Majchrzak
>>> 
>>>> 26 jan. 2018 kl. 18:09 skrev Reed Dier >>> <mailto:reed.d...@focusvq.com>>:
>>>> 
>>>> This is the exact issue that I ran into when starting my bluestore 
>>>> conversion journey.
>>>> 
>>>> See my thread here: https://www.spinics.net/lists/ceph-users/msg41802.html
>>>> 
>>>> Specifying --osd-id causes it to fail.
>>>> 
>>>> Below are my steps for OSD replace/migrate from filestore to bluestore.
>>>> 
>>>> BIG caveat here in that I am doing destructive replacement, in that I am 
>>>> not allowing my objects to be migrated off of the OSD I’m replacing before 
>>>> nuking it.
>>>> With 8TB drives it just takes way too long, and I trust my failure domains 
>>>> and other hardware to get me through the backfills.
>>>> So instead of 1) reading data off, writing data elsewhere 2) remove/re-add 
>>>> 3) reading data elsewhere, writing back on, I am taking step one out, and 
>>>> trusting my two other copies of the objects. Just wanted to clarify my 
>>>> steps.
>>>> 
>>>> I also set norecover and norebalance flags immediately prior to running 
>>>> t

Re: [ceph-users] Migrating filestore to bluestore using ceph-volume

2018-01-26 Thread David Majchrzak
Ran:

ceph auth del osd.0
ceph auth del osd.6
ceph auth del osd.7
ceph osd rm osd.0
ceph osd rm osd.6
ceph osd rm osd.7

which seems to have removed them.

Thanks for the help Reed!

Kind Regards,
David Majchrzak


> 26 jan. 2018 kl. 18:32 skrev David Majchrzak :
> 
> Thanks that helped!
> 
> Since I had already "halfway" created a lvm volume I wanted to start from the 
> beginning and zap it.
> 
> Tried to zap the raw device but failed since --destroy doesn't seem to be in 
> 12.2.2
> 
> http://docs.ceph.com/docs/master/ceph-volume/lvm/zap/ 
> <http://docs.ceph.com/docs/master/ceph-volume/lvm/zap/>
> 
> root@int1:~# ceph-volume lvm zap /dev/sdc --destroy
> usage: ceph-volume lvm zap [-h] [DEVICE]
> ceph-volume lvm zap: error: unrecognized arguments: --destroy
> 
> So i zapped it with the vg/lvm instead.
> ceph-volume lvm zap 
> /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
> 
> However I run create on it since the LVM was already there.
> So I zapped it with sgdisk and ran dmsetup remove. After that I was able to 
> create it again.
> 
> However - each "ceph-volume lvm create" that I ran that failed, successfully 
> added an osd to crush map ;)
> 
> So I've got this now:
> 
> root@int1:~# ceph osd df tree
> ID CLASS WEIGHT  REWEIGHT SIZE  USEAVAIL  %USE  VAR  PGS TYPE NAME
> -1   2.60959- 2672G  1101G  1570G 41.24 1.00   - root default
> -2   0.87320-  894G   369G   524G 41.36 1.00   - host int1
>  3   ssd 0.43660  1.0  447G   358G 90295M 80.27 1.95 301 osd.3
>  8   ssd 0.43660  1.0  447G 11273M   436G  2.46 0.06  19 osd.8
> -3   0.86819-  888G   366G   522G 41.26 1.00   - host int2
>  1   ssd 0.43159  1.0  441G   167G   274G 37.95 0.92 147 osd.1
>  4   ssd 0.43660  1.0  447G   199G   247G 44.54 1.08 173 osd.4
> -4   0.86819-  888G   365G   523G 41.09 1.00   - host int3
>  2   ssd 0.43159  1.0  441G   193G   248G 43.71 1.06 174 osd.2
>  5   ssd 0.43660  1.0  447G   172G   274G 38.51 0.93 146 osd.5
>  0 00 0  0  0 00   0 osd.0
>  6 00 0  0  0 00   0 osd.6
>  7 00 0  0  0 00   0 osd.7
> 
> I guess I can just remove them from crush,auth and rm them?
> 
> Kind Regards,
> 
> David Majchrzak
> 
>> 26 jan. 2018 kl. 18:09 skrev Reed Dier > <mailto:reed.d...@focusvq.com>>:
>> 
>> This is the exact issue that I ran into when starting my bluestore 
>> conversion journey.
>> 
>> See my thread here: https://www.spinics.net/lists/ceph-users/msg41802.html 
>> <https://www.spinics.net/lists/ceph-users/msg41802.html>
>> 
>> Specifying --osd-id causes it to fail.
>> 
>> Below are my steps for OSD replace/migrate from filestore to bluestore.
>> 
>> BIG caveat here in that I am doing destructive replacement, in that I am not 
>> allowing my objects to be migrated off of the OSD I’m replacing before 
>> nuking it.
>> With 8TB drives it just takes way too long, and I trust my failure domains 
>> and other hardware to get me through the backfills.
>> So instead of 1) reading data off, writing data elsewhere 2) remove/re-add 
>> 3) reading data elsewhere, writing back on, I am taking step one out, and 
>> trusting my two other copies of the objects. Just wanted to clarify my steps.
>> 
>> I also set norecover and norebalance flags immediately prior to running 
>> these commands so that it doesn’t try to start moving data unnecessarily. 
>> Then when done, remove those flags, and let it backfill.
>> 
>>> systemctl stop ceph-osd@$ID.service <mailto:ceph-osd@$id.service>
>>> ceph-osd -i $ID --flush-journal
>>> umount /var/lib/ceph/osd/ceph-$ID
>>> ceph-volume lvm zap /dev/$ID
>>> ceph osd crush remove osd.$ID
>>> ceph auth del osd.$ID
>>> ceph osd rm osd.$ID
>>> ceph-volume lvm create --bluestore --data /dev/$DATA --block.db /dev/$NVME
>> 
>> So essentially I fully remove the OSD from crush and the osdmap, and when I 
>> add the OSD back, like I would a new OSD, it fills in the numeric gap with 
>> the $ID it had before.
>> 
>> Hope this is helpful.
>> Been working well for me so far, doing 3 OSDs at a time (half of a failure 
>> domain).
>> 
>> Reed
>> 
>>> On Jan 26, 2018, at 10:01 AM, David >> <mailto:da...@visions.se>> wrote:
>>> 
>>> 
>>> Hi!
>>> 
>>> On luminous 

Re: [ceph-users] Migrating filestore to bluestore using ceph-volume

2018-01-26 Thread David Majchrzak
Thanks that helped!

Since I had already "halfway" created an LVM volume I wanted to start from the 
beginning and zap it.

Tried to zap the raw device but failed since --destroy doesn't seem to be in 
12.2.2

http://docs.ceph.com/docs/master/ceph-volume/lvm/zap/ 
<http://docs.ceph.com/docs/master/ceph-volume/lvm/zap/>

root@int1:~# ceph-volume lvm zap /dev/sdc --destroy
usage: ceph-volume lvm zap [-h] [DEVICE]
ceph-volume lvm zap: error: unrecognized arguments: --destroy

So I zapped it with the vg/lvm instead.
ceph-volume lvm zap 
/dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9

However I couldn't run create on it since the LVM was already there.
So I zapped it with sgdisk and ran dmsetup remove. After that I was able to 
create it again.

However, each "ceph-volume lvm create" run that failed still successfully 
added an osd to the crush map ;)

So I've got this now:

root@int1:~# ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE  USEAVAIL  %USE  VAR  PGS TYPE NAME
-1   2.60959- 2672G  1101G  1570G 41.24 1.00   - root default
-2   0.87320-  894G   369G   524G 41.36 1.00   - host int1
 3   ssd 0.43660  1.0  447G   358G 90295M 80.27 1.95 301 osd.3
 8   ssd 0.43660  1.0  447G 11273M   436G  2.46 0.06  19 osd.8
-3   0.86819-  888G   366G   522G 41.26 1.00   - host int2
 1   ssd 0.43159  1.0  441G   167G   274G 37.95 0.92 147 osd.1
 4   ssd 0.43660  1.0  447G   199G   247G 44.54 1.08 173 osd.4
-4   0.86819-  888G   365G   523G 41.09 1.00   - host int3
 2   ssd 0.43159  1.0  441G   193G   248G 43.71 1.06 174 osd.2
 5   ssd 0.43660  1.0  447G   172G   274G 38.51 0.93 146 osd.5
 0 00 0  0  0 00   0 osd.0
 6 00 0  0  0 00   0 osd.6
 7 00 0  0  0 00   0 osd.7

I guess I can just remove them from crush, auth and rm them?

Kind Regards,

David Majchrzak

> 26 jan. 2018 kl. 18:09 skrev Reed Dier :
> 
> This is the exact issue that I ran into when starting my bluestore conversion 
> journey.
> 
> See my thread here: https://www.spinics.net/lists/ceph-users/msg41802.html 
> <https://www.spinics.net/lists/ceph-users/msg41802.html>
> 
> Specifying --osd-id causes it to fail.
> 
> Below are my steps for OSD replace/migrate from filestore to bluestore.
> 
> BIG caveat here in that I am doing destructive replacement, in that I am not 
> allowing my objects to be migrated off of the OSD I’m replacing before nuking 
> it.
> With 8TB drives it just takes way too long, and I trust my failure domains 
> and other hardware to get me through the backfills.
> So instead of 1) reading data off, writing data elsewhere 2) remove/re-add 3) 
> reading data elsewhere, writing back on, I am taking step one out, and 
> trusting my two other copies of the objects. Just wanted to clarify my steps.
> 
> I also set norecover and norebalance flags immediately prior to running these 
> commands so that it doesn’t try to start moving data unnecessarily. Then when 
> done, remove those flags, and let it backfill.
> 
>> systemctl stop ceph-osd@$ID.service
>> ceph-osd -i $ID --flush-journal
>> umount /var/lib/ceph/osd/ceph-$ID
>> ceph-volume lvm zap /dev/$ID
>> ceph osd crush remove osd.$ID
>> ceph auth del osd.$ID
>> ceph osd rm osd.$ID
>> ceph-volume lvm create --bluestore --data /dev/$DATA --block.db /dev/$NVME
> 
> So essentially I fully remove the OSD from crush and the osdmap, and when I 
> add the OSD back, like I would a new OSD, it fills in the numeric gap with 
> the $ID it had before.
> 
> Hope this is helpful.
> Been working well for me so far, doing 3 OSDs at a time (half of a failure 
> domain).
> 
> Reed
> 
>> On Jan 26, 2018, at 10:01 AM, David > <mailto:da...@visions.se>> wrote:
>> 
>> 
>> Hi!
>> 
>> On luminous 12.2.2
>> 
>> I'm migrating some OSDs from filestore to bluestore using the "simple" 
>> method as described in docs: 
>> http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/#convert-existing-osds
>>  
>> <http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/#convert-existing-osds>
>> Mark out and Replace.
>> 
>> However, at 9.: ceph-volume create --bluestore --data $DEVICE --osd-id $ID
>> it seems to create the bluestore but it fails to authenticate with the old 
>> osd-id auth.
>> (the command above is also missing lvm or simple)
>> 
>> I think it's related to this:
>> http://tracker.ceph.com/issues/22642 <http://tracker.ceph.com/issues/22642>