Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2018-01-02 Thread Luis Periquito
On Tue, Dec 5, 2017 at 1:20 PM, Wido den Hollander  wrote:
> Hi,
>
> I haven't tried this before but I expect it to work, but I wanted to check 
> before proceeding.
>
> I have a Ceph cluster which is running with manually formatted FileStore XFS 
> disks, Jewel, sysvinit and Ubuntu 14.04.
>
> I would like to upgrade this system to Luminous, but since I have to 
> re-install all servers and re-format all disks I'd like to move it to 
> BlueStore at the same time.
>
> This system however has 768 3TB disks and has a utilization of about 60%. You 
> can guess, it will take a long time before all the backfills complete.
>
> The idea is to take a machine down, wipe all disks, re-install it with Ubuntu 
> 16.04 and Luminous and re-format the disks with BlueStore.
>
> The OSDs get back, start to backfill and we wait.
Are you OUT'ing the OSDs or removing them altogether (ceph osd crush
remove + ceph osd rm)?

I've noticed that when you remove them completely, the data movement is
much bigger.
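
For reference, a rough sketch of the two approaches (osd.12 is an example
ID; double-check the commands against your release):

  # Option A: mark the OSD out; it stays in the CRUSH map, so less remapping
  ceph osd out 12

  # Option B: remove it entirely; CRUSH weights change and more PGs move
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm 12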

>
> My estimation is that we can do one machine per day, but we have 48 machines 
> to do. Realistically this will take ~60 days to complete.

That seems a bit optimistic to me, but it depends on how aggressive
you are and how busy those spindles are.
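
If you need to throttle (or speed up) the backfills, something like this
should work; the values are just examples, not recommendations:

  ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'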

>
> AFAIK running Jewel (10.2.10) mixed with Luminous (12.2.2) should work just
> fine, but I wanted to check whether there are any caveats I don't know about.
>
> I'll upgrade the MONs to Luminous first, before starting to upgrade the OSDs.
> Between each machine I'll wait for HEALTH_OK before proceeding, allowing the
> MONs to trim their datastore.

You have to: as far as I've seen, after upgrading one of the MONs to
Luminous, new OSDs running Luminous refuse to start until you have
*ALL* MONs running Luminous.
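
A quick way to check where you stand (the second command is only available
once the MONs are on Luminous):

  ceph tell mon.a version   # ask each MON individually; works on Jewel
  ceph versions             # Luminous+: summary of all daemon versions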

>
> The question is: Does it hurt to run Jewel and Luminous mixed for ~60 days?
>
> I think it won't, but I wanted to double-check.

I thought the same. I was running 10.2.3 and doing much the same to
upgrade to 10.2.7, so staying on Jewel. The process was pretty much the
same, but I had to pause for a month halfway through (because of
unrelated issues), and every so often the cluster would just stop: at
least one of the OSDs would stop responding and pile up slow requests,
even though it was idle. It hit random OSDs, on both HDD and SSD (this
is a cache-tiered S3 storage cluster), and on either version. I tried
injectargs to get more information, but got no useful output - the
daemon just sat there as if it were idle. Restarting the OSD would
bring it back to life...
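
In case someone hits the same thing, the admin socket is usually the first
place to look when an OSD piles up slow requests (osd.12 is an example ID):

  ceph daemon osd.12 dump_ops_in_flight
  ceph daemon osd.12 dump_historic_ops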

So I'm not sure whether you'll hit similar issues, but I'm now avoiding
mixed versions as much as I can.

>
> Wido


Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2018-01-01 Thread Christian Balzer

Hello,

On Tue, 2 Jan 2018 01:23:45 +0100 Ronny Aasen wrote:

> On 30.12.2017 15:41, Milanov, Radoslav Nikiforov wrote:
> > Performance as well - in my testing FileStore was much quicker than 
> > BlueStore.  
> 
> 
> With filestore you often have an SSD journal in front; this will often
> mask/hide slow spinning-disk write performance, until the journal size
> becomes the bottleneck.
> 
The journal size basically never becomes the bottleneck: with default
settings Ceph starts flushing very quickly, and then you hit the
effective HDD speed anyway.
The journal does deal nicely with short/small bursts, though.
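
With stock settings the sync interval is just a few seconds; the relevant
defaults can be inspected on a running OSD (osd.12 is an example ID):

  ceph daemon osd.12 config show | egrep 'filestore_(min|max)_sync_interval'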

> With bluestore only the metadata DB and WAL are on SSD, so there is no
> double write and there is no journal bottleneck. But write latency will
> be the speed of the disk, not the speed of the SSD journal; this
> will feel like a write performance regression.
> 
Small writes with bluestore also go to the DB (SSD), as the Ceph
developers found that latencies were rather bad otherwise.
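
If I remember correctly, the cutoff is controlled by
bluestore_prefer_deferred_size (with _hdd/_ssd variants); verify the exact
name on your release with something like:

  ceph daemon osd.12 config show | grep bluestore_prefer_deferred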

From where I'm standing, Bluestore is rather wet behind the ears, with
probably some bugs lurking (a file system, even one as simple as this,
isn't trivial) and, more importantly, room for performance improvements.

> You can use bcache in front of bluestore to regain the "journal +
> double-write" characteristic of filestore+journal.
> 
I'm using bcache and have tested LVM cache in a non-Ceph (DRBD) setup.
LVM cache is hilariously complex, poorly (and outdatedly) documented, and
performs far worse than bcache under normal/typical workloads.
OTOH, while bcache will give you nice improvements (also on reads), it isn't
bug-free (and I'm not even thinking about the 4.14 data corruption issue),
and when pushed hard (when it wants/needs to flush to HDD) it will overload
things and doesn't honor I/O priorities, as others have mentioned here.

I'm using bcache for now because in my use case the issues above won't
show up, but I'd be wary of using it with Ceph in a cluster where I don't
control/know the I/O patterns.
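
For completeness, a minimal bcache setup sketch (device names are examples;
both devices get wiped, so be careful):

  make-bcache -C /dev/nvme0n1p1 -B /dev/sdb     # cache + backing pair
  echo writeback > /sys/block/bcache0/bcache/cache_mode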

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications


Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2018-01-01 Thread Ronny Aasen

On 30.12.2017 15:41, Milanov, Radoslav Nikiforov wrote:

Performance as well - in my testing FileStore was much quicker than BlueStore.



With filestore you often have an SSD journal in front; this will often
mask/hide slow spinning-disk write performance, until the journal size
becomes the bottleneck.


With bluestore only the metadata DB and WAL are on SSD, so there is no
double write and there is no journal bottleneck. But write latency will
be the speed of the disk, not the speed of the SSD journal; this
will feel like a write performance regression.


You can use bcache in front of bluestore to regain the "journal +
double-write" characteristic of filestore+journal.


kind regards



Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-31 Thread David Herselman
Hi Travis,

In my experience, after converting OSDs from HDD FileStore with SSD journals to
HDD BlueStore with RocksDB and its WAL on SSD, FileStore is significantly
faster.

https://forum.proxmox.com/threads/ceph-bluestore-not-always-faster-than-filestore.38405/

Pure SSD OSDs would, however, be much faster using BlueStore...


Regards
David Herselman

On 29 Dec 2017 22:06, Travis Nielsen  wrote:
Since bluestore was declared stable in Luminous, is there any remaining
scenario to use filestore in new deployments? Or is it safe to assume that
bluestore is always better to use in Luminous? All documentation I can
find points to bluestore being superior in all cases.

Thanks,
Travis



Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-30 Thread Martin, Jeremy
Also, the ceph-deploy tool doesn't seem to adequately support BlueStore with its
recommended configuration on LVM; the OSD create and prepare tools don't work
with any reliability with BlueStore and LVM. I posted some information and
questions on this when deploying a new cluster on both test and production
hardware and VMs, but was unable to get any information or to proceed, even
though FileStore deployed without any issue. In the end the decision came down
to deploying on BlueStore or FileStore, but since we couldn't deploy on
BlueStore the choice then became FileStore or a competing product, and we
didn't feel that deploying new clusters with a store likely to be replaced in
the future (i.e. FileStore) was a good choice. So test and consider BlueStore
carefully.

Jeremy




> On Dec 29, 2017, at 3:05 PM, Travis Nielsen  
> wrote:
> 
> Since bluestore was declared stable in Luminous, is there any remaining
> scenario to use filestore in new deployments? Or is it safe to assume that
> bluestore is always better to use in Luminous? All documentation I can
> find points to bluestore being superior in all cases.
> 
> Thanks,
> Travis
> 


Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-30 Thread Konstantin Shalygin

Performance as well - in my testing FileStore was much quicker than BlueStore.


Proof?



k



Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-30 Thread Milanov, Radoslav Nikiforov
Performance as well - in my testing FileStore was much quicker than BlueStore.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sage 
Weil
Sent: Friday, December 29, 2017 3:51 PM
To: Travis Nielsen <travis.niel...@quantum.com>
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

On Fri, 29 Dec 2017, Travis Nielsen wrote:
> Since bluestore was declared stable in Luminous, is there any 
> remaining scenario to use filestore in new deployments? Or is it safe 
> to assume that bluestore is always better to use in Luminous? All 
> documentation I can find points to bluestore being superior in all cases.

The only real reason to run FileStore is stability: FileStore is
older and well-tested, so the most conservative users may stick to FileStore
for a bit longer.

sage



Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-29 Thread Sage Weil
On Fri, 29 Dec 2017, Travis Nielsen wrote:
> Since bluestore was declared stable in Luminous, is there any remaining
> scenario to use filestore in new deployments? Or is it safe to assume that
> bluestore is always better to use in Luminous? All documentation I can
> find points to bluestore being superior in all cases.

The only real reason to run FileStore is stability: FileStore
is older and well-tested, so the most conservative users may stick to
FileStore for a bit longer.

sage



Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-29 Thread Travis Nielsen
Since bluestore was declared stable in Luminous, is there any remaining
scenario to use filestore in new deployments? Or is it safe to assume that
bluestore is always better to use in Luminous? All documentation I can
find points to bluestore being superior in all cases.

Thanks,
Travis



Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-07 Thread Graham Allan

On 12/06/2017 03:20 AM, Wido den Hollander wrote:



On 5 December 2017 at 18:39, Richard Hesketh wrote:

On 05/12/17 17:10, Graham Allan wrote:

On 12/05/2017 07:20 AM, Wido den Hollander wrote:


I haven't tried this before but I expect it to work, but I wanted to
check before proceeding.

I have a Ceph cluster which is running with manually formatted
FileStore XFS disks, Jewel, sysvinit and Ubuntu 14.04.

I would like to upgrade this system to Luminous, but since I have to
re-install all servers and re-format all disks I'd like to move it to
BlueStore at the same time.


You don't *have* to update the OS in order to update to Luminous, do you? 
Luminous is still supported on Ubuntu 14.04 AFAIK.

Though obviously I understand your desire to upgrade; I only ask because I am 
in the same position (Ubuntu 14.04, xfs, sysvinit), though happily with a 
smaller cluster. Personally I was planning to upgrade ours entirely to Luminous 
while still on Ubuntu 14.04, before later going through the same process of 
decommissioning one machine at a time to reinstall with CentOS 7 and Bluestore. 
I too don't see any reason the mixed Jewel/Luminous cluster wouldn't work, but 
still felt less comfortable with extending the upgrade duration.



Well, the sysvinit part bothers me. This setup uses the 'devs' part in 
ceph.conf and such. It's all a kind of hacky system.

Most of these systems have run Dumpling on Ubuntu 12.04 and have been upgraded 
ever since. They are messy.

We'd like to reprovision all disks with ceph-volume while we are at it; doing
the OS and Ceph at the same time makes it a single step.

I've never tried to run Luminous under 14.04. Looking at the DEB packages,
there doesn't seem to be sysvinit support in Luminous anymore either.


I upgraded our trusty cluster from jewel to luminous yesterday - it went
pretty smoothly. This cluster has been around for a while too (since firefly,
anyway) and a few issues have popped up, which I will be asking
questions about separately, but I think they're due to historic cruft rather
than anything inherent to luminous. Perhaps luminous checks things more
thoroughly than jewel does? The sysvinit startup certainly works fine.


I was also anxious to upgrade to CentOS 7 to match our other couple of 
clusters, as I finally felt I understood the systemd/ceph-disk udev 
startup process; but it sounds like that is going away as well!


Graham
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu


Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Yehuda Sadeh-Weinraub
It's hard to say; we don't really test your specific scenario, so use
it at your own risk. There was a change in cls_refcount that we had
issues with in the upgrade suite, but looking at it I'm not sure it'll
actually be a problem for you (you'll still hit the original problem
though).
Another problematic area is the OSD limit on large omap operations, for
which we added a 'truncated' flag to the relevant objclass operations.
Running an older RGW against newer OSDs might cause issues when listing
omaps. You should make sure that bucket listing works correctly, but
there may be other issues (garbage collector, listing of a user's
buckets, multipart upload completion). It could be that you can
configure that OSD limit to a higher number so that you won't hit
the issue (RGW probably never requests more than 1000 entries from
omap, so setting it to 1k should be fine).
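
A rough sanity check along those lines (bucket name is an example, and exact
option names vary by release, hence the broad grep):

  radosgw-admin bucket list --bucket=mybucket --max-entries=2000
  ceph daemon osd.12 config show | grep omap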

Yehuda

On Wed, Dec 6, 2017 at 2:09 PM, Wido den Hollander  wrote:
>
>> On 6 December 2017 at 10:25, Yehuda Sadeh-Weinraub wrote:
>>
>>
>> Are you using rgw? There are certain compatibility issues that you
>> might hit if you run mixed versions.
>>
>
> Yes, it is. So would it hurt if OSDs are running Luminous but the RGW is 
> still Jewel?
>
> Multisite isn't used, it's just a local RGW.
>
> Wido
>
>> Yehuda
>>
>> On Tue, Dec 5, 2017 at 3:20 PM, Wido den Hollander  wrote:
>> > Hi,
>> >
>> > I haven't tried this before but I expect it to work, but I wanted to check 
>> > before proceeding.
>> >
>> > I have a Ceph cluster which is running with manually formatted FileStore 
>> > XFS disks, Jewel, sysvinit and Ubuntu 14.04.
>> >
>> > I would like to upgrade this system to Luminous, but since I have to 
>> > re-install all servers and re-format all disks I'd like to move it to 
>> > BlueStore at the same time.
>> >
>> > This system however has 768 3TB disks and has a utilization of about 60%. 
>> > You can guess, it will take a long time before all the backfills complete.
>> >
>> > The idea is to take a machine down, wipe all disks, re-install it with 
>> > Ubuntu 16.04 and Luminous and re-format the disks with BlueStore.
>> >
>> > The OSDs get back, start to backfill and we wait.
>> >
>> > My estimation is that we can do one machine per day, but we have 48 
>> > machines to do. Realistically this will take ~60 days to complete.
>> >
>> > AFAIK running Jewel (10.2.10) mixed with Luminous (12.2.2) should work
>> > just fine, but I wanted to check whether there are any caveats I don't know about.
>> >
>> > I'll upgrade the MONs to Luminous first, before starting to upgrade the
>> > OSDs. Between each machine I'll wait for HEALTH_OK before proceeding,
>> > allowing the MONs to trim their datastore.
>> >
>> > The question is: Does it hurt to run Jewel and Luminous mixed for ~60 days?
>> >
>> > I think it won't, but I wanted to double-check.
>> >
>> > Wido


Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Wido den Hollander

> On 6 December 2017 at 10:25, Yehuda Sadeh-Weinraub wrote:
> 
> 
> Are you using rgw? There are certain compatibility issues that you
> might hit if you run mixed versions.
> 

Yes, it is. So would it hurt if OSDs are running Luminous but the RGW is still 
Jewel?

Multisite isn't used, it's just a local RGW.

Wido

> Yehuda
> 
> On Tue, Dec 5, 2017 at 3:20 PM, Wido den Hollander  wrote:
> > Hi,
> >
> > I haven't tried this before but I expect it to work, but I wanted to check 
> > before proceeding.
> >
> > I have a Ceph cluster which is running with manually formatted FileStore 
> > XFS disks, Jewel, sysvinit and Ubuntu 14.04.
> >
> > I would like to upgrade this system to Luminous, but since I have to 
> > re-install all servers and re-format all disks I'd like to move it to 
> > BlueStore at the same time.
> >
> > This system however has 768 3TB disks and has a utilization of about 60%. 
> > You can guess, it will take a long time before all the backfills complete.
> >
> > The idea is to take a machine down, wipe all disks, re-install it with 
> > Ubuntu 16.04 and Luminous and re-format the disks with BlueStore.
> >
> > The OSDs get back, start to backfill and we wait.
> >
> > My estimation is that we can do one machine per day, but we have 48 
> > machines to do. Realistically this will take ~60 days to complete.
> >
> > AFAIK running Jewel (10.2.10) mixed with Luminous (12.2.2) should work just
> > fine, but I wanted to check whether there are any caveats I don't know about.
> >
> > I'll upgrade the MONs to Luminous first, before starting to upgrade the
> > OSDs. Between each machine I'll wait for HEALTH_OK before proceeding,
> > allowing the MONs to trim their datastore.
> >
> > The question is: Does it hurt to run Jewel and Luminous mixed for ~60 days?
> >
> > I think it won't, but I wanted to double-check.
> >
> > Wido


Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Richard Hesketh
On 06/12/17 09:17, Caspar Smit wrote:
> 
> 2017-12-05 18:39 GMT+01:00 Richard Hesketh:
> 
> On 05/12/17 17:10, Graham Allan wrote:
> > On 12/05/2017 07:20 AM, Wido den Hollander wrote:
> >> Hi,
> >>
> >> I haven't tried this before but I expect it to work, but I wanted to
> >> check before proceeding.
> >>
> >> I have a Ceph cluster which is running with manually formatted
> >> FileStore XFS disks, Jewel, sysvinit and Ubuntu 14.04.
> >>
> >> I would like to upgrade this system to Luminous, but since I have to
> >> re-install all servers and re-format all disks I'd like to move it to
> >> BlueStore at the same time.
> >
> > You don't *have* to update the OS in order to update to Luminous, do 
> you? Luminous is still supported on Ubuntu 14.04 AFAIK.
> >
> > Though obviously I understand your desire to upgrade; I only ask 
> because I am in the same position (Ubuntu 14.04, xfs, sysvinit), though 
> happily with a smaller cluster. Personally I was planning to upgrade ours 
> entirely to Luminous while still on Ubuntu 14.04, before later going through 
> the same process of decommissioning one machine at a time to reinstall with 
> CentOS 7 and Bluestore. I too don't see any reason the mixed Jewel/Luminous 
> cluster wouldn't work, but still felt less comfortable with extending the 
> upgrade duration.
> >
> > Graham
> 
> Yes, you can run luminous on Trusty; one of my clusters is currently 
> Luminous/Bluestore/Trusty as I've not had time to sort out doing OS upgrades 
> on it. I second the suggestion that it would be better to do the luminous 
> upgrade first, retaining existing filestore OSDs, and then do the OS 
> upgrade/OSD recreation on each node in sequence. I don't think there should 
> realistically be any problems with running a mixed cluster for a while but 
> doing the jewel->luminous upgrade on the existing installs first shouldn't be 
> significant extra effort/time as you're already predicting at least two 
> months to upgrade everything, and it does minimise the amount of change at 
> any one time in case things do start going horribly wrong.
> 
> Also, at 48 nodes, I would've thought you could get away with cycling 
> more than one of them at once. Assuming they're homogenous taking out even 4 
> at a time should only raise utilisation on the rest of the cluster to a 
> little over 65%, which still seems safe to me, and you'd waste way less time 
> waiting for recovery. (I recognise that depending on the nature of your 
> employment situation this may not actually be desirable...)
> 
>  
> Assuming size=3 and min_size=2 and failure-domain=host:
> 
> I always thought that bringing down more than 1 host causes data
> inaccessibility right away, because the chance is there that a PG will have
> OSDs in those 2 hosts. Only if the failure domain is higher than host (rack
> or something) can you safely bring more than 1 host down (in the same failure
> domain, of course).
> 
> Am I right?
> 
> Kind regards,
> Caspar

Oh, yeah, if you just bring them down immediately without rebalancing first, 
you'll have problems. But the intention is that rather than just killing the 
nodes, you first weight them to 0 and then wait for the cluster to rebalance 
the data off them so they are empty and harmless when you do shut them down. 
You minimise time spent waiting and overall data movement if you do this sort 
of replacement in larger batches. Others have correctly pointed out though that 
the larger the change you make at any one time, the more likely something might 
go wrong overall... I suspect a good rule of thumb is that you should try to
add/replace/remove nodes/OSDs in batches of as many as you can get away with at
once without stretching outside the failure domain.
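
A minimal sketch of that drain procedure, assuming osd.12 through osd.15
live on the node being replaced:

  for id in 12 13 14 15; do ceph osd crush reweight osd.$id 0; done
  ceph -s   # wait for recovery to finish / HEALTH_OK before shutting down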

Rich





Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Yehuda Sadeh-Weinraub
Are you using rgw? There are certain compatibility issues that you
might hit if you run mixed versions.

Yehuda

On Tue, Dec 5, 2017 at 3:20 PM, Wido den Hollander  wrote:
> Hi,
>
> I haven't tried this before but I expect it to work, but I wanted to check 
> before proceeding.
>
> I have a Ceph cluster which is running with manually formatted FileStore XFS 
> disks, Jewel, sysvinit and Ubuntu 14.04.
>
> I would like to upgrade this system to Luminous, but since I have to 
> re-install all servers and re-format all disks I'd like to move it to 
> BlueStore at the same time.
>
> This system however has 768 3TB disks and has a utilization of about 60%. You 
> can guess, it will take a long time before all the backfills complete.
>
> The idea is to take a machine down, wipe all disks, re-install it with Ubuntu 
> 16.04 and Luminous and re-format the disks with BlueStore.
>
> The OSDs get back, start to backfill and we wait.
>
> My estimation is that we can do one machine per day, but we have 48 machines 
> to do. Realistically this will take ~60 days to complete.
>
> AFAIK running Jewel (10.2.10) mixed with Luminous (12.2.2) should work just
> fine, but I wanted to check whether there are any caveats I don't know about.
>
> I'll upgrade the MONs to Luminous first, before starting to upgrade the OSDs.
> Between each machine I'll wait for HEALTH_OK before proceeding, allowing the
> MONs to trim their datastore.
>
> The question is: Does it hurt to run Jewel and Luminous mixed for ~60 days?
>
> I think it won't, but I wanted to double-check.
>
> Wido


Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Wido den Hollander

> On 6 December 2017 at 10:17, Caspar Smit wrote:
> 
> 
> 2017-12-05 18:39 GMT+01:00 Richard Hesketh :
> 
> > On 05/12/17 17:10, Graham Allan wrote:
> > > On 12/05/2017 07:20 AM, Wido den Hollander wrote:
> > >> Hi,
> > >>
> > >> I haven't tried this before but I expect it to work, but I wanted to
> > >> check before proceeding.
> > >>
> > >> I have a Ceph cluster which is running with manually formatted
> > >> FileStore XFS disks, Jewel, sysvinit and Ubuntu 14.04.
> > >>
> > >> I would like to upgrade this system to Luminous, but since I have to
> > >> re-install all servers and re-format all disks I'd like to move it to
> > >> BlueStore at the same time.
> > >
> > > You don't *have* to update the OS in order to update to Luminous, do
> > you? Luminous is still supported on Ubuntu 14.04 AFAIK.
> > >
> > > Though obviously I understand your desire to upgrade; I only ask because
> > I am in the same position (Ubuntu 14.04, xfs, sysvinit), though happily
> > with a smaller cluster. Personally I was planning to upgrade ours entirely
> > to Luminous while still on Ubuntu 14.04, before later going through the
> > same process of decommissioning one machine at a time to reinstall with
> > CentOS 7 and Bluestore. I too don't see any reason the mixed Jewel/Luminous
> > cluster wouldn't work, but still felt less comfortable with extending the
> > upgrade duration.
> > >
> > > Graham
> >
> > Yes, you can run luminous on Trusty; one of my clusters is currently
> > Luminous/Bluestore/Trusty as I've not had time to sort out doing OS
> > upgrades on it. I second the suggestion that it would be better to do the
> > luminous upgrade first, retaining existing filestore OSDs, and then do the
> > OS upgrade/OSD recreation on each node in sequence. I don't think there
> > should realistically be any problems with running a mixed cluster for a
> > while but doing the jewel->luminous upgrade on the existing installs first
> > shouldn't be significant extra effort/time as you're already predicting at
> > least two months to upgrade everything, and it does minimise the amount of
> > change at any one time in case things do start going horribly wrong.
> >
> > Also, at 48 nodes, I would've thought you could get away with cycling more
> > than one of them at once. Assuming they're homogenous taking out even 4 at
> > a time should only raise utilisation on the rest of the cluster to a little
> > over 65%, which still seems safe to me, and you'd waste way less time
> > waiting for recovery. (I recognise that depending on the nature of your
> > employment situation this may not actually be desirable...)
> >
> >
> Assuming size=3 and min_size=2 and failure-domain=host:
> 
> I always thought that bringing down more than 1 host causes data
> inaccessibility right away, because the chance is there that a PG will have
> OSDs in those 2 hosts. Only if the failure domain is higher than host
> (rack or something) can you safely bring more than 1 host down (in the same
> failure domain, of course).
> 
> Am I right?

Yes, you are right. This cluster has its failure domain set to 'rack' and
thus allows multiple machines in one rack to go down without impacting
availability.
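
For reference, on Luminous a rule with a rack failure domain can be created
like this (rule and pool names are examples; verify against your CRUSH tree
first):

  ceph osd crush rule create-replicated replicated_racks default rack
  ceph osd pool set mypool crush_rule replicated_racks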

> 
> Kind regards,
> Caspar
> 
> 
> > Rich
> >
> >


Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Wido den Hollander

> On 5 December 2017 at 18:39, Richard Hesketh wrote:
> 
> 
> On 05/12/17 17:10, Graham Allan wrote:
> > On 12/05/2017 07:20 AM, Wido den Hollander wrote:
> >> Hi,
> >>
> >> I haven't tried this before but I expect it to work, but I wanted to
> >> check before proceeding.
> >>
> >> I have a Ceph cluster which is running with manually formatted
> >> FileStore XFS disks, Jewel, sysvinit and Ubuntu 14.04.
> >>
> >> I would like to upgrade this system to Luminous, but since I have to
> >> re-install all servers and re-format all disks I'd like to move it to
> >> BlueStore at the same time.
> > 
> > You don't *have* to update the OS in order to update to Luminous, do you? 
> > Luminous is still supported on Ubuntu 14.04 AFAIK.
> > 
> > Though obviously I understand your desire to upgrade; I only ask because I 
> > am in the same position (Ubuntu 14.04, xfs, sysvinit), though happily with 
> > a smaller cluster. Personally I was planning to upgrade ours entirely to 
> > Luminous while still on Ubuntu 14.04, before later going through the same 
> > process of decommissioning one machine at a time to reinstall with CentOS 7 
> > and Bluestore. I too don't see any reason the mixed Jewel/Luminous cluster 
> > wouldn't work, but still felt less comfortable with extending the upgrade 
> > duration.
> > 

Well, the sysvinit part bothers me. This setup uses the 'devs' part in 
ceph.conf and such. It's all a kind of hacky system.

Most of these systems have run Dumpling on Ubuntu 12.04 and have been upgraded 
ever since. They are messy.

We'd like to reprovision all disks with ceph-volume while we are at it; doing
the OS and Ceph at the same time makes it a single step.

I've never tried to run Luminous under 14.04. Looking at the DEB packages,
there doesn't seem to be sysvinit support in Luminous anymore either.
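
For example (package and file names from memory, adjust as needed):

  dpkg -L ceph-base | grep -iE 'init\.d|systemd'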

> > Graham
> 
> Yes, you can run luminous on Trusty; one of my clusters is currently 
> Luminous/Bluestore/Trusty as I've not had time to sort out doing OS upgrades 
> on it. I second the suggestion that it would be better to do the luminous 
> upgrade first, retaining existing filestore OSDs, and then do the OS 
> upgrade/OSD recreation on each node in sequence. I don't think there should 
> realistically be any problems with running a mixed cluster for a while but 
> doing the jewel->luminous upgrade on the existing installs first shouldn't be 
> significant extra effort/time as you're already predicting at least two 
> months to upgrade everything, and it does minimise the amount of change at 
> any one time in case things do start going horribly wrong.
> 

I agree that fewer things at once is best. But we will at least automate the
whole install/config using Salt, so that part is covered.

The Luminous on Trusty, does that run with sysvinit or with Upstart?

> Also, at 48 nodes, I would've thought you could get away with cycling more 
> than one of them at once. Assuming they're homogenous taking out even 4 at a 
> time should only raise utilisation on the rest of the cluster to a little 
> over 65%, which still seems safe to me, and you'd waste way less time waiting 
> for recovery. (I recognise that depending on the nature of your employment 
> situation this may not actually be desirable...)
> 

We can probably do more than one node at a time; however, I'm setting up a
plan which the admins will execute, and we want to take the safe route. Uptime
is important as well.

If we screw up a node the damage isn't that big.

But the main question remains: can you run a mix of Jewel and Luminous for a
longer period?

If so, what are the caveats?

As clusters keep growing they will need to run a mix of versions. I have
other clusters which are running Jewel and have 400 nodes. Upgrading them all
will take a lot of time as well.

Thanks,

Wido

> Rich
> 


Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Caspar Smit
2017-12-05 18:39 GMT+01:00 Richard Hesketh :

> On 05/12/17 17:10, Graham Allan wrote:
> > On 12/05/2017 07:20 AM, Wido den Hollander wrote:
> >> Hi,
> >>
> >> I haven't tried this before but I expect it to work, but I wanted to
> >> check before proceeding.
> >>
> >> I have a Ceph cluster which is running with manually formatted
> >> FileStore XFS disks, Jewel, sysvinit and Ubuntu 14.04.
> >>
> >> I would like to upgrade this system to Luminous, but since I have to
> >> re-install all servers and re-format all disks I'd like to move it to
> >> BlueStore at the same time.
> >
> > You don't *have* to update the OS in order to update to Luminous, do
> you? Luminous is still supported on Ubuntu 14.04 AFAIK.
> >
> > Though obviously I understand your desire to upgrade; I only ask because
> I am in the same position (Ubuntu 14.04, xfs, sysvinit), though happily
> with a smaller cluster. Personally I was planning to upgrade ours entirely
> to Luminous while still on Ubuntu 14.04, before later going through the
> same process of decommissioning one machine at a time to reinstall with
> CentOS 7 and Bluestore. I too don't see any reason the mixed Jewel/Luminous
> cluster wouldn't work, but still felt less comfortable with extending the
> upgrade duration.
> >
> > Graham
>
> Yes, you can run luminous on Trusty; one of my clusters is currently
> Luminous/Bluestore/Trusty as I've not had time to sort out doing OS
> upgrades on it. I second the suggestion that it would be better to do the
> luminous upgrade first, retaining existing filestore OSDs, and then do the
> OS upgrade/OSD recreation on each node in sequence. I don't think there
> should realistically be any problems with running a mixed cluster for a
> while but doing the jewel->luminous upgrade on the existing installs first
> shouldn't be significant extra effort/time as you're already predicting at
> least two months to upgrade everything, and it does minimise the amount of
> change at any one time in case things do start going horribly wrong.
>
> Also, at 48 nodes, I would've thought you could get away with cycling more
> than one of them at once. Assuming they're homogenous taking out even 4 at
> a time should only raise utilisation on the rest of the cluster to a little
> over 65%, which still seems safe to me, and you'd waste way less time
> waiting for recovery. (I recognise that depending on the nature of your
> employment situation this may not actually be desirable...)
>
>
Assuming size=3 and min_size=2 and failure-domain=host:

I always thought that bringing down more than 1 host causes data
inaccessibility right away, because the chance is there that a PG will have
OSDs in those 2 hosts. Only if the failure domain is higher than host
(rack or something) can you safely bring more than 1 host down (in the same
failure domain, of course).

Am I right?

Kind regards,
Caspar


> Rich
>
>


Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-05 Thread Rafael Lopez
>
> Yes, you can run luminous on Trusty; one of my clusters is currently
> Luminous/Bluestore/Trusty as I've not had time to sort out doing OS
> upgrades on it. I second the suggestion that it would be better to do the
> luminous upgrade first, retaining existing filestore OSDs, and then do the
> OS upgrade/OSD recreation on each node in sequence. I don't think there
> should realistically be any problems with running a mixed cluster for a
> while but doing the jewel->luminous upgrade on the existing installs first
> shouldn't be significant extra effort/time as you're already predicting at
> least two months to upgrade everything, and it does minimise the amount of
> change at any one time in case things do start going horribly wrong.
>
> Also, at 48 nodes, I would've thought you could get away with cycling more
> than one of them at once. Assuming they're homogenous taking out even 4 at
> a time should only raise utilisation on the rest of the cluster to a little
> over 65%, which still seems safe to me, and you'd waste way less time
> waiting for recovery. (I recognise that depending on the nature of your
> employment situation this may not actually be desirable...)
>
> Rich
>
>
I also agree with this approach. We actually did the reverse: updated the OS
on all nodes from precise/trusty to xenial while the cluster was still running
hammer. The only thing we had to fiddle with was init (i.e. no systemd
unit files are provided with hammer), but you can write basic script(s) to
start/stop all OSDs manually. This was OK for us, particularly since we
didn't intend to run in that state for a long period, and we eventually
upgraded to jewel, and soon to luminous. In your case, since trusty is
supported in luminous, I don't think you would have any trouble with this.
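
For reference, such a start script can be as simple as this sketch (assuming
the usual /var/lib/ceph/osd/ceph-* layout; adjust for your setup):

  #!/bin/sh
  # start one ceph-osd per data directory present on this host
  for d in /var/lib/ceph/osd/ceph-*; do
      id="${d##*-}"
      ceph-osd -i "$id" --cluster ceph
  done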


-- 
*Rafael Lopez*
Research Devops Engineer
Monash University eResearch Centre
E: rafael.lo...@monash.edu


Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-05 Thread Richard Hesketh
On 05/12/17 17:10, Graham Allan wrote:
> On 12/05/2017 07:20 AM, Wido den Hollander wrote:
>> Hi,
>>
>> I haven't tried this before but I expect it to work, but I wanted to
>> check before proceeding.
>>
>> I have a Ceph cluster which is running with manually formatted
>> FileStore XFS disks, Jewel, sysvinit and Ubuntu 14.04.
>>
>> I would like to upgrade this system to Luminous, but since I have to
>> re-install all servers and re-format all disks I'd like to move it to
>> BlueStore at the same time.
> 
> You don't *have* to update the OS in order to update to Luminous, do you? 
> Luminous is still supported on Ubuntu 14.04 AFAIK.
> 
> Though obviously I understand your desire to upgrade; I only ask because I am 
> in the same position (Ubuntu 14.04, xfs, sysvinit), though happily with a 
> smaller cluster. Personally I was planning to upgrade ours entirely to 
> Luminous while still on Ubuntu 14.04, before later going through the same 
> process of decommissioning one machine at a time to reinstall with CentOS 7 
> and Bluestore. I too don't see any reason the mixed Jewel/Luminous cluster 
> wouldn't work, but still felt less comfortable with extending the upgrade 
> duration.
> 
> Graham

Yes, you can run luminous on Trusty; one of my clusters is currently 
Luminous/Bluestore/Trusty as I've not had time to sort out doing OS upgrades on 
it. I second the suggestion that it would be better to do the luminous upgrade 
first, retaining existing filestore OSDs, and then do the OS upgrade/OSD 
recreation on each node in sequence. I don't think there should realistically 
be any problems with running a mixed cluster for a while but doing the 
jewel->luminous upgrade on the existing installs first shouldn't be significant 
extra effort/time as you're already predicting at least two months to upgrade 
everything, and it does minimise the amount of change at any one time in case 
things do start going horribly wrong.

Also, at 48 nodes, I would've thought you could get away with cycling more than 
one of them at once. Assuming they're homogenous taking out even 4 at a time 
should only raise utilisation on the rest of the cluster to a little over 65%, 
which still seems safe to me, and you'd waste way less time waiting for 
recovery. (I recognise that depending on the nature of your employment 
situation this may not actually be desirable...)

Rich





Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-05 Thread Graham Allan



On 12/05/2017 07:20 AM, Wido den Hollander wrote:

Hi,

I haven't tried this before but I expect it to work, but I wanted to
check before proceeding.

I have a Ceph cluster which is running with manually formatted
FileStore XFS disks, Jewel, sysvinit and Ubuntu 14.04.

I would like to upgrade this system to Luminous, but since I have to
re-install all servers and re-format all disks I'd like to move it to
BlueStore at the same time.


You don't *have* to update the OS in order to update to Luminous, do 
you? Luminous is still supported on Ubuntu 14.04 AFAIK.


Though obviously I understand your desire to upgrade; I only ask because 
I am in the same position (Ubuntu 14.04, xfs, sysvinit), though happily 
with a smaller cluster. Personally I was planning to upgrade ours 
entirely to Luminous while still on Ubuntu 14.04, before later going 
through the same process of decommissioning one machine at a time to 
reinstall with CentOS 7 and Bluestore. I too don't see any reason the 
mixed Jewel/Luminous cluster wouldn't work, but still felt less 
comfortable with extending the upgrade duration.


Graham
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu


[ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-05 Thread Wido den Hollander
Hi,

I haven't tried this before but I expect it to work, but I wanted to check 
before proceeding.

I have a Ceph cluster which is running with manually formatted FileStore XFS 
disks, Jewel, sysvinit and Ubuntu 14.04.

I would like to upgrade this system to Luminous, but since I have to re-install 
all servers and re-format all disks I'd like to move it to BlueStore at the 
same time.

This system however has 768 3TB disks and has a utilization of about 60%. You 
can guess, it will take a long time before all the backfills complete.

The idea is to take a machine down, wipe all disks, re-install it with Ubuntu 
16.04 and Luminous and re-format the disks with BlueStore.

The OSDs get back, start to backfill and we wait.

My estimation is that we can do one machine per day, but we have 48 machines to 
do. Realistically this will take ~60 days to complete.

AFAIK running Jewel (10.2.10) mixed with Luminous (12.2.2) should work just
fine, but I wanted to check whether there are any caveats I don't know about.

I'll upgrade the MONs to Luminous first, before starting to upgrade the OSDs.
Between each machine I'll wait for HEALTH_OK before proceeding, allowing the
MONs to trim their datastore.

The question is: Does it hurt to run Jewel and Luminous mixed for ~60 days?

I think it won't, but I wanted to double-check.

Wido