Re: [ceph-users] ceph mgr balancer bad distribution

2018-02-28 Thread Stefan Priebe - Profihost AG
Does anybody have some more input?

I have kept the balancer active for 24h now and it is rebalancing 1-3%
every 30 minutes, but the distribution is still bad.

It seems to balance from left to right and then back from right to left...

Greets,
Stefan

On 28.02.2018 at 13:47, Stefan Priebe - Profihost AG wrote:
> Hello,
> 
> with jewel we always used the python crush optimizer which gave us a
> pretty good distribution of the used space.
> 
> Since luminous we're using the included ceph mgr balancer but the
> distribution is far from perfect and much worse than the old method.
> 
> Is there any way to tune the mgr balancer?
> 
> Currently after a balance we still have:
> 75% to 92% disk usage, which is a pretty uneven distribution
> 
> Greets,
> Stefan
> 
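For reference, the luminous mgr balancer does expose a few knobs; a rough
sketch of the usual tuning steps (commands as of 12.2.x, the values are only a
starting point, adjust to your cluster):

  # see which mode the balancer is running in
  ceph balancer status
  # upmap usually evens out usage much better than crush-compat,
  # but requires all clients to be luminous-capable
  ceph osd set-require-min-compat-client luminous
  ceph balancer mode upmap
  # limit the fraction of PGs allowed to be misplaced per iteration
  ceph config-key set mgr/balancer/max_misplaced 0.01
  # score the current distribution (lower is better), then re-enable
  ceph balancer eval
  ceph balancer on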
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Developer Monthly - March 2018

2018-02-28 Thread Robin H. Johnson
On Wed, Feb 28, 2018 at 10:51:29PM +, Sage Weil wrote:
> On Wed, 28 Feb 2018, Dan Mick wrote:
> > Would anyone else appreciate a Google Calendar invitation for the CDMs?
> > Seems like a natural.
> 
> Funny you should mention it!  I was just talking to Leo this morning about 
> creating a public Ceph Events calendar that has all of the public events 
> (CDM, tech talks, weekly perf call, etc.).
> 
> (Also, we're setting up a Ceph Meetings calendar for meetings that aren't 
> completely public that can be shared with active developers for standing 
> meetings that are currently invite-only meetings.  e.g., standups, 
> advisory board, etc.)
Yes please on the calendars!

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy won't install luminous (but Jewel instead)

2018-02-28 Thread Sébastien VIGNERON
Hi Max,

I had the same issue (under Ubuntu 16.04), but I read the ceph-deploy 2.0.0 
source code and saw a "--release" flag for the install subcommand. You can 
find the flag with the following command: ceph-deploy install --help

It looks like the culprit part of ceph-deploy is around line 20 of 
/usr/lib/python2.7/dist-packages/ceph_deploy/install.py:

…
14  def sanitize_args(args):
15      """
16      args may need a bunch of logic to set proper defaults that argparse is
17      not well suited for.
18      """
19      if args.release is None:
20          args.release = 'jewel'
21          args.default_release = True
22
23      # XXX This whole dance is because --stable is getting deprecated
24      if args.stable is not None:
25          LOG.warning('the --stable flag is deprecated, use --release instead')
26          args.release = args.stable
27      # XXX Tango ends here.
28
29      return args

…

Which means we now have to specify "--release luminous" when we want to install 
a luminous cluster, at least until luminous is considered stable and the 
ceph-deploy tool is changed. 
I think it may be a kernel version consideration: not all distros ship the 
minimum kernel version (and features) needed to make full use of luminous.
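In practice the install step then looks something like this (host names are
only placeholders):

  # force the release explicitly instead of relying on the 'jewel' default
  ceph-deploy install --release luminous node1 node2 node3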

Cordialement / Best regards,

Sébastien VIGNERON 
CRIANN, 
Ingénieur / Engineer
Technopôle du Madrillet 
745, avenue de l'Université 
76800 Saint-Etienne du Rouvray - France 
tél. +33 2 32 91 42 91 
fax. +33 2 32 91 42 92 
http://www.criann.fr 
mailto:sebastien.vigne...@criann.fr
support: supp...@criann.fr

> On 1 March 2018 at 00:37, Max Cuttins  wrote:
> 
> Didn't check at the time.
> 
> I deployed everything from VM standalone.
> The VM was just built with a fresh CentOS 7.4 using the minimal installation 
> ISO 1708.
> It's a completely new/fresh/empty system.
> Then I run:
> 
> yum update -y
> yum install wget zip unzip vim pciutils -y
> yum install epel-release -y
> yum update -y 
> yum install ceph-deploy -y
> yum install yum-plugin-priorities -y
> 
> it installed:
> 
> Feb 27 19:24:47 Installed: ceph-deploy-1.5.37-0.noarch
> -> install ceph with ceph-deploy on 3 nodes.
> 
> As a result I get Jewel.
> Then... I purge everything from all the 3 nodes
> yum update again on ceph deployer node and get:
> 
> Feb 27 20:33:20 Updated: ceph-deploy-2.0.0-0.noarch
> 
> ... then I tried to reinstall over and over but I always get Jewel.
> I tried to install after removing the .ceph config file in my homedir.
> I tried to install after changing the default repo to repo-luminous
> ... and always got Jewel.
> 
> Only forcing the release in the ceph-deploy command allowed me to install 
> luminous.
> 
> Probably yum-plugin-priorities should not be installed after ceph-deploy, even 
> though I hadn't run any command yet.
> But what is so strange is that purging and reinstalling everything always 
> reinstalls Jewel.
> It seems that some lock file has been written somewhere to use Jewel.
> 
> 
> 
> On 28/02/2018 22:08, David Turner wrote:
>> Which version of ceph-deploy are you using?
>> 
>> On Wed, Feb 28, 2018 at 4:37 AM Massimiliano Cuttini > > wrote:
>> This worked.
>> 
>> However somebody should investigate why the default is still jewel on CentOS 7.4
>> 
>>> On 28/02/2018 00:53, jorpilo wrote:
>>> Try using:
>>> ceph-deploy --release luminous host1...
>>> 
>>>  Original message 
>>> From: Massimiliano Cuttini  
>>> Date: 28/2/18 12:42 a.m. (GMT+01:00)
>>> To: ceph-users@lists.ceph.com 
>>> Subject: [ceph-users] ceph-deploy won't install luminous (but Jewel instead)
>>> 
>>> This is the 5th time that I install and then purge the installation.
>>> Ceph Deploy always installs JEWEL instead of Luminous.
>>> No way, even if I force the repo from default to luminous:
>>> 
>>> https://download.ceph.com/rpm-luminous/el7/noarch 
>>> 
>>> It still installs Jewel; it's stuck.
>>> I've already checked that I had installed yum-plugin-priorities, and I did.
>>> Everything is exactly as the documentation requests.
>>> But I still always get Jewel and not Luminous.
>>> 
>>> 
>>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy won't install luminous (but Jewel instead)

2018-02-28 Thread Max Cuttins

Didn't check at the time.

I deployed everything from VM standalone.
The VM was just built with a fresh CentOS 7.4 using the minimal 
installation ISO 1708.

It's a completely new/fresh/empty system.
Then I run:

yum update -y
yum install wget zip unzip vim pciutils -y
yum install epel-release -y
yum update -y
yum install ceph-deploy -y
yum install yum-plugin-priorities -y

it installed:

Feb 27 19:24:47 Installed: ceph-deploy-1.5.37-0.noarch

-> install ceph with ceph-deploy on 3 nodes.

As a result I get Jewel.

Then... I purge everything from all the 3 nodes
yum update again on ceph deployer node and get:

Feb 27 20:33:20 Updated: ceph-deploy-2.0.0-0.noarch

... then I tried to reinstall over and over but I always get Jewel.
I tried to install after removing the .ceph config file in my homedir.
I tried to install after changing the default repo to repo-luminous
... and always got Jewel.

Only forcing the release in the ceph-deploy command allowed me to install 
luminous.


Probably yum-plugin-priorities should not be installed after ceph-deploy, 
even though I hadn't run any command yet.
But what is so strange is that purging and reinstalling everything 
always reinstalls Jewel.

It seems that some lock file has been written somewhere to use Jewel.



On 28/02/2018 22:08, David Turner wrote:

Which version of ceph-deploy are you using?

On Wed, Feb 28, 2018 at 4:37 AM Massimiliano Cuttini 
> wrote:


This worked.

However somebody should investigate why the default is still jewel on
CentOS 7.4


On 28/02/2018 00:53, jorpilo wrote:

Try using:
ceph-deploy --release luminous host1...

 Original message 
From: Massimiliano Cuttini 

Date: 28/2/18 12:42 a.m. (GMT+01:00)
To: ceph-users@lists.ceph.com 
Subject: [ceph-users] ceph-deploy won't install luminous (but
Jewel instead)

This is the 5th time that I install and then purge the installation.
Ceph Deploy always installs JEWEL instead of Luminous.

No way, even if I force the repo from default to luminous:

https://download.ceph.com/rpm-luminous/el7/noarch

It still installs Jewel; it's stuck.

I've already checked that I had installed yum-plugin-priorities,
and I did.
Everything is exactly as the documentation requests.
But I still always get Jewel and not Luminous.




___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot Create MGR

2018-02-28 Thread Georgios Dimitrakakis


Indeed that was the problem!

In case anyone else ever runs into the same condition, please keep in 
mind that no matter what you write on the "ceph-deploy" command line, it 
will at some point use the output of "hostname -s" and try to connect to 
that monitor to gather data.
If you have changed the hostname, as I had done, the existing files will 
not match that output.
Thus I had to temporarily go back to the old hostname in order to 
gather the keys, deploy them and finally create the MGR. After that the 
cluster came back to a healthy condition.
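For anyone who prefers to skip ceph-deploy entirely, the mgr can also be
bootstrapped by hand; a rough sketch (the daemon name "controller" and the
paths are only examples, adapt them to your cluster):

  # create a keyring for a mgr daemon called "controller"
  mkdir -p /var/lib/ceph/mgr/ceph-controller
  ceph auth get-or-create mgr.controller \
      mon 'allow profile mgr' osd 'allow *' mds 'allow *' \
      -o /var/lib/ceph/mgr/ceph-controller/keyring
  # start it (or: systemctl start ceph-mgr@controller)
  ceph-mgr -i controller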


Thank you all for your time.

Regards,

G.


Could it be a problem that I have changed the hostname after the mon
creation?

What I mean is that

# hostname -s
ovhctrl


# ceph daemon mon.$(hostname -s) quorum_status
admin_socket: exception getting command descriptions: [Errno 2] No
such file or directory


> But if I do it as "nefelus-controller", which is how it was created 
> initially


# ceph daemon mon.nefelus-controller quorum_status
{
"election_epoch": 69,
"quorum": [
0
],
"quorum_names": [
"nefelus-controller"
],
"quorum_leader_name": "nefelus-controller",
"monmap": {
"epoch": 2,
"fsid": "d357a551-5b7a-4501-8d8f-009c63b2c972",
"modified": "2018-02-28 18:49:55.985382",
"created": "2017-03-23 22:36:56.897038",
"features": {
"persistent": [
"kraken",
"luminous"
],
"optional": []
},
"mons": [
{
"rank": 0,
"name": "nefelus-controller",
"addr": "xxx.xxx.xxx.xxx:6789/0",
"public_addr": "xxx.xxx.xxx.xxx:6789/0"
}
]
}
}



Additionally "ceph auth list" has in every entry the [mgr] caps

G.




Hi,

looks like you haven't run the ceph-deploy command with the same user
name, and maybe not from the same current working directory. This could
explain your problem.

Make sure the other daemons have a mgr cap authorisation. You can
find on this ML details about MGR caps being incorrect for OSDs and
MONs after a Jewel to Luminous upgrade. The output of a ceph auth 
list

command should help you find out if it’s the case.

Are your ceph daemons still running? What does a ceph daemon
mon.$(hostname -s) quorum_status gives you from a MON server.

JC

On Feb 28, 2018, at 10:05, Georgios Dimitrakakis 
 wrote:



Indeed John,

you are right! I have updated "ceph-deploy" (which was installed 
via "pip", which is why it wasn't updated with the rest of the ceph packages) 
but now it complains that keys are missing


$ ceph-deploy mgr create controller
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/user/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy mgr 
create controller

[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  mgr   : 
[('controller', 'controller')]

[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  subcommand: create
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 


[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  func  : 


[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.mgr][DEBUG ] Deploying mgr, cluster ceph hosts 
controller:controller
[ceph_deploy][ERROR ] RuntimeError: bootstrap-mgr keyring not 
found; run 'gatherkeys'



and I cannot get the keys...



$ ceph-deploy gatherkeys controller
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/user/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy 
gatherkeys controller

[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 


[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  mon   : 
['controller']
[ceph_deploy.cli][INFO  ]  func  : 


[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.gatherkeys][INFO  ] Storing keys in temp directory 
/tmp/tmpPQ895t

[controller][DEBUG ] connection detected need for sudo
[controller][DEBUG ] connected to host: controller
[controller][DEBUG ] detect 

Re: [ceph-users] Ceph Developer Monthly - March 2018

2018-02-28 Thread Sage Weil
On Wed, 28 Feb 2018, Dan Mick wrote:
> Would anyone else appreciate a Google Calendar invitation for the CDMs?
> Seems like a natural.

Funny you should mention it!  I was just talking to Leo this morning about 
creating a public Ceph Events calendar that has all of the public events 
(CDM, tech talks, weekly perf call, etc.).

(Also, we're setting up a Ceph Meetings calendar for meetings that aren't 
completely public that can be shared with active developers for standing 
meetings that are currently invite-only meetings.  e.g., standups, 
advisory board, etc.)

sage



> 
> On 02/27/2018 09:37 PM, Leonardo Vaz wrote:
> > Hey Cephers,
> > 
> > This is just a friendly reminder that the next Ceph Developer Monthly
> > meeting is coming up:
> > 
> >  http://wiki.ceph.com/Planning
> > 
> > If you have work that you're doing that is feature work, significant
> > backports, or anything you would like to discuss with the core team,
> > please add it to the following page:
> > 
> >  http://wiki.ceph.com/CDM_07-MAR-2018
> > 
> > This edition happens on APAC friendly hours (21:00 EST) and we will
> > use the following Bluejeans URL for the video conference:
> > 
> >  https://bluejeans.com/9290089010/
> > 
> > If you have questions or comments, please let us know.
> > 
> > Kindest regards,
> > 
> > Leo
> > 
> 
> 
> -- 
> Dan Mick
> Red Hat, Inc.
> Ceph docs: http://ceph.com/docs
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Developer Monthly - March 2018

2018-02-28 Thread Dan Mick
Would anyone else appreciate a Google Calendar invitation for the CDMs?
Seems like a natural.

On 02/27/2018 09:37 PM, Leonardo Vaz wrote:
> Hey Cephers,
> 
> This is just a friendly reminder that the next Ceph Developer Monthly
> meeting is coming up:
> 
>  http://wiki.ceph.com/Planning
> 
> If you have work that you're doing that is feature work, significant
> backports, or anything you would like to discuss with the core team,
> please add it to the following page:
> 
>  http://wiki.ceph.com/CDM_07-MAR-2018
> 
> This edition happens on APAC friendly hours (21:00 EST) and we will
> use the following Bluejeans URL for the video conference:
> 
>  https://bluejeans.com/9290089010/
> 
> If you have questions or comments, please let us know.
> 
> Kindest regards,
> 
> Leo
> 


-- 
Dan Mick
Red Hat, Inc.
Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy won't install luminous (but Jewel instead)

2018-02-28 Thread David Turner
Which version of ceph-deploy are you using?

On Wed, Feb 28, 2018 at 4:37 AM Massimiliano Cuttini 
wrote:

> This worked.
>
> However somebody should investigate why the default is still jewel on CentOS
> 7.4
>
> On 28/02/2018 00:53, jorpilo wrote:
>
> Try using:
> ceph-deploy --release luminous host1...
>
>  Original message 
> From: Massimiliano Cuttini  
> Date: 28/2/18 12:42 a.m. (GMT+01:00)
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] ceph-deploy won't install luminous (but Jewel
> instead)
>
> This is the 5th time that I install and then purge the installation.
> Ceph Deploy always installs JEWEL instead of Luminous.
>
> No way, even if I force the repo from default to luminous:
>
> https://download.ceph.com/rpm-luminous/el7/noarch
>
> It still installs Jewel; it's stuck.
>
> I've already checked that I had installed yum-plugin-priorities, and I
> did.
> Everything is exactly as the documentation requests.
> But I still always get Jewel and not Luminous.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Segfaults after Bluestore conversion

2018-02-28 Thread Kyle Hutson
I'm following up from a while ago. I don't think this is the same bug. The
bug referenced shows "abort: Corruption: block checksum mismatch", and I'm
not seeing that on mine.

Now I've had 8 OSDs down on this one server for a couple of weeks, and I
just tried to start it back up. Here's a link to the log of that OSD (which
segfaulted right after starting up):
http://people.beocat.ksu.edu/~kylehutson/ceph-osd.414.log

To me, it looks like the logs are providing surprisingly few hints as to
where the problem lies. Is there a way I can turn up logging to see if I
can get any more info as to why this is happening?
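One way to get more detail (these are the stock debug settings; the values
are just a suggestion and will make the log very large) is to raise the debug
levels for that OSD in ceph.conf and restart it:

  [osd.414]
      debug osd = 20
      debug bluestore = 20
      debug bluefs = 20
      debug rocksdb = 10
      debug bdev = 20

  systemctl restart ceph-osd@414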

On Thu, Feb 8, 2018 at 3:02 AM, Mike O'Connor  wrote:

> On 7/02/2018 8:23 AM, Kyle Hutson wrote:
> > We had a 26-node production ceph cluster which we upgraded to Luminous
> > a little over a month ago. I added a 27th-node with Bluestore and
> > didn't have any issues, so I began converting the others, one at a
> > time. The first two went off pretty smoothly, but the 3rd is doing
> > something strange.
> >
> > Initially, all the OSDs came up fine, but then some started to
> > segfault. Out of curiosity more than anything else, I did reboot the
> > server to see if it would get better or worse, and it pretty much
> > stayed the same - 12 of the 18 OSDs did not properly come up. Of
> > those, 3 again segfaulted
> >
> > I picked one that didn't properly come up and copied the log to where
> > anybody can view it:
> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log
> > 
> >
> > You can contrast that with one that is up:
> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log
> > 
> >
> > (which is still showing segfaults in the logs, but seems to be
> > recovering from them OK?)
> >
> > Any ideas?
> Ideas ? yes
>
> There is a bug which is hitting a small number of systems and at this
> time there is no solution. Issues details at
> http://tracker.ceph.com/issues/22102.
>
> Please submit more details of your problem on the ticket.
>
> Mike
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mirror OSD configuration

2018-02-28 Thread David Turner
A more common search term for this might be Rack failure domain.  The
premise is the same for room as it is for rack, both can hold hosts and be
set as the failure domain.  There is a fair bit of discussion on how to
achieve multi-rack/room/datacenter setups.  Datacenter setups are more
likely to have rules that coincide with only having 2 failure domains like
you have here.  When you read about them just s/datacenter/room/ and you're
good.

I'm going to second the concern of only having 2 failure domains.  Unless
you guarantee 2 copies in each room, you're looking at allowing min_size=1
which is just bad practice and a common way to lose data as witnessed on
this list multiple times.  Running with size=4 and min_size=2 seems to be
your best bet here.  It's a lot of overhead, but you can handle an entire
room going down.  Higher levels of redundancy usually come with some cost,
this is the cost here.
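If you do want to pin two copies to each room, a sketch of the CRUSH rule
(assuming a root named "default" and "room" buckets already exist in your map)
could look like this, combined with size=4, min_size=2 on the pool:

  rule two_rooms_two_copies {
      id 1
      type replicated
      min_size 4
      max_size 4
      # pick both rooms, then two hosts inside each room
      step take default
      step choose firstn 2 type room
      step chooseleaf firstn 2 type host
      step emit
  }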

On Wed, Feb 28, 2018 at 12:38 PM Gregory Farnum  wrote:

> On Wed, Feb 28, 2018 at 3:02 AM Zoran Bošnjak <
> zoran.bosn...@sloveniacontrol.si> wrote:
>
>> I am aware of monitor consensus requirement. It is taken care of (there
>> is a third room with only monitor node). My problem is about OSD
>> redundancy, since I can only use 2 server rooms for OSDs.
>>
>> I could use EC-pools, lrc or any other ceph configuration. But I could
>> not find a configuration that would address the issue. The write
>> acknowledge rule should read something like this:
>> 1. If both rooms are "up", do not acknowledge write until ack is received
>> from both rooms.
>> 2. If only one room is "up" (forget rule 1.) acknowledge write on the
>> first ack.
>
>
> This check is performed when PGs go active, not on every write, (once a PG
> goes active, it needs a commit from everybody in the set before writes are
> done, or else to go through peering again) but that is the standard
> behavior for Ceph if you configure CRUSH to place data redundantly in both
> rooms.
>
>
>>
>> The ceph documentation talks about recursively defined locality sets, so
>> I assume it allows for different rules on room/rack/host... levels.
>> But as far as I can see, it can not depend on "room" availability.
>>
>> Is this possible to configure?
>> I would appreciate example configuration commands.
>>
>> regards,
>> Zoran
>>
>> 
>> From: Eino Tuominen 
>> Sent: Wednesday, February 28, 2018 8:47 AM
>> To: Zoran Bošnjak; ceph-us...@ceph.com
>> Subject: Re: mirror OSD configuration
>>
>> > Is it possible to configure crush map such that it will tolerate "room"
>> failure? In my case, there is one
>> > network switch per room and one power supply per room, which makes a
>> single point of (room) failure.
>>
>> Hi,
>>
>> You cannot achieve real room redundancy with just two rooms. At minimum
>> you'll need a third room (witness) from which you'll need independent
>> network connections to the two server rooms. Otherwise it's impossible to
>> have monitor quorum when one of the two rooms fails. And then you'd need to
>> consider osd redundancy. You could do with replica size = 4, min_size = 2
>> (or any min_size = n, size = 2*n ), but that's not perfect as you lose
>> exactly half of the replicas in case of a room failure. If you were able to
>> use EC-pools you'd have more options with LRC coding (
>> http://docs.ceph.com/docs/master/rados/operations/erasure-code-lrc/).
>>
>> We run ceph in a 3 room configuration with 3 monitors, size=3,
>> min_size=2. It works, but it's not without hassle either.
>>
>> --
>>   Eino Tuominen
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot Create MGR

2018-02-28 Thread Georgios Dimitrakakis
Could it be a problem that I have changed the hostname after the mon 
creation?


What I mean is that

# hostname -s
ovhctrl


# ceph daemon mon.$(hostname -s) quorum_status
admin_socket: exception getting command descriptions: [Errno 2] No such 
file or directory



But if I do it as "nefelus-controller", which is how it was created 
initially


# ceph daemon mon.nefelus-controller quorum_status
{
"election_epoch": 69,
"quorum": [
0
],
"quorum_names": [
"nefelus-controller"
],
"quorum_leader_name": "nefelus-controller",
"monmap": {
"epoch": 2,
"fsid": "d357a551-5b7a-4501-8d8f-009c63b2c972",
"modified": "2018-02-28 18:49:55.985382",
"created": "2017-03-23 22:36:56.897038",
"features": {
"persistent": [
"kraken",
"luminous"
],
"optional": []
},
"mons": [
{
"rank": 0,
"name": "nefelus-controller",
"addr": "xxx.xxx.xxx.xxx:6789/0",
"public_addr": "xxx.xxx.xxx.xxx:6789/0"
}
]
}
}



Additionally "ceph auth list" has in every entry the [mgr] caps

G.




Hi,

looks like you haven't run the ceph-deploy command with the same user
name, and maybe not from the same current working directory. This could
explain your problem.

Make sure the other daemons have a mgr cap authorisation. You can
find on this ML details about MGR caps being incorrect for OSDs and
MONs after a Jewel to Luminous upgrade. The output of a ceph auth 
list

command should help you find out if it’s the case.

Are your ceph daemons still running? What does a ceph daemon
mon.$(hostname -s) quorum_status gives you from a MON server.

JC

On Feb 28, 2018, at 10:05, Georgios Dimitrakakis 
 wrote:



Indeed John,

you are right! I have updated "ceph-deploy" (which was installed via 
"pip", which is why it wasn't updated with the rest of the ceph packages) but 
now it complains that keys are missing


$ ceph-deploy mgr create controller
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/user/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy mgr 
create controller

[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  mgr   : 
[('controller', 'controller')]

[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  subcommand: create
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 


[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  func  : <function mgr at 0x1cce500>

[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.mgr][DEBUG ] Deploying mgr, cluster ceph hosts 
controller:controller
[ceph_deploy][ERROR ] RuntimeError: bootstrap-mgr keyring not found; 
run 'gatherkeys'



and I cannot get the keys...



$ ceph-deploy gatherkeys controller
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/user/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy 
gatherkeys controller

[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 


[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  mon   : 
['controller']
[ceph_deploy.cli][INFO  ]  func  : <function gatherkeys at 0x198b2a8>

[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.gatherkeys][INFO  ] Storing keys in temp directory 
/tmp/tmpPQ895t

[controller][DEBUG ] connection detected need for sudo
[controller][DEBUG ] connected to host: controller
[controller][DEBUG ] detect platform information from remote host
[controller][DEBUG ] detect machine type
[controller][DEBUG ] get remote short hostname
[controller][DEBUG ] fetch remote file
[ceph_deploy.gatherkeys][WARNIN] No mon key found in host: 
controller
[ceph_deploy.gatherkeys][ERROR ] Failed to connect to 
host:controller
[ceph_deploy.gatherkeys][INFO  ] Destroy temp directory 
/tmp/tmpPQ895t

[ceph_deploy][ERROR ] RuntimeError: Failed to connect any mon





On Wed, Feb 28, 2018 at 5:21 PM, Georgios Dimitrakakis
 wrote:

All,

I have updated my test ceph cluster from Jewel 

Re: [ceph-users] Ceph SNMP hooks?

2018-02-28 Thread David Turner
You could probably write an SNMP module for the new ceph-mgr daemon. What
do you want to use to monitor Ceph that requires SNMP?
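As a sketch of what such a module could look like (the MgrModule interface is
the luminous mgr plugin API as I understand it; the SNMP side itself is left as
a comment and would need something like pysnmp):

  import json
  import time

  from mgr_module import MgrModule


  class Module(MgrModule):
      """Toy ceph-mgr module that polls cluster health for an SNMP agent."""

      def __init__(self, *args, **kwargs):
          super(Module, self).__init__(*args, **kwargs)
          self._run = True

      def serve(self):
          # Poll cluster health every 30 seconds.
          while self._run:
              health = json.loads(self.get('health')['json'])
              status = health.get('status', 'HEALTH_UNKNOWN')
              self.log.info('cluster health: %s', status)
              # Translate `status` into an SNMP trap / table entry here,
              # e.g. via pysnmp, or expose it to an snmpd pass script.
              time.sleep(30)

      def shutdown(self):
          self._run = False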

On Wed, Feb 28, 2018 at 1:13 PM Andre Goree  wrote:

> I've looked and haven't found much information besides custom 3rd-party
> plugins so I figured I'd ask here:
>
> Is there a way to monitor a clusters 'health' via SNMP?
>
> --
> Andre Goree
> -=-=-=-=-=-
> Email - andre at drenet.net
> Website   - http://blog.drenet.net
> PGP key   - http://www.drenet.net/pubkey.html
> -=-=-=-=-=-
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot Create MGR

2018-02-28 Thread Jean-Charles Lopez
Hi,

looks like you haven't run the ceph-deploy command with the same user name, and 
maybe not from the same current working directory. This could explain your problem.

Make sure the other daemons have a mgr cap authorisation. You can find on this 
ML details about MGR caps being incorrect for OSDs and MONs after a Jewel to 
Luminous upgrade. The output of a ceph auth list command should help you find 
out if it’s the case.

Are your ceph daemons still running? What does a ceph daemon mon.$(hostname -s) 
quorum_status gives you from a MON server.

JC

> On Feb 28, 2018, at 10:05, Georgios Dimitrakakis  wrote:
> 
> 
> Indeed John,
> 
> you are right! I have updated "ceph-deploy" (which was installed via "pip", 
> which is why it wasn't updated with the rest of the ceph packages) but now it 
> complains that keys are missing
> 
> $ ceph-deploy mgr create controller
> [ceph_deploy.conf][DEBUG ] found configuration file at: 
> /home/user/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy mgr create 
> controller
> [ceph_deploy.cli][INFO  ] ceph-deploy options:
> [ceph_deploy.cli][INFO  ]  username  : None
> [ceph_deploy.cli][INFO  ]  verbose   : False
> [ceph_deploy.cli][INFO  ]  mgr   : [('controller', 
> 'controller')]
> [ceph_deploy.cli][INFO  ]  overwrite_conf: False
> [ceph_deploy.cli][INFO  ]  subcommand: create
> [ceph_deploy.cli][INFO  ]  quiet : False
> [ceph_deploy.cli][INFO  ]  cd_conf   : 
> 
> [ceph_deploy.cli][INFO  ]  cluster   : ceph
> [ceph_deploy.cli][INFO  ]  func  : <function mgr at 0x1cce500>
> [ceph_deploy.cli][INFO  ]  ceph_conf : None
> [ceph_deploy.cli][INFO  ]  default_release   : False
> [ceph_deploy.mgr][DEBUG ] Deploying mgr, cluster ceph hosts 
> controller:controller
> [ceph_deploy][ERROR ] RuntimeError: bootstrap-mgr keyring not found; run 
> 'gatherkeys'
> 
> 
> and I cannot get the keys...
> 
> 
> 
> $ ceph-deploy gatherkeys controller
> [ceph_deploy.conf][DEBUG ] found configuration file at: 
> /home/user/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy gatherkeys 
> controller
> [ceph_deploy.cli][INFO  ] ceph-deploy options:
> [ceph_deploy.cli][INFO  ]  username  : None
> [ceph_deploy.cli][INFO  ]  verbose   : False
> [ceph_deploy.cli][INFO  ]  overwrite_conf: False
> [ceph_deploy.cli][INFO  ]  quiet : False
> [ceph_deploy.cli][INFO  ]  cd_conf   : 
> 
> [ceph_deploy.cli][INFO  ]  cluster   : ceph
> [ceph_deploy.cli][INFO  ]  mon   : ['controller']
> [ceph_deploy.cli][INFO  ]  func  : <function gatherkeys at 0x198b2a8>
> [ceph_deploy.cli][INFO  ]  ceph_conf : None
> [ceph_deploy.cli][INFO  ]  default_release   : False
> [ceph_deploy.gatherkeys][INFO  ] Storing keys in temp directory /tmp/tmpPQ895t
> [controller][DEBUG ] connection detected need for sudo
> [controller][DEBUG ] connected to host: controller
> [controller][DEBUG ] detect platform information from remote host
> [controller][DEBUG ] detect machine type
> [controller][DEBUG ] get remote short hostname
> [controller][DEBUG ] fetch remote file
> [ceph_deploy.gatherkeys][WARNIN] No mon key found in host: controller
> [ceph_deploy.gatherkeys][ERROR ] Failed to connect to host:controller
> [ceph_deploy.gatherkeys][INFO  ] Destroy temp directory /tmp/tmpPQ895t
> [ceph_deploy][ERROR ] RuntimeError: Failed to connect any mon
> 
> 
> 
> 
>> On Wed, Feb 28, 2018 at 5:21 PM, Georgios Dimitrakakis
>>  wrote:
>>> All,
>>> 
>>> I have updated my test ceph cluster from Jewel (10.2.10) to Luminous
>>> (12.2.4) using CentOS packages.
>>> 
>>> I have updated all packages, restarted all services with the proper order
>>> but I get a warning that the Manager Daemon doesn't exist.
>>> 
>>> Here is the output:
>>> 
>>> # ceph -s
>>>  cluster:
>>>id: d357a551-5b7a-4501-8d8f-009c63b2c972
>>>health: HEALTH_WARN
>>>no active mgr
>>> 
>>>  services:
>>>mon: 1 daemons, quorum controller
>>>mgr: no daemons active
>>>osd: 2 osds: 2 up, 2 in
>>> 
>>>  data:
>>>pools:   0 pools, 0 pgs
>>>objects: 0 objects, 0 bytes
>>>usage:   0 kB used, 0 kB / 0 kB avail
>>>pgs:
>>> 
>>> 
>>> While at the same time the system service is up and running
>>> 
>>> # systemctl status ceph-mgr.target
>>> ● ceph-mgr.target - ceph target allowing to start/stop all ceph-mgr@.service
>>> instances at once
>>>   Loaded: loaded (/usr/lib/systemd/system/ceph-mgr.target; enabled; vendor
>>> preset: enabled)
>>>   Active: active since Wed 2018-02-28 18:57:13 EET; 12min ago
>>> 
>>> 
>>> I understand that I have to add a new MGR but when I try 

Re: [ceph-users] RBD mirroring to DR site

2018-02-28 Thread Jason Dillaman
On Wed, Feb 28, 2018 at 2:56 PM, Brady Deetz  wrote:
> Great. We are read heavy. I assume the journals do not replicate reads. Is
> that correct?

Correct -- only writes (plus discards, snapshots, etc) are replicated.

> On Wed, Feb 28, 2018 at 1:50 PM, Jason Dillaman  wrote:
>>
>> On Wed, Feb 28, 2018 at 2:42 PM, Brady Deetz  wrote:
>> > I'm considering doing one-way rbd mirroring to a DR site. The
>> > documentation
>> > states that my link to the DR site should have sufficient throughput to
>> > support replication.
>> >
>> > Our write activity is bursty. As such, we tend to see moments of high
>> > throughput 4-6gbps followed by long bouts of basically no activity.
>> >
>> > 1) how sensitive is rbd mirroring to latency?
>>
>> It's not sensitive at all -- in the worst case, your journals will
>> expand during the burst period and shrink again during the idle
>> period.
>>
>> > 2) how sensitive is rbd mirroring to falling behind on replication and
>> > having to catch up?
>>
>> It's designed to be asynchronous replication w/ consistency so it
>> doesn't matter to rbd-mirror if it's behind. In fact, you can even
>> configure it to always be X hours behind if you want to have a window
>> for avoiding accidents from propagating to the DR site.
>>
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> Jason
>
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD mirroring to DR site

2018-02-28 Thread Brady Deetz
Great. We are read heavy. I assume the journals do not replicate reads. Is
that correct?

On Wed, Feb 28, 2018 at 1:50 PM, Jason Dillaman  wrote:

> On Wed, Feb 28, 2018 at 2:42 PM, Brady Deetz  wrote:
> > I'm considering doing one-way rbd mirroring to a DR site. The
> documentation
> > states that my link to the DR site should have sufficient throughput to
> > support replication.
> >
> > Our write activity is bursty. As such, we tend to see moments of high
> > throughput 4-6gbps followed by long bouts of basically no activity.
> >
> > 1) how sensitive is rbd mirroring to latency?
>
> It's not sensitive at all -- in the worst case, your journals will
> expand during the burst period and shrink again during the idle
> period.
>
> > 2) how sensitive is rbd mirroring to falling behind on replication and
> > having to catch up?
>
> It's designed to be asynchronous replication w/ consistency so it
> doesn't matter to rbd-mirror if it's behind. In fact, you can even
> configure it to always be X hours behind if you want to have a window
> for avoiding accidents from propagating to the DR site.
>
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> Jason
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD mirroring to DR site

2018-02-28 Thread Jason Dillaman
On Wed, Feb 28, 2018 at 2:42 PM, Brady Deetz  wrote:
> I'm considering doing one-way rbd mirroring to a DR site. The documentation
> states that my link to the DR site should have sufficient throughput to
> support replication.
>
> Our write activity is bursty. As such, we tend to see moments of high
> throughput 4-6gbps followed by long bouts of basically no activity.
>
> 1) how sensitive is rbd mirroring to latency?

It's not sensitive at all -- in the worst case, your journals will
expand during the burst period and shrink again during the idle
period.

> 2) how sensitive is rbd mirroring to falling behind on replication and
> having to catch up?

It's designed to be asynchronous replication w/ consistency so it
doesn't matter to rbd-mirror if it's behind. In fact, you can even
configure it to always be X hours behind if you want to have a window
for avoiding accidents from propagating to the DR site.
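The "X hours behind" knob is the delayed-replication option; roughly (the
value is in seconds and the image name is only an example):

  # ceph.conf on the cluster running rbd-mirror (the DR site)
  [client]
      rbd mirroring replay delay = 14400    # stay ~4 hours behind

  # or per image, set on the primary:
  rbd image-meta set mypool/myimage conf_rbd_mirroring_replay_delay 14400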

>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot Create MGR

2018-02-28 Thread Georgios Dimitrakakis


I am still trying to figure what is the problem here...

Initially the cluster was updated ok...

# ceph health detail
HEALTH_WARN noout flag(s) set; all OSDs are running luminous or later 
but require_osd_release < luminous; no active mgr

noout flag(s) set
all OSDs are running luminous or later but require_osd_release < 
luminous



While I removed the "noout" flag and set "ceph osd 
require-osd-release luminous", I had another window open and was following 
the status




# ceph -w
  cluster:
id: d357a551-5b7a-4501-8d8f-009c63b2c972
health: HEALTH_WARN
all OSDs are running luminous or later but 
require_osd_release < luminous

no active mgr

  services:
mon: 1 daemons, quorum nefelus-controller
mgr: no daemons active
osd: 2 osds: 2 up, 2 in

  data:
pools:   11 pools, 152 pgs
objects: 9754 objects, 33754 MB
usage:   67495 MB used, 3648 GB / 3714 GB avail
pgs: 152 active+clean


2018-02-28 19:03:20.105027 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:03:24.101868 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:03:25.103605 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:03:29.815572 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 
2671 B/s rd, 89 op/s
2018-02-28 19:03:34.105263 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 
4472 B/s rd, 240 op/s
2018-02-28 19:03:35.108174 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 
9020 B/s rd, 538 op/s
2018-02-28 19:03:39.104781 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 
7598 B/s rd, 453 op/s
2018-02-28 19:03:40.108741 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 
9020 B/s rd, 538 op/s
2018-02-28 19:03:44.105574 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 
7598 B/s rd, 453 op/s
2018-02-28 19:03:45.107522 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 
6696 B/s rd, 471 op/s
2018-02-28 19:03:49.106530 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail; 
3958 B/s rd, 269 op/s
2018-02-28 19:03:50.110731 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:03:54.107816 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:03:55.109359 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:03:59.108575 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:00.110692 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:04.109099 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:05.111035 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:09.110238 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:10.112094 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:14.111468 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:15.113370 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:19.112223 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:20.116135 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:24.113174 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:25.114808 mon.nefelus-controller [INF] pgmap 152 pgs: 
152 active+clean; 33754 MB data, 67495 MB used, 3648 GB / 3714 GB avail
2018-02-28 19:04:28.172510 mon.nefelus-controller [INF] setting 
require_min_compat_client to 

[ceph-users] RBD mirroring to DR site

2018-02-28 Thread Brady Deetz
I'm considering doing one-way rbd mirroring to a DR site. The documentation
states that my link to the DR site should have sufficient throughput to
support replication.

Our write activity is bursty. As such, we tend to see moments of high
throughput 4-6gbps followed by long bouts of basically no activity.

1) how sensitive is rbd mirroring to latency?
2) how sensitive is rbd mirroring to falling behind on replication and
having to catch up?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread Mark Schouten
Does Xen still not support RBD? Ceph has been around for years now!


Kind regards,

-- 
Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
Mark Schouten  | Tuxis Internet Engineering
KvK: 61527076 | http://www.tuxis.nl/
T: 0318 200208 | i...@tuxis.nl



 From:   Massimiliano Cuttini  
 To:   "ceph-users@lists.ceph.com"  
 Sent:   28-2-2018 13:53 
 Subject:   [ceph-users] Ceph iSCSI is a prank? 


 
I was building Ceph in order to use it with iSCSI,
but I just saw from the docs that it needs:

CentOS 7.5
(which is not available yet, it's still at 7.4)
https://wiki.centos.org/Download
Kernel 4.17
(which is not available yet, it is still at 4.15.7)
https://www.kernel.org/

So I guess there is no official support and this is just a bad prank.

Ceph has been ready to be used with S3 for many years,
but it needs the kernel of the next century to work with a
technology as old as iSCSI.
So sad.

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot Create MGR

2018-02-28 Thread Georgios Dimitrakakis

OK...now this is getting crazy...


  data:
pools:   0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage:   0 kB used, 0 kB / 0 kB avail
pgs:



Where has everything gone??

What's happening here?


G.


Indeed John,

you are right! I have updated "ceph-deploy" (which was installed via
"pip", which is why it wasn't updated with the rest of the ceph packages) but now
it complains that keys are missing

$ ceph-deploy mgr create controller
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/user/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy mgr
create controller
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  mgr   :
[('controller', 'controller')]
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  subcommand: create
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   :

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.mgr][DEBUG ] Deploying mgr, cluster ceph hosts
controller:controller
[ceph_deploy][ERROR ] RuntimeError: bootstrap-mgr keyring not found;
run 'gatherkeys'


and I cannot get the keys...



$ ceph-deploy gatherkeys controller
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/user/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy
gatherkeys controller
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   :

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  mon   : 
['controller']

[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.gatherkeys][INFO  ] Storing keys in temp directory
/tmp/tmpPQ895t
[controller][DEBUG ] connection detected need for sudo
[controller][DEBUG ] connected to host: controller
[controller][DEBUG ] detect platform information from remote host
[controller][DEBUG ] detect machine type
[controller][DEBUG ] get remote short hostname
[controller][DEBUG ] fetch remote file
[ceph_deploy.gatherkeys][WARNIN] No mon key found in host: controller
[ceph_deploy.gatherkeys][ERROR ] Failed to connect to host:controller
[ceph_deploy.gatherkeys][INFO  ] Destroy temp directory 
/tmp/tmpPQ895t

[ceph_deploy][ERROR ] RuntimeError: Failed to connect any mon





On Wed, Feb 28, 2018 at 5:21 PM, Georgios Dimitrakakis
 wrote:

All,

I have updated my test ceph cluster from Jewel (10.2.10) to 
Luminous

(12.2.4) using CentOS packages.

I have updated all packages, restarted all services with the proper 
order

but I get a warning that the Manager Daemon doesn't exist.

Here is the output:

# ceph -s
  cluster:
id: d357a551-5b7a-4501-8d8f-009c63b2c972
health: HEALTH_WARN
no active mgr

  services:
mon: 1 daemons, quorum controller
mgr: no daemons active
osd: 2 osds: 2 up, 2 in

  data:
pools:   0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage:   0 kB used, 0 kB / 0 kB avail
pgs:


While at the same time the system service is up and running

# systemctl status ceph-mgr.target
● ceph-mgr.target - ceph target allowing to start/stop all 
ceph-mgr@.service

instances at once
   Loaded: loaded (/usr/lib/systemd/system/ceph-mgr.target; 
enabled; vendor

preset: enabled)
   Active: active since Wed 2018-02-28 18:57:13 EET; 12min ago


I understand that I have to add a new MGR but when I try to do it 
via

"ceph-deploy" it fails with the following error:


# ceph-deploy mgr create controller
usage: ceph-deploy [-h] [-v | -q] [--version] [--username USERNAME]
   [--overwrite-conf] [--cluster NAME] [--ceph-conf
CEPH_CONF]
   COMMAND ...
ceph-deploy: error: argument COMMAND: invalid choice: 'mgr' (choose 
from
'new', 'install', 'rgw', 'mon', 'mds', 'gatherkeys', 'disk', 'osd', 
'admin',

'repo', 'config', 'uninstall', 'purge', 'purgedata', 'calamari',
'forgetkeys', 'pkg')


You probably have an older version of ceph-deploy, from before it knew
how to create mgr daemons.

John




where "controller" is the node where ceph monitor is already 
running.



Any ideas why I cannot do it via 

[ceph-users] Ceph SNMP hooks?

2018-02-28 Thread Andre Goree
I've looked and haven't found much information besides custom 3rd-party 
plugins so I figured I'd ask here:


Is there a way to monitor a clusters 'health' via SNMP?

--
Andre Goree
-=-=-=-=-=-
Email - andre at drenet.net
Website   - http://blog.drenet.net
PGP key   - http://www.drenet.net/pubkey.html
-=-=-=-=-=-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD maintenance (ceph osd set noout)

2018-02-28 Thread Andre Goree

On 2018/02/27 4:23 pm, John Spray wrote:

On Tue, Feb 27, 2018 at 6:37 PM, Andre Goree  wrote:
Is it still considered best practice to set 'noout' for OSDs that will 
be
going under maintenance, e.g., rebooting an OSD node for a kernel 
update?


I ask, because I've set this twice now at times when the OSDs would
only momentarily be 'out'; however, each time I've done this, the OSDs have
become unusable and I've had to rebuild them.


Can you be more specific about "unusable"?  Marking an OSD noout is of
course not meant to harm it!

John



Sorry, I should've been more specific. I believe I ran into an issue 
where the journal for a given OSD was corrupt and thus prevented the OSD 
from booting.


I did just find a way to flush a journal from an OSD earlier today (I 
hadn't actually done much troubleshooting and didn't look into getting the 
OSD back, as I should've). Had I done that, I probably wouldn't have 
had to re-deploy anything, lol.
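For the record, the flush itself is just the following (the OSD id is only an
example, the OSD must be stopped first, and this applies to filestore OSDs):

  systemctl stop ceph-osd@12
  ceph-osd -i 12 --flush-journal
  # after clearing/replacing the journal device:
  ceph-osd -i 12 --mkjournal
  systemctl start ceph-osd@12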


In any case, if I run into issues again if/when I need to try this, I'll 
make my way back to this thread.  For right now there is no issue and 
surely my ignorance with Ceph is showing, haha.


Thanks for the replies.

--
Andre Goree
-=-=-=-=-=-
Email - andre at drenet.net
Website   - http://blog.drenet.net
PGP key   - http://www.drenet.net/pubkey.html
-=-=-=-=-=-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot Create MGR

2018-02-28 Thread Georgios Dimitrakakis


Indeed John,

you are right! I have updated "ceph-deploy" (which was installed via 
"pip" that's why wasn't updated with the rest ceph packages) but now it 
complaints that keys are missing


$ ceph-deploy mgr create controller
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/user/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy mgr 
create controller

[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  mgr   : 
[('controller', 'controller')]

[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  subcommand: create
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 


[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  func  : mgr at 0x1cce500>

[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.mgr][DEBUG ] Deploying mgr, cluster ceph hosts 
controller:controller
[ceph_deploy][ERROR ] RuntimeError: bootstrap-mgr keyring not found; 
run 'gatherkeys'



and I cannot get the keys...



$ ceph-deploy gatherkeys controller
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/user/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.0): /usr/bin/ceph-deploy 
gatherkeys controller

[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 


[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  mon   : 
['controller']
[ceph_deploy.cli][INFO  ]  func  : gatherkeys at 0x198b2a8>

[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.gatherkeys][INFO  ] Storing keys in temp directory 
/tmp/tmpPQ895t

[controller][DEBUG ] connection detected need for sudo
[controller][DEBUG ] connected to host: controller
[controller][DEBUG ] detect platform information from remote host
[controller][DEBUG ] detect machine type
[controller][DEBUG ] get remote short hostname
[controller][DEBUG ] fetch remote file
[ceph_deploy.gatherkeys][WARNIN] No mon key found in host: controller
[ceph_deploy.gatherkeys][ERROR ] Failed to connect to host:controller
[ceph_deploy.gatherkeys][INFO  ] Destroy temp directory /tmp/tmpPQ895t
[ceph_deploy][ERROR ] RuntimeError: Failed to connect any mon





On Wed, Feb 28, 2018 at 5:21 PM, Georgios Dimitrakakis
 wrote:

All,

I have updated my test ceph cluster from Jewel (10.2.10) to Luminous
(12.2.4) using CentOS packages.

I have updated all packages, restarted all services with the proper 
order

but I get a warning that the Manager Daemon doesn't exist.

Here is the output:

# ceph -s
  cluster:
id: d357a551-5b7a-4501-8d8f-009c63b2c972
health: HEALTH_WARN
no active mgr

  services:
mon: 1 daemons, quorum controller
mgr: no daemons active
osd: 2 osds: 2 up, 2 in

  data:
pools:   0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage:   0 kB used, 0 kB / 0 kB avail
pgs:


While at the same time the system service is up and running

# systemctl status ceph-mgr.target
● ceph-mgr.target - ceph target allowing to start/stop all 
ceph-mgr@.service

instances at once
   Loaded: loaded (/usr/lib/systemd/system/ceph-mgr.target; enabled; 
vendor

preset: enabled)
   Active: active since Wed 2018-02-28 18:57:13 EET; 12min ago


I understand that I have to add a new MGR but when I try to do it 
via

"ceph-deploy" it fails with the following error:


# ceph-deploy mgr create controller
usage: ceph-deploy [-h] [-v | -q] [--version] [--username USERNAME]
   [--overwrite-conf] [--cluster NAME] [--ceph-conf
CEPH_CONF]
   COMMAND ...
ceph-deploy: error: argument COMMAND: invalid choice: 'mgr' (choose 
from
'new', 'install', 'rgw', 'mon', 'mds', 'gatherkeys', 'disk', 'osd', 
'admin',

'repo', 'config', 'uninstall', 'purge', 'purgedata', 'calamari',
'forgetkeys', 'pkg')


You probably have an older version of ceph-deploy, from before it knew
how to create mgr daemons.

John




where "controller" is the node where ceph monitor is already 
running.



Any ideas why I cannot do it via "ceph-deploy" and what do I have to 
do to

have it back in a healthy state?


I am running CentOS 7.4.1708 (Core).

Best,

G.


Re: [ceph-users] Cannot Create MGR

2018-02-28 Thread John Spray
On Wed, Feb 28, 2018 at 5:21 PM, Georgios Dimitrakakis
 wrote:
> All,
>
> I have updated my test ceph cluster from Jewel (10.2.10) to Luminous
> (12.2.4) using CentOS packages.
>
> I have updated all packages, restarted all services with the proper order
> but I get a warning that the Manager Daemon doesn't exist.
>
> Here is the output:
>
> # ceph -s
>   cluster:
> id: d357a551-5b7a-4501-8d8f-009c63b2c972
> health: HEALTH_WARN
> no active mgr
>
>   services:
> mon: 1 daemons, quorum controller
> mgr: no daemons active
> osd: 2 osds: 2 up, 2 in
>
>   data:
> pools:   0 pools, 0 pgs
> objects: 0 objects, 0 bytes
> usage:   0 kB used, 0 kB / 0 kB avail
> pgs:
>
>
> While at the same time the system service is up and running
>
> # systemctl status ceph-mgr.target
> ● ceph-mgr.target - ceph target allowing to start/stop all ceph-mgr@.service
> instances at once
>Loaded: loaded (/usr/lib/systemd/system/ceph-mgr.target; enabled; vendor
> preset: enabled)
>Active: active since Wed 2018-02-28 18:57:13 EET; 12min ago
>
>
> I understand that I have to add a new MGR but when I try to do it via
> "ceph-deploy" it fails with the following error:
>
>
> # ceph-deploy mgr create controller
> usage: ceph-deploy [-h] [-v | -q] [--version] [--username USERNAME]
>[--overwrite-conf] [--cluster NAME] [--ceph-conf
> CEPH_CONF]
>COMMAND ...
> ceph-deploy: error: argument COMMAND: invalid choice: 'mgr' (choose from
> 'new', 'install', 'rgw', 'mon', 'mds', 'gatherkeys', 'disk', 'osd', 'admin',
> 'repo', 'config', 'uninstall', 'purge', 'purgedata', 'calamari',
> 'forgetkeys', 'pkg')

You probably have an older version of ceph-deploy, from before it knew
how to create mgr daemons.

John
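
If updating ceph-deploy right away isn't convenient, the mgr can also be
created by hand - roughly like this (untested as typed here; it assumes the
default cluster name "ceph" and that the daemon should run on "controller"):

# ceph auth get-or-create mgr.controller mon 'allow profile mgr' osd 'allow *' mds 'allow *'
# mkdir -p /var/lib/ceph/mgr/ceph-controller
# ceph auth get mgr.controller -o /var/lib/ceph/mgr/ceph-controller/keyring
# chown -R ceph:ceph /var/lib/ceph/mgr/ceph-controller
# systemctl enable ceph-mgr@controller
# systemctl start ceph-mgr@controller

Otherwise, a newer ceph-deploy (from the ceph repo or pip) understands
"ceph-deploy mgr create controller".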

>
>
> where "controller" is the node where ceph monitor is already running.
>
>
> Any ideas why I cannot do it via "ceph-deploy" and what do I have to do to
> have it back in a healthy state?
>
>
> I am running CentOS 7.4.1708 (Core).
>
> Best,
>
> G.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mirror OSD configuration

2018-02-28 Thread Gregory Farnum
On Wed, Feb 28, 2018 at 3:02 AM Zoran Bošnjak <
zoran.bosn...@sloveniacontrol.si> wrote:

> I am aware of monitor consensus requirement. It is taken care of (there is
> a third room with only monitor node). My problem is about OSD redundancy,
> since I can only use 2 server rooms for OSDs.
>
> I could use EC-pools, lrc or any other ceph configuration. But I could not
> find a configuration that would address the issue. The write acknowledge
> rule should read something like this:
> 1. If both rooms are "up", do not acknowledge write until ack is received
> from both rooms.
> 2. If only one room is "up" (forget rule 1.) acknowledge write on the
> first ack.


This check is performed when PGs go active, not on every write, (once a PG
goes active, it needs a commit from everybody in the set before writes are
done, or else to go through peering again) but that is the standard
behavior for Ceph if you configure CRUSH to place data redundantly in both
rooms.
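
For reference, a two-room layout with two copies per room is usually expressed
in the decompiled crushmap roughly like this (rule name, id and root are
placeholders; pair it with size=4, min_size=2 on the pool):

rule replicated_two_rooms {
        id 1
        type replicated
        min_size 2
        max_size 4
        step take default
        step choose firstn 2 type room
        step chooseleaf firstn 2 type host
        step emit
}

# compile and inject the edited map, e.g.:
# crushtool -c crushmap.txt -o crushmap.bin && ceph osd setcrushmap -i crushmap.bin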


>
> The ceph documentation talks about recursively defined locality sets, so I
> assume it allows for different rules on room/rack/host... levels.
> But as far as I can see, it can not depend on "room" availability.
>
> Is this possible to configure?
> I would appreciate example configuration commands.
>
> regards,
> Zoran
>
> 
> From: Eino Tuominen 
> Sent: Wednesday, February 28, 2018 8:47 AM
> To: Zoran Bošnjak; ceph-us...@ceph.com
> Subject: Re: mirror OSD configuration
>
> > Is it possible to configure crush map such that it will tolerate "room"
> failure? In my case, there is one
> > network switch per room and one power supply per room, which makes a
> single point of (room) failure.
>
> Hi,
>
> You cannot achieve real room redundancy with just two rooms. At minimum
> you'll need a third room (witness) from which you'll need independent
> network connections to the two server rooms. Otherwise it's impossible to
> have monitor quorum when one of the two rooms fails. And then you'd need to
> consider osd redundancy. You could do with replica size = 4, min_size = 2
> (or any min_size = n, size = 2*n ), but that's not perfect as you lose
> exactly half of the replicas in case of a room failure. If you were able to
> use EC-pools you'd have more options with LRC coding (
> http://docs.ceph.com/docs/master/rados/operations/erasure-code-lrc/).
>
> We run ceph in a 3 room configuration with 3 monitors, size=3, min_size=2.
> It works, but it's not without hassle either.
>
> --
>   Eino Tuominen
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cannot Create MGR

2018-02-28 Thread Georgios Dimitrakakis

All,

I have updated my test ceph cluster from Jewel (10.2.10) to Luminous 
(12.2.4) using CentOS packages.


I have updated all packages and restarted all services in the proper 
order, but I get a warning that the Manager Daemon doesn't exist.


Here is the output:

# ceph -s
  cluster:
id: d357a551-5b7a-4501-8d8f-009c63b2c972
health: HEALTH_WARN
no active mgr

  services:
mon: 1 daemons, quorum controller
mgr: no daemons active
osd: 2 osds: 2 up, 2 in

  data:
pools:   0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage:   0 kB used, 0 kB / 0 kB avail
pgs:


While at the same time the system service is up and running

# systemctl status ceph-mgr.target
● ceph-mgr.target - ceph target allowing to start/stop all 
ceph-mgr@.service instances at once
   Loaded: loaded (/usr/lib/systemd/system/ceph-mgr.target; enabled; 
vendor preset: enabled)

   Active: active since Wed 2018-02-28 18:57:13 EET; 12min ago


I understand that I have to add a new MGR but when I try to do it via 
"ceph-deploy" it fails with the following error:



# ceph-deploy mgr create controller
usage: ceph-deploy [-h] [-v | -q] [--version] [--username USERNAME]
   [--overwrite-conf] [--cluster NAME] [--ceph-conf 
CEPH_CONF]

   COMMAND ...
ceph-deploy: error: argument COMMAND: invalid choice: 'mgr' (choose 
from 'new', 'install', 'rgw', 'mon', 'mds', 'gatherkeys', 'disk', 'osd', 
'admin', 'repo', 'config', 'uninstall', 'purge', 'purgedata', 
'calamari', 'forgetkeys', 'pkg')



where "controller" is the node where ceph monitor is already running.


Any ideas why I cannot do it via "ceph-deploy" and what do I have to do 
to have it back in a healthy state?



I am running CentOS 7.4.1708 (Core).

Best,

G.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Sizing your MON storage with a large cluster

2018-02-28 Thread Dan van der Ster
Hi Wido,

Are your mon's using rocksdb or still leveldb?

Are your mon stores trimming back to a small size after HEALTH_OK was restored?

One v12.2.2 cluster here just started showing the "is using a lot of
disk space" warning on one of our mons. In fact all three mons are now
using >16GB. I tried compacting and resyncing an empty mon but those
don't trim anything -- there really is 16GB of data mon store for this
healthy cluster.

(The mon's on this cluster were using ~560MB before updating to
luminous back in December.)

Any thoughts?

Cheers, Dan
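
For anyone comparing numbers, something like this shows the backend and the
store size on a mon host (paths assume the default layout and cluster name
"ceph"; the kv_backend file may be absent on mons created before Luminous):

# cat /var/lib/ceph/mon/ceph-$(hostname -s)/kv_backend
# du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db
# ceph tell mon.$(hostname -s) compact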


On Sat, Feb 3, 2018 at 4:50 PM, Wido den Hollander  wrote:
> Hi,
>
> I just wanted to inform people about the fact that Monitor databases can
> grow quite big when you have a large cluster which is performing a very long
> rebalance.
>
> I'm posting this on ceph-users and ceph-large as it applies to both, but
> you'll see this sooner on a cluster with a lof of OSDs.
>
> Some information:
>
> - Version: Luminous 12.2.2
> - Number of OSDs: 2175
> - Data used: ~2PB
>
> We are in the middle of migrating from FileStore to BlueStore and this is
> causing a lot of PGs to backfill at the moment:
>
>  33488 active+clean
>  4802  active+undersized+degraded+remapped+backfill_wait
>  1670  active+remapped+backfill_wait
>  263   active+undersized+degraded+remapped+backfilling
>  250   active+recovery_wait+degraded
>  54active+recovery_wait+degraded+remapped
>  27active+remapped+backfilling
>  13active+recovery_wait+undersized+degraded+remapped
>  2 active+recovering+degraded
>
> This has been running for a few days now and it has caused this warning:
>
> MON_DISK_BIG mons
> srv-zmb03-05,srv-zmb04-05,srv-zmb05-05,srv-zmb06-05,srv-zmb07-05 are using a
> lot of disk space
> mon.srv-zmb03-05 is 31666 MB >= mon_data_size_warn (15360 MB)
> mon.srv-zmb04-05 is 31670 MB >= mon_data_size_warn (15360 MB)
> mon.srv-zmb05-05 is 31670 MB >= mon_data_size_warn (15360 MB)
> mon.srv-zmb06-05 is 31897 MB >= mon_data_size_warn (15360 MB)
> mon.srv-zmb07-05 is 31891 MB >= mon_data_size_warn (15360 MB)
>
> This is to be expected as MONs do not trim their store if one or more PGs is
> not active+clean.
>
> In this case we expected this and the MONs are each running on a 1TB Intel
> DC-series SSD to make sure we do not run out of space before the backfill
> finishes.
>
> The cluster is spread out over racks and in CRUSH we replicate over racks.
> Rack by rack we are wiping/destroying the OSDs and bringing them back as
> BlueStore OSDs and letting the backfill handle everything.
>
> In between we wait for the cluster to become HEALTH_OK (all PGs
> active+clean) so that the Monitors can trim their database before we start
> with the next rack.
>
> I just want to warn and inform people about this. Under normal circumstances
> a MON database isn't that big, but if you have a very long period of
> backfills/recoveries and also have a large number of OSDs you'll see the DB
> grow quite big.
>
> This has improved significantly going to Jewel and Luminous, but it is still
> something to watch out for.
>
> Make sure your MONs have enough free space to handle this!
>
> Wido
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread David Turner
My thought is that in 4 years you could have migrated to a hypervisor that
will have better performance into ceph than an added iSCSI layer. I won't
deploy VMs for ceph on anything that won't allow librbd to work. Anything
else is added complexity and reduced performance.

On Wed, Feb 28, 2018, 11:49 AM Jason Dillaman  wrote:

> On Wed, Feb 28, 2018 at 9:17 AM, Max Cuttins  wrote:
> > Sorry for being rude Ross,
> >
> > I follow Ceph since 2014 waiting for iSCSI support in order to use it
> with
> > Xen.
>
> What OS are you using in Dom0 that you cannot just directly use krbd?
> iSCSI is going to add an extra hop so it will never be able to match
> the performance of something that is directly talking to the OSDs.
>
> > When finally it seemds it was implemented the OS requirements are
> > irrealistic.
> > Seems a bad prank. 4 year waiting for this... and still not true support
> > yet.
> >
> >
> >
> >
> >
> > Il 28/02/2018 14:11, Marc Roos ha scritto:
> >>
> >>   Hi Massimiliano, have an espresso. You know the indians have a nice
> >> saying
> >>
> >> "Everything will be good at the end. If it is not good, it is still not
> >> the end."
> >>
> >>
> >>
> >> -Original Message-
> >> From: Massimiliano Cuttini [mailto:m...@phoenixweb.it]
> >> Sent: woensdag 28 februari 2018 13:53
> >> To: ceph-users@lists.ceph.com
> >> Subject: [ceph-users] Ceph iSCSI is a prank?
> >>
> >> I was building ceph in order to use with iSCSI.
> >> But I just see from the docs that need:
> >>
> >> CentOS 7.5
> >> (which is not available yet, it's still at 7.4)
> >> https://wiki.centos.org/Download
> >>
> >> Kernel 4.17
> >> (which is not available yet, it is still at 4.15.7)
> >> https://www.kernel.org/
> >>
> >> So I guess, there is no ufficial support and this is just a bad prank.
> >>
> >> Ceph is ready to be used with S3 since many years.
> >> But need the kernel of the next century to works with such an old
> >> technology like iSCSI.
> >> So sad.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-Fuse and mount namespaces

2018-02-28 Thread Oliver Freyermuth
Am 28.02.2018 um 16:09 schrieb Patrick Donnelly:
> On Tue, Feb 27, 2018 at 3:27 PM, Oliver Freyermuth
>  wrote:
>> As you can see:
>> - Name collision for admin socket, since the helper is already running.
> 
> You can change the admin socket path using the `admin socket` config
> variable. Use metavariables [1] to make the path unique.

Thanks for the link - did not know these yet :-). 

> 
>> - A second helper for the same mountpoint was fired up!
> 
> This is expected. If you want a single ceph-fuse mount then you need
> to persist the mount in the host namespace somewhere (using bind
> mounts) so you can reuse it. 

I still have to wonder how ntfs-3g does it - I don't see a persisted mount on 
the host,
i.e. nothing in /etc/mtab or with "mount" after the "umount" on the host. 
Still, when I try "mount" again on the host, the existing helper which is 
taking care of the child namespace
"adopts" the mount in the host namespace instead of spawning a new helper. 

My best guess (without any knowledge about Fuse...) would be that the new mount 
request shortly spawns a new helper,
which finds the old helper (via an admin socket, SHM or whatever - or maybe 
Fuse offers enumeration of other helpers?), 
and tells it to take care of the new mount, before it exits quickly. 

> However, mind what David Turner said
> regarding using a single ceph-fuse client for multiple containers.
> Right now parallel requests are not handled well in the client so it
> can be slow for multiple applications (or containers). Another option
> is to use a kernel mount which would be more performant and also allow
> parallel requests.

The kernel mount is sadly out - we need quota support :-(. 

Since we are currently network bandwidth limited (in huge file case) and MDS 
limited (in metadata ops / small file case), 
I expect the gains would be small (and come with a cost of a factor 28 of 
memory for ceph-fuse on the client machines,
and likely even higher MDS load since there are more clients). 
If I find some way to do this unprivileged, I'll give it a try, so many thanks 
for all the info! 

Cheers,
Oliver

> 
>> - On a side-note, once I exit the container (and hence close the mount 
>> namespace), the "old" helper is finally freed.
> 
> Once the last mount point is unmounted, FUSE will destroy the userspace 
> helper.
> 
> [1] 
> http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/?highlight=configuration#metavariables
> 



smime.p7s
Description: S/MIME Cryptographic Signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread Jason Dillaman
On Wed, Feb 28, 2018 at 9:17 AM, Max Cuttins  wrote:
> Sorry for being rude Ross,
>
> I follow Ceph since 2014 waiting for iSCSI support in order to use it with
> Xen.

What OS are you using in Dom0 that you cannot just directly use krbd?
iSCSI is going to add an extra hop so it will never be able to match
the performance of something that is directly talking to the OSDs.
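
(For comparison, the krbd path is just something along these lines - pool and
image names are made up, and older kernels may need the newer image features
disabled first:)

# rbd create rbd/xen-disk01 --size 100G
# rbd feature disable rbd/xen-disk01 object-map fast-diff deep-flatten
# rbd map rbd/xen-disk01 --id admin
/dev/rbd0
# rbd showmapped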

> When finally it seemds it was implemented the OS requirements are
> irrealistic.
> Seems a bad prank. 4 year waiting for this... and still not true support
> yet.
>
>
>
>
>
> Il 28/02/2018 14:11, Marc Roos ha scritto:
>>
>>   Hi Massimiliano, have an espresso. You know the indians have a nice
>> saying
>>
>> "Everything will be good at the end. If it is not good, it is still not
>> the end."
>>
>>
>>
>> -Original Message-
>> From: Massimiliano Cuttini [mailto:m...@phoenixweb.it]
>> Sent: woensdag 28 februari 2018 13:53
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] Ceph iSCSI is a prank?
>>
>> I was building ceph in order to use with iSCSI.
>> But I just see from the docs that need:
>>
>> CentOS 7.5
>> (which is not available yet, it's still at 7.4)
>> https://wiki.centos.org/Download
>>
>> Kernel 4.17
>> (which is not available yet, it is still at 4.15.7)
>> https://www.kernel.org/
>>
>> So I guess, there is no ufficial support and this is just a bad prank.
>>
>> Ceph is ready to be used with S3 since many years.
>> But need the kernel of the next century to works with such an old
>> technology like iSCSI.
>> So sad.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Slow clients after git pull

2018-02-28 Thread Daniel Carrasco
Hello,

I've created a Ceph cluster with 3 nodes and a FS to serve a webpage. The
webpage speed is good enough (close to NFS speed), and it has HA if one FS
server dies.
My problem comes when I deploy a git repository on that FS. The server
does a lot of IOPS to check which files have to be updated, and then all
clients start to have problems using the FS (it becomes much slower).
Under normal usage the web takes about 400ms to load, and when the problem
starts it takes more than 3s. To fix the problem I just have to remount the
FS on the clients, but I can't remount the FS on every deploy...

While it is deploying I see the CPU usage on the MDS go a bit higher, but when
it ends the CPU usage goes down again, so it doesn't look like a CPU problem.
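
(For debugging, the MDS view of the clients can be watched while a deploy
runs, e.g. - the daemon name below is a placeholder for whatever the local
MDS is called:)

ceph daemon mds.<name> session ls            # caps held per client
ceph daemon mds.<name> perf dump mds         # inode/caps/request counters
ceph daemon mds.<name> dump_ops_in_flight    # stuck or slow requests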

My config file is:
[global]
fsid = bf56854..e611c08
mon_initial_members = fs-01, fs-02, fs-03
mon_host = 10.50.0.94,10.50.1.216,10.50.2.52
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

public network = 10.50.0.0/22
osd pool default size = 3

##
### OSD
##
[osd]
  osd_mon_heartbeat_interval = 5
  osd_mon_report_interval_max = 10
  osd_heartbeat_grace = 15
  osd_fast_fail_on_connection_refused = True
  osd_pool_default_pg_num = 128
  osd_pool_default_pgp_num = 128
  osd_pool_default_size = 2
  osd_pool_default_min_size = 2

##
### Monitores
##
[mon]
  mon_osd_min_down_reporters = 1

##
### MDS
##
[mds]
  mds_cache_memory_limit = 792723456
  mds_bal_mode = 1

##
### Client
##
[client]
  client_cache_size = 32768
  client_mount_timeout = 30
  client_oc_max_objects = 2000
  client_oc_size = 629145600
  client_permissions = false
  rbd_cache = true
  rbd_cache_size = 671088640

My cluster and clients use Debian 9 with the latest ceph version (12.2.4). The
clients use the kernel module to mount the share, because it is a bit faster
than the fuse module. The deploy is done on one of the Ceph nodes, which has
the FS mounted via the kernel module too.
My cluster is not a high-usage cluster, so all daemons run on each machine
(3 machines with OSD, MON, MGR and MDS). Every OSD has a copy of the data,
only one MGR is active, and two of the MDS are active with one on standby.
The clients mount the FS using the three MDS IP addresses, and right now they
don't get any requests because the site is not published yet.

Does anyone know what could be happening? Everything works fine (even on
another cluster I built with a high load), but as soon as I deploy the git
repository everything starts to work very slowly.

Thanks!!


-- 
_

  Daniel Carrasco Marín
  Ingeniería para la Innovación i2TIC, S.L.
  Tlf:  +34 911 12 32 84 Ext: 223
  www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread Nico Schottelius

Max,

I understand your frustration.
However, last time I checked, ceph was open source.

Some of you might not remember, but one major reason why open source is
great is that YOU CAN DO your own modifications.

If you need a change like iSCSI support and it isn't there,
it is probably best, if you implement it.

Even if a lot of people are voluntarily contributing to open source
and even if there is a company behind ceph as a product, there
is no right for a feature.

Best,

Nico

p.s.: If your answer is "I don't have experience to implement it" then
my answer will be "hire somebody" and if your answer is "I don't have the
money", my answer is "You don't have the resource to have that feature".
(from: the book of reality)

Max Cuttins  writes:

> Sorry for being rude Ross,
>
> I follow Ceph since 2014 waiting for iSCSI support in order to use it
> with Xen.
> When finally it seemds it was implemented the OS requirements are
> irrealistic.
> Seems a bad prank. 4 year waiting for this... and still not true support
> yet.

--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous v12.2.4 released

2018-02-28 Thread Abhishek Lekshmanan

This is the fourth bugfix release of Luminous v12.2.x long term stable
release series. This was primarily intended to fix a few build,
ceph-volume/ceph-disk issues from 12.2.3 and a few RGW issues. We
recommend all the users of 12.2.x series to update. A full changelog is
also published at the official release blog at
https://ceph.com/releases/v12-2-4-luminous-released/

Notable Changes
---
* cmake: check bootstrap.sh instead before downloading boost (issue#23071, 
pr#20515, Kefu Chai)
* core: Backport of cache manipulation: issues #22603 and #22604 (issue#22604, 
issue#22603, pr#20353, Adam C. Emerson)
* core: Snapset inconsistency is detected with its own error (issue#22996, 
pr#20501, David Zafman)
* tools: ceph-objectstore-tool: "$OBJ get-omaphdr" and "$OBJ list-omap" scan 
all pgs instead of using specific pg (issue#21327, pr#20283, David Zafman)
* ceph-volume: warn on mix of filestore and bluestore flags (issue#23003, 
pr#20568, Alfredo Deza)
* ceph-volume: adds support to zap encrypted devices (issue#22878, pr#20545, 
Andrew Schoen)
* ceph-volume: log the current running command for easier debugging 
(issue#23004, pr#20597, Andrew Schoen)
* core: last-stat-seq returns 0 because osd stats are cleared (issue#23093, 
pr#20548, Sage Weil, David Zafman)
* rgw:  make init env methods return an error (issue#23039, pr#20564, Abhishek 
Lekshmanan)
* rgw: URL-decode S3 and Swift object-copy URLs (issue#22121, issue#22729, 
pr#20236, Malcolm Lee, Matt Benjamin)
* rgw: parse old rgw_obj with namespace correctly (issue#22982, pr#20566, 
Yehuda Sadeh)
* rgw: return valid Location element, CompleteMultipartUpload (issue#22655, 
pr#20266, Matt Benjamin)
* rgw: use explicit index pool placement (issue#22928, pr#20565, Yehuda Sadeh)
* tools: ceph-disk: v12.2.2 unable to create bluestore osd using ceph-disk 
(issue#22354, pr#20563, Kefu Chai)

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-12.2.4.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* For ceph-deploy, see 
http://docs.ceph.com/docs/master/install/install-ceph-deploy
* Release git sha1: 52085d5249a80c5f5121a76d6288429f35e4e77b
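
Once packages are updated and daemons restarted, the rollout can be
double-checked with (assumes Luminous's `ceph versions` command):

# ceph versions        # every daemon should report 12.2.4
# ceph health detail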

-- 
Abhishek Lekshmanan
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread Jason Dillaman
On Wed, Feb 28, 2018 at 10:06 AM, Max Cuttins  wrote:
>
>
> Il 28/02/2018 15:19, Jason Dillaman ha scritto:
>>
>> On Wed, Feb 28, 2018 at 7:53 AM, Massimiliano Cuttini 
>> wrote:
>>>
>>> I was building ceph in order to use with iSCSI.
>>> But I just see from the docs that need:
>>>
>>> CentOS 7.5
>>> (which is not available yet, it's still at 7.4)
>>> https://wiki.centos.org/Download
>>>
>>> Kernel 4.17
>>> (which is not available yet, it is still at 4.15.7)
>>> https://www.kernel.org/
>>
>> The necessary kernel changes actually are included as part of 4.16-rc1
>> which is available now. We also offer a pre-built test kernel with the
>> necessary fixes here [1].
>
> This is a release candidate and it's not ready for production.
> Does anybody know when the kernel 4.16 will be ready for production?
>
>
>>
>>> So I guess, there is no ufficial support and this is just a bad prank.
>>>
>>> Ceph is ready to be used with S3 since many years.
>>> But need the kernel of the next century to works with such an old
>>> technology
>>> like iSCSI.
>>> So sad.
>>
>> Unfortunately, kernel vs userspace have very different development
>> timelines.We have no interest in maintaining out-of-tree patchsets to
>> the kernel.
>
>
> This is true, but having something that just works in order to have minimum
> compatibility and start to dismiss old disk is something you should think
> about.
> You'll have ages in order to improve and get better performance. But you
> should allow Users to cut-off old solutions as soon as possible while
> waiting for a better implementation.

That's exactly what is included in the kernel changes -- changes
required to stabilize LIO iSCSI with RBD (specifically in a clustered
environment). You have been able to use LIO, TGT, SCST, SPDK,  with various levels of
capabilities for a while now. There are plenty of performance changes
that are still required along with additional features like support
for SCSI persistent group reservations. Most important of all,
remember that this is a *free* open source project, so it might be
recommended to set your demands and expectations accordingly.

>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>> [1] https://shaman.ceph.com/repos/kernel/ceph-iscsi-test/
>>
>

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] reweight-by-utilization reverse weight after adding new nodes?

2018-02-28 Thread Martin Palma
Thank you for your recommendation.
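
For the archives, the dry-run variant is handy to preview what another round
would do before committing to it (the arguments below are just example
thresholds: overload percentage, max weight change, max number of OSDs to
touch):

# ceph osd test-reweight-by-utilization 120 0.05 20
# ceph osd reweight-by-utilization 120 0.05 20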

On Mon, Feb 26, 2018 at 5:03 PM, David Turner  wrote:
> I would recommend continuing from where you are now and running `ceph osd
> reweight-by-utilization` again.  Your weights might be a little more odd,
> but your data distribution should be the same.  If you were to reset the
> weights for the previous OSDs, you would only incur an additional round of
> reweighting for no discernible benefit.
>
> On Mon, Feb 26, 2018 at 7:13 AM Martin Palma  wrote:
>>
>> Hello,
>>
>> from some OSDs in our cluster we got the "nearfull" warning message so
>> we run the "ceph osd reweight-by-utilization" command to better
>> distribute the data.
>>
>> Now we have expanded out cluster with new nodes should we reverse the
>> weight of the changed OSDs to 1.0?
>>
>> Best,
>> Martin
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread Erik McCormick
On Feb 28, 2018 10:06 AM, "Max Cuttins"  wrote:



Il 28/02/2018 15:19, Jason Dillaman ha scritto:

> On Wed, Feb 28, 2018 at 7:53 AM, Massimiliano Cuttini 
> wrote:
>
>> I was building ceph in order to use with iSCSI.
>> But I just see from the docs that need:
>>
>> CentOS 7.5
>> (which is not available yet, it's still at 7.4)
>> https://wiki.centos.org/Download
>>
>> Kernel 4.17
>> (which is not available yet, it is still at 4.15.7)
>> https://www.kernel.org/
>>
> The necessary kernel changes actually are included as part of 4.16-rc1
> which is available now. We also offer a pre-built test kernel with the
> necessary fixes here [1].
>
This is a release candidate and it's not ready for production.
Does anybody know when the kernel 4.16 will be ready for production?


Release date is late March / early April.





> So I guess, there is no ufficial support and this is just a bad prank.
>>
>> Ceph is ready to be used with S3 since many years.
>> But need the kernel of the next century to works with such an old
>> technology
>> like iSCSI.
>> So sad.
>>
> Unfortunately, kernel vs userspace have very different development
> timelines.We have no interest in maintaining out-of-tree patchsets to
> the kernel.
>

This is true, but having something that just works in order to have minimum
compatibility and start to dismiss old disk is something you should think
about.
You'll have ages in order to improve and get better performance. But you
should allow Users to cut-off old solutions as soon as possible while
waiting for a better implementation.



>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> [1] https://shaman.ceph.com/repos/kernel/ceph-iscsi-test/
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Memory leak in Ceph OSD?

2018-02-28 Thread Igor Fedotov

Hi Stefan,

can you disable compression and check if memory is still leaking.

If it stops then the issue is definitely somewhere along the "compress" 
path.
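
(i.e. presumably just reverting the pool option mentioned below, something
like:)

ceph osd pool set $pool compression_mode none

While at it, comparing the allocator's own numbers with the mempool dump may
show whether it is heap bloat or a real leak, e.g.:

ceph tell osd.<id> heap stats        # needs tcmalloc
ceph daemon osd.<id> dump_mempools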



Thanks,

Igor


On 2/28/2018 6:18 PM, Stefan Kooman wrote:

Hi,

TL;DR: we see "used" memory grows indefinitely on our OSD servers.
Until the point that either 1) an OSD process gets killed by the OOM killer,
or 2) an OSD aborts (probably because malloc cannot provide more RAM). I
suspect a memory leak in the OSDs.

We were running 12.2.2. We are now running 12.2.3. Replicated setup,
SIZE=3, MIN_SIZE=2. All servers were rebooted. The "used" memory is
slowly, but steadily growing.

ceph.conf:
bluestore_cache_size=6G

ceph daemon osd.$daemon dump_mempools info gives:

 "total": {
 "items": 52925781,
 "bytes": 6058227868

... for roughly all OSDs. So the OSD process is not "exceeding" what it
*thinks* it's using.

We haven't noticed this during the "pre-production" phase of the cluster. Main
difference with "pre-production" and "production" is that we are using
"compression" on the pool.

ceph osd pool set $pool compression_algorithm snappy
ceph osd pool set $pool compression_mode aggressive

I haven't seen any of you complaining about memory leaks besides the well-known
leak in 12.2.1. How many of you are using compression like this? If it has
anything to do with this at all ...

Currently at ~ 60 GB used with 2 days uptime. 42 GB of RAM usage for all OSDs
... 18 GB leaked?

If Ceph keeps releasing minor versions so quickly it will never really become a
big problem ;-).

Any hints to analyse this issue?

Gr. Stefan






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Memory leak in Ceph OSD?

2018-02-28 Thread Stefan Kooman
Hi,

TL;DR: we see "used" memory grows indefinitely on our OSD servers.
Until the point that either 1) an OSD process gets killed by the OOM killer,
or 2) an OSD aborts (probably because malloc cannot provide more RAM). I
suspect a memory leak in the OSDs.

We were running 12.2.2. We are now running 12.2.3. Replicated setup,
SIZE=3, MIN_SIZE=2. All servers were rebooted. The "used" memory is
slowly, but steadily growing.

ceph.conf:
bluestore_cache_size=6G

ceph daemon osd.$daemon dump_mempools info gives:

"total": {
"items": 52925781,
"bytes": 6058227868

... for roughly all OSDs. So the OSD process is not "exceeding" what it
*thinks* it's using.

We haven't noticed this during the "pre-production" phase of the cluster. Main
difference with "pre-production" and "production" is that we are using
"compression" on the pool.

ceph osd pool set $pool compression_algorithm snappy
ceph osd pool set $pool compression_mode aggressive

I haven't seen any of you complaining about memory leaks besides the well-known
leak in 12.2.1. How many of you are using compression like this? If it has
anything to do with this at all ...

Currently at ~ 60 GB used with 2 days uptime. 42 GB of RAM usage for all OSDs
... 18 GB leaked?

If Ceph keeps releasing minor versions so quickly it will never really become a
big problem ;-).

Any hints to analyse this issue?

Gr. Stefan




-- 
| BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-Fuse and mount namespaces

2018-02-28 Thread Patrick Donnelly
On Tue, Feb 27, 2018 at 3:27 PM, Oliver Freyermuth
 wrote:
> As you can see:
> - Name collision for admin socket, since the helper is already running.

You can change the admin socket path using the `admin socket` config
variable. Use metavariables [1] to make the path unique.
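
For example, something along these lines in ceph.conf should keep concurrent
helpers from colliding (just a sketch - $pid makes the path unique per
process):

[client]
    admin socket = /var/run/ceph/$cluster-$name.$pid.asok
    log file = /var/log/ceph/$cluster-$name.$pid.log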

> - A second helper for the same mountpoint was fired up!

This is expected. If you want a single ceph-fuse mount then you need
to persist the mount in the host namespace somewhere (using bind
mounts) so you can reuse it. However, mind what David Turner said
regarding using a single ceph-fuse client for multiple containers.
Right now parallel requests are not handled well in the client so it
can be slow for multiple applications (or containers). Another option
is to use a kernel mount which would be more performant and also allow
parallel requests.

> - On a side-note, once I exit the container (and hence close the mount 
> namespace), the "old" helper is finally freed.

Once the last mount point is unmounted, FUSE will destroy the userspace helper.

[1] 
http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/?highlight=configuration#metavariables

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread Max Cuttins



Il 28/02/2018 15:19, Jason Dillaman ha scritto:

On Wed, Feb 28, 2018 at 7:53 AM, Massimiliano Cuttini  
wrote:

I was building ceph in order to use with iSCSI.
But I just see from the docs that need:

CentOS 7.5
(which is not available yet, it's still at 7.4)
https://wiki.centos.org/Download

Kernel 4.17
(which is not available yet, it is still at 4.15.7)
https://www.kernel.org/

The necessary kernel changes actually are included as part of 4.16-rc1
which is available now. We also offer a pre-built test kernel with the
necessary fixes here [1].

This is a release candidate and it's not ready for production.
Does anybody know when the kernel 4.16 will be ready for production?





So I guess, there is no ufficial support and this is just a bad prank.

Ceph is ready to be used with S3 since many years.
But need the kernel of the next century to works with such an old technology
like iSCSI.
So sad.

Unfortunately, kernel vs userspace have very different development
timelines.We have no interest in maintaining out-of-tree patchsets to
the kernel.


This is true, but having something that just works, so that users get 
minimum compatibility and can start to retire old disks, is something you 
should think about.
You'll have plenty of time to improve and get better performance. But you 
should allow users to cut off old solutions as soon as possible while 
waiting for a better implementation.





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[1] https://shaman.ceph.com/repos/kernel/ceph-iscsi-test/



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increase recovery / backfilling speed (with many small objects)

2018-02-28 Thread Stefan Kooman
Hi,

We recently learned on this list about the "rotational_journal = 1" for
some (all?) NVMe / SSD setups. We also hit this issue (see below). It
would eventually take a week to recover ... This was all "scratch data"
so didn't matter anyway. We recently had to do some recovery /
backfilling on our OSD nodes. Only large objects were stored now (rbd
chunks) so recovery speed was already much better. Still we had to crank
osd_max_backfills to 6, and osd_recovery_max_active to 3 to get some more
recovery performance. TL;DR: we set osd_recovery_sleep_hdd to 0 as well
as osd_recovery_sleep_hybrid to 0 and had another node recover. Already
with default recovery settings performance was much better. With
recovery / backfills set to 3, recovery went really fast. See [1] for
some "before / after" impression. Max throughput was around 1800 MB/s,
each OSD doing some 5K writes. For sure this was not the limit. We would
hit max NIC bandwidth pretty soon though.
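
For reference, these settings can be injected at runtime roughly like this
(the values are examples only, not a recommendation):

ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0 --osd_recovery_sleep_hybrid 0'
ceph tell osd.* injectargs '--osd_max_backfills 3 --osd_recovery_max_active 3'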

ceph++

Gr. Stefan

[1]: https://owncloud.kooman.org/s/mvbMCVLFbWjAyOn#pdfviewer


Quoting Stefan Kooman (ste...@bit.nl):
> Hi,
> 
> I know I'm not the only one with this question as I have see similar 
> questions on this list:
> How to speed up recovery / backfilling?
> 
> Current status:
> 
> pgs: 155325434/800312109 objects degraded (19.408%)
>  1395 active+clean
>  440  active+undersized+degraded+remapped+backfill_wait
>  21   active+undersized+degraded+remapped+backfilling
> 
>   io:
> client:   180 kB/s rd, 5776 kB/s wr, 273 op/s rd, 440 op/s wr
> recovery: 2990 kB/s, 109 keys/s, 114 objects/s
> 
> What we did? Shutdown one DC. Fill cluster with loads of objects, turn
> DC back on (size = 3, min_size=2). To test exactly this: recovery.
> 
> I have been going trough all the recovery options (including legacy) but
> I cannot get the recovery speed to increase:
> 
> osd_recovery_op_priority 63
> osd_client_op_priority 3
> 
> ^^ yup, reversed those, to no avail
> 
> osd_recovery_max_active 10'
> 
> ^^ This helped for a short period of time, and then it went back to
> "slow" mode
> 
> osd_recovery_max_omap_entries_per_chunk 0
> osd_recovery_max_chunk 67108864
> 
> Haven't seen any change in recovery speed.
> 
> osd_recovery_sleep_ssd": "0.00
> ^^ default for SSD

Didn't think about hdd / hybrid setting, as we have all SSD.

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Kernel problem with NBD resize

2018-02-28 Thread Alex Gorbachev
The rbd-nbd device no longer resizes online due to this bug:

https://lkml.org/lkml/2018/2/18/85

My tracker below, but this is not a Ceph issue.

https://tracker.ceph.com/issues/23137#change-108183

Has anyone heard any news and has any ability to inquire of the status
of the fix?  I don't see any recent kernel updates with nbd.c

--
Alex Gorbachev
Storcium
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-Fuse and mount namespaces

2018-02-28 Thread Oliver Freyermuth
Am 28.02.2018 um 15:18 schrieb David Turner:
> If you run your container in privileged mode you can mount ceph-fuse inside 
> of the VMs instead of from the shared resource on the host. I used a 
> configuration like this to run multi-tenancy speed tests of CephFS using 
> ceph-fuse. The more mount points I used 1 per container, the more bandwidth I 
> was able to utilize into ceph. It was about 15 clients before any of the 
> clients were slower than a single client. Which is to say that my bandwidth 
> usage went up linearly for the first 15 containers. After that it started 
> snowing down peer container, but I eventually got to the point around 70 
> containers before I maxed the nic on the container host.
> 
> Anyway. All of that is meant to say that I would recommend against 1 fuse 
> mount point on the host for containers and add the logic to mount the FS 
> inside of the containers. Possibly check out the kennel mount options instead 
> to prevent the need for privileged containers.

Thanks for sharing these results!
In our case, this will likely not make any difference, since either the NICs of 
the OSD-hosts are limiting the system (large files) or the SSDs for the 
cephfs-metadata. 

The main issues I see with performing the ceph-fuse mount inside the container 
would be:
- Memory increase. I saw clients eating up to 400 MB, we run 28 jobs per host, 
so that's 14 GB. 
  Also, in case users process a common set of files, will that not mean things 
will be cached separately (and number of total caps held will increase)? 
- We would multiply our number of clients by a factor of 28, so we would run 
with 1100 clients. This would probably also increase the load on the MDS,
  since more clients come asking for caps and stay connected. 

So I'm unsure it's a good idea in our case (when the network already goes in 
saturation if the single host-clients are running), but it's certainly worth 
looking at. 

Cheers and thanks,
Oliver

> 
> On Tue, Feb 27, 2018, 6:27 PM Oliver Freyermuth 
> > wrote:
> 
> Dear Cephalopodians,
> 
> continuing a bit on the point raised in the other thread ( "CephFS very 
> unstable with many small files" )
> concerning the potentially unexpected behaviour of the ceph-fuse client 
> with regard to mount namespaces I did a first small experiment.
> 
> First off: I did not see any bad behaviour which can be traced back to 
> this directly, but maybe it is still worthwhile
> to share the information.
> 
> Here's what I did.
> 
> 1) Initially, cephfs is mounted fine:
> [root@wn001 ~]# ps faux | grep ceph
> root        1908 31.4  0.1 1485376 201392 ?      Sl   Feb25 983:26 
> ceph-fuse --id=cephfs_baf --client_mountpoint=/ /cephfs -o rw
> 
> 2) Now, I fire off a container as normal user:
> $ singularity exec -B /cvmfs -B /cephfs 
> /cvmfs/some_container_repository/singularity/SL6/default/1519725973/ bash
> Welcome inside the SL6 container.
> Singularity> ls /cephfs
> benchmark  dd_test_rd.sh  dd_test.sh  grid  kern  port  user
> Singularity> cd /cephfs
> 
> All is fine and as expected. Singularity is one of many container 
> runtimes, you may also use charliecloud (more lightweight,
> and good to learn from the code how things work) or runc (the reference 
> implementation of OCI).
> The following may also work with a clever arrangement of "unshare" calls 
> (see e.g. https://sft.its.cern.ch/jira/projects/CVM/issues/CVM-1478 ).
> 
> 3) Now the experiment starts. On the host:
> [root@wn001 ~]# umount /cephfs/
> [root@wn001 ~]# ps faux | grep ceph
> root        1908 31.4  0.1 1485376 201392 ?      Sl   Feb25 983:26 
> ceph-fuse --id=cephfs_baf --client_mountpoint=/ /cephfs -o rw
> [root@wn001 ~]# ls /cephfs/
> [root@wn001 ~]#
> 
> => The CephFS is unmounted, the fuse helper is kept running!
> The reason: It is still in use within the mount namespace in the 
> container.
> But there is no filehandle visible in the host namespace, which is why 
> the umount succeeds and returns.
> 
> 4) Now, in the container:
> Singularity> ls
> benchmark  dd_test_rd.sh  dd_test.sh  grid  kern  port  user
> 
> I can also write and read just fine.
> 
> 5) Now the ugly part begins. On the host:
> [root@wn001 ~]# mount /cephfs
> 2018-02-28 00:07:43.431425 7efddc61e040 -1 asok(0x5571340ae1c0) 
> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to 
> bind the UNIX domain socket to '/var/run/ceph/ceph-client.cephfs_baf.asok': 
> (17) File exists
> 2018-02-28 00:07:43.434597 7efddc61e040 -1 init, newargv = 0x5571340abb20 
> newargc=11
> ceph-fuse[98703]: starting ceph client
> ceph-fuse[98703]: starting fuse
> [root@wn001 ~]# ps faux | grep ceph
> root        1908 31.4  0.1 1485376 201392 ?      Sl   Feb25 983:26 
> ceph-fuse --id=cephfs_baf 

Re: [ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread Jason Dillaman
On Wed, Feb 28, 2018 at 7:53 AM, Massimiliano Cuttini  
wrote:
> I was building ceph in order to use with iSCSI.
> But I just see from the docs that need:
>
> CentOS 7.5
> (which is not available yet, it's still at 7.4)
> https://wiki.centos.org/Download
>
> Kernel 4.17
> (which is not available yet, it is still at 4.15.7)
> https://www.kernel.org/

The necessary kernel changes actually are included as part of 4.16-rc1
which is available now. We also offer a pre-built test kernel with the
necessary fixes here [1].

> So I guess, there is no ufficial support and this is just a bad prank.
>
> Ceph is ready to be used with S3 since many years.
> But need the kernel of the next century to works with such an old technology
> like iSCSI.
> So sad.

Unfortunately, kernel vs userspace have very different development
timelines. We have no interest in maintaining out-of-tree patchsets to
the kernel.

>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

[1] https://shaman.ceph.com/repos/kernel/ceph-iscsi-test/

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-Fuse and mount namespaces

2018-02-28 Thread David Turner
If you run your container in privileged mode you can mount ceph-fuse inside
of the VMs instead of from the shared resource on the host. I used a
configuration like this to run multi-tenancy speed tests of CephFS using
ceph-fuse. The more mount points I used 1 per container, the more bandwidth
I was able to utilize into ceph. It was about 15 clients before any of the
clients were slower than a single client. Which is to say that my bandwidth
usage went up linearly for the first 15 containers. After that it started
slowing down per container, but I eventually got to the point around 70
containers before I maxed the nic on the container host.

Anyway. All of that is meant to say that I would recommend against 1 fuse
mount point on the host for containers and add the logic to mount the FS
inside of the containers. Possibly check out the kernel mount options
instead to prevent the need for privileged containers.
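
A kernel mount from inside a container would look something like this
(monitor addresses, user name and secret path are placeholders; note the
kernel client currently has no quota support):

mount -t ceph 10.0.0.1:6789,10.0.0.2:6789,10.0.0.3:6789:/ /cephfs \
    -o name=cephfs_user,secretfile=/etc/ceph/cephfs.secret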

On Tue, Feb 27, 2018, 6:27 PM Oliver Freyermuth <
freyerm...@physik.uni-bonn.de> wrote:

> Dear Cephalopodians,
>
> continuing a bit on the point raised in the other thread ( "CephFS very
> unstable with many small files" )
> concerning the potentially unexpected behaviour of the ceph-fuse client
> with regard to mount namespaces I did a first small experiment.
>
> First off: I did not see any bad behaviour which can be traced back to
> this directly, but maybe it is still worthwhile
> to share the information.
>
> Here's what I did.
>
> 1) Initially, cephfs is mounted fine:
> [root@wn001 ~]# ps faux | grep ceph
> root1908 31.4  0.1 1485376 201392 ?  Sl   Feb25 983:26
> ceph-fuse --id=cephfs_baf --client_mountpoint=/ /cephfs -o rw
>
> 2) Now, I fire off a container as normal user:
> $ singularity exec -B /cvmfs -B /cephfs
> /cvmfs/some_container_repository/singularity/SL6/default/1519725973/ bash
> Welcome inside the SL6 container.
> Singularity> ls /cephfs
> benchmark  dd_test_rd.sh  dd_test.sh  grid  kern  port  user
> Singularity> cd /cephfs
>
> All is fine and as expected. Singularity is one of many container
> runtimes, you may also use charliecloud (more lightweight,
> and good to learn from the code how things work) or runc (the reference
> implementation of OCI).
> The following may also work with a clever arrangement of "unshare" calls
> (see e.g. https://sft.its.cern.ch/jira/projects/CVM/issues/CVM-1478 ).
>
> 3) Now the experiment starts. On the host:
> [root@wn001 ~]# umount /cephfs/
> [root@wn001 ~]# ps faux | grep ceph
> root1908 31.4  0.1 1485376 201392 ?  Sl   Feb25 983:26
> ceph-fuse --id=cephfs_baf --client_mountpoint=/ /cephfs -o rw
> [root@wn001 ~]# ls /cephfs/
> [root@wn001 ~]#
>
> => The CephFS is unmounted, the fuse helper is kept running!
> The reason: It is still in use within the mount namespace in the container.
> But there is no filehandle visible in the host namespace, which is why the
> umount succeeds and returns.
>
> 4) Now, in the container:
> Singularity> ls
> benchmark  dd_test_rd.sh  dd_test.sh  grid  kern  port  user
>
> I can also write and read just fine.
>
> 5) Now the ugly part begins. On the host:
> [root@wn001 ~]# mount /cephfs
> 2018-02-28 00:07:43.431425 7efddc61e040 -1 asok(0x5571340ae1c0)
> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
> bind the UNIX domain socket to '/var/run/ceph/ceph-client.cephfs_baf.asok':
> (17) File exists
> 2018-02-28 00:07:43.434597 7efddc61e040 -1 init, newargv = 0x5571340abb20
> newargc=11
> ceph-fuse[98703]: starting ceph client
> ceph-fuse[98703]: starting fuse
> [root@wn001 ~]# ps faux | grep ceph
> root1908 31.4  0.1 1485376 201392 ?  Sl   Feb25 983:26
> ceph-fuse --id=cephfs_baf --client_mountpoint=/ /cephfs -o rw
> root   98703  1.0  0.0 400268  9456 pts/2Sl   00:07   0:00
> ceph-fuse --id=cephfs_baf --client_mountpoint=/ /cephfs -o rw
>
> As you can see:
> - Name collision for admin socket, since the helper is already running.
> - A second helper for the same mountpoint was fired up!
> - Of course, now cephfs is accessible on the host again.
> - On a side-note, once I exit the container (and hence close the mount
> namespace), the "old" helper is finally freed.
>
> Hence, I am unsure what exactly happens during the internal "remount" when
> the cephfs_fuse helper remounts the FS to make the kernel drop all internal
> caches.
>
> Since my kernel anf FUSE experience is very limited, let me recollect what
> other Fuse-FSes do:
> - sshfs does the same, i.e. one helper in host and one helper in container
> namespace. But it does not have problems with e.g. the admin socket.
> - CVMFS ( http://cvmfs.readthedocs.io/en/stable/ ) errors out in step
> (5), i.e. the admin can not remount anymore on the host.
>   This is nasty, especially when combined with autofs and containers are
> placed on CVMFS, which is why I opened
> https://sft.its.cern.ch/jira/projects/CVM/issues/CVM-1478 with them.
>   They need to enforce a single helper only to prevent 

Re: [ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread Max Cuttins

Sorry for being rude Ross,

I have been following Ceph since 2014, waiting for iSCSI support in order to 
use it with Xen.
Now that it finally seems to be implemented, the OS requirements are 
unrealistic.
It seems like a bad prank: four years waiting for this... and still no true 
support yet.





Il 28/02/2018 14:11, Marc Roos ha scritto:
  
Hi Massimiliano, have an espresso. You know the indians have a nice

saying

"Everything will be good at the end. If it is not good, it is still not
the end."



-Original Message-
From: Massimiliano Cuttini [mailto:m...@phoenixweb.it]
Sent: woensdag 28 februari 2018 13:53
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Ceph iSCSI is a prank?

I was building ceph in order to use with iSCSI.
But I just see from the docs that need:

CentOS 7.5
(which is not available yet, it's still at 7.4)
https://wiki.centos.org/Download

Kernel 4.17
(which is not available yet, it is still at 4.15.7)
https://www.kernel.org/

So I guess, there is no ufficial support and this is just a bad prank.

Ceph is ready to be used with S3 since many years.
But need the kernel of the next century to works with such an old
technology like iSCSI.
So sad.












___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-02-28 Thread David C
On 27 Feb 2018 06:46, "Jan Pekař - Imatic"  wrote:

I think I hit the same issue.
I have corrupted data on cephfs and I don't remember the same issue before
Luminous (i did the same tests before).

It is on my test 1 node cluster with lower memory then recommended (so
server is swapping) but it shouldn't lose data (it never did before).
So slow requests may appear in the log like Florent B mentioned.

My test is to take some bigger files (a few GB) and copy them to cephfs, or
from cephfs to cephfs, and stress the cluster so the data copy stalls for a
while. It resumes in a few seconds/minutes and everything looks ok (no error
while copying). But the copied file may be silently corrupted.

I checked files with md5sum and compared some corrupted files in detail.
Some 4MB blocks of data (the cephfs object size) were missing - the corrupted
file had those blocks filled with zeroes.

My idea is that something goes wrong when the cluster is under pressure and
the client wants to save a block. The client gets an OK and continues with the
next block, so the data is lost and the corrupted block is filled with zeros.

I tried kernel client 4.x and ceph-fuse client with same result.

I'm using erasure for cephfs data pool, cache tier and my storage is
bluestore and filestore mixed.

How can I help to debug or what should I do to help to find the problem?


Always worrying to see the dreaded C word. I operate a Luminous cluster
with a pretty varied workload and have yet to see any signs of corruption,
although of course that doesn't mean its not happening. Initial questions:

- What's the history of your cluster? Was this an upgrade or fresh Luminous
install?
- Was ceph healthy when you ran this test?
-Are you accessing this one node cluster from the node itself or from a
separate client?

I'd recommend starting a new thread with more details, it sounds like it's
pretty reproducible for you, so maybe crank up your debugging and send logs.
http://docs.ceph.com/docs/luminous/dev/kernel-client-troubleshooting/
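
If you can reproduce it, one rough way to see which 4MB object of a corrupted
file is zero-filled vs. missing (the data pool name and path below are
placeholders):

ino=$(printf '%x' $(stat -c %i /cephfs/path/to/corrupt_file))
# cephfs stores the file as objects named <inode-hex>.<block-number>
rados -p cephfs_data stat ${ino}.00000000
rados -p cephfs_data stat ${ino}.00000001   # ...and so on, one per 4MB block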


With regards
Jan Pekar


On 14.12.2017 15:41, Yan, Zheng wrote:

> On Thu, Dec 14, 2017 at 8:52 PM, Florent B  wrote:
>
>> On 14/12/2017 03:38, Yan, Zheng wrote:
>>
>>> On Thu, Dec 14, 2017 at 12:49 AM, Florent B  wrote:
>>>

 Systems are on Debian Jessie : kernel 3.16.0-4-amd64 & libfuse 2.9.3-15.

 I don't know pattern of corruption, but according to error message in
 Dovecot, it seems to expect data to read but reach EOF.

 All seems fine using fuse_disable_pagecache (no more corruption, and
 performance increased : no more MDS slow requests on filelock requests).

>>>
>>> I checked ceph-fuse changes since kraken, didn't find any clue. I
>>> would be helpful if you can try recent version kernel.
>>>
>>> Regards
>>> Yan, Zheng
>>>
>>
>> Problem occurred this morning even with fuse_disable_pagecache=true.
>>
>> It seems to be a lock issue between imap & lmtp processes.
>>
>> Dovecot uses fcntl as locking method. Is there any change about it in
>> Luminous ? I switched to flock to see if problem is still there...
>>
>>
> I don't remenber there is any change.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
-- 

Ing. Jan Pekař
jan.pe...@imatic.cz | +420603811737

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz

--

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD maintenance (ceph osd set noout)

2018-02-28 Thread David Turner
Like John says, noout prevents an osd being marked out in the cluster. It
does not impede it from being marked down and back up which is the desired
behavior when restarting a server. What are you seeing with your osds
becoming unusable and needing to rebuild them?

When rebooting a server if it takes too long to come back up then the osds
will get marked out and data will start backfilling to replace the copies
on the osds that are no longer "in" in the cluster.

Once those osds come back, not only do they need to backfill to catch up on
what they missed while they were down, but the cluster now needs to undo
all of the data migration it was doing to recover from them being marked
out.
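
For the archives, the usual pattern is simply (the last command shows the
grace period after which a down OSD would be marked out - 600s by default;
run it on a mon host):

ceph osd set noout          # before taking the node down
systemctl reboot
ceph osd unset noout        # once its OSDs are back up
ceph daemon mon.$(hostname -s) config get mon_osd_down_out_interval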

On Tue, Feb 27, 2018, 4:24 PM John Spray  wrote:

> On Tue, Feb 27, 2018 at 6:37 PM, Andre Goree  wrote:
> > Is it still considered best practice to set 'noout' for OSDs that will be
> > going under maintenance, e.g., rebooting an OSD ndoe for a kernel update?
> >
> > I ask, because I've set this twice now during times which the OSDs would
> > only momentarily be 'out', however each time I've done this, the OSDs
> have
> > become unusable and I've had to rebuild them.
>
> Can you be more specific about "unusable"?  Marking an OSD noout is of
> course not meant to harm it!
>
> John
>
> > Also, when I _do not_ set 'noout', it would seem that once the node
> reboots
> > the OSDs come back online without issue _and_ there is very _little_
> > recovery i/o -- I'd expect to see lots of recovery i/o if a node goes
> down
> > as the cluster tries to replace the PGs on other OSD nodes.  This further
> > makes me believe that setting 'noout' is no longer necessary.
> >
> > I'm running version 12.2.2-12.2.4 (in the middle of upgrading).
> >
> > Thanks in advance.
> >
> > --
> > Andre Goree
> > -=-=-=-=-=-
> > Email - andre at drenet.net
> > Website   - http://blog.drenet.net
> > PGP key   - http://www.drenet.net/pubkey.html
> > -=-=-=-=-=-
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph mgr balancer bad distribution

2018-02-28 Thread Stefan Priebe - Profihost AG
On 28.02.2018 at 13:58, Dan van der Ster wrote:
> Hi Stefan,
> 
> Which balancer mode are you using? crush-compat scores using a mix of
> nobjects, npgs, and size. It's doing pretty well over here as long as
> you have a relatively small number of empty PGs.
>
> I believe that upmap uses nPGs only, and I haven't tested it enough
> yet to know if it actually improves things.
> 
> Also, did you only run one iteration of the balancer? It only moves up
> to 5% of objects each iteration, so it can take several to fully
> balance things.

crush-compat mode

Yes, only one iteration, but I set max_misplaced to 20%:
"mgr/balancer/max_misplaced": "20.00",

> 
> -- dan
> 
> 
> On Wed, Feb 28, 2018 at 1:47 PM, Stefan Priebe - Profihost AG
>  wrote:
>> Hello,
>>
>> with jewel we always used the python crush optimizer which gave us a
>> pretty good distribution fo the used space.
>>
>> Since luminous we're using the included ceph mgr balancer but the
>> distribution is far from perfect and much worse than the old method.
>>
>> Is there any way to tune the mgr balancer?
>>
>> Currently after a balance we still have:
>> 75% to 92% disk usage which is pretty unfair
>>
>> Greets,
>> Stefan
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph mgr balancer bad distribution

2018-02-28 Thread Stefan Priebe - Profihost AG

On 28.02.2018 at 13:59, John Spray wrote:
> On Wed, Feb 28, 2018 at 12:47 PM, Stefan Priebe - Profihost AG
>  wrote:
>> Hello,
>>
>> with jewel we always used the python crush optimizer which gave us a
>> pretty good distribution fo the used space.
>>
>> Since luminous we're using the included ceph mgr balancer but the
>> distribution is far from perfect and much worse than the old method
> 
> Can you say what mode you're using it in/what commands you used?

mode: crush-compat

OPT_NAME="${SSH_USERNAME}_$(date +'%Y-%m-%d_%H:%M')"
ceph balancer optimize $OPT_NAME
ceph balancer execute $OPT_NAME
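
For the archives, the fuller sequence, sketched from memory (double-check the
exact syntax; the plan name is arbitrary):

ceph balancer mode crush-compat
ceph balancer eval               # score of the current distribution, lower is better
ceph balancer optimize $OPT_NAME
ceph balancer eval $OPT_NAME     # score the plan would give, before executing it
ceph balancer execute $OPT_NAME
# repeat optimize/eval/execute; each pass only moves a limited share of objects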

Greets,
Stefan
> 
> John
> .
>>
>> Is there any way to tune the mgr balancer?
>>
>> Currently after a balance we still have:
>> 75% to 92% disk usage which is pretty unfair
>>
>> Greets,
>> Stefan
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Random health OSD_SCRUB_ERRORS on various OSDs, after pg repair back to HEALTH_OK

2018-02-28 Thread Marco Baldini - H.S. Amiata

Hi

I read the bugtracker issue and it seems a lot like my problem, even though
I can't check the reported checksum because I don't have it in my logs;
perhaps that's because of debug osd = 0/0 in ceph.conf.


I just raised the OSD log level

ceph tell osd.* injectargs --debug-osd 5/5

I'll check OSD logs in the next days...
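
In case it helps, a rough sketch of what I plan to look at (commands as I
understand them; <pgid> is a placeholder):

grep 6706be76 /var/log/ceph/ceph-osd.*.log                # the checksum Paul mentioned
rados list-inconsistent-obj <pgid> --format=json-pretty   # what scrub actually flagged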

Thanks



On 28/02/2018 11:59, Paul Emmerich wrote:

Hi,

might be http://tracker.ceph.com/issues/22464

Can you check the OSD log file to see if the reported checksum 
is 0x6706be76?



Paul

On 28.02.2018 at 11:43, Marco Baldini - H.S. Amiata wrote:


Hello

I have a little ceph cluster with 3 nodes, each with 3x1TB HDD and
1x240GB SSD. I created this cluster after the Luminous release, so all
OSDs are Bluestore. In my crush map I have two rules, one targeting
the SSDs and one targeting the HDDs. I have 4 pools, one using the
SSD rule and the others using the HDD rule; three pools are size=3
min_size=2, one is size=2 min_size=1 (this one has content that is
ok to lose).


In the last 3 months I've been having a strange random problem. I planned my
osd scrubs during the night (osd scrub begin hour = 20, osd scrub end
hour = 7) when the office is closed so there is low impact on the users.
Some mornings, when I check the cluster health, I find:


HEALTH_ERR X scrub errors; Possible data damage: Y pgs inconsistent
OSD_SCRUB_ERRORS X scrub errors
PG_DAMAGED Possible data damage: Y pg inconsistent

X and Y sometimes are 1, sometimes 2.

I issue a ceph health detail, check the damaged PGs, and run a ceph
pg repair for the damaged PGs; I get


instructing pg PG on osd.N to repair

The PGs are different, the OSD that has to repair the PG is different, even the
node hosting the OSD is different; I made a list of all PGs and OSDs.
This morning is the most recent case:


> ceph health detail
HEALTH_ERR 2 scrub errors; Possible data damage: 2 pgs inconsistent
OSD_SCRUB_ERRORS 2 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
pg 13.65 is active+clean+inconsistent, acting [4,2,6]
pg 14.31 is active+clean+inconsistent, acting [8,3,1]
> ceph pg repair 13.65
instructing pg 13.65 on osd.4 to repair

(node-2)> tail /var/log/ceph/ceph-osd.4.log
2018-02-28 08:38:47.593447 7f112cf76700  0 log_channel(cluster) log [DBG] : 
13.65 repair starts
2018-02-28 08:39:37.573342 7f112cf76700  0 log_channel(cluster) log [DBG] : 
13.65 repair ok, 0 fixed
> ceph pg repair 14.31
instructing pg 14.31 on osd.8 to repair

(node-3)> tail /var/log/ceph/ceph-osd.8.log
2018-02-28 08:52:37.297490 7f4dd0816700  0 log_channel(cluster) log [DBG] : 
14.31 repair starts
2018-02-28 08:53:00.704020 7f4dd0816700  0 log_channel(cluster) log [DBG] : 
14.31 repair ok, 0 fixed


I made a list of when I got OSD_SCRUB_ERRORS, which PG, and which OSD
had to repair the PG. Dates are dd/mm/yyyy.


21/12/2017   --  pg 14.29 is active+clean+inconsistent, acting [6,2,4]

18/01/2018   --  pg 14.5a is active+clean+inconsistent, acting [6,4,1]

22/01/2018   --  pg 9.3a is active+clean+inconsistent, acting [2,7]

29/01/2018   --  pg 13.3e is active+clean+inconsistent, acting [4,6,1]
  instructing pg 13.3e on osd.4 to repair

07/02/2018   --  pg 13.7e is active+clean+inconsistent, acting [8,2,5]
  instructing pg 13.7e on osd.8 to repair

09/02/2018   --  pg 13.30 is active+clean+inconsistent, acting [7,3,2]
  instructing pg 13.30 on osd.7 to repair

15/02/2018   --  pg 9.35 is active+clean+inconsistent, acting [1,8]
  instructing pg 9.35 on osd.1 to repair

  pg 13.3e is active+clean+inconsistent, acting [4,6,1]
  instructing pg 13.3e on osd.4 to repair

17/02/2018   --  pg 9.2d is active+clean+inconsistent, acting [7,5]
  instructing pg 9.2d on osd.7 to repair

22/02/2018   --  pg 9.24 is active+clean+inconsistent, acting [5,8]
  instructing pg 9.24 on osd.5 to repair

28/02/2018   --  pg 13.65 is active+clean+inconsistent, acting [4,2,6]
  instructing pg 13.65 on osd.4 to repair

  pg 14.31 is active+clean+inconsistent, acting [8,3,1]
  instructing pg 14.31 on osd.8 to repair



If it can be useful, my ceph.conf is here:

[global]
auth client required = none
auth cluster required = none
auth service required = none
fsid = 24d5d6bc-0943-4345-b44e-46c19099004b
cluster network = 10.10.10.0/24
public network = 10.10.10.0/24
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
bluestore_block_db_size = 64424509440

debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0

Re: [ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread Marc Roos
 
Hi Massimiliano, have an espresso. You know the Indians have a nice
saying:

"Everything will be good at the end. If it is not good, it is still not 
the end."



-----Original Message-----
From: Massimiliano Cuttini [mailto:m...@phoenixweb.it]
Sent: Wednesday, 28 February 2018 13:53
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Ceph iSCSI is a prank?

I was building ceph in order to use it with iSCSI.
But I just saw from the docs that it needs:

CentOS 7.5
(which is not available yet, it's still at 7.4)
https://wiki.centos.org/Download

Kernel 4.17
(which is not available yet, it is still at 4.15.7)
https://www.kernel.org/

So I guess there is no official support and this is just a bad prank.

Ceph has been ready to be used with S3 for many years.
But it needs the kernel of the next century to work with a technology as old
as iSCSI.
So sad.









___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph mgr balancer bad distribution

2018-02-28 Thread John Spray
On Wed, Feb 28, 2018 at 12:47 PM, Stefan Priebe - Profihost AG
 wrote:
> Hello,
>
> with jewel we always used the python crush optimizer which gave us a
> pretty good distribution fo the used space.
>
> Since luminous we're using the included ceph mgr balancer but the
> distribution is far from perfect and much worse than the old method

Can you say what mode you're using it in/what commands you used?

John
.
>
> Is there any way to tune the mgr balancer?
>
> Currently after a balance we still have:
> 75% to 92% disk usage which is pretty unfair
>
> Greets,
> Stefan
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph mgr balancer bad distribution

2018-02-28 Thread Dan van der Ster
Hi Stefan,

Which balancer mode are you using? crush-compat scores using a mix of
nobjects, npgs, and size. It's doing pretty well over here as long as
you have a relatively small number of empty PGs.
I believe that upmap uses nPGs only, and I haven't tested it enough
yet to know if it actually improves things.

Also, did you only run one iteration of the balancer? It only moves up
to 5% of objects each iteration, so it can take several to fully
balance things.

-- dan


On Wed, Feb 28, 2018 at 1:47 PM, Stefan Priebe - Profihost AG
 wrote:
> Hello,
>
> with jewel we always used the python crush optimizer which gave us a
> pretty good distribution fo the used space.
>
> Since luminous we're using the included ceph mgr balancer but the
> distribution is far from perfect and much worse than the old method.
>
> Is there any way to tune the mgr balancer?
>
> Currently after a balance we still have:
> 75% to 92% disk usage which is pretty unfair
>
> Greets,
> Stefan
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread Massimiliano Cuttini

I was building ceph in order to use it with iSCSI.
But I just saw from the docs that it needs:

   *CentOS 7.5*
   (which is not available yet, it's still at 7.4)
   https://wiki.centos.org/Download

   *Kernel 4.17*
   (which is not available yet, it is still at 4.15.7)
   https://www.kernel.org/

So I guess there is no official support and this is just a bad prank.

Ceph has been ready to be used with S3 for many years.
But it needs the kernel of the next century to work with a technology as old
as iSCSI.

So sad.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph mgr balancer bad distribution

2018-02-28 Thread Stefan Priebe - Profihost AG
Hello,

with jewel we always used the python crush optimizer, which gave us a
pretty good distribution of the used space.

Since luminous we're using the included ceph mgr balancer but the
distribution is far from perfect and much worse than the old method.

Is there any way to tune the mgr balancer?

Currently after a balance we still have:
75% to 92% disk usage which is pretty unfair

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread John Spray
On Wed, Feb 28, 2018 at 11:05 AM, John Spray  wrote:
> On Wed, Feb 28, 2018 at 9:37 AM, Dan van der Ster  wrote:
>> Hi all,
>>
>> I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and
>> OSD's updated fine.
>>
>> When updating the MDS's (we have 2 active and 1 standby), I started
>> with the standby.
>>
>> At the moment the standby MDS restarted into 12.2.4 [1], both active
>> MDSs (still running 12.2.2) suicided like this:
>>
>> 2018-02-28 10:25:22.761413 7f03da1b9700  0 mds.cephdwightmds0
>> handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base
>> v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir
>> inode in separate object,5=mds uses versioned encoding,6=dirfrag is
>> stored in omap,8=no anchor table,9=file layout v2} not writeable with
>> daemon features compat={},rocompat={},incompat={1=base v0.20,2=client
>> writeable ranges,3=default file layouts on dirs,4=dir inode in
>> separate object,5=mds uses versioned encoding,6=dirfrag is stored in
>> omap,7=mds uses inline data,8=file layout v2}, killing myself
>> 2018-02-28 10:25:22.761429 7f03da1b9700  1 mds.cephdwightmds0 suicide.
>> wanted state up:active
>> 2018-02-28 10:25:23.763226 7f03da1b9700  1 mds.0.18147 shutdown:
>> shutting down rank 0
>>
>>
>> 2018-02-28 10:25:22.761590 7f11df538700  0 mds.cephdwightmds1
>> handle_mds_map mdsmap compatset compat={},rocompat={}
>> ,incompat={1=base v0.20,2=client writeable ranges,3=default file
>> layouts on dirs,4=dir inode in separate object,5=m
>> ds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor
>> table,9=file layout v2} not writeable with daemo
>> n features compat={},rocompat={},incompat={1=base v0.20,2=client
>> writeable ranges,3=default file layouts on dirs,4=
>> dir inode in separate object,5=mds uses versioned encoding,6=dirfrag
>> is stored in omap,7=mds uses inline data,8=fil
>> e layout v2}, killing myself
>> 2018-02-28 10:25:22.761613 7f11df538700  1 mds.cephdwightmds1 suicide.
>> wanted state up:active
>> 2018-02-28 10:25:23.765653 7f11df538700  1 mds.1.18366 shutdown:
>> shutting down rank 1
>
> That's not good!
>
> From looking at the commits between 12.2.2 and 12.2.4, this one looks
> suspicious:
>
> commit ddba907279719631903e3a20543056d81d176a1b
> Author: Yan, Zheng 
> Date:   Tue Oct 31 16:56:51 2017 +0800
>
> mds: fix MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 definition
>
> Fixes: http://tracker.ceph.com/issues/21985
> Signed-off-by: "Yan, Zheng" 
> (cherry picked from commit 6c1543dfc55d6db8493535b9b62a30236cf8c638)

Apologies for the noise, my mail client hadn't loaded the earlier
responses in which this was already pointed out.

John

> John
>
>
>
>>
>>
>> The cephfs cluster was down until I updated all MDS's to 12.2.4 --
>> then they restarted cleanly.
>>
>> Looks like a pretty serious bug??!!
>>
>> Cheers, Dan
>>
>>
>> [1] here is the standby restarting, 4 seconds before the active MDS's 
>> suicided:
>>
>> 2018-02-28 10:25:18.222865 7f9f1ea3b1c0  0 set uid:gid to 167:167 (ceph:ceph)
>> 2018-02-28 10:25:18.222892 7f9f1ea3b1c0  0 ceph version 12.2.4
>> (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process
>> (unknown), pid 10648
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread John Spray
On Wed, Feb 28, 2018 at 9:37 AM, Dan van der Ster  wrote:
> Hi all,
>
> I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and
> OSD's updated fine.
>
> When updating the MDS's (we have 2 active and 1 standby), I started
> with the standby.
>
> At the moment the standby MDS restarted into 12.2.4 [1], both active
> MDSs (still running 12.2.2) suicided like this:
>
> 2018-02-28 10:25:22.761413 7f03da1b9700  0 mds.cephdwightmds0
> handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base
> v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir
> inode in separate object,5=mds uses versioned encoding,6=dirfrag is
> stored in omap,8=no anchor table,9=file layout v2} not writeable with
> daemon features compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in
> separate object,5=mds uses versioned encoding,6=dirfrag is stored in
> omap,7=mds uses inline data,8=file layout v2}, killing myself
> 2018-02-28 10:25:22.761429 7f03da1b9700  1 mds.cephdwightmds0 suicide.
> wanted state up:active
> 2018-02-28 10:25:23.763226 7f03da1b9700  1 mds.0.18147 shutdown:
> shutting down rank 0
>
>
> 2018-02-28 10:25:22.761590 7f11df538700  0 mds.cephdwightmds1
> handle_mds_map mdsmap compatset compat={},rocompat={}
> ,incompat={1=base v0.20,2=client writeable ranges,3=default file
> layouts on dirs,4=dir inode in separate object,5=m
> ds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor
> table,9=file layout v2} not writeable with daemo
> n features compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=
> dir inode in separate object,5=mds uses versioned encoding,6=dirfrag
> is stored in omap,7=mds uses inline data,8=fil
> e layout v2}, killing myself
> 2018-02-28 10:25:22.761613 7f11df538700  1 mds.cephdwightmds1 suicide.
> wanted state up:active
> 2018-02-28 10:25:23.765653 7f11df538700  1 mds.1.18366 shutdown:
> shutting down rank 1

That's not good!

>From looking at the commits between 12.2.2 and 12.2.4, this one looks
suspicious:

commit ddba907279719631903e3a20543056d81d176a1b
Author: Yan, Zheng 
Date:   Tue Oct 31 16:56:51 2017 +0800

mds: fix MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 definition

Fixes: http://tracker.ceph.com/issues/21985
Signed-off-by: "Yan, Zheng" 
(cherry picked from commit 6c1543dfc55d6db8493535b9b62a30236cf8c638)

John



>
>
> The cephfs cluster was down until I updated all MDS's to 12.2.4 --
> then they restarted cleanly.
>
> Looks like a pretty serious bug??!!
>
> Cheers, Dan
>
>
> [1] here is the standby restarting, 4 seconds before the active MDS's 
> suicided:
>
> 2018-02-28 10:25:18.222865 7f9f1ea3b1c0  0 set uid:gid to 167:167 (ceph:ceph)
> 2018-02-28 10:25:18.222892 7f9f1ea3b1c0  0 ceph version 12.2.4
> (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process
> (unknown), pid 10648
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mirror OSD configuration

2018-02-28 Thread Zoran Bošnjak
I am aware of the monitor consensus requirement. It is taken care of (there is a
third room with only a monitor node). My problem is about OSD redundancy, since I
can only use 2 server rooms for OSDs.

I could use EC-pools, lrc or any other ceph configuration. But I could not find 
a configuration that would address the issue. The write acknowledge rule should 
read something like this:
1. If both rooms are "up", do not acknowledge the write until an ack is received from
both rooms.
2. If only one room is "up" (ignoring rule 1), acknowledge the write on the first ack.

The ceph documentation talks about recursively defined locality sets, so I
assume it allows for different rules at the room/rack/host... levels.
But as far as I can see, it cannot depend on "room" availability.

Is this possible to configure?
I would appreciate example configuration commands.
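
To make it concrete: the closest static layout I can sketch (it does not
implement the conditional acknowledge in rules 1 and 2 above, which as far as I
know CRUSH cannot express) is a rule in the decompiled crush map that places two
copies in each room, used with a pool set to size=4, min_size=2:

rule replicated_two_rooms {
    id 1                                 # pick an unused rule id
    type replicated
    min_size 2
    max_size 4
    step take default
    step choose firstn 2 type room       # pick 2 rooms
    step chooseleaf firstn 2 type host   # 2 hosts (and OSDs) per room
    step emit
}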

regards,
Zoran


From: Eino Tuominen 
Sent: Wednesday, February 28, 2018 8:47 AM
To: Zoran Bošnjak; ceph-us...@ceph.com
Subject: Re: mirror OSD configuration

> Is it possible to configure crush map such that it will tolerate "room" 
> failure? In my case, there is one
> network switch per room and one power supply per room, which makes a single 
> point of (room) failure.

Hi,

You cannot achieve real room redundancy with just two rooms. At minimum you'll 
need a third room (witness) from which you'll need independent network 
connections to the two server rooms. Otherwise it's impossible to have monitor 
quorum when one of the two rooms fails. And then you'd need to consider osd 
redundancy. You could do with replica size = 4, min_size = 2 (or any min_size = 
n, size = 2*n ), but that's not perfect as you lose exactly half of the 
replicas in case of a room failure. If you were able to use EC-pools you'd have 
more options with LRC coding 
(http://docs.ceph.com/docs/master/rados/operations/erasure-code-lrc/).
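
For completeness, applying that to an existing pool is just (with <pool> as a
placeholder):

ceph osd pool set <pool> size 4
ceph osd pool set <pool> min_size 2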

We run ceph in a 3 room configuration with 3 monitors, size=3, min_size=2. It 
works, but it's not without hassle either.

--
  Eino Tuominen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Random health OSD_SCRUB_ERRORS on various OSDs, after pg repair back to HEALTH_OK

2018-02-28 Thread Paul Emmerich
Hi,

might be http://tracker.ceph.com/issues/22464

Can you check the OSD log file to see if the reported checksum is 0x6706be76?


Paul

> On 28.02.2018 at 11:43, Marco Baldini - H.S. Amiata wrote:
> 
> Hello
> 
> I have a little ceph cluster with 3 nodes, each with 3x1TB HDD and 1x240GB 
> SSD. I created this cluster after Luminous release, so all OSDs are 
> Bluestore. In my crush map I have two rules, one targeting the SSDs and one 
> targeting the HDDs. I have 4 pools, one using the SSD rule and the others 
> using the HDD rule, three pools are size=3 min_size=2, one is size=2 
> min_size=1 (this one have content that it's ok to lose)
> 
> In the last 3 month I'm having a strange random problem. I planned my osd 
> scrubs during the night (osd scrub begin hour = 20, osd scrub end hour = 7) 
> when office is closed so there is low impact on the users. Some mornings, 
> when I ceph the cluster health, I find: 
> HEALTH_ERR X scrub errors; Possible data damage: Y pgs inconsistent
> OSD_SCRUB_ERRORS X scrub errors
> PG_DAMAGED Possible data damage: Y pg inconsistent
> X and Y sometimes are 1, sometimes 2.
> 
> I issue a ceph health detail, check the damaged PGs, and run a ceph pg repair 
> for the damaged PGs, I get
> 
> instructing pg PG on osd.N to repair
> PG are different, OSD that have to repair PG is different, even the node 
> hosting the OSD is different, I made a list of all PGs and OSDs. This morning 
> is the most recent case:
> 
> > ceph health detail
> HEALTH_ERR 2 scrub errors; Possible data damage: 2 pgs inconsistent
> OSD_SCRUB_ERRORS 2 scrub errors
> PG_DAMAGED Possible data damage: 2 pgs inconsistent
> pg 13.65 is active+clean+inconsistent, acting [4,2,6]
> pg 14.31 is active+clean+inconsistent, acting [8,3,1]
> > ceph pg repair 13.65
> instructing pg 13.65 on osd.4 to repair
> 
> (node-2)> tail /var/log/ceph/ceph-osd.4.log
> 2018-02-28 08:38:47.593447 7f112cf76700  0 log_channel(cluster) log [DBG] : 
> 13.65 repair starts
> 2018-02-28 08:39:37.573342 7f112cf76700  0 log_channel(cluster) log [DBG] : 
> 13.65 repair ok, 0 fixed
> > ceph pg repair 14.31
> instructing pg 14.31 on osd.8 to repair
> 
> (node-3)> tail /var/log/ceph/ceph-osd.8.log
> 2018-02-28 08:52:37.297490 7f4dd0816700  0 log_channel(cluster) log [DBG] : 
> 14.31 repair starts
> 2018-02-28 08:53:00.704020 7f4dd0816700  0 log_channel(cluster) log [DBG] : 
> 14.31 repair ok, 0 fixed
> 
> 
> I made a list of when I got OSD_SCRUB_ERRORS, what PG and what OSD had to 
> repair PG. Date is dd/mm/
> 21/12/2017   --  pg 14.29 is active+clean+inconsistent, acting [6,2,4]
> 
> 18/01/2018   --  pg 14.5a is active+clean+inconsistent, acting [6,4,1]
> 
> 22/01/2018   --  pg 9.3a is active+clean+inconsistent, acting [2,7]
> 
> 29/01/2018   --  pg 13.3e is active+clean+inconsistent, acting [4,6,1]
>  instructing pg 13.3e on osd.4 to repair
> 
> 07/02/2018   --  pg 13.7e is active+clean+inconsistent, acting [8,2,5]
>  instructing pg 13.7e on osd.8 to repair
> 
> 09/02/2018   --  pg 13.30 is active+clean+inconsistent, acting [7,3,2]
>  instructing pg 13.30 on osd.7 to repair
> 
> 15/02/2018   --  pg 9.35 is active+clean+inconsistent, acting [1,8]
>  instructing pg 9.35 on osd.1 to repair
> 
>  pg 13.3e is active+clean+inconsistent, acting [4,6,1]
>  instructing pg 13.3e on osd.4 to repair
> 
> 17/02/2018   --  pg 9.2d is active+clean+inconsistent, acting [7,5]
>  instructing pg 9.2d on osd.7 to repair 
> 
> 22/02/2018   --  pg 9.24 is active+clean+inconsistent, acting [5,8]
>  instructing pg 9.24 on osd.5 to repair
> 
> 28/02/2018   --  pg 13.65 is active+clean+inconsistent, acting [4,2,6]
>  instructing pg 13.65 on osd.4 to repair
> 
>  pg 14.31 is active+clean+inconsistent, acting [8,3,1]
>  instructing pg 14.31 on osd.8 to repair
> 
> 
> 
> If can be useful, my ceph.conf is here:
> 
> [global]
> auth client required = none
> auth cluster required = none
> auth service required = none
> fsid = 24d5d6bc-0943-4345-b44e-46c19099004b
> cluster network = 10.10.10.0/24
> public network = 10.10.10.0/24
> keyring = /etc/pve/priv/$cluster.$name.keyring
> mon allow pool delete = true
> osd journal size = 5120
> osd pool default min size = 2
> osd pool default size = 3
> bluestore_block_db_size = 64424509440
> 
> debug asok = 0/0
> debug auth = 0/0
> debug buffer = 0/0
> debug client = 0/0
> debug context = 0/0
> debug crush = 0/0
> debug filer = 0/0
> debug filestore = 0/0
> debug finisher = 0/0
> debug heartbeatmap = 0/0
> debug journal = 0/0
> debug journaler = 0/0
> debug lockdep = 0/0
> debug mds = 0/0
> debug mds balancer = 0/0
> debug mds locker = 0/0
> debug mds log = 0/0
> debug mds log expire = 0/0
> debug mds migrator = 0/0
> debug mon = 0/0
> debug monc = 0/0
> debug ms = 0/0
> debug objclass = 0/0
> debug objectcacher 

[ceph-users] Random health OSD_SCRUB_ERRORS on various OSDs, after pg repair back to HEALTH_OK

2018-02-28 Thread Marco Baldini - H.S. Amiata

Hello

I have a little ceph cluster with 3 nodes, each with 3x1TB HDD and
1x240GB SSD. I created this cluster after the Luminous release, so all OSDs
are Bluestore. In my crush map I have two rules, one targeting the SSDs
and one targeting the HDDs. I have 4 pools, one using the SSD rule and
the others using the HDD rule; three pools are size=3 min_size=2, one is
size=2 min_size=1 (this one has content that is ok to lose).


In the last 3 months I've been having a strange random problem. I planned my
osd scrubs during the night (osd scrub begin hour = 20, osd scrub end
hour = 7) when the office is closed so there is low impact on the users.
Some mornings, when I check the cluster health, I find:


HEALTH_ERR X scrub errors; Possible data damage: Y pgs inconsistent
OSD_SCRUB_ERRORS X scrub errors
PG_DAMAGED Possible data damage: Y pg inconsistent

X and Y sometimes are 1, sometimes 2.

I issue a ceph health detail, check the damaged PGs, and run a ceph pg
repair for the damaged PGs; I get


instructing pg PG on osd.N to repair

The PGs are different, the OSD that has to repair the PG is different, even the node
hosting the OSD is different; I made a list of all PGs and OSDs. This
morning is the most recent case:



ceph health detail

HEALTH_ERR 2 scrub errors; Possible data damage: 2 pgs inconsistent
OSD_SCRUB_ERRORS 2 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
pg 13.65 is active+clean+inconsistent, acting [4,2,6]
pg 14.31 is active+clean+inconsistent, acting [8,3,1]


ceph pg repair 13.65

instructing pg 13.65 on osd.4 to repair

(node-2)> tail /var/log/ceph/ceph-osd.4.log
2018-02-28 08:38:47.593447 7f112cf76700  0 log_channel(cluster) log [DBG] : 
13.65 repair starts
2018-02-28 08:39:37.573342 7f112cf76700  0 log_channel(cluster) log [DBG] : 
13.65 repair ok, 0 fixed


ceph pg repair 14.31

instructing pg 14.31 on osd.8 to repair

(node-3)> tail /var/log/ceph/ceph-osd.8.log
2018-02-28 08:52:37.297490 7f4dd0816700  0 log_channel(cluster) log [DBG] : 
14.31 repair starts
2018-02-28 08:53:00.704020 7f4dd0816700  0 log_channel(cluster) log [DBG] : 
14.31 repair ok, 0 fixed


I made a list of when I got OSD_SCRUB_ERRORS, which PG, and which OSD had
to repair the PG. Dates are dd/mm/yyyy.


21/12/2017   --  pg 14.29 is active+clean+inconsistent, acting [6,2,4]

18/01/2018   --  pg 14.5a is active+clean+inconsistent, acting [6,4,1]

22/01/2018   --  pg 9.3a is active+clean+inconsistent, acting [2,7]

29/01/2018   --  pg 13.3e is active+clean+inconsistent, acting [4,6,1]
 instructing pg 13.3e on osd.4 to repair

07/02/2018   --  pg 13.7e is active+clean+inconsistent, acting [8,2,5]
 instructing pg 13.7e on osd.8 to repair

09/02/2018   --  pg 13.30 is active+clean+inconsistent, acting [7,3,2]
 instructing pg 13.30 on osd.7 to repair

15/02/2018   --  pg 9.35 is active+clean+inconsistent, acting [1,8]
 instructing pg 9.35 on osd.1 to repair

 pg 13.3e is active+clean+inconsistent, acting [4,6,1]
 instructing pg 13.3e on osd.4 to repair

17/02/2018   --  pg 9.2d is active+clean+inconsistent, acting [7,5]
 instructing pg 9.2d on osd.7 to repair

22/02/2018   --  pg 9.24 is active+clean+inconsistent, acting [5,8]
 instructing pg 9.24 on osd.5 to repair

28/02/2018   --  pg 13.65 is active+clean+inconsistent, acting [4,2,6]
 instructing pg 13.65 on osd.4 to repair

 pg 14.31 is active+clean+inconsistent, acting [8,3,1]
 instructing pg 14.31 on osd.8 to repair



If it can be useful, my ceph.conf is here:

[global]
auth client required = none
auth cluster required = none
auth service required = none
fsid = 24d5d6bc-0943-4345-b44e-46c19099004b
cluster network = 10.10.10.0/24
public network = 10.10.10.0/24
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
bluestore_block_db_size = 64424509440

debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0


[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
osd max backfills = 1
osd recovery max active = 1

osd scrub begin hour = 20
osd scrub end hour = 7
osd scrub during recovery = false
osd scrub load threshold = 0.3

[client]
rbd cache = true
rbd cache size = 268435456  # 

Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Dan van der Ster
On Wed, Feb 28, 2018 at 11:38 AM, Patrick Donnelly  wrote:
> On Wed, Feb 28, 2018 at 2:07 AM, Dan van der Ster  wrote:
>> (Sorry to spam)
>>
>> I guess it's related to this fix to the layout v2 feature id:
>> https://github.com/ceph/ceph/pull/18782/files
>>
>> -#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(8,
>> "file layout v2")
>> +#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(9,
>> "file layout v2")
>
> Yes, this looks to be the issue.
>
>> Is there a way to update from 12.2.2 without causing the other active
>> MDS's to suicide?
>
> I think it will be necessary to reduce the actives to 1 (max_mds -> 1;
> deactivate other ranks), shutdown standbys, upgrade the single active,
> then upgrade/start the standbys.
>
> Unfortunately this didn't get flagged in upgrade testing. Thanks for
> the report Dan.

Thanks Patrick -- that's a good idea to reduce to 1 active.
I've created http://tracker.ceph.com/issues/23172 in case any followup is needed.

Cheers, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Patrick Donnelly
On Wed, Feb 28, 2018 at 2:07 AM, Dan van der Ster  wrote:
> (Sorry to spam)
>
> I guess it's related to this fix to the layout v2 feature id:
> https://github.com/ceph/ceph/pull/18782/files
>
> -#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(8,
> "file layout v2")
> +#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(9,
> "file layout v2")

Yes, this looks to be the issue.

> Is there a way to update from 12.2.2 without causing the other active
> MDS's to suicide?

I think it will be necessary to reduce the actives to 1 (max_mds -> 1;
deactivate the other ranks), shut down the standbys, upgrade the single active,
then upgrade/start the standbys.
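
Roughly, as a sketch (command names as in luminous; <fsname> and the daemon id
are placeholders):

ceph fs set <fsname> max_mds 1
ceph mds deactivate <fsname>:1      # repeat for each rank > 0 and wait for it to stop
systemctl stop ceph-mds@<id>        # on the standby hosts
# upgrade and restart the remaining active, then upgrade and start the standbys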

Unfortunately this didn't get flagged in upgrade testing. Thanks for
the report Dan.

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Dan van der Ster
(Sorry to spam)

I guess it's related to this fix to the layout v2 feature id:
https://github.com/ceph/ceph/pull/18782/files

-#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(8,
"file layout v2")
+#define MDS_FEATURE_INCOMPAT_FILE_LAYOUT_V2 CompatSet::Feature(9,
"file layout v2")

Is there a way to update from 12.2.2 without causing the other active
MDS's to suicide?

Cheers, Dan



On Wed, Feb 28, 2018 at 11:01 AM, Dan van der Ster  wrote:
> More:
>
> here is the MDS_FEATURES map for a running 12.2.2 cluster:
>
> compat: compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in
> separate object,5=mds uses versioned encoding,6=dirfrag is stored in
> omap,8=file layout v2}
>
> and here it is on this updated 12.2.4 cluster:
>
> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
> anchor table,9=file layout v2}
>
>
> feature bit 8 is not the same for these two. Am I confused or did
> these features get changed in 12.2.3/4.
>
>
> Cheers, Dan
>
> p.s. yes 12.2.4 is tagged and out -- check your favourite repo.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Dan van der Ster
More:

here is the MDS_FEATURES map for a running 12.2.2 cluster:

compat: compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=dir inode in
separate object,5=mds uses versioned encoding,6=dirfrag is stored in
omap,8=file layout v2}

and here it is on this updated 12.2.4 cluster:

compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
anchor table,9=file layout v2}


Feature bit 8 is not the same for these two. Am I confused, or did
these features get changed in 12.2.3/4?


Cheers, Dan

p.s. yes 12.2.4 is tagged and out -- check your favourite repo.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Stefan Kooman
Quoting Dan van der Ster (d...@vanderster.com):
> Hi all,
> 
> I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and
> OSD's updated fine.

12.2.4? Did you mean 12.2.3? Or did I miss something?

Gr. stefan

-- 
| BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Dan van der Ster
Hi all,

I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and
OSD's updated fine.

When updating the MDS's (we have 2 active and 1 standby), I started
with the standby.

At the moment the standby MDS restarted into 12.2.4 [1], both active
MDSs (still running 12.2.2) suicided like this:

2018-02-28 10:25:22.761413 7f03da1b9700  0 mds.cephdwightmds0
handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base
v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir
inode in separate object,5=mds uses versioned encoding,6=dirfrag is
stored in omap,8=no anchor table,9=file layout v2} not writeable with
daemon features compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=dir inode in
separate object,5=mds uses versioned encoding,6=dirfrag is stored in
omap,7=mds uses inline data,8=file layout v2}, killing myself
2018-02-28 10:25:22.761429 7f03da1b9700  1 mds.cephdwightmds0 suicide.
wanted state up:active
2018-02-28 10:25:23.763226 7f03da1b9700  1 mds.0.18147 shutdown:
shutting down rank 0


2018-02-28 10:25:22.761590 7f11df538700  0 mds.cephdwightmds1
handle_mds_map mdsmap compatset compat={},rocompat={}
,incompat={1=base v0.20,2=client writeable ranges,3=default file
layouts on dirs,4=dir inode in separate object,5=m
ds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor
table,9=file layout v2} not writeable with daemo
n features compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=
dir inode in separate object,5=mds uses versioned encoding,6=dirfrag
is stored in omap,7=mds uses inline data,8=fil
e layout v2}, killing myself
2018-02-28 10:25:22.761613 7f11df538700  1 mds.cephdwightmds1 suicide.
wanted state up:active
2018-02-28 10:25:23.765653 7f11df538700  1 mds.1.18366 shutdown:
shutting down rank 1



The cephfs cluster was down until I updated all MDS's to 12.2.4 --
then they restarted cleanly.

Looks like a pretty serious bug??!!

Cheers, Dan


[1] here is the standby restarting, 4 seconds before the active MDS's suicided:

2018-02-28 10:25:18.222865 7f9f1ea3b1c0  0 set uid:gid to 167:167 (ceph:ceph)
2018-02-28 10:25:18.222892 7f9f1ea3b1c0  0 ceph version 12.2.4
(52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process
(unknown), pid 10648
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy won't install luminous (but Jewel instead)

2018-02-28 Thread Massimiliano Cuttini

This worked.

However, somebody should investigate why the default is still jewel on CentOS 7.4.
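
For the archives, the full form of the workaround (hostnames are placeholders):

ceph-deploy install --release luminous node1 node2 node3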


On 28/02/2018 00:53, jorpilo wrote:

Try using:
ceph-deploy --release luminous host1...

-------- Original message --------
From: Massimiliano Cuttini
Date: 28/2/18 12:42 a.m. (GMT+01:00)
To: ceph-users@lists.ceph.com
Subject: [ceph-users] ceph-deploy won't install luminous (but Jewel
instead)


This is the 5th time that I have installed and then purged the installation.
ceph-deploy always installs JEWEL instead of Luminous.

No luck even if I force the repo from the default to luminous:

https://download.ceph.com/rpm-luminous/el7/noarch

It still installs Jewel; it's stuck.

I've already checked that I had installed yum-plugin-priorities, and I
did.

Everything is exactly as the documentation requests.
But I still always get Jewel and not Luminous.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com