Re: [ceph-users] Clarification on sequence of recovery and client ops after OSDs rejoin cluster (also, slow requests)

2017-09-13 Thread Christian Theune
Hi Brad,

> On Sep 14, 2017, at 3:15 AM, Brad Hubbard  wrote:
> 
> On Wed, Sep 13, 2017 at 8:40 PM, Florian Haas  wrote:
>> Hi everyone,
>> 
>> 
>> disclaimer upfront: this was seen in the wild on Hammer, and on 0.94.7
>> no less. Reproducing this on 0.94.10 is a pending process, and we'll
>> update here with findings, but my goal with this post is really to
> 
> Just making sure it's understood Hammer is a retired release.

thanks for the reminder. We already noticed when upgrading our development 
cluster to 0.94.10 and experienced immediate continuous segfaults that was 
known and has a fix on the branch but was also met with “EOL”. 0.94.10 is 
completely unusable for me without that fix and I pulled that in manually.

Our upgrade policy has been _extremely_ cautious (much much more than we 
usually are on other things like kernels, Qemu, etc) as we’ve been bitten over 
the last years again and again by stability and performance issues.

We’re currently on the road to finally update to Jewel but wanted to figure out 
some of the kinks that we’ve been experiencing on hammer to find out whether 
those might already be fixed in Jewel or whether we’re on our way to find just 
another major stability/performance issue that (for some reason) we seem to be 
really good at stumbling into. ;)

So - we’re aware of the (recent?) Hammer EOL and we’ve been wanting to move to 
Jewel for a while already (but _are_ happy that we haven’t been bitten by some 
of the issue in the releases up to .6 or so) but we need to tread carefully.

If anyone cares to still assist on the current issue that would be very 
appreciated. We might consider upgrading to Jewel without fixing this first but 
unfortunately our dev/staging clusters aren’t always able to predict all 
performance/stability issues that we have encountered later in production.

Cheers,
Christian


--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



signature.asc
Description: Message signed with OpenPGP
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] unknown PG state in a newly created pool.

2017-09-13 Thread dE .
Ok, removed this line and it got fixed --
crush location = "region=XX datacenter= room= row=N rack=N
chassis=N"
But why will it matter?

On Thu, Sep 14, 2017 at 12:11 PM, dE .  wrote:

> Hi,
> In my test cluster where I've just 1 OSD which's up and in --
>
> 1 osds: 1 up, 1 in
>
> I create a pool with size 1 and min_size 1 and PG of 1, or 2 or 3 or any
> no. However I cannot write objects to the cluster. The PGs are stuck in an
> unknown state --
>
> ceph -c /etc/ceph/cluster.conf health detail
> HEALTH_WARN Reduced data availability: 2 pgs inactive; Degraded data
> redundancy: 2 pgs unclean; too few PGs per OSD (2 < min 30)
> PG_AVAILABILITY Reduced data availability: 2 pgs inactive
> pg 1.0 is stuck inactive for 608.785938, current state unknown, last
> acting []
> pg 1.1 is stuck inactive for 608.785938, current state unknown, last
> acting []
> PG_DEGRADED Degraded data redundancy: 2 pgs unclean
> pg 1.0 is stuck unclean for 608.785938, current state unknown, last
> acting []
> pg 1.1 is stuck unclean for 608.785938, current state unknown, last
> acting []
> TOO_FEW_PGS too few PGs per OSD (2 < min 30)
>
> From the documentation --
> Placement groups are in an unknown state, because the OSDs that host them
> have not reported to the monitor cluster in a while.
>
> But all OSDs are up and in.
>
> Thanks for any help!
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] unknown PG state in a newly created pool.

2017-09-13 Thread dE .
Hi,
In my test cluster where I've just 1 OSD which's up and in --

1 osds: 1 up, 1 in

I create a pool with size 1 and min_size 1 and PG of 1, or 2 or 3 or any
no. However I cannot write objects to the cluster. The PGs are stuck in an
unknown state --

ceph -c /etc/ceph/cluster.conf health detail
HEALTH_WARN Reduced data availability: 2 pgs inactive; Degraded data
redundancy: 2 pgs unclean; too few PGs per OSD (2 < min 30)
PG_AVAILABILITY Reduced data availability: 2 pgs inactive
pg 1.0 is stuck inactive for 608.785938, current state unknown, last
acting []
pg 1.1 is stuck inactive for 608.785938, current state unknown, last
acting []
PG_DEGRADED Degraded data redundancy: 2 pgs unclean
pg 1.0 is stuck unclean for 608.785938, current state unknown, last
acting []
pg 1.1 is stuck unclean for 608.785938, current state unknown, last
acting []
TOO_FEW_PGS too few PGs per OSD (2 < min 30)

>From the documentation --
Placement groups are in an unknown state, because the OSDs that host them
have not reported to the monitor cluster in a while.

But all OSDs are up and in.

Thanks for any help!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Power outages!!! help!

2017-09-13 Thread hjcho616
Rooney,
Just tried hooking up osd.0 back.  osd.0 seems to be better as I was able to 
run ceph-objectstore-tool export so decided to try hooking it up.  Looks like 
journal is not happy.  Is there any way to get this running?  Or do I need to 
start getting data using ceph-objectstore-tool?
2017-09-13 18:51:50.051421 7f44dd847800  0 set uid:gid to 1001:1001 
(ceph:ceph)2017-09-13 18:51:50.051435 7f44dd847800  0 ceph version 10.2.9 
(2ee413f77150c0f375ff6f10edd6c8f9c7d060d0), process ceph-osd, pid 
38992017-09-13 18:51:50.052323 7f44dd847800  0 pidfile_write: ignore empty 
--pid-file2017-09-13 18:51:50.061586 7f44dd847800  0 
filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)2017-09-13 
18:51:50.061823 7f44dd847800  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl 
is disabled via 'filestore fiemap' config option2017-09-13 18:51:50.061826 
7f44dd847800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) 
detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' 
config option2017-09-13 18:51:50.061838 7f44dd847800  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice is 
supported2017-09-13 18:51:50.077506 7f44dd847800  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) 
syscall fully supported (by glibc and kernel)2017-09-13 18:51:50.077549 
7f44dd847800  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: 
extsize is disabled by conf2017-09-13 18:51:50.078066 7f44dd847800  1 leveldb: 
Recovering log #280692017-09-13 18:51:50.177610 7f44dd847800  1 leveldb: Delete 
type=0 #28069
2017-09-13 18:51:50.177708 7f44dd847800  1 leveldb: Delete type=3 #28068
2017-09-13 18:51:57.946233 7f44dd847800  0 filestore(/var/lib/ceph/osd/ceph-0) 
mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled2017-09-13 
18:51:57.947293 7f44dd847800  1 journal _open /var/lib/ceph/osd/ceph-0/journal 
fd 18: 9998729216 bytes, block size 4096 bytes, directio = 1, aio = 12017-09-13 
18:51:57.949835 7f44dd847800 -1 journal Unable to read past sequence 27057121 
but header indicates the journal has committed up through 27057593, journal is 
corrupt2017-09-13 18:51:57.951824 7f44dd847800 -1 os/filestore/FileJournal.cc: 
In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' 
thread 7f44dd847800 time 2017-09-13 18:51:57.949837os/filestore/FileJournal.cc: 
2036: FAILED assert(0)
 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0) 1: 
(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) 
[0x55c809640d02] 2: (FileJournal::read_entry(ceph::buffer::list&, unsigned 
long&, bool*)+0xa84) [0x55c8093c4da4] 3: 
(JournalingObjectStore::journal_replay(unsigned long)+0x205) [0x55c8092feb95] 
4: (FileStore::mount()+0x2e28) [0x55c8092d0a88] 5: (OSD::init()+0x27d) 
[0x55c808f697ed] 6: (main()+0x2a64) [0x55c808ed05d4] 7: 
(__libc_start_main()+0xf5) [0x7f44da6e3b45] 8: (()+0x341117) [0x55c808f1b117] 
NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.
--- begin dump of recent events ---   -59> 2017-09-13 18:51:50.043283 
7f44dd847800  5 asok(0x55c813d76000) register_command perfcounters_dump hook 
0x55c813cbe030   -58> 2017-09-13 18:51:50.043312 7f44dd847800  5 
asok(0x55c813d76000) register_command 1 hook 0x55c813cbe030   -57> 2017-09-13 
18:51:50.043317 7f44dd847800  5 asok(0x55c813d76000) register_command perf dump 
hook 0x55c813cbe030   -56> 2017-09-13 18:51:50.043322 7f44dd847800  5 
asok(0x55c813d76000) register_command perfcounters_schema hook 0x55c813cbe030   
-55> 2017-09-13 18:51:50.043326 7f44dd847800  5 asok(0x55c813d76000) 
register_command 2 hook 0x55c813cbe030   -54> 2017-09-13 18:51:50.043330 
7f44dd847800  5 asok(0x55c813d76000) register_command perf schema hook 
0x55c813cbe030   -53> 2017-09-13 18:51:50.043334 7f44dd847800  5 
asok(0x55c813d76000) register_command perf reset hook 0x55c813cbe030   -52> 
2017-09-13 18:51:50.043339 7f44dd847800  5 asok(0x55c813d76000) 
register_command config show hook 0x55c813cbe030   -51> 2017-09-13 
18:51:50.043344 7f44dd847800  5 asok(0x55c813d76000) register_command config 
set hook 0x55c813cbe030   -50> 2017-09-13 18:51:50.043349 7f44dd847800  5 
asok(0x55c813d76000) register_command config get hook 0x55c813cbe030   -49> 
2017-09-13 18:51:50.043355 7f44dd847800  5 asok(0x55c813d76000) 
register_command config diff hook 0x55c813cbe030   -48> 2017-09-13 
18:51:50.043361 7f44dd847800  5 asok(0x55c813d76000) register_command log flush 
hook 0x55c813cbe030   -47> 2017-09-13 18:51:50.043367 7f44dd847800  5 
asok(0x55c813d76000) register_command log dump hook 0x55c813cbe030   -46> 
2017-09-13 18:51:50.043373 7f44dd847800  5 asok(0x55c813d76000) 
register_command log reopen hook 0x55c813cbe030   -45> 2017-09-13 
18:51:50.051421 7f44dd847800  0 set uid:gid to 1001:1001 (ceph:ceph)   -44> 
2017-09-13 18:51:50.051435 7f44dd847800  0 ceph version 10.2.9 
(2ee413f77150c0f375ff6f10edd

[ceph-users] access ceph filesystem at storage level and not via ethernet

2017-09-13 Thread James Okken
Thanks Ronny! Exactly the info I need. And kinda of what I thought the answer 
would be as I was typing and thinking clearer about what I was asking. I just 
was hoping CEPH would work like this since the openstack fuel tools deploy CEPH 
storage nodes easily.
I agree I would not be using CEPH for its strengths.

I am interested further in what you've said in this paragraph though:

"if you want to have FC SAN attached storage on servers, shareable 
between servers in a usable fashion I would rather mount the same SAN 
lun on multiple servers and use a cluster filesystem like ocfs or gfs 
that is made for this kind of solution."

Please allow me to ask you a few questions regarding that even though it isn't 
CEPH specific.

Do you mean gfs/gfs2 global file system?

Does ocfs and/or gfs require some sort of management/clustering server to 
maintain and manage? (akin to a CEPH OSD)
I'd love to find a distributed/cluster filesystem where I can just partition 
and format. And then be able to mount and use that same SAN datastore from 
multiple servers without a management server.
If ocfs or gfs do need a server of this sort does it needed to be involved in 
the I/O? or will I be able to mount the datastore, similar to any other disk 
and the IO goes across the fiberchannel?

One final question, if you don't mind, do you think I could use ext4or xfs and 
"mount the same SAN lun on multiple servers" if I can guarantee each server 
will only right to its own specific directory and never anywhere the other 
servers will be writing? (I even have the SAN mapped to each server using 
different lun's)

Thanks for your expertise!

-- Jim

-- next part --

Message: 27
Date: Wed, 13 Sep 2017 19:56:07 +0200
From: Ronny Aasen 
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] access ceph filesystem at storage level and
not via ethernet
Message-ID: 
Content-Type: text/plain; charset="windows-1252"; Format="flowed"


a bit cracy :)

if the disks are directly attached on a OSD node, or attachable on 
Fiberchannel does not make a difference.  you can not shortcut the ceph 
cluster and talk to the osd disks directly without eventually destroying 
the ceph cluster.

Even if you did, ceph is an object storage on disk, so you would not 
find filesystem or RBD diskimages there, only objects on your FC 
attached osd node disks with filestore, and with bluestore not even 
readable objects.

that beeing said I think a FC SAN attached ceph osd node sounds a bit 
strange. ceph's strength is the distributed scaleable solution. and 
having the osd nodes collected on a SAN array would nuter ceph's 
strengths, and amplify ceph's weakness of high latency. i would only 
consider such a solution for testing, learning or playing around without 
having actual hardware for a distributed system.  and in that case use 1 
lun for each osd disk, give 8-10 vm's some luns/osd's each, just to 
learn how to work with ceph.

if you want to have FC SAN attached storage on servers, shareable 
between servers in a usable fashion I would rather mount the same SAN 
lun on multiple servers and use a cluster filesystem like ocfs or gfs 
that is made for this kind of solution.


kind regards
Ronny Aasen

On 13.09.2017 19:03, James Okken wrote:
>
> Hi,
>
> Novice question here:
>
> The way I understand CEPH is that it distributes data in OSDs in a 
> cluster. The reads and writes come across the ethernet as RBD requests 
> and the actual data IO then also goes across the ethernet.
>
> I have a CEPH environment being setup on a fiber channel disk array 
> (via an openstack fuel deploy). The servers using the CEPH storage 
> also have access to the same fiber channel disk array.
>
> From what I understand those servers would need to make the RDB 
> requests and do the IO across ethernet, is that correct? Even though 
> with this infrastructure setup there is a ?shorter? and faster path to 
> those disks, via the fiber channel.
>
> Is there a way to access storage on a CEPH cluster when one has this 
> ?better? access to the disks in the cluster? (how about if it were to 
> be only a single OSD with replication set to 1)
>
> Sorry if this question is crazy?
>
> thanks
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Clarification on sequence of recovery and client ops after OSDs rejoin cluster (also, slow requests)

2017-09-13 Thread Brad Hubbard
On Wed, Sep 13, 2017 at 8:40 PM, Florian Haas  wrote:
> Hi everyone,
>
>
> disclaimer upfront: this was seen in the wild on Hammer, and on 0.94.7
> no less. Reproducing this on 0.94.10 is a pending process, and we'll
> update here with findings, but my goal with this post is really to

Just making sure it's understood Hammer is a retired release.

> establish whether the behavior as seen is expected, and if so, what
> the rationale for it is. This is all about slow requests popping up on
> a rather large scale after a previously down OSD node is brought back
> into the cluster.
>
>
> So here's the sequence of events for the issue I'm describing, as seen
> in a test:
>
> 22:08:53 - OSD node stopped. OSDs 6, 17, 18, 22, 31, 32, 36, 45, 58
> mark themselves down. Cluster has noout set, so all OSDs remain in.
> fio tests are running against RBD in a loop, thus there is heavy
> client I/O activity generating lots of new objects.
>
> 22:36:57 - OSD node restarted. OSDs 6, 17, 18, 22, 31, 32, 36, 45, 58
> boot. Recovery commences, client I/O continues.
>
> 22:36:58 - OSD 17 is marked up (this will be relevant later).
>
> 22:37:01 - Last of the booted OSDs is marked up. Client I/O continues.
>
> 22:37:28 - Slow request warnings appear. osd op complaint time is set
> to 30, so these are requests that were issued around the time of the
> node restart. In other words, the cluster gets slow immediately after
> the node comes back up.
>
> 22:49:28 - Last slow request warning seen.
>
> 23:00:43 - Recovery complete, all PGs active+clean.
>
>
> Here is an example of such a slow request, as seen in ceph.log (and ceph -w) :
> 2017-09-09 22:37:33.241347 osd.30 172.22.4.54:6808/8976 3274 : cluster
> [WRN] slow request 30.447920 seconds old, received at 2017-09-09
> 22:37:02.793382: osd_op(client.790810466.0:55974959
> rbd_data.154cd34979e2a9e3.1980 [set-alloc-hint object_size
> 4194304 write_size 4194304,write 438272~24576] 277.bbddc5af snapc
> 4f37f=[4f37f] ack+ondisk+write+known_if_redirected e1353592) currently
> waiting for degraded object
>
>
> Having bumped the OSD's log level to 10 beforehand, these are some of
> OSD 30's log entries correlating with this event, with a client
> attempting to access rbd_data.154cd34979e2a9e3.1980:
>
> 2017-09-09 22:36:50.251219 7fe6ed7e9700 10 osd.30 pg_epoch: 1353586
> pg[277.5af( v 1353586'11578774 (1353421'11574953,1353586'11578774]
> local-les=1353563 n=1735 ec=862683 les/c 1353563/1353563
> 1353562/1353562/135) [30,94] r=0 lpr=1353562 luod=1353586'11578773
> lua=1353586'11578773 crt=1353586'11578772 lcod 1353586'11578772 mlcod
> 1353586'11578772 active+undersized+degraded
> snaptrimq=[4f12c~1,4f136~2,4f139~1,4f13c~1,4f13f~1,4f142~1,4f146~1,4f3f6~1,4f3fa~2,4f3fd~2,4f400~3,4f404~1,4f406~1,4f40d~2,4f410~3,4f414~2,4f419~1,4f41b~1,4f41d~3,4f421~1,4f423~1,4f426~1,4f428~1,4f42a~1]]
> append_log: trimming to 1353586'11578772 entries 1353586'11578771
> (1353585'11578560) modify
> 277/bbddc5af/rbd_data.154cd34979e2a9e3.1980/head by
> client.790810466.0:55974546 2017-09-09
> 22:36:35.644687,1353586'11578772 (1353586'11578761) modify
> 277/b6d355af/rbd_data.28ab56362eb141f2.3483/head by
> client.790921544.0:36806288 2017-09-09 22:36:37.940141
>
> Note the 10-second gap in lines containing
> "rbd_data.154cd34979e2a9e3.1980" between this log line and
> the next. But this cluster has an unusually long osd snap trim sleep
> of 500ms, so that might contribute to the delay here. Right in
> between, at 22:36:58, OSD 17 comes up.
>
>
> Next comes what I am most curious about:
>
> 2017-09-09 22:37:00.220011 7fe6edfea700 10 osd.30 pg_epoch: 1353589
> pg[277.5af( v 1353586'11578776 (1353421'11574953,1353586'11578776]
> local-les=1353589 n=1735 ec=862683 les/c 1353563/1353563
> 1353588/1353588/135) [30,17,94] r=0 lpr=1353588
> pi=1353562-1353587/1 crt=1353586'11578773 lcod 1353586'11578775 mlcod
> 0'0 inactive 
> snaptrimq=[4f12c~1,4f136~2,4f139~1,4f13c~1,4f13f~1,4f142~1,4f146~1,4f3f6~1,4f3fa~2,4f3fd~2,4f400~3,4f404~1,4f406~1,4f40d~2,4f410~3,4f414~2,4f419~1,4f41b~1,4f41d~3,4f421~1,4f423~1,4f426~1,4f428~1,4f42a~1]]
> search_for_missing
> 277/bbddc5af/rbd_data.154cd34979e2a9e3.1980/head
> 1353586'11578774 is on osd.30
> 2017-09-09 22:37:00.220307 7fe6edfea700 10 osd.30 pg_epoch: 1353589
> pg[277.5af( v 1353586'11578776 (1353421'11574953,1353586'11578776]
> local-les=1353589 n=1735 ec=862683 les/c 1353563/1353563
> 1353588/1353588/135) [30,17,94] r=0 lpr=1353588
> pi=1353562-1353587/1 crt=1353586'11578773 lcod 1353586'11578775 mlcod
> 0'0 inactive 
> snaptrimq=[4f12c~1,4f136~2,4f139~1,4f13c~1,4f13f~1,4f142~1,4f146~1,4f3f6~1,4f3fa~2,4f3fd~2,4f400~3,4f404~1,4f406~1,4f40d~2,4f410~3,4f414~2,4f419~1,4f41b~1,4f41d~3,4f421~1,4f423~1,4f426~1,4f428~1,4f42a~1]]
> search_for_missing
> 277/bbddc5af/rbd_data.154cd34979e2a9e3.1980/head
> 1353586'11578774 also missing on osd.17
> 2017-09-09 22:37:00.220599 7fe6edfea700 10 osd.30

Re: [ceph-users] Clarification on sequence of recovery and client ops after OSDs rejoin cluster (also, slow requests)

2017-09-13 Thread Josh Durgin

On 09/13/2017 03:40 AM, Florian Haas wrote:

So we have a client that is talking to OSD 30. OSD 30 was never down;
OSD 17 was. OSD 30 is also the preferred primary for this PG (via
primary affinity). The OSD now says that

- it does itself have a copy of the object,
- so does OSD 94,
- but that the object is "also" missing on OSD 17.

So I'd like to ask firstly: what does "also" mean here?


Nothing, it's just included in all the log messages in the loop looking
at whether objects are missing.


Secondly, if the local copy is current, and we have no fewer than
min_size objects, and recovery is meant to be a background operation,
then why is the recovery in the I/O path here? Specifically, why is
that the case on a write, where the object is being modified anyway,
and the modification then needs to be replicated out to OSDs 17 and
94?


Mainly because recovery pre-dated the concept of min_size. We realized
this was a problem during luminous development, but did not complete the
fix for it in time for luminous. Nice analysis of the issue though!

I'm working on the fix (aka async recovery) for mimic. This won't be 
backportable unfortunately.


Josh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-13 Thread David
Hi All

I did a Jewel -> Luminous upgrade on my dev cluster and it went very
smoothly.

I've attempted to upgrade on a small production cluster but I've hit a
snag.

After installing the ceph 12.2.0 packages with "yum install ceph" on the
first node and accepting all the dependencies, I found that all the OSD
daemons, the MON and the MDS running on that node were terminated. Systemd
appears to have attempted to restart them all but the daemons didn't start
successfully (not surprising as first stage of upgrading all mons in
cluster not completed). I was able to start the MON and it's running. The
OSDs are all down and I'm reluctant to attempt to start them without
upgrading the other MONs in the cluster. I'm also reluctant to attempt
upgrading the remaining 2 MONs without understanding what happened.

The cluster is on Jewel 10.2.5 (as was the dev cluster)
Both clusters running on CentOS 7.3

The only obvious difference I can see between the dev and production is the
production has selinux running in permissive mode, the dev had it disabled.

Any advice on how to proceed at this point would be much appreciated. The
cluster is currently functional, but I have 1 node out 4 with all OSDs
down. I had noout set before the upgrade and I've left it set for now.

Here's the journalctl right after the packages were installed (hostname
changed):

https://pastebin.com/fa6NMyjG
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Anyone else having digest issues with Apple Mail?

2017-09-13 Thread Tomasz Kusmierz
Nope. no problem here.
> On 13 Sep 2017, at 22:05, Anthony D'Atri  wrote:
> 
> For a couple of weeks now digests have been appearing to me off and on with a 
> few sets of MIME headers and maybe 1-2 messages.  When I look at the raw text 
> the whole digest is in there.
> 
> Screencap below.  Anyone else experiencing this?
> 
> 
> https://www.evernote.com/l/AL2CMToOPiBIJYZgw9KzswqiBhHHoRIm6hA 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Anyone else having digest issues with Apple Mail?

2017-09-13 Thread Anthony D'Atri
For a couple of weeks now digests have been appearing to me off and on with a 
few sets of MIME headers and maybe 1-2 messages.  When I look at the raw text 
the whole digest is in there.

Screencap below.  Anyone else experiencing this?


https://www.evernote.com/l/AL2CMToOPiBIJYZgw9KzswqiBhHHoRIm6hA 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Usage not balanced over OSDs

2017-09-13 Thread Jack
How many PGs ? How many pool (and how many data, please post rados df)

On 13/09/2017 22:30, Sinan Polat wrote:
> Hi,
> 
>  
> 
> I have 52 OSD's in my cluster, all with the same disk size and same weight.
> 
>  
> 
> When I perform a:
> 
> ceph osd df
> 
>  
> 
> The disk with the least available space: 863G
> 
> The disk with the most available space: 1055G
> 
>  
> 
> I expect the available space or the usage on the disks to be the same, since
> they have the same weight, but there is a difference of almost 200GB.
> 
>  
> 
> Due to this, the MAX AVAIL in ceph df is lower than expected (the MAX AVAIL
> is based on the disk with the least available space).
> 
>  
> 
> -  How can I balance the disk usage over the disks, so the usage /
> available space on each disk is more or less the same?
> 
> -  What will happen if I hit the MAX AVAIL, while most of the disks
> still have space?
> 
>  
> 
> Thanks!
> 
> Sinan
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Usage not balanced over OSDs

2017-09-13 Thread Sinan Polat
Hi,

 

I have 52 OSD's in my cluster, all with the same disk size and same weight.

 

When I perform a:

ceph osd df

 

The disk with the least available space: 863G

The disk with the most available space: 1055G

 

I expect the available space or the usage on the disks to be the same, since
they have the same weight, but there is a difference of almost 200GB.

 

Due to this, the MAX AVAIL in ceph df is lower than expected (the MAX AVAIL
is based on the disk with the least available space).

 

-  How can I balance the disk usage over the disks, so the usage /
available space on each disk is more or less the same?

-  What will happen if I hit the MAX AVAIL, while most of the disks
still have space?

 

Thanks!

Sinan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What's 'failsafe full'

2017-09-13 Thread Sinan Polat
Hi

 

According to: 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-July/003140.html

 

You can set it with:

on the OSDs you may (not) want to change "osd failsafe full ratio" and "osd 
failsafe nearfull ratio".

 

Van: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] Namens dE
Verzonden: woensdag 13 september 2017 16:39
Aan: ceph-users@lists.ceph.com
Onderwerp: [ceph-users] What's 'failsafe full'

 

Hello everyone,

Just started with Ceph here.

I was reading the documentation here -- 

http://docs.ceph.com/docs/master/rados/operations/health-checks/#osd-out-of-order-full

And just started to wonder what's failsafe_full... I know it's some kind of 
ratio, but how do I change it? I didn't find anything on google 

 .

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore "separate" WAL and DB (and WAL/DB size?)

2017-09-13 Thread Mark Nelson

Hi Richard,

Regarding recovery speed, have you looked through any of Neha's results 
on recovery sleep testing earlier this summer?


https://www.spinics.net/lists/ceph-devel/msg37665.html

She tested bluestore and filestore under a couple of different 
scenarios.  The gist of it is that time to recover changes pretty 
dramatically depending on the sleep setting.


I don't recall if you said earlier, but are you comparing filestore and 
bluestore recovery performance on the same version of ceph with the same 
sleep settings?


Mark

On 09/12/2017 05:24 AM, Richard Hesketh wrote:

Thanks for the links. That does seem to largely confirm that what I haven't 
horribly misunderstood anything and I've not been doing anything obviously 
wrong while converting my disks; there's no point specifying separate WAL/DB 
partitions if they're going to go on the same device, throw as much space as 
you have available at the DB partitions and they'll use all the space they can, 
and significantly reduced I/O on the DB/WAL device compared to Filestore is 
expected since bluestore's nixed the write amplification as much as possible.

I'm still seeing much reduced recovery speed on my newly Bluestored cluster, 
but I guess that's a tuning issue rather than evidence of catastrophe.

Rich

On 12/09/17 00:13, Brad Hubbard wrote:

Take a look at these which should answer at least some of your questions.

http://ceph.com/community/new-luminous-bluestore/

http://ceph.com/planet/understanding-bluestore-cephs-new-storage-backend/

On Mon, Sep 11, 2017 at 8:45 PM, Richard Hesketh
 wrote:

On 08/09/17 11:44, Richard Hesketh wrote:

Hi,

Reading the ceph-users list I'm obviously seeing a lot of people talking about 
using bluestore now that Luminous has been released. I note that many users 
seem to be under the impression that they need separate block devices for the 
bluestore data block, the DB, and the WAL... even when they are going to put 
the DB and the WAL on the same device!

As per the docs at 
http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/ this 
is nonsense:


If there is only a small amount of fast storage available (e.g., less than a 
gigabyte), we recommend using it as a WAL device. If there is more, 
provisioning a DB
device makes more sense. The BlueStore journal will always be placed on the 
fastest device available, so using a DB device will provide the same benefit 
that the WAL
device would while also allowing additional metadata to be stored there (if it will fix). 
[sic, I assume that should be "fit"]


I understand that if you've got three speeds of storage available, there may be 
some sense to dividing these. For instance, if you've got lots of HDD, a bit of 
SSD, and a tiny NVMe available in the same host, data on HDD, DB on SSD and WAL 
on NVMe may be a sensible division of data. That's not the case for most of the 
examples I'm reading; they're talking about putting DB and WAL on the same 
block device, but in different partitions. There's even one example of someone 
suggesting to try partitioning a single SSD to put data/DB/WAL all in separate 
partitions!

Are the docs wrong and/or I am missing something about optimal bluestore setup, 
or do people simply have the wrong end of the stick? I ask because I'm just 
going through switching all my OSDs over to Bluestore now and I've just been 
reusing the partitions I set up for journals on my SSDs as DB devices for 
Bluestore HDDs without specifying anything to do with the WAL, and I'd like to 
know sooner rather than later if I'm making some sort of horrible mistake.

Rich


Having had no explanatory reply so far I'll ask further...

I have been continuing to update my OSDs and so far the performance offered by 
bluestore has been somewhat underwhelming. Recovery operations after replacing 
the Filestore OSDs with Bluestore equivalents have been much slower than 
expected, not even half the speed of recovery ops when I was upgrading 
Filestore OSDs with larger disks a few months ago. This contributes to my sense 
that I am doing something wrong.

I've found that if I allow ceph-disk to partition my DB SSDs rather than 
reusing the rather large journal partitions I originally created for Filestore, 
it is only creating very small 1GB partitions. Attempting to search for 
bluestore configuration parameters has pointed me towards 
bluestore_block_db_size and bluestore_block_wal_size config settings. 
Unfortunately these settings are completely undocumented so I'm not sure what 
their functional purpose is. In any event in my running config I seem to have 
the following default values:

# ceph-conf --show-config | grep bluestore
...
bluestore_block_create = true
bluestore_block_db_create = false
bluestore_block_db_path =
bluestore_block_db_size = 0
bluestore_block_path =
bluestore_block_preallocate_file = false
bluestore_block_size = 10737418240
bluestore_block_wal_create = false
bluestore_block_wal_path =
bluestore_block_wal_size = 

Re: [ceph-users] Ceph Mentors for next Outreachy Round

2017-09-13 Thread Ali Maredia
Cephers,

Last week I talked to quite a few of you and was encouraged by the 
initial interest in mentoring an intern for an Outreachy Project. At the 
same time many of you had questions about the schedule. 

To clarify, here is the full schedule:

1. Sept. 20, 2017   Deadline to send Leo and I project ideas
2. Oct. 2, 2017 Projects will be posted on Ceph.com
3. Oct. 23, 2017Applications and contributions deadline
4. Nov. 9, 2017 Accepted interns announced on this page at 4pm UTC
5. December 5, 2017 to March 5, 2018Internships period 

Between the deadline 1 & 2, I will be working with mentors on writing a 
description
for the website and coming up with a set of tasks for prospective students to
do to understand the project better and write a good application.

The deadlines for sending Leo and I project ideas and putting projects are
soft, however the Applications and Contributions deadline is hard.

If you have any questions or want to talk about proposing a project feel free
to reach out to Leo and I.

Best,

Ali

- Original Message -
> From: "Ali Maredia" 
> To: "Ceph Development" , ceph-us...@ceph.com
> Cc: "Leonardo Vaz" 
> Sent: Tuesday, September 5, 2017 3:03:18 PM
> Subject: Ceph Mentors for next Outreachy Round
> 
> Hey Cephers,
> 
> Leo and I are coordinators for Ceph's particpation in Outreachy
> (https://www.outreachy.org/), a program similar to the Google Summer of Code
> for groups that are traditionally underrepresented in tech. During the
> program,
> mentee's work on a project for three months under a mentor and with the rest
> of the community.
> 
> Outreachy has two rounds each year, one of which is starting in December and
> ending in March.
> 
> If you have any project ideas you would like to be a mentor for this round,
> please send Leo and I a project title and a two to three sentence
> description to start with. The deadline for proposing a project for Ceph this
> round
> is a week from today, Tuesday September 12th.
> 
> If you would like a reference for this summers Google Summer of Code projects
> to get an idea of previous projects, you can see them here:
> 
> http://ceph.com/gsoc2017-ideas/
> 
> If you have any questions please don't hesitate to ask.
> 
> Thanks,
> 
> Ali & Leo
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] after reboot node appear outside the root root tree

2017-09-13 Thread Maxime Guyot
Hi,

This is a common problem when doing custom CRUSHmap, the default behavior
is to update the OSD node to location in the CRUSHmap on start. did you
keep to the defaults there?

If that is the problem, you can either:
1) Disable the update on start option: "osd crush update on start = false"
(see
http://docs.ceph.com/docs/master/rados/operations/crush-map/#crush-location)
2) Customize the script defining the location of OSDs with "crush location
hook = /path/to/customized-ceph-crush-location" (see
https://github.com/ceph/ceph/blob/master/src/ceph-crush-location.in).

Cheers,
Maxime

On Wed, 13 Sep 2017 at 18:35 German Anders  wrote:

> *# ceph health detail*
> HEALTH_OK
>
> *# ceph osd stat*
> 48 osds: 48 up, 48 in
>
> *# ceph pg stat*
> 3200 pgs: 3200 active+clean; 5336 MB data, 79455 MB used, 53572 GB / 53650
> GB avail
>
>
> *German*
>
> 2017-09-13 13:24 GMT-03:00 dE :
>
>> On 09/13/2017 09:08 PM, German Anders wrote:
>>
>> Hi cephers,
>>
>> I'm having an issue with a newly created cluster 12.2.0
>> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc). Basically when I
>> reboot one of the nodes, and when it come back, it come outside of the root
>> type on the tree:
>>
>> root@cpm01:~# ceph osd tree
>> ID  CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
>> -15   12.0 *root default*
>> * 36  nvme  1.0 osd.36 up  1.0 1.0*
>> * 37  nvme  1.0 osd.37 up  1.0 1.0*
>> * 38  nvme  1.0 osd.38 up  1.0 1.0*
>> * 39  nvme  1.0 osd.39 up  1.0 1.0*
>> * 40  nvme  1.0 osd.40 up  1.0 1.0*
>> * 41  nvme  1.0 osd.41 up  1.0 1.0*
>> * 42  nvme  1.0 osd.42 up  1.0 1.0*
>> * 43  nvme  1.0 osd.43 up  1.0 1.0*
>> * 44  nvme  1.0 osd.44 up  1.0 1.0*
>> * 45  nvme  1.0 osd.45 up  1.0 1.0*
>> * 46  nvme  1.0 osd.46 up  1.0 1.0*
>> * 47  nvme  1.0 osd.47 up  1.0 1.0*
>>  -7   36.0 *root root*
>>  -5   24.0 rack rack1
>>  -1   12.0 node cpn01
>>   01.0 osd.0  up  1.0 1.0
>>   11.0 osd.1  up  1.0 1.0
>>   21.0 osd.2  up  1.0 1.0
>>   31.0 osd.3  up  1.0 1.0
>>   41.0 osd.4  up  1.0 1.0
>>   51.0 osd.5  up  1.0 1.0
>>   61.0 osd.6  up  1.0 1.0
>>   71.0 osd.7  up  1.0 1.0
>>   81.0 osd.8  up  1.0 1.0
>>   91.0 osd.9  up  1.0 1.0
>>  101.0 osd.10 up  1.0 1.0
>>  111.0 osd.11 up  1.0 1.0
>>  -3   12.0 node cpn03
>>  241.0 osd.24 up  1.0 1.0
>>  251.0 osd.25 up  1.0 1.0
>>  261.0 osd.26 up  1.0 1.0
>>  271.0 osd.27 up  1.0 1.0
>>  281.0 osd.28 up  1.0 1.0
>>  291.0 osd.29 up  1.0 1.0
>>  301.0 osd.30 up  1.0 1.0
>>  311.0 osd.31 up  1.0 1.0
>>  321.0 osd.32 up  1.0 1.0
>>  331.0 osd.33 up  1.0 1.0
>>  341.0 osd.34 up  1.0 1.0
>>  351.0 osd.35 up  1.0 1.0
>>  -6   12.0 rack rack2
>>  -2   12.0 node cpn02
>>  121.0 osd.12 up  1.0 1.0
>>  131.0 osd.13 up  1.0 1.0
>>  141.0 osd.14 up  1.0 1.0
>>  151.0 osd.15 up  1.0 1.0
>>  161.0 osd.16 up  1.0 1.0
>>  171.0 osd.17 up  1.0 1.0
>>  181.0 osd.18 up  1.0 1.0
>>  191.0 osd.19 up  1.0 1.0
>>  201.0 osd.20 up  1.0 1.0
>>  211.0 osd.21 up  1.0 1.0
>>  221.0 osd.22 up  1.0 1.0
>>  231.0 osd.23 up  1.0 1.0
>> * -4  0 node cpn04*
>>
>> Any ideas of why this happen? and how can I fix it? It supposed to be
>> inside rack2
>>
>> Thanks in advance,
>>
>> Best,
>>
>> *German*
>>
>>
>> ___
>> ceph-users mailing 
>> listceph-us...@lists.ce

Re: [ceph-users] after reboot node appear outside the root root tree

2017-09-13 Thread Luis Periquito
What's your "osd crush update on start" option?

further information can be found
http://docs.ceph.com/docs/master/rados/operations/crush-map/

On Wed, Sep 13, 2017 at 4:38 PM, German Anders  wrote:
> Hi cephers,
>
> I'm having an issue with a newly created cluster 12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc). Basically when I
> reboot one of the nodes, and when it come back, it come outside of the root
> type on the tree:
>
> root@cpm01:~# ceph osd tree
> ID  CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
> -15   12.0 root default
>  36  nvme  1.0 osd.36 up  1.0 1.0
>  37  nvme  1.0 osd.37 up  1.0 1.0
>  38  nvme  1.0 osd.38 up  1.0 1.0
>  39  nvme  1.0 osd.39 up  1.0 1.0
>  40  nvme  1.0 osd.40 up  1.0 1.0
>  41  nvme  1.0 osd.41 up  1.0 1.0
>  42  nvme  1.0 osd.42 up  1.0 1.0
>  43  nvme  1.0 osd.43 up  1.0 1.0
>  44  nvme  1.0 osd.44 up  1.0 1.0
>  45  nvme  1.0 osd.45 up  1.0 1.0
>  46  nvme  1.0 osd.46 up  1.0 1.0
>  47  nvme  1.0 osd.47 up  1.0 1.0
>  -7   36.0 root root
>  -5   24.0 rack rack1
>  -1   12.0 node cpn01
>   01.0 osd.0  up  1.0 1.0
>   11.0 osd.1  up  1.0 1.0
>   21.0 osd.2  up  1.0 1.0
>   31.0 osd.3  up  1.0 1.0
>   41.0 osd.4  up  1.0 1.0
>   51.0 osd.5  up  1.0 1.0
>   61.0 osd.6  up  1.0 1.0
>   71.0 osd.7  up  1.0 1.0
>   81.0 osd.8  up  1.0 1.0
>   91.0 osd.9  up  1.0 1.0
>  101.0 osd.10 up  1.0 1.0
>  111.0 osd.11 up  1.0 1.0
>  -3   12.0 node cpn03
>  241.0 osd.24 up  1.0 1.0
>  251.0 osd.25 up  1.0 1.0
>  261.0 osd.26 up  1.0 1.0
>  271.0 osd.27 up  1.0 1.0
>  281.0 osd.28 up  1.0 1.0
>  291.0 osd.29 up  1.0 1.0
>  301.0 osd.30 up  1.0 1.0
>  311.0 osd.31 up  1.0 1.0
>  321.0 osd.32 up  1.0 1.0
>  331.0 osd.33 up  1.0 1.0
>  341.0 osd.34 up  1.0 1.0
>  351.0 osd.35 up  1.0 1.0
>  -6   12.0 rack rack2
>  -2   12.0 node cpn02
>  121.0 osd.12 up  1.0 1.0
>  131.0 osd.13 up  1.0 1.0
>  141.0 osd.14 up  1.0 1.0
>  151.0 osd.15 up  1.0 1.0
>  161.0 osd.16 up  1.0 1.0
>  171.0 osd.17 up  1.0 1.0
>  181.0 osd.18 up  1.0 1.0
>  191.0 osd.19 up  1.0 1.0
>  201.0 osd.20 up  1.0 1.0
>  211.0 osd.21 up  1.0 1.0
>  221.0 osd.22 up  1.0 1.0
>  231.0 osd.23 up  1.0 1.0
>  -4  0 node cpn04
>
> Any ideas of why this happen? and how can I fix it? It supposed to be inside
> rack2
>
> Thanks in advance,
>
> Best,
>
> German
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] What's 'failsafe full'

2017-09-13 Thread dE

Hello everyone,

Just started with Ceph here.

I was reading the documentation here --

http://docs.ceph.com/docs/master/rados/operations/health-checks/#osd-out-of-order-full

And just started to wonder what's failsafe_full... I know it's some kind 
of ratio, but how do I change it? I didn't find anything on google 
.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] luminous ceph-osd crash

2017-09-13 Thread Marcin Dulak
Hi,

It looks like at sdb size around 1.1 GBytes ceph (ceph version 12.2.0
(32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)) is not crashing
anymore.
Please don't increase the minimum disk size requirements unnecessarily - it
makes it more demanding to test new ceph features and operational
procedures using small, virtual testing environments.

Best regards,

Marcin

On Fri, Sep 1, 2017 at 1:49 AM, Marcin Dulak  wrote:

> Hi,
>
> /var/log/ceph/ceph-osd.0.log is attached.
> My sdb is 128MB and sdc (journal) is 16MB:
>
> [root@server0 ~]# ceph-disk list
> /dev/dm-0 other, xfs, mounted on /
> /dev/dm-1 swap, swap
> /dev/sda :
>  /dev/sda1 other, 0x83
>  /dev/sda2 other, xfs, mounted on /boot
>  /dev/sda3 other, LVM2_member
> /dev/sdb :
>  /dev/sdb1 ceph data, active, cluster ceph, osd.0, journal /dev/sdc1
> /dev/sdc :
>  /dev/sdc1 ceph journal, for /dev/sdb1
>
> Marcin
>
> On Thu, Aug 31, 2017 at 3:05 PM, Sage Weil  wrote:
>
>> Hi Marcin,
>>
>> Can you reproduce the crash with 'debug bluestore = 20' set, and then
>> ceph-post-file /var/log/ceph/ceph-osd.0.log?
>>
>> My guess is that we're not handling a very small device properly?
>>
>> sage
>>
>>
>> On Thu, 31 Aug 2017, Marcin Dulak wrote:
>>
>> > Hi,
>> >
>> > I have a virtual CentOS 7.3 test setup at:
>> > https://github.com/marcindulak/github-test-local/blob/
>> a339ff7505267545f593f
>> > d949a6453a56cdfd7fe/vagrant-ceph-rbd-tutorial-centos7.sh
>> >
>> > It seems to crash reproducibly with luminous, and works with kraken.
>> > Is this a known issue?
>> >
>> > [ceph_deploy.conf][DEBUG ] found configuration file at:
>> > /home/ceph/.cephdeploy.conf
>> > [ceph_deploy.cli][INFO  ] Invoked (1.5.37): /bin/ceph-deploy osd
>> activate
>> > server0:/dev/sdb1:/dev/sdc server1:/dev/sdb1:/dev/sdc
>> > server2:/dev/sdb1:/dev/sdc
>> > [ceph_deploy.cli][INFO  ] ceph-deploy options:
>> > [ceph_deploy.cli][INFO  ]  username  : None
>> > [ceph_deploy.cli][INFO  ]  verbose   : False
>> > [ceph_deploy.cli][INFO  ]  overwrite_conf: False
>> > [ceph_deploy.cli][INFO  ]  subcommand: activate
>> > [ceph_deploy.cli][INFO  ]  quiet : False
>> > [ceph_deploy.cli][INFO  ]  cd_conf   :
>> > 
>> > [ceph_deploy.cli][INFO  ]  cluster   : ceph
>> > [ceph_deploy.cli][INFO  ]  func  : > osd at
>> > 0x109fb90>
>> > [ceph_deploy.cli][INFO  ]  ceph_conf : None
>> > [ceph_deploy.cli][INFO  ]  default_release   : False
>> > [ceph_deploy.cli][INFO  ]  disk  : [('server0',
>> > '/dev/sdb1', '/dev/sdc'), ('server1', '/dev/sdb1', '/dev/sdc'),
>> ('server2',
>> > '/dev/sdb1', '/dev/sdc')]
>> > [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks
>> > server0:/dev/sdb1:/dev/sdc server1:/dev/sdb1:/dev/sdc
>> > server2:/dev/sdb1:/dev/sdc
>> > [server0][DEBUG ] connection detected need for sudo
>> > [server0][DEBUG ] connected to host: server0
>> > [server0][DEBUG ] detect platform information from remote host
>> > [server0][DEBUG ] detect machine type
>> > [server0][DEBUG ] find the location of an executable
>> > [ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.3.1611 Core
>> > [ceph_deploy.osd][DEBUG ] activating host server0 disk /dev/sdb1
>> > [ceph_deploy.osd][DEBUG ] will use init type: systemd
>> > [server0][DEBUG ] find the location of an executable
>> > [server0][INFO  ] Running command: sudo /usr/sbin/ceph-disk -v activate
>> > --mark-init systemd --mount /dev/sdb1
>> > [server0][WARNIN] main_activate: path = /dev/sdb1
>> > [server0][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is
>> > /sys/dev/block/8:17/dm/uuid
>> > [server0][WARNIN] command: Running command: /sbin/blkid -o udev -p
>> /dev/sdb1
>> > [server0][WARNIN] command: Running command: /sbin/blkid -p -s TYPE -o
>> value
>> > -- /dev/sdb1
>> > [server0][WARNIN] command: Running command: /usr/bin/ceph-conf
>> > --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
>> > [server0][WARNIN] command: Running command: /usr/bin/ceph-conf
>> > --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
>> > [server0][WARNIN] mount: Mounting /dev/sdb1 on
>> /var/lib/ceph/tmp/mnt.wfKzzb
>> > with options noatime,inode64
>> > [server0][WARNIN] command_check_call: Running command: /usr/bin/mount
>> -t xfs
>> > -o noatime,inode64 -- /dev/sdb1 /var/lib/ceph/tmp/mnt.wfKzzb
>> > [server0][WARNIN] command: Running command: /sbin/restorecon
>> > /var/lib/ceph/tmp/mnt.wfKzzb
>> > [server0][WARNIN] activate: Cluster uuid is
>> > 04e79ca9-308c-41a5-b40d-a2737c34238d
>> > [server0][WARNIN] command: Running command: /usr/bin/ceph-osd
>> --cluster=ceph
>> > --show-config-value=fsid
>> > [server0][WARNIN] activate: Cluster name is ceph
>> > [server0][WARNIN] activate: OSD uuid is 46d7cc0b-a087-4c8c-b00c-ff584c
>> 941cf9
>> > [server0][WARNIN] activate: OSD id is 0
>> > [server0][WARNIN] activate: Initializing OSD...

Re: [ceph-users] access ceph filesystem at storage level and not via ethernet

2017-09-13 Thread Ronny Aasen

On 13.09.2017 19:03, James Okken wrote:


Hi,

Novice question here:

The way I understand CEPH is that it distributes data in OSDs in a 
cluster. The reads and writes come across the ethernet as RBD requests 
and the actual data IO then also goes across the ethernet.


I have a CEPH environment being setup on a fiber channel disk array 
(via an openstack fuel deploy). The servers using the CEPH storage 
also have access to the same fiber channel disk array.


From what I understand those servers would need to make the RDB 
requests and do the IO across ethernet, is that correct? Even though 
with this infrastructure setup there is a “shorter” and faster path to 
those disks, via the fiber channel.


Is there a way to access storage on a CEPH cluster when one has this 
“better” access to the disks in the cluster? (how about if it were to 
be only a single OSD with replication set to 1)


Sorry if this question is crazy…

thanks



a bit cracy :)

if the disks are directly attached on a OSD node, or attachable on 
Fiberchannel does not make a difference.  you can not shortcut the ceph 
cluster and talk to the osd disks directly without eventually destroying 
the ceph cluster.


Even if you did, ceph is an object storage on disk, so you would not 
find filesystem or RBD diskimages there, only objects on your FC 
attached osd node disks with filestore, and with bluestore not even 
readable objects.


that beeing said I think a FC SAN attached ceph osd node sounds a bit 
strange. ceph's strength is the distributed scaleable solution. and 
having the osd nodes collected on a SAN array would nuter ceph's 
strengths, and amplify ceph's weakness of high latency. i would only 
consider such a solution for testing, learning or playing around without 
having actual hardware for a distributed system.  and in that case use 1 
lun for each osd disk, give 8-10 vm's some luns/osd's each, just to 
learn how to work with ceph.


if you want to have FC SAN attached storage on servers, shareable 
between servers in a usable fashion I would rather mount the same SAN 
lun on multiple servers and use a cluster filesystem like ocfs or gfs 
that is made for this kind of solution.



kind regards
Ronny Aasen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] after reboot node appear outside the root root tree

2017-09-13 Thread German Anders
Thanks a lot Maxime, I did the osd_crush_update_on_start = false on
ceph.conf and push it to all the nodes, and then i create a map file:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
device 21 osd.21
device 22 osd.22
device 23 osd.23
device 24 osd.24
device 25 osd.25
device 26 osd.26
device 27 osd.27
device 28 osd.28
device 29 osd.29
device 30 osd.30
device 31 osd.31
device 32 osd.32
device 33 osd.33
device 34 osd.34
device 35 osd.35
device 36 osd.36
device 37 osd.37
device 38 osd.38
device 39 osd.39
device 40 osd.40
device 41 osd.41
device 42 osd.42
device 43 osd.43
device 44 osd.44
device 45 osd.45
device 46 osd.46
device 47 osd.47

# types
type 0 osd
type 1 node
type 2 rack
type 3 root

# buckets
node cpn01 {
id -1 # do not change unnecessarily
# weight 12.000
alg straw
hash 0 # rjenkins1
item osd.0 weight 1.000
item osd.1 weight 1.000
item osd.2 weight 1.000
item osd.3 weight 1.000
item osd.4 weight 1.000
item osd.5 weight 1.000
item osd.6 weight 1.000
item osd.7 weight 1.000
item osd.8 weight 1.000
item osd.9 weight 1.000
item osd.10 weight 1.000
item osd.11 weight 1.000
}
node cpn02 {
id -2 # do not change unnecessarily
# weight 12.000
alg straw
hash 0 # rjenkins1
item osd.12 weight 1.000
item osd.13 weight 1.000
item osd.14 weight 1.000
item osd.15 weight 1.000
item osd.16 weight 1.000
item osd.17 weight 1.000
item osd.18 weight 1.000
item osd.19 weight 1.000
item osd.20 weight 1.000
item osd.21 weight 1.000
item osd.22 weight 1.000
item osd.23 weight 1.000
}
node cpn03 {
id -3 # do not change unnecessarily
# weight 12.000
alg straw
hash 0 # rjenkins1
item osd.24 weight 1.000
item osd.25 weight 1.000
item osd.26 weight 1.000
item osd.27 weight 1.000
item osd.28 weight 1.000
item osd.29 weight 1.000
item osd.30 weight 1.000
item osd.31 weight 1.000
item osd.32 weight 1.000
item osd.33 weight 1.000
item osd.34 weight 1.000
item osd.35 weight 1.000
}
node cpn04 {
id -4 # do not change unnecessarily
# weight 12.000
alg straw
hash 0 # rjenkins1
item osd.36 weight 1.000
item osd.37 weight 1.000
item osd.38 weight 1.000
item osd.39 weight 1.000
item osd.40 weight 1.000
item osd.41 weight 1.000
item osd.42 weight 1.000
item osd.43 weight 1.000
item osd.44 weight 1.000
item osd.45 weight 1.000
item osd.46 weight 1.000
item osd.47 weight 1.000
}
rack rack1 {
id -5 # do not change unnecessarily
# weight 24.000
alg straw
hash 0 # rjenkins1
item cpn01 weight 12.000
item cpn03 weight 12.000
}
rack rack2 {
id -6 # do not change unnecessarily
# weight 24.000
alg straw
hash 0 # rjenkins1
item cpn02 weight 12.000
item cpn04 weight 12.000
}
root root {
id -7 # do not change unnecessarily
# weight 48.000
alg straw
hash 0 # rjenkins1
item rack1 weight 24.000
item rack2 weight 24.000
}

# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take root
step chooseleaf firstn 0 type node
step emit
}

# end crush map

and finally issue:
# *crushtool -c map.txt -o crushmap*
# *ceph osd setcrushmap -i crushmap*

since it's a new cluster no problem with rebalance


​Best,​

*German*

2017-09-13 13:46 GMT-03:00 Maxime Guyot :

> Hi,
>
> This is a common problem when doing custom CRUSHmap, the default behavior
> is to update the OSD node to location in the CRUSHmap on start. did you
> keep to the defaults there?
>
> If that is the problem, you can either:
> 1) Disable the update on start option: "osd crush update on start = false"
> (see http://docs.ceph.com/docs/master/rados/operations/
> crush-map/#crush-location)
> 2) Customize the script defining the location of OSDs with "crush location
> hook = /path/to/customized-ceph-crush-location" (see
> https://github.com/ceph/ceph/blob/master/src/ceph-crush-location.in).
>
> Cheers,
> Maxime
>
> On Wed, 13 Sep 2017 at 18:35 German Anders  wrote:
>
>> *# ceph health detail*
>> HEALTH_OK
>>
>> *# ceph osd stat*
>> 48 osds: 48 up, 48 in
>>
>> *# ceph pg stat*
>> 3200 pgs: 3200 active+clean; 5336 MB data, 79455 MB used, 53572 GB /
>> 53650 GB avail
>>
>>
>> *German*
>>
>> 2017-09-13 13:24 GMT-03:00 dE :
>>
>>> On 09/13/2017 09:08 PM, German Anders wrote:
>>>
>>> Hi cephers,
>>>
>>> I'm having an issue with a newly created cluster 12.2.0 (
>>> 32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc). Basically when
>>> I reboot one of the nodes, and when it come back, it come outside of the
>>> root type on the tree:
>>>
>>> root@cpm01:~# ceph osd tree
>>> ID  CLASS WEIGHT   TYPE NAME  STATUS REWEIGH

[ceph-users] access ceph filesystem at storage level and not via ethernet

2017-09-13 Thread James Okken
Hi,

Novice question here:

The way I understand CEPH is that it distributes data in OSDs in a cluster. The 
reads and writes come across the ethernet as RBD requests and the actual data 
IO then also goes across the ethernet.

I have a CEPH environment being setup on a fiber channel disk array (via an 
openstack fuel deploy). The servers using the CEPH storage also have access to 
the same fiber channel disk array.

>From what I understand those servers would need to make the RDB requests and 
>do the IO across ethernet, is that correct? Even though with this 
>infrastructure setup there is a "shorter" and faster path to those disks, via 
>the fiber channel.

Is there a way to access storage on a CEPH cluster when one has this "better" 
access to the disks in the cluster? (how about if it were to be only a single 
OSD with replication set to 1)

Sorry if this question is crazy...

thanks


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Luminous] rgw not deleting object

2017-09-13 Thread Jack
Thanks for the tip

For the record, I fixed it using radosgw-admin bucket check
--bucket= --check-objects --fix


On 10/09/2017 11:44, Andreas Calminder wrote:
> Hi,
> I had a similar problem on jewel, where I was unable to properly delete
> objects eventhough radosgw-admin returned rc 0 after issuing rm, somehow
> the object was deleted but the metadata wasn't removed.
> 
> I ran
> # radosgw-admin --cluster ceph object stat --bucket=weird_bucket
> --object=$OBJECT
> 
> to figure out if the object was there or not and then used the 'rados put'
> command to upload a dummy object and then remove it properly
> # rados -c /etc/ceph/ceph.conf -p ceph.rgw.buckets.data put
> be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2384280.20_$OBJECT dummy.file
> 
> Hope it helps,
> Andreas
> 
> On 10 Sep 2017 1:20 a.m., "Jack"  wrote:
> 
>> Hi,
>>
>> I face a wild issue: I cannot remove an object from rgw (via s3 API)
>>
>> My steps:
>> s3cmd ls s3://bucket/object -> it exists
>> s3cmd rm s3://bucket/object -> success
>> s3cmd ls s3://bucket/object -> it still exists
>>
>> At this point, I can curl and get the object (thus, it does exists)
>>
>> Doing the same via boto leads to the same behavior
>>
>> Log sample:
>> 2017-09-10 01:18:42.502486 7fd189e7d700  1 == starting new request
>> req=0x7fd189e77300 =
>> 2017-09-10 01:18:42.504028 7fd189e7d700  1 == req done
>> req=0x7fd189e77300 op status=-2 http_status=204 ==
>> 2017-09-10 01:18:42.504076 7fd189e7d700  1 civetweb: 0x560ebc275000:
>> 10.42.43.6 - - [10/Sep/2017:01:18:38 +0200] "DELETE /bucket/object
>> HTTP/1.1" 1 0 - Boto/2.44.0 Python/3.5.4 Linux/4.12.0-1-amd64
>>
>> What can I do ?
>> What data shall I provide to debug this issue ?
>>
>> Regards,
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] after reboot node appear outside the root root tree

2017-09-13 Thread German Anders
*# ceph health detail*
HEALTH_OK

*# ceph osd stat*
48 osds: 48 up, 48 in

*# ceph pg stat*
3200 pgs: 3200 active+clean; 5336 MB data, 79455 MB used, 53572 GB / 53650
GB avail


*German*

2017-09-13 13:24 GMT-03:00 dE :

> On 09/13/2017 09:08 PM, German Anders wrote:
>
> Hi cephers,
>
> I'm having an issue with a newly created cluster 12.2.0 (
> 32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc). Basically when I
> reboot one of the nodes, and when it come back, it come outside of the root
> type on the tree:
>
> root@cpm01:~# ceph osd tree
> ID  CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
> -15   12.0 *root default*
> * 36  nvme  1.0 osd.36 up  1.0 1.0*
> * 37  nvme  1.0 osd.37 up  1.0 1.0*
> * 38  nvme  1.0 osd.38 up  1.0 1.0*
> * 39  nvme  1.0 osd.39 up  1.0 1.0*
> * 40  nvme  1.0 osd.40 up  1.0 1.0*
> * 41  nvme  1.0 osd.41 up  1.0 1.0*
> * 42  nvme  1.0 osd.42 up  1.0 1.0*
> * 43  nvme  1.0 osd.43 up  1.0 1.0*
> * 44  nvme  1.0 osd.44 up  1.0 1.0*
> * 45  nvme  1.0 osd.45 up  1.0 1.0*
> * 46  nvme  1.0 osd.46 up  1.0 1.0*
> * 47  nvme  1.0 osd.47 up  1.0 1.0*
>  -7   36.0 *root root*
>  -5   24.0 rack rack1
>  -1   12.0 node cpn01
>   01.0 osd.0  up  1.0 1.0
>   11.0 osd.1  up  1.0 1.0
>   21.0 osd.2  up  1.0 1.0
>   31.0 osd.3  up  1.0 1.0
>   41.0 osd.4  up  1.0 1.0
>   51.0 osd.5  up  1.0 1.0
>   61.0 osd.6  up  1.0 1.0
>   71.0 osd.7  up  1.0 1.0
>   81.0 osd.8  up  1.0 1.0
>   91.0 osd.9  up  1.0 1.0
>  101.0 osd.10 up  1.0 1.0
>  111.0 osd.11 up  1.0 1.0
>  -3   12.0 node cpn03
>  241.0 osd.24 up  1.0 1.0
>  251.0 osd.25 up  1.0 1.0
>  261.0 osd.26 up  1.0 1.0
>  271.0 osd.27 up  1.0 1.0
>  281.0 osd.28 up  1.0 1.0
>  291.0 osd.29 up  1.0 1.0
>  301.0 osd.30 up  1.0 1.0
>  311.0 osd.31 up  1.0 1.0
>  321.0 osd.32 up  1.0 1.0
>  331.0 osd.33 up  1.0 1.0
>  341.0 osd.34 up  1.0 1.0
>  351.0 osd.35 up  1.0 1.0
>  -6   12.0 rack rack2
>  -2   12.0 node cpn02
>  121.0 osd.12 up  1.0 1.0
>  131.0 osd.13 up  1.0 1.0
>  141.0 osd.14 up  1.0 1.0
>  151.0 osd.15 up  1.0 1.0
>  161.0 osd.16 up  1.0 1.0
>  171.0 osd.17 up  1.0 1.0
>  181.0 osd.18 up  1.0 1.0
>  191.0 osd.19 up  1.0 1.0
>  201.0 osd.20 up  1.0 1.0
>  211.0 osd.21 up  1.0 1.0
>  221.0 osd.22 up  1.0 1.0
>  231.0 osd.23 up  1.0 1.0
> * -4  0 node cpn04*
>
> Any ideas of why this happen? and how can I fix it? It supposed to be
> inside rack2
>
> Thanks in advance,
>
> Best,
>
> *German*
>
>
> ___
> ceph-users mailing 
> listceph-us...@lists.ceph.comhttp://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> Can we see the output of ceph health detail. Maybe they're under the
> process of recovery.
>
> Also post the output of ceph osd stat so we can see what nodes are up/in
> etc... and ceph pg stat to see the status of various PGs (a pointer to the
> recovery process).
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] after reboot node appear outside the root root tree

2017-09-13 Thread dE

On 09/13/2017 09:08 PM, German Anders wrote:

Hi cephers,

I'm having an issue with a newly created cluster 12.2.0 
(32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc). Basically 
when I reboot one of the nodes, and when it come back, it come outside 
of the root type on the tree:


root@cpm01:~# ceph osd tree
ID  CLASS WEIGHT   TYPE NAMESTATUS REWEIGHT PRI-AFF
-15   12.0 *root default*
* 36  nvme  1.0 osd.36   up  1.0 1.0*
* 37  nvme  1.0 osd.37   up  1.0 1.0*
* 38  nvme  1.0 osd.38   up  1.0 1.0*
* 39  nvme  1.0 osd.39   up  1.0 1.0*
* 40  nvme  1.0 osd.40   up  1.0 1.0*
* 41  nvme  1.0 osd.41   up  1.0 1.0*
* 42  nvme  1.0 osd.42   up  1.0 1.0*
* 43  nvme  1.0 osd.43   up  1.0 1.0*
* 44  nvme  1.0 osd.44   up  1.0 1.0*
* 45  nvme  1.0 osd.45   up  1.0 1.0*
* 46  nvme  1.0 osd.46   up  1.0 1.0*
* 47  nvme  1.0 osd.47   up  1.0 1.0*
 -7   36.0 *root root*
 -5   24.0 rack rack1
 -1   12.0 node cpn01
  01.0 osd.0  up  1.0 1.0
  11.0 osd.1  up  1.0 1.0
  21.0 osd.2  up  1.0 1.0
  31.0 osd.3  up  1.0 1.0
  41.0 osd.4  up  1.0 1.0
  51.0 osd.5  up  1.0 1.0
  61.0 osd.6  up  1.0 1.0
  71.0 osd.7  up  1.0 1.0
  81.0 osd.8  up  1.0 1.0
  91.0 osd.9  up  1.0 1.0
 101.0 osd.10 up  1.0 1.0
 111.0 osd.11 up  1.0 1.0
 -3   12.0 node cpn03
 241.0 osd.24 up  1.0 1.0
 251.0 osd.25 up  1.0 1.0
 261.0 osd.26 up  1.0 1.0
 271.0 osd.27 up  1.0 1.0
 281.0 osd.28 up  1.0 1.0
 291.0 osd.29 up  1.0 1.0
 301.0 osd.30 up  1.0 1.0
 311.0 osd.31 up  1.0 1.0
 321.0 osd.32 up  1.0 1.0
 331.0 osd.33 up  1.0 1.0
 341.0 osd.34 up  1.0 1.0
 351.0 osd.35 up  1.0 1.0
 -6   12.0 rack rack2
 -2   12.0 node cpn02
 121.0 osd.12 up  1.0 1.0
 131.0 osd.13 up  1.0 1.0
 141.0 osd.14 up  1.0 1.0
 151.0 osd.15 up  1.0 1.0
 161.0 osd.16 up  1.0 1.0
 171.0 osd.17 up  1.0 1.0
 181.0 osd.18 up  1.0 1.0
 191.0 osd.19 up  1.0 1.0
 201.0 osd.20 up  1.0 1.0
 211.0 osd.21 up  1.0 1.0
 221.0 osd.22 up  1.0 1.0
 231.0 osd.23 up  1.0 1.0
* -4  0 node cpn04*

Any ideas of why this happen? and how can I fix it? It supposed to be 
inside rack2


Thanks in advance,

Best,

**

*German*


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Can we see the output of ceph health detail. Maybe they're under the 
process of recovery.


Also post the output of ceph osd stat so we can see what nodes are up/in 
etc... and ceph pg stat to see the status of various PGs (a pointer to 
the recovery process).


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] after reboot node appear outside the root root tree

2017-09-13 Thread German Anders
Hi cephers,

I'm having an issue with a newly created cluster 12.2.0
(32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc). Basically when I
reboot one of the nodes, and when it come back, it come outside of the root
type on the tree:

root@cpm01:~# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
-15   12.0 *root default*
* 36  nvme  1.0 osd.36 up  1.0 1.0*
* 37  nvme  1.0 osd.37 up  1.0 1.0*
* 38  nvme  1.0 osd.38 up  1.0 1.0*
* 39  nvme  1.0 osd.39 up  1.0 1.0*
* 40  nvme  1.0 osd.40 up  1.0 1.0*
* 41  nvme  1.0 osd.41 up  1.0 1.0*
* 42  nvme  1.0 osd.42 up  1.0 1.0*
* 43  nvme  1.0 osd.43 up  1.0 1.0*
* 44  nvme  1.0 osd.44 up  1.0 1.0*
* 45  nvme  1.0 osd.45 up  1.0 1.0*
* 46  nvme  1.0 osd.46 up  1.0 1.0*
* 47  nvme  1.0 osd.47 up  1.0 1.0*
 -7   36.0 *root root*
 -5   24.0 rack rack1
 -1   12.0 node cpn01
  01.0 osd.0  up  1.0 1.0
  11.0 osd.1  up  1.0 1.0
  21.0 osd.2  up  1.0 1.0
  31.0 osd.3  up  1.0 1.0
  41.0 osd.4  up  1.0 1.0
  51.0 osd.5  up  1.0 1.0
  61.0 osd.6  up  1.0 1.0
  71.0 osd.7  up  1.0 1.0
  81.0 osd.8  up  1.0 1.0
  91.0 osd.9  up  1.0 1.0
 101.0 osd.10 up  1.0 1.0
 111.0 osd.11 up  1.0 1.0
 -3   12.0 node cpn03
 241.0 osd.24 up  1.0 1.0
 251.0 osd.25 up  1.0 1.0
 261.0 osd.26 up  1.0 1.0
 271.0 osd.27 up  1.0 1.0
 281.0 osd.28 up  1.0 1.0
 291.0 osd.29 up  1.0 1.0
 301.0 osd.30 up  1.0 1.0
 311.0 osd.31 up  1.0 1.0
 321.0 osd.32 up  1.0 1.0
 331.0 osd.33 up  1.0 1.0
 341.0 osd.34 up  1.0 1.0
 351.0 osd.35 up  1.0 1.0
 -6   12.0 rack rack2
 -2   12.0 node cpn02
 121.0 osd.12 up  1.0 1.0
 131.0 osd.13 up  1.0 1.0
 141.0 osd.14 up  1.0 1.0
 151.0 osd.15 up  1.0 1.0
 161.0 osd.16 up  1.0 1.0
 171.0 osd.17 up  1.0 1.0
 181.0 osd.18 up  1.0 1.0
 191.0 osd.19 up  1.0 1.0
 201.0 osd.20 up  1.0 1.0
 211.0 osd.21 up  1.0 1.0
 221.0 osd.22 up  1.0 1.0
 231.0 osd.23 up  1.0 1.0
* -4  0 node cpn04*

Any ideas of why this happen? and how can I fix it? It supposed to be
inside rack2

Thanks in advance,

Best,

*German*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] debian-hammer wheezy Packages file incomplete?

2017-09-13 Thread David
Case close, found answer in the mailing list archive.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-March/016706.html 


Weird though that we installed it through the repo in June 2017.
Why not put them in the archive like debian-dumpling and debian-firefly?


> 13 sep. 2017 kl. 03:09 skrev David :
> 
> Hi!
> 
> Noticed tonight during maintenance that the hammer repo for debian wheezy 
> only has 2 packages listed in the Packages file.
> Thought perhaps it's being moved to archive or something. However the files 
> are still there: https://download.ceph.com/debian-hammer/pool/main/c/ceph/ 
> 
> 
> Is it a known issue or rather a "feature" =D
> 
> Kind Regards,
> 
> David Majchrzak

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Collectd issues

2017-09-13 Thread Marc Roos


Am I the only one having these JSON issues with collectd, did I do 
something wrong in configuration/upgrade?

Sep 13 15:44:15 c01 collectd: ceph plugin: ds 
Bluestore.kvFlushLat.avgtime was not properly initialized.
Sep 13 15:44:15 c01 collectd: ceph plugin: JSON handler failed with 
status -1.
Sep 13 15:44:15 c01 collectd: ceph plugin: 
cconn_handle_event(name=osd.6,i=2,st=4): error 1
Sep 13 15:44:15 c01 collectd: ceph plugin: ds 
Bluestore.kvFlushLat.avgtime was not properly initialized.
Sep 13 15:44:15 c01 collectd: ceph plugin: JSON handler failed with 
status -1.
Sep 13 15:44:15 c01 collectd: ceph plugin: 
cconn_handle_event(name=osd.7,i=3,st=4): error 1
Sep 13 15:44:15 c01 collectd: ceph plugin: ds 
Bluestore.kvFlushLat.avgtime was not properly initialized.
Sep 13 15:44:15 c01 collectd: ceph plugin: JSON handler failed with 
status -1.
Sep 13 15:44:15 c01 collectd: ceph plugin: 
cconn_handle_event(name=osd.8,i=4,st=4): error 1






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rgw install manual install luminous

2017-09-13 Thread Marc Roos
 

Yes this command cannot find the keyring
service ceph-radosgw@gw1 start

But this can
radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gw1 -f

I think I did not populate the /var/lib/ceph/radosgw/ceph-gw1/ folder 
correctly. Maybe the sysint is checking on 'done' file or so. I mannualy 
added the keyring there. But I don’t know the exact synaxt I should 
use, all seem to be generating the same errors.

[radosgw.ceph-gw1]
 key = xxx==

My osds have
[osd.12]
 key = xxx==
But my monitors have this one
[mon.]
 key = xxx==
 caps mon = "allow *"



-Original Message-
From: Jean-Charles Lopez [mailto:jelo...@redhat.com] 
Sent: woensdag 13 september 2017 1:06
To: Marc Roos
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Rgw install manual install luminous

Hi,

see comment in line

Regards
JC

> On Sep 12, 2017, at 13:31, Marc Roos  wrote:
> 
> 
> 
> I have been trying to setup the rados gateway (without deploy), but I 
> am missing some commands to enable the service I guess? How do I 
> populate the /var/lib/ceph/radosgw/ceph-gw1. I didn’t see any command 

> like the ceph-mon.
> 
> service ceph-radosgw@gw1 start
> Gives:
> 2017-09-12 22:26:06.390523 7fb9d7f27e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:26:06.390537 7fb9d7f27e00  0 deferred set uid:gid to
> 167:167 (ceph:ceph)
> 2017-09-12 22:26:06.390592 7fb9d7f27e00  0 ceph version 12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process 
> (unknown), pid 28481
> 2017-09-12 22:26:06.412882 7fb9d7f27e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:26:06.415335 7fb9d7f27e00 -1 auth: error parsing file 
> /var/lib/ceph/radosgw/ceph-gw1/keyring
> 2017-09-12 22:26:06.415342 7fb9d7f27e00 -1 auth: failed to load
> /var/lib/ceph/radosgw/ceph-gw1/keyring: (5) Input/output error
> 2017-09-12 22:26:06.415355 7fb9d7f27e00  0 librados: client.gw1 
> initialization error (5) Input/output error
> 2017-09-12 22:26:06.415981 7fb9d7f27e00 -1 Couldn't init storage 
> provider (RADOS)
> 2017-09-12 22:26:06.669892 7f1740d89e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:26:06.669919 7f1740d89e00  0 deferred set uid:gid to
> 167:167 (ceph:ceph)
> 2017-09-12 22:26:06.669977 7f1740d89e00  0 ceph version 12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process 
> (unknown), pid 28497
> 2017-09-12 22:26:06.693019 7f1740d89e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:26:06.695963 7f1740d89e00 -1 auth: error parsing file 
> /var/lib/ceph/radosgw/ceph-gw1/keyring
> 2017-09-12 22:26:06.695971 7f1740d89e00 -1 auth: failed to load
> /var/lib/ceph/radosgw/ceph-gw1/keyring: (5) Input/output error
Looks like you don’t have the keyring for the RGW user. The error 
message tells you about the location and the filename to use.
> 2017-09-12 22:26:06.695989 7f1740d89e00  0 librados: client.gw1 
> initialization error (5) Input/output error
> 2017-09-12 22:26:06.696850 7f1740d89e00 -1 Couldn't init storage 
> provider (RADOS
> 
> radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gw1 -f 
> --log-to-stderr
> --debug-rgw=1 --debug-ms=1
> Gives:
> 2017-09-12 22:20:55.845184 7f9004b54e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:20:55.845457 7f9004b54e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:20:55.845508 7f9004b54e00  0 ceph version 12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process 
> (unknown), pid 28122
> 2017-09-12 22:20:55.867423 7f9004b54e00 -1 WARNING: the following 
> dangerous and experimental features are enabled: bluestore
> 2017-09-12 22:20:55.869509 7f9004b54e00  1  Processor -- start
> 2017-09-12 22:20:55.869573 7f9004b54e00  1 -- - start start
> 2017-09-12 22:20:55.870324 7f9004b54e00  1 -- - --> 
> 192.168.10.111:6789/0 -- auth(proto 0 36 bytes epoch 0) v1 -- 
> 0x7f9006e6ec80 con 0
> 2017-09-12 22:20:55.870350 7f9004b54e00  1 -- - --> 
> 192.168.10.112:6789/0 -- auth(proto 0 36 bytes epoch 0) v1 -- 
> 0x7f9006e6ef00 con 0
> 2017-09-12 22:20:55.870824 7f8ff1fc4700  1 --
> 192.168.10.114:0/4093088986 learned_addr learned my addr
> 192.168.10.114:0/4093088986
> 2017-09-12 22:20:55.871413 7f8ff07c1700  1 --
> 192.168.10.114:0/4093088986 <== mon.0 192.168.10.111:6789/0 1  
> mon_map magic: 0 v1  361+0+0 (1785674138 0 0) 0x7f9006e8afc0 con 
> 0x7f90070d8800
> 2017-09-12 22:20:55.871567 7f8ff07c1700  1 --
> 192.168.10.114:0/4093088986 <== mon.0 192.168.10.111:6789/0 2  
> auth_reply(proto 2 0 (0) Success) v1  33+0+0 (4108244008 0 0) 
> 0x7f9006e6ec80 con 0x7f90070d8800
> 2017-09-12 22:20:55.871662 7f8ff07c1700  1 --
> 192.168.10.114:0/4093088986 --> 192.168.10.111:6789/0 -- auth(proto 2 
> 2 bytes epoch 0) v1 -- 0x7f9006e

[ceph-users] Clarification on sequence of recovery and client ops after OSDs rejoin cluster (also, slow requests)

2017-09-13 Thread Florian Haas
Hi everyone,


disclaimer upfront: this was seen in the wild on Hammer, and on 0.94.7
no less. Reproducing this on 0.94.10 is a pending process, and we'll
update here with findings, but my goal with this post is really to
establish whether the behavior as seen is expected, and if so, what
the rationale for it is. This is all about slow requests popping up on
a rather large scale after a previously down OSD node is brought back
into the cluster.


So here's the sequence of events for the issue I'm describing, as seen
in a test:

22:08:53 - OSD node stopped. OSDs 6, 17, 18, 22, 31, 32, 36, 45, 58
mark themselves down. Cluster has noout set, so all OSDs remain in.
fio tests are running against RBD in a loop, thus there is heavy
client I/O activity generating lots of new objects.

22:36:57 - OSD node restarted. OSDs 6, 17, 18, 22, 31, 32, 36, 45, 58
boot. Recovery commences, client I/O continues.

22:36:58 - OSD 17 is marked up (this will be relevant later).

22:37:01 - Last of the booted OSDs is marked up. Client I/O continues.

22:37:28 - Slow request warnings appear. osd op complaint time is set
to 30, so these are requests that were issued around the time of the
node restart. In other words, the cluster gets slow immediately after
the node comes back up.

22:49:28 - Last slow request warning seen.

23:00:43 - Recovery complete, all PGs active+clean.


Here is an example of such a slow request, as seen in ceph.log (and ceph -w) :
2017-09-09 22:37:33.241347 osd.30 172.22.4.54:6808/8976 3274 : cluster
[WRN] slow request 30.447920 seconds old, received at 2017-09-09
22:37:02.793382: osd_op(client.790810466.0:55974959
rbd_data.154cd34979e2a9e3.1980 [set-alloc-hint object_size
4194304 write_size 4194304,write 438272~24576] 277.bbddc5af snapc
4f37f=[4f37f] ack+ondisk+write+known_if_redirected e1353592) currently
waiting for degraded object


Having bumped the OSD's log level to 10 beforehand, these are some of
OSD 30's log entries correlating with this event, with a client
attempting to access rbd_data.154cd34979e2a9e3.1980:

2017-09-09 22:36:50.251219 7fe6ed7e9700 10 osd.30 pg_epoch: 1353586
pg[277.5af( v 1353586'11578774 (1353421'11574953,1353586'11578774]
local-les=1353563 n=1735 ec=862683 les/c 1353563/1353563
1353562/1353562/135) [30,94] r=0 lpr=1353562 luod=1353586'11578773
lua=1353586'11578773 crt=1353586'11578772 lcod 1353586'11578772 mlcod
1353586'11578772 active+undersized+degraded
snaptrimq=[4f12c~1,4f136~2,4f139~1,4f13c~1,4f13f~1,4f142~1,4f146~1,4f3f6~1,4f3fa~2,4f3fd~2,4f400~3,4f404~1,4f406~1,4f40d~2,4f410~3,4f414~2,4f419~1,4f41b~1,4f41d~3,4f421~1,4f423~1,4f426~1,4f428~1,4f42a~1]]
append_log: trimming to 1353586'11578772 entries 1353586'11578771
(1353585'11578560) modify
277/bbddc5af/rbd_data.154cd34979e2a9e3.1980/head by
client.790810466.0:55974546 2017-09-09
22:36:35.644687,1353586'11578772 (1353586'11578761) modify
277/b6d355af/rbd_data.28ab56362eb141f2.3483/head by
client.790921544.0:36806288 2017-09-09 22:36:37.940141

Note the 10-second gap in lines containing
"rbd_data.154cd34979e2a9e3.1980" between this log line and
the next. But this cluster has an unusually long osd snap trim sleep
of 500ms, so that might contribute to the delay here. Right in
between, at 22:36:58, OSD 17 comes up.


Next comes what I am most curious about:

2017-09-09 22:37:00.220011 7fe6edfea700 10 osd.30 pg_epoch: 1353589
pg[277.5af( v 1353586'11578776 (1353421'11574953,1353586'11578776]
local-les=1353589 n=1735 ec=862683 les/c 1353563/1353563
1353588/1353588/135) [30,17,94] r=0 lpr=1353588
pi=1353562-1353587/1 crt=1353586'11578773 lcod 1353586'11578775 mlcod
0'0 inactive 
snaptrimq=[4f12c~1,4f136~2,4f139~1,4f13c~1,4f13f~1,4f142~1,4f146~1,4f3f6~1,4f3fa~2,4f3fd~2,4f400~3,4f404~1,4f406~1,4f40d~2,4f410~3,4f414~2,4f419~1,4f41b~1,4f41d~3,4f421~1,4f423~1,4f426~1,4f428~1,4f42a~1]]
search_for_missing
277/bbddc5af/rbd_data.154cd34979e2a9e3.1980/head
1353586'11578774 is on osd.30
2017-09-09 22:37:00.220307 7fe6edfea700 10 osd.30 pg_epoch: 1353589
pg[277.5af( v 1353586'11578776 (1353421'11574953,1353586'11578776]
local-les=1353589 n=1735 ec=862683 les/c 1353563/1353563
1353588/1353588/135) [30,17,94] r=0 lpr=1353588
pi=1353562-1353587/1 crt=1353586'11578773 lcod 1353586'11578775 mlcod
0'0 inactive 
snaptrimq=[4f12c~1,4f136~2,4f139~1,4f13c~1,4f13f~1,4f142~1,4f146~1,4f3f6~1,4f3fa~2,4f3fd~2,4f400~3,4f404~1,4f406~1,4f40d~2,4f410~3,4f414~2,4f419~1,4f41b~1,4f41d~3,4f421~1,4f423~1,4f426~1,4f428~1,4f42a~1]]
search_for_missing
277/bbddc5af/rbd_data.154cd34979e2a9e3.1980/head
1353586'11578774 also missing on osd.17
2017-09-09 22:37:00.220599 7fe6edfea700 10 osd.30 pg_epoch: 1353589
pg[277.5af( v 1353586'11578776 (1353421'11574953,1353586'11578776]
local-les=1353589 n=1735 ec=862683 les/c 1353563/1353563
1353588/1353588/135) [30,17,94] r=0 lpr=1353588
pi=1353562-1353587/1 crt=1353586'11578773 lcod 1353586'11578775 mlcod
0'0 inactive 
snaptrimq=[

Re: [ceph-users] moving mons across networks

2017-09-13 Thread Wido den Hollander

> Op 13 september 2017 om 10:38 schreef Dan van der Ster :
> 
> 
> Hi Blair,
> 
> You can add/remove mons on the fly -- connected clients will learn
> about all of the mons as the monmap changes and there won't be any
> downtime as long as the quorum is maintained.
> 
> There is one catch when it comes to OpenStack, however.
> Unfortunately, OpenStack persists the mon IP addresses at volume
> creation time. So next time you hard reboot a VM, it will try
> connecting to the old set of mons.
> Whatever you have in ceph.conf on the hypervisors is irrelevant (after
> a volume was created) -- libvirt uses the IPs in each instance's xml
> directly.
> 

That's why I always recommend people to use DNS and preferably a Round Robin 
DNS record to overcome these situations.

That should work with OpenStack as well.

ceph-mon.storage.local.  2001:db8::101
ceph-mon.storage.local.  2001:db8::102
ceph-mon.storage.local.  2001:db8::103

And then use *ceph-mon.storage.local* in your OpenStack configuration a MON.

Wido

> There is an old ticket here: https://bugs.launchpad.net/cinder/+bug/1452641
> It's recently gone unassigned, but there is a new proposed fix here:
> http://lists.openstack.org/pipermail/openstack-dev/2017-June/118040.html
> 
> As of today, you will need to manually update nova's
> block-device-mapping table for every volume when you re-ip the mons.
> 
> Cheers, Dan
> 
> 
> On Wed, Sep 13, 2017 at 4:57 AM, Blair Bethwaite
>  wrote:
> > Hi all,
> >
> > We're looking at readdressing the mons (moving to a different subnet)
> > on one of our clusters. Most of the existing clients are OpenStack
> > guests on Libvirt+KVM and we have a major upgrade to do for those in
> > coming weeks that will mean they have to go down briefly, that will
> > give us an opportunity to update their libvirt config to point them at
> > new mon addresses. We plan to do the upgrade in a rolling fashion and
> > thus need to keep Ceph services up the whole time.
> >
> > So question is, can we for example have our existing 3 mons on network
> > N1, add another 2 mons on network N2, reconfigure VMs to use the 2 new
> > mon addresses, all whilst not impacting running clients. You can
> > assume we'll setup routing such that the new mons can talk to the old
> > mons, OSDs, and vice-versa.
> >
> > Perhaps flipping the question on its head - if you configure a librbd
> > client with only a subset of mon addresses will it *only* talk to
> > those mons, or will it just use that config to bootstrap and then talk
> > to any mons that are up in the current map? Or likewise, is there
> > anything the client has to talk to the mon master for?
> >
> > --
> > Cheers,
> > ~Blairo
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] inconsistent pg but repair does nothing reporting head data_digest != data_digest from auth oi / hopefully data seems ok

2017-09-13 Thread Laurent GUERBY
Hi,

ceph pg repair is currently not fixing three "inconsistent" objects
on one of our pg on a replica 3 pool. 

The 3 replica data objets are identical (we checked them on disk on the
3 OSD), the error says "head data_digest != data_digest from auth oi",
see below.

The data in question are used on rbd volumes from KVM and we did a read
from /dev/sdX at the right place on the VM and got a good looking
result : text file, uncorrupted according to our user, so data
currently returned by ceph and replicated 3 times seems fine.

Now the question is how to tell ceph that the replica data is correct
so that the inconsistent message disappears?

We're thinking of doing manually rados get/put but may be this is not
a good idea or there is another way.

Thanks in advance for your help,

Sincerely,

Laurent

# ceph --version
ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 9 scrub errors;
pg 58.6c1 is active+clean+inconsistent, acting [46,44,19]
...
# rados list-inconsistent-obj 58.6c1 --format=json-pretty
{
"epoch": 277681,
"inconsistents": []
}
# ceph pg repair 58.6c1

... osd 46 /var/log :

shard 19: soid 58:83772424:::rbd_data.30fce9e39dad7a6.0007f027:head 
data_digest 0x783cd2c5 != data_digest 0x501f846c from auth oi 
58:83772424:::rbd_data.30fce9e39dad7a6.0007f027:head(252707'3100507 
osd.2.0:710926 dirty|data_digest|omap_digest s 4194304 uv 3481755 dd 501f846c 
od  alloc_hint [0 0])
shard 44: soid 58:83772424:::rbd_data.30fce9e39dad7a6.0007f027:head 
data_digest 0x783cd2c5 != data_digest 0x501f846c from auth oi 
58:83772424:::rbd_data.30fce9e39dad7a6.0007f027:head(252707'3100507 
osd.2.0:710926 dirty|data_digest|omap_digest s 4194304 uv 3481755 dd 501f846c 
od  alloc_hint [0 0])
shard 46: soid 58:83772424:::rbd_data.30fce9e39dad7a6.0007f027:head 
data_digest 0x783cd2c5 != data_digest 0x501f846c from auth oi 
58:83772424:::rbd_data.30fce9e39dad7a6.0007f027:head(252707'3100507 
osd.2.0:710926 dirty|data_digest|omap_digest s 4194304 uv 3481755 dd 501f846c 
od  alloc_hint [0 0])
soid 58:83772424:::rbd_data.30fce9e39dad7a6.0007f027:head: failed to 
pick suitable auth object
shard 19: soid 58:83772d9e:::rbd_data.68cb7f74b0dc51.181e:head 
data_digest 0xd8f6895a != data_digest 0x4edc70a3 from auth oi 
58:83772d9e:::rbd_data.68cb7f74b0dc51.181e:head(77394'2047065 
osd.16.0:4500125 dirty|data_digest|omap_digest s 4194304 uv 1895034 dd 4edc70a3 
od  alloc_hint [0 0])
shard 44: soid 58:83772d9e:::rbd_data.68cb7f74b0dc51.181e:head 
data_digest 0xd8f6895a != data_digest 0x4edc70a3 from auth oi 
58:83772d9e:::rbd_data.68cb7f74b0dc51.181e:head(77394'2047065 
osd.16.0:4500125 dirty|data_digest|omap_digest s 4194304 uv 1895034 dd 4edc70a3 
od  alloc_hint [0 0])
shard 46: soid 58:83772d9e:::rbd_data.68cb7f74b0dc51.181e:head 
data_digest 0xd8f6895a != data_digest 0x4edc70a3 from auth oi 
58:83772d9e:::rbd_data.68cb7f74b0dc51.181e:head(77394'2047065 
osd.16.0:4500125 dirty|data_digest|omap_digest s 4194304 uv 1895034 dd 4edc70a3 
od  alloc_hint [0 0])
soid 58:83772d9e:::rbd_data.68cb7f74b0dc51.181e:head: failed to 
pick suitable auth object
shard 19: soid 58:8377bf9a:::rbd_data.2ef7e1a528b30ea.000254f6:head 
data_digest 0xdf8916bf != data_digest 0x47b79db8 from auth oi 
58:8377bf9a:::rbd_data.2ef7e1a528b30ea.000254f6:head(252707'3100535 
osd.2.0:710954 dirty|data_digest|omap_digest s 4194304 uv 3298154 dd 47b79db8 
od  alloc_hint [0 0])
shard 44: soid 58:8377bf9a:::rbd_data.2ef7e1a528b30ea.000254f6:head 
data_digest 0xdf8916bf != data_digest 0x47b79db8 from auth oi 
58:8377bf9a:::rbd_data.2ef7e1a528b30ea.000254f6:head(252707'3100535 
osd.2.0:710954 dirty|data_digest|omap_digest s 4194304 uv 3298154 dd 47b79db8 
od  alloc_hint [0 0])
shard 46: soid 58:8377bf9a:::rbd_data.2ef7e1a528b30ea.000254f6:head 
data_digest 0xdf8916bf != data_digest 0x47b79db8 from auth oi 
58:8377bf9a:::rbd_data.2ef7e1a528b30ea.000254f6:head(252707'3100535 
osd.2.0:710954 dirty|data_digest|omap_digest s 4194304 uv 3298154 dd 47b79db8 
od  alloc_hint [0 0])
soid 58:8377bf9a:::rbd_data.2ef7e1a528b30ea.000254f6:head: failed to 
pick suitable auth object
repair 9 errors, 0 fixed

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous BlueStore EC performance

2017-09-13 Thread Blair Bethwaite
Thanks for sharing Mohamad.

What size of IOs are these?

The tail latency breakdown is probably a major factor of importance
here too, but I guess you don't have that. Why EC21, I assume that
isn't a config anyone uses in production...? But I suppose it does
facilitate a comparison between replication and EC using PGs of the
same size.

Cheers,

On 13 September 2017 at 02:12, Mohamad Gebai  wrote:
> Sorry for the delay. We used the default k=2 and m=1.
>
> Mohamad
>
>
> On 09/07/2017 06:22 PM, Christian Wuerdig wrote:
>> What type of EC config (k+m) was used if I may ask?
>>
>> On Fri, Sep 8, 2017 at 1:34 AM, Mohamad Gebai  wrote:
>>> Hi,
>>>
>>> These numbers are probably not as detailed as you'd like, but it's
>>> something. They show the overhead of reading and/or writing to EC pools as
>>> compared to 3x replicated pools using 1, 2, 8 and 16 threads (single
>>> client):
>>>
>>>  Rep   EC Diff  Slowdown
>>>  IOPS  IOPS
>>> Read
>>> 123,32522,052 -5.46%1.06
>>> 227,26127,147 -0.42%1.00
>>> 827,15127,127 -0.09%1.00
>>> 16   26,79326,728 -0.24%1.00
>>> Write
>>> 119,444 5,708-70.64%3.41
>>> 223,902 5,395-77.43%4.43
>>> 823,912 5,641-76.41%4.24
>>> 16   24,587 5,643-77.05%4.36
>>> RW
>>> 120,37911,166-45.21%1.83
>>> 234,246 9,525-72.19%3.60
>>> 833,195 9,300-71.98%3.57
>>> 16   31,641 9,762-69.15%3.24
>>>
>>> This is on an all-SSD cluster, with 3 OSD nodes and Bluestore. Ceph version
>>> 12.1.0-671-g2c11b88d14 (2c11b88d14e64bf60c0556c6a4ec8c9eda36ff6a) luminous
>>> (rc).
>>>
>>> Mohamad
>>>
>>>
>>> On 09/06/2017 01:28 AM, Blair Bethwaite wrote:
>>>
>>> Hi all,
>>>
>>> (Sorry if this shows up twice - I got auto-unsubscribed and so first attempt
>>> was blocked)
>>>
>>> I'm keen to read up on some performance comparisons for replication versus
>>> EC on HDD+SSD based setups. So far the only recent thing I've found is
>>> Sage's Vault17 slides [1], which have a single slide showing 3X / EC42 /
>>> EC51 for Kraken. I guess there is probably some of this data to be found in
>>> the performance meeting threads, but it's hard to know the currency of those
>>> (typically master or wip branch tests) with respect to releases. Can anyone
>>> point out any other references or highlight something that's coming?
>>>
>>> I'm sure there are piles of operators and architects out there at the moment
>>> wondering how they could and should reconfigure their clusters once upgraded
>>> to Luminous. A couple of things going around in my head at the moment:
>>>
>>> * We want to get to having the bulk of our online storage in CephFS on EC
>>> pool/s...
>>> *-- is overwrite performance on EC acceptable for near-line NAS use-cases?
>>> *-- recovery implications (currently recovery on our Jewel RGW EC83 pool is
>>> _way_ slower that 3X pools, what does this do to reliability? maybe split
>>> capacity into multiple pools if it helps to contain failure?)
>>>
>>> [1]
>>> https://www.slideshare.net/sageweil1/bluestore-a-new-storage-backend-for-ceph-one-year-in/37
>>>
>>> --
>>> Cheers,
>>> ~Blairo
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>



-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Power outages!!! help!

2017-09-13 Thread Ronny Aasen

On 13. sep. 2017 07:04, hjcho616 wrote:

Ronny,

Did bunch of ceph pg repair pg# and got the scrub errors down to 10... 
well was 9, trying to fix one became 10.. waiting for it to fix (I did 
that noout trick as I only have two copies).  8 of those scrub errors 
looks like it would need data from osd.0.


HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 22 pgs 
degraded; 6 pgs down; 3 pgs inconsistent; 6 pgs peering; 6 pgs 
recovering; 16 pgs stale; 22 pgs stuck degraded; 6 pgs stuck inactive; 
16 pgs stuck stale; 28 pgs stuck unclean; 16 pgs stuck undersized; 16 
pgs undersized; 1 requests are blocked > 32 sec; recovery 221990/4503980 
objects degraded (4.929%); recovery 147/2251990 unfound (0.007%); 10 
scrub errors; mds cluster is degraded; no legacy OSD present but 
'sortbitwise' flag is not set


 From what I saw from ceph health detail, running osd.0 would solve 
majority of the problems.  But that was the disk with the smart error 
earlier.  I did move to new drive using ddrescue.  When trying to start 
osd.0, I get this.  Is there anyway I can get around this?




running a rescued disk is not something you should try. this is when you 
should try to export using the objectstoretool


this was the drive that failed to export pg's becouse of missing 
superblock ? you could also try the export directly on the failed drive. 
just to try if that works. you many have to run the tool as ceph user if 
that is the user owning all the files


you could try running the export of one of the pg's on osd.0 again and 
post all commands and output.


good luck

Ronny





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Clarification on sequence of recovery and client ops after OSDs rejoin cluster (also, slow requests)

2017-09-13 Thread Christian Theune
Hi,

(thanks to Florian who’s helping us getting this sorted out)

> On Sep 13, 2017, at 12:40 PM, Florian Haas  wrote:
> 
> Hi everyone,
> 
> 
> disclaimer upfront: this was seen in the wild on Hammer, and on 0.94.7
> no less. Reproducing this on 0.94.10 is a pending process, and we'll
> update here with findings, but my goal with this post is really to
> establish whether the behavior as seen is expected, and if so, what
> the rationale for it is. This is all about slow requests popping up on
> a rather large scale after a previously down OSD node is brought back
> into the cluster.
> 
> 
> So here's the sequence of events for the issue I'm describing, as seen
> in a test:
> 
> 22:08:53 - OSD node stopped. OSDs 6, 17, 18, 22, 31, 32, 36, 45, 58
> mark themselves down. Cluster has noout set, so all OSDs remain in.
> fio tests are running against RBD in a loop, thus there is heavy
> client I/O activity generating lots of new objects.

Sorry, this got confused. This was on our production setup with regular 
production traffic, not FIO. We did run this previously on our DEV cluster and 
saw similar effects though on a somewhat smaller scale regarding the actual 
timings.

On this instance of OSD (and host) reboots I did forget to set noout and we did 
have backfill in addition to recovery for a while. However, I did set the OSDs 
back “in” before they came back online. Nevertheless, the behaviour has been 
identical with previous host restarts.

Also, thanks in advance for any light that can be shed on this,
Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



signature.asc
Description: Message signed with OpenPGP
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph OSD crash starting up

2017-09-13 Thread Gonzalo Aguilar Delgado

Hi,

I'recently updated crush map to 1 and did all relocation of the pgs. At 
the end I found that one of the OSD is not starting.


This is what it shows:


2017-09-13 10:37:34.287248 7f49cbe12700 -1 *** Caught signal (Aborted) **
 in thread 7f49cbe12700 thread_name:filestore_sync

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x9616ee) [0xa93c6ef6ee]
 2: (()+0x11390) [0x7f49d9937390]
 3: (gsignal()+0x38) [0x7f49d78d3428]
 4: (abort()+0x16a) [0x7f49d78d502a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x26b) [0xa93c7ef43b]

 6: (FileStore::sync_entry()+0x2bbb) [0xa93c47fcbb]
 7: (FileStore::SyncThread::entry()+0xd) [0xa93c4adcdd]
 8: (()+0x76ba) [0x7f49d992d6ba]
 9: (clone()+0x6d) [0x7f49d79a53dd]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


--- begin dump of recent events ---
-3> 2017-09-13 10:37:34.253808 7f49dac6e8c0  5 osd.1 pg_epoch: 6293 
pg[10.8c( v 6220'575937 (4942'572901,6220'575937] local-les=6235 n=282 
ec=419 les/c/f 6235/6235/0 6293/6293/6290) [1,2]/[2] r=-1 lpr=0 
pi=6234-6292/24 crt=6220'575937 lcod 0'0 inactive NOTIFY NIBBLEWISE] 
exit Initial 0.029683 0 0.00
-2> 2017-09-13 10:37:34.253848 7f49dac6e8c0  5 osd.1 pg_epoch: 6293 
pg[10.8c( v 6220'575937 (4942'572901,6220'575937] local-les=6235 n=282 
ec=419 les/c/f 6235/6235/0 6293/6293/6290) [1,2]/[2] r=-1 lpr=0 
pi=6234-6292/24 crt=6220'575937 lcod 0'0 inactive NOTIFY NIBBLEWISE] 
enter Reset
-1> 2017-09-13 10:37:34.255018 7f49dac6e8c0  5 osd.1 pg_epoch: 6293 
pg[10.90(unlocked)] enter Initial
 0> 2017-09-13 10:37:34.287248 7f49cbe12700 -1 *** Caught signal 
(Aborted) **

 in thread 7f49cbe12700 thread_name:filestore_sync

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x9616ee) [0xa93c6ef6ee]
 2: (()+0x11390) [0x7f49d9937390]
 3: (gsignal()+0x38) [0x7f49d78d3428]
 4: (abort()+0x16a) [0x7f49d78d502a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x26b) [0xa93c7ef43b]

 6: (FileStore::sync_entry()+0x2bbb) [0xa93c47fcbb]
 7: (FileStore::SyncThread::entry()+0xd) [0xa93c4adcdd]
 8: (()+0x76ba) [0x7f49d992d6ba]
 9: (clone()+0x6d) [0x7f49d79a53dd]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/ceph-osd.1.log
--- end dump of recent events ---



Is there any way to recover it or should I open a bug?


Best regards

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] moving mons across networks

2017-09-13 Thread Dan van der Ster
On Wed, Sep 13, 2017 at 11:04 AM, Dan van der Ster  wrote:
> On Wed, Sep 13, 2017 at 10:54 AM, Wido den Hollander  wrote:
>>
>>> Op 13 september 2017 om 10:38 schreef Dan van der Ster 
>>> :
>>>
>>>
>>> Hi Blair,
>>>
>>> You can add/remove mons on the fly -- connected clients will learn
>>> about all of the mons as the monmap changes and there won't be any
>>> downtime as long as the quorum is maintained.
>>>
>>> There is one catch when it comes to OpenStack, however.
>>> Unfortunately, OpenStack persists the mon IP addresses at volume
>>> creation time. So next time you hard reboot a VM, it will try
>>> connecting to the old set of mons.
>>> Whatever you have in ceph.conf on the hypervisors is irrelevant (after
>>> a volume was created) -- libvirt uses the IPs in each instance's xml
>>> directly.
>>>
>>
>> That's why I always recommend people to use DNS and preferably a Round Robin 
>> DNS record to overcome these situations.
>>
>> That should work with OpenStack as well.
>>
>> ceph-mon.storage.local.  2001:db8::101
>> ceph-mon.storage.local.  2001:db8::102
>> ceph-mon.storage.local.  2001:db8::103
>>
>> And then use *ceph-mon.storage.local* in your OpenStack configuration a MON.
>>
>
> Does that work? Last time I checked, it works like this when a new
> volume is attached:
>
>- OpenStack connects to ceph using ceph.conf, DNS, whatever...
>- Retrieve the monmap.
>- Extract the list of IPs from the monmap.
>- Persist the IPs in the block-device-mapping table.
>
> I still find that logic here:
> https://github.com/openstack/nova/blob/master/nova/virt/libvirt/storage/rbd_utils.py#L163
>

And here's that same approach in the Cinder driver:

https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/rbd.py#L350

-- dan



> Next time you hard-reboot the VM, it will connect to the IPs directly
> -- not use ceph.conf or DNS.
>
> -- Dan
>
>
>
>> Wido
>>
>>> There is an old ticket here: https://bugs.launchpad.net/cinder/+bug/1452641
>>> It's recently gone unassigned, but there is a new proposed fix here:
>>> http://lists.openstack.org/pipermail/openstack-dev/2017-June/118040.html
>>>
>>> As of today, you will need to manually update nova's
>>> block-device-mapping table for every volume when you re-ip the mons.
>>>
>>> Cheers, Dan
>>>
>>>
>>> On Wed, Sep 13, 2017 at 4:57 AM, Blair Bethwaite
>>>  wrote:
>>> > Hi all,
>>> >
>>> > We're looking at readdressing the mons (moving to a different subnet)
>>> > on one of our clusters. Most of the existing clients are OpenStack
>>> > guests on Libvirt+KVM and we have a major upgrade to do for those in
>>> > coming weeks that will mean they have to go down briefly, that will
>>> > give us an opportunity to update their libvirt config to point them at
>>> > new mon addresses. We plan to do the upgrade in a rolling fashion and
>>> > thus need to keep Ceph services up the whole time.
>>> >
>>> > So question is, can we for example have our existing 3 mons on network
>>> > N1, add another 2 mons on network N2, reconfigure VMs to use the 2 new
>>> > mon addresses, all whilst not impacting running clients. You can
>>> > assume we'll setup routing such that the new mons can talk to the old
>>> > mons, OSDs, and vice-versa.
>>> >
>>> > Perhaps flipping the question on its head - if you configure a librbd
>>> > client with only a subset of mon addresses will it *only* talk to
>>> > those mons, or will it just use that config to bootstrap and then talk
>>> > to any mons that are up in the current map? Or likewise, is there
>>> > anything the client has to talk to the mon master for?
>>> >
>>> > --
>>> > Cheers,
>>> > ~Blairo
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] moving mons across networks

2017-09-13 Thread Dan van der Ster
On Wed, Sep 13, 2017 at 10:54 AM, Wido den Hollander  wrote:
>
>> Op 13 september 2017 om 10:38 schreef Dan van der Ster :
>>
>>
>> Hi Blair,
>>
>> You can add/remove mons on the fly -- connected clients will learn
>> about all of the mons as the monmap changes and there won't be any
>> downtime as long as the quorum is maintained.
>>
>> There is one catch when it comes to OpenStack, however.
>> Unfortunately, OpenStack persists the mon IP addresses at volume
>> creation time. So next time you hard reboot a VM, it will try
>> connecting to the old set of mons.
>> Whatever you have in ceph.conf on the hypervisors is irrelevant (after
>> a volume was created) -- libvirt uses the IPs in each instance's xml
>> directly.
>>
>
> That's why I always recommend people to use DNS and preferably a Round Robin 
> DNS record to overcome these situations.
>
> That should work with OpenStack as well.
>
> ceph-mon.storage.local.  2001:db8::101
> ceph-mon.storage.local.  2001:db8::102
> ceph-mon.storage.local.  2001:db8::103
>
> And then use *ceph-mon.storage.local* in your OpenStack configuration a MON.
>

Does that work? Last time I checked, it works like this when a new
volume is attached:

   - OpenStack connects to ceph using ceph.conf, DNS, whatever...
   - Retrieve the monmap.
   - Extract the list of IPs from the monmap.
   - Persist the IPs in the block-device-mapping table.

I still find that logic here:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/storage/rbd_utils.py#L163

Next time you hard-reboot the VM, it will connect to the IPs directly
-- not use ceph.conf or DNS.

-- Dan



> Wido
>
>> There is an old ticket here: https://bugs.launchpad.net/cinder/+bug/1452641
>> It's recently gone unassigned, but there is a new proposed fix here:
>> http://lists.openstack.org/pipermail/openstack-dev/2017-June/118040.html
>>
>> As of today, you will need to manually update nova's
>> block-device-mapping table for every volume when you re-ip the mons.
>>
>> Cheers, Dan
>>
>>
>> On Wed, Sep 13, 2017 at 4:57 AM, Blair Bethwaite
>>  wrote:
>> > Hi all,
>> >
>> > We're looking at readdressing the mons (moving to a different subnet)
>> > on one of our clusters. Most of the existing clients are OpenStack
>> > guests on Libvirt+KVM and we have a major upgrade to do for those in
>> > coming weeks that will mean they have to go down briefly, that will
>> > give us an opportunity to update their libvirt config to point them at
>> > new mon addresses. We plan to do the upgrade in a rolling fashion and
>> > thus need to keep Ceph services up the whole time.
>> >
>> > So question is, can we for example have our existing 3 mons on network
>> > N1, add another 2 mons on network N2, reconfigure VMs to use the 2 new
>> > mon addresses, all whilst not impacting running clients. You can
>> > assume we'll setup routing such that the new mons can talk to the old
>> > mons, OSDs, and vice-versa.
>> >
>> > Perhaps flipping the question on its head - if you configure a librbd
>> > client with only a subset of mon addresses will it *only* talk to
>> > those mons, or will it just use that config to bootstrap and then talk
>> > to any mons that are up in the current map? Or likewise, is there
>> > anything the client has to talk to the mon master for?
>> >
>> > --
>> > Cheers,
>> > ~Blairo
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] moving mons across networks

2017-09-13 Thread Dan van der Ster
Hi Blair,

You can add/remove mons on the fly -- connected clients will learn
about all of the mons as the monmap changes and there won't be any
downtime as long as the quorum is maintained.

There is one catch when it comes to OpenStack, however.
Unfortunately, OpenStack persists the mon IP addresses at volume
creation time. So next time you hard reboot a VM, it will try
connecting to the old set of mons.
Whatever you have in ceph.conf on the hypervisors is irrelevant (after
a volume was created) -- libvirt uses the IPs in each instance's xml
directly.

There is an old ticket here: https://bugs.launchpad.net/cinder/+bug/1452641
It's recently gone unassigned, but there is a new proposed fix here:
http://lists.openstack.org/pipermail/openstack-dev/2017-June/118040.html

As of today, you will need to manually update nova's
block-device-mapping table for every volume when you re-ip the mons.

Cheers, Dan


On Wed, Sep 13, 2017 at 4:57 AM, Blair Bethwaite
 wrote:
> Hi all,
>
> We're looking at readdressing the mons (moving to a different subnet)
> on one of our clusters. Most of the existing clients are OpenStack
> guests on Libvirt+KVM and we have a major upgrade to do for those in
> coming weeks that will mean they have to go down briefly, that will
> give us an opportunity to update their libvirt config to point them at
> new mon addresses. We plan to do the upgrade in a rolling fashion and
> thus need to keep Ceph services up the whole time.
>
> So question is, can we for example have our existing 3 mons on network
> N1, add another 2 mons on network N2, reconfigure VMs to use the 2 new
> mon addresses, all whilst not impacting running clients. You can
> assume we'll setup routing such that the new mons can talk to the old
> mons, OSDs, and vice-versa.
>
> Perhaps flipping the question on its head - if you configure a librbd
> client with only a subset of mon addresses will it *only* talk to
> those mons, or will it just use that config to bootstrap and then talk
> to any mons that are up in the current map? Or likewise, is there
> anything the client has to talk to the mon master for?
>
> --
> Cheers,
> ~Blairo
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com