Re: [ceph-users] Help with the Hammer to Jewel upgrade procedure without losing write access to the buckets

2017-01-26 Thread George Mihaiescu
Hi Mohammed,

Thanks for the hint. I think I remember seeing this when Jewel came out, but I 
assumed it was a mistake, or a mere recommendation rather than a mandatory 
requirement, because I have always upgraded the OSDs last.

Today I upgraded my OSD nodes in the test environment to Jewel and regained 
write access to the buckets.

In production we have multiple RGW nodes behind load balancers, so we can 
upgrade them one at a time.

If we have to upgrade all OSD nodes first (which takes much longer, considering 
there are many more of them) while the old Hammer RGW cannot talk to a Jewel 
cluster, then it means one cannot perform a live upgrade of Ceph, which I think 
breaks the promise of a large, distributed, always-on storage system...
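
For what it's worth, the rolling OSD upgrade itself should be roughly the 
following, one storage node at a time (a sketch only -- the exact package and 
service commands depend on your distro and init system, and Jewel also expects 
the daemons to run as the "ceph" user, so plan for a chown of /var/lib/ceph or 
the "setuser match path" option):

ceph osd set noout
# upgrade the ceph packages on one OSD node, then restart its OSDs, e.g.
systemctl restart ceph-osd.target
# wait until all PGs are active+clean again before moving to the next node
ceph osd unset noout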

Now I'll have to test what happens with the Cinder volumes attached to a Hammer 
cluster that's being upgraded to Jewel, and whether upgrading the Ceph packages 
on the compute nodes to Jewel will require a restart of the VMs or a reboot of 
the servers.

Thank you again for your help,
George


> On Jan 25, 2017, at 19:10, Mohammed Naser  wrote:
> 
> George,
> 
> I believe the supported upgrade model is monitors, OSDs, metadata servers and 
> finally object gateways.
> 
> I would suggest trying the supported path; if you're still having issues *with* 
> the correct upgrade sequence, I would look further into it.
> 
> Thanks
> Mohammed
> 
>> On Jan 25, 2017, at 6:24 PM, George Mihaiescu  wrote:
>> 
>> 
>> Hi,
>> 
>> I need your help with upgrading our cluster from Hammer (last version) to 
>> Jewel 10.2.5 without losing write access to Radosgw.
>> 
>> We have a fairly large cluster (4.3 PB raw) mostly used to store large S3 
>> objects, and we currently have more than 500 TB of data in the 
>> ".rgw.buckets" pool, so I'm very cautious about upgrading it to Jewel. 
>> The plan is to upgrade Ceph-mon and Radosgw to 10.2.5, while keeping the OSD 
>> nodes on Hammer, then slowly update them as well.
>> 
>> 
>> I am currently testing the upgrade procedure in a lab environment, but once 
>> I update ceph-mon and radosgw to Jewel, I can no longer upload files into new 
>> or existing buckets, though I can still create new buckets.
>> 
>> 
>> I read [1], [2], [3] and [4] and even ran the script in [4] as it can be 
>> seen below, but still cannot upload new objects.
>> 
>> I was hoping that if I waited long enough to update from Hammer to Jewel, most 
>> of the big issues would be solved by point releases, but it seems that I'm 
>> doing something wrong, probably because of a lack of up-to-date documentation.
>> 
>> 
>> 
>> After the update to Jewel, this is how things look in my test environment.
>> 
>> root@ceph-mon1:~# radosgw zonegroup get
>> 
>> root@ceph-mon1:~# radosgw-admin period get
>> period init failed: (2) No such file or directory
>> 2017-01-25 10:13:06.941018 7f98f0d13900  0 RGWPeriod::init failed to init 
>> realm  id  : (2) No such file or directory
>> 
>> root@ceph-mon1:~# radosgw-admin zonegroup get
>> failed to init zonegroup: (2) No such file or directory
>> 
>> root@ceph-mon1:~# ceph --version
>> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>> 
>> root@ceph-mon1:~# radosgw-admin realm list
>> {
>> "default_info": "",
>> "realms": []
>> }
>> 
>> root@ceph-mon1:~# radosgw-admin period list
>> {
>> "periods": []
>> }
>> 
>> root@ceph-mon1:~# radosgw-admin period get
>> period init failed: (2) No such file or directory
>> 2017-01-25 12:26:07.217986 7f97ca82e900  0 RGWPeriod::init failed to init 
>> realm  id  : (2) No such file or directory
>> 
>> root@ceph-mon1:~# radosgw-admin zonegroup get --rgw-zonegroup=default
>> {
>> "id": "default",
>> "name": "default",
>> "api_name": "",
>> "is_master": "true",
>> "endpoints": [],
>> "hostnames": [],
>> "hostnames_s3website": [],
>> "master_zone": "default",
>> "zones": [
>> {
>> "id": "default",
>> "name": "default",
>> "endpoints": [],
>> "log_meta": "false",
>> "log_data": "false",
>> "bucket_index_max_shards": 0,
>> "read_only": "false"
>> }
>> ],
>> "placement_targets": [
>> {
>> "name": "default-placement",
>> "tags": []
>> }
>> ],
>> "default_placement": "default-placement",
>> "realm_id": ""
>> }
>> 
>> root@ceph-mon1:~# radosgw-admin zone get --zone-id=default
>> {
>> "id": "default",
>> "name": "default",
>> "domain_root": ".rgw",
>> "control_pool": ".rgw.control",
>> "gc_pool": ".rgw.gc",
>> "log_pool": ".log",
>> "intent_log_pool": ".intent-log",
>> "usage_log_pool": ".usage",
>> "user_keys_pool": ".users",
>> "user_email_pool": ".users.email",
>> "user_swift_pool": ".users.swift",
>> "user_uid_pool": ".users.uid",
>> "system_key": {
>> "access_key": "",
>> "secret_key": ""
>> },
>> "placement_pools": [
>> {
>> "key

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-01-26 Thread Samuel Just
Just an update.  I think the real goal with the sleep configs in general
was to reduce the number of concurrent snap trims happening.  To that end,
I've put together a branch which adds an AsyncReserver (as with backfill)
for snap trims to each OSD.  Before actually starting to do trim work, the
primary will wait in line to get one of the slots and will hold that slot
until the repops are complete.
https://github.com/athanatos/ceph/tree/wip-snap-trim-sleep is the branch
(based on master), but I've got a bit more work to do (and testing to do)
before it's ready to be tested.
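
(For anyone following along at home: the sleep being discussed is the
osd_snap_trim_sleep option, which can be changed at runtime with something
like the line below -- a sketch, and exact injectargs behaviour can differ
between releases:

ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.25'

or persistently via "osd snap trim sleep = 0.25" in the [osd] section of
ceph.conf.)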
-Sam

On Fri, Jan 20, 2017 at 2:05 PM, Nick Fisk  wrote:

> Hi Sam,
>
>
>
> I have a test cluster, albeit small. I’m happy to run tests + graph
> results with a wip branch and work out reasonable settings…etc
>
>
>
> *From:* Samuel Just [mailto:sj...@redhat.com]
> *Sent:* 19 January 2017 23:23
> *To:* David Turner 
>
> *Cc:* Nick Fisk ; ceph-users 
> *Subject:* Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during
> sleep?
>
>
>
> I could probably put together a wip branch if you have a test cluster you
> could try it out on.
>
> -Sam
>
>
>
> On Thu, Jan 19, 2017 at 2:27 PM, David Turner <
> david.tur...@storagecraft.com> wrote:
>
> To be clear, we are willing to change to a snap_trim_sleep of 0 and try to
> manage it with the other available settings... but it is sounding like that
> won't really work for us since our main op thread(s) will just be saturated
> with snap trimming almost all day.  We currently only have ~6 hours/day
> where our snap trim q's are empty.
> --
>
> *David* *Turner* | Cloud Operations Engineer | StorageCraft Technology
> Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> *Office:* 801.871.2760 | *Mobile:* 385.224.2943
> --
>
> If you are not the intended recipient of this message or received it
> erroneously, please notify the sender and delete it, together with any
> attachments, and be advised that any dissemination or copying of this
> message is prohibited.
> --
> --
>
> *From:* ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of David
> Turner [david.tur...@storagecraft.com]
> *Sent:* Thursday, January 19, 2017 3:25 PM
> *To:* Samuel Just; Nick Fisk
>
>
> *Cc:* ceph-users
> *Subject:* Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during
> sleep?
>
>
>
> We are a couple of weeks away from upgrading to Jewel in our production
> clusters (after months of testing in our QA environments), but this might
> prevent us from making the migration from Hammer.   We delete ~8,000
> snapshots/day between 3 clusters and our snap_trim_q gets up to about 60
> Million in each of those clusters.  We have to use an osd_snap_trim_sleep
> of 0.25 to prevent our clusters from falling on their faces during our big
> load and 0.1 the rest of the day to catch up on the snap trim q.
>
> Is our setup possible to use on Jewel?
> --
>
> *David* *Turner* | Cloud Operations Engineer | StorageCraft Technology
> Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> *Office:* 801.871.2760 | *Mobile:* 385.224.2943
> --
>
> If you are not the intended recipient of this message or received it
> erroneously, please notify the sender and delete it, together with any
> attachments, and be advised that any dissemination or copying of this
> message is prohibited.
> --
>
> 
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Samuel
> Just [sj...@redhat.com]
> Sent: Thursday, January 19, 2017 2:45 PM
> To: Nick Fisk
> Cc: ceph-users
> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>
> Yeah, I think you're probably right.  The answer is probably to add an
> explicit rate-limiting element to the way the snaptrim events are
> scheduled.
> -Sam
>
> On Thu, Jan 19, 2017 at 1:34 PM, Nick Fisk  wrote:
> > I will give those both a go and report back, but the more I think
> about this the less I'm convinced that it's going to help.
> >
> > I think the problem is a general IO imbalance, there is probably
>

[ceph-users] Ceph Tech Talk in ~2 hrs

2017-01-26 Thread Patrick McGarry
Hey cephers,

Just a reminder that the 'Getting Started with Ceph Development' Ceph
Tech Talk [0] is starting in about 2 hours. Sage is going to walk through
the process from start to finish, so if you have coworkers, friends,
or anyone that might be interested in getting started with Ceph,
please send them our way!

If you are already experienced in Ceph Development, feel free to use
the Q&A to discuss ways of improving the experience for new developers
and what you might want to see changed. Thanks!

[0] http://ceph.com/ceph-tech-talks/

-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph on Proxmox VE

2017-01-26 Thread David Turner
Is there a guide on how to add Proxmox to an existing ceph deployment?  I
haven't quite gotten Proxmox to the point where it can manage ceph, only to
look at it and access it.
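
For reference, the "look at it and access it" part is just an external RBD
entry in /etc/pve/storage.cfg plus a copy of the client keyring -- roughly
like the following (the storage name, pool and monitor addresses here are
placeholders):

rbd: ext-ceph
        monhost 10.0.0.1 10.0.0.2 10.0.0.3
        pool rbd
        content images
        username admin

with the keyring copied to /etc/pve/priv/ceph/ext-ceph.keyring. The part I'm
missing is Proxmox actually managing the cluster.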

On Thu, Jan 26, 2017, 6:08 AM Martin Maurer  wrote:

> Hello Everyone!
>
> We just created a new tutorial for installing Ceph Jewel on Proxmox VE.
>
> The Ceph Server integration in Proxmox VE has been available for three years
> and is a widely used component for smaller deployments to get a real open
> source hyper-converged virtualization and storage setup, highly scalable and
> without limits.
>
> Video Tutorial
> https://youtu.be/jFFLINtNnXs
>
> Documentation
> https://pve.proxmox.com/wiki/Ceph_Server
>
> --
> Best Regards,
>
> Martin Maurer
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS flapping: how to increase MDS timeouts?

2017-01-26 Thread John Spray
On Thu, Jan 26, 2017 at 8:18 AM, Burkhard Linke
 wrote:
> HI,
>
>
> we are running two MDS servers in active/standby-replay setup. Recently we
> had to disconnect active MDS server, and failover to standby works as
> expected.
>
>
> The filesystem currently contains over 5 million files, so reading all the
> metadata information from the data pool took too long, since the information
> was not available on the OSD page caches. The MDS was timed out by the mons,
> and a failover switch to the former active MDS (which was available as
> standby again) happened. This MDS in turn had to read the metadata, again
> running into a timeout, failover, etc. I resolved the situation by disabling
> one of the MDS, which kept the mons from failing the now solely available
> MDS.

The MDS does not re-read every inode on startup -- rather, it replays
its journal (the overall number of files in your system does not
factor into this).

> So given a large filesystem, how do I prevent failover flapping between MDS
> instances that are in the rejoin state and reading the inode information?

The monitor's decision to fail an unresponsive MDS is based on the MDS
not sending a beacon to the mon -- there is no limit on how long an
MDS is allowed to stay in a given state (such as rejoin).

So there are two things to investigate here:
 * Why is the MDS taking so long to start?
 * Why is the MDS failing to send beacons to the monitor while it is
in whatever process that is taking it so long?

The answer to both is likely to be found in an MDS log with the debug
level turned up, gathered as it starts up.
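
(For the log, something along these lines on the MDS host before restarting it
should be enough -- treat the exact levels as a suggestion:

[mds]
    debug mds = 10
    debug ms = 1

And if you do want to give a slow MDS more time before the monitors fail it,
the knob people usually raise is mds_beacon_grace (default 15 seconds), which
the monitors consult -- though finding out why the beacons stop is the better
fix.)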

John


>
> Regards,
> Burkhard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with upgrade from 0.94.9 to 10.2.5

2017-01-26 Thread Piotr Dałek

On 01/24/2017 03:57 AM, Mike Lovell wrote:

I was just testing an upgrade of some monitors in a test cluster from hammer
(0.94.7) to jewel (10.2.5). After upgrading each of the first two monitors, I
stopped and restarted a single osd to cause changes in the maps. The same
error messages showed up in ceph -w. I haven't dug into it much, but just
wanted to second that I've seen this happen on a recent hammer to recent
jewel upgrade.


Thanks for confirmation.
We've prepared the patch which fixes the issue for us:
https://github.com/ceph/ceph/pull/13131


--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovh.com/us/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph on Proxmox VE

2017-01-26 Thread Martin Maurer
Hello Everyone!

We just created a new tutorial for installing Ceph Jewel on Proxmox VE.

The Ceph Server integration in Proxmox VE has been available for three years
and is a widely used component for smaller deployments to get a real open
source hyper-converged virtualization and storage setup, highly scalable and
without limits.

Video Tutorial
https://youtu.be/jFFLINtNnXs

Documentation
https://pve.proxmox.com/wiki/Ceph_Server

-- 
Best Regards,

Martin Maurer

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Suddenly having slow writes

2017-01-26 Thread Mark Nelson

On 01/26/2017 06:16 AM, Florent B wrote:

On 01/24/2017 07:26 PM, Mark Nelson wrote:


My first thought is that PGs are splitting.  You only appear to have
168 PGs for 9 OSDs, which is not nearly enough.  Beyond the poor data
distribution and associated performance imbalance, your PGs will split
very quickly since by default PGs start splitting at 320 objects each.
Typically this is less of a problem with RBD since by default it uses
4MB objects (and thus there are fewer bigger objects), but with only
168 PGs you are likely to be heavily splitting by the time you hit
~218GB of data (make sure to take replication into account).

Normally PG splitting shouldn't be terribly expensive since it's
basically just reading a directory xattr, readdir on a small
directory, then a bunch of link/unlink operations.  When SELinux is
enabled it appears that link/unlink might require an xattr read on each
object/file to determine if the link/unlink can happen.  That's a ton
of extra seek overhead.  On spinning disks this is especially bad with
XFS since subdirs may not be in the same AG as a parent dir, so after
subsequent splits, the directories become fragmented and those reads
happen all over the disk (not as much of a problem with SSDs though).

Anyway, that's my guess as to what's going on, but it could be
something else.  blktrace and/or xfs's kernel debugging stuff would
probably lend some supporting evidence if this is what's going on.

Mark


Hi Mark,

You're right, it seems that was the problem. I set higher pg_num & pgp_num
on every pool and there are no more blocked requests! (after a few days of
backfilling)
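
Concretely, what I ran per pool was of the form below -- pgp_num can only be
raised after pg_num, and every increase triggers backfilling:

ceph osd pool set <pool> pg_num <new value>
ceph osd pool set <pool> pgp_num <new value>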

Maybe the monitors should raise a warning in the Ceph status when this
situation occurs, no? A "too few PGs per OSD" warning already exists, but I
never hit it in my cluster.

Thanks a lot, Mark!


No problem.  Just a warning though:  It's possible you are only delaying 
the problem until you hit the split/merge thresholds again.  If you are 
only doing RBD it's possible you'll never hit them (depending on the 
size of your RBD volumes, block/object sizes, and replication levels). 
More PGs does usually mean that PG splitting will be more spread out, 
but it also means there are more PGs to split in total at some point in 
the future.
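
(The threshold itself comes from the filestore settings: a PG's collection
starts splitting at roughly

    filestore_split_multiple * abs(filestore_merge_threshold) * 16

objects per directory, which with the defaults of 2 and 10 gives the 320 I
mentioned. Raising those two values in the [osd] section pushes splitting
further out at the cost of larger directories -- verify the exact behaviour
against your release before relying on it.)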


One of the things Josh was working on was to let you pre-split PGs.  It 
means a bit slower behavior up front, but avoids a lot of initial 
splitting and should improve performance with lots of objects.  Also, 
once bluestore is production ready, it avoids all of this entirely since 
it doesn't store objects in directory hierarchies on a traditional 
filesystem like filestore does.


Mark


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 1 pgs inconsistent 2 scrub errors

2017-01-26 Thread Eugen Block

Glad I could help! :-)


Quoting Mio Vlahović:


From: Eugen Block [mailto:ebl...@nde.ag]

 From what I understand, with a rep size of 2 the cluster can't decide
which object is intact if one is broken, so the repair fails. If you
had a size of 3, the cluster would see 2 intact objects an repair the
broken one (I guess). At least we didn't have these inconsistencies
since we increased the size to 3.


I understand. Anyway, we have a healthy cluster again :)

After few ERR in logs...
2017-01-26 06:08:48.147132 osd.3 192.168.12.150:6802/5421 129 : cluster [ERR] 10.55 shard 3: soid 10:aa0c6d9c:::ef4069bf-70fb-4414-a9d9-6bf5b32608fb.34127.33_nalazi%2f201607%2fLab_7bd28004-cc9d-4039-9567-7f5c597f6d88.pdf:head data_digest 0xc44df2ba != known data_digest 0xff59029 from auth shard 4
2017-01-26 06:19:55.708510 osd.3 192.168.12.150:6802/5421 130 : cluster [ERR] 10.55 deep-scrub 0 missing, 1 inconsistent objects
2017-01-26 06:19:55.708514 osd.3 192.168.12.150:6802/5421 131 : cluster [ERR] 10.55 deep-scrub 1 errors
2017-01-26 10:00:48.267405 osd.3 192.168.12.150:6806/18501 2 : cluster [ERR] 10.55 shard 3 missing 10:aa0c6d9c:::ef4069bf-70fb-4414-a9d9-6bf5b32608fb.34127.33_nalazi%2f201607%2fLab_7bd28004-cc9d-4039-9567-7f5c597f6d88.pdf:head
2017-01-26 10:06:56.062854 osd.3 192.168.12.150:6806/18501 3 : cluster [ERR] 10.55 scrub 1 missing, 0 inconsistent objects
2017-01-26 10:06:56.062859 osd.3 192.168.12.150:6806/18501 4 : cluster [ERR] 10.55 scrub 1 errors ( 1 remaining deep scrub error(s) )
2017-01-26 12:54:45.748066 osd.3 192.168.12.150:6806/18501 18 : cluster [ERR] 10.55 shard 3: soid 10:aa0c6d9c:::ef4069bf-70fb-4414-a9d9-6bf5b32608fb.34127.33_nalazi%2f201607%2fLab_7bd28004-cc9d-4039-9567-7f5c597f6d88.pdf:head size 0 != known size 52102, missing attr _, missing attr _user.rgw.acl, missing attr _user.rgw.content_type, missing attr _user.rgw.etag, missing attr _user.rgw.idtag, missing attr _user.rgw.manifest, missing attr _user.rgw.pg_ver, missing attr _user.rgw.source_zone, missing attr _user.rgw.x-amz-acl, missing attr _user.rgw.x-amz-date, missing attr snapset
2017-01-26 13:02:18.014584 osd.3 192.168.12.150:6806/18501 19 : cluster [ERR] 10.55 scrub 0 missing, 1 inconsistent objects
2017-01-26 13:02:18.014607 osd.3 192.168.12.150:6806/18501 20 : cluster [ERR] 10.55 scrub 1 errors ( 1 remaining deep scrub error(s) )
2017-01-26 13:16:56.634322 osd.3 192.168.12.150:6806/18501 22 : cluster [ERR] 10.55 shard 3: soid 10:aa0c6d9c:::ef4069bf-70fb-4414-a9d9-6bf5b32608fb.34127.33_nalazi%2f201607%2fLab_7bd28004-cc9d-4039-9567-7f5c597f6d88.pdf:head data_digest 0x != known data_digest 0xff59029 from auth shard 4, size 0 != known size 52102, missing attr _, missing attr _user.rgw.acl, missing attr _user.rgw.content_type, missing attr _user.rgw.etag, missing attr _user.rgw.idtag, missing attr _user.rgw.manifest, missing attr _user.rgw.pg_ver, missing attr _user.rgw.source_zone, missing attr _user.rgw.x-amz-acl, missing attr _user.rgw.x-amz-date, missing attr snapset

We got this:
2017-01-26 13:31:04.577603 osd.3 192.168.12.150:6806/18501 23 : cluster [ERR] 10.55 repair 0 missing, 1 inconsistent objects
2017-01-26 13:31:04.596102 osd.3 192.168.12.150:6806/18501 24 : cluster [ERR] 10.55 repair 1 errors, 1 fixed


And...
# ceph -s
cluster 2bf80721-fceb-4b63-89ee-1a5faa278493
 health HEALTH_OK
 monmap e1: 1 mons at {cephadm01=192.168.12.150:6789/0}
election epoch 7, quorum 0 cephadm01
 osdmap e580: 9 osds: 9 up, 9 in
flags sortbitwise
  pgmap v11436879: 664 pgs, 13 pools, 1011 GB data, 13900 kobjects
2143 GB used, 2354 GB / 4497 GB avail
 661 active+clean
   3 active+clean+scrubbing

Your method worked! Thank you for your time and help! I will see if we can add
some more disks and set the replica to 3.




--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 1 pgs inconsistent 2 scrub errors

2017-01-26 Thread Mio Vlahović
> From: Eugen Block [mailto:ebl...@nde.ag]
> 
>  From what I understand, with a rep size of 2 the cluster can't decide
> which object is intact if one is broken, so the repair fails. If you
> had a size of 3, the cluster would see 2 intact objects an repair the
> broken one (I guess). At least we didn't have these inconsistencies
> since we increased the size to 3.

I understand. Anyway, we have a healthy cluster again :)

After few ERR in logs...
2017-01-26 06:08:48.147132 osd.3 192.168.12.150:6802/5421 129 : cluster [ERR] 
10.55 shard 3: soid 
10:aa0c6d9c:::ef4069bf-70fb-4414-a9d9-6bf5b32608fb.34127.33_nalazi%2f201607%2fLab_7bd28004-cc9d-4039-9567-7f5c597f6d88.pdf:head
 data_digest 0xc44df2ba != known data_digest 0xff59029 from auth shard 4
2017-01-26 06:19:55.708510 osd.3 192.168.12.150:6802/5421 130 : cluster [ERR] 
10.55 deep-scrub 0 missing, 1 inconsistent objects
2017-01-26 06:19:55.708514 osd.3 192.168.12.150:6802/5421 131 : cluster [ERR] 
10.55 deep-scrub 1 errors
2017-01-26 10:00:48.267405 osd.3 192.168.12.150:6806/18501 2 : cluster [ERR] 
10.55 shard 3 missing 
10:aa0c6d9c:::ef4069bf-70fb-4414-a9d9-6bf5b32608fb.34127.33_nalazi%2f201607%2fLab_7bd28004-cc9d-4039-9567-7f5c597f6d88.pdf:head
2017-01-26 10:06:56.062854 osd.3 192.168.12.150:6806/18501 3 : cluster [ERR] 
10.55 scrub 1 missing, 0 inconsistent objects
2017-01-26 10:06:56.062859 osd.3 192.168.12.150:6806/18501 4 : cluster [ERR] 
10.55 scrub 1 errors ( 1 remaining deep scrub error(s) )
2017-01-26 12:54:45.748066 osd.3 192.168.12.150:6806/18501 18 : cluster [ERR] 
10.55 shard 3: soid 
10:aa0c6d9c:::ef4069bf-70fb-4414-a9d9-6bf5b32608fb.34127.33_nalazi%2f201607%2fLab_7bd28004-cc9d-4039-9567-7f5c597f6d88.pdf:head
 size 0 != known size 52102, missing attr _, missing attr _user.rgw.acl, 
missing attr _user.rgw.content_type, missing attr _user.rgw.etag, missing attr 
_user.rgw.idtag, missing attr _user.rgw.manifest, missing attr 
_user.rgw.pg_ver, missing attr _user.rgw.source_zone, missing attr 
_user.rgw.x-amz-acl, missing attr _user.rgw.x-amz-date, missing attr snapset
2017-01-26 13:02:18.014584 osd.3 192.168.12.150:6806/18501 19 : cluster [ERR] 
10.55 scrub 0 missing, 1 inconsistent objects
2017-01-26 13:02:18.014607 osd.3 192.168.12.150:6806/18501 20 : cluster [ERR] 
10.55 scrub 1 errors ( 1 remaining deep scrub error(s) )
2017-01-26 13:16:56.634322 osd.3 192.168.12.150:6806/18501 22 : cluster [ERR] 
10.55 shard 3: soid 
10:aa0c6d9c:::ef4069bf-70fb-4414-a9d9-6bf5b32608fb.34127.33_nalazi%2f201607%2fLab_7bd28004-cc9d-4039-9567-7f5c597f6d88.pdf:head
 data_digest 0x != known data_digest 0xff59029 from auth shard 4, size 
0 != known size 52102, missing attr _, missing attr _user.rgw.acl, missing attr 
_user.rgw.content_type, missing attr _user.rgw.etag, missing attr 
_user.rgw.idtag, missing attr _user.rgw.manifest, missing attr 
_user.rgw.pg_ver, missing attr _user.rgw.source_zone, missing attr 
_user.rgw.x-amz-acl, missing attr _user.rgw.x-amz-date, missing attr snapset

We got this:
2017-01-26 13:31:04.577603 osd.3 192.168.12.150:6806/18501 23 : cluster [ERR] 
10.55 repair 0 missing, 1 inconsistent objects
2017-01-26 13:31:04.596102 osd.3 192.168.12.150:6806/18501 24 : cluster [ERR] 
10.55 repair 1 errors, 1 fixed

And...
# ceph -s
cluster 2bf80721-fceb-4b63-89ee-1a5faa278493
 health HEALTH_OK
 monmap e1: 1 mons at {cephadm01=192.168.12.150:6789/0}
election epoch 7, quorum 0 cephadm01
 osdmap e580: 9 osds: 9 up, 9 in
flags sortbitwise
  pgmap v11436879: 664 pgs, 13 pools, 1011 GB data, 13900 kobjects
2143 GB used, 2354 GB / 4497 GB avail
 661 active+clean
   3 active+clean+scrubbing

Your method worked! Thank you for your time and help! I will see if we can add 
some more disks and set the replica to 3.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 1 pgs inconsistent 2 scrub errors

2017-01-26 Thread Eugen Block

Yes, we have replication size of 2 also


From what I understand, with a rep size of 2 the cluster can't decide  
which object is intact if one is broken, so the repair fails. If you  
had a size of 3, the cluster would see 2 intact objects and repair the  
broken one (I guess). At least we didn't have these inconsistencies  
since we increased the size to 3.



Quoting Mio Vlahović:


Hello,


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
Behalf Of Eugen Block
I had a similar issue recently, where I had a replication size of 2 (I
changed that to 3 after the recovery).


Yes, we have replication size of 2 also...


ceph health detail
HEALTH_ERR 16 pgs inconsistent; 261 scrub errors
pg 1.bb1 is active+clean+inconsistent, acting [15,5]

[...CUT...]

So this object was completely missing. A ceph repair didn't work, I
wasn't sure why. So I just created the empty object:

ceph-node2:~ # touch
/var/lib/ceph/osd/ceph-
5/current/1.bb1_head/rbd\\udata.16be96558ea798.022f_
_head_D7879BB1__1

ceph-node2:~ # ceph pg repair 1.bb1

and the result in the logs:

[...] cluster [INF] 1.bb1 repair starts
[...] cluster [ERR] 1.bb1 shard 5: soid
1/d7879bb1/rbd_data.16be96558ea798.022f/head
data_digest
0x != best guess data_digest  0xead60f2d from auth shard 15,
size 0 != known size 6565888, missing attr _, missing attr snapset
[...] cluster [ERR] 1.bb1 repair 0 missing, 1 inconsistent objects
[...] cluster [ERR] 1.bb1 repair 1 errors, 1 fixed


I have tried your suggestion, now we have to wait and see the  
result, so far, we still have 1 pg inconsistent...nothing  
interesting in the logs regarding this pg.


Regards!




--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 1 pgs inconsistent 2 scrub errors

2017-01-26 Thread Mio Vlahović
Hello,

> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> Behalf Of Eugen Block 
> I had a similar issue recently, where I had a replication size of 2 (I
> changed that to 3 after the recovery).

Yes, we have replication size of 2 also...

> ceph health detail
> HEALTH_ERR 16 pgs inconsistent; 261 scrub errors
> pg 1.bb1 is active+clean+inconsistent, acting [15,5]
> 
> [...CUT...] 
>
> So this object was completely missing. A ceph repair didn't work, I
> wasn't sure why. So I just created the empty object:
> 
> ceph-node2:~ # touch
> /var/lib/ceph/osd/ceph-
> 5/current/1.bb1_head/rbd\\udata.16be96558ea798.022f_
> _head_D7879BB1__1
> 
> ceph-node2:~ # ceph pg repair 1.bb1
> 
> and the result in the logs:
> 
> [...] cluster [INF] 1.bb1 repair starts
> [...] cluster [ERR] 1.bb1 shard 5: soid
> 1/d7879bb1/rbd_data.16be96558ea798.022f/head
> data_digest
> 0x != best guess data_digest  0xead60f2d from auth shard 15,
> size 0 != known size 6565888, missing attr _, missing attr snapset
> [...] cluster [ERR] 1.bb1 repair 0 missing, 1 inconsistent objects
> [...] cluster [ERR] 1.bb1 repair 1 errors, 1 fixed

I have tried your suggestion; now we have to wait and see the result. So far, 
we still have 1 pg inconsistent... nothing interesting in the logs regarding 
this pg.

Regards!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Inherent insecurity of OSD daemons when using only a "public network"

2017-01-26 Thread Willem Jan Withagen
On 13-1-2017 12:45, Willem Jan Withagen wrote:
> On 13-1-2017 09:07, Christian Balzer wrote:
>>
>> Hello,
>>
>> Something I came across a while agao, but the recent discussion here
>> jolted my memory.
>>
>> If you have a cluster configured with just a "public network" and that
>> network being in RFC space like 10.0.0.0/8, you'd think you'd be "safe",
>> wouldn't you?
>>
>> Alas you're not:
>> ---
>> root@ceph-01:~# netstat -atn |grep LIST |grep 68
>> tcp0  0 0.0.0.0:68130.0.0.0:*   LISTEN   
>>   
>> tcp0  0 0.0.0.0:68140.0.0.0:*   LISTEN   
>>   
>> tcp0  0 10.0.0.11:6815  0.0.0.0:*   LISTEN   
>>   
>> tcp0  0 10.0.0.11:6800  0.0.0.0:*   LISTEN   
>>   
>> tcp0  0 0.0.0.0:68010.0.0.0:*   LISTEN   
>>   
>> tcp0  0 0.0.0.0:68020.0.0.0:*   LISTEN   
>>   
>> etc..
>> ---
>>
>> Something that people most certainly would NOT expect to be the default
>> behavior.
>>
>> Solution, define a complete redundant "cluster network" that's identical
>> to the public one and voila:
>> ---
>> root@ceph-02:~# netstat -atn |grep LIST |grep 68
>> tcp0  0 10.0.0.12:6816  0.0.0.0:*   LISTEN   
>>   
>> tcp0  0 10.0.0.12:6817  0.0.0.0:*   LISTEN   
>>   
>> tcp0  0 10.0.0.12:6818  0.0.0.0:*   LISTEN   
>>   
>> etc.
>> ---
>>
>> I'd call that a security bug, simply because any other daemon on the
>> planet will bloody bind to the IP it's been told to in its respective
>> configuration.
> 
> I do agree that this would not be the expected result if one specifies
> specific addresses. But it could be that this is how is was designed.
> 
> I have been hacking a bit in the networking code, and my more verbose
> code (HEAD) tells me:
> 1: starting osd.0 at - osd_data td/ceph-helpers/0 td/ceph-helpers/0/journal
> 1: 2017-01-13 12:24:02.045275 b7dc000 -1  Processor -- bind:119 trying
> to bind to 0.0.0.0:6800/0
> 1: 2017-01-13 12:24:02.045429 b7dc000 -1  Processor -- bind:119 trying
> to bind to 0.0.0.0:6800/0
> 1: 2017-01-13 12:24:02.045603 b7dc000 -1  Processor -- bind:119 trying
> to bind to 0.0.0.0:6801/0
> 1: 2017-01-13 12:24:02.045669 b7dc000 -1  Processor -- bind:119 trying
> to bind to 0.0.0.0:6800/0
> 1: 2017-01-13 12:24:02.045715 b7dc000 -1  Processor -- bind:119 trying
> to bind to 0.0.0.0:6801/0
> 1: 2017-01-13 12:24:02.045758 b7dc000 -1  Processor -- bind:119 trying
> to bind to 0.0.0.0:6802/0
> 1: 2017-01-13 12:24:02.045810 b7dc000 -1  Processor -- bind:119 trying
> to bind to 0.0.0.0:6800/0
> 1: 2017-01-13 12:24:02.045857 b7dc000 -1  Processor -- bind:119 trying
> to bind to 0.0.0.0:6801/0
> 1: 2017-01-13 12:24:02.045903 b7dc000 -1  Processor -- bind:119 trying
> to bind to 0.0.0.0:6802/0
> 1: 2017-01-13 12:24:02.045997 b7dc000 -1  Processor -- bind:119 trying
> to bind to 0.0.0.0:6803/0
> 
> So binding factually occurs on 0.0.0.0.
> 
> Here in sequence are bound:
>   Messenger *ms_public = Messenger::create(g_ceph_context,
>   Messenger *ms_cluster = Messenger::create(g_ceph_context,
>   Messenger *ms_hbclient = Messenger::create(g_ceph_context,
>   Messenger *ms_hb_back_server = Messenger::create(g_ceph_context,
>   Messenger *ms_hb_front_server = Messenger::create(g_ceph_context,
>   Messenger *ms_objecter = Messenger::create(g_ceph_context,
> 
> But a specific address indication is not passed.
> 
> I have asked on the dev-list if this is the desired behaviour.
> And if not, I'll see if I can come up with a fix.

A fix for this has been merged into the HEAD code
https://github.com/ceph/ceph/pull/12929

If you define public_network but do not define cluster_network, then the
public network is now used for the cluster network as well.

Not sure if this will get back-ported to earlier releases.
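
On releases without the fix, the workaround Christian described amounts to
making the cluster network explicit and identical to the public one, e.g. in
ceph.conf (substitute whatever subnet your deployment actually uses):

[global]
    public network = 10.0.0.0/8
    cluster network = 10.0.0.0/8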

--WjW


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 1 pgs inconsistent 2 scrub errors

2017-01-26 Thread Eugen Block
I had a similar issue recently, where I had a replication size of 2 (I  
changed that to 3 after the recovery).


ceph health detail
HEALTH_ERR 16 pgs inconsistent; 261 scrub errors
pg 1.bb1 is active+clean+inconsistent, acting [15,5]

zgrep 1.bb1 /var/log/ceph/ceph.log*
[...] cluster [INF] 1.bb1 deep-scrub starts
[...] cluster [ERR] 1.bb1 shard 5 missing  
1/d7879bb1/rbd_data.16be96558ea798.022f/head

[...] cluster [ERR] 1.bb1 deep-scrub 1 missing, 0 inconsistent objects
[...] cluster [ERR] 1.bb1 deep-scrub 1 errors

What I did was to identify the object in the filesystem, according to  
the output of "ceph osd tree":


ceph-node1:~ # find /var/lib/ceph/osd/ceph-15/current/1.bb1_head/  
-name 'rbd\\udata.16be96558ea798.022f*' -ls
155196824 6416 -rw-r--r--   1 root root  6565888 Sep 26 11:35  
/var/lib/ceph/osd/ceph-15/current/1.bb1_head/rbd\\udata.16be96558ea798.022f__head_D7879BB1__1


ceph-node2:~ # find /var/lib/ceph/osd/ceph-5/current/1.bb1_head/ -name  
'rbd\\udata.16be96558ea798.022f*' -ls

ceph-node2:~ #

So this object was completely missing. A ceph repair didn't work, I  
wasn't sure why. So I just created the empty object:


ceph-node2:~ # touch  
/var/lib/ceph/osd/ceph-5/current/1.bb1_head/rbd\\udata.16be96558ea798.022f__head_D7879BB1__1


ceph-node2:~ # ceph pg repair 1.bb1

and the result in the logs:

[...] cluster [INF] 1.bb1 repair starts
[...] cluster [ERR] 1.bb1 shard 5: soid  
1/d7879bb1/rbd_data.16be96558ea798.022f/head data_digest  
0x != best guess data_digest  0xead60f2d from auth shard 15,  
size 0 != known size 6565888, missing attr _, missing attr snapset

[...] cluster [ERR] 1.bb1 repair 0 missing, 1 inconsistent objects
[...] cluster [ERR] 1.bb1 repair 1 errors, 1 fixed
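
(As a side note: on Jewel you can also ask the cluster what the scrub recorded
for a PG before touching anything on disk, e.g.

rados list-inconsistent-obj 1.bb1 --format=json-pretty

though I can't say how complete that output is on every 10.2.x point release.)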


Maybe this helps.

Regards,
Eugen


Quoting Mio Vlahović:


Hello,

We have some problems with 1 pg from this morning, this is what we  
found so far...


# ceph --version
ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)

# ceph -s
cluster 2bf80721-fceb-4b63-89ee-1a5faa278493
 health HEALTH_ERR
1 pgs inconsistent
2 scrub errors
 monmap e1: 1 mons at {cephadm01=192.168.12.150:6789/0}
election epoch 7, quorum 0 cephadm01
 osdmap e580: 9 osds: 9 up, 9 in
flags sortbitwise
  pgmap v11430755: 664 pgs, 13 pools, 1010 GB data, 13894 kobjects
2142 GB used, 2355 GB / 4497 GB avail
 660 active+clean
   3 active+clean+scrubbing
   1 active+clean+inconsistent

# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
pg 10.55 is active+clean+inconsistent, acting [3,4]
2 scrub errors

# ceph pg 10.55 query
{
"state": "active+clean+inconsistent",
"snap_trimq": "[]",
"epoch": 580,
"up": [
3,
4
],
"acting": [
3,
4
],
"actingbackfill": [
"3",
"4"
],
"info": {
"pgid": "10.55",
"last_update": "580'40334",
"last_complete": "580'40334",
"log_tail": "448'37299",
"last_user_version": 40334,
"last_backfill": "MAX",
"last_backfill_bitwise": 1,
"purged_snaps": "[]",
"history": {
"epoch_created": 329,
"last_epoch_started": 577,
"last_epoch_clean": 577,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 576,
"same_interval_since": 576,
"same_primary_since": 572,
"last_scrub": "568'40333",
"last_scrub_stamp": "2017-01-26 10:06:56.062870",
"last_deep_scrub": "562'40329",
"last_deep_scrub_stamp": "2017-01-26 06:19:55.708518",
"last_clean_scrub_stamp": "2016-07-05 14:58:45.534218"
},
"stats": {
"version": "580'40334",
"reported_seq": "49407",
"reported_epoch": "580",
"state": "active+clean+inconsistent",
"last_fresh": "2017-01-26 11:21:55.393989",
"last_change": "2017-01-26 10:06:56.062930",
"last_active": "2017-01-26 11:21:55.393989",
"last_peered": "2017-01-26 11:21:55.393989",
"last_clean": "2017-01-26 11:21:55.393989",
"last_became_active": "2017-01-26 09:28:09.196447",
"last_became_peered": "2017-01-26 09:28:09.196447",
"last_unstale": "2017-01-26 11:21:55.393989",
"last_undegraded": "2017-01-26 11:21:55.393989",
"last_fullsized": "2017-01-26 11:21:55.393989",
"mapping_epoch": 575,
"log_start": "448'37299",
"ondisk_log_start": "448'37299",
"created": 329,
"last_epoch_clean": 577,
"parent": "0.0",
"parent_split_bits": 8,
"last_scrub": "568'40333",
"last_scrub_stamp": "2017-01-26 10:06:5

[ceph-users] 1 pgs inconsistent 2 scrub errors

2017-01-26 Thread Mio Vlahović
Hello,

We have some problems with 1 pg from this morning, this is what we found so 
far...

# ceph --version
ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)

# ceph -s
cluster 2bf80721-fceb-4b63-89ee-1a5faa278493
 health HEALTH_ERR
1 pgs inconsistent
2 scrub errors
 monmap e1: 1 mons at {cephadm01=192.168.12.150:6789/0}
election epoch 7, quorum 0 cephadm01
 osdmap e580: 9 osds: 9 up, 9 in
flags sortbitwise
  pgmap v11430755: 664 pgs, 13 pools, 1010 GB data, 13894 kobjects
2142 GB used, 2355 GB / 4497 GB avail
 660 active+clean
   3 active+clean+scrubbing
   1 active+clean+inconsistent

# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
pg 10.55 is active+clean+inconsistent, acting [3,4]
2 scrub errors

# ceph pg 10.55 query
{
"state": "active+clean+inconsistent",
"snap_trimq": "[]",
"epoch": 580,
"up": [
3,
4
],
"acting": [
3,
4
],
"actingbackfill": [
"3",
"4"
],
"info": {
"pgid": "10.55",
"last_update": "580'40334",
"last_complete": "580'40334",
"log_tail": "448'37299",
"last_user_version": 40334,
"last_backfill": "MAX",
"last_backfill_bitwise": 1,
"purged_snaps": "[]",
"history": {
"epoch_created": 329,
"last_epoch_started": 577,
"last_epoch_clean": 577,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 576,
"same_interval_since": 576,
"same_primary_since": 572,
"last_scrub": "568'40333",
"last_scrub_stamp": "2017-01-26 10:06:56.062870",
"last_deep_scrub": "562'40329",
"last_deep_scrub_stamp": "2017-01-26 06:19:55.708518",
"last_clean_scrub_stamp": "2016-07-05 14:58:45.534218"
},
"stats": {
"version": "580'40334",
"reported_seq": "49407",
"reported_epoch": "580",
"state": "active+clean+inconsistent",
"last_fresh": "2017-01-26 11:21:55.393989",
"last_change": "2017-01-26 10:06:56.062930",
"last_active": "2017-01-26 11:21:55.393989",
"last_peered": "2017-01-26 11:21:55.393989",
"last_clean": "2017-01-26 11:21:55.393989",
"last_became_active": "2017-01-26 09:28:09.196447",
"last_became_peered": "2017-01-26 09:28:09.196447",
"last_unstale": "2017-01-26 11:21:55.393989",
"last_undegraded": "2017-01-26 11:21:55.393989",
"last_fullsized": "2017-01-26 11:21:55.393989",
"mapping_epoch": 575,
"log_start": "448'37299",
"ondisk_log_start": "448'37299",
"created": 329,
"last_epoch_clean": 577,
"parent": "0.0",
"parent_split_bits": 8,
"last_scrub": "568'40333",
"last_scrub_stamp": "2017-01-26 10:06:56.062870",
"last_deep_scrub": "562'40329",
"last_deep_scrub_stamp": "2017-01-26 06:19:55.708518",
"last_clean_scrub_stamp": "2016-07-05 14:58:45.534218",
"log_size": 3035,
"ondisk_log_size": 3035,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"stat_sum": {
"num_bytes": 2153869599,
"num_objects": 28148,
"num_object_clones": 0,
"num_object_copies": 56296,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 28148,
"num_whiteouts": 0,
"num_read": 21,
"num_read_kb": 696,
"num_write": 50,
"num_write_kb": 217,
"num_scrub_errors": 2,
"num_shallow_scrub_errors": 1,
"num_deep_scrub_errors": 1,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
   

Re: [ceph-users] Replacing an mds server

2017-01-26 Thread Peter Maloney
Oh, it says Coming soon somewhere? (Thanks... and I found it now at
http://docs.ceph.com/docs/master/rados/deployment/ceph-deploy-mds/ )

I wrote some instructions and tested them (it was very difficult...
putting together incomplete docs, old mailing list threads, etc. and
tinkering), and couldn't find where to add them to docs, and nobody
would answer me "how do you make a new page in the docs that ends up
actually in the index?" so I never sent any pull request... so now that
I know of an existing page where it says "Coming soon", I'll add it there.

And for you, and anyone it helps:

Here is the procedure for removing ALL mds and deleting the pools:
(***this is not what you want as is***)
(killall and rm -rf on the node that has the mds running that you want
to remove, other steps on admin node.. possibly same machine like on my
test cluster)
> killall ceph-mds
> ceph mds cluster_down
>
> # this seems to actually remove the mds, unlike "ceph mds rm ..."
> # as badly explained here
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-January/045649.html
> ceph mds fail 0
> 
> # this says "Error EINVAL: all MDS daemons must be inactive before
> removing filesystem"
> ceph fs rm cephfs --yes-i-really-mean-it
> 
> ceph osd pool delete cephfs_data cephfs_data
> --yes-i-really-really-mean-it
> ceph osd pool delete cephfs_metadata cephfs_metadata
> --yes-i-really-really-mean-it
>
> # also auth and the dir
> rm -rf "/var/lib/ceph/mds/${cluster}-${hostname}"
> ceph auth del mds."$hostname"

To replace one, I didn't test the procedure... but probably just add a
2nd as a failover, and then just do the removal parts from above:

> killall ceph-mds  #at this point, the failover happens
> ceph mds fail 0 #0 is the id here, seen in the dump command below
> rm -rf "/var/lib/ceph/mds/${cluster}-${hostname}"
> ceph auth del mds."$hostname"
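
For the "add a 2nd as a failover" part beforehand, the manual steps should be
roughly the following (untested as written; assumes the default cluster name
"ceph", a systemd host, and caps mirroring what ceph-deploy normally creates --
compare against "ceph auth get mds.<existing>" to be sure):

> mkdir -p /var/lib/ceph/mds/ceph-$hostname
> ceph auth get-or-create mds."$hostname" mon 'allow profile mds' \
>     mds 'allow *' osd 'allow rwx' \
>     -o /var/lib/ceph/mds/ceph-$hostname/keyring
> chown -R ceph:ceph /var/lib/ceph/mds/ceph-$hostname
> systemctl start ceph-mds@$hostname

(or just let ceph-deploy mds create do that work for you).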

This command seems not to be required, but making note of it just in case:
> ceph mds rm 0 mds."$hostname"

Check on that with:
> ceph mds dump --format json-pretty

e.g. in this output, I have one mds running, "0", and it is in and up.
Make sure to save the output from before, to compare what one mds looks
like, then again with failover set up, then after.

> "in": [
> 0
> ],
> "up": {
> "mds_0": 2504573
> },
> "failed": [],
> "damaged": [],
> "stopped": [],
> "info": {
> "gid_2504573": {
> "gid": 2504573,
> "name": "ceph2",
> "rank": 0,
> "incarnation": 99,
> "state": "up:active",
> "state_seq": 65,
> "addr": "10.3.0.132:6818\/3463",
> "standby_for_rank": -1,
> "standby_for_fscid": -1,
> "standby_for_name": "",
> "standby_replay": false,
> "export_targets": [],
> "features": 576460752032874495
> }
> },


On 01/24/17 20:56, Jorge Garcia wrote:
> I have been using a ceph-mds server that has low memory. I want to
> replace it with a new system that has a lot more memory. How does one
> go about replacing the ceph-mds server? I looked at the documentation,
> figuring I could remove the current metadata server and add the new
> one, but the remove metadata server section just says "Coming
> soon...". The same page also has a warning about running multiple
> metadata servers. So am I stuck?
>
> Thanks!
>
> Jorge
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 


Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.malo...@brockmann-consult.de
Internet: http://www.brockmann-consult.de


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SIGHUP to ceph processes every morning

2017-01-26 Thread Torsten Casselt
Nice, there it is. Thanks a lot!

On 26.01.2017 09:43, Henrik Korkuc wrote:
> just to add to what Pawel said: /etc/logrotate.d/ceph.logrotate
> 
> On 17-01-26 09:21, Torsten Casselt wrote:
>> Hi,
>>
>> that makes sense. Thanks for the fast answer!
>>
>> On 26.01.2017 08:04, Paweł Sadowski wrote:
>>> Hi,
>>>
>>> 6:25 points to daily cron job, it's probably logrotate trying to force
>>> ceph to reopen logs.
>>>
>>>
>>> On 01/26/2017 07:34 AM, Torsten Casselt wrote:
 Hi,

 I get the following line in journalctl:

 Jan 24 06:25:02 ceph01 ceph-osd[28398]: 2017-01-24 06:25:02.302770
 7f0655516700 -1 received  signal: Hangup from  PID: 18157 task name:
 killall -q -1 ceph-mon ceph-mds ceph-osd ceph-fuse radosgw  UID: 0

 It happens every day at the same time which is the cron.daily time. But
 there's no cronjob I can relate to ceph in the appropriate folder.
 The cluster runs just fine, so it seems it restarts automatically after
 the SIGHUP. Still I'm curious why the signal is sent every morning.

 I use Kraken on Debian Jessie systems. Three monitors, 36 OSDs on three
 nodes.

 Thanks!
 Torsten
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Torsten Casselt, IT-Sicherheit, Leibniz Universität IT Services
Tel: +49-(0)511-762-799095  Schlosswender Str. 5
Fax: +49-(0)511-762-3003D-30159 Hannover
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SIGHUP to ceph processes every morning

2017-01-26 Thread Henrik Korkuc

just to add to what Pawel said: /etc/logrotate.d/ceph.logrotate
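
The postrotate stanza in that file is what sends the HUP; it typically looks
something like the following (exact contents vary between releases and
packages):

/var/log/ceph/*.log {
    rotate 7
    daily
    compress
    sharedscripts
    postrotate
        killall -q -1 ceph-mon ceph-mds ceph-osd ceph-fuse radosgw || true
    endscript
    missingok
    notifempty
}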

On 17-01-26 09:21, Torsten Casselt wrote:

Hi,

that makes sense. Thanks for the fast answer!

On 26.01.2017 08:04, Paweł Sadowski wrote:

Hi,

6:25 points to daily cron job, it's probably logrotate trying to force
ceph to reopen logs.


On 01/26/2017 07:34 AM, Torsten Casselt wrote:

Hi,

I get the following line in journalctl:

Jan 24 06:25:02 ceph01 ceph-osd[28398]: 2017-01-24 06:25:02.302770
7f0655516700 -1 received  signal: Hangup from  PID: 18157 task name:
killall -q -1 ceph-mon ceph-mds ceph-osd ceph-fuse radosgw  UID: 0

It happens every day at the same time which is the cron.daily time. But
there's no cronjob I can relate to ceph in the appropriate folder.
The cluster runs just fine, so it seems it restarts automatically after
the SIGHUP. Still I'm curious why the signal is sent every morning.

I use Kraken on Debian Jessie systems. Three monitors, 36 OSDs on three
nodes.

Thanks!
Torsten



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] MDS flapping: how to increase MDS timeouts?

2017-01-26 Thread Burkhard Linke

HI,


we are running two MDS servers in an active/standby-replay setup. Recently 
we had to disconnect the active MDS server, and failover to the standby worked 
as expected.



The filesystem currently contains over 5 million files, so reading all 
the metadata information from the data pool took too long, since the 
information was not available on the OSD page caches. The MDS was timed 
out by the mons, and a failover switch to the former active MDS (which 
was available as standby again) happened. This MDS in turn had to read 
the metadata, again running into a timeout, failover, etc. I resolved 
the situation by disabling one of the MDS, which kept the mons from 
failing the now solely available MDS.



So given a large filesystem, how do I prevent failover flapping between 
MDS instances that are in the rejoin state and reading the inode 
information?


Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com