Re: [ceph-users] CRUSH rebalance all at once or host-by-host?

2020-01-08 Thread Sean Matheny
I tested this out by setting norebalance and norecover, moving the host buckets
under the rack buckets (all of them), and then unsetting the flags. Ceph started
melting down with escalating slow requests, even with the backfill and recovery
parameters set to throttle. I moved the host buckets back to the default root
bucket and things mostly came right, but I was still left with some
inactive/unknown PGs and had to restart a few OSDs to get back to HEALTH_OK.

I’m sure there’s a way you can tune things or fade in CRUSH weights gradually or
something, but I’m happy just moving one host at a time.
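
For anyone else trying this, the per-host version looks roughly like the sketch
below (host and rack names are placeholders, and whether you keep the flag set
between moves or just let each move backfill fully is a judgement call):

$ ceph osd set norebalance
$ ceph osd crush move <hostname> rack=<rackname>
# wait for peering to settle, then let the data move
$ ceph osd unset norebalance
# watch "ceph -s" and wait for backfill to finish before the next host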

Our environment has 224 OSDs on 14 hosts, btw.

Cheers,
Sean M


On 8/01/2020, at 1:32 PM, Sean Matheny <s.math...@auckland.ac.nz> wrote:

We’re adding in a CRUSH hierarchy retrospectively in preparation for a big 
expansion. Previously we only had host and osd buckets, and now we’ve added in 
rack buckets.

I’ve got what I think are sensible settings in place to limit rebalancing, at
least ones that have worked in the past:
osd_max_backfills = 1
osd_recovery_threads = 1
osd_recovery_priority = 5
osd_client_op_priority = 63
osd_recovery_max_active = 3

I thought it would save a lot of unnecessary data movement if I moved the
existing host buckets to the new rack buckets all at once, rather than
host-by-host. As long as recovery is throttled correctly, it shouldn’t matter
how many objects are misplaced, the thinking goes.
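
To be concrete, the sort of commands I mean are along these lines (rack and
host names are placeholders, and `default` assumes the stock root bucket):

$ ceph osd crush add-bucket rack1 rack
$ ceph osd crush move rack1 root=default
$ ceph osd crush move host01 rack=rack1   # repeat per host, or all at once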

1) Is doing all at once advisable, or am I putting myself at a much greater 
risk if I do have failures during the rebalance (which could take quite a 
while)?
2) My failure domain is currently set at the host level. If I want to change 
the failure domain to ‘rack’, when is the best time to make that change (e.g. 
after the rebalancing from moving the hosts into the racks finishes)?
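
For what it’s worth, the change I have in mind for 2) is creating a new
replicated rule with rack as the failure domain and pointing the pools at it,
something like this sketch (rule and pool names are placeholders):

$ ceph osd crush rule create-replicated replicated_rack default rack
$ ceph osd pool set <poolname> crush_rule replicated_rack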

v12.2.2 if it makes a difference.

Cheers,
Sean M








[ceph-users] CRUSH rebalance all at once or host-by-host?

2020-01-07 Thread Sean Matheny
We’re adding in a CRUSH hierarchy retrospectively in preparation for a big 
expansion. Previously we only had host and osd buckets, and now we’ve added in 
rack buckets.

I’ve got what I think are sensible settings in place to limit rebalancing, at
least ones that have worked in the past:
osd_max_backfills = 1
osd_recovery_threads = 1
osd_recovery_priority = 5
osd_client_op_priority = 63
osd_recovery_max_active = 3

I thought it would save a lot of unnecessary data movement if I moved the
existing host buckets to the new rack buckets all at once, rather than
host-by-host. As long as recovery is throttled correctly, it shouldn’t matter
how many objects are misplaced, the thinking goes.

1) Is doing all at once advisable, or am I putting myself at a much greater 
risk if I do have failures during the rebalance (which could take quite a 
while)?
2) My failure domain is currently set at the host level. If I want to change 
the failure domain to ‘rack’, when is the best time to make that change (e.g. 
after the rebalancing from moving the hosts into the racks finishes)?

v12.2.2 if it makes a difference.

Cheers,
Sean M







[ceph-users] Lifecycle and dynamic resharding

2019-08-02 Thread Sean Purdy
Hi,

A while back I reported a bug in luminous where lifecycle on a versioned bucket 
wasn't removing delete markers.

I'm interested in this phrase in the pull request:

"you can't expect lifecycle to work with dynamic resharding enabled."

Why not?


https://github.com/ceph/ceph/pull/29122
https://tracker.ceph.com/issues/36512

Sean


Re: [ceph-users] Erasure Coding - FPGA / Hardware Acceleration

2019-06-14 Thread Sean Redmond
Hi James,

Thanks for your comments.

I think the CPU burn is more of a concern to SoftIron here, as they are
using low-power ARM64 CPUs to keep the power draw down compared to Intel
CPUs, where, as you say, the problem may be less of a concern.

Using less power by going ARM64 and providing EC via an FPGA does sound
interesting, as I often run into power constraints when deploying. I am just
concerned that this FPGA functionality seems limited to a single vendor, who
I assume is packaging their own EC plugin to get this to work (hopefully a
SoftIron employee can explain to us how that is implemented), as I like
the flexibility we have with Ceph to change or use multiple vendors over time.
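
(For context on the plugin point: the erasure-code plugin is already chosen per
profile in stock Ceph, so you can switch between the bundled implementations
today; the sketch below uses placeholder profile name and k/m values. A
vendor-specific FPGA offload would presumably have to ship as another such
plugin.)

$ ceph osd erasure-code-profile set ecprofile-k4m2 k=4 m=2 plugin=isa
$ ceph osd erasure-code-profile get ecprofile-k4m2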

Thanks

On Fri, Jun 14, 2019 at 1:49 PM Brett Niver  wrote:

> Also the picture I saw at Cephalocon - which could have been
> inaccurate, looked to me as if it multiplied the data path.
>
> On Fri, Jun 14, 2019 at 8:27 AM Janne Johansson 
> wrote:
> >
> > On Fri, 14 Jun 2019 at 13:58, Sean Redmond <
> sean.redmo...@gmail.com> wrote:
> >>
> >> Hi Ceph-Uers,
> >> I noticed that Soft Iron now have hardware acceleration for Erasure
> Coding[1], this is interesting as the CPU overhead can be a problem in
> addition to the extra disk I/O required for EC pools.
> >> Does anyone know if any other work is ongoing to support generic FPGA
> Hardware Acceleration for EC pools, or if this is just a vendor specific
> feature.
> >>
> >> [1]
> https://www.theregister.co.uk/2019/05/20/softiron_unleashes_accepherator_an_erasure_coding_accelerator_for_ceph/
> >
> >
> > Are there numbers anywhere to see how "tough" on a CPU it would be to
> calculate an EC code compared to "writing a sector to
> > a disk on a remote server and getting an ack back" ? To my very
> untrained eye, it seems like a very small part of the whole picture,
> > especially if you are meant to buy a ton of cards to do it.
> >
> > --
> > May the most significant bit of your life be positive.
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


[ceph-users] Erasure Coding - FPGA / Hardware Acceleration

2019-06-14 Thread Sean Redmond
Hi Ceph-Users,

I noticed that SoftIron now have hardware acceleration for Erasure
Coding [1]. This is interesting, as the CPU overhead can be a problem in
addition to the extra disk I/O required for EC pools.

Does anyone know if any other work is ongoing to support generic FPGA
hardware acceleration for EC pools, or if this is just a vendor-specific
feature?

Thanks

[1]
https://www.theregister.co.uk/2019/05/20/softiron_unleashes_accepherator_an_erasure_coding_accelerator_for_ceph/


Re: [ceph-users] v14.2.0 Nautilus released

2019-03-19 Thread Sean Purdy
Hi,


Will debian packages be released?  I don't see them in the nautilus repo.  I 
thought that Nautilus was going to be debian-friendly, unlike Mimic.


Sean

On Tue, 19 Mar 2019 14:58:41 +0100
Abhishek Lekshmanan  wrote:

> 
> We're glad to announce the first release of Nautilus v14.2.0 stable
> series. There have been a lot of changes across components from the
> previous Ceph releases, and we advise everyone to go through the release
> and upgrade notes carefully.


Re: [ceph-users] v12.2.11 Luminous released

2019-02-01 Thread Sean Purdy
On Fri, 1 Feb 2019 08:47:47 +0100
Wido den Hollander  wrote:

> 
> 
> On 2/1/19 8:44 AM, Abhishek wrote:
> > We are glad to announce the eleventh bug fix release of the Luminous
> > v12.2.x long term stable release series. We recommend that all users

> > * There have been fixes to RGW dynamic and manual resharding, which no
> > longer
> >   leaves behind stale bucket instances to be removed manually. For
> > finding and
> >   cleaning up older instances from a reshard a radosgw-admin command
> > `reshard
> >   stale-instances list` and `reshard stale-instances rm` should do the
> > necessary
> >   cleanup.
> > 
> 
> Great news! I hope this works! This has been biting a lot of people in
> the last year. I have helped a lot of people to manually clean this up,
> but it's great that this is now available as a regular command.
> 
> Wido

I hope so too, especially where bucket lifecycles and versioning are enabled.
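
(For reference, the commands referred to in the release notes above would
presumably just be run as follows; I haven't tried them myself yet:

$ radosgw-admin reshard stale-instances list
$ radosgw-admin reshard stale-instances rm
)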

Sean


[ceph-users] radosgw lifecycle not removing delete markers

2018-10-15 Thread Sean Purdy
Hi,


Versions 12.2.7 and 12.2.8.  I've set up a bucket with versioning enabled and 
uploaded a lifecycle configuration.  I upload some files and delete them, 
inserting delete markers.  The configured lifecycle DOES remove the deleted 
binaries (non-current versions).  The lifecycle DOES NOT remove the delete 
markers, even with ExpiredObjectDeleteMarker set.

Is this a known issue?  I have an empty bucket full of delete markers.

Does this lifecycle do what I expect?  Remove the non-current version after a 
day, and remove orphaned delete markers:

{
    "Rules": [
        {
            "Status": "Enabled",
            "Prefix": "",
            "NoncurrentVersionExpiration": {
                "NoncurrentDays": 1
            },
            "Expiration": {
                "ExpiredObjectDeleteMarker": true
            },
            "ID": "Test expiry"
        }
    ]
}
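
(For reference, I applied it with something like the following, or the s3cmd
equivalent; profile, endpoint and bucket names are placeholders:

$ aws --profile=myprofile --endpoint-url http://myserver/ s3api \
    put-bucket-lifecycle-configuration --bucket mybucket \
    --lifecycle-configuration file://lifecycle.json
)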


I can't be the only one who wants to use this feature.

Thanks,

Sean


Re: [ceph-users] Manually deleting an RGW bucket

2018-10-01 Thread Sean Purdy
On Sat, 29 Sep 2018, Konstantin Shalygin said:
> > How do I delete an RGW/S3 bucket and its contents if the usual S3 API 
> > commands don't work?
> > 
> > The bucket has S3 delete markers that S3 API commands are not able to 
> > remove, and I'd like to reuse the bucket name.  It was set up for 
> > versioning and lifecycles under ceph 12.2.5 which broke the bucket when a 
> > reshard happened.  12.2.7 allowed me to remove the regular files but not 
> > the delete markers.
> > 
> > There must be a way of removing index files and so forth through rados 
> > commands.
> 
> 
> What error actually is?
> 
> For delete bucket you should delete all bucket objects ("s3cmd rm -rf
> s3://bucket/") and multipart uploads.


No errors, but I can't remove delete markers from the versioned bucket.


Here's the bucket:

$ aws --profile=mybucket --endpoint-url http://myserver/ s3 ls s3://mybucket/

(no objects returned)

Try removing the bucket:

$ aws --profile=mybucket --endpoint-url http://myserver/ s3 rb s3://mybucket/
remove_bucket failed: s3://mybucket/ An error occurred (BucketNotEmpty) when 
calling the DeleteBucket operation: Unknown

So the bucket is not empty.

List object versions:

$ aws --profile=mybucket --endpoint-url http://myserver/ s3api 
list-object-versions --bucket mybucket --prefix someprefix/0/0

Shows lots of delete markers from the versioned bucket:

{
    "Owner": {
        "DisplayName": "mybucket bucket owner",
        "ID": "mybucket"
    },
    "IsLatest": true,
    "VersionId": "ZB8ty9c3hxjxV5izmIKM1QwDR6fwnsd",
    "Key": "someprefix/0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7"
}
 
Let's try removing that delete marker object:

$ aws --profile=mybucket --endpoint-url http://myserver/ s3api delete-object 
--bucket mybucket --key someprefix/0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7 
--version-id ZB8ty9c3hxjxV5izmIKM1QwDR6fwnsd

Returns 0, has it worked?

$ aws --profile=mybucket --endpoint-url http://myserver/ s3api 
list-object-versions --bucket mybucket --prefix 
someprefix/0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7

No:

"DeleteMarkers": [
{
"Owner": {
"DisplayName": "static bucket owner", 
"ID": "static"
}, 
"IsLatest": true, 
    "VersionId": "ZB8ty9c3hxjxV5izmIKM1QwDR6fwnsd", 
"Key": "candidate-photo/0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7", 
"LastModified": "2018-09-17T16:19:58.187Z"
}
]


So how do I get rid of the delete markers to empty the bucket?  This is my 
problem.
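
(For completeness, the admin-side commands that look relevant are along these
lines; I haven't verified that they deal with delete markers at all, and the
second one removes the whole bucket:

$ radosgw-admin bi list --bucket=mybucket | grep 00fff6df-863d-48b5-9089-cc6e7c5997e7
$ radosgw-admin bucket rm --bucket=mybucket --purge-objects
)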

Sean


[ceph-users] Manually deleting an RGW bucket

2018-09-28 Thread Sean Purdy


Hi,


How do I delete an RGW/S3 bucket and its contents if the usual S3 API commands 
don't work?

The bucket has S3 delete markers that S3 API commands are not able to remove, 
and I'd like to reuse the bucket name.  It was set up for versioning and 
lifecycles under ceph 12.2.5 which broke the bucket when a reshard happened.  
12.2.7 allowed me to remove the regular files but not the delete markers.

There must be a way of removing index files and so forth through rados commands.


Thanks,

Sean


[ceph-users] Can't remove DeleteMarkers in rgw bucket

2018-09-20 Thread Sean Purdy
Hi,


We have a bucket that we are trying to empty.  Versioning and lifecycle were 
enabled.  We deleted all the objects in the bucket, but this left a whole 
bunch of delete markers.

aws s3api delete-object --bucket B --key K --version-id V is not deleting the 
delete markers.

Any ideas?  We want to delete the bucket so we can reuse the bucket name.  
Alternatively, is there a way to delete a bucket that still contains delete 
markers?


$ aws --profile=owner s3api list-object-versions --bucket bucket --prefix 
0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7

{
  "DeleteMarkers": [
{
  "Owner": {
"DisplayName": "bucket owner",
"ID": "owner"
  },
  "IsLatest": true,
  "VersionId": "ZB8ty9c3hxjxV5izmIKM1QwDR6fwnsd",
  "Key": "0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7",
  "LastModified": "2018-09-17T16:19:58.187Z"
}
  ]
}

$ aws --profile=owner s3api delete-object --bucket bucket --key 
0/0/00fff6df-863d-48b5-9089-cc6e7c5997e7 --version-id 
ZB8ty9c3hxjxV5izmIKM1QwDR6fwnsd

returns 0 but the delete marker remains.


This bucket was created in 12.2.2, current version of ceph is 12.2.7 via 12.2.5


Thanks,

Sean


Re: [ceph-users] Ceph Mimic packages not available for Ubuntu Trusty

2018-09-19 Thread Sean Purdy
I doubt it - Mimic needs gcc v7 I believe, and Trusty's a bit old for that.  
Even the Xenial releases aren't straightforward and rely on some backported 
packages.


Sean, missing Mimic on debian stretch

On Wed, 19 Sep 2018, Jakub Jaszewski said:
> Hi Cephers,
> 
> Any plans for Ceph Mimic packages for Ubuntu Trusty? I found only
> ceph-deploy.
> https://download.ceph.com/debian-mimic/dists/trusty/main/binary-amd64/
> 
> Thanks
> Jakub

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Release for production

2018-09-07 Thread Sean Purdy
On Fri,  7 Sep 2018, Paul Emmerich said:
> Mimic

Unless you run debian, in which case Luminous.

Sean
 
> 2018-09-07 12:24 GMT+02:00 Vincent Godin :
> > Hello Cephers,
> > if i had to go for production today, which release should i choose :
> > Luminous or Mimic ?
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> -- 
> Paul Emmerich
> 
> Looking for help with your Ceph cluster? Contact us at https://croit.io
> 
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fixing a 12.2.5 reshard

2018-09-06 Thread Sean Purdy
Hi,


We were on 12.2.5 and a bucket with versioning and 100k objects got stuck when 
autoresharding kicked in.  We could download but not upload files.  After 
upgrading to 12.2.7 and running a bucket check, bucket limit check now shows 
twice as many objects.  How do I fix this?


Sequence:

12.2.5 autoshard happened, "radosgw-admin reshard list" showed a reshard 
happening but no action.
12.2.7 upgrade went fine, didn't fix anything straightaway. "radosgw-admin 
reshard list" same.  Still no file uploads.  bucket limit check showed 100k 
files in the bucket as expected, and no shards.
Ran "radosgw-admin bucket check --fix"

Now "reshard list" shows no reshards in progress, but bucket limit check shows 
200k files in two shards, 100k per shard.  It should be half this.


The output of "bucket check --fix" has 
existing_header: "num_objects": 203344 for "rgw.main"
calculated_header: "num_objects": 101621

Shouldn't it install the calculated_header?



Before:

$ sudo radosgw-admin reshard list
[
    {
        "tenant": "",
        "bucket_name": "static",
        "bucket_id": "a5501bce-1360-43e3-af08-8f3d1e102a79.3475308.1",
        "new_instance_id": "static:a5501bce-1360-43e3-af08-8f3d1e102a79.3620665.1",
        "old_num_shards": 1,
        "new_num_shards": 2
    }
]

$ sudo radosgw-admin bucket limit check
{
    "user_id": "static",
    "buckets": [
        {
            "bucket": "static",
            "tenant": "",
            "num_objects": 101621,
            "num_shards": 0,
            "objects_per_shard": 101621,
            "fill_status": "OK"
        }
    ]
}

Output from bucket check --fix

{
    "existing_header": {
        "usage": {
            "rgw.none": {
                "size": 0,
                "size_actual": 0,
                "size_utilized": 0,
                "size_kb": 0,
                "size_kb_actual": 0,
                "size_kb_utilized": 0,
                "num_objects": 101621
            },
            "rgw.main": {
                "size": 37615290807,
                "size_actual": 38017675264,
                "size_utilized": 0,
                "size_kb": 36733683,
                "size_kb_actual": 37126636,
                "size_kb_utilized": 0,
                "num_objects": 203344
            }
        }
    },
    "calculated_header": {
        "usage": {
            "rgw.none": {
                "size": 0,
                "size_actual": 0,
                "size_utilized": 0,
                "size_kb": 0,
                "size_kb_actual": 0,
                "size_kb_utilized": 0,
                "num_objects": 101621
            },
            "rgw.main": {
                "size": 18796589005,
                "size_actual": 18997686272,
                "size_utilized": 18796589005,
                "size_kb": 18356044,
                "size_kb_actual": 18552428,
                "size_kb_utilized": 18356044,
                "num_objects": 101621
            }
        }
    }
}

After:

{
    "user_id": "static",
    "buckets": [
        {
            "bucket": "static",
            "tenant": "",
            "num_objects": 203242,
            "num_shards": 2,
            "objects_per_shard": 101621,
            "fill_status": "OK"
        }
    ]
}




Re: [ceph-users] Upgrading ceph with HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent

2018-09-05 Thread Sean Purdy
On Wed,  5 Sep 2018, John Spray said:
> On Wed, Sep 5, 2018 at 8:38 AM Marc Roos  wrote:
> >
> >
> > The adviced solution is to upgrade ceph only in HEALTH_OK state. And I
> > also read somewhere that is bad to have your cluster for a long time in
> > an HEALTH_ERR state.
> >
> > But why is this bad?

See https://ceph.com/community/new-luminous-pg-overdose-protection
under "Problems with past intervals"

"if the cluster becomes unhealthy, and especially if it remains unhealthy for 
an extended period of time, a combination of effects can cause problems."

"If a cluster is unhealthy for an extended period of time (e.g., days or even 
weeks), the past interval set can become large enough to require a significant 
amount of memory."


Sean
 
> Aside from the obvious (errors are bad things!), many people have
> external monitoring systems that will alert them on the transitions
> between OK/WARN/ERR.  If the system is stuck in ERR for a long time,
> they are unlikely to notice new errors or warnings.  These systems can
> accumulate faults without the operator noticing.
> 
> > Why is this bad during upgrading?
> 
> It depends what's gone wrong.  For example:
>  - If your cluster is degraded (fewer than desired number of replicas
> of data) then taking more services offline (even briefly) to do an
> upgrade will create greater risk to the data by reducing the number of
> copies available.
> - If your system is in an error state because something has gone bad
> on disk, then recovering it with the same software that wrote the data
> is a more tested code path than running some newer code against a
> system left in a strange state by an older version.
> 
> There will always be exceptions to this (e.g. where the upgrade is the
> fix for whatever caused the error), but the general purpose advice is
> to get a system nice and clean before starting the upgrade.
> 
> John
> 
> > Can I quantify how bad it is? (like with large log/journal file?)
> >
> >
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] bucket limit check is 3x actual objects after autoreshard/upgrade

2018-08-22 Thread Sean Purdy
Hi,


I was testing versioning and autosharding in Luminous 12.2.5, upgrading to 
12.2.7.  I wanted to know if the upgraded autosharded bucket is still usable.  
Looks like it is, but a bucket limit check seems to show too many objects.


On my test servers, I created a bucket using 12.2.5, turned on versioning and 
autosharding, uploaded 100,000 objects, and bucket uploads hung, as is known.  
Autosharding said it was running but didn't complete.

Then I upgraded that cluster to 12.2.7.  Resharding seems to have finished, 
(two shards), but "bucket limit check" says there are 300,000 objects, 150k per 
shard, and gives a "fill_status OVER 100%" message.

But an "s3 ls" shows 100k objects in the bucket. And a "rados ls" shows 200k 
objects, two per file, one has file data and one is empty.

e.g. for file TEST.89488
$ rados ls -p default.rgw.buckets.index | grep TEST.89488\$
a7fb3a0d-e0a4-401c-b7cb-dbc535f3c1af.114156.2_TEST.89488 (empty)
a7fb3a0d-e0a4-401c-b7cb-dbc535f3c1af.114156.2__:ZuP3m9XRFcarZYrLGTVd8rcOksWkGBr_TEST.89488
 (has data)

Both "bucket check" and "bucket check --check-objects" just return []


How should I go about fixing this?  The bucket *seems* functional, and I don't 
*think* there are extra objects, but the index check thinks there are.  How do I 
find out what the index actually says?  Or whether there really are extra files 
that need removing.
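
(One thing I can do is dump the raw index entries and count them, something
like the sketch below against the test2 bucket shown further down, to compare
against the s3 and rados listings:

$ sudo radosgw-admin bi list --bucket=test2 > /tmp/test2-index.json
$ grep -c '"idx":' /tmp/test2-index.json
)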


Thanks for any ideas or pointers.


Sean

$ /usr/local/bin/aws --endpoint-url http://test-cluster/ --profile test s3 ls 
s3://test2/ | wc -l
13

$ sudo rados ls -p default.rgw.buckets.index | grep -c TEST
200133

$ sudo radosgw-admin bucket limit check
[
    {
        "user_id": "test",
        "buckets": [
            ...
            {
                "bucket": "test2",
                "tenant": "",
                "num_objects": 300360,
                "num_shards": 2,
                "objects_per_shard": 150180,
                "fill_status": "OVER 100.00%"
            }
        ]
    }
]

$ sudo radosgw-admin reshard status --bucket test2
[
    {
        "reshard_status": 0,
        "new_bucket_instance_id": "",
        "num_shards": -1
    },
    {
        "reshard_status": 0,
        "new_bucket_instance_id": "",
        "num_shards": -1
    }
]



Re: [ceph-users] Clock skew

2018-08-15 Thread Sean Crosby
Hi Dominique,

The clock skew warning shows up when your NTP daemon is not synced.

You can see the sync in the output of ntpq -p

This is a synced NTP daemon:

# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 ntp.unimelb.edu 210.9.192.50     2 u   24   64   17    0.496   -6.421   0.181
*ntp2.unimelb.ed 202.6.131.118    2 u   26   64   17    0.613  -11.998   0.250
 ntp41.frosteri. .INIT.          16 u    -   64    0    0.000    0.000   0.000
 dns01.ntl02.pri .INIT.          16 u    -   64    0    0.000    0.000   0.000
 cosima.470n.act .INIT.          16 u    -   64    0    0.000    0.000   0.000
 x.ns.gin.ntt.ne .INIT.          16 u    -   64    0    0.000    0.000   0.000

The *'s show that there is a sync with an NTP server. When you start or
restart ntpd, it takes a while for a sync to occur.

Here's immediately after restarting the ntp daemon

# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 ntp.unimelb.edu 210.9.192.50     2 u    -   64    1    0.496   -6.421   0.000
 ntp2.unimelb.ed 202.6.131.118    2 u    -   64    1    0.474  -11.678   0.000
 ntp41.frosteri. .INIT.          16 u    -   64    0    0.000    0.000   0.000
 dns01.ntl02.pri .INIT.          16 u    -   64    0    0.000    0.000   0.000
 cosima.470n.act .INIT.          16 u    -   64    0    0.000    0.000   0.000
 x.ns.gin.ntt.ne .INIT.          16 u    -   64    0    0.000    0.000   0.000

Make sure that nothing is regularly restarting ntpd. For us, we had puppet
and dhcp regularly fight over the contents of ntp.conf, and it caused a
restart of ntpd.
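
On the Ceph side you can also ask the monitors directly what skew they are
currently measuring, and the warning threshold itself is a mon option (shown
below at what I believe is the default; raising it only hides the symptom):

# ceph time-sync-status

# ceph.conf
[mon]
mon clock drift allowed = 0.05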

Sean


On Wed, 15 Aug 2018 at 19:37, Dominque Roux 
wrote:

> Hi all,
>
> We recently facing clock skews from time to time.
> This means that sometimes everything is fine but hours later the warning
> appears again.
>
> NTPD is running and configured with the same pool.
>
> Did someone else already had the same issue and could probably help us
> to fix this?
>
> Thanks a lot!
>
> Dominique
> --
>
> Your Swiss, Open Source and IPv6 Virtual Machine. Now on
> www.datacenterlight.ch
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


[ceph-users] Recovering from broken sharding: fill_status OVER 100%

2018-08-07 Thread Sean Purdy
Hi,


On my test servers, I created a bucket using 12.2.5, turned on versioning, 
uploaded 100,000 objects, and the bucket broke, as expected.  Autosharding said 
it was running but didn't complete.

Then I upgraded that cluster to 12.2.7.  Resharding seems to have finished, but 
now that cluster says it has *300,000* objects, instead of 100,000.  But an S3 
list shows 100,000 objects.

How do I fix this?  We have a production cluster that has a similar bucket.

I have tried both "bucket check" and "bucket check --check-objects" and they 
just return []


$ /usr/local/bin/aws --endpoint-url http://test/ --profile test s3 ls 
s3://test2/ | wc -l
13

$ sudo radosgw-admin bucket limit check
[
    {
        "user_id": "test",
        "buckets": [
            ...
            {
                "bucket": "test2",
                "tenant": "",
                "num_objects": 300360,
                "num_shards": 2,
                "objects_per_shard": 150180,
                "fill_status": "OVER 100.00%"
            }
        ]
    }
]

$ sudo radosgw-admin reshard status --bucket test2
[
    {
        "reshard_status": 0,
        "new_bucket_instance_id": "",
        "num_shards": -1
    },
    {
        "reshard_status": 0,
        "new_bucket_instance_id": "",
        "num_shards": -1
    }
]


Thanks,

Sean


Re: [ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-03 Thread Sean Patronis
3:54:00.195276 7f3102aa87c0  5 osd.21 pg_epoch: 19579
pg[6.aa( v 19579'25769295 (19579'25766294,19579'25769295] local-les=16453
n=214 ec=5 les/c 16453/16453 15889/16452/16452) [21,16] r=0 lpr=0
crt=16449'19948380 lcod 0'0 mlcod 0'0 inactive] enter Reset
-3> 2018-08-04 03:54:00.195526 7f3102aa87c0  5 osd.21 pg_epoch: 19579
pg[6.ab(unlocked)] enter Initial
-2> 2018-08-04 03:54:00.254812 7f3102aa87c0  5 osd.21 pg_epoch: 19579
pg[6.ab( v 19579'1116897 (18464'1113896,19579'1116897] local-les=13378
n=217 ec=5 les/c 13378/13378 13286/13377/13377) [4,21] r=1 lpr=0
pi=12038-13376/4 crt=709'35663 lcod 0'0 inactive NOTIFY] exit Initial
0.059287 0 0.00
-1> 2018-08-04 03:54:00.254842 7f3102aa87c0  5 osd.21 pg_epoch: 19579
pg[6.ab( v 19579'1116897 (18464'1113896,19579'1116897] local-les=13378
n=217 ec=5 les/c 13378/13378 13286/13377/13377) [4,21] r=1 lpr=0
pi=12038-13376/4 crt=709'35663 lcod 0'0 inactive NOTIFY] enter Reset
 0> 2018-08-04 03:54:00.275885 7f3102aa87c0 -1 osd/PG.cc: In function
'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
ceph::bufferlist*)' thread 7f3102aa87c0 time 2018-08-04 03:54:00.274454
osd/PG.cc: 2577: FAILED assert(values.size() == 1)

 ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
ceph::buffer::list*)+0x578) [0x741a18]
 2: (OSD::load_pgs()+0x1993) [0x655d13]
 3: (OSD::init()+0x1ba1) [0x65fff1]
 4: (main()+0x1ea7) [0x602fd7]
 5: (__libc_start_main()+0xed) [0x7f31008a276d]
 6: /usr/bin/ceph-osd() [0x607119]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/ceph-osd.21.log
--- end dump of recent events ---
2018-08-04 03:54:00.314451 7f3102aa87c0 -1 *** Caught signal (Aborted) **
 in thread 7f3102aa87c0

 ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
 1: /usr/bin/ceph-osd() [0x98aa3a]
 2: (()+0xfcb0) [0x7f3101cd0cb0]
 3: (gsignal()+0x35) [0x7f31008b70d5]
 4: (abort()+0x17b) [0x7f31008ba83b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f310120869d]
 6: (()+0xb5846) [0x7f3101206846]
 7: (()+0xb5873) [0x7f3101206873]
 8: (()+0xb596e) [0x7f310120696e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1df) [0xa6adcf]
 10: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
ceph::buffer::list*)+0x578) [0x741a18]
 11: (OSD::load_pgs()+0x1993) [0x655d13]
 12: (OSD::init()+0x1ba1) [0x65fff1]
 13: (main()+0x1ea7) [0x602fd7]
 14: (__libc_start_main()+0xed) [0x7f31008a276d]
 15: /usr/bin/ceph-osd() [0x607119]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

--- begin dump of recent events ---
 0> 2018-08-04 03:54:00.314451 7f3102aa87c0 -1 *** Caught signal
(Aborted) **
 in thread 7f3102aa87c0

 ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
 1: /usr/bin/ceph-osd() [0x98aa3a]
 2: (()+0xfcb0) [0x7f3101cd0cb0]
 3: (gsignal()+0x35) [0x7f31008b70d5]
 4: (abort()+0x17b) [0x7f31008ba83b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f310120869d]
 6: (()+0xb5846) [0x7f3101206846]
 7: (()+0xb5873) [0x7f3101206873]
 8: (()+0xb596e) [0x7f310120696e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1df) [0xa6adcf]
 10: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
ceph::buffer::list*)+0x578) [0x741a18]
 11: (OSD::load_pgs()+0x1993) [0x655d13]
 12: (OSD::init()+0x1ba1) [0x65fff1]
 13: (main()+0x1ea7) [0x602fd7]
 14: (__libc_start_main()+0xed) [0x7f31008a276d]
 15: /usr/bin/ceph-osd() [0x607119]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 mo

Re: [ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-03 Thread Sean Redmond
Hi,

You can export and import PGs using ceph_objectstore_tool, but if the OSD
won't start you may have trouble exporting a PG.
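
As a rough sketch of what an export/import looks like (paths, PG id and OSD
numbers are placeholders, the binary may be named ceph_objectstore_tool or
ceph-objectstore-tool depending on the release, and both OSDs must be stopped
while you run it):

$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
    --journal-path /var/lib/ceph/osd/ceph-21/journal \
    --op export --pgid 6.aa --file /tmp/6.aa.export
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
    --journal-path /var/lib/ceph/osd/ceph-NN/journal \
    --op import --file /tmp/6.aa.export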

It may be useful to share the errors you get when trying to start the OSD.

Thanks

On Fri, Aug 3, 2018 at 10:13 PM, Sean Patronis  wrote:

>
>
> Hi all.
>
> We have an issue with some down+peering PGs (I think), when I try to mount or 
> access data the requests are blocked:
>
> 114891/7509353 objects degraded (1.530%)
>  887 stale+active+clean
>1 peering
>   54 active+recovery_wait
>19609 active+clean
>   91 active+remapped+wait_backfill
>   10 active+recovering
>1 active+clean+scrubbing+deep
>9 down+peering
>   10 active+remapped+backfilling
> recovery io 67324 kB/s, 10 objects/s
>
> when I query one of these down+peering PGs, I can see the following:
>
>  "peering_blocked_by": [
> { "osd": 7,
>   "current_lost_at": 0,
>   "comment": "starting or marking this osd lost may let us 
> proceed"},
> { "osd": 21,
>   "current_lost_at": 0,
>   "comment": "starting or marking this osd lost may let us 
> proceed"}]},
> { "name": "Started",
>   "enter_time": "2018-08-01 07:06:16.806339"}],
>
>
>
> Both of these OSDs (7 and 21) will not come back up and in with ceph due
> to some errors, but I can mount the disks and read data off of them.  Can I
> manually move/copy these PGs off of these down and out OSDs and put them on
> a good OSD?
>
> This is an older ceph cluster running firefly.
>
> Thanks.
>
>
>
>
> This email message may contain privileged or confidential information, and
> is for the use of intended recipients only. Do not share with or forward to
> additional parties except as necessary to conduct the business for which
> this email (and attachments) was clearly intended. If you have received
> this message in error, please immediately advise the sender by reply email
> and then delete this message.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


[ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-03 Thread Sean Patronis
Hi all.

We have an issue with some down+peering PGs (I think), when I try to
mount or access data the requests are blocked:

114891/7509353 objects degraded (1.530%)
 887 stale+active+clean
   1 peering
  54 active+recovery_wait
   19609 active+clean
  91 active+remapped+wait_backfill
  10 active+recovering
   1 active+clean+scrubbing+deep
   9 down+peering
  10 active+remapped+backfilling
recovery io 67324 kB/s, 10 objects/s

when I query one of these down+peering PGs, I can see the following:

 "peering_blocked_by": [
{ "osd": 7,
  "current_lost_at": 0,
  "comment": "starting or marking this osd lost may
let us proceed"},
{ "osd": 21,
  "current_lost_at": 0,
  "comment": "starting or marking this osd lost may
let us proceed"}]},
{ "name": "Started",
  "enter_time": "2018-08-01 07:06:16.806339"}],



Both of these OSDs (7 and 21) will not come back up and in with ceph due to
some errors, but I can mount the disks and read data off of them.  Can I
manually move/copy these PGs off of these down and out OSDs and put them on
a good OSD?

This is an older ceph cluster running firefly.

Thanks.



Re: [ceph-users] Converting to dynamic bucket resharding in Luminous

2018-07-30 Thread Sean Redmond
Hi,

I also had the same issues and took to disabling this feature.
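
(For reference, disabling it is just a config change on the RGW nodes followed
by a radosgw restart; the section name below depends on how your rgw instances
are named:

[client.rgw.<instance>]
rgw_dynamic_resharding = false
)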

Thanks

On Mon, Jul 30, 2018 at 8:42 AM, Micha Krause  wrote:

> Hi,
>
>   I have a Jewel Ceph cluster with RGW index sharding enabled.  I've
>> configured the index to have 128 shards.  I am upgrading to Luminous.  What
>> will happen if I enable dynamic bucket index resharding in ceph.conf?  Will
>> it maintain my 128 shards (the buckets are currently empty), and will it
>> split them (to 256, and beyond) when they get full enough?
>>
> Yes it will.
> However I would not recommend enabling dynamic resharding, I had some
> problems with it, like resharding loops where large buckets failed to
> reshard, and it tried resharding
> them over and over again.
> I had problems deleting some buckets that had multiple reshards done
> because of missing objects (Maybe objects where deleted during a dynamic
> reshard, and this was not recorded to
> the indexes).
>
> So for the time being I disabled dynamic resharding again.
>
> Micha Krause
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] Setting up Ceph on EC2 i3 instances

2018-07-28 Thread Sean Redmond
Hi,

You may need to consider the latency between the AZs; it may make it
difficult to get very high IOPS. I suspect that is the reason EBS is
replicated within a single AZ.

Have you any data that shows the latency between the AZs?

Thanks

On Sat, 28 Jul 2018, 05:52 Mansoor Ahmed,  wrote:

> Hello,
>
> We are working on setting up Ceph on AWS i3 instances that have NVMe SSD
> as instance store to create our own EBS that spans multiple availability
> zones. We want to achieve better performance compared to EBS with
> provisioned IOPS.
>
> I thought it would be good to reach out to the community to see if any one
> has done this or if anyone would advice against it or any other advice that
> could be of help.
>
> Thank you for your help in advance.
>
> Regards
> Mansoor
> ᐧ
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] Reclaim free space on RBD images that use Bluestore?????

2018-07-25 Thread Sean Bolding
Thanks. Yes, it turns out this was not an issue with Ceph, but rather an
issue with XenServer. Starting in version 7, XenServer changed how they
manage LVM by adding a VHD layer on top of it. They did it to handle live
migrations but ironically broke live migrations when using any iSCSI,
including iSCSI to Ceph via lrbd. It works just fine with NFS-based storage
repositories but not block storage. It doesn't look like they are going to fix
it, since they are moving on to GlusterFS instead, with an experimental
version of it starting in XenServer 7.5.

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Ronny Aasen
Sent: Monday, July 23, 2018 6:13 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Reclaim free space on RBD images that use
Bluestore?

 

On 23.07.2018 22:18, Sean Bolding wrote:

I have XenServers that connect via iSCSI to Ceph gateway servers that use
lrbd and targetcli. On my ceph cluster the RBD images I create are used as
storage repositories in Xenserver for the virtual machine vdisks. 

 

Whenever I delete a virtual machine, XenServer shows that the repository
size has decreased. This also happens when I mount a virtual drive in
Xenserver as a virtual drive in a Windows guest. If I delete a large file,
such as an exported VM, it shows as deleted and space available. However;
when check in Ceph  using ceph -s or ceph df it still shows the space being
used.

 

I checked everywhere and it seems there was a reference to it here
https://github.com/ceph/ceph/pull/14727 but not sure if a way to trim or
discard freed blocks was ever implemented.

 

The only way I have found is to play musical chairs and move the VMs to
different repositories and then completely remove the old RBD images in
ceph. This is not exactly easy to do.

 

Is there a way to reclaim free space on RBD images that use Bluestore?
What commands do I use and where do I use this from? If such command exist
do I run them on the ceph cluster or do I run them from XenServer? Please
help.

 

 

Sean

 

 

 


I am not familiar with Xen, but it does sound like you have an RBD mounted
with a filesystem on it on the Xen server. In that case it is the same as for
other filesystems: deleted files are just deleted in the file allocation table,
and the RBD space is "reclaimed" when the filesystem discards the now-unused
blocks.

In many filesystems you would run the fstrim command to discard the freed
blocks, or optionally mount the fs with the discard option. In XenServer >6.5
there should be a button in XenCenter to reclaim freed space.
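
As a sketch (mountpoint and device names are placeholders, and discard/TRIM has
to be supported by every layer between the filesystem and the RBD image for
this to actually free space in Ceph):

$ sudo fstrim -v /mnt/myfs
# or mount with continuous discard instead of periodic fstrim
$ sudo mount -o discard /dev/rbd0 /mnt/myfs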


kind regards
Ronny Aasen

 



[ceph-users] Reclaim free space on RBD images that use Bluestore?????

2018-07-23 Thread Sean Bolding
I have XenServers that connect via iSCSI to Ceph gateway servers that use
lrbd and targetcli. On my ceph cluster the RBD images I create are used as
storage repositories in Xenserver for the virtual machine vdisks. 

 

Whenever I delete a virtual machine, XenServer shows that the repository
size has decreased. This also happens when I mount a virtual drive in
XenServer as a virtual drive in a Windows guest. If I delete a large file,
such as an exported VM, it shows as deleted and the space as available. However,
when I check in Ceph using ceph -s or ceph df, it still shows the space being
used.

 

I checked everywhere and it seems there was a reference to it here
https://github.com/ceph/ceph/pull/14727 but not sure if a way to trim or
discard freed blocks was ever implemented.

 

The only way I have found is to play musical chairs and move the VMs to
different repositories and then completely remove the old RBD images in
ceph. This is not exactly easy to do.

 

Is there a way to reclaim free space on RBD images that use Bluestore?
What commands do I use and where do I run them from? If such commands exist,
do I run them on the Ceph cluster or do I run them from XenServer? Please
help.

 

 

Sean

 

 

 

 


Re: [ceph-users] [rgw] Very high cache misses with automatic bucket resharding

2018-07-16 Thread Sean Redmond
Hi,

Do you have ongoing resharding? 'radosgw-admin reshard list' should show you
the status.

Do you see the number of objects in .rgw.bucket.index pool increasing?
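
(e.g. something like the following, checked a few times over an hour, where the
grep is just a convenience for picking out the index pool from the per-pool
listing:

$ rados df | grep -i index
)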

I hit a lot of problems trying to use auto resharding in 12.2.5 - I have
disabled it for the moment.

Thanks

[1] https://tracker.ceph.com/issues/24551

On Mon, Jul 16, 2018 at 12:32 PM, Rudenko Aleksandr 
wrote:

> Hi, guys.
>
> I use Luminous 12.2.5.
>
> Automatic bucket index resharding has not been activated in the past.
>
> Few days ago i activated auto. resharding.
>
> After that and now i see:
>
> - very high Ceph read I/O (~300 I/O before activating resharding, ~4k now),
> - very high Ceph read bandwidth (50 MB/s before activating resharding, 250
> MB/s now),
> - very high RGW cache miss (400 count/s before activating resharding,
> ~3.5k now).
>
> For Ceph monitoring i use MGR+Zabbix plugin and zabbix-template from ceph
> github repo.
> For RGW monitoring i use RGW perf dump and my script.
>
> Why is it happening? When is it ending?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


Re: [ceph-users] Luminous 12.2.6 release date?

2018-07-10 Thread Sean Purdy
Hi Sean,

On Tue, 10 Jul 2018, Sean Redmond said:
> Can you please link me to the tracker 12.2.6 fixes? I have disabled
> resharding in 12.2.5 due to it running endlessly.

http://tracker.ceph.com/issues/22721


Sean
 
> Thanks
> 
> On Tue, Jul 10, 2018 at 9:07 AM, Sean Purdy 
> wrote:
> 
> > While we're at it, is there a release date for 12.2.6?  It fixes a
> > reshard/versioning bug for us.
> >
> > Sean
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >


Re: [ceph-users] Luminous 12.2.6 release date?

2018-07-10 Thread Sean Redmond
Hi Sean (Good name btw),

Can you please link me to the tracker 12.2.6 fixes? I have disabled
resharding in 12.2.5 due to it running endlessly.

Thanks

On Tue, Jul 10, 2018 at 9:07 AM, Sean Purdy 
wrote:

> While we're at it, is there a release date for 12.2.6?  It fixes a
> reshard/versioning bug for us.
>
> Sean
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


[ceph-users] Luminous 12.2.6 release date?

2018-07-10 Thread Sean Purdy
While we're at it, is there a release date for 12.2.6?  It fixes a 
reshard/versioning bug for us.

Sean


Re: [ceph-users] pre-sharding s3 buckets

2018-06-29 Thread Sean Purdy
On Wed, 27 Jun 2018, Matthew Vernon said:
> Hi,
> 
> On 27/06/18 11:18, Thomas Bennett wrote:
> 
> > We have a particular use case that we know that we're going to be
> > writing lots of objects (up to 3 million) into a bucket. To take
> > advantage of sharding, I'm wanting to shard buckets, without the
> > performance hit of resharding.
> 
> I assume you're running Jewel (Luminous has dynamic resharding); you can
> set rgw_override_bucket_index_max_shards = X in your ceph.conf, which
> will cause all new buckets to have X shards for the indexes.
> 
> HTH,
> 
> Matthew

But watch out if you are running Luminous - manual and automatic
resharding breaks if you have versioning or lifecycles on your bucket.
Fix in next stable release 12.2.6 apparently.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/023968.html
http://tracker.ceph.com/issues/23886


Sean


[ceph-users] luminous radosgw hung at logrotate time

2018-06-23 Thread Sean Purdy
Hi,


All our radosgw instances hung at logrotate time.  The logs show:

  ERROR: keystone revocation processing returned error r=-22

(we're not running keystone)

Killing radosgw manually and then starting it again by hand fixed this, but 
systemctl commands did not.

We're running luminous 12.2.1 on debian stretch.  Is 
http://tracker.ceph.com/issues/22365 a fix for this?  (12.2.3)

In addition, systemctl start/stop/restart radosgw isn't working and I seem to 
have to run the radosgw command and options manually.
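
For what it's worth, the systemd unit is normally named per RGW instance, so
the sort of invocation I'd expect to need is the following (the instance id
after "rgw." is an assumption based on the usual naming):

$ sudo systemctl status ceph-radosgw@rgw.$(hostname -s).service
$ sudo systemctl restart ceph-radosgw@rgw.$(hostname -s).service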


Thanks,

Sean Purdy


Re: [ceph-users] RGW Index rapidly expanding post tunables update (12.2.5)

2018-06-20 Thread Sean Redmond
Hi,

It sounds like the .rgw.bucket.index pool has grown maybe due to some
problem with dynamic bucket resharding.

I wonder if the (stale/old/unused) bucket indexes need to be purged
using something like the below:

radosgw-admin bi purge --bucket= --bucket-id=

Not sure how you would find the old_bucket_id however.
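
One way that might work for finding them (a sketch, not something I've had to
do recently):

radosgw-admin metadata get bucket:<bucket_name>
radosgw-admin metadata list bucket.instance

The first should show the bucket's current marker/bucket_id, and the second
lists all bucket instances, so anything for that bucket other than the current
id is a candidate old instance.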

Thanks

[1]
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/object_gateway_guide_for_ubuntu/administration_cli


On Wed, Jun 20, 2018 at 12:34 PM, Tom W  wrote:

> Hi all,
>
>
>
> We have recently upgraded from Jewel (10.2.10) to Luminous (12.2.5) and
> after this we decided to update our tunables configuration to the optimals,
> which were previously at Firefly. During this process, we have noticed the
> OSDs (bluestore) rapidly filling on the RGW index and GC pool. We estimated
> the index to consume around 30G of space and the GC negligible, but they
> are now filling all 4 OSDs per host which contain 2TB SSDs in each.
>
>
>
> Does anyone have any experience with this, or how to determine why the
> sudden growth has been encountered during recovery after the tunables
> update?
>
>
>
> We have disabled resharding activity due to this issue,
> https://tracker.ceph.com/issues/24551 and our gc queue is only a few
> items at present.
>
>
>
> Kind Regards,
>
>
>
> Tom
>
> --
>
> NOTICE AND DISCLAIMER
> This e-mail (including any attachments) is intended for the above-named
> person(s). If you are not the intended recipient, notify the sender
> immediately, delete this email from your system and do not disclose or use
> for any purpose. We may monitor all incoming and outgoing emails in line
> with current legislation. We have taken steps to ensure that this email and
> attachments are free from any virus, but it remains your responsibility to
> ensure that viruses do not adversely affect you
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-13 Thread Sean Purdy
On Wed, 13 Jun 2018, Fabian Grünbichler said:
> I hope we find some way to support Mimic+ for Stretch without requiring
> a backport of gcc-7+, although it unfortunately seems unlikely at this
> point.

Me too.  I picked Ceph Luminous on Debian stretch because I thought it would be 
maintained going forwards, and we're a Debian shop.  I appreciate that Mimic is a 
non-LTS release; I hope the issues around Debian support are resolved by the time 
of the next LTS.

Sean Purdy


Re: [ceph-users] SSD recommendation

2018-05-31 Thread Sean Redmond
Hi,

I know the s4600 thread well as I had over 10 of those drives fail before I
took them all out of production.

Intel did say a firmware fix was on the way but I could not wait and opted
for SM863A and never looked back...

I will be sticking with SM863A for now on further orders.

Thanks

On Thu, 31 May 2018, 15:33 Fulvio Galeazzi,  wrote:

> Hallo Simon,
>  I am also about to buy some new hardware and for SATA ~400GB I was
> considering Micron 5200 MAX, rated at 5 DWPD, for journaling/FSmetadata.
>Is anyone using such drives, and to what degree of satisfaction?
>
>Thanks
>
> Fulvio
>
>  Original Message 
> Subject: Re: [ceph-users] SSD recommendation
> From: Simon Ironside 
> To: ceph-users@lists.ceph.com
> Date: 5/31/2018 2:36 PM
>
> > It looks like the choices available to me in the SATA ~400GB and 3 DWPD
> > over 5 years range pretty much boils down to just the Intel DC S4600 and
> > the Samsung SM863a options anyway. Since David Herselman's thread has
> > put me off Intels I think I'll go with the Samsungs.
> >
> > Regards,
> > Simon.
> >
> > On 30/05/18 20:00, Simon Ironside wrote:
> >> Hi Everyone,
> >>
> >> I'm about to purchase hardware for a new production cluster. I was
> >> going to use 480GB Intel DC S4600 SSDs as either Journal devices for
> >> Filestore and/or DB/WAL for Bluestore spinning disk OSDs until I saw
> >> David Herselman's "Many concurrent drive failures" thread which has
> >> given me the fear.
> >>
> >> What's the current go to for Journal and/or DB/WAL SSDs if not the
> S4600?
> >>
> >> I'm planning on using AMD EPYC based Supermicros for OSD nodes with 3x
> >> 10TB SAS 7.2k to each SSD with 10gig networking. Happy to provide more
> >> info here if it's useful.
> >>
> >> Thanks,
> >> Simon.
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] 12.2.4 Both Ceph MDS nodes crashed. Please help.

2018-05-23 Thread Sean Sullivan
Thanks Yan! I did this for the bug ticket and missed these replies. I hope
I did it correctly. Here are the pastes of the dumps:

https://pastebin.com/kw4bZVZT -- primary
https://pastebin.com/sYZQx0ER -- secondary


They are not that long; here is the output of one:


Thread 17 "mds_rank_progr" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe3b100a700 (LWP 120481)]
0x5617aacc48c2 in Server::handle_client_getattr (this=this@entry=0x5617b5acbcd0, mdr=..., is_lookup=is_lookup@entry=true) at /build/ceph-12.2.5/src/mds/Server.cc:3065
3065    /build/ceph-12.2.5/src/mds/Server.cc: No such file or directory.
(gdb) t
[Current thread is 17 (Thread 0x7fe3b100a700 (LWP 120481))]
(gdb) bt
#0  0x5617aacc48c2 in Server::handle_client_getattr (this=this@entry=0x5617b5acbcd0, mdr=..., is_lookup=is_lookup@entry=true) at /build/ceph-12.2.5/src/mds/Server.cc:3065
#1  0x5617aacfc98b in Server::dispatch_client_request (this=this@entry=0x5617b5acbcd0, mdr=...) at /build/ceph-12.2.5/src/mds/Server.cc:1802
#2  0x5617aacfce9b in Server::handle_client_request (this=this@entry=0x5617b5acbcd0, req=req@entry=0x5617bdfa8700) at /build/ceph-12.2.5/src/mds/Server.cc:1716
#3  0x5617aad017b6 in Server::dispatch (this=0x5617b5acbcd0, m=m@entry=0x5617bdfa8700) at /build/ceph-12.2.5/src/mds/Server.cc:258
#4  0x5617aac6afac in MDSRank::handle_deferrable_message (this=this@entry=0x5617b5d22000, m=m@entry=0x5617bdfa8700) at /build/ceph-12.2.5/src/mds/MDSRank.cc:716
#5  0x5617aac795cb in MDSRank::_dispatch (this=this@entry=0x5617b5d22000, m=0x5617bdfa8700, new_msg=new_msg@entry=false) at /build/ceph-12.2.5/src/mds/MDSRank.cc:551
#6  0x5617aac7a472 in MDSRank::retry_dispatch (this=0x5617b5d22000, m=) at /build/ceph-12.2.5/src/mds/MDSRank.cc:998
#7  0x5617aaf0207b in Context::complete (r=0, this=0x5617bd568080) at /build/ceph-12.2.5/src/include/Context.h:70
#8  MDSInternalContextBase::complete (this=0x5617bd568080, r=0) at /build/ceph-12.2.5/src/mds/MDSContext.cc:30
#9  0x5617aac78bf7 in MDSRank::_advance_queues (this=0x5617b5d22000) at /build/ceph-12.2.5/src/mds/MDSRank.cc:776
#10 0x5617aac7921a in MDSRank::ProgressThread::entry (this=0x5617b5d22d40) at /build/ceph-12.2.5/src/mds/MDSRank.cc:502
#11 0x7fe3bb3066ba in start_thread (arg=0x7fe3b100a700) at pthread_create.c:333
#12 0x7fe3ba37241d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109



I:
* set the debug level to mds=20 mon=1,
* attached gdb prior to trying to mount aufs from a separate client,
* typed continue, attempted the mount,
* then backtraced after it seg faulted.

I hope this is more helpful. Is there something else I should try to get
more info? I was hoping for something closer to a python trace where it
says a variable is a different type or a missing delimiter. womp. I am
definitely out of my depth but now is a great time to learn! Can anyone
shed some more light as to what may be wrong?



On Fri, May 4, 2018 at 7:49 PM, Yan, Zheng  wrote:

> On Wed, May 2, 2018 at 7:19 AM, Sean Sullivan  wrote:
> > Forgot to reply to all:
> >
> > Sure thing!
> >
> > I couldn't install the ceph-mds-dbg packages without upgrading. I just
> > finished upgrading the cluster to 12.2.5. The issue still persists in
> 12.2.5
> >
> > From here I'm not really sure how to do generate the backtrace so I hope
> I
> > did it right. For others on Ubuntu this is what I did:
> >
> > * firstly up the debug_mds to 20 and debug_ms to 1:
> > ceph tell mds.* injectargs '--debug-mds 20 --debug-ms 1'
> >
> > * install the debug packages
> > ceph-mds-dbg in my case
> >
> > * I also added these options to /etc/ceph/ceph.conf just in case they
> > restart.
> >
> > * Now allow pids to dump (stolen partly from redhat docs and partly from
> > ubuntu)
> > echo -e 'DefaultLimitCORE=infinity\nPrivateTmp=true' | tee -a
> > /etc/systemd/system.conf
> > sysctl fs.suid_dumpable=2
> > sysctl kernel.core_pattern=/tmp/core
> > systemctl daemon-reload
> > systemctl restart ceph-mds@$(hostname -s)
> >
> > * A crash was created in /var/crash by apport but gdb cant read it. I
> used
> > apport-unpack and then ran GDB on what is inside:
> >
>
> core dump should be in /tmp/core
>
> > apport-unpack /var/crash/$(ls /var/crash/*mds*) /root/crash_dump/
> > cd /root/crash_dump/
> > gdb $(cat ExecutablePath) CoreDump -ex 'thr a a bt' | tee
> > /root/ceph_mds_$(hostname -s)_backtrace
> >
> > * This left me with the attached backtraces (which I think are wrong as I

[ceph-users] Bucket reporting content inconsistently

2018-05-11 Thread Sean Redmond
HI all,



We have recently upgraded to 10.2.10 in preparation for our upcoming
upgrade to Luminous, and I have been attempting to remove a bucket. When
using tools such as s3cmd I can see files are listed, verified by checking
with bi list too, as shown below:



root@ceph-rgw-1:~# radosgw-admin --id rgw.ceph-rgw-1 bi list
--bucket='bucketnamehere' | grep -i "\"idx\":" | wc -l

3278



However, on attempting to delete the bucket and purge the objects , it
appears not to be recognised:



root@ceph-rgw-1:~# radosgw-admin --id rgw.ceph-rgw-1 bucket rm --bucket=
bucketnamehere --purge-objects

2018-05-10 14:11:05.393851 7f0ab07b6a00 -1 ERROR: unable to remove
bucket(2) No such file or directory



Checking the bucket stats, it does appear that the bucket is reporting no
content, and repeat the above content test there has been no change to the
3278 figure:



root@ceph-rgw-1:~# radosgw-admin --id rgw.ceph-rgw-1 bucket stats --bucket="bucketnamehere"
{
    "bucket": "bucketnamehere",
    "pool": ".rgw.buckets",
    "index_pool": ".rgw.buckets.index",
    "id": "default.28142894.1",
    "marker": "default.28142894.1",
    "owner": "16355",
    "ver": "0#5463545,1#5483686,2#5483484,3#5474696,4#5479052,5#5480339,6#5469460,7#5463976",
    "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0",
    "mtime": "2015-12-08 12:42:26.286153",
    "max_marker": "0#,1#,2#,3#,4#,5#,6#,7#",
    "usage": {
        "rgw.main": {
            "size_kb": 0,
            "size_kb_actual": 0,
            "num_objects": 0
        },
        "rgw.multimeta": {
            "size_kb": 0,
            "size_kb_actual": 0,
            "num_objects": 0
        }
    },
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}



I have attempted a bucket index check and fix on this, however it does not
appear to have made a difference, and no fixes or errors were reported by it.
Does anyone have any advice on how to proceed with removing this content?
At this stage I am not too concerned if the method needed to remove this
generates orphans, as we will shortly be running a large orphan scan after
our upgrade to Luminous. Cluster health otherwise reports normal.
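
For reference, an index check and fix of the kind mentioned above is normally
of this shape on Jewel (a sketch only; --check-objects roughly rebuilds the
index from the objects actually present and --fix writes the result back):

radosgw-admin --id rgw.ceph-rgw-1 bucket check --bucket=bucketnamehere --check-objects --fix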


Thanks

Sean Redmond
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to configure s3 bucket acl so that one user's bucket is visible to another.

2018-05-09 Thread Sean Purdy
The other way to do it is with policies.

e.g. a bucket owned by user1, but read access granted to user2:

{ 
  "Version":"2012-10-17",
  "Statement":[
{
  "Sid":"user2 policy",
  "Effect":"Allow",
  "Principal": {"AWS": ["arn:aws:iam:::user/user2"]},
  "Action":["s3:GetObject","s3:ListBucket"],
  "Resource":[
"arn:aws:s3:::example1/*",
"arn:aws:s3:::example1"
  ]
}
  ]
}

And set the policy with:
$ s3cmd setpolicy policy.json s3://example1/
or similar.

user2 won't see the bucket in their list of buckets, but will be able to read 
and list the bucket in this case.

More at 
https://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html
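
To sanity-check the result, the aws cli works against radosgw too (the
endpoint and profile names below are placeholders):

# as the bucket owner (user1): read the policy back
aws --profile user1 --endpoint-url http://rgw.example.net s3api get-bucket-policy --bucket example1
# as user2: the bucket won't show in a bucket listing, but listing it directly should work
aws --profile user2 --endpoint-url http://rgw.example.net s3 ls s3://example1/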


Sean


On Tue,  8 May 2018, David Turner said:
> Sorry I've been on vacation, but I'm back now.  The command I use to create
> subusers for a rgw user is...
> 
> radosgw-admin user create --gen-access-key --gen-secret --uid=user_a
> --display_name="User A"
> radosgw-admin subuser create --gen-access-key --gen-secret
> --access={read,write,readwrite,full} --key-type=s3 --uid=user_a
> --subuser=subuser_1
> 
> Now all buckets created by user_a (or a subuser with --access=full) can now
> be accessed by user_a and all user_a:subusers.  What you missed was
> changing the default subuser type from swift to s3.  --access=full is
> needed for any user needed to be able to create and delete buckets, the
> others are fairly self explanatory for what they can do inside of existing
> buckets.
> 
> There are 2 approaches to use with subusers depending on your use case.
> The first use case is what I use for buckets.  We create 1 user per bucket
> and create subusers when necessary.  Most of our buckets are used by a
> single service and that's all the service uses... so they get the keys for
> their bucket and that's it.  Subusers are create just for the single bucket
> that the original user is in charge of.
> 
> The second use case is where you want a lot of buckets accessed by a single
> set of keys, but you want multiple people to all be able to access the
> buckets.  In this case I would create a single user and use that user to
> create all of the buckets and then create the subusers for everyone to be
> able to access the various buckets.  Note that with this method you get no
> more granularity to settings other than subuser_2 only has read access to
> every bucket.  You can't pick and choose which buckets a subuser has write
> access to, it's all or none.  That's why I use the first approach and call
> it "juggling" keys because if someone wants access to multiple buckets,
> they have keys for each individual bucket as a subuser.
> 
> On Sat, May 5, 2018 at 6:28 AM Marc Roos  wrote:
> 
> >
> > This 'juggle keys' is a bit cryptic to me. If I create a subuser it
> > becomes a swift user not? So how can that have access to the s3 or be
> > used in a s3 client. I have to put in the client the access and secret
> > key, in the subuser I only have a secret key.
> >
> > Is this multi tentant basically only limiting this buckets namespace to
> > the tenants users and nothing else?
> >
> >
> >
> >
> >
> > -Original Message-
> > From: David Turner [mailto:drakonst...@gmail.com]
> > Sent: zondag 29 april 2018 14:52
> > To: Yehuda Sadeh-Weinraub
> > Cc: ceph-users@lists.ceph.com; Безруков Илья Алексеевич
> > Subject: Re: [ceph-users] How to configure s3 bucket acl so that one
> > user's bucket is visible to another.
> >
> > You can create subuser keys to allow other users to have access to a
> > bucket. You have to juggle keys, but it works pretty well.
> >
> >
> > On Sun, Apr 29, 2018, 4:00 AM Yehuda Sadeh-Weinraub 
> > wrote:
> >
> >
> > You can't. A user can only list the buckets that it owns, it cannot
> > list other users' buckets.
> >
> > Yehuda
> >
> > On Sat, Apr 28, 2018 at 11:10 AM, Безруков Илья Алексеевич
> >  wrote:
> > > Hello,
> > >
> > > How to configure s3 bucket acl so that one user's bucket is
> > visible to
> > > another.
> > >
> > >
> > > I can create a bucket, objects in it and give another user
> > access
> > to it.
> > > But another user does not see this bucket in the list of
> > available buckets.
> > >
> > 

Re: [ceph-users] 12.2.4 Both Ceph MDS nodes crashed. Please help.

2018-05-04 Thread Sean Sullivan
Most of this is over my head but the last line of the logs on both mds
servers show something similar to:

 0> 2018-05-01 15:37:46.871932 7fd10163b700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7fd10163b700 thread_name:mds_rank_progr

When I search for this in the ceph-users and ceph-devel mailing lists, the
only mention I can see is from 12.0.3:

https://marc.info/?l=ceph-devel&m=149726392820648&w=2 -- ceph-devel

I don't see any mention of journal.cc in my logs however so I hope they are
not related. I also have not experienced any major loss in my cluster as of
yet and cephfs-journal-tool shows my journals as healthy.  To trigger this
bug I created a cephfs directory and user called aufstest. Here is the part
of the log with the crash mentioning aufstest.

https://pastebin.com/EL5ALLuE



I created a new bug ticket on ceph.com with all of the current info as I
believe this isn't a problem with my setup specifically and anyone else
trying this will have the same issue.
https://tracker.ceph.com/issues/23972

I hope this is the correct path. If anyone can guide me in the right
direction for troubleshooting this further I would be grateful.

On Tue, May 1, 2018 at 6:19 PM, Sean Sullivan  wrote:

> Forgot to reply to all:
>
>
> Sure thing!
>
> I couldn't install the ceph-mds-dbg packages without upgrading. I just
> finished upgrading the cluster to 12.2.5. The issue still persists in 12.2.5
>
> From here I'm not really sure how to do generate the backtrace so I hope I
> did it right. For others on Ubuntu this is what I did:
>
> * firstly up the debug_mds to 20 and debug_ms to 1:
> ceph tell mds.* injectargs '--debug-mds 20 --debug-ms 1'
>
> * install the debug packages
> ceph-mds-dbg in my case
>
> * I also added these options to /etc/ceph/ceph.conf just in case they
> restart.
>
> * Now allow pids to dump (stolen partly from redhat docs and partly from
> ubuntu)
> echo -e 'DefaultLimitCORE=infinity\nPrivateTmp=true' | tee -a
> /etc/systemd/system.conf
> sysctl fs.suid_dumpable=2
> sysctl kernel.core_pattern=/tmp/core
> systemctl daemon-reload
> systemctl restart ceph-mds@$(hostname -s)
>
> * A crash was created in /var/crash by apport but gdb cant read it. I used
> apport-unpack and then ran GDB on what is inside:
>
> apport-unpack /var/crash/$(ls /var/crash/*mds*) /root/crash_dump/
> cd /root/crash_dump/
> gdb $(cat ExecutablePath) CoreDump -ex 'thr a a bt' | tee
> /root/ceph_mds_$(hostname -s)_backtrace
>
> * This left me with the attached backtraces (which I think are wrong as I
> see a lot of ?? yet gdb says /usr/lib/debug/.build-id/1d/
> 23dc5ef4fec1dacebba2c6445f05c8fe6b8a7c.debug was loaded)
>
>  kh10-8 mds backtrace -- https://pastebin.com/bwqZGcfD
>  kh09-8 mds backtrace -- https://pastebin.com/vvGiXYVY
>
>
> The log files are pretty large (one 4.1G and the other 200MB)
>
> kh10-8 (200MB) mds log -- https://griffin-objstore.op
> ensciencedatacloud.org/logs/ceph-mds.kh10-8.log
> kh09-8 (4.1GB) mds log -- https://griffin-objstore.op
> ensciencedatacloud.org/logs/ceph-mds.kh09-8.log
>
> On Tue, May 1, 2018 at 12:09 AM, Patrick Donnelly 
> wrote:
>
>> Hello Sean,
>>
>> On Mon, Apr 30, 2018 at 2:32 PM, Sean Sullivan 
>> wrote:
>> > I was creating a new user and mount point. On another hardware node I
>> > mounted CephFS as admin to mount as root. I created /aufstest and then
>> > unmounted. From there it seems that both of my mds nodes crashed for
>> some
>> > reason and I can't start them any more.
>> >
>> > https://pastebin.com/1ZgkL9fa -- my mds log
>> >
>> > I have never had this happen in my tests so now I have live data here.
>> If
>> > anyone can lend a hand or point me in the right direction while
>> > troubleshooting that would be a godsend!
>>
>> Thanks for keeping the list apprised of your efforts. Since this is so
>> easily reproduced for you, I would suggest that you next get higher
>> debug logs (debug_mds=20/debug_ms=1) from the MDS. And, since this is
>> a segmentation fault, a backtrace with debug symbols from gdb would
>> also be helpful.
>>
>> --
>> Patrick Donnelly
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mgr dashboard differs from ceph status

2018-05-04 Thread Sean Purdy
I get this too, since I last rebooted a server (one of three).

ceph -s says:

  cluster:
id: a8c34694-a172-4418-a7dd-dd8a642eb545
health: HEALTH_OK

  services:
mon: 3 daemons, quorum box1,box2,box3
mgr: box3(active), standbys: box1, box2
osd: N osds: N up, N in
rgw: 3 daemons active

mgr dashboard says:

Overall status: HEALTH_WARN

MON_DOWN: 1/3 mons down, quorum box1,box3

I wasn't going to worry too much.  I'll check logs and restart an mgr then.
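
For anyone searching later, the mgr bounce John suggests below is just
something like this (unit name assumed from the active mgr's hostname):

systemctl restart ceph-mgr@box3
ceph -s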

Sean

On Fri,  4 May 2018, John Spray said:
> On Fri, May 4, 2018 at 7:21 AM, Tracy Reed  wrote:
> > My ceph status says:
> >
> >   cluster:
> > id: b2b00aae-f00d-41b4-a29b-58859aa41375
> > health: HEALTH_OK
> >
> >   services:
> > mon: 3 daemons, quorum ceph01,ceph03,ceph07
> > mgr: ceph01(active), standbys: ceph-ceph07, ceph03
> > osd: 78 osds: 78 up, 78 in
> >
> >   data:
> > pools:   4 pools, 3240 pgs
> > objects: 4384k objects, 17533 GB
> > usage:   53141 GB used, 27311 GB / 80452 GB avail
> > pgs: 3240 active+clean
> >
> >   io:
> > client:   4108 kB/s rd, 10071 kB/s wr, 27 op/s rd, 331 op/s wr
> >
> > but my mgr dashboard web interface says:
> >
> >
> > Health
> > Overall status: HEALTH_WARN
> >
> > PG_AVAILABILITY: Reduced data availability: 2563 pgs inactive
> >
> >
> > Anyone know why the discrepency? Hopefully the dashboard is very
> > mistaken! Everything seems to be operating normally. If I had 2/3 of my
> > pgs inactive I'm sure all of my rbd backing my VMs would be blocked etc.
> 
> A situation like this probably indicates that something is going wrong
> with the mon->mgr synchronisation of health state (it's all calculated
> in one place and the mon updates the mgr every few seconds).
> 
> 1. Look for errors in your monitor logs
> 2. You'll probably find that everything gets back in sync if you
> restart a mgr daemon
> 
> John
> 
> > I'm running ceph-12.2.4-0.el7.x86_64 on CentOS 7. Almost all filestore
> > except for one OSD which recently had to be replaced which I made
> > bluestore. I plan to slowly migrate everything over to bluestore over
> > the course of the next month.
> >
> > Thanks!
> >
> > --
> > Tracy Reed
> > http://tracyreed.org
> > Digital signature attached for your safety.
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 12.2.4 Both Ceph MDS nodes crashed. Please help.

2018-05-01 Thread Sean Sullivan
Forgot to reply to all:

Sure thing!

I couldn't install the ceph-mds-dbg packages without upgrading. I just
finished upgrading the cluster to 12.2.5. The issue still persists in 12.2.5

From here I'm not really sure how to generate the backtrace, so I hope I
did it right. For others on Ubuntu this is what I did:

* firstly up the debug_mds to 20 and debug_ms to 1:
ceph tell mds.* injectargs '--debug-mds 20 --debug-ms 1'

* install the debug packages
ceph-mds-dbg in my case

* I also added these options to /etc/ceph/ceph.conf just in case they
restart.

* Now allow pids to dump (stolen partly from redhat docs and partly from
ubuntu)
echo -e 'DefaultLimitCORE=infinity\nPrivateTmp=true' | tee -a
/etc/systemd/system.conf
sysctl fs.suid_dumpable=2
sysctl kernel.core_pattern=/tmp/core
systemctl daemon-reload
systemctl restart ceph-mds@$(hostname -s)

* A crash was created in /var/crash by apport but gdb cant read it. I used
apport-unpack and then ran GDB on what is inside:

apport-unpack /var/crash/$(ls /var/crash/*mds*) /root/crash_dump/
cd /root/crash_dump/
gdb $(cat ExecutablePath) CoreDump -ex 'thr a a bt' | tee
/root/ceph_mds_$(hostname -s)_backtrace

* This left me with the attached backtraces (which I think are wrong as I
see a lot of ?? yet gdb says /usr/lib/debug/.build-id/1d/
23dc5ef4fec1dacebba2c6445f05c8fe6b8a7c.debug was loaded)

 kh10-8 mds backtrace -- https://pastebin.com/bwqZGcfD
 kh09-8 mds backtrace -- https://pastebin.com/vvGiXYVY


The log files are pretty large (one 4.1G and the other 200MB)

kh10-8 (200MB) mds log -- https://griffin-objstore.
opensciencedatacloud.org/logs/ceph-mds.kh10-8.log
kh09-8 (4.1GB) mds log -- https://griffin-objstore.
opensciencedatacloud.org/logs/ceph-mds.kh09-8.log

On Tue, May 1, 2018 at 12:09 AM, Patrick Donnelly 
wrote:

> Hello Sean,
>
> On Mon, Apr 30, 2018 at 2:32 PM, Sean Sullivan 
> wrote:
> > I was creating a new user and mount point. On another hardware node I
> > mounted CephFS as admin to mount as root. I created /aufstest and then
> > unmounted. From there it seems that both of my mds nodes crashed for some
> > reason and I can't start them any more.
> >
> > https://pastebin.com/1ZgkL9fa -- my mds log
> >
> > I have never had this happen in my tests so now I have live data here. If
> > anyone can lend a hand or point me in the right direction while
> > troubleshooting that would be a godsend!
>
> Thanks for keeping the list apprised of your efforts. Since this is so
> easily reproduced for you, I would suggest that you next get higher
> debug logs (debug_mds=20/debug_ms=1) from the MDS. And, since this is
> a segmentation fault, a backtrace with debug symbols from gdb would
> also be helpful.
>
> --
> Patrick Donnelly
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 12.2.4 Both Ceph MDS nodes crashed. Please help.

2018-04-30 Thread Sean Sullivan
I forgot that I left my VM mount command running. It hangs my VM, but more
alarming is that it crashes my MDS servers on the ceph cluster. The ceph
cluster is all hardware nodes, and the openstack VM does not have an admin
keyring (although the cephx keyring generated for cephfs does have write
permissions to the ec42 pool).


Cluster layout:

* Luminous CephFS cluster, version 12.2.4, Ubuntu 16.04, 4.10.0-38-generic (all hardware nodes)
* Ceph Monitor A: kh08-8 (ceph mon server)
* Ceph Monitor B: kh09-8 (ceph mon + Ceph MDS A)
* Ceph Monitor C: kh10-8 (ceph mon + Ceph MDS failover)
* Client: Openstack VM, Ubuntu 16.04, 4.13.0-39-generic, CephFS mounted via the kernel client
* ec42: CephFS data pool, 16384 PGs, erasure coded with a 4/2 profile
* cephfs_metadata: CephFS metadata pool, 4096 PGs, replicated (n=3)

As far as I am aware this shouldn't happen. I will try upgrading as soon as
I can but I didn't see anything like this mentioned in the change log and
am worried this will still exist in 12.2.5. Has anyone seen this before?


On Mon, Apr 30, 2018 at 7:24 PM, Sean Sullivan  wrote:

> So I think I can reliably reproduce this crash from a ceph client.
>
> ```
> root@kh08-8:~# ceph -s
>   cluster:
> id: 9f58ee5a-7c5d-4d68-81ee-debe16322544
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum kh08-8,kh09-8,kh10-8
> mgr: kh08-8(active)
> mds: cephfs-1/1/1 up  {0=kh09-8=up:active}, 1 up:standby
> osd: 570 osds: 570 up, 570 in
> ```
>
>
> then from a client try to mount aufs over cephfs:
> ```
> mount -vvv -t aufs -o br=/cephfs=rw:/mnt/aufs=rw -o udba=reval none /aufs
> ```
>
> Now watch as your ceph mds servers fail:
>
> ```
> root@kh08-8:~# ceph -s
>   cluster:
> id: 9f58ee5a-7c5d-4d68-81ee-debe16322544
> health: HEALTH_WARN
> insufficient standby MDS daemons available
>
>   services:
> mon: 3 daemons, quorum kh08-8,kh09-8,kh10-8
> mgr: kh08-8(active)
> mds: cephfs-1/1/1 up  {0=kh10-8=up:active(laggy or crashed)}
> ```
>
>
> I am now stuck in a degraded and I can't seem to get them to start again.
>
> On Mon, Apr 30, 2018 at 5:06 PM, Sean Sullivan 
> wrote:
>
>> I had 2 MDS servers (one active one standby) and both were down. I took a
>> dumb chance and marked the active as down (it said it was up but laggy).
>> Then started the primary again and now both are back up. I have never seen
>> this before I am also not sure of what I just did.
>>
>> On Mon, Apr 30, 2018 at 4:32 PM, Sean Sullivan 
>> wrote:
>>
>>> I was creating a new user and mount point. On another hardware node I
>>> mounted CephFS as admin to mount as root. I created /aufstest and then
>>> unmounted. From there it seems that both of my mds nodes crashed for some
>>> reason and I can't start them any more.
>>>
>>> https://pastebin.c

Re: [ceph-users] 12.2.4 Both Ceph MDS nodes crashed. Please help.

2018-04-30 Thread Sean Sullivan
So I think I can reliably reproduce this crash from a ceph client.

```
root@kh08-8:~# ceph -s
  cluster:
id: 9f58ee5a-7c5d-4d68-81ee-debe16322544
health: HEALTH_OK

  services:
mon: 3 daemons, quorum kh08-8,kh09-8,kh10-8
mgr: kh08-8(active)
mds: cephfs-1/1/1 up  {0=kh09-8=up:active}, 1 up:standby
osd: 570 osds: 570 up, 570 in
```


then from a client try to mount aufs over cephfs:
```
mount -vvv -t aufs -o br=/cephfs=rw:/mnt/aufs=rw -o udba=reval none /aufs
```

Now watch as your ceph mds servers fail:

```
root@kh08-8:~# ceph -s
  cluster:
id: 9f58ee5a-7c5d-4d68-81ee-debe16322544
health: HEALTH_WARN
insufficient standby MDS daemons available

  services:
mon: 3 daemons, quorum kh08-8,kh09-8,kh10-8
mgr: kh08-8(active)
mds: cephfs-1/1/1 up  {0=kh10-8=up:active(laggy or crashed)}
```


I am now stuck in a degraded state and I can't seem to get them to start again.

On Mon, Apr 30, 2018 at 5:06 PM, Sean Sullivan  wrote:

> I had 2 MDS servers (one active one standby) and both were down. I took a
> dumb chance and marked the active as down (it said it was up but laggy).
> Then started the primary again and now both are back up. I have never seen
> this before I am also not sure of what I just did.
>
> On Mon, Apr 30, 2018 at 4:32 PM, Sean Sullivan 
> wrote:
>
>> I was creating a new user and mount point. On another hardware node I
>> mounted CephFS as admin to mount as root. I created /aufstest and then
>> unmounted. From there it seems that both of my mds nodes crashed for some
>> reason and I can't start them any more.
>>
>> https://pastebin.com/1ZgkL9fa -- my mds log
>>
>> I have never had this happen in my tests so now I have live data here. If
>> anyone can lend a hand or point me in the right direction while
>> troubleshooting that would be a godsend!
>>
>> I tried cephfs-journal-tool inspect and it reports that the journal
>> should be fine. I am not sure why it's crashing:
>>
>> /home/lacadmin# cephfs-journal-tool journal inspect
>> Overall journal integrity: OK
>>
>>
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 12.2.4 Both Ceph MDS nodes crashed. Please help.

2018-04-30 Thread Sean Sullivan
I had 2 MDS servers (one active, one standby) and both were down. I took a
dumb chance and marked the active as down (it said it was up but laggy).
Then I started the primary again and now both are back up. I have never seen
this before, and I am also not sure of what I just did.
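
For the record, "marking the active as down" would normally be something like
the following (rank 0 is an assumption, for a single-rank filesystem):

ceph mds fail 0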

On Mon, Apr 30, 2018 at 4:32 PM, Sean Sullivan  wrote:

> I was creating a new user and mount point. On another hardware node I
> mounted CephFS as admin to mount as root. I created /aufstest and then
> unmounted. From there it seems that both of my mds nodes crashed for some
> reason and I can't start them any more.
>
> https://pastebin.com/1ZgkL9fa -- my mds log
>
> I have never had this happen in my tests so now I have live data here. If
> anyone can lend a hand or point me in the right direction while
> troubleshooting that would be a godsend!
>
> I tried cephfs-journal-tool inspect and it reports that the journal should
> be fine. I am not sure why it's crashing:
>
> /home/lacadmin# cephfs-journal-tool journal inspect
> Overall journal integrity: OK
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 12.2.4 Both Ceph MDS nodes crashed. Please help.

2018-04-30 Thread Sean Sullivan
I was creating a new user and mount point. On another hardware node I
mounted CephFS as admin to mount as root. I created /aufstest and then
unmounted. From there it seems that both of my mds nodes crashed for some
reason and I can't start them any more.

https://pastebin.com/1ZgkL9fa -- my mds log

I have never had this happen in my tests so now I have live data here. If
anyone can lend a hand or point me in the right direction while
troubleshooting that would be a godsend!

I tried cephfs-journal-tool inspect and it reports that the journal should
be fine. I am not sure why it's crashing:

/home/lacadmin# cephfs-journal-tool journal inspect
Overall journal integrity: OK
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] The mystery of sync modules

2018-04-27 Thread Sean Purdy
Hi,


Mimic has a new feature, a cloud sync module for radosgw to sync objects to 
some other S3-compatible destination.

This would be a lovely thing to have here, and ties in nicely with object 
versioning and DR.  But I am put off by confusion and complexity with the whole 
multisite/realm/zone group/zone thing, and the docs aren't very forgiving, 
including a recommendation to delete all your data!

Is there a straightforward way to set up the additional zone for a sync module 
with a preexisting bucket?  Whether it's the elasticsearch metadata search or 
the cloud replication, setting up sync modules on your *current* buckets must 
be a FAQ or at least frequently desired option.

Do I need a top-level realm?  I'm not actually using multisite for two 
clusters, I just want to use sync modules.  If I do, how do I transition my 
current default realm and RGW buckets?

Any blog posts to recommend?

It's not a huge cluster, but it does include production data.


Thanks,

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW bucket lifecycle policy vs versioning

2018-04-26 Thread Sean Purdy
Hi,

Both versioned buckets and lifecycle policies are implemented in ceph, and look 
useful.

But are lifecycle policies implemented for versioned buckets?  i.e. can I set a 
policy that will properly expunge all "deleted" objects after a certain time?  
i.e. objects where the delete marker is the latest version.  This is available 
in AWS for example.
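
For reference, the AWS-style rule I have in mind is the usual pairing of
delete-marker expiration with noncurrent-version expiration, e.g. (bucket name
and retention period are placeholders; whether radosgw honours
ExpiredObjectDeleteMarker is exactly the open question):

{
  "Rules": [
    {
      "ID": "expunge-deleted-objects",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"ExpiredObjectDeleteMarker": true},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 30}
    }
  ]
}

applied with something like:

aws s3api put-bucket-lifecycle-configuration --bucket mybucket --lifecycle-configuration file://lifecycle.json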


Thanks,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW GC Processing Stuck

2018-04-24 Thread Sean Redmond
Hi,

sure no problem, I posted it here

http://tracker.ceph.com/issues/23839

On Tue, 24 Apr 2018, 16:04 Matt Benjamin,  wrote:

> Hi Sean,
>
> Could you create an issue in tracker.ceph.com with this info?  That
> would make it easier to iterate on.
>
> thanks and regards,
>
> Matt
>
> On Tue, Apr 24, 2018 at 10:45 AM, Sean Redmond 
> wrote:
> > Hi,
> > We are currently using Jewel 10.2.7 and recently, we have been
> experiencing
> > some issues with objects being deleted using the gc. After a bucket was
> > unsuccessfully deleted using –purge-objects (first error next discussed
> > occurred), all of the rgw’s are occasionally becoming unresponsive and
> > require a restart of the processes before they will accept requests
> again.
> > On investigation of the garbage collection, it has an enormous list
> which we
> > are struggling to count the length of, but seem stuck on a particular
> object
> > which is not updating, shown in the logs below:
> >
> >
> >
> > 2018-04-23 15:16:04.101660 7f1fdcc29a00  0 gc::process: removing
> > .rgw.buckets:default.290071.4_XXX//XX/XX/XXX.ZIP
> >
> > 2018-04-23 15:16:04.104231 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_1
> >
> > 2018-04-23 15:16:04.105541 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_2
> >
> > 2018-04-23 15:16:04.176235 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_3
> >
> > 2018-04-23 15:16:04.178435 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_4
> >
> > 2018-04-23 15:16:04.250883 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_5
> >
> > 2018-04-23 15:16:04.297912 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_6
> >
> > 2018-04-23 15:16:04.298803 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_7
> >
> > 2018-04-23 15:16:04.320202 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_8
> >
> > 2018-04-23 15:16:04.340124 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_9
> >
> > 2018-04-23 15:16:04.383924 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_10
> >
> > 2018-04-23 15:16:04.386865 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_11
> >
> > 2018-04-23 15:16:04.389067 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_12
> >
> > 2018-04-23 15:16:04.413938 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_13
> >
> > 2018-04-23 15:16:04.487977 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_14
> >
> > 2018-04-23 15:16:04.544235 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_1
> >
> > 2018-04-23 15:16:04.546978 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_2
> >
> > 2018-04-23 15:16:04.598644 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_3
> >
> > 2018-04-23 15:16:04.629519 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_4
> >
> > 2018-04-23 15:16:04.700492 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_5
> >
> > 2018-04-23 15:16:04.765798 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_6
> >
> > 2018-04-23 15:16:04.772774 7f1fdcc29a00  0 gc::process: removing
> >
> .rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_7
> >
> > 

[ceph-users] RGW GC Processing Stuck

2018-04-24 Thread Sean Redmond
Hi,
We are currently using Jewel 10.2.7 and recently we have been experiencing
some issues with objects being deleted using the gc. After a bucket was
unsuccessfully deleted using --purge-objects (this is when the first error
discussed below occurred), all of the rgws occasionally become unresponsive
and require a restart of the processes before they will accept requests
again. On investigating the garbage collection, it has an enormous list
whose length we are struggling to count, but it seems stuck on a particular
object which is not updating, as shown in the logs below:



2018-04-23 15:16:04.101660 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.290071.4_XXX//XX/XX/XXX.ZIP

2018-04-23 15:16:04.104231 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_1

2018-04-23 15:16:04.105541 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_2

2018-04-23 15:16:04.176235 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_3

2018-04-23 15:16:04.178435 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_4

2018-04-23 15:16:04.250883 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_5

2018-04-23 15:16:04.297912 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_6

2018-04-23 15:16:04.298803 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_7

2018-04-23 15:16:04.320202 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_8

2018-04-23 15:16:04.340124 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_9

2018-04-23 15:16:04.383924 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_10

2018-04-23 15:16:04.386865 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_11

2018-04-23 15:16:04.389067 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_12

2018-04-23 15:16:04.413938 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_13

2018-04-23 15:16:04.487977 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.bxz6tqhZzqZozTFkxPVspHfIhhVxaj5_14

2018-04-23 15:16:04.544235 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_1

2018-04-23 15:16:04.546978 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_2

2018-04-23 15:16:04.598644 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_3

2018-04-23 15:16:04.629519 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_4

2018-04-23 15:16:04.700492 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_5

2018-04-23 15:16:04.765798 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_6

2018-04-23 15:16:04.772774 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_7

2018-04-23 15:16:04.846379 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_8

2018-04-23 15:16:04.935023 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_9

2018-04-23 15:16:04.937229 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_10

2018-04-23 15:16:04.968289 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_11

2018-04-23 15:16:05.005194 7f1fdcc29a00  0 gc::process: removing
.rgw.buckets:default.175209462.16__shadow_.06ry24pXQW8yH8EJpoqjEtZF6M6tiUv_12



We seem completely unable to get this deleted, and nothing else of
immediate concern is flagging up as a potential cause of all RGWs becoming
unresponsive at the same time. On the bucket containing this object (the
one we originally tried to purge), I have attempted a further purge passing
the "--bypass-gc" parameter to it, but this also resulted in all rgws
becoming unresponsive within 30 minutes, so I terminated the operation
and restarted the rgws again.
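
For anyone hitting something similar, the size of the GC backlog can be
gauged and a manual pass kicked off with the standard subcommands (rough
sketch; add whatever --id/--cluster flags match your gateway instance, and
the grep is only an approximate count of queued objects):

radosgw-admin gc list --include-all | grep -c '"oid"'
radosgw-admin gc process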



The bucket we attempted to remove has no shards and I have attached the
details below. 90% of the conten

Re: [ceph-users] Is there a faster way of copy files to and from a rgw bucket?

2018-04-23 Thread Sean Purdy
On Sat, 21 Apr 2018, Marc Roos said:
> 
> I wondered if there are faster ways to copy files to and from a bucket, 
> like eg not having to use the radosgw? Is nfs-ganesha doing this faster 
> than s3cmd?

I find the go-based S3 clients e.g. rclone, minio mc, are a bit faster than the 
python-based ones, s3cmd, aws.
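
A minimal rclone setup against radosgw looks something like this (the remote
stanza goes in rclone's config file; keys and endpoint are placeholders, and
--transfers is the knob that buys the parallelism):

[ceph]
type = s3
access_key_id = ACCESS
secret_access_key = SECRET
endpoint = http://rgw.example.net

rclone copy --transfers 16 /local/dir ceph:bucketname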


Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] London Ceph day yesterday

2018-04-20 Thread Sean Purdy
Just a quick note to say thanks for organising the London Ceph/OpenStack day.  
I got a lot out of it, and it was nice to see the community out in force.

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [rgw] civetweb behind haproxy doesn't work with absolute URI

2018-03-29 Thread Sean Purdy
We had something similar recently.  We had to disable "rgw dns name" in the end.
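
i.e. roughly this, if anyone needs the mechanics (section and unit names are
examples, not our real ones):

[client.rgw.gateway1]
#rgw dns name = s3.example.com

systemctl restart ceph-radosgw@rgw.gateway1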


Sean

On Thu, 29 Mar 2018, Rudenko Aleksandr said:
> 
> Hi friends.
> 
> 
> I'm sorry, maybe it isn't bug, but i don't know how to solve this problem.
> 
> I know that absolute URIs are supported in civetweb and it works fine for me 
> without haproxy in the middle.
> 
> But if client send absolute URIs through reverse proxy(haproxy) to civetweb, 
> civetweb breaks connection without responce.
> 
> i set:
> 
> debug rgw = 20
> debug civetweb = 10
> 
> 
> but no any messgaes in civetweb logs(access, error) and in rgw logs.
> in tcpdump i only see as rgw closes connection after request with absolute 
> URI. Relative URIs in requests work fine with haproxy.
> 
> Client:
> Docker registry v2.6.2, s3 driver based on aws-sdk-go/1.2.4 (go1.7.6; linux; 
> amd64) uses absolute URI in requests.
> 
> s3 driver options of docker registry:
> 
>   s3:
> region: us-east-1
> bucket: docker
> accesskey: 'access_key'
> secretkey: 'secret_key'
> regionendpoint: http://storage.my-domain.ru
> secure: false
> v4auth: true
> 
> 
> ceph.conf for rgw instance:
> 
> [client]
> rgw dns name = storage.my-domain.ru<http://storage.my-domain.ru>
> rgw enable apis = s3, admin
> rgw dynamic resharding = false
> rgw enable usage log = true
> rgw num rados handles = 8
> rgw thread pool size = 256
> 
> [client.rgw.a]
> host = aj15
> keyring = /var/lib/ceph/radosgw/rgw.a.keyring
> rgw enable static website = true
> rgw frontends = civetweb 
> authentication_domain=storage.my-domain.ru<http://storage.my-domain.ru> 
> num_threads=128 port=0.0.0.0:7480 
> access_log_file=/var/log/ceph/civetweb.rgw.access.log 
> error_log_file=/var/log/ceph/civetweb.rgw.error.log
> debug rgw = 20
> debug civetweb = 10
> 
> 
> very simple haproxy.cfg:
> 
> global
> chroot /var/empty
> # /log is chroot path
> log /haproxy-log local2
> 
> pidfile /var/run/haproxy.pid
> 
> user haproxy
> group haproxy
> daemon
> 
> ssl-default-bind-options no-sslv3
> ssl-default-bind-ciphers 
> ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA
> ssl-dh-param-file /etc/pki/tls/dhparams.pem
> 
> defaults
> mode http
> log global
> 
> frontend s3
> 
> bind *:80
> bind *:443 ssl crt /etc/pki/tls/certs/s3.pem crt 
> /etc/pki/tls/certs/s3-buckets.pem
> 
> use_backend rgw
> 
> backend rgw
> 
> balance roundrobin
> 
> server a aj15:7480 check fall 1
> server a aj16:7480 check fall 1
> 
> 
> http haeder from tcpdump before and after haproxy:
> 
> GET http://storage.my-domain.ru/docker?max-keys=1&prefix= HTTP/1.1
> Host: storage.my-domain.ru<http://storage.my-domain.ru>
> User-Agent: aws-sdk-go/1.2.4 (go1.7.6; linux; amd64)
> Authorization: AWS4-HMAC-SHA256 
> Credential=user:u...@cloud.croc.ru<mailto:u...@cloud.croc.ru>/20180328/us-east-1/s3/aws4_request,
>  SignedHeaders=host;x-amz-content-sha256;x-amz-date, 
> Signature=10043867bbb2833d50f9fe16a6991436a5c328adc5042556ce1ddf1101ee2cb9
> X-Amz-Content-Sha256: 
> e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> X-Amz-Date: 20180328T111255Z
> Accept-Encoding: gzip
> 
> i don't understand how use haproxy and absolute URIs in requests(
> 

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] No more Luminous packages for Debian Jessie ??

2018-03-07 Thread Sean Purdy
On Wed,  7 Mar 2018, Wei Jin said:
> Same issue here.
> Will Ceph community support Debian Jessie in the future?

Seems odd to stop it right in the middle of minor point releases.  Maybe it was 
an oversight?  Jessie's still supported in Debian as oldstable and not even in 
LTS yet.


Sean

 
> On Mon, Mar 5, 2018 at 6:33 PM, Florent B  wrote:
> > Jessie is no more supported ??
> > https://download.ceph.com/debian-luminous/dists/jessie/main/binary-amd64/Packages
> > only contains ceph-deploy package !
> >
> >
> > On 28/02/2018 10:24, Florent B wrote:
> >> Hi,
> >>
> >> Since yesterday, the "ceph-luminous" repository does not contain any
> >> package for Debian Jessie.
> >>
> >> Is it expected ?
> >>
> >> Thank you.
> >>
> >> Florent
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Efficient deletion of large radosgw buckets

2018-02-16 Thread Sean Purdy
Thanks David.


> purging the objects and bypassing the GC is definitely the way to go

Cool.

> What rebalancing do you expect to see during this operation that you're 
> trying to avoid

I think I just have a poor understanding or wasn't thinking very hard :)  I 
suppose the question really was "are there any performance implications in 
deleting large buckets that I should be aware of?".  So, not really.  It will 
just take a while.

The actual cluster is small and balanced with free space.  Buckets are not 
customer-facing.
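
For completeness, the client-side multithreaded delete David mentions can be
approximated from the shell rather than python; an untested sketch (bucket
name as above, tune -n/-P to taste):

s3cmd ls --recursive s3://test | awk '{print $4}' | xargs -n 100 -P 8 s3cmd rm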


Thanks for the advice,

Sean


On Thu, 15 Feb 2018, David Turner said:
> Which is more important to you?  Deleting the bucket fast or having the
> used space become available?  If deleting the bucket fast is the priority,
> then you can swamp the GC by multithreading object deletion from the bucket
> with python or something.  If having everything deleted and cleaned up from
> the cluster is the priority (which is most likely the case), then what you
> have there is the best option.  If you want to do it in the background away
> from what the client can see, then you can change the ownership of the
> bucket so they no longer see it and then take care of the bucket removal in
> the background, but purging the objects and bypassing the GC is definitely
> the way to go. ... It's just really slow.
> 
> I just noticed that your question is about ceph rebalancing.  What
> rebalancing do you expect to see during this operation that you're trying
> to avoid?  I'm unaware of any such rebalancing (unless it might be the new
> automatic OSD rebalancing mechanism in Luminous to keep OSDs even... but
> deleting data shouldn't really trigger that if the cluster is indeed
> balanced).
> 
> On Thu, Feb 15, 2018 at 9:13 AM Sean Purdy  wrote:
> 
> >
> > Hi,
> >
> > I have a few radosgw buckets with millions or tens of millions of
> > objects.  I would like to delete these entire buckets.
> >
> > Is there a way to do this without ceph rebalancing as it goes along?
> >
> > Is there anything better than just doing:
> >
> > radosgw-admin bucket rm --bucket=test --purge-objects --bypass-gc
> >
> >
> > Thanks,
> >
> > Sean Purdy
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Efficient deletion of large radosgw buckets

2018-02-15 Thread Sean Purdy

Hi,

I have a few radosgw buckets with millions or tens of millions of objects.  I 
would like to delete these entire buckets.

Is there a way to do this without ceph rebalancing as it goes along?

Is there anything better than just doing:

radosgw-admin bucket rm --bucket=test --purge-objects --bypass-gc


Thanks,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-01-12 Thread Sean Redmond
Hi David,

To follow up on this, I had a 4th drive fail (out of 12) and have opted to
order the below disks as a replacement. I have an ongoing case with Intel
via the supplier and will report back anything useful, but I am going to
avoid the Intel S4600 2TB SSDs for the moment.

1.92TB Samsung SM863a 2.5" Enterprise SSD, SATA3 6Gb/s, 2-bit MLC V-NAND
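
On the monitoring side, a periodic SMART dump per drive seems worth keeping
as RMA evidence; a rough example (device path is a placeholder, and with the
drives in non-RAID/passthrough mode plain smartctl should normally work):

smartctl -a /dev/sdc | egrep -i 'model|serial|wear|reallocat|error'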

Regards
Sean Redmond

On Wed, Jan 10, 2018 at 11:08 PM, Sean Redmond 
wrote:

> Hi David,
>
> Thanks for your email, they are connected inside Dell R730XD (2.5 inch 24
> disk model) in None RAID mode via a perc RAID card.
>
> The version of ceph is Jewel with kernel 4.13.X and ubuntu 16.04.
>
> Thanks for your feedback on the HGST disks.
>
> Thanks
>
> On Wed, Jan 10, 2018 at 10:55 PM, David Herselman  wrote:
>
>> Hi Sean,
>>
>>
>>
>> No, Intel’s feedback has been… Pathetic… I have yet to receive anything
>> more than a request to ‘sign’ a non-disclosure agreement, to obtain beta
>> firmware. No official answer as to whether or not one can logically unlock
>> the drives, no answer to my question whether or not Intel publish serial
>> numbers anywhere pertaining to recalled batches and no information
>> pertaining to whether or not firmware updates would address any known
>> issues.
>>
>>
>>
>> This with us being an accredited Intel Gold partner…
>>
>>
>>
>>
>>
>> We’ve returned the lot and ended up with 9/12 of the drives failing in
>> the same manner. The replaced drives, which had different serial number
>> ranges, also failed. Very frustrating is that the drives fail in a way that
>> result in unbootable servers, unless one adds ‘rootdelay=240’ to the kernel.
>>
>>
>>
>>
>>
>> I would be interested to know what platform your drives were in and
>> whether or not they were connected to a RAID module/card.
>>
>>
>>
>> PS: After much searching we’ve decided to order the NVMe conversion kit
>> and have ordered HGST UltraStar SN200 2.5 inch SFF drives with a 3 DWPD
>> rating.
>>
>>
>>
>>
>>
>> Regards
>>
>> David Herselman
>>
>>
>>
>> *From:* Sean Redmond [mailto:sean.redmo...@gmail.com]
>> *Sent:* Thursday, 11 January 2018 12:45 AM
>> *To:* David Herselman 
>> *Cc:* Christian Balzer ; ceph-users@lists.ceph.com
>>
>> *Subject:* Re: [ceph-users] Many concurrent drive failures - How do I
>> activate pgs?
>>
>>
>>
>> Hi,
>>
>>
>>
>> I have a case where 3 out to 12 of these Intel S4600 2TB model failed
>> within a matter of days after being burn-in tested then placed into
>> production.
>>
>>
>>
>> I am interested to know, did you every get any further feedback from the
>> vendor on your issue?
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Thu, Dec 21, 2017 at 1:38 PM, David Herselman  wrote:
>>
>> Hi,
>>
>> I assume this can only be a physical manufacturing flaw or a firmware
>> bug? Do Intel publish advisories on recalled equipment? Should others be
>> concerned about using Intel DC S4600 SSD drives? Could this be an
>> electrical issue on the Hot Swap Backplane or BMC firmware issue? Either
>> way, all pure Intel...
>>
>> The hole is only 1.3 GB (4 MB x 339 objects) but perfectly striped
>> through images, file systems are subsequently severely damaged.
>>
>> Is it possible to get Ceph to read in partial data shards? It would
>> provide between 25-75% more yield...
>>
>>
>> Is there anything wrong with how we've proceeded thus far? Would be nice
>> to reference examples of using ceph-objectstore-tool but documentation is
>> virtually non-existent.
>>
>> We used another SSD drive to simulate bringing all the SSDs back online.
>> We carved up the drive to provide equal partitions to essentially simulate
>> the original SSDs:
>>   # Partition a drive to provide 12 x 150GB partitions, eg:
>> sdd   8:48   0   1.8T  0 disk
>> |-sdd18:49   0   140G  0 part
>> |-sdd28:50   0   140G  0 part
>> |-sdd38:51   0   140G  0 part
>> |-sdd48:52   0   140G  0 part
>> |-sdd58:53   0   140G  0 part
>> |-sdd68:54   0   140G  0 part
>> |-sdd78:55   0   140G  0 part
>> |-sdd88:56   0   140G  0 part
>> |-sdd98:57   0   140G  0 part
>> |-sdd10   8:58   0   140G  0 part
>> |-sdd11   8:59   0   140G  0 part
>> +-sdd12   8:60   0   140G  0 part
>>
>>
>>   Pre

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-01-10 Thread Sean Redmond
Hi David,

Thanks for your email. They are connected inside a Dell R730XD (2.5 inch 24
disk model) in non-RAID mode via a PERC RAID card.

The version of ceph is Jewel with kernel 4.13.X and ubuntu 16.04.

Thanks for your feedback on the HGST disks.

Thanks

On Wed, Jan 10, 2018 at 10:55 PM, David Herselman  wrote:

> Hi Sean,
>
>
>
> No, Intel’s feedback has been… Pathetic… I have yet to receive anything
> more than a request to ‘sign’ a non-disclosure agreement, to obtain beta
> firmware. No official answer as to whether or not one can logically unlock
> the drives, no answer to my question whether or not Intel publish serial
> numbers anywhere pertaining to recalled batches and no information
> pertaining to whether or not firmware updates would address any known
> issues.
>
>
>
> This with us being an accredited Intel Gold partner…
>
>
>
>
>
> We’ve returned the lot and ended up with 9/12 of the drives failing in the
> same manner. The replaced drives, which had different serial number ranges,
> also failed. Very frustrating is that the drives fail in a way that result
> in unbootable servers, unless one adds ‘rootdelay=240’ to the kernel.
>
>
>
>
>
> I would be interested to know what platform your drives were in and
> whether or not they were connected to a RAID module/card.
>
>
>
> PS: After much searching we’ve decided to order the NVMe conversion kit
> and have ordered HGST UltraStar SN200 2.5 inch SFF drives with a 3 DWPD
> rating.
>
>
>
>
>
> Regards
>
> David Herselman
>
>
>
> *From:* Sean Redmond [mailto:sean.redmo...@gmail.com]
> *Sent:* Thursday, 11 January 2018 12:45 AM
> *To:* David Herselman 
> *Cc:* Christian Balzer ; ceph-users@lists.ceph.com
>
> *Subject:* Re: [ceph-users] Many concurrent drive failures - How do I
> activate pgs?
>
>
>
> Hi,
>
>
>
> I have a case where 3 out to 12 of these Intel S4600 2TB model failed
> within a matter of days after being burn-in tested then placed into
> production.
>
>
>
> I am interested to know, did you every get any further feedback from the
> vendor on your issue?
>
>
>
> Thanks
>
>
>
> On Thu, Dec 21, 2017 at 1:38 PM, David Herselman  wrote:
>
> Hi,
>
> I assume this can only be a physical manufacturing flaw or a firmware bug?
> Do Intel publish advisories on recalled equipment? Should others be
> concerned about using Intel DC S4600 SSD drives? Could this be an
> electrical issue on the Hot Swap Backplane or BMC firmware issue? Either
> way, all pure Intel...
>
> The hole is only 1.3 GB (4 MB x 339 objects) but perfectly striped through
> images, file systems are subsequently severely damaged.
>
> Is it possible to get Ceph to read in partial data shards? It would
> provide between 25-75% more yield...
>
>
> Is there anything wrong with how we've proceeded thus far? Would be nice
> to reference examples of using ceph-objectstore-tool but documentation is
> virtually non-existent.
>
> We used another SSD drive to simulate bringing all the SSDs back online.
> We carved up the drive to provide equal partitions to essentially simulate
> the original SSDs:
>   # Partition a drive to provide 12 x 150GB partitions, eg:
> sdd   8:48   0   1.8T  0 disk
> |-sdd18:49   0   140G  0 part
> |-sdd28:50   0   140G  0 part
> |-sdd38:51   0   140G  0 part
> |-sdd48:52   0   140G  0 part
> |-sdd58:53   0   140G  0 part
> |-sdd68:54   0   140G  0 part
> |-sdd78:55   0   140G  0 part
> |-sdd88:56   0   140G  0 part
> |-sdd98:57   0   140G  0 part
> |-sdd10   8:58   0   140G  0 part
> |-sdd11   8:59   0   140G  0 part
> +-sdd12   8:60   0   140G  0 part
>
>
>   Pre-requisites:
> ceph osd set noout;
> apt-get install uuid-runtime;
>
>
>   for ID in `seq 24 35`; do
> UUID=`uuidgen`;
> OSD_SECRET=`ceph-authtool --gen-print-key`;
> DEVICE='/dev/sdd'$[$ID-23]; # 24-23 = /dev/sdd1, 35-23 = /dev/sdd12
> echo "{\"cephx_secret\": \"$OSD_SECRET\"}" | ceph osd new $UUID $ID -i
> - -n client.bootstrap-osd -k /var/lib/ceph/bootstrap-osd/ceph.keyring;
> mkdir /var/lib/ceph/osd/ceph-$ID;
> mkfs.xfs $DEVICE;
> mount $DEVICE /var/lib/ceph/osd/ceph-$ID;
> ceph-authtool --create-keyring /var/lib/ceph/osd/ceph-$ID/keyring
> --name osd.$ID --add-key $OSD_SECRET;
> ceph-osd -i $ID --mkfs --osd-uuid $UUID;
> chown -R ceph:ceph /var/lib/ceph/osd/ceph-$ID;
> systemctl enable ceph-osd@$ID;
> systemctl start ceph-osd@$ID;
>   done
>
>
> Once up we imported previous exports of empty he

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-01-10 Thread Sean Redmond
Hi,

I have a case where 3 out of 12 of these Intel S4600 2TB models failed
within a matter of days after being burn-in tested and then placed into
production.

I am interested to know, did you ever get any further feedback from the
vendor on your issue?

Thanks

On Thu, Dec 21, 2017 at 1:38 PM, David Herselman  wrote:

> Hi,
>
> I assume this can only be a physical manufacturing flaw or a firmware bug?
> Do Intel publish advisories on recalled equipment? Should others be
> concerned about using Intel DC S4600 SSD drives? Could this be an
> electrical issue on the Hot Swap Backplane or BMC firmware issue? Either
> way, all pure Intel...
>
> The hole is only 1.3 GB (4 MB x 339 objects) but perfectly striped through
> images, file systems are subsequently severely damaged.
>
> Is it possible to get Ceph to read in partial data shards? It would
> provide between 25-75% more yield...
>
>
> Is there anything wrong with how we've proceeded thus far? Would be nice
> to reference examples of using ceph-objectstore-tool but documentation is
> virtually non-existent.
>
> We used another SSD drive to simulate bringing all the SSDs back online.
> We carved up the drive to provide equal partitions to essentially simulate
> the original SSDs:
>   # Partition a drive to provide 12 x 150GB partitions, eg:
> sdd   8:48   0   1.8T  0 disk
> |-sdd18:49   0   140G  0 part
> |-sdd28:50   0   140G  0 part
> |-sdd38:51   0   140G  0 part
> |-sdd48:52   0   140G  0 part
> |-sdd58:53   0   140G  0 part
> |-sdd68:54   0   140G  0 part
> |-sdd78:55   0   140G  0 part
> |-sdd88:56   0   140G  0 part
> |-sdd98:57   0   140G  0 part
> |-sdd10   8:58   0   140G  0 part
> |-sdd11   8:59   0   140G  0 part
> +-sdd12   8:60   0   140G  0 part
>
>
>   Pre-requisites:
> ceph osd set noout;
> apt-get install uuid-runtime;
>
>
>   for ID in `seq 24 35`; do
> UUID=`uuidgen`;
> OSD_SECRET=`ceph-authtool --gen-print-key`;
> DEVICE='/dev/sdd'$[$ID-23]; # 24-23 = /dev/sdd1, 35-23 = /dev/sdd12
> echo "{\"cephx_secret\": \"$OSD_SECRET\"}" | ceph osd new $UUID $ID -i
> - -n client.bootstrap-osd -k /var/lib/ceph/bootstrap-osd/ceph.keyring;
> mkdir /var/lib/ceph/osd/ceph-$ID;
> mkfs.xfs $DEVICE;
> mount $DEVICE /var/lib/ceph/osd/ceph-$ID;
> ceph-authtool --create-keyring /var/lib/ceph/osd/ceph-$ID/keyring
> --name osd.$ID --add-key $OSD_SECRET;
> ceph-osd -i $ID --mkfs --osd-uuid $UUID;
> chown -R ceph:ceph /var/lib/ceph/osd/ceph-$ID;
> systemctl enable ceph-osd@$ID;
> systemctl start ceph-osd@$ID;
>   done
>
>
> Once up we imported previous exports of empty head files in to 'real' OSDs:
>   kvm5b:
> systemctl stop ceph-osd@8;
> ceph-objectstore-tool --op import --pgid 7.4s0 --data-path
> /var/lib/ceph/osd/ceph-8 --journal-path /var/lib/ceph/osd/ceph-8/journal
> --file /var/lib/vz/template/ssd_recovery/osd8_7.4s0.export;
> chown ceph:ceph -R /var/lib/ceph/osd/ceph-8;
> systemctl start ceph-osd@8;
>   kvm5f:
> systemctl stop ceph-osd@23;
> ceph-objectstore-tool --op import --pgid 7.fs0 --data-path
> /var/lib/ceph/osd/ceph-23 --journal-path /var/lib/ceph/osd/ceph-23/journal
> --file /var/lib/vz/template/ssd_recovery/osd23_7.fs0.export;
> chown ceph:ceph -R /var/lib/ceph/osd/ceph-23;
> systemctl start ceph-osd@23;
>
>
> Bulk import previously exported objects:
> cd /var/lib/vz/template/ssd_recovery;
> for FILE in `ls -1A osd*_*.export | grep -Pv '^osd(8|23)_'`; do
>   OSD=`echo $FILE | perl -pe 's/^osd(\d+).*/\1/'`;
>   PGID=`echo $FILE | perl -pe 's/^osd\d+_(.*?).export/\1/g'`;
>   echo -e "systemctl stop ceph-osd@$OSD\t ceph-objectstore-tool --op
> import --pgid $PGID --data-path /var/lib/ceph/osd/ceph-$OSD --journal-path
> /var/lib/ceph/osd/ceph-$OSD/journal --file /var/lib/vz/template/ssd_
> recovery/osd"$OSD"_$PGID.export";
> done | sort
>
> Sample output (this will wrap):
> systemctl stop ceph-osd@27   ceph-objectstore-tool --op import --pgid
> 7.4s3 --data-path /var/lib/ceph/osd/ceph-27 --journal-path
> /var/lib/ceph/osd/ceph-27/journal --file /var/lib/vz/template/ssd_
> recovery/osd27_7.4s3.export
> systemctl stop ceph-osd@27   ceph-objectstore-tool --op import --pgid
> 7.fs5 --data-path /var/lib/ceph/osd/ceph-27 --journal-path
> /var/lib/ceph/osd/ceph-27/journal --file /var/lib/vz/template/ssd_
> recovery/osd27_7.fs5.export
> systemctl stop ceph-osd@30   ceph-objectstore-tool --op import --pgid
> 7.fs4 --data-path /var/lib/ceph/osd/ceph-30 --journal-path
> /var/lib/ceph/osd/ceph-30/journal --file /var/lib/vz/template/ssd_
> recovery/osd30_7.fs4.export
> systemctl stop ceph-osd@31   ceph-objectstore-tool --op import --pgid
> 7.4s2 --data-path /var/lib/ceph/osd/ceph-31 --journal-path
> /var/lib/ceph/osd/ceph-31/journal --file /var/lib/vz/template/ssd_
> recovery/osd31_7.4s2.export
> systemctl stop ceph-osd@32   ceph-objectstore-tool 

Re: [ceph-users] Ubuntu 17.10, Luminous - which repository

2017-12-08 Thread Sean Redmond
Hi,

Did you see http://docs.ceph.com/docs/master/install/get-packages/ ? It
contains details on how to add the apt repos provided by the ceph project.
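
Roughly, for luminous that boils down to something like the following (untested
on 17.10; the key URL and repo path are as per that docs page, so double-check
there):

  wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
  echo deb https://download.ceph.com/debian-luminous/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
  sudo apt-get update && sudo apt-get install ceph ceph-deploy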

You may also want to consider 16.04 if this is a production install as
17.10 has a pretty short life (
https://www.ubuntu.com/info/release-end-of-life)

Thanks

On Fri, Dec 8, 2017 at 12:08 PM, Markus Goldberg  wrote:

> Hi,
> which repository should i take for Luminous under Ubuntu 17.10?
> I want a total new install with ceph-deploy, no upgrade.
> Is there any good tutorial for a fresh install incl. bluestore?
>
>
> --
> MfG,
>   Markus Goldberg
>
> --
> Markus Goldberg   Universität Hildesheim
>   Rechenzentrum
> Tel +49 5121 88392822 Universitätsplatz 1
> ,
> D-31141 Hildesheim, Germany
> Fax +49 5121 88392823 email goldb...@uni-hildesheim.de
> --
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HEALTH_ERR : PG_DEGRADED_FULL

2017-12-07 Thread Sean Redmond
Can you share `ceph osd tree`, your crushmap and `ceph health detail` output
via pastebin?

Is recovery stuck or is it ongoing?
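
For reference, the commands to pull those out are roughly (assuming the
default cluster name):

  ceph osd tree
  ceph health detail
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt

and if the backfill_toofull PGs hang around after you free up space and add
the new disk, `ceph osd reweight-by-utilization` may help spread data; treat
that as a sketch of one option rather than a definite fix.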

On 7 Dec 2017 07:06, "Karun Josy"  wrote:

> Hello,
>
> I am seeing health error in our production cluster.
>
>  health: HEALTH_ERR
> 1105420/11038158 objects misplaced (10.015%)
> Degraded data redundancy: 2046/11038158 objects degraded
> (0.019%), 102 pgs unclean, 2 pgs degraded
> Degraded data redundancy (low space): 4 pgs backfill_toofull
>
> The cluster space was running out.
> So I was in the process of adding a disk.
> Since I got this error, we deleted some of the data to create more space.
>
>
> This is the current usage, after clearing some space, earlier 3 disks were
> at 85%.
> 
>
> $ ceph osd df
> ID CLASS WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR  PGS
>  0   ssd 1.86469  1.0  1909G  851G 1058G 44.59 0.78 265
> 16   ssd 0.87320  1.0   894G  361G  532G 40.43 0.71 112
>  1   ssd 0.87320  1.0   894G  586G  307G 65.57 1.15 163
>  2   ssd 0.87320  1.0   894G  490G  403G 54.84 0.96 145
> 17   ssd 0.87320  1.0   894G  163G  731G 18.24 0.32  58
>  3   ssd 0.87320  1.0   894G  616G  277G 68.98 1.21 176
>  4   ssd 0.87320  1.0   894G  593G  300G 66.42 1.17 179
>  5   ssd 0.87320  1.0   894G  419G  474G 46.89 0.82 130
>  6   ssd 0.87320  1.0   894G  422G  472G 47.21 0.83 129
>  7   ssd 0.87320  1.0   894G  397G  496G 44.50 0.78 115
>  8   ssd 0.87320  1.0   894G  656G  237G 73.44 1.29 184
>  9   ssd 0.87320  1.0   894G  560G  333G 62.72 1.10 170
> 10   ssd 0.87320  1.0   894G  623G  270G 69.78 1.22 183
> 11   ssd 0.87320  1.0   894G  586G  307G 65.57 1.15 172
> 12   ssd 0.87320  1.0   894G  610G  283G 68.29 1.20 172
> 13   ssd 0.87320  1.0   894G  597G  296G 66.87 1.17 180
> 14   ssd 0.87320  1.0   894G  597G  296G 66.79 1.17 168
> 15   ssd 0.87320  1.0   894G  610G  283G 68.32 1.20 179
> TOTAL 17110G 9746G 7363G 56.97
>
> How to fix this? Please help!
>
> Karun
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous v12.2.2 released

2017-12-05 Thread Sean Redmond
Hi Florent,

I have always done mons, osds, rgw, mds, then clients.

Packages that don't auto-restart services on update are, IMO, a good thing.
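
On systemd hosts that works out to roughly the following per node, in that
order, waiting for HEALTH_OK between nodes (assuming the packaged
ceph-*.target units; skip the ones a node doesn't run):

  systemctl restart ceph-mon.target
  systemctl restart ceph-osd.target
  systemctl restart ceph-radosgw.target
  systemctl restart ceph-mds.target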

Thanks

On Tue, Dec 5, 2017 at 3:26 PM, Florent B  wrote:

> On Debian systems, upgrading packages does not restart services !
>
> On 05/12/2017 16:22, Oscar Segarra wrote:
>
> I have executed:
>
> yum upgrade -y ceph
>
> On each node and everything has worked fine...
>
> 2017-12-05 16:19 GMT+01:00 Florent B :
>
>> Upgrade procedure is OSD or MON first ?
>>
>> There was a change on Luminous upgrade about it.
>>
>>
>> On 01/12/2017 18:34, Abhishek Lekshmanan wrote:
>> > We're glad to announce the second bugfix release of Luminous v12.2.x
>> > stable release series. It contains a range of bug fixes and a few
>> > features across Bluestore, CephFS, RBD & RGW. We recommend all the users
>> > of 12.2.x series update.
>> >
>> > For more detailed information, see the blog[1] and the complete
>> > changelog[2]
>> >
>> > A big thank you to everyone for the continual feedback & bug
>> > reports we've received over this release cycle
>> >
>> > Notable Changes
>> > ---
>> > * Standby ceph-mgr daemons now redirect requests to the active
>> messenger, easing
>> >   configuration for tools & users accessing the web dashboard, restful
>> API, or
>> >   other ceph-mgr module services.
>> > * The prometheus module has several significant updates and
>> improvements.
>> > * The new balancer module enables automatic optimization of CRUSH
>> weights to
>> >   balance data across the cluster.
>> > * The ceph-volume tool has been updated to include support for
>> BlueStore as well
>> >   as FileStore. The only major missing ceph-volume feature is dm-crypt
>> support.
>> > * RGW's dynamic bucket index resharding is disabled in multisite
>> environments,
>> >   as it can cause inconsistencies in replication of bucket indexes to
>> remote
>> >   sites
>> >
>> > Other Notable Changes
>> > -
>> > * build/ops: bump sphinx to 1.6 (issue#21717, pr#18167, Kefu Chai,
>> Alfredo Deza)
>> > * build/ops: macros expanding in spec file comment (issue#22250,
>> pr#19173, Ken Dreyer)
>> > * build/ops: python-numpy-devel build dependency for SUSE (issue#21176,
>> pr#17692, Nathan Cutler)
>> > * build/ops: selinux: Allow getattr on lnk sysfs files (issue#21492,
>> pr#18650, Boris Ranto)
>> > * build/ops: Ubuntu amd64 client can not discover the ubuntu arm64 ceph
>> cluster (issue#19705, pr#18293, Kefu Chai)
>> > * core: buffer: fix ABI breakage by removing list _mempool member
>> (issue#21573, pr#18491, Sage Weil)
>> > * core: Daemons(OSD, Mon…) exit abnormally at injectargs command
>> (issue#21365, pr#17864, Yan Jun)
>> > * core: Disable messenger logging (debug ms = 0/0) for clients unless
>> overridden (issue#21860, pr#18529, Jason Dillaman)
>> > * core: Improve OSD startup time by only scanning for omap corruption
>> once (issue#21328, pr#17889, Luo Kexue, David Zafman)
>> > * core: upmap does not respect osd reweights (issue#21538, pr#18699,
>> Theofilos Mouratidis)
>> > * dashboard: barfs on nulls where it expects numbers (issue#21570,
>> pr#18728, John Spray)
>> > * dashboard: OSD list has servers and osds in arbitrary order
>> (issue#21572, pr#18736, John Spray)
>> > * dashboard: the dashboard uses absolute links for filesystems and
>> clients (issue#20568, pr#18737, Nick Erdmann)
>> > * filestore: set default readahead and compaction threads for rocksdb
>> (issue#21505, pr#18234, Josh Durgin, Mark Nelson)
>> > * librbd: object map batch update might cause OSD suicide timeout
>> (issue#21797, pr#18416, Jason Dillaman)
>> > * librbd: snapshots should be created/removed against data pool
>> (issue#21567, pr#18336, Jason Dillaman)
>> > * mds: make sure snap inode’s last matches its parent dentry’s last
>> (issue#21337, pr#17994, “Yan, Zheng”)
>> > * mds: sanitize mdsmap of removed pools (issue#21945, issue#21568,
>> pr#18628, Patrick Donnelly)
>> > * mgr: bulk backport of ceph-mgr improvements (issue#21594, issue#17460,
>> >   issue#21197, issue#21158, issue#21593, pr#18675, Benjeman Meekhof,
>> >   Sage Weil, Jan Fajerski, John Spray, Kefu Chai, My Do, Spandan Kumar
>> Sahu)
>> > * mgr: ceph-mgr gets process called “exe” after respawn (issue#21404,
>> pr#18738, John Spray)
>> > * mgr: fix crashable DaemonStateIndex::get calls (issue#17737,
>> pr#18412, John Spray)
>> > * mgr: key mismatch for mgr after upgrade from jewel to luminous(dev)
>> (issue#20950, pr#18727, John Spray)
>> > * mgr: mgr status module uses base 10 units (issue#21189, issue#21752,
>> pr#18257, John Spray, Yanhu Cao)
>> > * mgr: mgr[zabbix] float division by zero (issue#21518, pr#18734, John
>> Spray)
>> > * mgr: Prometheus crash when update (issue#21253, pr#17867, John Spray)
>> > * mgr: prometheus module generates invalid output when counter names
>> contain non-alphanum characters (issue#20899, pr#17868, John Spray, Jeremy
>> H Austin)
>> > * mgr: Quieten scary RuntimeError from restful module on startup

Re: [ceph-users] S3 object notifications

2017-11-28 Thread Sean Purdy
On Tue, 28 Nov 2017, Yehuda Sadeh-Weinraub said:
> rgw has a sync modules framework that allows you to write your own
> sync plugins. The system identifies objects changes and triggers

I am not a C++ developer though.

http://ceph.com/rgw/new-luminous-rgw-metadata-search/ says

"Stay tuned in future releases for sync plugins that replicate data to (or even 
from) cloud storage services like S3!"

But then it looks like you wrote that blog post!  I guess I'll stay tuned.


Sean


> callbacks that can then act on those changes. For example, the
> metadata search feature that was added recently is using this to send
> objects metadata into elasticsearch for indexing.
> 
> Yehuda
> 
> On Tue, Nov 28, 2017 at 2:22 PM, Sean Purdy  wrote:
> > Hi,
> >
> >
> > http://docs.ceph.com/docs/master/radosgw/s3/ says that S3 object 
> > notifications are not supported.  I'd like something like object 
> > notifications so that we can backup new objects in realtime, instead of 
> > trawling the whole object list for what's changed.
> >
> > Is there anything similar I can use?  I've found Spreadshirt's haproxy fork 
> > which traps requests and updates redis - 
> > https://github.com/spreadshirt/s3gw-haproxy  Anybody used that?
> >
> >
> > Thanks,
> >
> > Sean Purdy
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] S3 object notifications

2017-11-28 Thread Sean Purdy
Hi,


http://docs.ceph.com/docs/master/radosgw/s3/ says that S3 object notifications 
are not supported.  I'd like something like object notifications so that we can 
backup new objects in realtime, instead of trawling the whole object list for 
what's changed.

Is there anything similar I can use?  I've found Spreadshirt's haproxy fork 
which traps requests and updates redis - 
https://github.com/spreadshirt/s3gw-haproxy  Anybody used that?


Thanks,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Sean Redmond
Hi,

Is it possible to add new empty osds to your cluster? Or do these also
crash out?

Thanks
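
(For the noup route mentioned further down the thread, the flag handling is
just a sketch along these lines:

  ceph osd set noup
  # start/restart the affected OSDs and let them settle
  ceph osd unset noup

It won't fix the assert itself, but it keeps them from being marked up and
flapping while you test.)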

On 18 Nov 2017 14:32, "Ashley Merrick"  wrote:

> Hello,
>
>
>
> So seems noup does not help.
>
>
>
> Still have the same error :
>
>
>
> 2017-11-18 14:26:40.982827 7fb4446cd700 -1 *** Caught signal (Aborted)
> **in thread 7fb4446cd700 thread_name:tp_peering
>
>
>
> ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous
> (stable)
>
> 1: (()+0xa0c554) [0x56547f500554]
>
> 2: (()+0x110c0) [0x7fb45cabe0c0]
>
> 3: (gsignal()+0xcf) [0x7fb45ba85fcf]
>
> 4: (abort()+0x16a) [0x7fb45ba873fa]
>
> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x28e) [0x56547f547f0e]
>
> 6: (PG::start_peering_interval(std::shared_ptr,
> std::vector > const&, int, std::vector std::allocator > const&, int, ObjectStore::Transaction*)+0x1569)
> [0x56547f029ad9]
>
> 7: (PG::RecoveryState::Reset::react(PG::AdvMap const&)+0x479)
> [0x56547f02a099]
>
> 8: (boost::statechart::simple_state PG::RecoveryState::RecoveryMachine, boost::mpl::list mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_
> mode)0>::react_impl(boost::statechart::event_base const&, void
> const*)+0x188) [0x56547f06c6d8]
>
> 9: (boost::statechart::state_machine PG::RecoveryState::Initial, std::allocator, boost::statechart::null_
> exception_translator>::process_event(boost::statechart::event_base
> const&)+0x69) [0x56547f045549]
>
> 10: (PG::handle_advance_map(std::shared_ptr,
> std::shared_ptr, std::vector >&,
> int, std::vector >&, int, PG::RecoveryCtx*)+0x4a7)
> [0x56547f00e837]
>
> 11: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&,
> PG::RecoveryCtx*, std::set,
> std::less >, std::allocator
> > >*)+0x2e7) [0x56547ef56e67]
>
> 12: (OSD::process_peering_events(std::__cxx11::list std::allocator > const&, ThreadPool::TPHandle&)+0x1e4) [0x56547ef57cb4]
>
> 13: (ThreadPool::BatchWorkQueue::_void_process(void*,
> ThreadPool::TPHandle&)+0x2c) [0x56547efc2a0c]
>
> 14: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x56547f54ef28]
>
> 15: (ThreadPool::WorkThread::entry()+0x10) [0x56547f5500c0]
>
> 16: (()+0x7494) [0x7fb45cab4494]
>
> 17: (clone()+0x3f) [0x7fb45bb3baff]
>
> NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
>
>
>
> I guess even with noup the OSD/PG still has the peer with the other PG’s
> which is the stage that causes the failure, most OSD’s seem to stay up for
> about 30 seconds, and every time it’s a different PG listed on the failure.
>
>
>
> ,Ashley
>
>
>
> *From:* David Turner [mailto:drakonst...@gmail.com]
> *Sent:* 18 November 2017 22:19
> *To:* Ashley Merrick 
> *Cc:* Eric Nelson ; ceph-us...@ceph.com
> *Subject:* Re: [ceph-users] OSD Random Failures - Latest Luminous
>
>
>
> Does letting the cluster run with noup for a while until all down disks
> are idle, and then letting them come in help at all?  I don't know your
> specific issue and haven't touched bluestore yet, but that is generally
> sound advice when it won't start.
>
> Also is there any pattern to the osds that are down? Common PGs, common
> hosts, common ssds, etc?
>
>
>
> On Sat, Nov 18, 2017, 7:08 AM Ashley Merrick 
> wrote:
>
> Hello,
>
>
>
> Any further suggestions or work around’s from anyone?
>
>
>
> Cluster is hard down now with around 2% PG’s offline, on the occasion able
> to get an OSD to start for a bit but then will seem to do some peering and
> again crash with “*** Caught signal (Aborted) **in thread 7f3471c55700
> thread_name:tp_peering”
>
>
>
> ,Ashley
>
>
>
> *From:* Ashley Merrick
>
> *Sent:* 16 November 2017 17:27
> *To:* Eric Nelson 
>
> *Cc:* ceph-us...@ceph.com
> *Subject:* Re: [ceph-users] OSD Random Failures - Latest Luminous
>
>
>
> Hello,
>
>
>
> Good to hear it's not just me, however have a cluster basically offline
> due to too many OSD's dropping for this issue.
>
>
>
> Anybody have any suggestions?
>
>
>
> ,Ashley
> --
>
> *From:* Eric Nelson 
> *Sent:* 16 November 2017 00:06:14
> *To:* Ashley Merrick
> *Cc:* ceph-us...@ceph.com
> *Subject:* Re: [ceph-users] OSD Random Failures - Latest Luminous
>
>
>
> I've been seeing these as well on our SSD cachetier that's been ravaged by
> disk failures as of late. Same tp_peering assert as above, even running the
> luminous branch from git.
>
>
>
> Let me know if you have a bug filed I can +1 or have found a workaround.
>
>
>
> E
>
>
>
> On Wed, Nov 15, 2017 at 10:25 AM, Ashley Merrick 
> wrote:
>
> Hello,
>
>
>
> After replacing a single OSD disk due to a failed disk, I am now seeing 2-3
> OSDs randomly stop and fail to start: they boot-loop, get to load_pgs and
> then fail with the following (I tried setting OSD logs to 5/5 but didn't
> get any extra lines around the error, just more information pre-boot).
>
>
>
> Could this be a certain PG causing the

[ceph-users] luminous ubuntu 16.04 HWE (4.10 kernel). ceph-disk can't prepare a disk

2017-10-22 Thread Sean Sullivan
On freshly installed Ubuntu 16.04 servers with the HWE kernel selected
(4.10), I cannot use ceph-deploy or ceph-disk to provision OSDs.


Whenever I try I get the following::

ceph-disk -v prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys
--bluestore --cluster ceph --fs-type xfs -- /dev/sdy
command: Running command: /usr/bin/ceph-osd --cluster=ceph
--show-config-value=fsid
get_dm_uuid: get_dm_uuid /dev/sdy uuid path is /sys/dev/block/65:128/dm/uuid
set_type: Will colocate block with data on /dev/sdy
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd.
--lookup bluestore_block_size
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd.
--lookup bluestore_block_db_size
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd.
--lookup bluestore_block_size
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd.
--lookup bluestore_block_wal_size
get_dm_uuid: get_dm_uuid /dev/sdy uuid path is /sys/dev/block/65:128/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sdy uuid path is /sys/dev/block/65:128/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sdy uuid path is /sys/dev/block/65:128/dm/uuid
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 9, in 
load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5704, in
run
main(sys.argv[1:])
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5655, in
main
args.func(args)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 2091, in
main
Prepare.factory(args).prepare()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 2080, in
prepare
self._prepare()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 2154, in
_prepare
self.lockbox.prepare()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 2842, in
prepare
verify_not_in_use(self.args.lockbox, check_partitions=True)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 950, in
verify_not_in_use
raise Error('Device is mounted', partition)
ceph_disk.main.Error: Error: Device is mounted: /dev/sdy5

Unmounting the disk does not seem to help either. I'm assuming something is
triggering too early, but I'm not sure how to delay it or figure that out.

Has anyone deployed on xenial with the 4.10 kernel? Am I missing something
important?
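
For what it's worth, a cleanup along these lines (sdy here is just the example
device from above) should get the disk back to blank between attempts:

  umount /var/lib/ceph/osd-lockbox/* 2>/dev/null
  umount /dev/sdy5 2>/dev/null
  ceph-disk zap /dev/sdy
  wipefs -a /dev/sdy
  partprobe /dev/sdy

though as above, the lockbox partition just gets remounted again as soon as
ceph-disk recreates it.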
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] zombie partitions, ceph-disk failure.

2017-10-20 Thread Sean Sullivan
I am trying to stand up ceph (luminous) on 3 x 72-disk Supermicro servers
running Ubuntu 16.04 with HWE enabled (for a 4.10 kernel for cephfs). I am
not sure how this is possible, but even though I run the following line to
wipe all disks of their partitions, once I run ceph-disk to partition a
drive, udev or device mapper automatically mounts a lockbox partition and
ceph-disk fails::


wipe line::

for disk in $(lsblk --output MODEL,NAME | grep -iE "HGST|SSDSC2BA40" | awk
'{print $NF}'); do sgdisk -Z /dev/${disk}; dd if=/dev/zero of=/dev/${disk}
bs=1024 count=1; ceph-disk zap /dev/${disk}; sgdisk -o /dev/${disk};
sgdisk -G /dev/${disk}; done

ceph-disk line:
cephcmd="ceph-disk -v prepare --dmcrypt --dmcrypt-key-dir
/etc/ceph/dmcrypt-keys --block.db /dev/${pssd}  --block.wal /dev/${pssd}
--bluestore --cluster ceph --fs-type xfs
-- /dev/${phdd}"


prior to running that on a single disk all of the drives are empty except
the OS drives

root@kg15-1:/home/ceph-admin# lsblk --fs
NAMEFSTYPELABELUUID
 MOUNTPOINT
sdbu
sdy
sdam
sdbb
sdf
sdau
sdab
sdbk
sdo
sdbs
sdw
sdak
sdd
sdas
sdbi
sdm
sdbq
sdu
sdai
sdb
sdaq
sdbg
sdk
sdaz
sds
sdag
sdbe
sdi
sdax
sdq
sdae
sdbn
sdbv
├─sdbv3 linux_raid_member kg15-1:2 664f69b7-2dd7-7012-75e3-a920ba7416b8
│ └─md2 ext4   6696d9f5-3385-47cb-8e8b-058637f8a1b8 /
├─sdbv1 linux_raid_member kg15-1:0 c4c78d8b-5c0b-6d51-d0a4-ecd40432f98c
│ └─md0 ext4   44f76d8d-0333-49a7-ab89-dafe70f6f12d
/boot
└─sdbv2 linux_raid_member kg15-1:1 e3a74474-502c-098c-9415-7b99abcbd2e1
  └─md1 swap   37e071a9-9361-456b-a740-87ddc99a8260
[SWAP]
sdz
sdan
sdbc
sdg
sdav
sdac
sdbl
sdbt
sdx
sdal
sdba
sde
sdat
sdaa
sdbj
sdn
sdbr
sdv
sdaj
sdc
sdar
sdbh
sdl
sdbp
sdt
sdah
sda
├─sda2  linux_raid_member kg15-1:1 e3a74474-502c-098c-9415-7b99abcbd2e1
│ └─md1 swap   37e071a9-9361-456b-a740-87ddc99a8260
[SWAP]
├─sda3  linux_raid_member kg15-1:2 664f69b7-2dd7-7012-75e3-a920ba7416b8
│ └─md2 ext4   6696d9f5-3385-47cb-8e8b-058637f8a1b8 /
└─sda1  linux_raid_member kg15-1:0 c4c78d8b-5c0b-6d51-d0a4-ecd40432f98c
  └─md0 ext4   44f76d8d-0333-49a7-ab89-dafe70f6f12d
/boot
sdap
sdbf
sdj
sday
sdr
sdaf
sdbo
sdao
sdbd
sdh
sdaw
sdp
sdad
sdbm

-

But as soon as I run that cephcmd (which worked prior to upgrading to the
4.10 kernel):

ceph-disk -v prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys
--block.db /dev/sdd  --block.wal /dev/sdd  --bluestore --cluster ceph
--fs-type xfs -- /dev/sdbu
command: Running command: /usr/bin/ceph-osd --cluster=ceph
--show-config-value=fsid
get_dm_uuid: get_dm_uuid /dev/sdbu uuid path is
/sys/dev/block/68:128/dm/uuid
set_type: Will colocate block with data on /dev/sdbu
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd.
--lookup bluestore_block_size
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd.
--lookup bluestore_block_db_size
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd.
--lookup bluestore_block_size
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd.
--lookup bluestore_block_wal_size
get_dm_uuid: get_dm_uuid /dev/sdbu uuid path is
/sys/dev/block/68:128/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sdbu uuid path is
/sys/dev/block/68:128/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sdbu uuid path is
/sys/dev/block/68:128/dm/uuid
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 9, in 
load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5704, in
run
main(sys.argv[1:])
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5655, in
main
args.func(args)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 2091, in
main
Prepare.factory(args).prepare()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 2080, in
prepare
self._prepare()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 2154, in
_prepare
self.lockbox.prepare()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 2842, in
prepare
verify_not_in_use(self.args.lockbox, check_partitions=True)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 950, in
verify_not_in_use
raise Error('Device is mounted', partition)
ceph_disk.main.Error: Error: Device is mounted: /dev/sdbu5


So it says sdbu is mounted. I unmount it and again it errors saying it
can't create the partition it just tried to create.

root@kg15-1:/# mount | grep sdbu
/dev/sdbu5 on
/var/lib/ceph/osd-lockbox/0e3baee9-a5dd-46f0-ae53-0e7dd2b0b257 type ext4
(rw,relatime,stripe=4,

[ceph-users] collectd doesn't push all stats

2017-10-20 Thread Sean Purdy
Hi,


The default collectd ceph plugin seems to parse the output of "ceph daemon 
 perf dump" and generate graphite output.  However, I see more 
fields in the dump than in collectd/graphite

Specifically I see get stats for rgw (ceph_rate-Client_rgw_nodename_get) but 
not put stats (e.g. ceph_rate-Client_rgw_nodename_put)

e.g. (abbreviated) dump says:
{
"client.rgw.store01": {
"req": 164927606,
"failed_req": 43482,
"get": 162727054,
"put": 917996,
}
}
but put stats don't show up.

Anybody know how to tweak the plugin to select the stats you want to see?  e.g. 
monitor paxos stuff doesn't show up either.  Perhaps there's a deliberate 
limitation somewhere, but it seems odd to show "get" and not "put" request 
rates.

(collectd 5.7.1 on debian stretch, ceph luminous 12.2.1)
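
A crude stopgap would be to pull the counters off the admin socket and feed
the ones you care about to graphite directly, e.g. (socket path, rgw name and
graphite host are placeholders; assumes jq and nc are available):

  V=$(ceph daemon /var/run/ceph/ceph-client.rgw.store01.asok perf dump | jq '."client.rgw.store01".put')
  echo "ceph.rgw.store01.put $V $(date +%s)" | nc graphite-host 2003

but that's a workaround, not a fix for the plugin.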


Thanks,

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous can't seem to provision more than 32 OSDs per server

2017-10-19 Thread Sean Sullivan
I have tried using ceph-disk directly and I'm running into all sorts of
trouble, but I'm trying my best. Currently I am using the following
cobbled-together script, which seems to be working:
https://github.com/seapasulli/CephScripts/blob/master/provision_storage.sh
I'm at 11 OSDs right now. I hope this works.
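
(A minimal sketch of the kind of loop involved, not the actual script, with
/dev/sdy standing in for the SSD holding the DBs:

  for disk in $(lsblk --output MODEL,NAME | grep -i HGST | awk '{print $NF}'); do
ceph-disk zap /dev/$disk
ceph-disk -v prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys \
  --bluestore --block.db /dev/sdy --cluster ceph --fs-type xfs -- /dev/$disk
  done

checking `ceph osd tree` between batches.)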
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow requests

2017-10-19 Thread Sean Purdy
Are you using radosgw?  I found this page useful when I had a similar issue:

http://www.osris.org/performance/rgw.html


Sean

On Wed, 18 Oct 2017, Ольга Ухина said:
> Hi!
> 
> I have a problem with ceph luminous 12.2.1. It was upgraded from kraken,
> but I'm not sure if the problem already existed in kraken.
> I have slow requests on different OSDs at random times (for example at
> night); I don't see any problems with disks or CPU at the time of the
> problem, though a network problem at night is possible. During the daytime
> I do not have this problem.
> Almost all requests are nearly 30 seconds, so I receive warnings like this:
> 
> 2017-10-18 01:20:26.147758 mon.st3 mon.0 10.192.1.78:6789/0 22686 : cluster
> [WRN] Health check failed: 1 slow requests are blocked > 32 sec
> (REQUEST_SLOW)
> 2017-10-18 01:20:28.025315 mon.st3 mon.0 10.192.1.78:6789/0 22687 : cluster
> [WRN] overall HEALTH_WARN 1 slow requests are blocked > 32 sec
> 2017-10-18 01:20:32.166758 mon.st3 mon.0 10.192.1.78:6789/0 22688 : cluster
> [WRN] Health check update: 38 slow requests are blocked > 32 sec
> (REQUEST_SLOW)
> 2017-10-18 01:20:38.187326 mon.st3 mon.0 10.192.1.78:6789/0 22689 : cluster
> [WRN] Health check update: 49 slow requests are blocked > 32 sec
> (REQUEST_SLOW)
> 2017-10-18 01:20:38.727421 osd.23 osd.23 10.192.1.158:6840/3659 1758 :
> cluster [WRN] 27 slow requests, 5 included below; oldest blocked for >
> 30.839843 secs
> 2017-10-18 01:20:38.727425 osd.23 osd.23 10.192.1.158:6840/3659 1759 :
> cluster [WRN] slow request 30.814060 seconds old, received at 2017-10-18
> 01:20:07.913300: osd_op(client.12464272.1:56610561 31.410dd55
> 5 31:aaabb082:::rbd_data.7b3e22ae8944a.00012e2c:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write 2977792~4096]
> snapc 0=[] ondisk+write e10926) currently sub_op_commit_rec from 39
> 2017-10-18 01:20:38.727431 osd.23 osd.23 10.192.1.158:6840/3659 1760 :
> cluster [WRN] slow request 30.086589 seconds old, received at 2017-10-18
> 01:20:08.640771: osd_repop(client.12464806.1:17326170 34.242
> e10926/10860 34:426def95:::rbd_data.acdc9238e1f29.1231:head v
> 10926'4976910) currently write_thread_in_journal_buffer
> 2017-10-18 01:20:38.727433 osd.23 osd.23 10.192.1.158:6840/3659 1761 :
> cluster [WRN] slow request 30.812569 seconds old, received at 2017-10-18
> 01:20:07.914791: osd_repop(client.12464272.1:56610570 31.1eb
> e10926/10848 31:d797c167:::rbd_data.7b3e22ae8944a.00013828:head v
> 10926'135331) currently write_thread_in_journal_buffer
> 2017-10-18 01:20:38.727436 osd.23 osd.23 10.192.1.158:6840/3659 1762 :
> cluster [WRN] slow request 30.807328 seconds old, received at 2017-10-18
> 01:20:07.920032: osd_op(client.12464272.1:56610586 31.3f2f2e2
> 6 31:6474f4fc:::rbd_data.7b3e22ae8944a.00013673:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write 12288~4096]
> snapc 0=[] ondisk+write e10926) currently sub_op_commit_rec from 30
> 2017-10-18 01:20:38.727438 osd.23 osd.23 10.192.1.158:6840/3659 1763 :
> cluster [WRN] slow request 30.807253 seconds old, received at 2017-10-18
> 01:20:07.920107: osd_op(client.12464272.1:56610588 31.2d23291
> 8 31:1894c4b4:::rbd_data.7b3e22ae8944a.00013a5b:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write 700416~4096]
> snapc 0=[] ondisk+write e10926) currently sub_op_commit_rec from 28
> 2017-10-18 01:20:38.006142 osd.39 osd.39 10.192.1.159:6808/3323 1501 :
> cluster [WRN] 2 slow requests, 2 included below; oldest blocked for >
> 30.092091 secs
> 2017-10-18 01:20:38.006153 osd.39 osd.39 10.192.1.159:6808/3323 1502 :
> cluster [WRN] slow request 30.092091 seconds old, received at 2017-10-18
> 01:20:07.913962: osd_op(client.12464272.1:56610570 31.e683e9e
> b 31:d797c167:::rbd_data.7b3e22ae8944a.00013828:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write 143360~4096]
> snapc 0=[] ondisk+write e10926) currently op_applied
> 2017-10-18 01:20:38.006159 osd.39 osd.39 10.192.1.159:6808/3323 1503 :
> cluster [WRN] slow request 30.086123 seconds old, received at 2017-10-18
> 01:20:07.919930: osd_op(client.12464272.1:56610587 31.e683e9eb
> 31:d797c167:::rbd_data.7b3e22ae8944a.00013828:head [set-alloc-hint
> object_size 4194304 write_size 4194304,write 3256320~4096] snapc 0=[]
> ondisk+write e10926) currently op_applied
> 2017-10-18 01:20:38.374091 osd.38 osd.38 10.192.1.159:6857/236992 1387 :
> cluster [WRN] 2 slow requests, 2 included below; oldest blocked for >
> 30.449318 secs
> 2017-10-18 01:20:38.374107 osd.38 osd.38 10.192.1.159:6857/236992 1388 :
> cluster [WRN] slow request 30.449318 seconds old, received at 2017-10-18
> 01:20:07.924670: osd_op(client.12464272.1:56610603 31.fe179bed
>

[ceph-users] Luminous can't seem to provision more than 32 OSDs per server

2017-10-18 Thread Sean Sullivan
I am trying to install Ceph luminous (ceph version 12.2.1) on 4 ubuntu
16.04 servers each with 74 disks, 60 of which are HGST 7200rpm sas drives::

HGST HUS724040AL sdbv  sas
root@kg15-2:~# lsblk --output MODEL,KNAME,TRAN | grep HGST | wc -l
60

I am trying to deploy them all with a line like the following::
ceph-deploy osd zap kg15-2:(sas_disk)
ceph-deploy osd create --dmcrypt --bluestore --block-db (ssd_partition)
kg15-2:(sas_disk)

This didn't seem to work at all so I am now trying to troubleshoot by just
provisioning the sas disks::
ceph-deploy osd create --dmcrypt --bluestore kg15-2:(sas_disk)

Across all 4 hosts I can only seem to get 32 OSDs up and after that the
rest fail::
root@kg15-1:~# ps faux | grep '[c]eph-osd' | wc -l
32
root@kg15-2:~# ps faux | grep '[c]eph-osd' | wc -l
32
root@kg15-3:~# ps faux | grep '[c]eph-osd' | wc -l
32

The ceph-deploy tool doesn't seem to log or notice any failure but the host
itself shows the following in the osd log:

2017-10-17 23:05:43.121016 7f8ca75c9e00  0 set uid:gid to 64045:64045
(ceph:ceph)
2017-10-17 23:05:43.121040 7f8ca75c9e00  0 ceph version 12.2.1 (
3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process
(unknown), pid 69926
2017-10-17 23:05:43.123939 7f8ca75c9e00  1
bluestore(/var/lib/ceph/tmp/mnt.8oIc5b)
mkfs path /var/lib/ceph/tmp/mnt.8oIc5b
2017-10-17 23:05:43.124037 7f8ca75c9e00  1 bdev create path
/var/lib/ceph/tmp/mnt.8oIc5b/block type kernel
2017-10-17 23:05:43.124045 7f8ca75c9e00  1 bdev(0x564b7a05e900
/var/lib/ceph/tmp/mnt.8oIc5b/block) open path /var/lib/ceph/tmp/mnt.8oIc5b/
block
2017-10-17 23:05:43.124231 7f8ca75c9e00  1 bdev(0x564b7a05e900
/var/lib/ceph/tmp/mnt.8oIc5b/block) open size 4000668520448 (0x3a37a6d1000,
3725 GB) block_size 4096 (4096 B) rotational
2017-10-17 23:05:43.124296 7f8ca75c9e00  1
bluestore(/var/lib/ceph/tmp/mnt.8oIc5b)
_set_cache_sizes max 0.5 < ratio 0.99
2017-10-17 23:05:43.124313 7f8ca75c9e00  1
bluestore(/var/lib/ceph/tmp/mnt.8oIc5b)
_set_cache_sizes cache_size 1073741824 meta 0.5 kv 0.5 data 0
2017-10-17 23:05:43.124349 7f8ca75c9e00 -1
bluestore(/var/lib/ceph/tmp/mnt.8oIc5b)
_open_db /var/lib/ceph/tmp/mnt.8oIc5b/block.db link target doesn't exist
2017-10-17 23:05:43.124368 7f8ca75c9e00  1 bdev(0x564b7a05e900
/var/lib/ceph/tmp/mnt.8oIc5b/block) close
2017-10-17 23:05:43.402165 7f8ca75c9e00 -1
bluestore(/var/lib/ceph/tmp/mnt.8oIc5b)
mkfs failed, (2) No such file or directory
2017-10-17 23:05:43.402185 7f8ca75c9e00 -1 OSD::mkfs: ObjectStore::mkfs
failed with error (2) No such file or directory
2017-10-17 23:05:43.402258 7f8ca75c9e00 -1  ** ERROR: error creating empty
object store in /var/lib/ceph/tmp/mnt.8oIc5b: (2) No such file or directory


I am not sure where to start troubleshooting, so I have a few questions.

1.) Does anyone have any idea why it stops at 32?
2.) Is there a good guide / outline on how to get the benefit of storing
the keys in the monitor while still having ceph more or less manage the
drives but provisioning the drives without ceph-deploy? I looked at the
manual deployment long and short form and it doesn't mention dmcrypt or
bluestore at all. I know I can use crypttab and cryptsetup to do this and
then give ceph-disk the path to the mapped device but I would prefer to
keep as much management in ceph as possible if I could.  (mailing list
thread :: https://www.mail-archive.com/ceph-users@lists.ceph.com/
msg38575.html )

3.) Ideally I would like to provision the drives with the DB on the SSD.
(Or would it be better to make a cache tier? I read on a reddit thread that
cache tiering in ceph isn't being developed any more; is it still worth it?)

Sorry for the bother and thanks for all the help!!!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw notify on creation/deletion of file in bucket

2017-10-03 Thread Sean Purdy
Hi,


Is there any way that radosgw can ping something when a file is removed or 
added to a bucket?

Or use its sync facility to sync files to AWS/Google buckets?

Just thinking about backups.  What do people use for backups?  Been looking at 
rclone.
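
e.g. something along the lines of (remote names are placeholders for an rclone
config with one remote pointing at radosgw and one at AWS/Google):

  rclone sync ceph-rgw:mybucket aws-s3:mybucket-backup

run from cron; still polling rather than event-driven, which is really what
I'm after.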


Thanks,

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New OSD missing from part of osd crush tree

2017-09-29 Thread Sean Purdy
On Thu, 10 Aug 2017, John Spray said:
> On Thu, Aug 10, 2017 at 4:31 PM, Sean Purdy  wrote:
> > Luminous 12.1.1 rc

And 12.2.1 stable

> > We added a new disk and did:

> > That worked, created osd.18, OSD has data.
> >
> > However, mgr output at http://localhost:7000/servers showed
> > osd.18 under a blank hostname and not e.g. on the node we attached it to.
> 
> Don't worry about this part.  It's a mgr bug that it sometimes fails
> to pick up the hostname for a service
> (http://tracker.ceph.com/issues/20887)
> 
> John

Thanks.  This still happens in 12.2.1 (I notice the bug isn't closed).  mgrs 
have been restarted. It is consistently the same OSD that mgr can't find a 
hostname for.  I'd have thought if it were a race condition, then different 
OSDs would show up detached.

Oh well, no biggie right now.


Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph/systemd startup bug (was Re: Some OSDs are down after Server reboot)

2017-09-28 Thread Sean Purdy
On Thu, 28 Sep 2017, Matthew Vernon said:
> Hi,
> 
> TL;DR - the timeout setting in ceph-disk@.service is (far) too small - it
> needs increasing and/or removing entirely. Should I copy this to ceph-devel?

Just a note.  Looks like the debian stretch luminous packages have a 10,000
second timeout:

from /lib/systemd/system/ceph-disk@.service

Environment=CEPH_DISK_TIMEOUT=10000
ExecStart=/bin/sh -c 'timeout $CEPH_DISK_TIMEOUT flock 
/var/lock/ceph-disk-$(basename %f) /usr/sbin/ceph-disk --verbose --log-stdout 
trigger --sync %f'
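
If that's still too small (e.g. lots of dirty XFS filesystems on a big box, as
described below), a drop-in override is one way to raise it; a sketch, with the
value picked arbitrarily:

  # /etc/systemd/system/ceph-disk@.service.d/override.conf
  [Service]
  Environment=CEPH_DISK_TIMEOUT=30000

then `systemctl daemon-reload`.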
 

Sean

> On 15/09/17 16:48, Matthew Vernon wrote:
> >On 14/09/17 16:26, Götz Reinicke wrote:
> >>After that, 10 OSDs did not came up as the others. The disk did not get
> >>mounted and the OSD processes did nothing … even after a couple of
> >>minutes no more disks/OSDs showed up.
> >
> >I'm still digging, but AFAICT it's a race condition in startup - in our
> >case, we're only seeing it if some of the filesystems aren't clean. This
> >may be related to the thread "Very slow start of osds after reboot" from
> >August, but I don't think any conclusion was reached there.
> 
> This annoyed me enough that I went off to find the problem :-)
> 
> On systemd-enabled machines[0] ceph disks are activated by systemd's
> ceph-disk@.service, which calls:
> 
> /bin/sh -c 'timeout 120 flock /var/lock/ceph-disk-$(basename %f)
> /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f'
> 
> ceph-disk trigger --sync calls ceph-disk activate which (among other things)
> mounts the osd fs (first in a temporary location, then in /var/lib/ceph/osd/
> once it's extracted the osd number from the fs). If the fs is unclean, XFS
> auto-recovers before mounting (which takes time - range 2-25s for our 6TB
> disks) Importantly, there is a single global lock file[1] so only one
> ceph-disk activate can be doing this at once.
> 
> So, each fs is auto-recovering one at at time (rather than in parallel), and
> once the elapsed time gets past 120s, timeout kills the flock, systemd kills
> the cgroup, and no more OSDs start up - we typically find a few fs mounted
> in /var/lib/ceph/tmp/mnt.. systemd keeps trying to start the remaining
> osds (via ceph-osd@.service), but their fs isn't in the correct place, so
> this never works.
> 
> The fix/workaround is to adjust the timeout value (edit the service file
> directly, or for style points write an override in /etc/systemd/system
> remembering you need a blank ExecStart line before your revised one).
> 
> Experimenting, one of our storage nodes with 60 6TB disks took 17m35s to
> start all its osds when started up with all fss dirty. So the current 120s
> is far too small (it's just about OK when all the osd fss are clean).
> 
> I think, though, that having the timeout at all is a bug - if something
> needs to time out under some circumstances, should it be at a lower layer,
> perhaps?
> 
> A couple of final points/asides, if I may:
> 
> ceph-disk trigger uses subprocess.communicate (via the command() function),
> which means it swallows the log output from ceph-disk activate (and only
> outputs it after that process finishes) - as well as producing confusing
> timestamps, this means that when systemd kills the cgroup, all the output
> from the ceph-disk activate command vanishes into the void. That made
> debugging needlessly hard. Better to let called processes like that output
> immediately?
> 
> Does each fs need mounting twice? could the osd be encoded in the partition
> label or similar instead?
> 
> Is a single global activation lock necessary? It slows startup down quite a
> bit; I see no reason why (at least in the one-osd-per-disk case) you
> couldn't be activating all the osds at once...
> 
> Regards,
> 
> Matthew
> 
> [0] I note, for instance, that /etc/init/ceph-disk.conf doesn't have the
> timeout, so presumably upstart systems aren't affected
> [1] /var/lib/ceph/tmp/ceph-disk.activate.lock at least on Ubuntu
> 
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited,
> a charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE. ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor takes long time to join quorum: STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER

2017-09-21 Thread Sean Purdy
On Thu, 21 Sep 2017, Marc Roos said:
>  
> 
> In my case it was syncing, and was syncing slowly (hour or so?). You 
> should see this in the log file. I wanted to report this, because my 
> store.db is only 200MB, and I guess you want your monitors up and 
> running quickly.

Well I wondered about that, but if it can't talk to the monitor quorum leader, 
it's not going to start copying data.

And no new files had been added to this test cluster.

 
> I also noticed that when the 3rd monitor left the quorum, the ceph -s 
> command was slow, timing out. Probably trying to connect to the 3rd 
> monitor, but why, when this monitor is not in quorum?

There's a setting for client timeouts.  I forget where.
 

Sean
 
 
 
 
 
> -Original Message-
> From: Sean Purdy [mailto:s.pu...@cv-library.co.uk] 
> Sent: donderdag 21 september 2017 12:02
> To: Gregory Farnum
> Cc: ceph-users
> Subject: Re: [ceph-users] monitor takes long time to join quorum: 
> STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER
> 
> On Wed, 20 Sep 2017, Gregory Farnum said:
> > That definitely sounds like a time sync issue. Are you *sure* they 
> > matched each other?
> 
> NTP looked OK at the time.  But see below.
> 
> 
> > Is it reproducible on restart?
> 
> Today I did a straight reboot - and it was fine, no issues.
> 
> 
> The issue occurs after the machine is off for a number of hours, or has 
> been worked on in the BIOS for a number of hours and then booted.  And 
> then perhaps waited at the disk decrypt key prompt.
> 
> So I'd suspect hardware clock drift at those times.  (Using Dell R720xd 
> machines)
> 
> 
> Logs show a time change a few seconds after boot.  After boot it's 
> running NTP and within that 45 minute period the NTP state looks the 
> same as the other nodes in the (small) cluster.
> 
> How much drift is allowed between monitors?
> 
> 
> Logs say:
> 
> Sep 20 09:45:21 store03 ntp[2329]: Starting NTP server: ntpd.
> Sep 20 09:45:21 store03 ntpd[2462]: proto: precision = 0.075 usec (-24) 
> ...
> Sep 20 09:46:44 store03 systemd[1]: Time has been changed Sep 20 
> 09:46:44 store03 ntpd[2462]: receive: Unexpected origin timestamp 
> 0xdd6ca972.c694801d does not match aorg 00. from 
> server@172.16.0.16 xmt 0xdd6ca974.0c5c18f
> 
> So system time was changed about 6 seconds after disks were 
> unlocked/boot proceeded.  But there was still 45 minutes of monitor 
> messages after that.  Surely the time should have converged sooner than 
> 45 minutes?
> 
> 
> 
> NTP from today, post-problem.  But ntpq at the time of the problem 
> looked just as OK:
> 
> store01:~$ ntpstat
> synchronised to NTP server (172.16.0.19) at stratum 3
>time correct to within 47 ms
> 
> store02$ ntpstat
> synchronised to NTP server (172.16.0.19) at stratum 3
>time correct to within 63 ms
> 
> store03:~$ sudo ntpstat
> synchronised to NTP server (172.16.0.19) at stratum 3
>time correct to within 63 ms
> 
> store03:~$ ntpq -p
>  remote   refid  st t when poll reach   delay   offset  
> jitter
> 
> ==
> +172.16.0.16 85.91.1.164  3 u  561 1024  3770.2870.554   
> 0.914
> +172.16.0.18 94.125.129.7 3 u  411 1024  3770.388   -0.331   
> 0.139
> *172.16.0.19 158.43.128.332 u  289 1024  3770.282   -0.005   
> 0.103
> 
> 
> Sean
> 
>  
> > On Wed, Sep 20, 2017 at 2:50 AM Sean Purdy  
> wrote:
> > 
> > >
> > > Hi,
> > >
> > >
> > > Luminous 12.2.0
> > >
> > > Three node cluster, 18 OSD, debian stretch.
> > >
> > >
> > > One node is down for maintenance for several hours.  When bringing 
> > > it back up, OSDs rejoin after 5 minutes, but health is still 
> > > warning.  monitor has not joined quorum after 40 minutes and logs 
> > > show BADAUTHORIZER message every time the monitor tries to connect 
> to the leader.
> > >
> > > 2017-09-20 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >>
> > > 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
> > > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 
> > > l=0).handle_connect_reply connect got BADAUTHORIZER
> > >
> > > Then after ~45 minutes monitor *does* join quorum.
> > >
> > > I'm presuming this isn't normal behaviour?  Or if it is, let me know 
> 
> > > and I won't worry.
> > >
> > > All three nodes are using ntp and look OK timewise.
> > >
> > >
> &

Re: [ceph-users] monitor takes long time to join quorum: STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER

2017-09-21 Thread Sean Purdy
On Wed, 20 Sep 2017, Gregory Farnum said:
> That definitely sounds like a time sync issue. Are you *sure* they matched
> each other?

NTP looked OK at the time.  But see below.


> Is it reproducible on restart?

Today I did a straight reboot - and it was fine, no issues.


The issue occurs after the machine is off for a number of hours, or has been 
worked on in the BIOS for a number of hours and then booted.  And then perhaps 
waited at the disk decrypt key prompt.

So I'd suspect hardware clock drift at those times.  (Using Dell R720xd 
machines)


Logs show a time change a few seconds after boot.  After boot it's running NTP 
and within that 45 minute period the NTP state looks the same as the other 
nodes in the (small) cluster.

How much drift is allowed between monitors?


Logs say:

Sep 20 09:45:21 store03 ntp[2329]: Starting NTP server: ntpd.
Sep 20 09:45:21 store03 ntpd[2462]: proto: precision = 0.075 usec (-24)
...
Sep 20 09:46:44 store03 systemd[1]: Time has been changed
Sep 20 09:46:44 store03 ntpd[2462]: receive: Unexpected origin timestamp 
0xdd6ca972.c694801d does not match aorg 00. from 
server@172.16.0.16 xmt 0xdd6ca974.0c5c18f

So system time was changed about 6 seconds after disks were unlocked/boot 
proceeded.  But there was still 45 minutes of monitor messages after that.  
Surely the time should have converged sooner than 45 minutes?



NTP from today, post-problem.  But ntpq at the time of the problem looked just 
as OK:

store01:~$ ntpstat
synchronised to NTP server (172.16.0.19) at stratum 3
   time correct to within 47 ms

store02$ ntpstat
synchronised to NTP server (172.16.0.19) at stratum 3
   time correct to within 63 ms

store03:~$ sudo ntpstat
synchronised to NTP server (172.16.0.19) at stratum 3
   time correct to within 63 ms

store03:~$ ntpq -p
 remote   refid  st t when poll reach   delay   offset  jitter
==
+172.16.0.16 85.91.1.164  3 u  561 1024  3770.2870.554   0.914
+172.16.0.18 94.125.129.7 3 u  411 1024  3770.388   -0.331   0.139
*172.16.0.19 158.43.128.332 u  289 1024  3770.282   -0.005   0.103
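
(On my own drift question above: the relevant setting appears to be
mon_clock_drift_allowed, 0.05s by default if I'm reading the docs right, and
you can check what a mon is actually using with:

  ceph daemon mon.store03 config get mon_clock_drift_allowed

substituting whichever mon id applies.)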


Sean

 
> On Wed, Sep 20, 2017 at 2:50 AM Sean Purdy  wrote:
> 
> >
> > Hi,
> >
> >
> > Luminous 12.2.0
> >
> > Three node cluster, 18 OSD, debian stretch.
> >
> >
> > One node is down for maintenance for several hours.  When bringing it back
> > up, OSDs rejoin after 5 minutes, but health is still warning.  monitor has
> > not joined quorum after 40 minutes and logs show BADAUTHORIZER message
> > every time the monitor tries to connect to the leader.
> >
> > 2017-09-20 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >>
> > 172.16.0.43:6812/2422 conn(0x5600720fb800 :-1
> > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
> > l=0).handle_connect_reply connect got BADAUTHORIZER
> >
> > Then after ~45 minutes monitor *does* join quorum.
> >
> > I'm presuming this isn't normal behaviour?  Or if it is, let me know and I
> > won't worry.
> >
> > All three nodes are using ntp and look OK timewise.
> >
> >
> > ceph-mon log:
> >
> > (.43 is leader, .45 is rebooted node, .44 is other live node in quorum)
> >
> > Boot:
> >
> > 2017-09-20 09:45:21.874152 7f49efeb8f80  0 ceph version 12.2.0
> > (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process
> > (unknown), pid 2243
> >
> > 2017-09-20 09:46:01.824708 7f49e1b27700  0 -- 172.16.0.45:6789/0 >>
> > 172.16.0.44:6789/0 conn(0x56007244d000 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
> > accept connect_seq 3 vs existing csq=0 existing_state=STATE_CONNECTING
> > 2017-09-20 09:46:01.824723 7f49e1b27700  0 -- 172.16.0.45:6789/0 >>
> > 172.16.0.44:6789/0 conn(0x56007244d000 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
> > accept we reset (peer sent cseq 3, 0x5600722c.cseq = 0), sending
> > RESETSESSION
> > 2017-09-20 09:46:01.825247 7f49e1b27700  0 -- 172.16.0.45:6789/0 >>
> > 172.16.0.44:6789/0 conn(0x56007244d000 :6789
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
> > accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING
> > 2017-09-20 09:46:01.828053 7f49e1b27700  0 -- 172.16.0.45:6789/0 >>
> > 172.16.0.44:6789/0 conn(0x5600722c :-1
> > s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21872 cs=1 l=0).process
> > missed message?  skipped from seq 0 to 552717734
> >
> > 2017-09-20 09:46:05.580342 7f49e1b27700  0 -- 172.16.0.45:6789

Re: [ceph-users] Fwd: FileStore vs BlueStore

2017-09-20 Thread Sean Purdy
On Wed, 20 Sep 2017, Burkhard Linke said:
> Hi,
> 
> 
> On 09/20/2017 12:24 PM, Sean Purdy wrote:
> >On Wed, 20 Sep 2017, Burkhard Linke said:
> >>The main reason for having a journal with filestore is having a block device
> >>that supports synchronous writes. Writing to a filesystem in a synchronous
> >>way (e.g. including all metadata writes) results in a huge performance
> >>penalty.
> >>
> >>With bluestore the data is also stored on a block devices, and thus also
> >>allows to perform synchronous writes directly (given the backing storage is
> >>handling sync writes correctly and in a consistent way, e.g. no drive
> >>caches, bbu for raid controllers/hbas). And similar to the filestore journal
> >Our Bluestore disks are hosted on RAID controllers.  Should I set cache 
> >policy as WriteThrough for these disks then?
> 
> It depends on the setup and availability of a BBU. If you have a BBU and
> cache on the controller, using write back should be ok if you monitor the
> BBU state. To be on the safe side, use write through and live with the
> performance impact.

We do have BBU and cache and we do monitor state.  Thanks!

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: FileStore vs BlueStore

2017-09-20 Thread Sean Purdy
On Wed, 20 Sep 2017, Burkhard Linke said:
> The main reason for having a journal with filestore is having a block device
> that supports synchronous writes. Writing to a filesystem in a synchronous
> way (e.g. including all metadata writes) results in a huge performance
> penalty.
> 
> With bluestore the data is also stored on a block devices, and thus also
> allows to perform synchronous writes directly (given the backing storage is
> handling sync writes correctly and in a consistent way, e.g. no drive
> caches, bbu for raid controllers/hbas). And similar to the filestore journal

Our Bluestore disks are hosted on RAID controllers.  Should I set cache policy 
as WriteThrough for these disks then?


Sean Purdy

> the bluestore wal/rocksdb partitions can be used to allow both faster
> devices (ssd/nvme) and faster sync writes (compared to spinners).
> 
> Regards,
> Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] monitor takes long time to join quorum: STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER

2017-09-20 Thread Sean Purdy

Hi,


Luminous 12.2.0

Three node cluster, 18 OSD, debian stretch.


One node is down for maintenance for several hours.  When bringing it back up, 
OSDs rejoin after 5 minutes, but health is still warning.  The monitor has not 
joined quorum after 40 minutes, and logs show a BADAUTHORIZER message every time 
the monitor tries to connect to the leader.

2017-09-20 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >> 
172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply 
connect got BADAUTHORIZER

Then after ~45 minutes monitor *does* join quorum.

I'm presuming this isn't normal behaviour?  Or if it is, let me know and I 
won't worry.

All three nodes are using ntp and look OK timewise.


ceph-mon log:

(.43 is leader, .45 is rebooted node, .44 is other live node in quorum)

Boot:

2017-09-20 09:45:21.874152 7f49efeb8f80  0 ceph version 12.2.0 
(32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), 
pid 2243

2017-09-20 09:46:01.824708 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 
172.16.0.44:6789/0 conn(0x56007244d000 :6789 
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg 
accept connect_seq 3 vs existing csq=0 existing_state=STATE_CONNECTING
2017-09-20 09:46:01.824723 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 
172.16.0.44:6789/0 conn(0x56007244d000 :6789 
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg 
accept we reset (peer sent cseq 3, 0x5600722c.cseq = 0), sending 
RESETSESSION
2017-09-20 09:46:01.825247 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 
172.16.0.44:6789/0 conn(0x56007244d000 :6789 
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg 
accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING
2017-09-20 09:46:01.828053 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 
172.16.0.44:6789/0 conn(0x5600722c :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21872 cs=1 l=0).process 
missed message?  skipped from seq 0 to 552717734

2017-09-20 09:46:05.580342 7f49e1b27700  0 -- 172.16.0.45:6789/0 >> 
172.16.0.43:6789/0 conn(0x5600720fe800 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=49261 cs=1 l=0).process 
missed message?  skipped from seq 0 to 1151972199
2017-09-20 09:46:05.581097 7f49e2b29700  0 -- 172.16.0.45:0/2243 >> 
172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply 
connect got BADAUTHORIZER
2017-09-20 09:46:05.581590 7f49e2b29700  0 -- 172.16.0.45:0/2243 >> 
172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply 
connect got BADAUTHORIZER
...
[message repeats for 45 minutes]
...
2017-09-20 10:23:38.818767 7f49e2b29700  0 -- 172.16.0.45:0/2243 >> 
172.16.0.43:6812/2422 conn(0x5600720fb800 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply 
connect
 got BADAUTHORIZER


At this point, "ceph mon stat" says .45/store03 not in quorum:

e5: 3 mons at 
{store01=172.16.0.43:6789/0,store02=172.16.0.44:6789/0,store03=172.16.0.45:6789/0},
 election epoch 376, leader 0 store01, quorum 0,1 store01,store02


Then suddenly a valid connection is made and sync happens:

2017-09-20 10:23:43.041009 7f49e5b2f700  1 mon.store03@2(synchronizing).mds e1 
Unable to load 'last_metadata'
2017-09-20 10:23:43.041967 7f49e5b2f700  1 mon.store03@2(synchronizing).osd 
e2381 e2381: 18 total, 13 up, 14 in
...
2017-09-20 10:23:43.045961 7f49e5b2f700  1 mon.store03@2(synchronizing).osd 
e2393 e2393: 18 total, 15 up, 15 in
...
2017-09-20 10:23:43.049255 7f49e5b2f700  1 mon.store03@2(synchronizing).osd 
e2406 e2406: 18 total, 18 up, 18 in
...
2017-09-20 10:23:43.054828 7f49e5b2f700  0 log_channel(cluster) log [INF] : 
mon.store03 calling new monitor election
2017-09-20 10:23:43.054901 7f49e5b2f700  1 mon.store03@2(electing).elector(372) 
init, last seen epoch 372


Now "ceph mon stat" says:

e5: 3 mons at 
{store01=172.16.0.43:6789/0,store02=172.16.0.44:6789/0,store03=172.16.0.45:6789/0},
 election epoch 378, leader 0 store01, quorum 0,1,2 store01,store02,store03

and everything's happy.


What should I look for/fix?  It's a fairly vanilla system.


Thanks in advance,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] s3cmd not working with luminous radosgw

2017-09-19 Thread Sean Purdy
On Tue, 19 Sep 2017, Yoann Moulin said:
> Hello,
> 
> Has anyone tested s3cmd or other tools to manage ACLs on luminous
> radosGW?

Don't know about ACL, but s3cmd for other things works for me.  Version 1.6.1


My config file includes (but is not limited to):

access_key = yourkey
secret_key = yoursecret
host_bucket = %(bucket)s.host.yourdomain
host_base = host.yourdomain

$ s3cmd -c s3cfg-ceph ls s3://test/148671665
2017-08-02 21:39 18218   s3://test/1486716654.15214271.docx.gpg.97
2017-08-02 22:10 18218   s3://test/1486716654.15214271.docx.gpg.98
2017-08-02 22:48 18218   s3://test/1486716654.15214271.docx.gpg.99

I have not tried rclone or ACL futzing.
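
If I were to poke at ACLs, I'd probably start with something like this (untested on my side, so treat it as a sketch; same config file and test object as above):

$ s3cmd -c s3cfg-ceph setacl --acl-public s3://test/1486716654.15214271.docx.gpg.97
$ s3cmd -c s3cfg-ceph info s3://test/1486716654.15214271.docx.gpg.97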


Sean Purdy
 
> I have opened an issue on s3cmd too
> 
> https://github.com/s3tools/s3cmd/issues/919
> 
> Thanks for your help
> 
> Yoann
> 
> > I have a fresh luminous cluster in test and I made a copy of a bucket (4TB 
> > 1.5M files) with rclone, I'm able to list/copy files with rclone but
> > s3cmd does not work at all; it is just able to give the bucket list, but I 
> > can't list files or update ACLs.
> > 
> > does anyone already test this ?
> > 
> > root@iccluster012:~# rclone --version
> > rclone v1.37
> > 
> > root@iccluster012:~# s3cmd --version
> > s3cmd version 2.0.0
> > 
> > 
> > ### rclone ls files ###
> > 
> > root@iccluster012:~# rclone ls testadmin:image-net/LICENSE
> >  1589 LICENSE
> > root@iccluster012:~#
> > 
> > nginx (as revers proxy) log :
> > 
> >> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE 
> >> HTTP/1.1" 200 0 "-" "rclone/v1.37"
> >> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "GET 
> >> /image-net?delimiter=%2F&max-keys=1024&prefix= HTTP/1.1" 200 779 "-" 
> >> "rclone/v1.37"
> > 
> > rgw logs :
> > 
> >> 2017-09-15 10:30:02.620266 7ff1f58f7700  1 == starting new request 
> >> req=0x7ff1f58f11f0 =
> >> 2017-09-15 10:30:02.622245 7ff1f58f7700  1 == req done 
> >> req=0x7ff1f58f11f0 op status=0 http_status=200 ==
> >> 2017-09-15 10:30:02.622324 7ff1f58f7700  1 civetweb: 0x56061584b000: 
> >> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE 
> >> HTTP/1.0" 1 0 - rclone/v1.37
> >> 2017-09-15 10:30:02.623361 7ff1f50f6700  1 == starting new request 
> >> req=0x7ff1f50f01f0 =
> >> 2017-09-15 10:30:02.689632 7ff1f50f6700  1 == req done 
> >> req=0x7ff1f50f01f0 op status=0 http_status=200 ==
> >> 2017-09-15 10:30:02.689719 7ff1f50f6700  1 civetweb: 0x56061585: 
> >> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "GET 
> >> /image-net?delimiter=%2F&max-keys=1024&prefix= HTTP/1.0" 1 0 - rclone/v1.37
> > 
> > 
> > 
> > ### s3cmds ls files ###
> > 
> > root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls 
> > s3://image-net/LICENSE
> > root@iccluster012:~#
> > 
> > nginx (as revers proxy) log :
> > 
> >> 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET 
> >> http://test.iccluster.epfl.ch/image-net/?location HTTP/1.1" 200 127 "-" "-"
> >> 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET 
> >> http://image-net.test.iccluster.epfl.ch/?delimiter=%2F&prefix=LICENSE 
> >> HTTP/1.1" 200 318 "-" "-"
> > 
> > rgw logs :
> > 
> >> 2017-09-15 10:30:04.295355 7ff1f48f5700  1 == starting new request 
> >> req=0x7ff1f48ef1f0 =
> >> 2017-09-15 10:30:04.295913 7ff1f48f5700  1 == req done 
> >> req=0x7ff1f48ef1f0 op status=0 http_status=200 ==
> >> 2017-09-15 10:30:04.295977 7ff1f48f5700  1 civetweb: 0x560615855000: 
> >> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET /image-net/?location 
> >> HTTP/1.0" 1 0 - -
> >> 2017-09-15 10:30:04.299303 7ff1f40f4700  1 == starting new request 
> >> req=0x7ff1f40ee1f0 =
> >> 2017-09-15 10:30:04.300993 7ff1f40f4700  1 == req done 
> >> req=0x7ff1f40ee1f0 op status=0 http_status=200 ==
> >> 2017-09-15 10:30:04.301070 7ff1f40f4700  1 civetweb: 0x56061585a000: 
> >> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET 
> >> /?delimiter=%2F&prefix=LICENSE HTTP/1.0" 1 0 - 
> > 
> > 
> > 
> > ### s3cmd : list bucket ###
> > 
> > root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls s3://
> > 2017-08-28 12:27  s3://image-net
> > roo

Re: [ceph-users] Very slow start of osds after reboot

2017-08-31 Thread Sean Purdy
Datapoint: I have the same issue on 12.1.1, three nodes, 6 disks per node.

On Thu, 31 Aug 2017, Piotr Dzionek said:
> For the last 3 weeks I have been running the latest LTS Luminous Ceph release on
> CentOS7. It started with the 4th RC and now I have the stable release.
> The cluster runs fine, however I noticed that if I reboot one of the nodes,
> it takes a really long time for the cluster to be back in OK status.
> OSDs are starting up, but not as soon as the server is up. They come up one
> by one over a period of 5 minutes. I checked the logs and all OSDs have the
> following errors.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD doesn't always start at boot

2017-08-23 Thread Sean Purdy
On Wed, 23 Aug 2017, David Turner said:
> This isn't a solution to fix them not starting at boot time, but a fix to
> not having to reboot the node again.  `ceph-disk activate-all` should go
> through and start up the rest of your osds without another reboot.

Thanks, will try next time.
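
For my own notes, I think that boils down to roughly the following (the udevadm re-trigger is my own guess for the "partially detected" case, not something David suggested):

ceph-disk activate-all
udevadm trigger --action=add --subsystem-match=block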

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD doesn't always start at boot

2017-08-23 Thread Sean Purdy
Hi,

Luminous 12.1.1

I've had a couple of servers where at cold boot time, one or two of the OSDs 
haven't mounted/been detected.  Or been partially detected.  These are luminous 
Bluestore OSDs.  Often a warm boot fixes it, but I'd rather not have to reboot 
the node again.

Sometimes /var/lib/ceph/osd/ceph-NN is empty - i.e. not mounted.  And sometimes 
/var/lib/ceph/osd/ceph-NN is mounted, but the /var/lib/ceph/osd/ceph-NN/block 
symlink is pointing to a /dev/mapper UUID path that doesn't exist.  Those 
partitions have to be mounted before "systemctl start ceph-osd@NN.service" will 
work.

What happens at disk detect and mount time?  Is there a timeout somewhere I can 
extend?

How can I tell udev to have another go at mounting the disks?

If it's in the docs and I've missed it, apologies.


Thanks in advance,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster unavailable for 20 mins when downed server was reintroduced

2017-08-23 Thread Sean Purdy
On Tue, 15 Aug 2017, Sean Purdy said:
> Luminous 12.1.1 rc1
> 
> Hi,
> 
> 
> I have a three node cluster with 6 OSD and 1 mon per node.
> 
> I had to turn off one node for rack reasons.  While the node was down, the 
> cluster was still running and accepting files via radosgw.  However, when I 
> turned the machine back on, radosgw uploads stopped working and things like 
> "ceph status" starting timed out.  It took 20 minutes for "ceph status" to be 
> OK.  

Well, I've figured out why "ceph status" was hanging (and possibly radosgw).  It 
seems that the ceph utility looks at ceph.conf to find a monitor to connect to (or 
at least that's what strace implied), but our ceph.conf only had one of the three 
monitors actually listed in the file.  And that was the node I turned off.  
Updating mon_initial_members and mon_host with the other two monitors fixed it.

TBF, 
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.3/html/administration_guide/managing_cluster_size
 does mention you should add your second and third monitors here.  But I hadn't 
read that, and elsewhere I read that on boot the monitors will discover other 
monitors, so I thought you didn't need to list them all.  e.g. 
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address
 (which also says clients use ceph.conf to find monitors - I missed that part).

Anyway, I'll do a few more tests with a better ceph.conf
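
For reference, the ceph.conf I'm moving to just lists all three mons (hostnames and IPs are ours), roughly:

[global]
    mon_initial_members = store01, store02, store03
    mon_host = 172.16.0.43, 172.16.0.44, 172.16.0.45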


Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster unavailable for 20 mins when downed server was reintroduced

2017-08-21 Thread Sean Purdy
Hi,

On Thu, 17 Aug 2017, Gregory Farnum said:
> On Wed, Aug 16, 2017 at 4:04 AM Sean Purdy  wrote:
> 
> > On Tue, 15 Aug 2017, Gregory Farnum said:
> > > On Tue, Aug 15, 2017 at 4:23 AM Sean Purdy 
> > wrote:
> > > > I have a three node cluster with 6 OSD and 1 mon per node.
> > > >
> > > > I had to turn off one node for rack reasons.  While the node was down, 
> > > > the
> > > > cluster was still running and accepting files via radosgw.  However, 
> > > > when I
> > > > turned the machine back on, radosgw uploads stopped working and things 
> > > > like
> > > > "ceph status" starting timed out.  It took 20 minutes for "ceph status" 
> > > > to
> > > > be OK.

> Did you try running "ceph -s" from more than one location? If you had a
> functioning quorum that should have worked. And any live clients should
> have been able to keep working.

I tried from more than one location, yes.
 

> > Timing went like this:
> >
> > 11:22 node boot
> > 11:22 ceph-mon starts, recovers logs, compaction, first BADAUTHORIZER
> > message
> > 11:22 starting disk activation for 18 partitions (3 per bluestore)
> > 11:23 mgr on other node can't find secret_id
> > 11:43 bluefs mount succeeded on OSDs, ceph-osds go live
> > 11:45 last BADAUTHORIZER message in monitor log
> > 11:45 this host calls and wins a monitor election, mon_down health check
> > clears
> > 11:45 mgr happy
> >
> 
> The timing there on the mounting (how does it take 20 minutes?!?!?) and
> everything working again certainly is suspicious. It's not the direct cause
> of the issue, but there may be something else going on which is causing
> both of them.
> 
> All in all; I'm confused.


I tried again today, having a node down for an hour.  This might be a different 
set of questions.


This time, after the store came up, OSDs caught up quickly.

But the monitor process on the rebooted node took 25 minutes to come back into 
quorum.  Is this normal?


2017-08-21 16:10:45.243323 7f3fb62b2700  0 
mon.store03@2(synchronizing).data_health(0) update_stats avail 94% total 211 
GB, used 914 MB, avail 200 GB
...
2017-08-21 16:38:45.251345 7f3fb62b2700  0 mon.store03@2(peon).data_health(298) 
update_stats avail 94% total 211 GB, used 1229 MB, avail 199 GB

What is the monitor process doing this time?  It didn't seem to be maxing out 
network, CPU or disk.


During this time, e.g. "ceph mon stat" on any node took 6 to 15s to return.  
Which I presume is a function of "mon client hunt interval".  But still seems 
long.
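
To see what's actually configured, I'll probably just grep the mon's admin socket output (option name is from the docs; I believe the default is around 3 seconds, but don't quote me):

ceph --admin-daemon /var/run/ceph/ceph-mon.store03.asok config show | grep mon_client_hunt_interval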

However, radosgw file transactions seemed to work fine during the entire 
process.  So it's probably working as designed.


Mon 21 Aug 16:30:06 BST 2017    
   
e5: 3 mons at 
{store01=172.16.0.43:6789/0,store02=172.16.0.44:6789/0,store03=172.16.0.45:6789/0},
 election epoch 294, leader 0 store01, quorum 0,1 store01,store02

real0m8.456s
user0m0.304s
sys 0m0.024s


Thanks for feedback, I'm still new to this.

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster unavailable for 20 mins when downed server was reintroduced

2017-08-16 Thread Sean Purdy
On Tue, 15 Aug 2017, Gregory Farnum said:
> On Tue, Aug 15, 2017 at 4:23 AM Sean Purdy  wrote:
> > I have a three node cluster with 6 OSD and 1 mon per node.
> >
> > I had to turn off one node for rack reasons.  While the node was down, the
> > cluster was still running and accepting files via radosgw.  However, when I
> > turned the machine back on, radosgw uploads stopped working and things like
> > "ceph status" starting timed out.  It took 20 minutes for "ceph status" to
> > be OK.

> > 2017-08-15 11:28:29.835943 7fdf2d74b700  0 monclient(hunting): authenticate timed out after 300
> > 2017-08-15 11:28:29.835993 7fdf2d74b700  0 librados: client.admin authentication error (110) Connection timed out
> >
> 
> That just means the client couldn't connect to an in-quorum monitor. It
> should have tried them all in sequence though — did you check if you had
> *any* functioning quorum?

There was a functioning quorum - I checked with "ceph --admin-daemon 
/var/run/ceph/ceph-mon.xxx.asok quorum_status".  Well - I interpreted the 
output as functioning.  There was a nominated leader.
 

> > 2017-08-15 11:23:07.180123 7f11c0fcc700  0 -- 172.16.0.43:0/2471 >>
> > 172.16.0.45:6812/1904 conn(0x556eeaf4d000 :-1
> > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
> > l=0).handle_connect_reply connect got BADAUTHORIZER
> >
> 
> This one's odd. We did get one report of seeing something like that, but I
> tend to think it's a clock sync issue.

I saw some messages about clock sync, but ntpq -p looked OK on each server.  
Will investigate further.

 remote   refid  st t when poll reach   delay   offset  jitter
==
+172.16.0.16 129.250.35.250   3 u  847 1024  3770.2891.103   0.376
+172.16.0.18 80.82.244.1203 u   93 1024  3770.397   -0.653   1.040
*172.16.0.19 158.43.128.332 u  279 1024  3770.2440.262   0.158
 

> > ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-9: (2) No
> > such file or directory
> >
> And that would appear to be something happening underneath Ceph, wherein
> your data wasn't actually all the way mounted or something?

It's the machine mounting the disks at boot time - udev or ceph-osd.target 
keeps retrying until eventually the disk/OSD is mounted.  Or eventually it 
gives up.  Do the OSDs need a monitor quorum at startup?  It kept restarting 
OSDs for 20 mins.

Timing went like this:

11:22 node boot
11:22 ceph-mon starts, recovers logs, compaction, first BADAUTHORIZER message
11:22 starting disk activation for 18 partitions (3 per bluestore)
11:23 mgr on other node can't find secret_id
11:43 bluefs mount succeeded on OSDs, ceph-osds go live
11:45 last BADAUTHORIZER message in monitor log
11:45 this host calls and wins a monitor election, mon_down health check clears
11:45 mgr happy
 
 
> Anyway, it should have survived that transition without any noticeable
> impact (unless you are running so close to capacity that merely getting the
> downed node up-to-date overwhelmed your disks/cpu). But without some basic
> information about what the cluster as a whole was doing I couldn't
> speculate.

This is a brand new 3 node cluster.  Dell R720 running Debian 9 with 2x SSD for 
OS and ceph-mon, 6x 2Tb SATA for ceph-osd using bluestore, per node.  Running 
radosgw as object store layer.  Only activity is a single-threaded test job 
uploading millions of small files over S3.  There are about 5.5million test 
objects so far (additionally 3x replication).  This job was fine when the 
machine was down, stalled when machine booted.

Looking at activity graphs at the time, there didn't seem to be a network 
bottleneck or CPU issue or disk throughput bottleneck.  But I'll look a bit 
closer.

ceph-mon is on an ext4 filesystem though.   Perhaps I should move this to xfs?  
Bluestore is xfs+bluestore.

I presume it's a monitor issue somehow.


> -Greg

Thanks for your input.

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cluster unavailable for 20 mins when downed server was reintroduced

2017-08-15 Thread Sean Purdy
Luminous 12.1.1 rc1

Hi,


I have a three node cluster with 6 OSD and 1 mon per node.

I had to turn off one node for rack reasons.  While the node was down, the 
cluster was still running and accepting files via radosgw.  However, when I 
turned the machine back on, radosgw uploads stopped working and things like 
"ceph status" starting timed out.  It took 20 minutes for "ceph status" to be 
OK.  

In the recent past I've rebooted one or other node and the cluster kept 
working, and when the machine came back, the OSDs and monitor rejoined the 
cluster and things went on as usual.

The machine was off for 21 hours or so.

Any idea what might be happening, and how to mitigate the effects of this next 
time a machine has to be down for any length of time?


"ceph status" said:

2017-08-15 11:28:29.835943 7fdf2d74b700  0 monclient(hunting): authenticate timed out after 300
2017-08-15 11:28:29.835993 7fdf2d74b700  0 librados: client.admin authentication error (110) Connection timed out


monitor log said things like this before everything came together:

2017-08-15 11:23:07.180123 7f11c0fcc700  0 -- 172.16.0.43:0/2471 >> 
172.16.0.45:6812/1904 conn(0x556eeaf4d000 :-1 
s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply 
connect got BADAUTHORIZER

but "ceph --admin-daemon /var/run/ceph/ceph-mon.xxx.asok quorum_status" did 
work.  This monitor node was detected but not yet in quorum.


OSDs had 15 minutes of

ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-9: (2) No such 
file or directory

before becoming available.


Advice welcome.

Thanks,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] New OSD missing from part of osd crush tree

2017-08-10 Thread Sean Purdy
Luminous 12.1.1 rc


Our OSD osd.8 failed.  So we removed that.


We added a new disk and did:

$ ceph-deploy osd create  --dmcrypt --bluestore store02:/dev/sdd

That worked, created osd.18, OSD has data.

However, mgr output at http://localhost:7000/servers showed
osd.18 under a blank hostname and not e.g. on the node we attached it to.
But it is working.  "ceph osd tree" looks OK


The problem I see is:
When I do "ceph osd crush tree" I see the items list under the name:default~hdd 
tree:

device_class:hdd
name:store02~hdd
type:host

but my new drive is missing under this name - there are 5 OSDs, not 6.


*However*, if I look further down under the name:default tree

device_class:""
name:store02
type:host

I see all devices I am expecting, including osd.18


Is this something to worry about?  Or is there something that needs fixing?  Health 
is warning for scrubbing reasons.
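
If it does turn out to need fixing, my untested guess is that reassigning the device class on the new OSD would do it -- syntax from memory, so check "ceph osd crush --help" before running:

ceph osd crush rm-device-class osd.18
ceph osd crush set-device-class hdd osd.18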


Output of related commands below.


Thanks for any help,

Sean Purdy


$ sudo ceph osd tree
ID CLASS WEIGHT   TYPE NAMEUP/DOWN REWEIGHT PRI-AFF
-1   32.73651 root default
-3   10.91217 host store01
 0   hdd  1.81870 osd.0 up  1.0 1.0
 5   hdd  1.81870 osd.5 up  1.0 1.0
 6   hdd  1.81870 osd.6 up  1.0 1.0
 9   hdd  1.81870 osd.9 up  1.0 1.0
12   hdd  1.81870 osd.12up  1.0 1.0
15   hdd  1.81870 osd.15up  1.0 1.0
-5   10.91217 host store02
 1   hdd  1.81870 osd.1 up  1.0 1.0
 7   hdd  1.81870 osd.7 up  1.0 1.0
10   hdd  1.81870 osd.10up  1.0 1.0
13   hdd  1.81870 osd.13up  1.0 1.0
16   hdd  1.81870 osd.16up  1.0 1.0
18   hdd  1.81870 osd.18up  1.0 1.0
-7   10.91217 host store03
 2   hdd  1.81870 osd.2 up  1.0 1.0
 3   hdd  1.81870 osd.3 up  1.0 1.0
 4   hdd  1.81870 osd.4 up  1.0 1.0
11   hdd  1.81870 osd.11up  1.0 1.0
14   hdd  1.81870 osd.14up  1.0 1.0
17   hdd  1.81870 osd.17up  1.0 1.0


$ sudo ceph osd crush tree
[
{
"id": -8,
"device_class": "hdd",
"name": "default~hdd",
"type": "root",
"type_id": 10,
"items": [
{
"id": -2,
"device_class": "hdd",
"name": "store01~hdd",
"type": "host",
"type_id": 1,
"items": [
{
"id": 0,
"device_class": "hdd",
"name": "osd.0",
"type": "osd",
"type_id": 0,
"crush_weight": 1.818695,
"depth": 2
},
{
"id": 5,
"device_class": "hdd",
"name": "osd.5",
"type": "osd",
"type_id": 0,
"crush_weight": 1.818695,
"depth": 2
},
{
"id": 6,
"device_class": "hdd",
"name": "osd.6",
"type": "osd",
"type_id": 0,
"crush_weight": 1.818695,
"depth": 2
},
{
"id": 9,
"device_class": "hdd",
"name": "osd.9",
"type": "osd",
"type_id": 0,
"crush_weight": 1.818695,
"depth": 2
},
{
"id": 12,
"device_class": "hdd",
"name": "osd.12",
"type": "osd",
"type_id": 0,
"crush_weight": 1.818695,
"depth": 2
},

[ceph-users] radosgw hung when OS disks went readonly, different node radosgw restart fixed it

2017-07-31 Thread Sean Purdy

Hi,


Just had an incident in a 3-node test cluster running 12.1.1 on debian stretch

Each cluster had its own mon, mgr, radosgw, and osds.  Just object store.

I had s3cmd looping and uploading files via S3.

On one of the machines, the RAID controller barfed and dropped the OS disks.  
Or the disks failed.  TBC.  Anyway, / and /var went readonly.

The monitor on that machine found it couldn't write its logs and died.  But the 
OSDs stayed up - those disks didn't go readonly.


health: HEALTH_WARN
1/3 mons down, quorum store01,store03
osd: 18 osds: 18 up, 18 in
rgw: 3 daemons active


The S3 process started timing out on connections to radosgw.  Even when talking 
to one of the other two radosgw instances.  (I'm RRing the DNS records at the 
moment).

I stopped the OSDs on that box.  No change.  I stopped radosgw on that box.  
Still no change.  The S3 upload process was still hanging/timing out.  A manual 
telnet to port 80 on the good nodes still hung.

"radosgw-admin bucket list" showed buckets &c

Then I restarted radosgw on one of the other two nodes.  After about a minute, 
the looping S3 upload process started working again.


So my questions:  Why did I have to manually restart radosgw on one of the 
other nodes?  Why didn't it either keep working, or e.g. start working when 
radosgw was stopped on the bad node?

Also where are the radosgw server/access logs?
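
My best guess on the logs, untested: rgw writes to the normal client log under /var/log/ceph/, and the civetweb frontend can keep its own access/error logs if you ask for them, something like this in ceph.conf (section name and paths are my guesses):

[client.rgw.store01]
    log file = /var/log/ceph/ceph-client.rgw.store01.log
    rgw frontends = civetweb port=80 access_log_file=/var/log/ceph/civetweb.access.log error_log_file=/var/log/ceph/civetweb.error.log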


I know it's probably an unusual edge case or something, but we're aiming for HA 
and redundancy.


Thanks!

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] dropping filestore+btrfs testing for luminous

2017-06-30 Thread Sean Purdy
On Fri, 30 Jun 2017, Lenz Grimmer said:
> > 1/ Stop testing filestore+btrfs for luminous onward.  We've recommended 
> > against btrfs for a long time and are moving toward bluestore anyway.
> 
> Searching the documentation for "btrfs" does not really give a user any
> clue that the use of Btrfs is discouraged.
> 
> Where exactly has this been recommended?

As a new user, I certainly picked up on btrfs being discouraged, or not as 
stable as XFS.

e.g.
http://docs.ceph.com/docs/master/rados/configuration/filesystem-recommendations/?highlight=btrfs

"We currently recommend XFS for production deployments.

We used to recommend btrfs for testing, development, and any non-critical 
deployments ..."


http://docs.ceph.com/docs/master/start/hardware-recommendations/?highlight=btrfs

"btrfs is not quite stable enough for production"
 

> If you want to get rid of filestore on Btrfs, start a proper deprecation
> process and inform users that support for it it's going to be removed in
> the near future. The documentation must be updated accordingly and it
> must be clearly emphasized in the release notes.

But this sounds sane.


Sean Purdy
CV-Library Ltd
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Object store backups

2017-05-23 Thread Sean Purdy
Hi,

Another newbie question.  Do people using radosgw mirror their buckets
to AWS S3 or compatible services as a backup?  We're setting up a
small cluster and are thinking of ways to mitigate total disaster.
What do people recommend?
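
For what it's worth, the naive plan I had in mind was a scheduled one-way sync from radosgw to an external S3 bucket, along these lines (both remote names here are made up):

rclone sync cephrgw:mybucket awss3:mybucket-backup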


Thanks,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-mon and existing zookeeper servers

2017-05-23 Thread Sean Purdy
Hi,


This is my first ceph installation.  It seems to tick our boxes.  Will be
using it as an object store with radosgw.

I notice that ceph-mon uses zookeeper behind the scenes.  Is there a way to
point ceph-mon at an existing zookeeper cluster, using a zookeeper chroot?

Alternatively, might ceph-mon coexist peacefully with a different zookeeper
already on the same machine?


Thanks,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade osd ceph version

2017-03-05 Thread Sean Redmond
Hi,

You should upgrade them all to the latest point release if you don't want
to upgrade to the latest major release.

Start with the mons, then the osds.
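
A rough sketch of what that looks like on Ubuntu 14.04 with upstart (the exact package version string is a guess on my part -- check what apt-cache madison shows on your nodes):

apt-cache madison ceph
apt-get update && apt-get install ceph=0.87.2-1trusty ceph-common=0.87.2-1trusty
restart ceph-mon-all          # on the mon hosts first
restart ceph-osd id=NN        # then each osd in turn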

Thanks

On 3 Mar 2017 18:05, "Curt Beason"  wrote:

> Hello,
>
> So this is going to be a noob question probably.  I read the
> documentation, but it didn't really cover upgrading to a specific version.
>
> We have a cluster with mixed versions.  While I don't want to upgrade to the
> latest version of ceph, I would like to upgrade the osd's so they are all
> on the same version.  Most of them are on 0.87.1 or 0.87.2.  There are 2
> servers with osd's on 0.80.10.  What is the best way to go through and
> upgrade them all to 0.87.2?
>
> They are all running Ubuntu 14 with kernel 3.13 or newer.
>
> Cheers,
> Curt
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-monstore-tool rebuild assert error

2017-02-07 Thread Sean Sullivan
I have a hammer cluster that died a bit ago (hammer 94.9) consisting of 3
monitors and 630 osds spread across 21 storage hosts. The cluster's monitors
all died due to leveldb corruption and the cluster was shut down. I was
finally given word that I could try to revive the cluster this week!

https://github.com/ceph/ceph/blob/hammer/doc/rados/troubleshooting/troubleshooting-mon.rst#recovery-using-osds

I see that the latest hammer code in github has the ceph-monstore-tool
rebuild backport and that is what I am running on the cluster now (ceph
version 0.94.9-4530-g83af8cd (83af8cdaaa6d94404e6146b68e532a784e3cc99c). I
was able to scrape all 630 of the osds and am left with a 1.1G store.db
directory. Using python I was successfully able to list all of the keys and
values which was very promising. That said I can not run the final command
in the recovery-using-osds article (ceph-monstore-tool rebuild)
successfully.
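
For reference, the final step I'm invoking is essentially this, per the doc above (the store path is just where my scraped store.db lives, so treat it as illustrative):

ceph-monstore-tool /root/mon-store rebuild -- --keyring /home/lacadmin/admin.keyring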

Whenever I run the tool (with the newly created admin keyring or with my
existing one) it errors with the following:


    0> 2017-02-17 15:00:47.516901 7f8b4d7408c0 -1 ./mon/MonitorDBStore.h: In function 'KeyValueDB::Iterator MonitorDBStore::get_iterator(const string&)' thread 7f8b4d7408c0 time 2017-02-07 15:00:47.516319


The complete trace is here
http://pastebin.com/NQE8uYiG

Can anyone lend a hand and tell me what may be wrong? I am able to iterate
over the leveldb database in python so the structure should be somewhat
okay? Am I SOL at this point? The cluster isn't production any longer and
while I don't have months of time I would really like to recover this
cluster just to see if it is at all possible.
-- 
- Sean:  I wrote this. -
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: lost power. monitors died. Cephx errors now

2017-02-06 Thread Sean Sullivan
cd6d88c0  5 asok(0x355a000)
register_command log flush hook 0x350a0d0
-3> 2017-02-06 17:35:54.362215 7f10cd6d88c0  5 asok(0x355a000)
register_command log dump hook 0x350a0d0
-2> 2017-02-06 17:35:54.362220 7f10cd6d88c0  5 asok(0x355a000)
register_command log reopen hook 0x350a0d0
-1> 2017-02-06 17:35:54.379684 7f10cd6d88c0  2 auth: KeyRing::load:
loaded key file /home/lacadmin/admin.keyring
 0> 2017-02-06 17:35:59.885651 7f10cd6d88c0 -1 *** Caught signal
(Segmentation fault) **
 in thread 7f10cd6d88c0

 ceph version 0.94.9-4530-g83af8cd
(83af8cdaaa6d94404e6146b68e532a784e3cc99c)
 1: ceph-monstore-tool() [0x5e960a]
 2: (()+0x10330) [0x7f10cc5c8330]
 3: (strlen()+0x2a) [0x7f10cac629da]
 4: (std::basic_string, std::allocator
>::basic_string(char const*, std::allocator const&)+0x25)
[0x7f10cb576d75]
 5: (rebuild_monstore(char const*, std::vector >&, MonitorDBStore&)+0x878) [0x544958]
 6: (main()+0x3e05) [0x52c035]
 7: (__libc_start_main()+0xf5) [0x7f10cabfbf45]
 8: ceph-monstore-tool() [0x540347]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   1/ 1 ms
  10/10 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent   500
  max_new 1000
  log_file
--- end dump of recent events ---
Segmentation fault (core dumped)

--

I have tried copying my monitor and admin keyring into the admin.keyring
used to try to rebuild and it still fails. I am not sure whether this is
due to my packages or if something else is wrong. Is there a way to test or
see what may be happening?
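
For completeness, the keyring I'm feeding it was built more or less per the recovery doc, i.e. something like this (caps abbreviated, so if this is where it goes wrong I'd love to know):

ceph-authtool /home/lacadmin/admin.keyring --create-keyring --gen-key -n mon. --cap mon 'allow *'
ceph-authtool /home/lacadmin/admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'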


On Sat, Aug 13, 2016 at 10:36 PM, Sean Sullivan 
wrote:

> So with a patched leveldb to skip errors I now have a store.db that I can
> extract the pg,mon,and osd map from. That said when I try to start kh10-8
> it bombs out::
>
> ---
> ---
> root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8# ceph-mon -i $(hostname) -d
> 2016-08-13 22:30:54.596039 7fa8b9e088c0  0 ceph version 0.94.7 (
> d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 708653
> starting mon.kh10-8 rank 2 at 10.64.64.125:6789/0 mon_data
> /var/lib/ceph/mon/ceph-kh10-8 fsid e452874b-cb29-4468-ac7f-f8901dfccebf
> 2016-08-13 22:30:54.608150 7fa8b9e088c0  0 starting mon.kh10-8 rank 2 at
> 10.64.64.125:6789/0 mon_data /var/lib/ceph/mon/ceph-kh10-8 fsid
> e452874b-cb29-4468-ac7f-f8901dfccebf
> 2016-08-13 22:30:54.608395 7fa8b9e088c0  1 mon.kh10-8@-1(probing) e1
> preinit fsid e452874b-cb29-4468-ac7f-f8901dfccebf
> 2016-08-13 22:30:54.608617 7fa8b9e088c0  1 
> mon.kh10-8@-1(probing).paxosservice(pgmap
> 0..35606392) refresh upgraded, format 0 -> 1
> 2016-08-13 22:30:54.608629 7fa8b9e088c0  1 mon.kh10-8@-1(probing).pg v0
> on_upgrade discarding in-core PGMap
> terminate called after throwing an instance of
> 'ceph::buffer::end_of_buffer'
>   what():  buffer::end_of_buffer
> *** Caught signal (Aborted) **
>  in thread 7fa8b9e088c0
>  ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>  1: ceph-mon() [0x9b25ea]
>  2: (()+0x10330) [0x7fa8b8f0b330]
>  3: (gsignal()+0x37) [0x7fa8b73a8c37]
>  4: (abort()+0x148) [0x7fa8b73ac028]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fa8b7cb3535]
>  6: (()+0x5e6d6) [0x7fa8b7cb16d6]
>  7: (()+0x5e703) [0x7fa8b7cb1703]
>  8: (()+0x5e922) [0x7fa8b7cb1922]
>  9: ceph-mon() [0x853c39]
>  10: (object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x167)
> [0x894227]
>  11: (pg_stat_t::decode(ceph::buffer::list::iterator&)+0x5ff) [0x894baf]
>  12: (PGMap::update_pg(pg_t, ceph::buffer::list&)+0xa3) [0x91a8d3]
>  13: (PGMonitor::read_pgmap_full()+0x1d8) [0x68b9b8]
>  14: (PGMonitor::update_from_paxos(bool*)+0xbf7) [0x6977b7]
>  15: (PaxosService::refresh(bool*)+0x19a) [0x605b5a]
>  16: (Monitor::refresh_from_paxos(bool*)+0x1db) [0x5b1ffb]
>  17: (Monitor::init_paxos()+0x85) [0x5b2365]
>  18: (Monitor::preinit()+0x7d7) [0x5b6f87]
>  19: (main()+0x230c) [0x57853c]
>  20: (__libc_start_main()+0xf5) [0x7fa8b7393f45]
>  21: ceph-m

Re: [ceph-users] Problems with http://tracker.ceph.com/?

2017-01-20 Thread Sean Redmond
Hi,

Is the current strange DNS issue with docs.ceph.com related to this also? I
noticed that docs.ceph.com is getting a different A record from
ns4.redhat.com vs ns{1..3}.redhat.com

dig output here > http://pastebin.com/WapDY9e2
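
In case anyone wants to reproduce the comparison, it's just a per-nameserver dig, roughly:

for ns in ns1 ns2 ns3 ns4; do echo ${ns}; dig +short docs.ceph.com @${ns}.redhat.com; done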

Thanks

On Thu, Jan 19, 2017 at 11:03 PM, Dan Mick  wrote:

> On 01/19/2017 09:57 AM, Shinobu Kinjo wrote:
>
> >> The good news is the tenant delete failed. The bad news is we're
> looking for
> >> the tracker volume now, which is no longer present in the Ceph project.
>
> We've reloaded a new instance of tracker.ceph.com from a backup of the
> database, and believe it's back online now.  The backup was taken at
> about 12:31 PDT, so the last 8 or so hours of changes are, sadly, gone,
> so if you had tracker updates during that time period, you may need to
> redo them.
>
> Sorry for the inconvenience.  We've relocated the tracker service to
> hopefully mitigate this vulnerability.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Problems with http://tracker.ceph.com/?

2017-01-19 Thread Sean Redmond
Looks like there maybe an issue with the ceph.com and tracker.ceph.com
website at the moment
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS

2017-01-17 Thread Sean Redmond
I found the kernel clients to perform better in my case.

I ran into a couple of issues with some metadata pool corruption and omap
inconsistencies. That said the repair tools are useful and managed to get
things back up and running.
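
The tools I mean are along the lines of the following -- invocations from memory, so check the CephFS disaster recovery docs before running anything:

cephfs-journal-tool journal inspect
cephfs-journal-tool event recover_dentries summary
cephfs-table-tool all reset session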

The community has been very responsive to any issues I have ran into, this
really increases my confidence levels in any open source project.

On Tue, Jan 17, 2017 at 6:39 AM, w...@42on.com  wrote:

>
>
> On 17 Jan 2017, at 03:47, Tu Holmes  wrote the
> following:
>
> I could use either one. I'm just trying to get a feel for how stable the
> technology is in general.
>
>
> Stable. Multiple customers of mine run it in production with the kernel
> client and serious load on it. No major problems.
>
> Wido
>
> On Mon, Jan 16, 2017 at 3:19 PM Sean Redmond 
> wrote:
>
>> What's your use case? Do you plan on using kernel or fuse clients?
>>
>> On 16 Jan 2017 23:03, "Tu Holmes"  wrote:
>>
>> So what's the consensus on CephFS?
>>
>> Is it ready for prime time or not?
>>
>> //Tu
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS

2017-01-16 Thread Sean Redmond
What's your use case? Do you plan on using kernel or fuse clients?

On 16 Jan 2017 23:03, "Tu Holmes"  wrote:

> So what's the consensus on CephFS?
>
> Is it ready for prime time or not?
>
> //Tu
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph radosgw - 500 errors -- odd

2017-01-13 Thread Sean Sullivan
622757888
stripe_ofs=12622757888 part_ofs=12614369280 rule->part_size=15728640
/var/log/radosgw/client.radosgw.log-2017-01-13 11:30:41.650262 7feacf6c6700
 0 RGWObjManifest::operator++(): result: ofs=12626952192
stripe_ofs=12626952192 part_ofs=12614369280 rule->part_size=15728640
/var/log/radosgw/client.radosgw.log-2017-01-13 11:30:41.656394 7feacf6c6700
 0 RGWObjManifest::operator++(): result: ofs=12630097920
stripe_ofs=12630097920 part_ofs=12630097920 rule->part_size=15728640


I am able to download that file just fine locally using boto, but I have
heard from some users that the download hangs indefinitely on occasion. The
cluster has been healthy afaik (as of graphite showing health_ok) for the
entire period. I am not sure why this is happening or how to troubleshoot
it further. Obviously rgw is throwing a 500 which to me means an underlying
issue with ceph or the rgw server. All of my downloads complete with boto
so I am not sure what is wrong or how this is happening. Is there anything
I can do to figure out where the 500 is coming from // troubleshoot
further?
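
One thing I'm going to try is bumping the rgw debug level on the live daemon via its admin socket and watching the log during a failed download -- the socket name below is a guess, so check what's actually in /var/run/ceph first:

ls /var/run/ceph/
ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok config set debug_rgw 20/20
ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok config set debug_ms 1/1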

-- 
- Sean:  I wrote this. -
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] docs.ceph.com down?

2017-01-02 Thread Sean Redmond
If you need the docs you can try reading them here

https://github.com/ceph/ceph/tree/master/doc

On Mon, Jan 2, 2017 at 7:45 PM, Andre Forigato 
wrote:

> Hello Marcus,
>
> Yes, it's down. :-(
>
>
> André
>
> - Original message -
> > From: "Marcus Müller" 
> > To: ceph-users@lists.ceph.com
> > Sent: Monday, 2 January 2017 16:55:13
> > Subject: [ceph-users] docs.ceph.com down?
>
> > Hi all,
>
> > I can not reach docs.ceph.com for some days. Is the site really down or
> do I
> > have a problem here?
>
> > Regards,
> > Marcus
>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd removal problem

2016-12-29 Thread Sean Redmond
Hi,

Hmm, could you try and dump the crush map - decompile it - modify it to
remove the DNE osd's, compile it and load it back into ceph?

http://docs.ceph.com/docs/master/rados/operations/crush-map/#get-a-crush-map
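
That would go roughly like this (keep the original binary map as a backup before setting anything):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt and remove the DNE osd entries
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new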

Thanks

On Thu, Dec 29, 2016 at 1:01 PM, Łukasz Chrustek  wrote:

> Hi,
>
> ]# ceph osd tree
> IDWEIGHTTYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
>-7  16.89590 root ssd-disks
>   -11 0 host ssd1
> 598798032 0 osd.598798032 DNE0
> 21940 0 osd.21940 DNE0
>71 0 osd.71DNE0
>
> ]# ceph osd rm osd.598798032
> Error EINVAL: osd id 598798032 is too largeinvalid osd id-34
> ]# ceph osd rm osd.21940
> osd.21940 does not exist.
> ]# ceph osd rm osd.71
> osd.71 does not exist.
>
> > ceph osd rm osd.$ID
>
> > On Thu, Dec 29, 2016 at 10:44 AM, Łukasz Chrustek 
> wrote:
>
> > Hi,
>
> >  I was trying to delete 3 osds from the cluster; the deletion process took a very
> >  long time and I interrupted it. The mon process then crashed, and in ceph
> >  osd tree (after restart ceph-mon) I saw:
>
> >   ~]# ceph osd tree
> >  ID WEIGHTTYPE NAMEUP/DOWN REWEIGHT
> PRIMARY-AFFINITY
> >  -7  16.89590 root ssd-disks
> > -11 0 host ssd1
> >  -231707408 0
> >   22100 0 osd.22100DNE0
> >  71 0 osd.71   DNE0
>
>
> >  when I tried to delete osd.22100:
>
> >  [root@cc1 ~]# ceph osd crush remove osd.22100
> >  device 'osd.22100' does not appear in the crush map
>
> >  then I tried to delete osd.71 and the mon process crashed:
>
> >  [root@cc1 ~]# ceph osd crush remove osd.71
> >  2016-12-28 17:52:34.459668 7f426a862700  0 monclient: hunting for new
> mon
>
> >  after restart of ceph-mon in ceph osd tree it shows:
>
> >  # ceph osd tree
> >  IDWEIGHTTYPE NAME UP/DOWN REWEIGHT
> PRIMARY-AFFINITY
> > -7  16.89590 root ssd-disks
> >-11 0 host ssd1
> >  598798032 0 osd.598798032 DNE0
> >  21940 0 osd.21940 DNE0
> > 71 0 osd.71DNE0
>
> >  My question is how to delete these osds without directly editing the crushmap?
> >  It is a production system, I can't afford any service interruption :(,
> >  and when I try ceph osd crush remove, ceph-mon crashes
>
> >  I dumped the crushmap, but it took 19G (!!) after decompiling (the compiled
> >  file is very small). So I cleaned this file with perl (it took a very
> >  long time), and I now have a small txt crushmap, which I edited. But is
> >  there any chance that ceph will still remember somewhere about these
> >  huge numbers for osds? Is it safe to apply this cleaned crushmap to the
> >  cluster? The cluster now works OK, but there is over 23TB of production data
> >  which I can't lose. Please advise what to do.
>
>
> >  --
> >  Regards
> >  Luk
>
> >  ___
> >  ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
>
> --
> Pozdrowienia,
>  Łukasz Chrustek
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

