Re: [ceph-users] [Ceph-large] Large Omap Warning on Log pool

2019-06-12 Thread Aaron Bassett
Correct, it was pre-jewel. I believe we toyed with multisite replication back 
then so it may have gotten baked into the zonegroup inadvertently. Thanks for 
the info!
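
For anyone else hitting this, the zonegroup change Casey describes in the
quoted reply below (log_meta and log_data set to false) can be sketched
roughly as follows. This is a minimal sketch, not a tested procedure: it
assumes the JSON printed by 'radosgw-admin zonegroup get' has a top-level
"zones" list whose entries carry "log_meta" and "log_data" as the strings
"true"/"false", and the helper name disable_zone_logging is hypothetical.

```python
import copy
import json


def disable_zone_logging(zonegroup):
    """Return a copy of a zonegroup dict with log_meta/log_data turned off.

    Assumption: each entry in zonegroup["zones"] stores the flags as the
    strings "true"/"false", as in typical 'radosgw-admin zonegroup get' output.
    """
    updated = copy.deepcopy(zonegroup)
    for zone in updated.get("zones", []):
        zone["log_meta"] = "false"
        zone["log_data"] = "false"
    return updated


if __name__ == "__main__":
    # Hypothetical, heavily trimmed zonegroup document for illustration only.
    sample = {
        "name": "us",
        "zones": [{"name": "us-phx2", "log_meta": "true", "log_data": "true"}],
    }
    print(json.dumps(disable_zone_logging(sample), indent=2))
```

The edited JSON would then go back in via 'radosgw-admin zonegroup set',
followed by the gateway restart and log object cleanup Casey mentions.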

> On Jun 12, 2019, at 11:08 AM, Casey Bodley  wrote:
> 
> Hi Aaron,
> 
> The data_log objects are storing logs for multisite replication. Judging by 
> the pool name '.us-phx2.log', this cluster was created before jewel. Are you 
> (or were you) using multisite or radosgw-agent?
> 
> If not, you'll want to turn off the logging (log_meta and log_data -> false) 
> in your zonegroup configuration using 'radosgw-admin zonegroup get/set', 
> restart gateways, then delete the data_log and meta_log objects.
> 
> If it is multisite, then the logs should all be trimmed in the background as 
> long as all peer zones are up-to-date. There was a bug prior to 12.2.12 that 
> prevented datalog trimming 
> (http://tracker.ceph.com/issues/38412).
> 
> Casey
> 
> 
> On 6/11/19 5:41 PM, Aaron Bassett wrote:
>> Hey all,
>> I've just recently upgraded some of my larger rgw clusters to latest 
>> luminous and now I'm getting a lot of warnings about large omap objects. 
>> Most of them were on the indices and I've taken care of them by sharding 
>> where appropriate. However on two of my clusters I have a large object in 
>> the rgw log pool.
>> 
>> ceph health detail
>> HEALTH_WARN 1 large omap objects
>> LARGE_OMAP_OBJECTS 1 large omap objects
>> 1 large objects found in pool '.us-phx2.log'
>> Search the cluster log for 'Large omap object found' for more details.
>> 
>> 
>> 2019-06-11 10:50:04.583354 7f8d2b737700  0 log_channel(cluster) log [WRN] : 
>> Large omap object found. Object: 51:b9a904f6:::data_log.27:head Key count: 
>> 15903755 Size (bytes): 2305116273
>> 
>> 
>> I'm not sure what to make of this. I don't see much chatter on the mailing 
>> lists about the log pool, other than a thread about swift lifecycles, which 
>> I don't use. The log pool is pretty large, making it difficult to poke 
>> around in there:
>> 
>> NAME          ID  USED    %USED  MAX AVAIL  OBJECTS
>> .us-phx2.log  51  118GiB  0.03   384TiB     12782413
>> 
>> That said, I did a little poking around and it looks like a mix of these 
>> data_log objects and some delete hints, but mostly a lot of objects starting 
>> with dates that point to different s3 pools. The object referenced in the 
>> osd log has 15912300 omap keys, and spot checking it, it looks like it's 
>> mostly referencing a pool we use with our dns resolver. We have a dns 
>> service that checks rgw endpoint health by uploading and deleting an object 
>> every few minutes, and adds/removes endpoints from the A record as 
>> indicated.
>> 
>> So I guess I've got a few questions:
>> 
>> 1) what is the nature of the data in the data_log.* objects in the log pool? 
>> Is it safe to remove or is it more like a binlog that needs to be intact 
>> from the beginning of time?
>> 
>> 2) with the log pool in general, beyond the individual objects' omap sizes, 
>> is there any concern about size? If so, is there a way to force it to 
>> truncate? I see some log commands in radosgw-admin, but documentation is 
>> light.
>> 
>> 
>> Thanks,
>> Aaron
>> 
>> CONFIDENTIALITY NOTICE
>> This e-mail message and any attachments are only for the use of the intended 
>> recipient and may contain information that is privileged, confidential or 
>> exempt from disclosure under applicable law. If you are not the intended 
>> recipient, any disclosure, distribution or other use of this e-mail message 
>> or attachments is prohibited. If you have received this e-mail message in 
>> error, please delete and notify the sender immediately. Thank you.
>> 
>> ___
>> Ceph-large mailing list
>> ceph-la...@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com
>> 
>> 
>> 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw index all keys in all buckets

2019-05-02 Thread Aaron Bassett
Hello,
I'm trying to write a tool to index all keys in all buckets stored in radosgw. 
I've created a user with the following caps:

"caps": [
    {
        "type": "buckets",
        "perm": "read"
    },
    {
        "type": "metadata",
        "perm": "read"
    },
    {
        "type": "usage",
        "perm": "read"
    },
    {
        "type": "users",
        "perm": "read"
    }
],


With these caps I'm able to use a python radosgw-admin lib to list buckets and 
acls and users, but not keys. This user is also unable to read buckets and/or 
keys through the normal s3 api. Is there a way to create an s3 user that has 
read access to all buckets and keys without explicitly being granted acls?

Thanks,
Aaron


Re: [ceph-users] msgr2 and cephfs

2019-04-24 Thread Aaron Bassett
Ah, never mind, I found 'ceph mon set-addrs' and I'm good to go. 

Aaron
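
For reference, the monmap entry needs both a v2 and a v1 address for each
mon. Here is a minimal sketch of building that address vector, using the IP
and ports from the startup log quoted below. The helper name mon_addrvec is
hypothetical, and the exact 'ceph mon set-addrs <name> <addrs>' syntax should
be checked against the docs for your release.

```python
def mon_addrvec(ip, v2_port=3300, v1_port=6789):
    """Build a monitor address vector listing msgr2 and legacy msgr1 endpoints."""
    return f"[v2:{ip}:{v2_port},v1:{ip}:{v1_port}]"


if __name__ == "__main__":
    # IP taken from the startup log in this thread.
    print(mon_addrvec("172.17.40.143"))
```

The resulting string is what I would expect to pass as the address argument
for each monitor, one mon at a time.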

> On Apr 24, 2019, at 4:36 PM, Aaron Bassett  
> wrote:
> 
> Yeah, OK, that's what I guessed. I'm struggling to get my mons to listen on both 
> ports. On startup they report:
> 
> 2019-04-24 19:58:43.652 7fcf9cd3c040 -1 WARNING: 'mon addr' config option 
> [v2:172.17.40.143:3300/0,v1:172.17.40.143:6789/0] does not match monmap file
> continuing with monmap configuration
> 2019-04-24 19:58:43.652 7fcf9cd3c040  0 starting mon.bos-r1-r3-head1 rank 0 
> at public addrs v2:172.17.40.143:3300/0 at bind addrs v2:172.17.40.143:3300/0 
> mon_data /var/lib/ceph/mon/ceph-bos-r1-r3-head1 fsid 
> 4a361f9c-e28b-4b6b-ab59-264dcb51da97
> 
> 
> which I assume means I have to jump through the add/remove mons hoops or just 
> burn it down and start over? FWIW the docs seem to indicate they'll listen to 
> both by default (in nautilus).
> 
> Aaron
> 
>> On Apr 24, 2019, at 4:29 PM, Jason Dillaman  wrote:
>> 
>> AFAIK, the kernel clients for CephFS and RBD do not support msgr2 yet.
>> 
>> On Wed, Apr 24, 2019 at 4:19 PM Aaron Bassett
>>  wrote:
>>> 
>>> Hi,
>>> I'm standing up a new cluster on nautilus to play with some of the new 
>>> features, and I've somehow got my monitors only listening on msgrv2 port 
>>> (3300) and not the legacy port (6789). I'm running kernel 4.15 on my 
>>> clients. Can I mount cephfs via port 3300 or do I have to figure out how to 
>>> get my mons listening to both?
>>> 
>>> Thanks,
>>> Aaron
>> 
>> 
>> 
>> -- 
>> Jason
> 




Re: [ceph-users] msgr2 and cephfs

2019-04-24 Thread Aaron Bassett
Yeah, OK, that's what I guessed. I'm struggling to get my mons to listen on both 
ports. On startup they report:

2019-04-24 19:58:43.652 7fcf9cd3c040 -1 WARNING: 'mon addr' config option 
[v2:172.17.40.143:3300/0,v1:172.17.40.143:6789/0] does not match monmap file
 continuing with monmap configuration
2019-04-24 19:58:43.652 7fcf9cd3c040  0 starting mon.bos-r1-r3-head1 rank 0 at 
public addrs v2:172.17.40.143:3300/0 at bind addrs v2:172.17.40.143:3300/0 
mon_data /var/lib/ceph/mon/ceph-bos-r1-r3-head1 fsid 
4a361f9c-e28b-4b6b-ab59-264dcb51da97


which I assume means I have to jump through the add/remove mons hoops or just 
burn it down and start over? FWIW the docs seem to indicate they'll listen to 
both by default (in nautilus).

Aaron

> On Apr 24, 2019, at 4:29 PM, Jason Dillaman  wrote:
> 
> AFAIK, the kernel clients for CephFS and RBD do not support msgr2 yet.
> 
> On Wed, Apr 24, 2019 at 4:19 PM Aaron Bassett
>  wrote:
>> 
>> Hi,
>> I'm standing up a new cluster on nautilus to play with some of the new 
>> features, and I've somehow got my monitors only listening on msgrv2 port 
>> (3300) and not the legacy port (6789). I'm running kernel 4.15 on my 
>> clients. Can I mount cephfs via port 3300 or do I have to figure out how to 
>> get my mons listening to both?
>> 
>> Thanks,
>> Aaron
> 
> 
> 
> -- 
> Jason




[ceph-users] msgr2 and cephfs

2019-04-24 Thread Aaron Bassett
Hi,
I'm standing up a new cluster on nautilus to play with some of the new 
features, and I've somehow got my monitors only listening on msgrv2 port (3300) 
and not the legacy port (6789). I'm running kernel 4.15 on my clients. Can I 
mount cephfs via port 3300 or do I have to figure out how to get my mons 
listening to both?

Thanks,
Aaron


Re: [ceph-users] RadosGW ops log lag?

2019-04-12 Thread Aaron Bassett
Ok thanks. Is the expectation that events will be available on that socket as 
soon as they occur, or is it more of a best effort situation? I'm just trying to 
nail down which side of the socket might be lagging. It's pretty difficult to 
recreate this as I have to hit the cluster very hard to get it to start lagging.

Thanks, Aaron 

> On Apr 12, 2019, at 11:16 AM, Matt Benjamin  wrote:
> 
> Hi Aaron,
> 
> I don't think that exists currently.
> 
> Matt
> 
> On Fri, Apr 12, 2019 at 11:12 AM Aaron Bassett
>  wrote:
>> 
>> I have a radosgw log centralizer that we use for an audit trail of data 
>> access in our ceph clusters. We've enabled the ops log socket and added 
>> logging of the http_authorization header to it:
>> 
>> rgw log http headers = "http_authorization"
>> rgw ops log socket path = /var/run/ceph/rgw-ops.sock
>> rgw enable ops log = true
>> 
>> We have a daemon that listens on the ops socket, extracts/manipulates some 
>> information from the ops log, and sends it off to our log aggregator.
>> 
>> This setup works pretty well for the most part, except when the cluster 
>> comes under heavy load, it can get _very_ laggy - sometimes up to several 
>> hours behind. I'm having a hard time nailing down what's causing this lag. 
>> The daemon is rather naive, basically just some nc with jq in between, but 
>> the log aggregator has plenty of spare capacity, so I don't think it's 
>> slowing down how fast the daemon is consuming from the socket.
>> 
>> I was revisiting the documentation about this ops log and noticed the 
>> following which I hadn't seen previously:
>> 
>> When specifying a UNIX domain socket, it is also possible to specify the 
>> maximum amount of memory that will be used to keep the data backlog:
>> rgw ops log data backlog = 
>> Any backlogged data in excess to the specified size will be lost, so the 
>> socket needs to be read constantly.
>> 
>> I'm wondering if there's a way I can query radosgw for the current size of 
>> that backlog to help me narrow down where the bottleneck may be occurring.
>> 
>> Thanks,
>> Aaron
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> 
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> 
> http://www.redhat.com/en/technologies/storage
> 
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309




[ceph-users] RadosGW ops log lag?

2019-04-12 Thread Aaron Bassett
I have a radosgw log centralizer that we use for an audit trail of data 
access in our ceph clusters. We've enabled the ops log socket and added logging 
of the http_authorization header to it:

rgw log http headers = "http_authorization"
rgw ops log socket path = /var/run/ceph/rgw-ops.sock
rgw enable ops log = true

We have a daemon that listens on the ops socket, extracts/manipulates some 
information from the ops log, and sends it off to our log aggregator.

This setup works pretty well for the most part, except when the cluster comes 
under heavy load, it can get _very_ laggy - sometimes up to several hours 
behind. I'm having a hard time nailing down what's causing this lag. The daemon 
is rather naive, basically just some nc with jq in between, but the log 
aggregator has plenty of spare capacity, so I don't think it's slowing down how 
fast the daemon is consuming from the socket.
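
To rule the consumer in or out, the nc/jq pipeline could be swapped for a
small reader that handles records as they arrive. Below is a minimal Python
sketch, assuming the socket emits an open-ended JSON array of records (which
is what the jq truncate_stream filter used elsewhere in this archive
suggests). The names iter_records and follow_socket are hypothetical, and the
socket path is the one from the config above.

```python
import codecs
import json
import socket


def iter_records(chunks):
    """Yield JSON objects from text chunks that together form an open-ended array."""
    decoder = json.JSONDecoder()
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            # Drop array punctuation and whitespace between records.
            buf = buf.lstrip(" \t\r\n[],")
            if not buf:
                break
            try:
                record, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # Record is incomplete; wait for the next chunk.
            yield record
            buf = buf[end:]


def follow_socket(path="/var/run/ceph/rgw-ops.sock", bufsize=65536):
    """Yield decoded text chunks read from the ops log UNIX socket."""
    # Incremental decoder so multi-byte characters split across reads survive.
    decode = codecs.getincrementaldecoder("utf-8")(errors="replace").decode
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        while True:
            data = sock.recv(bufsize)
            if not data:
                return
            yield decode(data)
```

Looping over iter_records(follow_socket()) and comparing each record's "time"
field against the wall clock on arrival would show whether the delay is
already present when radosgw writes to the socket.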

I was revisiting the documentation about this ops log and noticed the following 
which I hadn't seen previously:

When specifying a UNIX domain socket, it is also possible to specify the 
maximum amount of memory that will be used to keep the data backlog:
rgw ops log data backlog = 
Any backlogged data in excess to the specified size will be lost, so the socket 
needs to be read constantly.

I'm wondering if there's a way I can query radosgw for the current size of that 
backlog to help me narrow down where the bottleneck may be occurring.

Thanks,
Aaron





Re: [ceph-users] Civetweb log format

2018-03-13 Thread Aaron Bassett
Well I have it mostly wrapped up and writing to graylog, however the ops log 
has a `remote_addr` field, but as far as I can tell it's always blank. I found 
this fix but it seems to only be in v13.0.1 
https://github.com/ceph/ceph/pull/16860

Is there any chance we'd see backports of this to Jewel and/or luminous?


Aaron

On Mar 12, 2018, at 5:50 PM, Aaron Bassett 
<aaron.bass...@nantomics.com> wrote:

Quick update:

adding the following to your config:

rgw log http headers = "http_authorization"
rgw ops log socket path = /tmp/rgw
rgw enable ops log = true
rgw enable usage log = true


and you can now

 nc -U /tmp/rgw |./jq --stream 'fromstream(1|truncate_stream(inputs))'
{
  "time": "2018-03-12 21:42:19.479037Z",
  "time_local": "2018-03-12 21:42:19.479037",
  "remote_addr": "",
  "user": "test",
  "operation": "PUT",
  "uri": "/testbucket/",
  "http_status": "200",
  "error_code": "",
  "bytes_sent": 19,
  "bytes_received": 0,
  "object_size": 0,
  "total_time": 600967,
  "user_agent": "Boto/2.46.1 Python/2.7.12 Linux/4.4.0-42-generic",
  "referrer": "",
  "http_x_headers": [
    {
      "HTTP_AUTHORIZATION": "AWS : "
    }
  ]
}

pretty good start on getting an audit log going!


On Mar 9, 2018, at 10:52 PM, Konstantin Shalygin 
<k0...@k0ste.ru> wrote:



Unfortunately I can't quite figure out how to use it. I've got 'rgw log http 
headers = "authorization"' in my ceph.conf but I'm getting no love in the rgw 
log.



I think this should have 'http_' prefix, like:


rgw log http headers = "http_host, http_x_forwarded_for"





k





Re: [ceph-users] Civetweb log format

2018-03-12 Thread Aaron Bassett
Quick update:

adding the following to your config:

rgw log http headers = "http_authorization"
rgw ops log socket path = /tmp/rgw
rgw enable ops log = true
rgw enable usage log = true


and you can now

 nc -U /tmp/rgw |./jq --stream 'fromstream(1|truncate_stream(inputs))'
{
  "time": "2018-03-12 21:42:19.479037Z",
  "time_local": "2018-03-12 21:42:19.479037",
  "remote_addr": "",
  "user": "test",
  "operation": "PUT",
  "uri": "/testbucket/",
  "http_status": "200",
  "error_code": "",
  "bytes_sent": 19,
  "bytes_received": 0,
  "object_size": 0,
  "total_time": 600967,
  "user_agent": "Boto/2.46.1 Python/2.7.12 Linux/4.4.0-42-generic",
  "referrer": "",
  "http_x_headers": [
    {
      "HTTP_AUTHORIZATION": "AWS : "
    }
  ]
}

pretty good start on getting an audit log going!
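
As a next step toward the audit log, the access key id can be pulled out of
the logged header. The sketch below is illustrative only: it assumes the
header follows the standard S3 forms ("AWS <key>:<signature>" for signature
v2, "AWS4-HMAC-SHA256 Credential=<key>/..." for v4) and that records look
like the one above. The function names access_key_id and audit_fields are
hypothetical.

```python
def access_key_id(authorization):
    """Extract the access key id from an S3 Authorization header value, or None."""
    scheme, _, rest = authorization.strip().partition(" ")
    if scheme == "AWS":
        # Signature v2: "AWS <access_key_id>:<signature>"
        key, sep, _ = rest.partition(":")
        key = key.strip()
        return key if sep and key else None
    if scheme == "AWS4-HMAC-SHA256":
        # Signature v4: "... Credential=<access_key_id>/<date>/<region>/..., ..."
        for part in rest.split(","):
            name, _, value = part.strip().partition("=")
            if name == "Credential":
                return value.split("/", 1)[0] or None
    return None


def audit_fields(record):
    """Reduce one ops log record to the fields useful for an audit trail."""
    headers = {}
    for entry in record.get("http_x_headers", []):
        headers.update(entry)
    return {
        "time": record.get("time"),
        "user": record.get("user"),
        "operation": record.get("operation"),
        "uri": record.get("uri"),
        "http_status": record.get("http_status"),
        "access_key_id": access_key_id(headers.get("HTTP_AUTHORIZATION", "")),
    }
```

Keeping only the key id, and dropping the signature, also avoids shipping
request signatures to the log aggregator.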


On Mar 9, 2018, at 10:52 PM, Konstantin Shalygin wrote:



Unfortunately I can't quite figure out how to use it. I've got 'rgw log http 
headers = "authorization"' in my ceph.conf but I'm getting no love in the rgw 
log.



I think this should have 'http_' prefix, like:


rgw log http headers = "http_host, http_x_forwarded_for"





k




Re: [ceph-users] Civetweb log format

2018-03-09 Thread Aaron Bassett
David, that's exactly my goal as well.

On closer reading of the docs, I see that this setting is to be used for 
writing these headers to the ops log. I guess it's time for me to learn what 
that's about. I've never quite been able to figure out how to get my hands on 
it. I also see an option for writing the ops log to a socket instead of the 
bucket it normally writes to. Seems like a good place for me to snag the info I 
need and transform and log it in an audit log. I'm going to investigate this 
and see what turns up.

Aaron

On Mar 9, 2018, at 5:12 PM, David Turner 
<drakonst...@gmail.com> wrote:

Matt, my only goal is to be able to have something that can be checked to see 
which key was used to access which resource. The closest I was able to get in 
Jewel was rgw debug logging 10/10, but it generates 100+ lines of logs for 
every request and, as Aaron points out, takes some logic to combine the object, 
the key, and the action. It also doesn't actually catch every type of 
request.

It sounds like you've done some work with this. How can we utilize what you've 
done to be able to have audit logging on buckets?

On Fri, Mar 9, 2018, 5:00 PM Aaron Bassett 
<aaron.bass...@nantomics.com> wrote:
Ah yes, I found it: 
https://github.com/ceph/ceph/commit/3192ef6a034bf39becead5f87a0e48651fcab705

Unfortunately I can't quite figure out how to use it. I've got 'rgw log http 
headers = "authorization"' in my ceph.conf but I'm getting no love in the rgw 
log.


Also, setting rgw debug level to 10 did get me the user access key id, but only 
incidentally, in messages about a cache miss and put for the user, so I'm not 
sure how much I'd want to depend on that. Also, to David's point, that makes 
things very chatty and I'll have to do some processing to correlate the key id 
with the rest of the request info.


Aaron

On Mar 8, 2018, at 8:18 PM, Matt Benjamin 
<mbenj...@redhat.com> wrote:

Hi Yehuda,

I did add support for logging arbitrary headers, but not a
configurable log record a-la webservers.  To level set, David, are you
speaking about a file or pipe log sync on the RGW host?

Matt

On Thu, Mar 8, 2018 at 7:55 PM, Yehuda Sadeh-Weinraub 
<yeh...@redhat.com> wrote:
On Thu, Mar 8, 2018 at 2:22 PM, David Turner 
<drakonst...@gmail.com> wrote:
I remember some time ago Yehuda had commented on a thread like this saying
that it would make sense to add a logging/auditing feature like this to RGW.
I haven't heard much about it since then, though.  Yehuda, do you remember
that and/or think that logging like this might become viable?

I vaguely remember Matt was working on this. Matt?

Yehuda



On Thu, Mar 8, 2018 at 4:17 PM Aaron Bassett 
<aaron.bass...@nantomics.com> wrote:

Yeah, that's what I was afraid of. I'm looking at possibly patching to add
it, but I really don't want to support my own builds. I suppose other
alternatives are to use proxies to log stuff, but that makes me sad.

Aaron


On Mar 8, 2018, at 12:36 PM, David Turner 
<drakonst...@gmail.com> wrote:

Setting radosgw debug logging to 10/10 is the only way I've been able to
get the access key in the logs for requests.  It's very unfortunate as it
DRASTICALLY increases the amount of log per request, but it's what we needed
to do to be able to have the access key in the logs along with the request.

On Tue, Mar 6, 2018 at 3:09 PM Aaron Bassett 
<aaron.bass...@nantomics.com> wrote:

Hey all,
I'm trying to get something of an audit log out of radosgw. To that end I
was wondering if there's a mechanism to customize the log format of civetweb.
It's already writing IP, HTTP Verb, path, response and time, but I'm hoping
to get it to print the Authorization header of the request, which contains
the access key id which we can tie back into the systems we use to issue
credentials. Any thoughts?

Thanks,
Aaron


Re: [ceph-users] Civetweb log format

2018-03-09 Thread Aaron Bassett
Ah yes, I found it: 
https://github.com/ceph/ceph/commit/3192ef6a034bf39becead5f87a0e48651fcab705

Unfortunately I can't quite figure out how to use it. I've got 'rgw log http 
headers = "authorization"' in my ceph.conf but I'm getting no love in the rgw 
log.


Also, setting rgw debug level to 10 did get me the user access key id, but only 
incidentally, in messages about a cache miss and put for the user, so I'm not 
sure how much I'd want to depend on that. Also, to David's point, that makes 
things very chatty and I'll have to do some processing to correlate the key id 
with the rest of the request info.


Aaron

On Mar 8, 2018, at 8:18 PM, Matt Benjamin 
<mbenj...@redhat.com> wrote:

Hi Yehuda,

I did add support for logging arbitrary headers, but not a
configurable log record a-la webservers.  To level set, David, are you
speaking about a file or pipe log sync on the RGW host?

Matt

On Thu, Mar 8, 2018 at 7:55 PM, Yehuda Sadeh-Weinraub 
<yeh...@redhat.com> wrote:
On Thu, Mar 8, 2018 at 2:22 PM, David Turner 
<drakonst...@gmail.com> wrote:
I remember some time ago Yehuda had commented on a thread like this saying
that it would make sense to add a logging/auditing feature like this to RGW.
I haven't heard much about it since then, though.  Yehuda, do you remember
that and/or think that logging like this might become viable?

I vaguely remember Matt was working on this. Matt?

Yehuda



On Thu, Mar 8, 2018 at 4:17 PM Aaron Bassett 
<aaron.bass...@nantomics.com> wrote:

Yeah, that's what I was afraid of. I'm looking at possibly patching to add
it, but I really don't want to support my own builds. I suppose other
alternatives are to use proxies to log stuff, but that makes me sad.

Aaron


On Mar 8, 2018, at 12:36 PM, David Turner 
<drakonst...@gmail.com> wrote:

Setting radosgw debug logging to 10/10 is the only way I've been able to
get the access key in the logs for requests.  It's very unfortunate as it
DRASTICALLY increases the amount of log per request, but it's what we needed
to do to be able to have the access key in the logs along with the request.

On Tue, Mar 6, 2018 at 3:09 PM Aaron Bassett 
<aaron.bass...@nantomics.com> wrote:

Hey all,
I'm trying to get something of an audit log out of radosgw. To that end I
was wondering if there's a mechanism to customize the log format of civetweb.
It's already writing IP, HTTP Verb, path, response and time, but I'm hoping
to get it to print the Authorization header of the request, which contains
the access key id which we can tie back into the systems we use to issue
credentials. Any thoughts?

Thanks,
Aaron






--

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [ceph-users] Civetweb log format

2018-03-08 Thread Aaron Bassett
Yeah, that's what I was afraid of. I'm looking at possibly patching to add it, 
but I really don't want to support my own builds. I suppose other alternatives 
are to use proxies to log stuff, but that makes me sad.

Aaron

On Mar 8, 2018, at 12:36 PM, David Turner 
<drakonst...@gmail.com> wrote:

Setting radosgw debug logging to 10/10 is the only way I've been able to get 
the access key in the logs for requests.  It's very unfortunate as it 
DRASTICALLY increases the amount of log per request, but it's what we needed to 
do to be able to have the access key in the logs along with the request.

On Tue, Mar 6, 2018 at 3:09 PM Aaron Bassett <aaron.bass...@nantomics.com> wrote:
Hey all,
I'm trying to get something of an audit log out of radosgw. To that end I was 
wondering if there's a mechanism to customize the log format of civetweb. It's 
already writing IP, HTTP verb, path, response and time, but I'm hoping to get 
it to print the Authorization header of the request, which contains the 
access key id which we can tie back into the systems we use to issue 
credentials. Any thoughts?

Thanks,
Aaron


[ceph-users] Civetweb log format

2018-03-06 Thread Aaron Bassett
Hey all,
I'm trying to get something of an audit log out of radosgw. To that end I was 
wondering if there's a mechanism to customize the log format of civetweb. It's 
already writing IP, HTTP verb, path, response and time, but I'm hoping to get 
it to print the Authorization header of the request, which contains the 
access key id which we can tie back into the systems we use to issue 
credentials. Any thoughts?

Thanks,
Aaron
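
For anyone building the audit trail described above: once the Authorization header is available in some log (a proxy log or the debug-level rgw log), pulling the access key id out of it is simple. The sketch below is only an illustration and assumes the header uses one of the two standard S3 forms, signature v2 ("AWS <key>:<signature>") or signature v4 ("AWS4-HMAC-SHA256 Credential=<key>/<scope>, ..."). The function name and the sample key are made up for the example.

```python
import re

# Signature v2:  "AWS <access_key_id>:<signature>"
_V2 = re.compile(r"^AWS\s+([^:\s]+):")
# Signature v4:  "AWS4-HMAC-SHA256 Credential=<access_key_id>/<scope>, ..."
_V4 = re.compile(r"^AWS4-HMAC-SHA256\s+.*?Credential=([^/,\s]+)/")


def access_key_id(authorization_header):
    """Return the access key id from an S3 Authorization header, or None."""
    header = authorization_header.strip()
    for pattern in (_V4, _V2):
        match = pattern.match(header)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # Hypothetical header value, not taken from a real log.
    print(access_key_id("AWS EXAMPLEKEY123:c2lnbmF0dXJl"))
```

Only the key id is extracted; the signature part of the header is never needed for tying a request back to an issued credential.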


[ceph-users] Stuck down+peering after host failure.

2017-12-11 Thread Aaron Bassett
Morning All,
I have a large-ish (16 node, 1100 osds) cluster I recently had to move from one 
DC to another. Before shutting everything down, I set noout, norecover, and 
nobackfill, thinking this would help everything stand back up again. Upon 
installation at the new DC, one of the nodes refused to boot. With my crush 
rule having the failure domain as host, I did not think this would be a 
problem. However, once I turned off noout, norecover, and nobackfill and 
everything else came up and settled in, I still have 1545 pgs stuck 
down+peering. On other pgs, recovery and backfilling are proceeding as 
expected, but these pgs appear to be permanently stuck. When querying the 
down+peering pgs, they all mention osds from the down node in 
"down_osds_we_would_probe". I'm not sure why it *needs* to query these since 
it should have two other copies on other nodes? I'm not sure if bringing 
everything up with noout or norecover on confused things. Looking for advice...

Aaron
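
To check whether every stuck pg is really waiting on the same dead host, the query output can be summarised with a few lines of Python. This is a rough sketch, assuming the JSON from "ceph pg <pgid> query" carries a "recovery_state" list whose entries may include "down_osds_we_would_probe"; the sample document below is hand-written for illustration and is not real cluster output.

```python
import json


def blocking_osds(pg_query_json):
    """Collect the OSD ids a pg lists under down_osds_we_would_probe."""
    query = json.loads(pg_query_json)
    osds = set()
    for state in query.get("recovery_state", []):
        osds.update(state.get("down_osds_we_would_probe", []))
    return sorted(osds)


if __name__ == "__main__":
    # Hypothetical, heavily trimmed query output.
    sample = json.dumps({
        "state": "down+peering",
        "recovery_state": [
            {"name": "Started/Primary/Peering",
             "down_osds_we_would_probe": [412, 17]},
            {"name": "Started"},
        ],
    })
    print(blocking_osds(sample))
```

Running this over each down+peering pg and comparing the ids with the osds on the failed node shows whether anything else is involved.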


Re: [ceph-users] PG stuck inconsistent, but appears ok?

2017-07-14 Thread Aaron Bassett
I issued the pg deep scrub command ~24 hours ago and nothing has changed. I see 
nothing in the active osd's log about kicking off the scrub.

On Jul 13, 2017, at 2:24 PM, David Turner <drakonst...@gmail.com> wrote:

# ceph pg deep-scrub 22.1611

On Thu, Jul 13, 2017 at 1:00 PM Aaron Bassett <aaron.bass...@nantomics.com> wrote:
I'm not sure if I'm doing something wrong, but when I run this:

# ceph osd deep-scrub 294


All I get in the osd log is:

2017-07-13 16:57:53.782841 7f40d089f700  0 log_channel(cluster) log [INF] : 
21.1ae9 deep-scrub starts
2017-07-13 16:57:53.785261 7f40ce09a700  0 log_channel(cluster) log [INF] : 
21.1ae9 deep-scrub ok


Each time I run it, it's the same pg.

Is there some reason it's not scrubbing all the pgs?

Aaron

> On Jul 13, 2017, at 10:29 AM, Aaron Bassett <aaron.bass...@nantomics.com> 
> wrote:
>
> Ok good to hear, I just kicked one off on the acting primary so I guess I'll 
> be patient now...
>
> Thanks,
> Aaron
>
>> On Jul 13, 2017, at 10:28 AM, Dan van der Ster <d...@vanderster.com> wrote:
>>
>> On Thu, Jul 13, 2017 at 4:23 PM, Aaron Bassett
>> <aaron.bass...@nantomics.com> wrote:
>>> Because it was a read error I checked SMART stats for that osd's disk and 
>>> sure enough, it had some uncorrected read errors. In order to stop it from 
>>> causing more problems I stopped the daemon to let ceph recover from the 
>>> other osds. The cluster has now finished rebalancing, but remains in ERR 
>>> state as it still thinks this pg is inconsistent.
>>
>> It should clear up after you trigger another deep-scrub on that PG.
>>
>> Cheers, Dan
>



Re: [ceph-users] PG stuck inconsistent, but appears ok?

2017-07-13 Thread Aaron Bassett
I'm not sure if I'm doing something wrong, but when I run this:

# ceph osd deep-scrub 294


All I get in the osd log is:

2017-07-13 16:57:53.782841 7f40d089f700  0 log_channel(cluster) log [INF] : 
21.1ae9 deep-scrub starts
2017-07-13 16:57:53.785261 7f40ce09a700  0 log_channel(cluster) log [INF] : 
21.1ae9 deep-scrub ok


Each time I run it, it's the same pg.

Is there some reason it's not scrubbing all the pgs?

Aaron

> On Jul 13, 2017, at 10:29 AM, Aaron Bassett <aaron.bass...@nantomics.com> 
> wrote:
>
> Ok good to hear, I just kicked one off on the acting primary so I guess I'll 
> be patient now...
>
> Thanks,
> Aaron
>
>> On Jul 13, 2017, at 10:28 AM, Dan van der Ster <d...@vanderster.com> wrote:
>>
>> On Thu, Jul 13, 2017 at 4:23 PM, Aaron Bassett
>> <aaron.bass...@nantomics.com> wrote:
>>> Because it was a read error I checked SMART stats for that osd's disk and 
>>> sure enough, it had some uncorrected read errors. In order to stop it from 
>>> causing more problems I stopped the daemon to let ceph recover from the 
>>> other osds. The cluster has now finished rebalancing, but remains in ERR 
>>> state as it still thinks this pg is inconsistent.
>>
>> It should clear up after you trigger another deep-scrub on that PG.
>>
>> Cheers, Dan
>



Re: [ceph-users] PG stuck inconsistent, but appears ok?

2017-07-13 Thread Aaron Bassett
Ok good to hear, I just kicked one off on the acting primary so I guess I'll be 
patient now...

Thanks,
Aaron

> On Jul 13, 2017, at 10:28 AM, Dan van der Ster <d...@vanderster.com> wrote:
>
> On Thu, Jul 13, 2017 at 4:23 PM, Aaron Bassett
> <aaron.bass...@nantomics.com> wrote:
>> Because it was a read error I checked SMART stats for that osd's disk and sure 
>> enough, it had some uncorrected read errors. In order to stop it from 
>> causing more problems I stopped the daemon to let ceph recover from the 
>> other osds. The cluster has now finished rebalancing, but remains in ERR 
>> state as it still thinks this pg is inconsistent.
>
> It should clear up after you trigger another deep-scrub on that PG.
>
> Cheers, Dan



[ceph-users] PG stuck inconsistent, but appears ok?

2017-07-13 Thread Aaron Bassett
Good Morning,
I have an odd situation where a pg is listed inconsistent, but rados is 
struggling to tell me about it:

# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 requests are blocked > 32 sec; 1 osds have 
slow requests; 1 scrub errors
pg 22.1611 is active+clean+inconsistent, acting 
[294,1080,970,324,722,70,949,874,943,606,518]
1 scrub errors

# rados list-inconsistent-pg .us-smr.rgw.buckets
["22.1611"]

# rados list-inconsistent-obj 22.1611
[]error 2: (2) No such file or directory

A little background, I got into this state because the inconsistent pg popped 
up in ceph -s. I used list-inconsistent-obj to find which osd was causing the 
problem:

{
    "osd": 497,
    "missing": false,
    "read_error": true,
    "data_digest_mismatch": false,
    "omap_digest_mismatch": false,
    "size_mismatch": false,
    "size": 599488
},


Because it was a read error I checked SMART stats for that osd's disk and sure 
enough, it had some uncorrected read errors. In order to stop it from causing 
more problems I stopped the daemon to let ceph recover from the other osds. The 
cluster has now finished rebalancing, but remains in ERR state as it still 
thinks this pg is inconsistent.

ceph pg query output is here: https://hastebin.com/mamesokexa.cpp

Thanks,
Aaron
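
For reference, this is roughly how I picked the bad osd out of the shard list. It is a minimal sketch that assumes each shard entry has the boolean fields shown in the snippet above; the helper name and the second (clean) shard in the example are made up for illustration.

```python
# Boolean flags as they appear in the shard entry quoted above.
ERROR_FLAGS = (
    "missing",
    "read_error",
    "data_digest_mismatch",
    "omap_digest_mismatch",
    "size_mismatch",
)


def shards_with_errors(shards):
    """Map osd id -> list of error flags set to true for that shard."""
    report = {}
    for shard in shards:
        flagged = [flag for flag in ERROR_FLAGS if shard.get(flag)]
        if flagged:
            report[shard["osd"]] = flagged
    return report


if __name__ == "__main__":
    shards = [
        {"osd": 497, "missing": False, "read_error": True,
         "data_digest_mismatch": False, "omap_digest_mismatch": False,
         "size_mismatch": False, "size": 599488},
        # Hypothetical healthy shard, added only for contrast.
        {"osd": 12, "missing": False, "read_error": False,
         "data_digest_mismatch": False, "omap_digest_mismatch": False,
         "size_mismatch": False, "size": 599488},
    ]
    print(shards_with_errors(shards))
```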


Re: [ceph-users] RGW/Civet: Reads too much data when client doesn't close the connection

2017-07-12 Thread Aaron Bassett
Yup already working on fixing the client, but it seems like a potentially nasty 
issue for RGW, as a malicious client could potentially DOS an endpoint pretty 
easily this way.

Aaron

> On Jul 12, 2017, at 11:48 AM, Jens Rosenboom <j.rosenb...@x-ion.de> wrote:
>
> 2017-07-12 15:23 GMT+00:00 Aaron Bassett <aaron.bass...@nantomics.com>:
>> I have a situation where a client is GET'ing a large key (100GB) from 
>> RadosGW and just reading the first few bytes to determine if it's a gzip 
>> file or not, and then just moving on without closing the connection. 
>> RadosGW then goes on to read the rest of the object out of the cluster, 
>> while sending nothing to the client as it's no longer listening. When this 
>> client does this to many objects in quick succession, it essentially creates 
>> a DOS on my cluster as all my rgws are reading out of the cluster as fast as 
>> they can but not sending the data anywhere. This is on an up to date Jewel 
>> cluster, using civetweb for the web server.
>>
>> I just wanted to reach out and see if anyone else has seen this before I dig 
>> in more and try to find more details about where the problem may lie.
>
> I would say your client is broken, if it is only interested in a range
> of the object, it should include a corresponding range header with the
> GET request.
>
> Though I agree that the behaviour for closed connections could
> probably be improved, too. See http://tracker.ceph.com/issues/20166 for a
> similar issue, something like the opposite of your case.
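
The Range-header approach Jens describes can be sketched on the client side as follows. This is only an illustration, assuming the object is reachable through a plain URL (for example a presigned one) so that no request signing is needed; the function names are invented for the example. It asks for the first two bytes only, compares them with the gzip magic number 0x1f 0x8b, and closes the connection when done.

```python
import urllib.request

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def range_request(url, first=0, last=1):
    """Build a GET request that asks only for bytes first..last (inclusive)."""
    request = urllib.request.Request(url)
    request.add_header("Range", "bytes=%d-%d" % (first, last))
    return request


def looks_like_gzip(prefix):
    """True if the given leading bytes start with the gzip magic number."""
    return prefix[:2] == GZIP_MAGIC


def is_gzip_object(url):
    """Fetch two bytes of the object and close the connection afterwards."""
    with urllib.request.urlopen(range_request(url)) as response:
        return looks_like_gzip(response.read(2))
```

With the range in place the gateway only has to read the requested bytes from the cluster, and the "with" block makes sure the connection is closed rather than abandoned.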



[ceph-users] RGW/Civet: Reads too much data when client doesn't close the connection

2017-07-12 Thread Aaron Bassett
I have a situation where a client is GET'ing a large key (100GB) from RadosGW 
and just reading the first few bytes to determine if it's a gzip file or not, 
and then just moving on without closing the connection. RadosGW then goes 
on to read the rest of the object out of the cluster, while sending nothing to 
the client as it's no longer listening. When this client does this to many 
objects in quick succession, it essentially creates a DOS on my cluster as all 
my rgws are reading out of the cluster as fast as they can but not sending the 
data anywhere. This is on an up to date Jewel cluster, using civetweb for the 
web server.

I just wanted to reach out and see if anyone else has seen this before I dig in 
more and try to find more details about where the problem may lie.

Aaron



Re: [ceph-users] Ceph with Clos IP fabric

2017-04-24 Thread Aaron Bassett
Agreed. In an ideal world I would have interleaved all my compute, long term 
storage and processing posix. Unfortunately, business doesn't always work out 
so nicely so I'm left with buying and building out to match changing needs. In 
this case we are a small part of a larger org and have been allocated X racks 
in the cage, which is at this point land locked with no room to expand so it is 
actual floor space that's limited. Hence the necessity to go as dense as 
possible when adding any new capacity. Luckily ceph is flexible enough to 
function fine when deployed like an EMC solution, it's just muuuch cheaper and 
more fun to operate!

Aaron

On Apr 24, 2017, at 12:59 AM, Richard Hesse <richard.he...@weebly.com> wrote:

It's not a requirement to build out homogeneous racks of ceph gear. Most larger 
places don't do that (it creates weird hot spots).  If you have 5 racks of 
gear, you're better off spreading out servers in those 5 than just a pair of 
racks that are really built up. In Aaron's case, he can easily do that since 
he's not using a cluster network.

Just be sure to dial in your crush map and failure domains with only a pair of 
installed cabinets.

Thanks for sharing Christian! It's always good to hear about how others are 
using and deploying Ceph, while coming to similar and different conclusions.

Also, when you say datacenter space is expensive, are you referring to power or 
actual floor space? Datacenter space is almost always sold by power and floor 
space is usually secondary. Are there markets where that's opposite? If so, 
those are ripe for new entrants!


On Apr 23, 2017 7:56 PM, "Christian Balzer" <ch...@gol.com> wrote:

Hello,

Aaron pretty much stated most of what I was going to write, but to
generalize things and make some points more obvious, I shall pipe up as
well.

On Sat, 22 Apr 2017 21:45:58 -0700 Richard Hesse wrote:

> Out of curiosity, why are you taking a scale-up approach to building your
> ceph clusters instead of a scale-out approach? Ceph has traditionally been
> geared towards a scale-out, simple shared nothing mindset.

While true, scale-out does come at a cost:
a) rack space, which is mighty expensive where we want/need to be and also
of limited availability in those locations.
b) increased costs by having more individual servers, as in having two
servers with 6 OSDs versus 1 with 12 OSDs will cost you about 30-40% more
at the least (chassis, MB, PSU, NIC).

And then there is the whole scale thing in general, I'm getting the
impression that the majority of Ceph users have small to at best medium
sized clusters, simply because they don't need all that much capacity (in
terms of storage space).

Case in point, our main production Ceph clusters fit into 8-10U with 3-4
HDD based OSD servers and 2-4 SSD based cache tiers, obviously at this
size with everything being redundant (switches, PDU, PSU).
Serving hundreds (nearly 600 atm) of VMs, with a planned peak around
800 VMs.
That Ceph cluster will never have to grow beyond this size.
For me Ceph (RBD) was/is a more scalable approach than DRBD, allowing for
n+1 compute node deployments instead of having pairs (where one can't live
migrate to outside of this pair).

> These dual ToR
> deploys remind me of something from EMC, not ceph. Really curious as I'd
> rather have 5-6 racks of single ToR switches as opposed to three racks of
> dual ToR. Is there a specific application or requirement? It's definitely
> adding a lot of complexity; just wondering what the payoff is.
>

If you have plenty of racks, bully for you.
Though personally I'd try to keep failure domains (especially when they
are as large as full rack!) to something like 10% of the cluster.
We're not using Ethernet for the Ceph network (IPoIB), but if we were it
would be dual TORS with MC-LAG (and dual PSU, PDU) all the way.
Why have a SPOF that WILL impact your system (a rack worth of data
movement) in the first place?

Regards,

Christian

> Also, why are you putting your "cluster network" on the same physical
> interfaces but on separate VLANs? Traffic shaping/policing? What's your
> link speed there on the hosts? (25/40gbps?)
>
> On Sat, Apr 22, 2017 at 12:13 PM, Aaron Bassett <aaron.bass...@nantomics.com>
> wrote:
>
> > FWIW, I use a CLOS fabric with layer 3 right down to the hosts and
> > multiple ToRs to enable HA/ECMP to each node. I'm using Cumulus Linux's
> > "redistribute neighbor" feature, which advertises a /32 for any ARP'ed
> > neighbor. I set up the hosts with an IP on each physical interface and on
> > an aliased loopback: lo:0. I handle the separate cluster network by adding
> > a vlan to each interface and routing those separately on the ToRs with acls
> > to keep traffic apart.
> 

Re: [ceph-users] Ceph with Clos IP fabric

2017-04-23 Thread Aaron Bassett
We have space limitations in our DCs and so have to build as densely as 
possible. These clusters are two racks of 500 osds each, though there is more 
hardware en route to start scaling them out. With just two racks, the risk of 
losing a ToR and taking down the cluster was enough to justify the slight added 
complexity of extra ToRs to ensure we have HA at that point in the 
architecture. It's not adding that much complexity, as it's all handled by 
configuration management once you get the kinks worked out the first time. We 
use this architecture throughout our networks, so running it for ceph is not 
any different than running it for any of our other services. I find it to be 
less complex and easier to debug than doing an MLAG setup as well.

We are currently running hosts with dual 10G nics, one to each ToR, but are 
evaluating 25 or 40 for upcoming deploys.

Once we gain confidence in ceph to expand beyond a couple thousand osds in a 
cluster, I will certainly look to simplify by cutting down to one 
higher-throughput ToR per rack.

The logical public/private separation is to keep the traffic on a separate 
network and for ease of monitoring.

Aaron

On Apr 23, 2017, at 12:45 AM, Richard Hesse <richard.he...@weebly.com> wrote:

Out of curiosity, why are you taking a scale-up approach to building your ceph 
clusters instead of a scale-out approach? Ceph has traditionally been geared 
towards a scale-out, simple shared nothing mindset. These dual ToR deploys 
remind me of something from EMC, not ceph. Really curious as I'd rather have 
5-6 racks of single ToR switches as opposed to three racks of dual ToR. Is 
there a specific application or requirement? It's definitely adding a lot of 
complexity; just wondering what the payoff is.

Also, why are you putting your "cluster network" on the same physical 
interfaces but on separate VLANs? Traffic shaping/policing? What's your link 
speed there on the hosts? (25/40gbps?)

On Sat, Apr 22, 2017 at 12:13 PM, Aaron Bassett <aaron.bass...@nantomics.com> wrote:
FWIW, I use a CLOS fabric with layer 3 right down to the hosts and multiple 
ToRs to enable HA/ECMP to each node. I'm using Cumulus Linux's "redistribute 
neighbor" feature, which advertises a /32 for any ARP'ed neighbor. I set up the 
hosts with an IP on each physical interface and on an aliased loopback: lo:0. 
I handle the separate cluster network by adding a vlan to each interface and 
routing those separately on the ToRs with acls to keep traffic apart.

Their documentation may help clarify a bit:
https://docs.cumulusnetworks.com/display/DOCS/Redistribute+Neighbor#RedistributeNeighbor-ConfiguringtheHost(s)

Honestly the trickiest part is getting the routing on the hosts right, you 
essentially set static routes over each link and the kernel takes care of the 
ECMP.

I understand this is a bit different from your setup, but Ceph has no trouble 
at all with the IPs on multiple interfaces.

Aaron

Date: Sat, 22 Apr 2017 17:37:01 +
From: Maxime Guyot <maxime.gu...@elits.com>
To: Richard Hesse <richard.he...@weebly.com>, Jan Marquardt <j...@artfiles.de>
Cc: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Ceph with Clos IP fabric
Message-ID: <919c8615-c50b-4611-9b6b-13b4fbf69...@elits.com>
Content-Type: text/plain; charset="utf-8"

Hi,

That only makes sense if you're running multiple ToR switches per rack for the 
public leaf network. Multiple public ToR switches per rack is not very common; 
most Clos crossbar networks run a single ToR switch. Several guides on the 
topic (including Arista & Cisco) suggest that you use something like MLAG in a 
layer 2 domain between the switches if you need some sort of switch redundancy 
inside the rack. This increases complexity, and most people decide that it's 
not worth it and instead scale out across racks to gain the redundancy and 
survivability that multiple ToR offer.
If you use MLAG for L2 redundancy, you'll still want 2 BGP sessions for L3 
redundancy, so why not skip the MLAG altogether and terminate your BGP 
session on each ToR?

Judging by the routes (169.254.0.1), you are using BGP unnumbered?

It sounds like the ?ip route get? outp

Re: [ceph-users] Ceph with Clos IP fabric

2017-04-22 Thread Aaron Bassett
FWIW, I use a CLOS fabric with layer 3 right down to the hosts and multiple 
ToRs to enable HA/ECMP to each node. I'm using Cumulus Linux's "redistribute 
neighbor" feature, which advertises a /32 for any ARP'ed neighbor. I set up the 
hosts with an IP on each physical interface and on an aliased loopback: lo:0. 
I handle the separate cluster network by adding a vlan to each interface and 
routing those separately on the ToRs with acls to keep traffic apart.

Their documentation may help clarify a bit:
https://docs.cumulusnetworks.com/display/DOCS/Redistribute+Neighbor#RedistributeNeighbor-ConfiguringtheHost(s)

Honestly the trickiest part is getting the routing on the hosts right, you 
essentially set static routes over each link and the kernel takes care of the 
ECMP.

I understand this is a bit different from your setup, but Ceph has no trouble 
at all with the IPs on multiple interfaces.

Aaron

Date: Sat, 22 Apr 2017 17:37:01 +
From: Maxime Guyot
To: Richard Hesse, Jan Marquardt
Cc: "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] Ceph with Clos IP fabric
Message-ID: 
<919c8615-c50b-4611-9b6b-13b4fbf69...@elits.com>
Content-Type: text/plain; charset="utf-8"

Hi,

That only makes sense if you're running multiple ToR switches per rack for the 
public leaf network. Multiple public ToR switches per rack is not very common; 
most Clos crossbar networks run a single ToR switch. Several guides on the 
topic (including Arista & Cisco) suggest that you use something like MLAG in a 
layer 2 domain between the switches if you need some sort of switch redundancy 
inside the rack. This increases complexity, and most people decide that it's 
not worth it and instead scale out across racks to gain the redundancy and 
survivability that multiple ToR offer.
If you use MLAG for L2 redundancy, you'll still want 2 BGP sessions for L3 
redundancy, so why not skip the MLAG altogether and terminate your BGP 
session on each ToR?

Judging by the routes (169.254.0.1), you are using BGP unnumbered?

It sounds like the "ip route get" output you get when using dummy0 is caused by 
a fallback on the default route, supposedly on eth0? Can you check the exact routes 
received on server1 with "show ip bgp neighbors  received-routes" 
once you enable "neighbor  soft-reconfiguration inbound" and what's 
installed in the table ("ip route")?


Intrigued by this problem, I tried to reproduce it in a lab with virtualbox. I 
ran into the same problem.

Side note: Configuring the loopback IP on the physical interfaces is workable 
if you set it on **all** parallel links. Example with server1:

iface enp3s0f0 inet static
 address 10.10.100.21/32
iface enp3s0f1 inet static
 address 10.10.100.21/32
iface enp4s0f0 inet static
 address 10.10.100.21/32
iface enp4s0f1 inet static
 address 10.10.100.21/32

This should guarantee that the loopback ip is advertised if one of the 4 links 
to switch1 and switch2 is up, but I am not sure if that's workable for ceph's 
listening address.


Cheers,
Maxime

From: Richard Hesse
Date: Thursday 20 April 2017 16:36
To: Maxime Guyot
Cc: Jan Marquardt, "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] Ceph with Clos IP fabric

On Thu, Apr 20, 2017 at 2:13 AM, Maxime Guyot wrote:
2) Why did you choose to run the ceph nodes on loopback interfaces as opposed 
to the /24 for the "public" interface?
I can't speak for this example, but in a clos fabric you generally want to 
assign the routed IPs on loopback rather than physical interfaces. This way if 
one of the links goes down (e.g. the public interface), the routed IP is still 
advertised on the other link(s).

That only makes sense if you're running multiple ToR switches per rack for the 
public leaf network. Multiple public ToR switches per rack is not very common; 
most Clos crossbar networks run a single ToR switch. Several guides on the 
topic (including Arista & Cisco) suggest that you use something like MLAG in a 
layer 2 domain between the switches if you need some sort of switch redundancy 
inside the rack. This increases complexity, and most people decide that it's 
not worth it and instead scale out across racks to gain the redundancy and 
survivability that multiple ToR offer.

On Thu, Apr 20, 2017 at 4:04 AM, Jan Marquardt 

[ceph-users] RadosGW slow gc

2015-01-01 Thread Aaron Bassett
I’m doing some load testing on radosgw to get ready for production and I had a 
problem with it stalling out. I had 100 cores from several nodes doing 
multipart uploads in parallel. This ran great for about two days, managing to 
upload about 2000 objects with an average size of 100GB. Then it stalled out 
and stopped. Ever since then, the gw has been gc’ing very slowly. During the 
upload run, it was creating objects at ~ 100/s, now it’s cleaning them at ~3/s. 
At this rate it won't be done for nearly a year and this is only a fraction of 
the data I need to put in. 

The pool I’m writing to is a cache pool at size 2 with an EC pool at 10+2 
behind it. (This data is not mission critical so we are trying to save space). 
I don’t know if this will affect the slow gc or not. 

I tried turning up rgw gc max objs to 256, but it didn’t seem to make a 
difference.

I’m working under the assumption that my uploads started stalling because too 
many un-gc’ed parts accumulated, but I may be way off base there. 

Any thoughts would be much appreciated, Aaron 


[ceph-users] Balancing erasure crush rule

2014-12-23 Thread Aaron Bassett
I’m trying to set up an erasure coded pool with k=9 m=6 on 13 osd hosts. I’m 
trying to write a crush rule for this which will balance this between hosts as 
much as possible. I understand that having 9+6=15 > 13, I will need to parse 
the tree twice in order to find enough osds. So what I’m trying to do is select 
~1 from each host on the first pass, and then select n more osds to fill it 
out, without using any osds from the first pass, and preferably balancing them 
between racks. 

For starters, I don't know if this is even possible or if it's the right 
approach to what I'm trying to do, but here's my attempt:

rule .us-phx.rgw.buckets.ec {
    ruleset 1
    type erasure
    min_size 3
    max_size 20
    step set_chooseleaf_tries 5
    step take default
    step chooseleaf indep 0 type host
    step emit
    step take default
    step chooseleaf indep 0 type rack
    step emit
}

This gets me pretty close, the first pass works great and the second pass does 
a nice balance between racks, but in my testing ~ 6 out of 1000 pgs will have 
two osds in their group. I'm guessing I need to get down to one pass to make 
sure that doesn't happen, but I'm having a hard time sorting out how to hit the 
requirement of balancing among hosts *and* allowing for more than one osd per 
host. 
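
The constraint being described (15 chunks, 13 hosts) can be worked through with a small calculation. This is only a sketch of the counting argument, not of CRUSH itself; `chunk_spread` and `max_host_failures` are hypothetical helper names, and the "most even spread" is an idealised placement that a real rule may or may not achieve.

```python
def chunk_spread(k, m, hosts):
    """Most even way to place k+m chunks on `hosts` hosts (chunks per host)."""
    base, extra = divmod(k + m, hosts)
    # `extra` hosts must carry one chunk more than the rest.
    return [base + 1] * extra + [base] * (hosts - extra)


def max_host_failures(k, m, hosts):
    """Host failures survivable when the fullest hosts fail first."""
    lost = 0
    failures = 0
    for chunks in sorted(chunk_spread(k, m, hosts), reverse=True):
        if lost + chunks > m:  # more than m lost chunks is unrecoverable
            break
        lost += chunks
        failures += 1
    return failures


spread = chunk_spread(9, 6, 13)
print(spread.count(2), "hosts hold two chunks")
print(max_host_failures(9, 6, 13), "worst-case host failures tolerated")
```

With k=9, m=6 on 13 hosts, at least two hosts have to hold two chunks each, so the best a rule can do is keep the doubling down to exactly two hosts per pg.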

Thanks, Aaron 


Re: [ceph-users] Balancing erasure crush rule

2014-12-23 Thread Aaron Bassett
After some more work I realized that didn't get me closer at all. It was still 
only selecting 13 osds *and* still occasionally re-selecting the same one. I 
think the multiple emit/takes aren't working like I expect. Given:
 step take default
 step chooseleaf indep 0 type host
 step emit
 step take default
 step chooseleaf indep 0 type host
 step emit
In a rule, I would expect it to try to select ~1 osd per host once, and then 
start over again. Instead, what I'm seeing is it selects ~1 osd per host and 
then when it starts again, it re-selects those same osds, resulting in multiple 
placements on 2 or 3 osds per pg.
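
One way to picture the behaviour described here is a toy model in which each pass is a pure function of the pg id and the host. This is NOT the CRUSH algorithm; `pick`, `one_pass`, and the 13-host/4-osd layout are invented for illustration, under the assumption that the second pass is fed the same inputs as the first.

```python
import hashlib


def pick(pg_id, bucket, candidates):
    """Deterministically pick one candidate from the (pg_id, bucket) pair."""
    digest = hashlib.sha256(f"{pg_id}:{bucket}".encode()).digest()
    return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]


def one_pass(pg_id, hosts):
    """Select one osd per host; the result depends only on pg_id and host."""
    return [pick(pg_id, host, osds) for host, osds in sorted(hosts.items())]


# Hypothetical layout: 13 hosts with 4 osds each.
HOSTS = {f"host{h}": [f"osd.{h * 4 + i}" for i in range(4)] for h in range(13)}

first = one_pass(42, HOSTS)
second = one_pass(42, HOSTS)     # same inputs, so the same osds come back
combined = (first + second)[:15]  # fill the 15 slots needed for k=9 m=6

print(len(combined), "slots,", len(set(combined)), "distinct osds")
```

In this model the second pass can never contribute a new osd, so the 15 slots contain only 13 distinct osds. Avoiding that would require the second pass to know what the first one chose, or to draw from a different input.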

It turns out what I'm trying to do is described here:
https://www.mail-archive.com/ceph-users%40lists.ceph.com/msg01076.html
But I can't find any other references to anything like this. 

Thanks, Aaron
 
 On Dec 23, 2014, at 9:23 AM, Aaron Bassett aa...@five3genomics.com wrote:
 
 I’m trying to set up an erasure coded pool with k=9 m=6 on 13 osd hosts. I’m 
 trying to write a crush rule for this which will balance this between hosts 
 as much as possible. I understand that since 9+6=15 > 13, I will need to 
 parse the tree twice in order to find enough osds. So what I’m trying to do is 
 select ~1 from each host on the first pass, and then select n more osds to 
 fill it out, without using any osds from the first pass, and preferably 
 balancing them between racks. 
 
 For starters, I don't know if this is even possible or if it's the right 
 approach to what I'm trying to do, but here's my attempt:
 
 rule .us-phx.rgw.buckets.ec {
ruleset 1
type erasure
min_size 3
max_size 20
step set_chooseleaf_tries 5
step take default
step chooseleaf indep 0 type host
step emit
step take default
step chooseleaf indep 0 type rack
step emit
 }
 
 This gets me pretty close, the first pass works great and the second pass 
 does a nice balance between racks, but in my testing ~ 6 out of 1000 pgs will 
 have two osds in their group. I'm guessing I need to get down to one pass to 
 make sure that doesn't happen, but I'm having a hard time sorting out how to 
 hit the requirement of balancing among hosts *and* allowing for more than one 
 osd per host. 
 
 Thanks, Aaron



Re: [ceph-users] Incomplete PGs

2014-12-04 Thread Aaron Bassett
I have a small update to this: 

After an even closer reading of an offending pg's query I noticed the following:

peer: 4,
pgid: 19.6e,
last_update: 51072'48910307,
last_complete: 51072'48910307,
log_tail: 50495'48906592,

The log tail seems to have lagged behind the last_update/last_complete. I 
suspect this is what's causing the cluster to reject these pgs. Anyone know how 
I can go about cleaning this up?
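
The values in the query are `epoch'version` pairs, so the size of the gap can be computed directly. A minimal sketch follows; `Eversion`, `parse_eversion`, and `log_span` are names made up here, not part of any Ceph tool. For pg 0.63b, quoted later in this thread, the same difference comes out equal to the reported log_size of 3001, which hints that the gap may simply be the length of the retained pg log rather than a lag. That reading is an assumption, not something the query output confirms.

```python
from typing import NamedTuple


class Eversion(NamedTuple):
    epoch: int
    version: int


def parse_eversion(text: str) -> Eversion:
    """Split a value such as 51072'48910307 into (epoch, version)."""
    epoch, version = text.split("'")
    return Eversion(int(epoch), int(version))


def log_span(log_tail: str, last_update: str) -> int:
    """Number of versions between the log tail and the last update."""
    return parse_eversion(last_update).version - parse_eversion(log_tail).version


# Peer 4 of pg 19.6e, from the query above.
print(log_span("50495'48906592", "51072'48910307"))
# pg 0.63b, from the original message in this thread (log_size: 3001).
print(log_span("20346'5245", "50495'8246"))
```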


Aaron 

 On Dec 1, 2014, at 8:12 PM, Aaron Bassett aa...@five3genomics.com wrote:
 
 Hi all, I have a problem with some incomplete pgs. Here’s the backstory: I 
 had a pool that I had accidently left with a size of 2. On one of the ods 
 nodes, the system hdd started to fail and I attempted to rescue it by 
 sacrificing one of my osd nodes. That went ok and I was able to bring the 
 node back up minus the one osd. Now I have 11 incomplete osds. I believe 
 these are mostly from the pool that only had size two, but I cant tell for 
 sure. I found another thread on here that talked about using 
 ceph_objectstore_tool to add or remove pg data to get out of an incomplete 
 state. 
 
 Let’s start with the one pg I’ve been playing with, this is a loose 
 description of where I’ve been. First I saw that it had the missing osd in 
 “down_osds_we_would_probe” when I queried it, and some reading around that 
 told me to recreate the missing osd, so I did that. It (obviously) didnt have 
 the missing data, but it took the pg from down+incomplete to just incomplete. 
 Then I tried pg_force_create and that didnt seem to make a difference. Some 
 more googling then brought me to ceph_objectstore_tool and I started to take 
 a closer look at the results from pg query. I noticed that the list of 
 probing osds gets longer and longer till the end of the query has something 
 like:
 
 probing_osds: [
   0,
   3,
   4,
   16,
   23,
   26,
   35,
   41,
   44,
   51,
   56],
 
 So I took a look at those osds and noticed that some of them have data in the 
 directory for the troublesome pg and others dont. So I tried picking one with 
 the *most* data and i used ceph_objectstore_tool to export the pg. It was  
 6G so a fair amount of data is still there. I then imported it (after 
 removing) into all the others in that list. Unfortunately, it is still 
 incomplete. I’m not sure what my next step should be here. Here’s some other 
 stuff from the query on it:
 
 info: { pgid: 0.63b,
last_update: 50495'8246,
last_complete: 50495'8246,
log_tail: 20346'5245,
last_user_version: 8246,
last_backfill: MAX,
purged_snaps: [],
history: { epoch_created: 1,
last_epoch_started: 51102,
last_epoch_clean: 50495,
last_epoch_split: 0,
same_up_since: 68312,
same_interval_since: 68312,
same_primary_since: 68190,
last_scrub: 28158'8240,
last_scrub_stamp: 2014-11-18 17:08:49.368486,
last_deep_scrub: 28158'8240,
last_deep_scrub_stamp: 2014-11-18 17:08:49.368486,
last_clean_scrub_stamp: 2014-11-18 17:08:49.368486},
stats: { version: 50495'8246,
reported_seq: 84279,
reported_epoch: 69394,
state: down+incomplete,
last_fresh: 2014-12-01 23:23:07.355308,
last_change: 2014-12-01 21:28:52.771807,
last_active: 2014-11-24 13:37:09.784417,
last_clean: 2014-11-22 21:59:49.821836,
last_became_active: 0.00,
last_unstale: 2014-12-01 23:23:07.355308,
last_undegraded: 2014-12-01 23:23:07.355308,
last_fullsized: 2014-12-01 23:23:07.355308,
mapping_epoch: 68285,
log_start: 20346'5245,
ondisk_log_start: 20346'5245,
created: 1,
last_epoch_clean: 50495,
parent: 0.0,
parent_split_bits: 0,
last_scrub: 28158'8240,
last_scrub_stamp: 2014-11-18 17:08:49.368486,
last_deep_scrub: 28158'8240,
last_deep_scrub_stamp: 2014-11-18 17:08:49.368486,
last_clean_scrub_stamp: 2014-11-18 17:08:49.368486,
log_size: 3001,
ondisk_log_size: 3001,
 
 
 Also in the peering section, all the peers now have the same last_update: 
 which makes me think it should just pick up and take off. 
 
 There is another think I’m having problems with and I’m not sure if it’s 
 related or not. I set a crush map manually as I have a mix of ssd and platter 
 osds and it seems to work when I set it, the cluster starts rebalancing, etc, 
 but if I do a restart ceph-all on all my nodes the crush maps seems to revert 
 to the one I didn’t set. I don’t know if its being blocked from taking by 
 these incomplete pgs or if I’m missing a step to get it to “stick” It makes 
 me think when I’m stopping and starting these osds to use 
 ceph_objectstore_tool on them they may be getting out of sync with the 
 cluster.
 
 Any insights would be greatly appreciated,
 
 Aaron 
 


[ceph-users] Incomplete PGs

2014-12-01 Thread Aaron Bassett
Hi all, I have a problem with some incomplete pgs. Here’s the backstory: I had 
a pool that I had accidentally left with a size of 2. On one of the osd nodes, 
the system hdd started to fail and I attempted to rescue it by sacrificing one 
of my osd nodes. That went ok and I was able to bring the node back up minus 
the one osd. Now I have 11 incomplete pgs. I believe these are mostly from the 
pool that only had size two, but I can’t tell for sure. I found another thread 
on here that talked about using ceph_objectstore_tool to add or remove pg data 
to get out of an incomplete state. 
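
For reference, the export / remove / import operations of that tool can be assembled as below. This is a sketch that only builds the argument lists and does not run anything; the helper name `objectstore_tool_argv`, the osd ids, the dump file name, and the `/var/lib/ceph/osd/ceph-<id>` layout are assumptions for illustration. The osd daemon has to be stopped while the tool works on its store.

```python
def objectstore_tool_argv(op, osd_id, pgid=None, dump_file=None):
    """Build a ceph_objectstore_tool command line for one stopped osd."""
    data_path = f"/var/lib/ceph/osd/ceph-{osd_id}"  # assumed default layout
    argv = [
        "ceph_objectstore_tool",
        "--data-path", data_path,
        "--journal-path", f"{data_path}/journal",
        "--op", op,
    ]
    if pgid is not None:
        argv += ["--pgid", pgid]
    if dump_file is not None:
        argv += ["--file", dump_file]
    return argv


# Export the pg from the osd holding the most data (osd ids are hypothetical),
# then remove the stale copy on another osd and import the export there.
export_cmd = objectstore_tool_argv("export", 16, pgid="0.63b", dump_file="0.63b.export")
remove_cmd = objectstore_tool_argv("remove", 23, pgid="0.63b")
import_cmd = objectstore_tool_argv("import", 23, dump_file="0.63b.export")

for cmd in (export_cmd, remove_cmd, import_cmd):
    print(" ".join(cmd))
```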

Let’s start with the one pg I’ve been playing with; this is a loose description 
of where I’ve been. First I saw that it had the missing osd in 
“down_osds_we_would_probe” when I queried it, and some reading around that told 
me to recreate the missing osd, so I did that. It (obviously) didn’t have the 
missing data, but it took the pg from down+incomplete to just incomplete. Then 
I tried pg_force_create and that didn’t seem to make a difference. Some more 
googling then brought me to ceph_objectstore_tool and I started to take a 
closer look at the results from pg query. I noticed that the list of probing 
osds gets longer and longer until the end of the query has something like:

 probing_osds: [
   0,
   3,
   4,
   16,
   23,
   26,
   35,
   41,
   44,
   51,
   56],

So I took a look at those osds and noticed that some of them have data in the 
directory for the troublesome pg and others don’t. So I tried picking the one 
with the *most* data and I used ceph_objectstore_tool to export the pg. It was 
> 6G so a fair amount of data is still there. I then imported it (after 
removing the existing copy) into all the others in that list. Unfortunately, it 
is still incomplete. I’m not sure what my next step should be here. Here’s some 
other stuff from the query on it:

info: { pgid: 0.63b,
last_update: 50495'8246,
last_complete: 50495'8246,
log_tail: 20346'5245,
last_user_version: 8246,
last_backfill: MAX,
purged_snaps: [],
history: { epoch_created: 1,
last_epoch_started: 51102,
last_epoch_clean: 50495,
last_epoch_split: 0,
same_up_since: 68312,
same_interval_since: 68312,
same_primary_since: 68190,
last_scrub: 28158'8240,
last_scrub_stamp: 2014-11-18 17:08:49.368486,
last_deep_scrub: 28158'8240,
last_deep_scrub_stamp: 2014-11-18 17:08:49.368486,
last_clean_scrub_stamp: 2014-11-18 17:08:49.368486},
stats: { version: 50495'8246,
reported_seq: 84279,
reported_epoch: 69394,
state: down+incomplete,
last_fresh: 2014-12-01 23:23:07.355308,
last_change: 2014-12-01 21:28:52.771807,
last_active: 2014-11-24 13:37:09.784417,
last_clean: 2014-11-22 21:59:49.821836,
last_became_active: 0.00,
last_unstale: 2014-12-01 23:23:07.355308,
last_undegraded: 2014-12-01 23:23:07.355308,
last_fullsized: 2014-12-01 23:23:07.355308,
mapping_epoch: 68285,
log_start: 20346'5245,
ondisk_log_start: 20346'5245,
created: 1,
last_epoch_clean: 50495,
parent: 0.0,
parent_split_bits: 0,
last_scrub: 28158'8240,
last_scrub_stamp: 2014-11-18 17:08:49.368486,
last_deep_scrub: 28158'8240,
last_deep_scrub_stamp: 2014-11-18 17:08:49.368486,
last_clean_scrub_stamp: 2014-11-18 17:08:49.368486,
log_size: 3001,
ondisk_log_size: 3001,


Also in the peering section, all the peers now have the same last_update, which 
makes me think it should just pick up and take off. 

There is another thing I’m having problems with and I’m not sure if it’s 
related or not. I set a crush map manually as I have a mix of ssd and platter 
osds and it seems to work when I set it (the cluster starts rebalancing, etc.), 
but if I do a restart ceph-all on all my nodes the crush map seems to revert 
to the one I didn’t set. I don’t know if it’s being blocked from taking effect 
by these incomplete pgs or if I’m missing a step to get it to “stick”. It makes 
me think that when I’m stopping and starting these osds to use 
ceph_objectstore_tool on them they may be getting out of sync with the cluster.

Any insights would be greatly appreciated,

Aaron 



Re: [ceph-users] Federated gateways

2014-11-14 Thread Aaron Bassett
Well I upgraded both clusters to giant this morning just to see if that would 
help, and it didn’t. I have a couple questions though. I have the same 
regionmap on both clusters, with both zones in it, but then I only have the 
buckets and zone info for one zone in each cluster. Is this right? Or do I need 
all the buckets and zones in both clusters? Reading the docs it doesn’t seem 
like I do because I’m expecting data to sync from one zone in one cluster to 
the other zone on the other cluster, but I don’t know what to think anymore. 

Also, do both users need to be system users on both ends? 

Aaron


 On Nov 12, 2014, at 4:00 PM, Craig Lewis cle...@centraldesktop.com wrote:
 
 http://tracker.ceph.com/issues/9206
 
 My post to the ML: http://www.spinics.net/lists/ceph-users/msg12665.html
 
 
 IIRC, the system users didn't see the other user's bucket in a bucket listing, 
 but they could read and write the objects fine.
 
 
 
 On Wed, Nov 12, 2014 at 11:16 AM, Aaron Bassett aa...@five3genomics.com wrote:
 In playing around with this a bit more, I noticed that the two users on the 
 secondary node cant see each others buckets. Is this a problem?
 
 IIRC, the system user couldn't see each other's buckets, but they could read 
 and write the objects. 
 On Nov 11, 2014, at 6:56 PM, Craig Lewis cle...@centraldesktop.com wrote:
 
 I see you're running 0.80.5.  Are you using Apache 2.4?  There is a known 
 issue with Apache 2.4 on the primary and replication.  It's fixed, just 
 waiting for the next firefly release.  Although, that causes 40x errors 
 with Apache 2.4, not 500 errors.
 It is apache 2.4, but I’m actually running 0.80.7 so I probably have that 
 bug fix?
 
 
 No, the unreleased 0.80.8 has the fix.
  
  
 
 Have you verified that both system users can read and write to both 
 clusters?  (Just make sure you clean up the writes to the slave cluster).
 Yes I can write everywhere and radosgw-agent isn’t getting any 403s like it 
 was earlier when I had mismatched keys. The .us-nh.rgw.buckets.index pool is 
 syncing properly, as are the users. It seems like really the only thing that 
 isn’t syncing is the .zone.rgw.buckets pool.
 
 That's pretty much the same behavior I was seeing with Apache 2.4.
 
 Try downgrading the primary cluster to Apache 2.2.  In my testing, the 
 secondary cluster could run 2.2 or 2.4.
 Do you have a link to that bug#? I want to see if it gives me any clues. 
 
 Aaron 
 
 



Re: [ceph-users] Federated gateways

2014-11-12 Thread Aaron Bassett
In playing around with this a bit more, I noticed that the two users on the 
secondary node can’t see each other’s buckets. Is this a problem?
 On Nov 11, 2014, at 6:56 PM, Craig Lewis cle...@centraldesktop.com wrote:
 
 I see you're running 0.80.5.  Are you using Apache 2.4?  There is a known 
 issue with Apache 2.4 on the primary and replication.  It's fixed, just 
 waiting for the next firefly release.  Although, that causes 40x errors with 
 Apache 2.4, not 500 errors.
 It is apache 2.4, but I’m actually running 0.80.7 so I probably have that bug 
 fix?
 
 
 No, the unreleased 0.80.8 has the fix.
  
  
 
 Have you verified that both system users can read and write to both 
 clusters?  (Just make sure you clean up the writes to the slave cluster).
 Yes I can write everywhere and radosgw-agent isn’t getting any 403s like it 
 was earlier when I had mismatched keys. The .us-nh.rgw.buckets.index pool is 
 syncing properly, as are the users. It seems like really the only thing that 
 isn’t syncing is the .zone.rgw.buckets pool.
 
 That's pretty much the same behavior I was seeing with Apache 2.4.
 
 Try downgrading the primary cluster to Apache 2.2.  In my testing, the 
 secondary cluster could run 2.2 or 2.4.
Do you have a link to that bug#? I want to see if it gives me any clues. 

Aaron 



Re: [ceph-users] Federated gateways

2014-11-11 Thread Aaron Bassett
 osd_op_reply(1784 
statelog.obj_opstate.97 [call] v47531'14 uv14 ondisk = 0) v6
2014-11-11 14:37:06.701597 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 queue 
0x7f51b4001460 prio 127
2014-11-11 14:37:06.701627 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.701636 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.701678 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).write_ack 49
2014-11-11 14:37:06.701684 7f54ebfff700  1 -- 172.16.10.103:0/1007381 <== 
osd.25 172.16.10.103:6934/14875 49 ==== osd_op_reply(1784 
statelog.obj_opstate.97 [call] v47531'14 uv14 ondisk = 0) v6 ==== 190+0+0 
(1714651716 0 0) 0x7f51b4001460 con 0x7f53f00053f0
2014-11-11 14:37:06.701710 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.701728 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 
172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 
c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.701751 7f54ebfff700 10 -- 172.16.10.103:0/1007381 
dispatch_throttle_release 190 to dispatch throttler 190/104857600
2014-11-11 14:37:06.701762 7f54ebfff700 20 -- 172.16.10.103:0/1007381 done 
calling dispatch on 0x7f51b4001460
2014-11-11 14:37:06.701815 7f54447f0700  0 WARNING: set_req_state_err err_no=5 
resorting to 500
2014-11-11 14:37:06.701894 7f54447f0700  1 ====== req done req=0x7f546800f3b0 
http_status=500 ======
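
The only error in that excerpt is the `set_req_state_err` warning, so a quick way to triage a large log is to pull those lines out and translate the number. The sketch below assumes that err_no is an ordinary errno value (5 would then be EIO), which the log itself does not confirm; `explain_rgw_warning` is a hypothetical helper, not part of radosgw.

```python
import errno
import re

# Matches the warning text as it appears in the log excerpt above.
PATTERN = re.compile(r"set_req_state_err err_no=(\d+) resorting to (\d+)")


def explain_rgw_warning(line):
    """Return (errno name, http status) for a set_req_state_err line, else None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    err_no = int(match.group(1))
    status = int(match.group(2))
    return errno.errorcode.get(err_no, "unknown"), status


line = ("2014-11-11 14:37:06.701815 7f54447f0700  0 WARNING: "
        "set_req_state_err err_no=5 resorting to 500")
print(explain_rgw_warning(line))
```

If the errno reading holds, the 500 is a fallback status for an error that has no dedicated HTTP mapping, so the interesting part is whichever earlier operation returned the I/O error.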


Any information you could give me would be wonderful as I’ve been banging my 
head against this for a few days. 

Thanks, Aaron 

 On Nov 5, 2014, at 3:02 PM, Aaron Bassett aa...@five3genomics.com wrote:
 
 Ah so I need both users in both clusters? I think I missed that bit, let me 
 see if that does the trick.
 
 Aaron 
 On Nov 5, 2014, at 2:59 PM, Craig Lewis cle...@centraldesktop.com wrote:
 
 One region two zones is the standard setup, so that should be fine.
 
 Is metadata (users and buckets) being replicated, but not data (objects)? 
 
 
 Let's go through a quick checklist:
 - Verify that you enabled log_meta and log_data in the region.json for the 
   master zone.
 - Verify that RadosGW is using your region map with radosgw-admin regionmap 
   get --name client.radosgw.name
 - Verify that RadosGW is using your zone map with radosgw-admin zone get 
   --name client.radosgw.name
 - Verify that all the pools in your zone exist (RadosGW only auto-creates the 
   basic ones).
 - Verify that your system users exist in both zones with the same access and 
   secret.
 Hopefully that gives you an idea what's not working correctly.  
 
 If it doesn't, crank up the logging on the radosgw daemon on both sides, and 
 check the logs.  Add debug rgw = 20 to both ceph.conf (in the 
 client.radosgw.name section), and restart.  Hopefully those logs will tell 
 you what's wrong.
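
The first item of the checklist above can be checked mechanically once the region has been dumped to JSON. A minimal sketch, assuming the region document carries a `zones` list whose entries have `log_meta` and `log_data` fields; the sample region, its zone names, and the helper `zones_missing_logging` are invented for illustration.

```python
import json


def zones_missing_logging(region):
    """List (zone name, flag) pairs where log_meta or log_data is not enabled."""
    missing = []
    for zone in region.get("zones", []):
        for flag in ("log_meta", "log_data"):
            # Accept both JSON booleans and the string form "true".
            if str(zone.get(flag, "false")).lower() != "true":
                missing.append((zone.get("name"), flag))
    return missing


# Hypothetical region document with one zone missing data logging.
SAMPLE_REGION = json.loads("""
{
  "name": "us",
  "master_zone": "us-east",
  "zones": [
    {"name": "us-east", "log_meta": "true", "log_data": "true"},
    {"name": "us-west", "log_meta": "true", "log_data": "false"}
  ]
}
""")

print(zones_missing_logging(SAMPLE_REGION))
```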
 
 
 On Wed, Nov 5, 2014 at 11:39 AM, Aaron Bassett aa...@five3genomics.com wrote:
 Hello everyone, 
 I am attempted to setup a two cluster situation for object storage disaster 
 recovery. I have two physically separate sites so using 1 big cluster isn’t 
 an option. I’m attempting to follow the guide at: 
 http://ceph.com/docs/v0.80.5/radosgw/federated-config/ 
 http://ceph.com/docs/v0.80.5/radosgw/federated-config/ . After a couple 
 days of flailing, I’ve settled on using 1 region with two zones, where each 
 cluster is a zone. I’m now attempting to set up an agent as per the 
 “Multi-Site Data Replication section. The agent kicks off ok and starts 
 making all sorts of connections, but no objects were being copied to the 
 non-master zone. I re-ran the agent with the -v flag and saw a lot of:
 
 DEBUG:urllib3.connectionpool:GET 
 /admin/opstate?client-id=radosgw-agentobject=test%2F_shadow_.JjVixjWmebQTrRed36FL6D0vy2gDVZ__39op-id=phx-r1-head1%3A2451615%3A1
  HTTP/1.1 200 None  

 DEBUG:radosgw_agent.worker:op state is []
   
 DEBUG:radosgw_agent.worker:error geting op state: list index out of range
   
 
 So it appears something is still wrong with my agent though I have no idea 
 what. I can’t seem to find any errors in any other logs. Does anyone have 
 any insight here? 
 
 I’m also wondering if what I’m attempting with two cluster in the same 
 region as separate zones makes sense

Re: [ceph-users] Federated gateways

2014-11-11 Thread Aaron Bassett

 On Nov 11, 2014, at 4:21 PM, Craig Lewis cle...@centraldesktop.com wrote:
 
 Is that radosgw log from the primary or the secondary zone?  Nothing in that 
 log jumps out at me.
This is the log from the secondary zone. That HTTP 500 response code coming 
back is the only problem I can find. There are a bunch of 404s from other 
requests to logs and stuff, but I assume those are normal because there’s no 
activity going on. I guess it’s just that cryptic “WARNING: set_req_state_err 
err_no=5 resorting to 500” line that’s the problem. I think I need to get a 
stack trace from that somehow. 

 I see you're running 0.80.5.  Are you using Apache 2.4?  There is a known 
 issue with Apache 2.4 on the primary and replication.  It's fixed, just 
 waiting for the next firefly release.  Although, that causes 40x errors with 
 Apache 2.4, not 500 errors.
It is apache 2.4, but I’m actually running 0.80.7 so I probably have that bug 
fix?

 
 Have you verified that both system users can read and write to both clusters? 
  (Just make sure you clean up the writes to the slave cluster).
Yes I can write everywhere and radosgw-agent isn’t getting any 403s like it was 
earlier when I had mismatched keys. The .us-nh.rgw.buckets.index pool is 
syncing properly, as are the users. It seems like really the only thing that 
isn’t syncing is the .zone.rgw.buckets pool.

Thanks, Aaron 
 
 
 
 
 On Tue, Nov 11, 2014 at 6:51 AM, Aaron Bassett aa...@five3genomics.com wrote:
 Ok I believe I’ve made some progress here. I have everything syncing *except* 
 data. The data is getting 500s when it tries to sync to the backup zone. I 
 have a log from the radosgw with debug cranked up to 20:
 
 2014-11-11 14:37:06.688331 7f54447f0700  1 ====== starting new request 
 req=0x7f546800f3b0 =====
 2014-11-11 14:37:06.688978 7f54447f0700  0 WARNING: couldn't find acl header 
 for bucket, generating default
 2014-11-11 14:37:06.689358 7f54447f0700  1 -- 172.16.10.103:0/1007381 --> 
 172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1783 
 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) 
 v4 -- ?+0 0x7f534800d770 con 0x7f53f00053f0
 2014-11-11 14:37:06.689396 7f54447f0700 20 -- 172.16.10.103:0/1007381 
 submit_message osd_op(client.5673295.0:1783 
 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) 
 v4 remote, 172.16.10.103:6934/14875, have pipe.
 2014-11-11 14:37:06.689481 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
 2014-11-11 14:37:06.689592 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer encoding 48 features 
 17592186044415 0x7f534800d770 osd_op(client.5673295.0:1783 
 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) 
 v4
 2014-11-11 14:37:06.689756 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer signed seq # 48): sig = 
 206599450695048354
 2014-11-11 14:37:06.689804 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sending 48 0x7f534800d770
 2014-11-11 14:37:06.689884 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
 2014-11-11 14:37:06.689915 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
 2014-11-11 14:37:06.694968 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ACK
 2014-11-11 14:37:06.695053 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 >> 
 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 
 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ack seq 48
 2014-11-11 14:37:06.695067 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 
 172.16.10.103:6934/14875

[ceph-users] Federated gateways

2014-11-05 Thread Aaron Bassett
Hello everyone, 
I am attempting to set up a two-cluster configuration for object storage 
disaster recovery. I have two physically separate sites so using 1 big cluster 
isn’t an option. I’m attempting to follow the guide at: 
http://ceph.com/docs/v0.80.5/radosgw/federated-config/ . After a couple days 
of flailing, I’ve settled on using 1 region with two zones, where each cluster 
is a zone. I’m now attempting to set up an agent as per the “Multi-Site Data 
Replication” section. The agent kicks off ok and starts making all sorts of 
connections, but no objects are being copied to the non-master zone. I re-ran 
the agent with the -v flag and saw a lot of:

DEBUG:urllib3.connectionpool:GET 
/admin/opstate?client-id=radosgw-agent&object=test%2F_shadow_.JjVixjWmebQTrRed36FL6D0vy2gDVZ__39&op-id=phx-r1-head1%3A2451615%3A1
 HTTP/1.1 200 None
DEBUG:radosgw_agent.worker:op state is []
DEBUG:radosgw_agent.worker:error geting op state: list index out of range

So it appears something is still wrong with my agent though I have no idea 
what. I can’t seem to find any errors in any other logs. Does anyone have any 
insight here? 
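
The last two debug lines are consistent with code that takes the first element of an empty list. The following is a guess at the shape of that failure, not the radosgw-agent source; both function names are hypothetical.

```python
def first_op_state(op_states):
    """Return the first op state; raises IndexError on an empty list."""
    return op_states[0]


def first_op_state_or_none(op_states):
    """Same lookup, but treat an empty opstate reply as 'no state yet'."""
    return op_states[0] if op_states else None


try:
    first_op_state([])  # the gateway answered 200 with an empty list
except IndexError as exc:
    message = str(exc)
    print("error getting op state:", message)

print(first_op_state_or_none([]))
```

Under that reading the agent's message is a symptom: the opstate request succeeds but returns no entries, so the question becomes why the gateway has no recorded state for the object being copied.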

I’m also wondering if what I’m attempting, with two clusters in the same region 
as separate zones, makes sense?

Thanks, Aaron