[ceph-users] Idea for speedup RadosGW for buckets with many objects.

2016-02-17 Thread Krzysztof Księżyk
Hi,

I'm experiencing a problem with poor RadosGW performance while
operating on a bucket with many objects. It's a known issue with LevelDB
and can be partially mitigated with sharding, but I have one more idea.
As I can see in the Ceph OSD logs, all slow requests occur while making
calls to rgw.bucket_list:

2016-02-17 03:17:56.846694 7f5396f63700  0 log_channel(cluster) log [WRN] : slow request 30.272904 seconds old, received at 2016-02-17 03:17:26.573742: osd_op(client.12611484.0:15137332 .dir.default.4162.3 [call rgw.bucket_list] 9.2955279 ack+read+known_if_redirected e3252) currently started

I don't know exactly how Ceph works internally, but maybe the data required
to return results for rgw.bucket_list could be cached for some time. The
cache TTL would be parametrized and could be disabled to keep the current
behaviour. There are 3 possible cases when rgw.bucket_list is called:
1. no cached data
2. up-to-date cache
3. outdated cache

Ad 1. The first call starts generating the full list. All new requests are
put on hold. When the list is ready, it is saved to the cache.
Ad 2. All calls are served from the cache.
Ad 3. The first request starts generating the full list. All new requests
are served from the outdated cache until the new cached data is ready.

This could be optimized further by periodically regenerating the cache,
even before it expires, to reduce the number of cases where the cache is
outdated.
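
If it helps to picture it, here is a rough sketch of those three cases in
Python. This is purely illustrative pseudocode on my part - BucketListCache
and build_list are made-up names, not RGW internals - but it shows where the
TTL and the "serve stale while rebuilding" logic would sit:

import threading
import time


class BucketListCache:
    """Illustrative only, not RGW code. build_list stands in for the
    expensive full scan behind rgw.bucket_list; ttl <= 0 disables the
    cache and keeps today's behaviour."""

    def __init__(self, build_list, ttl=30.0):
        self.build_list = build_list
        self.ttl = ttl
        self.listing = None          # last generated list
        self.stamp = 0.0             # when it was generated
        self.lock = threading.Lock()
        self.cond = threading.Condition(self.lock)
        self.building = False        # a full listing is being generated right now

    def get(self, bucket):
        if self.ttl <= 0:                       # cache disabled
            return self.build_list(bucket)

        with self.lock:
            have_cache = self.listing is not None
            fresh = have_cache and (time.time() - self.stamp) < self.ttl
            if fresh:
                return self.listing             # case 2: up-to-date cache
            if self.building:
                if have_cache:
                    return self.listing         # case 3: serve the outdated copy meanwhile
                while self.building:            # case 1: hold new requests until the list is ready
                    self.cond.wait()
                return self.listing
            self.building = True                # this request will generate the full list

        result = self.build_list(bucket)        # expensive part, done outside the lock

        with self.lock:
            self.listing, self.stamp = result, time.time()
            self.building = False
            self.cond.notify_all()              # wake requests held in case 1
            return result

The periodic refresh mentioned above would then just be a background thread
calling get() shortly before the TTL expires, so clients rarely see case 3.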

Maybe this idea is stupid, maybe not, but if it's doable it would be
nice to have the choice.

Kind regards -
Krzysztof Księżyk


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] storing bucket index in different pool than default

2016-01-29 Thread Krzysztof Księżyk

Hi,

When I show bucket info I see:


> [root@prod-ceph-01 /home/chris.ksiezyk]> radosgw-admin bucket stats -b bucket1
> {
>     "bucket": "bucket1",
>     "pool": ".rgw.buckets",
>     "index_pool": ".rgw.buckets.index",
>     "id": "default.4162.3",
>     "marker": "default.4162.3",
>     "owner": "user1",
>     "ver": "0#9442297",
>     "master_ver": "0#0",
>     "mtime": "2015-12-04 14:03:17.00",
>     "max_marker": "0#",
>     "usage": {
>         "rgw.main": {
>             "size_kb": 1082449749,
>             "size_kb_actual": 1092031396,
>             "num_objects": 4707779
>         }
>     },
>     "bucket_quota": {
>         "enabled": false,
>         "max_size_kb": -1,
>         "max_objects": -1
>     }
> }

Is there a way to store the bucket data and the bucket index in newly created pools?
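
For reference, the pool names above come from the zone's placement target,
which can be dumped and edited with radosgw-admin. A rough sketch of what I
mean is below (the ".new" pool names are hypothetical, and as far as I
understand only buckets created after such a change would use the new
placement - existing buckets like bucket1 keep their old pools):

radosgw-admin zone get > zone.json
# edit the placement_pools section, e.g.:
#   "placement_pools": [
#     { "key": "default-placement",
#       "val": { "index_pool": ".rgw.buckets.index.new",
#                "data_pool": ".rgw.buckets.new",
#                "data_extra_pool": ".rgw.buckets.extra" } } ]
radosgw-admin zone set < zone.json
# then restart the radosgw instances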

Kind regards -
Krzysztof Księżyk



Re: [ceph-users] RadosGW performance s3 many objects

2016-01-27 Thread Krzysztof Księżyk
Stefan Rogge writes:

> Hi,
> we are using Ceph with RadosGW and S3.
> With more and more objects in the storage, the writing speed slows down
> significantly. With 5 million objects in the storage we had a writing
> speed of 10MB/s. With 10 million objects in the storage it's only 5MB/s.
> Is this a common issue?
> Is the RadosGW suitable for a large amount of objects, or would you
> recommend not to use the RadosGW with this amount of objects?
>
> Thank you.
>
> Stefan
>
> I also found a ticket at the Ceph tracker with the same issue:
>
> http://tracker.ceph.com/projects/ceph/wiki/Rgw_-_bucket_index_scalability

Hi,

I'm struggling with the same issue on Ceph 9.2.0. Unfortunately I wasn't
aware of it, and now the only way to improve things is to create a new
bucket with bucket index sharding or to change the way our apps store data
in buckets. And of course copy tons of data :( In my case something also
happened to the leveldb files and now I cannot even run some radosgw-admin
commands like:

radosgw-admin bucket check -b 

which causes OSD daemon flapping and process timeout messages in the logs.
PGs containing .rgw.bucket.index can't even be backfilled to other OSDs,
as the OSD process dies with these messages:

[...]
> 2016-01-25 15:47:22.700737 7f79fc66d700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7992c86700' had suicide timed out after 150
> 2016-01-25 15:47:22.702619 7f79fc66d700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f79fc66d700 time 2016-01-25 15:47:22.700751
> common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")
>
>  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f7a019f4be5]
>  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, long)+0x2d9) [0x7f7a019343b9]
>  3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7f7a01934bf6]
>  4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7f7a019353bc]
>  5: (CephContextServiceThread::entry()+0x15b) [0x7f7a01a10dcb]
>  6: (()+0x7df5) [0x7f79ffa8fdf5]
>  7: (clone()+0x6d) [0x7f79fe3381ad]

I don't know - maybe it's because of the number of leveldb files in the
omap folder (5.1GB in total). I read somewhere that things can be improved
by setting 'leveldb_compression' to false and 'leveldb_compact_on_mount'
to true, but I don't know whether these options have any effect in 9.2.0,
as they are not documented for this release. I tried 'leveldb_compression'
without any visible effect, and I wasn't brave enough to try
'leveldb_compact_on_mount' on the production environment. Setting it to
true on my 0.94.5 test cluster makes the OSD fail on restart.
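
For the record, this is roughly what I mean in ceph.conf terms. It's only a
sketch - the shard count is an example, sharding only applies to buckets
created after the change, and I can't vouch for the leveldb options being
honoured in 9.2.0:

[client.rgw.ceph-01]
# only newly created buckets get a sharded index
rgw_override_bucket_index_max_shards = 16

[osd]
leveldb_compression = false
# the option I haven't dared to try in production;
# it made an OSD fail on restart on my 0.94.5 test cluster
leveldb_compact_on_mount = true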

Kind regards -
Krzysztof Księżyk




Re: [ceph-users] 411 Content-Length required error

2016-01-27 Thread Krzysztof Księżyk
On Mon, 2016-01-25 at 16:06 -0500, John Hogenmiller wrote:
> Greetings,
> 
> 
> 
> When I submit a request with "Transfer-Encoding: chunked", I get a 411
> Length required error back. It's very similar
> to http://tracker.ceph.com/issues/3297 except I am running the ceph
> version of fastcgi. Ceph does not appear to produce apache2 2.4
> versions, I am running upstream Apache from Ubuntu on 14.04 LTS.
> 
> 
> My apache and ceph.conf files are at:
> https://gist.github.com/ytjohn/da854151d8d360b927d0
> 
> 
> Versions:
>  * Ceph 9.2.0.1trusty
>  * apache2: 2.4.7-1ubuntu4.8
> 
>  * libapache2-mod-fastcgi:  2.4.7~0910052141-ceph1
> 
> 
> Example session:
> 
> 
> ~ s3curl.pl --id iphone_lab --put=1mb.img --debug --  --header
> "Transfer-Encoding: chunked"
> http://172.29.4.148/chunkedbucket2ip/imb.img
>  
> s3curl: exec curl -v -H 'Date: Mon, 25 Jan 2016 19:34:06 +' -H
> 'Authorization: AWS iphone_lab:i/l3AJ0C5pc/nSUUcwn7943ag10=' -L -H
> 'content-type: ' -T 1mb.img --header 'Transfer-Encoding: chunked'
> http://172.29.4.148/chunkedbucket2ip/imb.img
> *   Trying 172.29.4.148...
> * Connected to 172.29.4.148 (172.29.4.148) port 80 (#0)
> > PUT /chunkedbucket2ip/imb.img HTTP/1.1
> > Host: 172.29.4.148
> > User-Agent: curl/7.43.0
> > Accept: */*
> > Date: Mon, 25 Jan 2016 19:34:06 +
> > Authorization: AWS iphone_lab:i/l3AJ0C5pc/nSUUcwn7943ag10=
> > Transfer-Encoding: chunked
> > Expect: 100-continue
> > 
> < HTTP/1.1 100 Continue
> < HTTP/1.1 411 Length Required
> < Date: Mon, 25 Jan 2016 19:34:06 GMT
> < Server: Apache/2.4.7 (Ubuntu)
> < x-amz-request-id: tx1fda9-0056a678ae-10da-default
> < Accept-Ranges: bytes
> < Content-Length: 156
> < Connection: close
> < Content-Type: application/xml
> < 
> * Closing connection 0
> <?xml version="1.0" encoding="UTF-8"?><Error><Code>MissingContentLength</Code><RequestId>tx1fda9-0056a678ae-10da-default</RequestId></Error>

In addition to my previous message - I was just reading messages on this
mailing list and found in one post that setting "rgw content length compat"
to true should solve this issue.

Kind regards -
Krzysztof Księżyk



Re: [ceph-users] 411 Content-Length required error

2016-01-26 Thread Krzysztof Księżyk
John Hogenmiller writes:

> Greetings,
>
> When I submit a request with "Transfer-Encoding: chunked", I get a 411
> Length required error back. It's very similar
> to http://tracker.ceph.com/issues/3297 except I am running the ceph
> version of fastcgi. Ceph does not appear to produce apache2 2.4
> versions, I am running upstream Apache from Ubuntu on 14.04 LTS.
>
> My apache and ceph.conf files are at:
> https://gist.github.com/ytjohn/da854151d8d360b927d0
>
> Versions:
>  * Ceph 9.2.0.1trusty
>  * apache2: 2.4.7-1ubuntu4.8
>  * libapache2-mod-fastcgi: 2.4.7~0910052141-ceph1
>
> Example session:
>
> ~ s3curl.pl --id iphone_lab --put=1mb.img --debug -- --header
> "Transfer-Encoding: chunked" http://172.29.4.148/chunkedbucket2ip/imb.img
>
> s3curl: exec curl -v -H 'Date: Mon, 25 Jan 2016 19:34:06 +' -H
> 'Authorization: AWS iphone_lab:i/l3AJ0C5pc/nSUUcwn7943ag10=' -L -H
> 'content-type: ' -T 1mb.img --header 'Transfer-Encoding: chunked'
> http://172.29.4.148/chunkedbucket2ip/imb.img
> *   Trying 172.29.4.148...
> * Connected to 172.29.4.148 (172.29.4.148) port 80 (#0)
> > PUT /chunkedbucket2ip/imb.img HTTP/1.1
> > Host: 172.29.4.148
> > User-Agent: curl/7.43.0
> > Accept: */*
> > Date: Mon, 25 Jan 2016 19:34:06 +
> > Authorization: AWS iphone_lab:i/l3AJ0C5pc/nSUUcwn7943ag10=
> > Transfer-Encoding: chunked
> > Expect: 100-continue
> > 
> < HTTP/1.1 100 Continue
> < HTTP/1.1 411 Length Required
> < Date: Mon, 25 Jan 2016 19:34:06 GMT
> < Server: Apache/2.4.7 (Ubuntu)
> < x-amz-request-id: tx1fda9-0056a678ae-10da-default
> < Accept-Ranges: bytes
> < Content-Length: 156
> < Connection: close
> < Content-Type: application/xml
> < 
> * Closing connection 0
> <?xml version="1.0" encoding="UTF-8"?><Error><Code>MissingContentLength</Code><RequestId>tx1fda9-0056a678ae-10da-default</RequestId></Error>


Hi,

I tried unsuccessfully for two days to force Apache and Nginx to proxy
requests to fastcgi. Even tracing the debug log files didn't help. Finally
I gave up and let radosgw act as the web server itself by adding this to
ceph.conf:

[client.rgw.ceph-01]
rgw_frontends = civetweb port=7480
[...]


Then I connect directly to port 7480 (the default one) and the 411 problem
disappears.
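
To verify, the original request from John's session could be pointed
straight at that port, e.g. (same host, bucket and object names as in the
quoted session):

s3curl.pl --id iphone_lab --put=1mb.img --debug -- --header "Transfer-Encoding: chunked" http://172.29.4.148:7480/chunkedbucket2ip/imb.img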

Anyway, if you figure out how to make Apache / Nginx cooperate with
fastcgi, let me know.

Kind regards -
Krzysztof Księżyk