[ceph-users] Idea for speeding up RadosGW for buckets with many objects
Hi,

I'm experiencing a problem with poor performance of RadosGW while operating on a bucket with many objects. That's a known issue with LevelDB and can be partially resolved using sharding, but I have one more idea. As I see in the ceph osd logs, all slow requests are calls to rgw.bucket_list:

2016-02-17 03:17:56.846694 7f5396f63700 0 log_channel(cluster) log [WRN] : slow request 30.272904 seconds old, received at 2016-02-17 03:17:26.573742: osd_op(client.12611484.0:15137332 .dir.default.4162.3 [call rgw.bucket_list] 9.2955279 ack+read+known_if_redirected e3252) currently started

I don't know exactly how Ceph works internally, but maybe the data required to return results for rgw.bucket_list could be cached for some time. The cache TTL would be parametrized and could be disabled to keep the same behaviour as the current one. There are 3 cases when there's a call to rgw.bucket_list:

1. no cached data
2. up-to-date cache
3. outdated cache

Ad 1. The first call starts generating the full list. All new requests are put on hold. When the list is ready, it's saved to the cache.
Ad 2. All calls are served from the cache.
Ad 3. The first request starts generating a fresh list. All new requests are served from the outdated cache until the new cached data is ready.

This could be optimized further by periodically regenerating the cache before it expires, to reduce the cases where the cache is outdated. Maybe this idea is stupid, maybe not, but if it's doable it would be nice to have the choice.

Kind regards - Krzysztof Księżyk
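P.S. To make the three cases a bit more concrete, here is a rough sketch of the behaviour I have in mind. This is only illustrative Python, not actual Ceph/RGW code, and all names (BucketListCache, generate_full_list, etc.) are made up:

    import time
    import threading

    class BucketListCache:
        # Sketch of the proposed TTL cache for rgw.bucket_list results:
        #   case 1: no cached data  -> first caller builds the list, later callers wait
        #   case 2: fresh cache     -> everyone is served from the cache
        #   case 3: outdated cache  -> first caller rebuilds, later callers get the stale copy

        def __init__(self, ttl_seconds, generate_full_list):
            self.ttl = ttl_seconds                # TTL would be a config knob
            self.generate_full_list = generate_full_list
            self.entries = {}                     # bucket -> (timestamp, listing)
            self.refreshing = set()               # buckets currently being rebuilt
            self.cond = threading.Condition()

        def list_bucket(self, bucket):
            with self.cond:
                while True:
                    cached = self.entries.get(bucket)
                    if cached and time.time() - cached[0] < self.ttl:
                        return cached[1]          # case 2: fresh cache
                    if bucket not in self.refreshing:
                        self.refreshing.add(bucket)
                        break                     # this caller will rebuild the list
                    if cached:
                        return cached[1]          # case 3: serve the stale copy
                    self.cond.wait()              # case 1: hold until a list exists
            try:
                listing = self.generate_full_list(bucket)   # the expensive part
                with self.cond:
                    self.entries[bucket] = (time.time(), listing)
                return listing
            finally:
                with self.cond:
                    self.refreshing.discard(bucket)
                    self.cond.notify_all()

The periodic pre-refresh I mentioned would just be a background task calling generate_full_list for busy buckets before their entries expire.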
[ceph-users] storing bucket index in different pool than default
Hi,

When I show bucket info I see:

> [root@prod-ceph-01 /home/chris.ksiezyk]> radosgw-admin bucket stats -b bucket1
> {
>     "bucket": "bucket1",
>     "pool": ".rgw.buckets",
>     "index_pool": ".rgw.buckets.index",
>     "id": "default.4162.3",
>     "marker": "default.4162.3",
>     "owner": "user1",
>     "ver": "0#9442297",
>     "master_ver": "0#0",
>     "mtime": "2015-12-04 14:03:17.00",
>     "max_marker": "0#",
>     "usage": {
>         "rgw.main": {
>             "size_kb": 1082449749,
>             "size_kb_actual": 1092031396,
>             "num_objects": 4707779
>         }
>     },
>     "bucket_quota": {
>         "enabled": false,
>         "max_size_kb": -1,
>         "max_objects": -1
>     }
> }

Is there a way to store bucket and bucket index in newly created pools?

Kind regards - Krzysztof Księżyk
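P.S. From what I've found so far, I guess it would go roughly like this (untested, pool names and pg counts are only examples) - create the new pools and point the zone's placement target at them by editing the zone JSON:

    # create the replacement pools (pg counts are only examples)
    ceph osd pool create .rgw.buckets.new 128
    ceph osd pool create .rgw.buckets.index.new 32

    # dump the zone, edit the placement target, load it back, restart radosgw
    radosgw-admin zone get > zone.json
    # in zone.json, under "placement_pools" -> "default-placement", set e.g.:
    #   "data_pool":  ".rgw.buckets.new",
    #   "index_pool": ".rgw.buckets.index.new"
    radosgw-admin zone set --infile zone.json

As far as I understand this would only affect newly created buckets - an existing bucket keeps the pools recorded in its metadata, so its data would still have to be copied. Can anyone confirm?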
Re: [ceph-users] RadosGW performance s3 many objects
On Sun, 2016-01-24 at 13:44 +0100, Stefan Rogge wrote:
> Hi,
> we are using Ceph with the RadosGW and S3 setting.
> With more and more objects in the storage the writing speed slows down significantly. With 5 million objects in the storage we had a writing speed of 10MB/s. With 10 million objects in the storage it's only 5MB/s.
> Is this a common issue?
> Is the RadosGW suitable for a large amount of objects or would you recommend not using the RadosGW with this amount of objects?
>
> Thank you.
>
> Stefan
>
> I found also a ticket at the ceph tracker with the same issue:
>
> http://tracker.ceph.com/projects/ceph/wiki/Rgw_-_bucket_index_scalability

Hi,

I'm struggling with the same issue on Ceph 9.2.0. Unfortunately I wasn't aware of it, and now the only way to improve things is to create a new bucket with bucket index sharding, or to change the way our apps store data into buckets. And of course copy tons of data :(

In my case something also happened to the leveldb files, and now I cannot even run some radosgw-admin commands like:

radosgw-admin bucket check -b <bucket>

as this causes osd daemon flapping and process timeout messages in the logs. PGs containing .rgw.buckets.index can't even be backfilled to other osds, as the osd process dies with messages like:

> [...]
> 2016-01-25 15:47:22.700737 7f79fc66d700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7992c86700' had suicide timed out after 150
> 2016-01-25 15:47:22.702619 7f79fc66d700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f79fc66d700 time 2016-01-25 15:47:22.700751
> common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")
>
> ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f7a019f4be5]
> 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, long)+0x2d9) [0x7f7a019343b9]
> 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7f7a01934bf6]
> 4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7f7a019353bc]
> 5: (CephContextServiceThread::entry()+0x15b) [0x7f7a01a10dcb]
> 6: (()+0x7df5) [0x7f79ffa8fdf5]
> 7: (clone()+0x6d) [0x7f79fe3381ad]

I don't know - maybe it's because of the number of leveldb files in the omap folder (5.1GB in total). I read somewhere that things can be improved by setting 'leveldb_compression' to false and 'leveldb_compact_on_mount' to true, but I don't know if these options have any effect in 9.2.0 as they are not documented for this release. I tried 'leveldb_compression' without visible effect, and I wasn't brave enough to try 'leveldb_compact_on_mount' on the production env. Setting it to true on my test 0.94.5 cluster makes the osd fail on restart.

Kind regards - Krzysztof Księżyk
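P.S. For reference, these are the settings I was referring to and where they would go in ceph.conf. Treat this only as an illustration - the shard count is an arbitrary example, sharding only applies to buckets created after the change, and I could not confirm the leveldb options are actually honoured on 9.2.0:

    [global]
    # only affects newly created buckets; existing buckets keep their index layout
    rgw override bucket index max shards = 8

    [osd]
    # the leveldb options mentioned above - use at your own risk
    leveldb compression = false
    leveldb compact on mount = true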
Re: [ceph-users] 411 Content-Length required error
On Mon, 2016-01-25 at 16:06 -0500, John Hogenmiller wrote:
> Greetings,
>
> When I submit a request with "Transfer-Encoding: chunked", I get a 411 Length Required error back. It's very similar to http://tracker.ceph.com/issues/3297 except I am running the ceph version of fastcgi. Ceph does not appear to produce apache2 2.4 versions, I am running upstream Apache from Ubuntu on 14.04 LTS.
>
> My apache and ceph.conf files are at: https://gist.github.com/ytjohn/da854151d8d360b927d0
>
> Versions:
> * Ceph 9.2.0.1trusty
> * apache2: 2.4.7-1ubuntu4.8
> * libapache2-mod-fastcgi: 2.4.7~0910052141-ceph1
>
> Example session:
>
> ~ s3curl.pl --id iphone_lab --put=1mb.img --debug -- --header "Transfer-Encoding: chunked" http://172.29.4.148/chunkedbucket2ip/imb.img
> s3curl: exec curl -v -H 'Date: Mon, 25 Jan 2016 19:34:06 +0000' -H 'Authorization: AWS iphone_lab:i/l3AJ0C5pc/nSUUcwn7943ag10=' -L -H 'content-type: ' -T 1mb.img --header 'Transfer-Encoding: chunked' http://172.29.4.148/chunkedbucket2ip/imb.img
> *   Trying 172.29.4.148...
> * Connected to 172.29.4.148 (172.29.4.148) port 80 (#0)
> > PUT /chunkedbucket2ip/imb.img HTTP/1.1
> > Host: 172.29.4.148
> > User-Agent: curl/7.43.0
> > Accept: */*
> > Date: Mon, 25 Jan 2016 19:34:06 +0000
> > Authorization: AWS iphone_lab:i/l3AJ0C5pc/nSUUcwn7943ag10=
> > Transfer-Encoding: chunked
> > Expect: 100-continue
> >
> < HTTP/1.1 100 Continue
> < HTTP/1.1 411 Length Required
> < Date: Mon, 25 Jan 2016 19:34:06 GMT
> < Server: Apache/2.4.7 (Ubuntu)
> < x-amz-request-id: tx1fda9-0056a678ae-10da-default
> < Accept-Ranges: bytes
> < Content-Length: 156
> < Connection: close
> < Content-Type: application/xml
> <
> * Closing connection 0
> <?xml version="1.0" encoding="UTF-8"?><Error><Code>MissingContentLength</Code><RequestId>tx1fda9-0056a678ae-10da-default</RequestId></Error>

In addition to my previous message - I was just reading messages on this mailing list and found in one post that setting "rgw content length compat" to true should solve this issue.

Kind regards - Krzysztof Księżyk
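P.S. For completeness, that option goes into the radosgw section of ceph.conf, something like the snippet below (the section name is just an example - use whatever your gateway instance is called), followed by a radosgw restart:

    [client.radosgw.gateway]
    rgw content length compat = true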
Re: [ceph-users] 411 Content-Length required error
John Hogenmiller writes:
> Greetings,
>
> When I submit a request with "Transfer-Encoding: chunked", I get a 411 Length Required error back. It's very similar to http://tracker.ceph.com/issues/3297 except I am running the ceph version of fastcgi. Ceph does not appear to produce apache2 2.4 versions, I am running upstream Apache from Ubuntu on 14.04 LTS.
>
> My apache and ceph.conf files are at: https://gist.github.com/ytjohn/da854151d8d360b927d0
>
> Versions:
> * Ceph 9.2.0.1trusty
> * apache2: 2.4.7-1ubuntu4.8
> * libapache2-mod-fastcgi: 2.4.7~0910052141-ceph1
>
> Example session:
> [...]

Hi,

I've tried unsuccessfully for two days to force Apache or Nginx to proxy requests to fastcgi. Even tracking the debug log files didn't help. Finally I gave up and started radosgw as the web server by adding this to ceph.conf:

[client.rgw.ceph-01]
rgw_frontends = civetweb port=7480
[...]

Then connect directly to port 7480 (this is the default one) and the problem with 411 disappears. Anyway, if you figure out how to force Apache / Nginx to cooperate with fastcgi, let me know.

Kind regards - Krzysztof Księżyk
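P.S. A quick way to check that civetweb is answering (the hostname below comes from my ceph.conf section above - adjust it to wherever your radosgw runs): an anonymous request should come back with an XML ListAllMyBucketsResult from the gateway rather than an Apache error page, and the chunked PUT from your example works when pointed at port 7480:

    # anonymous request - expect an XML ListAllMyBucketsResult response from radosgw
    curl -v http://ceph-01:7480/

    # the same chunked upload as before, just against the civetweb port
    s3curl.pl --id iphone_lab --put=1mb.img -- --header "Transfer-Encoding: chunked" \
        http://ceph-01:7480/chunkedbucket2ip/1mb.img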