Re: [ceph-users] radosgw issues
> Guess I'll try again. I gave this another shot, following the documentation, and still end up with basically a fork bomb rather than the nice ListAllMyBucketsResult output that the docs say I should get. Everything else about the cluster works fine, and I see others talking about the gateway as if it just worked, so I'm led to believe that I'm probably doing something stupid.

For the benefit of anyone who was sitting on the edge of their seat waiting for me to figure this out: I found that I had indeed done something stupid. Somehow I managed to miss the warning highlighted in red, set apart by itself and labeled "Important" in the documentation. I had not turned off FastCgiWrapper in /etc/httpd/conf.d/fastcgi.conf. Fixing that made everything work just fine. Thanks to all who offered help off list!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
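For reference, the setting in question lives in the mod_fastcgi config file shipped on CentOS 6. A minimal sketch of the change (the path is the one from my box; comments are my own summary of why it matters, not from the docs verbatim):

```apache
# /etc/httpd/conf.d/fastcgi.conf
#
# With the wrapper enabled, mod_fastcgi spawns s3gw.fcgi (and thus a fresh
# radosgw) over and over instead of talking to one long-lived daemon --
# hence the fork bomb. The Ceph docs' "Important" note is to disable it:
FastCgiWrapper Off
```

After changing this, restart httpd so mod_fastcgi picks up the new setting.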
Re: [ceph-users] radosgw issues
On 2014-06-16 13:16, lists+c...@deksai.com wrote:

> I've just tried setting up the radosgw on centos6 according to http://ceph.com/docs/master/radosgw/config/ While I can run the admin commands just fine to create users etc., making a simple wget request to the domain I set up returns a 500 due to a timeout. Every request I make results in another radosgw process being created, which seems to start even more processes itself. I only have to make a few requests to have about 60 radosgw processes.

Guess I'll try again. I gave this another shot, following the documentation, and still end up with basically a fork bomb rather than the nice ListAllMyBucketsResult output that the docs say I should get. Everything else about the cluster works fine, and I see others talking about the gateway as if it just worked, so I'm led to believe that I'm probably doing something stupid.

Has anybody else run into the situation where apache times out while fastcgi just launches more and more processes? The init script launches a process, and the webserver seems to launch the same thing, so I'm not clear on what should be happening here. Either way, I get nothing back when making a simple GET request to the domain. If anybody has suggestions, even if they are "You nincompoop! Everybody knows that you need to do such and such", that would be helpful.
Re: [ceph-users] radosgw issues
On 2014-06-17 07:30, John Wilkins wrote:

> You followed this installation guide: http://ceph.com/docs/master/install/install-ceph-gateway/ [16]
> and then you followed this configuration guide: http://ceph.com/docs/master/radosgw/config/ [1]
> and then you executed: sudo /etc/init.d/ceph-radosgw start
> And there was no ceph-radosgw script? We need to verify that first, and file a bug if we're not getting an init script in the CentOS packages.

I took a look again, and the package I had installed seemed to have come from epel and did not contain the init script. I started from scratch with a minimal install of centos6 that hadn't been used for anything else. The package from the ceph repo does indeed have the init script.

Unfortunately, I'm still running into the same issue. I removed all the rgw pools, started ceph-radosgw, and it recreated a few of them:

.rgw.root
.rgw.control
.rgw
.rgw.gc
.users.uid

Manually creating the rest of them has no effect. It complains about acquiring locks and listing objects:

2014-06-17 00:31:45.150494 7f86ec450820 0 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process radosgw, pid 1704
2014-06-17 00:31:45.150556 7f86ec450820 -1 WARNING: libcurl doesn't support curl_multi_wait()
2014-06-17 00:31:45.150590 7f86ec450820 -1 WARNING: cross zone / region transfer performance may be affected
2014-06-17 00:32:02.469894 7f86ec450820 0 framework: fastcgi
2014-06-17 00:32:02.469958 7f86ec450820 0 starting handler: fastcgi
2014-06-17 00:32:13.455904 7f86b700 -1 failed to list objects pool_iterate returned r=-2
2014-06-17 00:32:13.455918 7f86b700 0 ERROR: lists_keys_next(): ret=-2
2014-06-17 00:32:13.455924 7f86b700 0 ERROR: sync_all_users() returned ret=-2
2014-06-17 00:32:13.611812 7f86d95f9700 0 RGWGC::process() failed to acquire lock on gc.16
2014-06-17 00:32:14.105180 7f86d95f9700 0 RGWGC::process() failed to acquire lock on gc.0

If I make a request, the server eventually fills up with so many radosgw processes that the apache user can no longer fork any new processes. This is an strace from apache:

read(13, "GET / HTTP/1.1\r\nUser-Agent: Wget/1.15 (linux-gnu)\r\nAccept: */*\r\nHost: gateway.ceph.chc.tlocal\r\nConnection: Keep-Alive\r\n\r\n", 8000) = 121
stat("/s3gw.fcgi", 0x7fffcc662580) = -1 ENOENT (No such file or directory)
stat("/var/www/html/s3gw.fcgi", {st_mode=S_IFREG|0755, st_size=81, ...}) = 0
open("/var/www/html/.htaccess", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/var/www/html/s3gw.fcgi/.htaccess", O_RDONLY|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
open("/var/www/html/s3gw.fcgi", O_RDONLY|O_CLOEXEC) = 14
fcntl(14, F_GETFD) = 0x1 (flags FD_CLOEXEC)
fcntl(14, F_SETFD, FD_CLOEXEC) = 0
read(14, "#!/bin/sh\nexec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway\n", 4096) = 81
stat("/var/www/html/s3gw.fcgi", {st_mode=S_IFREG|0755, st_size=81, ...}) = 0
brk(0x7fab146ce000) = 0x7fab146ce000
write(2, "[Mon Jun 16 22:26:03 2014] [warn] FastCGI: 10.30.85.51 GET http://gateway.ceph.chc.tlocal/ auth \n", 97) = 97
stat("/var/run/mod_fastcgi/dynamic/2a13e94a006b7f947a721cf995159615", {st_mode=S_IFSOCK|0600, st_size=0, ...}) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 15
connect(15, {sa_family=AF_FILE, path="/var/run/mod_fastcgi/dynamic/2a13e94a006b7f947a721cf995159615"}, 63) = 0
fcntl(15, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(15, F_SETFL, O_RDWR|O_NONBLOCK) = 0
select(16, [15], [15], NULL, {3, 99784}) = 1 (out [15], left {3, 99781})
write(15, "\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\0\r\0\0\n\1SCRIPT_URL/\1\4\0\1\0+\0\0\n\37SCRIPT_URIhttp://gateway.ceph.chc.tlocal/\1\4\0\1\0\24\0\0\22\0HTTP_AUTHORIZATION\1\4\0\1\0&\0\0\17\25HTTP_USER_AGENTWget/1.15 (linux-gnu)\1\4\0\1\0\20\0\0\v\3HTTP_ACCEPT*/*\1\4\0\1\0\"\0\0\t\27HTTP_HOSTgateway.ceph.chc.tlocal\1\4\0\1\0\33\0\0\17\nHTTP_CONNECTIONKee"..., 841) = 841
select(16, [15], [], NULL, {3, 99624}) = 0 (Timeout)
write(12, "T /var/www/html/s3gw.fcgi 0 0*", 30) = 30
select(16, [15], [], NULL, {2, 996562}) = 0 (Timeout)
write(12, "T /var/www/html/s3gw.fcgi 0 0*", 30) = 30
select(16, [15], [], NULL, {2, 996992}) = 0 (Timeout)
write(12, "T /var/www/html/s3gw.fcgi 0 0*", 30) = 30
select(16, [15], [], NULL, {2, 996700}^C
...

This continues until it times out. And in /var/log/ceph/client.radosgw.gateway.log this repeats as all the other processes start:

2014-06-16 22:27:28.411653 7f84742fb820 0 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process radosgw, pid 12225
2014-06-16 22:27:28.411668 7f84742fb820 -1 WARNING: libcurl doesn't support curl_multi_wait()
2014-06-16 22:27:28.411672 7f84742fb820 -1 WARNING: cross zone / region transfer performance may be affected
2014-06-16 22:27:28.420286 7f84742fb820 -1 asok(0x8f2fe0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket
[ceph-users] radosgw issues
I've just tried setting up the radosgw on centos6 according to http://ceph.com/docs/master/radosgw/config/

There didn't seem to be an init script in the rpm I installed, so I copied the one from here: https://raw.githubusercontent.com/ceph/ceph/31b0823deb53a8300856db3c104c0e16d05e79f7/src/init-radosgw.sysv That launches one process.

While I can run the admin commands just fine to create users etc., making a simple wget request to the domain I set up returns a 500 due to a timeout. Every request I make results in another radosgw process being created, which seems to start even more processes itself. I only have to make a few requests to have about 60 radosgw processes.

I am at a bit of a loss to tell what is going on. I've included what I assume to be the error below. I see some failures to acquire locks, and it complains that another process has already created the unix socket, but I don't know how relevant those are. If anyone could point me in the right direction, I would appreciate it.
2014-06-15 20:21:03.814081 7f8c94cf9700 1 -- 10.30.83.29:0/1028955 --> 10.30.85.60:6806/18981 -- ping v1 -- ?+0 0x7f8c98068e50 con 0x1e73630
2014-06-15 20:21:03.814097 7f8c94cf9700 1 -- 10.30.83.29:0/1028955 --> 10.30.85.60:6813/20066 -- ping v1 -- ?+0 0x7f8c980693e0 con 0x1e76460
2014-06-15 20:21:03.814108 7f8c94cf9700 1 -- 10.30.83.29:0/1028955 --> 10.30.85.60:6810/19519 -- ping v1 -- ?+0 0x7f8c98069600 con 0x1e79b90
2014-06-15 20:21:03.815286 7fd779f84820 0 framework: fastcgi
2014-06-15 20:21:03.815305 7fd779f84820 0 starting handler: fastcgi
2014-06-15 20:21:03.816726 7fd74e6f8700 1 -- 10.30.83.29:0/1008789 --> 10.30.85.60:6800/18433 -- osd_op(client.6329.0:24 [pgls start_epoch 0] 7.0 ack+read e290) v4 -- ?+0 0x7fd75800a630 con 0x1505210
2014-06-15 20:21:03.817584 7fd770adb700 1 -- 10.30.83.29:0/1008789 <== osd.1 10.30.85.60:6800/18433 9 osd_op_reply(24 [pgls start_epoch 0] v0'0 uv0 ondisk = 1) v6 167+0+44 (1244889235 0 139081063) 0x7fd754000ce0 con 0x1505210
2014-06-15 20:21:03.817665 7fd770adb700 1 -- 10.30.83.29:0/1008789 --> 10.30.85.60:6806/18981 -- osd_op(client.6329.0:25 [pgls start_epoch 0] 7.1 ack+read e290) v4 -- ?+0 0x7fd75800aae0 con 0x15043c0
2014-06-15 20:21:03.819356 7fd770adb700 1 -- 10.30.83.29:0/1008789 <== osd.2 10.30.85.60:6806/18981 5 osd_op_reply(25 [pgls start_epoch 0] v0'0 uv0 ondisk = 1) v6 167+0+44 (3347405639 0 139081063) 0x1512950 con 0x15043c0
2014-06-15 20:21:03.819509 7fd770adb700 1 -- 10.30.83.29:0/1008789 --> 10.30.85.61:6800/28605 -- osd_op(client.6329.0:26 [pgls start_epoch 0] 7.2 ack+read e290) v4 -- ?+0 0x7fd75800c3e0 con 0x7fd75800ac80
2014-06-15 20:21:03.819635 7fd74fafa700 2 garbage collection: start
2014-06-15 20:21:03.819798 7fd74fafa700 1 -- 10.30.83.29:0/1008789 --> 10.30.85.60:6806/18981 -- osd_op(client.6329.0:27 gc.21 [call lock.lock] 6.6dc01772 ondisk+write e290) v4 -- ?+0 0x7fd7600022b0 con 0x15043c0
2014-06-15 20:21:03.823164 7fd770adb700 1 -- 10.30.83.29:0/1008789 <== osd.2 10.30.85.60:6806/18981 6 osd_op_reply(27 gc.21 [call] v0'0 uv0 ondisk = -16 ((16) Device or resource busy)) v6 172+0+0 (3774749926 0 0) 0x1512950 con 0x15043c0
2014-06-15 20:21:03.823309 7fd74fafa700 0 RGWGC::process() failed to acquire lock on gc.21
2014-06-15 20:21:03.823457 7fd74fafa700 1 -- 10.30.83.29:0/1008789 --> 10.30.85.60:6810/19519 -- osd_op(client.6329.0:28 gc.22 [call lock.lock] 6.97748d0d ondisk+write e290) v4 -- ?+0 0x7fd760002bc0 con 0x150b280
2014-06-15 20:21:03.821327 7fd74d2f6700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fd74d2f6700 time 2014-06-15 20:21:03.819948
common/Thread.cc: 110: FAILED assert(ret == 0)
 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
 1: (Thread::create(unsigned long)+0x8a) [0x7fd77900363a]
 2: (ThreadPool::start_threads()+0x12e) [0x7fd778fe770e]
 3: (ThreadPool::start()+0x7a) [0x7fd778feaa9a]
 4: (RGWFCGXProcess::run()+0x195) [0x4ae305]
 5: /usr/bin/radosgw() [0x4b3bbe]
 6: (()+0x79d1) [0x7fd7772929d1]
 7: (clone()+0x6d) [0x7fd776fdfb5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-125> 2014-06-15 20:21:03.205715 7fd779f84820 5 asok(0x14c3070) register_command perfcounters_dump hook 0x14c4a80
-124> 2014-06-15 20:21:03.205761 7fd779f84820 5 asok(0x14c3070) register_command 1 hook 0x14c4a80
-123> 2014-06-15 20:21:03.205766 7fd779f84820 5 asok(0x14c3070) register_command perf dump hook 0x14c4a80
-122> 2014-06-15 20:21:03.205773 7fd779f84820 5 asok(0x14c3070) register_command perfcounters_schema hook 0x14c4a80
-121> 2014-06-15 20:21:03.205780 7fd779f84820 5 asok(0x14c3070) register_command 2 hook 0x14c4a80
-120> 2014-06-15 20:21:03.205784 7fd779f84820 5 asok(0x14c3070) register_command perf schema hook 0x14c4a80
-119> 2014-06-15 20:21:03.205787 7fd779f84820 5 asok(0x14c3070) register_command confi
[ceph-users] What exactly is the kernel rbd on osd issue?
I remember reading somewhere that the kernel ceph clients (rbd/fs) could not run on the same host as the OSD. I tried finding where I saw that, and could only come up with some irc chat logs. The issue stated there is that there can be some kind of deadlock. Is this true, and if so, would you have to run a totally different kernel in a vm, or would some form of namespacing be enough to avoid it?
[ceph-users] rbd: add failed: (34) Numerical result out of range
I was building a small test cluster and noticed a difference with trying to rbd map depending on whether the cluster was built using Fedora or CentOS. When I used CentOS osds and tried to rbd map from Arch Linux or Fedora, I would get "rbd: add failed: (34) Numerical result out of range". It seemed to happen when the tool was writing to /sys/bus/rbd/add_single_major. If I rebuild the osds using Fedora (20 in this case), everything works fine. In each scenario, I used ceph-0.80.1 on all the boxes. Is that expected?
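For context on where that error surfaces: rbd map builds a one-line description of the image and writes it into the sysfs control file mentioned above, and the error code comes back from that write. A sketch of how that line is assembled, as I understand the interface (the field order, the option string, and the trailing "-" for "no snapshot" are my assumptions, not something from this thread; the monitor address, secret, and image name are made up):

```shell
# Build the one-line string that gets written to /sys/bus/rbd/add (or
# add_single_major on kernels that support single-major device numbering):
#   <mon addresses> <options> <pool> <image> <snap | "-">
rbd_add_line() {
    mons=$1; opts=$2; pool=$3; image=$4; snap=${5:--}
    printf '%s %s %s %s %s\n' "$mons" "$opts" "$pool" "$image" "$snap"
}

# On a real client (as root, with the rbd module loaded) this would be
# redirected into the control file, e.g.:
#   rbd_add_line 10.0.0.1:6789 name=admin,secret=XXX rbd testimg \
#       > /sys/bus/rbd/add_single_major
rbd_add_line 10.0.0.1:6789 name=admin,secret=XXX rbd testimg
```

Strace-ing the failing rbd map and comparing the string it writes against what a working client writes might show whether the CentOS-built cluster is handing back something (e.g. feature bits or address formats) the client kernel rejects.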