Re: [ceph-users] radosgw issues

2014-07-08 Thread lists+ceph




Guess I'll try again.  I gave this another shot, following the
documentation, and still end up with basically a fork bomb rather than
the nice ListAllMyBucketsResult output that the docs say I should get.
 Everything else about the cluster works fine, and I see others
talking about the gateway as if it just worked, so I'm led to believe
that I'm probably doing something stupid.


For the benefit of anyone that was sitting on the edge of their seat 
waiting for me to figure this out, I found that indeed I had done 
something stupid.  Somehow I managed to miss the warning highlighted in 
red, set apart by itself and labeled "Important" in the documentation.


I had not turned off FastCgiWrapper in /etc/httpd/conf.d/fastcgi.conf.  
Fixing that made everything work just fine.
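For anyone hitting the same thing: the relevant directive lives in /etc/httpd/conf.d/fastcgi.conf on CentOS (exact path may vary by distro), and the fix is just:

```apache
# Some distro packages ship mod_fastcgi with FastCgiWrapper On.
# The radosgw docs require it off; with it on, every request spawns
# wrapper processes, which is the "fork bomb" behavior described above.
FastCgiWrapper off
```

Restart httpd after changing it.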

Thanks to all who offered help off list!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw issues

2014-06-30 Thread lists+ceph

On 2014-06-16 13:16, lists+c...@deksai.com wrote:

I've just tried setting up the radosgw on centos6 according to
http://ceph.com/docs/master/radosgw/config/



While I can run the admin commands just fine to create users etc.,
making a simple wget request to the domain I set up returns a 500 due
to a timeout.  Every request I make results in another radosgw process
being created, which seems to start even more processes itself.  I
only have to make a few requests to have about 60 radosgw processes.



Guess I'll try again.  I gave this another shot, following the 
documentation, and still end up with basically a fork bomb rather than 
the nice ListAllMyBucketsResult output that the docs say I should get.  
Everything else about the cluster works fine, and I see others talking 
about the gateway as if it just worked, so I'm led to believe that I'm 
probably doing something stupid.  Has anybody else run into the 
situation where apache times out while fastcgi just launches more and 
more processes?


The init script launches a process, and the webserver seems to launch 
the same thing, so I'm not clear on what should be happening here.  
Either way, I get nothing back when making a simple GET request to the 
domain.
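For anyone comparing notes: what I expected back from an anonymous GET on the gateway root is an S3-style bucket listing, roughly of this shape (structure per the S3 API; the IDs and names here are placeholders, not actual output from my cluster):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Owner>
    <ID>anonymous</ID>
    <DisplayName></DisplayName>
  </Owner>
  <Buckets></Buckets>
</ListAllMyBucketsResult>
```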


If anybody has suggestions, even if they are "You nincompoop!  Everybody 
knows that you need to do such and such", that would be helpful.



Re: [ceph-users] radosgw issues

2014-06-16 Thread lists+ceph

On 2014-06-17 07:30, John Wilkins wrote:

You followed this installation guide:
http://ceph.com/docs/master/install/install-ceph-gateway/ [16]

And then you followed this configuration guide
http://ceph.com/docs/master/radosgw/config/ [1] and then you executed:

sudo /etc/init.d/ceph-radosgw start
And there was no ceph-radosgw script? We need to verify that first,
and file a bug if we're not getting an init script in CentOS packages.



I took a look again, and the package I had installed seemed to have come 
from epel, and did not contain the init script.  I started from scratch 
with a minimal install of centos6 that hadn't been used for anything 
else.  The package from the ceph repo does indeed have the init script.


Unfortunately, I'm still running into the same issue.  I removed all the 
rgw pools, started ceph-radosgw, and it recreated a few of them:

.rgw.root
.rgw.control
.rgw
.rgw.gc
.users.uid

Manually creating the rest of them has no effect.  It complains about 
acquiring locks and listing objects:
2014-06-17 00:31:45.150494 7f86ec450820  0 ceph version 0.80.1 
(a38fe1169b6d2ac98b427334c12d7cf81f809b74), process radosgw, pid 1704
2014-06-17 00:31:45.150556 7f86ec450820 -1 WARNING: libcurl doesn't 
support curl_multi_wait()
2014-06-17 00:31:45.150590 7f86ec450820 -1 WARNING: cross zone / region 
transfer performance may be affected

2014-06-17 00:32:02.469894 7f86ec450820  0 framework: fastcgi
2014-06-17 00:32:02.469958 7f86ec450820  0 starting handler: fastcgi
2014-06-17 00:32:13.455904 7f86b700 -1 failed to list objects 
pool_iterate returned r=-2
2014-06-17 00:32:13.455918 7f86b700  0 ERROR: lists_keys_next(): 
ret=-2
2014-06-17 00:32:13.455924 7f86b700  0 ERROR: sync_all_users() 
returned ret=-2
2014-06-17 00:32:13.611812 7f86d95f9700  0 RGWGC::process() failed to 
acquire lock on gc.16
2014-06-17 00:32:14.105180 7f86d95f9700  0 RGWGC::process() failed to 
acquire lock on gc.0
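Side note for anyone reading these logs: the r=-2 / ret=-2 values are negated errno codes. A quick sketch to decode them (decode_ret is just an illustrative helper, not anything from the ceph tree):

```python
import errno
import os

def decode_ret(ret):
    """Map a negative return code from an rgw log line back to its errno name."""
    e = -ret
    return errno.errorcode.get(e, "UNKNOWN"), os.strerror(e)

# "pool_iterate returned r=-2" -> ENOENT: a pool or object is missing
print(decode_ret(-2))
# the RGWGC lock failures correspond to EBUSY replies from the OSDs
print(decode_ret(-16))
```

So the sync_all_users() errors above mean something rgw expects to list doesn't exist yet, and the gc.N messages just mean another radosgw instance already holds that garbage-collection lock.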




If I make a request, the server eventually fills up with so many radosgw 
processes that the apache user can no longer fork any new ones.


This is an strace from apache:

read(13, "GET / HTTP/1.1\r\nUser-Agent: Wget/1.15 (linux-gnu)\r\nAccept: 
*/*\r\nHost: gateway.ceph.chc.tlocal\r\nConnection: Keep-Alive\r\n\r\n", 
8000) = 121
stat("/s3gw.fcgi", 0x7fffcc662580)  = -1 ENOENT (No such file or 
directory)
stat("/var/www/html/s3gw.fcgi", {st_mode=S_IFREG|0755, st_size=81, ...}) 
= 0
open("/var/www/html/.htaccess", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such 
file or directory)
open("/var/www/html/s3gw.fcgi/.htaccess", O_RDONLY|O_CLOEXEC) = -1 
ENOTDIR (Not a directory)

open("/var/www/html/s3gw.fcgi", O_RDONLY|O_CLOEXEC) = 14
fcntl(14, F_GETFD)  = 0x1 (flags FD_CLOEXEC)
fcntl(14, F_SETFD, FD_CLOEXEC)  = 0
read(14, "#!/bin/sh\nexec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n 
client.radosgw.gateway\n", 4096) = 81
stat("/var/www/html/s3gw.fcgi", {st_mode=S_IFREG|0755, st_size=81, ...}) 
= 0

brk(0x7fab146ce000) = 0x7fab146ce000
write(2, "[Mon Jun 16 22:26:03 2014] [warn] FastCGI: 10.30.85.51 GET 
http://gateway.ceph.chc.tlocal/ auth \n", 97) = 97
stat("/var/run/mod_fastcgi/dynamic/2a13e94a006b7f947a721cf995159615", 
{st_mode=S_IFSOCK|0600, st_size=0, ...}) = 0

socket(PF_FILE, SOCK_STREAM, 0) = 15
connect(15, {sa_family=AF_FILE, 
path="/var/run/mod_fastcgi/dynamic/2a13e94a006b7f947a721cf995159615"}, 
63) = 0

fcntl(15, F_GETFL)  = 0x2 (flags O_RDWR)
fcntl(15, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
select(16, [15], [15], NULL, {3, 99784}) = 1 (out [15], left {3, 99781})
write(15, 
"\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\0\r\0\0\n\1SCRIPT_URL/\1\4\0\1\0+\0\0\n\37SCRIPT_URIhttp://gateway.ceph.chc.tlocal/\1\4\0\1\0\24\0\0\22\0HTTP_AUTHORIZATION\1\4\0\1\0&\0\0\17\25HTTP_USER_AGENTWget/1.15 
(linux-gnu)\1\4\0\1\0\20\0\0\v\3HTTP_ACCEPT*/*\1\4\0\1\0\"\0\0\t\27HTTP_HOSTgateway.ceph.chc.tlocal\1\4\0\1\0\33\0\0\17\nHTTP_CONNECTIONKee"..., 
841) = 841

select(16, [15], [], NULL, {3, 99624})  = 0 (Timeout)
write(12, "T /var/www/html/s3gw.fcgi 0 0*", 30) = 30
select(16, [15], [], NULL, {2, 996562}) = 0 (Timeout)
write(12, "T /var/www/html/s3gw.fcgi 0 0*", 30) = 30
select(16, [15], [], NULL, {2, 996992}) = 0 (Timeout)
write(12, "T /var/www/html/s3gw.fcgi 0 0*", 30) = 30
select(16, [15], [], NULL, {2, 996700}^C 
...


This continues until it times out.
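For completeness, the wrapper the strace is reading out of /var/www/html/s3gw.fcgi (visible in the read() above) is just the two-line script from the docs:

```sh
#!/bin/sh
exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
```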

and in /var/log/ceph/client.radosgw.gateway.log this repeats as all the 
other processes start:




2014-06-16 22:27:28.411653 7f84742fb820  0 ceph version 0.80.1 
(a38fe1169b6d2ac98b427334c12d7cf81f809b74), process radosgw, pid 12225
2014-06-16 22:27:28.411668 7f84742fb820 -1 WARNING: libcurl doesn't 
support curl_multi_wait()
2014-06-16 22:27:28.411672 7f84742fb820 -1 WARNING: cross zone / region 
transfer performance may be affected
2014-06-16 22:27:28.420286 7f84742fb820 -1 asok(0x8f2fe0) 
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed 
to bind the UNIX domain socket 

[ceph-users] radosgw issues

2014-06-15 Thread lists+ceph
I've just tried setting up the radosgw on centos6 according to 
http://ceph.com/docs/master/radosgw/config/


There didn't seem to be an init script in the rpm I installed, so I 
copied the one from here:

https://raw.githubusercontent.com/ceph/ceph/31b0823deb53a8300856db3c104c0e16d05e79f7/src/init-radosgw.sysv
That launches one process.
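For reference, my ceph.conf gateway section follows the pattern from the config guide. The values below are the doc-style defaults rather than a verbatim copy of my file, so treat the paths as illustrative:

```ini
[client.radosgw.gateway]
host = gateway
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
```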

While I can run the admin commands just fine to create users etc., 
making a simple wget request to the domain I set up returns a 500 due to 
a timeout.  Every request I make results in another radosgw process 
being created, which seems to start even more processes itself.  I only 
have to make a few requests to have about 60 radosgw processes.


I am at a bit of a loss to tell what is going on.  I've included what I 
assume to be the error below.  I see some failures to acquire locks, and 
it complaining that another process has already created the unix socket, 
but don't know how relevant those are.  If anyone could point me in the 
right direction, I would appreciate it.




2014-06-15 20:21:03.814081 7f8c94cf9700  1 -- 10.30.83.29:0/1028955 --> 
10.30.85.60:6806/18981 -- ping v1 -- ?+0 0x7f8c98068e50 con 0x1e73630
2014-06-15 20:21:03.814097 7f8c94cf9700  1 -- 10.30.83.29:0/1028955 --> 
10.30.85.60:6813/20066 -- ping v1 -- ?+0 0x7f8c980693e0 con 0x1e76460
2014-06-15 20:21:03.814108 7f8c94cf9700  1 -- 10.30.83.29:0/1028955 --> 
10.30.85.60:6810/19519 -- ping v1 -- ?+0 0x7f8c98069600 con 0x1e79b90

2014-06-15 20:21:03.815286 7fd779f84820  0 framework: fastcgi
2014-06-15 20:21:03.815305 7fd779f84820  0 starting handler: fastcgi
2014-06-15 20:21:03.816726 7fd74e6f8700  1 -- 10.30.83.29:0/1008789 --> 
10.30.85.60:6800/18433 -- osd_op(client.6329.0:24  [pgls start_epoch 0] 
7.0 ack+read e290) v4 -- ?+0 0x7fd75800a630 con 0x1505210
2014-06-15 20:21:03.817584 7fd770adb700  1 -- 10.30.83.29:0/1008789 <== 
osd.1 10.30.85.60:6800/18433 9  osd_op_reply(24  [pgls start_epoch 
0] v0'0 uv0 ondisk = 1) v6  167+0+44 (1244889235 0 139081063) 
0x7fd754000ce0 con 0x1505210
2014-06-15 20:21:03.817665 7fd770adb700  1 -- 10.30.83.29:0/1008789 --> 
10.30.85.60:6806/18981 -- osd_op(client.6329.0:25  [pgls start_epoch 0] 
7.1 ack+read e290) v4 -- ?+0 0x7fd75800aae0 con 0x15043c0
2014-06-15 20:21:03.819356 7fd770adb700  1 -- 10.30.83.29:0/1008789 <== 
osd.2 10.30.85.60:6806/18981 5  osd_op_reply(25  [pgls start_epoch 
0] v0'0 uv0 ondisk = 1) v6  167+0+44 (3347405639 0 139081063) 
0x1512950 con 0x15043c0
2014-06-15 20:21:03.819509 7fd770adb700  1 -- 10.30.83.29:0/1008789 --> 
10.30.85.61:6800/28605 -- osd_op(client.6329.0:26  [pgls start_epoch 0] 
7.2 ack+read e290) v4 -- ?+0 0x7fd75800c3e0 con 0x7fd75800ac80

2014-06-15 20:21:03.819635 7fd74fafa700  2 garbage collection: start
2014-06-15 20:21:03.819798 7fd74fafa700  1 -- 10.30.83.29:0/1008789 --> 
10.30.85.60:6806/18981 -- osd_op(client.6329.0:27 gc.21 [call lock.lock] 
6.6dc01772 ondisk+write e290) v4 -- ?+0 0x7fd7600022b0 con 0x15043c0
2014-06-15 20:21:03.823164 7fd770adb700  1 -- 10.30.83.29:0/1008789 <== 
osd.2 10.30.85.60:6806/18981 6  osd_op_reply(27 gc.21 [call] v0'0 
uv0 ondisk = -16 ((16) Device or resource busy)) v6  172+0+0 
(3774749926 0 0) 0x1512950 con 0x15043c0
2014-06-15 20:21:03.823309 7fd74fafa700  0 RGWGC::process() failed to 
acquire lock on gc.21
2014-06-15 20:21:03.823457 7fd74fafa700  1 -- 10.30.83.29:0/1008789 --> 
10.30.85.60:6810/19519 -- osd_op(client.6329.0:28 gc.22 [call lock.lock] 
6.97748d0d ondisk+write e290) v4 -- ?+0 0x7fd760002bc0 con 0x150b280
2014-06-15 20:21:03.821327 7fd74d2f6700 -1 common/Thread.cc: In function 
'void Thread::create(size_t)' thread 7fd74d2f6700 time 2014-06-15 
20:21:03.819948

common/Thread.cc: 110: FAILED assert(ret == 0)

 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
 1: (Thread::create(unsigned long)+0x8a) [0x7fd77900363a]
 2: (ThreadPool::start_threads()+0x12e) [0x7fd778fe770e]
 3: (ThreadPool::start()+0x7a) [0x7fd778feaa9a]
 4: (RGWFCGXProcess::run()+0x195) [0x4ae305]
 5: /usr/bin/radosgw() [0x4b3bbe]
 6: (()+0x79d1) [0x7fd7772929d1]
 7: (clone()+0x6d) [0x7fd776fdfb5d]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


--- begin dump of recent events ---
  -125> 2014-06-15 20:21:03.205715 7fd779f84820  5 asok(0x14c3070) 
register_command perfcounters_dump hook 0x14c4a80
  -124> 2014-06-15 20:21:03.205761 7fd779f84820  5 asok(0x14c3070) 
register_command 1 hook 0x14c4a80
  -123> 2014-06-15 20:21:03.205766 7fd779f84820  5 asok(0x14c3070) 
register_command perf dump hook 0x14c4a80
  -122> 2014-06-15 20:21:03.205773 7fd779f84820  5 asok(0x14c3070) 
register_command perfcounters_schema hook 0x14c4a80
  -121> 2014-06-15 20:21:03.205780 7fd779f84820  5 asok(0x14c3070) 
register_command 2 hook 0x14c4a80
  -120> 2014-06-15 20:21:03.205784 7fd779f84820  5 asok(0x14c3070) 
register_command perf schema hook 0x14c4a80
  -119> 2014-06-15 20:21:03.205787 7fd779f84820  5 asok(0x14c3070) 
register_command confi

[ceph-users] What exactly is the kernel rbd on osd issue?

2014-06-12 Thread lists+ceph
I remember reading somewhere that the kernel ceph clients (rbd/fs) could
not run on the same host as the OSD.  I tried finding where I saw that,
and could only come up with some irc chat logs.

The issue stated there is that there can be some kind of deadlock.  Is
this true, and if so, would you have to run a totally different kernel
in a vm, or would some form of namespacing be enough to avoid it?


[ceph-users] rbd: add failed: (34) Numerical result out of range

2014-06-09 Thread lists+ceph

I was building a small test cluster and noticed a difference with trying
to rbd map depending on whether the cluster was built using fedora or
CentOS.

When I used CentOS osds, and tried to rbd map from arch linux or fedora,
I would get "rbd: add failed: (34) Numerical result out of range".  It
seemed to happen when the tool was writing to /sys/bus/rbd/add_single_major.

If I rebuild the osds using fedora (20 in this case), everything
works fine.

In each scenario, I used ceph-0.80.1 on all the boxes.

Is that expected?