Hi,

We have a setup of 4 servers running Ceph and radosgw. We use it as an internal 
S3 service for our files. The servers run Debian Squeeze with Ceph 0.67.4. 

The cluster has been running smoothly for quite a while, but we are currently 
experiencing issues with radosgw. For some files the HTTP download just 
stalls at around 500 KB. 

The Apache error log just says:
[error] [client ] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle 
timeout (30 sec)
[error] [client ] Handler for fastcgi-script returned invalid result code 1

radosgw logging:
7f00bc66a700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 
0x7f00934bb700' had timed out after 600
7f00bc66a700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 
0x7f00ab4eb700' had timed out after 600
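
To see which worker threads are affected, the heartbeat messages can be counted
per thread id. The following is only a minimal sketch in Python: the function
name timed_out_threads and the regular expression are my own illustration, not
part of Ceph, and it assumes the messages keep the format shown above.

```python
import re
from collections import Counter

# Matches the heartbeat messages quoted above, e.g.
# heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f00934bb700' had timed out after 600
# The pattern is an assumption based only on those two log lines.
TIMEOUT_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<pool>\S+) thread (?P<thread>0x[0-9a-f]+)' "
    r"had timed out after (?P<seconds>\d+)"
)


def timed_out_threads(log_text):
    """Count heartbeat timeout messages per worker thread id."""
    # Lines may be wrapped (as in this mail), so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return Counter(m.group("thread") for m in TIMEOUT_RE.finditer(flat))


# The two log lines from this mail, including the line wrapping.
sample = """
7f00bc66a700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread
0x7f00934bb700' had timed out after 600
7f00bc66a700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread
0x7f00ab4eb700' had timed out after 600
"""

for thread, count in sorted(timed_out_threads(sample).items()):
    print(thread, count)
```

Run against the full radosgw log, this shows whether the same few threads keep
timing out or whether the number of stuck threads grows over time.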

The interesting thing is that the cluster health is fine and only some files 
are not working properly. Most of them work fine. A restart of radosgw fixes 
the issue. The other Ceph logs are also clean.

Any idea why this happens?

Sebastian


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
