I'm having a problem getting RadosGW replication to work after upgrading to
Apache 2.4 on my primary test cluster.  Upgrading the secondary cluster to
Apache 2.4 doesn't cause any problems.  Both Ceph's Apache packages and
Ubuntu's packages cause the same problem.

I'm pretty sure I'm missing something obvious, but I'm not seeing it.

Has anybody else upgraded their federated gateways to Apache 2.4?

My setup:

   - 2 VMs, each running its own Ceph cluster with replication=1
   - test0-ceph.cdlocal is the primary zone, named us-west
   - test1-ceph.cdlocal is the secondary zone, named us-central

Before I start, replication works, and I'm running:

   - Ubuntu 14.04 LTS
   - Emperor (0.72.2-1precise, retained using apt-hold)
   - Apache 2.2 (2.2.22-2precise.ceph, retained using apt-hold)

As soon as I upgrade Apache to 2.4 in the primary cluster, replication
starts failing with permission errors.  From radosgw-agent.log:
2014-10-23T15:13:43.022 31106:ERROR:radosgw_agent.worker:failed to sync
object bucket3/test6.jpg: state is error

The access logs from the primary say (using vhost_combined log format):
test0-ceph.cdlocal:80 - - [23/Oct/2014:15:16:51 -0700] "PUT
/test6.jpg HTTP/1.1" 200 209 "-" "-"
- - - [23/Oct/2014:13:24:18 -0700] "GET
/?delimiter=/ HTTP/1.1" 200 1254 "-" "-" "bucket3.test0-ceph.cdlocal"
test0-ceph.cdlocal:80 - - [23/Oct/2014:15:17:34 -0700] "GET
HTTP/1.1" 200 398 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
test0-ceph.cdlocal:80 - - [23/Oct/2014:15:17:34 -0700] "GET
HTTP/1.1" 403 249 "-" "-"

is the primary cluster, .144 is the secondary cluster, and
.1 is my workstation.

The access logs on the secondary show:
test1-ceph.cdlocal:80 - - [23/Oct/2014:15:18:07 -0700] "GET
HTTP/1.1" 200 643 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
test1-ceph.cdlocal:80 - - [23/Oct/2014:15:18:07 -0700] "PUT
HTTP/1.1" 403 286 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
test1-ceph.cdlocal:80 - - [23/Oct/2014:15:18:07 -0700] "GET
HTTP/1.1" 200 355 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
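
To pick the failing requests out of longer access logs, something like the following can help. It is only a rough sketch: the regular expression and the failed_requests helper are my own illustration, not part of Apache or radosgw, and the pattern is deliberately loose so that it also copes with entries wrapped across lines, as they are in this message.

```python
import re

# Loose pattern: timestamp, request line, status, and size of a
# combined-style entry. Whatever precedes the timestamp is ignored,
# and the request line may contain a line break.
ENTRY = re.compile(
    r'\[(?P<time>[^\]]+)\]\s+"(?P<request>[^"]*)"\s+'
    r'(?P<status>\d{3})\s+(?P<size>\d+|-)'
)


def failed_requests(log_text, min_status=400):
    """Return (time, method, status) for entries with status >= min_status."""
    failures = []
    for match in ENTRY.finditer(log_text):
        status = int(match.group("status"))
        if status >= min_status:
            parts = match.group("request").split()
            method = parts[0] if parts else ""
            failures.append((match.group("time"), method, status))
    return failures


if __name__ == "__main__":
    import sys

    # Usage: python failed_requests.py < access.log
    for time, method, status in failed_requests(sys.stdin.read()):
        print(time, method, status)
```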

If I crank up radosgw debugging, it tells me that the calculated digest is
correct for the /admin/* requests, but fails for the object GET:
2014-10-23 15:44:29.257688 7fa6fcfb9700 15 calculated
2014-10-23 15:44:29.257690 7fa6fcfb9700 15
2014-10-23 15:44:29.257691 7fa6fcfb9700 15 compare=0
2014-10-23 15:44:29.257693 7fa6fcfb9700 20 system request
2014-10-23 15:44:29.411572 7fa6fc7b8700 15 calculated
2014-10-23 15:44:29.411573 7fa6fc7b8700 15
2014-10-23 15:44:29.411574 7fa6fc7b8700 15 compare=-41
2014-10-23 15:44:29.411577 7fa6fc7b8700 10 failed to authorize request

That explains the 403 responses.
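
For anyone unfamiliar with what is being compared here: as far as I understand it, the digest is an S3-style (AWS signature version 2) HMAC-SHA1 over a few request fields. The sketch below is my own illustration of that scheme, not radosgw's code, and sign_v2 and its parameters are hypothetical names. It shows that the digest depends on the method, date, and resource path, so any difference between what the client signed and what the gateway sees would produce a mismatch. I have not confirmed that this is what Apache 2.4 is doing.

```python
import base64
import hashlib
import hmac


def sign_v2(secret_key, method, resource, date,
            content_md5="", content_type="", amz_headers=""):
    """Compute an S3 signature-v2 style digest.

    amz_headers is expected to be already canonicalized, i.e. zero or
    more "name:value\n" lines.
    """
    string_to_sign = (
        "\n".join([method, content_md5, content_type, date])
        + "\n" + amz_headers + resource
    )
    mac = hmac.new(secret_key.encode("utf-8"),
                   string_to_sign.encode("utf-8"),
                   hashlib.sha1)
    return base64.b64encode(mac.digest()).decode("ascii")


if __name__ == "__main__":
    # Placeholder secret and date, for illustration only.
    date = "Thu, 23 Oct 2014 22:44:29 GMT"
    signed = sign_v2("not-a-real-secret", "GET", "/bucket3/test6.jpg", date)
    seen = sign_v2("not-a-real-secret", "GET", "/bucket3/test6%2Ejpg", date)
    # The same object addressed with a differently encoded path
    # yields a different digest.
    print(signed == seen)
```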

So I have metadata replication working, but the data replication is failing
with permission problems.  I verified that I can create users and buckets
in the primary, and have them replicate to the secondary.

A similar situation was posted to the list before.  That time, the problem
was that the system users weren't correctly deployed to both the primary
and secondary clusters.  I verified that both users exist in both clusters,
with the same access key and secret key.
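
The comparison can be scripted roughly as follows, given the JSON printed by "radosgw-admin user info" on each cluster. This is a sketch under assumptions: I am assuming the output has a "keys" list whose entries carry "access_key" and "secret_key" fields, and key_pairs and same_keys are hypothetical helper names.

```python
import json


def key_pairs(user_info_json):
    """Return the set of (access_key, secret_key) pairs in a user info dump."""
    info = json.loads(user_info_json)
    # Assumed layout: {"keys": [{"access_key": ..., "secret_key": ...}, ...]}
    return {(k["access_key"], k["secret_key"]) for k in info.get("keys", [])}


def same_keys(primary_json, secondary_json):
    """True if both clusters hold exactly the same key pairs for the user."""
    return key_pairs(primary_json) == key_pairs(secondary_json)


if __name__ == "__main__":
    import sys

    # Usage: python same_keys.py primary.json secondary.json
    with open(sys.argv[1]) as p, open(sys.argv[2]) as s:
        print(same_keys(p.read(), s.read()))
```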

Just to test, I used s3cmd.  I can read and write to both clusters using
both system users' credentials.

Anybody have any ideas?
ceph-users mailing list
