Re: [ceph-users] RGW Federated Gateways and Apache 2.4 problems

2014-10-24 Thread Craig Lewis
Thanks!  I'll continue with Apache 2.2 until the next release.

On Fri, Oct 24, 2014 at 8:58 AM, Yehuda Sadeh wrote:

> You're hitting issue #9206. Apache 2.4 filters out certain HTTP
> headers because they use underscores instead of dashes. There's a fix
> for that in Firefly, although it hasn't made it into an officially
> released version yet.
>
> Yehuda
>

Re: [ceph-users] RGW Federated Gateways and Apache 2.4 problems

2014-10-24 Thread Yehuda Sadeh
You're hitting issue #9206. Apache 2.4 filters out certain HTTP
headers because they use underscores instead of dashes. There's a fix
for that in Firefly, although it hasn't made it into an officially
released version yet.
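
To illustrate the filtering described above, here is a minimal sketch. It assumes a header name is passed through only when it consists of letters, digits and hyphens. The header names below are hypothetical and are not taken from RGW or radosgw-agent.

```python
import re

# Assumption: only names made of letters, digits and hyphens are passed on;
# anything else (for example a name containing an underscore) is dropped.
_SAFE_NAME = re.compile(r"[A-Za-z0-9-]+")


def split_headers(headers):
    """Split a header dict into (kept, dropped) under the assumed rule."""
    kept = {n: v for n, v in headers.items() if _SAFE_NAME.fullmatch(n)}
    dropped = {n: v for n, v in headers.items() if n not in kept}
    return kept, dropped


# Hypothetical header names, for illustration only.
example = {
    "Authorization": "AWS access:signature",
    "x-example-source-zone": "us-west",
    "x_example_source_zone": "us-west",
}

kept, dropped = split_headers(example)
print("kept:   ", sorted(kept))
print("dropped:", sorted(dropped))
```

Under that assumption, only the underscore variant ends up in `dropped`, so the gateway behind Apache would never see it.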

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW Federated Gateways and Apache 2.4 problems

2014-10-23 Thread Craig Lewis
I'm having a problem getting RadosGW replication to work after upgrading to
Apache 2.4 on my primary test cluster.  Upgrading the secondary cluster to
Apache 2.4 doesn't cause any problems. Both Ceph's apache packages and
Ubuntu's packages cause the same problem.

I'm pretty sure I'm missing something obvious, but I'm not seeing it.

Has anybody else upgraded their federated gateways to Apache 2.4?



My setup:
2 VMs, each running its own Ceph cluster with replication=1
test0-ceph.cdlocal is the primary zone, named us-west
test1-ceph.cdlocal is the secondary zone, named us-central
Before I start, replication works, and I'm running:

   - Ubuntu 14.04 LTS
   - Emperor (0.72.2-1precise, retained using apt-hold)
   - Apache 2.2 (2.2.22-2precise.ceph, retained using apt-hold)


As soon as I upgrade Apache to 2.4 in the primary cluster, replication gets
permission errors.  radosgw-agent.log:
2014-10-23T15:13:43.022 31106:ERROR:radosgw_agent.worker:failed to sync object bucket3/test6.jpg: state is error

The access logs from the primary say (using vhost_combined log format):
test0-ceph.cdlocal:80 172.16.205.1 - - [23/Oct/2014:15:16:51 -0700] "PUT /test6.jpg HTTP/1.1" 200 209 "-" "-"
- - - [23/Oct/2014:13:24:18 -0700] "GET /?delimiter=/ HTTP/1.1" 200 1254 "-" "-" "bucket3.test0-ceph.cdlocal"

test0-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:17:34 -0700] "GET /admin/log?marker=089.89.3&type=bucket-index&bucket-instance=bucket3%3Aus-west.5697.2&max-entries=1000 HTTP/1.1" 200 398 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
test0-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:17:34 -0700] "GET /bucket3/test6.jpg?rgwx-uid=us-central&rgwx-region=us&rgwx-prepend-metadata=us HTTP/1.1" 403 249 "-" "-"

172.16.205.143 is the primary cluster, .144 is the secondary cluster, and
.1 is my workstation.


The access logs on the secondary show:
test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "GET /admin/replica_log?bounds&type=bucket-index&bucket-instance=bucket3%3Aus-west.5697.2 HTTP/1.1" 200 643 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "PUT /bucket3/test6.jpg?rgwx-op-id=test1-ceph0.cdlocal%3A6484%3A3&rgwx-source-zone=us-west&rgwx-client-id=radosgw-agent HTTP/1.1" 403 286 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "GET /admin/opstate?client-id=radosgw-agent&object=bucket3%2Ftest6.jpg&op-id=test1-ceph0.cdlocal%3A6484%3A3 HTTP/1.1" 200 355 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"

If I crank up radosgw debugging, it tells me that the calculated digest is
correct for the /admin/* requests, but fails for the object GET:
/admin/log
2014-10-23 15:44:29.257688 7fa6fcfb9700 15 calculated digest=6Tt13P6naWJEc0mJmYyDj6NzBS8=
2014-10-23 15:44:29.257690 7fa6fcfb9700 15 auth_sign=6Tt13P6naWJEc0mJmYyDj6NzBS8=
2014-10-23 15:44:29.257691 7fa6fcfb9700 15 compare=0
2014-10-23 15:44:29.257693 7fa6fcfb9700 20 system request

/bucket3/test6.jpg
2014-10-23 15:44:29.411572 7fa6fc7b8700 15 calculated digest=pYWIOwRxCh4/bZ/D7b9RnS7RT1U=
2014-10-23 15:44:29.411573 7fa6fc7b8700 15 auth_sign=Gv398QNc6gLig9/0QbdO+1UZUq0=
2014-10-23 15:44:29.411574 7fa6fc7b8700 15 compare=-41
2014-10-23 15:44:29.411577 7fa6fc7b8700 10 failed to authorize request

That explains the 403 responses.
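
For anyone unfamiliar with what the digest comparison is doing, here is a minimal sketch. It assumes S3-style version 2 signing: Base64 of an HMAC-SHA1 over a string-to-sign. The secret, date and header name are made up, and the string-to-sign is simplified (no Content-MD5 or Content-Type). It only shows that the two digests stop matching if the server rebuilds the string from a different set of headers than the client signed.

```python
import base64
import hashlib
import hmac


def string_to_sign(method, resource, date, amz_headers):
    """Simplified v2 string-to-sign: empty Content-MD5 and Content-Type."""
    lowered = {k.lower(): v for k, v in amz_headers.items()}
    canon = "".join(f"{k}:{lowered[k]}\n" for k in sorted(lowered))
    return f"{method}\n\n\n{date}\n{canon}{resource}"


def sign_v2(secret, to_sign):
    """Base64(HMAC-SHA1(secret, string-to-sign))."""
    mac = hmac.new(secret.encode(), to_sign.encode(), hashlib.sha1)
    return base64.b64encode(mac.digest()).decode()


# All values below are hypothetical.
secret = "example-secret"
date = "Thu, 23 Oct 2014 22:44:29 GMT"
resource = "/bucket3/test6.jpg"

# The client signs a request that includes one extra x-amz-* header.
client_sig = sign_v2(
    secret,
    string_to_sign("GET", resource, date, {"x-amz-example": "value"}),
)

# The server recomputes the digest without that header (as if it was dropped).
server_sig = sign_v2(secret, string_to_sign("GET", resource, date, {}))

print("auth_sign        =", client_sig)
print("calculated digest=", server_sig)
print("match:", hmac.compare_digest(client_sig, server_sig))
```

The digests in the log above are 28 characters long, which is the Base64 length of a 20-byte SHA-1 output, so they are at least consistent with this kind of signing.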

So I have metadata replication working, but the data replication is failing
with permission problems.  I verified that I can create users and buckets
in the primary, and have them replicate to the secondary.


A similar situation was posted to the list before.  That time, the problem
was that the system users weren't correctly deployed to both the primary
and secondary clusters.  I verified that both users exist in both clusters,
with the same access and secret.

Just to test, I used s3cmd.  I can read and write to both clusters using
both system users' credentials.


Anybody have any ideas?