OK, I believe I’ve made some progress here. I have everything syncing *except* data: object data gets a 500 when it tries to sync to the backup zone. Here is a log from the radosgw with debug cranked up to 20:
2014-11-11 14:37:06.688331 7f54447f0700 1 ====== starting new request req=0x7f546800f3b0 =====
2014-11-11 14:37:06.688978 7f54447f0700 0 WARNING: couldn't find acl header for bucket, generating default
2014-11-11 14:37:06.689358 7f54447f0700 1 -- 172.16.10.103:0/1007381 --> 172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 -- ?+0 0x7f534800d770 con 0x7f53f00053f0
2014-11-11 14:37:06.689396 7f54447f0700 20 -- 172.16.10.103:0/1007381 submit_message osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 remote, 172.16.10.103:6934/14875, have pipe.
2014-11-11 14:37:06.689481 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689592 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer encoding 48 features 17592186044415 0x7f534800d770 osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4
2014-11-11 14:37:06.689756 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer signed seq # 48): sig = 206599450695048354
2014-11-11 14:37:06.689804 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sending 48 0x7f534800d770
2014-11-11 14:37:06.689884 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.689915 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.694968 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ACK
2014-11-11 14:37:06.695053 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ack seq 48
2014-11-11 14:37:06.695067 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.695079 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got MSG
2014-11-11 14:37:06.695093 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got envelope type=43 src osd.25 front=190 data=0 off 0
2014-11-11 14:37:06.695108 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader wants 190 from dispatch throttler 0/104857600
2014-11-11 14:37:06.695135 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got front 190
2014-11-11 14:37:06.695150 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).aborted = 0
2014-11-11 14:37:06.695158 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got 190 + 0 + 0 byte message
2014-11-11 14:37:06.695284 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got message 48 0x7f51b4001950 osd_op_reply(1783 statelog.obj_opstate.97 [call] v47531'13 uv13 ondisk = 0) v6
2014-11-11 14:37:06.695313 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 queue 0x7f51b4001950 prio 127
2014-11-11 14:37:06.695374 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.695384 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.695426 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).write_ack 48
2014-11-11 14:37:06.695421 7f54ebfff700 1 -- 172.16.10.103:0/1007381 <== osd.25 172.16.10.103:6934/14875 48 ==== osd_op_reply(1783 statelog.obj_opstate.97 [call] v47531'13 uv13 ondisk = 0) v6 ==== 190+0+0 (4092879147 0 0) 0x7f51b4001950 con 0x7f53f00053f0
2014-11-11 14:37:06.695458 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.695476 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.695495 7f54ebfff700 10 -- 172.16.10.103:0/1007381 dispatch_throttle_release 190 to dispatch throttler 190/104857600
2014-11-11 14:37:06.695506 7f54ebfff700 20 -- 172.16.10.103:0/1007381 done calling dispatch on 0x7f51b4001950
2014-11-11 14:37:06.695616 7f54447f0700 0 > HTTP_DATE -> Tue Nov 11 14:37:06 2014
2014-11-11 14:37:06.695636 7f54447f0700 0 > HTTP_X_AMZ_COPY_SOURCE -> test/upload
2014-11-11 14:37:06.696823 7f54447f0700 1 -- 172.16.10.103:0/1007381 --> 172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1784 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 -- ?+0 0x7f534800fbb0 con 0x7f53f00053f0
2014-11-11 14:37:06.696866 7f54447f0700 20 -- 172.16.10.103:0/1007381 submit_message osd_op(client.5673295.0:1784 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4 remote, 172.16.10.103:6934/14875, have pipe.
2014-11-11 14:37:06.696935 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.696972 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer encoding 49 features 17592186044415 0x7f534800fbb0 osd_op(client.5673295.0:1784 statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write e47531) v4
2014-11-11 14:37:06.697120 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer signed seq # 49): sig = 6092508395557517420
2014-11-11 14:37:06.697161 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sending 49 0x7f534800fbb0
2014-11-11 14:37:06.697223 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.697257 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.701315 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ACK
2014-11-11 14:37:06.701364 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got ack seq 49
2014-11-11 14:37:06.701376 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.701389 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got MSG
2014-11-11 14:37:06.701402 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got envelope type=43 src osd.25 front=190 data=0 off 0
2014-11-11 14:37:06.701415 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader wants 190 from dispatch throttler 0/104857600
2014-11-11 14:37:06.701435 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got front 190
2014-11-11 14:37:06.701449 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).aborted = 0
2014-11-11 14:37:06.701458 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got 190 + 0 + 0 byte message
2014-11-11 14:37:06.701569 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader got message 49 0x7f51b4001460 osd_op_reply(1784 statelog.obj_opstate.97 [call] v47531'14 uv14 ondisk = 0) v6
2014-11-11 14:37:06.701597 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 queue 0x7f51b4001460 prio 127
2014-11-11 14:37:06.701627 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
2014-11-11 14:37:06.701636 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.701678 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).write_ack 49
2014-11-11 14:37:06.701684 7f54ebfff700 1 -- 172.16.10.103:0/1007381 <== osd.25 172.16.10.103:6934/14875 49 ==== osd_op_reply(1784 statelog.obj_opstate.97 [call] v47531'14 uv14 ondisk = 0) v6 ==== 190+0+0 (1714651716 0 0) 0x7f51b4001460 con 0x7f53f00053f0
2014-11-11 14:37:06.701710 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
2014-11-11 14:37:06.701728 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524 cs=1 l=1 c=0x7f53f00053f0).writer sleeping
2014-11-11 14:37:06.701751 7f54ebfff700 10 -- 172.16.10.103:0/1007381 dispatch_throttle_release 190 to dispatch throttler 190/104857600
2014-11-11 14:37:06.701762 7f54ebfff700 20 -- 172.16.10.103:0/1007381 done calling dispatch on 0x7f51b4001460
2014-11-11 14:37:06.701815 7f54447f0700 0 WARNING: set_req_state_err err_no=5 resorting to 500
2014-11-11 14:37:06.701894 7f54447f0700 1 ====== req done req=0x7f546800f3b0 http_status=500 ======

Any information you could give me would be wonderful, as I’ve been banging my head against this for a few days.

Thanks, Aaron

> On Nov 5, 2014, at 3:02 PM, Aaron Bassett <aa...@five3genomics.com> wrote:
> 
> Ah, so I need both users in both clusters? I think I missed that bit; let me see if that does the trick.
> 
> Aaron
>> On Nov 5, 2014, at 2:59 PM, Craig Lewis <cle...@centraldesktop.com> wrote:
>> 
>> One region, two zones is the standard setup, so that should be fine.
>> 
>> Is metadata (users and buckets) being replicated, but not data (objects)?
>> 
>> Let's go through a quick checklist:
>> Verify that you enabled log_meta and log_data in the region.json for the master zone.
>> Verify that RadosGW is using your region map with radosgw-admin regionmap get --name client.radosgw.<name>
>> Verify that RadosGW is using your zone map with radosgw-admin zone get --name client.radosgw.<name>
>> Verify that all the pools in your zone exist (RadosGW only auto-creates the basic ones).
>> Verify that your system users exist in both zones with the same access and secret.
>> Hopefully that gives you an idea of what's not working correctly.
>> 
>> If it doesn't, crank up the logging on the radosgw daemon on both sides, and check the logs. Add debug rgw = 20 to both ceph.conf files (in the client.radosgw.<name> section), and restart. Hopefully those logs will tell you what's wrong.
>> 
>> On Wed, Nov 5, 2014 at 11:39 AM, Aaron Bassett <aa...@five3genomics.com> wrote:
>> Hello everyone,
>> I am attempting to set up a two-cluster configuration for object storage disaster recovery. I have two physically separate sites, so using one big cluster isn’t an option. I’m attempting to follow the guide at http://ceph.com/docs/v0.80.5/radosgw/federated-config/ . After a couple of days of flailing, I’ve settled on using one region with two zones, where each cluster is a zone. I’m now attempting to set up an agent as per the “Multi-Site Data Replication” section. The agent kicks off OK and starts making all sorts of connections, but no objects were being copied to the non-master zone.
>> I re-ran the agent with the -v flag and saw a lot of:
>> 
>> DEBUG:urllib3.connectionpool:"GET /admin/opstate?client-id=radosgw-agent&object=test%2F_shadow_.JjVixjWmebQTrRed36FL6D0vy2gDVZ__39&op-id=phx-r1-head1%3A2451615%3A1 HTTP/1.1" 200 None
>> DEBUG:radosgw_agent.worker:op state is []
>> DEBUG:radosgw_agent.worker:error geting op state: list index out of range
>> 
>> So it appears something is still wrong with my agent, though I have no idea what. I can’t seem to find any errors in any other logs. Does anyone have any insight here?
>> 
>> I’m also wondering whether what I’m attempting, two clusters in the same region as separate zones, makes sense?
>> 
>> Thanks, Aaron
>> 
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
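For anyone digging through similar debug output: the interesting lines in a debug-20 rgw log are the request start/end markers and the warnings (in the log at the top of this thread, that's the set_req_state_err err_no=5 line right before http_status=500; errno 5 is EIO, which rgw has no specific HTTP mapping for, hence the generic 500). A tiny throwaway filter like this (hypothetical helper, not part of radosgw-agent; the patterns just match the log format shown above) makes the signal easier to see:

```python
import re

# Keep only the request-level triage lines from a "debug rgw = 20" log:
# request start/end markers and WARNING lines. Everything else (messenger
# reader/writer chatter) is noise for this kind of problem.
MARKERS = re.compile(r"====== starting new request|====== req done|WARNING")

def summarize_rgw_log(lines):
    """Return only the lines that matter for request-level triage."""
    return [ln for ln in lines if MARKERS.search(ln)]

# A few lines lifted from the log excerpt in this thread:
sample = [
    "2014-11-11 14:37:06.688331 7f54447f0700 1 ====== starting new request req=0x7f546800f3b0 =====",
    "2014-11-11 14:37:06.689481 7f51ff1f1700 10 ... .writer: state = open policy.server=0",
    "2014-11-11 14:37:06.701815 7f54447f0700 0 WARNING: set_req_state_err err_no=5 resorting to 500",
    "2014-11-11 14:37:06.701894 7f54447f0700 1 ====== req done req=0x7f546800f3b0 http_status=500 ======",
]

for ln in summarize_rgw_log(sample):
    print(ln)
```

Run against the full log, this collapses each request down to its start marker, any warnings, and the final http_status, which makes it much quicker to spot which operations are failing.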
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com