Oops, we have another check in do_op which returns ealier when r is ENOENT and obc doesn't exist.
if (r && (r != -ENOENT || !obc)) { dout(20) << __func__ << "find_object_context got error " << r << dendl; osd->reply_op_error(op, r); return; } We can't simply remove this check as with the obs not existing case because the obc is used later... -----Original Message----- From: Wang, Zhiqiang Sent: Friday, June 5, 2015 8:29 AM To: Samuel Just Cc: David Zafman; Sage Weil; ceph-devel@vger.kernel.org Subject: RE: 'Racing read got wrong version' during proxy write testing Hi Sam, You are right, making the 'copy get' as a cache op works for the promotion case, but it has some troubles in the copy from operation. I tried it yesterday and ran into some failures on the copy from operation. I think removing the check in do_op is the right answer. For coy_get, I think we can still return ENOENT along with the reqids in the payload, instead of returning 0 and indicating it doesn't exist in the payload. This can keep most of the 'finishing promote' code unchanged. What do you think? -----Original Message----- From: Samuel Just [mailto:sj...@redhat.com] Sent: Friday, June 5, 2015 2:20 AM To: Wang, Zhiqiang Cc: David Zafman; Sage Weil; ceph-devel@vger.kernel.org Subject: Re: 'Racing read got wrong version' during proxy write testing Yeah, I think the right answer is to remove that check and restructure all of the existing reader ops which rely on it to return ENOENT from do_osd_ops. For copy_get, we instead return 0 and indicate in the structured payload that the object logically does not exist (along with the relevant log entries). -Sam ----- Original Message ----- From: "Samuel Just" <sj...@redhat.com> To: "Zhiqiang Wang" <zhiqiang.w...@intel.com> Cc: "David Zafman" <dzaf...@redhat.com>, "Sage Weil" <sw...@redhat.com>, ceph-devel@vger.kernel.org Sent: Thursday, June 4, 2015 11:12:14 AM Subject: Re: 'Racing read got wrong version' during proxy write testing copy_get can happen outside of the context of a cache op, so we can't (shouldn't) just flag it as a cache op. I wonder whether it wouldn't be better to live without that check at all and let all of the ops in do_osd_ops individually return ENOENT as appropriate. -Sam ----- Original Message ----- From: "Zhiqiang Wang" <zhiqiang.w...@intel.com> To: "David Zafman" <dzaf...@redhat.com>, "Sage Weil" <sw...@redhat.com> Cc: ceph-devel@vger.kernel.org Sent: Wednesday, June 3, 2015 7:08:35 PM Subject: RE: 'Racing read got wrong version' during proxy write testing Hi David, Proxy write hasn't been merge into master yet. It's not likely this is causing #11511. -----Original Message----- From: David Zafman [mailto:dzaf...@redhat.com] Sent: Thursday, June 4, 2015 9:46 AM To: Wang, Zhiqiang; Sage Weil Cc: ceph-devel@vger.kernel.org Subject: Re: 'Racing read got wrong version' during proxy write testing I'm wonder if this issue could be the cause of #11511. Could a proxy write have raced with the fill_in_copy_get() so object_info_t size doesn't correspond with the size of the object in the filestore? David On 6/3/15 6:22 PM, Wang, Zhiqiang wrote: > Making the 'copy get' op to be a cache op seems like a good idea. > > -----Original Message----- > From: Sage Weil [mailto:sw...@redhat.com] > Sent: Thursday, June 4, 2015 9:14 AM > To: Wang, Zhiqiang > Cc: ceph-devel@vger.kernel.org > Subject: RE: 'Racing read got wrong version' during proxy write > testing > > On Wed, 3 Jun 2015, Wang, Zhiqiang wrote: >> I ran into the 'op not idempotent' problem during the testing today. >> There is one bug in the previous fix. In that fix, we copy the reqids >> in the final step of 'fill_in_copy_get'. If the object is deleted, >> since the 'copy get' op is a read op, it returns earlier with ENOENT in >> do_op. >> No reqids will be copied during promotion in this case. This again >> leads to the 'op not idempotent' problem. We need a 'smart' way to >> detect the op is a 'copy get' op (looping the ops vector doesn't seem >> smart?) and copy the reqids in this case. > Hmm. I think the idea here is/was that that ENOENT would somehow include the > reqid list from PGLog::get_object_reqids(). > > I think teh trick is getting it past the generic check in do_op: > > if (!op->may_write() && > !op->may_cache() && > (!obc->obs.exists || > ((m->get_snapid() != CEPH_SNAPDIR) && > obc->obs.oi.is_whiteout()))) { > reply_ctx(ctx, -ENOENT); > return; > } > > Maybe we mark these as cache operations so that may_cache is true? > > Sam, what do you think? > > sage > > >> -----Original Message----- >> From: Sage Weil [mailto:sw...@redhat.com] >> Sent: Tuesday, May 26, 2015 12:27 AM >> To: Wang, Zhiqiang >> Cc: ceph-devel@vger.kernel.org >> Subject: Re: 'Racing read got wrong version' during proxy write >> testing >> >> On Mon, 25 May 2015, Wang, Zhiqiang wrote: >>> Hi all, >>> >>> I ran into a problem during the teuthology test of proxy write. It is like >>> this: >>> >>> - Client sends 3 writes and a read on the same object to base tier >>> - Set up cache tiering >>> - Client retries ops and sends the 3 writes and 1 read to the cache >>> tier >>> - The 3 writes finished on the base tier, say with versions v1, v2 >>> and >>> v3 >>> - Cache tier proxies the 1st write, and start to promote the object >>> for the 2nd write, the 2nd and 3rd writes and the read are blocked >>> - The proxied 1st write finishes on the base tier with version v4, >>> and returns to cache tier. But somehow the cache tier fails to send >>> the reply due to socket failure injecting >>> - Client retries the writes and the read again, the writes are >>> identified as dup ops >>> - The promotion finishes, it copies the pg_log entries from the base >>> tier and put it in the cache tier's pg_log. This includes the 3 >>> writes on the base tier and the proxied write >>> - The writes dispatches after the promotion, they are identified as >>> completed dup ops. Cache tier replies these write ops with the >>> version from the base tier (v1, v2 and v3) >>> - In the last, the read dispatches, it reads the version of the >>> proxied write (v4) and replies to client >>> - Client complains that 'racing read got wrong version' >>> >>> In a previous discussion of the 'ops not idempotent' problem, we solved it >>> by copying the pg_log entries in the base tier to cache tier during >>> promotion. Seems like there is still a problem with this approach in the >>> above scenario. My first thought is that when proxying the write, the cache >>> tier should use the original reqid from the client. But currently we don't >>> have a way to pass the original reqid from cache to base. Any ideas? >> I agree--I think the correct fix here is to make the proxied op be >> recognized as a dup. We can either do that by passing in an optional reqid >> to the Objecter, or extending the op somehow so that both reqids are listed. >> I think the first option will be cleaner, but I think we will also need to >> make sure the 'retry' count is preserved as (I think) we skip the dup check >> if retry==0. And we probably want to preserve the behavior that a given >> (reqid, retry) only exists once in the system. >> >> This probably means adding more optional args to Objecter::read()...? >> >> sage >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" >> in the body of a message to majord...@vger.kernel.org More majordomo >> info at http://vger.kernel.org/majordomo-info.html >> >> > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" > in the body of a message to majord...@vger.kernel.org More majordomo > info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html