My understanding is that if a read op specifies the RWORDERED flag, it is processed in order; otherwise, it may be processed out of order.
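The distinction can be made concrete with a toy model. The Python sketch below rests on simplifying assumptions: all reads to one object are submitted at once, each has a fixed latency (small for a cache hit, large for a disk read), and a read carrying the flag may not complete before anything submitted ahead of it. `ReadOp`, `completion_order`, and the `rw_ordered` field are hypothetical names, not librados or OSD code, and real OSD scheduling is considerably more involved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadOp:
    name: str
    latency: int              # hypothetical service time: small for a cache hit, large for disk
    rw_ordered: bool = False  # hypothetical stand-in for the RWORDERED flag


def completion_order(ops: List[ReadOp]) -> List[str]:
    """Toy model of one OSD serving reads to a single object.

    All ops are submitted at time 0, in list order. An unflagged read
    completes as soon as its own latency has elapsed; a flagged read may not
    complete before any op that was submitted ahead of it.
    """
    finish_times = []
    latest_so_far = 0  # latest finish time among the ops submitted so far
    for op in ops:
        if op.rw_ordered:
            finish = max(op.latency, latest_so_far)
        else:
            finish = op.latency
        finish_times.append(finish)
        latest_so_far = max(latest_so_far, finish)
    # Earlier finish completes first; ties go to the op submitted first.
    order = sorted(range(len(ops)), key=lambda i: (finish_times[i], i))
    return [ops[i].name for i in order]


reads = [ReadOp("large_read", 10), ReadOp("stat", 1), ReadOp("cached_read", 2)]
print(completion_order(reads))  # ['stat', 'cached_read', 'large_read']

ordered = [ReadOp(r.name, r.latency, rw_ordered=True) for r in reads]
print(completion_order(ordered))  # ['large_read', 'stat', 'cached_read']
```

Without the flag, the stat and the cache hit overtake the large disk read, which is the kind of speedup Sage describes; with the flag on every read, completions come back in submission order.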
-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Haomai Wang
Sent: Wednesday, December 10, 2014 10:26 AM
To: Wang, Zhiqiang
Cc: Sage Weil; Cook, Nigel; Yehuda Sadeh; Josh Durgin; Samuel Just; Jason Dillaman; ceph-devel
Subject: Re: rados read ordering

On Wed, Dec 10, 2014 at 9:10 AM, Wang, Zhiqiang <zhiqiang.w...@intel.com> wrote:
> For multiple clients accessing the same volume and the same object, I guess
> the clients need to synchronize among themselves to guarantee that what one
> reads is what another wrote. That is, suppose two clients, A and B: A is
> writing and B is reading. If B wants to read what A writes, it needs to know
> that A has completed the write before issuing the read. If A hasn't completed
> the write yet, the write could still fail, so you can't expect B to read what
> A writes. I think this makes sense from the client/application perspective.

I think each write to an object acts as a barrier. When a write op arrives,
the read ops submitted before it may complete out of order among themselves,
and the read ops submitted after it may also complete out of order among
themselves, but no read is reordered across the write. I think that still
gives the RADOS client SERIALIZABLE-level semantics.

>
> -----Original Message-----
> From: ceph-devel-ow...@vger.kernel.org
> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
> Sent: Wednesday, December 10, 2014 8:43 AM
> To: Cook, Nigel
> Cc: Yehuda Sadeh; Josh Durgin; Samuel Just; Wang, Zhiqiang; Jason
> Dillaman; ceph-devel
> Subject: Re: rados read ordering
>
> On Wed, 10 Dec 2014, Cook, Nigel wrote:
>> Folks,
>>
>> I'm wondering if this is related to the question I posed a few days
>> ago.
>>
>> Can CEPH support 2 clients simultaneously accessing a single volume
>> (for example, a database cluster) and honor the read and write order
>> of blocks across the multiple clients?
>>
>> Can you comment?
>
> I don't think so.
> This (non)change is about a single client submitting two reads to a single
> object (say, different blocks in the same disk) and whether the OSD is
> allowed to respond out of order (say, because some blocks are in cache and
> some aren't).
>
> In the shared volume case, it is generally not important what happens with
> requests that are submitted in parallel: they can take different amounts of
> time on the wire, and which happens first (i.e., which arrives at the OSD
> first) depends on happenstance. What does matter is that any read that
> happens after a write has completed reflects that write, which is why the
> concern is around caching.
>
> sage
>
>
>>
>> Regards,
>> Nigel Cook +1 720 319 7508
>>
>> Sent from a mobile device.
>> Please excuse both my brevity and sp3lling
>>
>> On Dec 8, 2014 10:12 AM, Yehuda Sadeh <yeh...@redhat.com> wrote:
>> On Mon, Dec 8, 2014 at 9:03 AM, Sage Weil <sw...@redhat.com> wrote:
>> > The current RADOS behavior is that reads (on any given object) are
>> > always processed in the order they are submitted by the client.
>> > This causes a few headaches for cache tiering that it would be
>> > nice to avoid. It also occurs to me that there are likely cases
>> > where we could go a lot faster by not strictly ordering things.
>> > For example, a stat can respond more quickly than a large read, and
>> > some reads may hit cache while others go to disk. This doesn't
>> > happen currently because of the (lame) way we do reads synchronously,
>> > but I hope that can change too.
>> >
>> > I propose we drop this semantic. If a client wants reads to have a
>> > strict ordering, they can set the existing RWORDERED flag (which
>> > also orders them with respect to writes). That's not the most
>> > general thing ever, but I'm not sure we care about callers who want
>> > reads ordered with respect to each other but not writes.
>> >
>> > The real question is whether there are any users that want/need
>> > this currently.
>> > I can't think of any offhand. In several places
>> > we submit multiple *writes* and expect them to be strictly ordered
>> > (e.g., we set a completion on the last write only). I don't think
>> > we do this anywhere for reads, though...
>> >
>> > Josh, Yehuda, Jason: can you think of any in RBD or RGW that would
>> > depend on this?
>> >
>>
>> None that I can think of. For object data, we already stripe it
>> across multiple objects, and the underlying assumption is that we're
>> going to get responses out of order, so we make sure we commit
>> in order. Guards are used on the head object, and the read is
>> synchronous there anyway. I can't think of any other place where we'd
>> have an issue.
>>
>> Yehuda
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majord...@vger.kernel.org More majordomo
>> info at http://vger.kernel.org/majordomo-info.html
>>
>>
>
--
Best Regards,

Wheat
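Yehuda's point, that striped object data is read back with responses arriving out of order while the client still commits in order, matches a common reorder-buffer pattern. Below is a minimal Python sketch of that generic pattern, assuming each response carries its stripe index; `deliver_in_order` is a hypothetical helper for illustration, not RGW's implementation.

```python
from typing import Dict, Iterable, Iterator, Tuple


def deliver_in_order(responses: Iterable[Tuple[int, bytes]]) -> Iterator[bytes]:
    """Yield stripe payloads in stripe-index order, whatever order they arrive in.

    `responses` yields (stripe_index, data) pairs in completion order. The
    indices are assumed to be 0..n-1 with no gaps or duplicates.
    """
    pending: Dict[int, bytes] = {}  # stripes that arrived ahead of their turn
    next_index = 0                  # next stripe the caller is waiting for
    for index, data in responses:
        pending[index] = data
        # Release every stripe that is now contiguous with what was already delivered.
        while next_index in pending:
            yield pending.pop(next_index)
            next_index += 1
    if pending:
        raise ValueError(f"stripe {next_index} never arrived")


# Completions for three stripes arrive out of order.
arrivals = [(2, b"C"), (0, b"A"), (1, b"B")]
print(b"".join(deliver_in_order(arrivals)))  # b'ABC'
```

Because ordering is restored on the client side, a design like this does not depend on the OSD answering reads in submission order, which is consistent with Yehuda's conclusion that dropping the implicit read ordering would not affect RGW's striped data.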