On Thu, Mar 9, 2017 at 7:20 PM 许雪寒 <xuxue...@360.cn> wrote:
> Thanks for your reply.
>
> As the log shows, in our test, a READ that came after a WRITE did finish
> before that WRITE.
This is where you've gone astray. Any storage system is perfectly free to
reorder simultaneous requests -- defined as those whose submit-reply windows
overlap. So you submitted write W, then submitted read R, then got a response
to R before W. That's allowed, and preventing it is actually impossible in
general. In the specific case you've outlined we *could* try to prevent it,
but doing so is ludicrously expensive and, since the "reorder" can happen
anyway, provides no benefit. So we don't try. :)

That said, we obviously *do* provide strict ordering across write boundaries:
a read submitted after a write has completed will always see the results of
that write.
-Greg

> And I read the source code; it seems that, for writes, in
> ReplicatedPG::do_op the thread in OSD_op_tp calls
> ReplicatedPG::get_rw_lock, which tries to take RWState::RWWRITE. If that
> fails, the op is put on the obc->rwstate.waiters queue and requeued when
> the repop finishes; the OSD_op_tp thread, however, doesn't wait for the
> repop and moves on to the next op. Can this be the cause?
>
> -----Original Message-----
> From: Wei Jin [mailto:wjin...@gmail.com]
> Sent: March 9, 2017 21:52
> To: 许雪寒
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] How does ceph preserve read/write consistency?
>
> On Thu, Mar 9, 2017 at 1:45 PM, 许雪寒 <xuxue...@360.cn> wrote:
> > Hi, everyone.
> >
> > As shown above, the WRITE req with tid 1312595 arrived at 18:58:27.439107
> > and the READ req with tid 6476 arrived at 18:59:55.030936, yet the latter
> > finished at 19:00:20.333389 while the former finished its commit at
> > 19:00:20.335061 and its filestore write at 19:00:25.202321. And in these
> > logs, we found many "dequeue_op" entries for each req between its start
> > and finish. From reading the source code, this seems to be due to
> > "RWState"; is that correct?
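Greg's rule -- a read may overtake a simultaneously in-flight write, but never a write that completed before the read was submitted -- can be sketched from the client's side. The toy store below is purely illustrative (plain Python threads with artificial latencies standing in for async storage operations; it is not a Ceph or librados API):

```python
import threading
import time

class ToyStore:
    """Stand-in for a storage backend that services requests concurrently."""
    def __init__(self):
        self.value = "old"
        self.lock = threading.Lock()

    def submit_write(self, value, latency, done):
        def work():
            time.sleep(latency)          # replication / journaling delay
            with self.lock:
                self.value = value
            done.set()                   # ack delivered to the client
        threading.Thread(target=work).start()

    def submit_read(self, latency, result, done):
        def work():
            time.sleep(latency)
            with self.lock:
                result.append(self.value)
            done.set()
        threading.Thread(target=work).start()

store = ToyStore()

# Simultaneous: the read is submitted while the write is still in flight,
# so their submit-reply windows overlap and either order is legal.
w_done, r_done, out = threading.Event(), threading.Event(), []
store.submit_write("new", latency=0.2, done=w_done)
store.submit_read(latency=0.01, result=out, done=r_done)
r_done.wait(); w_done.wait()
print(out[0])   # "old" -- the read overtook the in-flight write

# Across a write boundary: wait for the write's ack before submitting.
w_done2, r_done2, out2 = threading.Event(), threading.Event(), []
store.submit_write("newer", latency=0.2, done=w_done2)
w_done2.wait()                       # write completed before read submitted
store.submit_read(latency=0.01, result=out2, done=r_done2)
r_done2.wait()
print(out2[0])  # "newer" -- a read after a completed write always sees it
```

The practical consequence is the same for a real client: an application that needs read-after-write semantics must wait for the write's acknowledgement before issuing the read, rather than relying on submission order.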
> > And also, it seems that the OSD won't distinguish reqs from different
> > clients, so is it possible that I/O reqs from the same client also
> > finish in a different order than the one they were issued in? Could this
> > affect read/write consistency -- for instance, could a read fail to see
> > data that the same client had written just before it?
>
> IMO, it doesn't make sense for RADOS to distinguish reqs from different
> clients. Clients or users should do that themselves.
>
> However, for one specific client, Ceph can and must guarantee request
> order:
>
> 1) The Ceph messenger (network layer) maintains in_seq and out_seq when
> receiving and sending messages.
>
> 2) A message is dispatched or fast-dispatched and then queued in
> ShardedOpWQ in order.
>
> If requests belong to different PGs, they may be processed concurrently;
> that's ok.
>
> If requests belong to the same PG, they are queued in the same shard and
> processed in order because of the pg lock (for both reads and writes).
> For continuous writes, ops are queued in the ObjectStore in order under
> the pg lock, and the ObjectStore has an OpSequencer to guarantee order
> when applying ops to the page cache; that's ok.
>
> With regard to 'read after write' on the same object, Ceph must guarantee
> that the read gets the correct written content. That's done by
> ondisk_read/write_lock in ObjectContext.
>
> > We are testing the hammer version, 0.94.5. Please help us, thank you :-)
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
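Wei Jin's sequencing argument can be sketched as per-sequencer FIFO queues: ops for the same PG are queued on one sequencer and applied strictly in submission order, while different sequencers drain concurrently. A toy model follows (the name OpSequencer mirrors the Ceph concept, but this code is an illustrative simplification, not Ceph's implementation):

```python
import queue
import threading

class OpSequencer:
    """Toy analog of the ObjectStore's per-PG sequencer: every op queued
    on the same sequencer is applied in submission (FIFO) order."""
    def __init__(self):
        self.q = queue.Queue()
        self.applied = []
        # One worker drains this sequencer's queue in FIFO order.
        threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, op):
        self.q.put(op)

    def _drain(self):
        while True:
            op = self.q.get()
            self.applied.append(op)   # "apply to page cache" in order
            self.q.task_done()

# Two PGs get independent sequencers: ordering holds within each one,
# while the two queues drain concurrently.
pg1, pg2 = OpSequencer(), OpSequencer()
for i in range(5):
    pg1.submit(("pg1", i))
    pg2.submit(("pg2", i))
pg1.q.join()
pg2.q.join()
print(pg1.applied)  # always in submission order: ('pg1', 0) ... ('pg1', 4)
```

This mirrors the design choice in the answer above: cross-PG concurrency is allowed because it never violates a single client's per-object ordering, which is preserved by the per-PG queue and lock.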