On Thu, Mar 9, 2017 at 7:20 PM 许雪寒 <xuxue...@360.cn> wrote:

> Thanks for your reply.
>
> As the log shows, in our test, a READ that came after a WRITE finished
> before that WRITE did.


This is where you've gone astray. Any storage system is perfectly free to
reorder simultaneous requests -- defined as those whose submit-reply windows
overlap. So you submitted write W, then submitted read R, then got a
response to R before W. That's allowed, and preventing it is actually
impossible in general. In the specific case you've outlined, we *could* try
to prevent it, but doing so is pretty ludicrously expensive and, since the
"reorder" can happen anyway, doesn't provide any benefit.
So we don't try. :)

That said, obviously we *do* provide strict ordering across write
boundaries: a read submitted after a write completed will always see the
results of that write.
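
If it helps to see that contract in client terms, here is a minimal sketch
using the python-rados bindings (the conffile path, pool name and object
name are placeholders, not anything from this thread):

import rados

# Connect to the cluster; the paths and names below are illustrative only.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('mypool')

# Asynchronous write: control returns before the OSD has acknowledged it.
completion = ioctx.aio_write_full('obj', b'new contents')

# A read issued *here* overlaps the write and may legitimately see old data.
# Waiting for the write to complete is what closes the window:
completion.wait_for_complete()

# This read was submitted after the write completed, so it must return
# b'new contents'.
print(ioctx.read('obj'))

ioctx.close()
cluster.shutdown()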
-Greg

> And I read the source code: it seems that, for writes, in the
> ReplicatedPG::do_op method, the thread in OSD_op_tp calls
> ReplicatedPG::get_rw_lock, which tries to take RWState::RWWRITE. If that
> fails, the op is put on the obc->rwstate.waiters queue and requeued when
> the repop finishes; the OSD_op_tp thread, however, doesn't wait for the
> repop and moves on to the next op. Can this be the cause?
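
That is the pattern being described; here is a rough Python sketch of it
(hypothetical names -- the real code is the C++ in ReplicatedPG, this is
only meant to illustrate the park-and-requeue behaviour):

import collections

class ObjectRWState:
    # Hypothetical stand-in for obc->rwstate: a write lock plus a wait queue.
    def __init__(self):
        self.write_locked = False
        self.waiters = collections.deque()

    def get_write_lock(self, op):
        if self.write_locked:
            # Lock is busy: park the op instead of blocking the worker thread.
            self.waiters.append(op)
            return False
        self.write_locked = True
        return True

    def put_write_lock(self, requeue):
        # Called when the replicated write (the "repop") finishes: release the
        # lock and push any parked ops back onto the work queue.
        self.write_locked = False
        while self.waiters:
            requeue(self.waiters.popleft())

work_queue = collections.deque()
state = ObjectRWState()
state.get_write_lock("write A")            # succeeds; the repop is in flight
state.get_write_lock("write B")            # parked; the worker moves on
state.put_write_lock(work_queue.append)    # repop done: "write B" is requeued
print(list(work_queue))                    # ['write B']

The worker thread never waits for the repop, but the parked op cannot run
until the repop completes and requeues it, so the two writes still apply in
order.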
>
> -----Original Message-----
> From: Wei Jin [mailto:wjin...@gmail.com]
> Sent: March 9, 2017 21:52
> To: 许雪寒
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] How does ceph preserve read/write consistency?
>
> On Thu, Mar 9, 2017 at 1:45 PM, 许雪寒 <xuxue...@360.cn> wrote:
> > Hi, everyone.
>
> > As shown above, WRITE req with tid 1312595 arrived at 18:58:27.439107
> > and READ req with tid 6476 arrived at 18:59:55.030936; however, the latter
> > finished at 19:00:20.333389, while the former finished its commit at
> > 19:00:20.335061 and its filestore write at 19:00:25.202321. In these logs,
> > we also found that between the start and finish of each req there were many
> > "dequeue_op" entries for that req. Reading the source code, it seems this is
> > due to "RWState" -- is that correct?
> >
> > Also, it seems that the OSD doesn't distinguish reqs from different
> > clients, so is it possible that I/O reqs from the same client also finish in
> > a different order than the one in which they were issued? Could this affect
> > read/write consistency? For instance, could a read fail to see data that was
> > written by the same client just before it?
> >
>
> IMO, it doesn't make sense for RADOS to distinguish reqs from different
> clients; clients or users should handle that themselves.
>
> However, for a single client, Ceph can and must guarantee the request
> order:
>
> 1) The Ceph messenger (network layer) uses in_seq and out_seq when receiving
> and sending messages.
>
> 2) Messages are dispatched or fast-dispatched and then queued in the
> ShardedOpWq in order.
>
> If requests belong to different pgs, they may be processed concurrently;
> that's fine.
>
> If requests belong to the same pg, they are queued in the same shard and
> processed in order thanks to the pg lock (for both reads and writes).
> For consecutive writes, ops are also queued in the ObjectStore in order under
> the pg lock, and the ObjectStore uses an OpSequencer to guarantee the order
> when applying ops to the page cache.
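
(A toy model of that per-PG ordering argument -- not Ceph code, just the
idea: as long as a worker takes the pg lock before it lets go of the shard
it dequeued from, no other worker can overtake it, so same-PG ops complete
in submission order even with several worker threads.)

import collections
import threading

shard_lock = threading.Lock()        # stands in for one ShardedOpWq shard
shard_q = collections.deque(f"op{i}" for i in range(100))
pg_lock = threading.Lock()           # stands in for the pg lock
done = []

def worker():
    while True:
        shard_lock.acquire()
        if not shard_q:
            shard_lock.release()
            return
        op = shard_q.popleft()
        pg_lock.acquire()            # take the pg lock *before* releasing the
        shard_lock.release()         # shard, so nobody can overtake this op
        done.append(op)
        pg_lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert done == [f"op{i}" for i in range(100)]   # submission order preserved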
>
> With regard to 'read after write' on the same object, Ceph must guarantee
> that the read sees the written content. That's done by
> ondisk_read/write_lock in ObjectContext.
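
(Again just a toy sketch of that idea, with made-up names: a read on an
object waits until in-flight writes to that object have become readable,
which is roughly what the ondisk read/write locking buys you.)

import threading

class OndiskGate:
    def __init__(self):
        self.cond = threading.Condition()
        self.pending_writes = 0

    def start_write(self):
        with self.cond:
            self.pending_writes += 1

    def finish_write(self):
        with self.cond:
            self.pending_writes -= 1
            self.cond.notify_all()

    def read(self, get_value):
        with self.cond:
            while self.pending_writes:   # the read waits out in-flight writes
                self.cond.wait()
            return get_value()

data = {"obj": b"old"}
gate = OndiskGate()
gate.start_write()
data["obj"] = b"new"                     # the write lands...
gate.finish_write()                      # ...and becomes readable
print(gate.read(lambda: data["obj"]))    # b'new', never a half-applied state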
>
>
> > We are testing the hammer version, 0.94.5. Please help us, thank you :-)
>