On Mon, 2006-12-11 at 10:50 +0100, Jens Axboe wrote:
> On Sun, Dec 10 2006, Ming Zhang wrote:
> > Today I used blktrace and observed a strange (at least to me) behavior at
> > the block layer. Can anybody shed some light? Thanks.
> > 
> > Here is the detail.
> > 
> > ... previous requests are ok.
> > 
> >   8,16   0      782     7.025277381  4915  Q   W 6768 + 32 [istiod1]
> >   8,16   0      783     7.025283850  4915  G   W 6768 + 32 [istiod1]
> >   8,16   0      784     7.025286799  4915  P   R [istiod1]
> >   8,16   0      785     7.025287794  4915  I   W 6768 + 32 [istiod1]
> > 
> > Write request to lba 6768 was inserted to the queue.
> > 
> >   8,16   0      786     7.026059876  4915  Q   R 6768 + 32 [istiod1]
> >   8,16   0      787     7.026064451  4915  G   R 6768 + 32 [istiod1]
> >   8,16   0      788     7.026066369  4915  I   R 6768 + 32 [istiod1]
> > 
> > A read request to the same LBA was inserted into the queue as well. Though it
> > cannot be merged, I thought it could be satisfied directly by the previous
> > write request. It seems the merge function does not consider this.
> 
> That is the job of the upper layers, typically the page cache. For this
> scenario to take place, you must be using raw or O_DIRECT. And in that
> case, it is the job of the application to ensure proper ordering of
> requests.

I see. I assumed the block I/O layer would take responsibility for this as
well, so I was wrong.

> 
> >   8,16   0      789     7.034883766     0 UT   R [swapper] 2
> >   8,16   0      790     7.034904284     9  U   R [kblockd/0] 2
> > 
> > Unplug because of a read.
> > 
> >   8,16   0      791     7.045272094     9  D   R 6768 + 32 [kblockd/0]
> >   8,16   0      792     7.045654039     9  C   R 6768 + 32 [0]
> > 
> > Strangely, the read request was sent to the device before the write request
> > and thus returned the wrong data.
> 
> Linux doesn't guarantee any request ordering for O_DIRECT io.

So this means a request can be inserted at the front or at the back of the
queue, with no fixed order?
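
For anyone who wants to check the dispatch order mechanically rather than by eye, here is a minimal sketch in C. It assumes the text layout of the blktrace lines quoted in this mail (device, cpu, sequence, timestamp, pid, action, rwbs, sector + count); the names parse_trace_line and read_dispatched_before_write are made up for this illustration and are not part of blktrace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of blktrace text output (only the fields used here). */
struct trace_event {
    double timestamp;
    char action[4];            /* e.g. "Q", "G", "I", "D", "C" */
    char rwbs[8];              /* e.g. "R" or "W" */
    unsigned long long sector;
    unsigned int nsectors;
};

/*
 * Parse a line shaped like the ones quoted in this mail.
 * Returns 0 on success, -1 if the line has no "sector + count" part
 * (plug/unplug lines, for example).
 */
int parse_trace_line(const char *line, struct trace_event *ev)
{
    unsigned int major, minor, cpu, pid;
    unsigned long seq;
    int n;

    assert(line != NULL && ev != NULL);
    n = sscanf(line, " %u,%u %u %lu %lf %u %3s %7s %llu + %u",
               &major, &minor, &cpu, &seq, &ev->timestamp, &pid,
               ev->action, ev->rwbs, &ev->sector, &ev->nsectors);
    return n == 10 ? 0 : -1;
}

/*
 * Look at the first dispatch (D) event for `sector`.
 * Returns 1 if it is a read, 0 if it is a write, -1 if none is found.
 */
int read_dispatched_before_write(const char *const *lines, size_t count,
                                 unsigned long long sector)
{
    size_t i;

    for (i = 0; i < count; i++) {
        struct trace_event ev;

        if (parse_trace_line(lines[i], &ev) != 0)
            continue;
        if (strcmp(ev.action, "D") != 0 || ev.sector != sector)
            continue;
        if (ev.rwbs[0] == 'R')
            return 1;
        if (ev.rwbs[0] == 'W')
            return 0;
    }
    return -1;
}
```

Feeding it the D lines from the trace in this thread (sequence numbers 791 and 793) with sector 6768 reports that the read was dispatched first.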


> 
> >   8,16   0      793     7.045669809     9  D   W 6768 + 32 [kblockd/0]
> >   8,16   0      794     7.049840970     0  C   W 6768 + 32 [0]
> > 
> > Write finished.
> > 
> > So the read returns the wrong data to the application. One thing I am not
> > sure about is where (front/back) the requests are inserted into the queue
> > and who messes up the order here.
> 
> There is no mess up, you are making assumptions that aren't valid.
> 
> > Is it possible for the I event to carry an extra flag, so we know where
> > the request was inserted?
> 
> That would be too expensive, as we have to peek inside the io scheduler
> queue. So no.

See http://lxr.linux.no/source/block/elevator.c?v=2.6.18#L341. At the point
where we generate the insert event, we already know where the request goes, so
exporting that flag is not expensive.


> 
> > ---- is the code to generate this io -----. The disk is a regular disk and
> > the current scheduler is CFQ.
> 
> Ah ok, so you are doing this inside the kernel. If you want to ensure
> write ordering, then you need to mark the request as a barrier.
> 
>         submit_bio(rw | (1 << BIO_RW_BARRIER), bio);

We tried that: if we mark a write request as a barrier, we lose half the
performance. If we mark it as BIO_RW_SYNC, there is almost no change.
I still need to figure out the reason for that performance loss
compared with BIO_RW_SYNC.
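
One detail worth spelling out: the barrier bit has to be combined with rw using bitwise OR, otherwise the flag never reaches the block layer. Below is a small user-space sketch of the flag arithmetic only. The DEMO_ constants are stand-ins chosen for illustration and their values are an assumption, not copied from the kernel headers; demo_barrier_rw is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/*
 * Illustrative stand-ins for the kernel's WRITE and BIO_RW_BARRIER.
 * The numeric values are assumptions made for this sketch.
 */
enum {
    DEMO_WRITE = 1,          /* direction bit */
    DEMO_BIO_RW_BARRIER = 2  /* bit position of the barrier flag */
};

/* Return rw with the barrier bit set, leaving the direction bit intact. */
unsigned long demo_barrier_rw(unsigned long rw)
{
    return rw | (1UL << DEMO_BIO_RW_BARRIER);
}

/* Report whether the barrier bit is set in rw (1 if set, 0 if not). */
int demo_has_barrier(unsigned long rw)
{
    return (rw & (1UL << DEMO_BIO_RW_BARRIER)) != 0;
}
```

With these stand-in values, demo_barrier_rw(DEMO_WRITE) keeps the write bit and adds the barrier bit, while a logical OR would collapse the whole expression to 1 and drop the barrier.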

> 
> I won't comment on your design, but it seems somewhat strange - why are
> you doing this in the kernel? What is the segment switching doing?

We are writing an iSCSI target at the kernel level.

Which segment switching did you mean?

> 
> BTW, this mail really isn't about blktrace, it probably should have been
> sent to the linux-kernel list. You wouldn't send a vmstat observed
> problem to the vmstat list, would you? :-)
> 

Makes sense. It won't happen next time. Thanks.


