Vasily Tarasov writes:
> Hello all,
>
> I was playing with block device io tracing recently, and have
> encountered some strange things.
> (kernel 2.6.18 and blktrace just from git)
>
> So there is one (dominant) reader on HN: fio tool with such job file:
>
> [global]
> rw=randread
> direct=0
> timeout=200
>
> [reader0]
> filename=file0 ;file of size 1 gb
>
> Here is a piece of decoded blktrace output:
>
> 8,0 0 105919 200.287000000 0 C R 338688739 + 8 [0]
> 8,0 0 105920 200.287000000 7017 Q R 339904179 + 8 [fio]
> 8,0 0 105921 200.287000000 7017 B R 339904179 + 8 [fio]
> 8,0 0 105922 200.287000000 7017 G R 339904179 + 8 [fio]
> 8,0 0 105923 200.287000000 7017 P R [fio]
> 8,0 0 105924 200.287000000 7017 I R 339904179 + 8 [fio]
> 8,0 0 105925 200.287000000 7017 D R 339904179 + 8 [fio]
> 8,0 0 105926 200.287000000 7017 U R [fio] 1
> 8,0 0 105927 200.291000000 0 C R 339904179 + 8 [0]
> 8,0 0 105928 200.291000000 7017 Q R 340230323 + 8 [fio]
> 8,0 0 105929 200.291000000 7017 B R 340230323 + 8 [fio]
> 8,0 0 105930 200.291000000 7017 G R 340230323 + 8 [fio]
> 8,0 0 105931 200.291000000 7017 P R [fio]
> 8,0 0 105932 200.291000000 7017 I R 340230323 + 8 [fio]
> 8,0 0 105933 200.291000000 7017 D R 340230323 + 8 [fio]
> 8,0 0 105934 200.291000000 7017 U R [fio] 1
> 8,0 0 105935 200.302000000 0 C R 340230323 + 8 [0]
> 8,0 0 105936 200.302000000 7017 Q R 340265211 + 8 [fio]
> 8,0 0 105937 200.302000000 7017 B R 340265211 + 8 [fio]
> 8,0 0 105938 200.302000000 7017 G R 340265211 + 8 [fio]
> 8,0 0 105939 200.302000000 7017 P R [fio]
> 8,0 0 105940 200.302000000 7017 I R 340265211 + 8 [fio]
> 8,0 0 105941 200.302000000 7017 D R 340265211 + 8 [fio]
> 8,0 0 105942 200.302000000 7017 U R [fio] 1
> 8,0 0 105943 200.306000000 0 C R 340265211 + 8 [0]
> 8,0 0 105944 200.306000000 7017 Q R 338171203 + 8 [fio]
> 8,0 0 105945 200.306000000 7017 B R 338171203 + 8 [fio]
> 8,0 0 105946 200.306000000 7017 G R 338171203 + 8 [fio]
> 8,0 0 105947 200.306000000 7017 P R [fio]
> 8,0 0 105948 200.306000000 7017 I R 338171203 + 8 [fio]
> 8,0 0 105949 200.306000000 7017 D R 338171203 + 8 [fio]
> 8,0 0 105950 200.306000000 7017 U R [fio] 1
> 8,0 0 105951 200.312000000 0 C R 338171203 + 8 [0]
> 8,0 0 105952 200.312000000 7017 Q R 339044731 + 8 [fio]
> 8,0 0 105953 200.312000000 7017 B R 339044731 + 8 [fio]
> 8,0 0 105954 200.312000000 7017 G R 339044731 + 8 [fio]
>
> As you see, there is a lot of "B" - back merges according to documentation.
> But in final results:
>
> CPU0 (8,0):
> Reads Queued: 13,244, 53,084KiB Writes Queued:
> 42, 188KiB
> Read Dispatches: 13,243, 53,080KiB Write Dispatches:
> 26, 120KiB
> Reads Requeued: 0 Writes Requeued: 0
> Reads Completed: 13,243, 53,080KiB Writes Completed:
> 26, 120KiB
> Read Merges: 1 Write Merges: 4
> Read depth: 2 Write depth: 2
> IO unplugs: 13,243 Timer unplugs: 6
>
> Throughput (R/W): 258KiB/s / 0KiB/s
> Events (8,0): 106,136 entries
> Skips: 0 forward (0 - 0.0%)
>
> Only one read merge?
> So the first question is why does this happen?
> And the second one. As you can see fio produces randread, so actually
> not a lot of merges
> should be there. But there is a plenty of "B".
>
Hi,
It looks like the documentation is wrong - from the blkparse code,
apparently 'B' means 'Bounce', and is what's actually used instead of
the 'W' the documentation has. 'Backmerge' according to the code is
actually 'M', and is used instead of the 'B' the documentation has for
that. There doesn't seem to be anything that uses 'front or back
merge' as described for 'M' in the documentation. There also seems to
be the same mixup in the formatting code.
The patch below makes those changes according to my understanding of
how they're actually used in the code...
Tom
diff --git a/blkparse_fmt.c b/blkparse_fmt.c
index 7830738..b309b4b 100644
--- a/blkparse_fmt.c
+++ b/blkparse_fmt.c
@@ -335,7 +335,7 @@ static void process_default(char *act, s
case 'D': /* Issue */
case 'I': /* Insert */
case 'Q': /* Queue */
- case 'W': /* Bounce */
+ case 'B': /* Bounce */
if (t->action & BLK_TC_ACT(BLK_TC_PC)) {
char *p;
fprintf(ofp, "%u ", t->bytes);
@@ -356,9 +356,8 @@ static void process_default(char *act, s
}
break;
- case 'B': /* Back merge */
+ case 'M': /* Back merge */
case 'F': /* Front merge */
- case 'M': /* Front or back merge */
case 'G': /* Get request */
case 'S': /* Sleep request */
fprintf(ofp, "%llu + %u [%s]\n", (unsigned long long) t->sector,
diff --git a/doc/blktrace.tex b/doc/blktrace.tex
index 27b13c2..72f7e2e 100644
--- a/doc/blktrace.tex
+++ b/doc/blktrace.tex
@@ -527,22 +527,20 @@ Short & Long
\item[Q -- queued] This notes intent to queue io at the given location.
No real requests exists yet.
- \item[W -- bounced] The data pages attached to this \emph{bio} are
+ \item[B -- bounced] The data pages attached to this \emph{bio} are
not reachable by the hardware and must be bounced to a lower memory
location. This causes a big slowdown in io performance, since the data
must be copied to/from kernel buffers. Usually this can be fixed with
using better hardware - either a better io controller, or a platform
with an IOMMU.
- \item[B -- back merge] A previously inserted request exists that ends
+ \item[M -- back merge] A previously inserted request exists that ends
on the boundary of where this io begins, so the io scheduler can merge
them together.
\item[F -- front merge] Same as the back merge, except this io ends
where a previously inserted requests starts.
- \item[M -- front or back merge] One of the above.
-
\item[G -- get request] To send any type of request to a block device,
a \emph{struct request} container must be allocated first.
@@ -625,18 +623,18 @@ The following table shows the various ac
\begin{tabular}{|l|l|}\hline
Act & Description \\ \hline\hline
A & IO was remapped to a different device \\ \hline
-B & IO back merged with request on queue \\ \hline
+B & IO bounced \\ \hline
C & IO completion \\ \hline
D & IO issued to driver \\ \hline
F & IO front merged with request on queue \\ \hline
G & Get request \\ \hline
I & IO inserted onto request queue \\ \hline
+M & IO back merged with request on queue \\ \hline
P & Plug request \\ \hline
Q & IO handled by request queue code \\ \hline
S & Sleep request \\ \hline
T & Unplug due to timeout \\ \hline
U & Unplug request \\ \hline
-W & IO bounced \\ \hline
X & Split \\ \hline
\end{tabular}
@@ -690,7 +688,7 @@ The default output for all event types i
\item[D -- issued]
\item[I -- inserted]
\item[Q -- queued]
- \item[W -- bounced] If a payload is present, the number of payload bytes
+ \item[B -- bounced] If a payload is present, the number of payload bytes
is output, followed by the payload in hexadecimal between parenthesis.
If no payload is present, the sector and number of blocks are presented
@@ -699,10 +697,9 @@ The default output for all event types i
either case, it is followed by the command associated with the event
(surrounded by square brackets).
- \item[B -- back merge]
+ \item[M -- back merge]
\item[F -- front merge]
\item[G -- get request]
- \item[M -- front or back merge]
\item[S -- sleep] The starting sector and number of blocks is output
(with an intervening plus (+) character), followed by the command
associated with the event (surrounded by square brackets).
-
To unsubscribe from this list: send the line "unsubscribe linux-btrace" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html