Many thanks to Alan for this suggestion. I think it makes sense to simulate
scatter/gather in the driver for this case. I'll try it later and expect to
see improved performance.

>-----Original Message-----
>From: Alan Cox [mailto:a...@lxorguk.ukuu.org.uk]
>Sent: April 13, 2010 23:21
>To: Gao, Yunpeng
>Cc: James Bottomley; Martin K. Petersen; Robert Hancock;
>linux-...@vger.kernel.org; linux-mmc@vger.kernel.org
>Subject: Re: How to make kernel block layer generate bigger request in the
>request queue?
>
>> And I'm just curious why the block layer does not merge these contiguous
>> sectors into one single request? For example, if the block layer generated
>> 'start_sect: 48776, nsect: 64, rw: r' instead of the requests below, I think
>> the performance would be better.
>
>You said earlier "My hardware doesn't support scatter/gather"
>
>> start_sect: 48776, nsect: 8, rw: r
>> start_sect: 48784, nsect: 8, rw: r
>> start_sect: 48792, nsect: 8, rw: r
>> start_sect: 48800, nsect: 8, rw: r
>> start_sect: 48808, nsect: 8, rw: r
>> start_sect: 48816, nsect: 8, rw: r
>> start_sect: 48824, nsect: 8, rw: r
>> start_sect: 48832, nsect: 8, rw: r
>
>Print the bus address of each request and you will probably find they are
>not contiguous so they have not been merged because your hardware could
>not do that transfer and you have no IOMMU.
>
>If the overhead per command is really really huge you can preallocate an
>internal buffer of say 32K or 64K in your driver and tell the block layer
>you do scatter gather, then copy the buffers into a linear chunk. I'd be
>very surprised if that was a win overall on any vaguely sane hardware but
>flash with erase block overhead and the like might be one of the less
>sane cases.
>
>Alan
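
To make the bounce-buffer idea above concrete, here is a rough user-space sketch in C. It is not kernel code: `struct seg`, `segs_contiguous`, `gather_to_bounce`, `scatter_from_bounce` and `BOUNCE_SIZE` are hypothetical names made up for illustration, and a real driver would work on the request's scatterlist instead (the kernel has `sg_copy_to_buffer()` and `sg_copy_from_buffer()` for the copying step). It only shows the two pieces Alan describes: checking whether segment addresses are contiguous, and copying non-contiguous segments through one linear buffer so the device sees a single transfer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one segment of a request (not a kernel type). */
struct seg {
    uintptr_t bus_addr;   /* address the device would see */
    void *data;           /* CPU-visible pointer to the same memory */
    size_t len;           /* segment length in bytes */
};

/* 64K, the larger of the two sizes suggested above (an assumption). */
#define BOUNCE_SIZE (64 * 1024)

/* Return 1 if every segment starts exactly where the previous one ends. */
int segs_contiguous(const struct seg *s, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (s[i - 1].bus_addr + s[i - 1].len != s[i].bus_addr)
            return 0;
    return 1;
}

/*
 * Write path: gather all segments into the linear bounce buffer.
 * Returns the number of bytes copied, or 0 if the segments do not fit
 * (or there is nothing to copy).
 */
size_t gather_to_bounce(const struct seg *s, size_t n,
                        unsigned char *bounce, size_t bounce_len)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++) {
        if (s[i].len > bounce_len - off)
            return 0;
        memcpy(bounce + off, s[i].data, s[i].len);
        off += s[i].len;
    }
    return off;
}

/*
 * Read path: after one linear device transfer into the bounce buffer,
 * scatter the data back out to the individual segments.
 * Returns the number of bytes copied, or 0 if the buffer is too short.
 */
size_t scatter_from_bounce(const struct seg *s, size_t n,
                           const unsigned char *bounce, size_t bounce_len)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++) {
        if (s[i].len > bounce_len - off)
            return 0;
        memcpy(s[i].data, bounce + off, s[i].len);
        off += s[i].len;
    }
    return off;
}
```

For the trace quoted above, the eight 8-sector reads would become one 64-sector (32K with 512-byte sectors) transfer into the bounce buffer, followed by eight small copies. Whether the extra memcpy is cheaper than the per-command overhead is exactly the open question, so it needs measuring on the real hardware.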
