Re: Integration of SCST in the mainstream Linux kernel

Vladislav Bolkhovitin Tue, 05 Feb 2008 11:02:09 -0800

James Bottomley wrote:

On Mon, 2008-02-04 at 21:38 +0300, Vladislav Bolkhovitin wrote:
James Bottomley wrote:
On Mon, 2008-02-04 at 20:56 +0300, Vladislav Bolkhovitin wrote:
James Bottomley wrote:
On Mon, 2008-02-04 at 20:16 +0300, Vladislav Bolkhovitin wrote:
James Bottomley wrote:
So, James, what is your opinion on the above? Or the overall SCSI targetproject simplicity doesn't matter much for you and you think it's fineto duplicate Linux page cache in the user space to keep the in-kernelpart of the project as small as possible?
The answers were pretty much contained here

http://marc.info/?l=linux-scsi&m=120164008302435

and here:

http://marc.info/?l=linux-scsi&m=120171067107293

Weren't they?
No, sorry, it doesn't look so for me. They are about performance, butI'm asking about the overall project's architecture, namely about onepart of it: simplicity. Particularly, what do you think aboutduplicating Linux page cache in the user space to have zero-copy cachedI/O? Or can you suggest another architectural solution for that problemin the STGT's approach?
Isn't that an advantage of a user space solution?  It simply uses the
backing store of whatever device supplies the data.  That means it takes
advantage of the existing mechanisms for caching.
No, please reread this thread, especially this message:http://marc.info/?l=linux-kernel&m=120169189504361&w=2. This is one ofthe advantages of the kernel space implementation. The user spaceimplementation has to have data copied between the cache and user spacebuffer, but the kernel space one can use pages in the cache directly,without extra copy.
Well, you've said it thrice (the bellman cried) but that doesn't make it
true.

The way a user space solution should work is to schedule mmapped I/O
from the backing store and then send this mmapped region off for target
I/O.  For reads, the page gather will ensure that the pages are up to
date from the backing store to the cache before sending the I/O out.
For writes, You actually have to do a msync on the region to get the
data secured to the backing store.
James, have you checked how fast is mmaped I/O if work size > size ofRAM? It's several times slower comparing to buffered I/O. It was manytimes discussed in LKML and, seems, VM people consider it unavoidable.
Erm, but if you're using the case of work size > size of RAM, you'll
find buffered I/O won't help because you don't have the memory for
buffers either.
James, just check and you will see, buffered I/O is a lot faster.
So in an out of memory situation the buffers you don't have are a lot
faster than the pages I don't have?

There isn't OOM in both cases. Just pages reclamation/readahead workmuch better in the buffered case.

So, using mmaped IO isn't an option for high performance. Plus, mmapedIO isn't an option for high reliability requirements, since it doesn'tprovide a practical way to handle I/O errors.
I think you'll find it does ... the page gather returns -EFAULT if
there's an I/O error in the gathered region.
Err, to whom return? If you try to read from a mmaped page, which can'tbe populated due to I/O error, you will get SIGBUS or SIGSEGV, I don'tremember exactly. It's quite tricky to get back to the faulted commandfrom the signal handler.
Or do you mean mmap(MAP_POPULATE)/munmap() for each command? Do youthink that such mapping/unmapping is good for performance?
msync does something
similar if there's a write failure.
You also have to pull tricks with
the mmap region in the case of writes to prevent useless data being read
in from the backing store.
Can you be more exact and specify what kind of tricks should be done forthat?
Actually, just avoid touching it seems to do the trick with a recent
kernel.
Hmm, how can one write to an mmaped page and don't touch it?
I meant from user space ... the writes are done inside the kernel.

Sure, the mmap() approach agreed to be unpractical, but could youelaborate more on this anyway, please? I'm just curious. Do you thinkabout implementing a new syscall, which would put pages with data in themmap'ed area?

However, as Linus has pointed out, this discussion is getting a bit off

topic.

No, that isn't off topic. We've just proved that there is no good way toimplement zero-copy cached I/O for STGT. I see the only practical wayfor that, proposed by FUJITA Tomonori some time ago: duplicating Linuxpage cache in the user space. But will you like it?

There's no actual evidence that copy problems are causing any
performatince issues issues for STGT.  In fact, there's evidence that
they're not for everything except IB networks.

The zero-copy cached I/O has not yet been implemented in SCST, I simplyso far have not had time for that. Currently SCST performs better STGT,because of simpler processing path and less context switches percommand. Memcpy() speed on modern systems is about the same asthroughput of 20Gbps link (1600MB/s), so when the zero-copy will beimplemented, I won't be surprised by more 50-70% SCST advantage.

James



-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Scst-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/scst-devel


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Integration of SCST in the mainstream Linux kernel

Reply via email to