Hi Jan,
thanks for taking time to look at the code.

On Mon, Oct 8, 2012 at 11:26 AM, Jan Schmidt <list.bt...@jan-o-sch.net> wrote:
> Hi Alex,
>
> On Thu, October 04, 2012 at 17:59 (+0200), Alex Lyakas wrote:
>> as I promised, here is some code for you to look at.
>
> And quite a lot of it. I hadn't thought of such a big change when I wrote
> "preferably in form of a patch".
>
> As a side note, your patch doesn't follow the general kernel coding style (but
> read on before you rework this one).
>
>> First I will describe the approach in general.
>>
>> # Get rid of the pipe. Instead, user-space passes a buffer and kernel
>> fills the specified user-space buffer with commands.
>> # When the buffer is full, kernel stops generating commands and
>> returns a checkpoint to the user-space.
>> # User-space does whatever it wants with the returned buffer, and then
>> calls the kernel again, with a buffer and a checkpoint that was
>> returned by the kernel from previous SEND ioctl().
>> # Kernel re-arms itself to the specified checkpoint, and fills the
>> specified buffer with commands, attaches a new checkpoint and so on.
>> # Eventually kernel signals to the user that there are no more commands.
>
> We had that in the very beginning of btrfs send. Having only a single ioctl
> saves a whole lot of system calls.
>
>> I realize this is a big change, and a new IOCTL has to be introduced
>> in order not to break current user-kernel protocol.
>> The pros as I see them:
>> # One data-copy is avoided (no pipe). For WRITE commands two
>> data-copies are avoided (no read_buf needed)
>
> I'm not sure I understand those correctly. If you're talking about the user 
> mode
> part, we could simply pass stdout to the kernel, saving the unnecessary pipe 
> and
> copy operations in between without introducing a new buffer.
What I meant is the following:
# For non-WRITE commands the flow is: put the command onto send_buf,
copy to pipe, then user-space copies it out from the pipe. With my
code: put command onto send_buf, then copy to user-space buffer
(copy_to_user). So one data-copy is avoided (2 vs 3).
# For WRITE commands: read data onto read_buf, then copy to send_buf,
then copy to pipe, then user-mode copies to its buffer. With my code:
read onto send_buf, then copy to user-space buffer. So 2 data-copies
are avoided (2 vs 4).
Does it make sense?

>
>> # ERESTARTSYS issue disappears. If needed, ioctl is restarted, but
>> there is no problem with that, it will simply refill the buffer from
>> the same checkpoint.
>
> This is the subject of this thread and the thing I'd like to focus on 
> currently.
>
>> Cons:
>> # Instead of one ioctl(), many ioctls() are issued to finish the send.
>> # Big code change
>
> Two big cons. I'd like to quota Alexander's suggestions again:
>
> On Wed, August 01, 2012 at 14:09 (+0200), Alexander Block wrote:
>> I have two possible solutions in my mind.
>> 1. Store some kind of state in the ioctl arguments so that we can
>> continue where we stopped when the ioctl reenters. This would however
>> complicate the code a lot.
>> 2. Spawn a thread when the ioctl is called and leave the ioctl
>> immediately. I don't know if ERESTARTSYS can happen in vfs_xxx calls
>> if they happen from a non syscall thread.
>
> What do you think about those two?
I am not familiar enough with Linux kernel - will the second one work?

>
> I like the first suggestion. Combining single-ioctl with signal handling
> capabilities feels like the right choice. When we get ERESTARTSYS, we know
> exactly how many bytes made it to user mode. To reach a comfortable state for 
> a
> restart, we can store part of the stream together with the meta information in
> our internal state before returning to user mode.
I thought that ERESTARTSYS never returns to user mode (this is what I
saw in my tests also).

> The ioctl will be restarted sooner or later and our internal state tells us 
> where to proceed.

Ok, so you say that we should maintain checkpoint-like information and
use it if the ioctl is automatically restarted. This is quite close to
what I have done, I believe; we still need all the capabilities of
re-arming tree search, saving context, skipping commands etc, that I
have written. Did I understand your proposal correctly?

But tell me one thing: let's say we call vfs_write() and it returns
-ERESTARTSYS. Can we be sure that it wrote 0 bytes to the pipe in this
call? Otherwise, we don't know how many bytes it wrote, so we cannot
resume it correctly.

Thanks,
Alex.



>
> Thanks,
> -Jan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to