On 14.06.2017 02:26, Benjamin LaHaise wrote:
> On Tue, Jun 13, 2017 at 07:17:43PM +0300, Kirill Tkhai wrote:
>> On 13.06.2017 18:26, Benjamin LaHaise wrote:
>>> On Tue, Jun 13, 2017 at 06:11:03PM +0300, Kirill Tkhai wrote:
>>> ...
>>>> The functionality I implemented grew from real need and experience. We try
>>>> to avoid kernel modification where it's possible, but in-flight aio
>>>> requests are not a case suitable for that.
>>>
>>> What you've done only works for *your* use-case, but not in general.  Like
>>> in other subsystems, you need to provide hooks on a per file descriptor
>>> basis for quiescing different kinds of file descriptors.
>>
>> Which hooks do you suggest? It is possible that no file descriptor remains
>> open after a request is submitted. Do you suggest an interface to reopen a
>> struct file?
> 
> That's what you have to contend with.  AIO keeps the struct file pinned
> for the life of the request.  This is one of the complex issues you need
> to address.
> 
>>> Your current
>>> patch set completely ignores things like usb gadget.  You need to create
>>> infrastructure for restarting i/os after your checkpointing occurs, which
>>> you haven't put any thought into in this patchset.  If you want to discuss
>>> how to do that, fine, but the approach in this patchset simply does not
>>> work in general.  What happens when an aio doesn't complete or takes hours
>>> to complete?
>>
>> Here it is wait_event_interruptible(), but it is possible to convert it
>> to wait_event_interruptible_hrtimeout(), as is done in read_events().
>> That is not a fatal issue with the patch. The function read_events() simply
>> waits with a timeout; can't we do the same?
> 
> Nope.  An aio may not complete in a timely fashion, in which case your
> wait for all aios to complete will simply wait forever.  I take it this is
> not the desired behaviour of checkpointing.

But read_events() does exactly that. What is the difference between the two cases?
 
>> Could you please describe how cancelling aio requests will help to wait
>> for their completion? Is there a guarantee they can easily be queued back?
>> I suppose not, because there may be a memory limit or some low-level
>> driver limitations, dependent on internal conditions.
> 
> This is the problem you're facing.  Quiescing aios is complicated!

These are very generic words; they may refer to almost anything. And the
problem is not confined to the small aio engine itself, because it is
impossible to guarantee the availability of kernel resources from there.
Imagine a parallel task eating the memory of a request you have just
cancelled: then you simply cannot queue the request back.

This argument means your suggestion is not rock-solid for all cases: it
will break user applications in such situations.

>> Also, it does not seem good to overload aio with the functionality of
>> obtaining closed file descriptors of submitted requests.
>>
>> Do you mean this way, or did I misunderstand you? Could you please
>> concretize your idea?
> 
> Yes, this is what I'm talking about.
> 
>> In my vision, cancelling requests does not allow implementing the need I
>> described. If we can't queue a request back, it breaks snapshotting and
>> the user application in general.
> 
> This is what you have to figure out.  This is why your patch is incomplete
> and cannot be accepted.  You can't punt the complexity your feature
> requires onto other maintainers -- your implementation has to be reasonably
> complete at time of patch inclusion.  Can you see now why your patch can't
> be included as-is?  The assumption you made that aios complete in a timely
> fashion is incorrect.  Everything else stems from that.

I just can't see why read_events() just waits for requests completion not paying
attention of the all above you said. Could you please clarify the difference
between two these situations?
