On Aug 28, 2006, at 8:23 AM, Julian Martin Kunkel wrote:
Hi,
I want to adapt the I/O state machines to reread the dfile array in
case an I/O server responds with PVFS_ENOENT during the flow or within
the initial I/O ACK. This might happen if the file is migrated away and
the client does not have the updated dfile array before it initiates
the I/O.
Thus, I want to reread the dfile array and only restart the I/O for
this particular server. The progress of the other I/O requests should
not be influenced.
While looking at sys-io.sm, I wonder whether the transition for the
IO_RETRY case in the state io_analyze_results does this. Maybe some
extra lines could be added, for example to restart the process if the
initial acknowledgement returns PVFS_ENOENT, without increasing the
retry count in this case?
I'm thankful for any suggestions on how that could be implemented
easily.
I think IO_RETRY is a little different. The first step (before the
IO request/response) of sys-io.sm is a getattr to the metadata
server to get the datafile handles. It's this step that you want to
repeat if the IO request to the IO server fails, right? So instead
of jumping back to io_datafile_post_msgpairs, I think you'll want to
jump all the way back to io_init. It's probably easier to create
another return code (IO_REINIT or something) and return that from
io_datafile_complete_operations. I think there will be some cleanup
that you have to do in complete_operations before you can jump back
up to init as well.
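A rough standalone sketch of that idea, not taken from sys-io.sm: only
IO_RETRY and the state names io_init and io_datafile_post_msgpairs come
from this thread. IO_REINIT, the enum layout, the retry limit and the
function names are hypothetical, and errno's ENOENT stands in for
PVFS_ENOENT.

```c
#include <errno.h>

/* Hypothetical result codes and state ids, for illustration only. */
enum io_result { IO_DONE, IO_RETRY, IO_REINIT, IO_FATAL };
enum io_state  { ST_IO_INIT, ST_DATAFILE_POST_MSGPAIRS, ST_IO_CLEANUP };

/* Decide what to do after one server's I/O has completed.
 * ENOENT is used here as a stand-in for PVFS_ENOENT. */
enum io_result classify_completion(int error, int *retry_count, int retry_limit)
{
    if (error == 0)
        return IO_DONE;
    if (error == ENOENT)
        return IO_REINIT;       /* stale dfile array: not counted as a retry */
    if (*retry_count < retry_limit) {
        (*retry_count)++;       /* ordinary failure: consume one retry */
        return IO_RETRY;
    }
    return IO_FATAL;            /* retries exhausted */
}

/* Map the result onto the state to jump to next. */
enum io_state next_state(enum io_result r)
{
    switch (r) {
    case IO_REINIT: return ST_IO_INIT;                /* redo the getattr */
    case IO_RETRY:  return ST_DATAFILE_POST_MSGPAIRS; /* repost the requests */
    default:        return ST_IO_CLEANUP;             /* finished or failed */
    }
}
```

Keeping the ENOENT case out of the retry counter means a migration
cannot eat into the budget reserved for ordinary transient failures.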
Also, it seems unlikely that the dfile handle array would have
changed from the initial getattr to the IO requests (wouldn't a
migrate disable the metadata server temporarily?), so this retry is
probably only necessary if the attribute cache holding the dfile
handle array has become stale. You could just turn that attr cache
off with a 0 timeout for now; otherwise you'll have to invalidate the
cache (at least the dfile handle array bits of it) before doing the
getattr again.
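To make the two options concrete, here is a toy single-entry cache,
written as a sketch under assumptions: the struct and both function
names are hypothetical and do not reflect the real attribute cache
code. It only shows that a 0 timeout turns every lookup into a miss and
that explicit invalidation forces a fresh getattr.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical cache entry for one file's dfile handle array. */
struct dfile_cache_entry {
    bool   valid;       /* false once the entry has been invalidated */
    time_t fetched_at;  /* when the array was last read from the server */
};

/* Return true if the cached array may be used at time `now`.
 * A timeout of 0 (or less) disables the cache entirely. */
bool dfile_cache_hit(const struct dfile_cache_entry *e, time_t now,
                     int timeout_secs)
{
    if (timeout_secs <= 0 || !e->valid)
        return false;
    return difftime(now, e->fetched_at) < (double)timeout_secs;
}

/* Drop the entry so that the next lookup goes back to the server. */
void dfile_cache_invalidate(struct dfile_cache_entry *e)
{
    e->valid = false;
}
```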
In this context, a weird error message:
In case the fs is corrupted, e.g. there is a metafile pointing to a
non-existing datafile, I think the I/O should abort quickly instead of
doing retries (in the migration case: retry to get the dfiles, and
abort if they did not change). Currently the client sm returns the
error "Operation now in progress". You can try this by removing a
datafile with pvfs2-remove-object (first get the object number with
pvfs2-viewdist).
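The rule in parentheses could look roughly like the following sketch.
Everything in it is an assumption made for illustration: handles are
modelled as uint64_t, and the enum and function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum reread_action { RESTART_IO, ABORT_ENOENT };

/* After rereading the dfile array, decide whether a restart makes sense.
 * failed_idx is the index of the datafile whose server reported ENOENT. */
enum reread_action after_reread(const uint64_t *old_handles,
                                const uint64_t *new_handles,
                                size_t count, size_t failed_idx)
{
    if (failed_idx >= count)
        return ABORT_ENOENT;    /* index out of range: nothing to restart */
    if (old_handles[failed_idx] == new_handles[failed_idx])
        return ABORT_ENOENT;    /* same handle still missing: likely corruption */
    return RESTART_IO;          /* handle changed: file was probably migrated */
}
```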
Hmm... that is a little odd. I think EINPROGRESS only gets returned
from aio_error, though. I don't see us setting it anywhere in our
code. My guess is that remove-object may not remove the actual fd
from the open cache, so the IO doesn't fail sooner with ENOENT as it
should. I haven't looked at the code to verify that, though.
-sam
thanks,
julian
---
Ben (Obi-Wan) Kenobi:
Use the Force, Luke!
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers