On Aug 28, 2006, at 8:23 AM, Julian Martin Kunkel wrote:

Hi,
I want to adapt the I/O state machines to reread the dfile array in case an I/O server responds with PVFS_ENOENT during the flow or within the initial I/O ACK. This might happen if the file is migrated away and the client does not
have the updated dfile array before it initiates the I/O.
Thus, I want to reread the dfile array and only restart the I/O for this particular server. The progress of the other I/O requests should not be
influenced.
While looking at sys-io.sm I wonder whether the transition for the case
IO_RETRY in the state io_analyze_results does this. Maybe some extra lines could be added, for example to restart the process if the initial acknowledgment returns PVFS_ENOENT, and also to not increase the retry count in this
case?
I'm thankful for any suggestions on how that could be implemented easily.


I think IO_RETRY is a little different. The first step of sys-io.sm (before the IO request/response) is a getattr to the metadata server to get the datafile handles. It's this step that you want to repeat if the IO request to the IO server fails, right? So instead of jumping back to io_datafile_post_msgpairs, I think you'll want to jump all the way back to io_init. It's probably easier to create another return code (IO_REINIT or something), and return that from io_datafile_complete_operations. I think there will be some cleanup that you have to do in complete_operations before you can jump back up to init as well.
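The control flow suggested above could be modelled roughly as follows. This is only a sketch under stated assumptions, not the actual sys-io.sm: the ST_* states, struct io_ctx, the "generation" counters, MAX_RETRIES, MAX_REINITS and all helper functions are made up for illustration, and a stale dfile array is simulated by comparing two integers. Only the names IO_RETRY and IO_REINIT and the idea of jumping back to the init step come from the discussion.

```c
#include <assert.h>

/* All names below are hypothetical stand-ins; this models control flow
 * only and is not the real sys-io.sm. */
enum io_state { ST_INIT, ST_POST_MSGPAIRS, ST_COMPLETE_OPS, ST_ANALYZE, ST_DONE };
enum io_rc { IO_OK, IO_RETRY, IO_REINIT, IO_FATAL };

enum { MAX_RETRIES = 3, MAX_REINITS = 2 };   /* assumed limits */

struct io_ctx {
    int server_gen;     /* generation of the dfile array the servers hold */
    int client_gen;     /* generation the client got from its last getattr */
    int stale_lookups;  /* how many getattrs will still return stale data */
    int getattr_calls;
    int retry_count;    /* bumped only by IO_RETRY */
    int reinit_count;   /* bumped only by IO_REINIT */
};

/* Stand-in for the getattr done in the init step. */
static void do_getattr(struct io_ctx *c)
{
    c->getattr_calls++;
    if (c->stale_lookups > 0) {
        c->stale_lookups--;
        c->client_gen = c->server_gen - 1;   /* simulated stale dfile array */
    } else {
        c->client_gen = c->server_gen;
    }
}

/* Stand-in for complete_operations: a server that no longer holds the
 * datafile (the PVFS_ENOENT case) is reported as IO_REINIT. */
static enum io_rc complete_operations(const struct io_ctx *c)
{
    if (c->client_gen != c->server_gen) {
        /* per-attempt cleanup would go here before going back to init */
        return IO_REINIT;
    }
    return IO_OK;
}

static enum io_rc run_io(struct io_ctx *c)
{
    enum io_state st = ST_INIT;
    enum io_rc rc = IO_OK;

    while (st != ST_DONE) {
        switch (st) {
        case ST_INIT:
            do_getattr(c);
            st = ST_POST_MSGPAIRS;
            break;
        case ST_POST_MSGPAIRS:
            st = ST_COMPLETE_OPS;
            break;
        case ST_COMPLETE_OPS:
            rc = complete_operations(c);
            st = ST_ANALYZE;
            break;
        case ST_ANALYZE:
            if (rc == IO_REINIT && c->reinit_count < MAX_REINITS) {
                c->reinit_count++;          /* retry_count is left alone */
                st = ST_INIT;               /* all the way back to getattr */
            } else if (rc == IO_RETRY && c->retry_count < MAX_RETRIES) {
                c->retry_count++;
                st = ST_POST_MSGPAIRS;      /* ordinary retry path */
            } else {
                if (rc != IO_OK)
                    rc = IO_FATAL;          /* limits exhausted */
                st = ST_DONE;
            }
            break;
        case ST_DONE:
            break;
        }
    }
    assert(c->retry_count <= MAX_RETRIES && c->reinit_count <= MAX_REINITS);
    return rc;
}
```

In this model the reinit path has its own counter, so a stale dfile array does not consume the normal retry budget, and a persistently stale array still terminates. Note that the sketch restarts the whole operation; restarting only the request to the one affected server would need per-server bookkeeping that is not shown here.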

Also, it seems unlikely that the dfile handle array would have changed between the initial getattr and the IO requests (wouldn't a migrate disable the metadata server temporarily?), so this retry is probably only necessary if the attribute cache holding the dfile handle array has become stale. You could just turn that attr cache off with a 0 timeout for now; otherwise you'll have to invalidate the cache (at least the dfile handle array bits of it) before doing the getattr again.
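The two options mentioned here (a 0 timeout versus explicit invalidation) can be illustrated with a toy cache. This is a sketch under assumptions and does not reflect the real PVFS2 attribute cache: struct attr_cache, its fields, MAX_DFILES and the attr_cache_* functions are all invented for this example, and time is passed in as a plain millisecond counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MAX_DFILES = 8 };   /* assumed capacity, for illustration only */

/* Hypothetical single-entry cache for one file's dfile handle array. */
struct attr_cache {
    long timeout_ms;        /* 0 means every lookup is a miss */
    bool dfiles_valid;
    long fetched_at_ms;
    size_t dfile_count;
    unsigned long dfiles[MAX_DFILES];
};

/* Record a freshly fetched dfile handle array. */
static void attr_cache_store(struct attr_cache *c, const unsigned long *h,
                             size_t n, long now_ms)
{
    assert(n <= MAX_DFILES);
    memcpy(c->dfiles, h, n * sizeof h[0]);
    c->dfile_count = n;
    c->fetched_at_ms = now_ms;
    c->dfiles_valid = true;
}

/* Returns true only if the cached dfile array may still be used. */
static bool attr_cache_lookup(const struct attr_cache *c, long now_ms)
{
    if (!c->dfiles_valid)
        return false;
    return (now_ms - c->fetched_at_ms) < c->timeout_ms;
}

/* Drop only the dfile handle array, forcing the next getattr to refetch. */
static void attr_cache_invalidate_dfiles(struct attr_cache *c)
{
    c->dfiles_valid = false;
    c->dfile_count = 0;
}
```

With timeout_ms set to 0 the age check can never pass, so every getattr goes to the metadata server. With a non-zero timeout, the reinit path would have to call something like attr_cache_invalidate_dfiles before repeating the getattr, otherwise it would just read the same stale array again.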

In this context, a weird error message:
In case the fs is corrupted, e.g. there is a metafile pointing to a non-existing datafile, I think the I/O should abort quickly instead of doing retries (in the migration case: retry to get the dfiles, and abort if they did not change). Currently the client sm returns the error "Operation now in progress". You can try this by removing a datafile with pvfs2-remove-object
(first get the object number with pvfs2-viewdist).

Hmm... that is a little odd. I think EINPROGRESS only gets returned from aio_error though... I don't see us setting it anywhere in our code. My guess is that the remove-object may not remove the actual fd from the open cache, so the IO doesn't fail sooner with ENOENT as it should. I haven't looked at the code to verify that though.

-sam


thanks,
julian

---
Ben (Obi-Wan) Kenobi:
        Use the Force, Luke!
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

