> On Tue, Sep 01, 2020 at 03:43:37PM +0100, Jose M Calhariz wrote:
> > The an example of errors on the logs are:
> >
> > afs: disk cache read error in CacheItems slot 350195 off 28015620/35000020
> > code -4/80
> > afs: Error while alloc'ing cache slot for file 204:536874423.964.4794;
> > failing with an i/o error
Hi,

I'm the person that mentioned this briefly during the AFS workshop this
week. These messages are not in themselves a problem; they are just
reporting that we got an error code from the Linux kernel when trying to
read from the disk cache.

On Tue, 1 Sep 2020 16:07:55 -0700
Benjamin Kaduk <ka...@mit.edu> wrote:
> This error message is supposed to indicate that a read from the cache
> filesystem got EIO, which in turn is supposed to indicate a physical
> problem with the drive. That said, I'm not going to jump to conclusions
> and try to blame your drive, as there are several other things that could
> be coming into play.

The code logged is -4, which is EINTR (EIO would be -5). The most likely
trigger for this is a process that received SIGKILL (or another fatal
signal) while we were reading from the disk cache. Traditionally we
wouldn't get errors in that case, but Linux started returning them at
some version (possibly depending on the local filesystem in use, but I
don't recall exactly).

If you think these messages show up alongside some other bug or problem,
that's possible, but the messages themselves are not a problem. If you
want to avoid the situation that causes them, you can try to avoid
SIGKILL'ing the relevant processes, if you know what is sending the
signal.

The message you've shown doesn't log the pid, but there is already a
change in 1.8.8pre1 that adds the pid and some other information to that
log message. If you want the specific patch, it's here (gerrit 14437):

https://git.openafs.org/?p=openafs.git;a=patch;h=5d863b4f6e817b1cc2615265c7747e17a2037ae6

I know of at least one bug that can be triggered in the situation that
produces the log message you've mentioned, which is fixed by gerrit 14451
here:

https://git.openafs.org/?p=openafs.git;a=patch;h=c55607d732a65f8acb1dfc6bf93aee0f4409cecf

That's also in 1.8.8pre1, so if it's feasible for you to just try
1.8.8pre1, that's probably easiest.
The messages will still appear with 1.8.8pre1, but they may be more
informative, and some other related bugs may be fixed. If you are seeing
some other problematic behavior with 1.8.8pre1, I can take a look if you
provide some details.

-- 
Andrew Deason
adea...@sinenomine.net