Hi!

A. SUMMARY

Long story short: a directory on my ZFS contains a file name with no
file behind it. ls lists the name among the directory contents, but
stat-ing it fails with ENOENT: "No such file or directory".


B. HISTORY

So how did I come to this situation? I've recently had to kill the
sending side of an rsync, with the receiving side on FreeBSD. For
reasons yet unknown, the next run of rsync started deleting stuff it
shouldn't. Details on this are in PR 162318 [1], but quoting the most
important things:

Logging into the receiving FreeBSD as root, I found that large parts of
the user's home directory content had disappeared, even outside the
subdirectory used as the rsync destination!
- All the .* config files in the home directory were gone
- The .ssh directory was still present, but its content was gone as well
- Both the home dir and the .ssh subdir contained a file "rsync.%stat",
  which should be the name of an extattr instead, used to implement the
  rsync --fake-super command line option.

[1] http://www.freebsd.org/cgi/query-pr.cgi?pr=162318


C. SYMPTOMS

I first assumed a problem in the binary rsync build for FreeBSD, but
devs on the above bug report favored RAM failure or an upstream source
code bug. So I gave it another try, and paid closer attention to the
error messages. Among them was the following:

> rsync: stat "/home/name/backup/etc/ca-certificates" failed: No such file or directory (2)

The strange thing is, this isn't specific to rsync at all; it can be
reproduced using simple command line tools like ls:

> # ls /home/name/backup/etc/ | grep ca-cert
> ca-certificates
> ca-certificates.conf
> ca-certificates.conf~
> # ls /home/name/backup/etc/ca-*
> ls: /home/name/backup/etc/ca-certificates: No such file or directory
> /home/name/backup/etc/ca-certificates.conf
> /home/name/backup/etc/ca-certificates.conf~

So as you see, the name is returned by readdir(3), where both ls for the
dir and the wildcard expansion find it. But anything that stat(2)s the
file will encounter an ENOENT error. "zpool status" says everything's
fine, so zfs isn't aware of any corruption.
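
The same check can be scripted to scan a whole directory for entries
that are listed but cannot be stat-ed. What follows is only a minimal
sketch in Python, not a tool used for this report; the function name
find_dangling_entries is made up for illustration. It uses lstat rather
than stat so that symlinks pointing at a missing target are not reported
by mistake.

```python
import errno
import os


def find_dangling_entries(directory):
    """Return names listed in `directory` for which lstat fails with ENOENT.

    Hypothetical helper for illustration only. On a healthy file system
    the result should be empty, unless an entry is deleted between the
    directory listing and the lstat call.
    """
    dangling = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            # lstat does not follow symlinks, so a symlink whose target
            # is missing still counts as an existing entry.
            os.lstat(path)
        except OSError as exc:
            if exc.errno == errno.ENOENT:
                dangling.append(name)
            else:
                # Other errors (EACCES, EIO, ...) are a different problem.
                raise
    return sorted(dangling)
```

Calling find_dangling_entries("/home/name/backup/etc") on the affected
directory would be expected to list ca-certificates, assuming the
behaviour shown above is reproducible from Python as well.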

I believe that no matter what errors user space programs might make, the
kernel zfs driver should never allow the above to happen. Either a file
is there or it isn't; there should be no such mixture. So what do you
think, is this likely to be a bug in the zfs implementation?

I found one other person describing problems like this: in threads
titled "file lose inode in Memory-Based file system.", lisen1001
described pretty much the same thing, except on ramdisk on 8.2 instead
of my own hdd-based raidz on 9.0-RC1 [2,3].

[2] http://thread.gmane.org/gmane.os.freebsd.questions/280183
[3] http://thread.gmane.org/gmane.os.freebsd.devel.file-systems/13153


D. NEXT STEPS

As I'm new to FreeBSD, I'm not yet sure how bug reports are handled
around here. As I said, I've filed a bug report against rsync, and it
has been closed on the grounds that this appears to be an upstream
problem. Would it make sense to include the above information in the bug
report for reference? Would replying to the gnats address be enough to
accomplish that? Should the bug be reopened, as I assume all my problems
to be related, and as the zfs corruption at least is specific to
FreeBSD? If so, how does one reopen a report? Or who can do that?

Do you agree that this looks like a problem in the ZFS implementation?
Should I file a new problem report for that?

Can you suggest any way I could resolve the corruption on my local ZFS
pool, short of destroying and recreating the whole file system? "rm" for
the file doesn't work, as it, too, encounters the ENOENT. Is there any
tool to check or rebuild the inode data structures of zfs? "zpool scrub"
doesn't seem to fit the bill, as its manpage indicates a computation of
file content checksums.


Greetings,
 Martin von Gagern
