On 08/16/2013 8:49 am, dweimer wrote:
On 08/15/2013 10:00 am, dweimer wrote:
On 08/14/2013 9:43 pm, Shane Ambler wrote:
On 14/08/2013 22:57, dweimer wrote:
I have a few systems running on ZFS with a backup script that creates snapshots, then backs up the .zfs/snapshot/name directory to make sure open files are not missed. This has been working great, but all of a sudden one of my systems has stopped working. It takes the snapshots fine, and zfs list -t snapshot shows them, but an ls on the .zfs/snapshot/ directory returns "Not a directory".
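
The per-filesystem step in the script is roughly the following; the snapshot name matches one of mine below, but the backup destination here is only an illustration, not the actual script:

zfs snapshot zroot/home@home--bsnap
# back up from the snapshot so open files in /home can't be missed
# (the /backup path is just an example destination)
tar -czf /backup/home--bsnap.tar.gz -C /home/.zfs/snapshot/home--bsnap .
zfs destroy zroot/home@home--bsnap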

part of the zfs list output:

NAME                        USED  AVAIL  REFER  MOUNTPOINT
zroot                      4.48G  29.7G    31K  none
zroot/ROOT                 2.92G  29.7G    31K  none
zroot/ROOT/91p5-20130812   2.92G  29.7G  2.92G  legacy
zroot/home                  144K  29.7G   122K  /home

part of the zfs list -t snapshot output:

NAME                                            USED  AVAIL  REFER  MOUNTPOINT
zroot/ROOT/91p5-20130812@91p5-20130812--bsnap   340K      -  2.92G  -
zroot/home@home--bsnap                           22K      -   122K  -

ls /.zfs/snapshot/91p5-20130812--bsnap/
does work right now, since the last reboot, but it wasn't always working; this is my boot environment.

if I do ls /home/.zfs/snapshot/, result is:
ls: /home/.zfs/snapshot/: Not a directory

if I do ls /home/.zfs, result is:
ls: snapshot: Bad file descriptor
shares

I have tried zpool scrub zroot and no errors were found. If I reboot the system I can get one good backup, then I start having problems again. Has anyone else ever run into this, and are there any suggestions for a fix?
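
For completeness, the check was nothing beyond the stock commands, roughly:

zpool scrub zroot
zpool status -v zroot   # run again after the scrub finishes to see any reported errors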

The system is running FreeBSD 9.1-RELEASE-p5 #1 r253764 (Mon Jul 29 15:07:35 CDT 2013); the zpool is version 28 and ZFS is version 5.



I can say I've had this problem, though I'm not certain what fixed it. I do remember deciding to stop snapshotting since I couldn't access the snapshots, and I deleted the existing ones. I later restarted the machine before I went back for another look, and they were working.

So my guess is a restart without existing snapshots may be the key.
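
If you want to try that, one way to clear every snapshot on the pool before the reboot would be something like the following (destructive, so check the list it feeds to zfs destroy first):

# list all snapshots under zroot and destroy each one
zfs list -H -t snapshot -o name -r zroot | xargs -n 1 zfs destroy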

Now if only we could find out what started the issue so we can stop it
happening again.

I had actually rebooted it last night, prior to seeing this message, and I do know it didn't have any snapshots this time.  As I am booting from ZFS using boot environments, I may have had an older boot environment still on the system the last time it was rebooted.  Backups ran great last night after the reboot, and I was able to kick off my pre-backup job and access all the snapshots today.  Hopefully it doesn't come back, but if it does I will see if I can find anything else wrong.
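
For anyone wanting to check for leftover boot environments, with a zroot/ROOT layout like mine it is roughly:

beadm list               # if sysutils/beadm (or similar) is installed; shows BEs and which is active
zfs list -r zroot/ROOT   # the underlying datasets, one per boot environment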

FYI,
It didn't shut down cleanly, so in case this helps anyone find the issue, this is from my system logs:
Aug 14 22:08:04 cblproxy1 kernel:
Aug 14 22:08:04 cblproxy1 kernel: Fatal trap 12: page fault while in kernel mode
Aug 14 22:08:04 cblproxy1 kernel: cpuid = 0; apic id = 00
Aug 14 22:08:04 cblproxy1 kernel: fault virtual address = 0xa8
Aug 14 22:08:04 cblproxy1 kernel: fault code            = supervisor write data, page not present
Aug 14 22:08:04 cblproxy1 kernel: instruction pointer   = 0x20:0xffffffff808b0562
Aug 14 22:08:04 cblproxy1 kernel: stack pointer         = 0x28:0xffffff80002238f0
Aug 14 22:08:04 cblproxy1 kernel: frame pointer         = 0x28:0xffffff8000223910
Aug 14 22:08:04 cblproxy1 kernel: code segment          = base 0x0, limit 0xfffff, type 0x1b
Aug 14 22:08:04 cblproxy1 kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Aug 14 22:08:04 cblproxy1 kernel: processor eflags      = interrupt enabled, resume, IOPL = 0
Aug 14 22:08:04 cblproxy1 kernel: current process = 1 (init)
Aug 14 22:08:04 cblproxy1 kernel: trap number           = 12
Aug 14 22:08:04 cblproxy1 kernel: panic: page fault
Aug 14 22:08:04 cblproxy1 kernel: cpuid = 0
Aug 14 22:08:04 cblproxy1 kernel: KDB: stack backtrace:
Aug 14 22:08:04 cblproxy1 kernel: #0 0xffffffff808ddaf0 at kdb_backtrace+0x60
Aug 14 22:08:04 cblproxy1 kernel: #1 0xffffffff808a951d at panic+0x1fd
Aug 14 22:08:04 cblproxy1 kernel: #2 0xffffffff80b81578 at trap_fatal+0x388
Aug 14 22:08:04 cblproxy1 kernel: #3 0xffffffff80b81836 at trap_pfault+0x2a6
Aug 14 22:08:04 cblproxy1 kernel: #4 0xffffffff80b80ea1 at trap+0x2a1
Aug 14 22:08:04 cblproxy1 kernel: #5 0xffffffff80b6c7b3 at calltrap+0x8
Aug 14 22:08:04 cblproxy1 kernel: #6 0xffffffff815276da at zfsctl_umount_snapshots+0x8a
Aug 14 22:08:04 cblproxy1 kernel: #7 0xffffffff81536766 at zfs_umount+0x76
Aug 14 22:08:04 cblproxy1 kernel: #8 0xffffffff809340bc at dounmount+0x3cc
Aug 14 22:08:04 cblproxy1 kernel: #9 0xffffffff8093c101 at vfs_unmountall+0x71
Aug 14 22:08:04 cblproxy1 kernel: #10 0xffffffff808a8eae at kern_reboot+0x4ee
Aug 14 22:08:04 cblproxy1 kernel: #11 0xffffffff808a89c0 at kern_reboot+0
Aug 14 22:08:04 cblproxy1 kernel: #12 0xffffffff80b81dab at amd64_syscall+0x29b
Aug 14 22:08:04 cblproxy1 kernel: #13 0xffffffff80b6ca9b at Xfast_syscall+0xfb
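
If it panics like this again, a crash dump would help pin down where zfsctl_umount_snapshots is blowing up; the generic setup for that (just the standard FreeBSD dump/kgdb steps, nothing specific to this box) is roughly:

# /etc/rc.conf: dump to the configured swap device on panic
dumpdev="AUTO"
# after the next panic, once savecore(8) has written the core to /var/crash:
kgdb /boot/kernel/kernel /var/crash/vmcore.0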

Well, it's back; 3 of the 8 file systems I am taking snapshots of failed in last night's backups.

The only thing different on this system from the 4 others I have running is that it has a second disk with a UFS file system on it.

The setup is 2 disks, both set up with gpart:
=>      34  83886013  da0  GPT  (40G)
        34       256    1  boot0  (128k)
       290  10485760    2  swap0  (5.0G)
  10486050  73399997    3  zroot0  (35G)

=>      34  41942973  da1  GPT  (20G)
        34  41942973    1  squid1  (20G)
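
(For reference, da1 was just the usual gpart/newfs steps; reconstructed from the output above rather than copied from my history, it would have been roughly:)

gpart create -s gpt da1
gpart add -t freebsd-ufs -l squid1 da1   # one partition using the whole disk
newfs -U /dev/gpt/squid1                 # UFS2 with soft updates for the cache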

I didn't want the Squid cache directory on ZFS. The system is running on an ESX 4.1 server backed by an iSCSI SAN. I have 4 other servers running on the same group of ESX servers and SAN, booting from ZFS, without this problem. Two of the other 4 are also running Squid, but they forward to this one, so they run without a local disk cache.

A quick update on this, in case anyone else runs into it: on the 2nd of this month I finally deleted my UFS volume and created a new ZFS volume to replace it. I recreated the Squid cache directories and let Squid start rebuilding its cache from scratch. So far there hasn't been a noticeable impact on performance from the switch, and the snapshot problem has not recurred since making the change. It's only been a week running this way, but the problem previously started within 36-48 hours.
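
The switch itself was nothing fancy; roughly the following, with the pool and dataset names here being illustrative rather than the exact ones used:

gpart modify -i 1 -t freebsd-zfs da1   # retype the old UFS partition
zpool create -f squidpool da1p1        # example pool name; -f since the partition held UFS before
zfs create squidpool/cache             # dataset pointed at by Squid's cache_dir
squid -z                               # let Squid recreate its cache directories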

--
Thanks,
   Dean E. Weimer
   http://www.dweimer.net/
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
