Re: [zfs-discuss] ZFS on Fit-PC Slim?
On 6 Nov 2008, at 04:09, Vincent Fox wrote:

> According to the slides I have seen, a ZFS filesystem even on a single disk can handle massive amounts of sector failure before it becomes unusable. I seem to recall it said 1/8th of the disk? So even on a single disk the redundancy in the metadata is valuable. And if I don't have really very much data I can set copies=2 so I have better protection for the data as well.
>
> My goal is a compact low-powered and low-maintenance widget. Eliminating the chance of fsck is always a good thing now that I have tasted ZFS.

In my personal experience, disks are more likely to fail completely than to suffer from small sector failures. But don't get me wrong: provided you have a good backup strategy and can afford the downtime of replacing the disk and restoring, ZFS is still a great filesystem to use on a single disk. Don't be put off. Many of the people on this list are running multi-terabyte enterprise solutions and are unable to think in terms of non-redundant, small numbers of gigabytes :-)

> I'm going to try and see if Nevada will even install when it arrives, and report back. Perhaps BSD is another option. If not I will fall back to Ubuntu.

I have FreeBSD and ZFS working fine(*) on a 1.8GHz VIA C7 (32-bit) processor. Admittedly this is with 2GB of RAM, but I set aside 1GB for the ARC and the machine is still showing 750MB free at the moment, so I'm sure it could run with a 256MB ARC in under 512MB of RAM. 1.8GHz is a fair bit faster than the Geode in the Fit-PC, but the C7 scales back to 900MHz and my machine still runs acceptably at that speed (although I wouldn't want to buildworld with it).

I say, give it a go and see what happens. I'm sure I can still dimly recall a time when 500MHz/512MB was a kick-ass system...

Jonathan

(*) This machine can sustain 110MB/s off the 4-disk RAIDZ1 set, which is substantially more than I can get over my 100Mb network.
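For concreteness, the setup described above boils down to something like the following on a FreeBSD box. The pool and dataset names (tank, tank/data) and the 256M figure are illustrative assumptions rather than anything from the original posts:

# Keep two copies of every data block on the single disk; this only
# applies to data written after the property is set.
zfs set copies=2 tank/data
zfs get copies tank/data

# Cap the ARC on a small-memory machine. On FreeBSD this is a loader
# tunable, set in /boot/loader.conf and applied at the next boot.
echo 'vfs.zfs.arc_max="256M"' >> /boot/loader.conf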
Re: [zfs-discuss] zfs-auto-snapshot default schedules
On 25 Sep 2008, at 17:14, Darren J Moffat wrote:

> Chris Gerhard has a zfs_versions script that might help:
> http://blogs.sun.com/chrisg/entry/that_there_is

Ah. Cool. I will have to try this out.

Jonathan
Re: [zfs-discuss] zfs-auto-snapshot default schedules
On 25 Sep 2008, at 14:40, Ross wrote:

> For a default setup, I would have thought a year's worth of data would be enough, something like:

Given that this can presumably be configured to suit everyone's particular data retention plan, what was originally proposed seems obvious and sensible to me for a default setup.

Going slightly off-topic: all this auto-snapshot stuff is ace, but what's really missing, in my view, is an easy way to actually determine where the version of the file you want is. I typically find myself futzing about with diff across a dozen mounted snapshots, trying to figure out which is the last good version. It would be great if there were some way to know whether a snapshot contains blocks for a particular file, i.e. that the snapshot contains an earlier version of the file than the next snapshot (or than now). If you could do that and make ls support it with an additional flag/column, it would be a real time-saver.

The current mechanism is especially awkward, as the auto-mount dirs can only be found at the top of the filesystem, so you have to work with long path names. An fs trick to make .snapshot dirs of symbolic links appear automagically would rock, i.e.:

% cd /foo/bar/baz
% ls -l .snapshot
[...] nightly.0 -> /foo/.zfs/snapshot/nightly.0/bar/baz
% diff {,.snapshot/nightly.0/}importantfile

Yes, I know this last command can just be written as:

% diff /foo/{,.zfs/snapshot/nightly.0}/bar/baz/importantfile

but this requires me to a) type more, and b) remember where the top of the filesystem is in order to split the path. This is obviously more of a pain if the path is 7 items deep, and the split means you can't just use $PWD.

[My choice of .snapshot/nightly.0 is a deliberate nod to the competition ;-)]

Jonathan
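Until something like that exists, the futzing described above can be scripted roughly as follows; this is only a sketch, assuming the filesystem is mounted at /foo and the file of interest is bar/baz/importantfile (both paths are illustrative):

# Print a checksum of the file as it appears in each snapshot; the
# point at which the checksum changes shows which snapshot holds the
# last good version.
for snap in /foo/.zfs/snapshot/*; do
    f="$snap/bar/baz/importantfile"
    [ -f "$f" ] && printf '%s: %s\n' "$(basename "$snap")" "$(cksum < "$f")"
done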
Re: [zfs-discuss] Announcement: The Unofficial Unsupported Python ZFS API
On 14 Jul 2008, at 16:07, Will Murnane wrote:

> As long as I'm composing an email, I might as well mention that I had forgotten to mention Swig as a dependency (d'oh!). I now have a mention of it on the page, and a spec file that can be built using pkgtool. If you tried this before and gave up because of a missing package, please give it another shot.

Not related to the actual API itself, but I just thought I'd note that all the cool kids are using ctypes these days to bind Python to foreign libraries:

http://docs.python.org/lib/module-ctypes.html

This has the advantage of requiring no other libraries and no compile phase at all.

Jonathan
Re: [zfs-discuss] SATA controller suggestion
On 9 Jun 2008, at 14:59, Thomas Maier-Komor wrote:

>> time gdd if=/dev/zero bs=1048576 count=10240 of=/data/video/x
>>
>> real    0m13.503s
>> user    0m0.016s
>> sys     0m8.981s
>
> Are you sure gdd doesn't create a sparse file?

One would presumably expect it to be instantaneous if it were creating a sparse file. It's not a compressed filesystem though, is it? /dev/zero tends to be fairly compressible ;-)

I think, as someone else pointed out, running zpool iostat at the same time might be the best way to see what's really happening.

Jonathan
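For what it's worth, a couple of quick checks along those lines; the path is the one from the test above, and the pool name is just a placeholder:

# A sparse (or heavily compressed) file shows far fewer allocated
# blocks than its apparent length suggests.
ls -l /data/video/x
du -k /data/video/x

# Watch the actual write traffic hitting the pool while the test runs
# (replace "tank" with the real pool name).
zpool iostat -v tank 1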
Re: [zfs-discuss] zfs equivalent of ufsdump and ufsrestore
On 30 May 2008, at 15:49, J.P. King wrote:

> For _my_ purposes I'd be happy with zfs send/receive, if only it was guaranteed to be compatible between versions. I agree that the inability to extract single files is an irritation - I am not sure why this is anything more than an implementation detail, but I haven't gone into it in depth.

I would presume it is because zfs send/receive works at the block level, below the ZFS POSIX layer - i.e. below the filesystem level. I would guess that a stream is simply a list of the blocks that were modified between the two snapshots, suitable for "re-playing" on another pool. This means that the stream may not contain your entire file.

An interesting consequence is that send/receive will be optimal in the case of small modifications to very large files, such as database files or large log files: only the actual modified/appended blocks are sent rather than the whole changed file. This may be an important point depending on your file modification patterns.

Jonathan
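A rough way to see that block-level behaviour for yourself, assuming a scratch dataset tank/test containing a large file (all names and sizes here are made up for illustration):

# Baseline snapshot, then overwrite a handful of 128K records in the
# middle of a large file, then snapshot again.
zfs snapshot tank/test@before
dd if=/dev/urandom of=/tank/test/bigfile bs=128k count=8 seek=1000 conv=notrunc
zfs snapshot tank/test@after

# The incremental stream should be roughly the size of the changed
# blocks (about 1MB here), not the size of the whole file.
zfs send -i tank/test@before tank/test@after | wc -c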
Re: [zfs-discuss] zfs equivalent of ufsdump and ufsrestore
On 29 May 2008, at 17:52, Chris Siebenmann wrote:

> The first issue alone makes 'zfs send' completely unsuitable for the purposes that we currently use ufsdump. I don't believe that we've lost a complete filesystem in years, but we restore accidentally deleted files all the time. (And snapshots are not the answer, as it is common that a user doesn't notice the problem until well after the fact.)
>
> ('zfs send' to live disks is not the answer, because we cannot afford the space, heat, power, disks, enclosures, and servers to spin as many disks as we have tape space, especially if we want the fault isolation that separate tapes give us. most especially if we have to build a second, physically separate machine room in another building to put the backups in.)

However, the original poster did say they wanted to back up to another disk, and that they wanted something lightweight/cheap/easy. zfs send/receive would seem to fit the bill in that case. Let's answer the question rather than getting into an argument about whether zfs send/receive is suitable for an enterprise archival solution.

Using snapshots is a useful practice, as it costs fairly little in terms of disk space and provides immediate access to fairly recent, accidentally deleted files. If one is using snapshots, sending the streams to the backup pool is a simple procedure. One can then keep as many snapshots on the backup pool as necessary to provide the amount of history required. All of the files are kept in identical form on the backup pool for easy browsing when something needs to be restored. In the event of catastrophic failure of the primary pool, one can quickly move the backup disk to the primary system and import it as the new primary pool.

It's a bit-perfect incremental backup strategy that requires no additional tools.

Jonathan
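A minimal sketch of that workflow, assuming a primary dataset tank/home and a second disk holding a pool called backup (the dataset, pool, and snapshot names are all illustrative):

# Initial full copy to the backup pool.
zfs snapshot tank/home@2008-05-29
zfs send tank/home@2008-05-29 | zfs receive backup/home

# Later runs send only the blocks changed since the previous snapshot.
# The received filesystem should be left unmodified between runs
# (or use zfs receive -F to roll it back first).
zfs snapshot tank/home@2008-05-30
zfs send -i tank/home@2008-05-29 tank/home@2008-05-30 | zfs receive backup/home

# After a failure of the primary pool, move the backup disk over and
# import its pool on the primary system.
zpool import backup

Old snapshots can be destroyed on either side as the retention policy dictates, as long as the most recent snapshot common to both pools is kept around to base the next incremental on.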
Re: [zfs-discuss] zfs equivalent of ufsdump and ufsrestore
On 29 May 2008, at 15:51, Thomas Maier-Komor wrote:

>> I very strongly disagree. The closest ZFS equivalent to ufsdump is 'zfs send'. 'zfs send', like ufsdump, has intimate awareness of the actual on-disk layout and is an integrated part of the filesystem implementation.
>>
>> star is a userland archiver.
>
> The man page for zfs states the following for send:
>
>     The format of the stream is evolving. No backwards compati-
>     bility is guaranteed. You may not be able to receive your
>     streams on future versions of ZFS.
>
> I think this should be taken into account when considering 'zfs send' for backup purposes...

Presumably, if one is backing up to another disk, one could zfs receive into a pool on that disk. That way you get simple file-based access, full history (although it could be collapsed by deleting older snapshots as necessary), and no worries about stream format changes.

Jonathan
[zfs-discuss] Video streaming and prefetch
Hi all,

I'm new to this list and to ZFS, so forgive me if I'm re-hashing an old topic. I'm also using ZFS on FreeBSD, not Solaris, so forgive me for being a heretic ;-)

I recently set up a home NAS box and decided that ZFS is the only sensible way to manage 4TB of disks. The primary use of the box is to serve my telly (actually a Mac mini). This uses AFP (via netatalk) to serve space to the telly for storing and retrieving video. The video tends to be 2-4GB files that are read/written sequentially at a rate in the region of 800KB/s.

Unfortunately, the performance has been very choppy. The video software assumes it's talking to fast local storage and thus makes little attempt to buffer. I spent a long time trying to figure out the network problem before determining that the problem is actually in reading from the FS. This is a pretty cheap box, but it can still sustain 110MB/s off the array with access times in the low milliseconds. So there really is no excuse for not being able to serve up 800KB/s in an even fashion.

After some experimentation I have determined that the problem is prefetching. Given this thing is mostly serving sequentially at a low, even rate, it ought to be perfect territory for prefetching. I spent the weekend reading the ZFS code (bank holiday fun, eh?) and running some experiments, and I think the problem is in the interaction between the prefetching code and the running processes. (Warning: some of the following is speculation on observed behaviour and may be rubbish.)

The behaviour I see is the file streaming stalling whenever the prefetch code decides to read some more blocks. The dmu_zfetch code is all run as part of the read() operation. When this finds itself getting close to running out of prefetched blocks, it queues up requests for more blocks - 256 of them. At 128KB per block, that's 32MB of data it requests. At this point it should be asynchronous, and the caller should get back control and be able to process the data it just read.

However, my NAS box is a uniprocessor and the issue thread is higher priority than user processes. So, in fact, it immediately begins issuing the physical reads to the disks. Given that modern disks tend to prefetch into their own caches anyway, some of these reads are likely to be served up instantly. This causes interrupts back into the kernel to deal with the data. This queues up the interrupt threads, which are also higher priority than user processes. These consume a not-insubstantial amount of CPU time to gather, checksum and load the blocks into the ARC. During which time, the disks have located the other blocks and started serving them up.

So what I seem to get is a small "perfect storm" of interrupt processing. This delays the user process for a few hundred milliseconds - even though the originally requested block was *in* the cache! To add insult to injury, the user process in this case, when it finally regains the CPU and returns the data to the caller, then sleeps for a couple of hundred milliseconds.

So prefetching, instead of evening out reads and reducing jitter, has produced the worst-case behaviour of compressing all of the jitter into one massive lump every 40 seconds (32MB / 800KB/s). I get reasonably even performance if I disable prefetching or if I reduce zfetch_block_cap to 16-32 blocks instead of 256.

Other than just taking this opportunity to rant, I'm wondering if anyone else has seen similar problems and found a way around them?
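For anyone else hitting this, the two workarounds mentioned above look roughly like the following on FreeBSD; the exact tunable names vary between ZFS versions, so treat these as illustrative rather than gospel:

# Disable file-level prefetch entirely; on FreeBSD this is a loader
# tunable, set in /boot/loader.conf and applied at the next boot.
echo 'vfs.zfs.prefetch_disable=1' >> /boot/loader.conf

# Alternatively, if your build exposes the zfetch block cap as a
# sysctl, shrink it so each prefetch burst is much smaller than 32MB.
sysctl vfs.zfs.zfetch.block_cap=32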
Also, to any ZFS developers: why does the prefetching logic follow the same path as a regular async read? Surely these ought to be way down the priority list? My immediate thought after a weekend of reading the code was to re-write it to use a low-priority prefetch thread and have all of the dmu_zfetch() logic in that, instead of in-line with the original dbuf_read().

Jonathan

PS: Hi Darren!