[zfs-discuss] [Fwd: [Fwd: MySQL benchmark]]
Original Message
Subject: [zfs-discuss] MySQL benchmark
Date: Tue, 30 Oct 2007 00:32:43 +
From: Robert Milkowski [EMAIL PROTECTED]
Reply-To: Robert Milkowski [EMAIL PROTECTED]
Organization: CI TASK http://www.task.gda.pl
To: zfs-discuss@opensolaris.org

Hello zfs-discuss,

http://dev.mysql.com/tech-resources/articles/mysql-zfs.html

I've just quickly glanced through it. However, the argument about the double-buffering problem is not valid.

--
Best regards,
Robert Milkowski
mailto:[EMAIL PROTECTED]
http://milek.blogspot.com

--- end of forwarded message ---

I absolutely agree with Robert here. Data is cached once in the database and, absent directio, _some_ extra memory is required to stage the I/Os. On the read path it's a tiny amount, since the memory can be reclaimed as soon as the data is copied to user space: 10,000 threads each waiting for an 8K read would be serviced using 80 MB of extra memory. On the write path we need to stage the data for the purpose of a ZFS transaction group; when the dust settles we will be able to do this every 5 seconds. So what percentage of DB blocks is modified in 5 to 10 seconds? If the answer is 5%, then yes, the lack of directio is a concern for you.

-r
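A practical aside on the double-buffering point: on ZFS releases that expose the primarycache property, the usual way to approximate directio for an InnoDB data store is to keep file data out of the ARC and to match the dataset recordsize to the database page size. A minimal sketch, assuming a hypothetical dataset tank/mysql holding the InnoDB files and a build recent enough to support these properties (not every 2007-era build does):

  # Match the 16K InnoDB page size; must be set before the data files are created.
  zfs set recordsize=16k tank/mysql
  # Cache only metadata in the ARC; InnoDB's buffer pool already caches data pages.
  zfs set primarycache=metadata tank/mysql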
[zfs-discuss] Filesystem Community? [was: SquashFS port, interested?]
On Mon, 2007-11-05 at 02:16 -0800, Thomas Lecomte wrote:
> Hello there - I'm still waiting for an answer from Phillip Lougher [the SquashFS developer]. I had already contacted him some months ago, without any answer though. I'll still write a proposal, and probably start the work soon too.

Sounds good!

*me thinks it would be cool to finally have a generic filesystem community*

-M
Re: [zfs-discuss] [fuse-discuss] Filesystem Community? [was: SquashFS port, interested?]
On Mon, 5 Nov 2007, Mark Phalan wrote:
> On Mon, 2007-11-05 at 02:16 -0800, Thomas Lecomte wrote:
> > Hello there - I'm still waiting for an answer from Phillip Lougher [the SquashFS developer]. I had already contacted him some months ago, without any answer though. I'll still write a proposal, and probably start the work soon too.
>
> Sounds good!
>
> *me thinks it would be cool to finally have a generic filesystem community*
>
> -M

_Do_ we finally get one? Can't wait :-)

FrankH.

--
No good can come from selling your freedom, not for all the gold in the world, for the value of this heavenly gift far exceeds that of any fortune on earth.
Re: [zfs-discuss] [fuse-discuss] Filesystem Community? [was: SquashFS port, interested?]
[EMAIL PROTECTED] wrote:
> > *me thinks it would be cool to finally have a generic filesystem community*
>
> _Do_ we finally get one? Can't wait :-)

I would like to have a generic filesystem community... or declare the ufs community to be the generic part in addition.

Jörg

--
EMail: [EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
       [EMAIL PROTECTED] (uni)
       [EMAIL PROTECTED] (work)
Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Re: [zfs-discuss] [fuse-discuss] Filesystem Community? [was: SquashFS port, interested?]
On Mon, 2007-11-05 at 10:27 +, [EMAIL PROTECTED] wrote:
> On Mon, 5 Nov 2007, Mark Phalan wrote:
> > [...]
> > *me thinks it would be cool to finally have a generic filesystem community*
>
> _Do_ we finally get one? Can't wait :-)

I know it was part of the OGB/2007/002 Community and Project Reorganisation proposal, but I have no idea what happened to it :( See the thread "OGB/2007/002 Community and Project Reorganisation" on ogb-discuss from April. I'm CCing ogb-discuss.

-Mark
Re: [zfs-discuss] HAMMER
Peter Tribble wrote:
> I'm not worried about the compression effect. Where I see problems is backing up millions/tens of millions of files in a single dataset. Backing up each file is essentially a random read (and this isn't helped by raidz, which gives you a single disk's worth of random-read I/O per vdev). I would love to see better ways of backing up huge numbers of files.

It's worth correcting this point... the RAIDZ behavior you mention only occurs if the read size is not aligned to the dataset's block size. The checksum verifier must read the entire stripe to validate the data, but it does that in parallel across the stripe's vdevs. The whole block is then available for delivery to the application.

That said, backing up millions/tens of millions of files in a single backup dataset is a bad idea anyway. The metadata searches will kill you, no matter what backend filesystem is supporting it. zfs send is the faster way of backing up huge numbers of files, but you pay the price in restore time. (That's the normal tradeoff.)

--Joe
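For anyone who hasn't used it, the zfs send approach Joe mentions looks roughly like the sketch below; the pool, dataset, and host names (tank/data, backup, backuphost) are illustrative only:

  # Snapshot the dataset and stream it to another pool or host.
  zfs snapshot tank/data@monday
  zfs send tank/data@monday | ssh backuphost zfs receive backup/data
  # Later runs can send only the blocks changed since the previous snapshot.
  zfs snapshot tank/data@tuesday
  zfs send -i tank/data@monday tank/data@tuesday | ssh backuphost zfs receive backup/data

The win over a file-level backup is that the stream avoids a per-file metadata walk and random read per file; the cost, as noted, is that restoring a single file means receiving the whole stream.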
Re: [zfs-discuss] Force SATA1 on AOC-SAT2-MV8
That explains the problems; however, I am able to get them to run by jumpering them down to SATA1, which brings me back to my original question: is there a way to force SATA 1 without cracking the drive case and voiding the warranty? I only have so many expansion slots, so an 8-port Supermicro is about the only controller card option that I have found. If someone can suggest an 8-port eSATA card that works with Solaris, I may give that a try instead - but I was mainly looking for something along the lines of "change line x in configuration file y".

Thanks again,
Eric
Re: [zfs-discuss] Force SATA1 on AOC-SAT2-MV8
Eric Haycraft wrote:
> That explains the problems; however, I am able to get them to run by jumpering them down to SATA1, which brings me back to my original question: is there a way to force SATA 1 without cracking the drive case and voiding the warranty? [...] I was mainly looking for something along the lines of "change line x in configuration file y".

There is no way (short of patching the text of the driver) to alter the allowed SATA communication speeds for the marvell88sx driver. If you wish you can request an RFE (Request For Enhancement), but I don't think it will be given high priority.

Sorry,
Lida Horn
[zfs-discuss] Unreasonably high sys utilization during file create operations.
While doing some testing of ZFS on systems which house the storage backend for a custom IMAP data store, I have witnessed 90-100% sys utilization during moderately high file-creation periods. I'm not sure if this is something inherent in the design of ZFS or if it can be tuned out, but the sys usage goes so high as to cause blocking on TCP operations for observable periods as long as 20-40 seconds. This effectively makes the system unusable for its designed goal, as the TCP delay is longer than could reasonably be expected as a timeout value for the network services/clients which run on or use the server. Any insight would be greatly appreciated. This behavior is observed on freshly installed sol10u4 (patched to latest 10_recommended) on CoolThreads T1000 servers using dual 10k rpm drives. Thanks in advance.

Here's an excerpt from vmstat showing 100% sys utilization (note the two rows that reach 100% sys):

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf  pi po fr de sr s0 s2 -- --   in    sy    cs   us sy id
 19 0 0 19502744 3999456 366 9912 2194 0 0 0 0 0 0 0 0 24868 112987 58937 49 49 2
 5 0 0 19499696 4001416 345 3951 3501 0 0 0 0 0 0 0 0 36075 168337 96305 57 40 3
 1 0 0 19492128 3993160 450 4542 2275 0 0 0 0 0 0 0 0 38505 128766 82580 25 46 29
 14 0 0 19488608 3989880 30 207 0 0 0 0 0 0 0 0 0 3042 3932 3106 1 75 24
 214 0 0 19488800 3990048 0 2 5 0 0 0 0 5 5 0 0 2647 1717 1767 0 97 3
 242 0 0 19489568 3990776 0 1 0 0 0 0 0 0 0 0 0 2548 1057 387 0 100 0
 298 0 0 19489568 3990776 0 0 0 0 0 0 0 1 0 0 0 2523 1086 587 0 100 0
 43 0 0 19481184 3980456 1289 31472 2499 0 0 0 0 144 145 0 0 32371 179144 97796 44 54 2
 2 0 0 19481872 3974600 2125 9766 5922 0 0 0 0 96 96 0 0 40929 170587 112274 37 55 7
 0 0 0 19486568 3974504 551 1110 2632 0 0 0 0 107 107 0 0 35579 95488 77269 17 31 52
 0 0 0 19485136 3969520 354 1310 2183 0 0 0 0 125 125 0 0 24527 56481 46909 11 17 72
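For anyone chasing a similar symptom, a quick, hedged sketch of how to see where the kernel time is going on Solaris 10 (DTrace and microstate accounting are standard there; the 30-second window is arbitrary):

  # Sample kernel stacks for 30 seconds; the hottest stacks show what the
  # kernel is doing while sys% is pegged (arg0 != 0 filters to kernel-mode samples).
  dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }'
  # Per-thread microstates: high SYS or LCK columns point at the same threads.
  prstat -mL 5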
Re: [zfs-discuss] Force SATA1 on AOC-SAT2-MV8
On Sun, 4 Nov 2007, Rob Windsor wrote:
> Eric Haycraft wrote:
> > The drives (6 in total) are external (eSATA) ones, so they have their own enclosure that I can't open without voiding the warranty... I destroyed one enclosure trying out ways to get it to work and learned that there was no way to open them up without wrecking the case :( I have 2-meter SATA-to-eSATA cables. The drives are 750GB FreeAgent Pro USB/eSATA drives from Seagate. Thanks for your help.
>
> IIRC, eSATA has different signalling specifications from (i)SATA (higher voltages, for example). This would mean that a (passive) SATA-to-eSATA adapter on a SATA2 card could present its own issues.

OK - now I understand the issues. The only suggestion I can offer is to try/see if a shielded SATA cable would work for you. See: http://www.cs-electronics.com/serial-ata.htm

Also - talk to the CS Electronics guys - they may offer you alternative solutions. I've found them to be knowledgeable and reasonably priced - they have never foobarred an order on me. [The usual disclaimers - just a satisfied customer over many years.]

Regards,
Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Graduate from sugar-coating school? Sorry - I never attended! :)
Re: [zfs-discuss] ZFS Jumpstart integration and the amazing invisible zpool.cache
> ---8<--- run last in client_end_script ---8<---
> #!/bin/sh
> zpool list | grep -w data > /dev/null || exit 0
> echo /sbin/zpool export data
> /sbin/zpool export data
> echo /sbin/mount -F lofs /devices /a/devices
> /sbin/mount -F lofs /devices /a/devices
> echo chroot /a /sbin/zpool import data
> chroot /a /sbin/zpool import data
>
> The final step is the trick ;)
>
> /Tomas

Thomas, thank you a million times over for this suggestion. I had a few little hangups getting this implemented, but here is the script-fu that accomplished it. Some of it is still a bit kludgy for my tastes, but I expect (Sun, are you listening?) that ZFS root and ZFS targets will be supported natively in Jumpstart soon enough. The shuffle of data after the ufsdump/restore is a little different if your initial Jumpstart profile puts var and usr on separate partitions. I think that dump/restore in the current CVS repo for OpenSolaris might have different behavior.

#Define some useful variables
DISK1=`/bin/echo ${SI_DISKLIST} | /bin/awk -F, '{print $1}'`
DISK2=`/bin/echo ${SI_DISKLIST} | /bin/awk -F, '{print $2}'`

#create the base zfs mirror device pool
echo rebuild device nodes
devfsadm
echo rebuilding device nodes on target root
devfsadm -r /a

echo Destroy existing base zpool
zpool destroy -f base
echo Create new base zpool as a mirror of slice 3 from both disks
zpool create -m none base mirror ${DISK1}s3 ${DISK2}s3

echo Create base/var zfs vol
zfs create -o mountpoint=legacy -o atime=off base/var
echo Create base/usr zfs vol
zfs create -o mountpoint=legacy -o atime=off base/usr
echo Create base/spool zfs vol
zfs create -o mountpoint=legacy -o atime=off base/spool

echo Creating and setting perms for /a/var.z
mkdir /a/var.z
chmod 755 /a/var.z
chown 0:0 /a/var.z
echo Creating and setting perms for /a/usr.z
mkdir /a/usr.z
chmod 755 /a/usr.z
chown 0:0 /a/usr.z

echo Adding lines to vfstab for zfs mounts
echo "base/var - /var zfs - yes -" >> /a/etc/vfstab
echo "base/usr - /usr zfs - yes -" >> /a/etc/vfstab
echo "base/spool - /var/spool zfs - yes -" >> /a/etc/vfstab

echo mounting var.z and usr.z
mount -F zfs base/var /a/var.z
mount -F zfs base/usr /a/usr.z

echo dumping /a/usr to /a/usr.z
(cd /a/usr.z; ufsdump 0f - /a/usr | ufsrestore rf -; mv ./usr/* ./; rmdir ./usr)
echo dumping /a/var to /a/var.z
(cd /a/var.z; ufsdump 0f - /a/var | ufsrestore rf -; mv ./var/* ./; rmdir ./var)

echo export base zpool
/sbin/zpool export base
echo loop /devices to /a/devices
/sbin/mount -F lofs /devices /a/devices
echo import base zpool in the /a chroot
chroot /a /sbin/zpool import base

mv /a/var /a/var.local
mv /a/usr /a/usr.local

echo Creating and setting perms for /a/var
mkdir /a/var
chmod 755 /a/var
chown 0:0 /a/var
echo Creating and setting perms for /a/usr
mkdir /a/usr
chmod 755 /a/usr
chown 0:0 /a/usr

echo unmounting /a/var.z and /a/usr.z
umount /a/var.z
umount /a/usr.z

echo importing zpool base again for the final time
zpool import -f base
echo mounting /a/usr and /a/var for sane shutdown
mount -F zfs base/var /a/var
mount -F zfs base/usr /a/usr

echo move spool contents
mv /a/var/spool /a/var/spool.old
echo create new mount point
mkdir /a/var/spool
chmod 755 /a/var/spool
chown 0:3 /a/var/spool
echo mounting new spool zfs vol
mount -F zfs base/spool /a/var/spool
echo moving spool contents into new mount
mv /a/var/spool.old/* /a/var/spool/
echo Finished!
Re: [zfs-discuss] memory issue
Jeff, this sounds like the notorious array cache-flushing issue. See http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes
-- richard

Jeff Meidinger wrote:
> Hello, I received the following question from a company I am working with:
>
> We are having issues with our early experiments with ZFS with volumes mounted from a 6130. Here is what we have and what we are seeing: a T2000 (geronimo) on the fibre with a 6130. The 6130 is configured with UFS volumes mapped and mounted on several other hosts; the T2000 is the only host using a ZFS volume (only one volume/filesystem configured). When I attempt to load the volume from backup, we see memory being consumed at a very high rate on the host with the ZFS filesystem mounted, and it seems that disk latencies on all hosts connected through the fibre to the 6130 increase to the point where performance problems are noted. Our monitoring system eventually got blocked, I assume, due to resource starvation; either the machine was thrashing or waiting for I/O. Before the system hung, I looked at memory allocation using kdb and saw anonymous allocations responsible for far and away the biggest chunk. Also, when the backup is suspended the memory is not freed. Eventually, the server hung and rebooted (perhaps due to an Oracle cluster-health mechanism - I won't blame ZFS for that ;-). I suspect a ZFS caching issue. I was directed to this doc (http://blogs.digitar.com/jjww/?itemid=44). It sort of addresses the issue we have encountered, but I'd rather get the news from you guys. How shall I proceed? I have a system I can use and abuse in preproduction for this purpose. We need to load a terabyte into a production ZFS filesystem without pulling down everyone on the fibre...
>
> Please respond to me directly as well as to the alias as I am not added yet.
>
> Thanks, Jeff
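For reference, the two tunings most often suggested for this combination of symptoms are capping the ARC and, when the array's write cache is battery-backed, disabling ZFS's cache-flush requests; both are described in the Evil Tuning Guide linked above. A hedged sketch of the /etc/system entries involved - the 4 GB cap is purely illustrative, and parameter availability varies by Solaris release, so check the guide against your build:

  # Cap the ARC so a bulk load cannot consume nearly all of RAM
  # (0x100000000 = 4 GB is only an example value); takes effect after a reboot.
  echo "set zfs:zfs_arc_max = 0x100000000" >> /etc/system
  # Only if the 6130's write cache is battery-backed: stop ZFS from asking
  # the array to flush its cache on every commit.
  echo "set zfs:zfs_nocacheflush = 1" >> /etc/system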
Re: [zfs-discuss] ZFS very slow under xVM
I had a similar problem on a quad-core AMD box with 8 GB of RAM... The performance was nice for a few minutes, but then the system would crawl to a halt. The problem was that the Areca SATA drivers can't do DMA unless the dom0 memory is at 3 GB or lower.

On 04/11/2007, at 3:49 PM, Martin wrote:
> Mitchell,
>
> The problem seems to occur with various I/O patterns. I first noticed it after using ZFS-based storage for a disk image for an xVM/Xen virtual domain, and then, while tracking it down, observed that either a cp of a large .iso disk image or, later, a single dd if=/dev/zero of=myfile bs=16k count=15 would reproduce the problem. So I guess this latter case is a mostly-write pattern to the disk, especially after it is noted that the command returns after around 5 seconds, leaving the rest buffered in memory.
>
> best regards
> Martin
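If the driver/DMA limitation described above is in play, the usual workaround is to cap dom0 memory at boot. A hedged sketch of a menu.lst entry for Solaris xVM - the 2048M figure and the exact paths are illustrative and should be checked against your build's existing xVM boot entry:

  # In /boot/grub/menu.lst, append dom0_mem= to the hypervisor line:
  title Solaris xVM (dom0 capped at 2 GB)
  kernel$ /boot/$ISADIR/xen.gz dom0_mem=2048M
  module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix
  module$ /platform/i86pc/$ISADIR/boot_archive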