Re: Large filesystem recommendation
On 07/25/2013 08:41 PM, Yasha Karant wrote:
How reliable are the SSDs, including actual non-corrected BER, and what is the failure rate / interval?

This is an example of a desktop SSD test: http://uk.hardware.info/reviews/4178/hardwareinfo-tests-lifespan-of-samsung-ssd-840-250gb-tlc-ssd-updated-with-final-conclusion -- and this is TLC! IMHO, at this moment the reliability of SSDs depends less on the cell storage technology than on firmware quality. I would say that a pair of at least 120 GB SSDs (RAID1) is safe to use in enterprise environments. (N.B. you would need at least 80k write IOPS at queue depth 8.)

HTH,
Adrian
Re: Large filesystem recommendation
Hi Paul Robert Marino!

On 2013.07.24 at 20:49:18 -0400, Paul Robert Marino wrote:
Admittedly my knowledge about modern versions of GNU tar may be out of date. My experience with older versions over the last 18 years was that they didn't support those things. Glad to hear it's been updated :-) I will be eager to read the man pages tomorrow so I can update some old scripts I wrote a long time ago.

The support appeared in mainline tar only recently (about 2 years ago, in 1.26), but TUV has been patching the tar that ships with EL distributions to support this for quite a while (at least since the RHEL5 release: http://magazine.redhat.com/2007/07/02/tips-from-an-rhce-tar-vs-star-the-battle-of-xattrs/). In SL6 it's still just tar 1.23, but it fully supports it. So yes, go ahead and use it! It should be supported on about every system by now.

--
Vladimir
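For the old scripts mentioned above, a minimal round-trip sketch of the xattr support (flag names as in patched EL tar and recent mainline GNU tar; paths and the attribute name are hypothetical):

```shell
# Round-trip a file and its extended attributes through GNU tar.
# Paths and the attribute name are hypothetical.
mkdir -p /tmp/xattr-demo/src /tmp/xattr-demo/out
echo "payload" > /tmp/xattr-demo/src/file.txt

# setfattr may be absent, or the filesystem may not support xattrs;
# the archive round trip works either way.
setfattr -n user.origin -v "build-host" /tmp/xattr-demo/src/file.txt 2>/dev/null || true

# --xattrs stores extended attributes; on EL systems --selinux and
# --acls capture security contexts and POSIX ACLs as well.
tar --xattrs -C /tmp/xattr-demo -cf /tmp/xattr-demo/backup.tar src
tar --xattrs -C /tmp/xattr-demo/out -xf /tmp/xattr-demo/backup.tar

# Show the restored attribute (if xattrs were supported above):
getfattr -n user.origin /tmp/xattr-demo/out/src/file.txt 2>/dev/null || true
```

The same flags work on both create and extract, so old backup scripts usually only need the extra options added in both places.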
Re: Large filesystem recommendation
It's not so bad if you build the system taking these things into account (much easier if you wait long enough to read about others' experiences :-). We built our BSD ZFS systems using inexpensive Intel 313 SSDs for the log devices. I can't say that they're the best possible choice -- opinions vary all over the map -- but the box is currently happily accepting 2Gbps continuous NFS writes, which seems pretty decent.

Graham

On 7/24/2013 5:36 PM, Paul Robert Marino wrote:
ZFS is a performance nightmare if you plan to export it via NFS, because of a core design conflict between NFS locking and the ZIL journal in ZFS. It's not just a Linux issue; it affects Solaris and BSD as well. My only experience with ZFS was on a Solaris NFS server, and we had to get a dedicated flash-backed RAM drive for the ZIL to fix our performance issues -- and let me tell you, Sun charged us a small fortune for the card. Aside from that, most of the cool features are available in XFS if you dive deep enough into the documentation, though most of them, like multi-disk spanning, can now be handled by LVM or MD -- but are, at least in my opinion, handled better by hardware RAID. Though I will admit that being able to move your journal to a separate faster volume to increase performance is very cool, and that's a feature I've only seen in XFS and ZFS.
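The XFS variant of the external-journal trick Paul describes is set at mkfs time and repeated at mount time; a sketch, with hypothetical device names (/dev/md0 for the data array, /dev/sdb1 for the fast log device):

```shell
# Put the XFS log on a separate fast device at filesystem creation.
# Device names here are hypothetical.
mkfs.xfs -l logdev=/dev/sdb1,size=128m /dev/md0

# The external log must also be named at every mount:
mount -o logdev=/dev/sdb1 /dev/md0 /export/data
```

Note that a filesystem created with an external log will not mount without the logdev option, so the entry belongs in /etc/fstab as well.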
Re: Large filesystem recommendation
How reliable are the SSDs, including actual non-corrected BER, and what is the failure rate / interval? If a ZFS log on an SSD fails, what happens? Is the log automagically recreated on a secondary SSD? Are the drives (spinning and/or SSD) mirrored? Are primary (non-log) data lost?

Yasha Karant

On 07/25/2013 09:56 AM, Graham Allan wrote:
It's not so bad if you build the system taking these things into account (much easier if you wait long enough to read about others' experiences :-). [...]
Re: Large filesystem recommendation
I'm not sure if anyone really knows what the reliability will be, but the hope is obviously that these SLC-type drives should be longer-lasting (and they are in a mirror).

Losing the ZIL used to be a fairly fatal event, but that was a long time ago (ZFS v19 or something). I think with current ZFS versions you just lose the performance boost if the dedicated ZIL device fails or goes away. There's a good explanation here: http://www.nexentastor.org/boards/2/topics/6890

Graham

On Thu, Jul 25, 2013 at 10:41:50AM -0700, Yasha Karant wrote:
How reliable are the SSDs, including actual non-corrected BER, and what is the failure rate / interval? If a ZFS log on a SSD fails, what happens? Is the log automagically recreated on a secondary SSD? Are the drives (spinning and/or SSD) mirrored? Are primary (non-log) data lost?

Yasha Karant
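Graham's "they are in a mirror" corresponds to building the log vdev as a mirror; a hedged sketch, with a hypothetical pool name (tank) and hypothetical device names, of adding one and replacing a failed log SSD:

```shell
# Attach a mirrored log vdev so that one SSD failure costs only the
# sync-write acceleration, not the log itself.
# Pool and device names are hypothetical.
zpool add tank log mirror /dev/sdb /dev/sdc

# After a log SSD dies, inspect and swap it out:
zpool status tank
zpool replace tank /dev/sdb /dev/sdd

# On current ZFS versions, losing the entire log vdev drops the pool
# back to the in-pool ZIL (older versions, pre-v19, were less forgiving).
```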
RE: Large filesystem recommendation
Overview: http://www.nexenta.com/corp/zfs-education/203-nexentastor-an-introduction-to-zfss-hybrid-storage-pool-

The ZIL -- see:
https://blogs.oracle.com/realneel/entry/the_zfs_intent_log
https://blogs.oracle.com/perrin/entry/the_lumberjack
http://nex7.blogspot.com/2013/04/zfs-intent-log.html

Accordingly, it is actually quite OK to use cheap SSDs. Two things to do if doing so, however: 1) low latency is key, so keep this in mind when selecting the prospective SSD; 2) mirror and stripe the vdev (e.g. a RAID10 ZIL of 4x SSD) to be safe.

The L2ARC:
https://blogs.oracle.com/brendan/entry/test
http://www.zfsbuild.com/2010/04/15/explanation-of-arc-and-l2arc/

Accordingly, with the L2ARC it is also OK to use cheap SSDs; the same two rules apply. However, due to the nature of the cache data, a striped vdev of 2 SSDs is fine as well.

Foregoing details, one can also achieve the same sort of general idea, to a point, with an external journal on ext4, and with Btrfs: mkfs.btrfs -m raid10 SSD SSD SSD SSD -d raid10 disk disk disk disk

- Chris

-----Original Message-----
From: owner-scientific-linux-us...@listserv.fnal.gov [mailto:owner-scientific-linux-us...@listserv.fnal.gov] On Behalf Of Graham Allan
Sent: Thursday, July 25, 2013 12:54 PM
To: Yasha Karant
Cc: scientific-linux-users
Subject: Re: Large filesystem recommendation

I'm not sure if anyone really knows what the reliability will be, but the hope is obviously that these SLC-type drives should be longer-lasting (and they are in a mirror). [...]
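The ext4 external-journal route Chris mentions is a two-step mke2fs operation; a sketch with hypothetical device names (/dev/sdb1 for the journal SSD, /dev/md0 for the data array):

```shell
# Create the journal device first. Device names are hypothetical.
mke2fs -O journal_dev /dev/sdb1

# Then build the ext4 filesystem pointing at that external journal:
mkfs.ext4 -J device=/dev/sdb1 /dev/md0
```

Unlike XFS, ext4 records the journal device in the superblock, so no extra mount option is needed afterwards.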
RE: Large filesystem recommendation
I would actually direct ZOL support questions at the zfs-discuss mailing list: http://zfsonlinux.org/lists.html

Also, we (the GEHC Compute Systems Team) work with SL/Fermi via our internal GE Linux distribution based on SL, called HELiOS (http://helios.gehealthcare.com). See: http://scientificlinuxforum.org/index.php?showtopic=1336

- Chris

-----Original Message-----
From: owner-scientific-linux-us...@listserv.fnal.gov [mailto:owner-scientific-linux-us...@listserv.fnal.gov] On Behalf Of Yasha Karant
Sent: Thursday, July 25, 2013 1:42 PM
To: scientific-linux-users
Subject: Re: Large filesystem recommendation

Based upon the information below, ZFS is under consideration for our disk farm server system. At one point we had to run Lustre to meet an external funding recommendation -- but we do not have that aegis at present. However, one important question: porting a file system to an OS environment is not always trivial, and can result in actual performance (and in some cases, reliability) degradation. Is the port of ZFS to EL N x86-64 (N currently 6) professionally supported, and if so, by which entity? Do understand that I regard SL as being professionally supported because there are (paid) professional staff working on SL via Fermilab/CERN -- and TUV EL definitely is so supported. I found "Native ZFS on Linux, produced at Lawrence Livermore National Laboratory" at http://zfsonlinux.org/, which references http://zfsonlinux.org/zfs-disclaimer.html. Is LLNL actually supporting ZFS?

Yasha Karant

On 07/25/2013 10:57 AM, Brown, Chris (GE Healthcare) wrote:
Overview: http://www.nexenta.com/corp/zfs-education/203-nexentastor-an-introduction-to-zfss-hybrid-storage-pool- [...]
RE: Large filesystem recommendation
Although, that said, EXT4 is still an inode-centric file system with a journal added on, so moving the journal to a faster volume won't have as big an effect as it does on journaling file systems designed that way from the ground up. So while that feature may speed up the journal for EXT4, it's still limited by the speed of the in-filesystem inodes regardless of where the journal is located. The difference is that XFS, JFS, ZFS, and a few others primarily rely on the journal and write the inodes as needed after the fact, for backwards compatibility with older low-level binaries. XFS also uses the inodes with the xfs_repair tool as a DR backup in the very rare case of the journal getting corrupted (usually due to a hardware issue like a RAID controller backplane meltdown), but even in that case XFS only creates (thin-provisions) the inodes it really needs, the first time they are written to -- which is why the mkfs.xfs tool is so fast. EXT4 still pre-allocates all of the possible inodes during formatting, and writes to the inodes before the journal.

-- Sent from my HP Pre3

On Jul 25, 2013 1:17, Paul Robert Marino prmari...@gmail.com wrote:
That's cool; I've never noticed that in the documentation, but I'll look for it.

-- Sent from my HP Pre3

On Jul 24, 2013 18:41, Scott Weikart scot...@benetech.org wrote:
Though I will admit that being able to move your journal to a separate faster volume to increase performance is very cool, and that's a feature I've only seen in XFS and ZFS.

ext4 supports that.

-scott

From: owner-scientific-linux-us...@listserv.fnal.gov owner-scientific-linux-us...@listserv.fnal.gov on behalf of Paul Robert Marino prmari...@gmail.com
Sent: Wednesday, July 24, 2013 3:36 PM
To: Brown, Chris (GE Healthcare); Graham Allan; John Lauro
Cc: scientific-linux-users
Subject: RE: Large filesystem recommendation

ZFS is a performance nightmare if you plan to export it via NFS, because of a core design conflict between NFS locking and the ZIL journal in ZFS.
[...]

-- Sent from my HP Pre3

On Jul 24, 2013 16:53, Brown, Chris (GE Healthcare) christopher.br...@med.ge.com wrote:
ZFS on Linux will provide you all the goodness that it brought to Solaris and BSD. Check out:
http://listserv.fnal.gov/scripts/wa.exe?A2=ind1303L=scientific-linux-usersT=0P=21739
http://listserv.fnal.gov/scripts/wa.exe?A2=ind1303L=scientific-linux-usersT=0P=21882
http://listserv.fnal.gov/scripts/wa.exe?A2=ind1307L=scientific-linux-usersT=0P=4752

- Chris

-----Original Message-----
From: owner-scientific-linux-us...@listserv.fnal.gov [mailto:owner-scientific-linux-us...@listserv.fnal.gov] On Behalf Of Graham Allan
Sent: Wednesday, July 24, 2013 3:46 PM
To: John Lauro
Cc: scientific-linux-users
Subject: Re: Large filesystem recommendation

XFS seems like the most obvious and maybe safest choice. FWIW, we use it on SL5 and SL6. Ultimately, any issues we've had with it turned out to be hardware-related. ZFS has some really nice features, and we are using it for larger filesystems than we have XFS, but so far only on BSD rather than Linux...

On Wed, Jul 24, 2013 at 01:59:03PM -0400, John Lauro wrote:
What is recommended for a large file system (40TB) under SL6? In the past I have always had good luck with jfs. Might not be the fastest, but very stable.
It works well with being able to repair huge filesystems in a reasonable amount of RAM, and it handles large directories and large files. Unfortunately jfs doesn't appear to be supported in 6? (Or is there a repo I can add?) Besides support for a 40TB filesystem, we also need support for files > 4TB and directories with hundreds of thousands of files. What do people recommend?

--
Graham Allan
School of Physics and Astronomy - University of Minnesota