Re: Large filesystem recommendation

2013-07-26 Thread Adrian Sevcenco
On 07/25/2013 08:41 PM, Yasha Karant wrote:
 How reliable are the SSDs, including actual non-corrected BER, and what
 is the failure rate / interval ?
This is an example of a desktop SSD test:
http://uk.hardware.info/reviews/4178/hardwareinfo-tests-lifespan-of-samsung-ssd-840-250gb-tlc-ssd-updated-with-final-conclusion

and this is TLC! IMHO, at this moment, the reliability of SSDs depends
less on the cell storage technology than on firmware quality... I would
say that a pair of at least 120 GB SSDs (RAID1) is safe to use in
enterprise environments. (N.B. you would need at least 80k write IOPS
at queue depths > 8)

HTH,
Adrian





Re: Large filesystem recommendation

2013-07-25 Thread Vladimir Mosgalin
Hi Paul Robert Marino!

 On 2013.07.24 at 20:49:18 -0400, Paul Robert Marino wrote next:

 Admittedly my knowledge about modern versions of GNU tar may be out of date.
 My experiences with older versions over the last 18 years were that it didn't
 support those things. Glad to hear it's been updated :-) I will be eager to
 read the man pages tomorrow so I can update some old scripts I wrote a long
 time ago.

The support appeared in mainline tar only recently (about 2 years ago, in
1.26), but TUV has been patching the tar that ships with EL distributions to
support this for quite a while (at least since the RHEL5 release; see
http://magazine.redhat.com/2007/07/02/tips-from-an-rhce-tar-vs-star-the-battle-of-xattrs/).
In SL6 it's still just tar 1.23, but it fully supports it.

So yes, go ahead and use it! It should be supported on about every
system by now.
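If it helps, here is a minimal sketch of the kind of invocation in
question (the --xattrs/--selinux/--acls options as carried by the
patched EL tar; the paths are made-up examples):

  # create an archive preserving extended attributes, SELinux contexts and ACLs
  tar --xattrs --selinux --acls -cf /backup/home.tar /home
  # extract it again, restoring the same metadata
  tar --xattrs --selinux --acls -xpf /backup/home.tar -C /restore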

-- 

Vladimir


Re: Large filesystem recommendation

2013-07-25 Thread Graham Allan
It's not so bad if you build the system taking these things into account 
(much easier if you wait long enough to read about others' experiences 
:-). We built our BSD ZFS systems using inexpensive Intel 313 SSDs for 
the log devices. I can't say that they're the best possible choice, 
opinions vary all over the map, but the box is currently happily 
accepting 2Gbps continuous NFS writes, which seems pretty decent.
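For what it's worth, adding such a log device boils down to a mirrored
log vdev on the pool; a minimal sketch with hypothetical pool and device
names (not our exact configuration):

  # attach a pair of SSDs as a mirrored intent-log (slog) device
  zpool add tank log mirror da1 da2
  zpool status tank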


Graham

On 7/24/2013 5:36 PM, Paul Robert Marino wrote:

ZFS is a performance nightmare if you plan to export it via NFS, because
of a core design conflict between NFS locking and the ZIL journal in
ZFS. It's not just a Linux issue; it affects Solaris and BSD as well. My
only experience with ZFS was on a Solaris NFS server, and we had to get
a dedicated flash-backed RAM drive for the ZIL to fix our performance
issues, and let me tell you, Sun charged us a small fortune for the card.
Aside from that, most of the cool features are available in XFS if you
dive deep enough into the documentation, though most of them, like
multi-disk spanning, can now be handled by LVM or MD, but are, at least
in my opinion, handled better by hardware RAID. Though I will admit that
being able to move your journal to a separate, faster volume to increase
performance is very cool, and that's a feature I've only seen in XFS and ZFS.


Re: Large filesystem recommendation

2013-07-25 Thread Yasha Karant
How reliable are the SSDs, including actual non-corrected BER, and what 
is the failure rate / interval ?


If a ZFS log on a SSD fails, what happens?  Is the log automagically 
recreated on a secondary SSD?  Are the drives (spinning and/or SSD) 
mirrored? Are primary (non-log) data lost?


Yasha Karant

On 07/25/2013 09:56 AM, Graham Allan wrote:

It's not so bad if you build the system taking these things into account
(much easier if you wait long enough to read about others' experiences
:-). We built our BSD ZFS systems using inexpensive Intel 313 SSDs for
the log devices. I can't say that they're the best possible choice,
opinions vary all over the map, but the box is currently happily
accepting 2Gbps continuous NFS writes, which seems pretty decent.

Graham

On 7/24/2013 5:36 PM, Paul Robert Marino wrote:

ZFS is a performance nightmare if you plan to export it via NFS, because
of a core design conflict between NFS locking and the ZIL journal in
ZFS. It's not just a Linux issue; it affects Solaris and BSD as well. My
only experience with ZFS was on a Solaris NFS server, and we had to get
a dedicated flash-backed RAM drive for the ZIL to fix our performance
issues, and let me tell you, Sun charged us a small fortune for the card.
Aside from that, most of the cool features are available in XFS if you
dive deep enough into the documentation, though most of them, like
multi-disk spanning, can now be handled by LVM or MD, but are, at least
in my opinion, handled better by hardware RAID. Though I will admit that
being able to move your journal to a separate, faster volume to increase
performance is very cool, and that's a feature I've only seen in XFS
and ZFS.


Re: Large filesystem recommendation

2013-07-25 Thread Graham Allan
I'm not sure if anyone really knows what the reliability will be, but
the hope is obviously that these SLC-type drives should be
longer-lasting (and they are in a mirror).

Losing the ZIL used to be a fairly fatal event, but that was a long time
ago (ZFS v19 or something). I think with current ZFS versions you just
lose the performance boost if the dedicated ZIL device fails or goes away.
There's a good explanation here:
  http://www.nexentastor.org/boards/2/topics/6890

Graham

On Thu, Jul 25, 2013 at 10:41:50AM -0700, Yasha Karant wrote:
 How reliable are the SSDs, including actual non-corrected BER, and
 what is the failure rate / interval ?
 
 If a ZFS log on a SSD fails, what happens?  Is the log automagically
 recreated on a secondary SSD?  Are the drives (spinning and/or SSD)
 mirrored? Are primary (non-log) data lost?
 
 Yasha Karant


RE: Large filesystem recommendation

2013-07-25 Thread Brown, Chris (GE Healthcare)
Overview: 
http://www.nexenta.com/corp/zfs-education/203-nexentastor-an-introduction-to-zfss-hybrid-storage-pool-


The ZIL:
See:
https://blogs.oracle.com/realneel/entry/the_zfs_intent_log
https://blogs.oracle.com/perrin/entry/the_lumberjack
http://nex7.blogspot.com/2013/04/zfs-intent-log.html

Accordingly, it is actually quite OK to use cheap SSDs.
Two things to do if doing so, however:
1) Low latency is key; keep this in mind when selecting the prospective SSD to
use.
2) Mirror and stripe the vdev, e.g. a RAID10-style ZIL across 4x SSD, to be
safe (a sketch follows below).
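A sketch of what rule 2 can look like in zpool terms, with hypothetical
pool and device names:

  # two mirrored SSD pairs, striped, as the ZIL / slog (RAID10-style)
  zpool add tank log mirror sdb sdc mirror sdd sde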

The L2ARC:
https://blogs.oracle.com/brendan/entry/test
http://www.zfsbuild.com/2010/04/15/explanation-of-arc-and-l2arc/

Accordingly, with the L2ARC it is also OK to use cheap SSDs; the same two
rules above apply. However, due to the nature of the cache data, a striped
vdev of 2 SSDs is fine as well.
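A striped pair of cache devices is similarly simple (hypothetical device
names; note that cache vdevs cannot be mirrored, they are always just
striped):

  # two SSDs as L2ARC cache devices
  zpool add tank cache sdf sdg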


Forgoing details, one can also achieve the same general idea, to a point,
with an external journal on ext4.
Also with BTRFS: mkfs.btrfs -m raid10 SSD SSD SSD SSD -d raid10 disk disk
disk disk

- Chris

-Original Message-
From: owner-scientific-linux-us...@listserv.fnal.gov 
[mailto:owner-scientific-linux-us...@listserv.fnal.gov] On Behalf Of Graham 
Allan
Sent: Thursday, July 25, 2013 12:54 PM
To: Yasha Karant
Cc: scientific-linux-users
Subject: Re: Large filesystem recommendation

I'm not sure if anyone really knows what the reliability will be, but the hope 
is obviously that these SLC-type drives should be longer-lasting (and they are 
in a mirror).

Losing the ZIL used to be a fairly fatal event, but that was a long time ago 
(ZFS v19 or something). I think with current ZFS versions you just lose the 
performance boost if the dedicated ZIL device fails or goes away.
There's a good explanation here:
  http://www.nexentastor.org/boards/2/topics/6890

Graham

On Thu, Jul 25, 2013 at 10:41:50AM -0700, Yasha Karant wrote:
 How reliable are the SSDs, including actual non-corrected BER, and 
 what is the failure rate / interval ?
 
 If a ZFS log on a SSD fails, what happens?  Is the log automagically 
 recreated on a secondary SSD?  Are the drives (spinning and/or SSD) 
 mirrored? Are primary (non-log) data lost?
 
 Yasha Karant


RE: Large filesystem recommendation

2013-07-25 Thread Brown, Chris (GE Healthcare)
I would direct ZOL support questions to the zfs-discuss mailing list:
http://zfsonlinux.org/lists.html

Also, we (the GEHC Compute Systems Team) work with SL/Fermi via our internal GE
Linux distribution based on SL, called HELiOS (http://helios.gehealthcare.com).
See:  http://scientificlinuxforum.org/index.php?showtopic=1336

- Chris

-Original Message-
From: owner-scientific-linux-us...@listserv.fnal.gov 
[mailto:owner-scientific-linux-us...@listserv.fnal.gov] On Behalf Of Yasha 
Karant
Sent: Thursday, July 25, 2013 1:42 PM
To: scientific-linux-users
Subject: Re: Large filesystem recommendation

Based upon the information below, zfs is under consideration for our disk farm 
server system.  At one point, we had to run lustre to meet an external funding 
recommendation -- but we do not have that aegis at present.  However, one 
important question:

Porting a file system to an OS environment is not always trivial, and can
result in actual performance (and in some cases, reliability)
reduction/degradation.  Is the port of zfs to EL N.x x86-64 (N currently 6)
professionally supported, and if so, by which entity?  Do understand that I
regard SL as being professionally supported because there are (paid)
professional staff working on SL via Fermilab/CERN -- and TUV EL definitely
is so supported.

I found:   Native ZFS on Linux
Produced at Lawrence Livermore National Laboratory
from:
http://zfsonlinux.org/

that references:

http://zfsonlinux.org/zfs-disclaimer.html

Is LLNL actually supporting zfs?

Yasha Karant

On 07/25/2013 10:57 AM, Brown, Chris (GE Healthcare) wrote:
 Overview: 
 http://www.nexenta.com/corp/zfs-education/203-nexentastor-an-introduct
 ion-to-zfss-hybrid-storage-pool-


 The ZIL:
 See:
 https://blogs.oracle.com/realneel/entry/the_zfs_intent_log
 https://blogs.oracle.com/perrin/entry/the_lumberjack
 http://nex7.blogspot.com/2013/04/zfs-intent-log.html

 Accordingly, it is actually quite OK to use cheap SSDs.
 Two things to do if doing so, however:
 1) Low latency is key; keep this in mind when selecting the prospective
 SSD to use.
 2) Mirror and stripe the vdev, e.g. a RAID10-style ZIL across 4x SSD, to be
 safe.

 The L2ARC:
 https://blogs.oracle.com/brendan/entry/test
 http://www.zfsbuild.com/2010/04/15/explanation-of-arc-and-l2arc/

 Accordingly, with the L2ARC it is also OK to use cheap SSDs; the same two
 rules above apply. However, due to the nature of the cache data, a striped
 vdev of 2 SSDs is fine as well.


 Forgoing details, one can also achieve the same general idea, to a point,
 with an external journal on ext4.
 Also with BTRFS: mkfs.btrfs -m raid10 SSD SSD SSD SSD -d raid10 disk
 disk disk disk

 - Chris

 -Original Message-
 From: owner-scientific-linux-us...@listserv.fnal.gov 
 [mailto:owner-scientific-linux-us...@listserv.fnal.gov] On Behalf Of 
 Graham Allan
 Sent: Thursday, July 25, 2013 12:54 PM
 To: Yasha Karant
 Cc: scientific-linux-users
 Subject: Re: Large filesystem recommendation

 I'm not sure if anyone really knows what the reliability will be, but the 
 hope is obviously that these SLC-type drives should be longer-lasting (and 
 they are in a mirror).

 Losing the ZIL used to be a fairly fatal event, but that was a long time ago 
 (ZFS v19 or something). I think with current ZFS versions you just lose the 
 performance boost if the dedicated ZIL device fails or goes away.
 There's a good explanation here:
http://www.nexentastor.org/boards/2/topics/6890

 Graham

 On Thu, Jul 25, 2013 at 10:41:50AM -0700, Yasha Karant wrote:
 How reliable are the SSDs, including actual non-corrected BER, and 
 what is the failure rate / interval ?

 If a ZFS log on a SSD fails, what happens?  Is the log automagically 
 recreated on a secondary SSD?  Are the drives (spinning and/or SSD) 
 mirrored? Are primary (non-log) data lost?

 Yasha Karant


RE: Large filesystem recommendation

2013-07-24 Thread Paul Robert Marino
Although, that said, EXT4 is still an inode-centric file system with a
journal added, so moving the journal to a faster volume won't have as
big an effect as it does on file systems designed from the ground up
around journaling. So while that feature may speed up the journal for
EXT4, it's still limited by the speed of the in-filesystem inodes
regardless of where the journal is located.

The difference is that XFS, JFS, ZFS, and a few others primarily rely on
the journal and write the inodes as needed after the fact, for backwards
compatibility with older low-level binaries. XFS also uses them with the
xfsrepair tool as a DR backup in the very rare case of the journal
getting corrupted (usually due to a hardware issue like a RAID
controller backplane meltdown), but even in that case XFS only
thin-provisions (creates) the inodes it really needs the first time they
are written to, which is why the mkfs.xfs tool is so fast. EXT4 still
pre-allocates all of the possible inodes during formatting and writes to
the inodes before the journal.

-- Sent from my HP Pre3

On Jul 25, 2013 1:17, Paul Robert Marino prmari...@gmail.com wrote:

That's cool, I've never noticed that in the documentation, but I'll look
for it.

-- Sent from my HP Pre3

On Jul 24, 2013 18:41, Scott Weikart scot...@benetech.org wrote:

 Though I will admit that being able to move your journal to a
 separate, faster volume to increase performance is very cool,
 and that's a feature I've only seen in XFS and ZFS.

ext4 supports that.

-scott
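For anyone who wants to try it, a minimal sketch with hypothetical
device names (standard e2fsprogs options; the filesystem must be
unmounted for the tune2fs steps):

  # create a dedicated journal device on the fast disk / SSD
  mke2fs -O journal_dev /dev/sdb1
  # make a new ext4 filesystem that uses that external journal
  mkfs.ext4 -J device=/dev/sdb1 /dev/sdc1
  # or retrofit an existing ext4 filesystem
  tune2fs -O ^has_journal /dev/sdc1
  tune2fs -J device=/dev/sdb1 /dev/sdc1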


From: owner-scientific-linux-us...@listserv.fnal.gov owner-scientific-linux-us...@listserv.fnal.gov on behalf of Paul Robert Marino prmari...@gmail.com
Sent: Wednesday, July 24, 2013 3:36 PM
To: Brown, Chris (GE Healthcare); Graham Allan; John Lauro
Cc: scientific-linux-users
Subject: RE: Large filesystem recommendation


ZFS is a performance nightmare if you plan to export it via NFS, because of a
core design conflict between NFS locking and the ZIL journal in ZFS. It's not
just a Linux issue; it affects Solaris and BSD as well. My only experience with
ZFS was on a Solaris NFS server, and we had to get a dedicated flash-backed RAM
drive for the ZIL to fix our performance issues, and let me tell you, Sun
charged us a small fortune for the card.
Aside from that, most of the cool features are available in XFS if you dive
deep enough into the documentation, though most of them, like multi-disk
spanning, can now be handled by LVM or MD, but are, at least in my opinion,
handled better by hardware RAID. Though I will admit that being able to move
your journal to a separate, faster volume to increase performance is very cool,
and that's a feature I've only seen in XFS and ZFS.
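As a concrete illustration of the XFS side of that, the log can be placed on a
separate device both at mkfs time and at mount time (hypothetical device names):

  # put the XFS log on a separate, faster device
  mkfs.xfs -l logdev=/dev/sdb1,size=128m /dev/sdc1
  # the external log must also be specified when mounting
  mount -o logdev=/dev/sdb1 /dev/sdc1 /data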




-- Sent from my HP Pre3



On Jul 24, 2013 16:53, Brown, Chris (GE Healthcare) christopher.br...@med.ge.com wrote:


ZFS on Linux will provide you all the goodness that it brought to Solaris and BSD.

Check out: 
http://listserv.fnal.gov/scripts/wa.exe?A2=ind1303&L=scientific-linux-users&T=0&P=21739


http://listserv.fnal.gov/scripts/wa.exe?A2=ind1303&L=scientific-linux-users&T=0&P=21882


http://listserv.fnal.gov/scripts/wa.exe?A2=ind1307&L=scientific-linux-users&T=0&P=4752


- Chris 


-Original Message- 
From: owner-scientific-linux-us...@listserv.fnal.gov [mailto:owner-scientific-linux-us...@listserv.fnal.gov] On Behalf Of Graham Allan

Sent: Wednesday, July 24, 2013 3:46 PM 
To: John Lauro 
Cc: scientific-linux-users 
Subject: Re: Large filesystem recommendation 

XFS seems like the most obvious and maybe safest choice. FWIW, we use it on SL5 and SL6. Ultimately any issues we've had with it turned out to be hardware-related.


ZFS has some really nice features, and we are using it for larger filesystems than we have XFS, but so far only on BSD rather than Linux...


On Wed, Jul 24, 2013 at 01:59:03PM -0400, John Lauro wrote: 
 What is recommended for a large file system (40TB) under SL6? 
 
 In the past I have always had good luck with jfs. It might not be the
 fastest, but it is very stable. It works well with being able to repair
 huge filesystems in a reasonable amount of RAM, and handles large
 directories and large files. Unfortunately jfs doesn't appear to be
 supported in 6? (or is there a repo I can add?)
 
 
 Besides support for a 40TB filesystem, we also need support for files > 4TB, and directories with hundreds of thousands of files. What do people recommend?


-- 
- 
Graham Allan 
School of Physics and Astronomy - University of Minnesota 
-