On 6/13/14, 7:03 AM, "Ellis H. Wilson III" <[email protected]> wrote:
>On 06/13/2014 09:31 AM, Joe Landman wrote:
>> On 06/13/2014 09:17 AM, Skylar Thompson wrote:
>>> We've recently implemented a quota of 1 million files per 1TB of
>>> filesystem space. And yes, we had to clean up a number of groups' and
>>> individuals' spaces before implementing that. There seems to be a trend
>>> in the bioinformatics community for using the filesystem as a database.
>>
>> I wasn't going to say anything about this, but, yes, there are some
>> significant abuses of file systems going on in this community. But this
>> is nothing new, sadly ... I've seen this since the late 90's.
>
>I think we're all probably too close to the tool in question (HPC
>storage). Ultimately this is just a hammer for scientists and other
>non-CS/IT types, so of course they are going to scoff when we tell them
>they are holding the hammer such that it hits sideways. "Who's to tell
>me how to hold the hammer?! This side has more metallic surface area
>anyhow, making it easier to hit the nail this way!"
>
>So you can either:
>a) Fix it transparently with automatic policies/FS's in the back-end.
>(I know of at least one FS that packs small files with metadata
>transparently on SSDs to expedite small file IOPS, but message me
>off-list for that as I start work for that shop soon and don't want to
>so blatantly advertise). There are limits to how much these

Let's not let "concern for efficiency" get in the way of "users solving problems". I suspect that for a LOT of problems, buying more/faster hardware is more cost effective than changing how the scientist/engineer/user works. Sure, there are HPC applications which are run repeatedly and for which performance is very important (numerical weather simulations, for instance). If it's that big a deal, why not make it transparent: Ellis gave an example of a system that "blocks" small transactions into better ones transparently. That is the way it should be: the user doesn't care how it happens.
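To illustrate the kind of "blocking" being described, here is a minimal user-space sketch (all names hypothetical, not the API of any real filesystem Ellis may be alluding to) that packs many small files into one container file plus a JSON index, so a million tiny records consume one inode instead of a million:

```python
# Hypothetical sketch of transparent small-file packing: concatenate many
# small files into a single container, record (offset, length) per name in a
# sidecar index, and read records back with seek(). A real FS would do this
# below the POSIX layer; this only shows the idea.
import json
import os


def pack(src_paths, container_path):
    """Append each small file to container_path; write a name -> (offset, length) index."""
    index = {}
    with open(container_path, "wb") as out:
        for path in src_paths:
            with open(path, "rb") as f:
                data = f.read()
            index[os.path.basename(path)] = (out.tell(), len(data))
            out.write(data)
    with open(container_path + ".idx", "w") as idx:
        json.dump(index, idx)
    return index


def read_packed(container_path, name):
    """Fetch one packed record by name using the sidecar index."""
    with open(container_path + ".idx") as idx:
        offset, length = json.load(idx)[name]
    with open(container_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

The point is that the caller still thinks in terms of named records; only the storage layout changes, which is why it can be made transparent.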
Do you manually manage memory allocation and caching? Or do you let the OS take care of it? Heartbleed is a fine example of what happens when someone tries to "optimize" the performance. Obviously, if you're a "developer of HPC" as opposed to a "user of HPC", then understanding what works better or worse, or is more or less efficient, is important. But there are a LOT more "users of HPC" who are NOT "developers of HPC", and that's who should be the focus. Doesn't this harken back to the perennial assembler vs. high-level language dispute? I think you should spend your time making better optimizing compilers (or better languages for specifying what it is you want to do) rather than advocating programming in assembler.

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
