Tom, as Reuti says, let's have a look at the nature of these files. What are they, and are analysis jobs really revisiting them again and again?
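A rough first pass at "what are they, and are they being revisited" can be scripted. The Python sketch below is illustrative only. `survey`, `report`, `SMALL_FILE_BYTES` and the age buckets are hypothetical names. The 10KB threshold comes from Tom's message. The ages are based on atime, which only means something if the filesystem actually records access times (many mounts use relatime or noatime). A walk like this also issues one stat per file, so on a filesystem with ~50MM files it is itself a sizeable metadata load and is best run off-peak.

```python
import os
import time
from collections import Counter

# Threshold taken from Tom's message ("less than 10KB").
SMALL_FILE_BYTES = 10 * 1024

# Hypothetical age buckets, in days since last access.
AGE_BUCKETS_DAYS = [7, 30, 90, 365]
AGE_LABELS = [f"<= {d}d" for d in AGE_BUCKETS_DAYS] + [f"> {AGE_BUCKETS_DAYS[-1]}d"]


def age_bucket(days):
    """Return the label of the first bucket that contains `days`."""
    for limit, label in zip(AGE_BUCKETS_DAYS, AGE_LABELS):
        if days <= limit:
            return label
    return AGE_LABELS[-1]


def survey(root, now=None):
    """Walk `root`, counting all files, and small files by last-access age.

    Uses lstat() so symlinks are counted as links, not followed. Ages come
    from st_atime, which is only meaningful if the mount records atimes.
    """
    now = time.time() if now is None else now
    total = small = 0
    small_by_age = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable between listing and stat
            total += 1
            if st.st_size < SMALL_FILE_BYTES:
                small += 1
                days = (now - st.st_atime) / 86400
                small_by_age[age_bucket(days)] += 1
    return total, small, small_by_age


def report(total, small, small_by_age):
    """Print the summary with buckets in age order."""
    print(f"{small} of {total} files are smaller than {SMALL_FILE_BYTES} bytes")
    for label in AGE_LABELS:
        print(f"  small files last accessed {label}: {small_by_age[label]}")
```

For example, `report(*survey("/path/to/project"))` (hypothetical path) prints how many files fall under 10KB and how recently those small files were read, which gives a first answer to whether jobs keep coming back to them.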
This is a marvellous tool for analysing filesystem usage:
http://www.chiark.greenend.org.uk/~sgtatham/agedu/

I have used it a lot in the past on the scratch storage of our clusters to highlight data which hadn't been used in ages. I'm not sure how long agedu will take to index a large Lustre filesystem like yours, but it would be well worth a try. Agedu doesn't work on DMF filesystems, as it stats each file and migrated files would appear to be very small.

On 12 June 2014 12:12, John Hearns <[email protected]> wrote:
> Tom,
> I agree with you regarding small files.
> In my case, I manage a DMF (SGI Data Migration Facility) setup.
> I was concerned at the amount of small files which we were storing - in
> terms of the size of the database files, and storing small files to tape.
> SGI engineers reassured me that the system will happily cope with millions
> of files, and does so on many sites.
> DMF also waits till a large 'chunk' is to be written to tape, ie small
> writes are queued up.
>
> However, when watching the amount of files being pushed to the tape tier
> one day I noticed something like 10 000 files or more from one user.
> Cue the application of a LART.
> Seriously though - I did have a word and he agreed to zip up all the small
> PNG files his project was generating.
>
> I have a general policy here that when lots of small files are generated
> then the directory is zipped up and the zip file is stored.
> We have codes which generate lots of zip files which are stitched together
> into movies, and we also store wind tunnel data which is again
> lots of PNG files. It is unlikely that anyone would ever want the raw data
> files again, but if they should do then an unzip is easy.
>
> > Do you distinguish and segregate them (and/or the people that use them)
> > on special hardware/filesystems?
> Suggest you invest in a LART. http://dictionary.reference.com/browse/lart
>
> On 12 June 2014 11:43, Reuti <[email protected]> wrote:
>> Hi,
>>
>> On 11.06.2014 at 21:03, Tom Harvill wrote:
>>
>> > This is my first time posting to this list, thanks in advance for any
>> > time you spend replying.
>> >
>> > We've found that a large majority of our files (~40MM of ~50MM) are
>> > less than 10KB.
>> > We believe our filesystem (Lustre) is bottlenecked with IOPS and
>> > locking related to jobs running against these files. We have ~700TB
>> > usable storage with ~500TB consumed; almost all consumption is by a
>> > relatively small number of very very large files.
>>
>> What data is represented in 10KB: binary or ASCII data - would it work to
>> put it in a database instead of all these single files? How do you access
>> the files: by some kind of index, name, directory...?
>>
>> -- Reuti
>>
>> > I want to ask this general question: how does your shop deal with the
>> > general problem of small files in filesystems on (beowulf) compute
>> > clusters? Specifically, files that users expect to actively use for
>> > read and write operations for their research.
>> >
>> > Do you distinguish and segregate them (and/or the people that use them)
>> > on special hardware/filesystems?
>> >
>> > Thanks!
>> > Tom
>> >
>> > Tom Harvill
>> > Holland Computing Center
>> > University of Nebraska
>> > _______________________________________________
>> > Beowulf mailing list, [email protected] sponsored by Penguin
>> > Computing
>> > To change your subscription (digest mode or unsubscribe) visit
>> > http://www.beowulf.org/mailman/listinfo/beowulf
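Both packing ideas from the thread, zipping a directory of small files into one archive (John's policy) and loading the files into a database keyed by path (Reuti's question), can be sketched with Python's standard `zipfile` and `sqlite3` modules. This is a sketch under assumptions, not anyone's production setup: the function names are hypothetical, SQLite is just one possible choice of database, and the archive or database file must be written outside the source directory so it does not try to pack itself. Either way the filesystem sees one larger file instead of thousands of small ones, and individual members can still be read back by name without unpacking everything.

```python
import os
import sqlite3
import zipfile


def zip_directory(src_dir, zip_path):
    """Pack every file under src_dir into one zip archive.

    Member names are stored relative to src_dir. zip_path should live
    outside src_dir, otherwise the archive would be walked into itself.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                zf.write(full, arcname=os.path.relpath(full, src_dir))


def read_from_zip(zip_path, member):
    """Read one member's bytes without extracting the whole archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(member)


def load_into_sqlite(src_dir, db_path):
    """Store every file under src_dir as a BLOB keyed by its relative path."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # commit the whole load as one transaction
            con.execute(
                "CREATE TABLE IF NOT EXISTS files "
                "(path TEXT PRIMARY KEY, data BLOB NOT NULL)"
            )
            for dirpath, _dirnames, filenames in os.walk(src_dir):
                for name in filenames:
                    full = os.path.join(dirpath, name)
                    with open(full, "rb") as f:
                        con.execute(
                            "INSERT OR REPLACE INTO files (path, data) VALUES (?, ?)",
                            (os.path.relpath(full, src_dir), f.read()),
                        )
    finally:
        con.close()


def read_from_sqlite(db_path, rel_path):
    """Fetch one file's bytes by relative path, or None if it is not stored."""
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT data FROM files WHERE path = ?", (rel_path,)
        ).fetchone()
        return None if row is None else row[0]
    finally:
        con.close()
```

The zip route matches "an unzip is easy" for data that is rarely touched again; the database route fits Reuti's point when jobs look files up by some index or name, since each lookup becomes a keyed query rather than an open() on the shared filesystem.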
