On 06/13/2014 01:37 PM, Lux, Jim (337C) wrote:
> I've always advocated using the file system as a database: in the sense of
> "lots of little files, one for each data blob", where "data blob" is
> bigger than a few bytes, but perhaps in the hundreds/thousands of bytes or
> larger.
>
> 1) Rather than spend time implementing some sort of database, the file
> system is already there
> 2) The file system is likely optimized better for whatever platform it is
> running. It runs "closer to the metal", and hopefully is tightly
> integrated with things like caching and operating system tricks.
> 3) The file system is optimized for allocation and deallocation of space,
> so I don't have to write that, or hope that my "database engine of choice"
> does it right.
> 4) Backup and restore of parts of the data is straightforward without
> needing any special utilities (e.g. file timestamps give mod dates, etc.)
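[For readers who want to try the quoted file-per-blob scheme, here is a minimal sketch in Python. The function names (put_blob/get_blob) and the layout (one file per key under a root directory) are illustrative, not anything Jim specified; the write-to-temp-then-rename step is the usual trick for making each blob update atomic on a POSIX file system, so a reader never sees a half-written file.]

```python
import os
import tempfile

def put_blob(root, key, data):
    """Store one blob per file under root/key.

    Writes to a temporary file in the same directory, then renames it
    into place; on POSIX, rename within a filesystem is atomic, so
    concurrent readers see either the old blob or the new one, never a
    partial write.
    """
    os.makedirs(root, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=root)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, os.path.join(root, key))
    except BaseException:
        os.unlink(tmp)  # clean up the temp file if anything failed
        raise

def get_blob(root, key):
    """Read one blob back; the filesystem's own cache does the rest."""
    with open(os.path.join(root, key), "rb") as f:
        return f.read()
```

[Note this atomicity only holds on a single node; as discussed below, once multiple clients share the directory over NFS, client-side attribute caching can still hand a reader stale metadata.]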
This works well for local access, and is even OK over something like NFS for a single node's access, but for distributed access cache invalidation (especially for metadata) becomes a serious problem. I work with a lot of people who write a workflow for their desktop/laptop, then run it successfully with a single process on a cluster. When they try running it with hundreds of processes distributed across many nodes, they're confused when it all falls apart.

Backup/restore can be complicated too, depending on the storage technology. Many storage vendors assume that NDMP is the be-all and end-all of backup technology, and provide nothing but that. My view is that NDMP is a scam set up by storage vendors to get people to buy more storage, but that's a discussion for another thread...

Skylar

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
