Mark, OrangeFS is NOT optimized for small file I/O, thus we don't recommend untarring or compiling on this system.
We are working to make small file I/O better, but the system was originally designed for large I/O. To that end, we now have a readahead cache in 2.9.6, which helps with small reads, but we don't have a write cache. We are currently working on some kernel-level page caching mechanisms to help with small writes, but the concern is data consistency among compute nodes: the user must be responsible for ensuring that only one node, out of the many within one job, needs the most current data. That's a lot to ask of most users, so you see our dilemma.

In your scenario, are you using LMDB or BDB? We have found that LMDB gives much better metadata performance than BDB. We deliver LMDB with the tarball. To use it, add --with-db-backend=LMDB to the configure command. If you want to try the readahead cache, also add --enable-ra-cache to the configure line. (A sample configure invocation and a rough per-file latency estimate are sketched after the quoted message below.)

Hope this helps!
Becky Ligon

Mark: Below are comments from Walt Ligon, one of the original designers of the system:

I would say that OFS is primarily designed to support large parallel computations with large data I/O. It is not intended as a general-purpose file system. In particular, it is not particularly efficient for large numbers of very small I/O tasks, as typically found in an interactive workload such as tar, gcc, grep, etc. The problem is the latency of sending a request from a single client through the kernel to the servers and back; reconfiguring the servers, etc. really has no effect on latency.

That said, we have been working to improve small file I/O. In the version you have there is a "User Interface" that allows an application to bypass the kernel, which can lower the latency of accesses; this is potentially suitable for specific applications but less so for common utilities such as tar. In the current version we have added a configurable readahead cache to the client that can improve small read performance for larger files. We are currently working to take advantage of the Linux page cache to improve very small I/O.

On Thu, Oct 19, 2017 at 11:43 AM, Mark Guz <[email protected]> wrote:

> Hi,
>
> I'm running a PoC OrangeFS setup and have read through all of the
> previous posts on performance. We are running 2.9.6 on Red Hat 7.4. The
> clients are using the kernel interface.
>
> I'm currently running on one Power 750 server as the host (with 8 dual
> meta/data servers running). The clients are a mix of Intel and PPC64
> systems, all interconnected by InfiniBand DDR cards in Connected mode.
>
> The storage backend is a 4Gb FC-attached SSD chassis with 20 250 GB SSD
> cards (not regular drives); 8 are assigned to metadata and 8 are
> assigned to data.
>
> The network tests good: about 7 Gb/s with no retries or errors. We are
> using bmi_tcp.
>
> I can get great performance for large files, as expected, but when
> performing small file actions the performance is significantly poorer.
>
> For example, I can untar linux-4.13.3.tar.xz locally on the filesystem
> in 14 seconds, while on OrangeFS it takes 10 minutes.
>
> I can see the performance difference when playing with stripe sizes etc.
> when copying monolithic files, but there seems to be a wall that gets hit
> when there is a lot of metadata activity.
>
> I can see how the need for network back and forth could impact
> performance, but is it reasonable to see a 42x performance drop in such
> cases?
>
> Also, if I then try to compile the kernel on OrangeFS it takes well over
> 2 hours and most of the time is spent I/O waiting. Compiling locally
> takes about 20 minutes on the same server.
>
> I've tried running over multiple hosts, running some meta-only and some
> data-only servers, and I've tried running with just 1 meta and 1 data
> server. I've applied all the system-level optimizations I've found and
> just cannot reasonably speed up the untar operation.
>
> Maybe it's a client thing? There's not much I can see that's configurable
> about the client, though.
>
> It seems to me that I should be able to get better performance, even on
> small file operations, but I'm kinda stumped.
>
> Am I just chasing unicorns, or is it possible to get usable performance
> for this sort of file activity (untarring, compiling, etc.)?
>
> Config below:
>
> <Defaults>
>     UnexpectedRequests 256
>     EventLogging none
>     EnableTracing no
>     LogStamp datetime
>     BMIModules bmi_tcp
>     FlowModules flowproto_multiqueue
>     PerfUpdateInterval 10000000
>     ServerJobBMITimeoutSecs 30
>     ServerJobFlowTimeoutSecs 60
>     ClientJobBMITimeoutSecs 30
>     ClientJobFlowTimeoutSecs 600
>     ClientRetryLimit 5
>     ClientRetryDelayMilliSecs 100
>     PrecreateBatchSize 0,1024,1024,1024,32,1024,0
>     PrecreateLowThreshold 0,256,256,256,16,256,0
>     TroveMaxConcurrentIO 16
>     <Security>
>         TurnOffTimeouts yes
>     </Security>
> </Defaults>
>
> <Aliases>
>     Alias server01 tcp://server01:3334
>     Alias server02 tcp://server01:3335
>     Alias server03 tcp://server01:3336
>     Alias server04 tcp://server01:3337
>     Alias server05 tcp://server01:3338
>     Alias server06 tcp://server01:3339
>     Alias server07 tcp://server01:3340
>     Alias server08 tcp://server01:3341
> </Aliases>
>
> <ServerOptions>
>     Server server01
>     DataStorageSpace /usr/local/storage/server01/data
>     MetadataStorageSpace /usr/local/storage/server01/meta
>     LogFile /var/log/orangefs-server-server01.log
> </ServerOptions>
>
> <ServerOptions>
>     Server server02
>     DataStorageSpace /usr/local/storage/server02/data
>     MetadataStorageSpace /usr/local/storage/server02/meta
>     LogFile /var/log/orangefs-server-server02.log
> </ServerOptions>
>
> <ServerOptions>
>     Server server03
>     DataStorageSpace /usr/local/storage/server03/data
>     MetadataStorageSpace /usr/local/storage/server03/meta
>     LogFile /var/log/orangefs-server-server03.log
> </ServerOptions>
>
> <ServerOptions>
>     Server server04
>     DataStorageSpace /usr/local/storage/server04/data
>     MetadataStorageSpace /usr/local/storage/server04/meta
>     LogFile /var/log/orangefs-server-server04.log
> </ServerOptions>
>
> <ServerOptions>
>     Server server05
>     DataStorageSpace /usr/local/storage/server05/data
>     MetadataStorageSpace /usr/local/storage/server05/meta
>     LogFile /var/log/orangefs-server-server05.log
> </ServerOptions>
>
> <ServerOptions>
>     Server server06
>     DataStorageSpace /usr/local/storage/server06/data
>     MetadataStorageSpace /usr/local/storage/server06/meta
>     LogFile /var/log/orangefs-server-server06.log
> </ServerOptions>
>
> <ServerOptions>
>     Server server07
>     DataStorageSpace /usr/local/storage/server07/data
>     MetadataStorageSpace /usr/local/storage/server07/meta
>     LogFile /var/log/orangefs-server-server07.log
> </ServerOptions>
>
> <ServerOptions>
>     Server server08
>     DataStorageSpace /usr/local/storage/server08/data
>     MetadataStorageSpace /usr/local/storage/server08/meta
>     LogFile /var/log/orangefs-server-server08.log
> </ServerOptions>
>
> <Filesystem>
>     Name orangefs
>     ID 146181131
>     RootHandle 1048576
>     FileStuffing yes
>     FlowBufferSizeBytes 1048576
>     FlowBuffersPerFlow 8
>     DistrDirServersInitial 1
>     DistrDirServersMax 1
>     DistrDirSplitSize 100
>     TreeThreshold 16
>     <Distribution>
>         Name simple_stripe
>         Param strip_size
>         Value 1048576
>     </Distribution>
>     <MetaHandleRanges>
>         Range server01 3-576460752303423489
>         Range server02 576460752303423490-1152921504606846976
>         Range server03 1152921504606846977-1729382256910270463
>         Range server04 1729382256910270464-2305843009213693950
>         Range server05 2305843009213693951-2882303761517117437
>         Range server06 2882303761517117438-3458764513820540924
>         Range server07 3458764513820540925-4035225266123964411
>         Range server08 4035225266123964412-4611686018427387898
>     </MetaHandleRanges>
>     <DataHandleRanges>
>         Range server01 4611686018427387899-5188146770730811385
>         Range server02 5188146770730811386-5764607523034234872
>         Range server03 5764607523034234873-6341068275337658359
>         Range server04 6341068275337658360-6917529027641081846
>         Range server05 6917529027641081847-7493989779944505333
>         Range server06 7493989779944505334-8070450532247928820
>         Range server07 8070450532247928821-8646911284551352307
>         Range server08 8646911284551352308-9223372036854775794
>     </DataHandleRanges>
>     <StorageHints>
>         TroveSyncMeta yes
>         TroveSyncData no
>         TroveMethod alt-aio
>         #DirectIOThreadNum 120
>         #DirectIOOpsPerQueue 200
>         #DBCacheSizeBytes 17179869184
>         #AttrCacheSize 10037
>         #AttrCacheMaxNumElems 2048
>     </StorageHints>
> </Filesystem>
>
> _______________________________________________
> Pvfs2-users mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
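To put rough numbers on the 42x gap (assuming linux-4.13.3 unpacks to on the order of 60,000 files, which is an estimate, not a figure from this thread): 10 minutes works out to about 600 s / 60,000 ≈ 10 ms per file, versus roughly 14 s / 60,000 ≈ 0.2 ms per file for the local untar. Each small file needs several synchronous client-to-server round trips (create, write, set attributes, and so on), and with TroveSyncMeta set to yes in the config above each metadata update is also synced on the server, so a per-file cost on the order of 10 ms is roughly what the latency argument above predicts. A gap of this size on an untar or compile workload is expected behavior rather than a sign of misconfiguration.

For reference, here is a minimal sketch of a rebuild using the two configure options mentioned above. The source directory and --prefix are placeholders for your environment, not values from this thread:

    # Rebuild OrangeFS 2.9.6 with the LMDB metadata backend and the
    # readahead cache enabled; adjust paths to suit your installation.
    cd orangefs-2.9.6
    ./configure --prefix=/opt/orangefs \
        --with-db-backend=LMDB \
        --enable-ra-cache
    make
    make install

Note that changing the metadata backend changes the on-disk metadata format, so you would most likely need to recreate the server storage spaces after switching from BDB to LMDB.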
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
