Mark: Since you are a flash specialist, I have a question for you. We are running tests on a box that has six NVMe PCIe cards, each formatted with its own XFS filesystem. If we run a dd to each filesystem at the same time, the I/O performance is horrible. Is there a way to tune this environment, or is the PCIe bus simply unable to handle such a load?

Becky
OrangeFS Developer
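For concreteness, a minimal sketch of the concurrent-write test being described, assuming the six filesystems are mounted at /mnt/nvme0 through /mnt/nvme5 (hypothetical paths) and using direct I/O so the page cache does not mask device behavior:

    # Write 4 GiB to each NVMe-backed XFS filesystem at the same time.
    # oflag=direct bypasses the page cache so the devices themselves are measured.
    for i in 0 1 2 3 4 5; do
        dd if=/dev/zero of=/mnt/nvme$i/ddtest bs=1M count=4096 oflag=direct &
    done
    wait   # let all six writers finish before comparing the reported rates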
On Thu, Oct 19, 2017 at 4:55 PM, Mark Guz <[email protected]> wrote:

Hi Becky,

I hear what you're saying. I was just hoping that the slowdown would not be quite so dramatic.

Switching to LMDB does offer an improvement: the untar now takes only 7 minutes, which is a pretty good speedup, but still too slow to be practical for our uses.

Another user, Martin, also offered some suggestions in response to my original post, so I'm going to try those too.

Thanks!
Mark Guz

Senior IT Specialist
Flash Systems & Technology
Office 713-587-1048
Mobile 832-290-8161
[email protected]

From: Becky Ligon <[email protected]>
To: Mark Guz <[email protected]>
Cc: "[email protected]" <[email protected]>
Date: 10/19/2017 01:12 PM
Subject: Re: [Pvfs2-users] Questions about performance
Sent by: [email protected]

Mark,

OrangeFS is NOT optimized for small-file I/O, so we don't recommend untarring or compiling on this system.

We are working to make small-file I/O better, but the system was originally designed for large I/O. To that end, we now have a readahead cache in 2.9.6, which helps with small reads, but we don't have a write cache. We are currently working on some kernel-level page-caching mechanisms to help with small writes, but the concern is data consistency among compute nodes: the user must be responsible for ensuring that only one node, out of the many within one job, needs the most current data. That's a lot to ask of most users, so you see our dilemma.

In your scenario, are you using LMDB or BDB? We have found that LMDB gives much better metadata performance than BDB. We deliver LMDB with the tarball. To use it, add --with-db-backend=LMDB to the configure command.

If you want to try the readahead cache, also add --enable-ra-cache to the configure line.

Hope this helps!

Becky Ligon

Mark:

Below are comments from Walt Ligon, one of the original designers of the system:

I would say that OFS is primarily designed to support large parallel computations with large data I/O. It is not intended as a general-purpose file system. In particular, it is not particularly efficient for the large numbers of very small I/O tasks typically found in an interactive workload such as tar, gcc, grep, etc. The problem is the latency of sending a request from a single client through the kernel to the servers and back. Reconfiguring the servers, etc. really has no effect on latency.

That said, we have been working to improve small-file I/O. The version you have includes a "User Interface" that allows an application to bypass the kernel, which can lower access latency; this is potentially suitable for specific applications but less so for common utilities such as tar. In the current version we have added a configurable readahead cache to the client that can improve small-read performance for larger files. We are currently working to take advantage of the Linux page cache to improve very small I/O.
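For reference, a minimal sketch of a build using both options mentioned above, assuming a standard configure/make build of the 2.9.6 tarball (the install prefix is hypothetical):

    # Build with the bundled LMDB metadata backend and the readahead cache.
    ./configure --prefix=/opt/orangefs --with-db-backend=LMDB --enable-ra-cache
    make
    make install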
On Thu, Oct 19, 2017 at 11:43 AM, Mark Guz <[email protected]> wrote:

Hi,

I'm running a proof-of-concept OrangeFS setup and have read through all of the previous posts on performance. We are running 2.9.6 on Red Hat 7.4. The clients are using the kernel interface.

I'm currently running on one Power 750 server as the host (with 8 dual meta/data servers running). The clients are a mix of Intel and PPC64 systems, all interconnected by InfiniBand DDR cards in Connected mode.

The storage backend is a 4Gb FC-attached SSD chassis with 20 x 250 GB SSD cards (not regular drives); 8 are assigned to metadata and 8 to data.

The network tests well: roughly 7 Gb/s with no retries or errors. We are using bmi_tcp.

I can get great performance for large files, as expected, but performance on small-file operations is significantly poorer.

For example, I can untar linux-4.13.3.tar.xz locally on the filesystem in 14 seconds, while on OrangeFS it takes 10 minutes.

I can see the performance difference when playing with stripe sizes etc. when copying monolithic files, but there seems to be a wall that gets hit when there is a lot of metadata activity.

I can see how the network back-and-forth could impact performance, but is it reasonable to see a 42x drop in such cases?

Also, if I try to compile the kernel on OrangeFS, it takes well over 2 hours, with most of the time spent waiting on I/O. Compiling locally takes about 20 minutes on the same server.

I've tried running over multiple hosts, running some metadata-only and some data-only servers, and running with just one metadata and one data server. I've applied all the system-level optimizations I've found and just cannot reasonably speed up the untar operation.

Maybe it's a client thing? There's not much I can see that's configurable about the client, though.

It seems to me that I should be able to get better performance, even on small-file operations, but I'm kind of stumped.

Am I just chasing unicorns, or is it possible to get usable performance for this sort of file activity (untarring, compiling, etc.)?
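For reference, the timings above can be reproduced with something like the following, assuming the tarball is present in both locations and that /local/scratch and /mnt/orangefs (hypothetical paths) are the local filesystem and the OrangeFS kernel-interface mount:

    # Same untar on local disk and on the OrangeFS mount.
    cd /local/scratch && time tar xf linux-4.13.3.tar.xz   # ~14 seconds locally
    cd /mnt/orangefs && time tar xf linux-4.13.3.tar.xz    # ~10 minutes on OrangeFS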
Config below:

<Defaults>
    UnexpectedRequests 256
    EventLogging none
    EnableTracing no
    LogStamp datetime
    BMIModules bmi_tcp
    FlowModules flowproto_multiqueue
    PerfUpdateInterval 10000000
    ServerJobBMITimeoutSecs 30
    ServerJobFlowTimeoutSecs 60
    ClientJobBMITimeoutSecs 30
    ClientJobFlowTimeoutSecs 600
    ClientRetryLimit 5
    ClientRetryDelayMilliSecs 100
    PrecreateBatchSize 0,1024,1024,1024,32,1024,0
    PrecreateLowThreshold 0,256,256,256,16,256,0
    TroveMaxConcurrentIO 16
    <Security>
        TurnOffTimeouts yes
    </Security>
</Defaults>

<Aliases>
    Alias server01 tcp://server01:3334
    Alias server02 tcp://server01:3335
    Alias server03 tcp://server01:3336
    Alias server04 tcp://server01:3337
    Alias server05 tcp://server01:3338
    Alias server06 tcp://server01:3339
    Alias server07 tcp://server01:3340
    Alias server08 tcp://server01:3341
</Aliases>

<ServerOptions>
    Server server01
    DataStorageSpace /usr/local/storage/server01/data
    MetadataStorageSpace /usr/local/storage/server01/meta
    LogFile /var/log/orangefs-server-server01.log
</ServerOptions>

<ServerOptions>
    Server server02
    DataStorageSpace /usr/local/storage/server02/data
    MetadataStorageSpace /usr/local/storage/server02/meta
    LogFile /var/log/orangefs-server-server02.log
</ServerOptions>

<ServerOptions>
    Server server03
    DataStorageSpace /usr/local/storage/server03/data
    MetadataStorageSpace /usr/local/storage/server03/meta
    LogFile /var/log/orangefs-server-server03.log
</ServerOptions>

<ServerOptions>
    Server server04
    DataStorageSpace /usr/local/storage/server04/data
    MetadataStorageSpace /usr/local/storage/server04/meta
    LogFile /var/log/orangefs-server-server04.log
</ServerOptions>

<ServerOptions>
    Server server05
    DataStorageSpace /usr/local/storage/server05/data
    MetadataStorageSpace /usr/local/storage/server05/meta
    LogFile /var/log/orangefs-server-server05.log
</ServerOptions>

<ServerOptions>
    Server server06
    DataStorageSpace /usr/local/storage/server06/data
    MetadataStorageSpace /usr/local/storage/server06/meta
    LogFile /var/log/orangefs-server-server06.log
</ServerOptions>

<ServerOptions>
    Server server07
    DataStorageSpace /usr/local/storage/server07/data
    MetadataStorageSpace /usr/local/storage/server07/meta
    LogFile /var/log/orangefs-server-server07.log
</ServerOptions>

<ServerOptions>
    Server server08
    DataStorageSpace /usr/local/storage/server08/data
    MetadataStorageSpace /usr/local/storage/server08/meta
    LogFile /var/log/orangefs-server-server08.log
</ServerOptions>

<Filesystem>
    Name orangefs
    ID 146181131
    RootHandle 1048576
    FileStuffing yes
    FlowBufferSizeBytes 1048576
    FlowBuffersPerFlow 8
    DistrDirServersInitial 1
    DistrDirServersMax 1
    DistrDirSplitSize 100
    TreeThreshold 16
    <Distribution>
        Name simple_stripe
        Param strip_size
        Value 1048576
    </Distribution>
    <MetaHandleRanges>
        Range server01 3-576460752303423489
        Range server02 576460752303423490-1152921504606846976
        Range server03 1152921504606846977-1729382256910270463
        Range server04 1729382256910270464-2305843009213693950
        Range server05 2305843009213693951-2882303761517117437
        Range server06 2882303761517117438-3458764513820540924
        Range server07 3458764513820540925-4035225266123964411
        Range server08 4035225266123964412-4611686018427387898
    </MetaHandleRanges>
    <DataHandleRanges>
        Range server01 4611686018427387899-5188146770730811385
        Range server02 5188146770730811386-5764607523034234872
        Range server03 5764607523034234873-6341068275337658359
        Range server04 6341068275337658360-6917529027641081846
        Range server05 6917529027641081847-7493989779944505333
        Range server06 7493989779944505334-8070450532247928820
        Range server07 8070450532247928821-8646911284551352307
        Range server08 8646911284551352308-9223372036854775794
    </DataHandleRanges>
    <StorageHints>
        TroveSyncMeta yes
        TroveSyncData no
        TroveMethod alt-aio
        #DirectIOThreadNum 120
        #DirectIOOpsPerQueue 200
        #DBCacheSizeBytes 17179869184
        #AttrCacheSize 10037
        #AttrCacheMaxNumElems 2048
    </StorageHints>
</Filesystem>
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
