>The plan is to load the new cache from the old GPFS then dump once the cache is full.
>We've already increased afmNumFlushThreads from 4 to 8 and seen only marginal improvements; we could attempt to increase this further.

AFM has replication performance issues with small files on high-latency networks. There is a plan to fix these issues.

>I'm also wondering if it's worth increasing the refresh intervals to speed up reads of already cached files. At this stage we want to fill the cache and don't care about write-back until we switch the target to the
>new NFS/GPFS from our old GPFS storage to a new box back at our off-site location (otherwise known as the office).

Increasing the refresh intervals will improve application performance at the cache site. It is better to set large refresh intervals if the cache is the only writer.

~Venkat (vpuvv...@in.ibm.com)

From: Peter Childs <p.chi...@qmul.ac.uk>
To: "gpfsug-discuss@spectrumscale.org" <gpfsug-discuss@spectrumscale.org>
Date: 01/04/2018 04:47 PM
Subject: Re: [gpfsug-discuss] Use AFM for migration of many small files
Sent by: gpfsug-discuss-boun...@spectrumscale.org

We are doing something very similar using 4.2.3-4 and GPFS 4.2.1-1 on the NFS backend. Did you have any success?

The plan is to load the new cache from the old GPFS then dump once the cache is full.

We've already increased afmNumFlushThreads from 4 to 8 and seen only marginal improvements; we could attempt to increase this further.

I'm also wondering if it's worth increasing the refresh intervals to speed up reads of already cached files. At this stage we want to fill the cache and don't care about write-back until we switch the target to the new NFS/GPFS from our old GPFS storage to a new box back at our off-site location (otherwise known as the office).

[root@afmgateway1 scratch]# mmlsfileset home home -L --afm
Filesets in file system 'home':

Attributes for fileset home:
=============================
Status                                  Linked
Path                                    /data2/home
Id                                      42
Root inode                              1343225859
Parent Id                               0
Created                                 Wed Jan 3 12:32:33 2018
Comment
Inode space                             41
Maximum number of inodes                100000000
Allocated inodes                        15468544
Permission change flag                  chmodAndSetacl
afm-associated                          Yes
Target                                  nfs://afm1/gpfs/data1/afm/home
Mode                                    single-writer
File Lookup Refresh Interval            30 (default)
File Open Refresh Interval              30 (default)
Dir Lookup Refresh Interval             60 (default)
Dir Open Refresh Interval               60 (default)
Async Delay                             15 (default)
Last pSnapId                            0
Display Home Snapshots                  no
Number of Gateway Flush Threads         8
Prefetch Threshold                      0 (default)
Eviction Enabled                        no

Thanks in advance.

Peter Childs

On Tue, 2017-09-05 at 19:57 +0530, Venkateswara R Puvvada wrote:

Which version of Spectrum Scale? What is the fileset mode?

>We use AFM prefetch to migrate data between two clusters (using NFS). This works fine with large files, say 1+GB. But we have millions of smaller files, about 1MB each. Here
>I see just ~150MB/s – compare this to the 1000+MB/s we get for larger files.

How was the performance measured? If parallel IO is enabled, AFM uses multiple gateway nodes to prefetch large files (if the file size is more than 1GB). The performance difference between small and large files here is huge (1000MB/s vs. 150MB/s), and generally that is not the case. How many files were present in the list file for prefetch? Could you also share a full internaldump from the gateway node?

>I assume that we would need more parallelism, does prefetch pull just one file at a time? So each file needs some or many metadata operations plus a single or just a few
>read and writes. Doing this sequentially adds up all the latencies of NFS+GPFS. This is my explanation.
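(For reference, the list-file prefetch and the internaldump asked about above typically look something like the following. This is only a sketch: the 'home' device and fileset names and the afmgateway1 node come from the listing earlier in the thread, and the paths are made-up examples.)

    # Queue a prefetch of the files named in a list file (one path per line)
    mmafmctl home prefetch -j home --list-file /tmp/smallfiles.list

    # Collect an internal dump on the gateway node for analysis
    mmfsadm dump all > /tmp/internaldump.afmgateway1.out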
With larger files GPFS prefetch on home will help. AFM prefetches the files on multiple threads. The default number of flush threads for prefetch is 36 (afmNumFlushThreads (default 4) + afmNumIOFlushThreads (default 32)).

>Please can anybody comment: Is this right, does AFM prefetch handle one file at a time in a sequential manner? And is there any way to change this behavior? Or am I wrong and
>I need to look elsewhere to get better performance for prefetch of many smaller files?

See above, AFM reads files on multiple threads in parallel. Try increasing afmNumFlushThreads on the fileset and verify whether it improves the performance.

~Venkat (vpuvv...@in.ibm.com)

From: "Billich Heinrich Rainer (PSI)" <heiner.bill...@psi.ch>
To: "gpfsug-discuss@spectrumscale.org" <gpfsug-discuss@spectrumscale.org>
Date: 09/04/2017 10:18 PM
Subject: [gpfsug-discuss] Use AFM for migration of many small files
Sent by: gpfsug-discuss-boun...@spectrumscale.org

Hello,

We use AFM prefetch to migrate data between two clusters (using NFS). This works fine with large files, say 1+GB. But we have millions of smaller files, about 1MB each. Here I see just ~150MB/s – compare this to the 1000+MB/s we get for larger files.

I assume that we would need more parallelism, does prefetch pull just one file at a time? So each file needs some or many metadata operations plus a single or just a few reads and writes. Doing this sequentially adds up all the latencies of NFS+GPFS. This is my explanation. With larger files GPFS prefetch on home will help.

Please can anybody comment: Is this right, does AFM prefetch handle one file at a time in a sequential manner? And is there any way to change this behavior? Or am I wrong and I need to look elsewhere to get better performance for prefetch of many smaller files?

We will migrate several filesets in parallel, but still, with individual filesets up to 350TB in size, 150MB/s isn't fun. Also just about 150 files/s looks poor. The setup is quite new, hence there may be other places to look at. It's all RHEL7 and Spectrum Scale 4.2.2-3 on the AFM cache.
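(The tuning suggested above, i.e. more fileset flush threads and longer refresh intervals, is normally applied per fileset with mmchfileset. A rough sketch, reusing the 'home' device and fileset names from the listing above with made-up values; some AFM attributes can only be changed while the fileset is unlinked or inactive, so check the documentation for your release first.)

    # Raise the number of gateway flush threads used by the AFM fileset
    mmchfileset home home -p afmNumFlushThreads=16

    # Lengthen the refresh intervals (seconds) while the cache is the only writer
    mmchfileset home home -p afmFileLookupRefreshInterval=600
    mmchfileset home home -p afmFileOpenRefreshInterval=600
    mmchfileset home home -p afmDirLookupRefreshInterval=600
    mmchfileset home home -p afmDirOpenRefreshInterval=600

    # Verify the new settings
    mmlsfileset home home -L --afm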
Thank you,
Heiner

--
Paul Scherrer Institut
Science IT
Heiner Billich
WHGA 106
CH 5232 Villigen PSI
056 310 36 02
https://www.psi.ch

--
Peter Childs
ITS Research Storage
Queen Mary, University of London
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss