Hi Stephane & Aurélien

Here are the stats that I see in my logs:

Below are the best and worst avg. speeds I noted in the log, with
nb_threads_scan=2:
2020/11/03 16:51:04 [4850/3] STATS |      avg. speed  (effective):    618.32 entries/sec (3.23 ms/entry/thread)
2020/11/25 18:06:10 [4850/3] STATS |      avg. speed  (effective):    187.93 entries/sec (10.62 ms/entry/thread)
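
As a sanity check on how the two figures in each STATS line relate, the per-thread latency can be approximated as nb_threads_scan * 1000 / (entries/sec). This is only a sketch: the formula is an assumption inferred from the numbers quoted here, not taken from the Robinhood source, and the function name is hypothetical. It reproduces the 3.23 ms figure but gives about 10.64 ms rather than the logged 10.62 ms, so the log presumably derives the value from slightly different counters.

```python
def ms_per_entry_per_thread(entries_per_sec: float, nb_threads: int) -> float:
    """Approximate wall-clock milliseconds each thread spends per entry.

    Assumption: all scan threads are equally busy, so the aggregate rate
    is simply split evenly across them.
    """
    return nb_threads * 1000.0 / entries_per_sec


# Values taken from the two STATS lines above (nb_threads_scan=2).
best = ms_per_entry_per_thread(618.32, 2)
worst = ms_per_entry_per_thread(187.93, 2)

print(f"best:  {best:.2f} ms/entry/thread")
print(f"worst: {worst:.2f} ms/entry/thread")
```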

Finally, the full scan results are below:
2020/11/25 17:13:41 [4850/4] FS_Scan | Full scan of /scratch completed, 369729104 entries found (123 errors). Duration = 1964257.21s
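
For reference, the summary line can be turned into days and an average rate with a few lines of Python. This is a minimal sketch: the regular expression is an assumption modeled on the single FS_Scan line quoted above, not on a documented Robinhood log format, and summarize_scan is a hypothetical helper name.

```python
import re

# Assumed pattern, derived only from the FS_Scan line quoted in this message.
SCAN_RE = re.compile(
    r"Full scan of (?P<path>\S+) completed,\s+(?P<entries>\d+) entries found "
    r"\((?P<errors>\d+) errors\)\. Duration = (?P<seconds>[\d.]+)s"
)


def summarize_scan(line: str) -> dict:
    """Extract entry count and duration, then derive days and average rates."""
    m = SCAN_RE.search(line)
    if m is None:
        raise ValueError("not a full-scan summary line")
    entries = int(m["entries"])
    seconds = float(m["seconds"])
    per_sec = entries / seconds
    return {
        "path": m["path"],
        "entries": entries,
        "errors": int(m["errors"]),
        "days": seconds / 86400,
        "entries_per_sec": per_sec,
        "entries_per_day": per_sec * 86400,
    }


line = (
    "2020/11/25 17:13:41 [4850/4] FS_Scan | Full scan of /scratch completed, "
    "369729104 entries found (123 errors). Duration = 1964257.21s"
)
s = summarize_scan(line)
print(f"{s['path']}: {s['days']:.1f} days, "
      f"{s['entries_per_sec']:.1f} entries/sec, "
      f"{s['entries_per_day'] / 1e6:.1f}M entries/day")
```

For the /scratch line this works out to roughly 22.7 days at about 188 entries/sec, or around 16 million entries per day.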

Stephane, now I wonder what could have caused the poor scanning performance.
When I kicked off my initial scan during LAD with the same number of threads
(2), my scan, along with some user jobs in the following days, generated
150-200 million file open and close operations and, as a result, filled up my
changelog sooner than I expected. I had to cancel that first scan to bring the
situation under control. After I cleared the changelog, I asked Robinhood to
perform a new full scan. I am not sure whether this cancel and restart could
have caused delays through additional database lookups for the 200 million
entries that had already been scanned by then. The other thing you point out is
that you have RAID-10 SSDs; on our end I have 3.6TB of SSDs in RAID-5. Could
this explain the slowness?

I wasn't sure of the impact of the scan, hence I chose only 2 threads. I am
guessing I could bump that up to 4 next time to see if that benefits my scan
times.

Thank you,
Amit

-----Original Message-----
From: Stephane Thiell <sthi...@stanford.edu> 
Sent: Monday, December 7, 2020 11:43 AM
To: Degremont, Aurelien <degre...@amazon.com>
Cc: Kumar, Amit <ahku...@mail.smu.edu>; Russell Dekema <deke...@umich.edu>; 
lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Robinhood scan time

Hi Amit,

Your number is very low indeed.

At our site, we're seeing ~100 million files/day during a Robinhood scan with
nb_threads_scan = 4 on hardware using Intel-based CPUs:

2020/11/16 07:29:46 [126653/2] STATS |      avg. speed  (effective):   1207.06 entries/sec (3.31 ms/entry/thread)

2020/11/16 07:31:44 [126653/29] FS_Scan | Full scan of /oak completed, 1508197871 entries found (65 errors). Duration = 1249490.23s

In that case, our Lustre MDS and Robinhood server are both running on 2 x CPU
E5-2643 v3 @ 3.40GHz.
The Robinhood server has 768GB of RAM and 7TB of SSDs in RAID-10 for the DB.

On another filesystem, using AMD Naples-based CPUs and a dedicated Robinhood
DB hosted on a different server with AMD Rome CPUs, we’re seeing a rate of
266M/day during a Robinhood scan with nb_threads_scan = 8:

2020/09/20 21:43:46 [25731/4] FS_Scan | Full scan of /fir completed, 877905438 entries found (744 errors). Duration = 284564.88s
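
To put the three scans in this thread side by side, the entry counts and durations can be normalized to entries per day and to entries per second per scan thread. This is a rough sketch using only the figures quoted in the logs above; dividing the aggregate rate evenly by nb_threads_scan is a simplifying assumption, and the rates helper is a hypothetical name.

```python
SECONDS_PER_DAY = 86400

# (filesystem, entries, duration in seconds, nb_threads_scan),
# all taken from the FS_Scan lines and thread counts quoted in this thread.
scans = [
    ("/scratch", 369729104, 1964257.21, 2),
    ("/oak", 1508197871, 1249490.23, 4),
    ("/fir", 877905438, 284564.88, 8),
]


def rates(entries: int, seconds: float, threads: int) -> tuple[float, float]:
    """Return (millions of entries per day, entries per second per thread)."""
    per_sec = entries / seconds
    return per_sec * SECONDS_PER_DAY / 1e6, per_sec / threads


results = {}
for fs, entries, seconds, threads in scans:
    m_per_day, per_thread = rates(entries, seconds, threads)
    results[fs] = (m_per_day, per_thread)
    print(f"{fs:9s} {threads} threads: {m_per_day:7.2f}M entries/day, "
          f"{per_thread:6.1f} entries/sec/thread")
```

With these inputs the per-thread rate on /scratch comes out at roughly a third of the /oak figure, which suggests the thread count alone does not account for the gap.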


Best,

Stephane

> On Dec 7, 2020, at 4:49 AM, Degremont, Aurelien <degre...@amazon.com> wrote:
> 
> Hi Amit,
> 
> Thanks for this data point, that's interesting.
> Robinhood prints a scan summary in its logfile at the end of a scan. It would
> be nice if you could copy/paste it, for further reference.
> 
> Aurélien
> 
> On 04/12/2020 23:39, "lustre-discuss on behalf of Kumar, Amit"
> <lustre-discuss-boun...@lists.lustre.org on behalf of ahku...@mail.smu.edu>
> wrote:
> 
>    Dual Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz;
>    256GB RAM
>    System x3650 M5
>    Storage for MDT is from NetApp EF560.
> 
>    Best regards,
>    Amit
> 
>    -----Original Message-----
>    From: Russell Dekema <deke...@umich.edu>
>    Sent: Friday, December 4, 2020 4:27 PM
>    To: Kumar, Amit <ahku...@mail.smu.edu>
>    Cc: lustre-discuss@lists.lustre.org
>    Subject: Re: [lustre-discuss] Robinhood scan time
> 
>    Greetings,
> 
>    What kind of hardware are you running on your metadata array?
> 
>    Cheers,
>    Rusty Dekema
> 
>    On Fri, Dec 4, 2020 at 5:12 PM Kumar, Amit <ahku...@mail.smu.edu> wrote:
>> 
>> Hi All,
>> 
>> 
>> 
>> During LAD’20 Andreas asked if I could share the Robinhood scan time for
>> the 369 million files we have. So here it is. It took ~23 days for me to
>> complete the initial scan of all 369 million files, on a dedicated Robinhood
>> server that has 384GB RAM. I had it set up with all the database and client
>> tweaks mentioned in the Robinhood documentation. I only used 2 threads for
>> this scan. Hope this reference helps.
>> 
>> 
>> 
>> Thank you,
>> 
>> Amit
>> 
>> 
>> 
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
