Re: [Beowulf] big read triggers migration and slow memory IO?

mathog Thu, 09 Jul 2015 09:41:44 -0700

On 09-Jul-2015 06:48, Stuart Barkley wrote:

> Even though I doubt it is your problem, this smells similar to the
> zone_reclaim_mode issues we saw last year.
>
> You might check 'sar -B' output.  Specifically the 'pgscand/s' column.

Stays at 0, but see caveat below.

> Check the setting of /proc/sys/vm/zone_reclaim_mode (it should be 0).

It is.
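
For anyone who wants to repeat these two checks without sar, here is a minimal Python sketch. It rests on assumptions not stated in this thread: that the direct-reclaim scan counters in /proc/vmstat are named pgscan_direct or pgscan_direct_<zone> (this varies by kernel version), and that their per-second change roughly corresponds to sar's 'pgscand/s' column. The function names are hypothetical, and /proc/sys/vm/zone_reclaim_mode may be absent on kernels built without NUMA support.

```python
from pathlib import Path
import time


def parse_vmstat(text):
    """Parse 'name value' lines (the /proc/vmstat layout) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def direct_scan_total(counters):
    """Sum the direct-reclaim page scan counters.

    Assumption: they are named pgscan_direct or pgscan_direct_<zone>.
    pgscan_direct_throttle counts a different event, so it is skipped.
    """
    return sum(
        value
        for name, value in counters.items()
        if name.startswith("pgscan_direct") and name != "pgscan_direct_throttle"
    )


def read_zone_reclaim_mode(path="/proc/sys/vm/zone_reclaim_mode"):
    """Return the current zone_reclaim_mode setting as an int."""
    return int(Path(path).read_text().strip())


def direct_scan_rate(interval=1.0, path="/proc/vmstat"):
    """Pages scanned by direct reclaim per second over one sampling interval."""
    before = direct_scan_total(parse_vmstat(Path(path).read_text()))
    time.sleep(interval)
    after = direct_scan_total(parse_vmstat(Path(path).read_text()))
    return (after - before) / interval
```

On a Linux node, calling read_zone_reclaim_mode() and direct_scan_rate() while the big read is running would show whether either value departs from 0.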

The caveat - this morning I cannot make the tests go slow! Same account, same command, same input file. Apparently the issue depends on how the system was used previously and it sorts itself out, eventually, on an idle system. Before this problem was noticed, 40 of the 48 nodes had each been used to generate and write one of these huge files (17.45GB). My testing of the read speed went on for about four hours after that, and it was uniformly slow for test files over the "just below 2^34 byte" limit for my account. The system then sat idle for about 15 hours, and now the performance issue isn't happening, not even on a test file twice the size of the largest attempted yesterday.

Interestingly, the "taskset" isn't needed now either. When the test program is run without it, it runs nicely and no "migration/#" process ever pops up.
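
For reference, the kind of pinning that taskset provides can also be done from inside a test program. The sketch below is an illustration only: the exact taskset invocation used in the earlier tests is not shown in this message, the helper names are hypothetical, and os.sched_setaffinity is available only on Linux.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU.

    Returns the previous set of allowed CPUs so it can be restored.
    """
    previous = os.sched_getaffinity(0)  # pid 0 means the calling process
    os.sched_setaffinity(0, {cpu})
    return previous


def restore_affinity(previous):
    """Put back the CPU set returned by pin_to_cpu()."""
    os.sched_setaffinity(0, previous)
```

Pinning before the large read and restoring afterwards would let the pinned and unpinned cases be compared within one run of the test.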

Seems like there is some sort of state that the earlier processing imposed on the system which caused the OS to be short of who knows what, triggering all of these issues when a lot of memory was needed on one CPU (or in one process).


I will re-abuse the system and see if that reintroduces the problem.

Thanks,

David Mathog
[email protected]
Manager, Sequence Analysis Facility, Biology Division, Caltech