Every dog has its day! ;)

Prentice

On 07/09/2015 05:59 PM, James Cuff wrote:
Awesome!!!

With my job title, most folks think I'm essentially technically neutered these days.

Good to see there is still some life in this old dog :-)

Best,

J.

On Thursday, July 9, 2015, mathog <[email protected]> wrote:

    On 09-Jul-2015 11:54, James Cuff wrote:

        
    http://blog.jcuff.net/2015/04/of-huge-pages-and-huge-performance-hits.html


    Well, that seems to be it, but not quite with the same symptoms
    you observed.  khugepaged never showed up, and "perf top" never
    revealed _spin_lock_irqsave.  Instead this is what "perf top"
    shows in my tests:

    (hugepage=always, when migration/# process observed)
     89.97%  [kernel]       [k] compaction_alloc
      1.21%  [kernel]       [k] compact_zone
      1.18%  [kernel]       [k] get_pageblock_flags_group
      0.75%  [kernel]       [k] __reset_isolation_suitable
      0.57%  [kernel]       [k] clear_page_c_e

    (hugepage=always, when events/# process observed)
     85.97%  [kernel]       [k] compaction_alloc
      0.84%  [kernel]       [k] compact_zone
      0.65%  [kernel]       [k] get_pageblock_flags_group
      0.64%  perf           [.] 0x000000000005cff7

    (hugepage=never)
     29.86%  [kernel]       [k] clear_page_c_e
     21.88%  [kernel]       [k] copy_user_generic_string
     12.46%  [kernel]       [k] __alloc_pages_nodemask
      5.70%  [kernel]       [k] page_fault

    This is good, because "perf top" shows that the underlying issue
    is compaction_alloc and compact_zone, even though plain top shows
    migration/# in one case and, when locked to a cpu, events/#.
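
    For anyone who wants to reproduce the measurement: those tables
    are just what "perf top" prints when run as root.  Restricting it
    to one CPU is handy when chasing a per-cpu kernel thread (-C is a
    standard perf flag; the CPU number is whatever top implicates):

        perf top           # live per-symbol profile, kernel + user
        perf top -C 3      # sample only CPU 3, e.g. where a
                           # migration/3 or events/3 thread is pinned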

    Switching hugepage always->never seems to make things work right
    away.  Switching hugepage never->always seems to take a while to
    break: to get it to start failing, many of the big files involved
    must be copied to /dev/null again, even though they were
    presumably already in file cache.
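
    For anyone following along, the toggle on this box is just (the
    redhat_ prefix is RHEL/CentOS 6 naming; vanilla kernels use
    /sys/kernel/mm/transparent_hugepage instead):

        echo never  > /sys/kernel/mm/redhat_transparent_hugepage/enabled
        echo always > /sys/kernel/mm/redhat_transparent_hugepage/enabled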

    Searched for "compaction_alloc" and "compact_zone" and found a
    suggestion here

    
    https://structureddata.github.io/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/

    to do:

    echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

    (transparent_hugepage is a link to redhat_transparent_hugepage).
    Re-enabled hugepage and reproduced the painfully slow IO, then set
    defrag to "never", and the IO was fast again, even though hugepage
    was still enabled.
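
    To see what is currently in effect (the active value is the one
    shown in brackets):

        cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
        cat /sys/kernel/mm/redhat_transparent_hugepage/defrag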

    So on my machine the problem seems to be with hugepage defrag
    specifically.  Disabling just that is sufficient to resolve the
    issue; it isn't necessary to take out all of hugepage.  Will let
    it run that way for a while and see if anything else shows up.
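
    One caveat: the sysfs setting does not survive a reboot.  Have
    not tested this yet, but the usual fix on CentOS 6 is a line in
    /etc/rc.local:

        # reapply at boot; untested here, path is RHEL/CentOS 6 naming
        echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag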

    For future reference:

    CentOS release 6.6 (Final)
    kernel 2.6.32-504.23.4.el6.x86_64
    Dell Inc. PowerEdge T620/03GCPM, BIOS 2.2.2 01/16/2014
    48 Intel Xeon CPU E5-2695 v2 @ 2.40GHz  (in /proc/cpuinfo)
    RAM 529231456 kB (in /proc/meminfo)

    Thanks all!

    David Mathog
    [email protected]
    Manager, Sequence Analysis Facility, Biology Division, Caltech



--
(Via iPhone)


_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf

