John,

Many thanks for your reply!

I've added the line to /etc/system but I haven't rebooted yet. According to 
http://wesunsolve.net/bugid/id/6958068 another workaround is to add

cpu_deep_idle           disable

to /etc/power.conf and then run pmconfig. The workaround doesn't mention 
having to reboot after running pmconfig, and the pmconfig man page indicates 
that changes take effect immediately. Is this sufficient to change the 
cpu_deep_idle setting, or do I actually need to reboot? (This system is 
sensitive to downtime, so I'd like to avoid a reboot if it isn't necessary.)

Also, before I made the above change, I created a new dtrace script that traces 
the processing of the open(/dev/arp) system call in the arp process, and had 
cron run it every 5 minutes. The issue recurred after I set this up (but 
before I changed the power setting above), and the network was offline for a 
couple of minutes. I've attached the resulting dtrace output, as well as the 
dtrace scripts. The most relevant piece of the output is:

  0          -> fop_open                      10723     0 ms    1 us    19014758217431532
  0            -> spec_open                   10723     0 ms    1 us    19014758217432883
  0              -> secpolicy_spec_open       10723     0 ms    1 us    19014758217434464
  0                ...
  0              <- secpolicy_spec_open       10723     0 ms    1 us    19014758217461622
  0              -> spec_lockcsp              10723     0 ms    1 us    19014758217463617
  0              <- spec_lockcsp              10723     0 ms    1 us    19014758217464847
  0              -> stropen                   10723     0 ms    1 us    19014758217466486
  0                -> allocq                  10723     0 ms    1 us    19014758217468424
  0                  -> kmem_cache_alloc      10723     0 ms    0 us    19014758217469378
  0                  <- kmem_cache_alloc      10723     0 ms    1 us    19014758217470382
  0                <- allocq                  10723     0 ms    1 us    19014758217471735
  0                -> shalloc                 10723     0 ms    1 us    19014758217473117
  0                  -> kmem_cache_alloc      10723     0 ms    1 us    19014758217474227
  0                  <- kmem_cache_alloc      10723     0 ms    1 us    19014758217475314
  0                <- shalloc                 10723     0 ms    1 us    19014758217477226
  0                -> setq                    10723     0 ms    1 us    19014758217478676
  0                <- setq                    10723     0 ms    1 us    19014758217480417
  0                -> set_qend                10723     0 ms    1 us    19014758217482116
  0                <- set_qend                10723     0 ms    1 us    19014758217483702
  0                -> qattach                 10723     0 ms    1 us    19014758217485231
  0                  -> allocq                10723     0 ms    0 us    19014758217486207
  0                    -> kmem_cache_alloc    10723     0 ms    0 us    19014758217486886
  0                    <- kmem_cache_alloc    10723     0 ms    0 us    19014758217487662
  0                  <- allocq                10723     0 ms    0 us    19014758217488579
  0                  -> setq                  10723     0 ms    1 us    19014758217489594
  0                  <- setq                  10723     0 ms    0 us    19014758217490432
  0                  -> entersq               10723     0 ms    1 us    19014758217492028
  0                  <- entersq               10723     0 ms    1 us    19014758217493900
  0                  -> ip_open               10723     0 ms    1 us    19014758217495654
  0                    ...
  0                  <- ip_open               10723     0 ms    1 us    19014758217563000
  0                  -> leavesq               10723     0 ms    1 us    19014758217564404
  0                  <- leavesq               10723     0 ms    1 us    19014758217565782
  0                <- qattach                 10723     0 ms    1 us    19014758217567311
  0                -> zoneid_to_netstackid    10723     0 ms    1 us    19014758217568959
  0                  -> netstack_find_shared_zoneid 10723       0 ms    1 us    19014758217570399
  0                  <- netstack_find_shared_zoneid 10723       0 ms    1 us    19014758217571794
  0                <- zoneid_to_netstackid    10723     0 ms    1 us    19014758217573281
  0                -> netstack_find_by_stackid 10723    0 ms    1 us    19014758217574665
  0                  -> netstack_hold         10723     0 ms    0 us    19014758217575610
  0                  <- netstack_hold         10723     0 ms    0 us    19014758217576309
  0                <- netstack_find_by_stackid 10723    0 ms    1 us    19014758217577415
  0                -> push_mod                10723     0 ms    1 us    19014758217579097
  0                  -> fmodsw_find           10723     0 ms    1 us    19014758217580829
  0                  <- fmodsw_find           10723     0 ms    1 us    19014758217582630
  0                  -> qattach               10723     0 ms    0 us    19014758217583590
  0                    -> allocq              10723     0 ms    0 us    19014758217584330
  0                      ...
  0                    <- allocq              10723     0 ms    1 us    19014758217586850
  0                    -> setq                10723     0 ms    0 us    19014758217587650
  0                    <- setq                10723     0 ms    1 us    19014758217588671
  0                    -> entersq             10723     0 ms    0 us    19014758217589425
  0                      -> cv_wait           10723     0 ms    1 us    19014758217590660
  0                        -> thread_lock     10723     0 ms    1 us    19014758217591880
  0                        <- thread_lock     10723     0 ms    1 us    19014758217593066
  0                        -> cv_block        10723     0 ms    1 us    19014758217594205
(this is where the thread effectively waits on the CV, which isn't signalled 
until several minutes later, when the network starts working again)

As far as I can tell, stropen is used to set up the ARP stream, and it calls 
push_mod, which calls qattach. qattach then calls entersq, which calls 
cv_wait, and that is where the thread blocks until network connectivity 
resumes (it actually context-switches several times while waiting, but 
effectively it's blocked on the CV). I'm going to modify my dtrace script to 
try to get more information about this particular CV in case it happens 
again.
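One way to double-check the call chain and the timing from the trace itself: 
replay each "->" record as a push and each "<-" record as a pop, and whatever 
is still open at the end is the chain the thread was blocked in. The last 
column appears to be DTrace's nanosecond timestamp, so differencing it gives 
the elapsed time. Below is a minimal Python sketch, assuming the output is 
pasted as plain text with either one record per line or each timestamp 
wrapped onto its own line. parse_flow, open_frames and span_ns are 
hypothetical helper names, not part of DTrace.

```python
import re

# A flowindent record: CPU id, then "->" (function entry) or "<-" (return),
# then the function name.
RECORD = re.compile(r"^\s*\d+\s+(->|<-)\s+(\S+)")
# Assumed to be DTrace's nanosecond timestamp: 10+ digits at the end of a
# line, either on the record itself or wrapped onto a line of its own.
STAMP = re.compile(r"(?:^|\s)(\d{10,})\s*$")


def parse_flow(text):
    """Return (direction, function, timestamp_ns) tuples from pasted output."""
    events = []
    pending = None  # a record whose timestamp has not been seen yet
    for line in text.splitlines():
        rec = RECORD.match(line)
        if rec:
            pending = (rec.group(1), rec.group(2))
        stamp = STAMP.search(line)
        if stamp and pending:
            events.append((pending[0], pending[1], int(stamp.group(1))))
            pending = None
    return events


def open_frames(events):
    """Replay entries as pushes and returns as pops; what is left open at the
    end of the trace is the call chain the thread was still inside."""
    stack = []
    for direction, func, _ in events:
        if direction == "->":
            stack.append(func)
        elif stack and stack[-1] == func:
            # Returns that don't match the innermost open frame are ignored,
            # e.g. if the trace started in the middle of a call.
            stack.pop()
    return stack


def span_ns(events):
    """Elapsed nanoseconds between the first and last traced events."""
    return events[-1][2] - events[0][2] if events else 0
```

Run over the full excerpt, open_frames should return fop_open, spec_open, 
stropen, push_mod, qattach, entersq, cv_wait, cv_block, and span_ns should 
give 162673 ns (19014758217594205 - 19014758217431532), i.e. about 163 us 
from fop_open to cv_block. So essentially all of the multi-minute stall 
happens inside cv_block. The elided "..." regions don't change the result as 
long as each one sits between a matching entry and return.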

I'm crossing my fingers that the power setting change will have fixed it for 
good! Since the issue usually occurs about once every 24 hours, I'll report 
back on whether it did; if not, I hope to have more detailed data by then.

Thanks,
Kevin
-- 
This message posted from opensolaris.org
_______________________________________________
networking-discuss mailing list
networking-discuss@opensolaris.org
