John,

Many thanks for your reply!

I've added the line to /etc/system but I haven't rebooted yet. According to http://wesunsolve.net/bugid/id/6958068, another workaround is to add "cpu_deep_idle disable" to /etc/power.conf and then run pmconfig. The workaround doesn't mention having to reboot after running pmconfig, and the man page for pmconfig indicates that changes take effect immediately. Is this sufficient to change the cpu_deep_idle setting, or do I actually need to reboot? (This system is sensitive to downtime from rebooting, so I'm hoping to avoid the reboot if it's not necessary.)

Also, before I made the above change, I created a new dtrace script that traces the processing of the open(/dev/arp) system call in the arp process, and had it run from cron every 5 minutes. The issue occurred again after I set this up (but before I changed the power setting above), and it was offline for a couple of minutes. I've attached the resulting dtrace output, as well as the dtrace scripts. The most relevant piece of the output is:

  0 -> fop_open                     10723 0 ms 1 us 19014758217431532
  0 -> spec_open                    10723 0 ms 1 us 19014758217432883
  0 -> secpolicy_spec_open          10723 0 ms 1 us 19014758217434464
  0 ...
  0 <- secpolicy_spec_open          10723 0 ms 1 us 19014758217461622
  0 -> spec_lockcsp                 10723 0 ms 1 us 19014758217463617
  0 <- spec_lockcsp                 10723 0 ms 1 us 19014758217464847
  0 -> stropen                      10723 0 ms 1 us 19014758217466486
  0 -> allocq                       10723 0 ms 1 us 19014758217468424
  0 -> kmem_cache_alloc             10723 0 ms 0 us 19014758217469378
  0 <- kmem_cache_alloc             10723 0 ms 1 us 19014758217470382
  0 <- allocq                       10723 0 ms 1 us 19014758217471735
  0 -> shalloc                      10723 0 ms 1 us 19014758217473117
  0 -> kmem_cache_alloc             10723 0 ms 1 us 19014758217474227
  0 <- kmem_cache_alloc             10723 0 ms 1 us 19014758217475314
  0 <- shalloc                      10723 0 ms 1 us 19014758217477226
  0 -> setq                         10723 0 ms 1 us 19014758217478676
  0 <- setq                         10723 0 ms 1 us 19014758217480417
  0 -> set_qend                     10723 0 ms 1 us 19014758217482116
  0 <- set_qend                     10723 0 ms 1 us 19014758217483702
  0 -> qattach                      10723 0 ms 1 us 19014758217485231
  0 -> allocq                       10723 0 ms 0 us 19014758217486207
  0 -> kmem_cache_alloc             10723 0 ms 0 us 19014758217486886
  0 <- kmem_cache_alloc             10723 0 ms 0 us 19014758217487662
  0 <- allocq                       10723 0 ms 0 us 19014758217488579
  0 -> setq                         10723 0 ms 1 us 19014758217489594
  0 <- setq                         10723 0 ms 0 us 19014758217490432
  0 -> entersq                      10723 0 ms 1 us 19014758217492028
  0 <- entersq                      10723 0 ms 1 us 19014758217493900
  0 -> ip_open                      10723 0 ms 1 us 19014758217495654
  0 ...
  0 <- ip_open                      10723 0 ms 1 us 19014758217563000
  0 -> leavesq                      10723 0 ms 1 us 19014758217564404
  0 <- leavesq                      10723 0 ms 1 us 19014758217565782
  0 <- qattach                      10723 0 ms 1 us 19014758217567311
  0 -> zoneid_to_netstackid         10723 0 ms 1 us 19014758217568959
  0 -> netstack_find_shared_zoneid  10723 0 ms 1 us 19014758217570399
  0 <- netstack_find_shared_zoneid  10723 0 ms 1 us 19014758217571794
  0 <- zoneid_to_netstackid         10723 0 ms 1 us 19014758217573281
  0 -> netstack_find_by_stackid     10723 0 ms 1 us 19014758217574665
  0 -> netstack_hold                10723 0 ms 0 us 19014758217575610
  0 <- netstack_hold                10723 0 ms 0 us 19014758217576309
  0 <- netstack_find_by_stackid     10723 0 ms 1 us 19014758217577415
  0 -> push_mod                     10723 0 ms 1 us 19014758217579097
  0 -> fmodsw_find                  10723 0 ms 1 us 19014758217580829
  0 <- fmodsw_find                  10723 0 ms 1 us 19014758217582630
  0 -> qattach                      10723 0 ms 0 us 19014758217583590
  0 -> allocq                       10723 0 ms 0 us 19014758217584330
  0 ...
  0 <- allocq                       10723 0 ms 1 us 19014758217586850
  0 -> setq                         10723 0 ms 0 us 19014758217587650
  0 <- setq                         10723 0 ms 1 us 19014758217588671
  0 -> entersq                      10723 0 ms 0 us 19014758217589425
  0 -> cv_wait                      10723 0 ms 1 us 19014758217590660
  0 -> thread_lock                  10723 0 ms 1 us 19014758217591880
  0 <- thread_lock                  10723 0 ms 1 us 19014758217593066
  0 -> cv_block                     10723 0 ms 1 us 19014758217594205

(This is where it effectively waits on the CV, which only becomes available several minutes later when the network starts working again.)

As far as I can tell, stropen is used to construct the ARP stream device, and this eventually calls push_mod, which calls qattach. qattach then calls entersq, which in turn calls cv_wait, and that is where we block until network connectivity resumes (the thread actually context switches several times while it's waiting, but effectively it's blocked on the CV). I'm going to modify my dtrace script to try to get more information about this particular CV in case it happens again.

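A minimal sketch of what that extra instrumentation might look like. This is not one of the attached scripts. It assumes the blocked process is named "arp" and that the fbt probes for cv_wait are available on this kernel, and it logs every cv_wait call made by arp rather than only the one reached through entersq:

```
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Assumption: the process of interest has execname "arp". */
fbt::cv_wait:entry
/execname == "arp"/
{
	/* arg0 is the kcondvar_t pointer, arg1 the kmutex_t pointer. */
	self->cv = arg0;
	self->ts = timestamp;
	printf("%Y pid %d cv_wait(cv=0x%x, mutex=0x%x)\n",
	    walltimestamp, pid, arg0, arg1);
	/* Kernel stack shows which path led to the wait. */
	stack();
}

fbt::cv_wait:return
/self->ts/
{
	/* Report how long this thread stayed in cv_wait. */
	printf("%Y pid %d returned from cv_wait(cv=0x%x) after %d ms\n",
	    walltimestamp, pid, self->cv,
	    (timestamp - self->ts) / 1000000);
	self->cv = 0;
	self->ts = 0;
}
```

With the CV address in hand, it should be possible to compare it across occurrences and see whether the same condition variable is involved each time.
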
I'm crossing my fingers that the power setting change will have fixed it for good! But if not, I hope to have some more detailed data to report back. Since the issue usually occurs once every 24 hours, I'll report back whether or not the power settings fix this.

Thanks,
Kevin

-- 
This message posted from opensolaris.org
_______________________________________________
networking-discuss mailing list
networking-discuss@opensolaris.org