Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes
> What next?

The machine answered the question: it ran smoothly for close to an hour. Then I left for lunch. When I came back, the monitor was black and the machine did not react. I tried everything, with the power button as the last resort, which resulted in a cold start.

Since this was a good opportunity, I gave it a shot: I ran a network cable to the machine and disabled WLAN. It connected properly to the network, so the NIC is probably not broken, as one might have assumed. The message lines are the same as in my earlier mail, except that there are two more: one with bge0 link up, immediately followed by one with bad address 0.0.0.0. I issued ifconfig bge0 dhcp afterwards, and there are no more bge0 messages in the log.

So what we seem to have here is a bad architectural mistake in the kernel. Blame nwam for pulling the wrong cords, never mind. But the kernel must not allow this to happen: when bge0 can't connect, it monopolises all resources to load the 'correct' firmware to get it back up? On top of that, I never used bge0, always wpi0. So there is no reason at all for the kernel to try to force bge0 to work.

--
This message posted from opensolaris.org
___
opensolaris-help mailing list
opensolaris-help@opensolaris.org
[osol-help] swapping out system drive
Hi all,

I'm new to OpenSolaris, so I just want to confirm something. I have it configured as a file server just the way I want, with everything working. Now I want to swap out the system drive for something smaller: it is currently on a 1 TB drive and uses under 50 GB of it.

On Windows, I know I can just image the system drive onto the new drive and have everything just work. Can I do the same with OpenSolaris? Just image the drive and plug it in? It took me a while to set it up the way I want, and I don't want to risk screwing it up.

Thanks
Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes
> Hmm, looks like the bge driver is using software interrupts, and I think these could be running at priority level 4.
>
> Seems that the bge hardware has some problems, and the driver tries to reset the bge network hardware in an attempt to recover from the bge hardware problem.
>
> bge_poll_firmware() could be busy waiting for up to one second; I suspect this could explain the kernel cpu time usage.
>
> Are there any error or warning messages logged to /var/adm/messages when the system starts consuming kernel cpu time?
>
> Maybe the hang can be avoided when the bge nic driver isn't used and the bge interface is unconfigured / unplumbed? Or the bge nic driver isn't allowed to load, by using the kernel option "-B disable-bge=true" ?

I started at the end, with -B disable-bge=true. The network applet still shows bge0, but it doesn't try to configure it. ifconfig bge0 unplumb says bge0 is no interface, so the kernel option seems to have worked. Lockstat, though, still shows 98% i86_mwait in the 'sane' state.

I checked /var/adm/messages, but it is so long, and I don't know what I should look for. I tried 'excess' and 'consum', but neither had any hits. What looks strange to me, the layperson in kernel land:

Aug 8 22:05:34 OSolUwe mac: [ID 469746 kern.info] NOTICE: bge0 registered
Aug 8 22:05:34 OSolUwe pci_pci: [ID 370704 kern.info] PCI-device: pci103c,3...@e, bge0
Aug 8 22:05:34 OSolUwe genunix: [ID 936769 kern.info] bge0 is /p...@0,0/pci8086,2...@1e/pci103c,3...@e
Aug 8 22:05:46 OSolUwe genunix: [ID 408114 kern.info] /p...@0,0/pci8086,2...@1e/pci103c,3...@e (bge0) online
Aug 8 22:05:47 OSolUwe ip: [ID 856290 kern.notice] ip: joining multicasts failed (4) on bge0 - will use link layer broadcasts for multicast
Aug 8 22:05:50 OSolUwe in.ndpd[366]: [ID 169330 daemon.error] Interface bge0 has been removed from kernel. in.ndpd will no longer use it
Aug 8 22:05:54 OSolUwe genunix: [ID 408114 kern.info] /p...@0,0/pci8086,2...@1e/pci103c,3...@e (bge0) online

At least I can confirm that the system now keeps running normally, meaning that at least the symptoms have been suppressed by that kernel option.

What next?
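For anyone facing the same "the log is so long, what do I look for" problem, here is a rough sketch of one way to narrow it down: keep only the lines that mention the interface and whose syslog level is above plain info. It relies only on the "[ID <number> <facility>.<level>]" tag visible in the lines above. The function name, the keyword default and the set of levels are my own choices for illustration, not anything the system prescribes.

```python
import re

# Matches the "[ID <n> <facility>.<level>]" tag seen in the log lines above.
TAG = re.compile(r"\[ID \d+ (?P<facility>\w+)\.(?P<level>\w+)\]")

# Levels worth reading first; "info" and "debug" are left out on purpose.
# This set is an assumption for the sketch, adjust it to taste.
INTERESTING = {"emerg", "alert", "crit", "err", "error", "warning", "notice"}


def interesting_lines(lines, keyword="bge0"):
    """Return the lines that mention `keyword` and carry a level in INTERESTING."""
    hits = []
    for line in lines:
        match = TAG.search(line)
        if match and keyword in line and match.group("level") in INTERESTING:
            hits.append(line.rstrip("\n"))
    return hits


# Three of the lines posted above, used as sample input.
SAMPLE = [
    "Aug 8 22:05:34 OSolUwe mac: [ID 469746 kern.info] NOTICE: bge0 registered",
    "Aug 8 22:05:47 OSolUwe ip: [ID 856290 kern.notice] ip: joining multicasts "
    "failed (4) on bge0 - will use link layer broadcasts for multicast",
    "Aug 8 22:05:50 OSolUwe in.ndpd[366]: [ID 169330 daemon.error] Interface bge0 "
    "has been removed from kernel. in.ndpd will no longer use it",
]

if __name__ == "__main__":
    for line in interesting_lines(SAMPLE):
        print(line)
```

To run it against the real log, pass an open file instead of SAMPLE, for example interesting_lines(open("/var/adm/messages")).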
Re: [osol-help] 2009.06 getting slow
Ta, will check on Monday. I think I have 10 GB out of 250 GB left.
Re: [osol-help] opensolaris login info?
OpenSolaris uses RBAC, and root is set up as an RBAC role. This means you cannot log in remotely as root, which is more secure. The default jack/jack user lets you log in after the install and switch to root, and the switch to root is logged. You can then disable root as a role if you wish and add a custom login. It would be a good idea to remove the jack login after you create a custom one, or to change the jack password.

Use the following to remove the RBAC role from the root login:

rolemod -K type=normal root

The "jack" account also has RBAC attributes and can issue root commands if you prefix them with "pfexec", similar to "sudo".

Anon Y Mous wrote:
> Did you already try going to google.com and typing in the following words:
>
> opensolaris root password
>
> ?
>
> It's been a while since I've done this, so I might be wrong, but I believe the OpenSolaris 2009.06 CD has this default regular user:
>
> username: jack
> password: jack
>
> and the root password is: opensolaris
>
> During the installation process it usually asks you to create a new user with a user name and a password. I recommend not making the user you create during the installation "jack" with a password of "jack", as your machine will probably get hacked a few hours after you enable ssh if you use a default user name and password like that.
Re: [osol-help] Request for mentor- Documentation Project
Hi Meera,

you might get a bigger response if you post in the "discuss" thread here:

http://www.opensolaris.org/jive/forum.jspa?forumID=13

Good luck to you. I'm a part-time, on-and-off computer science university student / full-time Linux system administrator (Linux skills pay my bills) who has been dabbling in OpenSolaris as well. Studying UNIX (i.e. Solaris) instead of just knowing Linux and BSD can be a very rewarding experience, but it also has a very steep learning curve. Think of learning vi for the first time:

http://dailyvim.blogspot.com/2009/02/editor-comparison.html

and you'll know what to expect. :-)
Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes
> > Maybe we can find out who's calling drv_usecwait(), using:
> > lockstat -kIW -f drv_usecwait -s 10 sleep 15
>
> # lockstat -kIW -f drv_usecwait -s 10 sleep 15
>
> Profiling interrupt: 88 events in 16.823 seconds (5 events/sec)
>
> ---
> Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
>    86  98%  98% 0.00      867 cpu[0]+4               drv_usecwait
>
>       nsec -- Time Distribution -- count     Stack
>       1024 |@@                     76        bge_poll_firmware
>       2048 |@@@                    10        bge_chip_reset
>                                              bge_reset
>                                              bge_restart
>                                              bge_chip_factotum
>                                              av_dispatch_softvect
>                                              dispatch_softint
>                                              switch_sp_and_call

Hmm, looks like the bge driver is using software interrupts, and I think these could be running at priority level 4.

It seems that the bge hardware has some problem, and the driver tries to reset the bge network hardware in an attempt to recover from it.

bge_poll_firmware() could be busy waiting for up to one second; I suspect this could explain the kernel cpu time usage.

Are there any error or warning messages logged to /var/adm/messages when the system starts consuming kernel cpu time?

Maybe the hang can be avoided when the bge nic driver isn't used and the bge interface is unconfigured / unplumbed? Or when the bge nic driver isn't allowed to load, by using the kernel option "-B disable-bge=true"?
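To illustrate why a busy-wait poll shows up as CPU time rather than idle time, here is a small user-space sketch. It is emphatically not the bge driver's code: poll_until_ready, busy_wait_us and the 100 microsecond step are hypothetical names and values chosen for the illustration. Only the one-second upper bound is taken from the message above.

```python
import time


def busy_wait_us(microseconds, clock=time.perf_counter):
    """Spin on the clock until `microseconds` have passed; the CPU never idles."""
    deadline = clock() + microseconds / 1_000_000
    spins = 0
    while clock() < deadline:
        spins += 1  # nothing useful happens here, the loop only reads the clock
    return spins


def poll_until_ready(is_ready, timeout_us=1_000_000, step_us=100, delay=busy_wait_us):
    """Poll `is_ready` every `step_us` microseconds, giving up after `timeout_us`.

    Returns (ready, waited_us). With a spinning `delay`, a device that never
    becomes ready costs the full timeout in CPU time on every attempt.
    """
    waited = 0
    while waited < timeout_us:
        if is_ready():
            return True, waited
        delay(step_us)
        waited += step_us
    return False, waited


if __name__ == "__main__":
    # A "device" that never answers, polled for 50 ms instead of a full second.
    cpu_before = time.process_time()
    wall_before = time.perf_counter()
    result = poll_until_ready(lambda: False, timeout_us=50_000)
    print("result:", result)
    print("wall seconds:", time.perf_counter() - wall_before)
    print("cpu seconds: ", time.process_time() - cpu_before)
```

In such a loop most of the time goes into reading the clock, which would at least be consistent with tsc_read, tsc_gethrtime and gethrtime sitting next to drv_usecwait in the profiles, though that connection is my reading and not something the output proves.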
Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes
> If you repeat that lockstat, does the result look similar? cpu usage by "cpu[0]+4", in tsc_read(), ddi_mem_get32(), tsc_gethrtime(), ...drv_usecwait() ?
>
> Maybe we can find out who's calling drv_usecwait(), using:
> lockstat -kIW -f drv_usecwait -s 10 sleep 15

Okay, I think I caught them all here. First, the two at sanity (~0% CPU):

# lockstat -kIW -f drv_usecwait -s 10 sleep 15

Profiling interrupt: 1 events in 15.041 seconds (0 events/sec)

---
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
    1 100% 100% 0.00     1246 cpu[0]                 drv_usecwait

      nsec -- Time Distribution -- count     Stack
      2048 |@@                     1         ec_wait_ibf_clear
                                             ec_rd
                                             ec_handler
                                             AcpiEvAddressSpaceDispatch
                                             AcpiExAccessRegion
                                             AcpiExFieldDatumIo
                                             AcpiExExtractFromField
                                             AcpiExReadDataFromField
                                             AcpiExResolveNodeToValue
---

# lockstat -kIW -D 20 sleep 15

Profiling interrupt: 2918 events in 15.042 seconds (194 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
---
 2896  99%  99% 0.00     3174 cpu[1]                 i86_mwait
   12   0% 100% 0.00     3050 cpu[0]                 (usermode)
    2   0% 100% 0.00     2757 cpu[0]                 mutex_enter
    1   0% 100% 0.00     1944 cpu[1]+11              savectx
    1   0% 100% 0.00     1886 cpu[1]                 cv_broadcast
    1   0% 100% 0.00     4440 cpu[1]                 page_get_mnode_freelist
    1   0% 100% 0.00     1777 cpu[1]                 bt_getlowbit
    1   0% 100% 0.00     3452 cpu[0]                 hwblkpagecopy
    1   0% 100% 0.00     3109 cpu[0]+5               ddi_mem_put8
    1   0% 100% 0.00     3844 cpu[0]                 _sys_sysenter_post_swapgs
    1   0% 100% 0.00     1414 cpu[0]+2               dtrace_dynvar_clean
---

The first command usually returned nothing; I ran it around 10 times until I got the output above.

Next, the two at ~50% CPU use:

# lockstat -kIW -D 20 sleep 15

Profiling interrupt: 3268 events in 16.849 seconds (194 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
---
 1601  49%  49% 0.00     1098 cpu[1]+9               i86_mwait
  781  24%  73% 0.00      881 cpu[0]+4               tsc_read
  315  10%  83% 0.00   531420 cpu[0]+4               ddi_getl
  245   7%  90% 0.00      871 cpu[0]+4               tsc_gethrtime
  136   4%  94% 0.00      864 cpu[0]+4               mul32
   83   3%  97% 0.00      860 cpu[0]+4               gethrtime
   73   2%  99% 0.00      869 cpu[0]+4               drv_usecwait
    8   0%  99% 0.00    75265 cpu[1]                 (usermode)
    4   0%  99% 0.00     1023 cpu[1]+9               mutex_delay_default
    3   0%  99% 0.00     2278 cpu[0]+4               do_splx
    3   0% 100% 0.00     1653 cpu[0]                 AcpiUtDebugPrint
    1   0% 100% 0.00     3645 cpu[1]+9               as_segcompar
    1   0% 100% 0.00     1710 cpu[1]+9               avl_find
    1   0% 100% 0.00     3877 cpu[1]+9               page_lookup_create
    1   0% 100% 0.00      976 cpu[1]+9               default_lock_delay
    1   0% 100% 0.00     3036 cpu[1]+9               mutex_enter
    1   0% 100% 0.00     3232 cpu[1]+9               inb
    1   0% 100% 0.00  1633692 cpu[1]+9               ddi_io_put32
    1   0% 100% 0.00   951528 cpu[1]+9               ddi_getw
    1   0% 100% 0.00  1419253 cpu[1]                 ddi_getb
---

# lockstat -kIW -f drv_usecwait -s 10 sleep 15

Profiling interrupt: 88 events in 16.823 seconds (5 events/sec)

---
Count indv c
Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes
> > Ok, for system cpu time usage: try to run a kernel profile, to find out what kernel functions are consuming the time, lockstat -kIW -D 20 sleep 15
>
> I did one on the machine, and then quickly an ssh and another one in ssh for the screenshot:
>
> # lockstat -kIW -D 20 sleep 15
>
> Profiling interrupt: 3074 events in 15.841 seconds (194 events/sec)
>
> Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
> ---
>  2430  79%  79% 0.00     2682 cpu[0]                 i86_mwait
>   279   9%  88% 0.00     1364 cpu[0]+4               tsc_read
>   113   4%  92% 0.00   554980 cpu[0]+4               ddi_mem_get32
>   103   3%  95% 0.00     1437 cpu[0]+4               tsc_gethrtime
>    53   2%  97% 0.00     1369 cpu[0]+4               mul32
>    35   1%  98% 0.00     1337 cpu[0]+4               gethrtime
>    28   1%  99% 0.00     1379 cpu[0]+4               drv_usecwait
> ...
>
> and 10 seconds later it was completely dead.
>
> Does this help, or do you need another one?

Hmm, the 79% in i86_mwait() should be idle time. The remaining ~20% is cpu time spent accessing some memory-mapped registers and reading the cpu's time stamp counter (tsc), all on CPU #0 at priority level 4 ("cpu[0]+4").

It looks like the kernel is busy waiting for some time using drv_usecwait at priority level 4.

If you repeat that lockstat, does the result look similar? Is there cpu usage by "cpu[0]+4", in tsc_read(), ddi_mem_get32(), tsc_gethrtime(), ...drv_usecwait()?

Maybe we can find out who's calling drv_usecwait(), using:

lockstat -kIW -f drv_usecwait -s 10 sleep 15
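The "~20% at priority level 4" figure can be rechecked from the posted rows. The sketch below simply adds up the Count column per "Hottest CPU+PIL" value and divides by the 3074 profiling events. The counts are copied from the quoted output; the function and variable names are made up for the example, and the rows hidden behind the "..." in the post are not included.

```python
from collections import defaultdict

# Rows copied from the quoted profile: (count, "Hottest CPU+PIL", caller).
ROWS = [
    (2430, "cpu[0]", "i86_mwait"),
    (279, "cpu[0]+4", "tsc_read"),
    (113, "cpu[0]+4", "ddi_mem_get32"),
    (103, "cpu[0]+4", "tsc_gethrtime"),
    (53, "cpu[0]+4", "mul32"),
    (35, "cpu[0]+4", "gethrtime"),
    (28, "cpu[0]+4", "drv_usecwait"),
]
TOTAL_EVENTS = 3074  # from "Profiling interrupt: 3074 events"


def share_by_cpu_pil(rows, total):
    """Sum the sample counts per CPU+PIL value and return each share in percent."""
    counts = defaultdict(int)
    for count, cpu_pil, _caller in rows:
        counts[cpu_pil] += count
    return {key: 100.0 * value / total for key, value in counts.items()}


if __name__ == "__main__":
    for cpu_pil, percent in share_by_cpu_pil(ROWS, TOTAL_EVENTS).items():
        print(f"{cpu_pil:10s} {percent:5.1f}%")
```

The six cpu[0]+4 rows add up to 611 of 3074 samples, just under 20%, while i86_mwait alone accounts for about 79%.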
Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes
> > ISR(s)
> > ...
> > 25 0x30 4 PCI Edg MSI 0 1 - pepb_intr_handler
>
> could be related to PCI-e / PCI bus bridge; maybe some hotplug or power management event interrupt.
>
> The five minute delay could be a hint that it is related to power management. Are there perhaps BIOS setup options to enable / disable power management for PCI-e devices?

Alas, no. As much as I have come to like the machine, the BIOS is atypical: just a proprietary, locked-down "cannot set anything here" BIOS from HP.
[osol-help] Request for mentor- Documentation Project
Hi,

I am a computer science student at Amrita University. I have a passion for writing and I am an admirer of FOSS; I was also a prize winner in the Code for Freedom contest 2007. Now I would like to try my hand at the OpenSolaris Documentation Project, but I don't have a clear idea of how and where to start. I hope to get help and support from the community.

--
Thanks & Regards
Meera R
Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes
> > Looks like driver interrupts, on cpu #0, and at IPL 4.
> >
> > What interrupts are bound to cpu 0 / IPL 4, on your machine? This information is printed by
> >
> > echo ::interrupts | mdb -k
>
> This is the whole lot while 'sane' (close to 0 CPU use):
>
> IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# ISR(s)
> ...
> 25   0x30 4   PCI    Edg MSI    0   1     -         pepb_intr_handler

So it could be related to the PCI-e / PCI bus bridge; maybe some hotplug or power management event interrupt.

The five-minute delay could be a hint that it is related to power management. Are there perhaps BIOS setup options to enable / disable power management for PCI-e devices?