Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes

2009-08-08 Thread Uwe Dippel
> What next?

The machine answered the question: it ran smoothly for close to an 
hour. Then I left for lunch. When I came back, the monitor was black and the 
machine did not react. I tried everything, with the power button as a last 
resort. That resulted in a cold start.

Since this was a good opportunity, I gave it a shot: I plugged a network cable 
into it and disabled WLAN. It connected properly to the network, so the NIC 
is probably not broken, as one might have assumed. The message lines are the 
same as in my earlier mail, except that there are two more: one with bge0 link 
up, immediately followed by bad address 0.0.0.0.
I issued ifconfig bge0 dhcp afterwards, and there are no more bge0 messages 
in the log.
So what we seem to have here is a serious architectural mistake in the 
kernel. Blame nwam for pulling the wrong cords, never mind. But the kernel must 
not allow this to happen: when bge0 can't connect, it monopolises all resources 
to load the 'correct' firmware to get it back up? On top of that, I never used 
bge0, always wpi0. So there is no reason at all for the kernel to try to force 
bge0 to work.
-- 
This message posted from opensolaris.org
___
opensolaris-help mailing list
opensolaris-help@opensolaris.org


[osol-help] swapping out system drive

2009-08-08 Thread kenny
hi all

i'm new to opensolaris, so i just want to confirm something.
i have it configured as a file server just the way i want, everything working.
but now i want to swap out the system drive for something smaller, as it is 
currently using a 1 TB drive and uses less than 50 GB of it.

for windows, i know i can just image the system drive on to the new drive and 
have everything just work.
can i do the same with opensolaris? just image the drive and plug it in?

it took me a while to have it setup the way i want, and i dont want to risk 
screwing it up.


thanks


Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes

2009-08-08 Thread Uwe Dippel
> Hmm, looks like the bge driver is using
> software interrupts, and I think these could
> be running at priority level 4.
> 
> Seems that the bge hardware has some
> problems, and the driver tries to reset the
> bge network hardware in an attempt to 
> recover from the bge hardware problem.
> 
> bge_poll_firmware() could be busy waiting 
> for up to one second; I suspect this could
> explain the kernel cpu time usage.
> 
> Are there any error or warning messages
> logged to /var/adm/messages when the
> system starts consuming kernel cpu time?
> 
> 
> Maybe the hang can be avoided when the
> bge nic driver isn't used and the bge interface
> is unconfigured / unplumbed?  Or the bge
> nic driver isn't allowed to load, by using
> the kernel option "-B disable-bge=true" ?

I started at the end, with -B disable-bge=true. The network applet still shows 
bge0, but it doesn't try to configure it. ifconfig bge0 unplumb says bge0 is 
not an interface, so the kernel option seems to have worked. lockstat, though, 
still shows 98% i86_mwait in the 'sane' state.

I checked /var/adm/messages, but it is very long, and I don't know what I 
should look for. I tried 'excess' and 'consum', but neither had any hits.
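
One way to narrow down a long log is to filter for the driver name and for the severity tags that syslog puts into each line. A minimal sketch: the function name driver_problems and the choice of patterns are assumptions for illustration, and it assumes a grep that understands -E.

```shell
# Print log lines that mention a given driver and carry a warning- or
# error-level tag such as "kern.warning]", "daemon.error]" or "WARNING".
# $1 = driver/interface name, $2 = log file (defaults to /var/adm/messages)
driver_problems() {
    driver=$1
    logfile=${2:-/var/adm/messages}
    grep -- "$driver" "$logfile" |
        grep -E 'WARNING|\.(warning|err|error|crit|alert|emerg)\]'
}
```

Calling it as "driver_problems bge0" would skip the kern.info and kern.notice lines and leave only entries logged at warning level or above.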

What looks strange to me, the layperson in kernel land:
Aug  8 22:05:34 OSolUwe mac: [ID 469746 kern.info] NOTICE: bge0 registered
Aug  8 22:05:34 OSolUwe pci_pci: [ID 370704 kern.info] PCI-device: pci103c,3...@e, bge0
Aug  8 22:05:34 OSolUwe genunix: [ID 936769 kern.info] bge0 is /p...@0,0/pci8086,2...@1e/pci103c,3...@e
Aug  8 22:05:46 OSolUwe genunix: [ID 408114 kern.info] /p...@0,0/pci8086,2...@1e/pci103c,3...@e (bge0) online
Aug  8 22:05:47 OSolUwe ip: [ID 856290 kern.notice] ip: joining multicasts failed (4) on bge0 - will use link layer broadcasts for multicast
Aug  8 22:05:50 OSolUwe in.ndpd[366]: [ID 169330 daemon.error] Interface bge0 has been removed from kernel. in.ndpd will no longer use it
Aug  8 22:05:54 OSolUwe genunix: [ID 408114 kern.info] /p...@0,0/pci8086,2...@1e/pci103c,3...@e (bge0) online

At least I can confirm that the system now keeps running normally, meaning 
that the symptoms have been suppressed by that kernel option.

What next?


Re: [osol-help] 2009.06 getting slow

2009-08-08 Thread John-Paul Drawneek
Ta, will check on Monday.

I think I have 10 GB out of 250 GB left.


Re: [osol-help] opensolaris login info?

2009-08-08 Thread Leslie Wood
OpenSolaris uses RBAC, and root is set up as an RBAC role.  This means you 
cannot log in remotely as root, which is more secure.  The jack/jack default 
user allows you to log in after the install and switch to root, and the 
switch to root is logged.  You can then disable root as a role if 
you wish and add a custom login.  It would be a good idea to remove the 
jack login after you create a custom one, or to change the jack password.  
Use the following to remove the RBAC role from the root login.


  "rolemod -K type=normal root"

The "jack" account also has RBAC attributes and can issue root commands 
if you prefix with "pfexec", similar to "sudo".


Anon Y Mous wrote:

Did you already try going to google.com and typing in the following words:

   opensolaris root password

?

It's been a while since I've done this, so I might be wrong, but I believe 
OpenSolaris 2009.06 CD has this default regular user:

  username: jack
  password: jack

and the root password is:  opensolaris

During the installation process it usually asks you to create a new user with a user name and a 
password. I recommend not making the user you create during the installation "jack" with 
a password of "jack" as your machine will probably get hacked a few hours after you 
enable ssh if you use a default user-name and password like that.
  



Re: [osol-help] Request for mentor- Documentation Project

2009-08-08 Thread Anon Y Mous
Hi Meera, you might get a bigger response if you post in the "discuss" thread 
here:

  http://www.opensolaris.org/jive/forum.jspa?forumID=13


Good luck to you. I'm a part-time, on-and-off computer science university 
student / full-time Linux system administrator (Linux skills pay my bills) who 
has been dabbling in OpenSolaris as well. Studying UNIX (i.e. Solaris) instead 
of just knowing Linux and BSD can be a very rewarding experience, but it also 
has a very steep learning curve (think of learning vi for the first time):

  http://dailyvim.blogspot.com/2009/02/editor-comparison.html

and you'll know what to expect. :-)


Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes

2009-08-08 Thread Jürgen Keil
> > Maybe we can find out who's calling drv_usecwait(),
> > using:
> > lockstat -kIW -f drv_usecwait -s 10 sleep 15
> # lockstat -kIW -f drv_usecwait -s 10 sleep 15 
> 
> Profiling interrupt: 88 events in 16.823 seconds (5 events/sec)
> 
> ---
> Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
>    86  98%  98% 0.00      867 cpu[0]+4               drv_usecwait
> 
>       nsec -- Time Distribution -- count     Stack
>       1024 |@@                     76        bge_poll_firmware
>       2048 |@@@                    10        bge_chip_reset
>                                              bge_reset
>                                              bge_restart
>                                              bge_chip_factotum
>                                              av_dispatch_softvect
>                                              dispatch_softint
>                                              switch_sp_and_call

Hmm, looks like the bge driver is using
software interrupts, and I think these could
be running at priority level 4.

Seems that the bge hardware has some
problems, and the driver tries to reset the
bge network hardware in an attempt to 
recover from the bge hardware problem.

bge_poll_firmware() could be busy waiting 
for up to one second; I suspect this could
explain the kernel cpu time usage.

Are there any error or warning messages
logged to /var/adm/messages when the
system starts consuming kernel cpu time?


Maybe the hang can be avoided when the
bge nic driver isn't used and the bge interface
is unconfigured / unplumbed?  Or the bge
nic driver isn't allowed to load, by using
the kernel option "-B disable-bge=true" ?


Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes

2009-08-08 Thread Uwe Dippel
> If you repeat that lockstat, does the result look
> similar?
> cpu usage by "cpu[0]+4", in tsc_read(),
> ddi_mem_get32(),
> tsc_gethrtime(), ...drv_usecwait()  ?
> 
> 
> Maybe we can find out who's calling drv_usecwait(),
> using:
> lockstat -kIW -f drv_usecwait -s 10 sleep 15

Okay, I think I caught them all here:

First the two at sanity (~0% CPU):

# lockstat -kIW -f drv_usecwait -s 10 sleep 15

Profiling interrupt: 1 events in 15.041 seconds (0 events/sec)

---
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
    1 100% 100% 0.00     1246 cpu[0]                 drv_usecwait

      nsec -- Time Distribution -- count     Stack
      2048 |@@                     1         ec_wait_ibf_clear
                                             ec_rd
                                             ec_handler
                                             AcpiEvAddressSpaceDispatch
                                             AcpiExAccessRegion
                                             AcpiExFieldDatumIo
                                             AcpiExExtractFromField
                                             AcpiExReadDataFromField
                                             AcpiExResolveNodeToValue
---

# lockstat -kIW -D 20 sleep 15

Profiling interrupt: 2918 events in 15.042 seconds (194 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
---
 2896  99%  99% 0.00     3174 cpu[1]                 i86_mwait
   12   0% 100% 0.00     3050 cpu[0]                 (usermode)
    2   0% 100% 0.00     2757 cpu[0]                 mutex_enter
    1   0% 100% 0.00     1944 cpu[1]+11              savectx
    1   0% 100% 0.00     1886 cpu[1]                 cv_broadcast
    1   0% 100% 0.00     4440 cpu[1]                 page_get_mnode_freelist
    1   0% 100% 0.00     1777 cpu[1]                 bt_getlowbit
    1   0% 100% 0.00     3452 cpu[0]                 hwblkpagecopy
    1   0% 100% 0.00     3109 cpu[0]+5               ddi_mem_put8
    1   0% 100% 0.00     3844 cpu[0]                 _sys_sysenter_post_swapgs
    1   0% 100% 0.00     1414 cpu[0]+2               dtrace_dynvar_clean
---

The first command usually returned nothing; I ran it around 10 times until I 
got that output above.
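
Repeating a command by hand until it finally prints something can be wrapped in a small loop. A minimal sketch: the helper name first_nonempty is made up for illustration, and it assumes that a run without matching samples prints nothing at all.

```shell
# Run a command up to max_tries times and stop at the first run that
# prints anything; returns 1 if every run stayed silent.
# usage: first_nonempty MAX_TRIES COMMAND [ARGS...]
first_nonempty() {
    max_tries=$1
    shift
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        out=$("$@" 2>&1)
        if [ -n "$out" ]; then
            printf '%s\n' "$out"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}
```

With this, "first_nonempty 10 lockstat -kIW -f drv_usecwait -s 10 sleep 15" would repeat the 15-second profile up to ten times and print the first report that is not empty.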


Next, the two at ~50% CPU use:

# lockstat -kIW -D 20 sleep 15

Profiling interrupt: 3268 events in 16.849 seconds (194 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
---
 1601  49%  49% 0.00     1098 cpu[1]+9               i86_mwait
  781  24%  73% 0.00      881 cpu[0]+4               tsc_read
  315  10%  83% 0.00   531420 cpu[0]+4               ddi_getl
  245   7%  90% 0.00      871 cpu[0]+4               tsc_gethrtime
  136   4%  94% 0.00      864 cpu[0]+4               mul32
   83   3%  97% 0.00      860 cpu[0]+4               gethrtime
   73   2%  99% 0.00      869 cpu[0]+4               drv_usecwait
    8   0%  99% 0.00    75265 cpu[1]                 (usermode)
    4   0%  99% 0.00     1023 cpu[1]+9               mutex_delay_default
    3   0%  99% 0.00     2278 cpu[0]+4               do_splx
    3   0% 100% 0.00     1653 cpu[0]                 AcpiUtDebugPrint
    1   0% 100% 0.00     3645 cpu[1]+9               as_segcompar
    1   0% 100% 0.00     1710 cpu[1]+9               avl_find
    1   0% 100% 0.00     3877 cpu[1]+9               page_lookup_create
    1   0% 100% 0.00      976 cpu[1]+9               default_lock_delay
    1   0% 100% 0.00     3036 cpu[1]+9               mutex_enter
    1   0% 100% 0.00     3232 cpu[1]+9               inb
    1   0% 100% 0.00  1633692 cpu[1]+9               ddi_io_put32
    1   0% 100% 0.00   951528 cpu[1]+9               ddi_getw
    1   0% 100% 0.00  1419253 cpu[1]                 ddi_getb
---

# lockstat -kIW -f drv_usecwait -s 10 sleep 15 

Profiling interrupt: 88 events in 16.823 seconds (5 events/sec)

---
Count indv c

Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes

2009-08-08 Thread Jürgen Keil
> > Ok, for system cpu time usage:  try to run a kernel
> > profile, to find out what kernel functions are consuming
> > the time,  lockstat -kIW -D 20 sleep 15
> 
> I did one on the machine, and then quickly an ssh and
> another one in ssh for the screenshot:
> 
> # lockstat -kIW -D 20 sleep 15
> 
> Profiling interrupt: 3074 events in 15.841 seconds
> (194 events/sec)
> 
> Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
> ---
>  2430  79%  79% 0.00     2682 cpu[0]                 i86_mwait
>   279   9%  88% 0.00     1364 cpu[0]+4               tsc_read
>   113   4%  92% 0.00   554980 cpu[0]+4               ddi_mem_get32
>   103   3%  95% 0.00     1437 cpu[0]+4               tsc_gethrtime
>    53   2%  97% 0.00     1369 cpu[0]+4               mul32
>    35   1%  98% 0.00     1337 cpu[0]+4               gethrtime
>    28   1%  99% 0.00     1379 cpu[0]+4               drv_usecwait
...

> and 10 seconds later it was completely dead.
> 
> Does this help, or do you need another one?

Hmm, the 79% i86_mwait() should be 79% idle time.

The rest is ~ 20% cpu time usage for accessing some
memory mapped registers, reading the cpu's time
stamp counter (tsc); on CPU #0 at priority level 4
"cpu[0]+4".  Looks like the kernel is busy waiting
for some time using drv_usecwait at priority level 4.
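
The ~ 20% figure can be checked by summing the Count column per CPU+PIL value. A minimal sketch: the function name pil_share is made up for illustration, and it assumes the unquoted lockstat table is fed on stdin and that every data row starts with a count and a percentage and ends with the CPU+PIL value followed by the caller name.

```shell
# Sum lockstat -kIW sample counts per "CPU+PIL" value.
# Reads the table on stdin and prints "<cpu+pil> <samples> <share>" per
# value; header, separator and stack-trace lines are skipped.
pil_share() {
    awk '
        $1 ~ /^[0-9]+$/ && $2 ~ /%$/ {
            key = $(NF - 1)          # second-to-last field is CPU+PIL
            count[key] += $1
            total += $1
        }
        END {
            for (k in count)
                printf "%s %d %.0f%%\n", k, count[k], 100 * count[k] / total
        }
    '
}
```

Fed the seven rows of the profile quoted above, it reports 611 of the 3041 listed samples (about 20%) at cpu[0]+4 and 2430 at cpu[0]. The share is relative to the rows actually listed, not to the 3074 events in the header.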

If you repeat that lockstat, does the result look similar?
cpu usage by "cpu[0]+4", in tsc_read(), ddi_mem_get32(),
tsc_gethrtime(), ...drv_usecwait()  ?


Maybe we can find out who's calling drv_usecwait(),
using:
lockstat -kIW -f drv_usecwait -s 10 sleep 15


Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes

2009-08-08 Thread Uwe Dippel
> > IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# ISR(s)
> > ...
> > 25   0x30 4   PCI    Edg MSI    0   1     -         pepb_intr_handler
> 
> So it could be related to PCI-e / PCI bus bridge;
> maybe some hotplug or power management event interrupt.
> 
> The five minute delay could be a hint that it is related to
> power management.  Are there perhaps BIOS setup options
> to enable / disable power management for PCI-e devices?

Alas, no. As much as I have come to like the machine, the BIOS is atypical: 
just a proprietary, stripped-down "cannot set anything here" affair from HP.


[osol-help] Request for mentor- Documentation Project

2009-08-08 Thread Meera R
Hi,
   I am a computer science student at Amrita University.
   I have a passion for writing and I am an admirer of FOSS, and also a prize
   winner of the Code for Freedom contest 2007.
   Now I would like to try my hand at the OpenSolaris Documentation Project,
   but I don't have a real idea of how and where to start.
   I hope to get help and support from the community.
   --
   Thanks & Regards
   Meera R


Re: [osol-help] OpenSolaris 2009.06 stalls always after some 5 minutes

2009-08-08 Thread Jürgen Keil
> > Looks like driver interrupts, on cpu #0, and at IPL 4.
> > 
> > What interrupts are bound to cpu 0 / IPL 4, on your
> > machine?  This information is printed by 
> > 
> > echo ::interrupts | mdb -k
> 
> This is whole lot while 'sane' (close to 0 CPU use):
> IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# ISR(s)
> ...
> 25   0x30 4   PCI    Edg MSI    0   1     -         pepb_intr_handler

So it could be related to PCI-e / PCI bus bridge;
maybe some hotplug or power management event interrupt.

The five minute delay could be a hint that it is related to
power management.  Are there perhaps BIOS setup options
to enable / disable power management for PCI-e devices?