Re: [osol-discuss] b134: i915 freezes OS for few seconds

2010-03-24 Thread Brian Ruthven - Sun UK


http://defect.opensolaris.org/bz/show_bug.cgi?id=12528
which has been closed in favour of
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6936915

Brian


Karel Gardas wrote:

Hello,
recently I've discovered that b134 sometimes freezes for a few seconds -- usually during or 
after some GUI activity. At first I thought this was similar to this thread: 
http://www.opensolaris.org/jive/thread.jspa?threadID=126157&tstart=0  but I'm not sure 
now, since it does not happen regularly, only from time to time. I've also observed that 
after the "freeze" dmesg contains a message like:
Mar 24 14:54:34 thinkpad genunix: [ID 221365 kern.warning] WARNING: 
i915_gem_ring_throttle: i915_wait_request request->seqno 2418 now 2418

If I grep for it in /var/adm/messages, I get this for recent days:
Mar 16 21:28:52 thinkpad genunix: [ID 221365 kern.warning] WARNING: 
i915_gem_ring_throttle: i915_wait_request request->seqno 82522 now 82522
Mar 16 21:36:40 thinkpad genunix: [ID 221365 kern.warning] WARNING: 
i915_gem_ring_throttle: i915_wait_request request->seqno 115654 now 115654
Mar 17 19:51:24 thinkpad genunix: [ID 221365 kern.warning] WARNING: 
i915_gem_ring_throttle: i915_wait_request request->seqno 428548 now 428548
Mar 18 09:45:20 thinkpad genunix: [ID 221365 kern.warning] WARNING: 
i915_gem_ring_throttle: i915_wait_request request->seqno 70062 now 70062
Mar 22 23:24:49 thinkpad genunix: [ID 221365 kern.warning] WARNING: 
i915_gem_ring_throttle: i915_wait_request request->seqno 617658 now 617658
Mar 22 23:26:58 thinkpad genunix: [ID 221365 kern.warning] WARNING: 
i915_gem_ring_throttle: i915_wait_request request->seqno 625252 now 625252
Mar 24 14:54:34 thinkpad genunix: [ID 221365 kern.warning] WARNING: 
i915_gem_ring_throttle: i915_wait_request request->seqno 2418 now 2418
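
To see how often the warning recurs, the wrapped log entries above can be tallied with a short script. This is only a minimal sketch: it assumes the exact message layout shown in the listing, and the names `throttle_warnings` and `per_day` are made up for illustration.

```python
import re
from collections import Counter

# One entry = a syslog header line ending in "WARNING:" followed by the
# i915_gem_ring_throttle text, which may be wrapped onto the next line.
PATTERN = re.compile(
    r"^(\w{3}\s+\d+) (\d\d:\d\d:\d\d) \S+ genunix: "
    r"\[ID \d+ kern\.warning\] WARNING:\s+"
    r"i915_gem_ring_throttle: i915_wait_request request->seqno (\d+) now (\d+)",
    re.MULTILINE,
)

def throttle_warnings(text):
    """Return (day, time, seqno, now) tuples for each throttle warning."""
    return [(d, t, int(s), int(n)) for d, t, s, n in PATTERN.findall(text)]

def per_day(text):
    """Count throttle warnings per calendar day (e.g. 'Mar 16')."""
    return Counter(day for day, _, _, _ in throttle_warnings(text))
```

Feeding it the contents of /var/adm/messages gives a per-day count, which makes it easier to say whether the freezes cluster around particular sessions.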

The symptom of the "freeze" is that the GUI freezes completely for a few seconds. The mouse 
pointer does not move (even if I try), nothing changes, etc. For example, if I start gnome-terminal 
and the freeze happens during its start, I see the terminal window but its background is gray and 
not yet painted black as it should be. When the GUI unfreezes, the terminal 
repaints normally and continues its initialization. It looks like the freeze happens only 
during some GUI operation...

My question now is whether this is already a well-known issue or whether I should report 
it somewhere.

Thanks,
Karel
  


--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG

___
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org


Re: [osol-discuss] Need help debuggin hang issue on build 134

2010-03-23 Thread Brian Ruthven - Sun UK



Ronny Egner wrote:

ff008bfe0480 unix:die+dd ()
ff008bfe0590 unix:trap+177b ()
ff008bfe05a0 unix:cmntrap+e6 ()
ff008bfe0690 0 ()
ff008bfe06b0 unix:debug_enter+38 ()
ff008bfe06d0 unix:abort_sequence_enter+35 ()
ff008bfe0720 kbtrans:kbtrans_streams_key+102 ()
ff008bfe0750 conskbd:conskbdlrput+e7 ()
ff008bfe07c0 unix:putnext+21e ()
ff008bfe0800 kbtrans:kbtrans_queueevent+7c ()
ff008bfe0830 kbtrans:kbtrans_queuepress+7c ()
ff008bfe0870 kbtrans:kbtrans_untrans_keypressed_raw+46 ()
ff008bfe08a0 kbtrans:kbtrans_processkey+32 ()
ff008bfe08f0 kbtrans:kbtrans_streams_key+175 ()
ff008bfe0920 usbkbm:usbkbm_wrap_kbtrans+20 ()
ff008bfe0960 usbkbm:usbkbm_streams_callback+3c ()
ff008bfe09e0 usbkbm:usbkbm_unpack_usb_packet+2f6 ()
ff008bfe0a10 usbkbm:usbkbm_rput+84 ()
ff008bfe0a80 unix:putnext+21e ()
ff008bfe0ac0 hid:hid_interrupt_pipe_callback+7c ()
ff008bfe0b00 usba:usba_req_normal_cb+155 ()
ff008bfe0b60 usba:hcdi_do_cb+133 ()
ff008bfe0ba0 usba:hcdi_cb_thread+b2 ()
ff008bfe0c40 genunix:taskq_thread+248 ()
ff008bfe0c50 unix:thread_start+8 ()

syncing file systems...
 done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: all
  



So just judging from the stack trace ($C macro), the problem seems to be related 
to some kind of USB device?
Can anyone help me?
  


Your stack trace above is the direct result of dropping to the debugger, 
from handling the USB interrupt at the bottom of the trace, all the way 
up through the usbkbm and kbtrans drivers, up to dropping out of Solaris 
via the debug_enter() call. The remainder of the stack is the result of 
forcing the panic - a jmpl to a zero address triggers an instant bad 
trap, and the system panics through the normal mechanism. The USB device 
here is merely the messenger of the Alt-F1 sequence to drop to the 
debugger. This particular stack is not a problem in itself.



For your issue, you will need to see what other threads are doing in the 
system at the time you halted it.


[ Might be worth ruling out 
http://defect.opensolaris.org/bz/show_bug.cgi?id=12528 - I don't think 
it's the right bug, but it reared its head recently. I don't know 
whether this only affects the graphics card or the entire system... ]


You may need to be looking at what the IO layer was doing with the two 
internal disks at the time - IIRC, threads blocked in biowait() are a 
good starting point.


Regards,
Brian


--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG



Re: [osol-discuss] using digest with stdin command line input.

2010-02-26 Thread Brian Ruthven - Sun UK

You were so close  :-(

$ echo "This is an input string" | digest -v -a md5
md5 35e1332bf506d35c7c9f1c2b0cb39d4e
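
For comparison, the same "hash whatever arrives on stdin" idea can be sketched in Python. This is an illustration rather than a replacement for digest(1): the helper name `md5_stream` is hypothetical, and the point to note is that the hash covers every byte of the stream, including the trailing newline that echo appends.

```python
import hashlib

def md5_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks and return the hex digest."""
    h = hashlib.md5()
    # iter() with a sentinel keeps reading until read() returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()
```

Calling `md5_stream(sys.stdin.buffer)` in a script (after `import sys`) hashes piped input; a string with and without the trailing newline produces different digests.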


Brian


Morten Gulbrandsen wrote:

How can I, in bash, pass a string to digest on stdin for hashing?

bash-3.00$ digest  -v  -a  md5  "This is an input string."
digest: can not open input file This is an input string.

I did observe that the redirect operators are not displayed in the preview, so 
I attach my example file.
  





--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG


Re: [osol-discuss] SSH

2010-02-25 Thread Brian Ruthven - Sun UK


Which OS is this?
It's presumably not opensolaris (or perhaps a really old release) given 
the version of bash.

It's also not Sun's SSH client nor server. You may be on your own here.

If this is Solaris 8 through 10 and you are using Sun's ssh client, then 
please log a support call through your contract.


Regards,
Brian


Gavin Gore wrote:

Can anyone tell me what the hell "SshReadLine/sshreadline.c:2336: Initializing 
ReadLine..." is and why is it taking so long for a response??

When I ssh from one machine (which used to respond quickly), I get a 
one- to two-minute wait before I get a login prompt!

When I do ssh -vvv r...@flint I get;

bash-2.03$ ssh -vvv r...@flint
debug: Connecting to flint, port 22... (SOCKS not used)
debug: Ssh2Transport/trcommon.c:3823: My version: 
SSH-2.0-ReflectionForSecureIT_6.1.0.16
debug: client supports 3 auth methods: 'publickey,keyboard-interactive,password'
debug: Ssh2Common/sshcommon.c:497: local ip = 172.26.235.138, local port = 36662
debug: Ssh2Common/sshcommon.c:499: remote ip = 149.254.0.60, remote port = 22
debug: SshConnection/sshconn.c:1975: Wrapping...
debug: SshReadLine/sshreadline.c:2336: Initializing ReadLine...
 
[LONG PAUSE HERE  1minute]
 
debug: Remote version: SSH-2.0-ReflectionForSecureIT_6.1.0.16

debug: Ssh2Transport/trcommon.c:1422: lang s to c: `', lang c to s: `'
debug: Ssh2Transport/trcommon.c:1488: c_to_s: cipher aes128-ctr, mac hmac-sha1, 
compression none
debug: Ssh2Transport/trcommon.c:1491: s_to_c: cipher aes128-ctr, mac hmac-sha1, 
compression none
debug: Remote host key found from database.
debug: Ssh2Common/sshcommon.c:310: Received SSH_CROSS_STARTUP packet from 
connection protocol.
debug: Ssh2Common/sshcommon.c:360: Received SSH_CROSS_ALGORITHMS packet from 
connection protocol.
debug: server offers auth methods 'publickey,password'.
debug: Ssh2AuthPubKeyClient/authc-pubkey.c:1530: Starting pubkey auth...
debug: Ssh2AuthPubKeyClient/authc-pubkey.c:1487: Agent is not running.
debug: Ssh2AuthPubKeyClient/authc-pubkey.c:1336: Got 0 keys from the agent.
debug: SshUnixUserFiles/sshunixuserfiles.c:255: Using 
'/export/home/ggore/.ssh2/identification' as identity file.
debug: SshConfig/sshconfig.c:2913: Unable to open 
/export/home/ggore/.ssh2/identification
debug: Ssh2AuthPubKeyClient/authc-pubkey.c:1316: Trying 0 key candidates.
debug: Ssh2AuthPubKeyClient/authc-pubkey.c:784: All keys declined by server, 
disabling method.
debug: Ssh2AuthClient/sshauthc.c:333: Method 'publickey' disabled.
debug: server offers auth methods 'publickey,password'.
debug: Ssh2AuthPasswdClient/authc-passwd.c:108: Starting password auth...
root's password:
  


--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG



[osol-discuss] 320Gb FAT32 drive: Solaris reports 327M used despite containing ~20Gb of data

2010-02-11 Thread Brian Ruthven - Sun UK


I have a Western Digital "My Passport" drive (the 320Gb variety) which 
came pre-formatted with FAT32. Having used this drive a bit under 
Windows, when I look on a Solaris system, it claims that only 327M is 
used, despite containing about 20Gb of data:


# df -h /media/My\ Passport
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c2t0d0p0:1    298G   327M   298G     1%    /media/My Passport

# du -hds /media/My\ Passport
 21G   /media/My Passport

Windows Vista shows the expected figures for capacity, free and used, 
and the "check for errors" reports cleanly.


Has anybody got any suggestions before I log a bug on this?
The above output is taken from an snv_128 (SXCE) system, but I also see 
the same on snv_132 OpenSolaris.
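
The df/du mismatch can also be reproduced outside the shell. The sketch below compares the filesystem's own usage counters (what df reads) with a walk of the file tree (roughly what du does). The function names are made up for illustration, and the tree walk sums file lengths rather than allocated blocks, so small differences from du are expected.

```python
import os

def tree_bytes(root):
    """Sum the sizes of files under root (an approximation of du)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.lstat(path).st_size
    return total

def fs_used_bytes(mountpoint):
    """Used space according to the filesystem's counters (what df reports)."""
    st = os.statvfs(mountpoint)
    return (st.f_blocks - st.f_bfree) * st.f_frsize
```

If `tree_bytes` reports tens of gigabytes while `fs_used_bytes` reports a few hundred megabytes for the same mount point, the discrepancy lies in the counters the filesystem driver returns, not in the files themselves.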



Thanks,
Brian

--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG



Re: [osol-discuss] metastat vs raidctl

2010-02-10 Thread Brian Ruthven - Sun UK


Hi Bill,

This is hardware raid vs. software raid, and you need to understand and 
use the correct tool to administer each one.


Hardware raid uses a dedicated controller (managed via raidctl) which 
mirrors the disk and presents a single "disk" for the OS to use. The OS 
must have a driver for the raid card, but otherwise does not have to 
care what the underlying device is (mirror, stripe, raid5, etc).


Software raid uses a kernel module to do the mirroring. You can still 
see the individual drives (c1t0d0 and c1t1d0 in your output below), but 
you access it using the top level metadevice d6. SVM will make sure any 
writes to d6 are written to both d16 and d26, and hence to c1t0d0s6 and 
c1t1d0s6.


The warning about "no databases" from metastat is the first clue that it 
is not using software raid. If raidctl produces output (other than an 
obvious error message), then you are probably using hardware raid.
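
The rule of thumb above can be written down as a small check. This is a rough sketch only: it works on captured command output, the function name is hypothetical, and the two marker strings are assumptions taken from the messages quoted in this thread ("No RAID volumes found" and "no existing databases"), so other releases may word them differently.

```python
def guess_raid_type(raidctl_out, metastat_out):
    """Guess 'hardware', 'software', 'both' or 'none' from command output."""
    # raidctl listing at least one volume suggests a hardware RAID controller.
    hw = bool(raidctl_out.strip()) and "No RAID volumes found" not in raidctl_out
    # metastat showing metadevices (no "databases" error) suggests SVM.
    sw = bool(metastat_out.strip()) and "no existing databases" not in metastat_out
    if hw and sw:
        return "both"
    if hw:
        return "hardware"
    if sw:
        return "software"
    return "none"
```

For the V245 output quoted below, this returns "software", which means metastat and metareplace are the tools to use there, not raidctl.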


Hope that helps,
Brian



Bill Ward wrote:

Good day,
We had one of our V245s with RAID 1 mirroring appear to lose a drive. metastat 
returns the following:
bash-3.00# raidctl
No RAID volumes found
bash-3.00# metastat
d6: Mirror
Submirror 0: d16
  State: Needs maintenance
Submirror 1: d26
  State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 56894016 blocks (27 GB)

d16: Submirror of d6
State: Needs maintenance
Invoke: metareplace d6 c1t0d0s6 
Size: 56894016 blocks (27 GB)
Stripe 0:
Device     Start Block  Dbase        State Reloc Hot Spare
c1t0d0s6          0     No     Maintenance   Yes


d26: Submirror of d6
State: Okay
Size: 56894016 blocks (27 GB)
Stripe 0:
Device     Start Block  Dbase        State Reloc Hot Spare
c1t1d0s6          0     No            Okay   Yes

[elided]

Device Relocation Information:
Device   Reloc  Device ID
c1t1d0   Yes    id1,s...@n50e01579f140
c1t0d0   Yes    id1,s...@n50e0157a12f0
bash-3.00# raidctl
No RAID volumes found

But as you can see, raidctl shows no RAID.

On our T2000 or T5220 systems that are set up with mirroring, raidctl does show a 
volume, but metastat returns an error that there are no existing databases.

Any reason why?
thanks
  


--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG



Re: [osol-discuss] Java hang on keyboard entry

2010-02-09 Thread Brian Ruthven - Sun UK


Sounds like http://defect.opensolaris.org/bz/show_bug.cgi?id=13982
Try the workaround and see if it makes any difference ->
   http://defect.opensolaris.org/bz/show_bug.cgi?id=13982#c8

Brian


daniel goulder wrote:

Hi

I have been running OpenSolaris development builds since b126 and have recently 
updated to both 131 and 132

when I upgraded to 131 I noticed that jconsole would hang momentarily whenever 
I made any keyboard input (for example, typing a host to connect to over JMX).

However when I used jvisualvm this behaviour would not occur.

Now I have upgraded to 132 and this behaviour has got worse.  It now occurs on 
both jconsole and jvisualvm.

Does anyone have any idea how I can diagnose this?  I was using the compiz 
window manager and I disabled that to see if it made any difference, but it 
didn't.

TIA

Danny
  


--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG



Re: [osol-discuss] [laptop-discuss] Toshiba M10 - 111 to 131 upgrade issues - GDM fails, blank screen

2010-01-28 Thread Brian Ruthven - Sun UK



Bruce wrote:

On Thu, 28 Jan 2010 14:05:41 +, Brian Ruthven - Sun UK wrote:

  

Bruce Porter wrote:


Hi,

A while ago I tried to jump from 111b to something around 129, but I ran
into:
http://defect.opensolaris.org/bz/show_bug.cgi?id=12380





I can actually get as far as 129 (but slow boot and login puts me off),
my real big problems start at 130 (the blank screen actually becomes a
hard reset when I leave it alone).

I can boot to CLI with gdm disabled

My logs show nothing that stands out
(Available if anyone is interested)
  
  
  

I've got a note in my /etc/motd saying I copied the i915 and drm modules
from b129 for a similar issue. I sadly didn't note the bug number, but I
suspect 6912996 could be the problem for the slow X startup (which is
marked as fixed in b132).
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6912996

The hard reset issue could be 6914386. Do you have a crash dump in
/var/crash (you might need to enable savecore with "dumpadm -y")
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6914386




This is looking more and more like the real big issue, I'll generate a
crashdump

I assume it is bad form to post the entire thing to the NG/Forum ? :-)
  


Er, perhaps :-)

If you could at least give us the panic message and the stack trace, 
that would be a start. Compare them with the bugs I mentioned above and 
see if you find a match.


Brian

  

HTH
Brian







I also upgraded through several builds. If your problem is not
related to graphics, please check your /etc/minor_perm and
/etc/driver_aliases, compare them with the originals in your previous boot
environment, and fix any entries in those files if needed.





My minor perms show a lot of differences, short of doing a pot luck
edit/merge is there anything in particular that could be an issue ?

---8<
br...@ytclaptop:/mnt/etc$ diff minor_perm /etc/minor_perm
1a2
> clone:dnet 0666 root sys
2a4
> clone:elxl 0666 root sys
3a6
> clone:ibd 0666 root sys
4a8
> clone:iprb 0666 root sys
5a10
> clone:pcelx 0666 root sys
6a12
> clone:spwr 0666 root sys
14a21,23
> clone:llc1 0666 root sys
> clone:loop 0666 root sys
> clone:ptmx 0666 root sys
17a27,32
> clone:ticlts 0666 root sys
> clone:ticots 0666 root sys
> clone:ticotsord 0666 root sys
> clone:tidg 0666 root sys
> clone:tivc 0666 root sys
> clone:tmux 0666 root sys
69a85
> md:admin 644 root sys
105a122,148
> clone:bge 0666 root sys
> clone:igb 0666 root sys
> clone:ixgbe 0666 root sys
> clone:rge 0666 root sys
> clone:xge 0666 root sys
> clone:nge 0666 root sys
> clone:e1000g 0666 root sys
> clone:chxge 0666 root sys
> clone:pcn 0666 root sys
> clone:rtls 0666 root sys
> clone:ath 0666 root sys
> clone:vnic 0666 root sys
> clone:ipw 0666 root sys
> clone:iwh 0666 root sys
> clone:iwi 0666 root sys
> clone:iwk 0666 root sys
> clone:pcwl 0666 root sys
> clone:pcan 0666 root sys
> clone:ral 0666 root sys
> clone:rtw 0666 root sys
> clone:rum 0666 root sys
> clone:ural 0666 root sys
> clone:wpi 0666 root sys
> clone:zyd 0666 root sys
> clone:dmfe 0666 root sys
> clone:afe 0666 root sys
> clone:mxfe 0666 root sys
144a188
> amd_iommu:* 0644 root sys
152a197
> pem:* 0666 bin bin
161a207
> afe:* 0666 root root
183a230
> emlxs:* 0600 root sys
187a235
> qlc:* 0600 root sys
192a241,242
> pcmem:* 0666 bin bin
> pcram:* 0666 bin bin
193a244
> e1000g:* 0666 root root
210a262
> sd:* 0640 root sys
238,256d289
< iptunq:* 0640 root sys
< simnet:* 0666 root sys
< clone:simnet 0666 root sys
< clone:bridge 0666 root sys
< sd:* 0640 root sys
< bpf:bpf 0666 root sys
< qlge:* 0666 root sys
< audio:* 0666 root sys
< md:admin 0644 root sys
< amd_iommu:* 0644 root sys
< hwahc:* 0644 root sys
< hwarc:* 0644 root sys
< wusb_ca:* 0666 root sys
< wusb_df:* 0666 root sys
< oce:* 0666 root sys
< dlpistub:* 0666 root sys
< qlc:* 0666 root sys
< emlxs:* 0666 root sys
< clone:ptmx 0666 root sys
---8<



  

Thanks,

Vita




On Mon, 25 Jan 2010, Bruce Porter wrote:




Hi Bruce...

This was certainly the relevant bug for me on my M10 with Intel graphics,
the new libraries supplied in that CR have me up and running on 131 with a
few other issues but at least it is usable.



I am not seeing a link to the libraries ?

Any clue to the "other" issues"


  

HTH
-George


On 25/01/2010 01:04 PM, Bruce Porter wrote:



Ok, did the upgrade this morning

Re: [osol-discuss] [laptop-discuss] Toshiba M10 - 111 to 131 upgrade issues - GDM fails, blank screen

2010-01-28 Thread Brian Ruthven - Sun UK



Bruce Porter wrote:

Hi,

A while ago I tried to jump from 111b to something
around 129, but I ran into:
http://defect.opensolaris.org/bz/show_bug.cgi?id=12380




I can actually get as far as 129 (but slow boot and login puts me off), my real 
big problems start at 130 (the blank screen actually becomes a hard reset when 
I leave it alone).

I can boot to CLI with gdm disabled

My logs show nothing that stands out
(Available if anyone is interested)
  


I've got a note in my /etc/motd saying I copied the i915 and drm modules 
from b129 for a similar issue. I sadly didn't note the bug number, but I 
suspect 6912996 could be the problem for the slow X startup (which is 
marked as fixed in b132).

   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6912996

The hard reset issue could be 6914386. Do you have a crash dump in 
/var/crash (you might need to enable savecore with "dumpadm -y")

   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6914386


HTH
Brian





  

I also upgraded through several builds. If your problem is not
related to graphics, please check your /etc/minor_perm and
/etc/driver_aliases, compare them with the originals in your previous boot
environment, and fix any entries in those files if needed.




My minor perms show a lot of differences, short of doing a pot luck edit/merge 
is there anything in particular that could be an issue ?

---8<
br...@ytclaptop:/mnt/etc$ diff minor_perm /etc/minor_perm
1a2
> clone:dnet 0666 root sys
2a4
> clone:elxl 0666 root sys
3a6
> clone:ibd 0666 root sys
4a8
> clone:iprb 0666 root sys
5a10
> clone:pcelx 0666 root sys
6a12
> clone:spwr 0666 root sys
14a21,23
> clone:llc1 0666 root sys
> clone:loop 0666 root sys
> clone:ptmx 0666 root sys
17a27,32
> clone:ticlts 0666 root sys
> clone:ticots 0666 root sys
> clone:ticotsord 0666 root sys
> clone:tidg 0666 root sys
> clone:tivc 0666 root sys
> clone:tmux 0666 root sys
69a85
> md:admin 644 root sys
105a122,148
> clone:bge 0666 root sys
> clone:igb 0666 root sys
> clone:ixgbe 0666 root sys
> clone:rge 0666 root sys
> clone:xge 0666 root sys
> clone:nge 0666 root sys
> clone:e1000g 0666 root sys
> clone:chxge 0666 root sys
> clone:pcn 0666 root sys
> clone:rtls 0666 root sys
> clone:ath 0666 root sys
> clone:vnic 0666 root sys
> clone:ipw 0666 root sys
> clone:iwh 0666 root sys
> clone:iwi 0666 root sys
> clone:iwk 0666 root sys
> clone:pcwl 0666 root sys
> clone:pcan 0666 root sys
> clone:ral 0666 root sys
> clone:rtw 0666 root sys
> clone:rum 0666 root sys
> clone:ural 0666 root sys
> clone:wpi 0666 root sys
> clone:zyd 0666 root sys
> clone:dmfe 0666 root sys
> clone:afe 0666 root sys
> clone:mxfe 0666 root sys
144a188
> amd_iommu:* 0644 root sys
152a197
> pem:* 0666 bin bin
161a207
> afe:* 0666 root root
183a230
> emlxs:* 0600 root sys
187a235
> qlc:* 0600 root sys
192a241,242
> pcmem:* 0666 bin bin
> pcram:* 0666 bin bin
193a244
> e1000g:* 0666 root root
210a262
> sd:* 0640 root sys
238,256d289
< iptunq:* 0640 root sys
< simnet:* 0666 root sys
< clone:simnet 0666 root sys
< clone:bridge 0666 root sys
< sd:* 0640 root sys
< bpf:bpf 0666 root sys
< qlge:* 0666 root sys
< audio:* 0666 root sys
< md:admin 0644 root sys
< amd_iommu:* 0644 root sys
< hwahc:* 0644 root sys
< hwarc:* 0644 root sys
< wusb_ca:* 0666 root sys
< wusb_df:* 0666 root sys
< oce:* 0666 root sys
< dlpistub:* 0666 root sys
< qlc:* 0666 root sys
< emlxs:* 0666 root sys
< clone:ptmx 0666 root sys
---8< 
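
A raw diff like the one above mostly shows entries that moved position in the file. To pick out only the entries whose permissions actually changed, the two files can be compared by key. The following is a minimal sketch, assuming the four-field "name mode owner group" layout visible in the diff; the function names are made up for illustration.

```python
def parse_minor_perm(text):
    """Map 'driver:minor' to (mode, owner, group); skip blanks and comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # not a 'name mode owner group' entry
        name, mode, owner, group = fields
        # Parse the mode as octal so that '644' and '0644' compare equal.
        entries[name] = (int(mode, 8), owner, group)
    return entries

def changed_entries(old_text, new_text):
    """Return {name: (old, new)} for entries in both files that differ."""
    old = parse_minor_perm(old_text)
    new = parse_minor_perm(new_text)
    return {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}
```

Applied to the two files diffed above, this would flag entries such as qlc:* and emlxs:* (0666 in one file, 0600 in the other) while ignoring md:admin, where 644 and 0644 are the same mode. If a name appears more than once in a file, this sketch keeps the last occurrence.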



  

Thanks,

Vita




On Mon, 25 Jan 2010, Bruce Porter wrote:



Hi Bruce...

This was certainly the relevant bug for me on my M10 with Intel graphics,
the new libraries supplied in that CR have me up and running on 131 with a
few other issues but at least it is usable.


I am not seeing a link to the libraries ?

Any clue to the "other" issues"

  

HTH
-George


On 25/01/2010 01:04 PM, Bruce Porter wrote:


Ok, did the upgrade this morning from 111b to 131.

1st off, boot time now appears to be a lot quicker
so an improvement over 124->130 from that POV :-)

Unfortunately I still cannot get the GUI up (as has
been the way since attempts to load 130).

The system boots, GDM starts up, screen goes blank
with cursor in top left. Eventually the entire screen
blanks (no sign of cursor), so I have to reboot.
Ctrl/Alt/Backspace makes no difference.

I have booted the system into single user and
disabled GDM so I can get at any logs that may be
needed to help resolve.

Back to working in 111b.

Is this the relevant bug ?

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6912996
___
laptop-discuss mailing list
laptop-disc...@opensolaris.org



Re: [osol-discuss] unable to upgrade from b130 to b131?

2010-01-25 Thread Brian Ruthven - Sun UK


This sounds like the /contrib dependency bug noted in the email 
announcing build 131 (and previous builds too):


http://defect.opensolaris.org/bz/show_bug.cgi?id=13233

If you have packages from the /contrib repo, then check the list 
attached to that bug report. If you have any of them, uninstall them 
prior to running the upgrade and re-install them afterwards (if you need 
them).


Regards,
Brian

solarg wrote:

On 01/25/10 09:48 AM, Alex Smith (K4RNT) wrote:


Try "pfexec pkg image-upgrade" and see if that works.



unfortunately not:
he...@tara:~# pkg image-update
WARNING: pkg(5) appears to be out of date, and should be updated before
running image-update.  Please update pkg(5) using 'pfexec pkg install
SUNWipkg' and then retry the image-update.

As you can see in my previous message, it refuses to upgrade 
SUNWipkg; I don't understand why.





--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG



Re: [osol-discuss] 128a was the latest fairly stable build

2010-01-21 Thread Brian Ruthven - Sun UK



Brian Ruthven - Sun UK wrote:



Bruce wrote:

1) Progressively longer boot times
snv_111b 2.5 mins to login
snv_128 4.5 mins to login

  


I seem to recall a similar increase in build time,


sorry - I mean boot time.
Fingers working faster than brain...



and a FW update of the DVD drive (of all things!) sorted it out. It 
seems that the relevant driver (I don't recall offhand) was timing out 
on boot and after resume operations which stalled the loading of 
kernel modules for at least 60-90 seconds.


I checked the Toshiba website for any firmware updates (I was 
originally looking for any BIOS updates) and I came across the DVD 
update.


I realise it's a stab in the dark, but I hope it helps on one point.

Brian



--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG



Re: [osol-discuss] 128a was the latest fairly stable build

2010-01-21 Thread Brian Ruthven - Sun UK



Bruce wrote:

1) Progressively longer boot times
snv_111b 2.5 mins to login
snv_128 4.5 mins to login

  


I seem to recall a similar increase in build time, and a FW update of 
the DVD drive (of all things!) sorted it out. It seems that the relevant 
driver (I don't recall offhand) was timing out on boot and after resume 
operations which stalled the loading of kernel modules for at least 
60-90 seconds.


I checked the Toshiba website for any firmware updates (I was originally 
looking for any BIOS updates) and I came across the DVD update.


I realise it's a stab in the dark, but I hope it helps on one point.

Brian

--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG



Re: [osol-discuss] Reliable memory checking tool for C++ based applications?

2010-01-20 Thread Brian Ruthven - Sun UK


I've normally found libumem quite reliable (that's not to say the 
::findleaks is bug free, of course).


Silly question - do you know for certain there is a memory leak?

Brian


Karel Gardas wrote:

Hello,

while developing C++ based applications on OS 2009.06 I'm curious if there is 
any reliable tool I may use for checking memory leaks of my C++ 
libraries/applications. I've tried to use:

1) libumem/mdb -- this does not work, i.e. it reports no C++ memory leaks
2) librtc/dbx check -- this does not work, i.e. my program crashes quickly when 
librtc is preloaded, and its core file suggests it crashes on SIGSEGV directly in 
the RTC library
3) purify trial -- unsupported on SunOS 5.11

The strange, or perhaps interesting, thing is that I remember using libumem 
successfully on SXDE 1/08, back in its day, while developing the same software, 
but now it's a no-go.

Do you have any idea what to try and how?

Thanks!
Karel
  


--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG



Re: [osol-discuss] OpenSolaris snv_130 panic.

2010-01-14 Thread Brian Ruthven - Sun UK


sorry - 0xc000 is a bit over-zealous (it's what I had to use for a 
different problem).


Try 0x8000 instead. I'm just curious what the last module loaded 
was. I expected the majority of the output to scroll off the top of the 
screen - it's only the last few lines I was hoping to catch with a 
"loading module XXX" message. It does depend somewhat on how big the 
stack trace is though.


I'm not sure why kmdb is unusable, but it does ring a bell. Maybe I 
suffered that problem in the past. I don't recall what the fault was 
though :-(


Regards,
Brian


Juris Krumins wrote:

I've tried.
But strings are scrolling too fast (is it possible to do some kind of
paging), and the only thing I can see before "trap type 8 ..." is 



kobj_read_file: size 
kobj_close: 0x82

after that I have "trap type 8 ..." and 


panic: entering debugger 
Loaded modules: [ mac specfs ]
kmdb: target stopped at:
kmdb_enter+0xb: movq .


More than that, after I get "trap type 8 ..." I can't use kmdb because
it's unresponsive.
Maybe you can suggest some tricks (commands for kmdb) to get all the
info, because I'm not that experienced with kmdb.

 
-Original Message-

From: Brian Ruthven - Sun UK 
To: Juris Krumins 
Cc: opensolaris-discuss@opensolaris.org
Subject: Re: [osol-discuss] OpenSolaris snv_130 panic.
Date: Thu, 14 Jan 2010 11:12:53 +


Juris Krumins wrote:
  

panic[cpu0]/thread = fbc2e3a0: mutex_enter: bad mutex, lp=20 
owner=f000e987f000fea0 thread=fbc2e3a0

unix: mutex_panic + 73 ()
unix: mutex_vector_enter +446()
genunix: zone_getspecific+2b ()
genunix: core+5f ()
unix: kern_gpfault+18e ()
unix: trap+41e ()
unix: cmntrap + e6 ()


I've dug a little through src.opensolaris.org and it seems to me that mutex_panic comes from the dispinit() call at startup.c:1517.
dispinit() then calls mutex_enter(&cpu_lock) at disp.c:221, which is the cause of the mutex_panic().
  



That's not quite right. The call to mutex_panic is from 
mutex_vector_enter, and is caused by a "NULL" pointer being passed to 
mutex_enter (actually the value 0x20, but this was probably from 
dereferencing a struct member at offset 0x20 from a NULL pointer).


I'm intrigued that the mutex address (the "lp=" part in the panic 
message) indicates what should be unmapped memory, and I would have 
expected it to panic with a page fault (or whatever the x86 equivalent 
of a BAD TRAP type 0x31 is). Instead, it appears to have found the value 
0xf000e987f000fea0 in there, which failed the validation by 
mutex_vector_enter.


There are three possible calls to mutex_panic from mutex_vector_enter, 
but only one causes "bad mutex":


if (!MUTEX_TYPE_ADAPTIVE(lp)) {
        mutex_panic("mutex_enter: bad mutex", lp);
        return;
}


Anyway, I digress somewhat. The problem is further down the stack - 
zone_getspecific attempts to acquire zone_lock:



void *
zone_getspecific(zone_key_t key, zone_t *zone)
{
        struct zsd_entry *t;
        void *data;

        mutex_enter(&zone->zone_lock);


Sure enough, zone_lock is at offset 0x20:

 > ::print -a zone_t zone_lock
20 zone_lock {
20 zone_lock._opaque
}


So "zone" was NULL, and the caller of zone_getspecific passed in a NULL 
- this was core().


Rather worryingly, if I look at the core() function on my system (albeit 
snv_128), just before the return address on the stack I see:


 > core::dis
[snip]
core+0x4d:  movl   +0x3255d5(%rip),%edi 

core+0x53:  movq   +0x2dbba6(%rip),%rsi 


core+0x5a:  call   +0x187701
core+0x5f:  movq   %rax,%r15

This corresponds to the first call to zone_getspecific:

global_cg = zone_getspecific(core_zone_key, global_zone);

So it would seem that the global zone pointer 'global_zone' was not yet 
initialised when you took this panic. It starts life as a NULL and is 
initialised by zone_init, which happens very early in boot. I don't know 
offhand what user-land processes have been started by then, but I'm 
prepared for the answer to be "none". In which case, why is a trap taken 
in user mode?


Can you try this next time please:

Add the "-kd" option to the kernel$ line as you did before.
At the kmdb prompt, type:
moddebug/W 0xe000
:c

Moddebug at this value will cause much to be printed out. See what the 
last "loading XXX" message was just before the "trap type 8..." message.
Also, gather the output of ::ps to check what user-land processes are 
running. I suspect the list will be short :-)




Thanks,
Brian



  


--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG

___
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org

Re: [osol-discuss] OpenSolaris snv_130 panic.

2010-01-14 Thread Brian Ruthven - Sun UK



Juris Krumins wrote:

panic[cpu0]/thread = fbc2e3a0: mutex_enter: bad mutex, lp=20 
owner=f000e987f000fea0 thread=fbc2e3a0

unix: mutex_panic + 73 ()
unix: mutex_vector_enter +446()
genunix: zone_getspecific+2b ()
genunix: core+5f ()
unix: kern_gpfault+18e ()
unix: trap+41e ()
unix: cmntrap + e6 ()


I've dug a little through src.opensolaris.org and it seems to me that mutex_panic comes from the dispinit() function call at startup.c:1517. 
dispinit() then calls mutex_enter(&cpu_lock) at disp.c:221, which is the cause of the mutex_panic().
  


That's not quite right. The call to mutex_panic is from 
mutex_vector_enter, and is caused by a "NULL" pointer being passed to 
mutex_enter (actually the value 0x20, but this was probably from 
dereferencing a struct member at offset 0x20 from a NULL pointer).


I'm intrigued that the mutex address (the "lp=" part in the panic 
message) indicates what should be unmapped memory, and I would have 
expected it to panic with a page fault (or whatever the x86 equivalent 
of a BAD TRAP type 0x31 is). Instead, it appears to have found the value 
0xf000e987f000fea0 in there, which failed the validation by 
mutex_vector_enter.


There are three possible calls to mutex_panic from mutex_vector_enter, 
but only one causes "bad mutex":


   if (!MUTEX_TYPE_ADAPTIVE(lp)) {
   mutex_panic("mutex_enter: bad mutex", lp);
   return;
   }


Anyway, I digress somewhat. The problem is further down the stack - 
zone_getspecific attempts to acquire zone_lock:



void *
zone_getspecific(zone_key_t key, zone_t *zone)
{
   struct zsd_entry *t;
   void *data;

   mutex_enter(&zone->zone_lock);


Sure enough, zone_lock is at offset 0x20:

> ::print -a zone_t zone_lock
20 zone_lock {
   20 zone_lock._opaque
}


So "zone" was NULL, and the caller of zone_getspecific passed in a NULL 
- this was core().


Rather worryingly, if I look at the core() function on my system (albeit 
snv_128), this is what I see just before the return address on the stack:


> core::dis
[snip]
core+0x4d:  movl   +0x3255d5(%rip),%edi 

core+0x53:  movq   +0x2dbba6(%rip),%rsi 


core+0x5a:  call   +0x187701
core+0x5f:  movq   %rax,%r15

This corresponds to the first call to zone_getspecific:

   global_cg = zone_getspecific(core_zone_key, global_zone);

So it would seem that the global zone pointer 'global_zone' was not yet 
initialised when you took this panic. It starts life as a NULL and is 
initialised by zone_init, which happens very early in boot. I don't know 
offhand what user-land processes have been started by then, but I'm 
prepared for the answer to be "none". In which case, why is a trap taken 
in user mode?


Can you try this next time please:

   Add the "-kd" option to the kernel$ line as you did before.
   At the kmdb prompt, type:
   moddebug/W 0xe000
   :c

Moddebug at this value will cause much to be printed out. See what the 
last "loading XXX" message was just before the "trap type 8..." message.
Also, gather the output of ::ps to check what user-land processes are 
running. I suspect the list will be short :-)




Thanks,
Brian



Re: [osol-discuss] PKG - dev repository

2010-01-05 Thread Brian Ruthven - Sun UK


This has been discussed before on this list (I don't have a reference 
handy, but Google should be your friend here, and I think it was Nick 
Todd who posted it).


From memory, the steps are something like this:

beadm create snv-126
beadm mount snv-126 /mnt
pkg install -R /mnt ent...@0.5.11,5.11-0.126
bootadm update-archive -R /mnt
beadm unmount snv-126
beadm activate snv-126
reboot


I don't recall whether the update-archive is necessary (I think it is), 
but it wouldn't hurt anyway.


Regards,
Brian



Vinicius Segantin Viteri wrote:

Hi,

I configured my repository to http://pkg.opensolaris.org/dev/ and
whenever I use "pkg image-update" the system is upgraded to the
latest version (snv_130 now). But I don't want snv_130. Is there any
configuration in pkg that I can set to upgrade OpenSolaris snv_111
(release) to snv_126, for example?

Thanks,
Vinicius Viteri
  




Re: [osol-discuss] Here's some more info for bugster CR 6913752

2010-01-05 Thread Brian Ruthven - Sun UK



andrew wrote:

It doesn't have an option to use external power, however I'm assuming I can 
power it off a USB port on another PC. I'll give that a try tomorrow.

I can tell you that the power is shut off to the USB port since the eSATA 
enclosure has a blue light that is illuminated when power is applied, and it 
goes out. Windows 7 leaves it powered when it suspends to disk (booted off the 
internal drive) whereas OpenSolaris doing an S3 suspend powers it down. It has 
no real reason to keep the USB port powered though, since it doesn't know 
that's how the drive is being powered.

  


The paranoid part of me says this could be a bad idea (floating voltage 
levels between the two PCs' power supplies, etc...). An alternative 
could be to plug in a powered USB hub to this system, then power the 
drive from the powered hub.


Brian


Re: [osol-discuss] devfsadm:driver failed to attach:

2010-01-04 Thread Brian Ruthven - Sun UK



samir kumar mishra wrote:


* The .conf file is missing or wrongly placed.



I believe the conf file should be under /usr/kernel/drv instead of where 
you have it, although this is through observation rather than anything else.


You may also have to place the actual file in /usr/kernel/drv - I would 
guess that symlinks may be disallowed to avoid inadvertently loading 
untrusted modules.


Regards,
Brian


Re: [osol-discuss] PXE boot failed with "Error 25: Disk read error"

2009-12-17 Thread Brian Ruthven - Sun UK


Hi Pradeep,

Please log a support call with your local Solution Centre for this. They 
will be able to assist with troubleshooting your S10 install issues.


Regards,
Brian


Pradeep wrote:

Hi All,
   I am installing Solaris over PXE using a JumpStart server.
It got through the initial tftpboot, and after a while, when loading the 
module, it failed with this error:


  module$ /I86PC.Solaris_10-1/$ISADIR/x86.miniroot
loading   module$ /I86PC.Solaris_10-1/$ISADIR/x86.miniroot
Error 25: Disk read error
Attempting to jumpstart a FLAR.


Here is the configuration of my JumpStart server.
This is Solaris 10 update 8.

uname -a
SunOS wheeler 5.10 Generic_141445-09 i86pc i386 i86pc
-bash-3.00# pntadm -P 192.168.144.0

Client ID       Flags   Client IP         Server IP       Lease Expiration   Macro     Comment

0100139703012D  00      192.168.144.146   192.168.144.6   12/18/2009         wheeler
-bash-3.00# 
-bash-3.00# cat /etc/release 
   Solaris 10 10/09 s10x_u8wos_08a X86

   Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
   Assembled 16 September 2009
 cat /tftpboot/menu.lst.0100139703012D 
default=0

timeout=30
min_mem64 1024
title Solaris_10 install
kernel$ /I86PC.Solaris_10-1/multiboot kernel/$ISADIR/unix -B 
install_media=192.168.144.6:/export/install
module$ /I86PC.Solaris_10-1/$ISADIR/x86.miniroot
  




Re: [osol-discuss] Finding solaris version information in code

2009-12-17 Thread Brian Ruthven - Sun UK


Why do you want/need to do this? What problem are you trying to solve here?

Regards,
Brian

Shreyas Bhatewara wrote:

Folks,

I am writing a device driver for current and past versions of (Open)Solaris

How can I check the version of the running kernel in the driver code? I tried 
using utsname.version (/.release) but it does not exist for Solaris 10 or older 
kernels. Is there any other literal/variable which spans current and old 
Solaris kernels and gives the kernel version/release information? For example, 
Linux has LINUX_VERSION_CODE.

Thanks in advance.
->Shreyas
  




Re: [osol-discuss] Build 129 image-update catch-22

2009-12-16 Thread Brian Ruthven - Sun UK
If you have installed packages from the /contrib repo, then you may be 
hitting one of the bugs below, and will need to uninstall them.


http://defect.opensolaris.org/bz/show_bug.cgi?id=13241 which is caused by
http://defect.opensolaris.org/bz/show_bug.cgi?id=13233

Cheers,
Brian

Brian Utterback wrote:

I don't know if this is expected or not. I was at build 128 and I
wanted to do an image update to build 129. When I ran the
image-update, it came back and said that before I could run
image-update, I needed to update package SUNWipkg. So I tried to
update SUNWipkg as it said, but then it came back and said that it
could not update SUNWipkg because it required updating too many
packages and I would need a new BE to do it.

I suspect that somehow SUNWipkg got released with some dependencies
that it shouldn't have. I did manage to manually create a BE and
update that, but given that the process is automatic with the package
manager and image-update, it seems rather user unfriendly.
  




Re: [osol-discuss] in.mpathd[208]: [ID 997829 daemon.info] probe status 1 Fake probe reply seq

2009-12-15 Thread Brian Ruthven - Sun UK


Huawei,

Please take this to your local Sun support engineers by logging a call 
through your support contract. IPMP in OpenSolaris is completely 
different to that in Solaris 10, and these mailing lists are targeted 
at OpenSolaris.


Regards,
Brian


Raghu S P wrote:

Hello all,

I am getting above mentioned probe status messages in /var/adm/messages on 
Solaris 10.

I have found some bug IDs and pasted them here:

6378221 -- I feel the same kind of scenario is reported here.
6390530 -- I have not faced this scenario.

in.mpathd[208]: [ID 997829 daemon.info] probe status 1 Fake probe reply seq 14167 
snxt 14168 on nxge1 from < Cisco gate way>
Can anyone explain why these messages are being generated?
Is there any workaround?

Regards
Huawei--Born to win
  




Re: [osol-discuss] snv_111b x86 crashing

2009-12-11 Thread Brian Ruthven - Sun UK


Take a look for the word "panic" in /var/adm/messages. You may see a 
stack trace there if it was a panic. Post everything between the first 
"panic" line and the next instance of "syncing filesystems". This should 
give us the relevant information.
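
As a rough sketch of the extraction described above, something like the 
following could pull out just the first panic report. The function name 
and the default path are illustrative assumptions, and the end marker is 
matched loosely ("syncing file systems" or "syncing filesystems"):

```shell
#!/bin/sh
# Print the first panic report found in a syslog-style file: everything
# from the first line containing "panic" up to and including the next
# "syncing file systems" line (or to end of file if there is none).
# extract_panic is a hypothetical helper name, not a system command.
# On Solaris, nawk or /usr/xpg4/bin/awk may be the safer choice of awk.
extract_panic() {
    logfile=${1:-/var/adm/messages}
    awk '
        /panic/                          { found = 1 }  # report starts here
        found                            { print }
        found && /syncing file ?systems/ { exit }       # report ends here
    ' "$logfile"
}
```

Calling "extract_panic" with no argument reads /var/adm/messages; pass a 
rotated file such as /var/adm/messages.0 if the panic is older.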


Regards,
Brian

melbogia wrote:

Hello,
I have converted a couple of machines we had into OpenSolaris servers. They have 
15 disk slots, so we are using two MV88SX6081 8-port SATA II PCI-X controllers. I 
created a raidz2 pool from 13 disks (the first 2 are mirrored for rpool). I then 
tried to run a mkfile command to see what write speeds I get

mkfile 100g /datapool/testfile

12 GB into the operation the server crashes/reboots without any indication why.

The other machine is working fine and mkfile doesn't crash it. The hardware is 
almost identical with the exception of disks. 2TB Western Digital disks on that 
one versus 500 GB Seagate disks in the one that crashes.

How do I figure out what is crashing the server or go about fixing it? Any help 
is appreciated.
  




Re: [osol-discuss] Group lost its members

2009-12-10 Thread Brian Ruthven - Sun UK



Dr. Martin Mundschenk wrote:


System messages are recorded in /var/adm/messages on Solaris 
(assuming you've not directed them elsewhere in /etc/syslog.conf). 
Search them for the word "panic" or "savecore". Also, find the line 
which begins "SunOS Release" and look at the stuff before it. For 
example, this was a clean shutdown and reboot:




Here are the lines from /var/adm/messages from the time around the reboot:

Dec 10 01:17:48 iunis unix: [ID 836849 kern.notice] 
Dec 10 01:17:48 iunis ^Mpanic[cpu1]/thread=ff0008214c60: 
Dec 10 01:17:48 iunis genunix: [ID 335743 kern.notice] BAD TRAP: 
type=e (#pf Page fault) rp=ff0008214210 addr=0 occurred in module 
"sbp2" due to a NULL pointer dereference
Dec 10 01:17:48 iunis unix: [ID 10 kern.notice] 
Dec 10 01:17:48 iunis unix: [ID 839527 kern.notice] sched: 
Dec 10 01:17:48 iunis unix: [ID 753105 kern.notice] #pf Page fault
Dec 10 01:17:48 iunis unix: [ID 532287 kern.notice] Bad kernel fault 
at addr=0x0

...
Dec 10 01:17:48 iunis genunix: [ID 655072 kern.notice] 
ff0008214310 sbp2:sbp2_ses_remove_task_locked+2f ()
Dec 10 01:17:48 iunis genunix: [ID 655072 kern.notice] 
ff0008214350 sbp2:sbp2_ses_remove_task+38 ()
Dec 10 01:17:48 iunis genunix: [ID 655072 kern.notice] 
ff00082143b0 sbp2:sbp2_ses_submit_task+1a8 ()
Dec 10 01:17:48 iunis genunix: [ID 655072 kern.notice] 
ff00082143e0 scsa1394:scsa1394_sbp2_start+43 ()
Dec 10 01:17:48 iunis genunix: [ID 655072 kern.notice] 
ff0008214430 scsa1394:scsa1394_scsi_start+f1 ()
Dec 10 01:17:48 iunis genunix: [ID 655072 kern.notice] 
ff0008214470 scsi:scsi_transport+a7 ()




This looks like bug 6835533 ( 
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6835533 )

At least the mystery of the reboot can be solved :-)




/etc/group would be the place to check then. What is the timestamp on 
that file? Was it modified around the time your groups stopped working?


Well, good hint, but I modified the file this morning to reconfigure 
the groups. So the modification time has changed.


In that case, I guess I'm fresh out of ideas, unless you have any ZFS 
snapshots you can dig through to find an older copy. Did you enable 
time-slider on the root filesystem by any chance?


Regards,
Brian



Re: [osol-discuss] Group lost its members

2009-12-10 Thread Brian Ruthven - Sun UK



Dr. Martin Mundschenk wrote:


I think we need a bit more information here. You say the system 
rebooted but don't say why (the sendmail messages alone are not 
indicative of a reboot). Was this a panic?


Where do I get more information? At least there is nothing more in syslog.


System messages are recorded in /var/adm/messages on Solaris (assuming 
you've not directed them elsewhere in /etc/syslog.conf). Search them for 
the word "panic" or "savecore". Also, find the line which begins "SunOS 
Release" and look at the stuff before it. For example, this was a clean 
shutdown and reboot:


Nov 20 16:56:39 snv ntpd[618]: [ID 702911 daemon.notice] ntpd exiting on 
signal 15
Nov 20 16:56:39 snv /usr/dt/bin/ttsession[949]: [ID 649700 daemon.error] 
exiting

Nov 20 16:56:42 snv syslogd: going down on signal 15
Nov 20 16:56:42 snv rpcbind: [ID 564983 daemon.error] rpcbind 
terminating on signal.

Nov 20 16:57:07 snv genunix: [ID 672855 kern.notice] syncing file systems...
Nov 20 16:57:07 snv genunix: [ID 904073 kern.notice]  done
Nov 23 09:17:13 snv genunix: [ID 540533 kern.notice] ^MSunOS Release 
5.11 Version snv_126 64-bit
Nov 23 09:17:13 snv genunix: [ID 943908 kern.notice] Copyright 1983-2009 
Sun Microsystems, Inc.  All rights reserved.

Nov 23 09:17:13 snv Use is subject to license terms.
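
To look at "the stuff before" each boot banner without paging through the 
whole file, a small helper along these lines might do. The name 
before_boot and the default of 10 lines are assumptions for illustration:

```shell
#!/bin/sh
# Show the N lines logged before each "SunOS Release" boot banner, then
# the banner itself, so a clean shutdown can be told apart from a panic.
# before_boot is a hypothetical helper; N must be at least 1.
# Uses awk -v, so on Solaris run it with nawk or /usr/xpg4/bin/awk.
before_boot() {
    logfile=${1:-/var/adm/messages}
    count=${2:-10}
    awk -v n="$count" '
        /SunOS Release/ {
            # Replay the ring buffer, oldest line first.
            for (i = NR - n; i < NR; i++)
                if (i > 0) print buf[i % n]
            print
            print "----"
        }
        { buf[NR % n] = $0 }    # remember the last n lines
    ' "$logfile"
}
```

A clean shutdown should show the "syncing file systems... done" pair just 
above the banner, as in the example above; a panic will not.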


Incidentally, looking in /var/log/syslog (which is where I presume you 
started) I also see these three lines:


Nov 23 09:17:59 snv sendmail[554]: [ID 702911 mail.warning] 
gethostbyaddr(192.168.56.1) failed: 1
Nov 23 09:17:59 snv sendmail[633]: [ID 702911 mail.info] starting daemon 
(8.14.3+Sun): queue...@00:15:00
Nov 23 09:17:59 snv sendmail[634]: [ID 702911 mail.info] starting daemon 
(8.14.3+Sun): smtp+queue...@00:15:00


So these may be a red herring, and my guesswork above may not be valid  :-)
(maybe somebody with sendmail experience can explain to us why this IP 
address seems to crop up?)







My guess on the sendmail and groups issue is a name service problem.
Checking sendmail's source, the message prints the h_errno returned from 
gethostbyaddr which, from <netdb.h>, is HOST_NOT_FOUND.
Again, guessing, I would speculate that wherever 192.168.56.1 is 
defined is the same place as where the groups are defined, and they 
became inaccessible for some reason.


Well, 192.168.56.1 is no address in our home network and I don't know, 
why sendmail wants to contact it.


See above. This seems to be something in sendmail itself (or perhaps the 
default config).






Where are the missing groups defined?


Only  local, I guess in files (PAM?).


/etc/group would be the place to check then. What is the timestamp on 
that file? Was it modified around the time your groups stopped working?


Some more esoteric possibilities are that a ZFS rollback during the panic 
somehow undid your changes (unlikely), or that an upgrade had been 
completed and the new BE activated, but not yet rebooted. Changes then 
made to /etc/group would not be reflected in the new BE, and the panic 
would cause the system to boot from the new BE, etc...


Let's start with the panic message and the output from "beadm list".


Regards,
Brian






What name service are you using?
If NIS/NIS+/LDAP, are all the configured servers/replicas still 
running?


Not configured


Did somebody delete these entries from the name service?
Can any other hosts which use the same name service resolve these 
entries? e.g. Does "getent hosts 192.168.56.1" return anything?


no

What are the nsswitch.conf entries for hosts: and group: (and 
possibly netgroup: if you're using it).


passwd: files
group:  files
hosts:  files dns

Martin




  



Re: [osol-discuss] Group lost its members

2009-12-10 Thread Brian Ruthven - Sun UK


Hi Martin,

I think we need a bit more information here. You say the system rebooted 
but don't say why (the sendmail messages alone are not indicative of a 
reboot). Was this a panic?


My guess on the sendmail and groups issue is a name service problem.
Checking sendmail's source, the message prints the h_errno returned from 
gethostbyaddr which, from <netdb.h>, is HOST_NOT_FOUND.
Again, guessing, I would speculate that wherever 192.168.56.1 is defined 
is the same place as where the groups are defined, and they became 
inaccessible for some reason.
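
For reference, the number after "failed:" in the sendmail message can be 
decoded with a lookup like the one below. The four values are the 
standard h_errno codes from <netdb.h>; the function name is just an 
illustrative choice:

```shell
#!/bin/sh
# Translate a numeric h_errno (as printed by sendmail after
# "gethostbyaddr(...) failed:") into its <netdb.h> name.
# h_errno_name is a hypothetical helper, not an existing utility.
h_errno_name() {
    case "$1" in
        1) echo HOST_NOT_FOUND ;;   # no such host is known
        2) echo TRY_AGAIN ;;        # temporary failure, retry later
        3) echo NO_RECOVERY ;;      # non-recoverable name server error
        4) echo NO_DATA ;;          # name is valid but has no address
        *) echo "unknown h_errno: $1" ;;
    esac
}
```

So "failed: 1" is HOST_NOT_FOUND, which "getent hosts 192.168.56.1" 
should be able to confirm from the command line.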


Where are the missing groups defined?
What name service are you using?
   If NIS/NIS+/LDAP, are all the configured servers/replicas still running?
   Did somebody delete these entries from the name service?
   Can any other hosts which use the same name service resolve these 
entries? e.g. Does "getent hosts 192.168.56.1" return anything?
What are the nsswitch.conf entries for hosts: and group: (and possibly 
netgroup: if you're using it).


Regards,
Brian


Dr. Martin Mundschenk wrote:
I just figured out, that the system rebooted at 01:19 h in the night 
with no reason. The only entries in syslog at that time are:




Dec 10 01:20:45 iunis sendmail[553]: [ID 702911 mail.warning] 
gethostbyaddr(192.168.56.1) failed: 1
Dec 10 01:20:45 iunis sendmail[572]: [ID 702911 mail.info] starting 
daemon (8.14.3+Sun): smtp+queue...@00:15:00
Dec 10 01:20:45 iunis sendmail[571]: [ID 702911 mail.info] starting 
daemon (8.14.3+Sun): queue...@00:15:00



I wonder, why it spoiled my groups?!

Martin


Am 10.12.2009 um 10:28 schrieb Dr. Martin Mundschenk:


Hi!

This morning all members of a group vanished. The result was, that 
scripts could not be executed due to missing permissions.


What circumstances can lead to such a behavior?

Martin








  



Re: [osol-discuss] iconv in cronjob

2009-12-09 Thread Brian Ruthven - Sun UK

Hi Martin,

cron will not set up the terminal as an interactive shell would, and I 
doubt that cron will set locale, etc... for jobs. Thus, I would conclude 
from your output that it defaults to ASCII (code 646), whereas your 
interactive shell is running a different locale, hence the default 
fromcode is different.


However, according to the iconv(5) man page, 646->8859 is a valid 
conversion, but you need to drop the "-1" suffix.


Try this from the command line:

$ /bin/iconv -f 646 -t 8859 /dev/null

This seems to work for me, but adding "-1" fails:

$ /bin/iconv -f 646 -t 8859-1 /dev/null
Not supported 646 to 8859-1
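
One way to make the script independent of whatever locale cron provides 
is to name both encodings explicitly. The wrapper below is only a sketch: 
the function name is made up, and the right fromcode depends on what the 
DB2 export really contains, which I can't tell from here:

```shell
#!/bin/sh
# Convert a file with both encodings given explicitly, so the result does
# not depend on the locale of the calling environment (tty or cron).
# convert_explicit is a hypothetical helper; the encoding names must be
# ones your iconv supports (see iconv(5) on Solaris).
convert_explicit() {
    from=$1
    to=$2
    src=$3
    dst=$4
    iconv -f "$from" -t "$to" "$src" > "$dst"
}
```

In the cron job this would be called as, for example, 
"convert_explicit 646 8859 /tmp/anzeigen_roh.txt /tmp/tb_anzeigen.txt", 
substituting the real source encoding for 646 if the data is not ASCII.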

HTH
Brian


Dr. Martin Mundschenk wrote:

Hi!

I use a script to query a dataset from a DB2 and convert the output to 
ISO-Latin1 using iconv:


iconv -t 8859-1  /tmp/anzeigen_roh.txt > /tmp/tb_anzeigen.txt

It works fine when invoked from a tty. When the script is invoked by 
crontab, I receive this error:


Not supported 646 to 8859-1

Any hint?

Martin


  



Re: [osol-discuss] [ug-bosug] Setting up of a Sun lab with Sun Blade 1500, Sun Ray 1G, Sun Fire 440

2009-12-04 Thread Brian Ruthven - Sun UK



L.Guruprasad wrote:
I have the Solaris sparc iso burnt to a dvd 
(sol-10-u8-ga-sparc-dvd.iso). The Sun Blade 1500 machine that I'm 
going to try to install the os, has solaris 9 running already. I can't 
get the computer to boot from the DVD. By the time, monitor displays 
visual, the boot device is already chosen as 'disk' and it is booting 
from the hard disk. Is the sparc iso bootable? Is this the way of 
installing on sparc machines. I have no idea as this is the first time 
I am installing anything on the sparc hardware.


You will need to issue a command like "boot cdrom" from the "ok" prompt. 
If the system has already booted from the disk, then log in as root and 
issue "init 0". This should shut the system down, after which you should 
see something like "Program terminated" followed by "ok".


If the system won't boot, or you don't care about the OS, then hold the 
"Stop" key (top left corner of a Sun-style keyboard) and press A 
(referred to as Stop-A). This should halt the OS and drop you 
immediately to the ok prompt.


Now try typing "boot cdrom". The system should reset, and attempt to 
boot from the "cd" drive (yes it's actually the DVD drive, but the OBP 
doesn't really care).


Hope that helps,
Brian



Re: [osol-discuss] BFU to b128a

2009-11-30 Thread Brian Ruthven - Sun UK


Ian Allison wrote:
This leaves me in the bfu shell, from where I run the automatic 
conflict resolution (/opt/onbld/bin/acr). Everything seems to go fine, 
but when I try to reboot, I see an error like...


df: Could not find mount point for /tmp
//boot/solaris/bin/create_ramdisk[510]:  tmp_free = tmp_free / 3 : bad 
number

//boot/solaris/bin/create_ramdisk[2]: 12100 Segmentation Fault(coredump)
//boot/solaris/bin/create_ramdisk[3]: 12101 Segmentation Fault(coredump)
bootadm: boot-archive creation FAILED, command: 
'//boot/solaris/bin/create_ramdisk'



I'm pretty sure this is a dependency thing because if I do a full 
install rather than "Core System Support" everything seems to work 
fine and I can reboot into b128.



These errors you are getting come from this section of create_ramdisk:

if [ $format = ufs ] ; then
   # calculate image size
   getsize

   # check to see if there is sufficient space in tmpfs
   #
   tmp_free=`df -b /tmp | tail -1 | awk '{ printf ($2) }'`
   (( tmp_free = tmp_free / 3 ))


It would seem (from experimentation with failing commands in the 
pipeline) that one of df / tail / awk failed, and df looks like the culprit:


df: Could not find mount point for /tmp 


So, as a starting point, we need to see why df is complaining. What is 
the output of the following commands (run from before the bfu starts):


   1) type df
   2) df -b /tmp
   3) df /tmp
   4) grep /tmp /etc/mnttab
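
As an aside, the "bad number" error appears to come from the arithmetic 
being fed an empty or non-numeric tmp_free once df has failed. This is 
not what create_ramdisk does today, just a sketch of how such a value 
could be checked first (safe_third is a made-up name):

```shell
#!/bin/sh
# Divide a df-style free-space figure by three, but refuse empty or
# non-numeric input instead of letting shell arithmetic fail on it.
# safe_third is a hypothetical helper, not part of create_ramdisk.
safe_third() {
    case "$1" in
        ''|*[!0-9]*)
            echo "not a number: '$1'" >&2
            return 1
            ;;
    esac
    echo $(( $1 / 3 ))
}
```

With a guard like this, a failing df would produce a clear message 
rather than a "bad number" error further down the script.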


Regards,
Brian




Re: [osol-discuss] What version of a program in IPS?

2009-11-27 Thread Brian Ruthven - Sun UK


Thommy M. Malmström wrote:

How can I tell what version of for example firefox, that is in an IPS package?
  



You can try something like:

$ pkg contents -m SUNWfirefox | grep legacy
legacy arch=i386 category=FIREFOX,application,JDSosol desc="Mozilla 
Firefox Web browser - development files" hotline="Please contact your 
local service provider" name="Mozilla Firefox Web browser - development 
files" pkg=SUNWfirefox-devel variant.arch=i386 vendor="Sun Microsystems, 
Inc." version=3.5.3,REV=110.0.4.2009.09.29.05.14
legacy arch=i386 category=FIREFOX,application,JDSosol desc="Mozilla 
Firefox Web browser" hotline="Please contact your local service 
provider" name="Mozilla Firefox Web browser" pkg=SUNWfirefox 
variant.arch=i386 vendor="Sun Microsystems, Inc." 
version=3.5.3,REV=110.0.4.2009.09.29.05.14
legacy arch=sparc category=FIREFOX,application,JDSosol desc="Mozilla 
Firefox Web browser" hotline="Please contact your local service 
provider" name="Mozilla Firefox Web browser" pkg=SUNWfirefox 
variant.arch=sparc vendor="Sun Microsystems, Inc." 
version=3.5.3,REV=110.0.4.2009.09.29.08.29
legacy arch=sparc category=FIREFOX,application,JDSosol desc="Mozilla 
Firefox Web browser - development files" hotline="Please contact your 
local service provider" name="Mozilla Firefox Web browser - development 
files" pkg=SUNWfirefox-devel variant.arch=sparc vendor="Sun 
Microsystems, Inc." version=3.5.3,REV=110.0.4.2009.09.29.08.29


The above was taken from snv_125 which includes firefox 3.5.3.

This probably does not hold true for all applications (as it depends 
what metadata was supplied when the package was built), but happens to 
work for firefox and thunderbird. You may also have to search through 
all the raw output to find the version number. For example with pidgin, 
the documentation path includes the version number (but then "pidgin 
--version" tells you :-) ).
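
If you only want the version strings rather than the whole "legacy" 
lines, a filter along these lines could be piped onto the command above. 
It assumes each legacy action arrives on a single line with a version= 
field, as in the output shown; legacy_versions is an invented name:

```shell
#!/bin/sh
# Read "pkg contents -m" output on stdin and print each distinct value of
# the version= field found on "legacy" action lines.
# legacy_versions is a hypothetical helper name.
legacy_versions() {
    grep '^legacy ' |            # keep only legacy actions
        tr ' ' '\n' |            # one key=value token per line
        sed -n 's/^version=//p' |
        sort -u
}
```

Usage would be "pkg contents -m SUNWfirefox | legacy_versions". As noted, 
this only helps for packages whose metadata carries the version there.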




Regards,
Brian




Re: [osol-discuss] Real Player Installation Issue

2009-11-20 Thread Brian Ruthven - Sun UK


I'm not that familiar with realplayer myself, but it looks like you're 
launching the binary directly instead of a wrapper script (which, I 
guess, sets HELIX_LIBS as indicated in the error).


Is there a ./realplay script?
If so, what happens when you run that?

(I'm ignoring the quotation mark " at the start of the command - I assume 
this is a copy and paste error rather than something you typed?)


Brian


Narendra Tiwary wrote:

" pfexec ./realplay.bin
Warning: The realplay shell script should export HELIX_LIBS environment
variable and be used to launch RealPlayer.
  




Re: [osol-discuss] RBAC database files /etc/security/auth_attr , prof_attr cleaned out.

2009-11-19 Thread Brian Ruthven - Sun UK


I'm guessing a bit here (as I'm not 100% certain how these files are 
generated or delivered), but according to what I think are the "source" 
files:
   
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libsecdb/prof_attr.txt
   
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libsecdb/auth_attr.txt

both files are much larger than what you are seeing.

My "jump-to-cause" would be the cluster installation script/method which 
seems to have added to these files, and stripped out the CDDL block from 
the top too. Checking an snv_125-based OpenSolaris system shows me:


$ wc -l /etc/security/[ap]*_attr
202 /etc/security/auth_attr
131 /etc/security/prof_attr
333 total

If you have an older BE where these files are intact, I would be tempted 
to re-activate that (assuming you've not upgraded your ZFS pool version) 
and retry the various upgrades/installs you did in separate, new BEs to 
find the one which "broke" these files. All this can be done without 
disturbing your existing BE.


Regards,
Brian



dennis mathews wrote:

Has anyone come across their RBAC files (200906 - 111b) being reduced from 
around 60-odd entries to fewer than 5? Are these files auto-generated now by 
any chance?

Below is the full contents of the files. Incidentally, exec_attr still has all 
its contents. I know this because I've still got the fresh install's bootenv.

$ cat /etc/security/auth_attr 
solaris.cluster.admin:::Manage Quorum Server Daemons::

solaris.cluster.read:::Print Quorum Server Configuration::
solaris.smf.manage.zfs-auto-snapshot:::Manage the ZFS Automatic Snapshot 
Service::

$ cat /etc/security/prof_attr 
Basic Solaris Userauths=solaris.cluster.read

Quorum Server Managementauths=solaris.cluster.admin

Looks very strange. I can't run pfexec anymore

pfexec /usr/bin/cat /etc/shadow
/usr/bin/cat: can't get execution attributes

$profiles 
Primary Administrator

Console User
Basic Solaris User
 .. but none of these profiles have any entries in /etc/security/prof_attr

$auths
solaris.device.cdrw,solaris.cluster.read

auths on the fresh install was solaris.*

I have never tried directly editing these files, nor have I changed any default 
profiles or RBAC settings, so I'm confused about how this might have happened. 
Could an update have caused this?

Possibly related to this is that my shutdown option from the menu has 
disappeared.
  




Re: [osol-discuss] How to make syslogger write to a fifo?

2009-10-30 Thread Brian Ruthven - Sun UK


Hi Harry,

It's not actually that difficult once you get used to it. Think of it 
like learning a new programming language - you're not going to write an 
entire OS from scratch on your first day. Here's an example that took 
about 30 mins to write (although the bulk of this was looking into the 
restart_on property for the <dependent> tag):


<!DOCTYPE service_bundle SYSTEM
"/usr/share/lib/xml/dtd/service_bundle.dtd.1">

[...]




/lib/svc/method/svc-myservice could contain:

#!/bin/sh
. /lib/svc/share/smf_include.sh
FIFO=/tmp/myfifo
# Something here to grep $FIFO /etc/syslog.conf and add it if not there
[ ! -p $FIFO ] && mkfifo -m 644 $FIFO
/usr/local/bin/pipe_reader.pl $FIFO &
exit $SMF_EXIT_OK
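
The "add it if not there" step from the comment in the script could be sketched roughly as follows. The function name and the *.debug selector in the example are assumptions; the tab between selector and action is there because syslog.conf(4) separates the two fields with tabs.

```shell
#!/bin/sh
# ensure_syslog_fifo CONF SELECTOR FIFO   (hypothetical helper)
# Append "SELECTOR<TAB>FIFO" to CONF unless a non-comment line already
# ends with FIFO as its action.
ensure_syslog_fifo() {
    conf=$1 selector=$2 fifo=$3
    tab=$(printf '\t')
    while IFS= read -r line; do
        case $line in
            '#'*) ;;                              # skip comment lines
            *"$tab$fifo"|*" $fifo") return 0 ;;   # entry already present
        esac
    done < "$conf"
    printf '%s\t%s\n' "$selector" "$fifo" >> "$conf"
}

# Example (selector is an assumption):
#   ensure_syslog_fifo /etc/syslog.conf '*.debug' /tmp/myfifo
# syslogd only rereads its configuration when refreshed:
#   svcadm refresh system-log
```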


You could even use a property within the SMF service (defined in the 
manifest) to store the location of the fifo, but we're getting more 
advanced there... It does avoid having to modify the script each time 
you want to change the fifo location.
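
A minimal sketch of that idea, assuming a property named config/fifo (a made-up name that would first have to be defined in the manifest or added with svccfg). SMF sets SMF_FMRI in the environment of a method script, and svcprop -p prints a single property value.

```shell
#!/bin/sh
# fifo_location DEFAULT   (hypothetical helper)
# Print the fifo path stored in the service's config/fifo property,
# or DEFAULT when not running under SMF or the property is unset.
fifo_location() {
    default=$1
    if [ -n "$SMF_FMRI" ] && command -v svcprop >/dev/null 2>&1; then
        value=$(svcprop -p config/fifo "$SMF_FMRI" 2>/dev/null)
        [ -n "$value" ] && { echo "$value"; return 0; }
    fi
    echo "$default"
}

# In the method script, instead of the hard-coded assignment:
#   FIFO=$(fifo_location /tmp/myfifo)
```

Falling back to a default keeps the script usable when it is run by hand outside SMF.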


Do bear in mind that the SMF framework is much more than simply a "if 
not running, start" type of monitor for services. It allows interlinking 
with dependencies, multiple instances of the same service, grouping of 
faults into boundaries (the "contract" subsystem), running in "degraded" 
mode (although I've not seen this used yet), handling of multiple zones, 
etc...


One of the most useful places I've interacted with SMF is the rpcbind 
program. If rpcbind dies and is restarted on Solaris 8/9, then all your 
services stop working unless you restart each by hand. With Solaris 10+ 
and SMF, if rpcbind dies, SMF will restart it. Moreover, it will also 
restart any services which have declared a dependency upon rpc/bind and 
with a restart_on property which is != 'none'.


The example manifest above will actually restart system-log (i.e. 
syslogd in the current zone) if "myservice" dies. Also, the 
"optional_all" dependency in the "dependent" section states that 
system-log will not be prevented from running if myservice is disabled. 
i.e. syslog won't break if you disable myservice.
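
For anyone wanting something concrete to start from, here is a guess at what a manifest matching this description might look like, wrapped in a small shell function so it can be written out and then checked. It is not the manifest from the original mail: the service name site/myservice, the filesystem/local dependency, the timeouts and the restart_on value of 'restart' are all assumptions (any restart_on value other than 'none' would fit the behaviour described).

```shell
#!/bin/sh
# write_manifest FILE   (hypothetical helper)
# Write a sketch of an SMF manifest for "myservice" to FILE.
write_manifest() {
    cat > "$1" <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type='manifest' name='myservice'>
<service name='site/myservice' type='service' version='1'>
   <create_default_instance enabled='false' />
   <single_instance />
   <!-- assumed: local filesystems first, since the method creates a fifo -->
   <dependency name='fs-local' grouping='require_all' restart_on='none' type='service'>
      <service_fmri value='svc:/system/filesystem/local' />
   </dependency>
   <!-- system-log follows myservice, but is not blocked if it is disabled -->
   <dependent name='myservice_syslog' grouping='optional_all' restart_on='restart'>
      <service_fmri value='svc:/system/system-log' />
   </dependent>
   <exec_method type='method' name='start' exec='/lib/svc/method/svc-myservice' timeout_seconds='60' />
   <exec_method type='method' name='stop' exec=':kill' timeout_seconds='60' />
</service>
</service_bundle>
EOF
}

# Usage on a Solaris system (the file name is a placeholder):
#   write_manifest /var/tmp/myservice.xml
#   svccfg validate /var/tmp/myservice.xml
#   svccfg import /var/tmp/myservice.xml
```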


All in all, very useful for automated restarting of a service *and its 
dependents*.


Hope that helps,
Brian



Harry Putnam wrote:

Brian Ruthven - Solaris Network Sustaining - Sun UK
 writes:

  

Hi Harry,

I've got nothing canned which I can quickly pass on, however, anything
I gather will be from google ;-)

The top three hits searching for "writing smf manifest" look useful:

http://www.sun.com/bigadmin/content/selfheal/sdev_intro.jsp
http://wikis.sun.com/display/BigAdmin/SMF+Short+Cuts
http://blogs.warwick.ac.uk/chrismay/entry/solaris_smf_manifest/



[...]

Jesus... after looking at those URLs a bit I feel like I should just start
boohooing and go home.  This SMF stuff seems horribly complicated to
me.  They use terms like `JBoss' with no explanation.. 


And the XML itself just seems vastly overdone for something that should
be fairly simple.

  

Hopefully that is enough to get you started. I'd suggest copying the
manifest from a simple service such as utmp.xml and customise it to
your needs. If your service needs a startup script, then you should
include /lib/svc/share/smf_include.sh so you can use the correct exit
codes to signal the right things to the framework.



I guess it will be a start... but man I don't understand hardly any of
it.

To attach a script to an existing service and make it restart when
that service restarts is not really something that should require
yards and yards of code, several documents, and god only knows what
else.

It's tempting to just write a perl script that looks for the service
to be running, and starts up if it is.  


Is that a really bad approach for this?

`this' in case it has gotten away in the thread is to run a script
that reads from a named-pipe.. that the syslogger writes everything
to. 


The purpose of the script is to have finely grained control over
writing various things to logs... using regular expressions.

And after starting on the script, I realized I might want to change
the regular expressions as the script runs.

So far, I've figured out a way to do that I think, by making the
script read a secondary file every five minutes.  I might write a
regular expression and matching log file in the secondary file and
the script as it runs will start looking for those.  And writing hits
to the new log.

So far I plan just to use matching pairs in the secondary file like
this.

  REGEX LOG.log

What I haven't really got into yet is the best way to have this script
running in the background... checking for syslog to be running.  That
is, the script would never stop running even if the syslogger shut
down. 


Maybe even some kind of `trap' in the script where it would send me a
message in the event it was killed.  (At least for some kinds of KILL)

That's where it starts to look like it might be better to insinuate
this script in there through SMF. 


I'd really like to see a 

Re: [osol-discuss] Problems with hal service on OpenSolaris 2009.06

2009-10-30 Thread Brian Ruthven - Sun UK


You might find that this is 
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6850995 and 
simply re-installing the package has only fixed it once.


If it happens again, try "svcadm clear hal" and see if things spring 
into life again.


I did see something like this under builds 124 and 125 where if I booted 
the system with my memory stick connected, hald would dump core every 
time. I've yet to reproduce this with build 126.


Regards,
Brian


carlopmart wrote:

Thanks Paul, I have solved the problem reinstalling hal package.





Re: [osol-discuss] exec_attr is empty..

2009-10-20 Thread Brian Ruthven - Sun UK


What did you upgrade from?
If you mount the old BE, did the file have any contents there?
Is this reproducible? (i.e. beadm activate the old BE and attempt a 
fresh upgrade to snv_125 again - does the same happen?)


Regards,
Brian


Alexander wrote:
What a hell!!! 
After updating to opensolaris build 125 I got an empty /etc/security/exec_attr. pkg fix SUNWcs (from the old BE) didn't help the first time, only after deleting the empty file.

Do these bugs bother only me?
  




Re: [osol-discuss] strange bash behavior

2009-10-20 Thread Brian Ruthven - Sun UK


What is your home directory in /etc/passwd?
What is the underlying directory path?
If these two are not the same, then:
   Are you using automounter?
   Are there any symlinks in the path to your homedir?

e.g. I use /home/brian as my home directory, but the underlying path is 
/export/home/brian, and I use automounter. bash-3.2.50(1) works fine 
with this setup if I either "cd $HOME", "cd" (no args), "cd ~" - all use 
the '~' in the prompt.
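
A quick way to compare the two paths from a shell is sketched below. The function names are made up for illustration. Note the limitation: pwd -P resolves symlinks only, so an automounted home such as the /home/brian example above will still report /home/brian.

```shell
#!/bin/sh
# physical_path DIR   (hypothetical helper)
# Print DIR with every symlink in the path resolved.
physical_path() {
    ( cd "$1" 2>/dev/null && pwd -P )
}

# home_mismatch   (hypothetical helper)
# Print a message when $HOME differs from its symlink-free form.
home_mismatch() {
    real=$(physical_path "$HOME") || return 2
    [ "$real" != "$HOME" ] && echo "HOME=$HOME but underlying path is $real"
    return 0
}

# The passwd entry itself can be checked with:
#   getent passwd "$LOGNAME" | cut -d: -f6
```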


I would expect bash to use $HOME (set during login) rather than 
something in your home directory (unless you fiddle with it in .bashrc 
or the like).


Regards,
Brian



Chris wrote:

bash rev:
GNU bash, version 3.2.25(1)-release (i386-pc-solaris2.11)
Copyright (C) 2005 Free Software Foundation, Inc.

I just tried the same prompt on another account and it worked fine. I am now 
thinking that this does not work on my account because this account was created 
before I moved the home directory to a different drive. There has to be some 
reference of the old home directory path lingering somewhere that bash is 
looking at on whether to replace it with the tilde or not. Not sure where this 
could be though.
  




Re: [osol-discuss] How to dtrace thread_reaper?

2009-10-20 Thread Brian Ruthven - Sun UK


If I'm not mistaken, thread_reaper() is created during system boot by a 
call to thread_create. It then lives forever in the system until it is 
shut down.
Thus the entry probe will only ever be fired once during the kernel 
initialisation, and by the time you run your dtrace script, 
thread_reaper:entry has long gone.


Take a look at the code at 
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/disp/thread.c#896 
and you'll see a big "for (;;)" loop.


Try a probe on thread_reap_list:entry (which is called twice on each 
iteration) and see if that gives you any output.
Whether or not the output is any good or usable is up to you ... and 
note the :entry specifier too to avoid duplicating your output by 
including the return probe.
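
Adapting the one-liner from the original question to this suggestion gives something like the sketch below. The function name is made up; the probe description simply swaps thread_reaper for thread_reap_list and adds the entry specifier. It needs DTrace privileges on a live system, so it is only defined here, not run.

```shell
#!/bin/sh
# Probe description: entry into thread_reap_list only, no return probe.
PROBE='fbt:genunix:thread_reap_list:entry'

# count_reap_calls   (hypothetical helper)
# Count how often the probe fires until interrupted with ^C.
count_reap_calls() {
    dtrace -n "$PROBE"' { @num[probefunc] = count(); }'
}

# Usage (as root):
#   count_reap_calls
```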


However, from your description of the way the program works, I'm not 
sure the thread_reaper code is the right place to be looking at anyway. 
I would doubt that normally exiting threads end up on thread_deathrow 
(which is what thread_reaper looks at). I could be wrong... :-)


Regards,
Brian


Thomas Blank wrote:

Hi all,
I want to trace the thread_reaper and want to find out, how often it runs and 
how many threads it really reaps.
I use this one-liner in the first step:
[r...@itotcsol104 bin]# dtrace -n 'fbt:genunix:thread_reaper: { @num[probefunc] 
= count(); }'
dtrace: description 'fbt:genunix:thread_reaper: ' matched 1 probe
^C

During the run there were created about 40,000 threads - the threads exit 
immediately after their work. I monitored the number of threads with the 
nthread macro of mdb.
The mentioned dtrace script ran during the whole test and ~10 minutes 
afterwards, but I do not get any output from it.

What am I doing wrong? Can anybody help me out with this?
Thanks for your help! Thomas
  




Re: [osol-discuss] How to make syslogger write to a fifo?

2009-10-16 Thread Brian Ruthven - Sun UK


[ Please note - this is off the top of my head, so nothing is tested. 
There may^H^H^H will be errors in here :-) ]


Personally, I would use an SMF service to wrap my reader, and place a 
dependent entry in the manifest something like this:


   
   
   

Then system-log will not start until your service is running (i.e. reading 
from the pipe). Moreover, I believe the restart_on property should restart 
syslogd if your service dies, is refreshed or is stopped. A value of 'error' 
may be sufficient for your service...



Then, to answer your second part, the program must open and continuously 
read from the file until EPIPE is received (or some other appropriate 
FIFO error). You can do this in something like C or perl explicitly 
(like "tail -f" does), but as a quick workaround, try:


tail -f mypipe | script

(and make 'script' read from stdin). Yes tail may insert a delay, but at 
least you can quickly work around the quitting issue. It's not pretty, 
but shouldn't require much modification to your script.
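
The "open and continuously read" approach can also be sketched directly in shell, without tail. This is an untuned illustration: the function name and the __STOP__ sentinel are inventions for the example, and anything written in the short gap between an end-of-file and the next reopen could be lost.

```shell
#!/bin/sh
# read_fifo FIFO   (hypothetical helper)
# Print every line written to FIFO, reopening it each time the last
# writer closes. A line consisting of __STOP__ ends the loop.
read_fifo() {
    fifo=$1
    while :; do
        # The redirection blocks until a writer opens the fifo; the
        # inner loop ends at EOF, i.e. when the last writer closes it.
        while IFS= read -r line; do
            [ "$line" = "__STOP__" ] && return 0
            printf 'Read %s\n' "$line"
        done < "$fifo"
    done
}

# Usage:
#   mkfifo /var/tmp/mypipe
#   read_fifo /var/tmp/mypipe
```

The outer loop is what stops the reader from quitting after the first writer goes away.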


If you're using perl, then this works with syslog:

#!/usr/bin/perl -w
use strict;
if (open(PIPE, "< /var/tmp/mypipe")) {
   while (<PIPE>) {
      print "Read $_";
      # Do processing stuff here...
   }
}


Hope that helps,
Brian



Harry Putnam wrote:

Brian Ruthven - Solaris Network Sustaining - Sun UK
 writes:

  
I believe you can create a named pipe (using mkfifo) somewhere in the 
filesystem (I'd suggest somewhere persistent, not /tmp or /var/run, 
etc...). I've seen it created as /var/log/logpipe or similar. Then 
simply name the file in syslog.conf without the '|' symbol.



Haa... ok, that was my error... my linux background.. showing... on
linux syslog you need the pipe there.

  
However, note that if there is nothing reading from the pipe when 
syslogd starts (or when it receives a HUP) then it will ignore it, so 
you need to make sure something is reading the pipe before syslog 
starts. SMF is probably the best way to do this - you can insert a 
dependency for the svc:/system/system-log service to make sure your pipe 
reader starts first.



  
Also, if there is nothing reading from the pipe when syslogd tries to 
write to it, it will close it and ignore it until the next 
restart/refresh. However, the smf "restart on" property of the 
dependency can refresh system-log if your reader dies and is restarted.



Thanks, all good info... but one thing... How do you keep the reader
running?

I mean a script using the fifo for input like `script myfifo' quits as
soon as the first line of syslog output comes thru..

Whereas something like `tail -f myfifo' keeps reading.

What I'm after is attaching a perl script to the fifo that is capable
of searching and sorting on any regex or any 2 regex actually, that you
feed the script on startup.

But unlike linux... where the pipe symbol goes into /etc/syslog.conf
on opensolaris my script just quits on the first line of output.

Maybe I can just add some trickiness to the script to make it act like
tail -f... but is that really necessary?

Am I missing something obvious here that would make any script
continue to read from the fifo?

man syslog.conf has no hits on either `fifo' or `pipe' so I'm guessing
there is no help there... I haven't pored through  every line but nothing
jumps out as helpful.

  



Re: [osol-discuss] swap purge

2009-10-14 Thread Brian Ruthven - Sun UK


How do you know that the processes represented in the 'w' column are in 
fact dead processes?
This brings me back to my original question: What state is the process 
in? (i.e. how does it show up in a "ps -f" listing?)


Do you have a reproducible test case to demonstrate this?

Regards,
Brian



Mike DeMarco wrote:

From: Brian Ruthven - Sun UK 
To: Mike DeMarco 
Cc: opensolaris-discuss@opensolaris.org
Sent: Wed, October 14, 2009 10:58:06 AM
Subject: Re: [osol-discuss] swap purge


What state is the process in? How do you know it is swapped out? What problem 
is this causing?
If the parent crashed, then init should inherit the child as its own, and it 
AFAIK does reap dead children periodically.
If the process shows up as <defunct> because the parent is not reaping it, then 
there should be no memory associated with it any longer (actually, there is enough to 
give the hint a process is still there, but the address space should have already 
been reclaimed, and what's left of the process now uses the kernel's address space 
until the parent can be notified).

If you really want to force it into memory, then temporarily removing your swap 
device and re-adding it is one way of trying this, but may fail if you actually 
need the swap space. It's also a very heavy handed way of doing it  ;-)

Regards,
Brian


Whenever we have had systems go deep into swap some software will fail and the 
processes that were swapped out will die. When this happens the swapped out 
process will remain in swap space, there is no pid to get back on the stack to 
pull the process back out of the swap abyss. So the swap space remains consumed 
until the next reboot. The kernel must know that the process is in swap since 
it swapped it out and is keeping track of it as shown by the w column of 
vmstat. There has to be a way to message this information in the kernel and tell 
the kernel to forget about that swapped out process, since it is never coming back. 
This would help me by allowing me to clean up the disk space that is being 
consumed by swap instead of allocating more disk space to swap that is of no use.

I have also found that unless the pid can pull back the swap space when swap -d 
is called the swap -d will fail since the kernel can not clear the swap file 
out fully to delete it.
  




Re: [osol-discuss] swap purge

2009-10-14 Thread Brian Ruthven - Sun UK


What state is the process in? How do you know it is swapped out? What 
problem is this causing?
If the parent crashed, then init should inherit the child as its own, 
and it AFAIK does reap dead children periodically.
If the process shows up as <defunct> because the parent is not reaping 
it, then there should be no memory associated with it any longer 
(actually, there is enough to give the hint a process is still there, 
but the address space should have already been reclaimed, and what's 
left of the process now uses the kernel's address space until the parent 
can be notified).


If you really want to force it into memory, then temporarily removing 
your swap device and re-adding it is one way of trying this, but may 
fail if you actually need the swap space. It's also a very heavy handed 
way of doing it  ;-)


Regards,
Brian


Mike DeMarco wrote:

kill doesn't work?

max



Nothing to kill, the process was swapped out and the parent crashed out so 
there is no longer a owner of the swapped out process. It remains in swap until 
the next reboot at which time the kernel no longer has any idea about it so the 
swap space is free. I am looking for a tool that would let me examine the swap 
space allocations that the kernel must know about and send out a reaper to 
reclaim the space and tell the kernel to mark it as free.

Tools to examine swapped out processes seem to be lacking.
  



Re: [osol-discuss] "2-button simulating middle-click" feature stops working mid-session?

2009-10-02 Thread Brian Ruthven - Sun UK


Thanks - I'll take a look and hopefully give it a try next week and post 
any results I find (although it can take a while to reproduce).


Thanks for the input,
Brian

Alan Coopersmith wrote:

Sorry, but I don't know what the kernel driver guys would need to figure out
why their driver is sending events for presses of a button that doesn't exist.
If you need to prove to them it is reporting that, there is a dtrace script
in bug 6526932 that reports the button press events the X server gets from
the kernel - but that would presumably log a very large amount of data if you
don't know when it happens.   You could probably customize it down to just
printing when fe->id == BUT(2).

-Alan Coopersmith-   alan.coopersm...@sun.com
 Sun Microsystems, Inc. - X Window System Engineering

Brian Ruthven - Solaris Network Sustaining - Sun UK wrote:
  

Hi Alan,

Finally, this just happened to me again.
FWIW, I updated to snv_124 yesterday, and with no external mouse plugged
in, I see the "(II) 3rd Button detected: disabling emulate3Button" line
in /var/log/Xorg.0.log (full file attached).

I've been using the system for approx 1 hour so far, and only using the
touchpad.
What further diagnosis would I need before filing a bug?

Thanks,
Brian


Alan Coopersmith wrote:


Brian Ruthven - Solaris Network Sustaining - Sun UK wrote:
 
  

I have a Toshiba Tecra M10, and the mouse pad has two buttons.
Most of the time, I can highlight some text using the left button (click
and drag as normal), then I can press both buttons together to paste in
a target window (i.e. simulating the middle click).

This mostly works (and I usually copy-n-paste this way), but at some
point during my login session, it stops working, and instead I only ever
get the right-click context menu.



The default configuration of Xorg is to recognize left+right as emulating
a third button until/unless a third button is actually clicked, at which
point it assumes you don't need it any more.

Unfortunately, on builds before about 119, the default on Solaris is to
open /dev/mouse and have the kernel combine all mouse-like devices into
a single output stream, so a click on an external mouse will disable it
on all mice.   With the switch to hal-based input hotplug in 119 and
later, each mouse is individually opened, so it should track each one
separately.

 
  

I've not worked out what changes this, and I've not got a clue where to
start diagnosing this.



Any messages in /var/log/Xorg.0.log about disabling 3 button emulation?

  
  




Re: [osol-discuss] ssh-agent broken on dev 123?

2009-10-01 Thread Brian Ruthven - Sun UK


This looks like CR 6878610 - 
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6878610

Introduced in b122, fixed in b124.

Regards,
Brian

Jan Hnatek wrote:

Hi all,

when using ssh-agent on Osol dev 123, I'm getting:
$ LC_ALL=C ssh-agent ssh 
Error writing to authentication socket.
...

I haven't seen this on earlier builds, any idea
what might be wrong here?

Thanks,
hnhn






Re: [osol-discuss] Problem using snoop

2009-09-15 Thread Brian Ruthven - Sun UK

man snoop:

-a
Listen to packets on /dev/audio (warning: can be noisy).


That's why you get no output on screen. I'm assuming you have no 
speakers/audio device, and I'll also assume you expected to see packets 
on screen rather than hear them (which in itself is a nice gimmick, but 
otherwise slightly useless :-) )

Brian


Amit kumar nayak wrote:

Hi
I have a problem using the snoop command.
Here are the details.
I am using VMware Workstation and running Solaris 10 on a Windows host.
When I try to use the command
 snoop -a dhcp

it gives the message 
Using device /dev/e1000g0 (promiscuous mode)
and does not proceed any further.


Any help with this problem would be appreciated.
Thanks in advance.
  




Re: [osol-discuss] Exited status 65 in SMF

2009-09-07 Thread Brian Ruthven - Sun UK


For any service, this exit status comes from the start/stop method. Your 
output indicates the stop method returned 65.
Use "svcprop -p stop/exec FMRI" (replacing FMRI as appropriate) to see 
what the stop method is.

If it's a script, look at it.
If it's a binary, try a man page for it.

From what I remember of directory server, it will be a script which is 
responsible for starting it. Have a look and see why it could return 65.
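
As a rough sketch of that "look at the script" step: the two helpers below (names made up) pull the script path out of an exec string and list its explicit exit statements. svcprop escapes spaces in values with a backslash, so both forms are handled; FMRI in the usage line is a placeholder, as above.

```shell
#!/bin/sh
# method_script EXEC   (hypothetical helper)
# Print the script part of an exec string such as
# "/lib/svc/method/foo stop" or "/lib/svc/method/foo\ stop".
method_script() {
    printf '%s\n' "$1" | sed 's/[\\ ].*//'
}

# show_exits SCRIPT   (hypothetical helper)
# List the numbered lines of SCRIPT that mention "exit".
show_exits() {
    grep -n 'exit' "$1"
}

# Usage on a Solaris system:
#   show_exits "$(method_script "$(svcprop -p stop/exec FMRI)")"
```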


Regards,
Brian


Kittipong Theerawatsathein wrote:

I've got these following messages. So, could anyone can provide me the source 
and meaning of this exited status please?

Directory Server instance '1811' still running!! Failed to stop the ns-slapd 
process: 1811
Waiting for Directory Server instance '' to stop...
Waiting for Directory Server instance '' to stop...
Method "stop" exited with status 65


Thanks
  




Re: [osol-discuss] Mounting Extended partitions logical drives.

2009-08-13 Thread Brian Ruthven - Sun UK


srikalyan wrote:

Hi Everyone.
  I am quite new to opensolaris and very much liking it. I would like to know how to mount an extended partition on opensolaris (ext lba). I was using ubuntu before opensolaris, and ubuntu used to do it automatically (all you need to do is click the drive). I did install every piece of software which is required for mounting ntfs drives, and I can mount a primary partition on which I have vista. But I also have windows 7 rc and another partition, which holds all the code for my research, in the extended partition. I have searched a lot but could not find the right solution. 
  


I think the support you need went into snv_119 with PSARC/2006/379 
"Solaris on Extended partition" (supporting bug: 6644364 "Extended 
partitions need to be supported on Solaris").


However, I don't believe snv_119 is available yet.

My suggestion to the opensolaris team: opensolaris is definitely going to be a big hit, and you guys need to add a few things so that it makes end users' lives easy. 
for eg:

preinstall gls (colored ls) and alias it in .bashrc.
  
I believe this is already there. Admittedly I'm using snv_119 (rather 
than OpenSolaris), but "ls --color" gives me colour output. Adding the 
appropriate alias in .bashrc is probably left as a personal 
configuration rather than out-of-the-box for all users, however, 
somebody may disagree with me there :-)



cd xyz+tab should complete only directories, not files, because it is a cd 
command.
  
This is a shell thing AFAIK. I use tcsh which can offer completions 
based on command and filetype. From the tcsh man page:

   > complete cd 'p/1/d/'
will tell tcsh to only complete args to the cd command using 
directories. I presume bash has something similar?
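
For comparison, bash's programmable completion has a builtin for this. The snippet below is a sketch for ~/.bashrc; the BASH_VERSION guard is only there so the line is skipped if some other shell reads the file.

```shell
# Bash only: offer just directory names when completing arguments to cd.
if [ -n "$BASH_VERSION" ]; then
    complete -d cd
fi
```

Running "complete -p cd" afterwards shows the rule that is in effect.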



preinstall ntfs-3g and auto mount the partition by just clicking the drive, just 
like ubuntu.
  
Bug 6819757 integrated ntfsprogs - I've not checked whether these are 
the same as (or behave similarly to) ntfs-3g.
I'm not convinced by the argument to allow automounting of them (this 
implies some heuristics to locate and 'type' each partition), but I 
don't necessarily think it would be a bad thing to do.
[ I've lost track of how experimental the NTFS stuff is, so 
automatically mounting may not be a good idea.]


-Brian


Thanks,
~Tiger.
  




Re: [osol-discuss] Intallation Of open Solaris10

2009-08-03 Thread Brian Ruthven - Sun UK


If you mean OpenSolaris, then I'd suggest starting at 
http://www.opensolaris.org/os/newbies/


If you mean Solaris 10, then you should contact your normal support 
channels - you are unlikely to find help for that in this forum.


Regards,
Brian


Suraj Sankar wrote:

Hello everyone,

Could you please let me know the steps to install OpenSolaris 10 on a server?
Quite Urgent!!!
  




Re: [osol-discuss] auto mount nfs shares

2009-07-24 Thread Brian Ruthven - Sun UK


OK.

Your /etc/auto_master should contain a line at the end like this:

/-  /etc/auto_direct


Then, create /etc/auto_direct and put something like this in it:

/local/mount/point  server:/remote/mount/point


If you need to give any options (such as those given on the 
mount_nfs(1M) man page), then add these after the local mount point and 
before the remote mount point, e.g.:


/usr/mysoftware   -ro   server:/export/mysoftware


Once you've done that, running /usr/sbin/automount should mount the new 
mountpoint for you. If it doesn't, then try "svcadm restart autofs".
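
The map entry format (mount point, optional options introduced by a single "-", then server:/path) can be sketched as a small helper. The function name is hypothetical, and the values in the example are the placeholders already used above.

```shell
#!/bin/sh
# direct_map_entry MOUNTPOINT LOCATION [OPTIONS]   (hypothetical helper)
# Print one direct-map line. OPTIONS is a comma-separated list of
# mount options given without the leading hyphen, e.g. "ro" or "ro,soft".
direct_map_entry() {
    if [ -n "$3" ]; then
        printf '%s\t-%s\t%s\n' "$1" "$3" "$2"
    else
        printf '%s\t%s\n' "$1" "$2"
    fi
}

# Example, run as root:
#   direct_map_entry /usr/mysoftware server:/export/mysoftware ro >> /etc/auto_direct
#   /usr/sbin/automount
```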


Incidentally, the name "auto_direct" is not hard-coded - you can name it 
anything you wish.


Regards,
Brian


Harry Putnam wrote:

Brian Ruthven - Sun UK 
writes:

  

Have you checked the automount(1M) man page? The simple case for 1
filesystem to be automounted is to specify it in a direct map. The
section on "Direct Maps" gives an example of what to add to
/etc/auto_master.

See also the "Map Entry Format" section for what to put in /etc/auto_direct.

That should hopefully get you up and running.




I'm denser than most... but now I'm absolutely confused by what
appears to be a man page that is designed purposely to be confusing.
(I know that was not the intent of the author but jesus does
it really have to be so convoluted).

Seems I need something like:

/etc/auto_master

/localdir   [options] remoteHost:/remotedir

But I'm not really understanding at all what role  /etc/auto_direct
plays.

It appears its another way to annotate something in /etc/auto_master

like:

  /projects-  auto_direct  


I see no mention anywhere of creating an /etc/auto_direct... or any
indication of what might go in it.

Is there no chance of ignoring all the mapping baloney and put some kind
of entry in /etc/vfstab? 


  



Re: [osol-discuss] auto mount nfs shares

2009-07-24 Thread Brian Ruthven - Sun UK


Have you checked the automount(1M) man page? The simple case for 1 
filesystem to be automounted is to specify it in a direct map. The 
section on "Direct Maps" gives an example of what to add to 
/etc/auto_master.


See also the "Map Entry Format" section for what to put in /etc/auto_direct.

That should hopefully get you up and running.

Regards,
Brian


Harry Putnam wrote:

I'm pretty sure this is well documented but my google searches like:

  `opensolaris automount nfs'

are turning up so much extraneous bull pucky... I'm not finding it.

I want to make sure a specific nfs share is mounted at bootup.  


Server is opensolaris 2006.09 and client is opensolaris 2010.10.

I'm concerned here with the client mounting a share served to it from
the 2006.09 machine.

The share is readily mountable by hand... I just want to automate it.

  




Re: [osol-discuss] Problems with tempnam

2009-07-16 Thread Brian Ruthven - Sun UK


The truss(1) man page documents this.
I believe any object which still includes symbols (i.e. not stripped) is 
traceable in this manner.


Happy trussing.
Brian


girish.prabhakar...@wipro.com wrote:


Thanks for the reply.
I will try with the -ulibc::tempnam,fopen option.
One last query, in general if I am interested to trace a method using 
truss, will -::methodname work.
Is it applicable to libraries provided by Solaris or even for any 3rd 
party libraries.


Thanks
Girish

-Original Message-
From: brian.ruth...@sun.com [mailto:brian.ruth...@sun.com]
Sent: 15 July 2009 23:42
To: Girish Prabhakarrao
Subject: Re: [osol-discuss] Problems with tempnam


[ I notice you've removed CC:opensolaris-discuss@opensolaris.org - was
this intentional? I've not included it again in case it was, but feel
free to add it to your reply. ]


IIRC, seeing an open followed by a close is exactly the same failure
mode as the old "gethostbyname() fails with >255 file descriptors" issue
(CR 4353836 I think - see
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4353836 for
some details).

In that case, truss would show an open("/etc/netconfig") which succeeded,
returning an fd# >= 256, but in place of a read() command or anything
else, the next line would immediately close(2) the fd.


Replies are inline (including some I missed in your original post):



girish.prabhakar...@wipro.com wrote:
>
> Hi,
> Thanks for the quick reply.
>
> That implies the problem is not with tempnam but with fopen. If i use
> C++ stream libraries instead of fopen will it take the fd limit to the
> one specified by ulimit.
> Say,
> ofstream file.open(temporaryFile);
>

That sounds about right. stdio has documented limits, and it looks like
you're hitting one.

>
> 2)I just removed the fopen and ran the code just with tempnam, the
> truss shows me that there is a stat() and access() system call called
> every time a tempnam is executed.
>
> 23683/1:stat64(0x000484D4, 0xFFBFF670)  = 0
> 23683/1: 0x000484D4: "/home/giripra/"
> 23683/1:d=0x05F0017B i=245281 m=0040777 l=46 u=7145 
> g=1 sz=40960
> 23683/1:at = Jul 15 12:30:26 IST 2009  [ 1247641226 ]
> 23683/1:mt = Jul 15 18:06:04 IST 2009  [ 1247661364 ]
> 23683/1:ct = Jul 15 18:06:04 IST 2009  [ 1247661364 ]
> 23683/1:bsz=8192  blks=88fs=nfs
> 23683/1:access(0x00067C68, 3)   = 0
> 23683/1: 0x00067C68: "/home/giripra"
>
> However on the site where the actual problem is occurring(Please refer
> http://forums.sun.com/thread.jspa?threadID=5396830&tstart=0 
> ), I
> see that for some hours when binary is working as expected the number
> of fds are less than 255 and once it stops working the number of fds
> are more than 255. In the code immediatly after tempnam is called we
> call fopen. Your explanation clarfies the problem.
>

Your forum thread relates to Solaris 10. This is an OpenSolaris mailing
list. You would be better pursuing this through your support contract
and your local Solution Centre.


> But if fopen fails why is that we do not see system calls for tempnam
> in the truss. Below I have pasted the truss for 2 cases when binary is
> working as expected(Case when fds are less than 255) and when binary
> does not behave as expected(Occurs when fds in the process has
> exceeded 255)
>

Are these truss outputs from your test case program originally posted or
from the in-production app? If the latter, then I would hope that error
checking is being done. Your test case does not check the value of
outFile after calling fopen(). If it did, it should detect that fopen
returns NULL (or should as documented in the man page) and sets errno to
EMFILE.
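
A minimal sketch of the kind of check described above. The helper name
open_checked is hypothetical (it is not part of the posted test case); it
simply reports the errno text when fopen() returns NULL, which is where
EMFILE would be expected to show up once the stdio limit is reached.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open `path` for writing and say why it failed.
 * Returns the stream on success, or NULL after printing the reason. */
FILE *open_checked(const char *path)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) {
        int saved = errno;  /* keep errno before any other library call */
        fprintf(stderr, "fopen(%s) failed: %s%s\n", path, strerror(saved),
                saved == EMFILE ? " (too many open files)" : "");
    }
    return fp;
}
```

In the test loop, the result would be tested before use, e.g.
"if ((outFile = open_checked(temporaryFile)) == NULL) break;".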

If the truss output is coming from your test app, then I'm not sure what
is going on. The equivalent C program always shows the same stat/access
calls, regardless of whether fopen is used or not. Try adding the option
"-ulibc::tempnam,fopen" to truss to trace entry to and exit from the
tempnam() and fopen() library calls. I get output like this around the
255/256 fd mark:


/1...@1:-> libc:tempnam(0x8050b90, 0x0)
/1:stat64("/tmp/brian/", 0x08047900)= 0
/1:access("/tmp/brian", W_OK|X_OK)= 0
/1:getpid()= 1205 [1204]
/1:lstat64("/tmp/brian/SJA44aOwc", 0x08047810)Err#2 ENOENT
/1...@1:<- libc:tempnam() = 0x80662c8
/1...@1:-> libc:fopen(0x80662c8, 0x8050b9c)
/1:open("/tmp/brian/SJA44aOwc", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 255
/1...@1:<- libc:fopen() = 0x80675a0
Count 255 /tmp/brian/SJA44aOwc opened
/1:write(1, " C o u n t   2 5 5   / t".., 38)= 38
/1...@1:-> libc:tempnam(0x8050b90, 0x0)
/1:stat64("/tmp/brian/", 0

Re: [osol-discuss] Problems with tempnam

2009-07-15 Thread Brian Ruthven - Sun UK


You are opening files using fopen, and stdio is limited to 255 file 
descriptors[1]. This is a documented design limitation of stdio, and was 
an intentional limitation when stdio was created.


What is happening is that the stdio library is calling open, which 
returns an fd >255, which it cannot handle, so it silently closes it 
again, and IIRC, all data written to the FILE* handle is silently 
discarded, but don't quote me on that.



[1] Actually, this may now only be true (at least in recent Solaris 
versions) for 32-bit apps. As a quick fix, try re-compiling as a 64-bit 
app if possible and re-test. There was talk some time ago about raising 
the default stdio fd limit for 64-bit apps, but I've lost track of where 
it got to.



Regards,
Brian



Girish wrote:

Hi,
I tried a sample program to see the behaviour of tempnam with increased fds.

#include
#include
#include 
main()
{
int i;
for(i=0;i<5;i++)
{
char *temporaryFile = tempnam("/home/giripra/", 0 );
FILE *outFile = fopen(temporaryFile,"wb");
std::cout<<"Count: "<  




Re: [osol-discuss] update to specific dev branch

2009-07-15 Thread Brian Ruthven - Sun UK


I've seen it discussed before that you can do something like this:

Make sure you're currently booted to a build earlier than 117 (to avoid a
backwards "upgrade"), then:
beadm create snv-117
beadm mount snv-117 /mnt
pkg -R /mnt install ent...@0.5.11,5.11-0.117

See http://www.opensolaris.org/jive/thread.jspa?messageID=363217 for the 
full email trail.


Brian


John wrote:

Hi,

Is it possible to upgrade to a specific Dev branch?
For example, I would like to install 2009.06 and update to 117, not 118.

Sorry if this has been asked before...

Thanks,
J.
  




Re: [osol-discuss] Solaris issue with the ar command

2009-07-06 Thread Brian Ruthven - Sun UK


Sorry I can't really comment on the rest of the post (other than point 
out that most of the time difference is user code rather than kernel), 
but this bit I can help with:



Brian Cameron wrote:



It looks strange to me that a single process is moving from one to
another cpu all the time. Is this normal? can this be tuned? how?



The scheduler in the kernel will place the execution thread on an 
available cpu, preferring the one it last ran on (on the basis that the 
cache may still be hot for this thread's data).


However, sometimes this is not possible (perhaps the cpu is busy, 
perhaps it has been offlined, etc...), so rather than delay the thread 
until the original cpu becomes available, the kernel will schedule the 
thread on another available cpu.


You can control this to some extent using pbind(1M), which will cause the 
process to run on only one cpu. To keep all other LWPs off that cpu as 
well, use psrset(1M) to create a 1-cpu processor set and then bind the 
process to that set (psrset -b).


Brian


--
Brian Ruthven                           Sun Microsystems UK
Solaris Revenue Product Engineering     Tel: +44 (0)1252 422 312
Sparc House, Guillemont Park, Camberley, GU17 9QG



Re: [osol-discuss] Perhaps this

2009-07-06 Thread Brian Ruthven - Sun UK


I was following with interest Matthias Pfuetzner's blog on something 
similar - basically a very small case with a CF card as the root device 
and a pair of external USB drives in a mirrored ZFS pool:


http://blogs.sun.com/pfuetz/entry/eco_responsible_small_home_server

It doesn't quite match some of your specs, but it may provoke some 
thoughts :-)



Brian


Johan Kempe wrote:

Perhaps this is not the right forum but it was said in the help forum that that 
was dead and I see all sorts of questions for help in this forum part so I hope 
I'm excused.

I'm going to build myself a NAS with opensolaris and zfs. I would love some 
tips and info about what hardware you would recommend, that is most stable 
and supported. I'm looking to build the zfs with 6x 1.5 TB western green 
drives. I'm thinking about having the OS on another separate drive, and would 
like to have it mirrored, but not with hw raid, so that I stay free from 
problems if that controller card fails. I'm looking for ECC support on the 
motherboard. That is, I'm interested in a mobo with at least 6 sata 
connectors, preferably some pci-e connectors for various cards, and one 
pci-x port.

Any tip on a motherboard for this? Preferably less than 150 euro / 210 usd.

It seems quite hard to go through all the mobos, searching and searching the 
hardware compatibility list. I was dead set on a mobo with 8 working sata 
connectors, though; maybe I should use a sata card for the extra slots for 
the OS disk?

How would you go about with the OS disk? Would you cut out a bit from the 
raidz2 for it or use separate disks?

Tips and info really appreciated, thanks.
  




Re: [osol-discuss] SSH not working after upgrading from release to dev snv_117

2009-07-02 Thread Brian Ruthven - Sun UK


Bringing the answer back to the community in case any others trip over this:

After some email exchanges, Hernan found that LD_LIBRARY_PATH was set to 
include /opt/csw/lib (the Blastwave library directory). Unsetting 
LD_LIBRARY_PATH enabled ssh to work correctly.


Brian


Brian Ruthven - Sun UK wrote:


Can you provide specific details about the client system (presumably 
your opensolaris host), the remote system and can you cut-and-paste 
the error exactly please (along with telling us where you saw it - 
e.g. on the client's terminal, in the server's log file, etc...). 
Please include OS releases and patch levels if appropriate.


If you still have the snv_111b BE around, can you activate it and 
retry the test from that BE to the same ssh server host? Does that 
still work?


Brian


HeCSa wrote:

Hello!
My ssh client, after upgrading from snv_111b to snv_117 is not working
anymore, giving to me an ld.so.1 error.
Does anybody else have this error when trying to ssh a machine from your
Osol one?
Thanks, and best regards,

HeCSa.



  






Re: [osol-discuss] Error : symbol __0dLACE_WStringEnpos referenced symbol not found

2009-07-01 Thread Brian Ruthven - Sun UK


You've still not really given us much detail (and in fact, the line 
below doesn't even seem to be a complete command line - I can't see any 
reference to a file to compile). Furthermore, as you say it's Solaris 
10, I'm not sure how much help you're going to get on an opensolaris 
mailing list.


My recommendation is to pursue this either through your Solaris 10 
support contract via your local Sun Solution Centre, or through 
Developer support (starting at 
http://developers.sun.com/services/index.jsp )


Regards,
Brian


Soundararajan, Srinath wrote:

Hi
 
All code was compiling successfully on solaris 8.

When I try on solaris 10, I changed the compiler and a few more folders.
It compiles but gives a run time error that I am not able to solve. Please 
help; let me know if any info is required.
 
compiler command
/v7.0.p2/bin/CC -c -G -compat=4 -features=anachronisms -KPIC 
-DOS_SOLARIS_2_5 -instances=static -g -D_DEBUG -DSOLARIS 
-DRW_MULTI_THREAD -DRWDEBUG=1 -DRW_THR_DISABLE_CERTIFIED_ONLY 
-DRW_THR_OS_VERSION_SUNOS=0x0540
 
Regards,

Srinath S

----------------
*From:* Brian Ruthven - Sun UK [mailto:brian.ruth...@sun.com]
*Sent:* Wed 01/07/2009 16:10
*To:* Soundararajan, Srinath
*Cc:* opensolaris-discuss@opensolaris.org
*Subject:* Re: [osol-discuss] Error : symbol __0dLACE_WStringEnpos 
referenced symbol not found



Please give us a little more to go on than that.

As a start, useful context would be things like:

1) Where did you see this error message?
2) What were you doing at the time?
3) What impact (if any) did it have?
4) Is this something which has worked before, but now doesn't work?
5) Have you just upgraded, and did it work before the upgrade?
6) Is it reproducible?


Brian


Rekha wrote:
> fatal: relocation error: file server: symbol __0dLACE_WStringEnpos: 
> referenced symbol not found
> Killed
>
>
> can some one please help. not able to solve this error..
>  



Re: [osol-discuss] Error : symbol __0dLACE_WStringEnpos referenced symbol not found

2009-07-01 Thread Brian Ruthven - Sun UK


Please give us a little more to go on than that.

As a start, useful context would be things like:

   1) Where did you see this error message?
   2) What were you doing at the time?
   3) What impact (if any) did it have?
   4) Is this something which has worked before, but now doesn't work?
   5) Have you just upgraded, and did it work before the upgrade?
   6) Is it reproducible?


Brian


Rekha wrote:

fatal: relocation error: file server: symbol __0dLACE_WStringEnpos: referenced 
symbol not found
Killed


can some one please help. not able to solve this error..
  




Re: [osol-discuss] OpenSolaris 111b to 117 upgrade fails

2009-06-30 Thread Brian Ruthven - Sun UK


To quote Shawn Walker in a previous thread 
(http://www.opensolaris.org/jive/thread.jspa?threadID=106705&tstart=15):


http://defect.opensolaris.org/bz/show_bug.cgi?id=9568

They're basically harmless, feel free to ignore them for now. 


HTH
Brian


Konstantin Lebedev wrote:

Mr. Vikram Dutta says:

  

Update publisher to point to pkg.opensolaris.org/dev
Then Try this on the cmdline
$ pfexec pkg image-update
It may prompt you to update SUNWipkg



These actions helped me to correct the situation. The update completed 
successfully, although I did receive the following messages:

shin...@shinkei:~$ pfexec pkg image-update
DOWNLOAD                    PKGS        FILES      XFER (MB)
Completed                617/617  27512/27512  424.96/424.96

PHASE                         ACTIONS
Removal Phase               9107/9107
Install Phase             15216/15216
Update Phase              24897/25748
driver (softmac) upgrade (removal of policy'read_priv_set=net_rawaccess write_priv_set=net_rawaccess) failed: minor node spec required.
driver (vnic) upgrade (removal of policy'read_priv_set=net_rawaccess write_priv_set=net_rawaccess) failed: minor node spec required.
Update Phase              24976/25748
driver (aggr) upgrade (removal of policy'read_priv_set=net_rawaccess write_priv_set=net_rawaccess) failed: minor node spec required.
driver (ibd) upgrade (removal of policy'read_priv_set=net_rawaccess write_priv_set=net_rawaccess) failed: minor node spec required.
Update Phase              24983/25748
driver (elxl) upgrade (removal of policy'read_priv_set=net_rawaccess write_priv_set=net_rawaccess) failed: minor node spec required.
driver (iprb) upgrade (removal of policy'read_priv_set=net_rawaccess write_priv_set=net_rawaccess) failed: minor node spec required.
driver (dnet) upgrade (removal of policy'read_priv_set=net_rawaccess write_priv_set=net_rawaccess) failed: minor node spec required.
Update Phase              24991/25748
driver (pcelx) upgrade (removal of policy'read_priv_set=net_rawaccess write_priv_set=net_rawaccess) failed: minor node spec required.
Update Phase              25748/25748
PHASE                           ITEMS
Reading Existing Index            8/8
Indexing Packages             617/617
Optimizing Index...

PHASE                           ITEMS
Indexing Packages             625/625
[skip]
  



Re: [osol-discuss] SSH not working after upgrading from release to dev snv_117

2009-06-30 Thread Brian Ruthven - Sun UK


Can you provide specific details about the client system (presumably 
your opensolaris host), the remote system and can you cut-and-paste the 
error exactly please (along with telling us where you saw it - e.g. on 
the client's terminal, in the server's log file, etc...). Please include 
OS releases and patch levels if appropriate.


If you still have the snv_111b BE around, can you activate it and retry 
the test from that BE to the same ssh server host? Does that still work?


Brian


HeCSa wrote:

Hello!
My ssh client, after upgrading from snv_111b to snv_117 is not working
anymore, giving to me an ld.so.1 error.
Does anybody else have this error when trying to ssh a machine from your
Osol one?
Thanks, and best regards,

HeCSa.



  




Re: [osol-discuss] Bad Memory

2009-06-26 Thread Brian Ruthven - Sun UK


I'd be very surprised if the system booted the "wrong" kernel by 
default. [ If it did, then please file a bug. ]


I'm assuming your CPU is 64-bit?
If so, there are some things you can look at to check you've booted 64-bit.

   1) The boot line will say something like: "SunOS Release 5.11 
Version snv_xxx 64-bit".
   2) Once logged in, you can run "isainfo -v". If it returns 
information about 64-bit, then it would have to be a 64-bit kernel.


Even so, I would expect a 32-bit limitation to either be 2Gb (signed 
32-bit number) or 4Gb (unsigned), not 3Gb. Much more likely is the 
memory is used elsewhere. See previous posts in this thread for some 
information about this:


Jurgen Keil wrote:

~1GB is lost because the address space is used for
PCI device memory mapped i/o ports, apic registers,
pcie memory mapped configuration space, etc...

Some chipsets are able to remap the memory that
is lost because of the pci memory mapped address
range to a physical address >= 4GB. Unlike 32-bit
Windows, 32-bit Solaris would be able to use that
memory >= 4GB.




Incidentally, to answer your question, you can select which kernel by 
editing the grub line. References to $ISADIR will select the kernel 
according to your arch (64 bit first, if possible). You can force the 
32-bit kernel by removing the $ISADIR reference in both the kernel$ and 
module$ lines.


Regards,
Brian



Luis Martinez wrote:

Well, I was suspecting that, but I don't know how to do that! I installed the 
system from CD and grub only shows one option. I wonder if I made a mistake 
during the installation... I don't know.

How can I choose between the 32-bit and 64-bit kernels?
  




Re: [osol-discuss] Assistance finding drivers for Marvell Yukon 88E8038

2009-06-25 Thread Brian Ruthven - Sun UK


The HCL at 
http://www.sun.com/bigadmin/hcl/data/os/components/views/networking_all_results.page1.html 
(linked from the front page of www.opensolaris.org) lists the 88E8036 
device; following the links takes you to http://www.marvell.com/drivers, 
where you can download the drivers. Try that.


I recall using this driver with an older S10 release with no problems 
(it may be OK for OpenSolaris, but might be stumped by the quiesce(9F) 
support which was added around snv_106). AFAIK there are no Sun-bundled 
drivers for the Marvell devices (otherwise it would work out of the box 
and you wouldn't be asking this question!).


If the native driver doesn't work, I notice an option to download the 
"Microsoft NDIS2" driver. You may be able to use the NDIS wrapper 
(http://www.opensolaris.org/os/community/laptop/wireless/ndis/) albeit 
with some limitations (like no WPA2-PSK support).


Brian

Alex Smith (K4RNT) wrote:

pci bus 0x0002 cardnum 0x00 function 0x00: vendor 0x11ab device 0x4352
 Marvell Technology Group Ltd. 88E8038 PCI-E Fast Ethernet Controller

That's the result of scanpci. I'm not new to OpenSolaris, but I am new
to rolling my own drivers.

Any assistance would be appreciated.


  




Re: [osol-discuss] Resource Limit

2009-06-04 Thread Brian Ruthven - Sun UK


Could this be bug 4434773?  
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4434773


The symptoms are the message you mention in /var/adm/messages, and they 
only show up when you do "rctladm -e syslog=debug 
process.max-stack-size" or similar to enable the syslog messages. Looks 
like a bogus message is produced.


It was logged back in 2001, so it doesn't look like a new problem. If 
you've pinned this down to webconsole, is the webconsole still running, 
or does smf show it as maintenance? If you don't actually need 
webconsole, it might be better to disable it anyway :-)


Bug 6589440 (closed as a dup of 4434773) hints that any stack growth 
(within the stack limit) could erroneously trigger this message.


HTH
Brian


Mike DeMarco wrote:

I have found that it is the webconsole that is exceeding the default limit.
I have tried to increase the stack size by adding a ulimit -s 25000 to 
/lib/svc/method/svc-webconsole. When restarting it still reports exceeding the 
process.max-stack size.

genunix: [ID 120576 kern.notice] basic rctl process.max-stack-size (value 
10485760) exceeded by process 1382
  




Re: [osol-discuss] rcmd: socket: Cannot assign requested address on Solaris 10

2009-05-27 Thread Brian Ruthven - Sun UK


This isn't really the place for S10 questions (and you've not mentioned 
that OpenSolaris is involved), but a little googling and some source 
code provides the answer:



"bad port" is reported by in.rshd under the following circumstances:

   bad_port = (port >= IPPORT_RESERVED ||
   port < (uint_t)(IPPORT_RESERVED/2));

[ 
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/cmd-inet/usr.sbin/in.rshd.c#522 
]


IPPORT_RESERVED is defined as 1024, so any port which is >=1024 or <512 
will generate this message.
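
As a small illustration of that range check, here is a sketch that mirrors
the quoted in.rshd expression. The names is_bad_port and RESERVED_PORT_LIMIT
are hypothetical, and the value 1024 is taken from the statement above rather
than from a system header. The test only accepts source ports 512 to 1023
inclusive.

```c
#include <stdbool.h>

/* Assumption: IPPORT_RESERVED is 1024, as stated in the post. It is
 * defined locally so the sketch does not depend on <netinet/in.h>. */
#define RESERVED_PORT_LIMIT 1024u

/* Hypothetical helper mirroring the quoted in.rshd test: returns true
 * when the client's source port is outside the range 512..1023. */
bool is_bad_port(unsigned int port)
{
    return port >= RESERVED_PORT_LIMIT || port < RESERVED_PORT_LIMIT / 2;
}
```

So a client connecting from port 1023 passes, while one connecting from an
ordinary unprivileged port such as 40000 would be logged as "bad port".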


I've not checked whether this port range is part of the RFC, but you may 
need to check what port(s) the client is connecting from. The same code 
existed all the way back to at least Solaris 7, so I doubt this is a new 
issue :-)


What is the client program? Maybe the relevant binary has had its 
setuid bit (or appropriate privileges) stripped? (I'm guessing here...). 
If both V890s are Solaris 10, then any further issues should go through 
your normal Sun Support channel.


Regards,
Brian


Krishnan wrote:

Hi,
I am having some problems with rsh/remsh on solaris 10
while trying to rsh from a V890 to another V890 , I am getting this error message 
sometimes "rcmd: socket: Cannot assign requested address" and the 
/var/adm/messages file shows this
May 26 11:26:54 hostname rsh[7895]: [ID 521673 daemon.notice] connection from 
"hostname" (IP address) - bad port
This is causing an issue because the app team is cont.injecting files from/to 
servers
the release level is  Solaris 10 5/08 and the patch level is Generic_127127-11
Any help would be appreciated
  




Re: [osol-discuss] Resource Limit

2009-05-26 Thread Brian Ruthven - Sun UK


My advice would be to set up coreadm to capture whichever process is 
doing this rather than guessing. [ This works on the assumption that the 
process is being sent a SEGV, and hence will dump core if so enabled. ]


I would also advise against changing a global value for the sake of one 
(misbehaving?) application. I would much rather identify the process, 
then either log a bug (if this is errant behaviour), understand what 
configuration is necessary to reduce the stack usage, or provide a 
per-process change to the limit (i.e. a shell script wrapper to start 
the process) rather than blindly setting a global variable.


My suspicion is that there is a recursive function somewhere which is 
consuming the stack segment, causing the limit to be reached and the 
process is terminated. Simply raising the limit will give it more 
headroom, but not actually move your problem further forward - the core 
dumps (if any) will simply be larger :-)


Obviously this is speculation, but I would suggest coreadm (with global 
cores and core logging enabled) at least temporarily to catch the 
process. Searching the /var/svc/log files may also reveal a process 
which died unexpectedly, perhaps with the service simply restarting. If 
you suspect Xorg, then start with the cde-login or gdm SMF services to 
see if it reported anything there.


Regards,
Brian


Mike DeMarco wrote:

Thanks for your post Brian:

  It is hard to tell which process is throwing this error, as only the PID 
is displayed, and since the process dies and the svc watcher attempts to 
restart it, the PID changes. 
From /var/adm/messages, xorg is also trying to start around the time that 
this error is generated. So my best guess is that it is xorg that wants a 
larger stack.


Is there a way to increase the global max-stack-size above its default of 10Meg? 
/etc/project does not do this. I have found that /etc/project is not working 
properly even under Solaris10u5 & u6, I am working this through Sun support now.

  

I'm guessing here, but: the limit probably comes from the shell-imposed 
stack limit (default 10Mb / 10240Kb).

This is normally put in place to catch bad apps or recursive functions 
that don't stop recursing. It helps stop one process quickly consuming 
vast amounts of RAM (OK, a simplistic approach, I know).

The question is - what is the app, and why does it need more than 10Mb 
for stack?

If it really does need that much, then it may need to have the stack 
limit increased (perhaps a shell script wrapper with the appropriate 
"ulimit -s" command, or maybe there's a more clever way to do this now 
:-) ).

One word of caution, especially for 32-bit processes: Don't be too 
liberal with the setting for the stack limit. Because of the location of 
stack and libraries within the process virtual address space, the stack 
limit effectively removes that amount of memory from the process. In a 
32-bit process, only 4Gb is available, and setting the stack limit to 
1Gb would actually only leave approx 3Gb for the process to use (despite 
the stack only being a few Kb in size).

I would suggest looking at what the process is, and see whether there is 
a software fault there first, before fiddling with the stack limit. 
coreadm should help you catch what the process is (as out of stack 
generates SIGSEGV I believe) if you don't already know.

Regards,
Brian







Re: [osol-discuss] Resource Limit

2009-05-26 Thread Brian Ruthven - Sun UK


I'm guessing here, but: the limit probably comes from the shell-imposed 
stack limit (default 10Mb / 10240Kb).
This is normally put in place to catch bad apps or recursive functions 
that don't stop recursing. It helps stop one process quickly consuming 
vast amounts of RAM (OK, a simplistic approach, I know).


The question is - what is the app, and why does it need more than 10Mb 
for stack?
If it really does need that much, then it may need to have the stack 
limit increased (perhaps a shell script wrapper with the appropriate 
"ulimit -s" command, or maybe there's a more clever way to do this now 
:-) ).


One word of caution, especially for 32-bit processes: Don't be too 
liberal with the setting for the stack limit. Because of the location of 
stack and libraries within the process virtual address space, the stack 
limit effectively removes that amount of memory from the process. In a 
32-bit process, only 4Gb is available, and setting the stack limit to 
1Gb would actually only leave approx 3Gb for the process to use (despite 
the stack only being a few Kb in size).


I would suggest looking at what the process is, and see whether there is 
a software fault there first, before fiddling with the stack limit. 
coreadm should help you catch what the process is (as out of stack 
generates SIGSEGV I believe) if you don't already know.
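
To see which stack limit a process actually inherited (for example from a
wrapper script that ran "ulimit -s"), a sketch along these lines may help.
It uses the standard getrlimit() call with RLIMIT_STACK; the helper names
stack_soft_limit and print_stack_limit are hypothetical.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft stack limit in bytes.
 * Returns 0 on success and -1 if getrlimit() fails. */
int stack_soft_limit(rlim_t *out)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}

/* Print the limit in kilobytes, the unit "ulimit -s" reports. */
void print_stack_limit(void)
{
    rlim_t lim;
    if (stack_soft_limit(&lim) != 0) {
        perror("getrlimit");
        return;
    }
    if (lim == RLIM_INFINITY)
        printf("stack limit: unlimited\n");
    else
        printf("stack limit: %llu KB\n", (unsigned long long)(lim / 1024));
}
```

Calling print_stack_limit() early in the suspect program would show whether
a per-process change to the limit really took effect.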



Regards,
Brian



Mike DeMarco wrote:

I am having a problem on build 113 with long boot times and resource limit 
problems.

I have turned on resource logging and have tried to adjust all pools upwards 
but am still getting a warning message on boot with a long delay.

May 25 05:43:55 euclid mac: [ID 469746 kern.info] NOTICE: softmac1006 registered
May 25 05:43:58 euclid genunix: [ID 120576 kern.notice] basic rctl 
process.max-stack-size (value 10485760) exceeded by process 644.
May 25 05:43:58 euclid genunix: [ID 120576 kern.notice] basic rctl 
process.max-stack-size (value 10485760) exceeded by process 674.
May 25 05:44:01 euclid genunix: [ID 120576 kern.notice] basic rctl 
process.max-stack-size (value 10485760) exceeded by process 716.
May 25 05:44:06 euclid last message repeated 12 times
May 25 05:44:55 euclid genunix: [ID 120576 kern.notice] basic rctl 
process.max-stack-size (value 10485760) exceeded by process 645.
May 25 05:44:55 euclid last message repeated 5 times
May 25 05:44:59 euclid genunix: [ID 120576 kern.notice] basic rctl 
process.max-stack-size (value 10485760) exceeded by process 779.
May 25 05:44:59 euclid last message repeated 2 times
May 25 05:45:00 euclid pseudo: [ID 129642 kern.info] pseudo-device: devinfo0

The tuning values set by projmod do not seem to do anything to adjust the 
values as I still see failures for the defaults and not the updated values.

I have attempted to turn off resource limits through svc

svcs -a | grep resource
disabled5:43:34 svc:/system/resource-mgmt:default

So question 1 is: why can I not change the resource value so it works on bootup?
2) Why is it not turning off?

TIA
  




Re: [osol-discuss] Cannot mount some dos logical drives.

2009-05-26 Thread Brian Ruthven - Sun UK



Atiqur Rahman wrote:

Another thing to notice:
Every time the system boots, it shows this warning:
WARNING: missing " on line 1121 /etc/driver_aliases

$ head -1121 /etc/driver_aliases | tail -1
qlc "\"
  


This is a known issue:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6836641

Brian



Re: [osol-discuss] Drops NIC driver during smb transfer, plus weird oddities

2009-05-22 Thread Brian Ruthven - Sun UK


You still may have a potential disk problem, which (assuming it is part 
of your root filesystem/zpool) could explain the hangs.
I'm not convinced that this is either a NIC/network or memory related 
problem.


Usually it is wise to fix any obvious faults first, and the disk 
problems are obvious faults. The "device is gone" strikes me as a 
message that would ultimately stop the system in its tracks unless the 
device comes back. That would certainly account for a hang.


[ Incidentally, the best system I've seen which managed without a root 
filesystem was an E10k domain. When we eventually got the call here at 
Sun, we determined that the last remaining root filesystem mirror disk 
had disappeared 26 hours prior to the system grinding to a halt. 
Goodness knows how it managed for so long before finally giving up, but 
I was rather impressed :-)  ]


Regards,
Brian


CF wrote:

Thanks for replying, Brian.

It's correct that I pushed the reset button; the computer was unresponsive.
After rebooting once more, the NIC returned. I've been looking over the 
different components, and found that my Dom0 was allocated all of my RAM, and 
I suspect this is the cause. I've transferred a few hundred GB over the LAN 
and external disks, with no performance issues.
After reducing the memory allocated to the xen hypervisor, the system gradually 
freed up more memory. However, if I reduce it to <500MB, even with no virtual 
machines running, my computer hangs terribly and swap usage skyrockets. I had 
no idea that it was using memory without any VMs running; I need to do some 
research here.

Again, thanks for replying.
  


--
Brian Ruthven    Sun Microsystems UK
Solaris Revenue Product Engineering Tel: +44 (0)1252 422 312
Sparc House, Guillemont Park, Camberley, GU17 9QG



Re: [osol-discuss] Drops NIC driver during smb transfer, plus weird oddities

2009-05-20 Thread Brian Ruthven - Sun UK


By the look of it you have a storage problem. mpt is a disk device 
driver, not a network driver.

The last error prior to reboot (power cycle?) was:

EVENT-TIME: Tue May 19 18:08:33 CEST 2009
PLATFORM: System Product Name, CSN: System Serial Number, HOSTNAME: srv01st
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: ad3d3ce2-0110-cc51-f2e1-befc0cd3f0ba
DESC: The number of I/O errors associated with a ZFS device exceeded
acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
will be made to activate a hot spare if available.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.


Looks to me like zfs pulled the plug due to disk errors. The mpt 
warnings (scsi resets and "device is gone") hint that this is a hardware 
problem. It may be that your boot archive or some critical boot-related 
files got damaged, hence the missing NIC driver, but I'm guessing at 
this point.


Personally, I would recommend you first sort out the obvious hardware 
problem (check connections, check the disk is good, etc...) and correct 
or eliminate it before trying to diagnose the other issues.
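
To see which target the mpt driver keeps complaining about, the "Log info ... received for target N" lines in the log can be tallied. This is only a rough Python sketch: the function name `count_log_info_by_target` and the regular expression are assumptions based on the message format in the quoted log, and it makes no attempt to interpret what the codes mean.

```python
import re
from collections import Counter

# Matches lines such as "Log info 0x31120200 received for target 5."
LOG_INFO_RE = re.compile(r"Log info (0x[0-9a-fA-F]+) received for target (\d+)")


def count_log_info_by_target(log_text):
    """Count (target, log info code) pairs found in syslog text."""
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return Counter(
        (int(target), code) for code, target in LOG_INFO_RE.findall(flat)
    )


sample = """\
May 19 18:07:52 srv01st Log info 0x31120200 received for target 5.
May 19 18:07:52 srv01st scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
May 19 18:07:52 srv01st Log info 0x31120200 received for target 5.
May 19 18:07:52 srv01st scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
"""

for (target, code), n in count_log_info_by_target(sample).most_common():
    print(f"target {target}: log info {code} seen {n} time(s)")
```

If a single target accounts for nearly all the entries, that disk and its cabling are the first things to check.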


How did you reboot, by the way? Looks like a power cycle or reset to me.

Regards,
Brian



CF wrote:
I've had a server up for about a week, and something strange happened while copying a batch of files over the network. After copying for about three hours, I lost connectivity to the server, although it answered my ping and my xVMs were up. 
The machine only showed a black screen with some error text, so I rebooted. Everything worked, except that my main NIC skge0 was unavailable and marked as "missing:[driver unavailable]"

Take a look: http://pastebin.com/f52d34233 I rebooted the machine at 18:52

Basically, a lot of:
May 19 18:07:49 srv01st scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 19 18:07:49 srv01st mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31120200
May 19 18:07:49 srv01st scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 19 18:07:49 srv01st mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120200
May 19 18:07:49 srv01st scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 19 18:07:49 srv01st mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31120403
May 19 18:07:49 srv01st scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 19 18:07:49 srv01st mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31120403
May 19 18:07:52 srv01st scsi: [ID 365881 kern.info] /p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 19 18:07:52 srv01st Log info 0x31120200 received for target 5.
May 19 18:07:52 srv01st scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 19 18:07:52 srv01st scsi: [ID 365881 kern.info] /p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 19 18:07:52 srv01st Log info 0x31120200 received for target 5.
May 19 18:07:52 srv01st scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 19 18:07:52 srv01st scsi: [ID 365881 kern.info] /p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 19 18:07:52 srv01st Log info 0x31120200 received for target 5.
May 19 18:07:52 srv01st scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 19 18:07:52 srv01st scsi: [ID 365881 kern.info] /p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 19 18:07:52 srv01st Log info 0x31120200 received for target 5.
May 19 18:07:52 srv01st scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 19 18:07:52 srv01st scsi: [ID 365881 kern.info] /p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 19 18:07:52 srv01st Log info 0x31120200 received for target 5.
May 19 18:07:52 srv01st scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 19 18:07:52 srv01st scsi: [ID 365881 kern.info] /p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):

May 19 18:08:33 srv01st fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
May 19 18:08:33 srv01st EVENT-TIME: Tue May 19 18:08:33 CEST 2009
May 19 18:08:33 srv01st PLATFORM: System Product Name, CSN: System Serial Number, HOSTNAME: srv01st
May 19 18:08:33 srv01st SOURCE: zfs-diagnosis, REV: 1.0
May 19 18:08:33 srv01st EVENT-ID: ad3d3ce2-0110-cc51-f2e1-befc0cd3f0ba
May 19 18:08:33 srv01st DESC: The number of I/O errors associated with a ZFS device exceeded
May 19 18:08:33 srv01st  acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for more information.
May 19 18:08:33 srv01st AUTO-RESPONSE: The device has been offlined and marked as faulted.  An attempt
May 19 18:08:33 srv01st  will be made to activate a hot spare if available.
M