Re: [OpenIndiana-discuss] Deleting zpool.cache on OI

2013-12-01 Thread Steve Gonczi
Yes, I suggest you follow the guidance given by Jim. 
Once you have the system up and running, you may want to try 
to import the pool using an explicit zpool import command. 

Older versions of ZFS especially had a problem when large files were deleted. 
The filesystem mount path tries to complete any interrupted deletes that were 
started before the pool was mounted. Did you by any chance delete 
some large files? 

The pool may come up eventually, but it may take a long time. 
(possibly days). 

Once you decouple the startup from the pool mount, you can either just 
destroy the problem pool, or let it finish what it is doing in the background. 
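
For example, once booted without the pool ("badpool" is a placeholder; check 
zpool(1M) on your build for the exact options): 

# zpool import                          # list what is importable 
# zpool import -N badpool               # import without mounting any datasets 
# zpool import -o readonly=on badpool   # or read-only, which should avoid 
                                        # replaying the pending deletes 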

Steve 


- Original Message -
Yes, Steve, exactly. I'd like to save the rest of my installation, but 
I have a pool that, when mounted on any system, prevents a reboot. 

___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] Deleting zpool.cache on OI

2013-11-30 Thread Steve Gonczi
I could be wrong, but I think he wants to know how to do this on a 
system that hangs while trying to mount the pool. 

Your article should help. Nice job, btw. 

Steve 


- Original Message -
What do you want to achieve this way? 

http://wiki.openindiana.org/oi/Advanced+-+ZFS+Pools+as+SMF+services+and+iSCSI+loopback+mounts
 

___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] RAM based devices as ZIL

2013-09-19 Thread Steve Gonczi
For a fast (high ingest rate) system, 4G may not be enough. 

If your RAM device does not have enough space to hold all the in-flight ZIL 
blocks, 
it will fail over, and the ZIL will just redirect to your main data pool. 

This is hard to notice unless you have an idea of how much 
data should be flowing to your pool as you monitor it with 
zpool iostat; then you may notice the extra data being written to your 
data pool. 
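
One way to check ("tank" is a placeholder): 

# zpool iostat -v tank 5 

With a separate log vdev, the -v output breaks the traffic out per vdev; if the 
log device shows little or no write activity while synchronous writes are 
flowing, the ZIL has most likely spilled over to the main pool. 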

The calculation of how much ZIL space you need is not straightforward, 
because blocks in general are freed in a delayed manner. 

In other words, it is possible that some ZIL blocks are no longer needed 
because the transactions they represent have already committed, but the blocks 
have not made it back to available status because of the conservative 
nature of the freed-block recycling algorithm. 

Rule of thumb: 3 to 5 txgs' worth of ingest, depending on who you ask. 

Dedup and compression make SLOG sizing harder, because the ZIL is neither 
compressed nor deduped. I would say if you dedup and/or compress, 
all bets are off. 
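
To put rough numbers on it (the ingest figure is an assumption, not a 
recommendation): at 300 MB/s of sustained synchronous ingest and the default 
~5 second txg interval, one txg is about 1.5 GB, so 3 to 5 txgs lands in the 
4.5 to 7.5 GB range, already past a 4 GB device even before the delayed block 
recycling described above is accounted for. 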



/sG/ 

- Original Message -
Hello, 

Does anyone have any real world experience using a RAM based device like the 
DDRdrive X1 as a ZIL on 151a7? At 4GB they seem to be a little small but with 
some txg commit interval tweaking it looks like it may work. The entire 4GB is 
saved to NAND in case of a power failure so it seems like a pretty safe 
solution 
(entire system is on UPS and generator anyway). 

Thanks, 

Wim 
___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] spontaneous reboot with record in fault management

2013-08-26 Thread Steve Gonczi
This looks like an actual PCI device error to me. 
I would dig deeper and look at the errors with fmdump -v -e. 
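
For example (<uuid> is the event UUID shown by fmadm faulty): 

# fmdump -e            # one line per error telemetry event 
# fmdump -eV           # the same events fully decoded, including the device 
                       # error registers for PCI/PCIe ereports 
# fmdump -v -u <uuid>  # detail on the specific fault fmadm reported 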

Steve 



- Original Message -
on occasion i have systems spontaneously rebooting. i can often find entries 
like this in fault management but it is not particularly helpful. i suspect 
there is really nothing wrong and the software is generating a panic and 
rebooting. is there a way to mask this from any type of action or figure out 
what the source of the issue is? 

in this particular case, i watched the system dump 96gb of ram onto a 
dedicated dump device. however, i was unable to retrieve the data afterwards 
and received a message from savecore that read something like 'save core: bad 
magic number b' 

any insights would be appreciated. 

thanks, 
j. 


root@db017:~# fmadm faulty 
___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] Inefficient zvol space usage on 4k drives

2013-08-07 Thread Steve Gonczi
Hi Jim, 

This looks to me more like a rounding-up problem, especially looking at the 
bug report quoted. The waste factor increases as the block size goes 
down. It looks like it fits the ratio of the block's nominal size vs. 
its minimal on-disk footprint. 

For example, compressed blocks are variable size. 
If a block compresses to some small but non-zero size, it would still take 
up the size of the smallest on-disk allocation unit. For an 8K 
block, the smallest non-zero allocation could be 4K (vs. 512 bytes). 

A similar thing would happen to small files, which take up less than a single 
block's worth of bytes. ZFS alters the block size for these, to closely match the 
actual bytes stored. 
A one-byte file would take up merely a single sector. For small files, 
a 512-byte vs. 4K minimum size can make a big difference. 

If most of the blocks are compressed, or there are a lot of small files, 
the 8K-vs-512 or 8K-vs-4K ratio pretty much predicts a doubling of 
the on-disk footprint at an 8K block size. 

I do not see how the sector size could cause a similarly significant 
increase in the on-disk footprint by making metadata storage inefficient. 

I presume that when you are talking about metadata, you mean 
the interior nodes (level > 0) of files. 

If a file is <= 3 blocks in size, it will not have any interior nodes. 
Otherwise, the nodes are allocated one page at a time, as many as needed. 
Metadata pages currently contain 128 block pointer structs (128 * 128 bytes == 16K). 
This interior node page size is independent of the file system's 
user-changeable block size. 
I do not believe that these pages are variable size. 
So a rough guesstimate would be: one 16K metadata page for every 128 blocks 
in the file. 
(Technically, there could be multiple levels of interior node pages, but the 
128x fanout is so aggressive that you can neglect those for an 
order-of-magnitude rough guess.) 


On average, metadata takes up less than 2% of the space needed by the user 
payload. 
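
As a sanity check at an 8K block size: 128 blocks x 8K = 1 MB of payload per 
16K indirect page, i.e. roughly 16K / 1024K = ~1.6%, in line with the < 2% 
figure (and smaller still at larger block sizes). 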

I am planning to play with 4K sectors to try to repeat the experiment mentioned; 
I am curious what the performance and space usage implications are when 
the file size and compression are taken into consideration. 


Steve 

- Original Message -
Yes, I've had similar results on my rig and complained some time ago... 
yet the ZFS world moves forward with desiring ashift=12 as the default 
(and it may be inevitable ultimately). I think the main problem is that 
small userdata blocks involve a larger portion of metadata, which may 
come in small blocks which don't fully cover a sector (supposedly they 
should be aggregated into up-to-16k clusters, but evidently they are not 
always). 

___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] COMSTAR qlt dropping link and resetting

2012-05-20 Thread Steve Gonczi
At one time I posted a dtrace script to track txg open times. Look for it in 
the forum archives, or I can repost it. Some other folks posted similar 
scripts as well.

I would not be surprised to find a txg being open for an unusually long time 
when the problem happens.

That would indicate a problem in the ZFS disk I/O path. E.g., large file 
deletions with dedup turned on may cause an I/O storm, and that 
in turn may cause your 
problem.
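
As a rough sketch (not the original script), something like this times each 
txg sync pass (spa_sync), which serves a similar purpose:

# dtrace -n '
  fbt::spa_sync:entry  { self->t = timestamp; }
  fbt::spa_sync:return /self->t/ {
      @["spa_sync time (ms)"] = quantize((timestamp - self->t) / 1000000);
      self->t = 0;
  }'

Unusually long or steadily growing sync times around the moment both targets 
reset would point at the ZFS I/O path rather than the FC layer.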

Steve 


On May 18, 2012, at 4:25 AM, Adrian Carpenter ta...@wbic.cam.ac.uk wrote:

 We are  using one port of each of  a pair of Qlogic 2562 cards to act as a FC 
 target for our Xen environment, we are running oi_151a4.  The other port on 
 each card is used as an initiator to attach to some FC storage(Nexsan).  We 
 use two Qlogic SanBoxes configured so that the Xen hypervisors have a 
 redundant path to the FC target which provides a Storage Repository from a 
 ZVOL.  Everything works really well as expected with good throughput,  but 
 randomly, once every couple of days, simultaneously BOTH FC targets on the 
 OpenIndiana box reset their FC connections - this of course causes real 
 problems in the Xen environment and is not helped by having redundant multipaths.
 
 We are at a bit of a loss,  does anyone have any suggestions?
 
 Adrian
 
 
 
 ___
 OpenIndiana-discuss mailing list
 OpenIndiana-discuss@openindiana.org
 http://openindiana.org/mailman/listinfo/openindiana-discuss

___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] Access to /etc on a pool from another system?

2012-03-19 Thread Steve Gonczi
I am reading your mail just as I connect the disk array from a dismantled OI 148 
system to another computer and... 

/etc is typically on your root pool (i.e. rpool). 

Note that the rpool is often on a different device, perhaps a flash drive. You 
may or may not 
have connected this to the new motherboard. 
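
Assuming the imported pool really is the old root pool and it was imported 
with an altroot of /mnt, a minimal sketch (the boot environment name below is 
a guess; check zfs list): 

# zfs list -r fpool/ROOT                 # find the boot environment dataset 
# zfs mount -O fpool/ROOT/openindiana    # root BEs are canmount=noauto; -O overlays 
                                         # on the (non-empty) altroot directory 
# ls /mnt/etc 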


Steve 

- Original Message -
I'm trying to access the /etc files from another system on which I installed OI 
148. I can import the pool as fpool and can access /mnt/fpool and /mnt/export. 

But for the life of me I can't figure out how to get to the /etc filesystem in 
fpool. All the examples google turns up point to things I already know how to 
do (e.g access fpool/export) 

I *think* I've done this before, but don't find any notes in my logbook 

Thanks, 
Reg 

___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] How do I debug this?

2012-02-08 Thread Steve Gonczi
It is a long shot, but check how much space you have where your core dumps are 
supposed to go. Your root pool may have limited space. 

Also, the visibility of core dumps has security implications; they could be 
inaccessible unless you are looking as root. 
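
A couple of quick checks (paths are only the usual defaults): 

# coreadm                       # current global and per-process core patterns 
# dumpadm                       # dump device and savecore directory 
# df -h /var/cores /var/crash   # is there actually room there? 
# ls -l /var/cores              # look as root; cores are typically mode 600 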

Steve 

- Original Message -
Thanks. I'll give it a go and see if I get a core file. Interesting 
that I have per-process core dumps enabled but this one just didn't show up. 


___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] Replacing OI 151 ssh with OpenSSH 5.9?

2012-01-18 Thread Steve Gonczi
Take a look at README.altprivsep in usr/src/cmd/ssh. 
It seems the Solaris team significantly changed how privilege 
separation works. 

Looking at the illumos hg log (which contains 
the tail end of the OpenSolaris hg log), the Sun ssh code was periodically 
resynced with OpenSSH. The last resync visible is 2009/408 
(presumably 2009 April 8). 

That would peg the Sun ssh version as last synced with OpenSSH 5.2. 

The current OpenSSH is 5.9. 


Steve G 



- Original Message -
They're needed so that sshd correctly uses solaris's version of PAM and audit 
and other subsystems like that. 


Probably but someone would have to do the work. 


___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] ZIL write cache performance

2012-01-13 Thread Steve Gonczi
In the scenario you are describing, an SSD should be faster.
The reason it is not cut and dried is that you are comparing the spare
bandwidth of your array, which possibly has many
spindles, to the bandwidth of a single device. So it depends on how fast your
array is, how much spare bandwidth it has, and what the sustainable write
rate of your SSD is. There is no guarantee that the SSD will come out on top,
although in your case it would, with the perf fix.

::sG::


On Jan 13, 2012, at 6:56 AM, Matt Connolly matt.connolly...@gmail.com wrote:

 Yes, it is as you guess comparing ZIL on the main pool vs ZIL on an SSD. I 
 understand that the ZIL on its own is more of an integrity function rather 
 than a performance boost. 
 
 However, I would have expected some performance boost by using an SSD log 
 device since writing to the dedicated log device reduces I/O load on the main 
 pool (or is this wrong?)
 
 Thanks for the heads up about the bug and pending fix.. I'll take a look.
 
 -Matt. 
 
 On 13/01/2012, at 9:34 AM, Steve Gonczi gon...@comcast.net wrote:
 
 Hi Matt, 
 
 The ZIL is not a performance enhancer. (This is a common misunderstanding, 
 people sometimes view the ZIL as a write cache) . 
 
 It is a way to simulate sync semantics on files where you really 
  need that, instead of the coarser-granularity guarantee that zfs gives 
 you without it. (txg level, where the in-progress transaction group may 
 roll back if you crash). 
 
 If I am reading your post correctly, you are comparing 2 scenarios 
 
 1) Zil is enabled, and goes to the main storage pool 
 2) Zil is enabled but it goes to a dedicated SSD instead. 
 Please verify that this is indeed the case.
 
 Yes, this is the case. 
 
  You should not expect an SSD-based ZIL to perform better than 
  turning the ZIL off altogether. 
 
 The latter of course will have better 
 performance, but you have to live with the possibility of losing some data. 
 
 Given that the case is (1) and (2), it all depends on how much performance 
 headroom your pool has, vs. the write performance of the SSD. 
 
 A fast SSD ( e.g.: DRAM based, and preferably dedicated to Zil and not 
 split) 
 would work best. It does not have to be huge, just large enough to store 
 (say) 
 5 seconds worth of your planned peak data inflow. 
 
 You need to be aware of a recent performance regression discovered 
 pertaining to 
 ZIL ( George Wilson has just posted the fix for review on the illumos dev 
 list) 
 This has been in Illumos for a while, so it is possible, that it is biting 
 you. 
 
 Steve 
 
 - Original Message -
 Hi, I've installed an SSD drive in my OI machine and have it partitioned 
 (sliced) with a main slice to boot from and a smaller slice to use as a 
 write cache (ZIL) for our data pool. 
 
 I've noticed that for many tasks, using the ZIL actually slows many tasks at 
 hand (operation within a qemu-kvm virtual machine, mysql loading importing a 
  dump file, etc). I know I bought a cheap SSD to play with, so I wasn't 
  expecting the best performance, but I would have expected some improvement, 
 not a slow down. 
 
 In one particular test, I have mysql running in a zone and loading a test 
 data set takes about 40 seconds without the ZIL and about 60 seconds with 
 ZIL. I certainly wasn't expecting a 50% slow down. 
 
 Is this to be expected? 
 
 Are there any best practices for testing an SSD to see if it will actually 
 improve performance of a zfs pool? 
 
 
 Thanks, 
 Matt 
 
 

___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] ZIL write cache performance

2012-01-12 Thread Steve Gonczi
Hi Matt, 

The ZIL is not a performance enhancer. (This is a common misunderstanding; 
people sometimes view the ZIL as a write cache.) 

It is a way to simulate sync semantics on files where you really 
need that, instead of the coarser-granularity guarantee that ZFS gives 
you without it (txg level, where the in-progress transaction group may 
roll back if you crash). 

If I am reading your post correctly, you are comparing 2 scenarios: 

1) ZIL is enabled, and goes to the main storage pool 
2) ZIL is enabled, but it goes to a dedicated SSD instead 
Please verify that this is indeed the case. 

You should not expect an SSD-based ZIL to perform better than 
turning the ZIL off altogether. 

The latter of course will have better 
performance, but you have to live with the possibility of losing some data. 
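
A test-only way to quantify that difference on a scratch dataset (this throws 
away sync-write safety while it is set; names are placeholders): 

# zfs set sync=disabled tank/scratch    # run the benchmark, then put it back: 
# zfs set sync=standard tank/scratch 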

Given that the case is (1) vs. (2), it all depends on how much performance 
headroom your pool has vs. the write performance of the SSD. 

A fast SSD (e.g. DRAM based, and preferably dedicated to the ZIL and not split) 
would work best. It does not have to be huge, just large enough to store (say) 
5 seconds' worth of your planned peak data inflow. 

You need to be aware of a recently discovered performance regression pertaining to the 
ZIL (George Wilson has just posted the fix for review on the illumos dev list). 
The regression has been in illumos for a while, so it is possible that it is biting you. 

Steve 

- Original Message -
Hi, I've installed an SSD drive in my OI machine and have it partitioned 
(sliced) with a main slice to boot from and a smaller slice to use as a write 
cache (ZIL) for our data pool. 

I've noticed that for many tasks, using the ZIL actually slows many tasks at 
hand (operation within a qemu-kvm virtual machine, mysql loading importing a 
dump file, etc). I know I bought a cheap SSD to play with, so I wasn't expecting 
the best performance, but I would have expected some improvement, not a slow 
down. 

In one particular test, I have mysql running in a zone and loading a test data 
set takes about 40 seconds without the ZIL and about 60 seconds with ZIL. I 
certainly wasn't expecting a 50% slow down. 

Is this to be expected? 

Are there any best practices for testing an SSD to see if it will actually 
improve performance of a zfs pool? 


Thanks, 
Matt 


___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] ZFS stalls with oi_151?

2011-10-21 Thread Steve Gonczi
Are you running with dedup enabled?

If the box is still responsive, try to generate a thread stack listing, e.g.:
echo ::threadlist -v | mdb -k > /tmp/threads.txt
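
A quick way to check the dedup question ("tank" is a placeholder):

# zpool list -o name,size,allocated,dedupratio tank
# zfs get -r dedup tank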

Steve

On Oct 21, 2011, at 4:16, Tommy Eriksen t...@rackhosting.com wrote:

 Hi guys,
 
 I've got a bit of a ZFS problem:
 All of a sudden, and it doesn't seem related to load or anything, the system 
 will stop writing to the disks in my storage pool. No error messages are 
 logged (that I can find anyway), nothing in dmesg, messages or the likes. 
 
 ZFS stalls, a simple snapshot command (or the likes) just hangs indefinitely 
 and can't be stopped with ctrl+c or kill -9.
 
 Today, the stall happened after I had been running 2 VMs on each (running on 
 vsphere5 connecting via iscsi) running iozone -s 200G (just to generate a 
 bunch of load). Happily, this morning, I saw that they were still running 
 without problem and stopped them. Then, when asking vSphere to delete the 
 VMs, all write I/O stalled. A bit too much irony for me :)
 
 However, and this puzzled me, everything else seems to run perfectly, even up 
 to zfs writing new data on the l2arc devices while data is read.
 
 Boxes (2 of the same) are:
 Supermicro based, 24 bay chassis
 2*X5645 Xeon
 48gigs of RAM
 3*LSI2008 controllers coupled to
 20 Seagate Constellation ES 3TB SATA
 2 Intel 600GB SSD
 2 Intel 311 20GB SSD
 
 18 of the 3TB drives are set up in mirrored vdevs, the last 2 are spares.
 
 Running oi_151a (trying a downgrade to 148 today, I think, since I have 5 or 
 so boxes running without problems on 148, but both my 151a are playing up).
 
 /etc/system variables:
 set zfs:zfs_vdev_max_pending = 4
 set zfs:l2arc_noprefetch = 0
 set zfs:zfs_vdev_cache_size = 0
 
 
 I can write to a (spare) disk on the same controller without errors, so I 
 take it its not a general I/O stall on the controller:
 root@zfsnas3:/var/adm# dd if=/dev/zero of=/dev/rdsk/c8t5000C50035DE14FAd0s0 
 bs=1M
 ^C1640+0 records in
 1640+0 records out
 1719664640 bytes (1.7 GB) copied, 11.131 s, 154 MB/s
 
 iostat reported - note no writes to any of the other drives. All writes just 
 stall.
 
   extended device statistics    errors --- 
   r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot 
 device
 3631.6  167.2 14505.5 152337.1  0.0  2.20.00.6   0 157   0   0   0   
 0 c8
 109.00.8  472.90.0  0.0  0.00.00.5   0   3   0   0   0   0 
 c8t5000C50035B922CCd0
 143.00.8  567.10.0  0.0  0.10.00.5   0   3   0   0   0   0 
 c8t5000C50035CA8A5Cd0
  89.60.8  414.10.0  0.0  0.10.00.6   0   2   0   0   0   0 
 c8t5000C50035CAB258d0
  95.80.8  443.30.0  0.0  0.00.00.5   0   2   0   0   0   0 
 c8t5000C50035DE3DEBd0
 144.80.8  626.40.0  0.0  0.10.00.6   0   4   0   0   0   0 
 c8t5000C50035BE1945d0
 134.00.8  505.70.0  0.0  0.00.00.4   0   3   0   0   0   0 
 c8t5000C50035DDB02Ed0
   1.00.43.40.0  0.0  0.00.00.0   0   0   0   0   0   0 
 c8t5000C50035DE0414d0
 107.80.8  461.60.0  0.0  0.00.00.3   0   2   0   0   0   0 
 c8t5000C50035D40D15d0
 117.20.8  516.50.0  0.0  0.10.00.5   0   3   0   0   0   0 
 c8t5000C50035DE0C86d0
  64.20.8  261.20.0  0.0  0.00.00.6   0   2   0   0   0   0 
 c8t5000C50035DD6044d0
   2.00.86.80.0  0.0  0.00.00.0   0   0   0   0   0   0 
 c8t5001517959582943d0
   2.00.86.80.0  0.0  0.00.00.0   0   0   0   0   0   0 
 c8t5001517959582691d0
 109.80.8  423.50.0  0.0  0.00.00.3   0   2   0   0   0   0 
 c8t5000C50035C13A6Bd0
 765.00.8 3070.90.0  0.0  0.20.00.2   0   7   0   0   0   0 
 c8t5001517959699FE0d0
   1.0  149.23.4 152337.1  0.0  1.00.06.5   0  97   0   0   0   0 
 c8t5000C50035DE14FAd0
 210.40.8  775.40.0  0.0  0.10.00.4   0   3   0   0   0   0 
 c8t5000C50035CA1E58d0
 689.40.8 2776.60.0  0.0  0.10.00.2   0   7   0   0   0   0 
 c8t50015179596A8717d0
 108.60.8  430.50.0  0.0  0.00.00.4   0   2   0   0   0   0 
 c8t5000C50035CBD12Ad0
 165.60.8  561.50.0  0.0  0.10.00.4   0   3   0   0   0   0 
 c8t5000C50035CA90DDd0
 164.40.8  578.50.0  0.0  0.10.00.4   0   4   0   0   0   0 
 c8t5000C50035DDFC34d0
 125.60.8  477.70.0  0.0  0.00.00.4   0   2   0   0   0   0 
 c8t5000C50035DE2AD3d0
  93.20.8  371.30.0  0.0  0.00.00.4   0   2   0   0   0   0 
 c8t5000C50035B94C40d0
 113.20.8  445.30.0  0.0  0.10.00.5   0   3   0   0   0   0 
 c8t5000C50035BA02AEd0
  75.40.8  304.80.0  0.0  0.00.00.4   0   2   0   0   0   0 
 c8t5000C50035DDA579d0
 
 
 …Is anyone else seeing similar?
 
 Thanks a lot,
 Tommy
 ___
 OpenIndiana-discuss mailing list
 OpenIndiana-discuss@openindiana.org
 

Re: [OpenIndiana-discuss] How to troubleshoot failing hardware causing boot hangs

2011-09-14 Thread Steve Gonczi
Hello, 

Looking at the hald source (usr/src/cmd/hal/hald/hald.c): 

Error 95 is coming from a script; it is just informing you that a fatal error 
occurred. 
The informative error code is the 2. 

This tells you that hald forked a child process, and it timed out 
waiting for the child process to write to a pipe. 

The child process hung or failed for some reason, and the parent decided to 
kill it. The child code could hang for a number of reasons. 

One possible way to debug this is to load mdb so that it breaks early in the boot, 
then set breakpoints on some of the processing steps, like 
hald_dbus_local_server_init, ospec_init, etc., to 
narrow down where it hangs. 

I see from the source that hald has fairly detailed built-in logging that may help in 
debugging this. 

If the environment variables HALD_VERBOSE and HALD_USE_SYSLOG are defined, 
you should get detailed status messages. 
There is probably a man page somewhere on how to set these. 
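
One way to set them for the SMF-managed daemon (a sketch; verify the service 
FMRI and the variable handling against hald(1M) on your build): 

# svccfg -s svc:/system/hal:default setenv HALD_VERBOSE 1 
# svccfg -s svc:/system/hal:default setenv HALD_USE_SYSLOG 1 
# svcadm refresh hal && svcadm restart hal 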

Said log settings can also be modified via hald command line options 
(sorry, I have no idea what script or setup file you have to hack to 
specify these on startup): 
static void
usage ()
{
    fprintf (stderr, "\n usage : hald [--daemon=yes|no] [--verbose=yes|no] [--help]\n");
    fprintf (stderr,
        "\n"
        "        --daemon=yes|no    Become a daemon\n"
        "        --verbose=yes|no   Print out debug (overrides HALD_VERBOSE)\n"
        "        --use-syslog       Print out debug messages to syslog instead of stderr.\n"
        "                           Use this option to get debug messages if HAL runs as\n"
        "                           daemon.\n"
        "        --help             Show this information and exit\n"
        "        --version          Output version information and exit\n"
        "\n"
        "The HAL daemon detects devices present in the system and provides the\n"
        "org.freedesktop.Hal service through the system-wide message bus provided\n"
        "by D-BUS.\n");
}


Steve 


- Original Message -
Hi, 
I'm about to RMA my motherboard but before that I want to troubleshoot 
the issue further so that I can give more specific information on what's 
failing on the motherboard. 

What happens is that some hardware is failing on the motherboard which 
causes OI to hang during boot. So my question is how can I find out what 
hardware is failing? The problem is that when I reset the system it 
boots up just fine after the reset and e.g. the svcs -xv gives no 
information on failures on last boot. These issues also don't happen 
every time I start up the system, it happens rather sporadically. 

Here's what I found out; when it freezes, the last lines of the console 
looks like this: 
___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] How to troubleshoot failing hardware causing boot hangs

2011-09-14 Thread Steve Gonczi
Perhaps the focus should be on amping up hald logging, so that if and when 
the problem happens you have some info to look at. 

The hald man page has examples of how to do this via svccfg. 

Steve 

- Original Message -
Hi Steve, thanks a lot for your help! 
The problem is that the issues that occur are different at different 
bootups. Since the beginning of this year this computer/server has been 
started up and shut down a bit over 200 times, where this error occurred 
5 times including today. 
---snip--- 

___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] init 6/reboot reboots the OS but not the hardware

2011-08-04 Thread Steve Gonczi
You need remote reset or remote power cycle capability. 
An ILOM console, if your hardware has support for it, 
would provide this. 

ILOM is common on most server class hardware 
(Sun servers certainly have it, as do Supermicro 
boards). 

Failing that, there are inexpensive remotely controlled 
power outlets you can buy. These let you switch the 
individual power plugs on and off remotely. 

Steve 

- Original Message -
-BEGIN PGP SIGNED MESSAGE- 
Hash: SHA1 

Hi, I just upgraded an aged OpenSolaris 2009.6 to OpenIndiana 148 and I 
have a silly but annoying issue. 

When doing init 6 or even reboot, the OS shuts down and reboots 
immediately, but the machine does not actually reboot. 

That is, the system is shut down and the kernel reboots immediately, 
just like a zone reboot would, but the machine doesn't power-cycle, 
doesn't show the BIOS, doesn't show the POST, and, the really annoying 
thing in this case, doesn't show the GRUB menu. And this is critical to me. 

The machine is in a remote datacenter, but I have KVM access. What can I 
do? 

My GRUB entry is: (just in case is something related to the way I launch OI) 

 
root@ns224064:~# cat /rpool/boot/grub/menu.lst 
default 2 
timeout 10 

title opensolaris-2 
bootfs rpool/ROOT/opensolaris-2 
kernel$ /platform/i86pc/kernel/$ISADIR/unix -v -B $ZFS-BOOTFS 
module$ /platform/i86pc/$ISADIR/boot_archive 
# End of LIBBE entry = 

title SOLARIS10 
rootnoverify (hd2,0) 
chainloader +1 

title OpenIndiana-148 
bootfs rpool/ROOT/OpenIndiana-148 
kernel$ /platform/i86pc/kernel/$ISADIR/unix -v -B $ZFS-BOOTFS 
module$ /platform/i86pc/$ISADIR/boot_archive 
# End of LIBBE entry = 
 

- -- 
Jesus Cea Avion _/_/ _/_/_/ _/_/_/ 
j...@jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ 
jabber / xmpp:j...@jabber.org _/_/ _/_/ _/_/_/_/_/ 
. _/_/ _/_/ _/_/ _/_/ _/_/ 
Things are not so easy _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ 
My name is Dump, Core Dump _/_/_/ _/_/_/ _/_/ _/_/ 
El amor es poner tu felicidad en la felicidad de otro - Leibniz 
-BEGIN PGP SIGNATURE- 
Version: GnuPG v1.4.10 (GNU/Linux) 
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ 

iQCVAwUBTjrnwJlgi5GaxT1NAQLoJAQAn26zsDa2nbKz8c1gmVFA/R4ODB0Y1SHf 
hY8k9f96PEABlyAo9gI5ggijFzAzGmNzlGwwJVgEwZllbcnBZjhFL2RA2zHLbstU 
4CvWUhVsKZuKPukIUs5136ezZLaxGmJ76UnaMo1xk8J+NNtGfrCCil/C8sBCmZYR 
Hlntc/HVDmc= 
=JsqL 
-END PGP SIGNATURE- 

___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] Kernel Panic installing openindiana on a HP BL460c G1

2011-06-27 Thread Steve Gonczi
Hello, 

This should be analyzed and root caused. 
A ::stack would be useful to see the call parameters. 

Without disassembling zio_buf_alloc(), I can only guess that 
the mutex_enter you see crashing is really in kmem_cache_alloc(). 

If that proves to be the case, I would verify the offset of cc_lock in ccp, to 
see if ccp was null or corrupt. 

The next step would be a trip to the kmem_cpu_cache related code, to see if 
KMEM_CPU_CACHE can return zero or a corrupt value in some cases. 

Again, my guess would be that this is a NULL pointer dereference. 

Does this system suffer from a severe out-of-memory condition? 
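
If you can get a dump out of it, a quick first pass would be something like 
this (file names are the usual savecore defaults; run savecore -f vmdump.0 
first if only the compressed dump exists): 

# mdb unix.0 vmcore.0 
> ::status 
> ::msgbuf 
> ::stack 
> ::offsetof kmem_cpu_cache_t cc_lock 

That should confirm whether ccp (and hence the cc_lock address) was NULL or 
garbage. 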

Best Wishes, 

Steve 



void *
kmem_cache_alloc(kmem_cache_t *cp, int kmflag)
{
    kmem_cpu_cache_t *ccp = KMEM_CPU_CACHE(cp);
    kmem_magazine_t *fmp;
    void *buf;

    mutex_enter(&ccp->cc_lock);
    ...


/sG/ 

- Original Message -
Now with the pictures, hope this works: 

http://imageshack.us/photo/my-images/402/oi151paniccdollarstatus.jpg/ 
http://img3.imageshack.us/i/oi151panicmsgbuf1.jpg/ 
http://img5.imageshack.us/i/oi151panicmsgbuf2.jpg/ 
http://img89.imageshack.us/i/oi151panicmsgbuf3.jpg/ 
http://img23.imageshack.us/i/oi151panicmsgbuf4.jpg/ 
http://img16.imageshack.us/i/oi151panicmsgbuf5.jpg/ 
http://img694.imageshack.us/i/oi151panicmsgbuf6.jpg/ 
___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] oracle removes 32bit x86 cpu support for solaris 11 will OI do same?

2011-06-24 Thread Steve Gonczi
For Intel CPUs, 32-bit code is certainly more compact, and in some cases 
arguably faster than 64-bit code (say, comparing the same code on the same 
machine compiled 32-bit and 64-bit). 

But newer CPU silicon tends to make performance improvements 
in many ways (e.g. locating more supporting circuitry on the CPU's silicon, 
increasing L1/L2 cache sizes, etc.). 

Newer CPUs also tend to be more energy efficient; 
Intel made great strides towards energy efficiency, 
e.g. idling the CPU when not in use (deep C states, etc.), 
gating off any circuitry that is not in use, and modulating the CPU clock rate 
(SpeedStep). 

So performance and energy efficiency depend more on 
which generation of CPU core design we have than on 
just the bitness. 


The primary advantage of 64-bit per se (i.e. running a given CPU in 64-bit mode) 
is the increased addressable memory space. 
The current hardware limit set by the manufacturers is at 48 address bits 
(a 256-terabyte theoretical limit). Actual OS support cuts this in half, or less. 
Motherboard limitations further curtail this, but 48G motherboards are now 
commonplace. 

On 32-bit Intel (AMD) you are typically limited to 4G, which is split between 
kernel and userland 
depending on the OS and configuration (e.g. 1G kernel and 3G userland). 

Steve 

- Michael Stapleton michael.staple...@techsologic.com wrote: 


While we are talking about 32 | 64 bit processes; 
Which one is better? 
Faster? 
More efficient? 

Mike 








___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] PANIC vmem_hash_delete(): bad free all versions svn_111

2011-05-14 Thread Steve Gonczi
Hi Gabriel, 

The immediate cause of this panic is an attempt to free an 
address == NULL. 

The interesting part (how this comes about) is hard to figure out 
without more info. 

At the very minimum a stack (and some amount of luck), 
and ideally a crash dump, would be necessary. 

This brings into focus another issue: the community would benefit 
from a server where people could upload crash dumps in cases like this. 

I am sure there are several people reading this list who may be able and 
inclined to take a quick look and provide a first-cut diagnosis on a 
volunteer basis. 

Steve 

/sG/ 

- Gabriel de la Cruz gabriel.delac...@gmail.com wrote: 


Hi, 
could someone point out to me what is going on here? 
I have an IBM x3550 M3 panicking with any kernel version higher than 
svn_111... 
I tried upgrading to svn_134, installing svn_134 from live cd and with 
oi_148b live cd.. 
The live CDs panic as well. 

Any ideas? 
Thanks! :D 


___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] PANIC vmem_hash_delete(): bad free all versions svn_111

2011-05-14 Thread Steve Gonczi
man dumpadm. 

You have to enable crash dumps and select a location where you have some 
room to save them. When you just run dumpadm, it will tell you 
what your current settings are. 

If you do not have crash dumps enabled, you may still be able to save the last 
crash dump by running savecore /some/location/where/you/have/room 
as soon as possible after you come up. 
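
For example (directories are placeholders): 

# dumpadm                               # show the current configuration 
# dumpadm -d /dev/zvol/dsk/rpool/dump   # pick a dump device with enough room 
# dumpadm -s /var/crash/myhost          # where savecore should write dumps 
# dumpadm -y                            # run savecore automatically on reboot 
# savecore -L                           # or take a live dump (needs a dedicated 
                                        # dump device) 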

If you are unable to boot up (you keep crashing), you could edit the grub 
menu entry for the current kernel and substitute -k -d 
for console=graphics, so a crash drops you into the kernel debugger to look around. 


/sG/ 

- 
___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


Re: [OpenIndiana-discuss] Monitoring OI with Zenoss

2011-01-28 Thread Steve Gonczi
Check out Chime.

-:::-sG-:::-

On Jan 27, 2011, at 23:28, WK openindi...@familyk.org wrote:

 I am experimenting with using Zenoss to monitor OI 148, and I was wondering 
 if anyone had any advice on configuring SNMP for this purpose? Zenoss is 
 showing the uptime, but not much more. On my Linux machines, it shows network 
 routers, file systems, load average, cpu utilization, memory utilization and 
 I/O. I'm just trying to get a similar display for OI.
 
 ___
 OpenIndiana-discuss mailing list
 OpenIndiana-discuss@openindiana.org
 http://openindiana.org/mailman/listinfo/openindiana-discuss

___
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss