Re: [osol-discuss] Long Startup Times (2.5hrs) in 2009.06

2009-07-27 Thread Wolfram Tomalla
Hi Andy!

And how many snapshots do these filesystems have, especially the volumes?
Going by the messages, I would expect that the box is creating
two devices for each snapshot of your volume.
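
A rough sketch for answering the snapshot question, assuming the box has the standard `zfs list -H -t snapshot -o name` command available. The helper names and the sample dataset names are made up for illustration; only the counting logic is exercised here.

```python
import subprocess
from collections import Counter


def snapshot_names():
    """Return snapshot names reported by `zfs list` (needs a host with ZFS)."""
    out = subprocess.run(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def count_by_dataset(names):
    """Count snapshots per dataset; names look like 'pool/dataset@snapname'."""
    return Counter(name.split("@", 1)[0] for name in names if "@" in name)


# Hypothetical names, for illustration only; on the real box one would pass
# snapshot_names() instead of this list.
sample = ["tank/share@mon", "tank/share@tue", "tank/tmvol@mon"]
for dataset, n in sorted(count_by_dataset(sample).items()):
    print(f"{dataset}: {n} snapshot(s)")
```

A large count on the iSCSI volume would support the guess; a count near zero would point elsewhere.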

Just a guess.

Wolfram

2009/7/27 Andrew Turner andrew.tur...@spearpointsolutions.co.uk:
 Cleaning up the Thread..


 Export completes successfully in 30s or so. Re-Importing now, same messages 
 in the log and it's been running 1h43m and counting.

 How many filesystems are in the pool?

 --
 Ian.



 Hi Ian

 There are two ZFS filesystems, one of 2.2T shared via CIFS and one of 500G 
 which is shared over iSCSI, used for Time Machine and thus reformatted as HFS+.

 Regards

 Andy
 --
 This message posted from opensolaris.org
 ___
 opensolaris-discuss mailing list
 opensolaris-discuss@opensolaris.org



Re: [osol-discuss] Long Startup Times (2.5hrs) in 2009.06

2009-07-27 Thread Jürgen Keil
 Pulling the messages from /var/adm/messages, it boots normally up to:
 
 Jul 26 17:49:03 Moria genunix: [ID 936769 kern.info] lx_systrace0 is /pseudo/lx_systr...@0
 Jul 26 17:49:03 Moria pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 1 irq 0xf vector 0x41 ioapic 0x4 intin 0xf is bound to cpu 0

What p-ata devices are connected to this system?

Where are the 4x1TB drives connected?  p-ata? or nv_sata?

Maybe there is a p-ata device connected to the ide controller
that is using IRQ14, but somehow the system is trying to
probe for p-ata ide devices on IRQ15 - but that secondary
ide channel does not exist?


Re: [osol-discuss] Long Startup Times (2.5hrs) in 2009.06

2009-07-27 Thread Andrew Turner
Hi Wolfram,

According to the bug, and on re-reading the messages, the problem is actually to 
do with the way the box tries to initialize IDE channel 1, which doesn't have 
any equipment attached (the CD-ROM is on channel 0 and initializes cleanly; 
everything else is SATA).

Of course, why it does that, and why it's related to the SATA drives, is a 
question for the devs! :-)

Andy


Re: [osol-discuss] Long Startup Times (2.5hrs) in 2009.06

2009-07-27 Thread Andrew Turner
Yup it certainly appears that way!  Other drives are on SATA.  Check bug 9909 
for details.


[osol-discuss] Long Startup Times (2.5hrs) in 2009.06

2009-07-26 Thread Andrew Turner
I was wondering if anyone might be able to give me some pointers on 
troubleshooting extremely long startup times (2.5hrs).  This problem has 
only manifested itself in 2009.06 and was not present in either 2008.05 or 
2008.11. The hardware configuration has not changed.

The issue is definitely related to the 4x1TB RAID-Z array on the box: when I 
disconnect the drives, the boot time is very snappy.  With them connected it 
sits there for 2.5hrs, seemingly doing nothing.  The box is pingable but not 
fully up (ssh etc. are refused).  /var/adm/messages doesn't give a lot of 
help; the only message is that an IRQ is being assigned to CPU 0 and CPU 1 
alternately.  I'll add an exact snippet once the box has booted, as it's in 
this state at the moment.

I've looked to see whether this is an IRQ conflict, and have enabled MSI 
interrupts on the nge driver to remove the major conflicts, but without any 
change.  Any pointers would be appreciated!

Thanks!


Re: [osol-discuss] Long Startup Times (2.5hrs) in 2009.06

2009-07-26 Thread Ian Collins

Andrew Turner wrote:

I was wondering if anyone might be able to give me some pointers on 
troubleshooting extremely long startup times (2.5hrs).  This problem has 
only manifested itself in 2009.06 and was not present in either 2008.05 or 
2008.11. The hardware configuration has not changed.

The issue is definitely related to the 4x1TB RAID-Z array on the box: when I 
disconnect the drives, the boot time is very snappy.


What happens if you export the pool (but leave the drives connected) 
before rebooting?


--
Ian.



Re: [osol-discuss] Long Startup Times (2.5hrs) in 2009.06

2009-07-26 Thread Andrew Turner
Thanks, I'll try that later, though I expect it to boot cleanly, as I have 
tried building a clean install of 2009.06 in the past and then importing the 
pool.  Pre-import it was snappy, post-import terrible. For the record, I 
upgraded the pool to v14 when I moved to 2009.06.

Pulling the messages from /var/adm/messages, it boots normally up to:

Jul 26 17:49:03 Moria genunix: [ID 936769 kern.info] lx_systrace0 is /pseudo/lx_systr...@0
Jul 26 17:49:03 Moria pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 1 irq 0xf vector 0x41 ioapic 0x4 intin 0xf is bound to cpu 0
Jul 26 17:49:14 Moria pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 1 irq 0xf vector 0x41 ioapic 0x4 intin 0xf is bound to cpu 1


and then a few thousand messages and 3 hours later..


Jul 26 20:54:34 Moria pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 1 irq 0xf vector 0x41 ioapic 0x4 intin 0xf is bound to cpu 1
Jul 26 20:54:41 Moria pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 1 irq 0xf vector 0x41 ioapic 0x4 intin 0xf is bound to cpu 0
Jul 26 20:54:49 Moria genunix: [ID 454863 kern.info] dump on /dev/zvol/dsk/rpool/dump size 2047 MB
Jul 26 20:54:53 Moria /usr/lib/power/powerd: [ID 387247 daemon.error] Able to open /dev/srn
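
To put a number on "a few thousand messages and 3 hours", something like the following could count the pcplusmp rebinding messages and measure the time between the first and last one. This is only a sketch: the function name is made up, the year is assumed to be 2009 because syslog timestamps omit it, and the sample text is just the four pcplusmp lines quoted in this post.

```python
import re
from datetime import datetime

# Matches one pcplusmp rebinding message, even when the mail client has
# wrapped it across two lines (hence \s+ between words and re.S).
PATTERN = re.compile(
    r"(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+pcplusmp:.*?"
    r"instance\s+(\d+)\s+irq\s+(0x[0-9a-f]+).*?is\s+bound\s+to\s+cpu\s+(\d+)",
    re.S,
)


def summarise(text, year=2009):
    """Return (message count, time between first and last message)."""
    stamps = [
        datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        for stamp, _inst, _irq, _cpu in PATTERN.findall(text)
    ]
    if not stamps:
        return 0, None
    return len(stamps), stamps[-1] - stamps[0]


# Only the four pcplusmp lines quoted in the post; the real log has thousands.
SAMPLE = """\
Jul 26 17:49:03 Moria pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata)
instance 1 irq 0xf vector 0x41 ioapic 0x4 intin 0xf is bound to cpu 0
Jul 26 17:49:14 Moria pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata)
instance 1 irq 0xf vector 0x41 ioapic 0x4 intin 0xf is bound to cpu 1
Jul 26 20:54:34 Moria pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata)
instance 1 irq 0xf vector 0x41 ioapic 0x4 intin 0xf is bound to cpu 1
Jul 26 20:54:41 Moria pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata)
instance 1 irq 0xf vector 0x41 ioapic 0x4 intin 0xf is bound to cpu 0
"""

count, span = summarise(SAMPLE)
print(f"{count} rebinding messages over {span}")
```

On the real box the input would be the contents of /var/adm/messages rather than SAMPLE.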


Current IRQs are:

echo ::interrupts -d | mdb -k
IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# Driver Name(s)
4    0xb0 12  ISA    Edg Fixed  1   1     0x0/0x4   asy#0
9    0x81 9   PCI    Lvl Fixed  1   1     0x0/0x9   acpi_wrapper_isr
14   0x40 5   ISA    Edg Fixed  0   1     0x0/0xe   ata#0
16   0x83 9   PCI    Lvl Fixed  0   1     0x0/0x10  nvidia#0
20   0x85 9   PCI    Lvl Fixed  0   2     0x0/0x14  ohci#0, nv_sata#0
21   0x42 5   PCI    Lvl Fixed  1   1     0x0/0x15  nv_sata#1
22   0x43 5   PCI    Lvl Fixed  0   1     0x0/0x16  nv_sata#2
23   0x84 9   PCI    Lvl Fixed  0   1     0x0/0x17  ehci#0
24   0x82 7   PCI    Edg MSI    1   1     -         pcie_pci#1
25   0x60 6   PCI    Edg MSI    1   1     -         nge#0
26   0x61 6   PCI    Edg MSI    1   1     -         nge#0
27   0x62 6   PCI    Edg MSI    0   1     -         nge#1
28   0x63 6   PCI    Edg MSI    0   1     -         nge#1
160  0xa0 0          Edg IPI    all 0     -         poke_cpu
192  0xc0 13         Edg IPI    all 1     -         xc_serv
208  0xd0 14         Edg IPI    all 1     -         kcpc_hw_overflow_intr
209  0xd1 14         Edg IPI    all 1     -         cbe_fire
210  0xd3 14         Edg IPI    all 1     -         cbe_fire
240  0xe0 15         Edg IPI    all 1     -         xc_serv
241  0xe1 15         Edg IPI    all 1     -         apic_error_intr
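
For checking a listing like this against the log, a small parser can flag shared IRQs and show whether IRQ 15 (the 0xf in the pcplusmp messages) has an entry at all. This is a sketch only: it assumes whitespace-separated columns as mdb prints them, the function name is made up, and the sample is four rows taken from the listing above.

```python
INTERRUPT_TYPES = ("Fixed", "MSI", "MSI-X", "IPI")


def parse_interrupts(text):
    """Parse `::interrupts -d` style rows into dicts (best-effort sketch)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the header and blank lines
        # The Bus column is empty for IPI rows, so locate the Type column
        # by value instead of by position.
        for i, token in enumerate(fields):
            if token in INTERRUPT_TYPES:
                break
        else:
            continue  # no recognisable Type column on this line
        rows.append({
            "irq": int(fields[0]),
            "cpu": fields[i + 1],
            "share": int(fields[i + 2]),
            "drivers": " ".join(fields[i + 4:]).split(", "),
        })
    return rows


# Four rows copied from the listing in this post.
SAMPLE = """\
IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# Driver Name(s)
14   0x40 5   ISA    Edg Fixed  0   1     0x0/0xe   ata#0
20   0x85 9   PCI    Lvl Fixed  0   2     0x0/0x14  ohci#0, nv_sata#0
25   0x60 6   PCI    Edg MSI    1   1     -         nge#0
160  0xa0 0          Edg IPI    all 0     -         poke_cpu
"""

rows = parse_interrupts(SAMPLE)
shared = [r for r in rows if r["share"] > 1]
irqs = {r["irq"] for r in rows}
print("shared IRQs:", [(r["irq"], r["drivers"]) for r in shared])
print("IRQ 15 listed:", 15 in irqs)
```

In the listing above, ata#0 sits on IRQ 14 and there is no entry for IRQ 15, even though the log keeps rebinding ide (ata) instance 1 on irq 0xf.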


Cheers!


Re: [osol-discuss] Long Startup Times (2.5hrs) in 2009.06

2009-07-26 Thread Andrew Turner
Export completes successfully in 30s or so.  Re-Importing now, same messages in 
the log and it's been running 1h43m and counting.


Re: [osol-discuss] Long Startup Times (2.5hrs) in 2009.06

2009-07-26 Thread Ian Collins
On Mon 27/07/09 02:38 , Andrew Turner andrew.tur...@spearpointsolutions.co.uk 
sent:

 Export completes successfully in 30s or so.  Re-Importing now, same messages 
 in the log and it's been running 1h43m and counting.

How many filesystems are in the pool?

-- 
Ian.


Re: [osol-discuss] Long Startup Times (2.5hrs) in 2009.06

2009-07-26 Thread Andrew Turner
Cleaning up the Thread..


 Export completes successfully in 30s or so. Re-Importing now, same messages 
 in the log and it's been running 1h43m and counting.

How many filesystems are in the pool?

-- 
Ian.



Hi Ian

There are two ZFS filesystems, one of 2.2T shared via CIFS and one of 500G which 
is shared over iSCSI, used for Time Machine and thus reformatted as HFS+.

Regards

Andy


Re: [osol-discuss] Long Startup Times (2.5hrs) in 2009.06

2009-07-26 Thread Andrew Turner
OK, this is bug 9909 (Bugster 6863859).  Very frustrating!

Thanks for the help anyway!


Re: [osol-discuss] Long Startup Times (2.5hrs) in 2009.06

2009-07-26 Thread Andrew Turner
Ian, as this is orphaned from the main thread, can you delete the post?

Thanks!