Re: [OmniOS-discuss] ami instance upgrade problems

2018-07-25 Thread Al Slater
Hi,

The problem is caused, I think, by device numbering differences in xen
versus ec2.

I solved this problem when creating my AMIs by the following procedure.

1.  Create an extra volume in EC2 with the same size as the instance
boot volume.

2.  Attach the extra volume to the instance.

3.  zpool attach the extra volume (c1t1d0?)
   zpool attach rpool c4t0d0 c1t1d0

4.  zpool detach the original volume with incorrect name (c4t0d0)
   zpool detach rpool c4t0d0

5.  zpool attach the original volume with the proper name (c1t0d0)
   zpool attach rpool c1t1d0 c1t0d0

6.  zpool detach the extra volume.
   zpool detach rpool c1t1d0

7.  Detach the extra volume from the instance and delete it.

Double check all disk names in your instances first!
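
The procedure above can be sketched as a small helper that only prints the commands, so they can be reviewed before anything touches the pool. This is a minimal sketch, not a tested tool: the function name and argument order are made up for illustration, and the "wait" comments are an addition that assumes each attach starts a resilver which should finish before the following detach.

```shell
#!/bin/sh
# Print (not run) the attach/detach sequence for renaming a root-pool device.
# All four arguments are placeholders; take the real names from
# `zpool status` and `format` on your own instance.
plan_device_rename() {
    pool=$1      # pool name, e.g. rpool or syspool
    stale=$2     # name recorded in the pool that no longer exists (c4t0d0)
    spare=$3     # temporary extra volume (c1t1d0)
    proper=$4    # name the boot volume really has (c1t0d0)

    echo "zpool attach $pool $stale $spare"
    echo "# wait until 'zpool status $pool' no longer shows a resilver in progress"
    echo "zpool detach $pool $stale"
    echo "zpool attach $pool $spare $proper"
    echo "# wait for the second resilver to finish"
    echo "zpool detach $pool $spare"
}

plan_device_rename rpool c4t0d0 c1t1d0 c1t0d0
```

Printing rather than executing keeps the sketch safe to run anywhere; the pool name is a parameter because the instance in the quoted report uses syspool rather than rpool.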

regards

Al

On 20/07/18 01:46, PÁSZTOR György wrote:
> Hi,
> 
> I'm learning Amazon, so I thought it would be nice to play with OmniOS
> in my free-tier experiments instead of Linux.
> I installed OmniOS from the "official" source: ami-0169c5108d1bdfd57
> (Yes, I chose Ohio instead of N. Virginia to play in.)
> 
> One small side note: IPv6 is not enabled in the official image, although I
> configured it for the instance. After the install I manually ran this:
> ipadm create-addr -T addrconf xnf0/v6
> That solved everything. It seems far simpler than toying with
> Linux.
> 
> But... I cannot update my instance.
> pkg update created the omnios-1 BE, but I cannot activate it.
> root@ip-172-31-28-110:~# beadm list
> BE   Active Mountpoint Space Policy Created
> omnios   NR /  801M  static 2018-05-04 18:52
> omnios-1 -  /tmp/tmpTvggQP 158M  static 2018-07-20 00:15
> root@ip-172-31-28-110:~# beadm activate -v omnios-1
> be_do_installboot: device c4t0d0
> be_do_installboot: install failed for device c4t0d0.
>   Command: "/usr/sbin/installboot -m -f /tmp/tmpTvggQP/boot/pmbr 
> /tmp/tmpTvggQP/boot/gptzfsboot /dev/rdsk/c4t0d0s0"
>   Errors:
> open: No such file or directory
> Unable to open device /dev/rdsk/c4t0d0s0
> be_run_cmd: command terminated with error status: 1
> Unable to activate omnios-1.
> Error installing boot files.
> root@ip-172-31-28-110:~# ls -l /dev/rdsk/*t0d0s0
> lrwxrwxrwx   1 root root  34 Jul 18 22:07 /dev/rdsk/c1t0d0s0 -> 
> ../../devices/xpvd/xdf@51712:a,raw
> lrwxrwxrwx   1 root root   8 Jul 20 00:21 /dev/rdsk/c4t0d0s0 -> 
> c1t0d0s0
> root@ip-172-31-28-110:~# /usr/sbin/installboot -m -f /tmp/tmpTvggQP/boot/pmbr 
> /tmp/tmpTvggQP/boot/gptzfsboot /dev/rdsk/c1t0d0s0 
> bootblock version installed on /dev/rdsk/c1t0d0s0 is more recent or identical
> Use -F to override or install without the -u option
> 
> root@ip-172-31-28-110:~# zpool status -v
>   pool: syspool
>  state: ONLINE
>   scan: none requested
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         syspool     ONLINE       0     0     0
>           c4t0d0    ONLINE       0     0     0
> 
> errors: No known data errors
> root@ip-172-31-28-110:~# echo | format
> Searching for disks...done
> 
> 
> AVAILABLE DISK SELECTIONS:
>        0. c1t0d0
>           /xpvd/xdf@51712
> Specify disk (enter its number): Specify disk (enter its number): 
> 
> 
> Well, I'm stuck at this point. I don't know how I could fix this.
> I assume the problem is somewhere around the c4 vs c1 numbering, so it
> tries to open the wrong device.
> 
> Note: let's just assume it should work without running the installboot
> command.
> 
> root@ip-172-31-28-110:~# cd /usr/sbin
> root@ip-172-31-28-110:/usr/sbin# mv installboot installboot.save
> root@ip-172-31-28-110:/usr/sbin# ln -s ../bin/true installboot
> root@ip-172-31-28-110:/usr/sbin# beadm activate -v omnios-1
> be_do_installboot: device c4t0d0
>   Command: "/usr/sbin/installboot -m -f /tmp/tmpTvggQP/boot/pmbr 
> /tmp/tmpTvggQP/boot/gptzfsboot /dev/rdsk/c4t0d0s0"
> Activated successfully
> root@ip-172-31-28-110:/usr/sbin# reboot
> OmniOS 5.11 omnios-r151026-b6848f4455   June 2018
> root@ip-172-31-28-110:~# beadm list
> BE   Active Mountpoint Space Policy Created
> omnios   -  -  3.66M static 2018-05-04 18:52
> omnios-1 NR /  1.01G static 2018-07-20 00:15
> 
> Well, it's a very dirty hack. Is there a nicer way to fix this c4 vs c1
> thing?
> Btw.: it seems installboot would return false even if it could open the
> device, because the device already has the same version of the boot block.
> Shouldn't that circumstance be checked by beadm?
> 
> Cheers,
> gyu
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
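
The mismatch in the quoted output (the pool records c4t0d0 while format lists only c1t0d0) can be spotted before running beadm activate. The following is a hedged sketch: both function names are hypothetical, and it assumes cXtYdZ-style device names and slice 0, as in the installboot call shown in the report.

```shell
#!/bin/sh
# Extract cXtYdZ device names (an optional slice suffix is dropped) from
# `zpool status` text read on stdin.
pool_disks() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^c[0-9]+t[0-9]+d[0-9]+(s[0-9]+)?$/) {
                name = $i
                sub(/s[0-9]+$/, "", name)
                print name
            }
    }' | sort -u
}

# Read device names on stdin and print each one that has no slice-0 node
# under the given directory (normally /dev/rdsk).
missing_disks() {
    dir=$1
    while read -r disk; do
        [ -e "$dir/${disk}s0" ] || echo "$disk"
    done
}

# Typical use on a live system:
#   zpool status syspool | pool_disks | missing_disks /dev/rdsk
```

Any name this prints is one that beadm would hand to installboot and fail to open.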
> 


Re: [OmniOS-discuss] xnf panic

2018-04-09 Thread Al Slater
Hi Andy,

OK, I booted a t2.micro instance from my AMI (Original OmniOS CE r151022
HVM).  Updated pkg and then updated to r151022as.

Then shut the instance down, changed the type to m4.large, then started
the instance.  A panic/reboot loop ensued.  Stopped the instance.

Changed the instance back to t2.micro and booted.  Applied the hotfix
and rebooted.  Rebooted fine.

Stopped the instance, changed instance type to m4.large, and started the
instance.  The instance started and I could log in just fine.

To be sure, I stopped and restarted the instance a few times. It always
came back up ok.
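
The stop, change type, start cycle described above can be scripted with the AWS CLI. This is a minimal sketch that only prints the commands; the function name and the instance ID are placeholders, and it assumes the AWS CLI is installed and configured for the right region.

```shell
#!/bin/sh
# Print the AWS CLI calls for a stop / change instance type / start cycle.
# Both arguments are caller-supplied; nothing is executed here.
plan_retype() {
    id=$1       # instance ID (placeholder in the example below)
    itype=$2    # target instance type, e.g. m4.large
    echo "aws ec2 stop-instances --instance-ids $id"
    echo "aws ec2 wait instance-stopped --instance-ids $id"
    echo "aws ec2 modify-instance-attribute --instance-id $id --instance-type Value=$itype"
    echo "aws ec2 start-instances --instance-ids $id"
    echo "aws ec2 wait instance-running --instance-ids $id"
}

# Review the output first; piping it to sh would run it for real:
#   plan_retype i-0123456789abcdef0 m4.large | sh
plan_retype i-0123456789abcdef0 m4.large
```

The two wait calls matter because the instance type can only be changed while the instance is stopped.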

So, the patch looks good to me.  If you can think of any more tests you
would like me to run, I'll give them a go for you.

Thank you for the speedy fix.

regards

Al


On 09/04/18 18:05, Al Slater wrote:
> Hi Andy,
> 
> Wow, that was fast.  I will test this when I have fed the family.
> 
> I have my own AMI, so I can spin a server as a T2 instance, apply the
> fix and then restart after changing to an M4.
> 
> regards
> 
> Al
> 
> On 09/04/18 16:04, Andy Fiddaman wrote:
>>
>> Al,
>>
>> I've prepared a hot-fix containing this driver update if you'd like to
>> test. I tried to test it myself on AWS but can't provision any type of
>> machine from that unofficial CE r151022 AMI in London. Just snapshot the
>> EBS volume first in case of problems.
>>
>> To install:
>>
>>   pkg apply-hot-fix https://downloads.omniosce.org/pkg/r151022/7186-xnf.p5p
>>
>> which will create a new boot-environment, and reboot.
>>
>> Assuming it looks ok, this update will be part of next Monday's release.
>>
>> Regards,
>>
>> Andy
>>
>> On Mon, 9 Apr 2018, Al Slater wrote:
>>
>> ; Hi,
>> ;
>> ; Has the fix for 7186 (xnf: panic on Xen 4.x) been integrated into
>> ; r151022 since the initial CE release?
>> ;
>> ; I have an instance in AWS that they required me to stop and start again
>> ; due to host patching.  When I started it again the instance went into a
>> ; panic/reboot loop.  The stack dump looked similar to the one in the
>> ; error report.
>> ;
>> ; I managed to get the instance started by changing the instance type from
>> ; m4.large to t2.large.  Presumably AWS are migrating towards Xen
>> ; versions > 4 in the London region.  I don't know how long until the t2
>> ; hosts are updated.
>> ;
>> ; regards
>> ;
>> ;
>>
> 
> 


Re: [OmniOS-discuss] xnf panic

2018-04-09 Thread Al Slater
Hi Andy,

Wow, that was fast.  I will test this when I have fed the family.

I have my own AMI, so I can spin a server as a T2 instance, apply the
fix and then restart after changing to an M4.

regards

Al

On 09/04/18 16:04, Andy Fiddaman wrote:
> 
> Al,
> 
> I've prepared a hot-fix containing this driver update if you'd like to
> test. I tried to test it myself on AWS but can't provision any type of
> machine from that unofficial CE r151022 AMI in London. Just snapshot the
> EBS volume first in case of problems.
> 
> To install:
> 
>   pkg apply-hot-fix https://downloads.omniosce.org/pkg/r151022/7186-xnf.p5p
> 
> which will create a new boot-environment, and reboot.
> 
> Assuming it looks ok, this update will be part of next Monday's release.
> 
> Regards,
> 
> Andy
> 
> On Mon, 9 Apr 2018, Al Slater wrote:
> 
> ; Hi,
> ;
> ; Has the fix for 7186 (xnf: panic on Xen 4.x) been integrated into
> ; r151022 since the initial CE release?
> ;
> ; I have an instance in AWS that they required me to stop and start again
> ; due to host patching.  When I started it again the instance went into a
> ; panic/reboot loop.  The stack dump looked similar to the one in the
> ; error report.
> ;
> ; I managed to get the instance started by changing the instance type from
> ; m4.large to t2.large.  Presumably AWS are migrating towards Xen
> ; versions > 4 in the London region.  I don't know how long until the t2
> ; hosts are updated.
> ;
> ; regards
> ;
> ;
> 



Re: [OmniOS-discuss] xnf panic

2018-04-09 Thread Al Slater
On 09/04/2018 08:49, Al Slater wrote:
> Has the fix for 7186 (xnf: panic on Xen 4.x) been integrated into
> r151022 since the initial CE release?
> 
> I have an instance in AWS that they required me to stop and start again
> due to host patching.  When I started it again the instance went into a
> panic/reboot loop.  The stack dump looked similar to the one in the
> error report.
> 
> I managed to get the instance started by changing the instance type from
> m4.large to t2.large.  Presumably AWS are migrating towards Xen
> versions > 4 in the London region.  I don't know how long until the t2
> hosts are updated.

Repeatable on any m4 instance type I tried.

Technical Director
SCL

Phone : +44 (0)1273 07
Fax   : +44 (0)1273 01
email : al.sla...@scluk.com

Stanton Consultancy Ltd

Park Gate, 161 Preston Road, Brighton, East Sussex, BN1 6AU

Registered in England Company number: 1957652 VAT number: GB 760 2433 55
panic[cpu0]/thread=ff000f4e6c40: BAD TRAP: type=e (#pf Page fault)
rp=ff000f4e69b0 addr=40 occurred in module "xnf" due to a NULL
pointer dereference

sched: #pf Page fault

Bad kernel fault at addr=0x40

pid=0, pc=0xf79b6e67, sp=0xff000f4e6aa0, eflags=0x10206

cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>
cr4: 1406b8<smep,osxsav,xmme,fxsr,pge,pae,pse,de>

cr2: 40  cr3: c40  cr8: c
rdi: 286  rsi: 6  rdx: c
rcx: ff03d5dbf064  r8: 0  r9: 0
rax: 150  rbx: 3  rbp: ff000f4e6af0
r10: 0  r11: fb800983  r12: ff03d5d9
r13: 0  r14: 15  r15: 9
fsb: 0  gsb: fbc397e0  ds: 4b
es: 4b  fs: 0  gs: 1c3
trp: e  err: 0  rip: f79b6e67
cs: 30  rfl: 10206  rsp: ff000f4e6aa0
ss: 38

Warning - stack not written to the dump buffer
ff000f4e6890 unix:die+df ()
ff000f4e69a0 unix:trap+e18 ()
ff000f4e69b0 unix:cmntrap+e6 ()
ff000f4e6af0 xnf:xnf_tx_clean_ring+c7 ()
ff000f4e6b60 xnf:tx_slots_get+95 ()
ff000f4e6ba0 xnf:xnf_intr+15b ()
ff000f4e6bf0 unix:av_dispatch_softvect+78 ()
ff000f4e6c20 unix:dispatch_softint+39 ()
ff000f635460 unix:switch_sp_and_call+13 ()
ff000f6354a0 unix:dosoftint+44 ()
ff000f635500 unix:do_interrupt+ba ()
ff000f635510 unix:cmnint+ba ()
fffff7c6aec0 sha2:SHA256TransformBlocks+109f ()


-- 
Al Slater



[OmniOS-discuss] xnf panic

2018-04-09 Thread Al Slater
Hi,

Has the fix for 7186 (xnf: panic on Xen 4.x) been integrated into
r151022 since the initial CE release?

I have an instance in AWS that they required me to stop and start again
due to host patching.  When I started it again the instance went into a
panic/reboot loop.  The stack dump looked similar to the one in the
error report.

I managed to get the instance started by changing the instance type from
m4.large to t2.large.  Presumably AWS are migrating towards Xen
versions > 4 in the London region.  I don't know how long until the t2
hosts are updated.

regards

-- 
Al Slater



Re: [OmniOS-discuss] sudo update

2017-11-23 Thread Al Slater
Hi Andy,

On 23/11/17 10:40, Andy Fiddaman wrote:
> 
> On Thu, 23 Nov 2017, Al Slater wrote:
> 
> ; Hi,
> ;
> ; I have just updated a number of my omniosce boxes to r151022y, bringing
> ; in the sudo updates in r151022u.
> ;
> ; All my machines have BSM auditing enabled, and now I am seeing the
> ; following when using sudo
> ;
> ; sudo: au_preselect: Bad file number
> 
> Hi, this is something we specifically tested along with the sudo update
> since auditing was an area that changed quite a bit. Could you please check
> that all of your packages are up-to-date (particularly SUNWcs) and that the
> output of the following commands matches on your system?
> 
> r151022% auditrecord -e AUE_sudo
> 
> sudo
>   program sudo See sudo(1m)
>   event ID6650 AUE_sudo
>   class   lo,ua,as (0x00061000)
>   header
>   subject
>   exec_arguments   command args
>   [text]   error message (failure only)
>   return
> 
> r151022% grep sudo /etc/security/audit_event /usr/lib/audit/audit_record_attr
> /etc/security/audit_event:# sudo event
> /etc/security/audit_event:6650:AUE_sudo:sudo(1m):lo,ua,as
> /usr/lib/audit/audit_record_attr:label=AUE_sudo
> 
> If the problem persists, please post the audit configuration that you're
> using so we can try and replicate (auditconfig -getflags)

Ok, I can see the issue.

The upgrade installed an audit_event.new into /etc/security, but it was
not merged into our modified audit_event.

I can see what I need to do to fix this now.  Thank you for the pointers.
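
A quick check for this situation, as a sketch only: the helper name is made up, and it simply tests for the event line that Andy's grep output shows (event ID 6650, AUE_sudo) in a given audit_event file.

```shell
#!/bin/sh
# Succeed if the given audit_event file defines the sudo audit event
# (ID 6650, AUE_sudo), as in the expected grep output quoted above.
has_sudo_event() {
    grep -q '^6650:AUE_sudo:' "$1"
}

# Typical use after an upgrade: if the live file lacks the event, show
# what the packaged copy would add so it can be merged by hand.
#   has_sudo_event /etc/security/audit_event || \
#       diff /etc/security/audit_event /etc/security/audit_event.new
```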

-- 
Al Slater



[OmniOS-discuss] sudo update

2017-11-22 Thread Al Slater
Hi,

I have just updated a number of my omniosce boxes to r151022y, bringing
in the sudo updates in r151022u.

All my machines have BSM auditing enabled, and now I am seeing the
following when using sudo

sudo: au_preselect: Bad file number


regards

-- 
Al Slater



Re: [OmniOS-discuss] Problem updating OmniOS machines

2017-08-23 Thread Al Slater
Does anyone have any ideas what the cause is here, or how to debug it?

On 21/08/2017 11:22, Al Slater wrote:
> I have a number of omnios boxes running r151022, all upgraded from
> r151014.  Currently uname says omnios-r151022-f9693432c2
> 
> All but one of them are failing to update.  The process just stops after
> "Downloading linked" for each zone with no error message, but with a
> return code of 1.   Each machine said I had to upgrade pkg first, which
> was done.
> 
> 
> aslate-admin@mars:/export/home/aslate-admin$ sudo pkg update -r
>             Packages to update: 123
>        Create boot environment: Yes
> Create backup boot environment:  No
> 
> Planning linked: 0/8 done; 1 working: zone:qa-redis1
> Linked image 'zone:qa-redis1' output:
> | Packages to update: 14
> `
> Planning linked: 1/8 done; 1 working: zone:qa-redis3
> Linked image 'zone:qa-redis3' output:
> | Packages to update: 14
> `
> Planning linked: 2/8 done; 1 working: zone:qa-seclb1
> Linked image 'zone:qa-seclb1' output:
> | Packages to update: 14
> `
> Planning linked: 3/8 done; 1 working: zone:pg-ugweb01
> Linked image 'zone:pg-ugweb01' output:
> | Packages to update: 14
> `
> Planning linked: 4/8 done; 1 working: zone:qa-b2cweb05
> Linked image 'zone:qa-b2cweb05' output:
> | Packages to update: 14
> `
> Planning linked: 5/8 done; 1 working: zone:base
> Linked image 'zone:base' output:
> | Packages to update: 14
> `
> Planning linked: 6/8 done; 1 working: zone:qa-lb1
> Linked image 'zone:qa-lb1' output:
> | Packages to update: 14
> `
> Planning linked: 7/8 done; 1 working: zone:qa-tseclb1
> Linked image 'zone:qa-tseclb1' output:
> | Packages to update: 14
> `
> Planning linked: 8/8 done
> DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
> Completed                            123/123     3553/3553    79.6/79.6    0B/s
> 
> Downloading linked: 0/8 done; 1 working: zone:qa-redis1
> Downloading linked: 1/8 done; 1 working: zone:qa-redis3
> Downloading linked: 2/8 done; 1 working: zone:qa-seclb1
> Downloading linked: 3/8 done; 1 working: zone:pg-ugweb01
> Downloading linked: 4/8 done; 1 working: zone:qa-b2cweb05
> Downloading linked: 5/8 done; 1 working: zone:base
> Downloading linked: 6/8 done; 1 working: zone:qa-lb1
> Downloading linked: 7/8 done; 1 working: zone:qa-tseclb1
> Linked progress: /aslate-admin@mars:/export/home/aslate-admin$ echo $?
> 1
> 
> 
> Running with -v doesn't give any hints.
> 
> The machines are updating from my own pkg repo, which is kept in sync
> with the omniosce repo.
> 
> Any ideas what is wrong?
> 


-- 
Al Slater




[OmniOS-discuss] Problem updating OmniOS machines

2017-08-21 Thread Al Slater
I have a number of omnios boxes running r151022, all upgraded from
r151014.  Currently uname says omnios-r151022-f9693432c2

All but one of them are failing to update.  The process just stops after
"Downloading linked" for each zone with no error message, but with a
return code of 1.   Each machine said I had to upgrade pkg first, which
was done.


aslate-admin@mars:/export/home/aslate-admin$ sudo pkg update -r
            Packages to update: 123
       Create boot environment: Yes
Create backup boot environment:  No

Planning linked: 0/8 done; 1 working: zone:qa-redis1
Linked image 'zone:qa-redis1' output:
| Packages to update: 14
`
Planning linked: 1/8 done; 1 working: zone:qa-redis3
Linked image 'zone:qa-redis3' output:
| Packages to update: 14
`
Planning linked: 2/8 done; 1 working: zone:qa-seclb1
Linked image 'zone:qa-seclb1' output:
| Packages to update: 14
`
Planning linked: 3/8 done; 1 working: zone:pg-ugweb01
Linked image 'zone:pg-ugweb01' output:
| Packages to update: 14
`
Planning linked: 4/8 done; 1 working: zone:qa-b2cweb05
Linked image 'zone:qa-b2cweb05' output:
| Packages to update: 14
`
Planning linked: 5/8 done; 1 working: zone:base
Linked image 'zone:base' output:
| Packages to update: 14
`
Planning linked: 6/8 done; 1 working: zone:qa-lb1
Linked image 'zone:qa-lb1' output:
| Packages to update: 14
`
Planning linked: 7/8 done; 1 working: zone:qa-tseclb1
Linked image 'zone:qa-tseclb1' output:
| Packages to update: 14
`
Planning linked: 8/8 done
DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                            123/123     3553/3553    79.6/79.6    0B/s

Downloading linked: 0/8 done; 1 working: zone:qa-redis1
Downloading linked: 1/8 done; 1 working: zone:qa-redis3
Downloading linked: 2/8 done; 1 working: zone:qa-seclb1
Downloading linked: 3/8 done; 1 working: zone:pg-ugweb01
Downloading linked: 4/8 done; 1 working: zone:qa-b2cweb05
Downloading linked: 5/8 done; 1 working: zone:base
Downloading linked: 6/8 done; 1 working: zone:qa-lb1
Downloading linked: 7/8 done; 1 working: zone:qa-tseclb1
Linked progress: /aslate-admin@mars:/export/home/aslate-admin$ echo $?
1


Running with -v doesn't give any hints.

The machines are updating from my own pkg repo, which is kept in sync
with the omniosce repo.

Any ideas what is wrong?
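
When pkg exits silently, the exit status is the only hint, so it helps to record it next to the full output. A small sketch: the helper name is made up, and the meanings are the ones listed in the Exit Status section of pkg(1) as far as I recall them, so check the man page on your release.

```shell
#!/bin/sh
# Map a pkg(1) exit status to its documented meaning (recalled from the
# man page; treat the wording as approximate).
describe_pkg_exit() {
    case $1 in
        0) echo "command succeeded" ;;
        1) echo "an error occurred" ;;
        2) echo "invalid command line options" ;;
        3) echo "multiple operations requested, only some succeeded" ;;
        4) echo "no changes were made, nothing to do" ;;
        5) echo "operation cannot be performed on a live image" ;;
        6) echo "licenses have not been accepted" ;;
        7) echo "image is in use by another process" ;;
        *) echo "status not covered by this sketch" ;;
    esac
}

# Typical use (log path is only an example):
#   sudo pkg update -rv > /var/tmp/pkg-update.log 2>&1
#   describe_pkg_exit $?
```

Redirecting to a file instead of piping through tee keeps `$?` tied to pkg itself. A status of 1 is the generic error, so it is not just a "nothing to do" result.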

-- 
Al Slater




Re: [OmniOS-discuss] initialboot

2017-07-31 Thread Al Slater
Hi,

Sorry, I clearly didn't explain well enough.

Normally, initial-boot is enabled in the iso/kayak image that is
installed.  When it is started on the first boot it runs /.initialboot
and then disables itself.

I was thinking that if there was some way to "re-enable" initial-boot,
then I could drop in a /.initialboot script, re-enable initial-boot and
then shut down before creating a new AMI, such that when an instance based
upon the new AMI was launched, it would run .initialboot.

The problem is that enabling initial-boot immediately runs the .initialboot
script, and the service then disables itself.  So, I hoped there was a way
to enable the service such that it did not start immediately, but would
start after the next reboot.

Al

On 31/07/17 21:49, PÁSZTOR György wrote:
> Hi,
> 
> I hope you don't mind, but I started a new thread with this, since it seems
> a completely new topic.
> 
> "Al Slater" <al.sla...@scluk.com> wrote on 2017-07-31 at 21:05:
>> One more question though, is there any way to enable an SMF service for
>> the next reboot, but not immediately.  Specifically, I want to enable
>> the initial-boot service with a .initialboot file in place, then create
>> a new AMI.
> 
> I don't completely understand. You want to enable initialboot after the
> boot has completed, and only after a certain amount of time?
> I'm not sure what this initialboot does exactly, but it seems to be not a
> simple service but a milestone. I would probably not mess with it.
> Otherwise, if I needed a delay between the boot and the service, and it
> was important for it to remain "disabled" until then:
> Create an @reboot cron job. I don't remember which cron implementation is
> the default. With Vixie cron on Linux, the time can be @reboot.
> 
>> I wish to use .initialboot to grab the instance configuration from
>> amazon (hostname, root keys etc) and configure appropriately when the
>> new instance starts.
> 
> Again: I don't completely understand your scenario.
> You created one AMI, and you want to "close it back" and clone it several
> times, so that on its first boot each clone does the initialboot steps?
> Why do you want to wait?
> From what I just found about /.initialboot, it's a simple shell script.
> If you need to wait here, why not just put a sleep command at the
> beginning of the script?
> Or if you have to wait for some specific resource: why not poll it once
> every 5 seconds or so?
> 
> Cheers,
> Gyu
> 





Re: [OmniOS-discuss] Omios, hvm and AWS

2017-07-31 Thread Al Slater
On 31/07/17 21:30, Eric Sproul wrote:
> On Mon, Jul 31, 2017 at 4:05 PM, Al Slater <al.sla...@scluk.com> wrote:
>> One more question though, is there any way to enable an SMF service for
>> the next reboot, but not immediately.  Specifically, I want to enable
>> the initial-boot service with a .initialboot file in place, then create
>> a new AMI.
>>
>> I wish to use .initialboot to grab the instance configuration from
>> amazon (hostname, root keys etc) and configure appropriately when the
>> new instance starts.
> 
> Hi Al,
> The initial-boot service isn't really suitable for this sort of thing.
> You might want to check out
> pkg://omnios/system/management/ec2-credential which specifically
> handles setting up the credentials at first boot.  That could be
> trivially extended[1] to set the system hostname and probably any
> other "standard" thing that operators want.
> 
> Eric
> 
> [1] 
> https://github.com/omniosorg/omnios-build/blob/master/build/ec2-credential/files/install-ec2-credential

Thanks for the pointer Eric.
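
For reference, the kind of first-boot lookup discussed here (hostname, root keys) boils down to reading the EC2 instance metadata service. The following is a rough sketch and not the ec2-credential script itself: the function names are made up, curl is assumed to be installed, and instances that enforce IMDSv2 would additionally need a session token.

```shell
#!/bin/sh
# Base URL of the EC2 instance metadata service (link-local address).
METADATA_BASE=${METADATA_BASE:-http://169.254.169.254/latest/meta-data}

# Build the URL for one metadata item.
metadata_url() {
    echo "$METADATA_BASE/$1"
}

# Fetch one metadata item; returns non-zero on HTTP errors or timeout.
fetch_metadata() {
    curl -fsS --max-time 5 "$(metadata_url "$1")"
}

# Hypothetical first-boot usage:
#   hostname "$(fetch_metadata local-hostname)"
#   fetch_metadata public-keys/0/openssh-key >> /root/.ssh/authorized_keys
```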

-- 
Al Slater



Re: [OmniOS-discuss] Omios, hvm and AWS

2017-07-31 Thread Al Slater
On 31/07/2017 11:39, Peter Tribble wrote:
> 
> 
> On Mon, Jul 31, 2017 at 11:09 AM, Al Slater <al.sla...@scluk.com> wrote:
> 
> On 31/07/2017 11:07, Al Slater wrote:
> > On 30/07/2017 20:15, Peter Tribble wrote:
> >> > The following should get you going:
> >> >
> >> > https://www.prakashsurya.com/post/2017-02-06-creating-a-custom-amazon-ec2-ami-from-iso/
> >
> > OK, I followed the above procedure and have produced an AMI.
> >
> > When I create an instance and try to boot it, I get the following in the
> > system log:
> 
> SunOS Release 5.11 Version omnios-r151022-f9693432c2 64-bit
> 
> Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights
> reserved.
> 
> NOTICE: Cannot read the pool label from '/xpvd/xdf@51728:a'
> NOTICE: spa_import_rootpool: error 5
> 
> Cannot mount root on /xpvd/xdf@51728:a fstype zfs
> panic[cpu0]/thread=fbc38560: vfs_mountroot: cannot mount root
> Warning - stack not written to the dump buffer
> fbc7ad70 genunix:vfs_mountroot+39b ()
> fbc7adb0 genunix:main+138 ()
> fbc7adc0 unix:_locore_start+90 ()
> 
> 
> How can I fix this?
> 
> 
> You're likely the first person down this path.
> 
> Generically, this means that the device paths embedded in the pool
> don't match those provided by the "hardware" you're booting on.
> 
> So the system thinks it should have a disk at /xpvd/xdf@51728:a
> 
> On my instance, I have:
> 
> /dev/rdsk/c2t0d0s0 -> ../../devices/xpvd/xdf@51712:a,raw
> 
> In other words, 51712 not 51728.
> 
> For this to work, you have to set up your xen instance to exactly mirror
> what EC2 provides. Somehow it's gotten mixed up. In your configuration,
> did you use xvda? I think 51728 is what you get if you use xvdb for the
> disk, which won't work. I had:
> 
> disk=[  'file:/home/ptribble/iso/tribblix-0m20.1.iso,hdb:cdrom,r',
> 'file:/root/ami-template.img,xvda,w' ]
> 

Thanks Peter,  I see what happened...

I started off with the instructions from
https://wiki.openindiana.org/oi/Creating+OpenIndiana+EC2+image

Then changed to following the instructions at
https://www.prakashsurya.com/post/2017-02-06-creating-a-custom-amazon-ec2-ami-from-iso/

while neglecting to change the disks line in my xen config.

Oh well, starting again...
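
The two numbers in Peter's reply follow from how Xen numbers xvd disks: block major 202 shifted left by 8 bits, plus 16 minors per disk. A small worked example, assuming single-letter names (xvda through xvdp); the function name is made up for illustration.

```shell
#!/bin/sh
# Xen virtual disk number for an xvdX name: (202 << 8) + 16 * disk index.
# Only single-letter names (xvda..xvdp) are handled in this sketch.
xvd_devnum() {
    letter=${1#xvd}                     # "xvdb" -> "b"
    code=$(printf '%d' "'$letter")      # ASCII code of the letter
    echo $(( (202 << 8) + (code - 97) * 16 ))
}

xvd_devnum xvda   # prints 51712
xvd_devnum xvdb   # prints 51728
```

So a pool created on xvdb records /xpvd/xdf@51728, while EC2 presents the boot volume as xdf@51712.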

-- 
Al Slater





Re: [OmniOS-discuss] Omios, hvm and AWS

2017-07-31 Thread Al Slater
On 31/07/2017 11:07, Al Slater wrote:
> On 30/07/2017 20:15, Peter Tribble wrote:
>> > The following should get you going:
>> >
>> > https://www.prakashsurya.com/post/2017-02-06-creating-a-custom-amazon-ec2-ami-from-iso/
> 
> OK, I followed the above procedure and have produced an AMI.
> 
> When I create an instance and try to boot it, I get the following in the
> system log:

SunOS Release 5.11 Version omnios-r151022-f9693432c2 64-bit

Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.

NOTICE: Cannot read the pool label from '/xpvd/xdf@51728:a'
NOTICE: spa_import_rootpool: error 5

Cannot mount root on /xpvd/xdf@51728:a fstype zfs
panic[cpu0]/thread=fbc38560: vfs_mountroot: cannot mount root
Warning - stack not written to the dump buffer
fbc7ad70 genunix:vfs_mountroot+39b ()
fbc7adb0 genunix:main+138 ()
fffffbc7adc0 unix:_locore_start+90 ()


How can I fix this?

-- 
Al Slater




Re: [OmniOS-discuss] Omios, hvm and AWS

2017-07-29 Thread Al Slater
Hi Peter,

On 28/07/17 22:37, Peter Tribble wrote:
> I wish to run up a number of OmniOS instances in AWS.
> 
> The current OmniOS AMIs in AWS seem to use pv virtualization, precluding
> their use on the t2 and m4 instance types that I want to use.
> 
> 
> Worse; newer regions only support hvm. In my case, this rules out London.

That is precisely where I want to run my instances.

> So, I thought I would try to produce my own AMI with hvm virtualization.
> 
> I am looking to use omniosce r151022, is this likely to work at all?
> 
> I have read https://omnios.omniti.com/wiki.php/Ec2Ami, does anyone know
> how that procedure would be amended to cater for loader/hvm instead of
> pv-grub?
> 
>  
> The following should get you going:
> 
> https://www.prakashsurya.com/post/2017-02-06-creating-a-custom-amazon-ec2-ami-from-iso/

That looks very helpful, thank you for the link.

> Essentially, if you install any illumos distro you can send the disk
> image up to AWS and create an AMI. If you create the image by installing
> using Xen *exactly* as described, you're done. If you're getting the image
> from somewhere else then the phys_path to the disk embedded in the pool
> will be wrong and need to be rewritten, which basically means going into
> Xen again.

I will be using xen so hopefully all will be good...
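
The phys_path Peter mentions can be read from the pool labels with `zdb -l` before the image is uploaded. A minimal sketch, assuming the usual `phys_path: '...'` line in the label dump; the helper name is hypothetical.

```shell
#!/bin/sh
# Print the distinct phys_path values found in `zdb -l` output on stdin.
label_phys_paths() {
    sed -n "s/^[[:space:]]*phys_path: '\(.*\)'\$/\1/p" | sort -u
}

# Typical use (device name is only an example):
#   zdb -l /dev/rdsk/c1t0d0s0 | label_phys_paths
```

For an image meant for EC2, the thread suggests the expected value is /xpvd/xdf@51712:a.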


-- 
Al Slater



Re: [OmniOS-discuss] Omios, hvm and AWS

2017-07-29 Thread Al Slater
Thank you, that clarified my understanding.

Al

On 27/07/17 22:47, PÁSZTOR György wrote:
> Hi,
> 
> "Al Slater" <al.sla...@scluk.com> wrote on 2017-07-27 at 12:17:
>> So, I thought I would try to produce my own AMI with hvm virtualization.
>>
>> I am looking to use omniosce r151022, is this likely to work at all?
> 
> I haven't tried to upgrade my r151022 with the CE updates, but I'm pretty
> sure that it will work.
> 
>> I have read https://omnios.omniti.com/wiki.php/Ec2Ami, does anyone know
>> how that procedure would be amended to cater for loader/hvm instead of
>> pv-grub?
> 
> If you use hvm, then there is no need for an extra loader. Just install
> omnios onto the "virtual" hdd as you normally would.
> However, I have never tried Amazon's environment. I am experimenting with
> omnios on my home NAS. (See my mail from two days ago.)
> 
> The only drawback I found: if the xen hypervisor is >=4.6 (or >4.5.1, I
> don't know yet), then the pv network driver won't work.
> 
> Cheers,
> Gyu
> 



-- 
Al Slater



[OmniOS-discuss] Omios, hvm and AWS

2017-07-27 Thread Al Slater
Hi,

I wish to run up a number of OmniOS instances in AWS.

The current OmniOS AMIs in AWS seem to use pv virtualization, precluding
their use on the t2 and m4 instance types that I want to use.

So, I thought I would try to produce my own AMI with hvm virtualization.

I am looking to use omniosce r151022, is this likely to work at all?

I have read https://omnios.omniti.com/wiki.php/Ec2Ami, does anyone know
how that procedure would be amended to cater for loader/hvm instead of
pv-grub?


-- 
Al Slater




Re: [OmniOS-discuss] ANNOUNCEMENT OmniOS Community Edition - OmniOSce r151022h

2017-07-13 Thread Al Slater
root` to the PATH given in the list.) 
> 
> 
> 5. Install the new ca-bundle containing our new CA 
> 
> ```
> # /usr/bin/pkg update -rv web/ca-bundle 
> ```
> 
> 6. Remove the CA file imported by hand 
> 
> ```
> # rm /etc/ssl/pkg/omniosce-ca.cert.pem 
> ```
> 
> 7. Finally update as usual 
> 
> ```
> # /usr/bin/pkg update -rv 
> ```
> 
> 
> ## About OmniOS Community Edition Association 
> 
> OmniOS Community Edition Association (OmniOSce) is a Swiss association, 
> dedicated to the continued support and release of OmniOS for the benefit of 
> all parties involved. The board of OmniOSce controls access to the OmniOS CA. 
> Current board members are: Tobias Oetiker (President), Andy Fiddaman 
> (Development), Dominik Hassler (Treasurer). 
> 
> 
> ## About Citrus-IT 
> 
> Citrus IT is a UK company that provides a managed email service platform to 
> companies around the world. For many years they ran their systems on Solaris 
> with SPARC hardware but transitioned to OmniOS in 2012. www.citrus-it.net 
> 
> 
> ## About OETIKER+PARTNER AG 
> 
> OETIKER+PARTNER is a Swiss system management and software development 
> company. Employees from O+P are involved in many Open Source Software 
> projects. O+P runs most of their server hardware on OmniOS. www.oetiker.ch 
> 
> 
> Press inquiries to i...@omniosce.org 
> 
> Published July 12, 2017 
> 
> OmniOSce 
> Aarweg 17 
> 4600 Olten 
> Switzerland
> 
> http://www.omniosce.org
> 
> 


-- 
Al Slater

Technical Director
SCL

Phone : +44 (0)1273 07
Fax   : +44 (0)1273 01
email : al.sla...@scluk.com

Stanton Consultancy Ltd

Park Gate, 161 Preston Road, Brighton, East Sussex, BN1 6AU

Registered in England Company number: 1957652 VAT number: GB 760 2433 55



Re: [OmniOS-discuss] Updates for OmniOS r151014 & r151016

2015-11-18 Thread Al Slater
On 13/11/15 20:13, Dan McDonald wrote:
> 014:
> --
> 

>
> - ilbd memory leak plug

Thanks for getting this in there, Dan.  We have been running leak-free for
3 days now.

-- 
Al Slater



Re: [OmniOS-discuss] ILB memory leak?

2015-11-10 Thread Al Slater
On 10/11/15 15:26, Dan McDonald wrote:
> 
>> On Nov 10, 2015, at 2:50 AM, Al Slater <al.sla...@scluk.com> wrote:
>>
>> On 10/11/2015 07:40, Al Slater wrote:
>>> It seems to me that ilbd_run_probe just needs to call
>>> posix_spawn_file_actions_destroy appropriately.
>>
>> And probably posix_spawnattr_destroy as well?
> 
> Wow!  Great catch.  I'll bet a small sum you nailed this to the wall.
> 
> Want me to build you a replacement ilbd?

Yes please :)

Thanks for your, and Bob's, help with this.
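
For anyone following the thread, here is a minimal sketch of the cleanup
pattern discussed above.  It is not the ilbd source: run_probe_once() is a
hypothetical helper, and redirecting stdin from /dev/null is only an assumed
example of a file action.  The point is that every successful
posix_spawn_file_actions_init() and posix_spawnattr_init() is paired with
its destroy call on every path, because the init calls may allocate memory.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper (not the real ilbd_run_probe()): spawn argv[0] with
 * stdin redirected from /dev/null, wait for it, and store its wait status
 * in *status.  Returns 0 on success or an errno value on failure.
 */
int
run_probe_once(char *const argv[], int *status)
{
	posix_spawn_file_actions_t fa;
	posix_spawnattr_t attr;
	pid_t pid;
	int err;

	assert(argv != NULL && argv[0] != NULL && status != NULL);

	if ((err = posix_spawn_file_actions_init(&fa)) != 0)
		return (err);
	if ((err = posix_spawnattr_init(&attr)) != 0) {
		(void) posix_spawn_file_actions_destroy(&fa);
		return (err);
	}

	/* Assumed example action: the child reads stdin from /dev/null. */
	err = posix_spawn_file_actions_addopen(&fa, STDIN_FILENO, "/dev/null",
	    O_RDONLY, 0);
	if (err == 0)
		err = posix_spawn(&pid, argv[0], &fa, &attr, argv, environ);

	/*
	 * Both objects may own heap memory after init, so release them on
	 * every path, whether or not the spawn succeeded.
	 */
	(void) posix_spawn_file_actions_destroy(&fa);
	(void) posix_spawnattr_destroy(&attr);

	if (err != 0)
		return (err);

	/* Reap the child so that it does not linger as a zombie. */
	while (waitpid(pid, status, 0) == -1) {
		if (errno != EINTR)
			return (errno);
	}
	return (0);
}
```

Calling this in a loop should leave the process size flat; dropping the two
destroy calls is the kind of omission that would let it creep upwards.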

-- 
Al Slater



Re: [OmniOS-discuss] ILB memory leak?

2015-11-09 Thread Al Slater

Hi Dan,

On 06/11/2015 18:31, Dan McDonald wrote:
> You said you had a test box, right?

Yes.

> Can you:
>
> - Disable UMEM_DEBUG
> - RESTART the service.
> - IMMEDIATELY after restart do pmap, and do pmap once per (sec, 10 sec,
>   something) to see how it grows?

Attached is a compressed file with 5hrs or so of 10s pmaps.  Hopefully
not too big for the list.

> After that, maybe we can dtrace and see what's going on.


--
Al Slater

Technical Director
SCL

Phone : +44 (0)1273 07
Fax   : +44 (0)1273 01
email : al.sla...@scluk.com

Stanton Consultancy Ltd

Park Gate, 161 Preston Road, Brighton, East Sussex, BN1 6AU

Registered in England Company number: 1957652 VAT number: GB 760 2433 55


pmap.6589.gz
Description: application/gzip


Re: [OmniOS-discuss] ILB memory leak?

2015-11-09 Thread Al Slater
On 09/11/15 15:43, Dan McDonald wrote:
> 
>> On Nov 9, 2015, at 8:39 AM, Al Slater <al.sla...@scluk.com> wrote:
>> 
>> Attached is a compressed file with 5hrs or so of 10s pmaps.
>> Hopefully not too big for the list.
> 
> It compressed nicely.  I'm noticing a pattern:
> 
> Mon Nov  9 08:21:45 UTC 2015 total Kb  134008  133504  131416
> - Mon Nov  9 08:50:21 UTC 2015 total Kb  265080  264576  262488
> - Mon Nov  9 09:37:42 UTC 2015 total Kb  265088  264580  262492
> - Mon Nov  9 09:47:40 UTC 2015 total Kb  527232  526724  524636
> - Mon Nov  9 11:42:19 UTC 2015 total Kb 1051520 1050960 1048872
> - Mon Nov  9 11:42:29 UTC 2015 total Kb 1051520 1051012 1048924
> -
> 
> 
> It's mostly linear growth.  Notice the time intervals also double
> whenever the footprint essentially doubles?
> 
> So I need to back up and ask some things, especially given libumem
> doesn't appear to show leaks or even usage:
> 
> 1.) Is the eating of memory affecting your system performance?  (If
> you've only 8GB, yeah, I can see that.)

Hmmm...  I started investigating after the servers hung a couple of
times.  I have not conclusively proved that this was the cause, but the
machines have been running for months with no issue after I added a
cronjob to restart ilb twice a day.  I can see a gradual increase in
kernel memory use as well, but I have not investigated that.

> 2.) Is ilb failing after it gets sufficiently large?

Again, no link conclusively proved, but I did see log messages like the
following when the memory use had grown to 4Gb...

Nov  5 11:17:01 l1-lb2 ilbd[3041]: [ID 410242 daemon.error]
ilbd_hc_probe_timer: cannot restart timer: rule ggp server _ggp.11,
disabling it

I looked at the source for ilbd and I think this could be caused by a
memory allocation failure in iu_schedule_timer.

After these messages were generated, it looks like the disabled servers
were never re-enabled, so eventually this could end up with no enabled
servers, and therefore no service, without manual intervention.
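
As a rough worked example of the pattern Dan points out above: going by the
timestamps in his list, the step to 527232 Kb came 3439 seconds after the
previous step, and the step to 1051520 Kb came 6879 seconds after that.  If
both the footprint and the interval keep doubling (purely an assumption,
extrapolated from a handful of data points), the sketch below estimates how
long after the last observed step the process would pass a given limit.  The
function name and parameters are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: seconds after the last observed step until the
 * footprint first exceeds limit_kb, assuming each step doubles both the
 * footprint and the time since the previous step.  Returns 0 if the
 * footprint is already over the limit, and UINT64_MAX if total_kb is 0,
 * since doubling zero never reaches the limit.
 */
uint64_t
secs_until_limit(uint64_t total_kb, uint64_t last_interval_s,
    uint64_t limit_kb)
{
	uint64_t elapsed = 0;

	/* Keep the doubling below from overflowing. */
	assert(limit_kb <= UINT64_MAX / 2);

	if (total_kb == 0)
		return (UINT64_MAX);

	while (total_kb <= limit_kb) {
		last_interval_s *= 2;	/* next step takes twice as long */
		elapsed += last_interval_s;
		total_kb *= 2;		/* and doubles the footprint */
	}
	return (elapsed);
}
```

With the figures from the list, secs_until_limit(1051520, 6879, 4194304)
gives 41274 seconds, so about eleven and a half hours after the 11:42 step
before a 4GB (4194304 Kb) ceiling is passed.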

-- 
Al Slater



Re: [OmniOS-discuss] ILB memory leak?

2015-11-06 Thread Al Slater

On 05/11/2015 14:57, Dan McDonald wrote:
>
>> On Nov 5, 2015, at 6:38 AM, Al Slater <al.sla...@scluk.com> wrote:
>>
>> I have the 4Gb core file.  Is there anything useful I can extract from
>> it to try and spot where the problem is?
>
> Your one ::findleaks showed nothing.  Did your 4GB corefile have ::findleaks
> show nothing as well?

::findleaks against the 4GB corefile showed nothing.

--
Al Slater





Re: [OmniOS-discuss] ILB memory leak?

2015-11-06 Thread Al Slater
On 06/11/15 14:51, Dan McDonald wrote:
> 
>> On Nov 6, 2015, at 9:39 AM, Dan McDonald <dan...@omniti.com> wrote:
>>
>> Lots of LARGE anonymous mappings.  I wonder why that happened? I'll dig into 
>> that a bit more.
> 
> pmap(1) works even better on running processes.  Could you run, say "pmap -xa 
> `pgrep ilbd`" on your running machine?

Here you go...

root@loki:/export/home/BRIGHTON/aslate# pmap -xa `pgrep ilbd`
12346:  /usr/lib/inet/ilbd
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
08027000     132     132     132       - rw---  [ stack ]
0805          76      76       -       - r-x--  ilbd
08073000       4       4       4       - rw---  ilbd
08074000      96       -       -       - rw---  ilbd
0808C000    1156    1140    1112       - rw---  [ heap ]
0D20      262144  262144  262144       - rwx--  [ anon ]
1D40      524288  524288  524288       - rwx--  [ anon ]
3D60     1048576 1048576 1048576       - rwx--  [ anon ]
7D80     1048576 1048576 1048576       - rwx--  [ anon ]
BDA0      524288  524288  524288       - rwx--  [ anon ]
DDC0      262144  262144  262144       - rwx--  [ anon ]
EDE0      131072  131072  131072       - rwx--  [ anon ]
F600       65536   65536   65536       - rwx--  [ anon ]
FA20       32768   32768   32768       - rwx--  [ anon ]
FC40       16384   16384   16384       - rwx--  [ anon ]
FD60        8192    8192    8192       - rwx--  [ anon ]
FE00        4096    4096    4096       - rwx--  [ anon ]
FE60        2048    2048    2048       - rwx--  [ anon ]
FE8A          36      16       -       - r-x--  libtsol.so.2
FE8B9000       4       4       4       - rw---  libtsol.so.2
FE8C           4       4       4       - rwx--  [ anon ]
FE8D         140     112       -       - r-x--  libbsm.so.1
FE903000      28      28      28       - rw---  libbsm.so.1
FE90A000       4       -       -       - rw---  libbsm.so.1
FE91          16      16       -       - r-x--  libsecdb.so.1
FE924000       4       4       4       - rw---  libsecdb.so.1
FE93        1024    1024    1024       - rwx--  [ anon ]
FEA4         512     512     512       - rwx--  [ anon ]
FEAD         256     256     256       - rwx--  [ anon ]
FEB2         128     128     128       - rwx--  [ anon ]
FEB5          64      64      64       - rwx--  [ anon ]
FEB7          64      16      16       - rwx--  [ anon ]
FEB9           4       4       4       - rwx--  [ anon ]
FEBA          20      20       -       - r-x--  libilb.so.1
FEBB5000       4       4       4       - rw---  libilb.so.1
FEBC          32      32       -       - r-x--  libuutil.so.1
FEBD8000       4       4       4       - rw---  libuutil.so.1
FEBE           4       4       4       - rwx--  [ anon ]
FEBF         172     148       -       - r-x--  libscf.so.1
FEC2B000       4       4       4       - rw---  libscf.so.1
FEC3          20      20       -       - r-x--  libinetutil.so.1
FEC45000       4       4       4       - rw---  libinetutil.so.1
FEC5           4       4       4       - rwx--  [ anon ]
FEC6          20      12       -       - r-x--  libcmdutils.so.1
FEC75000       4       4       4       - rw---  libcmdutils.so.1
FEC8           4       4       -       - r--s-  dev:528,24 ino:2821218250
FEC9          64      64       4       - rwx--  [ anon ]
FECB          64      64       4       - rwx--  [ anon ]
FECD         416     368       -       - r-x--  libnsl.so.1
FED48000       8       8       8       - rw---  libnsl.so.1
FED4A000      20      16       4       - rw---  libnsl.so.1
FED5           4       4       4       - rwx--  [ anon ]
FED6          52      48       -       - r-x--  libsocket.so.1
FED7D000       4       4       4       - rw---  libsocket.so.1
FED8          24      12      12       - rwx--  [ anon ]
FED9        1252     936       -       - r-x--  libc_hwcap1.so.1
FEED9000      36      36      32       - rwx--  libc_hwcap1.so.1
FEEE2000       8       8       8       - rwx--  libc_hwcap1.so.1
FEEF           4       4       4       - rwx--  [ anon ]
FEF0         196     112       -       - r-x--  libumem.so.1
FEF4           8       4       4       - rwx--  libumem.so.1
FEF52000      76      72      16       - rw---  libumem.so.1
FEF65000      24      24      24       - rw---  libumem.so.1
FEF7           4       4       -       - r--s-  ld.config
FEF8           4       4       4       - rwx--  [ anon ]
FEF9           4       4       4       - rw---  [ anon ]
FEFA           4       4       4       - rw---  [ anon ]
FEFB           4       4       4       - rwx--  [ anon ]
FEFB5000     216     216       -       - r-x--  ld.so.1
FEFFB000       8       8       8       - rwx--  ld.so.1
FEFFD000       4       4       4       - rwx--  ld.so.1
-------- ------- ------- ------- -------
total Kb 3936668 3935948 3933588

-- 
Al Slater


Re: [OmniOS-discuss] ILB memory leak?

2015-11-05 Thread Al Slater

To the mailing list as well...

On 22/10/2015 09:43, Al Slater wrote:
> On 21/10/2015 17:35, Dan McDonald wrote:
>>
>>> On Oct 21, 2015, at 6:08 AM, Al Slater <al.sla...@scluk.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I am running omnios r151014 on a couple of machines with a couple
>>> of zones each.  1 zone runs apache as an SSL reverse proxy, the
>>> other runs ILB for load balancing web to app tier connections.
>>>
>>> I noticed that in the ILB zone, the ilbd process memory grows to
>>> about 2Gb.   Restarting ILB releases the memory, and then the
>>> memory usage gradually increases again, with each memory increase
>>> approximately 2 * the size of the previous one.  I run a cronjob
>>> twice a day ( 8am and 8pm) which restarts the ilb service and
>>> releases the memory.
>>>
>>> A graph of memory usage is available at
>>> https://www.dropbox.com/s/zaz51apxslnivlq/ILB_Memory_2_days.png?dl=0
>>>
>>> There are currently 62 rules in the load balancer, with a total
>>> of 664 server/port pairs.
>>>
>>> Is there anything I can provide that would help track this down?
>>
>> You can use svccfg(1M) to enable user-level memory debugging on ilb.
>>   It may cause the ilb daemon to dump core.  (And you're just noticing
>>   this in the process, not kernel memory consumption, correct?)
>
> I am seeing kernel memory consumption increasing as well, but that may
> be a different issue.  The ilbd process memory is definitely growing.
>
>> As root:
>>
>> svcadm disable -t ilb
>> svccfg -s ilb setenv LD_PRELOAD libumem.so
>> svccfg -s ilb setenv UMEM_DEBUG default
>> svccfg -s ilb refresh
>> svcadm enable ilb
>>
>> That should enable user-level memory debugging.  If you get a
>> coredump, save it and share it.  If you don't and the ilb daemon
>> keeps running, eventually please:
>>
>> gcore `pgrep ilbd`
>>
>> and share THAT corefile.  You can also do this by yourself:
>>
>> mdb  > ::findleaks
>>
>> and share ::findleaks.
>>
>> Once you're done generating corefiles, repeat the steps above, but
>> use "unsetenv LD_PRELOAD" and "unsetenv UMEM_DEBUG" instead of the
>> setenv lines.
>
> Thanks Dan.  As we are talking about production boxes here, I will have
> to try and reproduce on another box and then I will give the process
> above a go and see what we come up with.

I have reproduced the problem on a test box.

prstat shows:

  3041 daemon   3946M 3946M sleep   59    0   0:48:03 0.1% ilbd/1


memstat:

root@loki:/export/home/BRIGHTON/aslate# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     238420               931   12%
ZFS File Data              630861              2464   31%
Anon                      1054835              4120   51%
Exec and libs                2204                 8    0%
Page cache                  10624                41    1%
Free (cachelist)             9236                36    0%
Free (freelist)            105626               412    5%

Total                     2051806              8014
Physical                  2051805              8014

mdb findleaks:

root@loki:/export/home/BRIGHTON/aslate# mdb core.3041
Loading modules: [ libumem.so.1 libc.so.1 libcmdutils.so.1 libuutil.so.1
ld.so.1 ]
 > ::findleaks
findleaks: no memory leaks detected
 >

Now, I am seeing lots of log messages like the following in
/var/adm/messages

Nov  5 11:17:01 l1-lb2 ilbd[3041]: [ID 410242 daemon.error]
ilbd_hc_probe_timer: cannot restart timer: rule ggp server _ggp.11,
disabling it


So I was wrong about it growing to 2Gb; the truth is nearer 4Gb.  I am
guessing that ilbd_hc_restart_timer is failing because no more memory
can be allocated.

I have the 4Gb core file.  Is there anything useful I can extract from
it to try and spot where the problem is?


-- Al Slater





Re: [OmniOS-discuss] ILB memory leak?

2015-11-05 Thread Al Slater

Hi Dan,

On 05/11/2015 14:57, Dan McDonald wrote:
>
>> On Nov 5, 2015, at 6:38 AM, Al Slater <al.sla...@scluk.com> wrote:
>>
>> I have the 4Gb core file.  Is there anything useful I can extract
>> from it to try and spot where the problem is?
>
> Your one ::findleaks showed nothing.  Did your 4GB corefile have
> ::findleaks show nothing as well?
>
> ::umausers may be helpful.


root@loki:/export/home/BRIGHTON/aslate# mdb core.3041
Loading modules: [ libumem.so.1 libc.so.1 libcmdutils.so.1 libuutil.so.1
ld.so.1 ]

 > ::umausers

71424 bytes for 62 allocations with data size 1152:
 libumem.so.1`umem_cache_alloc_debug+0x1fe
 libumem.so.1`umem_cache_alloc+0x18f
 libumem.so.1`umem_alloc+0x50
 libumem.so.1`umem_malloc+0x36
 libumem.so.1`calloc+0x50
 i_ilbd_alloc_sg+0x13
 ilbd_create_sg+0x9a
 ilbd_scf_instance_walk_pg+0x2a6
 ilbd_walk_sg_pgs+0x37
 i_ilbd_read_config+0x28
 main_loop+0x7f
 main+0x1d3
 _start+0x83
53120 bytes for 664 allocations with data size 80:
 libumem.so.1`umem_cache_alloc_debug+0x1fe
 libumem.so.1`umem_cache_alloc+0x18f
 libumem.so.1`umem_alloc+0x50
 libumem.so.1`umem_malloc+0x36
 libumem.so.1`calloc+0x50
 ilbd_hc_srv_add+0x18
 ilbd_hc_associate_rule+0xd8
 ilbd_create_rule+0x1a3
 ilbd_scf_instance_walk_pg+0x1c4
 ilbd_walk_rule_pgs+0x37
 i_ilbd_read_config+0x4e
 main_loop+0x7f
 main+0x1d3
 _start+0x83
53120 bytes for 664 allocations with data size 80:
 libumem.so.1`umem_cache_alloc_debug+0x1fe
 libumem.so.1`umem_cache_alloc+0x18f
 libumem.so.1`umem_alloc+0x50
 libumem.so.1`umem_malloc+0x36
 libumem.so.1`calloc+0x50
 i_add_srv2sg+0x15
 ilbd_add_server_to_group+0x310
 ilbd_scf_instance_walk_pg+0x2dd
 ilbd_walk_sg_pgs+0x37
 i_ilbd_read_config+0x28
 main_loop+0x7f
 main+0x1d3
 _start+0x83
31584 bytes for 658 allocations with data size 48:
 libumem.so.1`umem_cache_alloc_debug+0x1fe
 libumem.so.1`umem_cache_alloc+0x99
 libumem.so.1`umem_alloc+0x50
 libumem.so.1`umem_malloc+0x36
 libumem.so.1`calloc+0x50
 libinetutil.so.1`iu_schedule_timer_ms+0x2d
 libinetutil.so.1`iu_schedule_timer+0x37
 ilbd_hc_restart_timer+0xbc
 ilbd_hc_probe_timer+0x23
 libinetutil.so.1`iu_expire_timers+0xbe
 ilbd_hc_timeout+0x11
 main_loop+0xe6
 main+0x1d3
 _start+0x83
12288 bytes for 1 allocations with data size 12288:
 libumem.so.1`umem_cache_alloc_debug+0x1fe
 libumem.so.1`umem_cache_alloc+0x18f
 libumem.so.1`umem_alloc+0x50
 libumem.so.1`umem_malloc+0x36
 libc.so.1`ltzset_u+0xa2
 libc.so.1`localtime_r+0x35
 libc.so.1`ctime_r+0x2c
 libc.so.1`vsyslog+0x1e4
 ilbd_log+0x48
 main+0x15e
 _start+0x83
10368 bytes for 54 allocations with data size 192:
 libumem.so.1`umem_cache_alloc_debug+0x1fe
 libumem.so.1`umem_cache_alloc+0x99
 libumem.so.1`umem_alloc+0x50
 libumem.so.1`umem_malloc+0x36
 libumem.so.1`calloc+0x50
 i_alloc_ilbd_rule+0x17
 ilbd_create_rule+0xfa
 ilbd_scf_instance_walk_pg+0x1c4
 ilbd_walk_rule_pgs+0x37
 i_ilbd_read_config+0x4e
 main_loop+0x7f
 main+0x1d3
 _start+0x83



> Sharing the corefile would also be helpful.


I have put it on dropbox

https://www.dropbox.com/s/y6cv78d1xk5j5u7/core.3041.gz?dl=0


> I'm assuming, given you see problems at 4GB that ilbd is a 32-bit
> process, right?


Yes,

#  file /usr/lib/inet/ilbd
/usr/lib/inet/ilbd: ELF 32-bit LSB executable 80386 Version 1,
dynamically linked, not stripped, no debugging information available
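
To put rough numbers on that: a 32-bit process can address at most 2^32
bytes, and the prstat figure from my earlier mail (3946M) is already most of
it.  Below is a small sketch of the arithmetic; the helper names are
invented for illustration, and the usable limit in practice may be somewhat
lower than the theoretical one, since not all of the address range is
necessarily available for mappings.

```c
#include <assert.h>
#include <stdint.h>

/* Size in Kbytes of a flat address space `bits` wide (10 <= bits < 64). */
uint64_t
address_space_kb(unsigned bits)
{
	assert(bits >= 10 && bits < 64);
	return ((UINT64_C(1) << bits) / 1024);
}

/* Whole-number percentage of a `bits`-wide address space used by used_kb. */
unsigned
percent_used(uint64_t used_kb, unsigned bits)
{
	return ((unsigned)(used_kb * 100 / address_space_kb(bits)));
}
```

address_space_kb(32) is 4194304, so percent_used(3946 * 1024, 32) comes out
at 96: the daemon had very little room left when the timer errors started.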

cheers

--
Al Slater

Technical Director
SCL

Phone : +44 (0)1273 07
Fax   : +44 (0)1273 01
email : al.sla...@scluk.com

Stanton Consultancy Ltd

Park Gate, 161 Preston Road, Brighton, East Sussex, BN1 6AU

Registered in England Company number: 1957652 VAT number: GB 760 2433 55



Re: [OmniOS-discuss] ILB memory leak?

2015-10-22 Thread Al Slater

On 21/10/2015 17:35, Dan McDonald wrote:
>
>> On Oct 21, 2015, at 6:08 AM, Al Slater <al.sla...@scluk.com>
>> wrote:
>>
>> Hi,
>>
>> I am running omnios r151014 on a couple of machines with a couple
>> of zones each.  1 zone runs apache as an SSL reverse proxy, the
>> other runs ILB for load balancing web to app tier connections.
>>
>> I noticed that in the ILB zone, the ilbd process memory grows to
>> about 2Gb.   Restarting ILB releases the memory, and then the
>> memory usage gradually increases again, with each memory increase
>> approximately 2 * the size of the previous one.  I run a cronjob
>> twice a day ( 8am and 8pm) which restarts the ilb service and
>> releases the memory.
>>
>> A graph of memory usage is available at
>> https://www.dropbox.com/s/zaz51apxslnivlq/ILB_Memory_2_days.png?dl=0
>>
>> There are currently 62 rules in the load balancer, with a total
>> of 664 server/port pairs.
>>
>> Is there anything I can provide that would help track this down?
>
> You can use svccfg(1M) to enable user-level memory debugging on ilb.
> It may cause the ilb daemon to dump core.  (And you're just noticing
> this in the process, not kernel memory consumption, correct?)

I am seeing kernel memory consumption increasing as well, but that may
be a different issue.  The ilbd process memory is definitely growing.

> As root:
>
> svcadm disable -t ilb
> svccfg -s ilb setenv LD_PRELOAD libumem.so
> svccfg -s ilb setenv UMEM_DEBUG default
> svccfg -s ilb refresh
> svcadm enable ilb
>
> That should enable user-level memory debugging.  If you get a
> coredump, save it and share it.  If you don't and the ilb daemon
> keeps running, eventually please:
>
> gcore `pgrep ilbd`
>
> and share THAT corefile.  You can also do this by yourself:
>
> mdb  > ::findleaks
>
> and share ::findleaks.
>
> Once you're done generating corefiles, repeat the steps above, but
> use "unsetenv LD_PRELOAD" and "unsetenv UMEM_DEBUG" instead of the
> setenv lines.

Thanks Dan.  As we are talking about production boxes here, I will have
to try and reproduce on another box and then I will give the process
above a go and see what we come up with.


--
Al Slater

Technical Director
SCL

Phone : +44 (0)1273 07
Fax   : +44 (0)1273 01
email : al.sla...@scluk.com

Stanton Consultancy Ltd

Park Gate, 161 Preston Road, Brighton, East Sussex, BN1 6AU

Registered in England Company number: 1957652 VAT number: GB 760 2433 55



[OmniOS-discuss] ILB memory leak?

2015-10-21 Thread Al Slater

Hi,

I am running omnios r151014 on a couple of machines with a couple of 
zones each.  1 zone runs apache as an SSL reverse proxy, the other runs 
ILB for load balancing web to app tier connections.


I noticed that in the ILB zone, the ilbd process memory grows to about 
2Gb.   Restarting ILB releases the memory, and then the memory usage 
gradually increases again, with each memory increase approximately 2 * 
the size of the previous one.  I run a cronjob twice a day ( 8am and 
8pm) which restarts the ilb service and releases the memory.


A graph of memory usage is available at 
https://www.dropbox.com/s/zaz51apxslnivlq/ILB_Memory_2_days.png?dl=0


There are currently 62 rules in the load balancer, with a total of 664 
server/port pairs.


Is there anything I can provide that would help track this down?


--
Al Slater




[OmniOS-discuss] pkgrecv r151014

2015-04-06 Thread Al Slater
Hi,

I am trying to pkgrecv r151014 into my own repository and keep bumping
into this:

pkgrecv: Invalid contentpath opt/sunstudio12.1/prod/lib/sys/libsunir.so:
chash failure: expected: b251c238070b6fdbf392194e85319e2c954a5384
computed: 17d9899f959ac5835569e8870f7e02eb14607242. (happened 4 times)

Is there a problem with this package in the repository?

-- 
Al Slater




Re: [OmniOS-discuss] pkgrecv r151014

2015-04-06 Thread Al Slater
On 06/04/15 11:03, Al Slater wrote:
> Hi,
>
> I am trying to pkgrecv r151014 into my own repository and keep bumping
> into this:
>
> pkgrecv: Invalid contentpath opt/sunstudio12.1/prod/lib/sys/libsunir.so:
> chash failure: expected: b251c238070b6fdbf392194e85319e2c954a5384
> computed: 17d9899f959ac5835569e8870f7e02eb14607242. (happened 4 times)
>
> Is there a problem with this package in the repository?

Same happens with pkg install...

# pkg install pkg:/developer/sunstudio12.1@12.1-0.151014
           Packages to install:  1
       Create boot environment: No
Create backup boot environment: No

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
developer/sunstudio12.1                  0/1     5042/7006  203.1/256.3  3.0M/s

Errors were encountered while attempting to retrieve package or file data for
the requested operation.
Details follow:

Invalid contentpath opt/sunstudio12.1/prod/lib/sys/libsunir.so: chash
failure: expected: b251c238070b6fdbf392194e85319e2c954a5384 computed:
17d9899f959ac5835569e8870f7e02eb14607242. (happened 4 times)


regards

-- 
Al Slater

Technical Director
SCL

Phone : +44 (0)1273 07
Fax   : +44 (0)1273 01
email : al.sla...@scluk.com

Stanton Consultancy Ltd

Park Gate, 161 Preston Road, Brighton, East Sussex, BN1 6AU

Registered in England Company number: 1957652 VAT number: GB 760 2433 55



Re: [OmniOS-discuss] pkgrecv r151014

2015-04-06 Thread Al Slater
Thanks Eric, AV on the gateway was the problem.

Al

On 06/04/15 15:14, Eric Sproul wrote:
> On Mon, Apr 6, 2015 at 6:24 AM, Al Slater <al.sla...@scluk.com> wrote:
>> On 06/04/15 11:03, Al Slater wrote:
>>> Hi,
>>>
>>> I am trying to pkgrecv r151014 into my own repository and keep bumping
>>> into this:
>>>
>>> pkgrecv: Invalid contentpath opt/sunstudio12.1/prod/lib/sys/libsunir.so:
>>> chash failure: expected: b251c238070b6fdbf392194e85319e2c954a5384
>>> computed: 17d9899f959ac5835569e8870f7e02eb14607242. (happened 4 times)
>>>
>>> Is there a problem with this package in the repository?
>
> It seems fine from my location:
>
> $ pkg contents -mr developer/sunstudio12.1 | grep libsunir.so
> file 19d832f8b112a9545e9d9b5aaf1384a7a37248f3
> chash=b251c238070b6fdbf392194e85319e2c954a5384 elfarch=i386 elfbits=32
> elfhash=710138bfbc99dd3aefd4a41dd49b9779cae35f15 group=bin mode=0755
>
> $ curl -s
> http://pkg.omniti.com/omnios/r151014/file/1/19d832f8b112a9545e9d9b5aaf1384a7a37248f3
> | sha1sum
> b251c238070b6fdbf392194e85319e2c954a5384  -
>
> Eric

-- 
Al Slater

Technical Director
SCL

Phone : +44 (0)1273 07
Fax   : +44 (0)1273 01
email : al.sla...@scluk.com

Stanton Consultancy Ltd

Park Gate, 161 Preston Road, Brighton, East Sussex, BN1 6AU

Registered in England Company number: 1957652 VAT number: GB 760 2433 55



Re: [OmniOS-discuss] pkgsend generate bug with spaces in file names

2014-11-04 Thread Al Slater
On 04/11/14 13:43, Lauri Tirkkonen wrote:
> On Tue, Nov 04 2014 13:35:39 +, Al Slater wrote:
>> I have run into the same problem while packaging cmake.  Is there a
>> solution?
>
> I have an open pull request for this, but from what I understand OmniOS'
> pkg isn't currently in a state where they can merge it. The OmniTIers
> can probably elaborate on that, but if you want to apply the patch
> yourself, it's at https://github.com/postwait/pkg5/pull/4

Thanks for that.


-- 
Al Slater

Technical Director
SCL

Phone : +44 (0)1273 07
Fax   : +44 (0)1273 01
email : al.sla...@scluk.com

Stanton Consultancy Ltd

Park Gate, 161 Preston Road, Brighton, East Sussex, BN1 6AU

Registered in England Company number: 1957652 VAT number: GB 760 2433 55
