[OmniOS-discuss] Panic when trying to remove an unused LUN with stmfadm

2016-04-16 Thread Stephan Budach

Hi,

I have experienced this issue a couple of times now, first on r016 and 
just today on r018, too. When trying to remove a LUN by issuing 
something like


root@nfsvmpool05:/root# stmfadm delete-lu 600144F04E4653564D504F4F4C303538
packet_write_wait: Connection to 10.11.14.49: Broken pipe
Shared connection to nfsvmpool05 closed.

the system hung up. When it came back online, I was able to remove that 
LUN without any issue. The fun thing is that I had created that LUN just 
before, and it hadn't even been in use, as it hadn't been attached to any 
view.
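
To be clear, the sequence was really just create-lu followed by 
delete-lu, with no view ever added; roughly like this (the zvol path 
here is only a placeholder, not the actual one I used):

root@nfsvmpool05:/root# stmfadm create-lu /dev/zvol/rdsk/somepool/somevol
root@nfsvmpool05:/root# stmfadm list-view -l 600144F04E4653564D504F4F4C303538
root@nfsvmpool05:/root# stmfadm delete-lu 600144F04E4653564D504F4F4C303538

(list-view just confirms that no views exist for that GUID before the delete.)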


The syslog shows the usual COMSTAR message about kernel heap corruption, 
which I have already encountered a couple of times, albeit during rather 
normal operation.


Apr 16 10:17:15 nfsvmpool05 genunix: [ID 478202 kern.notice] kernel memory allocator:
Apr 16 10:17:15 nfsvmpool05 unix: [ID 836849 kern.notice]
Apr 16 10:17:15 nfsvmpool05 ^Mpanic[cpu6]/thread=ff0e495c4880:
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 812275 kern.notice] kernel heap corruption detected
Apr 16 10:17:15 nfsvmpool05 unix: [ID 10 kern.notice]
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 802836 kern.notice] ff003df44ae0 fba4e8d4 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] ff003df44b20 genunix:kmem_free+1a8 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] ff003df44b60 stmf:stmf_deregister_lu+1a7 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] ff003df44ba0 stmf_sbd:sbd_delete_locked_lu+95 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] ff003df44c00 stmf_sbd:sbd_delete_lu+a9 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] ff003df44c80 stmf_sbd:stmf_sbd_ioctl+292 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] ff003df44cc0 genunix:cdev_ioctl+39 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] ff003df44d10 specfs:spec_ioctl+60 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] ff003df44da0 genunix:fop_ioctl+55 ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] ff003df44ec0 genunix:ioctl+9b ()
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 655072 kern.notice] ff003df44f10 unix:brand_sys_sysenter+1c9 ()
Apr 16 10:17:15 nfsvmpool05 unix: [ID 10 kern.notice]
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 672855 kern.notice] syncing file systems...
Apr 16 10:17:15 nfsvmpool05 genunix: [ID 904073 kern.notice]  done
Apr 16 10:17:16 nfsvmpool05 genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Apr 16 10:17:16 nfsvmpool05 ahci: [ID 405573 kern.info] NOTICE: ahci0: ahci_tran_reset_dport port 1 reset port
Apr 16 10:20:46 nfsvmpool05 genunix: [ID 10 kern.notice]
Apr 16 10:20:46 nfsvmpool05 genunix: [ID 665016 kern.notice] ^M100% done: 1153955 pages dumped,
Apr 16 10:20:46 nfsvmpool05 genunix: [ID 851671 kern.notice] dump succeeded

Fortunately, this time a debug kernel was running, as Dan had suggested 
after former occurrences, and I do have a dump at hand, which I could 
upload to uploads.omniti.com if I'd get a token to do so.
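
A minimal sketch of how the dump can be inspected locally, assuming the 
default savecore directory (dumpadm shows the actual one):

root@nfsvmpool05:/root# cd /var/crash/nfsvmpool05
root@nfsvmpool05:/var/crash/nfsvmpool05# savecore -vf vmdump.0
root@nfsvmpool05:/var/crash/nfsvmpool05# mdb unix.0 vmcore.0
> ::status
> ::stack
> ::kmem_verify

::status and ::stack should reproduce the panic string and stack above; 
with the debug kernel's kmem auditing enabled, ::kmem_verify should point 
at the corrupted cache and buffer.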


@Dan, may I get one?

Thanks,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS 151016/ 151017 not listed in network environment

2016-04-16 Thread Guenther Alka

Update:

OmniOS is listed on a Windows machine (tried on 151018)
when you set the SMB server property netbios_enable=true.

Could this be re-enabled by default, as this is the expected behaviour?
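
For the archives, a hedged example of setting it on the in-kernel SMB 
server (the hostname in the prompt is a placeholder; sharectl get smb 
shows the current value):

root@omnios:~# sharectl set -p netbios_enable=true smb
root@omnios:~# svcadm restart network/smb/server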

thanks


Gea



On 31.03.2016 at 20:30, Guenther Alka wrote:

Hello Dan,
This is not related to Windows AD, but to the fact that the Solarish 
kernel SMB server does not offer the master browser capability that is 
needed to provide the list of SMB servers in the Windows "network 
neighborhood".


So you need a Windows or Samba server in the subnet that provides this 
feature.
This worked with my 151014 boxes, which are listed in the network 
neighborhood.


see
https://docs.oracle.com/cd/E36784_01/html/E36832/winclientcannotcontactbynetbios.html 



thanks

Gea

On 31.03.2016 at 19:02, Dan McDonald wrote:

On Mar 31, 2016, at 11:46 AM, Guenther Alka  wrote:

We have an AD environment where the AD server is the master browser
that lists all servers and workstations on our Windows machines under
the network view.

This works for a couple of OmniOS 151014 servers but not for my
151016/151017 test servers. Is this a known problem, or is there
something that I should consider?

Are you using the in-kernel SMB service?  That got updated in a big
way as part of 016 (non-global-zone SMB serving), and more is coming
in 018.


I'm not a Windows/AD expert, so please, when reporting AD issues, be 
a bit more pedantic in your explanation. You should also check the 
illumos mailing list archives.


Thanks,
Dan



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


--
H  f   G
Hochschule für Gestaltung
university of design

Schwäbisch Gmünd
Rektor-Klaus Str. 100
73525 Schwäbisch Gmünd

Guenther Alka, Dipl.-Ing. (FH)
Leiter des Rechenzentrums
head of computer center

Tel 07171 602 627
Fax 07171 69259
guenther.a...@hfg-gmuend.de
http://rz.hfg-gmuend.de

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [developer] NVMe Performance

2016-04-16 Thread Richard Elling

> On Apr 15, 2016, at 7:49 PM, Richard Yao  wrote:
> 
> On 04/15/2016 10:24 PM, Josh Coombs wrote:
>> On Fri, Apr 15, 2016 at 9:26 PM, Richard Yao  wrote:
>> 
>>> 
>>> The first is to make sure that ZFS uses proper alignment on the device.
>>> According to what I learned via Google searches, the Intel DC P3600
>>> supports both 512-byte sectors and 4096-byte sectors, but is low-level
>>> formatted to 512-byte sectors by default. You could run fio to see how the
>>> random IO performance differs on 512-byte IOs at 512-byte formatting vs 4KB
>>> IOs at 4KB formatting, but I expect that you will find it performs best in
>>> the 4KB case like Intel's enterprise SATA SSDs do. If the 512-byte random
>>> IO performance was notable, Intel would have advertised it, but they did
>>> not do that:
>>> 
>>> 
>>> http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ssd-dc-p3600-spec.pdf
>>> 
>>> http://www.cadalyst.com/%5Blevel-1-with-primary-path%5D/how-configure-oracle-redo-intel-pcie-ssd-dc-p3700-23534
>>> 
>> So, I played around with this.  Intel's isdct tool will let you secure
>> erase the P3600 and set it up as a 4k-sector device, or a 512-byte one, with a few
>> other options as well.  I have to look again, but it might support 8k sectors
>> too.  Unfortunately the NVMe driver doesn't play well with the SSD
>> formatted for anything other than 512-byte sectors.  I noted my findings in
>> Illumos bug #6912.
> 
> The documentation does not say that it will do 8192-byte sectors,
> although ZFS in theory should be okay with them. My tests on the Intel
> DC S3700 suggested that 4KB vs 8KB was too close to tell. I recall
> deciding that Intel did a good enough job at 4KB that it should go into
> ZoL's quirks list as a 4KB drive.

ZIL traffic is all 4K, unless the physical blocksize is larger. There are a number of
flash SSDs that prefer 8K, and you can tell by the “optimal transfer size.” Since the
bulk of the market driving SSD sales is running NTFS, 4K is the market sweet spot.

> 
> The P3600 is probably similar because its NAND flash controller "is an
> evolution of the design used in the S3700/S3500":
> 
> http://www.anandtech.com/show/8104/intel-ssd-dc-p3700-review-the-pcie-ssd-transition-begins-with-nvme
>  
> 
> 
>> I need to look at how Illumos partitions the devices if you just feed zpool
>> the device rather than a partition; I didn't look to see whether it was aligning
>> things correctly or not on its own.
> 
> It will put the first partition at a 1MB boundary and set an internal
> alignment shift consistent with what the hardware reports.
> 
>>> The second is that it is possible to increase IOPS beyond Intel's
>>> specifications by doing a secure erase, giving the SLOG a tiny 4KB-aligned
>>> partition and leaving the rest of the device unused. Intel's numbers are
>>> for steady-state performance where almost every flash page is dirty. If you
>>> leave a significant number of pages clean (i.e. unused following a secure
>>> erase), the drive should perform better than what Intel claims, by virtue of
>>> the internal bookkeeping and garbage collection having to do less. Anandtech
>>> has benchmark numbers showing this effect on older consumer SSDs on
>>> Windows in a comparison with the Intel DC S3700:
>>> 
>> 
>> Using isdct I have mine set to 50% over-provisioning, so they show up as
>> 200GB devices now.  As noted in bug 6912 you have to secure erase after
>> changing that setting or the NVMe driver REALLY gets unhappy.
> 
> If you are using it as a SLOG, you would probably want something like
> 98% overprovisioning to match the ZeusRAM, which was designed for use as
> a ZFS SLOG device and was very well regarded until it was discontinued:
> 
> https://www.hgst.com/sites/default/files/resources/[FAQ]_ZeusRAM_FQ008-EN-US.pdf
>  
> 

ZeusRAM was great for its time, but the 12G replacements perform similarly. The
biggest difference between ZeusRAMs and flash SSDs seems to be in the garbage
collection. In my testing, low-DWPD drives have less consistent performance, as the
garbage collection is less optimized. For the 3 DWPD drives we’ve tested, the
performance for slog workloads is more consistent than for the 1 DWPD drives.

> 
> ZFS generally does not need much more from a SLOG device. The way to
> ensure that you do not overprovision more/less than ZFS is willing to
> use on your system would be to look at zfs_dirty_data_max.
> 
> That being said, you likely will want to run fio random IO benchmarks at
> different overprovisioning levels after a secure erase and a dd
> if=/dev/urandom of=/path/to/device so you can see the difference in
> performance yourself. Happy benchmarking. :)
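
For reference, a minimal way to read that tunable on a live illumos 
system (the hostname in the prompt is a placeholder):

root@omnios:~# echo 'zfs_dirty_data_max/E' | mdb -k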

/dev/urandom is too (intentionally) slow. You’ll bottleneck there.

Richard’s advice is good: test with random workloads. Contrary to p
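
For anyone following that suggestion, a hedged fio sketch for a 4K 
random-write run against a scratch device (destructive; the device path, 
runtime and job count are placeholders to adjust):

root@omnios:~# fio --name=randwrite --filename=/dev/rdsk/c1t1d0p0 \
    --rw=randwrite --bs=4k --ioengine=psync --numjobs=4 \
    --runtime=300 --time_based --group_reporting

fio fills its write buffers itself, so the slow dd from /dev/urandom 
preconditioning step can usually be skipped.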