Re: [zfs-discuss] ZFS boot: another way

2007-07-03 Thread Douglas Atique
I'm afraid the Solaris installer won't let me stop the process just before it 
starts copying files to the target filesystem. It would be very nice to do 
away with the UFS slice altogether, but between filesystem creation and 
initialisation (which seems mandatory) and copying there is no pause where I 
could open a terminal and do the trick.

-- Douglas
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS panic on 32bit x86

2007-07-03 Thread grant beattie
hi all,

I was extracting an 8GB tar and encountered this panic. The system was 
just installed last week with Solaris 10 update 3 and the latest 
recommended patches as of June 26. I can provide more output from mdb, 
or the crashdump itself if it would be of any use.

Any ideas what's going on here?


# uname -a
SunOS fang 5.10 Generic_125101-09 i86pc i386 i86pc

# mdb -k unix.0 vmcore.0
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp ufs ip 
sctp usba fcp fctl nca md lofs zfs random nfs sppp crypto ptm fcip cpc 
logindmux ]
 > ::status
debugging crash dump vmcore.0 (32-bit) from fang
operating system: 5.10 Generic_125101-09 (i86pc)
panic message:
BAD TRAP: type=e (#pf Page fault) rp=d4ab09d4 addr=20 occurred in module 
"zfs" due to a NULL pointer dereference
dump content: kernel pages only
 > *panic_thread::findstack -v
stack pointer for thread d490d600: d4ab08d0
   d4ab09d4 0xd4ab08f4()
   d4ab0a30 zap_leaf_lookup+0x25(ebe7bc80, d4ab0bf0, 0, ea7a6950, d4ab0a60,
   d4ab0bf0)
   d4ab0a9c fzap_lookup+0x88(ebe7bc80, d4ab0bf0, 8, 0, 1, 0)
   d4ab0ad0 zap_lookup+0xb0(d4e02f48, 209ce, 0, d4ab0bf0, 8, 0)
   d4ab0b28 zfs_dirent_lock+0x23e(d4ab0b58, d82d8cd0, d4ab0bf0, d4ab0b54, 6)
   d4ab0b5c zfs_dirlook+0x9b(d82d8cd0, d4ab0bf0, d4ab0d30)
   d4ab0b84 zfs_lookup+0x6f(e219fd80, d4ab0bf0, d4ab0d30, d4ab0da0, 1, 
d3a65c00)
   d4ab0bc0 fop_lookup+0x2c(e219fd80, d4ab0bf0, d4ab0d30, d4ab0da0, 1, 
d3a65c00)
   d4ab0d38 lookuppnvp+0x295(d4ab0da0, 0, 0, d4ab0e50, 0, d3a65c00)
   d4ab0d70 lookuppnat+0xe8(d4ab0da0, 0, 0, d4ab0e50, 0, 0)
   d4ab0e58 vn_createat+0x9f(8090840, 0, d4ab0e98, 1, 80, d4ab0f00)
   d4ab0f0c vn_openat+0x323(8090840, 0, 2502, 180, d4ab0f68, 0)
   d4ab0f6c copen+0x24f()
   d4ab0f84 open64+0x1d()
   d4ab0fac sys_sysenter+0x100()
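
I'm happy to pull more out of the dump if someone can suggest what to look at.
For example, I can run things like:

 > ::msgbuf
 > ::panicinfo
 > d490d600::thread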

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Albert Chin
PSARC 2007/171 will be available in b68. Any documentation anywhere on
how to take advantage of it?

Some of the Sun storage arrays contain NVRAM. It would be really nice
if the array NVRAM were available for ZIL storage. It would also
be nice to have extra hardware (a PCI-X or PCIe card) that adds NVRAM storage
to various Sun low/mid-range servers that are currently acting as
ZFS/NFS servers. Or maybe someone knows of cheap SSD storage that
could be used for the ZIL? I think several HD's are available with
SCSI/ATA interfaces.

-- 
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Casper . Dik

>PSARC 2007/171 will be available in b68. Any documentation anywhere on
>how to take advantage of it?
>
>Some of the Sun storage arrays contain NVRAM. It would be really nice
>if the array NVRAM would be available for ZIL storage. It would also
>be nice for extra hardware (PCI-X, PCIe card) that added NVRAM storage
>to various sun low/mid-range servers that are currently acting as
>ZFS/NFS servers. Or maybe someone knows of cheap SSD storage that
>could be used for the ZIL? I think several HD's are available with
>SCSI/ATA interfaces.


Would flash memory be fast enough? (Current flash memory has reasonable
sequential write throughput but horrible "I/O" ops.)

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS boot: another way

2007-07-03 Thread Dick Davies
I've found it's fairly easy to trim down a 'core' install, installing
to a temporary UFS root,
doing the ufs -> zfs migration, and then re-using the old UFS slice as swap.

Obviously you need a separate /boot slice in this setup.
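
Roughly, from memory (so treat this as a sketch -- pool and slice names are
just examples, and you still have to sort out /boot, vfstab and the GRUB/bootenv
bits per the usual ZFS mountroot instructions):

# zpool create rootpool c0d0s4
# zfs create rootpool/root
# cd / && find . -xdev -depth -print | cpio -pdm /rootpool/root
# swap -a /dev/dsk/c0d0s0      # old UFS root slice, reused as swap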

On 03/07/07, Douglas Atique <[EMAIL PROTECTED]> wrote:
> I'm afraid the Solaris installer won't let me stop the process just before it 
> starts copying files to the target filesystem. It would be very nice to do 
> away with the UFS slice altogether, but between filesystem creation and 
> initialisation (which seems mandatory) and copying there is no pause where I 
> could open a terminal and do the trick.

-- 
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS scaling

2007-07-03 Thread William Loewe
Hello,

I'm using a Sun Fire X4500 "Thumper" and trying to get some sense of the best 
performance I can get from it with zfs.

I'm running without mirroring or raid, and have checksumming turned off.  I 
built the zfs with these commands:

# zpool create mypool disk0 disk1 ... diskN
# zfs set checksum=off mypool
# zfs create mypool/testing

When I run an application with 8 threads performing writes, I see this 
performance:

   1 disk  --  42 MB/s
   2 disks --  81 MB/s
   4 disks -- 147 MB/s
   8 disks -- 261 MB/s
  12 disks -- 347 MB/s
  16 disks -- 433 MB/s
  32 disks -- 687 MB/s
  45 disks -- 621 MB/s

I'm surprised it doesn't scale better than this, and I'm curious to know what 
the best configuration is for getting the maximum write performance from the 
Thumper.

Thanks,

  --  Bill.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS scaling

2007-07-03 Thread Paul Fisher
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> William Loewe
>
> I'm using a Sun Fire X4500 "Thumper" and trying to get some 
> sense of the best performance I can get from it with zfs.
> 
> I'm running without mirroring or raid, and have checksumming 
> turned off.  I built the zfs with these commands:
> 
> # zpool create mypool disk0 disk1 ... diskN
> # zfs set checksum=off mypool
> # zfs create mypool/testing
> 
> When I run an application with 8 threads performing writes, I 
> see this performance:
> 
>1 disk  --  42 MB/s
>2 disks --  81 MB/s
>4 disks -- 147 MB/s
>8 disks -- 261 MB/s
>   12 disks -- 347 MB/s
>   16 disks -- 433 MB/s
>   32 disks -- 687 MB/s
>   45 disks -- 621 MB/s

This is more a matter of the number of vdevs at the top level of the pool, 
coupled with the fact that there are six (6) controllers to which the disks 
are attached.  The good news is that adding redundancy does not slow it down.  The 
following two example configurations demonstrate the two ends of the 
performance expectations you can have for the Thumper.

For example, a pool constructed of three (3) raidz2 vdevs, each with twelve 
(12) disks, like so:

bash-3.00# zpool status conf6z2pool
  pool: conf6z2pool
 state: ONLINE
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
conf6z2pool  ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c0t7d0  ONLINE   0 0 0
c1t7d0  ONLINE   0 0 0
c5t7d0  ONLINE   0 0 0
c6t7d0  ONLINE   0 0 0
c7t7d0  ONLINE   0 0 0
c8t7d0  ONLINE   0 0 0
c0t6d0  ONLINE   0 0 0
c1t6d0  ONLINE   0 0 0
c5t6d0  ONLINE   0 0 0
c6t6d0  ONLINE   0 0 0
c7t6d0  ONLINE   0 0 0
c8t6d0  ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c0t5d0  ONLINE   0 0 0
c1t5d0  ONLINE   0 0 0
c5t5d0  ONLINE   0 0 0
c6t5d0  ONLINE   0 0 0
c7t5d0  ONLINE   0 0 0
c8t5d0  ONLINE   0 0 0
c0t3d0  ONLINE   0 0 0
c1t3d0  ONLINE   0 0 0
c5t3d0  ONLINE   0 0 0
c6t3d0  ONLINE   0 0 0
c7t3d0  ONLINE   0 0 0
c8t3d0  ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c0t2d0  ONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c6t2d0  ONLINE   0 0 0
c7t2d0  ONLINE   0 0 0
c8t2d0  ONLINE   0 0 0
c0t1d0  ONLINE   0 0 0
c1t1d0  ONLINE   0 0 0
c5t1d0  ONLINE   0 0 0
c6t1d0  ONLINE   0 0 0
c7t1d0  ONLINE   0 0 0
c8t1d0  ONLINE   0 0 0
spares
  c8t0d0AVAIL   


will yield the following performance for several sustained writes of block
size 128k:

 (9 x dd if=/dev/zero bs=128k)

bash-3.00# zpool iostat conf6z2pool 1 
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
conf6z2pool  9.56G  16.3T  0  3.91K  0   492M
conf6z2pool  9.56G  16.3T  0  4.04K  0   509M
conf6z2pool  9.56G  16.3T  0  4.05K  0   510M
conf6z2pool  9.56G  16.3T  0  4.11K  0   517M


and sustained read performance of several 128k streams yields:

 (9 x dd of=/dev/null bs=128k)

bash-3.00# zpool iostat conf6z2pool 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
conf6z2pool  1.30T  15.0T  5.97K  0   759M  0
conf6z2pool  1.30T  15.0T  5.97K  0   759M  0
conf6z2pool  1.30T  15.0T  5.96K  0   756M  0

and sustained read/write performance of several 128k streams yields:

 (9 x dd if=/dev/zero bs=128k && 9 x dd of=/dev/null bs=128k)

bash-3.00# zpool iostat conf6z2pool 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
conf6z2pool  1.30T  15.0T  3.34K  2.54K   424M   320M
conf6z2pool  1.30T  15.0T  2.89K  2.83K   367M   356M
conf6z2pool  1.30T  15.0T  2.96K  2.80K   375M   353M
conf6z2pool  1.30T  15.0T  3.50K  2.58K   445M   325M
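
(For completeness, a pool with the layout shown above can be created in a
single command; device names are as in the status output:)

bash-3.00# zpool create conf6z2pool \
    raidz2 c0t7d0 c1t7d0 c5t7d0 c6t7d0 c7t7d0 c8t7d0 \
           c0t6d0 c1t6d0 c5t6d0 c6t6d0 c7t6d0 c8t6d0 \
    raidz2 c0t5d0 c1t5d0 c5t5d0 c6t5d0 c7t5d0 c8t5d0 \
           c0t3d0 c1t3d0 c5t3d0 c6t3d0 c7t3d0 c8t3d0 \
    raidz2 c0t2d0 c1t2d0 c5t2d0 c6t2d0 c7t2d0 c8t2d0 \
           c0t1d0 c1t1d0 c5t1d0 c6t1d0 c7t1d0 c8t1d0 \
    spare c8t0d0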

The complete opposite end of the performance spectrum will come from a
pool of mirrored vdevs with the following configuration:

bash-3.00# zpool status conf7mpool
  pool: conf7mpool

Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Richard Elling
Albert Chin wrote:
> PSARC 2007/171 will be available in b68. Any documentation anywhere on
> how to take advantage of it?

Should be part of b68.

> Some of the Sun storage arrays contain NVRAM. It would be really nice
> if the array NVRAM would be available for ZIL storage. It would also
> be nice for extra hardware (PCI-X, PCIe card) that added NVRAM storage
> to various sun low/mid-range servers that are currently acting as
> ZFS/NFS servers. Or maybe someone knows of cheap SSD storage that
> could be used for the ZIL? I think several HD's are available with
> SCSI/ATA interfaces.

First, you need a workload where the ZIL has an impact.

The expected improvement comes from separating the ZIL workload (small,
sequential iops with a strong need for low latency) from more
random workloads.  If you are using an array with NVRAM, it will hide some
of the issues which lead us to want the separation.  However, if you are
using a bunch of JBODs, then it might be worthwhile to have a device
dedicated to the ZIL.
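
Syntax-wise, once b68 is on the box, the separate log should just be another
vdev type at pool creation time; a sketch (device names are placeholders):

# zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 log c2t0d0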

I expect the workload characterizations to begin soon, once b68 arrives and
people get back from the holiday.  After some testing, we'll have a better
idea of how well it works for a variety of workloads.
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS + SAN configuration recommendations?

2007-07-03 Thread Alderman, Sean
Hi all,
  I'm new to the list as of today, I've come because I'm fascinated with
ZFS and my company has just begun an adventure into the unknown with
Solaris 10.

We've got a few Sun Fire X4200's and a few Sun Fire V245's that we're
playing with and we've come to a decision point about how to configure
SAN LUNs on these boxes.  I'm curious what you all think would be a best
practice for the relatively simple scenario described below:

Application Use:  Oracle 10.2
Server: Sun Fire V245 w/ Sun branded Emulex FC HBA
SAN  Storage Allocated:  10 100GB LUNs

I'm not much of an oracle guy, but I will say we don't have a lot of
experience running with Oracle on File systems, most of our existing
Oracle Servers are RAC configured with ASM on raw SAN...and we don't
like this very much.

I'm wondering what the best way to allocate these LUNs with ZFS would
be...

Configure one zpool with all 10 LUNs and a single file system assigning
no special constraints (mirror/striping/raid/zraid) to the pool?

Configure a zpool for each of the 10 LUNs with a single file system
inside each pool?

Configure one zpool with all 10 LUNs and 10 file systems (again no
special zpool config)?

There are some undefined variables, such as the SAN and Oracle
configurations, but I'm not in a position to control those, I don't
admin the SAN, nor am I a DBA.  Strictly from the System Admin
perspective, would there be a best solution here?  If we were using 
Veritas Volume Manager, and we consider a zpool to be roughly equivalent
to a volume group (and a zfs filesystem ~ a vxfs logical volume), VxVM has
limitations where performance degrades if LUNs are too large, too
many, and so forth.  Does ZFS have the same constraints?  Does it follow
that allowing ZFS to manage all the LUNs under a single pool and file
system will perform better, following the idea that the lower the
level of control, the better the performance through fewer layers of
abstraction/overhead?

My next question would be to consider those scenarios with the use of
ZFS mirror or raid functionality.  Does this add unnecessary overhead at
the cost of performance when the SAN may be configured in a RAID 5 or
RAID 10 arrangement?

Many thanks!
--
Sean

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Albert Chin
On Tue, Jul 03, 2007 at 05:31:00PM +0200, [EMAIL PROTECTED] wrote:
> 
> >PSARC 2007/171 will be available in b68. Any documentation anywhere on
> >how to take advantage of it?
> >
> >Some of the Sun storage arrays contain NVRAM. It would be really nice
> >if the array NVRAM would be available for ZIL storage. It would also
> >be nice for extra hardware (PCI-X, PCIe card) that added NVRAM storage
> >to various sun low/mid-range servers that are currently acting as
> >ZFS/NFS servers. Or maybe someone knows of cheap SSD storage that
> >could be used for the ZIL? I think several HD's are available with
> >SCSI/ATA interfaces.
> 
> Would flash memory be fast enough? (Current flash memory has reasonable
> sequential write throughput but horrible "I/O" ops.)

Good point. The speeds for the following don't seem very impressive:
  http://www.adtron.com/products/A25fb-SerialATAFlashDisk.html
  http://www.sandisk.com/OEM/ProductCatalog(1321)-SanDisk_SSD_SATA_5000_25.aspx

-- 
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool status -v: machine readable format?

2007-07-03 Thread David Smith
I was wondering if anyone had a script to parse the "zpool status -v" output 
into a more machine readable format?
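
Something quick and dirty like the nawk below is about all I've come up with
so far -- it assumes the default output layout and only pulls out the config
table as tab-separated fields:

# zpool status -v | nawk '/^config:/ {c=1; next} /^errors:/ {c=0} c && NF>=5 {print $1"\t"$2"\t"$3"\t"$4"\t"$5}'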

Thanks,

David
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS scaling

2007-07-03 Thread Richard Elling
William Loewe wrote:
> Hello,
> 
> I'm using a Sun Fire X4500 "Thumper" and trying to get some sense of the best 
> performance I can get from it with zfs.
> 
> I'm running without mirroring or raid, and have checksumming turned off.  I 
> built the zfs with these commands:
> 
> # zpool create mypool disk0 disk1 ... diskN
> # zfs set checksum=off mypool
> # zfs create mypool/testing
> 
> When I run an application with 8 threads performing writes, I see this 
> performance:
> 
>1 disk  --  42 MB/s
>2 disks --  81 MB/s
>4 disks -- 147 MB/s
>8 disks -- 261 MB/s
>   12 disks -- 347 MB/s
>   16 disks -- 433 MB/s
>   32 disks -- 687 MB/s
>   45 disks -- 621 MB/s
> 
> I'm surprised it doesn't scale better than this, and I'm curious to know what 
> the best configuration is for getting the maximum write performance from the 
> Thumper.

When doing testing like this, you will need to make sure you are generating
enough large, concurrent I/O to be interesting.  Unlike many RAID-0 implementations,
ZFS will allocate 128kByte blocks across vdevs on a slab basis.  The default
slab size is likely to be 1 MByte, so if you want to see I/O spread across
45 disks, then you'd need to generate > 45 MBytes of concurrent, write I/O.
Otherwise, you will only see a subset of the disks active.  This should be
measurable via iostat with a small period.  Once written, random reads
should exhibit more stochastic balancing of the iops across disks.

The disks used in the X4500 have a media bandwidth of 31-64.8 MBytes/s, so
getting 42 MBytes/s to or from a single disk is not unreasonable.
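
A simple way to generate that kind of concurrent write load and watch how it
spreads across the disks is just several dd streams plus iostat; for example
(pool/fs names from your setup, counts arbitrary):

# for i in 1 2 3 4 5 6 7 8; do dd if=/dev/zero of=/mypool/testing/f$i bs=128k count=8192 & done
# iostat -xnz 1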
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Albert Chin
On Tue, Jul 03, 2007 at 09:01:50AM -0700, Richard Elling wrote:
> Albert Chin wrote:
> > Some of the Sun storage arrays contain NVRAM. It would be really nice
> > if the array NVRAM would be available for ZIL storage. It would also
> > be nice for extra hardware (PCI-X, PCIe card) that added NVRAM storage
> > to various sun low/mid-range servers that are currently acting as
> > ZFS/NFS servers. Or maybe someone knows of cheap SSD storage that
> > could be used for the ZIL? I think several HD's are available with
> > SCSI/ATA interfaces.
> 
> First, you need a workload where the ZIL has an impact.

ZFS/NFS + zil_disable is faster than ZFS/NFS without zil_disable. So,
I presume, ZFS/NFS + an NVRAM-backed ZIL would be noticeably faster
than ZFS/NFS + ZIL.

-- 
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Casper . Dik


>Good point. The speeds for the following don't seem very impressive:
>  http://www.adtron.com/products/A25fb-SerialATAFlashDisk.html
>  http://www.sandisk.com/OEM/ProductCatalog(1321)-SanDisk_SSD_SATA_5000_25.aspx


The Adtron URL leaves out IOPS altogether.

SanDisk limits itself to read IOPS (claiming a whopping 7000).

But from what I'm told, write IOPS are a tiny fraction of that.

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status -v: machine readable format?

2007-07-03 Thread George Wilson
David Smith wrote:
> I was wondering if anyone had a script to parse the "zpool status -v" output 
> into a more machine readable format?
>
> Thanks,
>
> David
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   
David,

Are you using the latest Nevada bits? They will actually print out 
the pathname associated with the errors. Take a look at Eric's blog:

http://blogs.sun.com/erickustarz/entry/damaged_files_and_zpool_status

Thanks,
George
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS + SAN configuration recommendations?

2007-07-03 Thread Richard Elling
Alderman, Sean wrote:
> Hi all,
>   I'm new to the list as of today, I've come because I'm fascinated with 
> ZFS and my company has just begun an adventure into the unknown with 
> Solaris 10.
> 
> We've got a few Sun Fire X4200's and a few Sun Fire V245's that we're 
> playing with and we've come to a decision point about how to configure 
> SAN LUNs on these boxes.  I'm curious what you all think would be a best 
> practice for the relatively simple scenario described below:
> 
> Application Use:  Oracle 10.2
> Server: Sun Fire V245 w/ Sun branded Emulex FC HBA
> SAN  Storage Allocated:  10 100GB LUNs
> 
> I'm not much of an oracle guy, but I will say we don't have a lot of 
> experience running with Oracle on File systems, most of our existing 
> Oracle Servers are RAC configured with ASM on raw SAN…and we don't like 
> this very much.

If you are using RAC, your choices are limited.  ZFS will not work with RAC.
You should check out QFS, which does work with RAC, and is in the queue for
being open sourced from Sun.  Watch
http://www.opensolaris.org/os/project/samqfs/

> I'm wondering what the best way to allocate these LUNs with ZFS would be…

Good pointers include:
http://blogs.sun.com/realneel/entry/zfs_and_databases
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

> Configure one zpool with all 10 LUNs and a single file system assigning 
> no special constraints (mirror/striping/raid/zraid) to the pool?
> 
> Configure a zpool for each of the 10 LUNs with a single file system 
> inside each pool?
> 
> Configure one zpool with all 10 LUNs and 10 file systems (again no 
> special zpool config)?

In general, the best advice is KISS.  For Oracle databases, we also tend
to recommend a separate file system or zpool for redo logs.  To restate
this more generally, use separate zpools when you need separate policies
for the data which include zpool-specific settings.  Similarly, use
separate file systems when the file system policies may be different.
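
As a sketch only (LUN names are placeholders, and the 8 kByte recordsize is
just the usual starting point for matching an 8k db_block_size -- see the
links above):

# zpool create oradata <lun1> <lun2> <lun3> <lun4>
# zfs create oradata/db
# zfs set recordsize=8k oradata/db
# zpool create oralog <lun9> <lun10>
# zfs create oralog/redo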

> There are some undefined variables, such as the SAN and Oracle 
> configurations, but I'm not in a position to control those, I don't 
> admin the SAN, nor am I a DBA.  Strictly from the System Admin 
> perspective, would there be a best solution here?  If we were using 
> Veritas Volume Manager, and we consider a zpool to be roughly equivalent 
> to a volume group (and a zfs filesystem ~ a vxfs logical volume), VxVM has 
> limitations where performance degrades if LUNs are too large, too 
> many, and so forth.  Does ZFS have the same constraints?  Does it follow 
> that allowing ZFS to manage all the LUNs under a single pool and file 
> system will perform better, following the idea that the lower the 
> level of control, the better the performance through fewer layers of 
> abstraction/overhead?

ZFS seems to scale well, from a management perspective.  VxVM has a bit
of a reputation due to implementation and patches over the years which
impacted the scalability -- I would expect most of these to be solved
in modern releases.

> My next question would be to consider those scenarios with the use of 
> ZFS mirror or raid functionality.  Does this add unnecessary overhead at 
> the cost of performance when the SAN may be configured in a RAID 5 or 
> RAID 10 arrangement?

ZFS can recover from many more faults than your RAID array (including faults
of the RAID array itself).  But it may not be able to recover if it is not configured for
redundancy.

I think of this decision as one of, "where would you like to be able to
recover from faults?"  The correct answer being, "as close to the application
as possible."
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Richard Elling
Albert Chin wrote:
> On Tue, Jul 03, 2007 at 09:01:50AM -0700, Richard Elling wrote:
>> Albert Chin wrote:
>>> Some of the Sun storage arrays contain NVRAM. It would be really nice
>>> if the array NVRAM would be available for ZIL storage. It would also
>>> be nice for extra hardware (PCI-X, PCIe card) that added NVRAM storage
>>> to various sun low/mid-range servers that are currently acting as
>>> ZFS/NFS servers. Or maybe someone knows of cheap SSD storage that
>>> could be used for the ZIL? I think several HD's are available with
>>> SCSI/ATA interfaces.
>> First, you need a workload where the ZIL has an impact.
> 
> ZFS/NFS + zil_disable is faster than ZFS/NFS without zil_disable. So,
> I presume, ZFS/NFS + an NVRAM-backed ZIL would be noticeably faster
> than ZFS/NFS + ZIL.

... for NFS workloads which are sync-sensitive.
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Adam Leventhal
Flash SSDs typically boast a huge number of _read_ IOPS (thousands), but
very few write IOPS (tens). The write throughput numbers quoted are almost
certainly for non-synchronous writes whose latency can easily be in the
milisecond range. STEC makes an interesting device which offers fast
_synchronous_ writes on an SSD, but at a pretty steep cost.

Adam

-- 
Adam Leventhal, Solaris Kernel Development   http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Albert Chin
On Tue, Jul 03, 2007 at 10:31:28AM -0700, Richard Elling wrote:
> Albert Chin wrote:
> > On Tue, Jul 03, 2007 at 09:01:50AM -0700, Richard Elling wrote:
> >> Albert Chin wrote:
> >>> Some of the Sun storage arrays contain NVRAM. It would be really nice
> >>> if the array NVRAM would be available for ZIL storage. It would also
> >>> be nice for extra hardware (PCI-X, PCIe card) that added NVRAM storage
> >>> to various sun low/mid-range servers that are currently acting as
> >>> ZFS/NFS servers. Or maybe someone knows of cheap SSD storage that
> >>> could be used for the ZIL? I think several HD's are available with
> >>> SCSI/ATA interfaces.
> >> First, you need a workload where the ZIL has an impact.
> > 
> > ZFS/NFS + zil_disable is faster than ZFS/NFS without zil_disable. So,
> > I presume, ZFS/NFS + an NVRAM-backed ZIL would be noticeably faster
> > than ZFS/NFS + ZIL.
> 
> ... for NFS workloads which are sync-sensitive.

Well, yes. We've made the decision not to set zil_disable because of
the possibility of the ZFS/NFS server crashing and leaving the clients
out of sync with what's on the server. I think this is the
common case for a ZFS/NFS server, though.

-- 
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Bryan Cantrill

On Tue, Jul 03, 2007 at 10:26:20AM -0500, Albert Chin wrote:
> PSARC 2007/171 will be available in b68. Any documentation anywhere on
> how to take advantage of it?
> 
> Some of the Sun storage arrays contain NVRAM. It would be really nice
> if the array NVRAM would be available for ZIL storage. 

It depends on your array, of course, but in most arrays you can control
the amount of write cache (i.e., NVRAM) dedicated to particular LUNs.
So to use the new separate logging most effectively, you should take
your array, and dedicate all of your NVRAM to a single LUN that you then
use as your separate log device.  Your pool should then use a LUN or LUNs
that do not have any NVRAM dedicated to them.
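
In zpool terms that looks something like the following sketch (LUN names are
placeholders; the log LUN is the one with all the write cache behind it):

# zpool create tank <data-lun-1> <data-lun-2> log <nvram-lun>
# zpool add tank log <nvram-lun>      (for an existing pool)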

> It would also
> be nice for extra hardware (PCI-X, PCIe card) that added NVRAM storage
> to various sun low/mid-range servers that are currently acting as
> ZFS/NFS servers. 

You can do it yourself very easily -- check out the umem cards from Micro
Memory, available at http://www.umem.com.  Reasonable prices ($1000/GB),
they have a Solaris driver, and the performance absolutely rips.

> Or maybe someone knows of cheap SSD storage that
> could be used for the ZIL? I think several HD's are available with
> SCSI/ATA interfaces.

As Adam mentioned, this is a bit more involved, as most SSDs are biased
very heavily towards reads and away from writes.  So this will be quite
a bit more expensive than NVRAM, at least at the moment...

- Bryan

--
Bryan Cantrill, Solaris Kernel Development.   http://blogs.sun.com/bmc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Richard Elling
Adam Leventhal wrote:
> Flash SSDs typically boast a huge number of _read_ IOPS (thousands), but
> very few write IOPS (tens). The write throughput numbers quoted are almost
> certainly for non-synchronous writes whose latency can easily be in the
> millisecond range. STEC makes an interesting device which offers fast
> _synchronous_ writes on an SSD, but at a pretty steep cost.

Yes, and the size of the write iop is crucial.  Interestingly, 128 kBytes
is a good size for flash SSD writes...
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Albert Chin
On Tue, Jul 03, 2007 at 11:02:24AM -0700, Bryan Cantrill wrote:
> On Tue, Jul 03, 2007 at 10:26:20AM -0500, Albert Chin wrote:
> > PSARC 2007/171 will be available in b68. Any documentation anywhere on
> > how to take advantage of it?
> > 
> > Some of the Sun storage arrays contain NVRAM. It would be really nice
> > if the array NVRAM would be available for ZIL storage. 
> 
> It depends on your array, of course, but in most arrays you can control
> the amount of write cache (i.e., NVRAM) dedicated to particular LUNs.
> So to use the new separate logging most effectively, you should take
> your array, and dedicate all of your NVRAM to a single LUN that you then
> use as your separate log device.  Your pool should then use a LUN or LUNs
> that do not have any NVRAM dedicated to it.  

Hmm, interesting. We'll try to find out if the 6140's can do this.

-- 
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Bryan Cantrill

On Tue, Jul 03, 2007 at 01:10:25PM -0500, Albert Chin wrote:
> On Tue, Jul 03, 2007 at 11:02:24AM -0700, Bryan Cantrill wrote:
> > On Tue, Jul 03, 2007 at 10:26:20AM -0500, Albert Chin wrote:
> > > PSARC 2007/171 will be available in b68. Any documentation anywhere on
> > > how to take advantage of it?
> > > 
> > > Some of the Sun storage arrays contain NVRAM. It would be really nice
> > > if the array NVRAM would be available for ZIL storage. 
> > 
> > It depends on your array, of course, but in most arrays you can control
> > the amount of write cache (i.e., NVRAM) dedicated to particular LUNs.
> > So to use the new separate logging most effectively, you should take
> > your array, and dedicate all of your NVRAM to a single LUN that you then
> > use as your separate log device.  Your pool should then use a LUN or LUNs
> > that do not have any NVRAM dedicated to it.  
> 
> Hmm, interesting. We'll try to find out if the 6140's can do this.

Yes, they can:  use CAM to set the write cache to be "disabled" on all but
the LUN(s) that you want to use as the separate ZIL.

- Bryan

--
Bryan Cantrill, Solaris Kernel Development.   http://blogs.sun.com/bmc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread Al Hopper
On Tue, 3 Jul 2007, [EMAIL PROTECTED] wrote:

>
>
>> Good point. The speeds for the following don't seem very impressive:
>>  http://www.adtron.com/products/A25fb-SerialATAFlashDisk.html
>>  
>> http://www.sandisk.com/OEM/ProductCatalog(1321)-SanDisk_SSD_SATA_5000_25.aspx
>
>
> The Adtron URL leaves out IOPS altogether.
>
> SanDisk limits itself to read IOPS (claiming a whopping 7000).
>
> But from what I'm told, write IOPS are a tiny fraction of that.

Agreed.  Here is a cut/paste of a buyer review, in reference to a Samsung 
IDE flash-memory-based 32GB drive [1]:

Reviewed By: on 6/17/2007
Rating: 5
Tech Level: high - Ownership: 1 month to 1 year

Pros: Performance tests on Linux, UDMA-4, Raw IO. 50MBps sustained 
read. 26MBps sustained write. 2 MB erase block. 3698 4K random 
reads/sec. 24 4K random writes/sec.
Cons: Wish it was SATA and cheaper.
Other Thoughts: As is typical of all Flash drives, terrible random 
write performance, so don't expect to use this in a high-end 
Read/Write database application. You will be disappointed.

--

I can't vouch for the accuracy of this information - so ... consider 
it as but one data point. The Sandisk products (when they ship) will 
probably offer better performance than this product, but I expect to 
see the same general operational characteristics.

[1] http://www.newegg.com/Product/Product.aspx?Item=N82E16820147015

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Share and Remote mounting ZFS for anonyous ftp

2007-07-03 Thread Dale Erhart

Experts,

Sorry if this is a FAQ but I'm not on this alias.
Please reply directly to me.

I'm working on a project setting up a web portal that
will use 2 hosts for load balancing ftp's. I wanted to
use ZFS to showcase it to our customer.

What I've been trying to set up is anonymous ftp to a host that
is sharing a ZFS file system. Anonymous ftp is configured and
does work on the 2 hosts I'm working with. But when I try to
create a common ftp mount point between the 2 hosts (load balancing),
I get permission errors or fchown errors.

I was wondering if there is a setup/configuration issue, or whether ZFS
won't work with remote mounting and ftp.

Configuration:
SystemA sharing /export/ftp/incoming (zfs)
SystemB mounting SystemA:/export/ftp/incoming

Both hosts have the same permissions on the directories.
I've set up anonymous ftp on both systems with ftpconfig.
Went through the steps of setting up a shared zfs file system:
zfs set sharenfs=on portal/ftp-incoming
zfs set sharenfs=rw=SystemB.domain,root=SystemB.domain portal/ftp-incoming

Mounted the shared file system on SystemB:
mount SystemA:/export/ftp/incoming /export/ftp/incoming

I've setup /etc/ftpd/ftpaccess for upload to /export/ftp/incoming and
to change owner and permissions to a local user:
upload   /export/ftp   /incoming   yes   webadmin   www   0440   nodirs

The problem is that I get errors when I try to upload a file. The errors are
either permission denied or an fchown error. I've changed ownership on
/export/ftp/incoming from root to webadmin to ftp without success.
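
Is there anything obvious missing from the kind of checks below (host and
dataset names as above)?

SystemA# zfs get sharenfs portal/ftp-incoming
SystemA# share | grep ftp-incoming
SystemB# mount -F nfs SystemA:/export/ftp/incoming /export/ftp/incoming
SystemB# touch /export/ftp/incoming/foo && chown webadmin /export/ftp/incoming/foo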

Need suggestions fast, as this project is supposed to go live soon.

Thank you for your help,

Dale


--
Dale Erhart

Sun Microsystems, Inc.
1299 E. Algonquin Rd
Schaumburg, Il 60196 US
Phone x66978/+1 630 626 7011
Mobile 630-857-6292
Email [EMAIL PROTECTED]


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs-code] Space allocation failure

2007-07-03 Thread Manoj Joseph
Manoj Joseph wrote:
> Manoj Joseph wrote:
>> Manoj Joseph wrote:
>>> Hi,
>>>
>>> In brief, what I am trying to do is to use libzpool to access a zpool 
>>> - like ztest does.
>>
>> [snip]
>>
>>> No, AFAIK, the pool is not damaged. But yes, it looks like the device 
>>> can't be written to by the userland zfs.
>>
>> Well, I might have figured out something.
>>
>> Trussing the process shows this:
>>
>> /1: open64("/dev/rdsk/c2t0d0s0", O_RDWR|O_LARGEFILE) = 3
>> /108:   pwrite64(3, " X0101\0140104\n $\0\r  ".., 638, 4198400) Err#22 
>> EINVAL
>> /108:   pwrite64(3, "FC BFC BFC BFC BFC BFC B".., 386, 4199038) Err#22 
>> EINVAL
>> [more failures...]
>>
>> The writes are not aligned to a block boundary. And, apparently, 
>> unlike files, this does not work for devices.
>>
>> Question: were ztest and libzpool not meant to be run on real devices? 
>> Or could there be an issue in how I set things up?
> 
> The failing write has this call stack:
> 
>   pwrite64:return
>   libc.so.1`_pwrite64+0x15
>   libzpool.so.1`vn_rdwr+0x5b
>   libzpool.so.1`vdev_file_io_start+0x17e
>   libzpool.so.1`vdev_io_start+0x18
>   libzpool.so.1`zio_vdev_io_start+0x33d
>   [snip]
> 
> usr/src/uts/common/fs/zfs/vdev_file.c has this:
> 
> /*
>  * From userland we access disks just like files.
>  */
> #ifndef _KERNEL
> 
> vdev_ops_t vdev_disk_ops = {
> vdev_file_open,
> vdev_file_close,
> vdev_default_asize,
> vdev_file_io_start,
> vdev_file_io_done,
> NULL,
> VDEV_TYPE_DISK,/* name of this vdev type */
> B_TRUE/* leaf vdev */
> };
> 
> Guess vdev_file_io_start() does not work very well for devices.

Unlike what I had assumed earlier, the zio_t that is passed to 
vdev_file_io_start() has an aligned offset and size.

The libzpool library, when writing data to the devices below a zpool, 
splits the write into two. This is done for the sake of testing. The 
comment in the routine vn_rdwr() says this:
/*
  * To simulate partial disk writes, we split writes into two
  * system calls so that the process can be killed in between.
  */

This has the effect of creating misaligned writes to raw devices which 
fail with errno=EINVAL.
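
(A quick way to see the same restriction from the shell, assuming the usual
512-byte alignment requirement on raw disk devices, is an odd-sized read
straight off the raw device, which should fail in the same way:)

# dd if=/dev/rdsk/c2t0d0s0 of=/dev/null bs=638 count=1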

Patching that solves the problem for me. :)

End of this thread! ;)

Cheers
Manoj
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log

2007-07-03 Thread David Magda
On Jul 3, 2007, at 11:26, Albert Chin wrote:

> PSARC 2007/171 will be available in b68. Any documentation anywhere on
> how to take advantage of it?

For those not in the know, PSARC 2007/171 is a separate intent log  
for ZFS:

http://cz.opensolaris.org/os/community/arc/caselog/2007/171/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How does ZFS snapshot COW file data?

2007-07-03 Thread Eric Hamilton
Apologies in advance for the newbie internals question, but could someone 
please give me a pointer to how ZFS snapshots cause future modifications to 
files to be written to different disk blocks?  I'm looking at OpenSolaris NV 
bld 66.

How do snapshots interact with open files or files with pages in the 
OpenSolaris page cache?  And what are the effects of O_DSYNC on snapshot 
consistency of open files?  My general understanding is that ZFS always writes 
to new locations (which makes snapshot simple), but does that apply to data 
pages too?  Does that mean that paging out dirty mmap pages goes to new places 
and requires metadata updates as well?

I've found the ZFS tour and documentation and opengrok helpful.  I've got 
cscope built locally, and I'll dig it out eventually, but I thought perhaps a 
kind soul could give me a pointer and maybe others will learn something 
interesting too.

Eric
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How does ZFS snapshot COW file data?

2007-07-03 Thread Darren Dunham
> Apologies in advance for the newbie internals question, but could
> someone please give me a pointer to how ZFS snapshots cause future
> modifications to files to be written to different disk blocks?  I'm
> looking at OpenSolaris NV bld 66.

"snapshots" don't really cause that.  ZFS never overwrites data, so all
writes are to "different" disk blocks.  Data in a file is never
overwritten directly.  

Once this is true, then the creation of snapshots is easier, but the two
are separate.

> How do snapshots interact with open files or files with pages in the
> OpenSolaris page cache?

I don't believe they do.  Are you thinking of something in particular?

> My general understanding is that ZFS always writes to new locations
> (which makes snapshot simple), but does that apply to data pages too?

All data and metadata (except for the uberblock dance).

> Does that mean that paging out dirty mmap pages go to new places and
> require metadata updates as well?

Yes.
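
A quick way to watch the copy-on-write behaviour from the command line, if
that helps (pool/fs names are just examples): snapshot a filesystem, overwrite
a file in place, and the snapshot's "used" space grows because the old blocks
are kept rather than overwritten.

# zfs create mypool/demo
# dd if=/dev/urandom of=/mypool/demo/f bs=128k count=100
# zfs snapshot mypool/demo@before
# dd if=/dev/urandom of=/mypool/demo/f bs=128k count=100 conv=notrunc
# zfs list -o name,used,referenced -r mypool/demo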
-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
 < This line left intentionally blank to confuse you. >
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss