Re: [OpenIndiana-discuss] building a new box soon- HDD concerns and recommendations for virtual serving

2013-04-21 Thread Jay Heyl
On Sat, Apr 20, 2013 at 8:48 PM, Carl Brewer  wrote:

>
> Like this :
>
> root@hostie:~# zdb | egrep 'ashift| name'
> name: 'rpool'
> ashift: 12
>
>
>
> And as I understand it, the 12 means 4k blocks, good, right? :)


Yep, those are the good kind: ashift is the base-2 log of the sector size,
so 12 means 4K blocks. You should be set for the foreseeable future.


Re: [OpenIndiana-discuss] building a new box soon- HDD concerns and recommendations for virtual serving

2013-04-17 Thread Jay Heyl
On Tue, Apr 16, 2013 at 7:53 PM, Carl Brewer  wrote:

>
> 2 x 2TB HDDs for rpool (ZFS mirror)
> 4 x 2TB HDD's to get at least a 4TB mirror (or is RAID-Z a better option?)
>
> Would I be better off with some 500GB HDD's for the rpool?  And while I
> fiddle with this thing, is there any way to get the live CD installer to
> work with these drives without poking around with some other OS to
> partition the drive?


In my admittedly limited experience, dedicating two 2TB drives to the rpool
of a home server is a waste. I have mirrored 500GB drives on mine and they
are a vast wasteland of unused capacity. In my opinion you could go even
smaller and put the money to better use.

The raid-z vs mirrors question can quickly get rather complicated. I'd say
if you think 4TB is going to hold you for a good long while, then go with
mirrors. If the six drives you're already talking about are going to
stretch the capacity of your case and you think you may need to expand
beyond 4TB in the reasonably near future, then you might want to consider
raid-z1 to get 6TB usable space from your data pool.
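
For concreteness, the two layouts would be created something like this
(pool and device names are made up, so substitute your own):

  # four 2TB drives as two 2-way mirrors -> roughly 4TB usable
  zpool create tank mirror c2t0d0 c2t1d0 mirror c2t2d0 c2t3d0

  # the same four drives as raidz1 -> roughly 6TB usable
  zpool create tank raidz1 c2t0d0 c2t1d0 c2t2d0 c2t3d0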

One thing I would recommend is trying to use the ashift=12 setting to force
the use of 4k blocks. I ran into problems because my initial pools were
created with 512-byte blocks. When I bought some spare drives I couldn't
use them because they were advanced format with 4k blocks and zfs won't mix
block sizes on the same vdev. Had I used 4k blocks when I initially set
everything up I wouldn't have had this problem with the new drives.
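
How you force ashift=12 depends on the platform, so treat the following as
a sketch rather than gospel: ZFS on Linux and later OpenZFS releases accept
an ashift property at pool creation, while on OpenIndiana you may instead
need an sd.conf physical-block-size override (or drives that honestly
report 4k sectors) before creating the pool.

  # where the property is supported (pool/device names are made up):
  zpool create -o ashift=12 tank mirror c2t0d0 c2t1d0

  # either way, verify what you actually got:
  zdb | egrep 'ashift| name'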


Re: [OpenIndiana-discuss] Recommendations for fast storage

2013-04-17 Thread Jay Heyl
On Wed, Apr 17, 2013 at 5:38 AM, Edward Ned Harvey (openindiana) <
openindi...@nedharvey.com> wrote:

> > From: Sašo Kiselkov [mailto:skiselkov...@gmail.com]
> >
> > Raid-Z indeed does stripe data across all
> > leaf vdevs (minus parity) and does so by splitting the logical block up
> > into equally sized portions.
>
> Jay, there you have it.  You asked why use mirrors, and you said you would
> use raidz2 or raidz3 unless cpu overhead is too much.  I recommended using
> mirrors and avoiding raidzN, and here is the answer why.
>
> If you have 16 disks arranged in 8x mirrors, versus 10 disks in raidz2
> which stripes across 8 disks plus 2 parity disks, then the serial write of
> each configuration is about the same; that is, 8x the sustained write speed
> of a single device.  But if you have two or more parallel sequential read
> threads, then the sequential read speed of the mirrors will be 16x while
> the raidz2 is only 8x.  The mirror configuration can do 8x random write
> while the raidz2 is only 1x.  And the mirror can do 16x random read while
> the raidz2 is only 1x.
>

It (finally) occurs to me that not all mirrors are created equal. I've been
assuming, and probably ignoring hints to the contrary, that what was being
compared here was a raid-z2 configuration with a 2-way mirror composed of
two 8-disk vdevs. I now realize you're talking about 8 separate 2-disk
mirrors organized into a pool. "mirror x1 y1 mirror x2 y2 mirror x3 y3..."
I also realize that almost every discussion I've seen online concerning
mirrors proposes organizing the drives in the way I was thinking about it
(which is probably why I was thinking that way). I suppose this is
something different that zfs brings to the table when compared to more
conventional hardware raid.
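
In zpool terms, the layout being described would be built roughly like this
(device names invented); ZFS then stripes across the eight mirror vdevs
automatically:

  zpool create tank \
    mirror c3t0d0 c3t1d0  mirror c3t2d0 c3t3d0 \
    mirror c3t4d0 c3t5d0  mirror c3t6d0 c3t7d0 \
    mirror c4t0d0 c4t1d0  mirror c4t2d0 c4t3d0 \
    mirror c4t4d0 c4t5d0  mirror c4t6d0 c4t7d0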


>
> In the case you care about the least, they're equal.  In the case you care
> about most, the mirror configuration is 16x faster.
>
> You also said the raidz2 will offer more protection against failure,
> because you can survive any two disk failures (but no more.)  I would argue
> this is incorrect (I've done the probability analysis before).  Mostly
> because the resilver time in the mirror configuration is 8x to 16x faster
> (there's 1/8 as much data to resilver, and IOPS is limited by a single
> disk, not the "worst" of several disks, which introduces another factor up
> to 2x, increasing the 8x as high as 16x), so the smaller resilver window
> means lower probability of "concurrent" failures on the critical vdev.
>  We're talking about 12 hours versus 1 week, actual result of my machines
> in production.  Also, while it's possible to fault the pool with only 2
> failures in the mirror configuration, the probability is against that
> happening.  The first disk failure probability is 1/16 for each disk ...
> And then if you have a 2nd concurrent failure, there's a 14/15 probability
> that it occurs on a separately independent (safe) mirror.  The 3rd
> concurrent failure 12/14 chance of being safe.  The 4th concurrent failure
> 10/13 chance of being safe.  Etc.  The mirror configuration can probably
> withstand a higher number of failures, and also the resilver window for
> each failure is smaller.  When you look at the total probability of pool
> failure, they were both like 10^-17 or something like that.  In other
> words, we're splitting hairs but as long as we are, we might as well point
> out that they're both about the same.
>

This also starts to make a lot more sense. Confused the hell out of me the
first three times I read it. I'm going to have to ponder this a bit more as
my thinking has been heavily influenced by the more conventional mirror
arrangement.
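
If I'm following the arithmetic so far: with 16 disks in eight 2-way
mirrors and one disk already dead, there are 15 survivors and only one of
them (the dead disk's partner) is critical, so

  P(2nd concurrent failure hits the critical partner)  = 1/15  (~6.7%)
  P(2nd concurrent failure lands on a safe, independent mirror) = 14/15  (~93.3%)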


Re: [OpenIndiana-discuss] Recommendations for fast storage

2013-04-17 Thread Jay Heyl
On Wed, Apr 17, 2013 at 11:21 AM, Jim Klimov  wrote:

> On 2013-04-17 20:09, Jay Heyl wrote:
>
>>> reply. Unless the first device to answer returns garbage (something
>>> that doesn't match the expected checksum), other copies are not read
>>> as part of this request.
>>>
>>>
>> Ah, that makes much more sense. Thanks for the clarification. Now that you
>> put it that way I have to wonder how I ever came under the impression it
>> was any other way.
>>
>
>
> Well, there are different architectures, so some might do what you
> suggested. From what I read just yesterday, RAM mirroring on some
> high-end servers works indeed like you described - by reading both
> parts and comparing the results, testing ECC if needed, etc. to
> figure out the correct memory contents or return an error if both
> parts are faulty and can't be trusted (ECC mismatch on both).
>
> Military, nuclear and space systems are often built as 3 or 5
> computers (an odd number for easier quorum) doing the same calculations
> over same inputs, and comparing the results to be sure of them or to
> redo the task.
>
> So I guess it depends on your background - why you thought this of
> disk systems ;)


Actually, I'm pretty sure I read it somewhere on the internet. My fault for
thinking every guy with a blog actually knows what he's talking about. :-)


Re: [OpenIndiana-discuss] Recommendations for fast storage

2013-04-17 Thread Jay Heyl
On Tue, Apr 16, 2013 at 5:49 PM, Jim Klimov  wrote:

> On 2013-04-17 02:10, Jay Heyl wrote:
>
>> Not to get into bickering about semantics, but I asked, "Or am I wrong
>> about reads being issued in parallel to all the mirrors in the array?", to
>> which you replied, "Yes, in normal case... this assumption is wrong... but
>> reads should be in parallel." (Ellipses intended for clarity, not argument
>> munging.) If reads are in parallel, then it seems as though my assumption
>> is correct. I realize the system will discard data from all but the first
>> reads and that using only the first response can improve performance, but
>> in terms of number of IOPs, which is where I intended to go with this, it
>> seems to me the mirrored system will have at least as many if not more
>> than
>> the raid-zn system.
>>
>> Or have I completely misunderstood what you intended to say?
>>
>
> Um, right... I got torn between several letters and forgot the details
> of one. So, here's what I replied to with poor wording - *I thought you
> meant* "A single read request from a program would be redirected as a
> series of parallel requests to mirror components asking for the same
> data, whichever one answers first" - this is no, the "wrong" in my
> reply. Unless the first device to answer returns garbage (something
> that doesn't match the expected checksum), other copies are not read
> as part of this request.
>

Ah, that makes much more sense. Thanks for the clarification. Now that you
put it that way I have to wonder how I ever came under the impression it
was any other way.


Re: [OpenIndiana-discuss] Recommendations for fast storage

2013-04-16 Thread Jay Heyl
On Tue, Apr 16, 2013 at 4:01 PM, Jim Klimov  wrote:

> On 2013-04-16 23:56, Jay Heyl wrote:
>
>> result in more devices being hit for both read and write. Or am I wrong
>> about reads being issued in parallel to all the mirrors in the array?
>>
>
> Yes, in normal case (not scrubbing which makes a point of reading
> everything) this assumption is wrong. Writes do hit all devices
> (mirror halves or raid disks), but reads should be in parallel.
> For mechanical HDDs this allows average read speeds to roughly double
> (or triple for 3-way mirrors, etc.) because different spindles
> begin using their heads in shorter strokes around different areas,
> if there are enough concurrent randomly placed reads.


Not to get into bickering about semantics, but I asked, "Or am I wrong
about reads being issued in parallel to all the mirrors in the array?", to
which you replied, "Yes, in normal case... this assumption is wrong... but
reads should be in parallel." (Ellipses intended for clarity, not argument
munging.) If reads are in parallel, then it seems as though my assumption
is correct. I realize the system will discard data from all but the first
reads and that using only the first response can improve performance, but
in terms of number of IOPs, which is where I intended to go with this, it
seems to me the mirrored system will have at least as many if not more than
the raid-zn system.

Or have I completely misunderstood what you intended to say?


Re: [OpenIndiana-discuss] Recommendations for fast storage

2013-04-16 Thread Jay Heyl
On Tue, Apr 16, 2013 at 2:25 PM, Timothy Coalson  wrote:

> On Tue, Apr 16, 2013 at 3:48 PM, Jay Heyl  wrote:
>
> > My question about the rationale behind the suggestion of mirrored SSD
> > arrays was really meant to be more in relation to the question from the
> OP.
> > I don't see how mirrored arrays of SSDs would be effective in his
> > situation.
> >
>
> There is another detail here to keep in mind: ZFS checks checksums on every
> read from storage, and with raid-zn used with block sizes that give it more
> capacity than mirroring (that is, data blocks are large enough that they
> get split across multiple data sectors and therefore devices, instead of
> degenerate single data sector plus parity sector(s) - OP mentioned 32K
> blocks, so they should get split), this means each random filesystem read
> that isn't cached hits a large number of devices in a raid-zn vdev, but
> only one device in a mirror vdev (unless ZFS splits these reads across
> mirrors, but even then it is still fewer devices hit).  If you are limited
> by IOPS of the devices, then this could make raid-zn slower.
>

I'm getting a sense of comparing apples to oranges here, but I do see your
point about the raid-zn always requiring reads from more devices due to the
parity. OTOH, it was my impression that read operations on n-way mirrors
are always issued to each of the 'n' mirrors. Just for the sake of
argument, let's say we need room for 1TB of storage. For raid-z2 we use
4x500GB devices. For the mirrored setup we have two mirrors each with
2x500GB devices. Reads to the raid-z2 system will hit four devices. If my
assumption is correct, reads to the mirrored system will also hit four
devices. If we go to a 3-way mirror, reads would hit six devices.

In all but degenerate cases, mirrored arrangements are going to include
more drives for the same amount of usable storage, so it seems they should
result in more devices being hit for both read and write. Or am I wrong
about reads being issued in parallel to all the mirrors in the array?


Re: [OpenIndiana-discuss] Recommendations for fast storage

2013-04-16 Thread Jay Heyl
On Tue, Apr 16, 2013 at 11:54 AM, Jim Klimov  wrote:

> On 2013-04-16 20:30, Jay Heyl wrote:
>
>> What would be the logic behind mirrored SSD arrays? With spinning platters
>> the mirrors improve performance by allowing the fastest of the mirrors to
>> respond to a particular command to be the one that defines throughput.
>> With
>>
>
> Well, to think up a rationale: it is quite possible to saturate a bus
> or an HBA with SSDs, leading to increased latency in case of intense
> IO just because some tasks (data packets) are waiting in queue waiting
> for the bottleneck to dissolve. If another side of the mirror has a
> different connection (another HBA, another PCI bus) then IOs can go
> there - increasing overall performance.
>

This strikes me as a strong argument for carefully planning the arrangement
of storage devices of any sort in relation to HBAs and buses. It seems
significantly less strong as an argument for a mirror _maybe_ having a
different connection and responding faster.

My question about the rationale behind the suggestion of mirrored SSD
arrays was really meant to be more in relation to the question from the OP.
I don't see how mirrored arrays of SSDs would be effective in his
situation.

Personally, I'd go with RAID-Z2 or RAID-Z3 unless the computational load on
the CPU is especially high. This would give you as good as or better fault
protection than mirrors at significantly less cost. Indeed, given his
scenario of write early, read often later on, I might even be tempted to go
for the new TLC SSDs from Samsung. For this particular use the much reduced
"lifetime" of the devices would probably not be a factor at all. OTOH,
given the almost-no-limits budget, shaving $100 here or there is probably
not a big consideration. (And just to be clear, I would NOT recommend the
TLC SSDs for a more general solution. It was specifically the write-few,
read-many scenario that made me think of them.)

> Basically, this answer stems from logic which applies to "why would we
> need 6Gbit/s on HDDs?" Indeed, HDDs won't likely saturate their buses
> with even sequential reads. The link speed really applies to the bursts
> of IO between the system and HDD's caches. Double bus speed roughly
> halves the time a HDD needs to keep the bus busy for its portion of IO.
> And when there are hundreds of disks sharing a resource (an expander
> for example), this begins to matter.


It's actually not all that difficult to saturate a 6Gb/s pathway with ZFS
when there are multiple storage devices on the other end of that path. No
single HDD today is going to come close to needing that full 6Gb/s, but put
four or five of them hanging off that same path and that ultra-super
highway starts looking pretty congested. Put SSDs on the other end and the
6Gb/s pathway is going to quickly become your bottleneck.
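
Rough numbers, just to put a ballpark on it (approximations, not
measurements):

  6Gb/s SATA/SAS link        ~600MB/s of payload after 8b/10b encoding
  4-5 HDDs x ~150MB/s seq.   ~600-750MB/s  -> the link is already full
  one decent SATA SSD        ~500MB/s      -> most of the link gone on its own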


Re: [OpenIndiana-discuss] Recommendations for fast storage

2013-04-16 Thread Jay Heyl
On Mon, Apr 15, 2013 at 5:00 AM, Edward Ned Harvey (openindiana) <
openindi...@nedharvey.com> wrote:

>
> So I'm just assuming you're going to build a pool out of SSD's, mirrored,
> perhaps even 3-way mirrors.  No cache/log devices.  All the ram you can fit
> into the system.


What would be the logic behind mirrored SSD arrays? With spinning platters,
mirrors improve read performance because whichever side of the mirror
answers a given request first is the one that defines throughput. With
SSDs, all sides should respond in essentially the same time; there is no
latency from head movement or from waiting for the proper spot on the disk
to rotate under the heads. The read improvement seen with mirrored spinning
platters should therefore not be present with SSDs. Admittedly,
this is from a purely theoretical perspective. I've never assembled an SSD
array to compare mirrored vs RAID-Zx performance. I'm curious if you're
aware of something I'm overlooking.


Re: [OpenIndiana-discuss] OI as file storage server

2012-09-14 Thread Jay Heyl
On Fri, Sep 14, 2012 at 12:07 AM, Neddy, NH. Nam  wrote:

>
> stick with it more. But I have doubts about my real case: my storage
> server will be working as file-level storage more than block-level
> storage. Does that slow down ZFS performance?


I don't claim to be an expert on this. I'm just a guy running a rather
large home file server using ZFS. I have a ten-drive RAIDZ2 array. The
drives are Samsung F4 2TB drives. Nothing fancy. The drives are not
especially fast. In this configuration I can max out my GB Ethernet without
appearing to strain ZFS in any way. Your mileage may vary.
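
For scale (ballpark figures, not measurements): gigabit Ethernet tops out
around 125MB/s on the wire, and even these unremarkable 2TB drives manage
on the order of 100MB/s each sequentially, so the ten-drive raidz2 runs out
of network long before it runs out of disk.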

Admittedly, I'm also not supporting dozens of users so that may affect
things. At most my server will be supporting some BitTorrent operations,
streaming of a movie, and scanning of drives from another computer. It is
easily three times faster than it actually needs to be to serve peak demand.

I should also point out there are tricks you can do with ZFS using SSDs
dedicated to logging and such that apparently speed up write operations
considerably. I've done none of these.
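
For anyone curious, the sort of thing I mean looks roughly like this (pool
and device names are made up; a separate log device mainly helps
synchronous writes, while a cache device helps reads):

  zpool add tank log mirror c5t0d0 c5t1d0   # dedicated ZIL (slog), mirrored for safety
  zpool add tank cache c5t2d0               # L2ARC read cache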

As for setting up a test system, you can use a live boot CD to boot OI on
any computer you want to try. I did that initially and used a few USB flash
drives to set up test scenarios. It wasn't totally realistic, but it gave
me a feel for how things worked. When you're done, pop out the CD, pull out
the flash drives, and you're right back where you were. Couldn't be easier.
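
Another low-commitment way to experiment, if you'd rather not juggle flash
drives, is to build a throwaway pool on plain files (sizes and paths here
are arbitrary):

  mkfile 256m /tmp/d1 /tmp/d2 /tmp/d3 /tmp/d4
  zpool create testpool raidz1 /tmp/d1 /tmp/d2 /tmp/d3 /tmp/d4
  zpool destroy testpool   # clean up when done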

  -- Jay


Re: [OpenIndiana-discuss] Options for a root filesystem

2012-06-26 Thread Jay Heyl
On Mon, Jun 25, 2012 at 5:55 PM, Vishwas Durai wrote:

> Hello all,
> I'm wondering what options are available for root filesystem in OI? By
> default, the installer uses ZFS and creates an rpool. But if I'm a ZFS hacker
> and have made some changes to some core structures, how does one go about
> debugging that? Is dropping to kmdb and debugging the only available
> (painful) option?


Trust me, you do NOT want to be modifying the file system code used on your
live system. Leave your base system running with the release code and use a
virtual machine as your sand box. Actually, use a COPY of a nicely
configured virtual machine. If you mess up, correcting your mistake is a
simple matter of making another copy of the virtual machine. Unless you
enjoy installing/restoring your development machine...

  -- Jay


[OpenIndiana-discuss] Drive compatibility

2012-06-09 Thread Jay Heyl
I had a bit of a hiccup last week with my zfs pool. It's a ten-drive raidz2
vdev. All the drives are Samsung F4s, though of two slightly different
models. Two of the drives showed up one morning as "degraded". In somewhat
of a panic I rushed out and bought a couple Seagate drives as replacements.
When I tried to do the actual replace operation zfs told me the new drive
was not compatible with the existing drive array. I don't recall the
wording of the error message.

I have since learned that what I thought was a complete disaster was zfs being
extremely cautious. I ended up clearing the errors and two subsequent
scrubs have turned up no errors.

Just in case, I bought some more Samsung F4 drives. Last night I installed
one of them as a spare drive. No problem. Then, since it was still
attached, I designated the Seagate drive as a spare as well. Again, no
problem. Since I know zfs doesn't like the Seagate drive for use with this
array, this tells me that adding a drive as a spare doesn't check whether
the drive will actually work as a replacement.
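
For the record, the two operations behaved very differently (device names
here are stand-ins for mine):

  zpool add tank spare c6t0d0        # accepted without complaint, even for the Seagate
  zpool replace tank c3t2d0 c6t0d0   # this is where the incompatibility complaint appeared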

Is there any way short of actually doing a zpool replace to determine if a
drive will be truly compatible with an existing vdev?


[OpenIndiana-discuss] ZFS restore from snapshot

2012-06-02 Thread Jay Heyl
I have a file that shows as corrupted in the live file system and two
snapshots. The file gives every indication of being valid in earlier
snapshots. I've tried to restore it from the good snapshot but it doesn't
seem to want to take. After several failed attempts to copy directly from
the snapshot, I copied from the snapshot to my home directory. That copy
appears to be good. (It's an image file and the image comes up fine.) I've
deleted the file from the live directory and it seems to disappear. But
every time I copy the file from my home directory into the target
directory, the target directory copy appears corrupted.
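
For reference, the copies I attempted looked roughly like this (pool,
dataset, snapshot, and file names changed):

  # pull the good copy straight out of the snapshot directory
  cp /tank/pics/.zfs/snapshot/2012-05-20/photo.jpg /tank/pics/photo.jpg

  # copy via my home directory; the second step keeps coming back corrupted
  cp /tank/pics/.zfs/snapshot/2012-05-20/photo.jpg /export/home/jay/photo.jpg
  cp /export/home/jay/photo.jpg /tank/pics/photo.jpg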

With most other file systems I'd suspect this indicates a problem with the
directory rather than with the actual file, but I don't know enough about
zfs to judge whether this makes sense or not.

Any ideas what might be going on with this file and how I can get it
restored from the good snapshot?

The other odd thing, not directly related to getting it restored, is that
neither the file nor the directory it's in has been purposely changed since
the time of the good snapshot. Obviously something changed or there
wouldn't be any difference between the live version and the snapshot, but
I'm bewildered about how it might have happened.


Re: [OpenIndiana-discuss] zfs and usb drives

2012-05-31 Thread Jay Heyl
On Wed, May 30, 2012 at 7:14 PM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

>
> I have been using USB drives in a mirror configuration for quite a few
> years with zfs.  No problems have been encountered due to using zfs. The
> main thing I learned is to always export the pool before unplugging the USB
> cables because the pool won't come back up if the cables are plugged into
> different ports than before.
>

That's excellent information. I'm sure you've saved me some future grief.
Thanks.


>
> I have heard that one should always plug the drives directly into the
> computer rather than into a USB bridge device.


I once had four USB drives connected to Windows via a powered hub. It
worked okay, but there was definitely a performance penalty. Moving files
from one drive to another proceeded at about half the USB speed limit. When
they're each connected directly to a USB port the transfers can go at close
to the USB max.

  -- Jay


[OpenIndiana-discuss] zfs and usb drives

2012-05-29 Thread Jay Heyl
A while ago I put together a server using OI and zfs that I hoped would
provide room to grow well into the future. While I'm not yet running out of
room, the usage chart shows that future approaching considerably faster
than I had hoped. The primary storage pool is composed of ten drives in a
raidz2 setup. Given the current price of hard drives, and since the obvious
next jump would be another similar ten-drive array (plus some issues not
really relevant to the question I'm getting to), I've been considering some
cheaper alternatives for staying ahead of the growth curve.

A fair amount of what I currently have on the server could be classified as
non-critical. I'd rather not lose it, but I won't cry too hard if I do.

One idea is to use the external USB drives I've collected to create a
separate pool for the non-critical data. I have four 2TB USB drives that
have been performing flawlessly connected to Windows systems for quite a
while. Any opinions on using them to set up another raidz pool? Is this a
reasonable idea or a really bad idea?

I'd be inclined to go with raidz1 on this new pool simply to not give up
50% of the storage to "parity" data. (I might be convinced to buy another
2TB drive to get a 60/40 split on a raidz2 pool.) I realize there's also
the option of just going JBOD for the non-critical stuff, but I think I'd
prefer the greater level of assurance provided by raidz.
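
Concretely, what I have in mind is something like this (device names are
placeholders for however the USB drives show up):

  # four 2TB USB drives, raidz1 -> ~6TB usable, 25% to parity
  zpool create usbpool raidz1 c7t0d0 c8t0d0 c9t0d0 c10t0d0

  # with a fifth drive, raidz2 -> ~6TB usable, the 60/40 split mentioned above
  zpool create usbpool raidz2 c7t0d0 c8t0d0 c9t0d0 c10t0d0 c11t0d0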

Informed thoughts on this topic will be greatly appreciated.

  -- Jay