Re: [OpenIndiana-discuss] Firefox 20
I tried as well with the tar.bz2 package, and it is the same crash and core dump. The problem has been fixed and the reason was a compiler error. I just wonder why they keep compiling with SolarisStudio? A.S. -- Apostolos Syropoulos Xanthi, Greece ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] Firefox 20
On Tue, 16 Apr 2013, Apostolos Syropoulos wrote: I tried as well with the tar.bz2 package, and it is the same crash and core dump. The problem has been fixed and the reason was a compiler error. I just wonder why they keep compiling with SolarisStudio? It avoids needing to depend on GCC run-time libraries, for which there is no useful standard on Solaris. It also helps keep the code honest by ensuring that it is compiled with at least three different compilers. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [OpenIndiana-discuss] Recommendations for fast storage (OpenIndiana-discuss Digest, Vol 33, Issue 20)
From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us] It would be difficult to believe that 10Gbit Ethernet offers better bandwidth than 56Gbit Infiniband (the current offering). The switching model is quite similar. The main reason why IB offers better latency is a better HBA hardware interface and a specialized stack.

5X is 5X. Put another way, the reason Infiniband has so much higher throughput and lower latency than Ethernet is that the switching (at the physical layer) is completely different from Ethernet, and messages are passed directly from user level to user level into remote system RAM via RDMA, bypassing the OSI layer model and other kernel overhead. I read a paper from VMware, where they implemented RDMA over Ethernet and doubled the speed of vMotion (but still not as fast as Infiniband, by something like 4x).

Besides the bypassing of OSI layers and kernel latency, IB latency is lower because Ethernet switches use store-and-forward buffering managed by the backplane in the switch: a sender sends a packet to a buffer on the switch, which then pushes it through the backplane, and finally to another buffer on the destination. IB uses cross-bar, or cut-through, switching, in which the sending host channel adapter signals the destination address to the switch, then waits for the channel to be opened. Once the channel is opened, it stays open, and the switch in between is nothing but signal amplification (plus additional virtual lanes for congestion management, and other functions). The sender writes directly to RAM on the destination via RDMA, with no buffering in between, bypassing the OSI layer model. Hence much lower latency.

IB also has native link aggregation into data-striped lanes, hence the 1x, 2x, 4x, 16x designations and the 40Gbit specifications. Something similar is quasi-possible in Ethernet via LACP, but not as good and not the same.
IB guarantees that packets are delivered in the right order, with native congestion control, whereas Ethernet may drop packets and TCP must detect and retransmit. Ethernet includes a lot of support for IP addressing and variable link speeds (some 10Gbit, 10/100, 1G, etc.), and all of this is asynchronous. For these reasons, IB is not a suitable replacement for IP communications done on Ethernet, with a lot of variable peer-to-peer and broadcast traffic. IB is designed for networks where systems want to establish connections to other systems, and those connections remain mostly statically connected. Primarily clustering and storage networks. Not primarily TCP/IP.
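To put rough numbers on the store-and-forward vs cut-through distinction described above, here is a back-of-the-envelope sketch. The frame size, header size, and link speed are illustrative assumptions, not measurements of any particular switch:

```python
# Per-hop latency: store-and-forward vs cut-through switching.
# All figures below are illustrative assumptions.

FRAME_BYTES = 1500   # a full-size Ethernet frame
HEADER_BYTES = 64    # bytes a cut-through switch reads before forwarding
LINK_BPS = 10e9      # 10 Gbit/s link

def serialization_delay(nbytes, link_bps):
    """Time to clock nbytes onto the wire at link_bps."""
    return nbytes * 8 / link_bps

# Store-and-forward: the whole frame must be buffered before the switch
# can begin transmitting it, so the full frame delay is paid per hop.
sf = serialization_delay(FRAME_BYTES, LINK_BPS)

# Cut-through: forwarding starts once the destination address is read.
ct = serialization_delay(HEADER_BYTES, LINK_BPS)

print(f"store-and-forward per hop: {sf * 1e6:.2f} microseconds")
print(f"cut-through per hop:       {ct * 1e9:.1f} nanoseconds")
```

The per-hop difference looks small, but it is paid at every hop and compounds with queueing, which is part of why cut-through fabrics come out ahead on latency.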
Re: [OpenIndiana-discuss] Recommendations for fast storage (OpenIndiana-discuss Digest, Vol 33, Issue 20)
Some of these points are a bit dated. Allow me to make some updates.

I'm sure you are aware that most 10gig switches these days are cut-through and not store-and-forward. That's Arista, HP, Dell Force10, Mellanox, and IBM/Blade. Cisco has a mix of things, but they aren't really in the low-latency space. The 10G and 40G port-to-port forwarding is in nanoseconds. Buffering is mostly reserved for carrier operations these days, and even there it is becoming less common because of the toll it takes on things like IP video and VoIP. Buffers are still good for web farms, and to a certain extent storage servers or WAN links where there is a high degree of contention from disparate traffic.

At a physical level, the signalling of IB compared to Ethernet (10G+) is very similar, which is why Mellanox can make a single chip that does 10Gbit, 40Gbit, and QDR and FDR Infiniband on any port. There are also a fair number of vendors that support RDMA in Ethernet NICs now, like SolarFlare with its OpenOnload technology.

The main reason for the lowest achievable latency is higher speed. Latency is roughly the inverse of bandwidth. But the higher-level protocols that you stack on top contribute much more than the hardware's theoretical minimums or maximums. TCP/IP is a killer in terms of added overhead. That's why there are protocols like iSER, SRP, and friends. RDMA is much faster than the kernel overhead induced by TCP session setups and other host-side user/kernel boundaries and buffering. PCI latency is also higher than the port-to-port latency on a good 10G switch, never mind 40G or FDR Infiniband. There is even a special layer on Infiniband, called Verbs, that you can write custom protocols against to lower latency further.

Infiniband is inherently a layer 1 and 2 protocol, and the subnet manager (software) is responsible for setting up all virtual circuits (routes between hosts on the fabric) and rerouting when a path goes bad.
Also, the link aggregation, as you mention, is rock solid and amazingly good. Auto-rerouting is fabulous and super fast. But you don't get layer 3. TCP over IB works out of the box, but adds large overhead. Still, it does make it possible to have native IB and IP over IB, with gateways to a TCP network, over a single cable. That's pretty cool. Sent from my android device. -Original Message- From: Edward Ned Harvey (openindiana) openindi...@nedharvey.com To: Discussion list for OpenIndiana openindiana-discuss@openindiana.org Sent: Tue, 16 Apr 2013 10:49 AM Subject: Re: [OpenIndiana-discuss] Recommendations for fast storage (OpenIndiana-discuss Digest, Vol 33, Issue 20) [snip]
Re: [OpenIndiana-discuss] Recommendations for fast storage
I am not an expert on this subject, but based on my reading of e-mails on different mailing lists and some relevant Wikipedia pages about SSD drives, the following points are mentioned as SSD disadvantages (even for Enterprise-labeled drives):

SSD units are very vulnerable to power cuts during operation, up to complete failure such that they cannot be used any more, or complete loss of data.

MLC (Multi-Level Cell) SSD units have a short lifetime if they are continuously written (they are more suitable for write-once (in a limited-number-of-writes sense), read-many use). SLC (Single-Level Cell) SSD units have a much longer lifespan, but they are expensive compared to MLC SSD units.

SSD units may fail due to write wear at an unexpected time, making them very unreliable for mission-critical work.

Due to the above points (which may perhaps be wrong), personally I would select rotating-platter SAS disks, and up to now I have not bought any SSDs for these reasons. The above points are a possible set of disadvantages for consideration. Thank you very much. Mehmet Erol Sanliturk
Re: [OpenIndiana-discuss] Recommendations for fast storage
On Mon, Apr 15, 2013 at 5:00 AM, Edward Ned Harvey (openindiana) openindi...@nedharvey.com wrote: So I'm just assuming you're going to build a pool out of SSD's, mirrored, perhaps even 3-way mirrors. No cache/log devices. All the ram you can fit into the system.

What would be the logic behind mirrored SSD arrays? With spinning platters, mirrors improve performance by letting whichever mirror responds fastest to a particular command be the one that defines throughput. With SSDs, they should all respond in basically the same time: there is no latency due to head movement or waiting for the proper spot on the disc to rotate under the heads. The improvement in read performance seen with mirrored spinning platters should not be present with SSDs. Admittedly, this is from a purely theoretical perspective. I've never assembled an SSD array to compare mirrored vs RAID-Zx performance. I'm curious if you're aware of something I'm overlooking.
Re: [OpenIndiana-discuss] Recommendations for fast storage
On 2013-04-16 19:17, Mehmet Erol Sanliturk wrote: I am not an expert of this subject, but with respect to my readings in some e-mails in different mailing lists and from some relevant pages in Wikipedia about SSD drives, the following points are mentioned about SSD disadvantages (even for Enterprise labeled drives):

My awareness of the subject is of a similar nature, but with different results in places... Here goes:

SSD units are very vulnerable to power cuts during work, up to complete failure where they can not be used any more, or complete loss of data.

Yes, maybe, for some vendors. Information is scarce about which ones are better in practical reliability, leading to requests for info like this thread. Still, some vendors make a living by selling expensive gear into critical workloads, and are thought to perform well. One factor, though not always a guarantee, of a proper end of work in case of a power cut is the presence of either batteries/accumulators or capacitors, which power the device long enough for it to save its caches, metadata, etc. Then the mileage varies as to how well each vendor does it.

MLC (Multi-Level Cell) SSD units have a short life time if they are continuously written (they are more suitable for write once (in a limited number of writes sense) - read many). SLC (Single-Level Cell) SSD units have a much longer life span, but they are expensive with respect to MLC SSD units.

I hear SLC are also faster due to a simpler design. The price stems from the requirement to have more cells than MLC to implement the same amount of storage bits. There are also now some new designs like eMLC, which are young and untested but are said to have MLC price and SLC reliability. With the decrease of feature sizes in the manufacturing process, diffusion and Brownian motion of atoms play an increasingly greater role. Indeed, while early SSDs boasted tens and hundreds of thousands of rewrite cycles, now 5-10k is good. But faster.
SSD units may fail due to write wearing at an unexpected time, making them very unreliable for mission critical works.

For this reason there is over-provisioning. The SSD firmware detects unreliable chips and excludes them from use, relocating data onto spare chips. There is also wear-leveling, where the firmware tries to make sure that all chips are utilized more or less equally, so that on average the device lives longer. Basically, an SSD (unlike a normal USB flash key) implements a RAID over tens of chips, with intimate knowledge of and diagnostic mechanisms over the storage pieces. Overall, vendors now often rate their devices in gigabytes of writes over their lifetime, or in full rewrites of the device. Last year we had a similar discussion on-list, regarding the then-new Intel DC S3700 http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/50424 and it struck me that in practical terms they boasted an endurance rating of 10 drive writes/day over 5 years. That is a lot for many use-cases. They are also relatively pricey, at $2.5/GB linearly from 100G to 800G devices (in a local webshop here).

Due to the above points (they may be wrong perhaps) personally I would select revolving plate SAS disks and up to now I did not buy any SSD for these reasons. The above points are a possible disadvantages set for consideration.

They are not wrong in general, and there are any number of examples where bad things do happen. But there are devices which are said to successfully work around the fundamental drawbacks with some other technology, such as firmware and capacitors and so on. It is indeed not yet a subject and market to be careless with, by taking just any device off the shelf and expecting it to perform well and live long. It is also beneficial to do some homework during system configuration and reduce unnecessary writes to the SSDs - by moving logs out of the rpool, disabling atime updates and so on.
There are things an SSD is good for, and some things HDDs are better at (or are commonly thought to be) - i.e. price and longevity past infant mortality - and the choice of components does depend on expected system utilization as well as performance requirements, as well as how much you're ready to cash up for that. All that said, I haven't yet touched an SSD so far, mostly due to financial reasons with both dayjob and home rigs... //Jim
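To make the endurance figure above concrete, here is the total-write arithmetic for a drive rated, like the S3700 mentioned earlier, at 10 drive writes/day over 5 years (the capacities match the 100G-800G range quoted; the arithmetic itself is a simple sketch):

```python
# Lifetime write volume for a drive rated "10 drive writes per day
# over 5 years" (the Intel DC S3700 endurance rating cited above).

def lifetime_writes_tb(capacity_gb, writes_per_day=10, years=5):
    """Total terabytes written over the rated lifetime."""
    return capacity_gb * writes_per_day * 365 * years / 1000.0

for gb in (100, 200, 400, 800):
    print(f"{gb} GB drive: ~{lifetime_writes_tb(gb):,.0f} TB of writes")
```

Even the smallest 100G model comes out to roughly 1.8 petabytes of rated writes, which supports the "a lot for many use-cases" assessment.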
Re: [OpenIndiana-discuss] Recommendations for fast storage
Mehmet Erol Sanliturk wrote: I am not an expert of this subject , but with respect to my readings in some e-mails in different mailing lists and from some relevant pages in Wikipedia about SSD drives , the following points are mentioned about SSD disadvantages ( even for Enterprise labeled drives ) : SSD units are very vulnerable to power cuts during work up to complete failure which they can not be used any more to complete loss of data . That's why some of them include their own momentary power store, or in some systems, the system has a momentary power store to keep them powered for a period after the last write operation. MLC ( Multi-Level Cell ) SSD units have a short life time if they are continuously written ( they are more suitable to write once ( in a limited number of writes sense ) - read many ) . SLC ( Single-Level Cell ) SSD units have much more long life span , but they are expensive with respect to MLC SSD units . SSD units may fail due to write wearing in an unexpected time , making them very unreliable for mission critical works . All the Enterprise grade SSDs I've used can tell you how far through their life they are (in terms of write wearing). Some of the monitoring tools pick this up and warn you when you're down to some threshold, such as 20% left. Secondly, when they wear out, they fail to write (effectively become write protected). So you find out before they confirm committing your data, and you can still read all the data back. This is generally the complete opposite of the failure modes of hard drives, although like any device, the SSD might fail for other reasons. I have not played with consumer grade drives. Due to the above points ( they may be wrong perhaps ) personally I would select revolving plate SAS disks and up to now I did not buy any SSD for these reasons . The above points are a possible disadvantages set for consideration . 
The extra cost of using loads of short-stroked 15k drives to get anywhere near SSD performance is generally prohibitive.
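A rough sketch of why that cost comparison goes so badly for spinning disks. The IOPS and price figures below are assumed ballpark numbers for illustration, not vendor specs:

```python
import math

# Assumed ballpark figures (illustration only, not vendor specs).
HDD_15K_IOPS = 200      # random IOPS from one short-stroked 15k drive
HDD_15K_PRICE = 300     # USD per enterprise 15k SAS drive
SSD_PRICE = 500         # USD per SSD

target_iops = 40_000    # roughly what a single mid-range SSD delivers

hdds_needed = math.ceil(target_iops / HDD_15K_IOPS)
print(f"{hdds_needed} x 15k HDDs: ${hdds_needed * HDD_15K_PRICE:,}")
print(f"1 x SSD:        ${SSD_PRICE:,}")
```

Under these assumptions it takes a couple of hundred 15k spindles (before even counting enclosures, HBAs, and power) to match the random-IOPS of one SSD, which is the sense in which the cost is "generally prohibitive".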
Re: [OpenIndiana-discuss] Recommendations for fast storage
On Tue, Apr 16, 2013 at 11:54 AM, Jim Klimov jimkli...@cos.ru wrote: On 2013-04-16 20:30, Jay Heyl wrote: What would be the logic behind mirrored SSD arrays? With spinning platters the mirrors improve performance by allowing the fastest of the mirrors to respond to a particular command to be the one that defines throughput. With

Well, to think up a rationale: it is quite possible to saturate a bus or an HBA with SSDs, leading to increased latency under intense IO simply because some tasks (data packets) are waiting in a queue for the bottleneck to clear. If the other side of the mirror has a different connection (another HBA, another PCI bus), then IOs can go there - increasing overall performance.

This strikes me as a strong argument for carefully planning the arrangement of storage devices of any sort in relation to HBAs and buses. It seems significantly less strong as an argument for a mirror _maybe_ having a different connection and responding faster. My question about the rationale behind the suggestion of mirrored SSD arrays was really meant to be more in relation to the question from the OP. I don't see how mirrored arrays of SSDs would be effective in his situation. Personally, I'd go with RAID-Z2 or RAID-Z3 unless the computational load on the CPU is especially high. This would give you as good as or better fault protection than mirrors at significantly less cost. Indeed, given his scenario of write early, read often later on, I might even be tempted to go for the new TLC SSDs from Samsung. For this particular use the much-reduced lifetime of the devices would probably not be a factor at all. OTOH, given the almost-no-limits budget, shaving $100 here or there is probably not a big consideration. (And just to be clear, I would NOT recommend the TLC SSDs for a more general solution. It was specifically the write-few, read-many scenario that made me think of them.)
Basically, this answer stems from the same logic as "why would we need 6Gbit/s on HDDs?" Indeed, HDDs won't likely saturate their buses even with sequential reads. The link speed really applies to the bursts of IO between the system and the HDD's caches. Doubling the bus speed roughly halves the time an HDD needs to keep the bus busy for its portion of IO. And when there are hundreds of disks sharing a resource (an expander, for example), this begins to matter.

It's actually not all that difficult to saturate a 6Gb/s pathway with ZFS when there are multiple storage devices on the other end of that path. No single HDD today is going to come close to needing that full 6Gb/s, but put four or five of them hanging off that same path and that ultra-super highway starts looking pretty congested. Put SSDs on the other end and the 6Gb/s pathway is quickly going to become your bottleneck.
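The saturation arithmetic can be sketched with assumed per-drive throughput numbers. SATA 6G's 8b/10b encoding (10 bits on the wire per data byte) leaves roughly 600 MB/s of payload bandwidth; the per-drive figures below are illustrative assumptions:

```python
import math

LINK_GBPS = 6.0
# SATA 6G uses 8b/10b encoding: 10 bits on the wire per data byte,
# so usable payload bandwidth is about 600 MB/s.
usable_mb_s = LINK_GBPS * 1e9 / 10 / 1e6

HDD_SEQ_MB_S = 150   # assumed sequential rate of one 7k2 HDD
SSD_SEQ_MB_S = 500   # assumed sequential rate of one SATA SSD

def drives_to_saturate(per_drive_mb_s, link_mb_s=usable_mb_s):
    """Number of drives needed to fill the shared link."""
    return math.ceil(link_mb_s / per_drive_mb_s)

print(f"usable link bandwidth: ~{usable_mb_s:.0f} MB/s")
print(f"HDDs to saturate it:   {drives_to_saturate(HDD_SEQ_MB_S)}")
print(f"SSDs to saturate it:   {drives_to_saturate(SSD_SEQ_MB_S)}")
```

Under these assumptions the "four or five HDDs" figure comes out about right, while just two SSDs already exceed the shared link.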
Re: [OpenIndiana-discuss] Recommendations for fast storage
On Tue, 16 Apr 2013, Jay Heyl wrote: It's actually not all that difficult to saturate a 6Gb/s pathway with ZFS when there are multiple storage devices on the other end of that path. No single HDD today is going to come close to needing that full 6Gb/s, but put four or five of them hanging off that same path and that ultra-super highway starts looking pretty congested. Put SSDs on the other end and the 6Gb/s pathway is going to quickly become your bottleneck. SATA and SAS are dedicated point-to-point interfaces, so there is no additive bottleneck with more drives as long as the devices are directly connected. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [OpenIndiana-discuss] Recommendations for fast storage
On 04/16/2013 10:57 PM, Bob Friesenhahn wrote: On Tue, 16 Apr 2013, Jay Heyl wrote: It's actually not all that difficult to saturate a 6Gb/s pathway with ZFS when there are multiple storage devices on the other end of that path. No single HDD today is going to come close to needing that full 6Gb/s, but put four or five of them hanging off that same path and that ultra-super highway starts looking pretty congested. Put SSDs on the other end and the 6Gb/s pathway is going to quickly become your bottleneck. SATA and SAS are dedicated point-to-point interfaces so there is no additive bottleneck with more drives as long as the devices are directly connected.

Not true. Modern flash storage is quite capable of saturating a 6 Gbps SATA link. SAS has an advantage here, being natively dual-port with active-active load balancing deployed as standard practice. Also please note that SATA is half-duplex, whereas SAS is full-duplex.

The problem with SATA vs SAS for flash storage is that there are, as yet, no flash devices of the NL-SAS kind. By this I mean drives that are only about 10-20% more expensive than their SATA counterparts, offering native SAS connectivity but not top-notch enterprise features and/or performance. This situation existed in HDDs not long ago: you had 7k2 SATA and 10k/15k SAS, but no 7k2 SAS. That's why we had to do all that nonsense with SAS-to-SATA interposers (I have an old Sun J4200 with 1TB SATA drives that had an interposer on each of the 12 drives). Since then, NL-SAS has largely made this route obsolete, so now I just buy 7k2 NL-SAS drives and skip the whole interposer thing. Now if any of the big storage vendors got their act together and started offering flash storage with native SAS at slightly above SATA prices, I'd be delighted. Trouble is, the manufacturers seem to be trying to position SAS SSDs as even more expensive, higher-performing products than SAS HDDs.
When I can buy a 512GB SATA SSD for the price of a 600GB SAS drive, that seems a strange proposition indeed... -- Saso
Re: [OpenIndiana-discuss] Recommendations for fast storage
On Tue, Apr 16, 2013 at 3:48 PM, Jay Heyl j...@frelled.us wrote: My question about the rationale behind the suggestion of mirrored SSD arrays was really meant to be more in relation to the question from the OP. I don't see how mirrored arrays of SSDs would be effective in his situation.

There is another detail to keep in mind here: ZFS checks checksums on every read from storage. With raid-zn used at block sizes that give it more capacity than mirroring (that is, data blocks large enough that they get split across multiple data sectors, and therefore devices, instead of the degenerate single data sector plus parity sector(s) - the OP mentioned 32K blocks, so they should get split), each random filesystem read that isn't cached hits a large number of devices in a raid-zn vdev, but only one device in a mirror vdev (unless ZFS splits these reads across mirrors, but even then fewer devices are hit). If you are limited by the IOPS of the devices, this could make raid-zn slower. Disclaimer: this is theory; I haven't tested it in practice, nor have I done any math to see whether it should matter for SSDs. However, since it is a configuration question rather than a hardware question, it may be possible to acquire (some of) the hardware first and test both setups before deciding. Tim
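The devices-per-read argument above can be sketched with a simplified model. It assumes 512-byte sectors and that parity columns are only read when a checksum fails; real ZFS allocation is more involved:

```python
import math

def raidz_devices_hit(block_bytes, data_drives, sector_bytes=512):
    """Data columns touched by one uncached random read of a block.
    Assumes parity devices are only read on checksum failure."""
    sectors = math.ceil(block_bytes / sector_bytes)
    # A block is striped across at most data_drives columns.
    return min(data_drives, sectors)

def mirror_devices_hit():
    """A mirror read can be satisfied by a single side of the mirror."""
    return 1

# The OP's 32K blocks on a raid-z vdev with 8 data drives vs a mirror:
print("raid-z (8 data drives):", raidz_devices_hit(32 * 1024, 8))
print("mirror:                ", mirror_devices_hit())
```

In this model a 32K block spans all eight data columns, so one random read consumes eight devices' worth of IOPS in raid-z but only one in a mirror, which is the trade-off Tim describes.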
Re: [OpenIndiana-discuss] Recommendations for fast storage
On 04/16/2013 11:25 PM, Timothy Coalson wrote: On Tue, Apr 16, 2013 at 3:48 PM, Jay Heyl j...@frelled.us wrote: My question about the rationale behind the suggestion of mirrored SSD arrays was really meant to be more in relation to the question from the OP. I don't see how mirrored arrays of SSDs would be effective in his situation. There is another detail here to keep in mind: ZFS checks checksums on every read from storage, and with raid-zn used with block sizes that give it more capacity than mirroring, each random filesystem read that isn't cached hits a large number of devices in a raid-zn vdev, but only one device in a mirror vdev. If you are limited by IOPS of the devices, then this could make raid-zn slower.

If you are IOPS-constrained, then yes, raid-zn will be slower, simply because any read needs to hit all the data drives in the stripe. This is even worse on writes if the raidz has a bad geometry (number of data drives isn't a power of 2). Cheers, -- Saso
Re: [OpenIndiana-discuss] Recommendations for fast storage
On Tue, Apr 16, 2013 at 4:29 PM, Sašo Kiselkov skiselkov...@gmail.com wrote: If you are IOPS constrained, then yes, raid-zn will be slower, simply because any read needs to hit all data drives in the stripe. This is even worse on writes if the raidz has bad geometry (number of data drives isn't a power of 2). Off topic slightly, but I have always wondered about this - what exactly causes geometries with a non-power-of-2 number of data drives to be slower, and by how much? I tested for this effect with some consumer drives, comparing 8+2 and 10+2, and didn't see much of a penalty (though the only random test I did was read; our workload is highly sequential, so it wasn't important). Tim
Re: [OpenIndiana-discuss] Recommendations for fast storage
On 04/16/2013 11:37 PM, Timothy Coalson wrote: On Tue, Apr 16, 2013 at 4:29 PM, Sašo Kiselkov skiselkov...@gmail.com wrote: If you are IOPS constrained, then yes, raid-zn will be slower, simply because any read needs to hit all data drives in the stripe. This is even worse on writes if the raidz has bad geometry (number of data drives isn't a power of 2). Off topic slightly, but I have always wondered at this - what exactly causes non-power of 2 plus number of parities geometries to be slower, and by how much? I tested for this effect with some consumer drives, comparing 8+2 and 10+2, and didn't see much of a penalty (though the only random test I did was read, our workload is highly sequential so it wasn't important).

Because a non-power-of-2 number of data drives causes a read-modify-write sequence on (almost) every write. HDDs are block devices and can only ever write in increments of their sector size (512 bytes, or nowadays often 4096 bytes). Using your example above: if you divide a 128k block by 8, you get 8x 16k updates - all nicely aligned on 512-byte boundaries, so your drives can write them in one go. If you divide by 10, you get an ugly 12.8k, which means that if your drives are of the 512-byte-sector variety, they write 25 full sectors and then, for the last partial sector write, they first need to fetch the sector from the platter, modify it in memory, and then write it out again. I said almost every write is affected, but this largely depends on your workload. If your writes are large async writes, then this RMW cycle only happens at the end of the transaction commit (simplifying a bit, but you get the idea), which is pretty small. However, if you are doing many small updates in different locations (e.g. writing the ZIL), this can significantly amplify the load. Cheers, -- Saso
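The division described in the message above can be checked directly; this just reproduces the alignment arithmetic for 8 vs 10 data drives:

```python
# Does each drive's share of a 128K block land on whole 512-byte
# sectors? Reproduces the 8-vs-10 data-drive arithmetic above.

BLOCK = 128 * 1024
SECTOR = 512

for data_drives in (8, 10):
    per_drive = BLOCK / data_drives
    aligned = per_drive % SECTOR == 0
    print(f"{data_drives} data drives: {per_drive / 1024:.1f}K per drive "
          f"({'sector-aligned' if aligned else 'NOT sector-aligned'})")
```

With 8 data drives each column gets exactly 32 sectors; with 10 the 12.8K share leaves a partial sector, which is where the read-modify-write penalty described above comes from.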
Re: [OpenIndiana-discuss] Recommendations for fast storage
On Tue, Apr 16, 2013 at 2:25 PM, Timothy Coalson tsc...@mst.edu wrote: On Tue, Apr 16, 2013 at 3:48 PM, Jay Heyl j...@frelled.us wrote: My question about the rationale behind the suggestion of mirrored SSD arrays was really meant to be more in relation to the question from the OP. I don't see how mirrored arrays of SSDs would be effective in his situation. There is another detail here to keep in mind: ZFS checks checksums on every read from storage, and with raid-zn used with block sizes that give it more capacity than mirroring (that is, data blocks are large enough that they get split across multiple data sectors and therefore devices, instead of degenerate single data sector plus parity sector(s) - OP mentioned 32K blocks, so they should get split), this means each random filesystem read that isn't cached hits a large number of devices in a raid-zn vdev, but only one device in a mirror vdev (unless ZFS splits these reads across mirrors, but even then it is still fewer devices hit). If you are limited by IOPS of the devices, then this could make raid-zn slower. I'm getting a sense of comparing apples to oranges here, but I do see your point about the raid-zn always requiring reads from more devices due to the parity. OTOH, it was my impression that read operations on n-way mirrors are always issued to each of the 'n' mirrors. Just for the sake of argument, let's say we need room for 1TB of storage. For raid-z2 we use 4x500GB devices. For the mirrored setup we have two mirrors each with 2x500GB devices. Reads to the raid-z2 system will hit four devices. If my assumption is correct, reads to the mirrored system will also hit four devices. If we go to a 3-way mirror, reads would hit six devices. In all but degenerate cases, mirrored arrangements are going to include more drives for the same amount of usable storage, so it seems they should result in more devices being hit for both read and write. 
Or am I wrong about reads being issued in parallel to all the mirrors in the array? ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
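Jay's drive-count arithmetic can be made explicit. The sketch below just counts devices for the 1TB example (invented helpers; it ignores ZFS metadata overhead):

```python
# Count total drives needed for ~1 TB usable on 500 GB devices
# (hypothetical back-of-the-envelope helpers, not a ZFS model).

def raidz_drives(usable_gb, drive_gb, parity):
    return usable_gb // drive_gb + parity        # data drives plus parity drives

def mirror_drives(usable_gb, drive_gb, ways):
    return (usable_gb // drive_gb) * ways        # each data drive replicated

print(raidz_drives(1000, 500, 2))   # raid-z2: 2 data + 2 parity = 4
print(mirror_drives(1000, 500, 2))  # two 2-way mirrors: 4
print(mirror_drives(1000, 500, 3))  # two 3-way mirrors: 6
```

As the message says, for equal usable space the mirror layout needs at least as many drives as raid-z2, and more once the mirrors go 3-way; the open question in the thread is how many of those drives a single read actually touches.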
Re: [OpenIndiana-discuss] Recommendations for fast storage
On Tue, Apr 16, 2013 at 4:44 PM, Sašo Kiselkov skiselkov...@gmail.com wrote: On 04/16/2013 11:37 PM, Timothy Coalson wrote: On Tue, Apr 16, 2013 at 4:29 PM, Sašo Kiselkov skiselkov...@gmail.com wrote: If you are IOPS constrained, then yes, raid-zn will be slower, simply because any read needs to hit all data drives in the stripe. This is even worse on writes if the raidz has bad geometry (number of data drives isn't a power of 2). Off topic slightly, but I have always wondered at this - what exactly causes non-power of 2 plus number of parities geometries to be slower, and by how much? I tested for this effect with some consumer drives, comparing 8+2 and 10+2, and didn't see much of a penalty (though the only random test I did was read; our workload is highly sequential, so it wasn't important). Because a non-power-of-2 number of drives causes a read-modify-write sequence on (almost) every write. HDDs are block devices and they can only ever write in increments of their sector size (512 bytes, or nowadays often 4096 bytes). Using your example above, if you divide a 128k block by 8, you get 8x 16k updates - all nicely aligned on 512-byte boundaries, so your drives can write that in one go. If you divide by 10, you get an ugly 12.8k, which means that if your drives are of the 512-byte-sector variety, they write 25 full 512-byte sectors and then, for the last partial-sector write, they first need to fetch the sector from the platter, modify it in memory, and then write it out again. I said almost every write is affected, but this largely depends on your workload. If your writes are large async writes, then this RMW cycle only happens at the end of the transaction commit (simplifying a bit, but you get the idea), which is pretty small. However, if you are doing many small updates in different locations (e.g. writing the ZIL), this can significantly amplify the load. 
Okay, I get the carryover of a partial stripe causing problems; that makes sense, and at least has implications for space efficiency, given that ZFS mainly uses power-of-2 block sizes. However, I was not under the impression that ZFS ever uses partial sectors; instead, it uses fewer devices in the final stripe, i.e., it would be split 10+2, 10+2...6+2. If what you say were true, I'm not sure how ZFS both manages to address halfway through a sector (if it must keep that old partial sector, it must be used somewhere, yes?), and yet has problems with changing sector sizes (the infamous ashift). Are you perhaps thinking of block-device-style software raid, where you need to ensure that even non-useful bits have correct parity computed? Tim ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] Recommendations for fast storage
ZFS data blocks are also a power of two, which means that if you have 1,2,4,8,16,32,.. data disks, every write is spread evenly over all disks. If you add one disk, e.g. going from 8 to 9 data disks, some one disk is not used on a given read/write. Does that mean 9 data disks are slower than 8 disks? No, 9 disks are faster; maybe not 1/9 faster, but faster. So think of it more as a myth. Add the raid redundancy disks to that count: for example, with 8 data disks, add one disk for Z1 (9), 2 disks for Z2 (10) and 3 disks for Z3 (11) disks per vdev. On 16.04.2013 at 23:37, Timothy Coalson wrote: On Tue, Apr 16, 2013 at 4:29 PM, Sašo Kiselkov skiselkov...@gmail.com wrote: If you are IOPS constrained, then yes, raid-zn will be slower, simply because any read needs to hit all data drives in the stripe. This is even worse on writes if the raidz has bad geometry (number of data drives isn't a power of 2). Off topic slightly, but I have always wondered at this - what exactly causes non-power of 2 plus number of parities geometries to be slower, and by how much? I tested for this effect with some consumer drives, comparing 8+2 and 10+2, and didn't see much of a penalty (though the only random test I did was read, our workload is highly sequential so it wasn't important). Tim ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] Recommendations for fast storage
clarification below... On Apr 16, 2013, at 2:44 PM, Sašo Kiselkov skiselkov...@gmail.com wrote: On 04/16/2013 11:37 PM, Timothy Coalson wrote: On Tue, Apr 16, 2013 at 4:29 PM, Sašo Kiselkov skiselkov...@gmail.com wrote: If you are IOPS constrained, then yes, raid-zn will be slower, simply because any read needs to hit all data drives in the stripe. This is even worse on writes if the raidz has bad geometry (number of data drives isn't a power of 2). Off topic slightly, but I have always wondered at this - what exactly causes non-power of 2 plus number of parities geometries to be slower, and by how much? I tested for this effect with some consumer drives, comparing 8+2 and 10+2, and didn't see much of a penalty (though the only random test I did was read; our workload is highly sequential, so it wasn't important). This makes sense, even for more random workloads. Because a non-power-of-2 number of drives causes a read-modify-write sequence on (almost) every write. HDDs are block devices and they can only ever write in increments of their sector size (512 bytes, or nowadays often 4096 bytes). Using your example above, if you divide a 128k block by 8, you get 8x 16k updates - all nicely aligned on 512-byte boundaries, so your drives can write that in one go. If you divide by 10, you get an ugly 12.8k, which means that if your drives are of the 512-byte-sector variety, they write 25 full 512-byte sectors and then, for the last partial-sector write, they first need to fetch the sector from the platter, modify it in memory, and then write it out again. This is true for RAID-5/6, but it is not true for ZFS or raidz. Though it has been a few years, I did a bunch of tests and found no correlation between the number of disks in the set (within the boundaries described in the man page) and random performance for raidz. This is not the case for RAID-5/6, where pathologically bad performance is easy to create if you know the number of disks and stripe width. 
-- richard I said almost every write is affected, but this largely depends on your workload. If your writes are large async writes, then this RMW cycle only happens at the end of the transaction commit (simplifying a bit, but you get the idea), which is pretty small. However, if you are doing many small updates in different locations (e.g. writing the ZIL), this can significantly amplify the load. Cheers, -- Saso ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss -- richard.ell...@richardelling.com +1-760-896-4422 ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] Recommendations for fast storage
On Tue, 16 Apr 2013, Sašo Kiselkov wrote: SATA and SAS are dedicated point-to-point interfaces, so there is no additive bottleneck with more drives as long as the devices are directly connected. Not true. Modern flash storage is quite capable of saturating a 6 Gbps SATA link. SAS has an advantage here, being natively dual-port with active-active load balancing deployed as standard practice. Also please note that SATA is half-duplex, whereas SAS is full-duplex. You did not describe how my statement about not being additive is wrong. This is different from per-drive bandwidth being insufficient for the latest SSDs. Please expound on "Not true". SAS/SATA are not like old parallel SCSI. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] Recommendations for fast storage
On 2013-04-16 23:56, Jay Heyl wrote: result in more devices being hit for both read and write. Or am I wrong about reads being issued in parallel to all the mirrors in the array? Yes, in normal case (not scrubbing which makes a point of reading everything) this assumption is wrong. Writes do hit all devices (mirror halves or raid disks), but reads should be in parallel. For mechanical HDDs this allows average read speeds to double (or triple for 3-way mirrors, etc.), because different spindles begin using their heads in shorter strokes around different areas, if there are enough concurrent, randomly placed reads. Due to the ZFS data structure, you know your target block's expected checksum before you read its data (from any mirror half, or from the data disks in a raidzn); then ZFS calculates the checksum of the data it has read and combined, and only if there is a mismatch does it have to read from other disks for redundancy (and fix the detected broken part). HTH, //Jim ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] Recommendations for fast storage
On Tue, Apr 16, 2013 at 6:01 PM, Jim Klimov jimkli...@cos.ru wrote: On 2013-04-16 23:56, Jay Heyl wrote: result in more devices being hit for both read and write. Or am I wrong about reads being issued in parallel to all the mirrors in the array? Yes, in normal case (not scrubbing which makes a point of reading everything) this assumption is wrong. Writes do hit all devices (mirror halves or raid disks), but reads should be in parallel. For mechanical HDDs this allows to double average read speeds (or triple for 3-way mirrors, etc.) because different spindles begin using their heads in shorter strokes around different areas, if there are enough concurrent randomly placed reads. There is another part to his question, specifically whether a single random read that falls within one block of the file hits more than one top level vdev - to put it another way, whether a single block of a file is striped across top level vdevs. I believe every block is allocated from one and only one vdev (blocks with ditto copies allocate multiple blocks, ideally from different vdevs, but this is not the same thing), such that every read that hits only one file block goes to only one top level vdev unless something goes wrong badly enough to need a ditto copy. Tim ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] Recommendations for fast storage
On 2013-04-16 23:37, Timothy Coalson wrote: On Tue, Apr 16, 2013 at 4:29 PM, Sašo Kiselkov skiselkov...@gmail.com wrote: If you are IOPS constrained, then yes, raid-zn will be slower, simply because any read needs to hit all data drives in the stripe. This is even worse on writes if the raidz has bad geometry (number of data drives isn't a power of 2). Off topic slightly, but I have always wondered at this - what exactly causes non-power of 2 plus number of parities geometries to be slower, and by how much? I tested for this effect with some consumer drives, comparing 8+2 and 10+2, and didn't see much of a penalty (though the only random test I did was read, our workload is highly sequential so it wasn't important). My take on this is not that these geometries are slower, but that they may be less efficient in terms of storage overhead. Say you write a 16-sector block of userdata to your array. In the case of 8+2, that would be two full stripes of parity and data. In the case of 9+2, that would be a 9+2 and a 7+2 stripe. Access to this data is less balanced, placing more load on some disks, which hold 2 sectors of this block, and less load on others, which hold only one sector. It is even worse when (i.e. due to compression) you have only 1 or 2 userdata sectors remaining on a second stripe, but must still provide the 2 or 3 sectors of redundancy for this mini-stripe. Also, as I found, ZFS raidzN takes precautions not to leave potentially unusable holes (i.e. 1 or 2 free sectors, where you can't fit parity and data), so it allocates full stripes when you have sufficiently unlucky stripe lengths just a few sectors shorter than full (i.e. the 7+2 above would likely be allocated as 9+2 with zeroed-out extra sectors)... These things do add up to gigabytes, though they can happen on power-of-two sized arrays with compression just as easily (I found this on a 6-disk raidz2, with 4 data disks). 
Jim ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
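Jim's overhead example can be put into numbers. The sketch below is a deliberately simplified model (it ignores the hole-padding he mentions), and the function is invented for illustration, not the real raidz allocator:

```python
import math

# Total sectors consumed when a block's data sectors are spread
# across a raidz vdev, with parity added per stripe row.
# Simplified illustration only, not the actual raidz_map_alloc logic.

def raidz_sectors(data_sectors, data_disks, parity):
    rows = math.ceil(data_sectors / data_disks)   # stripe rows needed
    return data_sectors + rows * parity           # data plus per-row parity

print(raidz_sectors(16, 8, 2))   # 8+2: two full rows, 16 + 4 = 20
print(raidz_sectors(16, 9, 2))   # 9+2: a 9-wide and a 7-wide row, still 20
print(raidz_sectors(2, 9, 2))    # 2 compressed sectors still cost 2 parity: 4
```

The totals match for the 16-sector block, which is why the cost shows up mainly as load imbalance and as parity overhead on short tail stripes - the 2-sector case pays 100% redundancy overhead.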
Re: [OpenIndiana-discuss] Recommendations for fast storage
On 2013-04-17 01:12, Timothy Coalson wrote: On Tue, Apr 16, 2013 at 6:01 PM, Jim Klimov jimkli...@cos.ru wrote: On 2013-04-16 23:56, Jay Heyl wrote: result in more devices being hit for both read and write. Or am I wrong about reads being issued in parallel to all the mirrors in the array? Yes, in normal case (not scrubbing which makes a point of reading everything) this assumption is wrong. Writes do hit all devices (mirror halves or raid disks), but reads should be in parallel. For mechanical HDDs this allows to double average read speeds (or triple for 3-way mirrors, etc.) because different spindles begin using their heads in shorter strokes around different areas, if there are enough concurrent randomly placed reads. There is another part to his question, specifically whether a single random read that falls within one block of the file hits more than one top level vdev - to put it another way, whether a single block of a file is striped across top level vdevs. I believe every block is allocated from one and only one vdev (blocks with ditto copies allocate multiple blocks, ideally from different vdevs, but this is not the same thing), such that every read that hits only one file block goes to only one top level vdev unless something goes wrong badly enough to need a ditto copy. I believe so too... I think striping over top-level vdevs is subject to many tuning and algorithmic influences, and is not as simple as even-odd IOs. Also, IIRC, there is some size of data (several MBytes) that is preferably sent as sequential IO to one TLVDEV before striping moves over to another, in order to better utilize the strengths of fast sequential IO vs. the lags of seeking. But I may be way off-track here with this belief ;) Jim ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] Recommendations for fast storage
On 04/17/2013 12:08 AM, Richard Elling wrote: clarification below... On Apr 16, 2013, at 2:44 PM, Sašo Kiselkov skiselkov...@gmail.com wrote: On 04/16/2013 11:37 PM, Timothy Coalson wrote: On Tue, Apr 16, 2013 at 4:29 PM, Sašo Kiselkov skiselkov...@gmail.com wrote: If you are IOPS constrained, then yes, raid-zn will be slower, simply because any read needs to hit all data drives in the stripe. This is even worse on writes if the raidz has bad geometry (number of data drives isn't a power of 2). Off topic slightly, but I have always wondered at this - what exactly causes non-power of 2 plus number of parities geometries to be slower, and by how much? I tested for this effect with some consumer drives, comparing 8+2 and 10+2, and didn't see much of a penalty (though the only random test I did was read; our workload is highly sequential, so it wasn't important). This makes sense, even for more random workloads. Because a non-power-of-2 number of drives causes a read-modify-write sequence on (almost) every write. HDDs are block devices and they can only ever write in increments of their sector size (512 bytes, or nowadays often 4096 bytes). Using your example above, if you divide a 128k block by 8, you get 8x 16k updates - all nicely aligned on 512-byte boundaries, so your drives can write that in one go. If you divide by 10, you get an ugly 12.8k, which means that if your drives are of the 512-byte-sector variety, they write 25 full 512-byte sectors and then, for the last partial-sector write, they first need to fetch the sector from the platter, modify it in memory, and then write it out again. This is true for RAID-5/6, but it is not true for ZFS or raidz. Though it has been a few years, I did a bunch of tests and found no correlation between the number of disks in the set (within the boundaries described in the man page) and random performance for raidz. 
This is not the case for RAID-5/6 where pathologically bad performance is easy to create if you know the number of disks and stripe width. -- richard You are right, and I think I already know where I went wrong, though I'll need to check raidz_map_alloc to confirm. If memory serves me right, raidz actually splits the I/O up so that each stripe component is simply length-aligned and padded out to complete a full sector (otherwise the zio_vdev_child_io would fail in a block-alignment assertion in zio_create here: zio_create(zio_t *pio, spa_t *spa,... { .. ASSERT(P2PHASE(size, SPA_MINBLOCKSIZE) == 0); .. I was probably misremembering the power-of-2 rule from a discussion about 4k sector drives. There the amount of wasted space can be significant, especially on small-block data, e.g. the default 8k volblocksize not being able to scale beyond 2 data drives + parity. Cheers, -- Saso ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
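Saso's closing point about 4k-sector drives reduces to simple arithmetic: a block can never be spread over more data drives than it has whole sectors. A hypothetical illustration (invented helper, not ZFS code):

```python
# An 8k volblocksize is only two 4k sectors, so at most two data
# drives hold its data regardless of vdev width (illustrative only).

def data_drives_used(block_size, sector_size, data_disks):
    sectors = block_size // sector_size   # whole sectors in the block
    return min(sectors, data_disks)       # cannot span more drives than sectors

print(data_drives_used(8 * 1024, 4096, 8))    # 2 - the other 6 data drives idle
print(data_drives_used(128 * 1024, 4096, 8))  # 8 - full stripe width
```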
Re: [OpenIndiana-discuss] Recommendations for fast storage
From: Sašo Kiselkov [mailto:skiselkov...@gmail.com] If you are IOPS constrained, then yes, raid-zn will be slower, simply because any read needs to hit all data drives in the stripe. Saso, I would expect you to know the answer to this question, probably: I have heard that raidz is more similar to raid-1e than raid-5. Meaning, when you write data to raidz, it doesn't get striped across all devices in the raidz vdev... Rather, two copies of the data get written to any of the available devices in the raidz. Can you confirm? If the behavior is to stripe across all the devices in the raidz, then the raidz iops really can't exceed that of a single device, because you have to wait for every device to respond before you have a complete block of data. But if it's more like raid-1e and individual devices can read independently of each other, then at least theoretically, the raidz with n-devices in it could return iops performance on-par with n-times a single disk. ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] Recommendations for fast storage
On Tue, Apr 16, 2013 at 4:01 PM, Jim Klimov jimkli...@cos.ru wrote: On 2013-04-16 23:56, Jay Heyl wrote: result in more devices being hit for both read and write. Or am I wrong about reads being issued in parallel to all the mirrors in the array? Yes, in normal case (not scrubbing which makes a point of reading everything) this assumption is wrong. Writes do hit all devices (mirror halves or raid disks), but reads should be in parallel. For mechanical HDDs this allows to double average read speeds (or triple for 3-way mirrors, etc.) because different spindles begin using their heads in shorter strokes around different areas, if there are enough concurrent randomly placed reads. Not to get into bickering about semantics, but I asked, Or am I wrong about reads being issued in parallel to all the mirrors in the array?, to which you replied, Yes, in normal case... this assumption is wrong... but reads should be in parallel. (Ellipses intended for clarity, not argument munging.) If reads are in parallel, then it seems as though my assumption is correct. I realize the system will discard data from all but the first reads and that using only the first response can improve performance, but in terms of number of IOPs, which is where I intended to go with this, it seems to me the mirrored system will have at least as many if not more than the raid-zn system. Or have I completely misunderstood what you intended to say? ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] Recommendations for fast storage
From: Jay Heyl [mailto:j...@frelled.us] So I'm just assuming you're going to build a pool out of SSD's, mirrored, perhaps even 3-way mirrors. No cache/log devices. All the ram you can fit into the system. What would be the logic behind mirrored SSD arrays? With spinning platters the mirrors improve performance by allowing the fastest of the mirrors to respond to a particular command to be the one that defines throughput. When you read from a mirror, ZFS doesn't read the same data from both sides of the mirror simultaneously and let them race, wasting bus and memory bandwidth to attempt gaining smaller latency. If you have a single thread doing serial reads, I also have no cause to believe that ZFS reads stripes from multiple sides of the mirror to accelerate; rather, it relies on the striping across multiple mirrors or vdevs. But if you have multiple threads requesting independent random read operations on the same mirror, I have measured the results: you get very nearly n times a single disk's random read performance by using an n-way mirror and at least n or 2n independent random read threads. There is no latency due to head movement or waiting for the proper spot on the disc to rotate under the heads. Nothing, including ZFS, has such in-depth knowledge of the inner drive geometry as to know how long is necessary for the rotational latency to come around. Also, rotational latency is almost nothing compared to head seek. For this reason, short-stroking makes a big difference when you have a data usage pattern that can easily be confined to a small number of adjacent tracks. I believe that if you use a HDD for a log device, it ends up effectively short-stroked by itself, but I don't actually know. Also, this is really a completely separate subject. ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] Recommendations for fast storage
For the context of ZPL, easy answer below :-) ... On Apr 16, 2013, at 4:12 PM, Timothy Coalson tsc...@mst.edu wrote: On Tue, Apr 16, 2013 at 6:01 PM, Jim Klimov jimkli...@cos.ru wrote: On 2013-04-16 23:56, Jay Heyl wrote: result in more devices being hit for both read and write. Or am I wrong about reads being issued in parallel to all the mirrors in the array? Yes, in normal case (not scrubbing which makes a point of reading everything) this assumption is wrong. Writes do hit all devices (mirror halves or raid disks), but reads should be in parallel. For mechanical HDDs this allows to double average read speeds (or triple for 3-way mirrors, etc.) because different spindles begin using their heads in shorter strokes around different areas, if there are enough concurrent randomly placed reads. There is another part to his question, specifically whether a single random read that falls within one block of the file hits more than one top level vdev - No. to put it another way, whether a single block of a file is striped across top level vdevs. I believe every block is allocated from one and only one vdev (blocks with ditto copies allocate multiple blocks, ideally from different vdevs, but this is not the same thing), such that every read that hits only one file block goes to only one top level vdev unless something goes wrong badly enough to need a ditto copy. Correct. -- richard -- richard.ell...@richardelling.com +1-760-896-4422 ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] Recommendations for fast storage
From: Mehmet Erol Sanliturk [mailto:m.e.sanlit...@gmail.com] SSD units are very vulnerable to power cuts during operation, up to complete failure in which they cannot be used any more, or complete loss of data. If there are any junky drives out there that fail so dramatically, those are junky and the exception. Just imagine how foolish the engineers would have to be: "Power loss? I didn't think of that... Complete drive failure on power loss is acceptable behavior." Definitely an inaccurate generalization about SSD's. There is nothing inherent about flash memory, as compared to magnetic material, that would cause such a thing. I repeat: I'm not saying there's no such thing as an SSD that has such a problem. I'm saying if there is, it's junk. And you can safely assume any good drive doesn't have that problem. MLC (Multi-Level Cell) SSD units have a short life time if they are continuously written (they are more suitable to write-once (in a limited number of writes sense) - read-many). It's a fact that NAND has a finite number of write cycles, and it gets slower to write the more times it's been re-written. It is also a fact that when SSD's were first introduced to the commodity market about 11 years ago, they failed quickly due to OSes (windows) continually writing the same sectors over and over. But manufacturers have long since become aware of this problem, and solved it by overprovisioning and wear-leveling. Similar to ZFS copy-on-write, which has the ability to logically address some blocks and secretly re-map them to different sectors behind the scenes... SSD's with wear-leveling secretly remap sectors during writes. SSD units may fail due to write wearing at an unexpected time, making them very unreliable for mission-critical work. Every page has a write counter, which is used to predict failure. A very predictable, and very much *not* unexpected, time. 
___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
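The overprovisioning and wear-leveling mechanism described above can be sketched as a toy flash translation layer. Everything here (names, structure) is invented for illustration; real SSD firmware is far more involved:

```python
# Toy flash translation layer: logical writes are remapped to the
# least-worn free physical page, and per-page write counters make
# wear-out predictable. Illustrative sketch, not real firmware.

def remap_write(logical_block, mapping, wear, free_pages):
    page = min(free_pages, key=lambda p: wear[p])  # least-worn free page
    free_pages.remove(page)
    old = mapping.get(logical_block)
    if old is not None:
        free_pages.add(old)        # the superseded page rejoins the free pool
    mapping[logical_block] = page
    wear[page] += 1                # counter used to predict end of life
    return page

mapping, wear, free_pages = {}, {0: 0, 1: 5, 2: 1}, {0, 1, 2}
print(remap_write("lba-7", mapping, wear, free_pages))  # page 0 (wear 0)
print(remap_write("lba-7", mapping, wear, free_pages))  # page 2 (wear 1); 0 freed
```

Rewriting the same logical block twice lands on two different physical pages, which is the point: no single flash page absorbs all the traffic for a hot sector.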
Re: [OpenIndiana-discuss] Recommendations for fast storage
On 2013-04-17 02:10, Jay Heyl wrote: Not to get into bickering about semantics, but I asked, Or am I wrong about reads being issued in parallel to all the mirrors in the array?, to which you replied, Yes, in normal case... this assumption is wrong... but reads should be in parallel. (Ellipses intended for clarity, not argument munging.) If reads are in parallel, then it seems as though my assumption is correct. I realize the system will discard data from all but the first reads and that using only the first response can improve performance, but in terms of number of IOPs, which is where I intended to go with this, it seems to me the mirrored system will have at least as many if not more than the raid-zn system. Or have I completely misunderstood what you intended to say? Um, right... I got torn between several letters and forgot the details of this one. So, here's what I replied to with poor wording - *I thought you meant* that a single read request from a program would be redirected as a series of parallel requests to mirror components asking for the same data, whichever one answers first - this is the "no, that assumption is wrong" in my reply. Unless the first device to answer returns garbage (something that doesn't match the expected checksum), other copies are not read as part of this request. Now, if there are many requests issued simultaneously on the system, which is most often the case, then reads from different requests are directed to different disks, but again - one read goes to one disk except in pathological cases. It is likely that the system selects the disk to read from based, in part, on its expectation of where the disk's head is (i.e. the last requested LBA is nearest to the LBA we want now), in order to minimize latency and unproductive time losses. Thus sequential reads, where requests for nearby sectors come in succession, are likely to be satisfied by a single disk in the mirror, leaving the other disks available to satisfy other reads. 
Copies of a write request however are sent to all disks and committed (flushed) before the synchronous request is accepted as completed (for example, a write-and-commit of a TXG transaction group). Hope this makes my point clearer, it is late here ;) //Jim ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
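Jim's nearest-head heuristic can be sketched in a few lines. This is a toy model of the idea only (the function is invented; the actual ZFS mirror code weighs more factors than last-LBA distance):

```python
# Pick which mirror half serves a read: choose the disk whose last
# requested LBA is nearest the target, approximating shortest seek.
# Toy model of the heuristic, not actual ZFS code.

def pick_mirror_half(last_lba, target_lba):
    return min(range(len(last_lba)),
               key=lambda d: abs(last_lba[d] - target_lba))

last = [1_000, 500_000]          # per-disk position of the previous request
print(pick_mirror_half(last, 1_200))    # 0: disk 0's head is already nearby
print(pick_mirror_half(last, 499_000))  # 1: far shorter seek on disk 1
```

Under this policy, a sequential stream naturally sticks to one mirror half (its last LBA keeps being the nearest), which is exactly the behavior Jim describes.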
Re: [OpenIndiana-discuss] Recommendations for fast storage
On 17/04/13 02:10, Jay Heyl wrote: Not to get into bickering about semantics, but I asked, Or am I wrong about reads being issued in parallel to all the mirrors in the array? Each read is issued only to a (let's say, random) disk in the mirror, unless the read is faulty. You can check this easily with zpool iostat -v. -- Jesús Cea Avión j...@jcea.es - http://www.jcea.es/ Twitter: @jcea jabber / xmpp: j...@jabber.org ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] building a new box soon- HDD concerns and recommendations for virtual serving
Further to my original post, I have a new (desktop, I know ... but I am on a tight budget) Intel MB with an i5-3750 CPU and 32 GB of desktop RAM. Booting the 151a7 live DVD shows that it thinks it's a 32-bit system (huh?). It recognises almost all the devices when I run the device manager, but doesn't see any HDD's. They're just not there at all. At boot time it says they're too big for a 32-bit kernel. Why is it booting a 32-bit kernel anyway? Maybe some BIOS thing I need to set? I booted the first (default) image that the live CD displays. The MB is a DH77EB, which is an H77 Express chipset. As well as thinking it's 32-bit, it also doesn't see any drives at all. I have a single new Seagate 2TB (4k blocks?) ST2000DM001 in it that the live CD doesn't see at all. I know there was a recent thread on installing to these drives that seemed inconclusive as to how best to install to them. I want to have: 2 x 2TB HDDs for rpool (ZFS mirror) 4 x 2TB HDD's to get at least a 4TB mirror (or is RAID-Z a better option?) Would I be better off with some 500GB HDD's for the rpool? And while I fiddle with this thing, is there any way to get the live CD installer to work with these drives without poking around with some other OS to partition the drive? Help! Thank you, Carl ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss