The first parity uses straight XOR on uint64_t, while the second parity
advances the LFSR on all 8 bytes of a uint64_t at once with some bitwise
math (search for VDEV_RAIDZ_64MUL_2) that adds up to 8 operators by my
count, followed by an xor. Using an LFSR lookup table on each byte might
have a chance at being faster, but it's hard to know without testing. I
am not at all surprised it takes significantly more CPU time, and I see
no way to make it significantly closer in speed to a simple XOR.
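For the curious, the macro in question looks like this (reproduced from
memory, so double-check against the actual vdev_raidz.c) - it multiplies
all 8 bytes of a uint64_t by 2 in GF(2^8) in one go:

    /*
     * Multiply each byte of x by 2 in GF(2^8) (polynomial 0x11d).
     * mask becomes 0xff in every byte whose high bit was set; those
     * bytes get the 0x1d reduction xored in after the shift.  That is
     * the 8 operators I counted, before the final xor with the data.
     */
    #define VDEV_RAIDZ_64MUL_2(x, mask) \
    { \
            (mask) = (x) & 0x8080808080808080ULL; \
            (mask) = ((mask) << 1) - ((mask) >> 7); \
            (x) = (((x) << 1) & 0xfefefefefefefefeULL) ^ \
                ((mask) & 0x1d1d1d1d1d1d1d1dULL); \
    }

The byte-at-a-time table version I was speculating about would be
something like this (hypothetical and untested; the names are mine):

    #include <stdint.h>

    /* gf_mul2_tab[b] == b*2 in GF(2^8); built once at startup. */
    static uint8_t gf_mul2_tab[256];

    static void
    gf_mul2_tab_init(void)
    {
            int b;

            for (b = 0; b < 256; b++)
                    gf_mul2_tab[b] =
                        (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1d : 0));
    }

    /* Same result as VDEV_RAIDZ_64MUL_2, one byte lookup at a time. */
    static uint64_t
    gf_mul2_bytewise(uint64_t x)
    {
            uint64_t r = 0;
            int i;

            for (i = 0; i < 8; i++)
                    r |= (uint64_t)gf_mul2_tab[(x >> (8 * i)) & 0xff]
                        << (8 * i);
            return (r);
    }

That's eight table loads per word versus eight cheap register operations
the compiler can schedule freely - my money would still be on the
registers, which is why I said it's hard to know without testing.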
However, the test indicated more than 1 GB/s. Is the N40L's processor
really 10x slower per core than the machine you ran the test on? I mean,
sure, it's from 2010 and running at 1.5 GHz, but still...

Tim

On Thu, Feb 21, 2013 at 5:21 PM, Reginald Beardsley <[email protected]> wrote:

> Wow! I'm also deeply embarrassed for not having looked at the source
> myself before posting. I should have.
>
> FWIW, a 10x performance hit for double parity instead of single parity
> is probably a code tuning or algorithm issue.
>
> Have Fun!
> Reg
>
> --- On Thu, 2/21/13, Sašo Kiselkov <[email protected]> wrote:
>
>> From: Sašo Kiselkov <[email protected]>
>> Subject: Re: [OpenIndiana-discuss] RAIDZ performance
>> To: [email protected]
>> Date: Thursday, February 21, 2013, 2:08 PM
>>
>> On 02/21/2013 07:27 PM, Timothy Coalson wrote:
>>> I think last time this was asked, the consensus was that the
>>> implementation was based on linear feedback shift registers and xor,
>>> which happens to be a reed-solomon code (not as clear on this part,
>>> but what matters is what it is, not what it isn't). Regardless, from
>>> reading the source previously, I am fairly sure it operates bytewise,
>>> with xor for the first syndrome (parity), and LFSR and then xor for
>>> the other syndromes.
>>>
>>> See
>>> http://openindiana.org/pipermail/openindiana-discuss/2012-October/010419.html
>>
>> I tore out the parity calculations for raidz1 and raidz2 (attached)
>> from vdev_raidz.c and here are the results:
>>
>> ("5 1 32 1000000" below means 1000000 iterations over a 5-drive
>> raidz-1 at 32k per data drive; 4 data drives * 32k = 128k block)
>>
>> $ for ((I=0; $I < 2 ; I=$I + 1 )); do time ./raidz_test 5 1 32 1000000 & done
>>
>> real    0m32.045s
>> user    0m32.336s
>> sys     0m0.015s
>>
>> real    0m32.372s
>> user    0m32.486s
>> sys     0m0.017s
>>
>> So combined raidz1 throughput is:
>> 128 * 1024 * 1000000 / 2^30 / 32 * 2 = 7.6293 GB/s
>>
>> ("4 2 64 1000000" below means 1000000 iterations over a 4-drive
>> raidz-2 at 64k per data drive; 2 data drives * 64k = 128k block)
>>
>> $ for ((I=0; $I < 2 ; I=$I + 1 )); do time ./raidz_test 4 2 64 1000000 & done
>>
>> real    3m3.040s
>> user    3m0.920s
>> sys     0m0.078s
>>
>> real    3m3.082s
>> user    3m1.092s
>> sys     0m0.058s
>>
>> So combined raidz2 throughput is:
>> 128 * 1024 * 1000000 / 2^30 / 183 * 2 = 1.3341 GB/s
>>
>> Next comes the factor of reduced data spindle count. A 4-drive raidz1
>> will contain 3 data spindles, while a 4-drive raidz2 will only contain
>> 2 data spindles. Fewer spindles = less raw throughput.
>>
>> I think we can thus conclude that the performance drop Reginald is
>> seeing is entirely expected.
>>
>> Cheers,
>> --
>> Saso
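P.S. For anyone who doesn't want to dig through vdev_raidz.c: the inner
loops Saso's test is timing boil down to roughly the sketch below (my
own names and simplifications, not the actual test code; the real
generate-parity routines also walk the columns and handle uneven column
lengths). It shows where the roughly 9-vs-1 operations-per-word gap
between the second syndrome and plain parity comes from:

    #include <stddef.h>
    #include <stdint.h>

    /* Per-word GF(2^8) multiply-by-2, same math as VDEV_RAIDZ_64MUL_2. */
    static uint64_t
    gf64_mul2(uint64_t x)
    {
            uint64_t mask = x & 0x8080808080808080ULL;

            mask = (mask << 1) - (mask >> 7);
            return (((x << 1) & 0xfefefefefefefefeULL) ^
                (mask & 0x1d1d1d1d1d1d1d1dULL));
    }

    /* P parity (raidz1): one xor per word of each data column. */
    static void
    add_column_p(uint64_t *p, const uint64_t *data, size_t nwords)
    {
            size_t i;

            for (i = 0; i < nwords; i++)
                    p[i] ^= data[i];
    }

    /*
     * Q syndrome (the extra raidz2 parity): Horner-style evaluation -
     * multiply the accumulated Q by 2 in GF(2^8), then xor the column
     * in.  About 9 operations per word instead of 1.
     */
    static void
    add_column_q(uint64_t *q, const uint64_t *data, size_t nwords)
    {
            size_t i;

            for (i = 0; i < nwords; i++)
                    q[i] = gf64_mul2(q[i]) ^ data[i];
    }

Call add_column_p per data column for raidz1, and both routines per
column for raidz2, and that operation-count difference (plus the extra
memory traffic for the Q buffer) goes a long way toward explaining the
7.6 vs 1.3 GB/s numbers above.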
