Ragnar Kjørstad wrote:
>
> > The hash is only 23 bits, stored in bits 7 up to 30. The low bits (0-6) are used
> > to store a generation counter. The highest bit (31) is left 0, because the file
> > offset on GNU systems is a signed integer (off_t, which is a long int).
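The offset layout described above can be sketched in C. This is a minimal sketch: the macro names GEN_MASK and HASH_MASK and the helpers make_offset(), offset_hash() and offset_generation() are hypothetical, and the masks are derived only from the bit positions stated here (generation counter in bits 0-6, hash in bits 7-30, bit 31 kept clear). They are not copied from the reiserfs source.

```c
#include <stdint.h>

/* Hypothetical masks derived from the layout described above:
 * bits 0-6 hold the generation counter, bits 7-30 hold the hash,
 * and bit 31 stays 0 so the value is non-negative as a signed off_t. */
#define GEN_MASK  0x0000007fu
#define HASH_MASK 0x7fffff80u

/* Pack a hash value and a generation counter into one offset. */
static uint32_t make_offset(uint32_t hash, uint32_t generation)
{
    return (hash & HASH_MASK) | (generation & GEN_MASK);
}

/* Recover the hash field from a packed offset. */
static uint32_t offset_hash(uint32_t offset)
{
    return offset & HASH_MASK;
}

/* Recover the generation counter from a packed offset. */
static uint32_t offset_generation(uint32_t offset)
{
    return offset & GEN_MASK;
}
```

Because bit 31 is never set, any packed value stays positive when read back as a signed 32-bit offset, and entries with the same hash are told apart only by their generation counter.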
>
> Thanks for all your help.
>
> The new hash is now working well.
>
> The code is:
>
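> /* Copyright Big Storage, 2000 */
> /*
>  * maildir_hash: directory hash for maildir-style file names.
>  * Every non-digit character starts a new field, and the first three
>  * fields are parsed as decimal numbers. If all three are non-zero they
>  * are combined with shifts and adds; otherwise fall back to r5_hash.
>  */
> u32 maildir_hash (const char *msg, int len)
> {
>     u32 num_part[3] = {0,0,0};  /* numeric fields parsed from the name */
>     int i;
>     int j=0;                    /* index of the field being parsed */
>
>     for (i=0; i<len; i++) {
>         if (msg[i]>='0' && msg[i]<='9') {
>             /* append this digit to the current field */
>             num_part[j]*=10;
>             num_part[j]+=msg[i]-'0';
>         } else {
>             /* a non-digit ends the current field; stop after three */
>             j++;
>             if (j==3)
>                 break;
>         }
>     }
>     if (num_part[0] && num_part[1] && num_part[2])
>         return (num_part[0]<<7)+(num_part[1]<<5)+(num_part[2]<<5);
>     else
>         return r5_hash(msg, len);
> }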
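For experimenting outside the kernel, the parsing logic above can be exercised in user space. This sketch assumes u32 is a 32-bit unsigned integer and replaces r5_hash() with a hypothetical stand-in, fallback_hash(), which is a plain FNV-1a hash and not the real r5 algorithm. maildir_hash_demo() is likewise a hypothetical name; its parsing and combining steps mirror maildir_hash() as posted.

```c
#include <stdint.h>

typedef uint32_t u32; /* kernel-style typedef, assumed for user space */

/* Hypothetical stand-in for the kernel's r5_hash(): NOT the real r5
 * algorithm, just an FNV-1a hash so the sketch compiles and runs. */
static u32 fallback_hash(const char *msg, int len)
{
    u32 h = 2166136261u;
    for (int i = 0; i < len; i++) {
        h ^= (unsigned char)msg[i];
        h *= 16777619u;
    }
    return h;
}

/* Same parsing and combining logic as maildir_hash(), using the stand-in. */
static u32 maildir_hash_demo(const char *msg, int len)
{
    u32 num_part[3] = {0, 0, 0};
    int j = 0;

    for (int i = 0; i < len; i++) {
        if (msg[i] >= '0' && msg[i] <= '9') {
            /* append this digit to the current field */
            num_part[j] = num_part[j] * 10 + (u32)(msg[i] - '0');
        } else if (++j == 3) {
            /* a non-digit ends the field; stop once three are read */
            break;
        }
    }
    if (num_part[0] && num_part[1] && num_part[2])
        return (num_part[0] << 7) + (num_part[1] << 5) + (num_part[2] << 5);
    return fallback_hash(msg, len);
}
```

For a made-up maildir-style name such as 970000000.12345_1.mailhost, the three fields are 970000000, 12345 and 1, so the result comes from the shift-and-add branch. A name like cur has no numeric fields and goes to the fallback, as does any name whose first three fields include a zero.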
>
> And the performance numbers:
>
>                                        untar        find        find
> reiserfs r5:                       4m52.461s  10m42.421s  10m31.282s
> reiserfs tea:                      5m00.709s  10m48.716s  10m36.089s
> reiserfs tea noatime:              4m45.495s   5m26.378s   5m33.648s
> reiserfs r5:                       4m52.960s  10m37.554s  10m32.216s
> reiserfs r5 noatime:               4m50.794s   5m17.570s   5m24.448s
> reiserfs notail tea:               1m16.500s   7m49.262s   9m18.548s
> reiserfs notail tea noatime:       1m17.413s   4m20.977s   6m00.232s
> reiserfs notail r5:                1m19.156s   7m29.467s   9m38.376s
> reiserfs notail r5 noatime:        1m16.934s   4m19.736s   5m59.732s
> xfs:                               9m41.229s   0m55.330s   1m06.169s
> xfs noatime:                       9m34.167s   0m51.635s   1m03.481s
> reiserfs maildir:                  3m01.894s   4m19.049s   4m46.440s
> reiserfs maildir noatime:          3m03.944s   2m28.942s   3m05.409s
> reiserfs notail maildir:           1m01.591s   0m50.718s   0m58.597s
> reiserfs notail maildir noatime:   1m01.045s   0m40.927s   0m48.239s
>
> Observations:
> * Creating the files on reiserfs is twice as fast as on xfs
> * With "notail" it's four times faster still, 8 times faster than xfs!
> * XFS is much faster than reiserfs for doing "find . -type f|xargs cat".
>   Surprising to me, as this is a lot of small files, exactly what
>   reiserfs was built to handle??
Fixing this requires Plan A. Plan A missed the code freeze and will go into v4.
> * Using noatime helps a lot, but reiserfs is still far behind.
> * Using the new hash, maildir-hash, improves untar performance a little,
>   and find performance by a factor of 6!
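The ratios quoted in the observations can be checked against the table with a little arithmetic. This sketch converts the XmY.Zs entries to seconds and divides one time by another. The helper names to_secs(), speedup() and print_ratios() are hypothetical, and the rows chosen (r5 and maildir, with and without notail, first find column) are one reasonable reading of which configurations each observation compares.

```c
#include <stdio.h>

/* Convert a table entry such as 4m52.461s into seconds. */
static double to_secs(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* How many times faster the `fast` run was than the `slow` run. */
static double speedup(double slow, double fast)
{
    return slow / fast;
}

/* Print the ratios behind the observations, using rows from the table. */
static void print_ratios(void)
{
    double xfs_untar       = to_secs(9, 41.229);  /* xfs */
    double r5_untar        = to_secs(4, 52.461);  /* reiserfs r5 */
    double notail_r5_untar = to_secs(1, 19.156);  /* reiserfs notail r5 */
    double r5_find         = to_secs(4, 19.736);  /* notail r5 noatime, first find */
    double maildir_find    = to_secs(0, 40.927);  /* notail maildir noatime, first find */

    printf("untar, r5 vs xfs:        %.1fx\n", speedup(xfs_untar, r5_untar));
    printf("untar, notail r5 vs r5:  %.1fx\n", speedup(r5_untar, notail_r5_untar));
    printf("untar, notail r5 vs xfs: %.1fx\n", speedup(xfs_untar, notail_r5_untar));
    printf("find, maildir vs r5:     %.1fx\n", speedup(r5_find, maildir_find));
}
```

Swapping in other rows (for example the tea variants) only requires changing the to_secs() arguments.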
>
> We will be testing this on a mailserver next to see what the impact will
> be on smtp and pop speeds.
>
> Regardless of those results, I believe that our findings show:
> * The choice of hash seems very important to filesystem performance
> * On dedicated systems, performance can be improved a lot by
>   creating a special hash for that task
> * Without tweaks, XFS works much better than reiserfs for this test.
>   WHY?
>
> I'm not so sure it makes sense to include this hash in the standard
> kernel, because it's very special-purpose; otherwise we would end up
> with a hundred different hashes for different uses. Is that what we
> want?
>
> --
> Ragnar Kjorstad
> Big Storage
I have no problem with a hundred different hashes. Improve the commenting,
deal with the licensing in some acceptable way, and we'll take the code in 2.5.1.
Hans