Ragnar Kjørstad wrote:
> 
> > The hash is only 24 bits, from bit 7 up to bit 30. The low bits (0-6) are used to
> > store the generation counter. The highest bit (31) is left 0, because a file offset
> > on GNU systems is a signed integer (off_t, which is long int).
> 
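To make the bit layout described above concrete, here is a minimal sketch in C. The mask names and pack_offset() are made up for illustration and are not taken from the reiserfs source. The sketch assumes only the layout as stated: generation counter in bits 0-6, hash in bits 7 through 30, bit 31 left clear.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names only; these are assumptions, not reiserfs identifiers. */
#define HASH_MASK 0x7fffff80u   /* bits 7..30 hold the hash */
#define GEN_MASK  0x0000007fu   /* bits 0..6 hold the generation counter */

/* Combine a raw hash value and a generation counter into one 32-bit offset. */
static uint32_t pack_offset(uint32_t raw_hash, uint32_t generation)
{
    uint32_t offset = (raw_hash & HASH_MASK) | (generation & GEN_MASK);

    /* Bit 31 must stay 0 so the value is non-negative as a signed offset. */
    assert((offset & 0x80000000u) == 0);
    return offset;
}
```

Whatever a hash function returns, only the bits under HASH_MASK survive in this layout, so two names whose hashes differ only in bits 0-6 or bit 31 end up with the same hash part and are told apart by the generation counter alone.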
> Thanks for all your help.
> 
> The new hash is now working well.
> 
> The code is:
> 
> /* Copyright Big Storage, 2000 */
> u32 maildir_hash (const char *msg, int len)
> {
>   u32 num_part[3] = {0,0,0};
>   int i;
>   int j=0;
>   /* Collect up to three decimal numbers; each non-digit character moves on to the next one. */
>   for (i=0; i<len; i++) {
>         if (msg[i]>='0' && msg[i]<='9') {
>                 num_part[j]*=10;
>                 num_part[j]+=msg[i]-'0';
>         } else {
>                 j++;
>                 if (j==3)
>                         break;
>         }
>   }
>   /* Combine the parts only if all three are nonzero; otherwise fall back to r5_hash. */
>   if (num_part[0] && num_part[1] && num_part[2])
>         return (num_part[0]<<7)+(num_part[1]<<5)+(num_part[2]<<5);
>   else
>         return r5_hash(msg, len);
> }
> 
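For anyone who wants to see which names take the numeric path and which fall back to r5_hash, the parsing loop from the code above can be pulled out and run on its own. This is only a sketch: parse_parts() and name_has_three_parts() are hypothetical helpers, and the file names mentioned below are invented examples, not names from the test data.

```c
#include <stdint.h>
#include <string.h>

/*
 * Same digit-collecting loop as the posted maildir_hash(), separated from
 * the hashing step. Returns 1 if all three parts are nonzero (the case
 * where the posted code combines them), 0 if it would call r5_hash().
 */
static int parse_parts(const char *msg, int len, uint32_t part[3])
{
    int i;
    int j = 0;

    part[0] = part[1] = part[2] = 0;
    for (i = 0; i < len; i++) {
        if (msg[i] >= '0' && msg[i] <= '9') {
            part[j] = part[j] * 10 + (uint32_t)(msg[i] - '0');
        } else if (++j == 3) {
            /* Third separator reached: stop before indexing part[3]. */
            break;
        }
    }
    return part[0] && part[1] && part[2];
}

/* Convenience wrapper for NUL-terminated names (hypothetical helper). */
static int name_has_three_parts(const char *name, uint32_t part[3])
{
    return parse_parts(name, (int)strlen(name), part);
}
```

With an invented name such as "971234567.1234.5678" all three parts are filled in. With "971234567.1234.host" the third part stays 0, because the first letter after the second dot already counts as the third separator, so that name would be hashed with r5_hash.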
> And performance-numbers:
> 
>                                            untar         find         find
> reiserfs r5:                           4m52.461s   10m42.421s   10m31.282s
> reiserfs tea:                          5m00.709s   10m48.716s   10m36.089s
> reiserfs tea noatime:                  4m45.495s    5m26.378s    5m33.648s
> reiserfs r5:                           4m52.960s   10m37.554s   10m32.216s
> reiserfs r5 noatime:                   4m50.794s    5m17.570s    5m24.448s
> reiserfs notail tea:                   1m16.500s    7m49.262s    9m18.548s
> reiserfs notail tea noatime:           1m17.413s    4m20.977s    6m00.232s
> reiserfs notail r5:                    1m19.156s    7m29.467s    9m38.376s
> reiserfs notail r5 noatime:            1m16.934s    4m19.736s    5m59.732s
> xfs:                                   9m41.229s    0m55.330s    1m06.169s
> xfs noatime:                           9m34.167s    0m51.635s    1m03.481s
> reiserfs maildir:                      3m01.894s    4m19.049s    4m46.440s
> reiserfs maildir noatime:              3m03.944s    2m28.942s    3m05.409s
> reiserfs notail maildir:               1m01.591s    0m50.718s    0m58.597s
> reiserfs notail maildir noatime:       1m01.045s    0m40.927s    0m48.239s
> 
> Observations:
> * Creating the files on reiserfs is twice as fast as xfs
> * With "notail" it's even four times faster, 8 times faster than xfs!
> * XFS is much faster than reiserfs for doing "find . -type f|xargs cat".
>   Surprising to me, as this is a lot of small files, and exactly what
>   reiserfs was built to do??

This requires plan A to fix.  Plan A missed code freeze, and will go into v4.

> * Using noatime helps a lot, but reiserfs is still far behind.
> * Using the new hash, maildir-hash, improves untar performance a little
>   bit, and find by a factor of 6!
> 
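The factors quoted in the observations can be checked against the table with a little arithmetic. A small sketch, with to_seconds() and speedup() as made-up helper names:

```c
/* Convert an "XmY.YYYs" timing from the table into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* How many times faster the "fast" run is than the "slow" run. */
static double speedup(double slow, double fast)
{
    return slow / fast;
}
```

For example, speedup(to_seconds(9, 41.229), to_seconds(4, 52.461)) compares the xfs untar with the first reiserfs r5 untar and comes out at roughly 2, and speedup(to_seconds(4, 19.736), to_seconds(0, 40.927)) compares the first find on "notail r5 noatime" with "notail maildir noatime" and comes out a bit above 6.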
> We will be testing this on a mailserver next to see what the impact will
> be on smtp and pop speeds.
> 
> Regardless of those results, I believe that our findings show:
> * The choice of hash seems very important to filesystem-performance
> * On dedicated systems, performance can be improved a lot by
>   creating a special hash for that task
> * XFS works much better than reiserfs without tweaks for this test.
>   WHY?
> 
> I'm not so sure it makes sense to include this hash in the standard
> kernel, because it's very special purpose - or we would end up with a
> hundred different hashes for different uses. I'm not so sure that's what
> we want?
> 
> --
> Ragnar Kjorstad
> Big Storage

I have no problem with a hundred different hashes.  Improve the commenting, deal with
the licensing in some acceptable way, and we'll take the code in 2.5.1.

Hans
