On Mon, 25 Jul 2011, Marc Deop wrote:

> It's more than twice as fast than the previous sh script.

In part this is /bin/sh vs. /bin/bash, and using 'bashisms' 
matters, but yes, I did not seek to optimize a teaching 
throwaway.

> 1- m5sum the file we need
  ... actually the NAME of the file, to make it explicit we are
        not looking at content [also a reasonable approach if one is
        looking to find and de-duplicate a filestore]
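
A minimal sketch of hashing the name rather than the content, assuming bash and the coreutils md5sum. The helper name name_hash and the sample filename are made up for illustration and are not from the original script.

```bash
#!/bin/bash
# name_hash NAME: print the MD5 of the file NAME itself, not of its content.
# (name_hash is a hypothetical helper for this illustration.)
name_hash() {
    # printf '%s' avoids the trailing newline that echo would add to the hashed data
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# The file does not need to exist; only the string is hashed.
name_hash "report-2011-07.txt"
```

Hashing content instead would mean feeding the file to md5sum directly, which is the de-duplication case mentioned above.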

> 2- look for the first letter of the hash
  ... actually this may be more than a single letter of the
        hash --- with ca. 3000 files, and 16 possible values for
        a hex character, we should end up with about 200 files per
        subdirectory.  The filesystem should be doing some sort of
        index as well -- as I recall, a hashed B-tree (HTree) in
        the case of ext3 and ext4 with dir_index enabled, but I've
        not expressly looked.  The PHP case was mentioned, however,
        and its directory searching is less optimal.
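
As a rough sketch of the bucketing arithmetic, assuming bash, md5sum, and a prefix taken from the front of the hex digest (the function names bucket_for and per_bucket are invented here, and the original script may choose its prefix differently):

```bash
#!/bin/bash
# bucket_for NAME [PREFIX_LEN]: print the sub-directory name a file NAME maps to,
# taken as the first PREFIX_LEN hex characters of the MD5 of the name (default 1).
bucket_for() {
    local name=$1 len=${2:-1} hash
    hash=$(printf '%s' "$name" | md5sum | cut -d' ' -f1)
    printf '%s\n' "${hash:0:len}"
}

# per_bucket FILES PREFIX_LEN: expected files per sub-directory, assuming the
# hash spreads names evenly over 16^PREFIX_LEN buckets (integer division).
per_bucket() {
    local files=$1 len=$2
    echo $(( files / (16 ** len) ))
}

bucket_for "report-2011-07.txt"   # one hex character, so 16 possible buckets
per_bucket 3000 1                 # prints 187
per_bucket 3000 2                 # prints 11
```

A second prefix character trades a longer path for far smaller directories, which matters most where the directory scan is done in user code rather than by the filesystem index.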

We have a customer with a similar problem with a naively 
written set of home-brewed PHP code, and are helping them work 
through similar issues.

> 3- get into the directory
> 4- now we look for our file
  ... this is probably a single operation to suck the sub-directory
        listing into an array in php, and use an associative
        match
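
The same idea can be sketched in bash rather than PHP, assuming bash 4 or later for associative arrays. The names listing, load_listing and have_file are hypothetical, and the glob used here skips dotfiles.

```bash
#!/bin/bash
# Requires bash 4+ (declare -A). Read a directory once, then answer lookups
# from memory instead of touching the disk for every name.
declare -A listing

# load_listing DIR: fill the global associative array 'listing' with DIR's entries.
load_listing() {
    local dir=$1 path
    listing=()
    for path in "$dir"/*; do
        [ -e "$path" ] || continue      # skip the unexpanded glob of an empty directory
        listing["${path##*/}"]=1        # key on the base name only
    done
}

# have_file NAME: succeed if NAME was seen by the last load_listing call.
have_file() {
    [[ -n ${listing[$1]+set} ]]
}

# Demonstration against a throwaway directory
demo=$(mktemp -d)
touch "$demo/alpha.txt" "$demo/beta.txt"
load_listing "$demo"
have_file "alpha.txt" && echo "alpha.txt: present"
have_file "gamma.txt" || echo "gamma.txt: absent"
rm -rf "$demo"
```

The listing is a snapshot, so files created after load_listing runs will not be found until it is called again.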

but you are right, we are moving increasingly away from a 
CentOS issue to a more general coding style issue

-- Russ herrold
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
