On Wed, Aug 30, 2023 at 07:55:14AM -0400, songbird wrote:
> Karl Vogel wrote:
> ...
> > If nothing else, it's faster to run "locate" and look for file extensions;
> > running "file" on that much crap took nearly 9 hours.
> 
> do you have SSDs or spinning rust?

  I have a 256-GB SSD and two mirrored Western Digital Blue 1.8-TB drives.
  About 2 million files are on SSD and the rest are on rust.

  I used "file" v5.45 built from source, which does a nice job but is IO-
  and CPU-intensive.

> when i just did this:
>     # find / -type f | wc -l
> it took all of 24 seconds for the 2.4 million files found.

  Generating hashes for SSD files is faster than getting the filetype;
  it takes about 17 minutes for 3.6 million files (153 Gbytes).  I like
  the BLAKE2 hash because it's fast as hell, among other things:

    #!/bin/ksh
    #<zroot-hash: run Blake hash on all zroot dataset files

    export PATH=/usr/local/bin:/bin:/usr/bin
    tag=${0##*/}
    set -o nounset
    umask 022

    logmsg () { logger -t "$tag" "$@"; }
    die ()    { logmsg "FATAL: $*"; exit 1; }

    work=$(mktemp -q "/tmp/$tag.work.XXXXXX")
    case "$?" in
        0)  test -f "$work" || die "$work: tmp list file not found" ;;
        *)  die "can't create work file" ;;
    esac

    # Get a list of all regular files on SSD.

    mount | awk '/^zroot/ {print $3}' |
      while read -r dataset
      do
          logmsg "listing $dataset"
          find "$dataset" -xdev -type f -print0 >> "$work"
      done

    # Store hashes for SSD datasets.
    # The hash file is sorted by filename to make comparisons easier.

    logmsg "running b2sum"
    fdbdir=$(date '+/var/fdb/%Y/%m%d')
    mkdir -p "$fdbdir" || die "$fdbdir: can't create directory"
    sort -z "$work" | xargs -0r b2sum -l 128 > "$fdbdir/zroot.sum"
    rm "$work"
    exit 0

  Useful for finding changed files -- security, backups, etc.

> what script did you use?

    #!/bin/ksh
    #<ftype: get a sampling of filetypes for all SSD filesystems, /src.

    export PATH=/usr/local/bin:/bin:/usr/bin
    set -o nounset
    tag=${0##*/}
    umask 022

    logmsg () { logger -t "$tag" "$@"; }

    work="/tmp/$tag.$$"
    fsys="/ /doc /home /usr/local /search /usr/src /dist /src"

    logmsg start
    find $fsys -xdev -print0 | xargs -0 file -N --mime-type > "$work"
    logmsg finish

    mv "$work" filetypes
    exit 0
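
  To turn the resulting "filetypes" file into an actual summary, a tally
  per MIME type is probably the simplest thing.  The sketch below is one
  way to do it; "mimetally" is a made-up name, and it assumes each line
  has the "path: type/subtype" layout that "file -N --mime-type" prints.

```shell
#!/bin/sh
# mimetally: hypothetical helper, not part of the original scripts.
# Count how often each MIME type occurs in a listing produced by
# "file -N --mime-type" and print the most common types first.

mimetally () {
    # The type is whatever follows the last ": " on the line, so
    # paths containing ": " are still handled.
    awk -F': ' '
        NF > 1 { n[$NF]++ }
        END    { for (t in n) print n[t], t }
    ' "$1" |
      sort -k1,1nr -k2
}
```

  Running "mimetally filetypes" prints one "count type" pair per line.
  Any error lines from "file" (unreadable paths and the like) would be
  counted as if the message were a type, so they show up in the tally
  rather than being dropped silently.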

-- 
Karl Vogel                      I don't speak for the USAF or my company

Assisted in daily preparation of large quantities of consumable items
in a fast-paced setting.  (Translation: Short-order cook)
                                        --from a list of resume' blunders
