I'm looking for an elegant way to perform chunk-stream-based processing of arrays/ranges. I'm building a file indexing/search engine in D that calculates various kinds of statistics on files, such as histograms and SHA-1 digests. I want all of these calculations to be performed in a single pass over the data, for the sake of data-access locality.

Here is an excerpt from the engine:

    /** Process File in Cache Friendly Chunks. */
    void calculateCStatInChunks(immutable (ubyte[]) src,
                                size_t chunkSize, bool doSHA1, bool doBHist8) {
        // Skip statistics that have already been computed.
        if (!_cstat.contentsDigest[].allZeros) { doSHA1 = false; }
        if (!_cstat.bhist8.allZeros) { doBHist8 = false; }

        import std.digest.sha;
        SHA1 sha1;
        if (doSHA1) { sha1.start(); }

        import std.range: chunks;
        foreach (chunk; src.chunks(chunkSize)) {
            if (doSHA1) { sha1.put(chunk); }
            if (doBHist8) { /*...*/ }
        }

        if (doSHA1) {
            _cstat.contentsDigest = sha1.finish();
        }
    }

This doesn't seem like a very elegant (functional) approach, as I have to spread the logic for each statistic (reducer) across three different places in the code, namely `start`, `put` and `finish`.
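
One possible component-style direction, sketched under the assumption that every statistic is modelled as an output range with a `put(const(ubyte)[])` method (which `SHA1` from `std.digest.sha` already provides). `ByteHist8` and `feedInChunks` are hypothetical names, not Phobos APIs; the inner loop over the variadic `sinks` is unrolled at compile time.

```d
import std.digest.sha : SHA1, toHexString;
import std.range : chunks;
import std.stdio : writeln;

/// Hypothetical statistic: an 8-bit byte histogram, usable as an output range.
struct ByteHist8
{
    ulong[256] bins;

    void put(scope const(ubyte)[] chunk) @safe pure nothrow @nogc
    {
        foreach (b; chunk)
            ++bins[b];
    }
}

/// Hypothetical helper: walk `src` once, in chunks, and feed every chunk
/// to every sink while it is still hot in cache.
void feedInChunks(Sinks...)(const(ubyte)[] src, size_t chunkSize, ref Sinks sinks)
{
    foreach (chunk; src.chunks(chunkSize))
        foreach (ref sink; sinks) // compile-time unrolled over the sink types
            sink.put(chunk);
}

void main()
{
    auto data = cast(immutable(ubyte)[]) "abc";

    SHA1 sha1;
    sha1.start();
    ByteHist8 hist;

    feedInChunks(data, 2, sha1, hist);

    writeln(toHexString(sha1.finish())); // prints A9993E364706816ABA3E25717850C26C9CD0D89D
    writeln(hist.bins['a']);             // prints 1
}
```

Each statistic then owns its state and update logic in one type, and only the single loop in `feedInChunks` touches the data. Finalization (`sha1.finish()`) still happens at the call site in this sketch.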

Does anybody have suggestions or references for Haskell-monad-like, stream-based APIs that could make this code more D-style and component-based?
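
For comparison, a more functional sketch using Phobos' `std.algorithm.iteration.fold`, which accepts several reducer functions plus one seed per function, iterates the range once, and returns a `Tuple` of the final states. The step functions `sha1Step` and `hist8Step` are hypothetical; note that each state is passed and returned by value on every chunk (the histogram alone is 2 KiB), so this trades some efficiency for purity.

```d
import std.algorithm.iteration : fold;
import std.digest.sha : SHA1, toHexString;
import std.range : chunks;
import std.stdio : writeln;

// Hypothetical pure step functions: (state, chunk) -> new state.
SHA1 sha1Step(SHA1 s, const(ubyte)[] chunk)
{
    s.put(chunk);
    return s;
}

ulong[256] hist8Step(ulong[256] h, const(ubyte)[] chunk)
{
    foreach (b; chunk)
        ++h[b];
    return h;
}

void main()
{
    auto data = cast(immutable(ubyte)[]) "abc";

    SHA1 seedSha;
    seedSha.start();
    ulong[256] seedHist; // zero-initialized

    // One pass over the chunks; both reducers see each chunk in turn.
    auto result = data.chunks(2).fold!(sha1Step, hist8Step)(seedSha, seedHist);

    writeln(toHexString(result[0].finish())); // prints A9993E364706816ABA3E25717850C26C9CD0D89D
    writeln(result[1]['a']);                  // prints 1
}
```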
