Re: Stream-Based Processing of Range Chunks in D

qznc Tue, 10 Dec 2013 02:31:42 -0800

On Tuesday, 10 December 2013 at 09:57:44 UTC, Nordlöw wrote:

I'm looking for an elegant way to perform chunk-stream-based processing of arrays/ranges. I'm building a file indexing/search engine in D that calculates various kinds of statistics on files, such as histograms and SHA1 digests. I want these calculations to be performed in a single pass with regard to data-access locality.
Here is an excerpt from the engine:

    /** Process File in Cache Friendly Chunks. */
    void calculateCStatInChunks(immutable (ubyte[]) src,
                                size_t chunkSize, bool doSHA1, bool doBHist8) {
        // Skip statistics that have already been computed.
        if (!_cstat.contentsDigest[].allZeros) { doSHA1 = false; }
        if (!_cstat.bhist8.allZeros) { doBHist8 = false; }

        import std.digest.sha;
        SHA1 sha1;
        if (doSHA1) { sha1.start(); }

        import std.range: chunks;
        foreach (chunk; src.chunks(chunkSize)) {
            if (doSHA1) { sha1.put(chunk); }
            if (doBHist8) { /*...*/ }
        }

        if (doSHA1) {
            _cstat.contentsDigest = sha1.finish();
        }
    }
Seemingly this is not a very elegant (functional) approach, as I have to spread the logic for each statistic (reducer) across three different places in the code, namely `start`, `put` and `finish`.
Does anybody have suggestions/references on Haskell-monad-like stream-based APIs that can make this code more D-style and component-based?

You could make a range step for each kind of statistic, which outputs the input range unchanged and does its job as a side effect.


  SHA1 sha1;
  src.chunks(chunkSize)
     .add_sha1(doSHA1, &sha1)
     .add_bhist(doBHist8)
     .strict_consuming();
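A minimal sketch of such a pass-through step, under some assumptions: `Tap` and `tap` are hypothetical names (they are not the `add_sha1`/`add_bhist` helpers above), the side effect fires in `popFront`, so every element is observed exactly once provided the whole range is consumed, and a plain `foreach` stands in for `strict_consuming`. Enabling or disabling a statistic then just means adding or omitting its step.

```d
import std.digest.sha : SHA1, sha1Of;
import std.range; // chunks, isInputRange and the range primitives

/// Hypothetical pass-through step: yields every element of `source`
/// unchanged and calls `fun` on each element once, right before it is
/// popped. Elements the consumer never iterates past are not observed.
struct Tap(alias fun, R) if (isInputRange!R)
{
    R source;

    @property bool empty() { return source.empty; }
    @property auto front() { return source.front; }

    void popFront()
    {
        fun(source.front); // the side effect, e.g. feeding a digest
        source.popFront();
    }
}

/// Convenience constructor so steps can be chained with UFCS.
auto tap(alias fun, R)(R source) if (isInputRange!R)
{
    return Tap!(fun, R)(source);
}

void main()
{
    immutable(ubyte)[] src = cast(immutable(ubyte)[]) "hello, chunked world";

    SHA1 sha1;
    sha1.start();
    size_t[256] hist; // byte histogram, zero-initialized by default

    // One pass over the data: each 4-byte chunk feeds both statistics.
    foreach (chunk; src.chunks(4)
                       .tap!(c => sha1.put(c))
                       .tap!((c) { foreach (b; c) ++hist[b]; }))
    {
        // Nothing to do here; iterating drives the side effects.
    }

    assert(sha1.finish() == sha1Of(src)); // same as hashing in one shot
    assert(hist['l'] == 3);
}
```

Because the steps are lazy, the statistics only get computed when something actually iterates the chain, which is why a consuming call (here the `foreach`) is needed at the end. Later Phobos releases added `std.range.tee`, which works along similar lines.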

You could try to use constructor/destructor mechanisms for sha1.start and sha1.finish. Or at least scope guards:


SHA1 sha1;
if (doSHA1) { sha1.start(); }
scope(exit) if (doSHA1) { _cstat.contentsDigest = sha1.finish(); }
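For the constructor/destructor variant, a minimal sketch assuming a hypothetical `DigestGuard` struct: since D structs cannot declare a parameterless constructor, it takes a pointer to the destination (`null` disables it, mirroring `doSHA1 == false`), calls `start` in the constructor and `finish` in the destructor, so all three phases of the digest live in one place. The destination must outlive the guard.

```d
import std.digest.sha : SHA1, sha1Of;
import std.range : chunks;

/// Hypothetical RAII helper: starts the digest on construction and,
/// when the guard goes out of scope, stores the result in `*target`.
struct DigestGuard
{
    private SHA1 sha1;
    private ubyte[20]* target; // null means "disabled"

    @disable this(this); // a copy would finish the same digest twice

    this(ubyte[20]* target)
    {
        this.target = target;
        if (target !is null) sha1.start();
    }

    void put(const(ubyte)[] data)
    {
        if (target !is null) sha1.put(data);
    }

    ~this()
    {
        // Also runs for a default-initialized guard; target is null then.
        if (target !is null) *target = sha1.finish();
    }
}

void main()
{
    ubyte[20] contentsDigest;
    {
        auto guard = DigestGuard(&contentsDigest); // start() happens here
        foreach (chunk; (cast(immutable(ubyte)[]) "abcdefgh").chunks(3))
            guard.put(chunk);
    } // the destructor calls finish() and stores the result here
    assert(contentsDigest == sha1Of("abcdefgh"));
}
```

Like `scope(exit)`, the destructor also runs while an exception unwinds the stack, so the digest of partially processed data would still be stored; if that is undesirable, `scope(success)` with the plain `SHA1` avoids it.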
