On Thursday, 16 April 2015 at 20:33:17 UTC, Marc Schütz wrote:
On Thursday, 16 April 2015 at 17:13:25 UTC, Laeeth Isharc wrote:
For CSV files, what I found is that parsing is quite slow (and memory intensive).

If you're sure that CSV reading is the culprit, writing a custom parser could help. It's possible to load a CSV file with almost no memory overhead. What I would do:

- Use std.mmfile with Mode.readCopyOnWrite to map the file into memory.
- Iterate over the lines, and then over the fields using std.algorithm.splitter.
- Don't copy, but return slices into the mapped memory.
- If a field needs to be unescaped, this can be done in-place. Unescaping never makes a string longer, and the original file won't be modified thanks to COW (private mapping).
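A minimal sketch of the steps above, under some loud assumptions: the helper names `unescapeInPlace` and `countFields` are made up for illustration, quoting follows the common CSV convention where `""` inside a quoted field stands for a single `"`, and the naive split on `,` does not handle commas or newlines inside quoted fields. It also ignores `\r\n` line endings and empty files, so treat it as a starting point rather than a full parser.

```d
import std.algorithm.iteration : splitter;
import std.mmfile : MmFile;
import std.stdio : writefln;

/// Strips surrounding quotes and collapses "" to " inside the field's own
/// memory. The result is a (possibly shorter) slice of the input.
char[] unescapeInPlace(char[] field)
{
    if (field.length < 2 || field[0] != '"' || field[$ - 1] != '"')
        return field; // unquoted field: nothing to do

    auto inner = field[1 .. $ - 1];
    size_t w = 0; // write position, never ahead of the read position
    for (size_t r = 0; r < inner.length; ++r)
    {
        // Skip the first quote of a doubled pair; the second one is copied.
        if (inner[r] == '"' && r + 1 < inner.length && inner[r + 1] == '"')
            ++r;
        inner[w++] = inner[r];
    }
    return inner[0 .. w];
}

/// Walks lines, then fields, without copying: every field is a slice of `data`.
size_t countFields(char[] data)
{
    size_t fields;
    foreach (line; data.splitter('\n'))
    {
        if (line.length == 0)
            continue; // skip blank lines, including the one after a final '\n'
        foreach (field; line.splitter(','))
        {
            auto value = unescapeInPlace(field);
            // `value` points into the mapped memory; store the slice, not a copy.
            ++fields;
        }
    }
    return fields;
}

void main(string[] args)
{
    if (args.length < 2)
        return;

    // Private copy-on-write mapping: in-place edits never reach the file on disk.
    auto mmf = new MmFile(args[1], MmFile.Mode.readCopyOnWrite, 0, null);
    auto data = cast(char[]) mmf[];

    writefln("%d fields", countFields(data));
}
```

Note that the slices are only valid while the `MmFile` object is alive, so anything that keeps them has to keep the mapping around as well.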

These are REALLY USEFUL optimizations! Thanks Marc.

Although I'm still no better off with the memory usage. I've reduced the application to just loading a CSV file into structs.

Here is void main:

````
void main(string[] args)
{
    auto text = readText(args[1]);

    foreach(record; csvReader!(string[string])(text, null))
    {
        if (!record["RIC"] || !record["TRBCIndCode"]) {
            continue;
        }

        // Add a Security to Securities
        securities.add(record["RIC"], record["TRBCIndCode"], record, []);
    }

    delete text;

    GC.collect();

    writefln("%d securities processed", securities.length);
    writefln("Securities : %d MB", securities.bytes/1024/1024);

    import core.thread;
    Thread.sleep(dur!"seconds"(60));
}
````

The output is:

````
make screener-d-simple; ./screener-d data/instruments-clean.csv

dmd -vgc -ofscreener-d source/simplemain.d source/lib/security.d
source/simplemain.d(30): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(30): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(35): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(35): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(38): vgc: 'delete' requires GC
source/lib/security.d(105): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(111): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(113): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(115): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(118): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(122): vgc: operator ~= may cause GC allocation
source/lib/security.d(123): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(164): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(164): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(173): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(173): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(182): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(182): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(191): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(191): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(203): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-213(213): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-213(213): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-219(219): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-219(219): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-225(225): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-225(225): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-231(231): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-231(231): vgc: indexing an associative array may cause GC allocation

20066 securities processed
Securities : 188 MB
````

And yet memory usage is 617 MB.
