On Thursday, 16 April 2015 at 20:33:17 UTC, Marc Schütz wrote:
On Thursday, 16 April 2015 at 17:13:25 UTC, Laeeth Isharc wrote:
For CSV files, what I found is that parsing is quite slow (and
memory intensive).
If you're sure that CSV reading is the culprit, writing a custom
parser could help. It's possible to load a CSV file with almost
no memory overhead. What I would do:
- Use std.mmfile with Mode.readCopyOnWrite to map the file into
memory.
- Iterate over the lines, and then over the fields using
std.algorithm.splitter.
- Don't copy, but return slices into the mapped memory.
- If a field needs to be unescaped, this can be done in-place.
Unescaping never makes a string longer, and the original file
won't be modified thanks to COW (private mapping).
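For reference, the steps above can be sketched roughly as follows. This is only a minimal illustration, not Marc's actual code: the names `FieldSink`, `eachField`, `unescapeInPlace` and `countFields` are made up for the example, and the splitting is deliberately naive (it assumes no quoted field contains a comma or a newline).

```d
import std.algorithm : splitter;
import std.mmfile : MmFile;

/// Callback type (hypothetical): receives the record index and one field slice.
alias FieldSink = void delegate(size_t lineNo, char[] field);

/// Walks a CSV buffer line by line and field by field without copying.
/// Every slice handed to `sink` points into `data` itself.
/// NOTE: splits on every comma, so quoted fields that contain commas or
/// newlines are not handled; that is a simplification for illustration.
void eachField(char[] data, scope FieldSink sink)
{
    size_t lineNo;
    foreach (line; data.splitter('\n'))
    {
        if (line.length && line[$ - 1] == '\r')
            line = line[0 .. $ - 1];      // tolerate CRLF line endings
        if (line.length == 0)
            continue;                     // skip blank lines
        foreach (field; line.splitter(','))
            sink(lineNo, field);
        ++lineNo;
    }
}

/// Unescapes a quoted field in place and returns the (never longer) slice.
char[] unescapeInPlace(char[] field)
{
    if (field.length < 2 || field[0] != '"' || field[$ - 1] != '"')
        return field;                     // not quoted, nothing to do
    auto inner = field[1 .. $ - 1];
    size_t w = 0;
    for (size_t r = 0; r < inner.length; ++r)
    {
        inner[w++] = inner[r];
        // A doubled quote "" encodes a single quote: skip the second one.
        if (inner[r] == '"' && r + 1 < inner.length && inner[r + 1] == '"')
            ++r;
    }
    return inner[0 .. w];
}

/// Maps `path` copy-on-write and counts its fields (hypothetical helper).
size_t countFields(string path)
{
    // Private mapping: in-place edits never reach the file on disk.
    scope mmf = new MmFile(path, MmFile.Mode.readCopyOnWrite, 0, null);
    size_t n;
    eachField(cast(char[]) mmf[], (size_t lineNo, char[] field) { ++n; });
    return n;
}
```

The slices are only valid while the mapping is alive, so anything that has to outlive the `MmFile` object still needs a copy (for example via `.idup`).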
These are REALLY USEFUL optimizations! Thanks, Marc.
However, I'm still no better off with the memory usage. I've reduced
the application to just loading a CSV file into structs.
Here is void main:
````
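import core.memory : GC;
import std.csv : csvReader;
import std.file : readText;
import std.stdio : writefln;
// `securities` comes from source/lib/security.d (import not shown).

void main(string[] args)
{
    // Read the whole file into one GC-allocated string.
    auto text = readText(args[1]);
    // A null header makes csvReader take the column names from the first
    // row; each record is then a string[string] keyed by column name.
    foreach (record; csvReader!(string[string])(text, null))
    {
        if (!record["RIC"] || !record["TRBCIndCode"]) {
            continue;
        }
        // Add a Security to Securities
        securities.add(record["RIC"], record["TRBCIndCode"],
                       record, []);
    }
    delete text;
    GC.collect();
    writefln("%d securities processed", securities.length);
    writefln("Securities : %d MB", securities.bytes/1024/1024);
    // Keep the process alive so its memory usage can be inspected.
    import core.thread;
    Thread.sleep(dur!"seconds"(60));
}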
````
The output is:
````
make screener-d-simple; ./screener-d data/instruments-clean.csv
dmd -vgc -ofscreener-d source/simplemain.d source/lib/security.d
source/simplemain.d(30): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(30): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(35): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(35): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(38): vgc: 'delete' requires GC
source/lib/security.d(105): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(111): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(113): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(115): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(118): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(122): vgc: operator ~= may cause GC allocation
source/lib/security.d(123): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(164): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(164): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(173): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(173): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(182): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(182): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(191): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(191): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(203): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-213(213): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-213(213): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-219(219): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-219(219): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-225(225): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-225(225): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-231(231): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-231(231): vgc: indexing an associative array may cause GC allocation
20066 securities processed
Securities : 188 MB
````
And yet memory usage is 617 MB.
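One way to narrow down where the gap between the two figures comes from might be to ask the GC how much of its heap is actually in use and how much is merely reserved. The following is only a sketch, assuming a druntime recent enough to provide `GC.stats`; the helper names `reportGcHeap` and `collectAndTrim` are made up for illustration and are not part of the program above.

```d
import core.memory : GC;
import std.stdio : writefln;

/// Prints used versus free-but-reserved GC heap (hypothetical helper).
void reportGcHeap(string label)
{
    const s = GC.stats();
    writefln("%s: %d MB used, %d MB free in GC heap",
             label, s.usedSize / 1024 / 1024, s.freeSize / 1024 / 1024);
}

/// Collects garbage, then asks the GC to return free memory to the OS.
void collectAndTrim()
{
    reportGcHeap("before");
    GC.collect();
    GC.minimize();   // a request only; the GC may keep some pages reserved
    reportGcHeap("after");
}
```

Calling `collectAndTrim()` in place of the bare `GC.collect()` would show whether the extra memory is live data, free space the GC is holding on to, or something outside the GC heap altogether.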