On Thursday, 16 April 2015 at 17:13:25 UTC, Laeeth Isharc wrote:
Fwiw, I have been working on something similar. Others will have more experience with the GC, but perhaps you might find this interesting.

For CSV files, what I found is that parsing is quite slow (and memory-intensive). So rather than parse the same data every time, I found it helpful to do it once in a batch that runs on a cron job, and write the result out in msgpack format.

I am not a GC expert, but what happens if you run GC.collect() once you are done parsing?

auto loadGiltPrices()
{
        auto data=cast(ubyte[])std.file.read("/hist/msgpack/dmo.pack");
        return cast(immutable)data.unpack!(GiltPriceFromDMO[][string]);
}

struct GiltPriceFromDMO
{
        string name;
        string ISIN;
        KPDateTime redemptionDate;
        KPDateTime closeDate;
        int indexLag;
        double cleanPrice;
        double dirtyPrice;
        double accrued;
        double yield;
        double modifiedDuration;
}

void main(string[] args)
{
        auto gilts=readCSVDMO();
        ubyte[] data=pack(gilts);
        std.file.write("dmo.pack",data);
        writefln("* done");
        data=cast(ubyte[])std.file.read("dmo.pack");
}
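On the GC.collect() question: a collection alone returns dead memory to the GC's own pools, but not necessarily to the operating system, which is one reason resident memory (as seen by ps) can stay high after parsing. Following the collection with GC.minimize() asks druntime to release completely free pools back to the OS. A minimal sketch (reclaimAfterParse is just an illustrative name):

```d
import core.memory : GC;

void reclaimAfterParse()
{
    // Collect garbage left over from CSV parsing...
    GC.collect();
    // ...then ask druntime to return completely free pools to the OS.
    // Resident memory as reported by ps may only drop after this call.
    GC.minimize();
}

void main()
{
    reclaimAfterParse();
}
```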

On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
I've written a simple socket-server app that serves securities (stock market shares) data and allows clients to query it. The app starts by loading instrument information from a CSV file into some structs, then listens on a socket, responding to queries. It doesn't mutate the data or allocate anything substantial.

There are two main structs in the app. One stores security data, and the other groups securities together. They are defined as follows:

````d
__gshared Securities securities;

struct Security
{
    string RIC;
    string TRBC;
    string[string] fields;
    double[string] doubles;

    @nogc @property pure size_t bytes()
    {
        size_t bytes;

        bytes = RIC.sizeof + RIC.length;
        bytes += TRBC.sizeof + TRBC.length;

        foreach (k, v; fields) {
            bytes += k.sizeof + k.length + v.sizeof + v.length;
        }

        foreach (k, v; doubles) {
            bytes += k.sizeof + k.length + v.sizeof;
        }

        return bytes + Security.sizeof;
    }
}

struct Securities
{
    Security[] securities;
    private size_t[string] rics;

    // Store offsets for each TRBC group
    ulong[2][string] econSect;
    ulong[2][string] busSect;
    ulong[2][string] IndGrp;
    ulong[2][string] Ind;

    @nogc @property pure size_t bytes()
    {
        size_t bytes;

        foreach (Security s; securities) {
            // s.bytes already includes Security.sizeof, so adding
            // s.sizeof here as well would double-count the struct.
            bytes += s.bytes;
        }

        foreach (k, v; rics) {
            bytes += k.sizeof + k.length + v.sizeof;
        }

        foreach (k, v; econSect) {
            bytes += k.sizeof + k.length + v.sizeof;
        }

        foreach (k, v; busSect) {
            bytes += k.sizeof + k.length + v.sizeof;
        }

        foreach (k, v; IndGrp) {
            bytes += k.sizeof + k.length + v.sizeof;
        }

        foreach (k, v; Ind) {
            bytes += k.sizeof + k.length + v.sizeof;
        }

        return bytes + Securities.sizeof;
    }
}
````

Calling Securities.bytes reports about 188 MB, but "ps" shows about 591 MB of resident memory. Where is the extra memory coming from? What am I missing?

Laeeth,

GC.collect() made no difference. It seems the memory is being held by the data structures above. I think I may not be accounting for hash table usage properly, or it could be something else.
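The hash table suspicion is plausible: D's built-in associative arrays allocate a bucket array plus a heap entry per key (hash, key slice, value), none of which the k.sizeof + k.length sum captures, and the GC rounds each allocation up to a block size. One way to see the real footprint is to compare the GC's used size before and after building a comparable AA. A sketch, assuming a druntime recent enough to provide GC.stats, with measureAAOverhead an illustrative name:

```d
import core.memory : GC;
import std.conv : to;
import std.stdio : writefln;

// Compare the naive "k.sizeof + k.length + v.sizeof" estimate against
// the actual growth of the GC heap when building a similar AA.
size_t[2] measureAAOverhead(size_t entries)
{
    immutable before = GC.stats().usedSize;

    double[string] doubles;
    foreach (i; 0 .. entries)
        doubles[i.to!string] = i;

    immutable after = GC.stats().usedSize;

    size_t naive;
    foreach (k, v; doubles)
        naive += k.sizeof + k.length + v.sizeof;

    // The real cost also includes the bucket array, one heap-allocated
    // entry per key, and per-allocation block rounding, so the second
    // number is normally well above the first.
    return [naive, after - before];
}

void main()
{
    auto r = measureAAOverhead(100_000);
    writefln("naive estimate: %s bytes, actual GC growth: %s bytes",
             r[0], r[1]);
}
```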

I only need to work with interday data for now, so the CSV load speed doesn't bother me at the moment. Great idea on using an mmfile with msgpack! I will try that out.
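For the record, a minimal sketch of the mmfile-plus-msgpack load, assuming the msgpack-d dub package (which provides the unpack used in the code above) and a placeholder payload type:

```d
import std.mmfile : MmFile;
import msgpack : unpack; // msgpack-d dub package, as above

// double[string] is a hypothetical payload type for illustration.
double[string] loadPacked(string path)
{
    // Memory-map the file rather than std.file.read: the OS pages the
    // bytes in on demand instead of copying the whole file onto the
    // GC heap up front.
    auto mmf = new MmFile(path);
    auto bytes = cast(ubyte[]) mmf[];
    return bytes.unpack!(double[string]);
}
```

One caveat: if the deserialized result holds slices into the mapped buffer, the MmFile must outlive it, so keeping a reference to the MmFile alongside the data is the safe option.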

Adil
