Nemec, Bernhard wrote:

(First of all, thanks for creating a simple test app)

No, I don't want to overwrite anything. I expect the data file to grow, and
because of MMF, I also expect the size of allocated *virtual memory* to
grow. But I don't want the *physical* RAM usage to grow endlessly.

Hmm... I'm not sure what is going on. I took your code, switched to blocked views, made an optimization so it would run faster, and had to take it *way* up to 95M rows before its resident size exceeded the RAM on my Linux box. And indeed, it starts thrashing. At first I thought Linux was simply using up memory because it was all free, but it keeps going and pushes pages out to swap space:


[...]
last added row was no. 94399999
  VmSize:        2083072 kB
  VmLck:               0 kB
  VmRSS:          729096 kB
  VmData:         794820 kB
  VmStk:               8 kB
  VmExe:              24 kB
  VmLib:            2368 kB
last added row was no. 94499999

As you can see, I switched to blocked views:

$ sdx mkinfo test.db
test.db
  Metakit data starts at offset 0 and is stored as little-endian
  Views:
  94401x test2 {{_B {id:I name}}}
$

And swapping is obvious:

$ vmstat 3
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0 140112   4312   7896  13448    0    2    27    23   24    18  1  0 95  3
 1  0 141808   6092   6904  13028  961 1272  3161  2479 1727  1200 33  9  0 58
 0  1 143680   6420   6008  14604  993 1357  3083  2543 1627   955 41  9  0 51
 1  0 161000   4620   5384  10652 1673 5779  2205  6857 1351   395 43  5  0 52
 0  3 162836   3352   5292  13092 1727 4520  3781  6171 1573   945 42 11  0 48
 0  1 176228   4680   3496  15628 1795 1145  4045  2880 1690  1530 30  5  0 65
 1  0 177392   5048   3364  14200 1432  984  2680  2804 1601   797 47  8  0 45
 1  3 180332   3448   3328  11932 1143 1647  2699  3305 1640   834 41 10  0 49
 1  0 199456   3592   3392  13484 1917 6375  2373  7743 1292   429 30  4  0 66
 1  1 220064   5636   3356  14472 2647 7493  4469  7535 1875  1843 30  9  0 60
[...]


Now compare this with a datafile which contains three 32-bit ints instead of your small string:

last added row was no. 99199999
  VmSize:          59220 kB
  VmLck:               0 kB
  VmRSS:           57400 kB
  VmData:          56760 kB
  VmStk:               8 kB
  VmExe:              24 kB
  VmLib:            2368 kB
last added row was no. 99299999

$ sdx mkinfo test.db
test.db
  Metakit data starts at offset 0 and is stored as little-endian
  Views:
  99301x test2 {{_B {id:I n1:I n2:I n3:I}}}
$ ls -l test.db
-rw-r--r--    1 jcw      users    1536395782 Jan 28 18:10 test.db
$

This is more like it: 57 MB resident for a 1.5 GB datafile (and no swapping at all now).

There are no memory leaks in MK as far as I know, so the only conclusion is that MK holds on to a lot more temporary data with string properties. That is not totally surprising: variable-sized strings take quite a bit of bookkeeping to track in columns, and there are 94,401 string columns in that first test above (one per subview in the blocked view).

The math points to about 8.4 KB resident per string column, which is in fact not surprising at all: internal buffers get malloc'ed in 4 KB chunks (the slack is for dealing with potential growth), and there are two vectors for each string column: byte data and string sizes.

It looks like MK is keeping all subview string buffers allocated, not just mapped to file. I'm surprised.

Note that all of the above uses blocked views. The problem just mentioned does not happen when using a single view, but then commits slow down dramatically as the number of rows increases: I'd say by at least two orders of magnitude, as far as I got.

The good news, though, is that the behavior you describe does *not* carry forward once the datafile grows much further. I do see some fluctuations:

last added row was no. 12099999
  VmSize:         473956 kB
  VmLck:               0 kB
  VmRSS:            1376 kB
  VmData:            400 kB
  VmStk:               8 kB
  VmExe:              24 kB
  VmLib:            2368 kB
last added row was no. 12199999
  VmSize:         474824 kB
  VmLck:               0 kB
  VmRSS:            2244 kB
  VmData:           1268 kB
  VmStk:               8 kB
  VmExe:              24 kB
  VmLib:            2368 kB
last added row was no. 12299999
  VmSize:         474316 kB
  VmLck:               0 kB
  VmRSS:            1736 kB
  VmData:            760 kB
  VmStk:               8 kB
  VmExe:              24 kB
  VmLib:            2368 kB
last added row was no. 12399999

But memory usage really does not increase in the way you assume:

last added row was no. 16099999
  VmSize:         635836 kB
  VmLck:               0 kB
  VmRSS:            1536 kB
  VmData:            560 kB
  VmStk:               8 kB
  VmExe:              24 kB
  VmLib:            2368 kB
last added row was no. 16199999
  VmSize:         635840 kB
  VmLck:               0 kB
  VmRSS:            1540 kB
  VmData:            564 kB
  VmStk:               8 kB
  VmExe:              24 kB
  VmLib:            2368 kB
last added row was no. 16299999

(The datafile was at 700 MB at this point.)

-jcw

PS. Below are the changes I made. It's better not to create temporary property values inside a loop; use a row object defined outside it instead:

c4_View testview = storage.GetAs("test2[_B[id:I,n1:I,n2:I,n3:I]]").Blocked();
// plain (unblocked) variant, for comparison:
//c4_View testview = storage.GetAs("test1[id:I,n1:I,n2:I,n3:I]");

c4_IntProp id("id"), n1("n1"), n2("n2"), n3("n3");
c4_StringProp name("name");

for (int j = 0; j < 1000; ++j) {
  ShowMemoryInfo();

  // one row object per batch; only "id" changes inside the inner loop
  c4_Row r;
  n1(r) = 1234567;
  n2(r) = 2345678;
  n3(r) = 3456789;

  int n;

  for (int i = 0; i < 100000; i++) {
    id(r) = i;
    n = testview.Add(r);  // Add() returns the index of the new row
  }

[...]

_____________________________________________
Metakit mailing list  -  [EMAIL PROTECTED]
http://www.equi4.com/mailman/listinfo/metakit