Hi Stuart, Steve,

I've taken deeper look into code. I still didn't trace carefully index calculation in classes BitMaskMappingArray and ArrayBitMaskMappingStrategy, but I managed to improve performance by increasing arrays size in those classes (which is set in HProfFile class).

If I understand code correctly, when capacity of BitMaskMappingArray will be exhausted bucketSize is doubled, which in turn causes that more reads (even cached) is required to set position of IDataProvider/IArrayEntryProvider.

Following are loading time results for default array size (1000) and increased (1000000). Test ran against generated dump file (5000000 instances of Data).
Default (1000):
HeapSubRecord: 100000 (866ms, 4215kB)
HeapSubRecord: 200000 (1716ms, 7879kB)
HeapSubRecord: 300000 (2833ms, 11263kB)
HeapSubRecord: 400000 (3889ms, 14283kB)
HeapSubRecord: 500000 (3893ms, 17319kB)
HeapSubRecord: 600000 (7248ms, 20479kB) (here probably buckedSize was doubled)
HeapSubRecord: 700000 (7721ms, 23531kB)
HeapSubRecord: 800000 (7729ms, 26567kB)
HeapSubRecord: 900000 (7731ms, 29671kB)
HeapSubRecord: 1000000 (7704ms, 32731kB)
... (I didn't wait until end)

Increased(1000000):
HeapSubRecord: 100000 (622ms, 17809kB)
HeapSubRecord: 200000 (309ms, 20345kB)
HeapSubRecord: 300000 (283ms, 23861kB)
HeapSubRecord: 400000 (274ms, 27921kB)
HeapSubRecord: 500000 (269ms, 29957kB)
HeapSubRecord: 600000 (264ms, 31993kB)
HeapSubRecord: 700000 (272ms, 36097kB)
HeapSubRecord: 800000 (288ms, 37739kB)
HeapSubRecord: 900000 (263ms, 39835kB)
HeapSubRecord: 1000000 (259ms, 41931kB)
HeapSubRecord: 1100000 (300ms, 44773kB)
HeapSubRecord: 1200000 (283ms, 46901kB)
HeapSubRecord: 1300000 (291ms, 49029kB)
HeapSubRecord: 1400000 (328ms, 53801kB)
HeapSubRecord: 1500000 (259ms, 53801kB)
HeapSubRecord: 1600000 (272ms, 58125kB)
HeapSubRecord: 1700000 (264ms, 60293kB)
HeapSubRecord: 1800000 (264ms, 62473kB)
HeapSubRecord: 1900000 (361ms, 61373kB)
HeapSubRecord: 2000000 (274ms, 63105kB)
...
HeapSubRecord: 9000000 (284ms, 231969kB)
HeapSubRecord: 9100000 (272ms, 233597kB)
HeapSubRecord: 9200000 (281ms, 236357kB)
HeapSubRecord: 9300000 (274ms, 240469kB)
HeapSubRecord: 9400000 (279ms, 244541kB)
HeapSubRecord: 9500000 (269ms, 246549kB)
HeapSubRecord: 9600000 (279ms, 250565kB)
HeapSubRecord: 9700000 (265ms, 252573kB)
HeapSubRecord: 9800000 (279ms, 256629kB)
HeapSubRecord: 9900000 (265ms, 258669kB)
HeapSubRecord: 10000000 (463ms, 263997kB)
(end)

i.e. my 60GB dump file contains more than 1 100 000 000 of objects (if I remember correctly).

Regards
Lukasz


Stuart Monteith wrote:
The hprof dump reader spends a lot of time reading the whole file, for various reason. The indices it has in memory are constructed through an initial read, and this is also the source of the memory usage. In addition, there is some correlation to be done which
also takes up time, and induces yet more reading.

I'm sure some work could be done to improve the performance further, but we'll have to look at the tradeoff between diskspace and memory usage. The hprof file format itself is what it is, however, and we have no influence over that. The CJVMTI agent is has lots of room for improvement, but I suspect its potential for improvement is unlikely to be much better than existing hprof implementations. The built-in JVM hprof dumper will probably be a hard act
to follow.

The HProf implementation is not thread-safe. Realistically, I think it is something that ought to be considered once things are more mature. There will be algorithms that can deal with the JVM
structure sensible.

And thanks Lukasz, it's great to have your input.

Regards,
    Stuart

Reply via email to