Re: Review Request 113355: Reduce UDSEntry memory usage with implicit sharing

Frank Reininghaus Thu, 24 Oct 2013 15:35:13 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/113355/
-----------------------------------------------------------


(Updated Oct. 24, 2013, 10:34 p.m.)


Review request for kdelibs.


Changes
-------

This new patch is by no means final. It's mainly supposed to make it easier to 
understand what I tested by including a simple "ListJob" executable to measure 
the raw memory usage of UDSEntry (based on Mark's draft - thanks for that! - 
and of course not supposed to be committed), and by including a version which 
sacrifices the O(1) access of QHash in favor of a QVector<uint> which keeps the 
"uds" values.

I have measured the memory usage and the profiling data from the 
udsentrybenchmark (included in this patch). The results are at 
https://docs.google.com/spreadsheet/ccc?key=0Al_c1ZMwqRICdEtaUy1mWXN3bl9wWnJlQ045eHRzT3c&usp=sharing

Benchmarks were taken on my rather old machine (AMD Athlon 64 X2 5000+ 
processor, 4 GB RAM, openSUSE 12.3, 64 bit).

For all memory and time measurements, I repeated the measurement 7 times and 
took the median value.


Some remarks:


1. About Milian's question how much of the memory saving is due to the implicit 
sharing of the QStrings:

Listing 500,000 files with current master takes 405.6 MB. Implicit sharing of 
QStrings only (http://pastebin.kde.org/p52a24b49) reduces that to 345. On the 
other hand, r2 of this review request with the implicit sharing of QStrings 
disabled (Version 5 in the spreadsheet) takes 207.2 MB, and the full r2 (which 
combines QString implicit sharing with the implicit sharing of the QHash) 146.2 
MB.

So the answer is: The shared QStrings are responsible for only a small part of 
the saving.


2. About the QHash vs. QMap question (Version 1 vs. Version 3 in my table): 
there are only minor differences. QMap needs less memory, but the difference is 
not really significant. I guess that this is because QMap also allocates a node 
for every stored item, including a few pointers and some overhead caused by the 
memory allocator.


3. r2 of this request (Version 4 in the table) is significantly slower than the 
other variants when inserting fields into the UDSEntry (this happens inside the 
kioslave). Much of this can be fixed by reserving sufficient space for the 
QVector before inserting though (Version 6). Funnily enough, the source of 
kio_file contains a commented-out "entry.reserve(8)", so it seems that it was 
possible at some point to reserve memory for the UDSEntry in advance. Maybe one 
might consider restoring that feature.


4. Dropping the QHash completely and storing the "uds" values in a simple 
QVector<int> udsIndexes, where udsIndexes[i] = uds means that fields[i] stores 
the "Field" for "uds" (Version 7, the current r3 of this review request) is 
significantly faster than all other solutions for the "8 fields in the 
UDSEntry" case. I haven't tried yet how it scales if there are more entries in 
the UDSEntry. It should be easy to try it by modifying the benchmark, but I 
really have to go to sleep now ;-) I think it could be that using, e.g., 20 
fields instead of 8 won't make a big difference at all, because the benchmarks 
(which I hope are similar to typical uses of UDSEntry in kioslaves and 
applications) most likely do not spend a lot of time in the index lookup code.


5. I haven't tried every possible approach because I didn't have enough time. 
Further possibilities would be to replace QVector by std::vector (at least the 
one that is not shared), or to try the std::vector<std::pair<unit, Field> > 
idea that Jan suggested (but I would not expect any big benchmark difference to 
r3 of this request).


Repository: kdelibs


Description
-------

This patch is based on some discussions on the kfm-devel mailing list, see 
http://lists.kde.org/?t=137935784800003&r=1&w=2

Mark found out that KIO's UDSEntry class is one of the major consumers of 
memory in applications which use KIO to list directories with a large number of 
files, and I found a way use implicit sharing to drastically reduce the amount 
of memory it needs. Many thanks to Milian for his great blog post 
http://milianw.de/blog/katekdevelop-sprint-vienna-2012-take-1, without which I 
would probably not have had such ideas.


1. The problem

The UDSEntry keeps all sorts of information about files which can be stored in 
a string (name, user, group, etc.) or in a long long (file size, modification 
time, etc.). All these data can be accessed with a uint key, and UDSEntry 
returns the corresponding QString or long long in O(1) time. Internally, it 
stores the data in a QHash<uint, Field>, where Field is a struct that has a 
QString and a long long member.

The problem is that QHash needs quite a lot of memory to provide the O(1) 
access, see http://doc.qt.digia.com/qq/qq19-containers.html for details, and 
that the minimum capacity of a QHash seems to be 17, even though the number of 
entries in a UDSEntry is often 8 in the rather typical standard kio_file use 
case.


2. Proposed solution

(a) We can store the "Fields" in a QVector<Field>, which needs only very little 
overhead compared to the memory that the actual "Fields" need. When loading a 
UDSEntry from a QDataStream, we just append all "Fields" to this QVector one by 
one. Moreover, we need a QHash<uint, int>, which maps each key to the index of 
its Field in the QVector. This restructuring alone does not reduce the memory 
usage, of course.

(b) The key observation is that the QDataStream, which KIO::ListJob reads the 
UDSEntries from, typically provides the different "Fields" in exactly the same 
order. This means that the QHash<uint, int> is usually exactly the same for 
every UDSEntry, and we can make use of implicit sharing to store only one copy 
of this QHash. I've modified

void UDSEntryPrivate::load(QDataStream &s, UDSEntry &a)

such that it remembers the most recent QHash<uint, int> and just adds an 
implicitly shared copy of it to "a" if the order of the Fields has not changed.

(c) Moreover, some of the QString Fields in the UDSEntries in one directory are 
often the same, like, e.g., the user and the group. The "load" function also 
remembers the most recently read values for each Field in a static 
QVector<QString> and just puts an implicitly shared copy into the UDSEntry if 
possible.


3. Possible disadvantages

(a) When using the "remove" member, the new version of UDSEntry does not remove 
the Field from the QVector<Field>. This means that removing and adding a 
"Field" repeatedly would let the memory usage grow indefinitely. According to 
David (http://lists.kde.org/?l=kfm-devel&m=138052519927973&w=2), this doesn't 
matter though because no known user of UDSEntry uses its remove() member. Maybe 
we should remove remove (pun stolen grom David) in the frameworks branch then?

(b) In principle, the old version of UDSEntryPrivate::load(QDataStream&, 
UDSEntry&) was reentrant. This is not the case for my changed version. 
Reentrancy could be restored rather easily by protecting the access to the 
static data with a mutex, but given that most of KIO is not supposed to be used 
from outside the main thread AFAIK, I don't know if this is necessary.


4. Changes since the first version of the patch which I posted in 
http://lists.kde.org/?l=kfm-devel&m=137962995805432&w=2

(a) Implemented the minor changes suggested by David in 
http://lists.kde.org/?l=kfm-devel&m=137975442807965&w=2

(b) Added a unit test to verify that storing and loading UDSEntries from a 
stream works even if the order of the fields is permuted, and some fields are 
removed or added in between.

(c) Fixed a bug which was uncovered by the test: 
cachedUdsFields.erase(cachedUdsFields.begin() + i, cachedUdsFields.end()) 
instead of cachedUdsFields.erase(cachedUdsFields.begin() + i)

(d) Use QVector::reserve to reserve the appropriate size for the 
QVector<Field>. Saves some time when loading the UDSEntry, and reduces the 
memory usage further.

(e) Changed the type of the loop variable from quint32 to int to fix some 
compiler warnings.


Diffs (updated)
-----

  kio/kio/udsentry.cpp 1e1f503 
  kio/tests/CMakeLists.txt 1019312 
  kio/tests/simplelistjobtest.cpp PRE-CREATION 
  kio/tests/udsentrybenchmark.cpp PRE-CREATION 
  kio/tests/udsentrytest.h PRE-CREATION 
  kio/tests/udsentrytest.cpp PRE-CREATION 

Diff: http://git.reviewboard.kde.org/r/113355/diff/


Testing
-------

Old and new unit tests pass. The memory usage of Dolphin when loading a 
directory with 100,000 files in Icons View is reduced from 165.4 MB to 113.3 
MB. Any application that uses a file dialog, a KDirLister, or anything else 
that uses a KIO::ListJob to list directory contents should benefit from similar 
savings.


Thanks,

Frank Reininghaus

Re: Review Request 113355: Reduce UDSEntry memory usage with implicit sharing

Reply via email to