-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/113355/#review42745
-----------------------------------------------------------


I see now that I have tried to put too much stuff into a single patch - it's 
too hard to digest and to understand, and the number of possibilities to modify 
different aspects of UDSEntry in a different way is just too large for me to 
test all of them.

Still, I consider it a bug that every KDE application that deals directly or 
indirectly with UDSEntry wastes ~half a kilobyte of memory for every single 
file, and considering that this can be fixed by modifying just a single .cpp 
file, I would very much like to see at least a small improvement in the not too 
distant future.

Maybe the best approach is to discard this review request, and open a new one 
that just adds the unit test (it never hurts to have more of them IMHO) and 
makes use of the implicit sharing of QStrings. That would at least bring us 
some of the memory savings, and it would be a rather straightforward change 
that would hopefully still be considered safe enough to include in kdelibs 4.12.

@dfaure: I took from our discussion on the mailing list that you agree with the 
idea to reduce UDSEntry's memory usage. What do you think?

- Frank Reininghaus


On Oct. 26, 2013, 5:56 p.m., Frank Reininghaus wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/113355/
> -----------------------------------------------------------
> 
> (Updated Oct. 26, 2013, 5:56 p.m.)
> 
> 
> Review request for kdelibs.
> 
> 
> Repository: kdelibs
> 
> 
> Description
> -------
> 
> This patch is based on some discussions on the kfm-devel mailing list, see 
> http://lists.kde.org/?t=137935784800003&r=1&w=2
> 
> Mark found out that KIO's UDSEntry class is one of the major consumers of 
> memory in applications which use KIO to list directories with a large number 
> of files, and I found a way use implicit sharing to drastically reduce the 
> amount of memory it needs. Many thanks to Milian for his great blog post 
> http://milianw.de/blog/katekdevelop-sprint-vienna-2012-take-1, without which 
> I would probably not have had such ideas.
> 
> 
> 1. The problem
> 
> The UDSEntry keeps all sorts of information about files which can be stored 
> in a string (name, user, group, etc.) or in a long long (file size, 
> modification time, etc.). All these data can be accessed with a uint key, and 
> UDSEntry returns the corresponding QString or long long in O(1) time. 
> Internally, it stores the data in a QHash<uint, Field>, where Field is a 
> struct that has a QString and a long long member.
> 
> The problem is that QHash needs quite a lot of memory to provide the O(1) 
> access, see http://doc.qt.digia.com/qq/qq19-containers.html for details, and 
> that the minimum capacity of a QHash seems to be 17, even though the number 
> of entries in a UDSEntry is often 8 in the rather typical standard kio_file 
> use case.
> 
> 
> 2. Proposed solution
> 
> (a) We can store the "Fields" in a QVector<Field>, which needs only very 
> little overhead compared to the memory that the actual "Fields" need. When 
> loading a UDSEntry from a QDataStream, we just append all "Fields" to this 
> QVector one by one. Moreover, we need a QHash<uint, int>, which maps each key 
> to the index of its Field in the QVector. This restructuring alone does not 
> reduce the memory usage, of course.
> 
> (b) The key observation is that the QDataStream, which KIO::ListJob reads the 
> UDSEntries from, typically provides the different "Fields" in exactly the 
> same order. This means that the QHash<uint, int> is usually exactly the same 
> for every UDSEntry, and we can make use of implicit sharing to store only one 
> copy of this QHash. I've modified
> 
> void UDSEntryPrivate::load(QDataStream &s, UDSEntry &a)
> 
> such that it remembers the most recent QHash<uint, int> and just adds an 
> implicitly shared copy of it to "a" if the order of the Fields has not 
> changed.
> 
> (c) Moreover, some of the QString Fields in the UDSEntries in one directory 
> are often the same, like, e.g., the user and the group. The "load" function 
> also remembers the most recently read values for each Field in a static 
> QVector<QString> and just puts an implicitly shared copy into the UDSEntry if 
> possible.
> 
> 
> 3. Possible disadvantages
> 
> (a) When using the "remove" member, the new version of UDSEntry does not 
> remove the Field from the QVector<Field>. This means that removing and adding 
> a "Field" repeatedly would let the memory usage grow indefinitely. According 
> to David (http://lists.kde.org/?l=kfm-devel&m=138052519927973&w=2), this 
> doesn't matter though because no known user of UDSEntry uses its remove() 
> member. Maybe we should remove remove (pun stolen grom David) in the 
> frameworks branch then?
> 
> (b) In principle, the old version of UDSEntryPrivate::load(QDataStream&, 
> UDSEntry&) was reentrant. This is not the case for my changed version. 
> Reentrancy could be restored rather easily by protecting the access to the 
> static data with a mutex, but given that most of KIO is not supposed to be 
> used from outside the main thread AFAIK, I don't know if this is necessary.
> 
> 
> 4. Changes since the first version of the patch which I posted in 
> http://lists.kde.org/?l=kfm-devel&m=137962995805432&w=2
> 
> (a) Implemented the minor changes suggested by David in 
> http://lists.kde.org/?l=kfm-devel&m=137975442807965&w=2
> 
> (b) Added a unit test to verify that storing and loading UDSEntries from a 
> stream works even if the order of the fields is permuted, and some fields are 
> removed or added in between.
> 
> (c) Fixed a bug which was uncovered by the test: 
> cachedUdsFields.erase(cachedUdsFields.begin() + i, cachedUdsFields.end()) 
> instead of cachedUdsFields.erase(cachedUdsFields.begin() + i)
> 
> (d) Use QVector::reserve to reserve the appropriate size for the 
> QVector<Field>. Saves some time when loading the UDSEntry, and reduces the 
> memory usage further.
> 
> (e) Changed the type of the loop variable from quint32 to int to fix some 
> compiler warnings.
> 
> 
> Diffs
> -----
> 
>   kio/kio/udsentry.h e1c8b05 
>   kio/kio/udsentry.cpp 1e1f503 
>   kio/tests/CMakeLists.txt 1019312 
>   kio/tests/simplelistjobtest.cpp PRE-CREATION 
>   kio/tests/udsentrybenchmark.cpp PRE-CREATION 
>   kio/tests/udsentrytest.h PRE-CREATION 
>   kio/tests/udsentrytest.cpp PRE-CREATION 
> 
> Diff: http://git.reviewboard.kde.org/r/113355/diff/
> 
> 
> Testing
> -------
> 
> Old and new unit tests pass. The memory usage of Dolphin when loading a 
> directory with 100,000 files in Icons View is reduced from 165.4 MB to 113.3 
> MB. Any application that uses a file dialog, a KDirLister, or anything else 
> that uses a KIO::ListJob to list directory contents should benefit from 
> similar savings.
> 
> 
> Thanks,
> 
> Frank Reininghaus
> 
>

Reply via email to