Re: Review Request 118452: Reduce the memory usage of UDSEntry by using QVector, rather than QHash, for the internal data storage

David Faure Sat, 07 Jun 2014 06:53:21 -0700


> On June 6, 2014, 10:34 p.m., David Faure wrote:
> > Interestingly, I had a benchmark to compare a number of data structures for 
> > UDSEntry (which made me turn the Qt3 QList into the Qt4 QHash). That 
> > benchmark is the commented-out "newApiPerformance()" method in jobtest.cpp.
> > 
> > I just extended it to include your suggestion: 
> > http://www.davidfaure.fr/2014/udsentry_benchmark.diff
> > 
> > Here are the results I get (in debug mode, because otherwise I'd have to 
> > make sure the compiler doesn't optimize away the useless iterations loops 
> > :))
> > 
> > Old API: slave code: 5
> > Old API: app code: 5
> > QHash+QVariant API: slave code: 9
> > QHash+QVariant API: app code: 4
> > QHash+struct API: slave code: 7
> > QHash+struct API: app code: 3
> > QMap+struct API: slave code: 8
> > QMap+struct API: app code: 4
> > Frank's API: slave code: 13
> > Frank API: app code: 6
> > 
> > So, unless I'm missing something, this is much slower for the slave (but we 
> > could decide to just not use UDSEntry in the slave, as discussed on k-f-d), 
> > but it's also slower on the app side....
> > 
> > Can you check if I did something stupid when grabbing your code or writing 
> > the testcode for it?
> > Otherwise the next step is to actually run this with -O2, ensuring the 
> > loops still loop.
> 
> Milian Wolff wrote:
>     I'd say a benchmark should use QBENCHMARK. Also, I'm afraid the results 
> you posted are meaningless when they come from a non-optimized, esp. 
> considering that the differences are so small. I suggest someone picks this 
> test up and makes it a proper benchmark. To prevent the optimizer from 
> kicking in, actually use the results in one way or another. E.g. calculate 
> the sum of string lengths returned and number values for the read operation. 
> And of course also compare the sum to some expected value.
> 
> David Faure wrote:
>     Yeah - this code is very old, it predates Q_BENCHMARK :-)
>     
>     "the differences are small" - not really, these are entire seconds, due 
> to large amount of iterations.
>     
>     But yep, let me convert all this to a proper benchmark. Coming up.


Done, see kio/autotests/udsentry_benchmark.

Results (in release mode, with ./kiocore-udsentry_benchmark -minimumvalue 1000)

KDE3:          slave=1123 app=1411
Hash+Variant:  slave=1856 app=1605
Hash+Struct:   slave=1051 app=1418  // the kde4 solution
Map+Struct:    slave=1418 app=1427
TwoVectors:    slave=1204 app=1390  // Frank's suggestion in this RR.

Not bad :-)

Again, please double-check the test and the results....


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/118452/#review59479
-----------------------------------------------------------


On June 1, 2014, 1:50 p.m., Frank Reininghaus wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/118452/
> -----------------------------------------------------------
> 
> (Updated June 1, 2014, 1:50 p.m.)
> 
> 
> Review request for KDE Frameworks and David Faure.
> 
> 
> Repository: kio
> 
> 
> Description
> -------
> 
> I am continuing to split up https://git.reviewboard.kde.org/r/113355/ , which 
> attempts to make UDSEntry more efficient memory and CPU-wise, into 
> independent parts. This is the third step after 
> https://git.reviewboard.kde.org/r/113591/ and 
> https://git.reviewboard.kde.org/r/115739/ .
> 
> The present patch modifies the internal data storage of UDSEntry. UDSEntry 
> contains a mapping from unsigned int keys to "Field" values, where Field is a 
> struct that contains a QString and a long long (some keys correspond to 
> numeric values, like size, date, etc, whereas others, like user and group, 
> correspond to a QString).
> 
> Currently, UDSEntry stores all data in a QHash<uint, Field> internally. This 
> ensures that everything can be accessed in O(1) time, but is not very 
> efficient memory-wise because a separate memory allocation is done for each 
> hash node.
> 
> I propose to change this and store both the uint keys and the Field values in 
> a QVector each. This means that accessing a value becomes a O(n) operation, 
> since the entire QVector of keys may need to be scanned in order to find a 
> value, but because the number n of values in a UDSEntry is usually quite 
> small and can currently not exceed a number ~100, this should not matter in 
> practice.
> 
> Some parts of https://git.reviewboard.kde.org/r/113355/ are still missing:
> 
> (a) The QVectors which store the keys (which are usually the same for all 
> items in a directory) are not shared yet. Changing this would reduce the 
> memory usage further, but I decided to keep this change separate in order to 
> keep the current patch small and easy to understand. Moreover, this makes it 
> easier to benchmark other similar approaches (e.g., replace QVector by 
> std::vector, or store keys and values together in a 
> std::vector<std::pair<uint,Field>>).
> 
> (b) No space is reserved in the vectors when key/value pairs are inserted one 
> by one. Implementing this would make UDSEntry faster on the slave side (since 
> repeated re-allocations would not be necessary any more), but this can be 
> done in a later patch. Moreover, it might not be needed any more if UDSEntry 
> is not used directly any more on the slave side, as suggested on the 
> frameworks mailing list by Aaron (good idea BTW!). 
> 
> 
> Diffs
> -----
> 
>   src/core/udsentry.cpp c6ac21a 
> 
> Diff: https://git.reviewboard.kde.org/r/118452/diff/
> 
> 
> Testing
> -------
> 
> Unit tests still pass.
> 
> The memory usage of listjobtest with a directory with 100,000 files is 
> reduced from 71344 K to 35392 K according to KSysGuard. I see similar savings 
> when opening the directory in Dolphin.
> 
> I still haven't set up a Qt5/KF5 build in release mode (shame on me!), so I 
> cannot present any benchmark results.
> 
> 
> Thanks,
> 
> Frank Reininghaus
> 
>

_______________________________________________
Kde-frameworks-devel mailing list
Kde-frameworks-devel@kde.org
https://mail.kde.org/mailman/listinfo/kde-frameworks-devel

Re: Review Request 118452: Reduce the memory usage of UDSEntry by using QVector, rather than QHash, for the internal data storage

Reply via email to