-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/116692/#review53035
-----------------------------------------------------------


I've been doing some simple tests similar to your patch:

diff --git a/src/pim/agent/tests/emailtest.cpp b/src/pim/agent/tests/emailtest.cpp
index abba699..edeba37 100644
--- a/src/pim/agent/tests/emailtest.cpp
+++ b/src/pim/agent/tests/emailtest.cpp
@@ -32,6 +32,8 @@
 #include <Akonadi/ItemFetchScope>
 #include <Akonadi/Item>
 
+#include <malloc.h>
+
 class App : public QApplication {
     Q_OBJECT
 public:
@@ -69,6 +71,9 @@ App::App(int& argc, char** argv, int flags)
 
 void App::main()
 {
+    int pagesize = 1024;
+    mallopt(M_TRIM_THRESHOLD, 5*pagesize);
+
     Akonadi::CollectionFetchJob* job = new Akonadi::CollectionFetchJob(Akonadi::Collection::root(),
                                                                        Akonadi::CollectionFetchJob::Recursive);
     connect(job, SIGNAL(finished(KJob*)), this, SLOT(slotRootCollectionsFetched(KJob*)));
@@ -117,8 +122,14 @@ void App::itemReceived(const Akonadi::Item::List& itemList)
     QTime timer;
     timer.start();
 
+    int i = 0;
     Q_FOREACH (const Akonadi::Item& item, itemList) {
         m_indexer.index(item);
+        i++;
+
+        if (i%100 == 0) {
+            m_indexer.commit();
+        }
     }
 
     m_indexTime += timer.elapsed();

Without this patch the memory usage spikes to about 1.2 GB for me. With the 
more frequent commits it stays at ~800 MB. With the mallopt change it 
fluctuates a lot more and drops depending on the collection size, but it still 
climbs to ~800 MB. I'm still trying to diagnose this.

- Vishesh Handa


On March 10, 2014, 11:12 a.m., Aaron J. Seigo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/116692/
> -----------------------------------------------------------
> 
> (Updated March 10, 2014, 11:12 a.m.)
> 
> 
> Review request for Akonadi and Baloo.
> 
> 
> Repository: baloo
> 
> 
> Description
> -------
> 
> Baloo is using Xapian for storing processed results from data fed to it by 
> akonadi; in doing so it processes all the data it is sent to index and only 
> once this is complete is the data committed to the Xapian database. From 
> http://xapian.org/docs/apidoc/html/classXapian_1_1WritableDatabase.html#acbea2163142de795024880a7123bc693
>  we see: "For efficiency reasons, when performing multiple updates to a 
> database it is best (indeed, almost essential) to make as many modifications 
> as memory will permit in a single pass through the database. To ensure this, 
> Xapian batches up modifications." This means that *all* the data to be stored 
> in the Xapian database first ends up in RAM. When indexing large mailboxes 
> (or any other large chunk of data) this results in a very large amount of 
> memory allocation. On one test of 100k mails in a maildir folder this 
> resulted in 1.5GB of RAM used. In normal daily usage with maildir I find that 
> it easily balloons to several hundred megabytes within days. This makes the 
> Baloo indexer unusable on systems with smaller amounts of memory (e.g. mobile 
> devices, which typically have only 512 MB-2 GB of RAM).
> 
> Making this even worse is that the indexer is both long-lived *and* the 
> default glibc allocator is unable to return the used memory back to the OS 
> (probably due to memory fragmentation, though I have not confirmed this). Use 
> of other allocators shows the temporary ballooning of memory during 
> processing, but once that is done the memory is released and returned back to 
> the OS. As such, this is not a memory leak, but it behaves like one on 
> systems with the default glibc allocator, with akonadi_baloo_indexer taking 
> increasingly large amounts of memory on the system that never get returned to 
> the OS. (This is actually how I noticed the problem in the first place.)
> 
> The approach used to address this problem is to periodically commit data to 
> the Xapian database. This happens uniformly and transparently to the 
> AbstractIndexer subclasses. The exact behavior is controlled by the 
> s_maxUncommittedItems constant which is set arbitrarily to 100: after an 
> indexer hits 100 uncommitted changes, the results are committed immediately. 
> Caveats:
> 
> * This is not a guaranteed fix for the memory fragmentation issue experienced 
> with glibc: it is still possible for the memory to grow slowly over time as 
> each smaller commit leaves some % of un-releasable memory due to 
> fragmentation. It has helped with day to day usage here, but in the "100k 
> mails in a maildir structure" test memory did still balloon upwards. 
> 
> * It makes indexing non-atomic from akonadi's perspective: data fed to 
> akonadi_baloo_indexer to be indexed may show up in chunks and even, in the 
> case of a crash of the indexer, be only partially added to the database.
> 
> Alternative approaches (not necessarily mutually exclusive to this patch or 
> each other):
> 
> * send smaller data sets from akonadi to akonadi_baloo_indexer for 
> processing. This would allow akonadi_baloo_indexer to retain the atomic 
> commit approach while avoiding the worst of the Xapian memory usage; it would 
> not address the issue of memory fragmentation
> * restart akonadi_baloo_indexer process from time to time; this would resolve 
> the fragmentation-over-time issue but not the massive memory usage due to 
> atomically indexing large datasets
> * improve Xapian's chert backend (to become default in 1.4) to not fragment 
> memory so much; this would not address the issue of massive memory usage due 
> to atomically indexing large datasets
> * use an allocator other than glibc's; this would not address the issue of 
> massive memory usage due to atomically indexing large datasets
> 
> 
> Diffs
> -----
> 
>   src/pim/agent/emailindexer.cpp 05f80cf 
>   src/pim/agent/abstractindexer.h 8ae6f5c 
>   src/pim/agent/abstractindexer.cpp fa9e96f 
>   src/pim/agent/akonotesindexer.h 83f36b7 
>   src/pim/agent/akonotesindexer.cpp ac3e66c 
>   src/pim/agent/contactindexer.h 49dfdeb 
>   src/pim/agent/contactindexer.cpp a5a6865 
>   src/pim/agent/emailindexer.h 9a5e5cf 
> 
> Diff: https://git.reviewboard.kde.org/r/116692/diff/
> 
> 
> Testing
> -------
> 
> I have been running with the patch for a couple of days and one other person 
> on irc has tested an earlier (but functionally equivalent) version. Rather 
> than reaching the common 250MB+ during regular usage it now idles at ~20MB 
> (up from ~7MB when first started; so some fragmentation remains as noted in 
> the description, but with far better long-term results)
> 
> 
> Thanks,
> 
> Aaron J. Seigo
> 
>
