FYI, if you care, follow kde-frameworks-devel; I guess double posting
only ends in pain.

Greetings
Christoph

----- Forwarded Message -----
From: "cullmann" <cullm...@absint.com>
To: "kde-frameworks-devel" <kde-frameworks-de...@kde.org>
Sent: Wednesday, 14 September 2016 23:29:22
Subject: Scrap baloo?

Hi,

First, read this from my mail in the maintainer thread:

<snip>

Hi,

After looking a bit more at the code, I think there are currently a lot of
things that need fixing:

1) 32-bit systems: I see no fix; with more than 1 GB of index, baloo and all
applications using baloo fail.

  See bugs like https://bugs.kde.org/show_bug.cgi?id=356114: there we have the
  5 GB limit, which has now been raised for 64-bit, but not for 32-bit.

2) Larger filesystems: unfortunately, someone decided to ignore the upper
32 bits of the inodes:

/**
 * Convert the QT_STATBUF into a 64 bit unique identifier for the file.
 * This identifier is combination of the device id and inode number.
 */
inline quint64 statBufToId(const QT_STATBUF& stBuf)
{
    // We're losing 32 bits of info, so this could potentially break
    // on file systems with really large inode and device ids
    return devIdAndInodeToId(static_cast<quint32>(stBuf.st_dev),
                             static_cast<quint32>(stBuf.st_ino));
}

=> random breakage, e.g. on my NFS drive here, as the IDs clash and the
invariants no longer hold (e.g. something can be both a file and a directory
at the same time, ...).

3) No error handling for most lmdb faults (as already mentioned)

4) No error handling for any data corruption: many places will just loop
endlessly or keep allocating memory, e.g.
  DocumentUrlDB::get(quint64 docId) (we have bugs for that)

5) lmdb locking issues: if one read-write process crashes, everything else
stalls (or crashes because of 3 and 4)

6) No resource management or crash handling for baloo_file_extractor, which
either OOMs you or, when it crashes, corrupts the database, leading to 5)
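To make point 4) concrete, here is a hypothetical sketch of a path lookup that refuses corrupted data instead of looping forever. It assumes a table mapping each document id to its parent id and file name, with 0 as the root; this is not necessarily Baloo's real on-disk schema, and DocRecord, DocTable and getPath() are made-up names, not DocumentUrlDB's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical record: each document stores its parent id and its own file
// name; parent id 0 marks the root. This is NOT Baloo's real schema.
struct DocRecord {
    std::uint64_t parentId;
    std::string name;
};

using DocTable = std::unordered_map<std::uint64_t, DocRecord>;

// Rebuilds the path of docId by walking up the parent chain. Corrupted data
// (a dangling parent id or a parent cycle) yields std::nullopt instead of an
// endless loop.
std::optional<std::string> getPath(const DocTable& table, std::uint64_t docId)
{
    std::vector<const std::string*> parts; // names from leaf up to root
    std::unordered_set<std::uint64_t> seen; // ids already visited
    for (std::uint64_t id = docId; id != 0;) {
        if (!seen.insert(id).second)
            return std::nullopt; // cycle: this id was already visited
        auto it = table.find(id);
        if (it == table.end())
            return std::nullopt; // dangling reference to a missing parent
        parts.push_back(&it->second.name);
        id = it->second.parentId;
    }
    std::string path;
    for (auto p = parts.rbegin(); p != parts.rend(); ++p)
        path += "/" + **p; // join root-first
    return path;
}
```

The visited set bounds the walk by the number of distinct ids; a fixed maximum depth would be a cheaper alternative with the same effect on corrupted input.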

CC'd Vishesh; perhaps I am wrong about these issues and misunderstand the
code. Unfortunately, e.g. the database structure is not that well documented,
unless I simply failed to find the right docs in git.

</snip>

Now the executive summary, after another day of looking at the code:

1) 32-bit systems: will never be usable, thanks to lmdb, at least not with
non-trivial index sizes

2) Homes on network file systems: will never be usable, thanks to lmdb (ask
its author, http://lmdb.tech/doc/: "Do not use LMDB databases on remote
filesystems, even between processes on the same host. This breaks flock() on
some OSes, possibly memory map sync, and certainly sync between programs on
different hosts.")

3) Close to no error handling in the code => see the crash reports; I cleaned
them up a bit, but they keep piling up:
  https://bugs.kde.org/reports.cgi?product=frameworks-baloo&output=show_chart&datasets=CONFIRMED&datasets=ASSIGNED&datasets=REOPENED&datasets=UNCONFIRMED&datasets=RESOLVED&banner=1

4) Fundamental problems, like the wrong data structure for the index (32-bit
inodes in the 21st century?) and close to zero documentation of what it does
internally
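A minimal sketch of why the 32-bit inode truncation in statBufToId() leads to ID clashes. It assumes that devIdAndInodeToId() packs the 32-bit device id into the upper half and the 32-bit inode into the lower half of the result; packId() and truncatingId() are hypothetical stand-ins that use plain std::uint64_t instead of Qt's quint64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for devIdAndInodeToId(): assumed to put the 32-bit
// device id in the high half and the 32-bit inode in the low half.
inline std::uint64_t packId(std::uint32_t devId, std::uint32_t inode)
{
    return (static_cast<std::uint64_t>(devId) << 32) | inode;
}

// Mirrors the casts in statBufToId(): only the low 32 bits of the device id
// and of the inode number survive, so the upper 32 bits are silently dropped.
inline std::uint64_t truncatingId(std::uint64_t stDev, std::uint64_t stIno)
{
    return packId(static_cast<std::uint32_t>(stDev),
                  static_cast<std::uint32_t>(stIno));
}
```

Two files on the same device whose inode numbers differ only in their upper 32 bits (e.g. 0x100000007 and 0x200000007) map to the same ID, which is exactly the kind of clash that breaks the index invariants on large or network file systems.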

Proposal:

Scrap baloo_file* and co. and just reimplement the public API (minus the
settings for the then non-existent indexer daemon) on top of Tracker.

Benefits:

1) Tracker is maintained: https://github.com/GNOME/tracker/graphs/contributors
2) We share the index with GNOME/* and avoid double indexing on the "many"
Linux systems that are not plain KDE Plasma Desktop based
3) We can delete 99% of the code (the question is whether we can also remove
the very buggy extractors from KFileMetaData at some point afterwards).

=> Opinions?

Greetings
Christoph

-- 
----------------------------- Dr.-Ing. Christoph Cullmann ---------
AbsInt Angewandte Informatik GmbH      Email: cullm...@absint.com
Science Park 1                         Tel:   +49-681-38360-22
66123 Saarbrücken                      Fax:   +49-681-38360-20
GERMANY                                WWW:   http://www.AbsInt.com
--------------------------------------------------------------------
Managing Director: Dr.-Ing. Christian Ferdinand
Registered in the commercial register of the Saarbrücken Local Court, HRB 11234