> On Sept. 23, 2015, 2:59 p.m., Vishesh Handa wrote:
> > Hmm. This is actually an optimization I care about.
> >
> > Do you think you could share your `~/.local/share/baloo/index` file. That
> > way I will be able to reproduce it and see exactly where/why the data is
> > changing?
> >
> > Btw, awesome work at diagnosing this! :)
>
> Igor Poboiko wrote:
>     It's not that easy, it takes ~150MB of space. Right now my internet
>     connection won't accept that X_X
>     Also I'm not sure you'll be able to reproduce it even having my index,
>     just because for me this bug occurs during index creation (when it actually
>     writes something into DB; btw, isn't it obvious it can change data? or lmdb
>     guarantees it won't?). You'll just be able to see some .pdf-files of type
>     document (in DocumentDB; so it shows correctly in 'balooshow') and of type
>     document (in PostingDB, so it pops up during search)
>
>     But you can ping me on IRC (poboiko at #kde-baloo at freenode), and I
>     will do whatever you want :)
>
> Vishesh Handa wrote:
>     I don't get much IRC time these days :(
>
>     I'm a little confused. `balooshow file` shows a type of T5 (PDF). This
>     means that when the file was indexed, the correct type was stored. However,
>     when calling 'baloosearch type:Folder' the file is in the results. This
>     indicates that it is a problem when searching, and not indexing, no?
>
> Boudhayan Gupta wrote:
>     Given that it's a memory corruption bug, this patch may fix this:
>
>     https://paste.kde.org/pwsj1pbnq
>
>     The stacktrace clearly implicates code that this patch is modifying.
>
>     I cannot test this, however. This bug isn't mine; IRC nick "genstorm"
>     reported this.

Vishesh: as far as I understand, you have several indexes:

* PostingDB, where keys are search terms and values are documents
  (e.g. key: "T5", value: "123123123", the id of "file.pdf")
* DocumentDB, where keys are documents and values are lists of terms
  (e.g. key: 123123123, value: "T9")

They are used differently:

* The first is used by "baloosearch" for searching and is populated by the
  "baloo_file" indexer
* The second is used by "balooshow" and is populated by
  "baloo_file_extractor", which is called by "baloo_file"

What I've discovered is that the extractor fails (as I've explained), leading
to an inconsistent DB, such as type "T5" in the first DB and "T9" in the
second DB, because of that memory corruption.


- Igor


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/125362/#review85830
-----------------------------------------------------------


On Sept. 23, 2015, 2:42 p.m., Igor Poboiko wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/125362/
> -----------------------------------------------------------
>
> (Updated Sept. 23, 2015, 2:42 p.m.)
>
>
> Review request for Baloo.
>
>
> Repository: baloo
>
>
> Description
> -------
>
> I've noticed the following bug: files of some types, e.g. "PDF" (T5), had a
> "Folder" (T9) type in KRunner.
> "balooshow -x" showed that such a file has a "T5", while
> "baloosearch --type folder" still listed it.
>
> * Debugging showed that the problem appears somewhere in
>   WritingTransaction::commit()
> * There wasn't any WritingTransaction::m_pendingOperations["T9"] access at
>   all
> * This hash contained the "T9" key (QHash::keys().contains("T9") == true),
>   yet lookups reported that it didn't (QHash::contains("T9") == false and
>   QHash::count("T9") == 0)
> * Because of that, QHashIterator fails miserably when iterating over
>   non-existent values (e.g. iter.value() returns some value with some data
>   for those non-existent values)
> * Bisection showed that the QHash got corrupted at
>   "documentTermsDB.put(id, docTerms)" (engine/writingtransaction.cpp:185),
>   to be specific, on the mdb_put() line
>
> That was the bug itself. The problem is that QByteArray::fromRawData() is
> used everywhere, which does not copy data from the DB but just stores a
> pointer to some place in the DB's memory-mapped file. It has no way of
> knowing whether the data it points to has changed, leading to undefined
> behavior like that.
>
> This patch removes the ::fromRawData() calls, replacing them with copy
> constructors. (Maybe we can leave it in some places, but I'm not sure it's
> an optimization we should care about.)
>
>
> Diffs
> -----
>
>   src/codecs/doctermscodec.cpp e8801f9
>   src/codecs/postingcodec.cpp 1edb645
>   src/engine/documentdatadb.cpp 690df70
>   src/engine/documentdb.cpp ea0cb66
>   src/engine/idfilenamedb.cpp d4e1eb1
>   src/engine/positiondb.cpp 568dc54
>   src/engine/postingdb.cpp e183db5
>
> Diff: https://git.reviewboard.kde.org/r/125362/diff/
>
>
> Testing
> -------
>
> After applying this patch I have no more files in the wrong category, so the
> issue is gone.
>
>
> Thanks,
>
> Igor Poboiko
>
>
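
The view-versus-copy distinction behind the patch can be illustrated with a
minimal sketch. It uses only the C++ standard library and is not Baloo's
code: std::string_view stands in (as an assumption, by analogy) for a
QByteArray created with QByteArray::fromRawData(), and FakePage, viewOf() and
copyOf() are hypothetical names standing in for a memory-mapped page and the
two ways of reading from it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a memory-mapped database page. The buffer never
// reallocates here, so the view below stays valid and the demo is well
// defined; with a real mapping, reading through a stale pointer may be
// undefined behavior instead.
struct FakePage {
    std::vector<char> bytes;

    explicit FakePage(const std::string& s) : bytes(s.begin(), s.end()) {}

    // Simulates a later write that reuses the same memory in place.
    void overwrite(const std::string& s) {
        std::memcpy(bytes.data(), s.data(), std::min(s.size(), bytes.size()));
    }
};

// Non-owning view: stores only a pointer and a length, copies nothing.
std::string_view viewOf(const FakePage& page) {
    return std::string_view(page.bytes.data(), page.bytes.size());
}

// Owning copy: duplicates the bytes, so later writes cannot affect it.
std::string copyOf(const FakePage& page) {
    return std::string(page.bytes.data(), page.bytes.size());
}
```

With a page constructed as FakePage page("T5"), taking both viewOf(page) and
copyOf(page) and then calling page.overwrite("T9") leaves the copy at "T5"
while the view now reads "T9". The copy costs an allocation per read, which
is the optimization trade-off discussed in the thread.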
