> On Sept. 23, 2015, 2:59 p.m., Vishesh Handa wrote:
> > Hmm. This is actually an optimization I care about. 
> > Do you think you could share your `~/.local/share/baloo/index` file. That 
> > way I will be able to reproduce it and see exactly where/why the data is 
> > changing?
> > 
> > Btw, awesome work at diagnosing this! :)
> 
> Igor Poboiko wrote:
>     It's not that easy, it takes ~150MB of space. Right now my internet 
> connection won't accept that X_X
>     Also I'm not sure you'll be able to reproduce it even having my index, 
> just because for me this bug occurs during index creation (when it actually 
> writes something into DB; btw, isn't it obvious it can change data? or lmdb 
> guarantees it won't?). You'll just be able to see some .pdf-files of type 
> document (in DocumentDB; so it shows correctly in 'balooshow') and of type 
> folder (in PostingDB, so it pops up during search)
>     
>     But you can ping me on IRC (poboiko at #kde-baloo at freenode), and I 
> will do whatever you want :)
> 
> Vishesh Handa wrote:
>     I don't get much IRC time these days :(
>     
>     I'm a little confused. `balooshow file` shows a type of T5 (PDF). This 
> means that when the file was indexed, the correct type was stored. However, 
> when calling 'baloosearch type:Folder' the file is in the results. This 
> indicates that it is a problem when searching, and not indexing, no?
> 
> Boudhayan Gupta wrote:
>     Given that it's a memory corruption bug, this patch may fix this:
>     
>     https://paste.kde.org/pwsj1pbnq
>     
>     The stacktrace clearly implicates code that this patch is modifying.
>     
>     I cannot test this, however. This bug isn't mine; IRC nick "genstorm" 
> reported this.
> 
> Igor Poboiko wrote:
>     Vishesh: as far as I understood, you have several indexes. 
>      * PostingDB, where keys are search terms and values are documents (e.g. 
> key: "T5", value: "123123123", id of "file.pdf")
>      * DocumentDB, where keys are documents and values are lists of terms 
> (e.g. key: 123123123, value: "T9")
>     They are used differently:
>      * First is used by "baloosearch" for searching and is being populated by 
> "baloo_file" indexer
>      * Second is used by "balooshow" and is being populated by 
> "baloo_file_extractor", which is called by "baloo_file"
>      
>      What I've discovered is that the extractor fails (as I've explained), 
> leading to an inconsistent DB, such as type "T5" in the first DB and "T9" 
> in the second DB, because of that memory corruption.

You're completely right. And it makes sense.
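
To spell out the kind of mismatch we are talking about, here is a rough
sketch in plain standard C++, not Baloo's actual classes. The type aliases
and findMismatches() are made up for illustration; the real databases live
in LMDB. It lists every (term, id) pair that the term -> documents index
claims but the document -> terms index does not back up:

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-ins for the two indexes.
using DocId = std::uint64_t;
using PostingIndex = std::map<std::string, std::set<DocId>>;   // term -> docs
using DocumentIndex = std::map<DocId, std::set<std::string>>;  // doc -> terms

// Returns every (term, id) pair found in the posting index whose document
// entry is missing or does not list that term.
std::vector<std::pair<std::string, DocId>>
findMismatches(const PostingIndex& posting, const DocumentIndex& documents) {
    std::vector<std::pair<std::string, DocId>> mismatches;
    for (const auto& [term, ids] : posting) {
        for (DocId id : ids) {
            auto it = documents.find(id);
            if (it == documents.end() || it->second.count(term) == 0) {
                mismatches.emplace_back(term, id);
            }
        }
    }
    return mismatches;
}
```

With the symptom from this thread (the document entry says "T5" while the
file is listed under "T9" for searching), the function would report the
pair ("T9", id) for that file.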

Gimme a day or two, I want to run a few tests, but if we cannot figure it out, 
maybe we should ship this for 5.15. It's better to lose on performance than 
have crashes and incorrect values.
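
For reference, the trade-off can be sketched without Qt or LMDB at all. This
is only an analogy under my own assumptions: std::string_view stands in for
a QByteArray created with QByteArray::fromRawData(), std::string stands in
for a copied QByteArray, and a small char array plays the memory-mapped
page. readThenOverwrite() is a made-up name:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>
#include <utility>

// Returns {value seen through the view, value held by the copy} after the
// underlying bytes have been overwritten.
std::pair<std::string, std::string> readThenOverwrite() {
    char page[] = "T5";              // stand-in for a memory-mapped DB page

    std::string_view view(page, 2);  // non-owning: pointer + length only
    std::string copy(page, 2);       // owning: has its own buffer

    assert(std::string(view) == copy);  // both agree right after the read

    std::memcpy(page, "T9", 2);      // a later write reuses the same bytes

    // The view now reflects the new bytes; the copy keeps what was read.
    return {std::string(view), copy};
}
```

The copy costs an allocation and a memcpy per read, which is the
performance we would be giving up; the view is free but is only valid as
long as nothing touches the bytes underneath it.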


- Vishesh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/125362/#review85830
-----------------------------------------------------------


On Sept. 23, 2015, 2:42 p.m., Igor Poboiko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/125362/
> -----------------------------------------------------------
> 
> (Updated Sept. 23, 2015, 2:42 p.m.)
> 
> 
> Review request for Baloo.
> 
> 
> Repository: baloo
> 
> 
> Description
> -------
> 
> I've noticed the following bug: sometimes, e.g., "PDF" (T5) files had a 
> "Folder" (T9) type in KRunner.
> "balooshow -x" showed that such a file has a "T5", while 
> "baloosearch --type folder" still listed it.
> 
>  * Debugging showed that it appears somewhere in WritingTransaction::commit()
>  * There wasn't any WritingTransaction::m_pendingOperations["T9"] access at 
> all
>  * This hash appeared to contain a "T9" key (QHash::keys().contains("T9") 
> == true), but lookups said otherwise (QHash::contains("T9") == false and 
> QHash::count("T9") == 0)
>  * Because of that, QHashIterator fails miserably when iterating over 
> non-existent values (e.g. iter.value() returns some value with some data 
> for such a non-existent entry)
>  * Bisection showed that QHash got corrupted at "documentTermsDB.put(id, 
> docTerms)" (engine/writingtransaction.cpp:185), to be specific - on mdb_put() 
> line
> 
> That was the bug itself. The problem is that QByteArray::fromRawData() is 
> used everywhere, which does not copy data from the DB but just stores a 
> pointer into the DB's memory-mapped file. The QByteArray has no way of 
> knowing when the data it points to changes, leading to undefined behavior 
> like the above.
> 
> This patch removes the ::fromRawData() calls, replacing them with copy 
> constructors. (Maybe we can leave it in some places, but I'm not sure it's 
> an optimization we should care about.)
> 
> 
> Diffs
> -----
> 
>   src/codecs/doctermscodec.cpp e8801f9 
>   src/codecs/postingcodec.cpp 1edb645 
>   src/engine/documentdatadb.cpp 690df70 
>   src/engine/documentdb.cpp ea0cb66 
>   src/engine/idfilenamedb.cpp d4e1eb1 
>   src/engine/positiondb.cpp 568dc54 
>   src/engine/postingdb.cpp e183db5 
> 
> Diff: https://git.reviewboard.kde.org/r/125362/diff/
> 
> 
> Testing
> -------
> 
> After applying this patch I have no more files in wrong category, so the 
> issue is gone.
> 
> 
> Thanks,
> 
> Igor Poboiko
> 
>
