-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/120438/#review67687
-----------------------------------------------------------


Hey Aaron. Thanks for putting this up for review. Could you please split this 
patch into multiple reviews? I see a mix of this patch, xapian->sqlite changes, 
new API, coding styles changes, and some improvements in the FileMapping class.

The indexing progress seems to be now tracked in sqlite instead of Xapian. You 
mentioned it is more performant. Do you know why that is? Also, I'd love to 
know how these benchmarks were measured. I don't see any tests for benchmarks. 
Perhaps you forgot to add them to this patch?

And finally, from what I understand this removed the need for the whole binary 
search when a file fails to index. Could you please add a test in order to make 
sure the extractor crashing is handled properly? Some tests for the file 
indexing queue would also be needed.

How do you handle when a file takes too long to index? The FileIndexingJob had 
a timer for 5 minutes.


.gitignore
<https://git.reviewboard.kde.org/r/120438/#comment47171>

    Is this required?



src/file/lib/extractorclient.h
<https://git.reviewboard.kde.org/r/120438/#comment47169>

    This add new public API.



src/file/main.cpp
<https://git.reviewboard.kde.org/r/120438/#comment47170>

    No more transactions?


- Vishesh Handa


On Sept. 30, 2014, 12:47 p.m., Aaron J. Seigo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/120438/
> -----------------------------------------------------------
> 
> (Updated Sept. 30, 2014, 12:47 p.m.)
> 
> 
> Review request for Baloo.
> 
> 
> Repository: baloo
> 
> 
> Description
> -------
> 
> Current Baloo's file indexer takes batches of files by its command line and 
> processes them async. There are several iterations over and creation of 
> various collection classes. This patch set does away with all of this.
> 
> Instead, the worker process now is directed over stdin and responds over 
> stdout (old school UNIX!). This is done using a very simple protocol which is 
> fully documented in docs/ and is encapsulated in a convenience class so as to 
> make future maintenance relatively painless in the advent that the protocol 
> is adjusted or an entirely new system is put inplace. IOW, the next me should 
> have an easier time improving on what is there. This allows the worker 
> process to be long(er)-lived and moves more of the relevant bookkeeping to 
> the same place (the worker process). The resulting code is about the same 
> size but cleaner and much more performant.
> 
> The tracking of file progress is also moved to the (less expensive, easier to 
> update incrementally) sqlite db where file data is already kept. It also 
> versions the sqlite schema so that runtime upgrades can be made (as required 
> here) both now and in the future without user intervention.
> 
> Finally, a tiny (93 LOC, including header) executable that wraps the protocol 
> encapsulation class is provided for 100% backwards compatibility of *all* 
> command line options that currently exist.
> 
> This makes it a 100% drop-in replacement.
> 
> Performance of the affected unit tests is ~2.5x better on my laptop: 7.6 v 
> 3.0 seconds.
> 
> Known weaknesses:
> 
> * still isn't hardened against multiple processes indexing simultaneously. 
> the added sqlite columns are a step in this direction, howeer
> * the DBus signal that was being emitted was removed and has not been 
> replaced with anything yet. That is because the affected class is in 
> baloo-widgets (which happens to be the only user of the relevant *public* 
> class in the baloo file library) and I have not yet worked on that, pending 
> this patch set being approved in principle. I am actually of the suspicion 
> that keeping that data synchronized at all times is not needed .. but if it 
> is, there are much better ways than DBus. Note that the DBus overhead was 
> applied to *every single file indexed* even if Dolphin (the only user of this 
> feature) wasn't running or had the metadata frame open.
> 
> The base idea behind this patch is to make file indexing as blazing fast as 
> possible, which means using as few resources, CPU & memory, as possible. That 
> ought to be the #1 priority for Baloo when it comes to indexing files, both 
> to keep it away from user's notice on desktop class hardware as well as 
> extend its relevance to smaller devices. The code in this patch is at least 
> as clear (imho more so, actually) than what it replaces and was written for 
> maintainability.
> 
> 
> Diffs
> -----
> 
>   src/file/extractor/app.cpp PRE-CREATION 
>   src/file/extractor/baloo_file_extractor.h 
> a8cff51ae763c36fb2934c80c04cbf0f710a01c7 
>   src/file/extractor/baloo_file_extractor.cpp 
> b2e7f97f2f0537807c314e2aca008d67711a69ce 
>   src/file/extractor/extractorworker.cpp 
> e710908dc8b02b7edc9f40e44b9d5d5883986846 
>   src/file/extractor/main.cpp 606a1eb9d42b7079fee497a3fdffbe2327512c13 
>   src/file/fileindexingjob.h PRE-CREATION 
>   src/file/fileindexingjob.cpp PRE-CREATION 
>   src/file/fileindexingqueue.h 5b1c612feb84046acf36b3dec3bf890da2cbf650 
>   src/file/fileindexingqueue.cpp c6479cd25308bee9bb6fe6e6791391e6ca83bddf 
>   src/file/indexingqueue.cpp c8ae4d9d1436ebe222a4758bd5ce551e091ec714 
>   src/file/lib/CMakeLists.txt fb158a5f00f07ab48de963bf72924c83fc1c6673 
>   src/file/lib/extractorclient.h 1e2ae09c7bd0068ca185b814c5c4996c0b563856 
>   src/file/lib/extractorclient.cpp 221b90abe1899c27c03cf77e279fb71afce80b4d 
>   src/file/lib/filemapping.cpp 5769fe23576c341917e8f37f7dd2743dd446a6e1 
>   src/file/main.cpp 41855215c7b550c403d5ef523d29f9b73db58cb0 
>   src/file/metadatamover.cpp 5117687c1247275235c3830b9d2234bfff353748 
>   src/file/util.h a4cc2423fd46cda7146b02c572cc3f4ccd069e3e 
>   src/file/util.cpp 2f7ac6de0dddc7d71c2b36bb4fe88677bfe52336 
>   src/file/autotest/fileindexingjob/fileindexingjobtest.h PRE-CREATION 
>   src/file/autotest/fileindexingjob/fileindexingjobtest.cpp PRE-CREATION 
>   src/file/autotest/scheduler/schedulertest.cpp 
> b91a481138382b609068b4737c654ac5182c37ec 
>   src/file/basicindexingjob.h 3d2fb339e38be862332ddca45b4926f8000fa0c1 
>   src/file/basicindexingjob.cpp 8d70fa2b1ca737eaa8d2e717a0eb2dbf430fac84 
>   src/file/basicindexingqueue.cpp 8484c9b6868890cae6c2e19b29ea92d500aac03e 
>   src/file/commitqueue.cpp b26f73cfdbc390cceef01aa6abd2b5af6f9689b2 
>   src/file/database.h 6548de7dbce003112dad8c58c39c624dd806a261 
>   src/file/database.cpp 5c9ccdc442d230bdc5dc9edbe5bc595537d08a8f 
>   src/file/extractor/CMakeLists.txt 7cf3478f547f7fbcdf8908a3471b50fab09b5220 
>   src/file/extractor/app.h 8059a16e8ec2e2a84d1a3b7eb1dd8cfc2a3136de 
>   .gitignore 1377554ebea6f98a2c748183bc5a96852af12ac2 
>   docs/file_extractor.md 3d830e2313bd35994f9fd1d6b52df422d0556fd8 
>   src/file/CMakeLists.txt 1a9e6bf8f2c70b096fbb2c6c9eb78feec9d06622 
>   src/file/autotest/CMakeLists.txt ccb8575fa8c07f2b136d259ed63d43a09ad75a12 
>   src/file/autotest/basicindexingjobtest.h PRE-CREATION 
>   src/file/autotest/basicindexingjobtest.cpp PRE-CREATION 
>   src/file/autotest/fileindexingjob/CMakeLists.txt PRE-CREATION 
>   src/file/autotest/fileindexingjob/extractor.cpp PRE-CREATION 
> 
> Diff: https://git.reviewboard.kde.org/r/120438/diff/
> 
> 
> Testing
> -------
> 
> Indexed files, ran testsuite with 0 failures.
> 
> 
> Thanks,
> 
> Aaron J. Seigo
> 
>

>> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to unsubscribe <<

Reply via email to