D28932: Store filename terms just once

2020-04-17 Thread Stefan Brüns
bruns created this revision.
bruns added reviewers: Baloo, ngraham.
Herald added projects: Frameworks, Baloo.
Herald added a subscriber: kde-frameworks-devel.
bruns requested review of this revision.

REVISION SUMMARY
  Filename terms were stored twice, once with the "F" filename property
  prefix, and once without prefix. This makes it trivial to search for
  files where a term matches in either the filename or the content, but has
  a number of drawbacks:
  
  1. It is not possible to search for a term in content only
  2. The storage size for filenames is approximately doubled
  3. File renaming can cause significant I/O load
  4. Terms appearing in both content and filename may be stored incompletely in the phrase storage.
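  A minimal sketch of the two storage schemes, assuming a simplified tokenizer that lowercases text and splits it on non-alphanumeric characters. tokenize, filenameTermsOld and filenameTermsNew are hypothetical helpers for illustration, not Baloo's TermGenerator API:

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical tokenizer: lowercases and splits on non-alphanumeric characters.
std::set<std::string> tokenize(const std::string& text) {
    std::set<std::string> terms;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            terms.insert(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        terms.insert(current);
    }
    return terms;
}

// Old scheme (as described above): each filename term is stored twice,
// once with the "F" prefix and once without it.
std::set<std::string> filenameTermsOld(const std::string& filename) {
    std::set<std::string> out;
    for (const auto& term : tokenize(filename)) {
        out.insert("F" + term);
        out.insert(term);
    }
    return out;
}

// New scheme: filename terms are stored only with the "F" prefix,
// so unprefixed terms come from content alone.
std::set<std::string> filenameTermsNew(const std::string& filename) {
    std::set<std::string> out;
    for (const auto& term : tokenize(filename)) {
        out.insert("F" + term);
    }
    return out;
}
```

  For "The fox.txt" the old scheme yields six terms (Fthe, the, Ffox, fox, Ftxt, txt) and the new one three, which is the doubling from (2.). Since the bare "fox" no longer receives filename terms, a lookup of the unprefixed term only hits content, which is what makes a content-only search (1.) possible.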
  
  Re (2.), when full text indexing is disabled, the duplicated filename terms
  are a significant part of the storage size. With full text indexing enabled,
  the space savings are likely negligible.
  
  Re (3.), renaming a file whose name contains a common term, e.g.
  "The fox.txt", caused the data for "the", "fox" and "txt" to be rewritten.
  While this is negligible for "txt" and "fox", "the" is common enough to
  cause a rewrite of 10% of the whole DB.
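  A rough cost model for (3.), under the stated assumption that touching a term rewrites its entire posting list. listsTouched, renameCost and PostingSizes are hypothetical names, and any list sizes plugged in are illustrative, not measurements:

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical model: number of documents in the posting list of each stored term.
using PostingSizes = std::map<std::string, std::size_t>;

// Posting lists touched when a file's name terms are updated.
// storeUnprefixed == true models the old scheme, where every filename term
// was stored both with the "F" prefix and without it.
std::set<std::string> listsTouched(const std::set<std::string>& nameTerms, bool storeUnprefixed) {
    std::set<std::string> touched;
    for (const auto& term : nameTerms) {
        touched.insert("F" + term);
        if (storeUnprefixed) {
            touched.insert(term);
        }
    }
    return touched;
}

// Assumption: updating a term rewrites its whole posting list, so the cost
// is the summed length of all touched lists. Unknown terms cost nothing.
std::size_t renameCost(const std::set<std::string>& touched, const PostingSizes& sizes) {
    std::size_t cost = 0;
    for (const auto& term : touched) {
        auto it = sizes.find(term);
        if (it != sizes.end()) {
            cost += it->second;
        }
    }
    return cost;
}
```

  In the old scheme the unprefixed "the" list, which also holds every document containing "the" in its content, dominates the cost. In the new scheme only the F-prefixed lists are touched, and those contain filename matches alone.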
  
  The default search behaviour of matching both filename and content
  can be restored by internally creating queries for both filename and
  content and ORing both together. This extra step does not have any
  noticeable (or even measurable) performance impact.
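  The query-side change can be sketched as follows. QueryNode, Scope and termQuery are hypothetical and do not reflect Baloo's actual query classes; the sketch only shows the idea of expanding an unqualified term into an OR of the "F"-prefixed filename term and the bare content term:

```cpp
#include <string>
#include <vector>

// Hypothetical query tree, for illustration only.
struct QueryNode {
    enum class Op { Term, Or } op;
    std::string term;                 // used when op == Term
    std::vector<QueryNode> children;  // used when op == Or
};

// Which property a term is restricted to: none, filename: or content:.
enum class Scope { Any, Filename, Content };

QueryNode termQuery(const std::string& term, Scope scope) {
    QueryNode filename{QueryNode::Op::Term, "F" + term, {}};
    QueryNode content{QueryNode::Op::Term, term, {}};
    switch (scope) {
    case Scope::Filename:
        return filename;
    case Scope::Content:
        return content;
    case Scope::Any:
        break;
    }
    // Unqualified search: OR the filename and content lookups together.
    return QueryNode{QueryNode::Op::Or, {}, {filename, content}};
}
```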
  
  Depends on D28929 

TEST PLAN
  $> ctest -R querytest
  $> baloosearch content:pdf
  $> baloosearch filename:pdf
  $> baloosearch pdf
  $> baloosearch content:pdf OR filename:pdf
  (the last two queries are equivalent)
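  To illustrate why the last two queries return the same documents, here is a sketch over a hypothetical in-memory index mapping each term to a sorted list of document IDs. Modelling OR as a sorted set union is an assumption for illustration, not Baloo's posting-list code:

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <vector>

using DocId = std::uint64_t;
// Hypothetical in-memory index: term -> sorted document IDs.
using Index = std::map<std::string, std::vector<DocId>>;

std::vector<DocId> lookup(const Index& index, const std::string& term) {
    auto it = index.find(term);
    return it == index.end() ? std::vector<DocId>{} : it->second;
}

// OR of two sorted posting lists: their sorted union.
std::vector<DocId> orLists(const std::vector<DocId>& a, const std::vector<DocId>& b) {
    std::vector<DocId> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

std::vector<DocId> searchFilename(const Index& idx, const std::string& t) { return lookup(idx, "F" + t); }
std::vector<DocId> searchContent(const Index& idx, const std::string& t) { return lookup(idx, t); }

// Unqualified search: filename OR content.
std::vector<DocId> searchAny(const Index& idx, const std::string& t) {
    return orLists(searchFilename(idx, t), searchContent(idx, t));
}
```

  With Fpdf = {1, 2} and pdf = {2, 3}, filename:pdf yields {1, 2}, content:pdf yields {2, 3}, and plain pdf yields {1, 2, 3}, the same as content:pdf OR filename:pdf.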

REPOSITORY
  R293 Baloo

BRANCH
  submit

REVISION DETAIL
  https://phabricator.kde.org/D28932

AFFECTED FILES
  autotests/integration/querytest.cpp
  src/engine/termgenerator.cpp
  src/engine/termgenerator.h
  src/file/basicindexingjob.cpp
  src/lib/searchstore.cpp

To: bruns, #baloo, ngraham
Cc: kde-frameworks-devel, hurikhan77, lots0logs, LeGast00n, cblack, 
fbampaloukas, domson, ashaposhnikov, michaelh, astippich, spoorun, ngraham, 
bruns, abrahams


D28932: Store filename terms just once

2020-04-17 Thread Stefan Brüns
bruns updated this revision to Diff 80435.
bruns added a comment.


  whitespace

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D28932?vs=80434&id=80435

BRANCH
  submit

REVISION DETAIL
  https://phabricator.kde.org/D28932


D28932: Store filename terms just once

2020-04-17 Thread Stefan Brüns
bruns updated this revision to Diff 80439.
bruns added a comment.


  add missing tests

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D28932?vs=80435&id=80439

BRANCH
  submit

REVISION DETAIL
  https://phabricator.kde.org/D28932


D28932: Store filename terms just once

2020-04-25 Thread Stefan Brüns
bruns added a comment.


  Ping!

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D28932


D28932: Store filename terms just once

2020-04-25 Thread Nathaniel Graham
ngraham added a comment.


  I'll get around to reviewing this soon. I'm trying to figure out if I think the loss is acceptable.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D28932


D28932: Store filename terms just once

2020-04-25 Thread Stefan Brüns
bruns added a comment.


  In D28932#657011, @ngraham wrote:
  
  > I'll get around to reviewing this soon. I'm trying to figure out if I think the loss is acceptable.
  
  
  There is no loss; there is even a gain (queries work correctly in all combinations).

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D28932


D28932: Store filename terms just once

2020-04-26 Thread Stefan Brüns
bruns added a dependent revision: D29207: [Indexers] Ignore name-based mimetype for initial indexing decisions.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D28932


D28932: Store filename terms just once

2020-05-02 Thread Stefan Brüns
bruns added a comment.


  Ping!

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D28932


D28932: Store filename terms just once

2020-05-02 Thread Stefan Brüns
bruns edited the summary of this revision.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D28932


D28932: Store filename terms just once

2020-05-04 Thread Stefan Brüns
bruns added a comment.


  This has been pending for more than two weeks now, without any sort of review ...
  
  @ngraham If you have any questions, please ask!

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D28932


D28932: Store filename terms just once

2020-05-04 Thread Nathaniel Graham
ngraham accepted this revision.
ngraham added a comment.
This revision is now accepted and ready to land.


  Sorry for the delay. Makes sense.

REPOSITORY
  R293 Baloo

BRANCH
  submit

REVISION DETAIL
  https://phabricator.kde.org/D28932


D28932: Store filename terms just once

2020-05-04 Thread Stefan Brüns
This revision was automatically updated to reflect the committed changes.
Closed by commit R293:7605f4d7f7c4: Store filename terms just once (authored by bruns).

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D28932?vs=80439&id=81937

REVISION DETAIL
  https://phabricator.kde.org/D28932
