bruns created this revision.
bruns added reviewers: Baloo, ngraham, astippich, poboiko.
Herald added projects: Frameworks, Baloo.
Herald added a subscriber: kde-frameworks-devel.
bruns requested review of this revision.
REVISION SUMMARY

  The (somewhat arbitrary) term truncation was applied to the UTF-8 encoded data, sometimes truncating the term in the middle of a codepoint. Truncate the QString instead. This also leaves more useful characters for languages where the majority of codepoints are encoded as 2 or more bytes.

  This requires some extra storage in the DB when a term that would previously have been truncated is now stored as is, but likely only a few terms / languages are affected (for English words, UTF-8 encodes most codepoints in 1 byte).

  There is a small caveat for the SearchStore. As queries were truncated likewise, an untruncated query would no longer find untruncated terms from new index runs. To allow matches nevertheless, truncated terms use StartsWith instead of Equal matches.

TEST PLAN

  ctest

REPOSITORY

  R293 Baloo

BRANCH

  phrasestorage_fixes

REVISION DETAIL

  https://phabricator.kde.org/D21865

AFFECTED FILES

  src/engine/termgenerator.cpp
  src/lib/searchstore.cpp

To: bruns, #baloo, ngraham, astippich, poboiko
Cc: kde-frameworks-devel, LeGast00n, fbampaloukas, domson, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams
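The truncation problem described in the revision summary can be illustrated outside of Qt. The following is a minimal standard C++ sketch, not the Baloo code: it uses `std::string` holding UTF-8 in place of QByteArray/QString, and it counts codepoints rather than QString's UTF-16 code units (the two agree for characters in the Basic Multilingual Plane). The limit `kMaxTermSize = 5` and all function names are hypothetical, and `termMatches` only models the Equal/StartsWith rule from the summary, not the actual SearchStore API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical term length limit, chosen small for illustration only.
constexpr std::size_t kMaxTermSize = 5;

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
bool isContinuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Number of bytes announced by a UTF-8 lead byte; 0 if b is not a lead byte.
std::size_t sequenceLength(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

// Old approach (sketch): cut the encoded bytes, possibly mid-codepoint.
std::string truncateBytes(const std::string& utf8, std::size_t maxBytes) {
    return utf8.size() > maxBytes ? utf8.substr(0, maxBytes) : utf8;
}

// New approach (approximation): keep at most maxChars codepoints, so a
// multi-byte sequence is never split in half.
std::string truncateChars(const std::string& utf8, std::size_t maxChars) {
    std::size_t chars = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        if (isContinuation(static_cast<unsigned char>(utf8[i]))) continue;
        if (chars == maxChars) return utf8.substr(0, i);  // next codepoint starts here
        ++chars;
    }
    return utf8;
}

// True if the string ends with a complete UTF-8 sequence (structural check only).
bool endsOnCodepointBoundary(const std::string& utf8) {
    if (utf8.empty()) return true;
    std::size_t i = utf8.size() - 1;
    // Walk back over trailing continuation bytes to the last lead byte.
    while (i > 0 && isContinuation(static_cast<unsigned char>(utf8[i]))) --i;
    return utf8.size() - i == sequenceLength(static_cast<unsigned char>(utf8[i]));
}

// Hypothetical helper modelling the matching rule on plain strings: a query
// term that hit the limit matches indexed terms by prefix (StartsWith),
// otherwise only an identical indexed term matches (Equal).
bool termMatches(const std::string& indexedTerm, const std::string& rawQuery) {
    const std::string query = truncateChars(rawQuery, kMaxTermSize);
    if (query.size() != rawQuery.size())
        return indexedTerm.compare(0, query.size(), query) == 0;  // StartsWith
    return indexedTerm == query;  // Equal
}
```

For example, `truncateBytes("привет", 5)` keeps two Cyrillic letters plus only the first byte of the third, while `truncateChars("привет", 5)` yields the valid string `"приве"`, which is also the "more useful characters" effect mentioned for multi-byte languages.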