On Tuesday 20 January 2015 13:18:03 David Narvaez wrote:
> On Tue, Jan 20, 2015 at 12:10 PM, Vishesh Handa <m...@vhanda.in> wrote:
> > Hey guys
> >
> > We have a plain text indexing plugin in KFileMetaData. It gives the plain
> > text of any file whose mimetype begins with 'text/'. We used to use
> > QString::fromUtf8 to convert this into a string. However, this may not be
> > ideal as a different encoding can exist.
> >
> > I've just written a patch to use the system codec and if the conversion
> > fails, to abort. Does anyone have any opinions on this? I'm slightly
> > conflicted.
> >
> > Reasons for doing this: If we cannot correctly convert it to text, we're
> > just indexing garbage. This often happens with a binary file getting
> > detected as text. [1].
>
> What about guessing the encoding from some heuristic[0]?

Or just use Qt directly:

http://stackoverflow.com/questions/18227530/check-if-utf-8-string-is-valid-in-qt/18228382#18228382

If it fails, either discard the file, or try again with the system encoding
(if that is not UTF-8) and discard otherwise.

Bye
-- 
Milian Wolff
m...@milianw.de
http://milianw.de
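
P.S. A rough sketch of the check-then-fall-back idea above, written in plain
standard C++ so it stands alone. It is not the Qt call from the linked answer
and not the KFileMetaData patch. The names isValidUtf8, chooseDecoding and
Decision are made up for illustration, and the actual re-decoding with the
system codec is left to the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: true if `bytes` is well-formed UTF-8. Rejects
// truncated sequences, overlong forms, surrogates and values > U+10FFFF.
bool isValidUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const std::uint8_t b0 = static_cast<std::uint8_t>(bytes[i]);
        if (b0 < 0x80) {            // single-byte (ASCII) character
            ++i;
            continue;
        }
        std::size_t extra = 0;      // number of continuation bytes expected
        std::uint32_t cp = 0;       // decoded code point
        std::uint32_t min = 0;      // smallest value allowed for this length
        if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; min = 0x10000; }
        else return false;          // stray continuation byte or invalid lead
        if (n - i <= extra)
            return false;           // sequence is cut off at end of input
        for (std::size_t k = 1; k <= extra; ++k) {
            const std::uint8_t b = static_cast<std::uint8_t>(bytes[i + k]);
            if ((b & 0xC0) != 0x80)
                return false;       // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;           // overlong, out of range, or surrogate
        i += extra + 1;
    }
    return true;
}

// Hypothetical outcome of the first pass over a file's contents.
enum class Decision { IndexAsUtf8, RetryWithSystemCodec, Discard };

// Valid UTF-8 is indexed as is. Otherwise retry with the system codec,
// unless that codec is UTF-8 as well, in which case the file is discarded.
// A failed retry would also mean discarding; that step is the caller's job.
Decision chooseDecoding(const std::string &bytes, bool systemCodecIsUtf8)
{
    if (isValidUtf8(bytes))
        return Decision::IndexAsUtf8;
    return systemCodecIsUtf8 ? Decision::Discard
                             : Decision::RetryWithSystemCodec;
}
```

With this, a Latin-1 encoded "café" (a lone 0xE9 byte at the end) fails the
UTF-8 check and is retried only when the system codec is something other than
UTF-8. Note that a binary file consisting purely of ASCII-range bytes would
still pass, so this check alone does not catch every misdetected file.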