On Wednesday 01 April 2015 17:07:55 Kyle Neal wrote:
> Here is the implementation of my valid utf8 checker
>
> bool Parser::isUTF8( std::string string )
> {
>     QString utf8str = QString::fromUtf8( string.c_str() );
>
>     for ( int i = 0; i < utf8str.length(); i++ ) {
>         if ( utf8str.at( i ) == -3 ) {
>             return false;
>         }
>     }
>
>     return true;
> }

This is wrong. Technically speaking, the source could contain the UTF-8
encoding of the U+FFFD character (REPLACEMENT CHARACTER), in which case the
input is valid UTF-8 but you'd return false.
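
To make the U+FFFD point concrete, here is a minimal sketch of a strict byte-level check in plain C++. It is a hand-rolled validator written only for illustration, not the Qt decoder, and the names isValidUtf8 and demo are made up for this example. It shows that the three bytes EF BF BD (U+FFFD) are well-formed input, so "the decoded text contains U+FFFD" cannot be used as the test for invalid input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper for illustration: strict UTF-8 well-formedness check.
// Rejects bad lead bytes, truncated sequences, overlong forms, surrogates,
// and code points above U+10FFFF.
bool isValidUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;    // continuation bytes expected after the lead
        std::uint32_t cp = 0;     // code point being assembled
        std::uint32_t minCp = 0;  // smallest code point allowed at this length

        if (lead < 0x80)                { extra = 0; cp = lead;        minCp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; minCp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minCp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minCp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i - 1 < extra)
            return false;   // sequence is cut off at the end of the input

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < minCp)                   return false;  // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF)                return false;  // beyond Unicode range

        i += extra + 1;
    }
    return true;
}

// Small demonstration of the cases discussed above.
void demo()
{
    assert(isValidUtf8("\xEF\xBF\xBD"));  // literal U+FFFD: valid input
    assert(!isValidUtf8("\xFF"));         // 0xFF never appears in UTF-8
    assert(!isValidUtf8("\xC3"));         // lead byte with no continuation
}
```

A lenient decoder would typically turn both of the invalid inputs above into U+FFFD, which makes them indistinguishable from the first, valid input once decoded. That is why the validity information has to come from the decoder itself rather than from the decoded text.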

Instead, use QTextCodec with a stateful decoder and check whether the number
of invalid characters is non-zero.

That is:

    QTextCodec::ConverterState state;
    QTextCodec *utf8Codec = QTextCodec::codecForMib(106); // MIB 106 is UTF-8
    QString result = utf8Codec->toUnicode(string.c_str(), int(string.length()),
                                          &state);
    return state.invalidChars == 0; // valid only if nothing was rejected

I would also recommend that you:
- don't discard that QString result. Reuse it.
- don't use std::string to represent an encoding. A QTextCodec pointer is the
right way.
- use the code above to check any encoding, not just UTF-8
- stop using ifstream
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center