15.03.2017, 12:59, "Viktor Engelmann" <viktor.engelm...@qt.io>:
> On 14.03.2017 10:50, Konstantin Tokarev wrote:
>> 14.03.2017, 12:44, "Harald Vistnes" <harald.vist...@gmail.com>:
>>> Hi,
>>>
>>> I'm currently working on reading and parsing large ASCII based text files
>>> and I am wondering what is the current best practice. There are so many
>>> classes and macros available, so it can be a bit confusing to know what to
>>> use when.
>>>
>>> QString, QLatin1String, QByteArray, QStringLiteral, QLatin1Literal,
>>> QByteArrayLiteral, plain C++ string literal, QStringRef, QStringBuilder and
>>> so on. And then std::string and raw const char* strings.
>>>
>>> In my case I want to read a large ASCII file line by line, so I don't need
>>> unicode. I need to compare a string with a literal, extract substrings and
>>> convert some strings to numbers.
>>>
>>> Should I just use QString all the way, or is it faster to use some other
>>> classes when you know you don't need unicode?
>> You should use QByteArray here, which is what QIODevice::readLine()
>> returns. Avoid using QString as long as possible because that will trigger
>> conversion of your text to UTF16 encoding, which may be totally useless in
>> your use case.
>
> If the program is small and you don't want it to ever grow beyond ASCII,
> using byte arrays is okay, but in my experience, if you want to be
> future-proof, you should interpret byte-arrays *as soon as possible*.
>
> Then you have an object with a controlled format and you can use that
> throughout your program, without worrying about encodings.

In the modern world there is one portable encoding used for exchanging data
between systems: UTF-8. So in a wide range of applications one can safely
assume that all textual (!) byte array data is UTF-8 or ASCII, and this causes
no confusion. YMMV though.

Things change if you mix textual and non-textual QByteArrays close together in
your code; in that case it's better to store text strings in objects of a
different class.

> Keeping the
> data raw will increase the probability that some module does something
> wrong because it assumes a wrong encoding and breaks your results (i.e.
> using bytewise comparison for string comparison, which works for ASCII,
> but not for unicode - even if both have the same encoding, because there
> are letters that have multiple different unicode codepoints).
>
> --
>
> Viktor Engelmann
> Software Engineer
>
> The Qt Company GmbH
> Rudower Chaussee 13
> D-12489 Berlin
>
> viktor.engelm...@qt.io
> +49 151 26784521
>
> http://qt.io
> Geschäftsführer: Mika Pälsi, Juha Varelius, Mika Harjuaho
> Sitz der Gesellschaft: Berlin
> Registergericht: Amtsgericht Charlottenburg, HRB 144331 B
>
> _______________________________________________
> Interest mailing list
> Interest@qt-project.org
> http://lists.qt-project.org/mailman/listinfo/interest

--
Regards,
Konstantin
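
P.S. To make the byte-wise comparison point from the quoted text concrete, here is a minimal sketch in plain standard C++ (no Qt, so it compiles anywhere). The helper name `bytewiseEqual` and the two sample strings are made up for illustration only; they are not part of any Qt API. It shows two valid UTF-8 spellings of the same visible text "é" that differ at the byte level, while pure ASCII input compares as expected.

```cpp
#include <cassert>
#include <string>

// Two UTF-8 spellings of the same visible text "é" (illustrative sample data).
// Precomposed: U+00E9                      -> bytes C3 A9
// Decomposed:  U+0065 'e' followed by U+0301 -> bytes 65 CC 81
const std::string precomposed = "\xC3\xA9";
const std::string decomposed  = "e\xCC\x81";

// Hypothetical helper: plain byte-wise equality, which is what comparing
// raw byte arrays amounts to. No normalization is performed.
bool bytewiseEqual(const std::string &a, const std::string &b)
{
    return a == b;
}
```

With these definitions, `bytewiseEqual("width", "width")` is true, while `bytewiseEqual(precomposed, decomposed)` is false even though both strings are UTF-8 and render identically. If your input is guaranteed to be ASCII, this never comes up. If it is not, the usual remedy is to normalize both sides before comparing; in Qt that would mean going through QString, e.g. `QString::normalized(QString::NormalizationForm_C)`, which is exactly the UTF-16 conversion cost discussed above.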