On Thursday, 22 August 2019 05:31:44 PDT Edward Welbourne wrote:
> That's the UTF8 path Thiago is talking about.
> There is no short-cut, although I do wonder why there isn't a "search
> for the first byte whose top bit is set", which might equip us with one.

There is. It's that code you didn't understand: the simdDecodeAscii() function
is called from the UTF-8 decoder and fails only if the input isn't ASCII.

    for ( ; end - src >= 16; src += 16, dst += 16) {
        __m128i data = _mm_loadu_si128((const __m128i*)src);

[load 16 characters]

#ifdef __AVX2__
        const int BitSpacing = 2;
        // load and zero extend to an YMM register
        const __m256i extended = _mm256_cvtepu8_epi16(data);

[this is the Latin1 to UTF16 expansion, which is only right if all 16 bytes
are US-ASCII]

        uint n = _mm256_movemask_epi8(extended);

[this extracts the high bit from each byte]

        if (!n) {
            // store
            _mm256_storeu_si256((__m256i*)dst, extended);
            continue;

[if all 16 bytes were US-ASCII, repeat with the next 16]

        }

[here, we handle the case of the input containing non-ASCII: store the input
that was US-ASCII, then find the first US-ASCII scanning backwards from the
end]

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products

_______________________________________________
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development
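
For readers without AVX2 at hand, the "search for the first byte whose top bit is set" idea from the quoted question can be sketched in portable C++. This is only an illustration of the principle, not the Qt implementation: decodeAsciiPrefix() is a hypothetical helper, and the 8-byte chunk size is an assumption chosen so that no intrinsics are needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not part of Qt: widens the leading US-ASCII run of
// [src, src + len) into dst and returns how many bytes were converted.
// The return value is the index of the first non-ASCII byte, or len.
// dst must have room for len 16-bit units.
std::size_t decodeAsciiPrefix(const unsigned char *src, std::size_t len, char16_t *dst)
{
    const std::uint64_t HighBits = 0x8080808080808080ULL;
    std::size_t i = 0;

    // Check eight bytes at a time: any set top bit means non-ASCII.
    for ( ; len - i >= 8; i += 8) {
        std::uint64_t chunk;
        std::memcpy(&chunk, src + i, sizeof(chunk));  // unaligned-safe load
        if (chunk & HighBits)
            break;                                    // byte loop below finds the exact position
        for (std::size_t j = 0; j < 8; ++j)
            dst[i + j] = src[i + j];                  // zero-extend each byte to UTF-16
    }

    // Handle the tail, or the chunk that contained a non-ASCII byte.
    for ( ; i < len && src[i] < 0x80; ++i)
        dst[i] = src[i];

    return i;
}
```

A caller would treat a return value smaller than len as "fall back to the full UTF-8 decoder from this offset", which mirrors the role the failure case plays in the excerpt above.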