D19107: Write valid UTF8 characters without escaping.

2019-02-20 Thread Jos van den Oever
This revision was not accepted when it landed; it landed in state "Needs 
Review".
This revision was automatically updated to reflect the committed changes.
Closed by commit R237:2cdcd4f30666: Write valid UTF8 characters without 
escaping. (authored by vandenoever).

REPOSITORY
  R237 KConfig

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D19107?vs=52096&id=52159

REVISION DETAIL
  https://phabricator.kde.org/D19107

AFFECTED FILES
  autotests/kconfigtest.cpp
  autotests/kconfigtest.h
  src/core/kconfigini.cpp

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-20 Thread Thiago Macieira
thiago added a comment.


  It looks good to me.

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-20 Thread Jos van den Oever
vandenoever added a comment.


  So it's accepted?

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-19 Thread Thiago Macieira
thiago added a comment.


  Much better, thanks.

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-19 Thread Jos van den Oever
vandenoever added inline comments.

INLINE COMMENTS

> thiago wrote in kconfigini.cpp:683
> This function does operate properly to find valid syntax UTF-8 sequences, but 
> it is neither catching overlong sequences nor UTF-8 content above U+10 
> (UTF-8 can encode 0x11000 in 4 bytes).
> 
> See 
> https://code.woboq.org/qt5/qtbase/tests/auto/corelib/codecs/utf8/utf8data.cpp.html#_Z19loadInvalidUtf8Rowsv
>  for potential UTF-8 pitfalls.

Thanks for the unicode explanation. I've added the checks for out of range and 
overlong now.

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-19 Thread Jos van den Oever
vandenoever updated this revision to Diff 52096.
vandenoever added a comment.


  Clean up test by using QTest data.

REPOSITORY
  R237 KConfig

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D19107?vs=52091&id=52096

BRANCH
  utf8

REVISION DETAIL
  https://phabricator.kde.org/D19107

AFFECTED FILES
  autotests/kconfigtest.cpp
  autotests/kconfigtest.h
  src/core/kconfigini.cpp

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-19 Thread Jos van den Oever
vandenoever updated this revision to Diff 52091.
vandenoever added a comment.


  Remove duplicate if statement.

REPOSITORY
  R237 KConfig

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D19107?vs=52090&id=52091

BRANCH
  utf8

REVISION DETAIL
  https://phabricator.kde.org/D19107

AFFECTED FILES
  autotests/kconfigtest.cpp
  autotests/kconfigtest.h
  src/core/kconfigini.cpp

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-19 Thread Jos van den Oever
vandenoever updated this revision to Diff 52090.
vandenoever added a comment.


  Added tests and code fixes to deal with overlong sequences and content 
  above U+10.

REPOSITORY
  R237 KConfig

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D19107?vs=52015&id=52090

BRANCH
  utf8

REVISION DETAIL
  https://phabricator.kde.org/D19107

AFFECTED FILES
  autotests/kconfigtest.cpp
  autotests/kconfigtest.h
  src/core/kconfigini.cpp

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-19 Thread Jos van den Oever
vandenoever added inline comments.

INLINE COMMENTS

> thiago wrote in kconfigini.cpp:683
> This function does operate properly to find valid syntax UTF-8 sequences, but 
> it is neither catching overlong sequences nor UTF-8 content above U+10 
> (UTF-8 can encode 0x11000 in 4 bytes).
> 
> See 
> https://code.woboq.org/qt5/qtbase/tests/auto/corelib/codecs/utf8/utf8data.cpp.html#_Z19loadInvalidUtf8Rowsv
>  for potential UTF-8 pitfalls.

My understanding of overlong sequences was wrong. I understand now what is 
meant from your link.

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-19 Thread Thiago Macieira
thiago added inline comments.

INLINE COMMENTS

> vandenoever wrote in kconfigini.cpp:683
> A check for U+10 > value is needed.
> Overlong sequences are caught on line 696 (count < 4).

That's not what an overlong sequence is. You can produce 2-, 3- and 4-byte 
overlong sequences. See the examples in the Qt test, like 0xc0 0x80.

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-19 Thread Jos van den Oever
vandenoever added inline comments.

INLINE COMMENTS

> thiago wrote in kconfigini.cpp:683
> This function does operate properly to find valid syntax UTF-8 sequences, but 
> it is neither catching overlong sequences nor UTF-8 content above U+10 
> (UTF-8 can encode 0x11000 in 4 bytes).
> 
> See 
> https://code.woboq.org/qt5/qtbase/tests/auto/corelib/codecs/utf8/utf8data.cpp.html#_Z19loadInvalidUtf8Rowsv
>  for potential UTF-8 pitfalls.

A check for U+10 > value is needed.
Overlong sequences are caught on line 696 (count < 4).

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-19 Thread Thiago Macieira
thiago added inline comments.

INLINE COMMENTS

> kconfigini.cpp:683
> +// When an additional byte leads to an invalid character, return 
> false.
> +bool addByte(unsigned char b) {
> +if (count == 0) {

This function does operate properly to find valid syntax UTF-8 sequences, but 
it is neither catching overlong sequences nor UTF-8 content above U+10 
(UTF-8 can encode 0x11000 in 4 bytes).

See 
https://code.woboq.org/qt5/qtbase/tests/auto/corelib/codecs/utf8/utf8data.cpp.html#_Z19loadInvalidUtf8Rowsv
 for potential UTF-8 pitfalls.

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-18 Thread Jos van den Oever
vandenoever updated this revision to Diff 52015.
vandenoever added a comment.


  - Remove VALUE define.
  - Spelling fix.

REPOSITORY
  R237 KConfig

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D19107?vs=51933&id=52015

BRANCH
  utf8

REVISION DETAIL
  https://phabricator.kde.org/D19107

AFFECTED FILES
  autotests/kconfigtest.cpp
  autotests/kconfigtest.h
  src/core/kconfigini.cpp

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-18 Thread Jos van den Oever
vandenoever added inline comments.

INLINE COMMENTS

> dfaure wrote in kconfigtest.cpp:1774
> What's the purpose of this very short-lived define, compared to just inlining 
> this into the next line?

Right, copy-paste leftover. I'll fix it.

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-18 Thread Jos van den Oever
vandenoever added a reviewer: thiago.

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol, #frameworks, thiago
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-18 Thread David Faure
dfaure added a comment.


  No objection from me, but I'm no utf-8 expert.
  
  Thiago's input would be very valuable...

INLINE COMMENTS

> kconfigtest.cpp:1774
> +#endif
> +#define VALUE 
> "v1=Téléchargements\nv2=$¢ह€𐍈\nv3=\\xc2\\xe0\\xa4\\xf0\\x90\\x8d"
> +QCOMPARE(fileBytes, QByteArrayLiteral("[General]\n" VALUE "\n"));

What's the purpose of this very short-lived define, compared to just inlining 
this into the next line?

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol, #frameworks
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-18 Thread Nathaniel Graham
ngraham added a reviewer: Frameworks.

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol, #frameworks
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-18 Thread Nathaniel Graham
ngraham edited the summary of this revision.

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-17 Thread Viorel-Cătălin Răpițeanu
rapiteanu added a comment.


  The commit fixes (as expected) the config write issue observed in 403557.

REPOSITORY
  R237 KConfig

REVISION DETAIL
  https://phabricator.kde.org/D19107

To: vandenoever, dfaure, arichardson, apol
Cc: rapiteanu, kde-frameworks-devel, michaelh, ngraham, bruns


D19107: Write valid UTF8 characters without escaping.

2019-02-17 Thread Jos van den Oever
vandenoever created this revision.
vandenoever added reviewers: dfaure, arichardson, apol.
Herald added a project: Frameworks.
Herald added a subscriber: kde-frameworks-devel.
vandenoever requested review of this revision.

REVISION SUMMARY
  commit 6a18528 
 
introduced escaping of bytes >= 127 to ensure that
  KConfig files are valid UTF8.
  The simplistic approach with a cutoff results in many escaped bytes
  where it is not required. Especially non-western configuration files
  would have many escapes.
  
  This commit fixes that by only escaping bytes that are not valid UTF8.

TEST PLAN
  ninja && ninja test

REPOSITORY
  R237 KConfig

BRANCH
  utf8

REVISION DETAIL
  https://phabricator.kde.org/D19107

AFFECTED FILES
  autotests/kconfigtest.cpp
  autotests/kconfigtest.h
  src/core/kconfigini.cpp

To: vandenoever, dfaure, arichardson, apol
Cc: kde-frameworks-devel, michaelh, ngraham, bruns