On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu <ji...@openjdk.org> wrote:

> Hi all,
> 
> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
> However, I failed with C4474 and C4778 warnings as below:
> 
> Compiling 100 properties into resource bundles for java.desktop
> Compiling 3038 files for java.base
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
> C2220: the following warning is treated as an error
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4778: 'sscanf' : unterminated format string 
> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
> placeholders and their parameters expect 1 variadic arguments, but 3 were 
> provided
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4778: 'sscanf' : unterminated format string 
> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
> placeholders and their parameters expect 0 variadic arguments, but 2 were 
> provided
> 
> 
> The failure is caused by non-ASCII chars in the format string of sscanf 
> [1][2], which is non-portable on our Windows platform.
> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
> disabled in JDK-8216154 [3].
> And I also found an article showing that sscanf may fail with non-ASCII in 
> the format string [4].
> 
> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
> And I think it's safe to do so.
> 
> This is because:
>  1) There are actually no non-ASCII chars for package/class/method/signature 
> names.
>  2) I don't think there is a use case, in which people will input non-ASCII 
> for `CompileCommand`.
> 
> You may argue that the non-ASCII may be used by the parser itself.
> But I didn't find that usage at all. (Please let me know if I miss something.)
> 
> So I suggest to remove these non-ASCII code to make HotSpot to be more 
> portable.
> And if we do so, we can also remove the only one 
> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
> 
> Testing:
>  - Build tests on Windows
>  - tier1~3 on Linux/x64
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
> [2] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
> [3] 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
> [5] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246

Some misc remarks from a build PoV:
 * We count language and region settings as a build environment requirement, 
not a portability issue. 
 * It is really a shame that Microsoft is making changes to these so darned 
hard. On all other platforms, LC_ALL=C in the make file just fixes the problem. 
:-(
 * But we do want the JDK to be easy to build, so this means that we might need 
to support building on more than en_US on Windows, at least until Microsoft 
get's their act together.

>From what I see in the discussion here there seems to be no clarity in what 
>range of character the specification allows. This needs to be absolutely clear 
>for any changes here -- we can't filter out legal characters just because they 
>are problematic to build on non en_US platforms.

However, I'm thinking that you need to take a step back and see what you are 
really trying to solve. To me, it seems that sscanf is not the right tool for 
the job, and the fact that it has worked until now is more a lucky coincidence. 
It seems, from a quick glance, that you should consider the input a byte array, 
and process it like that, instead of a string, if the encoding is unclear, and 
the spec is talking about character values (like 0x7f) rather than what 
characters they are supposed to represent in a specific encoding.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5704

Reply via email to