Hello,
This cannot be considered a bug, because Global does not
support multi-byte character code sets.
[/usr/local/share/gtags/FAQ]
--------------------------------------------------------------
Q10. Does Global support multi-byte code set?
Which character code set is supported?
A10. Global doesn't support multi-byte character code set yet.
Global supports only ASCII and ASCII super-sets.
--------------------------------------------------------------
In Shift-JIS, the string literal "機能" consists of the following bytes:
0x22 "
0x8b (binary)
0x40 @
0x94 (binary)
0x5c \
0x22 "
Since 0x5c ('\') escapes the following 0x22 ('"'), the parser treats the
rest of the source code as one long string literal. The parser cannot
detect this as an error, because for an ASCII-only parser this is the
correct behavior.
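
The effect can be reproduced with a small sketch. This is only an
illustration of a byte-wise, ASCII-only scan of a string literal; it is not
Global's actual parser code, and the names find_closing_quote, sjis_literal
and utf8_literal are made up for this example. Both byte arrays hold the
literal "機能" followed by ';'.

```c
#include <stddef.h>

/*
 * Illustrative only: scan a C string literal byte by byte, the way an
 * ASCII-only parser would. This is NOT Global's actual parser code.
 *
 * buf[0] must be the opening '"'. Returns the index of the byte taken as
 * the closing '"', or -1 if the buffer ends before one is found.
 */
static long find_closing_quote(const unsigned char *buf, size_t len)
{
    size_t i;

    if (len == 0 || buf[0] != 0x22)
        return -1;
    for (i = 1; i < len; i++) {
        if (buf[i] == 0x5c) {   /* '\': skip the byte it escapes */
            i++;
            continue;
        }
        if (buf[i] == 0x22)     /* unescaped '"' ends the literal */
            return (long)i;
    }
    return -1;
}

/* "機能"; encoded in Shift-JIS: the second byte of 能 is 0x5c. */
static const unsigned char sjis_literal[] = {
    0x22, 0x8b, 0x40, 0x94, 0x5c, 0x22, 0x3b
};

/* "機能"; encoded in UTF-8: no byte is 0x5c. */
static const unsigned char utf8_literal[] = {
    0x22, 0xe6, 0xa9, 0x9f, 0xe8, 0x83, 0xbd, 0x22, 0x3b
};
```

Calling find_closing_quote(sjis_literal, sizeof sjis_literal) finds no
closing quote, because the 0x5c byte hides it, while the same call on
utf8_literal finds the closing quote at index 7. That matches the report:
the UTF-8 file is parsed completely and the Shift-JIS file is not.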
Regards,
Shigio
On Fri, Nov 17, 2023 at 11:46 AM Johnny Cheng <[email protected]> wrote:
> Hi,
>
> I found that if a file contains a specific sequence of CJK characters, the
> parser seems to fail to continue parsing the file.
>
> See the following example source file, say `test.c`, encoded in
> Shift-JIS (cp932).
>
> extern void printf(char * msg, ...);
>
> void Foo() {
>     char msg[] = "機能";
>     printf(msg);
> }
>
> void Hello() {
>     return;
> }
>
> (In case the Kanji are garbled by encoding issues, screenshots are also
> provided below.)
>
> - *What occurred? (as is)*
>
> If you run the `gtags` command in the same folder, followed by `global -f
> test.c`, you get only one tag, `Foo`, although `Hello` should also be
> found.
>
> - *What did you expect from it?*
>
> However, if I modify the source a little bit, then tag `Hello` is found.
> See variations I tried in the table below.
>
>
> *Cases Table*
>
> Bad Case
>   Source code screenshot:
>     [image: image001.png] (encoding is cp932, or Shift-JIS)
>   Output of `global -f test.c`:
>     Foo 4 test.cpp void Foo() {
>
> Good Cases
>   Source code screenshots:
>     <image001.png> (encoding is UTF-8)
>     [image: image002.png] (encoding is cp932, or Shift-JIS)
>     [image: image003.png] (encoding is cp932, or Shift-JIS)
>   Output of `global -f test.c`:
>     Foo 4 test.cpp void Foo() {
>     Hello 9 test.cpp void Hello() {
>
>
> *My environment*
>
> OS: Windows 11 Enterprise 22H2 64bit Build 22621.2428
>
> gtags --version:
>
>   gtags (Global) 6.6.9
>   Powered by Berkeley DB 1.85.
>   Copyright (c) 1996-2022 Tama Communications Corporation
>   License GPLv3+: GNU GPL version 3 or later
>   http://www.gnu.org/licenses/gpl.html
>   This is free software; you are free to change and redistribute it.
>   There is NO WARRANTY, to the extent permitted by law.
>
>
> *Possible Solutions*
>
> - Add a command-line encoding option so that the file is read properly.
> - Find out why such a file cannot be fully parsed, ignore this particular
> error, and continue parsing.
>
> Also, when this happens, at least print an error message to
> inform the user that some files were not fully parsed.
>
>
>
>
>
> Johnny Cheng
>
>
--
Shigio YAMAGUCHI <[email protected]>
PGP fingerprint:
26F6 31B4 3D62 4A92 7E6F 1C33 969C 3BE3 89DD A6EB