Hi,
I found that if a file contains a specific sequence of CJK characters, the
parser seems to fail to continue parsing the file.
See the following example source file, say `test.c`, encoded in Shift-JIS
(cp932).
extern void printf(char * msg, ...);
void Foo() {
    char msg[] = "機能";
    printf(msg);
}
void Hello() {
    return;
}
(In case the Kanji show up as mojibake due to encoding issues, screenshots are
also provided below.)
What happened? (as is)
If you run the `gtags` command in the same folder, followed by
`global -f test.c`, you only get one tag, `Foo`, but `Hello` should also be
found.
What did you expect from it?
I expected both `Foo` and `Hello` to be listed. If I modify the source a
little, the tag `Hello` is found. See the variations I tried in the table
below.
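Cases Table
+------------+-----------------------------------+---------------------------------+
| Cases      | Source Code Screenshot            | global -f test.c                |
+------------+-----------------------------------+---------------------------------+
| Bad Case   | (Encoding is cp932, or shift-jis) | Foo 4 test.cpp void Foo() {     |
+------------+-----------------------------------+---------------------------------+
| Good Cases | <image001.png>                    | Foo 4 test.cpp void Foo() {     |
|            | (Encoding is utf8)                | Hello 9 test.cpp void Hello() { |
|            | (Encoding is cp932, or shift-jis) |                                 |
|            | (Encoding is cp932, or shift-jis) |                                 |
+------------+-----------------------------------+---------------------------------+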
My environment
OS: Windows 11 Enterprise 22H2 64bit Build 22621.2428
gtags --version:
    gtags (Global) 6.6.9
    Powered by Berkeley DB 1.85.
    Copyright (c) 1996-2022 Tama Communications Corporation
    License GPLv3+: GNU GPL version 3 or later http://www.gnu.org/licenses/gpl.html
    This is free software; you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
Possible Solutions
- Add a command-line encoding option so that the file is read properly.
- Find out why such a file cannot be fully parsed, tolerate this particular
  error, and continue parsing.
- If such a case does happen, at least print an error message to inform the
  user that some files were not fully parsed.
Johnny Cheng