Hi Marcus,

Are you talking about a bug or a feature addition?
If it is a bug, could you please explain the specific steps to reproduce it?
If it is a new feature, could you please explain the specification?
Thank you in advance.

Regards,
Shigio

On Thu, May 30, 2024 at 2:33 AM Marcus Harnisch <[email protected]> wrote:
>
> Hi Shigio
>
> I am thinking about tackling this feature in a reasonably useful and robust
> way. I am not concerned about Python 2.x, but wouldn't want to break
> compatibility either. As it stands, ‘latin1’ encoding is used for
> implementing something like “binary but with newlines”.
>
> The current implementation of pygments_parser.py is incomplete with respect
> to I/O encoding and will probably break when challenged with characters
> outside the ASCII range.
> Input in any encoding that is not ASCII-compatible is probably not going to
> work at all.
> Many OS-facing functions, such as ‘os.getenv’, but also the low-level parts
> of ‘subprocess.Popen()’, use ‘sys.getfilesystemencoding()’ for determining the
> desired encoding. Most current Unix-like OSes are configured with UTF-8-based
> locales, and even Python on Windows defaults to UTF-8 for OS-facing encoding
> (since 2016, Python 3.6+, PEP 529).
> Any non-ASCII content of gtags.conf is most likely going to break
> pygments_parser.py in one way or another. I'd propose to rely on
> ‘sys.getfilesystemencoding()’ for reading it as well.
> Source code must be presented to Pygments' lexers as a string. Programming
> languages that allow non-ASCII source code would normally use UTF-8 (e.g.
> Python), which I'd recommend for ‘read_file()’, possibly with an appropriate
> error handler. Depending on how a lexer implements string handling, exotic
> encodings might even be less broken than before if bytes are preserved via
> ‘surrogateescape’ or ‘backslashreplace’.
>
> IMHO, relying on the respective system default encoding in most places and an
> explicit UTF-8 in read_file() is going to improve compatibility and, as a
> side effect, help with unifying code paths between Python 2 and 3.
>
> Best regards,
> Marcus
>
> On Thu, May 16, 2024 at 12:42 AM Marcus Harnisch
> <[email protected]> wrote:
>>
>> Hi Shigio
>>
>> Glad to hear that it didn't work :-) Thank you for adding this to the known
>> bugs list.
>>
>> Best regards,
>> Marcus
>>
>> On Tue, May 14, 2024 at 8:16 AM Shigio YAMAGUCHI <[email protected]> wrote:
>>>
>>> Hi Marcus,
>>> I confirmed that the problem is reproduced.
>>> I have made a new entry to the 'Known bugs' list.
>>> Thank you for the report.
>>>
>>> [https://www.gnu.org/software/global/bugs.html]
>>> o Pygments plug-in parser with python3 does not work, if 'ctagscom' is not
>>>   set.
>>>   If it is not set, default path obtained by configure script should be
>>>   used.
>>>
>>>   $ cat > gtags.conf
>>>   default:\
>>>       :ctagscom=:\
>>>       :langmap=C\:.c.h:\
>>>       :gtags_parser=C\:/usr/local/lib/gtags/pygments-parser.la:
>>>   $ gtags
>>>   $ global -x '.*'
>>>   $ _ # no tags
>>>
>>> Regards,
>>> Shigio
>>>
>>> On Mon, May 13, 2024 at 5:04 PM Marcus Harnisch
>>> <[email protected]> wrote:
>>> >
>>> > Hi Shigio
>>> >
>>> > On Sat, May 11, 2024 at 5:35 AM Shigio YAMAGUCHI <[email protected]> wrote:
>>> >>
>>> >> $ cat gtags.conf
>>> >> default:\
>>> >>     :ctagscom=/opt/local/bin/uctags:\
>>> >>     :langmap=C\:.c.h:\
>>> >>     :gtags_parser=C\:/usr/local/lib/gtags/pygments-parser.la:
>>> >
>>> >
>>> > The important difference, which exposes the bug, is your explicit
>>> > configuration of ctagscom. Leave it undefined and rely on whatever
>>> > UNIVERSAL_CTAGS has been configured to. Only if ctagscom is empty will
>>> > you see a comparison between b'' (empty bytes object) and '' (empty
>>> > string).
>>> >
>>> > Best regards,
>>> > Marcus
>>>
>>>
>>>
>>> --
>>> Shigio YAMAGUCHI <[email protected]>
>>> PGP fingerprint:
>>> 26F6 31B4 3D62 4A92 7E6F 1C33 969C 3BE3 89DD A6EB

--
Shigio YAMAGUCHI <[email protected]>
PGP fingerprint: 26F6 31B4 3D62 4A92 7E6F 1C33 969C 3BE3 89DD A6EB
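
For illustration, the two points raised in the thread (the b'' versus '' comparison when ctagscom is empty, and reading source files as UTF-8 with an error handler that preserves bytes) might look roughly like the sketch below. This is not the code in pygments_parser.py. It assumes Python 3 only, the helper name is_unset() is hypothetical, and the signature of read_file() is an assumption; only the function's name appears in the discussion.

```python
import os
import tempfile


def is_unset(value):
    """True if a setting such as 'ctagscom' is empty, as bytes or as str.

    Hypothetical helper: in Python 3, b'' == '' is False, so the value is
    normalised to str with the filesystem encoding before comparing.
    """
    return os.fsdecode(value) == ""


def read_file(path):
    """Return the file's contents as str, decoded as UTF-8.

    Bytes that are not valid UTF-8 are mapped to lone surrogates by the
    'surrogateescape' handler, so they can be restored on re-encoding.
    newline="" disables newline translation.
    """
    with open(path, encoding="utf-8", errors="surrogateescape", newline="") as f:
        return f.read()


if __name__ == "__main__":
    raw = b"caf\xe9 = 1\n"  # Latin-1 bytes, not valid UTF-8
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(raw)
    try:
        text = read_file(tmp.name)
        # The original bytes survive a decode/encode round trip.
        assert text.encode("utf-8", "surrogateescape") == raw
    finally:
        os.remove(tmp.name)
    assert is_unset(b"") and is_unset("")
```

os.fsdecode() uses the same encoding that sys.getfilesystemencoding() reports, which is why it is used here for values that come from the OS side. Whether a given Pygments lexer copes with the lone surrogates that 'surrogateescape' produces would need to be checked per lexer.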
