Re: Python 3 and pygments-parser

Shigio YAMAGUCHI Sun, 02 Jun 2024 23:21:42 -0700

Hi Marcus,

Are you talking about a bug or a feature addition?
If it is a bug, could you please explain the specific steps to reproduce it?
If it is a new feature, could you please explain the specification?
Thank you in advance.


Regards,
Shigio

On Thu, May 30, 2024 at 2:33 AM Marcus Harnisch
<[email protected]> wrote:
>
> Hi Shigio
>
> I am thinking about tackling this feature in a reasonably useful and robust 
> way. I am not concerned about Python 2.x, but wouldn't want to break 
> compatibility either. As it stands, ‘latin1’ encoding is used for 
> implementing something like “binary but with newlines”.
>
> The current implementation of pygments_parser.py is incomplete with respect
> to I/O encoding and will probably break when it encounters characters outside
> the ASCII range.
> Input in any encoding that is not ASCII-compatible is probably not going to
> work at all.
> Many OS-facing functions, such as ‘os.getenv’, but also the low-level parts
> of ‘subprocess.Popen()’, use ‘sys.getfilesystemencoding()’ to determine the
> desired encoding. Most current Unix-like systems are configured with UTF-8
> based locales, and even Python on Windows defaults to UTF-8 for OS-facing
> encoding (since 2016, Python 3.6+, PEP 529).
> Any non-ASCII content in gtags.conf is most likely going to break
> pygments_parser.py in one way or another. I'd propose relying on
> ‘sys.getfilesystemencoding()’ for reading it as well.
> Source code must be presented to Pygments' lexers as a string. Programming
> languages that allow non-ASCII source code would normally use UTF-8 (e.g.
> Python), which I'd recommend for ‘read_file()’, possibly with an appropriate
> error handler. Depending on how a lexer implements string handling, exotic
> encodings might even be less broken than before if bytes are preserved via
> ‘surrogateescape’ or ‘backslashreplace’.
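For illustration, here is a minimal sketch of what such a read_file() could look like on Python 3. The function body, the sample bytes and the temporary file are assumptions made for this example, not the code in pygments_parser.py. It uses 'surrogateescape', one of the two error handlers named above, which does not exist in Python 2.

```python
import os
import tempfile


def read_file(path):
    # Hypothetical stand-in for read_file() in pygments_parser.py.
    # Decode as UTF-8; bytes that are not valid UTF-8 become lone
    # surrogates instead of raising UnicodeDecodeError.
    # newline='' disables newline translation so the text maps back
    # to the original bytes exactly.
    with open(path, encoding='utf-8', errors='surrogateescape',
              newline='') as f:
        return f.read()


# A Latin-1 encoded source line: b'\xe9' alone is not valid UTF-8.
raw = b'name = "caf\xe9"\n'

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)

try:
    text = read_file(tmp.name)
finally:
    os.remove(tmp.name)

# The undecodable byte survives as a surrogate code point and can be
# restored by encoding with the same error handler.
restored = text.encode('utf-8', errors='surrogateescape')
```

With 'backslashreplace' instead, the same byte would be decoded to the literal text \xe9, which is readable but cannot be turned back into the original byte automatically.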
>
> IMHO, relying on the respective system default encoding in most places and an
> explicit UTF-8 in read_file() is going to improve compatibility and, as a side
> effect, help with unifying code paths between Python 2 and 3.
>
> Best regards,
> Marcus
>
> On Thu, May 16, 2024 at 12:42 AM Marcus Harnisch 
> <[email protected]> wrote:
>>
>> Hi Shigio
>>
>> Glad to hear that it didn't work :-) Thank you for adding this to the known 
>> bugs list.
>>
>> Best regards,
>> Marcus
>>
>> On Tue, May 14, 2024 at 8:16 AM Shigio YAMAGUCHI <[email protected]> wrote:
>>>
>>> Hi Marcus,
>>> I confirmed that the problem is reproduced.
>>> I have made a new entry to the 'Known bugs' list.
>>> Thank you for the report.
>>>
>>> [https://www.gnu.org/software/global/bugs.html]
>>> o Pygments plug-in parser with python3 does not work, if 'ctagscom' is not set.
>>>   If it is not set, default path obtained by configure script should be used.
>>>
>>> $ cat > gtags.conf
>>> default:\
>>>         :ctagscom=:\
>>>         :langmap=C\:.c.h:\
>>>         :gtags_parser=C\:/usr/local/lib/gtags/pygments-parser.la:
>>> $ gtags
>>> $ global -x '.*'
>>> $ _                             # no tags
>>>
>>> Regards,
>>> Shigio
>>>
>>> On Mon, May 13, 2024 at 5:04 PM Marcus Harnisch
>>> <[email protected]> wrote:
>>> >
>>> > Hi Shigio
>>> >
>>> > On Sat, May 11, 2024 at 5:35 AM Shigio YAMAGUCHI <[email protected]> wrote:
>>> >>
>>> >> $ cat gtags.conf
>>> >> default:\
>>> >> :ctagscom=/opt/local/bin/uctags:\
>>> >> :langmap=C\:.c.h:\
>>> >> :gtags_parser=C\:/usr/local/lib/gtags/pygments-parser.la:
>>> >
>>> >
>>> > The important difference, which exposes the bug, is your explicit
>>> > configuration of ctagscom. Leave it undefined and rely on whatever
>>> > UNIVERSAL_CTAGS has been configured to. Only if ctagscom is empty will
>>> > you see a comparison between b'' (empty bytes object) and '' (empty
>>> > string).
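As a small illustration of the comparison described above: on Python 3 a bytes object never compares equal to a str, so a check against '' silently fails when the value is b''. The names raw_ctagscom and to_text() are hypothetical and chosen for this sketch only; the real variable names in pygments_parser.py may differ.

```python
import sys


def to_text(value):
    # Hypothetical helper: turn bytes that came from the OS (a pipe,
    # the environment) into str before comparing them with str.
    if isinstance(value, bytes):
        return value.decode(sys.getfilesystemencoding(),
                            errors='surrogateescape')
    return value


# Assumed stand-in for an empty 'ctagscom' value read as bytes.
raw_ctagscom = b''

# Naive check: bytes vs. str, never equal on Python 3.
looks_empty_naive = (raw_ctagscom == '')

# Normalised check: both sides are str.
looks_empty_fixed = (to_text(raw_ctagscom) == '')
```

On Python 2, bytes and str are the same type, so the naive check happens to work there, which may be why the problem only shows up with python3.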
>>> >
>>> > Best regards,
>>> > Marcus
>>>
>>>
>>>
>>> --
>>> Shigio YAMAGUCHI <[email protected]>
>>> PGP fingerprint:
>>> 26F6 31B4 3D62 4A92 7E6F  1C33 969C 3BE3 89DD A6EB



-- 
Shigio YAMAGUCHI <[email protected]>
PGP fingerprint:
26F6 31B4 3D62 4A92 7E6F  1C33 969C 3BE3 89DD A6EB
