Hi Ben,

On Sun, Oct 15, 2023 at 02:22:43AM -0700, Ben Wong wrote:
> Package: dos2unix
> Version: 7.5.1-1
> Severity: normal
> X-Debbugs-Cc: [email protected]
> 
> Dear Maintainer,
> 
> The dos2unix man page claims that the default mode is "ASCII" and that
> in ASCII mode only line endings will be changed. This is no longer
> true. In the default mode, UTF-16 is converted to UTF-8 and the BOM is
> removed.
> 
> I do not know if this is still considered an "ASCII" mode or if the
> default is some new UTF-8 mode. Please consider updating the
> documentation to match the current behavior.

Thank you for your bug report.

I believe the portion of the manpage you are referring to is:

CONVERSION MODES
  ascii
    In mode "ascii" only line breaks are converted. This is the default
    conversion mode.  [**Missing information about UTF-16 behavior.**]

    Although the name of this mode is ASCII, which is a 7 bit standard,
    the actual mode is 8 bit. Use  always  this  mode  when  converting
    Unicode UTF-8 files.

Is this where you are expecting to see the manpage updated?

It is perhaps somewhat hidden in the manpage, but I think this at least
partially addresses the use case you describe:

  -u, --keep-utf16
      Keep  the  original  UTF-16  encoding of the input file. The output
      file will be written in the same UTF-16  encoding,  little  or  big
      endian,  as the input file.  This prevents transformation to UTF-8.
      An UTF-16 BOM will be  written  accordingly.  This  option  can  be
      disabled with the "-ascii" option.

That is, -ascii (the default mode) disables --keep-utf16, so by default
dos2unix *does* perform the transformation to UTF-8 and *does not* write
the UTF-16 BOM.
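
To make the behavior discussed here concrete, below is a rough Python
sketch of it. This is only an illustration of what the report and the
manpage excerpts describe, not dos2unix's actual implementation; the
function name convert_like_default_mode and its keep_utf16 parameter are
hypothetical, and it assumes UTF-16 input is recognized solely by its BOM.

```python
import codecs


def convert_like_default_mode(data: bytes, keep_utf16: bool = False) -> bytes:
    """Emulate the behavior described in this thread (hypothetical helper).

    Not dos2unix's real code: UTF-16 input is assumed to be detected
    only by a leading byte order mark (BOM).
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        bom, encoding = codecs.BOM_UTF16_LE, "utf-16-le"
    elif data.startswith(codecs.BOM_UTF16_BE):
        bom, encoding = codecs.BOM_UTF16_BE, "utf-16-be"
    else:
        # No UTF-16 BOM: treat the input as 8-bit data and change
        # only the line breaks, as the "ascii" mode text describes.
        return data.replace(b"\r\n", b"\n")

    # Strip the BOM, decode, and convert DOS line breaks to Unix ones.
    text = data[len(bom):].decode(encoding).replace("\r\n", "\n")

    if keep_utf16:
        # Like -u/--keep-utf16: same encoding and endianness, BOM written.
        return bom + text.encode(encoding)

    # Default as reported: output is UTF-8 and the BOM is dropped.
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "a\r\nb\r\n".encode("utf-16-le")
    print(convert_like_default_mode(sample))
    print(convert_like_default_mode(sample, keep_utf16=True))
```

The first call mirrors the default reported in this bug, the second
mirrors what -u is documented to do.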

I will forward the report to the upstream author.

Thank you,
tony
