Update of bug #62830 (group groff):
Status: In Progress => Fixed
Open/Closed: Open => Closed
_______________________________________________________
Follow-up Comment #15:
commit 78d1ef7c37edeb8cc39ae15bec3020eb31472bd8
Author: TANAKA Takuji <[email protected]>
Date: Fri Dec 29 13:56:37 2023 +0000
Support CJK fonts encoded in UTF-16 (1/6).
* src/include/unicode.h (to_utf8_string): Declare new function.
* src/libs/libgroff/unicode.cpp (to_utf8_string): New function converts
input integer into a UTF-8 sequence (or, if the integer is out of range,
a hexadecimal HTML numeric character reference).
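A minimal C++ sketch of the conversion described for `to_utf8_string()`. The name `utf8_or_ncr`, its `unsigned long` parameter, and the choice of U+10FFFF as the upper bound of the valid range are assumptions made for illustration; groff's actual declaration in unicode.h may differ, and this sketch does not reject surrogate code points.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for groff's to_utf8_string(); the real signature
// may differ. Assumes "out of range" means above U+10FFFF.
std::string utf8_or_ncr(unsigned long cp)
{
  if (cp > 0x10FFFF) {
    // Fall back to a hexadecimal HTML numeric character reference.
    char buf[32];
    std::snprintf(buf, sizeof buf, "&#x%lX;", cp);
    return buf;
  }
  std::string out;
  if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {                         // 4 bytes: 11110xxx + three continuations
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

For example, U+3042 (HIRAGANA LETTER A) becomes the three bytes E3 81 82, while 0x110000 becomes "&#x110000;".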
commit 6692471f0a31f00b052cec9b223ed963a130edc1
Author: TANAKA Takuji <[email protected]>
Date: Fri Dec 29 13:56:37 2023 +0000
Support CJK fonts encoded in UTF-16 (2/6).
* src/include/font.h (class font): Declare private member variable
`wch`, a pointer to the existing type `font_char_metric`, used as the
head of a linked list. Declare private member function
`get_font_wchar_metric()` to access it.
* src/libs/libgroff/font.cpp (struct font_char_metric): Add members
`next` (a pointer to the struct's own type) and `end_code` of type
`int`.
(glyph_to_ucs_codepoint): New function returns UCS code point from a
(non-composite) `glyph` object, or -1 if invalid.
(font::font): Constructor initializes `wch` member variable to null
pointer.
(font::~font): Destructor frees storage allocated in `font::load()`
for `special_device_coding` member of `wcp` struct, and that of `wcp`
itself.
(font::contains): If `glyph_to_ucs_codepoint()` returns a valid value
for the glyph, populate its wide character metrics and return true.
(font::get_font_wchar_metric): New function obtains font metrics of
input character by Unicode code point.
(font::get_width, font::get_height, font::get_depth)
(font::get_italic_correction, font::get_left_italic_correction)
(font::get_subscript_correction, font::get_character_type)
(font::get_code, font::get_special_device_encoding): If
`glyph_to_ucs_codepoint()` returns a valid value for the glyph,
populate its wide character metrics and return the appropriate
parameter based on them.
(font::get_width): Add conditional guard when computing width for a
glyph from a "Unicode font"; use the computation only if the device
description file ("DESC") didn't declare "unscaled_charwidths".
(font::load): Recognize new directive in font description files:
"charset-range", which works like the existing "charset" directive
except that the glyph descriptions use a `name` of the form
"uFFFF..uFFFF" (where "FFFF" is a hexadecimal digit sequence), and
apply the metrics identically to all glyphs in the designated range.
(font::load): When processing glyph descriptions in "charset" section
and the device has declared the "unicode" directive, stop scaling the
width of the glyph by what `wcwidth()` returns for it. (Does this fix
Savannah #44018?)
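The parts of this commit that locate metrics by code point can be sketched in C++ as follows. This is a simplified model under stated assumptions, not groff's code: `name_to_codepoint()` takes a glyph name string rather than a `glyph` object, `parse_range()` handles a "charset-range" name such as "u4E00..u9FFF", and `range_metric` (with its `start_code` field) is a hypothetical stand-in for `font_char_metric` that keeps only a width. Groff's real rules for valid "uXXXX" names (leading zeros, surrogates) are stricter than what is checked here.

```cpp
#include <string>

// Hypothetical analogue of glyph_to_ucs_codepoint(): accepts "u" followed
// by 4 to 6 uppercase hex digits; composite names such as "u0041_0301"
// and ordinary names such as "em" yield -1.
long name_to_codepoint(const std::string &name)
{
  if (name.size() < 5 || name.size() > 7 || name[0] != 'u')
    return -1;
  long cp = 0;
  for (std::string::size_type i = 1; i < name.size(); i++) {
    char c = name[i];
    int digit;
    if (c >= '0' && c <= '9')
      digit = c - '0';
    else if (c >= 'A' && c <= 'F')
      digit = c - 'A' + 10;
    else
      return -1;   // lowercase hex, '_' (composite), or other character
    cp = cp * 16 + digit;
  }
  return cp <= 0x10FFFF ? cp : -1;
}

// Parse a "charset-range" glyph name of the form "uXXXX..uYYYY".
// Returns false if an endpoint is invalid or the range is reversed.
bool parse_range(const std::string &s, long &lo, long &hi)
{
  std::string::size_type dots = s.find("..");
  if (dots == std::string::npos)
    return false;
  lo = name_to_codepoint(s.substr(0, dots));
  hi = name_to_codepoint(s.substr(dots + 2));
  return lo >= 0 && hi >= 0 && lo <= hi;
}

// Simplified node: one per charset-range line, chained through `next`.
struct range_metric {
  long start_code;    // hypothetical name: first code point in the range
  long end_code;      // last code point in the range, inclusive
  int width;          // shared by every glyph in the range
  range_metric *next;
};

// Walk the list and return the node whose range covers `cp`, or nullptr.
const range_metric *find_metric(const range_metric *head, long cp)
{
  for (const range_metric *p = head; p != nullptr; p = p->next)
    if (cp >= p->start_code && cp <= p->end_code)
      return p;
  return nullptr;
}
```

A linear walk is enough to show how a single node with an `end_code` can supply metrics for a whole range; the commit message does not say whether groff searches the list the same way.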
commit 64e5f5c687160592d1a47a8dff83d8088fdcc39b
Author: TANAKA Takuji <[email protected]>
Date: Fri Dec 29 13:56:37 2023 +0000
Support CJK fonts encoded in UTF-16 (3/6).
* src/preproc/html/pre-html.cpp (scanArguments): Recognize but ignore
new option `-U`, used by `grohtml` postprocessor.
* src/devices/grohtml/post-html.cpp: Declare new constant integer
objects `CHARSET_ASCII`, `CHARSET_MIXED`, and `CHARSET_UTF8` to
configure representation of character entities in output.
(main): New option `-U` takes an argument configuring the means of
encoding character entities. If the argument is `0` or `-`, select
`CHARSET_ASCII`; if `1`, select `CHARSET_MIXED`; and if `2` or `+`,
select `CHARSET_UTF8`, which is also the default.
(to_unicode): Replace this function with... (to_numerical_char_ref):
...this, which generates a hexadecimal HTML numeric character reference.
(html_printer::add_to_sbuf): Write out UTF-8 sequence if
`charset_encoding` is not `CHARSET_ASCII`, otherwise a numerical
character reference.
(get_html_entity): Return UTF-8 sequence if `charset_encoding` is
`CHARSET_UTF8`. Otherwise, return UTF-8 sequence if `charset_encoding`
is not `CHARSET_ASCII` (that is, `CHARSET_MIXED`), and a numerical
character reference if it is.
(html_printer::writeHeadMetaStyle): Describe the document's content
(and, for XHTML, its encoding) as UTF-8 if `charset_encoding` is not
`CHARSET_ASCII`, otherwise as US-ASCII.
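The `-U` argument mapping and the header charset rule from this commit can be sketched in C++ as below. The constant values, the function names `parse_U_argument()` and `declared_charset()`, and the rejection of any other argument are assumptions; the commit message specifies only which argument selects which constant and that `CHARSET_UTF8` is the default.

```cpp
#include <cstring>

// Constant names follow the commit; their values are assumed.
const int CHARSET_ASCII = 0;
const int CHARSET_MIXED = 1;
const int CHARSET_UTF8 = 2;

// Default when -U is not given, per the commit message.
int charset_encoding = CHARSET_UTF8;

// Map a -U argument: "0" or "-" -> ASCII, "1" -> mixed, "2" or "+" -> UTF-8.
// Returns false (leaving `enc` unchanged) for anything else.
bool parse_U_argument(const char *arg, int &enc)
{
  if (std::strcmp(arg, "0") == 0 || std::strcmp(arg, "-") == 0)
    enc = CHARSET_ASCII;
  else if (std::strcmp(arg, "1") == 0)
    enc = CHARSET_MIXED;
  else if (std::strcmp(arg, "2") == 0 || std::strcmp(arg, "+") == 0)
    enc = CHARSET_UTF8;
  else
    return false;
  return true;
}

// Charset declared in the document header: UTF-8 unless ASCII was chosen.
const char *declared_charset(int enc)
{
  return enc == CHARSET_ASCII ? "US-ASCII" : "UTF-8";
}
```

Note that `CHARSET_MIXED` still declares UTF-8; only the pure-ASCII mode switches the header to US-ASCII.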
commit 7d91bcb4c29a8ba882149422f8e227edef4678ac
Author: TANAKA Takuji <[email protected]>
Date: Fri Dec 29 13:56:37 2023 +0000
Support CJK fonts encoded in UTF-16 (4/6).
* src/devices/grops/ps.h:
* src/devices/grops/ps.cpp: Include C99 "stdint.h" header for desired
`uint16_t` data type.
(class ps_output): Change type of `put_string` member function's first
argument from `const char *` to `const uint16_t *`. Add third
argument of Boolean type, `is_utf16le`.
* src/devices/grops/ps.cpp (ps_output::put_string): Adjust computations
of `len` and `col` locals if the font in use is UTF-16LE-encoded, and
write out 4-digit instead of 2-digit hexadecimal numeric literals when
that is the case.
(class ps_printer): Change type of `sbuf` member variable from `char`
to `uint16_t`. Change type of third argument to `set_subencoding`
member function from `unsigned char *` to `uint16_t *`.
(ps_printer::set_subencoding): Rename third argument from `codep` to
`code`--it's no longer an indirect reference to a single `char`, but a
2-element `uint16_t` array. If the font's "internalname" directive
contains the substring "-UTF16-", populate `code` argument with
little-endian 16-bit value.
(ps_printer::set_char): Declare `code` as above: a 2-element
`uint16_t` array instead of an unsigned char. Handle case of `code`
using surrogate pairs (`code[1] > 0`).
(ps_printer::flush_sbuf): Conditionalize form of output on font
encoding. Set the Boolean argument to `ps_output::put_string()`
according to whether the font's "internalname" directive contains the
substring "-UTF16-".
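A C++ sketch of two mechanisms from this commit: splitting a code point into the 2-element `uint16_t` form, where `code[1] > 0` signals a surrogate pair, and writing 4 rather than 2 hexadecimal digits per element for a UTF-16 font. The function names and the `wide` parameter (playing the role of `is_utf16le`) are hypothetical, line wrapping via `col` is omitted, and the byte order groff uses for "-UTF16-" fonts is not modeled; each 16-bit value is simply printed most significant digit first.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Fill code[0..1] with UTF-16 code units; code[1] stays 0 unless the code
// point lies outside the Basic Multilingual Plane (surrogate pair).
void to_utf16_units(std::uint32_t cp, std::uint16_t code[2])
{
  if (cp < 0x10000) {
    code[0] = static_cast<std::uint16_t>(cp);
    code[1] = 0;
  } else {
    cp -= 0x10000;
    code[0] = static_cast<std::uint16_t>(0xD800 + (cp >> 10));    // high surrogate
    code[1] = static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
  }
}

// Render a buffer as a PostScript hexadecimal string: 4 digits per element
// for a UTF-16 font, otherwise 2 digits (low byte only).
std::string ps_hex_string(const std::uint16_t *buf, std::size_t len, bool wide)
{
  std::string out = "<";
  char digits[8];
  for (std::size_t i = 0; i < len; i++) {
    if (wide)
      std::snprintf(digits, sizeof digits, "%04X",
                    static_cast<unsigned>(buf[i]));
    else
      std::snprintf(digits, sizeof digits, "%02X",
                    static_cast<unsigned>(buf[i] & 0xFF));
    out += digits;
  }
  out += ">";
  return out;
}
```

For instance, U+20BB7 splits into the pair D842 DFB7, and the two-character buffer {0x3042, 0x3044} becomes "<30423044>" in wide mode.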
commit 76c81423da32cf5eb1451bd69fd4ec7da5ad12c3
Author: TANAKA Takuji <[email protected]>
Date: Fri Dec 29 13:56:37 2023 +0000
Support CJK fonts encoded in UTF-16 (5/6).
Ship font description files. These are intended as abstractions of
faces that permit consistent naming while still allowing customization,
just as with the 12 text typefaces supported across output devices for
Latin scripts in groff (three families of four styles each). These CJK
font descriptions are not organized into groff font families, but their
names follow a comparable scheme: a two-letter script prefix (CS, CT,
JP, KO) followed by a one-letter style suffix.
CSH: Simplified Chinese, Hei style
CSS: Simplified Chinese, Song style
CTH: Traditional Chinese, Hei style
CTS: Traditional Chinese, Song style
JPG: Japanese, Gothic style
JPM: Japanese, Mincho style
KOG: Korean, Gothic style
KOM: Korean, Mincho style
* font/devdvi/CSH:
* font/devdvi/CSS:
* font/devdvi/CTH:
* font/devdvi/CTS:
* font/devdvi/JPG:
* font/devdvi/JPM:
* font/devdvi/KOG:
* font/devdvi/KOM:
* font/devhtml/CSH:
* font/devhtml/CSS:
* font/devhtml/CTH:
* font/devhtml/CTS:
* font/devhtml/JPG:
* font/devhtml/JPM:
* font/devhtml/KOG:
* font/devhtml/KOM:
* font/devps/CSH:
* font/devps/CSS:
* font/devps/CTH:
* font/devps/CTS:
* font/devps/JPG:
* font/devps/JPM:
* font/devps/KOG:
* font/devps/KOM:
* font/devutf8/CSH:
* font/devutf8/CSS:
* font/devutf8/CTH:
* font/devutf8/CTS:
* font/devutf8/JPG:
* font/devutf8/JPM:
* font/devutf8/KOG:
* font/devutf8/KOM: Ship font descriptions.
* font/devdvi/devdvi.am (DEVDVIFONTFILES):
* font/devhtml/devhtml.am (DEVHTMLFONTS, DEVHTMLFONTFILES):
* font/devps/devps.am (DEVPSFONTFILES):
* font/devutf8/devutf8.am (DEVUTF8FONTS, DEVUTF8FONTFILES): Add them.
The test "contrib/hdtbl/examples/test-hdtbl.sh" fails at this commit.
commit f7ca7ae9dd65e8747b3b23a53d50010720a0c711
Author: G. Branden Robinson <[email protected]>
Date: Wed Nov 20 19:51:12 2024 -0600
[hdtbl]: Update test expectations WRT new fonts.
* contrib/hdtbl/examples/test-hdtbl.sh.in: Update test expectations to
reflect addition of 8 font descriptions for CJK support.
commit 26dbf12ff0808169fedcc01a573280cdfba7c4ba
Author: TANAKA Takuji <[email protected]>
Date: Fri Dec 29 13:56:37 2023 +0000
Support CJK fonts encoded in UTF-16 (6/6).
* src/roff/groff/tests/dvi-device-smoke-test.sh:
* src/roff/groff/tests/ps-device-smoke-test.sh: New tests exercise
output drivers and their encodings of CJK characters.
* src/roff/groff/groff.am (groff_TESTS): Run tests.
Fixes <https://savannah.gnu.org/bugs/?62830>.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?62830>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
