[HarfBuzz] Rendering of Arabic shadda-kasrah
Could someone please look at the discussion and the data of the Emacs bug#34035 (https://debbugs.gnu.org/cgi/bugreport.cgi?bug=34035) and tell whether the fonts that produce incorrect display are faulty, and if so, what is the problem with those fonts?

Also, is there perhaps some way around these problems that would yield better results even with the fonts which currently display the kasrah below the base letter? Because I tried many different fonts with reasonable coverage of Arabic, and the vast majority of them produce this problem, so it seems like the fonts which don't are quite rare.

(As you see from the last messages, hb-view produces the same display as Emacs with HarfBuzz, so at least we are doing no worse in these cases, and we have reason to believe this is not an Emacs-specific problem.)

TIA
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] vertical text for RTL scripts?
> From: Phil M Perry
> Date: Wed, 15 Jul 2020 12:38:49 -0400
>
> my TTBHebrew example in HarfBuzz.pdf (did anyone look at it?)

I did.
[HarfBuzz] HarfBuzz crash when shaping Arabic?
Could someone please look at this Emacs bug report:

  https://debbugs.gnu.org/cgi/bugreport.cgi?bug=42352

and tell if something like this rings a bell? According to the backtrace, the crash happened inside HarfBuzz (the backtrace levels above that are the Emacs signal handling mechanism). The user who reported that uses HarfBuzz 2.3.1 as packaged by Debian:

  https://debbugs.gnu.org/cgi/bugreport.cgi?bug=42352#28

The crash happened when some Arabic text was passed to hb_shape_full without providing the explicit direction of the text, but instead relying on hb_buffer_guess_segment_properties to guess it. Was there perhaps a problem in that version of HarfBuzz that could crash like that?

TIA
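The difference between passing an explicit direction and letting the library guess can be illustrated with a small sketch. This is not HarfBuzz code and not how hb_buffer_guess_segment_properties is implemented (as far as I understand, HarfBuzz derives its guess from the script of the text); the function names `guess_direction` and `shaping_direction` are invented here, and the guess is based on the Unicode bidi class only because the Python standard library exposes that property.

```python
import unicodedata

def guess_direction(text):
    """Return "rtl" or "ltr" from the first strongly directional character.

    Returns None when the text has no strong character (digits, spaces...).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):   # Hebrew-like and Arabic-like letters
            return "rtl"
        if bidi_class == "L":           # Latin-like letters
            return "ltr"
    return None

def shaping_direction(text, explicit=None, fallback="ltr"):
    """Prefer the caller's explicit direction; guess only when none is given."""
    return explicit or guess_direction(text) or fallback

print(shaping_direction("\u0633\u0644\u0627\u0645"))     # Arabic text, guessed
print(shaping_direction("abc", explicit="rtl"))          # explicit value wins
```

A caller that already knows the paragraph direction (as a bidi-aware editor usually does) can pass it explicitly and avoid depending on the guess at all.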
Re: [HarfBuzz] vertical text for RTL scripts?
> Cc: harfbuzz@lists.freedesktop.org
> From: Phil M Perry
> Date: Mon, 13 Jul 2020 11:11:51 -0400
>
> Eli, I realize that (except for Chinese, Japanese, and possibly Korean),
> text is normally written horizontally (LTR or RTL). Vertical text is for
> special uses such as signage and advertising.

Yes, I understand that, and was replying with that in mind.

> Anyway, I'm still not sure what the convention is for writing vertical
> text in RTL languages. There's not much discussion of this online,
> except for "I want to get a Hebrew tattoo down my spine saying 'daughter
> of Jehovah' -- which way will read correctly?" The convention for LTR
> scripts is to start at the top and grow downwards, which is like taking
> the original LTR coordinate system and rotating it 90 degrees clockwise
> (with individual letters rotated back). The next line (column) is to the
> LEFT.

Yes, agreed.

> For RTL, my sources suggest that the last letter input (first one
> read)

This is fundamentally incorrect: both input from keyboard and reading are done in the same order, even for RTL languages. The only order which is reversed for RTL languages is the left-to-right order on display: the first RTL letter read is generally the rightmost, unlike with LTR scripts.

I think the above observation is important, because I'm guessing it is the basis of your confusion regarding the vertical layout. In the vertical layout, the left vs right issue no longer exists (at least as long as we are talking about a single column), so the distinction between LTR and RTL scripts also disappears. Therefore:

> should be at the TOP of the text column, which means rotating the
> original horizontal coordinate system 90 degrees COUNTERclockwise. For
> TTB of a RTL script, it is like a clockwise rotation, with the first
> input letter at the top, but reading from the bottom/original right.

No, the first input letter is at the top, and the first one you read is also at the top.

> Embedded LTR text is read TTB. For BTT, it is like a COUNTERclockwise
> rotation, with the first input letter at the bottom, reading from
> top/original right. Unfortunately, this leaves embedded LTR text
> backwards from what would be expected

No. Embedded LTR text will also be laid out TTB, i.e. without reordering it.

In short, in vertical layout there's no bidi reordering at all: both LTR and RTL characters are displayed in the logical order, top to bottom. Technically, I think this happens because bidi reordering per UAX#9 works on the line level, so when each character is a separate line, reordering has no effect.

> Also, for BTT, is it correct that the next line (column) is to the
> RIGHT?

Yes, I believe the columns should progress from right to left for the RTL text (modulo the base paragraph direction issue, which your description completely ignores, so my assumption is that you are talking about RTL text in a right-to-left paragraph and LTR text in a left-to-right paragraph, not the other way around).

> Finally, I tried some English (LTR Latin) text vertically with "field"
> in it, WITHOUT explicitly turning off ligatures (-liga), and it kept the
> "f" and "i" separate (good)... does this mean that HarfBuzz officially
> knows not to do ligatures with vertical text? Kerning doesn't appear to
> be a problem, either.

That's something for the HarfBuzz experts here to answer; I'm not such an expert.

HTH
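The point about logical order can be shown with a minimal sketch. It is only an illustration under simplifying assumptions: the function names are invented, each code point is treated as one cell (no grapheme clusters, no combining marks), and the horizontal case handles a purely RTL line rather than running the full UAX#9 algorithm.

```python
def vertical_column(logical_text):
    """One character per row, top to bottom, in logical (typing/reading) order."""
    return list(logical_text)

def horizontal_rtl_line(logical_text):
    """Visual left-to-right cells of a purely RTL line: logical order reversed."""
    return list(reversed(logical_text))

word = "\u05e9\u05dc\u05d5\u05dd"   # shin, lamed, vav, final mem (logical order)

# Vertical: the first letter typed (and read) is the top row.
print(vertical_column(word))
# Horizontal: the first letter typed (and read) is the rightmost cell.
print(horizontal_rtl_line(word))
# Mixed LTR and RTL text in a column keeps its logical order as well.
print(vertical_column("ab" + word))
```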
Re: [HarfBuzz] vertical text for RTL scripts?
> From: Phil M Perry
> Date: Sun, 12 Jul 2020 10:15:31 -0400
>
> Now, if I specify TTB direction, what should I see? Likewise, what
> should BTT direction show? I know very little about RTL/bidi scripts,
> and googling for examples gives ambiguous and conflicting information. I
> realize that most scripts and languages are rarely written vertically,
> except for East Asian (CJK) languages, but it would be nice to know that
> the code is handling them correctly.
>
> If you want to write Hebrew vertically, would you choose TTB or BTT?

Hebrew is not written vertically, any more than English or German are. So if you must write it vertically, I guess TTB would be the preferred layout, like with Latin scripts. For example, that's how vertically-laid-out shop signs are made.
Re: [HarfBuzz] Ligatures
> Date: Sun, 24 May 2020 20:27:26 +0100
> From: Richard Wordingham
> Cc: harfbuzz@lists.freedesktop.org
>
> It seems to me that Emacs knows what script a cluster is in; perhaps
> it just hasn't united the concepts.

It's a kind of coincidence: different scripts almost always require different fonts, and Emacs only composes characters displayed in the same font.

> Users may have written some weird clustering combinations, and I can
> imagine some weird combinations in the Private Use Areas. I should
> investigate.

Don't expect anything about PUA, Emacs doesn't assign any useful properties to them.

> > That's a feature (you can disable it with disable-point-adjustment).
>
> Is this documented in info, or does one have to trawl the code to find
> out what it does?

Every variable in Emacs has a doc string, and you can search them with several apropos commands. We don't describe in the manual every obscure variable, there are too many of them.

> It seems that Emacs needs several levels of movement
> - by codepoints, by grapheme cluster, by akshara (will be the same as
> grapheme cluster in many cases) and by HarfBuzz cluster, or whatever
> is used to make access into lam-alif impossible.

I have no idea which one Emacs uses, not in these terms. All I can say is that, in HarfBuzz terms, we get the number of "elements" from hb_buffer_get_length, and then index the arrays returned by hb_buffer_get_glyph_infos. Each "element" thus indexed is a separate "thing" for display purposes, and Emacs by default won't let you "enter" such a "thing", it will move across it in its entirety in one go.
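The cursor behaviour described above can be sketched as follows. This is a toy model, not Emacs or HarfBuzz code: it assumes that each display "element" has already been turned into a hypothetical `(start, end)` range of character positions, and the function names are invented for the illustration.

```python
def cursor_stops(elements):
    """Return the character positions where the cursor may rest.

    elements: list of (start, end) character ranges, one per display element.
    """
    stops = {0}
    for start, end in elements:
        stops.add(start)
        stops.add(end)
    return sorted(stops)

def forward_char(pos, elements):
    """Move forward from pos, crossing a whole element in one go."""
    for start, end in elements:
        if start <= pos < end:
            return end
    return pos + 1   # outside every element: plain one-character step

# Four characters; characters 1 and 2 (say, lam + alif) form one element.
elements = [(0, 1), (1, 3), (3, 4)]
print(cursor_stops(elements))      # position 2, inside the ligature, is absent
print(forward_char(1, elements))   # jumps over the two-character element
```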
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sun, 24 May 2020 18:00:45 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> In general the safest is to pass the whole paragraph of text and the start
> and length of each item (item being a run with same font, direction, script,
> and language).

I was talking about text that has a single font, direction, script, and language.

> This, for example, ensures that HarfBuzz can do basic Arabic-like shaping
> across item boundaries e.g. if you break items in the middle of an Arabic
> word (due to font change, for example), you still get the
> initial/medial/final forms across the boundary as appropriate. Or to put a
> combining mark at the start of a paragraph on a dotted circle as it otherwise
> has no base.
>
> If this is not possible, then you can try to pass enough context, like reach
> back and forward to first character that is not a combining mark. This may or
> may not be enough.
>
> Shaping space-delimited words is orthogonal to that, context should
> always be provided.

So this sounds like passing a physical line that ends in a newline should be good enough? Or are there issues that cross newlines as well? And what is a "paragraph" in this context?

> Some fonts do have OpenType lookups that interact with space (e.g. kerning
> pairs involving space, or even substitutions involving space), so shaping
> words independently will give suboptimal result. You can use HarfBuzz API to
> find out if the font has OpenType layout rules involving space, or decide to
> live with this limitation.

Which API provides this information?

Thanks.
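The fallback suggested in the quoted text (reach back and forward to the first character that is not a combining mark) can be sketched like this. The helper names are invented, "combining mark" is approximated by Unicode general categories starting with "M", and positions are Python string indices; a real caller would pass the resulting context together with the item's start and length, which is what the item offset and item length arguments of hb_buffer_add_utf8 are for.

```python
import unicodedata

def is_mark(ch):
    """True for characters whose general category is Mn, Mc or Me."""
    return unicodedata.category(ch).startswith("M")

def context_range(text, start, end):
    """Widen the item [start, end) to include surrounding context.

    Reaches back to the nearest preceding non-mark character and forward
    to the nearest following non-mark character, both inclusive.
    """
    ctx_start = start
    while ctx_start > 0:
        ctx_start -= 1
        if not is_mark(text[ctx_start]):
            break
    ctx_end = end
    while ctx_end < len(text):
        ctx_end += 1
        if not is_mark(text[ctx_end - 1]):
            break
    return ctx_start, ctx_end

text = "a\u0301b\u0301c"             # a + acute, b + acute, c
print(context_range(text, 2, 3))     # item is the bare "b"
```

As the quoted message says, this amount of context may or may not be enough; passing the whole paragraph remains the safer choice.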
Re: [HarfBuzz] Ligatures
> > I almost understand (and agree), sans one part: the "arbitrary parts"
> > of what you wrote. If we want to produce a ligature out of "ffi", the
> > shaper will get "ffi" and nothing more. Which part here is arbitrary?
>
> Sending "ffi" alone is an arbitrary decision. The font might have kerning
> between "ffi" and what comes before and after it, but you won't get it. The
> font might not have a ligature for "ffi" at all, but using kerning instead,
> so you will get kerning between "ffi" glyphs and not other glyphs which is
> arbitrary. It might be a cursive font that changes glyph shapes based on
> surrounding glyphs, and you will get that for "ffi" and not elsewhere which
> is arbitrary.
>
> That is just plain wrong, there is no way around it.

So, to make sure I understand the correct solution: you are saying that all the text to be displayed should go through the shaper, is that right?

If so, how large should the chunks of text passed to the shaper in any one call be, in order to have a correct result? Would it be enough to pass whitespace-separated words one by one? Or do we need to send entire physical lines (up to the terminating newline character)? Or maybe an entire paragraph? What is the recommendation here?

Thanks.
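The objection in the quoted text can be made concrete with a toy model. Nothing below comes from a real font or from HarfBuzz: the advance widths and kerning pairs are invented numbers, and `shaped_width` is a hypothetical stand-in for a shaper that only sees the chunk it is given.

```python
ADVANCE = {"o": 10, "f": 6, "i": 5, "c": 9, "e": 9}    # invented widths
KERN = {("o", "f"): -1, ("f", "f"): -2,
        ("f", "i"): -1, ("i", "c"): -1}                # invented pair kerns

def shaped_width(chunk):
    """Total advance of a chunk; kerning applies only to pairs inside it."""
    width = sum(ADVANCE[ch] for ch in chunk)
    width += sum(KERN.get(pair, 0) for pair in zip(chunk, chunk[1:]))
    return width

whole = shaped_width("office")
pieces = shaped_width("o") + shaped_width("ffi") + shaped_width("ce")
print(whole, pieces)
```

The two totals differ by exactly the kerning that sits on the chunk boundaries ("o" before "ffi" and "c" after it), which the piecewise call never gets a chance to apply.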
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 21:42:24 +0100
> From: Richard Wordingham
>
> > As for different scripts: if the character codepoints are the same,
> > Emacs currently assigns each character to a single script.
>
> I'll need to dig deeper. Composition of both 'a' and Greek alpha with
> an acute accent works, which suggests that the problem isn't there for
> characters with a script property of 'inherited'.

Emacs currently leaves it up to HarfBuzz to guess the script, as it doesn't yet have the necessary smarts.

> > Emacs 24.4 is very old, and doesn't use HarfBuzz. Please try Emacs 27
> > instead, it has several bugs in this area fixed, and will use HarfBuzz
> > if available at build time.
>
> The behaviour in 27.05 is almost the same as for 24.4, but the
> breaking in item (1) is automatically repaired. The process seems slow
> - I can see the glyph become final and then revert back to being
> medial. I'm puzzled by not being able to step into lam-alif but being
> able to step through a series of 'beh's. The step into command for
> advancing codepoint by codepoint semiworks. The cluster shaping
> doesn't break at the cursor - Handa gave me a C code fix so I could
> achieve that - but the number of steps into to pass through a cluster
> matches the number of codepoints.
>
> Pressing the 'delete' key still deletes a single character, but maybe
> that is because it's mapped to tpu-delete-current-char.

If you press DEL (or Backspace), it will delete a single codepoint.

> So, what's not working in Arabic is that one can't move the cursor
> through ligatures.

That's a feature (you can disable it with disable-point-adjustment).

The rest of your observations seem to be too Emacs-specific to discuss here. You are welcome to submit an Emacs bug report if you think something isn't working as it should, or would like to discuss Emacs-specific details.

Thanks.
Re: [HarfBuzz] Ligatures
> Cc: harfbuzz@lists.freedesktop.org
> From: Simon Cozens
> Date: Sat, 23 May 2020 20:14:16 +0100
>
> On 23/05/2020 08:44, Eli Zaretskii wrote:
> > Thanks. Since (b) is not really feasible without redesigning the
> > entire Emacs display engine (for which I see no volunteers lining up
> > any time soon), I guess we will have to use some more-or-less
> > reasonable and somewhat unreliable heuristics by supporting only some
> > ligatures that are known in advance.
>
> Travelling further in the wrong direction is always an option, but don't
> expect it to get you closer to the right destination.

I don't think this is an adequate analogy. What Emacs does is an approximation to what should be done. The approximation falls short of the target, that's true, and might even produce clearly incorrect results in some cases (although I've yet to see such cases, and I've been using Emacs for editing non-ASCII text for 20 years). But it is still an approximation, so it is not really "the wrong direction" (which you seem to interpret as 180 degrees off, otherwise even going in the wrong direction might bring me closer to the destination, right?).
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 20:06:32 +0100
> From: Richard Wordingham
>
> There are three different tools for producing what looks like an "ffi"
> ligature:
>
> 1) Make a ligature
> 2) Contextual substitution
> 3) A mix of contextual substitution and kerning.
>
> A font that uses the first will produce a ligature for Emacs.
>
> A font that uses contextual substitution will not work - you will just
> see the 3 unligated characters with their default glyphs.
>
> A font that uses a mix of contextual substitution and kerning will
> likewise fail. However, it is possible that you might get the "ff"
> ligature and a normal 'i', or a normal 'f' and an "fi" ligature.
>
> From the point of view of someone who expects full shaping, what result
> you get will be arbitrary, depending on how the font designer has
> marshalled his tools.

I understand. Still, the result looks reasonably good in most cases, especially in an editor whose main purpose is to edit programs, and which doesn't pretend to produce typographical accuracy.
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 20:54:15 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> > We pass to the shaper the part of text that matches the regexps you
> > can see at the end of misc-lang.el, then display the glyphs the shaper
> > returns. The above description is a high-level overview; there are
> > many details that I cannot describe in a short message. For example,
> > for Arabic, when we get back the grapheme clusters, we lay them out,
> > then skip to the end of the text that we passed to the shaper.
>
> You mean this:
> https://repo.or.cz/emacs.git/blob/HEAD:/lisp/language/misc-lang.el#l78
>
> I’m not sure how to read it, but it seems to be missing the entire Arabic
> Extended-A and Arabic Mathematical Alphabetic Symbols blocks. I’m also not
> sure how it would handle using combining marks from other blocks with Arabic
> text (say putting U+20D6 over an Arabic letter).

If you can suggest improvements to those patterns, please do, and thanks.
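To make the two gaps mentioned in the quoted text concrete, here is a sketch of a run finder that covers those blocks and lets a combining mark from any block attach to an Arabic run. It is not the pattern from misc-lang.el and is not proposed as a drop-in replacement for it; the block list is a hand-written, non-exhaustive assumption, and the function names are invented.

```python
import unicodedata

ARABIC_BLOCKS = [
    (0x0600, 0x06FF),     # Arabic
    (0x0750, 0x077F),     # Arabic Supplement
    (0x08A0, 0x08FF),     # Arabic Extended-A
    (0xFB50, 0xFDFF),     # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),     # Arabic Presentation Forms-B
    (0x1EE00, 0x1EEFF),   # Arabic Mathematical Alphabetic Symbols
]

def is_arabic(ch):
    """True if ch falls in one of the blocks listed above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_BLOCKS)

def arabic_runs(text):
    """Return (start, end) ranges of Arabic runs, marks of any block included."""
    runs, start = [], None
    for i, ch in enumerate(text):
        attached_mark = (start is not None
                         and unicodedata.category(ch).startswith("M"))
        if is_arabic(ch) or attached_mark:
            if start is None:
                start = i
        else:
            if start is not None:
                runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(text)))
    return runs

# beh, U+20D6 (a combining mark outside the Arabic blocks), reh
print(arabic_runs("ab \u0628\u20d6\u0631 cd"))
```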
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 20:40:44 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> Sending “ffi” alone is an arbitrary decision. The font might have kerning
> between “ffi” and what comes before and after it, but you won’t get it. The
> font might not have a ligature for “ffi” at all, but using kerning instead, so
> you will get kerning between “ffi” glyphs and not other glyphs which is
> arbitrary. It might be a cursive font that changes glyph shapes based on
> surrounding glyphs, and you will get that for “ffi” and not elsewhere which
> is arbitrary.
>
> That is just plain wrong, there is no way around it.

OK, thanks.
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 20:18:33 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> > The Emacs display engine examines the text to be displayed and laid
> > out one character at a time, and makes layout decisions after each
> > character or grapheme cluster it lays out. Its design is therefore
> > fundamentally incompatible with shaping large substrings of buffer
> > text at once. We do support that for short sequences of characters,
> > which seems to work well enough for complex shaping (a.k.a. "character
> > compositions") of scripts that require that, but we still do that one
> > grapheme cluster at a time.
>
> That wouldn’t work for Arabic. You can’t shape Arabic one grapheme cluster at
> a time (or any other text actually, but the brokenness in Arabic will be
> immediately obvious), so I’m most certain that is not exactly how Arabic is
> handled in Emacs right now.

We pass to the shaper the part of text that matches the regexps you can see at the end of misc-lang.el, then display the glyphs the shaper returns. The above description is a high-level overview; there are many details that I cannot describe in a short message. For example, for Arabic, when we get back the grapheme clusters, we lay them out, then skip to the end of the text that we passed to the shaper.

> > The character composition is implemented
> > in Lisp, which is called by the display engine, and which then calls
> > back into C to invoke the shaper. This implementation is meant to
> > allow a great deal of control on what should be composed and how. But
> > it is also relatively slow, which is another reason why doing that for
> > all the text to be laid out is impractical: it slows down redisplay to
> > the degree that it becomes annoying to users.
>
> Having more control should not be at the price of doing things wrong.

No one said it should, that's just how things are.

> The whole composition concept of Emacs does not make any sense to me, all
> text is “composed”. You can have a special mode that would disable shaping
> for specific purposes (opening huge log files, wanting to see raw text with
> no bidi or shaping, etc), but this can be done in cooperation with HarfBuzz
> and not by bypassing it entirely.

We are talking about a piece of software designed 21 years ago. I realize that it makes no sense to you, but that's what we have, and will probably have for the next 10 years or so. We must make the most out of what we have.
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 20:09:50 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> Overall, if you can’t send the whole text (words are the absolute minimum,
> but this has its issues as well), don’t just send arbitrary parts of it as
> the result will be some inconsistent mess.

I almost understand (and agree), sans one part: the "arbitrary parts" of what you wrote. If we want to produce a ligature out of "ffi", the shaper will get "ffi" and nothing more. Which part here is arbitrary?

Thanks.
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 16:54:51 +0100
> From: Richard Wordingham
> Cc: harfbuzz@lists.freedesktop.org
>
> > Emacs supports more than one rule for each composable sequence of
> > characters.
>
> That doesn't help when the rules give conflicting divisions into
> clusters, which is the case with Tai Tham.

The assumption is that either the rules can be arranged in an order that allows using the first matching rule, or, failing that, that you write your own composing function that implements whatever logic is required to select the right rule.

> The Devanagari rule only covers the Vedic marks in the Devanagari block,
> the 'stress signs' according to the comments. Can rules essentially
> for different scripts now share combining marks? The newer Vedic marks
> were supposed to be available to at least all Indian Indic scripts.

I don't know enough about this to make sure I even understand the question, let alone can provide an answer. One thing I can say is that the regexp pattern in a rule can specify different context (the surrounding characters) even if the character that triggers the rule is the same. Failing that, I guess the solution will again be the function that produces the composition.

As for different scripts: if the character codepoints are the same, Emacs currently assigns each character to a single script.

> > Does Emacs indeed fail to wrap Arabic text? can you show an example?
>
> Character level wrapping still almost works down at Emacs 24.4, but I
> don't know that it wasn't broken in later enhancements. There are three
> features that make me think Emacs 24.4 might be different to the
> current state of affairs:
>
> (1) Clicking into the text breaks text before the cursor, but not after
> it.
> (2) I can't step into lam-alif the way I step into Indic clusters.
> (3) Lam-alif isn't broken by line wrap.

Emacs 24.4 is very old, and doesn't use HarfBuzz. Please try Emacs 27 instead, it has several bugs in this area fixed, and will use HarfBuzz if available at build time.
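The "first matching rule in an ordered list" idea can be sketched in a few lines. This is an illustration only: the rule table, the rule names and the function `compose_by_rules` are invented, Latin ligatures stand in for the Indic cases discussed above, and the real Emacs rules are Lisp data rather than Python regexps.

```python
import re

def compose_by_rules(text, pos, rules):
    """Return (rule_name, end) for the first rule matching at pos, else None.

    rules: ordered list of (compiled regex, name); earlier rules win.
    """
    for pattern, name in rules:
        m = pattern.match(text, pos)
        if m and m.end() > pos:
            return name, m.end()
    return None

# Longest alternative first, so "ffi" is not split into "ff" + "i".
RULES = [
    (re.compile(r"ffi"), "ffi-ligature"),
    (re.compile(r"ff"), "ff-ligature"),
    (re.compile(r"fi"), "fi-ligature"),
]

print(compose_by_rules("office", 1, RULES))
print(compose_by_rules("office", 1, list(reversed(RULES))))
```

With the list reversed, the same text is divided differently, which is the kind of conflict that has to be resolved either by ordering the rules or by a custom composing function.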
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 16:33:12 +0100
> From: Richard Wordingham
>
> On Sat, 23 May 2020 11:25:38 +0300
> Eli Zaretskii wrote:
>
> > > From: Khaled Hosny
> > > Date: Sat, 23 May 2020 09:51:21 +0200
> > > Cc: harfbuzz@lists.freedesktop.org
> > > What are you going to do about kerning, or mark positioning?
> > > Partially kerning arbitrary glyphs (because the sub string match
> > > some regular expression) is worse than not kerning at all.
> >
> > I don't think I understand the question. How is kerning related to
> > the issue at hand? I'm not an expert on typesetting text (so maybe I
> > don't even understand what exactly is meant by "kerning" in this
> > context), so please tell more details about this.
>
> The simplest way of laying out proportionally spaced text is to have a
> fixed glyph-dependent distance ('advance width') from the 'origin' of a
> glyph to the origin of the next glyph and simply lay them out in a
> sequence, like movable type. However, if one chooses widths suitable
> for the sequences 'AM' and 'MV', then there may be an unsightly gap in
> the middle of 'AV'. Kerning is basically the process of adjusting those
> gaps. Kerning is done by the shaper. To do it, it needs the
> whole sequence of characters.

Ah, okay, thanks. Then yes, Emacs just uses the advance width that we get from the metrics of each glyph.
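The explanation of advance widths and kerning in the quoted text translates directly into a small layout routine. The numbers below are invented for the illustration and do not come from any font; `pen_positions` is a hypothetical helper name.

```python
ADVANCE = {"A": 12, "M": 15, "V": 12}   # invented advance widths
KERN = {("A", "V"): -3}                 # invented pair adjustment for 'AV'

def pen_positions(text, kerning=True):
    """Return the x coordinate of each glyph origin, laid out left to right."""
    x, origins = 0, []
    for i, ch in enumerate(text):
        if kerning and i > 0:
            # Adjust the gap according to the (previous, current) pair.
            x += KERN.get((text[i - 1], ch), 0)
        origins.append(x)
        x += ADVANCE[ch]
    return origins

print(pen_positions("AV", kerning=False))   # advance widths only
print(pen_positions("AV"))                  # the 'V' moves left under the 'A'
```

Using only the per-glyph advance width corresponds to calling `pen_positions(text, kerning=False)`; the pair lookup is what requires seeing more than one character at a time.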
Re: [HarfBuzz] Ligatures
> Date: Sat, 23 May 2020 14:51:53 +0100
> From: Richard Wordingham
>
> > > They may of course have more than one set of such rules, with the
> > > rule sets defining different sets of sequences.
> >
> > Who are "they" in this context?
>
> Devanagari and Tai Tham are two examples I am aware of.

Emacs supports more than one rule for each composable sequence of characters.

> Devanagari has different rules for positioning of Vedic marks between
> fonts using the script tags dev and dev2 for it on one hand and the
> unofficial script tag dev3, which follows the USE rules for character
> ordering. For tag dev, Microsoft says that candrabindu, consonant> is one cluster; others, including Unicode, say
> it's two. Candrabindu in the middle and candrabindu at the end mean
> different things; the former nasalises a consonant, while the latter
> nasalises a vowel. The visual distinction exists, at least when
> half-forms are used.

See the rules set up near the end of indian.el in Emacs. If they don't cover what you describe, we can add more.

> > I'm not talking about Arabic. Emacs has a set of regular expressions
> > for sequences of Arabic characters that need shaping, misc-lang.el in
> > Emacs. If the set is incomplete, we can augment it.
>
> That regular expression treats every Arabic word as in need of shaping.
>
> > If a font requires special shaping for any sequence of any number of
> > 26 (or maybe 52) ASCII letters, then the Emacs display engine will
> > need to be redesigned. So this extreme possibility doesn't bother me.
>
> In general, they do require it. But how is this worse than handling
> Arabic?

I don't know. Maybe it isn't. Or maybe the slowdown while displaying ASCII and moving the cursor through it will be unbearable.

> Is the problem that you want to keep the option of line
> wrapping splitting words for ASCII, but are not bothered for Arabic or
> other human languages?

Does Emacs indeed fail to wrap Arabic text? Can you show an example?

> > > How would you handle the possibility that all three of <æ>,
> > > and might be rendered by the same glyph, although they
> > > are comprised of 1, 2 and 3 characters respectively?
> >
> > By using a composition rule that matches both and .
> > The rules are regexp-based, and expressing the above as a regexp is
> > simple. Once a sequence of characters matches the regexp, Emacs calls
> > the shaper (hb_shape etc.) to produce the font glyphs for the
> > sequence, and displays the glyphs that the shaper returns.
>
> I think you mean that Emacs would store the position of components by
> an index that was the sequence of characters, not the glyph ID. That
> would also deal with precomposed characters - it would be the character
> sequence that mattered, and for cursor movement and rendering,
> the canonically equivalent sequence(s) and the precomposed character
> would remain distinct.

Sorry, I don't follow: what do you mean by "store"? Emacs stores the rules used to compose characters, and it stores the results of the compositions already done by applying those rules, as part of displaying some chunk of text. Which one of these did you have in mind?
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 09:59:15 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> Also either Emacs is currently treating text that it enables shaping for as
> second-class citizens where limitations/degraded performance is acceptable
> (which is really really bad)

Could you tell more about which limitations and degraded performance you had in mind? I'm not sure we have this, but cannot tell without understanding the issues.

> or “redesigning the entire Emacs display engine” is not really needed as you
> can just declare all text as text that needs to be shaped and be done with it.

The Emacs display engine examines the text to be displayed and laid out one character at a time, and makes layout decisions after each character or grapheme cluster it lays out. Its design is therefore fundamentally incompatible with shaping large substrings of buffer text at once. We do support that for short sequences of characters, which seems to work well enough for complex shaping (a.k.a. "character compositions") of scripts that require that, but we still do that one grapheme cluster at a time.

The character composition is implemented in Lisp, which is called by the display engine, and which then calls back into C to invoke the shaper. This implementation is meant to allow a great deal of control on what should be composed and how. But it is also relatively slow, which is another reason why doing that for all the text to be laid out is impractical: it slows down redisplay to the degree that it becomes annoying to users.

That is why solving these problems in the way that you suggest requires a complete rewrite of the Emacs display code. It simply cannot currently support what you expect.
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 09:51:21 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> > Thanks. Since (b) is not really feasible without redesigning the
> > entire Emacs display engine (for which I see no volunteers lining up
> > any time soon), I guess we will have to use some more-or-less
> > reasonable and somewhat unreliable heuristics by supporting only some
> > ligatures that are known in advance.
>
> What are you going to do about kerning, or mark positioning? Partially
> kerning arbitrary glyphs (because the substring matches some regular
> expression) is worse than not kerning at all.

I don't think I understand the question. How is kerning related to the issue at hand? I'm not an expert on typesetting text (so maybe I don't even understand what exactly is meant by "kerning" in this context), so please tell more details about this.

Thanks.
Re: [HarfBuzz] Ligatures
> From: Khaled Hosny
> Date: Sat, 23 May 2020 08:36:10 +0200
> Cc: harfbuzz@lists.freedesktop.org
>
> > The only way of
> > doing this right, I'm told, is to either (a) query the font to get the
> > list of all the ligatures it supports, or (b) assume any combination
> > of characters can produce a ligature, and therefore we need to pass
> > all the characters intended for display through hb_shape. The latter
> > in particular is in stark contrast to how the current Emacs display
> > code is designed and implemented.
>
> (a) is not realistically possible as doing it properly has pretty much the
> same cost as shaping the text. So your only reliable option is (b).

Thanks. Since (b) is not really feasible without redesigning the entire Emacs display engine (for which I see no volunteers lining up any time soon), I guess we will have to use some more-or-less reasonable and somewhat unreliable heuristics by supporting only some ligatures that are known in advance.
Re: [HarfBuzz] Ligatures
> Date: Fri, 22 May 2020 22:22:49 +0100
> From: Richard Wordingham
>
> > The current support for producing ligatures works in the same way as
> > complex text shaping for scripts that require that, like Arabic and
> > Khmer: the sequences of characters that can be displayed as ligatures
> > are identified in advance with suitable regular expressions, and the
> > display engine then passes these sequences to hb_shape to produce the
> > ligatures.
> >
> > This works well for scripts that require complex shaping, because such
> > scripts generally have well-defined rules for the sequences of
> > codepoints that need shaping.
>
> They may of course have more than one set of such rules, with the rule
> sets defining different sets of sequences.

Who are "they" in this context?

> > However, I'm being told that this assumption is false, and that each
> > font defines ligatures from any number of arbitrary combinations of
> > characters, and therefore the exhaustive list of the ligatures is in
> > practice infinite and cannot be provided in advance.
>
> This arbitrariness is true. Over the set of all credible fonts for a
> given character repertoire, the number of ligating combinations is
> unbounded.

I understand that the number of combinations is theoretically
unbounded. I'm asking if it is also unbounded in practice. That is,
do font designers add ligatures for arbitrary combinations of
characters, regardless of some reasonable set of requirements? For
example, is the set of ligatures of Latin characters shown here:

  https://en.wikipedia.org/wiki/Orthographic_ligature#Latin_alphabet

reasonably complete, or should I expect any number of other arbitrary
combinations of Latin characters popping up in fonts? And if the
latter, then what is the purpose of providing such arbitrary
ligatures?

> > To be specific, I'm talking about 2 kinds of ligatures:
> >
> > . ligatures made of Latin characters, like "ffi" and "Th"
> > . ligatures produced from symbols, like "==>" that is
> >   converted into ⟹

Yes, these are the only cases that I'm asking about here. I'm not
asking about shaping complex scripts such as Arabic, where this
problem doesn't exist AFAIK.

> Have you addressed the cursive scripts yet, such as Arabic? At its
> simplest, most consonants have four shapes, initial, medial, final and
> isolated, and roughly speaking the shape used depends on the adjacent
> spacing characters. For the most part, Emacs would have to pass whole
> words into HarfBuzz for shaping. In some of the more advanced fonts,
> the vowel marks in a word may also affect the shape of the consonant
> skeleton. And of course, sometimes the Arabic script prefers to join
> letters vertically, as well as having a few straightforward ligatures.

I'm not talking about Arabic. Emacs has a set of regular expressions
for sequences of Arabic characters that need shaping; see misc-lang.el
in Emacs. If the set is incomplete, we can augment it.

> A cursive Latin script font may behave in the same way, with the shape
> of letters depending on what precedes and follows them. With a small
> enough character repertoire, there might be no ligatures, but your
> rendering logic would fail miserably.

If a font requires special shaping for any sequence of any number of
26 (or maybe 52) ASCII letters, then the Emacs display engine will
need to be redesigned. So this extreme possibility doesn't bother me.

> How would you handle the possibility that all three of <æ>, and
> might be rendered by the same glyph, although they are
> comprised of 1, 2 and 3 characters respectively?

By using a composition rule that matches both and . The rules are
regexp-based, and expressing the above as a regexp is simple. Once a
sequence of characters matches the regexp, Emacs calls the shaper
(hb_shape etc.) to produce the font glyphs for the sequence, and
displays the glyphs that the shaper returns.

> And if Emacs is not imposing a normalisation, then all the
> precomposed characters in Unicode might have been entered as one or
> as more than one character?

If you are talking about composition with combining characters, Emacs
already has the rules to compose them as described above. You can try
this in your Emacs: insert a, then U+0301 COMBINING ACUTE ACCENT, and
you should see them composed into a single glyph (provided that you
use a suitable font).

But I'm not asking about character composition in general, I'm asking
specifically about ligatures of ASCII characters, without any
non-ASCII codepoints or combining accents.
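The idea of a rule that picks out a base character followed by combining accents can be sketched outside Emacs. The snippet below is only an illustration, assuming Python's standard unicodedata module as the source of character properties; the function name combining_sequences is hypothetical and this is not the Emacs composition code. It treats any character with a non-zero canonical combining class as a mark, which is a simplification (some marks have class 0).

```python
import unicodedata


def combining_sequences(text):
    """Return (start, end) index pairs of base + combining-mark runs.

    Each pair delimits the kind of sequence that a composition rule
    would hand to the shaper as one unit.
    """
    spans = []
    i = 0
    n = len(text)
    while i < n:
        j = i + 1
        # Extend over following characters with a non-zero combining class.
        while j < n and unicodedata.combining(text[j]) != 0:
            j += 1
        # Record the run only if it starts with a base character and
        # actually contains at least one mark.
        if j - i > 1 and unicodedata.combining(text[i]) == 0:
            spans.append((i, j))
        i = j
    return spans
```

For the text "x", "a", U+0301, "y" this reports the single span covering "a" and its accent; plain ASCII text yields no spans at all.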
[HarfBuzz] Ligatures
Hi,

This is a bit off-topic, but I thought it could be appropriate to ask
here, since we have here some of the best experts on this subject.

We are discussing support for ligatures in Emacs, specifically when
using HarfBuzz as the shaping engine. See the discussion from

  https://lists.gnu.org/archive/html/emacs-devel/2020-05/msg02493.html

The current support for producing ligatures works in the same way as
complex text shaping for scripts that require that, like Arabic and
Khmer: the sequences of characters that can be displayed as ligatures
are identified in advance with suitable regular expressions, and the
display engine then passes these sequences to hb_shape to produce the
ligatures.

This works well for scripts that require complex shaping, because such
scripts generally have well-defined rules for the sequences of
codepoints that need shaping. My original thoughts were that
ligatures could be supported in the same way, based on the assumption
that the list of possible ligatures is finite and can be stored in a
suitable data structure in advance.

However, I'm being told that this assumption is false, and that each
font defines ligatures from any number of arbitrary combinations of
characters, and therefore the exhaustive list of the ligatures is in
practice infinite and cannot be provided in advance. The only way of
doing this right, I'm told, is to either (a) query the font to get the
list of all the ligatures it supports, or (b) assume any combination
of characters can produce a ligature, and therefore we need to pass
all the characters intended for display through hb_shape. The latter
in particular is in stark contrast to how the current Emacs display
code is designed and implemented.

To be specific, I'm talking about 2 kinds of ligatures:

 . ligatures made of Latin characters, like "ffi" and "Th"
 . ligatures produced from symbols, like "==>" that is
   converted into ⟹

Can someone please tell me what the recommended practices are
regarding these ligatures? Is the set of possible ligatures indeed
infinite and impossible to know in advance? And does HarfBuzz have
APIs to query a font about the ligatures it supports?

Thanks in advance for any help.
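The "known in advance" approach described above can be sketched as follows. This is a minimal illustration, not the Emacs implementation: the list KNOWN_LIGATURES and the function split_for_shaping are hypothetical, and a real font may define more or fewer ligatures than any such list.

```python
import re

# Hypothetical, hand-maintained list of ligature candidates.
KNOWN_LIGATURES = ["ffi", "ffl", "ff", "fi", "fl", "Th", "==>", "=>", "->"]

# Longest alternatives first, so that "ffi" wins over "ff" and "fi".
_PATTERN = re.compile(
    "|".join(re.escape(s)
             for s in sorted(KNOWN_LIGATURES, key=len, reverse=True))
)


def split_for_shaping(text):
    """Split text into (chunk, needs_shaping) pairs.

    Chunks flagged True matched a known candidate and would be handed
    to the shaper; the rest would be displayed glyph by glyph.
    """
    chunks = []
    pos = 0
    for m in _PATTERN.finditer(text):
        if m.start() > pos:
            chunks.append((text[pos:m.start()], False))
        chunks.append((m.group(), True))
        pos = m.end()
    if pos < len(text):
        chunks.append((text[pos:], False))
    return chunks
```

The sketch makes the limitation visible: anything not in the list is never offered to the shaper, so a ligature that a font defines for some other sequence would simply not appear.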
Re: [HarfBuzz] Support for Stylistic Sets
> Date: Sun, 15 Sep 2019 12:37:25 +0100
> From: Richard Wordingham
>
> > > Does HarfBuzz guess the language?
> >
> > Yes.
>
> It seems to use the current locale. That will usually be wrong for
> cuneiform, and generally be wrong for multilingual text.

But Emacs currently doesn't know better anyway. When it does, we will
pass that information to HarfBuzz, but for now I see no reason to
replace HarfBuzz's guess based on the locale by Emacs's guess based on
that same locale.
Re: [HarfBuzz] Support for Stylistic Sets
> From: Nikolay Sivov
> Date: Sun, 15 Sep 2019 10:03:01 +0300
> Cc: Richard Wordingham, Harfbuzz
>
> > Essentially yes, i.e. unsupported features will simply be ignored.
>
> Then there's no need to know whether a feature is supported. Thanks.
>
> MS Word for example shows a preview for each supported ssXX feature,
> and the user can select the one they want.
>
> I don't know how (or why) you plan to use that for emacs, but you'll
> need to have some logic to figure out which one to enable.

I think this should be up to the user and/or the application, i.e.
the Lisp program that wants to take advantage of these features. Or
maybe I misunderstand what you mean by "figure out which one to
enable"? Can you elaborate on the potential pitfalls?
Re: [HarfBuzz] Support for Stylistic Sets
> Date: Sat, 14 Sep 2019 21:33:00 +0100
> From: Richard Wordingham
> Cc: harfbuzz@lists.freedesktop.org
>
> On Sat, 14 Sep 2019 21:15:04 +0300
> Eli Zaretskii wrote:
>
> > > Date: Sat, 14 Sep 2019 18:13:25 +0100
> > > From: Richard Wordingham
> > >
> > > I think it's safe to specify the use of unsupported features, in
> > > which case this is a luxury feature.
> >
> > you mean, specifying an unsupported feature will not cause hb_shape to
> > fail, but instead just use the nominal glyphs?
>
> Essentially yes, i.e. unsupported features will simply be ignored.

Then there's no need to know whether a feature is supported. Thanks.

> > Emacs currently leaves it to HarfBuzz to guess the language, so I
> > don't think this is an issue.
>
> Does HarfBuzz guess the language?

Yes.
Re: [HarfBuzz] Support for Stylistic Sets
> Date: Sat, 14 Sep 2019 18:13:25 +0100
> From: Richard Wordingham
>
> I think it's safe to specify the use of unsupported features, in which
> case this is a luxury feature.

You mean, specifying an unsupported feature will not cause hb_shape to
fail, but instead just use the nominal glyphs?

> One complication is that features are provided by a font on a (per
> script) per language basis.

Why is that a complication? The user who requests the feature should
do so only for text of a suitable script, no?

> For example, my Da Lekh font provides feature ss19 for the default
> language, but not for Lao, Tai Lü or 'Shan'. In this font, Feature
> ss19 means apply Lao style, and that is applied automatically if the
> font is told it is being used for Lao. It would be a bit off to tag
> aerated Pali text as Lao just to get a Lao style. Aerated Pali has
> different line-breaking rules to Lao, which is written without visible
> word separation.

Emacs currently leaves it to HarfBuzz to guess the language, so I
don't think this is an issue.
Re: [HarfBuzz] Support for Stylistic Sets
> From: Nikolay Sivov
> Date: Sat, 14 Sep 2019 17:00:53 +0300
> Cc: Harfbuzz
>
> > AFAIU, HarfBuzz does support Stylistic Sets, but it is not clear to me
> > what should an application do to request glyphs corresponding to a
> > certain stylistic set.
> >
> > Suppose an application wants to display a text string using a specific
> > stylistic set -- could someone please outline the sequence of API
> > calls to get that, or point me to some documentation which describes
> > that?
>
> Hi, Eli.
>
> I think it must be a matter of enabling features explicitly, in case
> you're asking about it would be features ss01-ss20, see hb_shape()
> arguments documentation.
> Basically, you set hb_feature_t fields to appropriate tag, value (1 for
> enabled), and start/end limits. That should do it.

I'm beginning to see the light, thanks. So hb_feature_t's 'value'
field should always be 1 for an enabled feature, and its 'tag' field
should be something like

  HB_TAG('s', 's', '0', '1')

is that right?

The next question is how to know whether a given hb_font_t supports a
given feature?

Thanks.
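As a side note on what such a tag looks like numerically: an OpenType feature tag is four ASCII characters packed into one 32-bit value, first character in the most significant byte, which is what HB_TAG produces. The sketch below only mirrors that packing and the tag / value / start / end fields mentioned in the message; it does not call HarfBuzz, and the names make_tag, stylistic_set_feature, GLOBAL_START and GLOBAL_END are illustrative assumptions (the two constants stand in for a range that covers the whole buffer).

```python
def make_tag(name):
    """Pack a four-character ASCII feature name into a 32-bit integer,
    most significant byte first."""
    raw = name.encode("ascii")
    if len(raw) != 4:
        raise ValueError("an OpenType tag has exactly four characters")
    return int.from_bytes(raw, "big")


# Illustrative stand-ins for "apply to the whole buffer".
GLOBAL_START = 0
GLOBAL_END = 0xFFFFFFFF


def stylistic_set_feature(n, enabled=True):
    """Describe feature ssNN (1 <= n <= 20) as a plain dict whose keys
    mirror the tag / value / start / end fields discussed above."""
    if not 1 <= n <= 20:
        raise ValueError("stylistic sets are numbered ss01..ss20")
    return {
        "tag": make_tag("ss%02d" % n),
        "value": 1 if enabled else 0,
        "start": GLOBAL_START,
        "end": GLOBAL_END,
    }
```

With this packing, the tag for "ss01" is 0x73733031, i.e. the bytes 's', 's', '0', '1' read as one big-endian number.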
[HarfBuzz] Support for Stylistic Sets
Hi,

AFAIU, HarfBuzz does support Stylistic Sets, but it is not clear to me
what an application should do to request glyphs corresponding to a
certain stylistic set.

Suppose an application wants to display a text string using a specific
stylistic set -- could someone please outline the sequence of API
calls to get that, or point me to some documentation which describes
that?

TIA
Re: [HarfBuzz] Display issue with DejaVu Sans Mono font
> From: Khaled Hosny
> Date: Mon, 19 Aug 2019 01:05:51 +0200
> Cc: Harfbuzz
>
> > So this is indeed some problem with that particular font?
>
> It is partly a font issue (missing anchors and combining marks default
> position to the left of base glyph), and partly HarfBuzz design decision of
> preferring composed forms. See
> https://github.com/harfbuzz/harfbuzz/issues/653.

Thanks for the pointer.
Re: [HarfBuzz] Display issue with DejaVu Sans Mono font
> From: Khaled Hosny
> Date: Sun, 18 Aug 2019 00:48:14 +0200
> Cc: Harfbuzz
>
> > https://lists.gnu.org/archive/html/bug-gnu-emacs/2019-08/msg01082.html
> >
> > Is there something wrong with this font when displaying this sequence,
> > or is there some kind of bug in Emacs and/or HarfBuzz?
>
> The second accent is placed next to the glyph, but hb-view is incorrectly
> clipping the image, as you can see from hb-shape output:
>
> $ hb-shape DejaVuSansMono.ttf -u '061,301,302'
> [aacute=0+1233|uni0302=0+0]
>
> Adding some margins gives:
>
> $ hb-view DejaVuSansMono.ttf -u '061,301,302' --margin=0,150,0,0
>
> HarfBuzz will compose U+0061 + U+0301 to U+00E1 (since it prefers composed
> form when supported by the font), and that glyph does not have anchors to
> position any marks above it, so the circumflex ends up with its default
> position next to the glyph.

So this is indeed some problem with that particular font? Because
other fonts, including monospaced ones, don't seem to produce the same
problem: the U+0302 glyph is correctly placed on the base character.

Thanks.
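The composition step described in the quoted explanation has a close analogue in Unicode's NFC normalization, which can be tried with Python's standard unicodedata module. This is only an analogy under that assumption: HarfBuzz applies its own normalization logic and takes the font's glyph coverage into account, which the snippet below does not model. The helper name describe is hypothetical.

```python
import unicodedata

# a, COMBINING ACUTE ACCENT, COMBINING CIRCUMFLEX ACCENT
seq = "\u0061\u0301\u0302"

# NFC composes the base with the first accent; no precomposed character
# exists for the result plus a circumflex, so U+0302 stays separate.
composed = unicodedata.normalize("NFC", seq)


def describe(s):
    """List the Unicode names of the characters in s."""
    return [unicodedata.name(c) for c in s]
```

The result is two code points, U+00E1 followed by U+0302, which matches the two glyphs (aacute and uni0302) in the hb-shape output quoted above.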
[HarfBuzz] Display issue with DejaVu Sans Mono font
We have some strange display problem in Emacs with this sequence:

  U+0061 U+0301 U+0302

The problem seems to happen only with the DejaVu Sans Mono font. It
looks like hb-view also displays the sequence with only one of the two
combining accents, see the images in

  https://lists.gnu.org/archive/html/bug-gnu-emacs/2019-08/msg01082.html

Is there something wrong with this font when displaying this sequence,
or is there some kind of bug in Emacs and/or HarfBuzz?

TIA
Re: [HarfBuzz] Failure in hb_font_get_nominal_glyph
> From: Behdad Esfahbod
> Date: Thu, 25 Jul 2019 12:08:43 -0400
> Cc: Khaled Hosny, harfbuzz@lists.freedesktop.org
>
> Looks good to me.

Thanks!
Re: [HarfBuzz] Failure in hb_font_get_nominal_glyph
> From: Behdad Esfahbod
> Date: Wed, 24 Jul 2019 15:21:15 -0400
> Cc: Eli Zaretskii, "harfbuzz@lists.freedesktop.org"
>
> Ah, right. Yes. Before 2.0.0 you'd have to call hb_ot_font_set_funcs()
> explicitly...
>
> Thanks Khaled!

Thanks. Just to be sure I understand: is the below the right fix?

diff --git a/src/w32uniscribe.c b/src/w32uniscribe.c
index aa6bebd..8fbbe7e 100644
--- a/src/w32uniscribe.c
+++ b/src/w32uniscribe.c
@@ -32,6 +32,7 @@
 #define _WIN32_WINNT 0x0600
 #include
 #ifdef HAVE_HARFBUZZ
 # include
+# include /* for hb_ot_font_set_funcs */
 # if GNUC_PREREQ (4, 3, 0)
 #  define bswap_32(v) __builtin_bswap32(v)
 # else
@@ -1305,7 +1308,12 @@ w32hb_get_font (struct font *font, double *scale)
   hb_face_t *hb_face =
     hb_face_create_for_tables (w32hb_get_font_table, font_handle, NULL);
   if (hb_face_get_glyph_count (hb_face) > 0)
-    hb_font = hb_font_create (hb_face);
+    {
+      hb_font = hb_font_create (hb_face);
+      /* This is needed for HarfBuzz before 2.0.0; it is the default
+         in later versions.  */
+      hb_ot_font_set_funcs (hb_font);
+    }

   struct uniscribe_font_info *uniscribe_font =
     (struct uniscribe_font_info *) font;
Re: [HarfBuzz] Failure in hb_font_get_nominal_glyph
> From: Behdad Esfahbod
> Date: Wed, 24 Jul 2019 15:11:03 -0400
> Cc: "harfbuzz@lists.freedesktop.org"
>
> Nothing stands out to me.

Thanks for taking a look.

Could something like that be caused by an old version of the Freetype
library used with HarfBuzz? I believe when the OP upgraded his
HarfBuzz he also upgraded Freetype as its dependency.
[HarfBuzz] Failure in hb_font_get_nominal_glyph
Could someone please take a look at the problems described here:

  https://lists.gnu.org/archive/html/emacs-devel/2019-07/msg00540.html
  https://lists.gnu.org/archive/html/emacs-devel/2019-07/msg00557.html
  https://lists.gnu.org/archive/html/emacs-devel/2019-07/msg00558.html
  https://lists.gnu.org/archive/html/emacs-devel/2019-07/msg00561.html

and tell whether it is expected that HarfBuzz 1.7.5 is too old to
support hb_font_get_nominal_glyph reliably on MS-Windows? According
to the HarfBuzz docs, that function is available since v1.2.3. Or
maybe the code we have in Emacs has a bug?

If you want to have a look at the code that fails, it is here:

  http://git.savannah.gnu.org/cgit/emacs.git/tree/src/w32uniscribe.c#n1328

In a nutshell, the question is: why would hb_font_get_nominal_glyph
fail for the Courier New font, even when we are requesting a glyph for
an ASCII character?

TIA
Re: [HarfBuzz] Order of combining diacriticals
> From: Khaled Hosny
> Date: Thu, 20 Jun 2019 22:09:24 +0200
> Cc: Behdad Esfahbod, Harfbuzz
>
> I mean whether you are using HarfBuzz with FreeType font functions,
> internal ones or something custom does not matter for fallback Hebrew
> shaping.
>
> If you want to additionally use HarfBuzz with bitmap or Type 1 fonts
> on Windows, you would need to implement custom font functions for
> those that would use GDI API to access glyph metrics and kerning, but
> this is orthogonal to fallback shaping.

Ah, okay. I understand now, thanks.
Re: [HarfBuzz] Order of combining diacriticals
> From: Khaled Hosny
> Date: Thu, 20 Jun 2019 17:33:47 +0200
> Cc: Behdad Esfahbod, Harfbuzz
>
> > > . For fonts that have no 'hebr' features, Emacs performs
> > >   substitution of known precomposed characters before it invokes the
> > >   shaping engine. In this case, it substituted U+FB31 for the
> > >   sequence U+05D1,U+05BC, and passed the sequence U+FB31,U+05B0 to
> > >   HarfBuzz.
> > >
> > > You should remove all such hacks.
> >
> > I understand that for HarfBuzz they are probably not needed, if the
> > necessary functions for accessing the glyphs are provided (something
> > that might not be true on Windows, where we don't use Freetype
> > directly).
>
> This functionality either depends on Unicode decompositions (or in
> case of Hebrew hard-coded tables in HarfBuzz), so the font functions
> used make no difference.

I'm not sure I understand what font functions you are talking about
here. The simplest font backends in Emacs (Xfont on Unix and GDI on
MS-Windows), when working with fonts that don't have the necessary OTF
features, might be unable to figure out that certain combinations of
base character and combining mark have precomposed glyphs in the font
being used. So Emacs feeds them the precomposed characters instead.
How are font functions related to this?

Thanks.
Re: [HarfBuzz] HB config
> From: Behdad Esfahbod
> Date: Tue, 18 Jun 2019 12:12:44 -0700
> Cc: "harfbuzz@lists.freedesktop.org"
>
> Hi Jonathan, Dominik, others,
>
> You might have noticed I spent last couple of months trimming down
> HarfBuzz binary size. I put some notes together in the repo:
>
> https://github.com/harfbuzz/harfbuzz/blob/master/CONFIG.md
>
> I like to hear any feedback, as well as any other tricks that need to be
> documented.

Thank you very much, there's a lot of very useful information there.
Re: [HarfBuzz] Order of combining diacriticals
> From: Behdad Esfahbod
> Date: Fri, 14 Jun 2019 11:34:17 -0700
> Cc: Khaled Hosny, "harfbuzz@lists.freedesktop.org"
>
> On Thu, Jun 13, 2019 at 2:18 AM Eli Zaretskii wrote:
>
> > . For fonts that have no 'hebr' features, Emacs performs
> >   substitution of known precomposed characters before it invokes the
> >   shaping engine. In this case, it substituted U+FB31 for the
> >   sequence U+05D1,U+05BC, and passed the sequence U+FB31,U+05B0 to
> >   HarfBuzz.
>
> You should remove all such hacks.

I understand that for HarfBuzz they are probably not needed, if the
necessary functions for accessing the glyphs are provided (something
that might not be true on Windows, where we don't use Freetype
directly). But Emacs also has other font backends, which are not as
capable.

In any case, this particular situation uncovered a subtle bug in how
Emacs uses the information provided by HarfBuzz, so it was a Good
Thing we did have this particular hack.

Thanks.
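The relationship between U+FB31 and the sequence U+05D1,U+05BC can be checked against the Unicode character data. The sketch below assumes Python's standard unicodedata module and says nothing about what HarfBuzz or Emacs do internally; it only shows that the precomposed character canonically decomposes into the two-character sequence, and that standard NFC composition does not put it back together, so a substitution like the one described has to be done explicitly by whoever wants it.

```python
import unicodedata

BET, DAGESH, SHEVA = "\u05d1", "\u05bc", "\u05b0"
BET_WITH_DAGESH = "\ufb31"

# Canonical decomposition splits U+FB31 into BET + DAGESH, and canonical
# ordering then sorts the marks by combining class (SHEVA before DAGESH).
decomposed = unicodedata.normalize("NFD", BET_WITH_DAGESH + SHEVA)

# NFC leaves BET + DAGESH as two characters: the Hebrew presentation
# forms are excluded from canonical composition.
recomposed = unicodedata.normalize("NFC", BET + DAGESH)
```

The mark order produced here is Unicode's canonical order; a shaper is free to use a different internal order for Hebrew marks, so this should not be read as a statement about the glyph order HarfBuzz emits.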
Re: [HarfBuzz] Order of combining diacriticals
> Date: Wed, 12 Jun 2019 22:24:12 +0200
> From: Khaled Hosny
> Cc: harfbuzz@lists.freedesktop.org
>
> On Wed, Jun 12, 2019 at 10:22:48PM +0300, Eli Zaretskii wrote:
> > In Emacs, we use HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES cluster
> > level, because HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS produced
> > incorrect display.
>
> The cluster levels shouldn't affect display, the glyph positions are
> exactly the same for all the three:

Thanks, I guess I was misremembering something I've read in the
HarfBuzz docs.

> > U+05D1 HEBREW LETTER BET
> > U+05B0 HEBREW POINT SHEVA
> > U+05BC HEBREW POINT DAGESH
> >
> > I need to type them in the above order; if I type DAGESH before SHEVA,
> > the produced display is incorrect.
>
> The glyph order and positions are the same regardless of the input order
> (which is what I'd expect since HarfBuzz normalizes mark order), the
> only difference is cluster values which is also expected AFAICT:
>
> $ hb-shape NotoSerifHebrew-Regular.ttf --unicodes="U+05D1,U+05B0,U+05BC" --cluster-level=1
> [uni05B0=1@178,0+0|uni05BC=1@153,0+0|uni05D1=0+539]
>
> $ hb-shape NotoSerifHebrew-Regular.ttf --unicodes="U+05D1,U+05BC,U+05B0" --cluster-level=1
> [uni05B0=2@178,0+0|uni05BC=1@153,0+0|uni05D1=0+539]
>
> > Is this expected with level-0 clusters? Or should I look for a bug in
> > how Emacs uses HarfBuzz?
>
> Might be a result of hb_buffer_reverse_clusters() used by Emacs.

Since we work on cluster level 0, there's only one cluster in this
case, no matter what the order of the characters in the original text
is. So cluster reversal cannot (and does not) have any effect here.

The problem was a different one. The puzzle had two parts:

 . I used the Courier New font, which evidently doesn't have the
   'hebr' OTF features in its GSUB and GPOS tables. If I use a font
   that does have those features, e.g., Symbola, the problem doesn't
   happen.

 . For fonts that have no 'hebr' features, Emacs performs
   substitution of known precomposed characters before it invokes the
   shaping engine. In this case, it substituted U+FB31 for the
   sequence U+05D1,U+05BC, and passed the sequence U+FB31,U+05B0 to
   HarfBuzz.

It turned out there was a subtle bug in the code which uses the
information returned by HarfBuzz, which is triggered by this use case:
the TO value of the LGLYPH object was computed in a way that confused
the Emacs display engine. Fixing the logic in that case resolved the
problem.

Thanks.
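The comparison of the two hb-shape outputs quoted above can be made mechanical. The sketch below is a minimal parser for the bracketed output in the form that appears in this thread (glyph name, cluster, optional @x,y offset, +advance); other variants of the output, such as vertical advances or extra flags, are not handled, and the function name parse_hb_shape is hypothetical.

```python
import re

# name=cluster[@dx,dy]+advance
_GLYPH = re.compile(
    r"(?P<name>[^=|\[\]]+)=(?P<cluster>\d+)"
    r"(?:@(?P<dx>-?\d+),(?P<dy>-?\d+))?"
    r"\+(?P<adv>-?\d+)"
)


def parse_hb_shape(line):
    """Parse one line of bracketed hb-shape output into a list of dicts."""
    items = []
    for part in line.strip().strip("[]").split("|"):
        m = _GLYPH.fullmatch(part)
        if m is None:
            raise ValueError("unrecognised glyph item: %r" % part)
        items.append({
            "name": m.group("name"),
            "cluster": int(m.group("cluster")),
            "dx": int(m.group("dx") or 0),
            "dy": int(m.group("dy") or 0),
            "advance": int(m.group("adv")),
        })
    return items


def without_clusters(items):
    """Drop the cluster values so that only glyphs and positions remain."""
    return [{k: v for k, v in g.items() if k != "cluster"} for g in items]
```

Applied to the two lines quoted in the message, without_clusters gives identical results, and only the cluster lists differ (1, 1, 0 versus 2, 1, 0), which is the point made in the quoted reply.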
[HarfBuzz] Order of combining diacriticals
In Emacs, we use HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES cluster
level, because HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS produced
incorrect display.

With this level, whenever I type a Hebrew base character with more
than one diacritical, I need to type them in a certain order,
otherwise the display is incorrect. For example, in this series of
characters:

  U+05D1 HEBREW LETTER BET
  U+05B0 HEBREW POINT SHEVA
  U+05BC HEBREW POINT DAGESH

I need to type them in the above order; if I type DAGESH before SHEVA,
the produced display is incorrect.

Is this expected with level-0 clusters? Or should I look for a bug in
how Emacs uses HarfBuzz?

Thanks.
Re: [HarfBuzz] Selecting fonts for HarfBuzz
> Date: Fri, 7 Jun 2019 05:31:33 +0200
> From: Khaled Hosny
> Cc: Behdad Esfahbod, harfbuzz@lists.freedesktop.org
>
> > > HarfBuzz handles everything it understands. It was designed, in
> > > fact, such that when combined with FreeType or other external font
> > > funcs implementation, it even "handles" font formats it does not
> > > understand. Eg. HarfBuzz doesn't understand BDF, PCF, etc, but if
> > > you use hb-ft, you can use hb-ft for everything, and BDF, PCF etc
> > > also magically work because HarfBuzz defers to FreeType for glyph
> > > access, and simply "passes through" for the rest. It was designed
> > > such that you can keep one shaping code path.
> >
> > We don't currently use hb-ft on Windows. But thanks, I think I
> > understand.
>
> You can achieve the same by implementing font functions for the font
> formats HarfBuzz does not directly support, using e.g. GDI API to access
> glyph info in these fonts (see hb_font_funcs_set_* functions).

Thanks, I will look into this, time permitting.
[HarfBuzz] Emacs now uses HarfBuzz
This is to let you know that the master branch of Emacs now uses
HarfBuzz as its shaping engine.

I would like to thank everyone here for your help in making this
happen, whether by contributing code or by advice (or both).

You may now wish to add Emacs to the list of projects which use
HarfBuzz.
Re: [HarfBuzz] Selecting fonts for HarfBuzz
> Date: Thu, 6 Jun 2019 01:31:19 +0100
> From: Richard Wordingham
>
> On Wed, 05 Jun 2019 20:26:41 +0300
> Eli Zaretskii wrote:
>
> > To make the question perhaps more concrete: the current code considers
> > a font to be a match for shaping with HarfBuzz if it's either OTF or
> > TTF, and covers at least one Unicode sub-range above u+00FF
> > codepoint. Is this a reasonable test, or should the code consider
> > additional font features?
>
> Even that's fraught. For example, my Tai Tham font Da Lekh includes
> some Thai characters because they're used with Tai Tham text, but
> doesn't include Thai script characters that aren't. I trust you're
> allowing for the fact that a font for an Indian script will typically
> use the dandas from the Devanagari block, without the font supporting
> anything else from the Devanagari block.

That's another layer of matching in Emacs. The lower layer constructs
a list of all fonts that could match, and then a higher layer tests
which of those actually match the requirements of the script. I was
talking about the former one, you are talking about the latter.

> 1) Some good old faces may lack punctuation characters and logograms.
> This doesn't mean the fonts haven't been equipped with new, good GSUB
> and GPOS tables.
>
> 2) There seems to be an implication that Lao usage only uses one set of
> digits.
>
> 3) A Lao-based font would omit some consonants because they aren't used
> in the Lao tradition.
>
> 4) Some of the consonant marks are alien to modern Northern Thai
> habits, and may therefore be omitted from an old typeface.
>
> Some fonts omit explicit shaping for Tai Tham because they entirely
> reasonably want to avoid the USE. (Rumour has it that Andrew Glass
> wants to ban some words from being shaped properly.) They rely on the
> shaping being done by other features as applied to the default script.
> This doesn't work well on Windows, but could work well with HarfBuzz as
> the renderer. It's only a heuristic that they have a restricted
> repertoire - proper DIY Indic rearrangement is a pain, but even I can
> achieve it.
>
> Restricted repertoire would be very reasonable for a Myanmar script
> font - it's a more extreme version of the fact that Icelandic and
> German don't have the same set of letters.

If all else fails, Emacs offers a facility for specifying the fonts to
be used, which could go down to individual codepoints.
Re: [HarfBuzz] Selecting fonts for HarfBuzz
> Date: Thu, 6 Jun 2019 09:56:18 +0700
> From: Martin Hosken
>
> In case it is unclear, harfbuzz can quite happily handle any TTF or OTF
> whether or not it is designed to be shaped with OpenType or not. So you only
> need one code path and can simply pass any font to harfbuzz for shaping and
> harfbuzz will do the Right Thing (TM). Good news, I would suggest :)

Thanks, I think it's clear.
Re: [HarfBuzz] Selecting fonts for HarfBuzz
> From: Behdad Esfahbod
> Date: Wed, 5 Jun 2019 12:45:00 -0700
> Cc: "harfbuzz@lists.freedesktop.org"
>
> HarfBuzz handles everything it understands. It was designed, in fact,
> such that when combined with FreeType or other external font funcs
> implementation, it even "handles" font formats it does not understand.
> Eg. HarfBuzz doesn't understand BDF, PCF, etc, but if you use hb-ft,
> you can use hb-ft for everything, and BDF, PCF etc also magically work
> because HarfBuzz defers to FreeType for glyph access, and simply
> "passes through" for the rest. It was designed such that you can keep
> one shaping code path.

We don't currently use hb-ft on Windows. But thanks, I think I
understand.
Re: [HarfBuzz] Selecting fonts for HarfBuzz
> From: Behdad Esfahbod
> Date: Wed, 5 Jun 2019 12:07:36 -0700
> Cc: "harfbuzz@lists.freedesktop.org"
>
> In other words, I don't know of a legitimate way to filter out broken
> fonts like code2000. If that's what you are asking for.

No, I wasn't asking about Code2000, I was asking a more general
question.

> Let me ask it differently: why do you think you need to filter anything out?

I assumed that some fonts will not benefit from HarfBuzz, i.e. will
not support complex script shaping, because they lack some fundamental
features HarfBuzz needs.

When Emacs needs to find a font for displaying a character which is
not supported by the default font, it scans the available fonts on the
system, looking for matching fonts. On Windows, we currently have 2
matching criteria: one for fonts suitable for shaping with Uniscribe,
the other for all the rest (the latter generally don't support complex
script shaping). For HarfBuzz, the code currently employs the same
matching criteria as for Uniscribe (I described them roughly in a
previous message). I was asking whether HarfBuzz has additional
requirements from fonts, or whether any font that's good for Uniscribe
will also be good for HarfBuzz.
Re: [HarfBuzz] Patches for building HarfBuzz with mingw.org's MinGW
> From: Behdad Esfahbod
> Date: Tue, 4 Jun 2019 13:08:35 -0700
> Cc: "harfbuzz@lists.freedesktop.org"
>
> I can't say I'm super-excited about adding more workarounds, specially since I don't even understand why you can't use mingw64 instead.
>
> At any rate, I'm not one to judge. Please open a github Pull Request with your changes and we'll go from there.

Done, I think.
Re: [HarfBuzz] Selecting fonts for HarfBuzz
> Date: Wed, 05 Jun 2019 05:36:11 +0300
> From: Eli Zaretskii
> Cc: harfbuzz@lists.freedesktop.org
>
> > We assume fonts support shaping. Ie. we don't have a way to check for font suitability for correct shaping.
>
> I understand, thanks. I wasn't asking how to do that with HarfBuzz, I was asking what font features should my font matching function examine to make sure the font will "support shaping" in the HarfBuzz sense. Features that can be tested without actually shaping some text, of course, i.e. without actually opening the font and using it.

To make the question perhaps more concrete: the current code considers a font to be a match for shaping with HarfBuzz if it's either OTF or TTF, and covers at least one Unicode sub-range above the U+00FF codepoint. Is this a reasonable test, or should the code consider additional font features?

TIA
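The matching test described in this message could be sketched in C roughly as follows. This is only an illustration, not the actual Emacs code: `font_desc`, `covers_beyond_latin1` and `font_matches_for_shaping` are hypothetical names, and the sketch assumes the coverage information is laid out like the 128-bit Unicode subset bitfield of the Windows FONTSIGNATURE structure (bit 0 = Basic Latin, bit 1 = Latin-1 Supplement, the two most significant bits not describing subranges).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical font description; real matching code would fill this
   in from the system's font enumeration.  */
struct font_desc
{
  bool is_opentype_or_truetype; /* OTF or TTF font?  */
  uint32_t usb[4];              /* 128-bit Unicode subrange bitfield  */
};

/* Return true if any subrange bit other than Basic Latin (bit 0) and
   Latin-1 Supplement (bit 1) is set.  The two most significant bits
   of the 128-bit field are masked out because, by assumption, they
   do not describe subranges.  */
static bool
covers_beyond_latin1 (const uint32_t usb[4])
{
  return (usb[0] & ~UINT32_C (0x3)) != 0
         || usb[1] != 0
         || usb[2] != 0
         || (usb[3] & UINT32_C (0x3FFFFFFF)) != 0;
}

/* The test from the message: OTF/TTF, and coverage above U+00FF.  */
static bool
font_matches_for_shaping (const struct font_desc *font)
{
  return font->is_opentype_or_truetype
         && covers_beyond_latin1 (font->usb);
}
```

A font whose only coverage bits are 0 and 1 is rejected, while one that also sets, say, the Hebrew bit (bit 11) is accepted. Whether such a test is sufficient is exactly the open question of the message.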
Re: [HarfBuzz] Selecting fonts for HarfBuzz
> From: Behdad Esfahbod
> Date: Tue, 4 Jun 2019 12:05:00 -0700
> Cc: "harfbuzz@lists.freedesktop.org"
>
> We assume fonts support shaping. Ie. we don't have a way to check for font suitability for correct shaping.

I understand, thanks. I wasn't asking how to do that with HarfBuzz, I was asking what font features should my font matching function examine to make sure the font will "support shaping" in the HarfBuzz sense. Features that can be tested without actually shaping some text, of course, i.e. without actually opening the font and using it.
Re: [HarfBuzz] HarfBuzz shaping of R2L text
> Date: Sun, 2 Jun 2019 20:29:15 +0100
> From: Richard Wordingham
> Cc: harfbuzz@lists.freedesktop.org
>
> On Sun, 02 Jun 2019 21:01:35 +0300
> Eli Zaretskii wrote:
>
> > The version of HarfBuzz I built on Windows and am using with Emacs has Graphite support, so I reckon I don't have to worry about picking up a Graphite shaper?
>
> It depends what you want to do with the shaper. If you want to study what it does in the way of sequencing the glyphs, you need to ensure you use the shaper you want to study! The order the glyphs are presented to the renderer may be very different between using a Graphite shaper and using the HarfBuzz OpenType shaper. For one thing, swapping glyphs round is easy in Graphite and complicated in OpenType.

I don't think I understand what you mean by "Graphite shaper". I'm using just HarfBuzz (which has Graphite capabilities); no other shaper is involved.
[HarfBuzz] Selecting fonts for HarfBuzz
When searching the system for suitable fonts, are there any considerations or features the client should prefer, or prefer not to have, besides preferring OTF/TTF fonts, to produce the best shaping via HarfBuzz? Thanks.
Re: [HarfBuzz] HarfBuzz shaping of R2L text
> Date: Sun, 2 Jun 2019 18:35:07 +0100
> From: Richard Wordingham
>
> It looks as though you will have to resort to Padauk for Windows 7 Uniscribe shaping for Myanmar, and trust that you don't accidentally pick up a Graphite shaper.

With Emacs learning to shape text via HarfBuzz, Uniscribe is about to become deprecated for Emacs on Windows. Which is good, since Microsoft wants users to move away from Uniscribe, and the replacement, DirectWrite, will probably never be supported by Emacs.

The version of HarfBuzz I built on Windows and am using with Emacs has Graphite support, so I reckon I don't have to worry about picking up a Graphite shaper?
[HarfBuzz] Patches for building HarfBuzz with mingw.org's MinGW
Hi,

I'd like to submit a few small patches that allow HarfBuzz to be built on Windows with mingw.org's MinGW toolchain. (And before you ask: the reason you don't see the problems I describe below in your MinGW builds is that you use MinGW64, which is a different flavor of MinGW.)

The patches are against HarfBuzz 2.5.1.

Here are the patches, with explanations:

1. This patch is needed because MinGW doesn't have _BitScanForward and _BitScanReverse. They are only used with old GCC versions, so conditioning their calls by those old versions of GCC is good enough, IMO.

--- src/hb-algs.hh~0 2019-06-01 08:49:47.0 +0300
+++ src/hb-algs.hh 2019-06-02 11:03:52.373677900 +0300
@@ -400,7 +400,7 @@
     return sizeof (unsigned long long) * 8 - __builtin_clzll (v);
 #endif
 
-#if (defined(_MSC_VER) && _MSC_VER >= 1500) || defined(__MINGW32__)
+#if (defined(_MSC_VER) && _MSC_VER >= 1500) || (defined(__MINGW32__) && (__GNUC__ < 4))
   if (sizeof (T) <= sizeof (unsigned int))
   {
     unsigned long where;
@@ -474,7 +474,7 @@
     return __builtin_ctzll (v);
 #endif
 
-#if (defined(_MSC_VER) && _MSC_VER >= 1500) || defined(__MINGW32__)
+#if (defined(_MSC_VER) && _MSC_VER >= 1500) || (defined(__MINGW32__) && (__GNUC__ < 4))
   if (sizeof (T) <= sizeof (unsigned int))
   {
     unsigned long where;

2. This patch is needed because mingw.org's MinGW defines MemoryBarrier as an inline function, not as a macro. __MINGW32_VERSION is defined only by mingw.org's MinGW, so the change shouldn't affect MinGW64.

--- src/hb-atomic.hh~0 2019-05-27 20:07:58.0 +0300
+++ src/hb-atomic.hh 2019-06-02 10:55:49.013099500 +0300
@@ -107,7 +107,7 @@
 
 static inline void _hb_memory_barrier ()
 {
-#ifndef MemoryBarrier
+#if !defined(MemoryBarrier) && !defined(__MINGW32_VERSION)
   /* MinGW has a convoluted history of supporting MemoryBarrier. */
   LONG dummy = 0;
   InterlockedExchange (&dummy, 1);

3. This patch is needed because MinGW doesn't define E_NOT_SUFFICIENT_BUFFER.

--- src/hb-uniscribe.cc~0 2019-05-14 03:28:16.0 +0300
+++ src/hb-uniscribe.cc 2019-06-02 11:04:43.843081900 +0300
@@ -31,6 +31,10 @@
 #include
 #include
 
+#ifndef E_NOT_SUFFICIENT_BUFFER
+#define E_NOT_SUFFICIENT_BUFFER HRESULT_FROM_WIN32 (ERROR_INSUFFICIENT_BUFFER)
+#endif
+
 #include "hb-uniscribe.h"
 #include "hb-open-file.hh"
 

4. This patch is needed because mingw.org's MinGW doesn't have the intrin.h header file; instead, the intrinsics are declared by including windows.h.

--- src/hb.hh~0 2019-05-14 09:42:00.0 +0300
+++ src/hb.hh 2019-06-02 11:06:01.413041500 +0300
@@ -183,8 +183,15 @@
 #include
 
 #if (defined(_MSC_VER) && _MSC_VER >= 1500) || defined(__MINGW32__)
+#ifdef __MINGW32_VERSION
+#ifndef WIN32_LEAN_AND_MEAN
+#define WIN32_LEAN_AND_MEAN 1
+#endif
+#include <windows.h>
+#else
 #include <intrin.h>
 #endif
+#endif
 
 #define HB_PASTE1(a,b) a##b
 #define HB_PASTE(a,b) HB_PASTE1(a,b)

Thank you for developing HarfBuzz.
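For readers unfamiliar with the fallback in patch 3, here is a small stand-alone sketch of what that definition evaluates to. It is an illustration under stated assumptions, not HarfBuzz or Windows SDK code: all `SKETCH_` and `sketch_` names are made up so that the block compiles without windows.h, and the numeric constants are the values documented for ERROR_INSUFFICIENT_BUFFER and FACILITY_WIN32.

```c
#include <stdint.h>

/* Documented Windows values, redefined under hypothetical names so
   the sketch does not need <windows.h>.  */
#define SKETCH_ERROR_INSUFFICIENT_BUFFER 122u
#define SKETCH_FACILITY_WIN32 7u

/* Mirrors what HRESULT_FROM_WIN32 does: values that are zero or
   already have the failure bit set pass through unchanged; otherwise
   keep the low 16 bits, add the facility, and set the failure bit.  */
static uint32_t
sketch_hresult_from_win32 (uint32_t err)
{
  if (err == 0 || (err & 0x80000000u) != 0)
    return err;
  return (err & 0xFFFFu) | (SKETCH_FACILITY_WIN32 << 16) | 0x80000000u;
}

/* Same shape as patch 3: supply the constant only when the
   toolchain's headers did not already define it.  */
#ifndef SKETCH_E_NOT_SUFFICIENT_BUFFER
#define SKETCH_E_NOT_SUFFICIENT_BUFFER \
  sketch_hresult_from_win32 (SKETCH_ERROR_INSUFFICIENT_BUFFER)
#endif
```

With these values the fallback works out to 0x8007007A. The `#ifndef` guard keeps the definition harmless on toolchains that already provide the constant.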
Re: [HarfBuzz] HarfBuzz shaping of R2L text
> Date: Fri, 31 May 2019 08:54:50 +0300
> From: Eli Zaretskii
> Cc: harfbuzz@lists.freedesktop.org
>
> > Date: Thu, 30 May 2019 21:19:00 +0100
> > From: Richard Wordingham
> > Cc: harfbuzz@lists.freedesktop.org
> >
> > > I don't see any reordering here (with HarfBuzz), but maybe it's because the only font I have that covers Myanmar is Code2000.
> >
> > That's probably the problem. I have Version 1.171 of the font, and the closest it comes to layout support for Myanmar is empty lists of lookups for undefined script "myan". The script tag should be "mymr", so HarfBuzz applies no script-specific shaping. There may be other issues, as changing "myan" to "mymr" doesn't fix the problem.
>
> That figures, as Emacs by default claims there are no fonts on this system that support Myanmar, and I need to force it to use Code2000.
>
> I will try with other fonts later.

With Da Lekh I do see the reordering, but only with HarfBuzz as the font backend; Uniscribe doesn't seem to support that, at least not the version on Windows 7.
Re: [HarfBuzz] HarfBuzz shaping of R2L text
> Date: Thu, 30 May 2019 21:19:00 +0100
> From: Richard Wordingham
> Cc: harfbuzz@lists.freedesktop.org
>
> > I don't see any reordering here (with HarfBuzz), but maybe it's because the only font I have that covers Myanmar is Code2000.
>
> That's probably the problem. I have Version 1.171 of the font, and the closest it comes to layout support for Myanmar is empty lists of lookups for undefined script "myan". The script tag should be "mymr", so HarfBuzz applies no script-specific shaping. There may be other issues, as changing "myan" to "mymr" doesn't fix the problem.

That figures, as Emacs by default claims there are no fonts on this system that support Myanmar, and I need to force it to use Code2000.

I will try with other fonts later.
Re: [HarfBuzz] HarfBuzz shaping of R2L text
> Date: Thu, 30 May 2019 19:48:34 +0100
> From: Richard Wordingham
>
> The reordering is that the order in the backing store is:
>
> but the ordering in the display, left to right, is:
>
> I'd be surprised if this caused much problem. I think the big issue is related to the different meaning of advance width for left-to right and right-to-left layout. The OpenType scheme just changes the order of the major base glyphs for (non-Kharoshthi) Indic reordering, so what you see on the page is what you have in the glyph sequence.

I don't see any reordering here (with HarfBuzz), but maybe it's because the only font I have that covers Myanmar is Code2000.
Re: [HarfBuzz] HarfBuzz shaping of R2L text
> Date: Thu, 30 May 2019 01:18:24 +0100
> From: Richard Wordingham
>
> On Wed, 29 May 2019 22:32:12 +0300
> Eli Zaretskii wrote:
>
> The attached files shows a rendering of KA, U+1A6E TAI THAM VOWEL SIGN E, U+1A63 TAI THAM VOWEL SIGN AA>; one could equally well use . The visual order (in the direction of the script, from left to right) is .

What font(s) do you use for these scripts?

Also, I'm not sure I understand why you describe some kind of reordering in this case: AFAICT, all of the characters you mentioned have strong L directionality. So why would they need to be reordered?
Re: [HarfBuzz] How to make sure an hb_font_t object is valid?
> From: Ebrahim Byagowi
> Date: Thu, 30 May 2019 11:14:13 +0430
> Cc: "harfbuzz@lists.freedesktop.org"
>
> Oh hb_font_t, I am sorry, as far as I know they are always valid, I don't know of a case that it can be invalid other than having an invalid hb_face_t. Maybe others can help better on this. Relying on hb_shape_full result is not that common practice as most of clients don't use it and they use hb_shape which returns void, I suggest you to stick to that also.

As far as I remember, it was Khaled who wrote the shaper, and his original code used hb_shape_full. We just didn't dare to change that, although I can see that the arguments we actually pass to the shaper don't really justify calling hb_shape_full. Its only benefit for us is the return value, which hb_shape doesn't provide, so maybe Khaled wanted that value to make the code more robust?
Re: [HarfBuzz] HarfBuzz shaping of R2L text
> Date: Wed, 29 May 2019 21:18:48 +0200
> From: Khaled Hosny
> Cc: harfbuzz@lists.freedesktop.org
>
> AFAIK, yes this is expected. Usually the glyph order shouldn’t matter, one just draws them as they are ordered by HarfBuzz and for anything that requires glyph to glyph to character mapping, the clusters provide all the information needed.

The display looks correct, I was just surprised that the order was reversed regardless of the buffer's direction.

> As it happens, somewhere in Emacs does not like that for whatever reason and would draw the glyph in the wrong order, so in my HarfBuzz in Emacs integration code I used hb_buffer_reverse_clusters() right after shaping to get the glyph correctly drawn.

AFAICT, hb_buffer_reverse_clusters doesn't reverse the order of the glyphs, it only renumbers the clusters such that they are in ascending order. And in the specific case I described, there's only one cluster anyway (I use HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES, because HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS caused problems on Windows).

> No idea how Emacs would deal with reordered Indic glyphs which don’t always follow the input order.

Can you show an example of such a situation and what is expected from the correct shaping and display? I could then see what happens in Emacs.

Thanks.
[HarfBuzz] How to make sure an hb_font_t object is valid?
Last time I asked a similar question, I was told to use hb_face_get_glyph_count. But eventually I need to know that an hb_font_t I create from the face is valid and can be used for shaping. What are the best practices for doing that? Or maybe the shaper will return 'false' when given an invalid font, and all I need is to test the return value of the shaper? TIA
[HarfBuzz] HarfBuzz shaping of R2L text
Hi, While testing the results of hb_shape_full called to shape R2L text, I observed behavior that surprised me: shaping an R2L base letter with a diacritical produces a sequence of glyphs in reverse order, i.e. the glyph for the diacritical comes first, before the base letter. For example, if I shape the sequence (in the logical order) U+05EA HEBREW LETTER TAV U+05BB HEBREW POINT QUBUTS the glyphs left in the buffer by the shaper are in reverse order, first QUBUTS, then TAV. I thought that this was because of bidi reordering, but the result doesn't change if I set the buffer direction to LTR before calling the shaper. The order of the clusters does change with the direction, i.e. with LTR the first cluster is zero, followed by 1, etc., whereas with RTL the clusters are in the decreasing order. But the glyphs are always in the same order: the point first, then the letter. I see the same with the Arabic script if I shape U+0633 followed by U+0651 (in logical order). This doesn't happen with LTR text in unidirectional scripts, including with Latin text when shaping a base letter followed by a diacritical. Is this expected behavior? If so, what are the reasons? Also, can it be controlled by the client application? E.g., Uniscribe can be told to produce glyphs in the logical order, after shaping them for RTL display. TIA
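If a client needs the glyphs of such a run in the opposite order, one simple option is to reverse the shaped array after the fact. The following is a minimal sketch of that idea in plain C, not code from Emacs or HarfBuzz: `shaped_glyph` and `reverse_glyphs` are hypothetical names standing in for whatever glyph/cluster records the client keeps. HarfBuzz itself also offers hb_buffer_reverse, which reverses the contents of a buffer directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one shaped glyph and the cluster it belongs
   to.  */
struct shaped_glyph
{
  uint32_t glyph_id;
  uint32_t cluster;
};

/* Reverse the array in place, so that the last glyph produced by the
   shaper becomes the first one.  Each glyph keeps its own cluster
   value.  */
static void
reverse_glyphs (struct shaped_glyph *glyphs, size_t count)
{
  if (count < 2)
    return;
  for (size_t i = 0, j = count - 1; i < j; i++, j--)
    {
      struct shaped_glyph tmp = glyphs[i];
      glyphs[i] = glyphs[j];
      glyphs[j] = tmp;
    }
}
```

For the TAV + QUBUTS example, an array holding the point's glyph followed by the letter's glyph would come out as letter first, then point. Whether this is appropriate depends on how the drawing code consumes advances and offsets, so treat it as a starting point only.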
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
> Cc: beh...@behdad.org, harfbuzz@lists.freedesktop.org
> From: Jonathan Kew
> Date: Sat, 11 May 2019 22:15:46 +0100
>
> > Would wrapping in a blob the buffer returned by GetFontData be enough?
>
> If you use GetFontData to get the complete font as a single buffer (i.e. pass zero for the dwTable parameter), yes.
>
> Alternatively, you could use hb_face_create_for_tables, with a reference_table_func that uses GetFontData to read individual tables when harfbuzz asks for them.

FTR, I found that using GetFontData to produce a blob that wraps the entire data of a font does work, but is not really practical, except in small test programs. If you have a program that occasionally needs to load many fonts in order to display many different scripts at the same time (Emacs basically does that all the time), you will likely run out of memory, especially in 32-bit builds, because some fonts are simply huge (I've seen fonts of several dozen MBs). A 32-bit build of Emacs ran out of memory when displaying the HELLO file, which shows a greeting in many different scripts.

So eventually, I went with the hb_face_create_for_tables method.
Re: [HarfBuzz] Units of members of hb_glyph_position_t
> From: Behdad Esfahbod
> Date: Tue, 28 May 2019 15:03:48 -0400
> Cc: "harfbuzz@lists.freedesktop.org"
>
> > You pick what value you want to represent one pixel as. Say, you choose 1024. Then if you want to render at "16px" font size, you set scale to 16*1024. That's all.
>
> And then the values of hb_glyph_position_t should be divided by 1024 to produce pixels when using this hb_font_t object?
>
> Yes.

OK, thanks.
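The arithmetic confirmed in this exchange can be written down as a small sketch. The helper names `scale_for_pixel_size` and `position_to_pixels` are hypothetical, and 1024 is just the example value from the quoted message. The scale computed here is what one would pass to hb_font_set_scale for both the x and y scale.

```c
#include <assert.h>
#include <stdint.h>

/* Client-chosen number of position units that represent one pixel
   (1024 in the quoted example).  */
enum { UNITS_PER_PIXEL = 1024 };

/* Scale value for rendering at PIXEL_SIZE pixels per em.  */
static int32_t
scale_for_pixel_size (int32_t pixel_size)
{
  assert (pixel_size > 0);
  return pixel_size * UNITS_PER_PIXEL;
}

/* Convert a value from a shaped glyph position (advance or offset)
   back to pixels.  */
static double
position_to_pixels (int32_t value)
{
  return (double) value / UNITS_PER_PIXEL;
}
```

For a 16px font this gives a scale of 16384, and an advance of 16384 units then corresponds to 16 pixels, i.e. one full em.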
Re: [HarfBuzz] Units of members of hb_glyph_position_t
> From: Behdad Esfahbod
> Date: Tue, 28 May 2019 14:46:45 -0400
> Cc: "harfbuzz@lists.freedesktop.org"
>
> You pick what value you want to represent one pixel as. Say, you choose 1024. Then if you want to render at "16px" font size, you set scale to 16*1024. That's all.

And then the values of hb_glyph_position_t should be divided by 1024 to produce pixels when using this hb_font_t object?
Re: [HarfBuzz] Units of members of hb_glyph_position_t
> From: Behdad Esfahbod
> Date: Mon, 27 May 2019 21:21:10 -0400
> Cc: "harfbuzz@lists.freedesktop.org"
>
> control those units (if they are under the client program's control).
>
> They are controlled mainly using hb_font_set_scale().
>
> In particular, I get a huge value of x_advance for the letter U+05EA HEBREW LETTER TAV when it is followed by U+05BB HEBREW POINT QUBUTS. The value of x_advance I get is 1229, which is too large even after dividing by 64 (which, btw, I still am not sure is TRT in my case,
>
> FreeType works in 26.6 fixed-point, ie. 64 units per 1.0. That's where the 64 value comes from. And you don't see it in your code because hb_ft_font_create* sets that on hb_font for you.
>
> In your Windows code, you should call hb_font_set_scale(). I believe right now you are *not* calling, and you get values in the face's UPEM. That's the default scale for fonts. You can get the face UPEM using hb_face_get_upem().

OK, I figured out how to scale the units from UPEM to pixels for a given font size, and now I see reasonable results after such scaling.

However, I think something is still amiss, because I still don't understand how to determine the values with which to call hb_font_set_scale. Say I call it with an integer value N, what will that produce in terms of values of hb_glyph_position_t? Will the values there be in the 0..N range, where N means the full height of the em box? If so, how would I then convert those values to pixels -- this conversion will need the font size as well, right? And if so, I might well leave the values in UPEM units, and convert them to pixels by hand.

I feel that I'm still missing something, since you said "you should call hb_font_set_scale". So presumably if I call that function, conversion to pixels will somehow become easier?

Thanks.
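The "convert them to pixels by hand" approach mentioned in this message amounts to one multiplication and one division. Below is a minimal sketch, assuming the positions are still in the face's UPEM units (the default when hb_font_set_scale has not been called, per the quoted reply). The function name is hypothetical, and the UPEM of 2048 used in the usage note is an assumption for illustration; the thread does not say what the actual font's UPEM was.

```c
#include <assert.h>
#include <stdint.h>

/* Convert VALUE, expressed in font design units, to pixels for a
   font rendered at PIXEL_SIZE pixels per em.  UPEM is the face's
   units-per-em value, e.g. as returned by hb_face_get_upem.  */
static double
font_units_to_pixels (int32_t value, unsigned int upem, double pixel_size)
{
  assert (upem > 0);
  return value * pixel_size / upem;
}
```

With a hypothetical UPEM of 2048 and a 16px font, the x_advance of 1229 from the message comes out at about 9.6 pixels. The same formula also describes the general case of a scale N: the em box maps to N units, so pixels = value * pixel_size / N.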
Re: [HarfBuzz] Units of members of hb_glyph_position_t
> From: Behdad Esfahbod
> Date: Mon, 27 May 2019 21:21:10 -0400
> Cc: "harfbuzz@lists.freedesktop.org"
>
> control those units (if they are under the client program's control).
>
> They are controlled mainly using hb_font_set_scale().

What happens if hb_font_set_scale is not called? Is there some kind of default?

> In particular, I get a huge value of x_advance for the letter U+05EA HEBREW LETTER TAV when it is followed by U+05BB HEBREW POINT QUBUTS. The value of x_advance I get is 1229, which is too large even after dividing by 64 (which, btw, I still am not sure is TRT in my case,
>
> FreeType works in 26.6 fixed-point, ie. 64 units per 1.0. That's where the 64 value comes from. And you don't see it in your code because hb_ft_font_create* sets that on hb_font for you.

hb_ft_font_create is not used in the Windows code, because the Windows code doesn't use Freetype to open and otherwise manipulate fonts.

> In your Windows code, you should call hb_font_set_scale(). I believe right now you are *not* calling, and you get values in the face's UPEM. That's the default scale for fonts. You can get the face UPEM using hb_face_get_upem().

OK, thanks, I will look into this.
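To make the "26.6 fixed-point" remark concrete: such a value carries 6 fractional bits, so dividing by 64 yields pixels. The sketch below shows that conversion together with rounding to whole pixels; both function names are hypothetical, and this only applies when the font's scale was actually set up in 26.6 units (as the quoted reply says hb_ft_font_create does), which is not the case in the Windows code discussed here.

```c
#include <stdint.h>

/* Convert a 26.6 fixed-point value (64 units per 1.0) to a
   floating-point number of pixels.  */
static double
f26dot6_to_pixels (int32_t value)
{
  return value / 64.0;
}

/* Round a 26.6 fixed-point value to the nearest whole pixel, with
   halves rounded away from zero.  */
static int32_t
f26dot6_round_to_pixel (int32_t value)
{
  return value >= 0 ? (value + 32) / 64 : -((-value + 32) / 64);
}
```

Interpreting the 1229 from the message this way gives 19.203125, which explains why the result still looked too large: the value was in UPEM units, not in 26.6 units.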
Re: [HarfBuzz] Units of members of hb_glyph_position_t
> Date: Mon, 27 May 2019 21:21:47 +0300
> From: Eli Zaretskii
>
> I cannot figure out in what units are these values reported, or how to control those units (if they are under the client program's control). In particular, I get a huge value of x_advance for the letter U+05EA HEBREW LETTER TAV when it is followed by U+05BB HEBREW POINT QUBUTS. The value of x_advance I get is 1229, which is too large even after dividing by 64 (which, btw, I still am not sure is TRT in my case, because I don't understand the source of the 64 value).

Btw, if someone wants to look at the code I'm using to call the shaper, it's here:

http://git.savannah.gnu.org/cgit/emacs.git/tree/src/ftfont.c?h=harfbuzz

The function that calls the HarfBuzz shaper starts at line 2978 in that file. This code is for GNU/Linux, but the code I'm using on Windows (which is not yet in the repository) is an exact copy of that function.

Thanks.
[HarfBuzz] Units of members of hb_glyph_position_t
I cannot figure out in what units the members of hb_glyph_position_t are reported, or how to control those units (if they are under the client program's control). In particular, I get a huge value of x_advance for the letter U+05EA HEBREW LETTER TAV when it is followed by U+05BB HEBREW POINT QUBUTS. The value of x_advance I get is 1229, which is too large even after dividing by 64 (which, btw, I still am not sure is TRT in my case, because I don't understand the source of the 64 value). Can someone please help me figure out what I am doing wrong? TIA
Re: [HarfBuzz] How to get a glyph code for a character?
> Date: Sat, 25 May 2019 17:17:23 +0200
> From: Khaled Hosny
> Cc: Richard Wordingham , harfbuzz@lists.freedesktop.org
>
> On Sat, May 25, 2019 at 06:08:42PM +0300, Eli Zaretskii wrote:
> > > Date: Sat, 25 May 2019 15:50:38 +0100
> > > From: Richard Wordingham
> > >
> > > I presume you're after the glyph indicated by the raw cmap, e.g. without localisation.
> >
> > Not sure what kind of localisation are you alluding to here. I must confess that I'm relatively ignorant about fonts, glyphs, and shaping, so I'm probably missing a lot here. For example, I have no idea what is a "raw cmap".
>
> For any given script and language, the font might provide a different localized glyph than the default one. Only hb_shape[_full]() will apply such localization.

Ah, okay. Well, as you know, Emacs currently doesn't know the script of a character at all, and only knows the global session-wide value of the language, not the language of the text from which the character came. So in practice it seems the nominal glyph will do for now.

> Then hb_shape() is the right tool here. HarfBuzz will also automatically insert dotted circle for combining marks that are at the start of the text string if HB_BUFFER_FLAG_BOT is set on the buffer. You can safely set HB_BUFFER_FLAG_BOT and HB_BUFFER_FLAG_EOT on any buffer as long as the text passed to hb_buffer_add* functions is the full paragraph text not just a chunk of it (that is another reason why one should pass the full paragraph and the item offset and length to these function instead of just the substring).

Thanks, I will look into this later. Right now I have a more urgent issue: the glyph metrics seem to be wrong (width too large or somesuch, not sure yet).

In general, though, Emacs never lays out entire paragraphs of text, I think we pass at most a single screen line to the shaper. Changing that would probably need a significant redesign of the display code.
Re: [HarfBuzz] How to get a glyph code for a character?
> From: Behdad Esfahbod
> Date: Sat, 25 May 2019 11:01:31 -0400
> Cc: "harfbuzz@lists.freedesktop.org"
>
> > What is the best way of providing such a method with HarfBuzz on MS-Windows? One possibility is obviously to call hb_shape, but maybe there's a simpler way for a single codepoint?
>
> hb_font_get_nominal_glyph().
>
> Use of such facilities in an application is quite suspect though.
>
> > Btw, what does hb_font_get_glyph() return?
>
> Boolean indicating whether the font supports that character.

Great, thank you very much.
Re: [HarfBuzz] How to get a glyph code for a character?
> Date: Sat, 25 May 2019 15:50:38 +0100
> From: Richard Wordingham
>
> I presume you're after the glyph indicated by the raw cmap, e.g. without localisation.

Not sure what kind of localisation are you alluding to here. I must confess that I'm relatively ignorant about fonts, glyphs, and shaping, so I'm probably missing a lot here. For example, I have no idea what is a "raw cmap".

> Using hb_shape could very well result in the addition of a dotted circle for a combining mark - is that what you want?

AFAIK, this method is only called in Emacs for a combining mark when we indeed want it displayed as a separate character, with the dotted circle. It is normally called for base (non-combining) characters.

Thanks.
[HarfBuzz] How to get a glyph code for a character?
One of the methods an Emacs font-backend should provide is the encode_char method, which returns the glyph code of the selected font for a character given by its Unicode codepoint. For example, the XFT backend uses the XftCharIndex function for that purpose, and the Freetype backend uses FT_Get_Char_Index. What is the best way of providing such a method with HarfBuzz on MS-Windows? One possibility is obviously to call hb_shape, but maybe there's a simpler way for a single codepoint? Btw, what does hb_font_get_glyph() return? TIA
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
> From: Konstantin Ritt
> Date: Fri, 24 May 2019 19:16:24 +0300
> Cc: Ebrahim Byagowi , Harfbuzz
>
> hb_blob_t *my_reference_table(hb_face_t * /*face*/, hb_tag_t tag, void *user_data)
> {
>   HDC hdc = (HDC)user_data;
>   SelectObject(hdc, hfont);
>
>   char *buffer = NULL;
>   DWORD length = 0;
>
>   length = GetFontData(hdc, byte_swap(tag), 0, buffer, length);
>   if (length == GDI_ERROR)
>     return hb_blob_get_empty();
>
>   buffer = (char *)::malloc(length);
>   length = GetFontData(hdc, byte_swap(tag), 0, buffer, length);
>   if (length == GDI_ERROR)
>     length = 0;
>
>   return hb_blob_create((const char *)buffer, length, HB_MEMORY_MODE_READONLY, buffer, ::free);
> }
>
> hb_face_t *my_face_create_from_hdc(HDC hdc)
> {
>   return hb_face_create_for_tables(my_reference_table, (void *)hdc, NULL);
> }

Thanks, I think how to manage the memory of a blob is now clear to me. But the question about hb_face_t management is still not entirely clear. I don't really need hb_face_t, I only create it as an intermediate step towards hb_font_t. So my question is: once I have hb_font_t, can I destroy the hb_face_t I used to create hb_font_t? If not, how do I arrange for hb_face_t to be destroyed when the corresponding hb_font_t is destroyed?

TIA
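The quoted code calls a `byte_swap` helper that is not shown. A plausible reading, offered here as an assumption rather than as the poster's actual code, is that it reverses the four bytes of the tag: a HarfBuzz tag keeps the first character in the most significant byte, whereas, as far as I know, GetFontData expects the table name with the first character in the least significant byte. A minimal sketch, with `make_tag` as a hypothetical stand-in for the HB_TAG macro:

```c
#include <stdint.h>

/* Pack four characters the way a HarfBuzz tag is laid out: first
   character in the most significant byte.  */
static uint32_t
make_tag (char c1, char c2, char c3, char c4)
{
  return ((uint32_t) (unsigned char) c1 << 24)
         | ((uint32_t) (unsigned char) c2 << 16)
         | ((uint32_t) (unsigned char) c3 << 8)
         | (uint32_t) (unsigned char) c4;
}

/* Reverse the order of the four bytes of V.  */
static uint32_t
byte_swap (uint32_t v)
{
  return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}
```

For the 'cmap' table, `make_tag` yields 0x636D6170 and `byte_swap` turns that into 0x70616D63, which is the form the sketch assumes GetFontData wants in its dwTable argument.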
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
> From: Ebrahim Byagowi
> Date: Fri, 24 May 2019 20:13:43 +0430
> Cc: Harfbuzz
>
> Pardon me for the may inaccurate following answer I have to write quickly,

Thanks for your help.

> > Also, does HarfBuzz support TrueType Collection (TTC) files, and if so, does it want the data only for the currently selected font or all of the data?
>
> It does, if you want harfbuzz handles it for you, you should give it the full blob and set the index you like in second argument of hb_face_create, otherwise you should handle it yourself.

OK, this brings me to another question: what should I in general pass as the 2nd argument of hb_face_create? Suppose I'm using a TTF or OTF font file, should I always pass zero as the 2nd argument? What are the semantics of that argument?

> > I'm now working on the HarfBuzz font driver for Emacs on Windows using GetFontData with the dwTable argument zero, to get the entire data of the font.
>
> Is it DirectWrite? Have you seen the helper we have the in hb-directwrite.h and hb-uniscribe.h? They can be very useful.

I'm not using DirectWrite, nor am I using Uniscribe. My HarfBuzz is built without these two, as I understand building with these back-ends is only needed for comparison. I want to use the HarfBuzz shaper, and only it (Emacs already has support for Uniscribe).

But yes, I do consult these files to figure out answers to my questions.

> > does their memory need to be freed in some manner after I have the hb_font_t object, or do I have to keep them as long as hb_font_t is in use?
>
> Don't free it yourself specially if in use, you can use harfbuzz destroy callback so harfbuzz can handle it for you.

Sorry, I don't think I understand: what do you mean by "harfbuzz destroy callback"? If you mean the 'destroy' argument of hb_blob_create, then AFAIU this is called only to destroy user_data, and I don't have user_data, I pass NULL as the 4th argument of hb_blob_create. And hb_face_create doesn't have any callback argument at all.

I see in the few programs in util/ that both the blob and the face are destroyed as soon as hb_font_t object is created, which is why I thought I could do the same. But now you seem to say I shouldn't?

For that matter, what should I use as the 'mode' argument of hb_blob_create?

This page:

https://harfbuzz.github.io/object-model-blobs.html

shows an example of calling hb_blob_create with 'free' (in my case, 'xfree') as the 'destroy' callback, so I guess my interpretation of that argument as being pertinent to user_data was incorrect? Still, the questions about memory management for hb_face_t and about the semantics of the hb_memory_mode_t enum values are left unanswered.

> > I see that hb_blob_create, hb_face_create etc. return empty objects when they fail. But I see no "is-empty" function or macro in the docs, did I miss something?
>
> Some of the objects may work with empty comparison but it is not broken face https://github.com/harfbuzz/harfbuzz/issues/1572 but something does it very accurately is hb_face_get_glyph_count

AFAIU, you are saying that if hb_face_get_glyph_count returns zero, the face is empty and shouldn't be used, is that right?

> > Where do those 64.0 factors come from?
>
> Subpixel accuracy, harfbuzz works with integers but as subpixel accuracy needed you have to we need to do some scaling. Scaling is not the pixels but _set_ppem and _set_ptem is (this is very inaccurate, but I hope would be useful)

Does this mean I should use the factor of 64 in my code as well? Or does that value depend on some properties of the font?

> > Or point me to the documentation where that is described, if I missed it?
>
> https://harfbuzz.github.io/ may address some of your issues

Thanks again for your help.
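The ownership question raised in this message is easier to reason about with a toy model. HarfBuzz objects are reference-counted: a face holds a reference to the blob it was created from, and a font holds a reference to its face, which is why the util/ programs can destroy the blob and the face right after creating the font. The sketch below models that scheme in plain C under stated assumptions: `struct object`, `object_create`, `object_destroy` and `count_release` are hypothetical names and not HarfBuzz API, and the destroy callback is modelled as being invoked with user_data when the last reference goes away.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*destroy_func) (void *user_data);

/* Hypothetical reference-counted object that may depend on a parent
   (font -> face -> blob) and may own some user-supplied data.  */
struct object
{
  int ref_count;
  struct object *parent;
  void *user_data;
  destroy_func destroy;  /* called with user_data on final destroy */
};

static struct object *
object_create (struct object *parent, void *user_data, destroy_func destroy)
{
  struct object *obj = malloc (sizeof *obj);
  assert (obj != NULL);
  obj->ref_count = 1;
  obj->parent = parent;
  if (parent)
    parent->ref_count++;  /* the child keeps its parent alive */
  obj->user_data = user_data;
  obj->destroy = destroy;
  return obj;
}

/* Drop one reference; when the count reaches zero, run the destroy
   callback, free the object, and drop its reference on the parent.  */
static void
object_destroy (struct object *obj)
{
  while (obj && --obj->ref_count == 0)
    {
      struct object *parent = obj->parent;
      if (obj->destroy)
        obj->destroy (obj->user_data);
      free (obj);
      obj = parent;
    }
}

/* Example callback that counts calls, so the effect is observable.  */
static int buffers_released;

static void
count_release (void *user_data)
{
  (void) user_data;
  buffers_released++;
}
```

In this model, creating blob, face and font in sequence and then destroying the blob and the face leaves both alive as long as the font exists; the callback runs only when the font is destroyed last. In real code the callback would typically be `free` (or `xfree`) with the malloc'ed buffer as user_data, as in the documentation example mentioned above.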
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
Ping! Could someone please help me understand how the memory for the various HarfBuzz objects should be handled? Or point me to the documentation where that is described, if I missed it? Please?? > Date: Sat, 18 May 2019 14:33:45 +0300 > From: Eli Zaretskii > Cc: harfbuzz@lists.freedesktop.org > > > Cc: beh...@behdad.org, harfbuzz@lists.freedesktop.org > > From: Jonathan Kew > > Date: Sat, 11 May 2019 22:15:46 +0100 > > > > >> If you've got access to the font as a file or as a single buffer in > > >> memory, then wrapping the entire thing as a blob and handing it to > > >> hb_face_create will be simplest. > > > > > > Would wrapping in a blob the buffer returned by GetFontData be enough? > > > > If you use GetFontData to get the complete font as a single buffer (i.e. > > pass zero for the dwTable parameter), yes. > > I'm now working on the HarfBuzz font driver for Emacs on Windows using > GetFontData with the dwTable argument zero, to get the entire data of > the font. The question for which I cannot find an answer is regarding > the memory management of the font data I get from GetFontData. The > buffer into which I get the font data is malloc'ed. Then I create a > blob from that buffer using hb_blob_create, use that blob to create a > face with hb_face_create, and finally use the face to create a font > with hb_font_create. The result of hb_font_create I cache and use > thereafter each time I need to call hb_shape_full. But what about the > hb_blob_t and the hb_face_t objects created in the process -- does > their memory need to be freed in some manner after I have the > hb_font_t object, or do I have to keep them as long as hb_font_t is in > use? The question about the blob also directly affects whether I need > to keep around the buffer allocated for the GetFontData call, or can > it be freed once I have the hb_font_t object. > > Another question is about error handling. I see that hb_blob_create, > hb_face_create etc. return empty objects when they fail. 
But I see no > "is-empty" function or macro in the docs, did I miss something? If > not, how does one test for errors in a C program? I assumed that any > errors cause subsequent calls to fail, and so only checked the last > call to hb_font_create for errors -- is that correct? > > Thanks in advance for any help. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
And one more question. The shaping example in the HarfBuzz docs does this after shaping: x_offset = glyph_pos[i].x_offset / 64.0; y_offset = glyph_pos[i].y_offset / 64.0; x_advance = glyph_pos[i].x_advance / 64.0; y_advance = glyph_pos[i].y_advance / 64.0; Where do those 64.0 factors come from? IOW, I guess the question is in what units are the members of hb_glyph_position_t measured, and how to scale that to the pixels of the display? TIA ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
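A note on the units. The members of hb_glyph_position_t are expressed in whatever scale was set on the font with hb_font_set_scale. A divisor of 64 makes sense when that scale is the pixel size times 64, i.e. 26.6 fixed point, which is the convention FreeType uses. That the docs example relies on this is an assumption here, since it creates its font through FreeType. Under that assumption the conversion can be sketched as follows; the helper names are made up and are not part of HarfBuzz.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of HarfBuzz. They assume the font
   scale was set to (pixel size * 64), i.e. 26.6 fixed point. */

/* Scale value to use for a given pixel size under that assumption. */
int32_t pixels_to_26_6 (int pixel_size)
{
  assert (pixel_size >= 0 && pixel_size <= INT32_MAX / 64);
  return (int32_t) pixel_size * 64;
}

/* Convert a position or advance from 26.6 units back to pixels. */
double from_26_6 (int32_t value)
{
  return value / 64.0;
}
```

If the scale is set to some other value, the divisor changes with it, so the factor is a property of how the font object was set up rather than of the font file.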
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
> Date: Sat, 11 May 2019 22:44:49 +0300 > From: Eli Zaretskii > Cc: harfbuzz@lists.freedesktop.org > > > From: Behdad Esfahbod > > Date: Sat, 11 May 2019 12:25:57 -0700 > > Cc: Jonathan Kew , > > "harfbuzz@lists.freedesktop.org" > > > > you can even implement Windows-backed font-funcs. Several projects > > do that. Say, look at Qt maybe? > > I looked at XeTeX, but it goes the Freetype way. I'll look at Qt, > thanks. For the record: Qt seems to use GetFontData for individual OTF tables. See qwindowsfontdatabase.cpp in the Qt sources. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
> Cc: beh...@behdad.org, harfbuzz@lists.freedesktop.org > From: Jonathan Kew > Date: Sat, 11 May 2019 22:15:46 +0100 > > >> If you've got access to the font as a file or as a single buffer in > >> memory, then wrapping the entire thing as a blob and handing it to > >> hb_face_create will be simplest. > > > > Would wrapping in a blob the buffer returned by GetFontData be enough? > > If you use GetFontData to get the complete font as a single buffer (i.e. > pass zero for the dwTable parameter), yes. I'm now working on the HarfBuzz font driver for Emacs on Windows using GetFontData with the dwTable argument zero, to get the entire data of the font. The question for which I cannot find an answer is regarding the memory management of the font data I get from GetFontData. The buffer into which I get the font data is malloc'ed. Then I create a blob from that buffer using hb_blob_create, use that blob to create a face with hb_face_create, and finally use the face to create a font with hb_font_create. The result of hb_font_create I cache and use thereafter each time I need to call hb_shape_full. But what about the hb_blob_t and the hb_face_t objects created in the process -- does their memory need to be freed in some manner after I have the hb_font_t object, or do I have to keep them as long as hb_font_t is in use? The question about the blob also directly affects whether I need to keep around the buffer allocated for the GetFontData call, or can it be freed once I have the hb_font_t object. Another question is about error handling. I see that hb_blob_create, hb_face_create etc. return empty objects when they fail. But I see no "is-empty" function or macro in the docs, did I miss something? If not, how does one test for errors in a C program? I assumed that any errors cause subsequent calls to fail, and so only checked the last call to hb_font_create for errors -- is that correct? Thanks in advance for any help. 
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
> Cc: beh...@behdad.org, harfbuzz@lists.freedesktop.org > From: Jonathan Kew > Date: Sat, 11 May 2019 22:15:46 +0100 > > > Would wrapping in a blob the buffer returned by GetFontData be enough? > > If you use GetFontData to get the complete font as a single buffer (i.e. > pass zero for the dwTable parameter), yes. > > Alternatively, you could use hb_face_create_for_tables, with a > reference_table_func that uses GetFontData to read individual tables > when harfbuzz asks for them. OK, thanks. I think this is a large chunk of the solution to my problem. Assuming that I want to use GetFontData, what factors and aspects should I consider when deciding whether to create a single blob with the entire font's data or to go for the hb_face_create_for_tables variety? Also, does HarfBuzz support TrueType Collection (TTC) files, and if so, does it want the data only for the currently selected font or all of the data? ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
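A note on collections. The second argument of hb_face_create is an index into the collection, and for a file that holds a single font the index is 0. As far as I can tell from the Windows documentation, GetFontData with dwTable zero returns only the currently selected font when the file is a TTC, and the 'ttcf' tag has to be passed to get the whole collection. The sketch below shows how the number of fonts could be read from a buffer, assuming the standard TTC header layout. The function name is hypothetical and is not a HarfBuzz call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a HarfBuzz function: report how many fonts
   a buffer holds, so that a valid face index is 0 .. count-1.
   Returns 0 if the buffer is too short to tell. */
unsigned int count_fonts_in_buffer (const unsigned char *data, size_t length)
{
  assert (data != NULL);
  if (length < 4)
    return 0;
  if (memcmp (data, "ttcf", 4) != 0)
    return 1;                     /* anything else is assumed to be a single font */
  if (length < 12)
    return 0;                     /* truncated collection header */
  /* TTC header: tag (4 bytes), version (4 bytes), numFonts (uint32, big-endian). */
  return ((unsigned int) data[8] << 24) | ((unsigned int) data[9] << 16)
         | ((unsigned int) data[10] << 8) | (unsigned int) data[11];
}
```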
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
> From: Behdad Esfahbod > Date: Sat, 11 May 2019 12:25:57 -0700 > Cc: Jonathan Kew , > "harfbuzz@lists.freedesktop.org" > > you can even implement Windows-backed font-funcs. Several projects > do that. Say, look at Qt maybe? I looked at XeTeX, but it goes the Freetype way. I'll look at Qt, thanks. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
> Cc: Behdad Esfahbod , > "harfbuzz@lists.freedesktop.org" > From: Jonathan Kew > Date: Sat, 11 May 2019 20:11:17 +0100 > > > Yes. The font file. Maybe describe what you are trying to do? > > > > If you've got access to the font as a file or as a single buffer in > memory, then wrapping the entire thing as a blob and handing it to > hb_face_create will be simplest. Would wrapping in a blob the buffer returned by GetFontData be enough? > In a case where you don't necessarily have easy access to the complete > font file, but have platform APIs that you can use to retrieve specific > font tables (like IDWriteFontFace::TryGetFontTable on Windows, or > CGFontCopyTableForTag on macOS), that's where you might prefer to use > hb_face_create_for_tables (like Firefox does). This expects you to > provide a reference_table_func that will return a blob containing the > data of any given font table (identified by its 32-bit OpenType table tag). So there should be a function for each of the OpenType table tag, each function returning a pointer to the table's data? ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
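A note on the last question. Going by the quoted description, reference_table_func is a single callback that receives the requested tag as an argument, so one function can serve all tables. If that function calls GetFontData, the tag needs its bytes reversed first: HarfBuzz packs the first character into the most significant byte, while the Windows documentation gives 0x66637474 for 'ttcf', i.e. the first character in the least significant byte. A minimal sketch of that conversion follows. The helper names are made up and belong to neither HarfBuzz nor Windows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not HarfBuzz or Windows APIs. */

/* Pack four characters into a tag with the first character in the
   most significant byte (the layout HarfBuzz's HB_TAG produces). */
uint32_t make_ot_tag (char a, char b, char c, char d)
{
  return ((uint32_t) (uint8_t) a << 24) | ((uint32_t) (uint8_t) b << 16)
         | ((uint32_t) (uint8_t) c << 8) | (uint32_t) (uint8_t) d;
}

/* Reverse the byte order of a tag, giving the value GetFontData
   expects in its dwTable argument. */
uint32_t tag_for_getfontdata (uint32_t tag)
{
  assert (tag != 0);   /* zero means "whole file", not a table tag */
  return ((tag & 0x000000FFu) << 24) | ((tag & 0x0000FF00u) << 8)
         | ((tag & 0x00FF0000u) >> 8) | ((tag & 0xFF000000u) >> 24);
}
```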
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
> From: Behdad Esfahbod > Date: Sat, 11 May 2019 11:51:16 -0700 > Cc: Jonathan Kew , > "harfbuzz@lists.freedesktop.org" > > Not sure yet. What is a "font" for this purpose? Does it have to be > the full contents of a font file on disk? > > Yes. The font file. Maybe describe what you are trying to do? I'm trying to use on MS-Windows the HarfBuzz shaping function for Emacs, which Khaled wrote. The code as written uses Freetype-specific data (FT_Face), and I'm trying to provide it with the Windows equivalents instead. As for passing the font file's data to hb_blob_create: it is quite unusual to manipulate physical font files on MS-Windows, the usual paradigm is to use a "logical font", which is a specification for a font, and then retrieve the metrics of the font using dedicated APIs. So I wonder whether there's an alternative to accessing the physical font files. Thanks. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
> From: Behdad Esfahbod > Date: Sat, 11 May 2019 11:38:58 -0700 > Cc: Jonathan Kew , > "harfbuzz@lists.freedesktop.org" > > The blob simply hold the font file bytes. There's even > hb_blob_create_from_file. > > Makes sense? Not sure yet. What is a "font" for this purpose? Does it have to be the full contents of a font file on disk? ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
> From: Behdad Esfahbod > Date: Sat, 11 May 2019 11:26:29 -0700 > Cc: Jonathan Kew , > "harfbuzz@lists.freedesktop.org" > > Or just use hb_face_create() and hb_font_create(). Thanks, that's what I thought I needed to do to begin with. However, hb_face_create needs a 'blob' argument, and I couldn't understand how to call hb_blob_create to get a suitable blob. Then I looked at how hb_ft_face_create does it, and saw that it makes a blob out of FT_Face structure. But I couldn't see how that blob is used by HarfBuzz, and so couldn't decide how to call hb_blob_create in my case. Was I asking myself the right questions? If so, how can I find the answers? TIA ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
> From: Jonathan Kew > Date: Sat, 11 May 2019 11:08:42 +0100 > > You can use hb_face_create_for_tables, passing it a function that can > retrieve font tables (as hb_blobs) when requested by harfbuzz. > > This is what Firefox does, to use harfbuzz with Windows or MacOS font > APIs; a starting point to explore the Firefox code would be [1], where > we call hb_face_create_for_tables and pass it HBGetTable as the > reference_table_func. This calls down to the GetFontTable() method, > which has separate implementations for the various platforms. Thanks, this gets me a notch forward, but I'm afraid there's still a lot of fog. Specifically, what does HarfBuzz expect from the hb_blob's it retrieves this way, and where and how does it use those blobs? ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
[HarfBuzz] How to get hb_face_t and hb_font_t without Freetype?
Is it possible to create a hb_face_t without going through Freetype? If so, could someone please tell what that would entail? The tutorial only shows how to create hb_font_t using Freetype, and I found no other documentation related to this, except the functions' signatures. The implementations of hb_ft_face_create and hb_ft_font_create look deceptively simple, so maybe it wouldn't be hard to implement something similar without going through Freetype. But the question is what is needed from the data stashed away by hb_blob_create, and where is that data used? I guess there are some callbacks specific to Freetype which the HarfBuzz shaper needs, and those callbacks need to access the blob data? But none of that seems to be documented. Could someone please post some information about these issues, or point me to existing documentation if I missed it? The context for these questions is WIP to add HarfBuzz shaping capabilities to Emacs on MS-Windows. The existing HarfBuzz integration, for Posix platforms, uses Freetype, because Freetype is already used by Emacs on Posix systems to access font capabilities. But on Windows Emacs uses native Windows interfaces to access and utilize font and text metrics data, so going through Freetype would probably add interfaces whose equivalents already exist. The question is how to use those equivalents to give HarfBuzz what it needs. TIA
Re: [HarfBuzz] Building and testing HarfBuzz 2.3.0 on MinGW
> From: Ebrahim Byagowi > Date: Fri, 8 Feb 2019 16:02:46 +0330 > Cc: Nathan Willis , Harfbuzz > > > > My conclusion was that ICU is not needed, but maybe it has some advantages, > > It will be a good idea if someone ships ICU anyway, they use their ICU (or > glib, which can provide unicode > callbacks also) instead having extra a harfbuzz buildin UCDN, at least for > size reduction reasons. > [...] > > Glib is needed for running a large part of the test suite > > It can provide unicode callbacks also as just said before. Thanks, but I still don't think I understand: given that Unicode character properties are all derived from the same UCD database, what would be the motivation to use Glib or ICU for these purposes, even if these libraries are already linked into a program? Do ICU/Glib support some extensions that UCDN doesn't, or are more likely to support the latest Unicode Standard? > > It is not clear to me what are GObject and Introspection needed for; it > > would be good to clarify that. > > Roughly, gnome way of writing language bindings, ie. make non C/C++ language > users able to interact with > the library with Gnome provided facilities. Not needed for C/C++ or users > don't use gobject introspection > anyway. Thanks, this part is now clear, I think. > > Btw, the information about "Building on Windows" is IMO outdated: > > nowadays one can use the "normal" Unix configure/make steps assuming > > one has MSYS and MinGW installed. That's what I did. There should be > > no need anymore for any Windows-specific build procedures. > > Not everyone will agree with you on that I guess, maybe different use-cases > or something, as you see vcpkg > project https://github.com/Microsoft/vcpkg/graphs/contributors is still a > pretty busy project, that's why I > suggest vcpkg for non-msys Windows users, even instead directly using our > cmake on Windows. 
Vcpkg > itself uses our cmake but can switch to meson if needed and it can target > Linux in addition to Windows, for > use-cases I am not aware of. So maybe that section should be extended to mention both methods? People who build HarfBuzz on Windows are likely to have MSYS installed anyway, because building the dependencies mostly does require it. And even if they don't have it already installed, it's good to mention that, just so that the reader would know such a method is supported. When I first read that, I was left wondering whether the normal configure && make paradigm will get me a port as functional as the method described on that page. Thanks. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] Building and testing HarfBuzz 2.3.0 on MinGW
> From: Nathan Willis > Date: Mon, 4 Feb 2019 12:28:02 + > Cc: harfbuzz@lists.freedesktop.org > > On Sat, Jan 26, 2019 at 5:35 PM Eli Zaretskii wrote: > > > 1) It would be good to have some guidance in some README or in the > > HTML docs regarding the optional dependencies and configuration > > options, and their significance. For example, it turns out Glib is > > needed to run a large portion of the test suite, something that wasn't > > clear (I initially concluded that I didn't need Glib at all). Also, > > hb-shape is not built if Glib isn't available. Similarly, hb-view is > > not build unless both Cairo and cairo-ft are available. > > > > > I added https://harfbuzz.github.io/building.html#configuration a few weeks > ago; would you mind elaborating on what is missing there from your POV?
Thanks for adding this, and sorry for the long delay in responding. The information you added tells when to use the optional configure switches. That is important, but there's a more general issue of what optional dependencies are needed for which parts of HarfBuzz's functionalities. This is important for someone who wants to build HarfBuzz with the minimal set of dependencies, but without losing any functionality important for one's use case. Without a good understanding of these issues, one cannot easily decide on which of the configure switches to use, and more importantly what packages need to be installed before building HarfBuzz.
In response to my questions, Khaled once provided some of the information about that. I now combine that below with what I learned while building HarfBuzz:
 . ICU is needed for accessing Unicode character properties; UCDN is the built-in alternative to that which has no external dependencies. My conclusion was that ICU is not needed, but maybe it has some advantages, in which case it would be good to describe them.
 . Cairo is needed for command-line tools (so can be skipped if one only wants the library). Note that Cairo alone is not enough for building the command-line tools, you also need cairo-ft, and for hb-shape one also needs Glib.
 . Freetype is one of two font callbacks; the other is built-in and has no external dependencies. The decision whether to use Freetype largely depends on whether the program(s) to be linked against HarfBuzz already use Freetype.
 . Fontconfig is only needed for command-line tools.
 . Graphite2 is becoming less and less important, as fonts which require that are rare, and their importance for minority scripts is diminishing with recent OpenType developments.
 . Glib is needed for running a large part of the test suite, so if one decides not to build with Glib, a separate build with Glib just for running the test suite is a good idea.
 . Python is required (and should be on PATH) for most of the test suite.
 . It is not clear to me what are GObject and Introspection needed for; it would be good to clarify that.
Btw, the information about "Building on Windows" is IMO outdated: nowadays one can use the "normal" Unix configure/make steps assuming one has MSYS and MinGW installed. That's what I did. There should be no need anymore for any Windows-specific build procedures.
Thanks, and let me know if I can help more with this documentation effort.
Re: [HarfBuzz] Building and testing HarfBuzz 2.3.0 on MinGW
> From: Nathan Willis > Date: Mon, 4 Feb 2019 12:28:02 + > Cc: harfbuzz@lists.freedesktop.org > > On Sat, Jan 26, 2019 at 5:35 PM Eli Zaretskii wrote: > > 1) It would be good to have some guidance in some README or in the > HTML docs regarding the optional dependencies and configuration > options, and their significance. For example, it turns out Glib is > needed to run a large portion of the test suite, something that wasn't > clear (I initially concluded that I didn't need Glib at all). Also, > hb-shape is not built if Glib isn't available. Similarly, hb-view is > not build unless both Cairo and cairo-ft are available. > > I added https://harfbuzz.github.io/building.html#configuration a few weeks > ago; would you mind elaborating on > what is missing there from your POV? Hi, I didn't forget, I just have my plate full. Will respond in a few days. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
Re: [HarfBuzz] Building and testing HarfBuzz 2.3.0 on MinGW
> From: Ebrahim Byagowi > Date: Sun, 27 Jan 2019 01:02:01 +0330 > Cc: Harfbuzz > > 1) Agreed Btw, one other prerequisite for running the test suite is Python. I suggest that to be mentioned as well. In my case, Python was not on PATH, and most tests failed. > 2) Something feels wrong as we compile all these in our msys2 CI already and > that shouldn't be that different > from your setup I saw that similar failures were reported here: https://github.com/harfbuzz/harfbuzz/issues/1560 So I upgraded my Freetype 2.5.0.1 to the latest 2.9.1, and then all the tests passed. Therefore, I suggest that the oldest version of Freetype that is considered "good enough" for the test suite be referenced in the documentation of prerequisites for running the tests. > 3) Uniscribe and DirectWrite backends and now CoreText, are mostly for > comparison while development, so > developers can check what can be expected behavior while development, and are > not used in the test suit at > least which tends to be platform agnostic so don't use them at all if you can. Got it, thanks. ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/harfbuzz
[HarfBuzz] Building and testing HarfBuzz 2.3.0 on MinGW
I'm resending this after subscribing to the list, since my original message, sent a month ago, only got an automated response that it's waiting for a moderator. (Does someone actually tend to the moderator's tasks of this list?) I've built HarfBuzz 2.3.0 on MS-Windows using mingw.org's MinGW (https://osdn.net/projects/mingw/, different from MinGW64). In general, the build was successful, with a small number of changes that I will soon report to the issue tracker. I have a few questions/suggestions as a result of this experience, which I'd like to voice. Thanks in advance for any responses. 1) It would be good to have some guidance in some README or in the HTML docs regarding the optional dependencies and configuration options, and their significance. For example, it turns out Glib is needed to run a large portion of the test suite, something that wasn't clear (I initially concluded that I didn't need Glib at all). Also, hb-shape is not built if Glib isn't available. Similarly, hb-view is not built unless both Cairo and cairo-ft are available. 2) Several tests fail. For example, "indic-joiners" and "use" in shaping/data/in-house, CVAR-1 and CVAR-2 in shaping/data/text-rendering-tests, most of the gpos_* tests in shaping/data/aots, etc. I also built HarfBuzz on GNU/Linux, and I see failures in almost the same tests. The failures are caused by differences between the expected and the actual outputs. Are these real problems, for which you'd like me to report issues, or is this a known problem? Did anyone succeed in running the entire test suite without a single failure? 3) I'm uncertain about the use of Uniscribe in the Windows build. I was told that it was only used "for comparison", which I interpreted to mean it was used in the test suite. But I don't think it's the case, since the Uniscribe dependent functions of HarfBuzz are in the library, so it seems like Uniscribe is used by the library itself.
What is the purpose of using Uniscribe (and DirectWrite, when that is compiled in)?