Re: Progress update on adjustment database
The final branch now has everything it needs! I would really like you, or someone else who has worked a lot with FreeType, to give it a code review. Thank you. On Mon, Oct 30, 2023 at 2:26 AM Werner LEMBERG wrote: > > > There was an intentional change to src/base/ftobjs.c. I made the > > function that finds the best unicode charmap public so that I could > > use it from the autofitter. > > OK, I see. I was confused because the `FT_BASE_TAG` is missing for > this function in `ftobjs.c`. > > > Werner >
Re: Progress update on adjustment database
> There was an intentional change to src/base/ftobjs.c. I made the > function that finds the best unicode charmap public so that I could > use it from the autofitter. OK, I see. I was confused because the `FT_BASE_TAG` is missing for this function in `ftobjs.c`. Werner
Re: Progress update on adjustment database
> And no accidental changes to `freetype/config/ftoption.h` and > `src/base/ftobjs.c`, please :-) There was an intentional change to src/base/ftobjs.c. I made the function that finds the best unicode charmap public so that I could use it from the autofitter. On Fri, Oct 27, 2023 at 4:26 AM Werner LEMBERG wrote: > > > I have added and tested type 3 lookup handling, and also added the > > comments you asked for. > > Thanks. Minor nits that I've seen while skimming through: > > s/indicies/indices/ > s/flatenning/flattening/ > > > I will begin on the final branch if there's nothing else to be done. > > OK. Please rebase to master before you start. > > And no accidental changes to `freetype/config/ftoption.h` and > `src/base/ftobjs.c`, please :-) > > > Werner >
Re: Progress update on adjustment database
> I have added and tested type 3 lookup handling, and also added the > comments you asked for. Thanks. Minor nits that I've seen while skimming through:

s/indicies/indices/
s/flatenning/flattening/

> I will begin on the final branch if there's nothing else to be done. OK. Please rebase to master before you start. And no accidental changes to `freetype/config/ftoption.h` and `src/base/ftobjs.c`, please :-) Werner
Re: Progress update on adjustment database
I have added and tested type 3 lookup handling, and also added the comments you asked for. I will begin on the final branch if there's nothing else to be done. Thanks for the help! On Mon, Oct 23, 2023 at 12:54 PM Werner LEMBERG wrote: > > > I need some help figuring out how to handle the type 3 lookups. > > I need to do 2 things: > > - Figure out which features contain type 3 lookups > > - Determine the number of variations the feature contains > > Simply add all glyphs a type 3 lookup provides to the list of glyphs. > No further special handling is necessary. > > > For the second one, this function seems relevant: > > > https://harfbuzz.github.io/harfbuzz-hb-ot-layout.html#hb-ot-layout-feature-with-variations-get-lookups > > But this returns a list of lookups if you already know the variation > > index, when I want to know the range of possible variation indices. > > This function is not relevant – it's about variation fonts, see > > > https://learn.microsoft.com/en-us/typography/opentype/spec/chapter2#device-and-variationindex-tables > > > hb_ot_layout_lookup_get_glyph_alternates > > also looks useful and could partially solve the problem. > > This is the right function, I think. > > > With this function, I can handle the type 3 lookup cases in > > isolation, only finding the glyphs directly resulting from the > > feature and no further transformations. > > Sounds sufficient to me. > > > Also, can I have some advice on testing the code? How should I make > > these changes bulletproof? > > Alas, we don't have a testing framework for such issues. However, as > soon as your code lands in 'master', the Chromium people and other > parties run their fuzzers on FreeType, which usually unveils memory > leaks or segfaults quite soon. They also do intensive comparison of > graphic images; however, I don't know whether they use the auto-hinter > for that. > > > Werner >
Re: Progress update on adjustment database
> I need some help figuring out how to handle the type 3 lookups. > I need to do 2 things: > - Figure out which features contain type 3 lookups > - Determine the number of variations the feature contains Simply add all glyphs a type 3 lookup provides to the list of glyphs. No further special handling is necessary. > For the second one, this function seems relevant: > https://harfbuzz.github.io/harfbuzz-hb-ot-layout.html#hb-ot-layout-feature-with-variations-get-lookups > But this returns a list of lookups if you already know the variation > index, when I want to know the range of possible variation indices. This function is not relevant – it's about variation fonts, see https://learn.microsoft.com/en-us/typography/opentype/spec/chapter2#device-and-variationindex-tables > hb_ot_layout_lookup_get_glyph_alternates > also looks useful and could partially solve the problem. This is the right function, I think. > With this function, I can handle the type 3 lookup cases in > isolation, only finding the glyphs directly resulting from the > feature and no further transformations. Sounds sufficient to me. > Also, can I have some advice on testing the code? How should I make > these changes bulletproof? Alas, we don't have a testing framework for such issues. However, as soon as your code lands in 'master', the Chromium people and other parties run their fuzzers on FreeType, which usually unveils memory leaks or segfaults quite soon. They also do intensive comparison of graphic images; however, I don't know whether they use the auto-hinter for that. Werner
Re: Progress update on adjustment database
> OK. Minor nit: Please avoid overlong git commit messages (i.e., not > longer than 78 characters). And there should be an empty line after > the first line to help tools like `gitk` to properly display git > commits. [Overlong lines should be avoided in the C code, too, both > for comments and code.] > Excellent! I think it would also be beneficial if you could mention > your findings in either a git comment or in the code itself, together > with some real-world examples of such quirks (i.e., font name, font > version, glyph name, reason why it fails, etc., etc.) Will do, thanks. I need some help figuring out how to handle the type 3 lookups. I need to do 2 things:

- Figure out which features contain type 3 lookups
- Determine the number of variations the feature contains

For the second one, this function seems relevant: https://harfbuzz.github.io/harfbuzz-hb-ot-layout.html#hb-ot-layout-feature-with-variations-get-lookups But this returns a list of lookups if you already know the variation index, when I want to know the range of possible variation indices. hb_ot_layout_lookup_get_glyph_alternates also looks useful and could partially solve the problem. With this function, I can handle the type 3 lookup cases in isolation, only finding the glyphs directly resulting from the feature and no further transformations. Also, can I have some advice on testing the code? How should I make these changes bulletproof? On Mon, Oct 16, 2023 at 11:58 PM Werner LEMBERG wrote: > > >> I simply noticed that it's possible for 2 characters to map to the > >> same glyph, which means that a glyph would map to 2 characters. I > >> don't have any examples in mind for when this would actually > >> happen. I was planning on either ignoring the situation to let it > >> be resolved arbitrarily, or removing both entries. > > > > The situation is resolved arbitrarily for now. > > OK. Minor nit: Please avoid overlong git commit messages (i.e., not > longer than 78 characters).
And there should be an empty line after > the first line to help tools like `gitk` to properly display git > commits. [Overlong lines should be avoided in the C code, too, both > for comments and code.] > > > Also: what else needs to be done for the project to be complete and > > ready to become a part of freetype? The remaining tasks I can think > > of are: > > > > - Fill in, or find someone to fill in the rest of the adjustment > > database. > > This is certainly helpful. However, it doesn't need to be complete > right now, but it should cover a good share of languages that use the > Latin script. BTW, please add a comment to the `adjustment_database` > array, explaining the format. > > > - properly address the 'salt' table and similar cases in the glyph > > alternative finding algorithm. > > - Test everything more thoroughly. > > Sounds good, thanks. I also request you to produce a 'final' GSoC > tree with cleaned-up commit messages, as mentioned in other e-mails to > this list to other GSoC participants. > > > At this point, I know that the segment removal + vertical stretch is > > definitely the best approach, and the latest commit applies that to > > all the characters with tildes rather than a comparison of > > approaches. I previously thought that it caused some regressions, > > but I now know that the examples I had were just preexisting quirks > > in either the font or the autohinter. > > Excellent! I think it would also be beneficial if you could mention > your findings in either a git comment or in the code itself, together > with some real-world examples of such quirks (i.e., font name, font > version, glyph name, reason why it fails, etc., etc.) > > > Werner >
Re: Progress update on adjustment database
>> I simply noticed that it's possible for 2 characters to map to the >> same glyph, which means that a glyph would map to 2 characters. I >> don't have any examples in mind for when this would actually >> happen. I was planning on either ignoring the situation to let it >> be resolved arbitrarily, or removing both entries. > > The situation is resolved arbitrarily for now. OK. Minor nit: Please avoid overlong git commit messages (i.e., not longer than 78 characters). And there should be an empty line after the first line to help tools like `gitk` to properly display git commits. [Overlong lines should be avoided in the C code, too, both for comments and code.] > Also: what else needs to be done for the project to be complete and > ready to become a part of freetype? The remaining tasks I can think > of are: > > - Fill in, or find someone to fill in the rest of the adjustment > database. This is certainly helpful. However, it doesn't need to be complete right now, but it should cover a good share of languages that use the Latin script. BTW, please add a comment to the `adjustment_database` array, explaining the format. > - properly address the 'salt' table and similar cases in the glyph > alternative finding algorithm. > - Test everything more thoroughly. Sounds good, thanks. I also request you to produce a 'final' GSoC tree with cleaned-up commit messages, as mentioned in other e-mails to this list to other GSoC participants. > At this point, I know that the segment removal + vertical stretch is > definitely the best approach, and the latest commit applies that to > all the characters with tildes rather than a comparison of > approaches. I previously thought that it caused some regressions, > but I now know that the examples I had were just preexisting quirks > in either the font or the autohinter. Excellent! 
I think it would also be beneficial if you could mention your findings in either a git comment or in the code itself, together with some real-world examples of such quirks (i.e., font name, font version, glyph name, reason why it fails, etc., etc.) Werner
Re: Progress update on adjustment database
> Perhaps the following? > (1) If glyph A is in the 'cmap' table, and glyph B is not, prefer > glyph A. > (2) If one glyph needs X lookups and another glyph needs Y, and X < Y, > prefer glyph A. > I'm not sure whether (2) makes sense, though. > Can you give one or more examples for such cases? To repeat what I said in an email that I accidentally didn't send to the mailing list: > I simply noticed that it's possible for 2 characters to map to the same glyph, which means that a glyph would map to 2 characters. I don't have any examples in mind for when this would actually happen. I was planning on either ignoring the situation to let it be resolved arbitrarily, or removing both entries. The situation is resolved arbitrarily for now. I want to have a reason behind whatever rule I choose before committing to one for such cases. (1) generally seems to make sense because the cmap table is more "direct". Also: what else needs to be done for the project to be complete and ready to become a part of FreeType? The remaining tasks I can think of are:

- Fill in, or find someone to fill in, the rest of the adjustment database.
- Properly address the 'salt' table and similar cases in the glyph alternative finding algorithm.
- Test everything more thoroughly.

At this point, I know that the segment removal + vertical stretch is definitely the best approach, and the latest commit applies that to all the characters with tildes rather than a comparison of approaches. I previously thought that it caused some regressions, but I now know that the examples I had were just preexisting quirks in either the font or the autohinter. > To help people understand the non-trivial algorithm I > suggest that you add a big comment that shows it working step by step > for an example font, using a reduced set of features and glyphs. This comment has been added. On Tue, Oct 3, 2023 at 7:04 AM Werner LEMBERG wrote: > > > OK.
I think it is a bad side effect of the current auto-hinting > > algorithm that there are different approaches. > > > > I just want to clarify: you understood that the reason I used > > different approaches for each letter was to compare the approaches? > > My intent is to use one of those approaches as a universal algorithm > > for all characters with tildes. So every character would just have > > a boolean flag for whether to apply tilde hinting or not. > > Ah, I was confused, sorry – I thought that you get such varying > results for a single algorithm. Sometimes it happens (not taking your > current work into account) that blue zones affect the hinting of > accents in a bad way, and I thought these were such cases. > > > Also, did you see my question about a glyph mapping to multiple > > characters? > > I missed it, sorry again. You write: > > > It's possible that 2 characters in the adjustment database could map > > to the same glyph, which will create 2 entries in the reverse > > character map with the same glyph as a key. In this case, the > > character that glyph maps to is decided arbitrarily based on which > > one the binary search chooses and which order qsort puts them in. > > What should be done in these cases? > > Perhaps the following? > > (1) If glyph A is in the 'cmap' table, and glyph B is not, prefer > glyph A. > > (2) If one glyph needs X lookups and another glyph needs Y, and X < Y, > prefer glyph A. > > I'm not sure whether (2) makes sense, though. > > Can you give one or more examples for such cases? > > > Werner >
Re: Progress update on adjustment database
> > OK. I think it is a bad side effect of the current auto-hinting > > algorithm that there are different approaches. > > I just want to clarify: you understood that the reason I used > different approaches for each letter was to compare the approaches? > My intent is to use one of those approaches as a universal algorithm > for all characters with tildes. So every character would just have > a boolean flag for whether to apply tilde hinting or not. Ah, I was confused, sorry – I thought that you get such varying results for a single algorithm. Sometimes it happens (not taking your current work into account) that blue zones affect the hinting of accents in a bad way, and I thought these were such cases. > Also, did you see my question about a glyph mapping to multiple > characters? I missed it, sorry again. You write: > It's possible that 2 characters in the adjustment database could map > to the same glyph, which will create 2 entries in the reverse > character map with the same glyph as a key. In this case, the > character that glyph maps to is decided arbitrarily based on which > one the binary search chooses and which order qsort puts them in. > What should be done in these cases? Perhaps the following? (1) If glyph A is in the 'cmap' table, and glyph B is not, prefer glyph A. (2) If one glyph needs X lookups and another glyph needs Y, and X < Y, prefer glyph A. I'm not sure whether (2) makes sense, though. Can you give one or more examples for such cases? Werner
Re: Progress update on adjustment database
> OK. I think it is a bad side effect of the current auto-hinting algorithm that there are different approaches. I just want to clarify: you understood that the reason I used different approaches for each letter was to compare the approaches? My intent is to use one of those approaches as a universal algorithm for all characters with tildes. So every character would just have a boolean flag for whether to apply tilde hinting or not. > Looks good. To help people understand the non-trivial algorithm I suggest that you add a big comment that shows it working step by step for an example font, using a reduced set of features and glyphs. Will do! Also, did you see my question about a glyph mapping to multiple characters? On Sat, Sep 30, 2023 at 2:40 AM Werner LEMBERG wrote: > > > > Thanks. Have you meanwhile found an explanation why o-tilde > > > looks so bad for Times New Roman at 16ppem? > > > > All 4 letters in each row have a different approach: > > > > õ: vertical stretch, no segment removal > > ñ: no vertical stretch, segment removal > > ã: vertical stretch and segment removal > > all other tildes: no changes applied > > OK. I think it is a bad side effect of the current auto-hinting > algorithm that there are different approaches. However, using the > adjustment database I wonder whether the knowledge of the character > topology can help improve the situation. In other words, do you see a > possibility to 'decouple' the (vertical) hinting of the tilde from the > base glyph hinting by checking a flag in the database? For this > purpose, a 'tilde' could be defined as the contour that lies higher > than the ascender of small letters – this implies that you need > another flag or enumeration to refer to small letters, uppercase > letters, etc.
> > As an example, the database information for glyph 'o tilde' could be > > * lowercase character > * hint contour(s) higher than the lowercase ascender height > separately > * stretch tilde vertically > > > I implemented the algorithm for all glyph variants! The version I > > used is different from what I wrote originally to fix some errors. > > Looks good. To help people understand the non-trivial algorithm I > suggest that you add a big comment that shows it working step by step > for an example font, using a reduced set of features and glyphs. > > > I've only tried it on a pretty simple case so far, so I'll need to > > assemble a more complex test font or two. > > A feature-rich (and freely available) font family is 'Libertinus', for > example. > > > Werner >
Re: Progress update on adjustment database
> > Thanks. Have you meanwhile found an explanation why o-tilde > > looks so bad for Times New Roman at 16ppem? > > All 4 letters in each row have a different approach: > > õ: vertical stretch, no segment removal > ñ: no vertical stretch, segment removal > ã: vertical stretch and segment removal > all other tildes: no changes applied OK. I think it is a bad side effect of the current auto-hinting algorithm that there are different approaches. However, using the adjustment database I wonder whether the knowledge of the character topology can help improve the situation. In other words, do you see a possibility to 'decouple' the (vertical) hinting of the tilde from the base glyph hinting by checking a flag in the database? For this purpose, a 'tilde' could be defined as the contour that lies higher than the ascender of small letters – this implies that you need another flag or enumeration to refer to small letters, uppercase letters, etc. As an example, the database information for glyph 'o tilde' could be

* lowercase character
* hint contour(s) higher than the lowercase ascender height separately
* stretch tilde vertically

> I implemented the algorithm for all glyph variants! The version I > used is different from what I wrote originally to fix some errors. Looks good. To help people understand the non-trivial algorithm I suggest that you add a big comment that shows it working step by step for an example font, using a reduced set of features and glyphs. > I've only tried it on a pretty simple case so far, so I'll need to > assemble a more complex test font or two. A feature-rich (and freely available) font family is 'Libertinus', for example. Werner
Re: Progress update on adjustment database
I have a question: It's possible that 2 characters in the adjustment database could map to the same glyph, which will create 2 entries in the reverse character map with the same glyph as a key. In this case, the character that glyph maps to is decided arbitrarily based on which one the binary search chooses and which order qsort puts them in. What should be done in these cases? On Tue, Sep 26, 2023 at 11:17 PM Craig White wrote: > > Thanks. Have you meanwhile found an explanation why o-tilde looks > so bad for Times New Roman at 16ppem? > > All 4 letters in each row have a different approach: > > õ: vertical stretch, no segment removal > ñ: no vertical stretch, segment removal > ã: vertical stretch and segment removal > all other tildes: no changes applied > > Actually, I tried the o tilde character again with no adjustments and it > looked the same. In this case, the vertical stretch wasn't enough to fix > the issue. > > > Sounds good. Unfortunately, I'm a bit short of time right now; I'll > think about your algorithm within the next few days. However, please > proceed anyway! > I implemented the algorithm for all glyph variants! The version I used is > different from what I wrote originally to fix some errors. Here's the > current version: > results is now a global set of glyphs instead of an argument to the > function. It initially starts empty. > fs is a global set of features, also initially empty. > All other definitions are the same.
>
> func all_glyphs(codepoint c)
> {
>     result = result ∪ lookup(c, fs)
>     foreach (feature f ∈ (features - fs))  // for all features not already in fs
>     {
>         new_glyphs = lookup(c, fs ∪ f) - result
>         if (new_glyphs != ∅)
>         {
>             result = result ∪ new_glyphs
>             fs = fs ∪ f
>             all_glyphs(c)
>             fs = fs - f
>         }
>     }
> }
>
> I've only tried it on a pretty simple case so far, so I'll need to > assemble a more complex test font or two.
> > On Mon, Sep 18, 2023, 4:03 AM Werner LEMBERG wrote: > >> >> > Is testing all these combinations really necessary? >> >> I don't know :-) I just wanted to point out that feature combinations >> have to be considered. >> >> > [...] My intuition says very few of these combinations actually >> > matter. >> >> Yes, I agree. >> >> > I wrote some pseudocode for a different approach that I believe >> > accomplishes the same thing, while being more efficient and hopefully >> > removing the need to constrain the set of features considered: [...] >> >> Sounds good. Unfortunately, I'm a bit short of time right now; I'll >> think about your algorithm within the next few days. However, please >> proceed anyway! >> >> > I attached some pictures of the tilde unflattening approaches. >> >> Thanks. Have you meanwhile found an explanation why o-tilde looks >> so bad for Times New Roman at 16ppem? >> >> > I chose sizes that showcase the differences between the approaches, >> > and also committed my current code if you would like to try it >> > yourself. >> >> Will try if I find some time. >> >> >> Werner >> >
Re: Progress update on adjustment database
> Thanks. Have you meanwhile found an explanation why o-tilde looks so bad for Times New Roman at 16ppem? All 4 letters in each row have a different approach:

õ: vertical stretch, no segment removal
ñ: no vertical stretch, segment removal
ã: vertical stretch and segment removal
all other tildes: no changes applied

Actually, I tried the o tilde character again with no adjustments and it looked the same. In this case, the vertical stretch wasn't enough to fix the issue. > Sounds good. Unfortunately, I'm a bit short of time right now; I'll think about your algorithm within the next few days. However, please proceed anyway! I implemented the algorithm for all glyph variants! The version I used is different from what I wrote originally to fix some errors. Here's the current version: results is now a global set of glyphs instead of an argument to the function. It initially starts empty. fs is a global set of features, also initially empty. All other definitions are the same.

func all_glyphs(codepoint c)
{
    result = result ∪ lookup(c, fs)
    foreach (feature f ∈ (features - fs))  // for all features not already in fs
    {
        new_glyphs = lookup(c, fs ∪ f) - result
        if (new_glyphs != ∅)
        {
            result = result ∪ new_glyphs
            fs = fs ∪ f
            all_glyphs(c)
            fs = fs - f
        }
    }
}

I've only tried it on a pretty simple case so far, so I'll need to assemble a more complex test font or two. On Mon, Sep 18, 2023, 4:03 AM Werner LEMBERG wrote: > > > Is testing all these combinations really necessary? > > I don't know :-) I just wanted to point out that feature combinations > have to be considered. > > > [...] My intuition says very few of these combinations actually > > matter. > > Yes, I agree. > > > I wrote some pseudocode for a different approach that I believe > > accomplishes the same thing, while being more efficient and hopefully > > removing the need to constrain the set of features considered: [...] > > Sounds good.
Unfortunately, I'm a bit short of time right now; I'll > think about your algorithm within the next few days. However, please > proceed anyway! > > > I attached some pictures of the tilde unflattening approaches. > > Thanks. Have you meanwhile found an explanation why o-tilde looks > so bad for Times New Roman at 16ppem? > > > I chose sizes that showcase the differences between the approaches, > > and also committed my current code if you would like to try it > > yourself. > > Will try if I find some time. > > > Werner >
Re: Progress update on adjustment database
> Is testing all these combinations really necessary? I don't know :-) I just wanted to point out that feature combinations have to be considered. > [...] My intuition says very few of these combinations actually > matter. Yes, I agree. > I wrote some pseudocode for a different approach that I believe > accomplishes the same thing, while being more efficient and hopefully > removing the need to constrain the set of features considered: [...] Sounds good. Unfortunately, I'm a bit short of time right now; I'll think about your algorithm within the next few days. However, please proceed anyway! > I attached some pictures of the tilde unflattening approaches. Thanks. Have you meanwhile found an explanation why o-tilde looks so bad for Times New Roman at 16ppem? > I chose sizes that showcase the differences between the approaches, > and also committed my current code if you would like to try it > yourself. Will try if I find some time. Werner
Re: Progress update on adjustment database
> So, if my understanding is correct, hb_ot_shape_glyphs_closure will > take an input character or characters and tell me all the glyphs > that it gets transformed into, as well as the final form. Yes. > I'm not sure about this interpretation, because the documentation > uses the term "Transitive closure", which I'm not familiar with. Indeed, it's a bit unfortunate that the documentation is not more verbose. > As for iterating through auto-hinter styles, do you mean that I > should get a list of features and try each one for the 'features' > parameter? Yes, you should try each one, and all combinations of them. However, the number of features that are of interest (at least for Latin scripts) is small, which means that the number of iterations doesn't become very large; see macro `META_STYLE_LATIN` in file `afstyles.h` for a list. > Also, I wanted to share my progress in the tilde unflattening. > [...] This sounds very promising, thanks! > The segment removal should be part of the solution, but the question > is to what extent the vertical stretch should be part of the > solution. My gut feeling says that both are needed. I hope that you find constraints that work reliably for a large bunch of (common) fonts. > [...] Please also post some images. Werner
Re: Progress update on adjustment database
So, if my understanding is correct, hb_ot_shape_glyphs_closure will take an input character or characters and tell me all the glyphs that it gets transformed into, as well as the final form. I'm not sure about this interpretation, because the documentation uses the term "Transitive closure", which I'm not familiar with. As for iterating through auto-hinter styles, do you mean that I should get a list of features and try each one for the 'features' parameter? Also, I wanted to share my progress on the tilde unflattening. You suggested earlier that I fix the algorithm by removing the segments containing points of the tilde from their edges. This worked remarkably well. In fact, I found that I often got better results with only this change. The problem with the vertical stretch was that while it was able to fix the tilde in many cases, it caused the size to be inconsistent. The tilde would be 2 pixels wide at one ppem, then increase to 3 when downsizing, when this issue wasn't present before (this happens both with and without the segment removal). The segment removal is simpler and should be preferred if it solves the problem adequately. The drawback of segment removal is that it causes the tildes to have blurrier outlines, and it doesn't fix some of the cases that it would if it were combined with the vertical stretch. The segment removal should be part of the solution, but the question is to what extent the vertical stretch should be part of the solution. To try to answer this, I tested on a bunch of fonts. To aid testing, different characters use different versions of the algorithm:

õ: vertical stretch, no segment removal
ñ: no vertical stretch, segment removal
ã: vertical stretch and segment removal
all other tildes: no changes applied

Results:

Liberation Sans Regular: Segment removal delays flattening, but adding vertical stretch delays it even further.

Times New Roman: All approaches had weird behavior at exactly 16 ppem. With only the stretch, the tilde flattened at this size. With both approaches with segment removal, the tilde stretched to 3 pixels tall and looked very out of place. This wasn't an issue with any non-adjusted letters with tildes (indicating it didn't originate from the font). In all cases, the tilde was not flat but very blurry at all sizes where the glyph was legible.

Noto Sans Regular, suggested by Brad: Both approaches with segment removal fix the tilde until around 13 ppem. This font has flat tildes at sizes as high as 26 ppem, which occurred with the vertical-stretch-only mode.

Calibri Regular: Didn't have a flat tilde issue to begin with. Both approaches using segment removal had no size anomalies.

On Sat, Sep 2, 2023 at 1:59 AM Werner LEMBERG wrote: > > > I discovered there was an issue when I tried using a test font I created > > with the following as one of its lookups (in ttx format): > > > > [ttx snippet not preserved in the archive] > > > > There is one lookup here, but 2 substitutions. My program needs to > > iterate through each substitution individually, but the function > > hb_ot_layout_lookup_collect_glyphs only returns 2 unordered sets > > representing all the input and output characters for this lookup. > > How can I get one substitution at a time? > > Indeed, you can't use `hb_ot_layout_lookup_collect_glyphs` for > that. However, given that you actually want to map input character > codes to output glyph indices, what about using > `hb_ot_shape_glyphs_closure` on single input characters, iterating > over the auto-hinter 'styles'? If you get a single output glyph, > everything's fine. If you get more than a single output glyph this > essentially means that two or more lookups have been applied in > succession, but you only have to take care of the glyph that is part > of the 'style'. > > Note that this is untested on my side – I was just searching in the > HarfBuzz API. > > Behdad, if you have a better idea, please chime in :-) > > > Werner >
Re: Progress update on adjustment database
> I discovered there was an issue when I tried using a test font I created > with the following as one of its lookups (in ttx format): > > > > > > > > > > > There is one lookup here, but 2 substitutions. My program needs to > iterate through each substitution individually, but the function > hb_ot_layout_lookup_collect_glyphs only returns 2 unordered sets > representing all the input and output characters for this lookup. > How can I get one substitution at a time? Indeed, you can't use `hb_ot_layout_lookup_collect_glyphs` for that. However, given that you actually want to map input character codes to output glyph indices, what about using `hb_ot_shape_glyphs_closure` on single input characters, iterating over the auto-hinter 'styles'? If you get a single output glyph, everything's fine. If you get more than a single output glyph this essentially means that two or more lookups have been applied in succession, but you only have to take care of the glyph that is part of the 'style'. Note that this is untested on my side – I was just searching in the HarfBuzz API. Behdad, if you have a better idea, please chime in :-) Werner
Re: Progress update on adjustment database
I discovered there was an issue when I tried using a test font I created with the following as one of its lookups (in ttx format): There is one lookup here, but 2 substitutions. My program needs to iterate through each substitution individually, but the function hb_ot_layout_lookup_collect_glyphs only returns 2 unordered sets representing all the input and output characters for this lookup. How can I get one substitution at a time? On Tue, Aug 29, 2023 at 3:13 PM Werner LEMBERG wrote: > > > As I was testing my attempt at supporting GSUB lookups, I found that > > hb_ot_layout_lookup_collect_glyphs is actually not what I need, > > because I assumed that each lookup contains exactly one > > substitution, when it actually may contain multiple. What I really > > need is a way to get each individual substitution. How do I do > > this? > > It's not completely clear to me what you exactly need, please give an > example. > > Behdad, any idea whether HarfBuzz can help here? Otherwise it is > probably necessary to parse the GSUB table by ourselves, which is > something I would like to avoid... > > > Werner >
Re: Progress update on adjustment database
> As I was testing my attempt at supporting GSUB lookups, I found that > hb_ot_layout_lookup_collect_glyphs is actually not what I need, > because I assumed that each lookup contains exactly one > substitution, when it actually may contain multiple. What I really > need is a way to get each individual substitution. How do I do > this? It's not completely clear to me what you exactly need, please give an example. Behdad, any idea whether HarfBuzz can help here? Otherwise it is probably necessary to parse the GSUB table by ourselves, which is something I would like to avoid... Werner
Re: Progress update on adjustment database
As I was testing my attempt at supporting GSUB lookups, I found that hb_ot_layout_lookup_collect_glyphs is actually not what I need, because I assumed that each lookup contains exactly one substitution, when it actually may contain multiple. What I really need is a way to get each individual substitution. How do I do this?
Re: Progress update on adjustment database
On Saturday, 12 August 2023 at 17:17:47 BST, Werner LEMBERG wrote: > > It is one of those things I never remember - winding rule of > > truetype and postscript are different, one is even-odd, the other is > > non-zero. [...] > You are on the completely wrong track here. What I'm talking about is > strictly local and not (directly) related to having two different > contours. For example, to recognize stems and serifs, the contour > must be the same. In case you are not aware of how the auto-hinter > collects 'segments' and 'edges', please read > https://www.tug.org/TUGboat/tb24-3/lemberg.pdf > which still gives a good overview (in spite of its age and some changed > details here and there). Argh, sorry :-).
Re: Progress update on adjustment database
> It is one of those things I never remember - winding rule of > truetype and postscript are different, one is even-odd, the other is > non-zero. [...] You are on the completely wrong track here. What I'm talking about is strictly local and not (directly) related to having two different contours. For example, to recognize stems and serifs, the contour must be the same. In case you are not aware of how the auto-hinter collects 'segments' and 'edges', please read https://www.tug.org/TUGboat/tb24-3/lemberg.pdf which still gives a good overview (in spite of its age and some changed details here and there). Werner
Re: Progress update on adjustment database
It is one of those things I never remember - the winding rules of TrueType and PostScript are different: one is even-odd, the other is non-zero. Both concern the number of times a ray from an "interior" (inked) point to infinity crosses the glyph contour. Anyway, viewed from a small gap between two "inked" portions, the two contours on either side must run in opposite directions. There is a simple case where two contours run in the same direction: think of a single connected contour that is like two circles except that it crosses itself at one point (say, the top - i.e., you draw anti-clockwise from 12, when you get to 1 you move inward to draw another circle, and you move outward to rejoin the original when you get to 1 a second time). I think under either winding rule the region between the inner and outer parts is inked (viewed from that area locally, you see contours running in parallel), because the contour goes around you "once". Whether the inner circle is inked depends on whether you are talking about PostScript/CFF or TrueType: under one rule it is inked, under the other it is not. (That region is enclosed by the contour globally "twice" - hence the difference between even-odd and non-zero.) So this self-intersecting contour should be drawn either as mostly an 'o' shape or as a solidly inked circle. Granted, self-intersecting contours are rare, but they are legal. Anyway, in a small gap of the kind we have been discussing, which we want to preserve during hinting, if you are sitting at that spot you see contours running in opposite directions around you. If you see contours running in the same direction, you are possibly in an inked part instead, like between the two loops of the self-intersecting bi-circle I just described. On Saturday, 12 August 2023 at 06:43:14 BST, Craig White wrote: I'm still missing something. Why would the direction of the contour matter if, in either case, it's the same set of points?
On Fri, Aug 11, 2023 at 6:52 AM Werner LEMBERG wrote: > You said that for an i - like shape: >> Both contours have the same direction. > > What kind of problems does this rule protect against? Sorry, this was sloppily formulated. It's about the *local* direction of contours, that is, whether a horizontal contour segment goes from left to right or from right to left. For the 'i' stem and the 'i' dot, both contours must have the same direction globally, but locally, at the dividing space, the corresponding lower and upper segments must have the opposite directions. Werner
Re: Progress update on adjustment database
> I'm still missing something. Why would the direction of the contour > matter if, in either case, it's the same set of points? It doesn't directly matter for your code. However, FreeType's auto-hinter handles these cases differently (namely, to detect stems), which might in turn influence the adjustments you have to manage. It's just a heads-up. Werner
Re: Progress update on adjustment database
I'm still missing something. Why would the direction of the contour matter if, in either case, it's the same set of points? On Fri, Aug 11, 2023 at 6:52 AM Werner LEMBERG wrote: > > > You said that for an i - like shape: > >> Both contours have the same direction. > > > > What kind of problems does this rule protect against? > > Sorry, this was sloppily formulated. It's about the *local* direction > of contours, that is, whether a horizontal contour segment goes from > left to right or from right to left. For the 'i' stem and the 'i' > dot, both contours must have the same direction globally, but locally, > at the dividing space, the corresponding lower and upper segments must > have the opposite directions. > > > Werner >
Re: Progress update on adjustment database
> You said that for an i - like shape: >> Both contours have the same direction. > > What kind of problems does this rule protect against? Sorry, this was sloppily formulated. It's about the *local* direction of contours, that is, whether a horizontal contour segment goes from left to right or from right to left. For the 'i' stem and the 'i' dot, both contours must have the same direction globally, but locally, at the dividing space, the corresponding lower and upper segments must have the opposite directions. Werner
Re: Progress update on adjustment database
I have a question about your suggestion earlier for a validation rule for an 'i'-like shape. You said that for an 'i'-like shape: > Both contours have the same direction. What kind of problems does this rule protect against? On Tue, Aug 8, 2023 at 1:45 PM Craig White wrote: > > Right now I'm abroad (actually, in New York :-) – it will take some > > time until I can do such tests, sorry. > > That's ok. I didn't mean to imply that I was expecting you to do the > tests. I just said that in case you tried the code anyway. > > On Mon, Aug 7, 2023 at 8:59 AM Werner LEMBERG wrote: > >> >> > If you're testing this yourself, keep in mind that the adjustment >> > will only be applied to n with tilde. Any other letter with a tilde >> > can be used as a control group. >> >> Right now I'm abroad (actually, in New York :-) – it will take some >> time until I can do such tests, sorry. >> >> >> Werner >> >
Re: Progress update on adjustment database
> If you're testing this yourself, keep in mind that the adjustment > will only be applied to n with tilde. Any other letter with a tilde > can be used as a control group. Right now I'm abroad (actually, in New York :-) – it will take some time until I can do such tests, sorry. Werner
Re: Progress update on adjustment database
> [...] I have confirmed by commenting/uncommenting steps of the > hinting process that af_glyph_hints_align_edge_points is the > function that snaps the tilde back to flat, so if my understanding > is correct, the points at the top and bottom of the tilde are > forming edges and are being rounded towards each other, causing the > tilde to remain flat. Yes, it seems so. In case you haven't read it already, this article gives a nice overview, but please be aware that it is 20(!) years old and thus partially outdated with respect to some details: https://www.tug.org/TUGboat/tb24-3/lemberg.pdf > How should I proceed? Perhaps you can try to remove the affected segments from the corresponding edges. If that fails there is still the possibility to apply the vertical distortion afterwards... Werner
Re: Progress update on adjustment database
It's pushed. If you're testing this yourself, keep in mind that the adjustment will only be applied to n with tilde. Any other letter with a tilde can be used as a control group. The commit also has the vertical adjustment mode that pushes the bottom contour down, which I haven't found characters to test on yet. On Wed, Aug 2, 2023 at 6:29 PM Craig White wrote: > Thanks for your help. I fixed the issue by marking all on-curve points > that I moved as touched (and letting IUP do the rest), setting oy = y, and > applying the stretch to fy as well. I also had some calculation errors in > the height measurement the debug prints used and the algorithm itself. > > The next problem is that I relied on an assumption that if the original > height of the tilde was about 2 pixels tall, the grid-fitted position would > probably be 2 pixels. My method estimates the height of the tilde by > looking for the points at the tips of the tilde curves and measuring their > distance to the bounding box (see the attached image for a visualization), > then adding 1 pixel and scaling the contour vertically, anchored at its > lowest point, until it is at least that tall. > > Testing with Liberation Sans, the sizes that had flat tildes before still > have flat tildes. I can add more than 1 pixel to the measurement to > unflatten them, but that causes unnecessarily tall tildes at other sizes. > I have confirmed by commenting/uncommenting steps of the hinting process > that af_glyph_hints_align_edge_points is the function that snaps the tilde > back to flat, so if my understanding is correct, the points at the top and > bottom of the tilde are forming edges and are being rounded towards each > other, causing the tilde to remain flat. How should I proceed? > > I'll also push the code I have after removing the dead code and debug > prints that have built up. > > On Mon, Jul 31, 2023 at 6:23 PM Werner LEMBERG wrote: > >> >> > I have an algorithm I'm testing for the tilde unflattening. 
I went >> > with doing it before all steps, because it worked better with my >> > idea, but the function af_glyph_hints_align_weak_points is undoing >> > my changes. >> >> Hmm. Have you checked the OpenType specification how the IUP >> instruction works? >> >> >> https://learn.microsoft.com/en-us/typography/opentype/spec/tt_instructions#interpolate-untouched-points-through-the-outline >> >> Have you 'touched' the points you move (i.e., setting >> `AF_FLAG_TOUCH_Y`) so that `af_glyph_hints_align_weak_points` doesn't >> move them again? >> >> >> Werner >> >
Re: Progress update on adjustment database
Thanks for your help. I fixed the issue by marking all on-curve points that I moved as touched (and letting IUP do the rest), setting oy = y, and applying the stretch to fy as well. I also had some calculation errors in the height measurement the debug prints used and the algorithm itself. The next problem is that I relied on an assumption that if the original height of the tilde was about 2 pixels tall, the grid-fitted position would probably be 2 pixels. My method estimates the height of the tilde by looking for the points at the tips of the tilde curves and measuring their distance to the bounding box (see the attached image for a visualization), then adding 1 pixel and scaling the contour vertically, anchored at its lowest point, until it is at least that tall. Testing with Liberation Sans, the sizes that had flat tildes before still have flat tildes. I can add more than 1 pixel to the measurement to unflatten them, but that causes unnecessarily tall tildes at other sizes. I have confirmed by commenting/uncommenting steps of the hinting process that af_glyph_hints_align_edge_points is the function that snaps the tilde back to flat, so if my understanding is correct, the points at the top and bottom of the tilde are forming edges and are being rounded towards each other, causing the tilde to remain flat. How should I proceed? I'll also push the code I have after removing the dead code and debug prints that have built up. On Mon, Jul 31, 2023 at 6:23 PM Werner LEMBERG wrote: > > > I have an algorithm I'm testing for the tilde unflattening. I went > > with doing it before all steps, because it worked better with my > > idea, but the function af_glyph_hints_align_weak_points is undoing > > my changes. > > Hmm. Have you checked the OpenType specification how the IUP > instruction works? 
> > > https://learn.microsoft.com/en-us/typography/opentype/spec/tt_instructions#interpolate-untouched-points-through-the-outline > > Have you 'touched' the points you move (i.e., setting > `AF_FLAG_TOUCH_Y`) so that `af_glyph_hints_align_weak_points` doesn't > move them again? > > > Werner >
Re: Progress update on adjustment database
> I have an algorithm I'm testing for the tilde unflattening. I went > with doing it before all steps, because it worked better with my > idea, but the function af_glyph_hints_align_weak_points is undoing > my changes. Hmm. Have you checked the OpenType specification how the IUP instruction works? https://learn.microsoft.com/en-us/typography/opentype/spec/tt_instructions#interpolate-untouched-points-through-the-outline Have you 'touched' the points you move (i.e., setting `AF_FLAG_TOUCH_Y`) so that `af_glyph_hints_align_weak_points` doesn't move them again? Werner
Re: Progress update on adjustment database
I have an algorithm I'm testing for the tilde unflattening. I went with doing it before all steps, because it worked better with my idea, but the function af_glyph_hints_align_weak_points is undoing my changes. I can tell by measuring the height of the tilde after each step of the latin hinting process. Here's an example from one test:

before tilde unflattening: 109 units
after: 124 units
after af_latin_hint_edges: 124 units
after af_glyph_hints_align_edge_points: 146 units
after af_glyph_hints_align_strong_points: 146 units
after af_glyph_hints_align_weak_points: 59 units

Neither setting oy = y for all the points nor making extreme changes like raising all points by 200 stops it. How do I fix this? On Mon, Jul 31, 2023 at 12:51 AM Werner LEMBERG wrote: > > > Just take the extrema of *all* points – for fonts this should be > > > good enough, because it is standard to have points at the curve > > > extrema, thus making the bounding box of the curve identical to > > > the bounding box of the points (both on and off points). > > > > Ok, thanks. This is exactly what I needed. I was already trying > > this, but I thought it was wrong because of off points. > > For your information, you might also look at FreeType's function > `FT_Glyph_Get_CBox`, which handles the generic case. > > > What I mean is: my code allows the adjustment for i to be applied to > > glyphs with more than 2 contours to also adjust accented o and a, so > > the rules you suggested would reject around half of the characters > > that are currently being adjusted this way. I think your rules can > > still be enforced by treating the contour to be moved as "A" and all > > other contours collectively as "B". > > Yeah, I was especially referring to glyph 'i', since here the problem > is most visible. > > >> [...] 
I ask you to have this in mind to find a solution that can be > >> easily extended to cover this situation, too (for example, by using > >> an extendable structure instead of a plain variable). > > > > In this case, do you mean that instead of making a codepoint a key > > for the database directly, I should wrap it in a struct so that > > other kind of keys can be added? > > Something like this, yes. Just try to be generic, and think of > possible extensions. > > >Werner >
Re: Progress update on adjustment database
> > Just take the extrema of *all* points – for fonts this should be > > good enough, because it is standard to have points at the curve > > extrema, thus making the bounding box of the curve identical to > > the bounding box of the points (both on and off points). > > Ok, thanks. This is exactly what I needed. I was already trying > this, but I thought it was wrong because of off points. For your information, you might also look at FreeType's function `FT_Glyph_Get_CBox`, which handles the generic case. > What I mean is: my code allows the adjustment for i to be applied to > glyphs with more than 2 contours to also adjust accented o and a, so > the rules you suggested would reject around half of the characters > that are currently being adjusted this way. I think your rules can > still be enforced by treating the contour to be moved as "A" and all > other contours collectively as "B". Yeah, I was especially referring to glyph 'i', since here the problem is most visible. >> [...] I ask you to have this in mind to find a solution that can be >> easily extended to cover this situation, too (for example, by using >> an extendable structure instead of a plain variable). > > In this case, do you mean that instead of making a codepoint a key > for the database directly, I should wrap it in a struct so that > other kind of keys can be added? Something like this, yes. Just try to be generic, and think of possible extensions. Werner
Re: Progress update on adjustment database
> What exactly do you mean with 'find'? The algorithm? > Just take the extrema of *all* points – for fonts this should be > good enough, because it is standard to have points at the curve > extrema, thus making the bounding box of the curve identical to the > bounding box of the points (both on and off points). Ok, thanks. This is exactly what I needed. I was already trying this, but I thought it was wrong because of off points. > What exact rule are you referring to? My rule #3 > doesn't seem to fit what you are talking about... Sorry for being unclear. What I mean is: my code allows the adjustment for i to be applied to glyphs with more than 2 contours to also adjust accented o and a, so the rules you suggested would reject around half of the characters that are currently being adjusted this way. I think your rules can still be enforced by treating the contour to be moved as "A" and all other contours collectively as "B". > No, I'm not, but I ask you to have this in mind to find a solution > that can be easily extended to cover this situation, too (for example, > by using an extendable structure instead of a plain variable). In this case, do you mean that instead of making a codepoint a key for the database directly, I should wrap it in a struct so that other kinds of keys can be added? That sounds like a good solution. On Sun, Jul 30, 2023 at 4:14 AM Werner LEMBERG wrote: > > Hello Craig, > > > again sorry for the late reply. > > > During this time, I realized that knowing the bounding box of the > > tilde contour would help a lot. In fact, the logic for the other > > vertical separation adjustments assumes that it can get the bounding > > box by taking the minimum/maximum coordinates of all the points, but > > this doesn't work because of off-points, which I didn't consider at > > the time. How do you find this bounding box? > > What exactly do you mean with 'find'? The algorithm? 
Just take the > extrema of *all* points – for fonts this should be good enough, > because it is standard to have points at the curve extrema, thus > making the bounding box of the curve identical to the bounding box of > the points (both on and off points). > > > I should note that, in your example, check #3 is too restrictive. > > The logic allows for the bottom shape that needs to be separated to > > be made up of any number of contours, which allows it to work for > > characters with more complex shapes. > > What exact rule are you referring to? My rule #3 was > > (3) All points of A are lower than all points of B (or vice versa). > > which doesn't seem to fit what you are talking about... > > > I want to clarify: are you adding glyph names in the database as a > > requirement for the project? > > No, I'm not, but I ask you to have this in mind to find a solution > that can be easily extended to cover this situation, too (for example, > by using an extendable structure instead of a plain variable). > > > Werner >
Re: Progress update on adjustment database
Hello Craig, again sorry for the late reply. > During this time, I realized that knowing the bounding box of the > tilde contour would help a lot. In fact, the logic for the other > vertical separation adjustments assumes that it can get the bounding > box by taking the minimum/maximum coordinates of all the points, but > this doesn't work because of off-points, which I didn't consider at > the time. How do you find this bounding box? What exactly do you mean with 'find'? The algorithm? Just take the extrema of *all* points – for fonts this should be good enough, because it is standard to have points at the curve extrema, thus making the bounding box of the curve identical to the bounding box of the points (both on and off points). > I should note that, in your example, check #3 is too restrictive. > The logic allows for the bottom shape that needs to be separated to > be made up of any number of contours, which allows it to work for > characters with more complex shapes. What exact rule are you referring to? My rule #3 was (3) All points of A are lower than all points of B (or vice versa). which doesn't seem to fit what you are talking about... > I want to clarify: are you adding glyph names in the database as a > requirement for the project? No, I'm not, but I ask you to have this in mind to find a solution that can be easily extended to cover this situation, too (for example, by using an extendable structure instead of a plain variable). Werner
Re: Progress update on adjustment database
Hey, I've been working on the tilde correction over the last few days. During this time, I realized that knowing the bounding box of the tilde contour would help a lot. In fact, the logic for the other vertical separation adjustments assumes that it can get the bounding box by taking the minimum/maximum coordinates of all the points, but this doesn't work because of off-points, which I didn't consider at the time. How do you find this bounding box? Also, thanks for clarifying what your intended solution is for the GSUB handling. I should note that, in your example, check #3 is too restrictive. The logic allows for the bottom shape that needs to be separated to be made up of any number of contours, which allows it to work for characters with more complex shapes. I want to clarify: are you adding glyph names in the database as a requirement for the project? On Fri, Jul 21, 2023 at 4:52 AM Werner LEMBERG wrote: > > >> Many-to-one: accent marks, e.g. umlauts > >> One-glyph-to-many-unicode-characters: ligatures, e.g. "ff", "fi". > > > > I don’t see how these two cases are different. An accented glyph like > > ⟨ɑ̃⟩ is made up of two Unicode characters, ɑ+◌̃ (U+0251 U+0303); > > similarly a ligated glyph like ⟨fi⟩ is made up of the two Unicode > > characters f+i (U+0066 U+0069). > > The first case is usually handled in the GPOS table; the auto-hinter > can only ignore this because the rendering of the components doesn't > happen together but in succession. See my other mail that gives both > a many-to-one and a one-to many example that might be part of GSUB. > > > Werner >
Re: Progress update on adjustment database
>> Many-to-one: accent marks, e.g. umlauts >> One-glyph-to-many-unicode-characters: ligatures, e.g. "ff", "fi". > > I don’t see how these two cases are different. An accented glyph like > ⟨ɑ̃⟩ is made up of two Unicode characters, ɑ+◌̃ (U+0251 U+0303); > similarly a ligated glyph like ⟨fi⟩ is made up of the two Unicode > characters f+i (U+0066 U+0069). The first case is usually handled in the GPOS table; the auto-hinter can only ignore this because the rendering of the components doesn't happen together but in succession. See my other mail that gives both a many-to-one and a one-to many example that might be part of GSUB. Werner
Re: Progress update on adjustment database
Many-to-one: accent marks, e.g. umlauts One-glyph-to-many-unicode-characters: ligatures, e.g. "ff", "fi". I don’t see how these two cases are different. An accented glyph like ⟨ɑ̃⟩ is made up of two Unicode characters, ɑ+◌̃ (U+0251 U+0303); similarly a ligated glyph like ⟨fi⟩ is made up of the two Unicode characters f+i (U+0066 U+0069). Regards, Brad
Re: Progress update on adjustment database
>> A mapping from input Unicode characters to glyph indices based on >> the GSUB + cmap tables and not on cmap alone. > > Right now, the only cases where the GSUB table is helpful that I am > aware of, for the purposes of this project, are for handling glyph > alternates and combining characters. Those would be one-to-one > mappings and many-to-one mappings, respectively. Yes. > Would this general solution involve other kinds of GSUB mappings? Yes, but it won't affect your database format, which always contains single Unicode input characters (or glyph names, see below). > If so, it opens up edge cases such as: if a glyph on the "many" side > of a many-to-one mapping needs a vertical separation adjustment, > does the resulting glyph need it too? This could be answered > quickly by looking at the specific characters involved, but how > would I answer this question in general? > > Even sticking to just many-to-one and one-to-one mappings, the > adjustment database must make assumptions specific to the characters > it's dealing with. Yes, I think some kind of topological tests are needed. For example, the following assumptions should be checked for an 'i'-like shape:

(1) There are two contours A and B.
(2) Both contours have the same direction.
(3) All points of A are lower than all points of B (or vice versa).
(4) There is some horizontal overlap between A and B.

> In the case of combining characters, a separate database table is > required because the existing table is a list of unicode characters > and the actions that should be applied to them, while a glyph > resulting from a combining character might not be a unicode > character. I don't think you need another database. Let's discuss an 'fi' ligature example.

* Analyzing the GSUB table shows that it is the result of two input characters, 'f' and 'i'.
* 'f' doesn't occur in your database and can thus be ignored if it is about to be queried. 
* 'i' appears in your database, so check whether glyph 'fi' satisfies the constraints for an 'i'-like shape (as described above): No, it doesn't, since either it is a single contour, failing condition (1), or, if there are two contours, they fail condition (3).

It would be beneficial if your database could also accept glyph names in addition to Unicode input character codes – glyph names, if present, follow the Adobe Glyph List conventions today, which means that they are standardized to a certain extent. For example, there might be an entry for glyph name 'fi' that ensures that on the right side of the glyph there is vertical separation between the bottom and top shape (i.e., the merged 'i'). Another example, this time a one-to-many mapping: Let's assume that glyph U+2173, SMALL ROMAN NUMERAL FOUR, gets mapped in the GSUB table to two glyphs, 'v.numeral' and 'i.numeral' (but 'i' is not mapped to 'i.numeral' or vice versa because of incompatible horizontal metrics). Let's further assume that glyph 'i.numeral' is about to be queried.

* According to the GSUB table, 'i.numeral' is part of U+2173.
* U+2173 is in your database, and the constraints for considering it are the same as for the 'i' entry, which are met here.

> As for the tilde correction, I'll try doing it after the grid > fitting like you recommended. OK. Werner
Re: Progress update on adjustment database
On Friday, 21 July 2023 at 07:28:07 BST, Craig White wrote: > ...Those would be one-to-one mappings and many-to-one mappings, respectively. > Would this general solution involve other kinds of GSUB mappings? ... > Even sticking to just many-to-one and one-to-one mappings, ... Having an exclusively western-European language background might be a blind spot... I haven't followed the conversations too closely, but the 3rd scenario you try to ignore has a very common name: ligature. I.e., special glyphs for combinations of "fi", "ff", for example. To recap, common usages:

One-to-one: alternates
Many-to-one: accent marks, e.g. umlauts
One-glyph-to-many-unicode-characters: ligatures, e.g. "ff", "fi".
Re: Progress update on adjustment database
> Well, there are no 'unclear goals': the general solution is *exactly* > what I was talking about all the time, and what you need for any entry > in the adjustment database: A mapping from input Unicode characters to > glyph indices based on the GSUB + cmap tables and not on cmap alone. Right now, the only cases where the GSUB table is helpful that I am aware of, for the purposes of this project, are for handling glyph alternates and combining characters. Those would be one-to-one mappings and many-to-one mappings, respectively. Would this general solution involve other kinds of GSUB mappings? If so, it opens up edge cases such as: if a glyph on the "many" side of a many-to-one mapping needs a vertical separation adjustment, does the resulting glyph need it too? This could be answered quickly by looking at the specific characters involved, but how would I answer this question in general? Even sticking to just many-to-one and one-to-one mappings, the adjustment database must make assumptions specific to the characters it's dealing with. In the case of combining characters, a separate database table is required because the existing table is a list of unicode characters and the actions that should be applied to them, while a glyph resulting from a combining character might not be a unicode character. Even if I assumed it was, listing all characters possibly resulting from a combining character is inefficient. Instead, only a table with a few entries is needed: the combining character's codepoint and what action should be applied. This is something I started on before this conversation, and this is an example of how the use case affects the structure of the database. Without knowing what future use cases should be easier to implement because of a generic solution, I don't know what flavor of generic is required. As for the tilde correction, I'll try doing it after the grid fitting like you recommended.
Re: Progress update on adjustment database
> Right now, I don't understand the needs well enough to know what to > generalize or to test whether the general solution works well > enough. I'd rather start with specific use cases rather than a > general solution with unclear goals. Well, there are no 'unclear goals': the general solution is *exactly* what I was talking about all the time, and what you need for any entry in the adjustment database: A mapping from input Unicode characters to glyph indices based on the GSUB + cmap tables and not on cmap alone. >> Right now, I favor the latter: It should be a last-minute action, >> similar to TrueType's `DELTAP[123]` bytecode instructions. > > I disagree with doing the adjustment after grid fitting because in > this case, grid fitting is a destructive action. Doing it after > would require taking a flat line and adding the wiggle back in, Yes, but you have all the available data because you can access the original glyph shape. In other words, you can exactly control which points to move. > possibly in a way that doesn't match the font. At the resolutions we are talking about this absolutely doesn't matter, I think. Essentially, you have to take a wiggle that has collapsed into a single flat row of pixels and make it appear as two staggered rows again. > It sounds easier to prevent that from happening in the first place. OK, give it a try. Rounding information is available also, so this might work as well. Werner
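A rough illustration of the "fix it after grid fitting" idea being argued here (a pure-Python model, not FreeType code; the function name, the choice of a half-pixel delta, and the 26.6 fixed-point convention used below are assumptions for the sketch): since the original outline tells you exactly which points are the wiggle's extrema, those points can be re-spread across two pixel rows after hinting has collapsed them onto one.

```python
def respread_extrema(ys, upper_idx, lower_idx, pixel=64):
    """Given hinted y-coordinates in 26.6 fixed point (64 units per
    pixel) where a wiggle has collapsed onto one pixel row, move the
    known upper extrema up and lower extrema down by half a pixel so
    the wiggle spans two rows.  The extremum indices come from
    analysing the *original*, un-hinted outline."""
    ys = list(ys)
    for i in upper_idx:
        ys[i] += pixel // 2
    for i in lower_idx:
        ys[i] -= pixel // 2
    return ys
```

Usage: if all four extrema of a tilde landed on y = 128 (pixel row 2), `respread_extrema([128, 128, 128, 128], [0, 2], [1, 3])` pushes the upper pair to 160 and the lower pair to 96, restoring a two-row wiggle.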
Re: Progress update on adjustment database
> Probably yes, but who knows. It would be nice to have a generic > solution that completely covers the whole situation, and we never have > to think about it again. Right now, I don't understand the needs well enough to know what to generalize or to test whether the general solution works well enough. I'd rather start with specific use cases rather than a general solution with unclear goals. > This leads to the basic question: Shall the correction be applied > before or after the grid fitting? Right now, I favor the latter: It > should be a last-minute action, similar to TrueType's `DELTAP[123]` > bytecode instructions. I disagree with doing the adjustment after grid fitting because in this case, grid fitting is a destructive action. Doing it after would require taking a flat line and adding the wiggle back in, possibly in a way that doesn't match the font. It sounds easier to prevent that from happening in the first place. On Thu, Jul 20, 2023 at 1:02 PM Werner LEMBERG wrote: > > > Since hinting glyphs that are descendants of combining characters > > will help few fonts, what other ways does the database need to use > > the GSUB table? The only other use case I'm aware of is one-to-one > > substitutions providing alternate forms of a glyph. > > Probably yes, but who knows. It would be nice to have a generic > solution that completely covers the whole situation, and we never have > to think about it again. > > > As for the tilde un-flattening, the approach I'm thinking of is to > > force the tilde to be at least 2 pixels tall before grid fitting > > begins. Would this ever cause the tilde to be 3 pixels because of > > rounding? > > This leads to the basic question: Shall the correction be applied > before or after the grid fitting? Right now, I favor the latter: It > should be a last-minute action, similar to TrueType's `DELTAP[123]` > bytecode instructions.
> > > https://learn.microsoft.com/en-us/typography/opentype/spec/tt_instructions#managing-exceptions > > In other words, if a tilde character's wiggle (not the whole tilde's > vertical size!) is detected to be only 1px high, the shape should be > aggressively distorted vertically to make the wiggle span two pixels. > To do this, some code has to be written to detect the inflection > and extremum points of the upper and lower wiggle of the outline; only > the extrema are then to be moved vertically. > > > Werner >
Re: Progress update on adjustment database
> Since hinting glyphs that are descendants of combining characters > will help few fonts, what other ways does the database need to use > the GSUB table? The only other use case I'm aware of is one-to-one > substitutions providing alternate forms of a glyph. Probably yes, but who knows. It would be nice to have a generic solution that completely covers the whole situation, and we never have to think about it again. > As for the tilde un-flattening, the approach I'm thinking of is to > force the tilde to be at least 2 pixels tall before grid fitting > begins. Would this ever cause the tilde to be 3 pixels because of > rounding? This leads to the basic question: Shall the correction be applied before or after the grid fitting? Right now, I favor the latter: It should be a last-minute action, similar to TrueType's `DELTAP[123]` bytecode instructions. https://learn.microsoft.com/en-us/typography/opentype/spec/tt_instructions#managing-exceptions In other words, if a tilde character's wiggle (not the whole tilde's vertical size!) is detected to be only 1px high, the shape should be aggressively distorted vertically to make the wiggle span two pixels. To do this, some code has to be written to detect the inflection and extremum points of the upper and lower wiggle of the outline; only the extrema are then to be moved vertically. Werner
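The extremum detection described above could start from something as simple as scanning a closed contour for local extrema among its points. A simplified Python sketch (assumes a plain list of y-coordinates and ignores Bézier off-points and ties, which real outline code would have to handle):

```python
def local_extrema(ys):
    """Return (maxima, minima): indices of strict local maxima and
    minima in a sequence of y-coordinates, treating the sequence as a
    closed contour (neighbours wrap around)."""
    n = len(ys)
    maxima, minima = [], []
    for i in range(n):
        prev, nxt = ys[i - 1], ys[(i + 1) % n]
        if ys[i] > prev and ys[i] > nxt:
            maxima.append(i)
        elif ys[i] < prev and ys[i] < nxt:
            minima.append(i)
    return maxima, minima
```

Only the points these scans report would then be moved vertically; everything between them follows along when the outline is rendered.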
Re: Progress update on adjustment database
Thanks! This even answers some questions I was thinking about, but hadn't asked. I was wondering why I couldn't find any GSUB entries for combining characters. In one font I dumped with ttx, there were entries doing the opposite: mapping 'aacute' -> 'a' + 'acute'. Since hinting glyphs that are descendants of combining characters will help few fonts, what other ways does the database need to use the GSUB table? The only other use case I'm aware of is one-to-one substitutions providing alternate forms of a glyph. As for the tilde un-flattening, the approach I'm thinking of is to force the tilde to be at least 2 pixels tall before grid fitting begins. Would this ever cause the tilde to be 3 pixels because of rounding? On Thu, Jul 20, 2023 at 3:21 AM Werner LEMBERG wrote: > > > The next thing I'm doing for the adjustment database is making > > combining characters work. Currently, only precomposed characters > > will be adjusted. If my understanding is correct, this would mean > > finding any lookups that map a character + combining character onto > > a glyph, then apply the appropriate adjustments to that glyph. > > Yes. I suggest that you use the `ttx` decompiler from fonttools and > analyse the contents of a GSUB table of your favourite font. > > https://pypi.org/project/fonttools/ > > At the same time, use the `ftview` FreeType demo program with an > appropriate `FT2_DEBUG` setting so that you can see what the current > HarfBuzz code does for the given font. Examples:
>
> ```
> ttx -t GSUB arial.ttf
> FT2_DEBUG="afshaper:7 afglobal:7 -v" \
>   ftview -l 2 -kq arial.ttf &> arial.log
> ```
>
> Option `-l 2` selects 'light' hinting (i.e., auto-hinting), `-kq` > emulates the 'q' keypress (i.e., quitting immediately). See appended > files for `arial.ttf` version 7.00. > > In `arial.log`, the information coming from the 'afshaper' component > tells you the affected GSUB lookups; this helps poking around in the > XML data as produced by `ttx`.
The 'afglobal' information tells you > the glyph indices covering a given script and feature (start with > 'latn_dflt'). > > You might also try a font editor of your choice (for example, > FontForge, menu entry 'View->Show ATT') to further analyze how the > GSUB data is constructed, and to get some visual feeling on what's > going on. > > > Right now, I'm trying to figure out what features I need to look > > inside to find these lookups. Should I just search all features? > > Yes, I think so. Since the auto-hinter is agnostic to the script and > the used language, you have to have all information in advance. > > > After that, I'm going to tackle the tilde-flattening issue, and any > > other similar marks that are getting flattened. > > Note that in most fonts you won't find any GSUB data for common > combinations like 'a' + 'acute' -> 'aacute'. Usually, such stuff gets > handled by the GPOS table, i.e., instead of mapping two glyphs to > another single one, the accent gets moved to a better position. In > this case, the glyphs are rendered separately, *outside of FreeType's > scope*. This means that we can't do anything on the auto-hinter side > to optimize the distance between the base and the accent glyph (see > also the comment in file `afshaper.c` starting at line 308, and this > nice article > https://learn.microsoft.com/en-us/typography/develop/processing-part1). > > It thus probably makes sense to do the tilde stuff first. > > > Werner >
Re: Progress update on adjustment database
> The next thing I'm doing for the adjustment database is making > combining characters work. Currently, only precomposed characters > will be adjusted. If my understanding is correct, this would mean > finding any lookups that map a character + combining character onto > a glyph, then apply the appropriate adjustments to that glyph. Yes. I suggest that you use the `ttx` decompiler from fonttools and analyse the contents of a GSUB table of your favourite font. https://pypi.org/project/fonttools/ At the same time, use the `ftview` FreeType demo program with an appropriate `FT2_DEBUG` setting so that you can see what the current HarfBuzz code does for the given font. Examples:

```
ttx -t GSUB arial.ttf
FT2_DEBUG="afshaper:7 afglobal:7 -v" \
  ftview -l 2 -kq arial.ttf &> arial.log
```

Option `-l 2` selects 'light' hinting (i.e., auto-hinting), `-kq` emulates the 'q' keypress (i.e., quitting immediately). See appended files for `arial.ttf` version 7.00. In `arial.log`, the information coming from the 'afshaper' component tells you the affected GSUB lookups; this helps poking around in the XML data as produced by `ttx`. The 'afglobal' information tells you the glyph indices covering a given script and feature (start with 'latn_dflt'). You might also try a font editor of your choice (for example, FontForge, menu entry 'View->Show ATT') to further analyze how the GSUB data is constructed, and to get some visual feeling on what's going on. > Right now, I'm trying to figure out what features I need to look > inside to find these lookups. Should I just search all features? Yes, I think so. Since the auto-hinter is agnostic to the script and the used language, you have to have all information in advance. > After that, I'm going to tackle the tilde-flattening issue, and any > other similar marks that are getting flattened. Note that in most fonts you won't find any GSUB data for common combinations like 'a' + 'acute' -> 'aacute'.
Usually, such stuff gets handled by the GPOS table, i.e., instead of mapping two glyphs to another single one, the accent gets moved to a better position. In this case, the glyphs are rendered separately, *outside of FreeType's scope*. This means that we can't do anything on the auto-hinter side to optimize the distance between the base and the accent glyph (see also the comment in file `afshaper.c` starting at line 308, and this nice article https://learn.microsoft.com/en-us/typography/develop/processing-part1). It thus probably makes sense to do the tilde stuff first. Werner

Attachments: arial-7.00.ttx.xz, arial-7.00.log.xz