Re: [hackers] st and combining characters
On Mon, 2015-06-15 at 10:25 +0200, Roberto E. Vargas Caballero wrote: > Hi, Well, it's been a really long time and I thought I'd finally get around to summarizing what was agreed upon way back when. > I like the idea, but I think the patch needs some evolution. > A patch of 500 lines is usually hard of reading, and in in this case > the change is not trivial. > > On Wed, Jun 10, 2015 at 11:39:25PM +0200, Joakim Sindholt wrote: > > glyph now holds a union of two combining characters and a pointer to a zero > > terminated array. This is purely to make the most of the space. No length is > > kept and every character added past 2 causes a realloc. > > And why 2 characters?. I think it is more logical to have only one, which is > the most common case, isn't it?. Maybe I'm missing something. Because of space. I would wager that most computers have 64-bit pointers nowadays, so to optimally use the space before going full malloc, we can have one base character plus two combining characters. However one base plus one combining is by far the most common in western scripts. > > There's currently a problem with rendering where Xft doesn't draw the > > combining characters as a part of the character they combine to, which > > causes > > things like Hangul Jamo to be completely unusable stacks of characters > > instead > > of the proper combined form. > > Uhmm, I don't understand this point, can you explain it a bit better?, > or better, put an example. I used 㐻o͜ò떫 as my test characters (see http://sprunge.us/BIac if it didn't render right for you). It should be a double width Chinese character, followed by two o's, with a double width combining horizontal end parenthesis under them both (not just the left one as with my patch) and a ` over the second one, followed by a combining Hangul Jamo character (not a box with a line through it as it renders with my patch). Since I really don't have time to tackle this properly, I'll leave anyone who wants to pick up the torch with the same patch updated to git head. Maybe it could be put on the wiki or something. Sorry for wasting everyone's time with this. I would say the criticisms below are spot on: > > @@ -98,6 +98,7 @@ enum glyph_attribute { > > ATTR_WRAP = 1 << 8, > > ATTR_WIDE = 1 << 9, > > ATTR_WDUMMY = 1 << 10, > > + ATTR_DECOMPPTR = 1 << 11, > > ATTR_BOLD_FAINT = ATTR_BOLD | ATTR_FAINT, > > }; > > > > @@ -190,6 +191,10 @@ typedef XftColor Color; > > > > typedef struct { > > Rune u; /* character code */ > > + union { > > + Rune c[2]; > > + Rune *pc; > > + } dc; > > We already have the field u, so why is not the union made with u and pc? > > > Line *alt;/* alternate screen */ > > bool *dirty; /* dirtyness of lines */ > > XftGlyphFontSpec *specbuf; /* font spec buffer used for rendering */ > > + int specbuflen; > > > > @@ -419,8 +426,9 @@ static void ttywrite(const char *, size_t); > > static void tstrsequence(uchar); > > > > static inline ushort sixd_to_16bit(int); > > -static int xmakeglyphfontspecs(XftGlyphFontSpec *, const Glyph *, int, > > int, int); > > -static void xdrawglyphfontspecs(const XftGlyphFontSpec *, Glyph, int, int, > > int); > > +static int xmakeglyphfontspecs(XftGlyphFontSpec *, int, const Glyph *, > > int, int, int); > > +static int xmakeglyphfontspecsintermbuf(const Glyph *, int, int, int); > > +static void xdrawglyphfontspecs(const XftGlyphFontSpec *, Glyph, int, int, > > int, int); > > Uhmmm, these names begin to seem like java names. We must find a better way > to name them, > because with names so long the only lower case style doesn't work. > > > /* append every set & selected glyph to the selection */ > > @@ -984,6 +993,16 @@ getsel(void) { > > continue; > > > > ptr += utf8encode(gp->u, ptr); > > + if(gp->mode & ATTR_DECOMPPTR) { > > + pc = gp->dc.pc; > > + for(pc = gp->dc.pc; *pc; pc++) > > + ptr += utf8encode(*pc, ptr); > > + } else { > > + if(gp->dc.c[0]) > > + ptr += utf8encode(gp->dc.c[0], ptr); > > + if(gp->dc.c[1]) > > + ptr += utf8encode(gp->dc.c[1], ptr); > > + } > > } > > If we define the union with u and pc we can remove the else in this code, no? > > > + term.dirty[y] = 1; > > + if(term.line[y][x].mode & ATTR_DECOMPPTR) { > > + p = term.line[y][x].dc.pc; > > + while(*p) > > + p++; > > + sz = (p - term.line[y][x].dc.pc) + 2; > > + term.line[y][x].dc.pc = xrealloc(term.line[y][x].dc.pc, sz * > > sizeof(Rune)); > > + term.line[y][x].dc.pc[sz-2] = u; > > + term.line[y][x].dc.pc[sz-1] = 0; > > + }
Re: [hackers] st and combining characters
Hi, I like the idea, but I think the patch needs some evolution. A patch of 500 lines is usually hard of reading, and in in this case the change is not trivial. On Wed, Jun 10, 2015 at 11:39:25PM +0200, Joakim Sindholt wrote: glyph now holds a union of two combining characters and a pointer to a zero terminated array. This is purely to make the most of the space. No length is kept and every character added past 2 causes a realloc. And why 2 characters?. I think it is more logical to have only one, which is the most common case, isn't it?. Maybe I'm missing something. There's currently a problem with rendering where Xft doesn't draw the combining characters as a part of the character they combine to, which causes things like Hangul Jamo to be completely unusable stacks of characters instead of the proper combined form. Uhmm, I don't understand this point, can you explain it a bit better?, or better, put an example. @@ -98,6 +98,7 @@ enum glyph_attribute { ATTR_WRAP = 1 8, ATTR_WIDE = 1 9, ATTR_WDUMMY = 1 10, + ATTR_DECOMPPTR = 1 11, ATTR_BOLD_FAINT = ATTR_BOLD | ATTR_FAINT, }; @@ -190,6 +191,10 @@ typedef XftColor Color; typedef struct { Rune u; /* character code */ + union { + Rune c[2]; + Rune *pc; + } dc; We already have the field u, so why is not the union made with u and pc? Line *alt;/* alternate screen */ bool *dirty; /* dirtyness of lines */ XftGlyphFontSpec *specbuf; /* font spec buffer used for rendering */ + int specbuflen; @@ -419,8 +426,9 @@ static void ttywrite(const char *, size_t); static void tstrsequence(uchar); static inline ushort sixd_to_16bit(int); -static int xmakeglyphfontspecs(XftGlyphFontSpec *, const Glyph *, int, int, int); -static void xdrawglyphfontspecs(const XftGlyphFontSpec *, Glyph, int, int, int); +static int xmakeglyphfontspecs(XftGlyphFontSpec *, int, const Glyph *, int, int, int); +static int xmakeglyphfontspecsintermbuf(const Glyph *, int, int, int); +static void xdrawglyphfontspecs(const XftGlyphFontSpec *, Glyph, int, int, int, int); Uhmmm, these names begin to seem like java names. We must find a better way to name them, because with names so long the only lower case style doesn't work. /* append every set selected glyph to the selection */ @@ -984,6 +993,16 @@ getsel(void) { continue; ptr += utf8encode(gp-u, ptr); + if(gp-mode ATTR_DECOMPPTR) { + pc = gp-dc.pc; + for(pc = gp-dc.pc; *pc; pc++) + ptr += utf8encode(*pc, ptr); + } else { + if(gp-dc.c[0]) + ptr += utf8encode(gp-dc.c[0], ptr); + if(gp-dc.c[1]) + ptr += utf8encode(gp-dc.c[1], ptr); + } } If we define the union with u and pc we can remove the else in this code, no? + term.dirty[y] = 1; + if(term.line[y][x].mode ATTR_DECOMPPTR) { + p = term.line[y][x].dc.pc; + while(*p) + p++; + sz = (p - term.line[y][x].dc.pc) + 2; + term.line[y][x].dc.pc = xrealloc(term.line[y][x].dc.pc, sz * sizeof(Rune)); + term.line[y][x].dc.pc[sz-2] = u; + term.line[y][x].dc.pc[sz-1] = 0; + } else { + if(!term.line[y][x].dc.c[0]) { + term.line[y][x].dc.c[0] = u; + } else if(!term.line[y][x].dc.c[1]) { + term.line[y][x].dc.c[1] = u; + } else { + p = xmalloc(4 * sizeof(Rune)); + p[0] = term.line[y][x].dc.c[0]; + p[1] = term.line[y][x].dc.c[1]; + p[2] = u; + p[3] = 0; + term.line[y][x].dc.pc = p; + term.line[y][x].mode |= ATTR_DECOMPPTR; + } + } +} The same. If the union is made with u almost all off this code disappear. @@ -1711,7 +1786,7 @@ tclearregion(int x1, int y1, int x2, int y2) { void tdeletechar(int n) { - int dst, src, size; + int dst, src, size, i; Glyph *line; LIMIT(n, 0, term.col - term.c.x); @@ -1721,13 +1796,17 @@ tdeletechar(int n) { size = term.col - src; line = term.line[term.c.y]; + for(i = dst; i src; i++) { + if(line[i].mode ATTR_DECOMPPTR) + line[i].dc.pc = NULL; + } ... @@ -1737,6 +1816,10 @@ tinsertblank(int n) { size = term.col - dst; line = term.line[term.c.y]; + for(i = src; i dst; i++) { + if(line[i].mode ATTR_DECOMPPTR) +