Re: [hackers] st and combining characters

2015-09-17 Thread Joakim Sindholt
On Mon, 2015-06-15 at 10:25 +0200, Roberto E. Vargas Caballero wrote:
> Hi,

Well, it's been a really long time and I thought I'd finally get around
to summarizing what was agreed upon way back when.

>   I like the idea, but I think the patch needs some evolution.
> A patch of 500 lines is usually hard of reading, and in in this case
> the change is not trivial.
> 
> On Wed, Jun 10, 2015 at 11:39:25PM +0200, Joakim Sindholt wrote:
> > glyph now holds a union of two combining characters and a pointer to a zero
> > terminated array. This is purely to make the most of the space. No length is
> > kept and every character added past 2 causes a realloc.
> 
> And why 2 characters?. I think it is more logical to have only one, which is
> the most common case, isn't it?. Maybe I'm missing something.

Because of space. I would wager that most computers have 64-bit pointers
nowadays, so to optimally use the space before going full malloc, we can
have one base character plus two combining characters. However one base
plus one combining is by far the most common in western scripts.

> > There's currently a problem with rendering where Xft doesn't draw the
> > combining characters as a part of the character they combine to, which 
> > causes
> > things like Hangul Jamo to be completely unusable stacks of characters 
> > instead
> > of the proper combined form.
> 
> Uhmm, I don't understand this point, can you explain it a bit better?,
> or better, put an example.

I used 㐻o͜ò떫 as my test characters (see http://sprunge.us/BIac if it
didn't render right for you). It should be a double width Chinese
character, followed by two o's, with a double width combining horizontal
end parenthesis under them both (not just the left one as with my
patch) and a ` over the second one, followed by a combining Hangul Jamo
character (not a box with a line through it as it renders with my
patch).

Since I really don't have time to tackle this properly, I'll leave
anyone who wants to pick up the torch with the same patch updated to git
head. Maybe it could be put on the wiki or something. Sorry for wasting
everyone's time with this.

I would say the criticisms below are spot on:

> > @@ -98,6 +98,7 @@ enum glyph_attribute {
> > ATTR_WRAP   = 1 << 8,
> > ATTR_WIDE   = 1 << 9,
> > ATTR_WDUMMY = 1 << 10,
> > +   ATTR_DECOMPPTR  = 1 << 11,
> > ATTR_BOLD_FAINT = ATTR_BOLD | ATTR_FAINT,
> >  };
> >  
> > @@ -190,6 +191,10 @@ typedef XftColor Color;
> >  
> >  typedef struct {
> > Rune u;   /* character code */
> > +   union {
> > +   Rune c[2];
> > +   Rune *pc;
> > +   } dc;
> 
> We already have the field u, so why is not the union made with u and pc?
> 
> > Line *alt;/* alternate screen */
> > bool *dirty;  /* dirtyness of lines */
> > XftGlyphFontSpec *specbuf; /* font spec buffer used for rendering */
> > +   int specbuflen;
> 
> 
> > @@ -419,8 +426,9 @@ static void ttywrite(const char *, size_t);
> >  static void tstrsequence(uchar);
> >  
> >  static inline ushort sixd_to_16bit(int);
> > -static int xmakeglyphfontspecs(XftGlyphFontSpec *, const Glyph *, int, 
> > int, int);
> > -static void xdrawglyphfontspecs(const XftGlyphFontSpec *, Glyph, int, int, 
> > int);
> > +static int xmakeglyphfontspecs(XftGlyphFontSpec *, int, const Glyph *, 
> > int, int, int);
> > +static int xmakeglyphfontspecsintermbuf(const Glyph *, int, int, int);
> > +static void xdrawglyphfontspecs(const XftGlyphFontSpec *, Glyph, int, int, 
> > int, int);
> 
> Uhmmm, these names begin to seem like java names. We must find a better way 
> to name them,
> because with names so long the only lower case style doesn't work.
> 
> > /* append every set & selected glyph to the selection */
> > @@ -984,6 +993,16 @@ getsel(void) {
> > continue;
> >  
> > ptr += utf8encode(gp->u, ptr);
> > +   if(gp->mode & ATTR_DECOMPPTR) {
> > +   pc = gp->dc.pc;
> > +   for(pc = gp->dc.pc; *pc; pc++)
> > +   ptr += utf8encode(*pc, ptr);
> > +   } else {
> > +   if(gp->dc.c[0])
> > +   ptr += utf8encode(gp->dc.c[0], ptr);
> > +   if(gp->dc.c[1])
> > +   ptr += utf8encode(gp->dc.c[1], ptr);
> > +   }
> > }
> 
> If we define the union with u and pc we can remove the else in this code, no?
> 
> > +   term.dirty[y] = 1;
> > +   if(term.line[y][x].mode & ATTR_DECOMPPTR) {
> > +   p = term.line[y][x].dc.pc;
> > +   while(*p)
> > +   p++;
> > +   sz = (p - term.line[y][x].dc.pc) + 2;
> > +   term.line[y][x].dc.pc = xrealloc(term.line[y][x].dc.pc, sz * 
> > sizeof(Rune));
> > +   term.line[y][x].dc.pc[sz-2] = u;
> > +   term.line[y][x].dc.pc[sz-1] = 0;
> > +   } 

Re: [hackers] st and combining characters

2015-06-15 Thread Roberto E. Vargas Caballero
Hi,

I like the idea, but I think the patch needs some evolution.
A patch of 500 lines is usually hard of reading, and in in this case
the change is not trivial.

On Wed, Jun 10, 2015 at 11:39:25PM +0200, Joakim Sindholt wrote:
 glyph now holds a union of two combining characters and a pointer to a zero
 terminated array. This is purely to make the most of the space. No length is
 kept and every character added past 2 causes a realloc.

And why 2 characters?. I think it is more logical to have only one, which is
the most common case, isn't it?. Maybe I'm missing something.

 There's currently a problem with rendering where Xft doesn't draw the
 combining characters as a part of the character they combine to, which causes
 things like Hangul Jamo to be completely unusable stacks of characters instead
 of the proper combined form.

Uhmm, I don't understand this point, can you explain it a bit better?,
or better, put an example.

 @@ -98,6 +98,7 @@ enum glyph_attribute {
   ATTR_WRAP   = 1  8,
   ATTR_WIDE   = 1  9,
   ATTR_WDUMMY = 1  10,
 + ATTR_DECOMPPTR  = 1  11,
   ATTR_BOLD_FAINT = ATTR_BOLD | ATTR_FAINT,
  };
  
 @@ -190,6 +191,10 @@ typedef XftColor Color;
  
  typedef struct {
   Rune u;   /* character code */
 + union {
 + Rune c[2];
 + Rune *pc;
 + } dc;

We already have the field u, so why is not the union made with u and pc?

   Line *alt;/* alternate screen */
   bool *dirty;  /* dirtyness of lines */
   XftGlyphFontSpec *specbuf; /* font spec buffer used for rendering */
 + int specbuflen;


 @@ -419,8 +426,9 @@ static void ttywrite(const char *, size_t);
  static void tstrsequence(uchar);
  
  static inline ushort sixd_to_16bit(int);
 -static int xmakeglyphfontspecs(XftGlyphFontSpec *, const Glyph *, int, int, 
 int);
 -static void xdrawglyphfontspecs(const XftGlyphFontSpec *, Glyph, int, int, 
 int);
 +static int xmakeglyphfontspecs(XftGlyphFontSpec *, int, const Glyph *, int, 
 int, int);
 +static int xmakeglyphfontspecsintermbuf(const Glyph *, int, int, int);
 +static void xdrawglyphfontspecs(const XftGlyphFontSpec *, Glyph, int, int, 
 int, int);

Uhmmm, these names begin to seem like java names. We must find a better way to 
name them,
because with names so long the only lower case style doesn't work.

   /* append every set  selected glyph to the selection */
 @@ -984,6 +993,16 @@ getsel(void) {
   continue;
  
   ptr += utf8encode(gp-u, ptr);
 + if(gp-mode  ATTR_DECOMPPTR) {
 + pc = gp-dc.pc;
 + for(pc = gp-dc.pc; *pc; pc++)
 + ptr += utf8encode(*pc, ptr);
 + } else {
 + if(gp-dc.c[0])
 + ptr += utf8encode(gp-dc.c[0], ptr);
 + if(gp-dc.c[1])
 + ptr += utf8encode(gp-dc.c[1], ptr);
 + }
   }

If we define the union with u and pc we can remove the else in this code, no?

 + term.dirty[y] = 1;
 + if(term.line[y][x].mode  ATTR_DECOMPPTR) {
 + p = term.line[y][x].dc.pc;
 + while(*p)
 + p++;
 + sz = (p - term.line[y][x].dc.pc) + 2;
 + term.line[y][x].dc.pc = xrealloc(term.line[y][x].dc.pc, sz * 
 sizeof(Rune));
 + term.line[y][x].dc.pc[sz-2] = u;
 + term.line[y][x].dc.pc[sz-1] = 0;
 + } else {
 + if(!term.line[y][x].dc.c[0]) {
 + term.line[y][x].dc.c[0] = u;
 + } else if(!term.line[y][x].dc.c[1]) {
 + term.line[y][x].dc.c[1] = u;
 + } else {
 + p = xmalloc(4 * sizeof(Rune));
 + p[0] = term.line[y][x].dc.c[0];
 + p[1] = term.line[y][x].dc.c[1];
 + p[2] = u;
 + p[3] = 0;
 + term.line[y][x].dc.pc = p;
 + term.line[y][x].mode |= ATTR_DECOMPPTR;
 + }
 + }
 +}

The same. If the union is made with u almost all off this code disappear.

 @@ -1711,7 +1786,7 @@ tclearregion(int x1, int y1, int x2, int y2) {
  
  void
  tdeletechar(int n) {
 - int dst, src, size;
 + int dst, src, size, i;
   Glyph *line;
  
   LIMIT(n, 0, term.col - term.c.x);
 @@ -1721,13 +1796,17 @@ tdeletechar(int n) {
   size = term.col - src;
   line = term.line[term.c.y];
  
 + for(i = dst; i  src; i++) {
 + if(line[i].mode  ATTR_DECOMPPTR)
 + line[i].dc.pc = NULL;
 + }

...

 @@ -1737,6 +1816,10 @@ tinsertblank(int n) {
   size = term.col - dst;
   line = term.line[term.c.y];
  
 + for(i = src; i  dst; i++) {
 + if(line[i].mode  ATTR_DECOMPPTR)
 +