On 08/23/2016 03:54 AM, Tom Lane wrote:
! That's still not quite enough, though, because of locale-dependent
! character classes such as [[:alpha:]].  In Unicode locales these classes
! may have thousands of entries that are above MAX_SIMPLE_CHR, and we
! certainly don't want to be searching large colormaprange arrays at runtime.
! Nor do we even want to spend the time to initialize cvec structures that
! exhaustively describe all of those characters.  Our solution is to compute
! exact per-character colors at compile time only up to MAX_SIMPLE_CHR.
! For characters above that, we apply the <ctype.h> or <wctype.h> lookup
! functions at runtime for each locale-dependent character class used in the
! regex pattern, constructing a bitmap that describes which classes the
! runtime character belongs to.  The per-character-range data structure
! mentioned above actually holds, for each range, a separate color entry
! for each possible combination of character class properties.  That is,
! the color map for characters above MAX_SIMPLE_CHR is really a 2-D array,
! whose rows correspond to character ranges that are explicitly mentioned
! in the input, and whose columns correspond to sets of relevant locale
! character classes.

I think that last sentence should say "whose rows correspond to character ranges that are explicitly mentioned in the *regex pattern*", rather than "in the input".

An example would be very helpful here. I had to read through this many times, until I understood it. I can easily come up with examples for character classes, but not for "high" character-ranges. The best I could come up with is to check if a characters belongs to some special group of unicode characters, like U&'[\+01D100-\+01D1FF]' to check for musical symbol characters. In practice, I guess you will only see single characters in the colormaprange array, although we must of course cope with ranges too.

+       /* this relies on WHITE being zero: */
+       memset(cm->locolormap, WHITE,
+                  (MAX_SIMPLE_CHR - CHR_MIN + 1) * sizeof(color));
+
+       memset(cm->classbits, 0, sizeof(cm->classbits));
+       cm->numcmranges = 0;
+       cm->cmranges = NULL;
+       cm->maxarrayrows = 4;           /* arbitrary initial allocation */
+       cm->hiarrayrows = 1;
+       cm->hiarraycols = 1;
+       cm->hicolormap = (color *) MALLOC(cm->maxarrayrows * sizeof(color));
+       if (cm->hicolormap == NULL)
+       {
+               CERR(REG_ESPACE);
+               return;
+       }
+       /* initialize the "all other characters" row to WHITE */
+       cm->hicolormap[0] = WHITE;

Is the comment correct? I don't see why this wouldn't work with "WHITE != 0".

!       /* Duplicate existing columns to the right, and increase ref counts */
!       /* Must work downwards in the array because we realloc'd in place */
!       for (r = cm->hiarrayrows - 1; r >= 0; r--)
!       {
!               color      *oldrowptr = &newarray[r * cm->hiarraycols];
!               color      *newrowptr = &newarray[r * cm->hiarraycols * 2];
!               color      *newrowptr2 = newrowptr + cm->hiarraycols;

!               for (c = 0; c < cm->hiarraycols; c++)
!               {
!                       color           co = oldrowptr[c];
!
!                       newrowptr[c] = newrowptr2[c] = co;
!                       cm->cd[co].nuchrs++;
!               }
!       }

Perhaps "backwards" would be clearer than "downwards"? At least in my mental model, index 0 is the top row of an array, so "downwards" means 0, 1, 2. I guess you meant downwards numerically, rather than visually, but it took me a moment to process that.

+1 for this patch in general. Some regression test cases would be nice.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to