On 08/23/2016 03:54 AM, Tom Lane wrote:
! That's still not quite enough, though, because of locale-dependent ! character classes such as [[:alpha:]]. In Unicode locales these classes ! may have thousands of entries that are above MAX_SIMPLE_CHR, and we ! certainly don't want to be searching large colormaprange arrays at runtime. ! Nor do we even want to spend the time to initialize cvec structures that ! exhaustively describe all of those characters. Our solution is to compute ! exact per-character colors at compile time only up to MAX_SIMPLE_CHR. ! For characters above that, we apply the <ctype.h> or <wctype.h> lookup ! functions at runtime for each locale-dependent character class used in the ! regex pattern, constructing a bitmap that describes which classes the ! runtime character belongs to. The per-character-range data structure ! mentioned above actually holds, for each range, a separate color entry ! for each possible combination of character class properties. That is, ! the color map for characters above MAX_SIMPLE_CHR is really a 2-D array, ! whose rows correspond to character ranges that are explicitly mentioned ! in the input, and whose columns correspond to sets of relevant locale ! character classes.
I think that last sentence should say "whose rows correspond to character ranges that are explicitly mentioned in the *regex pattern*", rather than "in the input".
An example would be very helpful here. I had to read through this many times, until I understood it. I can easily come up with examples for character classes, but not for "high" character-ranges. The best I could come up with is to check if a characters belongs to some special group of unicode characters, like U&'[\+01D100-\+01D1FF]' to check for musical symbol characters. In practice, I guess you will only see single characters in the colormaprange array, although we must of course cope with ranges too.
+ /* this relies on WHITE being zero: */ + memset(cm->locolormap, WHITE, + (MAX_SIMPLE_CHR - CHR_MIN + 1) * sizeof(color)); + + memset(cm->classbits, 0, sizeof(cm->classbits)); + cm->numcmranges = 0; + cm->cmranges = NULL; + cm->maxarrayrows = 4; /* arbitrary initial allocation */ + cm->hiarrayrows = 1; + cm->hiarraycols = 1; + cm->hicolormap = (color *) MALLOC(cm->maxarrayrows * sizeof(color)); + if (cm->hicolormap == NULL) + { + CERR(REG_ESPACE); + return; + } + /* initialize the "all other characters" row to WHITE */ + cm->hicolormap[0] = WHITE;
Is the comment correct? I don't see why this wouldn't work with "WHITE != 0".
! /* Duplicate existing columns to the right, and increase ref counts */ ! /* Must work downwards in the array because we realloc'd in place */ ! for (r = cm->hiarrayrows - 1; r >= 0; r--) ! { ! color *oldrowptr = &newarray[r * cm->hiarraycols]; ! color *newrowptr = &newarray[r * cm->hiarraycols * 2]; ! color *newrowptr2 = newrowptr + cm->hiarraycols; ! for (c = 0; c < cm->hiarraycols; c++) ! { ! color co = oldrowptr[c]; ! ! newrowptr[c] = newrowptr2[c] = co; ! cm->cd[co].nuchrs++; ! } ! }
Perhaps "backwards" would be clearer than "downwards"? At least in my mental model, index 0 is the top row of an array, so "downwards" means 0, 1, 2. I guess you meant downwards numerically, rather than visually, but it took me a moment to process that.
+1 for this patch in general. Some regression test cases would be nice. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers