On Sun, Feb 19, 2012 at 04:33, Robert Haas <robertmh...@gmail.com> wrote:
> On Sat, Feb 18, 2012 at 7:29 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> Yeah, it's conceivable that we could implement something whereby
> >> characters with codes above some cutoff point are handled via runtime
> >> calls to iswalpha() and friends, rather than being included in the
> >> statically-constructed DFA maps. The cutoff point could likely be a lot
> >> less than U+FFFF, too, thereby saving storage and map build time all
> >> round.
> >
> > In the meantime, I still think the caching logic is worth having, and
> > we could at least make some people happy if we selected a cutoff point
> > somewhere between U+FF and U+FFFF. I don't have any strong ideas about
> > what a good compromise cutoff would be. One possibility is U+7FF, which
> > corresponds to the limit of what fits in 2-byte UTF8; but I don't know
> > if that corresponds to any significant dropoff in frequency of usage.
>
> The problem, of course, is that this probably depends quite a bit on
> what language you happen to be using. For some languages, it won't
> matter whether you cut it off at U+FF or U+7FF; while for others even
> U+FFFF might not be enough. So I think this is one of those cases
> where it's somewhat meaningless to talk about frequency of usage.

Does it make sense for regexps to have collations?
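
To make the cutoff idea quoted above concrete, here is a rough Python sketch. It is not the regex engine's code and not anything from a patch: the names CUTOFF, STATIC_ALPHA, is_alpha and utf8_len are made up for illustration, U+7FF is just the value floated above, and Python's str.isalpha() stands in for a runtime iswalpha() call.

```python
# Hypothetical sketch; names and the cutoff value are illustrative only.
CUTOFF = 0x7FF  # highest code point that still encodes to 2 bytes in UTF-8

# Built once up front, standing in for the statically constructed maps.
STATIC_ALPHA = frozenset(cp for cp in range(CUTOFF + 1) if chr(cp).isalpha())


def is_alpha(ch: str) -> bool:
    """Classify one character: table lookup at or below CUTOFF, runtime call above."""
    cp = ord(ch)
    if cp <= CUTOFF:
        return cp in STATIC_ALPHA
    # str.isalpha() stands in for a runtime iswalpha() call here.
    return ch.isalpha()


def utf8_len(cp: int) -> int:
    """Number of bytes the code point occupies when encoded as UTF-8."""
    return len(chr(cp).encode("utf-8"))


if __name__ == "__main__":
    # Latin, Latin-1, Cyrillic, CJK, and a code point beyond U+FFFF.
    for ch in ("a", "\u00e9", "\u044f", "\u4e2d", "\U00020000"):
        path = "table" if ord(ch) <= CUTOFF else "runtime"
        print(f"U+{ord(ch):04X} utf8_bytes={utf8_len(ord(ch))} "
              f"path={path} alpha={is_alpha(ch)}")
```

With this cutoff, Latin, Greek and Cyrillic text never leaves the table, while every CJK character takes the runtime path. That is the language dependence Robert describes: no single cutoff suits everyone.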