On 22 October 2012 19:44, Roland Mainz <roland.ma...@nrubsig.org> wrote:
> On Fri, Oct 19, 2012 at 3:38 PM, Cedric Blancher
> <cedric.blanc...@googlemail.com> wrote:
>> Request for enhancement: .sh.regex.available_character_class
>>
>> What do you think about adding a .sh.regex.available_character_class
>> array variable which contains the list of available wctype character
>> classes for the current locale? I know there is no API to get a list
>> from the OS but libast could probe well-known names and put only those
>> in the array for which wctype() returned a non-0 value.
>
> Erm... just curious: What is the usage scenario for such a feature?

We build regular expressions dynamically, based on other input data.
The extra character classes help a lot when processing Japanese text
because they make the regular expressions MUCH shorter, usually by
dozens of sub-expressions. The problem is that a lot of platforms
(Linux!!) sometimes lack the extra classes we have on Solaris or AIX,
which severely cripples pattern matching performance.

Ced
-- 
Cedric Blancher <cedric.blanc...@googlemail.com>
Institute Pasteur
_______________________________________________
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers
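
For reference, the probing idea from the original request could look roughly like the C sketch below. It is only an illustration, not libast code: the function name probe_classes, the candidate table, and the PROBE_DEMO macro are made up for this example, and the "j*" names are assumptions about extension classes that some Japanese locales may provide. Only the twelve standard names are guaranteed by ISO C to be accepted by wctype().

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>
#include <wctype.h>

/* Well-known names to probe. The first twelve are the standard ISO C
 * classes; the "j*" entries are assumed extension names that may or
 * may not exist, depending on the platform and the current locale. */
static const char *const candidates[] = {
    "alnum", "alpha", "blank", "cntrl", "digit", "graph",
    "lower", "print", "punct", "space", "upper", "xdigit",
    "jhira", "jkata", "jkanji", "jdigit", "jalpha", "jspace", "jpunct"
};

/* Store in out[] (capacity max) every candidate name for which
 * wctype() returns a non-zero descriptor in the current LC_CTYPE
 * locale. Returns the number of names stored. */
size_t probe_classes(const char **out, size_t max)
{
    size_t n = 0;
    size_t total = sizeof candidates / sizeof candidates[0];

    assert(out != NULL);
    for (size_t i = 0; i < total && n < max; i++) {
        if (wctype(candidates[i]) != (wctype_t)0)
            out[n++] = candidates[i];
    }
    return n;
}

#ifdef PROBE_DEMO
/* Optional demo driver: build with -DPROBE_DEMO to list the classes
 * available in the locale selected by the environment. */
int main(void)
{
    const char *found[32];
    size_t n;

    setlocale(LC_ALL, "");
    n = probe_classes(found, sizeof found / sizeof found[0]);
    for (size_t i = 0; i < n; i++)
        puts(found[i]);
    return 0;
}
#endif
```

The result depends on the locale in effect when probe_classes() is called, so a shell exposing such a list would presumably have to re-probe whenever LC_CTYPE or LC_ALL changes.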