On 22 October 2012 19:44, Roland Mainz <roland.ma...@nrubsig.org> wrote:
> On Fri, Oct 19, 2012 at 3:38 PM, Cedric Blancher
> <cedric.blanc...@googlemail.com> wrote:
>> Request for enhancement: .sh.regex.available_character_class
>>
>> What do you think about adding a .sh.regex.available_character_class
>> array variable which contains the list of available wctype character
>> classes for the current locale? I know there is no API to get a list
>> from the OS but libast could probe well-known names and put only those
>> in the array for which wctype() returned a non-0 value.
>
> Erm... just curious: What is the usage scenario for such a feature ?

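We build regular expressions dynamically, based on other input data.
The extra character classes help a lot when processing Japanese texts
because they make the regular expressions MUCH shorter, usually by
dozens of sub-expressions. The problem is that many platforms (Linux
in particular) lack some of the extra classes we have on Solaris or
AIX. Without them each class has to be spelled out as a long list of
sub-expressions, which severely cripples pattern matching performance.
A script that can see which classes exist could use them where
available and fall back to the long form only where it must.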
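
For illustration, the probing idea from the request could look roughly
like this in C. This is only a sketch, not libast code:
probe_wctype_classes() is a hypothetical helper name, and the list of
candidate names is whatever the caller chooses to try.

```c
#include <assert.h>
#include <stddef.h>
#include <wctype.h>

/*
 * Hypothetical helper (not part of libast): copy into out[] every name
 * from names[] that wctype() recognizes in the current LC_CTYPE locale.
 * out[] must have room for n pointers. Returns the number of names kept.
 */
size_t probe_wctype_classes(const char *const *names, size_t n,
                            const char **out)
{
    size_t kept = 0;

    assert(names != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        /* wctype() returns 0 for a class name the locale does not define */
        if (wctype(names[i]) != (wctype_t)0)
            out[kept++] = names[i];
    }
    return kept;
}
```

A caller would run setlocale(LC_CTYPE, "") first and then pass its
candidate list. The twelve standard names (alnum, alpha, blank, cntrl,
digit, graph, lower, print, punct, space, upper, xdigit) should always
be kept; any platform-specific names have to come from the platform
documentation, since there is no API to enumerate them.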

Ced
-- 
Cedric Blancher <cedric.blanc...@googlemail.com>
Institute Pasteur
_______________________________________________
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers