On Mon, Apr 28, 2014 at 8:40 AM, Jonathan Wakely <jwak...@redhat.com> wrote: > I've been looking through the regex code and have a few ideas for > simplifications or optimisations that I'd like to share.
Thanks :) > This first patch is for _BracketMatcher. We only use std::bitset when > is_same<_CharT, char> so 8 * sizeof(_CharT) should be __CHAR_BIT__ > instead. We also only user _UnsignedCharT when is_same<_CharT, char> > so that can just be simplified to unsigned char. Yes, since _UnsignedCharT is just used as indexes, we can always use a larger unsigned integer instead. Maybe "size_t" is a better choice? I'm not sure if we'll have a wchar_t cache (bitset<65536>) in the future ;) > The contents of _BracketMatcher::_M_char_set are not sorted and can > contain duplicates in the current code. Making that a sorted, unique > list in _BracketMatcher::_M_ready() allows a binary search instead of > linear search. This improves worst case performance for pathological > regular expressions like std::wregex('['+std::wstring(1000, 'a')+"b]") > but I'm not sure if it helps in the common case. Trust me, in common case, even it's a bit slower, it won't be observable. So it's just OK. > Finally, in the non-char case the _CacheT member is an unused empty > object, so having that as the first member requires 7 bytes of > padding. Re-ordering the members reduces the size of a non-char > _BracketMatcher by 8 bytes (but it's still a whopping 96 bytes). > > (For a char _BracketMatcher the bitset cache makes it 128 bytes, > this patch doesn't change that). This is OK too. Rant: I can't understand why the committee doesn't keep things simple by letting struct { } has 0 size. I used to implement my own tuple by specializing tuple<> as an empty struct. The thing is, one tuple shouldn't `has it` as a member, or it will get extra padding; at last I used inheritance. -- Regards, Tim Shen