On Thu, Mar 14, 2013 at 9:40 PM, Alexander Korotkov <aekorot...@gmail.com> wrote:

> On Wed, Jan 23, 2013 at 7:29 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> Heikki Linnakangas <hlinnakan...@vmware.com> writes:
>> > On 23.01.2013 09:36, Alexander Korotkov wrote:
>> >> On Wed, Jan 23, 2013 at 6:08 AM, Tom Lane<t...@sss.pgh.pa.us>  wrote:
>> >>> The biggest problem is that I really don't care for the idea of
>> >>> contrib/pg_trgm being this cozy with the innards of regex_t.
>>
>> >> The only option I see now is to provide a method like "export_cnfa"
>> >> which would export corresponding CNFA in fixed format.
>>
>> > Yeah, I think that makes sense. The transformation code in trgm_regexp.c
>> > would probably be more readable too, if it didn't have to deal with the
>> > regex guts representation of the CNFA. Also, once you have intermediate
>> > representation of the original CNFA, you could do some of the
>> > transformation work on that representation, before building the
>> > "transformed graph" containing trigrams. You could eliminate any
>> > non-alphanumeric characters, joining states connected by arcs with
>> > non-alphanumeric characters, for example.
>>
>> It's not just the CNFA though; the other big API problem is with mapping
>> colors back to characters.  Right now, that not only knows way too much
>> about a part of the regex internals we have ambitions to change soon,
>> but it also requires pg_wchar2mb_with_len() and lowerstr(), neither of
>> which should be known to the regex library IMO.  So I'm not sure how we
>> divvy that up sanely.  To be clear: I'm not going to insist that we have
>> to have a clean API factorization before we commit this at all.  But it
>> worries me if we don't even know how we could get to that, because we
>> are going to need it eventually.
>>
>
> Now I have the following idea about the API.
> Put the code of stage 2 (transform the original CNFA into an automaton-like
> graph) into the regex engine. It would use an API which describes what exactly
> we are going to extract from the CNFA. This API could look like this.
>
> typedef char *Prefix;
> typedef char *ArcLabel;
>
> typedef struct
> {
>     Prefix newPrefix;
>     ArcLabel label;
> } ArcInfo;
>
> typedef struct
> {
>     Prefix (*getInitialPrefix) ();
>     bool (*prefixContains) (Prefix prefix1, Prefix prefix2);
>     Prefix * (*getPrefixes) (Prefix prefix, color c, int *n);
>     ArcInfo * (*getArcs) (Prefix prefix, color c, int *n);
>     void (*freePrefix) (Prefix prefix);
>     void (*freeArcLabel) (ArcLabel arcLabel);
> } CFNATransformAPI;
>
> getInitialPrefix returns the initial prefix value, as this code does now:
> > initkey.prefix.colors[0] = UNKNOWN_COLOR;
> > initkey.prefix.colors[1] = UNKNOWN_COLOR;
> prefixContains is exactly the same as the existing function of that name.
> getPrefixes and getArcs do the work of one loop step of addKeys and addArcs.
> freePrefix and freeArcLabel free the memory used by Prefix and ArcLabel
> structures.
>
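
To make the shape of this callback table more concrete, here is a minimal
sketch of how a caller might fill it in. It is not the code from the patch.
The "toy" prefix encoding (two bytes, '?' meaning unknown), the toy* function
names, and the typedef of color are assumptions made only for illustration.
getPrefixes and getArcs are left NULL because they depend on the CNFA itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef short color;            /* assumption: stand-in for the regex engine's color type */

typedef char *Prefix;
typedef char *ArcLabel;

typedef struct
{
    Prefix      newPrefix;
    ArcLabel    label;
} ArcInfo;

typedef struct
{
    Prefix      (*getInitialPrefix) (void);
    bool        (*prefixContains) (Prefix prefix1, Prefix prefix2);
    Prefix     *(*getPrefixes) (Prefix prefix, color c, int *n);
    ArcInfo    *(*getArcs) (Prefix prefix, color c, int *n);
    void        (*freePrefix) (Prefix prefix);
    void        (*freeArcLabel) (ArcLabel arcLabel);
} CFNATransformAPI;

/* Hypothetical toy encoding: a prefix is two bytes, '?' means "unknown". */
#define TOY_UNKNOWN '?'

/* Return a freshly allocated prefix with both positions unknown. */
static Prefix
toyGetInitialPrefix(void)
{
    Prefix      p = malloc(3);

    if (p == NULL)
        return NULL;
    p[0] = TOY_UNKNOWN;
    p[1] = TOY_UNKNOWN;
    p[2] = '\0';
    return p;
}

/* Does prefix1 cover prefix2?  Unknown positions match anything. */
static bool
toyPrefixContains(Prefix prefix1, Prefix prefix2)
{
    assert(prefix1 != NULL && prefix2 != NULL);

    if (prefix1[1] == TOY_UNKNOWN)
        return true;            /* fully unknown prefix covers everything */
    if (prefix1[0] == TOY_UNKNOWN)
        return prefix1[1] == prefix2[1];
    return prefix1[0] == prefix2[0] && prefix1[1] == prefix2[1];
}

/* Release a prefix allocated by toyGetInitialPrefix. */
static void
toyFreePrefix(Prefix prefix)
{
    free(prefix);
}

/* Callback table; the CNFA-dependent members are deliberately left unset. */
static const CFNATransformAPI toyApi = {
    .getInitialPrefix = toyGetInitialPrefix,
    .prefixContains = toyPrefixContains,
    .getPrefixes = NULL,
    .getArcs = NULL,
    .freePrefix = toyFreePrefix,
    .freeArcLabel = NULL
};
```

The point of the table is that the regex engine would only ever call through
these pointers, so it would not need to know how a Prefix is laid out.
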
> Additionally, the regex engine should provide a correct way to examine the
> colormap.
> int getColorCharsCount(colormap *cm, color c);
> pg_wchar *getColorChars(colormap *cm, color c);
> getColorCharsCount would return -1 if this color should be considered
> unexpandable.
>
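
As an illustration of how a caller might use the -1 convention, here is a
small self-contained sketch. It does not touch the real colormap. ToyColormap,
ToyColorEntry, and the toy* functions are hypothetical stand-ins, and the
typedefs of color and pg_wchar are assumptions. Treating an out-of-range color
as unexpandable is also my assumption, not something stated above.

```c
#include <stddef.h>

typedef short color;            /* assumption: stand-in for the regex color type */
typedef unsigned int pg_wchar;  /* assumption: stand-in for the wide char type */

#define TOY_MAX_CHARS 4

/* One color: its character count (-1 = unexpandable) and its characters. */
typedef struct
{
    int         count;
    pg_wchar    chars[TOY_MAX_CHARS];
} ToyColorEntry;

/* Hypothetical colormap: just an array of per-color entries. */
typedef struct
{
    int         ncolors;
    const ToyColorEntry *entries;
} ToyColormap;

/* Number of characters of color c, or -1 if c must not be expanded. */
static int
toyGetColorCharsCount(const ToyColormap *cm, color c)
{
    if (c < 0 || c >= cm->ncolors)
        return -1;              /* assumption: unknown colors are unexpandable */
    return cm->entries[c].count;
}

/* Characters of color c, or NULL if the color is empty or unexpandable. */
static const pg_wchar *
toyGetColorChars(const ToyColormap *cm, color c)
{
    if (toyGetColorCharsCount(cm, c) <= 0)
        return NULL;
    return cm->entries[c].chars;
}

/* Caller side: count characters over all colors, skipping unexpandable ones. */
static int
toyTotalExpandableChars(const ToyColormap *cm)
{
    int         total = 0;
    color       c;

    for (c = 0; c < cm->ncolors; c++)
    {
        int         n = toyGetColorCharsCount(cm, c);

        if (n < 0)
            continue;           /* unexpandable: the caller must not enumerate it */
        total += n;
    }
    return total;
}
```

A caller would check the count first and only ask for the characters when it
is not negative.
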

Now I have a working implementation of this API. Comments still need rework.
Could you give me any feedback?

------
With best regards,
Alexander Korotkov.

Attachment: trgm-regexp-0.13.patch.gz
Description: GNU Zip compressed data
