[HACKERS] Latest on CITEXT 2.0

David E. Wheeler Wed, 25 Jun 2008 11:48:26 -0700

Howdy,

I just wanted to report the latest on my pet project: implementing a new case-insensitive text type, "citext", to be locale-aware and to build and run on PostgreSQL 8.3. I'm not much of a C programmer (this is only the second time I've written *anything* in C), so I also have a few questions about my code, best practices, coverage, etc. You can grab the latest here:


  https://svn.kineticode.com/citext/trunk/

BTW, the tests in sql/citext.sql use the pgtap.sql file to run TAP regression tests, so you can run them using `make installcheck` or `make test`. The latter requires that pg_prove be installed; you can get it here:


  https://svn.kineticode.com/pgtap/trunk/

Anyway, I think I've got it pretty close to done. The tests cover a lot of stuff -- nearly everything I could figure out, anyway. But there are a few gaps.

As a result, I'd appreciate a little help with these questions, all in the name of making this a solid data type suitable for use on production systems:

* There still seem to be some implicit casts to text that I'd like to duplicate. For example, `select '192.168.1.2'::cidr::text;` works, but `select '192.168.1.2'::cidr::citext;` does not. Where can I find the C functions that do these casts for TEXT so that I can put them to work for citext, too? The internal cast functions used in the old citext distribution don't exist at all on 8.3.

* There are casts from text that I'd also like to harness for use by citext, like `cidr(text)`. Where can I find these C functions as well? (The upshot of this and the previous point is that I'd like citext to be as compatible with TEXT as possible, and I just need to figure out how to fill in the gaps in that compatibility.)

* Regular expression and LIKE comparisons using the operators properly work case-insensitively, but functions like replace() and regexp_replace() do not. Should they? And if so, how can I make them do so?
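If citext's replace() were made case-insensitive, one plausible semantics is: match the search string ignoring case, but copy the replacement through verbatim. What follows is a minimal standalone sketch of that idea in plain C, not the citext implementation. `ci_replace()` and `ci_prefix()` are hypothetical names, and because the sketch uses byte-wise tolower() it is neither multibyte-aware nor fully locale-aware.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: do the n bytes at hay equal needle, ignoring case?
 * Stops early at a mismatch, so it never reads past the end of hay. */
static int ci_prefix(const char *hay, const char *needle, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char) hay[i]) != tolower((unsigned char) needle[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical ci_replace(): returns a malloc'd copy of src in which every
 * case-insensitive match of from is replaced by to. Caller frees. */
char *ci_replace(const char *src, const char *from, const char *to)
{
    size_t src_len = strlen(src), from_len = strlen(from), to_len = strlen(to);

    assert(from_len > 0);

    /* First pass: count non-overlapping matches to size the output. */
    size_t count = 0;
    for (size_t i = 0; i + from_len <= src_len; ) {
        if (ci_prefix(src + i, from, from_len)) {
            count++;
            i += from_len;
        } else {
            i++;
        }
    }

    char *out = malloc(src_len - count * from_len + count * to_len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy src, substituting to for each match. */
    size_t i = 0, j = 0;
    while (i < src_len) {
        if (i + from_len <= src_len && ci_prefix(src + i, from, from_len)) {
            memcpy(out + j, to, to_len);
            j += to_len;
            i += from_len;
        } else {
            out[j++] = src[i++];
        }
    }
    out[j] = '\0';
    return out;
}
```

For example, `ci_replace("Hello hello HELLO", "hello", "bye")` yields "bye bye bye". Matches are consumed left to right without overlapping, so `ci_replace("aAa", "aa", "b")` gives "ba".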

* The tests assume that LC_COLLATE is set to en_US.UTF-8. Does that work well for standard PostgreSQL regression tests? How are locale-sensitive tests run in core regression tests?

* As for my C programming, well, what's broken? I'm especially concerned that I pfree variables appropriately, but I'm not at all clear on what needs to be freed. Martijn mentioned before that btree comparison functions free memory, but I'm such a C n00b that I don't know what that actually means for my implementation. I'd actually appreciate a bit of pedantry here. :-)
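On the pfree question, the usual expectation (stated here as an understanding, not an authoritative rule) is that a btree comparison function can be called a great many times in the same memory context during a sort or index build, so it should not leak: whatever a call allocates is released before it returns. Below is a minimal standalone sketch of that discipline in plain C, with malloc()/free() standing in for palloc()/pfree() as an assumption so that it compiles outside the backend. `ci_lower_copy()` and `ci_cmp()` are hypothetical names, and byte-wise tolower() is a simplification of a real locale-aware cilower().

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for cilower(): returns a malloc'd, lowercased,
 * nul-terminated copy of s. Byte-at-a-time, so not multibyte-aware. */
static char *ci_lower_copy(const char *s)
{
    size_t len = strlen(s);
    char *out = malloc(len + 1);

    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        out[i] = (char) tolower((unsigned char) s[i]);
    out[len] = '\0';
    return out;
}

/* Hypothetical three-way comparison in the style of a btree support
 * function: negative, zero, or positive. Every allocation it makes is
 * freed before it returns, so repeated calls do not accumulate memory. */
int ci_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    char *la = ci_lower_copy(a);
    char *lb = ci_lower_copy(b);

    if (la == NULL || lb == NULL) {
        free(la);               /* free(NULL) is a no-op */
        free(lb);
        abort();                /* palloc() would raise an error itself */
    }

    /* strcoll() honours LC_COLLATE, which is what makes ordering locale-aware. */
    int result = strcoll(la, lb);

    free(la);                   /* backend equivalent: pfree(la) */
    free(lb);                   /* backend equivalent: pfree(lb) */
    /* In the backend, detoasted argument copies are typically released too,
     * e.g. with PG_FREE_IF_COPY(). */
    return result;
}
```

Because strcoll() follows LC_COLLATE, the ordering `ci_cmp()` reports for non-ASCII input can change with the locale; under the default "C" locale it behaves like strcmp() on the lowered strings.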

* Am I in fact getting an appropriate nul-terminated string in my cilower() function using this code?


    char * str  = DatumGetCString(
        DirectFunctionCall1( textout, PointerGetDatum( arg ) )
    );
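For context on that question: a text value is a length-prefixed varlena with no trailing nul, so converting it to a C string means copying the bytes and appending a terminator. textout does exactly that, returning a freshly palloc'd, nul-terminated copy that the caller may pfree once finished with it. The standalone sketch below models that one step; `fake_text` and `fake_text_to_cstring()` are hypothetical, and the struct is deliberately simplified (a real varlena keeps its length in the header, read with macros such as VARSIZE()).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of a text value: an explicit byte count
 * plus the bytes themselves, with NO trailing '\0'. */
typedef struct {
    size_t len;
    const char *data;
} fake_text;

/* Sketch of what a textout-style conversion must do: copy exactly len
 * bytes into a new buffer and append the terminator. Caller frees. */
char *fake_text_to_cstring(const fake_text *t)
{
    assert(t != NULL);

    char *out = malloc(t->len + 1);   /* +1 for the terminating '\0' */
    if (out == NULL)
        return NULL;
    memcpy(out, t->data, t->len);
    out[t->len] = '\0';
    return out;
}
```

Only `len` bytes are copied, so any bytes beyond the value (as in a larger shared buffer) never leak into the resulting C string.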

Those are all the questions I had about my implementation. I'd like to get this thing done and released soon, so that I can be done with this particular Yak and get back to what I'm *supposed* to be doing with my time.

BTW, would there be any interest in this code going into contrib/ in the distribution? I think that, if we can ensure that it works just like LOWER() = LOWER(), but without requiring that code, then it would be a great type to point people to instead of that SQL hack (with all the usual caveats about it being locale-sensitive and not canonically case-insensitive in the Unicode sense). If so, I'd be happy to make whatever changes are necessary to make it fit in with the coding and organization standards of the core and to submit it.


But please, don't expect a civarchar type from me anytime soon. ;-)

Many thanks,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)