Michael Paesold <[EMAIL PROTECTED]> writes:
> After reading the discussion and the introduction, here is what I think
> tsearch in core should at least accomplish in 8.3.
> ...
> - Stop words in tables, not in external files.

I realized that there's a pretty serious problem with doing that, which
is encoding. We don't have any way to deal with preloaded catalog data
that exceeds 7-bit-ASCII, because when you do CREATE DATABASE ... ENCODING
it's going to be copied over exactly as-is. And there's plenty of
non-ASCII stuff in the non-English stopword files. This is something we
need to solve eventually, but I think it ties into the whole multiple
locale can-of-worms; there's no way we're getting it done for 8.3.

So I'm afraid we have to settle for stop words in external files for the
moment. I do have two suggestions though:

* Let's have just one stopword file for each language, with the
convention that the file is stored in UTF8 no matter what language
you're talking about. We can have the stopword reading code convert
to the database encoding on-the-fly when it reads the file. Without
this there's just a whole bunch of foot-guns there. We'd at least need
to have encoding verification checks when reading the files, which seems
hardly cheaper than just translating the data.

* Let's fix it so the reference to the stoplist in the user-visible
options is just a name, with no path or anything like that. (Similar to
the handling of timezone_abbreviations.) Then it will be feasible to
re-interpret the option as a reference to a named list in a catalog
someday, when we solve the encoding problem. Right now the patch has
things like

+ DATA(insert OID = 5140 ( "ru_stem_koi8" PGNSP PGUID 5135 5137 "dicts_data/russian.stop.koi8"));

which is really binding the option pretty tightly to being a filename;
not to mention the large security risks involved in letting anyone but a
superuser have control of such an option.

> What I don't really like is the number of commands introduced without
> any strong reference to full text search. E.g. CREATE CONFIGURATION
> gives no hint at all that this is about full text search.

Yeah.
We had some off-list discussion about this and concluded that
TEXT SEARCH seemed to be the right phrase to use in the command names.
That hasn't gotten reflected into the patch yet.

			regards, tom lane
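
For illustration only, the two suggestions above (one UTF8 stopword file
per language, converted to the database encoding as it is read, and an
option that is a bare name rather than a path) might be sketched roughly
as follows. This is Python rather than backend C, and the function names,
the ".stop" suffix, the allowed-name pattern and the base directory are
all assumptions made for the sketch; none of them come from the patch.

```python
import re
from pathlib import Path

# Hypothetical rule: a stoplist name is lowercase letters, digits and
# underscores only, so it can never contain a path separator or "..".
_NAME_RE = re.compile(r"[a-z0-9_]+")


def stopword_path(base_dir, name):
    """Map a bare stoplist name to a file under base_dir."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid stoplist name: {name!r}")
    # The ".stop" suffix is an assumed convention for this sketch.
    return Path(base_dir) / f"{name}.stop"


def load_stopwords(base_dir, name, db_encoding):
    """Read a UTF-8 stopword file; return the words as bytes in db_encoding."""
    path = stopword_path(base_dir, name)
    # Decoding as UTF-8 doubles as the encoding verification check:
    # malformed input raises UnicodeDecodeError.
    text = path.read_bytes().decode("utf-8")
    words = []
    for line in text.splitlines():
        word = line.strip()
        if word:
            # Raises UnicodeEncodeError if the word has no representation
            # in the target (database) encoding.
            words.append(word.encode(db_encoding))
    return words
```

With this shape, a call such as load_stopwords(sharedir, "russian", "koi8_r")
would serve a KOI8-R database from the same file that a UTF8 database
reads, and a name like "../russian" is rejected before any file is opened.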