Re: [HACKERS] next CommitFest
*snip* One pretty major fly in the ointment is that neither Hot Standby nor Streaming Replication has been committed or shows much sign of being about to be committed. I think this is bad. These are big features that figure to have some bugs and break some things. If they're not committed in time for alpha3, then there won't be any significant testing of these prior to alpha4/beta1, at the earliest. I think that's likely to lead to either (1) a very long beta period followed by a late release or (2) a buggy release. I feel like Simon Riggs and Fujii Masao really pulled out all the stops to get these ready in time for the September CommitFest, and while I'm not in a hurry to break the world, I think the sooner these can hit the tree, the better off we'll be in terms of releasing 8.5. Just my $0.02.

absolutely, we should commit this. we did some testing and things look stable. also, people would most likely want to build code on top of it to be ready for 8.5 (support scripts, etc.). this is important in order to create some acceptance in user land. this stuff seems mature and very well thought out. just my $0.02 ...

regards, hans-jürgen schönig -- Cybertec Schoenig & Schoenig GmbH Reyergasse 9 / 2 A-2700 Wiener Neustadt Web: www.postgresql-support.de

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] draft RFC: concept for partial, wal-based replication
hello ... as my day has worked out quite nicely up to now i thought to f... it up and post a new concept which has been requested by a customer. the goal is to collect some feedback, ideas and so on (not to be mixed up with flames). we have funding for this and we are trying to sort out how to do it the best way. comments are welcome ... note, this is a first draft i want to refine based on some comments. here we go ...

Partial WAL Replication for PostgreSQL:
---
As of now the PostgreSQL community has provided patches and functionalities which allow full WAL-based replication as well as hot standby. To extend this functionality and to make PostgreSQL even more suitable for enterprise computing than it is today, we have the commitment of a sponsor to fund partial replication for PostgreSQL 8.5 / 8.6. This is the first draft of a proposal to make partial WAL-based replication work and to provide an additional set of fancy features to the community, which has been waiting for real in-core replication for a decade or more.

Why partial replication? In some cases people have master servers which contain enormous amounts of data (XX TB or so). If more than just one replica of this data is needed, it might happen that different slaves are used for different purposes. This implies that not all data will be used by all machines. An example: Consider a server at a phone company collecting phone calls, billing data, and maybe network routing data. The data is used by different departments, and one machine is not enough to serve all three of them. With the new functionality proposed here we could make 3 replicas, each holding just the group of tables needed for a specific task, thus allowing people to buy cheaper hardware for slaves and use more machines instead.

Current status:
---
Hot standby and streaming replication have been a huge leap forward for the community, and what is proposed here will be an extension to those patches and functionalities.
This concept is NOT aimed to replace anything - it is mainly an addon.

Nodes and replication filters:
--
As of 8.4, standby systems are set up by creating an archive_command along with a base backup. Although this is easy to do, some users still reported difficulties due to a total misunderstanding of PITR. The idea is to add functionality to add slaves like this:

CREATE REPLICA node_name
    CONNECT FROM SLAVE 'connect_string'
    TRANSFER COMMAND 'command'
    [ USING replication_filter ];

'command' would be any shell script copying data from the local master to the new database node called node_name. Replication filters can be used to make X replicas contain the same tables. Filter sets can be created like this:

CREATE REPLICATION FILTER filter_name [ EMPTY | FULL ] [ INCLUDE | EXCLUDE CHANGES ];

Replication filters can be modified ...

ALTER REPLICATION FILTER filter_name RENAME TO new_filtername;
ALTER REPLICATION FILTER filter_name { ADD | REMOVE } { TABLE | INDEX | SEQUENCE } object;

Filter sets can be dropped like this ...

DROP REPLICATION FILTER filter_name;

Internally, CREATE REPLICA would initiate a base backup to the new slave server, just like we would do it manually otherwise. The server would automatically use the user-defined 'command' to copy one file after the other to the slave box. The idea is basically stolen from archive_command and friends. At this stage we either copy the entire instance, as we would do with a normal base backup, or just what is needed (defined by the replication filter). Users would automatically copy only the data to a slave which is really needed there and which matches their filter config. Once the copy is done, we can register the new node inside a system table and commit the transaction. Also, we can automatically create a useful recovery.conf setup - we know how to connect from the slave to the master (we can use 'CONNECT FROM SLAVE [ USING ]' to write a proper recovery.conf file).
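To make the proposed interface concrete, a hypothetical session might look like the sketch below. None of this syntax is implemented anywhere; node, filter, and object names as well as the transfer script path are purely illustrative:

```sql
-- build a filter that starts empty and replicates changes to the
-- objects we add to it (proposed syntax only, not implemented)
CREATE REPLICATION FILTER billing_set EMPTY INCLUDE CHANGES;

ALTER REPLICATION FILTER billing_set ADD TABLE t_billing;
ALTER REPLICATION FILTER billing_set ADD SEQUENCE t_billing_id_seq;

-- register a new slave; TRANSFER COMMAND is any shell script
-- copying base-backup files from the master to the slave
CREATE REPLICA billing_node
    CONNECT FROM SLAVE 'host=192.168.0.42 user=replica'
    TRANSFER COMMAND '/usr/local/bin/copy_to_billing_node.sh'
    USING billing_set;

-- later, tear everything down again
DROP REPLICA billing_node;
DROP REPLICATION FILTER billing_set;
```

The point of the sketch is the workflow: filters are defined and populated first, and CREATE REPLICA then performs the filtered base backup and registers the node in one transaction.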
Tables can easily be added to or removed from a replication filter with ALTER REPLICATION FILTER. Replicas can be removed easily:

DROP REPLICA node_name;

Why SQL to add a node? We are convinced that this is the simplest and most intuitive way of doing things; we believe it gives users a real feeling of simplicity. The current way of doing base backups should stay in place as it is - it has proven to be nice for countless tasks. However, it is not suitable for managing 10 or more replicas easily, especially not when they are not full-blown copies of the master.

Technical ideas:

System tables: We suggest always replicating the entire system catalog. It would be a total disaster to try some other implementation. The same applies for other tables - we always replicate entire tables; no WHERE-clauses allowed when it comes to
Re: [HACKERS] contrib/plantuner - enable PostgreSQL planner hints
hi there ... for this work i will include you in my evening prayers for at least one week. i know there has been a lot of discussion about this, but what you just posted is excellent and, more important, USEFUL to many people. i had something else in mind recently as well: virtual indexes. it would help people to decide whether an index would make sense if it actually existed. in some cases this would be valuable, as many datasets are just too big to try out whether an index helps. if there was a vote whether this should be in contrib or in core: +999 from me ... many thanks, hans

Oleg Bartunov wrote: Hi there, this is an announcement of our new contribution module for PostgreSQL - Plantuner - enable planner hints (http://www.sai.msu.su/~megera/wiki/plantuner). Example:

=# LOAD 'plantuner';
=# create table test(id int);
=# create index id_idx on test(id);
=# create index id_idx2 on test(id);
=# \d test
     Table "public.test"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Indexes:
    "id_idx" btree (id)
    "id_idx2" btree (id)
=# explain select id from test where id=1;
                              QUERY PLAN
----------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=4.34..15.03 rows=12 width=4)
   Recheck Cond: (id = 1)
   ->  Bitmap Index Scan on id_idx2  (cost=0.00..4.34 rows=12 width=0)
         Index Cond: (id = 1)
(4 rows)
=# set enable_seqscan=off;
=# set plantuner.forbid_index='id_idx2';
=# explain select id from test where id=1;
                              QUERY PLAN
----------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=4.34..15.03 rows=12 width=4)
   Recheck Cond: (id = 1)
   ->  Bitmap Index Scan on id_idx  (cost=0.00..4.34 rows=12 width=0)
         Index Cond: (id = 1)
(4 rows)
=# set plantuner.forbid_index='id_idx2,id_idx';
=# explain select id from test where id=1;
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on test  (cost=100.00..140.00 rows=12 width=4)
   Filter: (id = 1)
(2 rows)

Regards, Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83,
+007(495)939-23-83 -- Cybertec Schoenig & Schoenig GmbH Reyergasse 9 / 2 A-2700 Wiener Neustadt Web: www.postgresql-support.de
Re: [HACKERS] SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5
Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Jeff, Will statement_timeout not suffice for that use case? Well, currently statement_timeout doesn't affect waiting for locks. Sure it does. And as a DBA, I don't think I'd want the same timeout for executing queries as for waiting for a lock.

this is exactly the point - it is simply an additional use case. while statement_timeout is perfect to kick out queries which take too long, a lock_timeout serves a totally different purpose because you will get a totally different error message. imagine some old 4GL terminal application: in this case you will hardly reach a statement_timeout, because you will simply want to wait until things appear on your screen. however, you definitely don't want to wait forever if somebody keeps working on some product which is in stock and never finishes. btw, this old terminal application i was talking about is exactly the use case we had - this is why this patch has been made. we are porting roughly 2500 terminal applications from informix to postgresql. we are talking about entire factory production lines and so on here (the ECPG patches posted recently are for the same project, btw.). there are countless use cases where you want to know whether you are locked out or whether you are just taking too long - the message is totally different. the goal of the patch is to have a mechanism to make sure that you don't starve to death. as far as syntax is concerned: there are good reasons for WAIT and good reasons for a GUC. while the WAIT syntax is clearly a very precise instruction for a very certain place in a program, a GUC is a more overall policy. i don't see a reason why we should not have both anyway. a GUC has the charm that it can be assigned to roles, procedures, etc. nicely; a WAIT clause has the charm of being incredibly precise. i can see good arguments for both.
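to make the distinction concrete, a small sketch. the WAIT clause below is the proposed syntax from this thread and does not exist in core, a session-level lock_timeout GUC is likewise hypothetical at this point, and the table/role names are made up:

```sql
-- statement_timeout cancels any statement that runs too long,
-- whether the time is spent computing or waiting:
SET statement_timeout = '5s';

-- proposed per-statement form: give up only after 5 seconds of
-- waiting for the row lock, with a lock-specific error message
SELECT * FROM t_stock WHERE product_id = 42 FOR UPDATE WAIT 5;

-- proposed GUC form: an overall policy, nicely assignable to roles
-- ALTER ROLE terminal_user SET lock_timeout = '5s';
```

the error the client sees is the whole point: a lock timeout tells the 4GL application "someone else holds this row", while a statement timeout only says "this took too long".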
the code itself is pretty simple - it needs no effort to keep up to date and it does not harm anything else - it is pretty isolated. many thanks, hans -- Cybertec Schoenig & Schoenig GmbH Reyergasse 9 / 2 A-2700 Wiener Neustadt Web: www.postgresql-support.de
Re: [HACKERS] SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5
Jeff Janes wrote: Will statement_timeout not suffice for that use case? we tried to get around it without actually touching the core, but we really need this functionality. patching the core is not our primary desire here. it is all about modeling some functionality which was truly missing. many thanks, hans -- Cybertec Schoenig & Schoenig GmbH Reyergasse 9 / 2 A-2700 Wiener Neustadt Web: www.postgresql-support.de
[HACKERS] happy birthday Tom Lane ...
Tom, On behalf of the entire PostgreSQL team here in Austria I want to wish you a happy birthday. We hope that you will be a vital part of PostgreSQL for many years to come. Best regards, Hans-Jürgen Schönig + team -- Cybertec Schoenig & Schoenig GmbH Reyergasse 9 / 2 A-2700 Wiener Neustadt Web: www.postgresql-support.de
[HACKERS] combined indexes with Gist - planner issues?
hello everybody, we are seriously fighting with some planner issue which seems to be slightly obscure to us. we have a table which is nicely indexed (several GB in size). i am using btree_gist operator classes to build a combined index including an FTI expression along with a number:

db=# \d product.t_product
                      Table "product.t_product"
    Column     |     Type      |                           Modifiers
---------------+---------------+----------------------------------------------------------------
 id            | bigint        | not null default nextval('product.t_product_id_seq'::regclass)
 shop_id       | integer       |
 art_number    | text          |
 title         | text          |
 description   | text          |
 display_price | numeric(10,4) |
Indexes:
    "t_product_pkey" PRIMARY KEY, btree (id)
    "idx_test" gist (display_price, to_tsvector('german'::regconfig, (title || ' '::text) || description))
    "idx_test2" gist (to_tsvector('german'::regconfig, (title || ' '::text) || description), display_price)

what we basically expected here is that postgres will scan the table using the index to give us the cheapest products containing the words we are looking for. i am totally surprised to see that we have to fetch all products matching the words, sort, and then do the limit. this totally kills performance, because some words simply show up millions of times.
the plans look like this:

db=# explain analyze SELECT art_number, title FROM product.t_product WHERE to_tsvector('german'::regconfig, (title || ' '::text) || description) @@ plainto_tsquery('harddisk') ORDER BY display_price LIMIT 10;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=108340.08..108340.10 rows=10 width=54) (actual time=1328.900..1328.909 rows=10 loops=1)
   ->  Sort  (cost=108340.08..108422.48 rows=32961 width=54) (actual time=1328.899..1328.905 rows=10 loops=1)
         Sort Key: display_price
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Bitmap Heap Scan on t_product  (cost=2716.62..107627.80 rows=32961 width=54) (actual time=1052.706..1328.772 rows=55 loops=1)
               Recheck Cond: (to_tsvector('german'::regconfig, ((title || ' '::text) || description)) @@ plainto_tsquery('harddisk'::text))
               ->  Bitmap Index Scan on idx_test2  (cost=0.00..2708.38 rows=32961 width=0) (actual time=1052.576..1052.576 rows=55 loops=1)
                     Index Cond: (to_tsvector('german'::regconfig, ((title || ' '::text) || description)) @@ plainto_tsquery('harddisk'::text))
 Total runtime: 1328.942 ms
(9 rows)

runtime increases badly if words start to be more likely ...
db=# explain analyze SELECT art_number, title FROM product.t_product WHERE to_tsvector('german'::regconfig, (title || ' '::text) || description) @@ plainto_tsquery('spiel') ORDER BY display_price LIMIT 10;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=108340.08..108340.10 rows=10 width=54) (actual time=33489.675..33489.682 rows=10 loops=1)
   ->  Sort  (cost=108340.08..108422.48 rows=32961 width=54) (actual time=33489.675..33489.675 rows=10 loops=1)
         Sort Key: display_price
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Bitmap Heap Scan on t_product  (cost=2716.62..107627.80 rows=32961 width=54) (actual time=774.923..33408.522 rows=56047 loops=1)
               Recheck Cond: (to_tsvector('german'::regconfig, ((title || ' '::text) || description)) @@ plainto_tsquery('spiel'::text))
               ->  Bitmap Index Scan on idx_test2  (cost=0.00..2708.38 rows=32961 width=0) (actual time=759.078..759.078 rows=56047 loops=1)
                     Index Cond: (to_tsvector('german'::regconfig, ((title || ' '::text) || description)) @@ plainto_tsquery('spiel'::text))
 Total runtime: 33489.906 ms
(9 rows)

i am wondering why postgres is not able to use a combined index here. is this some obscure thing related to gist, a logical problem, or a planner deficiency? ideas are welcome. many thanks, hans -- Cybertec Schoenig & Schoenig GmbH Reyergasse 9 / 2 A-2700 Wiener Neustadt Web: www.postgresql-support.de
Re: [HACKERS] combined indexes with Gist - planner issues?
Tom Lane wrote: Hans-Juergen Schoenig -- PostgreSQL postg...@cybertec.at writes: what we basically expected here is that Postgres will scan the table using the index to give us the cheapest products containing the words we are looking for. i am totally surprised to see that we have to fetch all products given the words, sort and then do the limit. I don't know why you'd find that surprising. GIST indexes have no support for ordering. regards, tom lane

ok, i thought it would be something gist-specific i was not aware of. the golden question now is: i am looking for the cheapest products given a certain text in an insane amount of data. how to do it? other quals which could narrow down the amount of data would not help. i cannot see an option with regular weapons ... maybe you have an idea how to fix core to make it work? maybe there is a mechanism we would need. we really have to make this work - no matter what it takes. we are willing to put effort into that. many thanks, hans -- Cybertec Schoenig & Schoenig GmbH Reyergasse 9 / 2 A-2700 Wiener Neustadt Web: www.postgresql-support.de
Re: [HACKERS] Bison crashes postgresql
Andrew Dunstan wrote: Werner Echezuria wrote: Hi, I have some code in which I translate sqlf to sql, but when it comes to yy_parse the server crashes, and I have no idea why, because it works fine in other situations. I don't understand why you're doing what you're doing this way. Wouldn't it be better to patch the main postgres parser and make your functionality first class, rather than having it run via an SQL string and a function that calls a secondary parser? cheers andrew

yes, this is the thing i had in mind as well. what is your ultimate goal? many thanks, hans -- Cybertec Schoenig & Schoenig GmbH Reyergasse 9 / 2 A-2700 Wiener Neustadt Web: www.postgresql-support.de
Re: [HACKERS] combined indexes with Gist - planner issues?
hello ... we did some experiments with such a table. the problem is that we want to allow arbitrary combinations of words, which can be modeled perfectly with FTI. you would instantly end up with a self join over 5 relations or so - which is again bad. there are too many common words to consider handling them partly with gist and partly with a btree. is there any option to adapt gist in a way that a combined index would make sense here? many thanks, hans

Heikki Linnakangas wrote: Hans-Juergen Schoenig -- PostgreSQL wrote: my knowledge of how gist works internally is not too extensive. any kickstart idea would be appreciated. If there's not too many of those common words, you can create a simple partial b-tree index for each, and handle the less common words with the gist index you have (you can drop the display_price column from the index). Another idea: Create a table containing one row for each word in each product:

CREATE TABLE t_product_word (id bigint, word text, display_price numeric(10,4));

with triggers to keep it up-to-date. You can then create a regular two-column b-tree index on that:

CREATE INDEX idx_word_price ON t_product_word (word, display_price);

And query with:

SELECT p.art_number, p.title
FROM t_product p
INNER JOIN t_product_word pw ON p.id = pw.id
WHERE pw.word = 'harddisk'
ORDER BY pw.display_price DESC
LIMIT 10;

The t_product_word table will be huge, but with a few gigabytes of data it should still be manageable.

-- Cybertec Schoenig & Schoenig GmbH Reyergasse 9 / 2 A-2700 Wiener Neustadt Web: www.postgresql-support.de
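the thread only names the t_product_word table; the maintenance triggers Heikki mentions are not spelled out. a rough, untested sketch of what they might look like - the function and trigger names are made up, and extracting lexemes via ts_stat() over a one-row query is just one possible approach:

```sql
-- keep t_product_word in sync with t_product (a sketch, assumptions above)
CREATE OR REPLACE FUNCTION product_word_sync() RETURNS trigger AS $$
BEGIN
    IF TG_OP <> 'INSERT' THEN
        DELETE FROM t_product_word WHERE id = OLD.id;
    END IF;
    IF TG_OP <> 'DELETE' THEN
        -- one row per distinct lexeme of this product
        INSERT INTO t_product_word (id, word, display_price)
        SELECT NEW.id, word, NEW.display_price
        FROM ts_stat('SELECT to_tsvector(''german'', '
                     || quote_literal(NEW.title || ' ' || NEW.description)
                     || ')');
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER product_word_sync
    AFTER INSERT OR UPDATE OR DELETE ON t_product
    FOR EACH ROW EXECUTE PROCEDURE product_word_sync();
```

the delete-then-reinsert approach is crude but keeps the logic simple; on a heavily updated catalog one would want something smarter.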
[HACKERS] tsvector extraction patch
hello, this patch has not made it through yesterday, so i am trying to send it again. i made a small patch which i found useful for my personal tasks. it would be nice to see this in 8.5. if not core then maybe contrib. it transforms a tsvector to table format, which is really nice for text processing and comparison.

test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure this is a good patch'));
  lex   | rank
--------+------
 good   |    8
 patch  |    9
 pretti |    3
 sure   |    4
(4 rows)

many thanks, hans -- Cybertec Schoenig & Schoenig GmbH Reyergasse 9 / 2 A-2700 Wiener Neustadt Web: www.postgresql-support.de
Re: [HACKERS] tsvector extraction patch
Hans-Juergen Schoenig -- PostgreSQL wrote: hello, this patch has not made it through yesterday, so i am trying to send it again. i made a small patch which i found useful for my personal tasks. it would be nice to see this in 8.5. if not core then maybe contrib. it transforms a tsvector to table format, which is really nice for text processing and comparison.

test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure this is a good patch'));
  lex   | rank
--------+------
 good   |    8
 patch  |    9
 pretti |    3
 sure   |    4
(4 rows)

many thanks, hans -- Cybertec Schoenig & Schoenig GmbH Reyergasse 9 / 2 A-2700 Wiener Neustadt Web: www.postgresql-support.de

diff -dcrpN postgresql-8.4.0.old/contrib/Makefile postgresql-8.4.0/contrib/Makefile
*** postgresql-8.4.0.old/contrib/Makefile	2009-03-26 00:20:01.0 +0100
--- postgresql-8.4.0/contrib/Makefile	2009-06-29 11:03:04.0 +0200
*** WANTED_DIRS = \
*** 39,44 ****
--- 39,45 ----
  	tablefunc \
  	test_parser \
  	tsearch2 \
+ 	tsvcontent \
  	vacuumlo

  ifeq ($(with_openssl),yes)
diff -dcrpN postgresql-8.4.0.old/contrib/tsvcontent/Makefile postgresql-8.4.0/contrib/tsvcontent/Makefile
*** postgresql-8.4.0.old/contrib/tsvcontent/Makefile	1970-01-01 01:00:00.0 +0100
--- postgresql-8.4.0/contrib/tsvcontent/Makefile	2009-06-29 11:20:21.0 +0200
***************
*** 0 ****
--- 1,19 ----
+ # $PostgreSQL: pgsql/contrib/tablefunc/Makefile,v 1.9 2007/11/10 23:59:51 momjian Exp $
+ 
+ MODULES = tsvcontent
+ DATA_built = tsvcontent.sql
+ DATA = uninstall_tsvcontent.sql
+ 
+ 
+ SHLIB_LINK += $(filter -lm, $(LIBS))
+ 
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/tsvcontent
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
diff -dcrpN postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.c postgresql-8.4.0/contrib/tsvcontent/tsvcontent.c
*** postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.c	1970-01-01 01:00:00.0 +0100
--- postgresql-8.4.0/contrib/tsvcontent/tsvcontent.c	2009-06-29 11:18:35.0 +0200
***************
*** 0 ****
--- 1,169 ----
+ #include "postgres.h"
+ 
+ #include "fmgr.h"
+ #include "funcapi.h"
+ #include "miscadmin.h"
+ #include "executor/spi.h"
+ #include "lib/stringinfo.h"
+ #include "nodes/nodes.h"
+ #include "utils/builtins.h"
+ #include "utils/lsyscache.h"
+ #include "utils/syscache.h"
+ #include "utils/memutils.h"
+ #include "tsearch/ts_type.h"
+ #include "tsearch/ts_utils.h"
+ #include "catalog/pg_type.h"
+ 
+ #include "tsvcontent.h"
+ 
+ PG_MODULE_MAGIC;
+ 
+ PG_FUNCTION_INFO_V1(tsvcontent);
+ 
+ Datum
+ tsvcontent(PG_FUNCTION_ARGS)
+ {
+ 	FuncCallContext *funcctx;
+ 	TupleDesc	ret_tupdesc;
+ 	AttInMetadata *attinmeta;
+ 	int			call_cntr;
+ 	int			max_calls;
+ 	ts_to_txt_fctx *fctx;
+ 	Datum		result[2];
+ 	bool		isnull[2] = { false, false };
+ 	MemoryContext oldcontext;
+ 
+ 	/* input value containing the TS vector */
+ 	TSVector	in = PG_GETARG_TSVECTOR(0);
+ 
+ 	/* stuff done only on the first call of the function */
+ 	if (SRF_IS_FIRSTCALL())
+ 	{
+ 		TupleDesc	tupdesc;
+ 		int			i, j;
+ 		char	   *wepv_base;
+ 
+ 		/* create a function context for cross-call persistence */
+ 		funcctx = SRF_FIRSTCALL_INIT();
+ 
+ 		/*
+ 		 * switch to memory context appropriate for multiple function calls
+ 		 */
+ 		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+ 
+ 		switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+ 		{
+ 			case TYPEFUNC_COMPOSITE:
+ 				/* success */
+ 				break;
+ 			case TYPEFUNC_RECORD:
+ 				/* failed to determine actual type of RECORD */
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 						 errmsg("function returning record called in context "
+ 								"that cannot accept type record")));
+ 				break;
+ 			default:
+ 				/* result type isn't composite */
+ 				elog(ERROR, "return type must be a row type");
+ 				break;
+ 		}
+ 
+ 		/* make sure we have a persistent copy of the tupdesc */
+ 		tupdesc = CreateTupleDescCopy(tupdesc);
+ 
+ 		/*
+ 		 * Generate attribute metadata needed later to produce tuples from raw
+ 		 * C strings
+ 		 */
+ 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ 		funcctx->attinmeta = attinmeta;
+ 
+ 		/* allocate memory */
+ 		fctx = (ts_to_txt_fctx *) palloc(sizeof(ts_to_txt_fctx));
+ 
+ 		wepv_base = (char *) in + offsetof(TSVectorData, entries) + in->size * sizeof(WordEntry);
+ 
+ 		fctx->n_tsvt = 0;
+ 		for (i = 0; i < in->size; i++)
+ 		{
+ 			if (in->entries[i].haspos)
+ 			{
+ 				WordEntryPosVector *wepv = (WordEntryPosVector *)
+ 					(wepv_base + in->entries[i].pos + SHORTALIGN(in->entries[i].len));
+ 
+ 				fctx->n_tsvt += wepv->npos;
+ 			}
+ 			else
+ 				fctx->n_tsvt++;
+ 		}
+ 
+ 		fctx->tsvt = palloc(fctx->n_tsvt * sizeof(tsvec_tuple));
+ 
+ 		for (i = 0, j = 0; i < in->size; i++)
+ 		{
+ 			int			pos = in->entries[i].pos;
+ 			int			len = in->entries[i].len;
+ 
+ 			if (in->entries[i].haspos)
+ 			{
+ 				WordEntryPosVector *wepv = (WordEntryPosVector *)