Re: [HACKERS] next CommitFest

2009-11-09 Thread Hans-Juergen Schoenig -- PostgreSQL

*snip*


One pretty major fly in the ointment is that neither Hot Standby nor
Streaming Replication has been committed or shows much sign of being
about to be committed.  I think this is bad.  These are big features
that figure to have some bugs and break some things.  If they're not
committed in time for alpha3, then there won't be any significant
testing of these prior to alpha4/beta1, at the earliest.  I think
that's likely to lead to either (1) a very long beta period followed
by a late release or (2) a buggy release.  I feel like Simon Riggs and
Fujii Masao really pulled out all the stops to get these ready in time
for the September CommitFest, and while I'm not in a hurry to break
the world, I think the sooner these can hit the tree, the better off
we'll be in terms of releasing 8.5.

Just my $0.02,


absolutely, we should commit this.
we did some testing and things look stable.
also, people would most likely want to build code on top of it to be
ready for 8.5 (support scripts, etc.). this is important in order to
create some acceptance in user land.

this stuff seems mature and very well thought out.

just my $0.02 ...

   regards,

   hans-jürgen schönig

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




[HACKERS] draft RFC: concept for partial, wal-based replication

2009-10-30 Thread Hans-Juergen Schoenig -- PostgreSQL

hello ...

as my day has worked out quite nicely up to now, i thought to f... it up
and post a new concept which has been requested by a customer. the goal
is to collect some feedback, ideas and so on (not to be mixed up with
flames).
we have funding for this and we are trying to sort out how to do it the
best way. comments are welcome ...

note, this is a first draft i want to refine based on some comments.
here we go ...


Partial WAL Replication for PostgreSQL:
---------------------------------------

As of now the PostgreSQL community has provided patches and
functionality which allow full WAL-based replication as well as hot
standby. To extend this functionality and to make PostgreSQL even more
suitable for enterprise computing than it is today, we have the
commitment of a sponsor to fund partial replication for PostgreSQL
8.5 / 8.6.

This is the first draft of a proposal to make partial WAL-based
replication work and to provide an additional set of fancy features to
the community, which has been waiting for real in-core replication for
a decade or more.


Why partial replication?
------------------------

In some cases people have master servers which contain enormous
amounts of data (XX TB or so). If more than just one replica of this
data is needed, it might happen that different slaves are used for
different purposes. This implies that not all data will be used by all
machines.
An example: consider a server at a phone company collecting phone
calls, billing data, and maybe network routing data. The data is used
by different departments, and one machine is not enough to serve all
three of them. With the new functionality proposed here we could make
3 replicas, each holding just the group of tables needed for its
specific tasks, thus allowing people to buy cheaper hardware for the
slaves and use more machines instead.


Current status:
---------------

Hot standby and streaming replication have been a huge leap forward
for the community, and what is proposed here will be an extension to
those patches and functionalities. This concept is NOT aimed to
replace anything - it is mainly an addon.


Nodes and replication filters:
------------------------------

As of 8.4, standby systems are set up by creating an archive_command
along with a base backup. Although this is easy to do, some users
still reported difficulties due to a total misunderstanding of PITR.

The idea is to add functionality to add slaves like this:

CREATE REPLICA node_name
   CONNECT FROM SLAVE 'connect_string'
   TRANSFER COMMAND 'command'
   [ USING replication_filter ];

'command' would be any shell script copying data from the local master
to the new database node called node_name. Replication filters can be
used to make a group of replicas contain the same set of tables.
Filter sets can be created like this:

CREATE REPLICATION FILTER filter_name
   [ EMPTY | FULL ] [ INCLUDE | EXCLUDE CHANGES ];

Replication filters can be modified ...

ALTER REPLICATION FILTER filter_name RENAME TO new_filtername;
ALTER REPLICATION FILTER filter_name
   { ADD | REMOVE } { TABLE | INDEX | SEQUENCE } object;

Filter sets can be dropped like this ...

DROP REPLICATION FILTER filter_name;
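
To illustrate how the pieces fit together, here is a purely
hypothetical end-to-end example written in the proposed (not yet
implemented) syntax; the %f / %p placeholders in the transfer command
are an assumption modeled on archive_command:

CREATE REPLICATION FILTER billing_filter EMPTY INCLUDE CHANGES;

ALTER REPLICATION FILTER billing_filter ADD TABLE t_billing;
ALTER REPLICATION FILTER billing_filter ADD SEQUENCE t_billing_id_seq;

CREATE REPLICA billing_node
   CONNECT FROM SLAVE 'host=10.0.0.5 user=replication'
   TRANSFER COMMAND 'scp %f 10.0.0.5:/var/lib/pgsql/data/%p'
   USING billing_filter;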

Internally, CREATE REPLICA would initiate a base backup to the new
slave server, just like we would do it manually otherwise. The server
would automatically use the user-defined 'command' to copy one file
after the other to the slave box. The idea is basically stolen from
archive_command and friends. At this stage we either copy the entire
instance, as we would do with a normal base backup, or just what is
needed (defined by the replication filter). Users would thus
automatically copy only the data to a slave which is really needed
there and which matches their filter config. Once the copy is done, we
can register the new node inside a system table and commit the
transaction. Also, we can automatically create a useful recovery.conf
setup - we know how to connect from the slave to the master (we can
use 'CONNECT FROM SLAVE [ USING ]' to write a proper recovery.conf
file).
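
The generated recovery.conf might look roughly like this (a sketch
only - the exact parameters depend on how streaming replication
finally lands in 8.5; the connection data would be derived from
CONNECT FROM SLAVE):

standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=replication'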

Tables can easily be added to or removed from a replication filter
with ALTER REPLICATION FILTER.

Replicas can be removed easily:

DROP REPLICA node_name;

Why SQL to add a node? We are convinced that this is the simplest and
most intuitive way of doing things. We believe it gives users a real
feeling of simplicity. The current way of doing base backups should
stay in place as it is - it has proven to be nice for countless tasks.
However, it is not suitable for managing 10 or more replicas easily,
especially not when they are not full-blown copies of the master.


Technical ideas:
----------------

System tables:

We suggest always replicating the entire system catalog. It would be
a total disaster to try some other implementation. The same applies
for other tables - we always replicate entire tables; no WHERE
clauses allowed when it comes to

Re: [HACKERS] contrib/plantuner - enable PostgreSQL planner hints

2009-10-12 Thread Hans-Juergen Schoenig -- PostgreSQL

hi there ...

for this work i will include you in my evening prayers for at least
one week.
i know there has been a lot of discussion about this, but what you
just posted is excellent and, more importantly, USEFUL to many people.


i had something else in mind recently as well: virtual indexes. they
would help people to decide whether an index would make sense if it
actually existed. in some cases this would be very helpful, as many
datasets are just too big to try out whether an index helps.


if there was a vote whether this should be in contrib or in core:
+999 from me ...


   many thanks,

  hans


Oleg Bartunov wrote:

Hi there,

this is an announcement of our new contribution module for PostgreSQL
- Plantuner - enable planner hints
(http://www.sai.msu.su/~megera/wiki/plantuner).

Example:

=# LOAD 'plantuner';
=# create table test(id int);
=# create index id_idx on test(id);
=# create index id_idx2 on test(id);
=# \d test
     Table "public.test"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Indexes:
    "id_idx" btree (id)
    "id_idx2" btree (id)
=# explain select id from test where id=1;
                               QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=4.34..15.03 rows=12 width=4)
   Recheck Cond: (id = 1)
   ->  Bitmap Index Scan on id_idx2  (cost=0.00..4.34 rows=12 width=0)
         Index Cond: (id = 1)
(4 rows)
=# set enable_seqscan=off;
=# set plantuner.forbid_index='id_idx2';
=# explain select id from test where id=1;
                               QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=4.34..15.03 rows=12 width=4)
   Recheck Cond: (id = 1)
   ->  Bitmap Index Scan on id_idx  (cost=0.00..4.34 rows=12 width=0)
         Index Cond: (id = 1)
(4 rows)
=# set plantuner.forbid_index='id_idx2,id_idx';
=# explain select id from test where id=1;
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on test  (cost=100.00..140.00 rows=12 width=4)
   Filter: (id = 1)
(2 rows)



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83




--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




Re: [HACKERS] SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5

2009-09-23 Thread Hans-Juergen Schoenig -- PostgreSQL

Tom Lane wrote:

Josh Berkus j...@agliodbs.com writes:

Jeff,

Will statement_timeout not suffice for that use case?

Well, currently statement_timeout doesn't affect waiting for locks.

Sure it does.

And as a DBA, I don't think I'd want the same timeout for executing
queries as for waiting for a lock.

this is exactly the point: it is simply an additional use case.
while statement_timeout is perfect to kick out queries which take too
long, a lock_timeout serves a totally different purpose, because you
will get a totally different error message. imagine some old 4GL
terminal application: in this case you will hardly ever reach a
statement_timeout, because you will simply want to wait until things
appear on your screen. however, you definitely don't want to wait
forever if somebody keeps working on some product which is in stock
and never finishes.


btw, this old terminal application i was talking about is exactly the
use case we had - this is why this patch has been made.
we are porting roughly 2500 terminal applications from informix to
postgresql. we are talking about entire factory production lines and
so on here (the ECPG patches posted recently are for the same project,
btw.).
there are countless use cases where you want to know whether you are
locked out or whether you are just taking too long - the message is
totally different. the goal of the patch is to have a mechanism to
make sure that you don't starve to death.


as far as syntax is concerned: there are good reasons for WAIT and
good reasons for a GUC.
while the WAIT syntax is a very precise instruction for a very
specific place in a program, a GUC is more of an overall policy. i
don't see a reason why we should not have both anyway.
a GUC has the charm that it can be assigned to roles, procedures,
etc. nicely; a WAIT clause has the charm of being incredibly precise.
i can see good arguments for both.
the code itself is pretty simplistic - it needs no effort to keep up
to date and it does not harm anything else - it is pretty isolated.
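
to make the distinction concrete, here is a sketch of the two flavours
(the WAIT clause is the syntax proposed by this patch; t_stock is just
a made-up example table, and the GUC name and unit are only an
assumption for illustration - nothing like that has been committed):

-- per statement: give up with a lock-specific error after 5 seconds
SELECT * FROM t_stock WHERE product_id = 42 FOR UPDATE WAIT 5;

-- per statement: fail immediately if the row is locked
-- (this syntax exists today)
SELECT * FROM t_stock WHERE product_id = 42 FOR UPDATE NOWAIT;

-- hypothetical GUC-based policy, assignable to roles etc.
SET lock_timeout = 5000;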


   many thanks,

  hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




Re: [HACKERS] SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5

2009-09-23 Thread Hans-Juergen Schoenig -- PostgreSQL

Jeff Janes wrote:

Will statement_timeout not suffice for that use case?


we tried to get around it without actually touching the core, but we
really need this functionality.
patching the core is not our primary desire here. it is all about
providing some functionality which was truly missing.


   many thanks,

  hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




[HACKERS] happy birthday Tom Lane ...

2009-09-18 Thread Hans-Juergen Schoenig -- PostgreSQL

Tom,

On behalf of the entire PostgreSQL team here in Austria I want to wish 
you a happy birthday.

We hope that you will be a vital part of PostgreSQL for many years to come.

  Best regards,

  Hans-Jürgen Schönig + team


--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




[HACKERS] combined indexes with Gist - planner issues?

2009-08-31 Thread Hans-Juergen Schoenig -- PostgreSQL

hello everybody,

we are seriously fighting with a planner issue which seems somewhat
obscure to us.

we have a table which is nicely indexed (several GB in size).
i am using btree_gist operator classes to build a combined index
including an FTI expression along with a number:


db=# \d product.t_product
                              Table "product.t_product"
    Column     |     Type      |                           Modifiers
---------------+---------------+-----------------------------------------------------------------
 id            | bigint        | not null default nextval('product.t_product_id_seq'::regclass)
 shop_id       | integer       |
 art_number    | text          |
 title         | text          |
 description   | text          |
 display_price | numeric(10,4) |
Indexes:
    "t_product_pkey" PRIMARY KEY, btree (id)
    "idx_test" gist (display_price, to_tsvector('german'::regconfig, (title || ' '::text) || description))
    "idx_test2" gist (to_tsvector('german'::regconfig, (title || ' '::text) || description), display_price)



what we basically expected here is that postgres will scan the table
using the index to give us the cheapest products containing the words
we are looking for.
i am totally surprised to see that we have to fetch all products given
the words, sort and then do the limit.
this totally kills performance, because some words simply show up
millions of times.


the plans look like this:

db=# explain analyze SELECT art_number, title
   FROM product.t_product
   WHERE to_tsvector('german'::regconfig, (title || ' '::text) || description) @@ plainto_tsquery('harddisk')
   ORDER BY display_price
   LIMIT 10;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=108340.08..108340.10 rows=10 width=54) (actual time=1328.900..1328.909 rows=10 loops=1)
   ->  Sort  (cost=108340.08..108422.48 rows=32961 width=54) (actual time=1328.899..1328.905 rows=10 loops=1)
         Sort Key: display_price
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Bitmap Heap Scan on t_product  (cost=2716.62..107627.80 rows=32961 width=54) (actual time=1052.706..1328.772 rows=55 loops=1)
               Recheck Cond: (to_tsvector('german'::regconfig, ((title || ' '::text) || description)) @@ plainto_tsquery('harddisk'::text))
               ->  Bitmap Index Scan on idx_test2  (cost=0.00..2708.38 rows=32961 width=0) (actual time=1052.576..1052.576 rows=55 loops=1)
                     Index Cond: (to_tsvector('german'::regconfig, ((title || ' '::text) || description)) @@ plainto_tsquery('harddisk'::text))
 Total runtime: 1328.942 ms
(9 rows)


runtime increases badly as soon as the word gets more common ...


db=# explain analyze SELECT art_number, title
   FROM product.t_product
   WHERE to_tsvector('german'::regconfig, (title || ' '::text) || description) @@ plainto_tsquery('spiel')
   ORDER BY display_price
   LIMIT 10;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=108340.08..108340.10 rows=10 width=54) (actual time=33489.675..33489.682 rows=10 loops=1)
   ->  Sort  (cost=108340.08..108422.48 rows=32961 width=54) (actual time=33489.675..33489.675 rows=10 loops=1)
         Sort Key: display_price
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Bitmap Heap Scan on t_product  (cost=2716.62..107627.80 rows=32961 width=54) (actual time=774.923..33408.522 rows=56047 loops=1)
               Recheck Cond: (to_tsvector('german'::regconfig, ((title || ' '::text) || description)) @@ plainto_tsquery('spiel'::text))
               ->  Bitmap Index Scan on idx_test2  (cost=0.00..2708.38 rows=32961 width=0) (actual time=759.078..759.078 rows=56047 loops=1)
                     Index Cond: (to_tsvector('german'::regconfig, ((title || ' '::text) || description)) @@ plainto_tsquery('spiel'::text))
 Total runtime: 33489.906 ms
(9 rows)

i am wondering why postgres is not able to use the combined index here.
is this some obscure thing related to gist, a logical problem, or a
planner deficiency?


ideas are welcome.

   many thanks,

  hans



--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




Re: [HACKERS] combined indexes with Gist - planner issues?

2009-08-31 Thread Hans-Juergen Schoenig -- PostgreSQL

Tom Lane wrote:

Hans-Juergen Schoenig -- PostgreSQL postg...@cybertec.at writes:

what we basically expected here is that postgres will scan the table
using the index to give us the cheapest products containing the words
we are looking for.
i am totally surprised to see that we have to fetch all products given
the words, sort and then do the limit.

I don't know why you'd find that surprising.  GIST indexes have no
support for ordering.

			regards, tom lane


ok, i thought it would be something gist-specific i was not aware of.
the golden question now is: i am looking for the cheapest products
given a certain text in an insane amount of data.
how to do it? other quals which could narrow down the amount of data
would not help.


i cannot see an option with regular weapons ...
maybe you have an idea how to fix core to make it work? maybe there is
a mechanism we could use.

we really have to make this work - no matter what it takes.
we are willing to put effort into that.

   many thanks,

  hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




Re: [HACKERS] Bison crashes postgresql

2009-08-31 Thread Hans-Juergen Schoenig -- PostgreSQL

Andrew Dunstan wrote:

Werner Echezuria wrote:

Hi, I have code in which I translate some code from sqlf to sql, but
when it comes to yy_parse the server crashes, I have no idea why,
because it works fine in other situations.


I don't understand why you're doing what you're doing this way. 
Wouldn't it be better to patch the main postgres parser and make your 
functionality first class rather than having it run via an SQL string 
and a function that calls a secondary parser?


cheers

andrew



yes, this is the thing i had in mind as well.
what is your ultimate goal?

   many thanks,

  hans


--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




Re: [HACKERS] combined indexes with Gist - planner issues?

2009-08-31 Thread Hans-Juergen Schoenig -- PostgreSQL

hello ...

we did some experiments with such a table.
the problem is that if you want to allow arbitrary combinations of
words (which can be modeled perfectly with FTI), you would instantly
end up with a self join across 5 relations or so - which is again bad.


there are too many common words to consider doing this partly with
gist and partly with a btree.


is there any option to adapt gist in a way that a combined index
would make sense here?


   many thanks,

  hans




Heikki Linnakangas wrote:

Hans-Juergen Schoenig -- PostgreSQL wrote:

my knowledge of how gist works internally is not too extensive. any
kickstart idea would be appreciated.

If there's not too many of those common words, you can create a simple
partial b-tree index for each, and handle the less common words with the
gist index you have (you can drop the display_price column from the index).

Another idea:

Create a table containing one row for each word in each product:

CREATE TABLE t_product_word (id bigint, word text, display_price
numeric(10,4));

with triggers to keep it up-to-date. You can then create a regular two
column b-tree index on that:

CREATE INDEX idx_word_price ON t_product_word (word, display_price);

And query with:

SELECT p.art_number, p.title
   FROM t_product p INNER JOIN t_product_word pw ON p.id = pw.id
   WHERE pw.word = 'harddisk'
ORDER BY pw.display_price DESC LIMIT 10;

The t_product_word table will be huge, but with a few gigabytes of data
it should still be manageable.
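
a minimal sketch of the suggested sync trigger (an illustration only,
with made-up names - it splits on non-word characters instead of
producing real stemmed lexemes):

CREATE OR REPLACE FUNCTION t_product_word_sync() RETURNS trigger AS $$
BEGIN
    -- drop the old word list for this product, if any
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        DELETE FROM t_product_word WHERE id = OLD.id;
    END IF;
    -- re-insert one row per distinct word with the current price
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO t_product_word (id, word, display_price)
        SELECT DISTINCT NEW.id, w, NEW.display_price
        FROM regexp_split_to_table(
                 lower(coalesce(NEW.title, '') || ' ' ||
                       coalesce(NEW.description, '')),
                 E'\\W+') AS w
        WHERE w <> '';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_t_product_word_sync
    AFTER INSERT OR UPDATE OR DELETE ON t_product
    FOR EACH ROW EXECUTE PROCEDURE t_product_word_sync();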

  



--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




[HACKERS] tsvector extraction patch

2009-07-03 Thread Hans-Juergen Schoenig -- PostgreSQL

hello,

this patch has not made it through yesterday, so i am trying to send it 
again.

i made a small patch which i found useful for my personal tasks.
it would be nice to see this in 8.5. if not core then maybe contrib.
it transforms a tsvector to table format which is really nice for text 
processing and comparison.


test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure 
this is a good patch'));

  lex   | rank
--------+------
 good   |    8
 patch  |    9
 pretti |    3
 sure   |    4
(4 rows)

  many thanks,

 hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




Re: [HACKERS] tsvector extraction patch

2009-07-03 Thread Hans-Juergen Schoenig -- PostgreSQL

Hans-Juergen Schoenig -- PostgreSQL wrote:

hello,

this patch has not made it through yesterday, so i am trying to send 
it again.

i made a small patch which i found useful for my personal tasks.
it would be nice to see this in 8.5. if not core then maybe contrib.
it transforms a tsvector to table format which is really nice for text 
processing and comparison.


test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty 
sure this is a good patch'));

  lex   | rank
--------+------
 good   |    8
 patch  |    9
 pretti |    3
 sure   |    4
(4 rows)

  many thanks,

 hans




--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


[HACKERS] tsvector extraction patch

2009-07-02 Thread Hans-Juergen Schoenig -- PostgreSQL

hello,

i made a small patch which i found useful for my personal tasks.
it would be nice to see this in 8.5. if not core then maybe contrib.
it transforms a tsvector to table format which is really nice for text 
processing and comparison.


test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure 
this is a good patch'));

  lex   | rank
--------+------
 good   |    8
 patch  |    9
 pretti |    3
 sure   |    4
(4 rows)

   many thanks,

  hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de

diff -dcrpN postgresql-8.4.0.old/contrib/Makefile postgresql-8.4.0/contrib/Makefile
*** postgresql-8.4.0.old/contrib/Makefile	2009-03-26 00:20:01.0 +0100
--- postgresql-8.4.0/contrib/Makefile	2009-06-29 11:03:04.0 +0200
*** WANTED_DIRS = \
*** 39,44 
--- 39,45 
  		tablefunc	\
  		test_parser	\
  		tsearch2	\
+ 		tsvcontent	\
  		vacuumlo
  
  ifeq ($(with_openssl),yes)
diff -dcrpN postgresql-8.4.0.old/contrib/tsvcontent/Makefile postgresql-8.4.0/contrib/tsvcontent/Makefile
*** postgresql-8.4.0.old/contrib/tsvcontent/Makefile	1970-01-01 01:00:00.0 +0100
--- postgresql-8.4.0/contrib/tsvcontent/Makefile	2009-06-29 11:20:21.0 +0200
***
*** 0 
--- 1,19 
+ # $PostgreSQL: pgsql/contrib/tablefunc/Makefile,v 1.9 2007/11/10 23:59:51 momjian Exp $
+ 
+ MODULES = tsvcontent
+ DATA_built = tsvcontent.sql
+ DATA = uninstall_tsvcontent.sql
+ 
+ 
+ SHLIB_LINK += $(filter -lm, $(LIBS))
+ 
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/tsvcontent
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
diff -dcrpN postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.c postgresql-8.4.0/contrib/tsvcontent/tsvcontent.c
*** postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.c	1970-01-01 01:00:00.0 +0100
--- postgresql-8.4.0/contrib/tsvcontent/tsvcontent.c	2009-06-29 11:18:35.0 +0200
***
*** 0 
--- 1,169 
+ #include "postgres.h"
+ 
+ #include "fmgr.h"
+ #include "funcapi.h"
+ #include "miscadmin.h"
+ #include "executor/spi.h"
+ #include "lib/stringinfo.h"
+ #include "nodes/nodes.h"
+ #include "utils/builtins.h"
+ #include "utils/lsyscache.h"
+ #include "utils/syscache.h"
+ #include "utils/memutils.h"
+ #include "tsearch/ts_type.h"
+ #include "tsearch/ts_utils.h"
+ #include "catalog/pg_type.h"
+ 
+ #include "tsvcontent.h"
+ 
+ PG_MODULE_MAGIC;
+ 
+ PG_FUNCTION_INFO_V1(tsvcontent);
+ 
+ Datum
+ tsvcontent(PG_FUNCTION_ARGS)
+ {
+ 	FuncCallContext 	*funcctx;
+ 	TupleDesc		ret_tupdesc;
+ 	AttInMetadata		*attinmeta;
+ 	int			call_cntr;
+ 	int			max_calls;
+ 	ts_to_txt_fctx		*fctx;
+ 	Datum			result[2];
+ 	bool			isnull[2] = { false, false };
+ 	MemoryContext 		oldcontext;
+ 
+ 	/* input value containing the TS vector */
+ 	TSVector	in = PG_GETARG_TSVECTOR(0);
+ 
+ 	/* stuff done only on the first call of the function */
+ 	if (SRF_IS_FIRSTCALL())
+ 	{
+ 		TupleDesc	tupdesc;
+ 		int		i, j;
+ 		char		*wepv_base;
+ 
+ 		/* create a function context for cross-call persistence */
+ 		funcctx = SRF_FIRSTCALL_INIT();
+ 
+ 		/*
+ 		 * switch to memory context appropriate for multiple function calls
+ 		 */
+ 		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+ 
+ 		switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+ 		{
+ 			case TYPEFUNC_COMPOSITE:
+ 				/* success */
+ 				break;
+ 			case TYPEFUNC_RECORD:
+ 				/* failed to determine actual type of RECORD */
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 						errmsg("function returning record called in context "
+ 							"that cannot accept type record")));
+ 				break;
+ 			default:
+ 				/* result type isn't composite */
+ 				elog(ERROR, "return type must be a row type");
+ 				break;
+ 		}
+ 
+ 		/* make sure we have a persistent copy of the tupdesc */
+ 		tupdesc = CreateTupleDescCopy(tupdesc);
+ 
+ 		/*
+ 		 * Generate attribute metadata needed later to produce tuples from raw
+ 		 * C strings
+ 		 */
+ 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ 		funcctx->attinmeta = attinmeta;
+ 
+ 		/* allocate memory */
+ 		fctx = (ts_to_txt_fctx *) palloc(sizeof(ts_to_txt_fctx));
+ 
+ 		wepv_base = (char *) in + offsetof(TSVectorData, entries) + in->size * sizeof(WordEntry);
+ 
+ 		fctx->n_tsvt = 0;
+ 		for (i = 0; i < in->size; i++)
+ 		{
+ 			if (in->entries[i].haspos)
+ 			{
+ 				WordEntryPosVector *wepv = (WordEntryPosVector *)
+ 					(wepv_base + in->entries[i].pos + SHORTALIGN(in->entries[i].len));
+ 
+ 				fctx->n_tsvt += wepv->npos;
+ 			}
+ 			else
+ 				fctx->n_tsvt++;
+ 		}
+ 
+ 		fctx->tsvt = palloc(fctx->n_tsvt * sizeof(tsvec_tuple));
+ 
+ 		for (i = 0, j = 0; i < in->size; i++)
+ 		{
+ 			int pos = in->entries[i].pos;
+ 			int len = in->entries[i].len;
+ 
+ 			if (in->entries[i].haspos)
+ 			{
+ 				WordEntryPosVector *wepv = (WordEntryPosVector *)
+ 