Re: [HACKERS] Index Tuple Compression Approach?

2007-08-16 Thread Heikki Linnakangas
Gregory Stark wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 
 That general approach of storing a common part leading part just once is
 called prefix compression. Yeah, it helps a lot on long text fields.
 Tree structures like file paths in particular.
 
 You kind of want to do avoid both the prefix and the suffix, no? 

You're much more likely to find common prefixes than suffixes in an
index page, because of the ordering. I suppose compressing the suffix
would be useful in some cases as well. You might be better off with some
generic compression algorithm at that point, though.

 It's been discussed before. One big problem is extracting the common
 leading part. You could only do it for text, 
 
 Or for multi-column indexes

Oh yeah, that you could do more easily.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Index Tuple Compression Approach?

2007-08-16 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 Gregory Stark wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 
 That general approach of storing a common part leading part just once is
 called prefix compression. Yeah, it helps a lot on long text fields.
 Tree structures like file paths in particular.
 
 You kind of want to do avoid both the prefix and the suffix, no? 

 You're much more likely to find common prefixes than suffixes in an
 index page, because of the ordering. I suppose compressing the suffix
 would be useful in some cases as well. You might be better off with some
 generic compression algorithm at that point, though.

Sorry, by suffix I don't mean common sufixes, I mean the bits of the key
following the point which discriminates between the left and right side of the
tree.

So for example if you're indexing a text field and have a
tree structure like:

 Redhat Fedora Core 7
 / \
  Debian Etch (Unstable)  Ubuntu hoary

We don't really need the whole of Redhat Fedora Core 7 in the index node. We
could actually get by with just R. Everything before R is on the left and
everything after R is on the right.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] build farm failures

2007-08-16 Thread Michael Meskes
Hi,

we have two build farm members failing to make since I committed teh
ecpg changes: echidna and herring.

It looks like they are still using an old preproc.c although they
checked out the new preproc.y. I have no idea how this is supposed to
work so could someone please enlighten me?

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Michael Paesold

Bruce Momjian wrote:

Uh, how are we going to prevent the auto-casting to tsvector from using
the default GUC config, e.g. CREATE INDEX i ON x USING GIN(col)?

This is where I started to see the need for education and error-prone
nature of the default GUC just wasn't worth having it, though I know
others disagree.


It can be removed quite easily. AFAIR, this feature was added on 
suggestion of Tom Lane. It was certainly only added in this 
tsearch-to-core release cycle, see here:


http://archives.postgresql.org/pgsql-hackers/2007-03/msg01384.php

Teodor Sigaev wrote:
 2) added operator class for text and varchar
CREATE INDEX idxname ON tblname USING GIN ( textcolumn );

So just remove the operator class or don't specify it as default 
operator class for GIN, and the thing is gone. Perhaps there is a better 
way to do this, though.


[...digging...] The idea was born in the thread starting here (involving 
Tom Lane, Joshua Drake, and Teodor Sigaev):

http://archives.postgresql.org/pgsql-hackers/2007-03/msg00912.php
with the conclusion here:
http://archives.postgresql.org/pgsql-hackers/2007-03/msg00936.php

Best Regards
Michael Paesold


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] build farm failures

2007-08-16 Thread Andrew Dunstan



Michael Meskes wrote:

Hi,

we have two build farm members failing to make since I committed teh
ecpg changes: echidna and herring.

It looks like they are still using an old preproc.c although they
checked out the new preproc.y. I have no idea how this is supposed to
work so could someone please enlighten me?

  


Yes it looks like that. But the buildfarm client doesn't actually build 
in the repo normally - it builds in a temp copy which is removed at the 
end of the run, precisely to avoid this kind of problem, so I'm a bit 
mystified how it can happen. In fact we go to some lengths to ensure 
that there are no extraneous files, but this one might not get caught by 
that because it is is in .cvsignore. This sort of thing is usually a 
symptom of somebody having run a build in the repo directly, a thing 
that buildfarm owners have been repeatedly advised not to do.


Anyway, the simple solution is to ask Darcy to blow away the repo (these 
buildfarm clients share a single cvs checkout) so that the buildfarm 
client will get a fresh checkout next time it's run.


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] XID wraparound and busy databases

2007-08-16 Thread Simon Riggs
On Wed, 2007-08-15 at 12:49 -0400, Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  Maybe we can do something to reduce the xid consumption? For example,
  reuse xids for read-only queries.
 
 Hmm, that's an idea.

That would be a very useful idea for additional scalability and it does
sound like a good idea.

We kicked around the idea of reusing snapshots previously also, as a way
of reducing access to the ProcArray. That idea had some problems, but if
your workload was all read-only and we reused xids then everybody would
have an identical snapshot, so reusing it would make lots of sense.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Tom Lane
Michael Paesold [EMAIL PROTECTED] writes:
 Teodor Sigaev wrote:
 2) added operator class for text and varchar
 CREATE INDEX idxname ON tblname USING GIN ( textcolumn );

 So just remove the operator class or don't specify it as default 
 operator class for GIN, and the thing is gone. Perhaps there is a better 
 way to do this, though.

 [...digging...] The idea was born in the thread starting here (involving 
 Tom Lane, Joshua Drake, and Teodor Sigaev):
 http://archives.postgresql.org/pgsql-hackers/2007-03/msg00912.php
 with the conclusion here:
 http://archives.postgresql.org/pgsql-hackers/2007-03/msg00936.php

Yeah, unfortunately we overlooked the implications of the conversion to
tsvector being environment-dependent.  Those opclasses will have to go
away again.  AFAICS the only safe way to build an index directly on a
text column is

CREATE INDEX idxname ON tblname USING gin (to_tsvector('config', textcolumn));

ie, you hardwire the configuration name directly into the index
definition.  Similarly, if you're using a trigger to build a
materialized tsvector column, you need to hardwire the config
name into the trigger definition.

An alternative in both cases is to take the config name from
another field of the table row.  This is what you'd need to do
for the advanced cases where you use different configs for
different entries in the same table.

We can still have a GUC default_text_search_config, but we have
to design the system around the assumption that that should only
be referenced during *queries*, not during updates.  That's the
only safe way to let it be changeable.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] XID wraparound and busy databases

2007-08-16 Thread Florian G. Pflug

Tom Lane wrote:

Bruce Momjian [EMAIL PROTECTED] writes:

Is enlarging the xid field something we should consider for 8.4?


No.  We just got the tuple header down to 24 bytes, we are not going
to give that back and then some.

If you are processing 6K transactions per second, you can afford to
vacuum every couple days... and probably need to vacuum much more often
than that anyway, to avoid table bloat.

Possibly your respondent should think about trying to do more than one
thing per transaction?


I'm wondering how many of those 6k xacts/second are actually modifying
data. If a large percentage of those are readonly queries, than the need
for vacuuming could be reduced if postgres assigned an xid only if that
xid really hits the disk. Otherwise (for purely select-type queries) it
could use some special xid value.

This is what I'm doing in my Readonly-Queries-On-PITR-Slave patch.

greetings, Florian Pflug

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Index Tuple Compression Approach?

2007-08-16 Thread Chris Browne
[EMAIL PROTECTED] (Gregory Stark) writes:

 Heikki Linnakangas [EMAIL PROTECTED] writes:

 Gregory Stark wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 
 That general approach of storing a common part leading part just once is
 called prefix compression. Yeah, it helps a lot on long text fields.
 Tree structures like file paths in particular.
 
 You kind of want to do avoid both the prefix and the suffix, no? 

 You're much more likely to find common prefixes than suffixes in an
 index page, because of the ordering. I suppose compressing the suffix
 would be useful in some cases as well. You might be better off with some
 generic compression algorithm at that point, though.

 Sorry, by suffix I don't mean common sufixes, I mean the bits of the key
 following the point which discriminates between the left and right side of the
 tree.

 So for example if you're indexing a text field and have a
 tree structure like:

  Redhat Fedora Core 7
  / \
   Debian Etch (Unstable)  Ubuntu hoary

 We don't really need the whole of Redhat Fedora Core 7 in the index node. We
 could actually get by with just R. Everything before R is on the left and
 everything after R is on the right.

Right.  The case where you get more characters than just R is when
you introduce extra entries that have the same prefix.  Say, Redhat
Fedora Core 4, Redhat Fedora Core 5, Redhat Fedora Core 6.  And,
for good measure, let's throw in Redhat RHAS 3, Redhat RHAS 4, and
Redhat RHAS 5.  

In that case, you'd have substrings:
  R
  edhat 
  Fedora Core 
  RHAS 
as discriminators.
-- 
cbbrowne,@,cbbrowne.com
http://www3.sympatico.ca/cbbrowne/spreadsheets.html
If a mute swears, does his mother wash his hands with soap? 

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Deferred Constraints

2007-08-16 Thread Rodrigo Hjort
Dear Hackers,

Could you point me the first version PostgreSQL started to support
Deferred Constraints (ie: DEFERRABLE keyword on foreign keys
creation)? I guess it is earlier than 7.0, right?

-- 
Regards,

Rodrigo Hjort
http://icewall.org/~hjort

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Deferred Constraints

2007-08-16 Thread Decibel!
On Thu, Aug 16, 2007 at 12:45:23PM -0300, Rodrigo Hjort wrote:
 Dear Hackers,
 
 Could you point me the first version PostgreSQL started to support
 Deferred Constraints (ie: DEFERRABLE keyword on foreign keys
 creation)? I guess it is earlier than 7.0, right?

You could look through the release notes... they go way back.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)


pgpTmJemIG86f.pgp
Description: PGP signature


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Yeah, unfortunately we overlooked the implications of the conversion to
 tsvector being environment-dependent.  Those opclasses will have to go
 away again.  AFAICS the only safe way to build an index directly on a
 text column is

 CREATE INDEX idxname ON tblname USING gin (to_tsvector('config', textcolumn));

Is there a null configuration which could be the default for the casts? So the
syntax would still work and would generate an index which worked well but has
no stop words, no stemming, etc?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)

2007-08-16 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 There seems to be something wrong with this commit notification.  This
 file

 pgsql/src/test/regress/expected:
 foreign_key.out (r1.44 - r1.45)
 (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/foreign_key.out?r1=1.44r2=1.45)

 does not seem to have a 1.45 revision.  Indeed, the changeset in this
 patch for this file is in 1.44, not 1.45.

No, 1.44 was a month ago.  1.45 does exist in the master CVS repo
AFAICS.

It looks to me like cvsweb is showing a version that hasn't gotten
updated for a day or two.  Some fallout of the master server move
no doubt.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] build farm failures

2007-08-16 Thread Andrew Dunstan



Darcy Buskermolen wrote:

This sort of thing is usually a
symptom of somebody having run a build in the repo directly, a thing
that buildfarm owners have been repeatedly advised not to do.



This is something I do not recall doing, however it's possible.  though this 
does make me ask why are the build dependencies in the Makefile are not 
properly setup to tell that the .y needs to be rebuilt (which I would assume 
would make this problem also go away)


  


Thje way cvs works is that it gives the file the date it has in the 
repository, so if your preproc.c is newer than the preproc.y, make will 
detect that and not rebuild it.  If Michael's checkin occurs between the 
time the repo is updated and the time bison gets run on the original 
file this will happen. But if you never ever build in the repo then it 
won't, because buildfarm only ever builds in a copy (unless you're 
building with vpath, in which case it cleans up the generated files).


cheers

andrew

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] build farm failures

2007-08-16 Thread Darcy Buskermolen
On Thursday 16 August 2007 04:29:41 Andrew Dunstan wrote:
 Michael Meskes wrote:
  Hi,
 
  we have two build farm members failing to make since I committed teh
  ecpg changes: echidna and herring.
 
  It looks like they are still using an old preproc.c although they
  checked out the new preproc.y. I have no idea how this is supposed to
  work so could someone please enlighten me?

 Yes it looks like that. But the buildfarm client doesn't actually build
 in the repo normally - it builds in a temp copy which is removed at the
 end of the run, precisely to avoid this kind of problem, so I'm a bit
 mystified how it can happen. In fact we go to some lengths to ensure
 that there are no extraneous files, but this one might not get caught by
 that because it is is in .cvsignore. This sort of thing is usually a
 symptom of somebody having run a build in the repo directly, a thing
 that buildfarm owners have been repeatedly advised not to do.

This is something I do not recall doing, however it's possible.  though this 
does make me ask why are the build dependencies in the Makefile are not 
properly setup to tell that the .y needs to be rebuilt (which I would assume 
would make this problem also go away)


 Anyway, the simple solution is to ask Darcy to blow away the repo (these
 buildfarm clients share a single cvs checkout) so that the buildfarm
 client will get a fresh checkout next time it's run.

Let me go nuke the tree, and we'll try again


 cheers

 andrew

 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Re: cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)

2007-08-16 Thread Magnus Hagander
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
 There seems to be something wrong with this commit notification.  This
 file
 
 pgsql/src/test/regress/expected:
 foreign_key.out (r1.44 - r1.45)
 (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/foreign_key.out?r1=1.44r2=1.45)
 
 does not seem to have a 1.45 revision.  Indeed, the changeset in this
 patch for this file is in 1.44, not 1.45.
 
 No, 1.44 was a month ago.  1.45 does exist in the master CVS repo
 AFAICS.
 
 It looks to me like cvsweb is showing a version that hasn't gotten
 updated for a day or two.  Some fallout of the master server move
 no doubt.

Yes, it seems to be caused by the bug in the rsync server on the old
machine. It gets fixed eventually by itself (I pushed it through now).

Marc - how much work to update that one on the old server? I know you
have a new one coming up, but as a stop-gap fix on the old one?

//Magnus


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Is there a null configuration which could be the default for the casts?

If we did that then there would be no point in having a GUC variable,
because it wouldn't control anything.

While that is certainly an alternative solution, I think it's riskier
than having the GUC variable and just preventing (or at least
discouraging) people from relying on it in their index definitions.
The problem is that people will resort to ALTER CONFIGURATION default
to tune their setups, and anytime you make a nontrivial change that way,
you run the risk of breaking your existing indexes.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] build farm failures

2007-08-16 Thread Alvaro Herrera
Andrew Dunstan wrote:


 Darcy Buskermolen wrote:
 This sort of thing is usually a
 symptom of somebody having run a build in the repo directly, a thing
 that buildfarm owners have been repeatedly advised not to do.
 

 This is something I do not recall doing, however it's possible.  though 
 this does make me ask why are the build dependencies in the Makefile are 
 not properly setup to tell that the .y needs to be rebuilt (which I would 
 assume would make this problem also go away)

 Thje way cvs works is that it gives the file the date it has in the 
 repository, so if your preproc.c is newer than the preproc.y, make will 
 detect that and not rebuild it.  If Michael's checkin occurs between the 
 time the repo is updated and the time bison gets run on the original file 
 this will happen. But if you never ever build in the repo then it won't, 
 because buildfarm only ever builds in a copy (unless you're building with 
 vpath, in which case it cleans up the generated files).

Hum, so why not clean up the files when not in vpath as well?

find . -name .cvsignore | while read line
do
dir=$(dirname $line)
cd $dir
rm -fv `cat .cvsignore`
cd $OLDPWD
done

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] build farm failures

2007-08-16 Thread Andrew Dunstan



Alvaro Herrera wrote:

Andrew Dunstan wrote:
  

Darcy Buskermolen wrote:


This sort of thing is usually a
symptom of somebody having run a build in the repo directly, a thing
that buildfarm owners have been repeatedly advised not to do.


This is something I do not recall doing, however it's possible.  though 
this does make me ask why are the build dependencies in the Makefile are 
not properly setup to tell that the .y needs to be rebuilt (which I would 
assume would make this problem also go away)
  
Thje way cvs works is that it gives the file the date it has in the 
repository, so if your preproc.c is newer than the preproc.y, make will 
detect that and not rebuild it.  If Michael's checkin occurs between the 
time the repo is updated and the time bison gets run on the original file 
this will happen. But if you never ever build in the repo then it won't, 
because buildfarm only ever builds in a copy (unless you're building with 
vpath, in which case it cleans up the generated files).



Hum, so why not clean up the files when not in vpath as well?

find . -name .cvsignore | while read line
do
dir=$(dirname $line)
cd $dir
rm -fv `cat .cvsignore`
cd $OLDPWD
done

  


Because they are not supposed to be there in the first place! If the 
buildfarm owner builds in the repo that is pilot error.


And, btw,  buildfarm is not a shell script. We use File::Find to do this 
sort of thing.


cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] build farm failures

2007-08-16 Thread Michael Meskes
On Thu, Aug 16, 2007 at 08:24:14AM -0700, Darcy Buskermolen wrote:
 This is something I do not recall doing, however it's possible.  though this 
 does make me ask why are the build dependencies in the Makefile are not 
 properly setup to tell that the .y needs to be rebuilt (which I would assume 
 would make this problem also go away)

Frankly I have no idea. The dependencies are the same as with the
backend's gram.y file:

$(srcdir)/preproc.c: $(srcdir)/preproc.h ;

$(srcdir)/preproc.h: preproc.y

The backend has:

$(srcdir)/gram.c: $(srcdir)/parse.h ;

$(srcdir)/parse.h: gram.y

So except for the different naming it's the same. However, we haven't
had that problem with the backend so far, or did we?

What do I fail to see?

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] build farm failures

2007-08-16 Thread Andrew Dunstan



Michael Meskes wrote:

The backend has:

$(srcdir)/gram.c: $(srcdir)/parse.h ;

$(srcdir)/parse.h: gram.y

So except for the different naming it's the same. However, we haven't
had that problem with the backend so far, or did we?

What do I fail to see?

  


We have had problems in the past. If the user builds at a point in time 
after the .y file is checked in then the generated file is newer and if 
it's not removed will never be regenerated, even if they do a subsequent 
cvs update.


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Josh Berkus
All,

First off, I'll assert that backup/restore is a serious issue and while the 
folks who want Tsearch in core now are dismissing it, we'll be fielding the 
complaints later.  Any solution which involves setting a GUC at restore time 
*which could vary per table or even column* isn't acceptable.  We used to do 
the \SET thing for table ownership with backup/restore, and you *know* how 
many restore failures that caused.

Basically, restore happens at two times: (1) when the server fails and you 
need to recover, and (2) when you're upgrading, already a painful process.  
Glitches which occur at these times cause panic, angry user e-mails and 
people switching away from PostgreSQL.  It's just not acceptable for us to 
put new potential booby-traps in the way of restore.

Second, as attractive as the idea is, I can't see how a typemod would work.  
It's not like we have a fixed list of dictionaries; people can create their 
own.  If we wanted to clean up the syntax I suppose we could have a form of 
to_tsvector which took a two-column composite value as if it were a 
multicolumn index:

CREATE INDEX resumes_fti ON resumes USING GIN ( 'default', resume_text )

 hmmm, that wouldn't work as syntax, would it?  We can't accept a constant 
as a column in a multi-column index, can we?  Another reason why we can't do 
mods for 8.3.

This means, from my perspective, that the only reasonable course for 8.3 is to 
require the 2-parameter form of to_tsvector for indexes.  I'll say that in 
the applications I've developed with TSearch2 I use the 2-parameter form of 
to_tsvector and to_tsquery exclusively, as I've found the behavior of TSearch 
to be highly unreliable if I don't specify.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] building gist index on int[] field is terrible slow. Is it bug?

2007-08-16 Thread Pavel Stehule
Hello

I tested contrib package intarray and gist support from this package.
I was surpirised. Index building needed lot of time.

10K fields .. 106sec
20K f ..265 sec
30Kf  .. 450 sec
50Kf  .. 1283sec

building gin index for 50K fields needed 0.5sec

Regards
Pavel Stehule

sample of data:
pavel=# select * from test limit 10;
   a
---
 {4209,4207,4197,2066}
 {4832,3004}
 {4629}
 {3243}
 {4816}
 {3726}
 {4834}
 {1459,3160,3984}
 {4569}
 {4164,1307,962,4482}
(10 rows)

pavel=# select version();
version
---
 PostgreSQL 8.3devel on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20070502 (Red Hat 4.1.2-12)
(1 row)

Time: 442,034 ms

pavel=# show maintenance_work_mem ;
 maintenance_work_mem
--
 16MB
(1 row)

pavel=# set maintenance_work_mem to '300MB';
SET
Time: 0,230 ms
pavel=#
pavel=# CREATE index fooidx on test using gist(a gist__int_ops);
CREATE INDEX
Time: 1269276,866 ms

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Oleg Bartunov

On Thu, 16 Aug 2007, Josh Berkus wrote:


All,

First off, I'll assert that backup/restore is a serious issue and while the
folks who want Tsearch in core now are dismissing it, we'll be fielding the
complaints later.  Any solution which involves setting a GUC at restore time
*which could vary per table or even column* isn't acceptable.  We used to do
the \SET thing for table ownership with backup/restore, and you *know* how
many restore failures that caused.

Basically, restore happens at two times: (1) when the server fails and you
need to recover, and (2) when you're upgrading, already a painful process.
Glitches which occur at these times cause panic, angry user e-mails and
people switching away from PostgreSQL.  It's just not acceptable for us to
put new potential booby-traps in the way of restore.

Second, as attractive as the idea is, I can't see how a typemod would work.
It's not like we have a fixed list of dictionaries; people can create their
own.  If we wanted to clean up the syntax I suppose we could have a form of
to_tsvector which took a two-column composite value as if it were a
multicolumn index:

CREATE INDEX resumes_fti ON resumes USING GIN ( 'default', resume_text )


Josh, all my respects to you, but text searching is not about index at all.
Text searching is about tsvector and tsquery data type



 hmmm, that wouldn't work as syntax, would it?  We can't accept a constant
as a column in a multi-column index, can we?  Another reason why we can't do
mods for 8.3.

This means, from my perspective, that the only reasonable course for 8.3 is to
require the 2-parameter form of to_tsvector for indexes.  I'll say that in
the applications I've developed with TSearch2 I use the 2-parameter form of
to_tsvector and to_tsquery exclusively, as I've found the behavior of TSearch
to be highly unreliable if I don't specify.




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] building gist index on int[] field is terrible slow. Is it bug?

2007-08-16 Thread Oleg Bartunov

On Thu, 16 Aug 2007, Pavel Stehule wrote:


Hello

I tested contrib package intarray and gist support from this package.
I was surpirised. Index building needed lot of time.

10K fields .. 106sec
20K f ..265 sec
30Kf  .. 450 sec
50Kf  .. 1283sec

building gin index for 50K fields needed 0.5sec


you probably need gist__intbig_ops



Regards
Pavel Stehule

sample of data:
pavel=# select * from test limit 10;
  a
---
{4209,4207,4197,2066}
{4832,3004}
{4629}
{3243}
{4816}
{3726}
{4834}
{1459,3160,3984}
{4569}
{4164,1307,962,4482}
(10 rows)

pavel=# select version();
   version
---
PostgreSQL 8.3devel on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20070502 (Red Hat 4.1.2-12)
(1 row)

Time: 442,034 ms

pavel=# show maintenance_work_mem ;
maintenance_work_mem
--
16MB
(1 row)

pavel=# set maintenance_work_mem to '300MB';
SET
Time: 0,230 ms
pavel=#
pavel=# CREATE index fooidx on test using gist(a gist__int_ops);
CREATE INDEX
Time: 1269276,866 ms

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] XID wraparound and busy databases

2007-08-16 Thread Bruce Momjian

Added to TODO:

* Reduce XID consumption of read-only queries

   http://archives.postgresql.org/pgsql-hackers/2007-08/msg00516.php


---

Florian G. Pflug wrote:
 Tom Lane wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
  Is enlarging the xid field something we should consider for 8.4?
  
  No.  We just got the tuple header down to 24 bytes, we are not going
  to give that back and then some.
  
  If you are processing 6K transactions per second, you can afford to
  vacuum every couple days... and probably need to vacuum much more often
  than that anyway, to avoid table bloat.
  
  Possibly your respondent should think about trying to do more than one
  thing per transaction?
 
 I'm wondering how many of those 6k xacts/second are actually modifying
 data. If a large percentage of those are readonly queries, than the need
 for vacuuming could be reduced if postgres assigned an xid only if that
 xid really hits the disk. Otherwise (for purely select-type queries) it
 could use some special xid value.
 
 This is what I'm doing in my Readonly-Queries-On-PITR-Slave patch.
 
 greetings, Florian Pflug

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Another idea for index-only scans

2007-08-16 Thread Jeff Davis
On Wed, 2007-08-15 at 11:54 -0400, Bruce Momjian wrote:
 I have added another idea for index-only scans to the TODO list:
 
A third idea would be for a heap scan to check if all rows are visible
and if so set a per-table flag which can be checked by index scans.
Any change to the table would have to clear the flag.  To detect
changes during the heap scan a counter could be set at the start and
checked at the end --- if it is the same, the table has not been
modified --- any table change would increment the counter.
 

This sounds useful for read-only or read-mostly tables.

However, it also sounds a little dangerous. If you test your application
performance, but not thoroughly enough, you might end up with a surprise
when going into production.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] tsearch2 in PostgreSQL 8.3?

2007-08-16 Thread Bruce Momjian
Josh Berkus wrote:
 All,
 
 First off, I'll assert that backup/restore is a serious issue and while the 
 folks who want Tsearch in core now are dismissing it, we'll be fielding the 
 complaints later.  Any solution which involves setting a GUC at restore time 
 *which could vary per table or even column* isn't acceptable.  We used to do 
 the \SET thing for table ownership with backup/restore, and you *know* how 
 many restore failures that caused.

Agreed.  Let me summarize where we are now.  I talked to Tom on the
phone yesterday so we have come up with the following plan:

  o  default_text_search_config stays, not super-user-only, not set
 in pg_dump output
  o  tsearch functions that don't have a configuration name will be
 marked so they can't be specified in expression indexes
  o  auto-casts  and :: to tsearch data types will also not work in 
 expression indexes (we already do this for timestamp without 
 timezone)
  o  GIN on an text column will not promote to tsvector
  o  No rewrite magic for function calls without configuration names in 
 WHERE clauses to use indexes that do specify configurations (risky)

The current documentation explains all this:

http://momjian.us/expire/textsearch/HTML/textsearch-tables.html

So, we have disabled the ability to create expression indexes that are
affected by default_text_search_config, and we have documented other
possible problems.   tsvector_update_trigger() has to be modified to
take a configuration name (and frankly I am not excited about the
filter_name capability either, but that is a separate issue).

The only remaining problem I see is that the rest of the documentation
relies heavily on default_text_search_config when in fact the most
common usage with tables and indexes can't use it.  tsquery can use the
default easily, but I am betting that tsvector usually cannot.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Another idea for index-only scans

2007-08-16 Thread Alvaro Herrera
Jeff Davis wrote:
 On Wed, 2007-08-15 at 11:54 -0400, Bruce Momjian wrote:
  I have added another idea for index-only scans to the TODO list:
  
 A third idea would be for a heap scan to check if all rows are visible
 and if so set a per-table flag which can be checked by index scans.
 Any change to the table would have to clear the flag.  To detect
 changes during the heap scan a counter could be set at the start and
 checked at the end --- if it is the same, the table has not been
 modified --- any table change would increment the counter.
 
 This sounds useful for read-only or read-mostly tables.

I think it's too coarse-grained to be really useful.  If it was one bit
per page it could work, but one bit per relation is going to be reset
too frequently.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] build farm failures

2007-08-16 Thread Alvaro Herrera
Andrew Dunstan wrote:


 Michael Meskes wrote:
 The backend has:

 $(srcdir)/gram.c: $(srcdir)/parse.h ;

 $(srcdir)/parse.h: gram.y

 So except for the different naming it's the same. However, we haven't
 had that problem with the backend so far, or did we?

 What do I fail to see?

 We have had problems in the past. If the user builds at a point in time 
 after the .y file is checked in then the generated file is newer and if 
 it's not removed will never be regenerated, even if they do a subsequent 
 cvs update.

How do you create the copy of the repo to build?  One idea would be to
explicitely skip files that appear on .cvsignore (and maybe croak about
them).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Another idea for index-only scans

2007-08-16 Thread Decibel!
On Thu, Aug 16, 2007 at 04:06:35PM -0400, Alvaro Herrera wrote:
 Jeff Davis wrote:
  On Wed, 2007-08-15 at 11:54 -0400, Bruce Momjian wrote:
   I have added another idea for index-only scans to the TODO list:
   
  A third idea would be for a heap scan to check if all rows are visible
  and if so set a per-table flag which can be checked by index scans.
  Any change to the table would have to clear the flag.  To detect
  changes during the heap scan a counter could be set at the start and
  checked at the end --- if it is the same, the table has not been
  modified --- any table change would increment the counter.
  
  This sounds useful for read-only or read-mostly tables.
 
 I think it's too coarse-grained to be really useful.  If it was one bit
 per page it could work, but one bit per relation is going to be reset
 too frequently.

Not for the most common use cases for table partitioning.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)


pgpKGvmIPLYTD.pgp
Description: PGP signature


Re: [HACKERS] build farm failures

2007-08-16 Thread Andrew Dunstan



Alvaro Herrera wrote:



How do you create the copy of the repo to build?  One idea would be to
explicitely skip files that appear on .cvsignore (and maybe croak about
them).

  



We are supposed to croak - see 
http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgbuildfarm/client-code/run_build.pl.diff?r1=1.69r2=1.70


And these machines run the version after that. So either the code is 
buggy or my explanation is of what happened is :-)


cheers

andrew

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Re: cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)

2007-08-16 Thread Marc G. Fournier
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1



- --On Thursday, August 16, 2007 18:29:22 +0200 Magnus Hagander 
[EMAIL PROTECTED] wrote:


 Marc - how much work to update that one on the old server? I know you
 have a new one coming up, but as a stop-gap fix on the old one?

The new one is just waiting on your time scheduale ... its there ready for you 
though :)

- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFGxLb74QvfyHIvDvMRAp7KAKDna1ykPMPMKA7swknKg8aHnJ0d7QCg5T5A
yJKd0uDb7FBvm2EeZtS3ioA=
=DxVn
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] building gist index on int[] field is terrible slow. Is it bug?

2007-08-16 Thread Pavel Stehule
2007/8/16, Oleg Bartunov [EMAIL PROTECTED]:
 On Thu, 16 Aug 2007, Pavel Stehule wrote:

  Hello
 
  I tested contrib package intarray and gist support from this package.
  I was surpirised. Index building needed lot of time.
 
  10K fields .. 106sec
  20K f ..265 sec
  30Kf  .. 450 sec
  50Kf  .. 1283sec
 
  building gin index for 50K fields needed 0.5sec

 you probably need gist__intbig_ops


yes, it helps
thank you
Pavel Stehule

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] XID wraparound and busy databases

2007-08-16 Thread Paul Lindner

On Thu, Aug 16, 2007 at 04:13:55PM +0200, Florian G. Pflug wrote:
 Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 Is enlarging the xid field something we should consider for 8.4?
 
 No.  We just got the tuple header down to 24 bytes, we are not going
 to give that back and then some.
 
 If you are processing 6K transactions per second, you can afford to
 vacuum every couple days... and probably need to vacuum much more often
 than that anyway, to avoid table bloat.
 
 Possibly your respondent should think about trying to do more than one
 thing per transaction?
 
 I'm wondering how many of those 6k xacts/second are actually modifying
 data. If a large percentage of those are readonly queries, than the need
 for vacuuming could be reduced if postgres assigned an xid only if that
 xid really hits the disk. Otherwise (for purely select-type queries) it
 could use some special xid value.
 
 This is what I'm doing in my Readonly-Queries-On-PITR-Slave patch.

Bruce talked to me and others at Hi5.  We're the ones with this high
workload, and yes, the traffic is predominantly SELECTs.

Our application issues parallel SELECT queries to 32 partitioned
tables in 32 separate threads.  For our purposes this is much, much
faster than a UNION view or table inheritance.

These tables are partitioned on one key, but we also need to search on
other indexed keys, thus we use what we call a 'broadcast query'.

We'd be happy with better faster vacuums or limits on xids (or both!)

-- 
Paul Lindner| | | | |  |  |  |   |   |
[EMAIL PROTECTED]


pgpD0OBStu40z.pgp
Description: PGP signature


Re: [HACKERS] Re: cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)

2007-08-16 Thread Magnus Hagander
Marc G. Fournier wrote:
 
 Marc - how much work to update that one on the old server? I know you
 have a new one coming up, but as a stop-gap fix on the old one?
 
 The new one is just waiting on your time scheduale ... its there ready for 
 you 
 though :)


No, that's still waiting on your documentation.

But my question still stands - how much work to stop-gap fix it on the
old one?

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Another idea for index-only scans

2007-08-16 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 On Wed, 2007-08-15 at 11:54 -0400, Bruce Momjian wrote:
 A third idea would be for a heap scan to check if all rows are visible
 and if so set a per-table flag which can be checked by index scans.

 I think it's too coarse-grained to be really useful.  If it was one bit
 per page it could work, but one bit per relation is going to be reset
 too frequently.

Another problem it would have is that the flag would be a single point
of contention.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Re: cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)

2007-08-16 Thread Marc G. Fournier
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1



- --On Thursday, August 16, 2007 23:16:09 +0200 Magnus Hagander 
[EMAIL PROTECTED] wrote:


 But my question still stands - how much work to stop-gap fix it on the
 old one?

rsync should be upgraded now ...

- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFGxOT94QvfyHIvDvMRAvzDAJ99kLdDzyM9RsxeUi1aQOyoJLv+sQCgkq/e
tRC1QXW116gVX/30VZCJQGc=
=e7HB
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] tsearch patch and namespace pollution

2007-08-16 Thread Tom Lane
I find the following additions to pg_proc in the current tsearch2 patch:

   proc   | prorettype 
--+
 pg_ts_parser_is_visible(oid) | boolean
 pg_ts_dict_is_visible(oid)   | boolean
 pg_ts_template_is_visible(oid)   | boolean
 pg_ts_config_is_visible(oid) | boolean
 tsvectorin(cstring)  | tsvector
 tsvectorout(tsvector)| cstring
 tsvectorsend(tsvector)   | bytea
 tsqueryin(cstring)   | tsquery
 tsqueryout(tsquery)  | cstring
 tsquerysend(tsquery) | bytea
 gtsvectorin(cstring) | gtsvector
 gtsvectorout(gtsvector)  | cstring
 tsvector_lt(tsvector,tsvector)   | boolean
 tsvector_le(tsvector,tsvector)   | boolean
 tsvector_eq(tsvector,tsvector)   | boolean
 tsvector_ne(tsvector,tsvector)   | boolean
 tsvector_ge(tsvector,tsvector)   | boolean
 tsvector_gt(tsvector,tsvector)   | boolean
 tsvector_cmp(tsvector,tsvector)  | integer
 length(tsvector) | integer
 strip(tsvector)  | tsvector
 setweight(tsvector,char)   | tsvector
 tsvector_concat(tsvector,tsvector)   | tsvector
 vq_exec(tsvector,tsquery)| boolean
 qv_exec(tsquery,tsvector)| boolean
 tt_exec(text,text)   | boolean
 ct_exec(character varying,text)  | boolean
 tq_exec(text,tsquery)| boolean
 cq_exec(character varying,tsquery)   | boolean
 tsquery_lt(tsquery,tsquery)  | boolean
 tsquery_le(tsquery,tsquery)  | boolean
 tsquery_eq(tsquery,tsquery)  | boolean
 tsquery_ne(tsquery,tsquery)  | boolean
 tsquery_ge(tsquery,tsquery)  | boolean
 tsquery_gt(tsquery,tsquery)  | boolean
 tsquery_cmp(tsquery,tsquery) | integer
 tsquery_and(tsquery,tsquery) | tsquery
 tsquery_or(tsquery,tsquery)  | tsquery
 tsquery_not(tsquery) | tsquery
 tsq_mcontains(tsquery,tsquery)   | boolean
 tsq_mcontained(tsquery,tsquery)  | boolean
 numnode(tsquery) | integer
 querytree(tsquery)   | text
 rewrite(tsquery,tsquery,tsquery) | tsquery
 rewrite(tsquery,text)| tsquery
 rewrite_accum(tsquery,tsquery[]) | tsquery
 rewrite_finish(tsquery)  | tsquery
 rewrite(tsquery[])   | tsquery
 stat(text)   | record
 stat(text,text)  | record
 rank(real[],tsvector,tsquery,integer)| real
 rank(real[],tsvector,tsquery)| real
 rank(tsvector,tsquery,integer)   | real
 rank(tsvector,tsquery)   | real
 rank_cd(real[],tsvector,tsquery,integer) | real
 rank_cd(real[],tsvector,tsquery) | real
 rank_cd(tsvector,tsquery,integer)| real
 rank_cd(tsvector,tsquery)| real
 token_type(oid)  | record
 token_type(text) | record
 parse(oid,text)  | record
 parse(text,text) | record
 lexize(oid,text) | text[]
 lexize(text,text)| text[]
 headline(oid,text,tsquery,text)  | text
 headline(oid,text,tsquery)   | text
 headline(text,text,tsquery,text) | text
 headline(text,text,tsquery)  | text
 headline(text,tsquery,text)  | text
 headline(text,tsquery)   | text
 to_tsvector(oid,text)| tsvector
 to_tsvector(text,text)   | tsvector
 to_tsquery(oid,text) | tsquery
 to_tsquery(text,text)| tsquery
 plainto_tsquery(oid,text)| tsquery
 plainto_tsquery(text,text)   | tsquery
 to_tsvector(text)| tsvector
 to_tsquery(text) | tsquery
 plainto_tsquery(text)| tsquery
 tsvector_update_trigger()| trigger
 get_ts_config_oid(text)  | oid
 get_current_ts_config()  | oid
(82 rows)

(This list omits functions with INTERNAL arguments, as those are of
no particular concern to users.)

While most of these are probably OK, I'm disturbed by the prospect
that we are commandeering names as generic as parse or stat
with argument types as generic as text.  I think we need to put
a ts_ prefix on some of these.  Specifically, I find these names
totally unacceptable without a ts_ prefix:

 stat(text)   | record
 stat(text,text)  | record

 token_type(oid)  | record
 token_type(text)   

[HACKERS] tsvector_update_trigger() is utterly insecure

2007-08-16 Thread Tom Lane
We can't put tsvector_update_trigger() into core in anything like its
current form.  As is, it will take an unqualified function name, look
it up, and call it.  The prospects for subversion by search path
manipulation are obvious, and even if you aren't concerned about
malicious attacks, the effects of the trigger are context-dependent
(and maybe time-varying; it doesn't insist on the function being
immutable) in exactly the same way that we've been saying we can't
accept for the tsearch configuration.

I think we should redefine the trigger as taking trigger arguments that
are first a config name, then a list of one or more field names, and
nothing else.

People who want extra processing done on their fields before forming the
tsvector can write custom triggers to do it ...

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] GIT patch

2007-08-16 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  So instead of pressing to try to get something into 8.3, I would rather
  we stand back and think about it some more.
 
  I understand why you are saying hold for 8.4, but this issue came up in
  the middle of the 8.3 development cycle and didn't get much attention. 
  I would like to know why it will get any more attention during 8.4.
 
 It's not more attention that it needs; it's some good ideas.  Which
 we don't yet have, and we cannot produce on a schedule.

This is a difficult email to write but it seems GIT isn't going to make
it into 8.3.  There seems to be too many open implementation questions
to move forward.  It seems the internal API changes need more thought.

I somehow feel that if HOT wasn't being considered for 8.3 we might have
gotten GIT, but with limited resources I think there was more focus on
HOT, perhaps rightly so.

These patches will be held for 8.4:

o  Grouped Index Tuples (GIT)
o  Bitmap scan changes
o  Stream bitmaps (API change for Group Index Tuples)
o  Maintaining cluster order on insert

I believe Heikki is in agreement on this.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] tsvector_update_trigger() is utterly insecure

2007-08-16 Thread Bruce Momjian
Tom Lane wrote:
 We can't put tsvector_update_trigger() into core in anything like its
 current form.  As is, it will take an unqualified function name, look
 it up, and call it.  The prospects for subversion by search path
 manipulation are obvious, and even if you aren't concerned about
 malicious attacks, the effects of the trigger are context-dependent
 (and maybe time-varying; it doesn't insist on the function being
 immutable) in exactly the same way that we've been saying we can't
 accept for the tsearch configuration.
 
 I think we should redefine the trigger as taking trigger arguments that
 are first a config name, then a list of one or more field names, and
 nothing else.
 
 People who want extra processing done on their fields before forming the
 tsvector can write custom triggers to do it ...

Agreed.  A stated in email I didn't like the filter API on style
grounds.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] GIT patch

2007-08-16 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Bruce Momjian wrote:
 Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:

 I somehow feel that if HOT wasn't being considered for 8.3 we might have
 gotten GIT, but with limited resources I think there was more focus on
 HOT, perhaps rightly so.
 
 These patches will be held for 8.4:
 
   o  Grouped Index Tuples (GIT)
   o  Bitmap scan changes
   o  Stream bitmaps (API change for Group Index Tuples)
   o  Maintaining cluster order on insert
 
 I believe Heikki is in agreement on this.

That is certainly a bummer.

Joshua D. Drake



- --

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGxQU0ATb/zqfZUUQRArzSAJ4p6WKRiUEKOXsPduYViudNLvijDQCeMXJI
xvq7Ir1/bsQOpSlIqYwpYyc=
=kh8s
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] tsearch patch and namespace pollution

2007-08-16 Thread Bruce Momjian
Tom Lane wrote:
 I find the following additions to pg_proc in the current tsearch2 patch:

It seems a lot of these are useless and just bloat.  I will mark a few:

proc   | prorettype 
 --+
  pg_ts_parser_is_visible(oid) | boolean
  pg_ts_dict_is_visible(oid)   | boolean
  pg_ts_template_is_visible(oid)   | boolean
  pg_ts_config_is_visible(oid) | boolean

Why would anyone look these up via OID rather than name?

  tsvectorin(cstring)  | tsvector
  tsvectorout(tsvector)| cstring
  tsvectorsend(tsvector)   | bytea
  tsqueryin(cstring)   | tsquery
  tsqueryout(tsquery)  | cstring
  tsquerysend(tsquery) | bytea
  gtsvectorin(cstring) | gtsvector
  gtsvectorout(gtsvector)  | cstring
  tsvector_lt(tsvector,tsvector)   | boolean
  tsvector_le(tsvector,tsvector)   | boolean
  tsvector_eq(tsvector,tsvector)   | boolean
  tsvector_ne(tsvector,tsvector)   | boolean
  tsvector_ge(tsvector,tsvector)   | boolean
  tsvector_gt(tsvector,tsvector)   | boolean
  tsvector_cmp(tsvector,tsvector)  | integer
  length(tsvector) | integer
  strip(tsvector)  | tsvector
  setweight(tsvector,char)   | tsvector
  tsvector_concat(tsvector,tsvector)   | tsvector
  vq_exec(tsvector,tsquery)| boolean
  qv_exec(tsquery,tsvector)| boolean
  tt_exec(text,text)   | boolean
  ct_exec(character varying,text)  | boolean
  tq_exec(text,tsquery)| boolean
  cq_exec(character varying,tsquery)   | boolean
  tsquery_lt(tsquery,tsquery)  | boolean
  tsquery_le(tsquery,tsquery)  | boolean
  tsquery_eq(tsquery,tsquery)  | boolean
  tsquery_ne(tsquery,tsquery)  | boolean
  tsquery_ge(tsquery,tsquery)  | boolean
  tsquery_gt(tsquery,tsquery)  | boolean
  tsquery_cmp(tsquery,tsquery) | integer
  tsquery_and(tsquery,tsquery) | tsquery
  tsquery_or(tsquery,tsquery)  | tsquery
  tsquery_not(tsquery) | tsquery
  tsq_mcontains(tsquery,tsquery)   | boolean
  tsq_mcontained(tsquery,tsquery)  | boolean
  numnode(tsquery) | integer
  querytree(tsquery)   | text
  rewrite(tsquery,tsquery,tsquery) | tsquery
  rewrite(tsquery,text)| tsquery
  rewrite_accum(tsquery,tsquery[]) | tsquery
  rewrite_finish(tsquery)  | tsquery
  rewrite(tsquery[])   | tsquery
  stat(text)   | record
  stat(text,text)  | record

  rank(real[],tsvector,tsquery,integer)| real
  rank(real[],tsvector,tsquery)| real
  rank(tsvector,tsquery,integer)   | real
  rank(tsvector,tsquery)   | real
  rank_cd(real[],tsvector,tsquery,integer) | real
  rank_cd(real[],tsvector,tsquery) | real
  rank_cd(tsvector,tsquery,integer)| real
  rank_cd(tsvector,tsquery)| real

Do we realy need this many ranking functions?

  token_type(oid)  | record

Again, why by OID?

  token_type(text) | record
  parse(oid,text)  | record
  parse(text,text) | record
  lexize(oid,text) | text[]
  lexize(text,text)| text[]
  headline(oid,text,tsquery,text)  | text
  headline(oid,text,tsquery)   | text
  headline(text,text,tsquery,text) | text
  headline(text,text,tsquery)  | text
  headline(text,tsquery,text)  | text
  headline(text,tsquery)   | text
  to_tsvector(oid,text)| tsvector
  to_tsvector(text,text)   | tsvector
  to_tsquery(oid,text) | tsquery

Why OID again for the configuration?  I just don't see the use case and
it is bloat and causes confusion.

  to_tsquery(text,text)| tsquery
  plainto_tsquery(oid,text)| tsquery
  plainto_tsquery(text,text)   | tsquery

Again, OID.  I asked Oleg about this and he said:

 Bruce, just remove oid argument specification from documentation.

so I think we can go ahead and remove cases where the configuration name
or object is specified by oid.  I have already removed them from the
documentation and I though the patch had them removed too, but I guess
not.  Admittedly this API has been in flux.

  to_tsvector(text)| tsvector
  to_tsquery(text) | tsquery
  

Re: [HACKERS] GIT patch

2007-08-16 Thread Bruce Momjian
Joshua D. Drake wrote:
  These patches will be held for 8.4:
  
  o  Grouped Index Tuples (GIT)
  o  Bitmap scan changes
  o  Stream bitmaps (API change for Group Index Tuples)
  o  Maintaining cluster order on insert
  
  I believe Heikki is in agreement on this.
 
 That is certainly a bummer.

I think text search has challenges similar to GIT, but the GIT issues
were more how to change the internal API, while text search was a
user-API issue which is easier to bang into shape.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] GIT patch

2007-08-16 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Bruce Momjian wrote:
 Joshua D. Drake wrote:
 These patches will be held for 8.4:

 o  Grouped Index Tuples (GIT)
 o  Bitmap scan changes
 o  Stream bitmaps (API change for Group Index Tuples)
 o  Maintaining cluster order on insert

 I believe Heikki is in agreement on this.
 That is certainly a bummer.
 
 I think text search has challenges similar to GIT, but the GIT issues
 were more how to change the internal API, while text search was a
 user-API issue which is easier to bang into shape.

Well let me just throw out there that I am in favor of this hold back
for 8.4. I don't like it but I am in favor of it.

Sincerely,

Joshua D. Drake

 


- --

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGxQftATb/zqfZUUQRAob2AJwKfNaUBgz6TmSI2/bCfYvbKwQmwgCfT7pg
6UXsRjC/4WPQM+zB93p4uPM=
=nXAo
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] text search vs schemas

2007-08-16 Thread Tom Lane
I wrote:
 We can't put tsvector_update_trigger() into core in anything like its
 current form.  As is, it will take an unqualified function name, look
 it up, and call it.  The prospects for subversion by search path
 manipulation are obvious, and even if you aren't concerned about
 malicious attacks, the effects of the trigger are context-dependent

Actually ... I'm suddenly not happy about the choice to put text search
configurations etc. into schemas at all.  We've been sitting here and
assuming that to_tsvector('english', my_text_col) has a well defined
meaning --- but as the patch stands, *it does not*.  The interpretation
of the config name could easily change depending on search_path.

It does not seem likely that a typical installation will have so many
text search configs that subdividing them into schemas will really be
useful.  If I recall correctly, Teodor did that on my recommendation
that it'd be the cleanest way to distinguish built-in from non-built-in
objects for dump purposes.  That is, pg_dump would ignore TS objects
that are in pg_catalog and dump everything else.  But I'm having severe
second thoughts about that.

What seems the most attractive alternative at the moment is to have a
flat namespace for TS objects (no schemas) and introduce something like
a bool is_built_in column for pg_dump to consult in deciding whether
to dump 'em.

Comments?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] text search vs schemas

2007-08-16 Thread Trevor Talbot
On 8/16/07, Tom Lane [EMAIL PROTECTED] wrote:

 Actually ... I'm suddenly not happy about the choice to put text search
 configurations etc. into schemas at all.  We've been sitting here and
 assuming that to_tsvector('english', my_text_col) has a well defined
 meaning --- but as the patch stands, *it does not*.  The interpretation
 of the config name could easily change depending on search_path.

 It does not seem likely that a typical installation will have so many
 text search configs that subdividing them into schemas will really be
 useful.  If I recall correctly, Teodor did that on my recommendation
 that it'd be the cleanest way to distinguish built-in from non-built-in
 objects for dump purposes.  That is, pg_dump would ignore TS objects
 that are in pg_catalog and dump everything else.  But I'm having severe
 second thoughts about that.

 What seems the most attractive alternative at the moment is to have a
 flat namespace for TS objects (no schemas) and introduce something like
 a bool is_built_in column for pg_dump to consult in deciding whether
 to dump 'em.

That assumes a database-oriented search config, instead of a case of
multiple users confined to invidual schemas doing their own thing.  Is
the latter possible now, and do you want to remove that ability?

Something else that occurs to me though: the problem seems to be that
parts of tsearch take object names as strings.  I thought one
advantage of having it in core is that they are now real database
objects, with owners etc.  How many other database objects are passed
around as string labels?

Wouldn't treating them as actual objects remove this whole issue?
What happens now if you try to drop a configuration that's still used
in a trigger somewhere?

(I'm new to both tsearch2 and this list, so please excuse any
mistakes.  Mostly keeping an eye on this for future use in my own
projects.)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] text search vs schemas

2007-08-16 Thread Tom Lane
Trevor Talbot [EMAIL PROTECTED] writes:
 Wouldn't treating them as actual objects remove this whole issue?

Uh, no.  Function names for example are subject to search-path
confusion.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] text search vs schemas

2007-08-16 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Trevor Talbot [EMAIL PROTECTED] writes:
 Wouldn't treating them as actual objects remove this whole issue?

 Uh, no.  Function names for example are subject to search-path
 confusion.

Wait, are they? They are in PL languages but only because most languages store
their source code as text just as is happening here. 

But they're not in views or other native SQL uses of functions because they
store the reference to the specific function's OID. In dumps they output the
schema along with the name. As in:

postgres=# \d foo.testv
   View foo.testv
 Column |  Type   | Modifiers 
+-+---
 i  | integer | 
 a  | integer | 
View definition:
 SELECT test.i, foo.a(test.i) AS a
   FROM foo.test;

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] text search vs schemas

2007-08-16 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Uh, no.  Function names for example are subject to search-path
 confusion.

 Wait, are they? They are in PL languages but only because most
 languages store their source code as text just as is happening here.

Hmmm ... if you look at the current solution for default expressions
for serial columns, ie nextval() on a regclass constant, it's pretty
schema-safe.  So we could imagine inventing a regconfig datatype that
is the same sort of wrapper-over-OID.  Then make the 2-parameter form
of to_tsvector take that type instead of text.

That seems like it'd fix the problem for expression indexes on
to_tsvector calls, but I don't see how it fixes the problem for
triggers.  We don't have any clear path for making trigger arguments
be anything but a list of strings.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] text search vs schemas

2007-08-16 Thread Oleg Bartunov

On Thu, 16 Aug 2007, Tom Lane wrote:


I wrote:

We can't put tsvector_update_trigger() into core in anything like its
current form.  As is, it will take an unqualified function name, look
it up, and call it.  The prospects for subversion by search path
manipulation are obvious, and even if you aren't concerned about
malicious attacks, the effects of the trigger are context-dependent


Actually ... I'm suddenly not happy about the choice to put text search
configurations etc. into schemas at all.  We've been sitting here and
assuming that to_tsvector('english', my_text_col) has a well defined
meaning --- but as the patch stands, *it does not*.  The interpretation
of the config name could easily change depending on search_path.


what's wrong with schema-qualified name ?



It does not seem likely that a typical installation will have so many
text search configs that subdividing them into schemas will really be
useful.  If I recall correctly, Teodor did that on my recommendation


it's useful.



that it'd be the cleanest way to distinguish built-in from non-built-in
objects for dump purposes.  That is, pg_dump would ignore TS objects


I think you're wrong here. Schema often used to save connections and it's
natural to have different searches in different schemes.


that are in pg_catalog and dump everything else.  But I'm having severe
second thoughts about that.

What seems the most attractive alternative at the moment is to have a
flat namespace for TS objects (no schemas) and introduce something like
a bool is_built_in column for pg_dump to consult in deciding whether
to dump 'em.

Comments?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly