Re: [HACKERS] Index Tuple Compression Approach?
Gregory Stark wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
That general approach of storing a common leading part just once is called prefix compression.

Yeah, it helps a lot on long text fields. Tree structures like file paths in particular. You kind of want to avoid both the prefix and the suffix, no?

You're much more likely to find common prefixes than suffixes in an index page, because of the ordering. I suppose compressing the suffix would be useful in some cases as well. You might be better off with some generic compression algorithm at that point, though.

It's been discussed before. One big problem is extracting the common leading part. You could only do it for text.

Or for multi-column indexes.

Oh yeah, that you could do more easily.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
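The page-level prefix compression described here can be sketched in a few lines. This is an illustrative Python sketch, not PostgreSQL code; the function names are hypothetical:

```python
import os

def compress_page(keys):
    # Store the common leading part of a sorted page's keys once;
    # each entry then keeps only its distinct remainder.
    prefix = os.path.commonprefix(keys)  # character-wise common prefix
    return prefix, [k[len(prefix):] for k in keys]

def decompress_page(prefix, suffixes):
    # Reconstitute the full keys for comparisons during a scan.
    return [prefix + s for s in suffixes]

# File-path keys, the case where this helps most:
prefix, rest = compress_page(
    ["/usr/share/doc/a", "/usr/share/doc/b", "/usr/share/man"])
```

Here `prefix` comes out as `/usr/share/` and only the short remainders are stored per entry. Because index pages hold sorted keys, long shared prefixes are common within a page, which is exactly why suffixes compress far less well.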
Re: [HACKERS] Index Tuple Compression Approach?
Heikki Linnakangas [EMAIL PROTECTED] writes:
Gregory Stark wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
That general approach of storing a common leading part just once is called prefix compression.

Yeah, it helps a lot on long text fields. Tree structures like file paths in particular. You kind of want to avoid both the prefix and the suffix, no?

You're much more likely to find common prefixes than suffixes in an index page, because of the ordering. I suppose compressing the suffix would be useful in some cases as well. You might be better off with some generic compression algorithm at that point, though.

Sorry, by suffix I don't mean common suffixes, I mean the bits of the key following the point which discriminates between the left and right side of the tree. So for example if you're indexing a text field and have a tree structure like:

                 Redhat Fedora Core 7
                /                    \
    Debian Etch (Unstable)      Ubuntu hoary

We don't really need the whole of Redhat Fedora Core 7 in the index node. We could actually get by with just R. Everything before R is on the left and everything after R is on the right.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
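Gregory's point amounts to suffix truncation of separator keys: keep only as much of the key as is needed to discriminate between the two subtrees. A minimal sketch (illustrative Python, assuming plain byte-wise string comparison; `minimal_separator` is a hypothetical name, not anything in the PostgreSQL source):

```python
def minimal_separator(left_max, right_min):
    # Shortest string s with left_max < s <= right_min: take right_min's
    # prefix up to and including the first character where it diverges
    # from left_max.
    for i, ch in enumerate(right_min):
        if i >= len(left_max) or ch != left_max[i]:
            return right_min[: i + 1]
    return right_min  # sorted input should never reach this
```

For the tree above, `minimal_separator('Debian Etch (Unstable)', 'Redhat Fedora Core 7')` returns just `'R'`: everything below `'R'` goes left, everything at or above it goes right.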
[HACKERS] build farm failures
Hi, we have two build farm members failing to make since I committed the ecpg changes: echidna and herring. It looks like they are still using an old preproc.c although they checked out the new preproc.y. I have no idea how this is supposed to work so could someone please enlighten me?

Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED] Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
Bruce Momjian wrote:
Uh, how are we going to prevent the auto-casting to tsvector from using the default GUC config, e.g. CREATE INDEX i ON x USING GIN(col)? This is where I started to see that the need for education and the error-prone nature of the default GUC just weren't worth having it, though I know others disagree.

It can be removed quite easily. AFAIR, this feature was added on suggestion of Tom Lane. It was certainly only added in this tsearch-to-core release cycle, see here:
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01384.php

Teodor Sigaev wrote:
2) added operator class for text and varchar
CREATE INDEX idxname ON tblname USING GIN ( textcolumn );

So just remove the operator class or don't specify it as default operator class for GIN, and the thing is gone. Perhaps there is a better way to do this, though.

[...digging...] The idea was born in the thread starting here (involving Tom Lane, Joshua Drake, and Teodor Sigaev):
http://archives.postgresql.org/pgsql-hackers/2007-03/msg00912.php
with the conclusion here:
http://archives.postgresql.org/pgsql-hackers/2007-03/msg00936.php

Best Regards Michael Paesold
Re: [HACKERS] build farm failures
Michael Meskes wrote:
Hi, we have two build farm members failing to make since I committed the ecpg changes: echidna and herring. It looks like they are still using an old preproc.c although they checked out the new preproc.y. I have no idea how this is supposed to work so could someone please enlighten me?

Yes, it looks like that. But the buildfarm client doesn't actually build in the repo normally - it builds in a temp copy which is removed at the end of the run, precisely to avoid this kind of problem, so I'm a bit mystified how it can happen. In fact we go to some lengths to ensure that there are no extraneous files, but this one might not get caught by that because it is in .cvsignore.

This sort of thing is usually a symptom of somebody having run a build in the repo directly, a thing that buildfarm owners have been repeatedly advised not to do.

Anyway, the simple solution is to ask Darcy to blow away the repo (these buildfarm clients share a single cvs checkout) so that the buildfarm client will get a fresh checkout next time it's run.

cheers andrew
Re: [HACKERS] XID wraparound and busy databases
On Wed, 2007-08-15 at 12:49 -0400, Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
Maybe we can do something to reduce the xid consumption? For example, reuse xids for read-only queries.

Hmm, that's an idea.

That sounds like a good idea, and it would be very useful for additional scalability. We kicked around the idea of reusing snapshots previously also, as a way of reducing access to the ProcArray. That idea had some problems, but if your workload was all read-only and we reused xids then everybody would have an identical snapshot, so reusing it would make lots of sense.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
Michael Paesold [EMAIL PROTECTED] writes:
Teodor Sigaev wrote:
2) added operator class for text and varchar
CREATE INDEX idxname ON tblname USING GIN ( textcolumn );

So just remove the operator class or don't specify it as default operator class for GIN, and the thing is gone. Perhaps there is a better way to do this, though. [...digging...] The idea was born in the thread starting here (involving Tom Lane, Joshua Drake, and Teodor Sigaev):
http://archives.postgresql.org/pgsql-hackers/2007-03/msg00912.php
with the conclusion here:
http://archives.postgresql.org/pgsql-hackers/2007-03/msg00936.php

Yeah, unfortunately we overlooked the implications of the conversion to tsvector being environment-dependent. Those opclasses will have to go away again. AFAICS the only safe way to build an index directly on a text column is

CREATE INDEX idxname ON tblname USING gin (to_tsvector('config', textcolumn));

ie, you hardwire the configuration name directly into the index definition. Similarly, if you're using a trigger to build a materialized tsvector column, you need to hardwire the config name into the trigger definition. An alternative in both cases is to take the config name from another field of the table row. This is what you'd need to do for the advanced cases where you use different configs for different entries in the same table.

We can still have a GUC default_text_search_config, but we have to design the system around the assumption that that should only be referenced during *queries*, not during updates. That's the only safe way to let it be changeable.

regards, tom lane
Re: [HACKERS] XID wraparound and busy databases
Tom Lane wrote:
Bruce Momjian [EMAIL PROTECTED] writes:
Is enlarging the xid field something we should consider for 8.4?

No. We just got the tuple header down to 24 bytes, we are not going to give that back and then some. If you are processing 6K transactions per second, you can afford to vacuum every couple days... and probably need to vacuum much more often than that anyway, to avoid table bloat. Possibly your respondent should think about trying to do more than one thing per transaction?

I'm wondering how many of those 6k xacts/second are actually modifying data. If a large percentage of those are read-only queries, then the need for vacuuming could be reduced if postgres assigned an xid only if that xid really hits the disk. Otherwise (for purely select-type queries) it could use some special xid value. This is what I'm doing in my Readonly-Queries-On-PITR-Slave patch.

greetings, Florian Pflug
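Florian's scheme — hand out a real XID only when a transaction first needs one to hit the disk — can be sketched like this. Illustrative Python only; the names and the sentinel value are hypothetical, not PostgreSQL's actual internals:

```python
READ_ONLY_XID = 0  # hypothetical sentinel for xid-less transactions

class Transaction:
    def __init__(self, xid_counter):
        self._counter = xid_counter
        self.xid = READ_ONLY_XID  # no XID consumed yet

    def xid_for_write(self):
        # Lazily assign a real XID on the first data modification;
        # pure SELECT transactions never advance the counter.
        if self.xid == READ_ONLY_XID:
            self._counter[0] += 1
            self.xid = self._counter[0]
        return self.xid

counter = [100]                # shared "next XID" state
reader = Transaction(counter)  # runs only SELECTs, keeps the sentinel
writer = Transaction(counter)
writer.xid_for_write()         # first write: a real XID is assigned
writer.xid_for_write()         # later writes reuse the same XID
```

Only the writer consumed an XID, so a workload dominated by reads advances the wraparound horizon far more slowly.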
Re: [HACKERS] Index Tuple Compression Approach?
[EMAIL PROTECTED] (Gregory Stark) writes:
Heikki Linnakangas [EMAIL PROTECTED] writes:
Gregory Stark wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
That general approach of storing a common leading part just once is called prefix compression.

Yeah, it helps a lot on long text fields. Tree structures like file paths in particular. You kind of want to avoid both the prefix and the suffix, no?

You're much more likely to find common prefixes than suffixes in an index page, because of the ordering. I suppose compressing the suffix would be useful in some cases as well. You might be better off with some generic compression algorithm at that point, though.

Sorry, by suffix I don't mean common suffixes, I mean the bits of the key following the point which discriminates between the left and right side of the tree. So for example if you're indexing a text field and have a tree structure like:

                 Redhat Fedora Core 7
                /                    \
    Debian Etch (Unstable)      Ubuntu hoary

We don't really need the whole of Redhat Fedora Core 7 in the index node. We could actually get by with just R. Everything before R is on the left and everything after R is on the right.

Right. The case where you get more characters than just R is when you introduce extra entries that have the same prefix. Say, Redhat Fedora Core 4, Redhat Fedora Core 5, Redhat Fedora Core 6. And, for good measure, let's throw in Redhat RHAS 3, Redhat RHAS 4, and Redhat RHAS 5. In that case, you'd have substrings:

  R
  edhat Fedora Core
  RHAS

as discriminators.

-- cbbrowne,@,cbbrowne.com http://www3.sympatico.ca/cbbrowne/spreadsheets.html If a mute swears, does his mother wash his hands with soap?
[HACKERS] Deferred Constraints
Dear Hackers, Could you point me to the first version of PostgreSQL that started to support Deferred Constraints (ie: the DEFERRABLE keyword on foreign key creation)? I guess it is earlier than 7.0, right?

-- Regards, Rodrigo Hjort http://icewall.org/~hjort
Re: [HACKERS] Deferred Constraints
On Thu, Aug 16, 2007 at 12:45:23PM -0300, Rodrigo Hjort wrote:
Dear Hackers, Could you point me the first version PostgreSQL started to support Deferred Constraints (ie: DEFERRABLE keyword on foreign keys creation)? I guess it is earlier than 7.0, right?

You could look through the release notes... they go way back.

-- Decibel!, aka Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
Tom Lane [EMAIL PROTECTED] writes:
Yeah, unfortunately we overlooked the implications of the conversion to tsvector being environment-dependent. Those opclasses will have to go away again. AFAICS the only safe way to build an index directly on a text column is
CREATE INDEX idxname ON tblname USING gin (to_tsvector('config', textcolumn));

Is there a null configuration which could be the default for the casts? So the syntax would still work and would generate an index which worked well but has no stop words, no stemming, etc?

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
[HACKERS] cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)
Alvaro Herrera [EMAIL PROTECTED] writes:
There seems to be something wrong with this commit notification. This file pgsql/src/test/regress/expected: foreign_key.out (r1.44 - r1.45) (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/foreign_key.out?r1=1.44&r2=1.45) does not seem to have a 1.45 revision. Indeed, the changeset in this patch for this file is in 1.44, not 1.45.

No, 1.44 was a month ago. 1.45 does exist in the master CVS repo AFAICS. It looks to me like cvsweb is showing a version that hasn't gotten updated for a day or two. Some fallout of the master server move no doubt.

regards, tom lane
Re: [HACKERS] build farm failures
Darcy Buskermolen wrote:
This sort of thing is usually a symptom of somebody having run a build in the repo directly, a thing that buildfarm owners have been repeatedly advised not to do.

This is something I do not recall doing, however it's possible. though this does make me ask why the build dependencies in the Makefile are not properly set up to tell that the .y needs to be rebuilt (which I would assume would make this problem also go away)

The way cvs works is that it gives the file the date it has in the repository, so if your preproc.c is newer than the preproc.y, make will detect that and not rebuild it. If Michael's checkin occurs between the time the repo is updated and the time bison gets run on the original file this will happen. But if you never ever build in the repo then it won't, because buildfarm only ever builds in a copy (unless you're building with vpath, in which case it cleans up the generated files).

cheers andrew
Re: [HACKERS] build farm failures
On Thursday 16 August 2007 04:29:41 Andrew Dunstan wrote:
Michael Meskes wrote:
Hi, we have two build farm members failing to make since I committed the ecpg changes: echidna and herring. It looks like they are still using an old preproc.c although they checked out the new preproc.y. I have no idea how this is supposed to work so could someone please enlighten me?

Yes, it looks like that. But the buildfarm client doesn't actually build in the repo normally - it builds in a temp copy which is removed at the end of the run, precisely to avoid this kind of problem, so I'm a bit mystified how it can happen. In fact we go to some lengths to ensure that there are no extraneous files, but this one might not get caught by that because it is in .cvsignore. This sort of thing is usually a symptom of somebody having run a build in the repo directly, a thing that buildfarm owners have been repeatedly advised not to do.

This is something I do not recall doing, however it's possible. though this does make me ask why the build dependencies in the Makefile are not properly set up to tell that the .y needs to be rebuilt (which I would assume would make this problem also go away)

Anyway, the simple solution is to ask Darcy to blow away the repo (these buildfarm clients share a single cvs checkout) so that the buildfarm client will get a fresh checkout next time it's run.

Let me go nuke the tree, and we'll try again

cheers andrew
[HACKERS] Re: cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)
Tom Lane wrote:
Alvaro Herrera [EMAIL PROTECTED] writes:
There seems to be something wrong with this commit notification. This file pgsql/src/test/regress/expected: foreign_key.out (r1.44 - r1.45) (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/foreign_key.out?r1=1.44r2=1.45) does not seem to have a 1.45 revision. Indeed, the changeset in this patch for this file is in 1.44, not 1.45.

No, 1.44 was a month ago. 1.45 does exist in the master CVS repo AFAICS. It looks to me like cvsweb is showing a version that hasn't gotten updated for a day or two. Some fallout of the master server move no doubt.

Yes, it seems to be caused by the bug in the rsync server on the old machine. It gets fixed eventually by itself (I pushed it through now). Marc - how much work to update that one on the old server? I know you have a new one coming up, but as a stop-gap fix on the old one?

//Magnus
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
Gregory Stark [EMAIL PROTECTED] writes:
Is there a null configuration which could be the default for the casts?

If we did that then there would be no point in having a GUC variable, because it wouldn't control anything. While that is certainly an alternative solution, I think it's riskier than having the GUC variable and just preventing (or at least discouraging) people from relying on it in their index definitions. The problem is that people will resort to ALTER CONFIGURATION default to tune their setups, and anytime you make a nontrivial change that way, you run the risk of breaking your existing indexes.

regards, tom lane
Re: [HACKERS] build farm failures
Andrew Dunstan wrote:
Darcy Buskermolen wrote:
This sort of thing is usually a symptom of somebody having run a build in the repo directly, a thing that buildfarm owners have been repeatedly advised not to do.

This is something I do not recall doing, however it's possible. though this does make me ask why the build dependencies in the Makefile are not properly set up to tell that the .y needs to be rebuilt (which I would assume would make this problem also go away)

The way cvs works is that it gives the file the date it has in the repository, so if your preproc.c is newer than the preproc.y, make will detect that and not rebuild it. If Michael's checkin occurs between the time the repo is updated and the time bison gets run on the original file this will happen. But if you never ever build in the repo then it won't, because buildfarm only ever builds in a copy (unless you're building with vpath, in which case it cleans up the generated files).

Hum, so why not clean up the files when not in vpath as well?

find . -name .cvsignore | while read line
do
    dir=$(dirname $line)
    cd $dir
    rm -fv `cat .cvsignore`
    cd $OLDPWD
done

-- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] build farm failures
Alvaro Herrera wrote:
Andrew Dunstan wrote:
Darcy Buskermolen wrote:
This sort of thing is usually a symptom of somebody having run a build in the repo directly, a thing that buildfarm owners have been repeatedly advised not to do.

This is something I do not recall doing, however it's possible. though this does make me ask why the build dependencies in the Makefile are not properly set up to tell that the .y needs to be rebuilt (which I would assume would make this problem also go away)

The way cvs works is that it gives the file the date it has in the repository, so if your preproc.c is newer than the preproc.y, make will detect that and not rebuild it. If Michael's checkin occurs between the time the repo is updated and the time bison gets run on the original file this will happen. But if you never ever build in the repo then it won't, because buildfarm only ever builds in a copy (unless you're building with vpath, in which case it cleans up the generated files).

Hum, so why not clean up the files when not in vpath as well?

find . -name .cvsignore | while read line
do
    dir=$(dirname $line)
    cd $dir
    rm -fv `cat .cvsignore`
    cd $OLDPWD
done

Because they are not supposed to be there in the first place! If the buildfarm owner builds in the repo that is pilot error. And, btw, buildfarm is not a shell script. We use File::Find to do this sort of thing.

cheers andrew
Re: [HACKERS] build farm failures
On Thu, Aug 16, 2007 at 08:24:14AM -0700, Darcy Buskermolen wrote:
This is something I do not recall doing, however it's possible. though this does make me ask why the build dependencies in the Makefile are not properly set up to tell that the .y needs to be rebuilt (which I would assume would make this problem also go away)

Frankly I have no idea. The dependencies are the same as with the backend's gram.y file:

$(srcdir)/preproc.c: $(srcdir)/preproc.h ;
$(srcdir)/preproc.h: preproc.y

The backend has:

$(srcdir)/gram.c: $(srcdir)/parse.h ;
$(srcdir)/parse.h: gram.y

So except for the different naming it's the same. However, we haven't had that problem with the backend so far, or did we? What do I fail to see?

Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED] Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
Re: [HACKERS] build farm failures
Michael Meskes wrote:
The backend has:

$(srcdir)/gram.c: $(srcdir)/parse.h ;
$(srcdir)/parse.h: gram.y

So except for the different naming it's the same. However, we haven't had that problem with the backend so far, or did we? What do I fail to see?

We have had problems in the past. If the user builds at a point in time after the .y file is checked in then the generated file is newer and if it's not removed will never be regenerated, even if they do a subsequent cvs update.

cheers andrew
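The failure mode Andrew describes is easy to reproduce with a make-style timestamp comparison. A hypothetical sketch in Python (file names and dates are illustrative; `needs_rebuild` just mimics make's prerequisite rule):

```python
import os
import tempfile

def needs_rebuild(target, source):
    # make's rule: rebuild the target only if the source is newer.
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)

workdir = tempfile.mkdtemp()
preproc_y = os.path.join(workdir, "preproc.y")
preproc_c = os.path.join(workdir, "preproc.c")
for path in (preproc_y, preproc_c):
    open(path, "w").close()

os.utime(preproc_y, (1000, 1000))  # grammar as originally checked out
os.utime(preproc_c, (2000, 2000))  # generated file from an in-repo build

# cvs update stamps the file with its repository date, which can still
# be older than the stale generated file left behind earlier:
os.utime(preproc_y, (1500, 1500))
stale_output_kept = not needs_rebuild(preproc_c, preproc_y)
```

`stale_output_kept` comes out True: make sees preproc.c as newer and never reruns bison, which is why the buildfarm insists on building only in a clean copy.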
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
All,

First off, I'll assert that backup/restore is a serious issue and while the folks who want Tsearch in core now are dismissing it, we'll be fielding the complaints later. Any solution which involves setting a GUC at restore time *which could vary per table or even column* isn't acceptable. We used to do the \SET thing for table ownership with backup/restore, and you *know* how many restore failures that caused.

Basically, restore happens at two times: (1) when the server fails and you need to recover, and (2) when you're upgrading, already a painful process. Glitches which occur at these times cause panic, angry user e-mails and people switching away from PostgreSQL. It's just not acceptable for us to put new potential booby-traps in the way of restore.

Second, as attractive as the idea is, I can't see how a typemod would work. It's not like we have a fixed list of dictionaries; people can create their own. If we wanted to clean up the syntax I suppose we could have a form of to_tsvector which took a two-column composite value as if it were a multicolumn index:

CREATE INDEX resumes_fti ON resumes USING GIN ( 'default', resume_text )

hmmm, that wouldn't work as syntax, would it? We can't accept a constant as a column in a multi-column index, can we? Another reason why we can't do mods for 8.3.

This means, from my perspective, that the only reasonable course for 8.3 is to require the 2-parameter form of to_tsvector for indexes. I'll say that in the applications I've developed with TSearch2 I use the 2-parameter form of to_tsvector and to_tsquery exclusively, as I've found the behavior of TSearch to be highly unreliable if I don't specify.

-- Josh Berkus PostgreSQL @ Sun San Francisco
[HACKERS] building gist index on int[] field is terrible slow. Is it bug?
Hello,

I tested the contrib package intarray and the gist support from this package. I was surprised. Index building needed a lot of time:

10K fields ..  106 sec
20K fields ..  265 sec
30K fields ..  450 sec
50K fields .. 1283 sec

Building a gin index for 50K fields needed 0.5 sec.

Regards Pavel Stehule

sample of data:

pavel=# select * from test limit 10;
           a
-----------------------
 {4209,4207,4197,2066}
 {4832,3004}
 {4629}
 {3243}
 {4816}
 {3726}
 {4834}
 {1459,3160,3984}
 {4569}
 {4164,1307,962,4482}
(10 rows)

pavel=# select version();
 PostgreSQL 8.3devel on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070502 (Red Hat 4.1.2-12)
(1 row)
Time: 442,034 ms

pavel=# show maintenance_work_mem;
 16MB
(1 row)

pavel=# set maintenance_work_mem to '300MB';
SET
Time: 0,230 ms

pavel=# CREATE index fooidx on test using gist(a gist__int_ops);
CREATE INDEX
Time: 1269276,866 ms
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
On Thu, 16 Aug 2007, Josh Berkus wrote:

All, First off, I'll assert that backup/restore is a serious issue and while the folks who want Tsearch in core now are dismissing it, we'll be fielding the complaints later. Any solution which involves setting a GUC at restore time *which could vary per table or even column* isn't acceptable. We used to do the \SET thing for table ownership with backup/restore, and you *know* how many restore failures that caused. Basically, restore happens at two times: (1) when the server fails and you need to recover, and (2) when you're upgrading, already a painful process. Glitches which occur at these times cause panic, angry user e-mails and people switching away from PostgreSQL. It's just not acceptable for us to put new potential booby-traps in the way of restore. Second, as attractive as the idea is, I can't see how a typemod would work. It's not like we have a fixed list of dictionaries; people can create their own. If we wanted to clean up the syntax I suppose we could have a form of to_tsvector which took a two-column composite value as if it were a multicolumn index:

CREATE INDEX resumes_fti ON resumes USING GIN ( 'default', resume_text )

Josh, all my respects to you, but text searching is not about indexes at all. Text searching is about the tsvector and tsquery data types.

hmmm, that wouldn't work as syntax, would it? We can't accept a constant as a column in a multi-column index, can we? Another reason why we can't do mods for 8.3. This means, from my perspective, that the only reasonable course for 8.3 is to require the 2-parameter form of to_tsvector for indexes. I'll say that in the applications I've developed with TSearch2 I use the 2-parameter form of to_tsvector and to_tsquery exclusively, as I've found the behavior of TSearch to be highly unreliable if I don't specify.
Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] building gist index on int[] field is terrible slow. Is it bug?
On Thu, 16 Aug 2007, Pavel Stehule wrote:

Hello, I tested contrib package intarray and gist support from this package. I was surprised. Index building needed a lot of time:

10K fields ..  106 sec
20K fields ..  265 sec
30K fields ..  450 sec
50K fields .. 1283 sec

Building a gin index for 50K fields needed 0.5 sec.

you probably need gist__intbig_ops

sample of data:

pavel=# select * from test limit 10;
           a
-----------------------
 {4209,4207,4197,2066}
 {4832,3004}
 {4629}
 {3243}
 {4816}
 {3726}
 {4834}
 {1459,3160,3984}
 {4569}
 {4164,1307,962,4482}
(10 rows)

pavel=# CREATE index fooidx on test using gist(a gist__int_ops);
CREATE INDEX
Time: 1269276,866 ms

Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] XID wraparound and busy databases
Added to TODO:

* Reduce XID consumption of read-only queries
  http://archives.postgresql.org/pgsql-hackers/2007-08/msg00516.php

---

Florian G. Pflug wrote:
Tom Lane wrote:
Bruce Momjian [EMAIL PROTECTED] writes:
Is enlarging the xid field something we should consider for 8.4?

No. We just got the tuple header down to 24 bytes, we are not going to give that back and then some. If you are processing 6K transactions per second, you can afford to vacuum every couple days... and probably need to vacuum much more often than that anyway, to avoid table bloat. Possibly your respondent should think about trying to do more than one thing per transaction?

I'm wondering how many of those 6k xacts/second are actually modifying data. If a large percentage of those are read-only queries, then the need for vacuuming could be reduced if postgres assigned an xid only if that xid really hits the disk. Otherwise (for purely select-type queries) it could use some special xid value. This is what I'm doing in my Readonly-Queries-On-PITR-Slave patch.

greetings, Florian Pflug

-- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Another idea for index-only scans
On Wed, 2007-08-15 at 11:54 -0400, Bruce Momjian wrote:
I have added another idea for index-only scans to the TODO list: A third idea would be for a heap scan to check if all rows are visible and if so set a per-table flag which can be checked by index scans. Any change to the table would have to clear the flag. To detect changes during the heap scan a counter could be set at the start and checked at the end --- if it is the same, the table has not been modified --- any table change would increment the counter.

This sounds useful for read-only or read-mostly tables. However, it also sounds a little dangerous. If you test your application performance, but not thoroughly enough, you might end up with a surprise when going into production.

Regards, Jeff Davis
Re: [HACKERS] tsearch2 in PostgreSQL 8.3?
Josh Berkus wrote: All, First off, I'll assert that backup/restore is a serious issue and while the folks who want Tsearch in core now are dismissing it, we'll be fielding the complaints later. Any solution which involves setting a GUC at restore time *which could vary per table or even column* isn't acceptable. We used to do the \SET thing for table ownership with backup/restore, and you *know* how many restore failures that caused. Agreed. Let me summarize where we are now. I talked to Tom on the phone yesterday so we have come up with the following plan: o default_text_search_config stays, not super-user-only, not set in pg_dump output o tsearch functions that don't have a configuration name will be marked so they can't be specified in expression indexes o auto-casts and :: to tsearch data types will also not work in expression indexes (we already do this for timestamp without timezone) o GIN on an text column will not promote to tsvector o No rewrite magic for function calls without configuration names in WHERE clauses to use indexes that do specify configurations (risky) The current documentation explains all this: http://momjian.us/expire/textsearch/HTML/textsearch-tables.html So, we have disabled the ability to create expression indexes that are affected by default_text_search_config, and we have documented other possible problems. tsvector_update_trigger() has to be modified to take a configuration name (and frankly I am not excited about the filter_name capability either, but that is a separate issue). The only remaining problem I see is that the rest of the documentation relies heavily on default_text_search_config when in fact the most common usage with tables and indexes can't use it. tsquery can use the default easily, but I am betting that tsvector usually cannot. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. 
+ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
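The expression-index hazard that Bruce's plan closes off can be sketched in SQL. The table and column names below are hypothetical; the one-argument CREATE INDEX shown first is exactly the form the plan disallows:

```sql
-- If this expression index were allowed, its contents would depend on
-- whatever default_text_search_config happened to be set at build time:
SET default_text_search_config = 'english';
CREATE INDEX messages_fts ON messages
    USING gin (to_tsvector(body));        -- config taken silently from the GUC

-- A later session searching under a different default would silently miss
-- rows, because the index was built from different lexemes:
SET default_text_search_config = 'french';
SELECT * FROM messages WHERE to_tsvector(body) @@ to_tsquery('voiture');

-- Under the plan, only the explicit two-argument form is indexable:
CREATE INDEX messages_fts ON messages
    USING gin (to_tsvector('english', body));
```

With the configuration named explicitly, the index contents and the query's interpretation can no longer drift apart across dump/restore or session settings.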
Re: [HACKERS] Another idea for index-only scans
Jeff Davis wrote: On Wed, 2007-08-15 at 11:54 -0400, Bruce Momjian wrote: I have added another idea for index-only scans to the TODO list: A third idea would be for a heap scan to check if all rows are visible and if so set a per-table flag which can be checked by index scans. Any change to the table would have to clear the flag. To detect changes during the heap scan a counter could be set at the start and checked at the end --- if it is the same, the table has not been modified --- any table change would increment the counter. This sounds useful for read-only or read-mostly tables. I think it's too coarse-grained to be really useful. If it was one bit per page it could work, but one bit per relation is going to be reset too frequently. -- Alvaro Herrera  http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] build farm failures
Andrew Dunstan wrote: Michael Meskes wrote: The backend has: $(srcdir)/gram.c: $(srcdir)/parse.h ; $(srcdir)/parse.h: gram.y So except for the different naming it's the same. However, we haven't had that problem with the backend so far, or did we? What do I fail to see? We have had problems in the past. If the user builds at a point in time after the .y file is checked in, then the generated file is newer and, if it's not removed, will never be regenerated, even if they do a subsequent cvs update. How do you create the copy of the repo to build? One idea would be to explicitly skip files that appear in .cvsignore (and maybe croak about them). -- Alvaro Herrera  http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Another idea for index-only scans
On Thu, Aug 16, 2007 at 04:06:35PM -0400, Alvaro Herrera wrote: Jeff Davis wrote: On Wed, 2007-08-15 at 11:54 -0400, Bruce Momjian wrote: I have added another idea for index-only scans to the TODO list: A third idea would be for a heap scan to check if all rows are visible and if so set a per-table flag which can be checked by index scans. Any change to the table would have to clear the flag. To detect changes during the heap scan a counter could be set at the start and checked at the end --- if it is the same, the table has not been modified --- any table change would increment the counter. This sounds useful for read-only or read-mostly tables. I think it's too coarse-grained to be really useful. If it was one bit per page it could work, but one bit per relation is going to be reset too frequently. Not for the most common use cases for table partitioning. -- Decibel!, aka Jim Nasby  [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [HACKERS] build farm failures
Alvaro Herrera wrote: How do you create the copy of the repo to build? One idea would be to explicitly skip files that appear in .cvsignore (and maybe croak about them). We are supposed to croak - see http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgbuildfarm/client-code/run_build.pl.diff?r1=1.69&r2=1.70 And these machines run the version after that. So either the code is buggy or my explanation of what happened is wrong :-) cheers andrew
Re: [HACKERS] Re: cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)
--On Thursday, August 16, 2007 18:29:22 +0200 Magnus Hagander [EMAIL PROTECTED] wrote: Marc - how much work to update that one on the old server? I know you have a new one coming up, but as a stop-gap fix on the old one? The new one is just waiting on your time schedule ... it's there ready for you though :) - Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.org ICQ . 7615664
Re: [HACKERS] building gist index on int[] field is terrible slow. Is it bug?
2007/8/16, Oleg Bartunov [EMAIL PROTECTED]: On Thu, 16 Aug 2007, Pavel Stehule wrote: Hello, I tested the contrib package intarray and the GiST support from this package. I was surprised; index building needed a lot of time:

 10K fields .. 106 sec
 20K fields .. 265 sec
 30K fields .. 450 sec
 50K fields .. 1283 sec

Building a GIN index for 50K fields needed 0.5 sec. you probably need gist__intbig_ops yes, it helps thank you Pavel Stehule
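For reference, the opclass Oleg points to comes from contrib/intarray, which ships two GiST opclasses: the default gist__int_ops (exact, stores the arrays in the tree) and gist__intbig_ops (lossy, signature-based, suited to large arrays and many distinct values). A minimal sketch of selecting it explicitly, with a hypothetical table name:

```sql
CREATE TABLE docs (ids int[]);

-- gist__intbig_ops keeps a fixed-length bit signature in upper tree
-- levels instead of whole arrays, which keeps index build and search
-- cost manageable when the arrays contain many distinct elements:
CREATE INDEX docs_ids_idx ON docs USING gist (ids gist__intbig_ops);
```

Without the opclass name, CREATE INDEX picks gist__int_ops, which is what produced the build times Pavel reported.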
Re: [HACKERS] XID wraparound and busy databases
On Thu, Aug 16, 2007 at 04:13:55PM +0200, Florian G. Pflug wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Is enlarging the xid field something we should consider for 8.4? No. We just got the tuple header down to 24 bytes, we are not going to give that back and then some. If you are processing 6K transactions per second, you can afford to vacuum every couple days... and probably need to vacuum much more often than that anyway, to avoid table bloat. Possibly your respondent should think about trying to do more than one thing per transaction? I'm wondering how many of those 6k xacts/second are actually modifying data. If a large percentage of those are read-only queries, then the need for vacuuming could be reduced if postgres assigned an xid only if that xid really hits the disk. Otherwise (for purely select-type queries) it could use some special xid value. This is what I'm doing in my Readonly-Queries-On-PITR-Slave patch. Bruce talked to me and others at Hi5. We're the ones with this high workload, and yes, the traffic is predominantly SELECTs. Our application issues parallel SELECT queries to 32 partitioned tables in 32 separate threads. For our purposes this is much, much faster than a UNION view or table inheritance. These tables are partitioned on one key, but we also need to search on other indexed keys, thus we use what we call a 'broadcast query'. We'd be happy with better faster vacuums or limits on xids (or both!) -- Paul Lindner [EMAIL PROTECTED]
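Tom's "vacuum every couple days" figure follows from simple arithmetic on the 32-bit xid space; a back-of-the-envelope check, assuming roughly 2^31 usable xids before wraparound processing must happen:

```sql
-- 2^31 xids consumed at a sustained 6,000 xacts/second:
SELECT (2^31) / 6000 / 86400 AS days_until_wraparound;
-- works out to roughly 4.1 days
```

So even at this transaction rate, an anti-wraparound vacuum every few days is enough, which is why the tuple-header savings were judged more valuable than a wider xid.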
Re: [HACKERS] Re: cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)
Marc G. Fournier wrote: Marc - how much work to update that one on the old server? I know you have a new one coming up, but as a stop-gap fix on the old one? The new one is just waiting on your time schedule ... it's there ready for you though :) No, that's still waiting on your documentation. But my question still stands - how much work to stop-gap fix it on the old one? //Magnus
Re: [HACKERS] Another idea for index-only scans
Alvaro Herrera [EMAIL PROTECTED] writes: On Wed, 2007-08-15 at 11:54 -0400, Bruce Momjian wrote: A third idea would be for a heap scan to check if all rows are visible and if so set a per-table flag which can be checked by index scans. I think it's too coarse-grained to be really useful. If it was one bit per page it could work, but one bit per relation is going to be reset too frequently. Another problem it would have is that the flag would be a single point of contention. regards, tom lane
Re: [HACKERS] Re: cvsweb busted (was Re: [COMMITTERS] pgsql: Repair problems occurring when multiple RI updates have to be)
--On Thursday, August 16, 2007 23:16:09 +0200 Magnus Hagander [EMAIL PROTECTED] wrote: But my question still stands - how much work to stop-gap fix it on the old one? rsync should be upgraded now ... - Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.org ICQ . 7615664
[HACKERS] tsearch patch and namespace pollution
I find the following additions to pg_proc in the current tsearch2 patch:

 proc | prorettype
 -----+-----------
 pg_ts_parser_is_visible(oid) | boolean
 pg_ts_dict_is_visible(oid) | boolean
 pg_ts_template_is_visible(oid) | boolean
 pg_ts_config_is_visible(oid) | boolean
 tsvectorin(cstring) | tsvector
 tsvectorout(tsvector) | cstring
 tsvectorsend(tsvector) | bytea
 tsqueryin(cstring) | tsquery
 tsqueryout(tsquery) | cstring
 tsquerysend(tsquery) | bytea
 gtsvectorin(cstring) | gtsvector
 gtsvectorout(gtsvector) | cstring
 tsvector_lt(tsvector,tsvector) | boolean
 tsvector_le(tsvector,tsvector) | boolean
 tsvector_eq(tsvector,tsvector) | boolean
 tsvector_ne(tsvector,tsvector) | boolean
 tsvector_ge(tsvector,tsvector) | boolean
 tsvector_gt(tsvector,tsvector) | boolean
 tsvector_cmp(tsvector,tsvector) | integer
 length(tsvector) | integer
 strip(tsvector) | tsvector
 setweight(tsvector,char) | tsvector
 tsvector_concat(tsvector,tsvector) | tsvector
 vq_exec(tsvector,tsquery) | boolean
 qv_exec(tsquery,tsvector) | boolean
 tt_exec(text,text) | boolean
 ct_exec(character varying,text) | boolean
 tq_exec(text,tsquery) | boolean
 cq_exec(character varying,tsquery) | boolean
 tsquery_lt(tsquery,tsquery) | boolean
 tsquery_le(tsquery,tsquery) | boolean
 tsquery_eq(tsquery,tsquery) | boolean
 tsquery_ne(tsquery,tsquery) | boolean
 tsquery_ge(tsquery,tsquery) | boolean
 tsquery_gt(tsquery,tsquery) | boolean
 tsquery_cmp(tsquery,tsquery) | integer
 tsquery_and(tsquery,tsquery) | tsquery
 tsquery_or(tsquery,tsquery) | tsquery
 tsquery_not(tsquery) | tsquery
 tsq_mcontains(tsquery,tsquery) | boolean
 tsq_mcontained(tsquery,tsquery) | boolean
 numnode(tsquery) | integer
 querytree(tsquery) | text
 rewrite(tsquery,tsquery,tsquery) | tsquery
 rewrite(tsquery,text) | tsquery
 rewrite_accum(tsquery,tsquery[]) | tsquery
 rewrite_finish(tsquery) | tsquery
 rewrite(tsquery[]) | tsquery
 stat(text) | record
 stat(text,text) | record
 rank(real[],tsvector,tsquery,integer) | real
 rank(real[],tsvector,tsquery) | real
 rank(tsvector,tsquery,integer) | real
 rank(tsvector,tsquery) | real
 rank_cd(real[],tsvector,tsquery,integer) | real
 rank_cd(real[],tsvector,tsquery) | real
 rank_cd(tsvector,tsquery,integer) | real
 rank_cd(tsvector,tsquery) | real
 token_type(oid) | record
 token_type(text) | record
 parse(oid,text) | record
 parse(text,text) | record
 lexize(oid,text) | text[]
 lexize(text,text) | text[]
 headline(oid,text,tsquery,text) | text
 headline(oid,text,tsquery) | text
 headline(text,text,tsquery,text) | text
 headline(text,text,tsquery) | text
 headline(text,tsquery,text) | text
 headline(text,tsquery) | text
 to_tsvector(oid,text) | tsvector
 to_tsvector(text,text) | tsvector
 to_tsquery(oid,text) | tsquery
 to_tsquery(text,text) | tsquery
 plainto_tsquery(oid,text) | tsquery
 plainto_tsquery(text,text) | tsquery
 to_tsvector(text) | tsvector
 to_tsquery(text) | tsquery
 plainto_tsquery(text) | tsquery
 tsvector_update_trigger() | trigger
 get_ts_config_oid(text) | oid
 get_current_ts_config() | oid
(82 rows)

(This list omits functions with INTERNAL arguments, as those are of no particular concern to users.) While most of these are probably OK, I'm disturbed by the prospect that we are commandeering names as generic as parse or stat with argument types as generic as text. I think we need to put a ts_ prefix on some of these. Specifically, I find these names totally unacceptable without a ts_ prefix:

 stat(text) | record
 stat(text,text) | record
 token_type(oid) | record
 token_type(text)
[HACKERS] tsvector_update_trigger() is utterly insecure
We can't put tsvector_update_trigger() into core in anything like its current form. As is, it will take an unqualified function name, look it up, and call it. The prospects for subversion by search path manipulation are obvious, and even if you aren't concerned about malicious attacks, the effects of the trigger are context-dependent (and maybe time-varying; it doesn't insist on the function being immutable) in exactly the same way that we've been saying we can't accept for the tsearch configuration. I think we should redefine the trigger as taking trigger arguments that are first a config name, then a list of one or more field names, and nothing else. People who want extra processing done on their fields before forming the tsvector can write custom triggers to do it ... regards, tom lane
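A sketch of what Tom's proposed argument convention might look like in a CREATE TRIGGER statement. The exact signature was still under discussion at this point, and the table, trigger, and column names here are hypothetical:

```sql
-- Proposed: a fixed argument list of configuration name first, then one
-- or more source column names -- no pass-through to arbitrary functions
-- that could be hijacked via search_path:
CREATE TRIGGER messages_tsv_update
    BEFORE INSERT OR UPDATE ON messages
    FOR EACH ROW
    EXECUTE PROCEDURE
        tsvector_update_trigger('english', title, body);
```

Any per-field preprocessing would then live in an explicitly written custom trigger function rather than in a name looked up at firing time.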
Re: [HACKERS] GIT patch
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: So instead of pressing to try to get something into 8.3, I would rather we stand back and think about it some more. I understand why you are saying hold for 8.4, but this issue came up in the middle of the 8.3 development cycle and didn't get much attention. I would like to know why it will get any more attention during 8.4. It's not more attention that it needs; it's some good ideas. Which we don't yet have, and we cannot produce on a schedule. This is a difficult email to write, but it seems GIT isn't going to make it into 8.3. There seem to be too many open implementation questions to move forward, and the internal API changes need more thought. I somehow feel that if HOT wasn't being considered for 8.3 we might have gotten GIT, but with limited resources I think there was more focus on HOT, perhaps rightly so. These patches will be held for 8.4:

 o Grouped Index Tuples (GIT)
 o Bitmap scan changes
 o Stream bitmaps (API change for Group Index Tuples)
 o Maintaining cluster order on insert

I believe Heikki is in agreement on this. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] tsvector_update_trigger() is utterly insecure
Tom Lane wrote: We can't put tsvector_update_trigger() into core in anything like its current form. As is, it will take an unqualified function name, look it up, and call it. The prospects for subversion by search path manipulation are obvious, and even if you aren't concerned about malicious attacks, the effects of the trigger are context-dependent (and maybe time-varying; it doesn't insist on the function being immutable) in exactly the same way that we've been saying we can't accept for the tsearch configuration. I think we should redefine the trigger as taking trigger arguments that are first a config name, then a list of one or more field names, and nothing else. People who want extra processing done on their fields before forming the tsvector can write custom triggers to do it ... Agreed. As stated in email, I didn't like the filter API on style grounds. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] GIT patch
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I somehow feel that if HOT wasn't being considered for 8.3 we might have gotten GIT, but with limited resources I think there was more focus on HOT, perhaps rightly so. These patches will be held for 8.4: o Grouped Index Tuples (GIT) o Bitmap scan changes o Stream bitmaps (API change for Group Index Tuples) o Maintaining cluster order on insert I believe Heikki is in agreement on this. That is certainly a bummer. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [HACKERS] tsearch patch and namespace pollution
Tom Lane wrote: I find the following additions to pg_proc in the current tsearch2 patch: It seems a lot of these are useless and just bloat. I will mark a few:

 proc | prorettype
 -----+-----------
 pg_ts_parser_is_visible(oid) | boolean
 pg_ts_dict_is_visible(oid) | boolean
 pg_ts_template_is_visible(oid) | boolean
 pg_ts_config_is_visible(oid) | boolean

Why would anyone look these up via OID rather than name?

 tsvectorin(cstring) | tsvector
 tsvectorout(tsvector) | cstring
 tsvectorsend(tsvector) | bytea
 tsqueryin(cstring) | tsquery
 tsqueryout(tsquery) | cstring
 tsquerysend(tsquery) | bytea
 gtsvectorin(cstring) | gtsvector
 gtsvectorout(gtsvector) | cstring
 tsvector_lt(tsvector,tsvector) | boolean
 tsvector_le(tsvector,tsvector) | boolean
 tsvector_eq(tsvector,tsvector) | boolean
 tsvector_ne(tsvector,tsvector) | boolean
 tsvector_ge(tsvector,tsvector) | boolean
 tsvector_gt(tsvector,tsvector) | boolean
 tsvector_cmp(tsvector,tsvector) | integer
 length(tsvector) | integer
 strip(tsvector) | tsvector
 setweight(tsvector,char) | tsvector
 tsvector_concat(tsvector,tsvector) | tsvector
 vq_exec(tsvector,tsquery) | boolean
 qv_exec(tsquery,tsvector) | boolean
 tt_exec(text,text) | boolean
 ct_exec(character varying,text) | boolean
 tq_exec(text,tsquery) | boolean
 cq_exec(character varying,tsquery) | boolean
 tsquery_lt(tsquery,tsquery) | boolean
 tsquery_le(tsquery,tsquery) | boolean
 tsquery_eq(tsquery,tsquery) | boolean
 tsquery_ne(tsquery,tsquery) | boolean
 tsquery_ge(tsquery,tsquery) | boolean
 tsquery_gt(tsquery,tsquery) | boolean
 tsquery_cmp(tsquery,tsquery) | integer
 tsquery_and(tsquery,tsquery) | tsquery
 tsquery_or(tsquery,tsquery) | tsquery
 tsquery_not(tsquery) | tsquery
 tsq_mcontains(tsquery,tsquery) | boolean
 tsq_mcontained(tsquery,tsquery) | boolean
 numnode(tsquery) | integer
 querytree(tsquery) | text
 rewrite(tsquery,tsquery,tsquery) | tsquery
 rewrite(tsquery,text) | tsquery
 rewrite_accum(tsquery,tsquery[]) | tsquery
 rewrite_finish(tsquery) | tsquery
 rewrite(tsquery[]) | tsquery
 stat(text) | record
 stat(text,text) | record
 rank(real[],tsvector,tsquery,integer) | real
 rank(real[],tsvector,tsquery) | real
 rank(tsvector,tsquery,integer) | real
 rank(tsvector,tsquery) | real
 rank_cd(real[],tsvector,tsquery,integer) | real
 rank_cd(real[],tsvector,tsquery) | real
 rank_cd(tsvector,tsquery,integer) | real
 rank_cd(tsvector,tsquery) | real

Do we really need this many ranking functions?

 token_type(oid) | record

Again, why by OID?

 token_type(text) | record
 parse(oid,text) | record
 parse(text,text) | record
 lexize(oid,text) | text[]
 lexize(text,text) | text[]
 headline(oid,text,tsquery,text) | text
 headline(oid,text,tsquery) | text
 headline(text,text,tsquery,text) | text
 headline(text,text,tsquery) | text
 headline(text,tsquery,text) | text
 headline(text,tsquery) | text
 to_tsvector(oid,text) | tsvector
 to_tsvector(text,text) | tsvector
 to_tsquery(oid,text) | tsquery

Why OID again for the configuration? I just don't see the use case, and it is bloat and causes confusion.

 to_tsquery(text,text) | tsquery
 plainto_tsquery(oid,text) | tsquery
 plainto_tsquery(text,text) | tsquery

Again, OID. I asked Oleg about this and he said: Bruce, just remove oid argument specification from documentation. so I think we can go ahead and remove cases where the configuration name or object is specified by oid. I have already removed them from the documentation and I thought the patch had them removed too, but I guess not. Admittedly this API has been in flux.

 to_tsvector(text) | tsvector
 to_tsquery(text) | tsquery
Re: [HACKERS] GIT patch
Joshua D. Drake wrote: These patches will be held for 8.4: o Grouped Index Tuples (GIT) o Bitmap scan changes o Stream bitmaps (API change for Group Index Tuples) o Maintaining cluster order on insert I believe Heikki is in agreement on this. That is certainly a bummer. I think text search has challenges similar to GIT, but the GIT issues were more how to change the internal API, while text search was a user-API issue which is easier to bang into shape. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] GIT patch
Bruce Momjian wrote: Joshua D. Drake wrote: These patches will be held for 8.4: o Grouped Index Tuples (GIT) o Bitmap scan changes o Stream bitmaps (API change for Group Index Tuples) o Maintaining cluster order on insert I believe Heikki is in agreement on this. That is certainly a bummer. I think text search has challenges similar to GIT, but the GIT issues were more how to change the internal API, while text search was a user-API issue which is easier to bang into shape. Well let me just throw out there that I am in favor of this hold back for 8.4. I don't like it but I am in favor of it. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
[HACKERS] text search vs schemas
I wrote: We can't put tsvector_update_trigger() into core in anything like its current form. As is, it will take an unqualified function name, look it up, and call it. The prospects for subversion by search path manipulation are obvious, and even if you aren't concerned about malicious attacks, the effects of the trigger are context-dependent Actually ... I'm suddenly not happy about the choice to put text search configurations etc. into schemas at all. We've been sitting here and assuming that to_tsvector('english', my_text_col) has a well defined meaning --- but as the patch stands, *it does not*. The interpretation of the config name could easily change depending on search_path. It does not seem likely that a typical installation will have so many text search configs that subdividing them into schemas will really be useful. If I recall correctly, Teodor did that on my recommendation that it'd be the cleanest way to distinguish built-in from non-built-in objects for dump purposes. That is, pg_dump would ignore TS objects that are in pg_catalog and dump everything else. But I'm having severe second thoughts about that. What seems the most attractive alternative at the moment is to have a flat namespace for TS objects (no schemas) and introduce something like a bool is_built_in column for pg_dump to consult in deciding whether to dump 'em. Comments? regards, tom lane
Re: [HACKERS] text search vs schemas
On 8/16/07, Tom Lane [EMAIL PROTECTED] wrote: Actually ... I'm suddenly not happy about the choice to put text search configurations etc. into schemas at all. We've been sitting here and assuming that to_tsvector('english', my_text_col) has a well defined meaning --- but as the patch stands, *it does not*. The interpretation of the config name could easily change depending on search_path. It does not seem likely that a typical installation will have so many text search configs that subdividing them into schemas will really be useful. If I recall correctly, Teodor did that on my recommendation that it'd be the cleanest way to distinguish built-in from non-built-in objects for dump purposes. That is, pg_dump would ignore TS objects that are in pg_catalog and dump everything else. But I'm having severe second thoughts about that. What seems the most attractive alternative at the moment is to have a flat namespace for TS objects (no schemas) and introduce something like a bool is_built_in column for pg_dump to consult in deciding whether to dump 'em. That assumes a database-oriented search config, instead of a case of multiple users confined to individual schemas doing their own thing. Is the latter possible now, and do you want to remove that ability? Something else that occurs to me though: the problem seems to be that parts of tsearch take object names as strings. I thought one advantage of having it in core is that they are now real database objects, with owners etc. How many other database objects are passed around as string labels? Wouldn't treating them as actual objects remove this whole issue? What happens now if you try to drop a configuration that's still used in a trigger somewhere? (I'm new to both tsearch2 and this list, so please excuse any mistakes. Mostly keeping an eye on this for future use in my own projects.)
Re: [HACKERS] text search vs schemas
Trevor Talbot [EMAIL PROTECTED] writes: Wouldn't treating them as actual objects remove this whole issue? Uh, no. Function names for example are subject to search-path confusion. regards, tom lane
Re: [HACKERS] text search vs schemas
Tom Lane [EMAIL PROTECTED] writes: Trevor Talbot [EMAIL PROTECTED] writes: Wouldn't treating them as actual objects remove this whole issue? Uh, no. Function names for example are subject to search-path confusion. Wait, are they? They are in PL languages, but only because most languages store their source code as text, just as is happening here. But they're not in views or other native SQL uses of functions, because those store the reference to the specific function's OID. In dumps they output the schema along with the name. As in:

postgres=# \d foo.testv
      View "foo.testv"
 Column |  Type   | Modifiers
--------+---------+-----------
 i      | integer |
 a      | integer |
View definition:
 SELECT test.i, foo.a(test.i) AS a
   FROM foo.test;

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] text search vs schemas
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Uh, no. Function names for example are subject to search-path confusion. Wait, are they? They are in PL languages but only because most languages store their source code as text just as is happening here. Hmmm ... if you look at the current solution for default expressions for serial columns, ie nextval() on a regclass constant, it's pretty schema-safe. So we could imagine inventing a regconfig datatype that is the same sort of wrapper-over-OID. Then make the 2-parameter form of to_tsvector take that type instead of text. That seems like it'd fix the problem for expression indexes on to_tsvector calls, but I don't see how it fixes the problem for triggers. We don't have any clear path for making trigger arguments be anything but a list of strings. regards, tom lane
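The serial-column precedent Tom mentions can be seen in any existing database, and the regconfig idea (hypothetical at this point in the thread) would follow the same pattern:

```sql
-- Existing schema-safe precedent: a serial column's default stores a
-- regclass constant, resolved to a specific sequence OID when the table
-- is defined, so later search_path changes cannot redirect it:
CREATE TABLE t (id serial);
-- the stored default expression is:  nextval('t_id_seq'::regclass)

-- The analogous (not-yet-existing) regconfig wrapper would let an
-- expression index pin down its text search configuration the same way:
--   CREATE INDEX ... USING gin (to_tsvector('english'::regconfig, body));
```

Because the constant is reduced to an OID at parse time, the dependency is on the object itself rather than on a name re-resolved at each call.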
Re: [HACKERS] text search vs schemas
On Thu, 16 Aug 2007, Tom Lane wrote: I wrote: We can't put tsvector_update_trigger() into core in anything like its current form. As is, it will take an unqualified function name, look it up, and call it. The prospects for subversion by search path manipulation are obvious, and even if you aren't concerned about malicious attacks, the effects of the trigger are context-dependent Actually ... I'm suddenly not happy about the choice to put text search configurations etc. into schemas at all. We've been sitting here and assuming that to_tsvector('english', my_text_col) has a well defined meaning --- but as the patch stands, *it does not*. The interpretation of the config name could easily change depending on search_path. what's wrong with schema-qualified names? It does not seem likely that a typical installation will have so many text search configs that subdividing them into schemas will really be useful. If I recall correctly, Teodor did that on my recommendation it's useful. that it'd be the cleanest way to distinguish built-in from non-built-in objects for dump purposes. That is, pg_dump would ignore TS objects I think you're wrong here. Schemas are often used to save connections, and it's natural to have different searches in different schemas. that are in pg_catalog and dump everything else. But I'm having severe second thoughts about that. What seems the most attractive alternative at the moment is to have a flat namespace for TS objects (no schemas) and introduce something like a bool is_built_in column for pg_dump to consult in deciding whether to dump 'em. Comments?
regards, tom lane Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83