Re: [HACKERS] tsearch_core for inclusion
On Fri, 16 Mar 2007, Magnus Hagander wrote: Most people whom I talk to about tsearch who want the syntax changed to make it easier want something akin to just CREATE INDEX fti1 on t1(c1) USING FULLTEXT and then be done with it. This patch isn't going to give people that. Since we use a standard postgresql-ish CREATE INDEX command, I assume people want to skip creation of the tsvector column? That would be great. But how would they manage complex document indexing, when a document is a combination (with different weights) of many text attributes from several tables, for example? Just to give you some more work, could I have both, please. Seriously, if the current powerful functionality could be combined with a dead simple solution for new users and those who don't need it, that would be very good. This could be an auto-generated hidden column or something, as long as the user doesn't need to see or care about it in the simple case. Hmm, then we should think about a very generic fts configuration behind such a simple solution. We need to know what requirements and features should be supported. For example, the CREATE INDEX fti1 on t1(c1) USING FULLTEXT syntax assumes the default configuration only. Joshua, Tom proposed something like CREATE INDEX fti1 on t1(c1) USING FULLTEXT [WITH] GIST | GIN, so is it worth extending it to specify the fts configuration, like CREATE INDEX fti1 on t1(c1) USING FULLTEXT [public.pg] [WITH] GIST | GIN? Normally, the fts configuration is used when creating the tsvector, so CREATE INDEX doesn't need to know it. Hmm, looks rather messy. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
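To make the syntax variants under discussion concrete, here is a sketch; note that the FULLTEXT clause and the configuration name are only proposals from this thread, not implemented syntax:

```sql
-- Proposed "dead simple" form (hypothetical syntax):
CREATE INDEX fti1 ON t1 (c1) USING FULLTEXT;

-- Extended proposal with access method and fts configuration
-- (also hypothetical; "public.pg" names an fts configuration):
CREATE INDEX fti1 ON t1 (c1) USING FULLTEXT public.pg WITH GIST;

-- What works today with tsearch2: an explicit tsvector column
-- maintained by the user, plus an index on it.
ALTER TABLE t1 ADD COLUMN c1_tsv tsvector;
UPDATE t1 SET c1_tsv = to_tsvector(c1);
CREATE INDEX fti1 ON t1 USING gist (c1_tsv);
```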
Re: [HACKERS] UPDATE using sub selects
Hi, What's the expected result if the subselect returns more than one tuple? Error, per SQL99 section 7.14: 1) If the cardinality of a row subquery is greater than 1 (one), then an exception condition is raised: cardinality violation. I expect no update at all in the case of a void result set; is that the case? No, you get nulls; it's a subquery, not a join. Per SQL99 7.1: c) If the row value constructor is a row subquery, then: i) Let R be the result of the row subquery and let D be the degree of R. ii) If the cardinality of R is 0 (zero), then the result of the row value constructor is D null values. iii) If the cardinality of R is 1 (one), then the result of the row value constructor is R. regards, tom lane To allow both of the above to hold, I think the subselect will have to be treated like an EXPR_SUBLINK subquery. I was wondering if we have a similar mechanism for plain selects/subselects to check and restrict their output to a single row. Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
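A minimal illustration of the semantics Tom quotes (the table and column names are invented, and the multi-column UPDATE form shown is the feature under discussion, not something supported at the time of this thread):

```sql
-- Hypothetical setup:
CREATE TABLE t (id int, a int, b int);
CREATE TABLE s (id int, a int, b int);

-- Row subquery on the right-hand side of a multi-column UPDATE:
UPDATE t SET (a, b) = (SELECT s.a, s.b FROM s WHERE s.id = t.id);

-- Per SQL99 7.14: if the subquery returns more than one row for some
-- target row, an error ("cardinality violation") is raised.
-- Per SQL99 7.1: if it returns zero rows, a and b are set to NULL --
-- it is a subquery, not a join, so unmatched rows in t are still
-- updated, to NULLs.
```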
Re: [HACKERS] tsearch_core for inclusion
Yeah, that one. It might be more consistent to spell it as fulltext_ops but I wouldn't insist on it. Hmm, you are prompting an idea to me of how to simplify usage of a full text index in simple cases. CREATE INDEX idxname ON tblname USING gin (textcolumn fulltext_ops); The fulltext_ops opclass parses the document similarly to to_tsvector and stores the lexemes in the gin index. It's a full equivalent of CREATE INDEX ... ( to_tsvector( textcolumn ) ) And let us define the operation text @ text, which is equivalent to text @@ plainto_tsquery(text), so queries will look like SELECT * FROM tblname WHERE textcolumn @ textquery; fulltext_ops can speed up both operations, text @@ tsquery and text @ text, because the gin API has an extractQuery method which is called once per index scan and can parse the query into lexemes. One disadvantage: with this approach it isn't possible to make fast ranking - there is no stored parsed text. And fulltext_ops may be done for the GiST index too, but the fulltext opclass will be lossy, which means slow search due to reparsing the texts for each index match. BTW, simple syntax sugar for CREATE INDEX (fulltext_ops) may be done: CREATE INDEX idxname ON tblname USING FULLTEXT (textcolumn) -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
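A side-by-side sketch of the equivalence Teodor describes; fulltext_ops and the @ operator are proposals from this message, not existing objects:

```sql
-- Proposed: index the raw text column directly; the opclass does the parsing.
CREATE INDEX idxname ON tblname USING gin (textcolumn fulltext_ops);

-- Functionally equivalent to today's expression-index formulation:
CREATE INDEX idxname ON tblname USING gin (to_tsvector(textcolumn));

-- Proposed shorthand operator: text @ text would behave like
-- textcolumn @@ plainto_tsquery('some query words'):
SELECT * FROM tblname WHERE textcolumn @ 'some query words';
```

The point of extractQuery being called once per index scan is that the right-hand text only has to be parsed into lexemes once, not once per heap row.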
Re: [HACKERS] tsearch_core for inclusion
I'm also concerned about the stability of the tsearch api in general wrt including it in core. Currently the recommended upgrade practice is to dump/reload without tsearch, installing the new server's version of tsearch That is because the pg_ts* tables, function names and internal API change. Putting tsearch in core eliminates a lot of such problems. For example, who notices changes in the pg_am table from release to release? Really it is developers/hackers, not the usual users -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [HACKERS] [RFC] CLUSTER VERBOSE
Grzegorz Jaskiewicz wrote: Because CLUSTER is divided into two major operations (data reordering, index rebuild), I see it this way: CLUSTER on I: index name T: table name, data reordering CLUSTER on I: index name T: table name, index rebuild Something like that would be nice to see how long each step takes, like vacuum verbose. And then: CLUSTER 10% CLUSTER 12%, etc. We don't have progress indicators for any other commands, and I don't see why we should add one for cluster in particular. Sure, progress indicators are nice, but we should rather try to add some kind of general progress indicator support that would support SELECTs, for example. I know it's much harder, but also much more useful. I am looking for opinions on what information should be presented. What would be useful is some kind of metric of how (de)clustered the table was before CLUSTER, and the same dead vs. live row counts that vacuum verbose prints. We don't really have a good metric for clusteredness, as has been discussed before, so if you can come up with a good one that would be useful in the planner as well, that would be great. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
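One existing, if crude, proxy for "how clustered is this table already" is the planner's correlation statistic gathered by ANALYZE; this is only a sketch, and as Heikki notes, no really satisfactory metric exists:

```sql
-- correlation near +1 or -1 means heap order closely follows the
-- column's sort order; near 0 means the table is effectively
-- unclustered on that column. 'mytable' is a hypothetical name.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'mytable';
```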
Re: [HACKERS] pltcl vs. multilib machines
Tom Lane wrote: It turns out that this is because the link command for pltcl includes -L/usr/lib, so that gets searched before /usr/lib64. And the reason the command includes that is that that's what it says in TCL_LIB_SPEC in /usr/lib/tclConfig.sh. There is also a /usr/lib64/tclConfig.sh which says the right things, but we aren't finding that because this is how we determine where to look for tclConfig.sh: $ echo 'puts $auto_path' | tclsh /usr/share/tcl8.4 /usr/share /usr/lib /usr/lib64 $ Perhaps I should lobby the Red Hat guys to change the order of that result, but really this is more our problem than theirs: whichever way tclsh reports it, it will be wrong for trying to build Postgres with the other word width on a multilib machine. Not the ideal answer, but I've had this in my config script for quite a while now: --with-tclconfig=/usr/lib64 Joe
[HACKERS] Lock table in non-volatile functions
Hi all, I'm observing that it is not allowed to LOCK a table in a STABLE/IMMUTABLE function, but at the same time a SELECT FOR UPDATE is allowed. Is that normal?
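A minimal reproduction of the behaviour being reported; the function and table names are invented for illustration:

```sql
CREATE TABLE t (id int);

-- Rejected: an explicit LOCK TABLE inside a non-volatile function
-- fails with a read-only-command error when called.
CREATE FUNCTION f_lock() RETURNS void AS $$
BEGIN
    LOCK TABLE t IN EXCLUSIVE MODE;
END;
$$ LANGUAGE plpgsql STABLE;

-- Reportedly accepted: SELECT ... FOR UPDATE in a similar STABLE
-- function, even though it also takes locks (Tom's reply below notes
-- that CommandIsReadOnly() should reject this too).
CREATE FUNCTION f_select() RETURNS void AS $$
BEGIN
    PERFORM 1 FROM t WHERE id = 1 FOR UPDATE;
END;
$$ LANGUAGE plpgsql STABLE;
```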
Re: [HACKERS] [RFC] CLUSTER VERBOSE
On Mar 16, 2007, at 9:53 AM, Heikki Linnakangas wrote: Grzegorz Jaskiewicz wrote: Because CLUSTER is divided into two major operations (data reordering, index rebuild), I see it this way: CLUSTER on I: index name T: table name, data reordering CLUSTER on I: index name T: table name, index rebuild Something like that would be nice to see how long each step takes, like vacuum verbose. yup. I am looking for opinions on what information should be presented. What would be useful is some kind of metric of how (de)clustered the table was before CLUSTER, and the same dead vs. live row counts that vacuum verbose prints. Is that information available in cluster.c atm? I am looking for some hints here. One of the reasons I decided to go with this patch is to learn something - and cluster seems to touch the very bone of postgres, the tuple system (just like vacuum), and indexes. I would appreciate any hints. We don't really have a good metric for clusteredness, as has been discussed before, so if you can come up with a good one that would be useful in the planner as well, that would be great. I really don't know where and how I should calculate such a param. Any hints? thanks. -- Grzegorz Jaskiewicz C/C++ freelance for hire
Re: [HACKERS] Bug in UTF8-Validation Code?
On Wednesday, 14 March 2007 08:01, Michael Paesold wrote: Andrew Dunstan wrote: This strikes me as essential. If the db has a certain encoding, ISTM we are promising that all the text data is valid for that encoding. The question in my mind is how we help people to recover from the fact that we haven't done that. I would also say that it's a bug that escape sequences can get characters into the database that are not valid in the specified encoding. If you compare the encoding to table constraints, there is no way to simply escape a constraint check. This seems to violate the principle of consistency in ACID. Additionally, if you include pg_dump in ACID, it also violates durability, since it cannot restore what it wrote itself. Is there anything in the SQL spec that asks for such a behaviour? I guess not. A DBA will usually not even learn about this issue until they are presented with a failing restore. Is there anything I can do to help with this problem? Maybe implementing a new GUC variable that turns off accepting wrongly encoded sequences (so DBAs can still turn it on if they really depend on it)? Best regards, Mario Weilguni
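A sketch of the escape-sequence loophole being discussed, assuming a UTF8-encoded database (the table name is invented):

```sql
-- The octal escapes smuggle a byte sequence that is not valid UTF-8
-- past encoding validation; at the time of this thread it was accepted:
CREATE TABLE enc_test (t text);
INSERT INTO enc_test VALUES (E'\377\377');

-- pg_dump later emits those raw bytes as-is, and restoring the dump
-- fails with "invalid byte sequence for encoding UTF8" -- the
-- failing-restore problem a DBA first discovers much too late.
```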
[HACKERS] Question: pg_class attributes and race conditions ?
What is the safest way to access/modify a pg_class attribute and still avoid any race conditions with the other backends? A specific example is: To solve the CREATE INDEX problem with HOT, I am thinking of adding (along with other things) a pg_class boolean attribute, say hot_update_enable. All backends are supposed to check this attribute before they perform an UPDATE. The attribute would usually be available in relation->rd_rel. My understanding is that the backend which sets this attribute must first acquire a lock on the heap relation of sufficient strength so as to ensure that there are no concurrent UPDATErs, update the pg_class row and then release the lock on the relation. This would ensure that no backend has a stale Relation pointer with a stale value of hot_update_enable. Also, should I use heap_inplace_update() rather than simple_heap_update(), because I want other backends to see the change immediately irrespective of their snapshot? Is this a fair analysis? Are there any rules I must follow to avoid deadlocks and race conditions? I know we should not be requesting a higher grade lock while holding a lower grade lock, but are there any other restrictions/best practices? Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Bug in UTF8-Validation Code?
Mario Weilguni wrote: Is there anything I can do to help with this problem? Maybe implementing a new GUC variable that turns off accepting wrong encoded sequences (so DBAs still can turn it on if they really depend on it)? I think that this should be done away with unconditionally. Or does anybody have a good point for allowing corrupt data in text columns? Maybe it is the way it is now because nobody could be bothered to add the appropriate checks... Yours, Laurenz Albe
Re: [HACKERS] Bug in UTF8-Validation Code?
Albe Laurenz wrote: Mario Weilguni wrote: Is there anything I can do to help with this problem? Maybe implementing a new GUC variable that turns off accepting wrong encoded sequences (so DBAs still can turn it on if they really depend on it)? I think that this should be done away with unconditionally. Or does anybody have a good point for allowing corrupt data in text columns? Maybe it is the way it is now because nobody could be bothered to add the appropriate checks... I agree. It's more or less an integrity violation, IMNSHO. cheers andrew
Re: [HACKERS] tsearch_core for inclusion
On Friday 16 March 2007 04:44, Teodor Sigaev wrote: I'm also concerned about the stability of the tsearch api in general wrt including it in core. Currently the recommended upgrade practice is to dump/reload without tsearch, installing the new server's version of tsearch That is because the pg_ts* tables, function names and internal API change. Putting tsearch in core eliminates a lot of such problems. For example, who notices changes in the pg_am table from release to release? Really it is developers/hackers, not the usual users I've run into problems on very vanilla setups that I am sure other users are going to run across... see the following for details: http://people.planetpostgresql.org/xzilla/index.php?/archives/291-The-pain-that-is-tsearch2-8.1-8.2-upgrading.html I don't see how the proposal is going to solve that type of problem, but maybe I am overlooking something? -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
[EMAIL PROTECTED] wrote: Does hstore nest? My impression is that it doesn't. Which might well not matter, of course. If what you mean is to have mappings of mappings then no. Hstore implements a data type for a (finite) mapping (a set of key - value pairs, think hash for perl folks), with operations like H1 contains H2 (in the sense that all key-value pairs in H2 are also in H1) supported by an index. Keys and values are strings. As a perl folk I think of hashes as nestable :-). Unlike hstore, the keys are strings but the values can be anything, including a hashref or arrayref. Anyway, this means that you can't use hstore to cover the same field as YAML or JSON. That doesn't mean it's not useful - far from it. cheers andrew
Re: [HACKERS] UPDATE using sub selects
NikhilS [EMAIL PROTECTED] writes: To allow both of the above to hold, I think the subselect will have to be treated like a EXPR_SUBLINK subquery. I was wondering if we have a similar mechanism for plain selects/subselects to check and restrict their output to a single row. No. Offhand I think you'd either need to relax EXPR_SUBLINK to allow multiple output columns, or invent a ROW_SUBLINK SubLinkType that is just like EXPR_SUBLINK except for allowing multiple output columns. The latter would probably be less likely to break other things... regards, tom lane
Re: [HACKERS] tsearch_core for inclusion
Teodor Sigaev [EMAIL PROTECTED] writes: Hmm, you are prompting an idea to me how to simplify usage of full text index in simple cases. CREATE INDEX idxname ON tblname USING gin (textcolumn fulltext_ops); +1 ... makes the easy cases easy, doesn't make the hard cases any harder. BTW, simple syntax sugar for CREATE INDEX (fulltext_ops) may be done: CREATE INDEX idxname ON tblname USING FULLTEXT (textcolumn) Not in favor of this, because FULLTEXT isn't an AM ... how would you know whether to use GIST or GIN? Actually, if you wanted to simplify life a bit, you could mark fulltext_ops as being the default opclass for text (and varchar I guess) under GIST and GIN. Then it reduces to just CREATE INDEX idxname ON tblname USING gin (textcolumn); regards, tom lane
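Tom's default-opclass simplification could be sketched like this; fulltext_ops is still the hypothetical opclass from this thread, and a real GIN opclass definition also needs its support functions (extractValue, extractQuery, etc.), omitted here:

```sql
-- If fulltext_ops were created as the default opclass for text under gin:
CREATE OPERATOR CLASS fulltext_ops
    DEFAULT FOR TYPE text USING gin AS
    OPERATOR 1 @@ (text, tsquery);  -- illustrative strategy entry only

-- ...then the opclass name could be omitted entirely:
CREATE INDEX idxname ON tblname USING gin (textcolumn);
```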
Re: [HACKERS] tsearch_core for inclusion
I don't see how the proposal is going to solve that type of problem, but maybe I am overlooking something? The same way as for other system table objects: they aren't dumped and aren't restored. In 8.3, it seems, the API to index AMs will be changed - will anybody except pghackers see that? New opclass layout, new opfamily table - users don't see those changes at all. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Pavan Deolasee wrote: What is the safest way to access/modify the pg_class attribute and still avoid any race conditions with the other backends? A specific example is: To solve the CREATE INDEX problem with HOT, I am thinking of adding (along with other things) a pg_class boolean attribute, say hot_update_enable. All backends are supposed to check this attribute before they perform an UPDATE. The attribute would usually be available in relation->rd_rel. My understanding is that the backend which sets this attribute must first acquire a lock on the heap relation of sufficient strength so as to ensure that there are no concurrent UPDATErs, update the pg_class row and then release the lock on the relation. This would ensure that no backend has a stale Relation pointer with a stale value of hot_update_enable. FWIW this is pretty much the same thing I wanted to do with setting relfrozenxid to FrozenTransactionId. To this end I wrote a patch to add a catalog pg_ntclass (later renamed to pg_class_nt), which was ultimately rejected for reasons I can't remember at the moment. Maybe it would be illuminating to investigate that -- please see the archives. (I still think it would be good to have a pg_class_nt catalog, so it's not a dead idea). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] UPDATE using sub selects
Hi, On 3/16/07, Tom Lane [EMAIL PROTECTED] wrote: NikhilS [EMAIL PROTECTED] writes: To allow both of the above to hold, I think the subselect will have to be treated like an EXPR_SUBLINK subquery. I was wondering if we have a similar mechanism for plain selects/subselects to check and restrict their output to a single row. No. Offhand I think you'd either need to relax EXPR_SUBLINK to allow multiple output columns, or invent a ROW_SUBLINK SubLinkType that is just like EXPR_SUBLINK except for allowing multiple output columns. The latter would probably be less likely to break other things... Yeah, I was looking at EXPR_SUBLINK and its single-column use case and arrived at the same conclusion, that inventing a new sublink type would be better. It is indeed becoming a not so simple and narrow fix, as you had mentioned earlier in your first response :) Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Lock table in non-volatile functions
Gaetano Mendola [EMAIL PROTECTED] writes: I'm observing that it is not allowed to LOCK a table in a STABLE/IMMUTABLE function, but at the same time a SELECT FOR UPDATE is allowed. Really? AFAICS, CommandIsReadOnly() will reject SELECT FOR UPDATE too. regards, tom lane
Re: [HACKERS] tsearch_core for inclusion
Teodor Sigaev wrote: CREATE INDEX idxname ON tblname USING gin (textcolumn fulltext_ops); The fulltext_ops opclass parses the document similarly to to_tsvector and stores the lexemes in the gin index. It's a full equivalent of CREATE INDEX ... ( to_tsvector( textcolumn ) ) And let us define the operation text @ text, which is equivalent to text @@ plainto_tsquery(text), so queries will look like SELECT * FROM tblname WHERE textcolumn @ textquery; fulltext_ops can speed up both operations, text @@ tsquery and text @ text, because the gin API has an extractQuery method which is called once per index scan and can parse the query into lexemes. One disadvantage: with this approach it isn't possible to make fast ranking - there is no stored parsed text. And fulltext_ops may be done for the GiST index too, but the fulltext opclass will be lossy, which means slow search due to reparsing the texts for each index match. Just a thought: If the patch that implements the GENERATED ALWAYS syntax is accepted, then creating a separate field that holds the parsed text, and an index on that column, becomes as easy as: alter table t1 add column text_parsed generated always as to_tsvector(text); create index idx on t1 using gin (text_parsed fulltext_ops); I know that there is a trigger function in tsearch that supports something similar, but I really like the simplicity of the statements above. On a related note - will to_tsvector and to_tsquery be renamed to something like ft_parse_text() and ft_parse_query() if tsearch2 goes into core? It seems like the ts part of those names would be the only reference left to the name tsearch if they are not, which could be somewhat confusing for users. greetings, Florian Pflug
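The trigger-based idiom Florian alludes to is what the tsearch2 contrib module already provides; a rough sketch follows, with illustrative column names, assuming the contrib trigger function tsearch2() maintains the tsvector column:

```sql
-- Today's approach: a materialized tsvector column kept up to date by
-- a trigger, instead of the proposed GENERATED ALWAYS column.
ALTER TABLE t1 ADD COLUMN text_parsed tsvector;
UPDATE t1 SET text_parsed = to_tsvector(text);

CREATE TRIGGER t1_tsv_update
    BEFORE INSERT OR UPDATE ON t1
    FOR EACH ROW
    EXECUTE PROCEDURE tsearch2(text_parsed, text);

CREATE INDEX idx ON t1 USING gin (text_parsed);
```

Unlike the on-the-fly fulltext_ops idea, the stored tsvector also allows fast ranking, at the cost of the extra column.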
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Pavan Deolasee [EMAIL PROTECTED] writes: My understanding is that the backend which sets this attribute must first acquire a lock on the heap relation of sufficient strength so as to ensure that there are no concurrent UPDATErs, update the pg_class row and then release the lock on the relation. In what context are you proposing to do that, and won't this high-strength lock in itself lead to deadlocks? The whole thing sounds exceedingly ugly anyway --- for example what happens if the backend doing the CREATE INDEX fails and is therefore unable to clear the flag again? regards, tom lane
Re: [HACKERS] tsearch_core for inclusion
On a related note - will to_tsvector and to_tsquery be renamed to something like ft_parse_text() and ft_parse_query() if tsearch2 goes Furthering this... perhaps even: ft_search() ft_query() Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [HACKERS] tsearch_core for inclusion
Tom Lane wrote: Actually, if you wanted to simplify life a bit, you could mark fulltext_ops as being the default opclass for text (and varchar I guess) under GIST and GIN. Then it reduces to just CREATE INDEX idxname ON tblname USING gin (textcolumn); Nice. This gets my vote. cheers andrew
Re: [HACKERS] tsearch_core for inclusion
On Fri, 16 Mar 2007, Joshua D. Drake wrote: On a related note - will to_tsvector and to_tsquery be renamed to something like ft_parse_text() and ft_parse_query() if tsearch2 goes Furthering this... perhaps even: ft_search() ft_query() ts_ means Text Search; I don't think ft_ (Full Text) is better. Going further, it should be fts_ (Full Text Search), but we have many concerns about compatibility and stability of the api, so I'd prefer to stay with ts_. More important is what syntax we should accept for the final patch: the originally proposed or the modified version? On http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-syntax-compare.html they are D.1.2 and D.1.3. As I understood, Teodor's proposal about the CREATE INDEX command is ok for all. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] tsearch_core for inclusion
Florian G. Pflug wrote: Teodor Sigaev wrote: CREATE INDEX idxname ON tblname USING gin (textcolumn fulltext_ops); The fulltext_ops opclass parses the document similarly to to_tsvector and stores the lexemes in the gin index. It's a full equivalent of CREATE INDEX ... ( to_tsvector( textcolumn ) ) And let us define the operation text @ text, which is equivalent to text @@ plainto_tsquery(text), so queries will look like SELECT * FROM tblname WHERE textcolumn @ textquery; fulltext_ops can speed up both operations, text @@ tsquery and text @ text, because the gin API has an extractQuery method which is called once per index scan and can parse the query into lexemes. One disadvantage: with this approach it isn't possible to make fast ranking - there is no stored parsed text. And fulltext_ops may be done for the GiST index too, but the fulltext opclass will be lossy, which means slow search due to reparsing the texts for each index match. Just a thought: If the patch that implements the GENERATED ALWAYS syntax is accepted, then creating a separate field that holds the parsed text, and an index on that column, becomes as easy as: alter table t1 add column text_parsed generated always as to_tsvector(text); create index idx on t1 using gin (text_parsed fulltext_ops); or to take Tom's idea into consideration: ALTER TABLE t1 ADD COLUMN text_parsed GENERATED ALWAYS AS to_tsvector(text); CREATE INDEX idxname ON t1 USING gin (text_parsed); which looks pretty nice and simple to me I know that there is a trigger function in tsearch that supports something similar, but I really like the simplicity of the statements above. On a related note - will to_tsvector and to_tsquery be renamed to something like ft_parse_text() and ft_parse_query() if tsearch2 goes into core? It seems like the ts part of those names would be the only reference left to the name tsearch if they are not, which could be somewhat confusing for users.
Well, either renaming those functions (and completely destroying the upgrade path for any current users) or just referring to it as "text search" in the docs (so that the prefix makes sense). Stefan
Re: [HACKERS] tsearch_core for inclusion
Oleg Bartunov wrote: On Fri, 16 Mar 2007, Joshua D. Drake wrote: On a related note - will to_tsvector and to_tsquery be renamed to something like ft_parse_text() and ft_parse_query() if tsearch2 goes Furthering this... perhaps even: ft_search() ft_query() ts_ means Text Search; I don't think ft_ (Full Text) is better. Going further, it should be fts_ (Full Text Search), but we have many concerns about compatibility and stability of the api, so I'd prefer to stay with ts_. Hm, so it could be fts_parse_query() and fts_parse_text(). You could alias them to to_tsvector() and to_tsquery() to achieve API compatibility. I agree that the names of these functions are really a minor issue, and API compatibility is more important. But confusing names can be the source of a lot of errors for new users, so there *is* a point in naming things consistently. And if the cost is basically an entry in pg_proc, why not do it? greetings, Florian Pflug
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Tom Lane wrote: In what context are you proposing to do that, and won't this high-strength lock in itself lead to deadlocks? The whole thing sounds exceedingly ugly anyway --- for example what happens if the backend doing the CREATE INDEX fails and is therefore unable to clear the flag again? Let me state the problem and a vague solution I am thinking of. I would appreciate comments and suggestions. The major known issue left with HOT is support for CREATE INDEX and CREATE INDEX CONCURRENTLY. The problem is with HEAP_ONLY tuples in the heap which do not have index entries in the existing indexes. When we build a new index, some or all of the HEAP_ONLY tuples may need index entries in the new index. It would be very ugly if we tried to keep the existing indexes without index entries for those tuples. A clean solution would be to add index entries for the HEAP_ONLY tuples in the existing indexes and break all the HOT-chains. I will leave out the details and rather explain what I have in mind at a high level. Any help to fill in the details or any suggestions to do things differently would immensely help. This is what I have in mind: In the context of CREATE INDEX [CONCURRENTLY], we first disable HOT-updates on the table. This would ensure that no new HOT tuples are added while we CHILL the heap. (How do we do this?) We then start scanning the heap and start building the new index. If a HEAP_ONLY tuple is found which needs to be indexed, we mark the tuple with a CHILL_IN_PROGRESS flag and insert index entries into all the existing indexes. (The buffer is exclusively locked and the operation is WAL logged.) We do this until the entire heap is scanned. At this point, we would have inserted the missing index entries for the HEAP_ONLY tuples. Until this point, we don't use the direct index entries to fetch the HEAP_ONLY tuples, to avoid duplicate fetches of the same tuple. We now wait for all the concurrent index scans to end and then disable the HOT-chain following logic for fetching tuples. (How do we do this?) At this point, all index scans would ONLY use the direct path from the index to fetch tuples. The HOT-chains are not followed, to avoid duplicate fetches of the same tuple. A second pass over the heap is now required to clear the CHILL_IN_PROGRESS, HEAP_ONLY and HEAP_HOT_UPDATED flags. At the end of this step, all the indexes and the table are in sync. Once again we need to ensure that there are no concurrent index scans in progress and then enable HOT-fetch. Also, HOT-updates can be turned on again. If CREATE INDEX crashes, VACUUM is required to clear the CHILL_IN_PROGRESS flags, and the corresponding index entries are removed. Since VACUUM runs mutually exclusive to CREATE INDEX, we don't need any special mechanism to handle race conditions between them. There are some other details, like running multiple CREATE INDEXes in parallel and still being able to CHILL the table safely. Maybe one of them needs to act as the chiller and the others wait for it to finish successfully. Any thoughts on the overall approach? Any suggestions to simplify things or any alternate designs? Could something as simple as CHILLing the table while holding a VACUUM FULL strength lock be acceptable? Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Pavan Deolasee [EMAIL PROTECTED] writes: Any thoughts on the overall approach ? Fragile and full of race conditions :-(. I thought from the beginning that CREATE INDEX might be a showstopper for the whole HOT concept, and it's starting to look like that's the case. I think what we need to get away from is the assumption that HOT-ness for one index is the same as HOT-ness for all. What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] New Project: PostGIS application to Missing People
Luis, This is a proposal to design a new concept for an integrated PostGIS application, and to implement features to improve the tracking of information about missing people. This application would be useful in disaster scenarios: looking for missing kids, rescuing kidnapped people, human rights watch, etc. This task surely is not an easy one, but I think it would be a great service to the global community and I would really enjoy mentoring this project. That is my motivation. Per my private e-mail, I think you misunderstood. I was suggesting that you find a *student* to propose the project. Given that we, as a project, are responsible for our mentors and the outcome of SoC projects, we really can't accept mentors who are not familiar to us. For that matter, mentors don't get to choose the projects. The students do. We just choose what we want to accept. If the student proposing the project speaks Spanish, that should not be a problem. -- Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Tom Lane wrote: Pavan Deolasee [EMAIL PROTECTED] writes: Any thoughts on the overall approach ? Fragile and full of race conditions :-(. I thought from the beginning that CREATE INDEX might be a showstopper for the whole HOT concept, and it's starting to look like that's the case. I think what we need to get away from is the assumption that HOT-ness for one index is the same as HOT-ness for all. What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? Just to throw my two bits in here :). If we do that, how does that effect the idea that most people in the web world use (id serial primary key), even though that is not what they are searching on? More specifically, does HOT help conditions where a composite comes into play (session_id,last_active) ... which would be a more heavily updated index than just the primary key. Sincerely, Joshua D. Drake regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Tom Lane wrote: What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? The main objective of HOT is to enable retail vacuum of HOT-updated tuples. Doing the above would make it useless for that purpose, at least when there's more than one index on the table. Granted, there's a lot of tables with just one index out there, but it's a big limitation nevertheless. An extension of that idea, though, is to store a flag per index in the HOT-updated tuple. We would then need a mapping between bits in the tuple header and indexes, for example as a new column in pg_index. Let's assume that we'd use one byte in the tuple header to store HOT-flags. That's enough to store the HOT-status for 8 indexes. A set bit means that the index corresponding to that bit doesn't have an index pointer in it. When you CREATE INDEX, assign a bit for the new index that's not currently in use. When you scan the table to build the index, clear that bit for every tuple where it's set, and insert an index entry as usual. DROP INDEX wouldn't need to scan the heap to clear the flags, because we clear them on CREATE INDEX when necessary. If you run out of bits in the header, IOW have more than 8 indexes on a table, indexes unlucky enough to not have a bit assigned to them wouldn't be HOT-updateable. This would also enable us to skip index inserts for those indexes whose key columns are not updated, and do the index inserts as usual for the rest. The limitation that you can only retail-vacuum HOT-updated tuples when none of the indexed keys were changed remains, but we've accepted that already. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
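Heikki's per-index flag byte could work roughly like this (a hedged sketch in Python; the 8-bit budget and the bit-to-index mapping via pg_index come from the proposal, while the function names are invented):

```python
# Sketch of a per-index HOT flag byte: one bit per index, assigned at
# CREATE INDEX time via a hypothetical new pg_index column.
MAX_HOT_BITS = 8  # one spare byte in the tuple header, per the proposal

def assign_bit(used_bits):
    """On CREATE INDEX: grab a bit not currently in use. None means the
    table already has 8 flagged indexes, so this index won't be
    HOT-updateable at all."""
    for bit in range(MAX_HOT_BITS):
        if bit not in used_bits:
            used_bits.add(bit)
            return bit
    return None

def has_index_pointer(tuple_flags, bit):
    """A set bit means the corresponding index has NO pointer to the tuple;
    a clear bit means the usual direct index entry exists."""
    return not (tuple_flags >> bit) & 1

used = set()
pk_bit = assign_bit(used)       # first index gets bit 0
other_bit = assign_bit(used)    # second index gets bit 1
flags = 1 << other_bit          # a HOT update skipped only the second index
```

A table scan building a new index would then clear its own bit tuple by tuple, which is what lets DROP INDEX skip the heap scan entirely.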
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Tom Lane wrote: Pavan Deolasee [EMAIL PROTECTED] writes: Any thoughts on the overall approach ? Fragile and full of race conditions :-(. Yes, it looks a bit complex. But IMHO we can get around that. Do you have any ideas in mind about doing that? I thought from the beginning that CREATE INDEX might be a showstopper for the whole HOT concept, and it's starting to look like that's the case. I remember you raised this concern very early, but I am hopeful that we will be able to solve this. Would it be acceptable to have a simple (though not the best) solution for this release and then improve on it later? As I mentioned earlier, one option is to CHILL the table, if required, holding an AccessExclusive lock, just like VACUUM FULL. I am assuming here that CREATE INDEX is not such a common activity; isn't that true? I think what we need to get away from is the assumption that HOT-ness for one index is the same as HOT-ness for all. What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? I think that will take away the ability to reuse HEAP_ONLY tuples without vacuuming the heap and index. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Question: pg_class attributes and race conditions ?
On Fri, 2007-03-16 at 21:56 +0530, Pavan Deolasee wrote: Any thoughts on the overall approach ? Any suggestions to simplify things or any alternate designs ?

Well, your design is very different from what we discussed, so I think I should post my proposed design alongside this, for further discussion.

Proposed solutions for CREATE INDEX and CREATE INDEX CONCURRENTLY:

CREATE INDEX CONCURRENTLY has no feature changes to work with HOT. CREATE INDEX works normally, except when HOT tuples are found, in which case special processing occurs requiring additional locking. The index build occurs in a single scan, as now.

CREATE INDEX CONCURRENTLY - We perform no chilling during the initial scan. We index the tuple identified by SnapshotNow, but we take the root tuple's htid, not the htid of the tuple version being indexed. We assume that the tuple version indexed will be the root of the HOT chain by the time the index is complete. Currently, we wait until all pre-existing transactions exit before we allow this to COMMIT. With HOT, we simply move the wait so it occurs *before* the second scan; then we can prune the HOT chains as we pass through the heap on the second scan. There will be no pre-existing HOT tuples and so no chilling is required.

CREATE INDEX - We add a field, xchill, to pg_class that stores TransactionIds. This is set to InvalidTransactionId if no index has been built yet. The concept of a main indexer is introduced, so we allow multiple concurrent index builds, but only one of these can chill tuples at a time.

1. In IndexBuildHeapScan, as we pass through the table:

a) If we find any HOT rows, we check xchill and do one of steps (i-iii). Until this point, it hasn't been important whether we are the main or a secondary indexer.

i) If xchill is InvalidTransactionId or is committed, then we attempt to become main indexer immediately, following these steps before we continue building the index (1b):
-- If the table is temp, or if we created the table, then we immediately become the main indexer, so return immediately. If the table being indexed is already visible to everybody, then:
-- Update the pg_class entry for the table, setting the xchill field for the table to the builder's Xid. (Use heap_inplace_update for this, which is OK whether we commit or abort.)
-- Acquire AccessExclusiveLock on all existing indexes (not the heap).

ii) If xchill is in-progress, we wait for that transaction to complete, then do either step i) or iii). We cannot continue building our index until the other transaction commits, because we cannot yet see the other index, yet we have to insert into it in order to correctly chill a tuple to allow *our* index to be built.

iii) If xchill is aborted, we abort also, saying that a VACUUM is needed.

b) If we get here then we are the main indexer and can chill tuples. As we move through the scan we chill all HOT tuples, mark them HEAP_CHILL_IN_PROGRESS, write WAL for this, and insert index entries for them in all existing indexes, as well as this one. Then remove the CHILL_IN_PROGRESS flags, without writing WAL.

c) Release locks on indexes, before end of transaction.

2. If we crash or a transaction abort occurs:
- We cannot prune a HEAP_ONLY_TUPLE that points to a tuple with HEAP_CHILL_IN_PROGRESS.
- VACUUM must be used to clean up after an aborted index build, and needs some additional code to allow this to occur.

3. Concurrent index builds are allowed. If we are not the main indexer, then we can attempt to build an index, but any scan that sees a HOT tuple will block and wait for the main index builder to complete before it proceeds.

4. When an indexscan reads the table, if it finds a HEAP_CHILL_IN_PROGRESS tuple it may or may not be valid. Concurrent index scans and tuple chilling can mean that an index scan finds the same tuple twice, by different routes, if a CREATE INDEX crashed. To avoid this, an IndexScan will only consider a HEAP_CHILL_IN_PROGRESS tuple visible if it came across it via the indirect route, i.e. it followed the path from the root HOT tuple.

In this design, CREATE INDEX does have a deadlock risk when it is used within a transaction *and* the index is being built on a publicly visible table (i.e. not just-built and not temp). IMHO that risk is acceptable, since if users are worried about concurrent access to a table during CREATE INDEX they can use CREATE INDEX CONCURRENTLY.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Joshua D. Drake [EMAIL PROTECTED] writes: Just to throw my two bits in here :). If we do that, how does that effect the idea that most people in the web world use (id serial primary key), even though that is not what they are searching on? affect. But I think you're right that generally you'll have two indexes. More specifically, does HOT help conditions where a composite comes into play (session_id,last_active) ... which would be a more heavily updated index than just the primary key. Well, if you're indexing a column that you're updating then you've already failed your saving throw. The case we're trying to deal with is when you're updating columns that *aren't* indexed, and therefore really don't need redundant index pointers for each tuple version whose indexed values are identical to the old version's. Especially since those index pointers are what's preventing us from vacuuming the old tuple versions. If you are updating an index key then there's no question you're going to need vacuum to clean out your index. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Bug in UTF8-Validation Code?
On Tue, 2007-03-13 at 12:00 +0100, Mario Weilguni wrote: Hi, I've a problem with a database, I can dump the database to a file, but restoration fails, happens with 8.1.4. I reported the same problem a while back: http://archives.postgresql.org/pgsql-bugs/2006-10/msg00246.php Some people think it's a bug, some people don't. It is technically documented behavior, but I don't think the documentation is clear enough. I think it is a bug that should be fixed, and here's another message in the thread that expresses my opinion: http://archives.postgresql.org/pgsql-bugs/2006-11/msg00033.php If you look at that email, it includes some examples of surprising behaviors caused by that bug, particularly with bytea. In some applications (for which it's impractical to change the source code), I actually use a CHECK constraint (which raises an exception on invalid utf8 data) on every text column so that some binary data doesn't break my slony replication. I'd like to see this fixed. Regards, Jeff Davis ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
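The CHECK-constraint workaround Jeff describes amounts to validating every text value as well-formed UTF-8 on the way in. The same test is easy to express outside the database (a Python sketch of the idea, not the actual constraint function he uses):

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Accept exactly the byte strings that are well-formed UTF-8,
    mirroring what a strict server-side encoding check would allow."""
    try:
        raw.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False

# A lead byte like 0xC3 promises a continuation byte (0x80-0xBF);
# following it with 0x28 ('(') makes the sequence invalid.
```

A server-side CHECK constraint performing the equivalent test is what keeps binary-ish data from entering text columns in the first place, so dumps and Slony replication streams stay restorable.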
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Heikki Linnakangas wrote: Tom Lane wrote: What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? The main objective of HOT is to enable retail vacuum of HOT-updated tuples. Doing the above would make it useless for that purpose, at least when there's more than one index on the table. Granted, there's a lot of tables with just one index out there, but it's a big limitation nevertheless. Agree. An extension of that idea, though, is to store a flag per index in the HOT-updated tuple. We would then need a mapping between bits in the tuple header and indexes, for example as a new column in pg_index. I like the idea. The major objection would be that it adds a byte to the tuple header, which, when considered along with the null bitmap, may actually make the header 8 bytes larger in the worst case. I am also worried about the additional complexity introduced with this. We can and should work on this idea; I am wondering whether it would be too much to do before the feature freeze. I am personally inclined towards doing something simpler to tackle the CREATE INDEX issue at the moment. But if that is not acceptable, and/or you or anyone else is willing to help me on this, we can work on a better solution. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Bug in UTF8-Validation Code?
On Wed, 2007-03-14 at 01:29 -0600, Michael Fuhr wrote: On Tue, Mar 13, 2007 at 04:42:35PM +0100, Mario Weilguni wrote: On Tuesday, 13 March 2007 16:38, Joshua D. Drake wrote: Is this any different than the issues of moving 8.0.x to 8.1 UTF8? Where we had to use iconv? What issues? I've upgraded several 8.0 databases to 8.1 without having to use iconv. Did I miss something? http://www.postgresql.org/docs/8.1/interactive/release-8-1.html Some users are having problems loading UTF-8 data into 8.1.X. This is because previous versions allowed invalid UTF-8 byte sequences to be entered into the database, and this release properly accepts only valid UTF-8 sequences. One way to correct a dumpfile is to run the command iconv -c -f UTF-8 -t UTF-8 -o cleanfile.sql dumpfile.sql. If the above quote were actually true, then Mario wouldn't be having a problem. Instead, it's half-true: invalid byte sequences are rejected in some situations and accepted in others. If postgresql consistently rejected or consistently accepted invalid byte sequences, that would not cause problems with COPY (meaning problems with pg_dump, slony, etc.). Regards, Jeff Davis ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Question: pg_class attributes and race conditions ?
On Fri, 2007-03-16 at 12:40 -0400, Tom Lane wrote: Pavan Deolasee [EMAIL PROTECTED] writes: Any thoughts on the overall approach ? Fragile and full of race conditions :-(. I thought from the beginning that CREATE INDEX might be a showstopper for the whole HOT concept, and it's starting to look like that's the case. Seems like we can fix all but some strange CREATE INDEX use cases. Since we have CREATE INDEX CONCURRENTLY, seems like HOT is a showstopper for the whole CREATE INDEX concept. I think what we need to get away from is the assumption that HOT-ness for one index is the same as HOT-ness for all. Sounds interesting. I'd not considered that before. What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? On its own, I don't think this is a sufficiently wide use-case. Perhaps we should do this PLUS make HOT-semantics optional for each additional index. i.e. HOT is always enforced on primary indexes and optionally on other indexes (but not by default). If you accept the HOT option on an index, you then accept the additional issues surrounding chilling tuples. Bear in mind that there aren't any at all if you use CREATE INDEX CONCURRENTLY and many other cases. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Gregory Stark wrote: Joshua D. Drake [EMAIL PROTECTED] writes: Just to throw my two bits in here :). If we do that, how does that effect the idea that most people in the web world use (id serial primary key), even though that is not what they are searching on? affect. But I think you're right that generally you'll have two indexes. More specifically, does HOT help conditions where a composite comes into play (session_id,last_active) ... which would be a more heavily updated index than just the primary key. Well if you're indexing a column that you're updating then you've already failed your saving throw. Just for everyone who missed this: Greg Stark obviously spends his time playing (or at some time played) D&D. I have an Epic-level Sorcerer, how about you Greg? ;) Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Buildfarm feature request: some way to track/classify failures
The current buildfarm webpages make it easy to see when a branch tip is seriously broken, but it's not very easy to investigate transient failures, such as a regression test race condition that only materializes once in a while. I would like to have a way of seeing just the failed build attempts across all machines running a given branch. Ideally it would be possible to tag failures as to the cause (if known) and/or symptom pattern, and then be able to examine just the ones without known cause or having similar symptoms. I'm not sure how much of this is reasonable to try to do with webpages similar to what we've got. But the data is all in a database AIUI, so another possibility is to do this work via SQL. That'd require having the ability to pull the information from the buildfarm database so someone else could manipulate it. So I guess the first question is can you make the build data available, and the second is whether you're interested in building more flexible views or just want to let someone else do that. Also, if anyone does make an effort to tag failures, it'd be good to somehow push that data back into the master database, so that we don't end up duplicating such work. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PATCHES] Bitmapscan changes
Heikki Linnakangas wrote: Joshua D. Drake wrote: This is what I suggest. Provide a tarball of -head with the patch applied. Here you are: http://community.enterprisedb.com/git/pgsql-git-20070315.tar.gz Provide a couple of use cases that can be run with explanation of how to verify the use cases. There's a number of simple test cases on the web page that I've used (perfunittests). I can try to simplify them and add explanations. O.k. maybe I am the only one, but I actually dug through the archives for what website you were talking about and then said, "Aha!, he means: http://community.enterprisedb.com/git/". So I will accept my own paper bag, and hopefully save some from the same fate by posting the above link. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [PATCHES] Bitmapscan changes
Heikki Linnakangas wrote: Joshua D. Drake wrote: This is what I suggest. Provide a tarball of -head with the patch applied. Here you are: http://community.enterprisedb.com/git/pgsql-git-20070315.tar.gz Provide a couple of use cases that can be run with explanation of how to verify the use cases. There's a number of simple test cases on the web page that I've used (perfunittests). I can try to simplify them and add explanations. This URL is not working: http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz File not found. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Tom Lane wrote: The current buildfarm webpages make it easy to see when a branch tip is seriously broken, but it's not very easy to investigate transient failures, such as a regression test race condition that only materializes once in a while. I would like to have a way of seeing just the failed build attempts across all machines running a given branch. Ideally it would be possible to tag failures as to the cause (if known) and/or symptom pattern, and then be able to examine just the ones without known cause or having similar symptoms. I'm not sure how much of this is reasonable to try to do with webpages similar to what we've got. But the data is all in a database AIUI, so another possibility is to do this work via SQL. That'd require having the ability to pull the information from the buildfarm database so someone else could manipulate it. So I guess the first question is can you make the build data available, and the second is whether you're interested in building more flexible views or just want to let someone else do that. Also, if anyone does make an effort to tag failures, it'd be good to somehow push that data back into the master database, so that we don't end up duplicating such work. If the data is already there and just not represented, just let me know exactly what you want and I will implement pages for that data happily. Joshua D. Drake regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PATCHES] Bitmapscan changes
Joshua D. Drake wrote: This URL is not working: http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz Sorry about that, typo in the filename. Fixed. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Tom Lane wrote: The current buildfarm webpages make it easy to see when a branch tip is seriously broken, but it's not very easy to investigate transient failures, such as a regression test race condition that only materializes once in a while. I would like to have a way of seeing just the failed build attempts across all machines running a given branch. Ideally it would be possible to tag failures as to the cause (if known) and/or symptom pattern, and then be able to examine just the ones without known cause or having similar symptoms. I'm not sure how much of this is reasonable to try to do with webpages similar to what we've got. But the data is all in a database AIUI, so another possibility is to do this work via SQL. That'd require having the ability to pull the information from the buildfarm database so someone else could manipulate it. So I guess the first question is can you make the build data available, and the second is whether you're interested in building more flexible views or just want to let someone else do that. Also, if anyone does make an effort to tag failures, it'd be good to somehow push that data back into the master database, so that we don't end up duplicating such work. Well, the db is currently running around 13Gb, so that's not something to be exported lightly ;-) If we upgraded from Postgres 8.0.x to 8.2.x we could make use of some features, like dynamic partitioning and COPY from queries, that might make life easier (CP people: that's a hint :-) ) I don't want to fragment effort, but I also know CP don't want open access, for obvious reasons. We can also look at a safe API that we could make available freely. I've already done this over SOAP (see example client at http://people.planetpostgresql.org/andrew/index.php?/archives/14-SOAP-server-for-Buildfarm-dashboard.html ). Doing updates is a whole other matter, of course. Lastly, note that some buildfarm enhancements are on the SOC project list.
I have no idea if anyone will express any interest in that, of course. It's not very glamorous work. cheers andrew ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] tsearch_core for inclusion
On Friday 16 March 2007 10:45, Teodor Sigaev wrote: I don't see how the proposal is going to solve that type of problem, but maybe I am overlooking something? The same way as other system table objects: they don't dump, they don't restore. In 8.3, it seems, the API to index AMs will be changed - will anybody except pghackers see that? New opclass layout, new opfamily table - users don't see those changes at all. If I have configured my tsearch install for spanish (spanish dictionary/stemmers/synonyms/etc...) aren't I going to lose all that on the next database upgrade? -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Well, the db is currently running around 13Gb, so that's not something to be exported lightly ;-) If we upgraded from Postgres 8.0.x to 8.2.x we could make use of some features, like dynamic partitioning and copy from queries, that might make life easier (CP people: that's a hint :-) ) Yeah, Yeah... I need to get you off that machine as a whole :) Which is on the list but I am waiting for 8.3 *badda bing*. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Andrew Dunstan [EMAIL PROTECTED] writes: Well, the db is currently running around 13Gb, so that's not something to be exported lightly ;-) Yeah. I would assume though that the vast bulk of that is captured log files. For the purposes I'm imagining, it'd be sufficient to export only the rest of the database --- or ideally, records including all the other fields and a URL for each log file. For the small number of log files you actually need to examine, you'd chase the URL. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PATCHES] Bitmapscan changes
Heikki Linnakangas wrote: Joshua D. Drake wrote: This URL is not working: http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz Sorry about that, typo in the filename. Fixed. Here are my results on a modest 3800X2 2 Gig of ram, RAID 1 dual SATA http://pgsql.privatepaste.com/170yD8c0gr Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Simon Riggs wrote: What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? On its own, I don't think this is a sufficiently wide use-case. Perhaps we should do this PLUS make HOT-semantics optional for each additional index, i.e. HOT is always enforced on primary indexes and optionally on other indexes (but not by default). Here's a very simple, low-tech idea: how about checking whether the new index requires chilling tuples; if it does, then elog(ERROR) until all the indexes have been manually chilled, which would be done with an ALTER INDEX ... CHILL command or something like that. Only when all indexes are known to be chilled can you create another one, and then the user can hotify indexes as appropriate. (Disclaimer: I haven't followed the HOT design closely enough to know if this makes sense.) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
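In terms of commands, Alvaro's low-tech idea would play out something like this (hypothetical syntax throughout -- neither the error message nor ALTER INDEX ... CHILL exists; this is only a sketch of the proposed user-visible workflow):

```sql
-- With un-chilled HOT tuples present, creating a new index simply fails:
CREATE INDEX idx_new ON foo (bar);
-- ERROR:  cannot create index: table "foo" has un-chilled HOT tuples

-- The DBA chills each existing index by hand, then retries:
ALTER INDEX foo_pkey CHILL;
CREATE INDEX idx_new ON foo (bar);
```

The appeal of this variant is that the expensive chilling work happens only on explicit request, outside the CREATE INDEX code path itself.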
Re: [HACKERS] [PATCHES] Bitmapscan changes
Joshua D. Drake wrote: Heikki Linnakangas wrote: Joshua D. Drake wrote: This URL is not working: http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz Sorry about that, typo in the filename. Fixed. Here are my results on a modest 3800X2 2 Gig of ram, RAID 1 dual SATA Thanks for looking into this, though that test alone doesn't really tell us anything. You'd have to run the same tests with and without clustered indexes enabled, and compare. With the default settings the test data fits in memory anyway, so you're not seeing the I/O benefit but only the CPU overhead. Attached is a larger test case with a data set of 2 GB. Run the git_demo_init.sql first to create tables and indexes, and git_demo_run.sql to perform selects on them. The test runs for quite a long time, depending on your hardware, and prints the time spent on the selects, with and without clustered index. You'll obviously need to run it with the patch applied. I'd suggest enabling stats_block_level to see the effect on buffer cache hit/miss ratio. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com git_demo.tar.gz Description: GNU Zip compressed data ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
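The measurement setup Heikki suggests might look like this (a sketch using the 8.2-era GUC name; `stats_block_level` was later folded into `track_counts`/`track_io_timing`-style settings):

```sql
-- Enable block-level statistics so the pg_statio_* views are populated
-- (requires superuser when set per-session; otherwise set it in
-- postgresql.conf and reload):
SET stats_block_level = on;

-- Reset the counters so the hit/miss ratio reflects only the test run:
SELECT pg_stat_reset();

-- ... run git_demo_run.sql here ...

-- Afterwards, inspect the buffer cache behaviour per table:
SELECT relname, heap_blks_read, heap_blks_hit,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
FROM pg_statio_user_tables;
```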
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Well, the db is currently running around 13Gb, so that's not something to be exported lightly ;-) Yeah. I would assume though that the vast bulk of that is captured log files. For the purposes I'm imagining, it'd be sufficient to export only the rest of the database --- or ideally, records including all the other fields and a URL for each log file. For the small number of log files you actually need to examine, you'd chase the URL. OK, for anyone that wants to play, I have created an extract that contains a summary of every non-CVS-related failure we've had. It's a single table looking like this:

CREATE TABLE mfailures (
    sysname text,
    snapshot timestamp without time zone,
    stage text,
    conf_sum text,
    branch text,
    changed_this_run text,
    changed_since_success text,
    log_archive_filenames text[],
    build_flags text[]
);

The dump is just under 1Mb and can be downloaded from http://www.pgbuildfarm.org/mfailures.dump If this is useful we can create it or something like it on a regular basis (say nightly). The summary log for a given build can be got from: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=sysname&dt=snapshot To look at the log for a given run stage select http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=sysname&dt=snapshot&stg=stagename - the stage names available (if any) are the entries in log_archive_filenames, stripped of the .log suffix. We can make these available over an API that isn't plain http if people want. Or we can provide a version of the buildlog that is stripped of the html. cheers andrew ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
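Given the schema above, the kind of failure classification the thread is after could start with queries like these (illustrative only; the animal name is hypothetical):

```sql
-- Which build stages fail most often, per branch?
SELECT branch, stage, count(*) AS failures
FROM mfailures
GROUP BY branch, stage
ORDER BY failures DESC;

-- Recent failures for one buildfarm member ('some_animal' is a placeholder):
SELECT snapshot, stage, branch
FROM mfailures
WHERE sysname = 'some_animal'
ORDER BY snapshot DESC
LIMIT 20;
```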
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
On Fri, 16 Mar 2007, Andrew Dunstan wrote: OK, for anyone that wants to play, I have created an extract that contains a summary of every non-CVS-related failure we've had. It's a single table looking like this: CREATE TABLE mfailures ( sysname text, snapshot timestamp without time zone, stage text, conf_sum text, branch text, changed_this_run text, changed_since_success text, log_archive_filenames text[], build_flags text[] ); Sweet. Should be interesting to look at. The dump is just under 1Mb and can be downloaded from http://www.pgbuildfarm.org/mfailures.dump Sure about that?

--14:45:45--  http://www.pgbuildfarm.org/mfailures.dump
           => `mfailures.dump'
Resolving www.pgbuildfarm.org... 207.173.203.146
Connecting to www.pgbuildfarm.org|207.173.203.146|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9,184,142 (8.8M) [text/plain]

-- BOO! We changed Coke again! BLEAH! BLEAH! ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Question: pg_class attributes and race conditions ?
On Fri, 2007-03-16 at 16:59 -0400, Alvaro Herrera wrote: Here's a very simple, low-tech idea. How about checking whether the new index requires chilling tuples; if it does, then elog(ERROR) until all the indexes have been manually chilled, which would be done with an ALTER INDEX ... CHILL command or something like that. Only when all indexes are known chilled, you can create another one, and then the user can hotify indexes as appropriate. Well, I've spent two weeks searching for a design that does CREATE INDEX without changing existing functionality. What's been proposed is very close, but not exact. CREATE INDEX CONCURRENTLY can work, so we're just discussing the other increasingly edgy cases. I agree some kind of compromise on CREATE INDEX seems to be required if we want HOT without some drastic reductions in function. I'm happy to go for low tech approaches, or anything really. Simple is good, so we can hit feature freeze. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] tsearch_core for inclusion
Robert Treat wrote: On Friday 16 March 2007 10:45, Teodor Sigaev wrote: I don't see how the proposal is going to solve that type of problem, but maybe I am overlooking something? The same way as other system table objects: they aren't dumped, they aren't restored. In 8.3, it seems, the API to index AMs will be changed - will anybody except pghackers see that? New opclass layout, new opfamily table - users don't see those changes at all. If I have configured my tsearch install for spanish (spanish dictionary/stemmers/synonyms/etc...) aren't I going to lose all that on the next database upgrade? I believe what Teodor meant is that the *tables* won't be dumped as such by pg_dump (the same as all other tables in pg_catalog). But pg_dump would of course need to dump the information stored in the tables - it would put some CREATE FULLTEXT ... statements into your dump. It's really the same as for any other database object like a function, a type, ... greetings, Florian Pflug ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
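For illustration of Florian's point that the configuration would be dumped as ordinary DDL rather than as raw catalog rows: the `CREATE FULLTEXT ...` syntax was still under discussion in this thread, but the shape of what eventually shipped in 8.3 (as text search objects) looks roughly like this. The specific configuration shown is a hedged example, not from the thread:

```sql
-- The kind of statements a dump would carry so a spanish setup survives
-- a dump/restore cycle (8.3-style syntax, shown only for illustration):
CREATE TEXT SEARCH CONFIGURATION my_spanish ( COPY = pg_catalog.spanish );
ALTER TEXT SEARCH CONFIGURATION my_spanish
    ALTER MAPPING FOR asciiword, word WITH spanish_stem;
```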
[HACKERS] Adding a typmod field to Const et al
A month or so back I wrote: BTW, I think a good case could be made that the core of the problem is exactly that struct Const doesn't carry typmod, and thus that we lose information about constructs like 'foo'::char(7). We should fix that, and also anywhere else in the expression tree structure where we are discarding knowledge about the typmod of a result. This has got some urgency because of Teodor's recent work on allowing user-defined types to have typmods --- we can expect massive growth in the number of scenarios where it matters. I looked into this and determined that the interesting cases seem to be:

Const: needs a struct field added
ArrayRef: ditto; but we could drop refrestype which is redundant
SubLink: EXPR and ARRAY cases should recurse to subplan target item, as exprType() does
ArrayExpr: should adopt the same behavior as Coalesce and similar nodes, ie, if all the elements show the same type/typmod then return that typmod instead of -1

With these changes, exprTypmod covers all the same cases as exprType, except for cases that demonstrably don't have a typmod, such as the result of a non-length-coercion function, or nodes that have a hardwired result type such as BOOL that doesn't take a typmod. Comments, objections? regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
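The ArrayExpr rule can be made concrete with a small example (the table is hypothetical, used only to show when element typmods agree):

```sql
-- The motivating construct: folding 'foo'::char(7) into a Const currently
-- discards the "(7)" because Const has no typmod field.
SELECT 'foo'::char(7);

-- Proposed ArrayExpr behavior, matching COALESCE and friends:
CREATE TABLE tm (a numeric(5,2), b numeric(5,2), c numeric(7,0));

SELECT ARRAY[a, b] FROM tm;  -- all elements numeric(5,2): array keeps that typmod
SELECT ARRAY[a, c] FROM tm;  -- element typmods differ: result typmod is -1
```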
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Jeremy Drake wrote: The dump is just under 1Mb and can be downloaded from http://www.pgbuildfarm.org/mfailures.dump Sure about that? HTTP request sent, awaiting response... 200 OK Length: 9,184,142 (8.8M) [text/plain] Damn these new specs. They made me skip a digit. cheers andrew ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Adding a typmod field to Const et al
Tom Lane [EMAIL PROTECTED] writes: ArrayExpr: should adopt the same behavior as Coalesce and similar nodes, ie, if all the elements show the same type/typmod then return that typmod instead of -1 ... Comments, objections? I'm not entirely convinced by this one. Does that mean expressions like this would throw an error if col1 was declared as a numeric(1)? ARRAY[col1] || 10 -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Adding a typmod field to Const et al
Gregory Stark [EMAIL PROTECTED] writes: I'm not entirely convinced by this one. Does that mean expressions like this would throw an error if col1 was declared as a numeric(1)? ARRAY[col1] || 10 No, because the result of the || operator won't have a specific typmod. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
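Tom's answer can be restated with a sketch (hypothetical table, for illustration only):

```sql
CREATE TABLE t (col1 numeric(1));

-- Under the proposed rule ARRAY[col1] would carry typmod numeric(1),
-- but the result of the || (array append) operator is an operator
-- result, which has no specific typmod (it is -1).  So appending 10
-- does not get length-checked against numeric(1):
SELECT ARRAY[col1] || 10 FROM t;
```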
Re: [HACKERS] Lock table in non-volatile functions
Tom Lane wrote: Gaetano Mendola [EMAIL PROTECTED] writes: I'm observing that it is not allowed to LOCK a table in a STABLE/IMMUTABLE function but at the same time a SELECT FOR UPDATE is allowed. Really? AFAICS, CommandIsReadOnly() will reject SELECT FOR UPDATE too.

kalman=# select version();
                                               version
------------------------------------------------------------------------------------------------------
 PostgreSQL 8.2.3 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)
(1 row)

kalman=# CREATE TABLE test( a integer );
CREATE TABLE
kalman=# INSERT INTO test VALUES ( 1 );
INSERT 0 1
kalman=# CREATE OR REPLACE FUNCTION sp_test()
kalman-# RETURNS INTEGER AS $body$
kalman$# DECLARE
kalman$#    my_integer integer;
kalman$#    my_port_set RECORD;
kalman$# BEGIN
kalman$#    FOR my_port_set IN
kalman$#       SELECT a
kalman$#       FROM test
kalman$#       FOR UPDATE
kalman$#    LOOP
kalman$#       my_integer = 0;
kalman$#    END LOOP;
kalman$#    RETURN 0;
kalman$# END;
kalman$# $body$ language 'plpgsql'
kalman-# STABLE;
CREATE FUNCTION
kalman=# select sp_test();
 sp_test
---------
       0
(1 row)

BTW why forbid the lock in a non volatile function or (if you fix this) the SELECT FOR UPDATE ? Regards Gaetano Mendola ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Lock table in non-volatile functions
Gaetano Mendola [EMAIL PROTECTED] writes: Tom Lane wrote: Really? AFAICS, CommandIsReadOnly() will reject SELECT FOR UPDATE too. kalman$# FOR my_port_set IN kalman$# SELECT a kalman$# FROM test kalman$# FOR UPDATE kalman$# LOOP Hm, that's a bug --- SPI_cursor_open is failing to check for a read-only query. BTW why forbid the lock in a non volatile function or (if you fix this) the SELECT FOR UPDATE ? Well, as for the lock, a non-volatile function isn't supposed to have any side-effects, and taking a lock is certainly a side-effect, no? Now I suppose it'll be taking AccessShareLock anyway if it reads any tables, so maybe we could negotiate about what sort of locks could be allowed; but I'd certainly argue that allowing it to take any kind of exclusive lock would be a Bad Idea. As for SELECT FOR UPDATE, there's a very good reason for disallowing that even without considering what locks it takes. In a READ COMMITTED transaction, SELECT FOR UPDATE can return row states that aren't visible according to the nominal transaction snapshot, and so it violates the promise of stable results. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
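Tom's READ COMMITTED argument can be sketched as a two-session scenario (illustrative, not from the thread; it reuses Gaetano's `test` table):

```sql
-- A STABLE function runs under the snapshot of the query that calls it.
-- Suppose the calling query's snapshot sees a = 1, and meanwhile another
-- session commits:  UPDATE test SET a = 2;

-- Inside the STABLE function, a plain SELECT honours that snapshot:
SELECT a FROM test;             -- still sees a = 1

-- ...but FOR UPDATE locks and returns the latest committed row version:
SELECT a FROM test FOR UPDATE;  -- would return a = 2

-- The function can thus observe data its calling query cannot, breaking
-- the promise that a STABLE function gives consistent results within one
-- statement.
```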
Re: [HACKERS] [PATCHES] Bitmapscan changes
Heikki Linnakangas wrote: Joshua D. Drake wrote: Heikki Linnakangas wrote: Joshua D. Drake wrote: This URL is not working: http://community.enterprisedb.com/git/git-perfunittests-20070222.tar.gz Sorry about that, typo in the filename. Fixed. Here are my results on a modest 3800X2 2 Gig of ram, RAID 1 dual SATA

 heap_pages | normal_index_pages | clustered_index_pages
------------+--------------------+-----------------------
     216217 |             109679 |                  1316

 select_with_normal_index
--------------------------
                       10
(1 row)

Time: 1356524.743 ms

 select_with_normal_index
--------------------------
                       10
(1 row)

Time: 1144832.597 ms

 select_with_normal_index
--------------------------
                       10
(1 row)

Time: 445.236 ms

And now run the same tests with clustered index Timing is on.

 select_with_clustered_index
-----------------------------
                          10
(1 row)

Time: 815622.768 ms

 select_with_clustered_index
-----------------------------
                          10
(1 row)

Time: 535749.457 ms

 select_with_clustered_index
-----------------------------
                          10
(1 row)

select relname,indexrelname,idx_blks_read,idx_blks_hit from pg_statio_all_indexes where schemaname = 'public';

   relname    |         indexrelname         | idx_blks_read | idx_blks_hit
--------------+------------------------------+---------------+--------------
 narrowtable  | narrowtable_index            |        296973 |       904654
 narrowtable2 | narrowtable2_clustered_index |         44556 |       857269
(2 rows)

select relname,heap_blks_read,heap_blks_hit,idx_blks_read,idx_blks_hit from pg_statio_user_tables ;

   relname    | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
--------------+----------------+---------------+---------------+--------------
 narrowtable2 |         734312 |      40304136 |         44556 |       857269
 narrowtable  |         952044 |      40002609 |        296973 |       904654

Seems like a clear win to me. Anyone else want to try? Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
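A quick way to summarize those pg_statio numbers as a hit ratio (a sketch against the same standard views used above; `idx_blks_read` counts blocks fetched from disk, `idx_blks_hit` counts buffer cache hits):

```sql
SELECT indexrelname,
       idx_blks_read,
       idx_blks_hit,
       round(idx_blks_hit::numeric
             / nullif(idx_blks_hit + idx_blks_read, 0), 3) AS hit_ratio
FROM pg_statio_all_indexes
WHERE schemaname = 'public';
```

On the figures reported here, the clustered index incurs far fewer disk reads (44556 vs 296973) for a comparable number of hits, which is where the I/O win comes from.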