Re: [HACKERS] pg_trgm
On fre, 2010-05-28 at 10:04 +0900, Tatsuo Ishii wrote: I think the problem at hand has nothing at all to do with agglutination or CJK-specific issues. You will get the same problem with other languages *if* you set a locale that does not adequately support the characters in use. E.g., Russian with locale C and encoding UTF8: select similarity(E'\u0441\u043B\u043E\u043D', E'\u0441\u043B\u043E \u043D\u044B'); similarity NaN (1 row) Wait. This works fine for me with stock pg_trgm. local is C and encoding is UTF8. What version of PostgreSQL are you using? Mine is 8.4.4. This is in 9.0, because 8.4 doesn't recognize the \u escape syntax. If you run this in 8.4, you're just comparing a sequence of ASCII letters and digits. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Patch submission deadline for CommitFest 2010-07
Hi, When is the patch submission deadline for CommitFest 2010-07? July 14? or June 14 (before review fest)? Sorry, I'm not sure what is actually different between CF and RF. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
On 27/05/10 22:55, Tom Lane wrote: Peter Eisentrautpete...@gmx.net writes: How about select myfunc(a := 7, b := 6); ? Hey, that's a thought. We couldn't have used that notation before because we didn't have := as a separate token, but since I hacked that in for plpgsql's benefit, I think it might be an easy fix. It'd be nice that it puts the argument name first like the spec syntax, too. If we go with that, should we make some preparations to allow = in the future? Like provide an alternative operator name for hstore's =, and add a note somewhere in the docs to discourage other modules from using =. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch submission deadline for CommitFest 2010-07
On 28/05/10 09:26, Fujii Masao wrote: When is the patch submission deadline for CommitFest 2010-07? July 14? or June 14 (before review fest)? Sorry, I'm not sure what is actually different between CF and RF. July 14. But if you finish the patch before June 14, it will get reviewed earlier, between June 14 and July 14. So aim for June 14 :-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
On 27/05/10 22:56, Robert Haas wrote: On Thu, May 27, 2010 at 3:52 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haasrobertmh...@gmail.com wrote: On Thu, May 27, 2010 at 3:15 PM, Kevin Grittner (a) The tuples were written within the same transaction which created or truncated the table. In case (a), you mess up visibility with respect to other command-IDs within the transaction. Surely that problem is surmountable? I proposed an idea at PGCon, but I believe Tom and Heikki thought it was far too grotty to consider. No, I think it's surmountable too. We discussed hacks to teach the MVCC checks that all frozen tuples on a table that was created in the same transaction (i.e. the same cases where we skip WAL logging) were actually created by the running transaction, and check commandid accordingly. Or detect simple DML commands where we know that the command doesn't read the table. COPY would usually fall into that category, though non-immutable input functions make that a bit iffy. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tsvector pg_stats seems quite a bit off.
On 28/05/10 04:47, Tom Lane wrote: =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: On 19/05/10 21:01, Jesper Krogh wrote: In practice, just cranking the statistics estimate up high enough seems to solve the problem, but doesn't there seem to be something wrong in how the statistics are collected? The algorithm to determine most common vals does not do it accurately. That would require keeping all lexemes from the analysed tsvectors in memory, which would be impractical. If you want to learn more about the algorithm being used, try reading http://www.vldb.org/conf/2002/S10P03.pdf and corresponding comments in ts_typanalyze.c I re-scanned that paper and realized that there is indeed something wrong with the way we are doing it. So I think we have to fix this. Hm, I'll try to take another look this evening (CEST). Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch submission deadline for CommitFest 2010-07
On Fri, May 28, 2010 at 4:08 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 28/05/10 09:26, Fujii Masao wrote: When is the patch submission deadline for CommitFest 2010-07? July 14? or June 14 (before review fest)? Sorry, I'm not sure what is actually different between CF and RF. July 14. But if you finish the patch before June 14, it will get reviewed earlier, between June 14 and July 14. So aim for June 14 :-). Thanks! I'll do my best :) Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Failback with log shipping
At PGCon, several people asked me about restarting an old master as a standby after failover has happened. And it wasn't the first time people ask me about it, even before 9.0. We have no mention of that in the docs, which is a pretty serious oversight. What can we say about it? I believe the current official policy is that you have to take a new base backup and restore from that. Rsync can be used to speed that up. However, someone once asked me for a comment on a script he wrote to do that in a smarter way. I forget who and when and how exactly it worked, but it seems possible to do safely. First of all, you have to shut down the master cleanly for this to work, otherwise there can be changes in the old master that never made it to the standby. Assuming controlled shutdown and that the standby received all WAL from the old master before it was promoted, I think you can simply create a recovery.conf in the old master's data directory to turn it into a standby server, and restart. Am I missing something? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Working with PostgreSQL enums in C code
On Fri, May 28, 2010 at 12:07 AM, Joseph Adams joeyadams3.14...@gmail.com wrote: I learned that to return an enum value from C, one needs to return the OID of the right row of the pg_enum table. I eventually managed to write the code below, which is mostly based on the enum_in function in src/backend/utils/adt/enum.c . PG_RETURN macros shouldn't do any nontrivial processing (see the existing ones for references). I assume you have the enum labels declared in pg_enum.h, so I think you can just return the correct OID values directly. Declare constants for them in pg_enum.h and then just do PG_RETURN_OID(whatever). #define JSONTypeNullOid ... #define JSONTypeStringOid ... #define JSONTypeNumberOid ... It really shouldn't be necessary to do a catalog lookup to retrieve a constant. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mingw initdb failure on HEAD
Takahiro Itagaki wrote: Andrew Dunstan and...@dunslane.net wrote: Several buildfarm mingw members are getting failures like this, when running initdb: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dawn_batdt=2010-05-27%2019:45:18 Could it have been caused by the PGDLLIMPORT/PGDLLEXPORT changes? Probably, but it's curious because MSVC members are OK. Do we have special treatments for exported functions in mingw? It might export 'dllimport' funtions/variables, but not 'dllexport' ones. It has broken Cygwin as well, so that's two out of three Windows platforms that don't like this. I am not sure what the best fix for the original problem is, but this isn't it. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Mark PG_MODULE_MAGIC and PG_FUNCTION_INFO_V1 with PGDLLEXPORT
On 27/05/10 10:59, Takahiro Itagaki wrote: Log Message: --- Mark PG_MODULE_MAGIC and PG_FUNCTION_INFO_V1 with PGDLLEXPORT independently from BUILDING_DLL. It is always __declspec(dllexport). It looks like the Windows buildfarm members are not happy about this change... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Specification for Trusted PLs?
On Thu, May 27, 2010 at 11:09:26PM -0400, Tom Lane wrote: David Fetter da...@fetter.org writes: I don't know about a *good* idea, but here's the one I've got. 1. Make a whitelist. This is what needs to work in order for a language to be a fully functional trusted PL. Well, I pretty much lose interest right here, because this is already assuming that every potentially trusted PL is isomorphic in its capabilities. That's not normally a problem. The conventional way would be to place the interpreter in its own sandbox, similar to how Chrome has each tab running in its own process. These processes are protected in a way so that the code running inside them can't do any harm--e.g. a ptrace jail[1]. This is quite a change from existing pl implementations, and present a different set of performance/compatibility issues. If that were so, there'd not be very much point in supporting multiple PLs. A good example here is R. I have no idea whether PL/R is trusted or trustworthy, but in any case the main point of supporting that PL is to allow access to the R statistical library. How does that fit into a whitelist designed for some other language? It doesn't. AFAIU, a trusted language should only be able to perform computation, e.g. not touch the local filesystem, beyond readonly access to library code, and not see the network. Policies such as these are easy to enforce in a ptrace jail, and would still allow a trusted pl/r to do whatever it wants to get any pure calculation done. As soon as it needs to touch the file system the language becomes non-trusted. 3. (the un-fun part) Write tests which attempt to do things not in the whitelist. We can start from the vulnerabilities so far discovered. And here is the *other* fatal problem: a whitelist does not in fact give any leverage at all for testing whether there is access to functionality outside the whitelist. (It might be useful if you could enforce the whitelist at some sufficiently low level of the language implementation, but as a matter of testing, it does nothing for you.) What you're suggesting isn't so much un-fun as un-possible. Given a maze of twisty little subroutines all different, how will you find out if any of them contain calls of unwanted functionality? A jail helps with a lot of this; the remainder is in the normal fact that bug testing can only demonstrate the presence of bugs and you need to do formal code proof to check for the absence of bugs. -- Sam http://samason.me.uk/ [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.122.5494 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Specification for Trusted PLs?
On fre, 2010-05-28 at 13:03 +0100, Sam Mason wrote: That's not normally a problem. The conventional way would be to place the interpreter in its own sandbox, similar to how Chrome has each tab running in its own process. These processes are protected in a way so that the code running inside them can't do any harm--e.g. a ptrace jail[1]. This is quite a change from existing pl implementations, and present a different set of performance/compatibility issues. Surely a definition of a trusted language that invalidates the existing trusted languages is not going help resolve the issue. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Specification for Trusted PLs?
Sam Mason wrote: On Thu, May 27, 2010 at 11:09:26PM -0400, Tom Lane wrote: David Fetter da...@fetter.org writes: I don't know about a *good* idea, but here's the one I've got. 1. Make a whitelist. This is what needs to work in order for a language to be a fully functional trusted PL. Well, I pretty much lose interest right here, because this is already assuming that every potentially trusted PL is isomorphic in its capabilities. That's not normally a problem. The conventional way would be to place the interpreter in its own sandbox, similar to how Chrome has each tab running in its own process. These processes are protected in a way so that the code running inside them can't do any harm--e.g. a ptrace jail[1]. This is quite a change from existing pl implementations, and present a different set of performance/compatibility issues. I have my own translation of this last sentence. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failback with log shipping
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Assuming controlled shutdown and that the standby received all WAL from the old master before it was promoted, I think you can simply create a recovery.conf in the old master's data directory to turn it into a standby server, and restart. Am I missing something? Would that mean that a controlled restart of the old master so that the recovery stops before applying the logs that were not shipped to the slave would put it in the same situation? How easy is it to script that? It seems all we need is the current XID of the slave at the end of recovery. It should be in the log, maybe it's easy enough to expose it at the SQL level… Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failback with log shipping
On 28/05/10 16:11, Dimitri Fontaine wrote: Heikki Linnakangasheikki.linnakan...@enterprisedb.com writes: Assuming controlled shutdown and that the standby received all WAL from the old master before it was promoted, I think you can simply create a recovery.conf in the old master's data directory to turn it into a standby server, and restart. Am I missing something? Would that mean that a controlled restart of the old master so that the recovery stops before applying the logs that were not shipped to the slave would put it in the same situation? Not shipped before the first failover you mean? No, if any WAL records were created in the old master that were not shipped to the standby before failover, the corresponding changes to the data files might've been flushed to disk already, and you can't undo those by not replaying the WAL record on restart. How easy is it to script that? It seems all we need is the current XID of the slave at the end of recovery. It should be in the log, maybe it's easy enough to expose it at the SQL level… XID doesn't help at all, LSN more likely, but I feel that I don't fully understand what you're saying. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Peter Eisentrautpete...@gmx.net writes: How about select myfunc(a := 7, b := 6); If we go with that, should we make some preparations to allow = in the future? Like provide an alternative operator name for hstore's =, and add a note somewhere in the docs to discourage other modules from using =. I'd vote no. We're intentionally choosing to deviate from a very poor choice of notation. Maybe Peter can interest the committee in allowing := as an alternate notation, instead. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr
Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: There are functions pg_stat_get_backend_client_addr and pg_stat_get_backend_client_port, which are exposed through the pg_stat_activity view, but there is no straightforward way to get the server-side address and port of a connection. This is obviously much less commonly needed than the client information, ... indeed. Is it worth burdening the pg_stats mechanism with this? The use case seems vanishingly thin. I am confused how this is different from inet_server_addr() and inet_server_port(). Also, these functions return nothing for unix domain connections. Should they, particularly for the port number which we do use to map to a socket name? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Working with PostgreSQL enums in C code
Robert Haas robertmh...@gmail.com writes: On Fri, May 28, 2010 at 12:07 AM, Joseph Adams joeyadams3.14...@gmail.com wrote: I learned that to return an enum value from C, one needs to return the OID of the right row of the pg_enum table. I eventually managed to write the code below, which is mostly based on the enum_in function in src/backend/utils/adt/enum.c . PG_RETURN macros shouldn't do any nontrivial processing (see the existing ones for references). Yeah, that was my first reaction too. If we don't already have one, it would be appropriate to provide a lookup enum value function (functionally about the same as enum_in, but designed to be called conveniently from C). Then, if you needed to work from a textual enum label, you'd call that function and then PG_RETURN_OID. However, for a built-in enum type, I agree with Robert's solution of just #define-ing fixed OIDs for the values of the type. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
2010/5/28 Tom Lane t...@sss.pgh.pa.us: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Peter Eisentrautpete...@gmx.net writes: How about select myfunc(a := 7, b := 6); If we go with that, should we make some preparations to allow = in the future? Like provide an alternative operator name for hstore's =, and add a note somewhere in the docs to discourage other modules from using =. I'd vote no. We're intentionally choosing to deviate from a very poor choice of notation. Maybe Peter can interest the committee in allowing := as an alternate notation, instead. -1 I prefer a standard. And again - it isn't poor syntax - ADA, Perl use it, It can be a funny if ANSI SQL committee change some design from Oracle's proposal to PostgreSQL's proposal. Regards Pavel regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr
Bruce Momjian br...@momjian.us writes: Tom Lane wrote: ... indeed. Is it worth burdening the pg_stats mechanism with this? The use case seems vanishingly thin. I am confused how this is different from inet_server_addr() and inet_server_port(). I think the point is to let someone find out *from another session* which server port number a particular session is using. I fail to see a significant use case for that, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Tom Lane wrote: ... indeed. Is it worth burdening the pg_stats mechanism with this? The use case seems vanishingly thin. I am confused how this is different from inet_server_addr() and inet_server_port(). I think the point is to let someone find out *from another session* which server port number a particular session is using. I fail to see a significant use case for that, though. Uh, aren't they all using the same server port number, e.g. 5432? Is the issue different IP addresses for the same server? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Peter Eisentrautpete...@gmx.net writes: How about select myfunc(a := 7, b := 6); If we go with that, should we make some preparations to allow = in the future? Like provide an alternative operator name for hstore's =, and add a note somewhere in the docs to discourage other modules from using =. I'd vote no. We're intentionally choosing to deviate from a very poor choice of notation. Maybe Peter can interest the committee in allowing := as an alternate notation, instead. What's poor about it? It probably comes from PLSQL which in turn got it from Ada, so they aren't just making this up. I agree it's inconvenient for us, but that's a different issue. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr
On fre, 2010-05-28 at 10:21 -0400, Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Tom Lane wrote: ... indeed. Is it worth burdening the pg_stats mechanism with this? The use case seems vanishingly thin. I am confused how this is different from inet_server_addr() and inet_server_port(). I think the point is to let someone find out *from another session* which server port number a particular session is using. I fail to see a significant use case for that, though. Uh, aren't they all using the same server port number, e.g. 5432? Is the issue different IP addresses for the same server? Yes, I would like to know who is connecting to what IP address. It's useful if you have HA setups and you need to check which way your connections are going. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr
Peter Eisentraut wrote: On fre, 2010-05-28 at 10:21 -0400, Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Tom Lane wrote: ... indeed. Is it worth burdening the pg_stats mechanism with this? The use case seems vanishingly thin. I am confused how this is different from inet_server_addr() and inet_server_port(). I think the point is to let someone find out *from another session* which server port number a particular session is using. I fail to see a significant use case for that, though. Uh, aren't they all using the same server port number, e.g. 5432? Is the issue different IP addresses for the same server? Yes, I would like to know who is connecting to what IP address. It's useful if you have HA setups and you need to check which way your connections are going. OK, at least now I understand the goal. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
What's poor about it? It probably comes from PLSQL which in turn got it from Ada, so they aren't just making this up. I agree it's inconvenient for us, but that's a different issue. Further, the ( parameter := value ) notation is not only consistent with what is used inside pl/pgsql, it's also more consistent than AS with SQL Server's named parameter notation, which is: EXEC dbo.GetItemPrice @ItemCode = 'GXKP', @PriceLevel = 5 Since former SQL Server / Sybase apps are the most likely to use named parameter notation in PostgreSQL, having a syntax which could be ported using only sed would be nice. Relevant to the whole discussion, though ... is the conflicting SQL standard syntax something we're every likely to implement? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] How to pass around collation information
I have been thinking about this collation support business a bit. Ignoring for the moment where we would get the actual collation routines from, I wonder how we are going to pass this information around in the system. Someone declares a collation on a column in a table, and somehow this information needs to arrive in bttextcmp() and friends. Also, functions that take in a string and return one (e.g., substring), need to take in this information and return it back out. How should this work? Option 1, make it part of the datum. That way it will pass through the system just fine, but it would waste a lot of storage and break just about everything that operates on string types now, as well as pg_upgrade. So that's probably out. Option 2, invent some new mechanism that accompanies a datum or a type whereever it goes. Kind of like typmod, but not really. Then the collation information would presumably be made available to functions through the fmgr interface. The binary representation of data values stays the same. Option 2a, while we are at it, are there any other things of this nature that would be worth supporting at the same time? I could imagine that having the option to pass around the ctype locale or the text search dictionary in a similar way could be useful. Is this something that could be combined with typmod or other dormant data type metadata requirements (PostGIS?, XML?)? Ideas? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
Josh Berkus wrote: Since former SQL Server / Sybase apps are the most likely to use named parameter notation in PostgreSQL, having a syntax which could be ported using only sed would be nice. Relevant to the whole discussion, though ... is the conflicting SQL standard syntax something we're every likely to implement? Not sure, but I assume people could be using the AS syntax in other databases (for the inheritance usage) and then trying to use it in our database. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
Bruce Momjian wrote: Josh Berkus wrote: Since former SQL Server / Sybase apps are the most likely to use named parameter notation in PostgreSQL, having a syntax which could be ported using only sed would be nice. Relevant to the whole discussion, though ... is the conflicting SQL standard syntax something we're every likely to implement? Not sure, but I assume people could be using the AS syntax in other databases (for the inheritance usage) and then trying to use it in our database. Yeah. Whether or not we ever implement it really doesn't matter, IMO. We should not be in the business of taking an SQL standard piece of syntax and using it for some other purpose. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to pass around collation information
Excerpts from Peter Eisentraut's message of vie may 28 12:27:52 -0400 2010: Option 2, invent some new mechanism that accompanies a datum or a type whereever it goes. Kind of like typmod, but not really. Then the collation information would presumably be made available to functions through the fmgr interface. The binary representation of data values stays the same. Is the collation a property of the datum, or one of the comparison? If the latter, should it be really be made a sidecar of a datum, or would it make more sense to attach it to the operation being performed? I wonder if instead of trying to pass it down multiple layers till bttextcmp and further down, it would make more sense to set a global variable somewhere in the high levels, and only have it checked in varstr_cmp. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to pass around collation information
On 28/05/10 19:27, Peter Eisentraut wrote: I have been thinking about this collation support business a bit. Ignoring for the moment where we would get the actual collation routines from, I wonder how we are going to pass this information around in the system. Someone declares a collation on a column in a table, and somehow this information needs to arrive in bttextcmp() and friends. Yes. Comparison operators need it, as do functions like isalpha(). Also, functions that take in a string and return one (e.g., substring), need to take in this information and return it back out. How should this work? Hmm, I don't see what substring would need collation for. And it certainly shouldn't be returning it. Collation is a property of the comparison operators (and isalpha etc.), and the planner needs to deduce the right collation for each such operation in the query. That involves looking at the tables and columns involved, as well as per-user information and any explicit COLLATE clauses in the query, but all that happens at plan-time. Option 1, make it part of the datum. That way it will pass through the system just fine, but it would waste a lot of storage and break just about everything that operates on string types now, as well as pg_upgrade. So that's probably out. It's also fundamentally wrong, collation is not a property of a datum but of the operation. Option 2, invent some new mechanism that accompanies a datum or a type whereever it goes. Kind of like typmod, but not really. Then the collation information would presumably be made available to functions through the fmgr interface. The binary representation of data values stays the same. Something like that. I'm thinking that bttextcmp() and friends will simply take an extra argument indicating the collation, and we'll teach the operator / operator class infrastructure about that too. One way to approach this is to realize that it's already possible to use multiple collations in a database. You just have to define separate = operators and operator classes for every collation, and change all your queries to use the right operator depending on the desired collation everywhere where you use = (including ORDER BYs, with the USING operator syntax). The behavior is exactly what we want, it's just completely inpractical, so we need something to do the same in a less cumbersome way. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
On 28/05/10 19:19, Josh Berkus wrote: ( parameter := value ) notation is not only consistent with what is used inside pl/pgsql, it's also more consistent than AS with SQL Server's named parameter notation, which is: EXEC dbo.GetItemPrice @ItemCode = 'GXKP', @PriceLevel = 5 Since former SQL Server / Sybase apps are the most likely to use named parameter notation in PostgreSQL, having a syntax which could be ported using only sed would be nice. Once you solve the problem of finding the '='s in the source, replacing them is exactly the same effort regardless of what you replace them with. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] dividing money by money
I'm not quite sure how to go about changing it from an add-on function to a built-in one. So if you want to do that, go ahead. If you'd rather I did, just tell me how it's done. Andy Balholm (509) 276-2065 a...@balholm.com On May 26, 2010, at 11:18 AM, Kevin Grittner wrote: Hi Andy, Do you want to package this up as a patch for 9.1? If not, is it OK if I do? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to pass around collation information
On fre, 2010-05-28 at 20:22 +0300, Heikki Linnakangas wrote: It's also fundamentally wrong, collation is not a property of a datum but of the operation. One way to approach this is to realize that it's already possible to use multiple collations in a database. You just have to define separate = operators and operator classes for every collation, and change all your queries to use the right operator depending on the desired collation everywhere where you use = (including ORDER BYs, with the USING operator syntax). The behavior is exactly what we want, it's just completely inpractical, so we need something to do the same in a less cumbersome way. Well, maybe we should step back a little and work out what sort of feature we actually want, if any. The feature I'm thinking of is what people might call per-column locale, and the SQL standard defines that. It would look like this: CREATE TABLE test ( a varchar COLLATE de, b varchar COLLATE fr ); SELECT * FROM test WHERE a 'baz' ORDER BY b; So while it's true that the collation is used by the operations ( and ORDER BY), the information which collation to use comes with the data values. It's basically saying, a is in language de, so sort it like that unless told otherwise. There is also an override syntax available, like this: SELECT * FROM test WHERE a COLLATE en 'baz' ORDER BY b COLLATE sv; But here again the collation is attached to a data value, and only from there it is passed to the operator. What is actually happening is SELECT * FROM test WHERE (a COLLATE en) 'baz' ORDER BY (b COLLATE sv); What you appear to be describing is a per-operation locale, which also sounds valid, but it would be a different thing. It might be thought of as this: SELECT * FROM test WHERE a ( COLLATE en) 'baz' ORDER BY COLLATE sv b; with some suitable global default. So which one of these should it be? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VPATH docs
Patch applied. Thanks. --- David Fetter wrote: Folks, Andrew Dunstan posted some instructions on his blog, and I'm thinking they clarify things a great deal for people who want to learn how to do VPATH builds. Attached patch adds the description along with an index term. What say? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate [ Attachment, skipping... ] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
Josh Berkus j...@agliodbs.com writes: Since former SQL Server / Sybase apps are the most likely to use named parameter notation in PostgreSQL, having a syntax which could be ported using only sed would be nice. I fear you're vastly overestimating the ability of sed to distinguish between = used in this way and = used in any other way. Still, putting the parameter name on the left is clearly both more natural and more like every other product. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
Andrew Dunstan and...@dunslane.net writes: Yeah. Whether or not we ever implement it really doesn't matter, IMO. We should not be in the business of taking an SQL standard piece of syntax and using it for some other purpose. Evidently the 201x SQL standard has blindsided us twice: first by defining a syntax for named parameters that wasn't like ours, and second by defining a syntax for something else that conflicted with ours. I agree that the AS approach is pretty untenable given that double whammy, and we'd better get rid of it. (Hopefully Peter will be able to keep us better apprised of things in the future.) It seems that we're agreed on trying to use := instead, and the only debate is about whether to deprecate use of = as an operator. But anything that we might do about the latter would reach no farther than the documentation in 9.0 anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to pass around collation information
Peter Eisentraut pete...@gmx.net writes: So while it's true that the collation is used by the operations ( and ORDER BY), the information which collation to use comes with the data values. It's basically saying, a is in language de, so sort it like that unless told otherwise. There is also an override syntax available, like this: SELECT * FROM test WHERE a COLLATE en 'baz' ORDER BY b COLLATE sv; That seems fairly bizarre. What does this mean: WHERE a COLLATE en b COLLATE de ? If it's an error, why is this not an error WHERE a COLLATE en b if b is marked as COLLATE de in its table? I guess the more general question is whether the spec expects that collation settings can be derived statically (like type information) or whether they might sometimes only be known at runtime. We also need to think about whether we're okay with only applying collation to built-in types (text, varchar, char) or whether we need the feature to work for add-on types as well. In particular, is citext still a meaningful feature if we have this, or is it superseded by COLLATE? In the abstract I'd prefer to let it work for user-defined types, but if we can have a much simpler implementation by not doing so, it might be better to give that up. Is COLLATE a property that can be attached to a domain over text? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to pass around collation information
On Fri, May 28, 2010 at 2:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: So while it's true that the collation is used by the operations ( and ORDER BY), the information which collation to use comes with the data values. It's basically saying, a is in language de, so sort it like that unless told otherwise. There is also an override syntax available, like this: SELECT * FROM test WHERE a COLLATE en 'baz' ORDER BY b COLLATE sv; That seems fairly bizarre. What does this mean: WHERE a COLLATE en b COLLATE de ? If it's an error, why is this not an error WHERE a COLLATE en b if b is marked as COLLATE de in its table? I think we need to think of the comparison operators as ternary, and the COLLATE syntax applied to columns or present in queries as various ways of setting defaults or explicit overrides for what the third argument will end up being. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
On Fri, May 28, 2010 at 10:08 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2010/5/28 Tom Lane t...@sss.pgh.pa.us: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Peter Eisentrautpete...@gmx.net writes: How about select myfunc(a := 7, b := 6); If we go with that, should we make some preparations to allow = in the future? Like provide an alternative operator name for hstore's =, and add a note somewhere in the docs to discourage other modules from using =. I'd vote no. We're intentionally choosing to deviate from a very poor choice of notation. Maybe Peter can interest the committee in allowing := as an alternate notation, instead. -1 I prefer a standard. And again - it isn't poor syntax - ADA, Perl use it, It can be a funny if ANSI SQL committee change some design from Oracle's proposal to PostgreSQL's proposal. I agree. It's good syntax. I think we should try hard to adopt it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failback with log shipping
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Not shipped before the first failover you mean? No, if any WAL records were created in the old master that were not shipped to the standby before failover, the corresponding changes to the data files might've been flushed to disk already, and you can't undo those by not replaying the WAL record on restart. Ah yes you need to fail between when (WAL is written and not sent) and CHECKPOINT for this to be possible. But automatic testing of the situation (is the data already safe in PGDATA) might still be possible? How easy is it to script that? It seems all we need is the current XID of the slave at the end of recovery. It should be in the log, maybe it's easy enough to expose it at the SQL level… XID doesn't help at all, LSN more likely, but I feel that I don't fully understand what you're saying. Sorry I was unclear, I was thinking in terms of recovery.conf file and either recovery_target_xid or recovery_target_time. The idea being that if the old-master didn't CHECKPOINT the changes that the slave missed, then we can do crash recovery and choose to stop before that point, then apply WALs from the new master. That might sounds like a strange thing to do, but if switching from master to slave allows skipping the base backup to get a slave again, I guess we'll see people choosing the all automated failover scripting (with heartbeat and so on). The goal would be to reduce downtime the more you can. When possible I'd still choose manual failover to the slave after a master's restart and crash recovery, but the downtime constraint might not allow that everywhere. So you're saying controlled failover could possibly skip base backup to reuse old master as new slave, and I'm asking if by some luck (crash happened before CHECKPOINT) and some recovery.conf setup we could get to the same situation in case of hard failure. That would allow completely automatic switchover / failover with no need to resync. I'm not sure how much clearer I managed to be :) Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to pass around collation information
On fre, 2010-05-28 at 15:03 -0400, Robert Haas wrote: I think we need to think of the comparison operators as ternary, and the COLLATE syntax applied to columns or present in queries as various ways of setting defaults or explicit overrides for what the third argument will end up being. How could this extend to things like isalpha() or upper() that would need access to ctype information? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: PGDLLEXPORT is __declspec (dllexport) only on MSVC, but is
itag...@postgresql.org (Takahiro Itagaki) writes: Log Message: --- PGDLLEXPORT is __declspec (dllexport) only on MSVC, but is __declspec (dllimport) on other compilers because cygwin and mingw don't like dllexport. That probably explains why the code was the way it was before ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to pass around collation information
On fre, 2010-05-28 at 14:48 -0400, Tom Lane wrote: SELECT * FROM test WHERE a COLLATE en 'baz' ORDER BY b COLLATE sv; That seems fairly bizarre. What does this mean: WHERE a COLLATE en b COLLATE de ? If it's an error, why is this not an error WHERE a COLLATE en b if b is marked as COLLATE de in its table? The way I understand it, a collation derivation can be explicit or implicit. Explicit derivations override implicit derivations. If in the argument set of an operation, explicit collation derivations exist, they must all be the same. I guess the more general question is whether the spec expects that collation settings can be derived statically (like type information) or whether they might sometimes only be known at runtime. It looks like it is treated like type information. The derivation and validation rules are part of the Syntax Rules. We also need to think about whether we're okay with only applying collation to built-in types (text, varchar, char) or whether we need the feature to work for add-on types as well. In particular, is citext still a meaningful feature if we have this, or is it superseded by COLLATE? In the abstract I'd prefer to let it work for user-defined types, but if we can have a much simpler implementation by not doing so, it might be better to give that up. I think if we get this done using the strcoll_l() API to do the work, which looks like the path of least resistance at the moment, citext would still be useful because all the standard locales would still be case sensitive. Is COLLATE a property that can be attached to a domain over text? According to the spec, yes. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 28/05/10 19:19, Josh Berkus wrote: EXEC dbo.GetItemPrice @ItemCode = 'GXKP', @PriceLevel = 5 Once you solve the problem of finding the '='s in the source, replacing them is exactly the same effort regardless of what you replace them with. I guess it would be a choice of target between 'GXKP' AS ItemCode, 5 AS PriceLevel and ItemCode := 'GXKP', PriceLevel := 5 By the way, as it seems we're voting, I much prefer := than either the AS and = variant, and I'm not keen on seeing us deprecate the operator. Further, as said Andrew, keeping AS conflicting with the standard with no hysterical raisin to do so would be a bad move IMHO. Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to pass around collation information
Hi, Peter Eisentraut pete...@gmx.net writes: On fre, 2010-05-28 at 20:22 +0300, Heikki Linnakangas wrote: USING operator syntax). The behavior is exactly what we want, it's just completely inpractical, so we need something to do the same in a less cumbersome way. For an example, here is something I did to better understand the system a while ago. Of course I never got to use it for real: http://pgsql.tapoueh.org/btree_fr_ops/ Well, maybe we should step back a little and work out what sort of feature we actually want, if any. The feature I'm thinking of is what people might call per-column locale, and the SQL standard defines that. It would look like this: CREATE TABLE test ( a varchar COLLATE de, b varchar COLLATE fr ); SELECT * FROM test WHERE a 'baz' ORDER BY b; So while it's true that the collation is used by the operations ( and ORDER BY), the information which collation to use comes with the data values. It's basically saying, a is in language de, so sort it like that unless told otherwise. There is also an override syntax available, like this: SELECT * FROM test WHERE a COLLATE en 'baz' ORDER BY b COLLATE sv; But here again the collation is attached to a data value, and only from there it is passed to the operator. What is actually happening is SELECT * FROM test WHERE (a COLLATE en) 'baz' ORDER BY (b COLLATE sv); What you appear to be describing is a per-operation locale, which also sounds valid, but it would be a different thing. It might be thought of as this: SELECT * FROM test WHERE a ( COLLATE en) 'baz' ORDER BY COLLATE sv b; with some suitable global default. So which one of these should it be? My understanding is that what we do is per-operation locale. The locale information bears no semantic when not attached to some operation on strings, like sorting or comparing. So what you're showing here I think is how to attach a collation label to every string in the system, at the catalog level or dynamically at the query level. Now this collation label will only be used whenever you want to use a collation aware function or operator. Those functions need to get the labels for their implementation to have the expected meaning. So we need both to attach collations to all known strings (defaulting to the current database collation I guess), as you showed at the SQL level, and to pass this information down to the functions operating on those strings. A confusing example on this grounds would be the following, which I hope the standard disallow: SELECT * FROM test WHERE a COLLATE en b COLLATE sv; Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tsvector pg_stats seems quite a bit off.
On 28/05/10 04:47, Tom Lane wrote: I re-scanned that paper and realized that there is indeed something wrong with the way we are doing it. The paper says (last sentence in the definition of the algorithm, section 4.2): When a user requests a list of items with threshold s, we output those entries in D where f = (s-e)N. What we are actually doing is emitting every entry with f = 2. Since e is fixed at 1/w, this effectively means that we are setting s to be only fractionally greater than e, which means very large relative errors in the estimates. I gave it a though and reread the paper, but since I already blundered once, please verify me on this. We follow the algorithm as written, the trouble starts when we want to output the result. The paper says which items from the D structure should be returned when the user asks for items that have frequencies higher than a threshold s. What we want to put in the statistics table are items accompanied by their frequencies, so we need to do some reasoning before we can construct the result. Say we have an item I with count f (taken from our D structure). The total number of entries is N. The question would be: what would be the minimum frequency that the user could specify, that would still make us output this element. From f = (s - e) * N we can say it's s = (f / N) + e So if the user wants items that occur with frequency (f / N) + e or less. This would mean that the corresponding value in the statistics entry should be I, (f / N) + e) The thing is, this doesn't change much, because currently we are putting (f / N) there, and e is set to 1 / stats_target * 10, so the difference would not be dramatic. Or, if you want it explained another way: we are emitting words whose f is very small and whose delta is very large, representing items that got added to the scan very late. These really shouldn't be there because their true frequency is probably a lot less than the intended threshold. Well, the idea of the algorithm is that if their frequency would have been bigger, they would appear earlier and would survive the pruning, as I understand it. The net effect of this is first that there are a lot of rather useless entries in the MCV list whose claimed frequency is quite small, like as little as two occurrences. Their true frequency could be quite a bit more. What's even worse is that we believe that the minimum of these claimed frequencies is a reliable upper bound for the frequencies of items not listed in the MCV list. Per the algorithm it *is* the upper bound, if I got my maths correctly. The last item in the list would not be returned if the requested frequency was higher than the one that is associated to that item. So I think we have to fix this. The right thing is to select s and e values that are actually meaningful in the terms of the paper, then compute w from that, and be sure to enforce the minimum f value specified in the algorithm ... ie, don't be afraid to throw away values in the final D table. I we should definitely prune the table one last time in the very probable case that the loop ended in the middle of two regularly happening prunes. As for selecting the algorithm parameters: we don't get to select s. We do get to select e, but that's it. I have a feeling that our problems are caused by thte fact, that the algorithm tries to answer the question which elements occur more frequently than X and we actually want the answer to the what's the frequency of each element. I've almost convinced myself that the transformation between the answers to these questions exists, but this trouble report keeps giving me doubts. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add SIGCHLD catch to psql
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: I thought it might be about that simple once you went at it the right way ;-). However, I'd suggest checking ferror(pset.queryFout) as well as the fflush result. Sure, I can add the ferror() check. Patch attached. This seemed pretty small and uncontroversial, so I went ahead and committed it for 9.0. I rearranged the order of operations a bit to make it seem more coherent, and also added an initial clearerr() just to forestall problems if stdout had the error flag set for some reason. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to pass around collation information
On Fri, May 28, 2010 at 3:20 PM, Peter Eisentraut pete...@gmx.net wrote: On fre, 2010-05-28 at 15:03 -0400, Robert Haas wrote: I think we need to think of the comparison operators as ternary, and the COLLATE syntax applied to columns or present in queries as various ways of setting defaults or explicit overrides for what the third argument will end up being. How could this extend to things like isalpha() or upper() that would need access to ctype information? Good question. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tsvector pg_stats seems quite a bit off.
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: We follow the algorithm as written, the trouble starts when we want to output the result. The paper says which items from the D structure should be returned when the user asks for items that have frequencies higher than a threshold s. What we want to put in the statistics table are items accompanied by their frequencies, so we need to do some reasoning before we can construct the result. Well, the estimated frequency is still just f/N. The point is that we must filter out items with small f values because they're probably inaccurate --- in particular, anything with f eN is completely untrustworthy. I agree that we currently aren't bothering to determine a specific s value, but we probably need to do that in order to have a clear understanding of what we are getting. The idea that I was toying with is to assume a Zipfian distribution of the input (with some reasonable parameter), and use that to estimate what the frequency of the K'th element will be, where K is the target number of MCV entries or perhaps a bit more. Then use that estimate as the s value, and set e = s/10 or so, and then w = 1/e and continue as per the paper. If the eventual filtering results in a lot less than the target number of MCV entries (because the input wasn't so Zipfian), we lose, but at least we have accurate numbers for the entries we kept. I we should definitely prune the table one last time in the very probable case that the loop ended in the middle of two regularly happening prunes. The paper doesn't say that you need to do that. I suspect if you work through the math, you'll find out that the minimum-f filter skips anything that would have been pruned anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to pass around collation information
2010/5/28 alvherre alvhe...@commandprompt.com: Excerpts from Peter Eisentraut's message of vie may 28 12:27:52 -0400 2010: Option 2, invent some new mechanism that accompanies a datum or a type whereever it goes. Kind of like typmod, but not really. Then the collation information would presumably be made available to functions through the fmgr interface. The binary representation of data values stays the same. Is the collation a property of the datum, or one of the comparison? If the latter, should it be really be made a sidecar of a datum, or would it make more sense to attach it to the operation being performed? I wonder if instead of trying to pass it down multiple layers till bttextcmp and further down, it would make more sense to set a global variable somewhere in the high levels, and only have it checked in varstr_cmp. Maybe collation is property of some operation: func call, sort, ... I prefer to put collation info to FunctionCallInfo structure. Usually you cannot change collation per row - collation is attached to expression. Regards Pavel -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] List traffic
On 5/27/10 5:42 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: We do not have a problem. The lists are fine the way they are. +1 ... wasn't the point I thought you were trying to make, but I'm good with not changing things. Yeah, that's because I was responding to the suggestion that 5 of our lists should be collapsed into 'general' as the One Uber-List. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failback with log shipping
On 28/05/10 22:20, Dimitri Fontaine wrote: Heikki Linnakangasheikki.linnakan...@enterprisedb.com writes: Not shipped before the first failover you mean? No, if any WAL records were created in the old master that were not shipped to the standby before failover, the corresponding changes to the data files might've been flushed to disk already, and you can't undo those by not replaying the WAL record on restart. Ah yes you need to fail between when (WAL is written and not sent) and CHECKPOINT for this to be possible. Checkpoint only guarantees that everything before that is flushed to disk. It doesn't guarantee that nothing is flushed to disk until that. If there's a checkpoint that hasn't been shipped to the standby, you're certainly hosed, but if there is no checkpoint you don't know if the data files have changed or not. But automatic testing of the situation (is the data already safe in PGDATA) might still be possible? Hmm, so the situation is this: D - E - crash! / A - B - C \ d - f - g - h The letters represent WAL records. C is the last WAL record that was shipped to the standby, D E are WAL records that were generated in the old master before the crash but never sent to the standby, and d-h are WAL records created in the standby after failover. I guess you could read the WAL in the old master and compare it with the WAL from the standby to figure out where the failover happened (C), and then scan all the data pages involved in records D - E, checking that the LSNs on the data pages touched by those records are earlier than C. That's a bit laborious, and requires knowledge of all different kinds of WAL records to figure out which data pages they touch, but seems possible in theory. How easy is it to script that? It seems all we need is the current XID of the slave at the end of recovery. It should be in the log, maybe it's easy enough to expose it at the SQL level… XID doesn't help at all, LSN more likely, but I feel that I don't fully understand what you're saying. Sorry I was unclear, I was thinking in terms of recovery.conf file and either recovery_target_xid or recovery_target_time. The idea being that if the old-master didn't CHECKPOINT the changes that the slave missed, then we can do crash recovery and choose to stop before that point, then apply WALs from the new master. Ah, I see. No, you don't want to use a recovery target, that would end the recovery and start the server. You just need to make sure to use WALs from the new master instead of the old one when both exist. So you're saying controlled failover could possibly skip base backup to reuse old master as new slave, and I'm asking if by some luck (crash happened before CHECKPOINT) and some recovery.conf setup we could get to the same situation in case of hard failure. That would allow completely automatic switchover / failover with no need to resync. Yeah, that would be nice. In practice, I think you would get lucky more often than not, because whenever you modify and dirty a page, writing a WAL record, the usage count on the buffer is incremented and it won't be evicted from the buffer cache for a while. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to pass around collation information
On 28/05/10 23:15, Robert Haas wrote: On Fri, May 28, 2010 at 3:20 PM, Peter Eisentrautpete...@gmx.net wrote: On fre, 2010-05-28 at 15:03 -0400, Robert Haas wrote: I think we need to think of the comparison operators as ternary, and the COLLATE syntax applied to columns or present in queries as various ways of setting defaults or explicit overrides for what the third argument will end up being. How could this extend to things like isalpha() or upper() that would need access to ctype information? Good question. :-( Strictly speaking, collation and ctype are two different things. Which is a convenient way to evade the question :-). But you could ask, how would we handle more fine-grained ctype in upper() then? Perhaps by adding a second argument for ctype. Similarly to to_tsvector([config, ] string), you could explicitly pass the ctype as an argument, or leave it out in which case a default is used. It wouldn't give you per-column ctype, though. What does the spec have to say about the ctype used for upper() et al BTW? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] List traffic
On Fri, May 28, 2010 at 3:44 PM, Josh Berkus j...@agliodbs.com wrote: On 5/27/10 5:42 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: We do not have a problem. The lists are fine the way they are. +1 ... wasn't the point I thought you were trying to make, but I'm good with not changing things. Yeah, that's because I was responding to the suggestion that 5 of our lists should be collapsed into 'general' as the One Uber-List. i think not all should be collapsed but at least -novice, IMHO -- Jaime Casanova www.2ndQuadrant.com Soporte y capacitación de PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to pass around collation information
On Fri, May 28, 2010 at 10:32:34PM +0300, Peter Eisentraut wrote: On fre, 2010-05-28 at 14:48 -0400, Tom Lane wrote: SELECT * FROM test WHERE a COLLATE en 'baz' ORDER BY b COLLATE sv; That seems fairly bizarre. What does this mean: WHERE a COLLATE en b COLLATE de ? If it's an error, why is this not an error WHERE a COLLATE en b if b is marked as COLLATE de in its table? The way I understand it, a collation derivation can be explicit or implicit. Explicit derivations override implicit derivations. If in the argument set of an operation, explicit collation derivations exist, they must all be the same. The SQL standard has an explicit set of rules for determining the collations of any particular operation (they apply to operators/functions not to the datums). The basic idea is that tables/columns/data types define an implicit collation, which can be overidden by explicit collations. If there is ambiguity you throw an error. I implemented this all several years ago, it's not all that complicated really. IIRC I added a field to the Node type and each level determined it's collection from the sublevels. I solved the problem for the OP by providing an extra function to user defined functions which would return the collation for that particular call. The more interesting question I found was that the standard only defined collation for strings, whereas it can be applied much more broadly. I described a possible solution several years back, it should in the archives somewhere. It worked pretty well as I recall. IIRC The idea was to let each type has its own set of collations and when using an operator/function you let the collection be determined by the argument that had the same type as the return type. It would be nice if COLLATE could finally be implemented, it'd be quite useful. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ Patriotism is when love of your own people comes first; nationalism, when hate for people other than your own comes first. - Charles de Gaulle signature.asc Description: Digital signature
Re: [HACKERS] tsvector pg_stats seems quite a bit off.
On 28/05/10 22:22, Tom Lane wrote: The idea that I was toying with is to assume a Zipfian distribution of the input (with some reasonable parameter), and use that to estimate what the frequency of the K'th element will be, where K is the target number of MCV entries or perhaps a bit more. Then use that estimate as the s value, and set e = s/10 or so, and then w = 1/e and continue as per the paper. If the eventual filtering results in a lot less than the target number of MCV entries (because the input wasn't so Zipfian), we lose, but at least we have accurate numbers for the entries we kept. I see what you mean, so the idea would be: * assume some value of W as the number of all words in the language * estimate s as 1/(st + 10)*H(W), where H(W) is the W'th harmonic number and st is the statistics target, using Zipf's law * set e = s/10 and w = 1/e, that is 10/s * perform LC using that value of w * remove all elements for which f (s-e)N, that is f 0.9*sN, where N is the total number of lexemes processed * create the MCELEM entries as (item, f/N) Now I tried to substitute some numbers there, and so assuming the English language has ~1e6 words H(W) is around 6.5. Let's assume the statistics target to be 100. I chose s as 1/(st + 10)*H(W) because the top 10 English words will most probably be stopwords, so we will never see them in the input. Using the above estimate s ends up being 6.5/(100 + 10) = 0.06 We then do LC, pruning the D structure every w = 1/0.006 = 167 lexemes After that, we remove lexemes with f 0.9 * 0.06 * N = 0.054*N So assuming that on average a tsvector has 154 elements and that we went through 35017 rows (as it would be in Jesper's case, before he raised the stats target from 100 to 1000), we will remove lexemes with f 0.054 * 35017 * 154 that is f 291201.37 I wonder what would happen if Jasper's case if we did that... And I wonder how sound that maths is. I we should definitely prune the table one last time in the very probable case that the loop ended in the middle of two regularly happening prunes. The paper doesn't say that you need to do that. I suspect if you work through the math, you'll find out that the minimum-f filter skips anything that would have been pruned anyway. Possible. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql's is_select_command is naive
Robert Haas wrote: On Wed, May 26, 2010 at 10:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: It knows that queries beginning with select or values are select commands, but it seems not to be clued in about table and with. What we really ought to do IMO is throw out the entire current implementation of fetch_count. ?If libpq exposed access to the protocol-level fetch count, we could implement it without this cursor kluge. I suspect that would make a lot of people very happy. I have added the following TODO: Fix FETCH_COUNT to handle SELECT ... INTO and WITH queries * http://archives.postgresql.org/pgsql-hackers/2010-05/msg01565.php * http://archives.postgresql.org/pgsql-bugs/2010-05/msg00192.php -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clearing psql's input buffer after auto-reconnect
Tom Lane wrote: We determined that $SUBJECT would be a good idea in this thread: http://archives.postgresql.org/pgsql-bugs/2010-05/msg00159.php I looked a bit at what it would take to make this happen. The difficulty is that the input buffer is a local variable in MainLoop(), and so are a bunch of other subsidiary variables that would need to be reset along with it. The place where auto-reconnect presently happens is CheckConnection(), which is in a different file and is also several levels of subroutine call away from MainLoop. AFAICS there are three ways that we might attack this: 1. Massive restructuring of the code in common.c so that the fact of a connection reset having happened can be returned back to MainLoop. 2. Export much of MainLoop's internal state as globals, so that CheckConnection can hack on it directly. 3. Have CheckConnection do longjmp(sigint_interrupt_jmp) after resetting the connection, to force control to go back to MainLoop directly. MainLoop is already coded to clear its local state after catching a longjmp. Now #1 might be the best long-term solution but I have no particular appetite to tackle it, and #2 is just too ugly to contemplate. That leaves #3, which is a bit ugly in its own right but seems like the best fix we're likely to get. Comments, better ideas? Added to TODO: Prevent psql from sending remaining single-line multi-statement queries after reconnection * http://archives.postgresql.org/pgsql-bugs/2010-05/msg00159.php * http://archives.postgresql.org/pgsql-hackers/2010-05/msg01283.php -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck wrote: Reading the entire WAL just to find all COMMIT records, then go back to the origin database to get the actual replication log you're looking for is simpler and more efficient? I don't think so. Agreed, but I think I've not explained myself well enough. I proposed two completely separate ideas; the first one was this: If you must get commit order, get it from WAL on *origin*, using exact same code that current WALSender provides, plus some logic to read through the WAL records and extract commit/aborts. That seems much simpler than the proposal you outlined and as SR shows, its low latency as well since commits write to WAL. No need to generate event ticks either, just use XLogRecPtrs as WALSender already does. I see no problem with integrating that into core, technically or philosophically. Which means that if I want to allow a consumer of that commit order data to go offline for three days or so to replicate the 5 requested, low volume tables, the origin needs to hang on to the entire WAL log from all 100 other high volume tables? I suggest writing an external tool that strips out what you need that can be run at any time, rather than creating a new data format and overhead for this usecase. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On May 28, 2010, at 7:19 PM, Bruce Momjian br...@momjian.us wrote: Jan Wieck wrote: Reading the entire WAL just to find all COMMIT records, then go back to the origin database to get the actual replication log you're looking for is simpler and more efficient? I don't think so. Agreed, but I think I've not explained myself well enough. I proposed two completely separate ideas; the first one was this: If you must get commit order, get it from WAL on *origin*, using exact same code that current WALSender provides, plus some logic to read through the WAL records and extract commit/aborts. That seems much simpler than the proposal you outlined and as SR shows, its low latency as well since commits write to WAL. No need to generate event ticks either, just use XLogRecPtrs as WALSender already does. I see no problem with integrating that into core, technically or philosophically. Which means that if I want to allow a consumer of that commit order data to go offline for three days or so to replicate the 5 requested, low volume tables, the origin needs to hang on to the entire WAL log from all 100 other high volume tables? I suggest writing an external tool that strips out what you need that can be run at any time, rather than creating a new data format and overhead for this usecase. That would be FAR more complex, less robust, and less performant - whereas doing what Jan has proposed is pretty straightforward and should have minimal impact on performance - or none when not enabled. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] small exclusion constraints patch
Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: Currently, the check for exclusion constraints performs a sanity check that's slightly too strict -- it assumes that a tuple will conflict with itself. That is not always the case: the operator might be , in which case it's perfectly valid for the search for conflicts to not find itself. This patch simply removes that sanity check, and leaves a comment in place. I'm a bit uncomfortable with removing the sanity check; it seems like a good thing to have, especially since this code hasn't even made it out of beta yet. AFAIK the case is purely hypothetical, because we have no index opclasses supporting such an operator, no? How about just documenting that we'd need to remove the sanity check if we ever did add support for such a case? Done, with attached, applied patch. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com Index: src/backend/executor/execUtils.c === RCS file: /cvsroot/pgsql/src/backend/executor/execUtils.c,v retrieving revision 1.171 diff -c -c -r1.171 execUtils.c *** src/backend/executor/execUtils.c 26 Feb 2010 02:00:41 - 1.171 --- src/backend/executor/execUtils.c 29 May 2010 02:30:23 - *** *** 1310,1316 /* * We should have found our tuple in the index, unless we exited the loop ! * early because of conflict. Complain if not. */ if (!found_self !conflict) ereport(ERROR, --- 1310,1317 /* * We should have found our tuple in the index, unless we exited the loop ! * early because of conflict. Complain if not. If we ever implement ! * '' index opclasses, this check will fail and will have to be removed. */ if (!found_self !conflict) ereport(ERROR, -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failback with log shipping
On Fri, May 28, 2010 at 7:58 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: At PGCon, several people asked me about restarting an old master as a standby after failover has happened. And it wasn't the first time people ask me about it, even before 9.0. We have no mention of that in the docs, which is a pretty serious oversight. What can we say about it? I believe the current official policy is that you have to take a new base backup and restore from that. Rsync can be used to speed that up. However, someone once asked me for a comment on a script he wrote to do that in a smarter way. I forget who and when and how exactly it worked, but it seems possible to do safely. First of all, you have to shut down the master cleanly for this to work, otherwise there can be changes in the old master that never made it to the standby. Assuming controlled shutdown and that the standby received all WAL from the old master before it was promoted, I think you can simply create a recovery.conf in the old master's data directory to turn it into a standby server, and restart. Am I missing something? Failover always increments the timeline ID of the old standby (i.e., new master). Can that procedure work around the gap of the timeline ID between servers? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers