Re: [HACKERS] SR slaves and .pgpass
On Mon, Jun 7, 2010 at 5:42 AM, Andrew Dunstan and...@dunslane.net wrote: The docs state, w.r.t. $subject: The password can be provided either in the primary_conninfo string or in a separate ~/.pgpass file on the standby server. I tried this with a database name of replication in the .pgpass file, which matches what we need to use in pg_hba.conf, but it failed miserably, and only worked when I used a wildcard for the database name in the .pgpass file. If this is expected it needs to be documented more clearly; if not, it's a bug. Yep, this is expected, so we need to improve the doc. What about: diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml index 5c0d9ab..458a4e2 100644 --- a/doc/src/sgml/high-availability.sgml +++ b/doc/src/sgml/high-availability.sgml @@ -819,7 +819,9 @@ hostreplication foo 192.168.1.100/32md5 para The host name and port number of the primary, connection user name, and password are specified in the filenamerecovery.conf/ file or - the corresponding environment variable on the standby. + in a separate filename~/.pgpass/ on the standby (In the latter case, + literaldatabase/ field in a filename~/.pgpass/ file must be + literal*/). For example, if the primary is running on host IP literal192.168.1.50/, port literal5432/literal, the superuser's name for replication is literalfoo/, and the password is literalfoopass/, the administrator diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml index 439db3f..cc351f8 100644 --- a/doc/src/sgml/recovery-config.sgml +++ b/doc/src/sgml/recovery-config.sgml @@ -268,9 +268,11 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p' # Windows primary (see xref linkend=streaming-replication-authentication). A password needs to be provided too, if the primary demands password - authentication. (The password can be provided either in + authentication. The password can be provided either in the varnameprimary_conninfo/varname string or in a separate - filename~/.pgpass/ file on the standby server.) + filename~/.pgpass/ file on the standby server (in the latter case, + literaldatabase/ field in a filename~/.pgpass/ file must be + literal*/). Do not specify a database name in the varnameprimary_conninfo/varname string. /para Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
On 07/06/10 06:06, Stephen Frost wrote: Also, perhaps I'm not being paranoid enough, but all this concern over error cases really doesn't really worry me that much. The amount of data one could acquire that way is pretty limited. It's not limited. It allows you to read all contents of the underlying table or tables. I don't see much point doing anything at all if we don't plug that. There's many side channels like exposing row counts in EXPLAIN and statistics and timing attacks, that are not as critical, because they don't let expose all data, and the attacker can't accurately choose what data is exposed. Those are not as important. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] exporting raw parser
On Wed, May 26, 2010 at 6:02 PM, Tatsuo Ishii is...@postgresql.org wrote: I'm thinking about exporting the raw parser and related modules as a C library. Though this will not be an immediate benefit of PostgreSQL itself, it will be a huge benefit for any PostgreSQL applications/middle ware those need to parse SQL statements. In the past I and people I have known/worked with have made strategic use of UDFs running on a live server that return the parse tree, semantically analyzed tree, and planned tree (I think) outNode textual representation for various projects, and found them highly useful. Syntactic, semantic, and operational meaning of a query was useful for our projects. Some of this code was linked with the server, and so reading the node using Postgres' parser was easy. Otherwise, a small parser needed be written for external projects. Perhaps a slightly more ideal state of affairs would be: * These hooks to acquire the syntactic/semantic/planned trees would be bundled for free * When writing code not linked against the server, a more common serialization format, ala JSON or whatnot A more ambitious project that I don't think is in the scope of any initial implementation would be to allow for cross referencing of these compilation passes, similar to how GNU Bison allows you to interrogate for the position of a lexeme when reporting errors. In my experience, code written that mangles one layer (say, semantic, or harder yet, plan) has a hard time doing the best error because getting from a node at the bottom to the right lexeme(s) at the top is very cumbersome. One could imagine this being useful for other purposes too, but that is how I felt it firsthand. Feels a lot harder, though. fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] nvarchar notation accepted?
On sön, 2010-06-06 at 21:13 -0500, Jaime Casanova wrote: On Thu, May 13, 2010 at 11:00 PM, Tom Lane t...@sss.pgh.pa.us wrote: Jaime Casanova ja...@2ndquadrant.com writes: On Thu, May 13, 2010 at 10:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: Actually, the lexer translates N'foo' to NCHAR 'foo' and then the grammar treats that just like CHAR 'foo'. In short, the N doesn't do anything very useful, and it certainly doesn't have any effect on encoding behavior. I think this is something Tom Lockhart put in ten or so years back, and never got as far as making it actually do anything helpful. so, the N'' syntax is fine and i don't need to hunt them as a migration step? As long as the implied cast to char(n) doesn't cause you problems, it's fine. Is this something we want to document? Maybe something like: For historical reasons N'' syntax is also accepted as a string literal. or we can even mention the fact that that is useful for sql server migrations? I don't think it's a historical reason, at least not unless all reasons are to some degree historical. The N'' syntax is in the SQL standard, and so if our implementation matches that, it should be documented as a supported feature, and if it doesn't match it, we should fix it, and perhaps leave it undocumented until we have figured out what we want it to do. (I have not done that analysis.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
(2010/06/07 15:48), Heikki Linnakangas wrote: On 07/06/10 06:06, Stephen Frost wrote: Also, perhaps I'm not being paranoid enough, but all this concern over error cases really doesn't really worry me that much. The amount of data one could acquire that way is pretty limited. It's not limited. It allows you to read all contents of the underlying table or tables. I don't see much point doing anything at all if we don't plug that. IIRC, Oracle pays attention not to expose function's arguments via error messages due to the security reasons, although it focuses on that does not provide a hint to attacker who tries SQL injection. It seems to me it is a matter of degree. If we allows to execute lowrite() inside of the join loop, it can be abused to move massive invisible information into visible large objects within a little time. So, its priority is relatively higher than error messages. However, we cannot move massive information via error messages in a little time, although it may expose whole of the arguments. So, its threat is relatively smaller. There's many side channels like exposing row counts in EXPLAIN and statistics and timing attacks, that are not as critical, because they don't let expose all data, and the attacker can't accurately choose what data is exposed. Those are not as important. It also means; because they can provide much smaller bandwidth to leak invisible information than error messages, these are not as important. Is it right? Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] exporting raw parser
Daniel Farina drfar...@acm.org writes: Some of this code was linked with the server, and so reading the node using Postgres' parser was easy. Otherwise, a small parser needed be written for external projects. Perhaps a slightly more ideal state of affairs would be: * These hooks to acquire the syntactic/semantic/planned trees would be bundled for free * When writing code not linked against the server, a more common serialization format, ala JSON or whatnot Accessing to those data have been talked about with respect to DDL triggers too. You want to be able to know what exactly is being executed, and against what objects. And you want to be able to abuse this information from either a C-coded server function or a PLpgSQL trigger. I guess the WIP JSON datatype would help a lot even when working from within the server, as that does not mean working in C. Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
On 07/06/10 10:30, KaiGai Kohei wrote: (2010/06/07 15:48), Heikki Linnakangas wrote: There's many side channels like exposing row counts in EXPLAIN and statistics and timing attacks, that are not as critical, because they don't let expose all data, and the attacker can't accurately choose what data is exposed. Those are not as important. It also means; because they can provide much smaller bandwidth to leak invisible information than error messages, these are not as important. Is it right? The big difference is what information can be obtained, not how fast it can be obtained. Imagine a table that holds username/passwords for users. Each user is allowed to see his own row, including password, but not anyone else's. EXPLAIN side-channel might give pretty accurate information of how many rows there is in the table, and via clever EXPLAIN+statistics probing you might be able to find out what the top-10 passwords are, for example. But if you wanted to know what your neighbor's password is, the side-channels would not help you much, but an error message would reveal it easily. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up
On Sunday 30 May 2010 18:29:31 Greg Stark wrote: On Sun, May 30, 2010 at 4:54 AM, Tom Lane t...@sss.pgh.pa.us wrote: I read through that thread and couldn't find much discussion of alternative CRC implementations --- we spent all our time on arguing about whether we needed 64-bit CRC or not. SSE4.2 has a hardware CRC32 instruction, this might be interesting to use... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up
On Monday 07 June 2010 12:37:13 Pierre C wrote: On Sunday 30 May 2010 18:29:31 Greg Stark wrote: On Sun, May 30, 2010 at 4:54 AM, Tom Lane t...@sss.pgh.pa.us wrote: I read through that thread and couldn't find much discussion of alternative CRC implementations --- we spent all our time on arguing about whether we needed 64-bit CRC or not. SSE4.2 has a hardware CRC32 instruction, this might be interesting to use... Different polynom unfortunately... Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
Heikki, * Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote: The big difference is what information can be obtained, not how fast it can be obtained. Actually, I disagree. Time required to acquire the data does matter. Imagine a table that holds username/passwords for users. Each user is allowed to see his own row, including password, but not anyone else's. EXPLAIN side-channel might give pretty accurate information of how many rows there is in the table, and via clever EXPLAIN+statistics probing you might be able to find out what the top-10 passwords are, for example. But if you wanted to know what your neighbor's password is, the side-channels would not help you much, but an error message would reveal it easily. Using only built-ins, could you elaborate on how one could pick exactly what row was revealed using an error case? That strikes me as difficult, but perhaps I'm not thinking creatively enough. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
On 07/06/10 14:06, Stephen Frost wrote: * Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote: The big difference is what information can be obtained, not how fast it can be obtained. Actually, I disagree. Time required to acquire the data does matter. Depends on the magnitude, of course. If it takes 1 year per row, that's probably acceptable. If it takes 1 second, that's extremely slow compared to normal queries, but most likely still disastreous from a security point of view. Imagine a table that holds username/passwords for users. Each user is allowed to see his own row, including password, but not anyone else's. EXPLAIN side-channel might give pretty accurate information of how many rows there is in the table, and via clever EXPLAIN+statistics probing you might be able to find out what the top-10 passwords are, for example. But if you wanted to know what your neighbor's password is, the side-channels would not help you much, but an error message would reveal it easily. Using only built-ins, could you elaborate on how one could pick exactly what row was revealed using an error case? That strikes me as difficult, but perhaps I'm not thinking creatively enough. WHERE should do it: SELECT * FROM secrets_view WHERE username = 'neighbor' AND password::integer = 1234; ERROR: invalid input syntax for integer: neighborssecretpassword Assuming that username = 'neighbor' is evaluated before the cast. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up
On Jun 7, 2010, at 12:45 , Andres Freund wrote: On Monday 07 June 2010 12:37:13 Pierre C wrote: On Sunday 30 May 2010 18:29:31 Greg Stark wrote: On Sun, May 30, 2010 at 4:54 AM, Tom Lane t...@sss.pgh.pa.us wrote: I read through that thread and couldn't find much discussion of alternative CRC implementations --- we spent all our time on arguing about whether we needed 64-bit CRC or not. SSE4.2 has a hardware CRC32 instruction, this might be interesting to use... Different polynom unfortunately... Since only the WAL uses CRC, I guess the polynomial could be changed though. pg_upgrade for example shouldn't care. RFC3385 compares different checksumming methods for use in iSCSI, and CRC32c (which uses the same polynomial as the SSE4.2 instruction) wins. Here's a link: http://www.faqs.org/rfcs/rfc3385.html best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
* Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote: WHERE should do it: SELECT * FROM secrets_view WHERE username = 'neighbor' AND password::integer = 1234; ERROR: invalid input syntax for integer: neighborssecretpassword Assuming that username = 'neighbor' is evaluated before the cast. Fair enough, so we can't allow built-ins either, except perhaps in very specific/limited situations. Still, if we track that the above WHERE and password::integer calls *should* be run as role X, while the view should run as role Y, maybe we can at least identify the case where we've ended up in a situation where we are going to expose unintended data. I don't know enough about the optimizer or the planner to have any clue how we might teach them to actually avoid doing such, though I certainly believe it could end up being a disaster on performance based on comments from others who know better. :) Thanks, Stephen signature.asc Description: Digital signature
[HACKERS] warning message in standby
Hi, When an error is found in the WAL streamed from the master, a warning message is repeated without interval forever in the standby. This consumes CPU load very much, and would interfere with read-only queries. To fix this problem, we should add a sleep into emode_for_corrupt_record() or somewhere? Or we should stop walreceiver and retry to read WAL from pg_xlog or the archive? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Add current WAL end (as seen by walsender, ie, GetWriteRecPtr()
On Fri, Jun 4, 2010 at 7:17 AM, Tom Lane t...@postgresql.org wrote: Log Message: --- Add current WAL end (as seen by walsender, ie, GetWriteRecPtr() result) and current server clock time to SR data messages. These are not currently used on the slave side but seem likely to be useful in future, and it'd be better not to change the SR protocol after release. Per discussion. Also do some minor code review and cleanup on walsender.c, and improve the protocol documentation. This commit changed walsender so that it doesn't call set_ps_display() if update_process_title = off. On the other hand, walreceiver doesn't check update_process_title. Though this check might not be required since it's within set_ps_display(), we should do that for the sake of consistency? I attached the patch. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center walrcv_update_process_title_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Invalid YAML output from EXPLAIN
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 Dean Rasheed wrote: ... So the current code in escape_yaml() is inadequate for producing valid YAML. I think it would have to also consider at least the following characters as special - : [ ] { } , \ ' | * . Technically, it would also need to trap empty strings, and strings with leading or trailing whitespace. Making escape_yaml() completely bulletproof with this approach would be quite difficult, and (IMO) not worth the effort ... Doesn't seem like a lot of effort to me. You've already laid out most of the exceptions above, although they require a few tweaks. The rules should be: Requires quoting only if the first character: * ! |'% @ ` # Same as above, but no quoting if the second character is safe: - ? : Always requires quoting: :space space# aka ': ' ' #' Always requires quoting: , [ ] { } Always require quoting: (leading space) (trailing space) (empty string) See: http://yaml.org/spec/1.2/spec.html section 5.3 and 7.3.3 - -- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201006070943 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iEYEAREDAAYFAkwM+wAACgkQvJuQZxSWSsgWZACcCgb0rDvA6ZVhHId/q568gBGo sjgAoLY7HbkI7sRpO45vi0jSRJ2Fiytk =v7T/ -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Invalid YAML output from EXPLAIN
Greg Sabino Mullane g...@turnstep.com writes: The rules should be: Requires quoting only if the first character: * ! |'% @ ` # Same as above, but no quoting if the second character is safe: - ? : Always requires quoting: :space space# aka ': ' ' #' Always requires quoting: , [ ] { } Always require quoting: (leading space) (trailing space) (empty string) Egad ... this is supposed to be an easily machine-generatable format? If it's really as broken as the above suggests, I think we should rip it out while we still can. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Invalid YAML output from EXPLAIN
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 Tom Lane wrote: ... Egad ... this is supposed to be an easily machine-generatable format? If it's really as broken as the above suggests, I think we should rip it out while we still can. Heh ... not like you to shrink from a challenge. ;) I don't think the above would be particularly hard to implement myself, but if it becomes a really big deal, we can certainly punt by simply quoting anything containing an indicator (the special characters above). It will still be 100% valid YAML, just with some excess quoting for the very rare case when a value contains one of the special characters. - -- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201006071035 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iEYEAREDAAYFAkwNA+4ACgkQvJuQZxSWSshSswCg81kd3FdYnQup1eLWGesm+vm+ VO8AoL1Fwil/vXfRdRHx4A4zZUTDbZuT =oPDv -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Invalid YAML output from EXPLAIN
On Mon, Jun 7, 2010 at 10:37 AM, Greg Sabino Mullane g...@turnstep.com wrote: Tom Lane wrote: I don't think the above would be particularly hard to implement myself, but if it becomes a really big deal, we can certainly punt by simply quoting anything containing an indicator (the special characters above). It will still be 100% valid YAML, just with some excess quoting for the very rare case when a value contains one of the special characters. Since you're the main advocate of this feature, I think you should implement it rather than leaving it to Tom or I. The reason why I was initially skeptical of adding a YAML output format is that JSON is a subset of YAML. Therefore, the JSON output format ought to be perfectly sufficient for anyone using a YAML parser. If it's not, that's because their YAML processor is broken, and they should get a new one, or because the YAML spec is defective. The YAML format got voted in by consensus because people thought that it would also make a nice alternative to the text format for human readable output. I don't believe that (it uses way too much vertical space) but even if you accept the argument, the more we make the YAML format look like the JSON format, the less water that argument holds. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Invalid YAML output from EXPLAIN
Robert Haas wrote: On Mon, Jun 7, 2010 at 10:37 AM, Greg Sabino Mullane g...@turnstep.com wrote: Tom Lane wrote: I don't think the above would be particularly hard to implement myself, but if it becomes a really big deal, we can certainly punt by simply quoting anything containing an indicator (the special characters above). It will still be 100% valid YAML, just with some excess quoting for the very rare case when a value contains one of the special characters. Since you're the main advocate of this feature, I think you should implement it rather than leaving it to Tom or I. Or anyone else :-) The reason why I was initially skeptical of adding a YAML output format is that JSON is a subset of YAML. Therefore, the JSON output format ought to be perfectly sufficient for anyone using a YAML parser. There is some debate on this point, IIRC. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Invalid YAML output from EXPLAIN
Greg Sabino Mullane g...@turnstep.com writes: I don't think the above would be particularly hard to implement myself, but if it becomes a really big deal, we can certainly punt by simply quoting anything containing an indicator (the special characters above). I would go with that. The quoting rules you proposed previously seem way too complicated --- meaning potentially buggy, and even if they're not buggy, the behavior would seem unpredictable to most users. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Invalid YAML output from EXPLAIN
On 7 June 2010 15:56, Tom Lane t...@sss.pgh.pa.us wrote: Greg Sabino Mullane g...@turnstep.com writes: I don't think the above would be particularly hard to implement myself, but if it becomes a really big deal, we can certainly punt by simply quoting anything containing an indicator (the special characters above). I would go with that. The quoting rules you proposed previously seem way too complicated --- meaning potentially buggy, and even if they're not buggy, the behavior would seem unpredictable to most users. Well actually it's not just everything containing a special character, it's also anything with leading or trailing whitespace, and empty strings (not sure that can ever happen in practice). It's because of the potential for bugs in this area, that I'd propose just quoting everything (except numeric values) as in my original patch. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SR slaves and .pgpass
Fujii Masao masao.fu...@gmail.com writes: On Mon, Jun 7, 2010 at 5:42 AM, Andrew Dunstan and...@dunslane.net wrote: I tried this with a database name of replication in the .pgpass file, which matches what we need to use in pg_hba.conf, but it failed miserably, and only worked when I used a wildcard for the database name in the .pgpass file. If this is expected it needs to be documented more clearly; if not, it's a bug. Yep, this is expected, so we need to improve the doc. Why don't we improve the code, instead? In particular make libpqrcv_connect() do - snprintf(conninfo_repl, sizeof(conninfo_repl), %s replication=true, conninfo); + snprintf(conninfo_repl, sizeof(conninfo_repl), %s database=replication replication=true, conninfo); I don't think it's unlikely that someone would try to enter a replication-specific password into ~/.pgpass. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pgstatindex still throws ERROR: value 3220078592 is out of range for type integer
I noted on line 169 that max_avail is still an int ? Where else would it be having problems ? Dave -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SR slaves and .pgpass
Tom Lane wrote: Why don't we improve the code, instead? In particular make libpqrcv_connect() do - snprintf(conninfo_repl, sizeof(conninfo_repl), %s replication=true, conninfo); + snprintf(conninfo_repl, sizeof(conninfo_repl), %s database=replication replication=true, conninfo); I don't think it's unlikely that someone would try to enter a replication-specific password into ~/.pgpass. +1. It's highly likely - that's how we got here in the first place. It seems to me like a perfectly reasonable thing to do. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] psql: ON_ERROR_STOP command-line flag for scripts
Hey hackers, I routinely run SQL scripts using pgsql -f myscript.sql. If there is an error mid-way in a large script file, due to the default ON_ERROR_STOP=0, I see repeated ERROR: current transaction is aborted, commands ignored until end of transaction block until the end of the script. Often the error is buried at the top of the output and the following commands are meaningless or misleading messages with a last ROLLBACK at the bottom. This is particular relevant to new PostGIS users running postgis.sql that missed a step somewhere. I know I can either embed a non-SQL command in the SQL file or add --set ON_ERROR_STOP =1 to the command arguments, however I'd like a simpler command interface (just like there is -q / --quiet for QUIET). Could I suggest a new command-line option -r / --on-error-stop for psql? I have limited programming with C, should I bother submitting a patch, or is someone else eager? Thanks for your consideration. -Mike -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Invalid YAML output from EXPLAIN
* Tom Lane: Egad ... this is supposed to be an easily machine-generatable format? Perhaps you could surround all strings with in the generator, and escape all potentially special characters (which seems to include some whitespace even in quoted strings, unfortunately)? It has been claimed before that YAML is a superset of JSON, so why can't the YAML folks use the existing JSON output instead? -- Florian Weimerfwei...@bfk.de BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql: ON_ERROR_STOP command-line flag for scripts
* Mike Toews (mwto...@gmail.com) wrote: I know I can either embed a non-SQL command in the SQL file or add --set ON_ERROR_STOP =1 to the command arguments, however I'd like a simpler command interface (just like there is -q / --quiet for QUIET). Could I suggest a new command-line option -r / --on-error-stop for psql? I have limited programming with C, should I bother submitting a patch, or is someone else eager? Honestly, I'd love to see something like --script that does a number of things: Forcibly set (regardless of .psqlrc, in particular): --no-psqlrc --echo-all --single-transaction ON_ERROR_ROLLBACK=off ON_ERROR_STOP=on AUTOCOMMIT=off Assume -f - if no -f passed in (provided this actually works as expected.. I'm not sure it does, I remember there being some discussion about this recently). Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] [BUGS] Invalid YAML output from EXPLAIN
Florian Weimer wrote: It has been claimed before that YAML is a superset of JSON, so why can't the YAML folks use the existing JSON output instead? Because JSON just crosses the line where it feels like there's so much markup that people expect a tool is necessary to read it, which has always been the issue with XML too--bad human readability. I was on the fence about YAML until I used it for a client issue over the weekend. I was able to hack together a quick tool to work on the issue that parsed enough YAML *without using an external library* well enough for my purposes in an hour, one that was still far more robust than a similar hack trying to read plain old text format for EXPLAIN. And the client was able to follow what was going on as I passed YAML output back and forth with them. Just having every field labeled clearly cut off all the usual which of these is the startup cost again? questions I'm used to getting. The complaints about YAML taking up too much vertical space are understandable, but completely opposite of what I care about. I can e-mail a customer a YAML plan and it will survive to the other side and even in a reply back to me. Whereas any non-trivial text format one is guaranteed to utterly destroyed by line wrapping along the way. I think this thread could use a fresh example to remind anyone who hasn't played with the curent YAML format what it looks like. Here's one from a query against the Dell Store 2 database: EXPLAIN SELECT * FROM customers WHERE customerid1000 ORDER BY zip; QUERY PLAN -- Sort (cost=4449.30..4496.80 rows=19000 width=268) Sort Key: zip - Seq Scan on customers (cost=0.00..726.00 rows=19000 width=268) Filter: (customerid 1000) EXPLAIN (FORMAT YAML) SELECT * FROM customers WHERE customerid1000 ORDER BY zip; QUERY PLAN - - Plan:+ Node Type: Sort+ Startup Cost: 4449.30 + Total Cost: 4496.80+ Plan Rows: 19000 + Plan Width: 268+ Sort Key: + - zip+ Plans: + - Node Type: Seq Scan+ Parent Relationship: Outer + Relation Name: customers + Alias: customers + Startup Cost: 0.00 + Total Cost: 726.00 + Plan Rows: 19000 + Plan Width: 268+ Filter: (customerid 1000) -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Invalid YAML output from EXPLAIN
Greg Smith g...@2ndquadrant.com writes: The complaints about YAML taking up too much vertical space are understandable, but completely opposite of what I care about. I can e-mail a customer a YAML plan and it will survive to the other side and even in a reply back to me. Whereas any non-trivial text format one is guaranteed to utterly destroyed by line wrapping along the way. I think this thread could use a fresh example to remind anyone who hasn't played with the curent YAML format what it looks like. So? This doesn't look amazingly unlike the current JSON output, and to the extent that we have to add more quoting to it, it's going to look even more like the JSON output. Given the lack of any field separators other than newlines, I'm also finding myself extremely doubtful about the claim that it survives line-wrapping mutilations well. For instance this bit: - Node Type: Seq Scan Parent Relationship: Outer doesn't appear to have anything but whitespace to distinguish it from - Node Type: Seq Scan Parent Relationship: Outer regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] nvarchar notation accepted?
On Mon, Jun 7, 2010 at 2:23 AM, Peter Eisentraut pete...@gmx.net wrote: The N'' syntax is in the SQL standard, I didn't know that, do you know what paragraph is it? i can't find it -- Jaime Casanova www.2ndQuadrant.com Soporte y capacitación de PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] nvarchar notation accepted?
On mån, 2010-06-07 at 12:56 -0500, Jaime Casanova wrote: On Mon, Jun 7, 2010 at 2:23 AM, Peter Eisentraut pete...@gmx.net wrote: The N'' syntax is in the SQL standard, I didn't know that, do you know what paragraph is it? i can't find it Look for national character string literal. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Invalid YAML output from EXPLAIN
It's because of the potential for bugs in this area, that I'd propose just quoting everything (except numeric values) as in my original patch. I don't see a problem with this. I supported YAML output because I find it easier to read and copypaste than the other outputs. This is still the case even with quoting. And it's not exactly a hugely intrusive patch. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Functional dependencies and GROUP BY
I have developed a patch that partially implements the functional dependency feature that allows some columns to be omitted from the GROUP BY clause if it can be shown that the columns are functionally dependent on the columns in the group by clause and therefore guaranteed to be unique per group. The full functional dependency deduction rules are pretty big and arcane, so I concentrated on getting a useful subset working. In particular: When grouping by primary key, the other columns can be omitted, e.g., CREATE TABLE tab1 (a int PRIMARY KEY, b int); SELECT a, b FROM tab1 GROUP BY a; This is frequently requested by MySQL converts (and possibly others). Also, when a column is compared with a constant, it can appear ungrouped: SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x; For lack of a better idea, I have made it so that merge-joinable operators qualify as equality operators. Better ideas welcome. Other rules could be added over time (but I'm current not planning to work on that myself). At this point, this patch could use some review and testing with unusual queries that break my implementation. ;-) diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml index d0c41ce..e40cc4c 100644 --- a/doc/src/sgml/queries.sgml +++ b/doc/src/sgml/queries.sgml @@ -886,10 +886,7 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales In this example, the columns literalproduct_id/literal, literalp.name/literal, and literalp.price/literal must be in the literalGROUP BY/ clause since they are referenced in -the query select list. (Depending on how the products -table is set up, name and price might be fully dependent on the -product ID, so the additional groupings could theoretically be -unnecessary, though this is not implemented.) The column +the query select list (but see below). The column literals.units/ does not have to be in the literalGROUP BY/ list since it is only used in an aggregate expression (literalsum(...)/literal), which represents the sales @@ -898,6 +895,18 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales /para para +If the products table is set up so that, +say, literalproduct_id/literal is the primary key or a +not-null unique constraint, then it would be enough to group +by literalproduct_id/literal in the above example, since name +and price would be firsttermfunctionally +dependent/firsttermindextermprimaryfunctional +dependency/primary/indexterm on the product ID, and so there +would be no ambiguity about which name and price value to return +for each product ID group. + /para + + para In strict SQL, literalGROUP BY/ can only group by columns of the source table but productnamePostgreSQL/productname extends this to also allow literalGROUP BY/ to group by columns in the diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index a4d017f..d901390 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -520,9 +520,17 @@ GROUP BY replaceable class=parameterexpression/replaceable [, ...] produces a single value computed across all the selected rows). When literalGROUP BY/literal is present, it is not valid for the commandSELECT/command list expressions to refer to -ungrouped columns except within aggregate functions, since there -would be more than one possible value to return for an ungrouped -column. +ungrouped columns except within aggregate functions or if the +ungrouped column is functionally dependent on the grouped columns, +since there would otherwise be more than one possible value to +return for an ungrouped column. A functional dependency exists if +the grouped columns (or a subset thereof) are the primary key or a +not-null unique constraint of the table containing the ungrouped +column. A functional dependency also exists if the ungrouped +column is constrained by the literalWHERE/literal clause to a +constant value (for example, by equality comparison with a +constant). Further rules for determining functional dependencies +might be added in the future. /para /refsect2 diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c index 0a69bde..bfdd7ef 100644 --- a/src/backend/parser/parse_agg.c +++ b/src/backend/parser/parse_agg.c @@ -14,6 +14,8 @@ */ #include postgres.h +#include access/heapam.h +#include catalog/pg_index.h #include nodes/makefuncs.h #include nodes/nodeFuncs.h #include optimizer/tlist.h @@ -24,20 +26,23 @@ #include rewrite/rewriteManip.h #include utils/builtins.h #include utils/lsyscache.h +#include utils/syscache.h typedef struct { ParseState *pstate; + Query *qry; List *groupClauses; bool have_non_var_grouping; int sublevels_up; } check_ungrouped_columns_context; -static void check_ungrouped_columns(Node *node, ParseState *pstate,
Re: [HACKERS] [BUGS] Invalid YAML output from EXPLAIN
Tom Lane wrote: This doesn't look amazingly unlike the current JSON output, and to the extent that we have to add more quoting to it, it's going to look even more like the JSON output. I don't know about that; here's the JSON one: EXPLAIN (FORMAT JSON) SELECT * FROM customers WHERE customerid1000 ORDER BY zip; QUERY PLAN --- [+ { + Plan: {+ Node Type: Sort, + Startup Cost: 4449.30, + Total Cost: 4496.80, + Plan Rows: 19000,+ Plan Width: 268, + Sort Key: [zip], + Plans: [ + {+ Node Type: Seq Scan, + Parent Relationship: Outer,+ Relation Name: customers, + Alias: customers, + Startup Cost: 0.00, + Total Cost: 726.00, + Plan Rows: 19000,+ Plan Width: 268, + Filter: (customerid 1000)+ }+ ] + }+ } + ] From the perspective of how that's less useful as a human form of output, it's longer, wider, and has redundant punctuation that gets in the way. I think that YAML quoting will need to respect one of the special cases to keep from ruining its readability: Requires quoting only if the first character for will make its current format look terrible if that rule is applied to the whole line instead. That sounds like a necessary special case to include: don't quote any quote characters that appear unless they're the first character on the line. Everything else could switch back to really aggressive quoting in every spot and that wouldn't hurt the readability of the format very much IMHO. Given the lack of any field separators other than newlines, I'm also finding myself extremely doubtful about the claim that it survives line-wrapping mutilations well. All I was claiming there is that the output is dramatically less wide than the standard text format of the same plan, and therefore far less likely to get nailed by a mail client that wraps at normal line widths. Agreed that once wrapping does occur, it has serious problems too. Here are the stats for this plan, leaving off the QUERY PLAN header from each: TEXT: 4 vertical, 69 horizontal YAML: 18 vertical, 36 horizontal JSON: 25 vertical, 43 horizontal XML[1]: 27 vertical, 60 horizontal Quote the TEXT line with or get a plan with one more line of intendation, and you're likely to get wrapped badly at the 72 character line limit some clients use. Quite a bit more headroom before the YAML format will wrap like that; JSON is in the middle. I now see plenty of use for YAML when exchanging plans over e-mail, and it's a bonus that should survive that format to be parseable on the other side. JSON and XML are certainly the preferred way to feed plans into analysis tools. unambiguously. [1] Might as well make this a complete example: explain xmlns=http://www.postgresql.org/2009/explain; + Query + Plan+ Node-TypeSort/Node-Type + Startup-Cost4449.30/Startup-Cost+ Total-Cost4496.80/Total-Cost+ Plan-Rows19000/Plan-Rows+ Plan-Width268/Plan-Width+ Sort-Key + Itemzip/Item + /Sort-Key + Plans + Plan+ Node-TypeSeq Scan/Node-Type + Parent-RelationshipOuter/Parent-Relationship+ Relation-Namecustomers/Relation-Name+ Aliascustomers/Alias+ Startup-Cost0.00/Startup-Cost + Total-Cost726.00/Total-Cost + Plan-Rows19000/Plan-Rows+ Plan-Width268/Plan-Width+ Filter(customerid gt; 1000)/Filter + /Plan + /Plans+ /Plan + /Query+ /explain -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing
Re: [HACKERS] Re: [COMMITTERS] pgsql: Add current WAL end (as seen by walsender, ie, GetWriteRecPtr()
Fujii Masao masao.fu...@gmail.com writes: This commit changed walsender so that it doesn't call set_ps_display() if update_process_title = off. Yeah. The idea is to not waste time on a useless snprintf() call when the GUC is turned off. On the other hand, walreceiver doesn't check update_process_title. Though this check might not be required since it's within set_ps_display(), we should do that for the sake of consistency? Seems reasonable to make the same minor optimization on that side. Will commit. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up
Florian Pflug wrote: On Jun 7, 2010, at 12:45 , Andres Freund wrote: On Monday 07 June 2010 12:37:13 Pierre C wrote: On Sunday 30 May 2010 18:29:31 Greg Stark wrote: On Sun, May 30, 2010 at 4:54 AM, Tom Lane t...@sss.pgh.pa.us wrote: I read through that thread and couldn't find much discussion of alternative CRC implementations --- we spent all our time on arguing about whether we needed 64-bit CRC or not. SSE4.2 has a hardware CRC32 instruction, this might be interesting to use... Different polynom unfortunately... Since only the WAL uses CRC, I guess the polynomial could be changed though. pg_upgrade for example shouldn't care. RFC3385 compares different checksumming methods for use in iSCSI, and CRC32c (which uses the same polynomial as the SSE4.2 instruction) wins. Here's a link: http://www.faqs.org/rfcs/rfc3385.html The linux kernel also uses it when it's availabe, see e.g. http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/arch/x86/crypto/crc32c-intel.c regards, Yeb Havinga -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Working with PostgreSQL enums in C code
On Fri, May 28, 2010 at 9:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, May 28, 2010 at 12:07 AM, Joseph Adams joeyadams3.14...@gmail.com wrote: I learned that to return an enum value from C, one needs to return the OID of the right row of the pg_enum table. I eventually managed to write the code below, which is mostly based on the enum_in function in src/backend/utils/adt/enum.c . PG_RETURN macros shouldn't do any nontrivial processing (see the existing ones for references). Yeah, that was my first reaction too. If we don't already have one, it would be appropriate to provide a lookup enum value function (functionally about the same as enum_in, but designed to be called conveniently from C). Then, if you needed to work from a textual enum label, you'd call that function and then PG_RETURN_OID. Here is the function I wrote to look up enum values: Oid enum_label_to_oid(const char *typname, const char *label) { Oid enumtypoid; HeapTuple tup; Oid ret; enumtypoid = TypenameGetTypid(typname); Assert(OidIsValid(enumtypoid)); tup = SearchSysCache2(ENUMTYPOIDNAME, ObjectIdGetDatum(enumtypoid), CStringGetDatum(label)); Assert(HeapTupleIsValid(tup)); ret = HeapTupleGetOid(tup); ReleaseSysCache(tup); return ret; } If this were added to PostgreSQL proper, what source/header files would it make sense to put it in? enum.c/builtins.h ? However, for a built-in enum type, I agree with Robert's solution of just #define-ing fixed OIDs for the values of the type. I don't know if the JSON datatype will be a contrib module or built-in yet, but if it were contrib, would it still be better to use fixed OIDs anyway? One issue with setting this precedent is that new contrib modules using enums wouldn't be compatible with older versions of PostgreSQL. One idea might be to insert extend CREATE TYPE name AS ENUM so OIDs can be specified explicitly, but then that could lead to OID clashes. That would be a really messy problem for users. By the way, just curious: why can't the internal representation of an enum just be an INT starting from 0 by default, like in C? That would make a heck of a lot more sense, in my opinion. It might also allow users to do things like this in the future: CREATE TYPE number AS ENUM ('one' = 1, 'two' = 2, 'three' = 3); Joey Adams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
On Mon, Jun 7, 2010 at 7:33 PM, Peter Eisentraut pete...@gmx.net wrote: I have developed a patch that partially implements the functional dependency feature Nice! :) -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Working with PostgreSQL enums in C code
Excerpts from Joseph Adams's message of lun jun 07 17:16:12 -0400 2010: However, for a built-in enum type, I agree with Robert's solution of just #define-ing fixed OIDs for the values of the type. I don't know if the JSON datatype will be a contrib module or built-in yet, but if it were contrib, would it still be better to use fixed OIDs anyway? If it were contrib, fixed OIDs wouldn't cut it precisely for that reason. (A contrib module is not built-in for these purposes.) One issue with setting this precedent is that new contrib modules using enums wouldn't be compatible with older versions of PostgreSQL. One idea might be to insert extend CREATE TYPE name AS ENUM so OIDs can be specified explicitly, but then that could lead to OID clashes. That would be a really messy problem for users. Yeah. We've just defined an interface for pg_migrator-only usage, allowing it to define the OID values of ENUMs; it wasn't considered a good idea to expose the details to the user. By the way, just curious: why can't the internal representation of an enum just be an INT starting from 0 by default, like in C? That would make a heck of a lot more sense, in my opinion. It might also allow users to do things like this in the future: CREATE TYPE number AS ENUM ('one' = 1, 'two' = 2, 'three' = 3); The problem is that the output function needs to be able to figure out the value starting with only the datum value. If it had only the 1 it couldn't know what enum it'd correspond to. The other alternative would have been to make enums 64 bits wide, carrying the enum OID in 32 bits and the value in the other 32. This was dismissed as too wasteful. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Kevin Grittner kevin.gritt...@wicourts.gov writes: ... my perspective is that it would be A Good Thing if it could just be turned on when needed. If you have recurring bug that can be arranged, but in those cases you have other options; so I'm assuming you want this kept because it is primarily of forensic value after a non-repeatable bug has munged something? Yeah, that's exactly the problem. When you realize you need it, it's too late. The best thought I've had so far is that if someone kept WAL files long enough the evidence might be in there somewhere Hm, that is an excellent point. The WAL trace would actually be a lot superior in terms of being able to figure out what went wrong. But I don't quite see how we tell people either keep xmin or keep your old WAL. Also, for production sites the amount of WAL you'd have to hang onto seems a bit daunting. Other problems are the cost of shipping it to a developer, and the impracticality of sanitizing private data in it before you show it to somebody. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Working with PostgreSQL enums in C code
On Mon, Jun 7, 2010 at 5:16 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: I don't know if the JSON datatype will be a contrib module or built-in yet, but if it were contrib, would it still be better to use fixed OIDs anyway? Part of the point is that EXPLAIN (FORMAT JSON) should return json, so this needs to be built-in. Otherwise, that won't work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: The best thought I've had so far is that if someone kept WAL files long enough the evidence might be in there somewhere Hm, that is an excellent point. The WAL trace would actually be a lot superior in terms of being able to figure out what went wrong. But I don't quite see how we tell people either keep xmin or keep your old WAL. Also, for production sites the amount of WAL you'd have to hang onto seems a bit daunting. Any thoughts on how far back the WAL would need to go to deal with the issues where such information has been useful? (For example, we always have at least two weeks worth, but I don't know if that's a useful range or not.) Other problems are the cost of shipping it to a developer, and the impracticality of sanitizing private data in it before you show it to somebody. Yeah, this wouldn't be a practical answer to the need unless PostgreSQL shipped with a tool which could scan WAL and extract the relevant information (probably under direction of someone from the list or a private support organization). Is the required information predictable enough to make developing such a tool a tractable problem? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Tom Lane t...@sss.pgh.pa.us wrote: If we were actually going in this direction we'd want to write a much better WAL-text-dump tool than we have, and then in principle somebody could sanitize the text output before shipping it off. I wouldn't think this would be practical unless there was a way to scan the WAL files and dump only the bits related to the affected pages, and probably with at least an option (possibly default) to just mention the data type and length, rather than showing the actual values -- how often are the values relevant, anyway? (Not rhetorical; I really don't know.) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: Hm, that is an excellent point. The WAL trace would actually be a lot superior in terms of being able to figure out what went wrong. But I don't quite see how we tell people either keep xmin or keep your old WAL. Also, for production sites the amount of WAL you'd have to hang onto seems a bit daunting. Any thoughts on how far back the WAL would need to go to deal with the issues where such information has been useful? (For example, we always have at least two weeks worth, but I don't know if that's a useful range or not.) Well, it's a how long does it take you to notice data corruption kind of issue. The most recent case I can think of where xmin was helpful was in trying to sort out a problem with an index being inconsistent with the heap, which manifested as wrong query answers for the user. I don't know how long it took him to recognize and report the problem. (We never did locate the bug-if-any, IIRC... it would have been much more helpful to have the WAL trace. xmin did let me rule out some theories, though.) Other problems are the cost of shipping it to a developer, and the impracticality of sanitizing private data in it before you show it to somebody. Yeah, this wouldn't be a practical answer to the need unless PostgreSQL shipped with a tool which could scan WAL and extract the relevant information (probably under direction of someone from the list or a private support organization). Is the required information predictable enough to make developing such a tool a tractable problem? Hard to tell. If we were actually going in this direction we'd want to write a much better WAL-text-dump tool than we have, and then in principle somebody could sanitize the text output before shipping it off. But going through a large volume of data that way could be pretty impractical. Also, we (or at least I) have nearly zip experience with trying to debug problems by examining WAL, so it's not real clear to me which details might be important. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] How to get permission to debug postgres?
I have a system that uses C methods as functions in a Postgres database. When I try to run gdb with my user account I get this: ptrace: Operation not permitted It looks like a permission thing. It seemed to work when I ran gdb as the postgres user. I would like to continue running postgres as the postgres user and not have to su postgres when I want to debug. Can I somehow grant my user account ptrace permission for the process in question? Adding myself to the postgres group didn't seem to help. I found this on the ptrace man page: EPERM The specified process cannot be traced. This could be because the parent has insufficient privileges (the required capability is CAP_SYS_PTRACE); non-root processes cannot trace processes that they cannot send signals to or those running set-user- ID/set-group-ID programs, for obvious reasons. Alternatively, the process may already be being traced, or be init(8) (PID 1). So do I just need to give CAP_SYS_TRACE privileges to my user? If so, how? Thank you much. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte charater set in levenshtein function
Hello Hackers! I have extended my patch by introducing levenshtein_less_equal function. This function have additional argument max_d and stops calculating when distance exceeds max_d. With low values of max_d function works much faster than original one. The example of original levenshtein function usage: test=# select word, levenshtein(word, 'consistent') as dist from words where levenshtein(word, 'consistent') = 2 order by dist; word | dist -+-- consistent |0 insistent |2 consistency |2 coexistent |2 consistence |2 (5 rows) test=# explain analyze select word, levenshtein(word, 'consistent') as dist from words where levenshtein(word, 'consistent') = 2 order by dist; QUERY PLAN --- Sort (cost=2779.13..2830.38 rows=20502 width=8) (actual time=203.652..203.658 rows=5 loops=1) Sort Key: (levenshtein(word, 'consistent'::text)) Sort Method: quicksort Memory: 25kB - Seq Scan on words (cost=0.00..1310.83 rows=20502 width=8) (actual time=19.019..203.601 rows=5 loops=1) Filter: (levenshtein(word, 'consistent'::text) = 2) Total runtime: 203.723 ms (6 rows) Example of levenshtein_less_equal usage in this case: test=# select word, levenshtein_less_equal(word, 'consistent', 2) as dist from words where levenshtein_less_equal(word, 'consistent', 2) = 2 order by dist; word | dist -+-- consistent |0 insistent |2 consistency |2 coexistent |2 consistence |2 test=# explain analyze select word, levenshtein_less_equal(word, 'consistent', 2) as dist from words where levenshtein_less_equal(word, 'consistent', 2) = 2 order by dist; QUERY PLAN - Sort (cost=2779.13..2830.38 rows=20502 width=8) (actual time=42.198..42.203 rows=5 loops=1) Sort Key: (levenshtein_less_equal(word, 'consistent'::text, 2)) Sort Method: quicksort Memory: 25kB - Seq Scan on words (cost=0.00..1310.83 rows=20502 width=8) (actual time=5.391..42.143 rows=5 loops=1) Filter: (levenshtein_less_equal(word, 'consistent'::text, 2) = 2) Total runtime: 42.292 ms (6 rows) In the example above levenshtein_less_equal works about 5 times faster. With best regards, Alexander Korotkov. fuzzystrmatch-0.3.diff.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Parameters of GiST indexes
Hi hackers, I found that some parameters of GiST implementation are builin in the code. For example, following can be found in the backend/utils/adt/tsgistidx.c: #define SIGLENINT 31/* 121 = key will toast, so it will not work * !!! */ #define SIGLEN( sizeof(int4) * SIGLENINT ) #define SIGLENBIT (SIGLEN * BITS_PER_BYTE) I think that such parameters don't have optimal value for all the cases; and it would be a great option to let user define such parameters for particular index. For example, following syntax can be used: CREATE INDEX name ON table USING gist(column) WITH (SIGLENINT=63); With best regards, Korotkov Alexander.
[HACKERS] Using multidimensional indexes in ordinal queries
Hello hackers, I would like to share some my thoughts about usage of multidimensional indexes for queries which deal with ordinal unidimensional data types. I think that gist indexes (especially with knngist) can produce great benefit for complex multi-criterion queries. Let's consider come example. I use postgresql-9.0beta1 with knngist patch. Also I have created simple patch that allows to use knngist for ordinal sorting in cube extension (patch is attached). The * operator was introduced in my patch. The first operand is the cube and the second operand is number n. If n = 2*k then the ascending ordering by k-dimension occurs. If n = 2*k + 1 then descending ordering by k-dimension occurs. Now this operator have a limitation and works only with nonnegative coordinate values. Let's create table with 3 float-point columns and fill it with 10M rows; create table test (id serial primary key, v1 double precision, v2 double precision, v3 double precision); insert into test (v1,v2,v3) (select random()*1000, random()*1000, random()*1000 from generate_series(1,1000,1)); Now, let's create 3 separate btree indexes and one gist cube index. create index test_v1_idx on test(v1); create index test_v2_idx on test(v2); create index test_v3_idx on test(v3); create index test_cube_idx on test using gist(cube(ARRAY[v1,v2,v3])); Let's consider some complex query with filtering, ordering and limit. test=# select * from test where v1 between 480 and 500 and v2 between 480 and 500 order by v3 limit 10; id|v1|v2|v3 --+--+--+--- 12283631 | 485.982828773558 | 496.795611456037 | 0.213871244341135 4936086 | 497.239370364696 | 491.878624074161 | 1.26481195911765 8963067 | 484.963194001466 | 497.094289399683 | 1.30057940259576 12435440 | 498.670902103186 | 498.667187988758 | 1.33110675960779 11667415 | 494.398592971265 | 497.440234292299 | 1.44533207640052 8530558 | 482.85893118009 | 496.267838869244 | 1.48530444130301 4004942 | 483.679085504264 | 489.547223784029 | 1.57393841072917 14897796 | 491.37338064611 | 487.47524273 | 1.81775307282805 4105759 | 489.506138022989 | 486.91446846351 | 1.94038823246956 12895656 | 499.508572742343 | 487.065799534321 | 2.34963605180383 (10 rows) test=# explain analyze select * from test where v1 between 480 and 500 and v2 between 480 and 500 order by v3 limit 10; QUERY PLAN --- Limit (cost=22786.73..22786.75 rows=10 width=28) (actual time=3242.135..3242.162 rows=10 loops=1) - Sort (cost=22786.73..22797.59 rows=4345 width=28) (actual time=3242.131..3242.144 rows=10 loops=1) Sort Key: v3 Sort Method: top-N heapsort Memory: 25kB - Bitmap Heap Scan on test (cost=8755.91..22692.83 rows=4345 width=28) (actual time=1281.030..3234.934 rows=4027 loops=1) Recheck Cond: ((v1 = 480::double precision) AND (v1 = 500::double precision) AND (v2 = 480::double precision) AND (v2 = 500::double precision)) - BitmapAnd (cost=8755.91..8755.91 rows=4345 width=0) (actual time=1280.783..1280.783 rows=0 loops=1) - Bitmap Index Scan on test_v1_idx (cost=0.00..4243.12 rows=202177 width=0) (actual time=644.702..644.702 rows=200715 loops=1) Index Cond: ((v1 = 480::double precision) AND (v1 = 500::double precision)) - Bitmap Index Scan on test_v2_idx (cost=0.00..4510.37 rows=214902 width=0) (actual time=630.085..630.085 rows=200200 loops=1) Index Cond: ((v2 = 480::double precision) AND (v2 = 500::double precision)) Total runtime: 3242.253 ms (12 rows) This query can be rewritten in order to let planner use gist cube index. test=# select * from test where cube(array[v1,v2,v3]) @ cube(array[480,480,'-Infinity'::float],array[500,500,'+Infinity'::float]) order by cube(array[v1,v2,v3]) * 4 limit 10; id|v1|v2|v3 --+--+--+--- 12283631 | 485.982828773558 | 496.795611456037 | 0.213871244341135 4936086 | 497.239370364696 | 491.878624074161 | 1.26481195911765 8963067 | 484.963194001466 | 497.094289399683 | 1.30057940259576 12435440 | 498.670902103186 | 498.667187988758 | 1.33110675960779 11667415 | 494.398592971265 | 497.440234292299 | 1.44533207640052 8530558 | 482.85893118009 | 496.267838869244 | 1.48530444130301 4004942 | 483.679085504264 | 489.547223784029 | 1.57393841072917 14897796 | 491.37338064611 | 487.47524273 | 1.81775307282805 4105759 | 489.506138022989 | 486.91446846351 | 1.94038823246956 12895656 | 499.508572742343 | 487.065799534321 | 2.34963605180383 (10 rows) test=# explain analyze select * from test where
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Tom Lane wrote: If we were actually going in this direction we'd want to write a much better WAL-text-dump tool than we have, and then in principle somebody could sanitize the text output before shipping it off. But going through a large volume of data that way could be pretty impractical. Also, we (or at least I) have nearly zip experience with trying to debug problems by examining WAL, so it's not real clear to me which details might be important. There's another interesting thing about moving in this direction too. Systems that have WAL archiving setup that run into a problem could end up being a much richer source for historical analysis of how the system got into the bad state than is available right now. Typically those can have longer histories available than you'll find on a primary that's recycling segments all the time. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
(2010/06/07 20:53), Heikki Linnakangas wrote: On 07/06/10 14:06, Stephen Frost wrote: * Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote: The big difference is what information can be obtained, not how fast it can be obtained. Actually, I disagree. Time required to acquire the data does matter. Depends on the magnitude, of course. If it takes 1 year per row, that's probably acceptable. If it takes 1 second, that's extremely slow compared to normal queries, but most likely still disastreous from a security point of view. FYI, the classic also mentioned about bandwidth of covert channel, although it was already obsoleted. See the page.80 of: http://csrc.nist.gov/publications/history/dod85.pdf It said 1bit/sec are acceptable on DoD in 25 years ago. Imagine a table that holds username/passwords for users. Each user is allowed to see his own row, including password, but not anyone else's. EXPLAIN side-channel might give pretty accurate information of how many rows there is in the table, and via clever EXPLAIN+statistics probing you might be able to find out what the top-10 passwords are, for example. But if you wanted to know what your neighbor's password is, the side-channels would not help you much, but an error message would reveal it easily. Using only built-ins, could you elaborate on how one could pick exactly what row was revealed using an error case? That strikes me as difficult, but perhaps I'm not thinking creatively enough. WHERE should do it: SELECT * FROM secrets_view WHERE username = 'neighbor' AND password::integer = 1234; ERROR: invalid input syntax for integer: neighborssecretpassword Assuming that username = 'neighbor' is evaluated before the cast. In this case, is it unnecessary to expose the given argument in the error message (from security perspective), isn't it? Because it is basically matter of the integer input handler, it seems to me what we should fix up is int4in(), not optimizer. Perhaps, we should categorize the issued functionalities base on the level of its threat when abused. * High-threat Functions have side-effect that allows to move the given arguments into another tables or other high-bandwidth chennel. E.g) lowrite(), pg_write_file() - It should be fixed soon. * Middle-threat Functions have side-effect that allows to move the given arguments using error messages or other low-bandwidth channel. E.g) int4in() - It should be fixed in long term. * Row-threat Functions can imply existence of invisible tuples, but it does not expose the value itself. E.g) EXPLAIN statement, PK/FK constraints - It should not be fixed in PostgreSQL. Now we allow all of them. But isn't it valuable to fix the high-threat first? Then, we can revise error messages in built-in functions, I think. Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
(2010/06/07 21:56), Stephen Frost wrote: * Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote: WHERE should do it: SELECT * FROM secrets_view WHERE username = 'neighbor' AND password::integer = 1234; ERROR: invalid input syntax for integer: neighborssecretpassword Assuming that username = 'neighbor' is evaluated before the cast. Fair enough, so we can't allow built-ins either, except perhaps in very specific/limited situations. Still, if we track that the above WHERE and password::integer calls *should* be run as role X, while the view should run as role Y, maybe we can at least identify the case where we've ended up in a situation where we are going to expose unintended data. I don't know enough about the optimizer or the planner to have any clue how we might teach them to actually avoid doing such, though I certainly believe it could end up being a disaster on performance based on comments from others who know better. :) My opinion is that it is a matter in individual functions, not optimizer. Basically, built-in functions *should* be trusted, because our security mechanism is not designed to prevent anything from malicious internal binary modules. Historically, we have not known the risk to leak invisible information using error messages for a long time, so most of internal functions have not been designed not to return users unnecessary information. If so, it needs to revise error messages, but it will not complete with a single commit. Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgstatindex still throws ERROR: value 3220078592 is out of range for type integer
Dave Cramer p...@fastcrypt.com wrote: I noted on line 169 that max_avail is still an int ? Where else would it be having problems ? It should not a problem because the local variable only stores byte size in a page. It will be at most only BLCKSZ (=8192). I wonder why you had ERROR: value ... is out of range for type integer message because we don't use any integer data types for sizes in pgstatindex. The error should have been thrown by SQL typin functions rather than C routines. CREATE OR REPLACE FUNCTION pgstatindex(IN relname text, OUT version INT, OUT tree_level INT, OUT index_size BIGINT, OUT root_block_no BIGINT, OUT internal_pages BIGINT, OUT leaf_pages BIGINT, OUT empty_pages BIGINT, OUT deleted_pages BIGINT, OUT avg_leaf_density FLOAT8, OUT leaf_fragmentation FLOAT8) AS 'MODULE_PATHNAME', 'pgstatindex' LANGUAGE C STRICT; Regards, --- Takahiro Itagaki NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Working with PostgreSQL enums in C code
Joseph Adams wrote: By the way, just curious: why can't the internal representation of an enum just be an INT starting from 0 by default, like in C? That would make a heck of a lot more sense, in my opinion. It might also allow users to do things like this in the future: Please review the debates over the internal representation from several years ago when enums were implemented. Essentially the difficulty is that the output function needs to get nothing more than the value itself, and that means the representation needs to carry with it some information about *which* enum set it is in. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
KaiGai Kohei kai...@ak.jp.nec.com writes: In this case, is it unnecessary to expose the given argument in the error message (from security perspective), isn't it? Yes, if all you care about is security and not usability, that looks like a great solution. We're *not* doing it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
(2010/06/08 9:23), KaiGai Kohei wrote: (2010/06/07 21:56), Stephen Frost wrote: * Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote: WHERE should do it: SELECT * FROM secrets_view WHERE username = 'neighbor' AND password::integer = 1234; ERROR: invalid input syntax for integer: neighborssecretpassword Assuming that username = 'neighbor' is evaluated before the cast. Fair enough, so we can't allow built-ins either, except perhaps in very specific/limited situations. Still, if we track that the above WHERE and password::integer calls *should* be run as role X, while the view should run as role Y, maybe we can at least identify the case where we've ended up in a situation where we are going to expose unintended data. I don't know enough about the optimizer or the planner to have any clue how we might teach them to actually avoid doing such, though I certainly believe it could end up being a disaster on performance based on comments from others who know better. :) My opinion is that it is a matter in individual functions, not optimizer. Basically, built-in functions *should* be trusted, because our security mechanism is not designed to prevent anything from malicious internal binary modules. Sorry, it does not mean *all* the built-in functions could be trusted. Some of built-in ones cannot be trusted from definitions, such as lowrite(). Perhaps, it eventually needs a flag in the pg_proc to mark a function being either trusted or untrusted. Then, planner may be able to check the flag to decide whether is can be pushed down, or not. If so, we can mark int4in() as trusted, when we revise the issue of error message. I think the idea makes this problem more simple. Thanks, Historically, we have not known the risk to leak invisible information using error messages for a long time, so most of internal functions have not been designed not to return users unnecessary information. If so, it needs to revise error messages, but it will not complete with a single commit. Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
2010/6/8 Peter Eisentraut pete...@gmx.net: I have developed a patch that partially implements the functional dependency feature that allows some columns to be omitted from the GROUP BY clause if it can be shown that the columns are functionally dependent on the columns in the group by clause and therefore guaranteed to be unique per group. The full functional dependency deduction rules are pretty big and arcane, so I concentrated on getting a useful subset working. In particular: Also, when a column is compared with a constant, it can appear ungrouped: SELECT x, y FROM tab2 WHERE y = 5 GROUP BY x; I don't see why it should be allowed. I see the insist that y must be unique value so it is ok to be ungrouped but the point of discussion is far from that; Semantically y is not grouping key. In addition, what if y is implicitly a constant? For example, SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x; or there should be more complicated example including JOIN cases. I don't believe we can detect all of such cases. If the simple case is allowed, users don't understand why the complicated case doesn't allow sometimes. So it'll not be consistent. Finally, it may hide unintended bugs. ORM tools may make WHERE clause in some conditions and don't in other conditions. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to get permission to debug postgres?
On Sat, Jun 5, 2010 at 8:16 AM, Randy Solomonson ra...@solomonson.com wrote: I have a system that uses C methods as functions in a Postgres database. When I try to run gdb with my user account I get this: ptrace: Operation not permitted It looks like a permission thing. It seemed to work when I ran gdb as the postgres user. I would like to continue running postgres as the postgres user and not have to su postgres when I want to debug. Can I somehow grant my user account ptrace permission for the process in question? Adding myself to the postgres group didn't seem to help. I found this on the ptrace man page: EPERM The specified process cannot be traced. This could be because the parent has insufficient privileges (the required capability is CAP_SYS_PTRACE); non-root processes cannot trace processes that they cannot send signals to or those running set-user- ID/set-group-ID programs, for obvious reasons. Alternatively, the process may already be being traced, or be init(8) (PID 1). So do I just need to give CAP_SYS_TRACE privileges to my user? If so, how? This isn't really a PostgreSQL-specific question but more of a general Linux admin question, and I confess I don't really know the answer. I *think* that capabilities are more intended to be set on particular executables, e.g. give ping the ability to open raw sockets without making it setuid root, rather than to particular users, but I'm not really sure. In terms of PG development, I've found that it's much simpler to start up a test instance of PG under my own user account and then do my debugging from there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
On Tue, Jun 8, 2010 at 1:46 AM, Tom Lane t...@sss.pgh.pa.us wrote: KaiGai Kohei kai...@ak.jp.nec.com writes: In this case, is it unnecessary to expose the given argument in the error message (from security perspective), isn't it? Yes, if all you care about is security and not usability, that looks like a great solution. We're *not* doing it. It's possible to take a more nuanced angle on this approach. You could imagine a flag indicating whether the call is within a SECURE VIEW which if enabled caused error messages to elide data taken from arguments. In order for this not to destroy the code legibility I think you would need to design some new %-escapes for error messages to indicate such arguments or some similar trick. You wouldn't want to have to put extra conditionals everywhere. And I'm not sure where to conveniently put such a flag so it would have the right semantics. It would still be a lot of work and a big patch, but mostly mechanical and it wouldn't impact usability for any case where it wasn't necessary. It would still have the problem described earlier that we would keep finding new omissions for years. I can't see us implementing variable taint tracking in C to do it automatically though. Perhaps we could get it from one of the static analysis tools. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
(2010/06/08 9:46), Tom Lane wrote: KaiGai Koheikai...@ak.jp.nec.com writes: In this case, is it unnecessary to expose the given argument in the error message (from security perspective), isn't it? Yes, if all you care about is security and not usability, that looks like a great solution. We're *not* doing it. Sorry, are you saying we should not revise error messages because of usability?? If so, and if we decide the middle-threat also should be fixed, it is necessary to distinguish functions trusted and untrusted, even if a function is built-in. Perhaps, pg_proc takes a new flag to represent it. Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
KaiGai, * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: Perhaps, pg_proc takes a new flag to represent it. Without an actual well-formed approach for dealing with this which requires it, it's far too soon to be asking for changes in the catalog. Please stop suggesting individual maybe-this-would-help changes. We need an actual solution which addresses all, or at least most, of the concerns described, followed by a patch which implements the mechanics of the solution. If the solution calls for an additional pg_proc field, then the patch should include it. Not sure if this would translate well, but asking for new flags to be added to pg_proc right now is putting the cart before the horse. We don't even know which functions we might mark as trusted or not yet, nor is it even clear that adding such a flag would actually help. Adding a flag to pg_proc is certainly nothing like a solution to this problem by itself. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
* Tom Lane (t...@sss.pgh.pa.us) wrote: If we were actually going in this direction we'd want to write a much better WAL-text-dump tool than we have, and then in principle somebody could sanitize the text output before shipping it off. But going through a large volume of data that way could be pretty impractical. Also, we (or at least I) have nearly zip experience with trying to debug problems by examining WAL, so it's not real clear to me which details might be important. Just an off-the-wall thought, but, would it be possible to have a tool which read WAL backwards and compared entries in the WAL against entries on disk? I realize that you'd only see one version of a particular block and then have to skip any updates which are earlier than it, but it seems like you could cover a pretty large chunk of the recent changes to the database using this approach.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Functional dependencies and GROUP BY
* Hitoshi Harada (umi.tan...@gmail.com) wrote: I don't see why it should be allowed. I see the insist that y must be unique value so it is ok to be ungrouped but the point of discussion is far from that; Semantically y is not grouping key. Ignoring the fact that it's terribly useful- isn't it part of the SQL spec? In addition, what if y is implicitly a constant? For example, SELECT x, y FROM tab2 WHERE y = a AND a = 5 GROUP BY x; Not sure I see the issue here? Finally, it may hide unintended bugs. ORM tools may make WHERE clause in some conditions and don't in other conditions. Yeah, this one I really just done buy.. If an ORM tool doesn't write correct SQL, then it's the ORM's fault, not ours. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Functional dependencies and GROUP BY
* Peter Eisentraut (pete...@gmx.net) wrote: This is frequently requested by MySQL converts (and possibly others). I'd certainly love to see it- but let's not confuse people by implying that it would actually act the way MySQL does. It wouldn't, because what MySQL does is alot closer to 'distinct on' and is patently insane to boot. Again, I'd *love* to see this be done in PG, but when we document it and tell people about it, *please* don't say it's similar in any way to MySQL's oh, we'll just pick a random value from the columns that aren't group'd on implementation. At this point, this patch could use some review and testing with unusual queries that break my implementation. ;-) I'll give it a shot... :) Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
(2010/06/08 10:17), Stephen Frost wrote: KaiGai, * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: Perhaps, pg_proc takes a new flag to represent it. Without an actual well-formed approach for dealing with this which requires it, it's far too soon to be asking for changes in the catalog. Please stop suggesting individual maybe-this-would-help changes. We need an actual solution which addresses all, or at least most, of the concerns described, followed by a patch which implements the mechanics of the solution. If the solution calls for an additional pg_proc field, then the patch should include it. OK, it was too implementation-specific. Please return to the categorization with 3-level that I mentioned at the previous message. I guess nobody opposes to disable pushing down on functions categorized to high-threat, such as lowrite() and so on. It actually can move given arguments to other tables, files, ... And, I guess nobody wants to tackle an issue categorized to low-threat, such as EXPLAIN, PK/FK constraints and so on. It can imply existence of invisible objects, but no leaks of actual value. Our headache is on functions categorized to middle-threat. It enables to leak the given arguments using error messages. Here are several ideas, but they have good and bad points. At first, it is necessary whether we should fix up the threat in this level, or not. It seems to me majority of the -hackers want to fix both of high and middle level. If we should fix up the issue, I think we have only two options semantically. (A) It prevents leaky functions to be pushed down, so no invisible information will be provided. But it makes performance regression. (B) It prevents leaky functions to raise an error, although we allow it to be pushed down. But is needs large scale of code changes. Of course, it has trade-off. As TCSEC mentioned, we are facing with the large potential cost of reducing the bandwidths of such covert channel Not sure if this would translate well, but asking for new flags to be added to pg_proc right now is putting the cart before the horse. We don't even know which functions we might mark as trusted or not yet, nor is it even clear that adding such a flag would actually help. Adding a flag to pg_proc is certainly nothing like a solution to this problem by itself. For built-in functions, the code should be reviewed to ensure it does not expose the given argument using error messages. Then, we can mark it as trusted. Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
2010/6/7 KaiGai Kohei kai...@ak.jp.nec.com: Our headache is on functions categorized to middle-threat. It enables to leak the given arguments using error messages. Here are several ideas, but they have good and bad points. I think we are altogether off in the weeds here. We ought to start with an implementation that pushes nothing down, and then try to figure out how much we can relax that without too much compromising security. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
* Robert Haas (robertmh...@gmail.com) wrote: 2010/6/7 KaiGai Kohei kai...@ak.jp.nec.com: Our headache is on functions categorized to middle-threat. It enables to leak the given arguments using error messages. Here are several ideas, but they have good and bad points. I think we are altogether off in the weeds here. We ought to start with an implementation that pushes nothing down, and then try to figure out how much we can relax that without too much compromising security. I agree with this- and it's more-or-less what I was trying to propose in my previous comments. I'm not even sure we need to focus on not pushing anything down at this point- I'd start with trying to get enough information passed around/through the system to even *identify* the case where there's a problem in the first place.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
(2010/06/08 11:15), Robert Haas wrote: 2010/6/7 KaiGai Koheikai...@ak.jp.nec.com: Our headache is on functions categorized to middle-threat. It enables to leak the given arguments using error messages. Here are several ideas, but they have good and bad points. I think we are altogether off in the weeds here. We ought to start with an implementation that pushes nothing down, and then try to figure out how much we can relax that without too much compromising security. It seems to me fair enough. I think we can adjust what functions are harmless, and whats are not later. Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
For the sake of clarity.. * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: OK, it was too implementation-specific. No, that wasn't the problem. There isn't an actual implementation yet for it to be too-specific on. The problem is that proposing a change to the catalog without figuring out what it'd actually be used for in an overall solution is a waste of time. Please return to the categorization with 3-level that I mentioned at the previous message. As Robert said, we're off in the weeds here. I'm not convinced that we've got 3 levels, for starters. It could well be fewer, or more. Let's stop making assumptions about what's OK and what's not OK. For built-in functions, the code should be reviewed to ensure it does not expose the given argument using error messages. Then, we can mark it as trusted. One thing that I think *is* clear- removing useful information from error messages is *not* going to be an acceptable solution. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS
(2010/06/08 11:28), Stephen Frost wrote: For the sake of clarity.. * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: OK, it was too implementation-specific. No, that wasn't the problem. There isn't an actual implementation yet for it to be too-specific on. The problem is that proposing a change to the catalog without figuring out what it'd actually be used for in an overall solution is a waste of time. Indeed, Please return to the categorization with 3-level that I mentioned at the previous message. As Robert said, we're off in the weeds here. I'm not convinced that we've got 3 levels, for starters. It could well be fewer, or more. Let's stop making assumptions about what's OK and what's not OK. Indeed, we may find out the 4th category in the future. For built-in functions, the code should be reviewed to ensure it does not expose the given argument using error messages. Then, we can mark it as trusted. One thing that I think *is* clear- removing useful information from error messages is *not* going to be an acceptable solution. Even if it is conditional, like as Greg Stark suggested? Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SR slaves and .pgpass
On Tue, Jun 8, 2010 at 12:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: On Mon, Jun 7, 2010 at 5:42 AM, Andrew Dunstan and...@dunslane.net wrote: I tried this with a database name of replication in the .pgpass file, which matches what we need to use in pg_hba.conf, but it failed miserably, and only worked when I used a wildcard for the database name in the .pgpass file. If this is expected it needs to be documented more clearly; if not, it's a bug. Yep, this is expected, so we need to improve the doc. Why don't we improve the code, instead? In particular make libpqrcv_connect() do - snprintf(conninfo_repl, sizeof(conninfo_repl), %s replication=true, conninfo); + snprintf(conninfo_repl, sizeof(conninfo_repl), %s database=replication replication=true, conninfo); What if the real database named replication exists? How can we specify the password only for replication purpose in that case? BTW, to distinguish the replication connection from the connection to the real database named replication, I proposed changing the .pgpass code so that it accepts the keyword only for replication, like pg_hba.conf. But it was rejected, and as the result of the discussion, we had consensus to not change the code. http://archives.postgresql.org/pgsql-hackers/2010-01/msg00400.php I don't think it's unlikely that someone would try to enter a replication-specific password into ~/.pgpass. Agreed. But I think that we don't need to specify other than the wildcard in the database field of .pgpass to use the replication-specific password if the replication-specific user is supplied in .pgpass. So the current code is enough for me. Am I missing something? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SR slaves and .pgpass
Fujii Masao masao.fu...@gmail.com writes: But I think that we don't need to specify other than the wildcard in the database field of .pgpass to use the replication-specific password if the replication-specific user is supplied in .pgpass. So the current code is enough for me. Am I missing something? You're looking at it from the perspective of somebody who knows exactly how the code works. What Andrew tried is exactly what 95% of other people would try. There doesn't seem to me to be any very good argument against making it work for them. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Functional dependencies and GROUP BY
Peter Eisentraut pete...@gmx.net writes: I have developed a patch that partially implements the functional dependency feature that allows some columns to be omitted from the GROUP BY clause if it can be shown that the columns are functionally dependent on the columns in the group by clause and therefore guaranteed to be unique per group. The main objection to this is the same one I've had all along: it makes the syntactic validity of a query dependent on what indexes exist for the table. At minimum, that means that enforcing the check at parse time is the Wrong Thing. The var-compared-with-constant case seems like a big crock. Are we really required to provide such a thing per spec? I'm also fairly concerned about the performance of a check implemented this way --- it's going to do a lot of work, and do it over and over again as it traverses the query tree. At least some of that could be alleviated after you move the check to the planner, just by virtue of the index information already having been acquired ... but I'd still suggest expending more than no effort on caching the results. For instance, given SELECT * FROM a_very_wide_table GROUP BY pk you shouldn't have to prove more than once that a_very_wide_table is grouped by its PK. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SR slaves and .pgpass
On Tue, Jun 8, 2010 at 12:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: But I think that we don't need to specify other than the wildcard in the database field of .pgpass to use the replication-specific password if the replication-specific user is supplied in .pgpass. So the current code is enough for me. Am I missing something? You're looking at it from the perspective of somebody who knows exactly how the code works. What Andrew tried is exactly what 95% of other people would try. There doesn't seem to me to be any very good argument against making it work for them. Hmm.. is it worth going back to my proposal? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SR slaves and .pgpass
Fujii Masao masao.fu...@gmail.com writes: Hmm.. is it worth going back to my proposal? I don't recall exactly what proposal you might be referring to, but I'm hesitant to put any large amount of work into hacking .pgpass processing for this. The whole business of replication authorization is likely to get revisited in 9.1, no? I think a cheap-and-cheerful solution is about right for the moment. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers