[HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Attached patch allows pg_stat_statements to store arbitrarily long query texts, using an external, transparently managed lookaside file. This is of great practical benefit to certain types of users who need to understand the execution costs of queries with excessively long query texts, often because the query was generated by an ORM. This is a common complaint of Heroku customers, though I'm sure they're not alone there.

As I mentioned on a previous occasion, since query fingerprinting was added, the query text is nothing more than a way of displaying the entries to users that are interested. As such, it seems perfectly reasonable to store those texts externally, out of shared memory - the function pg_stat_statements() itself (that the view is defined as a call to) isn't particularly performance sensitive.

Creating a brand new pg_stat_statements entry is already costly, since it necessitates an exclusive lock that blocks everything, so the additional cost of storing the query text when that happens is assumed to be of marginal concern. Better to have more entries available in the first place, so that eviction doesn't need to happen very frequently - using far less memory per entry helps the user to increase pg_stat_statements.max, making excessive exclusive locking less likely. Having said all that, the code bends over backwards to make sure that physical I/O is as unlikely as possible while the exclusive lock is held. There are numerous tricks employed where cache pressure is a concern that I won't go into here, since it's all well commented in the patch. Maybe we should have a warning when eviction occurs too frequently?

I also think that we might want to take another look at making the normalization logic less strict in terms of differentiating queries that a reasonable person might consider equivalent. This is obviously rather fuzzy, but I've had complaints about things like pg_stat_statements being overly fussy in seeing row and array comparisons as distinct from each other. This is a discussion for another thread most probably, but it is informed by one of the same concerns - making expensive cache pressure less likely.

This incidentally furthers the case for pg_stat_statements in core (making it similar to pg_stat_user_functions - not turned on by default, but easily available, even on busy production systems that cannot afford a restart and didn't know they needed it until performance tanked 5 minutes ago). The amount of shared memory now used (and therefore arguably wasted by having a little bit of shared memory just in case pg_stat_statements becomes useful) is greatly reduced. That's really a separate discussion, though, which is why I haven't done the additional work of integrating pg_stat_statements into core here. Doing a restart to enable pg_stat_statements is, in my experience, only acceptable infrequently - restarts are generally to be avoided at all costs. I can see a day when the query hash is actually a general query cache identifier, at which time the query text will also be stored outside of the pgss hash table proper.

Refactoring
=

I've decided to remove the encoding id from the pgss entry key (it's now just in the main entry struct) - I don't think it makes sense anymore. It is clearly no longer true that it's a notational convenience (and hasn't been since 9.1).
Like query texts themselves, it is something that exists for the sole purpose of displaying statistics to users, and as such cannot possibly result in incorrectly failing to differentiate or spuriously differentiating two entries. Now, I suppose it's true that since we're sometimes directly hashing query texts, it might matter (e.g. for utility statements), assuming that they could vary. But since the encoding invariably comes from the database encoding anyway, and we always differentiate on databaseid to begin with, I fail to see how that could possibly matter.

I've bumped PGSS_FILE_HEADER, just in case a pg_stat_statements temp file survives a pg_upgrade. I think that some minor tweaks made by Noah to pg_stat_statements (as part of commit b560ec1b) ought to have necessitated doing so anyway, but no matter.

The role of time-series aggregation
=

Over on the min/max pg_stat_statements storage thread, I've stated that I think the way forward is that third party tools aggregate deltas fairly aggressively - maybe even as aggressively as every 3 seconds or something. So I'm slightly concerned that I may be hampering a future tool that needs to aggregate statistics very aggressively. For that reason, we should probably provide an alternative function/view that identifies pg_stat_statements entries by hashid + databaseid + userid only, so any overhead from reading big query strings from disk with a shared lock held is eliminated. Thoughts?

-- 
Peter Geoghegan

pg_stat_statements_ext_text.v1.2013_11_14.patch.gz
Description: GNU Zip compressed data
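For illustration, the lighter-weight interface suggested above might look roughly like this (a sketch only - the queryid column and the showtext parameter are assumptions made for the example, not part of the posted patch):

    -- Hypothetical companion view that never touches the external query text,
    -- identifying entries by hash + database + user only.
    CREATE VIEW pg_stat_statements_light AS
        SELECT userid, dbid, queryid, calls, total_time, rows
        FROM pg_stat_statements(showtext := false);

A monitoring tool aggregating deltas every few seconds would poll such a view and only fetch the full query texts on demand.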
Re: [HACKERS] nested hstore patch
On 11/14/2013 01:32 AM, David E. Wheeler wrote:

On Nov 13, 2013, at 3:59 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

I remember strong voices in support of *not* normalising json, so that things like {"a":1, "a":true, "a":"b", "a":"none"} would go through the system unaltered, for claimed standard usage of json as processing instructions - that is, as source code which can possibly be converted to a JavaScript Object, and not something that would come out of serialising any existing JavaScript Object.

My recollection from PGCon was that there was consensus to normalize on the way in --

Great news! I remember advocating this approach in the mailing lists but having been out-voted based on current real-world usage out there :)

or at least, if we switched to a binary representation as proposed by Oleg & Teodor, it was not worth the hassle to try to keep it.

Very much agree. For the source code approach I'd recommend the text type, with maybe a check that it is possible to convert it to json.

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] appendPQExpBufferVA vs appendStringInfoVA
On Sun, Nov 3, 2013 at 3:18 AM, David Rowley dgrowle...@gmail.com wrote: I'm low on ideas on how to improve things much around here for now, but for what it's worth, I did create a patch which changes unnecessary calls to appendPQExpBuffer() into calls to appendPQExpBufferStr() similar to the recent one for appendStringInfo and appendStringInfoString. Attached is a re-based version of this. Regards David Rowley Regards David Rowley appendPQExpBufferStr_v0.2.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pg_basebackup: progress report max once per second
On 13 Nov 2013, at 20:51, Mika Eloranta m...@ohmu.fi wrote: Prevent excessive progress reporting that can grow to gigabytes of output with large databases. Same patch as an attachment. -- Mika Eloranta Ohmu Ltd. http://www.ohmu.fi/ 0001-pg_basebackup-progress-report-max-once-per-second.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part 1: additional information
On Tue, Nov 5, 2013 at 9:49 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 04.11.2013 23:44, Alexander Korotkov wrote:

On Mon, Oct 21, 2013 at 11:12 PM, Alexander Korotkov aekorot...@gmail.com wrote:

Attached version of patch is debugged. I would like to note that the number of bugs was low and it wasn't very hard to debug. I've rerun tests on it. You can see that the numbers are improved as the result of your refactoring.

         event          |     period
------------------------+------------------
 index_build            | 00:01:45.416822
 index_build_recovery   | 00:00:06
 index_update           | 00:05:17.263606
 index_update_recovery  | 00:01:22
 search_new             | 00:24:07.706482
 search_updated         | 00:26:25.364494
(6 rows)

     label      | blocks_mark
----------------+-------------
 search_new     |   847587636
 search_updated |   881210486
(2 rows)

     label     |   size
---------------+-----------
 new           | 419299328
 after_updates | 715915264
(2 rows)

Beside simple bugs, there was a subtle bug in incremental item indexes update. I've added some more comments, including an ASCII picture about how item indexes are shifted. Now, I'm trying to implement support of the old page format. Then we can decide which approach to use.

Attached version of patch has support of the old page format. Patch still needs more documentation and probably refactoring, but I believe the idea is clear and it can be reviewed. In the patch I have to revert the change of null category placement for compatibility with the old posting list format.

Thanks, just glanced at this quickly. If I'm reading the patch correctly, old-style non-compressed posting tree leaf pages are not vacuumed at all; that's clearly not right.

Fixed. Now a separate function handles uncompressed posting lists and compresses them if at least one TID is deleted.

I argued earlier that we don't need to handle the case that compressing a page makes it larger, so that the existing items don't fit on the page anymore. I'm having some second thoughts on that; I didn't take into account the fact that the mini-index in the new page format takes up some space. I think it's still highly unlikely that there isn't enough space on an 8k page, but it's not totally crazy with a non-standard small page size. So at least that situation needs to be checked for with an ereport(), rather than an Assert.

Now this situation is ereported before any change to the page.

To handle splitting a non-compressed page, it seems that you're relying on the fact that before splitting, we try to insert, which compresses the page. The problem with that is that it's not correctly WAL-logged. The split record that follows includes a full copy of both page halves, but if the split fails for some reason, e.g. you run out of disk space, there is no WAL record at all of the compression. I'd suggest doing the compression in the insert phase on a temporary copy of the page, leaving the original page untouched if there isn't enough space for the insertion to complete. (You could argue that this case can't happen because the compression must always create enough space to insert one more item. Maybe so, but at least there should be an explicit check for that.)

Good point. Yes, if we don't handle inserting item indexes specially, I see no point in doing special handling for a single TID, which is much smaller. In the attached patch dataCompressLeafPage just reserves space for a single TID. Also, many changes in comments and README. Unfortunately, I didn't understand what the FIXME in ginVacuumEntryPage is about. So, it's not fixed :)

-- With best regards, Alexander Korotkov.
gin-packed-postinglists-13.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tcp_keepalives_idle
On 11/14/13 7:08 AM, Tatsuo Ishii wrote:

It means the connection is idle except for keepalive packets. We could perhaps just drop the word "otherwise", if people find it confusing.

Wah. I seem to have completely misunderstood what the phrase says. Thanks for the clarification. I agree to drop "otherwise".

I had some problems interpreting these explanations as well: http://www.postgresql.org/message-id/527a21f1.2000...@joh.to Compare that to the description in the libpq documentation: "Controls the number of seconds of inactivity after which TCP should send a keepalive message to the server."

Regards, Marko Tiikkaja

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pg_basebackup: progress report max once per second
On 11/14/13 10:27 AM, Mika Eloranta wrote: On 13 Nov 2013, at 20:51, Mika Eloranta m...@ohmu.fi wrote: Prevent excessive progress reporting that can grow to gigabytes of output with large databases. Same patch as an attachment. I can't comment on the usefulness of this patch, but the first hunk in progress_report does not conform to the code style guidelines of the project. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extra functionality to createuser
Hello,

Tried to test this patch. Did the following:

1. Cloned from https://github.com/samthakur74/postgres
2. Applied the patch and ran make install
3. Created roles app_readonly_role and app2_writer_role
4. Tried: createuser -D -S -l -g app_readonly_role,app2_writer_role test_user
   Got error: createuser: invalid option -- 'g'
5. Tried: createuser -D -S -l --roles app_readonly_role,app2_writer_role test_user. This does not give an error.
6. Confirmed that test_user is created using \du and has the expected memberships:

postgres=# \du
                                        List of roles
     Role name     |                   Attributes                   |              Member of
-------------------+------------------------------------------------+--------------------------------------
 Sameer            | Superuser, Create role, Create DB, Replication | {}
 app2_writer_role  | Cannot login                                   | {}
 app_readonly_role | Cannot login                                   | {}
 my_new_user       |                                                | {app_readonly_role,app2_writer_role}
 test_user         |                                                | {app_readonly_role,app2_writer_role}

7. createuser --help does show: -g, --roles   roles to associate with this new role

So I think the -g option is failing.

regards
Sameer

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
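For reference, the -g/--roles option is essentially a front end for role membership in CREATE ROLE; the SQL that the working invocation from step 5 would be expected to issue on the server side is roughly this (a sketch, not taken from the patch):

    -- Equivalent of: createuser -D -S -l --roles app_readonly_role,app2_writer_role test_user
    CREATE ROLE test_user LOGIN NOSUPERUSER NOCREATEDB
        IN ROLE app_readonly_role, app2_writer_role;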
Re: [HACKERS] [GENERAL] Clang 3.3 Analyzer Results
On Wednesday, November 13, 2013, Tom Lane wrote: Kevin Grittner kgri...@ymail.com javascript:; writes: If nobody objects, I'll fix that small memory leak in the regression test driver. Hopefully someone more familiar with pg_basebackup will fix the double-free (and related problems mentioned by Tom) in streamutil.c. Here's a less convoluted (IMHO) approach to the password management logic in streamutil.c. One thing I really didn't care for about the existing coding is that the loop-for-password included all the rest of the function, even though there's no intention to retry for any purpose except collecting a password. So I moved up the bottom of the loop. For ease of review, I've not reindented the code below the new loop bottom, but would do so before committing. Any objections to this version? Nope, looks good to me. That code was originally stolen from psql, and then whacked around a number of times. The part about looping and passwords, for example, is in startup.c in psql as well. We probably want to fix it there as well (even if it doesn't have the same problem, it has the same general design). Or perhaps even put that function somewhere shared between the two? It's also in pg_dump/pg_backup_db.c, there's a version of it in pg_dumpall.c, etc. Which I think is a good argument for fixing them all by sharing the code somewhere? In fact, we already have some in script/common.c - but it's only used by the tools that are in script/. //Magnus -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] [PATCH 1/2] SSL: GUC option to prefer server cipher order
On Thursday, November 7, 2013, Marko Kreen wrote:

On Wed, Nov 06, 2013 at 09:57:32PM -0300, Alvaro Herrera wrote:

Marko Kreen escribió: By default OpenSSL (and SSL/TLS in general) lets client cipher order take priority. This is OK for browsers where the ciphers were tuned, but few Postgres client libraries make cipher order configurable. So it makes sense to make cipher order in postgresql.conf take priority over client defaults. This patch adds setting 'ssl_prefer_server_ciphers' which can be turned on so that server cipher order is preferred.

Wouldn't it make more sense to have this enabled by default?

Well, yes. :) I would even drop the GUC setting, but hypothetically there could be some sort of backwards compatibility concerns, so I added it to the patch and kept the old default. But if no one has a strong need for it, the setting can be removed.

I think the default behaviour should be the one we recommend (which would be to have the server one be preferred). But I do agree with the requirement to have a GUC to be able to remove it - even though I don't like the idea of more GUCs. But making it a compile time option would make it the same as not having one...

//Magnus

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Logging WAL when updating hintbit
On 11/14/2013 07:02 AM, Sawada Masahiko wrote: I attached patch adds new wal_level 'all'. Shouldn't this be a separate setting? It's useful for storage which requires rewriting a partially written sector before it can be read again. -- Florian Weimer / Red Hat Product Security Team -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
Hello,

When I read it again and try to relate, I get your point. Actually true, hashes must always be performed as a last option (if that is what you too meant), and if there are few other operations they must be the last one to be performed, especially after sorting/grouping. Hashes must somehow make use of already sorted data (I think this is something even you indicated).

Yes, some hashes could preserve order, by selecting such a function as the hash function. But at least PostgreSQL's 'HashAggregation' uses a non-order-preserving function as the hash function, so the output cannot preserve the input ordering.

I will do that if I get a DB2 system or Oracle system running. I will try to replicate the same 2 test cases and share the plan. One thing which I am sure of is that the below part of the plan

| Subquery Scan on __unnamed_subquery_0 (cost=12971.39..16964.99 rows=614 width=43) (actual time=2606.075..2953.937 rows=558 loops=1)

would be generated as a RID scan in DB2 (which I have seen perform better than normal subquery scans in DB2).

DB2's documentation says it is used for 'index ORing', which corresponds to OR or IN ops. That seems to be a relative of PostgreSQL's BitmapOr, perhaps not of HashAggregates/SemiJoin.

I tried to imagine the plan for the group_by case with repeated index scan and merging..

select student_name from student_score where (course,score) in (select course,max(score) from student_score group by course);

Taking advantage of the fact that the cardinality of course is 8, this query could be transformed into 8 times of index scan and bitmaping. With hypothetical plan nodes LOOP and BitmapScanAdd, the plan could be:

| BitmapHeapScan (rows = 155, loops = 1)
| - LOOP
|   ON Subquery (select distinct course from student_course) as c0
|   - BitmapScanAdd (loops = 8)
|     BitmapCond: (student_score.score = x)
|     - Limit (rows = 1, loops = 8) AS x
|       - Unique (rows = 1, loops = 8)
|         - IndexScan using idx_score on student_course (rows = 1, loops = 8)
|           Filter (student_course.course = c0)

I suppose this is one possibility of what DB2 is doing. If DB2 does the same optimization for ranking 1 with the dense_rank() case, this plan might be like this:

| BitmapHeapScan (rows = 133, loops = 1)
| - LOOP
|   ON Subquery (select distinct course from student_course) as c0
|   - BitmapScanAdd (loops = 8)
|     BitmapCond: (student_score.score = x)
|     - Limit (rows = 1, loops = 8) AS x
|       - Unique (rows = 2, loops = 8)
|         - IndexScan using idx_score on student_course (rows = 18, loops = 8)
|           Filter (student_course.course = c0)

Both plans surely seem to finish quickly for relatively small n's and numbers of courses. On the other hand, using semijoin as PostgreSQL does, creating a HashAggregate storing the nth place score for every course requires some memory to work on for each course.

| Hash Semi Join
|   Hash Cond: (a.course = b.course and a.score = b.score)
|   - SeqScan on student_score as a
|   - Hash
|     - HashAggregatesFunc (rows = 8)
|       Key = course, func = rankn(dense_rank(), n, key, val)
|       - SeqScan on student_score (rows = 122880)

Here, rankn() must keep scores down to the nth rank and emit the nth score as its final value.

I don't have a more generic form for this mechanism right now, and the value of doing it in this specific manner seems limited..

regards,

-- Kyotaro Horiguchi NTT Open Source Software Center

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
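For reference, the dense_rank() formulation discussed above is along these lines (a sketch against the thread's student_score table; n = 2 is used purely as an example):

    -- Top-n students per course via a window function.
    SELECT student_name
    FROM (SELECT student_name,
                 dense_rank() OVER (PARTITION BY course ORDER BY score DESC) AS rnk
          FROM student_score) s
    WHERE rnk <= 2;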
Re: [HACKERS] Extra functionality to createuser
1. cloned from https://github.com/samthakur74/postgres Sorry. cloned from https://github.com/postgres/postgres regards Sameer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part 1: additional information
On Thu, Nov 14, 2013 at 2:17 PM, Alexander Korotkov aekorot...@gmail.com wrote: [...]

Sorry, I posted buggy version of patch. Attached version is correct.
-- With best regards, Alexander Korotkov. gin-packed-postinglists-14.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improve code in tidbitmap.c
I'd like to give a complementary explanation of this. In tbm_union_page() and tbm_intersect_page() in tidbitmap.c, WORDS_PER_PAGE is used in the scan of a lossy chunk, instead of WORDS_PER_CHUNK, where these macros are defined as:

/* number of active words for an exact page: */
#define WORDS_PER_PAGE ((MAX_TUPLES_PER_PAGE - 1) / BITS_PER_BITMAPWORD + 1)
/* number of active words for a lossy chunk: */
#define WORDS_PER_CHUNK ((PAGES_PER_CHUNK - 1) / BITS_PER_BITMAPWORD + 1)

Though the use of WORDS_PER_PAGE in the scan of a lossy chunk is logically correct, since these macros implicitly satisfy WORDS_PER_CHUNK <= WORDS_PER_PAGE, I think WORDS_PER_CHUNK should be used in the scan of a lossy chunk for code readability and maintenance. So, I submitted a patch working that way in an earlier email. I think that, as a secondary effect of the patch, the scan would be performed a bit more efficiently. I'll add the patch to the 2013-11 CF. Any comments are welcome.

Thanks,

Best regards, Etsuro Fujita

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
Please find attached the patch for adding a new option to pg_basebackup, to specify a different directory for pg_xlog.

Design

A new option, --xlogdir, is added to the list of options for pg_basebackup. The new option does not have an equivalent short option letter. This option allows the user to specify a different directory for pg_xlog. The format for specifying a different directory is: --xlogdir=/path/to/xlog/directory

eg: pg_basebackup --xlogdir=/home/pg/xlog -D ../dataBaseBackUp

When the user specifies an xlog directory, it creates a symbolic link from the default directory to the user-specified directory. The user can give only an absolute path for the xlog directory. This option works only if the format for the backup is 'plain'.

Please provide your feedback / suggestions.

Regards, Hari babu.

UserSpecifiedxlogDir.patch Description: UserSpecifiedxlogDir.patch

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2013/10/21 20:17), KONDO Mitsumasa wrote:

(2013/10/18 22:21), Andrew Dunstan wrote:

If we're going to extend pg_stat_statements, even more than min and max I'd like to see the standard deviation in execution time.

OK. I do! I am making some other patches, please wait more!

I added stddev_time and fixed some sources. The psql result from my latest patch is as follows.

 userid              | 10
 dbid                | 16384
 query               | UPDATE pgbench_tellers SET tbalance = tbalance + ? WHERE tid = ?;
 calls               | 74746
 total_time          | 1094.291998
 min_time            | 0.007
 max_time            | 15.091
 stddev_time         | 0.100439187720684
 rows                | 74746
 shared_blks_hit     | 302346
 shared_blks_read    | 6
 shared_blks_dirtied | 161
 shared_blks_written | 0
 local_blks_hit      | 0
 local_blks_read     | 0
 local_blks_dirtied  | 0
 local_blks_written  | 0
 temp_blks_read      | 0
 temp_blks_written   | 0
 blk_read_time       | 0
 blk_write_time      | 0

I haven't thought much about the order of columns in this table. If you have any ideas, please send them to me. And thanks for the many comments and discussion; I am going to refer to these not only for this patch but also for development of pg_statsinfo and pg_stats_reporter :-)

Regards,

-- Mitsumasa KONDO NTT Open Source Software Center

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
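With those columns in place, the kind of query one would run against the view is straightforward; a sketch using the column names shown above (the final 1.2 view definition may of course differ):

    -- Statements whose execution time fluctuates the most.
    SELECT query, calls,
           total_time / calls AS mean_time,
           min_time, max_time, stddev_time
    FROM pg_stat_statements
    ORDER BY stddev_time DESC
    LIMIT 10;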
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Oh! Sorry... I forgot to attach my latest patch. Regards, -- Mitsumasa KONDO NTT Open Source Software Center diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql new file mode 100644 index 000..929d623 --- /dev/null +++ b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql @@ -0,0 +1,51 @@ +/* contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql */ + +-- complain if script is sourced in psql, rather than via ALTER EXTENSION +\echo Use ALTER EXTENSION pg_stat_statements UPDATE to load this file. \quit + +/* First we have to remove them from the extension */ +ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements; +ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(); + +/* Then we can drop them */ +DROP VIEW pg_stat_statements; +DROP FUNCTION pg_stat_statements(); + +/* Now redefine */ +CREATE FUNCTION pg_stat_statements( +OUT userid oid, +OUT dbid oid, +OUT query text, +OUT calls int8, +OUT total_time float8, +OUT min_time float8, +OUT max_time float8, +OUT stddev_time float8, +OUT rows int8, +OUT shared_blks_hit int8, +OUT shared_blks_read int8, +OUT shared_blks_dirtied int8, +OUT shared_blks_written int8, +OUT local_blks_hit int8, +OUT local_blks_read int8, +OUT local_blks_dirtied int8, +OUT local_blks_written int8, +OUT temp_blks_read int8, +OUT temp_blks_written int8, +OUT blk_read_time float8, +OUT blk_write_time float8 +) +RETURNS SETOF record +AS 'MODULE_PATHNAME' +LANGUAGE C; + +CREATE VIEW pg_stat_statements AS + SELECT * FROM pg_stat_statements(); + +GRANT SELECT ON pg_stat_statements TO PUBLIC; + +/* New Function */ +CREATE FUNCTION pg_stat_statements_reset_time() +RETURNS void +AS 'MODULE_PATHNAME' +LANGUAGE C; diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql new file mode 100644 index 000..d590acc --- /dev/null +++ b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql @@ -0,0 +1,52 @@ +/* contrib/pg_stat_statements/pg_stat_statements--1.2.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit + +-- Register functions. +CREATE FUNCTION pg_stat_statements_reset() +RETURNS void +AS 'MODULE_PATHNAME' +LANGUAGE C; + +CREATE FUNCTION pg_stat_statements_reset_time() +RETURNS void +AS 'MODULE_PATHNAME' +LANGUAGE C; + +CREATE FUNCTION pg_stat_statements( +OUT userid oid, +OUT dbid oid, +OUT query text, +OUT calls int8, +OUT total_time float8, +OUT min_time float8, +OUT max_time float8, +OUT stddev_time float8, +OUT rows int8, +OUT shared_blks_hit int8, +OUT shared_blks_read int8, +OUT shared_blks_dirtied int8, +OUT shared_blks_written int8, +OUT local_blks_hit int8, +OUT local_blks_read int8, +OUT local_blks_dirtied int8, +OUT local_blks_written int8, +OUT temp_blks_read int8, +OUT temp_blks_written int8, +OUT blk_read_time float8, +OUT blk_write_time float8 +) +RETURNS SETOF record +AS 'MODULE_PATHNAME' +LANGUAGE C; + +-- Register a view on the function for ease of use. +CREATE VIEW pg_stat_statements AS + SELECT * FROM pg_stat_statements(); + +GRANT SELECT ON pg_stat_statements TO PUBLIC; + +-- Don't want this to be available to non-superusers. 
+REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC; +REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC; diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index e8aed61..5c63940 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -4,8 +4,10 @@ MODULE_big = pg_stat_statements OBJS = pg_stat_statements.o EXTENSION = pg_stat_statements -DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \ - pg_stat_statements--unpackaged--1.0.sql +DATA = pg_stat_statements--1.2.sql \ + pg_stat_statements--1.0--1.1.sql \ + pg_stat_statements--1.1--1.2.sql \ + pg_stat_statements--unpackaged--1.0.sql ifdef USE_PGXS PG_CONFIG = pg_config diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql b/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql index 5be281e..5662273 100644 --- a/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql +++ b/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql @@ -1,7 +1,7 @@ /* contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql */ -- complain if script is sourced in psql, rather than via ALTER EXTENSION -\echo Use ALTER EXTENSION pg_stat_statements UPDATE TO '1.1' to load this file. \quit +\echo Use ALTER EXTENSION pg_stat_statements UPDATE to load this file. \quit /* First we have to remove them from the extension */ ALTER EXTENSION pg_stat_statements DROP VIEW
Re: [HACKERS] nested hstore patch
On 11/14/2013 03:21 AM, Hannu Krosing wrote:

On 11/14/2013 01:32 AM, David E. Wheeler wrote:

On Nov 13, 2013, at 3:59 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

I remember strong voices in support of *not* normalising json, so that things like {"a":1, "a":true, "a":"b", "a":"none"} would go through the system unaltered, for claimed standard usage of json as processing instructions - that is, as source code which can possibly be converted to a JavaScript Object, and not something that would come out of serialising any existing JavaScript Object.

My recollection from PGCon was that there was consensus to normalize on the way in --

Great news! I remember advocating this approach in the mailing lists but having been out-voted based on current real-world usage out there :)

or at least, if we switched to a binary representation as proposed by Oleg & Teodor, it was not worth the hassle to try to keep it.

Very much agree. For the source code approach I'd recommend the text type, with maybe a check that it is possible to convert it to json.

I don't think you and David are saying the same thing. AIUI he wants one JSON type and is prepared to discard text preservation (duplicate keys and key order). You want two json types, one of which would feature text preservation. Correct me if I'm wrong.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimize kernel readahead using buffer access strategy
On Thu, Nov 14, 2013 at 9:09 AM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote:

I create a patch that is improvement of disk-read and OS file caches. It can optimize kernel readahead parameter using buffer access strategy and posix_fadvise() in various disk-read situations. In general OS, readahead parameter was dynamically decided by disk-read situations. If long time disk-read was happened, readahead parameter becomes big. However it is based on experienced or heuristic algorithm, it causes waste disk-read and throws out useful OS file caches in some case. It is bad for disk-read performance a lot.

It would be relevant to know which kernel you used for those tests.

@@ -677,6 +677,7 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 				 errmsg("could not seek to block %u in file \"%s\": %m",
 						blocknum, FilePathName(v->mdfd_vfd))));
 
+	BufferHintIOAdvise(v->mdfd_vfd, buffer, BLCKSZ, strategy);
 	nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ);
 
 	TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,

A while back, I tried to use posix_fadvise to prefetch index pages. I ended up finding out that interleaving posix_fadvise with I/O like that severely hinders (ie: completely disables) the kernel's read-ahead algorithm.

How exactly did you set up those benchmarks? pg_bench defaults? pg_bench does not exercise heavy sequential access patterns, or long index scans. It performs many single-page index lookups per transaction and that's it. You may want to try your patch with more real workloads, and maybe you'll confirm what I found out last time I messed with posix_fadvise.

If my experience is still relevant, those patterns will have suffered a severe performance penalty with this patch, because it will disable kernel read-ahead on sequential index access. It may still work for sequential heap scans, because the access strategy will tell the kernel to do read-ahead, but many other access methods will suffer. Try OLAP-style queries.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] init_sequence spill to hash table
On 14.11.2013 14:38, David Rowley wrote:

I've just completed some more benchmarking of this. I didn't try dropping the threshold down to 2 or 0 but I did tests at the cut over point and really don't see much difference in performance between the list at 32 and the hashtable at 33 sequences. The hash table version excels in the 16000 sequence test in comparison to the unpatched version.

Times are in milliseconds of the time it took to call currval() 10 times for 1 sequence.

                  Patched     Unpatched    increased by
 1 in cache       1856.452    1844.11      -1%
 32 in cache      1841.84     1802.433     -2%
 33 in cache      1861.558    not tested   N/A
 16000 in cache   1963.711    10329.22     426%

If I understand those results correctly, the best case scenario with the current code takes about 1800 ms. There's practically no difference with N = 32, where N is the number of sequences touched. The hash table method also takes about 1800 ms when N=33. The performance of the hash table is O(1), so presumably we can extrapolate from that that it's the same for any N.

I think that means that we should just completely replace the list with the hash table. The difference with a small N is lost in noise, so there's no point in keeping the list as a fast path for small N. That'll make the patch somewhat simpler.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
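For context, the kind of microbenchmark described above can be approximated with something like this (a sketch only - the sequence name and iteration count are illustrative, not David's actual test script):

    -- Create one sequence, touch it once so it enters the backend-local cache,
    -- then time repeated currval() calls against it.
    CREATE SEQUENCE bench_seq_1;
    SELECT nextval('bench_seq_1');

    DO $$
    BEGIN
        FOR i IN 1..100000 LOOP
            PERFORM currval('bench_seq_1');
        END LOOP;
    END;
    $$;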
Re: [HACKERS] init_sequence spill to hash table
Hi, On 2013-11-13 22:55:43 +1300, David Rowley wrote: Here http://www.postgresql.org/message-id/24278.1352922...@sss.pgh.pa.us there was some talk about init_sequence being a bottleneck when many sequences are used in a single backend. The attached I think implements what was talked about in the above link which for me seems to double the speed of a currval() loop over 3 sequences. It goes from about 7 seconds to 3.5 on my laptop. I think it'd be a better idea to integrate the sequence caching logic into the relcache. There's a comment about it: * (We can't * rely on the relcache, since it's only, well, a cache, and may decide to * discard entries.) but that's not really accurate anymore. We have the infrastructure for keeping values across resets and we don't discard entries. Since we already do a relcache lookup for every sequence manipulation (c.f. init_sequence()) relying on it won't increase, but rather decrease the overhead. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] init_sequence spill to hash table
Andres Freund and...@2ndquadrant.com writes: I think it'd be a better idea to integrate the sequence caching logic into the relcache. There's a comment about it: * (We can't * rely on the relcache, since it's only, well, a cache, and may decide to * discard entries.) but that's not really accurate anymore. We have the infrastructure for keeping values across resets and we don't discard entries. We most certainly *do* discard entries, if they're not open when a cache flush event comes along. I suppose it'd be possible to mark a relcache entry for a sequence as locked-in-core, but that doesn't attract me at all. A relcache entry is a whole lot larger than the amount of state we really need to keep for a sequence. One idea is to have a hashtable for the sequence-specific data, but to add a link field to the relcache entry that points to the non-flushable sequence hashtable entry. That would save the second hashtable lookup as long as the relcache entry hadn't been flushed since last use, while not requiring any violence to the lifespan semantics of relcache entries. (Actually, if we did that, it might not even be worth converting the list to a hashtable? Searches would become a lot less frequent.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] init_sequence spill to hash table
On 2013-11-14 09:23:20 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I think it'd be a better idea to integrate the sequence caching logic into the relcache. There's a comment about it: * (We can't * rely on the relcache, since it's only, well, a cache, and may decide to * discard entries.) but that's not really accurate anymore. We have the infrastructure for keeping values across resets and we don't discard entries. We most certainly *do* discard entries, if they're not open when a cache flush event comes along. What I was aiming at is that we don't discard them because of a limited cache size. I don't think it means much that we flush the entry when it's changed but not referenced. I suppose it'd be possible to mark a relcache entry for a sequence as locked-in-core, but that doesn't attract me at all. A relcache entry is a whole lot larger than the amount of state we really need to keep for a sequence. But effectively that's what already happens unless either somebody else does an ALTER SEQUENCE or sinval overflow happened, right? So in production workloads we already will keep both around since hopefully neither altering a sequence nor sinval overflows should be a frequent thing. One idea is to have a hashtable for the sequence-specific data, but to add a link field to the relcache entry that points to the non-flushable sequence hashtable entry. That would save the second hashtable lookup as long as the relcache entry hadn't been flushed since last use, while not requiring any violence to the lifespan semantics of relcache entries. (Actually, if we did that, it might not even be worth converting the list to a hashtable? Searches would become a lot less frequent.) That's not a bad idea if we decide not to integrate them. And I agree, there's not much need to have a separate hashtable in that case. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Clang 3.3 Analyzer Results
Magnus Hagander mag...@hagander.net writes: That code was originally stolen from psql, and then whacked around a number of times. The part about looping and passwords, for example, is in startup.c in psql as well. We probably want to fix it there as well (even if it doesn't have the same problem, it has the same general design). Or perhaps even put that function somewhere shared between the two? It's also in pg_dump/pg_backup_db.c, there's a version of it in pg_dumpall.c, etc. Which I think is a good argument for fixing them all by sharing the code somewhere? In fact, we already have some in script/common.c - but it's only used by the tools that are in script/. Hm, maybe, but where? It's inappropriate for libpgcommon (we don't want that calling libpq), so I'm not real sure what to do with it. Also it's not clear to me that all these tools would have the same requirements for the non-password parameters for the connection request. BTW, I realized while fooling with this that although the code looks like it's intended to iterate till a correct password is obtained, actually it cannot prompt more than once, because of the way PQconnectionNeedsPassword is coded. Therefore, the double free that clang is worried about can't really happen. It's still worth fixing IMO. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] init_sequence spill to hash table
Andres Freund and...@2ndquadrant.com writes: On 2013-11-14 09:23:20 -0500, Tom Lane wrote: We most certainly *do* discard entries, if they're not open when a cache flush event comes along. What I was aiming at is that we don't discard them because of a limited cache size. I don't think it means much that we flush the entry when it's changed but not referenced. Well, I don't want non-user-significant events (such as an sinval queue overrun) causing sequence state to get discarded. We would get bug reports about lost sequence values. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On Wed, Nov 13, 2013 at 04:50:49PM -0800, David E. Wheeler wrote:

On Nov 13, 2013, at 4:45 PM, Andrew Dunstan and...@dunslane.net wrote:

It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO.

I don't in the least understand how it could be a pretty printing option. If we move to a binary rep using the hstore stuff, order will be destroyed and not stored anywhere, and duplicate keys will be lost. Once that's done, how would a pretty print option restore the lost info?

I meant ordering the keys, usually in lexicographic order. I agree that preserving order is untenable.

There is a canonical form: http://tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00

A canonical form would be very useful. That's a bit trickier than sorting the keys, and I don't know that there is an accepted canonical form for json yet that can represent all json documents. (The canonical form is not the pretty form, but I think the key ordering should be the same.)

It might be nice to have a more general canonical form, if one emerges from somewhere, that could encode any json. Without something like this, hashing can only be well specified for the 'sensible subset of json' used in security protocols.

Garick

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] ecpg: Split off mmfatal() from mmerror()
On Tue, Nov 12, 2013 at 10:22:20PM -0500, Peter Eisentraut wrote: Similar to recent pg_upgrade changes (https://commitfest.postgresql.org/action/patch_view?id=1216), here is a patch to separate the terminating and nonterminating variants of mmerror() in ecpg. ... Haven't tried it, but it sure looks good to me. Feel free to commit. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at gmail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] init_sequence spill to hash table
On 2013-11-14 09:47:18 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-11-14 09:23:20 -0500, Tom Lane wrote: We most certainly *do* discard entries, if they're not open when a cache flush event comes along. What I was aiming at is that we don't discard them because of a limited cache size. I don't think it means much that we flush the entry when it's changed but not referenced. Well, I don't want non-user-significant events (such as an sinval queue overrun) causing sequence state to get discarded. We would get bug reports about lost sequence values. But we can easily do as you suggest and simply retain the entry in that case. I am just not seeing the memory overhead argument as counting much since we don't protect against it in normal operation. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Somebody broke \d on indexes
In HEAD:

regression=# \d tenk1_thous_tenthous
ERROR:  column i.indisidentity does not exist
LINE 4:    i.indisidentity,
           ^

This works fine in released versions.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On Wed, Nov 13, 2013 at 6:01 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

On 11/14/2013 12:09 AM, Merlin Moncure wrote:

On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote:

On 11/13/2013 06:45 AM, Merlin Moncure wrote:

I'm not so sure we should require hstore to do things like build

Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated."

This is pretty standard in the programming languages I know of which use JSON.

I think the caller should have no such responsibility. Keys should be able to be repeated.

Apparently your experience with using JSON in practice has been fairly different from mine; in the projects I work on, the JSON is being constantly converted back and forth to hashes and dictionaries, which means that ordering is not preserved and keys have to be unique (or become unique within one conversion cycle). I think, based on the language of the RFC and common practice, that it's completely valid for us to require unique keys within JSON-manipulation routines.

Common practice? The internet is littered with complaints about documents being spontaneously re-ordered and/or de-duplicated in various stacks. Other stacks provide mechanisms for explicit key order handling (see here: http://docs.python.org/2/library/json.html). Why do you think they did that?

I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not.

I guess we should not replace the current JSON type with an hstore based one, but add something json-like based on nested hstore instead. Maybe call it jsdoc or jdoc or jsobj or somesuch.

This is exactly what needs to be done, full stop (how about: hstore). It really comes down to this: changing the serialization behaviors that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. It's going to break applications I've currently put into production with no clear workaround. This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that, and when serializing you are not in control of the parser.

Had the json type been stuffed into an extension, there would be a clearer path to get to where you want to go, since we could have walled off the old functionality and introduced side by side API calls. As things stand now, I don't see a clean path to do that.

I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though.

It's not just a matter of performance. It's the basic conflict of JSON as document format vs. JSON as data storage. For the latter, unique, unordered keys are required, or certain functionality isn't remotely possible: indexing, in-place key update, transformations, etc.

On Wed, Nov 13, 2013 at 5:20 PM, Josh Berkus j...@agliodbs.com wrote:

It's not just a matter of performance. It's the basic conflict of JSON as document format vs. JSON as data storage. For the latter, unique, unordered keys are required, or certain functionality isn't remotely possible: indexing, in-place key update, transformations, etc.

That's not very convincing. What *exactly* is impossible, and why do you think it justifies breaking compatibility with current applications? The way forward seems pretty straightforward: given that hstore is getting nesting power and is moving closer to the json way of doing things, it is essentially 'binary mode json'. I'm ok with de-duplication and key ordering when moving into that structure, since it's opt-in and doesn't break the serialization behaviors we have today. If you want to go further and unify the types then you have to go through the design work to maintain compatibility.

Furthermore, I bet the performance argument isn't so clear cut either. The current json type is probably faster at bulk serialization precisely because you *don't* need to deduplicate and reorder keys: the serialization operates without context. It will certainly be much better for in-place manipulations, but it's not nearly as simple as you are making it out to be.

merlin
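To make the compatibility point concrete, this is the sort of behaviour the existing text-based json type shows today (a small illustration consistent with this thread; exact output formatting may vary by version):

    -- Duplicate keys and key order pass through the json type unchanged...
    SELECT '{"a":1,"a":2,"z":3,"b":4}'::json;
    --  {"a":1,"a":2,"z":3,"b":4}

    -- ...while field extraction behaves as if the last duplicate wins.
    SELECT '{"a":1,"a":2}'::json -> 'a';
    --  2

A binary representation that normalizes on input would necessarily change the first result, which is exactly the break being objected to here.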
Re: [HACKERS] Somebody broke \d on indexes
On 2013-11-14 09:52:11 -0500, Tom Lane wrote:

In HEAD:

regression=# \d tenk1_thous_tenthous
ERROR:  column i.indisidentity does not exist
LINE 4:    i.indisidentity,
           ^

That's me. At some point indisidentity was renamed to indisreplident. Patch attached (also renaming a variable that didn't cause problems but wasn't named consistently anymore).

Shouldn't we have at least one \d of an index in the regression tests somewhere? Not that that excuses stupid mistakes, but it'd be helpful nonetheless.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 25fec2b..ceda13e 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1611,9 +1611,9 @@ describeOneTableDetails(const char *schemaname,
 						  "false AS condeferrable, false AS condeferred,\n");
 		if (pset.sversion >= 90400)
-			appendPQExpBuffer(&buf, "i.indisidentity,\n");
+			appendPQExpBuffer(&buf, "i.indisreplident,\n");
 		else
-			appendPQExpBuffer(&buf, "false AS indisidentity,\n");
+			appendPQExpBuffer(&buf, "false AS indisreplident,\n");
 		appendPQExpBuffer(&buf, "a.amname, c2.relname, "
 						  "pg_catalog.pg_get_expr(i.indpred, i.indrelid, true)\n"
@@ -1638,7 +1638,7 @@ describeOneTableDetails(const char *schemaname,
 			char	   *indisvalid = PQgetvalue(result, 0, 3);
 			char	   *deferrable = PQgetvalue(result, 0, 4);
 			char	   *deferred = PQgetvalue(result, 0, 5);
-			char	   *indisidentity = PQgetvalue(result, 0, 6);
+			char	   *indisreplident = PQgetvalue(result, 0, 6);
 			char	   *indamname = PQgetvalue(result, 0, 7);
 			char	   *indtable = PQgetvalue(result, 0, 8);
 			char	   *indpred = PQgetvalue(result, 0, 9);
@@ -1670,7 +1670,7 @@ describeOneTableDetails(const char *schemaname,
 			if (strcmp(deferred, "t") == 0)
 				appendPQExpBuffer(&tmpbuf, _(", initially deferred"));
-			if (strcmp(indisidentity, "t") == 0)
+			if (strcmp(indisreplident, "t") == 0)
 				appendPQExpBuffer(&tmpbuf, _(", replica identity"));
 			printTableAddFooter(&cont, tmpbuf.data);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Somebody broke \d on indexes
Andres Freund and...@2ndquadrant.com writes: On 2013-11-14 09:52:11 -0500, Tom Lane wrote: In HEAD: regression=# \d tenk1_thous_tenthous ERROR: column i.indisidentity does not exist LINE 4: i.indisidentity, ^ That's me. At some point indisidentity was renamed to indisreplident. Patch attached (also renaming a variable that didn't cause problems but wasn't named consistently anymore). Ah, thanks, will commit. Shouldn't we have at least one \d of an index in the regression tests somewhere? Not that that excuses stupid mitakes, but it'd be helpful nonetheless. Seems like a good idea, will add one of those too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
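Such a regression test addition could be as small as the following (a sketch only - the exact index chosen and the expected output are up to whoever commits it):

    -- In an existing regression test file, exercise \d on an index so that
    -- psql's index-describing query gets coverage.
    \d tenk1_thous_tenthous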
Re: [HACKERS] additional json functionality
On 11/14/2013 12:20 AM, Josh Berkus wrote: Merlin, I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though. It's not just a matter of performance. It's the basic conflict of JSON as document format vs. JSON as data storage. For the latter, unique, unordered keys are required, or certain functionality isn't remotely possible: indexing, in-place key update, transformations, etc. XML went through the same thing, which is part of how we got a bunch of incompatible dialects of XML. Now, your use case does show us that there's a case to be made for still having text JSON even after we have binary JSON. text-json could easily be a domain (text + a check that it is convertible to json); maybe it is even possible to teach pg_upgrade to do this automatically. There's a strong simplicity argument against that, though ... I think it confuses most people, similar to how storing 1+1 as processing instructions instead of just evaluating it and storing 2 :) OTOH we are in this mess now and have to solve the backwards compatibility somehow. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
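A minimal sketch of the domain idea floated above (the name json_text and the exact check are illustrative only, not a proposal; note the cast does the validation, so malformed input fails with a cast error rather than a plain CHECK violation):

CREATE DOMAIN json_text AS text
    CHECK (value IS NULL OR (value::json) IS NOT NULL);

SELECT '{"a":1,"a":2}'::json_text;   -- accepted, text preserved as-is
SELECT '{"a":1,'::json_text;         -- rejected: invalid input syntax for type json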
Re: [HACKERS] additional json functionality
On 11/14/2013 01:42 AM, Andrew Dunstan wrote: On 11/13/2013 07:01 PM, Hannu Krosing wrote: I guess we should not replace current JSON type with hstore based one, but add something json-like based on nested hstore instead. Well, that's two voices for that course of action. I am not really for it (I would have liked to have a json_object/json_structure instead of json_string as the meaning of json) but I think there is quite strong argument for not breaking backwards compatibility. Interesting that I don't think I heard a single voice for this either at pgCon or pgOpen, I attended neither, but I did voice my preferences for _not_ having the json-as-source-code type on the mailing lists during previous json discussions. although I spent large amounts of time at both talking to people about Json, so I'd be interested to hear more voices. It would actually simplify things in a way if we do that - we've been working on a way of doing this that wouldn't upset pg_upgrade. This would render that effort unnecessary. I wonder how hard it would be to rename current json to json_source and have a new nested-hstore based json ? However it will complicate things for users who will have to choose between the data types, and function authors who will possibly have to write versions of functions to work with both types. You mostly want the functions for json-object type. This is supported by the fact that current functions on json-source treat it as json-object (for example key lookup gives you the value of latest key and not a list of all matching key values). You may want some new functions on json-source (maybe json_source_enumerate_key_values(json, key)) but the current ones are really for json-object. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing ha...@2ndquadrant.com wrote: This is supported by the fact that current functions on json-source treat it as json-object (for example key lookup gives you the value of latest key and not a list of all matching key values). yeah. hm. that's a good point. Maybe there's a middle ground here: I bet the compatibility issues would be minimized to an acceptable level if the 'xxx_to_json' functions maintained their current behaviors; they would construct the json type in a special internal mode that would behave like the current type does. In other words, the marshalling into binary structure could happen when:
*) stored to a column in a table
*) when any modifying routine is called, updating a key, value, etc
*) manually via a function
but not at cast time. This preserves compatibility for the important points and allows serialization of structures that are difficult with the binary mode variant. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Ideas of printing out the alternative paths
When searching all the possible paths of executing a query, the optimizer finds and saves the cheapest paths for the top level rel. I'd like to check out all the paths the optimizer has ever considered, which I believe, are stored in the pathlist of the top level rel. But I do not have an idea of how to print out these paths to see them visually. Does anyone have an idea how I can achieve this? Thanks, Zhan
Re: [HACKERS] additional json functionality
On 11/14/2013 04:07 PM, Merlin Moncure wrote: On Wed, Nov 13, 2013 at 6:01 PM, Hannu Krosing ha...@2ndquadrant.com wrote: I guess we should not replace current JSON type with hstore based one, but add something json-like based on nested hstore instead. Maybe call it jsdoc or jdoc or jsobj or somesuch. This is exactly what needs to be done, full stop (how about: hstore). hstore has completely different i/o formats and thus has similar backwards compatibility problems. It really comes down to this: changing the serialization behaviors It is really not serialisation behaviours as there is nothing you can sensibly serialise to have repeated keys. I agree that you can generate such JSON which would be valid input to any json parser, but no JavaScript Object which really serializes to such JSON. that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. we should have disallowed this from the beginning and should have encouraged using text as storage for JavaScript source code. It's going to break applications I've currently put into production with no clear workaround. we could rename the old json type during pg_upgrade, but this would likely break at least implicit casts in functions. This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that and when serializing you are not in control of the parser. You could choose a sane serializer ;) The main argument here is still whether json is source code or serialization result for JavaScript Object (Notation). Had the json type been stuffed into an extension, there would be a clearer path to get to where you want to go since we could have walled off the old functionality and introduced side by side API calls. As things stand now, I don't see a clean path to do that. I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. But one could argue that this is not json either but rather some json-like input format for special parsers. Current recommendation is to use text for these kinds of things. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though. It's not just a matter of performance. It's the basic conflict of JSON as document format vs. JSON as data storage. For the latter, unique, unordered keys are required, or certain functionality isn't remotely possible: indexing, in-place key update, transformations, etc. All these would be possible if we redefined json as another notation for XML instead of string representation of JavaScript Object :) And things could really be in-place only inside pl/language functions, as PostgreSQL is still MVCC. What should be faster is access to nested values, though I suspect that it is not significantly faster unless you have very large json documents. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] nested hstore patch
On 11/14/2013 01:47 PM, Andrew Dunstan wrote: On 11/14/2013 03:21 AM, Hannu Krosing wrote: On 11/14/2013 01:32 AM, David E. Wheeler wrote: On Nov 13, 2013, at 3:59 PM, Hannu Krosing ha...@2ndquadrant.com wrote: I remember strong voices in support of *not* normalising json, so that things like {a:1,a:true, a:b, a:none} would go through the system unaltered, for claimed standard usage of json as processing instructions. That is, as source code which can possibly be converted to a JavaScript Object, and not something that would come out of serialising any existing JavaScript Object. My recollection from PGCon was that there was consensus to normalize on the way in -- Great news! I remember advocating this approach in the mailing lists but having been out-voted based on current real-world usage out there :) or at least, if we switched to a binary representation as proposed by Oleg & Teodor, it was not worth the hassle to try to keep it. Very much agree. For the source code approach I'd recommend the text type with maybe a check that it is possible to convert it to json. I don't think you and David are saying the same thing. AIUI he wants one JSON type and is prepared to discard text preservation (duplicate keys and key order). You want two json types, one of which would feature text preservation. I actually *want* the same thing that David wants, but I think that Merlin has valid concerns about backwards compatibility. As we have let this behaviour in, it is not nice to break several uses of it now. If we could somehow turn old json into a text domain with a json syntax check (which it really is up to 9.3) via pg_upgrade that would be great. It would then require pg_dump to have some switch to output a different typename in CREATE TABLE and similar. Correct me if I'm wrong. cheers andrew -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On 11/14/2013 05:06 PM, Merlin Moncure wrote: On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing ha...@2ndquadrant.com wrote: This is supported by the fact that current functions on json-source treat it as json-object (for example key lookup gives you the value of latest key and not a list of all matching key values). yeah. hm. that's a good point. Maybe there's a middle ground here: I bet the compatibility issues would be minimized to an acceptable level if the 'xxx_to_json' functions maintained their current behaviors; they would construct the json type in a special internal mode that would behave like the current type does. Do you have any xxx_to_json usage which can generate a field with multiple equal keys? Or is it just about preserving order? In other words, the marshalling into binary structure could happen when:
*) stored to a column in a table
*) when any modifying routine is called, updating a key, value, etc
*) manually via a function
but not at cast time. This preserves compatibility for the important points and allows serialization of structures that are difficult with the binary mode variant. Seems like this would not play nice with how the PostgreSQL type system works in general, but could be a way forward if you say that you really do not need to store the order-preserving, multi-valued json. But in this case it could also be possible for these functions to just generate json-format text, and with proper casts this would act exactly as you describe above, no? -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
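As a concrete illustration of the behaviour under discussion (output sketched from the behaviour described upthread: row_to_json emits keys in column order and does not de-duplicate column names):

-- key order follows column order in the emitted text
SELECT row_to_json(t) FROM (SELECT 2 AS b, 1 AS a) t;
--  {"b":2,"a":1}

-- duplicate output column names become duplicate keys
SELECT row_to_json(t) FROM (SELECT 1 AS a, 2 AS a) t;
--  {"a":1,"a":2}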
Re: [HACKERS] Ideas of printing out the alternative paths
Zhan Li zhanl...@gmail.com writes: When searching all the possible paths of executing a query, the optimizer finds and saves the cheapest paths for the top level rel. I'd like to check out all the paths the optimizer has ever considered, which I believe, are stored in the pathlist of the top level rel. No, most of them have been thrown away long before that. See add_path. Also realize that in a large join problem, a lot of potential plans never get explicitly considered, because the input paths get pruned before we get to considering the join rel at all. (If this were not so, planning would take too long.) People have experimented with having add_path print something about each path that's fed to it, but the output tends to be voluminous and not all that useful. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimize kernel readahead using buffer access strategy
On Thu, Nov 14, 2013 at 9:09 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: Hi, I created a patch that improves disk reads and OS file caching. It can optimize the kernel readahead parameter using the buffer access strategy and posix_fadvise() in various disk-read situations. When I compiled the HEAD code with this patch on MacOS, I got the following error and warnings.

gcc -O0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -I../../../../src/include -c -o fd.o fd.c
fd.c: In function 'BufferHintIOAdvise':
fd.c:1182: error: 'POSIX_FADV_SEQUENTIAL' undeclared (first use in this function)
fd.c:1182: error: (Each undeclared identifier is reported only once
fd.c:1182: error: for each function it appears in.)
fd.c:1185: error: 'POSIX_FADV_RANDOM' undeclared (first use in this function)
make[4]: *** [fd.o] Error 1
make[3]: *** [file-recursive] Error 2
make[2]: *** [storage-recursive] Error 2
make[1]: *** [install-backend-recurse] Error 2
make: *** [install-src-recurse] Error 2

tablecmds.c:9120: warning: passing argument 5 of 'smgrread' makes pointer from integer without a cast
bufmgr.c:455: warning: passing argument 5 of 'smgrread' from incompatible pointer type

Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Thu, Nov 14, 2013 at 7:11 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, Oct 23, 2013 at 8:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Hmm, now if we had portable atomic addition, so that we could spare the spinlock ... That certainly seems like an interesting possibility. I think that pg_stat_statements should be made to do this kind of thing by a third party tool that aggregates snapshots of deltas. Time-series data, including (approximate) *local* minima and maxima should be built from that. I think tools like KONDO-san's pg_statsinfo tool have an important role to play here. I would like to see it or a similar tool become a kind of defacto standard for consuming pg_stat_statements' output. At this point we are in general very much chasing diminishing returns by adding new things to the counters struct, particularly given that it's currently protected by a spinlock. And adding a histogram or min/max for something like execution time isn't an approach that can be made to work for every existing cost tracked by pg_stat_statements. So, taking all that into consideration, I'm afraid this patch gets a -1 from me. Agreed. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
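To make the "aggregate snapshots of deltas" idea concrete, a third-party collector can already do something along these lines against the existing view (a sketch only; the table name is illustrative, and a real tool would key on an exposed query hash once one is available, taking snapshots every few seconds):

-- create an empty snapshot table with matching columns
CREATE TABLE pgss_snap AS
    SELECT now() AS snap_time, userid, dbid, query, calls, total_time
    FROM pg_stat_statements WHERE false;

-- take a snapshot (run periodically)
INSERT INTO pgss_snap
    SELECT now(), userid, dbid, query, calls, total_time FROM pg_stat_statements;

-- deltas between two snapshots are then new.calls - old.calls and
-- new.total_time - old.total_time, joined on (userid, dbid, query)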
Re: [HACKERS] Ideas of printing out the alternative paths
Thank you for your reply Tom. Then a) what are exactly stored in the pathlist of top level rel? Paths worth considering? b) I have been struggling to come up with a way to print the Path struct. If I can print a path the way like A hash join (B nested loop join C), that would be great. You mentioned people have printed something about each path, can you please give me a hint of what's that and how to achieve that? On Thu, Nov 14, 2013 at 12:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Zhan Li zhanl...@gmail.com writes: When searching all the possible paths of executing a query, the optimizer finds and saves the cheapest paths for the top level rel. I'd like to check out all the paths the optimizer has ever considered, which I believe, are stored in the pathlist of the top level rel. No, most of them have been thrown away long before that. See add_path. Also realize that in a large join problem, a lot of potential plans never get explicitly considered, because the input paths get pruned before we get to considering the join rel at all. (If this were not so, planning would take too long.) People have experimented with having add_path print something about each path that's fed to it, but the output tends to be voluminous and not all that useful. regards, tom lane
Re: [HACKERS] GIN improvements part2: fast scan
On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 28.06.2013 22:31, Alexander Korotkov wrote: Now, I got the point of three state consistent: we can keep only one consistent in opclasses that support new interface. exact true and exact false values will be passed in the case of current patch consistent; exact false and unknown will be passed in the case of current patch preConsistent. That's reasonable. I'm going to mark this as returned with feedback. For the next version, I'd like to see the API changed per above. Also, I'd like us to do something about the tidbitmap overhead, as a separate patch before this, so that we can assess the actual benefit of this patch. And a new test case that demonstrates the I/O benefits. Revised version of the patch is attached. Changes are as follows: 1) Patch rebased against packed posting lists; it does not depend on additional information now. 2) New API with tri-state logic is introduced. -- With best regards, Alexander Korotkov. gin-fast-scan.6.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On Thu, Nov 14, 2013 at 10:54 AM, Hannu Krosing ha...@2ndquadrant.com wrote: On 11/14/2013 05:06 PM, Merlin Moncure wrote: On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing ha...@2ndquadrant.com wrote: This is supported by the fact that current functions on json-source treat it as json-object (for example key lookup gives you the value of latest key and not a list of all matching key values). yeah. hm. that's a good point. Maybe there's a middle ground here: I bet the compatibility issues would be minimized to an acceptable level if the 'xxx_to_json' functions maintained their current behaviors; they would construct the json type in a special internal mode that would behave like the current type does. Do you have any xxx_to_json usage which can generate a field with multiple equal keys? Absolutely -- that's what I've been saying all along. For example: IIRC the end consumer is jqgrid, although the structure format may be being done to satisfy some intermediate transformations perhaps in GWT or in the browser itself. The point is I didn't define the structure (I think it sucks too), it was given to me to create and I did. The object's dynamic keys and values are moved into the json structure by passing two parallel arrays into a userland function similar to what Andrew is proposing with json_build functionality.

{
  "classDisplayName": null,
  "rows": [
    {
      "PropertyName": "xxx",
      "Row": 1,
      "Group": "Executive Dashboard",
      "MetricName": "Occupancy",
      "2012": "95.4%",
      "Q2": "96.5%",
      "Q3": "96.3%",
      "Q4": "94.8%",
      "2013": "95.1%",
      "Q2": "94.1%",
      "Q3": "96.0%",
      "Q4": "96.1%"
    },
    {
      "PropertyName": "xxx",
      "Row": 2,
      "Group": "Executive Dashboard",
      "MetricName": "Occupancy",
      "2012": "95.9%",
      "Q2": "97.3%",
      "Q3": "95.7%",
      "Q4": "95.2%",
      "2013": "93.9%",
      "Q2": "93.4%",
      "Q3": "95.3%",
      "Q4": "95.1%"
    }
  ]
}

but not at cast time. This preserves compatibility for the important points and allows serialization of structures that are difficult with the binary mode variant. Seems like this would not play nice with how the PostgreSQL type system works in general, but could be a way forward if you say that you really do not need to store the order-preserving, multi-valued json. Yes, exactly. I'm OK with simplifying the structure for storage purposes because in that context postgres is the parser and gets to decide what the precise behaviors are. Simplifying the stored structures during upgrade is an OK concession to make, I think. It is not safe to assume the structure should be simplified when serializing. But in this case it could also be possible for these functions to just generate json-format text, and with proper casts this would act exactly as you describe above, no? I think so, if I'm following you correctly. Maybe you get the best of both worlds and (mostly) maintain compatibility by deferring the decomposition into binary structure in certain contexts. I'd even throw in the equality operator (which, thankfully, we haven't defined yet) as a place where decomposition could happen. Pretty much any scenario that isn't involved in raw assembly and output. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
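For concreteness, a userland builder along the lines described above can be sketched in plain SQL today (json_build here is a hypothetical helper name, not the function being proposed elsewhere in this thread); because it only assembles and validates text, it preserves whatever key order and duplicates the caller supplies:

CREATE FUNCTION json_build(keys text[], vals text[]) RETURNS json AS $$
    SELECT ('{' ||
            string_agg(to_json(keys[i])::text || ':' || to_json(vals[i])::text,
                       ',' ORDER BY i) ||
            '}')::json
    FROM generate_subscripts(keys, 1) AS i;
$$ LANGUAGE sql;

SELECT json_build(ARRAY['Q2','Q3','Q2'], ARRAY['96.5%','96.3%','94.1%']);
--  {"Q2":"96.5%","Q3":"96.3%","Q2":"94.1%"}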
[HACKERS] AWS RDS now supports PostgreSQL!
http://aws.typepad.com/aws/2013/11/amazon-rds-for-postgresql-now-available.html (The Free Tier page has not been updated yet, but I believe PostgreSQL should also be free for new AWS users: http://aws.amazon.com/free/ ) Rayson == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/ http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmonc...@gmail.com wrote: This is exactly what needs to be done, full stop (how about: hstore). It really comes down to this: changing the serialization behaviors that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. It's going to break applications I've currently put into production with no clear workaround. This is quite frankly not ok and and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that and when serializing you are not in control of the parser. The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed? Just because people have come to depend on something doesn’t mean we can’t change it. It’s one thing if we said this was a feature you could depend on, but AFAIK we haven’t. And frankly, the dupes have caused problems for some of my colleagues at work. To me, it’s a bug (or, at best, a mis-feature) that causes more issues than it prevents. In my experience, no JSON parser guarantees key order or duplication. You can’t have dupes and there is no ordering in a Perl hash, Objective-C NSDictionary, or JavaScript object. There is of course order and there can be dupes in a JSON string, but not in the objects built from it. If you go in and out of a parser, dupes are eliminated and key order is not preserved. I expect the same from JSON storage. With no guarantees of preserved ordering or duplication, and with no formal expectation of such by JSON parsers written for various programming languages, I think there is little to be lost by removing those aspects of the JSON type. For those (hopefully rare) situations where such expectations exist, the JSON should be stored as text, as Hannu suggests. My $0.02. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
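The round-trip point is easy to see with the type as it stands: the stored text keeps everything, but as soon as a value is extracted the duplicate is resolved (per the behaviour described upthread, the later key wins); a quick sketch:

-- storage is verbatim text ...
SELECT '{"a":1,"a":2}'::json;
--  {"a":1,"a":2}

-- ... but field extraction behaves like a parser and yields a single value
SELECT '{"a":1,"a":2}'::json -> 'a';
--  2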
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On Thu, Nov 14, 2013 at 9:08 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: Please find attached the patch, for adding a new option for pg_basebackup, to specify a different directory for pg_xlog. Sounds good! Here are the review comments:

+    printf(_("      --xlogdir=XLOGDIR  location for the transaction log directory\n"));

This message is not aligned well.

-    if (!streamwal || strcmp(filename + strlen(filename) - 8, "/pg_xlog") != 0)
+    if ((!streamwal && (strcmp(xlog_dir, "") == 0))
+        || strcmp(filename + strlen(filename) - 8, "/pg_xlog") != 0)

You need to update the source code comment.

+#ifdef HAVE_SYMLINK
+    if (symlink(xlog_dir, linkloc) != 0)
+    {
+        fprintf(stderr, _("%s: could not create symbolic link \"%s\": %s\n"),
+                progname, linkloc, strerror(errno));
+        exit(1);
+    }
+#else
+    fprintf(stderr, _("%s: symlinks are not supported on this platform "
+                      "cannot use xlogdir"));
+    exit(1);
+#endif
+}

Is it better to call pg_free() at the end? Even if we don't do that, it would be almost harmless, though. Don't we need to prevent users from specifying the same directory in both --pgdata and --xlogdir? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LISTEN / NOTIFY enhancement request for Postgresql
On Thu, Oct 24, 2013 at 11:41:57AM -0400, Sev Zaslavsky wrote: Here is an example implementation: http://activemq.apache.org/nms/activemq-wildcards.html

. is used to separate names in a path
* is used to match any name in a path
> is used to recursively match any destination starting from this name

For example using the example above, these subscriptions are possible:

Subscription            Meaning
PRICE.>                 Any price for any product on any exchange
PRICE.STOCK.>           Any price for a stock on any exchange
PRICE.STOCK.NASDAQ.*    Any stock price on NASDAQ
PRICE.STOCK.*.IBM       Any IBM stock price on any exchange

My request is to implement the same or similar feature in Postgresql. This does seem useful and pretty easy to implement. Should we add a TODO? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] AWS RDS now supports PostgreSQL!
On Thu, Nov 14, 2013 at 12:29:58PM -0500, Rayson Ho wrote: http://aws.typepad.com/aws/2013/11/amazon-rds-for-postgresql-now-available.html (The Free Tier page has not been updated yet, but I believe PostgreSQL should also be free for new AWS users: http://aws.amazon.com/free/ ) Here is the Postgres AWS page: http://aws.amazon.com/rds/postgresql/ -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional information on log_line_prefix
On Fri, Oct 25, 2013 at 02:58:12PM -0400, Andrew Dunstan wrote: On 10/25/2013 01:50 PM, Emanuel Calvo wrote: Hi guys, I'm working on a quick conversion script for query reviews and I wonder if a feature request to add the following values will be possible:

%D = duration
%L = lock_time (lock only for this query)
%E = estimated rows
%R = total rows returned
%B = total bytes sent
%T = temporal tables used

Those prefixes/values are just examples/proposals. Thanks for the hard work! Pretty much all of this can be got with the auto_explain module, and I think that's where it belongs. The other problem is that these mostly have meaning for a line generated by a completed query, but not for the many other lines that can appear in that log file. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
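For reference, the auto_explain route needs no log_line_prefix changes at all; something along these lines (shown per-session here, though the settings are normally put in postgresql.conf) logs duration, actual row counts, and buffer usage for each completed query:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;   -- log every completed statement
SET auto_explain.log_analyze = on;       -- include actual times and row counts
SET auto_explain.log_buffers = on;       -- include buffer/I-O counters
SELECT count(*) FROM pg_class;           -- plan and stats now appear in the server log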
Re: [HACKERS] additional json functionality
On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler da...@justatheory.com wrote: On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmonc...@gmail.com wrote: This is exactly what needs to be done, full stop (how about: hstore). It really comes down to this: changing the serialization behaviors that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. It's going to break applications I've currently put into production with no clear workaround. This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that and when serializing you are not in control of the parser. The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed? It doesn't, but the row_to_json function has a very clear mechanism of action. And, 'not being documented' is not the standard for latitude to make arbitrary changes to existing function behaviors. In my experience, no JSON parser guarantees key order or duplication. I found one in about two seconds. http://docs.python.org/2/library/json.html object_pairs_hook, if specified will be called with the result of every JSON object decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict() will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority. That makes the rest of your argument moot. Plus, I quite clearly am dealing with parsers that do. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ideas of printing out the alternative paths
Zhan Li zhanl...@gmail.com writes: Thank you for your reply Tom. Then a) what are exactly stored in the pathlist of top level rel? Paths worth considering? b) I have been struggling to come up with a way to print the Path struct. If I can print a path the way like A hash join (B nested loop join C), that would be great. You mentioned people have printed something about each path, can you please give me a hint of what's that and how to achieve that? I don't think anyone's tried anything much smarter than src/backend/nodes/outfuncs.c, or there's some more limited stuff at the bottom of src/backend/optimizer/path/allpaths.c. Reassembling into something more human-readable than that would probably take some work. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On 11/14/2013 08:17 PM, Merlin Moncure wrote: On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler da...@justatheory.com wrote: On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmonc...@gmail.com wrote: This is exactly what needs to be done, full stop (how about: hstore). It really comes down to this: changing the serialization behaviors that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. It's going to break applications I've currently put into production with no clear workaround. This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that and when serializing you are not in control of the parser. The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed? It doesn't, but the row_to_json function has a very clear mechanism of action. And, 'not being documented' is not the standard for latitude to make arbitrary changes to existing function behaviors. the whole hash*() function family was changed based on a 'not documented' premise, so we do have a precedent. In my experience, no JSON parser guarantees key order or duplication. I found one in about two seconds. http://docs.python.org/2/library/json.html object_pairs_hook, if specified will be called with the result of every JSON object decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict() will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority. That makes the rest of your argument moot. Plus, I quite clearly am dealing with parsers that do. I am sure you could also devise a json encoding scheme where white space is significant ;) The question is how much of it the json *type* should support. As discussed in the other thread, most of your requirements would be met by having json/row/row set-to-text serializer functions which output json-formatted text. Then if you actually want to save this as an easy-to-manipulate json document, you can save this text to a field of type json, which does de-duplication and loses order. So my suggestion is to upgrade the existing json data type to text - or maybe json_text with a format check - when upgrading to 9.4, to change the current functions which output json to output text, and have a new json type which stores a proper JavaScript-Object-like structure. I would like to go a step further and have it automatically support not only the json data types as data but all postgresql data types by including the type oid in the binary encoding, but this is probably not something for json but rather for a new pgdoc data type in 9.5 Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
Hannu Krosing wrote: On 11/14/2013 08:17 PM, Merlin Moncure wrote: On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler <david@...> wrote: On Nov 14, 2013, at 7:07 AM, Merlin Moncure <mmoncure@...> wrote: This is exactly what needs to be done, full stop (how about: hstore). It really comes down to this: changing the serialization behaviors that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. The current json format is a minimally conforming (i.e., does not enforce the 'should not contain duplicates' suggestion) structured json validating type that stores its input as-is once validated. Its presence is going to probably cause difficulties with function API for reasons already mentioned but its place in the core type-library is already firmly established. Andrew's API additions seem like good things to have for this type. I haven't seen any comments on this but do these functions facilitate creating json that can have duplicates and that maintains order? Even if we accept input to json with these limitations we are not obligated to make our own json output minimally conforming - though we should at least maintain such if it is already in place. So my suggestion is to upgrade the existing json data type to text - or maybe json_text with a format check - when upgrading to 9.4, to change the current functions which output json to output text, and have a new json type which stores a proper JavaScript-Object-like structure. Technically a down-grade but anyway... How does this work with a pg_dump/pg_restore upgrade? If we want to have maximally conforming json type(s) we can still create them. I'd say we'd still want two versions, similar in a way to how we have bytea and text even though any text can technically be stored like bytea. The constructor API for both would want to be identical with the only real difference being that text->json_source would be layout preserving (i.e., validation only) while text->json_binary would be a true parsing conversion. Likewise json_source->text would output the same input while json_binary->text would output the canonical form (pretty-printing and such would need to be initiated via functions). If things are going to be a little more complex anyway why not just go and toss in the kitchen sink too? This way we provide maximal flexibility. From a development perspective some features (indexes, equality, in-place updates and related modification API) may only make sense on a subset of the available types but trade-offs are a fact of life. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/additional-json-functionality-tp5777975p5778406.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On 14.11.2013 19:26, Alexander Korotkov wrote: On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 28.06.2013 22:31, Alexander Korotkov wrote: Now, I got the point of three state consistent: we can keep only one consistent in opclasses that support new interface. exact true and exact false values will be passed in the case of current patch consistent; exact false and unknown will be passed in the case of current patch preConsistent. That's reasonable. I'm going to mark this as returned with feedback. For the next version, I'd like to see the API changed per above. Also, I'd like us to do something about the tidbitmap overhead, as a separate patch before this, so that we can assess the actual benefit of this patch. And a new test case that demonstrates the I/O benefits. Revised version of the patch is attached. Changes are as follows: 1) Patch rebased against packed posting lists; it does not depend on additional information now. 2) New API with tri-state logic is introduced. Thanks! A couple of thoughts after a 5-minute glance:

* documentation

* How about defining the tri-state consistent function to also return a tri-state? True would mean that the tuple definitely matches, false means the tuple definitely does not match, and Unknown means it might match. Or does return value true with recheck==true have the same effect? If I understood the patch right, returning Unknown or True wouldn't actually make any difference, but it's conceivable that we might come up with more optimizations in the future that could take advantage of that. For example, for a query like foo OR (bar AND baz), you could immediately return any tuples that match foo, and not bother scanning for bar and baz at all.

- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LISTEN / NOTIFY enhancement request for Postgresql
Bruce Momjian br...@momjian.us writes:

. is used to separate names in a path
* is used to match any name in a path
> is used to recursively match any destination starting from this name

For example using the example above, these subscriptions are possible:

Subscription            Meaning
PRICE.>                 Any price for any product on any exchange
PRICE.STOCK.>           Any price for a stock on any exchange
PRICE.STOCK.NASDAQ.*    Any stock price on NASDAQ
PRICE.STOCK.*.IBM       Any IBM stock price on any exchange

My request is to implement the same or similar feature in Postgresql. This does seem useful and pretty easy to implement. Should we add a TODO? I think we should consider the ltree syntax in that case, as documented in the following link: http://www.postgresql.org/docs/9.3/interactive/ltree.html Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
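ltree's lquery already expresses this kind of matching; a quick illustration (real ltree syntax, shown only as an analogy for what channel matching could look like, the labels being the ones from the example above):

CREATE EXTENSION ltree;
SELECT 'PRICE.STOCK.NASDAQ.IBM'::ltree ~ 'PRICE.STOCK.*'::lquery;        -- true: any stock on any exchange
SELECT 'PRICE.STOCK.NASDAQ.IBM'::ltree ~ 'PRICE.STOCK.*{1}.IBM'::lquery; -- true: IBM on any single exchange
SELECT 'PRICE.STOCK.NASDAQ.IBM'::ltree ~ 'PRICE.BOND.*'::lquery;         -- false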
Re: [HACKERS] Transaction-lifespan memory leak with plpgsql DO blocks
I wrote: Robert Haas robertmh...@gmail.com writes: I'm not volunteering to spend time fixing this, but I disagree with the premise that it isn't worth fixing, because I think it's a POLA violation. Yeah, I'm not terribly comfortable with letting it go either. Attached is a proposed patch. I couldn't see any nice way to do it without adding a field to PLpgSQL_execstate, so this isn't a feasible solution for back-patching (it'd break the plpgsql debugger). However, given the infrequency of complaints, I think fixing it in 9.4 and up is good enough. This patch crashed and burned when I tried it on a case where a DO block traps an exception :-(. I had thought that a private econtext stack was the right thing to do, but it isn't because we need plpgsql_subxact_cb to be able to clean up dead econtexts in either functions or DO blocks. So after some experimentation I came up with version 2, attached. The additional test case I was using looks like do $outer$ begin for i in 1..10 loop begin execute $e$ do $$ declare x int = 0; begin x := 1 / x; end; $$; $e$; exception when division_by_zero then null; end; end loop; end; $outer$; If you try this with the patch, you'll notice that there's still a slow leak, but that's not the fault of DO blocks: the same leak exists if you transpose this code into a regular function. That leak is the fault of exec_stmt_dynexecute, which copies the querystring into the function's main execution context, from which it won't get freed if an error is thrown out of SPI_execute. (If we were using EXECUTE USING, the ppd structure would get leaked too.) There are some other similar error- case leaks in other plpgsql statements, I think. I'm not excited about trying to clean those up as part of this patch. regards, tom lane diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index bc31fe9..f5f1892 100644 *** a/src/pl/plpgsql/src/pl_exec.c --- b/src/pl/plpgsql/src/pl_exec.c *** typedef struct *** 66,71 --- 66,80 * so that we don't have to re-prepare simple expressions on each trip through * a function. (We assume the case to optimize is many repetitions of a * function within a transaction.) + * + * However, there's no value in trying to amortize simple expression setup + * across multiple executions of a DO block (inline code block), since there + * can never be any. If we use the shared EState for a DO block, the expr + * state trees are effectively leaked till end of transaction, and that can + * add up if the user keeps on submitting DO blocks. Therefore, each DO block + * has its own simple-expression EState, which is cleaned up at exit from + * plpgsql_inline_handler(). DO blocks still use the simple_econtext_stack, + * though, so that subxact abort cleanup does the right thing. */ typedef struct SimpleEcontextStackEntry { *** typedef struct SimpleEcontextStackEntry *** 74,80 struct SimpleEcontextStackEntry *next; /* next stack entry up */ } SimpleEcontextStackEntry; ! static EState *simple_eval_estate = NULL; static SimpleEcontextStackEntry *simple_econtext_stack = NULL; / --- 83,89 struct SimpleEcontextStackEntry *next; /* next stack entry up */ } SimpleEcontextStackEntry; ! static EState *shared_simple_eval_estate = NULL; static SimpleEcontextStackEntry *simple_econtext_stack = NULL; / *** static int exec_stmt_dynfors(PLpgSQL_exe *** 136,142 static void plpgsql_estate_setup(PLpgSQL_execstate *estate, PLpgSQL_function *func, ! 
ReturnSetInfo *rsi); static void exec_eval_cleanup(PLpgSQL_execstate *estate); static void exec_prepare_plan(PLpgSQL_execstate *estate, --- 145,152 static void plpgsql_estate_setup(PLpgSQL_execstate *estate, PLpgSQL_function *func, ! ReturnSetInfo *rsi, ! EState *simple_eval_estate); static void exec_eval_cleanup(PLpgSQL_execstate *estate); static void exec_prepare_plan(PLpgSQL_execstate *estate, *** static char *format_preparedparamsdata(P *** 230,239 /* -- * plpgsql_exec_function Called by the call handler for *function execution. * -- */ Datum ! plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo) { PLpgSQL_execstate estate; ErrorContextCallback plerrcontext; --- 240,256 /* -- * plpgsql_exec_function Called by the call handler for *function execution. + * + * This is also used to execute inline code blocks (DO blocks). The only + * difference that this code is aware of is that for a DO block, we want + * to use a private simple_eval_estate, which is created and passed in by + * the caller. For regular functions, pass NULL,
Re: [HACKERS] Assertions in PL/PgSQL
rebased patch Regards Pavel 2013/11/14 Peter Eisentraut pete...@gmx.net On Wed, 2013-10-09 at 18:57 +0200, Pavel Stehule wrote: here is a patch for RAISE WHEN clause Your patch needs to be rebased. diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml index ca2c2b5..d6845d7 100644 --- a/doc/src/sgml/plpgsql.sgml +++ b/doc/src/sgml/plpgsql.sgml @@ -1753,9 +1753,7 @@ BEGIN -- Since execution is not finished, we can check whether rows were returned -- and raise exception if not. -IF NOT FOUND THEN -RAISE EXCEPTION 'No flight at %.', $1; -END IF; +RAISE EXCEPTION 'No flight at %.', $1 WHEN NOT FOUND; RETURN; END @@ -3376,11 +3374,11 @@ END LOOP optional replaceablelabel/replaceable /optional; raise errors. synopsis -RAISE optional replaceable class=parameterlevel/replaceable /optional 'replaceable class=parameterformat/replaceable' optional, replaceable class=parameterexpression/replaceable optional, ... /optional/optional optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional /optional; -RAISE optional replaceable class=parameterlevel/replaceable /optional replaceable class=parametercondition_name/ optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional /optional; -RAISE optional replaceable class=parameterlevel/replaceable /optional SQLSTATE 'replaceable class=parametersqlstate/' optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional /optional; -RAISE optional replaceable class=parameterlevel/replaceable /optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional; -RAISE ; +RAISE optional replaceable class=parameterlevel/replaceable /optional 'replaceable class=parameterformat/replaceable' optional, replaceable class=parameterexpression/replaceable optional, ... /optional/optional optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional /optional optional WHEN replaceableboolean-expression/replaceable /optional ; +RAISE optional replaceable class=parameterlevel/replaceable /optional replaceable class=parametercondition_name/ optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional /optional optional WHEN replaceableboolean-expression/replaceable /optional; +RAISE optional replaceable class=parameterlevel/replaceable /optional SQLSTATE 'replaceable class=parametersqlstate/' optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... /optional /optional optional WHEN replaceableboolean-expression/replaceable /optional; +RAISE optional replaceable class=parameterlevel/replaceable /optional USING replaceable class=parameteroption/replaceable = replaceable class=parameterexpression/replaceable optional, ... 
/optional optional WHEN replaceableboolean-expression/replaceable /optional; +RAISE optional WHEN replaceableboolean-expression/replaceable /optional ; /synopsis The replaceable class=parameterlevel/replaceable option specifies diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index bc31fe9..edb6105 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -2874,6 +2874,20 @@ exec_stmt_raise(PLpgSQL_execstate *estate, PLpgSQL_stmt_raise *stmt) char *err_schema = NULL; ListCell *lc; + /* check condition when is entered */ + if (stmt-cond != NULL) + { + bool value; + bool isnull; + + value = exec_eval_boolean(estate, stmt-cond, isnull); + exec_eval_cleanup(estate); + + /* ignore statement, when result of condition is false or NULL */ + if (isnull || value == false) + return PLPGSQL_RC_OK; + } + /* RAISE with no parameters: re-throw current exception */ if (stmt-condname == NULL stmt-message == NULL stmt-options == NIL) diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y index f112282..a4d7035 100644 --- a/src/pl/plpgsql/src/pl_gram.y +++ b/src/pl/plpgsql/src/pl_gram.y @@ -63,6 +63,7 @@ static void current_token_is_not_variable(int tok); static PLpgSQL_expr *read_sql_construct(int until, int until2, int until3, + int until4, const char *expected, const char *sqlstart, bool isexpression, @@ -105,7 +106,7 @@ static void check_labels(const char *start_label, int end_location); static PLpgSQL_expr *read_cursor_args(PLpgSQL_var *cursor, int until, const char *expected); -static List *read_raise_options(void); +static List *read_raise_options(int *tok); %} @@ -1386,6 +1387,7 @@ for_control :
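For readers skimming the patch, the net user-visible change is the optional WHEN clause on RAISE; a usage sketch matching the documentation hunk above (the flights table and its departure column are hypothetical, and the WHEN clause requires this patch):

CREATE OR REPLACE FUNCTION get_flight(ts timestamptz) RETURNS void AS $$
BEGIN
    PERFORM 1 FROM flights WHERE departure = ts;
    -- with the patch, the IF NOT FOUND / RAISE / END IF dance collapses to:
    RAISE EXCEPTION 'No flight at %.', ts WHEN NOT FOUND;
END;
$$ LANGUAGE plpgsql;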
Re: [HACKERS] Extra functionality to createuser
On Thu, Nov 14, 2013 at 5:41 AM, Sameer Thakur samthaku...@gmail.com wrote: So i think -g option is failing Right you are. I was missing a g: in the getopt_long() call. Attached is a revised patch that handles that. And it behaves better: postgres@cbbrowne ~/p/s/b/scripts ./createuser -g purge_role -U postgres newuser4 postgres@cbbrowne ~/p/s/b/scripts pg_dumpall -g | grep newuser4 CREATE ROLE newuser4; ALTER ROLE newuser4 WITH NOSUPERUSER INHERIT NOCREATEROLE NOCREATEDB LOGIN NOREPLICATION; GRANT purge_role TO newuser4 GRANTED BY postgres; -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this? diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml index 2f1ea2f..5fedc80 100644 --- a/doc/src/sgml/ref/createuser.sgml +++ b/doc/src/sgml/ref/createuser.sgml @@ -131,6 +131,16 @@ PostgreSQL documentation /varlistentry varlistentry + termoption-g//term + termoption--roles//term + listitem + para +Indicates roles to associate with this role. + /para + /listitem + /varlistentry + + varlistentry termoption-i//term termoption--inherit//term listitem diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c index d1542d9..c469b52 100644 --- a/src/bin/scripts/createuser.c +++ b/src/bin/scripts/createuser.c @@ -47,6 +47,7 @@ main(int argc, char *argv[]) {pwprompt, no_argument, NULL, 'P'}, {encrypted, no_argument, NULL, 'E'}, {unencrypted, no_argument, NULL, 'N'}, + {roles, required_argument, NULL, 'g'}, {NULL, 0, NULL, 0} }; @@ -57,6 +58,7 @@ main(int argc, char *argv[]) char *host = NULL; char *port = NULL; char *username = NULL; + char *roles = NULL; enum trivalue prompt_password = TRI_DEFAULT; boolecho = false; boolinteractive = false; @@ -83,7 +85,7 @@ main(int argc, char *argv[]) handle_help_version_opts(argc, argv, createuser, help); - while ((c = getopt_long(argc, argv, h:p:U:wWedDsSaArRiIlLc:PEN, + while ((c = getopt_long(argc, argv, h:p:U:g:wWedDsSaArRiIlLc:PEN, long_options, optindex)) != -1) { switch (c) @@ -112,6 +114,9 @@ main(int argc, char *argv[]) case 'D': createdb = TRI_NO; break; + case 'g': + roles = pg_strdup(optarg); + break; case 's': case 'a': superuser = TRI_YES; @@ -302,6 +307,8 @@ main(int argc, char *argv[]) appendPQExpBuffer(sql, NOREPLICATION); if (conn_limit != NULL) appendPQExpBuffer(sql, CONNECTION LIMIT %s, conn_limit); + if (roles != NULL) + appendPQExpBuffer(sql, IN ROLE %s, roles); appendPQExpBuffer(sql, ;\n); if (echo) @@ -334,6 +341,7 @@ help(const char *progname) printf(_( -D, --no-createdb role cannot create databases (default)\n)); printf(_( -e, --echoshow the commands being sent to the server\n)); printf(_( -E, --encrypted encrypt stored password\n)); + printf(_( -g, --roles roles to associate with this new role\n)); printf(_( -i, --inherit role inherits privileges of roles it is a\n member of (default)\n)); printf(_( -I, --no-inherit role does not inherit privileges\n)); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] union all query consumes all memory
Hello, one of my customers reported an out of memory issue. After investigation he found that the main problem is a large query that uses a lot of UNION ALL subqueries. He wrote a self test:

do $$
declare
  i integer;
  str text = '';
begin
  for i in 1..1000 loop
    str := str || 'union all select i,i,i from generate_series(1,5) g(i) ';
  end loop;
  execute 'select 1,2,3 ' || str;
end;
$$

Is this expected behaviour? Tested on PostgreSQL 9.1, 9.2, 9.3. It looks like all generated data are kept in memory. Regards Pavel Stehule
Re: [HACKERS] union all query consumes all memory
Pavel Stehule pavel.steh...@gmail.com writes: one my customer reported a out of memory issue. After investigation he found a main problem in large query that uses a lot of union all queries. He wrote a self test: do $$ declare i integer; str text=''; begin for i in 1..1000 loop str := str || 'union all select i,i,i from generate_series(1,5) g(i) '; end loop; execute 'select 1,2,3 ' || str; end; $$ is it expected behave? Well, that query is entitled to use 1000 * work_mem for the generate_series output tuplestores. What's your work_mem setting? Perhaps more to the point, what was your customer doing? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
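In other words, each of the roughly 1000 function-scan tuplestores in the generated statement is allowed its own work_mem, so the practical mitigation for such a session is a smaller per-node budget (or fewer UNION ALL arms); for example:

-- cap the per-node memory budget before running the generated query
SET work_mem = '1MB';
SHOW work_mem;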
[HACKERS] strncpy is not a safe version of strcpy
Hi All, As a bit of a background task, over the past few days I've been analysing the uses of strncpy in the code just to try and validate if it is the right function to be using. I've already seen quite a few places where their usage is wrongly assumed. As many of you will know and maybe some of you have forgotten that strncpy is not a safe version of strcpy. It is also quite an inefficient way to copy a string to another buffer as strncpy will 0 out any space that happens to remain in the buffer. If there is no space left after the copy then the buffer won't end with a 0. It is likely far better explained here -- http://www.courtesan.com/todd/papers/strlcpy.html For example , the following 2 lines in jsonfuncs.c memset(name, 0, NAMEDATALEN); strncpy(name, fname, NAMEDATALEN); The memset here is redundant as strncpy will null the remaining buffer. This example is not dangerous, but it does highlight that there's code that's made the final cut which made this wrong assumption about strncpy. I was not going to bring this to light until I had done some more analysis, but there was just a commit which added a usage of strncpy that really looks like it should be a strlcpy. I'll continue with my analysis, but perhaps posting this early will bring something to light which I've not yet realised. Regards David Rowley
[HACKERS] Anybody using get_eclass_for_sort_expr in an extension?
I looked into bug #8591: http://www.postgresql.org/message-id/e1vgk41-00050x...@wrigleys.postgresql.org and was able to reproduce the problem. The proximate cause is that get_eclass_for_sort_expr is wrong to suppose that it can always create new equivalence class entries with empty em_nullable_relids; in the case where the call is coming from initialize_mergeclause_eclasses, it's essential to use the info supplied in restrictinfo-nullable_relids. Otherwise, quals deduced from the equivalence class may have wrong nullable_relids settings, allowing them to be pushed to unsafe places in 9.2 and later. This is relatively easy to fix as in the attached patch, but I'm wondering how risky this is to back-patch. We can deal with the field added to PlannerInfo by sticking it at the end of the struct in the back branches; we've done that before and not heard squawks. However, the signature change for get_eclass_for_sort_expr() would be problematic if any third-party code is calling that function directly. I'm inclined to think there probably isn't any such code, since none of the existing calls are in index-specific code or other places that people would have been likely to copy and modify. Still, I can't claim that the risk is zero. We could hack around that in the back branches in more or less ugly ways, such as by making get_eclass_for_sort_expr() be a wrapper around a new function. However, it's far from clear what such a wrapper ought to do --- it could pass NULL for nullable_relids, which would match the existing behavior, but the odds seem good that the existing behavior might be wrong for whatever an extension is trying to do. And having the code text diverge between HEAD and back branches isn't so appetizing anyway. I'm a bit inclined to take the risk of breaking anything that's calling get_eclass_for_sort_expr() directly. Thoughts? regards, tom lane diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 817b149..b39927e 100644 *** a/src/backend/nodes/outfuncs.c --- b/src/backend/nodes/outfuncs.c *** _outPlannerInfo(StringInfo str, const Pl *** 1698,1703 --- 1698,1704 WRITE_UINT_FIELD(query_level); WRITE_NODE_FIELD(plan_params); WRITE_BITMAPSET_FIELD(all_baserels); + WRITE_BITMAPSET_FIELD(nullable_baserels); WRITE_NODE_FIELD(join_rel_list); WRITE_INT_FIELD(join_cur_level); WRITE_NODE_FIELD(init_plans); diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 711b161..baddd34 100644 *** a/src/backend/optimizer/path/equivclass.c --- b/src/backend/optimizer/path/equivclass.c *** add_eq_member(EquivalenceClass *ec, Expr *** 510,515 --- 510,522 * equivalence class it is a member of; if none, optionally build a new * single-member EquivalenceClass for it. * + * expr is the expression, and nullable_relids is the set of base relids + * that are potentially nullable below it. We actually only care about + * the set of such relids that are used in the expression; but for caller + * convenience, we perform that intersection step here. The caller need + * only be sure that nullable_relids doesn't omit any nullable rels that + * might appear in the expr. + * * sortref is the SortGroupRef of the originating SortGroupClause, if any, * or zero if not. (It should never be zero if the expression is volatile!) 
* *** add_eq_member(EquivalenceClass *ec, Expr *** 538,543 --- 545,551 EquivalenceClass * get_eclass_for_sort_expr(PlannerInfo *root, Expr *expr, + Relids nullable_relids, List *opfamilies, Oid opcintype, Oid collation, *** get_eclass_for_sort_expr(PlannerInfo *ro *** 545,550 --- 553,559 Relids rel, bool create_it) { + Relids expr_relids; EquivalenceClass *newec; EquivalenceMember *newem; ListCell *lc1; *** get_eclass_for_sort_expr(PlannerInfo *ro *** 556,561 --- 565,576 expr = canonicalize_ec_expression(expr, opcintype, collation); /* + * Get the precise set of nullable relids appearing in the expression. + */ + expr_relids = pull_varnos((Node *) expr); + nullable_relids = bms_intersect(nullable_relids, expr_relids); + + /* * Scan through the existing EquivalenceClasses for a match */ foreach(lc1, root-eq_classes) *** get_eclass_for_sort_expr(PlannerInfo *ro *** 629,636 if (newec-ec_has_volatile sortref == 0) /* should not happen */ elog(ERROR, volatile EquivalenceClass has no sortref); ! newem = add_eq_member(newec, copyObject(expr), pull_varnos((Node *) expr), ! NULL, false, opcintype); /* * add_eq_member doesn't check for volatile functions, set-returning --- 644,651 if (newec-ec_has_volatile sortref == 0) /* should not happen */ elog(ERROR, volatile EquivalenceClass has no sortref); !
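For the record, the back-branch "wrapper" option described (and then argued against) above would amount to something like the following sketch; the new function name is purely illustrative, and passing NULL simply preserves the old, possibly wrong, behaviour for third-party callers:

    /*
     * Sketch only: keep the old symbol and signature in the back branches,
     * forwarding a NULL nullable_relids to the renamed function.
     */
    EquivalenceClass *
    get_eclass_for_sort_expr(PlannerInfo *root,
                             Expr *expr,
                             List *opfamilies,
                             Oid opcintype,
                             Oid collation,
                             Index sortref,
                             Relids rel,
                             bool create_it)
    {
        return get_eclass_for_sort_expr_extended(root, expr,
                                                 NULL,    /* nullable_relids unknown */
                                                 opfamilies, opcintype, collation,
                                                 sortref, rel, create_it);
    }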
[HACKERS] SSL: better default ciphersuite
Attached patch changes the default ciphersuite to HIGH:!aNULL instead of old DEFAULT:!LOW:!EXP:!MD5:@STRENGTH where DEFAULT is a shortcut for ALL:!aNULL:!eNULL. Main goal is to leave low-level ciphersuite details to OpenSSL guys and give clear impression to Postgres admins what it is about. Compared to old value, new value will remove all suites with RC4 and SEED from ciphersuite list. If OpenSSL is compiled with support for SSL2, it will include following suite: DES-CBC3-MD5, usable only for SSL2 connections. Tested with OpenSSL 0.9.7 - 1.0.1, using openssl ciphers -v ... command. Values used --- HIGH: Contains only secure and well-researched algorithms. !aNULL Needed to disable suites that do not authenticate server. DEFAULT includes !aNULL by default. Values not used --- !MD5 This affects only one suite: DES-CBC3-MD5, which is available only for SSL2 connections. So it would only pollute the default value. @STRENGTH The OpenSSL cipher list is already sorted by humans, it's unlikely that mechanical sort would improve things. Also the existence of this value in old list is rather dubious, as server cipher order was never respected anyway. -- marko diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index ffc69c7..d4e6c52 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -3144,7 +3144,7 @@ static struct config_string ConfigureNamesString[] = }, SSLCipherSuites, #ifdef USE_SSL - DEFAULT:!LOW:!EXP:!MD5:@STRENGTH, + HIGH:!aNULL, #else none, #endif diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 34a2d05..e6b7f9a 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -79,7 +79,7 @@ #authentication_timeout = 1min # 1s-600s #ssl = off# (change requires restart) -#ssl_ciphers = 'DEFAULT:!LOW:!EXP:!MD5:@STRENGTH' # allowed SSL ciphers +#ssl_ciphers = 'HIGH:!aNULL' # allowed SSL ciphers # (change requires restart) #ssl_renegotiation_limit = 512MB # amount of data between renegotiations #ssl_cert_file = 'server.crt' # (change requires restart) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Anybody using get_eclass_for_sort_expr in an extension?
On Thu, Nov 14, 2013 at 3:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'm a bit inclined to take the risk of breaking anything that's calling get_eclass_for_sort_expr() directly. Thoughts? It's worth being aware of the fact that Peter E's Jenkins instance seems to track regressions for some popular third party extensions: http://pgci.eisentraut.org/jenkins/view/Extensions/ Looking at what he is currently testing, these extensions seem unlikely to fall afoul of your changes. Just something to be aware of going forward. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] init_sequence spill to hash table
On Fri, Nov 15, 2013 at 3:03 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 14.11.2013 14:38, David Rowley wrote: I've just completed some more benchmarking of this. I didn't try dropping the threshold down to 2 or 0 but I did tests at the cut over point and really don't see much difference in performance between the list at 32 and the hashtable at 33 sequences. The hash table version excels in the 16000 sequence test in comparison to the unpatched version. Times are in milliseconds of the time it took to call currval() 10 times for 1 sequence. Patched Unpatched increased by 1 in cache 1856.452 1844.11 -1% 32 in cache 1841.84 1802.433 -2% 33 in cache 1861.558 not tested N/A 16000 in cache 1963.711 10329.22 426% If I understand those results correctly, the best case scenario with the current code takes about 1800 ms. There's practically no difference with N = 32, where N is the number of sequences touched. The hash table method also takes about 1800 ms when N=33. The performance of the hash table is O(1), so presumably we can extrapolate from that that it's the same for any N. I think that means that we should just completely replace the list with the hash table. The difference with a small N is lost in noise, so there's no point in keeping the list as a fast path for small N. That'll make the patch somewhat simpler. - Heikki I had thought that maybe the biggest type of workloads might only touch 1 or 2 sequences, though it may be small but I had thought there would be an overhead in both cycles and memory usage in creating a hash table for these light usages of sequence backends. It would certainly make the patch more simple by removing this and it would also mean that I could remove the sometimes unused -next member from the SeqTableData struct which is just now set to NULL when in hash table mode. If you think it's the way to go then I can make the change, though maybe I'll hold off the refactor for now as it looks like other ideas have come up around rel cache. Regards David Rowley
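To make the hash-table side of the discussion concrete, a rough sketch of a dynahash-based seqtable keyed by the sequence's relation OID might look as follows; the simplified entry struct and names are illustrative, not the actual patch:

    #include "postgres.h"
    #include "utils/hsearch.h"

    /* Simplified stand-in for SeqTableData, keyed by the sequence's OID. */
    typedef struct SeqTableEntry
    {
        Oid         relid;          /* hash key - must be first */
        int64       cached;         /* last value handed out in this backend */
        /* ... other per-sequence state ... */
    } SeqTableEntry;

    static HTAB *seqhashtab = NULL;

    static SeqTableEntry *
    get_seq_entry(Oid relid)
    {
        SeqTableEntry *entry;
        bool        found;

        if (seqhashtab == NULL)
        {
            HASHCTL     ctl;

            MemSet(&ctl, 0, sizeof(ctl));
            ctl.keysize = sizeof(Oid);
            ctl.entrysize = sizeof(SeqTableEntry);
            ctl.hash = oid_hash;
            seqhashtab = hash_create("Sequence values", 16, &ctl,
                                     HASH_ELEM | HASH_FUNCTION);
        }

        entry = (SeqTableEntry *) hash_search(seqhashtab, &relid,
                                              HASH_ENTER, &found);
        if (!found)
        {
            /* first touch of this sequence in this backend: initialize */
            entry->cached = 0;
        }
        return entry;
    }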
Re: [HACKERS] strncpy is not a safe version of strcpy
On 15 Listopad 2013, 0:07, David Rowley wrote: Hi All, As a bit of a background task, over the past few days I've been analysing the uses of strncpy in the code just to try and validate if it is the right function to be using. I've already seen quite a few places where their usage is wrongly assumed. As many of you will know and maybe some of you have forgotten that strncpy is not a safe version of strcpy. It is also quite an inefficient way to copy a string to another buffer as strncpy will 0 out any space that happens to remain in the buffer. If there is no space left after the copy then the buffer won't end with a 0. It is likely far better explained here -- http://www.courtesan.com/todd/papers/strlcpy.html For example , the following 2 lines in jsonfuncs.c memset(name, 0, NAMEDATALEN); strncpy(name, fname, NAMEDATALEN); Be careful with 'Name' data type - it's not just a simple string buffer. AFAIK it needs to work with hashing etc. so the zeroing is actually needed here to make sure two values produce the same result. At least that's how I understand the code after a quick check - for example this is from the same jsonfuncs.c you mentioned: memset(fname, 0, NAMEDATALEN); strncpy(fname, NameStr(tupdesc-attrs[i]-attname), NAMEDATALEN); hashentry = hash_search(json_hash, fname, HASH_FIND, NULL); So the zeroing is on purpose, although if strncpy does that then the memset is probably superflous. Either people do that because of habit / copy'n'paste, or maybe there are supported platforms when strncpy does not behave like this for some reason. I seriously doubt this inefficiency is going to be measurable in real world. If the result was a buffer-overflow bug, that'd be a different story, but maybe we could check the ~120 calls to strncpy in the whole code base and replace it with strlcpy where appropriate. That being said, thanks for looking into things like this. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] strncpy is not a safe version of strcpy
On Fri, Nov 15, 2013 at 12:33 PM, Tomas Vondra t...@fuzzy.cz wrote: It is likely far better explained here -- http://www.courtesan.com/todd/papers/strlcpy.html For example , the following 2 lines in jsonfuncs.c memset(name, 0, NAMEDATALEN); strncpy(name, fname, NAMEDATALEN); Be careful with 'Name' data type - it's not just a simple string buffer. AFAIK it needs to work with hashing etc. so the zeroing is actually needed here to make sure two values produce the same result. At least that's how I understand the code after a quick check - for example this is from the same jsonfuncs.c you mentioned: memset(fname, 0, NAMEDATALEN); strncpy(fname, NameStr(tupdesc-attrs[i]-attname), NAMEDATALEN); hashentry = hash_search(json_hash, fname, HASH_FIND, NULL); So the zeroing is on purpose, although if strncpy does that then the memset is probably superflous. Either people do that because of habit / copy'n'paste, or maybe there are supported platforms when strncpy does not behave like this for some reason. I had not thought of the fact the some platforms don't properly implement strncpy(). On quick check http://man.he.net/man3/strncpy seems to indicate that this behaviour is part of the C89 standard. So does this mean we can always assume that all supported platforms always 0 out the remaining buffer? I seriously doubt this inefficiency is going to be measurable in real world. If the result was a buffer-overflow bug, that'd be a different story, but maybe we could check the ~120 calls to strncpy in the whole code base and replace it with strlcpy where appropriate. The example was more of a demonstration of wrong assumption rather than wasted cycles. Though the wasted cycles was on my mind a bit too. I was more focused on trying to draw a bit of attention to commit 061b88c732952c59741374806e1e41c1ec845d50 which uses strncpy and does not properly set the last byte to 0 afterwards. I think this case could just be replaced with strlcpy which does all this hard work for us. Regards David Rowley That being said, thanks for looking into things like this. Tomas
[HACKERS] Review: Patch insert throw error when year field len > 4 for timestamptz datatype
Initial review of the patch submitted in this message and listed in the current CommitFest: http://www.postgresql.org/message-id/cagpqqf3xwwc_4fhinz_g6ecvps_ov3k2pe4-aj1dg4iyy+f...@mail.gmail.com This patch would seem to be already committed here http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7778ddc7a2d5b006edbfa69cdb44b8d8c24ec1ff Is a review necessary at this point? -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch insert throw error when year field len > 4 for timestamptz datatype
On Thu, Nov 14, 2013 at 4:53 PM, Adrian Klaver adrian.kla...@gmail.com wrote: Is a review necessary at this point? No. Just mark it committed. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] init_sequence spill to hash table
On Fri, Nov 15, 2013 at 3:12 AM, Andres Freund and...@2ndquadrant.comwrote: Hi, On 2013-11-13 22:55:43 +1300, David Rowley wrote: Here http://www.postgresql.org/message-id/24278.1352922...@sss.pgh.pa.usthere was some talk about init_sequence being a bottleneck when many sequences are used in a single backend. The attached I think implements what was talked about in the above link which for me seems to double the speed of a currval() loop over 3 sequences. It goes from about 7 seconds to 3.5 on my laptop. I think it'd be a better idea to integrate the sequence caching logic into the relcache. There's a comment about it: * (We can't * rely on the relcache, since it's only, well, a cache, and may decide to * discard entries.) but that's not really accurate anymore. We have the infrastructure for keeping values across resets and we don't discard entries. I just want to check this idea against an existing todo item to move sequences into a single table, as I think by the sounds of it this binds sequences being relations even closer together. The todo item reads: Consider placing all sequences in a single table, or create a system view This had been on the back of my mind while implementing the hash table stuff for init_sequence and again when doing my benchmarks where I created 3 sequences and went through the pain of having a path on my file system with 3 8k files. It sounds like your idea overlaps with this todo a little, so maybe this is a good idea to decide which would be best, though the more I think about it, the more I think that moving sequences into a single table is a no-go So for implementing moving sequences into a single system table: 1. The search_path stuff makes this a bit more complex. It sounds like this would require some duplication of the search_path logic. 2. There is also the problem with tracking object dependency. Currently: create sequence t_a_seq; create table t (a int not null default nextval('t_a_seq')); alter sequence t_a_seq owned by t.a; drop table t; drop sequence t_a_seq; -- already deleted by drop table t ERROR: sequence t_a_seq does not exist Moving sequences to a single table sounds like a special case for this logic. 3. Would moving sequences to a table still have to check that no duplicate object existed in the pg_class? Currently you can't have a sequence with the same name as a table create sequence a; create table a (a int); ERROR: relation a already exists Its not that I'm trying to shoot holes in moving sequences to a single table, really I'd like find a way to improve the wastefulness these 1 file per sequence laying around my file system, but if changing this is a no-go then it would be better to come off the todo list and then we shouldn't as many issues pouring more concrete in the sequences being relations mould. Regards David Rowley
Re: [HACKERS] Review: Patch insert throw error when year field len > 4 for timestamptz datatype
On Thu, Nov 14, 2013 at 04:58:55PM -0800, Peter Geoghegan wrote: On Thu, Nov 14, 2013 at 4:53 PM, Adrian Klaver adrian.kla...@gmail.com wrote: Is a review necessary at this point? No. Just mark it committed. Oh, sorry, I didn't realize it was in the commit-fest app. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimize kernel readahead using buffer access strategy
Hi Claudio, (2013/11/14 22:53), Claudio Freire wrote: On Thu, Nov 14, 2013 at 9:09 AM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: I create a patch that is improvement of disk-read and OS file caches. It can optimize kernel readahead parameter using buffer access strategy and posix_fadvice() in various disk-read situations. In general OS, readahead parameter was dynamically decided by disk-read situations. If long time disk-read was happened, readahead parameter becomes big. However it is based on experienced or heuristic algorithm, it causes waste disk-read and throws out useful OS file caches in some case. It is bad for disk-read performance a lot. It would be relevant to know which kernel did you use for those tests. I use CentOS 6.4 which kernel version is 2.6.32-358.23.2.el6.x86_64 in this test. A while back, I tried to use posix_fadvise to prefetch index pages. I search your past work. Do you talk about this ML-thread? Or is there another latest discussion? I see your patch is interesting, but it wasn't submitted to CF and stopping discussions. http://www.postgresql.org/message-id/CAGTBQpZzf70n0PYJ=VQLd+jb3wJGo=2txmy+skjd6g_vjc5...@mail.gmail.com I ended up finding out that interleaving posix_fadvise with I/O like that severly hinders (ie: completely disables) the kernel's read-ahead algorithm. Your patch becomes maximum readahead, when a sql is selected index range scan. Is it right? I think that your patch assumes that pages are ordered by index-data. This assumption is partially wrong. If your assumption is true, we don't need CLUSTER command. In actuary, CLUSTER command becomes better performance than nothing. How exactly did you set up those benchmarks? pg_bench defaults? My detail test setting is under following, * Server info CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz (2U/12C) RAM: 6GB - I reduced it intentionally in OS paraemter, because large memory tests have long time. HDD: SEAGATE Model: ST2000NM0001 @ 7200rpm * 1 RAID: none. * postgresql.conf(summarized) shared_buffers = 600MB (10% of RAM = 6GB) work_mem = 1MB maintenance_work_mem = 64MB wal_level = archive fsync = on archive_mode = on checkpoint_segments = 300 checkpoint_timeout = 15min checkpoint_completion_target = 0.7 * pgbench settings pgbench -j 4 -c 32 -T 600 pgbench pg_bench does not exercise heavy sequential access patterns, or long index scans. It performs many single-page index lookups per transaction and that's it. Yes, your argument is right. And it is also a fact that performance becomes better in these situations. You may want to try your patch with more real workloads, and maybe you'll confirm what I found out last time I messed with posix_fadvise. If my experience is still relevant, those patterns will have suffered a severe performance penalty with this patch, because it will disable kernel read-ahead on sequential index access. It may still work for sequential heap scans, because the access strategy will tell the kernel to do read-ahead, but many other access methods will suffer. The decisive difference with your patch is that my patch uses buffer hint control architecture, so it can control readahaed smarter in some cases. However, my patch is on the way and needed to more improvement. I am going to add method of controlling readahead by GUC, for user can freely select readahed parameter in their transactions. Try OLAP-style queries. I have DBT-3(TPC-H) benchmark tools. If you don't like TPC-H, could you tell me good OLAP benchmark tools? 
Regards, -- Mitsumasa KONDO NTT Open Source Software -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimize kernel readahead using buffer access strategy
(2013/11/15 2:03), Fujii Masao wrote: On Thu, Nov 14, 2013 at 9:09 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: Hi, I create a patch that is improvement of disk-read and OS file caches. It can optimize kernel readahead parameter using buffer access strategy and posix_fadvice() in various disk-read situations. When I compiled the HEAD code with this patch on MacOS, I got the following error and warnings. gcc -O0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -I../../../../src/include -c -o fd.o fd.c fd.c: In function 'BufferHintIOAdvise': fd.c:1182: error: 'POSIX_FADV_SEQUENTIAL' undeclared (first use in this function) fd.c:1182: error: (Each undeclared identifier is reported only once fd.c:1182: error: for each function it appears in.) fd.c:1185: error: 'POSIX_FADV_RANDOM' undeclared (first use in this function) make[4]: *** [fd.o] Error 1 make[3]: *** [file-recursive] Error 2 make[2]: *** [storage-recursive] Error 2 make[1]: *** [install-backend-recurse] Error 2 make: *** [install-src-recurse] Error 2 tablecmds.c:9120: warning: passing argument 5 of 'smgrread' makes pointer from integer without a cast bufmgr.c:455: warning: passing argument 5 of 'smgrread' from incompatible pointer type Thanks you for your report! I will fix it. Could you tell me your Mac OS version and gcc version? I have only mac book air with Maverick OS(10.9). Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimize kernel readahead using buffer access strategy
On Thu, Nov 14, 2013 at 6:18 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: I will fix it. Could you tell me your Mac OS version and gcc version? I have only mac book air with Maverick OS(10.9). I have an idea that Mac OSX doesn't have posix_fadvise at all. Didn't you use the relevant macros so that the code at least builds on those platforms? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
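A minimal sketch of the kind of guard being suggested here; the helper name is made up, and in the backend the check would normally also involve the configure result (USE_POSIX_FADVISE in core). The advice call is compiled only where the platform provides posix_fadvise() and the relevant POSIX_FADV_* constants, and becomes a no-op elsewhere, so the patch would still build on OS X:

    #include <stdbool.h>
    #include <fcntl.h>              /* posix_fadvise(), where available */

    /*
     * Hypothetical helper: hint the kernel about the expected access pattern
     * for an open file descriptor.  Does nothing on platforms that lack
     * posix_fadvise() or the advice constants.
     */
    static int
    advise_access_pattern(int fd, bool sequential)
    {
    #if defined(POSIX_FADV_SEQUENTIAL) && defined(POSIX_FADV_RANDOM)
        return posix_fadvise(fd, 0, 0,
                             sequential ? POSIX_FADV_SEQUENTIAL : POSIX_FADV_RANDOM);
    #else
        (void) fd;
        (void) sequential;
        return 0;                   /* silently do nothing */
    #endif
    }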
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2013/11/14 7:11), Peter Geoghegan wrote: On Wed, Oct 23, 2013 at 8:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Hmm, now if we had portable atomic addition, so that we could spare the spinlock ... And adding a histogram or min/max for something like execution time isn't an approach that can be made to work for every existing cost tracked by pg_stat_statements. So, taking all that into consideration, I'm afraid this patch gets a -1 from me. It is confirmation just to make sure, does this patch mean my patch? I agree with you about not adding another lock implementation. It will becomes overhead. Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Thu, Nov 14, 2013 at 9:09 AM, Fujii Masao masao.fu...@gmail.com wrote: I think that pg_stat_statements should be made to do this kind of thing by a third party tool that aggregates snapshots of deltas. Time-series data, including (approximate) *local* minima and maxima should be built from that. I think tools like KONDO-san's pg_statsinfo tool have an important role to play here. I would like to see it or a similar tool become a kind of defacto standard for consuming pg_stat_statements' output. Agreed. I would like to hear others' thoughts on how to proceed here. Has anyone actually tried and failed with an approach that uses aggressive aggregation of snapshots/deltas? If that's something that is problematic, let's hear why. Maybe pg_stat_statements could do better when snapshots are aggressively taken by tools at fixed intervals. Most obviously, we could probably come up with a way better interface for tools that need deltas only, where a consumer registers interest in receiving snapshots. We could store a little bitmap structure in shared memory, and set bits as corresponding pg_stat_statements entries are dirtied, and only send query texts the first time. I am still very much of the opinion that we need to expose queryid and so on. I lost that argument several times, but it seems like there is strong demand from it from many places - I've had people that I don't know talk to me about it at conferences. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimize kernel readahead using buffer access strategy
(2013/11/15 11:17), Peter Geoghegan wrote: On Thu, Nov 14, 2013 at 6:18 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: I will fix it. Could you tell me your Mac OS version and gcc version? I have only mac book air with Maverick OS(10.9). I have an idea that Mac OSX doesn't have posix_fadvise at all. Didn't you use the relevant macros so that the code at least builds on those platforms? Thank you for your nice advice, too. I try to fix macro program. Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Thu, Nov 14, 2013 at 6:28 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: It is confirmation just to make sure, does this patch mean my patch? I agree with you about not adding another lock implementation. It will becomes overhead. Yes, I referred to your patch. I don't want to go down this road, because aggregation and constructing a timeline feels like the job of another tool. I am concerned about local minima and maxima. Even with the ability to reset min/max independently, you can't do so for each entry individually. And this approach won't scale to a histogram or more sophisticated ways of characterizing distribution, particularly not multiplicatively for things other than execution time (blocks hit and so on) - that spinlock needs to be held for very little time indeed to preserve pg_stat_statements current low overhead. As I said above, lets figure out how to have your tool or a similar tool acquire snapshots inexpensively and frequently instead. That is a discussion I'd be happy to have. IMO pg_stat_statements ought to be as cheap as possible, and do one thing well - aggregate fixed-unit costs cumulatively. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] information schema parameter_default implementation
Updated patch attached. On Sat, 2013-11-09 at 12:09 +0530, Amit Khandekar wrote: 2) I found the following check a bit confusing, maybe you can make it better if (!argmodes || argmodes[i] == PROARGMODE_IN || argmodes[i] == PROARGMODE_INOUT || argmodes[i] == PROARGMODE_VARIADIC) Factored that out into a separate helper function. The statement needs to be split into 80 columns width lines. done 3) Function level comment for pg_get_function_arg_default() is missing. I think the purpose of the function is clear. Unless a reader goes through the definition, it is not obvious whether the second function argument represents the parameter position in input parameters or it is the parameter position in *all* the function parameters (i.e. proargtypes or proallargtypes). I think at least a one-liner description of what this function does should be present. done 4) You can also add comments inside the function, for example the comment for the line: nth = inputargn - 1 - (proc-pronargs - proc-pronargdefaults); Suggestion? Yes, it is difficult to understand what's the logic behind this calculation, until we go read the documentation related to pg_proc.proargdefaults. I think this should be sufficient: /* * proargdefaults elements always correspond to the last N input arguments, * where N = pronargdefaults. So calculate the nth_default accordingly. */ done There should be an initial check to see if nth_arg is passed a negative value. It is used as an index into the argmodes array, so a -ve array index can cause a crash. Because pg_get_function_arg_default() can now be called by a user also, we need to validate the input values. I am ok with returning with an error Invalid argument. done --- Instead of : deparse_expression_pretty(node, NIL, false, false, 0, 0) you can use : deparse_expression(node, NIL, false, false) done From 442911934c1640af18a83ef2efaafc45c569c98f Mon Sep 17 00:00:00 2001 From: Peter Eisentraut pete...@gmx.net Date: Thu, 14 Nov 2013 21:43:15 -0500 Subject: [PATCH] Implement information_schema.parameters.parameter_default column Reviewed-by: Ali Dar ali.munir@gmail.com Reviewed-by: Amit Khandekar amit.khande...@enterprisedb.com --- doc/src/sgml/information_schema.sgml| 9 +++ src/backend/catalog/information_schema.sql | 9 ++- src/backend/utils/adt/ruleutils.c | 84 + src/include/catalog/catversion.h| 2 +- src/include/catalog/pg_proc.h | 2 + src/include/utils/builtins.h| 1 + src/test/regress/expected/create_function_3.out | 33 +- src/test/regress/sql/create_function_3.sql | 24 +++ 8 files changed, 160 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/information_schema.sgml b/doc/src/sgml/information_schema.sgml index 22e17bb..22f43c8 100644 --- a/doc/src/sgml/information_schema.sgml +++ b/doc/src/sgml/information_schema.sgml @@ -3323,6 +3323,15 @@ titleliteralparameters/literal Columns/title in future versions.) /entry /row + + row + entryliteralparameter_default/literal/entry + entrytypecharacter_data/type/entry + entry + The default expression of the parameter, or null if none or if the + function is not owned by a currently enabled role. 
+ /entry + /row /tbody /tgroup /table diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql index c5f7a8b..fd706e3 100644 --- a/src/backend/catalog/information_schema.sql +++ b/src/backend/catalog/information_schema.sql @@ -1133,10 +1133,15 @@ CREATE VIEW parameters AS CAST(null AS sql_identifier) AS scope_schema, CAST(null AS sql_identifier) AS scope_name, CAST(null AS cardinal_number) AS maximum_cardinality, - CAST((ss.x).n AS sql_identifier) AS dtd_identifier + CAST((ss.x).n AS sql_identifier) AS dtd_identifier, + CAST( + CASE WHEN pg_has_role(proowner, 'USAGE') + THEN pg_get_function_arg_default(p_oid, (ss.x).n) + ELSE NULL END + AS character_data) AS parameter_default FROM pg_type t, pg_namespace nt, - (SELECT n.nspname AS n_nspname, p.proname, p.oid AS p_oid, + (SELECT n.nspname AS n_nspname, p.proname, p.oid AS p_oid, p.proowner, p.proargnames, p.proargmodes, _pg_expandarray(coalesce(p.proallargtypes, p.proargtypes::oid[])) AS x FROM pg_namespace n, pg_proc p diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 5ffce68..dace05a 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -2266,6 +2266,90 @@ print_function_arguments(StringInfo buf, HeapTuple proctup, return argsprinted; } +static bool +is_input_argument(int nth, const char
[HACKERS] Database disconnection and switch inside a single bgworker
Hi all, Currently, bgworkers offer the possibility to connect to a given database using BackgroundWorkerInitializeConnection in bgworker.h, but there is actually no way to disconnect from a given database inside the same bgworker process. One of the use cases for that would be the possibility to have the same bgworker performing for example some analysis on a database A, like some analysis of statistics using a common database like postgres, and then perform some actions on another database like, let's imagine an ANALYSE on a given relation only on database B. Using the infrastructure of 9.4 as of now, it would be possible of course to have a bgworker process launching some other child processes dynamically on different databases, but users (including me) might not want to do that all the time. Database disconnection would be also pretty cool for things like parallel query processing using a pool of bgworker processes that all the backends could use in parallel as a single ressource. Note that I didn't have a look at the code yet to see how it would be possible to do that (looks tricky though), if any infrastructure is needed or if it could be possible to do that without modifying the core code of Postgres. So, opinions about that as well as additional thoughts are welcome! Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
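To make the limitation concrete, here is a minimal sketch of today's situation (the worker entry point and database names are illustrative): a worker can bind itself to one database when it starts, but there is no call to detach and re-bind to another database from the same process later.

    #include "postgres.h"
    #include "postmaster/bgworker.h"

    /* Hypothetical entry point, registered via RegisterBackgroundWorker(). */
    void
    analysis_worker_main(Datum main_arg)
    {
        /*
         * Bind this process to database A.  This is only valid once per
         * worker lifetime; there is currently no "disconnect" counterpart
         * that would let the same process later re-initialize against
         * database B.
         */
        BackgroundWorkerInitializeConnection("database_a", NULL);

        /* ... run analysis against database A ... */

        /*
         * To touch database B today, the worker has to launch another
         * (e.g. dynamically registered) worker rather than reconnecting
         * itself.
         */
    }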
Re: [HACKERS] Logging WAL when updating hintbit
On Thu, Nov 14, 2013 at 3:53 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Nov 14, 2013 at 3:02 PM, Sawada Masahiko sawada.m...@gmail.com wrote: I attached patch adds new wal_level 'all'. If wal_level is set 'all', the server logs WAL not only when wal_level is set 'hot_standby' ,but also when updating hint bit. That is, we will be able to know all of the changed block number by reading the WALs. Is 'all' a name really suited? It looks too general. I don't have a name on top of my mind but what about something like full_pages or something similar... Yes, I'm worried about name of value. But 'full_pages' sounds good for me. This wal_level is infrastructure for fast failback. (i.g., without fresh backup) It need to cooperate with pg_rewind. I am not sure that using as reason the possible interactions of a contrib module not in core is a reason sufficient to justify the presence of a new wal_level, and pg_rewind is still a young solution that needs to be improved. So such a patch looks premature IMO, but the idea is interesting and might cover many needs for external projects. Not only that, I think it will be profitable infrastructure for differential backup. Yep, agreed. This might help some projects in this area. And it leads to improve performance at standby server side. Because the standby server doesn't update hintbit by itself, but FPW is replicated to standby server and applied. It would be interesting to see some numbers here. I think this patch provide several benefit for feature. The fast failback with pg_rewind is one of them. If I want to provide only the fast failback with pg_rewind, this patch logs too much information. Just logging changed block number is enough for that. As you said pg_rewind is still a young solution. But It very cool and flexible solution. I'm going to improve pg_rewind actively. This is clearly a WIP patch so it does not matter now, but if you submit it later on, be sure to add some comments in bufmgr.c as well as documentation for the new option. Yes, will do. -- Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Autoconf 2.69 update
I'm proposing that we upgrade our Autoconf to 2.69, which is the latest right now (release date 2012-04-24). There are no changes in the source needed, just tweak the version number in configure.in (see below) and run autoreconf. I've compared the configure output before and after on a few boxes, and there were no significant changes. The major benefit of this update is that configure shrinks to about half the size and also runs a bit faster. Maybe at this time, everyone who is interested can check that they have an installation of Autoconf 2.69 around, and then in a few weeks we can do the update. diff --git a/configure.in b/configure.in index a4baeaf..ac61be5 100644 --- a/configure.in +++ b/configure.in @@ -19,7 +19,7 @@ m4_pattern_forbid(^PGAC_)dnl to catch undefined macros AC_INIT([PostgreSQL], [9.4devel], [pgsql-b...@postgresql.org]) -m4_if(m4_defn([m4_PACKAGE_VERSION]), [2.63], [], [m4_fatal([Autoconf version 2.63 is required. +m4_if(m4_defn([m4_PACKAGE_VERSION]), [2.69], [], [m4_fatal([Autoconf version 2.69 is required. Untested combinations of 'autoconf' and PostgreSQL versions are not recommended. You can remove the check from 'configure.in' but it is then your responsibility whether the result works or not.])]) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autoconf 2.69 update
On Fri, Nov 15, 2013 at 12:00 PM, Peter Eisentraut pete...@gmx.net wrote: I'm proposing that we upgrade our Autoconf to 2.69, which is the latest right now (release date 2012-04-24). There are no changes in the source needed, just tweak the version number in configure.in (see below) and run autoreconf. I've compared the configure output before and after on a few boxes, and there were no significant changes. The major benefit of this update is that configure shrinks to about half the size and also runs a bit faster. +1 for the update. Maybe at this time, everyone who is interested can check that they have an installation of Autoconf 2.69 around, and then in a few weeks we can do the update. Archlinux boxes will be happier with that. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] strncpy is not a safe version of strcpy
On 15 Listopad 2013, 1:00, David Rowley wrote: On Fri, Nov 15, 2013 at 12:33 PM, Tomas Vondra t...@fuzzy.cz wrote: It is likely far better explained here -- http://www.courtesan.com/todd/papers/strlcpy.html For example , the following 2 lines in jsonfuncs.c memset(name, 0, NAMEDATALEN); strncpy(name, fname, NAMEDATALEN); Be careful with 'Name' data type - it's not just a simple string buffer. AFAIK it needs to work with hashing etc. so the zeroing is actually needed here to make sure two values produce the same result. At least that's how I understand the code after a quick check - for example this is from the same jsonfuncs.c you mentioned: memset(fname, 0, NAMEDATALEN); strncpy(fname, NameStr(tupdesc-attrs[i]-attname), NAMEDATALEN); hashentry = hash_search(json_hash, fname, HASH_FIND, NULL); So the zeroing is on purpose, although if strncpy does that then the memset is probably superflous. Either people do that because of habit / copy'n'paste, or maybe there are supported platforms when strncpy does not behave like this for some reason. I had not thought of the fact the some platforms don't properly implement strncpy(). On quick check http://man.he.net/man3/strncpy seems to indicate that this behaviour is part of the C89 standard. So does this mean we can always assume that all supported platforms always 0 out the remaining buffer? I don't know about such platform - I was merely speculating about why people might use such code. I seriously doubt this inefficiency is going to be measurable in real world. If the result was a buffer-overflow bug, that'd be a different story, but maybe we could check the ~120 calls to strncpy in the whole code base and replace it with strlcpy where appropriate. The example was more of a demonstration of wrong assumption rather than wasted cycles. Though the wasted cycles was on my mind a bit too. I was Yeah. To be fair, number of occurrences in the code base is not a particularly exact measure of the impact - some of those uses might be used in code paths that are quite busy. more focused on trying to draw a bit of attention to commit 061b88c732952c59741374806e1e41c1ec845d50 which uses strncpy and does not properly set the last byte to 0 afterwards. I think this case could just be replaced with strlcpy which does all this hard work for us. Hmm, you mean this piece of code? strncpy(saved_argv0, argv[0], MAXPGPATH); IMHO you're right that's probably broken, unless there's some checking happening before the call. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
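Assuming saved_argv0 really is a MAXPGPATH-sized buffer, the strlcpy() replacement being suggested would presumably be a one-line change along these lines (a sketch, not the actual fix):

    /* before: result is not NUL-terminated if argv[0] is MAXPGPATH bytes or longer */
    strncpy(saved_argv0, argv[0], MAXPGPATH);

    /* after: always NUL-terminated, truncating the copy if necessary */
    strlcpy(saved_argv0, argv[0], MAXPGPATH);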
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Tue, Nov 5, 2013 at 5:30 AM, Sameer Thakur samthaku...@gmail.com wrote: Hello, Please find attached pg_stat_statements-identification-v9.patch. I took a quick look. Observations: + /* Making query ID dependent on PG version */ + query-queryId |= PG_VERSION_NUM 16; If you want to do something like this, make the value of PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something. Why are you doing this? @@ -128,6 +146,7 @@ typedef struct pgssEntry pgssHashKey key;/* hash key of entry - MUST BE FIRST */ Counterscounters; /* the statistics for this query */ int query_len; /* # of valid bytes in query string */ + uint32 query_id; /* jumble value for this entry */ query_id is already in key. Not sure I like the idea of the new enum at all, but in any case you shouldn't have a PGSS_TUP_LATEST constant - should someone go update all usage of that constant only when your version isn't the latest? Like here: + if (detected_version = PGSS_TUP_LATEST) I forget why Daniel originally altered the min value of pg_stat_statements.max to 1 (I just remember that he did), but I don't think it holds that you should keep it there. Have you considered the failure modes when it is actually set to 1? This is what I call a can't happen error, or a defensive one: + else + { + /* +* Couldn't identify the tuple format. Raise error. +* +* This is an exceptional case that may only happen in bizarre +* situations, since it is thought that every released version +* of pg_stat_statements has a matching schema. +*/ + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg(pg_stat_statements schema is not supported + by its installed binary))); + } I'll generally make these simple elogs(), which are more terse. No one is going to find all that dressing useful. Please take a look at this, for future reference: https://wiki.postgresql.org/wiki/Creating_Clean_Patches The whitespace changes are distracting. It probably isn't useful to comment random, unaffected code that isn't affected by your patch - I don't find this new refactoring useful, and am surprised to see it in your patch: + /* Check header existence and magic number match. */ if (fread(header, sizeof(uint32), 1, file) != 1 || - header != PGSS_FILE_HEADER || - fread(num, sizeof(int32), 1, file) != 1) + header != PGSS_FILE_HEADER) + goto error; + + /* Read how many table entries there are. */ + if (fread(num, sizeof(int32), 1, file) != 1) goto error; Did you mean to add all this, or is it left over from Daniel's patch? @@ -43,6 +43,7 @@ */ #include postgres.h +#include time.h #include unistd.h #include access/hash.h @@ -59,15 +60,18 @@ #include storage/spin.h #include tcop/utility.h #include utils/builtins.h +#include utils/timestamp.h Final thought: I think the order in the pg_stat_statements view is wrong. It ought to be like a composite primary key - (userid, dbid, query_id). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
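For illustration, folding the version into the saved-stats file header rather than into every queryId would look roughly like this; the magic numbers below are placeholders, not the real constant:

    /* current style: a fixed magic number at the start of the saved-stats file */
    static const uint32 PGSS_FILE_HEADER = 0x20131115;

    /*
     * suggested style (sketch): make the header depend on the major release,
     * so a stats file written by a different release is simply discarded at
     * load time instead of perturbing every query ID.
     */
    static const uint32 PGSS_FILE_HEADER = 0x20130000 | (PG_VERSION_NUM / 100);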
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2013/11/15 11:31), Peter Geoghegan wrote: On Thu, Nov 14, 2013 at 6:28 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: It is confirmation just to make sure, does this patch mean my patch? I agree with you about not adding another lock implementation. It will becomes overhead. Yes, I referred to your patch. I don't want to go down this road, because aggregation and constructing a timeline feels like the job of another tool. I am concerned about local minima and maxima. Even with the ability to reset min/max independently, you can't do so for each entry individually. And this approach won't scale to a histogram or more sophisticated ways of characterizing distribution, particularly not multiplicatively for things other than execution time (blocks hit and so on) I think that pg_stat_statements is light-weight monitoring tool, not whole monitoring tool. This feature is very good for everyone to get statistics. If you'd like to get more detail statistics, I suggest you to add another monitoring tools like pg_stat_statements_full. It might more heavy, but it can get more detail information. Everyone will welcome to new features of that. - that spinlock needs to be held for very little time indeed to preserve pg_stat_statements current low overhead. I'd like to leave pg_stat_statement low overhead. My patch realizes it. I don't add new locks and complicated code in my patch. As I said above, lets figure out how to have your tool or a similar tool acquire snapshots inexpensively and frequently instead. We tried to solve this problem using our tool in the past. However, it is difficult that except log output method which is log_statement=all option. So we try to add new feature to pg_stat_statement, it would help DBA to provide useful statistics without overhead. That is a discussion I'd be happy to have. IMO pg_stat_statements ought to be as cheap as possible, and do one thing well - aggregate fixed-unit costs cumulatively. I am also happy to your discussion! I'd like to install your new patch and give you my comment. Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2013/11/15 2:09), Fujii Masao wrote: Agreed. Could you tell me the reason you agreed? I am sorry, but I suspect you don't understand this discussion well enough :-( Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Wed, Nov 13, 2013 at 12:02 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 12 November 2013 08:47 Amit Kapila wrote: On Mon, Nov 11, 2013 at 3:14 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 08 November 2013 18:35 Amit Kapila wrote: On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 07 November 2013 09:42 Amit Kapila wrote: 1. Taking a copy of n_dead_tuples before VACUUM starts and then subtract it once it is done. This approach doesn't include the tuples which are remains during the vacuum operation. Patch is modified as take a copy of n_dead_tuples during vacuum start and use the same while calculating the new dead tuples at end of vacuum. By the way, do you have test case or can you try to write a test case which can show this problem and then after fix, you can verify if the problem is resolved. The simulated index bloat problem can be generated using the attached script and sql. With the fix of setting the dead tuples properly, Which fix here you are referring to, is it the one which you have proposed with your initial mail? the bloat is reduced and by changing the vacuum cost Parameters the bloat is avoided. With the simulated bloat test run for 1 hour the bloat occurred as below, Unpatched - 1532MB Patched - 1474MB In your test run, have you checked what happen if after heavy update (or once bloat occurs), if you keep the system idle (or just have read load on system) for some time, what is the result? You haven't answered one of my questions in previous mail ( With the fix of setting the dead tuples properly, the bloat is reduced and by changing the vacuum cost Parameters the bloat is avoided. Which fix here you are referring to?) With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Nov 14, 2013 at 7:25 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Nov 5, 2013 at 5:30 AM, Sameer Thakur samthaku...@gmail.com wrote: Hello, Please find attached pg_stat_statements-identification-v9.patch. I took a quick look. Observations: + /* Making query ID dependent on PG version */ + query-queryId |= PG_VERSION_NUM 16; If you want to do something like this, make the value of PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something. Why are you doing this? The destabilization of the query_id is to attempt to skirt the objection that the unpredictability of instability in the query_id would otherwise cause problems and present a false contract, particular in regards to point release upgrades. Earliest drafts of mine included destabilizing every session, but this kills off a nice use case of correlating query ids between primaries and standbys, if memory serves. This has the semantics of destabilizing -- for sure -- every point release. @@ -128,6 +146,7 @@ typedef struct pgssEntry pgssHashKey key;/* hash key of entry - MUST BE FIRST */ Counterscounters; /* the statistics for this query */ int query_len; /* # of valid bytes in query string */ + uint32 query_id; /* jumble value for this entry */ query_id is already in key. Not sure I like the idea of the new enum at all, but in any case you shouldn't have a PGSS_TUP_LATEST constant - should someone go update all usage of that constant only when your version isn't the latest? Like here: + if (detected_version = PGSS_TUP_LATEST) It is roughly modeled after the column version of the same thing that pre-existed in pg_stat_statements. The only reason to have a LATEST is for some of the invariants though; otherwise, most uses should use the version-stamped symbol. I did this wrongly a bunch of times as spotted by Alvaro, I believe. I forget why Daniel originally altered the min value of pg_stat_statements.max to 1 (I just remember that he did), but I don't think it holds that you should keep it there. Have you considered the failure modes when it is actually set to 1? Testing. It was very useful to set to small numbers, like two or three. It's not crucial to the patch at all but having to whack it back all the time to keep the patch submission devoid of it seemed impractically tedious during revisions. This is what I call a can't happen error, or a defensive one: + else + { + /* +* Couldn't identify the tuple format. Raise error. +* +* This is an exceptional case that may only happen in bizarre +* situations, since it is thought that every released version +* of pg_stat_statements has a matching schema. +*/ + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg(pg_stat_statements schema is not supported + by its installed binary))); + } I'll generally make these simple elogs(), which are more terse. No one is going to find all that dressing useful. That seems reasonable. Yes, it's basically a soft assert. I hit this one in development a few times, it was handy. It probably isn't useful to comment random, unaffected code that isn't affected by your patch - I don't find this new refactoring useful, and am surprised to see it in your patch: + /* Check header existence and magic number match. */ if (fread(header, sizeof(uint32), 1, file) != 1 || - header != PGSS_FILE_HEADER || - fread(num, sizeof(int32), 1, file) != 1) + header != PGSS_FILE_HEADER) + goto error; + + /* Read how many table entries there are. 
*/ + if (fread(num, sizeof(int32), 1, file) != 1) goto error; Did you mean to add all this, or is it left over from Daniel's patch? The whitespace changes are not intentional, but I think the comments help a reader track what's going on better, which becomes more important if one adds more fields. It can be broken out into a separate patch, but it didn't seem like that bookkeeping was necessary. @@ -43,6 +43,7 @@ */ #include postgres.h +#include time.h #include unistd.h #include access/hash.h @@ -59,15 +60,18 @@ #include storage/spin.h #include tcop/utility.h #include utils/builtins.h +#include utils/timestamp.h Final thought: I think the order in the pg_stat_statements view is wrong. It ought to be like a composite primary key - (userid, dbid, query_id). I made no design decisions here...no complaints from me. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
Re: [HACKERS] Optimize kernel readahead using buffer access strategy
On Thu, Nov 14, 2013 at 11:13 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: Hi Claudio, (2013/11/14 22:53), Claudio Freire wrote: On Thu, Nov 14, 2013 at 9:09 AM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: I create a patch that is improvement of disk-read and OS file caches. It can optimize kernel readahead parameter using buffer access strategy and posix_fadvice() in various disk-read situations. In general OS, readahead parameter was dynamically decided by disk-read situations. If long time disk-read was happened, readahead parameter becomes big. However it is based on experienced or heuristic algorithm, it causes waste disk-read and throws out useful OS file caches in some case. It is bad for disk-read performance a lot. It would be relevant to know which kernel did you use for those tests. I use CentOS 6.4 which kernel version is 2.6.32-358.23.2.el6.x86_64 in this test. That's close to the kernel version I was using, so you should see the same effect. A while back, I tried to use posix_fadvise to prefetch index pages. I search your past work. Do you talk about this ML-thread? Or is there another latest discussion? I see your patch is interesting, but it wasn't submitted to CF and stopping discussions. http://www.postgresql.org/message-id/CAGTBQpZzf70n0PYJ=VQLd+jb3wJGo=2txmy+skjd6g_vjc5...@mail.gmail.com Yes, I didn't, exactly because of that bad interaction with the kernel. It needs either more smarts to only do fadvise on known-random patterns (what you did mostly), or an accompanying kernel patch (which I was working on, but ran out of test machines). I ended up finding out that interleaving posix_fadvise with I/O like that severly hinders (ie: completely disables) the kernel's read-ahead algorithm. Your patch becomes maximum readahead, when a sql is selected index range scan. Is it right? Ehm... sorta. I think that your patch assumes that pages are ordered by index-data. No. It just knows which pages will be needed, and fadvises them. No guessing involved, except the guess that the scan will not be aborted. There's a heuristic to stop limited scans from attempting to fadvise, and that's that prefetch strategy is applied only from the Nth+ page walk. It improves index-only scans the most, but I also attempted to handle heap prefetches. That's where the kernel started conspiring against me, because I used many naturally-clustered indexes, and THERE performance was adversely affected because of that kernel bug. You may want to try your patch with more real workloads, and maybe you'll confirm what I found out last time I messed with posix_fadvise. If my experience is still relevant, those patterns will have suffered a severe performance penalty with this patch, because it will disable kernel read-ahead on sequential index access. It may still work for sequential heap scans, because the access strategy will tell the kernel to do read-ahead, but many other access methods will suffer. The decisive difference with your patch is that my patch uses buffer hint control architecture, so it can control readahaed smarter in some cases. Indeed, but it's not enough. See my above comment about naturally clustered indexes. The planner expects that, and plans accordingly. It will notice correlation between a PK and physical location, and will treat an index scan over PK to be almost sequential. With your patch, that assumption will be broken I believe. However, my patch is on the way and needed to more improvement. 
I am going to add a method of controlling readahead via a GUC, so that users can freely select the readahead parameter in their transactions. Rather, I'd try to avoid fadvising consecutive or almost-consecutive blocks. Detecting that is hard at the block level, but maybe you can tie that detection into the planner, and specify a sequential strategy when the planner expects index-heap correlation? Try OLAP-style queries. I have the DBT-3 (TPC-H) benchmark tools. If you don't like TPC-H, could you tell me a good OLAP benchmark tool? I don't really know. Skimming the specs, I'm not sure whether those queries generate large index range scans. You could try, maybe with auto_explain?
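To make the readahead discussion concrete, here is a minimal, hypothetical sketch (not code from either patch) of the general technique under discussion: choosing a posix_fadvise() hint from an access-strategy tag, so that sequential scans keep the kernel's readahead window while genuinely random access turns it off. The enum and function names are invented for illustration.

#include <fcntl.h>

/* Invented stand-in for PostgreSQL's BufferAccessStrategyType. */
typedef enum
{
	SCAN_HINT_SEQUENTIAL,
	SCAN_HINT_RANDOM
} ScanHint;

/*
 * Tell the kernel what access pattern to expect on a file range.
 * SEQUENTIAL keeps (or enlarges) the readahead window; RANDOM disables
 * readahead so speculative reads don't evict useful page cache.
 * Returns 0 on success, or an errno-style value on failure.
 */
static int
advise_access_pattern(int fd, off_t offset, off_t len, ScanHint hint)
{
	int		advice = (hint == SCAN_HINT_SEQUENTIAL)
		? POSIX_FADV_SEQUENTIAL
		: POSIX_FADV_RANDOM;

	return posix_fadvise(fd, offset, len, advice);
}

The pitfall Claudio describes is the opposite combination: issuing per-block POSIX_FADV_WILLNEED (or RANDOM) hints on access that is actually near-sequential, which on the kernels discussed in this thread suppresses the kernel's own readahead entirely.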
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On 15 November 2013 10:00 Amit Kapila wrote: On Wed, Nov 13, 2013 at 12:02 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 12 November 2013 08:47 Amit Kapila wrote: On Mon, Nov 11, 2013 at 3:14 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 08 November 2013 18:35 Amit Kapila wrote: On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 07 November 2013 09:42 Amit Kapila wrote: 1. Taking a copy of n_dead_tuples before VACUUM starts and then subtracting it once it is done. This approach doesn't include the tuples which remain during the vacuum operation. The patch is modified to take a copy of n_dead_tuples at vacuum start and use it while calculating the new dead tuples at the end of vacuum. By the way, do you have a test case, or can you try to write one, which can show this problem, so that after the fix you can verify whether the problem is resolved? The simulated index bloat problem can be generated using the attached script and sql. With the fix of setting the dead tuples properly, Which fix are you referring to here - is it the one which you proposed in your initial mail? the bloat is reduced, and by changing the vacuum cost parameters the bloat is avoided. With the simulated bloat test run for 1 hour the bloat occurred as below: Unpatched - 1532MB, Patched - 1474MB. In your test run, have you checked what happens if, after the heavy update (or once bloat occurs), you keep the system idle (or just have a read load on the system) for some time - what is the result? In the simulated test run shared in the previous mail, after a heavy update the system is idle for 15 minutes. With the master code, vacuum is not triggered during this idle time as it does not satisfy the vacuum threshold, because it doesn't count the dead tuples that accumulated during the vacuum operation. With the fix, one extra vacuum can get triggered compared to the master code after two or three heavy updates, because of the accumulation of dead tuples. You haven't answered one of my questions in the previous mail (With the fix of setting the dead tuples properly, the bloat is reduced and by changing the vacuum cost parameters the bloat is avoided. Which fix are you referring to here?) The bloat reduction is the same with the initial and the latest patch. The vacuum cost parameter change (which doesn't contain any fix) is what avoided the bloat. Regards, Hari babu.
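As a rough illustration of the accounting change under discussion (names and structure are invented here; this is not the actual pgstat code), the idea is to snapshot n_dead_tuples when VACUUM starts and, when it finishes, keep only the tuples that accumulated while it was running instead of resetting the counter to zero:

/* Invented stand-in for the per-table stats the collector tracks. */
typedef struct TableDeadTupleStats
{
	long	n_dead_tuples;			/* current dead-tuple estimate */
	long	dead_at_vacuum_start;	/* snapshot taken when VACUUM began */
} TableDeadTupleStats;

static void
vacuum_started(TableDeadTupleStats *stats)
{
	/* Remember how many dead tuples this vacuum is about to deal with. */
	stats->dead_at_vacuum_start = stats->n_dead_tuples;
}

static void
vacuum_finished(TableDeadTupleStats *stats)
{
	/*
	 * Tuples that died while the vacuum was running were not removed by it,
	 * so leave them counted toward the next autovacuum threshold rather
	 * than resetting the counter to zero.
	 */
	long	during_vacuum = stats->n_dead_tuples - stats->dead_at_vacuum_start;

	stats->n_dead_tuples = (during_vacuum > 0) ? during_vacuum : 0;
}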
Re: [HACKERS] init_sequence spill to hash table
On Fri, Nov 15, 2013 at 3:03 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think that means that we should just completely replace the list with the hash table. The difference with a small N is lost in the noise, so there's no point in keeping the list as a fast path for small N. That'll make the patch somewhat simpler. - Heikki Attached is a much simpler patch which gets rid of the initial linked list. hashtab_seq_v0.2.patch Description: Binary data
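For readers following along, here is a sketch of what "just use a hash table" looks like with PostgreSQL's dynahash API. This is illustrative, not the patch itself: the entry fields are invented, and the modern HASH_BLOBS convention is shown where the 2013-era code would have passed an explicit hash function.

#include "postgres.h"
#include "utils/hsearch.h"

/* Invented per-backend sequence state, keyed by the sequence's OID. */
typedef struct SeqHashEntry
{
	Oid		relid;			/* hash key - must be first */
	int64	last_value;		/* last value handed out by nextval() */
	int64	cached_value;	/* top of the locally cached range */
	bool	last_valid;		/* has nextval() been called yet? */
} SeqHashEntry;

static HTAB *seq_htab = NULL;

static SeqHashEntry *
seq_hash_lookup(Oid relid)
{
	SeqHashEntry *entry;
	bool		found;

	if (seq_htab == NULL)
	{
		HASHCTL		ctl;

		MemSet(&ctl, 0, sizeof(ctl));
		ctl.keysize = sizeof(Oid);
		ctl.entrysize = sizeof(SeqHashEntry);
		seq_htab = hash_create("Sequence state", 16, &ctl,
							   HASH_ELEM | HASH_BLOBS);
	}

	entry = (SeqHashEntry *) hash_search(seq_htab, &relid, HASH_ENTER, &found);
	if (!found)
	{
		/* dynahash fills in only the key; initialize the payload. */
		entry->last_value = 0;
		entry->cached_value = 0;
		entry->last_valid = false;
	}
	return entry;
}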
Re: [HACKERS] [PATCH] SQL assertions prototype
+1, interesting feature. Pavel 2013/11/15 Peter Eisentraut pete...@gmx.net Various places in the constraint checking code say something like, if we ever implement assertions, here is where it should go. I've been fiddling with filling in those gaps for some time now, and the other day I noticed, hey, this actually kind of works, so here it is. Let's see whether this architecture is sound. A constraint trigger performs the actual checking. For the implementation of the trigger, I've used some SPI hacking for now; that could probably be refined. The attached patch has documentation, tests, and psql support. Missing pieces are pg_dump support, dependency management, and permission checking (the latter marked in the code). This is not a performance feature. It's for things like: this table should have at most 10 rows, or all the values in this table must be bigger than all the values in that other table. It's a bit esoteric, but it comes up again and again. Let me know what you think.
Re: [HACKERS] init_sequence spill to hash table
On Fri, Nov 15, 2013 at 3:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: I think it'd be a better idea to integrate the sequence caching logic into the relcache. There's a comment about it: * (We can't * rely on the relcache, since it's only, well, a cache, and may decide to * discard entries.) but that's not really accurate anymore. We have the infrastructure for keeping values across resets and we don't discard entries. We most certainly *do* discard entries, if they're not open when a cache flush event comes along. I suppose it'd be possible to mark a relcache entry for a sequence as locked-in-core, but that doesn't attract me at all. A relcache entry is a whole lot larger than the amount of state we really need to keep for a sequence. One idea is to have a hashtable for the sequence-specific data, but to add a link field to the relcache entry that points to the non-flushable sequence hashtable entry. That would save the second hashtable lookup as long as the relcache entry hadn't been flushed since last use, while not requiring any violence to the lifespan semantics of relcache entries. (Actually, if we did that, it might not even be worth converting the list to a hashtable? Searches would become a lot less frequent.) Unless I've misunderstood something, it looks like this would mean giving heapam.c and relcache.c knowledge of sequences. Currently relation_open is called from open_share_lock in sequence.c. The only way I can see to do this would be to add something like relation_open_sequence() in heapam.c, which means we'd need to invent RelationIdGetSequenceRelation() and use that instead of RelationIdGetRelation(), and somewhere along the line have it pass back the SeqTableData struct, which would be tagged onto RelIdCacheEnt. I think it can be done, but I don't think it will look pretty. Perhaps if there were a more generic way... Would tagging some void *rd_extra onto the RelationData be a better way, and just have sequence.c make use of that for storing the SeqTableData? Also, I'm wondering what we'd do with all these pointers when someone does DISCARD SEQUENCES; would we have to invalidate the relcache, or would it just be a matter of looping over it, freeing the sequence data, and setting the pointers to NULL? Regards David Rowley
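A tiny sketch of the link-field idea Tom describes (and, in spirit, David's rd_extra variant). Everything here is invented for illustration - no such field exists in RelationData today - and it assumes the SeqHashEntry and seq_hash_lookup() from the earlier sketch: the authoritative state lives in a never-flushed backend-local hash table, and the relcache-side structure merely caches a pointer to it.

/* Hypothetical slimmed-down view of a relcache entry carrying the link. */
typedef struct SequenceRelcacheLink
{
	Oid				relid;		/* relation OID of the sequence */
	SeqHashEntry   *seq_state;	/* cached pointer; NULL after a rebuild */
} SequenceRelcacheLink;

static SeqHashEntry *
seq_state_for(SequenceRelcacheLink *rel)
{
	/* Slow path only when the relcache entry was rebuilt: one hash lookup. */
	if (rel->seq_state == NULL)
		rel->seq_state = seq_hash_lookup(rel->relid);

	return rel->seq_state;
}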
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
I took a quick look. Observations: + /* Making query ID dependent on PG version */ + query->queryId |= PG_VERSION_NUM << 16; If you want to do something like this, make the value of PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something. Why are you doing this? The thought was that queryid should have a different value for the same query across PG versions, to ensure that clients using the view do not assume otherwise. @@ -128,6 +146,7 @@ typedef struct pgssEntry pgssHashKey key; /* hash key of entry - MUST BE FIRST */ Counters counters; /* the statistics for this query */ int query_len; /* # of valid bytes in query string */ + uint32 query_id; /* jumble value for this entry */ query_id is already in key. Not sure I like the idea of the new enum at all, but in any case you shouldn't have a PGSS_TUP_LATEST constant - should someone go and update all usage of that constant once your version isn't the latest? Like here: + if (detected_version >= PGSS_TUP_LATEST) There is #define PGSS_TUP_LATEST PGSS_TUP_V1_2 So if an update has to be done, this is the one place to do it. I forget why Daniel originally altered the min value of pg_stat_statements.max to 1 (I just remember that he did), but I don't think it holds that you should keep it there. Have you considered the failure modes when it is actually set to 1? Will set it back to the original value and also test for max value = 1. This is what I call a "can't happen" error, or a defensive one: + else + { + /* + * Couldn't identify the tuple format. Raise error. + * + * This is an exceptional case that may only happen in bizarre + * situations, since it is thought that every released version + * of pg_stat_statements has a matching schema. + */ + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("pg_stat_statements schema is not supported by its installed binary"))); + } I'd generally make these simple elog()s, which are more terse. No one is going to find all that dressing useful. Will convert to using elogs. Please take a look at this, for future reference: https://wiki.postgresql.org/wiki/Creating_Clean_Patches The whitespace changes are distracting. Thanks! Still learning the art of clean patch submission. It probably isn't useful to comment random code that isn't affected by your patch - I don't find this new refactoring useful, and am surprised to see it in your patch: + /* Check header existence and magic number match. */ if (fread(&header, sizeof(uint32), 1, file) != 1 || - header != PGSS_FILE_HEADER || - fread(&num, sizeof(int32), 1, file) != 1) + header != PGSS_FILE_HEADER) + goto error; + + /* Read how many table entries there are. */ + if (fread(&num, sizeof(int32), 1, file) != 1) goto error; Did you mean to add all this, or is it left over from Daniel's patch? I think it's a carry-over from Daniel's code. I understand the thought. Will keep the patch strictly restricted to the functionality implemented. @@ -43,6 +43,7 @@ */ #include "postgres.h" +#include <time.h> #include <unistd.h> #include "access/hash.h" @@ -59,15 +60,18 @@ #include "storage/spin.h" #include "tcop/utility.h" #include "utils/builtins.h" +#include "utils/timestamp.h" Final thought: I think the order in the pg_stat_statements view is wrong. It ought to be like a composite primary key - (userid, dbid, query_id). Will make the change.
-- Peter Geoghegan Thank you for the review Sameer
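A compressed illustration of the reviewer's alternative (the base magic number below is invented; PG_VERSION_NUM is real): fold the major server version into the file-header magic rather than perturbing every queryId, so a stats temp file written by a different major version simply fails the header check on load and is discarded.

/* Hypothetical header magic: the base constant is invented for illustration. */
#define PGSS_FILE_HEADER_BASE	0x20131115
#define PGSS_FILE_HEADER		(PGSS_FILE_HEADER_BASE + (PG_VERSION_NUM / 100))

With that, the existing header != PGSS_FILE_HEADER check in the load path already rejects files from other major versions, and queryId can stay a pure jumble hash.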
Re: [HACKERS] GIN improvements part 1: additional information
On Thu, Nov 14, 2013 at 3:00 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Thu, Nov 14, 2013 at 2:17 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Tue, Nov 5, 2013 at 9:49 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 04.11.2013 23:44, Alexander Korotkov wrote: On Mon, Oct 21, 2013 at 11:12 PM, Alexander Korotkov aekorot...@gmail.com wrote: The attached version of the patch is debugged. I would like to note that the number of bugs was low and it wasn't very hard to debug. I've rerun the tests on it. You can see that the numbers have improved as a result of your refactoring.

         event         |     period
-----------------------+------------------
 index_build           | 00:01:45.416822
 index_build_recovery  | 00:00:06
 index_update          | 00:05:17.263606
 index_update_recovery | 00:01:22
 search_new            | 00:24:07.706482
 search_updated        | 00:26:25.364494
(6 rows)

     label      | blocks_mark
----------------+-------------
 search_new     |   847587636
 search_updated |   881210486
(2 rows)

     label     |   size
---------------+-----------
 new           | 419299328
 after_updates | 715915264
(2 rows)

Besides simple bugs, there was a subtle bug in the incremental item-index update. I've added some more comments, including an ASCII picture of how item indexes are shifted. Now I'm trying to implement support for the old page format. Then we can decide which approach to use. The attached version of the patch has support for the old page format. The patch still needs more documentation and probably refactoring, but I believe the idea is clear and it can be reviewed. In the patch I had to revert the change of null category placement for compatibility with the old posting list format. Thanks, just glanced at this quickly. If I'm reading the patch correctly, old-style non-compressed posting tree leaf pages are not vacuumed at all; that's clearly not right. Fixed. Now a separate function handles uncompressed posting lists and compresses them if at least one TID is deleted. I argued earlier that we don't need to handle the case where compressing a page makes it larger, so that the existing items don't fit on the page anymore. I'm having some second thoughts on that; I didn't take into account the fact that the mini-index in the new page format takes up some space. I think it's still highly unlikely that there isn't enough space on an 8k page, but it's not totally crazy with a non-standard small page size. So at least that situation needs to be checked for with an ereport(), rather than an Assert. Now this situation is ereported before any change to the page. To handle splitting a non-compressed page, it seems that you're relying on the fact that before splitting, we try to insert, which compresses the page. The problem with that is that it's not correctly WAL-logged. The split record that follows includes a full copy of both page halves, but if the split fails for some reason, e.g. you run out of disk space, there is no WAL record at all of the compression. I'd suggest doing the compression in the insert phase on a temporary copy of the page, leaving the original page untouched if there isn't enough space for the insertion to complete. (You could argue that this case can't happen because the compression must always create enough space to insert one more item. Maybe so, but at least there should be an explicit check for that.) Good point. Yes, if we don't handle inserting item indexes specially, I see no point in doing special handling for a single TID, which is much smaller. In the attached patch, dataCompressLeafPage just reserves space for a single TID. Also, many changes in comments and the README. Unfortunately, I didn't understand what the FIXME in ginVacuumEntryPage is about.
So, it's not fixed :) Sorry, I posted a buggy version of the patch. The attached version is correct. It also includes another bug fix, based on a report from Rod Taylor. -- With best regards, Alexander Korotkov. gin-packed-postinglists-16.patch.gz Description: GNU Zip compressed data
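For readers new to the thread, a small self-contained illustration of where the space saving in packed posting lists comes from (this is not the patch's exact on-disk format): item pointers are kept sorted, stored as deltas from the previous one, and the deltas are varbyte-encoded, so densely clustered TIDs shrink to one or two bytes each instead of six.

#include <stdint.h>
#include <stddef.h>

/* Encode one delta, 7 bits per byte, high bit set while more bytes follow. */
static size_t
encode_varbyte(uint64_t delta, unsigned char *out)
{
	size_t	n = 0;

	do
	{
		unsigned char b = delta & 0x7F;

		delta >>= 7;
		if (delta != 0)
			b |= 0x80;
		out[n++] = b;
	} while (delta != 0);

	return n;
}

/* Encode an ascending list of item pointers (packed into uint64) as deltas. */
static size_t
encode_posting_list(const uint64_t *items, int nitems, unsigned char *out)
{
	uint64_t	prev = 0;
	size_t		len = 0;
	int			i;

	for (i = 0; i < nitems; i++)
	{
		len += encode_varbyte(items[i] - prev, out + len);
		prev = items[i];
	}
	return len;
}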
Re: [HACKERS] GIN improvements part2: fast scan
On Fri, Nov 15, 2013 at 3:25 AM, Rod Taylor r...@simple-knowledge.com wrote: I checked out master and put together a test case using a small percentage of production data for a known problem we have with Pg 9.2 and text search scans. A small percentage in this case means 10 million records randomly selected; the production data set has a few billion records. Tests ran successfully for master and I recorded timings. Applied the patch included here to master along with gin-packed-postinglists-14.patch. Ran make clean; ./configure; make; make install. make check (all 141 tests passed). initdb, imported the dump. The GIN index fails to build with a segfault. Thanks for testing. See the fixed version in the thread about packed posting lists. -- With best regards, Alexander Korotkov.