Re: [HACKERS] Minmax indexes
On Thu, Jul 10, 2014 at 6:16 AM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Claudio Freire wrote:
>> An aggregate to generate a "compressed set" from several values
>> A function which adds a new value to the compressed set and returns
>>   the new compressed set
>> A function which tests if a value is in a compressed set
>> A function which tests if a compressed set overlaps another
>>   compressed set of equal type
>>
>> If you can define different compressed sets, you can use this to
>> generate both min/max indexes as well as bloom filter indexes.
>> Whether we'd want to have both is perhaps questionable, but having
>> the ability to is probably desirable.
>
> Here's a new version of this patch, which is more generic than the
> original versions, and similar to what you describe.

I've not read the discussion so far at all, but I found the problem
when I played with this patch. Sorry if this has already been discussed.

    =# create table test as select num from generate_series(1,10) num;
    SELECT 10
    =# create index testidx on test using minmax (num);
    CREATE INDEX
    =# alter table test alter column num type text;
    ERROR:  could not determine which collation to use for string comparison
    HINT:  Use the COLLATE clause to set the collation explicitly.

Regards,

-- 
Fujii Masao
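For illustration, a minimal sketch of the intended use case (the "minmax"
access method name is the one used by the patch under review; the table,
column and index names are made up):

    create table measurements as
        select g as id,
               '2014-01-01'::timestamptz + g * interval '1 second' as ts
        from generate_series(1, 1000000) g;

    create index measurements_ts_idx on measurements using minmax (ts);

    -- A range predicate can skip any block range whose stored [min, max]
    -- summary does not overlap the requested interval.
    select count(*)
    from measurements
    where ts between '2014-01-05' and '2014-01-06';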
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2014-07-04 19:27:10 +0530, Rahila Syed wrote: + /* Allocates memory for compressed backup blocks according to the compression + * algorithm used.Once per session at the time of insertion of first XLOG + * record. + * This memory stays till the end of session. OOM is handled by making the + * code proceed without FPW compression*/ + static char *compressed_pages[XLR_MAX_BKP_BLOCKS]; + static bool compressed_pages_allocated = false; + if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF + compressed_pages_allocated!= true) + { + size_t buffer_size = VARHDRSZ; + int j; + if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_SNAPPY) + buffer_size += snappy_max_compressed_length(BLCKSZ); + else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_LZ4) + buffer_size += LZ4_compressBound(BLCKSZ); + else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_PGLZ) + buffer_size += PGLZ_MAX_OUTPUT(BLCKSZ); + for (j = 0; j XLR_MAX_BKP_BLOCKS; j++) + { compressed_pages[j] = (char *) malloc(buffer_size); + if(compressed_pages[j] == NULL) + { + compress_backup_block=BACKUP_BLOCK_COMPRESSION_OFF; + break; + } + } + compressed_pages_allocated = true; + } Why not do this in InitXLOGAccess() or similar? /* * Make additional rdata chain entries for the backup blocks, so that we * don't need to special-case them in the write loop. This modifies the @@ -1015,11 +1048,32 @@ begin:; rdt-next = (dtbuf_rdt2[i]); rdt = rdt-next; + if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF) + { + /* Compress the backup block before including it in rdata chain */ + rdt-data = CompressBackupBlock(page, BLCKSZ - bkpb-hole_length, + compressed_pages[i], (rdt-len)); + if (rdt-data != NULL) + { + /* + * write_len is the length of compressed block and its varlena + * header + */ + write_len += rdt-len; + bkpb-hole_length = BLCKSZ - rdt-len; + /*Adding information about compression in the backup block header*/ + bkpb-block_compression=compress_backup_block; + rdt-next = NULL; + continue; + } + } + So, you're compressing backup blocks one by one. I wonder if that's the right idea and if we shouldn't instead compress all of them in one run to increase the compression ratio. +/* * Get a pointer to the right location in the WAL buffer containing the * given XLogRecPtr. * @@ -4061,6 +4174,50 @@ RestoreBackupBlockContents(XLogRecPtr lsn, BkpBlock bkpb, char *blk, { memcpy((char *) page, blk, BLCKSZ); } + /* Decompress if backup block is compressed*/ + else if (VARATT_IS_COMPRESSED((struct varlena *) blk) + bkpb.block_compression!=BACKUP_BLOCK_COMPRESSION_OFF) + { + if (bkpb.block_compression == BACKUP_BLOCK_COMPRESSION_SNAPPY) + { + int ret; + size_t compressed_length = VARSIZE((struct varlena *) blk) - VARHDRSZ; + char *compressed_data = (char *)VARDATA((struct varlena *) blk); + size_t s_uncompressed_length; + + ret = snappy_uncompressed_length(compressed_data, + compressed_length, + s_uncompressed_length); + if (!ret) + elog(ERROR, snappy: failed to determine compression length); + if (BLCKSZ != s_uncompressed_length) + elog(ERROR, snappy: compression size mismatch %d != %zu, + BLCKSZ, s_uncompressed_length); + + ret = snappy_uncompress(compressed_data, + compressed_length, + page); + if (ret != 0) + elog(ERROR, snappy: decompression failed: %d, ret); + } + else if (bkpb.block_compression == BACKUP_BLOCK_COMPRESSION_LZ4) + { + int ret; +
Re: [HACKERS] Request for Patch Feedback: Lag Lead Window Functions Can Ignore Nulls
On Mon, 2014-07-07 at 01:21 -0700, Jeff Davis wrote: On Sun, 2014-07-06 at 21:11 -0700, Jeff Davis wrote: On Wed, 2014-04-16 at 12:50 +0100, Nicholas White wrote: Thanks for the detailed feedback, I'm sorry it took so long to incorporate it. I've attached the latest version of the patch, fixing in particular: As innocent as this patch seemed at first, it actually opens up a lot of questions. Attached is the (incomplete) edit of the patch so far. Changes from your patch: * changed test to be locale-insensitive * lots of refactoring in the execution itself * fix offset 0 case * many test improvements * remove bitmapset and just use an array bitmap * fix error message typo Open Issues: I don't think exposing the frame options is a good idea. That's an internal concept now, but putting it in windowapi.h will mean that it needs to live forever. The struct is private, so there's no easy hack to access the frame options directly. That means that we need to work with the existing API functions, which is OK because I think that everything we want to do can go into WinGetFuncArgInPartition(). If we do the same thing for WinGetFuncArgInFrame(), then first/last/nth also work. That leaves the questions: * Do we want IGNORE NULLS to work for every window function, or only a specified subset? * If it only works for some window functions, is that hard-coded or driven by the catalog? * If it works for all window functions, could it cause some pre-existing functions to behave strangely? Also, I'm re-thinking Dean's comments here: http://www.postgresql.org/message-id/CAEZATCWT3=P88nv2ThTjvRDLpOsVtAPxaVPe=MaWe-x=guh...@mail.gmail.com He brings up a few good points. I will look into the frame vs. window option, though it looks like you've already at least fixed the crash. His other point about actually eliminating the NULLs from the window itself is interesting, but I don't think it works. IGNORE NULLS ignores *other* rows with NULL, but (per spec) does not ignore the current row. That sounds awkward if you've already removed the NULL rows from the window, but maybe there's something that could work. And there are a few other things I'm still looking into, but hopefully they don't raise new issues. Regards, Jeff Davis *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 13164,13169 SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; --- 13164,13170 lag(replaceable class=parametervalue/replaceable typeany/ [, replaceable class=parameteroffset/replaceable typeinteger/ [, replaceable class=parameterdefault/replaceable typeany/ ]]) + [ { RESPECT | IGNORE } NULLS ] /function /entry entry *** *** 13178,13184 SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; replaceable class=parameterdefault/replaceable are evaluated with respect to the current row. If omitted, replaceable class=parameteroffset/replaceable defaults to 1 and !replaceable class=parameterdefault/replaceable to null /entry /row --- 13179,13187 replaceable class=parameterdefault/replaceable are evaluated with respect to the current row. If omitted, replaceable class=parameteroffset/replaceable defaults to 1 and !replaceable class=parameterdefault/replaceable to null. If !literalIGNORE NULLS/ is specified then the function will be evaluated !as if the rows containing nulls didn't exist. 
/entry /row *** *** 13191,13196 SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; --- 13194,13200 lead(replaceable class=parametervalue/replaceable typeany/ [, replaceable class=parameteroffset/replaceable typeinteger/ [, replaceable class=parameterdefault/replaceable typeany/ ]]) + [ { RESPECT | IGNORE } NULLS ] /function /entry entry *** *** 13205,13211 SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; replaceable class=parameterdefault/replaceable are evaluated with respect to the current row. If omitted, replaceable class=parameteroffset/replaceable defaults to 1 and !replaceable class=parameterdefault/replaceable to null /entry /row --- 13209,13217 replaceable class=parameterdefault/replaceable are evaluated with respect to the current row. If omitted, replaceable class=parameteroffset/replaceable defaults to 1 and !replaceable class=parameterdefault/replaceable to null. If !literalIGNORE NULLS/ is specified then the function will be evaluated !as if the rows containing nulls didn't exist. /entry /row *** *** 13299,13309 SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y
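As a concrete illustration of the proposed behaviour (the IGNORE NULLS
clause is the syntax added by this patch, following the SQL standard; the
table contents are made up):

    create table t (i int, v int);
    insert into t values (1, 10), (2, null), (3, 30);

    select i,
           lag(v)              over (order by i) as lag_respect,
           lag(v) ignore nulls over (order by i) as lag_ignore
    from t;

    -- For i = 3, lag_respect is NULL (the immediately preceding row's
    -- value), while lag_ignore skips the NULL row and returns 10.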
Re: [HACKERS] inherit support for foreign tables
(2014/07/10 18:12), Shigeru Hanada wrote:
> 2014-06-24 16:30 GMT+09:00 Etsuro Fujita <fujita.ets...@lab.ntt.co.jp>:
>> (2014/06/23 18:35), Ashutosh Bapat wrote:
>>> Selecting tableoid on parent causes an error, "ERROR: cannot extract
>>> system attribute from virtual tuple". The foreign table has an OID
>>> which can be reported as tableoid for the rows coming from that
>>> foreign table. Do we want to do that?
>>
>> No. I think it's a bug. I'll fix it.
>
> IIUC, you mean that tableoid can't be retrieved when a foreign table is
> accessed via parent table,

No. What I want to say is that tableoid *can* be retrieved when a foreign
table is accessed via the parent table.

Thanks,

Best regards,
Etsuro Fujita
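A small sketch of the case being discussed (the INHERITS clause on a
foreign table is what the patch under review adds; the server and table
names are invented):

    create table parent (a int);
    create foreign table child_ft (a int)
        inherits (parent)
        server loopback options (table_name 'child_remote');

    -- The expectation is that this reports child_ft for rows coming from
    -- the foreign table, instead of failing with "cannot extract system
    -- attribute from virtual tuple".
    select tableoid::regclass, a from parent;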
Re: [HACKERS] Minmax indexes
On 9 July 2014 23:54, Peter Geoghegan <p...@heroku.com> wrote:
> On Wed, Jul 9, 2014 at 2:16 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
>> All this being said, I'm sticking to the name Minmax indexes. There
>> was a poll in pgsql-advocacy
>> http://www.postgresql.org/message-id/53a0b4f8.8080...@agliodbs.com
>> about a new name, but there were no suggestions supported by more than
>> one person. If a brilliant new name comes up, I'm open to changing it.
>
> How about "summarizing indexes"? That seems reasonably descriptive.

-1 for another name change. That boat sailed some months back.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Minmax indexes
On 10 July 2014 00:13, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Josh Berkus wrote:
>> On 07/09/2014 02:16 PM, Alvaro Herrera wrote:
>>> The way it works now, each opclass needs to have three support
>>> procedures; I've called them getOpers, maybeUpdateValues, and compare.
>>> (I realize these names are pretty bad, and will be changing them.)
>>
>> I kind of like maybeUpdateValues. Very ... NoSQL-ish. Maybe update the
>> values, maybe not. ;-)
>
> :-) Well, that's exactly what happens. If we insert a new tuple into
> the table, and the existing summarizing tuple (to use Peter's term)
> already covers it, then we don't need to update the index tuple at all.
> What this name doesn't say is what values are to be maybe-updated.

There are lots of functions that maybe-do-things, that's just modular
programming. Not sure we need to prefix things with "maybe" to explain
that, otherwise we'd have maybeXXX everywhere.

More descriptive name would be MaintainIndexBounds() or similar.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] inherit support for foreign tables
(2014/07/11 15:50), Etsuro Fujita wrote:
> (2014/07/10 18:12), Shigeru Hanada wrote:
>> IIUC, you mean that tableoid can't be retrieved when a foreign table
>> is accessed via parent table,
>
> No. What I want to say is that tableoid *can* be retrieved when a
> foreign table is accessed via the parent table.

In fact, you can do that with v13 [1], but I plan to change the way of
fixing (see [2]).

Thanks,

[1] http://www.postgresql.org/message-id/53b10914.2010...@lab.ntt.co.jp
[2] http://www.postgresql.org/message-id/53b2188b.4090...@lab.ntt.co.jp

Best regards,
Etsuro Fujita
Re: [HACKERS] add line number as prompt option to psql
Hi, Found new issues with latest patch: Thank you for reviewing the patch with variable cases. I have revised the patch, and attached latest patch. A: Will you please explain the idea behind these changes ? I thought wrong about adding new to tail of query_buf. The latest patch does not change related to them. Thanks. B: I added the condition of cur_line 0. A. However, this introduced new bug. As I told, when editor number of lines reaches INT_MAX it starts giving negative number. You tried overcoming this issue by adding 0 check. But I guess you again fumbled in setting that correctly. You are setting it to INT_MAX - 1. This enforces each new line to show line number as INT_MAX - 1 which is incorrect. postgres=# \set PROMPT1 '%/[%l]%R%# ' postgres[1]=# \set PROMPT2 '%/[%l]%R%# ' postgres[1]=# \e postgres[2147483646]-# limit postgres[2147483646]-# 1; relname -- pg_statistic (1 row) postgres[1]=# \e postgres[2147483646]-# = postgres[2147483646]-# ' postgres[2147483646]'# abc postgres[2147483646]'# ' postgres[2147483646]-# ; relname - (0 rows) postgres[1]=# select relname from pg_class where relname = ' abc ' ; Again to mimic that, I have simply intialized newline to INT_MAX - 2. Please don't take me wrong, but it seems that your unit testing is not enough. Above issue I discovered by doing exactly same steps I did in reviewing previous patch. If you had tested your new patch with those steps I guess you have caught that yourself. B. + /* Calculate the line number */ + if (scan_result != PSCAN_INCOMPLETE) + { + /* The one new line is always added to tail of query_buf */ + newline = (newline != 0) ? (newline + 1) : 1; + cur_line += newline; + } Here in above code changes, in any case you are adding 1 to newline. i.e. when it is 0 you are setting it 1 (+1) and when 0 you are setting nl + 1 (again +1). So why can't you simply use if (scan_result != PSCAN_INCOMPLETE) cur_line += (newline + 1); Or better, why can't you initialize newline with 1 itself and then directly assign cur_line with newline. That will eliminate above entire code block, isn't it? Or much better, simply get rid of newline, and use cur_line itself, will this work well for you? C. Typos: 1. /* Count the number of new line for calculate ofline number */ Missing space between 'of' and 'line'. However better improve that to something like (just suggestion): Count the number of new lines to correctly adjust current line number 2. /* Avoid cur_line and newline exceeds the INT_MAX */ You are saying avoid cur_line AND newline, but there is no adjustment for newline in the code following the comment. Thanks -- Jeevan B Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On 9 July 2014 18:54, Tomas Vondra t...@fuzzy.cz wrote: (1) size the buckets for NTUP_PER_BUCKET=1 (and use whatever number of batches this requires) If we start off by assuming NTUP_PER_BUCKET = 1, how much memory does it save to recalculate the hash bucket at 10 instead? Resizing sounds like it will only be useful of we only just overflow our limit. If we release next version with this as a hardcoded change, my understanding is that memory usage for hash joins will leap upwards, even if the run time of queries reduces. It sounds like we need some kind of parameter to control this. We made it faster might not be true if we run this on servers that are already experiencing high memory pressure. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Is there a way to temporarily disable a index
That is, is it possible to tell the planner that an index is off limits,
i.e. never generate a plan using it?

Rationale: schema changes on big tables. I might have convinced myself (or
hold a strong belief) that for all queries that need to be fast, the
planner does not need to use a given index (e.g. the other possible plans
are fast enough). However, if I just drop the index and it turns out I'm
wrong, I might be in a world of pain because it might take way too long to
recreate the index.

I know that I can use pg_stat* to figure out if an index is used at all.
But in the presence of multiple indexes and complex queries the planner
might prefer the index-to-be-dropped even when the difference to the
available alternatives is immaterial.

The current best alternative we have is to test such changes on a testing
database that gets regularly restored from production. However, at least
in our case we simply don't know all possible queries (and logging all of
them is not an option).

Cheers,
Bene
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
Greetings, Attached modified patch to handle xmlCopyNode returning NULL. The patch is larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt is needed for calling xml_ereport). From libxml2 source, it turns out that the other cases that xmlCopyNode will return NULL will not happen. So in this patch i assume that the only case is memory exhaustion. But i found some bug in libxml2's code, because we call xmlCopyNode with 1 as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode whether it's NULL. So if someday the recursion returns NULL (maybe because of memory exhaustion), it will SEGFAULT. Knowing this but in libxml2 code, is this patch is still acceptable? Thanks, Ali Akbar *** a/src/backend/utils/adt/xml.c --- b/src/backend/utils/adt/xml.c *** *** 141,149 static bool print_xml_decl(StringInfo buf, const xmlChar *version, pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, int encoding); ! static text *xml_xmlnodetoxmltype(xmlNodePtr cur); static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate); #endif /* USE_LIBXML */ static StringInfo query_to_xml_internal(const char *query, char *tablename, --- 141,151 pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, int encoding); ! static text *xml_xmlnodetoxmltype(xmlNodePtr cur, ! PgXmlErrorContext *xmlerrcxt); static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate, ! PgXmlErrorContext *xmlerrcxt); #endif /* USE_LIBXML */ static StringInfo query_to_xml_internal(const char *query, char *tablename, *** *** 3595,3620 SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename, * return value otherwise) */ static text * ! xml_xmlnodetoxmltype(xmlNodePtr cur) { xmltype*result; if (cur-type == XML_ELEMENT_NODE) { xmlBufferPtr buf; buf = xmlBufferCreate(); PG_TRY(); { ! xmlNodeDump(buf, NULL, cur, 0, 1); result = xmlBuffer_to_xmltype(buf); } PG_CATCH(); { xmlBufferFree(buf); PG_RE_THROW(); } PG_END_TRY(); xmlBufferFree(buf); } else --- 3597,3636 * return value otherwise) */ static text * ! xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt) { xmltype*result; if (cur-type == XML_ELEMENT_NODE) { xmlBufferPtr buf; + xmlNodePtr cur_copy; buf = xmlBufferCreate(); + + /* the result of xmlNodeDump won't contain namespace definitions + * from parent nodes, but xmlCopyNode duplicates a node along + * with its required namespace definitions. + */ + cur_copy = xmlCopyNode(cur, 1); + + if (cur_copy == NULL) + xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY, + could not serialize xml); + PG_TRY(); { ! xmlNodeDump(buf, NULL, cur_copy, 0, 1); result = xmlBuffer_to_xmltype(buf); } PG_CATCH(); { + xmlFreeNode(cur_copy); xmlBufferFree(buf); PG_RE_THROW(); } PG_END_TRY(); + xmlFreeNode(cur_copy); xmlBufferFree(buf); } else *** *** 3656,3662 xml_xmlnodetoxmltype(xmlNodePtr cur) */ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate) { int result = 0; Datum datum; --- 3672,3679 */ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate, ! PgXmlErrorContext *xmlerrcxt) { int result = 0; Datum datum; *** *** 3678,3684 xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, for (i = 0; i result; i++) { ! 
datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj-nodesetval-nodeTab[i])); *astate = accumArrayResult(*astate, datum, false, XMLOID, CurrentMemoryContext); --- 3695,3702 for (i = 0; i result; i++) { ! datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj-nodesetval-nodeTab[i], ! xmlerrcxt)); *astate = accumArrayResult(*astate, datum, false, XMLOID, CurrentMemoryContext); *** *** 3882,3890 xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces, * Extract the results as requested. */ if (res_nitems != NULL) ! *res_nitems = xml_xpathobjtoxmlarray(xpathobj, astate); else ! (void) xml_xpathobjtoxmlarray(xpathobj, astate); } PG_CATCH(); { --- 3900,3908 * Extract the results as requested. */ if (res_nitems != NULL) ! *res_nitems =
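A small example of the behaviour this patch targets (the namespace URI is
made up; the described output is the expectation under the patch, not
current behaviour):

    SELECT unnest(xpath('/a/x:b',
                        '<a xmlns:x="http://example.com/ns"><x:b>1</x:b></a>'::xml,
                        ARRAY[ARRAY['x', 'http://example.com/ns']]));

    -- Without the fix the returned fragment is just <x:b>1</x:b>, which is
    -- not well-formed on its own; with the xmlCopyNode() approach the
    -- serialized node is expected to carry the
    -- xmlns:x="http://example.com/ns" declaration copied from its parent.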
[HACKERS] No exact/lossy block information in EXPLAIN ANALYZE for a bitmap heap scan
I've noticed that EXPLAIN ANALYZE shows no information on exact/lossy blocks for a bitmap heap scan when both the numbers of exact/lossy pages retrieved by the node are zero. Such an example is shown below. I think it would be better to suppress the 'Heap Blocks' line in that case, based on the same idea of the 'Buffers' line. Patch attached. postgres=# explain (analyze, verbose, buffers) select * from test where id 10; QUERY PLAN -- Bitmap Heap Scan on public.test (cost=4.29..8.31 rows=1 width=29) (actual time=0.006..0.006 rows=0 loops=1) Output: id, inserted, data Recheck Cond: (test.id 10) Heap Blocks: Buffers: shared hit=2 - Bitmap Index Scan on test_pkey (cost=0.00..4.29 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=1) Index Cond: (test.id 10) Buffers: shared hit=2 Planning time: 0.118 ms Execution time: 0.027 ms (10 rows) Thanks, Best regards, Etsuro Fujita *** a/src/backend/commands/explain.c --- b/src/backend/commands/explain.c *** *** 1937,1949 show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es) } else { ! appendStringInfoSpaces(es-str, es-indent * 2); ! appendStringInfoString(es-str, Heap Blocks:); ! if (planstate-exact_pages 0) ! appendStringInfo(es-str, exact=%ld, planstate-exact_pages); ! if (planstate-lossy_pages 0) ! appendStringInfo(es-str, lossy=%ld, planstate-lossy_pages); ! appendStringInfoChar(es-str, '\n'); } } --- 1937,1952 } else { ! if (planstate-exact_pages 0 || planstate-lossy_pages 0) ! { ! appendStringInfoSpaces(es-str, es-indent * 2); ! appendStringInfoString(es-str, Heap Blocks:); ! if (planstate-exact_pages 0) ! appendStringInfo(es-str, exact=%ld, planstate-exact_pages); ! if (planstate-lossy_pages 0) ! appendStringInfo(es-str, lossy=%ld, planstate-lossy_pages); ! appendStringInfoChar(es-str, '\n'); ! } } } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS Design
On Thursday, July 10, 2014, Robert Haas robertmh...@gmail.com wrote: On Wed, Jul 9, 2014 at 2:13 AM, Stephen Frost sfr...@snowman.net javascript:; wrote: Yes, this would be possible (and is nearly identical to the original patch, except that this includes per-role considerations), however, my thinking is that it'd be simpler to work with policy names rather than sets of quals, to use when mapping to roles, and they would potentially be useful later for other things (eg: for setting up which policies should be applied when, or which should be OR' or ANDd with other policies, or having groups of policies, etc). Hmm. I guess that's reasonable. Should the policy be a per-table object (like rules, constraints, etc.) instead of a global object? You could do: ALTER TABLE table_name ADD POLICY policy_name (quals); ALTER TABLE table_name POLICY FOR role_name IS policy_name; ALTER TABLE table_name DROP POLICY policy_name; Right, I was thinking they would be per table as they would specifically provide a name for a set of quals, and quals are naturally table-specific. I don't see a need to have them be global- that had been brought up before with the notion of applications picking their policy, but we could also add that later through another term (eg: contexts) which would then map to policies or similar. We could even extend policies to be global by mapping existing per-table ones to be global if we really needed to... My feeling at the moment is that having them be per-table makes sense and we'd still have flexibility to change later if we had some compelling reason to do so. Thanks! Stephen
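To make the proposal concrete, a hypothetical session using exactly the
syntax sketched above (nothing here is committed; the table, role and
qual are invented):

    ALTER TABLE accounts ADD POLICY manager_only (manager = current_user);
    ALTER TABLE accounts POLICY FOR managers IS manager_only;
    -- and later, if the policy is no longer wanted:
    ALTER TABLE accounts DROP POLICY manager_only;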
[HACKERS] pg_receivexlog and replication slots
Is there a particular reason why pg_receivexlog only supports using
manually created slots but pg_recvlogical supports creating and dropping
them? Wouldn't it be good for consistency there? I'm guessing it's related
to not being exposed over the replication protocol?

We had a discussion earlier that I remember about being able to use an
auto-drop slot in pg_basebackup, but this would be different - this would
be about creating and dropping a regular physical replication slot...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] pg_receivexlog and replication slots
On 2014-07-11 11:08:48 +0200, Magnus Hagander wrote:
> Is there a particular reason why pg_receivexlog only supports using
> manually created slots but pg_recvlogical supports creating and
> dropping them? Wouldn't it be good for consistency there?

I've added it to recvlogical because logical decoding isn't usable
without slots. I'm not sure what you'd want to create/drop a slot from
receivexlog for, but I'm not adverse to adding the capability.

> I'm guessing it's related to not being exposed over the replication
> protocol?

It's exposed:

    create_replication_slot:
            /* CREATE_REPLICATION_SLOT slot PHYSICAL */
            K_CREATE_REPLICATION_SLOT IDENT K_PHYSICAL

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
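For reference, a physical slot can already be created manually (as of 9.4)
and then handed to pg_receivexlog via its --slot option; the slot name
here is invented:

    SELECT * FROM pg_create_physical_replication_slot('archive_slot');

followed by something like pg_receivexlog -D /path/to/archive -S archive_slot.
The question in this thread is only whether the tool should be able to
create and drop the slot itself over the walsender protocol.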
Re: [HACKERS] pg_receivexlog and replication slots
On Fri, Jul 11, 2014 at 11:14 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2014-07-11 11:08:48 +0200, Magnus Hagander wrote:
>> Is there a particular reason why pg_receivexlog only supports using
>> manually created slots but pg_recvlogical supports creating and
>> dropping them? Wouldn't it be good for consistency there?
>
> I've added it to recvlogical because logical decoding isn't usable
> without slots. I'm not sure what you'd want to create/drop a slot from
> receivexlog for, but I'm not adverse to adding the capability.

I was mostly thinking for consistency, really. Using slots with
pg_receivexlog makes quite a bit of sense, even though it's useful
without it. But having the ability to create and drop them directly
(with compatible commandline arguments, of course) when you set it up
would certainly be more convenient.

>> I'm guessing it's related to not being exposed over the replication
>> protocol?
>
> It's exposed:
>
> create_replication_slot:
>         /* CREATE_REPLICATION_SLOT slot PHYSICAL */
>         K_CREATE_REPLICATION_SLOT IDENT K_PHYSICAL

Hmm. You know, it would help if I actually looked at a 9.4 version of
the file when I check for functions of this kind :)

Thanks!

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
[HACKERS] Incorrect comment in postgres_fdw.c
I think the following comment for store_returning_result() in
postgres_fdw.c is not right:

    /* PGresult must be released before leaving this function. */

I think the PGresult should *not* be released before leaving that function
on success. (I guess the comment has been copied and pasted from the one
for get_remote_estimate().)

Thanks,

Best regards,
Etsuro Fujita

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 7dd43a9..f328833 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2257,7 +2257,6 @@ static void
 store_returning_result(PgFdwModifyState *fmstate,
                        TupleTableSlot *slot, PGresult *res)
 {
-	/* PGresult must be released before leaving this function. */
 	PG_TRY();
 	{
 		HeapTuple	newtup;
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On 11 Červenec 2014, 9:27, Simon Riggs wrote: On 9 July 2014 18:54, Tomas Vondra t...@fuzzy.cz wrote: (1) size the buckets for NTUP_PER_BUCKET=1 (and use whatever number of batches this requires) If we start off by assuming NTUP_PER_BUCKET = 1, how much memory does it save to recalculate the hash bucket at 10 instead? Resizing sounds like it will only be useful of we only just overflow our limit. If we release next version with this as a hardcoded change, my understanding is that memory usage for hash joins will leap upwards, even if the run time of queries reduces. It sounds like we need some kind of parameter to control this. We made it faster might not be true if we run this on servers that are already experiencing high memory pressure. Sure. We certainly don't want to make things worse for environments with memory pressure. The current implementation has two issues regarding memory: (1) It does not include buckets into used memory, i.e. it's not included into work_mem (so we may overflow work_mem). I plan to fix this, to make work_mem a bit more correct, as it's important for cases with NTUP_PER_BUCKET=1. (2) There's a significant palloc overhead, because of allocating each tuple separately - see my message from yesterday, where I observed the batch memory context to get 1.4GB memory for 700MB of tuple data. By densely packing the tuples, I got down to ~700MB (i.e. pretty much no overhead). The palloc overhead seems to be 20B (on x86_64) per tuple, and eliminating this it more than compensates for ~8B per tuple, required for NTUP_PER_BUCKET=1. And fixing (1) makes it more correct / predictable. It also improves the issue that palloc overhead is not counted into work_mem at all (that's why I got ~1.4GB batch context with work_mem=1GB). So in the end this should give us much lower memory usage for hash joins, even if we switch to NTUP_PER_BUCKET=1 (although that's pretty much independent change). Does that seem reasonable? Regarding the tunable to control this - I certainly don't want another GUC no one really knows how to set right. And I think it's unnecessary thanks to the palloc overhead / work_mem accounting fix, described above. The one thing I'm not sure about is what to do in case of reaching the work_mem limit (which should only happen with underestimated row count / row width) - how to decide whether to shrink the hash table or increment the number of batches. But this is not exclusive to NTUP_PER_BUCKET=1, it may happen with whatever NTUP_PER_BUCKET value you choose. The current code does not support resizing at all, so it always increments the number of batches, but I feel an interleaved approach might be more appropriate (nbuckets/2, nbatches*2, nbuckets/2, nbatches*2, ...). It'd be nice to have some cost estimates ('how expensive is a rescan' vs. 'how expensive is a resize'), but I'm not sure how to get that. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
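To put rough numbers on that trade-off (the tuple count is assumed purely
for illustration): with 10 million tuples, the ~20 B/tuple palloc overhead
comes to about 200 MB, whereas one bucket pointer per tuple for
NTUP_PER_BUCKET=1 costs about 8 B/tuple, i.e. roughly 80 MB. So the dense
packing saves noticeably more memory than the larger bucket array adds.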
Re: [HACKERS] pg_receivexlog and replication slots
On 2014-07-11 11:18:58 +0200, Magnus Hagander wrote: On Fri, Jul 11, 2014 at 11:14 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-07-11 11:08:48 +0200, Magnus Hagander wrote: Is there a particular reason why pg_receivexlog only supports using manually created slots but pg_recvlogical supports creating and dropping them? Wouldn't it be good for consistency there? I've added it to recvlogical because logical decoding isn't usable without slots. I'm not sure what you'd want to create/drop a slot from receivexlog for, but I'm not adverse to adding the capability. I was mostly thinking for consistency, really. Using slots with pg_receivexlog makes quite a bit of sense, even though it's useful without it. But having the ability to create and drop (with compatible commandline arguments of course) them directly when you set it up would certainly be more convenient. Ok. Do you plan to take care of it? If, I'd be fine with backpatching it. I'm not likely to get to it right now :( Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Securing make check (CVE-2014-0067)
Re: Bruce Momjian 2014-07-08 20140708202114.gd9...@momjian.us I believe pg_upgrade itself still needs a fix. While it's not a security problem to put the socket in $CWD while upgrading (it is using -c unix_socket_permissions=0700), this behavior is pretty unexpected, and does fail if your $CWD is 107 bytes. In f545d233ebce6971b6f9847680e48b679e707d22 Peter fixed the pg_ctl perl tests to avoid that problem, so imho it would make even more sense to fix pg_upgrade which could also fail in production. +1. Does writing that patch interest you? I'll give it a try once I've finished this CF review. OK. Let me know if you need help. Here's the patch. Proposed commit message: Create pg_upgrade sockets in temp directories pg_upgrade used to use the current directory for UNIX sockets to access the old/new cluster. This fails when the current path is 107 bytes. Fix by reusing the tempdir code from pg_regress introduced in be76a6d39e2832d4b88c0e1cc381aa44a7f86881. For cleanup, we need to remember up to two directories. Christoph -- c...@df7cb.de | http://www.df7cb.de/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Securing make check (CVE-2014-0067)
Re: To Bruce Momjian 2014-07-11 20140711093923.ga3...@msg.df7cb.de Re: Bruce Momjian 2014-07-08 20140708202114.gd9...@momjian.us I believe pg_upgrade itself still needs a fix. While it's not a security problem to put the socket in $CWD while upgrading (it is using -c unix_socket_permissions=0700), this behavior is pretty unexpected, and does fail if your $CWD is 107 bytes. In f545d233ebce6971b6f9847680e48b679e707d22 Peter fixed the pg_ctl perl tests to avoid that problem, so imho it would make even more sense to fix pg_upgrade which could also fail in production. +1. Does writing that patch interest you? I'll give it a try once I've finished this CF review. OK. Let me know if you need help. Here's the patch. Proposed commit message: Create pg_upgrade sockets in temp directories pg_upgrade used to use the current directory for UNIX sockets to access the old/new cluster. This fails when the current path is 107 bytes. Fix by reusing the tempdir code from pg_regress introduced in be76a6d39e2832d4b88c0e1cc381aa44a7f86881. For cleanup, we need to remember up to two directories. Uh... now really. Christoph -- c...@df7cb.de | http://www.df7cb.de/ diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c index b81010a..121a3d4 100644 --- a/contrib/pg_upgrade/option.c +++ b/contrib/pg_upgrade/option.c @@ -14,6 +14,7 @@ #include pg_upgrade.h +#include signal.h #include time.h #include sys/types.h #ifdef WIN32 @@ -26,6 +27,13 @@ static void check_required_directory(char **dirpath, char **configpath, char *envVarName, char *cmdLineOption, char *description); #define FIX_DEFAULT_READ_ONLY -c default_transaction_read_only=false +#ifdef HAVE_UNIX_SOCKETS +/* make_temp_sockdir() is invoked at most twice from pg_upgrade.c via get_sock_dir() */ +#define MAX_TEMPDIRS 2 +static int n_tempdirs = 0; /* actual number of directories created */ +static const char *temp_sockdir[MAX_TEMPDIRS]; +#endif + UserOpts user_opts; @@ -396,6 +404,86 @@ adjust_data_dir(ClusterInfo *cluster) check_ok(); } +#ifdef HAVE_UNIX_SOCKETS +/* + * Remove the socket temporary directories. pg_ctl waits for postmaster + * shutdown, so we expect the directory to be empty, unless we are interrupted + * by a signal, in which case the postmaster will clean up the sockets, but + * there's a race condition with us removing the directory. Ignore errors; + * leaking a (usually empty) temporary directory is unimportant. This can run + * from a signal handler. The code is not acceptable in a Windows signal + * handler (see initdb.c:trapsig()), but Windows is not a HAVE_UNIX_SOCKETS + * platform. + */ +static void remove_temp(void) +{ + int i; + + for (i = 0; i n_tempdirs; i++) + { + Assert(temp_sockdir[i]); + rmdir(temp_sockdir[i]); + } +} + +/* + * Signal handler that calls remove_temp() and reraises the signal. + */ +static void +signal_remove_temp(int signum) +{ + remove_temp(); + + pqsignal(signum, SIG_DFL); + raise(signum); +} + +/* + * Create a temporary directory suitable for the server's Unix-domain socket. + * The directory will have mode 0700 or stricter, so no other OS user can open + * our socket to exploit it independently from the auth method used. Most + * systems constrain the length of socket paths well below _POSIX_PATH_MAX, so + * we place the directory under /tmp rather than relative to the possibly-deep + * current working directory. + * + * Using a temporary directory so no connections arrive other than what + * pg_upgrade initiate itself. 
Compared to using the compiled-in + * DEFAULT_PGSOCKET_DIR, this also permits pg_upgrade to work in builds that + * relocate it to a directory not writable to the cluster owner. + */ +static const char * +make_temp_sockdir(void) +{ + char *template = strdup(/tmp/pg_upgrade-XX); + + Assert(n_tempdirs MAX_TEMPDIRS); + temp_sockdir[n_tempdirs] = mkdtemp(template); + if (temp_sockdir[n_tempdirs] == NULL) + { + fprintf(stderr, _(%s: could not create directory \%s\: %s\n), +os_info.progname, template, strerror(errno)); + exit(2); + } + + /* + * Remove the directories during clean exit. Register the handler only + * once, though. + */ + if (n_tempdirs == 0) + atexit(remove_temp); + + /* + * Remove the directory before dying to the usual signals. Omit SIGQUIT, + * preserving it as a quick, untidy exit. + */ + pqsignal(SIGHUP, signal_remove_temp); + pqsignal(SIGINT, signal_remove_temp); + pqsignal(SIGPIPE, signal_remove_temp); + pqsignal(SIGTERM, signal_remove_temp); + + return temp_sockdir[n_tempdirs++]; +} +#endif /* HAVE_UNIX_SOCKETS */ /* * get_sock_dir @@ -418,10 +506,8 @@ get_sock_dir(ClusterInfo *cluster, bool live_check) { if (!live_check) { - /* Use the current directory for the socket */ - cluster-sockdir = pg_malloc(MAXPGPATH); - if (!getcwd(cluster-sockdir, MAXPGPATH)) -pg_fatal(cannot find current directory\n); + /*
Re: [HACKERS] add line number as prompt option to psql
On Fri, Jul 11, 2014 at 4:23 PM, Jeevan Chalke jeevan.cha...@enterprisedb.com wrote: Hi, A. However, this introduced new bug. As I told, when editor number of lines reaches INT_MAX it starts giving negative number. You tried overcoming this issue by adding 0 check. But I guess you again fumbled in setting that correctly. You are setting it to INT_MAX - 1. This enforces each new line to show line number as INT_MAX - 1 which is incorrect. postgres=# \set PROMPT1 '%/[%l]%R%# ' postgres[1]=# \set PROMPT2 '%/[%l]%R%# ' postgres[1]=# \e postgres[2147483646]-# limit postgres[2147483646]-# 1; relname -- pg_statistic (1 row) postgres[1]=# \e postgres[2147483646]-# = postgres[2147483646]-# ' postgres[2147483646]'# abc postgres[2147483646]'# ' postgres[2147483646]-# ; relname - (0 rows) postgres[1]=# select relname from pg_class where relname = ' abc ' ; Again to mimic that, I have simply intialized newline to INT_MAX - 2. Please don't take me wrong, but it seems that your unit testing is not enough. Above issue I discovered by doing exactly same steps I did in reviewing previous patch. If you had tested your new patch with those steps I guess you have caught that yourself. To my understating cleanly, you means that line number is not changed when newline has reached to INT_MAX, is incorrect? And the line number should be switched to 1 when line number has reached to INT_MAX? B. + /* Calculate the line number */ + if (scan_result != PSCAN_INCOMPLETE) + { + /* The one new line is always added to tail of query_buf */ + newline = (newline != 0) ? (newline + 1) : 1; + cur_line += newline; + } Here in above code changes, in any case you are adding 1 to newline. i.e. when it is 0 you are setting it 1 (+1) and when 0 you are setting nl + 1 (again +1). So why can't you simply use if (scan_result != PSCAN_INCOMPLETE) cur_line += (newline + 1); Or better, why can't you initialize newline with 1 itself and then directly assign cur_line with newline. That will eliminate above entire code block, isn't it? Or much better, simply get rid of newline, and use cur_line itself, will this work well for you? this is better. I will change code to this. C. Typos: 1. /* Count the number of new line for calculate ofline number */ Missing space between 'of' and 'line'. However better improve that to something like (just suggestion): Count the number of new lines to correctly adjust current line number 2. /* Avoid cur_line and newline exceeds the INT_MAX */ You are saying avoid cur_line AND newline, but there is no adjustment for newline in the code following the comment. Thanks -- Jeevan B Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company Thanks. I will fix it. -- Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Wed, Jul 9, 2014 at 12:59 PM, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrow...@gmail.com writes: On 9 July 2014 09:27, Tom Lane t...@sss.pgh.pa.us wrote: On review it looks like analyzejoins.c would possibly benefit from an earlier fast-path check as well. Do you mean for non-subqueries? There already is a check to see if the relation has no indexes. Oh, sorry, that was a typo: I meant to write pathnode.c. Specifically, we could skip the translate_sub_tlist step. Admittedly that's not hugely expensive, but as long as we have the infrastructure for a quick check it might be worth doing. TBH I find the checks for FOR UPDATE and volatile functions to be questionable as well. Well, that's a real tough one for me as I only added that based on what you told me here: I doubt you should drop a subquery containing FOR UPDATE, either. That's a side effect, just as much as a volatile function would be. Hah ;-). But the analogy to qual pushdown hadn't occurred to me at the time. Ok, I've removed the check for volatile functions and FOR UPDATE. As far as I know the FOR UPDATE check is pretty much void as of now anyway, since the current state of query_is_distinct_for() demands that there's either a DISTINCT, GROUP BY or just a plain old aggregate without any grouping, which will just return a single row, neither of these will allow FOR UPDATE anyway. True. So the effort here should be probably be more focused on if we should allow the join removal when the subquery contains volatile functions. We should probably think fairly hard on this now as I'm still planning on working on INNER JOIN removals at some point and I'm thinking we should likely be consistent between the 2 types of join for when it comes to FOR UPDATE and volatile functions, and I'm thinking right now that for INNER JOINs that, since they're INNER that we could remove either side of the join. In that case maybe it would be harder for the user to understand why their volatile function didn't get executed. Color me dubious. In exactly what circumstances would it be valid to suppress an inner join involving a sub-select? hmm, probably I didn't think this through enough before commenting as I don't actually have any plans for subselects with INNER JOINs. Though saying that I guess there are cases that can be removed... Anything that queries a single table that has a foreign key matching the join condition, where the subquery does not filter or group the results. Obviously something about the query would have to exist that caused it not to be flattened, perhaps some windowing functions... I've attached an updated patch which puts in some fast path code for subquery type joins. I'm really not too sure on a good name for this function. I've ended up with query_supports_distinctness() which I'm not that keen on, but I didn't manage to come up with anything better. Regards David Rowley subquery_leftjoin_removal_v1.6.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
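A sketch of the kind of join the patch can now remove (table names are
made up; the point is that nothing from the subquery is referenced and its
GROUP BY makes it provably distinct on the join key):

    SELECT o.id, o.total
    FROM orders o
    LEFT JOIN (SELECT customer_id, sum(amount) AS paid
               FROM payments
               GROUP BY customer_id) p
           ON p.customer_id = o.customer_id;

    -- Since no column of p is used and p returns at most one row per
    -- customer_id, the left join cannot change the result and can be
    -- dropped by the planner.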
[HACKERS] [PATCH] Fix search_path default value separator.
Hi I noticed a minor inconsistency with the search_path separator used in the default configuration. The schemas of any search_path set using `SET search_path TO...` are separated by , (comma, space), while the default value is only separated by , (comma). The attached patch against master changes the separator of the default value to be consistent with the usual comma-space separators, and updates the documentation of `SHOW search_path;` accordingly. This massive three-byte change passes all 144 tests of make check. Regards, Christoph From 0f52f107af59f560212bff2bda74e643d63687f0 Mon Sep 17 00:00:00 2001 From: Christoph Martin christoph.r.mar...@gmail.com Date: Fri, 11 Jul 2014 08:52:39 +0200 Subject: [PATCH] Fix search_path default value separator. When setting the search_path with `SET search_path TO...`, the schemas of the resulting search_path as reported by `SHOW search_path` are separated by a comma followed by a single space. This patch applies the same format to the default search_path, and updates the documentation accordingly. --- doc/src/sgml/ddl.sgml | 2 +- src/backend/utils/misc/guc.c | 2 +- src/backend/utils/misc/postgresql.conf.sample | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 3b7fff4..2273616 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -1746,7 +1746,7 @@ SHOW search_path; screen search_path -- - $user,public + $user, public /screen The first element specifies that a schema with the same name as the current user is to be searched. If no such schema exists, diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 3a31a75..a3f1051 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -2915,7 +2915,7 @@ static struct config_string ConfigureNamesString[] = GUC_LIST_INPUT | GUC_LIST_QUOTE }, namespace_search_path, - \$user\,public, + \$user\, public, check_search_path, assign_search_path, NULL }, diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 3f3e706..df98b02 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -499,7 +499,7 @@ # - Statement Behavior - -#search_path = '$user,public' # schema names +#search_path = '$user, public' # schema names #default_tablespace = '' # a tablespace name, '' uses the default #temp_tablespaces = '' # a list of tablespace names, '' uses # only default tablespace -- 2.0.1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
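For anyone who wants to see the inconsistency directly, a quick session
(the schema name is invented; the last line shows the pre-patch default):

    SET search_path TO myschema, public;
    SHOW search_path;    -- myschema, public
    RESET search_path;
    SHOW search_path;    -- "$user",public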
Re: [HACKERS] No exact/lossy block information in EXPLAIN ANALYZE for a bitmap heap scan
On Fri, Jul 11, 2014 at 5:45 PM, Etsuro Fujita <fujita.ets...@lab.ntt.co.jp> wrote:
> I've noticed that EXPLAIN ANALYZE shows no information on exact/lossy
> blocks for a bitmap heap scan when both the numbers of exact/lossy pages
> retrieved by the node are zero. Such an example is shown below. I think
> it would be better to suppress the 'Heap Blocks' line in that case,
> based on the same idea of the 'Buffers' line. Patch attached.

The patch looks good to me. Barring any objection, I will commit this
both in HEAD and 9.4.

Regards,

-- 
Fujii Masao
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On 11 July 2014 10:23, Tomas Vondra t...@fuzzy.cz wrote: On 11 Červenec 2014, 9:27, Simon Riggs wrote: On 9 July 2014 18:54, Tomas Vondra t...@fuzzy.cz wrote: (1) size the buckets for NTUP_PER_BUCKET=1 (and use whatever number of batches this requires) If we start off by assuming NTUP_PER_BUCKET = 1, how much memory does it save to recalculate the hash bucket at 10 instead? Resizing sounds like it will only be useful of we only just overflow our limit. If we release next version with this as a hardcoded change, my understanding is that memory usage for hash joins will leap upwards, even if the run time of queries reduces. It sounds like we need some kind of parameter to control this. We made it faster might not be true if we run this on servers that are already experiencing high memory pressure. Sure. We certainly don't want to make things worse for environments with memory pressure. The current implementation has two issues regarding memory: (1) It does not include buckets into used memory, i.e. it's not included into work_mem (so we may overflow work_mem). I plan to fix this, to make work_mem a bit more correct, as it's important for cases with NTUP_PER_BUCKET=1. (2) There's a significant palloc overhead, because of allocating each tuple separately - see my message from yesterday, where I observed the batch memory context to get 1.4GB memory for 700MB of tuple data. By densely packing the tuples, I got down to ~700MB (i.e. pretty much no overhead). The palloc overhead seems to be 20B (on x86_64) per tuple, and eliminating this it more than compensates for ~8B per tuple, required for NTUP_PER_BUCKET=1. And fixing (1) makes it more correct / predictable. It also improves the issue that palloc overhead is not counted into work_mem at all (that's why I got ~1.4GB batch context with work_mem=1GB). So in the end this should give us much lower memory usage for hash joins, even if we switch to NTUP_PER_BUCKET=1 (although that's pretty much independent change). Does that seem reasonable? Yes, that alleviates my concern. Thanks. Regarding the tunable to control this - I certainly don't want another GUC no one really knows how to set right. And I think it's unnecessary thanks to the palloc overhead / work_mem accounting fix, described above. Agreed, nor does anyone. The one thing I'm not sure about is what to do in case of reaching the work_mem limit (which should only happen with underestimated row count / row width) - how to decide whether to shrink the hash table or increment the number of batches. But this is not exclusive to NTUP_PER_BUCKET=1, it may happen with whatever NTUP_PER_BUCKET value you choose. The current code does not support resizing at all, so it always increments the number of batches, but I feel an interleaved approach might be more appropriate (nbuckets/2, nbatches*2, nbuckets/2, nbatches*2, ...). It'd be nice to have some cost estimates ('how expensive is a rescan' vs. 'how expensive is a resize'), but I'm not sure how to get that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Missing autocomplete for CREATE DATABASE
On Fri, Jul 11, 2014 at 12:01 AM, Vik Fearing <vik.fear...@dalibo.com> wrote:
> On 07/10/2014 09:32 PM, Magnus Hagander wrote:
>> It seems psql is missing autocomplete entries for LC_COLLATE and
>> LC_CTYPE for the CREATE DATABASE command. Attached patch adds that.
>> I doubt this is important enough to backpatch - thoughts?
>
> It won't apply to current head, but otherwise I see no problem with it.

Meh, thanks for pointing that out. I git-fetch:ed from the wrong
repository :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] add line number as prompt option to psql
Hi, On Fri, Jul 11, 2014 at 3:13 PM, Sawada Masahiko sawada.m...@gmail.com wrote: To my understating cleanly, you means that line number is not changed when newline has reached to INT_MAX, is incorrect? As per my thinking yes. And the line number should be switched to 1 when line number has reached to INT_MAX? Yes, when it goes beyond INT_MAX, wrap around to 1. BTW, I wonder, can't we simply use unsigned int instead? Also, what is the behaviour on LINE n, in error message in case of such wrap-around? Or much better, simply get rid of newline, and use cur_line itself, will this work well for you? this is better. I will change code to this. Thanks. I will fix it. Meanwhile I have tried this. Attached patch to have your suggestion on that. Thanks -- Jeevan B Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 255e8ca..030f4d0 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -3298,6 +3298,11 @@ testdb=gt; userinputINSERT INTO my_table VALUES (:'content');/userinput /varlistentry varlistentry +termliteral%l/literal/term +listitemparaThe current line number/para/listitem + /varlistentry + + varlistentry termliteral%/literalreplaceable class=parameterdigits/replaceable/term listitem para diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c index c3aff20..675b550 100644 --- a/src/bin/psql/mainloop.c +++ b/src/bin/psql/mainloop.c @@ -8,6 +8,7 @@ #include postgres_fe.h #include mainloop.h +#include limits.h #include command.h #include common.h @@ -58,6 +59,7 @@ MainLoop(FILE *source) pset.cur_cmd_source = source; pset.cur_cmd_interactive = ((source == stdin) !pset.notty); pset.lineno = 0; + cur_line = 1; /* Create working state */ scan_state = psql_scan_create(); @@ -225,6 +227,7 @@ MainLoop(FILE *source) { PsqlScanResult scan_result; promptStatus_t prompt_tmp = prompt_status; + char *tmp = line; scan_result = psql_scan(scan_state, query_buf, prompt_tmp); prompt_status = prompt_tmp; @@ -235,6 +238,30 @@ MainLoop(FILE *source) exit(EXIT_FAILURE); } + /* + * Increase current line number counter with the new lines present + * in the line buffer + */ + while (*tmp != '\0' scan_result != PSCAN_INCOMPLETE) + { +if (*(tmp++) == '\n') + cur_line++; + } + + /* The one new line is always added to tail of query_buf */ + if (scan_result != PSCAN_INCOMPLETE) +cur_line++; + + /* + * If we overflow, then we start at INT_MIN and move towards 0. So + * to get +ve wrap-around line number we have to add INT_MAX + 2 to + * this number. We add 2 due to the fact that we have difference + * of 1 in absolute value of INT_MIN and INT_MAX and another 1 as + * line number starts at one and not at zero. + */ + if (cur_line 0) +cur_line += INT_MAX + 2; + /* * Send command if semicolon found, or if end of line and we're in * single-line mode. @@ -256,6 +283,7 @@ MainLoop(FILE *source) /* execute query */ success = SendQuery(query_buf-data); slashCmdStatus = success ? 
PSQL_CMD_SEND : PSQL_CMD_ERROR; +cur_line = 1; /* transfer query to previous_buf by pointer-swapping */ { @@ -303,6 +331,7 @@ MainLoop(FILE *source) query_buf : previous_buf); success = slashCmdStatus != PSQL_CMD_ERROR; +cur_line = 1; if ((slashCmdStatus == PSQL_CMD_SEND || slashCmdStatus == PSQL_CMD_NEWEDIT) query_buf-len == 0) diff --git a/src/bin/psql/prompt.c b/src/bin/psql/prompt.c index 26fca04..6a62e5f 100644 --- a/src/bin/psql/prompt.c +++ b/src/bin/psql/prompt.c @@ -44,6 +44,7 @@ * in prompt2 -, *, ', or ; * in prompt3 nothing * %x - transaction status: empty, *, !, ? (unknown or no connection) + * %l - the line number * %? - the error code of the last query (not yet implemented) * %% - a percent sign * @@ -229,6 +230,9 @@ get_prompt(promptStatus_t status) } break; +case 'l': + sprintf(buf, %d, cur_line); + break; case '?': /* not here yet */ break; diff --git a/src/bin/psql/prompt.h b/src/bin/psql/prompt.h index 4d2f7e3..f1f80d2 100644 --- a/src/bin/psql/prompt.h +++ b/src/bin/psql/prompt.h @@ -22,4 +22,7 @@ typedef enum _promptStatus char *get_prompt(promptStatus_t status); +/* Current line number */ +intcur_line; + #endif /* PROMPT_H */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing NOT IN to use ANTI joins
On Fri, Jul 11, 2014 at 1:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: We could no doubt fix this by also insisting that the left-side vars be provably not null, but that's going to make the patch even slower and even less often applicable. I'm feeling discouraged about whether this is worth doing in this form. :-( seems I didn't do my analysis very well on that one. Hm ... actually, there might be a better answer: what about transforming WHERE (x,y) NOT IN (SELECT provably-not-null-values FROM ...) to WHERE antijoin condition AND x IS NOT NULL AND y IS NOT NULL ? I think this is the way to go. It's basically what I had to do with the WIP patch I have here for SEMI JOIN removal, where when a IN() or EXISTS type join could be removed due to the existence of a foreign key, the NULL values still need to be filtered out. Perhaps it would be possible for a future patch to check get_attnotnull and remove these again in eval_const_expressions, if the column can't be null. Thanks for taking the time to fix up the weirdness with the NATURAL joins and also making use of the join condition to prove not null-ability. I'll try and get some time soon to look into adding the IS NOT NULL quals, unless you were thinking of looking again? Regards David Rowley Of course this would require x/y not being volatile, but if they are, we're not going to get far with optimizing the query anyhow. regards, tom lane
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Thank you for review. So, you're compressing backup blocks one by one. I wonder if that's the right idea and if we shouldn't instead compress all of them in one run to increase the compression ratio. The idea behind compressing blocks one by one was to keep the code as much similar to the original as possible. For instance the easiest change I could think of is , if we compress all backup blocks of a WAL record together the below format of WAL record might change Fixed-size header (XLogRecord struct) rmgr-specific data BkpBlock backup block data BkpBlock backup block data to Fixed-size header (XLogRecord struct) rmgr-specific data BkpBlock BkpBlock backup blocks data ... But at the same time, it can be worth giving a try to see if there is significant improvement in compression . So why aren't we compressing the hole here instead of compressing the parts that the current logic deems to be filled with important information? Entire full page image in the WAL record is compressed. The unimportant part of the full page image which is hole is not WAL logged in original code. This patch compresses entire full page image inclusive of hole. This can be optimized by omitting hole in the compressed FPW(incase hole is filled with non-zeros) like the original uncompressed FPW . But this can lead to change in BkpBlock structure. This should be named 'compress_full_page_writes' or so, even if a temporary guc. There's the 'full_page_writes' guc and I see little reaason to deviate from its name. Yes. This will be renamed to full_page_compression according to suggestions earlier in the discussion. Thank you, Rahila Syed On Fri, Jul 11, 2014 at 12:00 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-07-04 19:27:10 +0530, Rahila Syed wrote: + /* Allocates memory for compressed backup blocks according to the compression + * algorithm used.Once per session at the time of insertion of first XLOG + * record. + * This memory stays till the end of session. OOM is handled by making the + * code proceed without FPW compression*/ + static char *compressed_pages[XLR_MAX_BKP_BLOCKS]; + static bool compressed_pages_allocated = false; + if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF + compressed_pages_allocated!= true) + { + size_t buffer_size = VARHDRSZ; + int j; + if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_SNAPPY) + buffer_size += snappy_max_compressed_length(BLCKSZ); + else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_LZ4) + buffer_size += LZ4_compressBound(BLCKSZ); + else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_PGLZ) + buffer_size += PGLZ_MAX_OUTPUT(BLCKSZ); + for (j = 0; j XLR_MAX_BKP_BLOCKS; j++) + { compressed_pages[j] = (char *) malloc(buffer_size); + if(compressed_pages[j] == NULL) + { + compress_backup_block=BACKUP_BLOCK_COMPRESSION_OFF; + break; + } + } + compressed_pages_allocated = true; + } Why not do this in InitXLOGAccess() or similar? /* * Make additional rdata chain entries for the backup blocks, so that we * don't need to special-case them in the write loop. 
This modifies the @@ -1015,11 +1048,32 @@ begin:; rdt-next = (dtbuf_rdt2[i]); rdt = rdt-next; + if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF) + { + /* Compress the backup block before including it in rdata chain */ + rdt-data = CompressBackupBlock(page, BLCKSZ - bkpb-hole_length, + compressed_pages[i], (rdt-len)); + if (rdt-data != NULL) + { + /* + * write_len is the length of compressed block and its varlena + * header + */ + write_len += rdt-len; + bkpb-hole_length = BLCKSZ - rdt-len; + /*Adding information about compression in the backup block header*/ + bkpb-block_compression=compress_backup_block; + rdt-next = NULL; + continue; + } + } + So, you're compressing backup blocks one by one. I wonder if that's the right idea and if we shouldn't instead compress all of them in one run to increase the compression ratio. +/* * Get a pointer to the right location in the WAL buffer containing the * given XLogRecPtr. * @@ -4061,6 +4174,50 @@ RestoreBackupBlockContents(XLogRecPtr
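For illustration, here is a rough, hypothetical sketch (not part of the posted patch) of the alternative Andres suggests: the full page images of all backup blocks of a record are first copied back-to-back into a scratch buffer, hole included, just as the current patch treats a single page, and one compressor call is made over the concatenation. compress_buf() is only a placeholder for whichever of pglz/LZ4/snappy is configured; it is not an existing function.

  /*
   * Hypothetical sketch: compress all backup blocks of a WAL record in one
   * run instead of one call per block.  compress_buf() is a stand-in for
   * the configured compressor, not a real PostgreSQL function.
   */
  static int
  compress_all_backup_blocks(char *pages[], int nblocks,
                             char *dest, int destlen)
  {
      static char scratch[XLR_MAX_BKP_BLOCKS * BLCKSZ];
      int         total = 0;
      int         i;

      /* concatenate the full page images, hole included, as the patch does */
      for (i = 0; i < nblocks; i++)
      {
          memcpy(scratch + total, pages[i], BLCKSZ);
          total += BLCKSZ;
      }

      /* one compressor call over all blocks; returns the compressed length */
      return compress_buf(scratch, total, dest, destlen);
  }

Giving the compressor one larger input should improve the ratio, at the cost of the BkpBlock layout change described above, since the blocks can no longer be decompressed independently.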
Re: [HACKERS] Minmax indexes
Fujii Masao wrote: On Thu, Jul 10, 2014 at 6:16 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Here's a new version of this patch, which is more generic the original versions, and similar to what you describe. I've not read the discussion so far at all, but I found the problem when I played with this patch. Sorry if this has already been discussed. =# create table test as select num from generate_series(1,10) num; SELECT 10 =# create index testidx on test using minmax (num); CREATE INDEX =# alter table test alter column num type text; ERROR: could not determine which collation to use for string comparison HINT: Use the COLLATE clause to set the collation explicitly. Ah, yes, I need to pass down collation OIDs to comparison functions. That's marked as XXX in various places in the code. Sorry I forgot to mention that. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pg_upgrade and toast tables bug discovered
On Fri, Jul 11, 2014 at 12:18:40AM -0400, Alvaro Herrera wrote: Bruce Momjian wrote: On Thu, Jul 10, 2014 at 06:38:26PM -0400, Bruce Momjian wrote: I have thought some more on this. I thought I would need to open pg_class in C and do complex backend stuff, but I now realize I can do it from libpq, and just call ALTER TABLE and I think that always auto-checks if a TOAST table is needed. All I have to do is query pg_class from libpq, then construct ALTER TABLE commands for each item, and it will optionally create the TOAST table if needed. I just have to use a no-op ALTER TABLE command, like SET STATISTICS. Attached is the backend part of the patch. I will work on the pg_upgrade/libpq/ALTER TABLE part later. Uh, why does this need to be in ALTER TABLE? Can't this be part of table creation done by pg_dump? Uh, I think you need to read the thread. We have to delay the toast creation part so we don't use an oid that will later be required by another table from the old cluster. This has to be done after all tables have been created. We could have pg_dump spit out those ALTER lines at the end of the dump, but it seems simpler to do it in pg_upgrade. Even if we have pg_dump create all the tables that require pre-assigned TOAST oids first, then the other tables that _might_ need a TOAST table, those later tables might create a toast oid that matches a later non-TOAST-requiring table, so I don't think that fixes the problem. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
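A bare-bones sketch of that libpq loop, purely for illustration: the function name, the catalog query, the choice of column, and the use of SET STATISTICS -1 as the "no-op" command are all assumptions, and real code would have to preserve any custom per-column statistics target and handle errors.

  /*
   * Sketch only: issue a near-no-op ALTER TABLE against every table that has
   * no TOAST table, relying on ALTER TABLE to create one if it is now needed.
   */
  #include <stdio.h>
  #include "libpq-fe.h"

  static void
  create_missing_toast_tables(PGconn *conn)
  {
      PGresult   *res;
      int         i;

      /* tables without a TOAST table, plus their first live column */
      res = PQexec(conn,
                   "SELECT c.oid::regclass::text, quote_ident(a.attname) "
                   "FROM pg_class c JOIN pg_attribute a ON a.attrelid = c.oid "
                   "WHERE c.relkind = 'r' AND c.reltoastrelid = 0 "
                   "  AND a.attnum > 0 AND NOT a.attisdropped "
                   "  AND a.attnum = (SELECT min(a2.attnum) FROM pg_attribute a2 "
                   "                  WHERE a2.attrelid = c.oid AND a2.attnum > 0 "
                   "                    AND NOT a2.attisdropped)");

      for (i = 0; i < PQntuples(res); i++)
      {
          char        cmd[1024];

          /* the ALTER itself changes nothing useful; it only triggers the
           * backend's check for whether a TOAST table is required */
          snprintf(cmd, sizeof(cmd),
                   "ALTER TABLE %s ALTER COLUMN %s SET STATISTICS -1",
                   PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
          PQclear(PQexec(conn, cmd));
      }
      PQclear(res);
  }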
Re: [HACKERS] Pg_upgrade and toast tables bug discovered
On Fri, Jul 11, 2014 at 09:48:06AM -0400, Bruce Momjian wrote: Uh, why does this need to be in ALTER TABLE? Can't this be part of table creation done by pg_dump? Uh, I think you need to read the thread. We have to delay the toast creation part so we don't use an oid that will later be required by another table from the old cluster. This has to be done after all tables have been created. We could have pg_dump spit out those ALTER lines at the end of the dump, but it seems simpler to do it in pg_upgrade. Even if we have pg_dump create all the tables that require pre-assigned TOAST oids first, then the other tables that _might_ need a TOAST table, those later tables might create a toast oid that matches a later non-TOAST-requiring table, so I don't think that fixes the problem. What would be nice is if I could mark just the tables that will need toast tables created in that later phase (those tables that didn't have a toast table in the old cluster, but need one in the new cluster). However, I can't see where to store that or how to pass that back into pg_upgrade. I don't see a logical place in pg_class to put it. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add line number as prompt option to psql
Jeevan Chalke wrote: On Fri, Jul 11, 2014 at 3:13 PM, Sawada Masahiko sawada.m...@gmail.com wrote: And the line number should be switched to 1 when line number has reached to INT_MAX? Yes, when it goes beyond INT_MAX, wrap around to 1. BTW, I wonder, can't we simply use unsigned int instead? That was my thought also: let the variable be unsigned, and have it wrap around normally. So once you reach UINT_MAX, the next line number is zero (instead of getting stuck at UINT_MAX, which would be rather strange). Anyway I don't think anyone is going to reach the UINT_MAX limit ... I mean that would be one hell of a query, wouldn't it. If your query is upwards of a million lines, surely you are in deep trouble already. Does your text editor handle files longer than 4 billion lines? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is there a way to temporarily disable a index
Benedikt Grundmann wrote That is it possible to tell the planner that index is off limits i.e. don't ever generate a plan using it? Rationale: Schema changes on big tables. I might have convinced myself / strong beliefs that for all queries that I need to be fast the planner does not need to use a given index (e.g. other possible plans are fast enough). However if I just drop the index and it turns out I'm wrong I might be in a world of pain because it might just take way too long to recreate the index. I know that I can use pg_stat* to figure out if an index is used at all. But in the presence of multiple indices and complex queries the planner might prefer the index-to-be-dropped but the difference to the alternatives available is immaterial. The current best alternative we have is to test such changes on a testing database that gets regularly restored from production. However at least in our case we simply don't know all possible queries (and logging all of them is not an option). Cheers, Bene Worth double-checking in test but... BEGIN; DROP INDEX ...; EXPLAIN ANALYZE SELECT ...; ROLLBACK; Index dropping is transactional so your temporary action lasts until you abort said transaction. Though given your knowledge limitations this really isn't an improvement... Catalog hacking could work but not recommended (nor do I know the proper commands and limitations). Do you need the database/table to accept writes during the testing period? You can avoid all indexes, but not a named subset, using a configuration parameter. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Is-there-a-way-to-temporarily-disable-a-index-tp5811249p5811290.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing NOT IN to use ANTI joins
David Rowley dgrowle...@gmail.com writes: On Fri, Jul 11, 2014 at 1:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Hm ... actually, there might be a better answer: what about transforming WHERE (x,y) NOT IN (SELECT provably-not-null-values FROM ...) to WHERE antijoin condition AND x IS NOT NULL AND y IS NOT NULL I think this is the way to go. I'll try and get some time soon to look into adding the IS NOT NULL quals, unless you were thinking of looking again? Go for it, I've got a lot of other stuff on my plate. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is there a way to temporarily disable a index
David G Johnston david.g.johns...@gmail.com writes: Benedikt Grundmann wrote That is it possible to tell the planner that index is off limits i.e. don't ever generate a plan using it? Catalog hacking could work but not recommended (nor do I know the proper commands and limitations). Do you need the database/table to accept writes during the testing period? Hacking pg_index.indisvalid could work, given a reasonably recent PG. I would not try it in production until I'd tested it ;-) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is there a way to temporarily disable a index
On 2014-07-11 11:07:21 -0400, Tom Lane wrote: David G Johnston david.g.johns...@gmail.com writes: Benedikt Grundmann wrote That is it possible to tell the planner that index is off limits i.e. don't ever generate a plan using it? Catalog hacking could work but not recommended (nor do I know the proper commands and limitations). Do you need the database/table to accept writes during the testing period? Hacking pg_index.indisvalid could work, given a reasonably recent PG. I would not try it in production until I'd tested it ;-) Works, but IIRC can cause problems at least 9.4 because concurrent cache builds might miss the pg_index row... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is there a way to temporarily disable a index
Andres Freund and...@2ndquadrant.com writes: On 2014-07-11 11:07:21 -0400, Tom Lane wrote: Hacking pg_index.indisvalid could work, given a reasonably recent PG. I would not try it in production until I'd tested it ;-) Works, but IIRC can cause problems at least 9.4 because concurrent cache builds might miss the pg_index row... If you're talking about SnapshotNow hazards, I think the risk would be minimal, and probably no worse than cases that the system will cause by itself. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is there a way to temporarily disable a index
On 2014-07-11 11:20:08 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-07-11 11:07:21 -0400, Tom Lane wrote: Hacking pg_index.indisvalid could work, given a reasonably recent PG. I would not try it in production until I'd tested it ;-) Works, but IIRC can cause problems at least 9.4 because concurrent cache builds might miss the pg_index row... If you're talking about SnapshotNow hazards, I think the risk would be minimal, and probably no worse than cases that the system will cause by itself. Yes, SnapshotNow. I could reproduce it causing 'spurious' HOT updates and missing index inserts a while back. And I don't think it's comparable with normal modifications. Those either have a modification blocking lock or use heap_inplace... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is there a way to temporarily disable a index
Andres Freund and...@2ndquadrant.com writes: On 2014-07-11 11:20:08 -0400, Tom Lane wrote: If you're talking about SnapshotNow hazards, I think the risk would be minimal, and probably no worse than cases that the system will cause by itself. Yes, SnapshotNow. I could reproduce it causing 'spurious' HOT updates and missing index inserts a while back. And I don't think it's comparable with normal modifications. Those either have a modification blocking lock or use heap_inplace... I still think the risk is minimal, but if the OP was worried about this he could take out an AccessExclusive lock on the parent table for long enough to commit the pg_index change. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING
On Fri, 2014-07-11 at 14:41 +0900, Fujii Masao wrote: Could you add the patch into next CF? Sure. The patch is so small I was thinking about committing it in a few days (assuming no complaints), but I'm in no hurry. The patch doesn't contain the change of the document. But I think that it's better to document what character is allowed as escape in LIKE, SIMILAR TO and SUBSTRING. It should be assumed that multi-byte characters are allowed nearly everywhere, and we should document the places where only single-byte characters are allowed. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING
Jeff Davis pg...@j-davis.com writes: Attached is a small patch to $SUBJECT. In master, only single-byte characters are allowed as an escape. Of course, with the patch it must still be a single character, but it may be multi-byte. I'm concerned about the performance cost of this patch. Have you done any measurements about what kind of overhead you are putting on the inner loop of similar_escape? At the very least, please don't call GetDatabaseEncoding() over again every single time through the inner loop. More generally, why do we need to use pg_encoding_verifymb? The input data is presumably validly encoded. ISTM the significantly cheaper pg_mblen() would be more appropriate. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is there a way to temporarily disable a index
On Fri, Jul 11, 2014 at 11:07:21AM -0400, Tom Lane wrote: David G Johnston david.g.johns...@gmail.com writes: Benedikt Grundmann wrote That is it possible to tell the planner that index is off limits i.e. don't ever generate a plan using it? Catalog hacking could work but not recommended (nor do I know the proper commands and limitations). Do you need the database/table to accept writes during the testing period? Hacking pg_index.indisvalid could work, given a reasonably recent PG. I would not try it in production until I'd tested it ;-) I wonder whether this should be exposed at the SQL level? Hacking pg_index is left to superusers, but the creator of an index (or the owner of the schema) might want to experiment with disabling indices while debugging query plans as well. Turns out this is already in the TODO, Steve Singer has requested this (in particular, ALTER TABLE ... ENABLE|DISABLE INDEX ...) in http://www.postgresql.org/message-id/87hbegz5ir@cbbrowne.afilias-int.info (as linked to from the TODO wiki page), but the neighboring discussion was mostly about FK constraints. Thoughts? Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is there a way to temporarily disable a index
On Fri, Jul 11, 2014 at 12:12 PM, Michael Banck mba...@gmx.net wrote: On Fri, Jul 11, 2014 at 11:07:21AM -0400, Tom Lane wrote: David G Johnston david.g.johns...@gmail.com writes: Benedikt Grundmann wrote That is it possible to tell the planner that index is off limits i.e. don't ever generate a plan using it? Catalog hacking could work but not recommended (nor do I know the proper commands and limitations). Do you need the database/table to accept writes during the testing period? Hacking pg_index.indisvalid could work, given a reasonably recent PG. I would not try it in production until I'd tested it ;-) I wonder whether this should be exposed at the SQL level? Hacking pg_index is left to superusers, but the creator of an index (or the owner of the schema) might want to experiment with disabling indices while debugging query plans as well. Turns out this is already in the TODO, Steve Singer has requested this (in particular, ALTER TABLE ... ENABLE|DISABLE INDEX ...) in http://www.postgresql.org/message-id/87hbegz5ir@cbbrowne.afilias-int.info (as linked to from the TODO wiki page), but the neighboring discussion was mostly about FK constraints. Thoughts? Michael Apparently work is ongoing to allow EXPLAIN to calculate the impact a particular index has on table writes. What is needed is a mechanism to temporarily remove the impact of specific indexes on reads without having to disable the index for writing. Ideally on a per-query basis, so altering the catalog doesn't make sense. I know we do not want traditional planner hints but in the spirit of the existing enable_indexscan GUC there should be a disable_readofindex='table1.index1,table1.index2,table2.index1' GUC capability that would allow for session, user, or system-level control of which indexes are to be used during table reads. David J.
Re: [HACKERS] Minmax indexes
On Thu, Jul 10, 2014 at 4:20 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Claudio Freire wrote: On Wed, Jul 9, 2014 at 6:16 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Another thing I noticed is that version 8 of the patch blindly believed the pages_per_range declared in catalogs. This meant that if somebody did alter index foo set pages_per_range=123 the index would immediately break (i.e. return corrupted results when queried). I have fixed this by storing the pages_per_range value used to construct the index in the metapage. Now if you do the ALTER INDEX thing, the new value is only used when the index is recreated by REINDEX. This seems a lot like parameterizing. I don't understand what that means -- care to elaborate? We've been talking about bloom filters, and how their shape differs according to the parameters of the bloom filter (number of hashes, hash type, etc). But after seeing this case of pages_per_range, I noticed it's an effective-enough mechanism. Like: CREATE INDEX ix_blah ON some_table USING bloom (somecol) WITH (BLOOM_HASHES=15, BLOOM_BUCKETS=1024, PAGES_PER_RANGE=64); Marking as read-only is ok, or emitting a NOTICE so that if anyone changes those parameters that change the shape of the index, they know it needs a rebuild would be OK too. Both mechanisms work for me. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add line number as prompt option to psql
On Fri, Jul 11, 2014 at 11:01 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Jeevan Chalke wrote: On Fri, Jul 11, 2014 at 3:13 PM, Sawada Masahiko sawada.m...@gmail.com wrote: And the line number should be switched to 1 when line number has reached to INT_MAX? Yes, when it goes beyond INT_MAX, wrap around to 1. BTW, I wonder, can't we simply use unsigned int instead? That was my thought also: let the variable be unsigned, and have it wrap around normally. So once you reach UINT_MAX, the next line number is zero (instead of getting stuck at UINT_MAX, which would be rather strange). Anyway I don't think anyone is going to reach the UINT_MAX limit ... I mean that would be one hell of a query, wouldn't it. If your query is upwards of a million lines, surely you are in deep trouble already. Does your text editor handle files longer than 4 billion lines? As you said, a query whose line number reaches UINT_MAX would be quite strange; I think INT_MAX is enough for the line number. The v5 patch which Jeevan created seems good. One point, though: I got a failed hunk (in the doc file) when the patch is applied to HEAD, so I have revised it and attached it. Regards, --- Sawada Masahiko psql-line-number_v5.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add line number as prompt option to psql
Sawada Masahiko wrote: As you said, if line number reached UINT_MAX then I think that this case is too strange. I think INT_MAX is enough for line number. My point is not whether 2 billion is a better number than 4 billion as a maximum value. My point is that wraparound of signed int is, I think, not even defined in C, whereas wraparound of unsigned int is well defined. cur_line should be declared as unsigned int. I don't trust that INT_MAX+2 arithmetic. Please don't use cur_line as a name for a global variable. Something like PSQLLineNumber seems more appropriate if it's going to be exposed through prompt.h. However, note that MainLoop() keeps state in local variables and notes that it is reentrant; what happens to your cur_line when a file is read by \i and similar? I wonder if it should be part of PsqlScanStateData instead ... -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
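For what it's worth, the difference is easy to demonstrate with a small stand-alone fragment (illustrative only, not from the patch): the unsigned increment below is well defined, while the signed equivalent is undefined behaviour, which is one more reason to distrust the INT_MAX + 2 arithmetic in the posted patch.

  #include <limits.h>
  #include <stdio.h>

  int
  main(void)
  {
      unsigned int cur_line = UINT_MAX;

      cur_line++;                 /* well defined: wraps around to 0 */
      printf("%u\n", cur_line);   /* prints 0 */

      /*
       * With a signed counter the same increment is undefined behaviour:
       *
       *     int line = INT_MAX;
       *     line++;              -- the compiler may assume this never happens
       */
      return 0;
  }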
Re: [HACKERS] tweaking NTUP_PER_BUCKET
On 10.7.2014 21:33, Tomas Vondra wrote: On 9.7.2014 16:07, Robert Haas wrote: On Tue, Jul 8, 2014 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote: Thinking about this a bit more, do we really need to build the hash table on the first pass? Why not do this: (1) batching - read the tuples, stuff them into a simple list - don't build the hash table yet (2) building the hash table - we have all the tuples in a simple list, batching is done - we know exact row count, can size the table properly - build the table We could do this, and in fact we could save quite a bit of memory if we allocated say 1MB chunks and packed the tuples in tightly instead of palloc-ing each one separately. But I worry that rescanning the data to build the hash table would slow things down too much. I did a quick test of how much memory we could save by this. The attached patch densely packs the tuples into 32kB chunks (1MB seems way too much because of small work_mem values, but I guess this might be tuned based on number of tuples / work_mem size ...). Turns out getting this working properly will be quite complicated. The patch was only a quick attempt to see how much overhead there is, and didn't solve one important detail - batching. The problem is that when increasing the number of batches, we need to get rid of the tuples from one of them. Which means calling pfree() on the tuples written to a temporary file, and that's not possible with the dense allocation. 1) copy into new chunks (dead end) -- The first idea I had was to just copy everything into new chunks and then throw away the old ones, but this way we might end up using 150% of work_mem on average (when the two batches are about 1/2 the data each), and in an extreme case up to 200% of work_mem (all tuples having the same key and thus falling into the same batch). So I don't think that's really a good approach. 2) walking through the tuples sequentially -- The other option is not to walk the tuples through buckets, but to walk through the chunks - we know the tuples are stored as HashJoinTuple/MinimalTuple, so it shouldn't be that difficult. And by walking the tuples in the order they're stored, we may just rearrange the tuples in already allocated memory. And maybe it could be even faster than the current code, because of the sequential access to the memory (as opposed to the random access through buckets) and CPU caches. So I like this approach - it's simple, clean and shouldn't add any overhead (memory or CPU). 3) batch-aware chunks - I also think batch-aware chunks might work. Say we're starting with nbatch=N. Instead of allocating everything in a single chunk, we'll allocate the tuples from the chunks according to a hypothetical batch number - what batch would the tuple belong to if we had (nbatch=N*2). So we'd have two chunks (or sequences of chunks), and we'd allocate the tuples into them. Then if we actually need to increase the number of batches, we know we can just free one of the chunks, because all of the tuples in it belong to the 'wrong' batch, and redistribute the remaining tuples into the new chunks (according to the new hypothetical batch numbers). I'm not sure how much overhead this would add, but I think it can be done quite efficiently, and it shouldn't have any impact at all if we don't do any additional batching (i.e. if the initial estimates are ok). Any other ideas how to tackle this?
regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
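For readers following along, here is a minimal stand-alone sketch of the dense-packing idea being discussed; the names, the 32kB size, and the allocation via malloc are purely illustrative (the actual patch works with HashJoinTuple/MinimalTuple inside the hash join code). Because the tuples end up stored contiguously, walking them chunk by chunk, as in option 2 above, becomes straightforward.

  #include <stdlib.h>
  #include <string.h>

  #define CHUNK_DATA_SIZE     (32 * 1024)
  #define TUPLE_ALIGN(len)    (((len) + 7) & ~(size_t) 7)

  typedef struct TupleChunk
  {
      struct TupleChunk *next;    /* chunks form a simple linked list */
      size_t      used;           /* bytes consumed in data[] */
      char        data[CHUNK_DATA_SIZE];
  } TupleChunk;

  /* copy one tuple into the current chunk, starting a new chunk when full */
  static void *
  dense_store_tuple(TupleChunk **head, const void *tuple, size_t len)
  {
      size_t      need = TUPLE_ALIGN(len);
      TupleChunk *chunk = *head;
      void       *dst;

      if (need > CHUNK_DATA_SIZE)
          return NULL;            /* oversized tuples not handled in this sketch */

      if (chunk == NULL || chunk->used + need > CHUNK_DATA_SIZE)
      {
          chunk = malloc(sizeof(TupleChunk));
          if (chunk == NULL)
              return NULL;
          chunk->next = *head;
          chunk->used = 0;
          *head = chunk;
      }

      dst = chunk->data + chunk->used;
      memcpy(dst, tuple, len);
      chunk->used += need;
      return dst;
  }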
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
Heikki Linnakangas wrote: I did again the refactoring you did back in 2006, patch attached. One thing I did differently: I moved the raw, non-encrypted, read/write functions to separate functions: pqsecure_raw_read and pqsecure_raw_write. Those functions encapsulate the SIGPIPE handling. The OpenSSL code implements a custom BIO, which calls to pqsecure_raw_read/write to do the low-level I/O. Similarly in the server-side, there are be_tls_raw_read and pg_tls_raw_write functions, which do the prepare_for_client_read()/client_read_ended() dance, so that the SSL implementation doesn't need to know about that. I'm skimming over this patch (0001). There are some issues: * You duplicated the long comment under the IDENTIFICATION tag that was in be-secure.c; it's now both in that file and also in be-secure-openssl.c. I think it should be removed from be-secure.c. Also, the hardcoded DH params are duplicated in be-secure.c, but they belong in -openssl.c only now. * There is some mixup regarding USE_SSL and USE_OPENSSL in be-secure.c. I think anything that's OpenSSL-specific should be in be-secure-openssl.c only; any new SSL implementation will need to implement all those functions. For instance, be_tls_init(). I imagine that if we select any SSL implementation, USE_SSL would get defined, and each SSL implementation would additionally define its own symbol. Unless the idea is to get rid of USE_OPENSSL completely, and use only the Makefile bit to decide which implementation to use? If so, then USE_OPENSSL as a preprocessor symbol is useless ... * ssl_renegotiation_limit is also duplicated. But removing this one is probably not going to be as easy as deleting a line from be-secure.c, because guc.c depends on that one. I think that variable should be defined in be-secure.c (i.e. delete it from -openssl) and make sure that all SSL implementations enforce it on their own somehow. The DISABLE_SIGPIPE thingy looks wrong in pqsecure_write. I think it should be like this: ssize_t pqsecure_write(PGconn *conn, const void *ptr, size_t len) { ssize_t n; #ifdef USE_SSL if (conn-ssl_in_use) { DECLARE_SIGPIPE_INFO(spinfo); DISABLE_SIGPIPE(conn, spinfo, return -1); n = pgtls_write(conn, ptr, len); RESTORE_SIGPIPE(spinfo); } else #endif /* USE_OPENSSL */ { n = pqsecure_raw_write(conn, ptr, len); } return n; } You are missing the restore call, and I moved the declaration inside the ssl_in_use block since otherwise it's not useful. The other path is fine since pqsecure_raw_write disables and restores the flag by itself. Also, you're missing DECLARE/DISABLE/RESTORE in the ssl_in_use block in pqsecure_read. (The original code does not have that code in the non-SSL path. I assume, without checking, that that's okay.) I also assume without checking that all SSL implementations would be fine with this SIGPIPE handling. Another thing that seems wrong is the REMEMBER_EPIPE stuff. The fe-secure-openssl.c code should be setting the flag, but AFAICS only the non-SSL code is doing it. Thanks for working on this -- I'm sure many distributors will be happy to be able to enable other, less license-broken TLS implementations. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS Design
On Fri, Jul 11, 2014 at 4:55 AM, Stephen Frost sfr...@snowman.net wrote: On Thursday, July 10, 2014, Robert Haas robertmh...@gmail.com wrote: On Wed, Jul 9, 2014 at 2:13 AM, Stephen Frost sfr...@snowman.net wrote: Yes, this would be possible (and is nearly identical to the original patch, except that this includes per-role considerations), however, my thinking is that it'd be simpler to work with policy names rather than sets of quals, to use when mapping to roles, and they would potentially be useful later for other things (eg: for setting up which policies should be applied when, or which should be OR' or ANDd with other policies, or having groups of policies, etc). Hmm. I guess that's reasonable. Should the policy be a per-table object (like rules, constraints, etc.) instead of a global object? You could do: ALTER TABLE table_name ADD POLICY policy_name (quals); ALTER TABLE table_name POLICY FOR role_name IS policy_name; ALTER TABLE table_name DROP POLICY policy_name; Right, I was thinking they would be per table as they would specifically provide a name for a set of quals, and quals are naturally table-specific. I don't see a need to have them be global- that had been brought up before with the notion of applications picking their policy, but we could also add that later through another term (eg: contexts) which would then map to policies or similar. We could even extend policies to be global by mapping existing per-table ones to be global if we really needed to... My feeling at the moment is that having them be per-table makes sense and we'd still have flexibility to change later if we had some compelling reason to do so. I don't think you can really change it later. If policies are per-table, then you could have a policy p1 on table t1 and also on table t2; if they become global objects, then you can't have p1 in two places. I hope I'm not beating a dead horse here, but changing syntax after it's been released is very, very hard. But that's not an argument against doing it this way; I think per-table policies are probably simpler and better here. It means, for example, that policies need not have their own permissions and ownership structure - they're part of the table, just like a constraint, trigger, or rule, and the table owner's permissions control. I like that, and I think our users will, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] things I learned from working on memory allocation
On Thu, Jul 10, 2014 at 1:05 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Jul 8, 2014 at 1:27 AM, Robert Haas robertmh...@gmail.com wrote: 6. In general, I'm worried that it's going to be hard to keep the overhead of parallel sort from leaking into the non-parallel case. With the no-allocator approach, every place that uses GetMemoryChunkSpace() or repalloc() or pfree() will have to handle the DSM and non-DSM cases differently, which isn't great for either performance or maintainability. And even with an allocator, the SortTuple array will need to use relative pointers in a DSM; that might burden the non-DSM case. I think you are right in saying that there can be a performance impact for non-parallel case due to pfree() (or other similar calls) and/or due to usage of relative pointers, but can it have measurable impact during actual sort operation? Benchmarking seems to indicate that it indeed can. If there is an noticeable impact, then do you think having separate file/infrastructure for parallel sort can help, basically non-parallel and parallel sort will have some common and some specific parts. The above layer will call the parallel/non-parallel functions based on operation need. The tuplesort.c already handles so many cases already that adding yet another axis of variability is intimidating. But it may work out that there's no better option. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS Design
Robert, On Friday, July 11, 2014, Robert Haas robertmh...@gmail.com wrote: On Fri, Jul 11, 2014 at 4:55 AM, Stephen Frost sfr...@snowman.net wrote: My feeling at the moment is that having them be per-table makes sense and we'd still have flexibility to change later if we had some compelling reason to do so. I don't think you can really change it later. If policies are per-table, then you could have a policy p1 on table t1 and also on table t2; if they become global objects, then you can't have p1 in two places. I hope I'm not beating a dead horse here, but changing syntax after it's been released is very, very hard. Fair enough. My thinking was we'd come up with a way to map them (eg: table_policy), but I do agree that changing it later would really suck and having them be per-table makes a lot of sense. But that's not an argument against doing it this way; I think per-table policies are probably simpler and better here. It means, for example, that policies need not have their own permissions and ownership structure - they're part of the table, just like a constraint, trigger, or rule, and the table owner's permissions control. I like that, and I think our users will, too. Agreed and I believe this is more-or-less what I had proposed up-thread (not at a computer at the moment). I hope to have a chance to review and update the design and flesh out the catalog definition this weekend. Thanks! Stephen
Re: [HACKERS] Minmax indexes
On Fri, Jul 11, 2014 at 6:00 PM, Claudio Freire klaussfre...@gmail.com wrote: Marking as read-only is ok, or emitting a NOTICE so that if anyone changes those parameters that change the shape of the index, they know it needs a rebuild would be OK too. Both mechanisms work for me. We don't actually have any of these mechanisms. They wouldn't be bad things to have but I don't think we should gate adding new types of indexes on adding them. In particular, the index could just hard code a value for these parameters and having them be parameterized is clearly better even if that doesn't produce all the warnings or rebuild things automatically or whatever. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On Fri, Jul 11, 2014 at 3:47 PM, Greg Stark st...@mit.edu wrote: On Fri, Jul 11, 2014 at 6:00 PM, Claudio Freire klaussfre...@gmail.com wrote: Marking as read-only is ok, or emitting a NOTICE so that if anyone changes those parameters that change the shape of the index, they know it needs a rebuild would be OK too. Both mechanisms work for me. We don't actually have any of these mechanisms. They wouldn't be bad things to have but I don't think we should gate adding new types of indexes on adding them. In particular, the index could just hard code a value for these parameters and having them be parameterized is clearly better even if that doesn't produce all the warnings or rebuild things automatically or whatever. No, I agree, it's just a nice to have. But at least the docs should mention it. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Over-optimization in ExecEvalWholeRowVar
This example in the regression database is a simplified version of a bug I was shown off-list: regression=# select ( select q from ( select 1,2,3 where f10 union all select 4,5,6.0 where f1=0 ) q ) from int4_tbl; ERROR: record type has not been registered The reason for the problem is that ExecEvalWholeRowVar is designed to bless the Var's record type (via assign_record_type_typmod) just once per query. That works fine, as long as the source tuple slot is the exact same TupleTableSlot throughout the query. But in the above example, we have an Append plan node for the UNION ALL, which will pass back its children's result slots directly --- so at the level where q is being evaluated, we see first the first leaf SELECT's output slot (which gets blessed) and then the second leaf SELECT's output slot (which doesn't). That leads to generating a composite Datum with typmod -1 ... oops. I can reproduce this bug in all active branches. Probably the reason it hasn't been identified long since is that in most scenarios the planner won't use a whole-row Var at all in cases like this, but will prefer to build RowExprs. (In particular, the inconsistent data types of the last sub-select output columns are necessary to trigger the bug in this example.) The easy fix is to move the blessing code from ExecEvalWholeRowVar into ExecEvalWholeRowFast. (ExecEvalWholeRowSlow doesn't need it since it doesn't deal with RECORD cases.) That'll add a usually-useless comparison or two per row cycle, which is unlikely to be measurable. A bigger-picture problem is that we have other once-per-query tests in ExecEvalWholeRowVar, and ExecEvalScalarVar too, whose validity should be questioned given this bug. I think they are all right, though, because those tests are concerned with validating the source rowtype, which should not change across Append children. The bug here is because we're assuming not just that the input rowtype is abstractly the same across all rows, but that the exact same TupleTableSlot is used to return every source row. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] better atomics - v0.5
On Thu, Jul 10, 2014 at 08:46:55AM +0530, Amit Kapila wrote: As per my understanding of the general theory around barriers, read and write are defined to avoid reordering due to compiler and full memory barriers are defined to avoid reordering due to processors. There are some processors that support instructions for memory barriers with acquire, release and fence semantics. Not exactly: both compilers and processors can rearrange loads and stores. Because the architecture most developers work on (x86) has such a strong memory model (it goes to a lot of effort to hide reordering), people are under the impression that it's only the compiler you need to worry about, but that's not true for any other architecture. Memory barriers/fences are on the whole discouraged if you can use acquire/release semantics because the latter are faster and much easier to optimise for. I strongly recommend the Atomic Weapons talk on the C11 memory model to help understand how they work. As a bonus it includes correct implementations for the major architectures. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer signature.asc Description: Digital signature
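For readers following along, a minimal self-contained C11 illustration of the acquire/release pairing being described (not code from the patch): the release store publishes everything written before it, and a matching acquire load on another thread is guaranteed to see those writes, without needing a full memory fence.

  #include <stdatomic.h>
  #include <stdbool.h>

  static int payload;
  static atomic_bool ready;

  void
  producer(void)
  {
      payload = 42;               /* plain store */
      /* release: everything above becomes visible before 'ready' does */
      atomic_store_explicit(&ready, true, memory_order_release);
  }

  int
  consumer(void)
  {
      /* acquire: once we observe 'ready', we also observe 'payload' */
      while (!atomic_load_explicit(&ready, memory_order_acquire))
          ;                       /* spin */
      return payload;             /* reads 42 */
  }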
Re: [HACKERS] proposal: rounding up time value less than its unit.
On Thu, Jul 10, 2014 at 2:52 AM, Tomonari Katsumata katsumata.tomon...@po.ntts.co.jp wrote: Several couple weeks ago, I posted a mail to pgsql-doc. http://www.postgresql.org/message-id/53992ff8.2060...@po.ntts.co.jp In this thread, I concluded that it's better to round up the value which is less than its unit. Current behavior (rounding down) has undesirable setting risk, because some GUCs have special meaning for 0. And then I made a patch for this. Please check the attached patch. Thanks for the patch. Please add it here: https://commitfest.postgresql.org/action/commitfest_view/open -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
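To make the proposal concrete, here is a hedged stand-alone helper (illustrative only, not the actual GUC conversion code): a nonzero setting smaller than one target unit is rounded up to 1 unit instead of truncating to 0, so it cannot accidentally land on a value with special meaning.

  /* Illustrative helper: convert a value to a coarser unit, rounding up
   * when the input is nonzero but smaller than one unit, so it does not
   * silently truncate to the special value 0. */
  static int
  convert_value_rounding_up(int value, int units_per_target)
  {
      if (value > 0 && value < units_per_target)
          return 1;
      return value / units_per_target;
  }

For example, a 500ms setting for a parameter stored in seconds would become 1s rather than 0, which for many GUCs means "disabled".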
Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING
On Fri, 2014-07-11 at 11:51 -0400, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: Attached is a small patch to $SUBJECT. In master, only single-byte characters are allowed as an escape. Of course, with the patch it must still be a single character, but it may be multi-byte. I'm concerned about the performance cost of this patch. Have you done any measurements about what kind of overhead you are putting on the inner loop of similar_escape? I didn't consider this very performance critical, because this is looping through the pattern, which I wouldn't expect to be a long string. On my machine using en_US.UTF-8, the difference is imperceptible for a SIMILAR TO ... ESCAPE query. I was able to see about a 2% increase in runtime when using the similar_escape function directly. I made a 10M tuple table and did: explain analyze select similar_escape('','#') from t; which was the worst reasonable case I could think of. (It appears that selecting from a table is faster than from generate_series. I'm curious what you use when testing the performance of an individual function at the SQL level.) At the very least, please don't call GetDatabaseEncoding() over again every single time through the inner loop. More generally, why do we need to use pg_encoding_verifymb? The input data is presumably validly encoded. ISTM the significantly cheaper pg_mblen() would be more appropriate. Thank you. Using the non-verifying variants reduces the penalty in the above test to 1%. If needed, we could optimize further through code specialization, as like_escape() does. Though I think like_escape() is specialized just because MatchText() is specialized; and MatchText is specialized because it operates on the actual data, not just the pattern. So I don't see a reason to specialize similar_escape(). Regards, Jeff Davis *** a/src/backend/utils/adt/regexp.c --- b/src/backend/utils/adt/regexp.c *** *** 688,698 similar_escape(PG_FUNCTION_ARGS) elen = VARSIZE_ANY_EXHDR(esc_text); if (elen == 0) e = NULL; /* no escape character */ ! else if (elen != 1) ! ereport(ERROR, ! (errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE), ! errmsg(invalid escape string), ! errhint(Escape string must be empty or one character.))); } /*-- --- 688,703 elen = VARSIZE_ANY_EXHDR(esc_text); if (elen == 0) e = NULL; /* no escape character */ ! else ! { ! int escape_mblen = pg_mbstrlen_with_len(e, elen); ! ! if (escape_mblen 1) ! ereport(ERROR, ! (errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE), ! errmsg(invalid escape string), ! errhint(Escape string must be empty or one character.))); ! } } /*-- *** *** 722,781 similar_escape(PG_FUNCTION_ARGS) while (plen 0) { ! char pchar = *p; ! if (afterescape) { ! if (pchar == '' !incharclass) /* for SUBSTRING patterns */ ! *r++ = ((nquotes++ % 2) == 0) ? '(' : ')'; ! else { *r++ = '\\'; *r++ = pchar; } ! afterescape = false; ! } ! else if (e pchar == *e) ! { ! /* SQL99 escape character; do not send to output */ ! afterescape = true; } ! else if (incharclass) { ! if (pchar == '\\') *r++ = '\\'; ! *r++ = pchar; ! if (pchar == ']') ! incharclass = false; ! } ! else if (pchar == '[') ! { ! *r++ = pchar; ! incharclass = true; ! } ! else if (pchar == '%') ! { ! *r++ = '.'; ! *r++ = '*'; ! } ! else if (pchar == '_') ! *r++ = '.'; ! else if (pchar == '(') ! { ! /* convert to non-capturing parenthesis */ ! *r++ = '('; ! *r++ = '?'; ! *r++ = ':'; ! } ! else if (pchar == '\\' || pchar == '.' || ! pchar == '^' || pchar == '$') ! { ! *r++ = '\\'; ! 
*r++ = pchar; } - else - *r++ = pchar; - p++, plen--; } *r++ = ')'; --- 727,813 while (plen 0) { ! char pchar = *p; ! int mblen = pg_mblen(p); ! if (mblen == 1) { ! if (afterescape) ! { ! if (pchar == '' !incharclass) /* for SUBSTRING patterns */ ! *r++ = ((nquotes++ % 2) == 0) ? '(' : ')'; ! else ! { ! *r++ = '\\'; ! *r++ = pchar; ! } ! afterescape = false; ! } ! else if (e pchar == *e) ! { ! /* SQL99 escape character; do not send to output */ ! afterescape = true; ! } ! else if (incharclass) ! { ! if (pchar == '\\') ! *r++ = '\\'; ! *r++ = pchar; ! if (pchar == ']') ! incharclass = false; ! } ! else if (pchar == '[') ! { ! *r++ = pchar; ! incharclass = true; ! } ! else if (pchar == '%') ! { ! *r++ = '.'; ! *r++ = '*'; ! } ! else if (pchar == '_') ! *r++ = '.'; ! else if (pchar == '(') ! { ! /* convert to non-capturing parenthesis */ ! *r++ = '('; ! *r++ = '?'; ! *r++ =
Re: [HACKERS] Crash on backend exit w/ OpenLDAP [2.4.24, 2.4.31]
On Thu, Jun 19, 2014 at 01:01:34PM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Thu, Jun 12, 2014 at 05:02:19PM -0400, Noah Misch wrote: You can cause the at-exit crash by building PostgreSQL against OpenLDAP 2.4.31, connecting with LDAP authentication, and issuing LOAD 'dblink'. 4. Detect older OpenLDAP versions at runtime, just before we would otherwise initialize OpenLDAP, and raise an error. Possibly make the same check at compile time, for packager convenience. Having pondered this some more, I lean toward the following conservative fix. Add to all supported branches a test case that triggers the crash and a configure-time warning if the OpenLDAP version falls in the vulnerable range. So long as those who build from source monitor either configure output or test suite failures, they'll have the opportunity to head off the problem. +1 for a configure warning, but I share your concern that it's likely to go unnoticed (sometimes I wish autoconf were not so chatty...). I'm not sure about the practicality of adding a test case --- how will we test that if no LDAP server is at hand? Merely attempting an LDAP connection (to a closed port, for example) initializes the library far enough to trigger the problem. Here's a patch implementing the warning and test case. The test case is based on the one I posted upthread, modified to work with installcheck, work with non-LDAP builds, and close a race condition. -- Noah Misch EnterpriseDB http://www.enterprisedb.com diff --git a/configure b/configure index 481096c..e2850e4 100755 --- a/configure +++ b/configure @@ -9464,6 +9464,17 @@ fi fi +# PGAC_LDAP_SAFE +# -- +# PostgreSQL sometimes loads libldap_r and plain libldap into the same +# process. Check for OpenLDAP versions known not to tolerate doing so; assume +# non-OpenLDAP implementations are safe. The dblink test suite exercises the +# hazardous interaction directly. + + + + + if test $with_ldap = yes ; then if test $PORTNAME != win32; then for ac_header in ldap.h @@ -9480,6 +9491,47 @@ fi done + { $as_echo $as_me:${as_lineno-$LINENO}: checking for compatible LDAP implementation 5 +$as_echo_n checking for compatible LDAP implementation... 6; } +if ${pgac_cv_ldap_safe+:} false; then : + $as_echo_n (cached) 6 +else + cat confdefs.h - _ACEOF conftest.$ac_ext +/* end confdefs.h. */ +#include ldap.h +#if !defined(LDAP_VENDOR_VERSION) || \ + (defined(LDAP_API_FEATURE_X_OPENLDAP) \ + LDAP_VENDOR_VERSION = 20424 LDAP_VENDOR_VERSION = 20431) +choke me +#endif +int +main () +{ + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile $LINENO; then : + pgac_cv_ldap_safe=yes +else + pgac_cv_ldap_safe=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +fi +{ $as_echo $as_me:${as_lineno-$LINENO}: result: $pgac_cv_ldap_safe 5 +$as_echo $pgac_cv_ldap_safe 6; } + +if test $pgac_cv_ldap_safe != yes; then + { $as_echo $as_me:${as_lineno-$LINENO}: WARNING: +*** With OpenLDAP versions 2.4.24 through 2.4.31, inclusive, each backend +*** process that loads libpq (via WAL receiver, dblink, or postgres_fdw) and +*** also uses LDAP will crash on exit. 5 +$as_echo $as_me: WARNING: +*** With OpenLDAP versions 2.4.24 through 2.4.31, inclusive, each backend +*** process that loads libpq (via WAL receiver, dblink, or postgres_fdw) and +*** also uses LDAP will crash on exit. 
2;} +fi else for ac_header in winldap.h do : diff --git a/configure.in b/configure.in index c938a5d..9f324f0 100644 --- a/configure.in +++ b/configure.in @@ -1096,10 +1096,39 @@ if test $with_libxslt = yes ; then AC_CHECK_HEADER(libxslt/xslt.h, [], [AC_MSG_ERROR([header file libxslt/xslt.h is required for XSLT support])]) fi +# PGAC_LDAP_SAFE +# -- +# PostgreSQL sometimes loads libldap_r and plain libldap into the same +# process. Check for OpenLDAP versions known not to tolerate doing so; assume +# non-OpenLDAP implementations are safe. The dblink test suite exercises the +# hazardous interaction directly. + +AC_DEFUN([PGAC_LDAP_SAFE], +[AC_CACHE_CHECK([for compatible LDAP implementation], [pgac_cv_ldap_safe], +[AC_COMPILE_IFELSE([AC_LANG_PROGRAM( +[#include ldap.h +#if !defined(LDAP_VENDOR_VERSION) || \ + (defined(LDAP_API_FEATURE_X_OPENLDAP) \ + LDAP_VENDOR_VERSION = 20424 LDAP_VENDOR_VERSION = 20431) +choke me +#endif], [])], +[pgac_cv_ldap_safe=yes], +[pgac_cv_ldap_safe=no])]) + +if test $pgac_cv_ldap_safe != yes; then + AC_MSG_WARN([ +*** With OpenLDAP versions 2.4.24 through 2.4.31, inclusive, each backend +*** process that loads libpq (via WAL receiver, dblink, or postgres_fdw) and +*** also uses LDAP will crash on exit.]) +fi]) + + + if test $with_ldap = yes ; then if test $PORTNAME != win32; then AC_CHECK_HEADERS(ldap.h, [], [AC_MSG_ERROR([header file ldap.h is required for LDAP])]) + PGAC_LDAP_SAFE else
Re: [HACKERS] things I learned from working on memory allocation
On Fri, Jul 11, 2014 at 11:15 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jul 10, 2014 at 1:05 AM, Amit Kapila amit.kapil...@gmail.com wrote: If there is an noticeable impact, then do you think having separate file/infrastructure for parallel sort can help, basically non-parallel and parallel sort will have some common and some specific parts. The above layer will call the parallel/non-parallel functions based on operation need. The tuplesort.c already handles so many cases already that adding yet another axis of variability is intimidating. But it may work out that there's no better option. For using the new allocator, one idea is that it works seamlessly with the current memory allocation routines (palloc, pfree, repalloc, ...); another could be to have separate routines (shm_palloc, shm_pfree, ...) for allocation from the new allocator. I think having separate routines means all the call sites for allocation/deallocation need to be aware of the operation type (parallel or non-parallel); however, this can avoid the overhead for the non-parallel case. I think it is a bit difficult to say which approach (an allocator, or using DSM directly) will turn out to be better, as both have their pros and cons as you listed; however, I feel that if we use DSM directly, we might need to change more core logic than with an allocator. I believe we will face the same question again if somebody wants to parallelize the join, so one deciding factor could be which approach leads to less change in core logic/design. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
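As a concrete sketch of the "separate routines" option mentioned above (purely hypothetical; none of these declarations exist in PostgreSQL today), the shared-memory entry points would live alongside the existing ones, and a call site would branch once on the operation type:

  /* Hypothetical API sketch -- these functions do not exist today */
  extern void *shm_palloc(Size size);
  extern void *shm_repalloc(void *pointer, Size size);
  extern void  shm_pfree(void *pointer);

  /* a call site branches once on the operation type, e.g.: */
  static void *
  sorttuple_alloc(bool parallel, Size size)
  {
      return parallel ? shm_palloc(size) : palloc(size);
  }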
Re: [HACKERS] better atomics - v0.5
On Sat, Jul 12, 2014 at 1:26 AM, Martijn van Oosterhout klep...@svana.org wrote: On Thu, Jul 10, 2014 at 08:46:55AM +0530, Amit Kapila wrote: As per my understanding of the general theory around barriers, read and write are defined to avoid reordering due to compiler and full memory barriers are defined to avoid reordering due to processors. There are some processors that support instructions for memory barriers with acquire, release and fence semantics. Not exactly: both compilers and processors can rearrange loads and stores. Because the architecture most developers work on (x86) has such a strong memory model (it goes to a lot of effort to hide reordering), people are under the impression that it's only the compiler you need to worry about, but that's not true for any other architecture. I am not saying that we need barriers only to avoid reordering due to the compiler; rather, my point was that we need to avoid reordering due to both the compiler and the processor, and the ways to achieve each require different APIs. Memory barriers/fences are on the whole discouraged if you can use acquire/release semantics because the latter are faster and much easier to optimise for. Do all processors support instructions for memory barriers with acquire/release semantics? I strongly recommend the Atomic Weapons talk on the C11 memory model to help understand how they work. As a bonus it includes correct implementations for the major architectures. Yes, it will be good if we can implement this as per the C11 memory model, and that is what I am referring to while reviewing this patch; however, even if that is not possible or is difficult to achieve in all cases, it is worth having a discussion about the memory model used in the current APIs implemented in the patch. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com