Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.

2017-03-25 Thread Ashutosh Sharma
On Sat, Mar 25, 2017 at 11:02 AM, Amit Kapila  wrote:
> On Thu, Mar 23, 2017 at 11:24 PM, Ashutosh Sharma  
> wrote:
>> Hi,
>>
>> On Tue, Feb 7, 2017 at 9:23 AM, Robert Haas  wrote:
>>> On Mon, Feb 6, 2017 at 10:40 PM, Amit Kapila  
>>> wrote:
> Maybe we should call them "unused pages".

 +1.  If we consider some more names for that column then probably one
 alternative could be "empty pages".
>>>
>>> Yeah, but I think "unused" might be better.  Because a page could be
>>> in use (as an overflow page or primary bucket page) and still be
>>> empty.
>>>
>>
>> Based on the earlier discussions, I have prepared a patch that would
>> allow pgstathashindex() to show the number of unused pages in hash
>> index. Please find the attached patch for the same. Thanks.
>>
>
>   else if (opaque->hasho_flag & LH_BITMAP_PAGE)
>   stats.bitmap_pages++;
> + else if (PageIsEmpty(page))
> + stats.unused_pages++;
>
> I think having this check after PageIsNew() makes more sense than
> having it at the place where you currently have it,

I don't think having it just below the PageIsNew() check would be good.
The reason is that there can be a bucket page which is in use but still
empty. So, if we add the check just below the PageIsNew() check, then
even though a page is marked as a bucket page and is empty, it will be
shown as an unused page, which I feel is not correct. Here is one simple
example that illustrates this point:

Query:
==
select hash_page_type(get_raw_page('con_hash_index', 17));

gdb shows following info for this block number 17,

(gdb) p *(PageHeader)page
$1 = {pd_lsn = {xlogid = 0, xrecoff = 122297104}, pd_checksum = 0,
pd_flags = 0, pd_lower = 24, pd_upper = 8176, pd_special = 8176,
  pd_pagesize_version = 8196, pd_prune_xid = 0, pd_linp = 0x22a82d8}

(gdb) p ((HashPageOpaque) PageGetSpecialPointer(page))->hasho_flag
$2 = 66

(gdb) p (((HashPageOpaque) PageGetSpecialPointer(page))->hasho_flag
& LH_BUCKET_PAGE)
$3 = 2

From the above information it is clear that this page is a bucket page
and is empty. I think it should be shown as a bucket page rather than an
unused page. Also, this has already been discussed in [1].
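
To make the ordering concrete, here is a minimal C sketch of the
classification being argued for (a hypothetical helper, not the code in
the patch; HashIndexStat stands in for the stats struct that
pgstathashindex() fills in):

static void
classify_hash_page(Page page, HashIndexStat *stats)
{
    HashPageOpaque opaque;

    if (PageIsNew(page))
    {
        stats->unused_pages++;      /* never initialized */
        return;
    }

    opaque = (HashPageOpaque) PageGetSpecialPointer(page);

    if (opaque->hasho_flag & LH_BUCKET_PAGE)
        stats->bucket_pages++;      /* counted even when PageIsEmpty() */
    else if (opaque->hasho_flag & LH_OVERFLOW_PAGE)
        stats->overflow_pages++;
    else if (opaque->hasho_flag & LH_BITMAP_PAGE)
        stats->bitmap_pages++;
    else if (PageIsEmpty(page))
        stats->unused_pages++;      /* freed page, no type flag set */
}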

> Other than that, code-wise your patch looks okay, although I haven't
> tested it.
>

I have tested it properly and didn't find any issues.

> I think this should also be tracked under PostgreSQL 10 open items,
> but as this is not directly a bug, I am not sure; what do others think?

Well, even I am not sure whether this has to be added to the open items
list or not. Thanks.

[1] - 
https://www.postgresql.org/message-id/CA%2BTgmobwLKpKe99qnTJCBzFB4UaVGKrLNX3hwp8wcxObMDx7pA%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] comments in hash_alloc_buckets

2017-03-25 Thread Ashutosh Sharma
>> While working on [1], I realised that the following comment in
>> _hash_alloc_buckets() needs to be corrected.
>>
>> /*
>>  * Initialize the freed overflow page.  Just zeroing the page won't work,
>>  * See _hash_freeovflpage for similar usage.
>>  */
>> _hash_pageinit(page, BLCKSZ);
>>
>> Attached is the patch that corrects above comment. Thanks.
>>
>
> - * Initialize the freed overflow page.  Just zeroing the page won't work,
> + * Initialize the last page in hash index.
>
> I think saying ".. last page in hash index" sounds slightly awkward, as
> this is the last page for the current split point; how about just
> "Initialize the page. ..."?

Yes, I mean just adding "Initialize the page. ..." looks simpler and
correct. Attached is the patch with that comment.
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 622cc4b..4fd9cbc 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -1002,7 +1002,7 @@ _hash_alloc_buckets(Relation rel, BlockNumber firstblock, uint32 nblocks)
 	page = (Page) zerobuf;
 
 	/*
-	 * Initialize the freed overflow page.  Just zeroing the page won't work,
+	 * Initialize the page.  Just zeroing the page won't work,
 	 * See _hash_freeovflpage for similar usage.
 	 */
 	_hash_pageinit(page, BLCKSZ);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-25 Thread Amit Kapila
On Fri, Mar 24, 2017 at 11:49 PM, Pavan Deolasee
 wrote:
>
>
> On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila 
> wrote:
>>
>>
>>
>> I was worried about the case where the index is created with a
>> non-default collation: will datumIsEqual() suffice the need?  Now,
>> thinking about it again, I think it will, because in the index tuple we
>> are storing the value as it is in the heap tuple.  However, today it
>> occurred to me: how will this work for toasted index values (index
>> value > TOAST_INDEX_TARGET)?  It is mentioned on top of datumIsEqual()
>> that it probably won't work for toasted values.  Have you considered
>> that point?
>>
>
> No, I haven't, and thanks for bringing that up. And now that I think more
> about it and see the code, I think the naive way of just comparing index
> attribute values against heap values is probably wrong. The example of
> TOAST_INDEX_TARGET is one such case, but I wonder if there are other
> varlena attributes that we might store differently in heap and index.
> For example, index_form_tuple() -> heap_fill_tuple() seems to do some
> churning for varlena. It's not clear to me if index_getattr will return
> values which are binary comparable to heap values. I wonder if calling
> index_form_tuple on the heap values, fetching attributes via
> index_getattr on both index tuples, and then doing a binary compare is a
> more robust idea.
>

I am not sure how you want to binary-compare two datums; if you are
thinking of datumIsEqual(), that won't work.  I think you need to use a
datatype-specific compare function, something like what we do in
_bt_compare().

> Or may be that's just
> duplicating efforts.
>

I think if we do something along the lines mentioned above, we might not
need to duplicate the effort.
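
For concreteness, a rough sketch of the pairwise-comparison idea being
discussed (a hypothetical helper, not committed code; note that, per the
point above, datumIsEqual() is still unsafe for externally toasted
values, so those would need a datatype-specific comparator):

static bool
index_tuples_binary_equal(IndexTuple itup1, IndexTuple itup2,
                          TupleDesc itupdesc)
{
    int         attnum;

    for (attnum = 1; attnum <= itupdesc->natts; attnum++)
    {
        bool        isnull1,
                    isnull2;
        Datum       d1 = index_getattr(itup1, attnum, itupdesc, &isnull1);
        Datum       d2 = index_getattr(itup2, attnum, itupdesc, &isnull2);
        Form_pg_attribute att = itupdesc->attrs[attnum - 1];

        if (isnull1 != isnull2)
            return false;
        if (isnull1)
            continue;
        /* byte-wise equality; not safe for toasted values, as noted */
        if (!datumIsEqual(d1, d2, att->attbyval, att->attlen))
            return false;
    }
    return true;
}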

> While looking at this problem, it occurred to me that the assumptions
> made for hash indexes are also wrong :-( Hash indexes have the same
> problem as expression indexes have. A change in the heap value may not
> necessarily cause a change in the hash key. If we don't detect that, we
> will end up having two identical hash keys with the same TID pointer.
> This will cause the duplicate key scans problem, since hashrecheck will
> return true for both the hash entries. That's a bummer as far as
> supporting WARM for hash indexes is concerned.
>

Yeah, I also think so.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.

2017-03-25 Thread Amit Kapila
On Sat, Mar 25, 2017 at 12:33 PM, Ashutosh Sharma  wrote:
> On Sat, Mar 25, 2017 at 11:02 AM, Amit Kapila  wrote:
>> On Thu, Mar 23, 2017 at 11:24 PM, Ashutosh Sharma  
>> wrote:
>>> Hi,
>>>
>>> On Tue, Feb 7, 2017 at 9:23 AM, Robert Haas  wrote:
 On Mon, Feb 6, 2017 at 10:40 PM, Amit Kapila  
 wrote:
>> Maybe we should call them "unused pages".
>
> +1.  If we consider some more names for that column then probably one
> alternative could be "empty pages".

 Yeah, but I think "unused" might be better.  Because a page could be
 in use (as an overflow page or primary bucket page) and still be
 empty.

>>>
>>> Based on the earlier discussions, I have prepared a patch that would
>>> allow pgstathashindex() to show the number of unused pages in hash
>>> index. Please find the attached patch for the same. Thanks.
>>>
>>
>>   else if (opaque->hasho_flag & LH_BITMAP_PAGE)
>>   stats.bitmap_pages++;
>> + else if (PageIsEmpty(page))
>> + stats.unused_pages++;
>>
>> I think having this check after PageIsNew() makes more sense than
>> having it at the place where you currently have it,
>
> I don't think having it just below the PageIsNew() check would be good.
> The reason is that there can be a bucket page which is in use but still
> empty. So, if we add the check just below the PageIsNew() check, then
> even though a page is marked as a bucket page and is empty, it will be
> shown as an unused page, which I feel is not correct. Here is one simple
> example that illustrates this point:
>

Oh, is it a page where all the items have been deleted and no new items
have been inserted?  The reason I said that place is not appropriate is
that all the other checks in the nearby code are on the page type, and
this check looks odd at that place, but we might need to do it this way
if there is no other clean solution.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-25 Thread Kuntal Ghosh
On Fri, Mar 24, 2017 at 9:23 PM, Robert Haas  wrote:
> On Thu, Mar 23, 2017 at 7:29 AM, Michael Paquier
>  wrote:
>> On Thu, Mar 23, 2017 at 8:19 PM, Kuntal Ghosh
>>  wrote:
>>> Hence, to be consistent with others, bgworker processes can be
>>> initialized from BackgroundWorkerInitializeConnectionBy[Oid].
>>
>> Yeah, I am fine with that. Thanks for the updated versions.
>
> I think this is still not good.  The places where pgstat_bestart() has
> been added are not even correct.  For example, the call added to
> BackgroundWriterMain() occurs after the section that does
> error-recovery, so it would get repeated every time the background
> writer recovers from an error.  There are similar problems elsewhere.
> Furthermore, although in theory there's an idea here that we're making
> it no longer the responsibility of InitPostgres() to call
> pgstat_bestart(), the patch as proposed only removes one of the two
> calls, so we really don't even have a consistent practice.  I think
> it's better to go with the idea of having InitPostgres() be
> responsible for calling this for regular backends, and
> AuxiliaryProcessMain() for auxiliary backends.  That involves
> substantially fewer calls to pgstat_bestart() and they are spread
> across only two functions, which IMHO makes fewer bugs of omission a
> lot less likely.
Agreed. Calling it from InitPostgres() and AuxiliaryProcessMain() seems
correct for the following two reasons, as you've mentioned upthread:
1. security-filtering should be left to some higher-level facility that
can make policy decisions rather than being hard-coded in the individual
modules.
2. it makes bugs of omission a lot less likely.

> Updated patch with that change attached.  In addition to that
> modification, I made some other alterations:
>
> - I changed the elog() for the can't-happen case in pgstat_bestart()
> from PANIC to FATAL.  The contents of shared memory aren't corrupted,
> so I think PANIC is overkill.
Agreed and duly noted for future.

> - I tweaked the comment in WalSndLoop() just before the
> pgstat_report_activity() call to accurately reflect what the effect of
> that call now is.
>
> - I changed the column ordering in pg_stat_get_activity() to put
> backend_type with the other columns that appear in pg_stat_activity,
> rather than (as the patch did) grouping it with the ones that appear
> in pg_stat_ssl.
Thank you.

> - I modified the code to tolerate a NULL return from
> AuxiliaryPidGetProc().  I am pretty sure that without that there's a
> race condition that could lead to a crash if somebody tried to call
> this function just as an auxiliary process was terminating.
Wow, I hadn't thought of that. If it's called after AuxiliaryProcKill(),
a crash is evident.
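Something like this defensive sketch, I suppose (a hypothetical
fragment, assuming the pg_stat_get_activity() context):

    PGPROC     *proc = AuxiliaryPidGetProc(pid);
    const char *wait_event;

    if (proc != NULL)
        wait_event = pgstat_get_wait_event(proc->wait_event_info);
    else
        wait_event = NULL;  /* process exited concurrently; no wait event */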

> - I updated the documentation slightly.
Looks good to me.

> - I rebased over some conflicting commits.
Sorry for the inconvenience. It seems that the conflicting changes
occurred within a few hours after I posted the patch.

> If there aren't objections, I plan to commit this version.
Thank you for looking into the patch and doing the necessary changes.
All the changes look good to me; I've tested the feature, and it passes
all the regression tests.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread David Rowley
On 25 March 2017 at 13:10, Tomas Vondra  wrote:
> while working on a patch I ran into some crashes that seem to be caused by
> inconsistent handling of max_parallel_workers - queries still seem to be
> planned with parallel plans enabled, but then crash at execution.
>
> The attached script reproduces the issue on a simple query, causing
> crashes in GatherMerge, but I assume the issue is more general.

I had a look at this and found a bunch of off-by-one errors in the code.

I've attached a couple of patches, one is the minimal fix, and one
more complete one.

In the more complete one I ended up ditching the
GatherMergeState->nreaders altogether. It was always the same as
nworkers_launched anyway, so I saw no point in it.
Here I've attempted to make the code a bit more understandable, to
prevent further confusion about how many elements are in each array.
Probably there's more that can be done here. I see GatherState has
nreaders too, but I've not looked into the detail of whether it suffers
from the same weirdness of nreaders always matching nworkers_launched.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


gather_merge_minimal_fix.patch
Description: Binary data


gather_merge_fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem in Parallel Bitmap Heap Scan?

2017-03-25 Thread Thomas Munro
On Sat, Mar 25, 2017 at 6:04 PM, Amit Kapila  wrote:
> On Tue, Mar 21, 2017 at 5:51 PM, Dilip Kumar  wrote:
>> On Tue, Mar 21, 2017 at 4:47 PM, Thomas Munro
>>  wrote:
>>> I noticed a failure in the inet.sql test while running the regression
>>> tests with parallelism cranked up, and can reproduce it interactively
>>> as follows.  After an spgist index is created and the plan changes to
>>> the one shown below, the query returns no rows.
>>
>> Thanks for reporting.  Seems like we are getting issues related to
>> TBM_ONE_PAGE and TBM_EMPTY.
>>
>> I think this area needs more testing; the reason these cases are not
>> tested properly is that they are not the natural case for parallel
>> bitmap.  I think in the next few days I will test more such cases by
>> forcing parallel bitmap.
>>
>
> Okay, is your testing complete?
>
>> Here is the patch to fix the issue at hand.  I have also run the
>> regression suite with force_parallel_mode=regress and all the tests
>> are passing.
>>
>
> Thomas, did you get a chance to verify Dilip's latest patch?
>
> I have added this issue to the PostgreSQL 10 Open Items list so that we
> don't lose track of it.

The result is correct with this patch.  I ran make installcheck, then
the same steps as above, and the query result was correct after creating
the index.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.

2017-03-25 Thread Ashutosh Sharma
>>
>> +1.  If we consider some more names for that column then probably one
>> alternative could be "empty pages".
>
> Yeah, but I think "unused" might be better.  Because a page could be
> in use (as an overflow page or primary bucket page) and still be
> empty.
>

 Based on the earlier discussions, I have prepared a patch that would
 allow pgstathashindex() to show the number of unused pages in hash
 index. Please find the attached patch for the same. Thanks.

>>>
>>>   else if (opaque->hasho_flag & LH_BITMAP_PAGE)
>>>   stats.bitmap_pages++;
>>> + else if (PageIsEmpty(page))
>>> + stats.unused_pages++;
>>>
>>> I think having this check after PageIsNew() makes more sense than
>>> having it at the place where you currently have it,
>>
>> I don't think having it just below the PageIsNew() check would be good.
>> The reason is that there can be a bucket page which is in use but still
>> empty. So, if we add the check just below the PageIsNew() check, then
>> even though a page is marked as a bucket page and is empty, it will be
>> shown as an unused page, which I feel is not correct. Here is one simple
>> example that illustrates this point:
>>
>
> Oh, is it a page where all the items have been deleted and no new items
> have been inserted?

Yes, it is a page from which items have been removed and no new
insertions have happened.

> The reason I said that place is
> not appropriate is that all the other checks in the nearby code are on
> the page type, and this check looks odd at that place, but we might
> need to do it this way if there is no other clean solution.

I get your point, but I think that is the only solution we have as of
now.

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PDF build is broken

2017-03-25 Thread Devrim Gündüz


Hi,

I can't build PDFs with the latest snapshot tarball:


$ make postgres-A4.pdf
{ \
  echo ""; \
  echo ""; \
} > version.sgml
'/usr/bin/perl' ./mk_feature_tables.pl YES 
../../../src/backend/catalog/sql_feature_packages.txt 
../../../src/backend/catalog/sql_features.txt > features-supported.sgml
'/usr/bin/perl' ./mk_feature_tables.pl NO 
../../../src/backend/catalog/sql_feature_packages.txt 
../../../src/backend/catalog/sql_features.txt > features-unsupported.sgml
'/usr/bin/perl' ./generate-errcodes-table.pl 
../../../src/backend/utils/errcodes.txt > errcodes-table.sgml
openjade  -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D . -c 
/usr/share/sgml/docbook/dsssl-stylesheets/catalog -d stylesheet.dsl -t sgml -i 
output-html -V html-index postgres.sgml
LC_ALL=C '/usr/bin/perl' /usr/bin/collateindex.pl -f -g -i 'bookindex' -o 
bookindex.sgml HTML.index
Processing HTML.index...
2762 entries loaded...
collateindex.pl: duplicated index entry found: TRUNC  
1 entries ignored...
Done.
openjade  -D . -D . -c /usr/share/sgml/docbook/dsssl-stylesheets/catalog -d 
./stylesheet.dsl -t tex -V tex-backend -i output-print -i include-index -V 
texpdf-output -V '%paper-type%'=A4 -o postgres-A4.tex-pdf postgres.sgml
openjade:ref/alter_collation.sgml:96:10:E: [xref to REFSECT1 unsupported]
pdfjadetex postgres-A4.tex-pdf
This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) (preloaded 
format=pdfjadetex)
 restricted \write18 enabled.
entering extended mode
! I can't find file `postgres-A4.tex-pdf'.
<*> postgres-A4.tex-pdf
   
(Press Enter to retry, or Control-D to exit)
Please type another input file name: 


Can someone please take a look?

Regards,
-- 
Devrim Gündüz
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread Rushabh Lathia
Thanks, Tomas, for reporting the issue, and thanks, David, for working
on this.

I can see the problem in GatherMerge, especially when nworkers_launched
is zero. I will look into this issue and post a fix for the same.

Another point which I think we should fix is that when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather to zero. That way we can avoid
generating a gather path with max_parallel_workers = 0.

Thanks,


On Sat, Mar 25, 2017 at 2:21 PM, David Rowley 
wrote:

> On 25 March 2017 at 13:10, Tomas Vondra 
> wrote:
> > while working on a patch I ran into some crashes that seem to be
> > caused by inconsistent handling of max_parallel_workers - queries
> > still seem to be planned with parallel plans enabled, but then crash
> > at execution.
> >
> > The attached script reproduces the issue on a simple query, causing
> > crashes in GatherMerge, but I assume the issue is more general.
>
> I had a look at this and found a bunch of off by 1 errors in the code.
>
> I've attached a couple of patches, one is the minimal fix, and one
> more complete one.
>
> In the more complete one I ended up ditching the
> GatherMergeState->nreaders altogether. It was always the same as
> nworkers_launched anyway, so I saw no point in it.
> Here I've attempted to make the code a bit more understandable, to
> prevent further confusion about how many elements are in each array.
> Probably there's more that can be done here. I see GatherState has
> nreaders too, but I've not looked into the detail of whether it suffers
> from the same weirdness of nreaders always matching nworkers_launched.
>
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


-- 
Rushabh Lathia


Re: [HACKERS] New CORRESPONDING clause design

2017-03-25 Thread Surafel Temesgen
>
>
> I took a quick look through this and noted that it fails to touch
> ruleutils.c, which means that dumping of views containing CORRESPONDING
> certainly doesn't work.
>
fixed

> Also, the changes in parser/analyze.c seem rather massive and
> correspondingly hard to review.  Is it possible to rearrange the
> patch to reduce the size of that diff?  If you can avoid moving
> or reindenting existing code, that'd help.
>
The part of transformSetOperationTree that makes the union data type of
the set-operation target list became makeUnionDatatype, in order to make
it easy to use multiple times and to avoid a very long
transformSetOperationTree function.


> The code in that area seems rather confused, too.  For instance,
> I'm not sure exactly what orderCorrespondingList() is good for,
> but it certainly doesn't look to be doing anything that either its
> name or its header comment (or the comments at the call sites) would
> suggest.  Its input and output tlists are always in the same order.
>
It gives the corresponding target list sequential resnos. In order to
avoid touching generate_append_tlist, I changed the comment and function
name accordingly.

> I also think there should be some comments about exactly what matching
> semantics we're enforcing.   The SQL standard says
>
> a) If CORRESPONDING is specified, then:
>   i) Within the columns of T1, equivalent <column name>s shall
>  not be specified more than once and within the columns of
>  T2, equivalent <column name>s shall not be specified more
>  than once.
>
> That seems unreasonably stringent to me; it ought to be sufficient to
> forbid duplicates of the names listed in CORRESPONDING, or the common
> column names if there's no BY list.  But whichever restriction you prefer,
> this code seems to be failing to check it --- I certainly don't see any
> new error message about "column name "foo" appears more than once".
>
fixed

I'm not impressed by using A_Const for the members of the CORRESPONDING
> name list.  That's not a clever solution, that's a confusing kluge,
> because it's a complete violation of the meaning of A_Const.  Elsewhere
> we just use lists of String for name lists, and that seems sufficient
> here.  Personally I'd just use the existing columnList production rather
> than rolling your own.
>
fixed

>
>


corresponding_clause_v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-25 Thread Craig Ringer
On 24 March 2017 at 23:21, David Steele  wrote:
> Hi Vaishnavi,
>
> On 3/19/17 9:32 PM, Vaishnavi Prabakaran wrote:
>>
>> On Fri, Mar 17, 2017 at 12:37 AM, Daniel Verite wrote:
>> Please let me know if you think this is not enough; did you want to
>> update section 33.6 also?
>
>
> Daniel, any input here?
>
>> I would also like to hear Craig's opinion on it before applying this
>> fix to the original patch, just to make sure I am not missing anything
>> here.
>
>
> Craig?

I'm fairly confident that I overlooked single row mode entirely in the
original patch, though it's long enough ago that it's hard for me to
remember exactly.

I don't really have much of an opinion on the best handling of it.

I would expect to be setting single-row mode just before I requested
the *result* from the next pending query, since it relates to result
processing rather than query dispatch. But that's about all the
opinion I have here.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread David Rowley
On 25 March 2017 at 23:09, Rushabh Lathia  wrote:
> Another point which I think we should fix is that when someone sets
> max_parallel_workers = 0, we should also set
> max_parallel_workers_per_gather to zero. That way we can avoid
> generating a gather path with max_parallel_workers = 0.

I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.

I wondered if there's anything we can do here to better test cases where
no workers are available, to try to ensure the parallel nodes work
correctly, but the more I think about it, the more I don't see anything
wrong with just using SET max_parallel_workers = 0;

My vote would be to leave the GUC behaviour as is and add some tests
to select_parallel.sql which exploit setting max_parallel_workers to 0
for running some tests.

If that's not going to fly, then unless we add something else to allow
us to reliably get no workers, we're choosing to close the door on being
able to write automatic tests to capture this sort of bug.

... thinks for a bit

perhaps some magic value like -1 could be used for this... unsure of
how that would be documented though...

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread Amit Kapila
On Sat, Mar 25, 2017 at 6:31 PM, David Rowley
 wrote:
> On 25 March 2017 at 23:09, Rushabh Lathia  wrote:
>> Another point which I think we should fix is that when someone sets
>> max_parallel_workers = 0, we should also set
>> max_parallel_workers_per_gather to zero. That way we can avoid
>> generating a gather path with max_parallel_workers = 0.
>
> I see that it was actually quite useful that it works the way it does.
> If it had worked the same as max_parallel_workers_per_gather, then
> likely Tomas would never have found this bug.
>
> I wondered if there's anything we can do here to better test cases
> where no workers are available, to try to ensure the parallel nodes
> work correctly, but the more I think about it, the more I don't see
> anything wrong with just using SET max_parallel_workers = 0;
>
> My vote would be to leave the GUC behaviour as is and add some tests
> to select_parallel.sql which exploit setting max_parallel_workers to 0
> for running some tests.
>

I think force_parallel_mode=regress should test the same thing.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread David Rowley
On 26 March 2017 at 00:17, Amit Kapila  wrote:
> On Sat, Mar 25, 2017 at 6:31 PM, David Rowley
>  wrote:
>> On 25 March 2017 at 23:09, Rushabh Lathia  wrote:
>>> Another point which I think we should fix is that when someone sets
>>> max_parallel_workers = 0, we should also set
>>> max_parallel_workers_per_gather to zero. That way we can avoid
>>> generating a gather path with max_parallel_workers = 0.
>>
>> I see that it was actually quite useful that it works the way it does.
>> If it had worked the same as max_parallel_workers_per_gather, then
>> likely Tomas would never have found this bug.
>>
>> I wondered if there's anything we can do here to better test cases
>> where no workers are available, to try to ensure the parallel nodes
>> work correctly, but the more I think about it, the more I don't see
>> anything wrong with just using SET max_parallel_workers = 0;
>>
>> My vote would be to leave the GUC behaviour as is and add some tests
>> to select_parallel.sql which exploit setting max_parallel_workers to 0
>> for running some tests.
>>
>
> I think force_parallel_mode=regress should test the same thing.

Not really. That flag's days are surely numbered. It creates a Gather
node; the problem was with GatherMerge.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread Peter Eisentraut
On 3/25/17 09:01, David Rowley wrote:
> On 25 March 2017 at 23:09, Rushabh Lathia  wrote:
>> Another point which I think we should fix is that when someone sets
>> max_parallel_workers = 0, we should also set
>> max_parallel_workers_per_gather to zero. That way we can avoid
>> generating a gather path with max_parallel_workers = 0.
> I see that it was actually quite useful that it works the way it does.
> If it had worked the same as max_parallel_workers_per_gather, then
> likely Tomas would never have found this bug.

Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting.  So each individual setting needs to be robust against
cases of related settings being nonsensical.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-25 Thread Amit Kapila
On Sat, Mar 25, 2017 at 10:13 AM, Mithun Cy  wrote:
> On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila  wrote:
>> Sure, I was telling you based on that.  If you are implicitly treating
>> it as a 2-dimensional array, it might be easier to compute the array
>> offsets.
>
> I think calculating the spares offset will not become much simpler
> anyway; we still need to calculate the split group and split phase
> separately. I have added a patch to show what a 2-dimensional spares
> code looks like and where all we need changes.
>

I think the one-dimensional patch has fewer places to touch, so that
looks better to me.  However, I think there is still hard coding and
assumptions in the code which we should try to improve.

1.
+ /*
+ * The first 4 bucket belongs to first splitpoint group 0. And since group
+ * 0 have 4 = 2^2 buckets, we double them in group 1. So total buckets
+ * after group 1 is 8 = 2^3. Then again at group 2 we add another 2^3
+ * buckets to double the total number of buckets, which will become 2^4. I
+ * think by this time we can see a pattern which say if num_bucket > 4
+ * splitpoint group = log2(num_bucket) - 2
+ */

+ if (num_bucket <= 4)
+ splitpoint_group = 0; /* converted to base 0. */
+ else
+ splitpoint_group = _hash_log2(num_bucket) - 2;

This patch defines split point group zero as having four buckets, and
the above calculation is based on that.  I feel you can define it like
#define Buckets_First_Split_Group 4 and then use it in the above code.
Also, in the else part the number 2 looks awkward; can we define it as
log2_buckets_first_group = _hash_log2(Buckets_First_Split_Group) or
something like that?  I think that way the code will look neat.  I don't
like the way the above comment is worded, even though it is helpful for
understanding the calculation.  If you want, you can add such a comment
in the file header; here it looks out of place.
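
In other words, something like this sketch (the names are only
illustrative, not a concrete proposal):

#define BUCKETS_FIRST_SPLIT_GROUP       4
#define LOG2_BUCKETS_FIRST_SPLIT_GROUP  2   /* _hash_log2(4) */

    if (num_bucket <= BUCKETS_FIRST_SPLIT_GROUP)
        splitpoint_group = 0;   /* converted to base 0 */
    else
        splitpoint_group = _hash_log2(num_bucket) -
            LOG2_BUCKETS_FIRST_SPLIT_GROUP;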

2.
+power-of-2 groups, called "split points" in the code.  That means on every new
+split points we double the existing number of buckets.

split points/split point

3.
+ * which phase of allocation the bucket_num belogs to with in the group.

/belogs/belongs


I have still not completely reviewed the patch, as I have run out of
time for today.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-25 Thread Simon Riggs
On 24 March 2017 at 06:23, Craig Ringer  wrote:
> On 23 March 2017 at 17:44, Craig Ringer  wrote:
>
> Minor update to the catalog_xmin walsender patch to fix a failure to
> parenthesize the definition of PROCARRAY_PROC_FLAGS_MASK.
>
> This one's ready to go. Working on drop slots on DB drop now.

Committed. Next!

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dsm.c API tweak

2017-03-25 Thread Alvaro Herrera
Thomas Munro wrote:

> I'd word this slightly differently:
> 
> + * If there is a CurrentResourceOwner, the new segment is born unpinned
> + * and the resource owner is in charge of destroying it (and will be
> + * blamed if it doesn't).  If there's no current resource owner, then the
> + * segment starts in the pinned state, so it'll stay alive until
> + * explicitly unpinned.
> 
> It's confusing that we've overloaded the term 'pin'.  When we 'pin the
> mapping', we're disassociating it from the resource owner so that it
> will remain attached for longer than the current resource owner scope.
> When we 'pin the segment' we are making it survive even if all
> backends detach (= an extra secret refcount).  What we're talking
> about here is not pinning the *segment*, but pinning the *mapping* in
> this backend.

I agree that it's unfortunate.  I think we would be happier in the long
run if we changed one of those terms.  Right now, the whole of dsm.c
appears confusing to me.  Maybe one thing that would help is to explain
the distinction between mappings and segments in the comment at the top
of the file.  This is a bigger change than I care to tackle right now,
though.

It took me a few minutes to realize that the memory allocated by
dsm_create_descriptor is backend-local, which is why dsm_attach creates
a new descriptor.

> How about: "If there is a non-NULL CurrentResourceOwner, the new
> segment is associated with it and will be automatically detached when
> the resource owner releases.  Otherwise it will remain attached until
> explicitly detached.  Creating or attaching with a NULL
> CurrentResourceOwner is equivalent to creating or attaching with a
> non-NULL CurrentResourceOwner and then calling dsm_pin_mapping()."

Sounds a lot better, but I don't think it explains the contract exactly
either, at least in the case where there is a resowner: what happens if
the resowner releases is that it logs a complaint (so what the resowner
does is act as a cross-check that resources are properly handled
elsewhere, as well as ensuring sane behavior in case of error).  I think
we should stress the point that the segment must be detached before the
resowner releases in normal cases.  (Also, let's talk about segment
creation in the dsm_create comment, not attachment.)

How about this:
  If there is a non-NULL CurrentResourceOwner, the new segment is
  associated with it and must be detached before the resource owner
  releases, or a warning will be logged.  If CurrentResourceOwner is
  NULL, the segment remains attached until explicitly detached or the
  session ends.
  Creating with a NULL CurrentResourceOwner is equivalent to creating
  with a non-NULL CurrentResourceOwner and then calling dsm_pin_mapping.

> Then dsm_attach() needs to say "See the note above dsm_create() about
> the CurrentResourceOwner.", since it's the same deal there.

I think trying to explain both in the comment for dsm_create() is more
confusing than helpful.  I propose to spell out the rule in both places
instead:

  If there is a non-NULL CurrentResourceOwner, the attached segment is
  associated with it and must be detached before the resource owner
  releases, or a warning will be logged.  Otherwise the segment remains
  attached until explicitly detached or the session ends.
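
As a usage sketch of the contract being documented (a hypothetical
fragment; request_size is assumed to be defined by the caller):

    dsm_segment *seg = dsm_create(request_size, 0);

    /*
     * Keep the mapping past the current resource owner's release; this
     * is equivalent to having created the segment with a NULL
     * CurrentResourceOwner.
     */
    dsm_pin_mapping(seg);

    /* ... use the segment ... */

    dsm_detach(seg);        /* explicit detach when done */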

-- 
Álvaro Herrera   https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-25 Thread Peter Eisentraut
On 3/24/17 19:13, David Steele wrote:
> Behavior for the current default of 16MB is unchanged, and all other 
> sizes go through a logical progression.

Just at a glance, without analyzing the math behind it, this scheme
seems super confusing.

> 
> 1GB:
> 00010040
> 00010080
> 000100C0
> 00010001
> 
> 256GB:
> 
> 00010010
> 00010020
> 00010030
> ...
> 000100E0
> 000100F0
> 00010001
> 
> 64GB:
> 
> 000100010004
> 000100010008
> 00010001000C
> ...
> 0001000100F8
> 0001000100FC
> 00010001


-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-25 Thread Peter Eisentraut
On 3/24/17 08:18, Stephen Frost wrote:
> Peter,
> 
> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
>> There is a function for that.
> [...]
>> There is not a function for that, but there could be one.
> 
> I'm not sure you've really considered what you're suggesting here.

Create a set-returning function that returns all the to-be-expected file
names between two LSNs.

> Beyond that, this also bakes in an assumption that we would then require
> access to a database

That is a good point, but then any change to the naming whatsoever will
create trouble.  Then we might as well choose which specific trouble.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Monitoring roles patch

2017-03-25 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Mar 24, 2017 at 12:46 PM, Dave Page  wrote:
> > On Fri, Mar 24, 2017 at 4:24 PM, Robert Haas  wrote:
> >> That's possible, but it really depends on the tool, not on core
> >> PostgreSQL.  The tool should be the one providing the script, because
> >> the tool is what knows its own permissions requirements.  Users who
> >> care about security won't want to grant every privilege given by a
> >> pg_monitor role to a tool that only needs a subset of those
> >> privileges.
> >
> > The upshot of this would be that every time there's a database server
> > upgrade that changes the permissions required somehow, the user has to
> > login to every server they have and run a script. It is no longer a
> > one-time thing, which makes it vastly more painful to deal with
> > upgrades.
> 
> So, I might be all wet here, but I would have expected that changes on
> the TOOL side would be vastly more frequent.  I mean, we do not change
> what a certain builtin permission does very often.  If we add
> pg_read_all_settings, what is the likelihood that the remit of that
> role is ever going to change?  I would judge that what is vastly more
> likely is that a new version of some tool would start needing that
> privilege (or some other) where it didn't before.  If that happens,
> and the script is on the tool side, then you just add a new line to
> the script.  If the script is built into initdb, then you've got to
> wait for the next major release before you can update the definition
> of pg_monitor - and maybe argue with other tool authors with different
> requirements.

The expectation here is that the pg_monitor role will include all of the
reasonable privileges in a given release that a monitor tool should be
using.

While it's likely that monitoring tools will not all be able to work
with every function they would then have access to on day 1, I'm sure
the goal for all of them will be to reach the point where they are using
all of the functions and tables they then have access to.

Moving forward, the privileges being added in future versions of PG (the
ones you point out as happening at initdb-time) would be only those
which are associated with new features in those later versions of PG.
Those new features aren't going to be back-patched and therefore it
wouldn't be possible for the tool to provide a script to GRANT access to
those new features which would work on older versions of PG, meaning
that the script you suggest would necessarily be PG-version specific.

In other words, I don't believe this is a valid concern.

> So I'm fine with this:
> 
> -if (tblspcOid != MyDatabaseTableSpace)
> +if (tblspcOid != MyDatabaseTableSpace &&
> +!is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_ALL_STATS))
> 
> But I don't like this:
> 
> +GRANT EXECUTE ON FUNCTION pgstattuple(text) TO pg_read_all_stats;

I believe we understood that to be the case.  I don't object to the
clarification, of course.

> My position is that execute permission on a function is already
> grantable, so granting it by default to a built-in role is just
> removing flexibility which would otherwise be available to the user.

To remove flexibility would require that we remove the ability to
independently GRANT that right.  We are not doing that.  Nor are we
taking anything away from the user by adding a new default role; we
already claimed the pg_ namespace for roles in 9.6.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] PL/Python: Add cursor and execute methods to plan object

2017-03-25 Thread Dagfinn Ilmari Mannsåker
Jim Nasby  writes:

> On 2/25/17 10:27 AM, Peter Eisentraut wrote:
>> So I'm also wondering here which style people prefer so
>> I can implement it there.
>
> I think the more OO style is definitely better. I expect it would
> simplify the code as well.

I'm not a Python person, but I'd argue that the "more OO" style should
be the primary style documented, and the old style should just be
mentioned for reference.

 - ilmari

-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.   - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.  - Calle Dybedahl


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report the number of skipped frozen pages by manual VACUUM

2017-03-25 Thread Masahiko Sawada
On Sat, Mar 25, 2017 at 2:42 AM, Fujii Masao  wrote:
> On Thu, Mar 23, 2017 at 4:28 AM, Fujii Masao  wrote:
>> On Wed, Mar 15, 2017 at 7:51 PM, Masahiko Sawada  
>> wrote:
>>> On Wed, Mar 15, 2017 at 1:09 PM, Yugo Nagata  wrote:
 On Fri, 10 Mar 2017 20:08:42 +0900
 Masahiko Sawada  wrote:

> On Fri, Mar 10, 2017 at 3:58 PM, Jim Nasby  wrote:
> > On 3/6/17 8:34 PM, Masahiko Sawada wrote:
> >>
> >> I don't think it can say "1 frozen pages" because the number of
> >> skipped pages according to visibility map is always more than 32
> >> (SKIP_PAGES_THRESHOLD).
> >
> >
> > That's just an artifact of how the VM currently works. I'm not a fan of
> > cross dependencies like that unless there's a pretty good reason.
>
> Thank you for the comment.
> Agreed. Attached fixed version patch.

 It seems that it says "frozen pages" if pinskipped_pages is not zero,
 rather than frozenskipped_pages.

 How about writing as below?

 appendStringInfo(&buf, ngettext("Skipped %u page due to buffer pins, ",
                                 "Skipped %u pages due to buffer pins, ",
                                 vacrelstats->pinskipped_pages),
                  vacrelstats->pinskipped_pages);
 appendStringInfo(&buf, ngettext("%u frozen page.\n",
                                 "%u frozen pages.\n",
                                 vacrelstats->frozenskipped_pages),
                  vacrelstats->frozenskipped_pages);

>>>
>>> Yeah, you're right. I've attached the updated version of the patch,
>>> incorporating your comment and fixing the documentation.
>>
>> The patch looks very simple and good to me.
>> Barring objection, I will commit this.
>
> Pushed.
>

Thank you!
Actually, the vacuum execution example in the docs is still inconsistent
with the actual output, because commit
9eb344faf54a849898d9be012ddfa8204cfeb57c added the oldest xmin to the
vacuum verbose log. I mentioned this in another thread [1], but it's
still inconsistent.

[1] 
https://www.postgresql.org/message-id/cad21aoaga2pb3p-cwmtkxbsbkzs1bcdgblcyvcvcdxspg_x...@mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread Rushabh Lathia
On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/25/17 09:01, David Rowley wrote:
> > On 25 March 2017 at 23:09, Rushabh Lathia 
> wrote:
> >> Another point which I think we should fix is that when someone sets
> >> max_parallel_workers = 0, we should also set
> >> max_parallel_workers_per_gather to zero. That way we can avoid
> >> generating a gather path with max_parallel_workers = 0.
> > I see that it was actually quite useful that it works the way it does.
> > If it had worked the same as max_parallel_workers_per_gather, then
> > likely Tomas would never have found this bug.
>
> Another problem is that the GUC system doesn't really support cases
> where the validity of one setting depends on the current value of
> another setting.  So each individual setting needs to be robust against
> cases of related settings being nonsensical.
>

Okay.

About the original issue reported by Tomas: I did more debugging and
found that the problem was that gather_merge_clear_slots() was not
returning a cleared slot when nreaders is zero (meaning
nworkers_launched = 0). Because of that, the scan continued even after
all the tuples were exhausted, and ended up with a server crash in
gather_merge_getnext(). In the patch I also added an Assert to
gather_merge_getnext() that the index should be less than nreaders + 1
(the leader).

PFA simple patch to fix the problem.

Thanks,
Rushabh Lathia
www.Enterprisedb.com
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 72f30ab..e19bace 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -431,9 +431,14 @@ gather_merge_clear_slots(GatherMergeState *gm_state)
 {
 	int			i;
 
-	for (i = 0; i < gm_state->nreaders; i++)
+	for (i = 0; i < gm_state->nreaders + 1; i++)
 	{
-		pfree(gm_state->gm_tuple_buffers[i].tuple);
+		/*
+		 * Gather Merge always allows the leader to participate, and we
+		 * don't allocate a tuple buffer for the leader, so there is no
+		 * tuple to free for it.
+		 */
+		if (i != gm_state->nreaders)
+			pfree(gm_state->gm_tuple_buffers[i].tuple);
 		gm_state->gm_slots[i] = ExecClearTuple(gm_state->gm_slots[i]);
 	}
 
@@ -474,6 +479,8 @@ gather_merge_getnext(GatherMergeState *gm_state)
 		 */
 		i = DatumGetInt32(binaryheap_first(gm_state->gm_heap));
 
+		Assert(i < gm_state->nreaders + 1);
+
 		if (gather_merge_readnext(gm_state, i, false))
 			binaryheap_replace_first(gm_state->gm_heap, Int32GetDatum(i));
 		else

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Monitoring roles patch

2017-03-25 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Mar 24, 2017 at 8:30 AM, Stephen Frost  wrote:
> >> But why not do it with GRANTs in the first place then?
> >
> > This is akin to asking why do we need GRANT ALL and ALTER DEFAULT PRIVs.

I wasn't very clear with my thinking as to why I felt it was akin to
those other commands, so please let me explain it a bit more clearly.

Consider if all the needs of a monitoring role existed in a 'monitor'
schema, and ignore the fact that pg_upgrade ignores default privileges
on upgrade and instead think of this from the perspective of developing
an application in PG.

In such a case, users could create a 'monitor' role, GRANT ALL to that
user on the functions and tables and views in the 'monitor' schema, and
set up DEFAULT PRIVs for the monitor role to have access to such future
functions/tables/etc as might be created in the future in that schema.

There's no way for users to do that today with those pieces of the
system which are associated with monitoring.  We are proposing giving
users a way to do that, with this proposed default role.

> Not really.  ALTER DEFAULT PRIVILEGES affects what happens for future
> objects, which isn't necessary here at all.  The monitoring tool only
> needs to be granted the minimum set of privileges that it currently
> needs, not future privileges which it or other monitoring tools may
> need in future versions.  GRANT ALL is closer, but GRANT .. ON ALL
> TABLES IN SCHEMA really is just a convenience function.  You would
> still have the capability to do the exact same thing without that
> function; it would just be more work.  And the same is true here.  I
> understand why you and Dave want to make this a single command: that's
> easier.  But, like GRANT ON ALL TABLES, it's also less flexible.  If
> you want to grant on all tables except one, you can't use that magic
> syntax any more, and so here.  pg_monitor will either target one
> particular monitoring solution (hey, maybe it'll be the one my
> employer produces!) or the judgement of one particular person
> (whatever Stephen Frost thinks it SHOULD do in a given release,
> independent of what tools on the ground do) and those aren't good
> definitions.

I don't believe the assertion that when we added GRANT ON ALL TABLES we
reduced the flexibility of the system.  I do agree that it's a
convenience function, a valuable enough one that we added it.  It is
certainly possible to have exceptions to GRANT ON ALL TABLES: you simply
remove those privileges which were GRANT'd, in the same transaction.
You are correct that it wouldn't be possible to remove pg_monitor's
rights, and users would instead need to GRANT just those privileges they
wish their monitor user to have, but that is the exception to the rule.

As it relates to tools, I might be wrong, but it certainly seems like
Dave and I, from two independent consulting organizations, are on the
same page when it comes to what this pg_monitor role should have access
to, with, frankly, very little discussion needed.

The point about least-privilege is certainly one I understand, but I
don't believe it's really properly applied here.  A given monitoring
tool might not use function X today, and therefore from a strict
least-privilege standpoint it shouldn't have access to it, but what is
the risk of GRANT'ing that access to the monitor user?  Is the
expectation that a future version of that tool will need access to that
function?  If the risk is low and the expectation is that the tool will
need access in the future, then GRANT'ing the access is entirely
reasonable.

Consider that an accountant might not need access to the
employee/manager relationships today because they are processing only
invoices.  Tomorrow, however, they will need access because they are
reviewing Purchase Requests.  Does it make sense to only GRANT the
access they will actually use at the start of each day?

These are all trade-offs to consider between the risk of granting access
and not.  There will undoubtedly be environments where the tool will
actually support using function X, but the user doesn't feel comfortable
giving the monitoring user even that access; therefore, tools will need
to at least fail gracefully in such cases. But those are the exceptions,
and such users will be happy to carefully craft their privileges based
on their own determination of the trade-offs.  For the vast majority of
users, allowing the monitor user access to basically everything, as long
as it can't become superuser or directly view/modify the data, will
cover the attack vectors they are most concerned with, just as GRANT ON
ALL and ALTER DEFAULT PRIVS satisfy very similar use-cases for a lot of
users.

> > With the approach that Dave and I are advocating, we can avoid all of
> > that.  Contrib modules can bake-in GRANTs to the appropriate roles,
> > upgrades can be handled smoothly even when we add new capabilities which
> > are appropriate, users have a s

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-25 Thread Tom Lane
More random musing ... have you considered making the jump-target fields
in expressions be relative rather than absolute indexes?  That is,
EEO_JUMP would look like

op += (stepno); \
EEO_DISPATCH(); \

instead of

op = &state->steps[stepno]; \
EEO_DISPATCH(); \

I have not carried out a full patch to make this work, but just making
that one change and examining the generated assembly code looks promising.
Instead of this

movslq  40(%r14), %r8
salq$6, %r8
addq24(%rbx), %r8
movq%r8, %r14
jmp *(%r8)

we get this

movslq  40(%r14), %rax
salq$6, %rax
addq%rax, %r14
jmp *(%r14)

which certainly looks like it ought to be faster.  Also, the real reason
I got interested in this at all is that with relative jumps, groups of
steps would be position-independent within the steps array, which would
enable some compile-time tricks that seem impractical with the current
definition.
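
For instance, a position-independent group could in principle be
relocated wholesale; a purely illustrative fragment (dst, src and
nsteps are hypothetical):

    /*
     * With relative jump targets, a compiled group of steps could be
     * copied to a different offset in the steps array without fixing
     * up its internal jump targets.
     */
    memcpy(&state->steps[dst], &state->steps[src],
           nsteps * sizeof(ExprEvalStep));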

BTW, now that I've spent a bit of time looking at the generated assembly
code, I'm kind of disinclined to believe any arguments about how we have
better control over branch prediction with the jump-threading
implementation.  At least with current gcc (6.3.1 on Fedora 25) at -O2,
what I see is multiple places jumping to the same indirect jump
instruction :-(.  It's not a total disaster: as best I can tell, all the
uses of EEO_JUMP remain distinct.  But gcc has chosen to implement about
40 of the 71 uses of EEO_NEXT by jumping to the same couple of
instructions that increment the "op" register and then do an indirect
jump :-(.

So it seems that we're at the mercy of gcc's whims as to which instruction
dispatches will be distinguishable to the hardware; which casts a very
dark shadow over any benchmarking-based arguments that X is better than Y
for branch prediction purposes.  Compiler version differences are likely
to matter a lot more than anything we do.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-25 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 3/24/17 19:13, David Steele wrote:
> > Behavior for the current default of 16MB is unchanged, and all other 
> > sizes go through a logical progression.
> 
> Just at a glance, without analyzing the math behind it, this scheme
> seems super confusing.

Compared to:

1GB:
000000010000000000000001
000000010000000000000002
000000010000000000000003
000000010000000100000000

...?

Having the naming no longer match the LSN and also, seemingly, jump
randomly, strikes me as very confusing.  At least with the LSN-based
approach, we aren't jumping randomly but exactly in-line with what the
starting LSN of the file is, and always by the same amount (in hex).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Monitoring roles patch

2017-03-25 Thread Dave Page
Hi

> On 25 Mar 2017, at 15:55, Stephen Frost  wrote:
> 
> Robert,
> 
> * Robert Haas (robertmh...@gmail.com) wrote:
>>> On Fri, Mar 24, 2017 at 12:46 PM, Dave Page  wrote:
>>>> On Fri, Mar 24, 2017 at 4:24 PM, Robert Haas  wrote:
>>>> That's possible, but it really depends on the tool, not on core
>>>> PostgreSQL.  The tool should be the one providing the script, because
>>>> the tool is what knows its own permissions requirements.  Users who
>>>> care about security won't want to grant every privilege given by a
>>>> pg_monitor role to a tool that only needs a subset of those
>>>> privileges.
>>> 
>>> The upshot of this would be that every time there's a database server
>>> upgrade that changes the permissions required somehow, the user has to
>>> login to every server they have and run a script. It is no longer a
>>> one-time thing, which makes it vastly more painful to deal with
>>> upgrades.
>> 
>> So, I might be all wet here, but I would have expected that changes on
>> the TOOL side would be vastly more frequent.  I mean, we do not change
>> what a certain builtin permission does very often.  If we add
>> pg_read_all_settings, what is the likelihood that the remit of that
>> role is ever going to change?  I would judge that what is vastly more
>> likely is that a new version of some tool would start needing that
>> privilege (or some other) where it didn't before.  If that happens,
>> and the script is on the tool side, then you just add a new line to
>> the script.  If the script is built into initdb, then you've got to
>> wait for the next major release before you can update the definition
>> of pg_monitor - and maybe argue with other tool authors with different
>> requirements.
> 
> The expectation here is that the pg_monitor role will include all of the
> reasonable privileges in a given release that a monitor tool should be
> using.

Exactly.

> 
> While it's likely that monitoring tools will not all be able to work
> with every function they would then have access to on day 1, I'm sure
> the goal for all of them will be to reach the point where they are using
> all of the functions and tables they then have access to.

Right - and we have discussed here, in at least one developer meeting, and with 
the tools team at EDB what should be included, so I'm confident we have a solid 
starting point that would be generally useful.

> 
> Moving forward, the privileges being added in future versions of PG (the
> ones you point out as happening at initdb-time) would be only those
> which are associated with new features in those later versions of PG.
> Those new features aren't going to be back-patched and therefore it
> wouldn't be possible for the tool to provide a script to GRANT access to
> those new features which would work on older versions of PG, meaning
> that the script you suggest would necessarily be PG-version specific.

Indeed.

Another issue with having the admin run a script is that any new databases 
created from template0 or the default template1 may all need the script to be 
run at that point. Admin tools that could otherwise automatically and 
immediately start monitoring new databases might need manual admin/dbowner 
intervention, which could be extremely painful in large environments.

I believe this and other reasons we've described are exactly why other DBMS' do 
what we're proposing.

> 
> In other words, I don't believe this is a valid concern.
> 
>> So I'm fine with this:
>> 
>> -if (tblspcOid != MyDatabaseTableSpace)
>> +if (tblspcOid != MyDatabaseTableSpace &&
>> +!is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_ALL_STATS))
>> 
>> But I don't like this:
>> 
>> +GRANT EXECUTE ON FUNCTION pgstattuple(text) TO pg_read_all_stats;
> 
> I believe we understood that to be the case.  I don't object to the
> clarification, of course.
> 
>> My position is that execute permission on a function is already
>> grantable, so granting it by default to a built-in role is just
>> removing flexibility which would otherwise be available to the user.
> 
> To remove flexibility would require that we remove the ability to
> independently GRANT that right.  We are not doing that.  Nor are we
> taking anything away from the user by adding a new default role - we
> already claimed the pg_ namespace for roles in 9.6.

Right - and if a user doesn't like it, they can easily just not use the default 
role and create their own alternative instead.

From a usability perspective, I cannot see any downsides to the default roles 
as proposed. They take nothing away from the users that don't want/need them, 
and add significant convenience for those that do.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Allow vacuums to report oldestxmin

2017-03-25 Thread Fujii Masao
On Mon, Mar 6, 2017 at 9:37 PM, Masahiko Sawada  wrote:
> On Fri, Mar 3, 2017 at 10:50 PM, Simon Riggs  wrote:
>> Allow vacuums to report oldestxmin
>>
>> Allow VACUUM and Autovacuum to report the oldestxmin value they
>> used while cleaning tables, helping to make better sense out of
>> the other statistics we report in various cases.
>>
>> Branch
>> --
>> master
>>
>> Details
>> ---
>> http://git.postgresql.org/pg/commitdiff/9eb344faf54a849898d9be012ddfa8204cfeb57c
>>
>> Modified Files
>> --
>> src/backend/commands/vacuumlazy.c | 9 +
>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>
>>
>
> Should we change the example in vacuum.sgml file as well? Attached patch.

"tuples" in the above should be "row versions"?
We should review not only this line but also all the lines in the example
of VERBOSE output, I think.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-25 Thread Alexander Korotkov
On Thu, Mar 16, 2017 at 8:35 PM, David Steele  wrote:

> On 2/21/17 9:54 AM, Bernd Helmle wrote:
> > Am Dienstag, den 14.02.2017, 15:53 +0300 schrieb Alexander Korotkov:
> >> +1
> >> And you could try to use pg_wait_sampling
> >>  to sampling of wait
> >> events.
> >
> > I've tried this with your example from your blog post[1] and got this:
> >
> > (pgbench scale 1000)
> >
> > pgbench -Mprepared -S -n -c 100 -j 100 -T 300 -P2 pgbench2
> >
> > SELECT-only:
> >
> > SELECT * FROM profile_log ;
> >  ts |  event_type   | event | count
> > +---+---+---
> >  2017-02-21 15:21:52.45719  | LWLockNamed   | ProcArrayLock | 8
> >  2017-02-21 15:22:11.19594  | LWLockTranche | lock_manager  | 1
> >  2017-02-21 15:22:11.19594  | LWLockNamed   | ProcArrayLock |24
> >  2017-02-21 15:22:31.220803 | LWLockNamed   | ProcArrayLock | 1
> >  2017-02-21 15:23:01.255969 | LWLockNamed   | ProcArrayLock | 1
> >  2017-02-21 15:23:11.272254 | LWLockNamed   | ProcArrayLock | 2
> >  2017-02-21 15:23:41.313069 | LWLockNamed   | ProcArrayLock | 1
> >  2017-02-21 15:24:31.37512  | LWLockNamed   | ProcArrayLock |19
> >  2017-02-21 15:24:41.386974 | LWLockNamed   | ProcArrayLock | 1
> >  2017-02-21 15:26:41.530399 | LWLockNamed   | ProcArrayLock | 1
> > (10 rows)
> >
> > writes pgbench runs have far more events logged, see the attached text
> > file. Maybe this is of interest...
> >
> >
> > [1] http://akorotkov.github.io/blog/2016/03/25/wait_monitoring_9_6/
>
> This patch applies cleanly at cccbdde.  It doesn't break compilation on
> amd64, but I can't test on a Power-based machine.
>
> Alexander, have you had a chance to look at Bernd's results?
>

Bernd's results look good.  But it seems that his machine is not big
enough, so the effect of the patch isn't huge.
I will get access to a big enough Power8 machine on Monday, 27 March.  I
will run benchmarks there and post the results immediately.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] increasing the default WAL segment size

2017-03-25 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 3/24/17 08:18, Stephen Frost wrote:
> > Beyond that, this also bakes in an assumption that we would then require
> > access to a database
> 
> That is a good point, but then any change to the naming whatsoever will
> create trouble.  Then we might as well choose which specific trouble.

Right, and I'd rather we work that out before we start encouraging users
to change their WAL segment size, which is what this patch will do.

Personally, I'm alright with the patch David has produced, which is
pretty small, maintains the same names when 16MB segments are used, and
is pretty straightforward to reason about.  I do think it'll need added
documentation to clarify how WAL segment names are calculated and
perhaps another function which returns the size of WAL segments on a
given cluster (I don't think we have that..?), and, ideally, added
regression tests or buildfarm animals which try different sizes.

On the other hand, I don't have any particular issue with the naming
scheme you proposed up-thread, which uses proper separators between the
components of a WAL filename, but that would change what happens with
16MB WAL segments today.

I'm still of the opinion that we should be changing the default to 64MB
for WAL segments.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] increasing the default WAL segment size

2017-03-25 Thread Peter Eisentraut
At this point, I suggest splitting this patch up into several
potentially less controversial pieces.

One big piece is that we currently don't support segment sizes larger
than 64 GB, for various internal arithmetic reasons.  Your patch appears
to address that.  So I suggest isolating that.  Assuming it works
correctly, I think there would be no great concern about it.

The next piece would be making the various tools aware of varying
segment sizes without having to rely on a built-in value.

The third piece would then be the rest, which allows you to set the size
at initdb time.

If we take these in order, we would make it easier to test various sizes
and see if there are any more unforeseen issues when changing sizes.  It
would also make it easier to do performance testing so we can address
the original question of what the default size should be.

One concern I have is that your patch does not contain any tests.  There
should probably be lots of tests.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Allow vacuums to report oldestxmin

2017-03-25 Thread Masahiko Sawada
On Sun, Mar 26, 2017 at 1:37 AM, Fujii Masao  wrote:
> On Mon, Mar 6, 2017 at 9:37 PM, Masahiko Sawada  wrote:
>> On Fri, Mar 3, 2017 at 10:50 PM, Simon Riggs  wrote:
>>> Allow vacuums to report oldestxmin
>>>
>>> Allow VACUUM and Autovacuum to report the oldestxmin value they
>>> used while cleaning tables, helping to make better sense out of
>>> the other statistics we report in various cases.
>>>
>>> Branch
>>> --
>>> master
>>>
>>> Details
>>> ---
>>> http://git.postgresql.org/pg/commitdiff/9eb344faf54a849898d9be012ddfa8204cfeb57c
>>>
>>> Modified Files
>>> --
>>> src/backend/commands/vacuumlazy.c | 9 +
>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>>
>>
>> Should we change the example in vacuum.sgml file as well? Attached patch.
>
> "tuples" in the above should be "row versions"?
> We should review not only this line but also all the lines in the example
> of VERBOSE output, I think.

Right. These verbose log messages are out of date. I ran
VACUUM (VERBOSE, ANALYZE) with the same scenario as the current example,
as far as possible. The attached patch updates the verbose log messages.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_vacuum_example_in_doc_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-25 Thread Peter Geoghegan
On Sat, Mar 25, 2017 at 12:54 AM, Amit Kapila  wrote:
> I am not sure how do you want to binary compare two datums, if you are
> thinking datumIsEqual(), that won't work.  I think you need to use
> datatype specific compare function something like what we do in
> _bt_compare().

How will that interact with types like numeric, that have display
scale or similar?
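
To make that concrete with a sketch (illustrative only, not from the
patch): numeric '1.0' and '1.00' are equal under the type's own equality
function but differ byte-for-byte because of display scale, so the two
notions of "equal" disagree.

/* assumes postgres.h, fmgr.h, utils/builtins.h, utils/datum.h */
static void
numeric_equality_demo(void)
{
    Datum       d1;
    Datum       d2;
    bool        binary_eq;
    bool        opclass_eq;

    d1 = DirectFunctionCall3(numeric_in, CStringGetDatum("1.0"),
                             ObjectIdGetDatum(InvalidOid), Int32GetDatum(-1));
    d2 = DirectFunctionCall3(numeric_in, CStringGetDatum("1.00"),
                             ObjectIdGetDatum(InvalidOid), Int32GetDatum(-1));

    /* Byte-wise comparison of the varlena values: false, scales differ. */
    binary_eq = datumIsEqual(d1, d2, false, -1);

    /* The datatype's own equality function: true. */
    opclass_eq = DatumGetBool(DirectFunctionCall2(numeric_eq, d1, d2));

    elog(NOTICE, "binary=%d opclass=%d", binary_eq, opclass_eq);
}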


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-25 Thread Tom Lane
Alexander Korotkov  writes:
> [ lwlock-power-3.patch ]

I experimented with this patch a bit.  I can't offer help on testing it
on large PPC machines, but I can report that it breaks the build on
Apple PPC machines, apparently because of nonstandard assembly syntax.
I get "Parameter syntax error" on these four lines of assembly:

and 3,r0,r4
cmpwi   3,0
add 3,r0,r5
stwcx.  3,0,r2

I am not sure what the "3"s are meant to be --- if that's a hard-coded
register number, then it's unacceptable code to begin with, IMO.
You should be able to get gcc to give you a scratch register of its
choosing via appropriate use of the register assignment part of the
asm syntax.  I think there are examples in s_lock.h.
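
For illustration, a minimal sketch (not the actual patch) of that loop
written so gcc allocates the scratch register itself, via an early-clobber
output operand:

static inline uint32
fetch_mask_add_sketch(volatile uint32 *ptr, uint32 mask, uint32 add)
{
    uint32      prev;
    uint32      tmp;            /* scratch: compiler picks the register */

    __asm__ __volatile__(
"1: lwarx   %0,0,%3     \n"     /* load-reserve old value */
"   and     %1,%0,%4    \n"     /* tmp = old & mask */
"   cmpwi   %1,0        \n"     /* any masked bit set? */
"   bne-    2f          \n"     /* yes: bail out, return old */
"   add     %1,%0,%5    \n"     /* tmp = old + add */
"   stwcx.  %1,0,%3     \n"     /* store-conditional */
"   bne-    1b          \n"     /* lost reservation: retry */
"2:                     \n"
        : "=&r" (prev), "=&r" (tmp), "+m" (*ptr)
        : "r" (ptr), "r" (mask), "r" (add)
        : "cc", "memory");

    return prev;
}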

I'm also unhappy that this patch changes the generic implementation of
LWLockAttemptLock.  That means that we'd need to do benchmarking on *every*
machine type to see what we're paying elsewhere for the benefit of PPC.
It seems unlikely that putting an extra level of function call into that
hot-spot is zero cost.

Lastly, putting machine-specific code into atomics.c is a really bad idea.
We have a convention for where to put such code, and that is not it.

You could address both the second and third concerns, I think, by putting
the PPC asm function into port/atomics/arch-ppc.h (which is where it
belongs anyway) and making the generic implementation be a "static inline"
function in port/atomics/fallback.h.  In that way, at least with compilers
that do inlines sanely, we could expect that this patch doesn't change the
generated code for LWLockAttemptLock at all on machines other than PPC.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-25 Thread Pavan Deolasee
On Sat, 25 Mar 2017 at 11:03 PM, Peter Geoghegan  wrote:

> On Sat, Mar 25, 2017 at 12:54 AM, Amit Kapila 
> wrote:
> > I am not sure how do you want to binary compare two datums, if you are
> > thinking datumIsEqual(), that won't work.  I think you need to use
> > datatype specific compare function something like what we do in
> > _bt_compare().
>
> How will that interact with types like numeric, that have display
> scale or similar?
>
I wonder why Amit thinks that datumIsEqual won't work once we convert the
heap values to an index tuple and then fetch them using index_get_attr.
After all, that's how the current index tuple was constructed when it was
inserted. In fact, we must not rely on _bt_compare, because that might
return a "false positive" even for two different heap binary values (I
think). To decide whether or not to do a WARM update in heap_update, we
rely only on binary comparison. Could it happen that for two different
binary heap values we still compute the same index attribute, even when
expression indexes are not supported?

Thanks,
Pavan



> --
> Peter Geoghegan
>
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-03-25 Thread Peter Eisentraut
On 3/13/17 09:15, Mithun Cy wrote:
> A. launch_autoprewarm_dump() RETURNS int4
> This is a SQL callable function to launch the autoprewarm worker to
> dump the buffer pool information at regular interval. In a server, we
> can only run one autoprewarm worker so if a worker sees another
> existing worker it will exit immediately. The return value is pid of
> the worker which has been launched.

Why do you need that?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] standardized backwards incompatibility tag for commits

2017-03-25 Thread Andres Freund
Hi,

Seems like it'd be good to standardize how we're declaring that a commit
contains backwards incompatible changes.  I've seen
- 'BACKWARDS INCOMPATIBLE CHANGE'
- 'BACKWARD INCOMPATIBILITY'
- a lot of free-flow text annotations like "as a
  backward-incompatibility", "This makes a backwards-incompatible change"

Especially the latter are easy to miss when looking through the commit
log and I'd bet some get missed when generating the release notes.

Standardizing something that can be reliably searched for seems like a
good idea.  And then we should document that, and a bunch of other
conventions for commit messages, somewhere (protected wiki page?).

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-25 Thread Alexander Korotkov
On Sat, Mar 25, 2017 at 8:44 PM, Tom Lane  wrote:

> Alexander Korotkov  writes:
> > [ lwlock-power-3.patch ]
>
> I experimented with this patch a bit.  I can't offer help on testing it
> on large PPC machines, but I can report that it breaks the build on
> Apple PPC machines, apparently because of nonstandard assembly syntax.
> I get "Parameter syntax error" on these four lines of assembly:
>
> and 3,r0,r4
> cmpwi   3,0
> add 3,r0,r5
> stwcx.  3,0,r2
>
> I am not sure what the "3"s are meant to be --- if that's a hard-coded
> register number, then it's unacceptable code to begin with, IMO.
> You should be able to get gcc to give you a scratch register of its
> choosing via appropriate use of the register assignment part of the
> asm syntax.  I think there are examples in s_lock.h.
>

Right. Now I use a variable bound to a register instead of a hard-coded
register for that purpose.

> I'm also unhappy that this patch changes the generic implementation of
> LWLockAttemptLock.  That means that we'd need to do benchmarking on *every*
> machine type to see what we're paying elsewhere for the benefit of PPC.
> It seems unlikely that putting an extra level of function call into that
> hot-spot is zero cost.
>
> Lastly, putting machine-specific code into atomics.c is a really bad idea.
> We have a convention for where to put such code, and that is not it.
>
> You could address both the second and third concerns, I think, by putting
> the PPC asm function into port/atomics/arch-ppc.h (which is where it
> belongs anyway) and making the generic implementation be a "static inline"
> function in port/atomics/fallback.h.  In that way, at least with compilers
> that do inlines sanely, we could expect that this patch doesn't change the
> generated code for LWLockAttemptLock at all on machines other than PPC.
>

I moved the PPC implementation of pg_atomic_fetch_mask_add_u32() into
port/atomics/arch-ppc.h.  I also had to declare pg_atomic_uint32 there to
satisfy the usage of this type as an argument of
pg_atomic_fetch_mask_add_u32_impl().

I moved the generic implementation of pg_atomic_fetch_mask_add_u32() into
port/atomics/generic.h.  That seems to be a more appropriate place for it
than fallback.h, because fallback.h has definitions for the cases where we
have no atomics for the platform at all, while generic.h implements the
missing atomics in terms of the ones that are defined.
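
For reference, the generic version is essentially a compare-and-swap loop
along these lines (a sketch, modulo the naming and details of the actual
patch): add "add_" only while no bit in "mask" is set, returning the old
value either way.

static inline uint32
pg_atomic_fetch_mask_add_u32_impl(volatile pg_atomic_uint32 *ptr,
                                  uint32 mask, uint32 add_)
{
    uint32      old_value = pg_atomic_read_u32_impl(ptr);

    while ((old_value & mask) == 0)
    {
        if (pg_atomic_compare_exchange_u32_impl(ptr, &old_value,
                                                old_value + add_))
            break;
        /* CAS failure refreshed old_value; recheck the mask and retry. */
    }

    return old_value;
}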

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


lwlock-power-4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] standardized backwards incompatibility tag for commits

2017-03-25 Thread Tom Lane
Andres Freund  writes:
> Seems like it'd be good to standardize how we're declaring that a commit
> contains backwards incompatible changes.  I've seen
> - 'BACKWARDS INCOMPATIBLE CHANGE'
> - 'BACKWARD INCOMPATIBILITY'
> - a lot of free-flow text annotations like "as a
>   backward-incompatibility", "This makes a backwards-incompatible change"

> Especially the latter are easy to miss when looking through the commit
> log and I'd bet some get missed when generating the release notes.

Bruce might have a different opinion, but for my own part I do not think
it would make any difference in creating the release notes.  The important
thing is that the information be there in the log entry, not exactly how
it's spelled.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-25 Thread Tom Lane
Alexander Korotkov  writes:
> I moved PPC implementation of pg_atomic_fetch_mask_add_u32() into
> port/atomics/arch-ppc.h.  I also had to declare pg_atomic_uint32 there to
> satisfy usage of this type as argument
> of pg_atomic_fetch_mask_add_u32_impl().

Hm, you did something wrong there, because now I get a bunch of failures:

ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -g -O2 -I../../../../src/include 
   -c -o brin.o brin.c
In file included from ../../../../src/include/port/atomics.h:123,
 from ../../../../src/include/utils/dsa.h:17,
 from ../../../../src/include/nodes/tidbitmap.h:26,
 from ../../../../src/include/access/genam.h:19,
 from ../../../../src/include/nodes/execnodes.h:17,
 from ../../../../src/include/access/brin.h:14,
 from brin.c:18:
../../../../src/include/port/atomics/generic.h:154:3: error: #error "No 
pg_atomic_test_and_set provided"
../../../../src/include/port/atomics.h: In function 'pg_atomic_init_flag':
../../../../src/include/port/atomics.h:178: warning: implicit declaration of 
function 'pg_atomic_init_flag_impl'
../../../../src/include/port/atomics.h: In function 'pg_atomic_test_set_flag':
../../../../src/include/port/atomics.h:193: warning: implicit declaration of 
function 'pg_atomic_test_set_flag_impl'
../../../../src/include/port/atomics.h: In function 
'pg_atomic_unlocked_test_flag':
../../../../src/include/port/atomics.h:208: warning: implicit declaration of 
function 'pg_atomic_unlocked_test_flag_impl'
... and so on.

I'm not entirely sure what the intended structure of these header files
is.  Maybe Andres can comment.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Valgrind failures caused by multivariate stats patch

2017-03-25 Thread Andres Freund
Hi,

I just tried to run valgrind before pushing my expression evaluation
work, but that triggers independent failures:

==2486== Uninitialised byte(s) found during client check request
==2486==at 0x5B56B9: PageAddItemExtended (bufpage.c:329)
==2486==by 0x225E14: RelationPutHeapTuple (hio.c:53)
==2486==by 0x21A656: heap_update (heapam.c:4189)
==2486==by 0x21BAA2: simple_heap_update (heapam.c:4511)
==2486==by 0x2B2B1F: CatalogTupleUpdate (indexing.c:216)
==2486==by 0x5784C2: statext_store (extended_stats.c:298)
==2486==by 0x577CFF: BuildRelationExtStatistics (extended_stats.c:99)
==2486==by 0x3546B6: do_analyze_rel (analyze.c:577)
==2486==by 0x353B76: analyze_rel (analyze.c:271)
==2486==by 0x3E7B89: vacuum (vacuum.c:321)
==2486==by 0x3E7755: ExecVacuum (vacuum.c:122)
==2486==by 0x5C5F1C: standard_ProcessUtility (utility.c:670)
==2486==by 0x5C5724: ProcessUtility (utility.c:353)
==2486==by 0x5C46CA: PortalRunUtility (pquery.c:1174)
==2486==by 0x5C48D0: PortalRunMulti (pquery.c:1317)
==2486==by 0x5C3E1A: PortalRun (pquery.c:795)
==2486==by 0x5BDC1A: exec_simple_query (postgres.c:1101)
==2486==by 0x5C203F: PostgresMain (postgres.c:4071)
==2486==by 0x525138: BackendRun (postmaster.c:4317)
==2486==by 0x524848: BackendStartup (postmaster.c:3989)
==2486==  Address 0x92b7f93 is 195 bytes inside a block of size 215 
client-defined
==2486==at 0x761EAC: MemoryContextAllocExtended (mcxt.c:840)
==2486==by 0x1C08EB: heap_form_tuple (heaptuple.c:744)
==2486==by 0x1C0BBF: heap_modify_tuple (heaptuple.c:833)
==2486==by 0x578497: statext_store (extended_stats.c:292)
==2486==by 0x577CFF: BuildRelationExtStatistics (extended_stats.c:99)
==2486==by 0x3546B6: do_analyze_rel (analyze.c:577)
==2486==by 0x353B76: analyze_rel (analyze.c:271)
==2486==by 0x3E7B89: vacuum (vacuum.c:321)
==2486==by 0x3E7755: ExecVacuum (vacuum.c:122)
==2486==by 0x5C5F1C: standard_ProcessUtility (utility.c:670)
==2486==by 0x5C5724: ProcessUtility (utility.c:353)
==2486==by 0x5C46CA: PortalRunUtility (pquery.c:1174)
==2486==by 0x5C48D0: PortalRunMulti (pquery.c:1317)
==2486==by 0x5C3E1A: PortalRun (pquery.c:795)
==2486==by 0x5BDC1A: exec_simple_query (postgres.c:1101)
==2486==by 0x5C203F: PostgresMain (postgres.c:4071)
==2486==by 0x525138: BackendRun (postmaster.c:4317)
==2486==by 0x524848: BackendStartup (postmaster.c:3989)
==2486==by 0x520C4A: ServerLoop (postmaster.c:1729)
==2486==by 0x5201D9: PostmasterMain (postmaster.c:1337)
==2486==  Uninitialised value was created by a heap allocation
==2486==at 0x76212F: palloc (mcxt.c:872)
==2486==by 0x578814: statext_ndistinct_build (mvdistinct.c:80)
==2486==by 0x577CCE: BuildRelationExtStatistics (extended_stats.c:94)
==2486==by 0x3546B6: do_analyze_rel (analyze.c:577)
==2486==by 0x353B76: analyze_rel (analyze.c:271)
==2486==by 0x3E7B89: vacuum (vacuum.c:321)
==2486==by 0x3E7755: ExecVacuum (vacuum.c:122)
==2486==by 0x5C5F1C: standard_ProcessUtility (utility.c:670)
==2486==by 0x5C5724: ProcessUtility (utility.c:353)
==2486==by 0x5C46CA: PortalRunUtility (pquery.c:1174)
==2486==by 0x5C48D0: PortalRunMulti (pquery.c:1317)
==2486==by 0x5C3E1A: PortalRun (pquery.c:795)
==2486==by 0x5BDC1A: exec_simple_query (postgres.c:1101)
==2486==by 0x5C203F: PostgresMain (postgres.c:4071)
==2486==by 0x525138: BackendRun (postmaster.c:4317)
==2486==by 0x524848: BackendStartup (postmaster.c:3989)
==2486==by 0x520C4A: ServerLoop (postmaster.c:1729)
==2486==by 0x5201D9: PostmasterMain (postmaster.c:1337)
==2486==by 0x45DFA6: main (main.c:228)
==2486== 
{
   
   Memcheck:User
   fun:PageAddItemExtended
   fun:RelationPutHeapTuple
   fun:heap_update
   fun:simple_heap_update
   fun:CatalogTupleUpdate
   fun:statext_store
   fun:BuildRelationExtStatistics
   fun:do_analyze_rel
   fun:analyze_rel
   fun:vacuum
   fun:ExecVacuum
   fun:standard_ProcessUtility
   fun:ProcessUtility
   fun:PortalRunUtility
   fun:PortalRunMulti
   fun:PortalRun
   fun:exec_simple_query
   fun:PostgresMain
   fun:BackendRun
   fun:BackendStartup
}
==2486== Uninitialised byte(s) found during client check request
==2486==at 0x1C4857: printtup (printtup.c:347)
==2486==by 0x401FD5: ExecutePlan (execMain.c:1681)
==2486==by 0x3FFDED: standard_ExecutorRun (execMain.c:355)
==2486==by 0x3FFC07: ExecutorRun (execMain.c:298)
==2486==by 0x5C40BD: PortalRunSelect (pquery.c:928)
==2486==by 0x5C3D50: PortalRun (pquery.c:769)
==2486==by 0x5BDC1A: exec_simple_query (postgres.c:1101)
==2486==by 0x5C203F: PostgresMain (postgres.c:4071)
==2486==by 0x525138: BackendRun (postmaster.c:4317)
==2486==by 0x524848: BackendStartup (postmaster.c:3989)
==2486==by 0x520C4A: ServerLoop (postmaster.c:1729)
==2486==by 0x5201D9: PostmasterMain (postmaster.c:1337)
==2486==by 0x45DFA6: main (main.c:228)
==248

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-25 Thread Alexander Korotkov
On Sat, Mar 25, 2017 at 11:32 PM, Tom Lane  wrote:

> Alexander Korotkov  writes:
> > I moved PPC implementation of pg_atomic_fetch_mask_add_u32() into
> > port/atomics/arch-ppc.h.  I also had to declare pg_atomic_uint32 there to
> > satisfy usage of this type as argument
> > of pg_atomic_fetch_mask_add_u32_impl().
>
> Hm, you did something wrong there, because now I get a bunch of failures:
>
> ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
> -Wformat-security -fno-strict-aliasing -fwrapv -g -O2
> -I../../../../src/include-c -o brin.o brin.c
> In file included from ../../../../src/include/port/atomics.h:123,
>  from ../../../../src/include/utils/dsa.h:17,
>  from ../../../../src/include/nodes/tidbitmap.h:26,
>  from ../../../../src/include/access/genam.h:19,
>  from ../../../../src/include/nodes/execnodes.h:17,
>  from ../../../../src/include/access/brin.h:14,
>  from brin.c:18:
> ../../../../src/include/port/atomics/generic.h:154:3: error: #error "No
> pg_atomic_test_and_set provided"
> ../../../../src/include/port/atomics.h: In function 'pg_atomic_init_flag':
> ../../../../src/include/port/atomics.h:178: warning: implicit declaration
> of function 'pg_atomic_init_flag_impl'
> ../../../../src/include/port/atomics.h: In function
> 'pg_atomic_test_set_flag':
> ../../../../src/include/port/atomics.h:193: warning: implicit declaration
> of function 'pg_atomic_test_set_flag_impl'
> ../../../../src/include/port/atomics.h: In function
> 'pg_atomic_unlocked_test_flag':
> ../../../../src/include/port/atomics.h:208: warning: implicit declaration
> of function 'pg_atomic_unlocked_test_flag_impl'
> ... and so on.
>
> I'm not entirely sure what the intended structure of these header files
> is.  Maybe Andres can comment.
>

It seems that on this platform the definitions of atomics should be
provided by fallback.h.  But they aren't, because I already defined
PG_HAVE_ATOMIC_U32_SUPPORT in arch-ppc.h.  I think in this case we
shouldn't provide a PPC-specific implementation of
pg_atomic_fetch_mask_add_u32().  However, I don't know how to arrange
that, given that arch-ppc.h is included before the compiler-specific
headers; in arch-ppc.h we don't yet know whether we will find an
implementation of atomics for this platform.  One possible solution is to
provide assembly implementations for all the atomics in arch-ppc.h.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Valgrind failures caused by multivariate stats patch

2017-03-25 Thread Tom Lane
Andres Freund  writes:
> I just tried to run valgrind before pushing my expression evaluation
> work, but that triggers independent failures:

FWIW, I just now finished valgrinding the regression tests on 457a44487
plus the expression patch, and it looked good.  So these failures
are definitely independent of your work.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Improve access to parallel query from procedural languages.

2017-03-25 Thread Tom Lane
Robert Haas  writes:
> Improve access to parallel query from procedural languages.

Mandrill has been failing since this patch went in, eg

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2017-03-25%2021%3A34%3A08

It doesn't seem to be a platform-specific problem: I can duplicate the
failure here by applying the same settings mandrill uses, ie build with
-DRANDOMIZE_ALLOCATED_MEMORY and set force_parallel_mode = regress.

I doubt that the bug is directly in that patch; more likely it's
exposed a pre-existing bug in parallel logic related to triggers.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Improve access to parallel query from procedural languages.

2017-03-25 Thread Tom Lane
I wrote:
> It doesn't seem to be a platform-specific problem: I can duplicate the
> failure here by applying the same settings mandrill uses, ie build with
> -DRANDOMIZE_ALLOCATED_MEMORY and set force_parallel_mode = regress.

Oh ... scratch that: you don't even need -DRANDOMIZE_ALLOCATED_MEMORY.
For some reason I just assumed that any parallelism-related patch
would have been tested with force_parallel_mode = regress.  This one
evidently was not.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-25 Thread Andres Freund
Hi,

On 2017-03-24 17:16:15 -0400, Tom Lane wrote:
> Attached is an updated patch.  I believe this is committable, but
> since I whacked it around quite a bit, I'm sure you'll want to look
> it over first.

Indeed.

Points:

- It's going to take a while for me to get used to execExprInterp.c - I
  constantly try to switch to a nonexistent buffer ;)

- I like your approach with
  /* Check that current tupdesc doesn't have more fields than we allocated */
  in ExecEvalFieldStoreDeForm().

- I like EEO_FLAG_IS_QUAL.

- I think at some point (not sure whether in your or in my revision)
  ScalarArrayOpExpr lost its permission check. Added that.

- Added note that we knowingly don't perform permission checks for
  input/output funcs.

- Both pre/post patch, CoerceViaIO doesn't invoke
  InvokeFunctionExecuteHook - I'm still unclear what that hook is
  useful for, so ...

- The !caseExpr->defresult result branch is currently unreachable (and
  its equivalent was before the patch) because transformCaseExpr()
  generates a default expression.  I'm inclined to replace the dead code
  with an assertion. Any reason not to do that?

- I see you kept the determination of the set of constraints to be checked
  for domains at initialization time. Made a note that that a) might change
  b) could change behaviour.


> Please be sure the commit message notes that function EXECUTE permissions
> are now checked at executor startup not first call.  We need to document
> that in the v10 release notes as an incompatibility, and I'm sure we'll
> forget if the commit log doesn't point it out.

Done.


> * As we discussed, ExecEvalWholeRowVar is now using a pretty inefficient
> method for flattening tuples into datums.  I will take a to-do item to
> fix this.

Thanks.


> * ExecInitCheck is really just ExecInitExpr with a make_ands_explicit
> call in front of it.  It turned out that most of the places that were
> (or should have been) calling it were doing a make_ands_implicit call
> for no other reason than to satisfy its API.  I changed most of those
> to just call ExecInitExpr directly.  There are just a couple of call
> sites left, and I think probably those are likewise not really that
> happy with this API --- but I didn't take the time to chase down where
> the expressions were coming from in those cases.  It seems possible
> that we could remove ExecInitCheck/ExecPrepareCheck entirely.  I'll
> look into this later, too.

Thanks^2.


> * As we discussed, CaseTestValue/DomainValue are pretty messy and need
> to be rethought.  That might not get done for v10 though.

Yea, I'm disinclined to tackle this just now, there's enough other stuff
pending - but I'm willing to tackle it for v11 unless you beat me to it.


> * I'm not very happy that execSRF.c has two somewhat different
> implementations of largely similar functionality, and
> SetExprState.elidedFuncState seems like a wart.  That's mostly a
> pre-existing problem of course.  I'm satisfied with leaving it as it is
> for now, but eventually some refactoring there would be good.

Yea, that's not pretty.  Given the historical difference in behaviour
between SRF and ROWS FROM (the latter always materializes, the former
only when the set function does so itself), I'm not sure they can easily
be simplified.

I'm not really sure how to get rid of the issue underlying
elidedFuncState?  I mean we could move that case into nodeFunctionscan,
above ExecMakeTableFunctionResult's level, but that doesn't really seem
like an improvement.


I think there's some argument to be made that we should move all of
execSRF.c's logic to their respective callsites.  There's not really
much shared code: ExecEvalFuncArgs(), tupledesc_match(), init_sexpr() -
and at least the latter would even become a bit simpler if we didn't
support both callers.


Thanks a lot for your work on the patch!

And pushed.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-25 Thread Tom Lane
Andres Freund  writes:
> - The !caseExpr->defresult result branch is currently unreachable (and
>   its equivalent was before the patch) because transformCaseExpr()
>   generates a default expression.  I'm inclined to replace the dead code
>   with an assertion. Any reason not to do that?

Ah-hah.  I'd noted that that code wasn't being reached when I did some
code coverage checks this morning, but I hadn't found the cause yet.
Yeah, replacing the if-test with an assert seems fine.
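
Presumably something along these lines (sketch):

    /* transformCaseExpr() always supplies a default result */
    Assert(caseExpr->defresult != NULL);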

> I think there's some argument to be made that we should move all of
> execSRF.c's logic to their respective callsites.  There's not really
> much shared code: ExecEvalFuncArgs(), tupledesc_match(), init_sexpr() -
> and at least the latter would even become a bit simpler if we didn't
> support both callers.

Possibly.  All that code could stand to be rethought, probably, now that
its mission is just to handle the SRF case.  But at least all the cruft is
in one place for the moment.  And I don't think it's anything we have to
fix before v10, it's just cosmetic.

> Thanks a lot for your work on the patch!

You're welcome!

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-25 Thread Andres Freund
On 2017-03-25 12:22:15 -0400, Tom Lane wrote:
> More random musing ... have you considered making the jump-target fields
> in expressions be relative rather than absolute indexes?  That is,
> EEO_JUMP would look like
> 
>   op += (stepno); \
>   EEO_DISPATCH(); \
> 
> instead of
> 
>   op = &state->steps[stepno]; \
>   EEO_DISPATCH(); \
> 
> I have not carried out a full patch to make this work, but just making
> that one change and examining the generated assembly code looks promising.
> Instead of this
> 
>   movslq  40(%r14), %r8
>   salq$6, %r8
>   addq24(%rbx), %r8
>   movq%r8, %r14
>   jmp *(%r8)
> 
> we get this
> 
>   movslq  40(%r14), %rax
>   salq$6, %rax
>   addq%rax, %r14
>   jmp *(%r14)

That seems like a good idea.  I've not done this in the committed
version (and I don't think we necessarily need to do this before the
release), but for the future it seems like a good plan.  It makes sense
that it's faster - there's no need to reference state->steps.


> which certainly looks like it ought to be faster.  Also, the real reason
> I got interested in this at all is that with relative jumps, groups of
> steps would be position-independent within the steps array, which would
> enable some compile-time tricks that seem impractical with the current
> definition.

Indeed.


> BTW, now that I've spent a bit of time looking at the generated assembly
> code, I'm kind of disinclined to believe any arguments about how we have
> better control over branch prediction with the jump-threading
> implementation.

I measured the performance difference between using it and not using it,
and it came out a pretty clear plus. On gcc 6.3, gcc master snapshot,
and clang-3.9.  It's not just that more jumps are duplicated, it's also
that the switch() always adds a boundary check.
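
As a standalone toy illustration of the two dispatch styles (not
PostgreSQL code): the switch form has to range-check the opcode before
indexing its jump table, while the computed-goto form (a GCC/Clang
extension) jumps straight through a label-address table.

#include <stdio.h>

enum { OP_INC, OP_DEC, OP_DONE };

static int
run_switch(const int *prog)
{
    int     acc = 0;

    for (;;)
    {
        switch (*prog++)            /* compiler emits a bounds check */
        {
            case OP_INC: acc++; break;
            case OP_DEC: acc--; break;
            case OP_DONE: return acc;
        }
    }
}

static int
run_goto(const int *prog)
{
    static const void *const dispatch[] = { &&do_inc, &&do_dec, &&do_done };
    int     acc = 0;

    goto *dispatch[*prog++];        /* direct indirect jump, no check */
do_inc:
    acc++;
    goto *dispatch[*prog++];
do_dec:
    acc--;
    goto *dispatch[*prog++];
do_done:
    return acc;
}

int
main(void)
{
    const int prog[] = { OP_INC, OP_INC, OP_DEC, OP_DONE };

    printf("%d %d\n", run_switch(prog), run_goto(prog));
    return 0;
}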


> At least with current gcc (6.3.1 on Fedora 25) at -O2,
> what I see is multiple places jumping to the same indirect jump
> instruction :-(.  It's not a total disaster: as best I can tell, all the
> uses of EEO_JUMP remain distinct.  But gcc has chosen to implement about
> 40 of the 71 uses of EEO_NEXT by jumping to the same couple of
> instructions that increment the "op" register and then do an indirect
> jump :-(.

Yea, I see some of that too - "usually" when there's more than just the
jump in common.  I think there are some gcc parameters that influence this
(min-crossjump-insns (5), max-goto-duplication-insns (8)).  Might be
worthwhile experimenting with setting them locally via a pragma or such.
I think Ants wanted to experiment with that, too.

Then there's also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785
which causes some forms of computed goto (not ours I think) to be
deoptimized in gcc.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: git rm execQual.c

2017-03-25 Thread Andres Freund
On 2017-03-25 22:22:21 +, Tom Lane wrote:
> git rm execQual.c
> 
> Should have been in commit b8d7f053c5c2bf2a7e8734fe3327f6a8bc711755,
> but passing the patch back and forth as a patch seems to have dropped
> that metadata.

Ah, thanks. FWIW, using git-style patches (git format-patch) gets rid
of that problem, both for additions and deletions... - imo it makes life
a good bit easier.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Small doc fix for xmltable

2017-03-25 Thread Arjen Nienhuis
Hi,

In the devel docs for xmltable there should be a comma after XMLNAMESPACES()

Groeten, Arjen


diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ba6f8dd..1180385 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10504,7 +10504,7 @@ SELECT xpath_exists('/my:a/text()', '<my:a xmlns:my="http://example.com">test
 
 
-xmltable( XMLNAMESPACES(namespace uri AS namespace name, ...)
+xmltable( XMLNAMESPACES(namespace uri AS namespace name, ...),
    row_expression PASSING BY REF document_expression BY REF
    COLUMNS name { type PATH column_expression DEFAULT default_expression NOT NULL | NULL
                 | FOR ORDINALITY }


[HACKERS] Re: [COMMITTERS] pgsql: Faster expression evaluation and targetlist projection.

2017-03-25 Thread Andres Freund
Hi Joe,


On 2017-03-25 22:11:37 +, Andres Freund wrote:
> Faster expression evaluation and targetlist projection.

Apparently this broke the selinux integration somehow:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2017-03-25%2022%3A45%3A01

Unfortunately the buildfarm appears to not display regression.diffs -
making this a bit hard to debug remotely.  Any chance you could help me
out with that? I'd rather not setup selinux here...

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Faster expression evaluation and targetlist projection.

2017-03-25 Thread Joe Conway
On 03/25/2017 04:03 PM, Andres Freund wrote:
> Hi Joe,
> 
> 
> On 2017-03-25 22:11:37 +, Andres Freund wrote:
>> Faster expression evaluation and targetlist projection.
> 
> Apparently this broke the selinux integration somehow:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2017-03-25%2022%3A45%3A01
> 
> Unfortunately the buildfarm appears to not display regression.diffs -
> making this a bit hard to debug remotely.  Any chance you could help me
> out with that? I'd rather not setup selinux here...


Sure. What exactly do you want? Should we jump on a call or hangout or
something?

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


[HACKERS] Re: [COMMITTERS] pgsql: Faster expression evaluation and targetlist projection.

2017-03-25 Thread Andres Freund
On 2017-03-25 16:36:03 -0700, Joe Conway wrote:
> On 03/25/2017 04:03 PM, Andres Freund wrote:
> > Hi Joe,
> > 
> > 
> > On 2017-03-25 22:11:37 +, Andres Freund wrote:
> >> Faster expression evaluation and targetlist projection.
> > 
> > Apparently this broke the selinux integration somehow:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2017-03-25%2022%3A45%3A01
> > 
> > Unfortunately the buildfarm appears to not display regression.diffs -
> > making this a bit hard to debug remotely.  Any chance you could help me
> > out with that? I'd rather not setup selinux here...
> 
> 
> Sure. What exactly do you want?

I think, for starters, seeing regression.diffs from sepgsql would be
useful.  Might already clear up what's the issue.


> Should we jump on a call or hangout or something?

I think for now just seeing regression.diffs would be enough - depending
on what the differences are. Could very well be that we're just seeing a
difference due to function permission checks happening a bit earlier
now.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Faster expression evaluation and targetlist projection.

2017-03-25 Thread Joe Conway
On 03/25/2017 04:45 PM, Andres Freund wrote:
> On 2017-03-25 16:36:03 -0700, Joe Conway wrote:
>> On 03/25/2017 04:03 PM, Andres Freund wrote:
>> > Hi Joe,
>> > 
>> > 
>> > On 2017-03-25 22:11:37 +, Andres Freund wrote:
>> >> Faster expression evaluation and targetlist projection.
>> > 
>> > Apparently this broke the selinux integration somehow:
>> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2017-03-25%2022%3A45%3A01
>> > 
>> > Unfortunately the buildfarm appears to not display regression.diffs -
>> > making this a bit hard to debug remotely.  Any chance you could help me
>> > out with that? I'd rather not setup selinux here...
>> 
>> 
>> Sure. What exactly do you want?
> 
> I think, for starters, seeing regression.diffs from sepgsql would be
> useful.  Might already clear up what's the issue.

I went looking, and even after a forced run the diff file is gone --
does the buildfarm auto-cleanup or something?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-25 Thread Ants Aasma
On Sun, Mar 26, 2017 at 12:22 AM, Andres Freund  wrote:
>> At least with current gcc (6.3.1 on Fedora 25) at -O2,
>> what I see is multiple places jumping to the same indirect jump
>> instruction :-(.  It's not a total disaster: as best I can tell, all the
>> uses of EEO_JUMP remain distinct.  But gcc has chosen to implement about
>> 40 of the 71 uses of EEO_NEXT by jumping to the same couple of
>> instructions that increment the "op" register and then do an indirect
>> jump :-(.
>
> Yea, I see some of that too - "usually" when there's more than just the
> jump in common.  I think there's some gcc variables that influence this
> (min-crossjump-insns (5), max-goto-duplication-insns (8)).  Might be
> worthwhile experimenting with setting them locally via a pragma or such.
> I think Aants wanted to experiment with that, too.

I haven't had the time to research this properly, but initial tests
show that with GCC 6.2 adding

#pragma GCC optimize ("no-crossjumping")

fixes merging of the op tail jumps.

Some quick and dirty benchmarking suggests that the benefit for the
interpreter is about 15% (5% speedup on a workload that spends 1/3 in
ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local
vars before the indirect jump is somewhere between a tiny benefit and
no effect, certainly not worth introducing extra complexity. Clang 3.8
does the correct thing out of the box and is a couple of percent
faster than GCC with the pragma.

Regards,
Ants Aasma


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Faster expression evaluation and targetlist projection.

2017-03-25 Thread Andres Freund


On March 25, 2017 4:54:08 PM PDT, Joe Conway  wrote:
>On 03/25/2017 04:45 PM, Andres Freund wrote:
>> On 2017-03-25 16:36:03 -0700, Joe Conway wrote:
>>> On 03/25/2017 04:03 PM, Andres Freund wrote:
>>> > Hi Joe,
>>> > 
>>> > 
>>> > On 2017-03-25 22:11:37 +, Andres Freund wrote:
>>> >> Faster expression evaluation and targetlist projection.
>>> > 
>>> > Apparently this broke the selinux integration somehow:
>>> >
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2017-03-25%2022%3A45%3A01
>>> > 
>>> > Unfortunately the buildfarm appears to not display
>regression.diffs -
>>> > making this a bit hard to debug remotely.  Any chance you could
>help me
>>> > out with that? I'd rather not setup selinux here...
>>> 
>>> 
>>> Sure. What exactly do you want?
>> 
>> I think, for starters, seeing regression.diffs from sepgsql would be
>> useful.  Might already clear up what's the issue.
>
>I went looking, and even after a forced run the diff file is gone --
>does the buildfarm auto-cleanup or something?

Yes, it does. You'd probably have to run the tests manually.  Do you have 
selinux set up?  I assumed you would, given I seem to recall a talk of yours 
with references to it ;)
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-25 Thread Andres Freund


On March 25, 2017 4:56:11 PM PDT, Ants Aasma  wrote:
>On Sun, Mar 26, 2017 at 12:22 AM, Andres Freund 
>wrote:
>>> At least with current gcc (6.3.1 on Fedora 25) at -O2,
>>> what I see is multiple places jumping to the same indirect jump
>>> instruction :-(.  It's not a total disaster: as best I can tell, all
>the
>>> uses of EEO_JUMP remain distinct.  But gcc has chosen to implement
>about
>>> 40 of the 71 uses of EEO_NEXT by jumping to the same couple of
>>> instructions that increment the "op" register and then do an
>indirect
>>> jump :-(.
>>
>> Yea, I see some of that too - "usually" when there's more than just
>the
>> jump in common.  I think there's some gcc variables that influence
>this
>> (min-crossjump-insns (5), max-goto-duplication-insns (8)).  Might be
>> worthwhile experimenting with setting them locally via a pragma or
>such.
>> I think Aants wanted to experiment with that, too.
>
>I haven't had the time to research this properly, but initial tests
>show that with GCC 6.2 adding
>
>#pragma GCC optimize ("no-crossjumping")
>
>fixes merging of the op tail jumps.
>
>Some quick and dirty benchmarking suggests that the benefit for the
>interpreter is about 15% (5% speedup on a workload that spends 1/3 in
>ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local
>vars before the indirect jump is somewhere between a tiny benefit and
>no effect, certainly not worth introducing extra complexity. Clang 3.8
>does the correct thing out of the box and is a couple of percent
>faster than GCC with the pragma.

That's large enough to be worth doing (although I recall you seeing all jumps 
commonalized).  We should probably do this on a per-function basis, however 
(either using pragma push options, or function attributes).
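
For instance (an illustrative sketch with dummy functions; both forms are
GCC features):

/* 1. pragma push/pop scoped around the function: */
#pragma GCC push_options
#pragma GCC optimize ("no-crossjumping")
static int
interp_pragma(const int *ops, int n)
{
    int     acc = 0;

    for (int i = 0; i < n; i++)
        acc += ops[i];
    return acc;
}
#pragma GCC pop_options

/* 2. function attribute on the definition: */
__attribute__((optimize("no-crossjumping")))
static int
interp_attr(const int *ops, int n)
{
    int     acc = 0;

    for (int i = 0; i < n; i++)
        acc += ops[i];
    return acc;
}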

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Faster expression evaluation and targetlist projection.

2017-03-25 Thread Joe Conway
On 03/25/2017 05:21 PM, Andres Freund wrote:
> On March 25, 2017 4:54:08 PM PDT, Joe Conway  wrote:
>>On 03/25/2017 04:45 PM, Andres Freund wrote:
>>> I think, for starters, seeing regression.diffs from sepgsql would be
>>> useful.  Might already clear up what's the issue.
>>
>>I went looking, and even after a forced run the diff file is gone --
>>does the buildfarm auto-cleanup or something?
> 
> Yes, it does. You'd probably have to run the tests manually.  Do you
> have selinux setup?  I assumed you would, given I seen to recall a
> talk of yours with references to it ;)


Yeah, but those machines are fully MLS-constrained, and the sepgsql
regression test specifically needs "targeted" and some other particular
setup. So the easiest box to run this on is the buildfarm animal, but I
also want to do it in a way that doesn't mess up that environment.

I found "keep_error_builds" in build-farm.conf, tried setting it to 1,
and reran with force -- that seems to have worked, so diffs attached.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
*** 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/expected/ddl.out
2017-03-25 17:16:52.707097081 -0700
--- 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/results/ddl.out
 2017-03-25 17:23:17.811479306 -0700
***
*** 187,192 
--- 187,193 
  LOG:  SELinux: allowed { search } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=system_u:object_r:sepgsql_schema_t:s0 tclass=db_schema 
name="pg_catalog"
  LOG:  SELinux: allowed { search } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=system_u:object_r:sepgsql_schema_t:s0 tclass=db_schema 
name="pg_catalog"
  LOG:  SELinux: allowed { setattr } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column 
name="regtest_schema.regtest_table_4.y"
+ LOG:  SELinux: allowed { execute } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure 
name="pg_catalog.float8(integer)"
  DROP INDEX regtest_index_tbl4_y;
  LOG:  SELinux: allowed { remove_name } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=unconfined_u:object_r:sepgsql_schema_t:s0 tclass=db_schema 
name="regtest_schema"
  LOG:  SELinux: allowed { setattr } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_table 
name="regtest_schema.regtest_table_4"

==

*** 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/expected/alter.out
  2017-03-25 17:16:52.707097081 -0700
--- 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/results/alter.out
   2017-03-25 17:23:21.280482200 -0700
***
*** 170,177 
--- 170,180 
  LOG:  SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table 
regtest_table column a"
  LOG:  SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_table 
name="regtest_schema.regtest_table_3"
  LOG:  SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table 
regtest_table_3 column x"
+ LOG:  SELinux: allowed { execute } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure 
name="pg_catalog.int4eq(integer,integer)"
  ALTER TABLE regtest_table ADD CONSTRAINT test_ck CHECK (b like '%abc%') NOT 
VALID;  -- not supported
  ALTER TABLE regtest_table VALIDATE CONSTRAINT test_ck;  -- not supported
+ LOG:  SELinux: allowed { execute } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure 
name="pg_catalog.textlike(pg_catalog.text,pg_catalog.text)"
+ LOG:  SELinux: allowed { execute } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0 
tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure 
name="pg_catalog.textlike(pg_catalog.text,pg_catalog.text)"
  ALTER TABLE regtest_table DROP CONSTRAINT test_ck;  -- not supported
  CREATE TRIGGER regtest_test_trig BEFORE UPDATE ON regtest_table
  FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();

==

*** 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/expected/misc.out
   2017-03-25 17:16:52.7070970

Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-25 Thread Peter Geoghegan
On Thu, Mar 23, 2017 at 12:35 AM, Thomas Munro
 wrote:
> Thanks.  I really appreciate your patience with the resource
> management stuff I had failed to think through.

It's a surprisingly difficult problem, one that almost requires
prototyping just to explain. No need to apologize. This is the process
by which many hard problems end up being solved.

>> However, I notice that the place that this happens instead,
>> PathNameDelete(), does not repeat the fd.c step of doing a final
>> stat(), and using the stats for a pgstat_report_tempfile(). So, a new
>> pgstat_report_tempfile() call is simply missing. However, the more
>> fundamental issue in my mind is: How can you fix that? Where would it
>> go if you had it?
>
> You're right.  I may be missing something here (again), but it does
> seem straightforward to implement because we always delete each file
> that really exists exactly once (and sometimes we also try to delete
> files that don't exist due to imprecise meta-data, but that isn't
> harmful and we know when that turns out to be the case).

ISTM that your patch now shares a quality with parallel tuplesort: You
may now hold files open after an unlink() of the original link/path
that they were opened using. As Robert pointed out when discussing
parallel tuplesort earlier in the week, that comes with the risk,
however small, that the vFD cache will close() the file out from under
us during LRU maintenance, resulting in a subsequent open() (at the
tail-end of the vFD's lifetime) that fails unexpectedly. It's probably
fine to assume that we can sanely close() the file ourselves in fd.c
error paths despite a concurrent unlink(), since we never operate on
the link itself, and there probably isn't much pressure on each
backend's vFD cache. But, is that good enough? I can't say, though I
suspect that this particular risk is one that's best avoided.
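
To make that failure mode concrete, here is a minimal sketch using the
fd.c API of this era (the temp file path is invented for illustration):

/* Illustration only; assumes the usual postgres.h / storage/fd.h setup. */
static void
unlink_then_reopen_hazard(void)
{
    File    f = PathNameOpenFile("base/pgsql_tmp/pgsql_tmp123.0",
                                 O_RDWR | PG_BINARY, 0600);

    unlink("base/pgsql_tmp/pgsql_tmp123.0");    /* path gone, vFD still valid */

    /*
     * If LRU maintenance now close()s the kernel fd behind f, any later
     * call that goes through FileAccess() -- e.g. FileRead() -- will try
     * to re-open() the unlinked path and fail with ENOENT.
     */
}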

I haven't tested out how much of a problem this might be for your
patch, but I do know that resowner.c will call your shared mem segment
callback before closing any backend local vFDs, so I can't imagine how
it could be that this risk doesn't exist.

FWIW, I briefly entertained the idea that we could pin a vFD for just
a moment, ensuring that the real FD could not be close()'d out by
vFD cache LRU maintenance, which would fix this problem for parallel
tuplesort, I suppose. That may not be workable for PHJ, because PHJ
would probably need to hold on to such a "pin" for much longer, owing
to the lack of any explicit "handover" phase.
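
Purely hypothetically, such a pin API might look like this (these
functions do not exist in fd.c; the names are invented):

void FilePin(File file);    /* exempt this vFD's kernel fd from LRU closes */
void FileUnpin(File file);  /* make it reclaimable again */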

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-03-25 Thread Mithun Cy
On Sat, Mar 25, 2017 at 11:46 PM, Peter Eisentraut
 wrote:
> On 3/13/17 09:15, Mithun Cy wrote:
>> A. launch_autoprewarm_dump() RETURNS int4
>> This is a SQL callable function to launch the autoprewarm worker to
>> dump the buffer pool information at regular interval. In a server, we
>> can only run one autoprewarm worker so if a worker sees another
>> existing worker it will exit immediately. The return value is pid of
>> the worker which has been launched.
>
> Why do you need that?

To launch an autoprewarm worker, we have to preload the library, which
needs a server restart. If we want to start periodic dumping on an
already-running server so that it can automatically prewarm on its
next restart, this API can be used to launch the autoprewarm worker.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-25 Thread Mengxing Liu
Thanks for your time. I've updated the proposal according to your comments.
Could you review it once more to double-check that I've made everything clear?
You can read my replies for reference.

> -Original Messages-
> From: "Kevin Grittner" 
> Sent Time: 2017-03-25 04:53:36 (Saturday)
> To: "Mengxing Liu" 
> Cc: "pgsql-hackers@postgresql.org" 
> Subject: Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate 
> O(N^2) scaling from rw-conflict tracking in serializable transactions
> 
> On Wed, Mar 22, 2017 at 2:24 AM, Mengxing Liu
>  wrote:
> 
> > I've finished a draft proposal for "Eliminate O(N^2) scaling from
> > rw-conflict tracking in serializable transactions".
> 
> I've attached some comments to the document; let me know if they
> don't show up for you or you need clarification.
> 
> Overall, if the comments are addressed, I think it is great.
> 
> --
> Kevin Grittner


--
Mengxing Liu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] free space map and visibility map

2017-03-25 Thread Jeff Janes
On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada wrote:
> > On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas 
> wrote:
> > > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes 
> wrote:
> > >> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which
> then
> > >> can't leave the block as all visible or all frozen).  I think the
> issue is
> > >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this
> correctly,
> > >> that neither of those ever update the FSM, regardless of FPI?
> > >
> > > Yes, updates to the FSM are never logged.  Forcing replay of
> > > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
> > >
> >
> > I think I was missing something. I imagined your situation was that FPI
> > is replayed during crash recovery after the crashed server vacuums the
> > page and marked it as all-frozen. But this situation is also resolved
> > by that solution.
>
> # HEAP2_CLEAN is issued in lazy_vacuum_page
>
> It will work but I'm not sure it is right direction for
> HEAP2_FREEZE_PAGE to touch FSM.
>
> As Masahiko said, the situation must be created by HEAP2_VISIBLE
> without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
> think only the latter can happen. The comment in heap_xlog_clean
> below is generally right, but if a page filled with tuples becomes
> almost empty and freezable by this cleanup, a problematic
> situation like this occurs.
>

I now think this is not the cause of the problem I am seeing.  I made the
replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
did not fix it.  With frequent crashes, it still accumulated a lot of
frozen and empty (but full according to FSM) pages.  I also set up replica
streaming and turned off crashing on the master, and the FSM of the replica
stays accurate, so the WAL stream and replay logic is doing the right thing
on the replica.
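
For concreteness, the change I tested looks roughly like this, modeled
on the tail of heap_xlog_clean() (a sketch of the idea, not the exact
diff I ran):

/* In heap_xlog_freeze_page(), after the existing redo work: */
if (BufferIsValid(buffer))
{
    Size        freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
    RelFileNode rnode;
    BlockNumber blkno;

    XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
    UnlockReleaseBuffer(buffer);

    /* Unlike heap_xlog_clean(), update the FSM even for a full-page image. */
    XLogRecordPageWithFreeSpace(rnode, blkno, freespace);
}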

I now think the dirtied FSM pages are somehow not getting marked as dirty,
or are getting marked as dirty but somehow the checkpoint is skipping
them.  It looks like MarkBufferDirtyHint does do some operations unlocked,
which could explain a lost update, but it seems unlikely that that would
happen often enough to account for the number of lost updates I am seeing.


> > /*
> >  * Update the FSM as well.
> >  *
> >  * XXX: Don't do this if the page was restored from full page image. We
> >  * don't bother to update the FSM in that case, it doesn't need to be
> >  * totally accurate anyway.
> >  */
>

What does that save us?  If we restored from FPI, we already have the block
in memory (we don't need to see the old version, just the new one), so it
doesn't save us a random read IO.

Cheers,

Jeff


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-25 Thread Thomas Munro
On Sun, Mar 26, 2017 at 1:53 PM, Peter Geoghegan  wrote:
> ISTM that your patch now shares a quality with parallel tuplesort: You
> may now hold files open after an unlink() of the original link/path
> that they were opened using. As Robert pointed out when discussing
> parallel tuplesort earlier in the week, that comes with the risk,
> however small, that the vFD cache will close() the file out from under
> us during LRU maintenance, resulting in a subsequent open() (at the
> tail-end of the vFD's lifetime) that fails unexpectedly. It's probably
> fine to assume that we can sanely close() the file ourselves in fd.c
> error paths despite a concurrent unlink(), since we never operate on
> the link itself, and there probably isn't much pressure on each
> backend's vFD cache. But, is that good enough? I can't say, though I
> suspect that this particular risk is one that's best avoided.
>
> I haven't tested out how much of a problem this might be for your
> patch, but I do know that resowner.c will call your shared mem segment
> callback before closing any backend local vFDs, so I can't imagine how
> it could be that this risk doesn't exist.

I wouldn't have expected anything like that to be a problem, because
FileClose() doesn't call FileAccess().  So IIUC it wouldn't ever try
to reopen a kernel fd just to close it.
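
(Paraphrasing the relevant part of fd.c from memory -- worth
double-checking against the real code:)

void
FileClose(File file)
{
    Vfd        *vfdP = &VfdCache[file];

    if (!FileIsNotOpen(file))
    {
        /*
         * Remove the file from the LRU ring and close the kernel fd
         * directly -- there is no FileAccess() here, so a vFD whose fd
         * was already LRU-closed is never re-open()ed just to be closed.
         */
        Delete(file);
        if (close(vfdP->fd))
            elog(ERROR, "could not close file \"%s\": %m", vfdP->fileName);
        --nfile;
        vfdP->fd = VFD_CLOSED;
    }
    /* ... then unlink if it's a temp file, and free the vFD slot ... */
}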

But... what you said above must be a problem for Windows.  I believe
it doesn't allow files to be unlinked if they are open, and I see that
DSM segments are cleaned up in resowner's phase ==
RESOURCE_RELEASE_BEFORE_LOCKS and files are closed in phase ==
RESOURCE_RELEASE_AFTER_LOCKS.

Hmm.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-25 Thread Amit Kapila
On Sat, Mar 25, 2017 at 11:24 PM, Pavan Deolasee
 wrote:
>
> On Sat, 25 Mar 2017 at 11:03 PM, Peter Geoghegan  wrote:
>>
>> On Sat, Mar 25, 2017 at 12:54 AM, Amit Kapila 
>> wrote:
>> > I am not sure how do you want to binary compare two datums, if you are
>> > thinking datumIsEqual(), that won't work.  I think you need to use
>> > datatype specific compare function something like what we do in
>> > _bt_compare().
>>
>> How will that interact with types like numeric, that have display
>> scale or similar?
>>
>  I wonder why Amit thinks that datumIsEqual won't work once we convert the
> heap values to index tuple and then fetch using index_get_attr. After all
> that's how the current index tuple was constructed when it was inserted.

I think for toasted values you need to detoast before comparison, and
it seems datumIsEqual won't do that job.  Am I missing something that
makes you think datumIsEqual will work in this context?
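
Something along these lines, as an illustrative sketch (the helper name
is invented):

/* Hypothetical helper: detoast varlena values before a binary compare. */
static bool
heap_attr_equals_binary(Datum a, Datum b, bool typByVal, int typLen)
{
    if (typLen == -1)           /* varlena: may be toasted or compressed */
    {
        a = PointerGetDatum(PG_DETOAST_DATUM(a));
        b = PointerGetDatum(PG_DETOAST_DATUM(b));
        /* a real version may also need to normalize short vs. long
         * varlena headers before a byte-wise comparison */
    }
    return datumIsEqual(a, b, typByVal, typLen);
}

Even then, as Peter notes, binary equality is stricter than operator
equality for types such as numeric.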

> In
> fact, we must not rely on _bt_compare because that might return "false
> positive" even for two different heap binary values  (I think).

I am not saying we should rely on _bt_compare; what I was trying to hint
at is that I think we might need to use some column-type-specific
information for comparison.  I am not sure at this stage what the best
way is to deal with this problem without incurring nontrivial cost.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Faster expression evaluation and targetlist projection.

2017-03-25 Thread Andres Freund
Hi,

On 2017-03-25 17:33:33 -0700, Joe Conway wrote:
> On 03/25/2017 05:21 PM, Andres Freund wrote:
> > On March 25, 2017 4:54:08 PM PDT, Joe Conway  wrote:
> >>On 03/25/2017 04:45 PM, Andres Freund wrote:
> >>> I think, for starters, seeing regression.diffs from sepgsql would be
> >>> useful.  Might already clear up what's the issue.
> >>
> >>I went looking, and even after a forced run the diff file is gone --
> >>does the buildfarm auto-cleanup or something?
> > 
> > Yes, it does. You'd probably have to run the tests manually.  Do you
> > have selinux set up?  I assumed you would, given I seem to recall a
> > talk of yours with references to it ;)

> Yeah, but those machines are fully MLS-constrained, and the sepgsql
> regression test specifically needs "targeted" and some other particular
> setup. So the easiest box to run this on is the buildfarm animal, but I
> also want to do it in a way that doesn't mess up that environment.

Ah.


> I found "keep_error_builds" in build-farm.conf and tried setting to 1
> and rerunning in force -- that seems to have worked, so diffs attached.

Thanks. Those all look like "valid" differences, i.e. checks that
previously hadn't been executed because the functions were never invoked,
as there's no data in those tables.

I blindly tried to fix these, let's hope that works.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-25 Thread Tom Lane
Andres Freund  writes:
> On March 25, 2017 4:56:11 PM PDT, Ants Aasma  wrote:
>> I haven't had the time to research this properly, but initial tests
>> show that with GCC 6.2 adding
>> 
>> #pragma GCC optimize ("no-crossjumping")
>> 
>> fixes merging of the op tail jumps.
>> 
>> Some quick and dirty benchmarking suggests that the benefit for the
>> interpreter is about 15% (5% speedup on a workload that spends 1/3 in
>> ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local
>> vars before the indirect jump is somewhere between a tiny benefit and
>> no effect, certainly not worth introducing extra complexity. Clang 3.8
>> does the correct thing out of the box and is a couple of percent
>> faster than GCC with the pragma.

> That's large enough to be worth doing (although I recall you seeing all jumps 
> commonalized).  We should probably do this on a per function basis however 
> (either using pragma push option, or function attributes).

Seems like it would be fine to do it on a per-file basis.  If you're
worried about pessimizing the out-of-line subroutines, we could move
those to a different file --- it's pretty questionable that they're
in execExprInterp.c in the first place, considering they're meant to be
used by more than just that execution method.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-25 Thread Andres Freund
On 2017-03-25 23:51:45 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On March 25, 2017 4:56:11 PM PDT, Ants Aasma  wrote:
> >> I haven't had the time to research this properly, but initial tests
> >> show that with GCC 6.2 adding
> >> 
> >> #pragma GCC optimize ("no-crossjumping")
> >> 
> >> fixes merging of the op tail jumps.
> >> 
> >> Some quick and dirty benchmarking suggests that the benefit for the
> >> interpreter is about 15% (5% speedup on a workload that spends 1/3 in
> >> ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local
> >> vars before the indirect jump is somewhere between a tiny benefit and
> >> no effect, certainly not worth introducing extra complexity. Clang 3.8
> >> does the correct thing out of the box and is a couple of percent
> >> faster than GCC with the pragma.
> 
> > That's large enough to be worth doing (although I recall you seeing all 
> > jumps commonalized).  We should probably do this on a per function basis 
> > however (either using pragma push option, or function attributes).
> 
> Seems like it would be fine to do it on a per-file basis.

I personally find per-function annotation ala
__attribute__((optimize("no-crossjumping")))
cleaner anyway.  I tested that, and it seems to work.

Obviously we'd have to hide that behind a configure test.  Could also do
tests based on __GNUC__ / __GNUC_MINOR__, but that seems uglier.
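
For instance, something like this (the configure-probe symbol is
invented for illustration):

/* c.h or similar, guarded by a configure probe for the attribute: */
#ifdef HAVE_FUNCATTR_OPTIMIZE_NO_CROSSJUMPING
#define pg_attribute_no_crossjumping \
    __attribute__((optimize("no-crossjumping")))
#else
#define pg_attribute_no_crossjumping
#endif

/* execExprInterp.c: attach it to the interpreter's forward declaration */
static Datum ExecInterpExpr(ExprState *state, ExprContext *econtext,
                            bool *isnull) pg_attribute_no_crossjumping;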


> If you're
> worried about pessimizing the out-of-line subroutines, we could move
> those to a different file --- it's pretty questionable that they're
> in execExprInterp.c in the first place, considering they're meant to be
> used by more than just that execution method.

I indeed am, but having the code in the same file has a minor advantage:
It allows the compiler to partially inline them, if it feels like it
(e.g. moving null checks inline).

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Faster expression evaluation and targetlist projection.

2017-03-25 Thread Andres Freund
On 2017-03-25 20:40:23 -0700, Andres Freund wrote:
> I blindly tried to fix these, let's hope that works.

In a second attempt (yes, reading diffs correctly is helpful), this
resolved the selinux issue. Yeah.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Remove !isTemp buffile.c code

2017-03-25 Thread Andres Freund
Hi,

A nearby thread [1] does some work on buffile.c.  Amongst other things it
renames isTemp to something else (not sure yet why - the new name is
about as apt as before), but that's not really what I want to talk about
here.

buffile.c currently has a fair amount of code dependent on
BufFile->isTemp - which afaics hasn't ever been used.  I'd rather remove
that code before potentially (further) breaking the unused code.

Any arguments against?

Greetings,

Andres Freund

[1] 
https://www.postgresql.org/message-id/CAEepm=0vp18JyWa5o_=ehK1QTZrymXk3Q0NPJoSqb=tdqti...@mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove !isTemp buffile.c code

2017-03-25 Thread Peter Geoghegan
On Sat, Mar 25, 2017 at 10:25 PM, Andres Freund  wrote:
> buffile.c currently has a fair amount of code dependent on
> BufFile->isTemp - which afaics hasn't ever been used.  I'd rather remove
> that code before potentially (further) breaking the unused code.

I think this is a good idea, but then I suggested it originally.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-25 Thread Peter Geoghegan
On Sat, Mar 25, 2017 at 7:56 PM, Thomas Munro
 wrote:
> On Sun, Mar 26, 2017 at 1:53 PM, Peter Geoghegan  wrote:
>> ISTM that your patch now shares a quality with parallel tuplesort: You
>> may now hold files open after an unlink() of the original link/path
>> that they were opened using. As Robert pointed out when discussing
>> parallel tuplesort earlier in the week, that comes with the risk,
>> however small, that the vFD cache will close() the file out from under
>> us during LRU maintenance, resulting in a subsequent open() (at the
>> tail-end of the vFD's lifetime) that fails unexpectedly. It's probably
>> fine to assume that we can sanely close() the file ourselves in fd.c
>> error paths despite a concurrent unlink(), since we never operate on
>> the link itself, and there probably isn't much pressure on each
>> backend's vFD cache. But, is that good enough? I can't say, though I
>> suspect that this particular risk is one that's best avoided.
>>
>> I haven't tested out how much of a problem this might be for your
>> patch, but I do know that resowner.c will call your shared mem segment
>> callback before closing any backend local vFDs, so I can't imagine how
>> it could be that this risk doesn't exist.
>
> I wouldn't have expected anything like that to be a problem, because
> FileClose() doesn't call FileAccess().  So IIUC it wouldn't ever try
> to reopen a kernel fd just to close it.

The concern is that something somewhere does. For example, mdread()
calls FileRead(), which calls FileAccess(), ultimately because of some
obscure catalog access. It's very hard to reason about things like
that.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-25 Thread Mithun Cy
Thanks, Amit, for the review.
On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila  wrote:
>
> I think one-dimensional patch has fewer places to touch, so that looks
> better to me.  However, I think there is still hard coding and
> assumptions in code which we should try to improve.

Great! I will continue with the one-dimensional spares improvement.

>
> 1.
> + /*
> + * The first 4 bucket belongs to first splitpoint group 0. And since group
> + * 0 have 4 = 2^2 buckets, we double them in group 1. So total buckets
> + * after group 1 is 8 = 2^3. Then again at group 2 we add another 2^3
> + * buckets to double the total number of buckets, which will become 2^4. I
> + * think by this time we can see a pattern which say if num_bucket > 4
> + * splitpoint group = log2(num_bucket) - 2
> + */
>
> + if (num_bucket <= 4)
> + splitpoint_group = 0; /* converted to base 0. */
> + else
> + splitpoint_group = _hash_log2(num_bucket) - 2;
>
> This patch defines that split point group zero has four buckets, and
> the above calculation is based on that.  I feel you can define it like
> #define Buckets_First_Split_Group 4 and then use it in the above code.
> Also, in the else part the number 2 looks awkward; can we define it as
> log2_buckets_first_group = _hash_log2(Buckets_First_Split_Group) or
> something like that?  I think that way the code will look neat.  I don't
> like the way the above comment is worded, even though it is helpful for
> understanding the calculation.  If you want, you can add such a
> comment in the file header; here it looks out of place.

I have removed the comments and defined a new macro that maps a
bucket to its splitpoint group:

#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) \
((num_bucket <= Buckets_First_Split_Group) ? 0 : \
(_hash_log2(num_bucket) - 2))

I could not find a better way to explain the "minus 2" than: "The
splitpoint group of a given bucket can be taken as
(_hash_log2(bucket) - 2). We subtract 2 because group x brings the
total number of buckets to 2 ^ (x + 2)."  I have now merged that with
the existing comments, which I think makes it a little better.
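
A quick worked example of the mapping (recall that _hash_log2() returns
the ceiling of log2):

num_bucket <= 4  ->  group 0
num_bucket = 8   ->  _hash_log2(8)  - 2 = 3 - 2 = group 1  (total 2^3 buckets)
num_bucket = 16  ->  _hash_log2(16) - 2 = 4 - 2 = group 2  (total 2^4 buckets)
num_bucket = 32  ->  _hash_log2(32) - 2 = 5 - 2 = group 3  (total 2^5 buckets)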

Adding comments about the spares array in hashutils.c's file header did
not seem right to me; I think the README already has some details about it.

> 2.
> +power-of-2 groups, called "split points" in the code.  That means on every 
> new
> +split points we double the existing number of buckets.
>
> split points/split point

Fixed.

> 3.
> + * which phase of allocation the bucket_num belogs to with in the group.
>
> /belogs/belongs

Fixed

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


expand_hashbucket_efficiently_07.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers