Re: [HACKERS] Cluster name in ps output

2014-06-26 Thread Abhijit Menon-Sen
At 2014-06-25 16:13:19 +0900, masao.fu...@gmail.com wrote:
>
> Categorizing this parameter to CONN_AUTH_SETTINGS looks strange to me

Oh yes. Sorry, I meant to respond to this point in my original review,
but forgot. Yes, CONN_AUTH_SETTINGS is just weird. But I couldn't find
an obviously better answer either.

LOGGING_WHAT would work with log_line_prefix support, but I don't think
there was as much support for that version of this patch. I personally
don't have a strong opinion about whether it's worth adding an escape.

> STATS_COLLECTOR also looks a bit strange not only for cluster_name but
> also update_process_title, though...

True. Is UNGROUPED the only answer?

> monitoring.sgml explains PS display. Isn't it better to update
> monitoring.sgml so that it refers to cluster_name?

Good point.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: show only failed queries

2014-06-26 Thread Pavel Stehule
2014-06-26 8:22 GMT+02:00 Samrat Revagade :

> > I am sending an updated patch - the buggy statement is printed via the
> > more logical psql_error function instead of printf
>
> Thank you for updating the patch, I really appreciate your efforts.
>
> Now, everything is good from my side.
> * it applies cleanly to the current git master
> * it includes the necessary docs
> * I think it is a very good and necessary feature.
>
> If Kumar Rajeev Rastogi does not have any extra comments, then I think the
> patch is ready for committer.
>

Thank you very much

Regards

Pavel
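For readers unfamiliar with the patch, a rough sketch of the behaviour under review (the session below is illustrative, not a verbatim transcript; the variable name is taken from the thread's patch):

```sql
-- In psql:  \set ECHO errors
-- With this setting, only statements that fail are echoed back,
-- printed through psql_error rather than printf, e.g.:
SELECT 1/0;
-- ERROR:  division by zero
```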


Re: [HACKERS] Cluster name in ps output

2014-06-26 Thread Fujii Masao
On Thu, Jun 26, 2014 at 4:05 PM, Abhijit Menon-Sen  wrote:
> At 2014-06-25 16:13:19 +0900, masao.fu...@gmail.com wrote:
>>
>> Categorizing this parameter to CONN_AUTH_SETTINGS looks strange to me
>
> Oh yes. Sorry, I meant to respond to this point in my original review,
> but forgot. Yes, CONN_AUTH_SETTINGS is just weird. But I couldn't find
> an obviously better answer either.
>
> LOGGING_WHAT would work with log_line_prefix support, but I don't think
> there was as much support for that version of this patch. I personally
> don't have a strong opinion about whether it's worth adding an escape.
>
>> STATS_COLLECTOR also looks a bit strange not only for cluster_name but
>> also update_process_title, though...
>
> True. Is UNGROUPED the only answer?

Or create new group like REPORTING_WHAT?

Regards,

-- 
Fujii Masao




[HACKERS] Re: [COMMITTERS] pgsql: Do all-visible handling in lazy_vacuum_page() outside its critic

2014-06-26 Thread Heikki Linnakangas

On 06/24/2014 01:27 AM, Andres Freund wrote:

On 2014-06-23 13:00:11 -0700, Jeff Davis wrote:

On Fri, 2014-06-20 at 09:10 +, Andres Freund wrote:

To fix, simply move all the all-visible handling outside the critical
section. Doing so means that the PD_ALL_VISIBLE on the page won't be
included in the full page image of the HEAP2_CLEAN record anymore. But
that's fine, the flag will be set by the HEAP2_VISIBLE logged later.


Trying to follow along. I read the discussion about bug #10533.

Before 2a8e1ac5, a block of actions -- visibility test, VM test, set VM
bit, set all-visible -- were inside the critical section and after the
WAL insert.

2a8e1ac5 left the block of actions in the critical section but moved
them before the WAL insert. The commit message seems to indicate that
there's a real problem setting the all-visible flag after the WAL
insert, but I couldn't identify a serious problem there.


Yes, that was confusing. Imo it made the state worse, not better. I'd
asked for clarification somewhere... Heikki?


Hmm. I don't see any live bug before 2a8e1ac5 anymore either. I
think I missed the fact that replaying the XLOG_HEAP2_VISIBLE record
always sets the bit in the heap page.



Does your change still make
sense to you and do you see problem with the current state (as of ecac0e2b)?


Hmm, in the current state, it's again possible that the full-page image 
doesn't contain the all-visible flag, even though the page in the buffer 
does. I guess that's OK, because replaying XLOG_HEAP2_VISIBLE always 
sets the flag. My page-comparison tool will complain, so it would be 
nice to not do that, but it's a false alarm.


Or we could call heap_page_is_all_visible() before entering the critical 
section, if we passed heap_page_is_all_visible() the list of dead tuples 
we're removing so that it could ignore them.



I'm a little concerned that we're approaching the WAL rules in an ad-hoc
manner. I don't see an actual problem right now, but we've been fixing
problems with PD_ALL_VISIBLE since the VM was made crash safe almost
exactly three years ago (503c7305). Do we finally feel confident in the
design and its implications?


I'm not particularly happy about all this either, but this section of
code is actually much newer. It was added in fdf9e21196a6/9.3.


I'm not happy with the way we deviate from the normal WAL rules in
the visibility map either. In the long run, we may be better off to just
take the hit and WAL-log the VM bits like any other update. Basically, 
always behave as if checksums were enabled. But for now we just have to 
fix bugs as they're found.


- Heikki




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-26 Thread Abhijit Menon-Sen
At 2014-06-25 10:36:07 -0400, sfr...@snowman.net wrote:
>
> To me, that's ridiculous on the face of it.  Other databases have had
> this kind of capability as a matter of course for decades- we are far
> behind in this area.

OK, I've done a bit more thinking, but I'm not convinced that it's so
cut-and-dried (as "ridiculous on the face of it").

Ian looked into the auditing capabilities of various (SQL and otherwise)
systems some time ago. Informix handles its auditing configuration
entirely outside the database. DB2 uses a mixture of in-database and
external configuration. Oracle stores everything in the database.

We're certainly not going to invent a mechanism other than GUC settings
or catalog tables for auditing information (à la Informix). What I think
you're saying is that without storing stuff in the catalog tables, we're
not going to be able to offer auditing capabilities worth having. I can
appreciate that, but I'm not sure I agree.

You're right, it isn't possible to sanely do something like "AUDIT
SELECT ON TABLE t1 FOR role1" without storing configuration in the
catalog tables. You're right, it would be nice to have that level
of control.

The part that I don't agree about is that using a GUC setting and
a reloption won't give us anything useful. We can use that to do
global, per-database, per-user, and per-object auditing with just
mechanisms that are familiar to Postgres users already (ALTER … SET).
Some other messages in the thread have suggested a similar mechanism,
which implies that at least some people wouldn't be unhappy with it.

No, we couldn't do combinations of TABLE/ROLE, but there's also no
terrible conflict with doing that via some new "AUDIT foo ON bar"
syntax later.
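As a concrete sketch of what the GUC/reloption approach could look like (the parameter and reloption names below are illustrative, not a settled interface):

```sql
-- Global default, via ALTER SYSTEM:
ALTER SYSTEM SET pgaudit.log = 'ddl';
-- Per-database:
ALTER DATABASE appdb SET pgaudit.log = 'all';
-- Per-user:
ALTER ROLE auditor SET pgaudit.log = 'read, write';
-- Per-object, via a hypothetical reloption:
ALTER TABLE t1 SET (audit = on);
```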

As you point out, we're decades behind, and I seriously doubt if
we're going to jump forward a few decades in time for 9.5.

What do others think of the tradeoff?

-- Abhijit




Re: [HACKERS] ALTER SYSTEM RESET?

2014-06-26 Thread Christoph Berg
Re: Amit Kapila 2014-06-26 

> On Wed, Jun 25, 2014 at 9:56 PM, Vik Fearing  wrote:
> > On 06/25/2014 03:04 PM, Amit Kapila wrote:
> > > Currently you can achieve that by
> > > "ALTER SYSTEM RESET guc = Default;".
> > > However it will be good to have support for RESET as well.  I think it
> > > should not be too complicated to implement that syntax, I personally
> > > don't have bandwidth to it immediately, but I would like to take care
> > > of it unless you or someone wants to do it by the time I get some
> > > bandwidth.
> >
> > Would something like this suffice?
> 
> I think it will make sense if we support RESET ALL as well similar
> to Alter Database .. RESET ALL syntax.  Do you see any reason
> why we shouldn't support RESET ALL syntax for Alter System?

RESET ALL would definitely be useful to have as well.
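To make the proposal concrete, a sketch of the syntax under discussion (only the SET ... DEFAULT form works today; the RESET forms are the proposed additions):

```sql
-- Today's workaround: setting DEFAULT removes the entry from postgresql.auto.conf
ALTER SYSTEM SET work_mem = DEFAULT;
-- Proposed shorthand for the same thing:
ALTER SYSTEM RESET work_mem;
-- Proposed, mirroring ALTER DATABASE ... RESET ALL: clear postgresql.auto.conf
ALTER SYSTEM RESET ALL;
```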

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-26 Thread Simon Riggs
On 25 June 2014 15:40, Stephen Frost  wrote:
> * Abhijit Menon-Sen (a...@2ndquadrant.com) wrote:
>> At 2014-06-25 00:10:55 -0400, sfr...@snowman.net wrote:
>> > For my part, the next steps might be to consider how you'd migrate
>> > what you've provided for configuration into catalog tables
>>
>> I must confess that I do not understand what needs to be migrated into
>> the catalog tables, or why. Of course, pgaudit.log must be renamed, but
>> why can't it continue to be a GUC setting? (Fujii-san suggested that it
>> be integrated with log_statement. I'm not sure what I think of that, but
>> it's certainly one possibility.)
>
> What I was getting at are things like "what tables are to be audited,
> and for what users?".  I thought pg_audit had this capability today
> (isn't that part of the request for the rlsextension field..?) and I'd
> want to see something like
>
> AUDIT SELECT ON table1 FOR role1;
>
>> > and how we'd address the concerns raised elsewhere regarding catalog
>> > access in cases where we're not in a transaction
>>
>> …by not putting things into the catalog?
>
> I just don't see that as viable.

"Which tables are audited" would be available via the reloptions
field. This is basically the difference between information being
stored within the reloptions field (think of it as an hstore column)
and being stored in a separate audit-specific column. I see no
difference between them, we just need to provide a view that extracts
the useful information. It is very important we add new things as
reloptions so that we don't have to continually add new code elsewhere
to handle all the new options we add.

I don't agree with adding "AUDIT ..." syntax to the parser and having
top-level SQL commands. That just bloats the parser for no particular
gain. With reloptions we already support all the required DDL. No
further work required.
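A rough illustration of the kind of view described above, assuming a hypothetical audit reloption (names invented for the example):

```sql
-- Extract the tables whose reloptions mark them as audited.
CREATE VIEW audited_tables AS
SELECT c.oid::regclass AS table_name
FROM pg_class c
WHERE c.reloptions @> ARRAY['audit=on'];
```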

We want an audit feature in core 9.5, but I don't see adding SQL syntax
and specific catalog columns as giving us anything at all. It will
still be an audit feature without them.

Many, many users do not want audit. Many do. So the idea is to allow
audit to be added in a way that doesn't affect everybody else.

So let's commit pgaudit now and spend time on genuine enhancements to
the functionality, not just rewriting the same thing into a different
form that adds no value.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-26 Thread Simon Riggs
On 23 June 2014 21:51, Stephen Frost  wrote:

>  I also wouldn't want this to become
> an excuse to not improve our current logging infrastructure, which is
> how we got to the place we are wrt partitioning, imv.

Not the case at all.

I wrote the existing partitioning code in 6 weeks in 2005, with review
and rewrite by Tom. It was fairly obvious after a year or two that
there were major shortcomings that needed to be addressed.

Partitioning has been held back because nobody who genuinely
understands both the problem and reasonable solutions has spent
time on it in the last 9 years. We've had 2.5 attempts over that
time, but the first two were completely wasted because they weren't
discussed on list first, and when they were, the ideas in those patches
were shot down in seconds, regrettably.

Nothing about this patch bears any resemblance to that situation.

I personally never had the time or money to fix that situation until
now, so I'm hoping and expecting that that will change in 9.5, as
discussed in developer meeting.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] postgresql.auto.conf and reload

2014-06-26 Thread Christoph Berg
Re: Amit Kapila 2014-06-26 

> On Wed, Jun 25, 2014 at 7:52 PM, Christoph Berg  wrote:
> > Re: Amit Kapila 2014-06-25 <
> caa4ek1+f9ztogvvw-wyj2+vt0k8_jxtziqhp8ivb7wdo1w1...@mail.gmail.com>
> >
> > > I think maintaining values both in postgresql.conf and by Alter System
> > > is not advisable.
> >
> > Possibly, but then the system should be warning about all options, not
> > just the restart-only ones. And it should warn at startup, not at
> > reload time.
> 
> How about adding a note in Alter System so that users are aware of
> such behaviour and can ensure that they don't have duplicate entries?

If the behavior isn't going to change, that issue needs to be
documented, sure.

> Clearly such warnings indicate that there are conflicting settings, so
> user can take appropriate action to avoid it.

I don't think conflicting settings in postgresql.auto.conf are a user
error, or misconfiguration. They are just normal use of ALTER SYSTEM.
Of course the user might want to eventually consolidate their config,
but we shouldn't treat the situation as bogus.

Frankly, what bugs me most about this is that the warnings occur only
at reload time, not at startup. If the server thinks something is
wrong with my config, it should tell me right away.
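For readers following along, the situation being described can be reproduced along these lines (values illustrative):

```sql
-- postgresql.conf already contains:  shared_buffers = '128MB'
ALTER SYSTEM SET shared_buffers = '256MB';
-- This writes the setting to postgresql.auto.conf, so the same parameter
-- now appears in both files; per the thread, the server warns about the
-- duplicated restart-only setting at reload time, but says nothing at startup.
```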

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




[HACKERS] Changes in amcanreturn() interface to support multicolumn indexes

2014-06-26 Thread Anastasia Lubennikova
Hi, hackers.
I am working on the GSoC project Support_for_Index-only_scans_for_GIST_GSoC_2014

There is a question of how to support multicolumn index-only scans for GiST,
and I need help with this problem.

bool amcanreturn() is the function that checks whether an index supports
index-only scans. Currently it's defined for:
- B-tree. It always returns true, so there are no questions there.
- SP-GiST. It doesn't support multicolumn indexes, so there are no problems
in spgcanreturn either.
- GiST. In the first version it works only for single-column indexes.
gistcanreturn() checks whether a fetch() method is defined for the column's
data type.

For multicolumn indexes, gistcanreturn() could return true if fetch is
implemented for all indexed columns and false otherwise, but that's not a
very good approach.

I think it requires extending the interface to add per-column checking,
but I can't understand what kind of changes are required
and how they would affect the existing amcanreturn interface.
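To make the problem concrete, a sketch of the multicolumn case (schema and operator classes are illustrative; assume fetch() exists for the first column's type but not the second):

```sql
CREATE INDEX ON tbl USING gist (col_fetchable, col_lossy);

-- This query touches only col_fetchable, so an index-only scan would be
-- safe for it alone. But with a single whole-index boolean, gistcanreturn()
-- must answer false because col_lossy cannot be reconstructed, so the
-- planner never considers the index-only path.
SELECT col_fetchable FROM tbl;
```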


-- 
Regards,
Lubennikova Anastasia


Re: [HACKERS] Allowing join removals for more join types

2014-06-26 Thread David Rowley
On Sun, Jun 22, 2014 at 11:51 PM, Simon Riggs  wrote:

> On 17 June 2014 11:04, David Rowley  wrote:
> > On Wed, Jun 4, 2014 at 12:50 AM, Noah Misch  wrote:
> >>
> >> As a point of procedure, I recommend separating the semijoin support
> into
> >> its
> >> own patch.  Your patch is already not small; delaying non-essential
> parts
> >> will
> >> make the essential parts more accessible to reviewers.
> >>
> >
> > In the attached patch I've removed all the SEMI and ANTI join removal
> code
> > and left only support for LEFT JOIN removal of sub-queries that can be
> > proved to be unique on the join condition by looking at the GROUP BY and
> > DISTINCT clause.
>
> Good advice, we can come back for the others later.
>
>
> > Example:
> >
> > SELECT t1.* FROM t1 LEFT OUTER JOIN (SELECT value,COUNT(*) FROM t2 GROUP
> BY
> > value) t2 ON t1.id = t2.value;
>
> Looks good on initial look.
>
> This gets optimized...
>
> EXPLAIN (COSTS OFF)
> SELECT a.id FROM a
> LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id
> GROUP BY b.id) b ON a.id = b.id AND b.dummy = 1;
>
> does it work with transitive closure like this..
>
> EXPLAIN (COSTS OFF)
> SELECT a.id FROM a
> LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id
> GROUP BY c.id) b ON a.id = b.id AND b.dummy = 1;
>
> i.e. c.id is not in the join, but we know from subselect that c.id =
> b.id and b.id is in the join
>
>
Well, there's no code that looks at equivalence of the columns in the
query, but I'm not quite sure whether there would have to be, as I can't
quite think of a valid way to write that query that would prevent it
from removing the join.

The example query will fail with: ERROR:  column "b.id" must appear in the
GROUP BY clause or be used in an aggregate function

And if we rewrite it to use c.id in the target list

EXPLAIN (COSTS OFF)
SELECT a.id FROM a
LEFT JOIN (SELECT c.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id
GROUP BY c.id) b ON a.id = b.id AND b.dummy = 1;

With this one c.id becomes b.id, since we've given the subquery the alias
'b', so I don't think there's a case here to optimise anything else.
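A minimal, self-contained way to see the optimization from the first example in this thread (table definitions invented for illustration):

```sql
CREATE TABLE t1 (id int PRIMARY KEY);
CREATE TABLE t2 (value int);

EXPLAIN (COSTS OFF)
SELECT t1.* FROM t1
LEFT OUTER JOIN (SELECT value, COUNT(*) FROM t2 GROUP BY value) t2
  ON t1.id = t2.value;
-- With the patch, the plan should degenerate to a plain scan of t1:
-- the GROUP BY proves the subquery unique on the join key, and no columns
-- of t2 are needed, so the left join can be removed.
```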

Regards

David Rowley


[HACKERS] Re: [COMMITTERS] pgsql: Do all-visible handling in lazy_vacuum_page() outside its critic

2014-06-26 Thread Jeff Davis
On Tue, 2014-06-24 at 00:27 +0200, Andres Freund wrote:
> > Can you explain? It looks like the master will still see the bit right
> > away after the HEAP2_CLEAN record, but there is nothing in the
> > HEAP2_CLEAN record to tell the standby to set all-visible. The standby
> > has to wait until the HEAP2_VISIBLE record comes along.
> 
> Nobody can see the page in that state on the master - it'll be locked
> exclusively. But that's not really the point.
> 
> What could happen with 2a8e1ac5's coding is that log_heap_clean()'s
> XLogInsert() took a full page image which then contained the all visible
> bit because the PageSetAllVisible() was done *before* it. The
> standby would have replayed the HEAP2_CLEAN including the FPI and then
> unlocked the page. At that point the heap page's all visible will be set,
> but the vm bit wouldn't.
> With the new coding (ecac0e2b9) the HEAP2_CLEAN cannot log the all
> visible bit because it's not set yet. Only the HEAP2_VISIBLE will set
> it on both the heap and the vm. Makes sense?

Yes, it makes much more sense now, thank you.

> > With checksums, Simon and I tried to create some new abstractions, like
> > MarkBufferDirtyHint, and document them in transam/README. I hope that
> > gives us some greater confidence in the checksums code[1], because we
> > can see a lot of the tricky aspects in one place, along with the rules
> > that callers should follow.
> 
> Hm. I don't think the current callsites dealing with this are easily
> amenable to something like that.

I haven't thought through the details, but I think that setting
PD_ALL_VISIBLE is too tricky of an operation to have different callers
doing different things.

It seems like there should be a way for all callers (there are only a
few) to set PD_ALL_VISIBLE and the VM bit the same way.

> > A quick look at the comment above the function tells us that
> > MarkBufferDirtyHint won't work for the all-visible bit, but perhaps an
> > adapted version could work. If so, we could separate the all-visible bit
> > from the HEAP2_CLEAN action, and it would look a lot more like setting a
> > tuple hint.
> 
> Huh? HEAP2_CLEAN doesn't set all visible? The only reason we're now
> doing a HEAP2_VISIBLE there is that removing the tuples in the _CLEAN
> step increases the chance of the page now being all visible? They're
> separate steps.

When setting all-visible was part of the same critical section doing the
logging for HEAP2_CLEAN, it seemed to be a part of that action. But
you're right: now it's more separate.

Regards,
Jeff Davis






Re: [HACKERS] idle_in_transaction_timeout

2014-06-26 Thread Vik Fearing
On 06/26/2014 06:45 AM, Fujii Masao wrote:
>> The point of this feature, AFAICS, is to detect clients that are failing
>> > to issue another query or close their transaction as a result of bad
>> > client logic.  It's not to protect against network glitches.
>
> If so, the document should explain that clearly. Otherwise users may
> misunderstand this parameter and try to use it to protect against even
> long transactions generated by network glitches or client freezes.

What does pg_stat_activity say for those cases?  If it's able to say
"idle in transaction", then this patch covers it.  If it isn't, then
that seems like a different patch.
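For reference, the state in question is directly visible in the statistics view (this is existing functionality, not part of the patch):

```sql
-- Sessions that would be candidates for the proposed timeout:
SELECT pid, state, now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state = 'idle in transaction';
```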
-- 
Vik




Re: [HACKERS] Allowing join removals for more join types

2014-06-26 Thread David Rowley
On Wed, Jun 25, 2014 at 10:03 AM, Simon Riggs  wrote:

> On 23 June 2014 12:06, David Rowley  wrote:
>
> >> It's not clear to me where you get the term "sortclause" from. This is
> >> either the groupclause or distinctclause, but in the test cases you
> >> provide this shows this has nothing at all to do with sorting since
> >> there is neither an order by or a sorted aggregate anywhere near those
> >> queries. Can we think of a better name that won't confuse us in the
> >> future?
> >>
> >
> > I probably got the word "sort" from the function targetIsInSortList,
> which
> > expects a list of SortGroupClause. I've renamed the function to
> > sortlist_is_unique_on_restrictinfo() and renamed the sortclause
> parameter to
> > sortlist. Hopefully will reduce confusion about it being an ORDER BY
> clause
> > a bit more. I think sortgroupclauselist might be just a bit too long.
> What
> > do you think?
>
> OK, perhaps I should be clearer. The word "sort" here seems completely
> misplaced and we should be using a more accurately descriptive term.
> It's slightly more than editing to rename things like that, so I'd
> prefer you came up with a better name.
>
>
hmm, I do see what you mean and understand the concern, but I was a bit
stuck on the fact it is a list of SortGroupClause after all. After a bit of
looking around the source I found a function called grouping_is_sortable
which seems to be getting given ->groupClause and ->distinctClause in a few
places around the source. I've ended up naming the
function groupinglist_is_unique_on_restrictinfo, but I can drop the word
"list" off of that if that's any better?  I did have it named
clauselist_is_unique_on_restrictinfo for a few minutes, but then I noticed
that this was not very suitable since the calling function uses the
variable name clause_list for the restrictinfo list, which made it even
more confusing.

Attached is a delta patch between version 1.2 and 1.3, and also a
completely updated patch.



> Did you comment on the transitive closure question? Should we add a
> test for that, whether or not it works yet?
>
>
In my previous email.

I could change the following to use c.id in the targetlist and group by
clause, but I'm not really sure it's testing anything new or different.

EXPLAIN (COSTS OFF)
SELECT a.id FROM a
LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id GROUP
BY b.id) b ON a.id = b.id AND b.dummy = 1;

Regards

David Rowley


> Other than that it looks pretty good to commit, so I'll wait a week
> for other objections then commit.
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


subquery_leftjoin_removal_v1.3.patch
Description: Binary data


subquery_leftjoin_removal_v1.3_delta.patch
Description: Binary data



Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-26 Thread Ian Barwick
On 14/06/25 23:36, Stephen Frost wrote:
> Other databases have had this kind of capability as a
> matter of course for decades- we are far behind in this area.

On a related note, MySQL/MariaDB have had some sort of auditing
capability for, well, months. By no means as sophisticated as
some of the others, but still more than nothing.

  https://www.mysql.com/products/enterprise/audit.html
  https://mariadb.com/kb/en/about-the-mariadb-audit-plugin/


Regards

Ian Barwick

-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Allowing join removals for more join types

2014-06-26 Thread Simon Riggs
On 26 June 2014 10:01, David Rowley  wrote:

>> Did you comment on the transitive closure question? Should we add a
>> test for that, whether or not it works yet?
>>
>
> In my previous email.
>
> I could change the following to use c.id in the targetlist and group by
> clause, but I'm not really sure it's testing anything new or different.
>
> EXPLAIN (COSTS OFF)
> SELECT a.id FROM a
> LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id GROUP
> BY b.id) b ON a.id = b.id AND b.dummy = 1;

OK, agreed, no need to include.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Allowing NOT IN to use ANTI joins

2014-06-26 Thread David Rowley
On Thu, Jun 26, 2014 at 3:52 AM, Tom Lane  wrote:

> Simon Riggs  writes:
> > To be clearer, what I mean is we use only the direct proof approach,
> > for queries like this
>
> >   SELECT * FROM a WHERE id NOT IN(SELECT unknown_col FROM b WHERE
> > unknown_col IS NOT NULL);
>
> > and we don't try to do it for queries like this
>
> >   SELECT * FROM a WHERE id NOT IN(SELECT not_null_column FROM b);
>
> > because we don't know the full provenance of "not_null_column" in all
> > cases and that info is (currently) unreliable.
>
> FWIW, I think that would largely cripple the usefulness of the patch.
> If you can tell people to rewrite their queries, you might as well
> tell them to rewrite into NOT EXISTS.  The real-world queries that
> we're trying to improve invariably look like the latter case not the
> former, because people who are running into this problem usually
> aren't even thinking about the possibility of NULLs.
>
>
At first I didn't quite agree with this, but after a bit more thought I'm
coming around to it.

I had been thinking that quite often the subquery in the NOT IN would have
a WHERE clause and not just be accessing all rows of the table, but then
it's probably very likely that when the subquery *does* contain a WHERE
clause, it's for some completely different column. It seems a bit
weird to write NOT IN(SELECT somecol FROM table WHERE somecol = 5); it's
probably more likely to be something like NOT IN(SELECT somecol FROM table
WHERE category = 5), i.e. some column that's not in the target list, and in
that case we wouldn't know that somecol couldn't be NULL.
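The rewrite Tom alludes to, for comparison (tables illustrative; the two forms differ whenever b.col can be NULL, which is exactly why the planner must prove NULLs impossible before converting NOT IN to an anti join):

```sql
-- NOT IN: returns no rows at all if any b.col is NULL.
SELECT * FROM a WHERE id NOT IN (SELECT col FROM b);

-- NOT EXISTS: NULL-safe, and already planned as an anti join today.
SELECT * FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.col = a.id);
```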

If there's no way to tell that the target entry comes from a left join,
would it be a bit too quirky to do the NOT NULL checking only when
list_length(query->rtable) == 1? Or maybe even loop over query->rtable and
abort if we find an outer join type? It seems a bit hackish, but perhaps it
would please more people than it would surprise.

Regards

David Rowley




> I would actually say that if we only have the bandwidth to get one of
> these cases done, it should be the second one not the first.  It's
> unclear to me that checking for the first case would even be worth
> the planner cycles it'd cost.
>
> regards, tom lane
>


[HACKERS] Re: [COMMITTERS] pgsql: Do all-visible handling in lazy_vacuum_page() outside its critic

2014-06-26 Thread Andres Freund
On 2014-06-26 10:39:01 +0300, Heikki Linnakangas wrote:
> On 06/24/2014 01:27 AM, Andres Freund wrote:
> >Does your change still make
> >sense to you and do you see problem with the current state (as of ecac0e2b)?
>
> Hmm, in the current state, it's again possible that the full-page image
> doesn't contain the all-visible flag, even though the page in the buffer
> does. I guess that's OK, because replaying XLOG_HEAP2_VISIBLE always sets
> the flag. My page-comparison tool will complain, so it would be nice to not
> do that, but it's a false alarm.

I don't see where that difference should come from? On both the primary
and standby neither the page nor the vm will have all-visible bit set
after the HEAP2_CLEAN. Both will have it set after the
HEAP2_VISIBLE.
The only chance for a disparity that I see is when crashing after
PageSetAllVisible(), but before the XLogInsert(). That's fairly harmless
afaics and I don't think your tool would pick that up?

I think we could fix it by doing something like

if (!PageIsAllVisible(page) && heap_page_is_all_visible())
{
    visibilitymap_pin(onerel, blkno, vmbuffer);

    START_CRIT_SECTION();
    PageSetAllVisible(page);
    if (!visibilitymap_test(onerel, blkno, vmbuffer))
    {
        visibilitymap_set(onerel, blkno, buffer, InvalidXLogRecPtr,
                          *vmbuffer, visibility_cutoff_xid);
    }
    END_CRIT_SECTION();
}

If we care.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Allowing NOT IN to use ANTI joins

2014-06-26 Thread Simon Riggs
On 26 June 2014 10:31, David Rowley  wrote:

> If there's no way to tell that the target entry comes from a left join, then
> would it be a bit too quirky to just do the NOT NULL checking when
> list_length(query->rtable) == 1 ? or maybe even loop over query->rtable and
> abort if we find an outer join type? it seems a bit hackish, but perhaps it
> would please more people than it would surprise.

We don't know enough about joins at present, so we only allow it when
there are no joins (i.e. length == 1). That's just a statement of
reality, not a hack.

I would agree with Tom that the common usage is to do NOT IN against a
table with no where clause, so this would hit the most frequent use
case.

Maybe Tom will have a flash of insight before commit, or maybe we
figure out a way to extend it later.

Let's document it and move on.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_receivexlog add synchronous mode

2014-06-26 Thread furuyao
> The patch looks somewhat complicated and bugs can be easily introduced
> because it tries to not only add new feature but also reorganize the main
> loop in HandleCopyStream at the same time. To keep the patch simple, I'm
> thinking to firstly apply the attached patch which just refactors the
> main loop. Then we can apply the main patch, i.e., add new feature.
> Thought?
Thank you for the refactoring patch.
I did a review of the patch.
As a result, I found that the calculation of sleeptime is incorrect when
--status-interval is set to 1.

--status-interval 1 -->  sleeptime 1. !?
--status-interval 2 -->  sleeptime 1. OK
--status-interval 3 -->  sleeptime 2. OK

Regards,

--
Furuya Osamu



Re: [HACKERS] better atomics - v0.5

2014-06-26 Thread Andres Freund
On 2014-06-25 20:16:08 -0400, Robert Haas wrote:
> On Wed, Jun 25, 2014 at 4:36 PM, Andres Freund  wrote:
> > Since it better be legal to manipulate a atomic variable while holding a
> > spinlock we cannot simply use an arbitrary spinlock as backing for
> > atomics. That'd possibly cause us to wait on ourselves or cause
> > deadlocks.
> 
> I think that's going to fall afoul of Tom's previously-articulated "no
> loops inside spinlocks" rule.  Most atomics, by nature, are
> loop-until-it-works.

Well, so is TAS itself :).

More seriously, I think we're not going to have much fun if we're making
up the rule that you can't do an atomic add/sub while a spinlock is
held. That just precludes too many use cases and will make the code much
harder to understand. I don't think we're going to end up having many
problems if we allow atomic read/add/sub/write in there.

> >> How much would we lose if we supported compiler intrinsics on
> >> versions of GCC and MSVC that have it and left everything else to
> >> future patches?
> >
> > The discussion so far seemed pretty clear that we can't regress somewhat
> > frequently used platforms. And dropping support for all but msvc and gcc
> > would end up doing that. We're going to have to do the legwork for the
> > most common platforms... Note that I didn't use assembler for those, but
> > relied on intrinsics...
> 
> We can't *regress* somewhat-frequently used platforms, but that's
> different from saying we've got to support *new* facilities on those
> platforms.

Well, we want to use the new facilities somewhere. And it's not entirely
unfair to call it a regression if other os/compiler/arch combinations
speed up but yours don't.

> Essentially the entire buildfarm is running either gcc,
> some Microsoft compiler, or a compiler that's supports the same
> atomics intrinsics as gcc i.e. icc or clang.  Some of those compilers
> may be too old to support the atomics intrinsics, and there's one Sun
> Studio animal, but I don't know that we need to care about those
> things in this patch...

I think we should support those where it's easy and where we'll see the
breakage on the buildfarm. Sun studio's intrinsics are simple enough
that I don't think my usage of them will be too hard to fix.

> ...unless of course the atomics fallbacks in this patch are
> sufficiently sucky that anyone who ends up using those is going to be
> sad.  Then the bar is a lot higher.  But if that's the case then I
> wonder if we're really on the right course here.

I don't think you'll notice it much on mostly nonconcurrent uses. But some of
those architectures/platforms do support larger NUMA-like setups. And
for those it'd probably hurt for some workloads. And e.g. Solaris is
still fairly common for such setups.

> > I added the x86 inline assembler because a fair number of buildfarm
> > animals use ancient gcc's and because I could easily test it. It's also
> > more efficient for gcc < 4.6. I'm not wedded to keeping it.
> 
> Hmm. gcc 4.6 is only just over a year old, so if pre-4.6
> implementations aren't that good, that's a pretty good argument for
> keeping our own implementations around.  :-(

The difference isn't that big. It's
return __atomic_compare_exchange_n(&ptr->value, expected, newval,
                                   false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
vs

bool	ret;
uint64	current;
current = __sync_val_compare_and_swap(&ptr->value, *expected, newval);
ret = current == *expected;
*expected = current;
return ret;

On x86 that's an additional cmp, mov and such. Useless pos because
cmpxchg already computes all that... (that's returned by the setz in my
patch). MSVC only supports the second form as well.

There's a fair number of < 4.2 gccs on the buildfarm as well though.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] better atomics - v0.5

2014-06-26 Thread Andres Freund
On 2014-06-25 20:22:31 -0400, Robert Haas wrote:
> On Wed, Jun 25, 2014 at 5:42 PM, Heikki Linnakangas
>  wrote:
> > I think having a separate file for each architecture is nice. I totally
> > agree that they don't belong in src/include/storage, though. s_lock.h has
> > always been misplaced there, but we've let it be for historical reasons, but
> > now that we're adding a dozen new files, it's time to move them out.
> 
> I find the current organization pretty confusing, but maybe that could
> be solved by better documentation of what's supposed to go in each
> architecture or compiler-dependent file.

The idea is that first a architecture specific file (atomics-arch-*.h)
is included. That file can provide a (partial) implementation for the
specific architecture. Or it can do pretty much nothing.

After that a compiler specific file is included
(atomics-generic-*.h). If atomics aren't yet implemented that can
provide an intrinsics based implementation if the compiler version has
support for it. At the very least a compiler barrier should be provided.

After that the spinlock based fallback implementation
(atomics-fallback.h) provides atomics and barriers if not yet
available. By here we're sure that nothing else will provide them.

Then we can provide operations (atomics-generic.h) that build ontop of
the provided functions. E.g. implement _sub, _and et al.

I'll include some more of that explanation in the header.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-26 Thread Petr Jelinek

On 25/06/14 19:43, Sawada Masahiko wrote:

Hi,

I am sending you review comments about this patch.


Hello, thanks.


--
$ pg_resetxlog -s -n  data | grep "Database system identifier"
Database system identifier:   6028907917695471865

The -s option does not work correctly with the -n option.


Fixed.


--
$ pg_resetxlog
-s602890791769547186511
data
Transaction log reset
$ pg_controldata data | grep "Database system identifier"
Database system identifier:   18446744073709551615

pg_resetxlog finished successfully, but the system identifier was not
changed.
Also, I think the number of digits in the input needs to be checked.



It actually did change the identifier, just to ULONG_MAX, since that's
the maximum value that can be set (scanf does that conversion). I added
a check that limits the maximum value of the system identifier input to
ULONG_MAX-1 and reports an error if it's bigger. I also added some
additional input validation.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml
index 34b0606..39ae574 100644
--- a/doc/src/sgml/ref/pg_resetxlog.sgml
+++ b/doc/src/sgml/ref/pg_resetxlog.sgml
@@ -30,6 +30,7 @@ PostgreSQL documentation
-m mxid,mxid
-O mxoff
-l xlogfile
+   -s [sysid]
datadir
   
  
@@ -184,6 +185,13 @@ PostgreSQL documentation
   
 
   
+   The -s option instructs pg_resetxlog to
+   set the system identifier of the cluster to the specified sysid.
+   If the sysid is not provided, a new one will be generated using the
+   same algorithm initdb uses.
+  
+
+  
The -V and --version options print
the pg_resetxlog version and exit.  The
options -? and --help show supported arguments,
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 915a1ed..e8bbbc1 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -67,7 +67,9 @@ static MultiXactId set_mxid = 0;
 static MultiXactOffset set_mxoff = (MultiXactOffset) -1;
 static uint32 minXlogTli = 0;
 static XLogSegNo minXlogSegNo = 0;
+static uint64 set_sysid = 0;
 
+static uint64 GenerateSystemIdentifier(void);
 static bool ReadControlFile(void);
 static void GuessControlValues(void);
 static void PrintControlValues(bool guessed);
@@ -111,7 +113,7 @@ main(int argc, char *argv[])
 	}
 
 
-	while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1)
+	while ((c = getopt(argc, argv, "fl:m:no:O:x:e:s::")) != -1)
 	{
 		switch (c)
 		{
@@ -227,6 +229,29 @@ main(int argc, char *argv[])
 XLogFromFileName(optarg, &minXlogTli, &minXlogSegNo);
 break;
 
+			case 's':
+if (optarg)
+{
+	if (strspn(optarg, "0123456789") != strlen(optarg) ||
+		sscanf(optarg, UINT64_FORMAT, &set_sysid) != 1)
+	{
+		fprintf(stderr, _("%s: invalid argument for option -s\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}
+
+	if (set_sysid == ULONG_MAX)
+	{
+		fprintf(stderr, _("%s: system identifier (-s) is too large\n"), progname);
+		exit(1);
+	}
+}
+else
+{
+	set_sysid = GenerateSystemIdentifier();
+}
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit(1);
@@ -354,6 +379,9 @@ main(int argc, char *argv[])
 	if (minXlogSegNo > newXlogSegNo)
 		newXlogSegNo = minXlogSegNo;
 
+	if (set_sysid != 0)
+		ControlFile.system_identifier = set_sysid;
+
 	/*
 	 * If we had to guess anything, and -f was not given, just print the
 	 * guessed values and exit.  Also print if -n is given.
@@ -395,6 +423,26 @@ main(int argc, char *argv[])
 
 
 /*
+ * Create a new unique installation identifier.
+ *
+ * See notes in xlog.c about the algorithm.
+ */
+static uint64
+GenerateSystemIdentifier(void)
+{
+	uint64			sysidentifier;
+	struct timeval	tv;
+
+	gettimeofday(&tv, NULL);
+	sysidentifier = ((uint64) tv.tv_sec) << 32;
+	sysidentifier |= ((uint64) tv.tv_usec) << 12;
+	sysidentifier |= getpid() & 0xFFF;
+
+	return sysidentifier;
+}
+
+
+/*
  * Try to read the existing pg_control file.
  *
  * This routine is also responsible for updating old pg_control versions
@@ -475,9 +523,6 @@ ReadControlFile(void)
 static void
 GuessControlValues(void)
 {
-	uint64		sysidentifier;
-	struct timeval tv;
-
 	/*
 	 * Set up a completely default set of pg_control values.
 	 */
@@ -489,14 +534,10 @@ GuessControlValues(void)
 
 	/*
 	 * Create a new unique installation identifier, since we can no longer use
-	 * any old XLOG records.  See notes in xlog.c about the algorithm.
+	 * any old XLOG records.
 	 */
-	gettimeofday(&tv, NULL);
-	sysidentifier = ((uint64) tv.tv_sec) << 32;
-	sysidentifier |= ((uint64) tv.tv_usec) << 12;
-	sysidentifier |= getpid() & 0xFFF;
 
-	ControlFile.system_identifier = sysidentifier;
+	ControlFile.system

Re: [HACKERS] sorting a union over inheritance vs pathkeys

2014-06-26 Thread Michael Glaesemann
On Jun 25, 2014, at 22:14, Tom Lane  wrote:

> I wrote:
>> Michael Glaesemann  writes:
>>> -- ERROR:  could not find pathkey item to sort
> 
>> Hm ... I can reproduce that in 9.3 but it seems fine in 9.4 and HEAD.
>> Don't know what's going on exactly.
> 
> Interesting --- it appears that commit
> a87c729153e372f3731689a7be007bc2b53f1410 is why it works in 9.4.  I had
> thought that was just improving plan quality, but it seems to also prevent
> this problem.  I guess we'd better back-patch it.

Thanks, Tom!

Michael Glaesemann
grzm seespotcode net







[HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-06-26 Thread Pavel Stehule
Hello all,

today I had to work with one slow query. When I checked different
scenarios I found a dependency on work_mem size, but it is atypical:
with bigger work_mem the query execution took 31 minutes (600MB
work_mem) versus 1 minute (1MB).

db_kost07e2d9cdmg20b1takpqntobo6ghj=# set work_mem to '600MB';
SET
db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze SELECT
"f_users_aax8qg0e23asx5h"."firtsclickutmsource_id" AS "a_5078", COUNT( (
CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN
"f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS
"c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS
"c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 =
"f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS
"def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( (
"f_emails_aaekn159p5aw3t8" INNER JOIN "f_users_aax8qg0e23asx5h" ON (
"f_emails_aaekn159p5aw3t8"."userid_id" =  "f_users_aax8qg0e23asx5h"."id" )
) INNER JOIN "dwh_dm_abnblk9j5oagorw" ON (
"f_users_aax8qg0e23asx5h"."dt_registrationdatetime_id" =
"dwh_dm_abnblk9j5oagorw"."id" ) ) WHERE ( 2014 =
"dwh_dm_abnblk9j5oagorw"."id_year" ) GROUP BY 1;

QUERY
PLAN

--
 HashAggregate  (cost=2145957.37..2145957.96 rows=59 width=12) (actual
time=1864088.145..1864088.155 rows=31 loops=1)
   ->  Hash Join  (cost=882573.27..2142753.08 rows=213619 width=12) (actual
time=9083.326..1859069.171 rows=7070141 loops=1)
 Hash Cond: (f_emails_aaekn159p5aw3t8.userid_id =
f_users_aax8qg0e23asx5h.id)
 ->  Seq Scan on f_emails_aaekn159p5aw3t8  (cost=0.00..854559.95
rows=32278695 width=12) (actual time=0.006..14271.239 rows=32211565 loops=1)
 ->  Hash  (cost=881801.58..881801.58 rows=61735 width=8) (actual
time=9076.153..9076.153 rows=3310768 loops=1)
   Buckets: 8192  Batches: 1  Memory Usage: 129327kB
   ->  Hash Join  (cost=782.15..881801.58 rows=61735 width=8)
(actual time=1.423..8086.228 rows=3310768 loops=1)
 Hash Cond:
(f_users_aax8qg0e23asx5h.dt_registrationdatetime_id =
dwh_dm_abnblk9j5oagorw.id)
 ->  Seq Scan on f_users_aax8qg0e23asx5h
(cost=0.00..845420.42 rows=9328442 width=12) (actual time=0.015..4098.915
rows=9322096 loops=1)
 ->  Hash  (cost=777.59..777.59 rows=365 width=4)
(actual time=1.400..1.400 rows=365 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 13kB
   ->  Bitmap Heap Scan on dwh_dm_abnblk9j5oagorw
(cost=11.08..777.59 rows=365 width=4) (actual time=0.111..1.218 rows=365
loops=1)
 Recheck Cond: (2014 = id_year)
 ->  Bitmap Index Scan on
dwh_dm_abnblk9j5oagorw_id_year_idx  (cost=0.00..10.99 rows=365 width=0)
(actual time=0.068..0.068 rows=365 loops=1)
   Index Cond: (2014 = id_year)
 Total runtime: 1864107.373 ms
(16 rows)


db_kost07e2d9cdmg20b1takpqntobo6ghj=# set work_mem to '1MB';
SET
db_kost07e2d9cdmg20b1takpqntobo6ghj=# explain analyze SELECT
"f_users_aax8qg0e23asx5h"."firtsclickutmsource_id" AS "a_5078", COUNT( (
CASE WHEN ( 89388 = "f_emails_aaekn159p5aw3t8"."read_id" ) THEN
"f_emails_aaekn159p5aw3t8"."id" ELSE CAST( NULL AS INT ) END ) ) AS
"c_7a2a545bf904a48a", COUNT( "f_emails_aaekn159p5aw3t8"."id" ) AS
"c_9e301e13350cc0dc", ( MAX( ( CASE WHEN ( 89388 =
"f_emails_aaekn159p5aw3t8"."read_id" ) THEN 0 END ) ) IS NOT NULL ) AS
"def_7a2a545bf904a48a", TRUE AS "def_9e301e13350cc0dc" FROM ( (
"f_emails_aaekn159p5aw3t8" INNER JOIN "f_users_aax8qg0e23asx5h" ON (
"f_emails_aaekn159p5aw3t8"."userid_id" =  "f_users_aax8qg0e23asx5h"."id" )
) INNER JOIN "dwh_dm_abnblk9j5oagorw" ON (
"f_users_aax8qg0e23asx5h"."dt_registrationdatetime_id" =
"dwh_dm_abnblk9j5oagorw"."id" ) ) WHERE ( 2014 =
"dwh_dm_abnblk9j5oagorw"."id_year" ) GROUP BY 1;

QUERY
PLAN

--
 HashAggregate  (cost=2275675.45..2275675.95 rows=50 width=12) (actual
time=48438.407..48438.416 rows=31 loops=1)
   ->  Hash Join  (cost=882772.88..2272526.68 rows=209918 width=12) (actual
time=9384.610..45211.963 rows=6988804 loops=1)
 Hash Cond: (f_emails_aaekn159p5aw3t8.userid_id =
f_users_aax8qg0e23asx5h.id)
 ->  Seq Scan on f_emails_aaekn159p5aw3t8  (cost=0.00..839754.64
rows=31719464 width=12) (actual time=0.034..14299.338 rows=31653392 loops=1)
 ->  Hash  (cost=881759.44..881759.44 rows=61715 width=8) (actual
time=9368.371..9368.371 rows=3310768 loops=1)
   Buckets: 4096  Batches: 256 (originally 4)  Memory Usage:
1025kB
   ->  Hash Join  (cost=782.15..881759.44 rows=61715 width=8)
(actual time=3.839..8223.097 row

Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension

2014-06-26 Thread Rushabh Lathia
Thanks for sharing latest patch.

For me this syntax limits the current RETURNING clause implementation,
because it does not allow returning the PRIMARY KEY together with other
columns; if a user or application requires that, they need to do
RETURNING *. The following syntax does not work, which I think would be
very useful if someone wants to use the RETURNING clause with the
PRIMARY KEY as well as some other table columns:

INSERT INTO dept VALUES (10,'ACCOUNTING','NEW YORK') returning primary key,
dname;

I think allowing other columns along with PRIMARY KEY would be a more
useful syntax. Even in later versions, if we want to extend this syntax
to return UNIQUE KEY, SEQUENCE VALUES, etc., a comma-separated syntax
will be more handy.

Others please share your inputs/thoughts.



On Thu, Jun 26, 2014 at 8:11 AM, Ian Barwick  wrote:

> On 25/06/14 16:04, Ian Barwick wrote:
>
>> Hi
>>
>> On 14/06/25 15:13, Rushabh Lathia wrote:
>>
>>> Hello All,
>>>
>>> I assigned my self as reviewer of the patch. I gone through the
>>> mail chain discussion and in that question has been raised about
>>> the feature and its implementation, so would like to know what is
>>> the current status of this project/patch.
>>>
>>> Regards,
>>>
>>
>> I'll be submitting a revised version of this patch very shortly.
>>
>
> Revised version of the patch attached, which implements the expansion
> of "primary key" in the rewrite phase per Tom Lane's suggestion upthread
> [*]
>
> [*] http://www.postgresql.org/message-id/28583.1402325...@sss.pgh.pa.us
>
>
>
> Regards
>
> Ian Barwick
>
> --
>  Ian Barwick   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
Rushabh Lathia


Re: [HACKERS] better atomics - v0.5

2014-06-26 Thread Merlin Moncure
On Thu, Jun 26, 2014 at 5:20 AM, Andres Freund  wrote:
> On 2014-06-25 20:16:08 -0400, Robert Haas wrote:
>> On Wed, Jun 25, 2014 at 4:36 PM, Andres Freund  
>> wrote:
>> > Since it better be legal to manipulate a atomic variable while holding a
>> > spinlock we cannot simply use an arbitrary spinlock as backing for
>> > atomics. That'd possibly cause us to wait on ourselves or cause
>> > deadlocks.
>>
>> I think that's going to fall afoul of Tom's previously-articulated "no
>> loops inside spinlocks" rule.  Most atomics, by nature, are
>> loop-until-it-works.
>
> Well, so is TAS itself :).
>
> More seriously, I think we're not going to have much fun if we're making
> up the rule that you can't do an atomic add/sub while a spinlock is
> held. That just precludes too many use cases and will make the code much
> harder to understand. I don't think we're going to end up having many
> problems if we allow atomic read/add/sub/write in there.

That rule seems reasonable -- why would you ever want to do this?
While you couldn't properly deadlock it seems like it could lead to
unpredictable and hard to diagnose performance stalls.

merlin




Re: [HACKERS] better atomics - v0.5

2014-06-26 Thread Andres Freund
On 2014-06-26 07:44:14 -0500, Merlin Moncure wrote:
> On Thu, Jun 26, 2014 at 5:20 AM, Andres Freund  wrote:
> > On 2014-06-25 20:16:08 -0400, Robert Haas wrote:
> >> On Wed, Jun 25, 2014 at 4:36 PM, Andres Freund  
> >> wrote:
> >> > Since it better be legal to manipulate a atomic variable while holding a
> >> > spinlock we cannot simply use an arbitrary spinlock as backing for
> >> > atomics. That'd possibly cause us to wait on ourselves or cause
> >> > deadlocks.
> >>
> >> I think that's going to fall afoul of Tom's previously-articulated "no
> >> loops inside spinlocks" rule.  Most atomics, by nature, are
> >> loop-until-it-works.
> >
> > Well, so is TAS itself :).
> >
> > More seriously, I think we're not going to have much fun if we're making
> > up the rule that you can't do an atomic add/sub while a spinlock is
> > held. That just precludes too many use cases and will make the code much
> > harder to understand. I don't think we're going to end up having many
> > problems if we allow atomic read/add/sub/write in there.
> 
> That rule seems reasonable -- why would you ever want to do this?

Are you wondering why you'd ever manipulate an atomic op inside a
spinlock?
There are enough algorithms where a slowpath is done under a spinlock but
the fastpath is done without. You can't simply use nonatomic operations
for manipulation under the spinlock because the fastpaths might then
observe/cause bogus state.

Obviously I'm not advocating doing random stuff in spinlocks. We're
talking about single lock xadds (or the LL/SC equivalent) here.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-06-26 Thread MauMau

Hello,

From: "Pavel Stehule" 

fixed


Thank you.  All is fine.



should be "Environment variables".  And this section lacks description
for Windows, such as:

+ printf(_("  NAME=VALUE [NAME=VALUE] psql ...\n  or \\setenv NAME 
[VALUE]

in interactive mode\n\n"));

+ printf(_("  PGPASSFILE password file (default ~/.pgpass)\n"));



??? -


I meant that how to set environment variables on Windows command prompt is 
different from on UNIX/Linux, and the default password file path is also 
different on Windows.  Do we describe them in this help?



Lastly, to follow most of your descriptions, "s" at the end of the first 
verb in these messages should be removed.  For example, use "set" instead of 
"sets".


+ printf(_("  COMP_KEYWORD_CASE  determines which letter case to use when 
completing an SQL key word\n"));
+ printf(_("  columnssets the target width for the wrapped 
format\n"));
+ printf(_("  linestyle  sets the border line drawing style [ascii, 
old-ascii, unicode]\n"));
+ printf(_("  recordsep  specifies the record (line) separator to 
use in unaligned output format\n"));
+ printf(_("  title  sets the table title for any subsequently 
printed tables\n"));



This is all I noticed in the review.


Regards
MauMau





Re: [HACKERS] crash with assertions and WAL_DEBUG

2014-06-26 Thread Heikki Linnakangas

On 06/24/2014 05:47 PM, Alvaro Herrera wrote:

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 3c1c81a..4264373 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -219,6 +219,16 @@ mdinit(void)
 										  &hash_ctl,
 										  HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
 		pendingUnlinks = NIL;
+
+		/*
+		 * XXX: The checkpointer needs to add entries to the pending ops
+		 * table when absorbing fsync requests. That is done within a
+		 * critical section. It means that there's a theoretical possibility
+		 * that you run out of memory while absorbing fsync requests, which
+		 * leads to a PANIC. Fortunately the hash table is small so that's
+		 * unlikely to happen in practice.
+		 */
+		MemoryContextAllowInCriticalSection(MdCxt, true);
 	}
 }


Isn't that allowing a bit too much? We e.g. shouldn't allow
_fdvec_alloc() within a critical section. Might make sense to create a
child context for it.


I agree.


Ok. That leaves nothing but _fdvec_alloc() in MdCxt.


Rahila Syed wrote:


The patch on compilation gives following error,

mcxt.c: In function ‘MemoryContextAllowInCriticalSection’:
mcxt.c:322: error: ‘struct MemoryContextData’ has no member named
‘allowInCriticalSection’

The member in MemoryContextData is defined as 'allowInCritSection' while
the MemoryContextAllowInCriticalSection accesses the field as
'context->allowInCriticalSection'.


It appears Heikki did a search'n replace for "->allowInCritSection"
before submitting, which failed to match the struct declaration.


Uh, looks like I failed to test this at all with assertions enabled. 
Once you fix that error, you get a lot of assertion failures :-(.


Attached is a new attempt:

* walDebugCxt is now created at backend startup (in XLOGShmemInit()). 
This fixes the issue that Andres pointed out, that the context creation 
would PANIC if the first XLogInsert in a backend happens within a 
critical section. LWLOCK_STATS had the same issue, it would fail if the 
first LWLock acquisition in a process happens within a critical section. 
I fixed that by adding an explicit InitLWLockAccess() function and moved 
the initialization there.


* hash_create also creates a new memory context within the given 
context. That new context needs to inherit the allowInCritSection flag, 
otherwise allocating entries in the hash will still fail. Both 
LWLOCK_STATS and the pending ops hash table in the checkpointer had this 
issue.


* Checkpointer does one palloc to allocate private space to copy the 
requests to, in addition to using the hash table. I moved that palloc to 
before entering the critical section.


- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index abc5682..bef0f66 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -60,6 +60,7 @@
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
+#include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/relmapper.h"
 #include "utils/snapmgr.h"
@@ -736,6 +737,10 @@ static bool bgwriterLaunched = false;
 static int	MyLockNo = 0;
 static bool holdingAllLocks = false;
 
+#ifdef WAL_DEBUG
+static MemoryContext walDebugCxt = NULL;
+#endif
+
 static void readRecoveryCommandFile(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo);
 static bool recoveryStopsBefore(XLogRecord *record);
@@ -1258,6 +1263,7 @@ begin:;
 	if (XLOG_DEBUG)
 	{
 		StringInfoData buf;
+		MemoryContext oldCxt = MemoryContextSwitchTo(walDebugCxt);
 
 		initStringInfo(&buf);
 		appendStringInfo(&buf, "INSERT @ %X/%X: ",
@@ -1282,10 +1288,11 @@ begin:;
 
 			appendStringInfoString(&buf, " - ");
 			RmgrTable[rechdr->xl_rmid].rm_desc(&buf, (XLogRecord *) recordbuf.data);
-			pfree(recordbuf.data);
 		}
 		elog(LOG, "%s", buf.data);
-		pfree(buf.data);
+
+		MemoryContextSwitchTo(oldCxt);
+		MemoryContextReset(walDebugCxt);
 	}
 #endif
 
@@ -4807,6 +4814,24 @@ XLOGShmemInit(void)
 	char	   *allocptr;
 	int			i;
 
+#ifdef WAL_DEBUG
+	/*
+	 * Create a memory context for WAL debugging that's exempt from the
+	 * normal "no pallocs in critical section" rule. Yes, that can lead to a
+	 * PANIC if an allocation fails, but no-one is going to enable wal_debug
+	 * in production.
+	 */
+	if (walDebugCxt == NULL)
+	{
+		walDebugCxt = AllocSetContextCreate(TopMemoryContext,
+			"WAL Debug",
+			ALLOCSET_DEFAULT_MINSIZE,
+			ALLOCSET_DEFAULT_INITSIZE,
+			ALLOCSET_DEFAULT_MAXSIZE);
+		MemoryContextAllowInCriticalSection(walDebugCxt, true);
+	}
+#endif
+
 	ControlFile = (ControlFileData *)
 		ShmemInitStruct("Control File", sizeof(ControlFileData), &foundCFil

Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-06-26 Thread Pavel Stehule
2014-06-26 15:26 GMT+02:00 MauMau :

> Hello,
>
> From: "Pavel Stehule" 
>
>> fixed
>>
>
> Thank you.  All is fine.
>
>
>
>  should be ""Environment variables".  And this section lacks description
>>> for Windows, such as:
>>>
>>> + printf(_("  NAME=VALUE [NAME=VALUE] psql ...\n  or \\setenv NAME
>>> [VALUE]
>>> in interactive mode\n\n"));
>>>
>>> + printf(_("  PGPASSFILE password file (default ~/.pgpass)\n"));
>>>
>>>
>> ??? -
>>
>
> I meant that how to set environment variables on Windows command prompt is
> different from on UNIX/Linux, and the default password file path is also
> different on Windows.  Do we describe them in this help?
>

hmm, I'll check it

Pavel


>
>
> Lastly, to follow most of your descriptions, "s" at the end of the first
> verb in these messages should be removed.  For example, use "set" instead
> of "sets".
>
>
> + printf(_("  COMP_KEYWORD_CASE  determines which letter case to use when
> completing an SQL key word\n"));
> + printf(_("  columnssets the target width for the wrapped
> format\n"));
> + printf(_("  linestyle  sets the border line drawing style
> [ascii, old-ascii, unicode]\n"));
> + printf(_("  recordsep  specifies the record (line) separator to
> use in unaligned output format\n"));
> + printf(_("  title  sets the table title for any subsequently
> printed tables\n"));
>
>
> This is all I noticed in the review.
>
>
> Regards
> MauMau
>
>


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-26 Thread Stephen Frost
Abhijit,

* Abhijit Menon-Sen (a...@2ndquadrant.com) wrote:
> Ian looked into the auditing capabilities of various (SQL and otherwise)
> systems some time ago. Informix handles its auditing configuration
> entirely outside the database. DB2 uses a mixture of in-database and
> external configuration. Oracle stores everything in the database.

We'll certainly still have some external configuration, but not things
which are referring to specific roles or tables.  We will also need a
way to manage that configuration such that an individual could have only
access to modify the audit parameters without also having lots of other
rights to the system.  Having a separate auditing role is called out as
a requirement for quite a few different industries.

> We're certainly not going to invent a mechanism other than GUC settings
> or catalog tables for auditing information (à la Informix). What I think
> you're saying is that without storing stuff in the catalog tables, we're
> not going to be able to offer auditing capabilities worth having. I can
> appreciate that, but I'm not sure I agree.

This is going a bit farther than I intended.  I was trying to outline
where we want to be going with this and pointing out that if we put
pgaudit out there as "this is how you do auditing in PG", we'll be
stuck with it more-or-less forever, and that it won't satisfy critical
use-cases and would minimize PG's ability to be used in those
industries.  Further, if we go with this approach now, we will end up
creating a huge upgrade headache (which could end up being used as a
reason to not have the in-core capabilities).

A catalog-based approach which had only the specific features that
pgaudit has today, but is implemented in such a way that we'll be able
to add those other features in the future would be great, imv.  What
this looks like, however, is a solution which only gets us half-way and
then makes it quite difficult to go farther.

> You're right, it isn't possible to sanely do something like "AUDIT
> SELECT ON TABLE t1 FOR role1" without storing configuration in the
> catalog tables. You're right, it would be nice to have that level
> of control.

Great, glad we agree on that. :)

> The part that I don't agree about is that using a GUC setting and
> a reloption won't give us anything useful. We can use that to do
> global, per-database, per-user, and per-object auditing with just
> mechanisms that are familiar to Postgres users already (ALTER … SET).

I've clearly miscommunicated and my apologies for that- it's not that
the feature set isn't useful or that using GUCs and reloptions wouldn't
be useful, it's much more than I don't want that to be our end-state and
I don't think we'll have an easy time moving from GUCs and reloptions to
the end-state that I'm looking at.  What I was balking at was the notion
that GUCs and reloptions would satisfy the auditing requirements for all
(or even most) of our users.  It would be sufficient and useful for
some, but certainly not all.

> Some other messages in the thread have suggested a similar mechanism,
> which implies that at least some people wouldn't be unhappy with it.

Sure.

> No, we couldn't do combinations of TABLE/ROLE, but there's also no
> terrible conflict with doing that via some new "AUDIT foo ON bar"
> syntax later.

We also wouldn't be able to provide fine-grained control over who is
allowed to define the auditing requirements and I don't see how you'd
support per-column auditing definitions or integrate auditing with RLS.

> As you point out, we're decades behind, and I seriously doubt if
> we're going to jump forward a few decades in time for 9.5.

Agreed- but as has been pointed out to me on the RLS thread, we don't
want to put something into 9.5 which will mean we can't ever catch up
because of backwards compatibility or pg_upgrade issues.

Thanks,

Stephen




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-26 Thread Stephen Frost
Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> "Which tables are audited" would be available via the reloptions
> field. 

RLS could be implemented through reloptions too.  Would it be useful to
some people?  Likely.  Would it satisfy the users who, today, are
actually asking for that feature?  No (or at least, not the ones that
I've talked with).  We could expand quite a few things to work through
reloptions but I don't see it as a particularly good design for complex
subsystems, of which auditing is absolutely one of those.

I've gotten quite a bit of push-back and been working through a much
more detailed (but, all-in-all, good, imv) design and roadmap for RLS
with targets for 9.5 and beyond.  I simply do not understand why these
same concerns apparently don't apply to pgaudit.  Being an extension
which lives in contrib can be, in many ways, worse than having nothing
at all distributed with PG because it causes pg_upgrade and backwards
compatibility issues.

> This is basically the difference between information being
> stored within the reloptions field (think of its as an hstore column)
> and being stored in a separate audit-specific column. I see no
> difference between them, we just need to provide a view that extracts
> the useful information. It is very important we add new things as
> reloptions so that we don't have to continually add new code elsewhere
> to handle all the new options we add.

Auditing isn't going to be a polished and solved piece of core PG with
an implementation that uses only a single additional column (even an
hstore or json column), imv.

> I don't agree with adding "AUDIT ..." syntax to the parser and having
> top-level SQL commands. That just bloats the parser for no particular
> gain. With reloptions we already support all the required DDL. No
> further work required.

I disagree that new top-level syntax is completely off the table, but
I'm open to another solution for SQL-based auditing specifications.  I
would argue that we must consider how we will support a permissions
system around auditing which differs from the owner of the relation or
owner of the database and certainly don't want the control over auditing
to require superuser rights.

> We want an audit feature in core 9.5, but I don't see those adding SQL
> and adding specific catalog columns as giving us anything at all. It
> will still be an audit feature without them.

I'd love to see auditing happen in 9.5 too.  I'd also like to see RLS in
9.5, but that doesn't mean I'm ignoring the concerns about making sure
that we have a good design which will add useful capabilities in 9.5
while allowing the more advanced capabilities to be added later.

> Many, many users do not want audit. Many do. So the idea is to allow
> audit to be added in a way that doesn't affect everybody else.

Sure.

> So lets commit pgaudit now and spend time on genuine enhancements to
> the functionality, not just rewriting the same thing into a different
> form that adds no value.

I think this will paint us into a corner such that we won't be able to
add the capabilities which the users who are most concerned about
auditing require.  I really don't want that to happen as it will mean
that we have an auditing system for the users who like the idea of
auditing but don't have real security issues, while the large industries
we're targeting won't be able to use PG.

Thanks,

Stephen




Re: [HACKERS] NUMA packaging and patch

2014-06-26 Thread Kohei KaiGai
Hello,

Let me comment on this patch.

It applies cleanly on the head of the master branch, builds, and passes
the regression tests.
What this patch tries to do is quite simple and obvious.
It advises the operating system to distribute physical pages across
all NUMA nodes on allocation.

One thing I'm concerned about is that it may conflict with the
NUMA-balancing feature supported in recent Linux kernels, which
migrates physical pages across NUMA zones according to the location
of the tasks that reference them.
# I'm not sure whether it is applied to shared memory regions.
# Please correct me if I misunderstood, but it looks to me like
# physical pages in shared memory are also moved.
http://events.linuxfoundation.org/sites/events/files/slides/summit2014_riel_chegu_w_0340_automatic_numa_balancing_0.pdf

The interleave policy should probably work well for an OLTP workload.
But how about an OLAP workload, if physical pages are transparently
migrated to the local node by the operating system?
In the OLAP case less concurrency is required, but a query runs
complicated logic (usually including full scans) on a particular
CPU.

Doesn't it make sense to have a GUC to control the NUMA policy?
In some cases it makes sense to allocate physical memory
according to the operating system's choice.

Thanks,

2014-06-11 2:34 GMT+09:00 Kevin Grittner :
> Josh Berkus  wrote:
>> On 06/08/2014 03:45 PM, Kevin Grittner wrote:
>>> By default, the OS cache and buffers are allocated in the memory
>>> node with the shortest "distance" from the CPU a process is
>>> running on.
>>
>> Note that this will stop being the default in future Linux kernels.
>> However, we'll have to deal with the old ones for some time to come.
>
> I was not aware of that.  Thanks.  Do you have a URL handy?
>
> In any event, that is the part of the problem which I think falls
> into the realm of packagers and/or sysadmins; a patch for that
> doesn't seem sensible, given how cpusets are implemented.  I did
> figure we would want to add some documentation around it, though.
> Do you agree that is worthwhile?
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



-- 
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] better atomics - v0.5

2014-06-26 Thread Noah Misch
On Thu, Jun 26, 2014 at 12:20:06PM +0200, Andres Freund wrote:
> On 2014-06-25 20:16:08 -0400, Robert Haas wrote:
> > On Wed, Jun 25, 2014 at 4:36 PM, Andres Freund wrote:
> > > Since it better be legal to manipulate a atomic variable while holding a
> > > spinlock we cannot simply use an arbitrary spinlock as backing for
> > > atomics. That'd possibly cause us to wait on ourselves or cause
> > > deadlocks.
> > 
> > I think that's going to fall afoul of Tom's previously-articulated "no
> > loops inside spinlocks" rule.  Most atomics, by nature, are
> > loop-until-it-works.
> 
> Well, so is TAS itself :).
> 
> More seriously, I think we're not going to have much fun if we're making
> up the rule that you can't do an atomic add/sub while a spinlock is
> held. That just precludes to many use cases and will make the code much
> harder to understand. I don't think we're going to end up having many
> problems if we allow atomic read/add/sub/write in there.

I agree it's valuable to have a design that permits invoking an atomic
operation while holding a spinlock.

> > > I added the x86 inline assembler because a fair number of buildfarm
> > > animals use ancient gcc's and because I could easily test it. It's also
> > > more efficient for gcc < 4.6. I'm not wedded to keeping it.
> > 
> > Hmm. gcc 4.6 is only just over a year old, so if pre-4.6
> > implementations aren't that good, that's a pretty good argument for
> > keeping our own implementations around.  :-(

GCC 4.6 was released 2011-03-25, though that doesn't change the bottom line.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




[HACKERS] how can i prevent materialized views from refreshing during pg_restore

2014-06-26 Thread Kirk Roybal
I asked a question over on StackOverflow, and Craig Ringer told me to
report it here.

http://stackoverflow.com/questions/24413161/how-can-i-prevent-materialized-views-from-refreshing-during-pg-restore

I have created a dump of the database using pg_dump in "custom" format
(-Fc). This format allows for pg_restore to be invoked with the "jobs"
option (-j8). The jobs options starts 8 processes, and restores the vast
majority of relations in my database within 10 minutes.

I'm left with 4 processes. One of them is the refresh of a materialized
view, and the other 3 are indexes to be applied to 3 tables that the
materialized view uses as data sources. The indexes are "waiting"
according to pg_stat_activity, presumably because the REFRESH of the
materialized view is still accessing the source tables.

When the indexes are in place, the refresh of the view only takes a couple
of minutes. Because the indexes are not in place during the REFRESH, I cut
the REFRESH process off at 17 hours, which made pg_restore fail.

How can I

1. Force the order of items so the indexes get created first
2. Turn off the refresh of the materialized view and do it manually later
3. Manipulate the dump file in custom format to say "WITH NO DATA"
4. Intercept the REFRESH MATERIALIZED VIEW statement and throw it in the trash

Or any other solution that gets the job done?

I have a dump file that I'm willing to send to somebody that seems to
reproduce the problem pretty consistently.


Thank you,
/Kirk





Re: [HACKERS] ALTER SYSTEM RESET?

2014-06-26 Thread Vik Fearing
On 06/26/2014 05:07 AM, Amit Kapila wrote:
> On Wed, Jun 25, 2014 at 9:56 PM, Vik Fearing wrote:
>> On 06/25/2014 03:04 PM, Amit Kapila wrote:
>> > Currently you can achieve that by
>> > "ALTER SYSTEM RESET guc = Default;".
>> > However it will be good to have support for RESET as well.  I think it
>> > should not be too complicated to implement that syntax, I personally
>> > don't have bandwidth to it immediately, but I would like to take care
>> > of it unless you or someone wants to do it by the time I get some
>> > bandwidth.
>>
>> Would something like this suffice?
> 
> I think it will make sense if we support RESET ALL as well similar
> to Alter Database .. RESET ALL syntax.  Do you see any reason
> why we shouldn't support RESET ALL syntax for Alter System?

Yes, that makes sense.  I've added that in the attached version 2 of the
patch.

> About patch:
> 
> + | ALTER SYSTEM_P RESET var_name
> + {
> + AlterSystemStmt *n = makeNode(AlterSystemStmt);
> + n->setstmt = makeNode(VariableSetStmt);
> + n->setstmt->kind = VAR_RESET;
> + n->setstmt->name = $4;
> + $$ = (Node *)n;
> + }
> 
> I think it would be better to have ALTER SYSTEM_P as generic and
> then SET | RESET as different versions, something like below:
> 
> | SET reloptions
> {
> AlterTableCmd *n = makeNode(AlterTableCmd);
> n->subtype = AT_SetRelOptions;
> n->def = (Node *)$2;
> $$ = (Node *)n;
> }
> /* ALTER TABLE  RESET (...) */
> | RESET reloptions
> {
> AlterTableCmd *n = makeNode(AlterTableCmd);
> n->subtype = AT_ResetRelOptions;
> n->def = (Node *)$2;
> $$ = (Node *)n;
> }
> 
> Another point is that if we decide to support RESET ALL syntax, then
> we might want reuse VariableResetStmt, may be by breaking into
> generic and specific like we have done for generic_set.

I didn't quite follow your ALTER TABLE example because I don't think
it's necessary, but I did split out VariableResetStmt like you suggested
because I think that is indeed cleaner.

> Thanks for working on patch.

You're welcome.
-- 
Vik
*** a/doc/src/sgml/ref/alter_system.sgml
--- b/doc/src/sgml/ref/alter_system.sgml
***
*** 22,27  PostgreSQL documentation
--- 22,30 
   
  
  ALTER SYSTEM SET configuration_parameter { TO | = } { value | 'value' | DEFAULT }
+ 
+ ALTER SYSTEM RESET configuration_parameter
+ ALTER SYSTEM RESET ALL
  
   
  
***
*** 30,37  ALTER SYSTEM SET configuration_parameter
 ALTER SYSTEM writes the configuration parameter
!values to the postgresql.auto.conf file. With
!DEFAULT, it removes a configuration entry from
 postgresql.auto.conf file. The values will be
 effective after reload of server configuration (SIGHUP) or in next 
 server start based on the type of configuration parameter modified.
--- 33,40 
  

 ALTER SYSTEM writes the configuration parameter
!values to the postgresql.auto.conf file. Setting the parameter to
!DEFAULT, or using the RESET variant, removes the configuration entry from
 postgresql.auto.conf file. The values will be
 effective after reload of server configuration (SIGHUP) or in next 
 server start based on the type of configuration parameter modified.
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
***
*** 401,407  static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
  
  %type 	insert_rest
  
! %type  generic_set set_rest set_rest_more SetResetClause FunctionSetResetClause
  
  %type 	TableElement TypedTableElement ConstraintElem TableFuncElement
  %type 	columnDef columnOptions
--- 401,407 
  
  %type 	insert_rest
  
! %type  generic_set set_rest set_rest_more generic_reset reset_rest SetResetClause FunctionSetResetClause
  
  %type 	TableElement TypedTableElement ConstraintElem TableFuncElement
  %type 	columnDef columnOptions
***
*** 1567,1605  NonReservedWord_or_Sconst:
  		;
  
  VariableResetStmt:
! 			RESET var_name
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET;
! 	n->name = $2;
! 	$$ = (Node *) n;
  }
! 			| RESET TIME ZONE
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET;
! 	n->name = "timezone";
! 	$$ = (Node *) n;
  }
! 			| RESET TRANSACTION ISOLATION LEVEL
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET;
! 	n->name = "transaction_isolation";
! 	$$ = (Node *) n;
  }
! 			| RESET SESSION AUTHORIZATION
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET;
! 	n->name = "session_authorization";
! 	$$ = (Node *) n;
  }
! 			| RESET ALL
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET_ALL;
! 	$$ = (Node *) n;
  }
  		;
  
--- 1567,1613 
  		;
  
  VariableResetStmt:
! 			RESET reset_rest		{ $$ = (Node *) $2; }
! 		;
! 
! reset_rest:
! 			generic_reset			

[HACKERS] Re: how can i prevent materialized views from refreshing during pg_restore

2014-06-26 Thread David G Johnston
bithead wrote
> I asked a question over on StackOverflow, and Craig Ringer told me to
> report it here.
> 
> http://stackoverflow.com/questions/24413161/how-can-i-prevent-materialized-views-from-refreshing-during-pg-restore
> 
> I have created a dump of the database using pg_dump in "custom" format
> (-Fc). This format allows for pg_restore to be invoked with the "jobs"
> option (-j8). The jobs options starts 8 processes, and restores the vast
> majority of relations in my database within 10 minutes.
> 
> I'm left with 4 processes. One of them is the refresh of a materialized
> view, and the other 3 are indexes to be applied to 3 tables that the
> materialized view uses as data sources. The indexes are "waiting"
> according to pg_stat_activity, presumably because the REFRESH of the
> materialized view is still accessing the source tables.
> 
> When the indexes are in place, the refresh of the view only takes a couple
> of minutes. Because the indexes are not in place during the REFRESH, I cut
> the REFRESH process off at 17 hours, which made pg_restore fail.
> 
> How can I
> 
> Force the order of items so the indexes get created first
> Turn off the refresh of the materialized view and do it manually later
> Manipulate the dump file in custom format to say "WITH NO DATA"
> Intercept the REFRESH MATERIALIZED VIEW statement and throw it in the
> trash
> 
> Or any other solution that gets the job done?
> 
> I have a dump file that I'm willing to send to somebody that seems to
> reproduce the problem pretty consistently.

Have/can you try the "-l (el) & -L" options to pg_restore?

http://www.postgresql.org/docs/9.3/static/app-pgrestore.html

(example of usage is toward the bottom of the page)

Basically re-order the command sequence so that the materialize runs as late
as possible, or just disable it altogether.

pg_dump/pg_restore should be taught to handle this better, which is the main
reason why Craig had you post here ASAP, but to get it functional for now
manual intervention will be necessary.  In theory the "listing" capabilities
should allow you to do what you need.

David J.








Re: [HACKERS] Re: how can i prevent materialized views from refreshing during pg_restore

2014-06-26 Thread Kirk Roybal
I think this (pg_restore -l | pg_restore -L) will get me where I need to
go for now by inserting a small shell script in between that pushes the
materialized views to the end of the list, but then I will also have to
manage my own dependencies for the items that I re-sort (MatViews of
MatViews).
This pretty seriously limits the usefulness of materialized views for me. 
For version 9.3.x, I'm likely to require MatView dependencies no more than
1 deep.
Thanks for the answer, I had no solution before that.

/Kirk


> bithead wrote
>> I asked a question over on StackOverflow, and Craig Ringer told me to
>> report it here.
>>
>> http://stackoverflow.com/questions/24413161/how-can-i-prevent-materialized-views-from-refreshing-during-pg-restore
>>
>> I have created a dump of the database using pg_dump in "custom" format
>> (-Fc). This format allows for pg_restore to be invoked with the "jobs"
>> option (-j8). The jobs options starts 8 processes, and restores the vast
>> majority of relations in my database within 10 minutes.
>>
>> I'm left with 4 processes. One of them is the refresh of a materialized
>> view, and the other 3 are indexes to be applied to 3 tables that the
>> materialized view uses as data sources. The indexes are "waiting"
>> according to pg_stat_activity, presumably because the REFRESH of the
>> materialized view is still accessing the source tables.
>>
>> When the indexes are in place, the refresh of the view only takes a
>> couple
>> of minutes. Because the indexes are not in place during the REFRESH, I
>> cut
>> the REFRESH process off at 17 hours, which made pg_restore fail.
>>
>> How can I
>>
>> Force the order of items so the indexes get created first
>> Turn off the refresh of the materialized view and do it manually
>> later
>> Manipulate the dump file in custom format to say "WITH NO DATA"
>> Intercept the REFRESH MATERIALIZED VIEW statement and throw it in
>> the
>> trash
>>
>> Or any other solution that gets the job done?
>>
>> I have a dump file that I'm willing to send to somebody that seems to
>> reproduce the problem pretty consistently.
>
> Have/can you try the "-l (el) & -L" options to pg_restore?
>
> http://www.postgresql.org/docs/9.3/static/app-pgrestore.html
>
> (example of usage is toward the bottom of the page)
>
> Basically re-order the command sequence so that the materialize runs as
> late
> as possible, or just disable it altogether.
>
> pg_dump/pg_restore should be taught to handle this better, which is the
> main
> reason why Craig had you post here ASAP, but to get it functional for now
> manual intervention will be necessary.  In theory the "listing"
> capabilities
> should allow you to do what you need.
>
> David J.
>
>
>
>
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/how-can-i-prevent-materialized-views-from-refreshing-during-pg-restore-tp5809364p5809367.html
> Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


-- 
--bithead--





Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-06-26 Thread Tomas Vondra

Hi,

On 2014-06-26 14:10, Pavel Stehule wrote:

Hello all,

today I had to work with one slow query - when I checked different
scenarios I found a dependency on work_mem size - but it is atypical -
a bigger work_mem increased query execution time to 31 minutes (600MB
work_mem) versus 1 minute (1MB).


The problem described in Pavel's emails (illustrated by the awful
explain plans) is in this part:

 ->  Hash  (cost=881801.58..881801.58 rows=61735 width=8) (actual time=9076.153..9076.153 rows=3310768 loops=1)


That is, the estimated number of rows is ~60k, but in reality it's ~3.3M.
This then leads to a hash table with a small number of buckets (8192),
each containing a large number of tuples (~400 in this case) in a linked
list, which significantly slows down the lookups during the hash join.

This issue is actually quite common - all it takes is a significant
underestimation of the hashed relation, either because it's a complex
join (thus inherently difficult to estimate) or because the WHERE
conditions are not independent (see the example below).

The attached patch (early WIP, after ~30 minutes of hacking) addresses
this by increasing the number of buckets whenever the average number of
tuples per bucket exceeds 1.5x NTUP_PER_BUCKET.


=== Example 

create table table_a (id int, fk_id int);
create table table_b (fk_id int, val_a int, val_b int);

insert into table_a select i, i from generate_series(1,1000) s(i);
insert into table_b select i, mod(i,1000), mod(i,1000) from 
generate_series(1,1000) s(i);


analyze table_a;
analyze table_b;

explain analyze select count(*) from table_a join table_b on (table_a.id 
= table_b.fk_id) where val_a < 10 and val_b < 10;



without the patch:

QUERY PLAN
----------------------------------------------------------------------
 Aggregate  (cost=385834.56..385834.57 rows=1 width=0) (actual time=49543.263..49543.264 rows=1 loops=1)
   ->  Hash Join  (cost=204069.89..385831.16 rows=1359 width=0) (actual time=923.751..49531.554 rows=10 loops=1)
         Hash Cond: (table_a.id = table_b.fk_id)
         ->  Seq Scan on table_a  (cost=0.00..144247.77 rows=977 width=4) (actual time=0.104..967.090 rows=1000 loops=1)
         ->  Hash  (cost=204052.90..204052.90 rows=1359 width=4) (actual time=923.615..923.615 rows=10 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 3516kB
               ->  Seq Scan on table_b  (cost=0.00..204052.90 rows=1359 width=4) (actual time=0.086..910.656 rows=10 loops=1)
                     Filter: ((val_a < 10) AND (val_b < 10))
                     Rows Removed by Filter: 990
 Planning time: 0.271 ms
 Execution time: 49545.053 ms
(11 rows)


with the patch:

QUERY PLAN
----------------------------------------------------------------------
 Aggregate  (cost=385834.56..385834.57 rows=1 width=0) (actual time=9780.346..9780.347 rows=1 loops=1)
   ->  Hash Join  (cost=204069.89..385831.16 rows=1359 width=0) (actual time=939.297..9772.256 rows=10 loops=1)
         Hash Cond: (table_a.id = table_b.fk_id)
         ->  Seq Scan on table_a  (cost=0.00..144247.77 rows=977 width=4) (actual time=0.103..962.446 rows=1000 loops=1)
         ->  Hash  (cost=204052.90..204052.90 rows=1359 width=4) (actual time=939.172..939.172 rows=10 loops=1)
               Buckets: 8192  Batches: 1  Memory Usage: 3516kB
               ->  Seq Scan on table_b  (cost=0.00..204052.90 rows=1359 width=4) (actual time=0.064..909.191 rows=10 loops=1)
                     Filter: ((val_a < 10) AND (val_b < 10))
                     Rows Removed by Filter: 990
 Planning time: 0.276 ms
 Execution time: 9782.295 ms
(11 rows)

Time: 9784.392 ms

So the duration improved significantly - from ~52 seconds to ~10 seconds.


The patch is not perfect, it needs a bit more love, as illustrated by 
the FIXME/TODO items. Feel free to comment.


regards
Tomas

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 589b2f1..d71b8b9 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -39,6 +39,7 @@
 
 
 static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
+static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
 static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
 	  int mcvsToUse);
 static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -682,6 +683,77 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 }
 
 /*
+ * ExecHashIncreaseNumBuckets
+ *		increase the original number of buckets in order to reduce
+ *		number of tuples per bucket
+ */
+static void
+ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
+{
+	int			oldnbuckets = hashtable->nbuckets;
+	int			i;
+

Re: [HACKERS] NUMA packaging and patch

2014-06-26 Thread Claudio Freire
On Thu, Jun 26, 2014 at 11:18 AM, Kohei KaiGai  wrote:
> One thing I concern is, it may conflict with numa-balancing
> features that is supported in the recent Linux kernel; that
> migrates physical pages according to the location of tasks
> which references the page beyond the numa zone.
> # I'm not sure whether it is applied on shared memory region.
> # Please correct me if I misunderstood. But it looks to me
> # physical page in shared memory is also moved.
> http://events.linuxfoundation.org/sites/events/files/slides/summit2014_riel_chegu_w_0340_automatic_numa_balancing_0.pdf


Sadly, it excludes the OS cache explicitly (when it mentions libc.so),
which is one of the hottest sources of memory bandwidth consumption in
a database.




Re: [HACKERS] What's the point of json_extract_path_op etc?

2014-06-26 Thread Andrew Dunstan


On 06/25/2014 02:46 PM, Tom Lane wrote:

Why do we have essentially duplicate pg_proc entries for json_extract_path
and json_extract_path_op?  The latter is undocumented and seems only to be
used as the infrastructure for the #> operator.  I see that only the
former is marked variadic, but AFAIK the operator machinery couldn't care
less about that, so it seems to me we could get rid of the
json_extract_path_op entry and point the operator at json_extract_path.

Likewise for json_extract_path_text_op, jsonb_extract_path_op, and
jsonb_extract_path_text_op.






ISTR trying that and running into problems, maybe with opr_sanity checks.

But if you can get rid of them cleanly then by all means do.

cheers

andrew




Re: [HACKERS] NUMA packaging and patch

2014-06-26 Thread Kevin Grittner
Claudio Freire  wrote:

> Sadly, it excludes the OS cache explicitly (when it mentions libc.so),
> which is one of the hottest sources of memory bandwidth consumption in
> a database.

Agreed.  On the bright side, the packagers and/or sysadmins can fix this
without any changes to the PostgreSQL code, by creating a custom cpuset
and using it during launch of the postmaster.  I went through that
exercise in my original email.  This patch complements that by
preventing one CPU from managing all of PostgreSQL shared memory, and
thus becoming a bottleneck.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RLS Design

2014-06-26 Thread Robert Haas
On Wed, Jun 25, 2014 at 4:48 PM, Dean Rasheed  wrote:
>> Instead of doing it this way, we could instead do:
>>
>> ALTER ROLE role1 ADD POLICY p1;
>> ALTER ROLE role2 ADD POLICY p2;
>>
>> We could possibly allow multiple policies to be set for the same user,
>> but given an error (or OR the quals together) if there are conflicting
>> policies for the same table.  A user with no policies would see
>> everything to which they've been granted access.
>>
> I'm a bit uneasy about allowing overlapping policies like this,
> because I think it is more likely to lead to unintended consequences
> than solve real use cases. For example, suppose you define policies p1
> and p2 and set them up on table t1, and you grant role1 permissions on
> t1 and allow role1 the use of policy p1. Then you set up policy p2 on
> another table t2, and decide you want to allow role1 access to t2
> using this policy. The only way to do it is to add p2 to role1, but
> doing so also then gives role1 access to t1 using p2, which might not
> have been what you intended.

I guess that's true but it just seems like a configuration error.  I
have it in mind that most people will define policies for
non-overlapping sets of tables and then apply those policies as
appropriate to each user.  Whether that's true or not, I don't see it
as being materially different from granting membership in a role - you
could easily give the user permission to do stuff they shouldn't be
able to do, but if you don't carefully examine the bundle of
privileges that come with that GRANT before executing on it, that's
your fault, not the system's.

>> To support different policies on different operations, you could have
>> something like:
>>
>> ALTER TABLE t1 SET POLICY p1 ON INSERT TO t1_p1_quals;
>>
>> Without the ON clause, it would establish the given policy for all 
>> operations.
>
> Yes, that makes sense. But as I was arguing above, I think the ACLs
> should be attached to the specific RLS policy identified uniquely by
> (table, policy, command). So, for example, if you did
>
> ALTER TABLE t1 SET POLICY p1 ON SELECT TO t1_p1_sel_quals;
> ALTER TABLE t1 SET POLICY p1 ON UPDATE TO t1_p1_upd_quals;
>
> you could also do
>
> GRANT SELECT ON TABLE t1 TO role1 USING p1;
> GRANT UPDATE ON TABLE t1 TO role1 USING p1;
>
> but it would be an error to do
>
> GRANT DELETE ON TABLE t1 TO role1 USING p1;

As I see it, the downside of this is that it gets a lot more complex.
We have to revise the ACL representation, which is already pretty darn
complicated, to keep track not only of the grantee, grantor, and
permissions, but also the policies qualifying those permissions.  The
changes to GRANT will need to propagate into GRANT ON ALL TABLES IN
SCHEMA and AFTER DEFAULT PRIVILEGES.  There is administrative
complexity as well, because if you want to policy-protect an
additional table, you've got to add the table to the policy and then
update all the grants as well.  I think what will happen in practice
is that people will grant to PUBLIC all rights on the policy, and then
do all the access control through the GRANT statements.

An interesting question we haven't much considered is: who can set up
policies and add them to users?  Maybe we should flip this around, and
instead of adding users to policies, we should exempt users from
policies.

CREATE POLICY p1;

And then, if they own p1 and t1, they can do:

ALTER TABLE t1 SET POLICY p1 TO t1_p1_quals;
(or maybe we should associate it to the policy instead of the table:
ALTER POLICY p1 SET TABLE t1 TO t1_p1_quals)

And then the policy applies to everyone who doesn't have the grantable
EXEMPT privilege on the policy.  The policy owner and superuser have
that privilege by default and it can be handed out to others like
this:

GRANT EXEMPT ON POLICY p1 TO snowden;

Then users who have row_level_security=on will bypass RLS if possible,
and otherwise it will be applied.  Users who have
row_level_security=off will bypass RLS if possible, and otherwise
error.  And users who have row_level_security=force will apply RLS
even if they are entitled to bypass it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-26 Thread Simon Riggs
On 26 June 2014 14:59, Stephen Frost  wrote:

> I think this will paint us into a corner such that we won't be able to
> add the capabilities which the users who are most concerned about
> auditing require.

I'm sorry, but this seems exactly the wrong way around to me.

The point here is that we have an extension, right now. Per table
auditing can be supported trivially via reloptions. The alternative is
to fatten the grammar with loads of noise words and invent some new
specific catalog stuff. That gives us no new features over what we
have here, plus it has a hard cost of at least 3 man months work -
starting that now endangers getting a feature into 9.5. Doing that
will force the audit feature to be an in-core only solution, meaning
it cannot be tailored easily for individual requirements and it will
evolve much more slowly towards where our users want it to be.

So I see your approach costing more, taking longer and endangering the
feature schedule, yet offering nothing new. The hard cost isn't
something that should be ignored, we could spend money adding new
features or we could waste it rewriting things. Cost may mean little
to some, but we need to realise that increasing costs may make
something infeasible. We have at most 1 more man month of funds to
assist here, after that we're into volunteer time, which never goes
far.

Anyway, what we should do now is talk about what features we want and
detail what the requirements are, so we stand a chance of assessing
things in the right context.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL access to database attributes

2014-06-26 Thread Vik Fearing
On 06/23/2014 06:45 PM, Pavel Stehule wrote:
> 
> 
> 
> 2014-06-23 18:39 GMT+02:00 Vik Fearing  >:
> 
> On 06/23/2014 06:21 PM, Robert Haas wrote:
> > On Sun, Jun 22, 2014 at 2:59 PM, Pavel Stehule
> mailto:pavel.steh...@gmail.com>> wrote:
> >> I found only one problem - first patch introduce a new property
> >> CONNECTION_LIMIT and replace previously used "CONNECTION LIMIT" in
> >> documentation. But "CONNECTION LIMIT" is still supported, but it
> is not
> >> documented. So for some readers it can look like breaking
> compatibility, but
> >> it is false. This should be documented better.
> >
> > Yeah, I think the old syntax should be documented also.
> 
> Why do we want to document syntax that should eventually be deprecated?
> 
> 
> It is fair to our users. It can be deprecated, ok, we can write in doc -
> this feature will be deprecated in next three years. Don't use it, but
> this should be documentated.

Okay, here is version two of the refactoring patch that documents that
the with-space version is deprecated but still accepted.

The feature patch is not affected by this and so I am not attaching a
new version of that.
-- 
Vik
*** a/contrib/sepgsql/expected/alter.out
--- b/contrib/sepgsql/expected/alter.out
***
*** 110,116  SET search_path = regtest_schema, regtest_schema_2, public;
  --
  -- misc ALTER commands
  --
! ALTER DATABASE regtest_sepgsql_test_database CONNECTION LIMIT 999;
  LOG:  SELinux: allowed { setattr } scontext=unconfined_u:unconfined_r:unconfined_t:s0 tcontext=unconfined_u:object_r:sepgsql_db_t:s0 tclass=db_database name="regtest_sepgsql_test_database"
  ALTER DATABASE regtest_sepgsql_test_database SET search_path TO regtest_schema, public; -- not supported yet
  ALTER TABLE regtest_table ADD COLUMN d float;
--- 110,116 
  --
  -- misc ALTER commands
  --
! ALTER DATABASE regtest_sepgsql_test_database CONNECTION_LIMIT 999;
  LOG:  SELinux: allowed { setattr } scontext=unconfined_u:unconfined_r:unconfined_t:s0 tcontext=unconfined_u:object_r:sepgsql_db_t:s0 tclass=db_database name="regtest_sepgsql_test_database"
  ALTER DATABASE regtest_sepgsql_test_database SET search_path TO regtest_schema, public; -- not supported yet
  ALTER TABLE regtest_table ADD COLUMN d float;
*** a/contrib/sepgsql/sql/alter.sql
--- b/contrib/sepgsql/sql/alter.sql
***
*** 85,91  SET search_path = regtest_schema, regtest_schema_2, public;
  --
  -- misc ALTER commands
  --
! ALTER DATABASE regtest_sepgsql_test_database CONNECTION LIMIT 999;
  ALTER DATABASE regtest_sepgsql_test_database SET search_path TO regtest_schema, public; -- not supported yet
  
  ALTER TABLE regtest_table ADD COLUMN d float;
--- 85,91 
  --
  -- misc ALTER commands
  --
! ALTER DATABASE regtest_sepgsql_test_database CONNECTION_LIMIT 999;
  ALTER DATABASE regtest_sepgsql_test_database SET search_path TO regtest_schema, public; -- not supported yet
  
  ALTER TABLE regtest_table ADD COLUMN d float;
*** a/doc/src/sgml/ref/alter_database.sgml
--- b/doc/src/sgml/ref/alter_database.sgml
***
*** 25,31  ALTER DATABASE name [ [ WITH ] where option can be:
  
! CONNECTION LIMIT connlimit
  
  ALTER DATABASE name RENAME TO new_name
  
--- 25,31 
  
  where option can be:
  
! CONNECTION_LIMIT connection_limit
  
  ALTER DATABASE name RENAME TO new_name
  
***
*** 107,117  ALTER DATABASE name RESET ALL
   
  
   
!   connlimit

 
  How many concurrent connections can be made
  to this database.  -1 means no limit.
 

   
--- 107,119 
   
  
   
!   connection_limit

 
  How many concurrent connections can be made
  to this database.  -1 means no limit.
+ The deprecated spelling CONNECTION LIMIT is
+ still accepted.
 

   
*** a/doc/src/sgml/ref/create_database.sgml
--- b/doc/src/sgml/ref/create_database.sgml
***
*** 28,34  CREATE DATABASE name
 [ LC_COLLATE [=] lc_collate ]
 [ LC_CTYPE [=] lc_ctype ]
 [ TABLESPACE [=] tablespace_name ]
![ CONNECTION LIMIT [=] connlimit ] ]
  
   
  
--- 28,34 
 [ LC_COLLATE [=] lc_collate ]
 [ LC_CTYPE [=] lc_ctype ]
 [ TABLESPACE [=] tablespace_name ]
![ CONNECTION_LIMIT [=] connection_limit ] ]
  
   
  
***
*** 148,158  CREATE DATABASE name
   
  
   
!   connlimit

 
  How many concurrent connections can be made
  to this database.  -1 (the default) means no limit.
 

   
--- 148,160 
   
  
   
!   connection_limit

 
  How many concurrent connections can be made
  to this database.  -1 (the default) means no limit.
+ The deprecated spelling CONNECTION LIMIT is
+ still accepted.

Re: [HACKERS] Changes in amcanreturn() interface to support multicolumn indexes

2014-06-26 Thread Tom Lane
Anastasia Lubennikova  writes:
> There is a question of support multicolumn index only scans for GIST.
> gistcanreturn() can return true if fetch is implemented for all indexed
> columns and false otherwise.
> But that's not very good case for multicolumn indexes.

> I think, it requires extend the interface to add separate columns checking.
> But I can't understand what kind of changes is required
> and how would it affect on previous amcanreturn interface.

Yeah, rather than just returning a bool, I suppose we need it to return
an indication of which index columns it's prepared to return values for.
(And then I guess the runtime API should return nulls for index columns
it can't handle.)  You could probably use either a bitmapset or an integer
List to return the column numbers.

I wouldn't be too worried about backwards compatibility in changing that
API.  If there are any people supporting outside-core index AMs, they've
got much bigger problems than this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-26 Thread Sawada Masahiko
Thank you for updating the patch.
I think that the following behaviour of pg_resetxlog is bug.

$ pg_controldata data | grep "Database system identifier"
Database system identifier:   6029284919152642525

--
$ pg_resetxlog -s0 -n data
Current pg_control values:

pg_control version number:            942
Catalog version number:   201406181
Database system identifier:   6029284919152642525
Latest checkpoint's TimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:  0/1810
Latest checkpoint's NextOID:  13004
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:1800
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Maximum data alignment:   8
Database block size:  8192
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Size of a large-object chunk: 2048
Date/time type storage:   64-bit integers
Float4 argument passing:  by value
Float8 argument passing:  by value
Data page checksum version:   0


Values to be changed:

First log segment after reset:    000000010000000000000002

--
$ pg_resetxlog  -s0 data
Transaction log reset
$ pg_controldata data | grep "Database system identifier"
Database system identifier:   6029284919152642525

This patch does not work correctly with -s0.

Regards,
--
Sawada Masahiko

On Thursday, June 26, 2014, Petr Jelinek  wrote:

> On 25/06/14 19:43, Sawada Masahiko wrote:
>
>> Hi,
>>
>> I send you review comment about thie patch.
>>
>
> Hello, thanks.
>
>  --
>> $ pg_resetxlog -s -n  data | grep "Database system identifier"
>> Database system identifier:   6028907917695471865
>>
>> The -s option does not work correctly with the -n option.
>>
>
> Fixed.
>
>  --
>> $ pg_resetxlog
>> -s6028907917695471865
>> 11
>> data
>> Transaction log reset
>> $ pg_controldata data | grep "Database system identifier"
>> Database system identifier:   18446744073709551615
>>
>> pg_resetxlog is finished successfully, but system identifier was not
>> changed.
>> Also I think that checking data about number of digits is needed.
>>
>>
> It actually did change the identifier, just to ULONG_MAX, since that's the
> maximum value that can be set (scanf does that conversion), I added check
> that limits the maximum value of system identifier input to ULONG_MAX-1 and
> reports error if it's bigger. I also added some additional input validation.
>
> --
>  Petr Jelinek  http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


-- 
Regards,

---
Sawada Masahiko


Re: [HACKERS] Allowing NOT IN to use ANTI joins

2014-06-26 Thread Tom Lane
David Rowley  writes:
> If there's no way to tell that the target entry comes from a left join,
> then would it be a bit too quirky to just do the NOT NULL checking when
> list_length(query->rtable) == 1 ? or maybe even loop over query->rtable and
> abort if we find an outer join type? it seems a bit hackish, but perhaps it
> would please more people than it would surprise.

Why do you think you can't tell if the column is coming from the wrong
side of a left join?

I don't recall right now if there is already machinery in the planner for
this, but we could certainly deconstruct the from-clause enough to tell
that.

It's not hard to imagine a little function that recursively descends the
join tree, not bothering to descend into the nullable sides of outer
joins, and returns "true" if it finds a RangeTableRef for the desired RTE.
If it doesn't find the RTE in the not-nullable parts of the FROM clause,
then presumably it's nullable.

The only thing wrong with that approach is if you have to call the
function many times, in which case discovering the information from
scratch each time could get expensive.

I believe deconstruct_jointree already keeps track of whether it's
underneath an outer join, so if we were doing this later than that,
I'd advocate making sure it saves the needed information.  But I suppose
you're trying to do this before that.  It might be that you could
easily save aside similar information during the earlier steps in
prepjointree.c.  (Sorry for not having examined the patch yet, else
I'd probably have a more concrete suggestion.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-06-26 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 26 June 2014 14:59, Stephen Frost  wrote:
> > I think this will paint us into a corner such that we won't be able to
> > add the capabilities which the users who are most concerned about
> > auditing require.
> 
> I'm sorry, but this seems exactly the wrong way around to me.

Can you explain how we are going to migrate from the proposed pgaudit
approach to an in-core approach with the flexibility discussed?  That's
really what I'd like to hear and to discuss.

I don't understand why we shouldn't be considering and discussing that
at this point.

> The point here is that we have an extension, right now.

I understand that.  We have a patch for async I/O too.  We also have an
RLS patch today (one which has been around for a few months...).  That
doesn't mean these things can just go in without any further discussion.
I can't just go commit RLS and ignore the concerns raised by Robert and
others simply because I have an implementation today, yet that seems to
be the thrust of this argument.

> Per table
> auditing can be supported trivially via reloptions. The alternative is
> to fatten the grammar with loads of noise words and invent some new
> specific catalog stuff. That gives us no new features over what we
> have here, plus it has a hard cost of at least 3 man months work -
> starting that now endangers getting a feature into 9.5. 

I'm happy to hear your concerns regarding the level of effort
required, and I agree that we need to push to have things worked
through and hopefully committed earlier in this cycle rather than
waiting till the end and possibly missing 9.5.

> Doing that
> will force the audit feature to be an in-core only solution, meaning
> it cannot be tailored easily for individual requirements and it will
> evolve much more slowly towards where our users want it to be.

This makes less than zero sense to me - having it in core allows us
flexibility with regard to what it does and how it works (more so than
an extension does, in fact... the extension is obviously constrained in
terms of what it is capable of doing).  I agree that changes to an
in-core solution will need to follow the PG release cycle- but the same
is true of contrib modules.  If you want your own release cycle for
pgaudit which is independent of the PG one, then you should be putting
it up on PGXN rather than submitting it to be included in contrib.

The rate of evolution is defined by those who are working on it, with
the caveat that new features, etc, happen as part of the PG release
cycle which are snapshots of where we are at a given point.

> So I see your approach costing more, taking longer and endangering the
> feature schedule, yet offering nothing new.

Just because you choose to ignore the additional flexibility and
capability which an in-core solution would offer doesn't mean that it
doesn't exist.  Having a real SQL syntax which DBAs can use through the
PG protocol and tailor to their needs, with a permissions model around
it, is a valuable feature over and above what is being proposed here.

> The hard cost isn't
> something that should be ignored, we could spend money adding new
> features or we could waste it rewriting things. Cost may mean little
> to some, but we need to realise that increasing costs may make
> something infeasible. We have at most 1 more man month of funds to
> assist here, after that we're into volunteer time, which never goes
> far.

I'm quite aware that there are costs to doing development.  I do not
agree that means we should be compromising or putting ourselves in a
position which prevents us from moving forward.

> Anyway, what we should do now is talk about what features we want and
> detail what the requirements are, so we stand a chance of assessing
> things in the right context.

Sure, I had been doing that earlier in the thread before we got off on
this tangent..

Thanks,

Stephen




Re: [HACKERS] better atomics - v0.5

2014-06-26 Thread Robert Haas
On Thu, Jun 26, 2014 at 6:20 AM, Andres Freund  wrote:
> On 2014-06-25 20:16:08 -0400, Robert Haas wrote:
>> On Wed, Jun 25, 2014 at 4:36 PM, Andres Freund  
>> wrote:
>> > Since it better be legal to manipulate a atomic variable while holding a
>> > spinlock we cannot simply use an arbitrary spinlock as backing for
>> > atomics. That'd possibly cause us to wait on ourselves or cause
>> > deadlocks.
>>
>> I think that's going to fall afoul of Tom's previously-articulated "no
>> loops inside spinlocks" rule.  Most atomics, by nature, are
>> loop-until-it-works.
>
> Well, so is TAS itself :).

Yeah, which is why we have a rule that you're not suppose to acquire
another spinlock while already holding one.  You're trying to make the
spinlocks used to simulate atomic ops an exception to that rule, but
I'm not sure that's OK.  An atomic op is a lot more expensive than a
regular unlocked load or store even if it doesn't loop, and if it does
loop, it's worse, potentially by a large multiple.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] better atomics - v0.5

2014-06-26 Thread Robert Haas
On Thu, Jun 26, 2014 at 7:21 AM, Andres Freund  wrote:
> On 2014-06-25 20:22:31 -0400, Robert Haas wrote:
>> On Wed, Jun 25, 2014 at 5:42 PM, Heikki Linnakangas
>>  wrote:
>> > I think having a separate file for each architecture is nice. I totally
>> > agree that they don't belong in src/include/storage, though. s_lock.h has
>> > always been misplaced there, but we've let it be for historical reasons, 
>> > but
>> > now that we're adding a dozen new files, it's time to move them out.
>>
>> I find the current organization pretty confusing, but maybe that could
>> be solved by better documentation of what's supposed to go in each
>> architecture or compiler-dependent file.
>
> The idea is that first a architecture specific file (atomics-arch-*.h)
> is included. That file can provide a (partial) implementation for the
> specific architecture. Or it can do pretty much nothing.
>
> After that a compiler specific file is included
> (atomics-generic-*.h). If atomics aren't yet implemented that can
> provide an intrinsics based implementation if the compiler version has
> support for it. At the very least a compiler barrier should be provided.
>
> After that the spinlock based fallback implementation
> (atomics-fallback.h) provides atomics and barriers if not yet
> available. By here we're sure that nothing else will provide them.
>
> Then we can provide operations (atomics-generic.h) that build ontop of
> the provided functions. E.g. implement _sub, _and et al.
>
> I'll include some more of that explanation in the header.

I get the general principle, but I think it would be good to have one
place that says something like:

Each architecture must provide A and B, may provide both or neither of
C and D, and may also provide any or all of E, F, and G.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-06-26 Thread Tomas Vondra
Attached is v2 of the patch, with some cleanups / minor improvements:

* improved comments, whitespace fixed / TODOs etc.

* tracking initial # of buckets (similar to initial # of batches)

* adding info about buckets to EXPLAIN ANALYZE, similar to batches - I
didn't want to make it overly complex, so the info about the initial
bucket/batch count is added only if at least one of them is modified

* modified threshold triggering the growth, to get NTUP_PER_BUCKET on
average (see the NTUP_GROW_THRESHOLD comment in nodeHash.c)

* there's a single FIXME, related to counting tuples in the

One thing that's important to note is the difference between # of
batches and # of buckets. While the # of batches is "global", the # of
buckets is "within a batch". So theoretically each batch can use a
different number of buckets.

However the value is reused between batches, so it only grows. That
means this is possible:

  initial: 1024 buckets (before 1st batch)
  batch 1: 1024 buckets
  batch 2: 1024 buckets
  batch 3: 4096 buckets
  batch 4: 8192 buckets

while this is not:

  initial: 1024 buckets (before 1st batch)
  batch 1: 1024 buckets
  batch 2: 4096 buckets
  batch 3: 1024 buckets
  batch 4: 8192 buckets

However, in practice I expect the first batch to do all the work, and
the following batches will just reuse the same number of buckets.
This of course assumes the batches have similar tuple sizes etc.

So the first batch will do all the reshuffling of the tables, and the
following batches will reuse the 'right' number of buckets from the start.

regards
Tomas
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 0d9663c..db3a953 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1900,18 +1900,20 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 		if (es->format != EXPLAIN_FORMAT_TEXT)
 		{
 			ExplainPropertyLong("Hash Buckets", hashtable->nbuckets, es);
+			ExplainPropertyLong("Original Hash Buckets",
+hashtable->nbuckets_original, es);
 			ExplainPropertyLong("Hash Batches", hashtable->nbatch, es);
 			ExplainPropertyLong("Original Hash Batches",
 hashtable->nbatch_original, es);
 			ExplainPropertyLong("Peak Memory Usage", spacePeakKb, es);
 		}
-		else if (hashtable->nbatch_original != hashtable->nbatch)
+		else if ((hashtable->nbatch_original != hashtable->nbatch) || (hashtable->nbuckets_original != hashtable->nbuckets))
 		{
 			appendStringInfoSpaces(es->str, es->indent * 2);
 			appendStringInfo(es->str,
-			"Buckets: %d  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
-			 hashtable->nbuckets, hashtable->nbatch,
-			 hashtable->nbatch_original, spacePeakKb);
+			"Buckets: %d (originally %d)  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
+			 hashtable->nbuckets, hashtable->nbuckets_original,
+			 hashtable->nbatch, hashtable->nbatch_original, spacePeakKb);
 		}
 		else
 		{
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 589b2f1..879b336 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -39,6 +39,7 @@
 
 
 static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
+static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
 static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
 	  int mcvsToUse);
 static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -271,6 +272,7 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 	 */
 	hashtable = (HashJoinTable) palloc(sizeof(HashJoinTableData));
 	hashtable->nbuckets = nbuckets;
+	hashtable->nbuckets_original = nbuckets;
 	hashtable->log2_nbuckets = log2_nbuckets;
 	hashtable->buckets = NULL;
 	hashtable->keepNulls = keepNulls;
@@ -386,6 +388,22 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 /* Target bucket loading (tuples per bucket) */
 #define NTUP_PER_BUCKET			10
 
+/* Multiple of NTUP_PER_BUCKET triggering the increase of nbuckets.
+ * 
+ * Once we reach the threshold we double the number of buckets, and we
+ * want to get 1.0 on average (to get NTUP_PER_BUCKET on average). That
+ * means these two equations should hold
+ * 
+ *   b = 2a (growth)
+ *   (a + b)/2 = 1  (oscillate around NTUP_PER_BUCKET)
+ * 
+ * which means b=1.333 (a = b/2). If we wanted higher threshold, we
+ * could grow the nbuckets to (4*nbuckets), thus using (b=4a) for
+ * growth, leading to (b=1.6). Or (b=8a) giving 1.777. etc.
+ * 
+ * Let's start with doubling the bucket count, i.e. 1.333. */
+#define NTUP_GROW_THRESHOLD 1.333
+
 void
 ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 		int *numbuckets,
@@ -682,6 +700,92 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 }
 
 /*
+ * ExecHashIncreaseNumBuckets
+ *		increase the original number of buckets in order to reduce
+ *		number of tuples per bucket
+ */
+static void
+ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
+{
+	int			i;

Re: [HACKERS] better atomics - v0.5

2014-06-26 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 26, 2014 at 6:20 AM, Andres Freund  wrote:
>> On 2014-06-25 20:16:08 -0400, Robert Haas wrote:
>>> I think that's going to fall afoul of Tom's previously-articulated "no
>>> loops inside spinlocks" rule.  Most atomics, by nature, are
>>> loop-until-it-works.

>> Well, so is TAS itself :).

> Yeah, which is why we have a rule that you're not suppose to acquire
> another spinlock while already holding one.  You're trying to make the
> spinlocks used to simulate atomic ops an exception to that rule, but
> I'm not sure that's OK.  An atomic op is a lot more expensive than a
> regular unlocked load or store even if it doesn't loop, and if it does
> loop, it's worse, potentially by a large multiple.

Well, the point here is to be sure that there's a reasonably small bound
on how long we hold the spinlock.  It doesn't have to be zero, just small.

Would it be practical to say that the coding rule is that you can only
use atomic ops on fields that are protected by the spinlock, ie, nobody
else is supposed to be touching those fields while you have the spinlock?
If that's the case, then the atomic op should see no contention and thus
not take very long.

On the other hand, if the atomic op is touching something not protected
by the spinlock, that seems to me to be morally equivalent to taking a
spinlock while holding another one, which as Robert says is forbidden
by our current coding rules, and for very good reasons IMO.

However, that may be safe only as long as we have real atomic ops.
If we're simulating those using spinlocks then you have to ask questions
about how many underlying spinlocks there are and whether artificial
contention might ensue (due to the same spinlock being used for multiple
things).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] better atomics - v0.5

2014-06-26 Thread Andres Freund
On 2014-06-26 11:47:15 -0700, Tom Lane wrote:
> Robert Haas  writes:
> > On Thu, Jun 26, 2014 at 6:20 AM, Andres Freund  
> > wrote:
> >> On 2014-06-25 20:16:08 -0400, Robert Haas wrote:
> >>> I think that's going to fall afoul of Tom's previously-articulated "no
> >>> loops inside spinlocks" rule.  Most atomics, by nature, are
> >>> loop-until-it-works.
> 
> >> Well, so is TAS itself :).
> 
> > Yeah, which is why we have a rule that you're not suppose to acquire
> > another spinlock while already holding one.  You're trying to make the
> > spinlocks used to simulate atomic ops an exception to that rule, but
> > I'm not sure that's OK.  An atomic op is a lot more expensive than a
> > regular unlocked load or store even if it doesn't loop, and if it does
> > loop, it's worse, potentially by a large multiple.
> 
> Well, the point here is to be sure that there's a reasonably small bound
> on how long we hold the spinlock.  It doesn't have to be zero, just small.
> 
> Would it be practical to say that the coding rule is that you can only
> use atomic ops on fields that are protected by the spinlock, ie, nobody
> else is supposed to be touching those fields while you have the spinlock?
> If that's the case, then the atomic op should see no contention and thus
> not take very long.

I don't really see usecases where it's not related data that's being
touched, but: The point is that the fastpath (not using a spinlock) might
touch the atomic variable, even while the slowpath has acquired the
spinlock. So the slowpath can't just use non-atomic operations on the
atomic variable.
Imagine something like releasing a buffer pin while somebody else is
doing something that requires holding the buffer header spinlock.

> On the other hand, if the atomic op is touching something not protected
> by the spinlock, that seems to me to be morally equivalent to taking a
> spinlock while holding another one, which as Robert says is forbidden
> by our current coding rules, and for very good reasons IMO.
>
> However, that may be safe only as long as we have real atomic ops.
> If we're simulating those using spinlocks then you have to ask questions
> about how many underlying spinlocks there are and whether artificial
> contention might ensue (due to the same spinlock being used for multiple
> things).

With the code as submitted it's one spinlock per atomic variable. Which
is fine until spinlocks are emulated using semaphores - in that case a
separate array of semaphores (quite small in the patch as submitted) is
used so there's no chance of deadlocks against a 'real' spinlock or
anything like that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] What's the point of json_extract_path_op etc?

2014-06-26 Thread Tom Lane
Andrew Dunstan  writes:
> On 06/25/2014 02:46 PM, Tom Lane wrote:
>> Why do we have essentially duplicate pg_proc entries for json_extract_path
>> and json_extract_path_op?
>> Likewise for json_extract_path_text_op, jsonb_extract_path_op, and
>> jsonb_extract_path_text_op.

> ISTR trying that and running into problems, maybe with opr_sanity checks.

Well, the reason that opr_sanity is complaining is that there's a
violation of our general policy of documenting either the operator or
the underlying function, not both.  Using a separate pg_proc entry
like this doesn't mean you didn't violate the policy; you just hid the
violation from opr_sanity.

Do we actually want to document these things as both operators and
functions?  If we do, then the right answer is to list them as known
exceptions in the opr_sanity test, not to hide the fact that we're
violating the general documentation policy.

regards, tom lane




Re: [HACKERS] better atomics - v0.5

2014-06-26 Thread Merlin Moncure
On Thu, Jun 26, 2014 at 1:47 PM, Tom Lane  wrote:
> Robert Haas  writes:
> Would it be practical to say that the coding rule is that you can only
> use atomic ops on fields that are protected by the spinlock, ie, nobody
> else is supposed to be touching those fields while you have the spinlock?
> If that's the case, then the atomic op should see no contention and thus
> not take very long.

I wonder if this is true in all cases.  The address you are locking
might be logically protected but at the same time near other
memory that is under contention.  In other words, I don't think you
can assume an atomic op locks exactly 4 bytes of memory for the
operation.

> On the other hand, if the atomic op is touching something not protected
> by the spinlock, that seems to me to be morally equivalent to taking a
> spinlock while holding another one, which as Robert says is forbidden
> by our current coding rules, and for very good reasons IMO.

Well not quite: you don't have the possibility of deadlock.

merlin




Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-06-26 Thread Fabrízio de Royes Mello
On Wed, Jun 11, 2014 at 1:19 PM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:

> Hi all,
>
> As part of GSoC2014 I'm sending a patch to add the capability of changing
> an unlogged table to logged [1].
>
>
Hi all,

As part of GSoC2014, and with agreement of my mentor and reviewer (Stephen
Frost), I've sent a complement to the first patch to add the capability to
change a regular table to unlogged.

With this patch we finish the main goals of GSoC2014, and now we'll work
on the additional goals.

Regards,


[1]
https://wiki.postgresql.org/wiki/Allow_an_unlogged_table_to_be_changed_to_logged_GSoC_2014

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 69a1e14..424f2e9 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -58,6 +58,7 @@ ALTER TABLE [ IF EXISTS ] name
 ENABLE REPLICA RULE rewrite_rule_name
 ENABLE ALWAYS RULE rewrite_rule_name
 CLUSTER ON index_name
+SET {LOGGED | UNLOGGED}
 SET WITHOUT CLUSTER
 SET WITH OIDS
 SET WITHOUT OIDS
@@ -432,6 +433,20 @@ ALTER TABLE [ IF EXISTS ] name

 

+SET {LOGGED | UNLOGGED}
+
+ 
+  This form changes the table persistence type from unlogged to permanent or 
+  from permanent to unlogged by rewriting the entire contents of the table 
+  and associated indexes into new disk files.
+ 
+ 
+  Changing the table persistence type acquires an ACCESS EXCLUSIVE lock.
+ 
+
+   
+
+   
 SET WITHOUT CLUSTER
 
  
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 60d387a..9dfdca2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -125,18 +125,19 @@ static List *on_commits = NIL;
  * a pass determined by subcommand type.
  */
 
-#define AT_PASS_UNSET			-1		/* UNSET will cause ERROR */
-#define AT_PASS_DROP			0		/* DROP (all flavors) */
-#define AT_PASS_ALTER_TYPE		1		/* ALTER COLUMN TYPE */
-#define AT_PASS_OLD_INDEX		2		/* re-add existing indexes */
-#define AT_PASS_OLD_CONSTR		3		/* re-add existing constraints */
-#define AT_PASS_COL_ATTRS		4		/* set other column attributes */
+#define AT_PASS_UNSET				-1		/* UNSET will cause ERROR */
+#define AT_PASS_DROP				0		/* DROP (all flavors) */
+#define AT_PASS_ALTER_TYPE			1		/* ALTER COLUMN TYPE */
+#define AT_PASS_OLD_INDEX			2		/* re-add existing indexes */
+#define AT_PASS_OLD_CONSTR			3		/* re-add existing constraints */
+#define AT_PASS_COL_ATTRS			4		/* set other column attributes */
 /* We could support a RENAME COLUMN pass here, but not currently used */
-#define AT_PASS_ADD_COL			5		/* ADD COLUMN */
-#define AT_PASS_ADD_INDEX		6		/* ADD indexes */
-#define AT_PASS_ADD_CONSTR		7		/* ADD constraints, defaults */
-#define AT_PASS_MISC			8		/* other stuff */
-#define AT_NUM_PASSES			9
+#define AT_PASS_ADD_COL				5		/* ADD COLUMN */
+#define AT_PASS_ADD_INDEX			6		/* ADD indexes */
+#define AT_PASS_ADD_CONSTR			7		/* ADD constraints, defaults */
+#define AT_PASS_MISC				8		/* other stuff */
+#define AT_PASS_SET_LOGGED_UNLOGGED	9		/* SET LOGGED and UNLOGGED */
+#define AT_NUM_PASSES				10
 
 typedef struct AlteredTableInfo
 {
@@ -376,6 +377,7 @@ static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMOD
 static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
 	 char *cmd, List **wqueue, LOCKMODE lockmode,
 	 bool rewrite);
+static void ATPostAlterSetLoggedUnlogged(Oid relid);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
 static void TryReuseForeignKey(Oid oldId, Constraint *con);
 static void change_owner_fix_column_acls(Oid relationOid,
@@ -402,6 +404,13 @@ static void ATExecAddOf(Relation rel, const TypeName *ofTypename, LOCKMODE lockm
 static void ATExecDropOf(Relation rel, LOCKMODE lockmode);
 static void ATExecReplicaIdentity(Relation rel, ReplicaIdentityStmt *stmt, LOCKMODE lockmode);
 static void ATExecGenericOptions(Relation rel, List *options);
+static void ATPrepSetLogged(Relation rel);
+static void ATPrepSetUnLogged(Relation rel);
+static void ATExecSetLogged(Relation rel);
+static void ATExecSetUnLogged(Relation rel);
+
+static void AlterTableSetLoggedCheckForeignConstraints(Relation rel);
+static void AlterTableSetLoggedOrUnlogged(Relation rel, bool toLogged);
 
 static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
    ForkNumber forkNum, char relpersistence);
@@ -2854,6 +2863,8 @@ AlterTableGetLockLevel(List *cmds)
 			case AT_AddIndexConstraint:
 			case AT_ReplicaIdentity:
 			case AT_SetNotNull:
+			case AT_SetLogged:
+			case AT_SetUnLogged:
 cmd_lockmode = AccessExclusiveLock;
 break;
 
@@ -3245,6 +3256,15 @@ ATPrepCmd(List **wque

Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-06-26 Thread Petr Jelinek

Hello,

I went through the patch; it seems mostly fine. I adjusted some wording, 
removed the default .pgpass file info since it's not accurate, and 
replaced a couple of phrases with (hopefully) more informative ones.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
commit 4d77ccf19ac77dc0f43a58f4d2b74d4aebed871d
Author: Pavel Stehule 
Date:   Mon Jun 23 19:38:41 2014 +0200

without comments

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index ee6ec3a..6a172dc 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -556,6 +556,15 @@ EOF
   
 
 
+
+  --help-variables
+  
+  
+  Show help about psql variables,
+  and exit.
+  
+  
+
   
  
 
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 3aa3c16..1304e1a 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -78,12 +78,13 @@ usage(void)
 	printf(_("  -f, --file=FILENAME  execute commands from file, then exit\n"));
 	printf(_("  -l, --list   list available databases, then exit\n"));
 	printf(_("  -v, --set=, --variable=NAME=VALUE\n"
-			 "   set psql variable NAME to VALUE\n"));
+			 "   set psql variable NAME to VALUE e.g.: -v ON_ERROR_STOP=1\n"));
 	printf(_("  -V, --versionoutput version information, then exit\n"));
 	printf(_("  -X, --no-psqlrc  do not read startup file (~/.psqlrc)\n"));
 	printf(_("  -1 (\"one\"), --single-transaction\n"
 			 "   execute as a single transaction (if non-interactive)\n"));
 	printf(_("  -?, --help   show this help, then exit\n"));
+	printf(_("  --help-variables show a list of all specially treated variables, then exit\n"));
 
 	printf(_("\nInput and output options:\n"));
 	printf(_("  -a, --echo-all   echo all input from script\n"));
@@ -279,6 +280,92 @@ slashUsage(unsigned short int pager)
 }
 
 
+/*
+ * show list of available variables (options) from command line
+ */
+void
+help_variables(void)
+{
+	printf(_("List of specially treated variables.\n"));
+
+	printf(_("psql variables:\n"));
+	printf(_("Usage:\n"));
+	printf(_("  psql --set=NAME=VALUE\n  or \\set NAME VALUE in interactive mode\n\n"));
+
+	printf(_("  AUTOCOMMIT successful SQL commands are automatically committed\n"));
+	printf(_("  COMP_KEYWORD_CASE  determines which letter case to use when completing an SQL key word\n"));
+	printf(_("  DBNAME name of currently connected database\n"));
+	printf(_("  ECHO   control what input can be written to standard output [all, queries]\n"));
+	printf(_("  ECHO_HIDDENdisplay internal queries executed by backslash commands\n"));
+	printf(_("  ENCODING   current client character set encoding\n"));
+	printf(_("  FETCH_COUNTthe number of result rows to fetch and display at a time\n"
+	 " (default: 0=unlimited)\n"));
+	printf(_("  HISTCONTROLcontrol history list [ignorespace, ignoredups, ignoreboth]\n"));
+	printf(_("  HISTFILE   file name used to store the history list\n"));
+	printf(_("  HISTSIZE   the number of commands to store in the command history\n"));
+	printf(_("  HOST   the currently connected database server\n"));
+	printf(_("  IGNOREEOF  if unset, sending an EOF to interactive session terminates application\n"));
+	printf(_("  LASTOIDthe value of last affected OID\n"));
+	printf(_("  ON_ERROR_ROLLBACK  when on, the error doesn't stop transaction (uses implicit SAVEPOINTs)\n"));
+	printf(_("  ON_ERROR_STOP  stop batch execution after error\n"));
+	printf(_("  PORT   server port of the current connection\n"));
+	printf(_("  PROMPT1, PROMPT2, PROMPT3\n"
+	 " specify the psql prompt\n"));
+	printf(_("  QUIET  run quietly (same as -q option)\n"));
+	printf(_("  SINGLELINE end of line terminates SQL command mode (same as -S option)\n"));
+	printf(_("  SINGLESTEP single-step mode (same as -s option)\n"));
+	printf(_("  USER   the database user currently connected as\n"));
+	printf(_("  VERBOSITY  control verbosity of error reports [default, verbose, terse]\n"));
+
+	printf(_("\nPrinting options:\n"));
+	printf(_("Usage:\n"));
+	printf(_("  psql --pset=NAME[=VALUE]\n  or \\pset NAME [VALUE] in interactive mode\n\n"));
+
+	printf(_("  border border style (number)\n"));
+	printf(_("  columnsset the target width for the wrapped format\n"));
+	printf(_("  expanded (or x)toggle expanded output\n"));
+	printf(_("  fieldsep   field separator for unaligned output (default '|')\n"));
+	printf(_("  fieldsep_zero  set field separator in unaligned mode to zero\n"));
+	printf(_("  format set output format [unaligned, aligned, wrapped,

Re: [HACKERS] review: Built-in binning functions

2014-06-26 Thread Petr Jelinek

Here is v2,

with fixed documentation and numeric version of the implementation.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f475458..38330d4 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -920,6 +920,51 @@
width_bucket(5.35, 0.024, 10.06, 5)
3
   
+
+  
+   
+
+ varwidth_bucket
+
+varwidth_bucket(op anyelement, thresholds anyarray)
+   
+   int
+   return the bucket to which operand would
+   be assigned based on right-bound bucket thresholds,
+   the thresholds must be ordered (smallest first)
+   varwidth_bucket(now(), array['yesterday', 'today', 'tomorrow'])
+   3
+  
+
+  
+   varwidth_bucket(op numeric, thresholds numeric[])
+   int
+   return the bucket to which operand would
+   be assigned based on right-bound bucket thresholds,
+   the thresholds must be ordered (smallest first)
+   varwidth_bucket(5.35::numeric, ARRAY[1, 3, 4, 6]::numeric[])
+   3
+  
+
+  
+   varwidth_bucket(op dp, thresholds dp[])
+   int
+   return the bucket to which operand would
+   be assigned based on right-bound bucket thresholds,
+   the thresholds must be ordered (smallest first)
+   varwidth_bucket(5.35, ARRAY[1, 3, 4, 6])
+   3
+  
+
+  
+   varwidth_bucket(op bigint, thresholds bigint[])
+   int
+   return the bucket to which operand would
+   be assigned based on right-bound bucket thresholds,
+   the thresholds must be ordered (smallest first)
+   varwidth_bucket(5, ARRAY[1, 3, 4, 6])
+   3
+  
  
 

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index f8e94ec..3a77638 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -5502,3 +5502,130 @@ array_replace(PG_FUNCTION_ARGS)
    fcinfo);
 	PG_RETURN_ARRAYTYPE_P(array);
 }
+
+/*
+ * Implements the generic version of the varwidth_bucket() function.
+ * See also varwidth_bucket_numeric().
+ *
+ * Uses the generic varwidth_bucket_internal to do the actual work.
+ */
+Datum
+varwidth_bucket_generic(PG_FUNCTION_ARGS)
+{
+	Datum		operand = PG_GETARG_DATUM(0);
+	ArrayType  *thresholds = PG_GETARG_ARRAYTYPE_P(1);
+	Oid			collation = PG_GET_COLLATION();
+	Oid			element_type = get_fn_expr_argtype(fcinfo->flinfo, 0);
+	TypeCacheEntry *typentry;
+	int32		ret;
+
+	/* Make sure val and thresholds use same types so we can be sure they can be compared. */
+	if (element_type != ARR_ELEMTYPE(thresholds))
+		ereport(ERROR,
+(errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("cannot calculate buckets for operand using thresholds of different type")));
+
+	/* Fetch information about the input type */
+	typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+	if (typentry == NULL ||
+		typentry->type_id != element_type)
+	{
+		typentry = lookup_type_cache(element_type,
+	 TYPECACHE_CMP_PROC_FINFO);
+		if (!OidIsValid(typentry->cmp_proc_finfo.fn_oid))
+			ereport(ERROR,
+	(errcode(ERRCODE_UNDEFINED_FUNCTION),
+			   errmsg("could not identify a comparison function for type %s",
+	  format_type_be(element_type))));
+		fcinfo->flinfo->fn_extra = (void *) typentry;
+	}
+
+	ret = varwidth_bucket_internal(operand, thresholds,
+   &typentry->cmp_proc_finfo, collation,
+   typentry->typlen, typentry->typbyval,
+   typentry->typalign);
+
+	/* Avoid leaking memory when handed toasted input. */
+	PG_FREE_IF_COPY(thresholds, 1);
+
+	PG_RETURN_INT32(ret);
+}
+
+/*
+ * Implements the generic version of the varwidth_bucket() function.
+ * See also varwidth_bucket_float8() and varwidth_bucket_int8().
+ *
+ * 'thresholds' is an array (must be sorted from smallest to biggest value)
+ * containing right-bounds for each "bucket", varwidth_bucket() returns
+ * integer indicating the bucket number that 'operand' belongs to. An operand
+ * greater than the upper bound is assigned to an additional bucket.
+ */
+int32
+varwidth_bucket_internal(Datum operand,
+		 ArrayType *thresholds,
+		 FmgrInfo *cmpfunc,
+		 Oid collation,
+		 int typlen,
+		 bool typbyval,
+		 char typalign)
+{
+	char	   *thresholds_data;
+	int32		left;
+	int32		mid;
+	int32		right;
+	FunctionCallInfoData locfcinfo;
+
+	/* Verify input */
+	if (ARR_NDIM(thresholds) != 1)
+		ereport(ERROR,
+			(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+			 errmsg("thresholds must be a one-dimensional array")));
+
+	if (ARR_HASNULL(thresholds))
+		ereport(ERROR,
+			(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+			 errmsg("thresholds array must not contain NULLs")));
+
+	thresholds_data = (char *) ARR_DATA_PTR(thresholds);
+
+	InitFunctionCallInfoData(locfcinfo, cmpfunc, 2,
+			 collation, NULL, NULL);
+
+	/* Find the bucket */
+	left = 0;

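The quoted patch is cut off just as the binary search begins. A standalone sketch of the lookup it implements follows, counting how many right-bound thresholds the operand clears, which matches the documented examples (5.35 against {1,3,4,6} gives bucket 3, and an operand above the last threshold lands in an extra bucket). The handling of an operand exactly equal to a threshold is an assumption here, since that part of the patch is truncated.

```c
#include <stddef.h>

/*
 * Sketch of varwidth_bucket's lookup over a sorted thresholds array.
 * Returns the number of thresholds at or below the operand, so
 * varwidth_bucket_sketch(5.35, {1,3,4,6}, 4) == 3, and an operand
 * greater than the last threshold is assigned to an additional bucket.
 * "At or below" (ties go to the higher bucket) is assumed, as the
 * comparison step is missing from the truncated patch.
 */
static int
varwidth_bucket_sketch(double operand, const double *thresholds, int n)
{
    int     left = 0;
    int     right = n;          /* search within [left, right) */

    while (left < right)
    {
        int     mid = (left + right) / 2;

        if (thresholds[mid] <= operand)
            left = mid + 1;     /* operand clears this right bound */
        else
            right = mid;
    }
    return left;
}
```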
Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension

2014-06-26 Thread Gavin Flower

On 27/06/14 00:12, Rushabh Lathia wrote:

Thanks for sharing the latest patch.

For me this syntax is limiting the current returning clause
implementation, because it's not allowing the PRIMARY KEY to be
returned with other columns; if a user or application requires that,
they need to do RETURNING *. The following syntax is not working,
which I think is very useful in case someone wants to use the
returning clause with the PRIMARY KEY as well as some other table
columns.

INSERT INTO dept VALUES (10,'ACCOUNTING','NEW YORK') returning primary 
key, dname;


I think allowing other columns with PRIMARY KEY would be more useful
syntax. Even in later versions, if we want to extend this syntax to
return UNIQUE KEY, SEQUENCE VALUES, etc., comma-separated syntax will
be more handy.

Others please share your inputs/thoughts.



[...]

I agree 100%.

I note that DELETE & INSERT have a RETURNING clause.  So the semantics 
and syntax should be the same, as much as is practicable.



Cheers,
Gavin




[HACKERS] Spinlocks and compiler/memory barriers

2014-06-26 Thread Andres Freund
Hi,

I just spent 20+ hours debugging an elusive problem which only happened
under heavy concurrency. Slight changes to the code made it
disappear. In the end it turned out that gcc liked to move *one*
instruction across the SpinLockRelease() - and only sometimes. Unrelated
changes made the problem vanish.

The relevant code was:
volatile LWLock *lock = l;
...
dlist_push_head((dlist_head *) &lock->waiters, &MyProc->lwWaitLink);
SpinLockRelease(&lock->mutex);

Now you could argue that it's my fault because of two things:
a) The code casts away the volatile from lock.
b) MyProc isn't volatile.

But a) isn't really avoidable because it'll otherwise generate compiler
warnings and b) is done that way all over the tree. E.g. lwlock.c has
several copies of (note the nonvolatility of proc):
volatile LWLock *lock = l;
PGPROC *proc = MyProc;
...
proc->lwWaiting = true;
proc->lwWaitMode = LW_WAIT_UNTIL_FREE;
proc->lwWaitLink = NULL;

/* waiters are added to the front of the queue */
proc->lwWaitLink = lock->head;
if (lock->head == NULL)
lock->tail = proc;
lock->head = proc;

/* Can release the mutex now */
SpinLockRelease(&lock->mutex);
There's nothing forcing the compiler to not move any of the proc->*
assignments past the SpinLockRelease(). And indeed in my case it was
actually the store to lwWaitLink that was moved across the lock.

I think it's just pure luck that there's no active bug (that we know of)
caused by this. I wouldn't be surprised if some dubious behaviour we've
seen would be caused by similar issues.

Now, we can fix this and similar cases by more gratuitous use of
volatile. But for one we're never going to find all cases. For another
it won't help *at all* for architectures with looser CPU level memory
ordering guarantees.
I think we finally need to bite the bullet and make all S_UNLOCKs
compiler/write barriers.

I'd previously, in
http://www.postgresql.org/message-id/20130920151110.ga8...@awork2.anarazel.de,
gone through the list of S_UNLOCKs and found several that were
lacking. Most prominently the default S_UNLOCK is just
#define S_UNLOCK(lock) (*((volatile slock_t *) (lock)) = 0)
which allows the compiler to move non volatile access across and does
nothing for CPU level cache coherency.

I think we should rework things so that we fall back to
pg_write_barrier(), (*((volatile slock_t *) (lock)) = 0) instead of what
we have right now.
That'd require making barrier.h independent of s_lock.h, but I think
that'd be a pretty clear improvement. One open question is what happens
with the SpinlockRelease() when barriers are implemented using spinlocks
and we need a barrier for the SpinlockRelease().
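As a rough sketch of the proposed shape, with a C11 release fence standing in for pg_write_barrier() (PostgreSQL would supply per-platform barrier implementations; the names below follow this mail, not any committed code):

```c
#include <stdatomic.h>

typedef int slock_t;

/*
 * Today's default: a plain volatile store.  The compiler may not move
 * other *volatile* accesses across it, but it is free to move ordinary
 * stores (like "proc->lwWaitLink = ...") past it, and a weakly ordered
 * CPU may reorder the stores in hardware as well.
 */
#define S_UNLOCK_UNSAFE(lock) \
    (*((volatile slock_t *) (lock)) = 0)

/*
 * Proposed fallback: a write barrier before the releasing store keeps
 * every store made while holding the lock visible before the lock
 * reads as free.  The release fence here stands in for pg_write_barrier().
 */
#define S_UNLOCK(lock) \
    do { \
        atomic_thread_fence(memory_order_release); \
        *((volatile slock_t *) (lock)) = 0; \
    } while (0)
```

The fence-plus-volatile-store pair gives the releasing store release semantics, which is exactly the property the plain volatile store lacks.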

Better ideas, other suggestions?

I'm now going to drink.

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-26 Thread Tom Lane
Andres Freund  writes:
> I think we should rework things so that we fall back to
> pg_write_barrier(), (*((volatile slock_t *) (lock)) = 0) instead of what
> we have right now.

Surely it had better be a read barrier as well?  And S_LOCK the same?

regards, tom lane




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-06-26 Thread Jeff Janes
On Thu, Jun 26, 2014 at 2:35 AM, Dilip kumar  wrote:

>
> Thank you for giving your time,  Please review the updated patch attached in
> the mail.
>
>
>
> 1.  Rebased the patch
>
> 2.  Implemented parallel execution for new option --analyze-in-stages

Hi Dilip,

Thanks for rebasing.

I haven't done an architectural or code review on it; I just applied
it and used it a little on Linux.

Based on that, I find most importantly that it doesn't seem to
correctly vacuum tables which have upper case letters in the name,
because it does not quote the table names when they need quotes.
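For illustration, here is the kind of quoting the fix needs. The real fix would reuse fmtId() from the shared dump utilities, which additionally handles keywords; this simplified sketch only shows the core rule: double-quote anything that is not a plain lower-case identifier, doubling embedded quotes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True unless the name is a plain unquoted identifier: [a-z_][a-z0-9_]* */
static bool
needs_quotes(const char *name)
{
    if (!(islower((unsigned char) name[0]) || name[0] == '_'))
        return true;
    for (const char *p = name + 1; *p; p++)
    {
        if (!(islower((unsigned char) *p) ||
              isdigit((unsigned char) *p) || *p == '_'))
            return true;
    }
    return false;
}

/*
 * Write the identifier into "dst" (assumed large enough), quoting it
 * when required and doubling any embedded double quotes, so a table
 * named FooBar becomes "FooBar" in the generated VACUUM command.
 */
static void
fmt_ident_sketch(char *dst, const char *name)
{
    if (!needs_quotes(name))
    {
        strcpy(dst, name);
        return;
    }
    *dst++ = '"';
    for (const char *p = name; *p; p++)
    {
        if (*p == '"')
            *dst++ = '"';
        *dst++ = *p;
    }
    *dst++ = '"';
    *dst = '\0';
}
```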

Of course that needs to be fixed, but taking it as it is, the
resulting error message to the console is just:

 : Execute failed

Which is not very informative.  I get the same error if I do a "pg_ctl
shutdown -mi" while running the parallel vacuumdb. Without the -j
option it produces a more descriptive error message "FATAL:
terminating connection due to administrator command", so something
about the new feature suppresses the informative error messages.

I get some compiler warnings with the new patch:

vac_parallel.c: In function 'parallel_msg_master':
vac_parallel.c:147: warning: function might be possible candidate for
'gnu_printf' format attribute
vac_parallel.c:147: warning: function might be possible candidate for
'gnu_printf' format attribute
vac_parallel.c: In function 'exit_horribly':
vac_parallel.c:1071: warning: 'noreturn' function does return


In the usage message, the string has a tab embedded within it
(immediately before "use") that should be converted to literal spaces,
otherwise the output of --help gets misaligned:

printf(_("  -j, --jobs=NUM  use this many parallel
jobs to vacuum\n"));

Thanks for the work on this.

Cheers,

Jeff




Re: [HACKERS] What's the point of json_extract_path_op etc?

2014-06-26 Thread Andrew Dunstan


On 06/26/2014 03:11 PM, Tom Lane wrote:

Andrew Dunstan  writes:

On 06/25/2014 02:46 PM, Tom Lane wrote:

Why do we have essentially duplicate pg_proc entries for json_extract_path
and json_extract_path_op?
Likewise for json_extract_path_text_op, jsonb_extract_path_op, and
jsonb_extract_path_text_op.

ISTR trying that and running into problems, maybe with opr_sanity checks.

Well, the reason that opr_sanity is complaining is that there's a
violation of our general policy of documenting either the operator or
the underlying function, not both.  Using a separate pg_proc entry
like this doesn't mean you didn't violate the policy; you just hid the
violation from opr_sanity.

Do we actually want to document these things as both operators and
functions?  If we do, then the right answer is to list them as known
exceptions in the opr_sanity test, not to hide the fact that we're
violating the general documentation policy.



It's quite important that we have the variadic functions exposed.

cheers

andrew




Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-26 Thread Andres Freund
On 2014-06-26 14:13:07 -0700, Tom Lane wrote:
> Andres Freund  writes:
> > I think we should rework things so that we fall back to
> > pg_write_barrier(), (*((volatile slock_t *) (lock)) = 0) instead of what
> > we have right now.
> 
> Surely it had better be a read barrier as well?

I don't immediately see why it has to be a read barrier? Hoisting a load
from after the release into the locked area of code should be safe? Note
that 'bad' reads can only happen for variables which aren't protected by
the spinlock since the S_LOCK needs to have acquire semantics and no
other process can modify protected variables concurrently.
The important thing is that all modifications that have been done inside
the spinlock are visible to other backends and that no writes are moved
outside the protected area.

> And S_LOCK the same?

It better be a read barrier, yes. I haven't checked yet, but I assume
that pretty much all TAS/tas implementations already guarantee that. I
think if not we'd have seen problems. Well, at least on platforms that
receive testing under concurrent circumstances :/

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-06-26 Thread Tomas Vondra
On 26.6.2014 20:43, Tomas Vondra wrote:
> Attached is v2 of the patch, with some cleanups / minor improvements:
> 
> * there's a single FIXME, related to counting tuples in the

Meh, I couldn't resist resolving this FIXME, so attached is v3 of the
patch. This just adds a proper 'batch tuples' counter to the hash table.

All comments, measurements on different queries etc. welcome. We'll
certainly do a lot of testing, because this was a big issue for us.

regards
Tomas
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 0d9663c..db3a953 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1900,18 +1900,20 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 		if (es->format != EXPLAIN_FORMAT_TEXT)
 		{
 			ExplainPropertyLong("Hash Buckets", hashtable->nbuckets, es);
+			ExplainPropertyLong("Original Hash Buckets",
+hashtable->nbuckets_original, es);
 			ExplainPropertyLong("Hash Batches", hashtable->nbatch, es);
 			ExplainPropertyLong("Original Hash Batches",
 hashtable->nbatch_original, es);
 			ExplainPropertyLong("Peak Memory Usage", spacePeakKb, es);
 		}
-		else if (hashtable->nbatch_original != hashtable->nbatch)
+		else if ((hashtable->nbatch_original != hashtable->nbatch) || (hashtable->nbuckets_original != hashtable->nbuckets))
 		{
 			appendStringInfoSpaces(es->str, es->indent * 2);
 			appendStringInfo(es->str,
-			"Buckets: %d  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
-			 hashtable->nbuckets, hashtable->nbatch,
-			 hashtable->nbatch_original, spacePeakKb);
+			"Buckets: %d (originally %d)  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
+			 hashtable->nbuckets, hashtable->nbuckets_original,
+			 hashtable->nbatch, hashtable->nbatch_original, spacePeakKb);
 		}
 		else
 		{
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 589b2f1..b32fb6b 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -39,6 +39,7 @@
 
 
 static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
+static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
 static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
 	  int mcvsToUse);
 static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -271,6 +272,7 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 	 */
 	hashtable = (HashJoinTable) palloc(sizeof(HashJoinTableData));
 	hashtable->nbuckets = nbuckets;
+	hashtable->nbuckets_original = nbuckets;
 	hashtable->log2_nbuckets = log2_nbuckets;
 	hashtable->buckets = NULL;
 	hashtable->keepNulls = keepNulls;
@@ -285,6 +287,7 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 	hashtable->nbatch_outstart = nbatch;
 	hashtable->growEnabled = true;
 	hashtable->totalTuples = 0;
+	hashtable->batchTuples = 0;
 	hashtable->innerBatchFile = NULL;
 	hashtable->outerBatchFile = NULL;
 	hashtable->spaceUsed = 0;
@@ -386,6 +389,22 @@ ExecHashTableCreate(Hash *node, List *hashOperators, bool keepNulls)
 /* Target bucket loading (tuples per bucket) */
 #define NTUP_PER_BUCKET			10
 
+/* Multiple of NTUP_PER_BUCKET triggering the increase of nbuckets.
+ * 
+ * Once we reach the threshold we double the number of buckets, and we
+ * want to get 1.0 on average (to get NTUP_PER_BUCKET on average). That
+ * means these two equations should hold
+ * 
+ *   b = 2a (growth)
+ *   (a + b)/2 = 1  (oscillate around NTUP_PER_BUCKET)
+ * 
+ * which means b=1.333 (a = b/2). If we wanted higher threshold, we
+ * could grow the nbuckets to (4*nbuckets), thus using (b=4a) for
+ * growth, leading to (b=1.6). Or (b=8a) giving 1.778 etc.
+ * 
+ * Let's start with doubling the bucket count, i.e. 1.333. */
+#define NTUP_GROW_THRESHOLD 1.333
+
 void
 ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 		int *numbuckets,
@@ -651,6 +670,7 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 /* prevtuple doesn't change */
 hashtable->spaceUsed -=
 	HJTUPLE_OVERHEAD + HJTUPLE_MINTUPLE(tuple)->t_len;
+hashtable->batchTuples -= 1;
 pfree(tuple);
 nfreed++;
 			}
@@ -682,6 +702,92 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 }
 
 /*
+ * ExecHashIncreaseNumBuckets
+ *		increase the original number of buckets in order to reduce
+ *		number of tuples per bucket
+ */
+static void
+ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
+{
+	int			i;
+	int ntuples = 0;
+	int			oldnbuckets = hashtable->nbuckets;
+	HashJoinTuple  *oldbuckets = hashtable->buckets;
+	MemoryContext   oldcxt;
+
+	/* XXX Not sure if we should update the info about used space here.
+	 * The code seems to ignore the space used for 'buckets' and we're not
+	 * allocating more space for tuples (just shuffling them to the new
+	 * buckets). And the amount of memory used for buckets is quite small
+	 * (just an array of pointers, thus ~8kB per 1k buckets on 64-bit

Re: [HACKERS] What's the point of json_extract_path_op etc?

2014-06-26 Thread Tom Lane
Andrew Dunstan  writes:
> On 06/26/2014 03:11 PM, Tom Lane wrote:
>> Do we actually want to document these things as both operators and
>> functions?  If we do, then the right answer is to list them as known
>> exceptions in the opr_sanity test, not to hide the fact that we're
>> violating the general documentation policy.

> It's quite important that we have the variadic functions exposed.

Yeah, I suppose --- and at this point backwards compatibility would
demand it anyway.  I'll go fix the regression test.

regards, tom lane




Re: [HACKERS] Cluster name in ps output

2014-06-26 Thread Thomas Munro
On 25 June 2014 08:13, Fujii Masao  wrote:
> On Wed, Jun 25, 2014 at 1:29 PM, Abhijit Menon-Sen  
> wrote:
>> The patch looks OK, and works as advertised (I tested on Linux). If we
>> want the feature (I like it), this patch is a good enough way to get it.
>
> I got the following compiler warning.
>
> guc.c:3099: warning: initialization from incompatible pointer type

Thank you both for your reviews.  Please find attached an updated v5
patch (based on Abhijit's rebased and improved version) which fixes
the compiler warning.  (The previous unusual and possibly incorrect
static initialization with a string literal was intended to make sure
logging would work before the GUC machinery has finished setting up
default values, but that no longer applies.)

Regards,
Thomas Munro
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e3d1c62..e9a0e10 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -695,6 +695,23 @@ include 'filename'
   
  
 
+ 
+  cluster_name (string)
+  
+   cluster_name configuration parameter
+  
+  
+   
+Sets the cluster name that appears in the process title for all
+processes in this cluster.  No name is shown if this parameter is set
+to the empty string '' (which is the default).  The
+process title is typically viewed by the ps command, or in
+Windows by using the Process Explorer.
+This parameter can only be set at server start.
+   
+  
+ 
+
  
   tcp_keepalives_idle (integer)
   
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6902c23..5578d44 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -443,6 +443,7 @@ int			temp_file_limit = -1;
 
 int			num_temp_buffers = 1024;
 
+char	   *cluster_name;
 char	   *data_directory;
 char	   *ConfigFileName;
 char	   *HbaFileName;
@@ -3090,6 +3091,17 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
+		{"cluster_name", PGC_POSTMASTER, CONN_AUTH_SETTINGS,
+			gettext_noop("Sets the name of the cluster that appears in 'ps' output."),
+			NULL,
+			GUC_IS_NAME
+		},
+		&cluster_name,
+		"",
+		NULL, NULL, NULL
+	},
+
+	{
 		/*
 		 * Can't be set by ALTER SYSTEM as it can lead to recursive definition
 		 * of data_directory.
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d109394..e4e0411 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -74,6 +74,8 @@
 	# (change requires restart)
 #bonjour_name = ''			# defaults to the computer name
 	# (change requires restart)
+#cluster_name = ''			# visible in ps output if set
+	# (change requires restart)
 
 # - Security and Authentication -
 
diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index 3aeceae..471d890 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -29,6 +29,7 @@
 #include "libpq/libpq.h"
 #include "miscadmin.h"
 #include "utils/ps_status.h"
+#include "utils/guc.h"
 
 extern char **environ;
 bool		update_process_title = true;
@@ -264,15 +265,24 @@ init_ps_display(const char *username, const char *dbname,
 	 * apparently setproctitle() already adds a `progname:' prefix to the ps
 	 * line
 	 */
-	snprintf(ps_buffer, ps_buffer_size,
-			 "%s %s %s ",
-			 username, dbname, host_info);
+#define PROGRAM_NAME_PREFIX ""
 #else
-	snprintf(ps_buffer, ps_buffer_size,
-			 "postgres: %s %s %s ",
-			 username, dbname, host_info);
+#define PROGRAM_NAME_PREFIX "postgres: "
 #endif
 
+	if (*cluster_name == '\0')
+	{
+		snprintf(ps_buffer, ps_buffer_size,
+ PROGRAM_NAME_PREFIX "%s %s %s ",
+ username, dbname, host_info);
+	}
+	else
+	{
+		snprintf(ps_buffer, ps_buffer_size,
+ PROGRAM_NAME_PREFIX "%s %s %s %s ",
+ cluster_name, username, dbname, host_info);
+	}
+
 	ps_buffer_cur_len = ps_buffer_fixed_size = strlen(ps_buffer);
 
 	set_ps_display(initial_str, true);
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 1493d2c..0a729c1 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -224,6 +224,7 @@ extern int	temp_file_limit;
 
 extern int	num_temp_buffers;
 
+extern char *cluster_name;
 extern char *data_directory;
 extern char *ConfigFileName;
 extern char *HbaFileName;



Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-26 Thread Tom Lane
Andres Freund  writes:
> On 2014-06-26 14:13:07 -0700, Tom Lane wrote:
>> Surely it had better be a read barrier as well?

> I don't immediately see why it has to be read barrier? Hoisting a load
> from after the release into the locked area of code should be safe?

No doubt, but delaying a read till after the unlocking write would
certainly not be safe.

AFAICT, README.barrier completely fails to define what we think the
semantics of pg_read_barrier and pg_write_barrier actually are, so if
you believe that a write barrier prevents reordering of reads relative to
writes, you'd better propose some new text for that file.  It certainly
doesn't say that today.

regards, tom lane




Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-26 Thread Andres Freund
Hi,

On 2014-06-26 23:01:10 +0200, Andres Freund wrote:
> I think we should rework things so that we fall back to
> pg_write_barrier(), (*((volatile slock_t *) (lock)) = 0) instead of what
> we have right now.
> That'd require to make barrier.h independent from s_lock.h, but I think
> that'd be a pretty clear improvement. One open question is what happens
> with the SpinlockRelease() when barriers are implemented using spinlocks
> and we need a barrier for the SpinlockRelease().

Too tired to think about this any further today. Here's my current state
of this:
* barrier.h's spinlock implementation moved to s_lock.c, losing the
  s_lock.h include. That requires an S_UNLOCK definition which doesn't
  again use a barrier. I've coined it S_UNLOCKED_UNLOCK
* s_lock.h now includes barrier.h to implement the generic S_UNLOCK
  safely.
* gcc x86-64 had a superfluous "cc" clobber. Likely copied from the 32-bit
  version which does additional operations.
* PPC was missing a compiler barrier
* alpha was missing a "cc" clobber.
* mips was missing a compiler barrier and a "cc" clobber
* I have no idea how to fix pa-risc's S_UNLOCK for !gcc. The referenced
  spinlock paper calls an external function to avoid reordering.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index efe1b43..5052718 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -187,6 +187,23 @@ update_spins_per_delay(int shared_spins_per_delay)
 	return (shared_spins_per_delay * 15 + spins_per_delay) / 16;
 }
 
+/*
+ * Memory barrier implementation based on lock/unlock to a spinlock. Check
+ * barrier.h for details.
+ */
+#ifdef PG_SIMULATE_MEMORY_BARRIER
+void
+pg_memory_barrier_impl(void)
+{
+	S_LOCK(&dummy_spinlock);
+#ifdef S_UNLOCKED_UNLOCK
+	S_UNLOCKED_UNLOCK(&dummy_spinlock);
+#else
+	S_UNLOCK(&dummy_spinlock);
+#endif
+}
+#endif /* PG_SIMULATE_MEMORY_BARRIER */
+
 
 /*
  * Various TAS implementations that cannot live in s_lock.h as no inline
diff --git a/src/backend/storage/lmgr/spin.c b/src/backend/storage/lmgr/spin.c
index 9b71744..03a4fe8 100644
--- a/src/backend/storage/lmgr/spin.c
+++ b/src/backend/storage/lmgr/spin.c
@@ -25,6 +25,7 @@
 #include "access/xlog.h"
 #include "miscadmin.h"
 #include "replication/walsender.h"
+#include "storage/barrier.h"
 #include "storage/lwlock.h"
 #include "storage/pg_sema.h"
 #include "storage/spin.h"
diff --git a/src/include/storage/barrier.h b/src/include/storage/barrier.h
index bc61de0..e5692ac 100644
--- a/src/include/storage/barrier.h
+++ b/src/include/storage/barrier.h
@@ -13,10 +13,6 @@
 #ifndef BARRIER_H
 #define BARRIER_H
 
-#include "storage/s_lock.h"
-
-extern slock_t dummy_spinlock;
-
 /*
  * A compiler barrier need not (and preferably should not) emit any actual
  * machine code, but must act as an optimization fence: the compiler must not
@@ -155,10 +151,14 @@ extern slock_t dummy_spinlock;
  * spinlock acquire-and-release would be equivalent to a full memory barrier.
  * For example, I'm not sure that Itanium's acq and rel add up to a full
  * fence.  But all of our actual implementations seem OK in this regard.
+ *
+ * The actual implementation is in s_lock.c so we can include barrier.h from
+ * s_lock.h. Yes, this will make things even slower, but who cares.
  */
 #if !defined(pg_memory_barrier)
-#define pg_memory_barrier() \
-	do { S_LOCK(&dummy_spinlock); S_UNLOCK(&dummy_spinlock); } while (0)
+#define PG_SIMULATE_MEMORY_BARRIER
+extern void pg_memory_barrier_impl(void);
+#define pg_memory_barrier() pg_memory_barrier_impl()
 #endif
 
 /*
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index ba4dfe1..2732054 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -19,7 +19,8 @@
  *		Should return number of "delays"; see s_lock.c
  *
  *	void S_UNLOCK(slock_t *lock)
- *		Unlock a previously acquired lock.
+ *		Unlock a previously acquired lock. Needs to imply a compiler and
+ *  	write memory barrier.
  *
  *	bool S_LOCK_FREE(slock_t *lock)
  *		Tests if the lock is free. Returns TRUE if free, FALSE if locked.
@@ -39,6 +40,7 @@
  *		Atomic test-and-set instruction.  Attempt to acquire the lock,
  *		but do *not* wait.	Returns 0 if successful, nonzero if unable
  *		to acquire the lock.
+ *		Needs to imply a compiler and read memory barrier.
  *
  *	int TAS_SPIN(slock_t *lock)
  *		Like TAS(), but this version is used when waiting for a lock
@@ -94,6 +96,8 @@
 #ifndef S_LOCK_H
 #define S_LOCK_H
 
+#include "storage/barrier.h"
+
 #ifdef HAVE_SPINLOCKS	/* skip spinlocks if requested */
 
 #if defined(__GNUC__) || defined(__INTEL_COMPILER)
@@ -224,7 +228,7 @@ tas(volatile slock_t *lock)
 		"	xchgb	%0,%1	\n"
 :		"+q"(_res), "+m"(*lock)
 :
-:		"memory", "cc");
+:		"memory");
 	return (int) _res;
 }
 
@@ -478,14 +482,14 @@ tas(volati

Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-26 Thread Petr Jelinek

On 26/06/14 19:57, Sawada Masahiko wrote:

> $ pg_resetxlog  -s0 data
> Transaction log reset
> $ pg_controldata data | grep "Database system identifier"
> Database system identifier:   6029284919152642525
>
> This patch does not work correctly with -s0.



Yes, this is a bug; a 0 input should throw an error, which it does now.

Also, based on Alvaro's comment, I replaced the scanf parsing code with the
strtoul(l) function.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml
index 34b0606..39ae574 100644
--- a/doc/src/sgml/ref/pg_resetxlog.sgml
+++ b/doc/src/sgml/ref/pg_resetxlog.sgml
@@ -30,6 +30,7 @@ PostgreSQL documentation
-m mxid,mxid
-O mxoff
-l xlogfile
+   -s [sysid]
datadir
   
  
@@ -184,6 +185,13 @@ PostgreSQL documentation
   
 
   
+   The -s option instructs pg_resetxlog to
+   set the system identifier of the cluster to the specified sysid.
+   If sysid is not provided, a new one will be generated using the
+   same algorithm initdb uses.
+  
+
+  
The -V and --version options print
the pg_resetxlog version and exit.  The
options -? and --help show supported arguments,
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 915a1ed..0b28bc0 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -67,7 +67,9 @@ static MultiXactId set_mxid = 0;
 static MultiXactOffset set_mxoff = (MultiXactOffset) -1;
 static uint32 minXlogTli = 0;
 static XLogSegNo minXlogSegNo = 0;
+static uint64 set_sysid = 0;
 
+static uint64 GenerateSystemIdentifier(void);
 static bool ReadControlFile(void);
 static void GuessControlValues(void);
 static void PrintControlValues(bool guessed);
@@ -111,7 +113,7 @@ main(int argc, char *argv[])
 	}
 
 
-	while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1)
+	while ((c = getopt(argc, argv, "fl:m:no:O:x:e:s::")) != -1)
 	{
 		switch (c)
 		{
@@ -227,6 +229,33 @@ main(int argc, char *argv[])
 XLogFromFileName(optarg, &minXlogTli, &minXlogSegNo);
 break;
 
+			case 's':
+if (optarg)
+{
+#ifdef HAVE_STRTOULL
+	set_sysid = strtoull(optarg, &endptr, 10);
+#else
+	set_sysid = strtoul(optarg, &endptr, 10);
+#endif
+	/*
+	 * Validate input, we use strspn because strtoul(l) accepts
+	 * negative numbers and silently converts them to unsigned
+	 */
+	if (set_sysid == 0 || errno != 0 ||
+		endptr == optarg || *endptr != '\0' ||
+		strspn(optarg, "0123456789") != strlen(optarg))
+	{
+		fprintf(stderr, _("%s: invalid argument for option -s\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}
+}
+else
+{
+	set_sysid = GenerateSystemIdentifier();
+}
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit(1);
@@ -354,6 +383,9 @@ main(int argc, char *argv[])
 	if (minXlogSegNo > newXlogSegNo)
 		newXlogSegNo = minXlogSegNo;
 
+	if (set_sysid != 0)
+		ControlFile.system_identifier = set_sysid;
+
 	/*
 	 * If we had to guess anything, and -f was not given, just print the
 	 * guessed values and exit.  Also print if -n is given.
@@ -395,6 +427,26 @@ main(int argc, char *argv[])
 
 
 /*
+ * Create a new unique installation identifier.
+ *
+ * See notes in xlog.c about the algorithm.
+ */
+static uint64
+GenerateSystemIdentifier(void)
+{
+	uint64			sysidentifier;
+	struct timeval	tv;
+
+	gettimeofday(&tv, NULL);
+	sysidentifier = ((uint64) tv.tv_sec) << 32;
+	sysidentifier |= ((uint64) tv.tv_usec) << 12;
+	sysidentifier |= getpid() & 0xFFF;
+
+	return sysidentifier;
+}
+
+
+/*
  * Try to read the existing pg_control file.
  *
  * This routine is also responsible for updating old pg_control versions
@@ -475,9 +527,6 @@ ReadControlFile(void)
 static void
 GuessControlValues(void)
 {
-	uint64		sysidentifier;
-	struct timeval tv;
-
 	/*
 	 * Set up a completely default set of pg_control values.
 	 */
@@ -489,14 +538,10 @@ GuessControlValues(void)
 
 	/*
 	 * Create a new unique installation identifier, since we can no longer use
-	 * any old XLOG records.  See notes in xlog.c about the algorithm.
+	 * any old XLOG records.
 	 */
-	gettimeofday(&tv, NULL);
-	sysidentifier = ((uint64) tv.tv_sec) << 32;
-	sysidentifier |= ((uint64) tv.tv_usec) << 12;
-	sysidentifier |= getpid() & 0xFFF;
 
-	ControlFile.system_identifier = sysidentifier;
+	ControlFile.system_identifier = GenerateSystemIdentifier();
 
 	ControlFile.checkPointCopy.redo = SizeOfXLogLongPHD;
 	ControlFile.checkPointCopy.ThisTimeLineID = 1;
@@ -649,6 +694,21 @@ PrintNewControlValues()
 	XLogFileName(fname, ControlFile.checkPointCopy.ThisTimeLineID, newXlogSegNo);
 	printf(_("First log segment after reset:%s\n"), fname);
 
+	if (set_sysid != 0)
+	{
+		char		sysident_str[32];
+
+		/*
+		 * Forma

Re: [HACKERS] Window function optimisation, allow pushdowns of items matching PARTITION BY clauses

2014-06-26 Thread Vik Fearing
On 06/21/2014 02:03 PM, David Rowley wrote:
> I'm marking this Waiting on Author pending discussion on pushing down
> entire expressions, but on the whole I think this is pretty much ready.
> 
> 
> As I said above, I don't think playing around with that code is really
> work for this patch. It can be done later in another patch if people
> think it would be useful.

I tend to agree.

This latest patch is ready for a committer to look at now.  The weird
comments have been changed, superfluous regression tests removed, and
nothing done about expression pushdown per (brief) discussion.
-- 
Vik




Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-06-26 Thread Andres Freund
On 2014-06-26 15:40:11 -0700, Tom Lane wrote:
> Andres Freund  writes:
> > On 2014-06-26 14:13:07 -0700, Tom Lane wrote:
> >> Surely it had better be a read barrier as well?
> 
> > I don't immediately see why it has to be read barrier? Hoisting a load
> > from after the release into the locked area of code should be safe?
> 
> No doubt, but delaying a read till after the unlocking write would
> certainly not be safe.

Right. What we actually want for locking is what several systems
(e.g. C11, linux) coin acquire/release barriers.
Not sure whether we should introduce a separate set of acquire/release
macros, or "promote" our barriers to have stronger guarantees than the
name implies.

The definition as I understand it is:

An acquire barrier implies that:
* memory operations from after the barrier cannot appear to have
  happened before the barrier
* but: memory operations from *before* the barrier are *not* guaranteed to be
  finished

A finished release barrier implies:
* stores from before the barrier cannot be moved past
* loads from before the barrier cannot be moved past
* but: reads from *after* the barrier might occur *before* the barrier.

I believe that all our current barrier definitions (except maybe alpha
which I haven't bothered to check thoroughly) satisfy those
constraints. That's primarily because we don't have support for all that
many platforms and use full memory barriers for read/write barriers in
several places.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Extending MSVC scripts to support --with-extra-version

2014-06-26 Thread Asif Naeem
Hi,

I have spent some time reviewing the code. It applies cleanly, and the PG
master branch builds fine with extraver set or left undefined. I have
observed the following output after applying the patch:

*Keeping extraver undefined* :

C:\PG\postgresql\inst_withpatch_no_extra-version>bin\psql.exe -d postgres
> psql (9.5devel)
> WARNING: Console code page (437) differs from Windows code page (1252)
>  8-bit characters might not work correctly. See psql reference
>  page "Notes for Windows users" for details.
> Type "help" for help.
> postgres=# select version();
> version
> 
>  PostgreSQL 9.5devel, compiled by Visual C++ build 1600, 64-bit
> (1 row)
> C:\PG\postgresql\inst_withpatch_no_extra-version>bin\initdb.exe -V
> initdb (PostgreSQL) 9.5devel


*Setting extraver as ‘June27’* :

C:\PG\postgresql\inst_withpatch_extra-version>bin\psql -d postgres
> psql (9.5devel)
> WARNING: Console code page (437) differs from Windows code page (1252)
>  8-bit characters might not work correctly. See psql reference
>  page "Notes for Windows users" for details.
> Type "help" for help.
> postgres=# select version();
>version
> --
>  PostgreSQL 9.5develJune27, compiled by Visual C++ build 1600, 64-bit
> (1 row)
>
> C:\PG\postgresql\inst_withpatch_extra-version>bin\initdb.exe -V
> initdb (PostgreSQL) 9.5devel


It seems that the extraver information only appears in the output of the
version() function. If we use -V (--version) with the pg utilities/binaries,
the additional version information is not included.

Can you please explain how I can achieve similar functionality via the
configure script (usable on Unix-like OSes/MinGW), or is this intended as a
Windows-specific feature? Thanks.

Regards,
Muhammad Asif Naeem


On Tue, May 27, 2014 at 5:58 AM, Michael Paquier 
wrote:

> Hi all,
>
> Please find attached a patch extending support of --with-extra-version
> in the MSVC scripts. This is something I have been using internally,
> and I believe that it is useful for Windows packagers.
> Similarly to the other options, a new field called extraver is added
> in config_default.pl and it can be overwritten in config.pl to have an
> additional version string, similarly to what is currently doable with
> ./configure.
>
> I'll add this patch to the next commit fest.
> Regards,
> --
> Michael
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement

2014-06-26 Thread Andreas Karlsson

On 06/24/2014 03:20 AM, Jeff Janes wrote:

> I've tried your 0001 patch, reflecting this refactoring, on Linux and it
> caused 'make check' to hang at 'starting postmaster'.


I found the bug in the code, and I have attached a patch which you
can apply on top of the original patch. The regression tests pass now on
my Debian machine.


One thing I noticed when trying to find the bug is that be-secure.c 
still includes some OpenSSL headers. Those should be removed since they 
have already been moved to be-secure-openssl.c.


--
Andreas Karlsson
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 6d943cf..11f67c4 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -280,7 +280,7 @@ secure_write(Port *port, void *ptr, size_t len)
 	}
 	else
 #endif
-		n = secure_raw_read(port, ptr, len);
+		n = secure_raw_write(port, ptr, len);
 
 	return n;
 }



Re: [HACKERS] Extending MSVC scripts to support --with-extra-version

2014-06-26 Thread Michael Paquier
On Fri, Jun 27, 2014 at 8:26 AM, Asif Naeem  wrote:

> I have spent some time reviewing the code. It applies well and PG master
> branch build fine with setting extraver or keep it undefined.
>
Thanks for reviewing that.


> I have observed the following output applying the patch i.e.
>
> It seems that extraver information only appears when version function is
> being used. If we use -V (--version)  with pg utilities/binaries, it does
> not include additional provided information.
>
You are right. The first version of this patch updates PG_VERSION_STR but
not PG_VERSION, which is the string used by all the binaries to report the
version.


>  Can you please guide how can I perform similar functionality via
> configure script (that can be used on Unix like OS/MinGW) or It is intended
> for Window specific requirement ?. Thanks.
>
Sure, you can do the equivalent with plain configure like that:
./configure --with-extra-version="-foo" --prefix=/to/path/
And here is the output that I get with such options on OSX for example:
$ psql -c 'select substring(version(), 1, 52)'
  substring
--
 PostgreSQL 9.5devel-foo on x86_64-apple-darwin13.2.0
(1 row)
$ initdb --version
initdb (PostgreSQL) 9.5devel-foo

With the new patch attached, the following output is generated for an MSVC
build:
$ psql -c 'select version()'
  version

 PostgreSQL 9.5devel-foo, compiled by Visual C++ build 1600, 64-bit
(1 row)
$ initdb --version
initdb (PostgreSQL) 9.5devel-foo

Regards,
--
Michael
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index f7a5abb..d53803e 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -164,9 +164,9 @@ sub GenerateFiles
 		  || confess "Could not write to pg_config.h\n";
 		while ()
 		{
-			s{PG_VERSION "[^"]+"}{PG_VERSION "$self->{strver}"};
+			s{PG_VERSION "[^"]+"}{PG_VERSION "$self->{strver}$self->{options}->{extraver}"};
 			s{PG_VERSION_NUM \d+}{PG_VERSION_NUM $self->{numver}};
-s{PG_VERSION_STR "[^"]+"}{__STRINGIFY(x) #x\n#define __STRINGIFY2(z) __STRINGIFY(z)\n#define PG_VERSION_STR "PostgreSQL $self->{strver}, compiled by Visual C++ build " __STRINGIFY2(_MSC_VER) ", $bits-bit"};
+			s{PG_VERSION_STR "[^"]+"}{__STRINGIFY(x) #x\n#define __STRINGIFY2(z) __STRINGIFY(z)\n#define PG_VERSION_STR "PostgreSQL $self->{strver}$self->{options}->{extraver}, compiled by Visual C++ build " __STRINGIFY2(_MSC_VER) ", $bits-bit"};
 			print O;
 		}
 		print O "#define PG_MAJORVERSION \"$self->{majorver}\"\n";
@@ -625,14 +625,15 @@ sub GetFakeConfigure
 	$cfg .= ' --enable-nls' if ($self->{options}->{nls});
 	$cfg .= ' --with-ldap'  if ($self->{options}->{ldap});
 	$cfg .= ' --without-zlib' unless ($self->{options}->{zlib});
-	$cfg .= ' --with-openssl'   if ($self->{options}->{ssl});
-	$cfg .= ' --with-ossp-uuid' if ($self->{options}->{uuid});
-	$cfg .= ' --with-libxml'if ($self->{options}->{xml});
-	$cfg .= ' --with-libxslt'   if ($self->{options}->{xslt});
-	$cfg .= ' --with-gssapi'if ($self->{options}->{gss});
-	$cfg .= ' --with-tcl'   if ($self->{options}->{tcl});
-	$cfg .= ' --with-perl'  if ($self->{options}->{perl});
-	$cfg .= ' --with-python'if ($self->{options}->{python});
+	$cfg .= ' --with-extra-version' if ($self->{options}->{extraver});
+	$cfg .= ' --with-openssl'   if ($self->{options}->{ssl});
+	$cfg .= ' --with-ossp-uuid' if ($self->{options}->{uuid});
+	$cfg .= ' --with-libxml'if ($self->{options}->{xml});
+	$cfg .= ' --with-libxslt'   if ($self->{options}->{xslt});
+	$cfg .= ' --with-gssapi'if ($self->{options}->{gss});
+	$cfg .= ' --with-tcl'   if ($self->{options}->{tcl});
+	$cfg .= ' --with-perl'  if ($self->{options}->{perl});
+	$cfg .= ' --with-python'if ($self->{options}->{python});
 
 	return $cfg;
 }
diff --git a/src/tools/msvc/config_default.pl b/src/tools/msvc/config_default.pl
index ebb47ab..20aee8b 100644
--- a/src/tools/msvc/config_default.pl
+++ b/src/tools/msvc/config_default.pl
@@ -10,17 +10,18 @@ our $config = {
 	  # blocksize => 8, # --with-blocksize, 8kB by default
 	  # wal_blocksize => 8, # --with-wal-blocksize, 8kB by default
 	  # wal_segsize => 16,  # --with-wal-segsize, 16MB by default
-	ldap=> 1,# --with-ldap
-	nls => undef,# --enable-nls=
-	tcl => undef,# --with-tls=
-	perl=> undef,# --with-perl
-	python  => undef,# --with-python=
-	openssl => undef,# --with-ssl=
-	uuid=> undef,# --with-ossp-uuid
-	xml => undef,# --with-libxml=
-	xslt=> undef,# --with-libxslt=
-	iconv   => undef,# (not in configure, path to iconv)
-	zlib=> undef # --with-zlib=
+	ldap => 1,   # --with-ldap
+	extraver => undef,   # --with-extra-version=
+	nls  => undef,   # --enable-nls=
+	tcl  => undef,   # --with-tls=
+	

Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension

2014-06-26 Thread Tom Dunstan
On 27 June 2014 06:14, Gavin Flower  wrote:

> On 27/06/14 00:12, Rushabh Lathia wrote:
>
>> INSERT INTO dept VALUES (10,'ACCOUNTING','NEW YORK') returning primary
>> key, dname;
>>
>> I think allowing other columns with PRIMARY KEY would be more useful
>> syntax.
>> Even in later versions if we want to extend this syntax to return UNIQUE
>> KEY,
>> SEQUENCE VALUES, etc.. comma separation syntax will be more handy.
>>
>>
> I agree 100%.
>

If the query is being hand-crafted, what's to stop the query writer from
just listing the id columns in the returning clause? And someone specifying
RETURNING * is getting all the columns anyway.

The target use-case for this feature is a database driver that has just
done an insert and doesn't know what the primary key columns are - in that
case mixing them with any other columns is actually counter-productive as
the driver won't know which columns are which. What use cases are there
where the writer of the query knows enough to write specific columns in the
RETURNING clause but not enough to know which column is the id column?

Consistency is nice, and I can understand wanting to treat the PRIMARY KEY
bit as just another set of columns in the list to return, but I'd hate to
see this feature put on the back-burner to support use-cases that are
already handled by the current RETURNING feature. Maybe it's easy to do,
though.. I haven't looked into the implementation at all.

Cheers

Tom


Re: [HACKERS] Window function optimisation, allow pushdowns of items matching PARTITION BY clauses

2014-06-26 Thread Tom Lane
Vik Fearing  writes:
> This latest patch is ready for a committer to look at now.  The weird
> comments have been changed, superfluous regression tests removed, and
> nothing done about expression pushdown per (brief) discussion.

I started to look at this patch and realized that there's an issue that
isn't covered, which is not too surprising because the existing code fails
to cover it too.  Remember that the argument for pushing down being safe
at all is that we expect the pushed-down qual to yield the same result at
all rows of a given partition, so that we either include or exclude the
whole partition and thereby don't change window function results.  This
means that not only must the qual expression depend only on partitioning
columns, but *it had better not be volatile*.

In exactly the same way, it isn't safe to push down quals into
subqueries that use DISTINCT unless the quals are non-volatile.  This
consideration is missed by the current code, and I think that's a bug.

(Pushing down volatile quals would also be unsafe in subqueries involving
aggregation, except that we put them into HAVING so that they're executed
only once per subquery output row anyway.)

Given the lack of prior complaints, I'm not excited about back-patching a
change to prevent pushing down volatile quals in the presence of DISTINCT;
but I think we probably ought to fix it in 9.5, and maybe 9.4 too.

Thoughts?

regards, tom lane




Re: [HACKERS] "RETURNING PRIMARY KEY" syntax extension

2014-06-26 Thread Ian Barwick



On 27/06/14 09:09, Tom Dunstan wrote:

> On 27 June 2014 06:14, Gavin Flower <gavinflo...@archidevsys.co.nz> wrote:
>
>> On 27/06/14 00:12, Rushabh Lathia wrote:
>>
>>> INSERT INTO dept VALUES (10,'ACCOUNTING','NEW YORK') returning primary
>>> key, dname;
>>>
>>> I think allowing other columns with PRIMARY KEY would be more useful
>>> syntax.
>>> Even in later versions if we want to extend this syntax to return
>>> UNIQUE KEY, SEQUENCE VALUES, etc.. comma separation syntax will be
>>> more handy.
>>
>> I agree 100%.
>
> If the query is being hand-crafted, what's to stop the query writer from
> just listing the id columns in the returning clause? And someone
> specifying RETURNING * is getting all the columns anyway.
>
> The target use-case for this feature is a database driver that has just
> done an insert and doesn't know what the primary key columns are - in
> that case mixing them with any other columns is actually
> counter-productive as the driver won't know which columns are which.
> What use cases are there where the writer of the query knows enough to
> write specific columns in the RETURNING clause but not enough to know
> which column is the id column?
>
> Consistency is nice, and I can understand wanting to treat the PRIMARY
> KEY bit as just another set of columns in the list to return, but I'd
> hate to see this feature put on the back-burner to support use-cases
> that are already handled by the current RETURNING feature. Maybe it's
> easy to do, though.. I haven't looked into the implementation at all.

Normal columns are injected into the query's returning list at parse time,
whereas this version of the patch handles expansion of PRIMARY KEY at the
rewrite stage, which would make a mix of PRIMARY KEY and normal output
expressions somewhat tricky to handle. (In order to maintain the columns in
their expected position you'd have to add some sort of placeholder/dummy
TargetEntry to the returning list at parse time, then rewrite it later with
the expanded primary key columns, or something equally messy.)

On the other hand, it should be fairly straightforward to handle a list of 
keywords
for expansion (e.g. "RETURNING PRIMARY KEY, UNIQUE KEYS, SEQUENCE VALUES") 
should
the need arise.
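The rewrite-stage expansion described above can be sketched with a toy
model (Python, illustrative only -- the function, the catalog dict, and
the table/column names are assumptions for the sketch, not PostgreSQL
internals):

```python
# Toy model of expanding a RETURNING keyword entry into real columns at
# the "rewrite" stage, leaving ordinary output expressions in place.

PK_COLUMNS = {"dept": ["deptno"]}   # hypothetical catalog lookup


def rewrite_returning(table, returning_items):
    """Expand keyword entries (e.g. 'PRIMARY KEY') into the table's
    primary key columns, preserving the positions of normal columns."""
    out = []
    for item in returning_items:
        if item == "PRIMARY KEY":
            out.extend(PK_COLUMNS[table])   # placeholder replaced here
        else:
            out.append(item)
    return out


# RETURNING PRIMARY KEY, dname  ->  RETURNING deptno, dname
print(rewrite_returning("dept", ["PRIMARY KEY", "dname"]))
```

The tricky part the mail describes is that in the real patch the normal
columns are resolved much earlier, so the placeholder would have to
survive from parse time to rewrite time.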


Regards

Ian Barwick

--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Window function optimisation, allow pushdowns of items matching PARTITION BY clauses

2014-06-26 Thread Vik Fearing
On 06/27/2014 02:49 AM, Tom Lane wrote:
> Vik Fearing  writes:
>> This latest patch is ready for a committer to look at now.  The weird
>> comments have been changed, superfluous regression tests removed, and
>> nothing done about expression pushdown per (brief) discussion.
> 
> I started to look at this patch and realized that there's an issue that
> isn't covered, which is not too surprising because the existing code fails
> to cover it too.  Remember that the argument for pushing down being safe
> at all is that we expect the pushed-down qual to yield the same result at
> all rows of a given partition, so that we either include or exclude the
> whole partition and thereby don't change window function results.  This
> means that not only must the qual expression depend only on partitioning
> columns, but *it had better not be volatile*.
> 
> In exactly the same way, it isn't safe to push down quals into
> subqueries that use DISTINCT unless the quals are non-volatile.  This
> consideration is missed by the current code, and I think that's a bug.
> 
> (Pushing down volatile quals would also be unsafe in subqueries involving
> aggregation, except that we put them into HAVING so that they're executed
> only once per subquery output row anyway.)
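The hazard described above can be modeled outside the database. A toy
Python sketch (illustrative only, not PostgreSQL code; the tables and
predicates are invented for the example):

```python
# A qual may be pushed below a window function only if it is a stable
# predicate on the partition key, so it keeps or drops whole partitions.

def rank_over(rows):
    """rank() OVER (PARTITION BY dep ORDER BY salary); salaries assumed
    distinct within a partition for this sketch."""
    out = []
    for dep in {d for d, _ in rows}:
        part = sorted(s for d, s in rows if d == dep)
        out.extend((dep, s, part.index(s) + 1) for s in part)
    return sorted(out)

rows = [("hr", 150), ("hr", 250),
        ("sales", 100), ("sales", 200), ("sales", 300)]

# Stable qual on the partition key: pushing it down is safe.
dep_is_sales = lambda dep, sal: dep == "sales"
pushed   = rank_over([r for r in rows if dep_is_sales(*r)])
unpushed = [t for t in rank_over(rows) if dep_is_sales(t[0], t[1])]
assert pushed == unpushed              # same ranks either way

# "Volatile" qual (keeps every other row it evaluates): pushing it down
# filters away part of a partition before ranking, so the ranks change.
def every_other():
    state = [0]
    def f(dep, sal):
        state[0] += 1
        return state[0] % 2 == 0
    return f

keep = every_other()
pushed_v = rank_over([r for r in rows if keep(*r)])
keep = every_other()
unpushed_v = [t for t in rank_over(rows) if keep(t[0], t[1])]
assert pushed_v != unpushed_v          # window function results differ
```

The same rows survive both volatile variants, but their rank() values
disagree, which is exactly why such quals must not be pushed down.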

Are you going to take care of all this, or should David or I take a
crack at it?  The commitfest app still shows Ready for Committer.

> Given the lack of prior complaints, I'm not excited about back-patching a
> change to prevent pushing down volatile quals in the presence of DISTINCT;
> but I think we probably ought to fix it in 9.5, and maybe 9.4 too.
> 
> Thoughts?

I didn't test it myself, I'm just taking your word on it.

If it's a bug, it should obviously be fixed in 9.5.  As for 9.4, I have
always viewed a beta as a time to fix bugs so I vote to backpatch it at
least that far.

Why wouldn't it go back all the way to 9.0?  (assuming 8.4 is dead)
-- 
Vik




Re: [HACKERS] Window function optimisation, allow pushdowns of items matching PARTITION BY clauses

2014-06-26 Thread Tom Lane
Vik Fearing  writes:
> On 06/27/2014 02:49 AM, Tom Lane wrote:
>> In exactly the same way, it isn't safe to push down quals into
>> subqueries that use DISTINCT unless the quals are non-volatile.  This
>> consideration is missed by the current code, and I think that's a bug.

>> Given the lack of prior complaints, I'm not excited about back-patching a
>> change to prevent pushing down volatile quals in the presence of DISTINCT;
>> but I think we probably ought to fix it in 9.5, and maybe 9.4 too.

> Why wouldn't it go back all the way to 9.0?  (assuming 8.4 is dead)

People get unhappy when minor releases de-optimize queries that had
been working for them.  It's not too late to change the behavior of
9.4, but I'm hesitant to do it in already-released branches, especially
in the absence of any complaints from the field.

> Are you going to take care of all this, or should David or I take a
> crack at it?  The commitfest app still shows Ready for Committer.

I can deal with it.

regards, tom lane




Re: [HACKERS] Window function optimisation, allow pushdowns of items matching PARTITION BY clauses

2014-06-26 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Why wouldn't it go back all the way to 9.0?  (assuming 8.4 is dead)
> 
> People get unhappy when minor releases de-optimize queries that had
> been working for them.  It's not too late to change the behavior of
> 9.4, but I'm hesitant to do it in already-released branches, especially
> in the absence of any complaints from the field.

Agreed.  Back-patching to released versions would likely end up making
for a nasty surprise in a minor point release for some users.

Thanks,

Stephen




Re: [HACKERS] pg_resetxlog to clear backup start/end locations.

2014-06-26 Thread Kyotaro HORIGUCHI
Hello, I have finished the patches for all of 9.x.

> I don't touch what the '-n' option shows, and rewrite the documents
> for the option a bit. And '-n' won't show the changes of the backup
> locations.

-8.4: does not have backup locations in ControlFileData.

9.0-9.1: resetxlog_backuploc_9_0-9.1.patch: Add clearance of backupStartPoint.

9.2: resetxlog_backuploc_9_2.patch: Add clearance of
   backupStart/EndPoint and backupEndRequired

9.3: resetxlog_backuploc_9_3.patch: Ditto. (format of XLogRecPtr changed)

9.4-master: resetxlog_backuploc_9_4-master.patch: Add clearance of
   backupPoints. help message and html doc changed.

With these patches, pg_resetxlog rescues a server that is stuck after a
recovery failure caused by a wrongly placed backup label.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml
index 34b0606..b543f04 100644
--- a/doc/src/sgml/ref/pg_resetxlog.sgml
+++ b/doc/src/sgml/ref/pg_resetxlog.sgml
@@ -175,12 +175,14 @@ PostgreSQL documentation
   
 
   
+
The -n (no operation) option instructs
pg_resetxlog to print the values reconstructed from
-   pg_control and values about to be changed, and then exit
-   without modifying anything. This is mainly a debugging tool, but can be
-   useful as a sanity check before allowing pg_resetxlog
-   to proceed for real.
+
+   pg_control and significant values about to be changed, and
+   then exit without modifying anything. This is mainly a debugging tool, but
+   can be useful as a sanity check before allowing
+   pg_resetxlog to proceed for real.
   
 
   
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 915a1ed..9a80775 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -354,6 +354,10 @@ main(int argc, char *argv[])
 	if (minXlogSegNo > newXlogSegNo)
 		newXlogSegNo = minXlogSegNo;
 
+	ControlFile.backupStartPoint = InvalidXLogRecPtr;
+	ControlFile.backupEndPoint = InvalidXLogRecPtr;
+	ControlFile.backupEndRequired = false;
+
 	/*
 	 * If we had to guess anything, and -f was not given, just print the
 	 * guessed values and exit.  Also print if -n is given.
@@ -1083,7 +1087,7 @@ usage(void)
 	printf(_("  -f   force update to be done\n"));
 	printf(_("  -l XLOGFILE  force minimum WAL starting location for new transaction log\n"));
 	printf(_("  -m MXID,MXID set next and oldest multitransaction ID\n"));
-	printf(_("  -n   no update, just show what would be done (for testing)\n"));
+	printf(_("  -n   no update, just show significant changes which would be done\n   (for testing)\n"));
 	printf(_("  -o OID   set next OID\n"));
 	printf(_("  -O OFFSETset next multitransaction offset\n"));
 	printf(_("  -V, --versionoutput version information, then exit\n"));
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 38d03a3..e25ad64 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -350,6 +350,10 @@ main(int argc, char *argv[])
 	if (minXlogSegNo > newXlogSegNo)
 		newXlogSegNo = minXlogSegNo;
 
+	ControlFile.backupStartPoint = InvalidXLogRecPtr;
+	ControlFile.backupEndPoint = InvalidXLogRecPtr;
+	ControlFile.backupEndRequired = false;
+
 	/*
 	 * If we had to guess anything, and -f was not given, just print the
 	 * guessed values and exit.  Also print if -n is given.
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 92d98fc..dd1930c 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -341,6 +341,12 @@ main(int argc, char *argv[])
 		newXlogSeg = minXlogSeg;
 	}
 
+	ControlFile.backupStartPoint.xlogid = 0;
+	ControlFile.backupStartPoint.xrecoff = 0;
+	ControlFile.backupEndPoint.xlogid = 0;
+	ControlFile.backupEndPoint.xrecoff = 0;
+	ControlFile.backupEndRequired = false;
+
 	/*
 	 * If we had to guess anything, and -f was not given, just print the
 	 * guessed values and exit.  Also print if -n is given.
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index a120c6f..8ca1e2b 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -341,6 +341,9 @@ main(int argc, char *argv[])
 		newXlogSeg = minXlogSeg;
 	}
 
+	ControlFile.backupStartPoint.xlogid = 0;
+	ControlFile.backupStartPoint.xrecoff = 0;
+
 	/*
 	 * If we had to guess anything, and -f was not given, just print the
 	 * guessed values and exit.  Also print if -n is given.



Re: [HACKERS] postgresql.auto.conf and reload

2014-06-26 Thread Amit Kapila
On Thu, Jun 26, 2014 at 1:49 PM, Christoph Berg  wrote:
>
> Re: Amit Kapila 2014-06-26 
> > On Wed, Jun 25, 2014 at 7:52 PM, Christoph Berg  wrote:
> > > Re: Amit Kapila 2014-06-25 <
> > caa4ek1+f9ztogvvw-wyj2+vt0k8_jxtziqhp8ivb7wdo1w1...@mail.gmail.com>
> > >
> > > > I think maintaining values both in postgresql.conf and by Alter
System
> > > > is not advisable.
> > >
> > > Possibly, but then the system should be warning about all options, not
> > > just the restart-only ones. And it should warn at startup, not at
> > > reload time.
> >
> > How about adding a note in Alter System so that users are aware of
> > such behaviour and can ensure that they don't have duplicate entries?
>
> If the behavior isn't going to change, that issue needs to be
> documented, sure.

I will send a patch to address this, unless someone comes up with a
better way to address it, and if no one objects to adding a note to the
ALTER SYSTEM documentation.

> > Clearly such warnings indicate that there are conflicting settings, so
> > user can take appropriate action to avoid it.
>
> I don't think conflicting settings in postgresql.auto.conf are a user
> error, or misconfiguration. They are just normal use of ALTER SYSTEM.
> Of course the user might want to eventually consolidate their config,
> but we shouldn't treat the situation as bogus.
>
> Frankly what bugs me most about this is that the warnings occur only
> at reload time, not at startup. If the server thinks something is
> wrong with my config, it should tell me rightaway.

As per the current design/code, the server doesn't treat duplicate
entries in config files as a problem; rather, the last one is given
preference. So in the case you are mentioning, it gives a warning at
reload time when it encounters a value different from the current value
of a PGC_POSTMASTER parameter.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] ALTER SYSTEM RESET?

2014-06-26 Thread Amit Kapila
On Thu, Jun 26, 2014 at 8:17 PM, Vik Fearing  wrote:
> On 06/26/2014 05:07 AM, Amit Kapila wrote:
> > I think it will make sense if we support RESET ALL as well similar
> > to Alter Database .. RESET ALL syntax.  Do you see any reason
> > why we shouldn't support RESET ALL syntax for Alter System?
>
> Yes, that makes sense.  I've added that in the attached version 2 of the
> patch.

I think the idea used in the patch to implement RESET ALL is sane.  In
passing, I noticed that the modified lines in the .sgml have crossed the
80-character boundary, which is generally preferred to be maintained.

Refer below modification.

!    values to the postgresql.auto.conf file.
!    Setting the parameter to DEFAULT, or using the
!    RESET variant, removes the configuration entry from


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
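As a toy model of the semantics under discussion (illustrative only --
a dict standing in for the "name = value" entries of
postgresql.auto.conf, not the server's actual implementation):

```python
# ALTER SYSTEM SET writes/overwrites one entry, RESET (or SET ... TO
# DEFAULT) removes one entry, and the proposed RESET ALL empties the file.

def alter_system(conf, action, name=None, value=None):
    if action == "SET":
        conf[name] = value           # add or overwrite a single entry
    elif action == "RESET":          # equivalent to SET ... TO DEFAULT
        conf.pop(name, None)         # remove the entry if present
    elif action == "RESET ALL":
        conf.clear()                 # remove every entry from the file
    return conf


conf = {}
alter_system(conf, "SET", "work_mem", "64MB")
alter_system(conf, "SET", "wal_level", "logical")
alter_system(conf, "RESET", "work_mem")
assert conf == {"wal_level": "logical"}
alter_system(conf, "RESET ALL")
assert conf == {}
```

In the real server, of course, the changes only take effect after a
configuration reload (SIGHUP) or a restart, depending on the parameter.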


[HACKERS] Index-only scans and non-MVCC snapshots

2014-06-26 Thread Ryan Johnson

Hi,

As part of a research project, I'm trying to change Read Committed 
isolation to use HeapTupleSatisfiesNow rather than acquiring a new 
snapshot at every command [1]. Things appear to have gone reasonably 
well so far, except certain queries fail with "ERROR:  non-MVCC 
snapshots are not supported in index-only scans."


I'm using v9.3.2, and the docs claim that index-only scans work without 
MVCC, but require some extra locking to avoid races [2]. Is this not 
actually implemented? If that is the case, shouldn't the query optimizer 
avoid selecting index-only scans for non-MVCC snapshots?


I realize I'm playing with fire here, but any pointers to sections of 
code I might look at to either work around or fix this issue would be 
greatly appreciated. I've been looking around in index_fetch_heap 
(indexam.c) as well as other locations that use scan->xs_continue_hot; 
there seems to be code in place to detect when a non-MVCC snapshot is in 
use, as if that were nothing out of the ordinary, but nothing prevents 
the error from arising if a hot chain is actually encountered.


Thanks,
Ryan

[1] Right now, Read Committed is significantly *slower* than Repeatable 
Read---for transactions involving multiple short commands---because the 
former acquires multiple snapshots per transaction and causes a lwlock 
bottleneck on my 12-core machine.


[2] http://www.postgresql.org/docs/9.3/static/index-locking.html:
with a non-MVCC-compliant snapshot (such as SnapshotNow), it would be 
possible to accept and return a row that does not in fact match the 
scan keys ... [so] we use a pin on an index page as a proxy to 
indicate that the reader might still be "in flight" from the index 
entry to the matching heap entry. Making ambulkdelete block on such a 
pin ensures that VACUUM cannot delete the heap entry before the reader 
is done with it. ... This solution requires that index scans be 
"synchronous": we have to fetch each heap tuple immediately after 
scanning the corresponding index entry. This is expensive for a number 
of reasons. An "asynchronous" scan in which we collect many TIDs from 
the index, and only visit the heap tuples sometime later, requires 
much less index locking overhead and can allow a more efficient heap 
access pattern. Per the above analysis, we must use the synchronous 
approach for non-MVCC-compliant snapshots. 
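The race the quoted text describes can be modeled with a toy sketch
(Python, illustrative only -- the dicts and names are not PostgreSQL
APIs): an "asynchronous" scan remembers a TID now and fetches the heap
tuple later, so without the pin interlock a concurrent VACUUM can let a
reused slot be returned for the wrong key.

```python
heap  = {1: ("apple", "v1"), 2: ("banana", "v1")}   # tid -> (key, payload)
index = {"apple": 1, "banana": 2}                   # key -> tid

# Synchronous scan: fetch immediately, while the index page is "pinned",
# so VACUUM cannot remove the heap tuple underneath us.
assert heap[index["apple"]] == ("apple", "v1")

# Asynchronous scan: remember the TID now...
tid = index["apple"]

# ...meanwhile the row is deleted, VACUUM reclaims the slot, and the
# slot is reused for an unrelated tuple...
del index["apple"]
heap[tid] = ("cherry", "v1")
index["cherry"] = tid

# ...and the deferred heap fetch returns a row that never matched "apple".
fetched = heap[tid]
assert fetched[0] == "cherry"   # wrong row: this is the hazard
```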






Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-26 Thread Alvaro Herrera
Ryan Johnson wrote:

> As part of a research project, I'm trying to change Read Committed
> isolation to use HeapTupleSatisfiesNow rather than acquiring a new
> snapshot at every command [1].

Are you aware of this?

commit 813fb0315587d32e3b77af1051a0ef517d187763
Author: Robert Haas 
Date:   Thu Aug 1 10:46:19 2013 -0400

Remove SnapshotNow and HeapTupleSatisfiesNow.

We now use MVCC catalog scans, and, per discussion, have eliminated
all other remaining uses of SnapshotNow, so that we can now get rid of
it.  This will break third-party code which is still using it, which
is intentional, as we want such code to be updated to do things the
new way.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Index-only scans and non-MVCC snapshots

2014-06-26 Thread Ryan Johnson

On 26/06/2014 11:04 PM, Alvaro Herrera wrote:

Ryan Johnson wrote:

As part of a research project, I'm trying to change Read Committed
isolation to use HeapTupleSatisfiesNow rather than acquiring a new
snapshot at every command [1].

Are you aware of this?

commit 813fb0315587d32e3b77af1051a0ef517d187763
Author: Robert Haas 
Date:   Thu Aug 1 10:46:19 2013 -0400

 Remove SnapshotNow and HeapTupleSatisfiesNow.
That would be wonderful news... if snapshots weren't so darned expensive 
to create.


I guess there's no avoiding that bottleneck now, though.

Ryan




Re: [HACKERS] WAL format and API changes (9.5)

2014-06-26 Thread Michael Paquier
On Sun, Jun 15, 2014 at 6:11 AM, Heikki Linnakangas  wrote:

> On 06/14/2014 09:43 PM, Alvaro Herrera wrote:
>
>> Heikki Linnakangas wrote:
>>
>>  Here's a new version, rebased against master. No other changes.
>>>
>>
>> This is missing xlogrecord.h ...
>>
>
> Oops, here you are.
>
Looking at this patch, it does not compile when assertions are enabled
because of this assertion in heapam.c:
Assert(recdata == data + datalen);
It seems like a thinko as removing it makes the code compile.

Here is a review of this patch:
- Compilation without warnings, regression tests pass
- That's not a small patch, but the changes are mainly rewrites of xlog
record insertion in accordance with the new API format.
$ git diff master --stat | tail -n1
52 files changed, 3601 insertions(+), 4371 deletions(-)

Some comments about the code:
1) In src/backend/access/transam/README:
's/no further action is no required/no further action is required/g'
2) Description of XLogBeginInsert, XLogHasBlockRef, XLogGetBlockRef and
XLogGetBlockRefIds are missing in src/backend/access/transam/README.
3) There are a couple of code blocks marked as FIXME and BROKEN. There are
as well some NOT_USED blocks.
 4) Current patch crashes when running make check in contrib/test_decoding.
Looking closer, wal_level = logical is broken:
=# show wal_level;
 wal_level
---
 logical
(1 row)
=# create database foo;
The connection to the server was lost. Attempting reset: Failed.

Looking even closer, log_heap_new_cid is broken:
(lldb) bt
* thread #1: tid = 0x, 0x7fff8a3c2866
libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
  * frame #0: 0x7fff8a3c2866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fff8d37235c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x7fff9369db1a libsystem_c.dylib`abort + 125
frame #3: 0x000104428819
postgres`ExceptionalCondition(conditionName=0x0001044a711c,
errorType=0x00010448b19f, fileName=0x0001044a2cfd, lineNumber=6879)
+ 137 at assert.c:54
frame #4: 0x000103f08dbe
postgres`log_heap_new_cid(relation=0x7fae4482afb8,
tup=0x7fae438062e8) + 126 at heapam.c:6879
frame #5: 0x000103f080cf
postgres`heap_insert(relation=0x7fae4482afb8, tup=0x7fae438062e8,
cid=0, options=0, bistate=0x) + 383 at heapam.c:2095
frame #6: 0x000103f09b6a
postgres`simple_heap_insert(relation=0x7fae4482afb8,
tup=0x7fae438062e8) + 74 at heapam.c:2533
frame #7: 0x00010406aacf postgres`createdb(stmt=0x7fae44843690)
+ 6335 at dbcommands.c:511
frame #8: 0x00010429def8
postgres`standard_ProcessUtility(parsetree=0x7fae44843690,
queryString=0x7fae44842c38, context=PROCESS_UTILITY_TOPLEVEL,
params=0x, dest=0x7fae44843a18,
completionTag=0x7fff5bd4ee30) + 1720 at utility.c:56

5) Yes, there are some WAL records that have only data related to buffers,
XLOG_GIN_VACUUM_DATA_LEAF_PAGE being one, with nothing registered with
buffer_id = -1, but using a dummy entry seems counter-productive:
+   rdata = rdata_head;
+   if (rdata == NULL)
+   {
+   /*
+* the rest of this function assumes that there is at least
some
+* rdata entries, so fake an empty rdata entry
+*/
+   dummy_rdata.data = NULL;
+   dummy_rdata.len = 0;
+   dummy_rdata.next = NULL;
+   rdata = &dummy_rdata;
+   }
Could this be removed and replaced by a couple of checks on rdata being
NULL in some cases? XLogInsert should be updated in consequence.
6) XLogRegisterData creates a XLogRecData entry and appends it in one of
the static XLogRecData entries if buffer_id >= 0 or to
rdata_head/rdata_tail if buffer_id = -1 (for permanent data related to the
WAL record). XLogRegisterXLogRecData does something similar, but with an
entry of XLogRecData already built. What do you think about removing
XLogRegisterXLogRecData and keeping only XLogRegisterData. This makes the
interface simpler, and XLogRecData could be kept private in xlog.c. This
would need more work especially on gin side where I am seeing for example
constructLeafRecompressWALData directly building a XLogRecData entry.
By doing so, we could as well remove the while loop at the bottom of
XLogRegisterXLogRecData as we would insert in the tail only the latest
record created, aka replacing that:
while ((*tail)->next)
*tail = (*tail)->next;
By simply that:
*tail = (*tail)->next;
7) New structures at the top of xlog.c should have more comments about
their role, particularly rdata_head and rdata_tail, which store the main
WAL data (id of -1).
8) What do you think about adding in the README a description about how to
retrieve the block list modified by a WAL record, for a flow similar to
that:
record = XLogReadRecord(...);
nblocks = XLogGetBlockRefIds(record, blockids);
for (i = 0; i < nblocks; i++)
{
bkpb = XLogGetBlockRef(record, id, NULL);
// stuff
}

Re: [HACKERS] review: Built-in binning functions

2014-06-26 Thread Pavel Stehule
Hello

I rechecked this patch:

1. it applies cleanly, and compilation was without warnings and errors
2. all tests passed OK
3. the documentation rebuilds without issues
4. I don't see any issue in code quality - it is well commented, well
formatted, with regression tests

It is ready for committer

Regards

Pavel


2014-06-26 22:43 GMT+02:00 Petr Jelinek :

> Here is v2,
>
> with fixed documentation and numeric version of the implementation.
>
>
> --
>  Petr Jelinek  http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] ALTER SYSTEM RESET?

2014-06-26 Thread Vik Fearing
On 06/27/2014 06:22 AM, Amit Kapila wrote:
> On Thu, Jun 26, 2014 at 8:17 PM, Vik Fearing  wrote:
>> On 06/26/2014 05:07 AM, Amit Kapila wrote:
>> > I think it will make sense if we support RESET ALL as well similar
>> > to Alter Database .. RESET ALL syntax.  Do you see any reason
>> > why we shouldn't support RESET ALL syntax for Alter System?
>>
>> Yes, that makes sense.  I've added that in the attached version 2 of the
>> patch.
> 
> I think the idea used in the patch to implement RESET ALL is sane.  In
> passing, I noticed that the modified lines in the .sgml have crossed
> the 80-character boundary, which is generally preferred to be
> maintained.
> 
> Refer below modification.
> 
> !    values to the postgresql.auto.conf file.
> !    Setting the parameter to DEFAULT, or using the
> !    RESET variant, removes the configuration entry from

I did that on purpose so that it's easy for a reviewer/committer to see
what changed.

This third patch reformats the documentation in the way I expected it to
be committed.
-- 
Vik
*** a/doc/src/sgml/ref/alter_system.sgml
--- b/doc/src/sgml/ref/alter_system.sgml
***
*** 22,27  PostgreSQL documentation
--- 22,30 
   
  
  ALTER SYSTEM SET configuration_parameter { TO | = } { value | 'value' | DEFAULT }
+ 
+ ALTER SYSTEM RESET configuration_parameter
+ ALTER SYSTEM RESET ALL
  
   
  
***
*** 30,37  ALTER SYSTEM SET configuration_parameter
 ALTER SYSTEM writes the configuration parameter
!values to the postgresql.auto.conf file. With
!DEFAULT, it removes a configuration entry from
 postgresql.auto.conf file. The values will be
 effective after reload of server configuration (SIGHUP) or in next 
 server start based on the type of configuration parameter modified.
--- 33,41 
  

 ALTER SYSTEM writes the configuration parameter
!values to the postgresql.auto.conf file.
!Setting the parameter to DEFAULT, or using the
!RESET variant, removes the configuration entry from
 postgresql.auto.conf file. The values will be
 effective after reload of server configuration (SIGHUP) or in next 
 server start based on the type of configuration parameter modified.
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
***
*** 401,407  static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
  
  %type 	insert_rest
  
! %type  generic_set set_rest set_rest_more SetResetClause FunctionSetResetClause
  
  %type 	TableElement TypedTableElement ConstraintElem TableFuncElement
  %type 	columnDef columnOptions
--- 401,407 
  
  %type 	insert_rest
  
! %type  generic_set set_rest set_rest_more generic_reset reset_rest SetResetClause FunctionSetResetClause
  
  %type 	TableElement TypedTableElement ConstraintElem TableFuncElement
  %type 	columnDef columnOptions
***
*** 1567,1605  NonReservedWord_or_Sconst:
  		;
  
  VariableResetStmt:
! 			RESET var_name
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET;
! 	n->name = $2;
! 	$$ = (Node *) n;
  }
! 			| RESET TIME ZONE
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET;
! 	n->name = "timezone";
! 	$$ = (Node *) n;
  }
! 			| RESET TRANSACTION ISOLATION LEVEL
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET;
! 	n->name = "transaction_isolation";
! 	$$ = (Node *) n;
  }
! 			| RESET SESSION AUTHORIZATION
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET;
! 	n->name = "session_authorization";
! 	$$ = (Node *) n;
  }
! 			| RESET ALL
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET_ALL;
! 	$$ = (Node *) n;
  }
  		;
  
--- 1567,1613 
  		;
  
  VariableResetStmt:
! 			RESET reset_rest		{ $$ = (Node *) $2; }
! 		;
! 
! reset_rest:
! 			generic_reset			{ $$ = $1; }
! 			| TIME ZONE
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET;
! 	n->name = "timezone";
! 	$$ = n;
  }
! 			| TRANSACTION ISOLATION LEVEL
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET;
! 	n->name = "transaction_isolation";
! 	$$ = n;
  }
! 			| SESSION AUTHORIZATION
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET;
! 	n->name = "session_authorization";
! 	$$ = n;
  }
! 		;
! 
! generic_reset:
! 			var_name
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET;
! 	n->name = $1;
! 	$$ = n;
  }
! 			| ALL
  {
  	VariableSetStmt *n = makeNode(VariableSetStmt);
  	n->kind = VAR_RESET_ALL;
! 	$$ = n;
  }
  		;
  
***
*** 8474,8480  DropdbStmt: DROP DATABASE database_name
  
  /**

Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-06-26 Thread Pavel Stehule
Hello

thank you Peter, so now only the setting for MS Windows is missing?

Regards

Pavel


2014-06-26 21:57 GMT+02:00 Petr Jelinek :

> Hello,
>
> I went through the patch; it seems mostly fine. I adjusted some wording,
> removed the default .pgpass file info since it's not accurate, and
> replaced a couple of phrases with (hopefully) more informative ones.
>
>
> --
>  Petr Jelinek  http://www.2ndQuadrant.com/
>
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] ALTER SYSTEM RESET?

2014-06-26 Thread Vik Fearing
On 06/27/2014 08:49 AM, Vik Fearing wrote:
> This third patch reformats the documentation in the way I expected it to
> be committed.

Amit,

I added this to the next commitfest with your name as reviewer.

https://commitfest.postgresql.org/action/patch_view?id=1495

Please update the status as you see fit.
-- 
Vik

