[HACKERS] [PROPOSAL] Backup and recovery of pg_statistic

2015-11-27 Thread Dmitry Ivanov
Hi hackers,

This is my proposal for 'pg_dump' utility enhancements which enable backup and 
recovery of statistical data stored in the 'pg_statistic' table. Statistical 
data is very valuable for the query planner, so support engineers have to 
manually generate it using ANALYZE whenever they migrate or restore a database 
from backup, which is unpleasant. This proposal aims to improve the overall 
situation.


Problem description
===

Currently there is no way to back up 'pg_statistic', because columns of the 
'anyarray' type cannot be reconstructed solely from their textual 
representation. Meanwhile, all that is needed to solve this problem is a small 
extension capable of retrieving the element type of an 'anyarray' object and 
recreating that particular 'anyarray' object using the 'array_in' procedure. 
Another vital feature is the ability to transform the various object identifiers 
stored in this table into textual representations in order to make them 
portable. Such functionality is easy to implement, which is why I've put 
together a tiny proof-of-concept extension that demonstrates that recovery 
is possible.


Design
==

Several things come to mind when we think about recovering 'pg_statistic':

* A procedure written in C is needed to determine the element type of an 
'anyarray' value. The returned 'oid' is then used to reconstruct the 
'anyarray' object using the 'array_in' procedure, for example:

arr := array_in('{1,2,3}', 'text'::regtype::oid, -1);
anyarray_elemtype(arr) -> 25 ('text'::regtype::oid)
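In dump terms, the element type would be read back from the stored statistics 
row itself, so that the dump can emit a matching 'array_in' call. A hypothetical 
query, assuming the PoC functions described below:

  -- Look up the (schema-qualified) element type of a stored stats array.
  SELECT to_schema_qualified_type(anyarray_elemtype(stavalues1))
  FROM pg_statistic
  WHERE starelid = 'public.test'::regclass AND staattnum = 1;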

* The columns 'starelid' (relation identifier) and 'staop' (operator identifier) 
have type 'oid', so their values could be invalid within a new DB, because the 
object IDs of newly recovered relations and operators might have changed 
during recovery. Such values should be substituted with proper casts from 
schema-qualified names, for example:

65554 (relid) -> 'public.test'::regclass::oid
15 (staopN) -> 'pg_catalog.=(pg_catalog.int4, pg_catalog.int8)'::regoperator::oid

Note that every name is schema-qualified in order to avoid naming conflicts.

* The type of the column referenced by 'staattnum' also needs to be checked 
for consistency: it must remain the same, otherwise the collected stats won't 
be of any use.

* The main procedure will simply generate a bunch of INSERT queries (one query 
per row of 'pg_statistic') which can be saved to a text file.
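As a purely illustrative sketch, one generated query might look roughly like 
the following (the column list is abbreviated for readability; a real query 
has to supply every column of the row, and all names here are hypothetical):

  INSERT INTO pg_catalog.pg_statistic
      (starelid, staattnum, stainherit, stanullfrac, stawidth, stadistinct,
       stakind1, staop1, stanumbers1, stavalues1)
  VALUES
      ('public.test'::regclass,                  -- portable relation reference
       to_attnum('public.test', 'id'),           -- PoC helper, see below
       false, 0, 4, -1,
       1,                                        -- STATISTIC_KIND_MCV
       'pg_catalog.=(pg_catalog.int4, pg_catalog.int4)'::regoperator,
       '{0.25,0.25}'::real[],
       array_in('{1,2}', 'pg_catalog.int4'::regtype, -1));  -- rebuild the anyarray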


Proof of concept
===

Interface
-

Currently there's a PoC extension which contains several functions:

dump_statistic() - returns a set of INSERT queries;

anyarray_elemtype(anyarray) - returns the object identifier of an element 
type;

to_schema_qualified_operator(opid oid) - converts the 'opid' (operator ID) to a 
schema-qualified operator name;
to_schema_qualified_relname(relid oid) - converts the 'relid' (relation ID) to 
a schema-qualified relation name;
to_schema_qualified_type(typid oid) - converts the 'typid' (type ID) to a 
schema-qualified type name;

to_attname(rel text, colnum smallint) - returns the name of the Nth column of 
the specified table 'rel';
to_attnum(rel text, col text) - converts the table name 'rel' and the column 
name 'col' to a column number;

to_atttype(rel text, col text) - returns the type of the column 'col';
to_atttype(rel text, colnum smallint) - overloaded for the column number 
'colnum';
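
A few hypothetical calls illustrating the helpers above (names and outputs are 
illustrative only):

  SELECT to_schema_qualified_relname('test'::regclass::oid);  -- e.g. 'public.test'
  SELECT to_schema_qualified_type('int4'::regtype::oid);      -- e.g. 'pg_catalog.int4'
  SELECT to_attname('public.test', 1::smallint);              -- name of column #1
  SELECT to_attnum('public.test', 'id');                      -- ordinal of column "id"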

The extension is compatible with versions 9.4 and above.

Usage example
-

DB=# \copy (select dump_statistic()) to 'stat_backup.sql'
$ psql DB < stat_backup.sql


Proposed changes to pg_dump
===

Now that the approach has been worked out, it can be applied to improve the 
'pg_dump' utility. Some minor code changes would make 'pg_dump' emit 
specially-formed recovery INSERTs for 'pg_statistic' in 'binary-upgrade' 
mode, thus allowing saved stats to be restored after an upgrade.


Conclusion
==

I've attached a tarball with sources so that anyone could try the extension.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

dump_stat.tar.gz
Description: application/compressed-tar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2015-11-27 Thread Pavel Stehule
2015-11-27 17:54 GMT+01:00 Teodor Sigaev :

> Is this patch actually in the 'Waiting on Author' state?


yes

I'll try to redesign the patch along the lines of Peter's proposal

Pavel


>
>
>  I don't think it's right to reuse SPIError for this.  SPIError is
>>>  clearly meant to signal an error in the SPI calls.  Of course, we
>>> can't
>>>  stop users from raising whatever exception they want, but if we're
>>> going
>>>  to advertise that users can raise exceptions, then we should create
>>>  separate exception classes.
>>>
>>>  I suppose the proper way to set this up would be to create a base
>>> class
>>>  like plpy.Error and derive SPIError from that.
>>>
>>>
>>> Do you have some ideas about the name of this class?
>>>
>>
>> I think plpy.Error is fine.
>>
>>
>>
>>
> --
> Teodor Sigaev   E-mail: teo...@sigaev.ru
>WWW:
> http://www.sigaev.ru/
>


Re: [HACKERS] Redefine default result from PQhost()?

2015-11-27 Thread Noah Misch
On Thu, Nov 26, 2015 at 10:48:50AM -0500, Tom Lane wrote:
> Magnus Hagander  writes:
> > On Wed, Nov 25, 2015 at 11:59 PM, Tom Lane  wrote:
> >> I think we should [ return DEFAULT_PGSOCKET_DIR not NULL ]
> 
> > I agree with this change in general. But I wonder if there's a risk here
> > that we break some applications, isn't there? It's clearly a backwards
> > incompatible change, so wouldn't it require a bump of the libpq version?
> 
> I don't see why it would need a libpq version bump, as it's not an ABI
> breakage.  As for breaking application logic, it's true that any app code
> that is explicitly checking for NULL would become dead code, but that
> seems pretty harmless.  It's already the case that apps should be prepared
> to get back an explicit socket directory path spec; that would merely
> become a more common case than before.
> 
> Also, given the precedent of yesterday's psql fix, we should not discount
> the idea that we will be *removing* not adding corner-case bugs in some
> applications.  Failure to check for NULL is probably even more common in
> other apps than in psql.
> 
> On the other side of the coin, it's worth noting that there is no
> well-defined way for libpq-using apps to discover the value of
> DEFAULT_PGSOCKET_DIR.  (psql is getting it by #include'ing an internal
> header file; there is no way to get it from PQconndefaults.)  So whatever
> an app might have been doing if it did check for NULL is arguably
> inherently buggy, and bypassing such code will be a good thing.

I agree with each of those points.  I see little value in distinguishing
between, in a default build, explicit PGHOST=/tmp and unspecified PGHOST.  The
lack of a supported way to discover the default compounds that conclusion.




Re: [HACKERS] Redefine default result from PQhost()?

2015-11-27 Thread Tom Lane
Magnus Hagander  writes:
> Hmm. Good point. I didn't realize they already had to be ready to get a
> non-default path back.

Yeah, if it weren't for that, this would definitely be a hazardous change.
But AFAICS, an app that has a problem with this proposal was broken
already, it just didn't know it.

> Of course, the documentation just says it'll return a hostname. It should
> probably mention that it can return the unix socket path as well - but
> that's something that's missing already.

Agreed, will fix that.

regards, tom lane




Re: [HACKERS] proposal: PL/Pythonu - function ereport

2015-11-27 Thread Teodor Sigaev

Is this patch actually in the 'Waiting on Author' state?


 I don't think it's right to reuse SPIError for this.  SPIError is
 clearly meant to signal an error in the SPI calls.  Of course, we can't
 stop users from raising whatever exception they want, but if we're going
 to advertise that users can raise exceptions, then we should create
 separate exception classes.

 I suppose the proper way to set this up would be to create a base class
 like plpy.Error and derive SPIError from that.


Do you have some ideas about the name of this class?


I think plpy.Error is fine.





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Errors in our encoding conversion tables

2015-11-27 Thread Tom Lane
Albe Laurenz  writes:
> I agree with your proposed fix, the only thing that makes me feel 
> uncomfortable
> is that you get error messages like:
>   ERROR:  character with byte sequence 0x96 in encoding "WIN1250" has no 
> equivalent in encoding "MULE_INTERNAL"

Hm, yeah.  It's pretty silly that this code uses a double conversion when
it has a table that would work fine for a direct conversion, anyway.  But
I think improving that is a separate question, especially since the above
behavior occurred already for a few code points.

regards, tom lane





[HACKERS] Duplicate patch in commitfest

2015-11-27 Thread Teodor Sigaev

Hi!

There are two commitfest entries which are linked to the same patch/thread

Move PinBuffer and UnpinBuffer to atomics
  https://commitfest.postgresql.org/7/370/
and
Replace buffer manager spinlock with atomic operations
  https://commitfest.postgresql.org/7/408/

I suppose one of them could be rejected.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] COPY (INSERT/UPDATE/DELETE .. RETURNING ..)

2015-11-27 Thread Teodor Sigaev

Patch is switched to "ready for committer".

Committed, thank you
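
For readers who have not followed the thread, a minimal sketch of the committed 
feature (table, column, and file names are made up):

  -- A data-modifying statement with RETURNING can now feed COPY ... TO,
  -- e.g. archiving rows while deleting them in a single statement.
  COPY (DELETE FROM queue WHERE processed RETURNING *)
      TO '/tmp/processed.csv' (FORMAT csv);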
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




[HACKERS] How to add and use a static library within Postgres backend

2015-11-27 Thread XiaoChuan Yu
I'm doing some experimenting with the storage engine and I'd like to use a
C++ static library (that has C headers).  I only need this to build on
Ubuntu 64-bit for now.
How do I make postgres build with this library?
What changes do I need to make to the build system files?
I assume headers go in src/include but where do I put the library.a file?

Thanks,
Xiaochuan Yu


Re: [HACKERS] silent data loss with ext4 / all current versions

2015-11-27 Thread Tomas Vondra



On 11/27/2015 02:18 PM, Michael Paquier wrote:

On Fri, Nov 27, 2015 at 8:17 PM, Tomas Vondra
 wrote:

So, what's going on? The problem is that while the rename() is atomic, it's
not guaranteed to be durable without an explicit fsync on the parent
directory. And by default we only do fdatasync on the recycled segments,
which may not force fsync on the directory (and ext4 does not do that,
apparently).


Yeah, that seems to be the way the POSIX spec clears things.
"If _POSIX_SYNCHRONIZED_IO is defined, the fsync() function shall
force all currently queued I/O operations associated with the file
indicated by file descriptor fildes to the synchronized I/O completion
state. All I/O operations shall be completed as defined for
synchronized I/O file integrity completion."
http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html
If I understand that right, it is guaranteed that the rename() will be
atomic, meaning that there will be only one file even if there is a
crash, but that we need to fsync() the parent directory as mentioned.


FWIW this has nothing to do with storage reliability - you may have good
drives, RAID controller with BBU, reliable SSDs or whatever, and you're
still not safe. This issue is at the filesystem level, not storage.


The POSIX spec authorizes this behavior, so the FS is not to blame,
clearly. At least that's what I get from it.


The spec seems a bit vague to me (but maybe it's not, I'm not a POSIX 
expert), but we should be prepared for the less favorable interpretation 
I think.





I think this issue might also result in various other issues, not just data
loss. For example, I wouldn't be surprised by data corruption due to
flushing some of the changes in data files to disk (due to contention for
shared buffers and reaching vm.dirty_bytes) and then losing the matching WAL
segment. Also, while I have only seen 1 to 3 segments getting lost, it might
be possible that more segments can get lost, possibly making the recovery
impossible. And of course, this might cause problems with WAL archiving due
to archiving the same
segment twice (before and after crash).


Possible, the switch to .done is done after renaming the segment in
xlogarchive.c. So this could happen in theory.


Yes. That's one of the suspicious places in my notes (haven't posted all 
the details, the message was long enough already).



Attached is a proposed fix for this (xlog-fsync.patch), and I'm pretty sure
this needs to be backpatched to all backbranches. I've also attached a patch
that adds pg_current_xlog_flush_location() function, which proved to be
quite useful when debugging this issue.


Agreed. We should be sure as well that the calls to fsync_fname get
issued in a critical section with START/END_CRIT_SECTION(). It does
not seem to be the case with your patch.


Don't know. I've based that on code from replication/logical/, which does 
fsync_fname() in all the interesting places, without the critical section.


regards

--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Tsvector editing functions

2015-11-27 Thread Teodor Sigaev


1 Please make the patch compilable with current master.
cd ../../../src/include/catalog && '/usr/local/bin/perl' ./duplicate_oids
3315
3316

2 lexin = TextDatumGetCString(PG_GETARG_DATUM(1))
  lexin_len = strlen(lexin)

Why do you use a C string instead of just text? I suppose this would be much better:
   t = PG_GETARG_TEXT_P(1)
   lexin = VARDATA(t)
   lexin_len = VARSIZE_ANY_EXHDR(t)

3 Why do you use a linear search in the tsvector instead of a binary search? It 
could have a performance impact.


4 Again, using the BuildTupleFromCStrings() call is not optimal

5 Printing weights as numbers is not consistent with other uses of weights in 
FTS. A lexeme's weight is given as one of A, B, C, D, and the default weight is D.


Teodor Sigaev wrote:

There is patch that adds some editing routines for tsvector type.

...

When submitting a patch, it's a good idea to explain why someone would
want the feature you are adding.  Maybe that's obvious to you, but it
isn't clear to me why we'd want this.



Some examples:
tsvector delete(tsvector, text)
 remove a wrongly indexed word (maybe a stop word)
text[] to_array(tsvector)
 In my practice, I needed it to work with the smlar module.
tsvector to_tsvector(text[])
 Converts a list of tags to a tsvector, because searching a tsvector is more
 flexible and faster than the array equivalents
set unnest(tsvector)
  Compute some complicated statistics.

These functions are mostly needed in utility processing rather than in the regular workflow.
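
To make that concrete, a short usage sketch (signatures as proposed in the 
patch; results shown in comments are illustrative):

  SELECT delete('cat:3 fat:2,4 rat:5A'::tsvector, 'rat');  -- drop one lexeme
  SELECT to_array('cat:3 fat:2,4'::tsvector);              -- {cat,fat}
  SELECT to_tsvector(ARRAY['cat', 'fat', 'rat']);          -- tags -> tsvector
  SELECT * FROM unnest('cat:3 fat:2,4'::tsvector);         -- one row per lexeme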




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Tsvector editing functions

2015-11-27 Thread Teodor Sigaev

Hmm, it seems it would be useful to add two functions:

tsvector filter(tsvector, array_of_weights) - returns a tsvector containing only 
the lexemes with the given weights

tsvector setweight(tsvector, weight, array_of_lexemes) - sets the given weight 
for the given lexemes
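
A hypothetical sketch of how the two suggested functions might be called (the 
argument types are only a guess at this point, nothing is implemented yet):

  SELECT filter('cat:3A fat:2B,4C rat:5D'::tsvector, ARRAY['a', 'b']);
  -- would keep only the lexemes carrying weight A or B

  SELECT setweight('cat:3 fat:2,4 rat:5'::tsvector, 'A', ARRAY['cat', 'rat']);
  -- would set weight A on "cat" and "rat", leaving "fat" untouched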


Stas Kelvich wrote:

Hello.

There is patch that adds some editing routines for tsvector type.

tsvector delete(tsvector, text)
removes entry from tsvector by lexeme name
set unnest(tsvector)
expands a tsvector to a set of rows. Each row has following columns: 
lexeme, postings, weights.
text[] to_array(tsvector)
converts tsvector to array of lexemes
tsvector to_tsvector(text[])
converts array of lexemes to tsvector






Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company







--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] silent data loss with ext4 / all current versions

2015-11-27 Thread Teodor Sigaev

What happens is that when we recycle WAL segments, we rename them and then sync
them using fdatasync (which is the default on Linux). However fdatasync does not
force fsync on the parent directory, so in case of power failure the rename may
get lost. The recovery won't realize those segments actually contain changes

Agreed. Some time ago I ran into this myself, although it wasn't with Postgres.


So, what's going on? The problem is that while the rename() is atomic, it's not
guaranteed to be durable without an explicit fsync on the parent directory. And
by default we only do fdatasync on the recycled segments, which may not force
fsync on the directory (and ext4 does not do that, apparently).

This impacts all current kernels (tested on 2.6.32.68, 4.0.5 and 4.4-rc1), and
also all supported PostgreSQL versions (tested on 9.1.19, but I believe all
versions since spread checkpoints were introduced are vulnerable).

FWIW this has nothing to do with storage reliability - you may have good drives,
RAID controller with BBU, reliable SSDs or whatever, and you're still not safe.
This issue is at the filesystem level, not storage.

Agree again.


I plan to do more power failure testing soon, with more complex test scenarios.
I suspect there might be other similar issues (e.g. when we rename a file before
a checkpoint and don't fsync the directory - then the rename won't be replayed
and will be lost).

It would be very useful, but I hope you will not find any new bugs :)

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Some questions about the array.

2015-11-27 Thread Teodor Sigaev

Some comments about patch
1
Documentation isn't very informative
Outputs of
SELECT schedule[:][:] FROM sal_emp WHERE name = 'Bill'
and
SELECT schedule[:2][1:] FROM sal_emp WHERE name = 'Bill';
are the same. I suppose it would be better to have differing ones.

2
# create table xxx (a int[]);
# update xxx set a[2:] = '{1,2}';
UPDATE 0
# update xxx set a[2:] = '{1,2}';
ERROR:  cannot determine upper index for empty array
# update xxx set a[:2] = '{1,2}';
ERROR:  invalid input syntax for integer: "{1,2}"
# update xxx set a[:] = '{1,2}';
ERROR:  invalid input syntax for integer: "{1,2}"

The error messages seem inconsistent. If you forbid omitting bounds in 
assignment, then in all cases the error message should be the same or similar.



YUriy Zhuravlev wrote:

Hello again.
I attached a simple patch for omitted boundaries in slices.
This will simplify writing SQL. Instead of:
select arr[2:array_upper(arr, 1)];
you can write:
select arr[2:];

simple and elegant.
Omitted boundaries are prohibited in UPDATE.

Thanks.






--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Use pg_rewind when target timeline was switched

2015-11-27 Thread Teodor Sigaev

The patch seems ready to commit, but it needs some documentation.

Alexander Korotkov wrote:

On Wed, Sep 30, 2015 at 9:46 AM, Michael Paquier wrote:

On Sat, Sep 19, 2015 at 8:27 AM, Michael Paquier wrote:
> On Fri, Sep 18, 2015 at 6:25 PM, Michael Paquier wrote:
>> The refactoring of getTimelineHistory as you propose looks like a good
>> idea to me, I tried to remove by myself the difference between source
>> and target in copy_fetch.c and friends but this gets uglier,
>> particularly because of datadir_source in copy_file_range. Not worth
>> it.
>
> Forgot that:
> if (ControlFile_target.state != DB_SHUTDOWNED)
> pg_fatal("target server must be shut down cleanly\n");
> We may want to allow a target node shutdowned in recovery as well here.

So, attached is a more polished version of this patch, cleaned up of
its typos with as well other things. I have noticed for example that
it would be more useful to add the debug information of a timeline
file fetched from the source or a target server directly in
getTimelineHistory. I have as well updated a couple of comments in the
code regarding the fact that we do not necessarily use a master as a
target node, and mentioned in findCommonAncestorTimeline that we check
as well the start position of a timeline to cover the case where both
target and source node forked at the same timeline number but with a
different WAL fork position.
I am marking this patch as ready for committer. It would be cool in
the future to use the recovery test suite to have more advanced
scenarios tested, but it seems a shame to block this patch because of
that.


Thanks a lot. The patch looks much better now.
I found that it doesn't apply cleanly on current master; a rebased version
is attached.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com 
The Russian Postgres Company





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




[HACKERS] silent data loss with ext4 / all current versions

2015-11-27 Thread Tomas Vondra

Hi,

I've been doing some power failure tests (i.e. unexpectedly 
interrupting power) over the last few days, and I've discovered a fairly serious 
case of silent data loss on ext3/ext4. Initially I thought it was a 
filesystem bug, but after further investigation I'm pretty sure it's our 
fault.


What happens is that when we recycle WAL segments, we rename them and 
then sync them using fdatasync (which is the default on Linux). However 
fdatasync does not force fsync on the parent directory, so in case of 
power failure the rename may get lost. The recovery won't realize those 
segments actually contain changes from "future" and thus does not replay 
them. Hence data loss. The recovery completes as if everything went OK, 
so the data loss is entirely silent.


Reproducing this is rather trivial. I've prepared a simple C program 
simulating our WAL recycling, which I had intended to send to the ext4 mailing 
list to demonstrate the ext4 bug, before I realized it's most likely our 
bug and not theirs.


The example program is called ext4-data-loss.c and is available here 
(along with other stuff mentioned in this message):


https://github.com/2ndQuadrant/ext4-data-loss

Compile it, run it (over ssh from another host), interrupt the power, and 
after restart you should see some of the segments get lost (the rename 
reverted).


The git repo also contains a bunch of python scripts that I initially 
used to reproduce this on PostgreSQL - insert.py, update.py and 
xlog-watch.py. I'm not going to explain the details here, it's a bit 
more complicated but the cause is exactly the same as with the C 
program, just demonstrated in database. See README for instructions.


So, what's going on? The problem is that while the rename() is atomic, 
it's not guaranteed to be durable without an explicit fsync on the 
parent directory. And by default we only do fdatasync on the recycled 
segments, which may not force fsync on the directory (and ext4 does not 
do that, apparently).


This impacts all current kernels (tested on 2.6.32.68, 4.0.5 and 
4.4-rc1), and also all supported PostgreSQL versions (tested on 9.1.19, 
but I believe all versions since spread checkpoints were introduced are 
vulnerable).


FWIW this has nothing to do with storage reliability - you may have good 
drives, RAID controller with BBU, reliable SSDs or whatever, and you're 
still not safe. This issue is at the filesystem level, not storage.


I've done the same tests on xfs and that seems to be safe - I've been 
unable to reproduce the issue, so either the issue is not there or it's 
more difficult to hit it. I haven't tried on other file systems, because 
ext4 and xfs cover vast majority of deployments (at least on Linux), and 
thus issue on ext4 is serious enough I believe.


It's possible to make ext3/ext4 safe with respect to this issue by using 
full journaling (data=journal) instead of the default (data=ordered) 
mode. However this comes at a significant performance cost and pretty 
much no one is using it with PostgreSQL because data=ordered is believed 
to be safe.


It's also possible to mitigate this by setting wal_sync_method=fsync, 
but I don't think I've ever seen that change at a customer. This also 
comes with a significant performance penalty, comparable to setting 
data=journal. This has the advantage that this can be done without 
restarting the database (SIGHUP is enough).
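
For instance, a minimal sketch of that mitigation on 9.4 or later (older 
releases need a postgresql.conf edit plus a reload instead):

  -- Trade performance for durability until a proper fix is deployed.
  ALTER SYSTEM SET wal_sync_method = 'fsync';
  SELECT pg_reload_conf();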


So pretty much everyone running on Linux + ext3/ext4 is vulnerable.

It's also worth mentioning that the data is not actually lost - it's 
properly fsynced in the WAL segments, it's just the rename that got 
lost. So it's possible to survive this without losing data by manually 
renaming the segments, but this must happen before starting the cluster 
because the automatic recovery comes and discards all the data etc.


I think this issue might also result in various other issues, not just 
data loss. For example, I wouldn't be surprised by data corruption due 
to flushing some of the changes in data files to disk (due to contention 
for shared buffers and reaching vm.dirty_bytes) and then losing the 
matching WAL segment. Also, while I have only seen 1 to 3 segments 
getting lost, it might be possible that more segments can get lost, 
possibly making the recovery impossible. And of course, this might cause 
problems with WAL archiving due to archiving the same

segment twice (before and after crash).

Attached is a proposed fix for this (xlog-fsync.patch), and I'm pretty 
sure this needs to be backpatched to all backbranches. I've also 
attached a patch that adds pg_current_xlog_flush_location() function, 
which proved to be quite useful when debugging this issue.


I'd also like to propose adding "last segment" to pg_controldata, next 
to the last checkpoint / restartpoint. We don't need to write this on 
every commit, once per segment (on the first write) is enough. This 
would make investigating the issue much easier, and it'd also make it 
possible to terminate the 

Re: [HACKERS] silent data loss with ext4 / all current versions

2015-11-27 Thread Tomas Vondra

Hi,

On 11/27/2015 02:28 PM, Greg Stark wrote:

On Fri, Nov 27, 2015 at 11:17 AM, Tomas Vondra
 wrote:

I plan to do more power failure testing soon, with more complex
test scenarios. I suspect there might be other similar issues (e.g.
when we rename a file before a checkpoint and don't fsync the
directory -then the rename won't be replayed and will be lost).


I'm curious how you're doing this testing. The easiest way I can
think of would be to run a database on an LVM volume and take a large
number of LVM snapshots very rapidly and then see if the database can
start up from each snapshot. Bonus points for keeping track of the
committed transactions before each snaphsot and ensuring they're
still there I guess.


I do have reliable storage (Intel SSD with power-loss protection), and 
I've connected the system to a sophisticated power-loss-making device 
called "the power switch" (image attached).


In other words, in the last ~7 days the system got rebooted more times 
than in the previous ~5 years.



That always seemed unsatisfactory because in the past we were mainly
concerned with whether fsync was actually getting propagated to the
physical media. But for testing whether we're fsyncing enough for
the filesystem that would be good enough.


Yeah. I considered some form of virtualized setup initially, but my 
original intent was to verify whether disabling write barriers really is 
safe (because I've heard numerous complaints that it's stupid). And as 
write barriers are more tightly coupled to the hardware, I opted for the 
more brutal approach.


But I agree some form of virtualized setup might be more flexible, 
although I'm not sure LVM snapshots are a good approach, as snapshots may 
wait for I/O requests to complete and such. I think something like qemu might 
work better when combined with "kill -9", and I plan to try reproducing 
the data loss issue on such a setup.


regards

--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] silent data loss with ext4 / all current versions

2015-11-27 Thread Michael Paquier
On Fri, Nov 27, 2015 at 8:17 PM, Tomas Vondra
 wrote:
> So, what's going on? The problem is that while the rename() is atomic, it's
> not guaranteed to be durable without an explicit fsync on the parent
> directory. And by default we only do fdatasync on the recycled segments,
> which may not force fsync on the directory (and ext4 does not do that,
> apparently).

Yeah, that seems to be the way the POSIX spec clears things.
"If _POSIX_SYNCHRONIZED_IO is defined, the fsync() function shall
force all currently queued I/O operations associated with the file
indicated by file descriptor fildes to the synchronized I/O completion
state. All I/O operations shall be completed as defined for
synchronized I/O file integrity completion."
http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html
If I understand that right, it is guaranteed that the rename() will be
atomic, meaning that there will be only one file even if there is a
crash, but that we need to fsync() the parent directory as mentioned.

> FWIW this has nothing to do with storage reliability - you may have good
> drives, RAID controller with BBU, reliable SSDs or whatever, and you're
> still not safe. This issue is at the filesystem level, not storage.

The POSIX spec authorizes this behavior, so the FS is not to blame,
clearly. At least that's what I get from it.

> I've done the same tests on xfs and that seems to be safe - I've been unable
> to reproduce the issue, so either the issue is not there or it's more
> difficult to hit it. I haven't tried on other file systems, because ext4 and
> xfs cover vast majority of deployments (at least on Linux), and thus issue
> on ext4 is serious enough I believe.
>
> So pretty much everyone running on Linux + ext3/ext4 is vulnerable.
>
> It's also worth mentioning that the data is not actually lost - it's
> properly fsynced in the WAL segments, it's just the rename that got lost. So
> it's possible to survive this without losing data by manually renaming the
> segments, but this must happen before starting the cluster because the
> automatic recovery comes and discards all the data etc.

Hm. Most users are not going to notice that, particularly where things
are embedded.

> I think this issue might also result in various other issues, not just data
> loss. For example, I wouldn't be surprised by data corruption due to
> flushing some of the changes in data files to disk (due to contention for
> shared buffers and reaching vm.dirty_bytes) and then losing the matching WAL
> segment. Also, while I have only seen 1 to 3 segments getting lost, it might
> be possible that more segments can get lost, possibly making the recovery
> impossible. And of course, this might cause problems with WAL archiving due
> to archiving the same
> segment twice (before and after crash).

Possible, the switch to .done is done after renaming the segment in
xlogarchive.c. So this could happen in theory.

> Attached is a proposed fix for this (xlog-fsync.patch), and I'm pretty sure
> this needs to be backpatched to all backbranches. I've also attached a patch
> that adds pg_current_xlog_flush_location() function, which proved to be
> quite useful when debugging this issue.

Agreed. We should be sure as well that the calls to fsync_fname get
issued in a critical section with START/END_CRIT_SECTION(). It does
not seem to be the case with your patch.

> And finally, I've done a quick review of all places that might suffer the
> same issue - some are not really interesting as the stuff is ephemeral
> anyway (like pgstat for example), but there are ~15 places that may need
> this fix:
>
>  * src/backend/access/transam/timeline.c (2 matches)
>  * src/backend/access/transam/xlog.c (9 matches)
>  * src/backend/access/transam/xlogarchive.c (3 matches)
>  * src/backend/postmaster/pgarch.c (1 match)
>
> Some of these places might be actually safe because a fsync happens
> somewhere immediately after the rename (e.g. in a caller), but I guess
> better safe than sorry.

I had a quick look at those code paths and indeed it would be safer to
be sure that once rename() is called we issue those fsync calls.

> I plan to do more power failure testing soon, with more complex test
> scenarios. I suspect there might be other similar issues (e.g. when we
> rename a file before a checkpoint and don't fsync the directory - then the
> rename won't be replayed and will be lost).

That would be great.
-- 
Michael




Re: [HACKERS] silent data loss with ext4 / all current versions

2015-11-27 Thread Greg Stark
On Fri, Nov 27, 2015 at 11:17 AM, Tomas Vondra
 wrote:
> I plan to do more power failure testing soon, with more complex test
> scenarios. I suspect there might be other similar issues (e.g. when we
> rename a file before a checkpoint and don't fsync the directory - then the
> rename won't be replayed and will be lost).

I'm curious how you're doing this testing. The easiest way I can think
of would be to run a database on an LVM volume and take a large number
of LVM snapshots very rapidly and then see if the database can start
up from each snapshot. Bonus points for keeping track of the committed
transactions before each snaphsot and ensuring they're still there I
guess.

That always seemed unsatisfactory because in the past we were mainly
concerned with whether fsync was actually getting propagated to the
physical media. But for testing whether we're fsyncing enough for the
filesystem that would be good enough.


-- 
greg




Re: [HACKERS] Errors in our encoding conversion tables

2015-11-27 Thread Tom Lane
I wrote:
> I have not attempted to reverify the files in utils/mb/Unicode against the
> original Unicode Consortium data, but maybe we ought to do that before
> taking any further steps here.

I downloaded the mapping files from unicode.org and attempted to verify
that the Unicode/*.map files could be reproduced from the stated sources.
Most of them are okay, but I failed to verify these:

euc_cn_to_utf8.map  utf8_to_euc_cn.map

Could not find the reference file GB2312.TXT; it is not at unicode.org

gb18030_to_utf8.map utf8_to_gb18030.map

Could not find the reference file gb-18030-2000.xml, whose origin is
unstated anyway.

euc_jp_to_utf8.map  utf8_to_euc_jp.map
euc_kr_to_utf8.map  utf8_to_euc_kr.map
johab_to_utf8.map   utf8_to_johab.map
uhc_to_utf8.map utf8_to_uhc.map

These four all have minor to significant differences from what I got by
running the generation scripts.  See attached diffs.

utf8_to_sjis.map

It's very disturbing that this fails to verify when its allegedly inverse
file does verify; either the script is broken or somebody did sloppy
manual editing.

Anyway, this seems to mean that it's okay to go ahead with fixing the
encoding conversion discrepancies I complained of yesterday; the data
those proposed diffs are based on is solid.  But we've evidently got
a number of other issues with these Far Eastern encodings.

regards, tom lane


*** euc_jp_to_utf8.map.orig Fri May 15 17:56:06 2015
--- euc_jp_to_utf8.map  Fri Nov 27 16:25:37 2015
***
*** 1,6 
  /* src/backend/utils/mb/Unicode/euc_jp_to_utf8.map */
  
! static const pg_local_to_utf LUmapEUC_JP[] = {
{0x8ea1, 0xefbda1},
{0x8ea2, 0xefbda2},
{0x8ea3, 0xefbda3},
--- 1,6 
  /* src/backend/utils/mb/Unicode/euc_jp_to_utf8.map */
  
! static const pg_local_to_utf LUmapEUC_JP[ 13007 ] = {
{0x8ea1, 0xefbda1},
{0x8ea2, 0xefbda2},
{0x8ea3, 0xefbda3},
***
*** 95,103 
{0xa1bd, 0xe28095},
{0xa1be, 0xe28090},
{0xa1bf, 0xefbc8f},
!   {0xa1c0, 0xefbcbc},
!   {0xa1c1, 0xefbd9e},
!   {0xa1c2, 0xe288a5},
{0xa1c3, 0xefbd9c},
{0xa1c4, 0xe280a6},
{0xa1c5, 0xe280a5},
--- 95,102 
{0xa1bd, 0xe28095},
{0xa1be, 0xe28090},
{0xa1bf, 0xefbc8f},
!   {0xa1c1, 0xe3809c},
!   {0xa1c2, 0xe28096},
{0xa1c3, 0xefbd9c},
{0xa1c4, 0xe280a6},
{0xa1c5, 0xe280a5},
***
*** 124,130 
{0xa1da, 0xe38090},
{0xa1db, 0xe38091},
{0xa1dc, 0xefbc8b},
!   {0xa1dd, 0xefbc8d},
{0xa1de, 0xc2b1},
{0xa1df, 0xc397},
{0xa1e0, 0xc3b7},
--- 123,129 
{0xa1da, 0xe38090},
{0xa1db, 0xe38091},
{0xa1dc, 0xefbc8b},
!   {0xa1dd, 0xe28892},
{0xa1de, 0xc2b1},
{0xa1df, 0xc397},
{0xa1e0, 0xc3b7},
***
*** 144,151 
{0xa1ee, 0xe28483},
{0xa1ef, 0xefbfa5},
{0xa1f0, 0xefbc84},
!   {0xa1f1, 0xefbfa0},
!   {0xa1f2, 0xefbfa1},
{0xa1f3, 0xefbc85},
{0xa1f4, 0xefbc83},
{0xa1f5, 0xefbc86},
--- 143,150 
{0xa1ee, 0xe28483},
{0xa1ef, 0xefbfa5},
{0xa1f0, 0xefbc84},
!   {0xa1f1, 0xc2a2},
!   {0xa1f2, 0xc2a3},
{0xa1f3, 0xefbc85},
{0xa1f4, 0xefbc83},
{0xa1f5, 0xefbc86},
***
*** 182,188 
{0xa2c1, 0xe288a9},
{0xa2ca, 0xe288a7},
{0xa2cb, 0xe288a8},
!   {0xa2cc, 0xefbfa2},
{0xa2cd, 0xe28792},
{0xa2ce, 0xe28794},
{0xa2cf, 0xe28880},
--- 181,187 
{0xa2c1, 0xe288a9},
{0xa2ca, 0xe288a7},
{0xa2cb, 0xe288a8},
!   {0xa2cc, 0xc2ac},
{0xa2cd, 0xe28792},
{0xa2ce, 0xe28794},
{0xa2cf, 0xe28880},
***
*** 588,676 
{0xa8be, 0xe294a5},
{0xa8bf, 0xe294b8},
{0xa8c0, 0xe29582},
-   {0xada1, 0xe291a0},
-   {0xada2, 0xe291a1},
-   {0xada3, 0xe291a2},
-   {0xada4, 0xe291a3},
-   {0xada5, 0xe291a4},
-   {0xada6, 0xe291a5},
-   {0xada7, 0xe291a6},
-   {0xada8, 0xe291a7},
-   {0xada9, 0xe291a8},
-   {0xadaa, 0xe291a9},
-   {0xadab, 0xe291aa},
-   {0xadac, 0xe291ab},
-   {0xadad, 0xe291ac},
-   {0xadae, 0xe291ad},
-   {0xadaf, 0xe291ae},
-   {0xadb0, 0xe291af},
-   {0xadb1, 0xe291b0},
-   {0xadb2, 0xe291b1},
-   {0xadb3, 0xe291b2},
-   {0xadb4, 0xe291b3},
-   {0xadb5, 0xe285a0},
-   {0xadb6, 0xe285a1},
-   {0xadb7, 0xe285a2},
-   {0xadb8, 0xe285a3},
-   {0xadb9, 0xe285a4},
-   {0xadba, 0xe285a5},
-   {0xadbb, 0xe285a6},
-   {0xadbc, 0xe285a7},
-   {0xadbd, 0xe285a8},
-   {0xadbe, 0xe285a9},
-   {0xadc0, 0xe38d89},
-   {0xadc1, 0xe38c94},
-   {0xadc2, 0xe38ca2},
-   {0xadc3, 0xe38d8d},
-   {0xadc4, 0xe38c98},
-   {0xadc5, 0xe38ca7},
-   {0xadc6, 0xe38c83},
-   {0xadc7, 0xe38cb6},
-   {0xadc8, 0xe38d91},
-   {0xadc9, 0xe38d97},
-   {0xadca, 0xe38c8d},
-   {0xadcb, 0xe38ca6},
-   {0xadcc, 0xe38ca3},
-   {0xadcd, 0xe38cab},
-   {0xadce, 0xe38d8a},
-   {0xadcf, 0xe38cbb},
-   {0xadd0, 0xe38e9c},
-   {0xadd1, 0xe38e9d},
-   {0xadd2, 0xe38e9e},
-   {0xadd3, 0xe38e8e},
-   {0xadd4, 0xe38e8f},
-   {0xadd5, 0xe38f84},
-   {0xadd6, 0xe38ea1},
-   {0xaddf, 0xe38dbb},
-   

Re: [HACKERS] Rework the way multixact truncations work

2015-11-27 Thread Noah Misch
On Mon, Nov 23, 2015 at 11:44:45AM -0800, Peter Geoghegan wrote:
> On Sun, Nov 8, 2015 at 11:52 AM, Noah Misch  wrote:
> >> I'm not following along right now - in order to make cleanups the plan is 
> >> to revert a couple commits and then redo them prettyfied?
> >
> > Yes, essentially.  Given the volume of updates, this seemed neater than
> > framing those updates as in-tree incremental development.
> 
> I think that's an odd way of representing this work. I tend to
> remember roughly when major things were committed even years later. An
> outright revert should represent a total back out of the original
> commit IMV. Otherwise, a git blame can be quite misleading.

I think you're saying that "clearer git blame" is a more-important reason than
"volume of updates" for preferring an outright revert over in-tree incremental
development.  Fair preference.  If that's a correct reading of your message,
then we do agree on the bottom line.




Re: [HACKERS] Errors in our encoding conversion tables

2015-11-27 Thread Tatsuo Ishii
> I wrote:
>> I have not attempted to reverify the files in utils/mb/Unicode against the
>> original Unicode Consortium data, but maybe we ought to do that before
>> taking any further steps here.
> 
> I downloaded the mapping files from unicode.org and attempted to verify
> that the Unicode/*.map files could be reproduced from the stated sources.
> Most of them are okay, but I failed to verify these:
> 
> euc_cn_to_utf8.maputf8_to_euc_cn.map
> 
> Could not find the reference file GB2312.TXT; it is not at unicode.org
> 
> gb18030_to_utf8.map   utf8_to_gb18030.map
> 
> Could not find the reference file gb-18030-2000.xml, whose origin is
> unstated anyway.
> 
> euc_jp_to_utf8.maputf8_to_euc_jp.map
> euc_kr_to_utf8.maputf8_to_euc_kr.map
> johab_to_utf8.map utf8_to_johab.map
> uhc_to_utf8.map   utf8_to_uhc.map
> 
> These four all have minor to significant differences from what I got by
> running the generation scripts.  See attached diffs.
> 
> utf8_to_sjis.map
> 
> It's very disturbing that this fails to verify when its allegedly inverse
> file does verify;
> either the script is broken or somebody did sloppy
> manual editing.

Manual editing.

I explain why the manual editing is necessary.

One of the most famous problems with Unicode is the "wave dash"
(U+301C). According to the Unicode consortium's Unicode/SJIS map, it
corresponds to 0x8160 of Shift_JIS. Unfortunately this was a mistake
in Unicode (the glyphs in Shift_JIS and Unicode are slightly different;
the Unicode one looks like a wave dash rotated 90 degrees, as in vertical
scripting. Probably they did not understand Japanese vertical
writing at that time). So later on the Unicode consortium decided to
add another "wave dash" as U+FF5E, which has the correct glyph for "wave
dash". However, since Unicode had already decided that U+301C corresponds
to 0x8160 of Shift_JIS, there is no Shift_JIS code corresponding to
U+FF5E. Unlike Unicode's definition, Microsoft defines that 0x8160
(wave dash) corresponds to U+FF5E, and this is widely used in Japan. So I
decided to adopt this mapping for "wave dash", i.e.

0x8160 -> U+FF5E (sjis_to_utf8.map)

U+301C -> 0x8160 (utf_to_sjis.map)
U+FF5E -> 0x8160 (utf_to_sjis.map)

Another problem is vendor extensions.

There are several standards for SJIS and EUC_JP in Japan. There is a
standard "Shift_JIS" defined by the Japanese government (the Unicode
consortium's map is probably based on this, but I need to verify). However,
several major vendors, including IBM and NEC, added their own additional
characters to Shift_JIS, and those are widely used in Japan. Unfortunately
they are not compatible with each other. So as a compromise, I and other
developers decided to merge the NEC and IBM extension parts and add them
to Shift_JIS. The same can be said of EUC_JP.

In short, there are a number of reasons we cannot simply import the
consortium's mapping for SJIS (and EUC_JP).

> Anyway, this seems to mean that it's okay to go ahead with fixing the
> encoding conversion discrepancies I complained of yesterday; the data
> those proposed diffs are based on is solid.  But we've evidently got
> a number of other issues with these Far Eastern encodings.
> 
>   regards, tom lane




[HACKERS] Gather node in OPTIMIZER_DEBUG output

2015-11-27 Thread Peter Geoghegan
It looks like commit 3bd909b22 did not add handling for GatherPath to
print_path(), so complete information for Gather paths does not appear
when OPTIMIZER_DEBUG is in use.

Attached patch fixes the issue.

-- 
Peter Geoghegan
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1fdcae5..4516cd3 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2771,6 +2771,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Unique";
 			subpath = ((UniquePath *) path)->subpath;
 			break;
+		case T_GatherPath:
+			ptype = "Gather";
+			subpath = ((GatherPath *) path)->subpath;
+			break;
 		case T_NestPath:
 			ptype = "NestLoop";
 			join = true;



Re: [HACKERS] Duplicate patch in commitfest

2015-11-27 Thread Michael Paquier
On Sat, Nov 28, 2015 at 1:32 AM, Teodor Sigaev  wrote:
> Hi!
>
> There are two commitfest entries which are linked to the same patch/thread
>
> Move PinBuffer and UnpinBuffer to atomics
>   https://commitfest.postgresql.org/7/370/
> and
> Replace buffer manager spinlock with atomic operations
>   https://commitfest.postgresql.org/7/408/
>
> Suppose, one of them could be rejected.

I can delete the newest entry if people are fine with that. That will
make the history cleaner.
-- 
Michael




[HACKERS] Re: Potential pointer dereference in plperl.c (caused by transforms patch)

2015-11-27 Thread Noah Misch
On Mon, May 04, 2015 at 02:02:18PM +0900, Michael Paquier wrote:
> Coverity is pointing out that as argtypes = NULL in
> plperl_call_perl_func@plperl.c, we will have a pointer dereference if
> desc->arg_arraytype[i] is not a valid OID, see here:
> +   Oid*argtypes = NULL;
> [...]
> +   if (fcinfo->flinfo->fn_oid)
> +   get_func_signature(fcinfo->flinfo->fn_oid, &argtypes, &nargs);
> [...]
> if (OidIsValid(desc->arg_arraytype[i]))
> sv =
> plperl_ref_from_pg_array(fcinfo->arg[i], desc->arg_arraytype[i]);
> +   else if ((funcid =
> get_transform_fromsql(argtypes[i],
> current_call_data->prodesc->lang_oid,
> current_call_data->prodesc->trftypes)))
> +   sv = (SV *)
> DatumGetPointer(OidFunctionCall1(funcid, fcinfo->arg[i]));
> AFAIK, fcinfo->flinfo->fn_oid can be InvalidOid in this code path, so
> shouldn't we protect a bit the code with something like the patch
> attached?

fcinfo->flinfo->fn_oid==InvalidOid implies an inline block, and those have no
arguments.  If it placates Coverity, I lean toward an assert-only change:

--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -2112,6 +2112,8 @@ plperl_call_perl_func(plperl_proc_desc *desc, 
FunctionCallInfo fcinfo)
EXTEND(sp, desc->nargs);
 
+   /* Get signature for true functions; inline blocks have no args. */
if (fcinfo->flinfo->fn_oid)
get_func_signature(fcinfo->flinfo->fn_oid, &argtypes, &nargs);
+   Assert(nargs == desc->nargs);
 
for (i = 0; i < desc->nargs; i++)




Re: [HACKERS] Errors in our encoding conversion tables

2015-11-27 Thread Tom Lane
I wrote:
> gb18030_to_utf8.map   utf8_to_gb18030.map
> Could not find the reference file gb-18030-2000.xml, whose origin is
> unstated anyway.

Ah, scratch that complaint; digging in our git history turned up the
origin of that file, so I double-checked it and then updated the script
with a comment.  We're good on that conversion ... but the other ones
still have problems.

regards, tom lane




Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-11-27 Thread Alvaro Herrera
Michael Paquier wrote:

> The result of a couple of hours of hacking is attached:
> - 0001 is the refactoring adding PostgresNode and RecursiveCopy. I have
> also found that it is quite advantageous to move some of the routines that
> are synonyms of system() and the stuff used for logging into another
> low-level library that PostgresNode depends on, that I called TestBase in
> this patch. This way, all the infrastructure depends on the same logging
> management. Existing tests have been refactored to fit into the new code,
> and this leads to a couple of simplifications particularly in pg_rewind
> tests because there is no more need to have there routines for environment
> cleanup and logging. I have done tests on OSX and Windows using it and
> tests are passing. I have as well tested that ssl tests were working.

Here's another version of this.  I changed the packages a bit more.  For
starters, I moved the routines around a bit; some of your choices seemed
more about keeping stuff where it was originally rather than moving it
to where it made sense.  These are the routines in each module:

TestBase:  system_or_bail system_log run_log slurp_dir slurp_file
append_to_file

TestLib: get_new_node teardown_node psql poll_query_until command_ok
command_fails command_exit_is program_help_ok program_version_ok
program_options_handling_ok command_like issues_sql_like

I tried to get rid of teardown_node by having a DESTROY method for
PostgresNode; that method would call "pg_ctl stop -m immediate".  That
would have been much cleaner.  However, when a test fails this doesn't
work sanely because the END block for File::Temp runs earlier than that
DESTROY block, which means the datadir is already gone by the time
pg_ctl stop runs, so the node stop doesn't work at all.  (Perhaps we
could fix this by noting postmaster's PID at start time, and then
sending a signal directly instead of relying on pg_ctl).

I moved all the initialization code (deleting stuff from environment,
detecting Windows, opening SimpleTie filedescs etc) into BEGIN blocks,
which run earlier than any other code.

I perltidy'ed PostgresNode (and all the other files actually), to have
the style match the rest of our code.  I also updated some code to be
more Perlish.

I added a lot of error checking in RecursiveCopy.

You had a "cat" call somewhere, which I replaced with slurp_file.


I considered updating RewindTest so that it didn't have to export the
node global variables, but decided not to, not because of the huge code
churn for the t/*.pl files but because of the problem with the DESTROY
method above: it didn't actually buy anything.

Hmm. I just noticed RewindTest sets $ENV{PGDATABASE} outside BEGIN.  Not
sure what to think of that.  Could instead pass the database name in
$node->getConnStr() calls, like run_pg_rewind() is already doing.

I tried all the t/ tests we have and all of them pass for me.  If I'm
able, I will push this on my Sunday late evening, so that I can fix
whatever gets red on Monday first thing ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 299dcf5..3b5d7af 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -4,10 +4,11 @@
 
 use strict;
 use warnings;
+use TestBase;
 use TestLib;
 use Test::More tests => 14;
 
-my $tempdir = TestLib::tempdir;
+my $tempdir = TestBase::tempdir;
 my $xlogdir = "$tempdir/pgxlog";
 my $datadir = "$tempdir/data";
 
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index dc96bbf..65ed4de 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -2,6 +2,8 @@ use strict;
 use warnings;
 use Cwd;
 use Config;
+use PostgresNode;
+use TestBase;
 use TestLib;
 use Test::More tests => 51;
 
@@ -9,8 +11,15 @@ program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
 program_options_handling_ok('pg_basebackup');
 
-my $tempdir = tempdir;
-start_test_server $tempdir;
+my $tempdir = TestBase::tempdir;
+
+my $node = get_new_node();
+# Initialize node without replication settings
+$node->initNode(0);
+$node->startNode();
+my $pgdata = $node->getDataDir();
+
+$ENV{PGPORT} = $node->getPort();
 
 command_fails(['pg_basebackup'],
 	'pg_basebackup needs target directory specified');
@@ -26,19 +35,19 @@ if (open BADCHARS, ">>$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
 	close BADCHARS;
 }
 
-configure_hba_for_replication "$tempdir/pgdata";
-system_or_bail 'pg_ctl', '-D', "$tempdir/pgdata", 'reload';
+$node->setReplicationConf();
+system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
 
 command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup" ],
 	'pg_basebackup fails because of WAL configuration');
 
-open CONF, ">>$tempdir/pgdata/postgresql.conf";
+open CONF, ">>$pgdata/postgresql.conf";
 print 

Re: [HACKERS] Proposal: Trigonometric functions in degrees

2015-11-27 Thread Dean Rasheed
On 11 November 2015 at 11:45, Dean Rasheed  wrote:
> Thanks for testing. I'll post an updated patch sometime soon.
>

I finally got round to looking at this again, and here is an updated patch.

I've reverted the changes to the CHECKFLOATVAL() checks in the
existing functions, so that can be a separate discussion. As for the
checks in the new functions added by this patch, I've left them as
they were on the grounds that the checks are taking place after
argument reduction, so the arguments will not be infinite  at that
point (and besides, I think the checks are correct as written anyway).

I've reduced the regression tests down to checks of the cases where
the results should be exact, so they should now be
platform-independent. Many of the original tests were useful during
development to ensure that sane-looking answers were being returned,
but they didn't really add anything as regression tests, other than
extra complication due to platform variations.
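
For illustration, these are the kinds of exact identities the remaining tests 
check (function names as in the attached patch):

  -- All of these are expected to be exactly true on every platform.
  SELECT sind(30) = 0.5, cosd(60) = 0.5, tand(45) = 1,
         asind(0.5) = 30, acosd(0.5) = 60, atand(1) = 45;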

Regards,
Dean
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 60b9a09..3716210
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 993,1001 
 Finally,  shows the
 available trigonometric functions.  All trigonometric functions
 take arguments and return values of type double
!precision. Trigonometric functions arguments are expressed
!in radians. Inverse functions return values are expressed in
!radians.  See unit transformation functions
 radians() and
 degrees() above.

--- 993,1001 
 Finally,  shows the
 available trigonometric functions.  All trigonometric functions
 take arguments and return values of type double
!precision.  Each of the trigonometric functions comes in
!two varieties, one which works in radians and one which works in
!degrees.  See unit transformation functions
 radians() and
 degrees() above.

***
*** 1003,1012 
 
  Trigonometric Functions
  
! 
   

!Function
 Description

   
--- 1003,1013 
 
  Trigonometric Functions
  
! 
   

!Function (radians)
!Function (degrees)
 Description

   
***
*** 1018,1023 
--- 1019,1029 
   acos
  acos(x)
 
+
+ 
+  acosd
+ acosd(x)
+
 inverse cosine

  
***
*** 1028,1033 
--- 1034,1045 
  
  asin(x)
 
+
+ 
+  asind
+ 
+ asind(x)
+
 inverse sine

  
***
*** 1038,1043 
--- 1050,1061 
  
  atan(x)
 
+
+ 
+  atand
+ 
+ atand(x)
+
 inverse tangent

  
***
*** 1049,1054 
--- 1067,1079 
  atan2(y,
  x)
 
+
+ 
+  atan2d
+ 
+ atan2d(y,
+ x)
+
 inverse tangent of
  y/x

***
*** 1060,1065 
--- 1085,1096 
  
  cos(x)
 
+
+ 
+  cosd
+ 
+ cosd(x)
+
 cosine

  
***
*** 1070,1075 
--- 1101,1112 
  
  cot(x)
 
+
+ 
+  cotd
+ 
+ cotd(x)
+
 cotangent

  
***
*** 1080,1085 
--- 1117,1128 
  
  sin(x)
 
+
+ 
+  sind
+ 
+ sind(x)
+
 sine

  
***
*** 1090,1095 
--- 1133,1144 
  
  tan(x)
 
+
+ 
+  tand
+ 
+ tand(x)
+
 tangent

   
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
new file mode 100644
index 4e927d8..4a6fdae
*** a/src/backend/utils/adt/float.c
--- b/src/backend/utils/adt/float.c
*** dcot(PG_FUNCTION_ARGS)
*** 1642,1648 
   errmsg("input is out of range")));
  
  	result = 1.0 / result;
! 	CHECKFLOATVAL(result, true /* cotan(pi/2) == inf */ , true);
  	PG_RETURN_FLOAT8(result);
  }
  
--- 1642,1648 
   errmsg("input is out of range")));
  
  	result = 1.0 / result;
! 	CHECKFLOATVAL(result, true /* cot(0) == Inf */ , true);
  	PG_RETURN_FLOAT8(result);
  }
  
*** dtan(PG_FUNCTION_ARGS)
*** 1688,1693 
--- 1688,2056 
  	PG_RETURN_FLOAT8(result);
  }
  
+ 
+ /*
+  *		asind_q1		- returns the inverse sine of x in degrees, where x is
+  *		  assumed to be in the range [0, 1] and the result is
+  *		  an angle in the first quadrant (0 to 90 degrees).
+  *
+  * In this quadrant there are 3 special case inputs (0, 0.5 and 1) for which
+  * this function will return exact values 

Re: [HACKERS] New email address

2015-11-27 Thread Greg Stark
On Thu, Nov 26, 2015 at 11:26 PM, Alvaro Herrera
 wrote:
> I don't think that's going to be anything but unwelcome noise.  What
> would they do if they became aware of the issue?  They could switch
> providers, but that only works for so long.  As soon as Gmail switches
> to p=reject, we've lost.  We got away with doing it for Yahoo because
> there's not a lot of people using that -- not on these lists anyway.

On further thought I think Gmail going p=reject is the wrong thing to
worry about. The thing we need to check is how major mail providers
like Gmail and Yahoo handle SPF failures *today*. There are plenty of
domains we probably don't want to miss emails from that *already* have
p=reject. For example if a Google employee mails us from @google.com
[*] today that domain has p=reject so will everyone reading the list
on Gmail or Yahoo miss the email? I bet other major companies have
p=reject on their corporate domains as well.

I'm hoping that as long as the DKIM signature succeeds those mails
will still get through to Gmail, Yahoo, etc. There may be some people
who miss it due to supporting SPF but not DKIM but if major providers
fall into that camp then we're done for regardless of whether they
have p=reject for their own domains.

[*] This finally explains to me why there was a push to get employees
to use an alternate domain for their free software mailing list posts
instead of their @google.com address.

-- 
greg




Re: [HACKERS] New email address

2015-11-27 Thread Magnus Hagander
On Fri, Nov 27, 2015 at 10:36 AM, Greg Stark  wrote:

> On Thu, Nov 26, 2015 at 11:26 PM, Alvaro Herrera
>  wrote:
> > I don't think that's going to be anything but unwelcome noise.  What
> > would they do if they became aware of the issue?  They could switch
> > providers, but that only works for so long.  As soon as Gmail switches
> > to p=reject, we've lost.  We got away with doing it for Yahoo because
> > there's not a lot of people using that -- not on these lists anyway.
>
> On further thought I think Gmail going p=reject is the wrong thing to
> worry about. The thing we need to check is how major mail providers
> like Gmail and Yahoo handle SPF failures *today*. There are plenty of
> domains we probably don't want to miss emails from that *already* have
> p=reject. For example if a Google employee mails us from @google.com
> [*] today that domain has p=reject so will everyone reading the list
> on Gmail or Yahoo miss the email? I bet other major companies have
> p=reject on their corporate domains as well.
>

Google doesn't actually reject, but it increases the likelihood of the mail
hitting spam significantly. However, they put a fairly low value on the SPF
records. In my experience, it seems they put a much higher value on DKIM
(failed or not).

Of course, Google also only actually *supports* email if both sender and
receiver are on Gmail. Anything else is "we hope it works". (Yes, I have
official responses from Google paid support saying they only support
scenarios where both sender and receiver are on Gmail.)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] WIP: About CMake v2

2015-11-27 Thread YUriy Zhuravlev
On Thursday 26 November 2015 19:28:15 you wrote:
> I think you don't understand the point: start with the *right* cmake
> version because you could have to redo (a lot of) your work

There is no fundamental difference between CMake versions.
My laptop and desktop already have 3.4.0 (the new KDE requires it), and my
colleagues generally have versions above 3.0.0 as well. It is simply not
convenient to develop under 2.8 right now.
Still, I try not to use 3.0-only features; later we will simply need to check
that everything still works with 2.8.

Importantly, do not forget that CMake != GNU Make; CMake == autotools + gnumake.
New versions of CMake mostly just fix modules like FindBISON and so on. That is
a different philosophy.

Thanks.
-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] New email address

2015-11-27 Thread Greg Stark
On Fri, Nov 27, 2015 at 6:43 AM, Stefan Kaltenbrunner
 wrote:
>
> well not changing the subject seems like something we could do without
> fuss - not changing the body would likely mean we would (again) get a
> number of people asking "how do I unsubscribe", but maybe we will have
> to live with that.

I've never thought the footer actually helped with that. We still get
plenty of such emails anyway.

On a hopeful note, if the emails pass DKIM, I wonder if our
List-Unsubscribe link will start working. It should be making an
"Unsubscribe" button appear in Gmail but doesn't currently seem to
work. It's not clear if it will, though, since Gmail seems to think it's
associated with the sender -- i.e. that all emails are spam.

-- 
greg




Re: [HACKERS] Redefine default result from PQhost()?

2015-11-27 Thread Magnus Hagander
On Thu, Nov 26, 2015 at 4:48 PM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > On Wed, Nov 25, 2015 at 11:59 PM, Tom Lane  wrote:
> >> I think we should [ return DEFAULT_PGSOCKET_DIR not NULL ]
>
> > I agree with this change in general. But I wonder if there's a risk here
> > that we break some applications, isn't there? It's clearly a
> > backwards-incompatible change, so wouldn't it require a bump of the libpq version?
>
> I don't see why it would need a libpq version bump, as it's not an ABI
> breakage.  As for breaking application logic, it's true that any app code
> that is explicitly checking for NULL would become dead code, but that
> seems pretty harmless.  It's already the case that apps should be prepared
> to get back an explicit socket directory path spec; that would merely
> become a more common case than before.
>

Hmm. Good point. I didn't realize they already had to be ready to get a
non-default path back.

Of course, the documentation just says it'll return a hostname. It should
probably mention that it can return the unix socket path as well - but
that's something that's missing already.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] New email address

2015-11-27 Thread Magnus Hagander
On Fri, Nov 27, 2015 at 10:57 AM, Greg Stark  wrote:

> On Fri, Nov 27, 2015 at 6:43 AM, Stefan Kaltenbrunner
>  wrote:
> >
> > well not changing the subject seems like something we could do without
> > fuss - not changing the body would likely mean we would (again) get a
> > number of people asking "how do I unsubscribe", but maybe we will have
> > to live with that.
>
> I've never thought the footer actually helped with that. We still get
> plenty of such emails anyways.
>

We used to get more of them. Whether it's because of this or just because
people are more used to it is a different question of course.



> On a hopeful note if the emails pass DKIM I wonder if our
> List-Unsubscribe link will start working. It should be making an
> "Unsubscribe" button appear in Gmail but doesn't currently seem to
> work. It's not clear if it will though since Gmail seems to think it's
> associated with the sender -- i.e. that all emails are spam.
>

Do you have any examples of lists where it *does* work? Like, could it be
because our list-unsubscribe links are mailto: links and not http(s) links
for example?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Getting sorted data from foreign server for merge join

2015-11-27 Thread Ashutosh Bapat
Thanks Rushabh for your review and comments.

On Thu, Nov 26, 2015 at 5:39 PM, Rushabh Lathia 
wrote:

> Hi Ashutosh,
>
> I reviewed your latest version of patch and over all the implementation
> and other details look good to me.
>
> Here are few cosmetic issues which I found:
>
> 1) Patch not getting applied cleanly - white space warning
>
>
Done.


> 2)
>
> -List   *usable_pathkeys = NIL;
> +List*useful_pathkeys_list = NIL;/* List of all pathkeys */
>
> Code alignment is not correct with other declared variables.
>
>
Incorporated the change in the patch.

3)
>
> +{
> +PathKey*pathkey;
> +List*pathkeys;
> +
> +pathkey = make_canonical_pathkey(root, cur_ec,
> +
> linitial_oid(cur_ec->ec_opfamilies),
> +BTLessStrategyNumber,
> +false);
> +pathkeys = list_make1(pathkey);
> +useful_pathkeys_list = lappend(useful_pathkeys_list,
> pathkeys);
> +}
>
> Code alignment needs to be fixed at make_canonical_pathkey().
>

Incorporated the change in the patch.

I have also removed the TODO item in the prologue of this function, since
no one has objected to the externalization of make_canonical_pathkey so far,
and the TODO isn't expected to be part of the final commit.


>
> 4)
>
> I don't understand the meaning of the following test case added to
> postgres_fdw.
>
> +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
> @@ -171,20 +171,35 @@ SELECT COUNT(*) FROM ft1 t1;
>  -- join two tables
>  SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3,
> t1.c1 OFFSET 100 LIMIT 10;
>  -- subquery
>  SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <=
> 10) ORDER BY c1;
>  -- subquery+MAX
>  SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY
> c1;
>  -- used in CTE
>  WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3,
> t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
>  -- fixed values
>  SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
> +-- getting data sorted from the foreign table for merge join
> +-- Since we are interested in merge join, disable other joins
> +SET enable_hashjoin TO false;
> +SET enable_nestloop TO false;
> +-- inner join, expressions in the clauses appear in the equivalence class
> list
> +EXPLAIN (VERBOSE, COSTS false)
> +SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 =
> t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
> +SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C
> 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
> +-- outer join, expression in the clauses do not appear in equivalence
> class list
> +-- but no output change as compared to the previous query
> +EXPLAIN (VERBOSE, COSTS false)
> +SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1
> = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
> +SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 =
> t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
> +SET enable_hashjoin TO true;
> +SET enable_nestloop TO true;
>
> Because when I removed the code changes of the patch and then ran the tests,
> it seemed like they had nothing to do with the code changes. The above set of
> tests gives the
> same result with and without the patch.
>
> Am I missing something?
>

Actually, the output of a merge join is always ordered by the pathkeys used
for the merge join, and that ordering is preserved through the LIMIT node. So
we do not actually need the ORDER BY t1.c1 clause in the above queries. Without
that clause, the tests show different output with and without the patch.
I have changed the attached patch accordingly.
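
For illustration, the reworked inner-join test then looks roughly like this (a
sketch based on the query quoted above, with the ORDER BY removed so that any
ordering in the output has to come from the sort pushed down to the foreign
server for the merge join):

SET enable_hashjoin TO false;
SET enable_nestloop TO false;
EXPLAIN (VERBOSE, COSTS false)
SELECT t1.c1, t2."C 1"
  FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1")
  OFFSET 100 LIMIT 10;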


>
> Apart from this I debugged the patch for each scenario (query pathkeys and
> pathkeys arising out of the equivalence classes) and so far patch looks
> good
> to me.
>
>
Thanks.


> Attaching update version of patch by fixing the cosmetic changes.
>
>
Attached version of patch contains your changes.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 866a09b..a9b0bd8 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -336,20 +336,93 @@ WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4
  10 |  0 | 00010 | Sun Jan 11 00:00:00 1970 PST
 (10 rows)
 
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
  ?column? | ?column? 
 --+--
  fixed| 
 (1 row)
 
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+-- Merge join always produces the result in the sorted order, so no need for
+-- separate ORDER BY clause to get ordered results.
+SET enable_hashjoin TO false;
+SET enable_nestloop 

Re: [HACKERS] Error with index on unlogged table

2015-11-27 Thread Michael Paquier
On Fri, Nov 27, 2015 at 3:42 PM, Michael Paquier wrote:
> I am still investigating for a correct fix, looking at reinit.c the
> code in charge of copying the init fork as the main fork for a
> relation at the end of recovery looks to be doing its job correctly...

Attached is a patch that fixes the issue for me in master and 9.5.
Actually in the last patch I forgot a call to smgrwrite to ensure that
the INIT_FORKNUM is correctly synced to disk when those pages are
replayed at recovery, letting the reset routines for unlogged
relations do their job correctly. I have noticed as well that we need
to do the same for gin and brin relations. In this case I think that
we could limit the flush to unlogged relations; my patch does it
unconditionally, though, to generalize the logic. Thoughts?
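
For context, a rough reproduction of the failure being fixed could look like
this -- a sketch only, assuming a GIN index on an unlogged table and an
immediate-mode restart to force crash recovery; the table and index names are
made up and the exact error text may differ:

CREATE UNLOGGED TABLE u_tab (a int4[]);
CREATE INDEX u_tab_gin ON u_tab USING gin (a);
-- pg_ctl restart -m immediate   (next startup performs crash recovery)
INSERT INTO u_tab VALUES ('{1,2,3}');
-- may fail if the index's init fork never made it to disk before the crash
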
-- 
Michael
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 99337b0..d7964ac 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -682,7 +682,12 @@ brinbuildempty(PG_FUNCTION_ARGS)
 	brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
 	   BRIN_CURRENT_VERSION);
 	MarkBufferDirty(metabuf);
-	log_newpage_buffer(metabuf, false);
+	/*
+	 * When this full page image is replayed, there is no guarantee that
+	 * this page will be present to disk when replayed particularly for
+	 * unlogged relations, hence enforce it to be flushed to disk.
+	 */
+	log_newpage_buffer(metabuf, false, true);
 	END_CRIT_SECTION();
 
 	UnlockReleaseBuffer(metabuf);
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index f876f62..572fe20 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -865,7 +865,7 @@ brin_initialize_empty_new_buffer(Relation idxrel, Buffer buffer)
 	page = BufferGetPage(buffer);
 	brin_page_init(page, BRIN_PAGETYPE_REGULAR);
 	MarkBufferDirty(buffer);
-	log_newpage_buffer(buffer, true);
+	log_newpage_buffer(buffer, true, false);
 	END_CRIT_SECTION();
 
 	/*
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 49e9185..755c983 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -450,14 +450,21 @@ ginbuildempty(PG_FUNCTION_ARGS)
 		ReadBufferExtended(index, INIT_FORKNUM, P_NEW, RBM_NORMAL, NULL);
 	LockBuffer(RootBuffer, BUFFER_LOCK_EXCLUSIVE);
 
-	/* Initialize and xlog metabuffer and root buffer. */
+	/*
+	 * Initialize and xlog metabuffer and root buffer. When those full
+	 * pages are replayed, it is not guaranteed that those relation
+	 * init forks will be flushed to disk after replaying them, hence
+	 * enforce those pages to be flushed to disk at replay, only the
+	 * last record will enforce a flush for performance reasons and
+	 * because it is actually unnecessary to do it multiple times.
+	 */
 	START_CRIT_SECTION();
 	GinInitMetabuffer(MetaBuffer);
 	MarkBufferDirty(MetaBuffer);
-	log_newpage_buffer(MetaBuffer, false);
+	log_newpage_buffer(MetaBuffer, false, false);
 	GinInitBuffer(RootBuffer, GIN_LEAF);
 	MarkBufferDirty(RootBuffer);
-	log_newpage_buffer(RootBuffer, false);
+	log_newpage_buffer(RootBuffer, false, true);
 	END_CRIT_SECTION();
 
 	/* Unlock and release the buffers. */
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 53bccf6..fd53268 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -84,7 +84,7 @@ gistbuildempty(PG_FUNCTION_ARGS)
 	START_CRIT_SECTION();
 	GISTInitBuffer(buffer, F_LEAF);
 	MarkBufferDirty(buffer);
-	log_newpage_buffer(buffer, true);
+	log_newpage_buffer(buffer, true, true);
 	END_CRIT_SECTION();
 
 	/* Unlock and release the buffer */
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 6a6fc3b..e9a9a8f 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -336,7 +336,8 @@ end_heap_rewrite(RewriteState state)
 		MAIN_FORKNUM,
 		state->rs_blockno,
 		state->rs_buffer,
-		true);
+		true,
+		false);
 		RelationOpenSmgr(state->rs_new_rel);
 
 		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
@@ -685,7 +686,8 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 			MAIN_FORKNUM,
 			state->rs_blockno,
 			page,
-			true);
+			true,
+			false);
 
 			/*
 			 * Now write the page. We say isTemp = true even if it's not a
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index cf4a6dc..d211a98 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -206,7 +206,7 @@ btbuildempty(PG_FUNCTION_ARGS)
 			  (char *) metapage, true);
 	if (XLogIsNeeded())
 		log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
-	BTREE_METAPAGE, metapage, false);
+	BTREE_METAPAGE, metapage, false, true);
 
 	/*
 	 * An immediate sync is required even if we xlog'd the page, because 

Re: [HACKERS] Errors in our encoding conversion tables

2015-11-27 Thread Albe Laurenz
Tom Lane wrote:
> There's a discussion over at
> http://www.postgresql.org/message-id/flat/2sa.dhu5.1hk1yrptnfy.1ml...@seznam.cz
> of an apparent error in our WIN1250 -> LATIN2 conversion.  I looked into this
> and found that indeed, the code will happily translate certain characters
> for which there seems to be no justification.  I made up a quick script
> that would recompute the conversion tables in latin2_and_win1250.c from
> the Unicode mapping files in src/backend/utils/mb/Unicode, and what it
> computes is shown in the attached diff.  (Zeroes in the tables indicate
> codes with no translation, for which an error should be thrown.)
> 
> Having done that, I thought it would be a good idea to see if we had any
> other conversion tables that weren't directly based on the Unicode data.
> The only ones I could find were in cyrillic_and_mic.c, and those seem to
> be absolutely filled with errors, to the point where I wonder if they were
> made from the claimed encodings or some other ones.  The attached patch
> recomputes those from the Unicode data, too.
> 
> None of this data seems to have been touched since Tatsuo-san's original
> commit 969e0246, so it looks like we simply didn't vet that submission
> closely enough.
> 
> I have not attempted to reverify the files in utils/mb/Unicode against the
> original Unicode Consortium data, but maybe we ought to do that before
> taking any further steps here.
> 
> Anyway, what are we going to do about this?  I'm concerned that simply
> shoving in corrections may cause problems for users.  Almost certainly,
> we should not back-patch this kind of change.

Thanks for picking this up.

I agree with your proposed fix; the only thing that makes me feel uncomfortable
is that you get error messages like:
  ERROR:  character with byte sequence 0x96 in encoding "WIN1250" has no 
equivalent in encoding "MULE_INTERNAL"
which is a bit misleading.
But the main thing is that no corrupt data can be entered.
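
For the record, the error quoted above comes out of a conversion along these
lines -- a sketch; 0x96 is the WIN1250 en dash, which has no LATIN2 equivalent,
and the conversion is routed through MULE_INTERNAL, hence the confusing message:

SELECT convert('\x96'::bytea, 'WIN1250', 'LATIN2');
-- ERROR:  character with byte sequence 0x96 in encoding "WIN1250" has no
--         equivalent in encoding "MULE_INTERNAL"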

I can understand the reluctance to back-patch; nobody likes his
application to suddenly fail after a minor database upgrade.

However, the people who would fail if this were back-patched are
people who will certainly run into trouble if they
a) upgrade to a release where this is fixed or
b) try to convert their database to, say, UTF8.

The least we should do is stick a fat warning into the release notes
of the first version where this is fixed, along with some guidelines on what
to do (though I am afraid that there is not much more helpful to say than
"If your database encoding is X and data have been entered with client_encoding 
Y,
fix your data in the old system").

But I think that this fix should be applied to 9.6.
PostgreSQL has a strong reputation for being strict about correct encoding
(not saying that everybody appreciates that), and I think we shouldn't mar
that reputation.

Yours,
Laurenz Albe



Re: [HACKERS] New email address

2015-11-27 Thread Greg Stark
On Fri, Nov 27, 2015 at 10:02 AM, Magnus Hagander  wrote:
> Do you have any examples of lists where it *does* work? LIke, could it be
> because our list-unsubscribe links are mailto: links and not http(s) links
> for example?

From what I read, they prefer mailto links. But apparently they only
display them for high-reputation sources -- a comment that only makes
sense if you consider the list-unsubscribe address as the source of the email,
which really only makes sense in the context of marketing emails. So I
don't know if they'll work for actual lists.


-- 
greg

