Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-13 Thread Heikki Linnakangas

On 02/13/2014 08:44 AM, Kyotaro HORIGUCHI wrote:

Wouldn't it be better to not archive the old segment, and instead
switch
to a new segment after writing the end-of-recovery checkpoint, so that
the segment on the new timeline is archived sooner?


It would be better to zero-fill and switch segments, yes. We should
NEVER be in a position of archiving two different versions of the same
segment.


Ok, I think we're in agreement that that's the way to go for master.


I was almost inclined to that, but on further thought about the idea,
considering recovery up to a target timeline, the old segment turns out
to be necessary for that case. Without the old segment, we would be
obliged to seek to the first segment of the *next* timeline (is there
any (simple) means to predict where that is?) to complete the task.


How did the server that created the new timeline get the old, partial, 
segment? Was it already archived? Or did the DBA copy it into pg_xlog 
manually? Or was it streamed by streaming replication? Whatever the 
mechanism, the same mechanism ought to make sure the old segment is 
available for PITR, too.


Hmm. If you have set up streaming replication and a WAL archive, and 
your master dies and you fail over to a standby, what you describe does 
happen. The partial old segment is not in the archive, so you cannot 
PITR to a point in the old timeline that falls within the partial 
segment (ie. just before the failover). However, it's not guaranteed 
that all the preceding WAL segments on the old timeline were already 
archived, anyway, so even if the partial segment is archived, it's not 
guaranteed to work.


The old master is responsible for archiving the WAL on the old timeline, 
and the new master is responsible for archiving all the WAL on the new 
timeline. That's a straightforward, easy-to-understand rule. It might be 
useful to have a mode where the standby also archives all the received 
WAL, but that would need to be a separate option.



Is it the right way that we kick the older one out of the archive?


If it's already in the archive, it's not going to be removed from the 
archive.


- Heikki




Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Andrea Suisani

Hi all,

On 02/12/2014 08:27 PM, Greg Stark wrote:

On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Greg Stark st...@mit.edu writes:

For what it's worth I've confirmed the bug in wal-e caused the initial
problem.


Huh?  Bug in wal-e?  What bug?


WAL-E actually didn't restore a whole 1GB file due to a transient S3
problem, in fact a bunch of them. It's remarkable that Postgres kept
going with that much data missing. But the arithmetic worked out on
the case I checked it on, which was the last one that I just sent the
xlog record for last night. In that case there was precisely one
segment missing and the relation was extended by the number of
segments you would expect if it filled in that missing segment and
then jumped to the end of the relation.


Sorry for interrupting, but did we already notify WAL-E's maintainer?

Andrea

ps cc:ed Daniel







[HACKERS] how set GUC_check_errhint_string in call_string_check_hook()

2014-02-13 Thread amul sul
Dear all,

I need help sending a hint message to ereport() in the call_string_check_hook() 
function [src/backend/utils/misc/guc.c:8092].

In call_string_check_hook(), the following parameters are reset:

  /* Reset variables that might be set by hook */
8100         GUC_check_errcode_value = ERRCODE_INVALID_PARAMETER_VALUE;
8101         GUC_check_errmsg_string = NULL;
8102         GUC_check_errdetail_string = NULL;
8103         GUC_check_errhint_string = NULL;

At line 8107, inside ereport(), there is a conditional operation like this:

8113         GUC_check_errdetail_string ?
8114             errdetail_internal("%s", GUC_check_errdetail_string) : 0,

I have two basic questions:

1. Is condition testing of GUC_check_errdetail_string or the other GUC_* 
variables needed, even if we are resetting them?
2. Can I pass a hint message in the above ereport()? How?

Thanks in advance!

Regards,
Amul Sul

Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-13 Thread Heikki Linnakangas

On 02/12/2014 01:24 PM, Christoph Berg wrote:

Re: Heikki Linnakangas 2014-01-13 52d3caff.3010...@vmware.com

Actually, why is the partially-filled 00010002 file
archived in the first place? Looking at the code, it's been like that
forever, but it seems like a bad idea. If the original server is still
up and running, and writing more data to that file, what will happen is
that when the original server later tries to archive it, it will fail
because the partial version of the file is already in the archive. Or
worse, the partial version overwrites a previously archived more
complete version.


Oh!  This explains some transient errors I've seen.


Wouldn't it be better to not archive the old segment, and instead switch
to a new segment after writing the end-of-recovery checkpoint, so that
the segment on the new timeline is archived sooner?


It would be better to zero-fill and switch segments, yes.  We should
NEVER be in a position of archiving two different versions of the same
segment.


Ok, I think we're in agreement that that's the way to go for master.

Now, what to do about back-branches? On one hand, I'd like to apply
the same fix to all stable branches, as the current behavior is
silly and always has been. On the other hand, we haven't heard any
complaints about it, so we probably shouldn't fix what ain't broken.
Perhaps we should apply it to 9.3, as that's where we have the acute
problem the OP reported. Thoughts?

In summary, I propose that we change master and REL9_3_STABLE to not
archive the partial segment from previous timeline. Older branches
will keep the current behavior.


I've seen the "can't archive file from the old timeline" problem on
9.2 and 9.3 slaves after promotion. The problem is in conjunction with
the proposed archive_command in the default postgresql.conf comments:

# e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

With 9.1, it works, but 9.2 and 9.3 don't archive anything until I
remove the "test ! -f" part. (An alternative fix would be to declare
the behavior ok and adjust that example in the config.)


Hmm, the behavior is the same in 9.1 and 9.2. Did you use a different 
archive_command in 9.1, without the test?


- Heikki




Re: [HACKERS] how set GUC_check_errhint_string in call_string_check_hook()

2014-02-13 Thread Amit Langote
Amul,

On Thu, Feb 13, 2014 at 5:59 PM, amul sul sul_a...@yahoo.co.in wrote:

 I have basic questions,

 1. Is condition testing of GUC_check_errdetail_string or the other GUC_*
 variables needed, even if we are resetting them?

A few lines before this conditional test, there is a call to the hook
function (via the check_hook function pointer), which might set these
variables:

if (!(*conf->check_hook) (newval, extra, source))


 2. Can I pass hint message in above ereport(), how?


It could be done by setting the check hook related global variable
GUC_check_errhint_string inside your hook function.
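
For illustration, a string check hook that supplies a hint could look like
this (a minimal sketch; the GUC and function names are made up, but
GUC_check_errmsg()/GUC_check_errhint() are the helper macros from
utils/guc.h that fill those global variables):

#include "postgres.h"
#include "utils/guc.h"

/* hypothetical check hook for a string GUC "my.module_path" */
static bool
check_my_module_path(char **newval, void **extra, GucSource source)
{
	/* reject relative paths, with a hint for the user */
	if (*newval != NULL && (*newval)[0] != '/')
	{
		GUC_check_errmsg("invalid value for my.module_path: \"%s\"", *newval);
		GUC_check_errhint("Specify an absolute path.");
		return false;		/* call_string_check_hook() then ereport()s */
	}
	return true;
}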

--
Amit




Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Greg Stark
 I think what you're arguing is that we should see WAL records filling the
 rest of segment 1 before we see any references to segment 2, but if that's
 the case then how did we get into the situation you reported?  Or is it
 just that it was a broken base backup to start with?

The scenario I could come up with that didn't require a broken base backup
was that there was an earlier truncate or vacuum. So the sequence is high
offset reference, truncate, growth, crash. All possibly on a single
database.

It's possible we're better off not assuming we've thought of all possible
ways this can happen though.


[HACKERS] New hook after raw parsing, before analyze

2014-02-13 Thread David Beck
Hello Hackers,

I work on a foreign data wrapper for a legacy system. I generally find the hook 
system a very useful and flexible way to extend Postgres.
The post parse analyze hook almost fits what I need, but I have a few use cases 
where I would need to tap right into the parsed queries before any catalog-based 
validation is done.
Please find the attached trivial patch for this new hook.

One of the use cases I have is this:

I have table-like data structures in the source system for the FDW I work on.
These tables are sometimes too big, and the source system can filter and join 
them only with limitations, so it is not optimal to transfer the data to 
Postgres.
At the same time I want the users to think in terms of the original tables.

The idea is to rewrite the SQL queries like this:

  “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
b.col2=987”

to:

  “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”


This rewritten query would be handled by the FDW table that I previously added 
to the catalog.

The reason I want this new hook is that I don’t want tableA and tableB to be in 
the catalog.
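
To make the proposal concrete, the extension side could look roughly like
this (a sketch only: the hook type and variable names are hypothetical,
invented for illustration, and the actual patch may differ):

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* hypothetical hook: called with the raw output of pg_parse_query(),
 * before parse analysis touches the catalogs */
typedef List *(*post_raw_parse_hook_type) (List *raw_parsetree_list);
extern PGDLLIMPORT post_raw_parse_hook_type post_raw_parse_hook;

static post_raw_parse_hook_type prev_post_raw_parse_hook = NULL;

static List *
my_raw_rewrite(List *raw_parsetree_list)
{
	if (prev_post_raw_parse_hook)
		raw_parsetree_list = prev_post_raw_parse_hook(raw_parsetree_list);

	/*
	 * Walk the raw statements here and replace the tableA/tableB join
	 * with a single RangeVar pointing at fdw_tableA_tableB.
	 */
	return raw_parsetree_list;
}

void
_PG_init(void)
{
	prev_post_raw_parse_hook = post_raw_parse_hook;
	post_raw_parse_hook = my_raw_rewrite;
}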

Looking forward to hearing your thoughts, opinions, comments.

Best regards, David





post_raw_parse.diff
Description: Binary data



Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2014-02-13 Thread Bruce Momjian
On Thu, Feb 13, 2014 at 10:20:46AM +0530, Amit Kapila wrote:
  Why not *only* prefix/suffix?
 
 To represent a prefix/suffix match, we at least need a way to tell
 the offset and length of the matched bytes, and then the length of
 the unmatched bytes we have copied.
 I agree that a simpler format could be devised if we just want to
 do a prefix/suffix match, but that would require much more testing
 during recovery to ensure everything is fine. The advantage of the LZ
 format is that we don't need to bother about decoding; it will work
 without much change to the LZ decode routine.

Based on the numbers I think prefix/suffix-only needs to be explored. 
Consider if you just change one field of a row --- prefix/suffix would
find all the matching parts.  If you change the first and last fields,
you get no compression at all, but your prefix/suffix test isn't going
to get that either.

As I understand it, the only place prefix/suffix with LZ compression is
a win over prefix/suffix-only is when you change two middle fields, and
there are common fields unchanged between them.  If we are looking at
11% CPU overhead for that, it isn't worth it.
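
For concreteness, the prefix/suffix-only matching being discussed amounts to
something like this (my sketch of the idea, not code from any patch):

#include "postgres.h"

/*
 * Compute the lengths of the shared prefix and suffix of the old and new
 * tuple data.  Only the bytes between them need to be WAL-logged literally;
 * (prefix_len, suffix_len) describes the matched parts.
 */
static void
find_prefix_suffix(const char *olddata, int oldlen,
                   const char *newdata, int newlen,
                   int *prefix_len, int *suffix_len)
{
    int     p = 0;
    int     s = 0;
    int     maxlen = Min(oldlen, newlen);

    while (p < maxlen && olddata[p] == newdata[p])
        p++;
    /* don't let the suffix overlap the prefix */
    while (s < maxlen - p &&
           olddata[oldlen - 1 - s] == newdata[newlen - 1 - s])
        s++;

    *prefix_len = p;
    *suffix_len = s;
}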

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-13 Thread MauMau

From: Andres Freund and...@2ndquadrant.com

could you try if you get more readable dumps by using disassemble/m?
That might at least print line numbers if you have debug info installed.


Please find the attached file.  I hope this will reveal something.

Regards
MauMau

Dump of assembler code for function LWLockRelease:
   0x00647d40 <+0>:     push   %r12
   0x00647d42 <+2>:     mov    %edi,%r12d
   0x00647d45 <+5>:     shl    $0x5,%r12
   0x00647d49 <+9>:     add    0x4f3a98(%rip),%r12        # 0xb3b7e8 <LWLockArray>
   0x00647d50 <+16>:    push   %rbp
   0x00647d51 <+17>:    mov    %edi,%ebp
   0x00647d53 <+19>:    push   %rbx
   0x00647d54 <+20>:    mov    0x4f3a86(%rip),%ebx        # 0xb3b7e0 <num_held_lwlocks>
   0x00647d5a <+26>:    nopw   0x0(%rax,%rax,1)
   0x00647d60 <+32>:    sub    $0x1,%ebx
   0x00647d63 <+35>:    js     0x647ea4 <LWLockRelease+356>
   0x00647d69 <+41>:    movslq %ebx,%rax
   0x00647d6c <+44>:    cmp    %ebp,0xb3b800(,%rax,4)
   0x00647d73 <+51>:    jne    0x647d60 <LWLockRelease+32>
   0x00647d75 <+53>:    mov    0x4f3a65(%rip),%esi        # 0xb3b7e0 <num_held_lwlocks>
   0x00647d7b <+59>:    sub    $0x1,%esi
   0x00647d7e <+62>:    cmp    %ebx,%esi
   0x00647d80 <+64>:    mov    %esi,0x4f3a5a(%rip)        # 0xb3b7e0 <num_held_lwlocks>
   0x00647d86 <+70>:    jg     0x647d92 <LWLockRelease+82>
   0x00647d88 <+72>:    jmp    0x647dad <LWLockRelease+109>
   0x00647d8a <+74>:    nopw   0x0(%rax,%rax,1)
   0x00647d90 <+80>:    mov    %ecx,%ebx
   0x00647d92 <+82>:    lea    0x1(%rbx),%ecx
   0x00647d95 <+85>:    movslq %ebx,%rax
   0x00647d98 <+88>:    movslq %ecx,%rdx
   0x00647d9b <+91>:    cmp    %ecx,%esi
   0x00647d9d <+93>:    mov    0xb3b800(,%rdx,4),%edx
   0x00647da4 <+100>:   mov    %edx,0xb3b800(,%rax,4)
   0x00647dab <+107>:   jg     0x647d90 <LWLockRelease+80>
   0x00647dad <+109>:   mov    $0x1,%eax
   0x00647db2 <+114>:   lock xchg %al,(%r12)
   0x00647db7 <+119>:   test   %al,%al
   0x00647db9 <+121>:   jne    0x647ee4 <LWLockRelease+420>
   0x00647dbf <+127>:   movzbl 0x2(%r12),%eax
   0x00647dc5 <+133>:   test   %al,%al
   0x00647dc7 <+135>:   jle    0x647f04 <LWLockRelease+452>
   0x00647dcd <+141>:   movzbl 0x2(%r12),%eax
   0x00647dd3 <+147>:   sub    $0x1,%eax
   0x00647dd6 <+150>:   mov    %al,0x2(%r12)
   0x00647ddb <+155>:   mov    0x8(%r12),%rcx
   0x00647de0 <+160>:   test   %rcx,%rcx
   0x00647de3 <+163>:   je     0x647def <LWLockRelease+175>
   0x00647de5 <+165>:   movzbl 0x2(%r12),%eax
   0x00647deb <+171>:   test   %al,%al
   0x00647ded <+173>:   je     0x647e08 <LWLockRelease+200>
   0x00647def <+175>:   movb   $0x0,(%r12)
   0x00647df4 <+180>:   pop    %rbx
   0x00647df5 <+181>:   pop    %rbp
   0x00647df6 <+182>:   mov    0x53d25c(%rip),%eax        # 0xb85058 <InterruptHoldoffCount>
   0x00647dfc <+188>:   pop    %r12
   0x00647dfe <+190>:   sub    $0x1,%eax
   0x00647e01 <+193>:   mov    %eax,0x53d251(%rip)        # 0xb85058 <InterruptHoldoffCount>
   0x00647e07 <+199>:   retq
   0x00647e08 <+200>:   mov    0x4(%r12),%eax
   0x00647e0d <+205>:   test   %eax,%eax
   0x00647e0f <+207>:   jne    0x647def <LWLockRelease+175>
   0x00647e11 <+209>:   movzbl 0x1(%r12),%eax
   0x00647e17 <+215>:   test   %al,%al
   0x00647e19 <+217>:   je     0x647def <LWLockRelease+175>
   0x00647e1b <+219>:   cmpb   $0x2,0x42(%rcx)
   0x00647e1f <+223>:   jne    0x647f66 <LWLockRelease+550>
   0x00647e25 <+229>:   mov    0x48(%rcx),%rax
   0x00647e29 <+233>:   test   %rax,%rax
   0x00647e2c <+236>:   je     0x647f66 <LWLockRelease+550>
   0x00647e32 <+242>:   mov    %rax,%rdx
   0x00647e35 <+245>:   jmp    0x647e44 <LWLockRelease+260>
   0x00647e37 <+247>:   mov    0x48(%rdx),%rdx
   0x00647e3b <+251>:   test   %rdx,%rdx
   0x00647e3e <+254>:   je     0x647f16 <LWLockRelease+470>
   0x00647e44 <+260>:   movzbl 0x42(%rdx),%esi
   0x00647e48 <+264>:   mov    %rdx,%rax
   0x00647e4b <+267>:   cmp    $0x2,%sil
   0x00647e4f <+271>:   je     0x647e37 <LWLockRelease+247>
   0x00647e51 <+273>:   mov    0x48(%rdx),%rdx
   0x00647e55 <+277>:   test   %sil,%sil
   0x00647e58 <+280>:   je     0x647f16 <LWLockRelease+470>
   0x00647e5e <+286>:   test   %rdx,%rdx
   0x00647e61 <+289>:   je     0x647f16 <LWLockRelease+470>
   0x00647e67 <+295>:   cmpb   $0x0,0x42(%rdx)
   0x00647e6b <+299>:   mov    $0x1,%edi
   0x00647e70 <+304>:   jne    0x647e8a <LWLockRelease+330>
   0x00647e72 

Re: [HACKERS] Terminating pg_basebackup background streamer

2014-02-13 Thread Magnus Hagander
On Wed, Feb 12, 2014 at 10:53 PM, Peter Eisentraut pete...@gmx.net wrote:

 On 2/12/14, 4:34 PM, Magnus Hagander wrote:
  On Wed, Feb 12, 2014 at 10:28 PM, Peter Eisentraut pete...@gmx.net wrote:
 
  On 2/12/14, 12:47 PM, Magnus Hagander wrote:
   Since there were no other objections, I've applied this patch.
 
  I'm getting a compiler warning:
 
  pg_basebackup.c:105:3: error: implicit declaration of function 'kill'
  [-Werror=implicit-function-declaration]
 
 
  What platform is that? And do you know which header the declaration
  actually lives in? I don't see it here...

 OS X, signal.h according to man page


Are you sure you made that test after my fixup patch (the one suggested by
Andres)? Because that one was at least supposed to add signal.h...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-13 Thread Kyotaro HORIGUCHI
Hello. I might have misunderstood your words.

At Thu, 13 Feb 2014 10:11:22 +0200, Heikki Linnakangas 
hlinnakan...@vmware.com wrote in 52fc7e2a.9060...@vmware.com
 On 02/13/2014 08:44 AM, Kyotaro HORIGUCHI wrote:
  Wouldn't it be better to not archive the old segment, and instead
  switch
  to a new segment after writing the end-of-recovery checkpoint, so that
  the segment on the new timeline is archived sooner?
 
  It would be better to zero-fill and switch segments, yes. We should
  NEVER be in a position of archiving two different versions of the same
  segment.
 
  Ok, I think we're in agreement that that's the way to go for master.

Does this mean that no mechanical solution for this situation
will be given, and that an operator should remove the older segment with
the same segment id before starting the recovery process?

  I was almost inclined to that, but on further thought about the idea,
  considering recovery up to a target timeline, the old segment turns out
  to be necessary for that case. Without the old segment, we would be
  obliged to seek to the first segment of the *next* timeline (is there
  any (simple) means to predict where that is?) to complete the task.
 
 How did the server that created the new timeline get the old, partial,
 segment? Was it already archived? Or did the DBA copy it into pg_xlog
 manually? Or was it streamed by streaming replication? Whatever the
 mechanism, the same mechanism ought to make sure the old segment is
 available for PITR, too.

Sure.

 Hmm. If you have set up streaming replication and a WAL archive, and
 your master dies and you fail over to a standby, what you describe
 does happen. The partial old segment is not in the archive, so you
 cannot PITR to a point in the old timeline that falls within the
 partial segment (ie. just before the failover). However, it's not
 guaranteed that all the preceding WAL segments on the old timeline
 were already archived, anyway, so even if the partial segment is
 archived, it's not guaranteed to work.

Yes, and putting aside insane or vanished segments in the
archive, I understand that a pair of master and standby (or standby
and cascaded standby, and so on) can share one WAL archive;
otherwise, the archived WAL segments plus all the WAL segments not
yet archived and left in the old master's pg_xlog should be merged
into the new master's (the promoted old slave's) WAL archive, to keep
the online backup taken from the old master usable. Even with a
shared WAL archive, segments missing from the archive should be
filled in from pg_xlog. Nevertheless, the process can be implemented
in an automatic way.

The test script at the start of this thread is for the shared-archive
case, and I had unconsciously assumed that context.


 The old master is responsible for archiving the WAL on the old
 timeline, and the new master is responsible for archiving all the WAL
 on the new timeline. That's a straightforward, easy-to-understand
 rule.

Yes, I was somewhat confused because of my assumption of a shared
archive, but the WAL actually can be converged into a single archive,
and older versions of PostgreSQL could cope with that
situation.

  It might be useful to have a mode where the standby also
 archives all the received WAL, but that would need to be a separate
 option.

Perhaps such a mechanism is not in demand :)

  Is it the right way that we kick the older one out of the archive?
 
 If it's already in the archive, it's not going to be removed from the
 archive.

I have understood that the conclusion so far is to not archive the
older segment on promotion, but that seems a bit of an odd story, as you
suggested. If the conclusion here is no aid, as I now understand it,
would you let me hear the reason why recovery was changed to prefer
the archive over pg_xlog?

Commit abf5c5c9a4 seems to change the behavior, but I can't
find the reason for the change.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] gaussian distribution pgbench

2014-02-13 Thread KONDO Mitsumasa

Hi Fabien,

Thank you very much for your very detailed and useful comments!
I read your comments, and I agree with most of your advice :)

The attached patch is fixed per your comments. The changes are...
 - Remove the redundant long option.
   - We can use the --gaussian=NUM -S or --gaussian=NUM -N options.
 - Add sentences to the documentation.
 - Separate the two random generator functions, uniform and gaussian.
   - getGaussianrand() is created.
 - Fix the random number ranges more strictly, e.g. (0,1) or [0,1).
   - Please see the comments in the source code for details :).
 - Fix typos.
 - Use the cos() and sin() functions when generating gaussian random numbers.
 - Add a fast sqrt calculation algorithm.
 - Reuse the sqrt result and pre-generate random numbers to reduce calculation 
cost.
   - An experiment with this method is below. It is a little faster than the 
non-reuse method, and the gaussian distribution is still good.


* Settings
 shared_buffers = 1024MB

* Test script
 pgbench -i -s 1
 pgbench --gaussian=2 -T 30 -S -c8 -j4 -n
 pgbench --gaussian=2 -T 30 -S -c8 -j4 -n
 pgbench --gaussian=2 -T 30 -S -c8 -j4 -n

* Result
       method      |  try1  |  try2  |  try3
 ------------------+--------+--------+-------
  reuse method     | 44189  | 44453  | 44013
  non-reuse method | 43567  | 43635  | 43508



(2014/02/09 21:32), Fabien COELHO wrote:

   This is a valuable contribution to enable pgbench to generate more realistic
   loads, which is seldom uniform in practice.

Thanks!


   However, ISTM that other distributions such as an exponential one would make
   more sense,

I can easily create an exponential distribution. Here I assume the general 
exponential distribution, f(x) = lambda * exp(-lambda * x).

What do you think of the following interface?

custom script: \setexp [varname] min max threshold
command      : --exponential=NUM(threshold)

I don't want to expose a lambda variable, to keep the implementation simple, 
so lambda is always 1; the distribution can be controlled well enough by the 
threshold alone. The threshold parameter is an f(x) value, and the generated 
distribution is projected onto 'aid' by the same method. If you think this is 
OK, I will implement the following tomorrow, and also create the parsing part 
of this function...


do
{
   rand = 1.0 - pg_erand48(thread->random_state);
   rand = -log(rand);
} while (rand > exp_threshold);

return rand / exp_threshold;
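
A quick sanity check of the loop above (my derivation, not from any patch):
rand = -log(U) with U uniform on (0,1] is Exp(1)-distributed, so for a
threshold t the loop accepts with probability

    P(rand <= t) = 1 - e^{-t},

and the returned value rand / exp_threshold always lies in [0, 1].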



   and also the values should be further randomized so that
   neighboring values are not more likely to be drawn. The latter point is
   non-trivial.

That's right, but I worry that gaussian randomness and benchmark reproducibility 
might disappear when we re-randomize the access pattern, because Postgres' 
storage manages records page by page, and it is difficult to realize access 
randomness across whole pages rather than individual records. If we solve this 
problem, we need an algorithm for a smart shuffle projection function that still 
preserves gaussian randomness. I think it will be difficult, and it will have to 
be implemented in another patch in the future.




* Mathematical soundness

   We want to derive a discrete normal distribution from a uniform one.
   Well, normal distributions are for continuous variables... Anyway, this is
   done by computing a continuous normal distribution which is then projected
   onto integers. I'm basically fine with that.

   The system uses a Box-Muller transform (1958) to do this transformation.
   The Ziggurat method seems to be preferred for this purpose, *but* it would
   require precalculated tables which depend on the target values. So I'm
   fine with the Box-Muller transform for pgbench.
Yes, that's right. I selected a simple and relatively fast algorithm, the 
Box-Muller transform.



   The BM method uses 2 uniformly distributed numbers to derive 2 normally
   distributed numbers. The implementation computes one of these, and loops
   until one matches a threshold criterion.

   More explanations, at least in comments, are needed about this threshold
   and its meaning. It is required to be more than 2. My guess is that it
   allows limiting the number of iterations of the while loop,

Yes. This loop almost never repeats: the minimum stdev_threshold is 2, so the
probability of a retry is only about 4.6 percent. It should not be a problem.
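
For reference (a worked check from the standard normal CDF, not from the
patch): with Box-Muller the candidate is a standard normal Z, so for a
threshold t the retry probability per draw is

    P(|Z| > t) = 1 - erf(t / \sqrt{2}),   P(|Z| > 2) = 1 - erf(\sqrt{2}) \approx 0.0455,

i.e. about 4.6 percent at the minimum threshold of 2, and about 0.2 percent
for two retries in a row.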


   but in what proportion
   is unclear. The documentation also does not help the user to understand
   this value and its meaning.

Yes, it is a heuristic method, so I added comments to the documentation.



   What I think it is: it is the deviation for the FURTHEST point around the
   mean, that is the actual deviation associated with the min and max target
   values. The minimum value of 2 implies that there are at least 4 stddev
   lengths between min & max, with the most likely mean in the middle.

Correct!


   If the threshold test fails, one of the 2 uniform numbers is redrawn and a
   new candidate value is tested. I'm not at ease about why only 1 value is
   redrawn and not both; some explanations would be welcome. Also, on the
   other hand, why not 

Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-13 Thread Kohei KaiGai
See the discussion of Custom-Scan API.
https://commitfest.postgresql.org/action/patch_view?id=1282

I believe my third patch is what you really want to do...

 This rewritten query would be handled by the FDW table that I previously 
 added to the catalog.

 The reason I want this new hook is that I don't want tableA and tableB to be 
 in the catalog.

I'd like to see why you want the pseudo table "fdw_tableA_tableB" to
be in the catalog instead of tableA and tableB. In addition, the parser
shall raise an error if the referenced columns (as part of tableA or
tableB) are not in the catalog, because of a name lookup error.

Thanks,


2014-02-13 19:01 GMT+09:00 David Beck db...@starschema.net:
 Hello Hackers,

 I work on a foreign data wrapper for a legacy system. I generally find the 
 hook system a very useful and flexible way to extend Postgres.
 The post parse analyze hook almost fits what I need, but I have a few use 
 cases where I would need to tap right into the parsed queries before any 
 catalog-based validation is done.
 Please find the attached trivial patch for this new hook.

 One of the use cases I have is this:

 I have table-like data structures in the source system for the FDW I work on.
 These tables are sometimes too big, and the source system can filter and join 
 them only with limitations, so it is not optimal to transfer the data to 
 Postgres.
 At the same time I want the users to think in terms of the original tables.

 The idea is to rewrite the SQL queries like this:

   SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
 b.col2=987

 to:

   SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987


 This rewritten query would be handled by the FDW table that I previously 
 added to the catalog.

 The reason I want this new hook is that I don't want tableA and tableB to be 
 in the catalog.

 Looking forward to hearing your thoughts, opinions, comments.

 Best regards, David









-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-13 Thread Heikki Linnakangas

On 02/13/2014 01:37 PM, Kyotaro HORIGUCHI wrote:

At Thu, 13 Feb 2014 10:11:22 +0200, Heikki Linnakangas hlinnakan...@vmware.com 
wrote in 52fc7e2a.9060...@vmware.com

On 02/13/2014 08:44 AM, Kyotaro HORIGUCHI wrote:

Wouldn't it be better to not archive the old segment, and instead
switch
to a new segment after writing the end-of-recovery checkpoint, so that
the segment on the new timeline is archived sooner?


It would be better to zero-fill and switch segments, yes. We should
NEVER be in a position of archiving two different versions of the same
segment.


Ok, I think we're in agreement that that's the way to go for master.


Does this mean that no mechanical solution for this situation
will be given, and that an operator should remove the older segment with
the same segment id before starting the recovery process?


I don't understand. With the proposed change to never archive a partial 
segment, there is no segment with the same filename in the archive. 
There is nothing to remove. Or did you mean that you have to do that in 
9.0-9.2 back-branches, if we only back-patch this change to 9.3?



If it's already in the archive, it's not going to be removed from the
archive.


I have understood that the conclusion so far is to not archive the
older segment on promotion, but that seems a bit of an odd story, as you
suggested. If the conclusion here is no aid, as I now understand it,
would you let me hear the reason why recovery was changed to prefer
the archive over pg_xlog?

Commit abf5c5c9a4 seems to change the behavior, but I can't
find the reason for the change.


Well, it didn't work too well before abf5c5c9a4 either. With your test 
script, and commit abf5c5c9a4 reverted [1], you got this:


### STAGE 3/3: 2/2 START SERVER IN ARCHIVE RECOVERY MODE 2ND RUN ###
waiting for server to start... LOG:  database system was interrupted; 
last known up at 2014-02-13 14:17:53 EET

LOG:  starting archive recovery
LOG:  restored log file 0002.history from archive
/bin/cp: cannot stat 
‘/home/heikki/pgsql.93stable/test/arc/00020002’: No such 
file or directory

LOG:  restored log file 00010002 from archive
LOG:  record with zero length at 0/290
LOG:  invalid primary checkpoint record
/bin/cp: cannot stat 
‘/home/heikki/pgsql.93stable/test/arc/00020002’: No such 
file or directory

LOG:  restored log file 00010002 from archive
LOG:  using previous checkpoint record at 0/228
LOG:  consistent recovery state reached at 0/228
LOG:  record with zero length at 0/290
LOG:  redo is not required
/bin/cp: cannot stat 
‘/home/heikki/pgsql.93stable/test/arc/00020002’: No such 
file or directory

LOG:  restored log file 00010002 from archive
/bin/cp: cannot stat 
‘/home/heikki/pgsql.93stable/test/arc/0003.history’: No such file or 
directory

LOG:  selected new timeline ID: 3
LOG:  restored log file 0002.history from archive
LOG:  archive recovery complete
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

Note the "invalid primary checkpoint record" message. It restored the 
partial 00010002 file from the archive, and tried to 
recover using that instead of the 00020002 file from 
pg_xlog. The primary checkpoint record is not present in 
00010002, so it fell back to recover from the previous 
checkpoint.


It was the exact same issue, even though it appeared to work if you 
don't look carefully. The second recovery never replayed the WAL 
generated on timeline 2, it branched off directly from timeline 1, at 
the same point where timeline 2 did. You can see that by looking at the 
history files. Also, if you create a table and insert a row to it after 
the first recovery, just before killing the server, you'll see that the 
row is not there after the second recovery. That's wrong; the data files 
might already contain some changes from timeline 2, and you end up with 
a corrupt database. Thanks to commit abf5c5c9a4, the situation is at 
least detected, and the system refuses to start up.


The behavior where we prefer a segment from archive with lower TLI over 
a file with higher TLI in pg_xlog actually changed in commit 
a068c391ab0. Arguably changing it wasn't a good idea, but the problem 
your test script demonstrates can be fixed by not archiving the partial 
segment, with no change to the preference of archive/pg_xlog. As 
discussed, archiving a partial segment seems like a bad idea anyway, so 
let's just stop doing that.



[1] just applying commit abf5c5c9a4 in reverse doesn't apply cleanly due 
to further changes to the code, but I did this to simulate the old behavior:


diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c

index 6a4a504..352d3f7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4366,6 +4366,7 @@ readRecoveryCommandFile(void)

/* Enable fetching 

Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-13 Thread Christoph Berg
Re: Heikki Linnakangas 2014-02-13 52fc9468.4050...@vmware.com
 With 9.1, it works, but 9.2 and 9.3 don't archive anything until I
 remove the "test ! -f" part. (An alternative fix would be to declare
 the behavior ok and adjust that example in the config.)
 
 Hmm, the behavior is the same in 9.1 and 9.2. Did you use a
 different archive_command in 9.1, without the test?

The behavior is different: 9.1 doesn't try to re-archive the last
segment from the old timeline. (I've seen it always happen in training
workshops, so that's always mostly idle machines. Possibly 9.1 will
re-archive the last segment under load.)

I've just reproduced it here with these settings (in Debian's
/etc/postgresql-common/createcluster.conf syntax):
archive_mode = on
archive_command = 'test ! -f /tmp/pg-%v/%%f && cp %%p /tmp/pg-%v/%%f'
wal_level = hot_standby
max_wal_senders = 3

... pg_basebackup ...

recovery.conf:
restore_command = 'cp /tmp/pg-9.3/%f %p'

9.1 slave log (translated from German):

LOG:  consistent recovery state reached at 0/2D8
cp: cannot stat '/tmp/pg-9.1/00010003': No such file or directory
LOG:  could not open file "pg_xlog/00010003" (log file 0, 
segment 3): No such file or directory
LOG:  redo done at 0/2D8
LOG:  restored log file "00010002" from archive
cp: cannot stat '/tmp/pg-9.1/0002.history': No such file or directory
LOG:  selected new timeline ID: 2
cp: cannot stat '/tmp/pg-9.1/0001.history': No such file or directory
LOG:  archive recovery complete
LOG:  incomplete startup packet
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
(eof)

ls /tmp/pg-9.1
00010001  00010002.0020.backup 
00020003
00010002  00020002 0002.history

9.2 slave log (also translated from German):

LOG:  consistent recovery state reached at 0/2E0
cp: cannot stat '/tmp/pg-9.2/00010003': No such file or directory
LOG:  could not open file "pg_xlog/00010003" (log file 0, 
segment 3): No such file or directory
LOG:  redo done at 0/2E0
LOG:  restored log file "00010002" from archive
cp: cannot stat '/tmp/pg-9.2/0002.history': No such file or directory
LOG:  selected new timeline ID: 2
cp: cannot stat '/tmp/pg-9.2/0001.history': No such file or directory
LOG:  archive recovery complete
LOG:  incomplete startup packet
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: test ! -f 
/tmp/pg-9.2/00010002 && cp pg_xlog/00010002 
/tmp/pg-9.2/00010002
LOG:  archive command failed with exit code 1
...

ls /tmp/pg-9.2
00010001  00010002 
00010002.0020.backup

The 9.3 log/ls looks the same.

Kind regards,
Christoph Berg
-- 
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE


signature.asc
Description: Digital signature


Re: [HACKERS] gaussian distribution pgbench

2014-02-13 Thread KONDO Mitsumasa

Sorry, the previously attached patch had a small bug.
Please use the latest one.

 134 - return min + (int64) (max - min + 1) * rand;
 134 + return min + (int64)((max - min + 1) * rand);

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
*************** int			progress_nthreads = 0; /* number of threads for progress report */
*** 176,181 ****
--- 176,183 ----
  bool		is_connect;			/* establish connection for each transaction */
  bool		is_latencies;		/* report per-command latencies */
  int			main_pid;			/* main process id used in log filename */
+ double		stdev_threshold = 5;		/* standard deviation threshold */
+ bool		gaussian_option = false;	/* use gaussian distribution random generator */
  
  char	   *pghost = ;
  char	   *pgport = ;
*************** static char *select_only = {
*** 338,346 ****
--- 340,390 ----
  	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
  };
  
+ /* --gaussian case */
+ static char *gaussian_tpc_b = {
+ 	"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ 	"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setgaussian aid 1 :naccounts :stdev_threshold\n"
+ 	"\\setrandom bid 1 :nbranches\n"
+ 	"\\setrandom tid 1 :ntellers\n"
+ 	"\\setrandom delta -5000 5000\n"
+ 	"BEGIN;\n"
+ 	"UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ 	"UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;\n"
+ 	"UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;\n"
+ 	"INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+ 	"END;\n"
+ };
+ 
+ /* --gaussian with -N case */
+ static char *gaussian_simple_update = {
+ 	"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ 	"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setgaussian aid 1 :naccounts :stdev_threshold\n"
+ 	"\\setrandom bid 1 :nbranches\n"
+ 	"\\setrandom tid 1 :ntellers\n"
+ 	"\\setrandom delta -5000 5000\n"
+ 	"BEGIN;\n"
+ 	"UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ 	"INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+ 	"END;\n"
+ };
+ 
+ /* --gaussian with -S case */
+ static char *gaussian_select_only = {
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setgaussian aid 1 :naccounts :stdev_threshold\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ };
+ 
  /* Function prototypes */
  static void setalarm(int seconds);
  static void *threadRun(void *arg);
+ static inline double sqrtd(const double x);
  
  static void
  usage(void)
*************** usage(void)
*** 381,386 ****
--- 425,431 ----
  		 "  -v, --vacuum-all vacuum all four standard tables before tests\n"
  		 "  --aggregate-interval=NUM aggregate data over NUM seconds\n"
  		 "  --sampling-rate=NUM  fraction of transactions to log (e.g. 0.01 for 1%%)\n"
+ 		 "  --gaussian=NUM   gaussian distribution with NUM standard deviation threshold\n"
  		   "\nCommon options:\n"
  		 "  -d, --debug  print debugging output\n"
  		 "  -h, --host=HOSTNAME  database server host or socket directory\n"
*************** getrand(TState *thread, int64 min, int64 max)
*** 477,482 ****
--- 522,597 ----
  	return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
  }
  
+ /* random number generator: gaussian distribution from min to max inclusive */
+ static int64
+ getGaussianrand(TState *thread, int64 min, int64 max, double stdev_threshold)
+ {
+ 	double		stdev;
+ 	double		rand;
+ 	static double	rand1;
+ 	static double	rand2;
+ 	static double	var_sqrt;
+ 	static bool	reuse = false;
+ 	
+ 	/*
+ 	 * Get a random number in the user-specified range
+ 	 * (-stdev_threshold < stdev <= stdev_threshold) in this loop. The loop
+ 	 * repeats until a number in the wanted range appears, but it rarely
+ 	 * does: the minimum stdev_threshold is 2, so the probability of a retry
+ 	 * is only about 4.6 percent, and of two retries in a row far smaller.
+ 	 * It doesn't happen frequently even when we also consider the cycle of
+ 	 * the trigonometric functions.
+  	 */
+ 	do
+ 	{
+ 		/* reuse the previous calculation result when possible */
+ 		if(!reuse)
+ 		{
+ 			/* 
+  			 * pg_erand48 generates a [0,1) random number. However, rand1 
+  			 * needs a (0,1) random number, because log(0) cannot be
+  			 * calculated, and rand2 also strictly needs a (0,1) random
+  			 * number. But normalization cost is high, and we can substitute
+  			 * (0,1] for rand1 and [0,1) for rand2, so we use an approximate
+  			 * calculation.
+  			 */
+ 			rand1 = 1.0 - pg_erand48(thread->random_state);
+ 			rand2 = pg_erand48(thread->random_state);
+ 		
+ 			 /* 

Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-13 Thread David Beck
Thanks for the link.

I want flexibility. Here is a situation: my hook knows the size of tableA and 
tableB on the legacy side. It should be able to decide whether to offload the 
join/filter onto the legacy side or not. At the same time it can start 
transferring the data to real Postgres tables in the background. When the data 
is here in the local DB, it can change the rewrite to use the local tables 
instead for new queries.

Answering your question:

I don’t know what you mean by pseudo table. The “fdw_tableA_tableB” is a real 
FDW table with a catalog entry, with real catalog columns. tableA and tableB 
are non-real tables in my setup. My understanding is the 
raw_parse-transform-analyze triple first checks the catalog in the transform 
phase. That’s why I want the hook to be after raw_parse but before transform, 
so the transform phase won’t see tableA, tableB and their columns, thus I don’t 
expect errors.

Thank you, David


Please find my answers inline.

On 2014.02.13., at 13:28, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 See the discussion of Custom-Scan API.
 https://commitfest.postgresql.org/action/patch_view?id=1282
 
 I believe my third patch is what you really want to do...
 
 This rewritten query would be handled by the FDW table that I previously 
 added to the catalog.
 
 The reason I want this new hook is that I don't want tableA and tableB to be 
 in the catalog.
 
 I'd like to see why you want the pseudo table "fdw_tableA_tableB" to
 be in the catalog instead of tableA and tableB. In addition, the parser
 shall raise an error if the referenced columns (as part of tableA or
 tableB) are not in the catalog, because of a name lookup error.
 
 Thanks,
 
 
 2014-02-13 19:01 GMT+09:00 David Beck db...@starschema.net:
 Hello Hackers,
 
 I work on a foreign data wrapper for a legacy system. I generally find the 
 hook system a very useful and flexible way to extend Postgres.
 The post parse analyze hook almost fits what I need, but I have a few use 
 cases where I would need to tap right into the parsed queries before any 
 catalog-based validation is done.
 Please find the attached trivial patch for this new hook.
 
 One of the use cases I have is this:
 
 I have table-like data structures in the source system for the FDW I work on.
 These tables are sometimes too big, and the source system can filter and join 
 them only with limitations, so it is not optimal to transfer the data to 
 Postgres.
 At the same time I want the users to think in terms of the original tables.
 
 The idea is to rewrite the SQL queries like this:
 
  SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
 b.col2=987
 
 to:
 
  SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987
 
 
 This rewritten query would be handled by the FDW table that I previously 
 added to the catalog.
 
 The reason I want this new hook is that I don't want tableA and tableB to be 
 in the catalog.
 
 Looking forward to hearing your thoughts, opinions, comments.
 
 Best regards, David
 
 
 
 
 
 
 
 
 
 -- 
 KaiGai Kohei kai...@kaigai.gr.jp





Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-13 Thread Heikki Linnakangas

On 02/13/2014 02:42 PM, Christoph Berg wrote:

Re: Heikki Linnakangas 2014-02-13 52fc9468.4050...@vmware.com

With 9.1, it works, but 9.2 and 9.3 don't archive anything until I
remove the test ! -f part. (An alternative fix would be to declare
the behavior ok and adjust that example in the config.)


Hmm, the behavior is the same in 9.1 and 9.2. Did you use a
different archive_command in 9.1, without the test?


The behavior is different: 9.1 doesn't try to re-archive the last
segment from the old timeline. (I've seen it always happen in training
workshops, so that's always mostly idle machines. Possibly 9.1 will
re-archive the last segment under load.)

I've just reproduced it here with these settings (in Debian's
/etc/postgresql-common/createcluster.conf syntax):


Ah, I think I see what's going on. 9.2 changed the archive recovery 
behavior so that the restored WAL segments are copied into pg_xlog, so 
that you can recover the standby back to the same point after restart, 
even if the archive later stops working (also needed for cascading 
standbys). In 9.1, we only attempted to archive the last segment from 
previous timeline if it wasn't restored from archive. In 9.2, it's 
always archived.


I was testing this with streaming replication; 9.1 and 9.2 behave the 
same in that scenario. But they differ when doing archive recovery.


Is this an argument for back-patching the "don't archive last segment 
from old timeline" patch to 9.2 and 9.3, but leaving 9.1 alone? You have 
the same problem with 9.1 and streaming replication, but no-one's 
complained..


- Heikki




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-13 Thread knizhnik

On 02/12/2014 06:10 PM, Ants Aasma wrote:

On Wed, Feb 12, 2014 at 4:04 PM, knizhnik knizh...@garret.ru wrote:

Even if reordering is not done by the compiler, it can still happen during
execution.
There is no guarantee that two subsequent assignments will be observed by all
CPU cores in the same order.


The x86 memory model (total store order) provides that guarantee in
this specific case.

Regards,
Ants Aasma



Sorry, I thought that we were talking about the general case, not just the x86 
architecture.
Maybe I do not understand something in the LWLock code, but it seems to me that 
assigning NULL to proc->lwWaitLink is not needed at all:

while (head != NULL)
{
    LOG_LWDEBUG("LWLockRelease", lockid, "release waiter");
    proc = head;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    proc->lwWaiting = false;
    PGSemaphoreUnlock(&proc->sem);
}



This part of the L1 list is not traversed by any other processor, so nobody will 
inspect this field. When the awakened process needs to wait for another lock,
it will just assign NULL to this field itself:

proc->lwWaiting = 1;
proc->lwWaitMode = mode;
proc->lwWaitLink = NULL;
if (lock->head == NULL)
    lock->head = proc;
else
    lock->tail->lwWaitLink = proc;
lock->tail = proc;

Without TSO (total store order), such an assignment to lwWaitLink in LWLockRelease 
outside the critical section may simply corrupt the L1 list in which the awakened 
process is already linked.
But I am not sure that eliminating this assignment will be enough to ensure 
correctness of this code without TSO.





Re: [HACKERS] Prevent pg_basebackup -Fp -D -?

2014-02-13 Thread Magnus Hagander
On Thu, Feb 13, 2014 at 4:10 AM, Bruce Momjian br...@momjian.us wrote:

 On Thu, Oct  3, 2013 at 06:50:57AM +0200, Magnus Hagander wrote:
 
  On Oct 3, 2013 2:47 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:
  
   On Wed, Oct 2, 2013 at 11:31 PM, Magnus Hagander mag...@hagander.net
 wrote:
Right now, if you use
   
pg_basebackup -Ft -D -
   
you get a tarfile, written to stdout, for redirection.
   
However, if you use:
   
pg_basebackup -Fp -D -
   
 you get a plaintext (unpackaged) backup, in a directory called "-".
    
 I can't think of a single use case where this is a good idea. Therefore,
 I would suggest we simply throw an error in this case, instead of
 creating the directory. Only for the specific case of specifying
 exactly "-" as a directory.
   
Comments?
   Isn't this a non-problem? This behavior is in line with the
   documentation, so I would suspect that if the directory name is
   specified as "-" in plain mode, it should create the folder with this
   name.
   Do you consider having a folder of this name an annoyance?
 
  Yes, that is exactly the point - I do consider that an annoyance, and I
  don't see the use case where you'd actually want it. I bet 100% of the
  users of that have been accidental, thinking they'd get the pipe, not
  the directory.
 
Also, if we do that, is this something we should consider
backpatchable? It's not strictly speaking a bugfix, but I'd say it
fixes some seriously annoying behavior.
   This would change the spec of pg_basebackup, so no? Does the current
   behavior have potential security issues?
 
  No, there are no security issues that I can see. Just annoyance. And
 yes, I
  guess it would change the spec, so backpatching might be a bad idea..

 Has this been fixed?  If so, I don't see it.


It has not. I think the thread wasn't entirely clear on whether we wanted it
or not, which is why I was waiting for more input from others. And then I
promptly forgot about it since nobody spoke up :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-13 Thread Christoph Berg
Re: Heikki Linnakangas 2014-02-13 52fcca40.3060...@vmware.com
 I was testing this with streaming replication; 9.1 and 9.2 behave
 the same in that scenario. But they differ when doing archive
 recovery.
 
 Is this an argument for back-patching the "don't archive last
 segment from old timeline" patch to 9.2 and 9.3, but leaving 9.1
 alone? You have the same problem with 9.1 and streaming replication,
 but no-one's complained..

I can't say what would be better from a technical viewpoint. For me,
this is literally an academic problem people run into when they try
replication during the PostgreSQL course, using the archive_command
example from postgresql.conf. I guess if no one else has complained
yet, there's no reason to touch the back branches.

Is removing the "test ! -f" part and hence overwriting files in the
archive safe, i.e. are the files the same?

Kind regards,
Christoph Berg
-- 
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE


signature.asc
Description: Digital signature


Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-13 Thread Heikki Linnakangas

On 02/13/2014 03:53 PM, Christoph Berg wrote:

Is removing the "test ! -f" part and hence overwriting files in the
archive safe, i.e. are the files the same?


No. Not in general, anyway. If the old master keeps running, even for a 
moment, after the partial file was copied, it will have created more WAL 
records which are present in the old master's copy of the segment, but 
not in the standby's.


You could perhaps use diff as part of archive_command, to check if the 
archived version is identical to the one about to be archived.
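
Something along these lines, perhaps (an untested sketch using the standard 
%p/%f placeholders: succeed if an already-archived file is byte-identical, 
fail otherwise so the admin is alerted):

archive_command = 'if test -f /mnt/server/archivedir/%f; then diff -q %p /mnt/server/archivedir/%f; else cp %p /mnt/server/archivedir/%f; fi'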


- Heikki




Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid

2014-02-13 Thread Noah Misch
On Wed, Feb 12, 2014 at 06:06:40PM -0500, Bruce Momjian wrote:
 On Wed, Sep 11, 2013 at 02:10:45PM -0400, Robert Haas wrote:
  On Tue, Sep 10, 2013 at 11:33 PM, Noah Misch n...@leadboat.com wrote:
   I'm thinking to preserve postmaster.pid at immediate shutdown in all 
   released
   versions, but I'm less sure about back-patching a change to make
   PGSharedMemoryCreate() pickier.  On the one hand, allowing startup to 
   proceed
   with backends still active in the same data directory is a corruption 
   hazard.
   On the other hand, it could break weird shutdown/restart patterns that 
   permit
   trivial lifespan overlap between backends of different postmasters.  
   Opinions?
  
  I'm more sanguine about the second change than the first.  Leaving
  postmaster.pid around seems like a clear user-visible behavior change
  that could break user scripts or have other consequences that we don't
  foresee; thus, I would vote against back-patching it.  Indeed, I'm not
  sure it's a good idea to do that even in master.  On the other hand,
  tightening the checks in PGSharedMemoryCreate() seems very much worth
  doing, and I think it might also be safe enough to back-patch.
 
 Were these changes ever applied?  I don't see them.

No, I haven't gotten around to writing them.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-13 Thread Christoph Berg
Re: Heikki Linnakangas 2014-02-13 52fcd02c.3060...@vmware.com
 Is removing the "test ! -f" part and hence overwriting files in the
 archive safe, i.e. are the files the same?
 
 No. Not in general, anyway. If the old master keeps running, even
 for a moment, after the partial file was copied, it will have
 created more WAL records which are present in the old master's copy
 of the segment, but not in the standby's.

So why does PostgreSQL then ever try to archive something from the old
timeline? Isn't overwriting files on the old timeline just a very bad
idea in general? If so, it shouldn't be the responsibility of
archive_command to prevent this, but the server shouldn't be trying
that in the first place.

 You could perhaps use diff as part of archive_command, to check if
 archived version is identical to the one about to be archived.

Well, if that makes it into the default postgresql.conf (and docs)...

OTOH, it's not nice that this could still be an archive command
that fails, i.e. requires admin intervention before archiving can
proceed.

Kind regards,
Christoph Berg
-- 
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE


signature.asc
Description: Digital signature


Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-13 Thread Heikki Linnakangas

On 02/13/2014 04:07 PM, Christoph Berg wrote:

Re: Heikki Linnakangas 2014-02-13 52fcd02c.3060...@vmware.com

Is removing the "test ! -f" part and hence overwriting files in the
archive safe, i.e. are the files the same?


No. Not in general, anyway. If the old master keeps running, even
for a moment, after the partial file was copied, it will have
created more WAL records which are present in the old master's copy
of the segment, but not in the standby's.


So why does PostgreSQL then ever try to archive something from the old
timeline? Isn't overwriting files on the old timeline just a very bad
idea in general?


Yes.


If so, it shouldn't be the responsibility of
archive_command to prevent this, but the server shouldn't be trying
that in the first place.


Totally agreed. That's why I'm proposing that we stop doing that :-).

It's been a bad idea from day one, but I'm a bit hesitant to back-patch 
until someone actually complains. There have been subtle changes to 
archive recovery behavior in each release, so I'm worried that the old 
behavior was in fact useful to someone in some corner case that we're 
missing.


- Heikki




Re: [HACKERS] Terminating pg_basebackup background streamer

2014-02-13 Thread Peter Eisentraut
On 2/13/14, 6:25 AM, Magnus Hagander wrote:
 Are you sure you made that test after my fixup patch (the one suggested
 by Andres)? Because that one was at least supposed to add signal.h... 

works now





Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-13 Thread Florian Pflug
On Feb10, 2014, at 17:38 , Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-10 11:11:28 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 So what we need to do is to acquire a write barrier between the
 assignments to lwWaitLink and lwWaiting, i.e.
proc->lwWaitLink = NULL;
pg_write_barrier();
proc->lwWaiting = false;
 
 You didn't really explain why you think that ordering is necessary?
 Each proc being awoken will surely see both fields updated, and other
 procs won't be examining these fields at all, since we already delinked
 all these procs from the LWLock's queue.
 
 The problem is that one of the released backends could wake up concurrently
 because of an unrelated, or previous, PGSemaphoreUnlock(). It could see
 lwWaiting = false, and thus wake up and acquire the lock, even if the
 store for lwWaitLink hasn't arrived (or been performed; there's no guaranteed
 ordering here) yet.
 Now, it may well be that there's no practical consequence of that, but I
 am not prepared to bet on it.

AFAICS there is a potential problem if three backends are involved, since
by the time the waiting backend's lwWaitLink is set to NULL after the
original lock holder released the lock, the waiting backend might already
have acquired the lock (due to a spurious wakeup) *and* a third backend
might have already enqueued behind it.

The exact sequence for backends A,B,C that corrupts the wait queue is:

A: Release lock, set B's lwWaiting to false
B: Wakes up spuriously, takes the lock
C: Enqueues behind B
A: Sets B's lwWaitLink back to NULL, thereby truncating the queue and
   causing C and anyone behind it to block indefinitely.

I wonder whether LWLockRelease really needs to update lwWaitLink at all.
We take the backends we awake off the queue by updating the queue's head and
tail, so the contents of lwWaitLink should only matter once the backend is
re-inserted into some wait queue. But when doing that, we reset lwWaitLink
back to NULL anyway.
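
To make the barrier pairing under discussion concrete, here is an
illustrative sketch (not the actual lwlock.c code; the waiter-side loop
structure is assumed):

/* releaser: publish the link update before letting the waiter run */
proc->lwWaitLink = NULL;
pg_write_barrier();
proc->lwWaiting = false;

/* waiter, after returning from PGSemaphoreLock() */
if (!proc->lwWaiting)
{
    pg_read_barrier();   /* pairs with the releaser's write barrier */
    /* only now is it safe to inspect proc->lwWaitLink */
}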

best regards,
Florian Pflug








Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Andrew Dunstan


On 02/12/2014 04:04 PM, Heikki Linnakangas wrote:

On 02/12/2014 10:50 PM, Andres Freund wrote:

On February 12, 2014 9:33:38 PM CET, Tom Lane t...@sss.pgh.pa.us wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:

On investigation I found that a number of processes were locked waiting
for one wedged process to end its transaction, which never happened
(this transaction should normally take milliseconds). oprofile revealed
that postgres was spending 87% of its time in s_lock(), and strace on
the wedged process revealed that it was in a tight loop constantly
calling select(). It did not respond to a SIGTERM.



That's a deficiency of the gin fastupdate cache: a) it bases its size
on work_mem, which usually makes it *far* too big; b) it doesn't
perform the cleanup in one go if it can get a suitable lock, but does
independent locking for each entry. That usually leads to absolutely
horrific performance under concurrency.


I'm not sure that what Andrew is describing can fairly be called a
concurrent-performance problem.  It sounds closer to a stuck lock.
Are you sure you've diagnosed it correctly?


No. But I've several times seen similar backtraces where it wasn't 
actually stuck, just livelocked. I'm just on my mobile right now, but 
afair Andrew described a loop involving lots of semaphores and 
spinlocks; that shouldn't be the case if it were actually stuck.
If there are dozens of processes waiting on the same lock, cleaning up a 
large amount of items one by one, it's not surprising if it's 
dramatically slow.


Perhaps we should use a lock to enforce that only one process tries to 
clean up the pending list at a time.




Is that going to serialize all these inserts?

cheers

andrew





Re: [HACKERS] old warning in docs

2014-02-13 Thread Bruce Momjian
On Mon, Oct  7, 2013 at 08:43:27AM -0700, David Fetter wrote:
 On Mon, Oct 07, 2013 at 07:51:44AM -0400, Andrew Dunstan wrote:
  
  Given that we have not supported releases older than 8.3 for quite a
  while, do we need to keep this in extend.sgml any longer?
  
  <caution>
  <para>
   Changing <varname>PG_CONFIG</varname> only works when building
   against <productname>PostgreSQL</productname> 8.3 or later.
   With older releases it does not work to set it to anything except
   <literal>pg_config</>; you must alter your <varname>PATH</>
   to select the installation to build against.
  </para>
 </caution>
 
 I say bin it.  That reminds me.  There are probably a lot of places
 in the docs that refer to versions of PostgreSQL a good bit older than
 8.3.  Will grep and patch as I get the time.

I have created the attached patch which removes many of the pre-8.0
references, and trims some of the 8.1-8.3 references.  There are
probably some of these that should be kept, but it is easier to show you
all the possibilities and we can trim down the removal list based on
feedback.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
new file mode 100644
index 30fd9bb..a5b74e6
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*** NUMERIC
*** 744,754 
  
  <note>
   <para>
!   Prior to <productname>PostgreSQL</productname> 7.4, the precision in
!   <type>float(<replaceable>p</replaceable>)</type> was taken to mean
!   so many <emphasis>decimal</> digits.  This has been corrected to match the SQL
!   standard, which specifies that the precision is measured in binary
!   digits.  The assumption that <type>real</type> and
    <type>double precision</type> have exactly 24 and 53 bits in the
    mantissa respectively is correct for IEEE-standard floating point
    implementations.  On non-IEEE platforms it might be off a little, but
--- 744,750 
  
  <note>
   <para>
!   The assumption that <type>real</type> and
    <type>double precision</type> have exactly 24 and 53 bits in the
    mantissa respectively is correct for IEEE-standard floating point
    implementations.  On non-IEEE platforms it might be off a little, but
*** ALTER SEQUENCE <replaceable class="param
*** 844,859 
    </para>
  </note>
  
- <note>
-  <para>
-   Prior to <productname>PostgreSQL</productname> 7.3, <type>serial</type>
-   implied <literal>UNIQUE</literal>.  This is no longer automatic.  If
-   you wish a serial column to have a unique constraint or be a
-   primary key, it must now be specified, just like
-   any other data type.
-  </para>
- </note>
- 
  <para>
   To insert the next value of the sequence into the <type>serial</type>
   column, specify that the <type>serial</type>
--- 840,845 
*** SELECT E'\\xDEADBEEF';
*** 1602,1609 
   The SQL standard requires that writing just <type>timestamp</type>
   be equivalent to <type>timestamp without time
   zone</type>, and <productname>PostgreSQL</productname> honors that
!  behavior.  (Releases prior to 7.3 treated it as <type>timestamp
!  with time zone</type>.)  <type>timestamptz</type> is accepted as an
   abbreviation for <type>timestamp with time zone</type>; this is a
   <productname>PostgreSQL</productname> extension.
  </para>
--- 1588,1594 
   The SQL standard requires that writing just <type>timestamp</type>
   be equivalent to <type>timestamp without time
   zone</type>, and <productname>PostgreSQL</productname> honors that
!  behavior.  <type>timestamptz</type> is accepted as an
   abbreviation for <type>timestamp with time zone</type>; this is a
   <productname>PostgreSQL</productname> extension.
  </para>
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
new file mode 100644
index bae2e97..8ace8bd
*** a/doc/src/sgml/ddl.sgml
--- b/doc/src/sgml/ddl.sgml
*** CREATE TABLE circles (
*** 1106,1114 
  within a single transaction.  In practice this limit is not a
  problem &mdash; note that the limit is on the number of
  <acronym>SQL</acronym> commands, not the number of rows processed.
! Also, as of <productname>PostgreSQL</productname> 8.3, only commands
! that actually modify the database contents will consume a command
! identifier.
  </para>
    </sect1>
  
--- 1106,1113 
  within a single transaction.  In practice this limit is not a
  problem &mdash; note that the limit is on the number of
  <acronym>SQL</acronym> commands, not the number of rows processed.
! Also, only commands that actually modify the database contents will
! consume a command identifier.
  </para>
    </sect1>
  
*** REVOKE CREATE ON SCHEMA public FROM PUBL
*** 1873,1883 
  </para>
  
  <para>
! In 

Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Heikki Linnakangas

On 02/13/2014 05:40 PM, Andrew Dunstan wrote:


On 02/12/2014 04:04 PM, Heikki Linnakangas wrote:

On 02/12/2014 10:50 PM, Andres Freund wrote:

On February 12, 2014 9:33:38 PM CET, Tom Lane t...@sss.pgh.pa.us wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:

On investigation I found that a number of processes were locked waiting
for one wedged process to end its transaction, which never happened
(this transaction should normally take milliseconds). oprofile revealed
that postgres was spending 87% of its time in s_lock(), and strace on
the wedged process revealed that it was in a tight loop constantly
calling select(). It did not respond to a SIGTERM.



That's a deficiency of the gin fastupdate cache: a) it bases its size
on work_mem, which usually makes it *far* too big; b) it doesn't
perform the cleanup in one go if it can get a suitable lock, but does
independent locking for each entry. That usually leads to absolutely
horrific performance under concurrency.


I'm not sure that what Andrew is describing can fairly be called a
concurrent-performance problem.  It sounds closer to a stuck lock.
Are you sure you've diagnosed it correctly?


No. But I've several times seen similar backtraces where it wasn't
actually stuck, just livelocked. I'm just on my mobile right now, but
afair Andrew described a loop involving lots of semaphores and
spinlocks; that shouldn't be the case if it were actually stuck.
If there are dozens of processes waiting on the same lock, cleaning up a
large amount of items one by one, it's not surprising if it's
dramatically slow.


Perhaps we should use a lock to enforce that only one process tries to
clean up the pending list at a time.


Is that going to serialize all these inserts?


It will serialize the cleanup process, which moves entries from the 
pending list to the tree proper. But that's better than the current 
situation. Currently, when two processes attempt it, they will both try 
to insert into the GIN tree, but one of them will notice that the other 
one already did the cleanup, and bail out. So only one process 
contributes to progress, while the others just waste their effort.


The processes should try to get the lock, and just give up if it's 
already held rather than wait. If someone else is already doing the 
cleanup, there's no need for the current process to do it.


- Heikki




Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-13 Thread Heikki Linnakangas

On 02/13/2014 02:42 PM, Heikki Linnakangas wrote:

The behavior where we prefer a segment from archive with lower TLI over
a file with higher TLI in pg_xlog actually changed in commit
a068c391ab0. Arguably changing it wasn't a good idea, but the problem
your test script demonstrates can be fixed by not archiving the partial
segment, with no change to the preference of archive/pg_xlog. As
discussed, archiving a partial segment seems like a bad idea anyway, so
let's just stop doing that.


After some further thought, while not archiving the partial segment 
fixes your test script, it's not enough to fix all variants of the 
problem. Even if archive recovery doesn't archive the last, partial, 
segment, if the original master server is still running, it's entirely 
possible that it fills the segment and archives it. In that case, 
archive recovery will again prefer the archived segment with lower TLI 
over the segment with newer TLI in pg_xlog.


So I agree we should commit the patch you posted (or something to that 
effect). The change to not archive the last segment still seems like a 
good idea, but perhaps we should only do that in master.


Even after that patch, you can have a problem in more complicated 
scenarios involving both an archive and streaming replication. For 
example, imagine a timeline history like this:


TLI

1 ----+------------
      |
2     +------------


Now imagine that timeline 1 has been fully archived, and there are WAL 
segments much higher than the points where the timeline switch occurred 
present in the archive. But none of the WAL segments for timeline 2 have 
been archived; they are only present in a master server. You want to 
perform recovery to timeline 2, using the archived WAL segments for 
timeline 1, and streaming replication to catch up to the tip of timeline 2.


Whether we prefer files from pg_xlog or archive will make no difference 
in this case, as there are no files in pg_xlog. But it will merrily 
apply all the WAL for timeline 1 from the archive that it can find, past 
the timeline switch point. After that, when it tries to connect to the 
server with streaming replication, it will fail.


There's not much we can do about that in 9.2 and below, but in 9.3 the 
timeline history file contains the exact timeline switch points, so we 
could be more careful and not apply any extra WAL on the old timeline 
past the switch point. We could also be more exact in which files we try 
to restore from the archive, instead of just polling every future TLI in 
the history.
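
For concreteness, the scenario corresponds roughly to a 9.3-era
recovery.conf like the following (paths and connection string are
hypothetical):

restore_command = 'cp /mnt/archive/%f %p'   # archive holds all of timeline 1
standby_mode = 'on'
primary_conninfo = 'host=new-master'        # only source of timeline 2 WAL
recovery_target_timeline = '2'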


- Heikki




[HACKERS] nextVictimBuffer in README

2014-02-13 Thread Vik Fearing
While reading through src/backend/storage/buffer/README and looking at
the code that it describes, I noticed that the case is wrong for
nextVictimBuffer.

It's no big deal really, but the attached trivial patch makes the README
match the code.

-- 
Vik

*** a/src/backend/storage/buffer/README
--- b/src/backend/storage/buffer/README
***
*** 176,183 
  small limit value) whenever the buffer is pinned.  (This requires only the
  buffer header spinlock, which would have to be taken anyway to increment the
  buffer reference count, so it's nearly free.)
  
! The clock hand is a buffer index, NextVictimBuffer, that moves circularly
! through all the available buffers.  NextVictimBuffer is protected by the
  BufFreelistLock.
  
  The algorithm for a process that needs to obtain a victim buffer is:
--- 176,183 
  buffer header spinlock, which would have to be taken anyway to increment the
  buffer reference count, so it's nearly free.)
  
! The clock hand is a buffer index, nextVictimBuffer, that moves circularly
! through all the available buffers.  nextVictimBuffer is protected by the
  BufFreelistLock.
  
  The algorithm for a process that needs to obtain a victim buffer is:
***
*** 189,196 
  is pinned or has a nonzero usage count, it cannot be used; ignore it and
  return to the start of step 2.  Otherwise, pin the buffer, release
  BufFreelistLock, and return the buffer.
  
! 3. Otherwise, select the buffer pointed to by NextVictimBuffer, and
! circularly advance NextVictimBuffer for next time.
  
  4. If the selected buffer is pinned or has a nonzero usage count, it cannot
  be used.  Decrement its usage count (if nonzero) and return to step 3 to
--- 189,196 
  return to the start of step 2.  Otherwise, pin the buffer, release
  BufFreelistLock, and return the buffer.
  
! 3. Otherwise, select the buffer pointed to by nextVictimBuffer, and
! circularly advance nextVictimBuffer for next time.
  
  4. If the selected buffer is pinned or has a nonzero usage count, it cannot
  be used.  Decrement its usage count (if nonzero) and return to step 3 to
***
*** 254,264 
  Background Writer's Processing
  The background writer is designed to write out pages that are likely to be
  recycled soon, thereby offloading the writing work from active backends.
  To do this, it scans forward circularly from the current position of
! NextVictimBuffer (which it does not change!), looking for buffers that are
  dirty and not pinned nor marked with a positive usage count.  It pins,
  writes, and releases any such buffer.
  
! If we can assume that reading NextVictimBuffer is an atomic action, then
  the writer doesn't even need to take the BufFreelistLock in order to look
  for buffers to write; it needs only to spinlock each buffer header for long
  enough to check the dirtybit.  Even without that assumption, the writer
--- 254,264 
  The background writer is designed to write out pages that are likely to be
  recycled soon, thereby offloading the writing work from active backends.
  To do this, it scans forward circularly from the current position of
! nextVictimBuffer (which it does not change!), looking for buffers that are
  dirty and not pinned nor marked with a positive usage count.  It pins,
  writes, and releases any such buffer.
  
! If we can assume that reading nextVictimBuffer is an atomic action, then
  the writer doesn't even need to take the BufFreelistLock in order to look
  for buffers to write; it needs only to spinlock each buffer header for long
  enough to check the dirtybit.  Even without that assumption, the writer
***
*** 268,274 
  the contention cost of the writer compared to PG 8.0.)
  
  During a checkpoint, the writer's strategy must be to write every dirty
  buffer (pinned or not!).  We may as well make it start this scan from
! NextVictimBuffer, however, so that the first-to-be-written pages are the
  ones that backends might otherwise have to write for themselves soon.
  
  The background writer takes shared content lock on a buffer while writing it
--- 268,274 
  
  During a checkpoint, the writer's strategy must be to write every dirty
  buffer (pinned or not!).  We may as well make it start this scan from
! nextVictimBuffer, however, so that the first-to-be-written pages are the
  ones that backends might otherwise have to write for themselves soon.
  
  The background writer takes shared content lock on a buffer while writing it
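
For orientation, the victim-selection loop the README describes (steps 3
and 4) looks roughly like this in C; the struct fields and function name
are illustrative, not the actual bufmgr.c code, and the BufFreelistLock
locking described above is omitted:

typedef struct SketchBufferDesc
{
    int buf_id;
    int refcount;       /* pin count */
    int usage_count;
} SketchBufferDesc;

static int
pick_victim(SketchBufferDesc *buffers, int nbuffers, int *nextVictimBuffer)
{
    for (;;)
    {
        SketchBufferDesc *buf = &buffers[*nextVictimBuffer];

        /* circularly advance the clock hand for next time */
        *nextVictimBuffer = (*nextVictimBuffer + 1) % nbuffers;

        if (buf->refcount == 0)
        {
            if (buf->usage_count == 0)
                return buf->buf_id;  /* unpinned, usage count 0: victim */
            buf->usage_count--;      /* give the buffer another lap */
        }
    }
}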



Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Andrew Dunstan


On 02/13/2014 10:58 AM, Heikki Linnakangas wrote:


Perhaps we should use a lock to enforce that only one process tries to
clean up the pending list at a time.


Is that going to serialize all these inserts?


It will serialize the cleanup process, which moves entries from the 
pending list to the tree proper. But that's better than the current 
situation. Currently, when two processes attempt it, they will both 
try to insert into the GIN tree, but one of them will notice that the 
other one already did the cleanup, and bail out. So only one process 
contributes to progress, while the others just waste their effort.


The processes should try to get the lock, and just give up if it's 
already held rather than wait. If someone else is already doing the 
cleanup, there's no need for the current process to do it.





Sounds good.

cheers

andrew





Re: [HACKERS] truncating pg_multixact/members

2014-02-13 Thread Andres Freund
On 2014-02-12 17:40:44 -0300, Alvaro Herrera wrote:
   Also, AutoVacOpts (used as part of reloptions) gained three extra
   fields.  Since this is in the middle of StdRdOptions, it'd be somewhat
   more involved to put these at the end of that struct.  This might be a
   problem if somebody has a module calling RelationIsSecurityView().  If
   anyone thinks we should be concerned about such an ABI change, please
   shout quickly.
  
  That sounds problematic --- surely StdRdOptions might be something
  extensions are making use of?
 
 So can we assume that security_barrier is the only thing to be concerned
 about?  If so, the attached patch should work around the issue by
 placing it in the same physical location.

Aw. How about instead temporarily introducing AutoVacMXactOpts or
something? Changing the name of the member variable sounds just as
likely to break things.

 I guess if there are modules
 that add extra stuff beyond StdRdOptions, this wouldn't work, but I'm
 not really sure how likely this is given that our reloptions design
 hasn't proven to be the most extensible thing in the world.

Hm, I don't see how it'd be problematic, even if they do. I don't really
understand the design of the reloptions code, but afaics, they shouldn't
do so by casting around rd_options but by parsing it anew, right?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




[HACKERS] Same double precision operations, different results

2014-02-13 Thread Daniel Vázquez
Can someone please tell me why this behavior occurs? Thx!!

I know it's calculating over the coordinates of the same points, but the results
of the operations (all double precision) differ, even when forcing a cast.

(*) Here are some double precision calculations and the values they produce

# select
#    cos(radians(39.9826557)) as f1,
#    cos(radians(39.9826557)) as f2,
#    cos(radians(-0.0477312004383) - radians(-0.0477312004383)) as f3,
#    sin(radians(39.9826557)) as f4,
#    sin(radians(39.9826557)) as f5;
        f1         |        f2         | f3 |        f4         |        f5
-------------------+-------------------+----+-------------------+-------------------
 0.766238989559398 | 0.766238989559398 |  1 | 0.642555686986733 | 0.642555686986733
(1 row)



(*) Here the values obtained above are used in a simple operation

# select (0.766238989559398 * 0.766238989559398 * 1 + 0.642555686986733 *
0.642555686986733) calc;
               calc
----------------------------------
 0.999999999999999633651488135693
(1 row)



(*) Here the above queries are put together --- Result = 1? No! It
should be 0.999999999999999633651488135693

# select (v.f1 *  v.f2 * v.f3 + v.f4 * v.f5) calc from ( select
   cos(radians(39.9826557)) as f1,
   cos(radians(39.9826557)) as f2,
   cos(radians(-0.0477312004383) - radians(-0.0477312004383)) as f3,
   sin(radians(39.9826557)) as f4,
   sin(radians(39.9826557)) as f5 ) as v;
 calc
--
1
(1 row)



(*) Yep, this produces problems with acos. Here is acos working directly with
the above values. OK, this is like acos(0.999999999999999633651488135693)

# select acos(0.766238989559398 * 0.766238989559398 * 1 + 0.642555686986733
* 0.642555686986733);
 acos
--
 2.58095682795179e-08
(1 row)



(*) Yep, you can see same result

# select acos(0.999999999999999633651488135693);
 acos
--
 2.58095682795179e-08
(1 row)



(*) OK, acos accepts values from -1 to 1; you can see acos(1) here

# select acos(1);
 acos
--
0
(1 row)



(*) Why does this calculation produce 1 and not
0.999999999999999633651488135693?

# select (
   cos(radians(39.9826557))
   * cos(radians(39.9826557))
   * cos(radians(-0.0477312004383) - radians(-0.0477312004383))
   + sin(radians(39.9826557))
   * sin(radians(39.9826557))
 ) calc;
 calc
--
1
(1 row)



(*) Anyway, the result is 1, so if acos(1) returns a result, this must return
the same result  NOPE!! input is out of range

# select acos(
#cos(radians(39.9826557))
#* cos(radians(39.9826557))
#* cos(radians(-0.0477312004383) - radians(-0.0477312004383))
#+ sin(radians(39.9826557))
#* sin(radians(39.9826557))
#  );
ERROR:  input is out of range



(*) You are thinking: it's some cast problem ... NOPE!! input is out of
range

# select acos( cast (
   cos(radians(39.9826557))
   * cos(radians(39.9826557))
   * cos(radians(-0.0477312004383) - radians(-0.0477312004383))
   + sin(radians(39.9826557))
   * sin(radians(39.9826557)) as double precision )
 );
ERROR:  input is out of range



Really, I can't figure out the reason!!! Any ideas? A workaround? Thx


Re: [HACKERS] Same double precision operations, different results

2014-02-13 Thread Tom Lane
Daniel Vázquez daniel2d2...@gmail.com writes:
 Please can someone tell me why this behavior? Thx!!

You're confusing numeric and double precision; in particular this
calculation is *not* being done in float8, but numeric:

 # select (0.766238989559398 * 0.766238989559398 * 1 + 0.642555686986733 *
 0.642555686986733) calc;

 (*) Why this calculation produces 1 and not 0.999633651488135693

Actually, it's not producing 1, but a smidgen more:

regression=# set extra_float_digits TO 3;
SET
regression=# select ( cast (
   cos(radians(39.9826557))
   * cos(radians(39.9826557))
   * cos(radians(-0.0477312004383) - radians(-0.0477312004383))
   + sin(radians(39.9826557))
   * sin(radians(39.9826557)) as double precision )
 );
       float8
---------------------
 1.00000000000000022
(1 row)

You've got roundoff error either way, but this way happens to be in the
direction that makes acos() complain.
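
A common workaround (an illustrative sketch, not something proposed in
this thread) is to clamp the expression into acos()'s domain before
calling it, so roundoff slightly above 1.0 cannot raise an error:

select acos(least(1.0::float8, greatest(-1.0::float8,
         cos(radians(39.9826557))
       * cos(radians(39.9826557))
       * cos(radians(-0.0477312004383) - radians(-0.0477312004383))
       + sin(radians(39.9826557))
       * sin(radians(39.9826557)))));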

regards, tom lane




Re: [HACKERS] truncating pg_multixact/members

2014-02-13 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-02-12 17:40:44 -0300, Alvaro Herrera wrote:
Also, AutoVacOpts (used as part of reloptions) gained three extra
fields.  Since this is in the middle of StdRdOptions, it'd be somewhat
more involved to put these at the end of that struct.  This might be a
problem if somebody has a module calling RelationIsSecurityView().  If
anyone thinks we should be concerned about such an ABI change, please
shout quickly.
   
   That sounds problematic --- surely StdRdOptions might be something
   extensions are making use of?
  
  So can we assume that security_barrier is the only thing to be concerned
  about?  If so, the attached patch should work around the issue by
  placing it in the same physical location.
 
 Aw. How instead about temporarily introducing AutoVacMXactOpts or
 something? Changing the name of the member variable sounds just as
 likely to break things.

Yes, that's what I did --- see the attached patch, which I would apply
on top of the code for master and would be only in 9.3.  The idea here
is to keep the existing bits of StdRdOptions identical, so that macros such
as RelationIsSecurityView() that were compiled with the old rel.h
continue to work unchanged and without requiring a recompile.

  I guess if there are modules
  that add extra stuff beyond StdRdOptions, this wouldn't work, but I'm
  not really sure how likely this is given that our reloptions design
  hasn't proven to be the most extensible thing in the world.
 
 Hm, I don't see how it'd be problematic, even if they do. I don't really
 understand the design of the reloptions code, but afaics, they shouldn't
 do so by casting around rd_options but by parsing it anew, right?

Now that I think about it, I don't think adding stuff at the end of
StdRdOptions has anything to do with adding nonstandard options.  So if
we extend that struct we're not breaking any ABI contract.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] truncating pg_multixact/members

2014-02-13 Thread Alvaro Herrera
Alvaro Herrera wrote:

 Yes, that's what I did --- see the attached patch, which I would apply
 on top of the code for master and would be only in 9.3.

(Of course, these changes affect other parts of the code, in particular
autovacuum.c and reloptions.c.  But that's not important here).

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services
commit 689869c791aa7d13f408430ef438291bddb84b2f
Author: Alvaro Herrera alvhe...@alvh.no-ip.org
Date:   Thu Feb 13 12:31:37 2014 -0300

9.3 tweaks to avoid ABI break

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 534a9ea..6daa14a 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1170,11 +1170,11 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		{autovacuum_freeze_table_age, RELOPT_TYPE_INT,
 		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, freeze_table_age)},
 		{autovacuum_multixact_freeze_min_age, RELOPT_TYPE_INT,
-		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, multixact_freeze_min_age)},
+		offsetof(StdRdOptions, autovacuum2) +offsetof(AutoVacOpts2, multixact_freeze_min_age)},
 		{autovacuum_multixact_freeze_max_age, RELOPT_TYPE_INT,
-		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, multixact_freeze_max_age)},
+		offsetof(StdRdOptions, autovacuum2) +offsetof(AutoVacOpts2, multixact_freeze_max_age)},
 		{autovacuum_multixact_freeze_table_age, RELOPT_TYPE_INT,
-		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, multixact_freeze_table_age)},
+		offsetof(StdRdOptions, autovacuum2) +offsetof(AutoVacOpts2, multixact_freeze_table_age)},
 		{autovacuum_vacuum_scale_factor, RELOPT_TYPE_REAL,
 		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, vacuum_scale_factor)},
 		{autovacuum_analyze_scale_factor, RELOPT_TYPE_REAL,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d749f70..8ef59e6 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2430,10 +2430,11 @@ typedef struct VacuumStmt
 	int			options;		/* OR of VacuumOption flags */
 	int			freeze_min_age; /* min freeze age, or -1 to use default */
 	int			freeze_table_age;		/* age at which to scan whole table */
-	int			multixact_freeze_min_age; /* min multixact freeze age, or -1 to use default */
-	int			multixact_freeze_table_age; /* multixact age at which to scan whole table */
 	RangeVar   *relation;		/* single table to process, or NULL */
 	List	   *va_cols;		/* list of column names, or NIL for all */
+	/* place these at the end, to avoid ABI break within 9.3 branch */
+	int			multixact_freeze_min_age; /* min multixact freeze age, or -1 to use default */
+	int			multixact_freeze_table_age; /* multixact age at which to scan whole table */
 } VacuumStmt;
 
 /* --
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 47ae106..53d5690 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -187,7 +187,10 @@ typedef struct RelationData
  * be applied to relations that use this format or a superset for
  * private options data.
  */
- /* autovacuum-related reloptions. */
+ /* autovacuum-related reloptions.
+  *
+  * Split in two to avoid ABI break.
+  */
 typedef struct AutoVacOpts
 {
 	bool		enabled;
@@ -198,19 +201,24 @@ typedef struct AutoVacOpts
 	int			freeze_min_age;
 	int			freeze_max_age;
 	int			freeze_table_age;
-	int			multixact_freeze_min_age;
-	int			multixact_freeze_max_age;
-	int			multixact_freeze_table_age;
 	float8		vacuum_scale_factor;
 	float8		analyze_scale_factor;
 } AutoVacOpts;
 
+typedef struct AutoVacOpts2
+{
+	  int		multixact_freeze_min_age;
+	  int		multixact_freeze_max_age;
+	  int		multixact_freeze_table_age;
+} AutoVacOpts2;
+
 typedef struct StdRdOptions
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int			fillfactor;		/* page fill factor in percent (0..100) */
 	AutoVacOpts autovacuum;		/* autovacuum-related options */
 	bool		security_barrier;		/* for views */
+	AutoVacOpts2 autovacuum2;	/* rest of autovacuum options */
 } StdRdOptions;
 
 #define HEAP_MIN_FILLFACTOR			10



Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-13 Thread Heikki Linnakangas

On 02/13/2014 06:47 PM, Heikki Linnakangas wrote:

On 02/13/2014 02:42 PM, Heikki Linnakangas wrote:

The behavior where we prefer a segment from archive with lower TLI over
a file with higher TLI in pg_xlog actually changed in commit
a068c391ab0. Arguably changing it wasn't a good idea, but the problem
your test script demonstrates can be fixed by not archiving the partial
segment, with no change to the preference of archive/pg_xlog. As
discussed, archiving a partial segment seems like a bad idea anyway, so
let's just stop doing that.


After some further thought, while not archiving the partial segment
fixes your test script, it's not enough to fix all variants of the
problem. Even if archive recovery doesn't archive the last, partial,
segment, if the original master server is still running, it's entirely
possible that it fills the segment and archives it. In that case,
archive recovery will again prefer the archived segment with lower TLI
over the segment with newer TLI in pg_xlog.

So I agree we should commit the patch you posted (or something to that
effect). The change to not archive the last segment still seems like a
good idea, but perhaps we should only do that in master.


To draw this to a conclusion, barring any further insights, I'm 
going to commit the attached patch to master and REL9_3_STABLE. Please 
have a look at the patch, to see if I'm missing something. I modified 
the state machine to skip over the XLOG_FROM_XLOG state if reading in 
XLOG_FROM_ARCHIVE failed; otherwise you would first scan the archive and 
pg_xlog together, and then pg_xlog alone, which is pointless.


In master, I'm also going to remove the "archive the last segment on the 
old timeline" code.


- Heikki
From 15b0cadb0006b97f87257f4f903db8d33411ada9 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Thu, 13 Feb 2014 19:16:00 +0200
Subject: [PATCH 1/1] Change the order that pg_xlog and WAL archive are polled
 for WAL segments.

If there is a WAL segment with same ID but different TLI present in both
the WAL archive and pg_xlog, prefer the one with higher TLI. Before this
patch, the archive was polled first, for all expected TLIs, and only if no
file was found was pg_xlog scanned. This was a change in behavior from 9.2,
which first scanned archive and pg_xlog for the highest TLI, then archive
and pg_xlog for the next highest TLI and so forth. This patch reverts the
behavior back to what it was in 9.2.

The reason for this is that if for example you try to do archive recovery
to timeline 2, which branched off timeline 1, but the WAL for timeline 2 is
not archived yet, we would replay past the timeline switch point on
timeline 1 using the archived files, before even looking at timeline 2's files
in pg_xlog.

Report and patch by Kyotaro Horiguchi. Backpatch to 9.3 where the behavior
was changed.
---
 src/backend/access/transam/xlog.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6a4a504..b2d8e1a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9576,17 +9576,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	/*---
 	 * Standby mode is implemented by a state machine:
 	 *
-	 * 1. Read from archive (XLOG_FROM_ARCHIVE)
-	 * 2. Read from pg_xlog (XLOG_FROM_PG_XLOG)
-	 * 3. Check trigger file
-	 * 4. Read from primary server via walreceiver (XLOG_FROM_STREAM)
-	 * 5. Rescan timelines
-	 * 6. Sleep 5 seconds, and loop back to 1.
+	 * 1. Read from either archive or pg_xlog (XLOG_FROM_ARCHIVE), or just
+	 *pg_xlog (XLOG_FROM_XLOG)
+	 * 2. Check trigger file
+	 * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
+	 * 4. Rescan timelines
+	 * 5. Sleep 5 seconds, and loop back to 1.
 	 *
 	 * Failure to read from the current source advances the state machine to
-	 * the next state. In addition, successfully reading a file from pg_xlog
-	 * moves the state machine from state 2 back to state 1 (we always prefer
-	 * files in the archive over files in pg_xlog).
+	 * the next state.
 	 *
 	 * 'currentSource' indicates the current state. There are no currentSource
 	 * values for check trigger, rescan timelines, and sleep states,
@@ -9614,9 +9612,6 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 			switch (currentSource)
 			{
 case XLOG_FROM_ARCHIVE:
-	currentSource = XLOG_FROM_PG_XLOG;
-	break;
-
 case XLOG_FROM_PG_XLOG:
 
 	/*
@@ -9781,7 +9776,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
  * Try to restore the file from archive, or read an existing
  * file from pg_xlog.
  */
-readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2, currentSource);
+readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
+			  currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY : currentSource);
if (readFile >= 0)
 	return true;	/* success! */
 
-- 

Re: [HACKERS] truncating pg_multixact/members

2014-02-13 Thread Andres Freund
On 2014-02-13 14:40:39 -0300, Alvaro Herrera wrote:
 Andres Freund wrote:
  On 2014-02-12 17:40:44 -0300, Alvaro Herrera wrote:
 Also, AutoVacOpts (used as part of reloptions) gained three extra
 fields.  Since this is in the middle of StdRdOptions, it'd be somewhat
 more involved to put these at the end of that struct.  This might be a
 problem if somebody has a module calling RelationIsSecurityView().  If
 anyone thinks we should be concerned about such an ABI change, please
 shout quickly.

That sounds problematic --- surely StdRdOptions might be something
extensions are making use of?
   
   So can we assume that security_barrier is the only thing to be concerned
   about?  If so, the attached patch should work around the issue by
   placing it in the same physical location.
  
  Aw. How instead about temporarily introducing AutoVacMXactOpts or
  something? Changing the name of the member variable sounds just as
  likely to break things.
 
 Yes, that's what I did --- see the attached patch, which I would apply
 on top of the code for master and would be only in 9.3.  The idea here
 is to keep the existing bits of StdRdOpts identical, so that macros such
 as RelationIsSecurityView() that were compiled with the old rel.h
 continue to work unchanged and without requiring a recompile.

What I mean is that earlier code using StdRdOptions->security_barrier
directly now won't compile anymore. So you've changed an ABI breakage
into an API break. That's why I suggest adding the new options in a
separate struct at the end of StdRdOptions; that won't break anything.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] how set GUC_check_errhint_string in call_string_check_hook()

2014-02-13 Thread Tom Lane
Amit Langote amitlangot...@gmail.com writes:
 On Thu, Feb 13, 2014 at 5:59 PM, amul sul sul_a...@yahoo.co.in wrote:
 2. Can I pass hint message in above ereport(), how?

 It could be done by setting the check hook related global variable
 GUC_check_errhint_string inside your hook function.

You should use the macro GUC_check_errhint() to set up a hint inside
a GUC variable's check function.  See check_XactIsoLevel() in
src/backend/commands/variable.c for examples.
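
For instance, a minimal illustrative check hook for a hypothetical
integer GUC ("my_extension.limit"), following that pattern:

#include "utils/guc.h"

static bool
check_my_limit(int *newval, void **extra, GucSource source)
{
    if (*newval > 100)
    {
        GUC_check_errdetail("my_extension.limit must be at most 100.");
        GUC_check_errhint("Lower the value, or leave it unset to use the default.");
        return false;
    }
    return true;
}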

regards, tom lane




Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 I think what you're arguing is that we should see WAL records filling the
 rest of segment 1 before we see any references to segment 2, but if that's
 the case then how did we get into the situation you reported?  Or is it
 just that it was a broken base backup to start with?

 The scenario I could come up with that didn't require a broken base backup
 was that there was an earlier truncate or vacuum. So the sequence is high
 offset reference, truncate, growth, crash. All possibly on a single
 database.

That's not really an issue, because then it would be OK to discard the
high-offset update; we'd recognize that as safe when we replay the
truncation.

 It's possible we're better off not assuming we've thought of all possible
 ways this can happen though.

That's what's bothering me, too.  On the other hand, if we can't think of
a scenario where it'd be necessary to replay the high-offset update, then
I'm disinclined to mess with the code further.

regards, tom lane




Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Greg Stark
On Thu, Feb 13, 2014 at 7:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The scenario I could come up with that didn't require a broken base backup
 was that there was an earlier truncate or vacuum. So the sequence is high
 offset reference, truncate, growth, crash. All possibly on a single
 database.

 That's not really an issue, because then it would be OK to discard the
 high-offset update; we'd recognize that as safe when we replay the
 truncation.

Yeah, that's my point.

 It's possible we're better off not assuming we've thought of all possible
 ways this can happen though.

 That's what's bothering me, too.  On the other hand, if we can't think of
 a scenario where it'd be necessary to replay the high-offset update, then
 I'm disinclined to mess with the code further.

And the whole point of the undefined page error checking is to detect
cases like this, so covering them up in the name of possible edge
cases we haven't thought of kind of defeats the purpose. In particular
I would have liked to get errors rather than soldier on when the
database found these missing segments.

In that vein, the other possibly open question was how we got past the
undefined pages errors that we did see. Andres said he thought that
was due to the bug where some piece of code was mistakenly relying on the
presence of a snapshot, but I'm not clear on how that could cause this.


-- 
greg




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-13 Thread Tom Lane
David Beck db...@starschema.net writes:
 I have table-like data structures in the source system for the FDW I work on.
 These tables are sometimes too big and the source system is able to filter 
 and join them with limitations, thus it is not optimal to transfer the data 
 to Postgres.
 At the same time I want the users to think in terms of the original tables.

 The idea is to rewrite the SQL queries like this:

   “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
 b.col2=987”

 to:

   “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”

TBH this sounds like a spectacularly bad idea, especially in the place and
way you propose to do it.  You can't even do catalog access safely where
you've put that hook, not to mention that there are many other places
where queries can be submitted.  But more generally, an FDW should not
operate in the way you're describing.

We do lack support for pushing joins to the foreign server, and that needs
to be addressed; but we need to do it in the planner, not by kluging the
query somewhere upstream of that.

regards, tom lane




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2014-02-13 Thread Bruce Momjian
On Fri, Oct 11, 2013 at 03:39:51PM -0700, Kevin Grittner wrote:
 Josh Berkus j...@agliodbs.com wrote:
  On 10/11/2013 01:11 PM, Bruce Momjian wrote:
  In summary, I think we need to:
 
  *  decide on new defaults for work_mem and maintenance_work_mem
  *  add an initdb flag to allow users/packagers to set shared_bufffers?
  *  add an autovacuum_work_mem setting?
  *  change the default for temp_buffers?
 
  If we're changing defaults, bgwriter_lru_maxpages and vacuum_cost_limit
  could also use a bump; those thresholds were set for servers with < 1GB
  of RAM.
 
 +1 on those.
 
 Also, I have often had to bump cpu_tuple_cost into the 0.03 to 0.05
 range to get a good plan.  In general, this makes the exact
 settings of *_page_cost less fussy, and I have hit situations where
 I was completely unable to get a good plan to emerge without
 bumping cpu_tuple_cost relative to the other cpu costs.  I know that
 it's possible to engineer a workload that shows any particular cost
 adjustment to make things worse, but in real-life production
 environments I have never seen an increase in this range make plan
 choice worse.

So, would anyone like me to create patches for any of these items before
we hit 9.4 beta?  We have added autovacuum_work_mem, and increasing
work_mem and maintenance_work_mem by 4x is a simple operation.  Not sure
about the others.  Or do we just keep this all for 9.5?
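
As postgresql.conf settings, the 4x bump under discussion would look
like this (illustrative; these were proposals at the time, not committed
defaults):

work_mem = 4MB                  # 4x the old 1MB default
maintenance_work_mem = 64MB     # 4x the old 16MB default
cpu_tuple_cost = 0.03           # low end of Kevin's suggested 0.03-0.05 range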

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] truncating pg_multixact/members

2014-02-13 Thread Alvaro Herrera
Alvaro Herrera wrote:

 So here are two patches -- the first one, for 9.3 and HEAD, introduce
 the new aging variables and use them throughout vacuum and autovacuum,
 including per-table options; the second one adjusts the struct
 declarations to avoid the ABI break in VacuumStmt and StdRdOptions.

I forgot to ask: what opinions are there about
vacuum_multixact_freeze_table_age's default value?  Right now I have 150
million, same as for Xids.  However, it might make sense to use 300
million, so that whole-table scans are not forced earlier than for Xids
unless the consumption rate for multixacts is double the one for Xids.

I already have set autovacuum_multixact_freeze_max_age to 400 million,
i.e. double that for Xids.  This means emergency vacuums will not take
place for multis, unless consumption rate is double that for Xids.
This seems pretty reasonable to me.
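
Expressed as postgresql.conf settings, the values under discussion are:

vacuum_multixact_freeze_table_age = 150000000    # same as for Xids; 300 million is the alternative
autovacuum_multixact_freeze_max_age = 400000000  # double the Xid counterpart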

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] [BUGS] BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding

2014-02-13 Thread Tom Lane
dig...@126.com writes:
 select t, t::bytea from convert_from('\xeec1', 'sql_ascii') as g(t);
 [ fails to check that string is valid in database encoding ]

Hm, yeah.  Normal input to the database goes through pg_any_to_server(),
which will apply a validation step if the source encoding is SQL_ASCII
and the destination encoding is something else.  However, pg_convert and
some other places call pg_do_encoding_conversion() directly, and that
function will just quietly do nothing if either encoding is SQL_ASCII.

The minimum-refactoring solution to this would be to tweak
pg_do_encoding_conversion() so that if the src_encoding is SQL_ASCII but
the dest_encoding isn't, it does pg_verify_mbstr() rather than nothing.
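
In code terms, that tweak might look roughly like this inside
pg_do_encoding_conversion() (an illustrative sketch, not committed
code):

if (src_encoding == PG_SQL_ASCII && dest_encoding != PG_SQL_ASCII)
{
    /* report invalid byte sequences instead of silently doing nothing */
    pg_verify_mbstr(dest_encoding, (const char *) src, len, false);
    return src;
}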

I'm not sure if this would break anything we need to have work,
though.  Thoughts?  Do we want to back-patch such a change?

regards, tom lane




Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 Perhaps we should use a lock to enforce that only one process tries to
 clean up the pending list at a time.

Something like the attached?  Can somebody who's seen this problem confirm
this improves matters?

(ginInsertCleanup's header comment also needs to be rewritten, but for
testing purposes, this is fine.)

regards, tom lane

diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 4a65046..38cda14 100644
*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***
*** 21,26 
--- 21,27 
  #include access/gin_private.h
  #include commands/vacuum.h
  #include miscadmin.h
+ #include storage/lmgr.h
  #include utils/memutils.h
  #include utils/rel.h
  
*** ginInsertCleanup(GinState *ginstate,
*** 739,744 
--- 740,755 
  	KeyArray	datums;
  	BlockNumber blkno;
  
+ 	/*
+ 	 * We use a heavyweight lock on the metapage to ensure that only one
+ 	 * backend at a time tries to clean up the pending list.  While it does
+ 	 * actually work for multiple backends to run this code concurrently, that
+ 	 * turns out to be a bad idea because there's lots of locking conflicts.
+ 	 * So if someone else is already running cleanup, we just do nothing.
+ 	 */
+ 	if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
+ 		return;
+ 
  	metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
  	LockBuffer(metabuffer, GIN_SHARE);
  	metapage = BufferGetPage(metabuffer);
*** ginInsertCleanup(GinState *ginstate,
*** 748,753 
--- 759,765 
  	{
  		/* Nothing to do */
  		UnlockReleaseBuffer(metabuffer);
+ 		UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
  		return;
  	}
  
*** ginInsertCleanup(GinState *ginstate,
*** 925,930 
--- 937,944 
  
  	ReleaseBuffer(metabuffer);
  
+ 	UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
+ 
  	/* Clean up temporary space */
  	MemoryContextSwitchTo(oldCtx);
  	MemoryContextDelete(opCtx);



Re: [HACKERS] truncating pg_multixact/members

2014-02-13 Thread Alvaro Herrera
Alvaro Herrera wrote:

 So here are two patches -- the first one, for 9.3 and HEAD, introduce
 the new aging variables and use them throughout vacuum and autovacuum,
 including per-table options; the second one adjusts the struct
 declarations to avoid the ABI break in VacuumStmt and StdRdOptions.

I have pushed this for both 9.3 and master.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Andres Freund
On 2014-02-13 16:15:42 -0500, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
  Perhaps we should use a lock to enforce that only one process tries to
  clean up the pending list at a time.
 
 Something like the attached?  Can somebody who's seen this problem confirm
 this improves matters?
 
 (ginInsertCleanup's header comment also needs to be rewritten, but for
 testing purposes, this is fine.)

Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
I think we would at least need to stop using the fast tuple
mechanism during gininsert() if it's already too big and do plain
inserts.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




[HACKERS] Another pgindent gripe

2014-02-13 Thread Tom Lane
I noticed that current pgindent wants to do this to a recently-added
comment in psql/copy.c:

*** handleCopyIn(PGconn *conn, FILE *copystr
*** 627,633 
  			/*
  			 * This code erroneously assumes '\.' on a line alone
  			 * inside a quoted CSV string terminates the \copy.
! 			 * http://www.postgresql.org/message-id/e1tdnvq-0001ju...@wrigleys.postgresql.org
  			 */
  			if (strcmp(buf, "\\.\n") == 0 ||
  				strcmp(buf, "\\.\r\n") == 0)
--- 627,634 
  			/*
  			 * This code erroneously assumes '\.' on a line alone
  			 * inside a quoted CSV string terminates the \copy.
! 			 * http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@w
! 			 * rigleys.postgresql.org
  			 */
  			if (strcmp(buf, "\\.\n") == 0 ||
  				strcmp(buf, "\\.\r\n") == 0)

Doesn't seem particularly friendly.

regards, tom lane




Re: [HACKERS] Another pgindent gripe

2014-02-13 Thread Bruce Momjian
On Thu, Feb 13, 2014 at 06:14:17PM -0500, Tom Lane wrote:
 I noticed that current pgindent wants to do this to a recently-added
 comment in psql/copy.c:
 
 *** handleCopyIn(PGconn *conn, FILE *copystr
 *** 627,633 
   			/*
   			 * This code erroneously assumes '\.' on a line alone
   			 * inside a quoted CSV string terminates the \copy.
 ! 			 * http://www.postgresql.org/message-id/e1tdnvq-0001ju...@wrigleys.postgresql.org
   			 */
   			if (strcmp(buf, "\\.\n") == 0 ||
   				strcmp(buf, "\\.\r\n") == 0)
 --- 627,634 
   			/*
   			 * This code erroneously assumes '\.' on a line alone
   			 * inside a quoted CSV string terminates the \copy.
 ! 			 * http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@w
 ! 			 * rigleys.postgresql.org
   			 */
   			if (strcmp(buf, "\\.\n") == 0 ||
   				strcmp(buf, "\\.\r\n") == 0)
 
 Doesn't seem particularly friendly.

Yeah, that is ugly, and I was about to add a URL to a C comment myself
today.

I can set the column target width, but there doesn't seem to be any flag
that says to avoid breaking text to do the wrapping.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-02-13 16:15:42 -0500, Tom Lane wrote:
 Something like the attached?  Can somebody who's seen this problem confirm
 this improves matters?

 Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
 I think we would at least need to stop using the fast tuple
 mechanism during gininsert() if it's already too big and do plain
 inserts.

No, because we've already got a process working on cleaning it out.

In any case, this needs some testing to see if it's an improvement
or not.

regards, tom lane




Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Andres Freund
On 2014-02-13 18:49:21 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-02-13 16:15:42 -0500, Tom Lane wrote:
  Something like the attached?  Can somebody who's seen this problem confirm
  this improves matters?
 
  Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
  I think we would at least need to stop using the fast tuple
  mechanism during gininsert() if it's already too big and do plain
  inserts.
 
 No, because we've already got a process working on cleaning it out.

Well, appending to the fast tuple list will normally be fully cached I/O;
cleaning it up, not so much.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Thu, Feb 13, 2014 at 7:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 That's what's bothering me, too.  On the other hand, if we can't think of
 a scenario where it'd be necessary to replay the high-offset update, then
 I'm disinclined to mess with the code further.

 And the whole point of the undefined page error checking is to detect
 cases like this, so covering them up in the name of possible edge
 cases we haven't thought of kind of defeats the purpose.

Yeah, good point.  Let's leave it alone and see if the existing fix
helps.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Store Extension Options

2014-02-13 Thread Fabrízio de Royes Mello
On Sun, Feb 9, 2014 at 2:22 PM, Fabrízio de Royes Mello 
fabriziome...@gmail.com wrote:


 On Sat, Jan 11, 2014 at 2:47 AM, Peter Eisentraut pete...@gmx.net wrote:
 
  On Sat, 2014-01-11 at 00:48 -0200, Fabrízio de Royes Mello wrote:
  Now, if bdr is installed but the validation doesn't happen unless bdr
  is loaded in some sense, then that is an implementation deficiency
  that I think we can insist be rectified before this feature is
  accepted.
   
 
   Isn't checking whether the extension is already installed enough for
   the first version of this feature?
 
  Elsewhere it was argued that tying this to extensions is not
  appropriate.  I agree.
 
  It depends on how this feature is supposed to be used exactly.  A
  replication plugin might very well be loaded via
  session_preload_libraries and not appear in SQL at all.  In that case
  you need some C-level hook.  In another case, an extension might want to
  inspect relation options from user-space triggers.  So you'd need to
  register some SQL-level function for option validation.
 
  This could end up being two separate but overlapping features.
 

 Hi all,


 I took this weekend to work on this patch, and on Monday or Tuesday I'll
send it.

 But I have some doubts:

 1) I'm not convinced about tying this to extensions. I think this feature
should simply let us store a custom GUC. We can set custom GUCs in a
backend session using SET class.variable = value, and this feature could
just let us store them for relations/attributes, without the complexity
and overhead of registering a function to validate them. That way we can
use this feature for extensions and other needs too.

 2) If we implement Robert's idea of having a function to validate the
extension options, then we must think about how an extension developer
will register this function. When we install an extension we must have
some way to get the pg_proc OID and store it in pg_extension (or a
different catalog). Or we could implement some way to register this
function at the SQL level, like ALTER EXTENSION bdr SET VALIDATE FUNCTION
bdr_options_validate(); or another syntax, of course.

 I don't know if you guys understood my concerns!! :-)

 Comments?


Hi all,

The attached patch implements the first option that I suggested before.
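
To make the namespace.option pattern concrete, here is a minimal
standalone sketch of splitting such a qualified name (a hypothetical
helper for illustration, not code from the attached patch):

#include <stdio.h>
#include <string.h>

/*
 * Split "bdr.do_replicate" into namespace and option parts.
 * Returns 0 if the name is unqualified or malformed.
 */
static int
split_qualified_option(const char *name, char *nsp, char *opt, size_t buflen)
{
	const char *dot = strchr(name, '.');

	if (dot == NULL || dot == name || dot[1] == '\0')
		return 0;
	snprintf(nsp, buflen, "%.*s", (int) (dot - name), name);
	snprintf(opt, buflen, "%s", dot + 1);
	return 1;
}

int
main(void)
{
	char		nsp[64];
	char		opt[64];

	if (split_qualified_option("bdr.do_replicate", nsp, opt, sizeof(nsp)))
		printf("namespace=%s option=%s\n", nsp, opt);	/* bdr, do_replicate */
	return 0;
}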

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_index.sgml b/doc/src/sgml/ref/alter_index.sgml
index d210077..5e9ee9d 100644
--- a/doc/src/sgml/ref/alter_index.sgml
+++ b/doc/src/sgml/ref/alter_index.sgml
@@ -82,6 +82,14 @@ ALTER INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> RESE
   <xref linkend="SQL-REINDEX">
   to get the desired effects.
  </para>
+ <note>
+   <para>
+ A custom name can be used as namespace to define a storage parameter.
+ Storage option pattern: namespace.option=value
+ (namespace=custom name, option=option name and value=option value).
+ See example below.
+   </para>
+ </note>
 </listitem>
</varlistentry>
 
@@ -202,6 +210,17 @@ ALTER INDEX distributors SET (fillfactor = 75);
 REINDEX INDEX distributors;
</programlisting></para>
 
+  <para>
+   To set a custom storage parameter:
+<programlisting>
+ALTER INDEX distributors
+  SET (bdr.do_replicate=true);
+</programlisting>
+   (bdr=custom name, do_replicate=option name and
+   true=option value)
+</para>
+
+
 </refsect1>
 
 <refsect1>
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 89649a2..6fd9d67 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -213,6 +213,16 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
   of statistics by the <productname>PostgreSQL</productname> query
   planner, refer to <xref linkend="planner-stats">.
  </para>
+
+ <note>
+  <para>
+    A custom name can be used as namespace to define a storage parameter.
+    Storage option pattern: namespace.option=value
+    (namespace=custom name, option=option name and value=option value).
+    See example below.
+  </para>
+ </note>
+
 </listitem>
</varlistentry>
 
@@ -476,6 +486,10 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
    <command>ALTER TABLE</command> does not treat <literal>OIDS</literal> as a
    storage parameter.  Instead use the <literal>SET WITH OIDS</literal>
    and <literal>SET WITHOUT OIDS</literal> forms to change OID status.
+   A custom name can be used as namespace to define a storage parameter.
+   Storage option pattern: namespace.option=value
+   (namespace=custom name, option=option name and value=option value).
+   See example below.
   </para>
  </note>
 </listitem>
@@ -1112,6 +1126,26 @@ ALTER TABLE distributors DROP CONSTRAINT 

Re: [HACKERS] [bug fix] psql \copy doesn't end if backend is killed

2014-02-13 Thread Tom Lane
MauMau maumau...@gmail.com writes:
 If the backend is terminated with SIGKILL while psql is running \copy
 table_name from file_name, the \copy never ends.  I expected the \copy
 to be cancelled because the corresponding server process had vanished.

I just noticed this CF entry pertaining to the same problem that Stephen
Frost reported a couple days ago:
http://www.postgresql.org/message-id/20140211205336.gu2...@tamriel.snowman.net

I believe it's been adequately fixed as of commits fa4440f516 and
b8f00a46bc, but if you'd test that those handle your problem cases,
I'd appreciate it.

 [Fix]
 If the message transmission fails in PQputCopyEnd(), switch
 conn->asyncStatus back to PGASYNC_BUSY.

This patch seems inappropriate to me, because it will allow libpq to exit
the COPY IN state whether or not it still has a live connection.  If it
does, the backend will be in an inconsistent state and we'll have a mess.
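
To spell out the client-side pattern at stake (a hedged sketch against the
public libpq API, not the committed fix): the client queues the
end-of-copy marker and then drains results, so a dead connection surfaces
as an error result rather than libpq silently leaving the COPY state.

#include <stdio.h>
#include <libpq-fe.h>

static int
finish_copy_in(PGconn *conn)
{
	PGresult   *res;
	int			ok = 1;

	if (PQputCopyEnd(conn, NULL) <= 0)	/* could not queue end-of-data */
		fprintf(stderr, "PQputCopyEnd: %s", PQerrorMessage(conn));

	/* Drain every result; COPY can produce more than one. */
	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) != PGRES_COMMAND_OK)
		{
			fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
			ok = 0;
		}
		PQclear(res);
	}
	return ok;
}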

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-13 Thread Hiroshi Inoue
(2014/02/13 9:51), Hiroshi Inoue wrote:
 (2014/02/12 12:28), Inoue, Hiroshi wrote:
 (2014/02/12 8:30), Tom Lane wrote:
 I wrote:
 Hiroshi Inoue in...@tpf.co.jp writes:
 I tried MINGW port with the attached change and successfully built
 src and contrib and all pararell regression tests were OK.

 I cleaned this up a bit (the if-nesting in Makefile.shlib was making
 my head hurt, not to mention that it left a bunch of dead code) and
 committed it.

 Hm ... according to buildfarm member narwhal, this doesn't work so well
 for plperl:

 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
 -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
 -Wformat-security -fno-strict-aliasing -fwrapv -g -Wno-comment   -shared -o 
 plperl.dll  plperl.o SPI.o Util.o -L../../../src/port -L../../../src/common 
 -Wl,--allow-multiple-definition -L/mingw/lib  -Wl,--as-needed   
 -LC:/Perl/lib/CORE -lperl58 -L../../../src/backend -lpostgres -lpgcommon 
 -lpgport -lintl -lxslt -lxml2 -lssleay32 -leay32 -lz -lm  -lws2_32 
 -lshfolder -Wl,--export-all-symbols -Wl,--out-implib=libplperl.a
 Cannot export .idata$4: symbol not found
 Cannot export .idata$5: symbol not found
 Cannot export .idata$6: symbol not found
 Cannot export .text: symbol not found
 Cannot export perl58_NULL_THUNK_DATA: symbol not found
 Creating library file: libplperl.a
 collect2: ld returned 1 exit status
 make[3]: *** [plperl.dll] Error 1

 Oops, I forgot to include plperl, tcl or python, sorry. I will
 retry the build operations with them. Unfortunately it will take a
 pretty long time because build operations are pretty (or veeery,
 on an old machine) slow.

 Not very clear what's going on there; could this be a problem in
 narwhal's admittedly-ancient toolchain?
 
 As for build, plperl and pltcl are OK on both Windows7+gcc4.6.1
 machine and Windows Vista+gcc3.4.5 machine. plpython is OK on
 gcc4.6.1 machine but causes a *initializer element is not constant*
 error on gcc3.4.5 machine.
 I've not run the regression tests yet.

Rebuilding with --disable-auto-import causes errors in contrib on
both machines. Errors occur in pg_buffercache, pg_stat_statements,
postgres_fdw and test_shm_mq. --enable-auto-import cures all of them.

regards,
Hiroshi Inoue




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-13 Thread Tom Lane
Hiroshi Inoue in...@tpf.co.jp writes:
 Rebuild with --disable-auto-import causes errors in contrib on
 both machines. Errors occur in pg_buffercache, pg_stat_statements,
  postgres_fdw and test_shm_mq.

Yeah, that's the idea: we want to get the same failures as on MSVC.
I'm going to put in PGDLLIMPORT macros to fix these cases, but I'm
waiting for verification that Cygwin also sees the problem before
making it go away.
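
For anyone following along, a rough sketch of the mechanics (illustrative
only, not the actual commit): PGDLLIMPORT boils down to
__declspec(dllimport) on Windows, and any backend global that a loadable
module references needs the marking.

#ifdef _WIN32
#define PGDLLIMPORT __declspec(dllimport)	/* resolve from the server binary */
#else
#define PGDLLIMPORT
#endif

/*
 * Without PGDLLIMPORT, MSVC (and MinGW built with
 * --disable-auto-import) cannot link a module that references a
 * backend global such as NBuffers, which pg_buffercache reads.
 */
extern PGDLLIMPORT int NBuffers;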

brolga unfortunately seems to have been AWOL for the past day and
a half.  Andrew, could you give it a kick?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] HBA files w/include support?

2014-02-13 Thread Jerry Sievers
Greetings Hackers.

I'm aware of how a pg_hba.conf file can refer to other files for
including @lists of users, etc.

But there is currently no support for being able to pull in entire file
segments as can be done for postgresql.conf via the include directive.

In the environment that I'm managing, we are using a makefile to stick
together a common header with a custom section for any of several
clusters and may extend this further to permit additional includes for
hba rules common to groupings of clusters.

Anyway, please advise.  I don't recall hearing anything like this
discussed.

Has it been proposed, discussed and voted down?  Or never mentioned?

Thx

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [SQL] Comparison semantics of CHAR data type

2014-02-13 Thread Bruce Momjian
On Wed, Oct 16, 2013 at 02:17:11PM -0400, Bruce Momjian wrote:
   You can see the UTF8 case is fine because \n is considered greater
   than space, but in the C locale, where \n is less than space, the
   false return value shows the problem with
   internal_bpchar_pattern_compare() trimming the string and first
   comparing on lengths.  This is exactly the problem you outline, where
   space trimming assumes everything is less than a space.
  
  For collations other than C some of those issues that have to do with
  string comparisons might simply be hidden, depending on how strcoll()
  handles inputs of different lengths: If strcoll() applies implicit
  space padding to the shorter value, there won't be any visible
  difference in ordering between bpchar and varchar values.  If strcoll()
  does not apply such space padding, the right-trimming of bpchar values
  causes very similar issues even in a en_US collation.

I have added the attached C comment to explain the problem, and added a
TODO item to fix it if we ever break binary upgrading.

Does anyone think this warrants a doc mention?
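
The anomaly is easy to reproduce outside the backend; this small
standalone program (illustrative only, mimicking bcTruelen()'s
right-trimming) shows the two comparison rules disagreeing:

#include <stdio.h>
#include <string.h>

/* compare like bpcharcmp: common prefix first, then length breaks ties */
static int
trimmed_cmp(const char *a, int la, const char *b, int lb)
{
	int			cmp = memcmp(a, b, la < lb ? la : lb);

	return cmp != 0 ? cmp : la - lb;
}

int
main(void)
{
	const char *p1 = "ab\n       ";	/* 'ab\n' space-padded to CHAR(10) */
	const char *p2 = "ab        ";	/* 'ab '  space-padded to CHAR(10) */

	/* padded semantics: '\n' (10) sorts below ' ' (32), so p1 < p2 */
	printf("padded : %s\n", memcmp(p1, p2, 10) < 0 ? "less" : "greater");

	/* trimmed semantics: 'ab\n' vs 'ab', the longer wins, so p1 > p2 */
	printf("trimmed: %s\n", trimmed_cmp("ab\n", 3, "ab", 2) < 0 ? "less" : "greater");
	return 0;
}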

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
new file mode 100644
index 502ca44..284b5d1
*** a/src/backend/utils/adt/varchar.c
--- b/src/backend/utils/adt/varchar.c
*** bpcharcmp(PG_FUNCTION_ARGS)
*** 846,851 ****
--- 846,863 ----
  len2;
  	int			cmp;
  
+ 	/*
+ 	 * Trimming trailing spaces off of both strings can cause a string
+ 	 * with a character less than a space to compare greater than a
+ 	 * space-extended string, e.g. this returns false:
+ 	 *		SELECT E'ab\n'::CHAR(10) < E'ab '::CHAR(10);
+ 	 * even though '\n' is less than the space if CHAR(10) was
+ 	 * space-extended.  The correct solution would be to trim only
+ 	 * the longer string to be the same length as the shorter, if
+ 	 * possible, then do the comparison.  However, changing this
+ 	 * might break existing indexes, breaking binary upgrades.
+ 	 * For details, see http://www.postgresql.org/message-id/CAK+WP1xdmyswEehMuetNztM4H199Z1w9KWRHVMKzyyFM+hV=z...@mail.gmail.com
+ 	 */
  	len1 = bcTruelen(arg1);
  	len2 = bcTruelen(arg2);
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HBA files w/include support?

2014-02-13 Thread Bruce Momjian
On Thu, Feb 13, 2014 at 08:24:27PM -0600, Jerry Sievers wrote:
 Greetings Hackers.
 
 I'm aware of how a pg_hba.conf file can refer to other files for
 including @lists of users, etc.
 
 But there is currently no support for being able to pull in entire file
 segments as can be done for postgresql.conf via the include directive.
 
 In the environment that I'm managing, we are using a makefile to stick
 together a common header with a custom section for any of several
 clusters and may extend this further to permit additional includes for
 hba rules common to groupings of clusters.
 
 Anyway, please advise.  I don't recall hearing anything like this
 discussed.
 
 Has been proposed, discussed and voted down?  Or never mentioned?

I have never heard of anyone requesting this.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changeset Extraction v7.0 (was logical changeset generation)

2014-02-13 Thread Peter Eisentraut
On Sun, 2014-01-19 at 15:31 +0100, Stefan Kaltenbrunner wrote:
  /* followings are for client encoding only */
  PG_SJIS,  /* Shift JIS
  (Winindows-932) */
 
 while you have that file open: s/Winindows-932/Windows-932 maybe?

done



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2014-02-13 Thread Bruce Momjian
On Tue, Jan  7, 2014 at 12:33:33PM +0530, Amit Kapila wrote:
 On Thu, Oct 31, 2013 at 8:58 AM, Amit Kapila amit.kapil...@gmail.com wrote:
  On Wed, Oct 16, 2013 at 1:44 AM, Amit Kapila amit.kapil...@gmail.com 
  wrote:
  On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander mag...@hagander.net 
  wrote:
  On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas robertmh...@gmail.com 
  wrote:
  On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Well, that sucks.  So it's a Windows bug.
 
  I agree we'll probably want to work around it in the end, but I still
  think it should be put to Microsoft PSS if we can. The usual - have we
  actually produced a self-contained example that does just this (and
  doesn't include the full postgres support) and submitted it to
  *microsoft* for comments?
 
I have written a self contained win32 console application with which
  the issue can be reproduced.
The application project is attached with this mail.
 
  Logged a support ticket with Microsoft, they could reproduce the issue
  with the sample application (it is same what I had posted on hackers
  in this thread) and working on it.
 
 Further update on this issue:
 
 Microsoft has suggested a workaround for stat API. Their suggestion
 is to use 'GetFileAttributesEx' instead of stat; when I tried their
 suggestion, it also gave me the same problem as stat.
 
 They still have not said anything about the other APIs
 (rmdir, RemoveDirectory) which have the same problem.

Where are we on this?  Is there a check we should add in our code?
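
One possible check of the sort being asked about (an assumption on my
part, not a committed fix) would be to reject paths long enough to hit
the misbehaving Windows APIs before they ever reach stat()/rmdir():

#ifdef _WIN32
#include <string.h>
#include <windows.h>

/*
 * Refuse tablespace-style paths that exceed the classic Windows
 * limit; MAX_PATH (260) includes the terminating NUL.
 */
static int
path_exceeds_windows_limit(const char *path)
{
	return strlen(path) >= MAX_PATH;
}
#endif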

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HBA files w/include support?

2014-02-13 Thread Alvaro Herrera
Bruce Momjian wrote:
 On Thu, Feb 13, 2014 at 08:24:27PM -0600, Jerry Sievers wrote:
  Greetings Hackers.
  
  I'm aware of how a pg_hba.conf file can refer to other files for
  including @lists of users, etc.
  
  But there is currently no support for being able to pull in entire file
  segments as can be done for postgresql.conf via the include directive.

 I have never heard of anyone requesting this.

On the contrary, I have vague memories that this has been discussed and
agreed upon; we have just never implemented it.

One issue with this is that pg_hba.conf is order sensitive, which could
become a trap for the unwary if includes are used carelessly.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-02-13 Thread Alvaro Herrera
I hadn't noticed this thread.  I will give this a look.  Thanks for
providing a patch.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HBA files w/include support?

2014-02-13 Thread Jerry Sievers
Alvaro Herrera alvhe...@2ndquadrant.com writes:

 Bruce Momjian wrote:

 On Thu, Feb 13, 2014 at 08:24:27PM -0600, Jerry Sievers wrote:
  Greetings Hackers.
  
  I'm aware of how a pg_hba.conf file can refer to other files for
  including @lists of users, etc.
  
  But there is currently no support for being able to pull in entire file
  segments as can be done for postgresql.conf via the include directive.

  I have never heard of anyone requesting this.

 On the contrary, I have vague memories that this has been discussed and
 agreed upon; we have just never implemented it.

Interesting and it'd surprise me if no one has ever wanted the feature.


 One issue with this is that pg_hba.conf is order sensitive, which could
 become a trap for the unwary if includes are used carelessly.

Indeed.

The other thing that comes to mind is that, as opposed to
postgresql.conf and the include scenario there, one can run SHOW ALL or
query pg_settings to see what settings they ended up with.

I'm not aware of any way to probe what hba rules are loaded at runtime,
and thus debugging hba config changes is not really possible.

I presume that a simple scenario involving just one level of includes is
not too difficult to grok, but nested includes sure might be a foot-gun
unless there were a way to dump the resulting configs somehow.

Thus pasting hba files together externally is a more reliable approach.
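
For illustration only, here is a tiny flattener of that sort in C (the
include "file" directive syntax is hypothetical, since pg_hba.conf has no
such directive today); it expands includes in order, which is also what a
debugging dump of the effective rules could print:

#include <stdio.h>

/*
 * Recursively expand lines of the form: include "file"
 * Ordering is preserved, which matters because hba rules are
 * order sensitive.  No nesting-depth cap here; a real
 * implementation would need one to avoid include loops.
 */
static void
flatten(const char *path, FILE *out)
{
	FILE	   *f = fopen(path, "r");
	char		line[1024];
	char		inc[1024];

	if (f == NULL)
		return;
	while (fgets(line, sizeof(line), f) != NULL)
	{
		if (sscanf(line, " include \"%1023[^\"]\"", inc) == 1)
			flatten(inc, out);
		else
			fputs(line, out);
	}
	fclose(f);
}

int
main(int argc, char **argv)
{
	if (argc > 1)
		flatten(argv[1], stdout);
	return 0;
}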

Thanks

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue with gininsert under very high load

2014-02-13 Thread Jesper Krogh

On 14/02/14 00:49, Tom Lane wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2014-02-13 16:15:42 -0500, Tom Lane wrote:

Something like the attached?  Can somebody who's seen this problem confirm
this improves matters?

Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
I think we would at least need to stop using the fast tuple
mechanism during gininsert() if it's already too big and do plain
inserts.

No, because we've already got a process working on cleaning it out.

In any case, this needs some testing to see if it's an improvement
or not.


I have some real-world experience with the fastupdate mechanism: under
concurrent load it behaves really badly. Random processes waiting for
cleanup (or competing with cleanup) are going to see latency spikes
because they magically hit that corner, so reverting to plain inserts
when a process cannot add to the pending list will not remove this
problem, but will make it hit only the process actually doing the
cleanup.

The built-in mechanism, where cleanup is a cost paid by the process that
happened to fill the pending list, is really hard to deal with in
production. More control would be appreciated, perhaps even an explicit
flush mechanism... I'd like to batch up inserts during one transaction
only and flush on commit.

--
Jesper - with fastupdate turned off due to above issues.


--
Jesper


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers