Re: [HACKERS] proposal: simple date constructor from numeric values

2013-09-19 Thread Jeevan Chalke
On Wed, Sep 18, 2013 at 9:54 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

 Hello

 thank you,

 I have no comments


Thanks.

Assigned it to committer.



 Regards

 Pavel

 --
Jeevan B Chalke


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-19 Thread Fujii Masao
On Thu, Sep 19, 2013 at 2:25 PM, samthakur74 samthaku...@gmail.com wrote:
I got a segmentation fault when I tested the case where the least-executed
query statistics entry is discarded, i.e., when I executed more different
queries than pg_stat_statements.max. I guess that the patch might have a bug.
 Thanks, will try to fix it.

 pg_stat_statements--1.1.sql should be removed.
 Yes, will do that.



 +  <entry><structfield>queryid</structfield></entry>
 +  <entry><type>bigint</type></entry>
 +  <entry></entry>
 +  <entry>Unique value of each representative statement for the
 current statistics session.
 +   This value will change for each new statistics session.</entry>

 What does "statistics session" mean?
 The time period when statistics are gathered by the statistics collector
 without being reset. So the statistics session continues across normal
 shutdowns, but in case of abnormal situations like crashes, format upgrades
 or statistics being reset for any other reason, a new time period of
 statistics collection starts, i.e. a new statistics session. The queryid
 value generation is linked to the statistics session to emphasize the fact
 that in case of crashes, format upgrades or any situation of statistics
 reset, the queryid for the same queries will also change.

I'm afraid that this behavior narrows down the use case of queryid very much.
For example, since the queryid of the same query would not be the same in
the master and the standby servers, we cannot associate those two statistics
by using the queryid. The queryid changes through the crash recovery, so
we cannot associate the query statistics generated before the crash with that
generated after the crash recovery even if the query is the same.

This is not directly related to the patch itself, but why does the queryid
need to be calculated based also on the statistics session?

 Will update the documentation to
 clearly explain the term "statistics session" in this context

Yep, that's helpful!
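(For illustration - assuming the patch's queryid column, which is still
under review - the statistics could then be read keyed on it like this:

SELECT queryid, calls, total_time, query
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 5;

Per the discussion above, two queryid values would only be comparable
within a single statistics session.)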

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator

2013-09-19 Thread Dimitri Fontaine
Kevin Grittner kgri...@ymail.com writes:
 There are examples in the patch and this thread, but rather than
 reference back to those I'll add a new one.  Without the patch:

Thanks much for doing that.

 The problem, as I see it, is that the view and the concurrently
 refreshed materialized view don't yield the same results for the
 same query.  The rows are equal, but they are not the same.  With
 the patch the matview, after RMVC, looks just the same as the view.

My understanding is that if you choose citext then you don't care at all
about the case, so that the two relations actually yield the same
results for the right definition of "same" here: the citext one.

In other words, the results only look different in ways that don't
matter for the datatype involved, and I think that if it matters to the
user then he needs to review his datatype choices or view definition.

So my position on that is that your patch is only adding confusion for
no benefits that I'm able to understand.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Extend namespace of valid guc names

2013-09-19 Thread Andres Freund
On 2013-09-18 11:02:50 +0200, Andres Freund wrote:
 On 2013-09-18 11:55:24 +0530, Amit Kapila wrote:
 
   I think that ship has long since sailed. postgresql.conf has allowed
   foo.bar style GUCs via custom_variable_classes for a long time, and
   these days we don't even require that but allow them generally. Also,
   SET, postgres -c, and SELECT set_config() already don't have the
   restriction to one dot in the variable name.
  
  It's even explained in the documentation that a two-part name is allowed
  for Customized Options at this link:
  http://www.postgresql.org/docs/devel/static/runtime-config-custom.html
 
 Oh I somehow missed that. I'll need to patch that as well.

Updated patch attached.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 2ab86c1cd230b4187ee994447075bd6a72b408dc Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sat, 14 Sep 2013 23:15:10 +0200
Subject: [PATCH] Allow custom GUCs to be nested more than one level in config
 files

Doing so was allowed via SET, postgres -c and set_config()
before. Allowing it is useful for grouping together variables
inside an extension's toplevel namespace.

There still are differences in the accepted variable names between the
different methods of setting GUCs after this commit, but the consensus
is that those are acceptable.
---
 doc/src/sgml/config.sgml  | 29 -
 src/backend/utils/misc/guc-file.l |  2 +-
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 370aa09..ae9ef3b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6375,21 +6375,32 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
 </para>

 <para>
- Custom options have two-part names: an extension name, then a dot, then
- the parameter name proper, much like qualified names in SQL.  An example
- is <literal>plpgsql.variable_conflict</>.
+ Custom options consist of an extension name, then a dot, then zero
+ or more extension-internal namespaces each with a trailing dot,
+ then the parameter name proper, much like qualified names in
+ SQL. An example without extension-internal namespaces
+ is <literal>plpgsql.variable_conflict</>, one with
+ is <literal>foo.a.connection</>.
 </para>

 <para>
  Because custom options may need to be set in processes that have not
  loaded the relevant extension module, <productname>PostgreSQL</>
- will accept a setting for any two-part parameter name.  Such variables
- are treated as placeholders and have no function until the module that
- defines them is loaded. When an extension module is loaded, it will add
- its variable definitions, convert any placeholder values according to
- those definitions, and issue warnings for any unrecognized placeholders
- that begin with its extension name.
+ will accept a setting for any parameter name containing at least one dot.
+ Such variables are treated as placeholders and have no function until the
+ module that defines them is loaded. When an extension module is loaded,
+ it will add its variable definitions, convert any placeholder values
+ according to those definitions, and issue warnings for any unrecognized
+ placeholders that begin with its extension name.
 </para>
+
+<para>
+ Note that options set in <filename>postgresql.conf</> and files
+ it includes have different restrictions than names set
+ with <command>SET</>, <command>SELECT set_config(...)</>, or ones
+ passed directly to <command>postgres</> and <command>pg_ctl</>.
+</para>
+
 </sect1>

 <sect1 id="runtime-config-developer">
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index b730a12..ff4202d 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -78,7 +78,7 @@ LETTER			[A-Za-z_\200-\377]
 LETTER_OR_DIGIT [A-Za-z_0-9\200-\377]
 
 ID				{LETTER}{LETTER_OR_DIGIT}*
-QUALIFIED_ID	{ID}"."{ID}
+QUALIFIED_ID	{ID}("."{ID})+
 
 UNQUOTED_STRING {LETTER}({LETTER_OR_DIGIT}|[-._:/])*
 STRING			\'([^'\\\n]|\\.|\'\')*\'
-- 
1.8.4.21.g992c386.dirty
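As an illustration of what the patch permits (the extension name and
sub-namespace below are made up), a postgresql.conf entry like

myext.logging.level = 'debug'

is newly accepted, while the equivalent forms were already accepted
before the patch:

SET myext.logging.level = 'debug';
SELECT set_config('myext.logging.level', 'debug', false);
SHOW myext.logging.level;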


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-19 Thread Sawada Masahiko
On Thu, Sep 19, 2013 at 12:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Sep 19, 2013 at 11:48 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 I attached the patch which I have modified.

 Thanks for updating the patch!

 Here are the review comments:


Thank you for reviewing!

 I got the compiler warning:

 syncrep.c:112: warning: unused variable 'i'

 How does synchronous_transfer work with synchronous_commit?

Currently, synchronous_transfer doesn't work when
synchronous_commit is set to 'off' or 'local'.
If the user changes the synchronous_commit value within a transaction,
the checkpointer process can't see it.
Due to that, even if synchronous_commit is changed from 'on' to 'off',
synchronous_transfer doesn't work.
I'm planning to modify the patch so that synchronous_transfer is not
affected by synchronous_commit.


 + * accept all the likely variants of "off".

 This comment should be removed because synchronous_transfer
 doesn't accept the value "off".

 +{"commit", SYNCHRONOUS_TRANSFER_COMMIT, true},

 ISTM the third value "true" should be "false".

 +{"0", SYNCHRONOUS_TRANSFER_COMMIT, true},

 Why is this needed?

 +elog(WARNING, "XLogSend sendTimeLineValidUpto(%X/%X) <=
 sentPtr(%X/%X) AND sendTimeLine",
 + (uint32) (sendTimeLineValidUpto >> 32), (uint32)
 sendTimeLineValidUpto,
 + (uint32) (sentPtr >> 32), (uint32) sentPtr);

 Why is this needed?


They are unnecessary. I had forgot to remove unnecessary codes.

 +#define SYNC_REP_WAIT_FLUSH			1
 +#define SYNC_REP_WAIT_DATA_FLUSH		2

 Why do we need to separate the wait-queue for wait-data-flush
 from that for wait-flush? ISTM that wait-data-flush could also
 wait for replication on the wait-flush queue, which would
 simplify the patch.


Yes, it seems unnecessary to add a new queue.
I will delete SYNC_REP_WAIT_DATA_FLUSH and the code related to it.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: lob conversion functionality

2013-09-19 Thread Rushabh Lathia
Hi Pavel,

I have reviewed your patch.

-- Patch got applied cleanly (using patch -p1)
-- Make && make install works fine
-- make check looks good

I did a code walk-through and it looks good. Also did some manual testing
and haven't found any issue with the implementation.

The patch introduces two new APIs, load_lo() and make_lo(), for loading and
saving from/to large objects. When it comes to dropping an lo object created
using make_lo(), this still depends on the older API lo_unlink(). I think we
should add that to the documentation for clarification.

As a user of the lo functions, when I started testing this new API, the
first question that came to mind was why a delete_lo() or destroy_lo() API
is missing. Later I realized that the older lo_unlink() API is needed for
that functionality, so I feel it's good to document that. Do let me know
what you think.
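For example (make_lo() is the API added by this patch, lo_unlink() the
pre-existing one; the oid is taken from Pavel's example below):

SELECT make_lo(decode('ff00', 'hex'));  -- returns the oid of a new large object
SELECT lo_unlink(24629);                -- drops it; returns 1 on success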

Otherwise the patch looks nice and clean.

Regards,
Rushabh Lathia
www.EnterpriseDB.com



On Sun, Aug 25, 2013 at 8:31 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

 Hello

 here is a patch

 it introduces load_lo and make_lo functions

 postgres=# select make_lo(decode('ff00','hex'));
  make_lo
 ─
24629
 (1 row)

 Time: 40.724 ms
 postgres=# select load_lo(24628);
   load_lo
 
  \xff00
 (1 row)

 postgres=# \lo_import ~/avatar.png
 lo_import 24630

 postgres=# select md5(load_lo(24630));
md5
 ──
  513f60836f3b625713acaf1c19b6ea78
 (1 row)

 postgres=# \q
 bash-4.1$ md5sum ~/avatar.png
 513f60836f3b625713acaf1c19b6ea78  /home/pavel/avatar.png

 Regards

 Pavel Stehule



 2013/8/22 Jov am...@amutu.com

 +1
 We badly need large object to bytea conversion functions.

 Once I had to use the ugly pg_read_file() to put some text into pg; I tried
 to use a large object but found it is useless without a function to convert
 a large object to bytea.

 Jov
 blog: http://amutu.com/blog


 2013/8/10 Pavel Stehule pavel.steh...@gmail.com

  Hello

 I had to enhance my older project, where XML documents are parsed and
 created on the server side - in PL/pgSQL and PL/Perl procedures. We would
 like to use the LO API for client-server communication, but we have to
 parse/serialize LOs on the server side.

 I found that there is no simple API for working with LOs from PL without
 access to the file system. I had to use ugly hacks:

 CREATE OR REPLACE FUNCTION parser.save_as_lob(bytea)
 RETURNS oid AS $$
 DECLARE
   _loid oid;
   _substr bytea;
 BEGIN
   _loid := lo_creat(-1);
   FOR i IN 0..length($1)/2048
   LOOP
 _substr := substring($1 FROM i * 2048 + 1 FOR 2048);
  IF _substr <> '' THEN
   INSERT INTO pg_largeobject(loid, pageno, data)
 VALUES(_loid, i, _substr);
 END IF;
   END LOOP;

   EXECUTE format('GRANT SELECT ON LARGE OBJECT %s TO ohs', _loid);
   RETURN _loid;
 END;
 $$ LANGUAGE plpgsql SECURITY DEFINER STRICT SET search_path =
 'pg_catalog';

 and

 CREATE OR REPLACE FUNCTION fbuilder.attachment_to_xml(attachment oid)
 RETURNS xml AS $$
 DECLARE
   b_cum bytea = '';
   b bytea;
 BEGIN
   FOR b IN SELECT l.data
   FROM pg_largeobject l
  WHERE l.loid = attachment_to_xml.attachment
  ORDER BY l.pageno
   LOOP
 b_cum := b_cum || b;
   END LOOP;
   IF NOT FOUND THEN
 RETURN NULL;
   ELSE
 RETURN xmlelement(NAME attachment,
encode(b_cum, 'base64'));
   END IF;
 END;
 $$ LANGUAGE plpgsql STRICT SECURITY DEFINER SET search_path =
 'pg_catalog';

 These functions could be simplified if we supported some functions like
 encode, decode for LOs

 So my proposal is creating functions:

 * lo_encode(loid oid) .. returns bytea
 * lo_encode(loid oid, encoding text) .. returns text
 * lo_make(loid oid, data bytea)
 * lo_make(loid oid, data text, encoding text)

 This can simplify all transformations between LO and VARLENA. A known
 limit is 1G for varlena, but that is still relatively high.
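 A sketch of how the proposed functions might be used (signatures as
 proposed above; the oid is only illustrative):

 SELECT lo_make(lo_creat(-1), decode('ff00', 'hex'));  -- fill a new LO from bytea
 SELECT lo_encode(24630);                              -- read an LO back as bytea
 SELECT lo_encode(24630, 'base64');                    -- or as base64-encoded text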

 Notes, comments?

 Regards

 Pavel


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers





 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Rushabh Lathia


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-19 Thread Fujii Masao
On Thu, Sep 19, 2013 at 7:07 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Thu, Sep 19, 2013 at 12:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Sep 19, 2013 at 11:48 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 I attached the patch which I have modified.

 Thanks for updating the patch!

 Here are the review comments:


 Thank you for reviewing!

 I got the compiler warning:

 syncrep.c:112: warning: unused variable 'i'

 How does synchronous_transfer work with synchronous_commit?

 Currently, synchronous_transfer doesn't work when
 synchronous_commit is set to 'off' or 'local'.
 If the user changes the synchronous_commit value within a transaction,
 the checkpointer process can't see it.
 Due to that, even if synchronous_commit is changed from 'on' to 'off',
 synchronous_transfer doesn't work.
 I'm planning to modify the patch so that synchronous_transfer is not
 affected by synchronous_commit.

Hmm... when synchronous_transfer is set to data_flush,
IMO the intuitive behaviors are

(1) synchronous_commit = on
A data flush should wait for the corresponding WAL to be
flushed in the standby

(2) synchronous_commit = remote_write
A data flush should wait for the corresponding WAL to be
written to OS in the standby.

(3) synchronous_commit = local
(4) synchronous_commit = off
A data flush should wait for the corresponding WAL to be
written locally in the master.
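For example, with the master configured like this (synchronous_transfer
is the GUC proposed by this patch; the standby name is made up):

synchronous_standby_names = 'standby1'
synchronous_commit = remote_write
synchronous_transfer = data_flush

a data flush would wait only until the corresponding WAL has been
written to the OS on the standby, per (2) above.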

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] FW: REVIEW: Allow formatting in log_line_prefix

2013-09-19 Thread David Rowley
(Forgot to copy list)

 

From: David Rowley [mailto:dgrowle...@gmail.com] 
Sent: 19 September 2013 22:35
To: Albe Laurenz
Subject: Re: REVIEW: Allow formatting in log_line_prefix

 

Hi Laurenz,

Thanks for the review.

Please see my comments below and the updated patch attached.



On Thu, Sep 19, 2013 at 2:18 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote:

This is a review for patch
caaphdvpagtypzb2kwa0mmtksayg9+vashyjmjfatngxr1ad...@mail.gmail.com
 

The patch is readable, applies fine and builds without warnings.

It contains sufficient documentation.

It works as it should, no crashes or errors.

It is well written, in fact it improves the readability of
the log_line_prefix function in backend/utils/error/elog.c.

I think that this is a useful feature as it can help to make
stderr logging more readable for humans.

I have a few gripes with the English:

- In the documentation patch:

  + numeric literal after the % and before the option. A negative
  + padding will cause the status information to be padded on the
  + right with spaces to give it a minimum width. Whereas a positive
  + padding will pad on the left. Padding can be useful keep log
  + files more human readable.

  I think that there should be a comma, not a period, before whereas.

 

I've made the change you request here and also made a change to my last
sentence, which hopefully is an improvement.
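To illustrate the syntax described above (the values are only an example):

log_line_prefix = '%-10a %6p %t '

Here %-10a pads the application name on the right with spaces to a minimum
width of 10 characters, while %6p pads the process ID on the left to a
minimum width of 6.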

 

- The description for the function process_log_prefix_padding
  should probably mention that it returns NULL if it encounters
  bad syntax.

 

I've added a comment before the function for this.


 

- This comment:

  +   /* Format is invalid if it ends with the padding number */

  should begin with lower case.

 

Fixed, but not quite sure of the reason for lack of caps at the moment.

 

- In this comment:

  +   /*
  +* Process any formatting which may exist after the '%'.
  +* Note that this moves p passed the padding number
  +* if it exists.
  +*/

It should be past, not passed.

 

Fixed.

 

If these details get fixed, I'll mark the patch ready for committer.

Yours,
Laurenz Albe

 



log_line_formatting_v0.2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator

2013-09-19 Thread Hannu Krosing
On 09/19/2013 10:54 AM, Dimitri Fontaine wrote:
 Kevin Grittner kgri...@ymail.com writes:
 There are examples in the patch and this thread, but rather than
 reference back to those I'll add a new one.  Without the patch:
 Thanks much for doing that.

 The problem, as I see it, is that the view and the concurrently
 refreshed materialized view don't yield the same results for the
 same query.  The rows are equal, but they are not the same.  With
 the patch the matview, after RMVC, looks just the same as the view.
 My understanding is that if you choose citext then you don't care at all
 about the case, so that the two relations actually yield the same
 results for the right definition of "same" here: the citext one.

 In other words, the results only look different in ways that don't
 matter for the datatype involved, and I think that if it matters to the
 user then he needs to review his datatype choices or view definition.

 So my position on that is that your patch is only adding confusion for
 no benefits that I'm able to understand.
The aim is to have a view and a materialized view return the same results.

If they do not, the user is guaranteed to be confused and to consider
the matview implementation broken.

The patch solves the general problem of "when the table changes, refresh".

After saying it like this, the problem could also be solved by including
xmin(s) for rows from the underlying table(s) in the matview.

Would this be a better approach?

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freezing without write I/O

2013-09-19 Thread Heikki Linnakangas

On 18.09.2013 16:22, Andres Freund wrote:

On 2013-09-16 16:59:28 +0300, Heikki Linnakangas wrote:

Here's a rebased version of the patch, including the above-mentioned fixes.
Nothing else new.


* We need some higherlevel description of the algorithm somewhere in the
   source. I don't think I've understood the concept from the patch alone
   without having read the thread previously.
* why do we need to do the PageUpdateNeedsFreezing() dance in
   heap_page_prune? No xids should change during it.


Ah, but its LSN does change. Moving the LSN forward might move the LSN 
from one xid-lsn range to another, requiring any XIDs on the page that 
fall outside the xid range of the new xid-lsn range to be frozen.



* Why can we do a GetOldestXmin(allDbs = false) in
   BeginXidLSNRangeSwitch()?


Looks like a bug. I think I got the arguments backwards, was supposed to 
be allDbs = true and ignoreVacuum = false. I'm not sure if vacuums could 
be ignored, but 'false' is the safe option.



* Is there any concrete reasoning behind the current values for
   XID_LSN_RANGE_INTERVAL and NUM_XID_LSN_RANGES or just gut feeling?


The values in the patch are:

#define NUM_XID_LSN_RANGES 100
#define XID_LSN_RANGE_INTERVAL 100

The interval is probably way too small for production, but a small value 
is handy for testing, as you can get through ranges faster. Something 
like 100 million would probably be a more suitable value for production. 
With a smaller interval, you need to freeze more often, while with a 
large interval, you can't truncate the clog as early.


NUM_XID_LSN_RANGES is also quite arbitrary. I don't have a good feeling 
on what the appropriate sizing for that would be. Something like 5 or 10 
would probably be enough. Although at the moment, the patch doesn't 
handle the situation that you run out of slots very well. You could 
merge some old ranges to make room, but ATM, it just stops creating new 
ones.



* the lsn ranges file can possibly become bigger than 512 bytes (the size
   we assume to be written atomically) and you write it in place. If we
   fail halfway through writing, we seem to be able to recover by using
   the pageMatureLSN from the last checkpoint, but it seems better to
   do the fsync(),rename(),fsync() dance either way.


The lsn-range array is also written to the xlog in toto whenever it 
changes, so on recovery, the ranges will be read from the WAL, and the 
ranges file will be recreated at the end-of-recovery checkpoint.



* Should we preemptively freeze tuples on a page in lazy_scan_heap if we
   already have dirtied the page? That would make future modifications
   cheaper.


Hmm, dunno. Any future modification will also dirty the page, so you're 
not actually saving any I/O by freezing it earlier. You're just choosing 
to do the CPU work and WAL insertion earlier than you have to. And if 
the page is not modified later, it is a waste of cycles. That said, 
maybe it would indeed be better to do it in vacuum when you have a 
chance to reduce latency in the critical path, even if it might be a waste.



* lazy_scan_heap now blocks acquiring a cleanup lock on every buffer
   that contains dead tuples. Shouldn't we use some kind of cutoff xid
   there? That might block progress too heavily. Also the comment above
   it still refers to the old logic.


Hmm, you mean like skip the page even if there are some dead tuples on 
it, as long as the dead tuples are not older than X xids? I guess we 
could do that. Or the other thing mentioned in the comments, ie. 
remember the page and come back to it later.


For now though, I think it's good enough as it is.


* There's no way to force a full table vacuum anymore, that seems
   problematic to me.


Yeah, it probably would be good to have that. An "ignore visibility map"
option.



* I wonder if CheckPointVarsup() doesn't need to update
   minRecoveryPoint.


Hmm,  I guess it should.


StartupVarsup() should be ok, because we should only
   read one from the future during a basebackup?


You might read one from the future during recovery, whether it's crash 
or archive recovery, but as soon as you've replayed up to the min 
recovery point, or the end of WAL on crash recovery, the XID ranges 
array in memory should be consistent with the rest of the system, 
because changes to the array are WAL logged.



* xidlsnranges_recently[_dirtied] are not obvious at first glance. Why
   can't we just reset dirty before the WriteXidLSNRangesFile() call?
   There's only one process doing the writeout. Just because the
   checkpointing process could be killed?


Right, if the write fails, you need to retry on the next checkpoint.


* I think we should either not require consuming an multixactid or use a
   function that doesn't need MultiXactIdSetOldestMember(). If the
   transaction doing so lives for long it will unnecessarily prevent
   truncation of mxacts.


Agreed, that was just a quick kludge to get it working.


* 

Re: [HACKERS] proposal: lob conversion functionality

2013-09-19 Thread Pavel Stehule
2013/9/19 Rushabh Lathia rushabh.lat...@gmail.com

 Hi Pavel,

 I have reviewed your patch.

 -- Patch got applied cleanly (using patch -p1)
 -- Make && make install works fine
 -- make check looks good

 I did a code walk-through and it looks good. Also did some manual testing
 and haven't found any issue with the implementation.

 The patch introduces two new APIs, load_lo() and make_lo(), for loading and
 saving from/to large objects. When it comes to dropping an lo object created
 using make_lo(), this still depends on the older API lo_unlink(). I think we
 should add that to the documentation for clarification.

 As a user of the lo functions, when I started testing this new API, the
 first question that came to mind was why a delete_lo() or destroy_lo() API
 is missing. Later I realized that the older lo_unlink() API is needed for
 that functionality, so I feel it's good to document that. Do let me know
 what you think.


Good idea.

I'll send an updated patch this evening.



 Otherwise the patch looks nice and clean.


Thank you :)

Regards

Pavel


 Regards,
 Rushabh Lathia
 www.EnterpriseDB.com



 On Sun, Aug 25, 2013 at 8:31 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

 Hello

 here is a patch

 it introduces load_lo and make_lo functions

 postgres=# select make_lo(decode('ff00','hex'));
  make_lo
 ─
24629
 (1 row)

 Time: 40.724 ms
 postgres=# select load_lo(24628);
   load_lo
 
  \xff00
 (1 row)

 postgres=# \lo_import ~/avatar.png
 lo_import 24630

 postgres=# select md5(load_lo(24630));
md5
 ──
  513f60836f3b625713acaf1c19b6ea78
 (1 row)

 postgres=# \q
 bash-4.1$ md5sum ~/avatar.png
 513f60836f3b625713acaf1c19b6ea78  /home/pavel/avatar.png

 Regards

 Pavel Stehule



 2013/8/22 Jov am...@amutu.com

 +1
 We badly need large object to bytea conversion functions.

 Once I had to use the ugly pg_read_file() to put some text into pg; I
 tried to use a large object but found it is useless without a function
 to convert a large object to bytea.

 Jov
 blog: http://amutu.com/blog


 2013/8/10 Pavel Stehule pavel.steh...@gmail.com

  Hello

 I had to enhance my older project, where XML documents are parsed and
 created on the server side - in PL/pgSQL and PL/Perl procedures. We would
 like to use the LO API for client-server communication, but we have to
 parse/serialize LOs on the server side.

 I found that there is no simple API for working with LOs from PL without
 access to the file system. I had to use ugly hacks:

 CREATE OR REPLACE FUNCTION parser.save_as_lob(bytea)
 RETURNS oid AS $$
 DECLARE
   _loid oid;
   _substr bytea;
 BEGIN
   _loid := lo_creat(-1);
   FOR i IN 0..length($1)/2048
   LOOP
 _substr := substring($1 FROM i * 2048 + 1 FOR 2048);
  IF _substr <> '' THEN
   INSERT INTO pg_largeobject(loid, pageno, data)
 VALUES(_loid, i, _substr);
 END IF;
   END LOOP;

   EXECUTE format('GRANT SELECT ON LARGE OBJECT %s TO ohs', _loid);
   RETURN _loid;
 END;
 $$ LANGUAGE plpgsql SECURITY DEFINER STRICT SET search_path =
 'pg_catalog';

 and

 CREATE OR REPLACE FUNCTION fbuilder.attachment_to_xml(attachment oid)
 RETURNS xml AS $$
 DECLARE
   b_cum bytea = '';
   b bytea;
 BEGIN
   FOR b IN SELECT l.data
   FROM pg_largeobject l
  WHERE l.loid = attachment_to_xml.attachment
  ORDER BY l.pageno
   LOOP
 b_cum := b_cum || b;
   END LOOP;
   IF NOT FOUND THEN
 RETURN NULL;
   ELSE
 RETURN xmlelement(NAME attachment,
encode(b_cum, 'base64'));
   END IF;
 END;
 $$ LANGUAGE plpgsql STRICT SECURITY DEFINER SET search_path =
 'pg_catalog';

 These functions could be simplified if we supported some functions like
 encode, decode for LOs

 So my proposal is creating functions:

 * lo_encode(loid oid) .. returns bytea
 * lo_encode(loid oid, encoding text) .. returns text
 * lo_make(loid oid, data bytea)
 * lo_make(loid oid, data text, encoding text)

 This can simplify all transformations between LO and VARLENA. A known
 limit is 1G for varlena, but that is still relatively high.

 Notes, comments?

 Regards

 Pavel


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers





 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




 --
 Rushabh Lathia



Re: [HACKERS] Assertions in PL/PgSQL

2013-09-19 Thread Pavel Stehule
2013/9/18 Jim Nasby j...@nasby.net

 On 9/14/13 11:55 PM, Pavel Stehule wrote:




 2013/9/15 Marko Tiikkaja ma...@joh.to


 On 2013-09-15 00:09, Pavel Stehule wrote:

  this is a possibility for introducing a new hook and the possibility
  to implement assertions and similar tasks in generic form (as an
  extension). It can be assertions, tracing, profiling.


 You can already do tracing and profiling in an extension.  I don't
 see what you would put inside the function body for these two, either.


 you cannot mark tracing points explicitly in current (now unsupported)
 extensions.

 These functions share the same pattern:

 CREATE OR REPLACE FUNCTION assert(boolean)
 RETURNS void AS $$
 BEGIN
   IF current_setting('plpgsql.assertions') = 'on' THEN
     IF NOT $1 THEN
       RAISE EXCEPTION 'Assert fails';
     END IF;
   END IF;
 END;
 $$ LANGUAGE plpgsql;

 CREATE OR REPLACE FUNCTION trace(text)
 RETURNS void AS $$
 BEGIN
   IF current_setting('plpgsql.trace') = 'on' THEN
     RAISE WARNING 'trace: %', $1;
   END IF;
 END;
 $$ LANGUAGE plpgsql;

 Depending on usage, these functions will not be extremely slow compared
 to a builtin solution - they can be faster if we implement them in C,
 and a little bit faster if we implement them as an internal PL/pgSQL
 statement. But if you use any non-trivial queries, then the overhead is
 probably not significant.

 You have to watch some global state variable and then execute (or not)
 some functionality.


 FWIW, we've written a framework (currently available in the EnovaTools
 project on pgFoundry) that allows for very, very fine-grained control over
 asserts.

 - Every assert has a name (and an optional sub-name) as well as a level
 - You can globally set the minimum level that will trigger an assert. This
 is useful for some debugging stuff; have an assert with a negative level
 and normally it won't fire unless you set the minimum level to be less than
 zero.
 - You can disable an assert globally (across all backends)
 - You can disable an assert only within your session

 We should eventually allow for disabling an assert only for your
 transaction; we just haven't gotten around to it yet.

 The reason for all this flexibility is the concept of "it should be very
 difficult but not impossible for the code to do X". We use it for
 sanity-checking things.


I think similar frameworks exist (we have some similar functionality) in
orafce too - and that is not a reason why we should not merge some
functions into core. I am with Marko in that some simple, user-friendly
statement for assertions would be a very nice plpgsql feature. I differ
in opinion on how to implement it and about the syntax. I prefer a
possibility (not necessarily implemented right away) to enhance this
feature for similar tasks (as a builtin or external feature).

Probably you and I share the same opinion that only a simple and very
primitive assert is not enough:

I see as useful features for assertions:

a) the possibility to specify a message (a two-parameter assert) - a
sketch follows below
b) the possibility to specify some threshold
c) the possibility to specify some level (exception, warning, notice) ..
the default should be exception
d) the possibility to specify a handled/unhandled exception
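A sketch of a) as a plain function (the statement syntax is still open;
the function and table names here are only illustrative):

CREATE OR REPLACE FUNCTION assert(cond boolean, msg text)
RETURNS void AS $$
BEGIN
  -- raise only when the asserted condition does not hold
  IF NOT cond THEN
    RAISE EXCEPTION 'Assertion failed: %', msg;
  END IF;
END;
$$ LANGUAGE plpgsql;

SELECT assert(count(*) > 0, 'my_table unexpectedly empty') FROM my_table;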

Regards

Pavel





 --
 Jim C. Nasby, Data Architect   j...@nasby.net
 512.569.9461 (cell) http://jim.nasby.net



Re: [HACKERS] Assertions in PL/PgSQL

2013-09-19 Thread Marko Tiikkaja

On 9/18/13 5:11 PM, Pavel Stehule wrote:

In this code a assert fail can be lost in app log. Or can be knowingly
handled and ignored - what is wrong, and should not be allowed.

When I wrote a little bit complex procedures, I had to use a EXCEPTION WHEN
OTHERS clause - because I would not to lost a transaction. It worked, but
searching a syntax errors was significantly harder - so on base of this
experience I am thinking so some errors can be handled (related to database
usage) and others not - like syntax errors in PL/pgSQL or possible
assertions (although we can handle syntax error, but I don't think so it is
practical). It significantly increase a work that is necessary for error
identification.


I think that's a fair point.



Regards,
Marko Tiikkaja



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assertions in PL/PgSQL

2013-09-19 Thread Marko Tiikkaja

On 9/19/13 2:08 PM, Pavel Stehule wrote:

I think similar frameworks exist (we have some similar
Probably you and I share the same opinion that only a simple and very
primitive assert is not enough:

I see as useful features for assertions:

a) the possibility to specify a message (a two-parameter assert)
b) the possibility to specify some threshold
c) the possibility to specify some level (exception, warning, notice) ..
the default should be exception
d) the possibility to specify a handled/unhandled exception


I think these are all neat ideas on how to further improve this feature. 
 I'd like to see at least a) in 9.4, but I haven't yet looked at how it 
could be implemented.



Regards,
Marko Tiikkaja


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator

2013-09-19 Thread Kevin Grittner
Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Kevin Grittner kgri...@ymail.com writes:

 The problem, as I see it, is that the view and the concurrently
 refreshed materialized view don't yield the same results for the
 same query.  The rows are equal, but they are not the same. 
 With the patch the matview, after RMVC, looks just the same as
 the view.

 My understanding is that if you choose citext then you don't care
 at all about the case

That's not my understanding.  If that was what citext was for it
would be much simpler to force the case in creating each value.  It
*preserves* the case for display, but ignores it for comparisons.
That's the contract of the type, like it or not.  Equal does not
mean the same.  They clearly want to preserve and display
differences among equal values.
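To make that concrete (this is just stock citext behaviour, independent
of the patch):

SELECT 'PostgreSQL'::citext = 'postgresql'::citext;  -- true: the values are equal
SELECT 'PostgreSQL'::citext::text;                   -- 'PostgreSQL': case is preserved

The two values compare as equal, yet they are not byte-for-byte the
same - which is exactly the distinction at issue.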

Streaming replication would preserve the differences.  pg_dump and
restore of the data would preserve the differences.  SELECTing the
data shows the differences.  suppress_redundant_updates_trigger()
will not suppress an UPDATE setting an equal value unless it is
also *the same*.  A normal VIEW shows the differences. RMV without
CONCURRENTLY will show the difference.  I'm suggesting that RMVC
and the upcoming incremental update of matviews should join this
crowd.

A matview which is being refreshed with the CONCURRENTLY option, or
maintained by incremental maintenance should not look different
from a regular VIEW, and should not suddenly look completely
different after a non-concurrent REFRESH.  If things are left that
way, people will quite justifiably feel that matviews are broken.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator

2013-09-19 Thread Andres Freund
On 2013-09-19 05:33:22 -0700, Kevin Grittner wrote:
 Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
  Kevin Grittner kgri...@ymail.com writes:
 
  The problem, as I see it, is that the view and the concurrently
  refreshed materialized view don't yield the same results for the
  same query.  The rows are equal, but they are not the same. 
  With the patch the matview, after RMVC, looks just the same as
  the view.
 
  My understanding is that if you choose citext then you don't care
  at all about the case
 
 That's not my understanding.  If that was what citext was for it
 would be much simpler to force the case in creating each value.  It
 *preserves* the case for display, but ignores it for comparisons.
 That's the contract of the type, like it or not.  Equal does not
 mean the same.  They clearly want to preserve and display
 differences among equal values.

I agree.

I am not 100% sure if the can of worms this opens is worth the trouble,
but from my POV it's definitely an understandable and sensible goal.

My complaints about this subfeature were never about trying to get
that right for matviews...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assertions in PL/PgSQL

2013-09-19 Thread Pavel Stehule
2013/9/19 Marko Tiikkaja ma...@joh.to

 On 9/19/13 2:08 PM, Pavel Stehule wrote:

  I think similar frameworks exist (we have some similar
  Probably you and I share the same opinion that only a simple and very
  primitive assert is not enough:

  I see as useful features for assertions:

  a) the possibility to specify a message (a two-parameter assert)
  b) the possibility to specify some threshold
  c) the possibility to specify some level (exception, warning, notice) ..
  the default should be exception
  d) the possibility to specify a handled/unhandled exception


 I think these are all neat ideas on how to further improve this feature.
  I'd like to see at least a) in 9.4, but I haven't yet looked at how it
 could be implemented.


Not all of it must be implemented in 9.4, although it is more or less only
exception parametrization - not hard to implement.

But the syntax should be prepared for this functionality beforehand (or
should at minimum be extensible). The bison parser is not friendly to later
extension :( - and with a bad design now we can break future extension
simply at the syntax level. That is the reason why I am making noise here.
I remember the relatively difficult extension of the RAISE statement.

Regards

Pavel





 Regards,
 Marko Tiikkaja



Re: [HACKERS] Freezing without write I/O

2013-09-19 Thread Andres Freund
Hi,

On 2013-09-19 14:42:19 +0300, Heikki Linnakangas wrote:
 On 18.09.2013 16:22, Andres Freund wrote:
 * Why can we do a GetOldestXmin(allDbs = false) in
BeginXidLSNRangeSwitch()?
 
 Looks like a bug. I think I got the arguments backwards, was supposed to be
 allDbs = true and ignoreVacuum = false. I'm not sure if vacuums could be
 ignored, but 'false' is the safe option.

Not sure either...

  * the lsn ranges file can possibly become bigger than 512 bytes (the size
 we assume to be written atomically) and you write it in place. If we
 fail halfway through writing, we seem to be able to recover by using
 the pageMatureLSN from the last checkpoint, but it seems better to
 do the fsync(),rename(),fsync() dance either way.
 
 The lsn-range array is also written to the xlog in toto whenever it changes,
 so on recovery, the ranges will be read from the WAL, and the ranges file
 will be recreated at the end-of-recovery checkpoint.

But we will read the file before we read the WAL record covering it,
won't we?

  * Should we preemptively freeze tuples on a page in lazy_scan_heap if we
 already have dirtied the page? That would make future modifications
 cheaper.

 Hmm, dunno. Any future modification will also dirty the page, so you're not
 actually saving any I/O by freezing it earlier. You're just choosing to do
 the CPU work and WAL insertion earlier than you have to. And if the page is
 not modified later, it is a waste of cycles. That said, maybe it would
 indeed be better to do it in vacuum when you have a chance to reduce latency
 in the critical path, even if it might be a waste.

Well, if we already dirtied the page (as in vacuum itself, to remove
dead tuples) we already have to do WAL. True it's a separate record, but
that should probably be fixed.

 * lazy_scan_heap now blocks acquiring a cleanup lock on every buffer
that contains dead tuples. Shouldn't we use some kind of cutoff xid
there? That might block progress too heavily. Also the comment above
it still refers to the old logic.
 
 Hmm, you mean like skip the page even if there are some dead tuples on it,
 as long as the dead tuples are not older than X xids? I guess we could do
 that. Or the other thing mentioned in the comments, ie. remember the page
 and come back to it later.

 For now though, I think it's good enough as it is.

I have seen vacuum being slowed down considerably because we couldn't
acquire a cleanup lock. It's not that infrequent to have pages that are
pinned most of the time. And that was when we only acquired the cleanup
lock when necessary for freezing.

Not processing the page will not mark it as all-visible which basically
is remembering it...

 * switchFinishXmin and nextSwitchXid should probably be either volatile
or have a compiler barrier between accessing shared memory and
checking them. The compiler very well could optimize them away and
access shmem all the time which could lead to weird results.
 
 Hmm, that would be a weird optimization. Is that really legal for the
 compiler to do? Better safe than sorry, I guess.

I think it is. The compiler doesn't know anything about shared memory or
threads, so it doesn't see that those parameters can change. It will be
forced to load them as soon as a non-inlined function is called, which
actually might make it safe in this case - unless you compile in LTO
mode where it recognizes that TransactionIdFollowsOrEquals() won't
modify things...

 * I wonder whether the fact that we're doing the range switches after
acquiring an xid could be problematic if we're preventing xid
allocation due to the checks earlier in that function?
 
 You mean that you might get into a situation where you cannot assign a new
 XID because you've reached xidStopLimit, and you cannot advance xidStopLimit
 because you can't switch to a new xid-lsn range, because no new XIDs are
 being assigned? Yeah, that would be nasty. The range management stuff should
 really be moved out of GetNewTransaction(), maybe do them in walwriter or
 bgwriter as Alvaro suggested.

Yes, I think we should do that. I am not sure yet where best to put it,
though. Doing it in the wal writer doesn't seem to be a good idea; doing
more there will increase latency for normal backends.

 * I think heap_lock_tuple() needs to unset all-visible, otherwise we
won't vacuum that page again which can lead to problems since we
don't do full-table vacuums again?
 
 It's OK if the page is never vacuumed again, the whole point of the patch is
 that old XIDs can be just left lingering in the table. The next time the
 page is updated, perhaps to lock a tuple again, the page will be frozen and
 the old xmax will be cleared.

Yes, you're right, it should be possible to make it work that way. But
currently, we look at xmax and infomask of a tuple in heap_lock_tuple()
*before* the PageUpdateNeedsFreezing() call. Currently we would thus
create a new multixact with long dead xids as members and 

Re: [HACKERS] record identical operator

2013-09-19 Thread Kevin Grittner
Hannu Krosing ha...@2ndquadrant.com wrote:

 the patch solves the general problem of when the table changes,
 refresh

  After saying it like this, the problem could also be solved by
  including xmin(s) for rows from the underlying table(s) in the
  matview.

  Would this be a better approach?

Now you're moving from REFRESH territory into incremental
maintenance.  There have been a very large number of papers written
on that topic, I have reviewed the literature, and I have picked a
technique which looks like it will be fast and reliable -- based on
relational algebra.  Let's save discussion of alternatives such as
you're suggesting here, for when I get past the easy stuff ... like
just refreshing a view so that the new contents are the same as
what you would see if you re-ran the query defining the matview.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix

2013-09-19 Thread Peter Eisentraut
Something is weird in your latest patch.  The header is:

diff -u -r b/postgresql/doc/src/sgml/config.sgml 
a/postgresql/doc/src/sgml/config.sgml
--- b/postgresql/doc/src/sgml/config.sgml   2013-09-09 23:40:52.356371191 
+1200
+++ a/postgresql/doc/src/sgml/config.sgml   2013-09-19 22:13:26.236681468 
+1200

This doesn't apply with patch -p1 or -p0.  (You need -p2.)

Your previous patch in this thread seemed OK.  You might want to check what you 
did there.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] information schema parameter_default implementation

2013-09-19 Thread Amit Khandekar
On 15 September 2013 01:35, Peter Eisentraut pete...@gmx.net wrote:

 Here is an updated patch which fixes the bug you have pointed out.

 On Thu, 2013-01-31 at 18:59 +0500, Ali Dar wrote:

  I checked out your patch. There seems to be an issue when we have OUT
  parameters after the DEFAULT values.

 Fixed.

  Some other minor observations:
  1) Some variables are not aligned in pg_get_function_arg_default().

 Are you referring to indentation issues?  I think the indentation is
 good, but pgindent will fix it anyway.

I find only the proc variable not aligned. Adding one more tab for all the
variables should help.

  2) I found the following check a bit confusing, maybe you can make it
  better
  if (!argmodes || argmodes[i] == PROARGMODE_IN || argmodes[i] ==
  PROARGMODE_INOUT || argmodes[i] == PROARGMODE_VARIADIC)

 Factored that out into a separate helper function.


The statement needs to be split so that lines fit within 80 columns.

 
  2) inputargn can be assigned in declaration.

 I'd prefer to initialize it close to where it is used.

Me too.

  3) Function level comment for pg_get_function_arg_default() is
  missing.

 I think the purpose of the function is clear.

Unless a reader goes through the definition, it is not obvious whether the
second function argument represents the parameter position among the input
parameters or among *all* the function parameters (i.e. proargtypes or
proallargtypes). I think at least a one-line description of what this
function does should be present.

  4) You can also add comments inside the function, for example the
  comment for the line:
  nth = inputargn - 1 - (proc->pronargs - proc->pronargdefaults);

 Suggestion?

Yes, it is difficult to understand the logic behind this calculation until
we go and read the documentation related to pg_proc.proargdefaults. I think
this should be sufficient:
/*
 * proargdefaults elements always correspond to the last N input arguments,
 * where N = pronargdefaults. So calculate the nth_default accordingly.
 */
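A worked example of that calculation (the function is hypothetical):

CREATE FUNCTION f(a int, b int DEFAULT 1, c int DEFAULT 2)
RETURNS int AS 'SELECT 0' LANGUAGE sql;
-- pronargs = 3, pronargdefaults = 2, proargdefaults = (defaults of b, c)
-- inputargn = 3 (argument c): nth = 3 - 1 - (3 - 2) = 1, i.e. c's default
-- inputargn = 2 (argument b): nth = 2 - 1 - (3 - 2) = 0, i.e. b's default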


  5) I think the line added in the
  documentation(informational_schema.sgml) is very long. Consider
  revising. Maybe change from:
 
  The default expression of the parameter, or null if none or if the
  function is not owned by a currently enabled role. TO
 
  The default expression of the parameter, or null if none was
  specified. It will also be null if the function is not owned by a
  currently enabled role.
 
  I don't know what do you exactly mean by: function is not owned by a
  currently enabled role?

 I think this style is used throughout the documentation of the
 information schema.  We need to keep the descriptions reasonably
 compact, but I'm willing to entertain other opinions.

This requires first an answer to my earlier question about why the
role-related privilege check is needed.
---
There should be an initial check to see if nth_arg is passed a negative
value. It is used as an index into the argmodes array, so a negative array
index can cause a crash. Because pg_get_function_arg_default() can now be
called by a user as well, we need to validate the input values. I am OK
with returning an "Invalid argument" error.
---
Instead of :
deparse_expression_pretty(node, NIL, false, false, 0, 0)
you can use :
deparse_expression(node, NIL, false, false)

We are anyway not using pretty printing.
---
Other than this, I have no more issues.

---
After the final review is over, the catversion.h requires the appropriate
date value.




 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] record identical operator

2013-09-19 Thread Hannu Krosing
On 09/19/2013 02:41 PM, Kevin Grittner wrote:
 Hannu Krosing ha...@2ndquadrant.com wrote:

 the patch solves the general problem of when the table changes,
 refresh

 After saying it like this, the problem could also be solved by
 including xmin(s) for rows from the underlying table(s) in the
 matview.

 Would this be a better approach?
 Now you're moving from REFRESH territory into incremental
 maintenance.  There have been a very large number of papers written
 on that topic, I have reviewed the literature, and I have picked a
 technique which looks like it will be fast and reliable -- based on
 relational algebra.  Let's save discussion of alternatives such as
 you're suggesting here, for when I get past the easy stuff ... like
 just refreshing a view so that the new contents are the same as
 what you would see if you re-ran the query defining the matview.
I'm sure that comparing xmins would work much like binary
comparison of records for detecting possible
changes in 99.999% of real-world cases.

I am also pretty sure it would have its own cans of worms
all over again when trying to actually implement this in
matviews :)

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freezing without write I/O

2013-09-19 Thread Andres Freund
On 2013-09-19 14:40:35 +0200, Andres Freund wrote:
  * I think heap_lock_tuple() needs to unset all-visible, otherwise we
 won't vacuum that page again which can lead to problems since we
 don't do full-table vacuums again?
  
  It's OK if the page is never vacuumed again, the whole point of the patch is
  that old XIDs can be just left lingering in the table. The next time the
  page is updated, perhaps to lock a tuple again, the page will be frozen and
  the old xmax will be cleared.
 
 Yes, you're right, it should be possible to make it work that way. But
 currently, we look at xmax and infomask of a tuple in heap_lock_tuple()
 *before* the PageUpdateNeedsFreezing() call. Currently we would thus
 create a new multixact with long dead xids as members and such.

That reminds me of something:

There are lots of checks sprayed around that unconditionally look at
xmin, xmax or infomask.

Things I noticed in a quick look after thinking about the previous
statement:
* AlterEnum() looks at xmin
* The PLs look at xmin
* So do, AFAIR, VACUUM FREEZE, COPY and such.
* Several HeapTupleSatisfiesVacuum() callers look at xmin and stuff
  without checking for maturity or freezing the page first.
  * lazy_scan_heap() itself if the page isn't already marked all_visible
  * heap_page_is_all_visible()
* rewrite_heap_tuple() does an unconditional heap_freeze_tuple() without
  considering maturity/passing in Invalid*
* heap_page_prune_opt() shouldn't do anything on a mature (or even all
  visible) page.
* heap_page_prune() marks a buffer dirty, writes a new xid without
  changing the lsn.
* heap_prune_chain() looks at tuples without considering page maturity,
  but the current implementation actually looks safe.
* CheckForSerializableConflictOut() looks at xmins.
* pgrowlocks() looks at xmax unconditionally.
* heap_update() computes, just like heap_lock_tuple(), a new
  xmax/infomask before freezing.
* Same for heap_lock_updated_tuple(). Not sure if that's an actual
  concern, but it might if PageMatureLSN advances or such.
* heap_getsysattr() should probably return different things for mature
  pages.

There's probably more.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-19 Thread Sawada Masahiko
On Thu, Sep 19, 2013 at 7:32 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Sep 19, 2013 at 7:07 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Thu, Sep 19, 2013 at 12:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Sep 19, 2013 at 11:48 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 I attached the patch which I have modified.

 Thanks for updating the patch!

 Here are the review comments:


 Thank you for reviewing!

 I got the compiler warning:

 syncrep.c:112: warning: unused variable 'i'

 How does synchronous_transfer work with synchronous_commit?

  Currently, synchronous_transfer doesn't work when
  synchronous_commit is set to 'off' or 'local'.
  If the user changes the synchronous_commit value within a transaction,
  the checkpointer process can't see it.
  Due to that, even if synchronous_commit is changed from 'on' to 'off',
  synchronous_transfer doesn't work.
  I'm planning to modify the patch so that synchronous_transfer is not
  affected by synchronous_commit.

 Hmm... when synchronous_transfer is set to data_flush,
 IMO the intuitive behaviors are

 (1) synchronous_commit = on
 A data flush should wait for the corresponding WAL to be
 flushed in the standby

 (2) synchronous_commit = remote_write
 A data flush should wait for the corresponding WAL to be
 written to OS in the standby.

 (3) synchronous_commit = local
 (4) synchronous_commit = off
 A data flush should wait for the corresponding WAL to be
 written locally in the master.


That is a good idea.
So the synchronous_commit value needs to be visible to other processes.
To share synchronous_commit with other processes, I will try to put the
synchronous_commit value into shared memory.
Is there already a GUC parameter which is shared with other processes?
I tried to find such a parameter, but there was none.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6

2013-09-19 Thread Robert Haas
On Tue, Sep 17, 2013 at 10:31 AM, Andres Freund and...@2ndquadrant.com wrote:
 Rebased patches attached.

I spent a bit of time looking at these patches yesterday and today.
It seems to me that there's a fair amount of stylistic cleanup that is
still needed, and some pretty bad naming choices, and some FIXMEs that
should probably be fixed, but for an initial review email it seems
more helpful to focus on high-level design, so here goes.

- Looking specifically at the 0004 patch, I think that the
RecentGlobalDataXmin changes are logically separate from the rest of
the patch, and that if we're going to commit them at all, it should be
separate from the rest of this.  I think this is basically a
performance optimization.  AFAICS, there's no correctness problem with
the idea of continuing to maintain a single RecentGlobalXmin; it's
just that you'll potentially end up with quite a lot of bloat.  But
that's not an argument that those two things HAVE to be committed
together; either could be done first, and independently of the other.
Also, these changes are quite complex and it's difficult to form a
judgement as to whether this idea is safe when they are intermingled
with the rest of the logical replication stuff.

More generally, the thing that bugs me about this approach is that
logical replication is not really special, but what you've done here
MAKES it special. There are plenty of other situations where we are
too aggressive about holding back xmin.  A single-statement
transaction that selects from a single table holds back xmin for the
entire cluster, and that is Very Bad.  It would probably be unfair to
say, well, you have to solve that problem first.  But on the other
hand, how do we know that the approach you've adopted here isn't going
to make the more general problem harder to solve?  It seems invasive
at a pretty low level.  I think we should at least spend some time
thinking about what *general* solutions to this problem would look
like and then decide whether this approach is likely to be
forward-compatible with those solutions.

- There are no changes to the doc directory.  Obviously, if you're
going to add a new value for the wal_level GUC, it's gonna need to be
documented. Similarly, pg_receivellog needs to be documented.  In all
likelihood, you also need a whole chapter providing general background
on this technology.  A couple of README files is not going to do it,
and those files aren't suitable for check-in anyway (e.g. DESIGN.txt
makes reference to a URL where the current version of some patch can
be found; that's not appropriate for permanent documentation).  But
aside from that, what we really need here is user documentation, not
developer documentation.  I can perhaps pass judgement on whether the
guts of this functionality do things that are fundamentally unsafe,
but whether the user interface is good or bad is a question that
deserves broader input, and without documentation, most people aren't
going to understand it well enough to know whether they like it.  And
TBH, *I* don't really want to reverse-engineer what pg_receivellog
does from a read-through of the code, either.

- Given that we don't reassemble transactions until commit time, why
do we need to ensure that XIDs are logged before their sub-XIDs
appear in WAL?  As I've said previously, I actually think that
on-the-fly reassembly is probably going to turn out to be very
important.  But if we're not getting that, do we really need this?
Also, while I'm trying to keep this email focused on high-level
concerns, I have to say that guaranteedlyLogged has got to be one of
the worst variable names I've ever seen, starting (but not ending)
with the fact that guaranteedly is not a word.  I'm also tempted to
say that all of the wal_level=logical changes should be split out as
their own patch, separate from the decoding stuff.  Personally, I
would find that much easier to review, although I admit it's less
critical here than for the RecentGlobalDataXmin stuff.

- If XLOG_HEAP_INPLACE is not decoded, doesn't that mean that this
facility can't be used to replicate a system catalog table?  Is that a
restriction we should enforce/document somehow?

- The new code is rather light on comments.  decode.c is extremely
light. For example, consider the function DecodeAbort(), which
contains the following comment:

+   /*
+* this is a bit grotty, but if we're faking an abort we've
already gone
+* through
+*/

Well, I have no idea what that means.  I'm sure you do, but I bet the
next person who isn't you that tries to understand this probably
won't.  It's also probably true that I could figure it out if I spent
more time on it, but I think the point of comments is to keep the
amount of time that must be spent trying to understand code to a
manageable level.  Generally, I'd suggest that any non-trivial
functions in these files should have a header comment explaining what
their job is; e.g. for DecodeStandbyOp you 

Re: [HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

2013-09-19 Thread Robert Haas
On Tue, Sep 17, 2013 at 2:27 PM, Andrew Gierth
and...@tao11.riddles.org.uk wrote:
 Robert == Robert Haas robertmh...@gmail.com writes:

   Someone should do the same in WaitForBackgroundWorkerStartup so
   that building with -Werror works.

  Robert I don't get a warning there.  Can you be more specific about
  Robert the problem?

 bgworker.c: In function 'WaitForBackgroundWorkerStartup':
 bgworker.c:866: warning: 'pid' may be used uninitialized in this function

Does the attached patch fix it for you?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


bgworker-wait-fix.patch
Description: Binary data



Re: [HACKERS] Some interesting news about Linux 3.12 OOM

2013-09-19 Thread Robert Haas
On Wed, Sep 18, 2013 at 10:09 PM, Daniel Farina dan...@heroku.com wrote:
 I'm not sure how many of you have been tracking this but courtesy of
 lwn.net I have learned that it seems that the OOM killer behavior in
 Linux 3.12 will be significantly different.  And by description, it
 sounds like an improvement.  I thought some people reading -hackers
 might be interested.

 Based on the description at lwn, excerpted below, it sounds like the
 news might be that systems with overcommit on might return ENOMEM when a
 non-outlandish request for memory is made of the kernel.

 
 Johannes Weiner has posted a set of patches aimed at improving this
 situation. Following a bunch of cleanup work, these patches make two
 fundamental changes to how OOM conditions are handled in the kernel.
 The first of those is perhaps the most visible: it causes the kernel
 to avoid calling the OOM killer altogether for most memory allocation
 failures. In particular, if the allocation is being made in response
 to a system call, the kernel will just cause the system call to fail
 with an ENOMEMerror rather than trying to find a process to kill. That
 may cause system call failures to happen more often and in different
 contexts than they used to. But, naturally, that will not be a problem
 since all user-space code diligently checks the return status of every
 system call and responds with well-tested error-handling code when
 things go wrong.
 

 Subject to experiment, this may be some good news, as many programs,
 libraries, and runtime environments that may run parallel to Postgres
 on a machine are pretty lackadaisical about limiting the amount of
 virtual memory charged to them, and overcommit off is somewhat
 punishing in those situations if one really needed a large hash table
 from Postgres or whatever.  I've seen some cases here where a good
 amount of VM has been reserved and caused apparent memory pressure
 that cut throughput short of what ought to be possible.

Yes, that does sound good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical changeset generation v6

2013-09-19 Thread Andres Freund
Hi Robert,

On 2013-09-19 10:02:31 -0400, Robert Haas wrote:
 On Tue, Sep 17, 2013 at 10:31 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Rebased patches attached.
 
 I spent a bit of time looking at these patches yesterday and today.
 It seems to me that there's a fair amount of stylistic cleanup that is
 still needed, and some pretty bad naming choices, and some FIXMEs that
 should probably be fixed, but for an initial review email it seems
 more helpful to focus on high-level design, so here goes.

Thanks for looking at it.

Yes, I think the high-level stuff is the important bit.

As you note, the documentation needs to be written and that's certainly
not a small task. But doing so before the high-level design is agreed
upon makes it too likely that it will need to be entirely scrapped.

 - Looking specifically at the 0004 patch, I think that the
 RecentGlobalDataXmin changes are logically separate from the rest of
 the patch, and that if we're going to commit them at all, it should be
 separate from the rest of this.  I think this is basically a
 performance optimization.  AFAICS, there's no correctness problem with
 the idea of continuing to maintain a single RecentGlobalXmin; it's
 just that you'll potentially end up with quite a lot of bloat.  But
 that's not an argument that those two things HAVE to be committed
 together; either could be done first, and independently of the other.
 Also, these changes are quite complex and it's difficult to form a
 judgement as to whether this idea is safe when they are intermingled
 with the rest of the logical replication stuff.

Up until v3 the RecentGlobalDataXmin stuff wasn't included and reviewers
(primarily Peter G. on -hackers and Greg Stark at pgconf.eu) remarked on
that and considered it critical. I argued for a considerable amount of
time that it shouldn't be done in an initial patch and then gave in.

They have a point though, if you e.g. replicate a pgbench -j16 workload
the addition of RecentGlobalDataXmin reduces the performance impact of
replication from about 60% to less than 5% in my measurements. Turns out
heap pruning is damn critical for that kind of workload.

 More generally, the thing that bugs me about this approach is that
 logical replication is not really special, but what you've done here
 MAKES it special. There are plenty of other situations where we are
 too aggressive about holding back xmin.  A single-statement
 transaction that selects from a single table holds back xmin for the
 entire cluster, and that is Very Bad.  It would probably be unfair to
 say, well, you have to solve that problem first.  But on the other
 hand, how do we know that the approach you've adopted here isn't going
 to make the more general problem harder to solve?  It seems invasive
 at a pretty low level.

The reason why I think it's actually different is that the user actually
has control over how long transactions are running on the primary. They
don't really control how fast a replication consumer consumes and how
often it sends feedback messages.

 I think we should at least spend some time
 thinking about what *general* solutions to this problem would look
 like and then decide whether this approach is likely to be
 forward-compatible with those solutions.

I thought about the general case for a good bit and decided that all
solutions that work in a more general scenario are complex enough that I
don't want to implement them. And I don't really see any backward
compatibility concerns here - removing the logic of using a separate
horizon for user tables in contrast to system tables is pretty trivial
and shouldn't have any external effect. Except pegging the horizon more,
but that's what the new approach would fix, right?

 - Given that we don't reassemble transactions until commit time, why
 do we need to ensure that XIDs are logged before their sub-XIDs
 appear in WAL?

Currently it's important to know where the oldest transaction that is
still alive started, to determine from where we need to restart
decoding. That's done by keeping an lsn-ordered list of in-progress
toplevel transactions. The above invariant makes it cheap to maintain
that list.

 As I've said previously, I actually think that
 on-the-fly reassembly is probably going to turn out to be very
 important. But if we're not getting that, do we really need this?

It's also preparatory for supporting that.

I agree that it's pretty important, but after actually having
implemented a replication solution using this, I still think that most
use cases won't use it when available. I plan to work on implementing
that.

 Also, while I'm trying to keep this email focused on high-level
 concerns, I have to say that guaranteedlyLogged has got to be one of
 the worst variable names I've ever seen, starting (but not ending)
 with the fact that guaranteedly is not a word.  I'm also tempted to
 say that all of the wal_level=logical changes should be split out as
 their own patch, separate from the 

Re: [HACKERS] Some interesting news about Linux 3.12 OOM

2013-09-19 Thread Merlin Moncure
On Thu, Sep 19, 2013 at 9:12 AM, Robert Haas robertmh...@gmail.com wrote:
 But, naturally, that will not be a problem
 since all user-space code diligently checks the return status of every
 system call and responds with well-tested error-handling code when
 things go wrong.

That just short circuited my sarcasm detector.

merlin




Re: [HACKERS] Some interesting news about Linux 3.12 OOM

2013-09-19 Thread Robert Haas
On Thu, Sep 19, 2013 at 11:30 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, Sep 19, 2013 at 9:12 AM, Robert Haas robertmh...@gmail.com wrote:
 But, naturally, that will not be a problem
 since all user-space code diligently checks the return status of every
 system call and responds with well-tested error-handling code when
 things go wrong.

 That just short circuited my sarcasm detector.

I laughed, too, but the reality is that at least as far as PG is
concerned it's probably a truthful statement, and if it isn't, nobody
here is likely to complain about having to fix it.  Yeah, there's a
lot of other code out there not as well written or maintained as PG,
but using SIGKILL as a substitute for ENOMEM because people might not
be checking the return value for malloc() is extremely heavy-handed
nannyism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Some interesting news about Linux 3.12 OOM

2013-09-19 Thread Andres Freund
On 2013-09-19 18:23:07 +0200, Dimitri Fontaine wrote:
 I've been told on several occasions that this was done for the JVM
 and other such programs that want to allocate huge amounts of memory even
 if they don't really intend to use it.

That's not really related - what you describe is memory overcommitting
(which has lots of uses besides JVMs). That's not removed by the changes
referenced upthread.
What has changed is how to react to situations where memory has been
overcommitted but is now actually needed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Some interesting news about Linux 3.12 OOM

2013-09-19 Thread Robert Haas
On Thu, Sep 19, 2013 at 12:02 PM, Andres Freund and...@2ndquadrant.com wrote:
 The problem is that it's not just about malloc() (aka brk() and
 mmap()) and friends. It's about many of the other systemcalls. Like
 e.g. send() to name one of the more likely ones.

*shrug*

If you're calling send() and not testing for a -1 return value,
you're writing amazingly bad code anyway.  And if you ARE testing for
-1, you'll probably do something at least mildly sensible with a
not-specifically-foreseen errno value, like print a message that
includes %m.  That's about what we'd probably do, and I have to
imagine what most people would do.
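For illustration, the minimal pattern being described (a sketch, not
PostgreSQL source; %m is the glibc extension that formats errno):

    if (send(sock, buf, len, 0) < 0)
        fprintf(stderr, "could not send data: %m\n");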

I'm not saying it won't break anything to return a proper error code;
I'm just saying that sending SIGKILL is worse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Some interesting news about Linux 3.12 OOM

2013-09-19 Thread Andres Freund
On 2013-09-19 11:49:05 -0400, Robert Haas wrote:
 On Thu, Sep 19, 2013 at 11:30 AM, Merlin Moncure mmonc...@gmail.com wrote:
  On Thu, Sep 19, 2013 at 9:12 AM, Robert Haas robertmh...@gmail.com wrote:
  But, naturally, that will not be a problem
  since all user-space code diligently checks the return status of every
  system call and responds with well-tested error-handling code when
  things go wrong.
 
  That just short circuited my sarcasm detector.
 
 I laughed, too, but the reality is that at least as far as PG is
 concerned it's probably a truthful statement, and if it isn't, nobody
 here is likely to complain about having to fix it.  Yeah, there's a
 lot of other code out there not as well written or maintained as PG,
 but using SIGKILL as a substitute for ENOMEM because people might not
 be checking the return value for malloc() is extremely heavy-handed
 nannyism.

The problem is that it's not just about malloc() (aka brk() and
mmap()) and friends. It's about many of the other systemcalls. Like
e.g. send() to name one of the more likely ones.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v6

2013-09-19 Thread Robert Haas
On Thu, Sep 19, 2013 at 10:43 AM, Andres Freund and...@2ndquadrant.com wrote:
 - Looking specifically at the 0004 patch, I think that the
 RecentGlobalDataXmin changes are logically separate from the rest of
 the patch, and that if we're going to commit them at all, it should be
 separate from the rest of this.  I think this is basically a
 performance optimization.  AFAICS, there's no correctness problem with
 the idea of continuing to maintain a single RecentGlobalXmin; it's
 just that you'll potentially end up with quite a lot of bloat.  But
 that's not an argument that those two things HAVE to be committed
 together; either could be done first, and independently of the other.
 Also, these changes are quite complex and it's difficult to form a
 judgement as to whether this idea is safe when they are intermingled
 with the rest of the logical replication stuff.

 Up until v3 the RecentGlobalDataXmin stuff wasn't included and reviewers
 (primarily Peter G. on -hackers and Greg Stark at pgconf.eu) remarked on
 that and considered it critical. I argued for a considerable amount of
 time that it shouldn't be done in an initial patch and then gave in.

 They have a point though, if you e.g. replicate a pgbench -j16 workload
 the addition of RecentGlobalDataXmin reduces the performance impact of
 replication from about 60% to less than 5% in my measurements. Turns out
 heap pruning is damn critical for that kind of workload.

No question.  I'm not saying that that optimization shouldn't go in
right after the main patch does, but IMHO right now there are too many
things going in the 0004 patch to discuss them all simultaneously.
I'd like to find a way of splitting this up that will let us
block-and-tackle individual pieces of it, even if we end up committing
them all one right after the other.

But that raises an interesting question: why is the overhead so bad?
I mean, this shouldn't be any worse than having a series of
transactions running concurrently with pgbench that take a snapshot
and hold it for as long as it takes the decoding process to decode the
most-recently committed transaction.  Is the issue here that we can't
advance xmin until we've fsync'd the fruits of decoding down to disk?
If so, that's mighty painful.  But we'd really only need to hold back
xmin in that situation when some catalog change has occurred
meanwhile, which for pgbench means never.  So something seems fishy
here.

 I thought about the general case for a good bit and decided that all
 solutions that work in a more general scenario are complex enough that I
 don't want to implement them. And I don't really see any backward
 compatibility concerns here - removing the logic of using a separate
 horizon for user tables in contrast to system tables is pretty trivial
 and shouldn't have any external effect. Except pegging the horizon more,
 but that's what the new approach would fix, right?

Hmm, maybe.

 Also, while I'm trying to keep this email focused on high-level
 concerns, I have to say that guaranteedlyLogged has got to be one of
 the worst variable names I've ever seen, starting (but not ending)
 with the fact that guaranteedly is not a word.  I'm also tempted to
 say that all of the wal_level=logical changes should be split out as
 their own patch, separate from the decoding stuff.  Personally, I
 would find that much easier to review, although I admit it's less
 critical here than for the RecentGlobalDataXmin stuff.

 I can do that again and it actually was that way in the past. But
 there's no user for it before the later patch and it's hard to
 understand the reasoning for the changed wal logging separately, that's
 why I merged it at some point.

OK.  If I'm committing it, I'd prefer to handle that piece separately,
if possible.

 - If XLOG_HEAP_INPLACE is not decoded, doesn't that mean that this
 facility can't be used to replicate a system catalog table?  Is that a
 restriction we should enforce/document somehow?

 Currently catalog tables aren't replicated, yes. They simply are skipped
 during decoding. XLOG_HEAP_INPLACE isn't the primary reason for that
 though.

 Do you see a use case for it?

I can imagine someone wanting to do it, but I think we can live with
it not being supported.

 - It still bothers me that we're going to have mandatory slots for
 logical replication and no slots for physical replication.  Why are
 slots mandatory in one case and not even allowable in the other?  Is
 it sensible to think about slotless logical replication - or maybe I
 should say, is it any LESS sensible than slotless physical
 replication?

 Well, as you know, I do want to have slots for physical replication as
 well. But there actually is a fundamental difference why we need it for
 logical rep and not for physical: In physical replication, if the xmin
 progresses too far, client queries will be cancelled. Annoying but not
 fatal. In logical replication we will not be able to continue
 replicating since we cannot decode the WAL 

Re: [HACKERS] Some interesting news about Linux 3.12 OOM

2013-09-19 Thread Dimitri Fontaine
Andres Freund and...@2ndquadrant.com writes:
 What has changed is how to react to situations where memory has been
 overcommitted but is now actually needed.

Sure. You either have a failure at malloc() or at usage time; overcommit
is all about never failing at malloc(), but then you have to deal with
OOM conditions in creative ways, like with the OOM killer.

Anyways,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Performance problem in PLPgSQL

2013-09-19 Thread dlight
We installed the PostgreSQL 9.3.0 server from the FreeBSD ports tree
(http://svnweb.freebsd.org/ports/head/databases/postgresql93-server/); the
administrator says that it already contains this patch.







Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-09-19 Thread Robert Haas
On Tue, Sep 17, 2013 at 7:04 PM, Andres Freund and...@2ndquadrant.com wrote:
 1. We're not in a huge hurry to ensure that sinval notifications are
 delivered in a timely fashion.  We know that sinval resets are bad, so
 if a backend is getting close to needing a sinval reset, we kick it in
 an attempt to get it to AcceptInvalidationMessages().  But if the
 sinval queue isn't filling up, there's no upper bound on the amount of
 time that can pass before a particular sinval is read.  Therefore, the
 amount of time that passes before an idle backend is forced to drain
 the sinval queue can vary widely, from a fraction of a second to
 minutes, hours, or days.   So it's kind of unappealing to think about
 making user-visible behavior dependent on how long it ends up taking.

 Well, when we're signalling it's certainly faster than waiting for the
 other's snapshot to vanish which can take ages for normal backends. And
 we can signal when we wait for consumption without too many
 problems.
 Also, I think in most of the usecases we can simply not wait for any of
 the idle backends, those don't use the old definition anyway.

Possibly.  It would need some thought, though.

 I am pretty sure there's quite a bit to improve around sinvals but I
 think any replacement would look surprisingly similar to what we
 have. So I think doing it incrementally is more realistic.
 And I am certainly scared by the thought of having to replace it without
 breaking corner cases all over.

I guess I was more thinking that we might want some parallel mechanism
with somewhat different semantics.  But that might be a bad idea
anyway.  On the flip side, if I had any clear idea how to adapt the
current mechanism to suck less, I would have done it already.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Some interesting news about Linux 3.12 OOM

2013-09-19 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I laughed, too, but the reality is that at least as far as PG is
 concerned it's probably a truthful statement, and if it isn't, nobody
 here is likely to complain about having to fix it.  Yeah, there's a
 lot of other code out there not as well written or maintained as PG,
 but using SIGKILL as a substitute for ENOMEM because people might not
 be checking the return value for malloc() is extremely heavy-handed
 nannyism.

I've been told on several occasions that this was done for the JVM
and other such programs that want to allocate huge amounts of memory even
if they don't really intend to use it.

Back in the day that amount could well be greater than the actual amount
of physical memory available. So the only way to allow Java applications
on Linux was, as I've been told, to implement OOM. And as the target was
the desktop, well, have it turned on by default.

Now, I liked that story enough to never actually try to check it, so if
someone knows for real why the Linux kernel appears so stupid in its
choice of implementing OOM and turning it on by default…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Not In Foreign Key Constraint

2013-09-19 Thread Misa Simic
2013/9/19 David Johnston pol...@yahoo.com

 Misa Simic wrote
  I guess that rule can be achieved with triggers on TableA and TableC -
  but the same is true for FK (and an FK constraint is more effective than
  a trigger - that is why I wonder whether it would be useful/achievable
  to create that kind of constraint)
 
  Thoughts, ideas?

 You create a common "keys in use" table and only insert a record into the
 main tables if you can successfully add the desired key to the shared keys
 table (as a unique value).  Set up a normal FK to that table to help
 enforce that valid records must exist in the keys table.  Not fool-proof,
 but you only need to worry about insertions - delete from the pk table to
 remove the record from the main table and free up the key.

 David J.






Thanks David,

Yes, that is one of the ways the goal can be achieved via triggers (or by
letting someone else worry about the key being inserted/updated/deleted in
the master table first...)

A constraint should be the more effective way... (It shouldn't be confused
with an FK constraint - even though it is its opposite in some sense... -
that was just the simplest way to describe the feature.)

And it should ensure that every row in the table is valid from the moment
it is created (which a trigger can't ensure - a constraint does, or else
the constraint can't be created, etc.)
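For concreteness, a minimal SQL sketch of the shared-keys approach David
describes (all names hypothetical): since k is the primary key of
shared_keys, each key has exactly one owner, so a key used in table_a can
never also appear in table_c.

    create table shared_keys (
        k     integer primary key,
        owner char(1) not null,
        unique (k, owner)
    );

    create table table_a (
        k     integer primary key,
        owner char(1) not null default 'A' check (owner = 'A'),
        foreign key (k, owner) references shared_keys (k, owner)
    );

    create table table_c (
        k     integer primary key,
        owner char(1) not null default 'C' check (owner = 'C'),
        foreign key (k, owner) references shared_keys (k, owner)
    );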

Thanks,

Misa


Re: [HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

2013-09-19 Thread Robert Haas
On Thu, Sep 19, 2013 at 12:52 PM, Andrew Gierth
and...@tao11.riddles.org.uk wrote:
 It compiles without error and looks ok...

Thanks for checking.   Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

2013-09-19 Thread Andrew Gierth
 Robert == Robert Haas robertmh...@gmail.com writes:

  bgworker.c: In function 'WaitForBackgroundWorkerStartup':
  bgworker.c:866: warning: 'pid' may be used uninitialized in this function

 Robert Does the attached patch fix it for you?

It compiles without error and looks ok...

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] proposal: lob conversion functionality

2013-09-19 Thread Pavel Stehule
Hello

here is the patch

Regards

Pavel



2013/9/19 Pavel Stehule pavel.steh...@gmail.com




 2013/9/19 Rushabh Lathia rushabh.lat...@gmail.com

 Hi Pavel,

 I have reviewed your patch.

 -- Patch got applied cleanly (using patch -p1)
 -- Make && Make install works fine
 -- make check looks good

 I done code-walk and it looks good. Also did some manual testing and
 haven't
 found any issue with the implementation.

 The patch introduces two new APIs, load_lo() and make_lo(), for loading
 from and saving to large objects. When it comes to dropping an lo object
 created using make_lo(), this still depends on the older API lo_unlink().
 I think we should add that to the documentation for clarification.

 As a user of the lo object functions, when I started testing this new API,
 the first question that came to mind was why a delete_lo() or destroy_lo()
 API is missing. Later I realized that the older lo_unlink() API is needed
 for that functionality. So I feel it's good to document that. Do let me
 know what you think?


 good idea

 I'll send an updated patch this evening



 Otherwise patch looks nice and clean.


 Thank you :)

 Regards

 Pavel


 Regards,
 Rushabh Lathia
 www.EnterpriseDB.com



 On Sun, Aug 25, 2013 at 8:31 PM, Pavel Stehule 
 pavel.steh...@gmail.comwrote:

 Hello

 here is a patch

 it introduce a load_lo and make_lo functions

 postgres=# select make_lo(decode('ff00','hex'));
  make_lo
 ─
24629
 (1 row)

 Time: 40.724 ms
 postgres=# select load_lo(24628);
   load_lo
 
  \xff00
 (1 row)

 postgres=# \lo_import ~/avatar.png
 lo_import 24630

 postgres=# select md5(load_lo(24630));
md5
 ──
  513f60836f3b625713acaf1c19b6ea78
 (1 row)

 postgres=# \q
 bash-4.1$ md5sum ~/avatar.png
 513f60836f3b625713acaf1c19b6ea78  /home/pavel/avatar.png

 Regards

 Pavel Stehule



 2013/8/22 Jov am...@amutu.com

 +1
 We badly need functions to convert between large objects and bytea.

 Once I had to use the ugly pg_read_file() to get some text into PG; I
 tried to use a large object but found it useless without a function to
 convert a large object to bytea.

 Jov
 blog: http://amutu.com/blog


 2013/8/10 Pavel Stehule pavel.steh...@gmail.com

  Hello

  I had to enhance my older project, where XML documents are parsed and
 created on the server side - in PL/pgSQL and PL/Perl procedures. We would
 like to use the LO API for client-server communication, but we have to
 parse/serialize LOs on the server side.

 I found that there is no simple API for working with LOs from PL without
 access to the file system. I had to use some ugly hacks:

 CREATE OR REPLACE FUNCTION parser.save_as_lob(bytea)
 RETURNS oid AS $$
 DECLARE
   _loid oid;
   _substr bytea;
 BEGIN
   _loid := lo_creat(-1);
   FOR i IN 0..length($1)/2048
   LOOP
 _substr := substring($1 FROM i * 2048 + 1 FOR 2048);
 IF _substr <> '' THEN
   INSERT INTO pg_largeobject(loid, pageno, data)
 VALUES(_loid, i, _substr);
 END IF;
   END LOOP;

   EXECUTE format('GRANT SELECT ON LARGE OBJECT %s TO ohs', _loid);
   RETURN _loid;
 END;
 $$ LANGUAGE plpgsql SECURITY DEFINER STRICT SET search_path =
 'pg_catalog';

 and

 CREATE OR REPLACE FUNCTION fbuilder.attachment_to_xml(attachment oid)
 RETURNS xml AS $$
 DECLARE
   b_cum bytea = '';
   b bytea;
 BEGIN
   FOR b IN SELECT l.data
   FROM pg_largeobject l
  WHERE l.loid = attachment_to_xml.attachment
  ORDER BY l.pageno
   LOOP
 b_cum := b_cum || b;
   END LOOP;
   IF NOT FOUND THEN
 RETURN NULL;
   ELSE
 RETURN xmlelement(NAME attachment,
encode(b_cum, 'base64'));
   END IF;
 END;
 $$ LANGUAGE plpgsql STRICT SECURITY DEFINER SET search_path =
 'pg_catalog';

 These functions could be simplified if we supported functions like
 encode/decode for LOs.

 So my proposal is creating functions:

 * lo_encode(loid oid) .. returns bytea
 * lo_encode(loid oid, encoding text) .. returns text
 * lo_make(loid oid, data bytea)
 * lo_make(loid oid, data text, encoding text)

 This can simplify all transformations between LO and VARLENA. A known
 limit is 1G for varlena, but that is still relatively high.

 Notes, comments?

 Regards

 Pavel


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers





 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




 --
 Rushabh Lathia





load_lo_v2.patch
Description: Binary data



Re: [HACKERS] Minor inheritance/check bug: Inconsistent behavior

2013-09-19 Thread Robert Haas
 Marking this as Ready for committer.

 Thanks a ton for reviewing the patch.

I rewrote the comments for this patch and fixed the incorrect
formatting in parse_relation.c.  It used spaces instead of tabs, which
is why if you look at the patch file you'll see that the alignment
looked off.  See new version, attached.

However, I have a few residual questions:

1. Does copy.c also need to be fixed?  I see that it also calls
ExecConstraints(), and I don't see it setting tableOid anywhere.  We
certainly want COPY and INSERT to behave the same way.

2. Should constraints also allow access to the oid column?  Right
now you can do that but it doesn't seem to work, e.g.:

rhaas=# create table foo (a int, check (oid::integer % 2 = 0)) with oids;
CREATE TABLE
rhaas=# insert into foo values (1);
INSERT 16404 1
rhaas=# insert into foo values (1);
INSERT 16405 1
rhaas=# insert into foo values (1);
INSERT 16406 1
rhaas=# select oid, * from foo;
  oid  | a
---+---
 16404 | 1
 16405 | 1
 16406 | 1
(3 rows)

Just prohibiting that might be fine; it doesn't seem that useful
anyway.  But if we're setting the table OID, maybe we should set the
OID too, and then just allow this.
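Regarding question 1 above, a sketch of what mirroring the INSERT-path fix
in copy.c might look like (untested; assumes the same variable names are in
scope there):

    /* stamp the tuple with the target relation's OID before checking
     * constraints, as the INSERT path now does */
    tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);

    if (resultRelInfo->ri_RelationDesc->rd_att->constr)
        ExecConstraints(resultRelInfo, slot, estate);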

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


sys_col_constr_v3.patch
Description: Binary data



Re: [HACKERS] Not In Foreign Key Constraint

2013-09-19 Thread David Johnston
Misa Simic wrote
 Hi hackers,
 
 I just wonder how hard would be to implement something like Not In FK
 Constraint or opposite to FK...

A more useful couple of next sentences would be along the lines of:

I have this problem... I've approached it by doing... but it seems that an
actual database-enforced constraint would be a better solution.  Is that
something that has been considered?  Are there other ways of attacking this
problem I have not considered?

You took quite a bit of time to try and start a discussion, and I get that
you don't necessarily know where it is going to lead, but Not In FK
constraint, with a descriptive sentence or two, likely would have been
enough to get the ball rolling.  Instead you devoted more space to technical
clarification that would have been better spent expounding on what the
problem is and how current approaches to dealing with said problem are
limited.

A more specific end-question would also help solicit better responses.

I say all this because 3 days later nothing more substantial than "why is
this feature necessary" has been put forth.  The general idea likely has
some merit but you've not provided anything for people to sink their teeth
into.

David J.








Re: [HACKERS] Where to load modules from?

2013-09-19 Thread Robert Haas
On Wed, Sep 18, 2013 at 12:53 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-18 08:46:08 -0400, Robert Haas wrote:
 Here's another idea.  At initdb time, create an empty directory
 called pg_you_can_load_stuff_from_here (pick a better name) inside
 $PGDATA.  Allow it to be replaced with a symlink.  This would be
 similar to what we do today with pg_xlog.  In fact, you can imagine an
 equivalent of initdb -X that does something precisely analogous.  This
 feels a bit more natural to me than a GUC.

 I think I'd prefer a GUC that allows specifying multiple directories
 that are searched in order to a single symlinked directory.

Why?

I ask because I have the opposite preference, based on the precedent of pg_xlog.

 Also, aren't symlinks an absolute PITA to manipulate by hand on
 windows?

Maybe so, but if that's an issue here it's a preexisting issue also.
I think we shouldn't burden this patch with fixing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Dead code or buggy code?

2013-09-19 Thread Robert Haas
On Wed, Sep 18, 2013 at 6:20 PM, Greg Stark st...@mit.edu wrote:
 The following code is in ProcSleep() at proc.c:1138.
 GetBlockingAutoVacuumPgproc() should presumably always return a vacuum
 pgproc entry since the deadlock state says it's blocked by autovacuum.
 But I'm not really familiar enough with this codepath to know whether
 there's not a race condition here where it can sometimes return null.
 The following code checks autovac != NULL, but the PGXACT initializer
 would already have segfaulted if a NULL return were possible.

 if (deadlock_state == DS_BLOCKED_BY_AUTOVACUUM &&
     allow_autovacuum_cancel)
 {
     PGPROC   *autovac = GetBlockingAutoVacuumPgproc();
     PGXACT   *autovac_pgxact =
         &ProcGlobal->allPgXact[autovac->pgprocno];

     LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);

     /*
      * Only do it if the worker is not working to protect against Xid
      * wraparound.
      */
     if ((autovac != NULL) &&
         (autovac_pgxact->vacuumFlags & PROC_IS_AUTOVACUUM) &&
         !(autovac_pgxact->vacuumFlags & PROC_VACUUM_FOR_WRAPAROUND))
     {

Hmm, yeah.  I remember noticing this some time ago but never got
around to fixing it.  +1 for rearranging things there somehow.
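One possible rearrangement, sketched (untested; the cancellation logic and
lock release are elided), is to dereference the pointer only after the
NULL test:

    if (deadlock_state == DS_BLOCKED_BY_AUTOVACUUM &&
        allow_autovacuum_cancel)
    {
        PGPROC     *autovac = GetBlockingAutoVacuumPgproc();

        LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);

        if (autovac != NULL)
        {
            PGXACT     *autovac_pgxact =
                &ProcGlobal->allPgXact[autovac->pgprocno];

            /* only cancel a worker that isn't forestalling Xid wraparound */
            if ((autovac_pgxact->vacuumFlags & PROC_IS_AUTOVACUUM) &&
                !(autovac_pgxact->vacuumFlags & PROC_VACUUM_FOR_WRAPAROUND))
            {
                /* ... cancel the autovacuum worker ... */
            }
        }
        /* ... release ProcArrayLock and fall through ... */
    }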

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-19 Thread Robert Haas
On Tue, Sep 17, 2013 at 7:10 PM, Andres Freund and...@2ndquadrant.com wrote:
 I generally think the current logic for triggering VACUUMs via
 autovacuum doesn't really make all that much sense in the days where we
 have the visibility map.

Right now, whether or not to autovacuum is the result of a two-pronged
test.  The first prong is based on number of updates and deletes
relative to table size; that triggers a regular autovacuum.  The
second prong is based on age(relfrozenxid) and triggers a
non-page-skipping vacuum (colloquially, an anti-wraparound vacuum).

The typical case in which this doesn't work out well is when the table
has a lot of inserts but few or no updates and deletes.  So I propose
that we change the first prong to count inserts as well as updates and
deletes when deciding whether it needs to vacuum the table.  We
already use that calculation to decide whether to auto-analyze, so it
wouldn't be very novel.   We know that the work of marking pages
all-visible will need to be done at some point, and doing it sooner
will result in doing it in smaller batches, which seems generally
good.

However, I do have one concern: it might lead to excessive
index-vacuuming.  Right now, we skip the index vac step only if
ZERO dead tuples are found during the heap scan.  Even one dead tuple
(or line pointer) will cause an index vac cycle, which may easily be
excessive.  So I further propose that we introduce a threshold for
index-vac, so that we only do an index vac cycle if the number of dead
tuples exceeds, say, 0.1% of the table size.
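A sketch of the arithmetic being proposed (variable names only in the
spirit of autovacuum.c and vacuumlazy.c; the insert counter and the
hard-coded 0.1% are hypothetical):

    /* prong 1: count inserts as well as updates and deletes */
    vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
    dovacuum = (dead_tuples + inserted_tuples) > vacthresh;

    /* index vac: only when dead tuples exceed 0.1% of the table */
    if (vacrelstats->num_dead_tuples >
        vacrelstats->old_rel_tuples * 0.001)
        lazy_vacuum_indexes_and_heap(vacrelstats);  /* hypothetical helper */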

Thoughts?  Let the hurling of rotten tomatoes begin.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-19 Thread Robert Haas
On Wed, Sep 18, 2013 at 6:42 PM, MauMau maumau...@gmail.com wrote:
 It seems to me that these two points here are the real core of your
 proposal.  The rest is just syntactic sugar.

 No, those are desirable if possible features.  What's important is to
 declare in the manual that PostgreSQL officially supports national character
 types, as I stated below.

That may be what's important to you, but it's not what's important to
me.  I am not keen to introduce support for nchar and nvarchar as
differently-named types with identical semantics.  And I think it's an
even worse idea to introduce them now, making them work one way, and
then later change the behavior in a backward-incompatible fashion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Assertions in PL/PgSQL

2013-09-19 Thread Robert Haas
On Sat, Sep 14, 2013 at 6:09 PM, Marko Tiikkaja ma...@joh.to wrote:
 On 2013-09-14 23:55, Pavel Stehule wrote:
 but introducing a reserved keyword for one very special purpose (without
 extensibility) is not prudent.

 How about using an existing keyword then?  ASSERTION used to be reserved in
 the SQL standard but is unreserved in postgres.  CHECK might work and is
 fully reserved.  CONSTRAINT?  IS?

Personally, I'm pretty skeptical about the value of adding dedicated
syntax for this.  I mean, I'll be the first to admit that PL/pgsql is
a clunky and awkward language.  But somebody's always proposing
something that they think will make it better, and I feel somehow that
if we accept all of those proposals at face value, we'll just end up
with a mess.  So IMHO the bar for adding new syntax to PL/pgsql should
be reasonably high.  YMMV, of course, and probably does.

The issue of how this is spelled is somewhat secondary for me.  I
think ASSERT is probably as good as anything.  But let's not kid
ourselves: even reserving this word only in PL/pgsql WILL break things
for some users, and there are LOTS of people out there with LOTS of
procedural code.  Every tiny backward-incompatibility reduces by just
a little bit the percentage of those people who can upgrade and
increases the delay before they do.  This is an area where past
experience has made me quite wary.

Maybe I'm worrying over nothing; this really is a pretty small change.
 But once bitten, twice shy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] record identical operator

2013-09-19 Thread Robert Haas
On Wed, Sep 18, 2013 at 7:29 PM, Vik Fearing vik.fear...@dalibo.com wrote:
 On 09/19/2013 12:55 AM, Dimitri Fontaine wrote:
 We have, as a community, gone to a fair amount of trouble  to make
  the concept of equality pluggable and allow multiple types of
  equality per type.  To me it seems the perfect tool to solve this
  problem.  Why the fuss?
 Because I don't understand why you need another equality than the
 default btree one, certainly.

 Basically because 'word'::citext and 'Word'::citext are the same to your
 brain but not to your eyeballs.

 Unique indexes, for example, only need to please your brain.  Matviews
 need to please your eyeballs.

Right, and well said.

It's perfectly reasonable to want a unique index that doesn't allow
both Kevin O'leary and Kevin O'Leary to be listed in the
irish_names table, but it's also perfectly reasonable to want to
remember that the user who entered the data spelled it the second way
and not the first.  And it's also reasonable to want any corrections
made to the table to propagate to any materialized views defined over
it.
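A concrete rendering of that example (a sketch; assumes the citext
extension is installed):

    create extension citext;
    create table irish_names (name citext unique);
    insert into irish_names values ('Kevin O''Leary');
    insert into irish_names values ('Kevin O''leary');  -- unique violation
    select * from irish_names;  -- shows the spelling as originally entered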

The fact that we have multiple concepts of equality can be confusing,
but it's not for no reason, and we're not the only database or
programming language to have such a concept.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [RFC] Extend namespace of valid guc names

2013-09-19 Thread Robert Haas
On Tue, Sep 17, 2013 at 6:56 PM, Andres Freund and...@2ndquadrant.com wrote:
 That's more or less what I figured.  One thing I'm uncomfortable with
 is that, while this is useful for extensions, what do we do when we
 have a similar requirement for core?  The proposed GUC name is of the
 format extension.user_defined_object_name.property_name; but omitting
 the extension name would lead to
 user_defined_object_name.property_name, which doesn't feel good as a
 method for naming core GUCs.  The user defined object name might just
 so happen to be an extension name.

 I think that ship has long since sailed. postgresql.conf has allowed
 foo.bar style GUCs via custom_variable_classes for a long time, and
 these days we don't even require that but allow them generally. Also,
 SET, postgres -c, and SELECT set_config() already don't have the
 restriction to one dot in the variable name.

Well, we should make it consistent, but I'm not sure that settles the
question of which direction to go.

 If it's not a good fit for pg_hba.conf, why is it a good fit for
 anything else we want to do?

 Well, for now it's primarily there for extensions, not for core
 code. Although somebody has already asked when I'd submit the patch
 because they could use it for their patch...
 I don't really find the pg_hba.conf comparison terribly convincing. As
 an extension, how would you prefer to formulate the configuration
 in the example?

Well, to me it looks like what you've got there is a table.  And I
don't have a clear feeling of how that ought to be represented in
configuration-land, but trying to jam it into the GUC machinery seems
awkward at best.  I think we need a way of handling tabular
configuration data, but I'm not convinced this is it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Range types do not display in pg_stats

2013-09-19 Thread Josh Berkus

create table tstztest( trange tstzrange );

postgres=# insert into tstztest select tstzrange(t, t + interval '1 month')
from generate_series('2012-01-01'::timestamptz,'2018-01-01','1 month')
as gs(t);
INSERT 0 73
postgres=# analyze tstztest;
ANALYZE
postgres=# select * from pg_stats where tablename = 'tstztest';
-[ RECORD 1 ]----------+---------
schemaname             | public
tablename              | tstztest
attname                | trange
inherited              | f
null_frac              | 0
avg_width              | 22
n_distinct             | -1
most_common_vals       |
most_common_freqs      |
histogram_bounds       |
correlation            |
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |

Now, there actually *is* a histogram for the column, which you can find
via pg_statistic.  But it shows up as NULL in the pg_stats view.
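The raw slots can be seen with a query like this (a sketch; range types
store their statistics under their own slot kinds, which the pg_stats view
apparently does not decode):

    select stakind1, stakind2, stakind3, stakind4, stakind5
      from pg_statistic
     where starelid = 'tstztest'::regclass;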

If this is a known issue, we ought to at least add it to the docs.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] Dump/Reload broken with relocatable extensions

2013-09-19 Thread Vik Fearing
I don't know if this has been discussed before, a cursory search of the
archives didn't turn up anything interesting.  I perhaps didn't put in
the right keywords.

Unfortunately, the easiest way that I've found to reproduce this bug is
to apply the attached patch to the unaccent extension.  This isn't a bug
with that extension, it was just the simplest.  With that patch applied,
the following steps will put a database in a state which cannot be
dumped/restored.

create extension unaccent with schema public;
create schema s;
create table s.t (v text check (public.no_accents(v)));
insert into s.t values ('a');

The problem is the no_accents(text) function, which belongs to the
unaccent extension, calls unaccent(text), also belonging to the unaccent
extension.  If a table living in a different schema to that of the
extension has a CHECK constraint using the function, the dump/restore
behavior of setting the search_path will cause restoration to fail.

At first I thought the solution would be to have all functions of an
extension have a custom search_path equal to that of the extension, but
that doesn't really work because it would cause too many undesirable
side effects.

Another solution could be to postpone adding constraints until after
everything's been set up, but that seems a bit unwieldy.

My preferred solution at the moment is to invent a special $extension
schema analogous to the $user schema.  It wouldn't be implicit like the
$user one, but an extension could call one of its own functions as
$extension.funcname().  I'm not sure what should happen if the caller
isn't part of an extension.  I'm leaning towards a schema does not
exist error.  This has grammar issues, though, that $user doesn't have.

The original problem came from a CHECK constraint on the PostGIS
_raster_constraint_pixel_types(raster) function which calls
st_bandmetadata(raster, int[]).  I thought that a simple patch to the
in-core extension unaccent would be more practical.  I am not proposing
adding this patch to the unaccent extension.

-- 
Vik

*** a/contrib/unaccent/unaccent--1.0.sql
--- b/contrib/unaccent/unaccent--1.0.sql
***
*** 32,34  CREATE TEXT SEARCH DICTIONARY unaccent (
--- 32,39 
  	TEMPLATE = unaccent,
  	RULES= 'unaccent'
  );
+ 
+ CREATE FUNCTION no_accents(text)
+ 	RETURNS boolean
+ 	AS 'select $1 = unaccent($1);'
+ 	LANGUAGE sql STABLE STRICT;



Re: [HACKERS] Where to load modules from?

2013-09-19 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I think I'd prefer a GUC that allows specifying multiple directories
 that are searched in order to a single symlinked directory.

 Why?

 I ask because I have the opposite preference, based on the precedent of 
 pg_xlog.

I understand Andres' preference, as it would allow management somewhat
comparable to PATH or LD_LIBRARY_PATH here.

In an effort to reach consensus, what about having both, with the GUC
being empty by default? That way you have a default per-cluster place
to put binaries to be loaded, and a GUC to manage finer settings
if need be.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-19 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:

 Right now, whether or not to autovacuum is the result of a two-pronged
 test.  The first prong is based on number of updates and deletes
 relative to table size; that triggers a regular autovacuum.  The
 second prong is based on age(relfrozenxid) and triggers a
 non-page-skipping vacuum (colloquially, an anti-wraparound vacuum).
 
 The typical case in which this doesn't work out well is when the table
 has a lot of inserts but few or no updates and deletes.  So I propose
 that we change the first prong to count inserts as well as updates and
 deletes when deciding whether it needs to vacuum the table.  We
 already use that calculation to decide whether to auto-analyze, so it
 wouldn't be very novel.   We know that the work of marking pages
 all-visible will need to be done at some point, and doing it sooner
 will result in doing it in smaller batches, which seems generally
 good.
 
 However, I do have one concern: it might lead to excessive
 index-vacuuming.  Right now, we skip the index vac step only if
 ZERO dead tuples are found during the heap scan.  Even one dead tuple
 (or line pointer) will cause an index vac cycle, which may easily be
 excessive.  So I further propose that we introduce a threshold for
 index-vac, so that we only do an index vac cycle if the number of dead
 tuples exceeds, say, 0.1% of the table size.

+1  I've been thinking of suggesting something along the same lines,
for the same reasons.

 
-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: [HACKERS] Dump/Reload broken with relocatable extensions

2013-09-19 Thread Dimitri Fontaine
Vik Fearing vik.fear...@dalibo.com writes:
 I don't know if this has been discussed before, a cursory search of the
 archives didn't turn up anything interesting.  I perhaps didn't put in
 the right keywords.

For others not to spend too much time on this: it seems like a problem
with the extension not abiding by the rules about its relocatable
property and the @extschema@ thingy.

  http://www.postgresql.org/docs/9.3/static/extend-extensions.html#AEN54999
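
For reference, the rule in question, roughly sketched (file and object names
below are invented): a non-relocatable extension pins its schema in the
control file and qualifies objects with @extschema@ in its script.

    # myext.control
    relocatable = false
    schema = myschema

    -- myext--1.0.sql
    CREATE FUNCTION @extschema@.myfunc() RETURNS int
        LANGUAGE sql AS 'SELECT 1';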

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Where to load modules from?

2013-09-19 Thread Andres Freund
On 2013-09-19 22:56:52 +0200, Dimitri Fontaine wrote:
 Robert Haas robertmh...@gmail.com writes:
  I think I'd prefer a GUC that allows specifying multiple directories
  that are searched in order to a single symlinked directory.
 
  Why?
 
  I ask because I have the opposite preference, based on the precedent
  of pg_xlog.

Because I want to specify multiple paths. E.g. one with modules for a
specific postgres version, one for the cluster and one for my
development directory.
Now we could recursively search a directory that contains symlinks to
directories, but that seems ugly.

 I understand Andres preference, as it would allow a management somewhat
 comparable to PATH or LD_LIBRARY_PATH here.

 In an effort to reach consensus, what about having both, with the GUC
 being empty by default? That way you have a default per-cluster place
 where to stuff binaries to be loaded, and a GUC to manage finer settings
 if needs be.

Well, we can have the guc have a default value of $datadir/pg_lib or
such. But using two independent mechanisms seems like a bad idea to me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freezing without write I/O

2013-09-19 Thread Ants Aasma
On Thu, Sep 19, 2013 at 2:42 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 * switchFinishXmin and nextSwitchXid should probably be either volatile
or have a compiler barrier between accessing shared memory and
checking them. The compiler very well could optimize them away and
access shmem all the time which could lead to weird results.

 Hmm, that would be a weird optimization. Is that really legal for the
 compiler to do? Better safe than sorry, I guess.

C doesn't really have a multi-threaded memory model before C11, so the
compiler makers have just made one up that suits them best. For unlocked
memory reads the compiler is free to assume that no one else is accessing the
variable, so yes that optimization would be legal according to their rules.
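
As a tiny illustration of the failure mode (not from the patch):

static void
wait_for_flag(int *flag)
{
    /* while (!*flag) ; -- the load may be hoisted out of the loop,
     * so the loop can spin forever on a stale value */

    /* a volatile-qualified access forces a fresh load per iteration */
    while (!*(volatile int *) flag)
        ;
}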

I'm tackling similar issues in my patch. What I'm thinking is that we should
change our existing supposedly atomic accesses to be more explicit and make
the API compatible with C11 atomics[1]. For now I think the changes should be
something like this:

* Have separate typedefs for atomic variants of datatypes (I don't think we
  have a whole lot of them). With C11 atomics support, this would change to

typedef _Atomic TransactionId AtomicTransactionId;

* Require that pg_atomic_init(type, var, val) be used to init atomic
  variables. Initially it would just pass through to assignment, as all
  supported platforms can do 32bit atomic ops. C11 compatible compilers will
  delegate to atomic_init().

* Create atomic read and write macros that look something like:

#define pg_atomic_load(type, var) (*((volatile type*) (var)))

  and

#define pg_atomic_store(type, var, val) do { \
*((volatile type*) (var)) = (val); \
} while(0)

  C11 would pass through to the compilers implementation with relaxed memory
  ordering.

* Rename pg_read_barrier()/pg_write_barrier()/pg_memory_barrier() to
  pg_acquire_fence()/pg_release_fence()/pg_acq_rel_fence(). Herb Sutter makes
  a convincing argument why loosening up the barrier semantics is the sane
  choice here. [2] C11 support can then pass though to atomic_thread_fence().

This way we have a relatively future proof baseline for lockfree programming,
with options to expand with other primitives. We could also only do the
volatile access macros part, at least it would make the intention more clear
than the current approach of sprinkling around volatile pointers.
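
To make the intent concrete, here is a fragment of how the sketched macros
might be used, assuming they take a pointer to the variable (nothing of this
exists yet; nextSwitchXid is borrowed from the discussion above, someXid is
made up):

    static AtomicTransactionId nextSwitchXid;

    pg_atomic_init(TransactionId, &nextSwitchXid, InvalidTransactionId);

    /* writer, under whatever lock publishes the switch: */
    pg_atomic_store(TransactionId, &nextSwitchXid, someXid);

    /* unlocked reader: a fresh (non-hoisted), untorn 32-bit load */
    TransactionId xid = pg_atomic_load(TransactionId, &nextSwitchXid);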

Regards,
Ants Aasma

[1] http://en.cppreference.com/w/c/atomic
[2] (long video about atomics)
http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Extend namespace of valid guc names

2013-09-19 Thread Andres Freund
On 2013-09-19 14:57:53 -0400, Robert Haas wrote:
 On Tue, Sep 17, 2013 at 6:56 PM, Andres Freund and...@2ndquadrant.com wrote:
  I think that ship has long since sailed. postgresql.conf has allowed
  foo.bar style GUCs via custom_variable_classes for a long time, and
  these days we don't even require that but allow them generally. Also,
  SET, postgres -c, and SELECT set_config() already don't have the
  restriction to one dot in the variable name.
 
 Well, we should make it consistent, but I'm not sure that settles the
 question of which direction to go.

I suggested making it consistent in either form elsewhere in this
thread, but Tom wasn't convinced. I think that because of backward
compatibility concerns we can hardly restrict the valid namespace in all
forms to the most restrictive form (i.e. guc-file.l). Quite a few people
use GUCs as variables.
Maybe we can forbid the more absurd variants (SET ...  = 3;
SELECT set_config('...', '1', true)), but restricting it entirely seems
to cause pain for no reason.
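
For reference, names with more than one dot are already accepted today
through those interfaces, e.g. (placeholder names):

    SET my_ext.submodule.debug = 'on';
    SELECT set_config('my_ext.submodule.debug', 'on', false);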

  If it's not a good fit for pg_hba.conf, why is it a good fit for
  anything else we want to do?
 
  Well, for now it's primarily there for extensions, not for core
  code. Although somebody has already asked when I'd submit the patch
  because they could use it for their patch...
  I don't really find the pg_hba.conf comparison terribly convincing. As
  an extension, how would you prefer to formulate the configuration
  in the example?
 
 Well, to me it looks like what you've got there is a table.  And I
 don't have a clear feeling of how that ought to be represented in
 configuration-land, but trying to jam it into the GUC machinery seems
 awkward at best.  I think we need a way of handling tabular
 configuration data, but I'm not convinced this is it.

Well, it has two major advantages from my pov: a) we already have it
from SQL and people already use it that way b) it's dirt simple and it
doesn't allow anything not allowed before via other means.

Maybe you can invent something nicer to look at that's also parsed
before the server starts, but I find it hard to see the need
TBH. Especially as you will want most of the infrastructure GUCs
provide...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist

2013-09-19 Thread Pavel Stehule
2013/9/16 Satoshi Nagayasu sn...@uptime.jp

 (2013/07/06 1:16), Pavel Stehule wrote:

 I am sending a patch that removes strict requirements for DROP IF
 EXISTS statements. This behavior is similar to our ALTER IF EXISTS
 behavior now.


 postgres=# DROP CAST IF EXISTS (sss AS public.casttesttype);
 NOTICE:  types sss and public.casttesttype does not exist, skipping
 DROP CAST
 postgres=#   DROP FUNCTION IF EXISTS public.pt_in_widget(point, widget);
 NOTICE:  function public.pt_in_widget(point,**widget) does not exist,
 skipping
 DROP FUNCTION
 postgres=#  DROP OPERATOR IF EXISTS public.% (point, widget);
 NOTICE:  operator public.% does not exist, skipping
 DROP OPERATOR
 postgres=# DROP TRIGGER test_trigger_exists ON no_such_table;
 ERROR:  relation no_such_table does not exist
 postgres=# DROP TRIGGER IF EXISTS test_trigger_exists ON no_such_table;
 NOTICE:  trigger test_trigger_exists for table no_such_table does
 not exist, skipping
 DROP TRIGGER

 This functionality is necessary for a correct, quiet reload from a dump
 without spurious warnings


 I'm looking at this patch, and I have a question here.

  Should DROP TRIGGER IF EXISTS ignore the error for both a non-existing
  trigger and a non-existing table? Or only for a non-existing trigger?


My opinion is that both variants should be ignored; it should be fully
fault-tolerant in this use case.

Regards

Pavel



 I've not understood the pg_restore issue precisely so far,
 but IMHO DROP TRIGGER IF EXISTS means if the _trigger_ exists,
 not if the _table_ exists.

 Is this a correct and/or an expected behavior?

 Sorry if I missed some consensus which we already made.

 Any comments?

 Regards,
 --
 Satoshi Nagayasu sn...@uptime.jp
 Uptime Technologies, LLC. http://www.uptime.jp



Re: [HACKERS] record identical operator - Review

2013-09-19 Thread Steve Singer

On 09/12/2013 06:27 PM, Kevin Grittner wrote:

Attached is a patch for a bit of infrastructure I believe to be
necessary for correct behavior of REFRESH MATERIALIZED VIEW
CONCURRENTLY as well as incremental maintenance of matviews.


Here is attempt at a review of the v1 patch.

There has been a heated discussion on list about whether we even want this 
type of operator. I will try to summarize it here.


The arguments against it are
* that a record_image_identical call on two records that returns false 
can still return true under the equality operator, and the records might 
(or might not) appear to be the same to users
* Once we expose this as an operator via SQL someone will find it and 
use it and then complain when we change things such as the internal 
representation of a datatype, which might break their queries
* The differences between = and record_image_identical might not be 
understood by everyone who finds the operator, leading to confusion
* Using a criteria other than equality (the = operator provided by the 
datatype) might cause problems if we later provide 'on change' triggers 
for materialized views


The arguments for this patch are
* We want the materialized view to return the same value as would be 
returned if the query were executed directly.  This not only means that 
the values should be the same according to a datatype's = operator but 
that they should appear the same 'to the eyeball'.
* Supporting the materialized view refresh check via SQL makes a lot of 
sense but doing that requires exposing something via SQL
* If we adequately document what we mean by record_image_identical and 
the operator we pick for this then users shouldn't be surprised at what 
they get if they use this


I think there is agreement that better (as in more obscure) operators 
than === and !== need to be picked, and we need to find a place in the 
user-docs to warn users of the behaviour of these operators.   Hannu has 
proposed


*==*   binary equal, surely very equal by any other definition as well
!==?  maybe not equal -- binary inequal, may still be equal by
other comparison methods

and no one has yet objected to this.  My vote would be to update the patch to 
use those operator names and then figure out where we can document them.  If it 
means we have to add a section describing the equals operator for records as 
well, then maybe we should do that so we can get on with things.


Code Review


The record_image_eq and record_image_cmp functions are VERY similar.  It seems 
to me that you could merge the meat of these functions into a common function 
(that isn't exposed via SQL) that then behaves differently in 2 or 3 spots 
based on a parameter that controls whether it detoasts the tuple for the 
compare or returns false on the equals.
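
A minimal sketch of the shape I mean (the worker, its flag, and the structure 
are hypothetical, not from the patch):

#include "postgres.h"
#include "fmgr.h"

/* one internal worker; eq_only selects the 2 or 3 behavioral differences */
static int32
record_image_worker(FunctionCallInfo fcinfo, bool eq_only)
{
    /* the shared per-attribute loop comparing byte images would go here */
    return 0;                   /* placeholder result */
}

Datum
record_image_eq(PG_FUNCTION_ARGS)
{
    PG_RETURN_BOOL(record_image_worker(fcinfo, true) == 0);
}

Datum
record_image_cmp(PG_FUNCTION_ARGS)
{
    PG_RETURN_INT32(record_image_worker(fcinfo, false));
}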

Beyond that I don't see any issues with the code.

You call out a question about comparing two records that have a different 
number of columns: should it always be an error, or only an error when they 
match on the columns they do have in common?

The current behaviour is

 select * FROM r1,r2 where r1==r2;
 a | b | a | b | c
---+---+---+---+---
 1 | 1 | 1 | 2 | 1
(1 row)

update r2 set b=1;
UPDATE 1
test=# select * FROM r1,r2 where r2==r1;
ERROR:  cannot compare record types with different numbers of columns

This seems like a bad idea to me.  I don't like that I get a type comparison 
error only sometimes, based on the values of the data in the column.  If I'm a 
user testing code that compares two records with this operator and I test my 
query with one value pair and it works, then I'd naively expect to get a true 
or false on all other value sets of the same record type.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-19 Thread MauMau

From: Robert Haas robertmh...@gmail.com

That may be what's important to you, but it's not what's important to
me.


National character type support may be important to some potential users of 
PostgreSQL and to the popularity of PostgreSQL, not to me personally.  That's 
why national character support is listed in the PostgreSQL TODO wiki.  We might 
be losing potential users just because their selection criteria include 
national character support.




I am not keen to introduce support for nchar and nvarchar as
differently-named types with identical semantics.


Similar examples already exist:

- varchar and text: the only difference is the existence of explicit length 
limit

- numeric and decimal
- int and int4, smallint and int2, bigint and int8
- real/double precision and float

In addition, the SQL standard itself admits:

The key words NATIONAL CHARACTER are used to specify the character type 
with an implementation-defined character set. Special syntax (N'string') 
is provided for representing literals in that character set.

...
NATIONAL CHARACTER is equivalent to the corresponding character string 
type with a specification of CHARACTER SET CSN, where CSN is an 
implementation-defined character set name.


A national character string literal is equivalent to a character string 
literal with the N replaced by <introducer><character set specification>, 
where <character set specification> is an implementation-defined character 
set name.



And I think it's an
even worse idea to introduce them now, making them work one way, and
then later change the behavior in a backward-incompatible fashion.


I understand your feeling.  The concern about incompatibility can be 
eliminated by thinking of it in the following way.  How about this?


- NCHAR can be used with any database encoding.

- At first, NCHAR is exactly the same as CHAR.  That is, the 
implementation-defined character set described in the SQL standard is the 
database character set.


- In the future, the character set for NCHAR can be selected at database 
creation, like Oracle's CREATE DATABASE ... NATIONAL CHARACTER SET 
AL16UTF16.  The default is the database character set.
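
Under that reading, the initial behavior would be something like this
(illustrative only; nothing of it is implemented):

  -- accepted with any database encoding, initially identical to CHAR
  CREATE TABLE t (name NCHAR(10));
  INSERT INTO t VALUES (N'résumé');  -- the standard's N'...' literal syntax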



Could you tell me what kind of specification we should implement if we 
officially support national character types?


Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-19 Thread Andres Freund
On 2013-09-19 14:39:43 -0400, Robert Haas wrote:
 On Tue, Sep 17, 2013 at 7:10 PM, Andres Freund and...@2ndquadrant.com wrote:
  I generally think the current logic for triggering VACUUMs via
  autovacuum doesn't really make all that much sense in the days where we
  have the visibility map.
 
 Right now, whether or not to autovacuum is the result of a two-pronged
 test.  The first prong is based on number of updates and deletes
 relative to table size; that triggers a regular autovacuum.  The
 second prong is based on age(relfrozenxid) and triggers a
 non-page-skipping vacuum (colloquially, an anti-wraparound vacuum).

And I have some hopes we can get rid of that in 9.4 (that alone would be
worth a bump to 10.0 ;)). I really like Heikki's patch, even if I am
envious that I didn't have the idea :P. Although it needs quite a bit of
work to be ready.

 The typical case in which this doesn't work out well is when the table
 has a lot of inserts but few or no updates and deletes.  So I propose
 that we change the first prong to count inserts as well as updates and
 deletes when deciding whether it needs to vacuum the table.  We
 already use that calculation to decide whether to auto-analyze, so it
 wouldn't be very novel.   We know that the work of marking pages
 all-visible will need to be done at some point, and doing it sooner
 will result in doing it in smaller batches, which seems generally
 good.

Yes, that's a desperately needed change.

The reason I suggested keeping track of the xids of unremovable tuples
is that the current logic doesn't handle that at all. We just
unconditionally set n_dead_tuples to zero after a vacuum even if not a
single row could actually be cleaned out. Which has the effect that we
will not start a vacuum until enough bloat (or after changing this, new
inserts) has collected to start vacuum anew. Which then will do twice
the work.

Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do
much good either - we would just immediately trigger a new vacuum the
next time we check, even if the xmin horizon is still the same.

 However, I do have one concern: it might lead to excessive
 index-vacuuming.  Right now, we skip the index vac step only if
 ZERO dead tuples are found during the heap scan.  Even one dead tuple
 (or line pointer) will cause an index vac cycle, which may easily be
 excessive.  So I further propose that we introduce a threshold for
 index-vac; so that we only do index vac cycle if the number of dead
 tuples exceeds, say 0.1% of the table size.

Yes, that's a pretty valid concern. But we can't really do it that
easily. a) We can only remove dead line pointers when we know there's no
index entry pointing to them anymore, which we only know after the index
entries have been removed. b) We cannot check the validity of an index pointer if
there's no heap tuple for it. Sure, we could check whether we're
pointing to a dead line pointer, but the random io costs of that are
prohibitive.
Now, we could just mark line pointers as dead and not mark that page as
all-visible and pick it up again on the next vacuum cycle. But that
would suck long-term.

I think the only real solution here is to store removed tuples' tids
(i.e. items we've marked as dead) somewhere. Whenever we've found
sufficiently many tuples to be removed from indexes, we do phase 2.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dead code or buggy code?

2013-09-19 Thread Greg Stark
So I'm just going to make the code defensive and assume NULL is possible
even if maybe it isn't.

In case it's not clear, this is one of the things Xi Wang's tool picked
up. There are not too many, but it turns out they are indeed not all in the
adt code, so I might wait until after the commitfest to commit it to avoid
causing bit churn.

-- 
greg
On 19 Sep 2013 12:52, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Sep 18, 2013 at 6:20 PM, Greg Stark st...@mit.edu wrote:
  The following code is in the ProcSleep at proc.c:1138.
  GetBlockingAutoVacuumPgproc() should presumably always return a vacuum
  pgproc entry since the deadlock state says it's blocked by autovacuum.
  But I'm not really familiar enough with this codepath to know whether
  there's not a race condition here where it can sometimes return null.
  The following code checks autovac != NULL but the PGXACT initializer
  would have seg faulted if it returned NULL if that's possible.
 
  if (deadlock_state == DS_BLOCKED_BY_AUTOVACUUM &&
      allow_autovacuum_cancel)
  {
      PGPROC   *autovac = GetBlockingAutoVacuumPgproc();
      PGXACT   *autovac_pgxact = ProcGlobal->allPgXact[autovac->pgprocno];

      LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);

      /*
       * Only do it if the worker is not working to protect against Xid
       * wraparound.
       */
      if ((autovac != NULL) &&
          (autovac_pgxact->vacuumFlags & PROC_IS_AUTOVACUUM) &&
          !(autovac_pgxact->vacuumFlags & PROC_VACUUM_FOR_WRAPAROUND))
      {

 Hmm, yeah.  I remember noticing this some time ago but never got
 around to fixing it.  +1 for rearranging things there somehow.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



[HACKERS] Looking for information on our elephant

2013-09-19 Thread Tatsuo Ishii
Hi,

I'm looking for information on our elephant: especially how/why we
chose the elephant as our mascot.

According to:
http://wiki.postgresql.org/wiki/Logo

The discussion was back to 1997 on the hackers list. Unfortunately the
official archive for 1997 was lost. Mine seems to be gone too. Any
information will be appreciated.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Looking for information on our elephant

2013-09-19 Thread Oleg Bartunov
Tatsuo,

I have emails even from 1995!  You can see that historical message here:
http://www.pgsql.ru/db/mw/msg.html?mid=1238939

Re: [HACKERS] PostgreSQL logo.
*Author:* yang( at )sjuphil( dot )sju( dot )edu
*Date:* 1997-04-03 20:36:33
*Subject:* Re: [HACKERS] PostgreSQL logo.

Some other ideas:

derivative: a sword (derivative of the Dragon book cover -- Postgres as a tool)
illustrative: a bowl of Alphabet Soup, with letters spelling out POSTGRESQL
obscure: a revolver/hit man (Grosse Pt is an anagram of Postgres, and an
abbreviation of the title of the new John Cusack movie)

but if you want an animal-based logo, how about some sort of elephant?
After all, as the Agatha Christie title read, elephants can remember ...

David yangy...@sju.edu



Oleg


On Fri, Sep 20, 2013 at 3:16 AM, Tatsuo Ishii is...@postgresql.org wrote:

 Hi,

 I'm Looking for information on our elephant: especially how/why we
 chose elephant as our mascot.

 According to:
 http://wiki.postgresql.org/wiki/Logo

 The discussion was back to 1997 on the hackers list. Unfortunately the
 official archive for 1997 was lost. Mine seems to be gone too. Any
 information will be appreciated.
 --
 Tatsuo Ishii
 SRA OSS, Inc. Japan
 English: http://www.sraoss.co.jp/index_en.php
 Japanese: http://www.sraoss.co.jp


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-19 Thread Tatsuo Ishii
 On Mon, Sep 16, 2013 at 8:49 AM, MauMau maumau...@gmail.com wrote:
 2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always
 contain Unicode data.
 ...
 3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns.
 Fixed-width encoding may allow faster string manipulation as described in
 Oracle's manual.  But I'm not sure about this, because UTF-16 is not a real
 fixed-width encoding due to supplementary characters.
 
 It seems to me that these two points here are the real core of your
 proposal.  The rest is just syntactic sugar.
 
 Let me start with the second one: I don't think there's likely to be
 any benefit in using UTF-16 as the internal encoding.  In fact, I
 think it's likely to make things quite a bit more complicated, because
 we have a lot of code that assumes that server encodings have certain
 properties that UTF-16 doesn't - specifically, that any byte with the
 high-bit clear represents the corresponding ASCII character.

Agreed.

 As to the first one, if we're going to go to the (substantial) trouble
 of building infrastructure to allow a database to store data in
 multiple encodings, why limit it to storing UTF-8 in non-UTF-8
 databases?  What about storing SHIFT-JIS in UTF-8 databases, or
 Windows-yourfavoriteM$codepagehere in UTF-8 databases, or any other
 combination you might care to name?
 
 Whether we go that way or not, I think storing data in one encoding in
 a database with a different encoding is going to be pretty tricky and
 require far-reaching changes.  You haven't mentioned any of those
 issues or discussed how you would solve them.

What about limiting NCHAR to use with a database which has the same
or a compatible encoding (one for which an encoding conversion is
defined)? This way, NCHAR text can be automatically converted from
NCHAR to the database encoding on the server side, and thus we can treat
NCHAR exactly the same as CHAR afterward.  I suppose the encoding used
for NCHAR should be defined at initdb time or at creation of the database
(if we allow this, we need to add a new column to record what encoding
is used for NCHAR).

For example, CREATE TABLE t1(t NCHAR(10)) will succeed if NCHAR is
UTF-8 and the database encoding is UTF-8. It will even succeed if NCHAR is
SHIFT-JIS and the database encoding is UTF-8, because there is a conversion
between UTF-8 and SHIFT-JIS. However, it will not succeed if NCHAR is
SHIFT-JIS and the database encoding is ISO-8859-1, because there is no
conversion between them.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Looking for information on our elephant

2013-09-19 Thread Tatsuo Ishii
Oh great! Thank you Oleg!
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

 Tatsuo,
 
 I have  emails even from 1995 !  You can see that historical message here:
 http://www.pgsql.ru/db/mw/msg.html?mid=1238939
 
 Re: [HACKERS] PostgreSQL logo.
 *Author:* yang( at )sjuphil( dot )sju( dot )edu
 *Date:* 1997-04-03 20:36:33
 *Subject:* Re: [HACKERS] PostgreSQL logo.
 
 Some other ideas:
 
 derivative: a sword (derivative of the Dragon book cover -- Postgres as a 
 tool)
 illustrative: a bowl of Alphabet Soup, with letters spelling out POSTGRESQL
 obscure: a revolver/hit man (Grosse Pt is an anagram of Postgres, and an
 abbreviation of the title of the new John Cusack movie)
 
 but if you want an animal-based logo, how about some sort of elephant?
 After all, as the Agatha Christie title read, elephants can remember ...
 
 David yangy...@sju.edu
 
 
 
 Oleg
 
 
 On Fri, Sep 20, 2013 at 3:16 AM, Tatsuo Ishii is...@postgresql.org wrote:
 
 Hi,

 I'm Looking for information on our elephant: especially how/why we
 chose elephant as our mascot.

 According to:
 http://wiki.postgresql.org/wiki/Logo

 The discussion was back to 1997 on the hackers list. Unfortunately the
 official archive for 1997 was lost. Mine seems to be gone too. Any
 information will be appreciated.
 --
 Tatsuo Ishii
 SRA OSS, Inc. Japan
 English: http://www.sraoss.co.jp/index_en.php
 Japanese: http://www.sraoss.co.jp


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range types do not display in pg_stats

2013-09-19 Thread Robert Haas
On Thu, Sep 19, 2013 at 4:54 PM, Josh Berkus j...@agliodbs.com wrote:

 create table tstztest( trange tstzrange );

 postgres=# insert into tstztest select tstzrange(t, t + interval '1 month')
 from generate_series('2012-01-01'::timestamptz,'2018-01-01','1 month')
 as gs(t);
 INSERT 0 73
 postgres=# analyze tstztest;
 ANALYZE
 postgres=# select * from pg_stats where tablename = 'tstztest';
 -[ RECORD 1 ]----------+---------
 schemaname             | public
 tablename              | tstztest
 attname                | trange
 inherited              | f
 null_frac              | 0
 avg_width              | 22
 n_distinct             | -1
 most_common_vals       |
 most_common_freqs      |
 histogram_bounds       |
 correlation            |
 most_common_elems      |
 most_common_elem_freqs |
 elem_count_histogram   |

 Now, there actually *is* a histogram for the column, which you can find
  via pg_statistic.  But it shows up as NULL in the pg_stats view.

 If this is a known issue, we ought to at least add it to the docs.

It probably has to do with the CASE stakind stuff in the definition of
the pg_stats view.  Try \d+ pg_stats to see what I mean.
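
For example, the slots are still populated in the raw catalog; something like
this shows them (the range histogram lives under a stakind number that the
view's CASE arms don't translate into a pg_stats column):

select s.staattnum, s.stakind1, s.stakind2, s.stakind3, s.stakind4, s.stakind5
from pg_statistic s join pg_class c on c.oid = s.starelid
where c.relname = 'tstztest';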

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-19 Thread Valentine Gogichashvili
Hi,


  That may be what's important to you, but it's not what's important to
 me.


 National character types support may be important to some potential users
 of PostgreSQL and the popularity of PostgreSQL, not me.  That's why
 national character support is listed in the PostgreSQL TODO wiki.  We might
 be losing potential users just because their selection criteria includes
 national character support.


The whole NCHAR thing appeared as a hack for systems that did not have it from
the beginning. It would not be needed if all text had been magically stored in
UNICODE or UTF from the beginning and the idea of a character were the same as
the idea of a rune and not a byte.

PostgreSQL has very powerful facilities for storing any kind of
encoding. So maybe it makes sense to add ENCODING as another column
property, the same way COLLATION was added?

It would make it possible to have a database that talks to the clients in
UTF8 and stores text and varchar data in the encoding that is the most
appropriate for the situation.

It would make it impossible (or complicated) for the database to have a
non-UTF8 default encoding (though I wonder who would need that in this case),
as conversions from broader charsets into the default database encoding
would not be possible.

One could define an additional DATABASE property like LC_ENCODING that
would work for the ENCODING property of a column like LC_COLLATE works for
the COLLATE property of a column.

Text operations should work automatically, as in memory all strings will be
converted to the database encoding.

This approach would also open the possibility of implementing custom ENCODINGs
for column data storage, like snappy compression or even BSON, gobs or
protobufs for much more compact type storage.
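
For illustration, the column property might look something like this (a pure
syntax sketch by analogy with COLLATE; nothing like it exists today):

CREATE TABLE documents (
    title text ENCODING "UTF8",
    body  text ENCODING "SJIS" COLLATE "ja_JP"
);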

Regards,

-- Valentine Gogichashvili


Re: [HACKERS] [RFC] Extend namespace of valid guc names

2013-09-19 Thread Amit Kapila
On Thu, Sep 19, 2013 at 2:48 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-18 11:02:50 +0200, Andres Freund wrote:
 On 2013-09-18 11:55:24 +0530, Amit Kapila wrote:

   I think that ship has long since sailed. postgresql.conf has allowed
   foo.bar style GUCs via custom_variable_classes for a long time, and
   these days we don't even require that but allow them generally. Also,
   SET, postgres -c, and SELECT set_config() already don't have the
   restriction to one dot in the variable name.
 
  It's even explained in document that a two-part name is allowed for
  Customized Options at link:
  http://www.postgresql.org/docs/devel/static/runtime-config-custom.html

 Oh I somehow missed that. I'll need to patch that as well.

 Updated patch attached.

old part of line
- PostgreSQL will accept a setting for any two-part parameter name.

new part of line
+ PostgreSQL will accept a setting for any parameter name containing
at least one dot.

If I read the new line just in the context of custom options, it makes
sense, but when I compare it with the old line I think the lines below, or a
variant of them, might make more sense, as this line is not restricted to just
custom options:

a. PostgreSQL will accept a setting for any parameter name containing
two or more part parameter names.
b. PostgreSQL will accept a setting for any parameter name containing
one or more dots in parts of parameter names.

It's just a matter of how users interpret the line, so maybe your wording is
more meaningful to some users. If you don't think there is any need to
change it, then keep it as it is and let the committer decide. I don't
have any big problem with the current wording.

I think Robert is still not fully convinced about this patch, but from
my side the review of the patch in its current scope is complete, so I will
mark it as Ready For Committer if nobody has objections.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: lob conversion functionality

2013-09-19 Thread Rushabh Lathia
On Thu, Sep 19, 2013 at 10:19 PM, Pavel Stehule pavel.steh...@gmail.comwrote:

 Hello

 here is patch


Looks good.

Marking it as Ready for Committer.




 Regards

 Pavel



 2013/9/19 Pavel Stehule pavel.steh...@gmail.com




 2013/9/19 Rushabh Lathia rushabh.lat...@gmail.com

 Hi Pavel,

 I have reviewed your patch.

 -- Patch got applied cleanly (using patch -p1)
 -- Make & make install work fine
 -- make check looks good

 I did a code walk-through and it looks good. Also did some manual testing and
 haven't found any issue with the implementation.

 The patch introduces two new APIs, load_lo() and make_lo(), for loading and
 saving from/to large objects. When it comes to dropping an lo object created
 using make_lo(), this still depends on the older API lo_unlink(). I think we
 should add that to the documentation for clarification.

 As a user of the lo functions, when I started testing this new API the first
 question that came to mind was why a delete_lo() or destroy_lo() API is
 missing. Later I realized that the older lo_unlink() API is needed for that
 functionality. So I feel it's good to document that. Do let me know what you
 think?


 good idea

 I'll send an updated patch this evening



 Otherwise patch looks nice and clean.


 Thank you :)

 Regards

 Pavel


 Regards,
 Rushabh Lathia
 www.EnterpriseDB.com



 On Sun, Aug 25, 2013 at 8:31 PM, Pavel Stehule 
 pavel.steh...@gmail.comwrote:

 Hello

 here is a patch

 it introduce a load_lo and make_lo functions

 postgres=# select make_lo(decode('ff00','hex'));
  make_lo
 ─────────
24629
 (1 row)

 Time: 40.724 ms
 postgres=# select load_lo(24628);
   load_lo
 ────────────────────
  \xff00
 (1 row)

 postgres=# \lo_import ~/avatar.png
 lo_import 24630

 postgres=# select md5(load_lo(24630));
md5
 ──────────────────────────────────
  513f60836f3b625713acaf1c19b6ea78
 (1 row)

 postgres=# \q
 bash-4.1$ md5sum ~/avatar.png
 513f60836f3b625713acaf1c19b6ea78  /home/pavel/avatar.png
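
 For completeness, dropping an object created this way still goes through the
 existing API, e.g.:

 postgres=# select lo_unlink(24630);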

 Regards

 Pavel Stehule



 2013/8/22 Jov am...@amutu.com

 +1
 We badly need the large object / bytea conversion functions.

 Once I had to use the ugly pg_read_file() to put some text into pg; I
 tried to use a large object but found it useless without a function to
 convert a large object to bytea.

 Jov
 blog: http://amutu.com/blog


 2013/8/10 Pavel Stehule pavel.steh...@gmail.com

  Hello

  I had to enhance my older project, where XML documents are parsed and
 created on the server side - in PLpgSQL and PLPerl procedures. We wanted to
 use the LO API for client-server communication, but we have to
 parse/serialize LOs on the server side.

 I found that there is no simple API for working with LOs from PL without
 access to the file system. I had to use ugly hacks:

 CREATE OR REPLACE FUNCTION parser.save_as_lob(bytea)
 RETURNS oid AS $$
 DECLARE
   _loid oid;
   _substr bytea;
 BEGIN
   _loid := lo_creat(-1);
   FOR i IN 0..length($1)/2048
   LOOP
 _substr := substring($1 FROM i * 2048 + 1 FOR 2048);
 IF _substr <> '' THEN
   INSERT INTO pg_largeobject(loid, pageno, data)
 VALUES(_loid, i, _substr);
 END IF;
   END LOOP;

   EXECUTE format('GRANT SELECT ON LARGE OBJECT %s TO ohs', _loid);
   RETURN _loid;
 END;
 $$ LANGUAGE plpgsql SECURITY DEFINER STRICT SET search_path =
 'pg_catalog';

 and

 CREATE OR REPLACE FUNCTION fbuilder.attachment_to_xml(attachment oid)
 RETURNS xml AS $$
 DECLARE
   b_cum bytea = '';
   b bytea;
 BEGIN
   FOR b IN SELECT l.data
   FROM pg_largeobject l
  WHERE l.loid = attachment_to_xml.attachment
  ORDER BY l.pageno
   LOOP
 b_cum := b_cum || b;
   END LOOP;
   IF NOT FOUND THEN
 RETURN NULL;
   ELSE
 RETURN xmlelement(NAME attachment,
encode(b_cum, 'base64'));
   END IF;
 END;
 $$ LANGUAGE plpgsql STRICT SECURITY DEFINER SET search_path =
 'pg_catalog';

 These functions can be simplified if we support some functions like
 encode, decode for LOs

 So my proposal is creating functions:

 * lo_encode(loid oid) .. returns bytea
 * lo_encode(loid oid, encoding text) .. returns text
 * lo_make(loid oid, data bytea)
 * lo_make(loid oid, data text, encoding text)

 This can simplify all transformations between LO and VARLENA. The known
 limit is 1G for varlena, but it is still relatively high.
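
 For instance, under this proposal one could write (a sketch of the proposed
 functions only; the oid is taken from the examples above):

 SELECT lo_encode(24628);             -- whole LO as bytea
 SELECT lo_encode(24628, 'base64');   -- whole LO as text in a given encoding
 SELECT lo_make(lo_creat(-1), decode('ff00', 'hex'));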

 Notes, comments?

 Regards

 Pavel


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers





 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




 --
 Rushabh Lathia






-- 
Rushabh Lathia