Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-29 Thread Sawada Masahiko
On Tue, Jul 21, 2015 at 3:50 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Mon, Jul 20, 2015 at 9:59 PM, Beena Emerson memissemer...@gmail.com 
 wrote:
 Simon Riggs wrote:

 The choice between formats is not
 solely predicated on whether we have multi-line support.

 I still think writing down some actual use cases would help bring the
 discussion to a conclusion. Inventing a general facility is hard without
 some clear goals about what we need to support.

 We need to at least support the following:
 - Grouping: specify a set of standbys along with the minimum number of commits
 required from the group.
 - Group Type: Groups can either be priority or quorum group.

 As far as I understand, at the lowest level a group is just an alias
 for a list of nodes; quorum or priority are properties that can be
 applied to a group of nodes when the group is used in the expression
 that defines what synchronous commit means.

 - Group names: to simplify status reporting
 - Nesting: At least 2 levels of nesting

 If I am following correctly, at the first level there is the
 definition of the top level objects, like groups and sync expression.


Grouping looks similar to what we get today by giving the same
application_name to different servers. How would different servers with
the same application_name work under this scheme?

 Using JSON, sync rep parameter to replicate in 2 different clusters could be
 written as:

{"remotes":
  {"quorum": 2,
   "servers": [{"london":
                  {"priority": 2,
                   "servers": ["lndn1", "lndn2", "lndn3"]}},
               {"nyc":
                  {"priority": 1,
                   "servers": ["ny1", "ny2"]}}
              ]
  }
}
 The same parameter in the new language (as suggested above) could be written
 as:
  'remotes: 2(london: 1[lndn1, lndn2, lndn3], nyc: 1[ny1, ny2])'

 OK, there is a typo. That's actually 2(london: 2[lndn1, lndn2, lndn3],
 nyc: 1[ny1, ny2]) in your grammar. Honestly, if we want group aliases,
 I think that JSON makes the most sense. One of the advantages of a
 group is that you can use it in several places in the blob and set
 different properties on it, hence we should be able to define a
 group outside of the sync expression.
 Hence I would think that something like that makes more sense:
 {
     "sync_standby_names":
     {
         "quorum": 2,
         "nodes":
         [
             {"priority": 1, "group": "cluster1"},
             {"quorum": 2, "nodes": ["node1", "node2", "node3"]}
         ]
     },
     "groups":
     {
         "cluster1": ["node11", "node12", "node13"],
         "cluster2": ["node21", "node22", "node23"]
     }
 }

 Also, I was thinking the name of the main group could be optional.
 Internally, it can be given the name 'default group' or 'main group' for
 status reporting.

 The above could also be written as:
  '2(london: 2[lndn1, lndn2, lndn3], nyc: 1[ny1, ny2])'

 backward compatible:
 In JSON, while validating we may have to check if it starts with '{' to go

 Something worth noticing, application_name can begin with {.

 for JSON parsing else proceed with the current method.

 A,B,C = 1[A,B,C]. This can be added in the new parser code.

 This makes sense. We could do the same for the JSON-based format by
 reusing the in-memory structure used to deparse the blob when the
 former grammar is used.
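The compact grammar under discussion is small enough to sketch. Below is a toy Python parser for it (nothing here is PostgreSQL code; all names are illustrative), including the proposed backward-compatibility rule that a bare list A,B,C is read as 1[A,B,C]:

```python
import re

def parse_s_s_names(spec):
    """Toy parser for the sync-rep mini-language sketched in this thread.

    "n(...)" is a quorum group, "n[...]" a priority group, and a plain
    comma list "A,B,C" is normalized to 1[A,B,C] for backward
    compatibility, as suggested above.
    """
    spec = spec.strip()
    if re.fullmatch(r'[\w.]+(\s*,\s*[\w.]+)*', spec):   # legacy format
        return {'priority': 1, 'nodes': [s.strip() for s in spec.split(',')]}

    tokens = re.findall(r'\d+|[\w.]+|[()\[\],:]', spec)
    pos = 0

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_expr():
        tok = take()
        if tok.isdigit() and peek() in ('(', '['):
            kind = 'quorum' if take() == '(' else 'priority'
            close = ')' if kind == 'quorum' else ']'
            nodes = [parse_item()]
            while peek() == ',':
                take()
                nodes.append(parse_item())
            assert take() == close, 'unbalanced group'
            return {kind: int(tok), 'nodes': nodes}
        return tok                      # a bare standby name

    def parse_item():
        if pos + 1 < len(tokens) and tokens[pos + 1] == ':':
            name = take()               # "label: expr" names a group
            take()                      # consume ':'
            return {'name': name, 'group': parse_expr()}
        return parse_expr()

    result = parse_item()
    assert pos == len(tokens), 'trailing input'
    return result
```

For example, `parse_s_s_names('2(london: 2[lndn1, lndn2, lndn3], nyc: 1[ny1, ny2])')` yields a nested quorum/priority structure, while `parse_s_s_names('A, B, C')` falls back to the legacy single-priority form.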

If I validate the s_s_names JSON syntax, I will definitely use JSONB
rather than JSON, because JSONB has some useful functions today for
adding and deleting nodes in s_s_names.
But the downside of using JSONB for s_s_names is that it can change the
order of the keys (and remove duplicate keys).
For example, with the syntax Michael suggested:


* JSON (just casting to JSON)
  json

 {+
 "sync_standby_names": +
 {+
 "quorum": 2,+
 "nodes": +
 [+
 {"priority": 1, "group": "cluster1"},+
 {"quorum": 2, "nodes": ["node1", "node2", "node3"]}+
 ]+
 },+
 "groups": +
 {+
 "cluster1": ["node11", "node12", "node13"],+
 "cluster2": ["node21", "node22", "node23"]+
 }+
 }

* JSONB (using jsonb_pretty)
 jsonb_pretty
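The key-reordering concern can be reproduced outside PostgreSQL. The sketch below models jsonb's object-key normalization (keys stored sorted by length first, then bytewise; duplicate keys collapsed) to show why a jsonb-validated s_s_names would not round-trip byte-for-byte. It is an approximation of the behavior, not PostgreSQL's implementation:

```python
import json

def jsonb_normalize(obj):
    # jsonb stores object keys sorted by length first, then bytewise,
    # and collapses duplicate keys (last value wins -- json.loads already
    # does that part).  This is a rough model, not PostgreSQL's code.
    if isinstance(obj, dict):
        return {k: jsonb_normalize(obj[k])
                for k in sorted(obj, key=lambda k: (len(k), k))}
    if isinstance(obj, list):
        return [jsonb_normalize(v) for v in obj]
    return obj

text = '{"sync_standby_names": {"quorum": 2, "nodes": ["node1"]}, "groups": {}}'
print(json.dumps(jsonb_normalize(json.loads(text))))
# -> {"groups": {}, "sync_standby_names": {"nodes": ["node1"], "quorum": 2}}
```

The stored form no longer matches the text the user typed, which is exactly the round-trip problem raised above.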

Re: [HACKERS] Freeze avoidance of very large table.

2015-07-16 Thread Sawada Masahiko
On Wed, Jul 15, 2015 at 3:07 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Wed, Jul 15, 2015 at 12:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 10 July 2015 at 15:11, Sawada Masahiko sawada.m...@gmail.com wrote:


 Oops, I had forgotten to add new file heapfuncs.c.
 Latest patch is attached.


 I think we've established the approach is desirable and defined the way
 forwards for this, so this is looking good.

 If we want to move stuff like pg_stattuple, pg_freespacemap into core,
 we could move them into heapfuncs.c.

 Some of my requests haven't been actioned yet, so I personally would not
 commit this yet. I am happy to continue as reviewer/committer unless others
 wish to take over.
 The main missing item is pg_upgrade support, which won't happen by end of
 CF1, so I am marking this as Returned With Feedback. Hopefully we can review
 this again before CF2.

 I appreciate your reviewing.
 Yeah, the pg_upgrade support and regression tests for the VFM patch are
 almost done now; I will submit the patch this week after testing it.

Attached patch is latest v9 patch.

I added:
- regression test for visibility map (visibilitymap.sql and
visibilitymap.out files)
- pg_upgrade support (rewriting vm file to vfm file)
- regression test for pg_upgrade

Please review it.
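For reference, the vm-to-vfm rewrite listed above amounts to widening each 1-bit-per-page map entry into a 2-bit pair. A sketch of the idea (illustrative only; it ignores the real files' page headers and exact on-disk layout):

```python
def rewrite_vm_to_vfm(vm_bytes):
    # Old map: 1 bit (all-visible) per heap page.  New map: 2 bits
    # (all-visible, all-frozen) per page, so the bitmap doubles in size.
    # Every set bit becomes an all-visible bit pair with all-frozen left
    # unset; the freezing cost is then paid gradually by later vacuums.
    out = bytearray(len(vm_bytes) * 2)
    for i, byte in enumerate(vm_bytes):
        pair = 0
        for bit in range(8):
            if byte & (1 << bit):
                pair |= 1 << (2 * bit)      # old bit k -> new bit 2k
        out[2 * i] = pair & 0xFF
        out[2 * i + 1] = pair >> 8
    return bytes(out)
```

No heap page needs to be read for this transformation, which is the point of doing it at upgrade time.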

Regards,

--
Masahiko Sawada
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 22c5f7a..b1b6a06 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -87,7 +87,7 @@ statapprox_heap(Relation rel, output_type *stat)
 		 * If the page has only visible tuples, then we can find out the free
 		 * space from the FSM and move on.
 		 */
-		if (visibilitymap_test(rel, blkno, vmbuffer))
+		if (visibilitymap_test(rel, blkno, vmbuffer, VISIBILITYMAP_ALL_VISIBLE))
 		{
 			freespace = GetRecordedFreeSpace(rel, blkno);
 			stat->tuple_len += BLCKSZ - freespace;
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index b83d496..806ce27 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -12,6 +12,7 @@ subdir = src/backend/access/heap
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o
+OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o \
+	heapfuncs.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 86a2e6b..796b76f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2131,8 +2131,9 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
 
 	/*
-	 * Find buffer to insert this tuple into.  If the page is all visible,
-	 * this will also pin the requisite visibility map page.
+	 * Find buffer to insert this tuple into.  If the page is all visible
+	 * or all frozen, this will also pin the requisite visibility map and
+	 * frozen map page.
 	 */
 	buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
 	   InvalidBuffer, options, bistate,
@@ -2147,7 +2148,11 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	if (PageIsAllVisible(BufferGetPage(buffer)))
 	{
 		all_visible_cleared = true;
+
+		/* all-frozen information is also cleared at the same time */
 		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllFrozen(BufferGetPage(buffer));
+
 		visibilitymap_clear(relation,
 			ItemPointerGetBlockNumber(&(heaptup->t_self)),
 			vmbuffer);
@@ -2448,7 +2453,11 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 		if (PageIsAllVisible(page))
 		{
 			all_visible_cleared = true;
+
+			/* all-frozen information is also cleared at the same time */
 			PageClearAllVisible(page);
+			PageClearAllFrozen(page);
+
 			visibilitymap_clear(relation,
 BufferGetBlockNumber(buffer),
 vmbuffer);
@@ -2731,9 +2740,9 @@ heap_delete(Relation relation, ItemPointer tid,
 
 	/*
 	 * If we didn't pin the visibility map page and the page has become all
-	 * visible while we were busy locking the buffer, we'll have to unlock and
-	 * re-lock, to avoid holding the buffer lock across an I/O.  That's a bit
-	 * unfortunate, but hopefully shouldn't happen often.
+	 * visible or all frozen while we were busy locking the buffer, we'll
+	 * have to unlock and re-lock, to avoid holding the buffer lock across an
+	 * I/O.  That's a bit unfortunate, but hopefully shouldn't happen often.
 	 */
 	if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
 	{
@@ -2925,10 +2934,15 @@ l1:
 	 */
 	PageSetPrunable(page, xid);
 
+	/* clear PD_ALL_VISIBLE and PD_ALL_FROZEN flags */
 	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
+
+		/* all-frozen information is also cleared at the same time */
 		PageClearAllVisible(page

Re: [HACKERS] Freeze avoidance of very large table.

2015-07-14 Thread Sawada Masahiko
On Wed, Jul 15, 2015 at 12:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 10 July 2015 at 15:11, Sawada Masahiko sawada.m...@gmail.com wrote:


 Oops, I had forgotten to add new file heapfuncs.c.
 Latest patch is attached.


 I think we've established the approach is desirable and defined the way
 forwards for this, so this is looking good.

If we want to move stuff like pg_stattuple, pg_freespacemap into core,
we could move them into heapfuncs.c.

 Some of my requests haven't been actioned yet, so I personally would not
 commit this yet. I am happy to continue as reviewer/committer unless others
 wish to take over.
 The main missing item is pg_upgrade support, which won't happen by end of
 CF1, so I am marking this as Returned With Feedback. Hopefully we can review
 this again before CF2.

I appreciate your reviewing.
Yeah, the pg_upgrade support and regression test for VFM patch is
almost done now, I will submit the patch in this week after testing it
.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Sawada Masahiko
On Tue, Jul 7, 2015 at 9:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 6 July 2015 at 17:28, Simon Riggs si...@2ndquadrant.com wrote:

 I think we need something for pg_upgrade to rewrite existing VMs.
 Otherwise a large read only database would suddenly require a massive
 revacuum after upgrade, which seems bad. That can wait for now until we all
 agree this patch is sound.


 Since we need to rewrite the vm map, I think we should call the new map
 vfm

 That way we will be able to easily check whether the rewrite has been
 conducted on all relations.

 Since the maps are just bits there is no other way to tell that a map has
 been rewritten

To avoid revacuum after upgrade, you mean that we need to rewrite each
bit of the vm to the corresponding bits of the vfm when upgrading from a
version without vfm support (i.e., 9.5 or earlier), right?
If so, we would need to scan the whole table, which is just as expensive.
Clearing the vm and doing a revacuum would be better than doing it
during the upgrade, I think.

Regards,

--
Masahiko Sawada




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-13 Thread Sawada Masahiko
On Fri, Jul 10, 2015 at 10:06 PM, Beena Emerson memissemer...@gmail.com wrote:
 Hello,

 Tue, Jul 7, 2015 at 02:56 AM, Josh Berkus wrote:
 pro-JSON:

 * standard syntax which is recognizable to sysadmins and devops.
 * can use JSON/JSONB functions with ALTER SYSTEM SET to easily make
 additions/deletions from the synch rep config.
 * can add group labels (see below)

 Adding group labels does have a lot of value, but as Amit has pointed out,
 with a little modification they can be included in the GUC as well. It will
 not make it any more complex.

 On Tue, Jul 7, 2015 at 2:19 PM, Michael Paquier wrote:

 Something like pg_syncinfo/ coupled with a LW lock, we already do
 something similar for replication slots with pg_replslot/.

 I was trying to figure out how the JSON metadata can be used.
 It would have to be set using a given set of functions. Right?
 I am sorry this question is very basic.

 The functions could be something like:
 1. pg_add_synch_set(set_name NAME, quorum INT, is_priority bool, set_members
 VARIADIC)

 This will be used to add a sync set. The set_members can be individual
 elements or the name of another set. The parameter is_priority decides
 whether the set is a priority (true) or quorum (false) set. This function
 call will create a folder pg_syncinfo/groups/$NAME and store the JSON blob?

 The root group would be automatically set by finding the group which is not
 included in any other group? Or can it be set by another function?

 2. pg_modify_sync_set(set_name NAME, quorum INT, is_priority bool,
 set_members VARIADIC)

 This will update the pg_syncinfo/groups/$NAME to store the new values.

 3. pg_drop_synch_set(set_name NAME)

 This will remove the pg_syncinfo/groups/$NAME folder. Also, all the groups
 which included this one would be updated?

 4. pg_show_synch_set()

 this will display the current sync setting in json format.

 Am I missing something?

 Is JSON being preferred because it would be ALTER SYSTEM friendly and in a
 format already known to users?

 In a real-life scenario, at most how many groups and nesting would be
 expected?


I might be missing something, but will these functions generate WAL?
If they do, we will face the situation Fujii-san pointed out, where we
need to wait forever.


Regards,

--
Masahiko Sawada




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Sawada Masahiko
On Mon, Jul 13, 2015 at 7:46 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Jul 13, 2015 at 3:39 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

 On Tue, Jul 7, 2015 at 9:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On 6 July 2015 at 17:28, Simon Riggs si...@2ndquadrant.com wrote:
 
  I think we need something for pg_upgrade to rewrite existing VMs.
  Otherwise a large read only database would suddenly require a massive
  revacuum after upgrade, which seems bad. That can wait for now until we
  all
  agree this patch is sound.
 
 
  Since we need to rewrite the vm map, I think we should call the new
  map
  vfm
 
  That way we will be able to easily check whether the rewrite has been
  conducted on all relations.
 
  Since the maps are just bits there is no other way to tell that a map
  has
  been rewritten

 To avoid revacuum after upgrade, you mean that we need to rewrite
 each bit of the vm to the corresponding bits of the vfm when upgrading
 from a version without vfm support (i.e., 9.5 or earlier), right?
 If so, we would need to scan the whole table, which is just as
 expensive.
 Clearing the vm and doing a revacuum would be better than doing it
 during the upgrade, I think.


 How will you ensure to have revacuum for all the tables after
 upgrading?

We can use the script files which are generated by pg_upgrade.

  Till the time Vacuum is done on the tables that
 have vm before upgrade, any queries on those tables can
 become slower.

Even if we implement a rewriting tool for the vm in pg_upgrade, it will
take as much time as a revacuum because it needs to scan the whole table.
I meant that we rewrite the vm using an existing facility (i.e., vacuum
(freeze)) instead of implementing a new rewriting tool for the vm.

Regards,

--
Masahiko Sawada




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Sawada Masahiko
On Mon, Jul 13, 2015 at 9:22 PM, Andres Freund and...@anarazel.de wrote:
 On 2015-07-13 21:03:07 +0900, Sawada Masahiko wrote:
 Even if we implement a rewriting tool for the vm in pg_upgrade, it will
 take as much time as a revacuum because it needs to scan the whole table.

 Why would it? Sure, you can only set allvisible and not the frozen bit,
 but that's fine. That way the cost for freezing can be paid over time.

 If we require terabytes of data to be scanned, including possibly
 rewriting large portions due to freezing, before index-only scans work
 and most vacuums act in a partial manner, the migration to 9.6 will be a
 major pain for our users.

Ah, if we set all the bits as not-all-frozen, we don't need to scan the
whole table, only the vm.
And I agree with this.

But please imagine the case where the old cluster has a table which is
very large, read-only, and already vacuum-frozen.
In this case, the all-frozen bits of such a table in the new cluster will
not be set unless we run vacuum freeze again.
The all-frozen information of such a table is lost.

Regards,

--
Masahiko Sawada




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-10 Thread Sawada Masahiko
On Fri, Jul 10, 2015 at 3:42 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Wed, Jul 8, 2015 at 10:10 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

 On Thu, Jul 9, 2015 at 4:31 AM, Jeff Janes jeff.ja...@gmail.com wrote:
  On Fri, Jul 3, 2015 at 1:25 AM, Sawada Masahiko sawada.m...@gmail.com
  wrote:
 
  It's impossible to have VM bits set to frozen but not visible.
  These bits are controlled independently, but eventually, when the
  all-frozen bit is set, all-visible is also set.
 
 
  If that combination is currently impossible, could it be used indicate
  that
  the page is all empty?

 Yeah, the state where the VM bits are set to frozen but not visible is
 impossible, so we could use this state to represent some other status
 of the page.

  Having a crash-proof bitmap of all-empty pages would make vacuum
  truncation
  scans much more efficient.

 An empty page is always marked all-visible by vacuum today; isn't that
 enough?


 The current vacuum can just remember that they were empty as well as
 all-visible.

 But the next vacuum that occurs on the table won't know that they are empty,
 just that they are all-visible, so it can't truncate them away without
 having to read each one first.

Yeah, that would make vacuuming empty pages more efficient.
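The spare bit combination being discussed can be summarized as a small state table. A sketch of the idea (illustrative only, not committed behavior):

```python
ALL_VISIBLE = 0x01
ALL_FROZEN = 0x02

def page_state(vm_bits):
    # Two bits per heap page give four states.  "Frozen but not visible"
    # cannot occur (a frozen page is necessarily all-visible), so that
    # spare combination could mark "all-empty" pages, and a truncation
    # scan could then drop trailing empty pages without reading them.
    return {
        0: 'unknown: page must be read',
        ALL_VISIBLE: 'all-visible',
        ALL_VISIBLE | ALL_FROZEN: 'all-visible and all-frozen',
        ALL_FROZEN: 'all-empty (repurposed impossible state)',
    }[vm_bits & 0x03]
```

With such an encoding, vacuum truncation could consult only the map for the tail of the relation instead of reading each candidate page.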


 It is a minor thing, but if there is no other use for this fourth
 bit-space, it seems a shame to waste it when there is some use for it.  I
 haven't looked at the code around this area to know how hard it would be to
 implement the setting and clearing of the bit.

I think so too; we would be able to make efficient use of the unused
fourth bit state.
Should I include this improvement in this patch?
I think this topic should be discussed in another thread after this
feature is committed.

Regards,

--
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-10 Thread Sawada Masahiko
On Fri, Jul 10, 2015 at 10:43 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Jul 10, 2015 at 2:41 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Fri, Jul 10, 2015 at 3:05 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:

 Also something for pg_upgrade is also not yet.

 TODO
 - Test case for this feature
 - pg_upgrade support.


 I had forgotten to change the fork name of visibility map to vfm.
 Attached latest v7 patch.
 Please review it.

 The compilation failed on my machine...

 gcc -Wall -Wmissing-prototypes -Wpointer-arith
 -Wdeclaration-after-statement -Wendif-labels
 -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
 -fwrapv -g -O0 -I../../../../src/include -D_GNU_SOURCE   -c -o
 visibilitymap.o visibilitymap.c
 make[4]: *** No rule to make target `heapfuncs.o', needed by
 `objfiles.txt'.  Stop.
 make[4]: *** Waiting for unfinished jobs
 ( echo src/backend/access/index/genam.o
 src/backend/access/index/indexam.o ) >objfiles.txt
 make[4]: Leaving directory `/home/postgres/pgsql/git/src/backend/access/index'
 gcc -Wall -Wmissing-prototypes -Wpointer-arith
 -Wdeclaration-after-statement -Wendif-labels
 -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
 -fwrapv -g -O0 -I../../../src/include -D_GNU_SOURCE   -c -o
 tablespace.o tablespace.c
 gcc -Wall -Wmissing-prototypes -Wpointer-arith
 -Wdeclaration-after-statement -Wendif-labels
 -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
 -fwrapv -g -O0 -I../../../src/include -D_GNU_SOURCE   -c -o
 instrument.o instrument.c
 make[4]: Leaving directory `/home/postgres/pgsql/git/src/backend/access/heap'
 make[3]: *** [heap-recursive] Error 2
 make[3]: Leaving directory `/home/postgres/pgsql/git/src/backend/access'
 make[2]: *** [access-recursive] Error 2
 make[2]: *** Waiting for unfinished jobs


Oops, I had forgotten to add new file heapfuncs.c.
Latest patch is attached.

Regards,

--
Sawada Masahiko


000_add_frozen_bit_into_visibilitymap_v8.patch
Description: Binary data



Re: [HACKERS] Freeze avoidance of very large table.

2015-07-09 Thread Sawada Masahiko
On Tue, Jul 7, 2015 at 8:49 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Jul 2, 2015 at 9:00 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:


 Thank you for bug report, and comments.

 Fixed version is attached, and source code comment is also updated.
 Please review it.


 I am looking into this patch and would like to share my findings with
 you:

Thank you for comment.
I appreciate you taking time to review this patch.


 1.
 @@ -2131,8 +2133,9 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 
 	CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
 
 	/*
 -	 * Find buffer to insert this tuple into.  If the page is all visible,
 -	 * this will also pin the requisite visibility map page.
 +	 * Find buffer to insert this tuple into.  If the page is all visible
 +	 * of all frozen, this will also pin the requisite visibility map and
 +	 * frozen map page.
 	 */

 typo in comments.

 /of all frozen/or all frozen

Fixed.

 2.
 visibilitymap.c
 + * The visibility map is a bitmap with two bits (all-visible and all-frozen
 + * per heap page.

 /and all-frozen/and all-frozen)
 closing round bracket is missing.

Fixed.

 3.
 visibilitymap.c
 -/*#define TRACE_VISIBILITYMAP */
 +#define TRACE_VISIBILITYMAP

 why is this hash define opened?

Fixed.

 4.
 -visibilitymap_count(Relation rel)
 +visibilitymap_count(Relation rel, bool for_visible)

 This API needs to count set bits for either visibility info, frozen info
 or both (if required), it seems better to have second parameter as
 uint8 flags rather than bool. Also, if it is required to be called at most
 places for both visibility and frozen bits count, why not get them
 in one call?

Fixed.

 5.
 Clearing the visibility and frozen bits separately for DML
 operations would lead to locking/unlocking the corresponding buffer
 twice; can we do it as one operation?  I think this was suggested
 by Simon as well.

The latest patch clears both bits in one operation, and sets all-frozen
together with all-visible in one operation.
We can judge whether a page is all-frozen in two places: when first
scanning the page (lazy_scan_heap), and after cleaning garbage
(lazy_vacuum_page).

 6.
 - * Before locking the buffer, pin the visibility map page if it appears to
 - * be necessary.  Since we haven't got the lock yet, someone else might be
 + * Before locking the buffer, pin the visibility map if it appears to be
 + * necessary.  Since we haven't got the lock yet, someone else might be

 Why you have deleted 'page' in above comment?

Fixed.

 7.
 @@ -3490,21 +3532,23 @@ l2:
 		UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
 
 	if (vmbuffer != InvalidBuffer)
 		ReleaseBuffer(vmbuffer);
 +
 	bms_free(hot_attrs);

 Seems unnecessary change.

Fixed.

 8.
 @@ -1919,11 +1919,18 @@ index_update_stats(Relation rel,
 	{
 		BlockNumber relpages = RelationGetNumberOfBlocks(rel);
 		BlockNumber relallvisible;
 +		BlockNumber relallfrozen;
 
 		if (rd_rel->relkind != RELKIND_INDEX)
 -			relallvisible = visibilitymap_count(rel);
 +		{
 +			relallvisible = visibilitymap_count(rel, true);
 +			relallfrozen = visibilitymap_count(rel, false);
 +		}
 		else	/* don't bother for indexes */
 +		{
 			relallvisible = 0;
 +			relallfrozen = 0;
 +		}

 I think in this function, you have forgotten to update the
 relallfrozen value in pg_class.

Fixed.

 9.
 vacuumlazy.c

 @@ -253,14 +258,16 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	 * NB: We need to check this before truncating the relation, because that
 	 * will change ->rel_pages.
 	 */
 -	if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
 +	if ((vacrelstats->scanned_pages + vacrelstats->vmskipped_pages)
 +		< vacrelstats->rel_pages)
 	{
 -		Assert(!scan_all);

 Why have you removed this Assert? Won't the count of
 vacrelstats->scanned_pages + vacrelstats->vmskipped_pages be
 equal to vacrelstats->rel_pages when scan_all = true?

Fixed.

 10.
 vacuumlazy.c
 lazy_vacuum_rel()
 ..
 + scanned_all |= scan_all;
 +

 Why this new assignment is added, please add a comment to
 explain it.

It's not necessary, removed.

 11.
 lazy_scan_heap()
 ..
 + * Also, skipping even a single page accorind to all-visible bit of
 + * visibility map means that we can't update relfrozenxid, so we only want
 + * to do it if we can skip a goodly number. On the other hand, we count
 + * both how many pages we skipped according to all-frozen bit of visibility
 + * map and how many pages we freeze page, so we can update relfrozenxid if
 + * the sum of their is as many as tuples per page.

 a.
 typo
 /accorind/according

Fixed.

 b.
 is the second part of comment (starting from On the other hand)
 right?  I mean you are comparing sum of pages skipped due to
 all_frozen bit and number of pages freezed with tuples per page.
 I don't understand how are they related?


It's wrong; what I wanted to say in the last sentence is that we can
update relfrozenxid if the sum of them equals the number of pages in the
table.
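In other words, the intended condition is that scanned pages plus pages skipped as all-frozen must together cover the whole table before relfrozenxid can be advanced. A sketch (variable names follow the quoted patch, but the function itself is illustrative):

```python
def can_advance_relfrozenxid(rel_pages, scanned_pages, vmskipped_pages):
    # Vacuum may advance relfrozenxid only when every heap page was either
    # actually scanned or skipped because the VM already marks it
    # all-frozen, i.e. the two counts together cover the whole table.
    return scanned_pages + vmskipped_pages >= rel_pages
```

A single page skipped for any other reason leaves an unexamined XID, so the cutoff cannot move.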

 12.
 @@ -918,8 +954,13 @@ lazy_scan_heap

Re: [HACKERS] Freeze avoidance of very large table.

2015-07-09 Thread Sawada Masahiko
On Fri, Jul 10, 2015 at 3:05 AM, Sawada Masahiko sawada.m...@gmail.com wrote:

 Also something for pg_upgrade is also not yet.

 TODO
 - Test case for this feature
 - pg_upgrade support.


I had forgotten to change the fork name of visibility map to vfm.
Attached latest v7 patch.
Please review it.

Regards,

--
Sawada Masahiko
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 22c5f7a..b1b6a06 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -87,7 +87,7 @@ statapprox_heap(Relation rel, output_type *stat)
 		 * If the page has only visible tuples, then we can find out the free
 		 * space from the FSM and move on.
 		 */
-		if (visibilitymap_test(rel, blkno, vmbuffer))
+		if (visibilitymap_test(rel, blkno, vmbuffer, VISIBILITYMAP_ALL_VISIBLE))
 		{
 			freespace = GetRecordedFreeSpace(rel, blkno);
 			stat->tuple_len += BLCKSZ - freespace;
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index b83d496..806ce27 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -12,6 +12,7 @@ subdir = src/backend/access/heap
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o
+OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o \
+	heapfuncs.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 86a2e6b..ac74100 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -88,7 +88,7 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 Buffer newbuf, HeapTuple oldtup,
 HeapTuple newtup, HeapTuple old_key_tup,
-bool all_visible_cleared, bool new_all_visible_cleared);
+			bool all_visible_cleared, bool new_all_visible_cleared);
 static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
 			 Bitmapset *hot_attrs,
 			 Bitmapset *key_attrs, Bitmapset *id_attrs,
@@ -2131,8 +2131,9 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
 
 	/*
-	 * Find buffer to insert this tuple into.  If the page is all visible,
-	 * this will also pin the requisite visibility map page.
+	 * Find buffer to insert this tuple into.  If the page is all visible
+	 * or all frozen, this will also pin the requisite visibility map and
+	 * frozen map page.
 	 */
 	buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
 	   InvalidBuffer, options, bistate,
@@ -2147,10 +2148,13 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	if (PageIsAllVisible(BufferGetPage(buffer)))
 	{
 		all_visible_cleared = true;
+
+		/* all-frozen information is also cleared at the same time */
 		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllFrozen(BufferGetPage(buffer));
+
 		visibilitymap_clear(relation,
-			ItemPointerGetBlockNumber(&(heaptup->t_self)),
-			vmbuffer);
+			ItemPointerGetBlockNumber(&(heaptup->t_self)), vmbuffer);
 	}
 
 	/*
@@ -2448,10 +2452,12 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 		if (PageIsAllVisible(page))
 		{
 			all_visible_cleared = true;
+
+			/* all-frozen information is also cleared at the same time */
 			PageClearAllVisible(page);
-			visibilitymap_clear(relation,
-BufferGetBlockNumber(buffer),
-vmbuffer);
+			PageClearAllFrozen(page);
+
+			visibilitymap_clear(relation, BufferGetBlockNumber(buffer), vmbuffer);
 		}
 
 		/*
@@ -2495,7 +2501,9 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 			/* the rest of the scratch space is used for tuple data */
 			tupledata = scratchptr;
 
-			xlrec->flags = all_visible_cleared ? XLH_INSERT_ALL_VISIBLE_CLEARED : 0;
+			if (all_visible_cleared)
+				xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
 			xlrec->ntuples = nthispage;
 
 			/*
@@ -2731,9 +2739,9 @@ heap_delete(Relation relation, ItemPointer tid,
 
 	/*
 	 * If we didn't pin the visibility map page and the page has become all
-	 * visible while we were busy locking the buffer, we'll have to unlock and
-	 * re-lock, to avoid holding the buffer lock across an I/O.  That's a bit
-	 * unfortunate, but hopefully shouldn't happen often.
+	 * visible or all frozen while we were busy locking the buffer, we'll
+	 * have to unlock and re-lock, to avoid holding the buffer lock across an
+	 * I/O.  That's a bit unfortunate, but hopefully shouldn't happen often.
 	 */
 	if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
 	{
@@ -2925,12 +2933,16 @@ l1:
 	 */
 	PageSetPrunable(page, xid);
 
+	/* clear PD_ALL_VISIBLE and PD_ALL_FROZEN flags */
 	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
+
+		/* all-frozen information is also cleared

Re: [HACKERS] Freeze avoidance of very large table.

2015-07-08 Thread Sawada Masahiko
On Thu, Jul 9, 2015 at 4:31 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Fri, Jul 3, 2015 at 1:25 AM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

 On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
  On 2 July 2015 at 16:30, Sawada Masahiko sawada.m...@gmail.com wrote:
 
 
  Also, the flags of each heap page header might be set PD_ALL_FROZEN,
  as well as all-visible
 
 
  Is it possible to have VM bits set to frozen but not visible?
 
  The description makes those two states sound independent of each other.
 
  Are they? Or not? Do we test for an impossible state?
 

 It's impossible to have VM bits set to frozen but not visible.
 These bit are controlled independently. But eventually, when
 all-frozen bit is set, all-visible is also set.


 If that combination is currently impossible, could it be used indicate that
 the page is all empty?

Yeah, a VM state of frozen-but-not-visible is impossible, so we could
use this state to represent some other status of the page.

 Having a crash-proof bitmap of all-empty pages would make vacuum truncation
 scans much more efficient.

The empty page is always marked all-visible by vacuum today; isn't that enough?

Regards,

--
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-07 Thread Sawada Masahiko
On Wed, Jul 8, 2015 at 12:37 AM, Andres Freund and...@anarazel.de wrote:
 On 2015-07-07 16:25:13 +0100, Simon Riggs wrote:
 I don't think pg_freespacemap is the right place.

 I agree that pg_freespacemap sounds like an odd location.

 I'd prefer to add that as a single function into core, so we can write
 formal tests.

 With the advent of src/test/modules it's not really a prerequisite for
 things to be builtin to be testable. I think there's fair arguments for
 moving stuff like pg_stattuple, pg_freespacemap, pg_buffercache into
 core at some point, but that's probably a separate discussion.


Understood.
So I will put a bunch of tests, like src/test/modules/visibilitymap_test,
which contains some tests for this feature,
and gather them into one patch.

Regards,

--
Sawada Masahiko




Re: [HACKERS] More logging for autovacuum

2015-07-07 Thread Sawada Masahiko
On Sat, Jul 4, 2015 at 2:30 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Jul 2, 2015 at 4:41 AM, Gurjeet Singh gurj...@singh.im wrote:

 On Fri, Aug 17, 2007 at 3:14 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:

 Gregory Stark wrote:
 
  I'm having trouble following what's going on with autovacuum and I'm
  finding
  the existing logging insufficient. In particular that it's only logging
  vacuum
  runs *after* the vacuum finishes makes it hard to see what vacuums are
  running
  at any given time. Also, I want to see what is making autovacuum decide
  to
  forgo vacuuming a table and the log with that information is at DEBUG2.

 So did this idea go anywhere?


 Assuming the thread stopped here, I'd like to rekindle the proposal.

 log_min_messages acts as a single gate for everything headed for the
 server logs; controls for per-background process logging do not exist. If
 one wants to see DEBUG/INFO messages for just the Autovacuum (or
 checkpointer, bgwriter, etc.), they have to set log_min_messages to that
 level, but the result would be a lot of clutter from other processes to
 grovel through, to see the messages of interest.


 I think that will be quite helpful.  During the patch development of
 parallel sequential scan, it was quite helpful to see the LOG messages
 of bgworkers, however one of the recent commits (91118f1a) have
 changed those to DEBUG1, now if I have to enable all DEBUG1
 messages, then what I need will be difficult to find in all the log
 messages.
 Having control of separate logging for background tasks will serve such
 a purpose.

 The facilities don't yet exist, but it'd be nice if such parameters when
 unset (ie NULL) pick up the value of log_min_messages. So by default, the
 users will get the same behaviour as today, but can choose to tweak per
 background-process logging when needed.


 Will this proposal allow user to see all the messages by all the background
 workers or will it be per background task.  Example in some cases user
 might want to see the messages by all bgworkers and in some cases one
 might want to see just messages by AutoVacuum and it's workers.

 I think here designing User Interface is an important work if others also
 agree
 that such an option is useful, could you please elaborate your proposal?

+1

I sometimes want to set log_min_messages per process, especially when
DEBUG-level (or lower) logging is needed.
Today it's not easy to set the log level for a particular process from
the moment it is launched.

Regards,

--
Sawada Masahiko





Re: [HACKERS] Freeze avoidance of very large table.

2015-07-07 Thread Sawada Masahiko
 So I don't understand why you have two separate calls to visibilitymap_clear()
 Surely the logic should be to clear both bits at the same time?
Yes, you're right. The all-frozen bit should be cleared at the same time
as the all-visible bit.

 1. Both bits unset   ~(VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN)
 which can be changed to state 2 only
 2. VISIBILITYMAP_ALL_VISIBLE only
 which can be changed state 1 or state 3
 3. VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN
 which can be changed to state 1 only
 If that is the case please simplify the logic for setting and unsetting the 
 bits so they are set together efficiently.
 At the same time please also put in Asserts to ensure that the state logic is 
 maintained when it is set and when it is tested.

 In patch, during Vacuum first the frozen bit is set and then the visibility
 will be set in a later operation, now if the crash happens between those
 2 operations, then isn't it possible that the frozen bit is set and visible
 bit is not set?

In the current patch, the frozen bit is set first in lazy_scan_heap(),
so it's possible for the VM to have the frozen bit set but not the
visible bit, as Amit pointed out.
To fix it, I'm simplifying the patch so that both bits are set at the
same time.

 I would also like to see the visibilitymap_test function exposed in SQL,
 so we can write code to examine the map contents for particular ctids.
 By doing that we can then write a formal test that shows the evolution of 
 tuples from insertion,
 vacuuming and freezing, testing the map has been set correctly at each stage.
 I guess that needs to be done as an isolationtest so we have an observer that 
 contrains the xmin in various ways.
 In light of multixact bugs, any code that changes the on-disk tuple metadata 
 needs formal tests.

The attached patch adds a few functions to contrib/pg_freespacemap to
explore the inside of the visibility map, which I used for my tests.
I hope it helps with testing this feature.

 I think we need something for pg_upgrade to rewrite existing VMs.
 Otherwise a large read only database would suddenly require a massive
 revacuum after upgrade, which seems bad. That can wait for now until we all
 agree this patch is sound.

Yeah, I will address them.

Regards,

--
Sawada Masahiko


001_visibilitymap_test_function.patch
Description: Binary data



Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-06 Thread Sawada Masahiko
On Thu, Jul 2, 2015 at 9:31 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Jul 2, 2015 at 5:44 PM, Beena Emerson memissemer...@gmail.com wrote:
 Hello,
 There has been a lot of discussion. It has become a bit confusing.
 I am summarizing my understanding of the discussion till now.
 Kindly let me know if I missed anything important.

 Backward compatibility:
 We have to provide support for the current format and behavior for
 synchronous replication (The first running standby from list s_s_names)
 In case the new format does not include GUC, then a special value to be
 specified for s_s_names to indicate that.

 Priority and quorum:
 Quorum treats all the standby with same priority while in priority behavior,
 each one has a different priority and ACK must be received from the
 specified k lowest priority servers.
 I am not sure how combining both will work out.
 Mostly we would like to have some standbys from each data center to be in
 sync. Can it not be achieved by quorum only?

 So you're wondering if there is the use case where both quorum and priority 
 are
 used together?

 For example, please imagine the case where you have two standby servers
 (say A and B) in local site, and one standby server (say C) in remote disaster
 recovery site. You want to set up sync replication so that the master waits 
 for
 ACK from either A or B, i.e., the setting of 1(A, B). Also only when either A
 or B crashes, you want to make the master wait for ACK from either the
 remaining local standby or C. On the other hand, you don't want to use the
 setting like 1(A, B, C). Because in this setting, C can be sync standby when
 the master craches, and both A and B might be very behind of C. In this case,
 you need to promote the remote standby server C to new master,,, this is what
 you'd like to avoid.

 The setting that you need is 1(1[A, C], 1[B, C]) in Michael's proposed 
 grammer.


If we set the remote disaster recovery site up as a synch replica, we
would see large latencies even though we use quorum commit.
So I think the case Fujii-san suggested is a good configuration, and
many users would want to use it.
I tend to agree with combining quorum and prioritization into one GUC
parameter while keeping backward compatibility.

Regards,

--
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-06 Thread Sawada Masahiko
On Fri, Jul 3, 2015 at 5:25 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 2 July 2015 at 16:30, Sawada Masahiko sawada.m...@gmail.com wrote:


 Also, the flags of each heap page header might be set PD_ALL_FROZEN,
 as well as all-visible


 Is it possible to have VM bits set to frozen but not visible?

 The description makes those two states sound independent of each other.

 Are they? Or not? Do we test for an impossible state?


 It's impossible to have VM bits set to frozen but not visible.
 These bit are controlled independently. But eventually, when
 all-frozen bit is set, all-visible is also set.

Attached is the latest version, including some bug fixes.
Please review it.

Regards,

--
Sawada Masahiko


000_add_frozen_bit_into_visibilitymap_v5.patch
Description: Binary data



Re: [HACKERS] Freeze avoidance of very large table.

2015-07-03 Thread Sawada Masahiko
On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 2 July 2015 at 16:30, Sawada Masahiko sawada.m...@gmail.com wrote:


 Also, the flags of each heap page header might be set PD_ALL_FROZEN,
 as well as all-visible


 Is it possible to have VM bits set to frozen but not visible?

 The description makes those two states sound independent of each other.

 Are they? Or not? Do we test for an impossible state?


It's impossible to have the VM bits set to frozen but not visible.
These bits are controlled independently. But eventually, when the
all-frozen bit is set, the all-visible bit is also set.

Regards,

--
Sawada Masahiko




Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Sawada Masahiko
On Fri, Jul 3, 2015 at 12:18 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Jul 3, 2015 at 6:54 AM, Josh Berkus j...@agliodbs.com wrote:
 On 07/02/2015 12:44 PM, Andres Freund wrote:
 On 2015-07-02 11:50:44 -0700, Josh Berkus wrote:
 So there's two parts to this:

 1. I need to ensure that data is replicated to X places.

 2. I need to *know* which places data was synchronously replicated to
 when the master goes down.

 My entire point is that (1) alone is useless unless you also have (2).

 I think there's a good set of usecases where that's really not the case.

 Please share!  My plea for usecases was sincere.  I can't think of any.

 And do note that I'm talking about information on the replica, not on
 the master, since in any failure situation we don't have the old
 master around to check.

 How would you, even theoretically, synchronize that knowledge to all the
 replicas? Even when they're temporarily disconnected?

 You can't, which is why what we need to know is when the replica thinks
 it was last synced from the replica side.  That is, a sync timestamp and
 lsn from the last time the replica ack'd a sync commit back to the
 master successfully.  Based on that information, I can make an informed
 decision, even if I'm down to one replica.

 ... because we would know definitively which servers were in sync.  So
 maybe that's the use case we should be supporting?

 If you want automated failover you need a leader election amongst the
 surviving nodes. The replay position is all they need to elect the node
 that's furthest ahead, and that information exists today.

 I can do that already.  If quorum synch commit doesn't help us minimize
 data loss any better than async replication or the current 1-redundant,
 why would we want it?  If it does help us minimize data loss, how?

 In your example of 2 : { local_replica, london_server, nyc_server },
 if there is not something like quorum commit, only local_replica is synch
 and the other two are async. In this case, if the local data center gets
 destroyed, you need to promote either london_server or nyc_server. But
 since they are async, they might not have the data which have been already
 committed in the master. So data loss! Of course, as I said yesterday,
 they might have all the data and no data loss happens at the promotion.
 But the point is that there is no guarantee that no data loss happens.
 OTOH, if we use quorum commit, we can guarantee that either london_server
 or nyc_server has all the data which have been committed in the master.

 So I think that quorum commit is helpful for minimizing the data loss.


Yeah, quorum commit is helpful for minimizing data loss in comparison
with today's replication.
But in this case, how can we know which server we should use as the
next master after the local data center goes down?
If we choose a wrong one, we would get data loss.

Regards,

--
Sawada Masahiko




Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Sawada Masahiko
On Fri, Jul 3, 2015 at 6:23 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Jul 3, 2015 at 5:59 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 Yeah, quorum commit is helpful for minimizing data loss in comparison
 with today replication.
 But in this your case, how can we know which server we should use as
 the next master server, after local data center got down?
 If we choose a wrong one, we would get the data loss.

 Check the progress of each server, e.g., by using
 pg_last_xlog_replay_location(),
 and choose the server which is ahead of as new master.


Thanks. So we can choose the next master server by checking the
progress of each server, if hot standby is enabled.
And such a procedure is needed even with today's replication.

I think the #2 problem which Josh pointed out seems to be solved:
1. I need to ensure that data is replicated to X places.
2. I need to *know* which places data was synchronously replicated
to when the master goes down.
And we can address the #1 problem using quorum commit.

Thoughts?

Regards,

--
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-02 Thread Sawada Masahiko
On Thu, Jul 2, 2015 at 1:06 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Jul 2, 2015 at 12:13 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Thu, May 28, 2015 at 11:34 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Thu, Apr 30, 2015 at 8:07 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Fri, Apr 24, 2015 at 11:21 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby jim.na...@bluetreble.com 
 wrote:
 On 4/23/15 11:06 AM, Petr Jelinek wrote:

 On 23/04/15 17:45, Bruce Momjian wrote:

 On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
 Agreed, no extra file, and the same write volume as currently.  It 
 would
 also match pg_clog, which uses two bits per transaction --- maybe we 
 can
 reuse some of that code.


 Yeah, this approach seems promising. We probably can't reuse code from
 clog because the usage pattern is different (key for clog is xid, while
 for visibility/freeze map ctid is used). But visibility map storage
 layer is pretty simple so it should be easy to extend it for this use.


 Actually, there may be some bit manipulation functions we could reuse;
 things like efficiently counting how many things in a byte are set. 
 Probably
 doesn't make sense to fully refactor it, but at least CLOG is a good 
 source
 for cut/paste/whack.


 I agree with adding a bit that indicates corresponding page is
 all-frozen into VM, just like CLOG.
 I'll change the patch as second version patch.


 The second patch is attached.

 In second patch, I added a bit that indicates all tuples in page are
 completely frozen into visibility map.
 The visibility map became a bitmap with two bit per heap page:
 all-visible and all-frozen.
 The logics around vacuum, insert/update/delete heap are almost same as
 previous version.

 This patch lack some point: documentation, comment in source code,
 etc, so it's WIP patch yet,
 but I think that it's enough to discuss about this.


 The previous patch is no longer applied cleanly to HEAD.
 The attached v2 patch is latest version.

 Please review it.

 Attached new rebased version patch.
 Please give me comments!

 Now we should review your design and approach rather than code,
 but since I got an assertion error while trying the patch, I report it.

 initdb -D test -k caused the following assertion failure.

 vacuuming database template1 ... TRAP:
 FailedAssertion(!(((PageHeader) (heapPage))->pd_flags & 0x0004)),
 File: visibilitymap.c, Line: 328)
 sh: line 1: 83785 Abort trap: 6
 /dav/000_add_frozen_bit_into_visibilitymap_v3/bin/postgres --single
 -F -O -c search_path=pg_catalog -c exit_on_error=true template1 > /dev/null
 child process exited with exit code 134
 initdb: removing data directory test

Thank you for the bug report and comments.

A fixed version is attached, and the source code comments are also updated.
Please review it.

And I'll explain again here what this patch does and its current design.

- An additional bit in the visibility map.
I added an additional bit, the all-frozen bit, which indicates whether
all tuples of the corresponding page are frozen, to the visibility map.
This structure is similar to CLOG.
So the size of the VM grows to twice today's.
Also, the header of each heap page may have the PD_ALL_FROZEN flag
set, as well as all-visible.

- Setting and clearing the all-frozen bit
Update, delete and insert (multi-insert) operations clear the bit of
that page, and clear the flags of the page header at the same time.
Only a vacuum operation can set the bit, if all tuples of a page are
frozen.

- Anti-wraparound vacuum
We have to scan the whole table for XID anti-wraparound today, and
it's really quite expensive because of the disk I/O.
The main benefit of this proposal is to reduce and avoid such an
extremely large quantity of I/O even when an anti-wraparound vacuum is
executed.
In the lazy_scan_heap() function, I added such logic experimentally.

There were several other ideas in the previous discussion, such as a
read-only table or a frozen map. But the advantage of this direction
is that we don't need an additional heap file, and we can use the
mature VM mechanism.

Regards,

--
Sawada Masahiko
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 86a2e6b..835d714 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -88,7 +88,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 Buffer newbuf, HeapTuple oldtup,
 HeapTuple newtup, HeapTuple old_key_tup,
-				  bool all_visible_cleared, bool new_all_visible_cleared);
+				  bool all_visible_cleared, bool new_all_visible_cleared,
+				  bool all_frozen_cleared, bool new_all_frozen_cleared);
 static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
 			 Bitmapset *hot_attrs,
 			 Bitmapset *key_attrs, Bitmapset *id_attrs,
@@ -2107,7 +2108,8 @@ heap_insert(Relation

Re: [HACKERS] Freeze avoidance of very large table.

2015-07-01 Thread Sawada Masahiko
On Thu, May 28, 2015 at 11:34 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Thu, Apr 30, 2015 at 8:07 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Fri, Apr 24, 2015 at 11:21 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 4/23/15 11:06 AM, Petr Jelinek wrote:

 On 23/04/15 17:45, Bruce Momjian wrote:

 On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
 Agreed, no extra file, and the same write volume as currently.  It would
 also match pg_clog, which uses two bits per transaction --- maybe we can
 reuse some of that code.


 Yeah, this approach seems promising. We probably can't reuse code from
 clog because the usage pattern is different (key for clog is xid, while
 for visibility/freeze map ctid is used). But visibility map storage
 layer is pretty simple so it should be easy to extend it for this use.


 Actually, there may be some bit manipulation functions we could reuse;
 things like efficiently counting how many things in a byte are set. 
 Probably
 doesn't make sense to fully refactor it, but at least CLOG is a good source
 for cut/paste/whack.


 I agree with adding a bit that indicates corresponding page is
 all-frozen into VM, just like CLOG.
 I'll change the patch as second version patch.


 The second patch is attached.

 In second patch, I added a bit that indicates all tuples in page are
 completely frozen into visibility map.
 The visibility map became a bitmap with two bit per heap page:
 all-visible and all-frozen.
 The logics around vacuum, insert/update/delete heap are almost same as
 previous version.

 This patch lack some point: documentation, comment in source code,
 etc, so it's WIP patch yet,
 but I think that it's enough to discuss about this.


 The previous patch is no longer applied cleanly to HEAD.
 The attached v2 patch is latest version.

 Please review it.

Attached is a new rebased version of the patch.
Please give me comments!

Regards,

--
Sawada Masahiko


000_add_frozen_bit_into_visibilitymap_v3.patch
Description: Binary data



Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-01 Thread Sawada Masahiko
On Tue, Jun 30, 2015 at 2:40 AM, Josh Berkus j...@agliodbs.com wrote:
 On 06/29/2015 01:01 AM, Michael Paquier wrote:

 You're confusing two separate things.  The primary manageability problem
 has nothing to do with altering the parameter.  The main problem is: if
 there is more than one synch candidate, how do we determine *after the
 master dies* which candidate replica was in synch at the time of
 failure?  Currently there is no way to do that.  This proposal plans to,
 effectively, add more synch candidate configurations without addressing
 that core design failure *at all*.  That's why I say that this patch
 decreases overall reliability of the system instead of increasing it.

 When I set up synch rep today, I never use more than two candidate synch
 servers because of that very problem.  And even with two I have to check
 replay point because I have no way to tell which replica was in-sync at
 the time of failure.  Even in the current limited feature, this
 significantly reduces the utility of synch rep.  In your proposal, where
 I could have multiple synch rep groups in multiple geos, how on Earth
 could I figure out what to do when the master datacenter dies?

We can have servers with the same application_name today; that works
like a group.
So there are two problems regarding fail-over:
1. How can we know which group (set) we should use? (group means
application_name here)
2. And how can we decide which server of that group we should promote
to the next master?

#1 is one of the big problems, I think.
I haven't come up with a correct solution yet, but we would need to
know which server (group) is best to promote without the old master
running.
For example, by improving the pg_stat_replication view, or by having a
mediation process always check the progress of each standby.

#2, I guess the best solution is that the DBA can promote any server
of the group.
That is, the DBA can always promote a server without considering the
state of the other servers of that group.
It's not difficult: always use the lowest LSN of a group as the group LSN.


 BTW, ALTER SYSTEM is a strong reason to use JSON for the synch rep GUC
 (assuming it's one parameter) instead of some custom syntax.  If it's
 JSON, we can validate it in psql, whereas if it's some custom syntax we
 have to wait for the db to reload and fail to figure out that we forgot
 a comma.  Using JSON would also permit us to use jsonb_set and
 jsonb_delete to incrementally change the configuration.

That sounds convenient and flexible. I agree with this JSON-format
parameter only if we don't combine quorum and prioritization, because
of backward compatibility.
I tend to prefer a JSON-format value in a new, separate GUC parameter.
Anyway, if we use JSON, I'm imagining parameter values like below.
{
    "group1" : {
        "quorum" : 1,
        "standbys" : [
            {
                "a" : {
                    "quorum" : 2,
                    "standbys" : [
                        "c", "d"
                    ]
                }
            },
            "b"
        ]
    }
}


 Question: what happens *today* if we have two different synch rep
 strings in two different *.conf files?  I wouldn't assume that anyone
 has tested this ...

We use the last defined parameter even if sync rep strings appear in several files, right?

Regards,

--
Sawada Masahiko




Re: [HACKERS] Semantics of pg_file_settings view

2015-06-28 Thread Sawada Masahiko
-discussed change to
 suppress application of duplicated GUC entries, it would be possible to
 mark the ignored entries in the view, which would save people the effort
 of figuring out manually which ones were ignored.  The question is exactly
 how to mark the ignored entries.  A simple tweak would be to put something
 in the error column, say ignored because of later duplicate entry.
 However, this would break the property that an entry in the error column
 means there's something you'd better fix, which I think would be a useful
 rule for nagios-like monitoring tools.

 Another issue with the API as it stands here is that you can't easily
 distinguish errors that caused the entire config file to be ignored from
 those that only prevented application of one setting.

 The best idea I have at the moment is to also add a boolean applied
 column which is true if the entry was successfully applied.  Errors that
 result in the whole file being ignored would mean that *all* the entries
 show applied = false.  Otherwise, applied = false with nothing in the
 error column would imply that the entry was ignored due to a later
 duplicate.  There are multiple other ways it could be done of course;
 anyone want to lobby for some other design?

 There is more that could be done with this basic idea; in particular,
 it would be useful if set_config failures would be broken down in more
 detail than setting could not be applied.  That would require somewhat
 invasive refactoring though, and it would only be an incremental
 improvement in usability, so I'm disinclined to tackle the point under
 time pressure.

 Comments, better ideas?  Barring strong objections I'd like to get this
 finished over the weekend.


I think we can find the applied value by sorting pg_file_settings.seqno
in ascending order.
The value with the highest seqno is the currently applied value, and
the default applies if pg_file_settings has no entry for it.
For that reason, I think it's enough to mark which entries were not
applied because their config file contained an error, rather than
marking which entries were applied.
For example, the 'ignored' column would be true for a parameter ignored
because of an error in the config file, and false for a parameter
ignored because of a later duplicate.
Thoughts?


@@ -289,6 +321,7 @@ ProcessConfigFile(GucContext context)

 				(errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
 				 errmsg("parameter \"%s\" cannot be changed without restarting the server",
+				item->errmsg = pstrdup("parameter cannot be changed without restarting the server");
 				error = true;
 				continue;

Also, the above hunk is wrong, because the item variable is not valid
at that point and the corresponding config line no longer exists.
If we remove a previously set line so that a parameter which requires
a restart (e.g. port) falls back to its default and then reload, a
SEGV occurs.

Regards,

--
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-06-28 Thread Sawada Masahiko
On Fri, Jun 26, 2015 at 2:46 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Jun 25, 2015 at 8:32 PM, Simon Riggs  wrote:
 Let's start with a complex, fully described use case then work out how to
 specify what we want.

 Well, one of the most simple cases where quorum commit and this
 feature would be useful for is that, with 2 data centers:
 - on center 1, master A and standby B
 - on center 2, standby C and standby D
 With the current synchronous_standby_names, what we can do now is
 ensuring that one node has acknowledged the commit of master. For
 example synchronous_standby_names = 'B,C,D'. But you know that :)
 What this feature would allow us to do is, for example, being able to
 ensure that a node on data center 2 has acknowledged the commit of the
 master, meaning that even if data center 1 is completely lost for one
 reason or another, we have at least one node on center 2 that has lost
 no data at transaction commit.

 Now, regarding the way to express that, we need to use a concept of
 node group for each element of synchronous_standby_names. A group
 contains a set of elements, each element being a group or a single
 node. And for each group we need to know three things when a commit
 needs to be acknowledged:
 - Does my group need to acknowledge the commit?
 - If yes, how many elements in my group need to acknowledge it?
 - Does the order of my elements matter?

 That's where the micro-language idea makes sense to use. For example,
 we can define a group using separators and like (elt1,...eltN) or
 [elt1,elt2,eltN]. Appending a number in front of a group is essential
 as well for quorum commits. Hence for example, assuming that '()' is
 used for a group whose element order does not matter, if we use that:
 - k(elt1,elt2,eltN) means that we need for the k elements in the set
 to return true (aka commit confirmation).
 - k[elt1,elt2,eltN] means that we need for the first k elements in the
 set to return true.

 When k is not defined for a group, k = 1. Using only elements
 separated by commas for the upper group means that we wait for the
 first element in the set (for backward compatibility), hence:
 1(elt1,elt2,eltN) = elt1,elt2,eltN


I think that you meant 1[elt1,elt2,eltN] = elt1,elt2,eltN in this
case (for backward compatibility), right?
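To make the proposed semantics concrete, here is a toy Python check (assumed semantics from the quoted grammar, not PostgreSQL code; it ignores standby connectivity, which the real priority behavior would also have to consider):

```python
# k(...) = quorum group: ACKs from any k members release the commit.
# k[...] = priority group: ACKs from the first k listed members are required.
def commit_ok(kind, k, members, acked):
    if kind == "quorum":                      # e.g. 1(AAA,BBB,CCC)
        return len(acked & set(members)) >= k
    return set(members[:k]) <= acked          # e.g. 1[AAA,BBB,CCC]

acked = {"BBB"}
print(commit_ok("quorum", 1, ["AAA", "BBB", "CCC"], acked))    # True
print(commit_ok("priority", 1, ["AAA", "BBB", "CCC"], acked))  # False: needs AAA
```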

Regards,

--
Sawada Masahiko




Re: [HACKERS] Semantics of pg_file_settings view

2015-06-28 Thread Sawada Masahiko
On Mon, Jun 29, 2015 at 12:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Sawada Masahiko sawada.m...@gmail.com writes:
 On Sun, Jun 28, 2015 at 12:47 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 However there's a further tweak to the view that I'd like to think about
 making.  Once this is in and we make the originally-discussed change to
 suppress application of duplicated GUC entries, it would be possible to
 mark the ignored entries in the view, which would save people the effort
 of figuring out manually which ones were ignored.  The question is exactly
 how to mark the ignored entries.  A simple tweak would be to put something
 in the error column, say "ignored because of later duplicate entry".
 However, this would break the property that an entry in the error column
 means there's something you'd better fix, which I think would be a useful
 rule for nagios-like monitoring tools.

 Another issue with the API as it stands here is that you can't easily
 distinguish errors that caused the entire config file to be ignored from
 those that only prevented application of one setting.

 The best idea I have at the moment is to also add a boolean "applied"
 column which is true if the entry was successfully applied.  Errors that
 result in the whole file being ignored would mean that *all* the entries
 show applied = false.  Otherwise, applied = false with nothing in the
 error column would imply that the entry was ignored due to a later
 duplicate.  There are multiple other ways it could be done of course;
 anyone want to lobby for some other design?

 I think we can determine the applied value by sorting
 pg_file_settings.seqno in ascending order: the entry with the highest
 seqno is the currently applied value, and the default is in effect if
 pg_file_settings has no entry for it.

 Yeah, I realize that it's *possible* to get this information out of the
 view as it stands.  But it's not especially *convenient*.  And since this
 seems like the main if not only use-case for the view (at least prior to
 the addition of error information), I don't see why we insist on making it
 difficult for users to identify ignored entries.

Yep, I think the view will have enough information once the additional
columns from the WIP patch are added.

 (errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
  errmsg("parameter \"%s\"
 cannot be changed without restarting the server",
 +   item->errmsg = pstrdup("parameter cannot be
 changed without restarting the server");
 error = true;
 continue;

 Also, the above hunk is wrong, because the item variable is wobbly and
 the correspond line is not exists here.

 er ... what?


Sorry for confusing you.
I meant that I got a SEGV after applying the WIP patch, and the cause
is the change above.
The steps to reproduce are:
1. Add port = 6543 to postgresql.conf and restart the server.
2. Remove that added line from postgresql.conf.
3. Reload.
Got SEGV.

Regards,

--
Sawada Masahiko




Re: [HACKERS] pg_file_settings view vs. Windows

2015-06-28 Thread Sawada Masahiko
On Mon, Jun 29, 2015 at 12:20 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 I noticed that in EXEC_BACKEND builds (ie, Windows) the pg_file_settings
 view doesn't act as its author presumably intended.  Specifically, it
 reads as empty until/unless the current session processes a SIGHUP event.
 ...
 More or less bad alternative answers include:
 ...
 3. Force a SIGHUP processing cycle but don't actually apply any of the
 values.  This would be pretty messy though, especially if you wanted it
 to produce results exactly like the normal case so far as detection of
 incorrect values goes.  I don't think this is a good idea; it's going
 to be too prone to corner-case bugs.

 I had second thoughts about how difficult this might be.  I had forgotten
 that set_config_option already has a changeVal argument that does more or
 less exactly what we need here: if false, it makes all the checks but
 doesn't actually apply the value.

 So let me make a radical proposal that both gets rid of the portability
 problem and, arguably, makes the view more useful than it is today.
 I think we should define the view as showing, not what was in the config
 file(s) at the last SIGHUP, but what is in the files *now*.  That means
 you could use it to validate edits before you attempt to apply them,
 rather than having to pull the trigger and then ask if it worked.  And yet
 it would remain just about as useful as it is now for post-mortem checks
 when a SIGHUP didn't do what you expected.

 This could be implemented by doing essentially a SIGHUP cycle but passing
 changeVal = false to set_config_option.  Other details would remain mostly
 as in my WIP patch of yesterday.  The temporary context for doing SIGHUP
 work still seems like a good idea, but we could flush it at the end of
 the view's execution rather than needing to hang onto it indefinitely.

 The main bug risk here is that there might be GUCs whose apply_hook is
 making validity checks that should have been in a check_hook.  However,
 any such coding is wrong already; and the symptom here would only be that
 the view might fail to point out values that can't be applied, which would
 be annoying but it's hardly catastrophic.

 Comments?


You mean that we parse each GUC parameter in the configuration file
and pass changeVal=false to set_config_option whenever
pg_file_settings is referred to.
So this view could be used to check whether the current configuration
file would apply successfully, right?
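As a toy model of the "check but don't apply" idea behind changeVal (assumed behavior sketched in Python, not guc.c; the check hooks here are invented):

```python
# With change_val=False every validity check still runs, but no state is
# mutated, so a view can validate an edited file without applying it.
def set_config_option(name, value, settings, check_hooks, change_val=True):
    check = check_hooks.get(name, lambda v: True)
    if not check(value):
        return False              # invalid: report the problem, never apply
    if change_val:
        settings[name] = value    # only mutate when actually applying
    return True

settings = {"work_mem": "4MB"}
checks = {"work_mem": lambda v: v.endswith(("kB", "MB", "GB"))}
print(set_config_option("work_mem", "16MB", settings, checks, change_val=False))   # True
print(settings)  # {'work_mem': '4MB'} -- unchanged
print(set_config_option("work_mem", "bogus", settings, checks, change_val=False))  # False
```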

Such a feature would also be good, but the main purpose of
pg_file_settings was to show where each GUC parameter came from,
especially when ALTER SYSTEM is used.
I'm not sure the new definition would be adequate for that original
purpose.

Regards,

--
Sawada Masahiko




Re: [HACKERS] pg_file_settings view vs. Windows

2015-06-27 Thread Sawada Masahiko
On Sun, Jun 28, 2015 at 10:40 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
 On Sun, Jun 28, 2015 at 8:20 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah, exactly.  Unfortunately I see no way to add a useful test, at least
 not one that will work in installcheck mode.  There's no way to predict
 what will be in the view in that case.  Even for make check, the output
 would be pretty darn environment-dependent.

 And also because this patch had no review input regarding Windows and
 EXEC_BACKEND. I would suggest pinging the author (just did so),
 waiting for a fix a bit, and move on with 4. if nothing happens.

 Well, mumble.  After playing with this for a bit, I'm fairly convinced
 that it offers useful functionality, especially with the error-reporting
 additions I've proposed.  Right now, there is no easy way to tell whether
 a SIGHUP has worked, or why not if not, unless you have access to the
 postmaster log.  So I think there's definite usefulness here for
 remote-administration scenarios.

 So I kinda think that alternative 1 (document the Windows deficiency)
 is better than having no such functionality at all.  Obviously a proper
 fix would be better yet, but that's something that could be rolled in
 later.

 We usually require that a patch includes support for Windows as a
 requirement (see for example discussions about why pg_fincore in not a
 contrib module even if it overlaps a bit with pg_prewarm), why would
 this patch have a different treatment?

 Agreed, but it was evidently not obvious to anyone that there was a
 portability issue in this code, else we'd have resolved the issue
 before it got committed.  As a thought experiment, what would happen
 if we'd not noticed this issue till post-release, which is certainly
 not implausible?

 Also, there are multiple pre-existing minor bugs (the leakage problem
 I mentioned earlier, and some other things I've found while hacking
 on the view patch) that we would have to deal with in some other
 way if we revert now.  I'd just as soon not detangle that.


Thank you for the bug report.

I have not come up with a portable idea yet, but I will deal with this
problem immediately.
If I can't come up with a better solution, I'd like to propose idea #1.
But it would be unavoidable to revert it if there is any hard
requirement for Windows support.

Regards,

--
Sawada Masahiko




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-06-25 Thread Sawada Masahiko
On Thu, Jun 25, 2015 at 7:32 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 25 June 2015 at 05:01, Michael Paquier michael.paqu...@gmail.com wrote:

 On Thu, Jun 25, 2015 at 12:57 PM, Fujii Masao wrote:
  On Thu, Jun 25, 2015 at 12:15 PM, Michael Paquier wrote:
  and that's actually equivalent to that in
  the grammar: 1(AAA,BBB,CCC).
 
  I don't think that they are the same. In the case of 1(AAA,BBB,CCC),
  while
  two servers AAA and BBB are running, the master server may return a
  success
  of the transaction to the client just after it receives the ACK from
  BBB.
  OTOH, in the case of AAA,BBB, that never happens. The master must wait
  for
  the ACK from AAA to arrive before completing the transaction. And then,
  if AAA goes down, BBB should become synchronous standby.

 Ah. Right. I missed your point, that's a bad day... We could have
 multiple separators to define group types then:
 - () where the order of acknowledgement does not matter
  - [] where it does.
 You would find the old grammar with:
 1[AAA,BBB,CCC]

 Let's start with a complex, fully described use case then work out how to
 specify what we want.

 I'm nervous of "it would be good if"s because we do a ton of work only to
 find a design flaw.


I'm not sure about a specific implementation yet, but I came up with a
solution for this case.

For example:
- s_s_name = '1(a, b), c, d'
The priority of both 'a' and 'b' is 1, 'c' is 2, and 'd' is 3.
i.e., 'b' and 'c' are potential sync nodes, and quorum commit is
enabled only between 'a' and 'b'.

- s_s_name = 'a, 1(b,c), d'
The priority of 'a' is 1, 'b' and 'c' are 2, and 'd' is 3.
So quorum commit between 'b' and 'c' will be enabled after 'a' goes down.

With this idea, I think we could keep the conventional syntax.
Thoughts?
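A rough illustration of the priority assignment described above (assumed semantics, toy Python, not a real parser): names inside one parenthesized group share a priority slot, and each following element takes the next priority.

```python
def assign_priorities(spec):
    """spec: list of plain names or (k, [names]) groups -> {name: priority}."""
    prio, result = 1, {}
    for item in spec:
        names = item[1] if isinstance(item, tuple) else [item]
        for n in names:
            result[n] = prio   # group members share one priority
        prio += 1
    return result

# s_s_name = '1(a, b), c, d'
print(assign_priorities([(1, ["a", "b"]), "c", "d"]))
# {'a': 1, 'b': 1, 'c': 2, 'd': 3}
```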

Regards,

--
Sawada Masahiko




Re: [HACKERS] GIN function of pageinspect has inconsistency data type.

2015-06-19 Thread Sawada Masahiko
On Wed, Jun 17, 2015 at 4:11 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 6/16/15 8:26 AM, Sawada Masahiko wrote:

 I noticed while using the gin functions of pageinspect that there are
 some inconsistent data types.
 For example, the data type of GinMetaPageData.head and tail is
 BlockNumber, i.e., uint32.
 But in ginfunc.c, we get that value as int64.


 You can't put a uint32 into a signed int32 (which is what the SQL int is).
 That's why it's returning a bigint.

I understood why it's returning a bigint.
Thanks!
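To make the wraparound concrete, a toy Python demonstration of the int32 vs int64 point (not pageinspect code):

```python
import ctypes

# A BlockNumber is uint32; values above 2**31 - 1 don't fit in SQL's int
# (int32) without going negative, hence pageinspect returns bigint (int64).
blkno = 3_000_000_000                 # a legal uint32 block number
print(ctypes.c_int32(blkno).value)    # -1294967296: wrapped negative
print(ctypes.c_int64(blkno).value)    # 3000000000: representable as bigint
```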

Regards,

--
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-05-27 Thread Sawada Masahiko
On Thu, Apr 30, 2015 at 8:07 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Fri, Apr 24, 2015 at 11:21 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 4/23/15 11:06 AM, Petr Jelinek wrote:

 On 23/04/15 17:45, Bruce Momjian wrote:

 On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
 Agreed, no extra file, and the same write volume as currently.  It would
 also match pg_clog, which uses two bits per transaction --- maybe we can
 reuse some of that code.


 Yeah, this approach seems promising. We probably can't reuse code from
 clog because the usage pattern is different (key for clog is xid, while
 for visibility/freeze map ctid is used). But visibility map storage
 layer is pretty simple so it should be easy to extend it for this use.


 Actually, there may be some bit manipulation functions we could reuse;
 things like efficiently counting how many things in a byte are set. Probably
 doesn't make sense to fully refactor it, but at least CLOG is a good source
 for cut/paste/whack.


 I agree with adding a bit that indicates corresponding page is
 all-frozen into VM, just like CLOG.
 I'll change the patch as second version patch.


 The second patch is attached.

 In the second patch, I added a bit that indicates all tuples in a page
 are completely frozen into the visibility map.
 The visibility map became a bitmap with two bits per heap page:
 all-visible and all-frozen.
 The logic around vacuum and heap insert/update/delete is almost the
 same as in the previous version.

 This patch lacks some points: documentation, comments in source code,
 etc., so it's still a WIP patch,
 but I think it's enough to discuss.
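As an aside, the two-bits-per-heap-page addressing can be sketched like this (toy Python under assumed 8 kB map pages, ignoring the page header; not the actual visibilitymap.c layout):

```python
BITS_PER_HEAPPAGE = 2                                      # all-visible + all-frozen
HEAPPAGES_PER_MAPPAGE = (8192 * 8) // BITS_PER_HEAPPAGE    # 32768

def vm_locate(block_number):
    """Return (map page, byte offset, bit offset) for a heap block's two bits."""
    map_page = block_number // HEAPPAGES_PER_MAPPAGE
    bit = (block_number % HEAPPAGES_PER_MAPPAGE) * BITS_PER_HEAPPAGE
    return map_page, bit // 8, bit % 8

print(vm_locate(0))       # (0, 0, 0)
print(vm_locate(100000))  # (3, 424, 0)
```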


The previous patch no longer applies cleanly to HEAD.
The attached v2 patch is the latest version.

Please review it.

Regards,

---
Sawada Masahiko
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index caacc10..fcbf06a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -88,7 +88,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 Buffer newbuf, HeapTuple oldtup,
 HeapTuple newtup, HeapTuple old_key_tup,
-bool all_visible_cleared, bool new_all_visible_cleared);
+bool all_visible_cleared, bool new_all_visible_cleared,
+bool all_frozen_cleared, bool new_all_frozen_cleared);
 static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
 			 Bitmapset *hot_attrs,
 			 Bitmapset *key_attrs, Bitmapset *id_attrs,
@@ -2107,7 +2108,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	HeapTuple	heaptup;
 	Buffer		buffer;
 	Buffer		vmbuffer = InvalidBuffer;
-	bool		all_visible_cleared = false;
+	bool		all_visible_cleared = false,
+all_frozen_cleared = false;
 
 	/*
 	 * Fill in tuple header fields, assign an OID, and toast the tuple if
@@ -2131,8 +2133,9 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
 
 	/*
-	 * Find buffer to insert this tuple into.  If the page is all visible,
-	 * this will also pin the requisite visibility map page.
+	 * Find buffer to insert this tuple into.  If the page is all visible
+	 * or all frozen, this will also pin the requisite visibility map and
+	 * frozen map page.
 	 */
 	buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
 	   InvalidBuffer, options, bistate,
@@ -2150,7 +2153,16 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		PageClearAllVisible(BufferGetPage(buffer));
 		visibilitymap_clear(relation,
 			ItemPointerGetBlockNumber(&(heaptup->t_self)),
-			vmbuffer);
+			vmbuffer, VISIBILITYMAP_ALL_VISIBLE);
+	}
+
+	if (PageIsAllFrozen(BufferGetPage(buffer)))
+	{
+		all_frozen_cleared = true;
+		PageClearAllFrozen(BufferGetPage(buffer));
+		visibilitymap_clear(relation,
+			ItemPointerGetBlockNumber(&(heaptup->t_self)),
+			vmbuffer, VISIBILITYMAP_ALL_FROZEN);
 	}
 
 	/*
@@ -2199,6 +2211,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xlrec.flags = 0;
 		if (all_visible_cleared)
 			xlrec.flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
+		if (all_frozen_cleared)
+			xlrec.flags |= XLH_INSERT_ALL_FROZEN_CLEARED;
 		if (options & HEAP_INSERT_SPECULATIVE)
 			xlrec.flags |= XLH_INSERT_IS_SPECULATIVE;
 		Assert(ItemPointerGetBlockNumber(&heaptup->t_self) == BufferGetBlockNumber(buffer));
@@ -2406,7 +2420,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	{
 		Buffer		buffer;
 		Buffer		vmbuffer = InvalidBuffer;
-		bool		all_visible_cleared = false;
+		bool		all_visible_cleared = false,
+	all_frozen_cleared = false;
 		int			nthispage;
 
 		CHECK_FOR_INTERRUPTS();
@@ -2451,7 +2466,16 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 			PageClearAllVisible(page);
 			visibilitymap_clear(relation

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-05-16 Thread Sawada Masahiko
On Fri, May 15, 2015 at 9:18 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Fri, May 15, 2015 at 8:55 PM, Beena Emerson memissemer...@gmail.com 
 wrote:
 There was a discussion on support for N synchronous standby servers started
 by Michael. Refer
 http://archives.postgresql.org/message-id/cab7npqr9c84ig0zuvhmqamq53vqsd4rc82vyci4dr27pvof...@mail.gmail.com
 . The use of hooks and dedicated language was suggested, however, it seemed
 to be an overkill for the scenario and there was no consensus on this.
 Exploring GUC-land was preferred.

 Cool.

 Please find attached a patch,  built on Michael's patch from above mentioned
 thread, which supports choosing different number of nodes from each set i.e.
 k nodes from set 1, l nodes from set 2, so on.
 The format of synchronous_standby_names has been updated to standby name
 followed by the required count separated by hyphen. Ex: 'aa-1, bb-3'.  The
 transaction waits for all the specified number of standby in each group. Any
 extra nodes with the same name will be considered potential. The special
 entry * for the standby name is also supported.

 I don't think that this is going in the good direction, what was
 suggested mainly by Robert was to use a micro-language that would
 allow far more extensibility than what you are proposing. See for
 example ca+tgmobpwoenmmepfx0jwrvqufxvbqrv26ezq_xhk21gxrx...@mail.gmail.com
 for some ideas. IMO, before writing any patch in this area we should
 find a clear consensus on what we want to do. Also, unrelated to this
 patch, we should really get first the patch implementing the... Hum...
 infrastructure for regression tests regarding replication and
 archiving to be able to have actual tests for this feature (working on
 it for next CF).

A dedicated language for multiple synchronous replication would be
more extensible, as you said, but I think there are not many users who
want to, or should, use it.
IMHO such a dedicated, extensible feature could be an extension
module, i.e. contrib, and we could implement a simpler feature in
PostgreSQL core with some restrictions.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-14 Thread Sawada Masahiko
On Thu, May 14, 2015 at 9:58 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, May 13, 2015 at 8:25 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 The v15 patch emits a line for each table when reindexing multiple
 tables, and emits a line for each index when reindexing single table.
 But v14 patch emits a line for each index, regardless of reindex target.
 Should I change back to v14 patch?

 Uh, maybe.  What made you change it?


I thought that users who reindex multiple tables are interested in the
time it takes to reindex each whole table.
But I agree it seems sensible to emit a line for each index even when
reindexing multiple tables.
The v16 patch, based on v14 with a few modifications, is attached.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 998340c..703b760 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
 <refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable>
+REINDEX [ ( { VERBOSE } [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable>
</synopsis>
 </refsynopsisdiv>

@@ -150,6 +150,15 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } replaceable class=PARAM
  </para>
 </listitem>
</varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report as each index is reindexed.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 </refsect1>
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8c8a9ea..bac9fbe 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -63,6 +63,7 @@
 #include utils/inval.h
 #include utils/lsyscache.h
 #include utils/memutils.h
+#include utils/pg_rusage.h
 #include utils/syscache.h
 #include utils/tuplesort.h
 #include utils/snapmgr.h
@@ -3184,13 +3185,17 @@ IndexGetRelation(Oid indexId, bool missing_ok)
  * reindex_index - This routine is used to recreate a single index
  */
 void
-reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
+reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
+int options)
 {
 	Relation	iRel,
 heapRelation;
 	Oid			heapId;
 	IndexInfo  *indexInfo;
 	volatile bool skipped_constraint = false;
+	PGRUsage	ru0;
+
+	pg_rusage_init(ru0);
 
 	/*
 	 * Open and lock the parent heap relation.  ShareLock is sufficient since
@@ -3334,6 +3339,14 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
 		heap_close(pg_index, RowExclusiveLock);
 	}
 
+	/* Log what we did */
+	if (options & REINDEXOPT_VERBOSE)
+		ereport(INFO,
+				(errmsg("index \"%s\" was reindexed",
+						get_rel_name(indexId)),
+				 errdetail("%s.",
+						   pg_rusage_show(&ru0))));
+
 	/* Close rels, but keep locks */
 	index_close(iRel, NoLock);
 	heap_close(heapRelation, NoLock);
@@ -3375,7 +3388,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
  * index rebuild.
  */
 bool
-reindex_relation(Oid relid, int flags)
+reindex_relation(Oid relid, int flags, int options)
 {
 	Relation	rel;
 	Oid			toast_relid;
@@ -3466,7 +3479,7 @@ reindex_relation(Oid relid, int flags)
 RelationSetIndexList(rel, doneIndexes, InvalidOid);
 
 			reindex_index(indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
 						  persistence, options);
+		  persistence, options);
 
 			CommandCounterIncrement();
 
@@ -3501,7 +3514,7 @@ reindex_relation(Oid relid, int flags)
 	 * still hold the lock on the master table.
 	 */
 	if ((flags & REINDEX_REL_PROCESS_TOAST) && OidIsValid(toast_relid))
-		result |= reindex_relation(toast_relid, flags);
+		result |= reindex_relation(toast_relid, flags, options);
 
 	return result;
 }
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 3febdd5..7ab4874 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1532,7 +1532,7 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
-	reindex_relation(OIDOldHeap, reindex_flags);
+	reindex_relation(OIDOldHeap, reindex_flags, 0);
 
 	/*
 	 * If the relation being rebuild is pg_class, swap_relation_files()
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 351d48e..7340a1f 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1681,7 +1681,7 @@ ChooseIndexColumnNames(List *indexElems)
  *		Recreate a specific index.
  */
 Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, int options)
 {
 	Oid			indOid;
 	Oid			heapOid = InvalidOid;
@@ -1706,7 +1706,7 @@ ReindexIndex(RangeVar *indexRelation)
 	persistence = irel->rd_rel->relpersistence;
 	index_close(irel, NoLock);
 
-	reindex_index

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-13 Thread Sawada Masahiko
On Thu, May 14, 2015 at 3:10 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Uh, are we really using INFO to log this?  I thought that was a
 discouraged level to use anymore.  Why not NOTICE?


I think it should be INFO level because it is information about the
REINDEX command itself, such as its progress and CPU usage. It would
be overkill to output these logs at NOTICE level, and the verbose
output of other maintenance commands is emitted at INFO level.

 Also, when multiple tables are reindexed, do we emit lines for each
 index, or only for each table?  If we're going to emit a line for each
 index in the single-table mode, it seems more sensible to do the same
 for the multi-table forms also.


I agree that we should emit a line for each table when we reindex
multiple tables.
The latest patch is attached.


Regards,

---
Sawada Masahiko


000_reindex_verbose_v15.patch
Description: Binary data



Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-13 Thread Sawada Masahiko
On Thu, May 14, 2015 at 4:53 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, May 13, 2015 at 2:10 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
 Uh, are we really using INFO to log this?  I thought that was a
 discouraged level to use anymore.  Why not NOTICE?

 Well, as Masahiko-san points out, VACUUM uses INFO.  I can't see any
 good reason to make this different.

 Also, when multiple tables are reindexed, do we emit lines for each
 index, or only for each table?  If we're going to emit a line for each
 index in the single-table mode, it seems more sensible to do the same
 for the multi-table forms also.

 Hmm, yeah, I agree with that.  I thought the patch worked that way,
 but I see now that it doesn't.  I think that should be changed.


The v15 patch emits a line for each table when reindexing multiple
tables, and emits a line for each index when reindexing single table.
But v14 patch emits a line for each index, regardless of reindex target.
Should I change back to v14 patch?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-13 Thread Sawada Masahiko
On Thu, May 14, 2015 at 12:28 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 7, 2015 at 6:55 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 Sorry, I forgot attach files.

 Review comments:

 - Customarily we use int, rather than uint8, for flags variables.  I
 think we should do that here also.

 - reindex_index() emits a log message either way, but at DEBUG2 level
 without VERBOSE and at the INFO level with it.  I think we should skip
 it altogether without VERBOSE.  i.e. if (options  REINDEXOPT_VERBOSE)
 ereport(...)

 - The errmsg() in that function should not end with a period.

 - The 000 patch makes a pointless whitespace change to tab-complete.c.

 - Instead of an enumerated type (ReindexOption) just use #define to
 define the flag value.

 Apart from those fairly minor issues, I think this looks pretty solid.


Thank you for reviewing.
All fixed.

Regards,

---
Sawada Masahiko


000_reindex_verbose_v14.patch
Description: Binary data



Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-09 Thread Sawada Masahiko
On Sat, May 9, 2015 at 4:26 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:


 On Fri, May 8, 2015 at 4:23 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:


 On Thu, May 7, 2015 at 7:55 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:
 
  On 5/7/15, Sawada Masahiko sawada.m...@gmail.com wrote:
   On Wed, May 6, 2015 at 5:42 AM, Robert Haas robertmh...@gmail.com
   javascript:; wrote:
   On Tue, May 5, 2015 at 11:10 AM, Sawada Masahiko
   sawada.m...@gmail.com
   javascript:; wrote:
   On Fri, May 1, 2015 at 9:04 PM, Robert Haas robertmh...@gmail.com
   javascript:; wrote:
   On Thu, Apr 30, 2015 at 11:05 PM, Sawada Masahiko
   sawada.m...@gmail.com
   javascript:; wrote:
   VACUUM has both syntax: with parentheses and without parentheses.
   I think we should have both syntax for REINDEX like VACUUM does
   because it would be pain to put parentheses whenever we want to do
   REINDEX.
   Are the parentheses optional in REINDEX command?
  
   No.  The unparenthesized VACUUM syntax was added back before we
   realized that that kind of syntax is a terrible idea.  It requires
   every option to be a keyword, and those keywords have to be in a
   fixed
   order.  I believe the intention is to keep the old VACUUM syntax
   around for backward-compatibility, but not to extend it.  Same for
   EXPLAIN and COPY.
  
    REINDEX will have only one option, VERBOSE, for now.
    Even in a situation like this, where it's not clear that any new
    option will be added to REINDEX, do we still need to require parentheses?
  
   In my opinion, yes.  The whole point of a flexible options syntax is
   that we can add new options without changing the grammar.  That
   involves some compromise on the syntax, which doesn't bother me a
   bit.
   Our previous experiments with this for EXPLAIN and COPY and VACUUM
   have worked out quite well, and I see no reason for pessimism here.
  
    I agree that the flexible option syntax does not require grammar
    changes whenever we add new options.
    The attached patch has been updated based on your suggestion.
    The patch for reindexdb is also attached.
    Feedback is welcome.
  
   Also I'm not sure that both implementation and documentation
   regarding
   VERBOSE option should be optional.
  
   I don't know what this means.
  
  
   Sorry for confusing you.
   Please ignore this.
  
 
  Sorry, I forgot to attach the files.
 

 I applied the two patches to master and I got some errors when compile:

 tab-complete.c: In function ‘psql_completion’:
 tab-complete.c:3338:12: warning: left-hand operand of comma expression has
 no effect [-Wunused-value]
 {TABLE, INDEX, SYSTEM, SCHEMA, DATABASE, NULL};
 ^
 tab-complete.c:3338:21: warning: left-hand operand of comma expression has
 no effect [-Wunused-value]
 {TABLE, INDEX, SYSTEM, SCHEMA, DATABASE, NULL};
  ^
 tab-complete.c:3338:31: warning: left-hand operand of comma expression has
 no effect [-Wunused-value]
 {TABLE, INDEX, SYSTEM, SCHEMA, DATABASE, NULL};
^
 tab-complete.c:3338:41: warning: left-hand operand of comma expression has
 no effect [-Wunused-value]
 {TABLE, INDEX, SYSTEM, SCHEMA, DATABASE, NULL};
  ^
 tab-complete.c:3338:53: warning: left-hand operand of comma expression has
 no effect [-Wunused-value]
 {TABLE, INDEX, SYSTEM, SCHEMA, DATABASE, NULL};
  ^
 tab-complete.c:3338:5: warning: statement with no effect [-Wunused-value]
 {TABLE, INDEX, SYSTEM, SCHEMA, DATABASE, NULL};
  ^
 tab-complete.c:3338:59: error: expected ‘;’ before ‘}’ token
 {TABLE, INDEX, SYSTEM, SCHEMA, DATABASE, NULL};
^
 tab-complete.c:3340:22: error: ‘list_REINDEX’ undeclared (first use in
 this function)
COMPLETE_WITH_LIST(list_REINDEX);
   ^
 tab-complete.c:169:22: note: in definition of macro ‘COMPLETE_WITH_LIST’
   completion_charpp = list; \
   ^
 tab-complete.c:3340:22: note: each undeclared identifier is reported only
 once for each function it appears in
COMPLETE_WITH_LIST(list_REINDEX);
   ^
 tab-complete.c:169:22: note: in definition of macro ‘COMPLETE_WITH_LIST’
   completion_charpp = list; \
   ^
 make[3]: *** [tab-complete.o] Error 1
 make[3]: *** Waiting for unfinished jobs
 make[2]: *** [install-psql-recurse] Error 2
 make[2]: *** Waiting for unfinished jobs
 make[1]: *** [install-bin-recurse] Error 2
 make: *** [install-src-recurse] Error 2


 Looking at the code I think you remove one line accidentally from
 tab-complete.c:

 $ git diff src/bin/psql/tab-complete.c
 diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
 index 750e29d..55b0df5 100644
 --- a/src/bin/psql/tab-complete.c
 +++ b/src/bin/psql/tab-complete.c
 @@ -3335,7 +3335,6 @@ psql_completion(const char *text, int


[HACKERS] Typo in reindexdb documentation

2015-05-09 Thread Sawada Masahiko
Hi all,

Attached patch fixes the typo is in reindexdb documentation regarding
REINDEX SCHEMA.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindexdb.sgml b/doc/src/sgml/ref/reindexdb.sgml
index b5b449c..a745f6c 100644
--- a/doc/src/sgml/ref/reindexdb.sgml
+++ b/doc/src/sgml/ref/reindexdb.sgml
@@ -30,7 +30,7 @@ PostgreSQL documentation
   <arg choice="plain"><option>--schema</option></arg>
   <arg choice="plain"><option>-S</option></arg>
  </group>
- <replaceable>table</replaceable>
+ <replaceable>schema</replaceable>
 </arg>
</arg>
 



Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-07 Thread Sawada Masahiko
On 5/7/15, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Wed, May 6, 2015 at 5:42 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, May 5, 2015 at 11:10 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Fri, May 1, 2015 at 9:04 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Apr 30, 2015 at 11:05 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 VACUUM supports both syntaxes: with and without parentheses.
 I think REINDEX should support both, as VACUUM does, because it would
 be a pain to add parentheses every time we want to run REINDEX.
 Are the parentheses optional in the REINDEX command?

 No.  The unparenthesized VACUUM syntax was added back before we
 realized that that kind of syntax is a terrible idea.  It requires
 every option to be a keyword, and those keywords have to be in a fixed
 order.  I believe the intention is to keep the old VACUUM syntax
 around for backward-compatibility, but not to extend it.  Same for
 EXPLAIN and COPY.

 REINDEX will have only one option, VERBOSE, for now.
 Even in a situation like this, where it's not clear that any new option
 will be added to REINDEX, do we still need to require parentheses?

 In my opinion, yes.  The whole point of a flexible options syntax is
 that we can add new options without changing the grammar.  That
 involves some compromise on the syntax, which doesn't bother me a bit.
 Our previous experiments with this for EXPLAIN and COPY and VACUUM
 have worked out quite well, and I see no reason for pessimism here.

 I agree that the flexible option syntax does not require grammar
 changes whenever we add new options.
 The attached patch has been updated based on your suggestion.
 The patch for reindexdb is also attached.
 Feedback is welcome.

 Also I'm not sure that both implementation and documentation regarding
 VERBOSE option should be optional.

 I don't know what this means.


 Sorry for confusing you.
 Please ignore this.


Sorry, I forgot to attach the files.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 998340c..703b760 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
 <refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable>
+REINDEX [ ( { VERBOSE } [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable>
</synopsis>
 </refsynopsisdiv>
 
@@ -150,6 +150,15 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAM
  </para>
 </listitem>
   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report as each index is reindexed.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 </refsect1>
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ac3b785..c04b907 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -63,6 +63,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/pg_rusage.h"
 #include "utils/syscache.h"
 #include "utils/tuplesort.h"
 #include "utils/snapmgr.h"
@@ -3133,13 +3134,22 @@ IndexGetRelation(Oid indexId, bool missing_ok)
  * reindex_index - This routine is used to recreate a single index
  */
 void
-reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
+reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
+uint8 options)
 {
 	Relation	iRel,
 heapRelation;
 	Oid			heapId;
 	IndexInfo  *indexInfo;
 	volatile bool skipped_constraint = false;
+	int			elevel = DEBUG2;
+	PGRUsage	ru0;
+
+	pg_rusage_init(&ru0);
+
+	/* Set option(s) */
+	if (options & REINDEXOPT_VERBOSE)
+		elevel = INFO;
 
 	/*
 	 * Open and lock the parent heap relation.  ShareLock is sufficient since
@@ -3283,6 +3293,13 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
 		heap_close(pg_index, RowExclusiveLock);
 	}
 
+	/* Log what we did */
+	ereport(elevel,
+			(errmsg("index \"%s\" was reindexed.",
+					get_rel_name(indexId)),
+			 errdetail("%s.",
+					   pg_rusage_show(&ru0))));
+
 	/* Close rels, but keep locks */
 	index_close(iRel, NoLock);
 	heap_close(heapRelation, NoLock);
@@ -3324,7 +3341,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
  * index rebuild.
  */
 bool
-reindex_relation(Oid relid, int flags)
+reindex_relation(Oid relid, int flags, uint8 options)
 {
 	Relation	rel;
 	Oid			toast_relid;
@@ -3415,7 +3432,7 @@ reindex_relation(Oid relid, int flags)
 RelationSetIndexList(rel, doneIndexes, InvalidOid);
 
 			reindex_index(indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
-		  persistence);
+		  persistence, options);
 
 			CommandCounterIncrement();
 
@@ -3450,7 +3467,7 @@ reindex_relation(Oid relid, int flags)
 	 * still hold

[HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-07 Thread Sawada Masahiko
On Wed, May 6, 2015 at 5:42 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, May 5, 2015 at 11:10 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Fri, May 1, 2015 at 9:04 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Apr 30, 2015 at 11:05 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 VACUUM supports both syntaxes: with and without parentheses.
 I think REINDEX should support both, as VACUUM does, because it would
 be a pain to add parentheses every time we want to run REINDEX.
 Are the parentheses optional in the REINDEX command?

 No.  The unparenthesized VACUUM syntax was added back before we
 realized that that kind of syntax is a terrible idea.  It requires
 every option to be a keyword, and those keywords have to be in a fixed
 order.  I believe the intention is to keep the old VACUUM syntax
 around for backward-compatibility, but not to extend it.  Same for
 EXPLAIN and COPY.

 REINDEX will have only one option, VERBOSE, for now.
 Even in a situation like this, where it's not clear that any new option
 will be added to REINDEX, do we still need to require parentheses?

 In my opinion, yes.  The whole point of a flexible options syntax is
 that we can add new options without changing the grammar.  That
 involves some compromise on the syntax, which doesn't bother me a bit.
 Our previous experiments with this for EXPLAIN and COPY and VACUUM
 have worked out quite well, and I see no reason for pessimism here.

I agree that the flexible option syntax does not require grammar
changes whenever we add new options.
The attached patch has been updated based on your suggestion.
The patch for reindexdb is also attached.
Feedback is welcome.

 Also I'm not sure that both implementation and documentation regarding
 VERBOSE option should be optional.

 I don't know what this means.


Sorry for confusing you.
Please ignore this.

Regards,

---
Sawada Masahiko


-- 
Regards,

---
Sawada Masahiko


Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-05 Thread Sawada Masahiko
On Fri, May 1, 2015 at 9:04 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Apr 30, 2015 at 11:05 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 VACUUM supports both syntaxes: with and without parentheses.
 I think REINDEX should support both, as VACUUM does, because it would
 be a pain to add parentheses every time we want to run REINDEX.
 Are the parentheses optional in the REINDEX command?

 No.  The unparenthesized VACUUM syntax was added back before we
 realized that that kind of syntax is a terrible idea.  It requires
 every option to be a keyword, and those keywords have to be in a fixed
 order.  I believe the intention is to keep the old VACUUM syntax
 around for backward-compatibility, but not to extend it.  Same for
 EXPLAIN and COPY.

REINDEX will have only one option, VERBOSE, for now.
Even in a situation like this, where it's not clear that any new option
will be added to REINDEX, do we still need to require parentheses?
Also, I'm not sure whether both the implementation and the documentation
should treat the VERBOSE option as optional.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-05-01 Thread Sawada Masahiko
On Fri, May 1, 2015 at 6:24 AM, David Steele da...@pgmasters.net wrote:
 On 4/30/15 6:05 AM, Fujii Masao wrote:
 On Thu, Apr 30, 2015 at 12:57 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:

 I have changed the status this to Ready for Committer.

 The specification of session audit logging seems to be still unclear to me.
 For example, why doesn't session audit logging log queries accessing to

 The idea was that queries consisting of *only* catalog relations are
 essentially noise.  This makes the log much quieter when tools like psql
 or PgAdmin are in use.  Queries that contain a mix of catalog and user
 tables are logged.

 However, you make a good point, so in the spirit of cautious defaults
 I've changed the behavior to log in this case and allow admins to turn
 off the behavior if they choose with a new GUC, pg_audit.log_catalog.

 a catalog like pg_class? Why doesn't it log any queries executed in aborted
 transaction state? Since there is no such information in the document,

 The error that aborts a transaction can easily be logged via the
 standard logging facility.  All prior statements in the transaction will
 be logged with pg_audit.  This is acceptable from an audit logging
 perspective as long as it is documented behavior, which as you point out
 it currently is not.

 This has now been documented in the caveats sections which should make
 it clearer to users.

 I'm afraid that users would easily get confused with it. Even if we document 
 it,
 I'm not sure if the current behavior is good for the audit purpose. For 
 example,
 some users may want to log even queries accessing to the catalogs.

 Agreed, and this is now the default.

 I have no idea about when the current CommitFest will end. But probably
 we don't have that much time left. So I'm thinking that maybe we should pick 
 up
 small, self-contained and useful part from the patch and focus on that.
 If we try to commit every features that the patch provides, we might get
 nothing at least in 9.5, I'm afraid.

 May 15th is the feature freeze, so that does give a little time.  It's
 not clear to me what a self-contained part of the patch would be.  If
 you have specific ideas on what could be broken out I'm interested to
 hear them.

 Patch v11 is attached with the changes discussed here plus some other
 improvements to the documentation suggested by Erik.


For now, the pg_audit patch has a pg_audit_ddl_command_end() function
that uses part of the API from the as-yet-uncommitted deparsing utility
commands patch, so pg_audit depends completely on that patch.
If the API names or interfaces change, pg_audit will be affected.
I'm not sure about the progress of the deparsing utility commands
patch, but do you have any idea what to do if it is not committed into
9.5 by May 15?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-30 Thread Sawada Masahiko
On Fri, Apr 24, 2015 at 11:21 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 4/23/15 11:06 AM, Petr Jelinek wrote:

 On 23/04/15 17:45, Bruce Momjian wrote:

 On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
 Agreed, no extra file, and the same write volume as currently.  It would
 also match pg_clog, which uses two bits per transaction --- maybe we can
 reuse some of that code.


 Yeah, this approach seems promising. We probably can't reuse code from
 clog because the usage pattern is different (key for clog is xid, while
 for visibility/freeze map ctid is used). But visibility map storage
 layer is pretty simple so it should be easy to extend it for this use.


 Actually, there may be some bit manipulation functions we could reuse;
 things like efficiently counting how many things in a byte are set. Probably
 doesn't make sense to fully refactor it, but at least CLOG is a good source
 for cut/paste/whack.


 I agree with adding a bit to the VM that indicates the corresponding
 page is all-frozen, just like CLOG.
 I'll make this change in the second version of the patch.


The second patch is attached.

In the second patch, I added a bit to the visibility map that indicates
all tuples in a page are completely frozen.
The visibility map is now a bitmap with two bits per heap page:
all-visible and all-frozen.
The logic around vacuum and heap insert/update/delete is almost the
same as in the previous version.

This patch still lacks some pieces (documentation, comments in the
source code, etc.), so it is a WIP patch, but I think it is enough to
start the discussion.

Feedback is welcome.

Regards,

---
Sawada Masahiko
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b504ccd..a06e16d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -86,7 +86,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 Buffer newbuf, HeapTuple oldtup,
 HeapTuple newtup, HeapTuple old_key_tup,
-bool all_visible_cleared, bool new_all_visible_cleared);
+bool all_visible_cleared, bool new_all_visible_cleared,
+bool all_frozen_cleared, bool new_all_frozen_cleared);
 static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
 			 Bitmapset *hot_attrs,
 			 Bitmapset *key_attrs, Bitmapset *id_attrs,
@@ -2068,7 +2069,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	HeapTuple	heaptup;
 	Buffer		buffer;
 	Buffer		vmbuffer = InvalidBuffer;
-	bool		all_visible_cleared = false;
+	bool		all_visible_cleared = false,
+all_frozen_cleared = false;
 
 	/*
 	 * Fill in tuple header fields, assign an OID, and toast the tuple if
@@ -2092,8 +2094,9 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
 
 	/*
-	 * Find buffer to insert this tuple into.  If the page is all visible,
-	 * this will also pin the requisite visibility map page.
+	 * Find buffer to insert this tuple into.  If the page is all visible
+	 * or all frozen, this will also pin the requisite visibility map and
+	 * frozen map page.
 	 */
 	buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
 	   InvalidBuffer, options, bistate,
@@ -2110,7 +2113,16 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		PageClearAllVisible(BufferGetPage(buffer));
 		visibilitymap_clear(relation,
 			ItemPointerGetBlockNumber(&(heaptup->t_self)),
-			&vmbuffer);
+			&vmbuffer, VISIBILITYMAP_ALL_VISIBLE);
+	}
+
+	if (PageIsAllFrozen(BufferGetPage(buffer)))
+	{
+		all_frozen_cleared = true;
+		PageClearAllFrozen(BufferGetPage(buffer));
+		visibilitymap_clear(relation,
+			ItemPointerGetBlockNumber(&(heaptup->t_self)),
+			&vmbuffer, VISIBILITYMAP_ALL_FROZEN);
 	}
 
 	/*
@@ -2157,6 +2169,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 
 		xlrec.offnum = ItemPointerGetOffsetNumber(&heaptup->t_self);
 		xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
+		if (all_frozen_cleared)
+			xlrec.flags |= XLOG_HEAP_ALL_FROZEN_CLEARED;
 		Assert(ItemPointerGetBlockNumber(&heaptup->t_self) == BufferGetBlockNumber(buffer));
 
 		/*
@@ -2350,7 +2364,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	{
 		Buffer		buffer;
 		Buffer		vmbuffer = InvalidBuffer;
-		bool		all_visible_cleared = false;
+		bool		all_visible_cleared = false,
+	all_frozen_cleared = false;
 		int			nthispage;
 
 		CHECK_FOR_INTERRUPTS();
@@ -2395,7 +2410,16 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 			PageClearAllVisible(page);
 			visibilitymap_clear(relation,
 BufferGetBlockNumber(buffer),
-&vmbuffer);
+&vmbuffer, VISIBILITYMAP_ALL_VISIBLE);
+		}
+
+		if (PageIsAllFrozen(page))
+		{
+			all_frozen_cleared = true;
+			PageClearAllFrozen(page

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-30 Thread Sawada Masahiko
On Fri, Apr 10, 2015 at 2:52 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Thu, Apr 9, 2015 at 1:14 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Apr 8, 2015 at 10:53 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Wed, Apr 8, 2015 at 1:09 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Apr 8, 2015 at 1:57 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Wed, Apr 8, 2015 at 1:11 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:

 On Tue, Apr 7, 2015 at 1:04 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

 On Tue, Apr 7, 2015 at 10:12 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
 
  On Tue, Apr 7, 2015 at 7:22 AM, Sawada Masahiko 
  sawada.m...@gmail.com
  wrote:
 
  Thank you for your reviewing.
  I modified the patch and attached latest version patch(v7).
  Please have a look it.
 
 
  Looks good to me. Attached patch (v8) just fix a tab indentation in
  gram.y.
 

 I had forgotten fix a tab indentation, sorry.
 Thank you for reviewing!
 It looks good to me too.
 Can this patch be marked as Ready for Committer?


 +1


 Changed status to Ready for Committer.

 The patch adds new syntax like REINDEX ... WITH VERBOSE, i.e., () is not
 added after WITH clause. Did we reach the consensus about this syntax?
 The last email from Robert just makes me think that () should be added
 into the syntax.


 Thank you for taking time for this patch!

 I removed the FORCE option from REINDEX, so you would need to update the 
 patch.

 Thanks.
 I will change the patch with this change.

 This was quite a complicated issue, since we already have many command
 styles. The style can be considered from various perspectives: kind of
 DDL, new or old command, maintenance command. And each command has its
 own story.
 I believe we reached consensus on this style at least once (please see
 the previous discussion), but we might need to discuss it more...

 Okay, another question is that; WITH must be required whenever the options
 are specified? Or should it be abbreviatable?

 In the previous discussion, the WITH clause was always required for the
 VERBOSE option; I don't think we should allow abbreviating WITH for now.
 Also, now that the FORCE option has been removed, we could bring back
 the idea of putting VERBOSE after the object name, as CLUSTER does
 (INDEX, TABLE, etc.).


The attached v10 patch is the latest version.
The syntax is:
REINDEX { INDEX | ... } name [ WITH ] [ VERBOSE ]

That is, the WITH clause is optional.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 998340c..2e8db5a 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
 <refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable>
+REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ WITH ] [ VERBOSE ]
</synopsis>
 </refsynopsisdiv>
 
@@ -150,6 +150,15 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAM
  </para>
 </listitem>
   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report as each index is reindexed.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 </refsect1>
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ac3b785..27364ec 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -63,6 +63,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/pg_rusage.h"
 #include "utils/syscache.h"
 #include "utils/tuplesort.h"
 #include "utils/snapmgr.h"
@@ -3133,13 +3134,18 @@ IndexGetRelation(Oid indexId, bool missing_ok)
  * reindex_index - This routine is used to recreate a single index
  */
 void
-reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
+reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
+bool verbose)
 {
 	Relation	iRel,
 heapRelation;
 	Oid			heapId;
 	IndexInfo  *indexInfo;
 	volatile bool skipped_constraint = false;
+	int			elevel = verbose ? INFO : DEBUG2;
+	PGRUsage	ru0;
+
+	pg_rusage_init(&ru0);
 
 	/*
 	 * Open and lock the parent heap relation.  ShareLock is sufficient since
@@ -3283,6 +3289,13 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
 		heap_close(pg_index, RowExclusiveLock);
 	}
 
+	/* Log what we did */
+	ereport(elevel,
+			(errmsg("index \"%s\" was reindexed.",
+					get_rel_name(indexId)),
+			 errdetail("%s.",
+					   pg_rusage_show(&ru0))));
+
 	/* Close rels, but keep locks */
 	index_close(iRel, NoLock);
 	heap_close(heapRelation, NoLock);
@@ -3324,7 +3337,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
  * index rebuild.
  */
 bool
-reindex_relation(Oid relid, int flags)
+reindex_relation(Oid relid

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-30 Thread Sawada Masahiko
On Thu, Apr 30, 2015 at 8:39 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Apr 30, 2015 at 6:37 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 Attached v10 patch is latest version patch.
 The syntax is,
 REINDEX { INDEX | ... } name [ WITH ] [ VERBOSE ]

 That is, WITH clause is optional.

 I thought we agreed on moving this earlier in the command:

 http://www.postgresql.org/message-id/18569.1423159...@sss.pgh.pa.us


Oh, I see.
The attached patch changes the syntax to:
REINDEX [VERBOSE] { INDEX | ... } name

Thoughts?

Regards,

---
Sawada Masahiko


000_reindex_verbose_v11.patch
Description: Binary data



Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-30 Thread Sawada Masahiko
On Fri, May 1, 2015 at 1:38 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Apr 30, 2015 at 9:15 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Thu, Apr 30, 2015 at 8:39 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Apr 30, 2015 at 6:37 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 Attached v10 patch is latest version patch.
 The syntax is,
 REINDEX { INDEX | ... } name [ WITH ] [ VERBOSE ]

 That is, WITH clause is optional.

 I thought we agreed on moving this earlier in the command:

 http://www.postgresql.org/message-id/18569.1423159...@sss.pgh.pa.us


 Oh, I see.
 The attached patch changes the syntax to
 REINDEX [VERBOSE] { INDEX | ... } name

 Thought?

 I thought what we agreed on was:

 REINDEX (flexible options) { INDEX | ... } name

 The argument wasn't about whether to use flexible options, but where
 in the command to put them.


VACUUM supports both syntaxes: with parentheses and without.
I think REINDEX should support both syntaxes like VACUUM does,
because it would be a pain to add parentheses every time we want to
run REINDEX.
Are the parentheses optional in the REINDEX command?

And should CLUSTER get similar syntax in the future?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-29 Thread Sawada Masahiko
On Wed, Apr 29, 2015 at 12:20 AM, David Steele da...@pgmasters.net wrote:
 On 4/27/15 10:37 PM, Sawada Masahiko wrote:

 Thank you for your reviewing.
 Attached v8 patch is latest version.

 I posted a review through the CF app but it only went to the list
 instead of recipients of the latest message.  install-checkworld is
 failing but the fix is pretty trivial.

 See:
 http://www.postgresql.org/message-id/20150428145626.2632.75287.p...@coridan.postgresql.org


Thank you for reviewing!
I have fixed the regression test failure.
The latest patch is attached.

Regards,

---
Sawada Masahiko


000_pg_file_settings_view_v9.patch
Description: Binary data



Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-29 Thread Sawada Masahiko
On Thu, Apr 30, 2015 at 6:31 AM, David Steele da...@pgmasters.net wrote:
 On 4/29/15 5:16 PM, Robert Haas wrote:
 On Fri, Apr 24, 2015 at 2:40 PM, David Steele da...@pgmasters.net wrote:
   para
The view structnamepg_file_settings/structname provides access to
 run-time parameters
that are defined in configuration files via SQL.  In contrast to
structnamepg_settings/structname a row is provided for each
 occurrence
of the parameter in a configuration file.  This is helpful for
 discovering why one value
may have been used in preference to another when the parameters were
 loaded.
   /para

 This seems to imply that this gives information about only a subset of
 configuration files; specifically, those auto-generated based on SQL
 commands - i.e. postgresql.conf.auto.  But I think it's supposed to
 give information about all configuration files, regardless of how
 generated.  Am I wrong?  If not, I'd suggest changing "run-time parameters
 that are defined in configuration files via SQL" to "run-time parameters
 stored in configuration files".

 The view does give information about all configuration files regardless
 of how they were created.

 That's what I intended the text to say but I think your phrasing is clearer.


Thank you for reviewing.
I agree with this.
The attached patch is the updated version, v10.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 4b79958..adb8628 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7560,6 +7560,11 @@
  /row
 
  row
+  entrylink linkend=view-pg-file-settingsstructnamepg_file_settings/structname/link/entry
+  entryparameter settings from configuration files/entry
+ /row
+
+ row
   entrylink linkend=view-pg-shadowstructnamepg_shadow/structname/link/entry
   entrydatabase users/entry
  /row
@@ -9173,6 +9178,74 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  /sect1
 
+ sect1 id=view-pg-file-settings
+  titlestructnamepg_file_settings/structname/title
+
+  indexterm zone=view-pg-file-settings
+   primarypg_file_settings/primary
+  /indexterm
+
+  para
+   The view structnamepg_file_settings/structname provides access to
+   run-time parameters stored in configuration files.
+   In contrast to structnamepg_settings/structname a row is provided for
+   each occurrence of the parameter in a configuration file. This is helpful
+   for discovering why one value may have been used in preference to another
+   when the parameters were loaded.
+  /para
+
+  table
+   titlestructnamepg_file_settings/ Columns/title
+
+  tgroup cols=3
+   thead
+row
+ entryName/entry
+ entryType/entry
+ entryDescription/entry
+/row
+   /thead
+   tbody
+row
+ entrystructfieldsourcefile/structfield/entry
+ entrystructfieldtext/structfield/entry
+ entry
+  Configuration file the current value was set in (null for values
+  set from sources other than configuration files, or when examined by a
+  non-superuser); helpful when using literalinclude/literal directives in
+  configuration files
+ /entry
+/row
+row
+ entrystructfieldsourceline/structfield/entry
+ entrystructfieldinteger/structfield/entry
+ entry
+  Line number within the configuration file the current value was
+  set at (null for values set from sources other than configuration files,
+  or when examined by a non-superuser)
+ /entry
+/row
+row
+ entrystructfieldseqno/structfield/entry
+ entrystructfieldinteger/structfield/entry
+ entryOrder in which the setting was loaded from the configuration/entry
+/row
+row
+ entrystructfieldname/structfield/entry
+ entrystructfieldtext/structfield/entry
+ entryRun-time configuration parameter name/entry
+/row
+row
+ entrystructfieldsetting/structfield/entry
+ entrystructfieldtext/structfield/entry
+ entryValue of the parameter/entry
+/row
+   /tbody
+  /tgroup
+ /table
+
+/sect1
+
  sect1 id=view-pg-shadow
   titlestructnamepg_shadow/structname/title
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2ad01f4..18921c4 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -411,6 +411,12 @@ CREATE RULE pg_settings_n AS
 
 GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;
 
+CREATE VIEW pg_file_settings AS
+   SELECT * FROM pg_show_all_file_settings() AS A;
+
+REVOKE ALL on pg_file_settings FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_show_all_file_settings() FROM PUBLIC;
+
 CREATE VIEW pg_timezone_abbrevs AS
 SELECT * FROM pg_timezone_abbrevs();
 
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index c5e0fac..873d950 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -119,6 +119,8 @@ ProcessConfigFile(GucContext context)
 	ConfigVariable *item,
 			   *head

Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-29 Thread Sawada Masahiko
On Wed, Apr 29, 2015 at 12:17 AM, David Steele da...@pgmasters.net wrote:
 On 4/28/15 2:14 AM, Sawada Masahiko wrote:
 On Fri, Apr 24, 2015 at 3:23 AM, David Steele da...@pgmasters.net wrote:
 I've also added some checking to make sure that if anything looks funny
 on the stack an error will be generated.

 Thanks for the feedback!


 Thank you for updating the patch!
 I ran the postgres regression tests on a database with pg_audit
 enabled, and it works fine.
 Looks good to me.

 If no one has further review comments or bug reports, I will mark this
 as Ready for Committer.

 Thank you!  I appreciate all your work reviewing this patch and of
 course everyone else who commented on, reviewed or tested the patch
 along the way.


I have changed the status to Ready for Committer.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-28 Thread Sawada Masahiko
On Fri, Apr 24, 2015 at 3:23 AM, David Steele da...@pgmasters.net wrote:
 On 4/23/15 5:49 AM, Sawada Masahiko wrote:

 I'm concerned that the behaviour of pg_audit has changed several
 times, as far as I remember. Did we achieve consensus on this design?

 The original author Abhijit expressed support for the SESSION/OBJECT
 concept before I started working on the code and so has Stephen Frost.
 As far as I know all outstanding comments from the community have been
 addressed.

 Overall behavior has not changed very much since being submitted to the
 CF in February - mostly just tweaks and additional options.

 And one question: OBJECT logging of all tuple deletion (e.g. DELETE
 FROM hoge) seems not to work, as follows.


 =# grant all on bar TO masahiko;

 (1) Delete all tuple
 =# insert into bar values(1);
 =# delete from bar ;
 NOTICE:  AUDIT: SESSION,47,1,WRITE,DELETE,TABLE,public.bar,delete from bar ;
 DELETE 1

 (2) Delete specified tuple (but same result as (1))
 =# insert into bar values(1);
 =# delete from bar where col = 1;
 NOTICE:  AUDIT: OBJECT,48,1,WRITE,DELETE,TABLE,public.bar,delete from
 bar where col = 1;
 NOTICE:  AUDIT: SESSION,48,1,WRITE,DELETE,TABLE,public.bar,delete from
 bar where col = 1;
 DELETE 1

 Definitely a bug.  Object logging works in the second case because the
 select privileges on the col column trigger logging.  I have fixed
 this and added a regression test.

 I also found a way to get the stack memory context under the query
 memory context.  Because of the order of execution it requires moving
 the memory context but I still think it's a much better solution.  I was
 able to remove most of the stack pops (except function logging) and the
 output remained stable.

 I've also added some checking to make sure that if anything looks funny
 on the stack an error will be generated.

 Thanks for the feedback!


Thank you for updating the patch!
I ran the postgres regression tests on a database with pg_audit
enabled, and it works fine.
Looks good to me.

If no one has further review comments or bug reports, I will mark this
as Ready for Committer.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-27 Thread Sawada Masahiko
On Sat, Apr 25, 2015 at 3:40 AM, David Steele da...@pgmasters.net wrote:
 On 4/4/15 9:21 AM, Sawada Masahiko wrote:
 I added documentation changes; the patch is attached.
 Also I tried to use a memory context in TopMemoryContext for
 allocating guc_file_variable,
 but access failed after receiving SIGHUP.
 Below is my review of the v5 patch:

Thank you for your review comment!
The latest patch is attached.

 diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
 + sect1 id=view-pg-file-settings
 +  titlestructnamepg_file_settings/structname/title
 +
 +  indexterm zone=view-pg-file-settings
 +   primarypg_file_settings/primary
 +  /indexterm
 +
 +  para
 +   The view structnamepg_file_settings/structname provides confirm
 parameters via SQL,
 +   which is written in configuration file. This view can be update by
 reloading configration file.
 +  /para

 Perhaps something like:

   para
The view structnamepg_file_settings/structname provides access to
 run-time parameters
that are defined in configuration files via SQL.  In contrast to
structnamepg_settings/structname a row is provided for each
 occurrence
of the parameter in a configuration file.  This is helpful for
 discovering why one value
may have been used in preference to another when the parameters were
 loaded.
   /para

 +  table
 +   titlestructnamepg_file_settings/ Columns/title
 +
 +  tgroup cols=3
 +   thead
 +row
 + entryName/entry
 + entryType/entry
 + entryDescription/entry
 +/row
 +   /thead
 +   tbody
 +row
 + entrystructfieldsourcefile/structfield/entry
 + entrystructfieldtext/structfield/entry
 + entryA path of configration file/entry

 From pg_settings:

   entryConfiguration file the current value was set in (null for
   values set from sources other than configuration files, or when
   examined by a non-superuser);
   helpful when using literalinclude/ directives in configuration
 files/entry

 +/row
 +row
 + entrystructfieldsourceline/structfield/entry
 + entrystructfieldinteger/structfield/entry
 + entryThe line number in configuration file/entry

 From pg_settings:

   entryLine number within the configuration file the current value was
   set at (null for values set from sources other than configuration
 files,
   or when examined by a non-superuser)
   /entry


 +/row
 +row
 + entrystructfieldname/structfield/entry
 + entrystructfieldtext/structfield/entry
 + entryParameter name in configuration file/entry

 From pg_settings:

   entryRun-time configuration parameter name/entry

 +/row
 +row
 + entrystructfieldsetting/structfield/entry
 + entrystructfieldtext/structfield/entry
 + entryValue of the parameter in configuration file/entry
 +/row
 +   /tbody
 +  /tgroup
 + /table
 +
 +/sect1

The documentation is updated based on your suggestion.

 diff --git a/src/backend/utils/misc/guc-file.l
 b/src/backend/utils/misc/guc-file.l
 @@ -342,6 +345,42 @@ ProcessConfigFile(GucContext context)
 +guc_array = guc_file_variables;
 +for (item = head; item; item = item->next, guc_array++)
 +{
 +free(guc_array->name);
 +free(guc_array->value);
 +free(guc_array->filename);

 There's an issue freeing these when calling pg_reload_conf():

Fix.

 postgres=# alter system set max_connections = 100;
 ALTER SYSTEM
 postgres=# select * from pg_reload_conf();
 LOG:  received SIGHUP, reloading configuration files
  pg_reload_conf
 ----------------
  t
 (1 row)

 postgres=# postgres(25078,0x7fff747b2300) malloc: *** error for object
 0x424d380044: pointer being freed was not allocated
 *** set a breakpoint in malloc_error_break to debug

 Of course a config reload can't change max_connections, but I wouldn't
 expect it to crash, either.

 +guc_array->sourceline = -1;

 I can't see the need for this since it is reassigned further down.

Fix.

 --

 Up-thread David J had recommended an ordering column (or seqno) that
 would provide the order in which the settings were loaded.  I think this
 would give valuable information that can't be gleaned from the line
 numbers alone.

 Personally I think it would be fine to start from 1 and increment for
 each setting found, rather than rank within a setting.  If the user
 wants per setting ranking that's what SQL is for.  So the output would
 look something like:

sourcefile | sourceline | seqno |  name   |  setting
 --------------------------+------------+-------+-----------------+-----------
  /db/postgresql.conf  | 64 | 1 | max_connections | 100
  /db/postgresql.conf  |116 | 2 | shared_buffers  | 128MB
  /db/postgresql.conf  |446 | 3 | log_timezone|
 US/Eastern
  /db/postgresql.auto.conf |  3 | 4 | max_connections | 200


Yep, I agree with you.
Added a seqno column to pg_file_settings.

Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-27 Thread Sawada Masahiko
On Tue, Apr 28, 2015 at 3:22 AM, David Steele da...@pgmasters.net wrote:
 On 4/27/15 10:31 AM, Sawada Masahiko wrote:
 Thank you for your review comment!
 The latest patch is attached.

 Looks good overall - a few more comments below:

Thank you for your reviewing.
The attached v8 patch is the latest version.

 diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
 +row
 + entrystructfieldseqno/structfield/entry
 + entrystructfieldinteger/structfield/entry
 + entrySequence number of current view/entry

 I would suggest:

 Order in which the setting was loaded from the configuration

Fix.

 +/row
 +row
 + entrystructfieldname/structfield/entry
 + entrystructfieldtext/structfield/entry

 diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
 +/*
 + * show_all_file_settings
 + */
 +
 +#define NUM_PG_FILE_SETTINGS_ATTS 5
 +
 +Datum
 +show_all_file_settings(PG_FUNCTION_ARGS)

 It would be good to get more detail in the function comment, as well as
 more comments in the function itself.

Added some comments.

 A minor thing, but there are a number of whitespace errors when applying
 the patch:

 ../000_pg_file_settings_view_v6.patch:159: indent with spaces.
 free(guc_file_variables[i].name);
 ../000_pg_file_settings_view_v6.patch:160: indent with spaces.
 free(guc_file_variables[i].value);
 ../000_pg_file_settings_view_v6.patch:161: indent with spaces.
 free(guc_file_variables[i].filename);
 ../000_pg_file_settings_view_v6.patch:178: indent with spaces.
 guc_array->name = guc_strdup(FATAL, item->name);
 ../000_pg_file_settings_view_v6.patch:179: indent with spaces.
 guc_array->value = guc_strdup(FATAL, item->value);
 warning: squelched 2 whitespace errors
 warning: 7 lines add whitespace errors.

 I'm sure the committer would appreciate it if you'd clean those up.

I cleaned up the whitespace errors.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 898865e..50b93cf 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7437,6 +7437,11 @@
  /row
 
  row
+  entrylink linkend=view-pg-file-settingsstructnamepg_file_settings/structname/link/entry
+  entryparameter settings from configuration files/entry
+ /row
+
+ row
   entrylink linkend=view-pg-shadowstructnamepg_shadow/structname/link/entry
   entrydatabase users/entry
  /row
@@ -9050,6 +9055,74 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  /sect1
 
+ sect1 id=view-pg-file-settings
+  titlestructnamepg_file_settings/structname/title
+
+  indexterm zone=view-pg-file-settings
+   primarypg_file_settings/primary
+  /indexterm
+
+  para
+   The view structnamepg_file_settings/structname provides access to
+   run-time parameters that are defined in configuration files via SQL.
+   In contrast to structnamepg_settings/structname a row is provided for
+   each occurrence of the parameter in a configuration file. This is helpful
+   for discovering why one value may have been used in preference to another
+   when the parameters were loaded.
+  /para
+
+  table
+   titlestructnamepg_file_settings/ Columns/title
+
+  tgroup cols=3
+   thead
+row
+ entryName/entry
+ entryType/entry
+ entryDescription/entry
+/row
+   /thead
+   tbody
+row
+ entrystructfieldsourcefile/structfield/entry
+ entrystructfieldtext/structfield/entry
+ entry
+  Configuration file the current value was set in (null for values
+  set from sources other than configuration files, or when examined by a
+  non-superuser); helpful when using literalinclude/literal directives in
+  configuration files
+ /entry
+/row
+row
+ entrystructfieldsourceline/structfield/entry
+ entrystructfieldinteger/structfield/entry
+ entry
+  Line number within the configuration file the current value was
+  set at (null for values set from sources other than configuration files,
+  or when examined by a non-superuser)
+ /entry
+/row
+row
+ entrystructfieldseqno/structfield/entry
+ entrystructfieldinteger/structfield/entry
+ entryOrder in which the setting was loaded from the configuration/entry
+/row
+row
+ entrystructfieldname/structfield/entry
+ entrystructfieldtext/structfield/entry
+ entryRun-time configuration parameter name/entry
+/row
+row
+ entrystructfieldsetting/structfield/entry
+ entrystructfieldtext/structfield/entry
+ entryValue of the parameter/entry
+/row
+   /tbody
+  /tgroup
+ /table
+
+/sect1
+
  sect1 id=view-pg-shadow
   titlestructnamepg_shadow/structname/title
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 4c35ef4..31faab0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -411,6 +411,12 @@ CREATE

Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-27 Thread Sawada Masahiko
On Mon, Apr 27, 2015 at 11:31 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Sat, Apr 25, 2015 at 3:40 AM, David Steele da...@pgmasters.net wrote:
 On 4/4/15 9:21 AM, Sawada Masahiko wrote:
 I added documentation changes; the patch is attached.
 Also I tried to use a memory context in TopMemoryContext for
 allocating guc_file_variable,
 but access failed after receiving SIGHUP.
 Below is my review of the v5 patch:

 Thank you for your review comment!
 The latest patch is attached.

 diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
 + sect1 id=view-pg-file-settings
 +  titlestructnamepg_file_settings/structname/title
 +
 +  indexterm zone=view-pg-file-settings
 +   primarypg_file_settings/primary
 +  /indexterm
 +
 +  para
 +   The view structnamepg_file_settings/structname provides confirm
 parameters via SQL,
 +   which is written in configuration file. This view can be update by
 reloading configration file.
 +  /para

 Perhaps something like:

   para
The view structnamepg_file_settings/structname provides access to
 run-time parameters
that are defined in configuration files via SQL.  In contrast to
structnamepg_settings/structname a row is provided for each
 occurrence
of the parameter in a configuration file.  This is helpful for
 discovering why one value
may have been used in preference to another when the parameters were
 loaded.
   /para

 +  table
 +   titlestructnamepg_file_settings/ Columns/title
 +
 +  tgroup cols=3
 +   thead
 +row
 + entryName/entry
 + entryType/entry
 + entryDescription/entry
 +/row
 +   /thead
 +   tbody
 +row
 + entrystructfieldsourcefile/structfield/entry
 + entrystructfieldtext/structfield/entry
 + entryA path of configration file/entry

 From pg_settings:

   entryConfiguration file the current value was set in (null for
   values set from sources other than configuration files, or when
   examined by a non-superuser);
   helpful when using literalinclude/ directives in configuration
 files/entry

 +/row
 +row
 + entrystructfieldsourceline/structfield/entry
 + entrystructfieldinteger/structfield/entry
 + entryThe line number in configuration file/entry

 From pg_settings:

   entryLine number within the configuration file the current value was
   set at (null for values set from sources other than configuration
 files,
   or when examined by a non-superuser)
   /entry


 +/row
 +row
 + entrystructfieldname/structfield/entry
 + entrystructfieldtext/structfield/entry
 + entryParameter name in configuration file/entry

 From pg_settings:

   entryRun-time configuration parameter name/entry

 +/row
 +row
 + entrystructfieldsetting/structfield/entry
 + entrystructfieldtext/structfield/entry
 + entryValue of the parameter in configuration file/entry
 +/row
 +   /tbody
 +  /tgroup
 + /table
 +
 +/sect1

 The documentation is updated based on your suggestion.

 diff --git a/src/backend/utils/misc/guc-file.l
 b/src/backend/utils/misc/guc-file.l
 @@ -342,6 +345,42 @@ ProcessConfigFile(GucContext context)
 +guc_array = guc_file_variables;
 +for (item = head; item; item = item->next, guc_array++)
 +{
 +free(guc_array->name);
 +free(guc_array->value);
 +free(guc_array->filename);

 There's an issue freeing these when calling pg_reload_conf():

 Fix.

 postgres=# alter system set max_connections = 100;
 ALTER SYSTEM
 postgres=# select * from pg_reload_conf();
 LOG:  received SIGHUP, reloading configuration files
  pg_reload_conf
 ----------------
  t
 (1 row)

 postgres=# postgres(25078,0x7fff747b2300) malloc: *** error for object
 0x424d380044: pointer being freed was not allocated
 *** set a breakpoint in malloc_error_break to debug

 Of course a config reload can't change max_connections, but I wouldn't
 expect it to crash, either.

 +guc_array->sourceline = -1;

 I can't see the need for this since it is reassigned further down.

 Fix.

 --

 Up-thread David J had recommended an ordering column (or seqno) that
 would provide the order in which the settings were loaded.  I think this
 would give valuable information that can't be gleaned from the line
 numbers alone.

 Personally I think it would be fine to start from 1 and increment for
 each setting found, rather than rank within a setting.  If the user
 wants per setting ranking that's what SQL is for.  So the output would
 look something like:

sourcefile | sourceline | seqno |  name   |  setting
 --------------------------+------------+-------+-----------------+-----------
  /db/postgresql.conf  | 64 | 1 | max_connections | 100
  /db/postgresql.conf  |116 | 2 | shared_buffers  | 128MB
  /db/postgresql.conf  |446 | 3 | log_timezone|
 US/Eastern
  /db/postgresql.auto.conf |  3 | 4

Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Sawada Masahiko
On Thu, Apr 23, 2015 at 3:24 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Apr 22, 2015 at 12:39 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 The thing that made me nervous about that approach is that it made the LSN
 of each page critical information. If you somehow zeroed out the LSN, you
 could no longer tell which pages are frozen and which are not. I'm sure it
 could be made to work - and I got it working to some degree anyway - but
 it's a bit scary. It's similar to the multixid changes in 9.3: multixids
 also used to be data that you can just zap at restart, and when we changed
 the rules so that you lose data if you lose multixids, we got trouble. Now,
 LSNs are much simpler, and there wouldn't be anything like the
 multioffset/member SLRUs that you'd have to keep around forever or vacuum,
 but still..

 LSNs are already pretty critical.  If they're in the future, you can't
 flush those pages.  Ever.  And if they're wrong in either direction,
 crash recovery is broken.  But it's still worth thinking about ways
 that we could make this more robust.

 I keep coming back to the idea of treating any page that is marked as
 all-visible as frozen, and deferring freezing until the page is again
 modified.  The big downside of this is that if the page is set as
 all-visible and then immediately thereafter modified, it sucks to have
 to freeze when the XIDs in the page are still present in CLOG.  But if
 we could determine from the LSN that the XIDs in the page are new
 enough to still be considered valid, then we could skip freezing in
 those cases and only do it when the page is old.  That way, if
 somebody zeroed out the LSN (why, oh why?) the worst that would happen
 is that we'd do some extra freezing when the page was next modified.

In your idea, if we have a WORM (write-once read-many) table, the
tuples in its pages would never be frozen unless we run VACUUM FREEZE.
Also, in this situation, from the second time onward VACUUM FREEZE
would need to scan only the pages added since the last freeze, so we
could reduce I/O, but we would still need to freeze explicitly for
anti-wraparound as in the past. A WORM table has huge data in general,
and that data increases rapidly, so it would also be expensive.


 I would feel safer if we added a completely new epoch counter to the page
 header, instead of reusing LSNs. But as we all know, changing the page
 format is a problem for in-place upgrade, and takes some space too.

 Yeah.  We have a serious need to reduce the size of our on-disk
 format.  On a TPC-C-like workload Jan Wieck recently tested, our data
 set was 34% larger than another database at the beginning of the test,
 and 80% larger by the end of the test.  And we did twice the disk
 writes.  See The Elephants in the Room.pdf at
 https://sites.google.com/site/robertmhaas/presentations


Regards,

---
Sawada Masahiko




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-23 Thread Sawada Masahiko
On Mon, Apr 20, 2015 at 10:17 PM, David Steele da...@pgmasters.net wrote:
 On 4/20/15 4:40 AM, Sawada Masahiko wrote:

 Thank you for updating the patch.

 One question about regarding since v7 (or later) patch is;
 What is the different between OBJECT logging and SESSION logging?

 In brief, session logging can log anything that happens in a user
 session while object logging only targets DML and SELECT on selected
 relations.

 The pg_audit.log_relation setting is intended to mimic the prior
 behavior of pg_audit before object logging was added.

 In general, I would not expect pg_audit.log = 'read, write' to be used
 with pg_audit.role.  In other words, session logging of DML and SELECT
 should probably not be turned on at the same time as object logging is
 in use.  Object logging is intended to be a fine-grained version of
 pg_audit.log = 'read, write' (one or both).

Thank you for your explanation!
I now understand how to use the two logging styles.

 I used the v9 patch with pg_audit.log_relation = on, and got quite
 similar but different logs, as follows.

 =# select * from hoge, bar where hoge.col = bar.col;
 NOTICE:  AUDIT: OBJECT,8,1,READ,SELECT,TABLE,public.hoge,select *
 from hoge, bar where hoge.col = bar.col;
 NOTICE:  AUDIT: SESSION,8,1,READ,SELECT,TABLE,public.hoge,select *
 from hoge, bar where hoge.col = bar.col;
 NOTICE:  AUDIT: OBJECT,8,1,READ,SELECT,TABLE,public.bar,select * from
 hoge, bar where hoge.col = bar.col;
 NOTICE:  AUDIT: SESSION,8,1,READ,SELECT,TABLE,public.bar,select *
 from hoge, bar where hoge.col = bar.col;

 The behaviour of SESSION logging is similar to OBJECT logging now, and
 SESSION logging per session (i.e., pg_audit.log_relation = off) is also
 similar to 'log_statement = all' (enhancing log_statement might be
 enough), so I couldn't understand the need for SESSION logging.

 Session logging is quite different from 'log_statement = all'.  See:

 http://www.postgresql.org/message-id/552323b2.8060...@pgmasters.net

 and/or the Why pg_audit? section of the pg_audit documentation.

 I agree that it may make sense in the future to merge session logging
 into log_statements, but for now it does provide important additional
 functionality for creating audit logs.


I'm concerned that the behaviour of pg_audit has changed several times,
as far as I remember. Did we achieve consensus on this design?

And one question: OBJECT logging of all tuple deletion (e.g. DELETE
FROM hoge) seems not to work, as follows.


=# grant all on bar TO masahiko;

(1) Delete all tuple
=# insert into bar values(1);
=# delete from bar ;
NOTICE:  AUDIT: SESSION,47,1,WRITE,DELETE,TABLE,public.bar,delete from bar ;
DELETE 1

(2) Delete specified tuple (but same result as (1))
=# insert into bar values(1);
=# delete from bar where col = 1;
NOTICE:  AUDIT: OBJECT,48,1,WRITE,DELETE,TABLE,public.bar,delete from
bar where col = 1;
NOTICE:  AUDIT: SESSION,48,1,WRITE,DELETE,TABLE,public.bar,delete from
bar where col = 1;
DELETE 1

Regards,

---
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Sawada Masahiko
On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 4/23/15 11:06 AM, Petr Jelinek wrote:

 On 23/04/15 17:45, Bruce Momjian wrote:

 On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
 Agreed, no extra file, and the same write volume as currently.  It would
 also match pg_clog, which uses two bits per transaction --- maybe we can
 reuse some of that code.


 Yeah, this approach seems promising. We probably can't reuse code from
 clog because the usage pattern is different (key for clog is xid, while
 for visibility/freeze map ctid is used). But visibility map storage
 layer is pretty simple so it should be easy to extend it for this use.


 Actually, there may be some bit manipulation functions we could reuse;
 things like efficiently counting how many things in a byte are set. Probably
 doesn't make sense to fully refactor it, but at least CLOG is a good source
 for cut/paste/whack.


I agree with adding a bit to the VM that indicates the corresponding
page is all-frozen, just like CLOG.
I'll make that change in the second version of the patch.

Regards,

---
Sawada Masahiko




[HACKERS] Freeze avoidance of very large table.

2015-04-21 Thread Sawada Masahiko
On Tue, Apr 21, 2015 at 7:00 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 4/20/15 2:45 AM, Sawada Masahiko wrote:

 The current patch adds a new source file, src/backend/access/heap/frozenmap.c,
 which is quite similar to visibilitymap.c. They contain similar code but
 are kept separate for now; I will refactor them, e.g. into a common
 bitmap.c, if needed.


Thank you for having a look at this patch.


 My feeling is we'd definitely want this refactored; it looks to be a whole
 lot of duplicated code. But before working on that we should get consensus
 that a FrozenMap is a good idea.

Yes, we need to get consensus about the FrozenMap before starting work
on it.
In addition to the comments you pointed out, I noticed one problem I
should address: a FrozenMap bit needs to be cleared on deletion (i.e.,
when xmax is set).
A page marked frozen can currently still contain dead tuples, but I'm
thinking of changing this so that a frozen page guarantees the page is
all frozen *and* all visible.

 Are there any meaningful differences between the two, besides the obvious
 name changes?

No, there aren't.

 I think there's also a bunch of XLOG stuff that could be refactored too...

I agree with you.

 Also, when skipping vacuum using the visibility map, we only skip runs
 of at least SKIP_PAGES_THRESHOLD consecutive pages, but no such
 mechanism exists yet for the frozen map.


 That's probably something else that can be factored out, since it's
 basically the same logic. I suspect we just need to  some of the checks
so
 we're looking at both FM and VM at the same time.

The FrozenMap is used to skip scanning only during anti-wraparound
vacuum or when freezing all tuples (i.e., when scan_all is true).
A normal vacuum uses only the VM; it does not use the FM for now.

 Other comments...

 It would be nice if we didn't need another page bit for FM; do you see any
 reasonable way that could happen?

We may be able to remove the FM bit from the page header, but I'm not
sure that is feasible.

 +* If we didn't pin the visibility(and frozen) map page and the
page
 has
 +* become all visible(and frozen) while we were busy locking the
 buffer,
 +* or during some subsequent window during which we had it
unlocked,
 +* we'll have to unlock and re-lock, to avoid holding the buffer
 lock
 +* across an I/O.  That's a bit unfortunate, especially since
we'll
 now
 +* have to recheck whether the tuple has been locked or updated
 under us,
 +* but hopefully it won't happen very often.
  */

 s/(and frozen)/ or frozen/


 + * Reply XLOG_HEAP3_FROZENMAP record.
 s/Reply/Replay/

Understood.


 +   /*
 +* XLogReplayBufferExtended locked the buffer. But
 frozenmap_set
 +* will handle locking itself.
 +*/
 +   LockBuffer(fmbuffer, BUFFER_LOCK_UNLOCK);

 Doesn't this create a race condition?


 Are you sure the bit in finish_heap_swap() is safe? If so, we should add
the
 same the same for the visibility map too (it certainly better be all
visible
 if it's frozen...)

We cannot ensure a page is all visible even after VACUUM FULL,
because dead tuples can remain, e.g. when another process inserts and
then updates the same tuple in the same transaction before VACUUM FULL.
I had thought the FrozenMap was unaffected by delete operations, but
as I said at the top of this mail, a FrozenMap bit needs to be cleared
on deletion.
So I will remove the related code, as you mentioned.




 +   /*
 +* Current block is all-visible.
 +* If frozen map represents that it's all frozen
and
 this
 +* function is called for freezing tuples, we can
 skip to
 +* vacuum block.
 +*/

 I would state this as Even if scan_all is true, we can skip blocks that
are
 marked as frozen.

 +   if (frozenmap_test(onerel, blkno, fmbuffer) && scan_all)

 I suspect it's faster to reverse those tests (scan_all &&
 frozenmap_test())... but why do we even need to look at scan_all? AFAICT
 if a block is frozen we can skip it unconditionally.

In the current patch, a tuple that is both frozen and dead can remain
on a page that is marked all-frozen; i.e., a page can be marked frozen
without being all visible.
But I'm thinking of changing that.



 +   /*
 +* If the un-frozen tuple is remaining in current
 page and
 +* current page is marked as ALL_FROZEN, we should
 clear it.
 +*/

 That needs to NEVER happen. If it does then we're going to consider tuples
 as visible/frozen that shouldn't be. We should probably throw an error
here,
 because it means the heap is now corrupted. At the minimum it needs to be
an
 assert().

I understood. I'll fix it.

 Note that I haven't reviewed all the logic in detail at this point. If
this
 ends up being refactored

Re: [HACKERS] Freeze avoidance of very large table.

2015-04-21 Thread Sawada Masahiko
On Wed, Apr 22, 2015 at 12:02 AM, Andres Freund and...@anarazel.de wrote:
 On 2015-04-21 23:59:45 +0900, Sawada Masahiko wrote:
 The page as frozen could have the dead tuple for now, but I think to change
 to that the frozen page guarantees that page is all frozen *and* all
 visible.

 It shouldn't. That'd potentially cause corruption after a wraparound. A
 tuple's visibility might change due to that.

A page marked frozen can still contain some dead tuples, right?
I think we should clear the FrozenMap bit (and the page-header flag) on
delete operations, and set the bit only during vacuum.
Accordingly, a page marked frozen would then guarantee it is all frozen
and all visible.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-20 Thread Sawada Masahiko
On Tue, Apr 7, 2015 at 11:22 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Tue, Apr 7, 2015 at 7:53 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 4/6/15 5:18 PM, Greg Stark wrote:

 Only I would suggest thinking of it in terms of two orthogonal boolean
 flags rather than three states. It's easier to reason about whether a
 table has a specific property than trying to control a state machine in
 a predefined pathway.

 So I would say the two flags are:
 READONLY: guarantees nothing can be dirtied
 ALLFROZEN: guarantees no unfrozen tuples are present

 In practice you can't have the latter without the former since vacuum
 can't know everything is frozen unless it knows nobody is inserting. But
 perhaps there will be cases in the future where that's not true.


 I'm not so sure about that. There's a logical state progression here (see
 below). ISTM it's easier to just enforce that in one place instead of a
 bunch of places having to check multiple conditions. But, I'm not wed to a
 single field.

 Incidentally there are a number of other optimisations that I've had in
 mind that are only possible on frozen read-only tables:

 1) Compression: compress the pages and pack them one after the other.
 Build a new fork with offsets for each page.

 2) Automatic partition elimination where the statistics track the
 minimum and maximum value per partition (and number of tuples) and treat
 them as implicit constraints. In particular it would magically make read
 only empty parent partitions be excluded regardless of the where clause.


 AFAICT neither of those actually requires ALLFROZEN, no? You'll need to
 uncompact and re-compact for #1 when you actually freeze (which maybe isn't
 worth it), but freezing isn't absolutely required. #2 would only require
 that everything in the relation is visible; not frozen.

 I think there's value here to having an ALLVISIBLE state as well as
 ALLFROZEN.


 Based on many suggestions, I'm going to deal with the FM first as one
 patch. In that first patch it will be a simple mechanism, similar to
 the VM:
 - Each bit of the FM represents a single page
 - A bit is set only by vacuum
 - A bit is cleared by inserts, updates, and deletes

 Second, I'll deal with a simple read-only table with 2 states,
 Read/Write (default) and ReadOnly, as one patch. ISTM that having a
 Frozen state needs more discussion. A read-only table just allows us to
 disable any updates to the table, controlled by a read-only flag in
 pg_class. The DDL command that changes this status would be something
 like ALTER TABLE SET READ ONLY, or READ WRITE.
 Also, as Alvaro suggested, a read-only table affects not only table
 freezing but also performance optimizations. I'll consider including
 them when I deal with read-only tables.


The attached WIP patch adds a Frozen Map, which enables us to avoid
vacuuming the whole table even when a full scan would otherwise be
required to prevent XID wraparound failures.

The Frozen Map is a bitmap with one bit per heap page, quite similar to
the Visibility Map. A set bit means that all tuples on the heap page
are completely frozen, so we don't need to vacuum-freeze that page.
A bit is set when vacuum (or autovacuum) determines that all tuples on
the corresponding heap page are completely frozen, and a bit is cleared
when an INSERT or UPDATE (for UPDATE, only on the new heap page) is
executed.

The current patch adds a new source file,
src/backend/access/heap/frozenmap.c, which is quite similar to
visibilitymap.c. They contain similar code but are kept separate for
now; I will refactor them, e.g. into a common bitmap.c, if needed.
Also, when skipping vacuum using the visibility map we only skip runs
of at least SKIP_PAGES_THRESHOLD consecutive pages, but no such
mechanism exists yet for the frozen map.
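A toy model of those set/clear rules (one bit per heap page; set only by vacuum, cleared by writes) might look as follows; the names and fixed map size are invented for the sketch and are not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FM_PAGES 64                     /* toy relation size, in heap pages */
static uint8_t fm[FM_PAGES / 8];        /* one bit per heap page */

/* Vacuum sets the bit once it sees every tuple on the page is frozen. */
static void fm_set(uint32_t blk)   { fm[blk / 8] |= (uint8_t) (1 << (blk % 8)); }

/* INSERT/UPDATE clears the bit when a new, unfrozen tuple lands on the page. */
static void fm_clear(uint32_t blk) { fm[blk / 8] &= (uint8_t) ~(1 << (blk % 8)); }

/* A freeze scan may skip any page whose bit is still set. */
static int fm_test(uint32_t blk)   { return (fm[blk / 8] >> (blk % 8)) & 1; }
```

The point of the model is the lifecycle: only vacuum promotes a page to "frozen", while any write demotes it, so a set bit is always a safe reason to skip the page during a freeze scan.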

Please give me feedback.

Regards,

---
Sawada Masahiko
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index b83d496..53f07fd 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/heap
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o
+OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o frozenmap.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/frozenmap.c b/src/backend/access/heap/frozenmap.c
new file mode 100644
index 000..6e64cb8
--- /dev/null
+++ b/src/backend/access/heap/frozenmap.c
@@ -0,0 +1,567 @@
+/*-
+ *
+ * frozenmap.c
+ *	  bitmap for tracking frozen heap tuples
+ *
+ * Portions Copyright (c) 2015, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/heap/frozenmap.c
+ *
+ *-
+ */
+#include postgres.h
+
+#include access/frozenmap.h
+#include access/heapam_xlog.h
+#include access/xlog.h

Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-20 Thread Sawada Masahiko
On Thu, Apr 16, 2015 at 2:34 AM, David Steele da...@pgmasters.net wrote:
 On 4/15/15 11:30 AM, Sawada Masahiko wrote:
 On Wed, Apr 15, 2015 at 10:52 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
  I tested the v8 patch with the CURSOR case I mentioned before, and got
  a segmentation fault again.
  Here are the log messages from my environment:

 =# select test();
  LOG:  server process (PID 29730) was terminated by signal 11:
 Segmentation fault
 DETAIL:  Failed process was running: select test();
 LOG:  terminating any other active server processes
 WARNING:  terminating connection because of crash of another server process
 DETAIL:  The postmaster has commanded this server process to roll back
 the current transaction and exit, because another server process
 exited abnormally and possibly corrupted shared memory.
 HINT:  In a moment you should be able to reconnect to the database and
 repeat your command.
 FATAL:  the database system is in recovery mode

 I investigated this problem; here are my results.

 When we execute the test() function I mentioned, three stack items in
 total are stored in auditEventStack.
 Two of them are freed via stack_pop() -> stack_free() (i.e.,
 stack_free() is called by stack_pop()).
 The third is freed via PortalDrop() -> ... ->
 MemoryContextDeleteChildren() -> ... -> stack_free().
 It is freed when the pg_audit memory context is deleted, leaving the
 stack completely empty.

 But after all items are freed, finish_xact_command() can call
 PortalDrop(), so ExecutorEnd() can be called again.
 pg_audit has an ExecutorEnd hook, so postgres tries to pop the
 already-freed item: SEGV.

 In my environment, the following change fixes it.

 --- pg_audit.c.org2015-04-15 14:21:07.541866525 +0900
 +++ pg_audit.c2015-04-15 11:36:53.758877339 +0900
 @@ -1291,7 +1291,7 @@
  standard_ExecutorEnd(queryDesc);

  /* Pop the audit event off the stack */
 -if (!internalStatement)
 +if (!internalStatement  auditEventStack != NULL)
  {
  stack_pop(auditEventStack-stackId);
  }

 It might be a good idea to add an Assert() before calling stack_pop().

 I'm not sure this change is exactly correct, but I hope this
 information helps you.

 I appreciate you taking the time to look - this is the same conclusion I
 came to.  I also hardened another area that I thought might be vulnerable.

 I've seen this problem with explicit cursors before (and fixed it in
 another place a while ago).  The memory context that is current in
 ExecutorStart is freed before ExecutorEnd is called only in this case as
 far as I can tell.  I'm not sure this is very consistent behavior.

 I have attached patch v9 which fixes this issue as you suggested, but
 I'm not completely satisfied with it.  It seems like there could be an
 unintentional pop from the stack in a case of deeper nesting.  This
 might not be possible but it's hard to disprove.

 I'll think about it some more, but meanwhile this patch addresses the
 present issue.

Thank you for updating the patch.

One question regarding the v7 (and later) patches:
what is the difference between OBJECT logging and SESSION logging?

I used the v9 patch with pg_audit.log_relation = on, and got quite
similar but distinct log lines, as follows.

=# select * from hoge, bar where hoge.col = bar.col;
NOTICE:  AUDIT: OBJECT,8,1,READ,SELECT,TABLE,public.hoge,select *
from hoge, bar where hoge.col = bar.col;
NOTICE:  AUDIT: SESSION,8,1,READ,SELECT,TABLE,public.hoge,select *
from hoge, bar where hoge.col = bar.col;
NOTICE:  AUDIT: OBJECT,8,1,READ,SELECT,TABLE,public.bar,select * from
hoge, bar where hoge.col = bar.col;
NOTICE:  AUDIT: SESSION,8,1,READ,SELECT,TABLE,public.bar,select *
from hoge, bar where hoge.col = bar.col;

The behaviour of SESSION logging is now similar to OBJECT logging, and
per-session SESSION logging (i.e., pg_audit.log_relation = off) is also
similar to 'log_statement = all' (enhancing log_statement might be
enough).
So I don't understand the need for SESSION logging.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-15 Thread Sawada Masahiko
On Wed, Apr 15, 2015 at 10:52 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Wed, Apr 15, 2015 at 8:57 AM, David Steele da...@pgmasters.net wrote:
 On 4/14/15 7:13 PM, Tatsuo Ishii wrote:
 This patch does not apply cleanly due to the moving of pgbench (patch
 to filelist.sgml failed).

 Thank you for pointing that out!

 Ironic that it was the commit directly after the one I was testing with
 that broke the patch.  It appears the end of the last CF is a very bad
 time to be behind HEAD.

 Fixed in attached v8 patch.

 Thank you for updating the patch!

 I applied the patch and compiled it successfully without warnings.

 I tested the v8 patch with the CURSOR case I mentioned before, and got
 a segmentation fault again.
 Here are the log messages from my environment:

 =# select test();
  LOG:  server process (PID 29730) was terminated by signal 11:
 Segmentation fault
 DETAIL:  Failed process was running: select test();
 LOG:  terminating any other active server processes
 WARNING:  terminating connection because of crash of another server process
 DETAIL:  The postmaster has commanded this server process to roll back
 the current transaction and exit, because another server process
 exited abnormally and possibly corrupted shared memory.
 HINT:  In a moment you should be able to reconnect to the database and
 repeat your command.
 FATAL:  the database system is in recovery mode

 I hope these messages help you address this problem.
 I will also try to address this.

 Regards,

 ---
 Sawada Masahiko



 I will also try to address this.

I investigated this problem; here are my results.

When we execute the test() function I mentioned, three stack items in
total are stored in auditEventStack.
Two of them are freed via stack_pop() -> stack_free() (i.e.,
stack_free() is called by stack_pop()).
The third is freed via PortalDrop() -> ... ->
MemoryContextDeleteChildren() -> ... -> stack_free().
It is freed when the pg_audit memory context is deleted, leaving the
stack completely empty.

But after all items are freed, finish_xact_command() can call
PortalDrop(), so ExecutorEnd() can be called again.
pg_audit has an ExecutorEnd hook, so postgres tries to pop the
already-freed item: SEGV.

In my environment, the following change fixes it.

--- pg_audit.c.org2015-04-15 14:21:07.541866525 +0900
+++ pg_audit.c2015-04-15 11:36:53.758877339 +0900
@@ -1291,7 +1291,7 @@
 standard_ExecutorEnd(queryDesc);

 /* Pop the audit event off the stack */
-if (!internalStatement)
+if (!internalStatement  auditEventStack != NULL)
 {
 stack_pop(auditEventStack-stackId);
 }

It might be a good idea to add an Assert() before calling stack_pop().

I'm not sure this change is exactly correct, but I hope this
information helps you.
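The crash scenario and the proposed guard can be mimicked with a toy event stack: the hook fires once more after the stack has already been emptied, so pop must tolerate an empty stack. The names echo pg_audit's but are invented for this sketch:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct AuditEvent
{
	struct AuditEvent *next;
	int stackId;
} AuditEvent;

static AuditEvent *auditEventStack = NULL;

static void stack_push(int id)
{
	AuditEvent *e = malloc(sizeof(AuditEvent));
	e->stackId = id;
	e->next = auditEventStack;
	auditEventStack = e;
}

/* Returns 1 if an item was popped, 0 if the stack was already empty
 * (the ExecutorEnd-after-teardown case that caused the SEGV). */
static int stack_pop_guarded(void)
{
	AuditEvent *top;

	if (auditEventStack == NULL)
		return 0;
	top = auditEventStack;
	auditEventStack = top->next;
	free(top);
	return 1;
}
```

The NULL check plays the role of the `auditEventStack != NULL` test in the diff above; an Assert() there would instead make the unexpected re-entry fail loudly in debug builds.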

Regards,

---
Sawada Masahiko




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-14 Thread Sawada Masahiko
On Wed, Apr 15, 2015 at 8:57 AM, David Steele da...@pgmasters.net wrote:
 On 4/14/15 7:13 PM, Tatsuo Ishii wrote:
 This patch does not apply cleanly due to the moving of pgbench (patch
 to filelist.sgml failed).

 Thank you for pointing that out!

 Ironic that it was the commit directly after the one I was testing with
 that broke the patch.  It appears the end of the last CF is a very bad
 time to be behind HEAD.

 Fixed in attached v8 patch.

Thank you for updating the patch!

I applied the patch and compiled it successfully without warnings.

I tested the v8 patch with the CURSOR case I mentioned before, and got
a segmentation fault again.
Here are the log messages from my environment:

=# select test();
 LOG:  server process (PID 29730) was terminated by signal 11:
Segmentation fault
DETAIL:  Failed process was running: select test();
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
FATAL:  the database system is in recovery mode

I hope these messages help you address this problem.
I will also try to address this.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-09 Thread Sawada Masahiko
On Thu, Apr 9, 2015 at 1:14 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Apr 8, 2015 at 10:53 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Wed, Apr 8, 2015 at 1:09 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Apr 8, 2015 at 1:57 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Wed, Apr 8, 2015 at 1:11 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:

 On Tue, Apr 7, 2015 at 1:04 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

 On Tue, Apr 7, 2015 at 10:12 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
 
  On Tue, Apr 7, 2015 at 7:22 AM, Sawada Masahiko sawada.m...@gmail.com
  wrote:
 
  Thank you for your review.
  I modified the patch and attached the latest version (v7).
  Please have a look at it.
 
 
  Looks good to me. Attached patch (v8) just fix a tab indentation in
  gram.y.
 

 I had forgotten to fix a tab indentation, sorry.
 Thank you for reviewing!
 It looks good to me too.
 Can this patch be marked as Ready for Committer?


 +1


 Changed status to Ready for Committer.

 The patch adds new syntax like REINDEX ... WITH VERBOSE, i.e., () is not
 added after WITH clause. Did we reach the consensus about this syntax?
 The last email from Robert just makes me think that () should be added
 into the syntax.


 Thank you for taking time for this patch!

 I removed the FORCE option from REINDEX, so you would need to update the 
 patch.

Thanks.
I will update the patch accordingly.

 This was quite a complicated issue, since we already have many command
 styles.
 We can consider styles from various perspectives: kind of DDL, new or
 old command, maintenance command. And each command has its own story.
 I believe we reached consensus on this style at least once (please see
 the previous discussion), but we might need to discuss it more...

 Okay, another question: must WITH be required whenever the options
 are specified, or should it be abbreviatable?

In the previous discussion, the WITH clause was always required for the
VERBOSE option; I don't think we should make WITH abbreviatable for now.
Also, now that the FORCE option is removed, we could bring back the
idea of putting VERBOSE after the object name (INDEX, TABLE, etc.),
like CLUSTER does.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-08 Thread Sawada Masahiko
On Wed, Apr 8, 2015 at 1:09 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Apr 8, 2015 at 1:57 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Wed, Apr 8, 2015 at 1:11 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:

 On Tue, Apr 7, 2015 at 1:04 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

 On Tue, Apr 7, 2015 at 10:12 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
 
  On Tue, Apr 7, 2015 at 7:22 AM, Sawada Masahiko sawada.m...@gmail.com
  wrote:
 
  Thank you for your review.
  I modified the patch and attached the latest version (v7).
  Please have a look at it.
 
 
  Looks good to me. Attached patch (v8) just fix a tab indentation in
  gram.y.
 

 I had forgotten to fix a tab indentation, sorry.
 Thank you for reviewing!
 It looks good to me too.
 Can this patch be marked as Ready for Committer?


 +1


 Changed status to Ready for Committer.

 The patch adds new syntax like REINDEX ... WITH VERBOSE, i.e., () is not
 added after WITH clause. Did we reach the consensus about this syntax?
 The last email from Robert just makes me think that () should be added
 into the syntax.


Thank you for taking time for this patch!

This was quite a complicated issue, since we already have many command
styles.
We can consider styles from various perspectives: kind of DDL, new or
old command, maintenance command. And each command has its own story.
I believe we reached consensus on this style at least once (please see
the previous discussion), but we might need to discuss it more...

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-07 Thread Sawada Masahiko
On Tue, Apr 7, 2015 at 9:32 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:

 On Mon, Apr 6, 2015 at 10:21 AM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

 On Fri, Mar 13, 2015 at 5:10 PM, Kyotaro HORIGUCHI
 horiguchi.kyot...@lab.ntt.co.jp wrote:
  Hello, I have some trivial comments about the latest patch.
 
  At Thu, 12 Mar 2015 21:15:14 +0900, Sawada Masahiko
  sawada.m...@gmail.com wrote in
  CAD21AoBxPCpPvKQmvJMUh+p=2pfau03gkjq2r2zy47xhsh2...@mail.gmail.com
  sawada.mshk On Thu, Mar 12, 2015 at 6:36 AM, Jim Nasby
  jim.na...@bluetreble.com wrote:
   Are the parenthesis necessary? No other WITH option requires them,
other
   than create table/matview (COPY doesn't actually require them).
   
  
   I was imagining EXPLAIN syntax.
   Is there some possibility of supporting multiple options for REINDEX
   command in future?
   If there is, syntax will be as follows, REINDEX { INDEX | ... } name
   WITH VERBOSE, XXX, XXX;
   I thought style with parenthesis is better than above style.
  
  
   The thing is, ()s are actually an odd-duck. Very little supports it,
   and
   while COPY allows it they're not required. EXPLAIN is a different
   story,
   because that's not WITH; we're actually using () *instead of* WITH.
  
   So because almost all commands that use WITH doen't even accept (), I
   don't
   think this should either. It certainly shouldn't require them,
   because
   unlike EXPLAIN, there's no need to require them.
  
 
  I understood what your point is.
  Attached patch is changed syntax, it does not have parenthesis.
 
  As I looked into the code to find what the syntax would be, I
  found some points which would be better to be fixed.
 
  In gram.y the options is a list of cstrings, but it is not necessary
  to be a list because there's only one kind of option now.
 
  If you prefer it to be a list, I have a comment for the way to
  make string list in gram.y. You stored bare cstring in the
  options list but I think it is not the preferable form. I suppose
  the followings are preferable. Corresponding fixes are needed in
  ReindexTable, ReindexIndex, ReindexMultipleTables.
 
  $$ = list_make1(makeString($1));
  ...
  $$ = lappend($1, makeString($3));
 
 
  In equalfuncs.c, _equalReindexStmt forgets to compare the member
  options. _copyReindexStmt also forgets to copy it. The way you
  constructed the options list prevents them from doing their jobs
  using prepared methods. Comparing and copying the member option
  is needed even if it becomes a simple string.
 

 I revised the patch, and changed gram.y so it doesn't use the list.
 So this patch adds the new syntax:
 REINDEX { INDEX | ... } name WITH VERBOSE;

 Also documentation is updated.
 Please give me feedbacks.


 Some notes:

 1) There is a trailing space in src/backend/parser/gram.y:

 -   | REINDEX DATABASE name opt_force
 +   | REINDEX reindex_target_multitable name WITH opt_verbose
 {
 ReindexStmt *n = makeNode(ReindexStmt);
 -   n-kind = REINDEX_OBJECT_DATABASE;
 +   n-kind = $2;
 n-name = $3;
 n-relation = NULL;
 +   n-verbose = $5;
 $$ = (Node *)n;
 }
 ;


 2) The documentation was updated and matches the behaviour.


 3) psql autocomplete is ok.


 4) Lack of regression tests. I think you should add a regression test like
 this:

 fabrizio=# \set VERBOSITY terse
 fabrizio=# create table reindex_verbose(id integer primary key);
 CREATE TABLE
 fabrizio=# reindex table reindex_verbose with verbose;
 INFO:  index reindex_verbose_pkey was reindexed.
 REINDEX


 5) Code style and organization is ok


 6) You should add the new field ReindexStmt-verbose to
 src/backend/nodes/copyfuncs.c


 Regards,



Thank you for your review.
I modified the patch and attached the latest version (v7).
Please have a look at it.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 0a4c7d4..27be1a4 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  refsynopsisdiv
 synopsis
 REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } replaceable class=PARAMETERname/replaceable [ FORCE ]
+REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } replaceable class=PARAMETERname/replaceable WITH VERBOSE
 /synopsis
  /refsynopsisdiv
 
@@ -159,6 +160,15 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } replaceable class=PARAM
  /para
 /listitem
/varlistentry
+
+   varlistentry
+termliteralVERBOSE/literal/term
+listitem
+ para
+  Prints a progress report as each index is reindexed.
+ /para
+/listitem
+   /varlistentry
   /variablelist
  /refsect1
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 351dcb2..fc44495 100644
--- a/src/backend/catalog/index.c

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-07 Thread Sawada Masahiko
On Wed, Apr 8, 2015 at 1:11 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:

 On Tue, Apr 7, 2015 at 1:04 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

 On Tue, Apr 7, 2015 at 10:12 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
 
  On Tue, Apr 7, 2015 at 7:22 AM, Sawada Masahiko sawada.m...@gmail.com
  wrote:
 
  Thank you for your review.
  I modified the patch and attached the latest version (v7).
  Please have a look at it.
 
 
  Looks good to me. Attached patch (v8) just fix a tab indentation in
  gram.y.
 

 I had forgotten to fix a tab indentation, sorry.
 Thank you for reviewing!
 It looks good to me too.
 Can this patch be marked as Ready for Committer?


 +1


Changed status to Ready for Committer.
Thank you for final reviewing!

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-07 Thread Sawada Masahiko
On Tue, Apr 7, 2015 at 10:12 PM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:

 On Tue, Apr 7, 2015 at 7:22 AM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

 Thank you for your review.
 I modified the patch and attached the latest version (v7).
 Please have a look at it.


 Looks good to me. Attached patch (v8) just fix a tab indentation in gram.y.


I had forgotten to fix a tab indentation, sorry.
Thank you for reviewing!
It looks good to me too.
Can this patch be marked as Ready for Committer?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Sawada Masahiko
On Tue, Apr 7, 2015 at 7:53 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 4/6/15 5:18 PM, Greg Stark wrote:

 Only I would suggest thinking of it in terms of two orthogonal boolean
 flags rather than three states. It's easier to reason about whether a
 table has a specific property than trying to control a state machine in
 a predefined pathway.

 So I would say the two flags are:
 READONLY: guarantees nothing can be dirtied
 ALLFROZEN: guarantees no unfrozen tuples are present

 In practice you can't have the latter without the former since vacuum
 can't know everything is frozen unless it knows nobody is inserting. But
 perhaps there will be cases in the future where that's not true.


 I'm not so sure about that. There's a logical state progression here (see
 below). ISTM it's easier to just enforce that in one place instead of a
 bunch of places having to check multiple conditions. But, I'm not wed to a
 single field.

 Incidentally there are a number of other optimisations that I've had in
 mind that are only possible on frozen read-only tables:

 1) Compression: compress the pages and pack them one after the other.
 Build a new fork with offsets for each page.

 2) Automatic partition elimination where the statistics track the
 minimum and maximum value per partition (and number of tuples) and treat
 them as implicit constraints. In particular it would magically make read
 only empty parent partitions be excluded regardless of the where clause.


 AFAICT neither of those actually requires ALLFROZEN, no? You'll need to
 uncompact and re-compact for #1 when you actually freeze (which maybe isn't
 worth it), but freezing isn't absolutely required. #2 would only require
 that everything in the relation is visible; not frozen.

 I think there's value here to having an ALLVISIBLE state as well as
 ALLFROZEN.


Based on the suggestions, I'm going to deal with the FM (freeze map)
first, as one patch. It would be a simple mechanism, similar to the VM.
- Each bit of the FM represents a single page
- A bit is set only by vacuum
- A bit is unset by inserts, updates, and deletes

Second, I'll deal with a simple read-only table with 2 states,
Read/Write (default) and ReadOnly, as one patch. ISTM that having the
Frozen state needs more discussion. A read-only table just allows us
to disable any updates to the table, controlled by a read-only flag in
pg_class. The DDL commands that change this status would be ALTER
TABLE SET READ ONLY and SET READ WRITE.
Also, as Alvaro suggested, a read-only table is useful not only for
freezing but also for performance optimization. I'll consider
including that when I deal with the read-only table.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-06 Thread Sawada Masahiko
On Fri, Apr 3, 2015 at 10:01 PM, David Steele da...@pgmasters.net wrote:
 On 4/3/15 3:59 AM, Sawada Masahiko wrote:
 On Thu, Apr 2, 2015 at 2:46 AM, David Steele da...@pgmasters.net wrote:
 Let me know if you see any other issues.


 I pulled HEAD, and then tried to compile the source code after applying the
 following deparsing utility command patch, without #0001 and #0002
 (because these two patches are already pushed):
 http://www.postgresql.org/message-id/20150325175954.gl3...@alvh.no-ip.org

 But I could not use the new pg_audit patch due to a compile error (which the
 deparsing utility command patch might have).
 I think I'm missing something, but I'm not sure.
 Could you tell me which patch I should apply before compiling pg_audit?

 When Alvaro posted his last patch set he recommended applying them to
 b3196e65:

 http://www.postgresql.org/message-id/20150325175954.gl3...@alvh.no-ip.org

 Applying the 3/25 deparse patches onto b3196e65 (you'll see one trailing
 space error) then applying pg_audit-v6.patch works fine.


Thank you for your advice!
I'm testing pg_audit, so I will send a report to you as soon as possible.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Sawada Masahiko
On Sun, Apr 5, 2015 at 8:21 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Sat, Apr 4, 2015 at 3:10 PM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 4/3/15 12:59 AM, Sawada Masahiko wrote:

 +   case HEAPTUPLE_LIVE:
 +   case HEAPTUPLE_RECENTLY_DEAD:
 +   case HEAPTUPLE_INSERT_IN_PROGRESS:
 +   case HEAPTUPLE_DELETE_IN_PROGRESS:
 +   if
 (heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
 +
 mxactcutoff, &frozen[nfrozen]))
 +   frozen[nfrozen++].offset
 = offnum;
 +   break;


 This doesn't seem safe enough to me. Can't there be tuples that are still
 new enough that they can't be frozen, and are still live?


 Yep.  I've set a table to read only while it contained unfreezable tuples,
 and the tuples remain unfrozen yet the read-only action claims to have
 succeeded.



 Somewhat related... instead of forcing the freeze to happen synchronously,
 can't we set this up so a table is in one of three states? Read/Write, Read
 Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to the
 appropriate state, and all the vacuum infrastructure would continue to
 process those tables as it does today. lazy_vacuum_rel would become
 responsible for tracking if there were any non-frozen tuples if it was also
 attempting a freeze. If it discovered there were none, AND the table was
 marked as ReadOnly, then it would change the table state to Frozen and set
 relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId.
 AT_SetReadWrite could change relfrozenxid to its own Xid as an
 optimization. Doing it that way leaves all the complicated vacuum code in
 one place, and would eliminate concerns about race conditions with still
 running transactions, etc.


 +1 here as well.  I might want to set tables to read only for reasons other
 than to avoid repeated freezing.  (After all, the name of the command
 suggests it is a general purpose thing) and wouldn't want to automatically
 trigger a vacuum freeze in the process.


Thank you for comments.

 Somewhat related... instead of forcing the freeze to happen synchronously, 
 can't we set this up so a table is in one of three states? Read/Write, Read 
 Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to  the 
 appropriate state, and all the vacuum infrastructure would continue to 
 process those tables as it does today. lazy_vacuum_rel would become 
 responsible for tracking if there were any non-frozen tuples if it was also 
 attempting  a freeze. If it discovered there were none, AND the table was 
 marked as ReadOnly, then it would change the table state to Frozen and set 
 relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId. 
 AT_SetReadWrite  could change relfrozenxid to its own Xid as an 
 optimization. Doing it that way leaves all the complicated vacuum code in one 
 place, and would eliminate concerns about race conditions with still running 
 transactions, etc.

I agree with the 3 states: Read/Write, ReadOnly, and Frozen.
But I'm not sure when we should freeze the tuples, i.e., scan the whole table.
I think that any changes to the table are completely
ignored/restricted if the table is marked as ReadOnly,
and marking it ReadOnly is not itself accompanied by freezing tuples.
A Frozen table ensures that all tuples of the table have been completely
frozen, so it also needs a whole-table scan.
That is, we would need to scan the whole table twice, right?

 +1 here as well.  I might want to set tables to read only for reasons other 
 than to avoid repeated freezing.  (After all, the name of the command 
 suggests it is a general purpose thing) and wouldn't want to automatically 
 trigger a
 vacuum freeze in the process.

 There is another possibility here, too. We can completely divorce a ReadOnly 
 mode (which I think is useful for other things besides freezing) from the 
 question of whether we need to force-freeze a relation if we create a
 FrozenMap, similar to the visibility map. This has the added advantage of 
 helping freeze scans on relations that are not ReadOnly in the case of tables 
 that are insert-mostly or any other pattern where most pages stay all-frozen.
 Prior to the visibility map this would have been a rather daunting project, 
 but I believe this could piggyback on the VM code rather nicely. Anytime you 
 clear the VM you clearly must clear the FrozenMap as well. The logic for
 setting the FM is clearly different, but that would be entirely 
 self-contained to vacuum. Unlike the VM, I don't see any point to marking 
 special bits in the page itself for FM.

I was actually thinking of this idea (an FM) to avoid freezing all tuples.
As you said, it might not be a good idea (or might be overkill) if the
reason for setting a table to read only is to avoid repeated freezing.
I'm

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-06 Thread Sawada Masahiko
On Fri, Mar 13, 2015 at 5:10 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Hello, I have some trivial comments about the latest patch.

 At Thu, 12 Mar 2015 21:15:14 +0900, Sawada Masahiko sawada.m...@gmail.com 
 wrote in CAD21AoBxPCpPvKQmvJMUh+p=2pfau03gkjq2r2zy47xhsh2...@mail.gmail.com
 sawada.mshk On Thu, Mar 12, 2015 at 6:36 AM, Jim Nasby 
 jim.na...@bluetreble.com wrote:
  Are the parenthesis necessary? No other WITH option requires them, other
  than create table/matview (COPY doesn't actually require them).
  
 
  I was imagining EXPLAIN syntax.
  Is there some possibility of supporting multiple options for REINDEX
  command in future?
  If there is, syntax will be as follows, REINDEX { INDEX | ... } name
  WITH VERBOSE, XXX, XXX;
  I thought the style with parentheses was better than the style above.
 
 
  The thing is, ()s are actually an odd-duck. Very little supports it, and
  while COPY allows it they're not required. EXPLAIN is a different story,
  because that's not WITH; we're actually using () *instead of* WITH.
 
  So because almost all commands that use WITH don't even accept (), I don't
  think this should either. It certainly shouldn't require them, because
  unlike EXPLAIN, there's no need to require them.
 

 I understood your point.
 The attached patch changes the syntax; it does not have parentheses.

 As I looked into the code to find what the syntax would be, I
 found some points which would be better to be fixed.

 In gram.y the options are a list of cstrings, but they need not
 be a list because there's only one kind of option now.

 If you prefer it to be a list, I have a comment for the way to
 make string list in gram.y. You stored bare cstring in the
 options list but I think it is not the preferable form. I suppose
 the followings are preferable. Corresponding fixes are needed in
 ReindexTable, ReindexIndex, ReindexMultipleTables.

 $$ = list_make1(makeString($1));
  
 $$ = lappend($1, makeString($3));


 In equalfuncs.c, _equalReindexStmt forgets to compare the member
 options. _copyReindexStmt also forgets to copy it. The way you
 constructed the options list prevents them from doing their jobs
 using prepared methods. Comparing and copying the member option
 is needed even if it becomes a simple string.


I revised the patch and changed gram.y so that I don't use the list.
So this patch adds the new syntax:
REINDEX { INDEX | ... } name WITH VERBOSE;

The documentation is also updated.
Please give me feedback.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 0a4c7d4..27be1a4 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  <refsynopsisdiv>
 <synopsis>
 REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> WITH VERBOSE
 </synopsis>
  </refsynopsisdiv>
 
@@ -159,6 +160,15 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAM
     </para>
    </listitem>
   </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report as each index is reindexed.
+     </para>
+    </listitem>
+   </varlistentry>
  </variablelist>
 </refsect1>
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 351dcb2..fc44495 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -63,6 +63,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/pg_rusage.h"
 #include "utils/syscache.h"
 #include "utils/tuplesort.h"
 #include "utils/snapmgr.h"
@@ -3133,13 +3134,18 @@ IndexGetRelation(Oid indexId, bool missing_ok)
  * reindex_index - This routine is used to recreate a single index
  */
 void
-reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
+reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
+bool verbose)
 {
 	Relation	iRel,
 heapRelation;
 	Oid			heapId;
 	IndexInfo  *indexInfo;
 	volatile bool skipped_constraint = false;
+	int			elevel = verbose ? INFO : DEBUG2;
+	PGRUsage	ru0;
+
+	pg_rusage_init(ru0);
 
 	/*
 	 * Open and lock the parent heap relation.  ShareLock is sufficient since
@@ -3283,6 +3289,13 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
 		heap_close(pg_index, RowExclusiveLock);
 	}
 
+	/* Log what we did */
+	ereport(elevel,
+			(errmsg("index \"%s\" was reindexed.",
+					get_rel_name(indexId)),
+			 errdetail("%s.",
+					   pg_rusage_show(&ru0))));
+
 	/* Close rels, but keep locks */
 	index_close(iRel, NoLock);
 	heap_close(heapRelation, NoLock);
@@ -3324,7 +3337,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
  * index rebuild.
  */
 bool
-reindex_relation(Oid relid, int flags)
+reindex_relation(Oid

Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-06 Thread Sawada Masahiko
On Fri, Apr 3, 2015 at 10:01 PM, David Steele da...@pgmasters.net wrote:
 On 4/3/15 3:59 AM, Sawada Masahiko wrote:
 On Thu, Apr 2, 2015 at 2:46 AM, David Steele da...@pgmasters.net wrote:
 Let me know if you see any other issues.


 I pulled HEAD, and then tried to compile the source code after applying the
 following deparsing utility command patch, without #0001 and #0002
 (because these two patches are already pushed):
 http://www.postgresql.org/message-id/20150325175954.gl3...@alvh.no-ip.org

 But I could not use the new pg_audit patch due to a compile error (which the
 deparsing utility command patch might have).
 I think I'm missing something, but I'm not sure.
 Could you tell me which patch I should apply before compiling pg_audit?

 When Alvaro posted his last patch set he recommended applying them to
 b3196e65:

 http://www.postgresql.org/message-id/20150325175954.gl3...@alvh.no-ip.org

 Applying the 3/25 deparse patches onto b3196e65 (you'll see one trailing
 space error) then applying pg_audit-v6.patch works fine.


I tested the v6 patch, but I got a SEGV when I executed the test() function
I mentioned upthread.
Could you address this problem?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Sawada Masahiko
On Mon, Apr 6, 2015 at 10:17 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 4/6/15 1:46 AM, Sawada Masahiko wrote:

 On Sun, Apr 5, 2015 at 8:21 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Sat, Apr 4, 2015 at 3:10 PM, Jim Nasby jim.na...@bluetreble.com
 wrote:


 On 4/3/15 12:59 AM, Sawada Masahiko wrote:


 +   case HEAPTUPLE_LIVE:
 +   case HEAPTUPLE_RECENTLY_DEAD:
 +   case HEAPTUPLE_INSERT_IN_PROGRESS:
 +   case HEAPTUPLE_DELETE_IN_PROGRESS:
 +   if
 (heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
 +
 mxactcutoff, &frozen[nfrozen]))
 +
 frozen[nfrozen++].offset
 = offnum;
 +   break;



 This doesn't seem safe enough to me. Can't there be tuples that are
 still
 new enough that they can't be frozen, and are still live?



 Yep.  I've set a table to read only while it contained unfreezable
 tuples,
 and the tuples remain unfrozen yet the read-only action claims to have
 succeeded.



 Somewhat related... instead of forcing the freeze to happen
 synchronously,
 can't we set this up so a table is in one of three states? Read/Write,
 Read
 Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to
 the
 appropriate state, and all the vacuum infrastructure would continue to
 process those tables as it does today. lazy_vacuum_rel would become
 responsible for tracking if there were any non-frozen tuples if it was
 also
 attempting a freeze. If it discovered there were none, AND the table was
 marked as ReadOnly, then it would change the table state to Frozen and
 set
 relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId.
 AT_SetReadWrite could change relfrozenxid to its own Xid as an
 optimization. Doing it that way leaves all the complicated vacuum code
 in
 one place, and would eliminate concerns about race conditions with still
 running transactions, etc.



 +1 here as well.  I might want to set tables to read only for reasons
 other
 than to avoid repeated freezing.  (After all, the name of the command
 suggests it is a general purpose thing) and wouldn't want to
 automatically
 trigger a vacuum freeze in the process.


 Thank you for comments.

 Somewhat related... instead of forcing the freeze to happen
 synchronously, can't we set this up so a table is in one of three states?
 Read/Write, Read Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would
 simply change to  the appropriate state, and all the vacuum infrastructure
 would continue to process those tables as it does today. lazy_vacuum_rel
 would become responsible for tracking if there were any non-frozen tuples if
 it was also attempting  a freeze. If it discovered there were none, AND the
 table was marked as ReadOnly, then it would change the table state to Frozen
 and set relfrozenxid = InvalidTransactionId and relminxid =
 InvalidMultiXactId. AT_SetReadWrite  could change relfrozenxid to its own
 Xid as an optimization. Doing it that way leaves all the complicated vacuum
 code in one place, and would eliminate concerns about race conditions with
 still running transactions, etc.


 I agree with the 3 states: Read/Write, ReadOnly, and Frozen.
 But I'm not sure when we should freeze the tuples, i.e., scan the whole
 table.
 I think that any changes to the table are completely
 ignored/restricted if the table is marked as ReadOnly,
 and marking it ReadOnly is not itself accompanied by freezing tuples.
 A Frozen table ensures that all tuples of the table have been completely
 frozen, so it also needs a whole-table scan.
 That is, we would need to scan the whole table twice, right?


 No. You would be free to set a table as ReadOnly any time you wanted,
 without scanning anything. All that setting does is disable any DML on the
 table.

 The Frozen state would only be set by the vacuum code, IFF:
 - The table state is ReadOnly *at the start of vacuum* and did not change
 during vacuum
 - Vacuum ensured that there were no un-frozen tuples in the table

 That does not necessitate 2 scans.


I understood this concept, and have a question, as I wrote below.

 +1 here as well.  I might want to set tables to read only for reasons
 other than to avoid repeated freezing.  (After all, the name of the command
 suggests it is a general purpose thing) and wouldn't want to automatically
 trigger a
 vacuum freeze in the process.

 There is another possibility here, too. We can completely divorce a
 ReadOnly mode (which I think is useful for other things besides freezing)
 from the question of whether we need to force-freeze a relation if we create
 a
 FrozenMap, similar to the visibility map. This has the added advantage of
 helping freeze scans on relations that are not ReadOnly in the case of
 tables that are insert-mostly or any other pattern where most pages stay
 all-frozen.
 Prior to the visibility map this would have been a rather

Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-04 Thread Sawada Masahiko
On Wed, Mar 11, 2015 at 11:46 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Tue, Mar 10, 2015 at 12:08 PM, Stephen Frost sfr...@snowman.net wrote:
 Sawada,

 * Sawada Masahiko (sawada.m...@gmail.com) wrote:
 Thank you for your review!
 Attached file is the latest version (without document patch. I making it 
 now.)
 As per discussion, there is no change regarding of super user permission.

 Ok.  Here's another review.


 Thank you for your review!

 diff --git a/src/backend/catalog/system_views.sql 
 b/src/backend/catalog/system_views.sql
 index 5e69e2b..94ee229 100644
 --- a/src/backend/catalog/system_views.sql
 +++ b/src/backend/catalog/system_views.sql
 @@ -414,6 +414,11 @@ CREATE RULE pg_settings_n AS

  GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;

 +CREATE VIEW pg_file_settings AS
 +   SELECT * FROM pg_show_all_file_settings() AS A;
 +
 +REVOKE ALL on pg_file_settings FROM public;
 +

 This suffers from a lack of pg_dump support, presuming that the
 superuser is entitled to change the permissions on this function.  As it
 turns out though, you're in luck in that regard since I'm working on
 fixing that for a mostly unrelated patch.  Any feedback you have on what
 I'm doing to pg_dump here:

 http://www.postgresql.org/message-id/20150307213935.go29...@tamriel.snowman.net

 Would certainly be appreciated.


 Thank you for the info.
 I will read the discussion and send the feedbacks.

 diff --git a/src/backend/utils/misc/guc-file.l 
 b/src/backend/utils/misc/guc-file.l
 [...]
 +  * Calculate the size of guc_array and allocate it. From the second time
 +  * we allocate memory, we should free the old allocated memory.
 +  */
 + guc_array_size = num_guc_file_variables * sizeof(struct 
 ConfigFileVariable);
 + if (!guc_file_variables)
 + {
 + /* For the first call */
 + guc_file_variables = (ConfigFileVariable *) guc_malloc(FATAL, 
 guc_array_size);
 + }
 + else
 + {
 + guc_array = guc_file_variables;
 + for (item = head; item; item = item->next, guc_array++)
 + {
 + free(guc_array->name);
 + free(guc_array->value);
 + free(guc_array->filename);

 It's a minor nit-pick, as the below loop should handle anything we care
 about, but I'd go ahead and reset guc_array->sourceline to be -1 or
 something too, just to show that everything's being handled here with
 regard to cleanup.  Or perhaps just add a comment saying that the
 sourceline for all currently valid entries will be updated.

 Fixed.


 + guc_file_variables = (ConfigFileVariable *) 
 guc_realloc(FATAL, guc_file_variables, guc_array_size);
 + }

 Nasby made a comment, I believe, that we might actually be able to use
 memory contexts here, which would certainly be much nicer as then you'd
 be able to just throw away the whole context and create a new one.  Have
 you looked at that approach at all?  Offhand, I'm not sure if it'd work
 or not (I have my doubts..) but it seems worthwhile to consider.


 I successfully used palloc() and pfree() to allocate and free
 guc_file_variables,
 but these variables seem to have already been freed when I get the values
 of the pg_file_settings view via SQL.

 Otherwise, it seems like this would address my previous concerns that
 this would end up leaking memory, and even if it's a bit slower than one
 might hope, it's not an operation we should be doing very frequently.

 --- a/src/backend/utils/misc/guc.c
 +++ b/src/backend/utils/misc/guc.c
 [...]
  /*
 + * Return the total number of GUC File variables
 + */
 +int
 +GetNumConfigFileOptions(void)
 +{
 + return num_guc_file_variables;
 +}

 What's the point of this function..?  I'm not convinced it's the best
 idea, but it certainly looks like the function and the variable are only
 used from this file anyway?

 It's unnecessary.
 Fixed.

 + if (call_cntr  max_calls)
 + {
 + char   *values[NUM_PG_FILE_SETTINGS_ATTS];
 + HeapTuple   tuple;
 + Datum   result;
 + ConfigFileVariable conf;
 + charbuffer[256];

 Isn't that buffer a bit.. excessive in size?

 Fixed.


 diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
 index d3100d1..ebb96cc 100644
 --- a/src/include/utils/guc.h
 +++ b/src/include/utils/guc.h
 @@ -133,6 +133,14 @@ typedef struct ConfigVariable
   struct ConfigVariable *next;
  } ConfigVariable;

 +typedef struct ConfigFileVariable
 +{
 + char*name;
 + char*value;
 + char*filename;
 + int sourceline;
 +} ConfigFileVariable;
 +
 [...]
 +extern int   GetNumConfigFileOptions(void);

 These aren't actually used anywhere else, are they?  Not sure that
 there's any need to expose them beyond guc.c when that's the only place
 they're used.


 Fixed.

 This will also need a
 REVOKE EXECUTE on pg_show_all_file_settings() FROM public;

 Added.


I added

Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-03 Thread Sawada Masahiko
On Thu, Apr 2, 2015 at 2:46 AM, David Steele da...@pgmasters.net wrote:
 Hi Sawada,

 On 3/25/15 9:24 AM, David Steele wrote:
 On 3/25/15 7:46 AM, Sawada Masahiko wrote:
 2.
 I got ERROR when executing function uses cursor.

 1) create empty table (hoge table)
 2) create test function as follows.

 create function test() returns int as $$
 declare
 cur1 cursor for select * from hoge;
 tmp int;
 begin
 open cur1;
 fetch cur1 into tmp;
return tmp;
 end$$
 language plpgsql ;

 3) execute test function (got ERROR)
 =# select test();
 LOG:  AUDIT: SESSION,6,1,READ,SELECT,,,selecT test();
 LOG:  AUDIT: SESSION,6,2,FUNCTION,EXECUTE,FUNCTION,public.test,selecT 
 test();
 LOG:  AUDIT: SESSION,6,3,READ,SELECT,,,select * from hoge
 CONTEXT:  PL/pgSQL function test() line 6 at OPEN
 ERROR:  pg_audit stack is already empty
 STATEMENT:  selecT test();

 It seems that the item in the stack has already been freed by deletion of
 the pg_audit memory context (in MemoryContextDelete()),
 before stack_pop is called when the top-level Portal is dropped.

 This has been fixed and I have attached a new patch.  I've seen this
 with cursors before where the parent MemoryContext is freed before
 control is returned to ProcessUtility.  I think that's strange behavior
 but there's not a lot I can do about it.


Thank you for updating the patch!

 The code I put in to deal with this situation was not quite robust
 enough so I had to harden it a bit more.

 Let me know if you see any other issues.


I pulled HEAD, and then tried to compile the source code after applying the
following deparsing utility command patch, without #0001 and #0002
(because these two patches are already pushed):
http://www.postgresql.org/message-id/20150325175954.gl3...@alvh.no-ip.org

But I could not use the new pg_audit patch due to a compile error (which the
deparsing utility command patch might have).
I think I'm missing something, but I'm not sure.
Could you tell me which patch I should apply before compiling pg_audit?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-03-25 Thread Sawada Masahiko
On Wed, Mar 25, 2015 at 12:23 PM, David Steele da...@pgmasters.net wrote:
 On Wed, Mar 25, 2015 at 12:38 AM, David Steele da...@pgmasters.net wrote:
 2. OBJECT auditing does not work before ACL info is added to
 pg_class.relacl.
 In following situation, pg_audit can not audit OBJECT log.
 $ cat postgresql.conf | grep audit
 shared_preload_libraries = 'pg_audit'
 pg_audit.role = 'hoge_user'
 pg_audit.log = 'read, write'
 $ psql -d postgres -U hoge_user
 =# create table hoge(col int);
 =# select * from hoge;
 LOG:  AUDIT: SESSION,3,1,READ,SELECT,,,select * from hoge;

 The OBJECT audit log is not written here since pg_class.relacl is still
 empty. (Only the SESSION log is written.)
 So after creating another unrelated role and granting any privilege to that
 role,
 the OBJECT audit log is written successfully.

 Yes, object auditing does not work until some grants have been made to
 the audit role.

 =# create role bar_user;
 =# grant select on hoge to bar_user;
 =# select * from hoge;
 LOG:  AUDIT: SESSION,4,1,READ,SELECT,,,select * from hoge;
 LOG:  AUDIT: OBJECT,4,1,READ,SELECT,TABLE,public.hoge,select * from hoge;

 Both the OBJECT and SESSION logs are logged.

 Looks right to me.  If you don't want the session logging then disable
 pg_audit.log.

 Session and object logging are completely independent from each other:
 one or the other, or both, or neither can be enabled at any time.

 It means that the OBJECT log is not written just after creating a table,
 even if that table is touched by its owner.
 To write the OBJECT log, we need to grant a privilege to the audit role at least, right?

 Exactly.  Privileges must be granted to the audit role in order for
 object auditing to work.


The table owner can always touch its own table.
So does that mean the table owner can read the table's contents while
evading OBJECT logging?

I also looked into the latest patch again.
Here are two review comments.

1.
 typedef struct
 {
int64 statementId;
   int64 substatementId;
Both statementId and substatementId could be negative numbers.
I think these should be uint64 instead.

2.
I got ERROR when executing function uses cursor.

1) create empty table (hoge table)
2) create test function as follows.

create function test() returns int as $$
declare
cur1 cursor for select * from hoge;
tmp int;
begin
open cur1;
fetch cur1 into tmp;
   return tmp;
end$$
language plpgsql ;

3) execute test function (got ERROR)
=# select test();
LOG:  AUDIT: SESSION,6,1,READ,SELECT,,,selecT test();
LOG:  AUDIT: SESSION,6,2,FUNCTION,EXECUTE,FUNCTION,public.test,selecT test();
LOG:  AUDIT: SESSION,6,3,READ,SELECT,,,select * from hoge
CONTEXT:  PL/pgSQL function test() line 6 at OPEN
ERROR:  pg_audit stack is already empty
STATEMENT:  selecT test();

It seems that the item in the stack has already been freed by deletion of
the pg_audit memory context (in MemoryContextDelete()),
before stack_pop is called when the top-level Portal is dropped.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-03-24 Thread Sawada Masahiko
On Tue, Mar 24, 2015 at 3:17 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Sawada Masahiko wrote:

 I tried to look into the latest patch, but got the following error.

 masahiko [pg_audit] $ LANG=C make
 gcc -Wall -Wmissing-prototypes -Wpointer-arith
 -Wdeclaration-after-statement -Wendif-labels
 -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
 -fwrapv -g -fpic -I. -I. -I../../src/include -D_GNU_SOURCE   -c -o
 pg_audit.o pg_audit.c
 pg_audit.c: In function 'log_audit_event':
 pg_audit.c:456: warning: ISO C90 forbids mixed declarations and code
 pg_audit.c: In function 'pg_audit_ddl_command_end':
 pg_audit.c:1436: error: 'pg_event_trigger_expand_command' undeclared
 (first use in this function)

 You need to apply my deparsing patch first, last version of which I
 posted here:
 https://www.postgresql.org/message-id/20150316234406.gh3...@alvh.no-ip.org


Thank you for the info.
I've applied these patches successfully.

I looked into this module and have a few comments, as follows.
1. pg_audit currently audits only one role.
In the current code, we cannot specify multiple roles as auditing users. Why?
(Sorry if this topic has already been discussed.)

2. OBJECT auditing does not work before ACL info is added to pg_class.relacl.
In following situation, pg_audit can not audit OBJECT log.
$ cat postgresql.conf | grep audit
shared_preload_libraries = 'pg_audit'
pg_audit.role = 'hoge_user'
pg_audit.log = 'read, write'
$ psql -d postgres -U hoge_user
=# create table hoge(col int);
=# select * from hoge;
LOG:  AUDIT: SESSION,3,1,READ,SELECT,,,select * from hoge;

The OBJECT audit log is not written here since pg_class.relacl is still
empty. (Only the SESSION log is written.)
So after creating another unrelated role and granting any privilege to that role,
the OBJECT audit log is written successfully.

=# create role bar_user;
=# grant select on hoge to bar_user;
=# select * from hoge;
LOG:  AUDIT: SESSION,4,1,READ,SELECT,,,select * from hoge;
LOG:  AUDIT: OBJECT,4,1,READ,SELECT,TABLE,public.hoge,select * from hoge;

Both the OBJECT and SESSION logs are logged.

3. pg_audit writes the OBJECT log even for the EXPLAIN command.
The EXPLAIN command does not actually touch the table, but pg_audit writes
an OBJECT audit log entry.
I'm not sure we need to log it. Is it intentional?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-03-24 Thread Sawada Masahiko
Hi David,

Thank you for your answer!

On Wed, Mar 25, 2015 at 12:38 AM, David Steele da...@pgmasters.net wrote:
 Hi Sawada,

 Thank you for taking the time to look at the patch.

 On 3/24/15 10:28 AM, Sawada Masahiko wrote:
 I've applied these patches successfully.

 I looked into this module, and had a few comments as follows.
 1. pg_audit audits only one role currently.
 In the current code, we cannot specify multiple users as auditing users. Why?
 (Sorry if this topic has already been discussed.)

 There is only one master audit role in a bid for simplicity.  However,
 there are two ways you can practically have multiple audit roles (both
 are mentioned in the docs):

 1) The audit role honors inheritance so you can grant all your audit
 roles to the master role set in pg_audit.role and all the roles will
 be audited.

 2) Since pg_audit.role is a GUC, you can set a different audit role per
 database by using ALTER DATABASE ... SET.  You can set the GUC per logon
 role as well though that would probably make things very complicated.
 The GUC is SUSET so normal users cannot tamper with it.

I understood.
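As I understand it, the two approaches would look roughly like this (all role
and database names here are hypothetical, just for illustration):

```sql
-- (1) Inheritance: grant the individual audit roles to the master role
--     named in pg_audit.role; privileges granted to any of them are audited.
CREATE ROLE audit_master;
CREATE ROLE auditor1;
CREATE ROLE auditor2;
GRANT auditor1 TO audit_master;
GRANT auditor2 TO audit_master;
-- postgresql.conf: pg_audit.role = 'audit_master'

-- (2) A different audit role per database via the GUC:
ALTER DATABASE db1 SET pg_audit.role = 'audit_role_db1';
```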

 2. OBJECT auditing does not work until ACL info is added to pg_class.relacl.
 In the following situation, pg_audit cannot write the OBJECT log.
 $ cat postgresql.conf | grep audit
 shared_preload_libraries = 'pg_audit'
 pg_audit.role = 'hoge_user'
 pg_audit.log = 'read, write'
 $ psql -d postgres -U hoge_user
 =# create table hoge(col int);
 =# select * from hoge;
 LOG:  AUDIT: SESSION,3,1,READ,SELECT,,,select * from hoge;

 The OBJECT audit log is not written here since pg_class.relacl is still
 empty. (Only the SESSION log is written.)
 So after creating another, unrelated role and granting it some privilege,
 the OBJECT audit log is written successfully.

 Yes, object auditing does not work until some grants have been made to
 the audit role.

 =# create role bar_user;
 =# grant select on hoge to bar_user;
 =# select * from hoge;
 LOG:  AUDIT: SESSION,4,1,READ,SELECT,,,select * from hoge;
 LOG:  AUDIT: OBJECT,4,1,READ,SELECT,TABLE,public.hoge,select * from hoge;

 Both OBJECT and SESSION logs are written.

 Looks right to me.  If you don't want the session logging then disable
 pg_audit.log.

 Session and object logging are completely independent from each other:
 one or the other, or both, or neither can be enabled at any time.

It means that the OBJECT log is not written just after creating a table, even
if that table is touched by its owner.
To get OBJECT logging, we need to grant a privilege to the audit role first, right?

 3. pg_audit writes an OBJECT log even for the EXPLAIN command.
 The EXPLAIN command does not actually touch the table, but pg_audit still
 writes an OBJECT audit log entry.
 I'm not sure we need to log it. Is it intentional?

 This is intentional.  They are treated as queries since in production
 they should be relatively rare (that is, not part of a normal function
 or process) and it's good information to have because EXPLAIN can be
 used to determine the number of rows in a table, and could also be used
 to figure out when data is added or removed from a table.  In essence,
 it is a query even if it does not return row data.

Okay, I understood.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-03-23 Thread Sawada Masahiko
 to be committed for 9.5.

 I have prepared a patch that brings event triggers and deparse back to
 pg_audit based on the Alvaro's dev/deparse branch at
 git://git.postgresql.org/git/2ndquadrant_bdr.git (commit 0447fc5).  I've
 updated the unit tests accordingly.

 I left in the OAT code for now.  It does add detail to one event that
 the event triggers do not handle (creating PK indexes) and I feel that
 it lends an element of safety in case something happens to the event
 triggers.  OAT is also required for function calls so the code path
 cannot be eliminated entirely.  I'm not committed to keeping the
 redundant OAT code, but I'd rather not remove it until deparse is
 committed to master.

 I've been hesitant to post this patch as it will not work in master
 (though it will compile), but I don't want to hold on to it any longer
 since the end of the CF is nominally just weeks away.  If you want to
 run the patch in master, you'll need to disable the
 pg_audit_ddl_command_end trigger.

 I've also addressed Fujii's concerns about logging parameters - this is
 now an option.  The event stack has been formalized and
 MemoryContextRegisterResetCallback() is used to cleanup the stack on errors.

 Let me know what you think.


Hi,

I tried to look into the latest patch, but got the following error.

masahiko [pg_audit] $ LANG=C make
gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -g -fpic -I. -I. -I../../src/include -D_GNU_SOURCE   -c -o
pg_audit.o pg_audit.c
pg_audit.c: In function 'log_audit_event':
pg_audit.c:456: warning: ISO C90 forbids mixed declarations and code
pg_audit.c: In function 'pg_audit_ddl_command_end':
pg_audit.c:1436: error: 'pg_event_trigger_expand_command' undeclared
(first use in this function)
pg_audit.c:1436: error: (Each undeclared identifier is reported only once
pg_audit.c:1436: error: for each function it appears in.)
make: *** [pg_audit.o] Error 1

Am I missing something?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-03-12 Thread Sawada Masahiko
On Tue, Mar 10, 2015 at 5:05 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 3/9/15 9:43 PM, Sawada Masahiko wrote:

 On Fri, Mar 6, 2015 at 11:07 AM, Jim Nasby jim.na...@bluetreble.com
 wrote:

 On 3/2/15 10:58 AM, Sawada Masahiko wrote:


 On Wed, Feb 25, 2015 at 4:58 PM, Jim Nasby jim.na...@bluetreble.com
 wrote:


 On 2/24/15 8:28 AM, Sawada Masahiko wrote:



 According to the above discussion, VACUUM and REINDEX should have
 trailing options. Tom seems (to me) to be suggesting SQL-style
 (bare-word, preceded by WITH) options and Jim '()'-style
 options? (Anyway, VACUUM gets the *third additional* option
 style, but that is a different discussion from this one.)




 Well, almost everything does a trailing WITH. We need to either stick
 with
 that for consistency, or add leading () as an option to those WITH
 commands.

 Does anyone know why those are WITH? Is it ANSI?

 As a refresher, current commands are:

VACUUM (ANALYZE, VERBOSE) table1 (col1);
REINDEX INDEX index1 FORCE;
COPY table1 FROM 'file.txt' WITH (FORMAT csv);
CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry WITH
 DATA;
CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over;
CREATE ROLE role WITH LOGIN;
GRANT  WITH GRANT OPTION;
CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION;
ALTER DATABASE db1 WITH CONNECTION LIMIT 50;
DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD;



 BTW, I'm fine with Tom's bare-word with WITH idea. That seems to be the
 most
 consistent with everything else. Is there a problem with doing that? I
 know
 getting syntax is one of the hard parts of new features, but it seems
 like
 we reached consensus here...


 Attached is the latest version of the patch, based on Tom's idea:
 REINDEX { INDEX | ... } name WITH ( options [, ...] )


 Are the parentheses necessary? No other WITH option requires them, other
 than create table/matview (COPY doesn't actually require them).

 We have discussed this option, including the FORCE option, but I
 think there are no users who want to use both FORCE and VERBOSE
 at the same time.



 I find that very hard to believe... I would expect a primary use case for
 VERBOSE to be "I ran REINDEX, but it doesn't seem to have done anything...
 what's going on?" and that's exactly when you might want to use FORCE.


 In the current code, nothing happens even if the FORCE option is specified.
 This option exists purely for backward compatibility.
 But this patch adds new syntax including the FORCE option for now.


 I forgot that. There's no reason to support it with the new stuff then.

 --
 Jim Nasby, Data Architect, Blue Treble Consulting
 Data in Trouble? Get it in Treble! http://BlueTreble.com

The attached patch is the latest version, with the syntax changed a little.
It adds the following syntax for now:
 REINDEX { INDEX | ... } name WITH (VERBOSE);

But the parentheses are still under discussion, so this may change.
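For example, a session with this patch applied would look something like this
(the index name is hypothetical, and the timing figures are only illustrative
of what pg_rusage_show() prints):

```
=# REINDEX INDEX hoge_idx WITH (VERBOSE);
INFO:  index "hoge_idx" was reindexed.
DETAIL:  CPU 0.00s/0.01u sec elapsed 0.05 sec.
REINDEX
```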

Regards,

---
Sawada Masahiko


000_reindex_verbose_v4.patch
Description: Binary data



Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-03-12 Thread Sawada Masahiko
On Thu, Mar 12, 2015 at 6:36 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 3/11/15 6:33 AM, Sawada Masahiko wrote:

 As a refresher, current commands are:
 
 VACUUM (ANALYZE, VERBOSE) table1 (col1);
 REINDEX INDEX index1 FORCE;
 COPY table1 FROM 'file.txt' WITH (FORMAT csv);
 CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry
  WITH
 DATA;
 CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over;
 CREATE ROLE role WITH LOGIN;
 GRANT  WITH GRANT OPTION;
 CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION;
 ALTER DATABASE db1 WITH CONNECTION LIMIT 50;
 DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD;

 
 
 
 BTW, I'm fine with Tom's bare-word with WITH idea. That seems to be
  the
 most
 consistent with everything else. Is there a problem with doing that?
  I
 know
 getting syntax is one of the hard parts of new features, but it
  seems
 like
 we reached consensus here...

 
 
 Attached is the latest version of the patch, based on Tom's idea:
 REINDEX { INDEX | ... } name WITH ( options [, ...] )

 
 
 Are the parentheses necessary? No other WITH option requires them, other
 than create table/matview (COPY doesn't actually require them).
 

 I was imagining the EXPLAIN syntax.
 Is there any possibility of supporting multiple options for the REINDEX
 command in the future?
 If so, the syntax would be: REINDEX { INDEX | ... } name
 WITH VERBOSE, XXX, XXX;
 I thought the style with parentheses was better than that.


 The thing is, ()s are actually an odd duck. Very little supports it, and
 while COPY allows it they're not required. EXPLAIN is a different story,
 because that's not WITH; we're actually using () *instead of* WITH.

 So because almost all commands that use WITH don't even accept (), I don't
 think this should either. It certainly shouldn't require them, because
 unlike EXPLAIN, there's no need to require them.


I understand your point.
The attached patch changes the syntax so that it does not use parentheses.
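With the parentheses dropped, the syntax in this version of the patch would
simply be, for example:

```sql
REINDEX TABLE hoge WITH VERBOSE;
```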

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 0a4c7d4..3cea35f 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,11 @@ PostgreSQL documentation
  <refsynopsisdiv>
 <synopsis>
 REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> WITH <replaceable class="PARAMETER">options</replaceable> [, ...]
+
+<phrase>where <replaceable class="PARAMETER">option</replaceable> can be one of:</phrase>
+
+VERBOSE
 </synopsis>
  </refsynopsisdiv>
 
@@ -159,6 +164,15 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAM
      </para>
     </listitem>
    </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report as each index is reindexed.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
  </refsect1>
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f85ed93..786f173 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -63,6 +63,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/pg_rusage.h"
 #include "utils/syscache.h"
 #include "utils/tuplesort.h"
 #include "utils/snapmgr.h"
@@ -3130,13 +3131,18 @@ IndexGetRelation(Oid indexId, bool missing_ok)
  * reindex_index - This routine is used to recreate a single index
  */
 void
-reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
+reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
+bool verbose)
 {
 	Relation	iRel,
 heapRelation;
 	Oid			heapId;
 	IndexInfo  *indexInfo;
 	volatile bool skipped_constraint = false;
+	int			elevel = verbose ? INFO : DEBUG2;
+	PGRUsage	ru0;
+
+	pg_rusage_init(ru0);
 
 	/*
 	 * Open and lock the parent heap relation.  ShareLock is sufficient since
@@ -3280,6 +3286,13 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
 		heap_close(pg_index, RowExclusiveLock);
 	}
 
+	/* Log what we did */
+	ereport(elevel,
+			(errmsg("index \"%s\" was reindexed.",
+					get_rel_name(indexId)),
+			 errdetail("%s.",
+					   pg_rusage_show(&ru0))));
+
 	/* Close rels, but keep locks */
 	index_close(iRel, NoLock);
 	heap_close(heapRelation, NoLock);
@@ -3321,7 +3334,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
  * index rebuild.
  */
 bool
-reindex_relation(Oid relid, int flags)
+reindex_relation(Oid relid, int flags, bool verbose)
 {
 	Relation	rel;
 	Oid			toast_relid;
@@ -3412,7 +3425,7 @@ reindex_relation(Oid relid, int flags)
 RelationSetIndexList(rel, doneIndexes, InvalidOid);
 
 			reindex_index(indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
-		  persistence);
+		  persistence, verbose);
 
 			CommandCounterIncrement

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-03-11 Thread Sawada Masahiko
On Tue, Mar 10, 2015 at 5:05 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 3/9/15 9:43 PM, Sawada Masahiko wrote:

 On Fri, Mar 6, 2015 at 11:07 AM, Jim Nasby jim.na...@bluetreble.com
 wrote:

 On 3/2/15 10:58 AM, Sawada Masahiko wrote:


 On Wed, Feb 25, 2015 at 4:58 PM, Jim Nasby jim.na...@bluetreble.com
 wrote:


 On 2/24/15 8:28 AM, Sawada Masahiko wrote:



 According to the above discussion, VACUUM and REINDEX should have
 trailing options. Tom seems (to me) to be suggesting SQL-style
 (bare-word, preceded by WITH) options and Jim '()'-style
 options? (Anyway, VACUUM gets the *third additional* option
 style, but that is a different discussion from this one.)




 Well, almost everything does a trailing WITH. We need to either stick
 with
 that for consistency, or add leading () as an option to those WITH
 commands.

 Does anyone know why those are WITH? Is it ANSI?

 As a refresher, current commands are:

VACUUM (ANALYZE, VERBOSE) table1 (col1);
REINDEX INDEX index1 FORCE;
COPY table1 FROM 'file.txt' WITH (FORMAT csv);
CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry WITH
 DATA;
CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over;
CREATE ROLE role WITH LOGIN;
GRANT  WITH GRANT OPTION;
CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION;
ALTER DATABASE db1 WITH CONNECTION LIMIT 50;
DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD;



 BTW, I'm fine with Tom's bare-word with WITH idea. That seems to be the
 most
 consistent with everything else. Is there a problem with doing that? I
 know
 getting syntax is one of the hard parts of new features, but it seems
 like
 we reached consensus here...


 Attached is the latest version of the patch, based on Tom's idea:
 REINDEX { INDEX | ... } name WITH ( options [, ...] )


 Are the parentheses necessary? No other WITH option requires them, other
 than create table/matview (COPY doesn't actually require them).


I was imagining the EXPLAIN syntax.
Is there any possibility of supporting multiple options for the REINDEX
command in the future?
If so, the syntax would be: REINDEX { INDEX | ... } name
WITH VERBOSE, XXX, XXX;
I thought the style with parentheses was better than that.

 We have discussed this option, including the FORCE option, but I
 think there are no users who want to use both FORCE and VERBOSE
 at the same time.



 I find that very hard to believe... I would expect a primary use case for
 VERBOSE to be "I ran REINDEX, but it doesn't seem to have done anything...
 what's going on?" and that's exactly when you might want to use FORCE.


 In the current code, nothing happens even if the FORCE option is specified.
 This option exists purely for backward compatibility.
 But this patch adds new syntax including the FORCE option for now.


 I forgot that. There's no reason to support it with the new stuff then.

 --
 Jim Nasby, Data Architect, Blue Treble Consulting
 Data in Trouble? Get it in Treble! http://BlueTreble.com

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-03-11 Thread Sawada Masahiko
On Tue, Mar 10, 2015 at 12:08 PM, Stephen Frost sfr...@snowman.net wrote:
 Sawada,

 * Sawada Masahiko (sawada.m...@gmail.com) wrote:
 Thank you for your review!
 Attached file is the latest version (without the documentation patch; I am
 making it now).
 As per discussion, there is no change regarding superuser permission.

 Ok.  Here's another review.


Thank you for your review!

 diff --git a/src/backend/catalog/system_views.sql 
 b/src/backend/catalog/system_views.sql
 index 5e69e2b..94ee229 100644
 --- a/src/backend/catalog/system_views.sql
 +++ b/src/backend/catalog/system_views.sql
 @@ -414,6 +414,11 @@ CREATE RULE pg_settings_n AS

  GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;

 +CREATE VIEW pg_file_settings AS
 +   SELECT * FROM pg_show_all_file_settings() AS A;
 +
 +REVOKE ALL on pg_file_settings FROM public;
 +

 This suffers from a lack of pg_dump support, presuming that the
 superuser is entitled to change the permissions on this function.  As it
 turns out though, you're in luck in that regard since I'm working on
 fixing that for a mostly unrelated patch.  Any feedback you have on what
 I'm doing to pg_dump here:

 http://www.postgresql.org/message-id/20150307213935.go29...@tamriel.snowman.net

 Would certainly be appreciated.


Thank you for the info.
I will read the discussion and send feedback.

 diff --git a/src/backend/utils/misc/guc-file.l 
 b/src/backend/utils/misc/guc-file.l
 [...]
 +  * Calculate the size of guc_array and allocate it.  From the second
 +  * allocation onward, we should free the previously allocated memory.
 +  */
 + guc_array_size = num_guc_file_variables * sizeof(struct 
 ConfigFileVariable);
 + if (!guc_file_variables)
 + {
 + /* For the first call */
 + guc_file_variables = (ConfigFileVariable *) guc_malloc(FATAL, 
 guc_array_size);
 + }
 + else
 + {
 + guc_array = guc_file_variables;
 + for (item = head; item; item = item->next, guc_array++)
 + {
 + free(guc_array->name);
 + free(guc_array->value);
 + free(guc_array->filename);

 It's a minor nit-pick, as the below loop should handle anything we care
 about, but I'd go ahead and reset guc_array->sourceline to be -1 or
 something too, just to show that everything's being handled here with
 regard to cleanup.  Or perhaps just add a comment saying that the
 sourceline for all currently valid entries will be updated.

Fixed.


 + guc_file_variables = (ConfigFileVariable *) guc_realloc(FATAL, 
 guc_file_variables, guc_array_size);
 + }

 Nasby made a comment, I believe, that we might actually be able to use
 memory contexts here, which would certainly be much nicer as then you'd
 be able to just throw away the whole context and create a new one.  Have
 you looked at that approach at all?  Offhand, I'm not sure if it'd work
 or not (I have my doubts..) but it seems worthwhile to consider.


I successfully used palloc() and pfree() to allocate and free
guc_file_variables, but those variables seem to have been freed already by
the time the pg_file_settings view is read via SQL.

 Otherwise, it seems like this would address my previous concerns that
 this would end up leaking memory, and even if it's a bit slower than one
 might hope, it's not an operation we should be doing very frequently.

 --- a/src/backend/utils/misc/guc.c
 +++ b/src/backend/utils/misc/guc.c
 [...]
  /*
 + * Return the total number of GUC File variables
 + */
 +int
 +GetNumConfigFileOptions(void)
 +{
 + return num_guc_file_variables;
 +}

 What's the point of this function..?  I'm not convinced it's the best
 idea, but it certainly looks like the function and the variable are only
 used from this file anyway?

It's unnecessary.
Fixed.

 + if (call_cntr < max_calls)
 + {
 + char   *values[NUM_PG_FILE_SETTINGS_ATTS];
 + HeapTuple   tuple;
 + Datum   result;
 + ConfigFileVariable conf;
 + charbuffer[256];

 Isn't that buffer a bit.. excessive in size?

Fixed.


 diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
 index d3100d1..ebb96cc 100644
 --- a/src/include/utils/guc.h
 +++ b/src/include/utils/guc.h
 @@ -133,6 +133,14 @@ typedef struct ConfigVariable
   struct ConfigVariable *next;
  } ConfigVariable;

 +typedef struct ConfigFileVariable
 +{
 + char*name;
 + char*value;
 + char*filename;
 + int sourceline;
 +} ConfigFileVariable;
 +
 [...]
 +extern int   GetNumConfigFileOptions(void);

 These aren't actually used anywhere else, are they?  Not sure that
 there's any need to expose them beyond guc.c when that's the only place
 they're used.


Fixed.

 This will also need a
 REVOKE EXECUTE on pg_show_all_file_settings() FROM public;

Added.
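With that in place, only a superuser can read the view, e.g. (a sketch; the
exact column set follows the patch's tuple descriptor and may still change):

```sql
-- As a superuser: list every setting found in the configuration files,
-- along with the file and line each one came from.
SELECT * FROM pg_file_settings;
```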

Regards,

---
Sawada Masahiko


000_pg_file_settings_view_v4.patch
Description: Binary data


Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-03-09 Thread Sawada Masahiko
On Fri, Mar 6, 2015 at 11:07 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 3/2/15 10:58 AM, Sawada Masahiko wrote:

 On Wed, Feb 25, 2015 at 4:58 PM, Jim Nasby jim.na...@bluetreble.com
 wrote:

 On 2/24/15 8:28 AM, Sawada Masahiko wrote:


 According to the above discussion, VACUUM and REINDEX should have
 trailing options. Tom seems (to me) to be suggesting SQL-style
 (bare-word, preceded by WITH) options and Jim '()'-style
 options? (Anyway, VACUUM gets the *third additional* option
 style, but that is a different discussion from this one.)



 Well, almost everything does a trailing WITH. We need to either stick
 with
 that for consistency, or add leading () as an option to those WITH
 commands.

 Does anyone know why those are WITH? Is it ANSI?

 As a refresher, current commands are:

   VACUUM (ANALYZE, VERBOSE) table1 (col1);
   REINDEX INDEX index1 FORCE;
   COPY table1 FROM 'file.txt' WITH (FORMAT csv);
   CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry WITH DATA;
   CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over;
   CREATE ROLE role WITH LOGIN;
   GRANT  WITH GRANT OPTION;
   CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION;
   ALTER DATABASE db1 WITH CONNECTION LIMIT 50;
   DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD;


 BTW, I'm fine with Tom's bare-word with WITH idea. That seems to be the most
 consistent with everything else. Is there a problem with doing that? I know
 getting syntax is one of the hard parts of new features, but it seems like
 we reached consensus here...

Attached is the latest version of the patch, based on Tom's idea:
REINDEX { INDEX | ... } name WITH ( options [, ...] )


 We have discussed this option, including the FORCE option, but I
 think there are no users who want to use both FORCE and VERBOSE
 at the same time.


 I find that very hard to believe... I would expect a primary use case for
 VERBOSE to be "I ran REINDEX, but it doesn't seem to have done anything...
 what's going on?" and that's exactly when you might want to use FORCE.


In the current code, nothing happens even if the FORCE option is specified.
This option exists purely for backward compatibility.
But this patch adds new syntax including the FORCE option for now.
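Under the grammar in the attached patch, both options (with their optional
boolean values) would be accepted together, e.g. (table name hypothetical):

```sql
REINDEX TABLE table1 WITH (VERBOSE);
REINDEX TABLE table1 WITH (FORCE, VERBOSE ON);
```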

Todo
- tab completion
- reindexdb command

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 0a4c7d4..a4109aa 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,12 @@ PostgreSQL documentation
  <refsynopsisdiv>
 <synopsis>
 REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> WITH ( <replaceable class="PARAMETER">options</replaceable> [, ...] )
+
+<phrase>where <replaceable class="PARAMETER">option</replaceable> can be one of:</phrase>
+
+FORCE [ <replaceable class="PARAMETER">boolean</replaceable> ]
+VERBOSE [ <replaceable class="PARAMETER">boolean</replaceable> ]
 </synopsis>
  </refsynopsisdiv>
 
@@ -159,6 +165,29 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAM
      </para>
     </listitem>
    </varlistentry>
+
+   <varlistentry>
+    <term><literal>VERBOSE</literal></term>
+    <listitem>
+     <para>
+      Prints a progress report as each index is reindexed.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">boolean</replaceable></term>
+    <listitem>
+     <para>
+      Specifies whether the selected option should be turned on or off.
+      You can write <literal>TRUE</literal>, <literal>ON</>, or
+      <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+      <literal>OFF</>, or <literal>0</literal> to disable it.  The
+      <replaceable class="parameter">boolean</replaceable> value can also
+      be omitted, in which case <literal>TRUE</literal> is assumed.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
  </refsect1>
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f85ed93..786f173 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -63,6 +63,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/pg_rusage.h"
 #include "utils/syscache.h"
 #include "utils/tuplesort.h"
 #include "utils/snapmgr.h"
@@ -3130,13 +3131,18 @@ IndexGetRelation(Oid indexId, bool missing_ok)
  * reindex_index - This routine is used to recreate a single index
  */
 void
-reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
+reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
+bool verbose)
 {
 	Relation	iRel,
 heapRelation;
 	Oid			heapId;
 	IndexInfo  *indexInfo;
 	volatile bool skipped_constraint = false;
+	int			elevel = verbose ? INFO : DEBUG2;
+	PGRUsage	ru0;
+
+	pg_rusage_init(ru0);
 
 	/*
 	 * Open and lock the parent heap relation.  ShareLock is sufficient since
@@ -3280,6 +3286,13

[HACKERS] Wrong error message in REINDEX command

2015-03-07 Thread Sawada Masahiko
Hi,

I got a wrong error message when I ran the REINDEX SYSTEM command inside a
transaction block, as follows.
It should say: ERROR:  REINDEX SYSTEM cannot run inside a transaction block
The attached patch fixes it.

[postgres][5432](1)=# begin;
BEGIN
[postgres][5432](1)=# reindex system postgres;
ERROR:  REINDEX DATABASE cannot run inside a transaction block
STATEMENT:  reindex system postgres;

Regards,

---
Sawada Masahiko


fix_reindex_error_message.patch
Description: Binary data



Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-03-05 Thread Sawada Masahiko
On Sat, Feb 28, 2015 at 2:27 PM, Stephen Frost sfr...@snowman.net wrote:
 Sawada,

 * Sawada Masahiko (sawada.m...@gmail.com) wrote:
 The attached patch is the latest version, including the following changes:
 - This view is available to superusers only
 - Add a sourceline column

 Alright, first off, to Josh's point- I'm definitely interested in a
 capability to show where the heck a given config value is coming from.
 I'd be even happier if there was a way to *force* where config values
 have to come from, but that ship sailed a year ago or so.  Having this
 be for the files, specifically, seems perfectly reasonable to me.  I
 could see having another function which will report where a currently
 set GUC variable came from (eg: ALTER ROLE or ALTER DATABASE), but
 having a function for which config file is this GUC set in seems
 perfectly reasonable to me.

 To that point, here's a review of this patch.

 diff --git a/src/backend/catalog/system_views.sql 
 b/src/backend/catalog/system_views.sql
 index 6df6bf2..f628cb0 100644
 --- a/src/backend/catalog/system_views.sql
 +++ b/src/backend/catalog/system_views.sql
 @@ -412,6 +412,11 @@ CREATE RULE pg_settings_n AS

  GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;

 +CREATE VIEW pg_file_settings AS
 +   SELECT * FROM pg_show_all_file_settings() AS A;
 +
 +REVOKE ALL on pg_file_settings FROM public;

 While this generally works, the usual expectation is that functions
 that should be superuser-only have a check in the function rather than
 depending on the execute privilege.  I'm certainly happy to debate the
 merits of that approach, but for the purposes of this patch, I'd suggest
 you stick an if (!superuser()) ereport("must be superuser") into the
 function itself and not worry about setting the correct permissions on
 it.

 + ConfigFileVariable *guc;

 As this ends up being an array, I'd call it guc_array or something
 along those lines.  GUC in PG-land has a pretty specific single-entity
 meaning.

 @@ -344,6 +346,21 @@ ProcessConfigFile(GucContext context)
   PGC_BACKEND, 
 PGC_S_DYNAMIC_DEFAULT);
   }

 + guc_file_variables = (ConfigFileVariable *)
 + guc_malloc(FATAL, num_guc_file_variables * 
 sizeof(struct ConfigFileVariable));

 Uh, perhaps I missed it, but what happens on a reload?  Aren't we going
 to realloc this every time?  Seems like we should be doing a
 guc_malloc() the first time through but doing guc_realloc() after, or
 we'll leak memory on every reload.

 + /*
 +  * Apply guc config parameters to guc_file_variable
 +  */
 + guc = guc_file_variables;
 + for (item = head; item; item = item->next, guc++)
 + {
 + guc->name = guc_strdup(FATAL, item->name);
 + guc->value = guc_strdup(FATAL, item->value);
 + guc->filename = guc_strdup(FATAL, item->filename);
 + guc->sourceline = item->sourceline;
 + }

 Uh, ditto and double-down here.  I don't see a great solution other than
 looping through the previous array and free'ing each of these, since we
 can't depend on the memory context machinery being up and ready at this
 point, as I recall.

 diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
 index 931993c..c69ce66 100644
 --- a/src/backend/utils/misc/guc.c
 +++ b/src/backend/utils/misc/guc.c
 @@ -3561,9 +3561,17 @@ static const char *const map_old_guc_names[] = {
   */
  static struct config_generic **guc_variables;

 +/*
 + * lookup of variables for pg_file_settings view.
 + */
 +static struct ConfigFileVariable *guc_file_variables;
 +
  /* Current number of variables contained in the vector */
  static int   num_guc_variables;

 +/* Number of file variables */
 +static int   num_guc_file_variables;
 +

 I'd put these together, and add a comment explaining that
 guc_file_variables is an array of length num_guc_file_variables..

  /*
 + * Return the total number of GUC File variables
 + */
 +int
 +GetNumConfigFileOptions(void)
 +{
 + return num_guc_file_variables;
 +}

 I don't see the point of this function..

 +Datum
 +show_all_file_settings(PG_FUNCTION_ARGS)
 +{
 + FuncCallContext *funcctx;
 + TupleDesc   tupdesc;
 + int call_cntr;
 + int max_calls;
 + AttInMetadata *attinmeta;
 + MemoryContext oldcontext;
 +
 + if (SRF_IS_FIRSTCALL())
 + {
 + funcctx = SRF_FIRSTCALL_INIT();
 +
 + oldcontext = 
 MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
 +
 + /*
 +  * need a tuple descriptor representing NUM_PG_FILE_SETTINGS_ATTS columns
 +  * of the appropriate types
 +  */
 +
 + tupdesc = CreateTemplateTupleDesc(NUM_PG_FILE_SETTINGS_ATTS, false);
 + TupleDescInitEntry(tupdesc, (AttrNumber) 1, "name",
 +TEXTOID, -1, 0);
 + TupleDescInitEntry(tupdesc, (AttrNumber) 2, "setting"

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-03-02 Thread Sawada Masahiko
On Wed, Feb 25, 2015 at 4:58 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 2/24/15 8:28 AM, Sawada Masahiko wrote:

 According to the above discussion, VACUUM and REINDEX should have
 trailing options. Tom seems (to me) suggesting that SQL-style
 (bare word preceded by WITH) options and Jim suggesting '()'
 style options? (Anyway VACUUM gets the*third additional*  option
 sytle, but it is the different discussion from this)


 Well, almost everything does a trailing WITH. We need to either stick with
 that for consistency, or add leading () as an option to those WITH commands.

 Does anyone know why those are WITH? Is it ANSI?

 As a refresher, current commands are:

  VACUUM (ANALYZE, VERBOSE) table1 (col1);
  REINDEX INDEX index1 FORCE;
  COPY table1 FROM 'file.txt' WITH (FORMAT csv);
  CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry WITH DATA;
  CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over;
  CREATE ROLE role WITH LOGIN;
  GRANT  WITH GRANT OPTION;
  CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION;
  ALTER DATABASE db1 WITH CONNECTION LIMIT 50;
  DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD;

We have discussed this option, including the FORCE option, but I
think there are no users who want to use both FORCE and VERBOSE
at the same time.
Can we add new syntax without the FORCE option while leaving the
current REINDEX statement syntax as it is?

As prototype, I attached new version patch has the following syntax.
REINDEX { INDEX | TABLE | ... } relname [ FORCE | VERBOSE];

Regards,

---
Sawada Masahiko


000_reindex_verbose_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-24 Thread Sawada Masahiko
On Fri, Feb 20, 2015 at 12:24 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Hello,

 I showed an extreme number of examples to cover *almost all*
 variations of the existing option-specification syntax, and showed
 what it would look like if all variations could be used for all
 commands. It was almost a mess. Sorry for the confusion.

 I think the issues at our hands are,

  - Options location: at-the-end or right-after-the-keyword?

  - FORCE options to be removed?

  - Decide whether to allow bare word option if the options are to
be located right after the keyword.

 Opinions or thoughts?

 
 Rethinking from here.

 At Wed, 11 Feb 2015 14:34:17 -0600, Jim Nasby jim.na...@bluetreble.com 
 wrote in 54dbbcc9.1020...@bluetreble.com
 On 2/5/15 12:01 PM, Tom Lane wrote:
 ...
  Meh.  Options-at-the-end seems by far the most sensible style to me.
  The options-right-after-the-keyword style is a mess, both logically
  and from a parsing standpoint, and the only reason we have it at all
  is historical artifact (ask Bruce about the origins of VACUUM ANALYZE
  over a beer sometime).
 ...
 I know you were worried about accepting options anywhere because it
 leads to reserved words, but perhaps we could support it just for
 EXPLAIN and VACUUM, and then switch to trailing options if people
 think that would be better.

 According to the above discussion, VACUUM and REINDEX should have
 trailing options. Tom seems (to me) to be suggesting SQL-style
 (bare word preceded by WITH) options and Jim '()'-style options?
 (Anyway, VACUUM gets a *third additional* option style, but that
 is a different discussion from this one.)

 VACUUM [tname [(cname, ...)]] [({FULL [bool]|FREEZE | [bool]|...})]

  =# VACUUM t1 (FULL, FREEZE);

 VACUUM [tname [(cname, ...)]] [WITH [FULL [bool]]|[FREEZE | [bool]|...]

  =# VACUUM t1 WITH FULL;

 IMHO, we are so accustomed to calling these by the names VACUUM FULL
 or VACUUM FREEZE that both of the above look a bit odd.


 If the new syntax above is added, REINDEX should have *only* the
 trailing style.

 REINDEX [{INDEX|TABLE|...}] name [(VERBOSE [bool]|...)]

  =# REINDEX TABLE t1 (VERBOSE);

 REINDEX [{INDEX|TABLE|...}] name [WITH {VERBOSE [bool]|...}]

  =# REINDEX INDEX i_t1_pkey WITH VERBOSE;

 Also, both of them seem to be unintuitive..


 EXPLAIN.. it seems preferable to leave it as it is..


 As a result, it seems best to keep the current syntax for all of
 those commands.

 Opinions?



 At Wed, 18 Feb 2015 23:58:15 +0900, Sawada Masahiko sawada.m...@gmail.com 
 wrote in cad21aobkjndqaq2z-wbscmdhox257bjj6suythwwfs2il8y...@mail.gmail.com
 From consistency perspective,  I tend to agree with Robert to put
 option at immediately after command name as follows.
 REINDEX [(VERBOSE | FORCE)] {INDEX | ...} name;

 I don't object to it if you prefer it. The remaining issue
 is the choice between options-at-the-end and the
 options-right-after-the-keyword style mentioned above. I prefer the
 messier (:-) one..


 Btw, how long will the FORCE option remain available?

 The option has been obsolete since 7.4. I think it should have
 faded away long ago, and it's time to remove it. But once the
 ancient option is removed, the above syntax looks a bit awkward and
 the messier syntax looks natural.


 REINDEX [VERBOSE] {INDEX | ...} name;

 What do you think about this?


Thank you for summarizing them.

I said that right-after-the-keyword looks good to me,
but it will be possible only if the FORCE option is removed.
The REINDEX command currently has the FORCE option at the end, so REINDEX
probably should have its options at the end.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-18 Thread Sawada Masahiko
   VACUUM [FULL] [FREEZE] ... [ANALYZE] [tname [(cname, ...)]
 | VACUUM [({FULL [bool]|FREEZE [bool]|...}[,...])] [tname [(cname, ...)]
 | VACUUM [tname [(cname, ...)] [[WITH ]({FULL [bool]|FREEZE [bool]|...})]

 REINDEX [{INDEX|TABLE|...}] name [[WITH] (VERBOSE [bool]|...)]

 EXPLAIN [[WITH] ({ANALYZE [bool]|VERBOSE [bool]|... [,...]})] statement


I don't think the (OptionName [bool]) style, like (VERBOSE on, FORCE
on), is needed for the REINDEX command.
The EXPLAIN command has that option style because its FORMAT option
can take values other than ON/TRUE or OFF/FALSE (e.g., TEXT, XML).
But a REINDEX command option can only be ON or OFF, so I think the
option name alone is good enough.

Next, regarding the location of such options: several maintenance
commands like CLUSTER and VACUUM take options immediately after the
command name.
From a consistency perspective, I tend to agree with Robert that the
option should go immediately after the command name, as follows:
REINDEX [(VERBOSE | FORCE)] {INDEX | ...} name;

Btw, how long will the FORCE option remain available?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-04 Thread Sawada Masahiko
On Wed, Feb 4, 2015 at 2:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jim Nasby jim.na...@bluetreble.com writes:
 On 2/3/15 5:08 PM, Tom Lane wrote:
 Jim Nasby jim.na...@bluetreble.com writes:
 VACUUM puts the options before the table name, so ISTM it'd be best to
 keep that with REINDEX. Either REINDEX (options) {INDEX | ...} or
 REINDEX {INDEX | ...} (options).

 Well, I really really don't like the first of those.  IMO the command name
 is REINDEX INDEX etc, so sticking something in the middle of that is
 bogus.

 Actually, is there a reason we can't just accept all 3? Forcing people
 to remember exact ordering of options has always struck me as silly.

 And that's an even worse idea.  Useless flexibility in syntax tends to
 lead to unfortunate consequences like having to reserve keywords.


As per the discussion, it seems good to go with
REINDEX { INDEX | TABLE | etc } name [ ( option [, option ...] ) ]
or
REINDEX { INDEX | TABLE | etc } [ ( option [, option ...] ) ] name
i.e., the REINDEX options (VERBOSE and FORCE) are put either before or
after the object name.

Because other maintenance commands put options before the object name, I
think the latter is better.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-03 Thread Sawada Masahiko
On Tue, Feb 3, 2015 at 12:32 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Sawada Masahiko sawada.m...@gmail.com writes:
 On Mon, Feb 2, 2015 at 9:21 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Now, I think that it may
 be better to provide the keyword VERBOSE before the type of object
 reindexed as REINDEX [ VERBOSE ] object.

 Actually, my first WIP version of patch added VERBOSE word at before
 type of object.
 I'm feeling difficult about that the position of VERBOSE word in
 REINDEX statement.

 The way that FORCE was added to REINDEX was poorly thought out; let's not
 double down on that with another option added without any consideration
 for future expansion.  I'd be happier if we adopted something similar to
 the modern syntax for VACUUM and EXPLAIN, ie, comma-separated options in
 parentheses.


I understood.
I'm imagining the new REINDEX syntax as follows:
- REINDEX (INDEX, VERBOSE) hoge_idx;
- REINDEX (TABLE) hoge_table;

i.e., I will add the following syntax format:
REINDEX ( { INDEX | TABLE | SCHEMA | SYSTEM | DATABASE } [, VERBOSE] )
name [FORCE];

Thoughts?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-02 Thread Sawada Masahiko
On Mon, Feb 2, 2015 at 9:21 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Mon, Feb 2, 2015 at 8:31 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 Attached patch adds VERBOSE option to REINDEX commands.
 Please give me feedbacks.
 This could provide useful feedback to users.

Thanks.

 Now, I think that it may
 be better to provide the keyword VERBOSE before the type of object
 reindexed as REINDEX [ VERBOSE ] object.

Actually, my first WIP version of the patch added the VERBOSE keyword
before the object type.
I'm finding it difficult to decide on the position of the VERBOSE
keyword in the REINDEX statement.

 In any case, at quick sight,
 the table completion for REINDEX is broken with your patch because by
 typing REINDEX VERBOSE you would show the list of objects and once
 again VERBOSE.

I have also rebased the tab-completion source, so I don't think that
happens. In my environment, it does not show the list of objects and
VERBOSE again after typing REINDEX VERBOSE.

Regards,

---
Sawada Masahiko




[HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-02 Thread Sawada Masahiko
Hi all,

Attached patch adds a VERBOSE option to the REINDEX commands.
The other maintenance commands (VACUUM FULL, CLUSTER) have a VERBOSE
option, but REINDEX has not had one.

Examples follow.

- REINDEX TABLE
[postgres][5432](1)=# REINDEX TABLE VERBOSE hoge;
INFO:  index hoge_idx was reindexed.
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO:  index hoge2_idx was reindexed.
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
REINDEX

- REINDEX SCHEMA
[postgres][5432](1)=# REINDEX SCHEMA VERBOSE s;
INFO:  index hoge_idx was reindexed.
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index hoge2_idx was reindexed.
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  indexes of whole table s.hoge were reindexed
REINDEX

Please give me feedbacks.

Regards,

---
Sawada Masahiko


000_reindex_verbose_v1.patch
Description: Binary data



Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-30 Thread Sawada Masahiko
On Sat, Jan 31, 2015 at 12:58 AM, Mike Blackwell mike.blackw...@rrd.com wrote:
 This would default to being available to superusers only, right?  Details of
 the file system shouldn't be available to any random user.


This WIP patch does not behave like that yet, but I agree.
This view would be effective combined with ALTER SYSTEM, and the ALTER
SYSTEM command is likewise executable by superusers only.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-30 Thread Sawada Masahiko
On Sat, Jan 31, 2015 at 12:24 AM, David Fetter da...@fetter.org wrote:
 On Fri, Jan 30, 2015 at 09:38:10PM +0900, Sawada Masahiko wrote:
 On Tue, Jan 27, 2015 at 3:34 AM, Robert Haas robertmh...@gmail.com wrote:
  On Thu, Jan 22, 2015 at 5:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  David Johnston david.g.johns...@gmail.com writes:
  On Thu, Jan 22, 2015 at 3:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Is that a requirement, and if so why?  Because this proposal doesn't
  guarantee any such knowledge AFAICS.
 
  The proposal provides for SQL access to all possible sources of variable
  value setting and, ideally, a means of ordering them in priority order, 
  so
  that a search for TimeZone would return two records, one for
  postgresql.auto.conf and one for postgresql.conf - which are numbered 1 
  and
  2 respectively - so that in looking at that result if the
  postgresql.auto.conf entry were to be removed the user would know that 
  what
  the value is in postgresql.conf that would become active.  Furthermore, 
  if
  postgresql.conf has a setting AND there is a mapping in an #included file
  that information would be accessible via SQL as well.
 
  I know what the proposal is.  What I am questioning is the use-case that
  justifies having us build and support all this extra mechanism.  How often
  does anyone need to know what the next down variable value would be?
 
  I believe I've run into situations where this would be useful.  I
  wouldn't be willing to put in the effort to do this myself, but I
  wouldn't be prepared to vote against a high-quality patch that someone
  else felt motivated to write.
 

 Attached patch adds new view pg_file_settings (WIP version).
 This view behaves like the following:

 [postgres][5432](1)=# select * from pg_file_settings ;
       name       | setting |                    sourcefile
 -----------------+---------+--------------------------------------------------
  max_connections | 100     | /home/masahiko/pgsql/master/3data/postgresql.conf
  shared_buffers  | 128MB   | /home/masahiko/pgsql/master/3data/postgresql.conf

 This looks great!

 Is there a reason not to have the sourcefile as a column in
 pg_settings?


The pg_file_settings view also has a sourcefile column, the same as
pg_settings. It might have been hard to see that column in the
previous mail, so I put an example of pg_file_settings again as follows.

[postgres][5432](1)=# select * from pg_file_settings where name = 'work_mem';
-[ RECORD 1 ]--
name   | work_mem
setting| 128MB
sourcefile | /home/masahiko/pgsql/master/3data/hoge.conf
-[ RECORD 2 ]--
name   | work_mem
setting| 64MB
sourcefile | /home/masahiko/pgsql/master/3data/postgresql.auto.conf

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-30 Thread Sawada Masahiko
On Sat, Jan 31, 2015 at 4:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 David Johnston david.g.johns...@gmail.com writes:
 On Fri, Jan 30, 2015 at 12:05 PM, David Fetter da...@fetter.org wrote:
 Why might the contents of pg_settings be different from what would be
 in pg_file_settings, apart from the existence of this column?

 The contents of pg_settings uses the setting name as a primary key.
 The contents of pg_file_setting uses a compound key consisting of the
 setting name and the source file.

 [ raised eyebrow... ]  Seems insufficient in the case that multiple
 settings of the same value occur in the same source file.  Don't you
 need a line number in the key as well?


Yep, a line number is also needed.
The source file and line number together would be the primary key of the
pg_file_settings view. I need to change it like that.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-30 Thread Sawada Masahiko
On Sat, Jan 31, 2015 at 3:20 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Sat, Jan 31, 2015 at 2:00 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Sat, Jan 31, 2015 at 4:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 David Johnston david.g.johns...@gmail.com writes:
 On Fri, Jan 30, 2015 at 12:05 PM, David Fetter da...@fetter.org wrote:
 Why might the contents of pg_settings be different from what would be
 in pg_file_settings, apart from the existence of this column?

 The contents of pg_settings uses the setting name as a primary key.
 The contents of pg_file_setting uses a compound key consisting of the
 setting name and the source file.

 [ raised eyebrow... ]  Seems insufficient in the case that multiple
 settings of the same value occur in the same source file.  Don't you
 need a line number in the key as well?


 Yep, a line number is also needed.
 The source file and line number together would be the primary key of the
 pg_file_settings view. I need to change it like that.


 Attached patch is the latest version, including the following changes:
 - This view is available to superusers only
 - Add a sourceline column

 Also, I registered it in the CF app.


Sorry, I registered unnecessary (extra) threads in the CF app.
How can I delete them?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-30 Thread Sawada Masahiko
On Sat, Jan 31, 2015 at 2:00 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Sat, Jan 31, 2015 at 4:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 David Johnston david.g.johns...@gmail.com writes:
 On Fri, Jan 30, 2015 at 12:05 PM, David Fetter da...@fetter.org wrote:
 Why might the contents of pg_settings be different from what would be
 in pg_file_settings, apart from the existence of this column?

 The contents of pg_settings uses the setting name as a primary key.
 The contents of pg_file_setting uses a compound key consisting of the
 setting name and the source file.

 [ raised eyebrow... ]  Seems insufficient in the case that multiple
 settings of the same value occur in the same source file.  Don't you
 need a line number in the key as well?


 Yep, a line number is also needed.
 The source file and line number together would be the primary key of the
 pg_file_settings view. I need to change it like that.


Attached patch is the latest version, including the following changes:
- This view is available to superusers only
- Add a sourceline column

Also, I registered it in the CF app.

Regards,

---
Sawada Masahiko


000_pg_file_settings_view_v2.patch
Description: Binary data



  1   2   3   >