Re: [HACKERS] Detach/attach table and index data files from one cluster to another
On Fri, Apr 12, 2013 at 9:52 PM, Andres Freund wrote:
> On 2013-04-12 12:14:24 -0400, Tom Lane wrote:
> > Andrew Dunstan writes:
> > > On 04/12/2013 10:15 AM, Tom Lane wrote:
> > >> There's 0 chance of making that work, because the two databases wouldn't
> > >> have the same notions of committed XIDs.
> > > Yeah. Trying to think way outside the box, could we invent some sort of
> > > fixup mechanism that could be applied to adopted files?
> > Well, it wouldn't be that hard to replace XIDs with FrozenXID or
> > InvalidXID as appropriate, if you had access to the source database's
> > clog while you did the copying. It just wouldn't be very fast.
>
> I think if one goes over the heap and hint bits everything (so the item
> pointers don't have to be immediately rewritten), freeze everything and
> such, it should be doable at about disk speed unless you have a really
> fast disk subsystem. But it still is fairly complicated and I doubt it's
> really necessary.
>
> > I suppose it would still be faster than a COPY transfer, but I'm not
> > sure it'd be enough faster to justify the work and the additional
> > portability hits you'd be taking.
>
> Using binary copy might already give quite a speedup, Sameer, did you
> try that?

No, we have not so far; I was soliciting feedback from the hackers first,
with a view to possibly implementing this as a contrib module. Also, I did
misread the earlier post on the subject.

> Also, do you really need parts of a cluster or would a base backup of
> the whole cluster do the trick?

We were looking at parts of a cluster as a faster alternative to pg_dump
and restore.

> --
> Andres Freund                     http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
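[Editorial note: to make the idea concrete, here is a hedged sketch of the
per-tuple fixup Tom and Andres describe, freezing everything as pages
stream across. The function and its unconditional freeze are assumptions
for illustration; a real version would consult the source cluster's clog
and handle xmax, MultiXacts, and aborted rows as well.]

#include "postgres.h"

#include "access/htup_details.h"
#include "access/transam.h"
#include "storage/bufpage.h"

/* Hypothetical fixup applied to each heap page while copying (sketch). */
static void
freeze_page_for_adoption(Page page)
{
	OffsetNumber off,
				maxoff = PageGetMaxOffsetNumber(page);

	for (off = FirstOffsetNumber; off <= maxoff; off++)
	{
		ItemId		itemid = PageGetItemId(page, off);
		HeapTupleHeader htup;

		if (!ItemIdIsNormal(itemid))
			continue;			/* unused, dead, or redirect line pointer */

		htup = (HeapTupleHeader) PageGetItem(page, itemid);

		/* Assume the source clog said xmin committed: freeze and hint it. */
		HeapTupleHeaderSetXmin(htup, FrozenTransactionId);
		htup->t_infomask |= HEAP_XMIN_COMMITTED;
	}
}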
Re: [HACKERS] Hash Join cost estimates
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost writes:
> > I'm certainly curious about those, but I'm also very interested in the
> > possibility of making NTUP_PER_BUCKET much smaller, or perhaps variable
> > depending on the work_mem setting.
>
> Not sure about that. That would make the hash-bucket-header array
> larger without changing the size of the rest of the hash table, thus
> probably making the CPU cache situation worse not better (which would
> manifest as more time at the first of these two statements relative to
> the second).

In the testing that I've done, I've yet to find a case where having a
smaller NTUP_PER_BUCKET makes things worse, and it can yield a dramatic
improvement. Regarding the memory situation, I'm not sure that using
buckets really helps, though I'm no CPU architect. However, assuming that
the CPU is smart enough to only pull in bits of memory that it needs, I'm
not sure how having to pull in parts of a large array of pointers is worse
than having to go fetch randomly placed entries in a bucket, especially
since we have to step through an entire bucket each time and the bucket
tuples are allocated independently, while the hash array is allocated
once. Indeed, it seems you'd be more likely to get other things you need
in a given pull into cache for the hash table array than with the bucket
tuple entries, which might well include unrelated garbage.

> Can you add some instrumentation code to get us information about the
> average/max length of the bucket chains? And maybe try to figure out
> how many distinct hash values per bucket, which would give us a clue
> whether your two-level-list idea is worth anything.

I ended up implementing the two-level system and doing some testing with
it. It ends up making hash table building take quite a bit longer and only
improves the scan performance in very select cases. The improvement
requires an individual bucket to have both lots of dups and a lot of
distinct values, because it only helps when it can skip a significant
number of tuples. If we move to a system where there's rarely more than
one distinct value in a bucket, then the chances of a serious improvement
from the two-level system go down that much more.

As such, I've come up with this (trivial) patch which simply modifies
ExecChooseHashTableSize() to ignore NTUP_PER_BUCKET (essentially treating
it as 1) when work_mem is large enough to fit the entire hash table (which
also implies that there is only one batch). I'd love to hear feedback from
others on what this does under different conditions. This also makes the
"hash-the-small-table" case faster for the test case which I provided
earlier (http://snowman.net/~sfrost/test_case2.sql), and it uses quite a
bit less memory too.

Now that I've convinced myself that the two-level system isn't practical
and the "hash-the-small-table" case is actually faster than
"hash-the-big-table", I'll start looking at improving on your proposal to
change the costing to favor the smaller table being hashed more often.

	Thanks,

		Stephen

colordiff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
new file mode 100644
index 6a2f236..a8a8168
*** a/src/backend/executor/nodeHash.c
--- b/src/backend/executor/nodeHash.c
*** ExecChooseHashTableSize(double ntuples,
*** 489,496 ****
  		/* We expect the hashtable to fit in memory */
  		double		dbuckets;
  
! 		dbuckets = ceil(ntuples / NTUP_PER_BUCKET);
! 		dbuckets = Min(dbuckets, max_pointers);
  		nbuckets = (int) dbuckets;
  
  		nbatch = 1;
--- 489,495 ----
  		/* We expect the hashtable to fit in memory */
  		double		dbuckets;
  
! 		dbuckets = Min(ntuples, max_pointers);
  		nbuckets = (int) dbuckets;
  
  		nbatch = 1;

signature.asc
Description: Digital signature
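[Editorial note: for concreteness, a back-of-the-envelope of what the
change does to bucket counts. The row counts and the simplified
arithmetic below are assumptions for the example, not figures from the
thread.]

#include <math.h>
#include <stdio.h>

#define NTUP_PER_BUCKET 10		/* nodeHash.c's value at the time */
#define Min(x, y) ((x) < (y) ? (x) : (y))

int
main(void)
{
	double		ntuples = 1000000.0;		/* assumed inner-side row count */
	double		max_pointers = 10000000.0;	/* assumed work_mem-derived cap */

	/* Before: one bucket per NTUP_PER_BUCKET tuples -> 100000 buckets,
	 * i.e. chains of ~10 entries to walk on every probe. */
	double		old_buckets = Min(ceil(ntuples / NTUP_PER_BUCKET), max_pointers);

	/* After (single-batch case): one bucket per tuple -> 1000000 buckets,
	 * so the expected chain length drops to ~1. */
	double		new_buckets = Min(ntuples, max_pointers);

	printf("old=%.0f new=%.0f\n", old_buckets, new_buckets);
	return 0;
}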
Re: [HACKERS] Small reduction in memory usage of index relcache entries
Heikki Linnakangas writes:
> Looking closer at where that memory is spent, a lot of it goes into the
> FmgrInfo structs in RelationAmInfo. But some of them are outright unused
> (ambuild, ambuildempty, amcostestimate, amoptions), and a few others
> are called so seldom that they hardly need to be cached
> (ambulkdelete, amvacuumcleanup). Removing those fields, patch attached,
> reduces the memory usage nicely:

+1, but it would look nicer if you could encapsulate the code in some
macro analogous to GET_REL_PROCEDURE(), perhaps call it
GET_UNCACHED_REL_PROCEDURE()?

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
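[Editorial note: one plausible shape for the macro Tom suggests, modeled
on the repeated lookup code in Heikki's patch (quoted with the original
message further down in this digest). The exact definition is an
assumption; only GET_REL_PROCEDURE() itself exists in indexam.c.]

/*
 * Hypothetical sketch: look up an index AM procedure each time, without
 * caching the FmgrInfo in the relcache.  Assumes locals "procOid"
 * (RegProcedure) and "procedure" (FmgrInfo), as declared in the patch.
 */
#define GET_UNCACHED_REL_PROCEDURE(pname) \
do { \
	procOid = indexRelation->rd_am->pname; \
	if (!RegProcedureIsValid(procOid)) \
		elog(ERROR, "invalid %s regproc", CppAsString(pname)); \
	fmgr_info(procOid, &procedure); \
} while (0)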
Re: [HACKERS] Enabling Checksums
On Sat, Apr 13, 2013 at 12:38 AM, Jeff Davis wrote:
> On Fri, 2013-04-12 at 23:03 +0300, Heikki Linnakangas wrote:
>> I think this is a bad idea. It complicates the WAL format significantly.
>> Simon's patch didn't include the changes to recovery to validate the
>> checksum, but I suspect it would be complicated. And it reduces the
>> error-detection capability of WAL recovery. Keep in mind that unlike
>> page checksums, which are never expected to fail, so even if we miss a
>> few errors it's still better than nothing, the WAL checksum is used to
>> detect end-of-WAL. There is expected to be a failure every time we do
>> crash recovery. Thus far, we've considered the probability of one in
>> 2^32 small enough for that purpose, but IMHO one in 2^16 is much too weak.
>
> One thing that just occurred to me is that we could make the SIMD
> checksum a 32-bit checksum, and reduce it down to 16 bits for the data
> pages. That might give us more flexibility to later use it for WAL
> without compromising on the error detection nearly as much (though
> obviously that wouldn't work with Simon's current proposal, which uses
> the same data page checksum in a WAL backup block).

The simple 32-bit version of the algorithm would need CPU capability
checks for the fast version and would work only on CPUs produced in the
last few years. Not a show stopper, but more complex code and less
applicability for sure. An alternative would be to calculate two 16-bit
checksums, concatenate them for the 32-bit checksum and add them for the
16-bit one. In this case we wouldn't need to change the current algorithm.
A future change could just factor out everything until the last add as the
common function. But keep in mind that we are talking about sharing about
400 bytes of machine code here.

> In general, we have more flexibility with WAL because there is no
> upgrade issue. It would be nice to share code with the data page
> checksum algorithm; but really we should just use whatever offers the
> best trade-off in terms of complexity, performance, and error detection
> rate.
>
> I don't think we need to decide all of this right now. Personally, I'm
> satisfied having SIMD checksums on data pages now and leaving WAL
> optimization until later.

+1

I feel quite uneasy about reducing the effectiveness of WAL end detection.
There are many ways to improve WAL performance and I have no idea what
would be the best one. At the very least some performance tests are in
order. As this is not an essential part of having usable checksums, but a
general performance optimization, I feel that it is not fair to others to
postpone the release to resolve this now. I'd be more than happy to
research this for 9.4.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
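[Editorial note: a minimal sketch of the combination Ants describes, with
two independent 16-bit sums concatenated for a 32-bit WAL checksum and
added for the 16-bit page checksum. The checksum16() helper and its seeds
are assumptions for illustration, not existing PostgreSQL functions.]

#include <stddef.h>
#include <stdint.h>

/* Assumed helper: some 16-bit block checksum over the input,
 * parameterized by a seed so the two halves are independent. */
extern uint16_t checksum16(const char *data, size_t len, uint32_t seed);

/* Concatenate the two 16-bit sums to get a 32-bit WAL checksum. */
static uint32_t
wal_checksum32(const char *data, size_t len)
{
	uint16_t	a = checksum16(data, len, 0x5CA1);	/* seeds are arbitrary */
	uint16_t	b = checksum16(data, len, 0xAB1E);

	return ((uint32_t) a << 16) | b;
}

/* Add the same two sums to get the 16-bit data-page checksum, so the
 * work up to the final combine could later be shared, as Ants suggests. */
static uint16_t
page_checksum16(const char *data, size_t len)
{
	uint16_t	a = checksum16(data, len, 0x5CA1);
	uint16_t	b = checksum16(data, len, 0xAB1E);

	return (uint16_t) (a + b);
}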
Re: [HACKERS] Enabling Checksums
On Fri, 2013-04-12 at 23:03 +0300, Heikki Linnakangas wrote:
> I think this is a bad idea. It complicates the WAL format significantly.
> Simon's patch didn't include the changes to recovery to validate the
> checksum, but I suspect it would be complicated. And it reduces the
> error-detection capability of WAL recovery. Keep in mind that unlike
> page checksums, which are never expected to fail, so even if we miss a
> few errors it's still better than nothing, the WAL checksum is used to
> detect end-of-WAL. There is expected to be a failure every time we do
> crash recovery. Thus far, we've considered the probability of one in
> 2^32 small enough for that purpose, but IMHO one in 2^16 is much too weak.

One thing that just occurred to me is that we could make the SIMD
checksum a 32-bit checksum, and reduce it down to 16 bits for the data
pages. That might give us more flexibility to later use it for WAL
without compromising on the error detection nearly as much (though
obviously that wouldn't work with Simon's current proposal, which uses
the same data page checksum in a WAL backup block).

In general, we have more flexibility with WAL because there is no
upgrade issue. It would be nice to share code with the data page
checksum algorithm; but really we should just use whatever offers the
best trade-off in terms of complexity, performance, and error detection
rate.

I don't think we need to decide all of this right now. Personally, I'm
satisfied having SIMD checksums on data pages now and leaving WAL
optimization until later.

Regards,
	Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Fri, 2013-04-12 at 21:28 +0200, Andres Freund wrote:
> That means we will have to do the verification for this in
> ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we
> won't always recognize the end of WAL correctly.
> And I am a bit wary of reducing the likelihood of noticing the proper
> end-of-recovery by reducing the crc width.

Good point.

> Why again are we doing this now? Just to reduce the overhead of CRC
> computation for full page writes? Or are we foreseeing issues with the
> page checksums being wrong because of non-zero data in the hole being
> zero after the restore from bkp blocks?

That shouldn't be a problem, because the block is not expected to have a
proper checksum in WAL, and it will be recalculated before being written.
So I see these changes as mostly independent.

The reason we're discussing this right now is that, when choosing the
checksum algorithm, I was hoping that it might be usable in the future for
WAL backup blocks. I'm convinced that they can be; and the primary
question now seems to be "should they be", which does not need to be
settled right now in my opinion.

Anyway, I would be perfectly happy if we just got the SIMD algorithm in
for data pages. The support for changing the WAL checksums seems lukewarm,
and there might be quite a few alternatives (e.g. optimizing the CRC for
backup blocks as Heikki suggested) to achieve that performance goal.

Regards,
	Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Fri, 2013-04-12 at 15:21 -0400, Bruce Momjian wrote:
> > * When we do PageSetChecksumInplace(), we need to be 100% sure that the
> > hole is empty; otherwise the checksum will fail when we re-expand it. It
> > might be worth a memset beforehand just to be sure.
>
> Do we write the page holes to the WAL for full-page writes? I hope we
> don't.

No, but the page hole is included in the checksum. Let's say that the
page hole contains some non-zero value, and we calculate a checksum.
When we eliminate the page hole, and then reconstitute the page using
zeros for the page hole later, then the page will not match the checksum
any more. So, we need to be sure the original page hole is all-zero when
we calculate the checksum.

Regards,
	Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
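[Editorial note: a minimal sketch of the "memset beforehand" idea under
discussion, zeroing the hole between pd_lower and pd_upper before
checksumming so that eliding the hole and later reconstituting it as
zeros cannot invalidate the checksum. The wrapper function is
hypothetical; only PageSetChecksumInplace() is from the checksum patch
series, and its use here is an assumption.]

#include "postgres.h"

#include "storage/bufpage.h"

/* Hypothetical wrapper: make the checksum insensitive to hole contents. */
static void
checksum_page_with_clean_hole(Page page, BlockNumber blkno)
{
	PageHeader	phdr = (PageHeader) page;

	/*
	 * Zero the hole between the line pointer array and the tuple data,
	 * so a page later reconstituted with a zero-filled hole still
	 * matches the checksum computed here.
	 */
	if (phdr->pd_lower < phdr->pd_upper)
		memset((char *) page + phdr->pd_lower, 0,
			   phdr->pd_upper - phdr->pd_lower);

	/* Then checksum the page in place. */
	PageSetChecksumInplace(page, blkno);
}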
Re: [HACKERS] Enabling Checksums
On 12 April 2013 21:03, Heikki Linnakangas wrote:
> No, the patch has to compute the 16-bit checksum for the page when the
> full-page image is added to the WAL record. There would otherwise be no
> need to calculate the page checksum at that point, but only later when
> the page is written out from shared buffer cache.
>
> I think this is a bad idea. It complicates the WAL format significantly.
> Simon's patch didn't include the changes to recovery to validate the
> checksum, but I suspect it would be complicated. And it reduces the
> error-detection capability of WAL recovery. Keep in mind that unlike page
> checksums, which are never expected to fail, so even if we miss a few
> errors it's still better than nothing, the WAL checksum is used to detect
> end-of-WAL. There is expected to be a failure every time we do crash
> recovery. Thus far, we've considered the probability of one in 2^32 small
> enough for that purpose, but IMHO one in 2^16 is much too weak.
>
> If you want to speed up the CRC calculation of full-page images, you could
> have an optimized version of the WAL CRC algorithm, using e.g. SIMD
> instructions. Because typical WAL records are small, max 100-200 bytes, and
> consist of several even smaller chunks, the normal WAL CRC calculation
> is quite resistant to common optimization techniques. But it might work for
> the full-page images. Let's not conflate it with the page checksums, though.

I accept the general tone of that as a reasonable perspective and in many
ways am on the fence myself. This is sensitive stuff. A few points:

* The code to validate the checksum isn't complex, though it is more than
the current one line. Let's say about 10 lines of clear code. I'll work on
that to show it's true. I don't see that as a point of objection.

* WAL checksum is not used as the sole basis for end-of-WAL discovery. We
reuse the WAL files, so the prev field in each WAL record shows what the
previous end of WAL was. Hence if the WAL checksums give a false positive
we still have a double check that the data really is wrong. It's
unbelievable that you'd get a false positive and then have the prev field
match as well, even though it was the genuine end-of-WAL. (See the sketch
after this message.)

Yes, we could also have a second SIMD calculation optimised for WAL CRC32
on an 8192-byte block, rather than just one set of SIMD code for both. We
could also have a single set of SIMD code producing a 32-bit checksum,
then take the low 16 bits as we do currently.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
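[Editorial note: a hedged sketch of the two-factor end-of-WAL test Simon
describes. The real logic lives in xlog.c's record-validation path; the
function and parameter names below are simplifications, not the actual
code.]

#include "postgres.h"

#include "access/xlog.h"

/*
 * Sketch only: a record is accepted as valid WAL only if BOTH its
 * checksum verifies AND its back-pointer links to the record we just
 * read.  "checksum_ok" stands in for the CRC test in ValidXLogRecord().
 */
static bool
record_is_valid_next(XLogRecord *record, XLogRecPtr prev_read_ptr,
					 bool checksum_ok)
{
	if (!checksum_ok)
		return false;			/* probable end of WAL, or corruption */

	/*
	 * Even a checksum false positive is caught here: a stale record in a
	 * recycled WAL file would have an xl_prev that does not point at the
	 * record actually read before it.
	 */
	if (record->xl_prev != prev_read_ptr)
		return false;

	return true;
}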
Re: [HACKERS] Enabling Checksums
On 12.04.2013 22:31, Bruce Momjian wrote:
> On Fri, Apr 12, 2013 at 09:28:42PM +0200, Andres Freund wrote:
>>> Only point worth discussing is that this change would make backup
>>> blocks be covered by a 16-bit checksum, not the CRC-32 it is now. i.e.
>>> the record header is covered by a CRC32 but the backup blocks only by
>>> 16-bit.
>>
>> That means we will have to do the verification for this in
>> ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we
>> won't always recognize the end of WAL correctly.
>> And I am a bit wary of reducing the likelihood of noticing the proper
>> end-of-recovery by reducing the crc width.
>>
>> Why again are we doing this now? Just to reduce the overhead of CRC
>> computation for full page writes? Or are we foreseeing issues with the
>> page checksums being wrong because of non-zero data in the hole being
>> zero after the restore from bkp blocks?
>
> I thought the idea is that we were going to re-use the already-computed
> CRC checksum on the page, and we only have 16 bits of storage for that.

No, the patch has to compute the 16-bit checksum for the page when the
full-page image is added to the WAL record. There would otherwise be no
need to calculate the page checksum at that point, but only later when
the page is written out from shared buffer cache.

I think this is a bad idea. It complicates the WAL format significantly.
Simon's patch didn't include the changes to recovery to validate the
checksum, but I suspect it would be complicated. And it reduces the
error-detection capability of WAL recovery. Keep in mind that unlike page
checksums, which are never expected to fail, so even if we miss a few
errors it's still better than nothing, the WAL checksum is used to detect
end-of-WAL. There is expected to be a failure every time we do crash
recovery. Thus far, we've considered the probability of one in 2^32 small
enough for that purpose, but IMHO one in 2^16 is much too weak.

If you want to speed up the CRC calculation of full-page images, you
could have an optimized version of the WAL CRC algorithm, using e.g. SIMD
instructions. Because typical WAL records are small, max 100-200 bytes,
and consist of several even smaller chunks, the normal WAL CRC
calculation is quite resistant to common optimization techniques. But it
might work for the full-page images. Let's not conflate it with the page
checksums, though.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Small reduction in memory usage of index relcache entries
While debugging an out-of-memory situation, I noticed that the relcache
entry of each index consumes 2k of memory. From a memory dump:

...
pg_database_datname_index: 2048 total in 1 blocks; 712 free (0 chunks); 1336 used
pg_trigger_tgrelid_tgname_index: 2048 total in 1 blocks; 576 free (0 chunks); 1472 used
pg_rewrite_rel_rulename_index: 2048 total in 1 blocks; 576 free (0 chunks); 1472 used
pg_amproc_fam_proc_index: 2048 total in 1 blocks; 232 free (0 chunks); 1816 used
pg_opclass_oid_index: 2048 total in 1 blocks; 664 free (0 chunks); 1384 used
pg_index_indexrelid_index: 2048 total in 1 blocks; 664 free (0 chunks); 1384 used
pg_attribute_relid_attnum_index: 2048 total in 1 blocks; 528 free (0 chunks); 1520 used
pg_class_oid_index: 2048 total in 1 blocks; 664 free (0 chunks); 1384 used

Looking closer at where that memory is spent, a lot of it goes into the
FmgrInfo structs in RelationAmInfo. But some of them are outright unused
(ambuild, ambuildempty, amcostestimate, amoptions), and a few others are
called so seldom that they hardly need to be cached (ambulkdelete,
amvacuumcleanup). Removing those fields, patch attached, reduces the
memory usage nicely:

...
pg_database_datname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_opclass_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_class_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used

This brings most index entries from two 1k blocks down to one 1k block. A
backend seems to load about 60 indexes into the cache for system tables,
so that saves ~60 kB of memory for every backend at a minimum. This isn't
a huge difference, unless you access a huge number of indexes. But the
patch is trivial, and every little helps.

Any objections if I commit this to master?

- Heikki

diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index ce94649..b9e519a 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -671,14 +671,19 @@ index_bulk_delete(IndexVacuumInfo *info,
 				  void *callback_state)
 {
 	Relation	indexRelation = info->index;
-	FmgrInfo   *procedure;
+	RegProcedure procOid;
+	FmgrInfo	procedure;
 	IndexBulkDeleteResult *result;
 
 	RELATION_CHECKS;
-	GET_REL_PROCEDURE(ambulkdelete);
+
+	procOid = indexRelation->rd_am->ambulkdelete;
+	if (!RegProcedureIsValid(procOid))
+		elog(ERROR, "invalid ambulkdelete regproc");
+	fmgr_info(procOid, &procedure);
 
 	result = (IndexBulkDeleteResult *)
-		DatumGetPointer(FunctionCall4(procedure,
+		DatumGetPointer(FunctionCall4(&procedure,
 									  PointerGetDatum(info),
 									  PointerGetDatum(stats),
 									  PointerGetDatum((Pointer) callback),
@@ -698,14 +703,19 @@ index_vacuum_cleanup(IndexVacuumInfo *info,
 					 IndexBulkDeleteResult *stats)
 {
 	Relation	indexRelation = info->index;
-	FmgrInfo   *procedure;
+	RegProcedure procOid;
+	FmgrInfo	procedure;
 	IndexBulkDeleteResult *result;
 
 	RELATION_CHECKS;
-	GET_REL_PROCEDURE(amvacuumcleanup);
+
+	procOid = indexRelation->rd_am->amvacuumcleanup;
+	if (!RegProcedureIsValid(procOid))
+		elog(ERROR, "invalid amvacuumcleanup regproc");
+	fmgr_info(procOid, &procedure);
 
 	result = (IndexBulkDeleteResult *)
-		DatumGetPointer(FunctionCall2(procedure,
+		DatumGetPointer(FunctionCall2(&procedure,
 									  PointerGetDatum(info),
 									  PointerGetDatum(stats)));
 
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index a4daf77..05f2011 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -47,8 +47,8 @@ typedef LockInfoData *LockInfo;
 
 /*
- * Cached lookup information for the index access method functions defined
- * by the pg_am row associated with an index relation.
+ * Cached lookup information for the frequently used index access method
+ * functions, defined by the pg_am row associated with an index relation.
  */
 typedef struct RelationAmInfo
 {
@@ -60,13 +60,7 @@ typedef struct RelationAmInfo
 	FmgrInfo	amendscan;
 	FmgrInfo	ammarkpos;
 	FmgrInfo	amrestrpos;
-	FmgrInfo	ambuild;
-	FmgrInfo	ambuildempty;
-	FmgrInfo	ambulkdelete;
-	FmgrInfo	amvacuumcleanup;
 	FmgrInfo	amcanreturn;
-	FmgrInfo	amcostestimate;
-	FmgrInfo	amoptions;
 } RelationAmInfo;

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 2013-04-12 15:31:36 -0400, Bruce Momjian wrote:
> On Fri, Apr 12, 2013 at 09:28:42PM +0200, Andres Freund wrote:
> > > Only point worth discussing is that this change would make backup
> > > blocks be covered by a 16-bit checksum, not the CRC-32 it is now.
> > > i.e. the record header is covered by a CRC32 but the backup blocks
> > > only by 16-bit.
> >
> > That means we will have to do the verification for this in
> > ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we
> > won't always recognize the end of WAL correctly.
> > And I am a bit wary of reducing the likelihood of noticing the proper
> > end-of-recovery by reducing the crc width.
> >
> > Why again are we doing this now? Just to reduce the overhead of CRC
> > computation for full page writes? Or are we foreseeing issues with the
> > page checksums being wrong because of non-zero data in the hole being
> > zero after the restore from bkp blocks?
>
> I thought the idea is that we were going to re-use the already-computed
> CRC checksum on the page, and we only have 16 bits of storage for that.

Well, but the proposal seems to be to do this also for non-checksum
enabled datadirs, so ...

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to make pgindent work cleanly
On Fri, Apr 12, 2013 at 3:26 PM, Bruce Momjian wrote: > On Fri, Apr 12, 2013 at 01:34:49PM -0400, Gurjeet Singh wrote: > > Can you also improve the output when it dies upon failure to fetch > something? > > Currently the only error message it emits is "fetching xyz", and leaves > the > > user confused as to what really the problem was. The only indication of a > > problem might be the exit code, but I'm not sure of that, and that > doesn't > > help the interactive user running it on terminal. > > Good point. I have reviewed all the error messages and improved them > with the attached, applied patch, e.g.: > > cannot locate typedefs file "xyz" at /usr/local/bin/pgindent line > 121. > Thanks! -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] Enabling Checksums
On Fri, Apr 12, 2013 at 09:28:42PM +0200, Andres Freund wrote:
> > Only point worth discussing is that this change would make backup blocks
> > be covered by a 16-bit checksum, not the CRC-32 it is now. i.e. the
> > record header is covered by a CRC32 but the backup blocks only by 16-bit.
>
> That means we will have to do the verification for this in
> ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we
> won't always recognize the end of WAL correctly.
> And I am a bit wary of reducing the likelihood of noticing the proper
> end-of-recovery by reducing the crc width.
>
> Why again are we doing this now? Just to reduce the overhead of CRC
> computation for full page writes? Or are we foreseeing issues with the
> page checksums being wrong because of non-zero data in the hole being
> zero after the restore from bkp blocks?

I thought the idea is that we were going to re-use the already-computed
CRC checksum on the page, and we only have 16 bits of storage for that.

--
Bruce Momjian    http://momjian.us
EnterpriseDB     http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 2013-04-11 20:12:59 +0100, Simon Riggs wrote:
> On 11 April 2013 04:27, Jeff Davis wrote:
> > On Wed, 2013-04-10 at 20:17 +0100, Simon Riggs wrote:
> > > OK, so we have a single combined "calculate a checksum for a block"
> > > function. That uses Jeff's zeroing trick and Ants' bulk-oriented
> > > performance optimization.
> > >
> > > For buffer checksums we simply calculate for the block.
> >
> > Sounds good.
> >
> > > For WAL full page writes, we first set the checksums for all defined
> > > buffers, then calculate the checksum of remaining data plus the
> > > pd_checksum field from each block using the normal WAL CRC32.
> > >
> > > Seems good to me. One set of fast code. And it avoids the weirdness
> > > that the checksum stored on the full page is actually wrong.
> >
> > Oh, that's a nice benefit.
>
> So, if we apply a patch like the one attached, we then end up with the WAL
> checksum using the page checksum as an integral part of its calculation.
> (There is no increase in code inside WALInsertLock, nothing at all touched
> in that area).
>
> Then all we need to do is make PageSetChecksumInplace() use Ants' algo and
> we're done.
>
> Only point worth discussing is that this change would make backup blocks be
> covered by a 16-bit checksum, not the CRC-32 it is now. i.e. the record
> header is covered by a CRC32 but the backup blocks only by 16-bit.

That means we will have to do the verification for this in
ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we
won't always recognize the end of WAL correctly.
And I am a bit wary of reducing the likelihood of noticing the proper
end-of-recovery by reducing the crc width.

Why again are we doing this now? Just to reduce the overhead of CRC
computation for full page writes? Or are we foreseeing issues with the
page checksums being wrong because of non-zero data in the hole being
zero after the restore from bkp blocks?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to make pgindent work cleanly
On Fri, Apr 12, 2013 at 01:34:49PM -0400, Gurjeet Singh wrote:
> Can you also improve the output when it dies upon failure to fetch
> something? Currently the only error message it emits is "fetching xyz",
> and leaves the user confused as to what really the problem was. The only
> indication of a problem might be the exit code, but I'm not sure of that,
> and that doesn't help the interactive user running it on terminal.

Good point. I have reviewed all the error messages and improved them
with the attached, applied patch, e.g.:

	cannot locate typedefs file "xyz" at /usr/local/bin/pgindent line 121.

--
Bruce Momjian    http://momjian.us
EnterpriseDB     http://enterprisedb.com

+ It's impossible for everything to be true. +

diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index 2e9d443..73237ca 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -30,7 +30,7 @@ my %options = (
 	"excludes=s" => \$excludes,
 	"indent=s"   => \$indent,
 	"build"      => \$build,);
-GetOptions(%options) || die "bad command line";
+GetOptions(%options) || die "bad command line argument";
 
 run_build($code_base) if ($build);
 
@@ -118,10 +118,11 @@ sub load_typedefs
 			if (-f "$tdtry/src/tools/pgindent/typedefs.list");
 		$tdtry = "$tdtry/..";
 	}
-	die "no typedefs file" unless $typedefs_file && -f $typedefs_file;
+	die "cannot locate typedefs file \"$typedefs_file\""
+	  unless $typedefs_file && -f $typedefs_file;
 
 	open(my $typedefs_fh, '<', $typedefs_file)
-	  || die "opening $typedefs_file: $!";
+	  || die "cannot open typedefs file \"$typedefs_file\": $!";
 	my @typedefs = <$typedefs_fh>;
 	close($typedefs_fh);
 
@@ -143,7 +144,7 @@ sub process_exclude
 {
 	if ($excludes && @files)
 	{
-		open(my $eh, '<', $excludes) || die "opening $excludes";
+		open(my $eh, '<', $excludes) || die "cannot open exclude file \"$excludes\"";
 		while (my $line = <$eh>)
 		{
 			chomp $line;
@@ -162,7 +163,7 @@ sub read_source
 	my $source;
 
 	open(my $src_fd, '<', $source_filename)
-	  || die "opening $source_filename: $!";
+	  || die "cannot open file \"$source_filename\": $!";
 	local ($/) = undef;
 	$source = <$src_fd>;
 	close($src_fd);
@@ -177,7 +178,7 @@ sub write_source
 	my $source_filename = shift;
 
 	open(my $src_fh, '>', $source_filename)
-	  || die "opening $source_filename: $!";
+	  || die "cannot open file \"$source_filename\": $!";
 	print $src_fh $source;
 	close($src_fh);
 }
@@ -436,25 +437,25 @@ sub run_build
 		$code_base = "$code_base/..";
 	}
 
-	die "no src/tools/pgindent directory in $code_base"
+	die "cannot locate src/tools/pgindent directory in \"$code_base\""
 	  unless -d "$code_base/src/tools/pgindent";
 
 	chdir "$code_base/src/tools/pgindent";
 
-	my $rv = getstore("http://buildfarm.postgresql.org/cgi-bin/typedefs.pl",
-		"tmp_typedefs.list");
+	my $typedefs_list_url = "http://buildfarm.postgresql.org/cgi-bin/typedefs.pl";
 
-	die "fetching typedefs.list" unless is_success($rv);
+	my $rv = getstore($typedefs_list_url, "tmp_typedefs.list");
+
+	die "cannot fetch typedefs list from $typedefs_list_url" unless is_success($rv);
 
 	$ENV{PGTYPEDEFS} = abs_path('tmp_typedefs.list');
 
-	my $pg_bsd_indent_name = "pg_bsd_indent-" . $INDENT_VERSION . ".tar.gz";
+	my $pg_bsd_indent_url = "ftp://ftp.postgresql.org/pub/dev/pg_bsd_indent-" .
+	  $INDENT_VERSION . ".tar.gz";
 
-	$rv =
-	  getstore("ftp://ftp.postgresql.org/pub/dev/$pg_bsd_indent_name",
-		"pg_bsd_indent.tgz");
+	$rv = getstore($pg_bsd_indent_url, "pg_bsd_indent.tgz");
 
-	die "fetching $pg_bsd_indent_name" unless is_success($rv);
+	die "cannot fetch BSD indent tarfile from $pg_bsd_indent_url" unless is_success($rv);
 
 	# XXX add error checking here
 
@@ -484,7 +485,7 @@ sub build_clean
 		$code_base = "$code_base/..";
 	}
 
-	die "no src/tools/pgindent directory in $code_base"
+	die "cannot locate src/tools/pgindent directory in \"$code_base\""
 	  unless -d "$code_base/src/tools/pgindent";
 
 	chdir "$code_base";

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Fri, Apr 12, 2013 at 12:07:36PM -0700, Jeff Davis wrote: > > (Attached patch is discussion only. Checking checksum in recovery > > isn't coded at all.) > > I like it. > > A few points: > > * Given that setting the checksum is unconditional in a backup block, do > we want to zero the checksum field when the backup block is restored if > checksums are disabled? Otherwise we would have a strange situation > where some blocks have a checksum on disk even when checksums are > disabled. > > * When we do PageSetChecksumInplace(), we need to be 100% sure that the > hole is empty; otherwise the checksum will fail when we re-expand it. It > might be worth a memset beforehand just to be sure. Do we write the page holes to the WAL for full-page writes? I hope we don't. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Kevin Grittner writes: > For now what I'm suggesting is generating statistics in all the > cases it did before, plus the case where it starts truncation but > does not complete it. The fact that before this patch there were > cases where the autovacuum worker was killed, resulting in not > generating needed statistics seems like a bug, not a behavior we > need to preserve. Well, in the case where it gets killed, it's still not gonna generate statistics. What we've really got here is a new case that did not exist before, namely that it voluntarily stops truncating. But I agree that modeling that case's behavior on the kill case was poorly thought out. In other words, yes, I think we're on the same page. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Thu, 2013-04-11 at 20:12 +0100, Simon Riggs wrote: > So, if we apply a patch like the one attached, we then end up with the > WAL checksum using the page checksum as an integral part of its > calculation. (There is no increase in code inside WALInsertLock, > nothing at all touched in that area). > > > Then all we need to do is make PageSetChecksumInplace() use Ants' algo > and we're done. > > > Only point worth discussing is that this change would make backup > blocks be covered by a 16-bit checksum, not the CRC-32 it is now. i.e. > the record header is covered by a CRC32 but the backup blocks only by > 16-bit. FWIW, that's fine with me. > (Attached patch is discussion only. Checking checksum in recovery > isn't coded at all.) I like it. A few points: * Given that setting the checksum is unconditional in a backup block, do we want to zero the checksum field when the backup block is restored if checksums are disabled? Otherwise we would have a strange situation where some blocks have a checksum on disk even when checksums are disabled. * When we do PageSetChecksumInplace(), we need to be 100% sure that the hole is empty; otherwise the checksum will fail when we re-expand it. It might be worth a memset beforehand just to be sure. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost writes:
> > The big win here over a binary COPY is pulling through the indexes as-is
> > as well- without having to rebuild them.
> [... lots of reasons this is hard ...]

I agree that it's quite a bit more difficult, to the point that logical
replication which can be selective (e.g., give me only table X + indexes)
might end up being the only answer; otherwise this approach will likely
be only a modest improvement over binary COPY FREEZE, and then only
because we essentially end up skipping the type validation (which we
could just provide as an option, similar to COPY FREEZE...).

	Thanks,

		Stephen

signature.asc
Description: Digital signature
Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks
>> A problem regarding validation of the sepgsql-regtest policy module
>> is that the semodule command requires root privilege to list the
>> installed policy modules, so I avoided using this command in the
>> test_sepgsql script.
>> However, I have an idea that would not make the script fail even if
>> "sudo semodule -l" returned an error, except in the case where it can
>> run correctly and the policy version is not the expected one.
>> What is your opinion of this check?
>
> Not sure that's too useful. And I don't like the idea of putting sudo
> commands in a test harness script. That seems too much like the sort
> of thing bad people do.

OK, I also doubt whether my idea makes sense. The attached patch omits
the portion that checks the version of sepgsql-regtest, and adds some
notes in the documentation instead. Also, it changes the current
directory to contrib/sepgsql at the top of the script, to avoid problems
when test_sepgsql is run from a directory other than contrib/sepgsql.

Thanks,
--
KaiGai Kohei

sepgsql-v9.3-test-script-fixup.v2.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks
2013/4/12 Robert Haas:
> On Fri, Apr 12, 2013 at 10:42 AM, Alvaro Herrera wrote:
>> Robert Haas escribió:
>>> On Mon, Apr 8, 2013 at 12:28 PM, Kohei KaiGai wrote:
>>> > Also, the attached function-execute-permission patch is a rebased
>>> > version. I rethought its event name should be OAT_FUNCTION_EXECUTE,
>>> > rather than OAT_FUNCTION_EXEC, following the convention of avoiding
>>> > abbreviations. The other portions are the same as the previous ones.
>>>
>>> Great. This looks fine to me, committed. I also committed your
>>> getObjectIdentity patch, which caused some regression test output
>>> changes. I applied the necessary correction before committing, I
>>> think, but please check.
>>
>> I think the function-execute code path is still using
>> getObjectDescription rather than identity.
>
> Yeah, I think so. KaiGai, can you provide a follow-on patch to fix that?

Yes, of course. The attached patch replaces the getObjectDescription()
call in sepgsql/proc.c and makes the related changes in the regression
test.

Thanks,
--
KaiGai Kohei

sepgsql-v9.3-replace-get-object-description.v2.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
[some relevant dropped bits of the thread restored]

Tom Lane wrote:
> Kevin Grittner writes:
>> Tom Lane wrote:
>>> Kevin Grittner writes:

Jeff Janes wrote:

I propose to do the following:

(1) Restore the prior behavior of the VACUUM command. This was
only ever intended to be a fix for a serious autovacuum problem
which caused many users serious performance problems.

(2) If autovacuum decides to try to truncate but the lock cannot
be initially acquired, and analyze is requested, skip the
truncation and do the autoanalyze.

>>> I think that the minimum appropriate fix here is to [...] take
>>> out the suppression of stats reporting and analysis.
>>
>> I'm not sure I understand -- are you proposing that is all we do
>> for both the VACUUM command and autovacuum?
>
> No, I said that was the minimum fix.

OK, I suggested that and more, so I wasn't sure what you were getting at.

> OK, I see that now. In the old behavior, if the lock was
> acquired, but then we were shoved off from it, the analyze
> was not done. But, in the old behavior, if the lock was never
> acquired at all, then it would go ahead to do the
> autoanalyze,

Ah, I see now. So the actual worst case for the old code, in terms of
both head-banging and statistics, was if autovacuum was able to acquire
the lock but then many tasks all piled up behind its lock. If the system
was even *more* busy, it would not acquire the lock at all, and would
behave better.

> and I suppose the rationale for suppressing the stats report was
> this same idea of lying to the stats collector in order to
> encourage a new vacuum attempt to happen right away.

I think Jan expressed some such sentiment back during the original
discussion. I was not persuaded by that; but he pointed out that if the
deadlock killer killed an autovacuum process which was doing a truncate,
then it did not get to the statistics phase; so I agreed that any change
in that behavior should be a separate patch. I missed the fact that if
it failed to initially get the lock, it did proceed to the statistics
phase. I explained this earlier in this thread. No need to cast about
for hypothetical explanations.

> Now I'm not sure that that's a good idea at all

I'm pretty sure it isn't; that's why I proposed changing it.

> But if it is reasonable, we need a redesign of the reporting
> messages, not just a hack to not tell the stats collector what we
> did.

The idea was to try to make as small a change in previous behavior as
possible. Jan pointed out that when the deadlock detection code killed
an autovacuum worker which was trying to truncate, the statistics were
not updated, leading to retries. This was an attempt to *not change*
existing behavior. It was wrong, because we both missed the fact that if
it didn't get the lock in the first place it went ahead with statistics
generation. That being the case, I was proposing we always generate
statistics if we were supposed to. That would be a change toward *more*
up-to-date statistics and *fewer* truncation retries than we've had. I'm
OK with that because a table hot enough to hit the issue will likely
need the space again or need another vacuum soon.

> Are you saying you intend to revert that whole concept?

No. I was merely asking what you were suggesting. As I said earlier:

I have seen cases where the old logic head-banged for hours or days
without succeeding at the truncation attempt in autovacuum, absolutely
killing performance until the user ran an explicit VACUUM.
And in the meantime, since the deadlock detection logic was killing autovacuum before it got to the analyze phase, the autoanalyze was never done. > Otherwise we need some thought about how to inform the stats > collector what's really happening. I think we can probably improve that on some future release. I don't think a new scheme for that makes sense for back-patching or 9.3. For now what I'm suggesting is generating statistics in all the cases it did before, plus the case where it starts truncation but does not complete it. The fact that before this patch there were cases where the autovacuum worker was killed, resulting in not generating needed statistics seems like a bug, not a behavior we need to preserve. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> I suppose it would still be faster than a COPY transfer, but I'm not >> sure it'd be enough faster to justify the work and the additional >> portability hits you'd be taking. > The big win here over a binary COPY is pulling through the indexes as-is > as well- without having to rebuild them. Meh. That raises the ante another substantial multiple with respect to the amount of portability risk (eg, you're now absolutely dependent on locale sort orders to be identical in both databases). And I think you'd have to freeze all updates to the table while you were copying the table+indexes, if you wanted them to be consistent. I can't imagine that we'd accept a patch that says to the recipient database, "here are some large binary blobs, please believe that they represent a valid table and associated indexes. Oh, and don't you dare try to actually check them, because that would be slow." Some other interesting things to think about here would be toast-table OIDs embedded in toast pointers, data type OIDs embedded in arrays (and maybe records too, I forget), enum value OIDs, btree vacuum cycle IDs, GiST NSNs ... not sure what else, but I bet that's not a complete list. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Well, it wouldn't be that hard to replace XIDs with FrozenXID or > InvalidXID as appropriate, if you had access to the source database's > clog while you did the copying. It just wouldn't be very fast. If you're doing that in a streaming method, it strikes me that it'd be plenty fast. > I suppose it would still be faster than a COPY transfer, but I'm not > sure it'd be enough faster to justify the work and the additional > portability hits you'd be taking. The big win here over a binary COPY is pulling through the indexes as-is as well- without having to rebuild them. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Tom Lane escribió: > Are you saying you intend to revert that whole concept? That'd be > okay with me, I think. Otherwise we need some thought about how to > inform the stats collector what's really happening. Maybe what we need is to consider table truncation as a separate activity from vacuuming. Then autovacuum could invoke it without having to do a full-blown vacuum. For this to work, I guess we would like to separately store the status of the back-scan in pgstat somehow (I think a boolean flag suffices: "were we able to truncate all pages that appeared to be empty?") -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Kevin Grittner writes: > Tom Lane wrote: >> I think that the minimum appropriate fix here is to revert the hunk >> I quoted, ie take out the suppression of stats reporting and analysis. > I'm not sure I understand -- are you proposing that is all we do > for both the VACUUM command and autovacuum? No, I said that was the minimum fix. Looking again at the patch, I note this comment: /* +* We failed to establish the lock in the specified number of +* retries. This means we give up truncating. Suppress the +* ANALYZE step. Doing an ANALYZE at this point will reset the +* dead_tuple_count in the stats collector, so we will not get +* called by the autovacuum launcher again to do the truncate. +*/ and I suppose the rationale for suppressing the stats report was this same idea of lying to the stats collector in order to encourage a new vacuum attempt to happen right away. Now I'm not sure that that's a good idea at all --- what's the reasoning for thinking the table will be any less hot in thirty seconds? But if it is reasonable, we need a redesign of the reporting messages, not just a hack to not tell the stats collector what we did. Are you saying you intend to revert that whole concept? That'd be okay with me, I think. Otherwise we need some thought about how to inform the stats collector what's really happening. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to make pgindent work cleanly
On Fri, Apr 12, 2013 at 11:44 AM, Bruce Momjian wrote: > On Tue, Feb 19, 2013 at 04:50:45PM -0500, Gurjeet Singh wrote: > > Please find attached the patch for some cleanup and fix bit rot in > pgindent > > script. > > > > There were a few problems with the script. > > > > .) It failed to use the $ENV{PGENTAB} even if it was set. > > .) The file it tries to download from Postgres' ftp site is no longer > present. > > ftp://ftp.postgresql.org/pub/dev/indent.netbsd.patched.tgz > > .) The tar command extracted the above-mentioned file to a child > directory, but > > did not descend into it before running make on it. > > I think it expected a tarbomb, but clearly the current .tgz file on > ftp > > site is not a tarbomb. > > > > I don't like the fact that it dies with a message "fetching xyz" rather > than > > saying "Could not fetch xyz", but this patch does not address that since > it > > doesn't really affect the output when script does work. > > > > With this patch in place I could very easily run it on any arbitrary > file, > > which seemed like a black-magic before the patch. > > > > src/tools/pgindent/pgindent --build > > I have applied this patch. However, I modified the tarball name to > reference $INDENT_VERSION, rather than hard-coding "1.2". Thanks! Can you also improve the output when it dies upon failure to fetch something? Currently the only error message it emits is "fetching xyz", and leaves the user confused as to what really the problem was. The only indication of a problem might be the exit code, but I'm not sure of that, and that doesn't help the interactive user running it on terminal. -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] [ADMIN] after 9.2.4 patch vacuumdb -avz not analyzing all tables
Looks like psql> vacuum (verbose, analyze) is not reflected in
pg_stat_user_tables, either, in some cases. In this scenario I run the
command and it outputs all the deleted pages etc. (unlike the
vacuumdb -avz analyze that seemed to be skipped in the log), but it does
not update pg_stat_user_tables. That's probably expected based on the
description previously reported, but I wanted to confirm what I was
seeing.

On Fri, Apr 12, 2013 at 10:36 AM, Kevin Grittner wrote:
> Scott Marlowe wrote:
>
> > Does this behavior only affect the 9.2 branch? Or was it ported
> > to 9.1 or 9.0 or 8.4 as well?
>
> After leaving it on master for a while to see if anyone reported
> problems in development, I back-patched as far as 9.0 in time for
> the 9.2.3 (and related) patches. Prior to that the code was too
> different for it to be the same patch, and (perhaps not entirely
> coincidentally) I had not seen the problems before 9.0. From 9.0
> on I have seen multiple sites (all using queuing from Slony or a
> JMS implementation) with recurring problems when the queue
> temporarily got large, shrank again, and then wrapped around to the
> beginning of the table's file space. In some cases performance was
> so impaired that when such an event was triggered they would shut
> down their application until a manual VACUUM could be run.
>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Re: [HACKERS] Enabling Checksums
On Wed, Apr 10, 2013 at 11:19:56AM -0700, Jeff Davis wrote: > On Wed, 2013-04-10 at 11:01 +0300, Ants Aasma wrote: > > I think we should first deal with using it for page checksums and if > > future versions want to reuse some of the code for WAL checksums then > > we can rearrange the code. > > Sounds good to me, although I expect we at least want any assembly to be > in a separate file (if the specialization makes it in 9.3). Sounds good. Simon has done a good job shepherding this to completion. My only question is whether the 16-bit page checksums stored in WAL reduce our ability to detect failed/corrupt writes to WAL? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Andres Freund writes: > On 2013-04-12 13:09:02 -0400, Tom Lane wrote: >> However, we're still thinking too small. I've been wondering whether we >> couldn't entirely remove the dirty, awful kluges that were installed in >> the lock manager to kill autovacuum when somebody blocked behind it. >> This mechanism should ensure that AV never takes an exclusive lock >> for long enough to be a serious problem, so do we need that anymore? > Wouldn't that make DROP TABLE stop working while autovac is processing > the table? Meh ... I guess you're right. I wasn't thinking about exclusive locks being taken elsewhere. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Tom Lane wrote:
> Kevin Grittner writes:
>> OK, will review that to confirm; but assuming that's right, and
>> nobody else is already working on a fix, I propose to do the
>> following:
>>
>> (1) Restore the prior behavior of the VACUUM command. This was
>> only ever intended to be a fix for a serious autovacuum problem
>> which caused many users serious performance problems -- in some
>> cases including unscheduled down time. I also saw sites where,
>> having been bitten by this, they disabled autovacuum and later ran
>> into problems with bloat and/or xid wraparound.
>>
>> (2) If autovacuum decides to try to truncate but the lock cannot
>> be initially acquired, and analyze is requested, skip the
>> truncation and do the autoanalyze. If the table is so hot that we
>> cannot get the lock, the space may get re-used soon, and if not
>> there is a good chance another autovacuum will trigger soon. If
>> the user really wants the space released to the OS immediately,
>> they can run a manual vacuum to force the issue.
>
> I think that the minimum appropriate fix here is to revert the hunk
> I quoted, ie take out the suppression of stats reporting and analysis.

I'm not sure I understand -- are you proposing that is all we do for
both the VACUUM command and autovacuum? (i.e., we don't try to fully
restore the old VACUUM command behavior, just the troublesome failure to
generate statistics?)

> However, we're still thinking too small. I've been wondering whether we
> couldn't entirely remove the dirty, awful kluges that were installed in
> the lock manager to kill autovacuum when somebody blocked behind it.
> This mechanism should ensure that AV never takes an exclusive lock
> for long enough to be a serious problem, so do we need that anymore?

Are you suggesting that just in master/HEAD, or back to 9.0? If the
latter, what existing problem does it cure (besides ugly code)?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Object oriented programming language native to EAV
This is a programming language environment whose parser and interpreter are written in PL/pgSQL; a proof-of-concept prototype has been tested. It is an object-oriented programming language implemented with a self-describing Entity-Attribute-Value model that stores objects and object metadata descriptions. The environment has features of a column-oriented database, and high-level language commands are mapped to sets of key-value pairs that comprise the low-level language code. http://sproutpl.wordpress.com
Re: [HACKERS] (auto)vacuum truncate exclusive lock
On 2013-04-12 13:09:02 -0400, Tom Lane wrote: > Kevin Grittner writes: > > OK, will review that to confirm; but assuming that's right, and > > nobody else is already working on a fix, I propose to do the > > following: > > > (1) Restore the prior behavior of the VACUUM command. This was > > only ever intended to be a fix for a serious autovacuum problem > > which caused many users serious performance problems -- in some > > cases including unscheduled down time. I also saw sites where, > > having been bitten by this, they disabled autovacuum and later ran > > into problems with bloat and/or xid wraparound. > > > (2) If autovacuum decides to try to truncate but the lock cannot > > be initially acquired, and analyze is requested, skip the > > truncation and do the autoanalyze. If the table is so hot that we > > cannot get the lock, the space may get re-used soon, and if not > > there is a good chance another autovacuum will trigger soon. If > > the user really wants the space released to the OS immediately, > > they can run a manual vacuum to force the issue. > > I think that the minimum appropriate fix here is to revert the hunk > I quoted, ie take out the suppression of stats reporting and analysis. > > However, we're still thinking too small. I've been wondering whether we > couldn't entirely remove the dirty, awful kluges that were installed in > the lock manager to kill autovacuum when somebody blocked behind it. > This mechanism should ensure that AV never takes an exclusive lock > for long enough to be a serious problem, so do we need that anymore? Wouldn't that make DROP TABLE stop working while autovac is processing the table? Considering how long e.g. a full table vacuum on a huge table can take, that doesn't seem to be acceptable. Sure, you can manually kill the autovac process, but it will soon be restarted. So you have to know that you need to start the DROP TABLE beforehand so it will get the lock quicker, and such. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Kevin Grittner writes: > OK, will review that to confirm; but assuming that's right, and > nobody else is already working on a fix, I propose to do the > following: > (1) Restore the prior behavior of the VACUUM command. This was > only ever intended to be a fix for a serious autovacuum problem > which caused many users serious performance problems -- in some > cases including unscheduled down time. I also saw sites where, > having been bitten by this, they disabled autovacuum and later ran > into problems with bloat and/or xid wraparound. > (2) If autovacuum decides to try to truncate but the lock cannot > be initially acquired, and analyze is requested, skip the > truncation and do the autoanalyze. If the table is so hot that we > cannot get the lock, the space may get re-used soon, and if not > there is a good chance another autovacuum will trigger soon. If > the user really wants the space released to the OS immediately, > they can run a manual vacuum to force the issue. I think that the minimum appropriate fix here is to revert the hunk I quoted, ie take out the suppression of stats reporting and analysis. However, we're still thinking too small. I've been wondering whether we couldn't entirely remove the dirty, awful kluges that were installed in the lock manager to kill autovacuum when somebody blocked behind it. This mechanism should ensure that AV never takes an exclusive lock for long enough to be a serious problem, so do we need that anymore? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
On Fri, Apr 12, 2013 at 10:22:38PM +0530, Pavan Deolasee wrote: > > > > On Fri, Apr 12, 2013 at 9:44 PM, Tom Lane wrote: > > Andrew Dunstan writes: > > On 04/12/2013 10:15 AM, Tom Lane wrote: > >> There's 0 chance of making that work, because the two databases > wouldn't > >> have the same notions of committed XIDs. > > > Yeah. Trying to think way outside the box, could we invent some sort of > > fixup mechanism that could be applied to adopted files? > > Well, it wouldn't be that hard to replace XIDs with FrozenXID or > InvalidXID as appropriate, if you had access to the source database's > clog while you did the copying. It just wouldn't be very fast. > > > > Would it be possible to fix the XIDs *after* copying the data files, > potentially on a different server so as to avoid any additional overhead on > the > main server? I guess so, though we will probably need some mechanism to lock > out access to the table (which seems easy), flush all its data pages to the > disk and some way to reliably flush all clog pages as well so that they can be > copied along with the data files. The page LSNs seem to be easy to handle and > can be easily zeroed out outside the server. > > I wonder though if this all looks like material for something like pg_reorg > (pg_repack), though some kind of support from core may be required. Uh, now that you mention it, pg_upgrade in non-link mode does something similar, in that it copies the data files and clog. You could use pg_upgrade in non-link mode, run VACUUM FREEZE on the upgraded cluster, and then copy the data files. The only problem is that pg_upgrade can't upgrade tablespaces with the same system catalog version because the tablespace directory names would conflict. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
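For reference, the freeze step in Bruce's pg_upgrade-based recipe is plain SQL; a minimal sketch, run on the upgraded cluster before the data files are copied (the table name is a placeholder):

    -- Freeze every table in the current database:
    VACUUM FREEZE;
    -- Or freeze just the table(s) being transported:
    VACUUM FREEZE my_table;  -- my_table is a placeholder name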
Re: [HACKERS] [PATCH] pg_regress and non-default unix socket path
Robert Haas writes: > The hunk that changes the messages might need some thought so that it > doesn't cause a translation regression. But in general I see no > reason not to do this before we release beta1. It seems safe enough, > and changes that reduce the need for packagers to carry private > patches are, I think, generally a good thing. It looks to me like this is asking for pg_regress to adopt a nonstandard interpretation of PGHOST, which doesn't seem like a wise thing at all, especially if it's not documented. FWIW, the equivalent thing in the Red Hat/Fedora packages can be seen in this patch: http://pkgs.fedoraproject.org/cgit/postgresql.git/plain/postgresql-var-run-socket.patch which would not get noticeably shorter if we hacked pg_regress in the suggested way. AFAICT, instead of touching pg_regress.c, Red Hat's patch would need to do something to the regression Makefiles if we wanted to use this implementation. I'm not convinced that'd be better at all. TBH, if this is committed, the Red Hat patches will probably end up reverting it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
On Fri, Apr 12, 2013 at 9:44 PM, Tom Lane wrote: > Andrew Dunstan writes: > > On 04/12/2013 10:15 AM, Tom Lane wrote: > >> There's 0 chance of making that work, because the two databases wouldn't > >> have the same notions of committed XIDs. > > > Yeah. Trying to think way outside the box, could we invent some sort of > > fixup mechanism that could be applied to adopted files? > > Well, it wouldn't be that hard to replace XIDs with FrozenXID or > InvalidXID as appropriate, if you had access to the source database's > clog while you did the copying. It just wouldn't be very fast. > > Would it be possible to fix the XIDs *after* copying the data files, potentially on a different server so as to avoid any additional overhead on the main server? I guess so, though we will probably need some mechanism to lock out access to the table (which seems easy), flush all its data pages to the disk and some way to reliably flush all clog pages as well so that they can be copied along with the data files. The page LSNs seem to be easy to handle and can be easily zeroed out outside the server. I wonder though if this all looks like material for something like pg_reorg (pg_repack), though some kind of support from core may be required. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
On 2013-04-12 12:14:24 -0400, Tom Lane wrote: > Andrew Dunstan writes: > > On 04/12/2013 10:15 AM, Tom Lane wrote: > >> There's 0 chance of making that work, because the two databases wouldn't > >> have the same notions of committed XIDs. > > > Yeah. Trying to think way outside the box, could we invent some sort of > > fixup mechanism that could be applied to adopted files? > > Well, it wouldn't be that hard to replace XIDs with FrozenXID or > InvalidXID as appropriate, if you had access to the source database's > clog while you did the copying. It just wouldn't be very fast. I think if one goes over the heap and hint-bits everything (so the item pointers don't have to be immediately rewritten), freezes everything, and such, it should be doable at about disk speed unless you have a really fast disk subsystem. But it still is fairly complicated and I doubt it's really necessary. > I suppose it would still be faster than a COPY transfer, but I'm not > sure it'd be enough faster to justify the work and the additional > portability hits you'd be taking. Using binary COPY might already give quite a speedup. Sameer, did you try that? Also, do you really need parts of a cluster or would a base backup of the whole cluster do the trick? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
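To make the binary COPY suggestion concrete, a minimal sketch with placeholder table and file names (FORMAT binary requires an identical column layout on both ends):

    -- On the source cluster:
    COPY my_table TO '/tmp/my_table.dat' (FORMAT binary);
    -- On the destination cluster, into a pre-created table with the same columns:
    COPY my_table FROM '/tmp/my_table.dat' (FORMAT binary);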
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
Andrew Dunstan writes: > On 04/12/2013 10:15 AM, Tom Lane wrote: >> There's 0 chance of making that work, because the two databases wouldn't >> have the same notions of committed XIDs. > Yeah. Trying to think way outside the box, could we invent some sort of > fixup mechanism that could be applied to adopted files? Well, it wouldn't be that hard to replace XIDs with FrozenXID or InvalidXID as appropriate, if you had access to the source database's clog while you did the copying. It just wouldn't be very fast. I suppose it would still be faster than a COPY transfer, but I'm not sure it'd be enough faster to justify the work and the additional portability hits you'd be taking. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
Andrew Dunstan wrote: > > On 04/12/2013 10:15 AM, Tom Lane wrote: > >Sameer Thakur writes: > >>The proposed tool tries to make migration faster for tables and indices > >>only by copying their binary data files. > >There's 0 chance of making that work, because the two databases wouldn't > >have the same notions of committed XIDs. > > Yeah. Trying to think way outside the box, could we invent some sort > of fixup mechanism that could be applied to adopted files? Of > course, that could slow things down so much that it wouldn't be > worth it, but it might be a nice research project. I think the fixup procedure involves freezing Xids (prior to the transporting), which the OP said he didn't want to do. If you don't freeze beforehand, there's not enough info in the new cluster to know which tuples are dead/alive. Another option would be to have a "private" copy of pg_clog/pg_subtrans for the transported table(s), but that seems very difficult to arrange. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] (auto)vacuum truncate exclusive lock
Jeff Janes wrote: >> If we're going to have the message, we should make it useful. >> My biggest question here is not whether we should add this info, >> but whether it should be DEBUG instead of LOG > I like it being LOG. If it were DEBUG, I don't think anyone > would be likely to see it when they needed to, as it happens > sporadically on busy servers and I don't think people would run > those with DEBUG on. I figure it is analogous to an autovacuum > cancel message it partially replaces, and those are LOG. OK, sold. >>> Also, I think that permanently boycotting doing autoanalyze >>> because someone is camping out on an access share lock (or >>> because there is a never-ending stream of overlapping locks) >>> and so the truncation cannot be done is a bit drastic, >>> especially for inclusion in a point release. >> That much is not a change in the event that the truncation does >> not complete. > OK, I see that now. In the old behavior, if the lock was > acquired, but then we were shoved off from it, the analyze was > not done. But, in the old behavior if the lock was never > acquired at all, then it would go ahead to do the autoanalyze, > and that has changed. That is the way I was testing it > (camping out on an access shared lock so the access exclusive > could never be granted in the first place; because intercepting > it during the truncate phase was hard to do) and I just assumed > the behavior I saw would apply to both cases, but it does not. Ah, I see now. So the actual worst case for the old code, in terms of both head-banging and statistics, was if autovacuum was able to acquire the lock but then many tasks all piled up behind its lock. If the system was even *more* busy it would not acquire the lock at all, and would behave better. >> I have seen cases where the old logic head-banged for >> hours or days without succeeding at the truncation attempt in >> autovacuum, absolutely killing performance until the user ran an >> explicit VACUUM. And in the meantime, since the deadlock >> detection logic was killing autovacuum before it got to the >> analyze phase, the autoanalyze was never done. > OK, so there are three problems. It would take a second to yield; in > doing so it would abandon all the progress it had made in that > second rather than saving it, and it would tight loop (restricted > by naptime) on this because of the lack of analyze. So it fixed > the first two in a way that seems an absolute improvement for the > auto case, but it made the third one worse in a common case, > where it never acquires the lock in the first place, and so > doesn't analyze when before it did in that one case. Yeah, I see that now. >> Perhaps the new logic should go ahead and get its lock even on a >> busy system (like the old logic), > As far as I can tell, the old logic was always conditional on the > AccessExclusive lock acquisition, whether it was manual or auto. OK, will review that to confirm; but assuming that's right, and nobody else is already working on a fix, I propose to do the following: (1) Restore the prior behavior of the VACUUM command. This was only ever intended to be a fix for a serious autovacuum problem which caused many users serious performance problems -- in some cases including unscheduled down time. I also saw sites where, having been bitten by this, they disabled autovacuum and later ran into problems with bloat and/or xid wraparound. 
(2) If autovacuum decides to try to truncate but the lock cannot be initially acquired, and analyze is requested, skip the truncation and do the autoanalyze. If the table is so hot that we cannot get the lock, the space may get re-used soon, and if not there is a good chance another autovacuum will trigger soon. If the user really wants the space released to the OS immediately, they can run a manual vacuum to force the issue. If I don't hear anything within the next day or two, I'll write that up and post it here before applying (and back-patching to affected branches). -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
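To spell out the escape hatch in point (2): a manual vacuum attempts the truncation and reports what it did, so the space can still be returned to the OS on demand (the table name is a placeholder):

    VACUUM VERBOSE my_table;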
Re: [HACKERS] Analyzing bug 8049
"Dickson S. Guedes" writes: > In my tests, after ANALYZE _bug_header and _bug_line, the query plan > changes and query results returns as expected. Is this a chance that > things isn't too bad? No, it just means that in this particular scenario, the bug only manifests if a nestloop plan is chosen --- without that, there's no possibility of pushing a join clause down to the scan level anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
On 04/12/2013 10:15 AM, Tom Lane wrote: > Sameer Thakur writes: >> The proposed tool tries to make migration faster for tables and indices >> only by copying their binary data files. > There's 0 chance of making that work, because the two databases wouldn't > have the same notions of committed XIDs. Yeah. Trying to think way outside the box, could we invent some sort of fixup mechanism that could be applied to adopted files? Of course, that could slow things down so much that it wouldn't be worth it, but it might be a nice research project. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to make pgindent work cleanly
On Tue, Feb 19, 2013 at 04:50:45PM -0500, Gurjeet Singh wrote: > Please find attached a patch for some cleanup and to fix bit rot in the pgindent > script. > > There were a few problems with the script. > > .) It failed to use $ENV{PGENTAB} even if it was set. > .) The file it tries to download from Postgres' ftp site is no longer present. > ftp://ftp.postgresql.org/pub/dev/indent.netbsd.patched.tgz > .) The tar command extracted the above-mentioned file to a child directory, > but > did not descend into it before running make on it. > I think it expected a tarbomb, but clearly the current .tgz file on the ftp > site is not a tarbomb. > > I don't like the fact that it dies with a message "fetching xyz" rather than > saying "Could not fetch xyz", but this patch does not address that since it > doesn't really affect the output when the script does work. > > With this patch in place I could very easily run it on any arbitrary file, > which seemed like black magic before the patch. > > src/tools/pgindent/pgindent --build I have applied this patch. However, I modified the tarball name to reference $INDENT_VERSION, rather than hard-coding "1.2". -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing bug 8049
On Fri, 2013-04-12 at 10:58 -0400, Tom Lane wrote: > Robert Haas writes: > > On Thu, Apr 11, 2013 at 1:25 PM, Tom Lane wrote: > >> The plan I'm considering is to get this written and committed to HEAD > >> in the next week, so that it can go out in 9.3beta1. After the patch > >> has survived a reasonable amount of beta testing, I'd be more comfortable > >> about back-patching into 9.2. > > > I'm not very sanguine about the chances that back-patching this won't > > provoke any screams of agony ... but I don't have a better idea, > > either. Letting queries return wrong answers isn't a superior > > solution, for sure. > > The only alternative I can see is to make a back-patch that just teaches > get_eclass_for_sort_expr() to compute valid nullable_relids for the sort > expression. In my tests, after ANALYZE _bug_header and _bug_line, the query plan changes and the query results return as expected. Is there a chance that things aren't too bad? []s -- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br http://www.rnp.br/keyserver/pks/lookup?search=0x8F3E3C06D428D10A
Re: [HACKERS] [PATCH] pg_regress and non-default unix socket path
Robert Haas wrote: > On Tue, Apr 9, 2013 at 8:08 AM, Christoph Berg wrote: > > Debian has been patching pg_regress for years because our default unix > > socket directory is /var/run/postgresql, but that is not writable by > > the build user at build time. This used to be a pretty ugly "make- > > patch-make check-unpatch-make install" patch dance, but now it is a > > pretty patch that makes pg_regress accept --host=/tmp. > > > > The patch is already part of the .deb files on apt.postgresql.org and > > passes all regression test suites there. > > > > Please consider it for 9.3. (And maybe backpatch it all the way...) > > The hunk that changes the messages might need some thought so that it > doesn't cause a translation regression. FWIW I don't think we translate pg_regress at all, and I have my doubts that it makes much sense. I'd go as far as to suggest we get rid of the _() decorations in the lines we're changing. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] after 9.2.4 patch vacuumdb -avz not analyzing all tables
Scott Marlowe wrote: > Does this behavior only affect the 9.2 branch? Or was it ported > to 9.1 or 9.0 or 8.4 as well? After leaving it on master for a while to see if anyone reported problems in development, I back-patched as far as 9.0 in time for the 9.2.3 (and related) patches. Prior to that the code was too different for it to be the same patch, and (perhaps not entirely coincidentally) I had not seen the problems before 9.0. From 9.0 on I have seen multiple sites (all using queuing from Slony or a JMS implementation) with recurring problems when the queue temporarily got large, shrank again, and then wrapped around to the beginning of the table's file space. In some cases performance was so impaired that when such an event was triggered they would shut down their application until a manual VACUUM could be run. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to use EXPLAIN (TIMING)
On 2013-04-12 20:44:02 +0530, Robins Tharakan wrote: > Hi, > > While creating regression tests for EXPLAIN I am (somehow) unable to get > (TIMING) option to work with EXPLAIN! > > I must be doing something stupid but all these options below just didn't > work. Could someone point me to the right direction? EXPLAIN (ANALYZE, TIMING on/off) ...; Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
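Spelled out against the examples below, TIMING is a sub-option that goes inside the same parenthesized list as ANALYZE:

    EXPLAIN (ANALYZE, TIMING on) SELECT 1;   -- per-node timing, the default with ANALYZE
    EXPLAIN (ANALYZE, TIMING off) SELECT 1;  -- row counts without the timer overhead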
[HACKERS] How to use EXPLAIN (TIMING)
Hi, While creating regression tests for EXPLAIN I am (somehow) unable to get (TIMING) option to work with EXPLAIN! I must be doing something stupid but all these options below just didn't work. Could someone point me to the right direction? mpf2=# explain (TIMING) SELECT 1; ERROR: EXPLAIN option TIMING requires ANALYZE mpf2=# EXPLAIN (TIMING FALSE) SELECT 1; QUERY PLAN -- Result (cost=0.00..0.01 rows=1 width=0) (1 row) mpf2=# EXPLAIN ANALYSE (TIMING) SELECT 1; ERROR: syntax error at or near "TIMING" LINE 1: EXPLAIN ANALYSE (TIMING) SELECT 1; ^ mpf2=# EXPLAIN ANALYSE (TIMING TRUE) SELECT 1; ERROR: syntax error at or near "TIMING" LINE 1: EXPLAIN ANALYSE (TIMING TRUE) SELECT 1; ^ mpf2=# EXPLAIN ANALYSE TIMING TRUE SELECT 1; ERROR: syntax error at or near "TIMING" LINE 1: EXPLAIN ANALYSE TIMING TRUE SELECT 1; ^ mpf2=# EXPLAIN ANALYSE TIMING SELECT 1; ERROR: syntax error at or near "TIMING" LINE 1: EXPLAIN ANALYSE TIMING SELECT 1; ^ mpf2=# EXPLAIN TIMING ANALYSE SELECT 1; ERROR: syntax error at or near "TIMING" LINE 1: EXPLAIN TIMING ANALYSE SELECT 1; ^ mpf2=# EXPLAIN (TIMING) ANALYSE SELECT 1; ERROR: syntax error at or near "ANALYSE" LINE 1: EXPLAIN (TIMING) ANALYSE SELECT 1; ^ mpf2=# EXPLAIN (TIMING) ANALYZE SELECT 1; ERROR: syntax error at or near "ANALYZE" LINE 1: EXPLAIN (TIMING) ANALYZE SELECT 1; ^ mpf2=# EXPLAIN (TIMING FALSE) ANALYZE SELECT 1; ERROR: syntax error at or near "ANALYZE" LINE 1: EXPLAIN (TIMING FALSE) ANALYZE SELECT 1; ^ mpf2=# SELECT version(); version -- PostgreSQL 9.2.3 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120 305 (Red Hat 4.4.6-4), 64-bit (1 row) Thanks -- Robins Tharakan
Re: [HACKERS] [PATCH] pg_regress and non-default unix socket path
On Tue, Apr 9, 2013 at 8:08 AM, Christoph Berg wrote: > Debian has been patching pg_regress for years because our default unix > socket directory is /var/run/postgresql, but that is not writable by > the build user at build time. This used to be a pretty ugly "make- > patch-make check-unpatch-make install" patch dance, but now it is a > pretty patch that makes pg_regress accept --host=/tmp. > > The patch is already part of the .deb files on apt.postgresql.org and > passes all regression test suites there. > > Please consider it for 9.3. (And maybe backpatch it all the way...) The hunk that changes the messages might need some thought so that it doesn't cause a translation regression. But in general I see no reason not to do this before we release beta1. It seems safe enough, and changes that reduce the need for packagers to carry private patches are, I think, generally a good thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing bug 8049
Robert Haas writes: > On Thu, Apr 11, 2013 at 1:25 PM, Tom Lane wrote: >> The plan I'm considering is to get this written and committed to HEAD >> in the next week, so that it can go out in 9.3beta1. After the patch >> has survived a reasonable amount of beta testing, I'd be more comfortable >> about back-patching into 9.2. > I'm not very sanguine about the chances that back-patching this won't > provoke any screams of agony ... but I don't have a better idea, > either. Letting queries return wrong answers isn't a superior > solution, for sure. The only alternative I can see is to make a back-patch that just teaches get_eclass_for_sort_expr() to compute valid nullable_relids for the sort expression. That's necessary code in any case, and it would fix the immediately complained-of bug. The thing that scares me is that it is now clear that equivclass.c is capable of considering two expressions to be equivalent when they should not be; that is, I'm afraid there are variants of the problem that would not be cured by such a limited back-patch. But maybe I should try to create such an example before proposing the more invasive approach. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] after 9.2.4 patch vacuumdb -avz not analyzing all tables
Does this behavior only affect the 9.2 branch? Or was it ported to 9.1 or 9.0 or 8.4 as well? On Thu, Apr 11, 2013 at 7:48 PM, Kevin Grittner wrote: > Tom Lane wrote: > > > However I've got to say that both of those side-effects of > > exclusive-lock abandonment seem absolutely brain dead now that I > > see them. Why would we not bother to tell the stats collector > > what we've done? Why would we think we should not do ANALYZE > > when we were told to? > > > > Would someone care to step forward and defend this behavior? > > Because it's not going to be there very long otherwise. > > I'm pretty sure that nobody involved noticed the impact on VACUUM > ANALYZE command; all discussion was around autovacuum impact; and > Jan argued that this was leaving things in a status quo for that, > so I conceded the point and left it for a follow-on patch if > someone felt the behavior needed to change. Sorry for the miss. > > http://www.postgresql.org/message-id/50bb700e.8060...@yahoo.com > > As far as I'm concerned all effects on the explicit command were > unintended and should be reverted. > > -- > Kevin Grittner > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- > Sent via pgsql-admin mailing list (pgsql-ad...@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-admin > -- To understand recursion, one must first understand recursion.
Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks
On Fri, Apr 12, 2013 at 10:42 AM, Alvaro Herrera wrote: > Robert Haas wrote: >> On Mon, Apr 8, 2013 at 12:28 PM, Kohei KaiGai wrote: > >> > Also, the attached function-execute-permission patch is a rebased >> > version. I rethought its event name should be OAT_FUNCTION_EXECUTE, >> > rather than OAT_FUNCTION_EXEC according to the manner without >> > abbreviation. Other portions are the same as the previous ones. >> >> Great. This looks fine to me, committed. I also committed your >> getObjectIdentity patch, which caused some regression test output >> changes. I applied the necessary correction before committing, I >> think, but please check. > > I think the function-execute code path is still using > getObjectDescription rather than identity. Yeah, I think so. KaiGai, can you provide a follow-on patch to fix that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Analyzing bug 8049
On Thu, Apr 11, 2013 at 1:25 PM, Tom Lane wrote: > This idea needs more fleshing out, but it's seeming awfully attractive > right now. The big problem with it is that it's going to be a more > invasive patch than I feel terribly comfortable about back-patching. > However, I'm not sure there's much choice, because I don't see any narrow > fix for 9.2 that would not result in very substantial degradation of its > optimization ability. We can't just lobotomize equivalence-class > processing. > > The plan I'm considering is to get this written and committed to HEAD > in the next week, so that it can go out in 9.3beta1. After the patch > has survived a reasonable amount of beta testing, I'd be more comfortable > about back-patching into 9.2. I'm not very sanguine about the chances that back-patching this won't provoke any screams of agony ... but I don't have a better idea, either. Letting queries return wrong answers isn't a superior solution, for sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks
Robert Haas wrote: > On Mon, Apr 8, 2013 at 12:28 PM, Kohei KaiGai wrote: > > Also, the attached function-execute-permission patch is a rebased > > version. I rethought its event name should be OAT_FUNCTION_EXECUTE, > > rather than OAT_FUNCTION_EXEC according to the manner without > > abbreviation. Other portions are the same as the previous ones. > > Great. This looks fine to me, committed. I also committed your > getObjectIdentity patch, which caused some regression test output > changes. I applied the necessary correction before committing, I > think, but please check. I think the function-execute code path is still using getObjectDescription rather than identity. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detach/attach table and index data files from one cluster to another
Sameer Thakur writes: > The proposed tool tries to make migration faster for tables and indices > only by copying their binary data files. There's 0 chance of making that work, because the two databases wouldn't have the same notions of committed XIDs. You apparently don't understand what you read in the other discussion --- the steps you are objecting to are not optional, whether copying a whole tablespace or only one table. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] after 9.2.4 patch vacuumdb -avz not analyzing all tables
On further review this particular server skipped from 9.2.2 to 9.2.4. This is my busiest and most downtime-sensitive server, and I was waiting on a maintenance window to patch to 9.2.3 when 9.2.4 dropped and bumped up the urgency. However, I have 3 other less busy production servers that were all running 9.2.3 for a while, didn't exhibit the problem, and still don't on 9.2.4. psql> analyze seems to work OK in the meantime; I'll report back if I notice any problems with that. Thanks very much for the response and investigation, it is much appreciated! On Thu, Apr 11, 2013 at 8:48 PM, Kevin Grittner wrote: > Tom Lane wrote: > > > However I've got to say that both of those side-effects of > > exclusive-lock abandonment seem absolutely brain dead now that I > > see them. Why would we not bother to tell the stats collector > > what we've done? Why would we think we should not do ANALYZE > > when we were told to? > > > > Would someone care to step forward and defend this behavior? > > Because it's not going to be there very long otherwise. > > I'm pretty sure that nobody involved noticed the impact on VACUUM > ANALYZE command; all discussion was around autovacuum impact; and > Jan argued that this was leaving things in a status quo for that, > so I conceded the point and left it for a follow-on patch if > someone felt the behavior needed to change. Sorry for the miss. > > http://www.postgresql.org/message-id/50bb700e.8060...@yahoo.com > > As far as I'm concerned all effects on the explicit command were > unintended and should be reverted. > > -- > Kevin Grittner > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks
On Mon, Apr 8, 2013 at 12:28 PM, Kohei KaiGai wrote: > Thanks. I could find two obvious wording issues here; please see the smaller > one of the attached patches. I didn't fix up the manner of using "XXX" in source > code comments. Committed. > Also, the attached function-execute-permission patch is a rebased > version. I rethought its event name should be OAT_FUNCTION_EXECUTE, > rather than OAT_FUNCTION_EXEC according to the manner without > abbreviation. Other portions are the same as the previous ones. Great. This looks fine to me, committed. I also committed your getObjectIdentity patch, which caused some regression test output changes. I applied the necessary correction before committing, I think, but please check. >> BTW, if it were possible to set things up so that the test_sepgsql >> script could validate the version of the sepgsql-regtest policy >> installed, that would eliminate a certain category of errors. I >> notice also that the regression tests themselves seem to fail if you >> invoke the script as contrib/sepgsql/test_sepgsql rather than >> ./test_sepgsql, which suggests another possible pre-validation step. >> > Please see the test-script-fixup patch. > I added "cd `dirname $0`" on top of the script. It lets pg_regress > avoid this trouble. Probably pg_regress was unable to read the > sql commands to run. > > A problem regarding validation of the sepgsql-regtest policy module > is that the semodule command takes root privilege to list > installed policy modules. So I avoided using this command > in the test_sepgsql script. > However, I have an idea that does not make the script fail even if "sudo > semodule -l" returns an error, except in the case where it runs > correctly and the policy version is not the expected one. > What is your opinion of this check? Not sure that's too useful. And I don't like the idea of putting sudo commands in a test harness script. That seems too much like the sort of thing bad people do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pg_regress and non-default unix socket path
Re: To PostgreSQL Hackers 2013-04-09 <20130409120807.gd26...@msgid.df7cb.de> If the patch looks too intrusive at this stage of the release, it would be enough if the last chunk got included, which should really be painless: > diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c > new file mode 100644 > index b632326..d4d4279 > *** a/src/test/regress/pg_regress.c > --- b/src/test/regress/pg_regress.c [...] > *** regression_main(int argc, char *argv[], > *** 2249,2255 >*/ > header(_("starting postmaster")); > snprintf(buf, sizeof(buf), > ! SYSTEMQUOTE "\"%s/postgres\" -D \"%s/data\" > -F%s -c \"listen_addresses=%s\" > \"%s/log/postmaster.log\" 2>&1" SYSTEMQUOTE, >bindir, temp_install, >debug ? " -d 5" : "", >hostname ? hostname : "", > --- 2254,2262 >*/ > header(_("starting postmaster")); > snprintf(buf, sizeof(buf), > ! hostname && *hostname == '/' > ! ? SYSTEMQUOTE "\"%s/postgres\" -D > \"%s/data\" -F%s -k \"%s\" > \"%s/log/postmaster.log\" 2>&1" SYSTEMQUOTE > ! : SYSTEMQUOTE "\"%s/postgres\" -D > \"%s/data\" -F%s -c \"listen_addresses=%s\" > \"%s/log/postmaster.log\" 2>&1" > SYSTEMQUOTE, >bindir, temp_install, >debug ? " -d 5" : "", >hostname ? hostname : "", Christoph -- c...@df7cb.de | http://www.df7cb.de/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSOC] questions about idea "rewrite pg_dump as library"
On 04/11/2013 12:17 AM, Tom Lane wrote: > Alvaro Herrera writes: >> Hannu Krosing wrote: >>> Natural solution to this seems to move most of pg_dump functionality >>> into backend as functions, so we have pg_dump_xxx() for everything we >>> want to dump plus a topological sort function for getting the objects >>> in right order. >> This idea doesn't work because of back-patch considerations (i.e. we >> would not be able to create the functions in back branches, and so this >> new style of pg_dump would only work with future server versions). So >> pg_dump itself would have to retain capability to dump stuff from old >> servers. This seems unlikely to fly at all, because we'd be then >> effectively maintaining pg_dump in two places, both backend and the >> pg_dump source code. > There are other issues too, in particular that most of the backend's > code tends to work on SnapshotNow time whereas pg_dump would really > prefer it was all done according to the transaction snapshot. I was just thinking of moving the queries pg_dump currently uses into UDFs, which do _not_ use the catalog cache, but will use the same SQL to query the catalogs as pg_dump currently does, using whatever snapshot mode is currently set. pg_dump will still need to have the same queries for older versions of PostgreSQL, but for new versions pg_dump can become catalog-agnostic. And I think that we can retire pg_dump support for older PostgreSQL versions the same way we drop support for older versions of PostgreSQL itself. Hannu > We have got bugs of that ilk already in pg_dump, but we shouldn't > introduce a bunch more. Doing this right would therefore mean that we'd > have to write a lot of duplicative code in the backend, ie, it's not > clear that we gain any synergy by pushing the functionality over. It > might simplify cross-backend-version issues (at least for backend > versions released after we'd rewritten all that code) but otherwise I'm > afraid it'd just be pushing the problems somewhere else. > In any case, "push it to the backend" offers no detectable help with the > core design issue here, which is figuring out what functionality needs > to be exposed with what API. The main things I see would be * get_list_of_objects(object_type, pattern or namelist) * get_sql_def_for_object(object_type, object_name) * sort_by_dependency(list of [obj_type, obj_name]) From these you could easily construct most uses, especially if sort_by_dependency(list of [obj_type, obj_name]) would be smart enough to break circular dependencies, like turning two tables with mutual FKs into tabledefs without FKs plus separate constraints. Or we could always have the constraints separately, so that the ones depending on non-exported objects would be easy to leave out. Maybe the dependency API analysis itself is something worth a GSOC effort? Hannu > regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
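As a sketch only, Hannu's three functions might be declared roughly like this; the names and signatures are hypothetical, with stub bodies so the declarations parse:

    -- Hypothetical API, for discussion only; nothing like this exists in the backend.
    CREATE FUNCTION get_list_of_objects(object_type text, pattern text)
    RETURNS SETOF text
    LANGUAGE sql AS $$ SELECT NULL::text WHERE false $$;  -- stub body

    CREATE FUNCTION get_sql_def_for_object(object_type text, object_name text)
    RETURNS text
    LANGUAGE sql AS $$ SELECT NULL::text $$;  -- stub body

    -- Takes [obj_type, obj_name] pairs and returns them in dependency order:
    CREATE FUNCTION sort_by_dependency(objects text[])
    RETURNS SETOF text[]
    LANGUAGE sql AS $$ SELECT NULL::text[] WHERE false $$;  -- stub body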
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On 2013-04-12 16:58:44 +0530, Pavan Deolasee wrote: > On Fri, Apr 12, 2013 at 4:29 PM, Andres Freund wrote: > > > > > > > I don't think that holds true at all. If you look at pg_stat_bgwriter in > > any remotely busy cluster with a hot data set over shared_buffers you'll > > notice that a large percentage of writes will have been done by backends > > themselves. > > > > Even if what you are saying is true, which I am sure it is, the pages that the > backend is evicting must not have been recently used by the LRU algorithm, which > means that the WAL pertaining to the last change to the page in all > likelihood is already replicated, unless the replication is really lagging > far behind. Of course, if the standby is not able to keep pace with the > master in a realistic manner then we have a problem with the approach. It frequently takes time in the sub-second to few-second range for usagecounts to drop to zero. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On Fri, Apr 12, 2013 at 4:29 PM, Andres Freund wrote: > > > > > I don't think that holds true at all. If you look at pg_stat_bgwriter in > any remotely busy cluster with a hot data set over shared_buffers you'll > notice that a large percentage of writes will have been done by backends > themselves. > > Even if what you are saying is true, which I am sure it is, the pages that the backend is evicting must not have been recently used by the LRU algorithm, which means that the WAL pertaining to the last change to the page in all likelihood is already replicated, unless the replication is really lagging far behind. Of course, if the standby is not able to keep pace with the master in a realistic manner then we have a problem with the approach. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On 2013-04-12 11:18:01 +0530, Pavan Deolasee wrote: > On Thu, Apr 11, 2013 at 8:39 PM, Ants Aasma wrote: > > > On Thu, Apr 11, 2013 at 5:33 PM, Hannu Krosing > > wrote: > > > On 04/11/2013 03:52 PM, Ants Aasma wrote: > > >> > > >> On Thu, Apr 11, 2013 at 4:25 PM, Hannu Krosing > > >> wrote: > > >>> > > >>> The proposed fix - halting all writes of data pages to disk and > > >>> to WAL files while waiting ACK from standby - will tremendously > > >>> slow down all parallel work on master. > > >> > > >> This is not what is being proposed. The proposed fix halts writes of > > >> only data pages that are modified within the window of WAL that is not > > >> yet ACKed by the slave. This means pages that were recently modified > > >> and where the clocksweep or checkpoint has decided to evict them. This > > >> only affects the checkpointer, bgwriter and backends doing allocation. > > >> Furthermore, for the backend clocksweep case it would be reasonable to > > >> just pick another buffer to evict. The slowdown for most actual cases > > >> will be negligible. > > > > > > You also need to hold back all WAL writes, including the ones by > > > parallel async and locally-synced transactions. Which means that > > > you have to make all locally synced transactions wait on the > > > syncrep transactions committed before them. > > > After getting the ACK from slave you then have a backlog of stuff > > > to write locally, which then also needs to be sent to slave. Basically > > > this turns a nice smooth WAL write-and-stream pipeline into a > > > chunky wait-and-write-and-wait-and-stream-and-wait :P > > > This may not be a problem in slight write load cases, which is > > > probably the most widely happening usecase for postgres, but it > > > will harm top performance and also force people to get much > > > better (and more expensive) hardware than would otherwise > > > be needed. > > > > Why would you need to hold back WAL writes? WAL is written on master > > first and then streamed to the slave as it is done now. You would only need > > to hold back dirty page evictions having a recent enough LSN to not yet > > be replicated. This holding back is already done to wait for local WAL > > flushes, see bufmgr.c:1976 and bufmgr.c:669. When a page gets dirtied > > its usage count gets bumped, so it will not be considered for > > eviction for at least one clocksweep cycle. In normal circumstances > > that will be enough time to get an ACK from the slave. When WAL is > > generated at a higher rate than can be replicated this will not be > > true. In that case backends that need to bring in new pages will have > > to wait for WAL to be replicated before they can continue. That will > > hopefully include the backends that are doing the dirtying, throttling > > the WAL generation rate. This would definitely be optional behavior, > > not something turned on by default. > > > > > I agree. I don't think the proposed change would cause much of a performance > bottleneck since the proposal is to hold back writing of dirty pages until > the WAL is replicated successfully to the standby. The heap pages are > mostly written by the background threads often much later than the WAL for > the change is written. So in all likelihood, there will be no wait > involved. Of course, this will not be true for very frequently updated > pages that must be written at a checkpoint. I don't think that holds true at all. 
If you look at pg_stat_bgwriter in any remotely busy cluster with a hot data set over shared_buffers you'll notice that a large percentage of writes will have been done by backends themselves. Yes, we need to improve on this, and we are talking about it right now in another thread, but until that's solved this argument seems to fall flat on its face. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
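The fraction Andres mentions is easy to check on a running cluster; the columns below are as in the 9.2-era pg_stat_bgwriter view:

    SELECT buffers_checkpoint, buffers_clean, buffers_backend,
           round(100.0 * buffers_backend /
                 nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1)
             AS backend_write_pct
    FROM pg_stat_bgwriter;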
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On 2013-04-12 02:29:01 +0900, Fujii Masao wrote: > On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing wrote: > > > > You just shut down the old master and let the standby catch > > up (takes a few microseconds ;) ) before you promote it. > > > > After this you can start up the former master with recovery.conf > > and it will follow nicely. > > No. When you shut down the old master, it might not have been > able to send all the WAL records to the standby. I have observed > this situation several times. So in your approach, new standby > might fail to catch up with the master nicely. It seems most of this thread is focusing on the wrong thing then. If we really are only talking about planned failover then we need to solve *that*, not some ominous "don't flush data too early" which has noticeable performance and implementation complexity problems. I guess you're observing that not everything is replicated because you're doing an immediate shutdown - probably because performing the shutdown checkpoint would take too long. This seems solvable by implementing a recovery connection command which initiates a shutdown that just disables future WAL inserts and returns the last lsn that has been written. Then you can fail over as soon as that lsn has been reached and can make the previous master follow from there on without problems. You could even teach the standby not to increment the timeline in that case since that's safe. The biggest issue seems to be how to implement this without another spinlock acquisition for every XLogInsert(), but that seems possible. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSOC] questions about idea "rewrite pg_dump as library"
On 04/11/2013 11:48 PM, Andrew Dunstan wrote: > It could be interesting to have a library that would output database > metadata in some machine readable and manipulatable format such as JSON > or XML. One thing that's annoying about the text output pg_dump produces > is that it's not at all structured, so if you want, for example, to > restore a table but to a table of a different name, or to a different > schema, then you're reduced to having to mangle the SQL by using hand > editing or regular expression matching. Something with the potential to > ease that pain would be worth having. Yes, this is really interesting. The current code in pg_dump supports 4 ArchiveFormats, e.g. archCustom, archTar, archNull and archDirectory. These formats are implementations of the "pg_backup" interface. Maybe I could try to add two implementations, "XML" and "JSON". It is worth mentioning that I wrote a program to parse XML files into csv ones using the "libxercesc" library a month ago. Although that program was just a hello world, maybe I could benefit from it, because both tasks use the XML format, and what I need to do is try another xml library. I had a look at JSON on the wiki. The format is a little like XML; both of them are nested. And there are some libraries that could be used, e.g. libjson (or json.c, or other json libraries written in C) and libxml2 (or something else). BTW, could this be an idea for GSOC? If so, I can have a try: add XML and JSON output formats for pg_dump. Thank you all for your attention. Best regards, Shuai -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSOC] questions about idea "rewrite pg_dump as library"
Michael Paquier writes: > I recall discussions about reverse engineering of a parsed query tree in > the event trigger threads but nothing has been committed, I think. Yes. The name used in there was "Normalized Command String". > Also, you need to consider that implementing such a reverse engineering > mechanism in core might not be a good thing for new features and maintenance, as it > would mean that it is necessary to change those APIs consistently with what > is added on the parsing side. The goal is to retain the same API, which is quite simple: function get_command_string(Node *parsetree) returns text At the SQL level, the Node * is of datatype "internal" and you can't forge it; you need to be given it in some way. In the Event Trigger case we thought of a TG_PARSETREE magic variable, or maybe another function get_current_parsetree() that only works when called from an event trigger. The other part of the API of course is how to represent the data, and as we're talking about a Normalized Command String, there's no choice but to issue a valid SQL command string that the server would know how to execute and which would have the same side effects. So of course a 9.3 and a 9.4 server equipped with that hypothetical function would behave differently where the syntax has changed. And that's exactly why I think the best choice here is to have that code embedded and maintained in core. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
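As a sketch of how the proposed API might be used from an event trigger -- get_command_string() and get_current_parsetree() are the hypothetical functions from the proposal above, not anything that exists today:

    -- Hypothetical usage; assumes the proposed functions exist.
    CREATE FUNCTION log_normalized_command() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        -- get_current_parsetree() is the proposed API, not an existing function
        RAISE NOTICE 'normalized: %',
            get_command_string(get_current_parsetree());
    END;
    $$;

    CREATE EVENT TRIGGER log_ddl ON ddl_command_end
        EXECUTE PROCEDURE log_normalized_command();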
Re: [HACKERS] [GSOC] questions about idea "rewrite pg_dump as library"
Tom Lane writes: > Alvaro Herrera writes: >> This idea doesn't work because of back-patch considerations (i.e. we >> would not be able to create the functions in back branches, and so this >> new style of pg_dump would only work with future server versions). So That is a policy question, not a technical one. We could either add the new functions in the backend binary itself, or provide them as an extension that pg_dump would know to install when needed, if we decided that's OK. My understanding is that we will need to change that policy anyway the day we have an on-disk page format change and pg_upgrade needs to flag the old cluster pages with the old page version number before being able to run, or something. > There are other issues too, in particular that most of the backend's > code tends to work on SnapshotNow time whereas pg_dump would really > prefer it was all done according to the transaction snapshot. We have Would that be solved by having MVCC catalogs, or wouldn't the backend code you're talking about be included in there? (Which would be surprising to me, as much as trumping the benefits of MVCC catalogs, but well.) > In any case, "push it to the backend" offers no detectable help with the > core design issue here, which is figuring out what functionality needs > to be exposed with what API. Andrew did begin to work on those parts with the "Retail DDL" project. We know of several "getddl" implementations, and you can also have a look at the pg_dump -Fs (split format) patch that didn't make it for 9.3, where some API work has been done. The need exists, and some thinking over the API to get here did happen. Some more certainly needs to be done, granted. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add sql_drop event for event triggers
Tom Lane writes: > Yeah, I was just looking at the IfSupported variant. In the structure > I just suggested (separate ProcessSlowUtility function), we could make > that work by having switch cases for some statements in both functions, I've done it the way you propose here, and then in the Slow variant we have two sets of cases again: those with some manual transactional behavior or some other code complexities, and the really simple ones. The attached patch involves a second layer of distinction to simplify the code fully and remove all the Event Trigger related macrology that I didn't much like. Maybe that's going a tad too far; see what you think. Of course the patch passes make check. Also, some switch cases are now duplicated for the sole benefit of keeping some compiler help, but I don't know how much help we're talking about now that we have 3 different switches. It seemed cleaner to do it that way as it allows us to ERROR out in the very last default case. Finally, I've been surprised to find out that those cases are only triggering for "ddl_command_start" (and not "ddl_command_end"); I think that's a bug we should be fixing: case T_AlterDomainStmt: case T_DefineStmt: case T_IndexStmt: /* CREATE INDEX */ The T_IndexStmt should be triggering ddl_command_end when stmt->concurrent is false, and that's not the case in the code in the master branch. The two other cases should just be moved to the later switch so that both start and end triggers happen. Or maybe I'm missing something here. Given that it might as well be the case, I only refactored the code, keeping its current behavior, rather than fixing what I think is a bug while doing so. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support *** a/src/backend/tcop/utility.c --- b/src/backend/tcop/utility.c *** *** 68,73 --- 68,80 /* Hook for plugins to get control in ProcessUtility() */ ProcessUtility_hook_type ProcessUtility_hook = NULL; + /* local function declarations */ + void standard_ProcessSlowUtility(Node *parsetree, + const char *queryString, + ParamListInfo params, + DestReceiver *dest, + char *completionTag, + ProcessUtilityContext context); /* * Verify user has ownership of specified relation, else ereport. *** *** 342,411 ProcessUtility(Node *parsetree, dest, completionTag, context); } - #define InvokeDDLCommandEventTriggers(parsetree, fncall) \ - do { \ - if (isCompleteQuery) \ - { \ - EventTriggerDDLCommandStart(parsetree); \ - } \ - fncall; \ - if (isCompleteQuery) \ - { \ - EventTriggerSQLDrop(parsetree); \ - EventTriggerDDLCommandEnd(parsetree); \ - } \ - } while (0) - - #define InvokeDDLCommandEventTriggersIfSupported(parsetree, fncall, objtype) \ - do { \ - bool _supported = EventTriggerSupportsObjectType(objtype); \ - \ - if (_supported) \ - { \ - EventTriggerDDLCommandStart(parsetree); \ - } \ - fncall; \ - if (_supported) \ - { \ - EventTriggerSQLDrop(parsetree); \ - EventTriggerDDLCommandEnd(parsetree); \ - } \ - } while (0) /* ! * UTILITY_BEGIN_QUERY and UTILITY_END_QUERY are a pair of macros to enclose ! * execution of a single DDL command, to ensure the event trigger environment ! * is appropriately set up before starting, and tore down after completion or ! * error. 
*/ - #define UTILITY_BEGIN_QUERY(isComplete) \ - do { \ - bool _needCleanup; \ - \ - _needCleanup = (isComplete) && EventTriggerBeginCompleteQuery(); \ - \ - PG_TRY(); \ - { \ - /* avoid empty statement when followed by a semicolon */ \ - (void) 0 - - #define UTILITY_END_QUERY() \ - } \ - PG_CATCH(); \ - { \ - if (_needCleanup) \ - { \ - EventTriggerEndCompleteQuery(); \ - } \ - PG_RE_THROW(); \ - } \ - PG_END_TRY(); \ - if (_needCleanup) \ - { \ - EventTriggerEndCompleteQuery(); \ - } \ - } while (0) - void standard_ProcessUtility(Node *parsetree, const char *queryString, --- 349,364 dest, completionTag, context); } /* ! * Processing a Utility statement is now divided into two functions. A fast ! * path is taken for statements for which we have not Event Trigger support, so ! * that there's no need to do any Event Trigger specific setup. ! * ! * It's not just a performance requirement, mind you, because refreshing the ! * Event Triggers cache (see src/backend/utils/cache/evtcache.c) is prone to ! * taking a snapshot, which can not occur with in a START TRANSACTION statement ! * for example. */ void standard_ProcessUtility(Node *parsetree, const char *queryString, *** *** 415,429 standard_ProcessUtility(Node *parsetree, ProcessUtilityContext context) { bool isTopLevel = (context == PROCESS_UTILITY_TOPLEVEL);
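[Editor's note: the patch text above is cut off. To make the shape of the proposed split concrete, here is a heavily abbreviated sketch of the two functions it introduces. This is not the patch itself: only the name standard_ProcessSlowUtility comes from the patch, and the case lists below are invented stand-ins.]

    #include "postgres.h"
    #include "tcop/utility.h"

    /*
     * Sketch of the fast/slow split described above (abbreviated).
     */
    void
    standard_ProcessUtility(Node *parsetree,
                            const char *queryString,
                            ParamListInfo params,
                            DestReceiver *dest,
                            char *completionTag,
                            ProcessUtilityContext context)
    {
        switch (nodeTag(parsetree))
        {
                /*
                 * Fast path: statements with no Event Trigger support are
                 * handled here directly, with no trigger setup and, more
                 * importantly, no snapshot -- which is what makes e.g.
                 * START TRANSACTION safe to process here.
                 */
            case T_TransactionStmt:
            case T_ClosePortalStmt:
            case T_FetchStmt:
                /* ... execute the statement ... */
                break;

            default:
                /*
                 * Slow path: everything else goes through the variant that
                 * sets up and tears down the Event Trigger environment.
                 */
                standard_ProcessSlowUtility(parsetree, queryString, params,
                                            dest, completionTag, context);
                break;
        }
    }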
[HACKERS] Detach/attach table and index data files from one cluster to another
Hello,

The current process of transferring data files from one cluster to another using pg_dump and pg_restore is time consuming. The proposed tool tries to make migration faster for tables and indexes only, by copying their binary data files. This is like pg_upgrade, but used for migrating tables and indexes.

The discussion at http://www.postgresql.org/message-id/caa-alv5cqf09zvfrcb1xxuqlsp-oux0s_hq6ryscd6ctami...@mail.gmail.com speaks of the possibility of detaching/attaching whole databases as an alternative to dump/restore. But the process of freezing XIDs and zeroing out LSNs makes that solution equally time consuming, if not more so. If we instead consider just tables and indexes to be detached/reattached, would this be a viable alternative to dump and restore of tables? The same discussion indicates it could be done, but that it is more complicated, as one has to deal with the system catalogs for the newly mounted table and map old OIDs to new ones. This is required to ensure consistency in roles and in the objects owned by those roles. We would also need to ensure the LSN values of the reattached pages are less than the current WAL endpoint in the receiving cluster. Are there any more issues we need to be aware of?

regards
Sameer
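[Editor's note: for a sense of what the freeze/zero-LSN fixup mentioned in that discussion would involve at the page level, here is a rough sketch using 9.3-era backend page macros. It is an assumption-laden illustration, not a working tool: a real pass would consult the source cluster's clog to decide each tuple's fate (blindly clearing xmax, as done here, would resurrect deleted tuples), and indexes, TOAST, multixacts, and checksums are all ignored.]

    #include "postgres.h"
    #include "access/htup_details.h"
    #include "access/transam.h"
    #include "storage/bufpage.h"

    /*
     * HYPOTHETICAL fixup for one heap page of a copied data file: freeze
     * every tuple and clear the page LSN so the receiving cluster's XID
     * and WAL horizons cannot invalidate the page.
     */
    static void
    freeze_heap_page(Page page)
    {
        OffsetNumber off;
        OffsetNumber maxoff = PageGetMaxOffsetNumber(page);

        for (off = FirstOffsetNumber; off <= maxoff; off++)
        {
            ItemId          itemid = PageGetItemId(page, off);
            HeapTupleHeader htup;

            if (!ItemIdIsNormal(itemid))
                continue;       /* skip unused/redirect/dead line pointers */

            htup = (HeapTupleHeader) PageGetItem(page, itemid);

            /* pretend the tuple was inserted long ago and never deleted */
            HeapTupleHeaderSetXmin(htup, FrozenTransactionId);
            HeapTupleHeaderSetXmax(htup, InvalidTransactionId);

            /* set hint bits so nobody goes looking for the source clog */
            htup->t_infomask |= HEAP_XMIN_COMMITTED | HEAP_XMAX_INVALID;
            htup->t_infomask &= ~HEAP_XMAX_COMMITTED;
        }

        /* ensure the page LSN precedes the receiver's current WAL endpoint */
        PageSetLSN(page, 0);
    }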
Re: [HACKERS] Inconsistent DB data in Streaming Replication
On 04/11/2013 07:29 PM, Fujii Masao wrote:
> On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing wrote:
>> You just shut down the old master and let the standby catch up (takes
>> a few microseconds ;) ) before you promote it. After this you can start
>> up the former master with recovery.conf and it will follow nicely.
>
> No. When you shut down the old master, it might not have been able to
> send all the WAL records to the standby.

In what cases (other than a standby lagging too much or not listening at all) have you observed this?

> I have observed this situation several times. So in your approach, the
> new standby might fail to catch up with the master nicely.

The page http://wiki.postgresql.org/wiki/Streaming_Replication claims this:

* Graceful shutdown
  When smart/fast shutdown is requested, the primary waits to exit until
  XLOG records have been sent to the standby, up to the shutdown
  checkpoint record.

Maybe you were requesting an immediate shutdown?

Regards,
Hannu Krosing
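[Editor's note: Hannu's precondition -- that the standby has received everything before promotion -- can be checked mechanically. A minimal libpq sketch follows, under assumptions: the connection string is a placeholder, and the function names are the 9.2/9.3-era ones (later renamed pg_last_wal_receive_lsn / pg_current_wal_lsn in PostgreSQL 10).]

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    /*
     * After stopping the old master, report how far the standby has
     * received WAL; promote only once this matches the old master's
     * pg_current_xlog_location() as reported just before its clean
     * shutdown.
     */
    int
    main(void)
    {
        PGconn     *standby = PQconnectdb("host=standby.example.com dbname=postgres");
        PGresult   *res;

        if (PQstatus(standby) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(standby));
            return EXIT_FAILURE;
        }

        res = PQexec(standby, "SELECT pg_last_xlog_receive_location()");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "query failed: %s", PQerrorMessage(standby));
            PQclear(res);
            PQfinish(standby);
            return EXIT_FAILURE;
        }

        printf("standby received WAL up to %s\n", PQgetvalue(res, 0, 0));

        PQclear(res);
        PQfinish(standby);
        return EXIT_SUCCESS;
    }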