Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-08-19 Thread Dilip kumar
On 15 July 2014 19:29, Amit Kapila wrote:

>Implementation details:
---
>1. This feature is implemented only for tar format in windows
>as native windows utilites are not able to create symlinks while
>extracting files from tar (It might be possible to create symlinks
>if cygwin is installed on your system, however I feel we need this
>feature to work for native windows as well).  Another reason to not
>create it for non-tar (plain) format is that plain format can update the
>symlinks via -T option and backing up symlink file during that
>operation can lead to spurious symlinks after archive recovery.

I have reviewed the patch and did not find any major problems.

Here are some comments I would like to share with you:


1.  Rebase the patch to the current git HEAD.



2.  +  /*
    +   * Construct symlink file
    +   */
    +  initStringInfo(&symlinkfbuf);

    I think the declaration and initialization of the symlinkfbuf string can
    be moved under an #ifdef WIN32 compile-time macro; on other platforms it
    is simply allocated and freed, which can be avoided (see the sketch after
    this list).


3.  +  /*
    +   * native windows utilites are not able create symlinks while
    +   * extracting files from tar.
    +   */

    Rephrase the above sentence and fix the spelling mistake, e.g.
    "Native Windows utilities are not able to create symlinks while
    extracting files from a tar archive."
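Something like this is what I have in mind for comment 2 (a sketch only,
reusing the patch's own symlinkfbuf variable and comment; not compiled, and
the matching pfree would be wrapped the same way):

#ifdef WIN32
	StringInfoData symlinkfbuf;

	/*
	 * Construct symlink file (only needed on Windows, where the tar
	 * output carries the tablespace symlink list).
	 */
	initStringInfo(&symlinkfbuf);
#endif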

I haven't done the testing yet; once I finish with the testing, I will share
the results with you.


Regards,
Dilip



Re: [HACKERS] Trove with PostgreSQL-XC

2014-08-19 Thread Mark Kirkwood

On 19/08/14 23:14, Vivek Singh Raghuwanshi wrote:


Hi All,

Please let me know whether it is possible to use OpenStack Trove with
Postgres-XC, with instances and with bare metal (after the Juno release).
I know it is possible to use other engines like MySQL or PostgreSQL, but
I am not sure about XC.


AFAIK [1], vanilla Postgres support for Trove is still at the review
stage; there is quite a bit to finish, and some things to fix.


I'd think that getting Trove to provision a complete XC VM (or, more
sensibly, a set of VMs that run XC) is going to require a fair bit of
extra work.


It looks like replication will be deployable via Trove for the supported
flavours (MySQL, Mongo, Cassandra, Couch, Redis); see
https://wiki.openstack.org/wiki/Trove/Replication-And-Clustering . I'm
not seeing any specific discussion of sharding, I notice.


Cheers

Mark

[1] I've been helping debug the Postgres Trove patch





Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-08-19 Thread Jeff Davis
On Tue, 2014-08-19 at 12:54 +0200, Tomas Vondra wrote:
> The use-case for this is tracking a chosen subtree of contexts - e.g.
> aggcontext and below, so I'd expect the tracked subtrees to be relatively
> shallow. Am I right?

Right.

> My fear is that by removing the inheritance bit, we'll hurt cases with a
> lot of child contexts. For example, array_agg currently creates a separate
> context for each group - what happens if you have 100k groups and do
> MemoryContextGetAllocated? I guess iterating over 100k groups is not free.

Good point. We don't want to make checking the allocated space into an
expensive operation, because then we will be forced to call it less
frequently, which implies that we'd be sloppier about actually fitting
in work_mem.

> Wouldn't the solution with inheritance and propagating the accounting info
> to the parent actually be better? Or maybe allowing both, having two flags
> when creating a context instead of one?

That seems overly complicated. We only have one driving use case, so two
orthogonal options sound excessive. Perhaps one option, if we can't
solve the performance problem and need to isolate the changes to
hashagg.

> Also, do we really need to track allocated bytes - couldn't we track
> kilobytes or something and use smaller data types to get below the 64B?

Good idea.

I attached a patch that uses two uint32 fields so that it doesn't
increase the size of MemoryContextData, and it tracks memory usage for
all contexts. I was unable to detect any performance regression vs.
master, but on my machine the results are noisy.

It would be easier to resolve the performance concern if I could
reliably get the results Robert is getting. I think I was able to
reproduce the regression with the old patch, but the results were still
noisy.
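
For clarity, the eager bookkeeping amounts to roughly this (a sketch only,
using the update_allocation() signature and the total_allocated field from
the attached patch):

static void
update_allocation(MemoryContext context, size_t size)
{
	MemoryContext ctx;

	/*
	 * Add the delta to this context and every ancestor; a "negative"
	 * delta arrives via unsigned wraparound, as in AllocSetReset().
	 * Reading any context's total then stays O(1), no matter how many
	 * child contexts it has.
	 */
	for (ctx = context; ctx != NULL; ctx = ctx->parent)
		ctx->total_allocated += size;
}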

Regards,
Jeff Davis

*** a/src/backend/utils/mmgr/aset.c
--- b/src/backend/utils/mmgr/aset.c
***
*** 242,247  typedef struct AllocChunkData
--- 242,249 
  #define AllocChunkGetPointer(chk)	\
  	((AllocPointer)(((char *)(chk)) + ALLOC_CHUNKHDRSZ))
  
+ static void update_allocation(MemoryContext context, size_t size);
+ 
  /*
   * These functions implement the MemoryContext API for AllocSet contexts.
   */
***
*** 250,256  static void AllocSetFree(MemoryContext context, void *pointer);
  static void *AllocSetRealloc(MemoryContext context, void *pointer, Size size);
  static void AllocSetInit(MemoryContext context);
  static void AllocSetReset(MemoryContext context);
! static void AllocSetDelete(MemoryContext context);
  static Size AllocSetGetChunkSpace(MemoryContext context, void *pointer);
  static bool AllocSetIsEmpty(MemoryContext context);
  static void AllocSetStats(MemoryContext context, int level);
--- 252,258 
  static void *AllocSetRealloc(MemoryContext context, void *pointer, Size size);
  static void AllocSetInit(MemoryContext context);
  static void AllocSetReset(MemoryContext context);
! static void AllocSetDelete(MemoryContext context, MemoryContext parent);
  static Size AllocSetGetChunkSpace(MemoryContext context, void *pointer);
  static bool AllocSetIsEmpty(MemoryContext context);
  static void AllocSetStats(MemoryContext context, int level);
***
*** 500,505  AllocSetContextCreate(MemoryContext parent,
--- 502,510 
  	 errdetail("Failed while creating memory context \"%s\".",
  			   name)));
  		}
+ 
+ 		update_allocation((MemoryContext) context, blksize);
+ 
  		block->aset = context;
  		block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
  		block->endptr = ((char *) block) + blksize;
***
*** 590,595  AllocSetReset(MemoryContext context)
--- 595,601 
  		else
  		{
  			/* Normal case, release the block */
+ 			update_allocation(context, -(block->endptr - ((char*) block)));
  #ifdef CLOBBER_FREED_MEMORY
  			wipe_mem(block, block->freeptr - ((char *) block));
  #endif
***
*** 611,617  AllocSetReset(MemoryContext context)
   * But note we are not responsible for deleting the context node itself.
   */
  static void
! AllocSetDelete(MemoryContext context)
  {
  	AllocSet	set = (AllocSet) context;
  	AllocBlock	block = set->blocks;
--- 617,623 
   * But note we are not responsible for deleting the context node itself.
   */
  static void
! AllocSetDelete(MemoryContext context, MemoryContext parent)
  {
  	AllocSet	set = (AllocSet) context;
  	AllocBlock	block = set->blocks;
***
*** 623,628  AllocSetDelete(MemoryContext context)
--- 629,644 
  	AllocSetCheck(context);
  #endif
  
+ 	/*
+ 	 * Parent is already unlinked from context, so can't use
+ 	 * update_allocation().
+ 	 */
+ 	while (parent != NULL)
+ 	{
+ 		parent->total_allocated -= context->total_allocated;
+ 		parent = parent->parent;
+ 	}
+ 
  	/* Make it look empty, just in case... */
  	MemSetAligned(set->freelist, 0, sizeof(set->freelist));
  	set->blocks = NULL;
***
*** 678,683  AllocSetAlloc(MemoryContext co

Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-08-19 Thread Michael Paquier
On Fri, Jul 25, 2014 at 4:45 AM, Fabrízio de Royes Mello wrote:
>
> Given this is a very small and simple patch I thought it's not necessary...
>
> Added to the next CommitFest.

I had a look at this patch, and here are a couple of comments:
1) Depending on how ArchiveEntry is called to register an object to
dump, namespace may be NULL, but namespace->dobj.name cannot be, so you
could get the namespace name at the top of the functions whose verbose
output is improved with something like this (see also the consolidated
sketch after these comments):
const char *namespace = tbinfo->dobj.namespace ?
   tbinfo->dobj.namespace->dobj.name : NULL;
And then simplify the message output as follows:
if (namespace)
   write_msg("blah \"%s\".\"%s\" blah", namespace, classname);
else
   write_msg("blah \"%s\" blah", classname);
You can as well safely remove the checks on namespace->dobj.name.
2) I don't think that this is correct:
-   ahlog(AH, 1, "processing data for table \"%s\"\n",
-         te->tag);
+   ahlog(AH, 1, "processing data for table \"%s\".\"%s\"\n",
+         AH->currSchema, te->tag);
There are some code paths where AH->currSchema is set to NULL, and I
think that you should use te->namespace instead.
3) Changing only this message is not enough. The following verbose
messages need to be changed too for consistency:
- pg_dump: creating $tag $object
- pg_dump: setting owner and privileges for [blah]
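
Put together, the pattern I have in mind for point 1 looks roughly like
this (a sketch only; the actual message wording is abridged):

	const char *namespace = tbinfo->dobj.namespace ?
		tbinfo->dobj.namespace->dobj.name : NULL;
	const char *classname = tbinfo->dobj.name;

	if (namespace)
		write_msg(NULL, "dumping contents of table \"%s\".\"%s\"\n",
				  namespace, classname);
	else
		write_msg(NULL, "dumping contents of table \"%s\"\n", classname);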

I have also pondered making similar modifications to the error message
paths, but it did not seem worth it, as this patch is aimed only at the
verbose output. Btw, I have basically fixed those issues while doing the
review, and finished with the attached patch.
Fabrizio, is this new version fine for you?
Regards,
-- 
Michael
From e0809869655c9e22cce11955c7286cef8a42bf1d Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 20 Aug 2014 14:40:40 +0900
Subject: [PATCH] Improve verbose messages of pg_dump with namespace

The namespace is added to the verbose output when it is available, and
relation and namespace names are put within quotes for clarity and
consistency with the other tools.
---
 src/bin/pg_dump/pg_backup_archiver.c | 26 ---
 src/bin/pg_dump/pg_dump.c| 85 ++--
 2 files changed, 93 insertions(+), 18 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 3aebac8..07cc10e 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -546,8 +546,13 @@ RestoreArchive(Archive *AHX)
 		/* Both schema and data objects might now have ownership/ACLs */
 		if ((te->reqs & (REQ_SCHEMA | REQ_DATA)) != 0)
 		{
-			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
-  te->desc, te->tag);
+			/* Show namespace if available */
+			if (te->namespace)
+ahlog(AH, 1, "setting owner and privileges for %s \"%s\".\"%s\"\n",
+	  te->desc, te->namespace, te->tag);
+			else
+ahlog(AH, 1, "setting owner and privileges for %s \"%s\"\n",
+	  te->desc, te->tag);
 			_printTocEntry(AH, te, ropt, false, true);
 		}
 	}
@@ -621,7 +626,13 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 
 	if ((reqs & REQ_SCHEMA) != 0)		/* We want the schema */
 	{
-		ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
+		/* Show namespace if available */
+		if (te->namespace)
+			ahlog(AH, 1, "creating %s \"%s\".\"%s\"\n",
+  te->desc, te->namespace, te->tag);
+		else
+			ahlog(AH, 1, "creating %s \"%s\"\n", te->desc, te->tag);
+
 
 		_printTocEntry(AH, te, ropt, false, false);
 		defnDumped = true;
@@ -713,8 +724,13 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 	_becomeOwner(AH, te);
 	_selectOutputSchema(AH, te->namespace);
 
-	ahlog(AH, 1, "processing data for table \"%s\"\n",
-		  te->tag);
+	/* Show namespace if available */
+	if (te->namespace)
+		ahlog(AH, 1, "processing data for table \"%s\".\"%s\"\n",
+			  te->namespace, te->tag);
+	else
+		ahlog(AH, 1, "processing data for table \"%s\"\n",
+			  te->tag);
 
 	/*
 	 * In parallel restore, if we created the table earlier in
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5c0f95f..dd7eef9 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1383,6 +1383,8 @@ dumpTableData_copy(Archive *fout, void *dcontext)
 {
 	TableDataInfo *tdinfo = (TableDataInfo *) dcontext;
 	TableInfo  *tbinfo = tdinfo->tdtable;
+	const char *namespace = tbinfo->dobj.namespace ?
+		tbinfo->dobj.namespace->dobj.name : NULL;
 	const char *classname = tbinfo->dobj.name;
 	const bool	hasoids = tbinfo->hasoids;
 	const bool	oids = tdinfo->oids;
@@ -1400,7 +1402,16 @@ dumpTableData_copy(Archive *fout, void *dcontext)
 	const char *column_list;
 
 	if (g_verbose)
-		write_msg(NULL, "dumping contents of tabl

Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org

2014-08-19 Thread Joshua D. Drake


On 08/19/2014 08:34 AM, Craig Ringer wrote:


I reiterate: the current apt.postgresql.org is not doing things correctly.
It breaks things, and it shouldn't.


FWIW, this is inconsistent with what yum.postgresql.org does - it takes
the POLA approach of packaging the libpq from the major release configured
in the repo. Each major has its own sub-repo.

I find it pretty hard to justify installing a 9.3 libpq alongside a 9.1
server myself.



Exactly.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
"If we send our children to Caesar for their education, we should
 not be surprised when they come back as Romans."




Re: [HACKERS] replication commands and log_statements

2014-08-19 Thread Michael Paquier
On Wed, Aug 20, 2014 at 2:06 AM, Robert Haas  wrote:

> On Sat, Aug 16, 2014 at 10:27 AM, Amit Kapila wrote:
> > I think ideally it would have been better if we could have logged
> > replication commands under a separate log level, but as there is
> > still no consensus on extending log_statement and nobody is even
> > willing to pursue it, it seems okay to go ahead and log these under
> > the 'all' level.
>
> I think the consensus is clearly for a separate GUC.
>
+1.
-- 
Michael


Re: [HACKERS] wrapping in extended mode doesn't work well with default pager

2014-08-19 Thread Noah Misch
On Mon, Aug 18, 2014 at 12:30:40PM +0100, Greg Stark wrote:
> On Tue, Aug 5, 2014 at 3:41 AM, Noah Misch  wrote:
> > This remains open for 9.4.  Your proposal to revert the feature in 9.4
> > and fix it in 9.5 sounds reasonable.
> 
> Ok, I've gone ahead and done this. I'm sorry for the delays and confusion.

Thanks.

> > I did try psql-wrapped-expanded-fix-v5.patch with the tests Peter and I
> > posted upthread, and those tests now behave as they do in released
> > versions.  What cases did you find that still change vs. 9.3?
> 
> I was trying to build a spreadsheet of every combination of these
> options. It turns out that 4-dimensional spreadsheets are kind of
> awkward.

What's one query that still behaves differently in 9.5 vs. 9.3 (under
formatting options that exist in both versions)?

> I think the fundamental dilemma was the same that was discussed
> upthread. If wrapped expanded mode should have an extra space padding
> column for the wrap indicators then all expanded modes should have
> that column to be consistent since wrapping shouldn't change the
> amount of padding.

I might agree for a greenfield design, but -1 for changing expanded mode now
to improve consistency in this way.  I predict the complaints from users of
expanded mode in automation would overpower any applause for the consistency.




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-19 Thread Josh Berkus
On 08/15/2014 04:19 PM, Tom Lane wrote:
> Personally I'd prefer to go to the all-lengths approach, but a large
> part of that comes from a subjective assessment that the hybrid approach
> is too messy.  Others might well disagree.
> 
> In case anyone else wants to do measurements on some more data sets,
> attached is a copy of Heikki's patch updated to apply against git tip.

Note that this is not 100% comparable because I'm running it against git
clone, and the earlier tests were against beta2.  However, the Heikki
patch looks like a bust on this dataset -- see below.

postgres=# select pg_size_pretty(pg_total_relation_size('jsonic'));
 pg_size_pretty
----------------
 394 MB
(1 row)

postgres=# select pg_size_pretty(pg_total_relation_size('jsonbish'));
 pg_size_pretty
----------------
 542 MB
(1 row)

Extraction Test:

postgres=# explain analyze select row_to_json -> 'kt1_total_sum' from
jsonbish where row_to_json @> '{ "rpt_per_dt" : "2003-06-30" }';
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on jsonbish  (cost=29.55..582.92 rows=200 width=18) (actual time=22.742..5281.823 rows=100423 loops=1)
   Recheck Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
   Heap Blocks: exact=1471
   ->  Bitmap Index Scan on jsonbish_row_to_json_idx  (cost=0.00..29.50 rows=200 width=0) (actual time=22.445..22.445 rows=100423 loops=1)
         Index Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
 Planning time: 0.095 ms
 Execution time: 5292.047 ms
(7 rows)

So, that extraction test is about 1% *slower* than the basic Tom Lane
lengths-only patch, and still 80% slower than original JSONB.  And it's
the same size as the lengths-only version.

Huh?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] GSoC on WAL-logging hash indexes

2014-08-19 Thread Michael Paquier
On Thu, Jun 19, 2014 at 6:40 PM, Vik Fearing  wrote:

> On 04/30/2014 11:41 PM, Tom Lane wrote:
> > We really oughta fix the WAL situation, not just band-aid around it.
>
> After re-reading this thread, it is not clear that anyone is going to
> work on it so I'll just ask:
>
> Is anyone working on this?
>
> If not, I'd like to put it on my plate.
>
Vik, did you get time to look at that finally?
Regards,
-- 
Michael


[HACKERS] KNN searches support for SP-GiST [GSOC'14]

2014-08-19 Thread Vladislav Sterzhanov
Hi there, pg-Hackers!
Here is a patch which adds the ability to perform nearest-neighbour
searches on SP-GiSTs (as of now it includes implementations for quad and
kd trees). Pre-reviewed by my GSoC mentor Alexander Korotkov. A sample
benchmarking script is included in the attachment (it dumps the current
geonames archive and runs several searches on the (latitude, longitude)
points), which demonstrates the dramatic improvement over plain searches
and sorting. Regression tests are included; the patch compiles and runs
successfully on both my Ubuntu 12.04 Server and (08/2014) Arch Linux
systems.

Vlad Sterzhanov / Quadrocube
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 56827e5..5214770 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -83,6 +83,7 @@
>>
>^
~=
+   <->
   
  
  
@@ -95,6 +96,7 @@
>>
>^
~=
+   <->
   
  
  
@@ -137,6 +139,10 @@
   supports the same operators but uses a different index data structure which
   may offer better performance in some applications.
  
+ 
+  By supporting the ordering <-> operator, quad_point_ops and kd_point_ops provide
+  the user with the ability to perform a K-nearest-neighbour search over the indexed point dataset.
+ 
 
 
 
@@ -539,9 +545,12 @@ CREATE FUNCTION my_inner_consistent(internal, internal) RETURNS void ...
 typedef struct spgInnerConsistentIn
 {
 ScanKey scankeys;   /* array of operators and comparison values */
+ScanKey		orderbyKeys;	/* array of ordering operators and comparison values */
 int nkeys;  /* length of array */
+int norderbys;  /* length of array */
 
 Datum   reconstructedValue; /* value reconstructed at parent */
+    Datum		suppValue;		/* supplementary value of parent */
 int level;  /* current level (counting from zero) */
 boolreturnData; /* original data must be returned? */
 
@@ -559,6 +568,8 @@ typedef struct spgInnerConsistentOut
 int*nodeNumbers;/* their indexes in the node array */
 int*levelAdds;  /* increment level by this much for each */
 Datum  *reconstructedValues;/* associated reconstructed values */
+    Datum	   *suppValues;		/* any additional data the implementation needs to store in the child nodes */
+double	   **distances;		/* associated distances */
 } spgInnerConsistentOut;
 
 
@@ -573,10 +584,15 @@ typedef struct spgInnerConsistentOut
In particular it is not necessary to check sk_flags to
see if the comparison value is NULL, because the SP-GiST core code
will filter out such conditions.
+   orderbyKeys, of length norderbys,
+   describes ordering operators (if any) in the same fashion.
reconstructedValue is the value reconstructed for the
parent tuple; it is (Datum) 0 at the root level or if the
inner_consistent function did not provide a value at the
parent level.
+   suppValue is any additional value that the implementation
+   has decided to store for the parent node ((Datum) 0 in case the
+   current node is the root).
level is the current inner tuple's level, starting at
zero for the root level.
returnData is true if reconstructed data is
@@ -592,7 +608,6 @@ typedef struct spgInnerConsistentOut
nNodes is the number of child nodes contained in the
inner tuple, and
nodeLabels is an array of their label values, or
-   NULL if the nodes do not have labels.
   
 
   
@@ -608,9 +623,17 @@ typedef struct spgInnerConsistentOut
reconstructedValues to an array of the values
reconstructed for each child node to be visited; otherwise, leave
reconstructedValues as NULL.
+   suppValues serves a similar purpose, holding
+   the implementation-defined data for the inner nodes.
+   distances: if an ordered search is carried out,
+   the implementation is supposed to fill it in accordance with the
+   ordering operators provided in orderbyKeys
+   (nodes with the lowest distances will be processed first). Leave it
+   NULL otherwise.
Note that the inner_consistent function is
responsible for palloc'ing the
-   nodeNumbers, levelAdds and
+   nodeNumbers, levelAdds,
+   distances, suppValues and
reconstructedValues arrays.
   
  
@@ -636,7 +659,9 @@ CREATE FUNCTION my_leaf_consistent(internal, internal) RETURNS bool ...
 typedef struct spgLeafConsistentIn
 {
 ScanKey scankeys;   /* array of operators and comparison values */
+ScanKey orderbykeys;/* array of ordering operators and comparison values */
 int nkeys;  /* length of array */
+int norderbys;		/* length of array */
 
 Datum   reconstructedValue; /* value reconstructed at parent */
 int level;  /* curre

Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Andres Freund
On 2014-08-19 19:11:46 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2014-08-20 00:58:22 +0300, Heikki Linnakangas wrote:
> >> I don't much like adding a separate function for every SSL implementation,
> >> but you've got a point that it would be nice to make it difficult to call
> >> PQgetSSLstruct() and just assume that the returned struct is e.g an OpenSSL
> >> struct, while it's actually something else. Perhaps:
> 
> > A good reason to not have a function for each of the respective
> > libraries is that it requires either including the relevant headers
> > or adding forward declarations of the library's types.
> 
> It requires no such thing.  What we do for PQgetssl() is declare it as
> returning "void *", and we could easily do the same for other libraries.

Well, the reason the library-specific variant has been called superior
upthread is the potential for type safety...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Tom Lane
Andres Freund  writes:
> On 2014-08-20 00:58:22 +0300, Heikki Linnakangas wrote:
>> I don't much like adding a separate function for every SSL implementation,
>> but you've got a point that it would be nice to make it difficult to call
>> PQgetSSLstruct() and just assume that the returned struct is e.g an OpenSSL
>> struct, while it's actually something else. Perhaps:

> A good reason to not have a function for each of the respective
> libraries is that it requires either including the relevant headers
> or adding forward declarations of the library's types.

It requires no such thing.  What we do for PQgetssl() is declare it as
returning "void *", and we could easily do the same for other libraries.

regards, tom lane




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-08-19 Thread Claudio Freire
On Tue, Aug 19, 2014 at 7:27 PM, Heikki Linnakangas wrote:
> Also, please split prefetching of regular index scans into a separate patch.
> It's orthogonal to doing async I/O; we could prefetch regular index scans
> with posix_fadvise too, and AIO should be useful for prefetching other
> stuff.

That patch already happened on the list, and it wasn't a win in many
cases. I'm not sure it should be proposed independently of this one.
Maybe a separate patch, but it should be considered dependent on this.

I don't have an archive link at hand atm, but I could produce one later.




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Andres Freund
On 2014-08-20 00:58:22 +0300, Heikki Linnakangas wrote:
> I don't much like adding a separate function for every SSL implementation,
> but you've got a point that it would be nice to make it difficult to call
> PQgetSSLstruct() and just assume that the returned struct is e.g an OpenSSL
> struct, while it's actually something else. Perhaps:

A good reason to not have a function for each of the respective
libraries is that it requires either including the relevant headers or
adding forward declarations of the library's types.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-08-19 Thread Heikki Linnakangas

On 08/20/2014 12:17 AM, John Lumby wrote:

I am attaching a new version of the patch for consideration in the current 
commit fest.


Thanks for working on this!


Relative to the one I submitted on 25 June in 
bay175-w412ff89303686022a9f16aa3...@phx.gbl
the method for handling aio completion using sigevent has been re-written to use
signals exclusively rather than a composite of signals and LWlocks,
and this has fixed the problem I mentioned before with the LWlock method.


ISTM the patch is still allocating stuff in shared memory that really 
doesn't belong there, namely the BufferAiocb structs. Or at least parts 
of them; there's also a waiter queue there which probably needs to live in 
shared memory, but the rest does not.


At least BufAWaitAioCompletion is still calling aio_error() on an AIO 
request that might've been initiated by another backend. That's not OK.


Please write the patch without the atomic CAS operation; just use a 
spinlock (a sketch follows below). There's a patch in the commitfest to 
add support for atomics, but it's not committed yet, and all those 
USE_AIO_ATOMIC_BUILTIN_COMP_SWAP ifdefs make the patch more difficult to 
read. The same goes for all the other #ifdefs; please just pick a method 
that works.
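
Something along these lines is what I mean (a sketch using the existing
s_lock API; exact field placement is up to you):

	#include "storage/spin.h"

	typedef struct BufferAiocb
	{
		slock_t		mutex;		/* protects the completion state below */
		/* ... completion state, waiter queue, etc ... */
	} BufferAiocb;

	/* at initialization */
	SpinLockInit(&baiocb->mutex);

	/* wherever the patch currently does a CAS */
	SpinLockAcquire(&baiocb->mutex);
	/* ... inspect and update completion state ... */
	SpinLockRelease(&baiocb->mutex);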


Also, please split prefetching of regular index scans into a separate 
patch. It's orthogonal to doing async I/O; we could prefetch regular 
index scans with posix_fadvise too, and AIO should be useful for 
prefetching other stuff.


- Heikki





Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 10:31 PM, Robert Haas wrote:

On Tue, Aug 19, 2014 at 3:16 PM, Magnus Hagander  wrote:

On Tue, Aug 19, 2014 at 9:09 PM, Tom Lane  wrote:

Magnus Hagander  writes:

On Tue, Aug 19, 2014 at 8:49 PM, Robert Haas  wrote:

I have a hard time believing that something like this will really
satisfy anyone.  Why not just add PQgetSchannelHandleOrWhatever() and
call it good?  We can try to be incredibly thorough in exposing the
information people want and we will still inevitably miss something
that someone cares about; worse, we'll spend an awful lot of time and
energy along the way.



Well, for one you push the full burden onto the application.


Robert's got a point though: there is always going to be somebody who
wants something we fail to expose.  It's better to be able to say "well,
you can do PQgetssl and then munge it for yourself" than to have to say
"sorry, you're screwed".  So if we're going to define PQgetssl as
returning NULL when you're not using OpenSSL, I don't see why we
shouldn't expose a similarly-defined PQgetXXX for each other underlying
implementation we support.  There will not be that many of 'em, and
I suspect the people with very specific needs will not care about more
than one underlying library anyway.

This does not say that we shouldn't also try to have some
library-independent functionality for interrogating certificate state
etc.  Just that having an escape hatch isn't a bad thing.


Yeah, wouldn't hurt I guess.


I do agree that having both would be useful. We could have something like
int PQgetSSLstruct(void **sslstruct)


I think it's likely smarter to have totally separate functions.
First, to make it less likely that users will try to use a pointer to
one type of object as a pointer to some other kind of object.  And
second, because you might, for example, someday have an SSL
implementation that wants to return two pointers.  May as well make
that kind of thing easy.


The struct it returns is totally SSL-implementation specific anyway, so 
for an implementation that would like to return two structs, you could 
well define it to return a struct like:


struct {
CoolStructA *a;
CoolStructB *b;
} CoolSSLStruct;

I don't much like adding a separate function for every SSL 
implementation, but you've got a point that it would be nice to make it 
difficult to call PQgetSSLstruct() and just assume that the returned 
struct is e.g. an OpenSSL struct while it's actually something else. 
Perhaps:


int PQgetSSLstruct(void **sslstruct, char *structname)

You'd call it like PQgetSSLStruct(&mystruct, "openssl"), and it checks 
that the argument matches the library actually being used; otherwise it 
returns an error. And if you need to return two structs, you'd call it 
twice: PQgetSSLStruct(&a, "cool_a") and PQgetSSLStruct(&b, "cool_b").
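
Usage would then look something like this (a sketch of the proposed API,
which does not exist yet; I'm assuming a zero return on success):

	SSL		   *ssl;

	if (PQgetSSLstruct((void **) &ssl, "openssl") == 0)
	{
		/* guaranteed to really be an OpenSSL SSL pointer here */
	}
	else
	{
		/* the connection uses some other SSL implementation */
	}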


- Heikki





Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-19 Thread Alvaro Herrera
MauMau wrote:

> With that said, copying to a temporary file like .tmp and
> renaming it to  sounds worthwhile even as a basic copy
> utility.  I want to avoid copying to a temporary file with a fixed
> name like _copy.tmp, because some advanced utility may want to run
> multiple instances of pg_copy to copy several files into the same
> directory simultaneously.  However, I'm afraid multiple .tmp
> files might continue to occupy disk space after canceling copy or
> power failure in some use cases, where the copy of the same file
> won't be retried.  That's also the reason why I chose to not use a
> temporary file like cp/copy.

Is there a way to create a link to a file which only exists as an open
file descriptor?  If there were, you could create a temp file, open an
fd, then delete the file.  That would remove the issue of files being
leaked due to failures of various kinds.
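
For what it's worth, Linux (3.11+) has something close to that: O_TMPFILE
creates an unnamed file inside a directory, and linkat() can give it a
name afterwards.  A sketch, Linux-specific, with error handling omitted:

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int		fd = open("/archive", O_TMPFILE | O_WRONLY, 0600);
	char	fdpath[64];

	/* ... write the data, then fsync(fd) ... */
	snprintf(fdpath, sizeof(fdpath), "/proc/self/fd/%d", fd);
	linkat(AT_FDCWD, fdpath, AT_FDCWD, "/archive/segment",
		   AT_SYMLINK_FOLLOW);

If the process dies before the linkat() call, the file simply disappears
with its last open descriptor, so nothing is leaked.  It's not portable,
though.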

Also, it's been mentioned that this utility might be useful for
restore_command.  That sounds good, I guess, but we need to keep the
RECOVERYXLOG trick in mind.  I remember a case of stalled replay because
the restore command was writing to RECOVERYXLOG.gz and ungzipping, and
the unlink("RECOVERYXLOG") call failed after a partial copy and so did
the copy from the archive.  (Removing the borked RECOVERYXLOG.gz fixed
it.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] 9.3: more problems with "Could not open file "pg_multixact/members/xxxx"

2014-08-19 Thread Andres Freund
On August 19, 2014 10:24:20 PM CEST, Jeff Janes  wrote:
>On Tue, Jul 15, 2014 at 3:58 PM, Jeff Janes  wrote:
>
>> On Fri, Jun 27, 2014 at 11:51 AM, Alvaro Herrera  wrote:
>>
>>> Jeff Janes wrote:
>>>
>>> > This problem was initially fairly easy to reproduce, but since I
>>> > started adding instrumentation specifically to catch it, it has become
>>> > devilishly hard to reproduce.
>>> >
>>> > I think my next step will be to also log each of the values which goes
>>> > into the complex if (...) expression that decides on the deletion.
>>>
>>> Could you please try to reproduce it after updating to latest?  I pushed
>>> fixes that should close these issues.  Maybe you want to remove the
>>> instrumentation you added, to make failures more likely.
>>>
>>
>> There are still some problems in 9.4, but I haven't been able to diagnose
>> them and wanted to do more research on it.  The announcement of upcoming
>> back-branches for 9.3 spurred me to try it there, and I have problems with
>> 9.3 (12c5bbdcbaa292b2a4b09d298786) as well.  The move of truncation to the
>> checkpoint seems to have made the problem easier to reproduce.  On an 8
>> core machine, this test fell over after about 20 minutes, which is much
>> faster than it usually reproduces.
>>
>> This is the error I get:
>>
>> 2084 UPDATE 2014-07-15 15:26:20.608 PDT:ERROR:  could not access status of
>> transaction 85837221
>> 2084 UPDATE 2014-07-15 15:26:20.608 PDT:DETAIL:  Could not open file
>> "pg_multixact/members/14031": No such file or directory.
>> 2084 UPDATE 2014-07-15 15:26:20.608 PDT:CONTEXT:  SQL statement "SELECT 1
>> FROM ONLY "public"."foo_parent" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR
>> KEY SHARE OF x"
>>
>> The testing harness is attached as 3 patches that must be made to the test
>> server, and 2 scripts.  The script do.sh sets up the database (using fixed
>> paths, so be careful) and then invokes count.pl in a loop to do the
>> actual work.
>>
>
>Sorry, after a long time when I couldn't do much testing on this, I've now
>been able to get back to it.
>
>It looks like what is happening is that checkPoint.nextMultiOffset wraps
>around from 2^32 to 0, even if 0 is still being used.  At that point it
>starts deleting member files that are still needed.
>
>Is there some interlock which is supposed to prevent
>checkPoint.nextMultiOffset from lapping itself?  I haven't been able to
>find it.  It seems like the interlock applies only to the MultiXid, not
>the Offset.

There is none (and there never has been one either). I've complained about it a 
couple of times but nobody, me included, had time and energy to fix that :(

Andres


--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.




Re: [HACKERS] 9.3: more problems with "Could not open file "pg_multixact/members/xxxx"

2014-08-19 Thread Jeff Janes
On Tue, Jul 15, 2014 at 3:58 PM, Jeff Janes  wrote:

> On Fri, Jun 27, 2014 at 11:51 AM, Alvaro Herrera wrote:
>
>> Jeff Janes wrote:
>>
>> > This problem was initially fairly easy to reproduce, but since I
>> > started adding instrumentation specifically to catch it, it has become
>> > devilishly hard to reproduce.
>> >
>> > I think my next step will be to also log each of the values which goes
>> > into the complex if (...) expression that decides on the deletion.
>>
>> Could you please try to reproduce it after updating to latest?  I pushed
>> fixes that should close these issues.  Maybe you want to remove the
>> instrumentation you added, to make failures more likely.
>>
>
> There are still some problems in 9.4, but I haven't been able to diagnose
> them and wanted to do more research on it.  The announcement of upcoming
> back-branches for 9.3 spurred me to try it there, and I have problems with
> 9.3 (12c5bbdcbaa292b2a4b09d298786) as well.  The move of truncation to the
> checkpoint seems to have made the problem easier to reproduce.  On an 8
> core machine, this test fell over after about 20 minutes, which is much
> faster than it usually reproduces.
>
> This is the error I get:
>
> 2084 UPDATE 2014-07-15 15:26:20.608 PDT:ERROR:  could not access status of
> transaction 85837221
> 2084 UPDATE 2014-07-15 15:26:20.608 PDT:DETAIL:  Could not open file
> "pg_multixact/members/14031": No such file or directory.
> 2084 UPDATE 2014-07-15 15:26:20.608 PDT:CONTEXT:  SQL statement "SELECT 1
> FROM ONLY "public"."foo_parent" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR
> KEY SHARE OF x"
>
> The testing harness is attached as 3 patches that must be made to the test
> server, and 2 scripts. The script do.sh sets up the database (using fixed
> paths, so be careful) and then invokes count.pl in a loop to do the
> actual work.
>

Sorry, after a long time when I couldn't do much testing on this, I've now
been able to get back to it.

It looks like what is happening is that checkPoint.nextMultiOffset wraps
around from 2^32 to 0, even if 0 is still being used.  At that point it
starts deleting member files that are still needed.

Is there some interlock which is supposed to prevent
checkPoint.nextMultiOffset from lapping itself?  I haven't been able to find
it.  It seems like the interlock applies only to the MultiXid, not the Offset.

Thanks,

Jeff


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-19 Thread Josh Berkus
On 08/19/2014 12:37 PM, Peter Eisentraut wrote:
> On 8/19/14 9:35 AM, MauMau wrote:
>> pg_copy is for copying a file reliably by syncing.  If sync is not
>> necessary, people can use cp/copy.
> 
> I'm getting mixed messages from this thread.
> 
> I think there could be a fair amount of support for a new tool that can
> serve as a universal plug-and-play archive_command, with a variety of
> options, such as fsync yes/no, fadvise yes/no, direct-io[*] yes/no,
> atomic copy yes/no, allow overwrite yes/no, compression yes/no.  That
> would reduce the need for users to compose adventurous shell commands,
> and it would set out the various options more clearly.

Yes please!

I'm not sold, though, on the idea of using direct I/O for this.  Is there
really enough benefit to make it worth the trouble?

> 
> This is not that.  This is cp+fsync with a hardcoded fadvise policy and
> optional direct-io.  That is a valid problem to address, but it is one
> of many.  On the other hand, I fear that the addition of this
> single-and-a-half-purpose tool would make the overall landscape more
> complicated than it already is.  Since it's in the examples, people will
> probably use it, even if they don't need to or shouldn't.  And not
> recommending it for the restore_command is also confusing.

I'm afraid that I agree with Peter here.  pg_copy looks like a nice
foundation for the eventual pg_copy utility we need, but it's not there yet.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> BTW, if we're beating on libpq, I wonder if we shouldn't consider
> bumping the soversion at some point.  I mean, I know that we
> technically don't need to do that if we're only *adding* functions and
> not changing any of the existing stuff in backward-incompatible ways,
> but we might *want* to make some backward-incompatible changes at some
> point, and I think there's a decent argument that any patch in this
> are is already doing that at least to PQgetSSL().  Maybe this would be
> a good time to think if there's anything else we want to do that
> would, either by itself or in combination, justify a bump.

I'm not a big fan of doing it for this specific item, though it's
technically an API breakage (which means we should actually have
libpq2-dev packages, make everything that build-deps on libpq-dev
update to build-dep on libpq2-dev, have libpq6, etc..).  If there are
other backwards-incompatible things we wish to do, then I agree that
it'd be good to do them all at the same time (perhaps in conjunction
with 10.0...).  This is the part where I wish we had been keeping an
updated list of things we want to change (like on the wiki..).

It's certainly not a fun transition to go through.  I also wonder if
we're going to need to worry about what happens when libpq5 and libpq6
end up linked into the same running application.  I don't think we
have any symbol versioning or anything in place to address that risk..

Thanks,

Stephen




Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-19 Thread Peter Eisentraut
On 8/15/14 10:46 AM, Fujii Masao wrote:
> At last, the big question is, is there really no OS command which provides
> the same functionality as pg_copy does? If there is, I'd like to avoid
> duplicate work, basically.

If you look hard enough, you can maybe find an OS command that can fsync
a file after it was copied.  Some versions of dd can do that, and some
systems have an fsync program.  But it's not clear whether all systems
have that, and it probably won't be simple and consistent.




Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-19 Thread Peter Eisentraut
On 8/19/14 9:35 AM, MauMau wrote:
> pg_copy is for copying a file reliably by syncing.  If sync is not
> necessary, people can use cp/copy.

I'm getting mixed messages from this thread.

I think there could be a fair amount of support for a new tool that can
serve as a universal plug-and-play archive_command, with a variety of
options, such as fsync yes/no, fadvise yes/no, direct-io[*] yes/no,
atomic copy yes/no, allow overwrite yes/no, compression yes/no.  That
would reduce the need for users to compose adventurous shell commands,
and it would set out the various options more clearly.

This is not that.  This is cp+fsync with a hardcoded fadvise policy and
optional direct-io.  That is a valid problem to address, but it is one
of many.  On the other hand, I fear that the addition of this
single-and-a-half-purpose tool would make the overall landscape more
complicated than it already is.  Since it's in the examples, people will
probably use it, even if they don't need to or shouldn't.  And not
recommending it for the restore_command is also confusing.

Another example of how confusing all of this is: On Windows, the copy
command by default doesn't overwrite files, which is what we want
(usually).  The patch changes those instances of copy to pg_copy, but it
doesn't have that behavior.  Should the examples be changed to do a test
&& pg_copy on Windows (what's the Windows shell syntax for that?), or
should pg_copy have an option to not overwrite a file?  How do you then
avoid inconsistencies with the Unix behavior?  Or what if you want fsync
but allow overwriting on Windows?

On the technical side, I think if you fsync a new file, you also need to
fsync the directory, to make sure the file is certain to be visible
after a crash.
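
Something along these lines, that is (POSIX, with error handling omitted):

	#include <fcntl.h>
	#include <unistd.h>

	int		dirfd = open("/archive", O_RDONLY);

	fsync(dirfd);		/* persist the new directory entry itself */
	close(dirfd);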


[*] I keep reading "directio" as a typo of "direction", so please
consider putting a hyphen or underscore in the option and variable
names. ;-)




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Robert Haas
On Tue, Aug 19, 2014 at 3:16 PM, Magnus Hagander  wrote:
> On Tue, Aug 19, 2014 at 9:09 PM, Tom Lane  wrote:
>> Magnus Hagander  writes:
>>> On Tue, Aug 19, 2014 at 8:49 PM, Robert Haas  wrote:
 I have a hard time believing that something like this will really
 satisfy anyone.  Why not just add PQgetSchannelHandleOrWhatever() and
 call it good?  We can try to be incredibly thorough in exposing the
 information people want and we will still inevitably miss something
 that someone cares about; worse, we'll spend an awful lot of time and
 energy along the way.
>>
>>> Well, for one you push the full burden onto the application.
>>
>> Robert's got a point though: there is always going to be somebody who
>> wants something we fail to expose.  It's better to be able to say "well,
>> you can do PQgetssl and then munge it for yourself" than to have to say
>> "sorry, you're screwed".  So if we're going to define PQgetssl as
>> returning NULL when you're not using OpenSSL, I don't see why we
>> shouldn't expose a similarly-defined PQgetXXX for each other underlying
>> implementation we support.  There will not be that many of 'em, and
>> I suspect the people with very specific needs will not care about more
>> than one underlying library anyway.
>>
>> This does not say that we shouldn't also try to have some
>> library-independent functionality for interrogating certificate state
>> etc.  Just that having an escape hatch isn't a bad thing.
>
> I do agree that having both would be useful. We could have something like
> int PQgetSSLstruct(void **sslstruct)

I think it's likely smarter to have totally separate functions.
First, to make it less likely that users will try to use a pointer to
one type of object as a pointer to some other kind of object.  And
second, because you might, for example, someday have an SSL
implementation that wants to return two pointers.  May as well make
that kind of thing easy.

BTW, if we're beating on libpq, I wonder if we shouldn't consider
bumping the soversion at some point.  I mean, I know that we
technically don't need to do that if we're only *adding* functions and
not changing any of the existing stuff in backward-incompatible ways,
but we might *want* to make some backward-incompatible changes at some
point, and I think there's a decent argument that any patch in this
area is already doing that, at least to PQgetSSL().  Maybe this would be
a good time to think if there's anything else we want to do that
would, either by itself or in combination, justify a bump.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> I think you just packed up the goalposts for a one-way trip to Mars,
> but I wonder: What would you consider "proper SSL support"? What
> exactly are we missing?

I hit on a few things in my other email, but there is a huge portion of
SSL which is just about making it easy and sensible to install and get
working properly.  Apache is a good example of how to do this and is one
that a lot of people are familiar with.  Specific issues that I recall
running into are lack of the 'directory' options for certificates,
having trouble figuring out the right format and structure to provide
the complete root chain for the server's certificate and then trying to
figure out how to add intermediate and additional root CAs for client
certificates, getting CRLs to work was a pain, and nothing about how to
get OCSP working.

I think there's been some improvement since I last had to go through the
pain of setting this all up, and some of it is undoubtedly OpenSSL's
fault, but there's definitely quite a bit more we could be doing to make
SSL support easier.  I'm hopeful that I'll be able to spend more time on
this in the future but it's not a priority currently.

Thanks,

Stephen




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> On 08/19/2014 06:44 PM, Stephen Frost wrote:
> >>>Hmm. That seems a bit too much. Perhaps provide just the certificate
> >>>itself in DER/PEM format, and have the client parse it (using
> >>>OpenSSL or something else) if it wants more details.
> >I really don't care for that approach.  Our SSL support has always been
> >horrible- I was hoping we'd actually improve that situation.  Adding
> >things in piecemeal over time will just be painful for our users and I
> >don't see why we should wait.
> 
> What would you like to do with the certificates?

In applications which I've developed in the past, I've had to rely on
the CN, serial number, and signing root CA to ensure that there were no
duplicates (this was a government environment which trusted multiple
independent root CAs, and there was no guarantee that even a given CA
wouldn't issue the same serial number to different individuals).  In
other cases, I've had to rely on the fingerprint, but that gets painful
when you have certificate roll-over since you then have to re-enroll
individuals when they get issued a new certificate.  I've also
implemented systems which have certificate expiration warnings.
Checking the extended attributes of the certificate has been a
requirement in the past (to verify it's only being used for its intended
purpose).

One of the things we don't support today is anything beyond matching on
the CN of the certificate in pg_ident, to map from a client certificate
to a PG role.  That wouldn't be acceptable in environments I've worked
in because two different individuals could have identical CNs.  Another
interesting twist are systems (such as Windows..) where the client
certificate to be presented depends on which root CA the server's
certificate is signed with.

I'm not asking this patch to fix that, but you asked what else a
developer might be looking for when it comes to SSL and I'm telling you
things I've actively used.  Generally speaking, these have been on the
server side (eg: with mod_ssl), but I could see a client wanting to use
them, and if we abstract getting this information on the server side to
meet the needs I've described above, wouldn't we be able to (and want
to) share that abstraction with users of libpq?

> I'm imagining that a GUI tool like pgAdmin might want to extract all
> information from the certificate, display it in a window, and let
> the user look at the whole chain and all the fields.

While that'd certainly be nice, it's not what I'm referring to and I
agree that having a third party library to handle that makes sense, as
some operating systems do.  In general, I'm all for more (and better)
integration with the OS-provided certificate systems.  For one thing,
they also can address the issues around ensuring that the client side
certificate is encrypted-at-rest, and can handle prompting the user for
the passphrase to decrypt it.

> But I don't think that exists in OpenSSL, let alone
> in other libraries, and the attribute names would be all different
> anyway.

As I said- let's look at mod_ssl/gnutls as a minimum set to start with..
That's certainly a set I'm familiar with and one which I expect most
other developers who work with SSL are also.  There are bits missing
from that list (mainly around the extended attributes..), but it's
certainly better than the list originally proposed.

> But if we provide an interface to grab the whole certificate chain,
> then you can use any library you want to parse and present it to the
> user.

Yes - we should do this also, because there may be cases where the app
developer wants to pass that off to another library or do something
else with it, sure.

Thanks,

Stephen




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Magnus Hagander
On Tue, Aug 19, 2014 at 9:09 PM, Tom Lane  wrote:
> Magnus Hagander  writes:
>> On Tue, Aug 19, 2014 at 8:49 PM, Robert Haas  wrote:
>>> I have a hard time believing that something like this will really
>>> satisfy anyone.  Why not just add PQgetSchannelHandleOrWhatever() and
>>> call it good?  We can try to be incredibly thorough in exposing the
>>> information people want and we will still inevitably miss something
>>> that someone cares about; worse, we'll spend an awful lot of time and
>>> energy along the way.
>
>> Well, for one you push the full burden onto the application.
>
> Robert's got a point though: there is always going to be somebody who
> wants something we fail to expose.  It's better to be able to say "well,
> you can do PQgetssl and then munge it for yourself" than to have to say
> "sorry, you're screwed".  So if we're going to define PQgetssl as
> returning NULL when you're not using OpenSSL, I don't see why we
> shouldn't expose a similarly-defined PQgetXXX for each other underlying
> implementation we support.  There will not be that many of 'em, and
> I suspect the people with very specific needs will not care about more
> than one underlying library anyway.
>
> This does not say that we shouldn't also try to have some
> library-independent functionality for interrogating certificate state
> etc.  Just that having an escape hatch isn't a bad thing.

I do agree that having both would be useful. We could have something like
int PQgetSSLstruct(void **sslstruct)

which returns the type of struct; it's then up to the application to
know whether it can handle it, for those apps that need a *lot*. But the
basic attributes - something like the list from Apache - should be
retrievable in a library-independent way.
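
A hypothetical call site (the PQSSL_* constants are invented here purely
for illustration, as is the exact return convention):

	void	   *sslstruct;
	int			ssltype = PQgetSSLstruct(&sslstruct);	/* proposed API */

	if (ssltype == PQSSL_OPENSSL)
		;	/* safe to cast sslstruct to SSL * */
	else if (ssltype == PQSSL_GNUTLS)
		;	/* safe to cast sslstruct to gnutls_session_t */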

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Tom Lane
Magnus Hagander  writes:
> On Tue, Aug 19, 2014 at 8:49 PM, Robert Haas  wrote:
>> I have a hard time believing that something like this will really
>> satisfy anyone.  Why not just add PQgetSchannelHandleOrWhatever() and
>> call it good?  We can try to be incredibly thorough in exposing the
>> information people want and we will still inevitably miss something
>> that someone cares about; worse, we'll spend an awful lot of time and
>> energy along the way.

> Well, for one you push the full burden onto the application.

Robert's got a point though: there is always going to be somebody who
wants something we fail to expose.  It's better to be able to say "well,
you can do PQgetssl and then munge it for yourself" than to have to say
"sorry, you're screwed".  So if we're going to define PQgetssl as
returning NULL when you're not using OpenSSL, I don't see why we
shouldn't expose a similarly-defined PQgetXXX for each other underlying
implementation we support.  There will not be that many of 'em, and
I suspect the people with very specific needs will not care about more
than one underlying library anyway.

This does not say that we shouldn't also try to have some
library-independent functionality for interrogating certificate state
etc.  Just that having an escape hatch isn't a bad thing.

regards, tom lane




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Magnus Hagander
On Tue, Aug 19, 2014 at 8:49 PM, Robert Haas  wrote:
> On Mon, Aug 18, 2014 at 7:54 AM, Heikki Linnakangas wrote:
>> In order to support alternatives to OpenSSL, we need to wean off
>> applications from using PQgetssl(). To do that, we have to provide an
>> alternative API to get the same information. PQgetSSL() returns a pointer
>> directly to the OpenSSL private struct, and you can do anything with that.
>> We cannot have a generic interface that exposes everything, so we need to
>> identify the information that people actually want, and expose that.
>
> I have a hard time believing that something like this will really
> satisfy anyone.  Why not just add PQgetSchannelHandleOrWhatever() and
> call it good?  We can try to be incredibly thorough in exposing the
> information people want and we will still inevitably miss something
> that someone cares about; worse, we'll spend an awful lot of time and
> energy along the way.

Well, for one you push the full burden onto the application. Then
every application has to support every SSL library we do, even for the
simplest check. And it has to be built against the same one. (So for
example if someone wants to use openssl on windows - yes there might
still be reasons for that even if we support schannel - they have to
rebuild every one of their applications. And every one of their higher
level language drivers sitting on top of openssl).

The same problem of course appears on say Linux, if you end up using a
mix of openssl and gnutls or a mix of nss and openssl for example.
It's not likely to happen as long as you only use the officially built
packages, but you're likely in for quite a bit of pain if you are
using any non-standard packaging like the oneclick installers etc.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Robert Haas
On Mon, Aug 18, 2014 at 7:54 AM, Heikki Linnakangas
 wrote:
> In order to support alternatives to OpenSSL, we need to wean off
> applications from using PQgetssl(). To do that, we have to provide an
> alternative API to get the same information. PQgetSSL() returns a pointer
> directly to the OpenSSL private struct, and you can do anything with that.
> We cannot have a generic interface that exposes everything, so we need to
> identify the information that people actually want, and expose that.

I have a hard time believing that something like this will really
satisfy anyone.  Why not just add PQgetSchannelHandleOrWhatever() and
call it good?  We can try to be incredibly thorough in exposing the
information people want and we will still inevitably miss something
that someone cares about; worse, we'll spend an awful lot of time and
energy along the way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> >> Um, libpq has recently gained the ability to return result fragments,
> >> right?  Those didn't exist when libpq-ification of odbc was attempted,
> >> as I recall -- perhaps it's possible now.
> 
> > I was trying to remember off-hand if we still had that or not..  I
> > thought there was discussion about removing it, actually, but perhaps
> > that was something else.
> 
> Sure,
> http://www.postgresql.org/docs/devel/static/libpq-single-row-mode.html
> That's a done deal, it won't be going away.

Ugh.  Yes, there's single-row mode, but I had been thinking there was a
'batch' mode available ala what OCI8 had, where you'd allocate a chunk
of memory and then have it filled directly by the library as rows came
back in until it was full (there was a similar 'bulk send' operation, as
I recall).  Perhaps it was the 'pipelining' thread that I was thinking
about.  Not really relevant, in any case.

> Whether it would solve ODBC's problem I don't know (and I'm not
> volunteering to do the work ;-))

It could work..  though it's certainly been a while since I looked at
the ODBC internals.

Thanks,

Stephen




Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-08-19 Thread Tomas Vondra
On 19.8.2014 19:05, Robert Haas wrote:
> On Sat, Aug 16, 2014 at 9:31 AM, Tomas Vondra  wrote:
>> On 12.8.2014 00:30, Tomas Vondra wrote:
>>> On 11.8.2014 20:25, Robert Haas wrote:
>>>> It also strikes me that when there's only 1 batch, the set of bits
>>>> that map onto the batch number is zero-width, and one zero-width bit
>>>> range is as good as another.  In other words, if you're only planning
>>>> to do one batch, you can easily grow the number of buckets on the fly.
>>>> Growing the number of buckets only becomes difficult once you have
>>>> more than one batch.
>>
>> ...
>>
>>> I was considering reversing the bits of the hash, because that's
>>> pretty much the simplest solution. But I think you're right it might
>>> actually work like this:
>>>
>>>   * Are more batches needed?
>>>
>>>   (yes) => just use nbuckets = my_log2(work_mem / tuple_size)
>>>
>>>   (no) => go ahead, until processing all tuples or hitting work_mem
>>>
>>>   (work_mem) => meh, use the same nbuckets above
>>>
>>>   (all tuples) => compute optimal nbuckets / resize
>>>
>>>
>>> But I need to think about this a bit. So far it seems to me there's no
>>> way additional batches might benefit from increasing nbuckets further.
>>
>> I think this is a simple and solid solution, solving the batchno
>> computation issues quite nicely. Attached is v10 patch (bare and
>> combined with the dense allocation), that does this:
>>
>> 1) when we know we'll need batching, buckets are sized for full work_mem
>>(using the estimated tuple width, etc.)
>>
>> 2) without the batching, we estimate the 'right number of buckets' for
>>the estimated number of tuples, and keep track of the optimal number
>>as tuples are added to the hash table
>>
>>- if we discover we need to start batching, we keep the current
>>  optimal value (which should be the same as the max number of
>>  buckets) and don't mess with it anymore (making it possible to
>>  compute batch IDs just like before)
>>
>>- also, on the first rebatch (nbatch=1 => nbatch=2) the hash table
>>  is resized as part of the rebatch
>>
>>- if the hash build completes without batching, we do the resize
>>
>> I believe the patch is pretty much perfect. I plan to do more thorough
>> testing on a wide range of queries in the next few days.
>>
>> I also removed the 'enable_hash_resize' GUC, because it would be more
>> complex to implement this properly after doing the resize as part of
>> rebatch etc.. So either it would make the patch more complex, or it
>> wouldn't do what the name promises.
> 
> A variety of trivial comments on this:
> 
> PostgreSQL style is un-cuddled curly braces.  Also, multi-line
> comments need to start with a line containing only /* and end with a
> line containing only */.  In one place you've added curly braces
> around a single-line block that is otherwise unmodified; please don't
> do that.  In one place, you have "becase" instead of "because".  In
> another place, you write "add if after it" but it should say "add it
> after it" or maybe better "add the new one after it".  Avoid using
> punctuation like "=>" in comments to illustrate the connection between
> sentences; instead, use a connecting word like "then" or "therefore"
> or whatever is appropriate; in this instance, a period followed by the
> start of a new sentence seems sufficient.  Revert the removal of a
> single line of whitespace near the top of nodeHash.c.
> 
> There are too many things marked XXX in this patch.  They should
> either be fixed, if they are real problems, or they should be
> commented in a way that doesn't give rise to the idea that they're
> problems if they aren't.

OK, thanks for pointing this out. Attached is v11 of the patch (both
separate and combined with the dense allocation, as before).

I fixed as many of those issues as possible. All the XXX items were
obsolete, except for one in the chunk_alloc function.

I have also removed one constant

> 
> OK, now on to some more substantive stuff:
> 
> 1. It's not clear to me what the overall effect of this patch on
> memory utilization is.  Reducing NTUP_PER_BUCKET from 10 to 1 is going
> to use, on the average, 10x as much bucket-header memory per tuple.
> Specifically, I think it means we'll use about 8 bytes of
> bucket-header memory per tuple instead of 0.8 bytes per tuple.  If the
> tuples are narrow, that could be significant; concerns have been
> expressed about that here in the past.  Increasing the number of
> buckets could also increase memory usage.  On the other hand, the
> dense allocation stuff probably saves a ton of memory, so maybe we end
> up ahead overall, but I'm not sure.  Your thoughts, and maybe some test
> results with narrow and wide tuples, would be appreciated.

The effect of the dense allocation was briefly discussed in this thread,
along with some quick measurements:

http://www.postgresql.org/message-id/53beea9e.2080...@fuzzy.cz

The dense allocati

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-08-19 Thread Rahila Syed
>So, it seems like you're basically using malloc to work around the
>fact that a palloc failure is an error, and we can't throw an error in
>a critical section.  I don't think that's good; we want all of our
>allocations, as far as possible, to be tracked via palloc.  It might
>be a good idea to add a new variant of palloc or MemoryContextAlloc
>that returns NULL on failure instead of throwing an error; I've wanted
>that once or twice.  But in this particular case, I'm not quite seeing
>why it should be necessary

I am using malloc so that it returns NULL in case of failure, and we
proceed without compression of the FPW if it returns NULL.
Proceeding without compression seems more accurate than throwing an
error and exiting because of a failure to allocate memory for compression.

>the number of backup blocks per record is
>limited to some pretty small number, so it ought to be possible to
>preallocate enough memory to compress them all, perhaps just by
>declaring a global variable like char wal_compression_space[8192]; or
>whatever.

In the updated patch, a static global variable is added, to which memory is
allocated from the heap using malloc outside the critical section. The size
of the memory block is 4 * BkpBlock header + 4 * BLCKSZ.
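A minimal sketch of that preallocation, using PostgreSQL's real BkpBlock,
BLCKSZ and XLR_MAX_BKP_BLOCKS (= 4) definitions; the variable and function
names are made up for illustration:

    static char *compression_scratch = NULL;

    /* Called once at startup, outside any critical section, so an
     * allocation failure here is harmless. */
    static void
    init_fpw_compression_buffer(void)
    {
        if (compression_scratch == NULL)
            compression_scratch =
                malloc(XLR_MAX_BKP_BLOCKS * (sizeof(BkpBlock) + BLCKSZ));
        /* If malloc fails, compression_scratch stays NULL and the WAL
         * insertion code falls back to uncompressed full-page writes. */
    }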


Thank you,



On Mon, Aug 18, 2014 at 10:40 PM, Robert Haas  wrote:

> On Thu, Jul 3, 2014 at 3:58 PM, Rahila Syed 
> wrote:
> > Updated version of patches are attached.
> > Changes are as follows
> > 1. Improved readability of the code as per the review comments.
> > 2. Addition of block_compression field in BkpBlock structure to store
> > information about compression of block. This provides for switching
> > compression on/off and changing compression algorithm as required.
> > 3.Handling of OOM in critical section by checking for return value of
> malloc
> > and proceeding without compression of FPW if return value is NULL.
>
> So, it seems like you're basically using malloc to work around the
> fact that a palloc failure is an error, and we can't throw an error in
> a critical section.  I don't think that's good; we want all of our
> allocations, as far as possible, to be tracked via palloc.  It might
> be a good idea to add a new variant of palloc or MemoryContextAlloc
> that returns NULL on failure instead of throwing an error; I've wanted
> that once or twice.  But in this particular case, I'm not quite seeing
> why it should be necessary - the number of backup blocks per record is
> limited to some pretty small number, so it ought to be possible to
> preallocate enough memory to compress them all, perhaps just by
> declaring a global variable like char wal_compression_space[8192]; or
> whatever.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-08-19 Thread Robert Haas
On Sat, Aug 16, 2014 at 9:31 AM, Tomas Vondra  wrote:
> On 12.8.2014 00:30, Tomas Vondra wrote:
>> On 11.8.2014 20:25, Robert Haas wrote:
>>> It also strikes me that when there's only 1 batch, the set of bits
>>> that map onto the batch number is zero-width, and one zero-width bit
>>> range is as good as another.  In other words, if you're only planning
>>> to do one batch, you can easily grow the number of buckets on the fly.
>>> Growing the number of buckets only becomes difficult once you have
>>> more than one batch.
>
> ...
>
>> I was considering reversing the bits of the hash, because that's
>> pretty much the simplest solution. But I think you're right it might
>> actually work like this:
>>
>>   * Are more batches needed?
>>
>>   (yes) => just use nbuckets = my_log2(work_mem / tuple_size)
>>
>>   (no) => go ahead, until processing all tuples or hitting work_mem
>>
>>   (work_mem) => meh, use the same nbuckets above
>>
>>   (all tuples) => compute optimal nbuckets / resize
>>
>>
>> But I need to think about this a bit. So far it seems to me there's no
>> way additional batches might benefit from increasing nbuckets further.
>
> I think this is a simple and solid solution, solving the batchno
> computation issues quite nicely. Attached is v10 patch (bare and
> combined with the dense allocation), that does this:
>
> 1) when we know we'll need batching, buckets are sized for full work_mem
>(using the estimated tuple width, etc.)
>
> 2) without the batching, we estimate the 'right number of buckets' for
>the estimated number of tuples, and keep track of the optimal number
>as tuples are added to the hash table
>
>- if we discover we need to start batching, we keep the current
>  optimal value (which should be the same as the max number of
>  buckets) and don't mess with it anymore (making it possible to
>  compute batch IDs just like before)
>
>- also, on the first rebatch (nbatch=1 => nbatch=2) the hash table
>  is resized as part of the rebatch
>
>- if the hash build completes without batching, we do the resize
>
> I believe the patch is pretty much perfect. I plan to do more thorough
> testing on a wide range of queries in the next few days.
>
> I also removed the 'enable_hash_resize' GUC, because it would be more
> complex to implement this properly after doing the resize as part of
> rebatch etc.. So either it would make the patch more complex, or it
> wouldn't do what the name promises.
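(A rough sketch of the sizing rule described above; my_log2() is the real
PostgreSQL helper and work_mem the usual GUC in kilobytes, while the
function and parameter names are assumptions:)

    static int
    choose_nbuckets(bool will_batch, double ntuples, Size tuple_width)
    {
        if (will_batch)
            /* batching: size the buckets for a full work_mem of tuples */
            return 1 << my_log2((work_mem * 1024L) / tuple_width);
        else
            /* single batch expected: size for the estimated tuple count,
             * and resize later if the estimate turns out to be wrong */
            return 1 << my_log2((long) ntuples);
    }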

A variety of trivial comments on this:

PostgreSQL style is un-cuddled curly braces.  Also, multi-line
comments need to start with a line containing only /* and end with a
line containing only */.  In one place you've added curly braces
around a single-line block that is otherwise unmodified; please don't
do that.  In one place, you have "becase" instead of "because".  In
another place, you write "add if after it" but it should say "add it
after it" or maybe better "add the new one after it".  Avoid using
punctuation like "=>" in comments to illustrate the connection between
sentences; instead, use a connecting word like "then" or "therefore"
or whatever is appropriate; in this instance, a period followed by the
start of a new sentence seems sufficient.  Revert the removal of a
single line of whitespace near the top of nodeHash.c.

There are too many things marked XXX in this patch.  They should
either be fixed, if they are real problems, or they should be
commented in a way that doesn't give rise to the idea that they're
problems if they aren't.

OK, now on to some more substantive stuff:

1. It's not clear to me what the overall effect of this patch on
memory utilization is.  Reducing NTUP_PER_BUCKET from 10 to 1 is going
to use, on the average, 10x as much bucket-header memory per tuple.
Specifically, I think it means we'll use about 8 bytes of
bucket-header memory per tuple instead of 0.8 bytes per tuple.  If the
tuples are narrow, that could be significant; concerns have been
expressed about that here in the past.  Increasing the number of
buckets could also increase memory usage.  On the other hand, the
dense allocation stuff probably saves a ton of memory, so maybe we end
up ahead overall, but I'm not sure.  Your thoughts, and maybe some test
results with narrow and wide tuples, would be appreciated.
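(The arithmetic behind those per-tuple figures, as a standalone check; it
assumes 8-byte bucket pointers and nbuckets = ntuples / NTUP_PER_BUCKET:)

    #include <stdio.h>

    int
    main(void)
    {
        double ptr_bytes = 8.0;    /* one bucket-header pointer, 64-bit */

        /* ~10 tuples share each bucket header vs. one tuple per header */
        printf("NTUP_PER_BUCKET = 10: %.1f bytes/tuple\n", ptr_bytes / 10);
        printf("NTUP_PER_BUCKET = 1:  %.1f bytes/tuple\n", ptr_bytes / 1);
        return 0;
    }

which prints 0.8 and 8.0 respectively.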

2. But, on the positive side, modulo the memory utilization questions
mentioned above, I would expect the impact on hash join performance to
be positive.  Going from 10 tuples per bucket to just 1 should help,
and on cases where the actual load factor would have ended up much
higher because of poor estimation, increasing the number of buckets on
the fly should help even more.  I haven't tested this, though.

I haven't had a chance to completely go through this yet, so these are
just some initial thoughts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] replication commands and log_statements

2014-08-19 Thread Robert Haas
On Sat, Aug 16, 2014 at 10:27 AM, Amit Kapila  wrote:
> I think ideally it would have been better if we could have logged
> replication commands under a separate log level, but as there is still
> no consensus on extending log_statement and nobody is even
> willing to pursue it, it seems okay to go ahead and log these under
> the 'all' level.

I think the consensus is clearly for a separate GUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 06:00 PM, Magnus Hagander wrote:

On Tue, Aug 19, 2014 at 4:48 PM, Stephen Frost  wrote:

* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:

   server_cert_valid: Did the server present a valid certificate?
"yes" or "no"

   server_cert_matches_host: Does the Common Name of the certificate
match the host connected to? "yes" or "no"


Aren't these questions addressed by sslmode?


Not entirely. You can have sslmode=require and have a matching
certificate. You don't *have* to have sslmode=verify-full for that.

However, whether it makes *sense* without sslmode is another story -
but assuming you use something like kerberos for auth, it might. For
password, you've already lost once you get that far.


Hmm, right, because the client application doesn't get control between 
libpq doing the SSL negotiation and sending the password to the server. 
So if after connecting you decide that you don't actually trust the 
server, you've already sent the password. Not good.


You might think that you could try connecting without password first, 
and try again with the password, but that's not safe either, because 
there's no guarantee that the second connection reaches the same server 
as the first one.


I think we need a callback or new asynchronous polling state after SSL 
negotiation but before libpq sends the password to the server. But 
that's a separate feature and patch.


- Heikki





Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 06:52 PM, Stephen Frost wrote:

* Andres Freund (and...@2ndquadrant.com) wrote:

No. We should build something that's suitable for postgres, not
something general. We'll fail otherwise. For anything fancy the user has
to look at the certificate themselves. We should make it easy to get at
the whole certificate chain in a consistent manner.


I don't buy this argument at all.


Telling users they simply can't have this information isn't
acceptable.


Meh. Why? Most of that isn't something a normal libpq user is going to
need.


I'm not interested in SSL support for users who don't use or care about
SSL (which would be 'normal libpq users', really).  I've *long* been
frustrated by our poor support of SSL and at how painful it is to get
proper SSL working- and it's been a real problem getting PG to pass the
security compliance requirements because of that poor support.  Let's
stop the rhetoric that PG doesn't need anything but the most basic
SSL/auditing/security capabilities.


I think you just packed up the goalposts for a one-way trip to Mars, but 
I wonder: What would you consider "proper SSL support"? What exactly are 
we missing?


- Heikki





Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Tom Lane
Stephen Frost  writes:
> * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
>> Um, libpq has recently gained the ability to return result fragments,
>> right?  Those didn't exist when libpq-ification of odbc was attempted,
>> as I recall -- perhaps it's possible now.

> I was trying to remember off-hand if we still had that or not..  I
> thought there was discussion about removing it, actually, but perhaps
> that was something else.

Sure,
http://www.postgresql.org/docs/devel/static/libpq-single-row-mode.html
That's a done deal, it won't be going away.

Whether it would solve ODBC's problem I don't know (and I'm not
volunteering to do the work ;-))

regards, tom lane




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 07:10 PM, Alvaro Herrera wrote:

Stephen Frost wrote:

* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:


Indeed, the ODBC driver only uses libpq for authentication, then
calls PQgetssl(), and takes over the whole show calling SSL_read()
and SSL_write() itself. Ideally, we'd modify psqlodbc to stop doing
that, but that's not an easy job. In the short-term, I think we need
to export pqsecure_read() and pqsecure_write() functions in libpq,
so that the ODBC driver can use those instead of SSL_read() and
SSL_write().


Yeah, that's what I remembered.  There was an attempt to make that
change at one point, but it was reverted due to the lack of batching
ability in libpq (without resorting to cursors, as I recall...),
requiring double the memory usage.  Still, if pqsecure_read and
pqsecure_write are sufficient to make the ODBC driver work, that's good
news.  I had been worried it did other things with the OpenSSL struct
beyond just using those.


Um, libpq has recently gained the ability to return result fragments,
right?  Those didn't exist when libpq-ification of odbc was attempted,
as I recall -- perhaps it's possible now.


IIRC the thing that psqlodbc does that libpq doesn't support is sending 
multiple queries to the backend, and then waiting for *all* the replies to 
arrive, in a single round-trip. The closest thing is using PQexec("foo; 
bar;"), but that's quite limited.
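A sketch of how far the existing asynchronous API gets: PQsendQuery() can
carry several statements in one round trip and, unlike PQexec() which keeps
only the last result, hands back one PGresult per statement; the statements
still travel as a single string, though, not as independent queries:

    PGresult *res;

    /* one network round trip carrying both statements */
    if (PQsendQuery(conn, "SELECT 1; SELECT 2;"))
    {
        /* one PGresult per statement, in order */
        while ((res = PQgetResult(conn)) != NULL)
            PQclear(res);
    }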


- Heikki




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 06:44 PM, Stephen Frost wrote:

>Hmm. That seems a bit too much. Perhaps provide just the certificate
>itself in DER/PEM format, and have the client parse it (using
>OpenSSL or something else) if it wants more details.

I really don't care for that approach.  Our SSL support has always been
horrible- I was hoping we'd actually improve that situation.  Adding
things in piecemeal over time will just be painful for our users and I
don't see why we should wait.


What would you like to do with the certificates?

I'm imagining that a GUI tool like pgAdmin might want to extract all 
information from the certificate, display it in a window, and let the 
user look at the whole chain and all the fields. Like a browser does 
when you click the little lock icon in the address bar. That would be a 
nice feature, but it's a huge effort to expose *all* certificate 
information through attributes, especially if you want to support 
multiple SSL libraries. If there was a generic "get attribute X" 
interface in OpenSSL and all the other SSL libraries we wish to support, 
we could provide a pass-through mechanism for that, so that e.g all 
attributes that OpenSSL exposes were mapped to "server_cert_*". But I 
don't think that exists in OpenSSL, let alone in other libraries, and 
the attribute names would be all different anyway.


So that's not really feasible.

But if we provide an interface to grab the whole certificate chain, then 
you can use any library you want to parse and present it to the user. 
You could use OpenSSL, but you could also use a more light-weight parser 
like libtasn1, or if you're writing a python app for example, whatever 
x509 certificate handling library they have. You wouldn't be *verifying* 
the certificates - that's handled by libpq (or rather, the SSL library 
that libpq uses) - so no cryptography required.
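A minimal sketch of that client-side parsing with OpenSSL, assuming a
hypothetical libpq call has already handed us the server certificate as a
NUL-terminated PEM string (the OpenSSL calls themselves are real):

    #include <stdio.h>
    #include <openssl/pem.h>
    #include <openssl/x509.h>

    static void
    print_subject(const char *pem)
    {
        BIO  *bio = BIO_new_mem_buf((void *) pem, -1);
        X509 *cert = PEM_read_bio_X509(bio, NULL, NULL, NULL);

        if (cert != NULL)
        {
            char subject[256];

            X509_NAME_oneline(X509_get_subject_name(cert),
                              subject, sizeof(subject));
            printf("subject: %s\n", subject);
            X509_free(cert);
        }
        BIO_free(bio);
    }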


Or you could just pass the whole cert to a 3rd party program 
specifically written to display x509 certificates, and let it do the 
parsing. I'll mention that the Windows Crypto API has a built-in 
function called CryptUIDlgViewCertificate that pops up a dialog for 
viewing the certificate. Very handy. I think it's the same dialog that 
Internet Explorer uses.


If you want to write such a GUI from scratch, anyway, I think you would 
be better off to *not* rely on libpq functions, so that you could use 
the same GUI in other contexts too. Like to view an arbitrary 
certificate file on the filesystem.


That said, if there's a need to extract some specific fields for some 
other purpose than displaying the whole certificate to the user, let's 
hear it.


- Heikki




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Stephen Frost wrote:
> > Yeah, that's what I remembered.  There was an attempt to make that
> > change at one point, but it was reverted due to the lack of batching
> > ability in libpq (without resorting to cursors, as I recall...),
> > requiring double the memory usage.  Still, if pqsecure_read and
> > pqsecure_write are sufficient to make the ODBC driver work, that's good
> > news.  I had been worried it did other things with the OpenSSL struct
> > beyond just using those.
> 
> Um, libpq has recently gained the ability to return result fragments,
> right?  Those didn't exist when libpq-ification of odbc was attempted,
> as I recall -- perhaps it's possible now.

I was trying to remember off-hand if we still had that or not..  I
thought there was discussion about removing it, actually, but perhaps
that was something else.

I agree that having that would definitely help with the ODBC driver.

Thanks,

Stephen




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
> > Per Apache's documentation, mod_ssl and mod_gnutls support the same set
> > of environment variables (with the same names even), so I don't buy this
> > argument either.
> 
> Gnutls is quite similar from what it provides to openssl. That's not
> saying much. Schannel would be more interesting from that point of view.

Fine- but let's at least start with what two of the three support and
figure out if there's actually an issue getting this information from
Schannel.  I'd be surprised if there really is, but I'm a lot happier
starting with a larger set and then considering if we can live without
certain things than trying to build up one-by-one over major releases.

Thanks,

Stephen




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Alvaro Herrera
Stephen Frost wrote:
> * Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> 
> > Indeed, the ODBC driver only uses libpq for authentication, then
> > calls PQgetssl(), and takes over the whole show calling SSL_read()
> > and SSL_write() itself. Ideally, we'd modify psqlodbc to stop doing
> > that, but that's not an easy job. In the short-term, I think we need
> > to export pqsecure_read() and pqsecure_write() functions in libpq,
> > so that the ODBC driver can use those instead of SSL_read() and
> > SSL_write().
> 
> Yeah, that's what I remembered.  There was an attempt to make that
> change at one point, but it was reverted due to the lack of batching
> ability in libpq (without resorting to cursors, as I recall...),
> requiring double the memory usage.  Still, if pqsecure_read and
> pqsecure_write are sufficient to make the ODBC driver work, that's good
> news.  I had been worried it did other things with the OpenSSL struct
> beyond just using those.

Um, libpq has recently gained the ability to return result fragments,
right?  Those didn't exist when libpq-ification of odbc was attempted,
as I recall -- perhaps it's possible now.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Andres Freund
On 2014-08-19 11:52:37 -0400, Stephen Frost wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
> > No. We should build something that's suitable for postgres, not
> > something general. We'll fail otherwise. For anything fancy the user has
> > to look at the certificate themselves. We should make it easy to get at
> > the whole certificate chain in a consistent manner.
> 
> I don't buy this argument at all.

Aha.

> > > Telling users they simply can't have this information isn't
> > > acceptable.
> > 
> > Meh. Why? Most of that isn't something a normal libpq user is going to
> > need.
> 
> I'm not interested in SSL support for users who don't use or care about
> SSL (which would be 'normal libpq users', really).

That's the majority of our users. Even those that care about ssl care
about setting it up in a safe manner, won't care about most of the
attributes.

I have no problem to expand the list of attributes once we have a couple
of differing backends for the support, but having a long list of things
that need to be supported by every one just makes getting there harder.

> I've *long* been
> frustrated by our poor support of SSL and at how painful it is to get
> proper SSL working- and it's been a real problem getting PG to pass the
> security compliance requirements because of that poor support.  Let's
> stop the rhetoric that PG doesn't need anything but the most basic
> SSL/auditing/security capabilities.

I've no problem with keeping future extensions of the API in mind while
this is being designed. We just shouldn't start too big. This is about
getting a proper abstraction in place, not making pg pass security
compliance stuff. Don't mix those too much.

> > > If we end up not being able to provide everything for all of the
> > > libraries we support then perhaps we can document which are available
> > > from all of them, but I'd hope the list of "only in X" is pretty small.
> > 
> > I'm pretty sure that we can't build a reasonable list of the information
> > exposed by any library. Especially as we're likely going to need to
> > agree on some mapping to the common names.
> 
> Per Apache's documentation, mod_ssl and mod_gnutls support the same set
> of environment variables (with the same names even), so I don't buy this
> argument either.

Gnutls is quite similar from what it provides to openssl. That's not
saying much. Schannel would be more interesting from that point of view.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
> No. We should build something that's suitable for postgres, not
> something general. We'll fail otherwise. For anything fancy the user has
> to look at the certificate themselves. We should make it easy to get at
> the whole certificate chain in a consistent manner.

I don't buy this argument at all.

> > Telling users they simply can't have this information isn't
> > acceptable.
> 
> Meh. Why? Most of that isn't something a normal libpq user is going to
> need.

I'm not interested in SSL support for users who don't use or care about
SSL (which would be 'normal libpq users', really).  I've *long* been
frustrated by our poor support of SSL and at how painful it is to get
proper SSL working- and it's been a real problem getting PG to pass the
security compliance requirements because of that poor support.  Let's
stop the rhetoric that PG doesn't need anything but the most basic
SSL/auditing/security capabilities.

> > If we end up not being able to provide everything for all of the
> > libraries we support then perhaps we can document which are available
> > from all of them, but I'd hope the list of "only in X" is pretty small.
> 
> I'm pretty sure that we can't build a reasonable list of the information
> exposed by any library. Especially as we're likely going to need to
> agree on some mapping to the common names.

Per Apache's documentation, mod_ssl and mod_gnutls support the same set
of environment variables (with the same names even), so I don't buy this
argument either.

Thanks,

Stephen




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> I think it would be nice to be able to query those attributes
> explicitly, rather than just expect libpq to reject the connection
> if something's wrong. For example, I'm thinking that an interactive
> client might present an annoying pop-up window to the user if the
> server cert is not valid, asking if he wants to connect anyway, and
> perhaps remember the certificate and not ask again (TOFU).

Alright, I could see that being useful, though as you say, it'd really
be new functionality.

> Hmm. That seems a bit too much. Perhaps provide just the certificate
> itself in DER/PEM format, and have the client parse it (using
> OpenSSL or something else) if it wants more details.

I really don't care for that approach.  Our SSL support has always been
horrible- I was hoping we'd actually improve that situation.  Adding
things in piecemeal over time will just be painful for our users and I
don't see why we should wait.

> >OCSP used?
> 
> We don't support OCSP.

Another thing that we really should address (actually- can't you enable
it in OpenSSL directly?  I seem to recall something along those lines
anyway, though it's been quite a few years now).

> >This really shouldn't be for *just* the server's certificate but rather
> >available for all certificates involved- on both sides.
> 
> Ok, but why? All the other stuff is readily available in the
> configuration you use to connect. I guess it doesn't hurt to expose
> them through this interface as well, but I can't immediately think
> of an example that would use them.

For starters, certificates can be passed between the client and the
server to complete the chain, so I don't see how it's "readily
available", not to mention that even if the location of the certs was in
simple files locally, the application would need to bring in their own
library to parse and extract out this information, which we've
more-or-less already got.

> Indeed, the ODBC driver only uses libpq for authentication, then
> calls PQgetssl(), and takes over the whole show calling SSL_read()
> and SSL_write() itself. Ideally, we'd modify psqlodbc to stop doing
> that, but that's not an easy job. In the short-term, I think we need
> to export pqsecure_read() and pqsecure_write() functions in libpq,
> so that the ODBC driver can use those instead of SSL_read() and
> SSL_write().

Yeah, that's what I remembered.  There was an attempt to make that
change at one point, but it was reverted due to the lack of batching
ability in libpq (without resorting to cursors, as I recall...),
requiring double the memory usage.  Still, if pqsecure_read and
pqsecure_write are sufficient to make the ODBC driver work, that's good
news.  I had been worried it did other things with the OpenSSL struct
beyond just using those.

Thanks,

Stephen




Re: [HACKERS] pg_dump refactor patch to remove global variables

2014-08-19 Thread Craig Ringer
On 08/19/2014 01:40 AM, Robert Haas wrote:
>> > Attached is a patch that doesn't add any new functionality or
>> > features, all it does is get rid of the global variables that
>> > pg_dump.c is full of.
> I think this is an excellent idea.

It's also one small step toward library-ifying pg_dump.

Huge +1.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org

2014-08-19 Thread Craig Ringer
On 08/19/2014 05:02 AM, Joshua D. Drake wrote:
> 
> I am finally able to get back around to this and I am still calling:
> Hokey. I just loaded up a fresh precise (I assume trusty will act the
> same way) and installed postgresql. I installed it, without the PDGD
> repository and everything worked perfectly. The only error I got when
> using pgxnclient to install pg_repack was an error about not having
> libedit-dev installed. I installed it, and it was perfect. I even tested
> with create extension etc...
> 
> So... If we are supposed to ship the "latest" lib... how come Debian or
> Ubuntu don't do that? They ship the latest lib for the version they are
> shipping and because of that, everything works, as expected.
> 
> I reiterate: the current apt.postgresql.org is not doing things correctly.
> It breaks things and it shouldn't.

FWIW, this is inconsistent with what yum.postgresql.org does - it takes
the POLA approach of packaging the libpq from the major release configured
in the repo. Each major has its own sub-repo.

I find it pretty hard to justify installing a 9.3 libpq alongside a 9.1
server myself.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [Postgres-xc-developers] Trove with PostgreSQL-XC

2014-08-19 Thread Craig Ringer
I replied to the XC-list (only) to ask them to discontinue cross-posting
this thread.

Replying here just so you know.

On 08/19/2014 07:46 PM, Vivek Singh Raghuwanshi wrote:
> Thanks,
> One more question: does this library support multitenancy, or do we need to
> launch a separate VPC (virtual private cloud) every time for each customer?
> It's good if we have both options.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Andres Freund
On 2014-08-19 11:05:07 -0400, Stephen Frost wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
> > On 2014-08-19 10:48:41 -0400, Stephen Frost wrote:
> > > At first blush, I'd say a whole bunch..  Off the top of my head I can
> > > think of:
> 
> [...]
> 
> > I'm not really sure we need all that. We're not building a general ssl
> > library abstraction here.
> 
> Really?  I'm pretty sure that's exactly what we're doing.

No. We should build something that's suitable for postgres, not
something general. We'll fail otherwise. For anything fancy the user has
to look at the certificate themselves. We should make it easy to get at
the whole certificate chain in a consistent manner.

> Telling users they simply can't have this information isn't
> acceptable.

Meh. Why? Most of that isn't something a normal libpq user is going to
need.

> > What I'm wondering is whether we should differentiate 'standard'
> > attributes that we require from ones that a library can supply
> > optionally. If we don't we'll have difficulty enlarging the 'standard'
> > set over time.
> 
> If we end up not being able to provide everything for all of the
> libraries we support then perhaps we can document which are available
> from all of them, but I'd hope the list of "only in X" is pretty small.

I'm pretty sure that we can't build a reasonable list of the information
exposed by any library. Especially as we're likely going to need to
agree on some mapping to the common names.

I'd just go for plain names for standard attributes and X-$library- for
library-specific stuff.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 05:48 PM, Stephen Frost wrote:

* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:

   server_cert_valid: Did the server present a valid certificate?
"yes" or "no"

   server_cert_matches_host: Does the Common Name of the certificate
match the host connected to? "yes" or "no"


Aren't these questions addressed by sslmode?


Sort of. In sslmode=verify-ca, libpq checks that the server cert was 
valid (the first attribute) and rejects the connection if not. In 
verify-full mode, it also checks that the hostname matches (the second 
attribute). But in sslmode=require, it's possible to connect to a server 
with an invalid server cert. (to be precise in sslmode=require mode 
libpq checks the server cert if a root CA cert was given, but if no root 
CA cert is configured it will allow connecting anyway).


I think it would be nice to be able to query those attributes 
explicitly, rather than just expect libpq to reject the connection if 
something's wrong. For example, I'm thinking that an interactive client 
might present an annoying pop-up window to the user if the server cert 
is not valid, asking if he wants to connect anyway, and perhaps remember 
the certificate and not ask again (TOFU).


We don't actually have such functionality today; you can query the 
OpenSSL structs for those things, but the checks that libpq performs are 
not exactly the same that OpenSSL does. We have our own function to 
check if a wildcard cert matches a hostname, for example, and libpq 
knows that "host" and "hostaddr" can be different. So this would 
actually be a new feature, probably best to be implemented as a separate 
patch. (I grabbed the idea for those attributes from Martijn's ancient 
gnutls patch.)



Exposing the SSL information as generic key/value pairs allows
adding more attributes in the future, without breaking the ABI, and
it also allows exposing implementation-specific information in a
generic way. The attributes listed above cover the needs of psql.
What else do we need?


At first blush, I'd say a whole bunch..  Off the top of my head I can
think of:

For all certificates:
(client, server, cert that signed each, any intermediate CAs, root CAs)
   Certificate itself (perhaps in DER, PEM, X509 formats..)
   Fingerprint
   Signed-By info
   Common Name
   Organization (et al)
   Alternate names
   Issue date, expiration date
   CRL info, OCSP info
   Allowed usage (encryption, signing, etc)


Hmm. That seems a bit too much. Perhaps provide just the certificate 
itself in DER/PEM format, and have the client parse it (using OpenSSL or 
something else) if it wants more details.



CRL checking done?


I guess, although you know implicitly that it was if the sslcrl option 
was given.



OCSP used?


We don't support OCSP.


I think it would also be nice to get more information from the
server's certificate, like the hostname and the organization it's
issued to, and expiration date, so that an interactive client like
pgAdmin or even psql could display that information like a web
browser does. Would it be best to add those as extra attributes in
the above list, perhaps with a "server_cert_*" prefix, or add a new
function for extracting server cert's attributes?


This really shouldn't be for *just* the server's certificate but rather
available for all certificates involved- on both sides.


Ok, but why? All the other stuff is readily available in the 
configuration you use to connect. I guess it doesn't hurt to expose them 
through this interface as well, but I can't immediately think of an 
example that would use them.



Have you looked at how this change will play out with the ODBC driver..?
Especially on Windows with the SSL library you're proposing we use
there..  I recall that at one point the ODBC driver simply used libpq to
handle the authentication and set everything up, and then switched to
talking directly without libpq.  In any case, it'd probably be good to
make sure the attributes you're suggesting are sufficient to meet the
needs of the ODBC driver too.


Indeed, the ODBC driver only uses libpq for authentication, then calls 
PQgetssl(), and takes over the whole show calling SSL_read() and 
SSL_write() itself. Ideally, we'd modify psqlodbc to stop doing that, 
but that's not an easy job. In the short-term, I think we need to export 
pqsecure_read() and pqsecure_write() functions in libpq, so that the 
ODBC driver can use those instead of SSL_read() and SSL_write().
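Those two functions already exist inside libpq (fe-secure.c); exporting
them would essentially mean publishing their existing signatures, roughly:

    /* hypothetical additions to libpq-fe.h and the export list */
    extern ssize_t pqsecure_read(PGconn *conn, void *ptr, size_t len);
    extern ssize_t pqsecure_write(PGconn *conn, const void *ptr, size_t len);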


- Heikki





Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
> On Tue, Aug 19, 2014 at 4:48 PM, Stephen Frost  wrote:
> > Aren't these questions addressed by sslmode?
> 
> Not entirely. You can have sslmode=require and have a matching
> certificate. You don't *have* to have sslmode=verify-full for that.
> 
> However, whether it makes *sense* without sslmode is another story -
> but assuming you use something like kerberos for auth, it might. For
> password, you've already lost once you get that far.

Sure- I guess my point was really, if you're not verifying them by
sslmode=verify-full, do you really want to ask the question?  If you
*are* verifying them by verify-full, then you already know the answers.

> >> What else do we need?
> >
> > At first blush, I'd say a whole bunch..  Off the top of my head I can
> > think of:
> >
> > For all certificates:
> > (client, server, cert that signed each, any intermediate CAs, root CAs)
> >   Certificate itself (perhaps in DER, PEM, X509 formats..)
> 
> Yeah, if we can extract it in PEM for example, that would be useful.
> 
> >   Fingerprint
> 
> Definitely.
> 
> >   Signed-By info
> 
> If we can get the full cert, do that one instead.
> 
> >   Common Name
> 
> Definitely.
> 
> >   Organization (et al)
> >   Alternate names
> >   Issue date, expiration date
> >   CRL info, OCSP info
> >   Allowed usage (encryption, signing, etc)
> 
> All those would also be covered by the "certificate itself" part I
> think - they're not that common.

Not sure I agree with that but what I don't really like is the
suggestion that we'll need to tell everyone who wants more detailed
information from the certificate to link in whatever their preferred SSL
library is and use that to decode the PEM cert to pull the info.  We'll
end up having applications linking in both OpenSSL and GNUTLS, for
example, which is pretty grotty, imv.

Serial is absolutely another one we need to include, as I look over at
what mod_ssl supports.  Really, I'd look at that list as our minimum to
support..

> >> I think it would also be nice to get more information from the
> >> server's certificate, like the hostname and the organization it's
> >> issued to, and expiration date, so that an interactive client like
> >> pgAdmin or even psql could display that information like a web
> >> browser does. Would it be best to add those as extra attributes in
> >> the above list, perhaps with a "server_cert_*" prefix, or add a new
> >> function for extracting server cert's attributes?
> >
> > This really shouldn't be for *just* the server's certificate but rather
> > available for all certificates involved- on both sides.
> 
> Well, if you are already the client, wouldn't you know your own certificate?

Uh, no?  Not without having a library of your own which can open the
certificate file (after it figures out which one we decided to use- oh
yeah, we should probably include that information too..  and then we
have to make sure we can represent things like "on a smart card") and
then parse and extract the information you want from it..

> > That's not ideal, but the only other option I can think of offhand is to
> > break the existing API and force everyone to update and that seems
> > worse.
> 
> Agreed.
> 
> If we just return an arbitrary pointer, then any application that
> *did* actually try to use it would crash.

That wasn't what I was thinking but rather something like "remove
PQgetssl and replace it with PQgetopenssl" or something, breaking the
API completely, forcing everyone to make changes to compile against the
new library, etc, etc.  Very ugly but also very obvious.

> It's not ideal, but erroring in the way of not saying we're secure
> when we are, is acceptable - unlike the opposite.

Yeah, I tend to agree, though I don't particularly like it.  The options
are just so much worse. :/

> Of course, we need to publish it very clearly in the release notes,
> and I would suggest backpatching into the documentation in old
> versions etc as well.

Sounds like a good idea to me.

Thanks,

Stephen




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Magnus Hagander
On Tue, Aug 19, 2014 at 5:05 PM, Stephen Frost  wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
>> On 2014-08-19 10:48:41 -0400, Stephen Frost wrote:
>> > At first blush, I'd say a whole bunch..  Off the top of my head I can
>> > think of:
>
> [...]
>
>> I'm not really sure we need all that. We're not building a general ssl
>> library abstraction here.
>
> Really?  I'm pretty sure that's exactly what we're doing.  What I was
> wondering is which one we should be modeling off of.
>
> One thought I had was to look at what Apache's mod_ssl provides, which
> can be seen here: http://httpd.apache.org/docs/2.2/mod/mod_ssl.html
>
> I know that I've used quite a few of those.
>
> Telling users they simply can't have this information isn't acceptable.
> I'm not a huge fan of just passing back all of the certificates and
> making the user extract out the information themselves, but if it comes
> down to it then that's at least better than removing any ability to get
> at that information.

Yeah, being able to make most of them easily accessible is a good
thing. Otherwise, we just move the burden of deparsing them to the
client, which will then have to know which SSL library it's built
against, so every single client that wants to do something useful with
the cert would have to know about multiple implementations.

I think starting from the apache list is a very good idea.

We should then expose the same set of data at least through the
sslinfo server module.


>> What I'm wondering is whether we should differentiate 'standard'
>> attributes that we require from ones that a library can supply
>> optionally. If we don't we'll have difficulty enlarging the 'standard'
>> set over time.
>
> If we end up not being able to provide everything for all of the
> libraries we support then perhaps we can document which are available
> from all of them, but I'd hope the list of "only in X" is pretty small.

+1. I bet the most common ones will be in all of them, because
frankly, it's functionality you just need to use SSL properly.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-08-19 Thread Alvaro Herrera
Abhijit Menon-Sen wrote:
> At 2014-06-29 20:35:04 +0900, maumau...@gmail.com wrote:
> >
> > Thanks, I marked it as ready for committer.  I hope Fujii san or
> > another committer will commit this, refining English expression if
> > necessary.
> 
> Since it was just a matter of editing, I went through the patch and
> corrected various minor errors (typos, awkwardness, etc.). I agree
> that this is now ready for committer.

FWIW I think "determines" was correct.



-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2014-08-19 10:48:41 -0400, Stephen Frost wrote:
> > At first blush, I'd say a whole bunch..  Off the top of my head I can
> > think of:

[...]

> I'm not really sure we need all that. We're not building a general ssl
> library abstraction here.

Really?  I'm pretty sure that's exactly what we're doing.  What I was
wondering is which one we should be modeling off of.

One thought I had was to look at what Apache's mod_ssl provides, which
can be seen here: http://httpd.apache.org/docs/2.2/mod/mod_ssl.html

I know that I've used quite a few of those.

Telling users they simply can't have this information isn't acceptable.
I'm not a huge fan of just passing back all of the certificates and
making the user extract out the information themselves, but if it comes
down to it then that's at least better than removing any ability to get
at that information.

> What I'm wondering is whether we should differentiate 'standard'
> attributes that we require from ones that a library can supply
> optionally. If we don't we'll have difficulty enlarging the 'standard'
> set over time.

If we end up not being able to provide everything for all of the
libraries we support then perhaps we can document which are available
from all of them, but I'd hope the list of "only in X" is pretty small.

Thanks,

Stephen




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Magnus Hagander
On Tue, Aug 19, 2014 at 4:48 PM, Stephen Frost  wrote:
> * Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
>>   server_cert_valid: Did the server present a valid certificate?
>> "yes" or "no"
>>
>>   server_cert_matches_host: Does the Common Name of the certificate
>> match the host connected to? "yes" or "no"
>
> Aren't these questions addressed by sslmode?

Not entirely. You can have sslmode=require and have a matching
certificate. You don't *have* to have sslmode=verify-full for that.

However, whether it makes *sense* without sslmode is another story -
but assuming you use something like kerberos for auth, it might. For
password, you've already lost once you get that far.


>> Exposing the SSL information as generic key/value pairs allows
>> adding more attributes in the future, without breaking the ABI, and
>> it also allows exposing implementation-specific information in a
>> generic way. The attributes listed above cover the needs of psql.
>> What else do we need?
>
> At first blush, I'd say a whole bunch..  Off the top of my head I can
> think of:
>
> For all certificates:
> (client, server, cert that signed each, any intermediate CAs, root CAs)
>   Certificate itself (perhaps in DER, PEM, X509 formats..)

Yeah, if we can extract it in PEM for example, that would be useful.

>   Fingerprint

Definitely.

>   Signed-By info

If we can get the full cert, do that one instead.

>   Common Name

Definitely.

>   Organization (et al)
>   Alternate names
>   Issue date, expiration date
>   CRL info, OCSP info
>   Allowed usage (encryption, signing, etc)

All those would also be covered by the "certificate itself" part I
think - they're not that common.


> CRL checking done?
> OCSP used?
>
>> I think it would also be nice to get more information from the
>> server's certificate, like the hostname and the organization its
>> issued to, and expiration date, so that an interactive client like
>> pgAdmin or even psql could display that information like a web
>> browser does. Would it be best to add those as extra attributes in
>> the above list, perhaps with a "server_cert_*" prefix, or add a new
>> function for extracting server cert's attributes?
>
> This really shouldn't be for *just* the server's certificate but rather
> available for all certificates involved- on both sides.

Well, if you are already the client, wouldn't you know your own certificate?


>> The other question is: What do we do with PQgetssl()? We should
>> document it as deprecated, but we'll have to keep it around for the
>> foreseeable future for backwards-compatibility. We obviously cannot
>> return a valid OpenSSL struct when using any other implementation,
>> so I think it'll have to just return NULL when not using OpenSSL.
>> Probably the most common use of PQgetssl() is to just check if it
>> returns NULL or not, to determine if SSL is enabled, so a client
>> that does that would incorrectly think that SSL is not used, even
>> when it is. I think we can live with that.
>
> That's not ideal, but the only other option I can think of offhand is to
> break the existing API and force everyone to update and that seems
> worse.

Agreed.

If we just return an arbitrary pointer, then any application that
*did* actually try to use it would crash.

It's not ideal, but erring in the direction of not saying we're secure
when we are is acceptable - unlike the opposite.

Of course, we need to publish it very clearly in the release notes,
and I would suggest backpatching into the documentation in old
versions etc as well.
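
To make the hazard concrete, here is a minimal sketch of the check in
question (PQsslInUse() is an assumed name for a library-agnostic
replacement, not an existing libpq call):

    #include <libpq-fe.h>

    /* The common existing idiom: with a non-OpenSSL implementation this
     * would report "not encrypted" even though SSL is in use. */
    static int
    using_ssl_old_way(PGconn *conn)
    {
        return PQgetssl(conn) != NULL;
    }

    /* Sketch of a library-agnostic replacement; PQsslInUse() is an
     * assumption of this example, not a committed function. */
    static int
    using_ssl_new_way(PGconn *conn)
    {
        return PQsslInUse(conn);
    }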


> Have you looked at how this change will play out with the ODBC driver..?
> Especially on Windows with the SSL library you're proposing we use
> there..  I recall that at one point the ODBC driver simply used libpq to
> handle the authentication and set everything up, and then switched to
> talking directly without libpq.  In any case, it'd probably be good to
> make sure the attributes you're suggesting are sufficient to meet the
> needs of the ODBC driver too.

+1.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Andres Freund
On 2014-08-19 10:48:41 -0400, Stephen Frost wrote:
> > Exposing the SSL information as generic key/value pairs allows
> > adding more attributes in the future, without breaking the ABI, and
> > it also allows exposing implementation-specific information in a
> > generic way. The attributes listed above cover the needs of psql.
> > What else do we need?
> 
> At first blush, I'd say a whole bunch..  Off the top of my head I can
> think of:
> 
> For all certificates:
> (client, server, cert that signed each, any intermediate CAs, root CAs)
>   Certificate itself (perhaps in DER, PEM, X509 formats..)
>   Fingerprint
>   Signed-By info
>   Common Name
>   Organization (et al)
>   Alternate names
>   Issue date, expiration date
>   CRL info, OCSP info
>   Allowed usage (encryption, signing, etc)
> 
> CRL checking done?
> OCSP used?

I'm not really sure we need all that. We're not building a general ssl
library abstraction here. Presenting all those in a common and useful
format isn't trivial.

What I'm wondering is whether we should differentiate 'standard'
attributes that we require from ones that a library can supply
optionally. If we don't we'll have difficulty enlarging the 'standard'
set over time.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
>   server_cert_valid: Did the server present a valid certificate?
> "yes" or "no"
> 
>   server_cert_matches_host: Does the Common Name of the certificate
> match the host connected to? "yes" or "no"

Aren't these questions addressed by sslmode?

> Exposing the SSL information as generic key/value pairs allows
> adding more attributes in the future, without breaking the ABI, and
> it also allows exposing implementation-specific information in a
> generic way. The attributes listed above cover the needs of psql.
> What else do we need?

At first blush, I'd say a whole bunch..  Off the top of my head I can
think of:

For all certificates:
(client, server, cert that signed each, any intermediate CAs, root CAs)
  Certificate itself (perhaps in DER, PEM, X509 formats..)
  Fingerprint
  Signed-By info
  Common Name
  Organization (et al)
  Alternate names
  Issue date, expiration date
  CRL info, OCSP info
  Allowed usage (encryption, signing, etc)

CRL checking done?
OCSP used?

> I think it would also be nice to get more information from the
> server's certificate, like the hostname and the organization its
> issued to, and expiration date, so that an interactive client like
> pgAdmin or even psql could display that information like a web
> browser does. Would it be best to add those as extra attributes in
> the above list, perhaps with a "server_cert_*" prefix, or add a new
> function for extracting server cert's attributes?

This really shouldn't be for *just* the server's certificate but rather
available for all certificates involved- on both sides.

> The other question is: What do we do with PQgetssl()? We should
> document it as deprecated, but we'll have to keep it around for the
> foreseeable future for backwards-compatibility. We obviously cannot
> return a valid OpenSSL struct when using any other implementation,
> so I think it'll have to just return NULL when not using OpenSSL.
> Probably the most common use of PQgetssl() is to just check if it
> returns NULL or not, to determine if SSL is enabled, so a client
> that does that would incorrectly think that SSL is not used, even
> when it is. I think we can live with that.

That's not ideal, but the only other option I can think of offhand is to
break the existing API and force everyone to update and that seems
worse.

Have you looked at how this change will play out with the ODBC driver..?
Especially on Windows with the SSL library you're proposing we use
there..  I recall that at one point the ODBC driver simply used libpq to
handle the authentication and set everything up, and then switched to
talking directly without libpq.  In any case, it'd probably be good to
make sure the attributes you're suggesting are sufficient to meet the
needs of the ODBC driver too.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] WAL format and API changes (9.5)

2014-08-19 Thread Andres Freund
On 2014-08-19 10:33:29 -0400, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
> 
> > Barring objections or better ideas, I'm leaning towards
> > XLogReadBufferForRedo.
> 
> WFM

for me too. Although we could IMO strip the 'XLog' at the beginning if
we want to make it shorter. The 'ForRedo' already says that, pretty much.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format and API changes (9.5)

2014-08-19 Thread Alvaro Herrera
Heikki Linnakangas wrote:

> Barring objections or better ideas, I'm leaning towards
> XLogReadBufferForRedo.

WFM

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enable WAL archiving even in standby

2014-08-19 Thread MauMau

From: "Fujii Masao" 

I'd propose the attached WIP patch which allows us to enable WAL archiving
even in standby. The patch adds "always" as a valid value of archive_mode.
If it's set to "always", the archiver is started when the server is in
standby mode and all the WAL files that walreceiver wrote to the disk are
archived by using archive_command. Then, even after the server is promoted
to master, the archiver keeps archiving WAL files. The patch doesn't change
the meanings of the setting values "on" and "off" of archive_mode.

I think that this feature is useful for the case, e.g., where a large
database needs to be replicated between remote servers. Imagine the
situation where the replicated database gets corrupted completely in the
remote standby. How should we address this problematic situation and
restart the standby?

One approach is to take a fresh backup from the master and restore it onto
the standby. But since the database is large and there is a long distance
between the two servers, this approach might take a surprisingly long time.

Another approach is to restore the backup which was taken from the standby
before. But most of the many WAL files which the backup needs might exist
only in the master (because WAL archiving cannot be enabled in the standby)
and they would need to be transferred from the master to the standby via a
long-distance network. So I think that this approach also would take a
fairly long time. To shorten that time, you may think that archive_command
in the master can be set so that it transfers WAL files from the master to
the standby's archival storage. I agree that this setting can accelerate
the database restore process. But it causes every WAL file to be
transferred between the remote servers twice (once by streaming
replication, and again by archive_command), which is a waste of network
bandwidth.


Great.  This is exactly what I hoped for regarding disaster recovery,
although I haven't looked at the patch yet.




Back to the patch. If archive_mode is set to "always", archive_command is
always used to archive WAL files, even during recovery. Do we need to
separate the command into two for master and standby, respectively? We
could add something like a standby_archive_command parameter which is used
to archive only the WAL files walreceiver writes. The other WAL files would
be archived by archive_command. I'm not sure if it's really worth
separating the command that way. Is there any use case?


I don't see any reason to separate parameters.  I want the spec simple.



I've not included the documentation updates in the patch yet. If we agree
to support this feature, I will do the remaining work.


Could you consider adding a new section for disaster recovery that describes
concrete parameter settings (e.g. how do we discard old archived WAL files
after taking a base backup from the standby, given that no backup label file
is created?).  Good luck!
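
For concreteness, a minimal sketch of the standby-side settings under the
proposed patch ('always' is the value the patch adds; the archive_command
shown is just the usual cp-style example and the path is illustrative):

    # postgresql.conf on the standby (sketch, assuming the proposed patch)
    archive_mode = always        # proposed new value; 'on' and 'off'
                                 # keep their current meanings
    archive_command = 'test ! -f /path/to/archive/%f && cp %p /path/to/archive/%f'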


Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] New PostgreSQL buildfarm client release 4.14 - bug fix for MSVC

2014-08-19 Thread Andrew Dunstan
There is a new release - version 4.14 - of the buildfarm client, now 
available at 



The only change of note is that a bug which only affects MSVC clients 
(such that the client will not complete a run) and is present in 
releases 4.12 and 4.13 is fixed. Clients on other platforms do not need 
to upgrade.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Andres Freund
On 2014-08-19 21:47:26 +0800, Craig Ringer wrote:
> On 08/19/2014 06:21 PM, Andres Freund wrote:
> > What's the problem with the COMMIT WITH (report_lsn on) I've proposed?
> > Reporting the LSN in the command tag? Anything doing transparent
> > failover needs to be aware of transaction boundaries anyway. 
> 
> Tom's objection to a GUC applies there too - a client app can send that
> when the underlying driver doesn't expect to get the results.

I don't really think this is true. With a GUC it's set for the whole
session or even per user. With such an option to COMMIT it'd only be set
when issued by something that actually does transparent failover (i.e. the
underlying driver).
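
For clarity, the proposed interaction would look something like this
(syntax proposed in this thread, not implemented; the LSN shown is made up):

    BEGIN;
    INSERT INTO t VALUES (1);
    COMMIT WITH (report_lsn on);
    -- the command tag would then carry the commit LSN, e.g.:
    -- COMMIT 0/16B2D80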

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Craig Ringer
On 08/19/2014 06:21 PM, Andres Freund wrote:
> What's the problem with the COMMIT WITH (report_lsn on) I've proposed?
> Reporting the LSN in the command tag? Anything doing transparent
> failover needs to be aware of transaction boundaries anyway. 

Tom's objection to a GUC applies there too - a client app can send that
when the underlying driver doesn't expect to get the results.

I'm not completely convinced that's a problem - oh dear, the app breaks.
The answer to so many other things in Pg is "well, don't do that then"
that I don't see this as overly different.

However, granting that it is a problem, the same objection to a GUC
applies to this too.
-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries

2014-08-19 Thread MauMau

From: "Alvaro Herrera" 

Is this supposed to be session-local data, or is it visible from remote
sessions too?  How durable is it supposed to be?  Keep in mind that in
case of a crash, all pgstats data is erased.


I want it to be visible from other sessions.  I'm okay with the data
erasure during recovery.  We can probably extend pg_statsinfo to save the 
new info for long-term trend analysis.  TBH, I want a feature like 
pg_statsinfo in core.


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-19 Thread MauMau

From: "Fujii Masao" 
What's the main purpose of this tool? If it's for WAL archiving, the tool
name "pg_copy" sounds too generic. We already have pg_archivecleanup, so
maybe "pg_archivecopy" or something is better for consistency?

pg_copy in the patch copies the file to the destination in a
straightforward way, i.e., it directly copies the file to the destination
file under its actual name. This can cause the problem which some people
reported: when the server crashes while a WAL file is being archived by
the cp command, a partially-filled WAL file remains in the archival area.
This half-baked archive file can cause various troubles. To address this,
the WAL file needs to be copied to a temporary file first, then renamed to
the actual name. I think that pg_copy should copy the WAL file in that way.


I intended to make pg_copy a straightforward replacement of cp/copy, which 
complements the missing sync.  The Direct I/O and posix_fadvise() features may be
convenient but not essential for this utility.  cp/copy doesn't copy to a 
temporary file, and the problem can be solved easily by mv/move.  I wanted 
to keep pg_copy as generic as cp/copy, so that it can be used by some 
advanced features in the future, e.g. comprehensive backup/recovery 
management like RMAN (this example may not be best) when it's integrated 
into the core.


With that said, copying to a temporary file like <name>.tmp and renaming it
to <name> sounds worthwhile even as a basic copy utility.  I want to avoid
copying to a temporary file with a fixed name like _copy.tmp, because some
advanced utility may want to run multiple instances of pg_copy to copy
several files into the same directory simultaneously.  However, I'm afraid
multiple <name>.tmp files might continue to occupy disk space after a
canceled copy or a power failure in some use cases, where the copy of the
same file won't be retried.  That's also the reason why I chose to not use
a temporary file, like cp/copy.
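
For reference, a minimal sketch of the copy-to-temporary-then-rename scheme
being discussed (error handling abbreviated; the .tmp suffix and buffer
size are illustrative):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Copy src to dst durably: write to dst.tmp, fsync it, then rename. */
    static int
    copy_durably(const char *src, const char *dst)
    {
        char    tmppath[1024];
        char    buf[65536];
        ssize_t n;
        int     srcfd, tmpfd;

        snprintf(tmppath, sizeof(tmppath), "%s.tmp", dst);

        if ((srcfd = open(src, O_RDONLY)) < 0)
            return -1;
        if ((tmpfd = open(tmppath, O_WRONLY | O_CREAT | O_EXCL, 0600)) < 0)
            return -1;

        while ((n = read(srcfd, buf, sizeof(buf))) > 0)
            if (write(tmpfd, buf, n) != n)
                return -1;

        if (n < 0 || fsync(tmpfd) < 0)   /* make the data durable first */
            return -1;
        close(tmpfd);
        close(srcfd);

        /* then atomically publish the real name; a crash before this
         * point leaves only a .tmp file, never a half-written dst */
        return rename(tmppath, dst);
    }

(A fully careful version would also fsync the containing directory after
the rename.)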


Currently pg_copy always syncs the archive file, and there is no way to
disable that. But I'm sure that not everyone wants to sync the archive
file. So I think that it's better to add an option into pg_copy specifying
whether to sync the file or not.


pg_copy is for copying a file reliably by syncing.  If sync is not 
necessary, people can use cp/copy.



Some users might want to specify whether to call posix_fadvise or not,
because they might need to re-read the archived files just after the
archiving. For example, network copy of the archived files from the
archive area to a remote site for disaster recovery.


This sounds reasonable.  Do you have an idea on the switch name and the 
default behavior?



Do you recommend using pg_copy for restore_command? If yes, it also should
be documented. And in the WAL restore case, the restored WAL files are
re-read soon by recovery, so posix_fadvise is not good in that case.

Direct I/O and posix_fadvise are used only for the destination file. But
why not for the source file? That might be useful especially for the
restore_command case.


No, I don't think it's necessary to use pg_copy for restore_command.



At last, the big question is, is there really no OS command which provides
the same functionality as pg_copy does? If there is, I'd like to avoid
duplicate work, basically.


If there exists such a command available in the standard OS installation, I 
want to use it.


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-08-19 Thread Fujii Masao
On Tue, Aug 19, 2014 at 9:52 AM,   wrote:
>> Thank you for updating the patch.
>>
>> I did not get error with applying, and compiling.
>> It works fine. I think this function code has no problem.
>> Could you please submit patch to commit fest app?
>
> Thanks for the review!
>
> As you pointed out, submitted patch to commit fest app.

When a replication slot is not specified in pg_receivexlog, the flush
location in the feedback message always indicates invalid. So there seems
to be no need to send the feedback as soon as fsync is issued, in that
case. How should this option work when a replication slot is not specified?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Fujii Masao
On Tue, Aug 19, 2014 at 8:22 PM, Andres Freund  wrote:
> On 2014-08-19 19:59:51 +0900, Fujii Masao wrote:
>> On Tue, Aug 19, 2014 at 7:21 PM, Andres Freund  
>> wrote:
>> > On 2014-08-19 08:21:10 +0800, Craig Ringer wrote:
>> >> On 08/19/2014 01:03 AM, Robert Haas wrote:
>> >> > 2. I agree that it's not good to have this get controlled by a GUC.
>> >> > If the behavior change is big enough that it's going to break clients,
>> >> > adding a GUC isn't a sufficient remedy.  If it's not, adding a GUC is
>> >> > unnecessary.
>> >>
>> >> There's plenty of agreement on "not a GUC" - but what about alternatives?
>> >
>> > What's the problem with the COMMIT WITH (report_lsn on) I've proposed?
>> > Reporting the LSN in the command tag? Anything doing transparent
>> > failover needs to be aware of transaction boundaries anyway.
>>
>> So something like transparent failover doesn't work when a client is
>> working in auto commit mode? That sounds not good.
>
> I don't think transparent failover + autocommit is a sensible
> combination.
>
>> Just idea. What about using NoticeResponse message to report LSN?
>> It can be sent basically anytime and this idea doesn't break current
>> wire protocol.
>
> I think that'd be horrible from multiple perspectives: a) how to discern
> them from regular notice messages

You can implement your own protocol upon existing messages like
replication is done.

> b) It's not sent in the same protocol
> level message as the COMMIT message. Thus there's scenarios where you
> only have the commit, but not the LSN.

Hmm.. you can change the code so that the message with LSN is sent
as soon as COMMIT message is sent, if required.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Postgres-xc-developers] Trove with PostgreSQL-XC

2014-08-19 Thread Vivek Singh Raghuwanshi
Thanks,
One more question: does this library support multitenancy, or do we need to
launch a separate VPC (virtual private cloud) every time for each customer?
It would be good if we had both options.



On Tue, Aug 19, 2014 at 5:04 PM, 鈴木 幸市  wrote:

>  XC’s libpq is binary compatible with PG.   So as long as Openstack uses
> libpq to connect to PG, XC should work too.
>
>  Appreciate if you have a chance to try.
>
>  Thanks.
> ---
> Koichi Suzuki
>
>  2014/08/19 20:14, mail from Vivek Singh Raghuwanshi:
>
>
>  Hi All,
>
>  Please let me know is that possible to use Openstack Trove with
> Postgres-XC.
>  With instances and Baremetal (after Juno Release).
>  I Know it is possible to use other medium like MySQL or PostgreSQL, but
> i am not sure about XC.
>
>
>  --
> ViVek Raghuwanshi
> Mobile -+91-09595950504
> Skype - vivek_raghuwanshi
> IRC vivekraghuwanshi
> http://vivekraghuwanshi.wordpress.com/
> http://in.linkedin.com/in/vivekraghuwanshi
>
> --
> ___
> Postgres-xc-developers mailing list
> postgres-xc-develop...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>
>
>


-- 
ViVek Raghuwanshi
Mobile -+91-09595950504
Skype - vivek_raghuwanshi
IRC vivekraghuwanshi
http://vivekraghuwanshi.wordpress.com/
http://in.linkedin.com/in/vivekraghuwanshi


[HACKERS] dynahash vs. memory-bounded HashAggregate (hashjoin-style)

2014-08-19 Thread Tomas Vondra
Hi all,

while working on a prototype of memory-bounded hash aggregate (alternative
to Jeff's patch discussed in [1]), I ran into difficulties when dealing
with dynahash. So I'm asking for help ...

Some of the difficulties stem from my limited knowledge of dynahash, and
complexity in execGrouping.c, but it seems to me some are actually due to
a mismatch with the proposed hashjoin-like approach to batching.

The hashjoin-like batching essentially does this:

1) put data (in this case aggregate states) into a hash table, until a
memory limit is reached
2) double the number of batches and move half the entries from the hash
table (based on batch number, computed from the hash value)
3) repeat

This however requires two things, that I think seem quite difficult to do
with dynahash:

(a) detecting that a memory limit was reached

   This is usually done with a condition like this:

   if (consumed_memory > work_mem) {
  ... do the batching / remove ~50% of data from hash table
  ... it's expected consumed_memory drops by 50%
   }

   Where consumed memory is the size of a memory context (cheap to
compute, thanks to another patch from Jeff [2]).

   This however does not work with dynahash, because dynahash does not
release memory for removed tuples - it just moves it to a freelist, so
consumed_memory only grows.

   For a while I thought I could do this:

   if (consumed_memory > consumed_memory_prev) {
  ...
  consumed_memory_prev = consumed_memory
   }

   But then I found out dynahash does not grow continuously, but in (quite
large) steps. Exceeding the limit a bit is not a big deal, but the
growth is quite fast and quickly leads to allocating much more than the
limit.


(b) removing tuples while batching

Whenever the number of batches is increased (doubled), I need to walk
through the hash table and remove entries not belonging to the current
batch (should be ~50% of them). The only way to do this with dynahash
seems to be iterating over the entries, and then doing another search
with HASH_REMOVE. Is there a better way?
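
For reference, the iterate-then-remove idiom looks roughly like this
(a sketch; MyEntry and batchno_of() are placeholders for whatever the hash
agg code would use, and the key is assumed to be the entry's first field,
as dynahash requires):

    HASH_SEQ_STATUS scan;
    MyEntry    *entry;

    hash_seq_init(&scan, hashtable);
    while ((entry = (MyEntry *) hash_seq_search(&scan)) != NULL)
    {
        if (batchno_of(entry) != curbatch)
        {
            /* dynahash does allow removing the entry just returned
             * by hash_seq_search() while the scan is in progress */
            if (hash_search(hashtable, &entry->key,
                            HASH_REMOVE, NULL) == NULL)
                elog(ERROR, "hash table corrupted");
        }
    }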


I've been considering switching this to a custom hash table (similar to
what's used in hashjoin
https://commitfest.postgresql.org/action/patch_view?id=1494), which seems
like a better solution for this use-case, but I'm not sure about this. I'm
not a big fan of replacing large amounts of code for no good reason.

Opinions?


I'd be grateful if someone more knowledgeable of dynahash / the way it's
used in execGrouping could review the prototype I have so far. There's
only a handful of functions related to dynahash, and most of the issues I
have are about passing the values properly (slots, dummy values, tuples).


regards
Tomas


[1]
http://www.postgresql.org/message-id/1407706010.6623.16.camel@jeff-desktop

[2]
http://www.postgresql.org/message-id/1407012053.15301.53.camel@jeff-desktop



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enable WAL archiving even in standby

2014-08-19 Thread Fujii Masao
On Fri, Aug 15, 2014 at 4:30 AM, Robert Haas  wrote:
> On Wed, Aug 13, 2014 at 6:42 AM, Fujii Masao  wrote:
>> I'd propose the attached WIP patch which allows us to enable WAL archiving
>> even in standby. The patch adds "always" as the valid value of archive_mode.
>> If it's set to "always", the archiver is started when the server is in 
>> standby
>> mode and all the WAL files that walreceiver wrote to the disk are archived by
>> using archive_command. Then, even after the server is promoted to master,
>> the archiver keeps archiving WAL files. The patch doesn't change the meanings
>> of the setting values "on" and "off" of archive_mode.
>
> I like the feature, but I don't much like this as a control mechanism.
> Having archive_command and standby_archive_command, as you propose
> further down, seems saner.

Okay, that's fine. One question is; Which WAL files should be archived by
standby_archive_command? There are following kinds of WAL files.

(1) WAL files which were fully written and closed by walreceiver
     Currently they are not archived at all.

(2) WAL file which is being written by walreceiver
     This file will be closed before it's fully written because of,
     for example, standby promotion.
     Currently this is archived by archive_command.

(3) WAL file with the new timeline, which is copied from (2)
     At the end of recovery, after the new timeline is assigned,
     this latest WAL file with the new timeline is created by copying
     from (2) (i.e., the latest WAL file with the old timeline). The WAL
     data of the end-of-recovery checkpoint is written to this latest
     WAL file.
     Currently this is archived by archive_command.

(4) Timeline history files
     When the standby is promoted to the master, the timeline is incremented
     and a timeline history file is created.
     Currently the timeline history files are archived by archive_command.

(5) WAL files generated in normal processing mode
     Currently they are archived by archive_command.

I'm thinking to use standby_archive_command only for (1) because
the others are currently archived by archive_command. That means
that even if there are type (1) WAL files which have not been archived
yet after the standby promotion (i.e., the situation where WAL archiving
was delayed for some reasons in the standby), they are archived by
standby_archive_command. IOW, the archiver uses both archive commands
as the situation demands.

OTOH, maybe there are people who want to use standby_archive_command
for all the WAL files with old timeline, i.e., (1) and (2). Thought?
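
For illustration, the resulting configuration might look like this
(standby_archive_command is the name under discussion, not a committed
parameter, and the commands/paths are the usual cp-style examples):

    # postgresql.conf (sketch only)
    archive_mode = on
    archive_command = 'cp %p /path/to/archive/%f'            # for (2)-(5)
    standby_archive_command = 'cp %p /path/to/archive/%f'    # for (1), WAL
                                                             # files closed
                                                             # by walreceiver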

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Postgres-xc-developers] Trove with PostgreSQL-XC

2014-08-19 Thread 鈴木 幸市
XC’s libpq is binary compatible with PG.   So as long as Openstack uses libpq 
to connect to PG, XC should work too.

Appreciate if you have a chance to try.

Thanks.
---
Koichi Suzuki

2014/08/19 20:14, mail from Vivek Singh Raghuwanshi:


Hi All,

Please let me know is that possible to use Openstack Trove with Postgres-XC.
With instances and Baremetal (after Juno Release).
I Know it is possible to use other medium like MySQL or PostgreSQL, but i am 
not sure about XC.


--
ViVek Raghuwanshi
Mobile -+91-09595950504
Skype - vivek_raghuwanshi
IRC vivekraghuwanshi
http://vivekraghuwanshi.wordpress.com/
http://in.linkedin.com/in/vivekraghuwanshi
--
___
Postgres-xc-developers mailing list
postgres-xc-develop...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers



Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Andres Freund
On 2014-08-19 19:59:51 +0900, Fujii Masao wrote:
> On Tue, Aug 19, 2014 at 7:21 PM, Andres Freund  wrote:
> > On 2014-08-19 08:21:10 +0800, Craig Ringer wrote:
> >> On 08/19/2014 01:03 AM, Robert Haas wrote:
> >> > 2. I agree that it's not good to have this get controlled by a GUC.
> >> > If the behavior change is big enough that it's going to break clients,
> >> > adding a GUC isn't a sufficient remedy.  If it's not, adding a GUC is
> >> > unnecessary.
> >>
> >> There's plenty of agreement on "not a GUC" - but what about alternatives?
> >
> > What's the problem with the COMMIT WITH (report_lsn on) I've proposed?
> > Reporting the LSN in the command tag? Anything doing transparent
> > failover needs to be aware of transaction boundaries anyway.
> 
> So something like transparent failover doesn't work when a client is
> working in auto commit mode? That sounds not good.

I don't think transparent failover + autocommit is a sensible
combination.

> Just idea. What about using NoticeResponse message to report LSN?
> It can be sent basically anytime and this idea doesn't break current
> wire protocol.

I think that'd be horrible from multiple perspectives: a) how to discern
them from regular notice messages b) It's not sent in the same protocol
level message as the COMMIT message. Thus there's scenarios where you
only have the commit, but not the LSN.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Trove with PostgreSQL-XC

2014-08-19 Thread Vivek Singh Raghuwanshi
Hi All,

Please let me know: is it possible to use OpenStack Trove with Postgres-XC,
with instances and bare metal (after the Juno release)?
I know it is possible to use other databases like MySQL or PostgreSQL, but
I am not sure about XC.


-- 
ViVek Raghuwanshi
Mobile -+91-09595950504
Skype - vivek_raghuwanshi
IRC vivekraghuwanshi
http://vivekraghuwanshi.wordpress.com/
http://in.linkedin.com/in/vivekraghuwanshi


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-08-19 Thread Amit Kapila
On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas  wrote:
>
> On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila 
wrote:
> > 1.
> > +Number of parallel connections to perform the operation. This
> > option will enable the vacuum
> > +operation to run on parallel connections, at a time one table
will
> > be operated on one connection.
> >
> > a. How about describing w.r.t asynchronous connections
> > instead of parallel connections?
>
> I don't think "asynchronous" is a good choice of word.

Agreed.

>Maybe "simultaneous"?

Not sure. How about *concurrent* or *multiple*?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Fujii Masao
On Tue, Aug 19, 2014 at 7:21 PM, Andres Freund  wrote:
> On 2014-08-19 08:21:10 +0800, Craig Ringer wrote:
>> On 08/19/2014 01:03 AM, Robert Haas wrote:
>> > 2. I agree that it's not good to have this get controlled by a GUC.
>> > If the behavior change is big enough that it's going to break clients,
>> > adding a GUC isn't a sufficient remedy.  If it's not, adding a GUC is
>> > unnecessary.
>>
>> There's plenty of agreement on "not a GUC" - but what about alternatives?
>
> What's the problem with the COMMIT WITH (report_lsn on) I've proposed?
> Reporting the LSN in the command tag? Anything doing transparent
> failover needs to be aware of transaction boundaries anyway.

So something like transparent failover doesn't work when a client is
working in auto commit mode? That sounds not good.

Just idea. What about using NoticeResponse message to report LSN?
It can be sent basically anytime and this idea doesn't break current
wire protocol.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-08-19 Thread Amit Kapila
On Wed, Aug 13, 2014 at 4:01 PM, Dilip kumar  wrote:
> On 11 August 2014 10:29, Amit kapila wrote,
> >5.
>
> >res = executeQuery(conn,
>
> >"select relname, nspname from pg_class c, pg_namespace ns"
>
> >" where (relkind = \'r\' or relkind = \'m\')"
>
> >" and c.relnamespace = ns.oid order by relpages desc",
>
> >progname, echo);
>
> >
>
> >a. Here you need to use SQL keywords in capital letters, refer one
> >of the other caller of executeQuery() in vacuumdb.c
>
> >b. Why do you need this condition c.relnamespace = ns.oid in above
>
> >query?
>
>
>
> IT IS POSSIBLE THAT, TWO NAMESPACE HAVE THE SAME TABLE NAME, SO WHEN WE
ARE SENDING COMMAND FROM CLIENT WE NEED TO GIVE NAMESPACE WISE BECAUSE WE
NEED TO VACUUM ALL THE
>
> TABLES.. (OTHERWISE TWO TABLE WITH SAME NAME FROM DIFFERENT NAMESPACE
WILL BE TREATED SAME.)

That's right; however, writing the query the way below might
make it more understandable:
+ "SELECT relname, nspname FROM pg_class c, pg_namespace ns"

"SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns"



> >7.
>
>
> Here we are getting message string, I think if we need to find the error
code then we need to parse the string, and after that we need to compare
with error codes.
>
> Is there any other way to do this ?

You can compare against the SQLSTATE by using the API below.
val = PQresultErrorField(res, PG_DIAG_SQLSTATE);

You need to handle the *42P01* SQLSTATE; also, please refer to the usage
below, where we check the SQLSTATE.

fe-connect.c
PQresultErrorField(conn->result, PG_DIAG_SQLSTATE),
   ERRCODE_INVALID_PASSWORD) == 0)
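
Put together, the suggested check could look roughly like this sketch
(the helper name and the surrounding retry logic are illustrative):

    #include <stdbool.h>
    #include <string.h>
    #include "libpq-fe.h"

    #define ERRCODE_UNDEFINED_TABLE "42P01"

    /* Returns true if the failed command reported "relation does not
     * exist", e.g. because the table was dropped while vacuumdb ran. */
    static bool
    is_missing_table_error(const PGresult *res)
    {
        const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

        return sqlstate != NULL &&
               strcmp(sqlstate, ERRCODE_UNDEFINED_TABLE) == 0;
    }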

Few more comments:

1.
* If user has not given the vacuum of complete db, then if

I think here you have said the reverse of what the code is doing.
You don't need *not* in the above sentence.

2.
+ appendPQExpBuffer(&sql, "\"%s\".\"%s\"", nspace, relName);
I think here you need to use function fmtQualifiedId() or fmtId()
or something similar to handle quotes appropriately.

3.

+  */
+ if (!r && !completedb)
Here the usage of the completedb variable is reversed, which means
that it goes into the error path when actually the whole database is
getting vacuumed; the reason is that you are setting it
to false in the code below:
+ /* Vaccuming full database*/
+ vacuum_tables = false;

4.
Functions prepare_command() and vacuum_one_database() contain
duplicate code; is there any problem with using the prepare_command()
function in vacuum_one_database()? Another point in this context
is that I think it is better to name the function prepare_command()
as append_vacuum_options() or something along those lines; also, it would
be better if you could write a function header for this function as well.

5.
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(&connSlot[i]);
+ }

Here, why do we need a DisconnectDatabase()-type function?
Why can't we simply call PQfinish() as in the base code?

6.
+ /*
+  * if table list is not provided then we need to do vaccum for whole DB
+  * get the list of all tables and prpare the list
+  */
The spelling of "prepare" is wrong. I have noticed spelling mistakes
in comments at some other places as well; please check all
comments once.

7. I think the cancel handler will not work in the new mechanism.
In single-connection vacuum it was always set/reset
in the function executeMaintenanceCommand(). You might need
to set/reset it in the function run_parallel_vacuum().


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-08-19 Thread Tomas Vondra
On 19 Srpen 2014, 10:26, Jeff Davis wrote:
> On Sat, 2014-08-16 at 23:09 +0200, Tomas Vondra wrote:
>> But maybe the inheritance really is not necessary - maybe it would be
>> enough to track this per-context, and then just walk through the
>> contexts and collect this. Because my observation is that most of the
>> time is actually spent in walking through blocks and freelists.
>
> That makes a lot of sense to me.
>
> Another approach is to pass a flag to hash_create that tells it not to
> create a subcontext. Then we don't need to traverse anything; we already
> know which context we want to look at. Perhaps I was being too clever
> with the idea of tracking space for an entire hierarchy.
>
> Also, as I pointed out in my reply to Robert, adding too many fields to
> MemoryContextData may be the cause of the regression. Your idea requires
> only one field, which doesn't show the same regression in my tests.

Yeah, keeping the structure size below 64B seems like a good idea.

The use-case for this is tracking a chosen subtree of contexts - e.g.
aggcontext and below, so I'd expect the tracked subtrees to be relatively
shallow. Am I right?

My fear is that by removing the inheritance bit, we'll hurt cases with a
lot of child contexts. For example, array_agg currently creates a separate
context for each group - what happens if you have 100k groups and do
MemoryContextGetAllocated? I guess iterating over 100k groups is not free.

Wouldn't the solution with inheritance and propagating the accounting info
to the parent actually better? Or maybe allowing both, having two flags
when creating a context instead of one?

  AllocSetCreateTracked( , bool track, bool propagate_immediately)

By squashing both flags into a single mask you wouldn't increase the size.
Also, do we really need to track allocated bytes - couldn't we track
kilobytes or something and use smaller data types to get below the 64B?
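
To make that concrete, a sketch of the two-flag variant (all names below
are illustrative, not from any patch):

    /* illustrative flag bits, packed into one mask argument */
    #define ALLOCSET_TRACK              0x01    /* account in this context */
    #define ALLOCSET_TRACK_PROPAGATE    0x02    /* roll up into parents */

    context = AllocSetContextCreateTracked(parent, "aggcontext",
                                           ALLOCSET_DEFAULT_MINSIZE,
                                           ALLOCSET_DEFAULT_INITSIZE,
                                           ALLOCSET_DEFAULT_MAXSIZE,
                                           ALLOCSET_TRACK |
                                           ALLOCSET_TRACK_PROPAGATE);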

regards
Tomas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-19 Thread Tomas Vondra
On 19 Srpen 2014, 9:52, Jeff Davis wrote:
> On Fri, 2014-08-15 at 13:53 -0400, Robert Haas wrote:
>> I think that's right, and I rather like your (Jeff's) approach.  It's
>> definitely true that we could do better if we have a mechanism for
>> serializing and deserializing group states, but (1) I think an awful
>> lot of cases would get an awful lot better even just with the approach
>> proposed here and (2) I doubt we would make the
>> serialization/deserialization interfaces mandatory, so even if we had
>> that we'd probably want a fallback strategy anyway.
>
> Thank you for taking a look.
>
> To solve the problem for array_agg, that would open up two potentially
> lengthy discussions:
>
> 1. Trying to support non-serialized representations (like
> ArrayBuildState for array_agg) as a real type rather than using
> "internal".

That's certainly an option, and it's quite straightforward. The downside
of it is that you either prevent the aggregates from using the most
efficient state form (e.g. the array_agg might use a simple array as a
state) or you cause a proliferation of types with no other purpose.


> 2. What changes should we make to the aggregate API? As long as we're
> changing/extending it, should we go the whole way and support partial
> aggregation[1] (particularly useful for parallelism)?

Maybe, but not in this patch please. That's far wider scope, and while
considering it when designing API changes is probably a good idea, we
should resist the attempt to do those two things in the same patch.

> Both of those discussions are worth having, and perhaps they can happen
> in parallel as I wrap up this patch.

Exactly.

> I'll see whether I can get consensus that my approach is (potentially)
> commit-worthy, and your statement that it (potentially) solves a real
> problem is a big help.

IMHO it's a step in the right direction. It may not go as far as I'd like,
but that's OK.

regards
Tomas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] After switching primary server while using replication slot.

2014-08-19 Thread Fujii Masao
On Mon, Aug 18, 2014 at 11:16 PM, Sawada Masahiko  wrote:
> Hi all,
>
> After switching the primary server while using replication slots, a
> standby server will not be able to connect to the new primary server.
> Imagine this situation: a primary server has two ASYNC standby
> servers, each using its own replication slot.
> And one standby(A) applies WAL without problems, but the other
> standby(B) has stopped after connecting to the primary server
> (or its WAL sending is too delayed).
>
> In this situation, standby(B) has not received WAL segment files
> while it was stopped.
> And the primary server cannot remove WAL segments which have not been
> received by all standbys.
> Therefore the primary server has to keep the WAL segment files which
> have not been received by all standbys.
> But standby(A) can do checkpoints itself, and then it's possible to
> recycle WAL segments.
> The number of WAL segments on each server is different.
> (The number of WAL files on standby(A) can be smaller than on the
> primary server.)
> After the primary server crashes and standby(A) is promoted to primary,
> we can try to connect standby(B) to standby(A) as a new standby server.
> But it will fail because standby(A) might not have the WAL
> segment files that standby(B) requires.

This sounds valid concern.

> To resolve this situation, I think that we should make master server
> to notify about removal of WAL segment to all standby servers.
> And the standby servers recycle WAL segments files base on that information.
>
> Thought?

How does the server recycle WAL files after it's promoted from the
standby to master?
It does that as it likes? If yes, your approach would not be enough.

The approach prevents unexpected removal of WAL files while the standby
is running. But after the standby is promoted to master, it might recycle
needed WAL files immediately. So another standby may still fail to retrieve
the required WAL file after the promotion.

ISTM that, in order to address this, we might need to log all the replication
slot activities and replicate them to the standby. I'm not sure if this
breaks the design of replication slot at all, though.
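
For reference, the slot-based setup described above would be created along
these lines (slot names are examples only):

    -- on the primary:
    SELECT pg_create_physical_replication_slot('slot_a');
    SELECT pg_create_physical_replication_slot('slot_b');

    -- in each standby's recovery.conf:
    -- primary_slot_name = 'slot_a'    (respectively 'slot_b')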

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Andres Freund
On 2014-08-19 08:21:10 +0800, Craig Ringer wrote:
> On 08/19/2014 01:03 AM, Robert Haas wrote:
> > 2. I agree that it's not good to have this get controlled by a GUC.
> > If the behavior change is big enough that it's going to break clients,
> > adding a GUC isn't a sufficient remedy.  If it's not, adding a GUC is
> > unnecessary.
> 
> There's plenty of agreement on "not a GUC" - but what about alternatives?

What's the problem with the COMMIT WITH (report_lsn on) I've proposed?
Reporting the LSN in the command tag? Anything doing transparent
failover needs to be aware of transaction boundaries anyway. 

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Greg Stark
On Tue, Aug 19, 2014 at 1:21 AM, Craig Ringer  wrote:
> There's plenty of agreement on "not a GUC" - but what about alternatives?

It could be a new protocol message. Currently there are no transaction
oriented protocol messages (other than the "transaction status" in
ReadyForQuery). But would it not make sense to have TransactionBegin,
TransactionCommit, and TransactionAbort in the protocol? Would that
make it easier for the client-side failover to keep track of what
transactions are pending or committed and need to be verified after a
failover?


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: Partial sort

2014-08-19 Thread David Rowley
On Tue, Feb 11, 2014 at 7:59 AM, Alexander Korotkov 
wrote:
>
> Done. Patch is splitted.
>

I've started to look at this, and for now I'm still finding my way around
the patch, so I'm not quite there yet with understanding everything.
Nevertheless, it seems best to post my comments early, so as to help
maintain concurrency between the review and getting the patch into shape.

I've only been looking at partial-sort-basic-1.patch so far.

The patch no longer applies to master, but this was only due to a tab being
replaced by 2 spaces in a pgident run. I've attached an updated patch which
currently applies without any issues.

Here's a few notes from reading over the code:

* pathkeys.c

  EquivalenceMember *member = (EquivalenceMember *)
lfirst(list_head(key->pk_eclass->ec_members));

You can use linitial() instead of lfirst(list_head()). The same thing
occurs in costsize.c

* pathkeys.c

The following fragment:

n = pathkeys_common(root->query_pathkeys, pathkeys);

if (n != 0)
{
/* It's useful ... or at least the first N keys are */
return n;
}

return 0; /* path ordering not useful */
}

Could just read:

/* return the number of path keys in common, or 0 if there are none */
return pathkeys_common(root->query_pathkeys, pathkeys);

* execnodes.h

In struct SortState, some new fields don't have a comment.


I've also thrown a few different workloads at the patch and I'm very
impressed with most of the results, especially when LIMIT is used. However,
I've found a regression case which I thought I should highlight, though for
now I can't quite see what could be done to fix it.

create table a (x int not null, y int not null);
insert into a select x.x,y.y from generate_series(1,100) x(x) cross
join generate_series(1,10) y(y);

Patched:
explain analyze select x,y from a where x+0=1 order by x,y limit 10;
 QUERY PLAN

 Limit  (cost=92.42..163.21 rows=10 width=8) (actual
time=6239.426..6239.429 rows=10 loops=1)
   ->  Partial sort  (cost=92.42..354064.37 rows=5 width=8) (actual
time=6239.406..6239.407 rows=10 loops=1)
 Sort Key: x, y
 Presorted Key: x
 Sort Method: quicksort  Memory: 25kB
 ->  Index Scan using a_x_idx on a  (cost=0.44..353939.13
rows=5 width=8) (actual time=0.059..6239.319 rows=10 loops=1)
   Filter: ((x + 0) = 1)
   Rows Removed by Filter: 990
 Planning time: 0.212 ms
 Execution time: 6239.505 ms
(10 rows)


Time: 6241.220 ms

Unpatched:
explain analyze select x,y from a where x+0=1 order by x,y limit 10;
 QUERY PLAN

 Limit  (cost=195328.26..195328.28 rows=10 width=8) (actual
time=3077.759..3077.761 rows=10 loops=1)
   ->  Sort  (cost=195328.26..195453.26 rows=5 width=8) (actual
time=3077.757..3077.757 rows=10 loops=1)
 Sort Key: x, y
 Sort Method: quicksort  Memory: 25kB
 ->  Seq Scan on a  (cost=0.00..194247.77 rows=5 width=8)
(actual time=0.018..3077.705 rows=10 loops=1)
   Filter: ((x + 0) = 1)
   Rows Removed by Filter: 990
 Planning time: 0.510 ms
 Execution time: 3077.837 ms
(9 rows)


Time: 3080.201 ms

As you can see, the patched version performs an index scan in order to get
the partially sorted results, but it does end up quite a bit slower than
the seqscan/sort that the unpatched master performs. I'm not quite sure how
realistic the x+0 = 1 WHERE clause is, but perhaps the same would happen if
something like x+y = 1 was used too. After a bit more analysis on this, I
see that if I change the 50k estimate to 10 in the debugger, num_groups is
properly estimated at 1 and it then performs the seq scan instead. So it
looks like the costings of the patch are not to blame here. (The 50k row
estimate comes from rel tuples / DEFAULT_NUM_DISTINCT.)

That's all I have at the moment... More to follow soon.

Regards

David Rowley


partial-sort-basic-1_rebased.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-08-19 Thread Rahila Syed
Hello,
Thank you for the comments.

>Could you tell me where the patch for "single block in one run" is?
Please find attached the patch for single-block compression in one run.

Thank you,



On Tue, Aug 19, 2014 at 1:17 PM, Fujii Masao  wrote:

> On Tue, Aug 19, 2014 at 2:08 AM, Andres Freund 
> wrote:
> > On 2014-08-18 13:06:15 -0400, Robert Haas wrote:
> >> On Mon, Aug 18, 2014 at 7:19 AM, Rahila Syed 
> wrote:
> >> >>According to the measurement result, the amount of WAL generated in
> >> >>"Multiple Blocks in one run" than that in "Single Block in one run".
> >> >>So ISTM that compression of multiple blocks at one run can improve
> >> >>the compression ratio. Am I missing something?
> >> >
> >> > Sorry for using unclear terminology. WAL generated here means WAL
> that gets
> >> > generated in each run without compression.
> >> > So, the value WAL generated in the  above measurement is uncompressed
> WAL
> >> > generated to be specific.
> >> > uncompressed WAL = compressed WAL  + Bytes saved.
> >> >
> >> > Here, the measurements are done for a constant amount of time rather
> than
> >> > fixed number of transactions. Hence amount of WAL generated does not
> >> > correspond to compression ratios of each algo. Hence have calculated
> bytes
> >> > saved in order to get accurate idea of the amount of compression in
> each
> >> > scenario and for various algorithms.
> >> >
> >> > Compression ratio i.e Uncompressed WAL/compressed WAL in each of the
> above
> >> > scenarios are as follows:
> >> >
> >> > Compression algo   Multiple Blocks in one run   Single Block in one run
> >> >
> >> > LZ4                1.21                         1.27
> >> >
> >> > Snappy             1.19                         1.25
> >> >
> >> > pglz               1.14                         1.16
> >> >
> >> > This shows that the compression ratios of the two scenarios, multiple
> >> > blocks and single block, are nearly the same for this benchmark.
> >>
> >> I don't agree with that conclusion.  The difference between 1.21 and
> >> 1.27, or between 1.19 and 1.25, is quite significant.  Even the
> >> difference beyond 1.14 and 1.16 is not trivial.  We should try to get
> >> the larger benefit, if it is possible to do so without an unreasonable
> >> effort.
> >
> > Agreed.
> >
> > One more question: Do I see it right that multiple blocks compressed
> > together compress *worse* than compressing individual blocks? If so, I
> > have a rather hard time believing that the patch is sane.
>
> Or the way of benchmark might have some problems.
>
> Rahila,
> I'd like to measure the compression ratio in both multiple blocks and
> single block cases.
> Could you tell me where the patch for "single block in one run" is?
>
> Regards,
>
> --
> Fujii Masao
>


CompressSingleBlock.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog and replication slots

2014-08-19 Thread Fujii Masao
On Tue, Aug 19, 2014 at 6:03 PM, Andres Freund  wrote:
> On 2014-08-19 18:02:32 +0900, Fujii Masao wrote:
>> On Tue, Aug 19, 2014 at 5:52 PM, Andres Freund  
>> wrote:
>> > On 2014-08-18 14:38:06 +0900, Michael Paquier wrote:
>> >> - IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
>> >> check was done but in 9.4 this command returns 4 fields:
>> >> (PQntuples(res) != 1 || PQnfields(res) < 3)
>> >
>> > Which is correct. We don't want to error out in the case where 3 columns
>> > are returned because that'd unnecessarily break compatibility with <
>> > 9.4. Previously that check was != 3...
>> >
>> > This isn't a bug.
>>
>> Okay, I understood why you didn't update those codes.
>>
>> Since we don't allow replication between different major versions,
>> it's better to apply this change at least into libpqwalreceiver.c. Thought?
>
> We'd discussed that we'd rather keep it consistent. It also results in a
> more explanatory error message lateron.

Hmm... okay, will revert the commit.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog and replication slots

2014-08-19 Thread Andres Freund
On 2014-08-19 18:02:32 +0900, Fujii Masao wrote:
> On Tue, Aug 19, 2014 at 5:52 PM, Andres Freund  wrote:
> > On 2014-08-18 14:38:06 +0900, Michael Paquier wrote:
> >> - IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
> >> check was done but in 9.4 this command returns 4 fields:
> >> (PQntuples(res) != 1 || PQnfields(res) < 3)
> >
> > Which is correct. We don't want to error out in the case where 3 columns
> > are returned because that'd unnecessarily break compatibility with <
> > 9.4. Previously that check was != 3...
> >
> > This isn't a bug.
> 
> Okay, I understood why you didn't update those codes.
> 
> Since we don't allow replication between different major versions,
> it's better to apply this change at least into libpqwalreceiver.c. Thought?

We'd discussed that we'd rather keep it consistent. It also results in a
more explanatory error message lateron.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog and replication slots

2014-08-19 Thread Fujii Masao
On Tue, Aug 19, 2014 at 5:52 PM, Andres Freund  wrote:
> On 2014-08-18 14:38:06 +0900, Michael Paquier wrote:
>> - IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
>> check was done but in 9.4 this command returns 4 fields:
>> (PQntuples(res) != 1 || PQnfields(res) < 3)
>
> Which is correct. We don't want to error out in the case where 3 columns
> are returned because that'd unnecessarily break compatibility with <
> 9.4. Previously that check was != 3...
>
> This isn't a bug.

Okay, I understood why you didn't update those codes.

Since we don't allow replication between different major versions,
it's better to apply this change at least to libpqwalreceiver.c. Thoughts?

Regards,

-- 
Fujii Masao




Re: [HACKERS] pg_receivexlog and replication slots

2014-08-19 Thread Andres Freund
On 2014-08-18 14:38:06 +0900, Michael Paquier wrote:
> - IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
> check was done but in 9.4 this command returns 4 fields:
> (PQntuples(res) != 1 || PQnfields(res) < 3)

Which is correct. We don't want to error out in the case where 3 columns
are returned because that'd unnecessarily break compatibility with <
9.4. Previously that check was != 3...

This isn't a bug.
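
To illustrate, a minimal standalone sketch of the tolerant check (my own
sketch, not the actual pg_basebackup code; it assumes conn is a replication
connection and the error handling is simplified):

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

/*
 * Run IDENTIFY_SYSTEM and require at least the three columns we read.
 * Pre-9.4 servers return 3 columns, 9.4 returns 4, so checking for
 * exactly 3 (or exactly 4) would break one side or the other.
 */
static void
check_identify_system(PGconn *conn)
{
    PGresult *res = PQexec(conn, "IDENTIFY_SYSTEM");

    if (PQresultStatus(res) != PGRES_TUPLES_OK ||
        PQntuples(res) != 1 || PQnfields(res) < 3)
    {
        fprintf(stderr, "unexpected IDENTIFY_SYSTEM result: %d rows, %d fields\n",
                PQntuples(res), PQnfields(res));
        PQclear(res);
        exit(1);
    }

    /* Columns 0..2 (sysid, timeline, xlogpos) exist in every version. */
    printf("sysid = %s, timeline = %s, xlogpos = %s\n",
           PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1),
           PQgetvalue(res, 0, 2));

    /* Column 3 (dbname) exists only on 9.4+, so probe before reading it. */
    if (PQnfields(res) > 3 && !PQgetisnull(res, 0, 3))
        printf("dbname = %s\n", PQgetvalue(res, 0, 3));

    PQclear(res);
}

The point is that requiring at least, rather than exactly, the columns you
actually read keeps one binary working against both server generations.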

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL format and API changes (9.5)

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 11:07 AM, Michael Paquier wrote:

Patch 1 looks good. The main difference with the v1 submitted a couple of
days back is that the global variable approach is replaced with additional
arguments for the LSN position and record pointer in XLogReplayBuffer. I
have as well run a couple of tests with the page comparison tool, done some
tests based on installcheck-world with a slave replaying WAL behind, and
found no issues with it.


Thanks!


Perhaps we could consider pushing it to facilitate the next work? Even if
the second patch is dropped it is still a win IMO to have backup block
replay managed within a single function (be it named XLogReplayBuffer, as in 
the latest patch), and having it return a status flag.


Yeah, that was my plan.

Regarding the name, the following have been suggested so far:

XLogReplayBuffer
XLogRestoreBuffer
XLogRecoverBuffer
XLogReadBufferForReplay
ReadBufferForXLogReplay

One more idea:

XLogRedoBuffer (would be like the first three options above, and would match 
the fact that we call the functions that call this "redo" functions)


I think XLogReadBufferForReplay is the most descriptive. Andres and 
Alvaro both suggested it - independently I believe - so that seems to 
come out naturally. But maybe make it XLogReadBufferForRedo, since we 
call the redo functions redo functions and not replay functions.


Yet another option is to just call it XLogReadBuffer, and rename the 
existing XLogReadBuffer to something else. With the 2nd patch, there are 
only a few callers of XLogReadBuffer left. But is it too deceitful if 
XLogReadBuffer doesn't merely read the page, but also sometimes replaces 
it with a full-page image? Maybe it's OK..



Barring objections or better ideas, I'm leaning towards 
XLogReadBufferForRedo.
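
To make the calling convention concrete, a redo function would then look 
roughly like this; the exact signature and the BLK_NEEDS_REDO status value 
are a sketch, not the committed interface:

/*
 * Sketch of a redo function under the proposed API.
 */
static void
heap_xlog_sample(XLogRecPtr lsn, XLogRecord *record)
{
    Buffer  buffer = InvalidBuffer;

    /*
     * Read the block, or restore the attached full-page image; the
     * change is applied by hand only when the status flag says the
     * block was read rather than restored from the image.
     */
    if (XLogReadBufferForRedo(lsn, record, 0, &buffer) == BLK_NEEDS_REDO)
    {
        Page page = BufferGetPage(buffer);

        /* ... apply the change described by the record to "page" ... */

        PageSetLSN(page, lsn);
        MarkBufferDirty(buffer);
    }
    if (BufferIsValid(buffer))
        UnlockReleaseBuffer(buffer);
}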


- Heikki





Re: [HACKERS] pg_receivexlog and replication slots

2014-08-19 Thread Fujii Masao
On Mon, Aug 18, 2014 at 4:01 PM, Michael Paquier
 wrote:
> On Mon, Aug 18, 2014 at 3:48 PM, Fujii Masao  wrote:
>> On Mon, Aug 18, 2014 at 2:38 PM, Michael Paquier
>>  wrote:
>>> - IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
>>> check was done but in 9.4 this command returns 4 fields:
>>> (PQntuples(res) != 1 || PQnfields(res) < 3)
>>> That's not directly related to this patch, but making some corrections
>>> is not going to hurt..
>>
>> Good catch! I found that libpqwalreceiver.c, etc have the same problem.
>> It's better to fix this separately. Patch attached.
> Patch looks good to me.

Okay, applied!

> Once you push that I'll rebase the stuff on
> this thread once again, that's going to conflict for sure. And now
> looking at your patch there is additional refactoring possible with
> IDENTIFY_SYSTEM and pg_basebackup as well...

Yep, that's possible. But since the patch needs to be back-patched to 9.4,
I didn't do the refactoring.

Regards,

-- 
Fujii Masao




Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-08-19 Thread Jeff Davis
On Sat, 2014-08-16 at 23:09 +0200, Tomas Vondra wrote:
> But maybe the inheritance really is not necessary - maybe it would be
> enough to track this per-context, and then just walk through the
> contexts and collect this. Because my observation is that most of the
> time is actually spent in walking through blocks and freelists.

That makes a lot of sense to me.

Another approach is to pass a flag to hash_create that tells it not to
create a subcontext. Then we don't need to traverse anything; we already
know which context we want to look at. Perhaps I was being too clever
with the idea of tracking space for an entire hierarchy.
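
For the walking approach, something like this sketch is what I have in mind;
mem_allocated is the hypothetical single per-context counter, while the
child links already exist in MemoryContextData:

/*
 * Sum the allocated space of a context subtree on demand.
 */
static Size
MemoryContextTotalSpace(MemoryContext context)
{
    Size          total = context->mem_allocated;   /* hypothetical field */
    MemoryContext child;

    for (child = context->firstchild; child != NULL; child = child->nextchild)
        total += MemoryContextTotalSpace(child);

    return total;
}

And with the hash_create flag, the common case would read one counter and
never recurse at all.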

Also, as I pointed out in my reply to Robert, adding too many fields to
MemoryContextData may be the cause of the regression. Your idea requires
only one field, which doesn't show the same regression in my tests.

Regards,
Jeff Davis






Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-08-19 Thread Jeff Davis
On Thu, 2014-08-14 at 12:52 -0400, Robert Haas wrote:
> It appears to me that the performance characteristics for this version
> are not significantly different from version 1.  I have not looked at
> the code.

While trying to reproduce your results, I noticed what might be around a
1% regression just from adding the 3 fields to MemoryContextData. If I
cut it down to adding just one field, the regression disappears.

The results are fairly noisy, so I could be chasing the wrong thing. But
one reason to believe it is that I pushed the size of MemoryContextData
above 64 bytes, which sounds like it might be an important threshold.
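
If the 64-byte boundary (a typical cacheline) really is the threshold, a
compile-time guard would at least make future growth deliberate. A sketch,
assuming that explanation holds:

#include "postgres.h"
#include "nodes/memnodes.h"

/* Fail the build if MemoryContextData grows past one 64-byte cacheline.
 * Uses C11 _Static_assert; PostgreSQL's StaticAssertStmt() would do the
 * same inside a function body. */
_Static_assert(sizeof(MemoryContextData) <= 64,
               "MemoryContextData no longer fits in a 64-byte cacheline");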

Regards,
Jeff Davis






Re: [HACKERS] WAL format and API changes (9.5)

2014-08-19 Thread Michael Paquier
On Mon, Aug 18, 2014 at 10:55 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> All rightey.
>
> Here's the next version of this work. It now comes as two patches. The
> first one adds the XLogReplayBuffer() function and refactors all the
> redo functions to use it. It doesn't change the WAL record format in any
> way. The second patch applies over the first one, and changes the WAL
> format, and all the WAL-creating functions to use the new API for
> constructing WAL records. The second patch removes the relfilenode and
> block number arguments from XLogReplayBuffer, because they're no longer
> needed when that information is included in the record format.
>
> Todo:
>
> * Performance testing. Do the new WAL construction functions add overhead
> to WAL insertion?
> * Compare WAL record sizes before and after. I've tried to keep it as
> compact as possible, but I know that some records have gotten bigger. Need
> to do a more thorough analysis.
> * Rename XLogReplayBuffer. I don't particularly like any of the
> suggestions so far, XLogReplayBuffer included. Discussion continues..
> * Anything else?

Patch 1 looks good. The main difference with the v1 submitted a couple of
days back is that the global variable approach is replaced with additional
arguments for the LSN position and record pointer in XLogReplayBuffer. I
have as well run a couple of tests with the page comparison tool, done some
tests based on installcheck-world with a slave replaying WAL behind, and
found no issues with it.
Perhaps we could consider pushing it to facilitate the next work? Even if
the second patch is dropped it is still a win IMO to have backup block
replay managed within a single function (be it named XLogReplayBuffer, as in
the latest patch), and having it return a status flag.

Regarding patch 2:
- The main differences with the latest version are the modifications for
XLogReplayBuffer having new arguments (LSN position and record pointer).
XLogRecGetBlockRefIds has been changed to return a palloc'd array of block
IDs. xloginsert.h, containing all the functions for xlog record
construction, is introduced as well.
- Tiny thing, be aware of tab padding. Here is heapam.c:
page = BufferGetPage(buffer);
PageSetAllVisible(page);
MarkBufferDirty(buffer);
- XLogRecGetBlockRefIds is not described in
src/backend/access/transam/README. Btw, pg_xlogdump dumps core when
using it (see the calling-pattern sketch after this list):
--Output:
Assertion failed: (n == *num_refs), function XLogRecGetBlockRefIds, file
xlogreader.c, line 1157.
rmgr: Heap  len (rec/tot): 14/ 12912, tx: 3, lsn: 0/01000148, prev 0/01000120, Abort trap: 6 (core dumped)
-- Backtrace:
frame #4: 0x000103870363 pg_xlogdump`XLogRecGetBlockRefIds(record=0x7ff38a003200, num_refs=0x7fff5c394464) + 435 at xlogreader.c:1157
frame #5: 0x00010386d610 pg_xlogdump`XLogDumpDisplayRecord(config=0x7fff5c3945c8, ReadRecPtr=16777544, record=0x7ff38a003200) + 272 at pg_xlogdump.c:357
frame #6: 0x00010386cad8 pg_xlogdump`main(argc=2, argv=0x7fff5c394658) + 3160 at pg_xlogdump.c:749
In order to reproduce that, simply run regression tests, followed by
pg_xlogdump on one of the WAL files generated.
- This patch includes some diffs from pg_receivexlog.c taken from 52bffe3.
- I have run installcheck-world and compared the size of the WAL generated
on HEAD and the patch (any hints to improve such analysis are of course
welcome)
      name      |   start   |    stop    |   diff
----------------+-----------+------------+-----------
 HEAD (8605bc7) | 0/16C6808 | 0/11A2C670 | 271998568
 Patch 1+2      | 0/16D45D8 | 0/1267A4B0 | 284843736
(2 rows)
So that's a diff of more or less 13MB for this test set.
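
For reference, the calling pattern I have in mind for XLogRecGetBlockRefIds
would presumably be something like the following; the signature and the
num_refs out-parameter are inferred from the assertion message, so treat
them as assumptions rather than the actual code:

/*
 * Assumed contract: XLogRecGetBlockRefIds returns a palloc'd array of
 * block reference IDs and reports its length through *num_refs; the
 * failing assertion apparently checks that the filled count matches.
 */
static void
print_block_refs(StringInfo buf, XLogRecord *record)
{
    int   num_refs;
    int  *refs = XLogRecGetBlockRefIds(record, &num_refs);
    int   i;

    for (i = 0; i < num_refs; i++)
        appendStringInfo(buf, " blkref #%d", refs[i]);

    pfree(refs);    /* caller owns the palloc'd array */
}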

Looking forward to some performance numbers, as well as a more precise
comparison of WAL record lengths.
-- 
Michael


Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-19 Thread Jeff Davis
On Fri, 2014-08-15 at 13:53 -0400, Robert Haas wrote:
> I think that's right, and I rather like your (Jeff's) approach.  It's
> definitely true that we could do better if we have a mechanism for
> serializing and deserializing group states, but (1) I think an awful
> lot of cases would get an awful lot better even just with the approach
> proposed here and (2) I doubt we would make the
> serialization/deserialization interfaces mandatory, so even if we had
> that we'd probably want a fallback strategy anyway.

Thank you for taking a look.

To solve the problem for array_agg, that would open up two potentially
lengthy discussions:

1. Trying to support non-serialized representations (like
ArrayBuildState for array_agg) as a real type rather than using
"internal".

2. What changes should we make to the aggregate API? As long as we're
changing/extending it, should we go the whole way and support partial
aggregation[1] (particularly useful for parallelism; see the toy sketch
after this list)?
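
As a toy sketch of what partial aggregation means here (names and types are
illustrative only): each worker builds its own transition state, and a
combine step merges two states:

typedef struct SumState
{
    long    sum;        /* running total from one partial scan */
    long    count;      /* rows seen by that scan */
} SumState;

/* Merge the transition states of two partial aggregations. */
static SumState
sum_combine(SumState a, SumState b)
{
    SumState result;

    result.sum = a.sum + b.sum;
    result.count = a.count + b.count;
    return result;
}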

Both of those discussions are worth having, and perhaps they can happen
in parallel as I wrap up this patch.

I'll see whether I can get consensus that my approach is (potentially)
commit-worthy, and your statement that it (potentially) solves a real
problem is a big help.

Regards,
Jeff Davis

[1]
http://blogs.msdn.com/b/craigfr/archive/2008/01/18/partial-aggregation.aspx






Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-08-19 Thread Fujii Masao
On Tue, Aug 19, 2014 at 2:08 AM, Andres Freund  wrote:
> On 2014-08-18 13:06:15 -0400, Robert Haas wrote:
>> On Mon, Aug 18, 2014 at 7:19 AM, Rahila Syed  wrote:
>> >>According to the measurement result, the amount of WAL generated in
>> >>"Multiple Blocks in one run" is less than that in "Single Block in one run".
>> >>So ISTM that compression of multiple blocks at one run can improve
>> >>the compression ratio. Am I missing something?
>> >
>> > Sorry for using unclear terminology. WAL generated here means WAL that gets
>> > generated in each run without compression.
>> > So, to be specific, the value "WAL generated" in the above measurement is
>> > the uncompressed WAL generated:
>> > uncompressed WAL = compressed WAL + Bytes saved.
>> >
>> > Here, the measurements are done for a constant amount of time rather than a
>> > fixed number of transactions, so the amount of WAL generated does not
>> > directly reflect the compression ratio of each algorithm. Hence I have
>> > calculated bytes saved in order to get an accurate idea of the amount of
>> > compression in each scenario and for the various algorithms.
>> >
>> > The compression ratio, i.e. uncompressed WAL/compressed WAL, in each of the
>> > above scenarios is as follows:
>> >
>> > Compression algo   Multiple Blocks in one run   Single Block in one run
>> >
>> > LZ4                1.21                         1.27
>> >
>> > Snappy             1.19                         1.25
>> >
>> > pglz               1.14                         1.16
>> >
>> > This shows that the compression ratios of the two scenarios, multiple
>> > blocks and single block, are nearly the same for this benchmark.
>>
>> I don't agree with that conclusion.  The difference between 1.21 and
>> 1.27, or between 1.19 and 1.25, is quite significant.  Even the
>> difference between 1.14 and 1.16 is not trivial.  We should try to get
>> the larger benefit, if it is possible to do so without an unreasonable
>> effort.
>
> Agreed.
>
> One more question: Do I see it right that multiple blocks compressed
> together compress *worse* than compressing individual blocks? If so, I
> have a rather hard time believing that the patch is sane.

Or the way the benchmark was done might have some problems.

Rahila,
I'd like to measure the compression ratio in both multiple blocks and
single block cases.
Could you tell me where the patch for "single block in one run" is?
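
When comparing the runs, note that with a fixed-duration benchmark the ratio
has to be derived from bytes saved, as above, rather than from raw WAL
volume. A tiny sketch of the arithmetic (names illustrative):

/* uncompressed WAL = compressed WAL + bytes saved, hence: */
static double
compression_ratio(double uncompressed_wal, double bytes_saved)
{
    return uncompressed_wal / (uncompressed_wal - bytes_saved);
}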

Regards,

-- 
Fujii Masao

