Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Benedikt Grundmann
On Thu, Aug 7, 2014 at 6:29 PM, Gabriele Bartolini <
gabriele.bartol...@2ndquadrant.it> wrote:

> Hi Marco,
>
> > With the current full backup procedure they are backed up, so I think
> > that having them backed up with an rsync-like algorithm is what a user
> > would expect for an incremental backup.
>
> Exactly. I think a simple, flexible and robust method for file based
> incremental backup is all we need. I am confident it could be done for
> 9.5.
>
> I would like to quote every single word Simon said. Block level
> incremental backup (with Robert's proposal) is definitely the ultimate
> goal for effective and efficient physical backups. I see file level
> incremental backup as a very good "compromise", a sort of intermediate
> release which could nonetheless produce a lot of benefits to our user
> base, for years to come too.
>
> Thanks,
> Gabriele
>

I haven't been following this discussion closely at all. But at Janestreet
we have been using pg_start_backup together with rsync --link-dest (onto a
big NFS) to achieve incrementally stored backups.  In our experience this
works very well; it is, however, advisable to look into whatever is used to
serve the NFS, as we had to set some options to increase the maximum number
of hardlinks.

Cheers,

Bene




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-07 Thread Ashutosh Bapat
On Fri, Aug 8, 2014 at 10:48 AM, Stephen Frost  wrote:

> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > I looked into the issue reported in bug #11109.  The problem appears to
> be
> > that jsonb's on-disk format is designed in such a way that the leading
> > portion of any JSON array or object will be fairly incompressible,
> because
> > it consists mostly of a strictly-increasing series of integer offsets.
> > This interacts poorly with the code in pglz_compress() that gives up if
> > it's found nothing compressible in the first first_success_by bytes of a
> > value-to-be-compressed.  (first_success_by is 1024 in the default set of
> > compression parameters.)
>
> I haven't looked at this in any detail, so take this with a grain of
> salt, but what about teaching pglz_compress about using an offset
> farther into the data, if the incoming data is quite a bit larger than
> 1k?  This is just a test to see if it's worthwhile to keep going, no?  I
> wonder if this might even be able to be provided as a type-specific
> option, to avoid changing the behavior for types other than jsonb in
> this regard.
>
>
+1 for offset. Or sample the data in the beginning, middle and end.
Obviously one could always come up with a worst case, but.


> (I'm imagining a boolean saying "pick a random sample", or perhaps a
> function which can be called that'll return "here's where you wanna test
> if this thing is gonna compress at all")
>
> I'm rather disinclined to change the on-disk format because of this
> specific test, that feels a bit like the tail wagging the dog to me,
> especially as I do hope that some day we'll figure out a way to use a
> better compression algorithm than pglz.
>
> Thanks,
>
> Stephen
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Introducing coarse grain parallelism by postgres_fdw.

2014-08-07 Thread Ashutosh Bapat
On Fri, Aug 8, 2014 at 8:54 AM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hi, thank you for the comment.
>
> > Hi Kyotaro,
> > I looked at the patches and felt that the approach taken here is too
> > intrusive, considering that the feature is only for foreign scans.
>
> I agree with your premise that it's only for foreign scans, but I
> regard it as an example of parallel execution planning.
>
> > There are quite a few members added to the generic Path, Plan structures,
> > whose use is induced only through foreign scans. Each path now stores
> > two sets of costs, one with parallelism and one without. The parallel
> > values will make sense only when there is a foreign scan, which uses
> > parallelism, in the plan tree. So, those costs are maintained
> unnecessarily
> > or the memory for those members is wasted in most of the cases, where the
> > tables involved are not foreign. Also, not many foreign tables will be
> able
> > to use the parallelism, e.g. file_fdw. Although, that's my opinion; I
> would
> > like to hear from others.
>
> I intended to discuss what the estimation and planning for
> parallel execution (not limited to foreign scans) would be
> like. Background workers would be able to take on executing
> some portion of the path tree in 'parallel'. The postgres_fdw
> changes in this patch are simply one case of planning for
> parallel execution. Although, as you can see, it only chooses
> whether to go parallel for a path constructed without regard
> to parallel execution, because considering all the possible
> alternative parallel paths would cost too much.
>
> Limiting this discussion to parallel scans, the overall gain
> from multiple simultaneous scans distributed in the path/plan
> tree won't be known before cost counting is done up to the root
> node (more precisely, their common parent). This patch foolishly
> does a bucket brigade of the parallel cost up to the root node,
> but there should be a smarter way to shortcut that, for example,
> simply picking up the parallelizable nodes by scanning the
> completed path/plan tree, calculating the probably-eliminable
> costs from them, and then subtracting that from, or comparing it
> to, the total (nonparallel) cost. This might be more acceptable
> to everyone than the current implementation.
>
>
Planning for parallel execution would be a much harder problem to solve.
Just to give a glimpse, how many worker backends can be spawned depends
entirely on the load at the time of execution. For prepared queries, the
load condition can change between planning and execution, and thus the
number of parallel backends, which would decide the actual execution
time and hence the cost, cannot be estimated at planning time.
Mixing that parallelism with the FDW's parallelism would make things even
more complicated. I think those two problems are to be solved in different
ways.


> > Instead, an FDW which can use parallelism can add two paths one with and
> > one without parallelism with appropriate costs and let the logic choosing
> > the cheapest path take care of the actual choice. In fact, I thought,
> > parallelism would be always faster than the non-parallel one, except when
> > the foreign server is too much loaded. But we won't be able to check that
> > anyway. Can you point out a case where the parallelism may not win over
> > serial execution?
>
> It always wins against serial execution if parallel execution can
> launched with no extra cost. But actually it costs extra resource
> so I thought that parallel execution should be curbed for small
> gain. It's the two GUCs added by this patch and what
> choose_parallel_scans() does, although in non-automated way. The
> overloading issue is not a matter confined to parallel execution
> but surely it will be more severe since it is less visible and
> controllable from users. However, it anyhow would should go to
> manual configuration at end.
>

I am not sure, whether the way this patch provides manual control is really
effective or in-effective without understanding the full impact. Do we have
any numbers to show the cases, when parallelism would effective and when it
would not and how those GUCs help choose the effective one?


>
> > BTW, the name parallelism seems to be misleading here. All it will be
> able
> > to do is fire the queries (or data fetch requests) asynchronously. So, we
> > might want to change the naming appropriately.
>
> That is right regarding what I did to postgres_fdw. But since
> we cannot allow all the intermediate tuples from child execution
> nodes running in parallel to pile up in memory without
> restriction, I regard all 'parallel' execution as a kind of this
> 'asynchronous' startup/fetching. As for postgres_fdw, it would
> look more like 'parallel' (and perhaps be more efficient) if it
> processed queries using libpq's single-row mode instead of a
> cursor, but similar processing takes place under the system
> calls even in that case.
>
>
By single mode, do you mean executing FETCH for every row? That wouldn't be

Re: [HACKERS] postgresql.auto.conf and reload

2014-08-07 Thread Fujii Masao
On Fri, Aug 8, 2014 at 1:19 PM, Amit Kapila  wrote:
> On Thu, Aug 7, 2014 at 12:36 PM, Fujii Masao  wrote:
>> On Thu, Aug 7, 2014 at 12:36 PM, Amit Kapila 
>> wrote:
>> > On Wed, Aug 6, 2014 at 1:32 PM, Fujii Masao 
>> > wrote:
>
>> >> What about picking up only data_directory at the first pass?
>> >
>> > I think that will workout and for that I think we might need to add
>> > few checks during ProcessConfigFile.  Do you want to address it
>> > during parsing of config file or during setting of values?
>>
>> I prefer "during paring". The attached patch does that. In this patch,
>> the first call of ProcessConfigFile() picks up only data_directory. One
>> concern of this patch is that the logic is a bit ugly.
>
> I think handling 'data_directory' in ParseConfigFp() looks a bit out of
> place, as this API is used to parse other config files as well, like
> recovery.conf.  I agree that for all other paths DataDir might
> already be set, due to which this new path will never be executed; still,
> from a code-maintenance point of view it would be better if we
> could find a way to handle it in the place where we are processing
> the server's main config file (postgresql.conf).

Yep, right. ParseConfigFp() is not a good place to pick up data_directory.
What about the attached patch which makes ProcessConfigFile() instead of
ParseConfigFp() pick up data_directory just after the configuration file is
parsed?

Regards,

-- 
Fujii Masao
*** a/src/backend/utils/misc/guc-file.l
--- b/src/backend/utils/misc/guc-file.l
***
*** 162,167  ProcessConfigFile(GucContext context)
--- 162,209 
  	}
  
  	/*
+ 	 * Pick up only the data_directory if DataDir is not set, which
+ 	 * means that the configuration file is read for the first time and
+ 	 * PG_AUTOCONF_FILENAME file cannot be read yet. In this case,
+ 	 * we should not pick up all the settings except the data_directory
+ 	 * from postgresql.conf yet because they might be overwritten
+ 	 * with the settings in PG_AUTOCONF_FILENAME file which will be
+ 	 * read later. OTOH, since it's ensured that data_directory doesn't
+ 	 * exist in PG_AUTOCONF_FILENAME file, it will never be overwritten
+ 	 * later. Now only the data_directory needs to be picked up to
+ 	 * read PG_AUTOCONF_FILENAME file later.
+ 	 */
+ 	if (DataDir == NULL)
+ 	{
+ 		ConfigVariable *prev = NULL;
+ 
+ 		for (item = head; item;)
+ 		{
+ 			ConfigVariable *ptr = item;
+ 
+ 			item = item->next;
+ 			if (strcmp(ptr->name, "data_directory") != 0)
+ 			{
+ if (prev == NULL)
+ 	head = ptr->next;
+ else
+ {
+ 	prev->next = ptr->next;
+ 	/*
+ 	 * On removing last item in list, we need to update tail
+ 	 * to ensure that the list will be maintained.
+ 	 */
+ 	if (prev->next == NULL)
+ 		tail = prev;
+ }
+ pfree(ptr);
+ 			}
+ 			else
+ prev = ptr;
+ 		}
+ 	}
+ 
+ 	/*
  	 * Mark all extant GUC variables as not present in the config file.
  	 * We need this so that we can tell below which ones have been removed
  	 * from the file since we last processed it.
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***
*** 4342,4347  SelectConfigFiles(const char *userDoption, const char *progname)
--- 4342,4354 
  		return false;
  	}
  
+ 	/*
+ 	 * Read the configuration file for the first time. This time only
+ 	 * data_directory parameter is picked up to determine the data directory
+ 	 * so that we can read PG_AUTOCONF_FILENAME file next time. Then don't
+ 	 * forget to read the configuration file again later to pick up all the
+ 	 * parameters.
+ 	 */
  	ProcessConfigFile(PGC_POSTMASTER);
  
  	/*



Re: [HACKERS] Specifying the unit in storage parameter

2014-08-07 Thread Fujii Masao
On Fri, Aug 8, 2014 at 2:12 PM, Alvaro Herrera  wrote:
> Fujii Masao wrote:
>> On Fri, Aug 8, 2014 at 12:56 PM, Alvaro Herrera
>>  wrote:
>
>> > Hm, what's with the parse_int signature change and the hintmsg thing?
>> > Is it just me or the patch is incomplete?
>>
>> Sorry, probably I failed to see your point. You mean that the signature
>> of parse_int needs to be changed so that it includes the hintmsg as the
>> argument? If yes, there is no problem. Both the signature and function body
>> of parse_int have already included the hintmsg argument so far.
>> Am I missing something?
>
> I just mean that the parse_int function body is not touched by your
> patch, unless I am failing to see something.

Yes, my patch doesn't change the parse_int function at all because I didn't
think such a change is required for the purpose (i.e., just allowing us to
specify the unit in the setting of storage parameters). But have you found a
reason why it needs to be changed?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Enhancing pgbench parameter checking

2014-08-07 Thread Tatsuo Ishii
Fabien,

> Hello Tatsuo-san,
> 
>> Thanks for the review. I have registered it to Aug Commit fest.
>> https://commitfest.postgresql.org/action/patch_view?id=1532
>>
>>> I'm not sure of the variable name "is_non_init_parameter_set". I would
>>> suggest "benchmarking_option_set"?
>>
>> Ok, I will replace the variable name as you suggested.
>>
>>> Also, to be consistent, maybe it should check that no
>>> initialization-specific option are set when benchmarking.
>>
>> Good suggestion. Here is the v2 patch.
> 
> I applied it without problem and tested it.
> 
> 
> * It seems that -c is ignored, the atoi() line has been removed.
> 
> * Option -q is initialization specific, but not detected as such like
>   the others, although there is a specific detection later. I think that
>   it would be better to use the systematic approach, and to remove the
>   specific check.
> 
> * I would name the second boolean "initialization_option_set", as it is
>   described like that in the documentation.
> 
> 
> * I would suggest the following error messages:
>  "some options cannot be used in initialization (-i) mode\n" and
>  "some options cannot be used in benchmarking mode\n".
> Although these messages are rough, I think that they are enough and
> avoid running something unexpected, which is your purpose.
> 
> 
> Find attached a patch which adds these changes to your current
> version.

Thank you for the review and patch. Looks good. I changed the status
to "Ready for Committer". I will wait for a fewd days and if there's
no objection, I will commit the patch.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-07 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> I looked into the issue reported in bug #11109.  The problem appears to be
> that jsonb's on-disk format is designed in such a way that the leading
> portion of any JSON array or object will be fairly incompressible, because
> it consists mostly of a strictly-increasing series of integer offsets.
> This interacts poorly with the code in pglz_compress() that gives up if
> it's found nothing compressible in the first first_success_by bytes of a
> value-to-be-compressed.  (first_success_by is 1024 in the default set of
> compression parameters.)

I haven't looked at this in any detail, so take this with a grain of
salt, but what about teaching pglz_compress about using an offset
farther into the data, if the incoming data is quite a bit larger than
1k?  This is just a test to see if it's worthwhile to keep going, no?  I
wonder if this might even be able to be provided as a type-specific
option, to avoid changing the behavior for types other than jsonb in
this regard.

(I'm imagining a boolean saying "pick a random sample", or perhaps a
function which can be called that'll return "here's where you wanna test
if this thing is gonna compress at all")

I'm rather disinclined to change the on-disk format because of this
specific test, that feels a bit like the tail wagging the dog to me,
especially as I do hope that some day we'll figure out a way to use a
better compression algorithm than pglz.

Thanks,

Stephen




Re: [HACKERS] Specifying the unit in storage parameter

2014-08-07 Thread Alvaro Herrera
Fujii Masao wrote:
> On Fri, Aug 8, 2014 at 12:56 PM, Alvaro Herrera
>  wrote:

> > Hm, what's with the parse_int signature change and the hintmsg thing?
> > Is it just me or the patch is incomplete?
> 
> Sorry, probably I failed to see your point. You mean that the signature
> of parse_int needs to be changed so that it includes the hintmsg as the
> argument? If yes, there is no problem. Both the signature and function body
> of parse_int have already included the hintmsg argument so far.
> Am I missing something?

I just mean that the parse_int function body is not touched by your
patch, unless I am failing to see something.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org

2014-08-07 Thread Stephen Frost
JD,

* Joshua D. Drake (j...@commandprompt.com) wrote:
> But this is just plain wrong. I don't care that the FAQ (on the
> wiki) says we are doing it wrong for good reasons. When I (or anyone
> else) pulls postgresql-$version-dev, I want the libpq for my
> version. I do not want 9.3.

No, it isn't wrong.  If you'd like the specific version, follow what's
in the FAQ to get the specific version.  Otherwise, we're going to
follow the Debian guidelines which are quite sensible and more-or-less
say "make new builds go against the latest version".

> There can be unintended circumstances on machines when you mix and
> match like that. Can we please do some proper packaging on this?

This *is* the proper packaging.

If you want the specific version, update your deb line.  Don't complain
because you used the Debian repo that follows the Debian guidelines and
didn't like what you got; that's not going to change.

Thanks,

Stephen




Re: [HACKERS] Specifying the unit in storage parameter

2014-08-07 Thread Fujii Masao
On Fri, Aug 8, 2014 at 12:56 PM, Alvaro Herrera
 wrote:
> Fujii Masao wrote:
>> Hi,
>>
>> We can specify the unit when setting autovacuum_vacuum_cost_delay
>> GUC as follows.
>>
>> ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO '80ms';
>>
>> OTOH we cannot specify the unit when setting autovacuum_vacuum_cost_delay
>> as storage parameter as follows.
>>
>> CREATE TABLE test (col1 int) WITH (autovacuum_vacuum_cost_delay = 
>> '80ms');
>> ERROR:  invalid value for integer option
>> "autovacuum_vacuum_cost_delay": 80ms
>>
>> This is not user-friendly.
>
> No disagreement here.
>
>> I'd like to propose the attached patch which
>> introduces the infrastructure which allows us to specify the unit when
>> setting INTEGER storage parameter like autovacuum_vacuum_cost_delay.
>> Comment? Review?
>
> Hm, what's with the parse_int signature change and the hintmsg thing?
> Is it just me or the patch is incomplete?

Sorry, probably I failed to see your point. You mean that the signature
of parse_int needs to be changed so that it includes the hintmsg as the
argument? If yes, there is no problem. Both the signature and function body
of parse_int have already included the hintmsg argument so far.
Am I missing something?

Regards,

-- 
Fujii Masao




Re: [HACKERS] postgresql.auto.conf and reload

2014-08-07 Thread Amit Kapila
On Thu, Aug 7, 2014 at 12:36 PM, Fujii Masao  wrote:
> On Thu, Aug 7, 2014 at 12:36 PM, Amit Kapila 
wrote:
> > On Wed, Aug 6, 2014 at 1:32 PM, Fujii Masao 
wrote:

> >> What about picking up only data_directory at the first pass?
> >
> > I think that will workout and for that I think we might need to add
> > few checks during ProcessConfigFile.  Do you want to address it
> > during parsing of config file or during setting of values?
>
> I prefer "during paring". The attached patch does that. In this patch,
> the first call of ProcessConfigFile() picks up only data_directory. One
> concern of this patch is that the logic is a bit ugly.

I think handling 'data_directory' in ParseConfigFp() looks a bit out of
place, as this API is used to parse other config files as well, like
recovery.conf.  I agree that for all other paths DataDir might
already be set, due to which this new path will never be executed; still,
from a code-maintenance point of view it would be better if we
could find a way to handle it in the place where we are processing
the server's main config file (postgresql.conf).


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-07 Thread Larry White
Apologies if this is a ridiculous suggestion, but I believe that swapping
out the compression algorithm (for Snappy, for example) has been discussed
in the past. I wonder if that algorithm is sufficiently different that it
would produce a better result, and if that might not be preferable to some
of the other options.


On Thu, Aug 7, 2014 at 11:17 PM, Tom Lane  wrote:

> I looked into the issue reported in bug #11109.  The problem appears to be
> that jsonb's on-disk format is designed in such a way that the leading
> portion of any JSON array or object will be fairly incompressible, because
> it consists mostly of a strictly-increasing series of integer offsets.
> This interacts poorly with the code in pglz_compress() that gives up if
> it's found nothing compressible in the first first_success_by bytes of a
> value-to-be-compressed.  (first_success_by is 1024 in the default set of
> compression parameters.)
>
> As an example, here's gdb's report of the bitwise representation of the
> example JSON value in the bug thread:
>
> 0x2ab85ac:  0x2005  0x0004  0x50003098  0x309f
> 0x2ab85bc:  0x30ae  0x30b8  0x30cf  0x30da
> 0x2ab85cc:  0x30df  0x30ee  0x3105  0x6b6e756a
> 0x2ab85dc:  0x40de  0x0034  0x0068  0x009c
> 0x2ab85ec:  0x00d0  0x0104  0x0138  0x016c
> 0x2ab85fc:  0x01a0  0x01d4  0x0208  0x023c
> 0x2ab860c:  0x0270  0x02a4  0x02d8  0x030c
> 0x2ab861c:  0x0340  0x0374  0x03a8  0x03dc
> 0x2ab862c:  0x0410  0x0444  0x0478  0x04ac
> 0x2ab863c:  0x04e0  0x0514  0x0548  0x057c
> 0x2ab864c:  0x05b0  0x05e4  0x0618  0x064c
> 0x2ab865c:  0x0680  0x06b4  0x06e8  0x071c
> 0x2ab866c:  0x0750  0x0784  0x07b8  0x07ec
> 0x2ab867c:  0x0820  0x0854  0x0888  0x08bc
> 0x2ab868c:  0x08f0  0x0924  0x0958  0x098c
> 0x2ab869c:  0x09c0  0x09f4  0x0a28  0x0a5c
> 0x2ab86ac:  0x0a90  0x0ac4  0x0af8  0x0b2c
> 0x2ab86bc:  0x0b60  0x0b94  0x0bc8  0x0bfc
> 0x2ab86cc:  0x0c30  0x0c64  0x0c98  0x0ccc
> 0x2ab86dc:  0x0d00  0x0d34  0x0d68  0x0d9c
> 0x2ab86ec:  0x0dd0  0x0e04  0x0e38  0x0e6c
> 0x2ab86fc:  0x0ea0  0x0ed4  0x0f08  0x0f3c
> 0x2ab870c:  0x0f70  0x0fa4  0x0fd8  0x100c
> 0x2ab871c:  0x1040  0x1074  0x10a8  0x10dc
> 0x2ab872c:  0x1110  0x1144  0x1178  0x11ac
> 0x2ab873c:  0x11e0  0x1214  0x1248  0x127c
> 0x2ab874c:  0x12b0  0x12e4  0x1318  0x134c
> 0x2ab875c:  0x1380  0x13b4  0x13e8  0x141c
> 0x2ab876c:  0x1450  0x1484  0x14b8  0x14ec
> 0x2ab877c:  0x1520  0x1554  0x1588  0x15bc
> 0x2ab878c:  0x15f0  0x1624  0x1658  0x168c
> 0x2ab879c:  0x16c0  0x16f4  0x1728  0x175c
> 0x2ab87ac:  0x1790  0x17c4  0x17f8  0x182c
> 0x2ab87bc:  0x1860  0x1894  0x18c8  0x18fc
> 0x2ab87cc:  0x1930  0x1964  0x1998  0x19cc
> 0x2ab87dc:  0x1a00  0x1a34  0x1a68  0x1a9c
> 0x2ab87ec:  0x1ad0  0x1b04  0x1b38  0x1b6c
> 0x2ab87fc:  0x1ba0  0x1bd4  0x1c08  0x1c3c
> 0x2ab880c:  0x1c70  0x1ca4  0x1cd8  0x1d0c
> 0x2ab881c:  0x1d40  0x1d74  0x1da8  0x1ddc
> 0x2ab882c:  0x1e10  0x1e44  0x1e78  0x1eac
> 0x2ab883c:  0x1ee0  0x1f14  0x1f48  0x1f7c
> 0x2ab884c:  0x1fb0  0x1fe4  0x2018  0x204c
> 0x2ab885c:  0x2080  0x20b4  0x20e8  0x211c
> 0x2ab886c:  0x2150  0x2184  0x21b8  0x21ec
> 0x2ab887c:  0x2220  0x2254  0x2288  0x22bc
> 0x2ab888c:  0x22f0  0x2324  0x2358  0x238c
> 0x2ab889c:  0x23c0  0x23f4  0x2428  0x245c
> 0x2ab88ac:  0x2490  0x24c4  0x24f8  0x252c
> 0x2ab88bc:  0x2560  0x2594  0x25c8  0x25fc
> 0x2ab88cc:  0x2630  0x2664  0x2698  0x26cc
> 0x2ab88dc:  0x2700  

Re: [HACKERS] Specifying the unit in storage parameter

2014-08-07 Thread Alvaro Herrera
Fujii Masao wrote:
> Hi,
> 
> We can specify the unit when setting autovacuum_vacuum_cost_delay
> GUC as follows.
> 
> ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO '80ms';
> 
> OTOH we cannot specify the unit when setting autovacuum_vacuum_cost_delay
> as storage parameter as follows.
> 
> CREATE TABLE test (col1 int) WITH (autovacuum_vacuum_cost_delay = '80ms');
> ERROR:  invalid value for integer option
> "autovacuum_vacuum_cost_delay": 80ms
> 
> This is not user-friendly.

No disagreement here.

> I'd like to propose the attached patch which
> introduces the infrastructure which allows us to specify the unit when
> setting INTEGER storage parameter like autovacuum_vacuum_cost_delay.
> Comment? Review?

Hm, what's with the parse_int signature change and the hintmsg thing?
Is it just me or the patch is incomplete?


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Use unique index for longer pathkeys.

2014-08-07 Thread Kyotaro HORIGUCHI
Hello, 

> > Although, yes, you're right, irrespective of the "common
> > something", and even if the dropped index was i_t1_pkey_2, which
> > is on t1(a, b), the result tuples are sorted as expected only by
> > the pathkey (t.a = t1.a, t1.b). It is because both t and t1 are
> > still unique so the joined tuples are also unique, and the unique
> > key of the result tuples is the merged pkey (t.a, t1.a, t1.b),
> > which can be transformed to (t.a, t1.b) using the equality
> > between t.a and t1.a. And considering the inner relation (t1) is
> > already sorted by (a, b), the sort itself could be eliminated from
> > the plan.
> 
> I think if we could eliminate t1.c,t1.d considering indexes on
> individual relations (maybe by using the technique I have mentioned
> upthread or some other way), then the current code itself will
> eliminate the ORDER BY clause.  I have tried that by using a query
> without having t1.c,t1.d in ORDER BY clause
> 
> explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by
> t1.a,t1.b;
> QUERY PLAN
> --
>  Merge Join
>Merge Cond: (t1.a = t.a)
>->  Index Scan using i_t1_pkey_2 on t1
>->  Index Scan using i_t_pkey on t
> (4 rows)

Yeah, that elimination seems quite fascinating to me :)

> Okay, I think you want to handle the elimination of ORDER BY clauses
> at a much broader level which might handle most of simple cases as
> well.  However I think eliminating clauses as mentioned above is itself
> a good optimization for many cases and more so if that can be done in
> a much simpler way.

Yes, if it can be done in a "much" simpler way..  I guess that it
could be approached from the opposite side, that is, from equivalence
classes; anyway, I'll try it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




[HACKERS] Specifying the unit in storage parameter

2014-08-07 Thread Fujii Masao
Hi,

We can specify the unit when setting autovacuum_vacuum_cost_delay
GUC as follows.

ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO '80ms';

OTOH we cannot specify the unit when setting autovacuum_vacuum_cost_delay
as a storage parameter as follows.

CREATE TABLE test (col1 int) WITH (autovacuum_vacuum_cost_delay = '80ms');
ERROR:  invalid value for integer option
"autovacuum_vacuum_cost_delay": 80ms

This is not user-friendly. I'd like to propose the attached patch which
introduces the infrastructure which allows us to specify the unit when
setting an INTEGER storage parameter like autovacuum_vacuum_cost_delay.
Comments? Review?

Regards,

-- 
Fujii Masao
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
***
*** 105,111  static relopt_int intRelOpts[] =
  			"Packs btree index pages only to this percentage",
  			RELOPT_KIND_BTREE
  		},
! 		BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100
  	},
  	{
  		{
--- 105,111 
  			"Packs btree index pages only to this percentage",
  			RELOPT_KIND_BTREE
  		},
! 		BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100, 0
  	},
  	{
  		{
***
*** 113,119  static relopt_int intRelOpts[] =
  			"Packs hash index pages only to this percentage",
  			RELOPT_KIND_HASH
  		},
! 		HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100
  	},
  	{
  		{
--- 113,119 
  			"Packs hash index pages only to this percentage",
  			RELOPT_KIND_HASH
  		},
! 		HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100, 0
  	},
  	{
  		{
***
*** 121,127  static relopt_int intRelOpts[] =
  			"Packs gist index pages only to this percentage",
  			RELOPT_KIND_GIST
  		},
! 		GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100
  	},
  	{
  		{
--- 121,127 
  			"Packs gist index pages only to this percentage",
  			RELOPT_KIND_GIST
  		},
! 		GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100, 0
  	},
  	{
  		{
***
*** 129,135  static relopt_int intRelOpts[] =
  			"Packs spgist index pages only to this percentage",
  			RELOPT_KIND_SPGIST
  		},
! 		SPGIST_DEFAULT_FILLFACTOR, SPGIST_MIN_FILLFACTOR, 100
  	},
  	{
  		{
--- 129,135 
  			"Packs spgist index pages only to this percentage",
  			RELOPT_KIND_SPGIST
  		},
! 		SPGIST_DEFAULT_FILLFACTOR, SPGIST_MIN_FILLFACTOR, 100, 0
  	},
  	{
  		{
***
*** 137,143  static relopt_int intRelOpts[] =
  			"Minimum number of tuple updates or deletes prior to vacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, INT_MAX
  	},
  	{
  		{
--- 137,143 
  			"Minimum number of tuple updates or deletes prior to vacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, INT_MAX, 0
  	},
  	{
  		{
***
*** 145,151  static relopt_int intRelOpts[] =
  			"Minimum number of tuple inserts, updates or deletes prior to analyze",
  			RELOPT_KIND_HEAP
  		},
! 		-1, 0, INT_MAX
  	},
  	{
  		{
--- 145,151 
  			"Minimum number of tuple inserts, updates or deletes prior to analyze",
  			RELOPT_KIND_HEAP
  		},
! 		-1, 0, INT_MAX, 0
  	},
  	{
  		{
***
*** 153,159  static relopt_int intRelOpts[] =
  			"Vacuum cost delay in milliseconds, for autovacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 100
  	},
  	{
  		{
--- 153,159 
  			"Vacuum cost delay in milliseconds, for autovacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 100, GUC_UNIT_MS
  	},
  	{
  		{
***
*** 161,167  static relopt_int intRelOpts[] =
  			"Vacuum cost amount available before napping, for autovacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 1, 1
  	},
  	{
  		{
--- 161,167 
  			"Vacuum cost amount available before napping, for autovacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 1, 1, 0
  	},
  	{
  		{
***
*** 169,175  static relopt_int intRelOpts[] =
  			"Minimum age at which VACUUM should freeze a table row, for autovacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 10
  	},
  	{
  		{
--- 169,175 
  			"Minimum age at which VACUUM should freeze a table row, for autovacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 10, 0
  	},
  	{
  		{
***
*** 177,183  static relopt_int intRelOpts[] =
  			"Minimum multixact age at which VACUUM should freeze a row multixact's, for autovacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 10
  	},
  	{
  		{
--- 177,183 
  			"Minimum multixact age at which VACUUM should freeze a row multixact's, for autovacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 10, 0
  	},
  	{
  		{
***
*** 185,191  static relopt_int intRelOpts[] =
  			"Age at which to autovacuum a table to prevent transaction ID wraparound",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 1, 20
  	},
  	{

Re: [HACKERS] Introducing coarse grain parallelism by postgres_fdw.

2014-08-07 Thread Kyotaro HORIGUCHI
Hi, thank you for the comment.

> Hi Kyotaro,
> I looked at the patches and felt that the approach taken here is too
> intrusive, considering that the feature is only for foreign scans.

I agree with your premise that it's only for foreign scans, but I
regard it as an example of parallel execution planning.

> There are quite a few members added to the generic Path, Plan structures,
> whose use is induced only through foreign scans. Each path now stores
> two sets of costs, one with parallelism and one without. The parallel
> values will make sense only when there is a foreign scan, which uses
> parallelism, in the plan tree. So, those costs are maintained unnecessarily
> or the memory for those members is wasted in most of the cases, where the
> tables involved are not foreign. Also, not many foreign tables will be able
> to use the parallelism, e.g. file_fdw. Although, that's my opinion; I would
> like to hear from others.

I intended to discuss what the estimation and planning for
parallel execution (not limited to foreign scans) would be
like. Background workers would be able to take on executing
some portion of the path tree in 'parallel'. The postgres_fdw
changes in this patch are simply one case of planning for
parallel execution. Although, as you can see, it only chooses
whether to go parallel for a path constructed without regard
to parallel execution, because considering all the possible
alternative parallel paths would cost too much.

Limiting this discussion to parallel scans, the overall gain
from multiple simultaneous scans distributed in the path/plan
tree won't be known before cost counting is done up to the root
node (more precisely, their common parent). This patch foolishly
does a bucket brigade of the parallel cost up to the root node,
but there should be a smarter way to shortcut that, for example,
simply picking up the parallelizable nodes by scanning the
completed path/plan tree, calculating the probably-eliminable
costs from them, and then subtracting that from, or comparing it
to, the total (nonparallel) cost. This might be more acceptable
to everyone than the current implementation.

> Instead, an FDW which can use parallelism can add two paths one with and
> one without parallelism with appropriate costs and let the logic choosing
> the cheapest path take care of the actual choice. In fact, I thought,
> parallelism would be always faster than the non-parallel one, except when
> the foreign server is too much loaded. But we won't be able to check that
> anyway. Can you point out a case where the parallelism may not win over
> serial execution?

It always wins against serial execution if parallel execution
can be launched at no extra cost. But actually it consumes extra
resources, so I thought that parallel execution should be curbed
for small gains. That is what the two GUCs added by this patch
and choose_parallel_scans() do, although in a non-automated way.
The overloading issue is not a matter confined to parallel
execution, but surely it will be more severe there since it is
less visible to and controllable by users. However, it would
anyhow have to come down to manual configuration in the end.

> BTW, the name parallelism seems to be misleading here. All it will be able
> to do is fire the queries (or data fetch requests) asynchronously. So, we
> might want to change the naming appropriately.

That is right regarding what I did to postgres_fdw. But since
we cannot allow all the intermediate tuples from child execution
nodes running in parallel to pile up in memory without
restriction, I regard all 'parallel' execution as a kind of this
'asynchronous' startup/fetching. As for postgres_fdw, it would
look more like 'parallel' (and perhaps be more efficient) if it
processed queries using libpq's single-row mode instead of a
cursor, but similar processing takes place under the system
calls even in that case.
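
For reference, a minimal sketch of what fetching with libpq's single-row
mode (instead of a cursor) could look like; the empty connection string
takes its parameters from the environment, and the "t1" table here is
just a placeholder:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");		/* conninfo from the environment */
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	if (!PQsendQuery(conn, "SELECT a, b FROM t1"))
	{
		fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}
	PQsetSingleRowMode(conn);	/* must follow PQsendQuery, before PQgetResult */

	/* Rows arrive one at a time; nothing piles up in a full result set. */
	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
			printf("a=%s b=%s\n", PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1));
		else if (PQresultStatus(res) != PGRES_TUPLES_OK)
			fprintf(stderr, "error: %s", PQerrorMessage(conn));
		PQclear(res);
	}
	PQfinish(conn);
	return 0;
}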


Well, I will try to make a version that does not include the
parallel costs in the plan/path structs and that uses single-row
mode for postgres_fdw. I hope it will lead somewhere.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Craig Ringer
On 08/08/2014 09:51 AM, Tom Lane wrote:

>> AFAIK we don't _have_ a fancy negotiation system in the protocol, with
>> back-and-forth exchanges of capabilities information.
> 
> Maybe it's time to invent that.  It would be positively foolish to
> create any such behavior without a protocol version bump anyway.

I was hoping it'd be easier to sneak a new message type in without a
full protocol bump. As you can imagine that makes it a ... rather larger
job.

Still, if it's unsafe to do it that way...

> Although as I said, I don't think embedding knowledge of LSNs at the
> protocol level is a good thing to begin with. 

As I said upthread, it need not necessarily be an LSN. A commit
timestamp would do the job too, if information about the last-replayed
commit timestamp was accessible on the downstream node.

It needs to be a sequence identifier that can be matched against
pg_replication_slots / pg_stat_replication or passed to a function on
the downstream end to say "wait until we're replayed to this point".

For streaming replication there's only one upstream, so there's no need
to identify it. For BDR you'd also have to identify the upstream node of
interest - probably by slot ID, or by (sysid, tlid, dboid) tuple.

In the end, it can be an opaque magic cookie. It really doesn't matter,
so long as what the client receives is a value it can pass to another Pg
instance and say "wait until you've replayed up to this, please" or
"have you replayed up to this yet?".

> Is it really necessary that this information be pushed to the client
> on every commit, as opposed to the client asking for it occasionally?

I think so, yes, though I'd be glad to be proved wrong.

For the purpose of transparent failover (BDR) at least, the server
currently being written to can go away at any moment, and you should
know exactly what you're up to in order to make it safe to continue on
another server.



Consider, for a multi-master configuration where two servers replicate
to each other:

On a connection to server1:

INSERT INTO bird(id, parrot)
VALUES (1, 'African Grey');

[client grabs magic cookie for server replay state]

INSERT INTO bird(id, parrot)
VALUES (2, 'Lorikkeet');

[server1 sends changes to server2, which is behind on replay
 and still working on catching up]

[server1 dies abruptly]

[client drops connection to server1, connects to server2]

-- Correct spelling
UPDATE bird
SET parrot = 'Lorikeet'
WHERE id = 2;


If the INSERT from server1 hasn't been replayed on server2 yet, this will
fail. Other anomalies can be worse and cause lost updates, etc.

To protect against this the client needs a way to wait, after connecting
to server2, until it's caught up with the state of server1. That's what
I'm talking about here. In this case, if you used a periodic progress
indicator requested by the client, you'd still get the same error,
because you'd wait until the first INSERT but not the second.

So yes, the client needs this info at every commit.

That means that enabling client-side fail-over won't be free, especially
for lots of small transactions. It'll be cheaper if Pg can push the info
with the commit confirmation instead of the client having to request it
afterwards though.


(Note that the client risks waiting forever if server1 didn't send the
required commits before it died, but that's where application policy
decisions come in).



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] jsonb format is pessimal for toast compression

2014-08-07 Thread Tom Lane
I looked into the issue reported in bug #11109.  The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON array or object will be fairly incompressible, because
it consists mostly of a strictly-increasing series of integer offsets.
This interacts poorly with the code in pglz_compress() that gives up if
it's found nothing compressible in the first first_success_by bytes of a
value-to-be-compressed.  (first_success_by is 1024 in the default set of
compression parameters.)

As an example, here's gdb's report of the bitwise representation of the
example JSON value in the bug thread:

0x2ab85ac:  0x2005  0x0004  0x50003098  0x309f
0x2ab85bc:  0x30ae  0x30b8  0x30cf  0x30da
0x2ab85cc:  0x30df  0x30ee  0x3105  0x6b6e756a
0x2ab85dc:  0x40de  0x0034  0x0068  0x009c
0x2ab85ec:  0x00d0  0x0104  0x0138  0x016c
0x2ab85fc:  0x01a0  0x01d4  0x0208  0x023c
0x2ab860c:  0x0270  0x02a4  0x02d8  0x030c
0x2ab861c:  0x0340  0x0374  0x03a8  0x03dc
0x2ab862c:  0x0410  0x0444  0x0478  0x04ac
0x2ab863c:  0x04e0  0x0514  0x0548  0x057c
0x2ab864c:  0x05b0  0x05e4  0x0618  0x064c
0x2ab865c:  0x0680  0x06b4  0x06e8  0x071c
0x2ab866c:  0x0750  0x0784  0x07b8  0x07ec
0x2ab867c:  0x0820  0x0854  0x0888  0x08bc
0x2ab868c:  0x08f0  0x0924  0x0958  0x098c
0x2ab869c:  0x09c0  0x09f4  0x0a28  0x0a5c
0x2ab86ac:  0x0a90  0x0ac4  0x0af8  0x0b2c
0x2ab86bc:  0x0b60  0x0b94  0x0bc8  0x0bfc
0x2ab86cc:  0x0c30  0x0c64  0x0c98  0x0ccc
0x2ab86dc:  0x0d00  0x0d34  0x0d68  0x0d9c
0x2ab86ec:  0x0dd0  0x0e04  0x0e38  0x0e6c
0x2ab86fc:  0x0ea0  0x0ed4  0x0f08  0x0f3c
0x2ab870c:  0x0f70  0x0fa4  0x0fd8  0x100c
0x2ab871c:  0x1040  0x1074  0x10a8  0x10dc
0x2ab872c:  0x1110  0x1144  0x1178  0x11ac
0x2ab873c:  0x11e0  0x1214  0x1248  0x127c
0x2ab874c:  0x12b0  0x12e4  0x1318  0x134c
0x2ab875c:  0x1380  0x13b4  0x13e8  0x141c
0x2ab876c:  0x1450  0x1484  0x14b8  0x14ec
0x2ab877c:  0x1520  0x1554  0x1588  0x15bc
0x2ab878c:  0x15f0  0x1624  0x1658  0x168c
0x2ab879c:  0x16c0  0x16f4  0x1728  0x175c
0x2ab87ac:  0x1790  0x17c4  0x17f8  0x182c
0x2ab87bc:  0x1860  0x1894  0x18c8  0x18fc
0x2ab87cc:  0x1930  0x1964  0x1998  0x19cc
0x2ab87dc:  0x1a00  0x1a34  0x1a68  0x1a9c
0x2ab87ec:  0x1ad0  0x1b04  0x1b38  0x1b6c
0x2ab87fc:  0x1ba0  0x1bd4  0x1c08  0x1c3c
0x2ab880c:  0x1c70  0x1ca4  0x1cd8  0x1d0c
0x2ab881c:  0x1d40  0x1d74  0x1da8  0x1ddc
0x2ab882c:  0x1e10  0x1e44  0x1e78  0x1eac
0x2ab883c:  0x1ee0  0x1f14  0x1f48  0x1f7c
0x2ab884c:  0x1fb0  0x1fe4  0x2018  0x204c
0x2ab885c:  0x2080  0x20b4  0x20e8  0x211c
0x2ab886c:  0x2150  0x2184  0x21b8  0x21ec
0x2ab887c:  0x2220  0x2254  0x2288  0x22bc
0x2ab888c:  0x22f0  0x2324  0x2358  0x238c
0x2ab889c:  0x23c0  0x23f4  0x2428  0x245c
0x2ab88ac:  0x2490  0x24c4  0x24f8  0x252c
0x2ab88bc:  0x2560  0x2594  0x25c8  0x25fc
0x2ab88cc:  0x2630  0x2664  0x2698  0x26cc
0x2ab88dc:  0x2700  0x2734  0x2768  0x279c
0x2ab88ec:  0x27d0  0x2804  0x2838  0x286c
0x2ab88fc:  0x28a0  0x28d4  0x2908  0x293c
0x2ab890c:  0x2970  0x29a4  0x29d8  0x2a0c
0x2ab891c:  0x2a40  0x2a74  0x2aa8  0x2adc
0x2ab892c:  0x2b10  0x2b44  0x2b78  0x2bac
0x2ab893c:  0x2be0  0x2c14  0x2c48  0x2c7c
0x

Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Michael Paquier
On Fri, Aug 8, 2014 at 11:58 AM, Fujii Masao  wrote:
> On Fri, Aug 8, 2014 at 9:50 AM, Craig Ringer  wrote:
> ISTM that the proper solution to that problem is the introduction of
> new synchronous replication mode which makes the transaction wait for
> its WAL to be replayed by the standby. If this mode is used, a client
> doesn't need to track the LSN of each transaction and check whether
> the committed transaction has already replayed by the standby or not.
Don't you need to combine that with the possibility to wait for N
targets instead of 1 in synchronous_standby_names? You may want to be
sure that the commit is done on a series of standbys before
considering any further operations after this transaction commit.
-- 
Michael




Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Craig Ringer
On 08/08/2014 10:58 AM, Fujii Masao wrote:
> ISTM that the proper solution to that problem is the introduction of
> new synchronous replication mode which makes the transaction wait for
> its WAL to be replayed by the standby. If this mode is used, a client
> doesn't need to track the LSN of each transaction and check whether
> the committed transaction has already replayed by the standby or not.

I'm not convinced of that.

That pushes the penalty onto the writer - which now has to wait until
replicas catch up. It has to pay this for every commit, even if actually
failing over to another node is unlikely.

It'd be better to just enable sync rep instead, or it would if we had
all-nodes sync rep.

IMO any waiting needs to be done on the other side, i.e. "Wait until I
am caught up before proceeding" rather than "wait for the other end to
catch up before returning".

Doing it the way you describe would make it nearly useless for enabling
client-side failover in BDR, where half the point is that it can deal
with high latency or intermittently available links to downstream replicas.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Fujii Masao
On Fri, Aug 8, 2014 at 9:50 AM, Craig Ringer  wrote:
> On 08/08/2014 03:54 AM, Tom Lane wrote:
>> Craig Ringer  writes:
>>> Hi all
>>> To support transparent client-side failover in BDR, it's necessary to
>>> know what the LSN of a node was at the time a transaction committed and
>>> keep track of that in the client/proxy.
>>
>>> I'm thinking about adding a new message type in the protocol that gets
>>> sent immediately before CommandComplete, containing the LSN of the
>>> commit. Clients would need to enable the sending of this message with a
>>> GUC that they set when they connect, so it doesn't confuse clients that
>>> aren't expecting it or aware of it.
>>
>> FWIW, I think it's a seriously bad idea to expose LSNs in the protocol
>> at all.   What happens five years from now when we switch to some other
>> implementation that doesn't have LSNs?
>
> Everyone who's relying on them already via pg_stat_replication, etc, breaks.
>
> They're _already_ exposed to users. That ship has sailed.
>
> That's not to say I'm stuck to LSNs as the way to do this, just that I
> don't think that particular argument is relevant.
>
>> I don't mind if you expose some other way to inquire about LSNs, but
>> let's *not* embed it into the wire protocol.  Not even as an option.
>
> Well, the underlying need here is to have the client able to keep track
> of what point in the server's time-line it must see on a replica before
> it proceeds to use that replica.
>
> So if the client does some work on server A, then for some reason needs
> to / must use server B, it can ask server B "are you replayed up to the
> last transaction I performed on server A yet?" and wait until it is.

ISTM that the proper solution to that problem is the introduction of
new synchronous replication mode which makes the transaction wait for
its WAL to be replayed by the standby. If this mode is used, a client
doesn't need to track the LSN of each transaction and check whether
the committed transaction has already replayed by the standby or not.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Minmax indexes

2014-08-07 Thread Josh Berkus
On 08/07/2014 05:52 PM, Michael Paquier wrote:
> On Fri, Aug 8, 2014 at 9:47 AM, Josh Berkus  wrote:
>> On 08/07/2014 08:38 AM, Oleg Bartunov wrote:
>>> +1 for BRIN !
>>>
>>> On Thu, Aug 7, 2014 at 6:16 PM, Simon Riggs  wrote:
 On 7 August 2014 14:53, Robert Haas  wrote:
 A better description would be "block range index" since we are
 indexing a range of blocks (not just one block). Perhaps a better one
 would be simply "range index", which we could abbreviate to RIN or
 BRIN.
>>
>> How about Block Range Dynamic indexes?
>>
>> Or Range Usage Metadata indexes?
>>
>> You see what I'm getting at:
>>
>> BRanDy
>>
>> RUM
>>
>> ... to keep with our "new indexes" naming scheme ...
> Not the best fit for kids, fine for grad students.

But, it goes perfectly with our GIN and VODKA indexes.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Craig Ringer
On 08/08/2014 09:02 AM, Tom Lane wrote:
> Craig Ringer  writes:
>> On 08/08/2014 03:54 AM, Tom Lane wrote:
>>> FWIW, I think it's a seriously bad idea to expose LSNs in the protocol
>>> at all.   What happens five years from now when we switch to some other
>>> implementation that doesn't have LSNs?
> 
>> Everyone who's relying on them already via pg_stat_replication, etc, breaks.
>> They're _already_ exposed to users. That ship has sailed.
> 
> They're exposed to replication tools, yeah, but embedding them in the
> wire protocol would be moving the goalposts a long way past that.  As an
> example of something that doubtless seemed like a good idea at the time,
> consider the business about how an INSERT command completion tag includes
> the OID of the inserted row.  We're stuck with that obsolete idea
> *forever* because it's embedded in the protocol for all clients.

That makes sense.


>> Well, I'd prefer to be able to negotiate with the server and establish
>> requirements during the protocol handshake.
> 
> Sure, but a GUC is not that.  The problem with a GUC for changing
> wire-level behavior is that it might be set by code far removed from
> the wire, possibly breaking lower code levels that expected different
> behavior.  The multitude of ways that we offer for setting GUCs is
> an active evil in this particular context.

I'd value your input and suggestions then.

I thought this is what PGC_BACKEND GUCs were for - set them in the
startup packet and you can't mess with them afterwards. I can see that
the ability of someone to cause that to be set in (e.g.) PGOPTIONS could
be a problem though.

AFAIK we don't _have_ a fancy negotiation system in the protocol, with
back-and-forth exchanges of capabilities information.

So is the appropriate thing to do here to set a non-GUC 'options' field
in the startup packet?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-07 Thread David Johnston
On Thu, Aug 7, 2014 at 5:53 PM, Tom Lane  wrote:

> David G Johnston  writes:
> > Tom Lane-2 wrote
> >> Surely that was meant to read "invalid number OF arguments".  The
> errhint
> >> is only charitably described as English, as well.  I'd suggest something
> >> like "Arguments of json_build_object() must be pairs of keys and
> values."
> >> --- but maybe someone else can phrase that better.
>
> > The user documentation is worth emulating here:
> > http://www.postgresql.org/docs/9.4/interactive/functions-json.html
>
> > errmsg("argument count must be divisible by 2")
> > errhint("The argument list consists of alternating names and values")
>
> Seems reasonable to me.
>
> > Note that I s/keys/names/ to match said documentation.
>
> Hm.  The docs aren't too consistent either: there are several other nearby
> places that say "keys".  Notably, the functions json[b]_object_keys() have
> that usage embedded in their names, where we can't readily change it.
>
> I'm inclined to think we should s/names/keys/ in the docs instead.
> Thoughts?
>
>
Agreed - have the docs match the common API term usage in our
implementation.

Not sure it's worth a thorough hunt, but at least fix them as they are
noticed.

David J.


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Craig Ringer
On 08/07/2014 11:42 PM, Robert Haas wrote:
> I doubt whether it makes sense to do this without a broader
> understanding of how the client-side failover mechanism would work.
> If we're going to add something like this, it should include libpq
> support for actually doing something useful with it.

I'm currently interested in targeting PgBouncer and PgJDBC, not libpq,
though I can see that exposing helpers for it in libpq could be useful.

The goal here is to permit a client to safely switch from one server to
another - either in a multimaster async replication system like BDR, or
routing read-only queries to hot standbys with streaming replication -
and know for sure that its last commit is visible on the server it is
now connected to.

For hot standby that means it can avoid running queries that won't see
the latest work it did if the standby is lagging - deciding to run them
on the upstream instead, or wait, as appropriate.

For BDR it'll permit the client to safely perform transparent failover
to another node and resume write operations without risking conflicts
with its own prior transactions. (I wrote some explanations about this
on -general in the thread here:
http://www.postgresql.org/message-id/84184aef-887d-49df-8f47-6377b1d6e...@gmail.com
).


Broadly, what I'm thinking of is:

* Whenever a client issues a transaction that gets a txid assigned, and
that tx commits, the server reports the LSN that includes the commit.

* The client keeps track of which server it is connected to using the
server's (sysid, timelineid, databaseoid) or a similar identifier -
probably specific to the replication protocol in use, unless something
generic proves practical.

* When the client has to switch to a new server or chooses to do so, it
checks pg_stat_replication or pg_replication_slots, finds the server it
was previously connected to, and checks to see if the new server has
replayed up to the last write transaction this client performed on the
previous server. If not, it can make a policy-driven decision: wait
until replay catchup, wait for a while then bail out, etc.
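
For the plain streaming-replication case, that last wait step could look
roughly like the sketch below (9.4-era function names; it assumes the
captured LSN arrives as a text string, and a logical/BDR setup would need
to ask a different question, e.g. against pg_replication_slots or replayed
commit timestamps):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <libpq-fe.h>

/* Has the (physical) standby replayed past the LSN we captured elsewhere? */
static bool
caught_up(PGconn *conn, const char *lsn)
{
	const char *params[1] = {lsn};
	PGresult   *res;
	bool		ok = false;

	res = PQexecParams(conn,
					   "SELECT pg_last_xlog_replay_location() >= $1::pg_lsn",
					   1, NULL, params, NULL, NULL, 0);
	/* a NULL result (not a standby) comes back empty and counts as "no" */
	if (PQresultStatus(res) == PGRES_TUPLES_OK)
		ok = (strcmp(PQgetvalue(res, 0, 0), "t") == 0);
	PQclear(res);
	return ok;
}

/* Poll until the new server has caught up, or give up after max_seconds. */
static bool
wait_for_replay(PGconn *conn, const char *lsn, int max_seconds)
{
	int			i;

	for (i = 0; i < max_seconds; i++)
	{
		if (caught_up(conn, lsn))
			return true;
		sleep(1);
	}
	return false;				/* policy decision: bail out, try another node, ... */
}

int
main(int argc, char **argv)
{
	/* conninfo for the candidate new server comes from the environment */
	PGconn	   *conn = PQconnectdb("");
	const char *lsn = (argc > 1) ? argv[1] : "0/0";

	if (PQstatus(conn) == CONNECTION_OK && wait_for_replay(conn, lsn, 30))
		printf("caught up; safe to resume work here\n");
	else
		printf("not caught up; wait longer or pick another node\n");
	PQfinish(conn);
	return 0;
}

The same shape would work with commit timestamps instead of LSNs; only the
query and the captured value change.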

This is admittedly all a bit hand-wavey. I'm looking at ways to do it,
not a firm implementation plan.

Notably, the LSN (and timelineID) aren't the only way to keep track of
the replay progress of a server and check it from another server. If the
commit timestamps work is merged and the timestamp of the last replayed
commit record is exposed in pg_replication_slots, the client could use
the server-reported commit timestamp to the same effect.


In the above you'll note that the client has to make some choices. The
client might be picking different servers for failover, read load
spreading, or other things I haven't thought of. It might be retaining
the old connection and making new ones it wants to be consistent up to a
certain point on the old connection (read spreading), or it might be
dropping the old connection and making a new one (failover). If the new
server to connect to isn't caught up yet it might want to wait
indefinitely, wait a short while, or bail out immediately and try a
different server. There's a lot of client/application specific policy
going to be involved here, so I'm not sure it makes sense to try to make
it transparent in libpq. I can see it being useful to expose some tools
in libpq for it, without a doubt, so clients can do these sorts of
things usefully.

(There's also another whole new question: how do you pick which
alternative server to connect to? But that's not really within the scope
of this.)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Tom Lane
Craig Ringer  writes:
> On 08/08/2014 03:54 AM, Tom Lane wrote:
>> FWIW, I think it's a seriously bad idea to expose LSNs in the protocol
>> at all.   What happens five years from now when we switch to some other
>> implementation that doesn't have LSNs?

> Everyone who's relying on them already via pg_stat_replication, etc, breaks.
> They're _already_ exposed to users. That ship has sailed.

They're exposed to replication tools, yeah, but embedding them in the
wire protocol would be moving the goalposts a long way past that.  As an
example of something that doubtless seemed like a good idea at the time,
consider the business about how an INSERT command completion tag includes
the OID of the inserted row.  We're stuck with that obsolete idea
*forever* because it's embedded in the protocol for all clients.

>> This position also obviates the need to complain about having a GUC
>> that changes the protocol-level behavior, which is also a seriously
>> bad idea.

> Well, I'd prefer to be able to negotiate with the server and establish
> requirements during the protocol handshake.

Sure, but a GUC is not that.  The problem with a GUC for changing
wire-level behavior is that it might be set by code far removed from
the wire, possibly breaking lower code levels that expected different
behavior.  The multitude of ways that we offer for setting GUCs is
an active evil in this particular context.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 7:58 AM, Robert Haas  wrote:
> range index might get confused with range types; block range index
> seems better.  I like summary, but I'm fine with block range index or
> block filter index, too.

+1


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-07 Thread Tom Lane
David G Johnston  writes:
> Tom Lane-2 wrote
>> Surely that was meant to read "invalid number OF arguments".  The errhint
>> is only charitably described as English, as well.  I'd suggest something
>> like "Arguments of json_build_object() must be pairs of keys and values."
>> --- but maybe someone else can phrase that better.

> The user documentation is worth emulating here:
> http://www.postgresql.org/docs/9.4/interactive/functions-json.html

> errmsg("argument count must be divisible by 2")
> errhint("The argument list consists of alternating names and values")

Seems reasonable to me.

> Note that I s/keys/names/ to match said documentation.

Hm.  The docs aren't too consistent either: there are several other nearby
places that say "keys".  Notably, the functions json[b]_object_keys() have
that usage embedded in their names, where we can't readily change it.

I'm inclined to think we should s/names/keys/ in the docs instead.
Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Michael Paquier
On Fri, Aug 8, 2014 at 9:47 AM, Josh Berkus  wrote:
> On 08/07/2014 08:38 AM, Oleg Bartunov wrote:
>> +1 for BRIN !
>>
>> On Thu, Aug 7, 2014 at 6:16 PM, Simon Riggs  wrote:
>>> On 7 August 2014 14:53, Robert Haas  wrote:
>>> A better description would be "block range index" since we are
>>> indexing a range of blocks (not just one block). Perhaps a better one
>>> would be simply "range index", which we could abbreviate to RIN or
>>> BRIN.
>
> How about Block Range Dynamic indexes?
>
> Or Range Usage Metadata indexes?
>
> You see what I'm getting at:
>
> BRanDy
>
> RUM
>
> ... to keep with our "new indexes" naming scheme ...
Not the best fit for kids, fine for grad students.

BRIN seems to be a perfect consensus, so +1 for it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Craig Ringer
On 08/08/2014 03:54 AM, Tom Lane wrote:
> Craig Ringer  writes:
>> Hi all
>> To support transparent client-side failover in BDR, it's necessary to
>> know what the LSN of a node was at the time a transaction committed and
>> keep track of that in the client/proxy.
> 
>> I'm thinking about adding a new message type in the protocol that gets
>> sent immediately before CommandComplete, containing the LSN of the
>> commit. Clients would need to enable the sending of this message with a
>> GUC that they set when they connect, so it doesn't confuse clients that
>> aren't expecting it or aware of it.
> 
> FWIW, I think it's a seriously bad idea to expose LSNs in the protocol
> at all.   What happens five years from now when we switch to some other
> implementation that doesn't have LSNs?

Everyone who's relying on them already via pg_stat_replication, etc, breaks.

They're _already_ exposed to users. That ship has sailed.

That's not to say I'm stuck to LSNs as the way to do this, just that I
don't think that particular argument is relevant.

> I don't mind if you expose some other way to inquire about LSNs, but
> let's *not* embed it into the wire protocol.  Not even as an option.

Well, the underlying need here is to have the client able to keep track
of what point in the server's time-line it must see on a replica before
it proceeds to use that replica.

So if the client does some work on server A, then for some reason needs
to / must use server B, it can ask server B "are you replayed up to the
last transaction I performed on server A yet?" and wait until it is.

That's useful for streaming replication (to permit consistent queries
against read replicas) but it's much more so for BDR, where it's
necessary to avoid a variety of multi-master replication anomalies and
conflicts.

I considered LSNs to be the logical mechanism for this as they're
already user-visible, exposed in pg_stat_replication, they can already
be used for just this purpose by hand (just with an extra round-trip), etc.
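
For instance, the by-hand version available today looks roughly like this
(a sketch; the LSN and the standby name are made up, and
pg_current_xlog_location() slightly over-approximates the commit LSN):

    -- on server A, straight after COMMIT (the extra round-trip):
    SELECT pg_current_xlog_location();

    -- later, to see whether standby 'nodeB' has replayed past that point:
    SELECT application_name,
           replay_location >= '16/B374D848'::pg_lsn AS caught_up
      FROM pg_stat_replication
     WHERE application_name = 'nodeB';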

An obvious alternative is to merge the commit timestamp work, then
expose the timestamp of the last commit replayed in pg_stat_replication.
Then all the client needs to keep track of is the server time of the
last commit.

> This position also obviates the need to complain about having a GUC
> that changes the protocol-level behavior, which is also a seriously
> bad idea.

Well, I'd prefer to be able to negotiate with the server and establish
requirements during the protocol handshake.

As far as I know there isn't an explicit protocol negotiation with
capabilities fields (just a plain version field), but we do have the
startup packet's 'options' field. So I was thinking that requesting the
setting of a PGC_BACKEND GUC in the startup packet would be a logical
way for the client to request use of a protocol extension.
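
(In other words, something along the lines of the hypothetical

    psql "host=nodeA options='-c report_commit_lsn=on'"

where report_commit_lsn is a made-up name for the PGC_BACKEND GUC in
question, passed through the startup packet's 'options' field.)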

Looking at ProcessStartupPacket(...) in postmaster.c I see that there's
room for special-casing options. Do you think it would be more
appropriate to add a new connection option that's sent by a client to
request reporting of commit timestamps / LSNs / whatever by the server
at commit time?

If not, do you have an alternative suggestion? I can't imagine that
extending the CommandComplete message is a desirable option.

It seems like it'd be useful to expose this as a read-only GUC anyway,
so I don't really see why a PGC_BACKEND GUC isn't exactly the right
thing to use for this, but I'm happy to listen to suggestions.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Josh Berkus
On 08/07/2014 08:38 AM, Oleg Bartunov wrote:
> +1 for BRIN !
> 
> On Thu, Aug 7, 2014 at 6:16 PM, Simon Riggs  wrote:
>> On 7 August 2014 14:53, Robert Haas  wrote:
>> A better description would be "block range index" since we are
>> indexing a range of blocks (not just one block). Perhaps a better one
>> would be simply "range index", which we could abbreviate to RIN or
>> BRIN.

How about Block Range Dynamic indexes?

Or Range Usage Metadata indexes?

You see what I'm getting at:

BRanDy

RUM

... to keep with our "new indexes" naming scheme ...


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quick doc fix

2014-08-07 Thread Tom Lane
Guillaume Lelarge  writes:
> Still translating the 9.4 manual, and found another typo. Patch attached.

Applied, thanks!

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-07 Thread Josh Berkus
On 08/07/2014 04:48 PM, Tom Lane wrote:
> plpgsql is not efficient at all about coercions performed as a side
> effect of assignments; if memory serves, it always handles them by
> converting to text and back.  So basically the added cost here came
> from float8out() and float4in().  There has been some talk of trying
> to do such coercions via SQL casts, but nothing's been done for fear
> of compatibility problems.

Yeah, that's a weeks-long project for someone.  And would require a lot
of tests ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-07 Thread Tom Lane
James Cloos  writes:
> "ST" == Shaun Thomas  writes:
> ST> That said, the documentation here says FLOAT4 is an alias for REAL,
> ST> so it's somewhat nonintuitive for FLOAT4 to be so much slower than
> ST> FLOAT8, which is an alias for DOUBLE PRECISION.

> There are some versions of glibc where doing certain math on double is
> faster than doing it on float, depending on how things are compiled.
> Maybe this is one of them?

No, it isn't.  The problem here is that the result of SQRT() is
float8 (a/k/a double precision) while the variable that it is to
be assigned to is float4 (a/k/a real).  As was noted upthread,
changing the variable's declared type to eliminate the run-time
type coercion removes just about all the discrepancy between PG
and Oracle runtimes.  The original comparison is not apples-to-apples
because the Oracle coding required no type coercions.  (Or at least,
so I assume; I'm not too familiar with Oracle's math functions.)

plpgsql is not efficient at all about coercions performed as a side
effect of assignments; if memory serves, it always handles them by
converting to text and back.  So basically the added cost here came
from float8out() and float4in().  There has been some talk of trying
to do such coercions via SQL casts, but nothing's been done for fear
of compatibility problems.
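
For anyone who wants to reproduce the effect, a minimal sketch (not the
original poster's benchmark) is:

    -- slow: sqrt() returns float8, each assignment coerces to float4 via text
    DO $$
    DECLARE r real;
    BEGIN
      FOR i IN 1..1000000 LOOP
        r := sqrt(i);
      END LOOP;
    END $$;

    -- fast: the declared type matches, so no coercion on assignment
    DO $$
    DECLARE r double precision;
    BEGIN
      FOR i IN 1..1000000 LOOP
        r := sqrt(i);
      END LOOP;
    END $$;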

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Hokey wrong versions of libpq in apt.postgresql.org

2014-08-07 Thread Joshua D. Drake


Hello,

I know this has been brought up before:

http://www.postgresql.org/message-id/20140724080902.ga28...@msg.df7cb.de

But this is just plain wrong. I don't care that the FAQ (on the wiki) 
says we are doing it wrong for good reasons. When I (or anyone else) 
pulls postgresql-$version-dev, I want the libpq for my version. I do not 
want 9.3.


Yes, it "should" (because of protocol compatibility) work but it doesn't 
always (as stated in that email and in a similar problem we just ran into).


There can be unintended circumstances on machines when you mix and match 
like that. Can we please do some proper packaging on this?


Sincerely,

Joshua D. Drake
--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
"If we send our children to Caesar for their education, we should
 not be surprised when they come back as Romans."


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-07 Thread Jerry Sievers
Alvaro Herrera  writes:

> Fabrízio de Royes Mello wrote:
>
>> On Tue, Aug 5, 2014 at 10:55 PM, Tom Lane  wrote:
>> 
>> > Josh Berkus  writes:
>> > > BTW, while there's unlikely to be a good reason to put search_path in
>> > > pg.conf with appends, there are a LOT of reasons to want to be able to
>> > > append to it during a session.
>> >
>> > [shrug...]  You can do that today with current_setting()/set_config().
>> 
>> With a very simple statement you can do that:
>
> Of course, this doesn't solve the requirement that started this thread,
> which is about having "includeable" pg.conf fragments to enable
> extensions.
ISTM the idea of a token in the value string that would expand to an
existing setting should be general purpose enough to allow for prepend/append
and not require adding a new operator such as += or whatever.

I say this without knowing exactly what the implementation effort
is, but just to reiterate the original intent.

I think someone already suggested this upthread.

shared_preload_libraries = '%,more_libs'
shared_preload_libraries = 'more_libs,%'

At the conclusion of file processing, stripping off an unnecessary delimiter
at the beginning or end of the string would be a nice aesthetic feature and/or
might be required depending on whether an empty list value is legal.

Thanks

> -- 
> Álvaro Herrerahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 2:34 PM, Rod Taylor  wrote:
> This one is frequently sorted as batch operations against the files are
> performed in alphabetical order to reduce conflict issues that a random
> ordering may cause between jobs.

Sure. There are cases out there. But, again, I have a hard time
imagining why you'd expect those to be pre-sorted in practice, and
particularly why you'd feel justified in expecting that to sort much
faster than equivalent though slightly imperfectly correlated data.
Without that, the fmgr-elision aspect of sort support appears to offer
enough for us to still win on balance [1], assuming 9.4 is our basis
of comparison.

[1] 
http://www.postgresql.org/message-id/cam3swzqhjxiyrsqbs5w3u-vtj_jt2hp8o02big5wyb4m9lp...@mail.gmail.com
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Rod Taylor
Sigh. Found another example.

A table with 15 million entries and a unique key on filesystem location for
things users created via a web interface.

Entries all begin with /usr/home/ ...

This one is frequently sorted as batch operations against the files are
performed in alphabetical order to reduce conflict issues that a random
ordering may cause between jobs.

regards,

Rod




On Thu, Aug 7, 2014 at 5:23 PM, Rod Taylor  wrote:

>
> On Thu, Aug 7, 2014 at 3:06 PM, Peter Geoghegan  wrote:
>
>> I think that pre-sorted, all-unique text datums, that have all
>> differences beyond the first 8 bytes, that the user happens to
>> actually want to sort are fairly rare.
>
>
> While I'm sure it's not common, I've seen a couple of ten-million tuple
> tables having a URL column as primary key where 98% of the entries begin
> with 'http://www.'
>
> So, that exact scenario is out there.
>


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 2:23 PM, Rod Taylor  wrote:
> While I'm sure it's not common, I've seen a couple of ten-million tuple
> tables having a URL column as primary key where 98% of the entries begin
> with 'http://www.'
>
> So, that exact scenario is out there.

Sure, that scenario is relatively common. My point was that that
scenario, alongside a perfect logical/physical heap correlation, and
alongside a frequent need to sort is uncommon. If you're able to get
away with a "bubble sort best case" there, then the sort is going to
be extremely fast relative to any other database system anyway, and
you don't have too much to complain about. It's exactly the same
wasted cost as in the general case (where the tuples aren't in order),
except that it looks more expensive next to your unrealistically fast
sort that you were very lucky to be able to get away with.

I think the special pre-check for already sorted tuples within qsort()
is at best only justified by scenarios like where a serial primary key
index is reindexed. That's about it.

You can totally alter the outcome of this case by inserting a single
tuple, so the top-level pre-check fails.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-07 Thread David G Johnston
Tom Lane-2 wrote
> Robert Haas  writes:
>> On Sat, Aug 2, 2014 at 9:15 AM, Daniele Varrazzo
>>  wrote:
>>> I'd definitely replace /arg/argument/. Furthermore I'd avoid the form
>>> "argument 1: something is wrong": the string is likely to end up in
>>> sentences with other colons so I'd rather use "something is wrong at
>>> argument 1".
>>> 
>>> Is the patch attached better?
> 
>> Looks OK to me.  I thought someone else might comment, but since no
>> one has, committed.
> 
> It looks to me like this is still wrong:
> 
> if (nargs % 2 != 0)
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> -errmsg("invalid number or arguments: object must be
> matched key value pairs")));
> +errmsg("invalid number or arguments"),
> +errhint("Object must be matched key value pairs.")));
> 
> Surely that was meant to read "invalid number OF arguments".  The errhint
> is only charitably described as English, as well.  I'd suggest something
> like "Arguments of json_build_object() must be pairs of keys and values."
> --- but maybe someone else can phrase that better.

The user documentation is worth emulating here:

http://www.postgresql.org/docs/9.4/interactive/functions-json.html


errmsg("argument count must be divisible by 2")
errhint("The argument list consists of alternating names and values")

Note that I s/keys/names/ to match said documentation.
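
For reference, the call shape the message is describing, with names/keys
alternating with values (a simple example):

    SELECT json_build_object('name', 'Bob', 'age', 42);
    -- {"name" : "Bob", "age" : 42}

    SELECT json_build_object('name', 'Bob', 'age');
    -- raises the error under discussion: the argument count is odd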

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Fixed-redundant-i18n-strings-in-json-tp5813330p5814122.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Rod Taylor
On Thu, Aug 7, 2014 at 3:06 PM, Peter Geoghegan  wrote:

> I think that pre-sorted, all-unique text datums, that have all
> differences beyond the first 8 bytes, that the user happens to
> actually want to sort are fairly rare.


While I'm sure it's not common, I've seen a couple of ten-million tuple
tables having a URL column as primary key where 98% of the entries begin
with 'http://www.'

So, that exact scenario is out there.


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 12:06 PM, Peter Geoghegan  wrote:
> I think that pre-sorted, all-unique text datums, that have all
> differences beyond the first 8 bytes, that the user happens to
> actually want to sort are fairly rare.

Actually, you could use that case to justify not doing strxfrm()
transformation of very large lists in the more general case, where
strxfrm() blobs aren't truncated/abbreviated, but our qsort() is used,
which goes against the strong recommendation of, just to pick an
example at random, the glibc documentation. It states: "Here is an
example of how you can use strxfrm when you plan to do many
comparisons. It does the same thing as the previous example, but much
faster, because it has to transform each string only once, no matter
how many times it is compared with other strings. Even the time needed
to allocate and free storage is much less than the time we save, when
there are many strings." [1]

Sure, that would still be better than the equivalent abbreviated keys
case, since no strcoll() tie-breakers are required, but it would still
be bad, and all because of our pre-check for sorted input.
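
For reference, the pattern the glibc manual is recommending there is roughly
the following (a sketch, with error handling omitted):

    #include <stdlib.h>
    #include <string.h>

    typedef struct { const char *orig; char *blob; } Item;

    /* transform each string exactly once */
    static void transform(Item *it)
    {
        size_t n = strxfrm(NULL, it->orig, 0) + 1;  /* size incl. terminator */

        it->blob = malloc(n);
        strxfrm(it->blob, it->orig, n);
    }

    /* then every comparison is a cheap strcmp() on the blobs */
    static int cmp_items(const void *a, const void *b)
    {
        return strcmp(((const Item *) a)->blob, ((const Item *) b)->blob);
    }

after which qsort(items, nitems, sizeof(Item), cmp_items) never has to call
strcoll() at all.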

[1] https://www.gnu.org/software/libc/manual/html_node/Collation-Functions.html
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Quick doc fix

2014-08-07 Thread Guillaume Lelarge
Hi,

Still translating the 9.4 manual, and found another typo. Patch attached.

Thanks.


-- 
Guillaume.
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index d3fcb82..cf174f0 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -528,7 +528,7 @@
 the relfrozenxid column of a table's
 pg_class row contains the freeze cutoff XID that was used
 by the last whole-table VACUUM for that table.  All rows
-inserted by transactions with XIDs XIDs older than this cutoff XID are
+inserted by transactions with XIDs older than this cutoff XID are
 guaranteed to have been frozen.  Similarly,
 the datfrozenxid column of a database's
 pg_database row is a lower bound on the unfrozen XIDs

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 12:17 PM, Robert Haas  wrote:
> Gah.  Hit send to soon.  Also, as much as I'd prefer to avoid
> relitigating the absolutely stupid debate about how to expand the
> buffers, this is no good:
>
> +   tss->buflen1 = Max(len1 + 1, tss->buflen1 * 2);
>
> If the first expansion is for a string >512MB and the second string is
> longer than the first, this will exceed MaxAllocSize and error out.

Fair point. I think this problem is already present in a few existing
places, but it shouldn't be. I suggest this remediation:

> +   tss->buflen1 = Max(len1 + 1, Min(tss->buflen1 * 2, (int) 
> MaxAllocSize));

I too would very much prefer to not repeat that debate.  :-)
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Tom Lane
Craig Ringer  writes:
> Hi all
> To support transparent client-side failover in BDR, it's necessary to
> know what the LSN of a node was at the time a transaction committed and
> keep track of that in the client/proxy.

> I'm thinking about adding a new message type in the protocol that gets
> sent immediately before CommandComplete, containing the LSN of the
> commit. Clients would need to enable the sending of this message with a
> GUC that they set when they connect, so it doesn't confuse clients that
> aren't expecting it or aware of it.

FWIW, I think it's a seriously bad idea to expose LSNs in the protocol
at all.   What happens five years from now when we switch to some other
implementation that doesn't have LSNs?

I don't mind if you expose some other way to inquire about LSNs, but
let's *not* embed it into the wire protocol.  Not even as an option.

This position also obviates the need to complain about having a GUC
that changes the protocol-level behavior, which is also a seriously
bad idea.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 12:15 PM, Robert Haas  wrote:
> In my original patch, I wrote NUL, as in the NUL character.  You've
> changed it to NULL, but the original was correct.  NULL is a pointer
> value that is not relevant here; the character with value 0 is NUL.

"NULL-terminated string" seems like acceptable usage (e.g. [1]), but
I'll try to use the term NUL in reference to '\0' in the future to
avoid confusion.

[1] https://en.wikipedia.org/wiki/Null-terminated_string
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: show only failed queries

2014-08-07 Thread Pavel Stehule
2014-08-07 7:10 GMT+02:00 Fujii Masao :

> On Thu, Aug 7, 2014 at 6:26 AM, Pavel Stehule 
> wrote:
> > Hello
> >
> > updated version patch in attachment
>
> Thanks! But ISTM you forgot to attached the patch.
>

grr .. I am sorry


>
> >> +/* all psql known variables are included in list by default */
> >> +for (known_varname = known_varnames; *known_varname;
> known_varname++)
> >> +varnames[nvars++] = pg_strdup(*known_varname);
> >>
> >> Don't we need to append both prefix and suffix to even known variables?
> >
> >
> > ??? I am not sure if I understand well - probably system "read only"
> > variables as DBNAME, USER, VERSION should not be there
>
> I had that question because complete_from_variables() is also called by the
> tab-completion of "\echo :" and it shows half-baked variables list. So
> I thought probably we need to change complete_from_variables() more to
> fix the problem.
>

I understand now.

I fixed it.

I have a question: \echo probably should not show empty known variables.

The data for autocompleting \echo should be the same as the result of "\set".



>
> >> +else if (strcmp(prev2_wd, "\\set") == 0)
> >> +{
> >> +if (strcmp(prev_wd, "AUTOCOMMIT") == 0)
> >>
> >> ISTM that some psql variables like IGNOREEOF are not there. Why not?
> >
> >
> > yes, there are not complete for DBNAME, ENCODING, FETCH_COUNT,
> HISTCONTROL,
> > HISTFILE, HISTSIZE, HOST, IGNOREEOFF, PROMPT*,USER,  VERSION
> >
> > There are more reasons:
> >
> > * paremeter is not a enum (string, number or both): FETCH_COUNT, PROMPT,
> > HISTSIZE, ..
> >
> > * variable is pseudo read only  - change has not any effect: DBNAME,
> > ENCODING, VERSION
>
> So HISTCONTROL should be there because it doesn't have such reasons at all?
>
>
yes

Pavel


> Regards,
>
> --
> Fujii Masao
>
commit bacfdc0e03219265aeee34b78f7ec9d272d49f72
Author: Pavel Stehule 
Date:   Wed Aug 6 23:05:42 2014 +0200

warnings and typo fixed

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 24e60b7..b94ed63 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3608,6 +3608,79 @@ psql_completion(const char *text, int start, int end)
 	{
 		matches = complete_from_variables(text, "", "");
 	}
+	else if (strcmp(prev2_wd, "\\set") == 0)
+	{
+		if (strcmp(prev_wd, "AUTOCOMMIT") == 0)
+		{
+			static const char *const my_list[] =
+			{"on", "off", "interactive", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "COMP_KEYWORD_CASE") == 0)
+		{
+			static const char *const my_list[] =
+			{"lower", "upper", "preserve-lower", "preserve-upper", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "ECHO") == 0)
+		{
+			static const char *const my_list[] =
+			{"none", "errors", "queries", "all", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "ECHO_HIDDEN") == 0)
+		{
+			static const char *const my_list[] =
+			{"noexec", "off", "on", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "ON_ERROR_ROLLBACK") == 0)
+		{
+			static const char *const my_list[] =
+			{"on", "off", "interactive", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "ON_ERROR_STOP") == 0)
+		{
+			static const char *const my_list[] =
+			{"on", "off", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "QUIET") == 0)
+		{
+			static const char *const my_list[] =
+			{"on", "off", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "SINGLELINE") == 0)
+		{
+			static const char *const my_list[] =
+			{"on", "off", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "SINGLESTEP") == 0)
+		{
+			static const char *const my_list[] =
+			{"on", "off", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "VERBOSITY") == 0)
+		{
+			static const char *const my_list[] =
+			{"default", "verbose", "terse", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+	}
 	else if (strcmp(prev_wd, "\\sf") == 0 || strcmp(prev_wd, "\\sf+") == 0)
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_functions, NULL);
 	else if (strcmp(prev_wd, "\\cd") == 0 ||
@@ -4062,28 +4135,61 @@ complete_from_variables(const char *text, const char *prefix, const char *suffix
 {
 	char	  **matches;
 	char	  **varnames;
+	static const char **known_varname;
 	int			nvars = 0;
 	int			maxvars = 100;
 	int			i;
 	struct _variable *ptr;
 
+	static const char *known_varnames[] = {
+		"AUTOCOMMIT", "COMP_KEYWORD_CASE", "DBNAME", "ECHO", "ECHO_HIDDEN", "ENCODING",
+		"FETCH_COUNT", "HISTCONTROL", "HISTFILE", "HISTSIZE", "HOST", "IGNOREEOFF",
+		"LASTOID", "ON_ERROR_ROLLBACK", "ON_ERROR_STOP", "PORT", "PROMPT1", "PROMPT2",
+		"PROMPT3", "QUIET", "SINGLELINE", "SINGLESTEP", "USER", "VERBOSITY",
+		NULL
+	};
+
 	varnames = (char **) pg_malloc((maxvars + 1) * sizeof(char *));
 
+	/* all psql known variables are included in list by default

Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-07 Thread Tom Lane
Robert Haas  writes:
> On Sat, Aug 2, 2014 at 9:15 AM, Daniele Varrazzo
>  wrote:
>> I'd definitely replace /arg/argument/. Furthermore I'd avoid the form
>> "argument 1: something is wrong": the string is likely to end up in
>> sentences with other colons so I'd rather use "something is wrong at
>> argument 1".
>> 
>> Is the patch attached better?

> Looks OK to me.  I thought someone else might comment, but since no
> one has, committed.

It looks to me like this is still wrong:

if (nargs % 2 != 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-errmsg("invalid number or arguments: object must be matched 
key value pairs")));
+errmsg("invalid number or arguments"),
+errhint("Object must be matched key value pairs.")));

Surely that was meant to read "invalid number OF arguments".  The errhint
is only charitably described as English, as well.  I'd suggest something
like "Arguments of json_build_object() must be pairs of keys and values."
--- but maybe someone else can phrase that better.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-07 Thread Robert Haas
On Thu, Aug 7, 2014 at 3:15 PM, Robert Haas  wrote:
> On Wed, Aug 6, 2014 at 7:18 PM, Peter Geoghegan  wrote:
>> On Wed, Aug 6, 2014 at 1:11 PM, Robert Haas  wrote:
>>> I've committed the patch I posted yesterday.  I did not see a good
>>> reason to meld that together in a single commit with the first of the
>>> patches you posted; I'll leave you to revise that patch to conform
>>> with this approach.
>>
>> Okay. Attached is the same patch set, rebased on top on your commit
>> with appropriate amendments.
>
> Two things:
>
> +* result.  Also, since strxfrm()/strcoll() require
> NULL-terminated inputs,
>
> In my original patch, I wrote NUL, as in the NUL character.  You've
> changed it to NULL, but the original was correct.  NULL is a pointer
> value that is not relevant here; the character with value 0 is NUL.

Gah.  Hit send to soon.  Also, as much as I'd prefer to avoid
relitigating the absolutely stupid debate about how to expand the
buffers, this is no good:

+   tss->buflen1 = Max(len1 + 1, tss->buflen1 * 2);

If the first expansion is for a string >512MB and the second string is
longer than the first, this will exceed MaxAllocSize and error out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-07 Thread Robert Haas
On Wed, Aug 6, 2014 at 7:18 PM, Peter Geoghegan  wrote:
> On Wed, Aug 6, 2014 at 1:11 PM, Robert Haas  wrote:
>> I've committed the patch I posted yesterday.  I did not see a good
>> reason to meld that together in a single commit with the first of the
>> patches you posted; I'll leave you to revise that patch to conform
>> with this approach.
>
> Okay. Attached is the same patch set, rebased on top on your commit
> with appropriate amendments.

Two things:

+* result.  Also, since strxfrm()/strcoll() require
NULL-terminated inputs,

In my original patch, I wrote NUL, as in the NUL character.  You've
changed it to NULL, but the original was correct.  NULL is a pointer
value that is not relevant here; the character with value 0 is NUL.


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 11:33 AM, Robert Haas  wrote:
> I think that's actually not a very unrealistic case at all.  In
> general, I think that if a particular data distribution is a
> reasonable scenario, that data distribution plus "it's already sorted"
> is also reasonable.  Data gets presorted in all kinds of ways:
> sometimes it gets loaded in sorted (or nearly-sorted) order from
> another data source; sometimes we do an index scan that produces data
> in order by column A and then later sort by A, B; sometimes somebody
> clusters the table.  Respecting the right of other people to have
> different opinions, I don't think I'll ever be prepared to concede the
> presorted case as either uncommon or unimportant.

I think that pre-sorted, all-unique text datums, that have all
differences beyond the first 8 bytes, that the user happens to
actually want to sort are fairly rare. All I'm saying is: if that's a
case that has to lose out more than your more limited case in order to
get a large number of representative cases 3 - 7 times faster, and
marginal cases a good bit faster, that's probably worth it. I am
willing to fix any case that I can - as you say, it's often easier to
write the code than to argue - but let's have a sense of proportion
about these things. Even your quite reasonable case is going to lose
out a bit, because what I've done is fundamentally about deciding at
intervals during copying tuples if it's worth it. If it isn't, then
clearly that whole process went to waste, and we've merely cut our
losses. We could add a GUC to control it as suggested by Simon, or
even put something in attoptions per attribute to indicate that the
optimization should be avoided, but that's something that we've
historically resisted doing.

> That's not to prejudge anything that may or may not be in your patch,
> which I have not studied in enormous detail.  It's just what I think
> about the subject in general.

Fair enough. At the risk of sounding trite, I'll add: everything is a
trade-off.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-08-07 Thread Bruce Momjian
On Tue, Aug  5, 2014 at 07:31:21PM -0400, Bruce Momjian wrote:
> On Thu, Jul 10, 2014 at 06:38:26PM -0400, Bruce Momjian wrote:
> > On Thu, Jul 10, 2014 at 06:17:14PM -0400, Bruce Momjian wrote:
> > > Well, we are going to need to call internal C functions, often bypassing
> > > their typical call sites and the assumption about locking, etc.  Perhaps
> > > this could be done from a plpgsql function.  We could add and drop a
> > > dummy column to force TOAST table creation --- we would then only need a
> > > way to detect if a function _needs_ a TOAST table, which was skipped in
> > > binary upgrade mode previously.
> > > 
> > > That might be a minimalistic approach.
> > 
> > I have thought some more on this.  I thought I would need to open
> > pg_class in C and do complex backend stuff, but I now realize I can do
> > it from libpq, and just call ALTER TABLE and I think that always
> > auto-checks if a TOAST table is needed.  All I have to do is query
> > pg_class from libpq, then construct ALTER TABLE commands for each item,
> > and it will optionally create the TOAST table if needed.  I just have to
> > use a no-op ALTER TABLE command, like SET STATISTICS.
> 
> Attached is a completed patch which handles oid conflicts in pg_class
> and pg_type for TOAST tables that were not needed in the old cluster but
> are needed in the new one.  I was able to recreate a failure case and
> this fixed it.
> 
> The patch need to be backpatched because I am getting more-frequent bug
> reports from users using pg_upgrade to leave now-end-of-life'ed 8.4. 
> There is not a good work-around for pg_upgrade failures without this
> fix, but at least pg_upgrade throws an error.

Patch applied through 9.3, with an additional Assert check. 9.2 code was
different enough that there was too high a risk for backpatching.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Robert Haas
On Thu, Aug 7, 2014 at 1:16 PM, Peter Geoghegan  wrote:
> On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas  wrote:
>> So here.  You may not agree that the mitigation strategies for which
>> others are asking for are worthwhile, but you can't expect everyone
>> else to agree with your assessment of which cases are likely to occur
>> in practice.  The case of a cohort of strings to be sorted which share
>> a long fixed prefix and have different stuff at the end does not seem
>> particularly pathological to me.  It doesn't, in other words, require
>> an adversary: some real-world data sets will look like that.  I will
>> forebear repeating examples I've given before, but I've seen that kind
>> of thing more than once in real data sets that people (well, me,
>> anyway) actually wanted to put into a PostgreSQL database.  So I'm
>> personally of the opinion that the time you've put into trying to
>> protect against those cases is worthwhile.  I understand that you may
>> disagree with that, and that's fine: we're not all going to agree on
>> everything.
>
> I actually agree with you here.

/me pops cork.

> Sorting text is important, so we
> should spend a lot of time avoiding regressions. Your example is
> reasonable - I believe that people do that to a non-trivial degree.
> The fact is, I probably can greatly ameliorate that case. However, to
> give an example of a case I have less sympathy for, I'll name the case
> you mention, *plus* the strings are already in logical order, and so
> our qsort() can (dubiously) take advantage of that, so that there is a
> 1:1 ratio of wasted strxfrm() calls and strcoll() tie-breaker calls.
> There might be other cases like that that crop up.

I think that's actually not a very unrealistic case at all.  In
general, I think that if a particular data distribution is a
reasonable scenario, that data distribution plus "it's already sorted"
is also reasonable.  Data gets presorted in all kinds of ways:
sometimes it gets loaded in sorted (or nearly-sorted) order from
another data source; sometimes we do an index scan that produces data
in order by column A and then later sort by A, B; sometimes somebody
clusters the table.  Respecting the right of other people to have
different opinions, I don't think I'll ever be prepared to concede the
presorted case as either uncommon or unimportant.

That's not to prejudge anything that may or may not be in your patch,
which I have not studied in enormous detail.  It's just what I think
about the subject in general.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Petr Jelinek

On 07/08/14 16:16, Simon Riggs wrote:


A better description would be "block range index" since we are
indexing a range of blocks (not just one block). Perhaps a better one
would be simply "range index", which we could abbreviate to RIN or
BRIN.



+1 for block range index (BRIN)

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Nicolas Barbier
2014-08-07 Oleg Bartunov :

> +1 for BRIN !

+1, rolls off the tongue smoothly and captures the essence :-).

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Gabriele Bartolini
Hi Marco,

> With the current full backup procedure they are backed up, so I think
> that having them backed up with a rsync-like algorithm is what an user
> would expect for an incremental backup.

Exactly. I think a simple, flexible and robust method for file based
incremental backup is all we need. I am confident it could be done for
9.5.

I would like to quote every single word Simon said. Block level
incremental backup (with Robert's proposal) is definitely the ultimate
goal for effective and efficient physical backups. I see file level
incremental backup as a very good "compromise", a sort of intermediate
release which could nonetheless produce a lot of benefits to our user
base, for years to come too.

Thanks,
Gabriele


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Peter Geoghegan
On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas  wrote:
> So here.  You may not agree that the mitigation strategies for which
> others are asking for are worthwhile, but you can't expect everyone
> else to agree with your assessment of which cases are likely to occur
> in practice.  The case of a cohort of strings to be sorted which share
> a long fixed prefix and have different stuff at the end does not seem
> particularly pathological to me.  It doesn't, in other words, require
> an adversary: some real-world data sets will look like that.  I will
> forebear repeating examples I've given before, but I've seen that kind
> of thing more than once in real data sets that people (well, me,
> anyway) actually wanted to put into a PostgreSQL database.  So I'm
> personally of the opinion that the time you've put into trying to
> protect against those cases is worthwhile.  I understand that you may
> disagree with that, and that's fine: we're not all going to agree on
> everything.

I actually agree with you here. Sorting text is important, so we
should spend a lot of time avoiding regressions. Your example is
reasonable - I believe that people do that to a non-trivial degree.
The fact is, I probably can greatly ameliorate that case. However, to
give an example of a case I have less sympathy for, I'll name the case
you mention, *plus* the strings are already in logical order, and so
our qsort() can (dubiously) take advantage of that, so that there is a
1:1 ratio of wasted strxfrm() calls and strcoll() tie-breaker calls.
There might be other cases like that that crop up.

I also thought the paper was pretty cool, and thought it might be
interesting because of the fact that it was authored by McIlroy, and
he refers to weaknesses in his and our very own implementation
specifically.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread John Cochran
On Thu, Aug 7, 2014 at 11:07 AM, Robert Haas  wrote:

> On Tue, Aug 5, 2014 at 8:15 PM, Peter Geoghegan  wrote:
> > "The adversarial method works for almost any polymorphic program
> > recognizable as quicksort. The subject quicksort may copy values at
> > will, or work with lists rather than arrays. It may even pick the
> > pivot at random. The quicksort will be vulnerable provided only that
> > it satisfies some mild assumptions that are met by every
> > implementation I have seen".
> >
> > IMHO, while worst case performance is a very useful tool for analyzing
> > algorithms (particularly their worst case time complexity), a worst
> > case should be put in its practical context. For example, if we had
> > reason to be concerned about *adversarial* inputs, I think that there
> > is a good chance that our qsort() actually would be problematic to the
> > point of driving us to prefer some generally slower alternative.
>

If one is concerned about adversarial inputs, then as mentioned earlier,
two main steps need to be taken.
1. Don't permit potential adversaries define the comparison function used
by qsort().
2. Perform random selection of pivot to prevent precomputed lists of data
that invoke quadratic behavior in qsort.

And before arguing that a random pivot will be less efficient than the
median of medians that's currently being used, you need to look at what is
currently being used.
1. If a "small" partition is being sorted, the element at n/2 is selected
as the pivot with n being the size of the partition.
2. If a "medium" partition is being sorted, then pivot selected is the
median of the elements at 0, n/2, and n.
3. And finally, if a "large" partition is being sorted, potential pivots
are selected from 0, n/8, n/4, 3n/8, n/2, 5n/8, 3n/4, 7n/8, and n.  Of
those 9 candidates, the median of medians is selected as the pivot.

There is nothing in the world that would prevent the following.
1. Short partition - select a pivot at random.
2. Medium partition - select the median of 3 randomly selected candidates.
3. Large partition - select the median of medians of 9 randomly selected
candidates.
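
Sketched against the existing code in src/port/qsort.c (which already has a
med3() helper), the medium case might become something like this, with
random() used purely for illustration:

    /* medium partition: median of three randomly chosen elements */
    char *p1 = (char *) a + (random() % n) * es;
    char *p2 = (char *) a + (random() % n) * es;
    char *p3 = (char *) a + (random() % n) * es;

    pm = med3(p1, p2, p3, cmp);

and the large case would simply do that three times and take the median of
the three medians.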

Using the above random pivot selection methods, the performance of a
hardened qsort should be close to that of an unhardened qsort. The only
real difference is the cost of generating random numbers to select the
candidates for the median tests. However, if an malicious compare function
is permitted to be defined, it would still be vulnerable to that attack,
but it would be pretty much immune to a precomputed data attack.

Or perhaps it just might be simpler to detect an attack and fall back to a
sort algorithm that doesn't have quadratic behavior.

What I mean by that is that the qsort in "Engineering a Sort Function" performs
approximately 1.1 n ln n comparisons. Suppose that, upon starting, qsort
computes an upper bound on the number of comparisons it's willing to perform,
say 3 n ln n or so (1.386 n log n is the estimate for a randomly selected
pivot). Then, if the upper bound is reached, the qsort function backs out,
leaving the array in a partially sorted order, and calls a slower sort that
doesn't exhibit quadratic behavior to actually finish the job.

The result of doing that would be most, if not all, sorts being handled by
qsort in a timely fashion. But if bad luck or an adversary strikes, the
qsort bails out after things look bad and passes the task to a different
sort. I'll be the first to admit that the hybrid approach would be slower
in those bad cases than it would be if the alternate sort was selected at
the beginning, but it would be faster than letting qsort continue with
quadratic behavior.
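
As a sketch of that hybrid (all names invented; heapsort() stands in for any
O(n log n) fallback, and it assumes the qsort implementation only rearranges
elements in place, which ours does):

    #include <math.h>
    #include <setjmp.h>
    #include <stdlib.h>

    static jmp_buf  qsort_bailout;
    static long     cmp_budget;     /* ~3 * n * ln(n), set before sorting */
    static int    (*real_cmp) (const void *, const void *);

    static int
    counting_cmp(const void *x, const void *y)
    {
        if (--cmp_budget < 0)
            longjmp(qsort_bailout, 1);  /* looks quadratic: abandon qsort */
        return real_cmp(x, y);
    }

    static void
    hardened_sort(void *array, size_t n, size_t es,
                  int (*cmp) (const void *, const void *))
    {
        real_cmp = cmp;
        cmp_budget = (long) (3.0 * n * log(n + 1.0));

        if (setjmp(qsort_bailout) == 0)
            qsort(array, n, es, counting_cmp);
        else
            heapsort(array, n, es, cmp);    /* finishes the partially sorted array */
    }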

-- 

There are two ways of constructing a software design: One way is to make it
so simple that there are obviously no deficiencies and the other way is to
make it so complicated that there are no obvious deficiencies. — C.A.R.
Hoare


Re: [HACKERS] replication commands and log_statements

2014-08-07 Thread Abhijit Menon-Sen
At 2014-08-07 23:22:43 +0900, masao.fu...@gmail.com wrote:
>
> That is, we log replication commands only when log_statement is set to
> all. Neither new parameter is introduced nor log_statement is
> redefined as a list.

That sounds good to me.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Marco Nenciarini
Il 07/08/14 17:25, Bruce Momjian ha scritto:
> On Thu, Aug  7, 2014 at 08:35:53PM +0900, Michael Paquier wrote:
>> On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao  wrote:
>>> There are some data which don't have LSN, for example, postgresql.conf.
>>> When such data has been modified since last backup, they also need to
>>> be included in incremental backup? Probably yes.
>> Definitely yes. That's as well the case of paths like pg_clog,
>> pg_subtrans and pg_twophase.
> 
> I assumed these would be unconditionally backed up during an incremental
> backup because they are relatively small and you don't want to make a mistake.
> 

You could decide to always copy files that don't have an LSN, but you
don't know what the user could put inside PGDATA. I would avoid any
assumption about files that are not owned by Postgres.

With the current full backup procedure they are backed up, so I think
that having them backed up with a rsync-like algorithm is what an user
would expect for an incremental backup.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Marco Nenciarini
Il 07/08/14 17:29, Bruce Momjian ha scritto:
> I am a little worried that many users will not realize this until they
> try it and are disappointed, e.g. "Why is PG writing to my static data
> so often?" --- then we get beaten up about our hint bits and freezing
> behavior.  :-(
> 
> I am just trying to set realistic expectations.
> 

Our experience is that for big databases (over about 50GB in size) the
file-level approach is often enough to halve the size of the backup.

Users who run Postgres as a data warehouse will surely benefit from it,
so we could present it as a DWH-oriented feature.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Robert Haas
On Wed, Aug 6, 2014 at 9:15 PM, Craig Ringer  wrote:
> To support transparent client-side failover in BDR, it's necessary to
> know what the LSN of a node was at the time a transaction committed and
> keep track of that in the client/proxy.

I doubt whether it makes sense to do this without a broader
understanding of how the client-side failover mechanism would work.
If we're going to add something like this, it should include libpq
support for actually doing something useful with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Oleg Bartunov
+1 for BRIN !

On Thu, Aug 7, 2014 at 6:16 PM, Simon Riggs  wrote:
> On 7 August 2014 14:53, Robert Haas  wrote:
>> On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier
>>  wrote:
>>> 2014-08-06 Claudio Freire :
>>>
 So, I like blockfilter a lot. I change my vote to blockfilter ;)
>>>
>>> +1 for blockfilter, because it stresses the fact that the "physical"
>>> arrangement of rows in blocks matters for this index.
>>
>> I don't like that quite as well as summary, but I'd prefer either to
>> the current naming.
>
> Yes, "summary index" isn't good. I'm not sure where the block or the
> filter part comes in though, so -1 to "block filter", not least
> because it doesn't have a good abbreviation (bfin??).
>
> A better description would be "block range index" since we are
> indexing a range of blocks (not just one block). Perhaps a better one
> would be simply "range index", which we could abbreviate to RIN or
> BRIN.
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Bruce Momjian
On Thu, Aug  7, 2014 at 11:03:40AM +0100, Simon Riggs wrote:
> Well, there is a huge difference between file-level and block-level backup.
> 
> Designing, writing and verifying block-level backup to the point that
> it is acceptable is a huge effort. (Plus, I don't think accumulating
> block numbers as they are written will be "low overhead". Perhaps
> there was a misunderstanding there and what is being suggested is to
> accumulate file names that change as they are written, since we
> already do that in the checkpointer process, which would be an option
> between 2 and 3 on the above list).
> 
> What is being proposed here is file-level incremental backup that
> works in a general way for various backup management tools. It's the
> 80/20 first step on the road. We get most of the benefit, it can be
> delivered in this release as robust, verifiable code. Plus, that is
> all we have budget for, a fairly critical consideration.
> 
> Big features need to be designed incrementally across multiple
> releases, delivering incremental benefit (or at least that is what I
> have learned). Yes, working block-level backup would be wonderful, but
> if we hold out for that as the first step then we'll get nothing
> anytime soon.

That is fine.  I just wanted to point out that as features are added,
file-level incremental backups might not be useful.  In fact, I think
there are a lot of users for which file-level incremental backups will
never be useful, i.e. you have to have a lot of frozen/static data for
file-level incremental backups to be useful.  

I am a little worried that many users will not realize this until they
try it and are disappointed, e.g. "Why is PG writing to my static data
so often?" --- then we get beaten up about our hint bits and freezing
behavior.  :-(

I am just trying to set realistic expectations.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Bruce Momjian
On Thu, Aug  7, 2014 at 08:35:53PM +0900, Michael Paquier wrote:
> On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao  wrote:
> > There are some data which don't have LSN, for example, postgresql.conf.
> > When such data has been modified since last backup, they also need to
> > be included in incremental backup? Probably yes.
> Definitely yes. That's as well the case of paths like pg_clog,
> pg_subtrans and pg_twophase.

I assumed these would be unconditionally backed up during an incremental
backup because they are relatively small and you don't want to make a mistake.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enhancing pgbench parameter checking

2014-08-07 Thread Fabien COELHO


Hello Tatsuo-san,


> Thanks for the review. I have registered it to Aug Commit fest.
> https://commitfest.postgresql.org/action/patch_view?id=1532

>> I'm not sure of the variable name "is_non_init_parameter_set". I would
>> suggest "benchmarking_option_set"?

> Ok, I will replace the variable name as you suggested.

>> Also, to be consistent, maybe it should check that no initialization-specific
>> option are set when benchmarking.

> Good suggesition. Here is the v2 patch.


I applied it without problem and tested it.


* It seems that -c is now ignored: the atoi() line has been removed.

* Option -q is initialization-specific, but it is not detected as such like
the others, although there is a specific check for it later. I think it would
be better to use the systematic approach and to remove the specific
check.


* I would name the second boolean "initialization_option_set", as that is
how it is described in the documentation.



* I would suggest the following error messages:
 "some options cannot be used in initialization (-i) mode\n" and
 "some options cannot be used in benchmarking mode\n".
Although these messages are rough, I think that they are enough and avoid 
running something unexpected, which is your purpose.



Find attached a patch which adds these changes to your current version.

--
Fabien.
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 67d7960..2f7d80e 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -2521,7 +2521,7 @@ main(int argc, char **argv)
 	bool		scale_given = false;
 
 	bool		benchmarking_option_set = false;
-	bool		initializing_option_set = false;
+	bool		initialization_option_set = false;
 
 	CState	   *state;			/* status of clients */
 	TState	   *threads;		/* array of thread */
@@ -2610,6 +2610,7 @@ main(int argc, char **argv)
 break;
 			case 'c':
 benchmarking_option_set = true;
+nclients = atoi(optarg);
 if (nclients <= 0 || nclients > MAXCLIENTS)
 {
 	fprintf(stderr, "invalid number of clients: %d\n", nclients);
@@ -2695,6 +2696,7 @@ main(int argc, char **argv)
 use_log = true;
 break;
 			case 'q':
+initialization_option_set = true;
 use_quiet = true;
 break;
 			case 'f':
@@ -2722,7 +2724,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'F':
-initializing_option_set = true;
+initialization_option_set = true;
 fillfactor = atoi(optarg);
 if ((fillfactor < 10) || (fillfactor > 100))
 {
@@ -2776,14 +2778,14 @@ main(int argc, char **argv)
 			case 0:
 /* This covers long options which take no argument. */
 if (foreign_keys || unlogged_tables)
-	initializing_option_set = true;
+	initialization_option_set = true;
 break;
 			case 2:/* tablespace */
-initializing_option_set = true;
+initialization_option_set = true;
 tablespace = pg_strdup(optarg);
 break;
 			case 3:/* index-tablespace */
-initializing_option_set = true;
+initialization_option_set = true;
 index_tablespace = pg_strdup(optarg);
 break;
 			case 4:
@@ -2835,7 +2837,7 @@ main(int argc, char **argv)
 	{
 		if (benchmarking_option_set)
 		{
-			fprintf(stderr, "some parameters cannot be used in initializing mode\n");
+			fprintf(stderr, "some options cannot be used in initialization (-i) mode\n");
 			exit(1);
 		}
 
@@ -2844,9 +2846,9 @@ main(int argc, char **argv)
 	}
 	else
 	{
-		if (initializing_option_set)
+		if (initialization_option_set)
 		{
-			fprintf(stderr, "some parameters cannot be used in benchmarking mode\n");
+			fprintf(stderr, "some options cannot be used in benchmarking mode\n");
 			exit(1);
 		}
 	}
@@ -2868,13 +2870,6 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
-	/* -q may be used only with -i */
-	if (use_quiet && !is_init_mode)
-	{
-		fprintf(stderr, "quiet-logging is allowed only in initialization mode (-i)\n");
-		exit(1);
-	}
-
 	/* --sampling-rate may must not be used with --aggregate-interval */
 	if (sample_rate > 0.0 && agg_interval > 0)
 	{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Robert Haas
On Tue, Aug 5, 2014 at 8:15 PM, Peter Geoghegan  wrote:
> "The adversarial method works for almost any polymorphic program
> recognizable as quicksort. The subject quicksort may copy values at
> will, or work with lists rather than arrays. It may even pick the
> pivot at random. The quicksort will be vulnerable provided only that
> it satisfies some mild assumptions that are met by every
> implementation I have seen".
>
> IMHO, while worst case performance is a very useful tool for analyzing
> algorithms (particularly their worst case time complexity), a worst
> case should be put in its practical context. For example, if we had
> reason to be concerned about *adversarial* inputs, I think that there
> is a good chance that our qsort() actually would be problematic to the
> point of driving us to prefer some generally slower alternative.

I completely agree with this, and I think everyone else does, too.
Where we don't all necessarily agree is which worst cases are
realistic and which worst cases are simply pathological cases with
which we need not be concerned in practice.  For example, when I
proposed the patch to use MVCC catalog snapshots, Andres invented a
test case which I thought was far more brutal than anything likely to
be encountered in the real world.  But Andres didn't agree; he thought
the test case was realistic.  So, I worked on the patch some more and
came up with a solution that performs acceptably even if those brutal
conditions are encountered in the world.  As a result, the patch as
committed is better than the one originally submitted.  I could
certainly have spent more time debating whether that effort was
worthwhile, and maybe I would have won the argument, but it was a
better of use of that time to instead try to improve the patch.

So here.  You may not agree that the mitigation strategies for which
others are asking for are worthwhile, but you can't expect everyone
else to agree with your assessment of which cases are likely to occur
in practice.  The case of a cohort of strings to be sorted which share
a long fixed prefix and have different stuff at the end does not seem
particularly pathological to me.  It doesn't, in other words, require
an adversary: some real-world data sets will look like that.  I will
forebear repeating examples I've given before, but I've seen that kind
of thing more than once in real data sets that people (well, me,
anyway) actually wanted to put into a PostgreSQL database.  So I'm
personally of the opinion that the time you've put into trying to
protect against those cases is worthwhile.  I understand that you may
disagree with that, and that's fine: we're not all going to agree on
everything.
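
For illustration, here is a tiny, self-contained example of that kind of data
set (the path prefix and key format are invented for this sketch): every key
shares a long fixed prefix, so each comparison has to scan past the prefix
before it can decide anything, and an abbreviated key built from the first few
bytes cannot distinguish any of them.

#include <stdio.h>
#include <string.h>

int
main(void)
{
    /* Invented example of keys sharing a long fixed prefix. */
    const char *prefix = "/var/lib/myapp/archive/2014/08/07/session-";
    char        a[128];
    char        b[128];

    snprintf(a, sizeof(a), "%s%08d", prefix, 12345);
    snprintf(b, sizeof(b), "%s%08d", prefix, 67890);

    /* strcmp() only finds a differing byte after the whole shared prefix. */
    printf("strcmp: %d, shared prefix length: %zu\n",
           strcmp(a, b), strlen(prefix));
    return 0;
}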

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Robert Haas
On Thu, Aug 7, 2014 at 10:16 AM, Simon Riggs  wrote:
> On 7 August 2014 14:53, Robert Haas  wrote:
>> On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier
>>  wrote:
>>> 2014-08-06 Claudio Freire :
>>>
 So, I like blockfilter a lot. I change my vote to blockfilter ;)
>>>
>>> +1 for blockfilter, because it stresses the fact that the "physical"
>>> arrangement of rows in blocks matters for this index.
>>
>> I don't like that quite as well as summary, but I'd prefer either to
>> the current naming.
>
> Yes, "summary index" isn't good. I'm not sure where the block or the
> filter part comes in though, so -1 to "block filter", not least
> because it doesn't have a good abbreviation (bfin??).
>
> A better description would be "block range index" since we are
> indexing a range of blocks (not just one block). Perhaps a better one
> would be simply "range index", which we could abbreviate to RIN or
> BRIN.

range index might get confused with range types; block range index
seems better.  I like summary, but I'm fine with block range index or
block filter index, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Alvaro Herrera
Simon Riggs wrote:
> On 7 August 2014 14:53, Robert Haas  wrote:
> > On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier
> >  wrote:
> >> 2014-08-06 Claudio Freire :
> >>
> >>> So, I like blockfilter a lot. I change my vote to blockfilter ;)
> >>
> >> +1 for blockfilter, because it stresses the fact that the "physical"
> >> arrangement of rows in blocks matters for this index.
> >
> > I don't like that quite as well as summary, but I'd prefer either to
> > the current naming.
> 
> Yes, "summary index" isn't good. I'm not sure where the block or the
> filter part comes in though, so -1 to "block filter", not least
> because it doesn't have a good abbreviation (bfin??).

I was thinking just "blockfilter" (I did show a sample command).
Claudio explained the name downthread; personally, of all the options
suggested thus far, it's the one I like the most (including minmax).

At this point, the naming issue is what is keeping me from committing
this patch, so the quicker we can solve it, the merrier.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-07 Thread Merlin Moncure
On Thu, Aug 7, 2014 at 5:12 AM, Craig Ringer  wrote:
> New Intel hardware supports IEEE 754:2008 decimal floating point in
> hardware, and I'm quite interested in implementing DECFLOAT(n) for
> PostgreSQL to take advantage of that.

+1

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replication commands and log_statements

2014-08-07 Thread Fujii Masao
On Wed, Jul 2, 2014 at 4:26 AM, Abhijit Menon-Sen  wrote:
> Hi.
>
> Do we have any consensus about what to do with these two patches?
>
> 1. Introduce a "log_replication_command" setting.
> 2. Change log_statement to be a list of tokens.
>
> If I understand correctly, there weren't any strong objections to the
> former, but the situation is less clear when it comes to the second.

On second thought, I'd like to propose the third option. This is the same idea
as Andres suggested upthread. That is, we log replication commands only
when log_statement is set to all. Neither new parameter is introduced nor
log_statement is redefined as a list.

Firstly, since log_statement=all means that all statements are logged, it's
natural and intuitive to log replication commands as well in that setting.
Secondly, statements other than DDL and data-modifying statements, e.g.
VACUUM, CHECKPOINT, BEGIN, etc., are logged only when log_statement=all,
so it is already impossible to control the logging of DDL and maintenance
commands orthogonally. Therefore, even if the logging of replication commands,
which are neither DDL nor data-modifying statements, cannot be controlled
orthogonally either, I don't think users will be very surprised. Of course I
have no objection to addressing the issue in the future, e.g. by enabling such
orthogonal logging control.

Thoughts? What about introducing that simple and not-so-confusing change first?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-08-07 Thread Kevin Grittner
Thanks for your review and comments, Amit!

Amit Khandekar  wrote:
> On 21 June 2014 23:36, Kevin Grittner  wrote:
>> Kevin Grittner  wrote:
>> I didn't change the tuplestores to TID because it seemed to me that
>> it would preclude using transition relations with FDW triggers, and
>> it seemed bad not to support that.  Does anyone see a way around
>> that, or feel that it's OK to not support FDW triggers in this
>> regard?
>
> I think it is ok to use tuplestores for now, but as mentioned by you
> somewhere else in the thread, later on we should change this to using
> tids as an optimization.

Well, the optimization would probably be to use a tuplestore of 
tids referencing modified tuples in the base table, rather than a 
tuplestore of the data itself.  But I think we're in agreement.

>> Does this look good otherwise, as far as it goes?
>
> I didn't yet extensively go through the patch, but before that, just a
> few quick comments:
>
> I see that the tupelstores for transition tables are stored per query
> depth. If the
> DML involves a table that has multiple child tables, it seems as though
> a single tuplestore would be shared among all these tables. That means
> if we define
> such triggers using transition table clause for all the child tables, then
> the trigger function for a child table will see tuples from other child tables
> as well. Is that true ?

I don't think so.  I will make a note of the concern to confirm by testing.

> If it is, it does not make sense.

I agree.

> I tried to google some SQLs that use REFERENCING clause with triggers.
> It looks like in some database systems, even the WHEN clause of CREATE TRIGGER
> can refer to a transition table, just like how it refers to NEW and
> OLD row variables.
>
> For e.g. :
> CREATE TRIGGER notify_dept
>   AFTER UPDATE ON weather
>   REFERENCING NEW_TABLE AS N_TABLE
>   NEW AS N_ROW
>   FOR EACH ROW
>   WHEN ((SELECT AVG (temperature) FROM N_TABLE) > 10)
>   BEGIN
> notify_department(N_ROW.temperature, N_ROW.city);
>   END
>
> Above, it is used to get an aggregate value of all the changed rows. I think
> we do not currently support aggregate expressions in the where clause, but 
> with
> transition tables, it makes more sense to support it later if not now.

Interesting point; I had not thought about that.  Will see if I can 
include support for that in the patch for the next CF; failing 
that; I will at least be careful to not paint myself into a corner 
where it is unduly hard to do later.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Claudio Freire
On Thu, Aug 7, 2014 at 11:16 AM, Simon Riggs  wrote:
> On 7 August 2014 14:53, Robert Haas  wrote:
>> On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier
>>  wrote:
>>> 2014-08-06 Claudio Freire :
>>>
 So, I like blockfilter a lot. I change my vote to blockfilter ;)
>>>
>>> +1 for blockfilter, because it stresses the fact that the "physical"
>>> arrangement of rows in blocks matters for this index.
>>
>> I don't like that quite as well as summary, but I'd prefer either to
>> the current naming.
>
> Yes, "summary index" isn't good. I'm not sure where the block or the
> filter part comes in though, so -1 to "block filter", not least
> because it doesn't have a good abbreviation (bfin??).

Block filter would refer to the index property that selects blocks,
not tuples, and it does so through a "filter function" (for min-max,
it's a range check, but for other opclasses it could be anything).
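
For what it's worth, a minimal sketch of that idea (the struct, the names and
the integer-only key are assumptions made for illustration, not the patch's
actual data structures): the index stores one small summary per range of heap
blocks, and the filter function decides whether the whole range can be skipped
for a given scan key.

#include <stdbool.h>

/* One summary entry per range of heap blocks (illustrative only). */
typedef struct BlockRangeSummary
{
    int     min_key;    /* smallest key value seen in the block range */
    int     max_key;    /* largest key value seen in the block range */
} BlockRangeSummary;

/* Min-max "filter": may "key = value" have matches inside this range? */
static bool
range_may_contain(const BlockRangeSummary *summary, int value)
{
    return value >= summary->min_key && value <= summary->max_key;
}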


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Simon Riggs
On 7 August 2014 14:53, Robert Haas  wrote:
> On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier
>  wrote:
>> 2014-08-06 Claudio Freire :
>>
>>> So, I like blockfilter a lot. I change my vote to blockfilter ;)
>>
>> +1 for blockfilter, because it stresses the fact that the "physical"
>> arrangement of rows in blocks matters for this index.
>
> I don't like that quite as well as summary, but I'd prefer either to
> the current naming.

Yes, "summary index" isn't good. I'm not sure where the block or the
filter part comes in though, so -1 to "block filter", not least
because it doesn't have a good abbreviation (bfin??).

A better description would be "block range index" since we are
indexing a range of blocks (not just one block). Perhaps a better one
would be simply "range index", which we could abbreviate to RIN or
BRIN.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-07 Thread Alvaro Herrera
Fabrízio de Royes Mello wrote:
> On Tue, Aug 5, 2014 at 10:55 PM, Tom Lane  wrote:
> 
> > Josh Berkus  writes:
> > > BTW, while there's unlikely to be a good reason to put search_path in
> > > pg.conf with appends, there are a LOT of reasons to want to be able to
> > > append to it during a session.
> >
> > [shrug...]  You can do that today with current_setting()/set_config().
> 
> With a very simple statement you can do that:

Of course, this doesn't solve the requirement that started this thread,
which is about having "includeable" pg.conf fragments to enable
extensions.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-07 Thread Robert Haas
On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier
 wrote:
> 2014-08-06 Claudio Freire :
>
>> So, I like blockfilter a lot. I change my vote to blockfilter ;)
>
> +1 for blockfilter, because it stresses the fact that the "physical"
> arrangement of rows in blocks matters for this index.

I don't like that quite as well as summary, but I'd prefer either to
the current naming.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-07 Thread Robert Haas
On Wed, Aug 6, 2014 at 7:20 PM, Jeff Janes  wrote:
> On Wed, Aug 6, 2014 at 8:35 AM, Robert Haas  wrote:
>>
>> On Tue, Aug 5, 2014 at 6:05 PM, Jeff Janes  wrote:
>> > I think you missed one of the regression tests, see attached
>>
>> Woops.  Thanks, committed.
>
> Thanks.
>
> It needs to go into 9_4_STABLE as well.

Sheesh, where is my brain?  Done, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioning performance: cache stringToNode() of pg_constraint.ccbin

2014-08-07 Thread Robert Haas
On Wed, Aug 6, 2014 at 9:35 PM, Bruce Momjian  wrote:
> On Sun, Jan 12, 2014 at 12:53:40PM -0500, Noah Misch wrote:
>> On Sat, Jan 11, 2014 at 02:10:01PM -0500, Bruce Momjian wrote:
>> > On Mon, Jun  3, 2013 at 03:07:27PM -0400, Noah Misch wrote:
>> > > A colleague, Korry Douglas, observed a table partitioning scenario where
>> > > deserializing pg_constraint.ccbin is a hot spot.  The following test 
>> > > case, a
>> > > simplification of a typical partitioning setup, spends 28% of its time in
>> > > stringToNode() and callees thereof:
>> >
>> > Noah, what is the status on this?
>>
>> Further study revealed a defect in the patch's memory management, and I have
>> not gotten around to correcting that.
>
> I talked to Noah and he can't continue on this item.  Can someone else
> work on it?

Well, it would be helpful if he could describe the defect he found, so
that the next person doesn't have to guess.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HEAD crashes with assertion and LWLOCK_STATS enabled

2014-08-07 Thread Fujii Masao
On Fri, May 23, 2014 at 9:38 PM, Robert Haas  wrote:
> On Tue, May 20, 2014 at 4:02 AM, Yuto HAYAMIZU  wrote:
>> The failing assertion is for prohibiting memory allocation in a critical 
>> section, which is introduced by commit 4a170ee9 on 2014-04-04.

This issue is still in open item list for 9.4. But the commit which had
caused this issue was reverted by d27d493. So I removed this from the
Open Item list.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wraparound limits

2014-08-07 Thread Heikki Linnakangas

On 08/07/2014 01:34 PM, Teodor Sigaev wrote:

> Hi!
>
> I have a questions about setting transaction's wraparound limits. Function
> SetTransactionIdLimit() in access/transam/varsup.c:
>
> 1)
>  xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1);
>  if (xidWrapLimit < FirstNormalTransactionId)
>  xidWrapLimit += FirstNormalTransactionId;
>
> Isn't it a problem if oldest_datfrozenxid > MaxTransactionId/2?

Don't think so. What problem do you see?

> 2)
>  xidStopLimit = xidWrapLimit - 1000000;
>  if (xidStopLimit < FirstNormalTransactionId)
>  xidStopLimit -= FirstNormalTransactionId;
>
>  xidWarnLimit = xidStopLimit - 10000000;
>  if (xidWarnLimit < FirstNormalTransactionId)
>  xidWarnLimit -= FirstNormalTransactionId;
>
> Why does it use '-' instead of '+' if variable < FirstNormalTransactionId? In
> this case it is easy to get xidStopLimit > xidWrapLimit or xidWarnLimit >
> xidStopLimit...


Remember that the limits are compared with xids using wrap-around aware 
functions TransactionIdPrecedes and TransactionidFollows. Not regular < 
and >. The "<" checks above are just to check if the XID hit one of the 
special TransactionIds, and if so, increase/decrese it to get back to 
the normal range.
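
In code, the idea is roughly the following simplified sketch, modelled on
TransactionIdPrecedes(); the real function also special-cases the permanent
XIDs below FirstNormalTransactionId, which is omitted here:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Wrap-around aware comparison: the difference is computed in 32-bit
 * arithmetic, so it wraps modulo 2^32, and its sign says which XID is
 * "older".  Every XID precedes the ~2 billion XIDs logically ahead of it,
 * regardless of the raw numeric values.
 */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t     diff = (int32_t) (id1 - id2);

    return diff < 0;
}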


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_shmem_allocations view

2014-08-07 Thread Michael Paquier
On Thu, May 8, 2014 at 10:28 PM, Andres Freund  wrote:
> Well, we have to live with it for now :)
I just had a look at the first patch and got some comments:
1) Instead of using an assertion here, wouldn't it be better to error
out if name is NULL, and truncate the name if it is longer than
SHMEM_INDEX_KEYSIZE - 1 (including '\0')?
scanstr in scansup.c?
Assert(IsUnderPostmaster);
+   Assert(name != NULL && strlen(name) > 0 &&
+  strlen(name) < SHMEM_INDEX_KEYSIZE - 1);
2) The addition of a field to track the size of a dsm should be
explicitly mentioned, this is useful for the 2nd patch.
3) The refactoring done in dsm_create to find an unused slot should be
done as a separate patch for clarity.
4) Using '\0' here would be more adapted:
+   item->name[SHMEM_INDEX_KEYSIZE - 1] = 0;
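
As a sketch of what 1) and 4) could look like together (the helper name is an
assumption, and SHMEM_INDEX_KEYSIZE is hard-coded here only to keep the example
self-contained): copy with truncation and always NUL-terminate instead of
asserting on the caller.

#include <stdio.h>

#define SHMEM_INDEX_KEYSIZE 48      /* stand-in value for this sketch */

static void
set_entry_name(char dest[SHMEM_INDEX_KEYSIZE], const char *name)
{
    /* snprintf() always NUL-terminates and silently truncates long names. */
    snprintf(dest, SHMEM_INDEX_KEYSIZE, "%s", name);
}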

Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-08-07 Thread Amit Khandekar
On 21 June 2014 23:36, Kevin Grittner  wrote:
> Kevin Grittner  wrote:
> I didn't change the tuplestores to TID because it seemed to me that
> it would preclude using transition relations with FDW triggers, and
> it seemed bad not to support that.  Does anyone see a way around
> that, or feel that it's OK to not support FDW triggers in this
> regard?

I think it is ok to use tuplestores for now, but as mentioned by you
somewhere else in the thread, later on we should change this to using
tids as an optimization.

>
> Does this look good otherwise, as far as it goes?

I didn't yet extensively go through the patch, but before that, just a
few quick comments:

I see that the tuplestores for transition tables are stored per query
depth. If the
DML involves a table that has multiple child tables, it seems as though
a single tuplestore would be shared among all these tables. That means
if we define
such triggers using transition table clause for all the child tables, then
the trigger function for a child table will see tuples from other child tables
as well. Is that true ? If it is, it does not make sense.
For fdw tuplestore, this issue does not arise because the DML won't involve
multiple target tables I suppose.

---

I tried to google some SQLs that use REFERENCING clause with triggers.
It looks like in some database systems, even the WHEN clause of CREATE TRIGGER
can refer to a transition table, just like how it refers to NEW and
OLD row variables.

For e.g. :
CREATE TRIGGER notify_dept
  AFTER UPDATE ON weather
  REFERENCING NEW_TABLE AS N_TABLE
  NEW AS N_ROW
  FOR EACH ROW
  WHEN ((SELECT AVG (temperature) FROM N_TABLE) > 10)
  BEGIN
notify_department(N_ROW.temperature, N_ROW.city);
  END

Above, it is used to get an aggregate value of all the changed rows. I think
we do not currently support aggregate expressions in the where clause, but with
transition tables, it makes more sense to support it later if not now.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Michael Paquier
On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao  wrote:
> There are some data which don't have LSN, for example, postgresql.conf.
> When such data has been modified since last backup, they also need to
> be included in incremental backup? Probably yes.
Definitely yes. That's as well the case of paths like pg_clog,
pg_subtrans and pg_twophase.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Fujii Masao
On Thu, Aug 7, 2014 at 12:20 AM, Bruce Momjian  wrote:
> On Wed, Aug  6, 2014 at 06:48:55AM +0100, Simon Riggs wrote:
>> On 6 August 2014 03:16, Bruce Momjian  wrote:
>> > On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
>> >> On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs  wrote:
>> >> >
>> >> > On 5 August 2014 22:38, Claudio Freire  wrote:
>> >> > Thinking some more, there seems like this whole store-multiple-LSNs
>> >> > thing is too much. We can still do block-level incrementals just by
>> >> > using a single LSN as the reference point. We'd still need a complex
>> >> > file format and a complex file reconstruction program, so I think that
>> >> > is still "next release". We can call that INCREMENTAL BLOCK LEVEL.
>> >>
>> >> Yes, that's the approach taken by pg_rman for its block-level
>> >> incremental backup. Btw, I don't think that the CPU cost to scan all
>> >> the relation files added to the one to rebuild the backups is worth
>> >> doing it on large instances. File-level backup would cover most of the
>> >
>> > Well, if you scan the WAL files from the previous backup, that will tell
>> > you what pages that need incremental backup.
>>
>> That would require you to store that WAL, which is something we hope
>> to avoid. Plus if you did store it, you'd need to retrieve it from
>> long term storage, which is what we hope to avoid.
>
> Well, for file-level backups we have:
>
> 1) use file modtime (possibly inaccurate)
> 2) use file modtime and checksums (heavy read load)
>
> For block-level backups we have:
>
> 3) accumulate block numbers as WAL is written
> 4) read previous WAL at incremental backup time
> 5) read data page LSNs (high read load)
>
> The question is which of these do we want to implement?

There are some data which don't have LSN, for example, postgresql.conf.
When such data has been modified since last backup, they also need to
be included in incremental backup? Probably yes. So implementing only
block-level backup does not seem a complete solution. It needs file-level backup as
an infrastructure for such data. This makes me think that it's more reasonable
to implement file-level backup first.
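
To make option #1 from the quoted list concrete, here is a rough sketch, for
illustration only; real code would also have to cope with mtime granularity
and clock skew, which is why that option is marked "possibly inaccurate":

#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Include a file in the incremental backup when its modification time is
 * not older than the start of the previous backup (illustrative sketch).
 */
static bool
needs_incremental_copy(const char *path, time_t prev_backup_start)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return true;            /* when in doubt, include the file */

    return st.st_mtime >= prev_backup_start;
}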

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] posix_fadvise() and pg_receivexlog

2014-08-07 Thread Fujii Masao
On Thu, Aug 7, 2014 at 5:02 PM, Heikki Linnakangas
 wrote:
> On 08/07/2014 10:10 AM, Mitsumasa KONDO wrote:
>>
>> 2014-08-07 13:47 GMT+09:00 Fujii Masao :
>>
>>> On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas
>>>  wrote:

 On 08/06/2014 08:39 PM, Fujii Masao wrote:
>
> The WAL files that pg_receivexlog writes will not be re-read soon
> basically,
> so we can advise the OS to release any cached pages when WAL file is
> closed. I feel inclined to change pg_receivexlog that way. Thought?



 -1. The OS should be smart enough to not thrash the cache by files that
>>>
>>> are

 written sequentially and never read.
>>>
>>>
>> OS's buffer strategy is optimized for general situation. Do you forget OS
>> hackers discussion last a half of year?
>>
>>> Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe
>>> not,
>>> so I was thinking that posix_fadvise is called when the server closes WAL
>>> file.
>>
>>
>> That's right.
>
>
> Well, I'd like to hear someone from the field complaining that
> pg_receivexlog is thrashing the cache and thus reducing the performance of
> some other process. Or a least a synthetic test case that demonstrates that
> happening.

Yeah, I will test that by seeing the performance of PostgreSQL which is
running in the same server as pg_receivexlog is running. We can just
compare that performance with normal pg_receivexlog and that with
the patched one (i.e., posix_fadvise is called).
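
For reference, the change being measured boils down to something like the
following sketch (not the actual pg_receivexlog code; POSIX_FADV_DONTNEED is
only a hint and the kernel is free to ignore it):

#include <fcntl.h>
#include <unistd.h>

/*
 * After a WAL segment has been fully written, fsync it and then hint to
 * the OS that its cached pages are no longer needed before closing it.
 */
static int
close_completed_walfile(int fd)
{
    if (fsync(fd) != 0)
        return -1;

#ifdef POSIX_FADV_DONTNEED
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif

    return close(fd);
}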

>
>
>> By the way, does pg_receivexlog process have fsync() in every WAL commit?
>
>
> It fsync's each file after finishing to write it. Ie. each WAL file is
> fsync'd once.
>
>
>> If yes, I think that we need no or less fsync() option for the better
>> performance. It is general in NOSQL storages.
>> If no, we need fsync() option for more getting reliability and data
>> integrarity.
>
>
> Hmm. An fsync=off style option might make sense, although I doubt the one
> fsync at end of file is causing a performance problem for anyone in
> practice. Haven't heard any complaints, anyway.
>
> An option to fsync after every commit record might make sense if you use
> pg_receivexlog with synchronous replication. Doing that would require
> parsing the WAL, though, to see where the commit records are. But then
> again, the fsync's wouldn't need to correspond to commit records. We could
> fsync just before we go to sleep to wait for more WAL to be received.

That's what Furuya-san proposed in last CommitFest.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-07 Thread Ants Aasma
On Thu, Aug 7, 2014 at 4:15 AM, Craig Ringer  wrote:
> I'm thinking about adding a new message type in the protocol that gets
> sent immediately before CommandComplete, containing the LSN of the
> commit. Clients would need to enable the sending of this message with a
> GUC that they set when they connect, so it doesn't confuse clients that
> aren't expecting it or aware of it.
>
> Is this something you can see being useful for other non-BDR purposes?

I have been thinking about something similar.

For regular streaming replication you could keep track of the commit
LSN on a per client basis and automatically redirect read queries to a
standby if standby apply location is larger than the commit LSN of
this particular client. This can be done mostly transparently for the
application while not running into the issue that recent modifications
disappear for a while after commit.
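
The routing decision itself is trivial once both values are available; below
is a minimal sketch, where how the client's commit LSN and the standby's
replay location are obtained is left entirely open (the names are invented
for this example):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for an LSN */

/*
 * Route a read-only query to a standby only once the standby's replay
 * location has passed the LSN of this client's last commit.
 */
static bool
can_route_to_standby(XLogRecPtr standby_replay_lsn, XLogRecPtr client_commit_lsn)
{
    return standby_replay_lsn >= client_commit_lsn;
}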

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [GSoC] kmedoids status report

2014-08-07 Thread Maxence Ahlouche
Hi!

Here is a report of what has been discussed yesterday on IRC.

The kmedoids module now seems to work correctly on basic datasets. I've
also implemented its variants with different seeding methods: random
initial medoids, and initial medoids distributed among the points (similar
to kmeans++ [0]).

The next steps are:

   - Making better tests (1-2d)
   - Writing the documentation (1d)
   - Adapting my code to GP and HAWQ -- btw, are default parameters now
   available in GP and HAWQ? (1-2d)
   - Refactoring kmedoids and kmeans, as there is code duplication between
   those two.
   For this step, I don't know if I'll have time to create a clustering
   module, and make kmeans and kmedoids submodules of it. If yes, then it's
   perfect; otherwise, I'll just rename the common functions in kmeans, and
   have kmedoids call them from there.

Hai also helped me setup (once more) the VM where GreenPlum and HAWQ are
installed, so that I can test my code on these DBMS.

As a reminder, I'm supposed to stop coding next Monday, and then the last
week is dedicated to documentation, tests, refactoring and polishing.

Regards,

Maxence

[0] https://en.wikipedia.org/wiki/K-means%2B%2B


[HACKERS] Wraparound limits

2014-08-07 Thread Teodor Sigaev

Hi!

I have a question about setting transaction's wraparound limits. Function
SetTransactionIdLimit() in access/transam/varsup.c:


1)
xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1);
if (xidWrapLimit < FirstNormalTransactionId)
xidWrapLimit += FirstNormalTransactionId;

Isn't  it a problem if oldest_datfrozenxid > MaxTransactionId/2?

2)
xidStopLimit = xidWrapLimit - 1000000;
if (xidStopLimit < FirstNormalTransactionId)
xidStopLimit -= FirstNormalTransactionId;

xidWarnLimit = xidStopLimit - 10000000;
if (xidWarnLimit < FirstNormalTransactionId)
xidWarnLimit -= FirstNormalTransactionId;

Why does it use '-' instead of '+' if variable < FirstNormalTransactionId? In 
this case it is easy to get xidStopLimit > xidWrapLimit or xidWarnLimit > 
xidStopLimit...



Thank you.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-07 Thread Craig Ringer
On 08/05/2014 10:44 PM, Shaun Thomas wrote:
> On 08/05/2014 12:56 AM, Mark Kirkwood wrote:
> 
>> The moral of the story for this case is that mapping Oracle to Postgres
>> datatypes can require some careful thought. Using 'native' types (like
>> integer, float8 etc) will generally give vastly quicker performance.
> 
> We've seen a lot of this ourselves. Oracle's NUMERIC is a native type,
> whereas ours is emulated.

I'm not sure what you mean by "native" vs "emulated" here.

PostgreSQL's NUMERIC is binary-coded decimal with mathematical
operations performed in software.

According to the docs, my impression is that Oracle's NUMBER is stored
more like a decfloat:

http://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#i22289

but my Oracle expertise is admittedly lacking.


New Intel hardware supports IEEE 754:2008 decimal floating point in
hardware, and I'm quite interested in implementing DECFLOAT(n) for
PostgreSQL to take advantage of that.

A DECFLOAT type would also be more compatible with things like the C#
"Decimal" type than our current NUMERIC is.

> At least you used INT though. I've seen too many Oracle shops using
> NUMERIC in PostgreSQL because it's there, and suffering major
> performance hits because of it.

In retrospect it might be a bit of a loss that the numeric type format
doesn't reserve a couple of bits for short-value flags, so we could
store and work with native integers for common values.

There's NumericShort and NumericLong, but no NumericNative or
NumericInt32 or whatever. OTOH, by the time you handle alignment and
padding requirements and the cost of deciding which numeric format the
input is, it might not've been much faster. Presumably it was looked at
during the introduction of NumericShort.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-07 Thread Simon Riggs
On 6 August 2014 17:27, Bruce Momjian  wrote:
> On Wed, Aug  6, 2014 at 01:15:32PM -0300, Claudio Freire wrote:
>> On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian  wrote:
>> >
>> > Well, for file-level backups we have:
>> >
>> > 1) use file modtime (possibly inaccurate)
>> > 2) use file modtime and checksums (heavy read load)
>> >
>> > For block-level backups we have:
>> >
>> > 3) accumulate block numbers as WAL is written
>> > 4) read previous WAL at incremental backup time
>> > 5) read data page LSNs (high read load)
>> >
>> > The question is which of these do we want to implement?  #1 is very easy
>> > to implement, but incremental _file_ backups are larger than block-level
>> > backups.  If we have #5, would we ever want #2?  If we have #3, would we
>> > ever want #4 or #5?
>>
>> You may want to implement both #3 and #2. #3 would need a config
>> switch to enable updating the bitmap. That would make it optional to
>> incur the I/O cost of updating the bitmap. When the bitmap isn't
>> there, the backup would use #2. Slow, but effective. If slowness is a
>> problem for you, you enable the bitmap and do #3.
>>
>> Sounds reasonable IMO, and it means you can start by implementing #2.
>
> Well, Robert Haas had the idea of a separate process that accumulates
> the changed WAL block numbers, making it low overhead.  I question
> whether we need #2 just to handle cases where they didn't enable #3
> accounting earlier.  If that is the case, just do a full backup and
> enable #3.

Well, there is a huge difference between file-level and block-level backup.

Designing, writing and verifying block-level backup to the point that
it is acceptable is a huge effort. (Plus, I don't think accumulating
block numbers as they are written will be "low overhead". Perhaps
there was a misunderstanding there and what is being suggested is to
accumulate file names that change as they are written, since we
already do that in the checkpointer process, which would be an option
between 2 and 3 on the above list).

What is being proposed here is file-level incremental backup that
works in a general way for various backup management tools. It's the
80/20 first step on the road. We get most of the benefit, it can be
delivered in this release as robust, verifiable code. Plus, that is
all we have budget for, a fairly critical consideration.

Big features need to be designed incrementally across multiple
releases, delivering incremental benefit (or at least that is what I
have learned). Yes, working block-level backup would be wonderful, but
if we hold out for that as the first step then we'll get nothing
anytime soon.

I would also point out that the more specific we make our backup
solution the less likely it is to integrate with external backup
providers. Oracle's RMAN requires specific support in external
software. 10 years after Postgres PITR we still see many vendors
showing "PostgreSQL Backup Supported" as meaning pg_dump only.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enhancing pgbench parameter checking

2014-08-07 Thread Tatsuo Ishii
Fabien,

> I have not tested, but the patch looks ok in principle.

Thanks for the review. I have registered it to Aug Commit fest.
https://commitfest.postgresql.org/action/patch_view?id=1532

> I'm not sure of the variable name "is_non_init_parameter_set". I would 
> suggest "benchmarking_option_set"?

Ok, I will replace the variable name as you suggested.

> Also, to be consistent, maybe it should check that no initialization-specific 
> option are set when benchmarking.

Good suggestion. Here is the v2 patch.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index c0e5e24..67d7960 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -2520,6 +2520,9 @@ main(int argc, char **argv)
 	char	   *filename = NULL;
 	bool		scale_given = false;
 
+	bool		benchmarking_option_set = false;
+	bool		initializing_option_set = false;
+
 	CState	   *state;			/* status of clients */
 	TState	   *threads;		/* array of thread */
 
@@ -2599,12 +2602,14 @@ main(int argc, char **argv)
 break;
 			case 'S':
 ttype = 1;
+benchmarking_option_set = true;
 break;
 			case 'N':
 ttype = 2;
+benchmarking_option_set = true;
 break;
 			case 'c':
-nclients = atoi(optarg);
+benchmarking_option_set = true;
 if (nclients <= 0 || nclients > MAXCLIENTS)
 {
 	fprintf(stderr, "invalid number of clients: %d\n", nclients);
@@ -2629,6 +2634,7 @@ main(int argc, char **argv)
 #endif   /* HAVE_GETRLIMIT */
 break;
 			case 'j':			/* jobs */
+benchmarking_option_set = true;
 nthreads = atoi(optarg);
 if (nthreads <= 0)
 {
@@ -2637,9 +2643,11 @@ main(int argc, char **argv)
 }
 break;
 			case 'C':
+benchmarking_option_set = true;
 is_connect = true;
 break;
 			case 'r':
+benchmarking_option_set = true;
 is_latencies = true;
 break;
 			case 's':
@@ -2652,6 +2660,7 @@ main(int argc, char **argv)
 }
 break;
 			case 't':
+benchmarking_option_set = true;
 if (duration > 0)
 {
 	fprintf(stderr, "specify either a number of transactions (-t) or a duration (-T), not both.\n");
@@ -2665,6 +2674,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'T':
+benchmarking_option_set = true;
 if (nxacts > 0)
 {
 	fprintf(stderr, "specify either a number of transactions (-t) or a duration (-T), not both.\n");
@@ -2681,12 +2691,14 @@ main(int argc, char **argv)
 login = pg_strdup(optarg);
 break;
 			case 'l':
+benchmarking_option_set = true;
 use_log = true;
 break;
 			case 'q':
 use_quiet = true;
 break;
 			case 'f':
+benchmarking_option_set = true;
 ttype = 3;
 filename = pg_strdup(optarg);
 if (process_file(filename) == false || *sql_files[num_files - 1] == NULL)
@@ -2696,6 +2708,8 @@ main(int argc, char **argv)
 {
 	char	   *p;
 
+	benchmarking_option_set = true;
+
 	if ((p = strchr(optarg, '=')) == NULL || p == optarg || *(p + 1) == '\0')
 	{
 		fprintf(stderr, "invalid variable definition: %s\n", optarg);
@@ -2708,6 +2722,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'F':
+initializing_option_set = true;
 fillfactor = atoi(optarg);
 if ((fillfactor < 10) || (fillfactor > 100))
 {
@@ -2716,6 +2731,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'M':
+benchmarking_option_set = true;
 if (num_files > 0)
 {
 	fprintf(stderr, "query mode (-M) should be specifiled before transaction scripts (-f)\n");
@@ -2731,6 +2747,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'P':
+benchmarking_option_set = true;
 progress = atoi(optarg);
 if (progress <= 0)
 {
@@ -2745,6 +2762,8 @@ main(int argc, char **argv)
 	/* get a double from the beginning of option value */
 	double		throttle_value = atof(optarg);
 
+	benchmarking_option_set = true;
+
 	if (throttle_value <= 0.0)
 	{
 		fprintf(stderr, "invalid rate limit: %s\n", optarg);
@@ -2756,14 +2775,19 @@ main(int argc, char **argv)
 break;
 			case 0:
 /* This covers long options which take no argument. */
+if (foreign_keys || unlogged_tables)
+	initializing_option_set = true;
 break;
 			case 2:/* tablespace */
+initializing_option_set = true;
 tablespace = pg_strdup(optarg);
 break;
 			case 3:/* index-tablespace */
+initializing_option_set = true;
 index_tablespace = pg_strdup(optarg);
 break;
 			case 4:
+benchmarking_option_set = true;
 sample_rate = atof(optarg);
 if (sample_rate <= 0.0 || sample_rate > 1.0)
 {
@@ -2776,6 +2800,7 @@ main(int argc, char **argv)
 fprintf(stderr, "--aggregate-interval is not currently supported on Windows");
 exit(1);
 #else
+benchmarking_option_set = true;
 agg_interval = atoi(optarg);
 if (agg_interval <= 0)
 

Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-07 Thread furuyao
> Okay, applied the patch.
> 
> I heavily modified your patch based on the master that the refactoring
> patch has been applied. Attached is that modified version. Could you
> review that?

Thank you for the patch.
I did a review of the patch. 

No problem in the patch. 

The patch changes the behavior after ProcessXLogDataMsg returns true.
Previously control went back to the while(1) loop; now the while(r != 0) loop
is continued instead. Since still_sending is false, the extra iterations are
effectively skipped, so the result of the operation does not change.

Regards,

--
Furuya Osamu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-07 Thread Peter Geoghegan
On Wed, Aug 6, 2014 at 10:36 PM, Peter Geoghegan  wrote:
> This *almost* applies to patched Postgres if you pick a benchmark that
> is very sympathetic to my patch. To my surprise, work_mem = '10MB'
> (which results in an external tape sort) is sometimes snapping at the
> heels of a work_mem = '5GB' setting (which results in an in-memory
> quicksort).

Note that this was with a default temp_tablespaces setting that wrote
temp files on my home partition/SSD. With a /dev/shm/ temp tablespace,
tape sort edges ahead, and has a couple of hundred milliseconds on
quicksort for this test case. It's actually faster.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] posix_fadvise() and pg_receivexlog

2014-08-07 Thread Heikki Linnakangas

On 08/07/2014 10:10 AM, Mitsumasa KONDO wrote:

> 2014-08-07 13:47 GMT+09:00 Fujii Masao :
>
>> On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas
>>  wrote:
>>> On 08/06/2014 08:39 PM, Fujii Masao wrote:
>>>> The WAL files that pg_receivexlog writes will not be re-read soon
>>>> basically,
>>>> so we can advise the OS to release any cached pages when WAL file is
>>>> closed. I feel inclined to change pg_receivexlog that way. Thought?
>>>
>>> -1. The OS should be smart enough to not thrash the cache by files that
>>> are
>>> written sequentially and never read.
>
> OS's buffer strategy is optimized for general situation. Do you forget OS
> hackers discussion last a half of year?
>
>> Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe
>> not,
>> so I was thinking that posix_fadvise is called when the server closes WAL
>> file.
>
> That's right.


Well, I'd like to hear someone from the field complaining that 
pg_receivexlog is thrashing the cache and thus reducing the performance 
of some other process. Or at least a synthetic test case that 
demonstrates that happening.



> By the way, does pg_receivexlog process have fsync() in every WAL commit?


It fsync's each file after finishing to write it. Ie. each WAL file is 
fsync'd once.



> If yes, I think that we need no or less fsync() option for the better
> performance. It is general in NOSQL storages.
> If no, we need fsync() option for more getting reliability and data
> integrarity.


Hmm. An fsync=off style option might make sense, although I doubt the 
one fsync at end of file is causing a performance problem for anyone in 
practice. Haven't heard any complaints, anyway.


An option to fsync after every commit record might make sense if you use 
pg_receivexlog with synchronous replication. Doing that would require 
parsing the WAL, though, to see where the commit records are. But then 
again, the fsync's wouldn't need to correspond to commit records. We 
could fsync just before we go to sleep to wait for more WAL to be received.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Fabien COELHO


Hello John,


> [...]
> In fact, the mentioned paper says this about the subject "Moreover, if
> worst-case performance is important, Quicksort is the wrong algorithm."


I fully agree with this conclusion.

--
Fabien


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A worst case for qsort

2014-08-07 Thread Fabien COELHO



> IMHO, while worst case performance is a very useful tool for analyzing
> algorithms (particularly their worst case time complexity), a worst
> case should be put in its practical context. For example, if we had
> reason to be concerned about *adversarial* inputs, I think that there
> is a good chance that our qsort() actually would be problematic to the
> point of driving us to prefer some generally slower alternative.


ISTM that you raise two distinct questions wrt PostgreSQL, which are whether
the worst case performance is really an issue:

 (1) in general
 (2) wrt adversarial inputs

The answer could be (1) "mostly no" and (2) "maybe yes".

It suggests that where qsort is used, the administrator wary of (2) could 
be allowed to use an alternate implementation, maybe some merge sort, say 
by tweaking a configuration option in "postgresql.conf".


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] posix_fadvise() and pg_receivexlog

2014-08-07 Thread Mitsumasa KONDO
Hi,

2014-08-07 13:47 GMT+09:00 Fujii Masao :

> On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas
>  wrote:
> > On 08/06/2014 08:39 PM, Fujii Masao wrote:
> >> The WAL files that pg_receivexlog writes will not be re-read soon
> >> basically,
> >> so we can advise the OS to release any cached pages when WAL file is
> >> closed. I feel inclined to change pg_receivexlog that way. Thought?
> >
> >
> > -1. The OS should be smart enough to not thrash the cache by files that
> are
> > written sequentially and never read.
>
OS's buffer strategy is optimized for general situation. Do you forget OS
hackers discussion last a half of year?


> Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe
> not,
> so I was thinking that posix_fadvise is called when the server closes WAL
> file.

That's right.


> > If we go down this path, we'd need to
> > sprinkle posix_fadvises into many, many places.
>
Why do you aim to be perfect at the beginning?
It is as same as history of postgres, your concern doesn't make sense.


> > Anyway, who are we to say that they won't be re-read soon? You might e.g
> > have a secondary backup site where you copy the files received by
> > pg_receivexlog, as soon as they're completed.
>
> So whether posix_fadvise is called or not needs to be exposed as an
> user-configurable option. We would need to measure how useful exposing
> that is, though.

By the way, does pg_receivexlog process have fsync() in every WAL commit?
If yes, I think that we need no or less fsync() option for the better
performance. It is general in NOSQL storages.
If no, we need fsync() option for more getting reliability and data
integrarity.


Best regards,
--
Mitsumasa KONDO


Re: [HACKERS] postgresql.auto.conf and reload

2014-08-07 Thread Fujii Masao
On Thu, Aug 7, 2014 at 12:28 PM, Amit Kapila  wrote:
> On Wed, Aug 6, 2014 at 11:39 AM, Fujii Masao  wrote:
>>
>> On Tue, Aug 5, 2014 at 12:49 PM, Fujii Masao 
>> wrote:
>> > On Mon, Aug 4, 2014 at 11:52 PM, Tom Lane  wrote:
>> >> Fujii Masao  writes:
>> >>> The patch chooses the last settings for every parameters and ignores
>> >>> the
>> >>> former settings. But I don't think that every parameters need to be
>> >>> processed
>> >>> this way. That is, we can change the patch so that only PGC_POSTMASTER
>> >>> parameters are processed that way. The invalid settings in the
>> >>> parameters
>> >>> except PGC_POSTMASTER can be checked by pg_ctl reload as they are now.
>> >>> Also this approach can reduce the number of comparison to choose the
>> >>> last setting, i.e., "n" in O(n^2) is the number of uncommented
>> >>> *PGC_POSTMASTER*
>> >>> parameters (not every parameters). Thought?
>> >>
>> >> I don't find that to be a particularly good idea.  In the first place,
>> >> it introduces extra complication and a surprising difference in the
>> >> behavior of different GUCs.  In the second place, I thought part of the
>> >> point of this patch was to suppress log messages complaining about
>> >> invalid values that then weren't actually used for anything.  That
>> >> issue
>> >> exists just as much for non-POSTMASTER variables.  (IOW, "value cannot
>> >> be changed now" is not the only log message we're trying to suppress.)
>> >
>> > Yep, sounds reasonable. This makes me think that we can suppress
>> > such invalid message of the parameters which are actually not used
>> > at not only conf file reload but also *postmaster startup*. That's
>> > another
>> > story, though. Anyway, barring any objection, I will commit Amit's
>> > patch.
>>
>> Applied the slightly-modified version!
>
> Thanks.  There is a commitfest entry [1] for this patch; do you
> want something more to be addressed, or shall we mark it as
> committed?
>
> [1]
> https://commitfest.postgresql.org/action/patch_view?id=1500

Yeah, let's mark this as committed, since your patch has been committed and
the originally reported problem has been fixed. We are now discussing another
patch, but I will add that as a new CF entry.

Regards,

-- 
Fujii Masao




Re: [HACKERS] postgresql.auto.conf and reload

2014-08-07 Thread Fujii Masao
On Thu, Aug 7, 2014 at 12:36 PM, Amit Kapila  wrote:
> On Wed, Aug 6, 2014 at 1:32 PM, Fujii Masao  wrote:
>> On Tue, Jul 29, 2014 at 9:26 PM, Amit Kapila 
>> wrote:
>> >
>> > The reason is that during startup DataDir is not set by the time the
>> > server processes the config file, which is why we process the .auto.conf
>> > file in a second pass.  I think ideally it should ignore an invalid
>> > setting in such a case, as it does during reload; however, currently
>> > there is no good way to process .auto.conf in case DataDir is not set,
>> > so we handle it separately in the second pass.
>>
>> What about picking up only data_directory at the first pass?
>
> I think that will work out, and for that I think we might need to add a
> few checks during ProcessConfigFile.  Do you want to address it
> during parsing of the config file or during setting of values?

I prefer "during paring". The attached patch does that. In this patch,
the first call of ProcessConfigFile() picks up only data_directory. One
concern of this patch is that the logic is a bit ugly. Do you have any
other better idea?
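
To make the intended behavior concrete, here is a small self-contained sketch
(plain C, not the PostgreSQL code) of the two rules involved: on the first
pass, while DataDir is unknown, only data_directory is honored, and when a
parameter appears more than once the last occurrence wins:

/* Self-contained illustration only, not PostgreSQL code. */
#include <stdio.h>
#include <string.h>

struct kv
{
	const char *name;
	const char *value;
};

/* Return the value of 'want', honoring "the last setting wins". */
static const char *
lookup_last(const struct kv *items, int n, const char *want, int datadir_known)
{
	const char *result = NULL;

	for (int i = 0; i < n; i++)
	{
		/* First pass: ignore everything except data_directory. */
		if (!datadir_known && strcmp(items[i].name, "data_directory") != 0)
			continue;
		if (strcmp(items[i].name, want) == 0)
			result = items[i].value;	/* later entries override earlier ones */
	}
	return result;
}

int
main(void)
{
	const struct kv conf[] = {
		{"shared_buffers", "128MB"},
		{"data_directory", "/var/lib/pgsql/data"},
		{"shared_buffers", "1GB"},	/* duplicate: this one wins */
	};
	int			n = sizeof(conf) / sizeof(conf[0]);

	/* First pass: only data_directory is visible. */
	printf("pass 1: data_directory = %s\n",
		   lookup_last(conf, n, "data_directory", 0));

	/* Second pass: everything is visible, and the last duplicate wins. */
	printf("pass 2: shared_buffers = %s\n",
		   lookup_last(conf, n, "shared_buffers", 1));
	return 0;
}

The attached patch applies the first rule inside ParseConfigFp() by skipping
every other option until DataDir is known.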

Regards,

-- 
Fujii Masao
*** a/src/backend/utils/misc/guc-file.l
--- b/src/backend/utils/misc/guc-file.l
***
*** 648,653  ParseConfigFp(FILE *fp, const char *config_file, int depth, int elevel,
--- 648,672 
  		else
  		{
  			/*
+ 			 * If DataDir is not set yet, pick up only data_directory.  In that
+ 			 * case the configuration file is being read for the first time and
+ 			 * the PG_AUTOCONF_FILENAME file cannot be read yet.  We should not
+ 			 * yet pick up any settings other than data_directory from
+ 			 * postgresql.conf, because they might be overwritten by the
+ 			 * settings in the PG_AUTOCONF_FILENAME file, which will be read
+ 			 * later.  OTOH, since data_directory is guaranteed not to appear
+ 			 * in the PG_AUTOCONF_FILENAME file, it will never be overwritten
+ 			 * later.  So only data_directory needs to be picked up here, in
+ 			 * order to read the PG_AUTOCONF_FILENAME file later.
+ 			 */
+ 			if (DataDir == NULL && strcmp(opt_name, "data_directory") != 0)
+ 			{
+ 				if (token == 0)
+ 					break;
+ 				continue;
+ 			}
+ 
+ 			/*
  			 * ordinary variable, append to list.  For multiple items of
  			 * same parameter, retain only which comes later.
  			 */
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***
*** 4342,4347  SelectConfigFiles(const char *userDoption, const char *progname)
--- 4342,4354 
  		return false;
  	}
  
+ 	/*
+ 	 * Read the configuration file for the first time.  Only the
+ 	 * data_directory parameter is picked up here, to determine the data
+ 	 * directory, so that the PG_AUTOCONF_FILENAME file can be read next
+ 	 * time.  The configuration file must be read again later to pick up
+ 	 * all the other parameters.
+ 	 */
  	ProcessConfigFile(PGC_POSTMASTER);
  
  	/*
