Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Teodor Sigaev

It seems that *scanf() with the %s format occurs only here:
- check.c - get_bin_version()
- server.c - get_major_server_version()
- filemap.c - isRelDataFile()
- pg_backup_directory.c - _LoadBlobs()
- xlog.c - do_pg_stop_backup()
- mac.c - macaddr_in()
I don't think sscanf() has to deal with UTF-8 characters in these places, so this
is probably only a spell.c issue.


Hmm. In
src/backend/access/transam/xlog.c, read_tablespace_map()
uses %s in sscanf(), which looks suspicious. I don't fully understand it, but it
seems to read an OID as a string, so it should be safe in the usual case.


Next, _LoadBlobs() reads a file name (fname) with the help of sscanf(). Could the
file name be UTF-8-encoded here?




I agree that the previous patch is wrong. Instead of using a new parse_ooaffentry()
function, maybe it is better to use sscanf() with the %ls format, which reads a
wide-character string.

Does the %ls modifier exist everywhere?
Apple's docs say
(https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/sscanf.3.html):

s ...
  If an l qualifier is present, the next pointer must be a pointer to wchar_t,
  into which the input will be placed after conversion by mbrtowc

Actually, it means that a wchar2char() call should be used, but that relies on
wcstombs[_l], which may not be present on some platforms. Does the presence of
the l modifier imply that wcstombs[_l] is present too? What do we do if %ls
exists but wcstombs[_l] does not?


Locale problems drive me a bit crazy, and it seems to me that Artur's patch is a
good idea. I don't remember exactly, but it seems that commit
7ac8a4be8946c11d5a6bf91bb971b9750c1c60e5 introduced parse_affentry() in place of
the corresponding sscanf() precisely to avoid encoding problems with scanf().





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread Stephen Frost
* David Steele (da...@pgmasters.net) wrote:
> On 2/10/16 9:44 AM, Stephen Frost wrote:
> > Hrmmm.  If that's the case then perhaps you're right.  I liked the
> > general idea of not having to maintain a TCP connection during the
> > entire backup (TCP connections can be annoyingly finicky in certain
> > environments...), but I'm not sure it's worth a lot of additional
> > complexity.
> 
> Well, pgBackRest holds a connection to PostgreSQL through the entire
> backup and will abort the backup if it is severed.  The connection is
> always held locally, though, even if the master backup process is on a
> different server.  I haven't gotten any reports of aborted backups due
> to the connection failing.

Yeah, I know, I had been thinking it might be nice to not do that at
some point in the future, but thinking on it further, we've already got
a "pick up where you left off" capability with pgbackrest, so it's
really not a huge deal if the backup fails and has to be restarted, and
this does remove (or at least deprecates) the need to use the "stop an
already-running backup if you find one" option that pgbackrest has.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-10 Thread Tom Lane
Noah Misch  writes:
>>> That's reasonable.  If you would like higher-fidelity data, I can run loops of
>>> "pg_ctl -w start; make installcheck; pg_ctl -t900 -w stop", and I could run
>>> that for HEAD and 9.2 simultaneously.  A day of logs from that should show
>>> clearly if HEAD is systematically worse than 9.2.

>> That sounds like a fine plan, please do it.

> Log files:
> HEAD: 
> https://drive.google.com/uc?export=download=0B9IURs2-_2ZMakl2TjFHUlpvc1k
> 92:   
> https://drive.google.com/uc?export=download=0B9IURs2-_2ZMYVZtY3VqcjBFX1k

Thanks for doing this.

I did a bit of analysis of these logs.  There is no case in the HEAD log
where 'lock files all released' comes out more than 2.221 seconds after
'shutdown checkpoint complete'.  So it doesn't appear that stats writing
is ever a big problem for your machine.  The checkpoints themselves,
though ... here are the slowest few checkpoints on HEAD:

regression=# select ts,write,sync,total-write-sync as other, total from chkpts 
order by total desc limit 10;
             ts             |  write  |  sync   |  other  |  total  
----------------------------+---------+---------+---------+---------
 2016-02-09 03:33:14.671-08 | 287.691 | 355.528 |  12.509 | 655.728
 2016-02-09 11:40:13.565-08 | 287.327 | 314.218 |  44.059 | 645.604
 2016-02-09 12:36:49.753-08 | 149.652 | 481.736 |  13.094 | 644.482
 2016-02-09 04:32:53.708-08 | 214.896 | 304.102 |  17.929 | 536.927
 2016-02-09 16:31:34.661-08 | 254.462 | 186.246 |  51.541 | 492.249
 2016-02-09 08:27:04.365-08 |  14.479 | 298.043 | 100.586 | 413.108
 2016-02-09 07:37:57.999-08 | 280.960 |  10.408 |  14.287 | 305.655
 2016-02-09 07:06:30.121-08 | 155.331 |  93.433 |  23.814 | 272.578
 2016-02-09 17:45:34.422-08 | 149.471 | 104.185 |  18.598 | 272.254
 2016-02-09 00:51:41.157-08 | 134.502 | 103.482 |  34.110 | 272.094
(10 rows)

and the same for 9.2:

regression=# select ts,write,sync,total-write-sync as other, total from 
chkpts92 order by total desc limit 10;
             ts             |  write  |  sync   |  other  |  total  
----------------------------+---------+---------+---------+---------
 2016-02-08 23:33:36.16-08  |  66.904 | 397.653 | 229.020 | 693.577
 2016-02-09 11:39:57.061-08 |   2.471 | 303.924 | 201.923 | 508.318
 2016-02-09 12:37:43.52-08  |  17.620 | 333.983 | 141.172 | 492.775
 2016-02-09 03:27:12.904-08 |  48.761 | 124.453 | 299.877 | 473.091
 2016-02-09 04:32:49.668-08 |  11.328 | 292.310 | 148.088 | 451.726
 2016-02-09 16:31:25.657-08 |   3.860 | 184.042 | 248.488 | 436.390
 2016-02-09 08:27:04.407-08 |  12.019 | 300.510 | 102.855 | 415.384
 2016-02-09 03:32:30.718-08 | 296.907 |  15.260 |   5.644 | 317.811
 2016-02-09 15:31:41.691-08 |   0.385 | 307.996 |   0.155 | 308.536
 2016-02-09 12:23:10.92-08  |  93.634 | 161.581 |  19.342 | 274.557
(10 rows)

It looks like you're going to have to set PGCTLTIMEOUT somewhere
north of 10 minutes to keep these animals from failing under load.

Interestingly, we seem to have managed to greatly reduce the "other"
time (which I presume is basically mdpostchkpt unlinking) since 9.2.
The worst case observed in HEAD is about 100s:

regression=# select ts,write,sync,total-write-sync as other, total from chkpts 
order by total-write-sync desc limit 10;
             ts             |  write  |  sync   |  other  |  total  
----------------------------+---------+---------+---------+---------
 2016-02-09 08:27:04.365-08 |  14.479 | 298.043 | 100.586 | 413.108
 2016-02-09 06:15:22.477-08 |  75.099 |  26.590 |  90.452 | 192.141
 2016-02-09 16:31:34.661-08 | 254.462 | 186.246 |  51.541 | 492.249
 2016-02-09 15:25:01.586-08 |   0.015 |   2.985 |  47.822 |  50.822
 2016-02-09 11:40:13.565-08 | 287.327 | 314.218 |  44.059 | 645.604
 2016-02-09 08:56:51.339-08 | 135.464 |  31.242 |  39.131 | 205.837
 2016-02-09 20:23:52.797-08 |   1.309 |  12.155 |  36.350 |  49.814
 2016-02-09 00:51:41.157-08 | 134.502 | 103.482 |  34.110 | 272.094
 2016-02-09 19:28:33.154-08 |   0.575 |  20.434 |  33.803 |  54.812
 2016-02-09 03:47:02.57-08  |   0.066 |  37.475 |  33.119 |  70.660
(10 rows)

but 9.2's worst cases greatly exceed that:

regression=# select ts,write,sync,total-write-sync as other, total from 
chkpts92 order by total-write-sync desc limit 10;
             ts             | write  |  sync   |  other  |  total  
----------------------------+--------+---------+---------+---------
 2016-02-09 03:27:12.904-08 | 48.761 | 124.453 | 299.877 | 473.091
 2016-02-09 16:31:25.657-08 |  3.860 | 184.042 | 248.488 | 436.390
 2016-02-08 23:33:36.16-08  | 66.904 | 397.653 | 229.020 | 693.577
 2016-02-09 11:39:57.061-08 |  2.471 | 303.924 | 201.923 | 508.318
 2016-02-09 07:37:44.48-08  |  3.317 |   1.494 | 158.159 | 162.970
 2016-02-09 04:32:49.668-08 | 11.328 | 292.310 | 148.088 | 451.726
 2016-02-09 12:37:43.52-08  | 17.620 | 333.983 | 141.172 | 492.775
 2016-02-09 08:27:04.407-08 | 12.019 | 300.510 | 102.855 | 415.384
 2016-02-09 18:17:32.193-08 | 90.370 |  25.369 

Re: [HACKERS] extend pgbench expressions with functions

2016-02-10 Thread Michael Paquier
On Tue, Feb 9, 2016 at 5:06 AM, Alvaro Herrera  wrote:
> Fabien COELHO wrote:
>>
>> Hello,
>>
>> >>v23 attached, which does not change the message but does the other fixes.
>> >
>> >This doesn't apply anymore
>>
>> Indeed, but the latest version was really v25.
>>
>> >-- please rebase and submit to the next CF.
>>
>> I already provided it as v25 on Feb 1st.
>>
>> >I closed it here as returned with feedback.
>>
>> I turned it to "moved to next CF" as the patch is already on the queue.
>
> Ah, I didn't see v25 anywhere.  What you did should be fine, thanks.

I just had another look at this patch.

+  parameter%  of the time.
Nitpick: double space here.


+   switch (func)
{
[...]
+   }
+   default:
+   return false;
}
In evalFunc(), the default case in the switch for the operator functions
should never be reached. Adding an Assert(0) there, for example, is something
to consider.

PGBT_NONE and PGBENCH_NONE are used nowhere. Why not remove them, or
replace the code paths where they would be used with assertions that
trigger errors for future development?

Other than that, the patch looks to be in pretty good shape to me.
-- 
Michael




Re: [HACKERS] extend pgbench expressions with functions

2016-02-10 Thread Fabien COELHO


Hello Michaël,


+  parameter%  of the time.
Nitpick: double space here.


Ok.


+   default:
+   return false;
   }
In evalFunc(), the default case in the switch for the operator functions
should never be reached. Adding an Assert(0) there, for example, is something
to consider.


Ok for Assert + a comment.


PGBT_NONE and PGBENCH_NONE are used nowhere. Why not removing them


Ok.

v26 attached implements these changes.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ade1b53..eaa0889 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -796,17 +796,21 @@ pgbench  options  dbname
   Sets variable varname to an integer value calculated
   from expression.
   The expression may contain integer constants such as 5432,
-  references to variables :variablename,
+  double constants such as 3.14159,
+  references to integer variables :variablename,
   and expressions composed of unary (-) or binary operators
-  (+, -, *, /, %)
-  with their usual associativity, and parentheses.
+  (+, -, *, /,
+  %) with their usual associativity, function calls and
+  parentheses.
+   shows the available
+  functions.
  
 
  
   Examples:
 
 \set ntellers 10 * :scale
-\set aid (1021 * :aid) % (10 * :scale) + 1
+\set aid (1021 * random(1, 10 * :scale)) % (10 * :scale) + 1
 
 

@@ -826,66 +830,35 @@ pgbench  options  dbname
  
 
  
-  By default, or when uniform is specified, all values in the
-  range are drawn with equal probability.  Specifying gaussian
-  or  exponential options modifies this behavior; each
-  requires a mandatory parameter which determines the precise shape of the
-  distribution.
- 
+  
+   
+
+ \setrandom n 1 10 or \setrandom n 1 10 uniform
+ is equivalent to \set n random(1, 10) and uses a uniform
+ distribution.
+
+   
 
- 
-  For a Gaussian distribution, the interval is mapped onto a standard
-  normal distribution (the classical bell-shaped Gaussian curve) truncated
-  at -parameter on the left and +parameter
-  on the right.
-  Values in the middle of the interval are more likely to be drawn.
-  To be precise, if PHI(x) is the cumulative distribution
-  function of the standard normal distribution, with mean mu
-  defined as (max + min) / 2.0, with
-
- f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
-(2.0 * PHI(parameter) - 1.0)
-
-  then value i between min and
-  max inclusive is drawn with probability:
-  f(i + 0.5) - f(i - 0.5).
-  Intuitively, the larger parameter, the more
-  frequently values close to the middle of the interval are drawn, and the
-  less frequently values close to the min and
-  max bounds. About 67% of values are drawn from the
-  middle 1.0 / parameter, that is a relative
-  0.5 / parameter around the mean, and 95% in the middle
-  2.0 / parameter, that is a relative
-  1.0 / parameter around the mean; for instance, if
-  parameter is 4.0, 67% of values are drawn from the
-  middle quarter (1.0 / 4.0) of the interval (i.e. from
-  3.0 / 8.0 to 5.0 / 8.0) and 95% from
-  the middle half (2.0 / 4.0) of the interval (second and
-  third quartiles). The minimum parameter is 2.0 for
-  performance of the Box-Muller transform.
- 
+  
+   
+\setrandom n 1 10 exponential 3.0 is equivalent to
+\set n random_exponential(1, 10, 3.0) and uses an
+exponential distribution.
+   
+  
 
- 
-  For an exponential distribution, parameter
-  controls the distribution by truncating a quickly-decreasing
-  exponential distribution at parameter, and then
-  projecting onto integers between the bounds.
-  To be precise, with
-
-f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))
-
-  Then value i between min and
-  max inclusive is drawn with probability:
-  f(x) - f(x + 1).
-  Intuitively, the larger parameter, the more
-  frequently values close to min are accessed, and the
-  less frequently values close to max are accessed.
-  The closer to 0 parameter, the flatter (more uniform)
-  the access distribution.
-  A crude approximation of the distribution is that the most frequent 1%
-  values in the range, close to min, are drawn
-  parameter% of the time.
-  parameter value must be strictly positive.
+  
+   
+\setrandom n 1 10 gaussian 2.0 is equivalent to
+\set n random_gaussian(1, 10, 2.0), and uses a gaussian
+distribution.
+   
+  
+ 
+
+   See the documentation of these functions below for further information
+   about the precise shape of these distributions, depending on the value
+   of the 

Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread Magnus Hagander
On Wed, Feb 10, 2016 at 4:38 PM, David Steele  wrote:

> On 2/10/16 7:46 AM, Magnus Hagander wrote:
> > Per discussion at the developer meeting in Brussels, here's a patch that
> > makes some updates to the backup APIs, to support non-exclusive backups
> > without using pg_basebackup. <...>
>
> This sounds like a great idea and I have signed up to review.
>
> > * A new version of pg_stop_backup is created, taking an argument
> > specifying if it's exclusive or not. The current version of
> > pg_stop_backup() continues to work for exclusive backups, with no
> > changes to behavior. The new pg_stop_backup will return a record of
> > three columns instead of just the value -- the LSN (pglsn), the backup
> > label file (text) and the tablespace map file (text). If used to cancel
> > an exclusive backup, backup label file and tablespace map will be NULL.
> > At this point it's up to the backup software to write down the files in
> > the location of the backup.
>
> It would be nice if this new pg_stop_backup() function also output the
> exact time when the backup stopped.  Depending on how long
> pg_stop_backup() waits for WAL to be archived a time-stamp recorded
> after pg_stop_backup() returns can be pretty far off.
>
> This information is handy for automating selection of a backup when
> doing time-based PITR (or so the user can manually select).  For
> exclusive backups it is possible to parse the .backup file to get this
> information but non-exclusive backups do not output the .backup file.
>

The non-exclusive backups *do* output the backup_label file; it just shows
up as a text field instead of as a separate file. You're supposed to write
that data to a file in the backup program.
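As a sketch of how a backup client might drive the proposed API (function signatures as described in this thread; the argument order and the result column names are my guesses, not taken from the patch):

```sql
-- On a connection that stays open for the whole backup:
SELECT pg_start_backup('nightly', false, false);   -- label, fast, exclusive

-- ... copy the data directory with any file-level tool ...

-- The new form returns the LSN plus the file contents the client must
-- write into the backup itself (backup_label and tablespace_map):
SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false);
```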



> I would be happy to see the time-stamp returned from the
> pg_start_backup() function as well.  It's a bigger change, but once
> pg_start_backup() returns multiple columns it will be easier to add more
> columns in the future.
>
> In fact, it might be better if backup_label and tablespace_map were
> output by pg_start_backup().  This would give the backup software more
> information to work with from the start.
>

I was trying to figure out why that's a bad idea, but I can't really say
why :)

For pg_basebackup backups, I think the reason we put it at the end is
simply that we don't want to write it into the backup directory until the
rest of the backup is complete, since it's not going to be useful before
then. But that requirement can certainly be put on the backup client
instead of the server. With the newer API it's going to have to keep those
things around anyway.

That would then change the function syntax for pg_start_backup() but
instead make pg_stop_backup() now behave the same as it did before.

Would anybody else like to comment on if that's a good idea or if there are
any holes in it? :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Artur Zakirov

On 09.02.2016 20:13, Tom Lane wrote:

I do not like this patch much.  It is basically "let's stop using sscanf()
because it seems to have a bug on one platform".  There are at least two
things wrong with that approach:

1. By my count there are about 80 uses of *scanf() in our code.  Are we
going to replace every one of them with hand-rolled code?  If not, why
is only this instance vulnerable?  How can we know whether future uses
will have a problem?


It seems that *scanf() with the %s format occurs only here:
- check.c - get_bin_version()
- server.c - get_major_server_version()
- filemap.c - isRelDataFile()
- pg_backup_directory.c - _LoadBlobs()
- xlog.c - do_pg_stop_backup()
- mac.c - macaddr_in()
I don't think sscanf() has to deal with UTF-8 characters in these places, so
this is probably only a spell.c issue.


I agree that the previous patch is wrong. Instead of using a new
parse_ooaffentry() function, maybe it is better to use sscanf() with the %ls
format, which reads a wide-character string.




2. We're not being very good citizens of the software universe if we
just install a hack in Postgres rather than nagging Apple to fix the
bug at its true source.

I think the appropriate next step to take is to dig into the OS X
sources (see http://www.opensource.apple.com, I think probably the
relevant code is in the Libc package) and identify exactly what is
causing the misbehavior.  That would both allow an informed answer
to point #1 and greatly increase the odds of getting action on a
bug report to Apple.  Even if we end up applying this patch verbatim,
I think we need that information first.

regards, tom lane



I think this is not a bug; it is normal behavior. On Mac OS, sscanf() with
the %s format reads the string one character at a time. The letter 'х' is 2
bytes long, and sscanf() splits it into two invalid characters.


In conclusion, I think spell.c should use sscanf() with the %ls format.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-02-10 Thread Thom Brown
On 10 February 2016 at 08:00, Rushabh Lathia  wrote:
> Fujita-san, I am attaching update version of the patch, which added
> the documentation update.
>
> Once we finalize this, I feel good with the patch and think that we
> could mark this as ready for committer.

I find this wording a bit confusing:

"If the IsForeignRelUpdatable pointer is set to NULL, foreign tables
are assumed to be insertable, updatable, or deletable either the FDW
provides ExecForeignInsert,ExecForeignUpdate, or ExecForeignDelete
respectively or if the FDW optimizes a foreign table update on a
foreign table using PlanDMLPushdown (PlanDMLPushdown still needs to
provide BeginDMLPushdown, IterateDMLPushdown and EndDMLPushdown to
execute the optimized update.)."

This is a very long sentence, and the word "either" doesn't work here.

Also:

"When the query doesn't has the clause, the FDW must also increment
the row count for the ForeignScanState node in the EXPLAIN ANALYZE
case."

Should read "doesn't have"

The rest looks fine AFAICT.

Regards

Thom




Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread Andres Freund
Hi,

On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote:
> Per discussion at the developer meeting in Brussels, here's a patch that
> makes some updates to the backup APIs, to support non-exclusive backups
> without using pg_basebackup.

Thanks for following through with this!

> * If the client disconnects with a non-exclusive backup running, the backup
> is automatically aborted. This is the same thing that pg_basebackup does.
> To use these non-exclusive backups the backup software will have to
> maintain a persistent connection to the database -- something that should
> not be a problem for any of the modern ones out there (but would've been
> slightly trickier back in the days when we suggested shellscripts)

I think we might want to make this one optional, but I'm not going to
fight super hard for it.

> * A new version of pg_stop_backup is created, taking an argument specifying
> if it's exclusive or not. The current version of pg_stop_backup() continues
> to work for exclusive backups, with no changes to behavior. The new
> pg_stop_backup will return a record of three columns instead of just the
> value -- the LSN (pglsn), the backup label file (text) and the tablespace
> map file (text).

I wonder if we shouldn't change the API a bit more aggressively. Right
now we return the label and the map - but what if there's more files at
some later point? One way would be to make it a SETOF function returning
'filename', 'content' or such.  Makes it less clear what to do with the
lsn admittedly.
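For example, the SETOF shape could look something like this (entirely hypothetical function and column names):

```sql
-- SETOF variant: one (filename, content) row per file the backup client
-- must write, leaving room to add more files later.
SELECT filename, content FROM pg_stop_backup_v2(false);
```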

Regards,

Andres




Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

2016-02-10 Thread Artur Zakirov

On 02.02.2016 15:45, Artur Zakirov wrote:

On 01.02.2016 20:12, Artur Zakirov wrote:


I have changed the patch:
1 - trgm2.data was corrected, duplicates were deleted.
2 - I have added operators <<-> and <->> with GiST index support. A
regression test will pass only with the patch
http://www.postgresql.org/message-id/capphfdt19fwqxaryjkzxb3oxmv-kan3fluzrooare_u3h3c...@mail.gmail.com


3 - the function substring_similarity() was renamed to
subword_similarity().

But there is no substring_similarity_pos() function yet; it is not
trivial.



Sorry, there was a typo in the previous patch. Here is the fixed patch.



I have attached a new version of the patch. It fixes errors in the operators
<->> and %>:
- operator <->> did not pass the regression test on 32-bit CentOS (gcc
4.4.7 20120313).
- operator %> did not pass the regression test on 32-bit FreeBSD (gcc
4.2.1 20070831).


This was caused by gcc's optimization of a variable.

This patch also corrects the pg_trgm documentation. The operators are now
written as %> and <->> (not <% and <<->).


There is a problem with adding the substring_similarity_pos() function: it
could add overhead, because we would additionally need to store character
positions, including spaces, and spaces between words are lost in the
current implementation.

Is it actually needed?


In conclusion, this patch introduces:
1 - functions:
- subword_similarity()
2 - operators:
- %>
- <->>
3 - GUC variables:
- pg_trgm.sml_limit
- pg_trgm.subword_limit
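For illustration, hypothetical usage of the pieces above (the table name, threshold values, and operand order are my guesses, not taken from the patch):

```sql
SET pg_trgm.sml_limit = 0.5;        -- whole-string similarity threshold
SET pg_trgm.subword_limit = 0.5;    -- new threshold for subword matching

SELECT subword_similarity('word', 'two words');

-- Boolean and distance forms of the same measure:
SELECT t FROM test_trgm WHERE 'word' %> t;
SELECT t, 'word' <->> t AS dist FROM test_trgm ORDER BY dist LIMIT 5;
```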

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
*** a/contrib/pg_trgm/pg_trgm--1.2.sql
--- b/contrib/pg_trgm/pg_trgm--1.2.sql
***
*** 3,13 
--- 3,15 
  -- complain if script is sourced in psql, rather than via CREATE EXTENSION
  \echo Use "CREATE EXTENSION pg_trgm" to load this file. \quit
  
+ -- Deprecated function
  CREATE FUNCTION set_limit(float4)
  RETURNS float4
  AS 'MODULE_PATHNAME'
  LANGUAGE C STRICT VOLATILE;
  
+ -- Deprecated function
  CREATE FUNCTION show_limit()
  RETURNS float4
  AS 'MODULE_PATHNAME'
***
*** 26,32  LANGUAGE C STRICT IMMUTABLE;
  CREATE FUNCTION similarity_op(text,text)
  RETURNS bool
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT STABLE;  -- stable because depends on trgm_limit
  
  CREATE OPERATOR % (
  LEFTARG = text,
--- 28,34 
  CREATE FUNCTION similarity_op(text,text)
  RETURNS bool
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT STABLE;  -- stable because depends on pg_trgm.sml_limit
  
  CREATE OPERATOR % (
  LEFTARG = text,
*** a/contrib/pg_trgm/trgm.h
--- b/contrib/pg_trgm/trgm.h
***
*** 105,111  typedef char *BITVECP;
  
  typedef struct TrgmPackedGraph TrgmPackedGraph;
  
! extern float4 trgm_limit;
  
  extern uint32 trgm2int(trgm *ptr);
  extern void compact_trigram(trgm *tptr, char *str, int bytelen);
--- 105,111 
  
  typedef struct TrgmPackedGraph TrgmPackedGraph;
  
! extern double trgm_sml_limit;
  
  extern uint32 trgm2int(trgm *ptr);
  extern void compact_trigram(trgm *tptr, char *str, int bytelen);
*** a/contrib/pg_trgm/trgm_gin.c
--- b/contrib/pg_trgm/trgm_gin.c
***
*** 206,212  gin_trgm_consistent(PG_FUNCTION_ARGS)
  			 * similarity is just c / len1.
  			 * So, independly on DIVUNION the upper bound formula is the same.
  			 */
! 			res = (nkeys == 0) ? false : ((((((float4) ntrue) / ((float4) nkeys))) >= trgm_limit) ? true : false);
  			break;
  		case ILikeStrategyNumber:
  #ifndef IGNORECASE
--- 206,213 
  			 * similarity is just c / len1.
  			 * So, independly on DIVUNION the upper bound formula is the same.
  			 */
! 			res = (nkeys == 0) ? false :
! ((((((float4) ntrue) / ((float4) nkeys))) >= trgm_sml_limit) ? true : false);
  			break;
  		case ILikeStrategyNumber:
  #ifndef IGNORECASE
***
*** 283,289  gin_trgm_triconsistent(PG_FUNCTION_ARGS)
  			/*
  			 * See comment in gin_trgm_consistent() about * upper bound formula
  			 */
! 			res = (nkeys == 0) ? GIN_FALSE : (((((float4) ntrue) / ((float4) nkeys)) >= trgm_limit) ? GIN_MAYBE : GIN_FALSE);
  			break;
  		case ILikeStrategyNumber:
  #ifndef IGNORECASE
--- 284,291 
  			/*
  			 * See comment in gin_trgm_consistent() about * upper bound formula
  			 */
! 			res = (nkeys == 0) ? GIN_FALSE :
! (((((float4) ntrue) / ((float4) nkeys)) >= trgm_sml_limit) ? GIN_MAYBE : GIN_FALSE);
  			break;
  		case ILikeStrategyNumber:
  #ifndef IGNORECASE
*** a/contrib/pg_trgm/trgm_gist.c
--- b/contrib/pg_trgm/trgm_gist.c
***
*** 294,300  gtrgm_consistent(PG_FUNCTION_ARGS)
  float4		tmpsml = cnt_sml(key, qtrg);
  
  /* strange bug at freebsd 5.2.1 and gcc 3.3.3 */
! res = (*(int *) &tmpsml == *(int *) &trgm_limit || tmpsml > trgm_limit) ? true : false;
  			}
  			else if (ISALLTRUE(key))
  			{	/* non-leaf contains signature */
--- 294,301 
  float4		tmpsml = cnt_sml(key, qtrg);
  
  /* strange bug at freebsd 5.2.1 and gcc 3.3.3 */
! 		

Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread Stephen Frost
* Andres Freund (and...@anarazel.de) wrote:
> On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote:
> > * If the client disconnects with a non-exclusive backup running, the backup
> > is automatically aborted. This is the same thing that pg_basebackup does.
> > To use these non-exclusive backups the backup software will have to
> > maintain a persistent connection to the database -- something that should
> > not be a problem for any of the modern ones out there (but would've been
> > slightly trickier back in the days when we suggested shellscripts)
> 
> I think we might want to make this one optional, but I'm not going to
> fight super hard for it.

I was thinking along the same lines.

> > * A new version of pg_stop_backup is created, taking an argument specifying
> > if it's exclusive or not. The current version of pg_stop_backup() continues
> > to work for exclusive backups, with no changes to behavior. The new
> > pg_stop_backup will return a record of three columns instead of just the
> > value -- the LSN (pglsn), the backup label file (text) and the tablespace
> > map file (text).
> 
> I wonder if we shouldn't change the API a bit more aggressively. Right
> now we return the label and the map - but what if there's more files at
> some later point? One way would be to make it a SETOF function returning
> 'filename', 'content' or such.  Makes it less clear what to do with the
> lsn admittedly.

If we make the 'client disconnect means abort' optional then we'd also
need to modify the API of stop backup to specify which backup to stop,
I'd think.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
> On Wed, Feb 10, 2016 at 2:46 PM, Andres Freund  wrote:
> > On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote:
> > > Per discussion at the developer meeting in Brussels, here's a patch that
> > > makes some updates to the backup APIs, to support non-exclusive backups
> > > without using pg_basebackup.
> >
> > Thanks for following through with this!
> >
> > > * If the client disconnects with a non-exclusive backup running, the
> > backup
> > > is automatically aborted. This is the same thing that pg_basebackup does.
> > > To use these non-exclusive backups the backup software will have to
> > > maintain a persistent connection to the database -- something that should
> > > not be a problem for any of the modern ones out there (but would've been
> > > slightly trickier back in the days when we suggested shellscripts)
> >
> > I think we might want to make this one optional, but I'm not going to
> > fight super hard for it.
> 
> Not sure what you're referring to here. Do you mean being able to make a
> non-exclusive backup while not maintaining a connection? That's going to
> make things a *lot* more complicated, as we'd have to invent something like
> "backup slots" similar to what we're doing with replication slots. I doubt
> it's worth all that work and complexity.

Hrmmm.  If that's the case then perhaps you're right.  I liked the
general idea of not having to maintain a TCP connection during the
entire backup (TCP connections can be annoyingly finicky in certain
environments...), but I'm not sure it's worth a lot of additional
complexity.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread David Steele
On 2/10/16 7:46 AM, Magnus Hagander wrote:
> Per discussion at the developer meeting in Brussels, here's a patch that
> makes some updates to the backup APIs, to support non-exclusive backups
> without using pg_basebackup. <...>

This sounds like a great idea and I have signed up to review.

> * A new version of pg_stop_backup is created, taking an argument
> specifying if it's exclusive or not. The current version of
> pg_stop_backup() continues to work for exclusive backups, with no
> changes to behavior. The new pg_stop_backup will return a record of
> three columns instead of just the value -- the LSN (pglsn), the backup
> label file (text) and the tablespace map file (text). If used to cancel
> an exclusive backup, backup label file and tablespace map will be NULL.
> At this point it's up to the backup software to write down the files in
> the location of the backup.

It would be nice if this new pg_stop_backup() function also output the
exact time when the backup stopped.  Depending on how long
pg_stop_backup() waits for WAL to be archived, a time-stamp recorded
after pg_stop_backup() returns can be pretty far off.

This information is handy for automating selection of a backup when
doing time-based PITR (or so the user can manually select).  For
exclusive backups it is possible to parse the .backup file to get this
information but non-exclusive backups do not output the .backup file.

I would be happy to see the time-stamp returned from the
pg_start_backup() function as well.  It's a bigger change, but once
pg_start_backup() returns multiple columns it will be easier to add more
columns in the future.

In fact, it might be better if backup_label and tablespace_map were
output by pg_start_backup().  This would give the backup software more
information to work with from the start.

Thanks!
-- 
-David
da...@pgmasters.net





Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread Magnus Hagander
On Wed, Feb 10, 2016 at 2:46 PM, Andres Freund  wrote:

> Hi,
>
> On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote:
> > Per discussion at the developer meeting in Brussels, here's a patch that
> > makes some updates to the backup APIs, to support non-exclusive backups
> > without using pg_basebackup.
>
> Thanks for following through with this!
>
> > * If the client disconnects with a non-exclusive backup running, the
> backup
> > is automatically aborted. This is the same thing that pg_basebackup does.
> > To use these non-exclusive backups the backup software will have to
> > maintain a persistent connection to the database -- something that should
> > not be a problem for any of the modern ones out there (but would've been
> > slightly trickier back in the days when we suggested shellscripts)
>
> I think we might want to make this one optional, but I'm not going to
> fight super hard for it.
>

Not sure what you're referring to here. Do you mean being able to make a
non-exclusive backup while not maintaining a connection? That's going to
make things a *lot* more complicated, as we'd have to invent something like
"backup slots" similar to what we're doing with replication slots. I doubt
it's worth all that work and complexity.


>
> > * A new version of pg_stop_backup is created, taking an argument
> specifying
> > if it's exclusive or not. The current version of pg_stop_backup()
> continues
> > to work for exclusive backups, with no changes to behavior. The new
> > pg_stop_backup will return a record of three columns instead of just the
> > value -- the LSN (pglsn), the backup label file (text) and the tablespace
> > map file (text).
>
> I wonder if we shouldn't change the API a bit more aggressively. Right
> now we return the label and the map - but what if there's more files at
> some later point? One way would be to make it a SETOF function returning
> 'filename', 'content' or such.  Makes it less clear what to do with the
> lsn admittedly.
>

*Adding* more columns to the API shouldn't be a problem in the future. If
there's another file, we can return a fourth column. A backup program is
going to have to know about those things anyway and shouldn't just blindly
write those files to the backup, IMO.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread David Steele
On 2/10/16 9:44 AM, Stephen Frost wrote:
> * Magnus Hagander (mag...@hagander.net) wrote:
>> On Wed, Feb 10, 2016 at 2:46 PM, Andres Freund  wrote:
>>> On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote:
 * If the client disconnects with a non-exclusive backup running, the
>>> backup
 is automatically aborted. This is the same thing that pg_basebackup does.
 To use these non-exclusive backups the backup software will have to
 maintain a persistent connection to the database -- something that should
 not be a problem for any of the modern ones out there (but would've been
 slightly trickier back in the days when we suggested shellscripts)
>>>
>>> I think we might want to make this one optional, but I'm not going to
>>> fight super hard for it.
>>
>> Not sure what you're referring to here. Do you mean being able to make a
>> non-exclusive backup while not maintaining a connection? That's going to
>> make things a *lot* more complicated, as we'd have to invent something like
>> "backup slots" similar to what we're doing with replication slots. I doubt
>> it's worth all that work and complexity.

Agreed.

> Hrmmm.  If that's the case then perhaps you're right.  I liked the
> general idea of not having to maintain a TCP connection during the
> entire backup (TCP connections can be annoyingly finicky in certain
> environments...), but I'm not sure it's worth a lot of additional
> complexity.

Well, pgBackRest holds a connection to PostgreSQL through the entire
backup and will abort the backup if it is severed.  The connection is
always held locally, though, even if the master backup process is on a
different server.  I haven't gotten any reports of aborted backups due
to the connection failing.

-- 
-David
da...@pgmasters.net





Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-10 Thread Andrew Dunstan



On 02/09/2016 11:21 PM, Andrew Dunstan wrote:





The idea I was toying with is that previous filesystem activity (making
the temp install, the server's never-fsync'd writes, etc) has built up a
bunch of dirty kernel buffers, and at some point the kernel goes nuts
writing all that data.  So the issues we're seeing would come and go
depending on the timing of that I/O spike.  I'm not sure how to prove
such a theory from here.



Yeah. It's faintly possible that a kernel upgrade will help.






Another data point. I have another RPi2B that is running Debian Wheezy 
rather than the Fedora remix. I'm running the same test on it that we ran 
yesterday on axolotl. It seems to be running without hitting the same 
problems. So maybe the particular kernel port to ARM is what's tripping 
us up.



cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] Improvements of Hunspell dictionaries support

2016-02-10 Thread Teodor Sigaev



I duplicate the patch here.


It's a very good thing to update the dictionaries to support modern versions. And 
thank you for improving the documentation. I'm also impressed by the long description 
in the spell.c header.


Some notes about the code:

1
 struct SPELL. Why did you remove union p? You left a comment
 about using the d struct instead of the flag field and, as far as I can
 see, it's a correct comment. Removing the union increases the size of the
 SPELL structure.

2 struct AFFIX. I agree with Alvaro that the sum of the sizes of the bit fields 
should be less than or equal to the size of an integer. Otherwise, I suppose, we 
can get undefined behavior. Please split the bit fields across two integers.


3 unsigned char flagval[65000];
  Is it forbidden to use 6 number? In any case, decodeFlag() doesn't
  restrict its return value. I suggest enlarging the array to 1<<16 and
  adding a limit on the return value of decodeFlag().

4
 I'd like to see a short comment describing at least the new functions.

5
 Please add tests for the new code.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] old bug in full text parser

2016-02-10 Thread Mike Rylander
On Wed, Feb 10, 2016 at 4:28 AM, Oleg Bartunov  wrote:
> It looks like there is a very old bug in the full text parser (somebody pointed
> me at it), which appeared after moving tsearch2 into core.  The problem
> is in how the full text parser processes hyphenated words. Our original idea was
> to report the hyphenated word itself as well as its parts, and to ignore the hyphen.
> That is how tsearch2 worked.
>
> This behaviour changed after moving tsearch2 into core:
> 1. The hyphen is now reported by the parser, which is useless.
> 2. Hyphenated words with numbers ('4-dot', 'dot-4') are processed differently
> from ones with plain text words like 'four-dot'; the hyphenated word itself
> is not reported.
>
> I think we should consider this a bug and produce a fix for all supported
> versions.
>

The Evergreen project has long depended on tsearch2 (both as an
extension and via in-core FTS), and one thing we've struggled with is date
range parsing, such as birth and death years for authors in the form of
1979-2014, for instance.  Strings like that end up being parsed as two
lexemes, "1979" and "-2014".  We work around this by pre-normalizing
strings matching /(\d+)-(\d+)/ into two numbers separated by a space
instead of a hyphen, but if fixing this bug would remove the need for
such a preprocessing step it would be a great help to us.  Would such
strings be parsed "properly" into lexemes of the form "1979" and
"2014" with your proposed change?

Thanks!

--
Mike Rylander

> After  investigation we found this commit:
>
> commit 73e6f9d3b61995525785b2f4490b465fe860196b
> Author: Tom Lane 
> Date:   Sat Oct 27 19:03:45 2007 +
>
> Change text search parsing rules for hyphenated words so that digit
> strings
> containing decimal points aren't considered part of a hyphenated word.
> Sync the hyphenated-word lookahead states with the subsequent
> part-by-part
> reparsing states so that we don't get different answers about how much
> text
> is part of the hyphenated word.  Per my gripe of a few days ago.
>
>
> 8.2.23
>
> select tok_type, description, token from ts_debug('dot-four');
>   tok_type   |  description  |  token
> -+---+--
>  lhword  | Latin hyphenated word | dot-four
>  lpart_hword | Latin part of hyphenated word | dot
>  lpart_hword | Latin part of hyphenated word | four
> (3 rows)
>
> select tok_type, description, token from ts_debug('dot-4');
>   tok_type   |  description  | token
> -+---+---
>  hword   | Hyphenated word   | dot-4
>  lpart_hword | Latin part of hyphenated word | dot
>  uint| Unsigned integer  | 4
> (3 rows)
>
> select tok_type, description, token from ts_debug('4-dot');
>  tok_type |   description| token
> --+--+---
>  uint | Unsigned integer | 4
>  lword| Latin word   | dot
> (2 rows)
>
> 8.3.23
>
> select alias, description, token from ts_debug('dot-four');
>   alias  |   description   |  token
> -+-+--
>  asciihword  | Hyphenated word, all ASCII  | dot-four
>  hword_asciipart | Hyphenated word part, all ASCII | dot
>  blank   | Space symbols   | -
>  hword_asciipart | Hyphenated word part, all ASCII | four
> (4 rows)
>
> select alias, description, token from ts_debug('dot-4');
>alias   |   description   | token
> ---+-+---
>  asciiword | Word, all ASCII | dot
>  int   | Signed integer  | -4
> (2 rows)
>
> select alias, description, token from ts_debug('4-dot');
>alias   |   description| token
> ---+--+---
>  uint  | Unsigned integer | 4
>  blank | Space symbols| -
>  asciiword | Word, all ASCII  | dot
> (3 rows)
>
>
> Regards,
> Oleg




Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Tom Lane
Artur Zakirov  writes:
> I agree that previous patch is wrong. Instead of using new 
> parse_ooaffentry() function maybe better to use sscanf() with %ls 
> format. The %ls format is used to read a wide character string.

No, that way is going to give you worse portability problems than what
we have now.  Older implementations won't have %ls, and even if they
do, they might not have wcstombs() which is the only way you'd get from
libc's idea of wide characters to an encoding we recognize.

> I think this is not a bug. It is normal behavior. In Mac OS, sscanf() 
> with the %s format reads the string one character at a time. The size of 
> the letter 'х' is 2, and sscanf() separates it into two wrong characters.

That argument might be convincing if OSX behaved that way for all
multibyte characters, but it doesn't seem to be doing that.  Why is
only 'х' affected?

regards, tom lane




[HACKERS] Some refactoring of index structures.

2016-02-10 Thread Anastasia Lubennikova

Hi, hackers.

Long story short, I'd like to do some refactoring of the code related to 
indexes.


I'm working on a patch which provides the INCLUDING columns functionality [1]. 
The patch was reviewed again and again, and I fixed many issues, but we still 
aren't sure enough that all occurrences of rd_index->indnatts are 
replaced with rd_index->indnkeyatts accurately. I have no idea how to 
do it without a thorough reading of the code, so that is what I am doing now.


I have already replaced all these rd_index->indnatts with the macro 
IndexRelationGetNumberOfAttributes and rd_index->indnkeyatts with the 
macro IndexRelationGetNumberOfKeyAttributes. But there are still a lot 
of places with vague naming.
For example, indnatts, natts, ncolumns and so on are used for the very same 
rd_index->indnatts; I changed them to indnatts.
Or indexStruct, index, idxForm, index_form for index->rd_index. They 
definitely should be unified. I'd prefer indexForm. Any objections?


One more point is indkey in FormData_pg_index. The same goes for indkeys, 
ii_KeyAttrNumbers, and indexkeys. 
With my patch an index can have key and non-key (included) columns. 
Therefore indkey is not just the key now: it contains both key and 
included columns. And its length is not indnkeyatts, as one might think, 
but indnatts. I suppose it's worth renaming it somehow, but I can't 
find the proper name. Do you have any ideas?


If you see any related changes, please, let me know.

1. 
http://www.postgresql.org/message-id/flat/56168952.4010...@postgrespro.ru#56168952.4010...@postgrespro.ru


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [COMMITTERS] pgsql: Code cleanup in the wake of recent LWLock refactoring.

2016-02-10 Thread Craig Ringer
On 11 February 2016 at 00:21, Robert Haas  wrote:
>
>
> If there's a strong feeling that we should keep the old APIs around,
> we can do that, but I think that (1) if we don't remove them now, we
> probably never will and (2) they are vile APIs.


Yep.

Delete 'em.

Things regularly change between releases in ways that are visible to
extensions, though not necessarily as an obvious part of the extension API.
The most obvious example is at least one change to ProcessUtility_hook, and
probably other hook function signature changes over time.

It's a pain to have to #if around the differences, but better to export
that cost to extensions than to *never* be able to retire cruft from core.
Let's not be Java, still stuck with Java 1.0 APIs everyone knows are
unspeakably awful.

Delete the APIs, relnote it with the required changes and be done with it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread David Steele
On 2/10/16 11:01 AM, Andres Freund wrote:
> On 2016-02-10 16:50:26 +0100, Magnus Hagander wrote:
>>> I would be happy to see the time-stamp returned from the
>>> pg_start_backup() function as well.  It's a bigger change, but once
>>> pg_start_backup() returns multiple columns it will be easier to add more
>>> columns in the future.
>>>
>>> In fact, it might be better if backup_label and tablespace_map were
>>> output by pg_start_backup().  This would give the backup software more
>>> information to work with from the start.
>>
>> I was trying to figure out why that's a bad idea, but I can't really say
>> why :)
>>
>> For pg_basebackup backups, I think the reason we put it at the end is
>> simply that we don't want to write it into the backup directory until the
>> rest of the backup is complete, since it's not going to be useful before
>> then. But that requirement can certainly be put on the backup client
>> instead of the server. With the newer API it's going to have to keep those
>> things around anyway.
>>
>> That would then change the function syntax for pg_start_backup() but
>> instead make pg_stop_backup() now behave the same as it did before.
>>
>> Would anybody else like to comment on if that's a good idea or if there are
>> any holes in it? :)
> 
> I don't think that's a good idea. It makes it impossible to add
> information to labels about the minimum recovery point and
> such. Currently there's some rather fragile logic to discover that, but
> I'd really like to get rid of that at some point.

That makes sense.  The backup_label "as is" could be output at the
beginning but if we want to add the minimum recovery point it would need
to be output at the end.

It seems like tablespace_map could still be output at the beginning, though.

-- 
-David
da...@pgmasters.net





Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-10 Thread Andres Freund
On 2016-02-09 22:27:07 -0500, Tom Lane wrote:
> The idea I was toying with is that previous filesystem activity (making
> the temp install, the server's never-fsync'd writes, etc) has built up a
> bunch of dirty kernel buffers, and at some point the kernel goes nuts
> writing all that data.  So the issues we're seeing would come and go
> depending on the timing of that I/O spike.  I'm not sure how to prove
> such a theory from here.

It'd be interesting to monitor
$ grep -E '^(Dirty|Writeback):' /proc/meminfo
output. At least on linux. It's terribly easy to get the kernel into a
state where it has so much data needing to be written back that an
immediate checkpoint takes pretty much forever.

If I understand the code correctly, once a buffer has been placed into
'writeback', it'll be more-or-less processed in order. That can e.g. be
because these buffers have been written to more than 30s ago. If there
then are buffers later that also need to be written back (e.g. due to an
fsync()), you'll often wait for the earlier ones.

Andres




Re: [HACKERS] [COMMITTERS] pgsql: Code cleanup in the wake of recent LWLock refactoring.

2016-02-10 Thread Robert Haas
On Wed, Feb 10, 2016 at 11:08 AM, Heikki Linnakangas  wrote:
> On 10/02/16 17:12, Robert Haas wrote:
>> Code cleanup in the wake of recent LWLock refactoring.
>>
>> As of commit c1772ad9225641c921545b35c84ee478c326b95e, there's no
>> longer any way of requesting additional LWLocks in the main tranche,
>> so we don't need NumLWLocks() or LWLockAssign() any more.  Also,
>> some of the allocation counters that we had previously aren't needed
>> any more either.
>
> (Sorry if this was discussed already, I haven't been paying attention)
>
> LWLockAssign() is used by extensions. Are we OK with just breaking them,
> requiring them to change LWLockAssign() with the new mechanism, with #ifdefs
> to support multiple server versions? Seems like it shouldn't be too hard to
> keep LWLockAssign() around for the benefit of extensions, so it seems a bit
> inconsiderate to remove it.

It was discussed on the "Refactoring of LWLock tranches" thread,
though there wasn't an overwhelming consensus or anything.  I don't
think the burden on extension authors is very much, because you just
have to do this:

--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -404,7 +404,7 @@ _PG_init(void)
 * resources in pgss_shmem_startup().
 */
RequestAddinShmemSpace(pgss_memsize());
-   RequestAddinLWLocks(1);
+   RequestNamedLWLockTranche("pg_stat_statements", 1);

/*
 * Install hooks.
@@ -480,7 +480,7 @@ pgss_shmem_startup(void)
if (!found)
{
/* First time through ... */
-   pgss->lock = LWLockAssign();
+   pgss->lock =
&(GetNamedLWLockTranche("pg_stat_statements"))->lock;
pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
pgss->mean_query_len = ASSUMED_LENGTH_INIT;
SpinLockInit(&pgss->mutex);

We've gone through and done this to all of the EnterpriseDB
proprietary extensions over the last couple of days.

If there's a strong feeling that we should keep the old APIs around,
we can do that, but I think that (1) if we don't remove them now, we
probably never will and (2) they are vile APIs.  Allocating the number
of add-in lwlocks that are requested or a minimum of 3 is just awful.
If somebody allocates a different number than they request it
sometimes works, except when combined with some other extension, when
it maybe doesn't work.  This way, you ask for an LWLock under a given
name and then get it under that name, so if an extension does it
wrong, it is that extension that breaks rather than some other one.  I
think that's enough benefit to justify requiring a small code change
on the part of extension authors that use LWLocks, but that's
obviously biased by my experience maintaining EDB's extensions, and
other people may well feel differently.  But to me, the update that's
required here is no worse than what
5043193b78919a1bd144563aadc2f7f726549913 required, and I didn't hear
any complaints about that.  You just go through and do to your code
what got done to the core contrib modules, and you're done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] old bug in full text parser

2016-02-10 Thread Tom Lane
Oleg Bartunov  writes:
> It looks like there is a very old bug in the full text parser (somebody
> pointed me at it), which appeared after moving tsearch2 into core.  The
> problem is in how the full text parser processes hyphenated words. Our
> original idea was to report the hyphenated word itself as well as its
> parts, and to ignore the hyphen. That is how tsearch2 worked.

> This behaviour changed after moving tsearch2 into core:
> 1. The hyphen is now reported by the parser, which is useless.
> 2. Hyphenated words with numbers ('4-dot', 'dot-4') are processed differently
> from ones with plain text words like 'four-dot'; the hyphenated word itself
> is not reported.

> I think we should consider this a bug and produce a fix for all supported
> versions.

I don't see anything here that looks like a bug, more like a definition
disagreement.  As such, I'd be pretty dubious about back-patching a
change.  But it's hard to debate the merits when you haven't said exactly
what you'd do instead.

I believe the commit you mention was intended to fix this inconsistency:

http://www.postgresql.org/message-id/6269.1193184...@sss.pgh.pa.us

so I would be against simply reverting it.  In any case, the examples
given there make it look like there was already inconsistency about mixed
words and numbers.  Do we really think that "4-dot" should be considered
a hyphenated word?  I'm not sure.

regards, tom lane




Re: [HACKERS] proposal: schema PL session variables

2016-02-10 Thread Pavel Stehule
2016-02-09 23:31 GMT+01:00 Jim Nasby :

> On 2/8/16 10:02 AM, Pavel Stehule wrote:
>
>>
>> I think it would make sense to implement the interface in at least
>> one of our other supported PLs. I'm not entirely clear how well this
>> will match up with, say, plperl, but I'd be interested to see.
>>
>>
>> The minimalistic interface can be based on get/set functions. We can do
>> necessary transformations there.
>>
>
> get/set functions where?
>
> I don't think that really makes sense. I would expect schema variables to
> be exposed to a function as variables or attributes, either in the global
> namespace for that PL, or as an attribute of some object (ie the plpy
> object in plpython).
>

I don't know Python, so I don't know what is possible there and what is
not. I can implement get/set functions in any PL other than PL/pgSQL. You
have to do the conversion from Postgres types to PL types, and I can do
that in a function.



>
> I certainly wouldn't expect this patch to do that for all existing PLs,
> but I think it's important to do it for one PL besides plpgsql to make sure
> there's no gotchas.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] [COMMITTERS] pgsql: Code cleanup in the wake of recent LWLock refactoring.

2016-02-10 Thread Tom Lane
Robert Haas  writes:
> On Wed, Feb 10, 2016 at 11:08 AM, Heikki Linnakangas  wrote:
>> (Sorry if this was discussed already, I haven't been paying attention)
>> 
>> LWLockAssign() is used by extensions. Are we OK with just breaking them,
>> requiring them to change LWLockAssign() with the new mechanism, with #ifdefs
>> to support multiple server versions? Seems like it shouldn't be too hard to
>> keep LWLockAssign() around for the benefit of extensions, so it seems a bit
>> inconsiderate to remove it.

> If there's a strong feeling that we should keep the old APIs around,
> we can do that, but I think that (1) if we don't remove them now, we
> probably never will and (2) they are vile APIs.  Allocating the number
> of add-in lwlocks that are requested or a minimum of 3 is just awful.
> If somebody allocates a different number than they request it
> sometimes works, except when combined with some other extension, when
> it maybe doesn't work.  This way, you ask for an LWLock under a given
> name and then get it under that name, so if an extension does it
> wrong, it is that extension that breaks rather than some other one.  I
> think that's enough benefit to justify requiring a small code change
> on the part of extension authors that use LWLocks, but that's
> obviously biased by my experience maintaining EDB's extensions, and
> other people may well feel differently.

FWIW, I wasn't paying attention either, but I'm convinced by Robert's
argument.  Avoiding coupling between extensions is worth an API break.

regards, tom lane




Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread Andres Freund
On 2016-02-10 11:06:01 -0500, David Steele wrote:
> That makes sense.  The backup_label "as is" could be output at the
> beginning but if we want to add the minimum recovery point it would need
> to be output at the end.
> 
> It seems like tablespace_map could still be output at the beginning, though.

I don't really see enough benefit to go that way. What are you thinking
of using the information for ("This would give the backup software more
information to work with from the start.")?

Andres




Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-02-10 Thread Robert Haas
On Wed, Feb 10, 2016 at 3:24 AM, Aleksander Alekseev
 wrote:
>> Basically, the burden for you to impose a new coding rule on everybody
>> who uses shared hash tables in the future is very high.
>
> I fixed an issue you described. Number of spinlocks doesn't depend of
> NUM_LOCK_PARTITIONS anymore and could be specified for each hash table
> on a calling side.

Thanks, this looks way better.  Some more comments:

- I don't think there's any good reason for this patch to change the
calling convention for ShmemInitHash().  Maybe that's a good idea and
maybe it isn't, but it's a separate issue from what this patch is
doing, and if we're going to do it at all, it should be discussed
separately.  Let's leave it out of this patch.

- I would not provide an option to change the number of freelist
mutexes.  Let's dump DEFAULT_MUTEXES_NUM and MAX_MUTEXES_NUM and have
FREELIST_MUTEXES_NUM.  The value of 32 which you selected is fine with
me.

- The change to LOG2_NUM_LOCK_PARTITIONS in lwlock.h looks like an
independent change.  Again, leave it out of this patch and submit it
separately with its own benchmarking data if you want to argue for it.

I think if you make these changes this patch will be quite a bit
smaller and in pretty good shape.

Thanks for working on this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Artur Zakirov

On 10.02.2016 18:51, Teodor Sigaev wrote:

Hmm. Here
src/backend/access/transam/xlog.c read_tablespace_map()
using %s in scanf looks suspicious. I don't fully understand, but it
looks like it tries to read an oid as a string. So it should be safe in
the usual case.

Next, _LoadBlobs() reads a filename (fname) with the help of sscanf. Could
the file name be in UTF-8 encoding here?


This function reads the "blobs.toc" file. Its lines have the following 
format:

<oid> <filename>

where <filename> is blob_<oid>.dat.

Therefore it should be safe too.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] proposal: schema PL session variables

2016-02-10 Thread Pavel Stehule
Hi

Would it make sense to explicitly import variables in function definitions?
>
> CREATE SESSION VARIABLE foo integer;
> CREATE SESSION VARIABLE my_schema.bar text;
> SET SESSION VARIABLE foo to 4;
> SET SESSION VARIABLE my_schema.bar to 'hi mom';
>
> CREATE FUNCTION my_func (p_param text) returns boolean
> LANGUAGE SQL
> IMPORT g_foo integer FROM foo,
> IMPORT g_textval IN OUT text FROM my_schema.bar
>
> AS $$
>
> SELECT COUNT(*) > 1
> FROM my_table
> WHERE id = g_foo
> AND name = g_textval;
> $$;
>
>
> The IMPORT clause would be something like:
>
> IMPORT local_var_name [IN] [OUT] type FROM [session variable | expression ]
>
>
It cannot be implemented in the SQL language, because there are no
variables other than function parameters.

It is possible in PL/pgSQL, but I prefer the ALIAS keyword; introducing a
new reserved keyword creates compatibility issues.


>
> And obviously it would reject importing an expression as an OUT type.
> Importing an expression would largely serve the purpose of compile-time
> macro, or allowing us to pass parameters into anonymous blocks, something
> we've wanted for a while now.
>
> With something like this, the session variables are seen as parameters
> inside the function regardless of language and with no new prefix, :, @, or
> otherwise.
>
> Oh, and I suggest we call them SESSION variables rather than SCHEMA
> variables, to reinforce the idea of how long the values in the variables
> live. A session variable is in a sense a 1x1 temp table, whose definition
> persists across sessions but whose value does not.
>

I didn't propose SESSION variables; there are already some workarounds that
let anybody emulate them, so that feature can wait. What we need is safe
session variables with limited access, and the boundary can be defined by
schema scope. So the keyword SCHEMA makes sense, and it is necessary.


>
> Of course, if they do persist across sessions, then yeah, SCHEMA makes
> more sense. But every package variable in Oracle PL/SQL was initialized
> when the package was first loaded into the session.
>
>


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread Andres Freund
On 2016-02-10 16:50:26 +0100, Magnus Hagander wrote:
> > I would be happy to see the time-stamp returned from the
> > pg_start_backup() function as well.  It's a bigger change, but once
> > pg_start_backup() returns multiple columns it will be easier to add more
> > columns in the future.
> >
> > In fact, it might be better if backup_label and tablespace_map were
> > output by pg_start_backup().  This would give the backup software more
> > information to work with from the start.
> >
> 
> I was trying to figure out why that's a bad idea, but I can't really say
> why :)
> 
> For pg_basebackup backups, I think the reason we put it at the end is
> simply that we don't want to write it into the backup directory until the
> rest of the backup is complete, since it's not going to be useful before
> then. But that requirement can certainly be put on the backup client
> instead of the server. With the newer API it's going to have to keep those
> things around anyway.
> 
> That would then change the function syntax for pg_start_backup() but
> instead make pg_stop_backup() now behave the same as it did before.
> 
> Would anybody else like to comment on if that's a good idea or if there are
> any holes in it? :)

I don't think that's a good idea. It makes it impossible to add
information to labels about the minimum recovery point and
such. Currently there's some rather fragile logic to discover that, but
I'd really like to get rid of that at some point.

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread David Steele
On 2/10/16 10:50 AM, Magnus Hagander wrote:
> On Wed, Feb 10, 2016 at 4:38 PM, David Steele 
> This information is handy for automating selection of a backup when
> doing time-based PITR (or so the user can manually select).  For
> exclusive backups it is possible to parse the .backup file to get this
> information but non-exclusive backups do not output the .backup file.
> 
> 
> The non-exclusive backups *do* output the backup_label file, it just
> shows up as a text field instead of as a separate file. You're supposed
> to write that data to a file in the backup program.

I meant the .backup file (e.g. 0001008C001A.0028.backup)
that is archived along with the WAL for an exclusive backup.  I believe
this is currently the only place to get the stop time (without reading
the WAL segments).
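To illustrate the kind of parsing a backup tool has to do today: a minimal Python sketch that pulls the stop time out of a backup history (".backup") file. The "KEY: value" field names below follow the usual layout of these files, but treat the exact sample contents as an assumption for illustration.

```python
# Minimal sketch: extract fields from a WAL backup history (".backup")
# file, the parsing a backup tool currently needs for time-based PITR
# selection. The sample text is invented but mimics the real layout.

def parse_backup_history(text):
    """Return a dict of the KEY: value lines in a .backup file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields

sample = """\
START WAL LOCATION: 0/9000028 (file 000000010000000000000009)
STOP WAL LOCATION: 0/90000F8 (file 000000010000000000000009)
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2016-02-10 16:49:57 EST
LABEL: nightly
STOP TIME: 2016-02-10 16:55:06 EST
"""

fields = parse_backup_history(sample)
print(fields["STOP TIME"])  # the value non-exclusive backups currently lack
```

The point is that non-exclusive backups produce no such archived file, so this information would otherwise have to come back through the pg_stop_backup() result.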

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread David Steele
On 2/10/16 11:12 AM, Andres Freund wrote:
> On 2016-02-10 11:06:01 -0500, David Steele wrote:
>> That makes sense.  The backup_label "as is" could be output at the
>> beginning but if we want to add the minimum recovery point it would need
>> to be output at the end.
>>
>> It seems like tablespace_map could still be output at the beginning, though.
> 
> I don't really see enough benefit to go that way. What are you thinking
> of using the information for ("This would give the backup software more
> information to work with from the start.")?

Currently backup software that handles tablespaces needs to read the
pg_tblspc directory to build an oid/path mapping in order to know which
tablespace directories to copy.  Since Postgres is already doing this
when it creates tablespace_map it seems like it's redundant for the
backup software to do it again.

Since tablespace_map is created in do_pg_start_backup() and I don't see
how it could be updated after that, I think it is logical to output it
from pg_start_backup().  I don't feel strongly about it, though.
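To make the redundancy concrete, here is a sketch of the oid/path mapping that backup software currently rebuilds by hand from the pg_tblspc symlinks; the demo directory and paths are invented.

```python
import os
import tempfile

def tablespace_map(pg_tblspc_dir):
    """Build the oid -> target-path mapping that backup tools currently
    reconstruct from the pg_tblspc symlinks; this is the same data the
    server itself writes into tablespace_map."""
    mapping = {}
    for name in os.listdir(pg_tblspc_dir):
        if name.isdigit():  # tablespace OIDs are numeric directory entries
            mapping[int(name)] = os.readlink(os.path.join(pg_tblspc_dir, name))
    return mapping

# Demo with a faked pg_tblspc directory (the paths are made up):
with tempfile.TemporaryDirectory() as d:
    tblspc = os.path.join(d, "pg_tblspc")
    os.mkdir(tblspc)
    os.symlink("/mnt/fast_ssd/ts1", os.path.join(tblspc, "16385"))
    print(tablespace_map(tblspc))
```

If pg_start_backup() returned tablespace_map directly, this directory walk in every backup tool could go away.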

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] proposal: schema PL session variables

2016-02-10 Thread Pavel Stehule
2016-02-09 20:55 GMT+01:00 David G. Johnston :

> On Tue, Feb 9, 2016 at 11:32 AM, Corey Huinker 
> wrote:
>
>>
>> Oh, and I suggest we call them SESSION variables rather than SCHEMA
>> variables, to reinforce the idea of how long the values in the variables
>> live. A session variable is in a sense a 1x1 temp table, whose definition
>> persists across sessions but whose value does not.
>>
>> Of course, if they do persist across sessions, then yeah, SCHEMA makes
>> more sense. But every package variable in Oracle PL/SQL was initialized
>> when the package was first loaded into the session.
>>
>>
> ​The key distinction for SCHEMA was that all functions in the schema would
> be able to see them (and only those in the schema).
>
> I am a bit partial, with little deep thought, to the IMPORT mechanic.
> Changing them to actual session variables would be doable and you could
> allow for the IMPORT specification to use search_path or explicit means to
> locate said variables regardless of which schema​
>
> ​they exist in.
>

A very important part of my proposal is independence from search_path. With
search_path you have no control over the variable's type or even its
existence, and there are possible impacts on plan caching and behavior. So
I propose SCHEMA VARIABLES with schema scope; then search_path has zero
effect on the behavior, and no new dependencies are introduced.
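A toy model of the naming question being argued here: a schema-qualified variable resolves to one thing, while a search_path-based lookup depends on session state and can silently change which variable (and which type) you get. The schema and variable names below are invented.

```python
# Toy model: schema-qualified lookup is stable; search_path lookup is not.
variables = {
    ("audit", "level"): 3,        # audit.level, an integer
    ("app",   "level"): "debug",  # app.level, a different type entirely
}

def resolve_qualified(name):
    """Resolve 'schema.var' directly, ignoring any search path."""
    schema, _, var = name.partition(".")
    return variables[(schema, var)]

def resolve_by_search_path(var, search_path):
    """Resolve an unqualified name through a search path, Postgres-style."""
    for schema in search_path:
        if (schema, var) in variables:
            return variables[(schema, var)]
    raise KeyError(var)

print(resolve_qualified("audit.level"))                    # always the same
print(resolve_by_search_path("level", ["app", "audit"]))   # depends on path
print(resolve_by_search_path("level", ["audit", "app"]))   # ...and can flip
```

This is the hazard the schema-scoped proposal avoids: a cached plan compiled against one variable could be invalidated, or worse, silently bind to another.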

Pavel

>
> However, part of the goal is to blend into the broader database community
> and thus increase porting capabilities.  I'm not sure how well this would
> help fulfill that goal.
>
> David J.
> ​
>
>


Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-10 Thread Robert Haas
On Tue, Feb 9, 2016 at 6:35 PM, Kouhei Kaigai  wrote:
> Unfortunately, it was not sufficient.
>
> Due to the assumption, the buffer page to be suspended does not exist
> when a backend process issues a series of P2P DMA commands. (If the
> block were already loaded into shared buffers, there would be no need
> to issue P2P DMA; the usual memory<->device DMA would be used instead,
> because RAM is much faster than SSD.)
> The process knows the pair of (rel,fork,block), but no BufferDesc for
> this block exists. Thus, it cannot acquire locks on the BufferDesc
> structure.
>
> Even if the block does not exist at this point, a concurrent process
> may load the same page. A BufferDesc for this page will be assigned at
> that point; however, there is no chance to lock anything in the
> BufferDesc for the process which issued the P2P DMA command.
>
> That is the reason why I assume the suspend/resume mechanism should
> take a pair of (rel,fork,block) as the identifier of the target block.

I see the problem, but I'm not terribly keen on putting in the hooks
that it would take to let you solve it without hacking core.  It
sounds like an awfully invasive thing for a pretty niche requirement.
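For readers following along, the bookkeeping KaiGai describes can be sketched abstractly: because no BufferDesc exists yet for a block being read via P2P DMA, the suspension has to be keyed by the (relation, fork, block) tuple itself. This is a plain illustration of the data structure, not PostgreSQL code, and the OID/fork numbers below are arbitrary.

```python
# Sketch: a suspension table keyed by (rel, fork, block) rather than by
# a BufferDesc, since the BufferDesc may not exist yet.
class BlockSuspendTable:
    def __init__(self):
        self._suspended = set()

    def suspend(self, rel, fork, block):
        key = (rel, fork, block)
        if key in self._suspended:
            return False          # someone already owns DMA on this block
        self._suspended.add(key)
        return True

    def resume(self, rel, fork, block):
        self._suspended.discard((rel, fork, block))

    def is_suspended(self, rel, fork, block):
        # A concurrent reader loading this block into shared buffers
        # would check here before assigning a BufferDesc.
        return (rel, fork, block) in self._suspended

t = BlockSuspendTable()
assert t.suspend(16384, 0, 42)
t.resume(16384, 0, 42)
```

The invasive part is not this table but the hooks needed so that every shared-buffer read path consults it, which is what the objection above is about.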

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve docs wrt catalog object ACLs

2016-02-10 Thread Tom Lane
Stephen Frost  writes:
> The way permissions on catalog objects are handled isn't discussed at
> all in the documentation.  Barring objections, I'll commit and
> back-patch the attached to improve that situation in the next day or so.

I have no objection to the goal, but I do not think this wording is very
helpful.  In particular I find the terminology "catalog object" vague
and confusing: are you talking about catalogs, objects described in the
catalogs, or both?

You probably need to distinguish at least two cases:

1. Altering permissions on system catalogs, as such, only restricts what
can be done by user queries on the catalogs; the database's internal
operations do not check permissions when accessing or updating catalogs.
Thus for example denying select on pg_proc does not stop the parser from
looking up function names, but it would break psql's \df.

2. Altering permissions on built-in objects, such as built-in functions,
does work to the extent that those objects are used in user queries (and
not by internal operations).

The point about such changes not being preserved across pg_dump or
pg_upgrade applies to both cases.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: schema PL session variables

2016-02-10 Thread Jim Nasby

On 2/10/16 11:25 AM, Pavel Stehule wrote:


Oh, and I suggest we call them SESSION variables rather than SCHEMA
variables, to reinforce the idea of how long the values in the
variables live. A session variable is in a sense a 1x1 temp table,
whose definition persists across sessions but whose value does not.


I didn't propose SESSION variables; there are already workarounds by
which anybody can emulate them, so that feature can wait. What we need
are safe session variables with limited access, and that boundary can be
defined by schema scope. So the SCHEMA keyword makes sense, and it is
necessary.


BTW, if all that's desired here are session variables for plpgsql, I 
think it makes a lot more sense to start with implementing per-function 
session variables. That's a lot simpler design-wise and is something we 
should have anyway. You don't necessarily want session variables to be 
schema-level. (I realize the other PLs make them global, which is even 
worse, but that's no reason to continue that path.)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: schema PL session variables

2016-02-10 Thread Pavel Stehule
>> I didn't propose SESSION variables; there are already workarounds by
>> which anybody can emulate them, so that feature can wait. What we need
>> are safe session variables with limited access, and that boundary can
>> be defined by schema scope. So the SCHEMA keyword makes sense, and it
>> is necessary.
>>
>
> BTW, if all that's desired here are session variables for plpgsql, I think
> it makes a lot more sense to start with implementing per-function session
> variables. That's a lot simpler design-wise and is something we should have
> anyway. You don't necessarily want session variables to be schema-level. (I
> realize the other PLs make them global, which is even worse, but that's no
> reason to continue that path.)


I am not able to implement SET and GET access in one function efficiently.
I believe static variables can be enough for 50% of cases, but that is too
limited. Postgres cannot pass and work with references, so this C-style
design can be too expensive.
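Jim's counter-suggestion (a flag selecting set vs. get) can be shown in miniature; this is a language-neutral sketch of the pattern, not PL/pgSQL, and it also shows why it feels clumsy compared with a real variable.

```python
# One function doing both SET and GET of a private "static" value,
# selected by a flag -- the workaround being debated above.
def counter(value=None, set_value=False):
    if set_value:
        counter._static = value                 # SET path
        return value
    return getattr(counter, "_static", 0)       # GET path (0 = default)

counter(10, set_value=True)   # set
print(counter())              # get
```

Every caller has to know the flag convention, and the value is invisible outside the function, which is exactly the limitation driving the schema-variable proposal.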

Regards

Pavel


>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] proposal: schema PL session variables

2016-02-10 Thread Pavel Stehule
2016-02-10 20:25 GMT+01:00 Jim Nasby :

> On 2/10/16 1:17 PM, Pavel Stehule wrote:
>
>> It is too simple and too much like a workaround :) I can probably do
>> this in a plpgsql extension.
>>
>
> I think it's something people will definitely want. If we don't have it,
> then they're going to be using schema variables as a work-around because
> they can't do a private static variable inside a single function.
>

The schema variables can be used for different purposes, but the reverse
direction isn't possible. So I am starting with schema variables.

But schema variables don't block the implementation of private static
variables, and if we eventually have static variables, schema variables
can probably reuse the related code well.


>
> Most importantly, since this effects only plpgsql and only
>> individual functions, the design is simple and should be easy to
>> commit in 9.6. I don't have the same confidence with schema variables.
>>
>>
>> My target is not 9.6 - the next commitfest will be full - finishing
>> multi-CPU queries, logical replication, .. and I still have three open
>> patches. But if we find an agreement this spring, I can implement it
>> in summer, and it can be in upstream in an early 9.7 commitfest. I
>> know this topic is difficult, so we have to start it now.
>>
>
> Sure. I think it would be useful to have a wiki page with info as it gets
> ironed out. A good starting point would be use cases. One that I don't
> think has been considered is different extensions adding/using different
> schema variables. Questions like should extension A have direct access to
> variables for extension B.


Yes, that is a question.

My answer is no (my opinion), at least in the first iteration: a) functions
can be used instead; b) directly sharing content (variables) between
extensions or schemas is generally not recommended (this is not only my
idea).

A second question is "why do you need direct access to variables between
extensions or schemas?". Can you write some use cases?

Maybe we need a different class for this purpose - some stream (pipe)
between extensions. But that is out of scope here, although it is a valid
idea.

Regards

Pavel



>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


[HACKERS] Improve docs wrt catalog object ACLs

2016-02-10 Thread Stephen Frost
Greetings,

The way permissions on catalog objects are handled isn't discussed at
all in the documentation.  Barring objections, I'll commit and
back-patch the attached to improve that situation in the next day or so.

Thanks!

Stephen
From ad8e663893ea906238a9c0346bf8791eafe3d333 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Wed, 10 Feb 2016 13:28:11 -0500
Subject: [PATCH] Add note regarding permissions in pg_catalog

Add a note to the system catalog section pointing out that while
modifying the permissions on catalog tables is possible, it's
unlikely to have the desired effect.
---
 doc/src/sgml/catalogs.sgml | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 412c845..3e8ebee 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -21,6 +21,17 @@
particularly esoteric operations, such as adding index access methods.
   
 
+  <note>
+   <para>
+    Changing the permissions on objects in the system catalogs, while
+    possible, is unlikely to have the desired effect as the internal
+    lookup functions use a cache and do not check the permissions nor
+    policies of tables in the system catalog.  Further, permission
+    changes to objects in the system catalogs are not preserved by
+    pg_dump or across upgrades.
+   </para>
+  </note>
+
  
   Overview
 
-- 
2.5.0



signature.asc
Description: Digital signature


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-02-10 Thread Pavel Stehule
Hi



>
>> Actually I imagine that if there's no agreement between author and first
>> reviewer, there might not *be* a committer in the first place.  Perhaps
>> try to get someone else to think about it and make a decision.  It is
>> possible that some other committer is able to decide by themselves but I
>> wouldn't count on it.
>>
>
> +1.
>
> FWIW, I'd think it's better to not break backwards compatibility, but I'm
> also far from a python expert. It might well be worth adding a plpython GUC
> to control the behavior so that there's a migration path forward, or maybe
> do something like the 'import __future__' that python is doing to ease
> migration to python 3.
>
>
>
Iacob is maybe a little bit too defensive - but why not. The implementation
of a GUC should not be hard. I see it as the best way now. Tomorrow I'll
try to find the last versions of this patch in my mailbox and try to
design a compatibility mode.

I don't much like the introduction of a new general raise function
(proposed by Jim). It is not consistent with the current design, and it
requires introducing enums visible at the user level, which are not very
comfortable to work with. If it is required, then we can have it; the
original implementation of this proposal was designed in exactly the same
style.
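The 'import __future__' idea mentioned upthread can be shown in miniature: behavior changes gated behind an explicit opt-in, so existing code keeps the old semantics until it asks for the new ones. The feature flag name and both API shapes here are invented for illustration; they do not reflect any actual plpython interface.

```python
# Opt-in compatibility gating, __future__-style. Flag name is made up.
FEATURES = {"plpy_strict_ereport": False}   # default: legacy behavior

def enable(feature):
    if feature not in FEATURES:
        raise ValueError("unknown feature: %s" % feature)
    FEATURES[feature] = True

def ereport(level, message):
    """Hypothetical reporting call whose result shape depends on opt-in."""
    if FEATURES["plpy_strict_ereport"]:
        return ("keyword-style", level, message)    # new behavior
    return ("legacy", "%s: %s" % (level, message))  # old behavior

print(ereport("ERROR", "boom"))   # legacy shape
enable("plpy_strict_ereport")
print(ereport("ERROR", "boom"))   # new shape
```

A GUC would play the role of the flag here: per-database or per-session opt-in, with the default preserving backward compatibility.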

Regards

Pavel


Re: [HACKERS] proposal: schema PL session variables

2016-02-10 Thread Jim Nasby

On 2/10/16 1:04 PM, Pavel Stehule wrote:


BTW, if all that's desired here are session variables for plpgsql, I
think it makes a lot more sense to start with implementing
per-function session variables. That's a lot simpler design-wise and
is something we should have anyway. You don't necessarily want
session variables to be schema-level. (I realize the other PLs make
them global, which is even worse, but that's no reason to continue
that path.)


I am not able to implement SET and GET access in one function
efficiently. I believe static variables can be enough for 50% of cases,
but that is too limited. Postgres cannot pass and work with references,
so this C-style design can be too expensive.


You can always accept a boolean that tells you if you're setting or just 
returning. And there's probably some use cases where you don't even need 
to expose the variable outside of the function.


Most importantly, since this affects only plpgsql and only individual 
functions, the design is simple and should be easy to commit in 9.6. I 
don't have the same confidence with schema variables.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-02-10 Thread Jim Nasby

On 2/10/16 12:44 PM, Pavel Stehule wrote:


FWIW, I'd think it's better to not break backwards compatibility,
but I'm also far from a python expert. It might well be worth adding
a plpython GUC to control the behavior so that there's a migration
path forward, or maybe do something like the 'import __future__'
that python is doing to ease migration to python 3.



Iacob is maybe a little bit too defensive - but why not. The
implementation of a GUC should not be hard. I see it as the best way
now. Tomorrow I'll try to find the last versions of this patch in my
mailbox and try to design a compatibility mode.


BTW, there's other cases where we're going to face this compatibility 
issue. The one I know of right now is that current handling of composite 
types containing arrays in plpython sucks, but there's no way to change 
that without breaking compatibility.


I don't see any good way to handle these compatibility things other than 
giving each one its own pl-specific GUC, but I figured I'd at least 
mention it.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: function parse_ident

2016-02-10 Thread Jim Nasby

On 2/10/16 12:26 PM, Jim Nasby wrote:

I editorialized the docs and some of the comments. In particular, I documented 
behavior of not truncating, and recommended casting to name[] if the user cares 
about that. Added a unit test to verify that works. BTW, I saw mention in the 
thread about not truncating spaces, but the function *does* truncate them, 
unless they're inside quotes, where they're legitimate.


New patch for that.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 139aa2b..b4a2898 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1778,6 +1778,27 @@
   

 
+ parse_ident
+
+parse_ident(str 
text,
+   [ strictmode boolean DEFAULT 
true ] )
+   
+   text[]
+   Split a qualified identifier into array parts.
+   When strictmode is false, extra characters after
+   the identifier are ignored. This is useful for parsing identifiers for
+   objects like functions and arrays that may have trailing characters.
+   By default, extra characters after the last identifier are considered
+   an error. Note that this function does not truncate quoted identifiers.
+   If you care about that you should cast the result of this function to
+   name[].
+   
+   parse_ident('"SomeSchema".someTable')
+   "SomeSchema,sometable"
+  
+
+  
+   
+
  pg_client_encoding
 
 pg_client_encoding()
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 923fe58..61d5b80 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -965,3 +965,10 @@ RETURNS jsonb
 LANGUAGE INTERNAL
 STRICT IMMUTABLE
 AS 'jsonb_set';
+
+CREATE OR REPLACE FUNCTION
+  parse_ident(str text, strictmode boolean DEFAULT true)
+RETURNS text[]
+LANGUAGE INTERNAL
+STRICT IMMUTABLE
+AS 'parse_ident';
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2b4ab20..7aa5b76 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -130,6 +130,15 @@ scanstr(const char *s)
 char *
 downcase_truncate_identifier(const char *ident, int len, bool warn)
 {
+   return downcase_identifier(ident, len, warn, true);
+}
+
+/*
+ * a workhorse for downcase_truncate_identifier
+ */
+char *
+downcase_identifier(const char *ident, int len, bool warn, bool truncate)
+{
char   *result;
int i;
bool    enc_is_single_byte;
@@ -158,12 +167,13 @@ downcase_truncate_identifier(const char *ident, int len, 
bool warn)
}
result[i] = '\0';
 
-   if (i >= NAMEDATALEN)
+   if (i >= NAMEDATALEN && truncate)
truncate_identifier(result, i, warn);
 
return result;
 }
 
+
 /*
  * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
  *
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 6a306f3..3072c32 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -21,12 +21,15 @@
 #include 
 
 #include "access/sysattr.h"
+#include "access/htup_details.h"
 #include "catalog/catalog.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "parser/scansup.h"
 #include "parser/keywords.h"
 #include "postmaster/syslogger.h"
 #include "rewrite/rewriteHandler.h"
@@ -38,6 +41,7 @@
 #include "utils/ruleutils.h"
 #include "tcop/tcopprot.h"
 #include "utils/acl.h"
+#include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/timestamp.h"
 
@@ -598,3 +602,173 @@ pg_column_is_updatable(PG_FUNCTION_ARGS)
 
PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
 }
+
+
+/*
+ * These simple parser utilities are compatible with the lexer
+ * implementation; they are used only in the parse_ident function.
+ */
+static bool
+is_ident_start(unsigned char c)
+{
+   if (c == '_')
+   return true;
+   if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+   return true;
+
+   if (c >= 0200 && c <= 0377)
+   return true;
+
+   return false;
+}
+
+static bool
+is_ident_cont(unsigned char c)
+{
+   if (c >= '0' && c <= '9')
+   return true;
+
+   return is_ident_start(c);
+}
+
+/*
+ * parse_ident - parse a SQL composite identifier into separate identifiers.
+ * When strict mode is active (second parameter), any chars after the last
+ * identifier are disallowed.
+ */
+Datum
+parse_ident(PG_FUNCTION_ARGS)
+{
+   text   *qualname;
+   char   *qualname_str;
+   bool   strict_mode;
+ 

Re: [HACKERS] proposal: schema PL session variables

2016-02-10 Thread Jim Nasby

On 2/10/16 11:33 AM, Pavel Stehule wrote:


I don't think that really makes sense. I would expect schema
variables to be exposed to a function as variables or attributes,
either in the global namespace for that PL, or as an attribute of
some object (ie the plpy object in plpython).


I don't know Python, so I don't know what is possible there. Set/get
functions I can implement in any PL other than PL/pgSQL. You have to do
the conversion from Postgres types to PL types, and I can do it in a
function.


Requiring PLs to go through the native SPI to get to schema variables is 
just plain ugly and inefficient. Why wouldn't they be exposed natively?


Not designing these things through is how we ended up with the mess that 
is composite types and arrays in plpython and pltcl. I'm not saying any 
PLs besides plpgsql need to support this natively out of the box, but we 
better have an idea of how they would support it (and it'd be a good 
idea if at least one other PL did support it).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: schema PL session variables

2016-02-10 Thread Pavel Stehule
2016-02-10 20:10 GMT+01:00 Jim Nasby :

> On 2/10/16 1:04 PM, Pavel Stehule wrote:
>
>>
>> BTW, if all that's desired here are session variables for plpgsql, I
>> think it makes a lot more sense to start with implementing
>> per-function session variables. That's a lot simpler design-wise and
>> is something we should have anyway. You don't necessarily want
>> session variables to be schema-level. (I realize the other PLs make
>> them global, which is even worse, but that's no reason to continue
>> that path.)
>>
>>
>> I am not able to implement SET and GET access in one function
>> efficiently. I believe static variables can be enough for 50% of cases,
>> but that is too limited. Postgres cannot pass and work with references,
>> so this C-style design can be too expensive.
>>
>
> You can always accept a boolean that tells you if you're setting or just
> returning. And there's probably some use cases where you don't even need to
> expose the variable outside of the function.
>

It is too simple and too much like a workaround :) I can probably do this
in a plpgsql extension.


> Most importantly, since this affects only plpgsql and only individual
> functions, the design is simple and should be easy to commit in 9.6. I
> don't have the same confidence with schema variables.


My target is not 9.6 - the next commitfest will be full - finishing
multi-CPU queries, logical replication, .. and I still have three open
patches. But if we find an agreement this spring, I can implement it in
summer, and it can be in upstream in an early 9.7 commitfest. I know this
topic is difficult, so we have to start it now.

Regards

Pavel


Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-10 Thread Tom Lane
Andrew Dunstan  writes:
>> Yeah. It's faintly possible that a kernel upgrade will help.

> Another data point. I have another RPi2B that is running Debian Wheezy 
> rather than the Fedora remix. I'm running the same test on it we ran 
> yesterday on axolotl. It seems to be running without having the same 
> problems. So maybe the particular kernel port to ARM is what's tripping 
> us up.

I'd bet against it being a port-specific problem; it sounds more like a
filesystem performance-tuning issue, which could easily change from one
kernel version to another.  What are the kernel version numbers exactly?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: function parse_ident

2016-02-10 Thread Jim Nasby
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

NEEDS CATVERSION BUMP

I editorialized the docs and some of the comments. In particular, I documented 
behavior of not truncating, and recommended casting to name[] if the user cares 
about that. Added a unit test to verify that works. BTW, I saw mention in the 
thread about not truncating spaces, but the function *does* truncate them, 
unless they're inside quotes, where they're legitimate.

Also added test for invalid characters.

I think "strict" would be more in line with other uses in code. There are 
currently no other occurrences of 'strictmode' in the code. There are loads of 
references to 'strict', but I didn't go through all of them to see if any were 
used as externally visible function parameter names.

qualname_str is used in exactly 1 place. Either it should be gotten rid of, or 
all the uses of text_to_cstring(qualname) should be changed to qualname_str.

I think the code would have been clearer if instead of the big if (*nextp == 
'\"') it did the same "inquote" looping that is done elsewhere, but I don't 
have a strong opinion on it.

The new status of this patch is: Waiting on Author

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] eXtensible Transaction Manager API (v2)

2016-02-10 Thread Konstantin Knizhnik

Hi,

The Postgres Professional cluster team wants to propose a new version of
the eXtensible Transaction Manager API.

Previous discussion concerning this patch can be found here:

http://www.postgresql.org/message-id/f2766b97-555d-424f-b29f-e0ca0f6d1...@postgrespro.ru

The API patch itself is small enough, but we think that it would be 
strange to provide just the API without examples of its usage.


We have implemented various distributed transaction managers based on this
API: pg_dtm (based on snapshot sharing) and pg_tsdtm (CSN-based, using
local system time).
On top of these two DTM implementations we have developed various
"cluster" implementations:
multimaster+pg_dtm, multimaster+pg_tsdtm, pg_shard+pg_dtm,
pg_shard+pg_tsdtm, postgres_fdw+pg_dtm, postgres_fdw+pg_tsdtm, ...
The multimaster is based on logical replication and is something like BDR
but synchronous: it provides consistency across the cluster.


But we want to make this patch as small as possible, so we decided to
include in it only pg_tsdtm and a patch of postgres_fdw allowing it to be
used with pg_tsdtm.
pg_tsdtm is simpler than pg_dtm, because the latter includes an arbiter
with a RAFT protocol (a centralized service) and sockhub for efficient
multiplexing of backend connections.
Also, in theory, pg_tsdtm provides better scalability, because it is
decentralized.


The architecture of DTM and tsDTM, as well as benchmark results, is
available at the wiki page:


https://wiki.postgresql.org/wiki/DTM

Please note that pg_tsdtm is just a reference implementation of DTM using
this XTM API.
The primary idea of this patch is to add the XTM API to the PostgreSQL
code, allowing one to implement one's own transaction manager as a
Postgres extension. So please review first of all the XTM API itself, and
not pg_tsdtm, which is just an example of its usage.


The complete PostgreSQL branch with all our changes can be found here:

https://github.com/postgrespro/postgres_cluster


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/contrib/pg_tsdtm/Makefile b/contrib/pg_tsdtm/Makefile
new file mode 100644
index 000..e70dffc
--- /dev/null
+++ b/contrib/pg_tsdtm/Makefile
@@ -0,0 +1,20 @@
+MODULE_big = pg_tsdtm
+OBJS = pg_tsdtm.o
+
+EXTENSION = pg_tsdtm
+DATA = pg_tsdtm--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_tsdtm
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+check:
+	env DESTDIR='$(abs_top_builddir)'/tmp_install make install
+	$(prove_check)
diff --git a/contrib/pg_tsdtm/dtm_recovery/dtm_recovery.cpp b/contrib/pg_tsdtm/dtm_recovery/dtm_recovery.cpp
new file mode 100644
index 000..38285be
--- /dev/null
+++ b/contrib/pg_tsdtm/dtm_recovery/dtm_recovery.cpp
@@ -0,0 +1,129 @@
+#include <iostream>
+#include <vector>
+#include <set>
+#include <string>
+
+#include <pqxx/connection>
+#include <pqxx/transaction>
+#include <pqxx/nontransaction>
+
+using namespace std;
+using namespace pqxx;
+
+int main (int argc, char* argv[])
+{
+if (argc == 1){
+printf("Use -h to show usage options\n");
+return 1;
+}
+vector<string> connections;
+set<string> prepared_xacts;
+set<string> committed_xacts;
+bool verbose = false;
+for (int i = 1; i < argc; i++) {
+if (argv[i][0] == '-') {
+switch (argv[i][1]) {
+  case 'C':
+  case 'c':
+connections.push_back(string(argv[++i]));
+continue;
+  case 'v':
+verbose = true;
+continue;
+}
+}
+printf("Perform recovery of pg_tsdtm cluster.\n"
+   "Usage: dtm_recovery {options}\n"
+   "Options:\n"
+   "\t-c STR\tdatabase connection string\n"
+   "\t-v\tverbose mode: print extra information while processing\n");
+return 1;
+}
+if (verbose) {
+cout << "Collecting information about prepared transactions...\n";
+}
+for (vector<string>::iterator ic = connections.begin(); ic != connections.end(); ++ic)
+{
+if (verbose) {
+cout << "Connecting to " << *ic << "...\n";
+}
+connection con(*ic);
+work txn(con);
+result r = txn.exec("select gid from pg_prepared_xacts");
+for (result::const_iterator it = r.begin(); it != r.end(); ++it)
+{
+string gid = it.at("gid").as(string());
+prepared_xacts.insert(gid);
+}
+txn.commit();
+}
+if (verbose) {
+cout << "Prepared transactions: ";
+for (set<string>::iterator it = prepared_xacts.begin(); it != prepared_xacts.end(); ++it)
+{
+cout << *it << ", ";
+}
+cout << "\nChecking which of them are committed...\n";
+}
+for (vector<string>::iterator ic = connections.begin(); ic != connections.end(); ++ic)
+{
+if (verbose) {
+cout << "Connecting to " << *ic << "...\n";
+}
+

Re: [HACKERS] proposal: schema PL session variables

2016-02-10 Thread Jim Nasby

On 2/10/16 11:54 AM, Pavel Stehule wrote:

2016-02-09 23:41 GMT+01:00 Jim Nasby:
The other big thing you get is public vs private. You can
sorta-kinda-almost simulate that with permissions in simple cases,
but it ultimately falls apart as soon as you want a private function
that does something as the user calling the function.


The schema variables are private by design. They can be enhanced in
the future, but for now that is out of my scope. If you need public
access to these variables, you can use functions. Access to functions
can be controlled by rights. We could introduce a private (schema
limited) function too, but again that is out of the scope of this proposal.


So it's not possible for function schema_a.blah to access variables in 
schema_b? If it is then variables are NOT private.



When it comes to variables, I think it's a mistake to discuss this
patch while pretending that packages don't exist. For example, if all
we wanted were session variables, there's no reason they would need to
be tied to schemas. The only reason to tie them to schemas is to try
and fake package support via schemas. I think it'd be a mistake to
have non-schema variables, but lets not fool ourselves as to why
that would be a mistake.


I am happy that you have opened the question of packages.
Originally the Oracle package is an Ada language feature, but if you
compare Oracle schemas and PostgreSQL schemas, you will see
significant differences. Our schemas are much more similar to Oracle
packages than to Oracle schemas. So introducing packages to Postgres would
be counterproductive  -  it would be pretty messy to have both packages
and schemas. We don't need packages, because we have schemas, but
we do not have any safe (and simple to use) schema-scope tools. I
implemented Orafce, and the main problems there are not missing packages,
but different default casting rules and missing procedures.


I'm not saying we have to implement packages the same way oracle did. Or 
at all.


My point is that there are MAJOR features that packages offer that we 
don't have at all, with or without schemas. One of those features is the 
idea of private objects. You CAN NOT do the same thing with permissions 
either, because public vs private doesn't care one iota about what role 
is executing something. They only care about what's in the call stack.



Another problem I have with this is it completely ignores
public/private session variables. The current claim is that's not a
big deal because you can only access the variables from a PL, but I
give it 2 days of this being released before people are asking for a
way to access the variables directly from SQL. Now you have a
problem because if you want private variables (which I think is
pretty important) your only choice is to use SECDEF functions,
which is awkward at best.


While this patch doesn't need to implement SQL access to variables, I 
think the design needs to address it.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: schema PL session variables

2016-02-10 Thread Jim Nasby

On 2/10/16 1:17 PM, Pavel Stehule wrote:



2016-02-10 20:10 GMT+01:00 Jim Nasby:

On 2/10/16 1:04 PM, Pavel Stehule wrote:


 BTW, if all that's desired here are session variables for plpgsql, I
 think it makes a lot more sense to start with implementing
 per-function session variables. That's a lot simpler design-wise and
 is something we should have anyway. You don't necessarily want



It is too simple and too much like a workaround :) I could probably do
this in a plpgsql extension.


I think it's something people will definitely want. If we don't have it, 
then they're going to be using schema variables as a work-around because 
they can't do a private static variable inside a single function.
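
A rough C analogy for how such a per-function session variable would
behave - a function-local static keeps its value across calls but is
invisible outside the function. Illustrative only, not part of any
patch:

```c
/* Rough analogy for a per-function session variable: the counter
 * persists across calls but cannot be referenced from anywhere else. */
int
next_invocation_id(void)
{
    static int counter = 0;     /* private to this function */

    return ++counter;
}
```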



Most importantly, since this affects only plpgsql and only
individual functions, the design is simple and should be easy to
commit in 9.6. I don't have the same confidence with schema variables.


My target is not 9.6 - the next commitfest will be full - finishing
multi-CPU queries, logical replication, ... and I still have three open
patches. But if we find agreement this spring, I can implement it in
summer, and it can be in upstream in an early 9.7 commitfest. I know
this topic is difficult, so we have to start it now.


Sure. I think it would be useful to have a wiki page with info as it 
gets ironed out. A good starting point would be use cases. One that I 
don't think has been considered is different extensions adding/using 
different schema variables. Questions like: should extension A have 
direct access to variables belonging to extension B?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] proposal: schema PL session variables

2016-02-10 Thread Pavel Stehule
>>
>> The schema variables are private by design. They can be enhanced in
>> the future, but for now that is out of my scope. If you need public
>> access to these variables, you can use functions. Access to functions
>> can be controlled by rights. We could introduce a private (schema
>> limited) function too, but again that is out of the scope of this proposal.
>>
>
> So it's not possible for function schema_a.blah to access variables in
> schema_b? If it is then variables are NOT private.
>

The scope is the schema. It is private to schema objects: access to the
content is limited to objects from the same schema. I know this concept
is new in Postgres, but I hope it is useful. People may expect different
behavior based on some technical terms - so maybe the "private" keyword
isn't the best choice at this moment. I just want variables with safe
(limited) access. Anything else I can do with the functionality that we
already have.

I got an off-list mail with a slightly different syntax proposal

CREATE VARIABLE xxx DEFAULT [ PRIVATE ]

I am thinking the more SQL-natural form is:

CREATE [ PRIVATE ] VARIABLE xxx ...

This should not be limited to variables; there can also be tables,
views, functions, ...

"PRIVATE" in this context means: only accessible from the current
schema. The syntax is different from what I proposed, but the idea is
the same.
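
A hypothetical sketch of the locality-based check this implies - the
names and types below are invented for illustration and are not part of
any patch:

```c
#include <stdbool.h>

typedef unsigned int Oid;       /* stand-in for the real typedef */

/* Hypothetical sketch of the locality rule under discussion: a PRIVATE
 * object is usable only when the schema of the accessing function
 * matches the schema of the object.  Roles play no part at all. */
bool
schema_object_accessible(Oid caller_schema, Oid object_schema,
                         bool object_is_private)
{
    if (!object_is_private)
        return true;            /* public: anyone may access it */
    return caller_schema == object_schema;
}
```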


>
> When it comes to variables, I think it's a mistake to discuss this
>> patch while pretending that packages don't exist. For example, if all
>> we wanted were session variables, there's no reason they would need to be
>> tied to schemas. The only reason to tie them to schemas is to try
>> and fake package support via schemas. I think it'd be a mistake to
>> have non-schema variables, but lets not fool ourselves as to why
>> that would be a mistake.
>>
>>
>> I am happy that you have opened the question of packages. Originally
>> the Oracle package is an Ada language feature, but if you compare
>> Oracle schemas and PostgreSQL schemas, you will see significant
>> differences. Our schemas are much more similar to Oracle packages
>> than to Oracle schemas. So introducing packages to Postgres would be
>> counterproductive  -  it would be pretty messy to have both packages and
>> schemas. We don't need packages, because we have schemas, but we do
>> not have any safe (and simple to use) schema-scope tools. I implemented
>> Orafce, and the main problems there are not missing packages, but
>> different default casting rules and missing procedures.
>>
>
> I'm not saying we have to implement packages the same way oracle did. Or
> at all.
>
> My point is that there are MAJOR features that packages offer that we
> don't have at all, with or without schemas. One of those features is the
> idea of private objects. You CAN NOT do the same thing with permissions
> either, because public vs private doesn't care one iota about what role is
> executing something. They only care about what's in the call stack.
>

I don't understand well, and probably I don't explain my ideas well. But
this is exactly what I would like to implement: security based on
locality, not based on roles.


>
> Another problem I have with this is it completely ignores
>> public/private session variables. The current claim is that's not a
>> big deal because you can only access the variables from a PL, but I
>> give it 2 days of this being released before people are asking for a
>> way to access the variables directly from SQL. Now you have a
>> problem because if you want private variables (which I think is
>> pretty important) your only choice is to use SECDEF functions,
>> which is awkward at best.
>>
>
> While this patch doesn't need to implement SQL access to variables, I
> think the design needs to address it.


SQL access to variables needs either a) a change in the SQL parser (with
a difficult discussion about syntax) or b) generic get/set functions.
Option b) could be used from other PLs in a first iteration.

I am afraid of opening a Pandora's box, and I would like to keep the
scope of this patch as small as possible - so that it can be implemented
in one release cycle.

Regards

Pavel


>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


[HACKERS] Moving responsibility for logging "database system is shut down"

2016-02-10 Thread Tom Lane
In our recent investigations of slow shutdown on some buildfarm members,
it became clear that "database system is shut down" can get logged many
seconds before the postmaster actually exits; the main cause of delay
is that the stats collector's final data dump can take awhile.  This is
pretty confusing, IMO.  I think that that message should only come out
once the postmaster has removed its pidfile, which is the action that
makes it look shut-down to "pg_ctl stop".

After some investigation it seems that the best place to do the logging
is right in UnlinkLockFiles(); that's called when and only when we need
a log message, and there doesn't seem to be a better place for it.
I considered putting the logging into proc_exit(), but then we would need
more logic to determine whether the type of process we're exiting is one
for which such a message should be printed.  That seems messier.

I also considered moving the "shutting down" message that is currently
printed at the top of ShutdownXLOG(), but could not find any clearly
better place to do that than where it is.

So I propose the attached patch.  Any objections?  Should this get
back-patched?  It's arguably a bug, though surely a minor one, that
the message comes out when it does.

regards, tom lane

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a2846c4..07a7679 100644
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*** ShutdownXLOG(int code, Datum arg)
*** 7948,7957 
  	ShutdownCommitTs();
  	ShutdownSUBTRANS();
  	ShutdownMultiXact();
- 
- 	/* Don't be chatty in standalone mode */
- 	ereport(IsPostmasterEnvironment ? LOG : NOTICE,
- 			(errmsg("database system is shut down")));
  }
  
  /*
--- 7948,7953 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index b7bab56..603a256 100644
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*** UnlinkLockFiles(int status, Datum arg)
*** 724,729 
--- 724,740 
  	}
  	/* Since we're about to exit, no need to reclaim storage */
  	lock_files = NIL;
+ 
+ 	/*
+ 	 * Lock file removal should always be the last externally visible action
+ 	 * of a postmaster or standalone backend, while we won't come here at all
+ 	 * when exiting postmaster child processes.  Therefore, this is a good
+ 	 * place to log completion of shutdown.  We could alternatively teach
+ 	 * proc_exit() to do it, but that seems uglier.  In a standalone backend,
+ 	 * use NOTICE elevel to be less chatty.
+ 	 */
+ 	ereport(IsPostmasterEnvironment ? LOG : NOTICE,
+ 			(errmsg("database system is shut down")));
  }
  
  /*



Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-02-10 Thread Robert Haas
On Wed, Feb 10, 2016 at 3:00 AM, Rushabh Lathia
 wrote:
> Fujita-san, I am attaching update version of the patch, which added
> the documentation update.
>
> Once we finalize this, I feel good with the patch and think that we
> could mark this as ready for committer.

It would be nice to hear from Tom and/or Stephen whether the changes
that have been made since they last commented on it have addressed
their concerns.  I feel like the
design is reasonably OK, but I don't want to push this through if
they're still not happy with it.  One thing I'm not altogether keen on
is the use of "pushdown" or "dml pushdown" as a term of art; on the
other hand, I'm not sure what other term would be better.

Comments on specific points follow.

This seems to need minor rebasing in the wake of the join pushdown patch.

+   /* Likewise for ForeignScan that has pushed down INSERT/UPDATE/DELETE */
+   if (IsA(plan, ForeignScan) &&
+   (((ForeignScan *) plan)->operation == CMD_INSERT ||
+((ForeignScan *) plan)->operation == CMD_UPDATE ||
+((ForeignScan *) plan)->operation == CMD_DELETE))
+   return;

I don't really understand why this is a good idea.  Since target lists
are only displayed with EXPLAIN (VERBOSE), I don't really understand
what is to be gained by suppressing them in any case at all, and
therefore I don't understand why it's a good idea to do it here,
either.  If there is a good reason, maybe the comment should explain
what it is.

+   /* Check point 1 */
+   if (operation == CMD_INSERT)
+   return false;
+
+   /* Check point 2 */
+   if (nodeTag(subplan) != T_ForeignScan)
+   return false;
+
+   /* Check point 3 */
+   if (subplan->qual != NIL)
+   return false;
+
+   /* Check point 4 */
+   if (operation == CMD_UPDATE)

These comments are referring to something in the function header
further up, but you could instead just delete the stuff from the
header and mention the actual conditions here.  Also:

- If the first condition is supposed to accept only CMD_UPDATE or
CMD_DELETE, say if (operation != CMD_UPDATE && operation !=
CMD_DELETE) rather than doing it this way.  Is-not-insert may in this
context be functionally equivalent to is-update-or-delete, but it's
better to write the comment and the code so that they exactly match
rather than kinda-match.

- For point 2, use IsA(subplan, ForeignScan).

+   /*
+    * ignore subplan if the FDW pushes down the command to the remote
+    * server
+    */

This comment states what the code does, instead of explaining why it
does it.  Please update it so that it explains why it does it.

+   List   *fdwPushdowns;   /* per-target-table FDW pushdown flags */

Isn't a list of booleans an awfully inefficient representation?  How
about a Bitmapset?
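
To illustrate the suggestion: one bit per target table instead of a
list cell per flag. This is a bare-bones fixed-size bitmap written for
illustration, not PostgreSQL's actual Bitmapset type.

```c
#include <limits.h>
#include <stdbool.h>

/* One word holds WORDBITS membership flags; a List of booleans would
 * spend an entire list cell on each flag instead. */
#define WORDBITS  (sizeof(unsigned int) * CHAR_BIT)

typedef struct
{
    unsigned int words[4];      /* room for 4 * WORDBITS members */
} SmallBitmap;

void
bitmap_set(SmallBitmap *b, unsigned int member)
{
    b->words[member / WORDBITS] |= 1u << (member % WORDBITS);
}

bool
bitmap_is_member(const SmallBitmap *b, unsigned int member)
{
    return (b->words[member / WORDBITS] >> (member % WORDBITS)) & 1u;
}
```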

+   /*
+    * Prepare remote-parameter expressions for evaluation.  (Note: in
+    * practice, we expect that all these expressions will be just Params,
+    * so we could possibly do something more efficient than using the full
+    * expression-eval machinery for this.  But probably there would be
+    * little benefit, and it'd require postgres_fdw to know more than is
+    * desirable about Param evaluation.)
+    */
+   dpstate->param_exprs = (List *)
+       ExecInitExpr((Expr *) fsplan->fdw_exprs,
+                    (PlanState *) node);

This is an exact copy of an existing piece of code and its associated
comment.  A good bit of the surrounding code is copied, too.  You need
to refactor to avoid duplication, like by putting some of the code in
a new function that both postgresBeginForeignScan and
postgresBeginForeignModify can call.

execute_dml_stmt() has some of the same disease.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] old bug in full text parser

2016-02-10 Thread Oleg Bartunov
On Wed, Feb 10, 2016 at 7:21 PM, Tom Lane  wrote:

> Oleg Bartunov  writes:
> > It  looks like there is a very old bug in full text parser (somebody
> > pointed me on it), which appeared after moving tsearch2 into the core.
> The
> > problem is in how full text parser process hyphenated words. Our original
> > idea was to report hyphenated word itself as well as its parts and ignore
> > hyphen. That was how tsearch2 works.
>
> > This behaviour was changed after moving tsearch2 into the core:
> > 1. hyphen now reported by parser, which is useless.
> > 2.  Hyphenated words with numbers ('4-dot', 'dot-4')  processed
> differently
> > than ones with plain text words like 'four-dot', no hyphenated word
> itself
> > reported.
>
> > I think we should consider this as a bug and produce fix for all
> supported
> > versions.
>
> I don't see anything here that looks like a bug, more like a definition
> disagreement.  As such, I'd be pretty dubious about back-patching a
> change.  But it's hard to debate the merits when you haven't said exactly
> what you'd do instead.
>

Yeah, better to say it is not a bug, but an inconsistency. We definitely
should work on a better, "consistent" parser with predictable behaviour.


>
> I believe the commit you mention was intended to fix this inconsistency:
>
> http://www.postgresql.org/message-id/6269.1193184...@sss.pgh.pa.us
>
> so I would be against simply reverting it.  In any case, the examples
> given there make it look like there was already inconsistency about mixed
> words and numbers.  Do we really think that "4-dot" should be considered
> a hyphenated word?  I'm not sure.
>

I agree that we shouldn't just revert it.  My idea is to work on a new
parser and leave the old one as-is for compatibility reasons. Fortunately,
FTS is flexible enough that we could add a new parser at any time as an
extension.



>
> regards, tom lane
>


Re: [HACKERS] pl/pgSQL, get diagnostics and big data

2016-02-10 Thread Andreas 'ads' Scherbaum


Hello,

thanks for reviewing the patch!

On 09.02.2016 20:32, Christian Ullrich wrote:


- Are there portability issues/Will it work on Windows/BSD etc.:

   No, it will not work correctly on Windows when built with MSVC,
   although it may work with MinGW.

   +++ postgresql-9.5.0/src/backend/tcop/pquery.c
   @@ -195,7 +195,7 @@
{
  case CMD_SELECT:
  snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
   - "SELECT %u", queryDesc->estate->es_processed);
   + "SELECT %lu", queryDesc->estate->es_processed);


   %lu formats unsigned long. "long" is problematic in terms of
   portability, because sizeof(long) is different everywhere. It is 32
   bits on Windows and on 32-bit *nix, and 64 bits on 64-bit *nix.

   I added the following line to the INSERT formatting in pquery.c:

 queryDesc->estate->es_processed += 471147114711LL;

   This number is 0x6DB28E70D7; so inserting one row should return
   "INSERT 0 2995679448" (0xB28E70D8):

 postgres=# insert into t1 values (0);
 INSERT 0 2995679448

   To fix this, I think it will be enough to change the format strings to
   use "%zu" instead of "%lu". pg_snprintf() is selected by configure if
   the platform's snprintf() does not support the "z" conversion. I tried
   this, and it appears to work:

 postgres=# insert into t1 values (0);
 INSERT 0 471147114712

   I have looked for other uses of "%lu", and found none that may cause
   the same issue; apparently they are all used with values that clearly
   have 32-bit type; actually, most of them are used to format error
   codes in Windows-specific code.


Attached is a new version of the patch, with %lu replaced by %zu.
I re-ran all the tests, especially the long test with 2^32+x rows, and 
it produces the same result as before.
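
For reference, a minimal demonstration of the point: %zu matches size_t
on every platform, while %lu would truncate wherever long is 32 bits
(e.g. 64-bit Windows). The helper name below is invented for
illustration, and the example assumes a platform with 64-bit size_t:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative helper (not the actual pquery.c code): formats a
 * completion tag for a row count that may exceed 2^32.  The %zu
 * conversion always matches size_t, so the value survives intact on
 * LP64 and LLP64 platforms alike; %lu would truncate it on LLP64. */
void
format_completion_tag(char *buf, size_t buflen, size_t rows)
{
    snprintf(buf, buflen, "INSERT 0 %zu", rows);
}
```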



Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project


64bit_4.diff.gz
Description: application/gzip



Re: [HACKERS] old bug in full text parser

2016-02-10 Thread Oleg Bartunov
On Wed, Feb 10, 2016 at 7:45 PM, Mike Rylander  wrote:

> On Wed, Feb 10, 2016 at 4:28 AM, Oleg Bartunov 
> wrote:
> > It  looks like there is a very old bug in full text parser (somebody
> pointed
> > me on it), which appeared after moving tsearch2 into the core.  The
> problem
> > is in how full text parser process hyphenated words. Our original idea
> was
> > to report hyphenated word itself as well as its parts and ignore hyphen.
> > That was how tsearch2 works.
> >
> > This behaviour was changed after moving tsearch2 into the core:
> > 1. hyphen now reported by parser, which is useless.
> > 2.  Hyphenated words with numbers ('4-dot', 'dot-4')  processed
> differently
> > than ones with plain text words like 'four-dot', no hyphenated word
> itself
> > reported.
> >
> > I think we should consider this as a bug and produce fix for all
> supported
> > versions.
> >
>
> The Evergreen project has long depended on tsearch2 (both as an
> extension and in-core FTS), and one thing we've struggled with is date
> range parsing such as birth and death years for authors in the form of
> 1979-2014, for instance.  Strings like that end up being parsed as two
> lexems, "1979" and "-2014".  We work around this by pre-normalizing
> strings matching /(\d+)-(\d+)/ into two numbers separated by a space
> instead of a hyphen, but if fixing this bug would remove the need for
> such a preprocessing step it would be a great help to us.  Would such
> strings be parsed "properly" into lexems of the form of "1979" and
> "2014" with you proposed change?
>
>
I'd like to treat all hyphenated "words" in one way, regardless of
whether the parts are numbers or plain text; namely, 'w1-w2' should be
reported as {'w1-w2', 'w1', 'w2'}. The problem is in the definition of
"word".


We'll definitely look at the parser again; fortunately, we could just
fork the default parser and develop a new one so as not to break
compatibility. You have a chance to help us produce a "consistent" view
of what tokens the new parser should recognize and how to process them.





> Thanks!
>
> --
> Mike Rylander
>
> > After  investigation we found this commit:
> >
> > commit 73e6f9d3b61995525785b2f4490b465fe860196b
> > Author: Tom Lane 
> > Date:   Sat Oct 27 19:03:45 2007 +
> >
> > Change text search parsing rules for hyphenated words so that digit
> > strings
> > containing decimal points aren't considered part of a hyphenated
> word.
> > Sync the hyphenated-word lookahead states with the subsequent
> > part-by-part
> > reparsing states so that we don't get different answers about how
> much
> > text
> > is part of the hyphenated word.  Per my gripe of a few days ago.
> >
> >
> > 8.2.23
> >
> > select tok_type, description, token from ts_debug('dot-four');
> >   tok_type   |  description  |  token
> > -+---+--
> >  lhword  | Latin hyphenated word | dot-four
> >  lpart_hword | Latin part of hyphenated word | dot
> >  lpart_hword | Latin part of hyphenated word | four
> > (3 rows)
> >
> > select tok_type, description, token from ts_debug('dot-4');
> >   tok_type   |  description  | token
> > -+---+---
> >  hword   | Hyphenated word   | dot-4
> >  lpart_hword | Latin part of hyphenated word | dot
> >  uint| Unsigned integer  | 4
> > (3 rows)
> >
> > select tok_type, description, token from ts_debug('4-dot');
> >  tok_type |   description| token
> > --+--+---
> >  uint | Unsigned integer | 4
> >  lword| Latin word   | dot
> > (2 rows)
> >
> > 8.3.23
> >
> > select alias, description, token from ts_debug('dot-four');
> >   alias  |   description   |  token
> > -+-+--
> >  asciihword  | Hyphenated word, all ASCII  | dot-four
> >  hword_asciipart | Hyphenated word part, all ASCII | dot
> >  blank   | Space symbols   | -
> >  hword_asciipart | Hyphenated word part, all ASCII | four
> > (4 rows)
> >
> > select alias, description, token from ts_debug('dot-4');
> >alias   |   description   | token
> > ---+-+---
> >  asciiword | Word, all ASCII | dot
> >  int   | Signed integer  | -4
> > (2 rows)
> >
> > select alias, description, token from ts_debug('4-dot');
> >alias   |   description| token
> > ---+--+---
> >  uint  | Unsigned integer | 4
> >  blank | Space symbols| -
> >  asciiword | Word, all ASCII  | dot
> > (3 rows)
> >
> >
> > Regards,
> > Oleg
>


Re: [HACKERS] [PATCH v4] GSSAPI encryption support

2016-02-10 Thread Robbie Harwood
Hello friends,

For your consideration, here is a new version of GSSAPI encryption
support.  For those who prefer, it's also available on my github:
https://github.com/frozencemetery/postgres/commit/c92275b6605d7929cda5551de47a4c60aab7179e

Some thoughts:

- The overall design is different this time - GSS encryption sits in
  parallel construction to SSL encryption rather than at the protocol
  level - so a strict diff probably isn't useful.

- The GSSAPI authentication code has been moved without modification.
  In doing so, the temptation to modify it (flags, error checking, that
  big comment at the top about things from Athena, etc.)  is very large.
  I do not know whether these changes are best suited to another patch
  in this series or should be reviewed separately.  I am also hesitant
  to add things beyond the core before I am told this is the right
  approach.

- There's no fallback here.  I wrote fallback support for versions 1-3,
  and the same design could apply here without too much trouble, but I
  am hesitant to port it over before the encryption design is approved.
  I strongly suspect you will not want to merge this without fallback
  support, and that makes sense to me.

- The client and server code look a lot like each other.  This
  resemblance is not exact, and my understanding is that server and
  client need to compile independently, so I do not know of a way to
  rectify this.  Suggestions are welcome.

Thanks!
From c92275b6605d7929cda5551de47a4c60aab7179e Mon Sep 17 00:00:00 2001
From: Robbie Harwood 
Date: Tue, 17 Nov 2015 18:34:14 -0500
Subject: [PATCH] Connect encryption support for GSSAPI

Existing GSSAPI authentication code is extended to support connection
encryption.  Encryption begins as soon as possible - that is,
immediately after the client and server complete authentication.
---
 configure   |   2 +
 configure.in|   1 +
 doc/src/sgml/client-auth.sgml   |   2 +-
 doc/src/sgml/runtime.sgml   |  20 +-
 src/Makefile.global.in  |   1 +
 src/backend/libpq/Makefile  |   4 +
 src/backend/libpq/auth.c| 330 +---
 src/backend/libpq/be-gssapi.c   | 584 
 src/backend/libpq/be-secure.c   |  16 +
 src/backend/postmaster/postmaster.c |  12 +
 src/include/libpq/libpq-be.h|  31 ++
 src/interfaces/libpq/Makefile   |   4 +
 src/interfaces/libpq/fe-auth.c  | 182 ---
 src/interfaces/libpq/fe-auth.h  |   5 +
 src/interfaces/libpq/fe-connect.c   |  10 +
 src/interfaces/libpq/fe-gssapi.c| 475 +
 src/interfaces/libpq/fe-secure.c|  16 +-
 src/interfaces/libpq/libpq-int.h|  16 +-
 18 files changed, 1193 insertions(+), 518 deletions(-)
 create mode 100644 src/backend/libpq/be-gssapi.c
 create mode 100644 src/interfaces/libpq/fe-gssapi.c

diff --git a/configure b/configure
index 3dd1b15..7fd7610 100755
--- a/configure
+++ b/configure
@@ -712,6 +712,7 @@ with_uuid
 with_selinux
 with_openssl
 krb_srvtab
+with_gssapi
 with_python
 with_perl
 with_tcl
@@ -5488,6 +5489,7 @@ $as_echo "$with_gssapi" >&6; }
 
 
 
+
 #
 # Kerberos configuration parameters
 #
diff --git a/configure.in b/configure.in
index 9398482..b19932e 100644
--- a/configure.in
+++ b/configure.in
@@ -636,6 +636,7 @@ PGAC_ARG_BOOL(with, gssapi, no, [build with GSSAPI support],
   krb_srvtab="FILE:\$(sysconfdir)/krb5.keytab"
 ])
 AC_MSG_RESULT([$with_gssapi])
+AC_SUBST(with_gssapi)
 
 
 AC_SUBST(krb_srvtab)
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 3b2935c..7d37223 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -915,7 +915,7 @@ omicron bryanh  guest1
 provides automatic authentication (single sign-on) for systems
 that support it. The authentication itself is secure, but the
 data sent over the database connection will be sent unencrypted unless
-SSL is used.
+SSL or GSSAPI are used.

 

diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index cda05f5..bd8156f 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1880,12 +1880,13 @@ pg_dumpall -p 5432 | psql -d postgres -p 5433
   
 
   
-   To prevent spoofing on TCP connections, the best solution is to use
-   SSL certificates and make sure that clients check the server's certificate.
-   To do that, the server
-   must be configured to accept only hostssl connections () and have SSL key and certificate files
-   (). The TCP client must connect using
+   To prevent spoofing on TCP connections, the best solutions are either to
+   use GSSAPI for authentication and encryption or to use SSL certificates and
+   make sure that clients check the server's certificate.  To secure using
+   SSL, the server must be configured to accept only hostssl
+   connections () and have SSL key and
+   

Re: [HACKERS] Moving responsibility for logging "database system is shut down"

2016-02-10 Thread Robert Haas
On Wed, Feb 10, 2016 at 3:13 PM, Tom Lane  wrote:
> In our recent investigations of slow shutdown on some buildfarm members,
> it became clear that "database system is shut down" can get logged many
> seconds before the postmaster actually exits; the main cause of delay
> is that the stats collector's final data dump can take awhile.  This is
> pretty confusing, IMO.  I think that that message should only come out
> once the postmaster has removed its pidfile, which is the action that
> makes it look shut-down to "pg_ctl stop".
>
> After some investigation it seems that the best place to do the logging
> is right in UnlinkLockFiles(); that's called when and only when we need
> a log message, and there doesn't seem to be a better place for it.
> I considered putting the logging into proc_exit(), but then we would need
> more logic to determine whether the type of process we're exiting is one
> for which such a message should be printed.  That seems messier.
>
> I also considered moving the "shutting down" message that is currently
> printed at the top of ShutdownXLOG(), but could not find any clearly
> better place to do that than where it is.
>
> So I propose the attached patch.  Any objections?  Should this get
> back-patched?  It's arguably a bug, though surely a minor one, that
> the message comes out when it does.

I would vote against a back-patch.  And I kind of agree with Jim's
comments that we ought to consider sprinkling a few more debug
messages into the shutdown sequence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Tom Lane
I wrote:
> Artur Zakirov  writes:
>> I think this is not a bug. It is a normal behavior. In Mac OS sscanf() 
>> with the %s format reads the string one character at a time. The size of 
>> letter 'х' is 2. And sscanf() separate it into two wrong characters.

> That argument might be convincing if OSX behaved that way for all
> multibyte characters, but it doesn't seem to be doing that.  Why is
> only 'х' affected?

I looked into the OS X sources, and found that indeed you are right:
*scanf processes the input a byte at a time, and applies isspace() to
each byte separately, even when the locale is such that that's a clearly
insane thing to do.  Since this code was derived from FreeBSD, FreeBSD
has or once had the same issue.  (A look at the freebsd project on github
says it still does, assuming that's the authoritative repo.)  Not sure
about other BSDen.

I also verified that in UTF8-based locales, isspace() thinks that 0x85 and
0xA0, and no other high-bit-set values, are spaces.  Not sure exactly why
it thinks that, but that explains why 'х' fails when adjacent code points
don't.

So apparently the coding rule we have to adopt is "don't use *scanf()
on data that might contain multibyte characters".  (There might be corner
cases where it'd work all right for conversion specifiers other than %s,
but probably you might as well just use strtol and friends in such cases.)
Ugh.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Larry Rosenman

On 2016-02-10 16:19, Tom Lane wrote:

I wrote:

Artur Zakirov  writes:
I think this is not a bug. It is a normal behavior. In Mac OS sscanf()
with the %s format reads the string one character at a time. The size of
letter 'х' is 2. And sscanf() separate it into two wrong characters.

That argument might be convincing if OSX behaved that way for all
multibyte characters, but it doesn't seem to be doing that.  Why is
only 'х' affected?

I looked into the OS X sources, and found that indeed you are right:
*scanf processes the input a byte at a time, and applies isspace() to
each byte separately, even when the locale is such that that's a clearly
insane thing to do.  Since this code was derived from FreeBSD, FreeBSD
has or once had the same issue.  (A look at the freebsd project on github
says it still does, assuming that's the authoritative repo.)  Not sure
about other BSDen.

I also verified that in UTF8-based locales, isspace() thinks that 0x85 and
0xA0, and no other high-bit-set values, are spaces.  Not sure exactly why
it thinks that, but that explains why 'х' fails when adjacent code points
don't.

So apparently the coding rule we have to adopt is "don't use *scanf()
on data that might contain multibyte characters".  (There might be corner
cases where it'd work all right for conversion specifiers other than %s,
but probably you might as well just use strtol and friends in such cases.)
Ugh.

regards, tom lane

Definitive FreeBSD Sources:

https://svnweb.freebsd.org/base/


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 7011 W Parmer Ln, Apt 1115, Austin, TX 78729-6961


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Moving responsibility for logging "database system is shut down"

2016-02-10 Thread Tom Lane
Robert Haas  writes:
> On Wed, Feb 10, 2016 at 3:13 PM, Tom Lane  wrote:
>> So I propose the attached patch.  Any objections?  Should this get
>> back-patched?  It's arguably a bug, though surely a minor one, that
>> the message comes out when it does.

> I would vote against a back-patch.  And I kind of agree with Jim's
> comments that we ought to consider sprinkling a few more debug
> messages into the shutdown sequence.

[ shrug... ]  I won't stand in the way of someone else figuring out
what makes sense there, but I don't intend to do it; and I don't think
that the quick hacks I did over the last couple days make a reasonable
basis for a permanent patch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Tom Lane
Larry Rosenman  writes:
> On 2016-02-10 16:19, Tom Lane wrote:
>> I looked into the OS X sources, and found that indeed you are right:
>> *scanf processes the input a byte at a time, and applies isspace() to
>> each byte separately, even when the locale is such that that's a
>> clearly insane thing to do.  Since this code was derived from FreeBSD,
>> FreeBSD has or once had the same issue.  (A look at the freebsd project
>> on github says it still does, assuming that's the authoritative repo.)
>> Not sure about other BSDen.

> Definitive FreeBSD Sources:
> https://svnweb.freebsd.org/base/

Ah, thanks for the link.  I'm not totally sure which branch is most
current, but at least on this one, it's still clearly wrong:
https://svnweb.freebsd.org/base/stable/10/lib/libc/stdio/vfscanf.c?revision=291336&view=markup
convert_string(), which handles %s, applies isspace() to individual bytes
regardless of locale.  convert_wstring(), which handles %ls, does it more
intelligently ... but as I said upthread, relying on %ls would just give
us a different set of portability problems.

It looks like Artur's patch is indeed what we need to do, along with
looking around for other *scanf() uses that are vulnerable.
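Finding the other vulnerable call sites can start with a simple grep. The pattern and sample file below are illustrative only; in a real tree you would run the grep over src/ instead:

```shell
# Illustrative only: scan C sources for *scanf() calls that use %s, which
# are unsafe on data that may contain multibyte characters.  Demonstrated
# against a throwaway sample file.
cat > /tmp/scanf_sample.c <<'EOF'
void f(const char *s) { char buf[64]; sscanf(s, "%s", buf); }
void g(const char *s) { int n; sscanf(s, "%d", &n); }
EOF
grep -nE '\b[a-z]*scanf\s*\(' /tmp/scanf_sample.c | grep '%s'
```

The second function, which uses %d only, is deliberately not reported; per the discussion above, %s conversions are the primary hazard.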

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: schema PL session variables

2016-02-10 Thread Pavel Stehule
Hi

2016-02-09 23:41 GMT+01:00 Jim Nasby :

> On 2/9/16 4:13 PM, Corey Huinker wrote:
>
>>
>> We're not going to get source compatibility without implementing
>> packages, and there's no enthusiasm for that. It's been stated a few
>> times before by some that the only value they see in packages is the
>> package/session variables. Pavel's idea gives us that.
>>
>
> The other big thing you get is public vs private. You can
> sorta-kinda-almost simulate that with permissions in simple cases, but it
> ultimately falls apart as soon as you want a private function that does
> something as the user calling the function.
>

Schema variables are private by design. That can be enhanced in the future,
but for now it is out of my scope. If you need public access to these
variables, you can use functions; access to functions can be controlled by
rights. We could introduce a private (schema-limited) function too, but
again that is outside the scope of this proposal.


> When it comes to variables, I think it's a mistake to discuss this patch
> while pretending that packages don't exist. For example all we wanted were
> session variables, there's no reason they need to be tied to schemas. The
> only reason to tie them to schemas is to try and fake package support via
> schemas. I think it'd be a mistake to have non-schema variables, but lets
> not fool ourselves as to why that would be a mistake.
>

I am happy that you opened the question of packages. Originally the Oracle
package is an Ada language feature, but if you compare Oracle schemas with
PostgreSQL schemas, you will see significant differences. Our schemas are
much more similar to Oracle packages than to Oracle schemas. So introducing
packages to Postgres would be counterproductive - it would be pretty messy
to have both packages and schemas. We don't need packages, because we have
schemas, but we don't have any safe (and simple to use) schema-scope tools.
I implemented Orafce, and the main problems there are not missing packages,
but different default casting rules and missing procedures.


>
> Another problem I have with this is it completely ignores public/private
> session variables. The current claim is that's not a big deal because you
> can only access the variables from a PL, but I give it 2 days of this being
> released before people are asking for a way to access the variables
> directly from SQL. Now you have a problem because if you want private
> variables (which I think is pretty important) you're only choice is to use
> SECDEF functions, which is awkward at best.
>
> I forgot to mention that if we're FROM-phobic the syntax could also be
>>
>> IMPORT my_schema.bar AS g_localtext IN OUT text
>>
>> Either way, you get the idea: the function defines what external globals
>> it's willing to see, and gives an alias for them, and it's the same
>> regardless of what the function language is.
>>
>
> ISTM that for plpgsql it would be better to add a namespace level above
> the current top level (which is the function level).


It is. The outer function level is a schema.

I thought about a possible feature:

DECLARE xxx int%SCHEMASCOPE;

But that would be pretty difficult to check - another function could have
"DECLARE xxx bigint%SCHEMASCOPE;", and which version is valid? Currently we
can validate any function without checking any other functions. If I allow
externally living objects to be declared this way, then I have to validate
all functions in a schema together, which is much more expensive. I don't
want to introduce slower Oracle-style compilation and its dependency
issues. So I need an externally created object. That is the reason why I
used a CREATE statement instead of a DECLARE statement.

Regards

Pavel



Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-10 Thread Andrew Dunstan



On 02/10/2016 12:53 PM, Tom Lane wrote:

Andrew Dunstan  writes:

Yeah. It's faintly possible that a kernel upgrade will  help.

Another data point. I have another RPi2B that is running Debian Wheezy
rather than the Fedora remix. I'm running the same test on it we ran
yesterday on axolotl. It seems to be running without having the same
problems. So maybe the particular kernel port to ARM is what's tripping
us up.

I'd bet against it being a port-specific problem; it sounds more like a
filesystem performance-tuning issue, which could easily change from one
kernel version to another.  What are the kernel version numbers exactly?


axolotl: Linux pirouette 
3.18.13-501.20150510gitf36e19f.sc20.armv7hl.bcm2709 #1 SMP PREEMPT

debian: Linux piratic 3.18.7-v7+ #755 SMP PREEMPT

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-02-10 Thread Pavel Stehule
Hi Vitaly,

please, can you send your version of this patch, how we talked about it in
Moscow?

Thank you

Pavel


Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Tom Lane
Artur Zakirov  writes:
>  [ tsearch_aff_parse_v1.patch ]

I've pushed this with some corrections --- notably, I did not like the
lack of buffer-overrun prevention, and it did the wrong thing if a line
had more than one trailing space character.

We still need to look at other uses of *scanf(), but I think that any
other changes that might be needed should be separate patches anyhow.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Chapman Flack
On 02/10/16 17:19, Tom Lane wrote:

> I also verified that in UTF8-based locales, isspace() thinks that 0x85 and
> 0xA0, and no other high-bit-set values, are spaces.  Not sure exactly why

Unicode NEXT LINE (NEL) and NO-BREAK SPACE, respectively.

http://unicode.org/standard/reports/tr13/tr13-5.html
http://unicode.org/charts/PDF/U0080.pdf

-Chap


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-10 Thread Amit Kapila
On Wed, Feb 10, 2016 at 1:42 PM, Michael Paquier 
wrote:
>
> On Wed, Feb 10, 2016 at 5:00 PM, Amit Kapila 
wrote:
> > Okay, but isn't it better that we remove the snapshot taken
> > at checkpoint time in the main branch or till where this code is
> > getting back-patched.   Do you see any need of same after
> > having the logging of snapshot in bgwriter?
>
> But this one is necessary as well to allow hot standby faster to
> initialize, no? Particularly in the case where a bgwriter snapshot
> would have been taken just before the checkpoint, there may be up to
> 15s until the next one.
>

It could be helpful if checkpoints are done at shorter intervals; otherwise
we log the snapshot at a 15s interval anyway, and if we need faster
initialisation of hot standby, then it is better to reduce the log-snapshot
interval in the bgwriter.  If we go by my reasoning, the checkpoint-time
snapshot should have been removed when the bgwriter was changed to log
snapshots, so take this suggestion if you feel the same; anyway, this
is not a correctness issue.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-02-10 Thread Vitaly Burovoy
On 2/10/16, Pavel Stehule  wrote:
> Hi Vitaly,
>
> please, can you send your version of this patch, how we talked about it in
> Moscow?
>
> Thank you
>
> Pavel

Hello, Pavel!

Please find attached my version of the patch (it applies cleanly on
top of 64d89a9 which is current master).

It was time to change the OID: 3331 is used by "bytea_sortsupport", so I
set pg_size_bytes to 3334.

I took the code design of the number checking from json_lex_number in
src/backend/utils/adt/json.c.
To me it seems more structured. I adapted it a little, and it makes it
easy to add parsing of an exponent (like '10e3 MB') later, for full
support of numeric (if that is ever necessary).

When I added "trimming" for size units (playing with avoiding an extra
buffer), I found it is easy to support the "bytes" unit (but "byte" is
still unsupported).


Also this version includes all changes I mentioned in my last review[1]:
1. parse_memory_unit returns value instead of using a pointer (return
zero if noting is found) for it;
2. all messages are in a single style (nuances are in errdetails);
3. "select"s are in uppercase, rephrased and moved a comment block in test;
4. several tests are added (also with supporting of "bytes" unit);
5. a sentence in a documentation is rephrased (numeric->fixed-point
number); "bytes" unit is added to both functions;
6. fixed indentation a little;
7. pfree is removed (it is easier than removing all other allocated resources).
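As a sketch of what such a parsing function does (this is not the patch's C code; the unit set, the "bytes" default, and the rounding behavior here are illustrative assumptions):

```python
import re

# Unit multipliers like those discussed above; the exact set recognized by
# the patch (including "bytes" but not "byte") is an assumption here.
UNITS = {"bytes": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def size_bytes(text: str) -> int:
    """Parse a human-readable size such as '10 MB' into a byte count."""
    m = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*([a-zA-Z]+)?\s*", text)
    if not m:
        raise ValueError(f"invalid size: {text!r}")
    number = float(m.group(1))
    unit = (m.group(2) or "bytes").lower()   # unit-less input means bytes
    if unit not in UNITS:
        raise ValueError(f"invalid size unit: {m.group(2)!r}")
    return int(number * UNITS[unit])

print(size_bytes("10 MB"))   # 10485760
print(size_bytes("1.5 kB"))  # 1536
```

Adding exponent support ('10e3 MB') as mentioned above would only require extending the number regex, which is the structural benefit of borrowing the json_lex_number-style scanner.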


I still think my changes are small, and they are based on your work
(and research).

[1]http://www.postgresql.org/message-id/cakoswnk13wvdem06lro-hucr0pr6et29+dvqy6j5skxzaru...@mail.gmail.com

-- 
Best regards,
Vitaly Burovoy


pg-size-bytes-11chgd.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Chapman Flack
On 02/10/16 23:55, Tom Lane wrote:

> Yeah, I got that --- what seems squishier is that none of the other C1
> control characters are considered whitespace?

That seems to be exactly the case:

http://www.unicode.org/Public/5.2.0/ucd/PropList.txt

09..0D, 20, 85, and A0 are the only whitespace chars whose codepoints
fit in a byte.

-Chap


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2016-02-10 Thread Robert Haas
On Wed, Feb 10, 2016 at 1:25 AM, Kouhei Kaigai  wrote:
> It is pretty good!
>
> The attached patch (primary one) implements the above idea.
>
> Now ExtensibleNode works as a basis structure of data container,
> regardless of CustomScan and ForeignScan.
> Also, fdw_private and custom_private are redefined to Node * type
> from List * type. It affected a few FDW APIs.

I extracted the subset of this that just creates the extensible node
framework and did some cleanup - the result is attached.  I will
commit this if nobody objects.

I think the part about whacking around the FDW API is a little more
potentially objectionable to others, so I want to hold off doing that
unless a few more people chime in with +1.  Perhaps we could start a
new thread to talk about that specific idea.  This is useful even
without that, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index fe2e460..0b1e98c 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = nodeFuncs.o nodes.o list.o bitmapset.o tidbitmap.o \
-   copyfuncs.o equalfuncs.o makefuncs.o \
+   copyfuncs.o equalfuncs.o extensible.o makefuncs.o \
outfuncs.o readfuncs.o print.o read.o params.o value.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e54d174..a9e9cc3 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -23,6 +23,7 @@
 #include "postgres.h"
 
 #include "miscadmin.h"
+#include "nodes/extensible.h"
 #include "nodes/plannodes.h"
 #include "nodes/relation.h"
 #include "utils/datum.h"
@@ -4166,6 +4167,27 @@ _copyList(const List *from)
 }
 
 /* 
+ *	extensible.h copy functions
+ * 
+ */
+static ExtensibleNode *
+_copyExtensibleNode(const ExtensibleNode *from)
+{
+	ExtensibleNode	   *newnode;
+	const ExtensibleNodeMethods *methods;
+
+	methods = GetExtensibleNodeMethods(from->extnodename, false);
+	newnode = (ExtensibleNode *) newNode(methods->node_size,
+		 T_ExtensibleNode);
+	COPY_STRING_FIELD(extnodename);
+
+	/* copy the private fields */
+	methods->nodeCopy(newnode, from);
+
+	return newnode;
+}
+
+/* 
  *	value.h copy functions
  * 
  */
@@ -4545,6 +4567,13 @@ copyObject(const void *from)
 			break;
 
 			/*
+			 * EXTENSIBLE NODES
+			 */
+		case T_ExtensibleNode:
+			retval = _copyExtensibleNode(from);
+			break;
+
+			/*
 			 * PARSE NODES
 			 */
 		case T_Query:
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 08ccc0d..b9c3959 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -29,6 +29,7 @@
 
 #include "postgres.h"
 
+#include "nodes/extensible.h"
 #include "nodes/relation.h"
 #include "utils/datum.h"
 
@@ -871,6 +872,25 @@ _equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
 	return true;
 }
 
+/*
+ * Stuff from extensible.h
+ */
+static bool
+_equalExtensibleNode(const ExtensibleNode *a, const ExtensibleNode *b)
+{
+	const ExtensibleNodeMethods  *methods;
+
+	COMPARE_STRING_FIELD(extnodename);
+
+	/* At this point, we know extnodename is the same for both nodes. */
+	methods = GetExtensibleNodeMethods(a->extnodename, false);
+
+	/* compare the private fields */
+	if (!methods->nodeEqual(a, b))
+		return false;
+
+	return true;
+}
 
 /*
  * Stuff from parsenodes.h
@@ -2874,6 +2894,13 @@ equal(const void *a, const void *b)
 			break;
 
 			/*
+			 * EXTENSIBLE NODES
+			 */
+		case T_ExtensibleNode:
+			retval = _equalExtensibleNode(a, b);
+			break;
+
+			/*
 			 * PARSE NODES
 			 */
 		case T_Query:
diff --git a/src/backend/nodes/extensible.c b/src/backend/nodes/extensible.c
new file mode 100644
index 000..8fb4767
--- /dev/null
+++ b/src/backend/nodes/extensible.c
@@ -0,0 +1,92 @@
+/*-
+ *
+ * extensible.c
+ *	  Support for extensible node types
+ *
+ * Loadable modules can define what are in effect new types of nodes using
+ * the routines in this file.  All such nodes are flagged T_ExtensibleNode,
+ * with the extnodename field distinguishing the specific type.  Use
+ * RegisterExtensibleNodeMethods to register a new type of extensible node,
+ * and GetExtensibleNodeMethods to get information about a previously
+ * registered type of extensible node.
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  

Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Tom Lane
Chapman Flack  writes:
> On 02/10/16 17:19, Tom Lane wrote:
>> I also verified that in UTF8-based locales, isspace() thinks that 0x85 and
>> 0xA0, and no other high-bit-set values, are spaces.  Not sure exactly why

> Unicode NEXT LINE (NEL) and NO-BREAK SPACE, respectively.

Yeah, I got that --- what seems squishier is that none of the other C1
control characters are considered whitespace?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: function parse_ident

2016-02-10 Thread Pavel Stehule
2016-02-10 19:26 GMT+01:00 Jim Nasby :

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>
> NEEDS CATVERSION BUMP
>
> I editorialized the docs and some of the comments. In particular, I
> documented behavior of not truncating, and recommended casting to name[] if
> user cares about that. Added a unit test to verify that works. BTW, I saw
> mention in the thread about not truncated spaces, but the function *does*
> truncate them, unless they're inside quotes, where they're legitimate.
>

ok

>
> Also added test for invalid characters.
>
> I think "strict" would be more in line with other uses in code. There are
> currently no other occurrences of 'strictmode' in the code. There are loads
> of references to 'strict', but I didn't go through all of them to see if
> any were used as externally visible function parameter names.
>

I am sorry, I don't understand this point. You dislike the parameter name
"strictmode"? Do you have a proposal? Maybe "restrictive"?


>
> qualname_str is used in exactly 1 place. Either it should be gotten rid
> of, or all the uses of text_to_cstring(qualname) should be changed to
> qualname_str.
>

fixed, qualname_str is used everywhere


>
> I think the code would have been clearer if instead of the big if (*nextp
> == '\"') it did the same "inquote" looping that is done elsewhere, but I
> don't have a strong opinion on it.
>

Almost all of the code, plus or minus, is related to human-readable error
messages. We could move some code into separate static functions -
read_quoted_ident, read_unquoted_ident - but there would be some magic
around the parameters, and the code would not be much better than it is now.

>
> The new status of this patch is: Waiting on Author
>

Thank you for your work on documentation.

I am sending updated version of this patch.

Regards

Pavel


>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index f9eea76..9eed19a
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 1821,1826 
--- 1821,1847 

 
  
+  parse_ident
+ 
+ parse_ident(str text,
+[ strictmode boolean DEFAULT true ] )
+
+text[]
+Split qualified identifier into array parts.
+When strictmode is false, extra characters after the identifier are ignored.
+This is useful for parsing identifiers for objects like functions and arrays that may have trailing
+characters. By default, extra characters after the last identifier are considered an error.
+Note that this function
+does not truncate quoted identifiers. If you care about that you should cast the result of this
+ 	   function to name[].
+
+parse_ident('"SomeSchema".someTable')
+"SomeSchema,sometable"
+   
+ 
+   
+
+ 
   pg_client_encoding
  
  pg_client_encoding()
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index 923fe58..61d5b80
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** RETURNS jsonb
*** 965,967 
--- 965,974 
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION
+   parse_ident(str text, strictmode boolean DEFAULT true)
+ RETURNS text[]
+ LANGUAGE INTERNAL
+ STRICT IMMUTABLE
+ AS 'parse_ident';
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
new file mode 100644
index 2b4ab20..7aa5b76
*** a/src/backend/parser/scansup.c
--- b/src/backend/parser/scansup.c
*** scanstr(const char *s)
*** 130,135 
--- 130,144 
  char *
  downcase_truncate_identifier(const char *ident, int len, bool warn)
  {
+ 	return downcase_identifier(ident, len, warn, true);
+ }
+ 
+ /*
+  * a workhorse for downcase_truncate_identifier
+  */
+ char *
+ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
+ {
  	char	   *result;
  	int			i;
  	bool		enc_is_single_byte;
*** downcase_truncate_identifier(const char
*** 158,169 
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
--- 167,179 
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN && truncate)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
+ 
  /*
   * truncate_identifier() --- truncate an identifier to 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-10 Thread Michael Paquier
On Wed, Feb 10, 2016 at 5:34 PM, Kyotaro HORIGUCHI
 wrote:
>
> Hello,
>
> At Wed, 10 Feb 2016 15:22:44 +0900, Michael Paquier 
>  wrote in 
> 

Re: [HACKERS] Relation extension scalability

2016-02-10 Thread Andres Freund
On 2016-02-10 10:32:44 +0530, Dilip Kumar wrote:
> On Fri, Feb 5, 2016 at 4:50 PM, Amit Kapila  wrote:
> 
> > > Could you also measure how this behaves for an INSERT instead of a COPY
> > > workload?
> >
> >
> I think such a test will be useful.
> >
> 
> I have measured the performance with insert to see the behavior when it
> don't use strategy. I have tested multiple option, small tuple, big tuple,
> data fits in shared buffer and doesn't fit in shared buffer.

Could you please also have a look at the influence this has on latency?
I think you unfortunately have to look at the per-transaction logs, and
then see whether the worst case latencies get better or worse.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-02-10 Thread Rushabh Lathia
Fujita-san, I am attaching an updated version of the patch, which adds
the documentation update.

Once we finalize this, I feel good about the patch and think that we
could mark it as ready for committer.

On Fri, Feb 5, 2016 at 5:33 PM, Rushabh Lathia 
wrote:

>
>
> On Fri, Feb 5, 2016 at 4:46 PM, Rushabh Lathia 
> wrote:
>
>>
>>
>> On Wed, Feb 3, 2016 at 3:31 PM, Etsuro Fujita <
>> fujita.ets...@lab.ntt.co.jp> wrote:
>>
>>> On 2016/01/28 15:20, Rushabh Lathia wrote:
>>>
 On Thu, Jan 28, 2016 at 11:33 AM, Etsuro Fujita
 >
 wrote:

 On 2016/01/27 21:23, Rushabh Lathia wrote:

 If I understood correctly, above documentation means, that if
 FDW have
 DMLPushdown APIs that is enough. But in reality thats not the
 case, we
 need  ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
 in case
 DML is not pushable.

 And here fact is DMLPushdown APIs are optional for FDW, so that
 if FDW
 don't have DMLPushdown APIs they can still very well perform
 the DML
 operations using ExecForeignInsert, ExecForeignUpdate, or
 ExecForeignDelete.

>>>
>>> So documentation should be like:

 If the IsForeignRelUpdatable pointer is set to NULL, foreign
 tables are
 assumed to be insertable, updatable, or deletable if the FDW
 provides
 ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
 respectively,

 If FDW provides DMLPushdown APIs and the DML are pushable to the
 foreign
 server, then FDW still needs ExecForeignInsert,
 ExecForeignUpdate, or
 ExecForeignDelete for the non-pushable DML operation.

 What's your opinion ?

>>>
>>> I agree that we should add this to the documentation, too.

>>>
>>> I added docs to the IsForeignRelUpdatable documentation.  Also, a brief
>>> introductory remark has been added at the beginning of the DML pushdown
>>> APIs' documentation.
>>>
>>> BTW, if I understand correctly, I think we should also modify
 relation_is_updatabale() accordingly.  Am I right?

>>>
>>> Yep, we need to modify relation_is_updatable().

>>>
>>> I thought I'd modify that function in the same way as
>>> CheckValidResultRel(), but I noticed that we cannot do that, because we
>>> don't have any information on whether each update is pushed down to the
>>> remote server by PlanDMLPushdown, during relation_is_updatabale().
>>
>>
>> Sorry, I didn't get you here. Can't resultRelInfo->ri_FdwPushdown give
>> information about whether the update is safe to push down? What my
>> concern here is: let's say resultRelInfo->ri_FdwPushdown is marked true
>> (PlanDMLPushdown returned true), but later CheckValidResultRel() finds
>> that the BeginDMLPushdown, IterateDMLPushdown and EndDMLPushdown APIs
>> are missing, and it ends up with an error.
>>
>> What I think CheckValidResultRel() should do is: if
>> resultRelInfo->ri_FdwPushdown is true, check for the DMLPushdown APIs,
>> and if it doesn't find those APIs, check for the traditional APIs
>> (ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete). If it
>> finds neither, it should raise an error.
>>
>> I changed CheckValidResultRel(), where
>>
>> 1) Don't throw an error if resultRelInfo->ri_FdwPushdown is true and
>> the DMLPushdown APIs are missing, as the query can still perform the
>> operation with the traditional ExecForeign APIs. In this situation just
>> mark resultRelInfo->ri_FdwPushdown false.
>>
>> (I wonder: can we add the checks for the DMLPushdown APIs into
>> PlanDMLPushdown as an additional check? That is, PlanDMLPushdown would
>> return true only if the FDW provides the BeginDMLPushdown,
>> IterateDMLPushdown and EndDMLPushdown APIs. What do you say?)
>>
>>
> On second thought, we should not give the FDW the responsibility of
> checking for the APIs. So maybe we should call PlanDMLPushdown only if
> the BeginDMLPushdown, IterateDMLPushdown and EndDMLPushdown APIs are
> present in the FDW. That means we prepare a DMLPushdown plan only when
> all the required APIs are available from the FDW. This will also reduce
> the changes in CheckValidResultRel().
>
> Thanks, Ashutosh Bapat, for the healthy discussion.
>
> PFA patch.
>
>
>
>> 2) Don't throw an error if resultRelInfo->ri_FdwPushdown is true and
>> the DMLPushdown APIs are present but the ExecForeign APIs are missing.
>> 3) Throw an error if resultRelInfo->ri_FdwPushdown is false and the
>> ExecForeign APIs are missing.
>>
>> Attached is the WIP patch; do share your thoughts.
>> (need to apply on top of V6 patch)
>>
>>
>> So, I left that function as-is.  relation_is_updatable() is just used
>>> for display in the information_schema views, so ISTM 

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-10 Thread Amit Kapila
On Wed, Feb 10, 2016 at 1:01 PM, Michael Paquier 
wrote:

>
>
> On Wed, Feb 10, 2016 at 4:11 PM, Amit Kapila 
> wrote:
>
>>
>>
>>
>>>
>>> >> > - last_snapshot_lsn != GetXLogInsertRecPtr())
>>> >> > +
>>> >> > GetLastCheckpointRecPtr() < GetProgressRecPtr())
>>> >> >
>>> >> > How is the above check in the patch supposed to work?
>>> >> > I think it could so happen that once the bgwriter logs a snapshot,
>>> >> > neither checkpoint nor progressAt gets updated any further, and I
>>> >> > think this check will allow logging snapshots in such a case as
>>> >> > well.
>>> >>
>>> >> The only purpose of this check is to do the following thing: if no WAL
>>> >> activity has happened since last checkpoint, there is no need to log a
>>> >> standby snapshot from the perspective of the bgwriter. In case the
>>> >> system is idle, we want to skip logging that and avoid the unnecessary
>>> >> checkpoints that those records would otherwise trigger. If the
>>> >> system is not idle, or to put it in other words there has been at
>>> >> least one record since the last checkpoint, we would log a standby
>>> >> snapshot, enforcing as well a checkpoint to happen the next time
>>> >> CreateCheckpoint() is gone through, and a standby snapshot is logged
>>> >> as well after the checkpoint contents are flushed. I am not sure I
>>> >> understand what you are getting at...
>>> >
>>> > Let me try to say again, suppose ControlFile->checkPoint is at
>>> > 100 and latest value of progressAt returned by GetProgressRecPtr
>>> > is 105, so the first time the above check happens, it will allow
>>> > to log standby snapshot which is okay, now assume there is no
>>> > activity, neither there is any checkpoint and again after
>>> > LOG_SNAPSHOT_INTERVAL_MS interval, when the above check
>>> > gets executed, it will pass and log the standby snapshot which is
>>> > *not* okay, because we don't want to log standby snapshot when
>>> > there is no activity.  Am I missing something?
>>>
>>> Ah, right. Sorry for not getting your point. OK, now I got it. So
>>> basically what the code does now is that if there is just one record
>>> after a checkpoint, we would log a standby snapshot every 15s until the
>>> next checkpoint even if nothing happens, though we could save a bunch
>>> of standby records. So your point is that we need to save the result of
>>> GetProgressRecPtr() at each loop in the bgwriter when a standby
>>> snapshot is taken, say in a static XLogRecPtr called last_progress_lsn.
>>> And then at the next loop we compare it with the current result of
>>> GetProgressRecPtr(), so the condition to check whether a standby
>>> snapshot should be generated in the bgwriter becomes:
>>> (now >= timeout &&
>>>  GetLastCheckpointRecPtr() < current_progress_ptr &&
>>>  last_progress_ptr < current_progress_ptr)
>>>
>>
>> Why do you think including the checkpoint-related check is
>> required? How about something like below:
>>
>> (now >= timeout &&
>>  last_progress_ptr < current_progress_ptr)
>>
>
> We need as well to be sure that no standby snapshots are logged just after
> a checkpoint in this code path when last_progress_ptr is referring to an
> LSN position *before* the last checkpoint LSN. There is already one
> snapshot logged in CreateCheckpoint(); there is no need for an extra
> one, which is why the check on progress since the last checkpoint seems
> necessary to me.
>

Okay, but isn't it better to remove the snapshot taken at checkpoint
time in the main branch, or down to wherever this code is getting
back-patched?  Do you see any need for it once the snapshot is logged
in the bgwriter?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] old bug in full text parser

2016-02-10 Thread Oleg Bartunov
On Wed, Feb 10, 2016 at 12:28 PM, Oleg Bartunov  wrote:

> It looks like there is a very old bug in the full text parser (somebody
> pointed me to it), which appeared after moving tsearch2 into the core.
> The problem is in how the full text parser processes hyphenated words.
> Our original idea was to report the hyphenated word itself as well as
> its parts and ignore the hyphen. That was how tsearch2 worked.
>
> This behaviour was changed after moving tsearch2 into the core:
> 1. The hyphen is now reported by the parser, which is useless.
> 2. Hyphenated words with numbers ('4-dot', 'dot-4') are processed
> differently from ones with plain-text words like 'four-dot'; the
> hyphenated word itself is not reported.
>
> I think we should consider this a bug and produce a fix for all
> supported versions.
>
> After  investigation we found this commit:
>
> commit 73e6f9d3b61995525785b2f4490b465fe860196b
> Author: Tom Lane 
> Date:   Sat Oct 27 19:03:45 2007 +
>
> Change text search parsing rules for hyphenated words so that digit
> strings
> containing decimal points aren't considered part of a hyphenated word.
> Sync the hyphenated-word lookahead states with the subsequent
> part-by-part
> reparsing states so that we don't get different answers about how much
> text
> is part of the hyphenated word.  Per my gripe of a few days ago.
>
>
> 8.2.23
>
> select tok_type, description, token from ts_debug('dot-four');
>   tok_type   |  description  |  token
> -+---+--
>  lhword  | Latin hyphenated word | dot-four
>  lpart_hword | Latin part of hyphenated word | dot
>  lpart_hword | Latin part of hyphenated word | four
> (3 rows)
>
> select tok_type, description, token from ts_debug('dot-4');
>   tok_type   |  description  | token
> -+---+---
>  hword   | Hyphenated word   | dot-4
>  lpart_hword | Latin part of hyphenated word | dot
>  uint| Unsigned integer  | 4
> (3 rows)
>
> select tok_type, description, token from ts_debug('4-dot');
>  tok_type |   description| token
> --+--+---
>  uint | Unsigned integer | 4
>  lword| Latin word   | dot
> (2 rows)
>
> 8.3.23
>
> select alias, description, token from ts_debug('dot-four');
>   alias  |   description   |  token
> -+-+--
>  asciihword  | Hyphenated word, all ASCII  | dot-four
>  hword_asciipart | Hyphenated word part, all ASCII | dot
>  blank   | Space symbols   | -
>  hword_asciipart | Hyphenated word part, all ASCII | four
> (4 rows)
>
> select alias, description, token from ts_debug('dot-4');
>alias   |   description   | token
> ---+-+---
>  asciiword | Word, all ASCII | dot
>  int   | Signed integer  | -4
> (2 rows)
>
> select alias, description, token from ts_debug('4-dot');
>alias   |   description| token
> ---+--+---
>  uint  | Unsigned integer | 4
>  blank | Space symbols| -
>  asciiword | Word, all ASCII  | dot
> (3 rows)
>
>

Oh, one more bug, which existed even in tsearch2.

select tok_type, description, token from ts_debug('4-dot');
 tok_type |   description| token
--+--+---
 uint | Unsigned integer | 4
 lword| Latin word   | dot
(2 rows)




>
> Regards,
> Oleg
>


Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-02-10 Thread Aleksander Alekseev
Hello, Robert

> Basically, the burden for you to impose a new coding rule on everybody
> who uses shared hash tables in the future is very high.

I fixed the issue you described. The number of spinlocks doesn't depend
on NUM_LOCK_PARTITIONS anymore and can be specified for each hash table
on the calling side.

I ran the benchmark described in the first message of this thread again.
Currently I don't have access to the same 60-core server, so I used a
more common 12-core server (24 with HT). According to this benchmark,
the TPS increase depends on NUM_LOCK_PARTITIONS and the default number
of spinlocks as follows:

pgbench -f pgbench.sql -T 150 -P 1 -c 40 -j 12

 DMN | NLP = 16 | NLP = 32 | NLP = 64 | NLP = 128
-|--|--|--|--
   8 |  +15.1%  |  +28.2%  |  +34.1%  |  +33.7%  
  16 |  +16.6%  |  +30.9%  |  +37.0%  |  +40.8%  
  32 |  +15.1%  |  +33.9%  |  +39.5%  |  +41.9%  
  64 |  +15.0%  |  +31.9%  |  +40.1%  |  +42.9%  
 128 |   +7.7%  |  +24.7%  |  +29.6%  |  +31.6%  

* NLP = NUM_LOCK_PARTITIONS
* DMN = DEFAULT_MUTEXES_NUM

I realize this benchmark doesn't represent any real workload, so for the
attached patch I chose NUM_LOCK_PARTITIONS = DEFAULT_MUTEXES_NUM = 32.
It seems to be a reasonable compromise between the speedup in this
admittedly synthetic benchmark and the number of locks used, which could
matter quite a lot in practice. Still, these values can easily be
changed at any moment.

Here are before/after benchmark results for this concrete patch.


BEFORE

pgbench (default):

tps = 1295.798531 (including connections establishing)
tps = 1295.858295 (excluding connections establishing)

pgbench -f pgbench.sql:

tps = 1020.072172 (including connections establishing)
tps = 1020.116888 (excluding connections establishing)


AFTER

pgbench (default):

tps = 1299.369807 (including connections establishing)
tps = 1299.429764 (excluding connections establishing)

pgbench -f pgbench.sql:

tps = 1365.749333 (including connections establishing)
tps = 1365.814384 (excluding connections establishing)


So as I understand it, this patch solves the lock contention problem and
doesn't make anything else worse.

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index dffc477..9a15a2a 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -495,7 +495,7 @@ pgss_shmem_startup(void)
 	info.hash = pgss_hash_fn;
 	info.match = pgss_match_fn;
 	pgss_hash = ShmemInitHash("pg_stat_statements hash",
-			  pgss_max, pgss_max,
+			  pgss_max,
 			  &info,
 			  HASH_ELEM | HASH_FUNCTION | HASH_COMPARE);
 
diff --git a/src/backend/storage/buffer/buf_table.c b/src/backend/storage/buffer/buf_table.c
index 39e8baf..dd5acb7 100644
--- a/src/backend/storage/buffer/buf_table.c
+++ b/src/backend/storage/buffer/buf_table.c
@@ -62,7 +62,7 @@ InitBufTable(int size)
 	info.num_partitions = NUM_BUFFER_PARTITIONS;
 
 	SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
-  size, size,
+  size,
   &info,
   HASH_ELEM | HASH_BLOBS | HASH_PARTITION);
 }
diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index 81506ea..4c18701 100644
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -237,7 +237,7 @@ InitShmemIndex(void)
 	hash_flags = HASH_ELEM;
 
 	ShmemIndex = ShmemInitHash("ShmemIndex",
-			   SHMEM_INDEX_SIZE, SHMEM_INDEX_SIZE,
+			   SHMEM_INDEX_SIZE,
 			   &info, hash_flags);
 }
 
@@ -255,17 +255,12 @@ InitShmemIndex(void)
  * exceeded substantially (since it's used to compute directory size and
  * the hash table buckets will get overfull).
  *
- * init_size is the number of hashtable entries to preallocate.  For a table
- * whose maximum size is certain, this should be equal to max_size; that
- * ensures that no run-time out-of-shared-memory failures can occur.
- *
  * Note: before Postgres 9.0, this function returned NULL for some failure
  * cases.  Now, it always throws error instead, so callers need not check
  * for NULL.
  */
 HTAB *
 ShmemInitHash(const char *name, /* table string name for shmem index */
-			  long init_size,	/* initial table size */
 			  long max_size,	/* max size of the table */
 			  HASHCTL *infoP,	/* info about key and bucket size */
 			  int hash_flags)	/* info about infoP */
@@ -299,7 +294,7 @@ ShmemInitHash(const char *name, /* table string name for shmem index */
 	/* Pass location of hashtable header to hash_create */
 	infoP->hctl = (HASHHDR *) location;
 
-	return hash_create(name, init_size, infoP, hash_flags);
+	return hash_create(name, max_size, infoP, hash_flags);
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index e3e9599..fc20a67 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -374,18 +374,10 @@ void
 InitLocks(void)
 {
 	HASHCTL		info;
-	long		init_table_size,
-

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-10 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 10 Feb 2016 15:22:44 +0900, Michael Paquier  
wrote in 

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-10 Thread Michael Paquier
On Wed, Feb 10, 2016 at 5:00 PM, Amit Kapila  wrote:
> Okay, but isn't it better to remove the snapshot taken at checkpoint
> time in the main branch, or down to wherever this code is getting
> back-patched?  Do you see any need for it once the snapshot is logged
> in the bgwriter?

But this one is necessary as well to allow hot standby to initialize
faster, no? Particularly in the case where a bgwriter snapshot was
taken just before the checkpoint, there may be up to 15s until the
next one.

And AFAIK, this code would not get a back-patch, as stated by Andres
upthread :( I would think that it is better to actually get a
backpatch, but well...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] old bug in full text parser

2016-02-10 Thread Oleg Bartunov
It looks like there is a very old bug in the full text parser (somebody
pointed me to it), which appeared after moving tsearch2 into the core.
The problem is in how the full text parser processes hyphenated words.
Our original idea was to report the hyphenated word itself as well as
its parts and ignore the hyphen. That was how tsearch2 worked.

This behaviour was changed after moving tsearch2 into the core:
1. The hyphen is now reported by the parser, which is useless.
2. Hyphenated words with numbers ('4-dot', 'dot-4') are processed
differently from ones with plain-text words like 'four-dot'; the
hyphenated word itself is not reported.

I think we should consider this a bug and produce a fix for all
supported versions.

After  investigation we found this commit:

commit 73e6f9d3b61995525785b2f4490b465fe860196b
Author: Tom Lane 
Date:   Sat Oct 27 19:03:45 2007 +

Change text search parsing rules for hyphenated words so that digit
strings
containing decimal points aren't considered part of a hyphenated word.
Sync the hyphenated-word lookahead states with the subsequent
part-by-part
reparsing states so that we don't get different answers about how much
text
is part of the hyphenated word.  Per my gripe of a few days ago.


8.2.23

select tok_type, description, token from ts_debug('dot-four');
  tok_type   |  description  |  token
-+---+--
 lhword  | Latin hyphenated word | dot-four
 lpart_hword | Latin part of hyphenated word | dot
 lpart_hword | Latin part of hyphenated word | four
(3 rows)

select tok_type, description, token from ts_debug('dot-4');
  tok_type   |  description  | token
-+---+---
 hword   | Hyphenated word   | dot-4
 lpart_hword | Latin part of hyphenated word | dot
 uint| Unsigned integer  | 4
(3 rows)

select tok_type, description, token from ts_debug('4-dot');
 tok_type |   description| token
--+--+---
 uint | Unsigned integer | 4
 lword| Latin word   | dot
(2 rows)

8.3.23

select alias, description, token from ts_debug('dot-four');
  alias  |   description   |  token
-+-+--
 asciihword  | Hyphenated word, all ASCII  | dot-four
 hword_asciipart | Hyphenated word part, all ASCII | dot
 blank   | Space symbols   | -
 hword_asciipart | Hyphenated word part, all ASCII | four
(4 rows)

select alias, description, token from ts_debug('dot-4');
   alias   |   description   | token
---+-+---
 asciiword | Word, all ASCII | dot
 int   | Signed integer  | -4
(2 rows)

select alias, description, token from ts_debug('4-dot');
   alias   |   description| token
---+--+---
 uint  | Unsigned integer | 4
 blank | Space symbols| -
 asciiword | Word, all ASCII  | dot
(3 rows)


Regards,
Oleg


Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-10 Thread Ashutosh Bapat
Here's a patch to remove this declaration. The Assert just after it
probably prevents the warning in builds with asserts, but neither of
those lines is needed.

On Wed, Feb 10, 2016 at 12:01 PM, Jeff Janes  wrote:

> On Tue, Feb 9, 2016 at 11:16 AM, Robert Haas 
> wrote:
> > On Tue, Feb 9, 2016 at 8:39 AM, Ashutosh Bapat
> >  wrote:
> >> Thanks Jeevan for your review and comments. PFA the patch which fixes
> those.
> >
> > Committed with a couple more small adjustments.
>
> I'm getting a compiler warning which I think is coming from this commit.
>
> postgres_fdw.c: In function 'fetch_more_data':
> postgres_fdw.c:2526:17: warning: unused variable 'fsplan'
> [-Wunused-variable]
> ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
>
> Thanks,
>
> Jeff
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


extra_fsplan.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread Magnus Hagander
Per discussion at the developer meeting in Brussels, here's a patch that
makes some updates to the backup APIs, to support non-exclusive backups
without using pg_basebackup. The idea is to fix at least three main
issues that exist today -- that you cannot run concurrent backups; that
the backup_label file is created in the data directory, which makes it
impossible to distinguish between a cluster restored from backup and one
that crashed while a backup was running; and that a cluster can get
"stuck" in backup mode if the backup script/software crashes.

To make this work, this patch:

* Introduces a new argument for pg_start_backup(), for "exclusive". It
defaults to true, which means that just calling pg_start_backup() like
before will behave exactly like before. If it's called with exclusive='f',
a non-exclusive backup is started, and the backup label is collected in
memory instead of in the backup_label file.

* If the client disconnects with a non-exclusive backup running, the backup
is automatically aborted. This is the same thing that pg_basebackup does.
To use these non-exclusive backups the backup software will have to
maintain a persistent connection to the database -- something that should
not be a problem for any of the modern ones out there (but would've been
slightly trickier back in the days when we suggested shellscripts)

* A new version of pg_stop_backup is created, taking an argument specifying
if it's exclusive or not. The current version of pg_stop_backup() continues
to work for exclusive backups, with no changes to behavior. The new
pg_stop_backup will return a record of three columns instead of just the
value -- the LSN (pglsn), the backup label file (text) and the tablespace
map file (text). If used to cancel an exclusive backup, backup label file
and tablespace map will be NULL. At this point it's up to the backup
software to write down the files in the location of the backup.
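Put together, a backup client's session under this proposal could look roughly like the following (the argument order matches the pg_start_backup(label, fast, exclusive) signature in the attached doc diff; the exact shape of the pg_stop_backup record may differ in the final patch):

```sql
-- start a non-exclusive backup; no backup_label is written to the
-- data directory, the label is kept in memory for this session
SELECT pg_start_backup('nightly', false, false);

-- ... copy the data directory, keeping this connection open the
-- whole time (disconnecting aborts the backup) ...

-- finish: returns the stop LSN plus the backup label and tablespace
-- map contents, which the backup software must store with the backup
SELECT * FROM pg_stop_backup(false);
```

The key operational difference from the exclusive flow is that the backup software, not the server's data directory, becomes responsible for persisting the label and tablespace map files.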


The attached patch currently doesn't include full documentation, since Bruce
was going to restructure that section of the docs (also based on the
devmeeting). I will write up the docs once that is done (I assume it will
be soon enough, or I'll go do it regardless), but I wanted to get some
review in on the code while waiting.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 16981,16987  SELECT set_config('log_statement_stats', 'off', false);


 
! pg_start_backup(label text , fast boolean )
  
 pg_lsn
 Prepare for performing on-line backup (restricted to superusers or replication roles)
--- 16981,16987 


 
! pg_start_backup(label text , fast boolean , exclusive boolean )
  
 pg_lsn
 Prepare for performing on-line backup (restricted to superusers or replication roles)
***
*** 16991,16997  SELECT set_config('log_statement_stats', 'off', false);
  pg_stop_backup()
  
 pg_lsn
!Finish performing on-line backup (restricted to superusers or replication roles)


 
--- 16991,17004 
  pg_stop_backup()
  
 pg_lsn
!Finish performing exclusive on-line backup (restricted to superusers or replication roles)
!   
!   
!
! pg_stop_backup(exclusive boolean)
! 
!setof record
!Finish performing exclusive or non-exclusive on-line backup (restricted to superusers or replication roles)


 
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 9797,9804  XLogFileNameP(TimeLineID tli, XLogSegNo segno)
   */
  XLogRecPtr
  do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
!    char **labelfile, DIR *tblspcdir, List **tablespaces,
!    char **tblspcmapfile, bool infotbssize,
     bool needtblspcmapfile)
  {
  	bool		exclusive = (labelfile == NULL);
--- 9797,9804 
   */
  XLogRecPtr
  do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
!    StringInfo labelfile, DIR *tblspcdir, List **tablespaces,
!    StringInfo tblspcmapfile, bool infotbssize,
     bool needtblspcmapfile)
  {
  	bool		exclusive = (labelfile == NULL);
***
*** 9812,9819  do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
  	XLogSegNo	_logSegNo;
  	struct stat stat_buf;
  	FILE	   *fp;
- 	StringInfoData labelfbuf;
- 	StringInfoData tblspc_mapfbuf;
  
  	backup_started_in_recovery = RecoveryInProgress();
  
--- 9812,9817 
*** a/src/backend/access/transam/xlogfuncs.c
--- b/src/backend/access/transam/xlogfuncs.c
***
*** 28,41 
--- 28,64 
  #include "replication/walreceiver.h"
  #include "storage/smgr.h"
  #include "utils/builtins.h"
+ #include "utils/memutils.h"
  

Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-02-10 Thread Robert Haas
On Wed, Feb 10, 2016 at 7:46 AM, Magnus Hagander  wrote:
> Per discussion at the developer meeting in Brussels, here's a patch that
> makes some updates to the backup APIs, to support non-exclusive backups
> without using pg_basebackup. The idea is to fix at least three main issues
> that exist today -- that you cannot run concurrent backups; that the
> backup_label file is created in the data directory, which makes it impossible
> to distinguish between a cluster restored from backup and one that crashed
> while a backup was running; and that a cluster can get "stuck" in backup mode
> if the backup script/software crashes.
>
> To make this work, this patch:
>
> * Introduces a new argument for pg_start_backup(), for "exclusive". It
> defaults to true, which means that just calling pg_start_backup() like
> before will behave exactly like before. If it's called with exclusive='f', a
> non-exclusive backup is started, and the backup label is collected in memory
> instead of in the backup_label file.
>
> * If the client disconnects with a non-exclusive backup running, the backup
> is automatically aborted. This is the same thing that pg_basebackup does. To
> use these non-exclusive backups the backup software will have to maintain a
> persistent connection to the database -- something that should not be a
> problem for any of the modern ones out there (but would've been slightly
> trickier back in the days when we suggested shellscripts)
>
> * A new version of pg_stop_backup is created, taking an argument specifying
> if it's exclusive or not. The current version of pg_stop_backup() continues
> to work for exclusive backups, with no changes to behavior. The new
> pg_stop_backup will return a record of three columns instead of just the
> value -- the LSN (pglsn), the backup label file (text) and the tablespace
> map file (text). If used to cancel an exclusive backup, backup label file
> and tablespace map will be NULL. At this point it's up to the backup
> software to write down the files in the location of the backup.
>
>
> Attached patch currently doesn't include full documentation, since Bruce was
> going to restructure that section of the docs (also based on the
> devmeeting). I will write up the docs once that is done (I assume it will be
> soon enough, or I'll go do it regardless), but I wanted to get some review
> in on the code while waiting.

Wow, this is a great idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-10 Thread Robert Haas
On Wed, Feb 10, 2016 at 7:12 AM, Ashutosh Bapat
 wrote:
> Here's a patch to remove this declaration. The Assert just after it
> probably prevents the warning in builds with asserts, but neither of
> those lines is needed.

I like the Assert(), so I kept that and ditched the variable.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Larry Rosenman

On 2016-02-10 17:00, Tom Lane wrote:

Larry Rosenman  writes:

On 2016-02-10 16:19, Tom Lane wrote:

I looked into the OS X sources, and found that indeed you are right:
*scanf processes the input a byte at a time, and applies isspace() to
each byte separately, even when the locale is such that that's a
clearly insane thing to do.  Since this code was derived from FreeBSD,
FreeBSD has or once had the same issue.  (A look at the freebsd project
on github says it still does, assuming that's the authoritative repo.)

Not sure about other BSDen.



Definitive FreeBSD Sources:
https://svnweb.freebsd.org/base/


Ah, thanks for the link.  I'm not totally sure which branch is most
current, but at least on this one, it's still clearly wrong:
https://svnweb.freebsd.org/base/stable/10/lib/libc/stdio/vfscanf.c?revision=291336&view=markup
convert_string(), which handles %s, applies isspace() to individual bytes
regardless of locale.  convert_wstring(), which handles %ls, does it more
intelligently ... but as I said upthread, relying on %ls would just give
us a different set of portability problems.

It looks like Artur's patch is indeed what we need to do, along with
looking around for other *scanf() uses that are vulnerable.

regards, tom lane


that would be the current 10.x tree, production, and getting ready for
10.3, which is in code slush.


If you want, file a bug at https://bugs.freebsd.org/bugzilla


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 7011 W Parmer Ln, Apt 1115, Austin, TX 78729-6961


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Moving responsibility for logging "database system is shut down"

2016-02-10 Thread Simon Riggs
On 10 February 2016 at 22:39, Tom Lane  wrote:

> Robert Haas  writes:
> > On Wed, Feb 10, 2016 at 3:13 PM, Tom Lane  wrote:
> >> So I propose the attached patch.  Any objections?  Should this get
> >> back-patched?  It's arguably a bug, though surely a minor one, that
> >> the message comes out when it does.
>
> > I would vote against a back-patch.  And I kind of agree with Jim's
> > comments that we ought to consider sprinkling a few more debug
> > messages into the shutdown sequence.
>
> [ shrug... ]  I won't stand in the way of someone else figuring out
> what makes sense there, but I don't intend to do it; and I don't think
> that the quick hacks I did over the last couple days make a reasonable
> basis for a permanent patch.
>

I think it's worth adding log messages, but only when it's slower than
expected.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Tom Lane
Larry Rosenman  writes:
> If you want, file a bug at https://bugs.freebsd.org/bugzilla

Probably not much point; the commit log shows pretty clearly that they
have been thinking about the code's behavior with multibyte characters,
so I assume they've intentionally decided to keep it like this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers