[HACKERS] IPv6 link-local addresses and inet data type

2016-05-29 Thread Tom Dunstan
Hi all

I just ran into an issue that was originally reported way back in 2007 - 
https://www.postgresql.org/message-id/flat/0262b803-b664-4ebe-85b4-3c9a40ea6...@mobygames.com
 


Basically, the inet data type cannot store or parse valid IPv6 address literals 
with a scope / zone ID suffix. Apparently the combination of virtualised Linux, 
an IPv6 network, and the JVM that we are using reports connections on 
localhost as coming from '::1%0', which our app is unsuccessfully attempting to 
store in an inet column in the db. This is the first time that I have ever seen 
this, but perhaps it will become more common as IPv6-first usage increases.

Given that inet is a varlena struct with only known-length fields, it seems 
possible to extend it with an optional, variable-length zone ID on the end, 
with the result being backward compatible with existing data.
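In the meantime, one application-side workaround is to strip the zone ID before storing the value. A minimal sketch (the helper name is mine, and this assumes the zone is not needed afterwards):

```python
import ipaddress

def strip_zone_id(literal: str) -> str:
    """Drop an RFC 4007 zone/scope suffix ('%...') from an address literal,
    then validate what remains, so it can be stored in an inet column."""
    base, _, _zone = literal.partition("%")
    ipaddress.ip_address(base)  # raises ValueError if the base part is invalid
    return base

print(strip_zone_id("::1%0"))         # -> ::1
print(strip_zone_id("fe80::1%eth0"))  # -> fe80::1
```

Of course this loses the scope information, which is exactly why native support in inet would be preferable.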

Thoughts?

Cheers

Tom



Re: [HACKERS] [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

2016-05-29 Thread Michael Paquier
On Sat, May 28, 2016 at 9:57 AM, Andres Freund  wrote:
> On 2016-05-27 20:54:43 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>> > On 2016-05-26 12:44:51 -0400, Tom Lane wrote:
>> > 2016-04-27 17:02:06 EDT 572128cd.1811 [7-1] user=,db=,remote= FATAL:  
>> > 42501:
>> > could not open file "pg_xlog/RECOVERYXLOG": Permission denied
>>
>> > So, what's the permission of RECOVERYXLOG at that point?  It's pretty
>> > weird that directly after running restore_command it's not readable.
>>
>> s/not readable/not writable/.  I doubt that it's a good idea for that
>> code to think that it can fail hard on non-writable files.
>
> But we actually sometimes write to files we've recovered; if they're the
> end of the WAL after archive recovery and/or promotion. If a
> restore_command restores files in a non-writable way it's buggy; I don't
> see why it's worthwhile to work around that.

Not exactly: the startup process does not write directly to the files in
pg_xlog while in recovery. Even with the current code, the first file
that needs to be writable is the first WAL segment of the new
timeline, which is made as a copy of the last partial segment of the
old timeline.

Anyway, I agree with Andres here. We had definitely better make sure
that durable_rename correctly fsyncs the old and new entries when
doing a rename, meaning that failing on EPERM is appropriate; otherwise we may
lose the rename in case of a crash, and data would be lost if the
rename was not actually effective. And personally I worry more
about data loss than about strangely designed restore_command scripts,
which are likely using sudo.
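For reference, the durable-rename pattern under discussion (fsync the file, rename, then fsync the containing directory so the new directory entry itself is persisted) can be sketched like this. This is an illustrative sketch, not PostgreSQL's actual durable_rename implementation:

```python
import os

def durable_rename(oldpath: str, newpath: str) -> None:
    """Rename oldpath to newpath so the rename survives a crash."""
    # 1. Make sure the file's contents are on disk before it gets a new name.
    fd = os.open(oldpath, os.O_RDONLY)
    try:
        os.fsync(fd)  # a non-writable/non-syncable file surfaces as OSError here (e.g. EPERM)
    finally:
        os.close(fd)
    # 2. The rename itself is atomic on POSIX filesystems.
    os.rename(oldpath, newpath)
    # 3. Persist the directory entry, or the rename may vanish after a crash.
    dirfd = os.open(os.path.dirname(newpath) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

A restore_command that leaves files unreadable or unsyncable by the server user would make step 1 fail, which is the failure mode reported upthread.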
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Question and suggestion about application binary compatibility policy

2016-05-29 Thread Tsunakawa, Takayuki
Hello,

I'd like to ask you about the policy on application binary compatibility, and 
I have a suggestion as well.

QUESTION
==

My customer asked me if the following usage is correct.

- Build an embedded SQL C application with PostgreSQL 9.2.
- Distribute the app without the ecpg or libpq libraries.  Require users to 
install a PostgreSQL client that includes the ecpg and libpq libraries.
- Use the app with newer PostgreSQL major versions without rebuilding it.  
That is, the users just replace the PostgreSQL client.

I expect this is legal, because the so_major versions of ecpg and libpq are 6 
and 5 respectively for all PostgreSQL 9.x versions so far.  However, I could 
not find any description of this binary compatibility policy in the manual, so 
I haven't been able to answer the customer.

What's the official policy on application binary compatibility?  I found the 
same discussion in the thread below, but I'm afraid no clear answer was 
given:

https://www.postgresql.org/message-id/522f0b6b.1040...@4js.com
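The expectation described above follows the usual soname convention: a changed shared-library major version signals an ABI break, and an unchanged one promises compatibility. As a sketch (the helper is illustrative, not an actual PostgreSQL tool, and the version strings below are made up):

```python
def so_major(soname: str) -> int:
    """Extract the major version from a versioned library name,
    e.g. 'libpq.so.5.9' -> 5."""
    parts = soname.split(".so.")
    if len(parts) != 2:
        raise ValueError(f"not a versioned soname: {soname}")
    return int(parts[1].split(".")[0])

def abi_compatible(old: str, new: str) -> bool:
    """Same major version -> binaries linked against 'old' are
    expected to keep working against 'new' without a rebuild."""
    return so_major(old) == so_major(new)

print(abi_compatible("libpq.so.5.5", "libpq.so.5.9"))    # -> True
print(abi_compatible("libecpg.so.5.0", "libecpg.so.6.4"))  # -> False
```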


SUGGESTION
==

How about adding an article about application binary compatibility in the 
following section, as well as in the chapters for libpq, ECPG, etc.?

17.6. Upgrading a PostgreSQL Cluster
https://www.postgresql.org/docs/devel/static/upgrading.html

There are three kinds of application assets that users are concerned about when 
upgrading.  Is there anything else to mention?

* libpq app
* ECPG app
* C-language user defined function (and other stuff dependent on it, such as 
extensions, UDTs, etc.)

Regards
Takayuki Tsunakawa





Re: [HACKERS] Reviewing freeze map code

2016-05-29 Thread Masahiko Sawada
On Sun, May 29, 2016 at 2:44 PM, Noah Misch  wrote:
> On Fri, May 06, 2016 at 04:42:48PM -0400, Robert Haas wrote:
>> On Thu, May 5, 2016 at 2:20 PM, Andres Freund  wrote:
>> > On 2016-05-02 14:48:18 -0700, Andres Freund wrote:
>> > +   char        new_vmbuf[BLCKSZ];
>> > +   char       *new_cur = new_vmbuf;
>> > +   bool        empty = true;
>> > +   bool        old_lastpart;
>> > +
>> > +   /* Copy page header in advance */
>> > +   memcpy(new_vmbuf, , 
>> > SizeOfPageHeaderData);
>> >
>> > Shouldn't we zero out new_vmbuf? Afaics we're not necessarily zeroing it
>> > with old_lastpart && !empty, right?
>>
>> Oh, dear.  That seems like a possible data corruption bug.  Maybe we'd
>> better fix that right away (although I don't actually have time before
>> the wrap).
>
> [This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.
>
> [1] 
> http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com

Thank you for the notification.

The check tool for the visibility map is still under discussion.
I'm going to address the other review comments and send the patch ASAP.
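For illustration only (names and sizes are stand-ins, not the actual rewrite code), the zeroing concern raised upthread is that a raw C stack buffer holds arbitrary bytes unless cleared, so any part of the new visibility-map page that is not explicitly written must be zero-filled first:

```python
BLCKSZ = 8192
HEADER_SIZE = 24  # stand-in for SizeOfPageHeaderData

def rewrite_vm_page(old_page: bytes, converted: bytes) -> bytes:
    """Build a new visibility-map page: copy the page header, append
    the converted bits, and keep the remainder zero.  Starting from an
    all-zero buffer is what guarantees the unused tail is clean."""
    new_page = bytearray(BLCKSZ)  # zero-filled, unlike an uninitialized C array
    new_page[:HEADER_SIZE] = old_page[:HEADER_SIZE]
    new_page[HEADER_SIZE:HEADER_SIZE + len(converted)] = converted
    return bytes(new_page)

page = rewrite_vm_page(b"H" * BLCKSZ, b"\xff" * 100)
print(page[HEADER_SIZE + 100:] == b"\x00" * (BLCKSZ - HEADER_SIZE - 100))  # -> True
```

In C the equivalent precaution is a memset of the buffer (or writing every byte of it) before the page goes to disk.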

Regards,

--
Masahiko Sawada




Re: [HACKERS] Does people favor to have matrix data type?

2016-05-29 Thread Kouhei Kaigai
> On 05/28/2016 03:33 PM, Kouhei Kaigai wrote:
> >> -Original Message-
> >> From: Joe Conway [mailto:m...@joeconway.com]
> >> Sent: Sunday, May 29, 2016 1:40 AM
> >> To: Kaigai Kouhei(海外 浩平); Jim Nasby; Ants Aasma; Simon Riggs
> >> Cc: pgsql-hackers@postgresql.org
> >> Subject: Re: [HACKERS] Does people favor to have matrix data type?
> >>
> >> On 05/28/2016 07:12 AM, Kouhei Kaigai wrote:
> >>> Sparse matrix! It is a disadvantaged area for the current array format.
> >>>
> >>> I have two ideas. HPC folks often split a large matrix into multiple
> >>> grids. A grid is typically up to a 1024x1024 matrix, for example.
> >>> If a grid consists of all zero elements, it is obvious we don't need
> >>> to store individual elements for the grid.
> >>> The other idea is compression. If most of the matrix is zero, it is ideal
> >>> data for compression, and it is easy to reconstruct only at calculation time.
> >>>
>  Related to this, Tom has mentioned in the past that perhaps we should
>  support abstract use of the [] construct. Currently point finds a way to
>  make use of [], but I think that's actually coded into the grammar.
> 
> >>> Yep, if we consider a 2D array as a matrix, no special enhancement is needed
> >>> to use []. However, I'm inclined to have our own data structure for matrix
> >>> to represent sparse matrices.
> >>
> >> +1 I'm sure this would be useful for PL/R as well.
> >>
> >> Joe
> >>
> It is a pretty good idea to combine PL/R and PL/CUDA (what I'm now working on)
> for advanced analytics. We will be able to off-load the heavy computing portion
> to GPU, and also utilize various R functions inside the database.
> 
> Agreed. Perhaps at some point we should discuss closer integration of
> some sort, or at least a sample use case.
>
What I'm trying to implement first is k-means clustering by GPU. Its core
workload is an iteration of massive distance calculations. When I run the
kmeans() function of R for a million items with 10 clusters on 40 dimensions,
it took about a thousand seconds.
If the GPU version provides the result matrix more rapidly, then I expect R can
plot the relationship between items and clusters in a human-friendly way.
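The assignment step that dominates this workload can be sketched in pure Python; a GPU kernel (or vectorized R/NumPy) parallelizes the same O(n*k*d) loop:

```python
def assign_clusters(points, centers):
    """k-means assignment step: for each point, find the index of the
    nearest center by squared Euclidean distance.  This loop over
    n points x k centers x d dimensions is the GPU-friendly part."""
    labels = []
    for p in points:
        best, best_d2 = 0, float("inf")
        for i, c in enumerate(centers):
            d2 = sum((a - b) ** 2 for a, b in zip(p, c))
            if d2 < best_d2:
                best, best_d2 = i, d2
        labels.append(best)
    return labels

print(assign_clusters([(0.0, 0.1), (5.0, 5.2), (0.2, -0.1)],
                      [(0.0, 0.0), (5.0, 5.0)]))  # -> [0, 1, 0]
```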

For closer integration, it may be valuable if PL/R and PL/CUDA could exchange
data structures with no serialization/deserialization when PL/R code calls
SQL functions. IIUC, pg.spi.exec("SELECT my_function(...)") is the only
way to call SQL functions inside PL/R scripts.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 



Re: [HACKERS] [sqlsmith] PANIC: failed to add BRIN tuple

2016-05-29 Thread Andreas Seltenreich
Alvaro Herrera writes:

> If you can re-run sqlsmith and see if you can find different bugs, I'd
> appreciate it.
[...]
> [2. text/x-diff; brincrash-2.patch]

BRIN is inconspicuous since applying this patch.  All coredumps I see
now are either due to the parallel worker shutdown issue or acl.c's
text/name confusion, both reported earlier.

regards,
Andreas




[HACKERS] Redesigning parallel dump/restore's wait-for-workers logic

2016-05-29 Thread Tom Lane
One of the things I do not like about the current coding of parallel
pg_dump/pg_restore is its baroque logic for handling worker completion
reports, specifically the ListenToWorkers/ReapWorkerStatus APIs.
That code is messy and hard to use --- if the existing logic in
restore_toc_entries_parallel doesn't make your head hurt, you're a better
man than I am.  And we've got two other similar loops using those
functions, which cry out to be merged but can't be because the other two
have hard-wired ideas about what the cleanup action is.

Hence, I propose the attached redesign.  This is based on the idea of
having DispatchJobForTocEntry register a callback function that will take
care of state cleanup, doing whatever had been done by the caller of
ReapWorkerStatus in the old design.  (This callback is essentially just
the old mark_work_done function in the restore case, and a trivial test
for worker failure in the dump case.)  Then we can have ListenToWorkers
call the callback immediately on receipt of a status message, and return
the worker to WRKR_IDLE state; so the WRKR_FINISHED state goes away.
And it becomes easy to design a unified wait-for-worker-messages loop:
in the attached, WaitForWorkers replaces EnsureIdleWorker and
EnsureWorkersFinished as well as the mess in restore_toc_entries_parallel.
Also, we no longer need the fragile API spec that the caller of
DispatchJobForTocEntry is responsible for ensuring there's an idle worker.

In passing, I got rid of the ParallelArgs struct, which was a net negative
in terms of notational verboseness, and didn't seem to be providing any
noticeable amount of abstraction either.

BTW, I also tried to make ParallelState an opaque struct known only
within parallel.c.  I failed at that because there are two loops in
get_next_work_item that want to look at all the actively-being-worked-on
TocEntrys.  A possible solution to that is to separate the TocEntry
fields into their own array, so that ParallelState looks like

typedef struct ParallelState
{
int   numWorkers;   /* allowed number of workers */
/* these arrays have numWorkers entries, one per worker: */
TocEntry**te;   /* item being worked on, or NULL */
ParallelSlot *parallelSlot; /* private info about each worker */
} ParallelState;

where ParallelSlot could be opaque outside parallel.c.  I'm not sure
if this is worth the trouble though.
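The callback-driven flow described above can be sketched abstractly. This is a simplified single-process toy model of the proposed design, not the actual parallel.c code:

```python
WRKR_IDLE, WRKR_WORKING = "IDLE", "WORKING"

class Pool:
    """DispatchJobForTocEntry registers a completion callback; when a
    status message arrives, the callback runs immediately and the worker
    returns to IDLE, so no separate WRKR_FINISHED state is needed."""
    def __init__(self, n):
        self.state = [WRKR_IDLE] * n
        self.pending = {}  # worker index -> (entry, callback)

    def dispatch(self, entry, callback):
        w = self.state.index(WRKR_IDLE)  # pick an idle worker
        self.state[w] = WRKR_WORKING
        self.pending[w] = (entry, callback)

    def receive_status(self, w, status):
        entry, callback = self.pending.pop(w)
        callback(entry, status)          # state cleanup happens here, at once
        self.state[w] = WRKR_IDLE

done = []
pool = Pool(2)
pool.dispatch("toc-entry-1", lambda e, s: done.append((e, s)))
pool.receive_status(0, 0)
print(done)  # -> [('toc-entry-1', 0)]
```

The point of the restructuring is visible even in the toy: the completion logic lives with the dispatch call, so a single wait-for-workers loop can serve dump, restore, and cleanup alike.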

Comments?

regards, tom lane

diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index e9e8698..5774093 100644
*** a/src/bin/pg_dump/parallel.c
--- b/src/bin/pg_dump/parallel.c
***
*** 35,43 
   * the required action (dump or restore) and returns a malloc'd status string.
   * The status string is passed back to the master where it is interpreted by
   * AH->MasterEndParallelItemPtr, another format-specific routine.  That
!  * function can update state or catalog information on the master's side,
   * depending on the reply from the worker process.  In the end it returns a
!  * status code, which is 0 for successful execution.
   *
   * Remember that we have forked off the workers only after we have read in
   * the catalog.  That's why our worker processes can also access the catalog
--- 35,45 
   * the required action (dump or restore) and returns a malloc'd status string.
   * The status string is passed back to the master where it is interpreted by
   * AH->MasterEndParallelItemPtr, another format-specific routine.  That
!  * function can update format-specific information on the master's side,
   * depending on the reply from the worker process.  In the end it returns a
!  * status code, which we pass to the ParallelCompletionPtr callback function
!  * that was passed to DispatchJobForTocEntry().  The callback function does
!  * state updating for the master control logic in pg_backup_archiver.c.
   *
   * Remember that we have forked off the workers only after we have read in
   * the catalog.  That's why our worker processes can also access the catalog
***
*** 48,60 
   * In the master process, the workerStatus field for each worker has one of
   * the following values:
   *		WRKR_IDLE: it's waiting for a command
!  *		WRKR_WORKING: it's been sent a command
!  *		WRKR_FINISHED: it's returned a result
   *		WRKR_TERMINATED: process ended
-  * The FINISHED state indicates that the worker is idle, but we've not yet
-  * dealt with the status code it returned from the prior command.
-  * ReapWorkerStatus() extracts the unhandled command status value and sets
-  * the workerStatus back to WRKR_IDLE.
   */
  
  #include "postgres_fe.h"
--- 50,57 
   * In the master process, the workerStatus field for each worker has one of
   * the following values:
   *		WRKR_IDLE: it's waiting for a command
!  *		WRKR_WORKING: it's working on a command
   *		WRKR_TERMINATED: process ended
   */
  
  #include "postgres_fe.h"
***
*** 75,80 
--- 72,79 

[HACKERS] regexp_match() returning text

2016-05-29 Thread Emre Hasegeli
The attached patch adds a regexp_match() function, which is a simple variant of
regexp_matches() that doesn't return a set.  It is based on Tom Lane's
comment on bug #10889 [1].

[1] https://www.postgresql.org/message-id/23769.1404747...@sss.pgh.pa.us
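As documented in this version of the patch, regexp_match returns the first whole-pattern match as text (ignoring parenthesized subexpressions), or NULL when there is no match. In Python terms (the analogy is mine, not from the patch):

```python
import re

def regexp_match(string, pattern):
    """Approximate the proposed regexp_match(): the text of the first
    match of the pattern, or None (SQL NULL) when it does not match.
    Capture groups are ignored; regexp_matches() covers that case."""
    m = re.search(pattern, string)
    return None if m is None else m.group(0)

print(regexp_match("foobarbequebaz", "barbeque"))  # -> barbeque
print(regexp_match("foobarbequebaz", "nomatch"))   # -> None
```

Note that POSIX regular expressions and Python's re differ in detail; this only illustrates the one-match-or-NULL shape of the result.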
From f8c113c77864ef1ca6386195aea02b2090ff17b6 Mon Sep 17 00:00:00 2001
From: Emre Hasegeli 
Date: Sun, 29 May 2016 18:53:37 +0200
Subject: [PATCH] Add regexp_match()

---
 doc/src/sgml/citext.sgml   |   5 ++
 doc/src/sgml/func.sgml |  59 +++---
 src/backend/catalog/information_schema.sql |   2 +-
 src/backend/utils/adt/regexp.c | 125 +
 src/include/catalog/pg_proc.h  |   4 +
 src/include/utils/builtins.h   |   2 +
 src/test/regress/expected/regex.out|  28 +++
 src/test/regress/expected/strings.out  |   4 +-
 src/test/regress/sql/regex.sql |   7 ++
 9 files changed, 188 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/citext.sgml b/doc/src/sgml/citext.sgml
index 7fdf302..9b4c68f 100644
--- a/doc/src/sgml/citext.sgml
+++ b/doc/src/sgml/citext.sgml
@@ -119,20 +119,25 @@ SELECT * FROM users WHERE nick = 'Larry';
   
 
   
Similarly, all of the following functions perform matching
case-insensitively if their arguments are citext:
   
 
   

 
+  regexp_match()
+
+   
+   
+
   regexp_matches()
 


 
   regexp_replace()
 


 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ff7545d..503dfe5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1935,20 +1935,37 @@
 or, if the argument is null, return NULL.
 Embedded single-quotes and backslashes are properly doubled.

quote_nullable(42.5)
'42.5'
   
 
   

 
+ regexp_match
+
+regexp_match(string text, pattern text [, flags text])
+   
+   text
+   
+Return a single captured substring resulting from matching a POSIX regular
+expression against the string. See
+ for more information.
+   
+   regexp_match('foobarbequebaz', 'barbeque')
+   barbeque
+  
+
+  
+   
+
  regexp_matches
 
 regexp_matches(string text, pattern text [, flags text])

setof text[]

 Return all captured substrings resulting from matching a POSIX regular
 expression against the string. See
  for more information.

@@ -4009,20 +4026,23 @@ substring('foobar' from '#"o_b#"%' for '#')NULLregular expression
 pattern matching


 substring


 regexp_replace


+regexp_match
+   
+   
 regexp_matches


 regexp_split_to_table


 regexp_split_to_array

 

@@ -4168,66 +4188,81 @@ substring('foobar' from 'o(.)b')   o
 regexp_replace('foobarbaz', 'b..', 'X')
fooXbaz
 regexp_replace('foobarbaz', 'b..', 'X', 'g')
fooXX
 regexp_replace('foobarbaz', 'b(..)', E'X\\1Y', 'g')
fooXarYXazY
 

 
 
+ The regexp_match function returns the first captured
+ substring resulting from matching a POSIX regular expression
+ pattern.  It has the syntax
+ regexp_match(string, pattern
+ , flags ).
+ If the pattern does not match, the function returns
+ NULL.  It ignores the parenthesized subexpressions in
+ the pattern.  regexp_matches can be used in
+ this case (see below for details).
+ The flags parameter is an optional text
+ string containing zero or more single-letter flags that change the
+ function's behavior.  Supported flags
+ are described in .
+
+
+
  The regexp_matches function returns a text array of
  all of the captured substrings resulting from matching a POSIX
- regular expression pattern.  It has the syntax
- regexp_matches(string, pattern
- , flags ).
+ regular expression pattern.  It has the same syntax as
+ regexp_match.
  The function can return no rows, one row, or multiple rows (see
  the g flag below).  If the pattern
  does not match, the function returns no rows.  If the pattern
  contains no parenthesized subexpressions, then each row
  returned is a single-element text array containing the substring
  matching the whole pattern.  If the pattern contains parenthesized
  subexpressions, the function returns a text array whose
  n'th element is the substring matching the
  n'th parenthesized subexpression of the pattern
  (not counting non-capturing parentheses; see below for
  details).
- The flags parameter is an optional text
- string containing zero or more single-letter flags that change the
- function's behavior.  Flag g causes the 

Re: [HACKERS] Does people favor to have matrix data type?

2016-05-29 Thread Joe Conway
On 05/28/2016 03:33 PM, Kouhei Kaigai wrote:
>> -Original Message-
>> From: Joe Conway [mailto:m...@joeconway.com]
>> Sent: Sunday, May 29, 2016 1:40 AM
>> To: Kaigai Kouhei(海外 浩平); Jim Nasby; Ants Aasma; Simon Riggs
>> Cc: pgsql-hackers@postgresql.org
>> Subject: Re: [HACKERS] Does people favor to have matrix data type?
>>
>> On 05/28/2016 07:12 AM, Kouhei Kaigai wrote:
>>> Sparse matrix! It is a disadvantaged area for the current array format.
>>>
>>> I have two ideas. HPC folks often split a large matrix into multiple
>>> grids. A grid is typically up to a 1024x1024 matrix, for example.
>>> If a grid consists of all zero elements, it is obvious we don't need
>>> to store individual elements for the grid.
>>> The other idea is compression. If most of the matrix is zero, it is ideal
>>> data for compression, and it is easy to reconstruct only at calculation time.
>>>
 Related to this, Tom has mentioned in the past that perhaps we should
 support abstract use of the [] construct. Currently point finds a way to
 make use of [], but I think that's actually coded into the grammar.

>>> Yep, if we consider a 2D array as a matrix, no special enhancement is needed
>>> to use []. However, I'm inclined to have our own data structure for matrix
>>> to represent sparse matrices.
>>
>> +1 I'm sure this would be useful for PL/R as well.
>>
>> Joe
>>
> It is a pretty good idea to combine PL/R and PL/CUDA (what I'm now working on)
> for advanced analytics. We will be able to off-load the heavy computing portion
> to GPU, and also utilize various R functions inside the database.

Agreed. Perhaps at some point we should discuss closer integration of
some sort, or at least a sample use case.
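The grid-splitting idea quoted above can be sketched as a mapping keyed by grid coordinates, where an all-zero grid is simply absent. Sizes and names here are illustrative, not a proposed on-disk format:

```python
GRID = 4  # tiny stand-in for the 1024x1024 grids mentioned above

class GridSparseMatrix:
    """Store only grids containing at least one nonzero element;
    an absent grid is implicitly all zeros."""
    def __init__(self):
        self.grids = {}  # (grid_row, grid_col) -> GRID x GRID list of floats

    def set(self, i, j, value):
        key = (i // GRID, j // GRID)
        if key not in self.grids:
            if value == 0:
                return  # never materialize a grid just to hold a zero
            self.grids[key] = [[0.0] * GRID for _ in range(GRID)]
        self.grids[key][i % GRID][j % GRID] = value

    def get(self, i, j):
        g = self.grids.get((i // GRID, j // GRID))
        return 0.0 if g is None else g[i % GRID][j % GRID]

m = GridSparseMatrix()
m.set(0, 1, 3.5)
m.set(4095, 4095, 1.0)
print(m.get(0, 1), m.get(500, 500), len(m.grids))  # -> 3.5 0.0 2
```

A matrix that is mostly zero then costs storage proportional to its populated grids only, which is the property the quoted proposal is after.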

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

2016-05-29 Thread Emre Hasegeli
> My interpretation of the standard is that FILTER is not allowable for
> a window function, and IGNORE|RESPECT NULLS is not allowable for an
> ordinary aggregate.

Yes, it is clear.

> So if we support IGNORE|RESPECT NULLS for anything other than a window
> function, we have to come up with our own semantics.

I don't think this clause is useful for aggregates, especially since we
already have the FILTER clause.  Though, I can see this error message
being useful:

> ERROR:  IGNORE NULLS is only implemented for the lead and lag window functions

Can we still give this message when the syntax is not part of the OVER clause?

Thank you for coming back to this patch.
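For reference, the IGNORE NULLS behavior being discussed for lag can be sketched over a plain list. This is a sketch of the semantics only, not the executor implementation:

```python
def lag_ignore_nulls(values):
    """For each row, return the most recent preceding non-NULL value
    (None models SQL NULL).  With RESPECT NULLS, lag would instead
    return the immediately preceding value, NULL or not."""
    result, last_non_null = [], None
    for v in values:
        result.append(last_non_null)
        if v is not None:
            last_non_null = v
    return result

print(lag_ignore_nulls([1, None, None, 4, 5]))  # -> [None, 1, 1, 1, 4]
```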

