Re: Is there a memory leak in commit 8561e48?

2018-04-26 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> I would bet money that that "_SPI_current->internal_xact" thing is
 Tom> wrong/inadequate.

In particular this looks wrong to me: after doing:

do $$
  begin
    execute 'declare foo cursor with hold for select 1/x as a from (values (1),(0)) v(x)';
commit;  -- errors within the commit
  end;
$$;
ERROR:  division by zero
CONTEXT:  PL/pgSQL function inline_code_block line 1 at COMMIT

the SPI stack is not cleaned up at all, and _SPI_connected is >= 0 even
when back at the main backend command loop.

-- 
Andrew (irc:RhodiumToad)



Re: GSoC 2018: Sorting Algorithm and Benchmarking

2018-04-26 Thread Andrey Borodin
Hi!

> 26 Apr 2018, at 0:12, Kefan Yang wrote:
>  
> My name is Kefan Yang. I am so excited that my proposal ‘Sorting Algorithm 
> and Benchmarking 2018’ has been accepted.
Welcome! I'm also glad you've chosen this project.

> I see on the Wiki page that you can mentor this proposal.
Yes, Atri Sharma and I will mentor this project.

> This is what I’ve done in these few days:
>   • I’ve carefully read through Submit a Patch and have a basic 
> understanding of the submission process.
Cool!
>   • I’ve set up the environment and am ready to code. The sorting routine is 
> fairly separate from the other modules, so I don’t need much time to get 
> familiar with the code base. If things go smoothly, I can start coding ahead 
> of schedule.
Great!
> Now I have some questions:
>   • If I understand it correctly, the sorting benchmark should be an 
> executable under the src/bin/ folder, just like pgbench?
Well, I think it is up to you how to make the benchmarking reproducible, 
precise, and correct :)
You can make an executable (we will not have to merge it into PG, we just need 
the results), or you can make a Postgres extension (which is much easier) that 
is executed from psql.
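
For a taste of what "benchmarking from psql" can look like, here is a rough,
illustrative sketch (settings and sizes are arbitrary, not a prescription):

    -- Time an in-memory tuplesort and see which sort method was used.
    SET work_mem = '256MB';    -- try to keep the sort off disk
    EXPLAIN (ANALYZE, TIMING OFF)
    SELECT g FROM generate_series(1, 5000000) g ORDER BY g DESC;

EXPLAIN ANALYZE reports the Sort Method and the memory used, which makes runs
comparable between patched and unpatched builds.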
>   • Do you, or other community members, have any suggestions about the 
> final version of proposal?
As far as I know, the proposal cannot be modified. The actual work can be 
slightly adjusted, though.
>   • Are there any specific thing you expect me to do during this 
> community bonding period?

Let's establish communication. You can also build PostgreSQL from git and try 
pgbench (a tool for benchmarking overall performance). Read something about 
TPC-(A,B,C) and YCSB, and maybe even something about sysbench.
I can think of the following media of communication:
1. Slack channel
2. Telegram chat
3. Skype chat
4. E-mail

Atri, Kefan, what do you think: which of these will work best for us?


Best regards


Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-04-26 Thread Ashutosh Bapat
On Thu, Apr 26, 2018 at 5:36 PM, Etsuro Fujita
 wrote:
> (2018/04/25 18:51), Ashutosh Bapat wrote:
>>
>> On Tue, Apr 24, 2018 at 4:49 PM, Etsuro Fujita
>>   wrote:
>>>
>>> o 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:
>
>
>>> Another thing I noticed about this patch is this:
>>>
>>> postgres=# create table prt1 (a int, b int, c varchar) partition by range (a);
>>> postgres=# create table prt1_p1 partition of prt1 FOR VALUES FROM (0) TO (250);
>>> postgres=# create table prt1_p2 partition of prt1 FOR VALUES FROM (250) TO (500);
>>> postgres=# insert into prt1 select i, i % 25, to_char(i, 'FM') from generate_series(0, 499) i where i % 2 = 0;
>>> postgres=# analyze prt1;
>>> postgres=# create table prt2 (a int, b int, c varchar) partition by range (b);
>>> postgres=# create table prt2_p1 partition of prt2 FOR VALUES FROM (0) TO (250);
>>> postgres=# create table prt2_p2 partition of prt2 FOR VALUES FROM (250) TO (500);
>>> postgres=# insert into prt2 select i % 25, i, to_char(i, 'FM') from generate_series(0, 499) i where i % 3 = 0;
>>> postgres=# analyze prt2;
>>>
>>> postgres=# update prt1 t1 set c = 'foo' from prt2 t2 where t1::text = t2::text and t1.a = t2.b;
>>> ERROR:  ConvertRowtypeExpr found where not expected
>>>
>>> To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to
>>> pull_vars_clause() in distribute_qual_to_rels() and
>>> generate_base_implied_equalities_no_const() as well.
>>
>>
>> Thanks for the catch. I have updated patch with your suggested fix.
>
>
> Thanks, but I don't think that's enough.  Consider:
>
> postgres=# create table base_tbl (a int, b int);
> postgres=# insert into base_tbl select i % 25, i from generate_series(0,
> 499) i where i % 6 = 0;
>
> postgres=# update prt1 t1 set c = 'foo' from prt2 t2 left join (select a, b,
> 1 from base_tbl) ss(c1, c2, c3) on (t2.b = ss.c2) where t1::text = t2::text
> and t1.a = t2.b;
> ERROR:  ConvertRowtypeExpr found where not expected
>

Thanks for the test.

> To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to
> pull_var_clause() in find_placeholders_in_expr() as well.
>
> The patch doesn't pass that to pull_var_clause() in other places such as
> fix_placeholder_input_needed_levels() or planner.c, but I don't understand
> the reason why that's OK.

I was trying to be conservative by not including
PVC_RECURSE_CONVERTROWTYPES everywhere. But it looks like, because of
inheritance_planner() and the partition-wise aggregate commit, we can have
ConvertRowtypeExprs almost everywhere. I have added
PVC_RECURSE_CONVERTROWTYPES in almost all the places except the ones
listed in the 0001 patch.

>
> Sorry, I've not finished the review yet, so I'll continue that.

Thanks for the tests and review.

Here's updated patch set.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


expr_ref_error_pwj_v4.tar.gz
Description: GNU Zip compressed data


Re: community bonding

2018-04-26 Thread Craig Ringer
On 24 April 2018 at 12:22, Charles Cui  wrote:
> Hi PostgreSQL community and mentors:
>
> Thanks for selecting my project as one of the GSoC student projects! Pretty
> exciting, and an honor to join the development of PostgreSQL (the best database
> in the world :)). So for the first phase of this project (community
> bonding), I am planning to go ahead and set up my development environment and
> get familiar with the related code bases. And I have several questions regarding
> this.
> 1. What IDE or command line tools do you guys use most for PostgreSQL
> development?

Varies.

vim, emacs, whatever editor you want. Some people use VS Code. Some
use Eclipse. You could code on Windows in Visual Studio if you want,
or XCode on Mac OS X.

I use vim with a bunch of plugins. Whatever floats your boat, really,
so long as you set it up to help you follow the coding conventions and
source structure rules re tab/spaces etc.

> 2. What version control do you use for postgres? Is github acceptable?

git. But patches are emailed to the list. Github is fine, but we don't
use its pull request facilities, and the PostgreSQL source on github
is a mirror of git.postgresql.org.

The stuff about context diff on the wiki is pretty outdated, I think
most people use unified diff now.

> 3. How do you look at the code in the PostgreSQL code base? Do you have a
> cross-function reference where I can search functions and definitions online?

Use whatever facilities your editor and tools provide. I use vim and
cscope mostly, via "vim -t" and "ctrl-]". Visual Studio's Intellisense
works, whatever you want really.

There's doxygen generated documentation too, see
https://doxygen.postgresql.org/ . I don't find it that useful.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Allow auto_explain to log to NOTICE

2018-04-26 Thread Andrew Gierth
> "Tom" == Tom Dunstan  writes:

 Tom> Hi all
 
 Tom> This patch allows a user to configure auto_explain to log to
 Tom> NOTICE instead of the server log. This allows automated testing of
 Tom> application-generated queries to ensure correct plans etc from
 Tom> code that can inspect returned notices but not the server log.

You do know that you can do

SET client_min_messages = log;

to get all log messages generated within your session sent to the client
as well, yes?
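
For instance (an illustrative session, not part of the patch):

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 0;  -- explain every statement
    SET client_min_messages = log;          -- LOG-level output reaches the client
    SELECT count(*) FROM pg_class;          -- the plan arrives as a client message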

-- 
Andrew (irc:RhodiumToad)



Re: community bonding

2018-04-26 Thread Charles Cui
Thanks Gasper for the information!

2018-04-23 22:41 GMT-07:00 Gasper Zejn :

>
> On 24. 04. 2018 06:22, Charles Cui wrote:
> > Hi PostgreSQL community and mentors:
> >
> > Thanks for selecting my project as one of the GSoC student projects!
> > Pretty exciting, and an honor to join the development of PostgreSQL (the
> > best database in the world :)). So for the first phase of this project
> > (community bonding), I am planning to go ahead and set up my development
> > environment and get familiar with the related code bases. And I have
> > several questions regarding this.
> Welcome!
>
> I'm not a committer, but I've been looking at Postgres as a side project
> for a while now. You'll find that there's a lot to learn.
> > 1. What IDE or command line tools do you guys use most for PostgreSQL
> > development?
> I've had most success with Eclipse. Metin Döşlü presented a talk at
> PGConf.EU on how to do that, but the slides[1] are not the most verbose,
> as he did a live demo. My key takeaway was that you can get the backend
> pid with pg_backend_pid() and then attach GDB in Eclipse. There's also a
> page on Eclipse on wiki[2].
> >
> > 2. What version control do you use for postgres? Is github acceptable?
> Postgres uses Git[3]. GitHub does host a mirror; however, Postgres as a
> project uses its own Git hosting and a patch-based workflow, which is
> described on the wiki[4].
>
> > 3. How do you look at the code in the PostgreSQL code base? Do you have a
> > cross-function reference where I can search functions and definitions online?
>
> There's a doxygen[5] available, but most of that can also be done in
> Eclipse.
> >
> > Also let me know if you have requirements in this phase which are not
> > covered by my plan.
> >
> > Thanks Charles.
>
> A thing to note is that Postgres mailing lists do not use top-posting, so
> please try not to do that.
> Wiki is a great resource.
>
> Kind regards,
> Gasper Zejn
>
>
> [1]
> https://www.postgresql.eu/events/pgconfeu2017/sessions/session/1605-hacking-postgresql-with-eclipse/
> [2] https://wiki.postgresql.org/wiki/Working_with_Eclipse
> [3] https://www.postgresql.org/docs/current/static/git.html
> [4] https://wiki.postgresql.org/wiki/Submitting_a_Patch
> [5] https://doxygen.postgresql.org/
>


Re: community bonding

2018-04-26 Thread Charles Cui
Thanks Aleksander for your comments!
These are helpful!

2018-04-24 2:07 GMT-07:00 Aleksander Alekseev :

> Hello Charles,
>
> > Thanks for selecting my project as one of the GSoC student projects! Pretty
> > exciting, and an honor to join the development of PostgreSQL (the best
> > database in the world :)).
>
> Welcome!
>
> > 1. What IDE or command line tools do you guys use most for PostgreSQL
> > development?
>
> Personally I prefer Vim. Sublime Text and Visual Studio Code are fine too.
>
> > 2. What version control do you use for postgres? Is github acceptable?
>
> Github is OK.
>
> > 3. How do you look at the code in the PostgreSQL code base? Do you have a
> > cross-function reference where I can search functions and definitions online?
>
> Vim + Ctags. Editors like Sublime Text and Visual Studio Code have
> features like "goto definition" out-of-the-box.
>
> If you have any other questions please don't hesitate to ask!
>
> --
> Best regards,
> Aleksander Alekseev
>


Re: Allow auto_explain to log to NOTICE

2018-04-26 Thread Andres Freund
On 2018-04-27 11:52:18 +0930, Tom Dunstan wrote:
> > I'd argue this should contain the non-error cases. It's just as
> > reasonable to want to add this as a debug level or such.
> >
> 
> So all of warning, info, debug and debug1-5?

Yea. Likely nobody will ever use debug5, but I don't see a point in making
that judgement call here.

Greetings,

Andres Freund



Re: Allow auto_explain to log to NOTICE

2018-04-26 Thread Tom Dunstan
(Resent with subscribed email address, thanks gmail)

Hi Andres, thanks for the extremely fast review!

On 27 April 2018 at 11:46, Andres Freund  wrote:

>> I don't see any tests for auto_explain so haven't added any test cases.
>> Happy to do so if that's deemed necessary.
>
> I'd be in favor of adding them.
>

OK, sure.


>> +static int   auto_explain_log_destination = LOG;
>
> I very much dislike this name - it's too similar to the log_destination
> GUC, while being about something different. How about "log_level"?
>

Works for me.


>> +static const struct config_enum_entry destination_options[] = {
>> +	{"log", LOG, false},
>> +	{"notice", NOTICE, false},
>> +	{NULL, 0, false}
>> +};
>
> I'd argue this should contain the non-error cases. It's just as
> reasonable to want to add this as a debug level or such.
>

So all of warning, info, debug and debug1-5?

Thanks

Tom


Re: Allow auto_explain to log to NOTICE

2018-04-26 Thread Andres Freund
Hi,

On 2018-04-27 11:43:58 +0930, Tom Dunstan wrote:
> This patch allows a user to configure auto_explain to log to NOTICE instead
> of the server log. This allows automated testing of application-generated
> queries to ensure correct plans etc from code that can inspect returned
> notices but not the server log.

> I don't see any tests for auto_explain so haven't added any test cases.
> Happy to do so if that's deemed necessary.

I'd be in favor of adding them.


> +static int   auto_explain_log_destination = LOG;

I very much dislike this name - it's too similar to the log_destination
GUC, while being about something different. How about "log_level"?

>  
> +static const struct config_enum_entry destination_options[] = {
> + {"log", LOG, false},
> + {"notice", NOTICE, false},
> + {NULL, 0, false}
> +};

I'd argue this should contain the non-error cases. It's just as
reasonable to want to add this as a debug level or such.

Greetings,

Andres Freund



Allow auto_explain to log to NOTICE

2018-04-26 Thread Tom Dunstan
Hi all

This patch allows a user to configure auto_explain to log to NOTICE instead
of the server log. This allows automated testing of application-generated
queries to ensure correct plans etc from code that can inspect returned
notices but not the server log.
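
The intended usage looks like this (GUC names as in the attached patch; they
may of course change during review):

    LOAD 'auto_explain';
    SET auto_explain.log_destination = 'notice';
    SET auto_explain.log_min_duration = 0;
    SELECT 1;   -- the plan comes back to the client as a NOTICE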

I don't see any tests for auto_explain so haven't added any test cases.
Happy to do so if that's deemed necessary.

Diff is against master. Also at
https://github.com/tomdcc/postgres/tree/auto-explain-notice

Cheers

Tom


auto-explain-notice-v1.diff
Description: Binary data


Re: Oddity in tuple routing for foreign partitions

2018-04-26 Thread Amit Langote
On 2018/04/27 10:01, Kyotaro HORIGUCHI wrote:
> At Thu, 26 Apr 2018 21:16:38 +0900, Etsuro Fujita wrote:
>> (2018/04/26 20:06), Kyotaro HORIGUCHI wrote:
>>> Agreed on all points above.
>>
>> Thanks for reviewing!
> 
> I'm happy if it helps you.

Thank you both for reviewing.

Regards,
Amit




Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-26 Thread Thomas Munro
On Tue, Apr 24, 2018 at 12:09 PM, Bruce Momjian  wrote:
> On Mon, Apr 23, 2018 at 01:14:48PM -0700, Andres Freund wrote:
>> Hi,
>>
>> On 2018-03-28 10:23:46 +0800, Craig Ringer wrote:
>> > TL;DR: Pg should PANIC on fsync() EIO return. Retrying fsync() is not OK at
>> > least on Linux. When fsync() returns success it means "all writes since the
>> > last fsync have hit disk" but we assume it means "all writes since the last
>> > SUCCESSFUL fsync have hit disk".
>>
>> > But then we retried the checkpoint, which retried the fsync(). The retry
>> > succeeded, because the prior fsync() *cleared the AS_EIO bad page flag*.
>>
>> Random other thing we should look at: Some filesystems (nfs yes, xfs
>> ext4 no) flush writes at close(2). We check close() return code, just
>> log it... So close() counts as an fsync for such filesystems.
>
> Well, that's interesting.  You might remember that NFS does not reserve
> space for writes the way local file systems like ext4/xfs do.  For that
> reason, we might be able to capture the out-of-space error on close and
> exit sooner for NFS.

It seems like some implementations flush on close and therefore
discover an ENOSPC problem at that point, unless they have NFSv4 (RFC
3050) "write delegation" with a promise from the server that a certain
amount of space is available.  It seems like you can't count on that
in any way though, because it's the server that decides when to
delegate and how much space to promise is preallocated, not the
client.  So in userspace you always need to be able to handle errors
including ENOSPC returned by close(), and if you ignore that and
you're using an operating system that immediately incinerates all
evidence after telling you that (so that later fsync() doesn't fail),
you're in trouble.

Some relevant code:

https://github.com/torvalds/linux/commit/5445b1fbd123420bffed5e629a420aa2a16bf849
https://github.com/freebsd/freebsd/blob/master/sys/fs/nfsclient/nfs_clvnops.c#L618

It looks like the bleeding edge of the NFS spec includes a new
ALLOCATE operation that should be able to support posix_fallocate()
(if we were to start using that for extending files):

https://tools.ietf.org/html/rfc7862#page-64

I'm not sure how reliable [posix_]fallocate is on NFS in general
though, and it seems that there are fall-back implementations of
posix_fallocate() that write zeros (or even just feign success?) which
probably won't do anything useful here if not also flushed (that
fallback strategy might only work on eager reservation filesystems
that don't have direct fallocate support?) so there are several layers
(libc, kernel, nfs client, nfs server) that'd need to be aligned for
that to work, and it's not clear how a humble userspace program is
supposed to know if they are.

I guess if you could find a way to amortise the cost of extending
(like Oracle et al do by extending big container datafiles 10MB at a
time or whatever), then simply writing zeros and flushing when doing
that might work out OK, so you wouldn't need such a thing?  (Unless of
course it's a COW filesystem, but that's a different can of worms.)

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Toast issues with OldestXmin going backwards

2018-04-26 Thread Andrew Gierth
> "Robert" == Robert Haas  writes:

 Robert> I wonder if perhaps get_actual_variable_range() has a hazard of
 Robert> this type as well. If OldestXmin retreats, then the special
 Robert> snapshot type which it uses could see a row whose TOAST entry
 Robert> has been removed meanwhile.

Actually get_actual_variable_range may have a hazard even if OldestXmin
does not retreat; it uses RecentGlobalXmin, which is quite likely to be
older than an OldestXmin value used by a vacuum.

-- 
Andrew (irc:RhodiumToad)



Re: Standby corruption after master is restarted

2018-04-26 Thread Michael Paquier
On Fri, Apr 27, 2018 at 09:49:08AM +0900, Kyotaro HORIGUCHI wrote:
> Thank you for noticing me of that. Is there any way to know how a
> bug report has been concluded? Or should I search -hackers for
> a corresponding thread?

Keeping an eye on the list of patches for bugs in the CF app, and
looking at the list of open items, is what I use now.  For this
particular issue my memory just served me well, as it is hard to tell
that both are the same issue by looking at the titles.  Good thing I
looked at your patch as well.

> At Thu, 26 Apr 2018 21:13:48 +0900, Michael Paquier  
> wrote in <20180426121348.ga2...@paquier.xyz>
>> On Thu, Apr 26, 2018 at 07:53:04PM +0900, Kyotaro HORIGUCHI wrote:
>>> I think this behavior is a bug. XLogReadRecord is considering the
>>> case, but palloc_extended() breaks it. So the attached adds a
>>> new flag MCXT_ALLOC_NO_PARAMERR to palloc_extended(), and
>>> allocate_recordbuf calls it with the new flag. That alone fixes
>>> the problem. However, the patch frees the read-state buffer when
>>> facing erroneous records, since such records can leave a too-large
>>> buffer allocated.
>> 
>> This problem is already discussed here:
>> https://commitfest.postgresql.org/18/1516/
>> 
>> And here is the thread:
>> https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05
>> 
>> Tsunakawa-san and I discussed a couple of approaches.  Extending
>> palloc_extended so that an incorrect length does not result in an error is
>> also something that crossed my mind, but the length handling is
>> different on the backend and the frontend, so I discarded the idea you
>> are proposing here and instead relied on a check with AllocSizeIsValid,
>> which gives a simpler patch:
>> https://www.postgresql.org/message-id/20180314052753.GA16179%40paquier.xyz
> 
> Yeah, perhaps all the approaches in the thread came to my mind, but I
> chose a different one. I'm fine with the approach in the thread.

Okay, cool.

>> This got no interest from committers yet unfortunately, but I think that
>> this is a real problem which should be back-patched :(
> 
> Several other WAL-related fixes are also waiting to be picked up..

Yeah, simply ignoring corrupted 2PC files at redo is no fun, and so is
breaking the promise of replication slots.  Let's just make sure that
everything is properly tracked and listed; that's the least we can do.
--
Michael


signature.asc
Description: PGP signature


Re: Toast issues with OldestXmin going backwards

2018-04-26 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Have we given up on the angle of "prevent OldestXmin from
 Tom> retreating"?

I haven't, and I'm thinking about it, but I don't have an immediate
solution.

Thinking-out-loud mode:

1) we could store minimum values for OldestXmin for each database in
   some kind of shared-memory structure. Downside is that we can't
   easily make this fixed-size, since we don't know how many DBs there
   are - we can't size it based on connections since we need to be able
   to track values for dbs with no currently active connections.

   (Or do we need to track it across restarts? maybe we do, to deal with
   replication slaves without slots, or changes in parameters)

2) in-place updates of a new column in pg_database? and a control file
   field for the global values? not back-patchable, but would it work
   going forward?

3) there was the previous suggestion to store it per-table in the main
   table's reloptions and inplace-update that; this isn't impossible,
   but it would be fiddly to do because toast-table-only vacuums would
   need to locate the main table, and lock order issues might get
   interesting. (Maybe it would be sneakier to store it in the toast
   table's otherwise-unused reloptions, and have vacuum/create index on
   the main table consult that? This assumes that we only care about the
   issue when dealing with toast tables)

4) something else??

-- 
Andrew (irc:RhodiumToad)



Re: jitflags in _outPlannedStmt and _readPlannedStmt treated as bool type

2018-04-26 Thread Andres Freund
Hi,

On 2018-04-26 13:34:44 +1000, Haribabu Kommi wrote:
> The jitFlags member in the PlannedStmt structure is of type "int", but in
> the _out and _read functions it is treated as "bool".
> 
> WRITE_BOOL_FIELD(jitFlags);
> READ_BOOL_FIELD(jitFlags);
> 
> I am thinking it is a copy-paste mistake, as the other members around
> jitFlags are of "bool" type, or is there any specific reason to treat it as
> "bool"?

No, it's an oversight. It was a bool first and then morphed from
there... Will fix.

Greetings,

Andres Freund



Re: Oddity in tuple routing for foreign partitions

2018-04-26 Thread Kyotaro HORIGUCHI
At Thu, 26 Apr 2018 21:16:38 +0900, Etsuro Fujita  
wrote in <5ae1c326.6040...@lab.ntt.co.jp>
> (2018/04/26 20:06), Kyotaro HORIGUCHI wrote:
> > Please rewrite it to use not array reference, but pointer
> > reference if one mtstate logically holds just one resultRelInfo.
> 
> Maybe I don't understand your words correctly, but in that UPDATE
> case, I think that mtstate can have multiple ResultRelInfos.

Ah, mtstate has the same number of resultRelInfos as subplans,
and it is *written* in the comment just above :( And that is exactly
for the UPDATE case. Sorry for the silly comment.

>  Anyway, I think that the former is more like an improvement rather
>  than a fix, so it would be better to leave that for another patch
>  for PG12?
> >>>
> >>> I agree, so I'm dropping the patch for 1.
> >>
> >> OK, let's focus on #2!
> >>
> >>> See attached an updated version with changes as described above.
> >>
> >> Looks good to me.  Thanks for the updated version!
> >
> > Agreed on all points above.
> 
> Thanks for reviewing!

I'm happy if it helps you.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: jitflags in _outPlannedStmt and _readPlannedStmt treated as bool type

2018-04-26 Thread Michael Paquier
On Thu, Apr 26, 2018 at 07:03:49PM +0530, Ashutosh Bapat wrote:
> On Thu, Apr 26, 2018 at 9:04 AM, Haribabu Kommi
>> I am thinking it is a copy-paste mistake, as the other members around
>> jitFlags are of "bool" type, or is there any specific reason to treat it as
>> "bool"?
> 
> No, I don't see any. Here's patch for the same.

The code is wrong; you guys are right, and so is the patch.  I have
added an open item to keep track of this one.
--
Michael


signature.asc
Description: PGP signature


Re: Standby corruption after master is restarted

2018-04-26 Thread Kyotaro HORIGUCHI
Thank you for notifying me of that. Is there any way to know how a
bug report has been concluded? Or should I search -hackers for
a corresponding thread?

At Thu, 26 Apr 2018 21:13:48 +0900, Michael Paquier  wrote 
in <20180426121348.ga2...@paquier.xyz>
> On Thu, Apr 26, 2018 at 07:53:04PM +0900, Kyotaro HORIGUCHI wrote:
> > I think this behavior is a bug. XLogReadRecord is considering the
> > case, but palloc_extended() breaks it. So the attached adds a
> > new flag MCXT_ALLOC_NO_PARAMERR to palloc_extended(), and
> > allocate_recordbuf calls it with the new flag. That alone fixes
> > the problem. However, the patch frees the read-state buffer when
> > facing erroneous records, since such records can leave a too-large
> > buffer allocated.
> 
> This problem is already discussed here:
> https://commitfest.postgresql.org/18/1516/
> 
> And here is the thread:
> https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05
> 
> Tsunakawa-san and I discussed a couple of approaches.  Extending
> palloc_extended so that an incorrect length does not result in an error is
> also something that crossed my mind, but the length handling is
> different on the backend and the frontend, so I discarded the idea you
> are proposing here and instead relied on a check with AllocSizeIsValid,
> which gives a simpler patch:
> https://www.postgresql.org/message-id/20180314052753.GA16179%40paquier.xyz

Yeah, perhaps all the approaches in the thread came to my mind, but I
chose a different one. I'm fine with the approach in the thread.

> This got no interest from committers yet unfortunately, but I think that
> this is a real problem which should be back-patched :(

Several other WAL-related fixes are also waiting to be picked up..

regards.
-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: description of root_tuple_slot missing

2018-04-26 Thread Amit Langote
On 2018/04/27 3:54, Bruce Momjian wrote:
> On Mon, Apr 23, 2018 at 02:50:10PM +0900, Amit Langote wrote:
>> I noticed that the description of root_tuple_slot member is missing in the
>> comment above PartitionTupleRouting definition.  See if the attached patch
>> fixes it correctly.
> 
> Patch applied.  Thanks.

Thank you.

Regards,
Amit




Re: Typo in JIT documentation

2018-04-26 Thread Michael Paquier
On Thu, Apr 26, 2018 at 02:50:51PM -0400, Robert Haas wrote:
> Hmm, I thought we talked about changing such references to say 'JIT
> compilation' rather than just 'JIT'.

Perhaps.  I recall that this has been mentioned already in the last couple
of weeks on the JIT thread.  Personally, I can see that JIT is listed as
an acronym in the docs with its meaning explained by a link to
Wikipedia about just-in-time compilation, so I am fine with the use of
the acronym.  "JIT" is actually used without the <acronym> markup in
config.sgml.  I think that at least this part should be fixed.

In the end that would be mainly Andres' call?  Do you think that an open
item would be appropriate?
--
Michael


signature.asc
Description: PGP signature


Re: Toast issues with OldestXmin going backwards

2018-04-26 Thread Michael Paquier
On Thu, Apr 26, 2018 at 03:09:01PM -0400, Tom Lane wrote:
> Have we given up on the angle of "prevent OldestXmin from retreating"?
> That seems like it'd be far more bulletproof than trying to work
> around places that break one-by-one.

Strong +1 on that.
--
Michael


signature.asc
Description: PGP signature


Re: Setting rpath on llvmjit.so?

2018-04-26 Thread Michael Paquier
On Thu, Apr 26, 2018 at 01:16:09PM -0600, Jason Petersen wrote:
> So the tooling around people using it is there and it’s used by some
> very large and mature projects covering a variety of domains: I don’t
> know if it’s “poorly supported” (the documentation leaves something to
> be desired), but if CMake has trouble down the line a lot of very
> important projects will be in trouble. It seems like a pretty safe bet
> given the sheer inertia of the list above. 

Yes, those would be arguments pushing in favor of cmake.  Yuriy has
mentioned to me a couple of times that when he worked on the cmake
integration on Windows he heavily used VS, because that was quite
friendly on Windows and there was good integration work.
--
Michael


signature.asc
Description: PGP signature


Re: Is there a memory leak in commit 8561e48?

2018-04-26 Thread Michael Paquier
On Thu, Apr 26, 2018 at 06:36:01PM -0400, Tom Lane wrote:
> Uh, no, the memory *isn't* reused later.  The coding in AtEOXact_SPI
> assumes that it can just null out _SPI_stack because whatever that
> might've been pointing to is transaction-lifespan memory and will
> go away without extra work here.  Which was true in every prior
> release, because SPI_connect caused the stack to be allocated in
> TopTransactionContext.  You've changed it to be in TopMemoryContext
> and not adjusted the cleanup to match.

Yep, thanks for jumping in.

> We could change the coding in AtEOXact_SPI so that it preserves
> _SPI_stack and _SPI_stack_depth and only resets _SPI_connected,
> but I'm not sure what's the point of keeping the stack storage
> if _SPI_connected gets reset; that means we no longer think
> there's any data there.  One way or the other this code is now
> quite schizophrenic.

Would we want to handle memory contexts differently between an atomic
and a non-atomic connection to the SPI?  For an atomic call, there is
little point in keeping the stack in the session context, as this is
invalid data by the end of the transaction commit, and back branches
rely on the transaction's context being removed when a transaction
finishes to clean up the memory of the SPI stack.

> Possibly what you really want is for the stack storage to be permanent
> and for _SPI_connected to get reset to something that's not necessarily
> -1, so that AtEOXact_SPI is more like AtEOSubXact_SPI.  But then you need
> a mechanism for figuring out how much of the stack ought to outlive the
> current transaction.  I would bet money that that
> "_SPI_current->internal_xact" thing is wrong/inadequate.
> 
> The comments in both SPI_connect and AtEOXact_SPI about memory management
> seem to need work, too.

Yeah, error handling also needs some extra attention.  Do you still want
the memory to be around when a transaction fails with an internal ERROR?
Having consistency between the way atomic and non-atomic connections are
handled would be nice, for both the creation of the SPI stack and its
deletion.  The final result should not impact the user experience,
particularly for background workers, or people would get silent failures
creeping in because of memory exhaustion.
--
Michael


signature.asc
Description: PGP signature


Re: obsoleting plpython2u and defaulting plpythonu to plpython3u

2018-04-26 Thread Tom Lane
Andrew Dunstan  writes:
> After installing python3, I switched buildfarm animal prion to build and
> test with python3 instead of python2 a day or two ago. All this required
> was setting PYTHON=/usr/bin/python3 in the environment. Everything else
> Just Worked.

That just means we've hacked the regression tests to the point where
they work in that environment ;-).  An installation from that code
would still think plpythonu means plpython2u.
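
Anyone testing such an installation can sidestep the alias by naming the
language explicitly; a minimal sketch (function name made up for
illustration):

    CREATE EXTENSION plpython3u;
    CREATE FUNCTION py_version() RETURNS text
    LANGUAGE plpython3u
    AS $$
    import sys
    return sys.version
    $$;
    SELECT py_version();

Only the unversioned plpythonu alias is affected by the mapping.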

regards, tom lane



Re: Is there a memory leak in commit 8561e48?

2018-04-26 Thread Tom Lane
Peter Eisentraut  writes:
> On 4/19/18 02:10, Michael Paquier wrote:
>> You are right.  I can easily see the leak if I use for example a
>> background worker which connects to a database, and launches many
>> transactions in a row.  The laziest reproducer I have is to patch one of
>> my bgworkers to launch millions of transactions in a tight loop and the
>> leak is plain (this counts relations automatically, does not matter):
>> https://github.com/michaelpq/pg_plugins/tree/master/count_relations

> This is not a memory leak in the sense that the memory is not reusable.
> It allocates memory for a stack, and that stack will be reused later.
> The stack could be bigger than you'll need, but that's not necessarily a
> problem.

Uh, no, the memory *isn't* reused later.  The coding in AtEOXact_SPI
assumes that it can just null out _SPI_stack because whatever that
might've been pointing to is transaction-lifespan memory and will
go away without extra work here.  Which was true in every prior
release, because SPI_connect caused the stack to be allocated in
TopTransactionContext.  You've changed it to be in TopMemoryContext
and not adjusted the cleanup to match.

We could change the coding in AtEOXact_SPI so that it preserves
_SPI_stack and _SPI_stack_depth and only resets _SPI_connected,
but I'm not sure what's the point of keeping the stack storage
if _SPI_connected gets reset; that means we no longer think
there's any data there.  One way or the other this code is now
quite schizophrenic.

Possibly what you really want is for the stack storage to be permanent
and for _SPI_connected to get reset to something that's not necessarily
-1, so that AtEOXact_SPI is more like AtEOSubXact_SPI.  But then you need
a mechanism for figuring out how much of the stack ought to outlive the
current transaction.  I would bet money that that
"_SPI_current->internal_xact" thing is wrong/inadequate.

The comments in both SPI_connect and AtEOXact_SPI about memory management
seem to need work, too.

regards, tom lane



Re: obsoleting plpython2u and defaulting plpythonu to plpython3u

2018-04-26 Thread Andrew Dunstan


On 04/26/2018 04:39 PM, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 4/25/18 04:50, Pavel Raiskup wrote:
>>> So full drop is not what ask for now (it will be needed for lts/enterprise
>>> distros for quite some time anyway), I just think that plpython3 should be
>>> the default (by build system, by 'plpythonu'=>'plpython3u', etc.).
>> I don't think we should do that unless there is an update to PEP 394.
> PEP 394 points out that some distros (at least Arch) have already switched
> "python" to mean "python3", and I gather from Pavel's inquiry that the day
> is in sight when Fedora will do that.  Perhaps we should think about
> providing a configure switch to control whether "plpythonu" means
> "plpython2u" or "plpython3u", so that distros can reasonably easily sync
> PG's behavior with whatever they've chosen to do with the python shell
> command.  I agree that it's not PG's business to be out in front of this
> change, but maybe we shouldn't be holding it back, either.
>
> I'm not very sure how many moving parts would be involved in making that
> happen.  One thing we should probably do first is retire the pg_pltemplate
> system catalog in favor of making PL extension scripts self-contained,
> as you'd mentioned recently would be a good project to finish off.
>
>   


After installing python3, I switched buildfarm animal prion to build and
test with python3 instead of python2 a day or two ago. All this required
was setting PYTHON=/usr/bin/python3 in the environment. Everything else
Just Worked.

cheers

andrew

-- 

Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Is there a memory leak in commit 8561e48?

2018-04-26 Thread Peter Eisentraut
On 4/19/18 02:10, Michael Paquier wrote:
> On Thu, Apr 19, 2018 at 11:38:09AM +0800, jian.l...@i-soft.com.cn wrote:
>> In commit 8561e48, _SPI_stack is allocated from TopMemoryContext. But
>> AtEOXact_SPI just sets _SPI_stack = NULL. Is this a memory leak?
> 
> You are right.  I can easily see the leak if I use for example a
> background worker which connects to a database, and launches many
> transactions in a row.  The laziest reproducer I have is to patch one of
> my bgworkers to launch millions of transactions in a tight loop and the
> leak is plain (this counts relations automatically, does not matter):
> https://github.com/michaelpq/pg_plugins/tree/master/count_relations

This is not a memory leak in the sense of memory that is no longer
reusable.  It allocates memory for a stack, and that stack will be reused
later.  The stack could be bigger than you'll need, but that's not
necessarily a problem.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Clock with Adaptive Replacement

2018-04-26 Thread Peter Geoghegan
On Thu, Apr 26, 2018 at 10:39 AM, Robert Haas  wrote:
> I think it is.  We haven't done anything to address it.  I think if we
> want to move to direct I/O -- which may be something we need or want
> to do -- we're going to have to work a lot harder at making good page
> eviction decisions.

+1

> Your patch to change the page eviction algorithm
> didn't help noticeably once we eliminated the contention around buffer
> eviction, but that's just because the cost of a bad eviction went
> down, not because we stopped doing bad evictions.

The goal of that patch, which was literally written over a wet
weekend, was to make people realize that we were missing something
there. I think that it accomplished that much.

> I think it would be
> interesting to insert a usleep() call into mdread() and then test
> various buffer eviction algorithms with that in place.

That could be a useful strategy.

> I'm personally not very excited about making rules like "index pages
> are more valuable than heap pages".  Such rules will in many cases be
> true, but it's easy to come up with cases where they don't hold: for
> example, we might run pgbench for a while and then stop running
> pgbench and start running big sequential scans for reporting purposes.

I am not in favor of this general approach. Actually, I'm willing to
come out against it strongly right now: let's not teach the buffer
manager to distinguish between the AM of a block.

> We don't want to artificially pin the index buffers in shared_buffers
> just because they're index pages; we want to figure out which pages
> really matter.  Now, I realize that you weren't proposing (and
> wouldn't propose) a rule that index pages never get evicted, but I
> think that favoring index pages even in some milder way is basically a
> hack.  Index pages aren't *intrinsically* more valuable; they are more
> valuable because they will, in many workloads, be accessed more often
> than heap pages.  A good algorithm ought to be able to figure that out
> based on the access pattern, without being explicitly given a hint, I
> think.

I won't argue against any of that, but I think that it's rather
nuanced, and that the nuance probably matters a lot.

First of all, we must have a sense of proportion about what is likely
to be true in the average case. I mentioned to Thomas that the pgbench
accounts table is 6x the size of its primary key, and that with a uniform
distribution + point lookups the leaf pages literally have 6x the
utility of the heap pages. It really is that simple there. Also,
whatever the distribution of the point lookups is, "cache more leaf
pages" is probably going to be the effect that a better caching
strategy would have. Even 6x is probably an underestimate of the
utility of leaf pages compared to heap pages in the average case. I
bet it's more like 10x for a typical OLTP app.
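
Anyone who wants to check that ratio can do so on an initialized pgbench
database; a quick sketch (exact numbers vary with scale factor and
fillfactor):

    SELECT pg_size_pretty(pg_table_size('pgbench_accounts'))         AS heap,
           pg_size_pretty(pg_relation_size('pgbench_accounts_pkey')) AS pkey,
           round(pg_table_size('pgbench_accounts')::numeric
                 / pg_relation_size('pgbench_accounts_pkey'), 1)     AS ratio;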

Second of all, we must have a sense of perspective about what we're
trying to do here, which is to predict the future based on the past.
The best outcome we can hope for is to lose less on average without
hurting anything that already works well enough.

> I believe the root of the problem here is that the usage count we have
> today does a very poor job distinguishing what's hot from what's not.

I think that the root of the problem is that we don't remember
anything about evicted buffers. The same block can bounce in and out
of shared_buffers repeatedly, and we don't recognize that. There are
probably second-order effects from sizing shared_buffers. We
needlessly tie the number of buffers to our capacity to remember
things about block popularity. That would be bad if we had no FS
cache, but with the FS cache it seems awful.

I share the intuition that we need something adaptive, that can be
analysed and understood relatively easily, and has little to no magic.
That's why I'm willing to say now that we shouldn't do anything based
on the type of the blocks involved. However, I think that that model
might work about as well, and that this matters because it provides
motivating examples. Surely you're right when you say that index
blocks are not intrinsically more useful than any of type of block,
but if they're 10x more useful on average, and 20x more useful at
other times, then a person could be forgiven for modelling them as
intrinsically more useful.

It's also useful to examine why a random replacement strategy is
merely mediocre to bad, and not abysmal. That's what every replacement
strategy ends up being with enough cache pressure anyway. This makes
it more reasonable to focus on the average case than it is in other
places.

> There have been previous experiments around making usage_count use
> some kind of a log scale: we make the maximum, say, 32, and the clock
> hand divides by 2 instead of subtracting 1.  I don't think those
> experiments were enormously successful and I suspect that a big part
> of the reason is that it's still pretty easy to get to a state where
> the counters are maxed out.

Re: obsoleting plpython2u and defaulting plpythonu to plpython3u

2018-04-26 Thread Tom Lane
Peter Eisentraut  writes:
> On 4/25/18 04:50, Pavel Raiskup wrote:
>> So a full drop is not what I ask for now (it will be needed for lts/enterprise
>> distros for quite some time anyway); I just think that plpython3 should be
>> the default (by build system, by 'plpythonu'=>'plpython3u', etc.).

> I don't think we should do that unless there is an update to PEP 394.

PEP 394 points out that some distros (at least Arch) have already switched
"python" to mean "python3", and I gather from Pavel's inquiry that the day
is in sight when Fedora will do that.  Perhaps we should think about
providing a configure switch to control whether "plpythonu" means
"plpython2u" or "plpython3u", so that distros can reasonably easily sync
PG's behavior with whatever they've chosen to do with the python shell
command.  I agree that it's not PG's business to be out in front of this
change, but maybe we shouldn't be holding it back, either.

I'm not very sure how many moving parts would be involved in making that
happen.  One thing we should probably do first is retire the pg_pltemplate
system catalog in favor of making PL extension scripts self-contained,
as you'd mentioned recently would be a good project to finish off.
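
For reference, the rows that self-contained PL extension scripts would have
to subsume are easy to inspect:

    SELECT tmplname, tmpltrusted, tmplhandler, tmplvalidator
    FROM pg_pltemplate;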

regards, tom lane



Re: obsoleting plpython2u and defaulting plpythonu to plpython3u

2018-04-26 Thread Peter Eisentraut
On 4/25/18 04:50, Pavel Raiskup wrote:
> So a full drop is not what I ask for now (it will be needed for lts/enterprise
> distros for quite some time anyway); I just think that plpython3 should be
> the default (by build system, by 'plpythonu'=>'plpython3u', etc.).

I don't think we should do that unless there is an update to PEP 394.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transform for pl/perl

2018-04-26 Thread Peter Eisentraut
On 4/24/18 14:31, Andrew Dunstan wrote:
> There is the routine IsValidJsonNumber that helps - see among others
> hstore_io.c for an example use.

I would need something like that taking a double/float8 input.  But I
think there is no such shortcut available, so I just wrote out the tests
for isinf and isnan explicitly.  Attached patch should fix it.
jsonb_plpython will need a similar fix.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From ec8529348008e58826442e43809772c19d02a067 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 26 Apr 2018 15:20:43 -0400
Subject: [PATCH] Prevent infinity and NaN in jsonb/plperl transform

jsonb uses numeric internally, and numeric can store NaN, but that is
not allowed by jsonb on input, so we shouldn't store it.  Also prevent
infinity to get a consistent error message.  (numeric input would reject
infinity anyway.)
---
 .../jsonb_plperl/expected/jsonb_plperl.out| 24 +--
 .../jsonb_plperl/expected/jsonb_plperlu.out   | 24 +--
 contrib/jsonb_plperl/jsonb_plperl.c   | 16 +++--
 contrib/jsonb_plperl/sql/jsonb_plperl.sql | 23 +-
 contrib/jsonb_plperl/sql/jsonb_plperlu.sql| 22 +
 5 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/contrib/jsonb_plperl/expected/jsonb_plperl.out b/contrib/jsonb_plperl/expected/jsonb_plperl.out
index 99a2e8e135..d6c3becf63 100644
--- a/contrib/jsonb_plperl/expected/jsonb_plperl.out
+++ b/contrib/jsonb_plperl/expected/jsonb_plperl.out
@@ -39,6 +39,26 @@ SELECT testSVToJsonb();
  1
 (1 row)
 
+CREATE FUNCTION testInf() RETURNS jsonb
+LANGUAGE plperl
+TRANSFORM FOR TYPE jsonb
+AS $$
+$val = 0 + 'Inf';
+return $val;
+$$;
+SELECT testInf();
+ERROR:  cannot convert infinity to jsonb
+CONTEXT:  PL/Perl function "testinf"
+CREATE FUNCTION testNaN() RETURNS jsonb
+LANGUAGE plperl
+TRANSFORM FOR TYPE jsonb
+AS $$
+$val = 0 + 'NaN';
+return $val;
+$$;
+SELECT testNaN();
+ERROR:  cannot convert NaN to jsonb
+CONTEXT:  PL/Perl function "testnan"
 -- this revealed a bug in the original implementation
 CREATE FUNCTION testRegexpResultToJsonb() RETURNS jsonb
 LANGUAGE plperl
@@ -71,7 +91,7 @@ SELECT roundtrip('1');
 (1 row)
 
 SELECT roundtrip('1E+131071');
-ERROR:  cannot convert infinite value to jsonb
+ERROR:  cannot convert infinity to jsonb
 CONTEXT:  PL/Perl function "roundtrip"
 SELECT roundtrip('-1');
  roundtrip 
@@ -207,4 +227,4 @@ SELECT roundtrip('{"1": {"2": [3, 4, 5]}, "2": 3}');
 
 \set VERBOSITY terse \\ -- suppress cascade details
 DROP EXTENSION plperl CASCADE;
-NOTICE:  drop cascades to 6 other objects
+NOTICE:  drop cascades to 8 other objects
diff --git a/contrib/jsonb_plperl/expected/jsonb_plperlu.out b/contrib/jsonb_plperl/expected/jsonb_plperlu.out
index 8053cf6aa8..65ed21f3b2 100644
--- a/contrib/jsonb_plperl/expected/jsonb_plperlu.out
+++ b/contrib/jsonb_plperl/expected/jsonb_plperlu.out
@@ -39,6 +39,26 @@ SELECT testSVToJsonb();
  1
 (1 row)
 
+CREATE FUNCTION testInf() RETURNS jsonb
+LANGUAGE plperlu
+TRANSFORM FOR TYPE jsonb
+AS $$
+$val = 0 + 'Inf';
+return $val;
+$$;
+SELECT testInf();
+ERROR:  cannot convert infinity to jsonb
+CONTEXT:  PL/Perl function "testinf"
+CREATE FUNCTION testNaN() RETURNS jsonb
+LANGUAGE plperlu
+TRANSFORM FOR TYPE jsonb
+AS $$
+$val = 0 + 'NaN';
+return $val;
+$$;
+SELECT testNaN();
+ERROR:  cannot convert NaN to jsonb
+CONTEXT:  PL/Perl function "testnan"
 -- this revealed a bug in the original implementation
 CREATE FUNCTION testRegexpResultToJsonb() RETURNS jsonb
 LANGUAGE plperlu
@@ -71,7 +91,7 @@ SELECT roundtrip('1');
 (1 row)
 
 SELECT roundtrip('1E+131071');
-ERROR:  cannot convert infinite value to jsonb
+ERROR:  cannot convert infinity to jsonb
 CONTEXT:  PL/Perl function "roundtrip"
 SELECT roundtrip('-1');
  roundtrip 
@@ -207,4 +227,4 @@ SELECT roundtrip('{"1": {"2": [3, 4, 5]}, "2": 3}');
 
 \set VERBOSITY terse \\ -- suppress cascade details
 DROP EXTENSION plperlu CASCADE;
-NOTICE:  drop cascades to 6 other objects
+NOTICE:  drop cascades to 8 other objects
diff --git a/contrib/jsonb_plperl/jsonb_plperl.c b/contrib/jsonb_plperl/jsonb_plperl.c
index 837bae2ab5..3f9802c696 100644
--- a/contrib/jsonb_plperl/jsonb_plperl.c
+++ b/contrib/jsonb_plperl/jsonb_plperl.c
@@ -209,10 +209,22 @@ SV_to_JsonbValue(SV *in, JsonbParseState **jsonb_state, bool is_elem)
 	{
 		double		nval = SvNV(in);
 
+		/*
+		 * jsonb doesn't allow infinity or NaN (per JSON
+		 * specification), but the numeric type that is used for the
+		 * storage accepts NaN, so we have to prevent it here
+		 * explicitly.  We don't really have to check for isinf()
+		 * here, as numeric doesn't allow 

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2018-04-26 Thread Robert Haas
On Thu, Apr 26, 2018 at 3:10 PM, Andres Freund  wrote:
>> I think the real question is whether the scenario is common enough to
>> worry about.  In practice, you'd have to be extremely unlucky to be
>> doing many bulk loads at the same time that all happened to hash to
>> the same bucket.
>
> With a bunch of parallel bulkloads into partitioned tables that really
> doesn't seem that unlikely?

It increases the likelihood of collisions, but probably decreases the
number of cases where the contention gets really bad.

For example, suppose each table has 100 partitions and you are
bulk-loading 10 of them at a time.  It's virtually certain that you
will have some collisions, but the amount of contention within each
bucket will remain fairly low because each backend spends only 1% of
its time in the bucket corresponding to any given partition.
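
A back-of-the-envelope way to put a number on that, assuming purely for
illustration 1024 lock-table entries and 10 concurrent loaders
(birthday-problem arithmetic, computable in SQL):

    SELECT round(1 - exp(sum(ln(1 - g / 1024.0))), 4) AS p_any_collision
    FROM generate_series(1, 9) AS g;
    -- ~0.043: some pair collides now and then, but any given pair rarely does.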

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

2018-04-26 Thread Alvaro Herrera
Hello Markus,

Markus Winand wrote:

> * Patch 0001: Accept TEXT and CDATA nodes in XMLTABLE’s column_expression.
> 
> > Column_expressions that match TEXT or CDATA nodes must return
> > the contents of the node itself, not the content of the
> > non-existing children (i.e. the empty string).
> 
> The following query returns a wrong result in master:
> 
> SELECT *
>   FROM (VALUES ('<xml>text</xml>'::xml)
>              , ('<xml><![CDATA[<cdata>]]></xml>'::xml)
>        ) d(x)
>   CROSS JOIN LATERAL
>         XMLTABLE('/xml'
>                  PASSING x
>                  COLUMNS "node()" TEXT PATH 'node()'
>         ) t

> The correct result can be obtained with patch 0001 applied:
> 
>                x                | node()
> --------------------------------+---------
>  <xml>text</xml>                | text
>  <xml><![CDATA[<cdata>]]></xml> | <cdata>
> 
> The patch adapts existing tests to guard against future regressions.

I remembered while reading this that Craig Ringer had commented that XML
text sections can have multiple children: just put XML comments amidst
the text.  To test this, I added a comment in one of the text values, in
the regression database:

update xmldata set data = regexp_replace(data::text, 'HongKong', 'Hong<!-- comment -->Kong')::xml;

With the modified data, the new query in the regression tests fails:

SELECT  xmltable.*
  FROM (SELECT data FROM xmldata) x,
       LATERAL XMLTABLE('/ROWS/ROW'
                        PASSING data
                        COLUMNS id int PATH '@id',
                                _id FOR ORDINALITY,
                                country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
                                country_id text PATH 'COUNTRY_ID',
                                region_id int PATH 'REGION_ID',
                                size float PATH 'SIZE',
                                unit text PATH 'SIZE/@unit',
                                premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');

ERROR:  more than one value returned by column XPath expression

This seems really undesirable, so I looked into getting it fixed.  Turns
out that this error is being thrown from inside the same function we're
modifying, line 4559.  I didn't have a terribly high opinion of that
ereport() when working on this feature to begin with, so now that it's
proven to provide a bad experience, let's try removing it.  With that
change (patch attached), the feature seems to work; I tried with this
query, which seems to give the expected output (notice we have some
columns of type xml, others of type text, with and without the text()
function in the path):

SELECT  xmltable.*
  FROM (SELECT data FROM xmldata) x,
       LATERAL XMLTABLE('/ROWS/ROW'
                        PASSING data COLUMNS
                                country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
                                country_name_xml xml PATH 'COUNTRY_NAME/text()' NOT NULL,
                                country_name_n text PATH 'COUNTRY_NAME' NOT NULL,
                                country_name_xml_n xml PATH 'COUNTRY_NAME' NOT NULL);
  country_name  │ country_name_xml │ country_name_n │           country_name_xml_n
────────────────┼──────────────────┼────────────────┼────────────────────────────────────────────
 Australia      │ Australia        │ Australia      │ <COUNTRY_NAME>Australia</COUNTRY_NAME>
 China          │ China            │ China          │ <COUNTRY_NAME>China</COUNTRY_NAME>
 HongKong       │ HongKong         │ HongKong       │ <COUNTRY_NAME>Hong<!-- comment -->Kong</COUNTRY_NAME>
 India          │ India            │ India          │ <COUNTRY_NAME>India</COUNTRY_NAME>
 Japan          │ Japan            │ Japan          │ <COUNTRY_NAME>Japan</COUNTRY_NAME>
 Singapore      │ Singapore        │ Singapore      │ <COUNTRY_NAME>Singapore</COUNTRY_NAME>
 Czech Republic │ Czech Republic   │ Czech Republic │ <COUNTRY_NAME>Czech Republic</COUNTRY_NAME>
 Germany        │ Germany          │ Germany        │ <COUNTRY_NAME>Germany</COUNTRY_NAME>
 France         │ France           │ France         │ <COUNTRY_NAME>France</COUNTRY_NAME>
 Egypt          │ Egypt            │ Egypt          │ <COUNTRY_NAME>Egypt</COUNTRY_NAME>
 Sudan          │ Sudan            │ Sudan          │ <COUNTRY_NAME>Sudan</COUNTRY_NAME>
(11 filas)


This means that the concatenation now works for all types, not just xml, so I
can do this also:

update xmldata set data = regexp_replace(data::text, '791', '7<!-- comment -->91')::xml;

SELECT  xmltable.*
  FROM (SELECT data FROM xmldata) x,
       LATERAL XMLTABLE('/ROWS/ROW'
                        PASSING data
                        COLUMNS
                                country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
                                size_text float PATH 'SIZE/text()',
                                "SIZE" float, size_xml xml PATH 'SIZE');
  country_name  │ size_text │ SIZE 
────────────────┼───────────┼──────
 Australia      │           │ 
 China          │           │ 
 HongKong       │           │ 
 India          │           │ 
 Japan          │           │ 
 Singapore      │       791 │  791
 Czech Republic │           │ 
 Germany        │           │ 
 France         │           │ 
 Egypt          │

Re: Setting rpath on llvmjit.so?

2018-04-26 Thread Jason Petersen
> On Apr 18, 2018, at 8:57 AM, Tom Lane  wrote:
> 
> I'm wondering whether that will result in expending a lot of effort to move 
> from a poorly-supported build system to a different poorly-supported build 
> system.


I’m not sure whether the former is autoconf/make or cmake, but count me as 
another vote for cmake over meson. CMake has direct support from Microsoft 
(Visual Studio 2017 can auto-ingest CMake-based files as soon as they’re 
checked out) and JetBrains CLion and is in use by LLVM, KDE, Qt, Blender, 
libpng, cURL, LAPACK, MySQL/MariaDB, OpenCV, SDL, the Dolphin Gamecube emulator 
etc.

So the tooling around people using it is there and it’s used by some very large 
and mature projects covering a variety of domains: I don’t know if it’s “poorly 
supported” (the documentation leaves something to be desired), but if CMake has 
trouble down the line a lot of very important projects will be in trouble. It 
seems like a pretty safe bet given the sheer inertia of the list above.

--
Jason Petersen
Software Engineer | Citus Data
303.736.9255
ja...@citusdata.com



Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2018-04-26 Thread Andres Freund
Hi,

On 2018-04-26 15:08:24 -0400, Robert Haas wrote:
> I don't think that's a very useful suggestion.  Changing
> N_RELEXTLOCK_ENTS requires a recompile, which is going to be
> impractical for most users.  Even if we made it a GUC, we don't want
> users to have to tune stuff like this.  If we actually think this is
> going to be a problem, we'd probably better rethink the design.

Agreed.


> I think the real question is whether the scenario is common enough to
> worry about.  In practice, you'd have to be extremely unlucky to be
> doing many bulk loads at the same time that all happened to hash to
> the same bucket.

With a bunch of parallel bulkloads into partitioned tables that really
doesn't seem that unlikely?

Greetings,

Andres Freund



Re: Toast issues with OldestXmin going backwards

2018-04-26 Thread Tom Lane
Robert Haas  writes:
> I wonder if perhaps get_actual_variable_range() has a hazard of this
> type as well.  If OldestXmin retreats, then the special snapshot type
> which it uses could see a row whose TOAST entry has been removed
> meanwhile.

Hm, yeah.  I wonder if we could fix that by extracting the value from
the index rather than recomputing it from the heap tuple, as we do now.
We'd still have to visit the heap tuple to check liveness, but we'd
not need to extract anything from its data part.

Have we given up on the angle of "prevent OldestXmin from retreating"?
That seems like it'd be far more bulletproof than trying to work
around places that break one-by-one.

regards, tom lane



Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2018-04-26 Thread Robert Haas
On Thu, Apr 26, 2018 at 2:10 AM, Masahiko Sawada  wrote:
> Thank you for sharing. That's good to know.
>
> Andres pointed out the performance degradation due to hash collision
> when multiple loading. I think the point is that it happens at where
> users don't know.  Therefore even if we make N_RELEXTLOCK_ENTS
> configurable parameter, since users don't know the hash collision they
> don't know when they should tune it.
>
> So it's just an idea but how about adding an SQL-callable function
> that returns the estimated number of lock waiters of the given
> relation? Since user knows how many processes are loading to the
> relation, if a returned value by the function is greater than the
> expected value user  can know hash collision and will be able to start
> to consider to increase N_RELEXTLOCK_ENTS.

I don't think that's a very useful suggestion.  Changing
N_RELEXTLOCK_ENTS requires a recompile, which is going to be
impractical for most users.  Even if we made it a GUC, we don't want
users to have to tune stuff like this.  If we actually think this is
going to be a problem, we'd probably better rethink the design.

I think the real question is whether the scenario is common enough to
worry about.  In practice, you'd have to be extremely unlucky to be
doing many bulk loads at the same time that all happened to hash to
the same bucket.
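
To make the failure mode concrete, the mapping under discussion is
roughly of this shape (a hand-written sketch; the constant, the hash,
and all names here are illustrative, not the actual patch):

#include <stdint.h>

#define N_RELEXTLOCK_ENTS 1024          /* fixed-size lock table */

typedef uint32_t Oid;                   /* stand-in for PostgreSQL's Oid */

static inline uint32_t
relextlock_bucket(Oid dbid, Oid relid)
{
    uint32_t h = dbid * 0x9e3779b9U ^ relid * 0x85ebca6bU;

    return h % N_RELEXTLOCK_ENTS;       /* colliding relations share an entry */
}

Two concurrent bulk loads extending different relations whose (dbid,
relid) pairs fall into the same bucket serialize on one lock entry,
which is the "unlucky" case above.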

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

2018-04-26 Thread Bruce Momjian
On Mon, Apr 16, 2018 at 07:48:47PM +0300, Liudmila Mantrova wrote:
> Hi everyone,
> 
> When translating doc updates, Alexander Lakhin noticed that trigram examples
> were not quite accurate.
> A small patch fixing this issue is attached.

FYI, this has been applied by Teodor Sigaev:

   
https://git.postgresql.org/pg/commitdiff/9975c128a1d1bd7e7366adf133b21540a2bc2450

---


> 
> 
> On 03/21/2018 03:35 PM, Teodor Sigaev wrote:
> >Thank you, pushed
> >
> >David Steele wrote:
> >>On 3/6/18 7:04 AM, Teodor Sigaev wrote:
> I agree with Teodor (upthread, not quoted here) that the documentation
> could use some editing.
> 
> I started to do it myself, but quickly realized I have no knowledge of
> the content.  I'm afraid I would destroy the meaning while updating
> the
> grammar.
> 
> Anyone understand the subject matter well enough to review the
> documentation?
> >>>
> >>>Liudmila tried to improve docs in Alexander's patchset.
> >>>
> >>>https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752c...@postgrespro.ru
> >>>
> >>
> >>This looks good to me with a few minor exceptions:
> >>
> >>+   word_similarity(text, text) requires further
> >>+   explanation. Consider the following example:
> >>
> >>Maybe too verbose?  I think "word_similarity(text,
> >>text) requires further explanation." can be removed entirely.
> >>
> >>+   string.  However, this function does not add paddings to the
> >>
> >>"add padding"
> >>
> >>>BTW, adding Liudmila's message to commitfest task
> >>>(https://commitfest.postgresql.org/17/1403/) doesn't work
> >>
> >>Doesn't work for me either.
> >>
> >>Alexander, can you post the final patches to the thread so they show up
> >>in the CF app?
> >>
> >>Thanks,
> >>
> >
> 
> -- 
> Liudmila Mantrova
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
> 

> diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
> index 8f39529..be43cdf 100644
> --- a/doc/src/sgml/pgtrgm.sgml
> +++ b/doc/src/sgml/pgtrgm.sgml
> @@ -152,9 +152,9 @@
>  
>  
> In the first string, the set of trigrams is
> -   {"  w"," wo","ord","wor","rd "}.
> +   {"  w"," wo","wor","ord","rd "}.
> In the second string, the ordered set of trigrams is
> -   {"  t"," tw",two,"wo ","  w"," wo","wor","ord","rds", ds 
> "}.
> +   {"  t"," tw","two","wo ","  w"," wo","wor","ord","rds","ds 
> "}.
> The most similar extent of an ordered set of trigrams in the second string
> is {"  w"," wo","wor","ord"}, and the similarity is
> 0.8.
> @@ -172,7 +172,7 @@
> At the same time, strict_word_similarity(text, text)
> has to select an extent that matches word boundaries.  In the example 
> above,
> strict_word_similarity(text, text) would select the
> -   extent {"  w"," wo","wor","ord","rds", ds "}, which
> +   extent {"  w"," wo","wor","ord","rds","ds "}, which
> corresponds to the whole word 'words'.
>  
>  
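
As a quick sanity check of the corrected text (the call and its result
are the ones given in the pg_trgm documentation):

SELECT word_similarity('word', 'two words');
 word_similarity
-----------------
             0.8
(1 row)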


-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: description of root_tuple_slot missing

2018-04-26 Thread Bruce Momjian
On Mon, Apr 23, 2018 at 02:50:10PM +0900, Amit Langote wrote:
> I noticed that the description of root_tuple_slot member is missing in the
> comment above PartitionTupleRouting definition.  See if the attached patch
> fixes it correctly.
> 
> Thanks,
> Amit

> diff --git a/src/include/executor/execPartition.h 
> b/src/include/executor/execPartition.h
> index e81bdc4a0a..7f2b66a206 100644
> --- a/src/include/executor/execPartition.h
> +++ b/src/include/executor/execPartition.h
> @@ -90,6 +90,10 @@ typedef struct PartitionDispatchData *PartitionDispatch;
>   *   given leaf 
> partition's rowtype after that
>   *   partition is 
> chosen for insertion by
>   *   tuple-routing.
> + * root_tuple_slot   TupleTableSlot to be used to 
> transiently hold
> + *   copy of a tuple 
> that's being moved across
> + *   partitions in 
> the root partitioned table's
> + *   rowtype
>   *---
>   */
>  typedef struct PartitionTupleRouting

Patch applied.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: GSoC 2018: Sorting Algorithm and Benchmarking

2018-04-26 Thread Robert Haas
On Wed, Apr 25, 2018 at 3:12 PM, Kefan Yang  wrote:
> If I understand it correctly, the sorting benchmark should be an executable
> under the src/bin/ folder just like pgbench?

What would this executable do, exactly?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Typo in JIT documentation

2018-04-26 Thread Robert Haas
On Wed, Apr 25, 2018 at 3:30 AM, Magnus Hagander  wrote:
> On Wed, Apr 25, 2018 at 4:11 AM, Michael Paquier 
> wrote:
>>
>> Hi all,
>>
>> I just found $subject:
>>  the server was compiled without --with-llvm),
>> -JIT will not performed, even if considered to be
>> +JIT will not be performed, even if considered to
>> be
>>  beneficial based on the above criteria.  Setting > linkend="guc-jit"/>
>
> Applied, thanks.

Hmm, I thought we talked about changing such references to say 'JIT
compilation' rather than just 'JIT'.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Toast issues with OldestXmin going backwards

2018-04-26 Thread Robert Haas
On Mon, Apr 23, 2018 at 11:21 PM, Amit Kapila  wrote:
> If I read correctly, it seems one of the main reason [1] is to save
> the extra pass over the heap and improve the code.  Now, ideally, the
> extra pass over heap should also free up some space (occupied by the
> rows that contains old toast pointers corresponding to which we are
> going to remove rows from toast table), but it is quite possible that
> it is already clean due to a previous separate vacuum pass over the
> heap.  I think it is good to save extra pass over heap which might not
> be as useful as we expect, but that can cost us correctness issues in
> boundary cases as in the case being discussed in this thread.

I don't think anybody disagrees with that, but it doesn't answer the
question of how best to fix it.  Making VACUUM a lot more expensive
will not win us any awards, and an extra pass over the heap could be
very expensive.  You seem to think Andrew's fix isn't really
addressing the root of the problem, but I think that just depends on
how you define the problem.  If you define the problem as "the table
should never have dangling pointers to the TOAST table", then Andrew's
approach is only fixing the symptom.  But if you define the problem as
"we shouldn't try to follow TOAST pointers in heap tuples that are not
visible to any current or future MVCC snapshot", then it's fixing the
root issue.

I wonder if perhaps get_actual_variable_range() has a hazard of this
type as well.  If OldestXmin retreats, then the special snapshot type
which it uses could see a row whose TOAST entry has been removed
meanwhile.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: perltidy tweaks

2018-04-26 Thread Andrew Dunstan


On 04/26/2018 12:33 PM, Tom Lane wrote:
> Peter Eisentraut  writes:
>> I propose two tweaks for the perltidy configuration:
>> - The new perltidy version can avoid writing backup files, so we don't
>> need to explicitly delete them after.
> +1


+1

>
>> - Use option --nooutdent-long-quotes.  This is similar to a change we
>> made for pgindent in the C code a while ago.  See patch for the results.
> No strong opinion as to whether to do that or not, but if we do, I wonder
> if we shouldn't also select --nooutdent-long-comments.  I noticed a few
> places where it was doing that (ie moving comments left if they went past
> the right margin) and didn't particularly like it.


Yeah, not a fan of the outdenting, so +1 to both of these.

>
> What would be more per project style is to reflow overly-wide comments
> into multiple lines, but I don't see an option for that :-(
>
>   


Right. Maybe we need a script that at least identifies them for us.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Clock with Adaptive Replacement

2018-04-26 Thread Robert Haas
On Wed, Apr 25, 2018 at 6:54 PM, Peter Geoghegan  wrote:
> Before some of the big shared_buffers bottlenecks were alleviated
> several years ago, it was possible to observe shared_buffers evictions
> occurring essentially at random. I have no idea if that's still true,
> but it could be.

I think it is.  We haven't done anything to address it.  I think if we
want to move to direct I/O -- which may be something we need or want
to do -- we're going to have to work a lot harder at making good page
eviction decisions.  Your patch to change the page eviction algorithm
didn't help noticeably once we eliminated the contention around buffer
eviction, but that's just because the cost of a bad eviction went
down, not because we stopped doing bad evictions.  I think it would be
interesting to insert a usleep() call into mdread() and then test
various buffer eviction algorithms with that in place.

I'm personally not very excited about making rules like "index pages
are more valuable than heap pages".  Such rules will in many cases be
true, but it's easy to come up with cases where they don't hold: for
example, we might run pgbench for a while and then stop running
pgbench and start running big sequential scans for reporting purposes.
We don't want to artificially pin the index buffers in shared_buffers
just because they're index pages; we want to figure out which pages
really matter.  Now, I realize that you weren't proposing (and
wouldn't propose) a rule that index pages never get evicted, but I
think that favoring index pages even in some milder way is basically a
hack.  Index pages aren't *intrinsically* more valuable; they are more
valuable because they will, in many workloads, be accessed more often
than heap pages.  A good algorithm ought to be able to figure that out
based on the access pattern, without being explicitly given a hint, I
think.

I believe the root of the problem here is that the usage count we have
today does a very poor job distinguishing what's hot from what's not.
There have been previous experiments around making usage_count use
some kind of a log scale: we make the maximum, say, 32, and the clock
hand divides by 2 instead of subtracting 1.  I don't think those
experiments were enormously successful and I suspect that a big part
of the reason is that it's still pretty easy to get to a state where
the counters are maxed out for a large number of buffers, and at that
point you can't distinguish between those buffers any more: they all
look equally hot.  We need something better.  If a system like this is
working properly, things like interior index pages and visibility map
pages ought to show up as fiery hot on workloads where the index or
visibility map, as the case may be, is heavily used.
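
As a concrete illustration, the log-scale variant would look roughly
like this (a minimal sketch; the cap, names, and structure are
illustrative, not PostgreSQL's actual buffer manager code):

#include <stdbool.h>

#define MAX_USAGE_COUNT 32      /* hypothetical cap, instead of today's 5 */

typedef struct SketchBuffer
{
    int usage_count;
    /* ... other buffer state elided ... */
} SketchBuffer;

/* On each access, saturate toward the cap. */
static void
sketch_buffer_touched(SketchBuffer *buf)
{
    if (buf->usage_count < MAX_USAGE_COUNT)
        buf->usage_count++;
}

/*
 * Clock hand: halve instead of subtracting 1, so counts decay
 * geometrically; returns true when the buffer is an eviction candidate.
 */
static bool
sketch_clock_tick(SketchBuffer *buf)
{
    if (buf->usage_count == 0)
        return true;
    buf->usage_count /= 2;
    return false;
}

Even in this sketch the saturation problem is visible: once many
buffers sit at the cap, they remain indistinguishable from one another
until several sweeps have passed.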

A related problem is that user-connected backends end up doing a lot
of buffer eviction themselves on many workloads.  Maybe the
bgreclaimer patch Amit wrote a few years ago could help somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: perltidy tweaks

2018-04-26 Thread Tom Lane
Peter Eisentraut  writes:
> I propose two tweaks for the perltidy configuration:
> - The new perltidy version can avoid writing backup files, so we don't
> need to explicitly delete them after.

+1

> - Use option --nooutdent-long-quotes.  This is similar to a change we
> made for pgindent in the C code a while ago.  See patch for the results.

No strong opinion as to whether to do that or not, but if we do, I wonder
if we shouldn't also select --nooutdent-long-comments.  I noticed a few
places where it was doing that (ie moving comments left if they went past
the right margin) and didn't particularly like it.
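
For reference, the corresponding perltidyrc lines, using the option
names from perltidy's documentation, would be:

--nooutdent-long-quotes
--nooutdent-long-comments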

What would be more per project style is to reflow overly-wide comments
into multiple lines, but I don't see an option for that :-(

regards, tom lane



Scariest patch tournament, PostgreSQL 11 edition

2018-04-26 Thread Alvaro Herrera
PostgreSQL hackers and community at large,

In the spirit of the season, the Release Management Team would like to
gather your thoughts on Fear, Risk and Data Corruption for features in
PostgreSQL 11.  What patch or patches committed in this cycle do you
think have the highest probability of causing problems of any sort for
users?

The poll is not open yet; we'll give a couple of weeks for you to
collect your thoughts, reflect back on the highly active development
cycle, and go over the long list of Postgres 11 commits to define your
candidates.

The PostgreSQL 11 users-to-be will wholeheartedly thank you.

 ---

This, of course, is a bit tongue-in-cheek.  However, going back to
Peter's thoughts[1] about the previous time we did it: the main point of
this poll is to crowd-source the question of where should testing and
post-commit review efforts focus on the most.  Please keep this in mind
when considering your answers.

[1] 
https://www.postgresql.org/message-id/cam3swzqi7b4juvysru6j_zjozkvdynqg7zkdcjxwhabc86a...@mail.gmail.com

-- 
Álvaro Herrera



Re: unused_oids script is broken with bsd sed

2018-04-26 Thread Tom Lane
John Naylor  writes:
> On 4/26/18, Tom Lane  wrote:
>> I notice that duplicate_oids is now a good factor of 10 slower than it was
>> before (~0.04 sec to ~0.4 sec, on my machine).  While this doesn't seem
>> like a problem for manual use, it seems annoying as part of the standard
>> build process, especially on slower buildfarm critters.

> If you're concerned about build speed, I think the generated *_d.h
> headers cause a lot more files to be rebuilt than before. I'll think
> about how to recover that.

Personally, I use ccache which doesn't seem to care too much, but I agree
than in some usages, extra touches of headers would be costly.  Perhaps
it's worth adding logic to avoid overwriting an existing output file
unless it changed?  I'm not sure; that would cost something too.

> -removes duplicate_oids (not sure if you intended this, but since
> unused_oids has already been committed to report duplicates, we may as
> well standardize on that) and cleans up after that fact

I don't think anyone's asked for duplicate_oids to be removed altogether,
and I'm afraid that doing so would break existing workflows.  For that
matter, now that I think about it, changing the behavior of unused_oids
as we did yesterday was ill-considered as well, so I undid that part.

I've pushed the current-directory-independence change by itself so that
we can see if any buildfarm members think it's problematic.  (They'll
still be testing duplicate_oids, as things stand.)  Once we have some
confirmation on the FindBin stuff actually being portable, I'll push
the changes to make genbki do duplicate checks and remove duplicate_oids
from the build sequence.

regards, tom lane



Re: WIP: a way forward on bootstrap data

2018-04-26 Thread David Fetter
On Wed, Apr 25, 2018 at 04:39:42PM -0400, Tom Lane wrote:
> Mark Dilger  writes:
> >> On Apr 25, 2018, at 1:00 PM, Robert Haas  wrote:
> >> -1 for trying to automate this.  It varies between fooin and foo_in,
> >> and it'll be annoying to have to remember which one happens
> >> automatically and which one needs an override.
> 
> > That may be a good argument.  I was on the fence about it, because I
> > like the idea that the project might do some cleanup and standardize
> > the names of all in/out/send/recv functions so that no overrides would
> > be required.  (Likewise, I'd like the eq,ne,lt,le,gt,ge functions to
> > be named in a standard way.)
> 
> > Would that be an API break and hence a non-starter?
> 
> Yeah, I'm afraid so.  I'm pretty sure I recall cases where people
> invoked I/O functions by name, and I definitely recall cases where
> operator-implementation functions were invoked by name.  Those might
> not be very bright things to do, but people do 'em.
> 
> We'd also be risking creating problems for individual apps by conflicting
> with user-defined functions that we didn't use to conflict with.  We take
> that chance every time we add functionality, of course, but usually we can
> soothe complaints by pointing out that they got some new functionality in
> return.  I don't think "we made I/O function names more uniform" is going
> to be a very satisfactory excuse for breakage.

What do you rate the chances that someone created a foo_in when
there's already a built-in fooin?  If it's low enough, we could add
all the foo_ins as aliases to fooins and then mandate that new ones be
called foo_in for future foos.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: unused_oids script is broken with bsd sed

2018-04-26 Thread Tom Lane
I wrote:
> Andreas Karlsson  writes:
>> What about using FindBin (http://perldoc.perl.org/FindBin.html)?

> A quick check suggests that that's present in the core perl
> distribution pretty far back, so I think it'll work.
> Question is, is it any better than John's "dirname(__FILE__)"
> solution?

I experimented a bit, and found that actually what we seem to
need is FindBin::RealBin.  That's the only one of these solutions
that gives the right answer if someone's put a symlink to
unused_oids or duplicate_oids into their PATH --- the other ones
give you the directory containing the symlink.  Now, maybe that
would be an uncommon usage, but it hardly seems out of the question.

regards, tom lane



Re: minor fix for acquire_inherited_sample_rows

2018-04-26 Thread Alvaro Herrera
Amit Langote wrote:
> On Thu, Apr 26, 2018 at 9:54 PM, Ashutosh Bapat
>  wrote:
> > On Thu, Apr 26, 2018 at 1:08 AM, Robert Haas  wrote:
> >> +1.  I think we're really abusing equalTupleDescs() for purposes for
> >> which it was not invented.  Instead of changing it, let's invent a new
> >> function that tests for the thing partitioning cares about (same
> >> ordering of the same columns with the same type information) and call
> >> it logicallyEqualTupleDescs() or something like that.
> >
> > Why don't we just rely on the output of convert_tuples_by_name(),
> > which it seems is always called right now? What's advantage of adding
> > another tuple descriptor comparison?
> 
> The patch I mentioned in my email above does more or less that (what
> you're saying we should do).  In fact it even modifies
> convert_tuple_by_name and convert_tuple_by_name_map to remove some
> redundant computation.  See that patch here if you're interested:
> 
> https://www.postgresql.org/message-id/825031be-942c-8c24-6163-13c27f217a3d%40lab.ntt.co.jp

Yeah, I didn't like the additional nullness checks in that patch.  I was
thinking it might be saner to create a new struct with the AttrNumber *
array, its length and a boolean flag indicating a no-op conversion.
Then we can call map_variable_attnos unconditionally and it does nothing
if the flag is set.  This localizes the ugliness.

I'm not sure if this idea is better than Robert's logicallyEqualTupleDescs().
Maybe we can use both in different places.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: unused_oids script is broken with bsd sed

2018-04-26 Thread Tom Lane
Andreas Karlsson  writes:
> What about using FindBin (http://perldoc.perl.org/FindBin.html)?

A quick check suggests that that's present in the core perl
distribution pretty far back, so I think it'll work.
Question is, is it any better than John's "dirname(__FILE__)"
solution?

regards, tom lane



Re: minor fix for acquire_inherited_sample_rows

2018-04-26 Thread Amit Langote
On Thu, Apr 26, 2018 at 9:54 PM, Ashutosh Bapat
 wrote:
> On Thu, Apr 26, 2018 at 1:08 AM, Robert Haas  wrote:
>> +1.  I think we're really abusing equalTupleDescs() for purposes for
>> which it was not invented.  Instead of changing it, let's invent a new
>> function that tests for the thing partitioning cares about (same
>> ordering of the same columns with the same type information) and call
>> it logicallyEqualTupleDescs() or something like that.
>
> Why don't we just rely on the output of convert_tuples_by_name(),
> which it seems is always called right now? What's advantage of adding
> another tuple descriptor comparison?

The patch I mentioned in my email above does more or less that (what
you're saying we should do).  In fact it even modifies
convert_tuple_by_name and convert_tuple_by_name_map to remove some
redundant computation.  See that patch here if you're interested:

https://www.postgresql.org/message-id/825031be-942c-8c24-6163-13c27f217a3d%40lab.ntt.co.jp

Thanks,
Amit



Re: jitflags in _outPlannedStmt and _readPlannedStmt treated as bool type

2018-04-26 Thread Ashutosh Bapat
On Thu, Apr 26, 2018 at 9:04 AM, Haribabu Kommi
 wrote:
> The jitflags in the PlannedStmt structure of type "int", but in _out and
> _read functions it is treated as of "bool" type.
>
> WRITE_BOOL_FIELD(jitFlags);
> READ_BOOL_FIELD(jitFlags);
>
> I am thinking it is a copy-paste mistake, as the other members around
> jitFlags are of "bool" type, or is there any specific reason to treat it
> as "bool" type?

No, I don't see any. Here's patch for the same.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From 95d025f09d3cc619b45fed5519685dd247c74252 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Thu, 26 Apr 2018 18:58:39 +0530
Subject: [PATCH 3/3] Use READ/WRITE_INT_FIELD macro for jitFlags

jitFlags is declared as int in PlannedStmt but uses
READ/WRITE_BOOL_FIELD in _read/_outPlannedStmt respectively.

Ashutosh Bapat, per suggestion from Haribabu Kommi
---
 src/backend/nodes/outfuncs.c  |2 +-
 src/backend/nodes/readfuncs.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 3991a0c..cbebabf 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -272,7 +272,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
 	WRITE_BOOL_FIELD(transientPlan);
 	WRITE_BOOL_FIELD(dependsOnRole);
 	WRITE_BOOL_FIELD(parallelModeNeeded);
-	WRITE_BOOL_FIELD(jitFlags);
+	WRITE_INT_FIELD(jitFlags);
 	WRITE_NODE_FIELD(planTree);
 	WRITE_NODE_FIELD(rtable);
 	WRITE_NODE_FIELD(resultRelations);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index c466b98..2826cec 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1518,7 +1518,7 @@ _readPlannedStmt(void)
 	READ_BOOL_FIELD(transientPlan);
 	READ_BOOL_FIELD(dependsOnRole);
 	READ_BOOL_FIELD(parallelModeNeeded);
-	READ_BOOL_FIELD(jitFlags);
+	READ_INT_FIELD(jitFlags);
 	READ_NODE_FIELD(planTree);
 	READ_NODE_FIELD(rtable);
 	READ_NODE_FIELD(resultRelations);
-- 
1.7.9.5



Re: WIP: a way forward on bootstrap data

2018-04-26 Thread Ashutosh Bapat
On Thu, Apr 26, 2018 at 2:11 AM, Mark Dilger  wrote:
>
>> On Apr 25, 2018, at 1:31 PM, Tom Lane  wrote:
>>
>> Robert Haas  writes:
>>> On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger  
>>> wrote:
 There still seems to be a lot of boilerplate in the .dat files
 that could be eliminated. ...
>>
 {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout',
 typreceive => 'Xrecv', typsend => 'Xsend', ... },
>>
>>> -1 for trying to automate this.  It varies between fooin and foo_in,
>>> and it'll be annoying to have to remember which one happens
>>> automatically and which one needs an override.
>>
>> Yeah, that was my first reaction to that example as well.  If it
>> weren't so nearly fifty-fifty then we might have more traction there;
>> but it's pretty close, and actually the foo_in cases outnumber fooin
>> if I counted correctly.  (Array entries should be ignored for this
>> purpose; maybe we'll autogenerate them someday.)
>
> Part of the problem right now is that nothing really encourages new
> functions to be named foo_in vs. fooin, so the nearly 50/50 split will
> continue when new code is contributed.  If the system forced you to
> specify the name when you did it in a nonstandard way, and not otherwise,
> that would have the effect of documenting which way is now considered
> standard.
>

FWIW, I would like some standard naming convention one way or the
other for in/out function names. Looking up pg_type.h for in/out
functions when you know the built-in type name isn't great. But that
itself may not be enough reason to change it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: minor fix for acquire_inherited_sample_rows

2018-04-26 Thread Ashutosh Bapat
On Thu, Apr 26, 2018 at 1:08 AM, Robert Haas  wrote:
> On Tue, Apr 24, 2018 at 6:19 AM, Amit Langote
>  wrote:
>> Given what equalTupleDescs was invented for (commit a152ebeec), reducing
>> it down to what can be sensibly used for checking whether tuple conversion
>> is needed between a parent and child will, I'm afraid, make it useless for
>> its original purpose.  Just looking at a few checks that are now in it,
>> for example:
>>
>> for (i = 0; i < tupdesc1->natts; i++)
>> {
>> Form_pg_attribute attr1 = TupleDescAttr(tupdesc1, i);
>> Form_pg_attribute attr2 = TupleDescAttr(tupdesc2, i);
>>
>> <...snip...>
>>
>> if (attr1->attislocal != attr2->attislocal)
>> return false;
>> if (attr1->attinhcount != attr2->attinhcount)
>> return false;
>> <...snip...>
>> /* attacl, attoptions and attfdwoptions are not even present... */
>> }
>>
>> attislocal and attinhcount obviously can't be same for parent and a child.
>>  Also, the comment above seems to assume that this function is being
>> called from the only place it was designed for (RelationClearRelation).
>
> +1.  I think we're really abusing equalTupleDescs() for purposes for
> which it was not invented.  Instead of changing it, let's invent a new
> function that tests for the thing partitioning cares about (same
> ordering of the same columns with the same type information) and call
> it logicallyEqualTupleDescs() or something like that.

Why don't we just rely on the output of convert_tuples_by_name(),
which it seems is always called right now? What's advantage of adding
another tuple descriptor comparison?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: unused_oids script is broken with bsd sed

2018-04-26 Thread Andreas Karlsson
On 04/26/2018 01:37 PM, John Naylor wrote:
> Patch 0002 is my attempt at this. Not sure how portable this is. It's
> also clunkier than before in that you need to tell it where to find
> Catalog.pm, since it seems Perl won't compile unless "use lib" points
> to a string and not an expression. Suggestions welcome. I also edited
> the boilerplate comments.


What about using FindBin (http://perldoc.perl.org/FindBin.html)?

Andreas



Re: Oddity in tuple routing for foreign partitions

2018-04-26 Thread Etsuro Fujita

(2018/04/26 20:06), Kyotaro HORIGUCHI wrote:

It seems almost fine to me, but just one point..

At Thu, 26 Apr 2018 17:36:44 +0900, Etsuro Fujita  wrote 
in<5ae18f9c.6080...@lab.ntt.co.jp>

(2018/04/26 15:35), Amit Langote wrote:

On 2018/04/26 12:43, Etsuro Fujita wrote:
+resultRelation == plan->nominalRelation)
+ resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+}


Seems like a better change than mine; because this simplifies the
logic.


Please rewrite it to use a pointer reference, not an array reference,
if one mtstate logically holds just one resultRelInfo.


Maybe I don't understand your words correctly, but in that UPDATE case, 
I think that mtstate can have multiple ResultRelInfo.



So, attached are two patches.

1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
  InitResultRelInfo

2. v5 of the patch to fix the bug of foreign partitions

Thoughts?


Actually, I also thought the former when creating the patch, but I
left
that as-is because I'm not sure that would be really safe;
ExecConstraints
looks at the RT index via GetInsertedColumns/GetUpdatedColumns, so
what I
thought was: that might cause unexpected behavior.


OK, I have to agree here that we better leave 1 to be looked at later.

After this change, GetInsertedColumns/GetUpdatedColumns will start
returning a different set of columns in some cases than it does now.
Currently, it *always* returns a set containing root table's attribute
numbers, even for UPDATE.  But with this change, for UPDATE, it will
start
returning the set containing the first subplan target rel's attribute
numbers, which could be any partition with arbitrarily different
attribute
numbers.


Anyway, I think that
the former is more like an improvement rather than a fix, so it would
be
better to leave that for another patch for PG12?


I agree, so I'm dropping the patch for 1.


OK, let's focus on #2!


See attached an updated version with changes as described above.


Looks good to me.  Thanks for the updated version!


Agreed on all points above.


Thanks for reviewing!

Best regards,
Etsuro Fujita



Re: Standby corruption after master is restarted

2018-04-26 Thread Michael Paquier
On Thu, Apr 26, 2018 at 07:53:04PM +0900, Kyotaro HORIGUCHI wrote:
> I think this behavior is a bug. XLogReadRecord considers the
> case, but palloc_extended() breaks it. So the attached adds a
> new flag, MCXT_ALLOC_NO_PARAMERR, to palloc_extended(), and
> allocate_recordbuf calls it with the new flag. That alone fixes
> the problem. However, the patch also frees the read-state buffer on
> encountering erroneous records, since such records can leave a
> too-large buffer allocated.

This problem is already discussed here:
https://commitfest.postgresql.org/18/1516/

And here is the thread:
https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05

Tsunakawa-san and I discussed a couple of approaches.  Extending
palloc_extended so that an incorrect length does not result in an error is
also something that crossed my mind, but the length handling is
different on the backend and the frontend, so I discarded the idea you
are proposing here and instead relied on a check with AllocSizeIsValid,
which gives a simpler patch:
https://www.postgresql.org/message-id/20180314052753.GA16179%40paquier.xyz

This got no interest from committers yet unfortunately, but I think that
this is a real problem which should be back-patched :(
--
Michael




Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-04-26 Thread Etsuro Fujita

(2018/04/25 18:51), Ashutosh Bapat wrote:

On Tue, Apr 24, 2018 at 4:49 PM, Etsuro Fujita
  wrote:

o 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:



Another thing I noticed about this patch is this:

postgres=# create table prt1 (a int, b int, c varchar) partition by range (a);
postgres=# create table prt1_p1 partition of prt1 FOR VALUES FROM (0) TO (250);
postgres=# create table prt1_p2 partition of prt1 FOR VALUES FROM (250) TO (500);
postgres=# insert into prt1 select i, i % 25, to_char(i, 'FM') from generate_series(0, 499) i where i % 2 = 0;
postgres=# analyze prt1;
postgres=# create table prt2 (a int, b int, c varchar) partition by range (b);
postgres=# create table prt2_p1 partition of prt2 FOR VALUES FROM (0) TO (250);
postgres=# create table prt2_p2 partition of prt2 FOR VALUES FROM (250) TO (500);
postgres=# insert into prt2 select i % 25, i, to_char(i, 'FM') from generate_series(0, 499) i where i % 3 = 0;
postgres=# analyze prt2;

postgres=# update prt1 t1 set c = 'foo' from prt2 t2 where t1::text = t2::text and t1.a = t2.b;
ERROR:  ConvertRowtypeExpr found where not expected

To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to
pull_vars_clause() in distribute_qual_to_rels() and
generate_base_implied_equalities_no_const() as well.


Thanks for the catch. I have updated patch with your suggested fix.


Thanks, but I don't think that's enough.  Consider:

postgres=# create table base_tbl (a int, b int);
postgres=# insert into base_tbl select i % 25, i from generate_series(0, 499) i where i % 6 = 0;

postgres=# update prt1 t1 set c = 'foo' from prt2 t2 left join (select a, b, 1 from base_tbl) ss(c1, c2, c3) on (t2.b = ss.c2) where t1::text = t2::text and t1.a = t2.b;

ERROR:  ConvertRowtypeExpr found where not expected

To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to 
pull_var_clause() in find_placeholders_in_expr() as well.


The patch doesn't pass that to pull_var_clause() in other places such as 
fix_placeholder_input_needed_levels() or planner.c, but I don't 
understand the reason why that's OK.


Sorry, I've not finished the review yet, so I'll continue that.

Thanks for updating the patch!

Best regards,
Etsuro Fujita



Re: unused_oids script is broken with bsd sed

2018-04-26 Thread John Naylor
On 4/26/18, Tom Lane  wrote:
> John Naylor  writes:
>> For those following along, these scripts still assume we're in the
>> catalog directory. I can hack on that part tomorrow if no one else
>> has.
>
> I didn't touch this point.

Patch 0002 is my attempt at this. Not sure how portable this is. It's
also clunkier than before in that you need to tell it where to find
Catalog.pm, since it seems Perl won't compile unless "use lib" points
to a string and not an expression. Suggestions welcome. I also edited
the boilerplate comments.

> I notice that duplicate_oids is now a good factor of 10 slower than it was
> before (~0.04 sec to ~0.4 sec, on my machine).  While this doesn't seem
> like a problem for manual use, it seems annoying as part of the standard
> build process, especially on slower buildfarm critters.

If you're concerned about build speed, I think the generated *_d.h
headers cause a lot more files to be rebuilt than before. I'll think
about how to recover that.

> I think we should
> do what you said upthread and teach genbki.pl to complain about duplicate
> oids, so that we can remove duplicate_oids from the build process.
>
> I went ahead and pushed things as-is so we can confirm that duplicate_oids
> has no portability issues that the buildfarm could find.  But let's try
> to get the build process change in place after a day or so.

Patch 0001
-moves the compile-time test into genbki.pl
-removes a usability quirk from unused_oids
-removes duplicate_oids (not sure if you intended this, but since
unused_oids has already been committed to report duplicates, we may as
well standardize on that) and cleans up after that fact
-updates the documentation.

-John Naylor
From 9cf5c517b6a9359fb1863e3315f5412860db45ff Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Thu, 26 Apr 2018 15:54:53 +0700
Subject: [PATCH 1/2] Remove standalone script duplicate_oids

Commit 5602265f770 added duplicate checking to unused_oids, so use
that script for manual checks. Also change it so that it doesn't
exit when encountering a duplicate -- doing so slows down fixing
multiple duplicates.

This left unused_oids as the only caller of FindAllOidsFromHeaders().
Since that function no longer offers any useful abstraction, remove
it from Catalog.pm and inline it into unused_oids.

Make genbki.pl responsible for compile time duplicate checks. This
simplifies the Makefile a bit and keeps from having to read the
catalog files twice.
---
 doc/src/sgml/bki.sgml  |  9 +++
 src/backend/catalog/Catalog.pm | 51 --
 src/backend/catalog/Makefile   |  3 +--
 src/backend/catalog/genbki.pl  | 31 ++-
 src/include/catalog/duplicate_oids | 29 --
 src/include/catalog/unused_oids| 49 +---
 6 files changed, 80 insertions(+), 92 deletions(-)
 delete mode 100755 src/include/catalog/duplicate_oids

diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index 5a4cd39..087272c 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -377,13 +377,12 @@
 script src/include/catalog/unused_oids.
 It prints inclusive ranges of unused OIDs (e.g., the output
 line 45-900 means OIDs 45 through 900 have not been
-allocated yet).  Currently, OIDs 1- are reserved for manual
+allocated yet), as well as any duplicate OIDs found.
+Currently, OIDs 1- are reserved for manual
 assignment; the unused_oids script simply looks
 through the catalog headers and .dat files
-to see which ones do not appear.  You can also use
-the duplicate_oids script to check for mistakes.
-(That script is run automatically at compile time, and will stop the
-build if a duplicate is found.)
+to see which ones do not appear.  In addition, if genbki.pl detects
+duplicates at compile time, it will stop the build.

 

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 0b057a8..823e09a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -384,55 +384,4 @@ sub FindDefinedSymbolFromData
 	die "no definition found for $symbol\n";
 }
 
-# Extract an array of all the OIDs assigned in the specified catalog headers
-# and their associated data files (if any).
-sub FindAllOidsFromHeaders
-{
-	my @input_files = @_;
-
-	my @oids = ();
-
-	foreach my $header (@input_files)
-	{
-		$header =~ /(.+)\.h$/
-		  or die "Input files need to be header files.\n";
-		my $datfile = "$1.dat";
-
-		my $catalog = Catalog::ParseHeader($header);
-
-		# We ignore the pg_class OID and rowtype OID of bootstrap catalogs,
-		# as those are expected to appear in the initial data for pg_class
-		# and pg_type.  For regular catalogs, include these OIDs.
-		if (!$catalog->{bootstrap})
-		{
-			push @oids, $catalog->{relation_oid}
-			  if ($catalog->{relation_oid});
-			push @oids, $catalog->{rowtype_oid} if ($catalog->{rowtype_oid});
-		}
-
-		# No

Re: Built-in connection pooling

2018-04-26 Thread Konstantin Knizhnik



On 26.04.2018 05:09, Michael Paquier wrote:

On Wed, Apr 25, 2018 at 03:42:31PM -0400, Robert Haas wrote:

The difficulty of finding them all is really the problem.  If we had a
reliable way to list everything that needs to be moved into session
state, then we could try to come up with a design to do that.
Otherwise, we're just swatting issues one by one and I bet we're
missing quite a few.

Hm?  We already know about the reset value of a parameter in
pg_settings, which points out to the value which would be used if reset
in a session, even after being reloaded.  If you compare it with the
actual setting value, wouldn't that be enough to know which parameters
have been changed at session-level by an application once connecting?
So you can pull out a list using such comparisons.  The context a
parameter is associated to can also help.
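
A minimal sketch of such a comparison (standard pg_settings columns):

SELECT name, setting, reset_val, context
  FROM pg_settings
 WHERE setting IS DISTINCT FROM reset_val;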
--
Michael

Sorry, maybe I do not understand you correctly. But GUCs are already
handled by the built-in connection pooler.
It is done at the guc.c level, so it doesn't matter how a GUC variable is
changed. All modified GUCs are saved into the session context and
restored on reschedule.

But there are some other static variables which are not related to
GUCs. Most of them are really associated with the backend, not with the
session, so they should not be handled on reschedule.
But there may be some variables which are intended to be session
specific, and locating those variables is a really nontrivial task.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Oddity in tuple routing for foreign partitions

2018-04-26 Thread Kyotaro HORIGUCHI
It seems almost fine for me, but just one point..

At Thu, 26 Apr 2018 17:36:44 +0900, Etsuro Fujita  
wrote in <5ae18f9c.6080...@lab.ntt.co.jp>
> (2018/04/26 15:35), Amit Langote wrote:
> > On 2018/04/26 12:43, Etsuro Fujita wrote:
> > +resultRelation == plan->nominalRelation)
> > + resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex;
> > +}
> 
> Seems like a better change than mine; because this simplifies the
> logic.

Please rewrite it to use a pointer reference, not an array reference,
if one mtstate logically holds just one resultRelInfo.

> > I added the block inside the if because I agree with your comment
> > below
> > that the changes to ExecInitPartitionInfo are better kept for later to
> > think more carefully about the dependencies it might break.
> 
> OK
> 
> >>> If we apply the other patch I proposed, resultRelation always points
> >>> to
> >>> the correct RTE.
> >>>
> > I tried to do that.  So, attached are two patches.
> >
> > 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
> >  InitResultRelInfo
> >
> > 2. v5 of the patch to fix the bug of foreign partitions
> >
> > Thoughts?
> >>
> >> Actually, I also thought the former when creating the patch, but I
> >> left
> >> that as-is because I'm not sure that would be really safe;
> >> ExecConstraints
> >> looks at the RT index via GetInsertedColumns/GetUpdatedColumns, so
> >> what I
> >> thought was: that might cause unexpected behavior.
> >
> > OK, I have to agree here that we better leave 1 to be looked at later.
> >
> > After this change, GetInsertedColumns/GetUpdatedColumns will start
> > returning a different set of columns in some cases than it does now.
> > Currently, it *always* returns a set containing root table's attribute
> > numbers, even for UPDATE.  But with this change, for UPDATE, it will
> > start
> > returning the set containing the first subplan target rel's attribute
> > numbers, which could be any partition with arbitrarily different
> > attribute
> > numbers.
> >
> >> Anyway, I think that
> >> the former is more like an improvement rather than a fix, so it would
> >> be
> >> better to leave that for another patch for PG12?
> >
> > I agree, so I'm dropping the patch for 1.
> 
> OK, let's focus on #2!
> 
> > See attached an updated version with changes as described above.
> 
> Looks good to me.  Thanks for the updated version!

Agreed on all points above.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Query is over 2x slower with jit=on

2018-04-26 Thread Andreas Joseph Krogh
On Wednesday 18 April 2018 at 18:26:03, Andreas Joseph Krogh
<andr...@visena.com> wrote:
On Wednesday 18 April 2018 at 17:50:55, Andres Freund <and...@anarazel.de> wrote:
On 2018-04-18 17:35:31 +0200, Andreas Joseph Krogh wrote:
 > With jit=on:
 > https://explain.depesz.com/s/vYB
 > Planning Time: 0.336 ms
 >  JIT:
 >   Functions: 716
 >   Generation Time: 78.404 ms
 >   Inlining: false
 >   Inlining Time: 0.000 ms
 >   Optimization: false
 >   Optimization Time: 43.916 ms
 >   Emission Time: 600.031 ms

 Any chance this is a debug LLVM build?


 > What's the deal with jit making it slower?

 JIT has cost, and sometimes it's not beneficial. Here our heuristics
 when to JIT appear to be a bit off. In the parallel world it's worse
 because the JITing is duplicated for parallel workers atm.
 
PostgreSQL is built with "--enable-debug --with-llvm". LLVM is the one that
comes with Ubuntu 17.10.
 
Some more info:
 
Without --enable-debug (this didn't seem to impact performance much):
 
First 5 executions (prepared-statement issued from psql)
 Planning Time: 47.634 ms
  JIT:
    Functions: 725
    Generation Time: 74.748 ms
    Inlining: true
    Inlining Time: 90.763 ms
    Optimization: true
    Optimization Time: 5822.516 ms
    Emission Time: 3089.127 ms
  Execution Time: 16375.996 ms
 
 
After 5 executions (prepared-statement issued from psql)
Planning Time: 0.385 ms
 JIT:
   Functions: 716
   Generation Time: 76.382 ms
   Inlining: false
   Inlining Time: 0.000 ms
   Optimization: false
   Optimization Time: 41.709 ms
   Emission Time: 613.074 ms
 Execution Time: 2031.830 ms
 
jit=off:
 
Planning Time: 0.171 ms
 Execution Time: 832.489 ms
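
For anyone reproducing this, the PostgreSQL 11 settings involved are
the following (values shown are the defaults; raising the thresholds
makes the planner less eager to JIT):

SET jit = on;                          -- master switch
SET jit_above_cost = 100000;           -- plan cost before JIT is used
SET jit_inline_above_cost = 500000;    -- cost before inlining
SET jit_optimize_above_cost = 500000;  -- cost before full optimization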
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com


Re: Built-in connection pooling

2018-04-26 Thread Konstantin Knizhnik



On 25.04.2018 20:02, Merlin Moncure wrote:


Would integrated pooling help the sharding case (genuinely curious)?
I don't quite have my head around the issue.  I've always wanted
pgbouncer to be able to do things like round robin queries to
non-sharded replica for simple load balancing but it doesn't (yet)
have that capability.  That type of functionality would not fit into
an in-core pooler AIUI.  Totally agree that the administrative
benefits (user/role/.conf/etc/etc) is a huge win.


Yes, pgbouncer is not intended to balance workload.
You should use ha-proxy or pg-pool. libpq now allows specifying multiple
URLs, but unfortunately right now libpq is not able to perform load
balancing.
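
For reference, a multi-host libpq URL looks like this (PostgreSQL 10+
syntax; the host names are examples only), and it gives failover rather
than balancing:

postgresql://host1:5432,host2:5432/mydb?target_session_attrs=any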

I do not understand how this is related to integrating connection pooling.
Such a pooler definitely should be external if you want to scatter queries
across different nodes.



The next most common problem are prepared statements breaking, which certainly 
qualifies as a session-level feature.

Yep.  The main workaround today is to disable them.  Having said that,
it's not that difficult to imagine hooking prepared statement creation
to a backend starting up (feature: run X,Y,Z SQL before running user
queries).


Sorry, I do not completely understand your idea.
Yes, it is somehow possible to simulate session semantics by prepending
all session-specific commands (mostly setting GUCs) to each SQL statement.
But it doesn't work for prepared statements: the idea of prepared
statements is that compilation of a statement should be done only once.
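
A minimal illustration, using a hypothetical table t:

PREPARE q (int) AS SELECT * FROM t WHERE a = $1;
EXECUTE q(1);  -- works on the backend that executed the PREPARE
-- With transaction-level pooling, a later EXECUTE may be routed to a
-- different backend and fail with:
-- ERROR:  prepared statement "q" does not exist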



  This might be less effort than, uh, moving backend
session state to a shareable object.  I'll go further; managing cache
memory consumption (say for pl/pgsql cached plans) is a big deal for
certain workloads.   The only really effective way to deal with that
is to manage the server connection count and/or recycle server
connections on intervals.  Using pgbouncer to control backend count is
a very effective way to deal with this problem and allowing
virtualized connections to each manage their independent cache would be
a step in the opposite direction. I very much like having control so
that I have exactly 8 backends for my 8 core server with 8 copies of
cache.


Database performance is mostly limited by disk, so the optimal number of
backends may be different from the number of cores.
But certainly the possibility to launch an "optimal" number of backends is one
of the advantages of built-in session pooling.




Advisory locks are a completely separate problem.  I suspect they
might be used more than you realize, and they operate against a very
fundamental subsystem of the database: the locking engine.  I'm
struggling as to why we would take another approach than 'don't use
the non-xact variants of them in a pooling environment'.

merlin


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Standby corruption after master is restarted

2018-04-26 Thread Kyotaro HORIGUCHI
Hello. I found how this occurs.

# added -hackers

I've seen a similar case caused by inadequate operation, but this time I
found that it can happen under normal operation.

At Tue, 17 Apr 2018 17:14:05 +0200, Emre Hasegeli  wrote in 

> > OK, this seems to confirm the theory that there's a race condition between
> > segment recycling and replicating. It's likely limited to short period after
> > a crash, otherwise we'd probably see many more reports.
> 
> I am suspicious about this part.  It cannot be happening 2 times to us
> and never to anybody else.  Maybe people do not report it because it
> is easy to deal with the problem.  You just delete the corrupted WAL
> file.  Or maybe there is something special in our infrastructure.
> 
> Thank you for your help so far.
> 
> >> This is the connection information.  Although the master shows SSL
> >> compression is disabled despite being explicitly asked for.
> 
> > Hmmm, that seems like a separate issue. When you say 'master shows SSL
> > compression is disabled' where do you see that?
> 
> I thought it might be related first, so I mentioned.  Then I found out
> on pg_stat_ssl view on the server that it is actually disabled.
> Probably the configuration of OpenSSL is disabling it.

A segment is not cleared on recycling. walreceiver writes WAL
record by record, so the startup process can see an arbitrary byte
sequence after the last valid record when the replication connection
is lost or the standby is restarted.

The following scenario results in the similar situation.

1. create master and standby and run.

   This happens easily if wal_keep_segments is set large
   (20 or so) on the master and 0 on the standby.

2. Write WAL until segment recycling happens on the standby. Explicit
   checkpoints on the standby make it faster. Several rounds may be
   required before we see a recycled segment on the standby.

   maybe_loop {
 master:
   create table t (a int);
   insert into t (select a from generate_series(0, 15) a);
   delete from t;
   checkpoint;

 standby:
   checkpoint;
   
   }

3. stop master

4. standby starts to complain that master is missing.

  At this time, the standby complains about several kinds of failure. I
  saw 'invalid record length' and 'incorrect prev-link' this
  time, and 'invalid resource manager ID' when mixing records of
  different sizes. If XLogReadRecord sees a record with an impossibly
  large tot_length there, it causes the palloc failure and the
  startup process dies.


5. If you see 'zero length record', it's nothing interesting.
  Repeat 3 and 4 to see another.


This doesn't seem to happen on the master, since XLogWrite writes in
units of pages, so after a crash redo always sees a zero-length record,
wrong magic, or a wrong pageaddr at page boundaries. On the other
hand, walreceiver writes in units of records, so redo can see an
arbitrary byte sequence as xl_tot_len, xl_prev, or xl_rmid while
fetching the next record. Especially the first of these can
cause FATAL with "invalid memory alloc".

I think this behavior is a bug. XLogReadRecord considers the
case, but palloc_extended() breaks it. So the attached adds a
new flag, MCXT_ALLOC_NO_PARAMERR, to palloc_extended(), and
allocate_recordbuf calls it with the new flag. That alone fixes
the problem. However, the patch also frees the read-state buffer on
encountering erroneous records, since such records can leave a
too-large buffer allocated.

After applying this patch, the behavior for the situation changes
as follows.

> LOG:  started streaming WAL from primary at 0/8300 on timeline 1
> LOG:  record length 1298694144 at 0/83C17B70 too long
> FATAL:  terminating walreceiver process due to administrator command
> LOG:  record length 1298694144 at 0/83C17B70 too long
> LOG:  record length 1298694144 at 0/83C17B70 too long
> LOG:  received promote request
> LOG:  redo done at 0/83C17B38
> LOG:  last completed transaction was at log time 2018-04-26 19:10:12.360253+09
> LOG:  selected new timeline ID: 2
> LOG:  archive recovery complete
> LOG:  database system is ready to accept connections


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 3a86f3497e..02c26224fc 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -165,7 +165,8 @@ allocate_recordbuf(XLogReaderState *state, uint32 reclength)
 	if (state->readRecordBuf)
 		pfree(state->readRecordBuf);
 	state->readRecordBuf =
-		(char *) palloc_extended(newSize, MCXT_ALLOC_NO_OOM);
+		(char *) palloc_extended(newSize,
+ MCXT_ALLOC_NO_OOM | MCXT_ALLOC_NO_PARAMERR);
 	if (state->readRecordBuf == NULL)
 	{
 		state->readRecordBufSize = 0;
@@ -601,10 +602,19 @@ err:
 
 /*
  * Invalidate the xlogreader's read state to force a re-read.
+ *
+ * read state might hold too large buffer on reading invalid record so release
+ * the buffer as well.
  */
 void
 XLogReaderInvalReadState(XLogReaderState *state)
 {
+	if

Re: "could not reattach to shared memory" on buildfarm member dory

2018-04-26 Thread Craig Ringer
On 24 April 2018 at 15:18, Magnus Hagander  wrote:

> Back when I was combating windows AV on a daily basis, this normally did not
> have the same effect. Just disabling the AV didn't actually remove the parts
> that caused issues, it just hid them. Actual uninstall is what was required.

Yep. Specifically, they tended to inject kernel hooks and/or load hook
DLLs that did funky and often flakey things. Often with poor awareness
of things like multiple processes opening one file for write at the
same time.

I think I heard that MS has cleaned up the situation with AV
considerably by offering more kernel infrastructure for it, and
restricting what you can do in terms of kernel extensions etc. But I
don't know how much.

In any case, do you think dropping a minidump at the point of failure
might be informative? It should contain the full memory mapping
information. For this purpose we could just create a crashdumps/
directory then abort() when we detect the error, and have the
buildfarm stop processing until someone can grab the tempdir with the
dumps, binaries, .pdb files, etc.

src/backend/port/win32/crashdump.c doesn't expose a helper function to
do all the dbghelp.dll messing around and create a crashdump, it only
allows that to be done via a crash handler. But it might make sense to
break out the actual "write a crash dump" part to a separately
callable function. I've looked at doing this before, but always got
stuck with the apparent lack of support in gdb or lldb to be used as a
library for self-dumping. You can always shell out to gcore I guess...
but ew. Or we can fork() and abort() the forked child like
https://github.com/RuntimeTools/gencore does, but again, ew.

I was thinking that maybe the buildfarm could just create crashdumps/
automatically, but then we'd need to have support infrastructure for
recording the Pg binaries and .pdb files along with the dumps,
rotating them so we don't run out of space, etc etc.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Remove mention in docs that foreign keys on partitioned tables are not supported

2018-04-26 Thread Amit Langote
On 2018/04/26 13:01, David Rowley wrote:
> The attached small patch removes the mention that partitioned tables
> cannot have foreign keys defined on them.
> 
> This has been supported since 3de241db

I noticed also that the item regarding row triggers might be obsolete as
of 86f575948c7, thanks again to Alvaro!  So, I updated your patch to take
care of that.

Thanks,
Amit
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 89735b4804..c2e4ec9ab9 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3315,8 +3315,7 @@ ALTER TABLE measurement ATTACH PARTITION 
measurement_y2008m02
  
   
While primary keys are supported on partitioned tables, foreign
-   keys referencing partitioned tables are not supported, nor are foreign
-   key references from a partitioned table to some other table.
+   keys referencing partitioned tables are not supported.
   
  
 
@@ -3340,13 +3339,6 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
version.
   
  
-
- 
-  
-   Row triggers, if necessary, must be defined on individual partitions,
-   not the partitioned table.
-  
- 
 
 
 
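For illustration, both of the lifted limitations correspond to things that
now work (a sketch against PostgreSQL 11; the trigger function name is
made up):

CREATE TABLE cities (city_id int PRIMARY KEY, name text);

-- Foreign key *from* a partitioned table to another table (3de241db):
CREATE TABLE measurement (
    city_id  int REFERENCES cities (city_id),
    logdate  date NOT NULL,
    peaktemp int
) PARTITION BY RANGE (logdate);

-- Row trigger on the partitioned table itself (86f575948c7), applied to
-- existing and future partitions:
CREATE TRIGGER measurement_audit
    AFTER INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE log_measurement_insert();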


Re: Racing DEADLOCK on PostgreSQL 9.3

2018-04-26 Thread Vladimir Svedov
try "Сombination of blocked and blocking activity" from
https://wiki.postgresql.org/wiki/Lock_Monitoring - it should show you the
originator.
pg_stat_activity.waiting only show affected
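
For reference, a condensed form of that wiki query (an abridged sketch;
the wiki version joins on more pg_locks fields, such as page, tuple and
virtualxid):

SELECT blocked.pid         AS blocked_pid,
       blocked_act.query   AS blocked_query,
       blocking.pid        AS blocking_pid,
       blocking_act.query  AS blocking_query
FROM pg_locks blocked
JOIN pg_stat_activity blocked_act ON blocked_act.pid = blocked.pid
JOIN pg_locks blocking
  ON  blocking.locktype = blocked.locktype
  AND blocking.database      IS NOT DISTINCT FROM blocked.database
  AND blocking.relation      IS NOT DISTINCT FROM blocked.relation
  AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
  AND blocking.pid <> blocked.pid
JOIN pg_stat_activity blocking_act ON blocking_act.pid = blocking.pid
WHERE NOT blocked.granted
  AND blocking.granted;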

On 25 April 2018 at 19:56, Jerry Sievers  wrote:

> Nick Dro  writes:
>
> > Hi,
> > I have a stock table.
> >
> > One of the users in the system ran this query:  update stock set
> > quantity=quantity-5 where stockid=100  (from his client application).
> > On the same time I ran from pg-admin this query:
> >
> > do $$
> > begin
> > alter table stock disable trigger stock_aftertrigger;
> > update stock set stock=0 where stockid=106;
> > alter table stock enable trigger stock_aftertrigger;
> > end; $$
> >
> > What actually happened is that both queries were stuck on waiting
> > (after 3 minutes I decided to investigate, as these queries should be
> > extremely fast!).
>
> I suspect your alter trigger job was blocked first by something else and
> the more trivial update blocked behind you, which is not a *deadlock*
> but a legit case of MVCC.
>
> A real deadlock should have been broken in about 1s by the deadlock
> detector, unless you are running a configuration with a huge
> deadlock_timeout setting.
>
> Because your ALTER statement needs a heavy lock, it can easily be
> blocked and, in so doing, block anything else whatsoever that also
> requires access to the same objects.
>
> > I ran also this query:
> >
> > SELECT
> > pid,
> > now() - pg_stat_activity.query_start AS duration,
> > query,
> > state, *
> > FROM pg_stat_activity
> > WHERE waiting
> >
> >
> > and both users were on waiting. When I stopped my query the other
> > user got an immediate result; then I reran mine, which also finished
> > immediately.
> > I don't understand why both queries were stuck. The logical thing is
> > that one runs and the other one waits (if locks are acquired etc.); it
> > doesn't make sense that both queries are waiting. Waiting for what
> > exactly?
> >
> >
> > Any thoughts on this issue?
> >
> >
>
> --
> Jerry Sievers
> Postgres DBA/Development Consulting
> e: postgres.consult...@comcast.net
> p: 312.241.7800
>
>


Re: Oddity in tuple routing for foreign partitions

2018-04-26 Thread Etsuro Fujita

(2018/04/26 15:35), Amit Langote wrote:

On 2018/04/26 12:43, Etsuro Fujita wrote:

(2018/04/25 17:29), Amit Langote wrote:

Hmm, I missed that we do need information from a proper RTE as well.  So,
I suppose we should be doing this instead of creating the RTE for the
foreign partition from scratch.

+	rte = list_nth(estate->es_range_table, resultRelation - 1);
+	rte = copyObject(rte);
+	rte->relid = RelationGetRelid(rel);
+	rte->relkind = RELKIND_FOREIGN_TABLE;


As I said upthread, we can use the RTE in the range table as-is if the
foreign table is one of the UPDATE subplan partitions or the target
specified in a COPY command.  So, I created the patch that way because
that would save cycles.  Why not just use that RTE in those cases?


Yes, I agree it's better to reuse the RTE in the cases we can.  So, how
does this look:

+	/*
+	 * If the foreign table is a partition, we need to create a new RTE
+	 * describing the foreign table for use by deparseInsertSql and
+	 * create_foreign_modify() below, after first copying the parent's
+	 * RTE and modifying some fields to describe the foreign partition to
+	 * work on. However, if this is invoked by UPDATE, the existing RTE
+	 * may already correspond to this partition if it is one of the
+	 * UPDATE subplan target rels; in that case, we can just use the
+	 * existing RTE as-is.
+	 */
+	rte = list_nth(estate->es_range_table, resultRelation - 1);
+	if (rte->relid != RelationGetRelid(rel))
+	{
+		rte = copyObject(rte);
+		rte->relid = RelationGetRelid(rel);
+		rte->relkind = RELKIND_FOREIGN_TABLE;
+
+		/*
+		 * For UPDATE, we must use the RT index of the first subplan
+		 * target rel's RTE, because the core code would have built
+		 * expressions for the partition, such as RETURNING, using that
+		 * RT index as varno of Vars contained in those expressions.
+		 */
+		if (plan && plan->operation == CMD_UPDATE &&
+			resultRelation == plan->nominalRelation)
+			resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+	}


Seems like a better change than mine, because it simplifies the logic.


I added the block inside the if because I agree with your comment below
that the changes to ExecInitPartitionInfo are better kept for later, so
that we can think more carefully about the dependencies they might break.


OK


If we apply the other patch I proposed, resultRelation always points to
the correct RTE.


I tried to do that.  So, attached are two patches.

1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
 InitResultRelInfo

2. v5 of the patch to fix the bug of foreign partitions

Thoughts?


Actually, I also considered the former when creating the patch, but I left
that as-is because I'm not sure it would really be safe; ExecConstraints
looks at the RT index via GetInsertedColumns/GetUpdatedColumns, so I
thought that might cause unexpected behavior.


OK, I have to agree here that we better leave 1 to be looked at later.

After this change, GetInsertedColumns/GetUpdatedColumns will, in some
cases, start returning a different set of columns than it does now.
Currently, it *always* returns a set containing the root table's attribute
numbers, even for UPDATE.  But with this change, for UPDATE, it will start
returning the set containing the first subplan target rel's attribute
numbers, which could be those of any partition with arbitrarily different
attribute numbers.


Anyway, I think that
the former is more of an improvement than a fix, so wouldn't it be
better to leave that for another patch for PG12?


I agree, so I'm dropping the patch for 1.


OK, let's focus on #2!


See attached an updated version with changes as described above.


Looks good to me.  Thanks for the updated version!

Best regards,
Etsuro Fujita



Re: Oddity in tuple routing for foreign partitions

2018-04-26 Thread Amit Langote
On 2018/04/26 2:59, Robert Haas wrote:
> On Tue, Apr 24, 2018 at 10:19 PM, Amit Langote
>  wrote:
>> I tried to do that.  So, attached are two patches.
>>
>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
>>InitResultRelInfo
>>
>> 2. v5 of the patch to fix the bug of foreign partitions
>>
>> Thoughts?
> 
> Are you splitting this into 2 patches here because there are 2
> separate issues?  If so, what are those issues?  There seem to be a
> bunch of interrelated changes here but I haven't seen a clear
> explanation of which ones are needed for which reasons.
> 
> I agree that fixing ExecInitPartitionInfo to use the right RT index in
> the first place sounds like a better idea than trying to piece
> together which RT index we should be using after-the-fact, but I need
> to better understand what problems we're trying to fix here before I
> can be sure if that's the strategy I want to endorse...

After considering Fujita-san's comment downthread, I think it might be
better to revisit 1 later, because it could break certain things that
rely on ri_RangeTableIndex being set to the index of the root partitioned
table's RT entry.  Fujita-san pointed out GetInsertedColumns et al, which
return a set of attribute numbers after looking up the RT entry using
the ri_RangeTableIndex of a given ResultRelInfo.  In some cases they would
start returning a different set for partitions than previously due to this
change.  Maybe there are other things that rely on what a partition's
ri_RangeTableIndex has been set to.

For now, I think we'll have to stick with a solution to determine which RT
index to pass to deparseInsertSql that involves a bit of
reverse-engineering, whereby we base that on where the partition's
ResultRelInfo seems to have originated from.  If it looks like it was
created by ExecInitModifyTable because the partition is one of the subplan
target rels, then use the RTE and the RT index as is.  If not,
ExecInitPartitionInfo would have created it because the partition was
chosen as the tuple routing target.  In that case, we have to freshly
create an RTE for the partition after copying most of the fields from the
parent's RTE.  We're done if the partition was selected as routing target
for an INSERT.  If for UPDATE, we also have to change the RT index to pass
to deparseInsertSql to that of the first subplan's target rel, because
that's what we use as varno of expressions contained in the partition's
ResultRelInfo.  Phew!

Thanks,
Amit




Re: Should we add GUCs to allow partition pruning to be disabled?

2018-04-26 Thread David Rowley
On 25 April 2018 at 09:59, Alvaro Herrera  wrote:
> Amit Langote wrote:
>> Although the config.sgml coverage of the new capabilities seems pretty
>> good, some may find their being mentioned in 5.10 Table Partitioning
>> helpful.  Or if we don't want to hijack 5.10.4, maybe create a 5.10.5.
>
> Can you (or someone) describe what would that section contain?

I've drafted and attached a patch showing how I think this should look.
It will likely need some tweaking, but it is probably a good starting
point for discussion.
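
For context, a minimal illustration of the knob the drafted section
documents (using the enable_partition_pruning GUC and the measurement
table from the existing ddl.sgml examples):

SHOW enable_partition_pruning;        -- on by default
SET enable_partition_pruning = off;   -- disables plan-time and run-time pruning
EXPLAIN SELECT * FROM measurement WHERE logdate >= DATE '2008-02-01';
RESET enable_partition_pruning;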

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


further_enable_partition_pruning_doc_updates.patch
Description: Binary data