Re: [HACKERS] optimizing vacuum truncation scans

2015-09-27 Thread Haribabu Kommi
On Tue, Aug 4, 2015 at 2:18 AM, Jeff Janes  wrote:
> On Mon, Jul 27, 2015 at 1:40 PM, Simon Riggs  wrote:
>>
>> On 22 July 2015 at 17:11, Jeff Janes  wrote:
>>>
>>> On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas 
>>> wrote:

 On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes 
 wrote:
 > Attached is a patch that implements the vm scan for truncation.  It
 > introduces a variable to hold the last blkno which was skipped during
 > the
 > forward portion.  Any blocks after both this blkno and after the last
 > inspected nonempty page (which the code is already tracking) must have
 > been
 > observed to be empty by the current vacuum.  Any other process
 > rendering the
 > page nonempty is required to clear the vm bit, and no other process
 > can set
 > the bit again during the vacuum's lifetime.  So if the bit is still
 > set, the
 > page is still empty without needing to inspect it.

 Urgh.  So if we do this, that forever precludes having HOT pruning set
 the all-visible bit.
>>>
>>>
>>> I wouldn't say forever, as it would be easy to revert the change if
>>> something more important came along that conflicted with it.
>>
>>
>> I think what is being said here is that someone is already using this
>> technique, or if not, then we actively want to encourage them to do so as an
>> extension or as a submission to core.
>>
>> In that case, I think the rely-on-VM technique sinks again, sorry Jim,
>> Jeff. Probably needs code comments added.
>
>
> Sure, that sounds like the consensus.  The VM method was very efficient, but
> I agree it is pretty fragile and restricting.
>
>>
>>
>> That does still leave the prefetch technique, so all is not lost.
>>
>> Can we see a patch with just prefetch, probably with a simple choice of
>> stride? Thanks.
>
>
> I probably won't get back to it this commit fest, so it can be set to
> returned with feedback.  But if anyone has good ideas for how to set the
> stride (or detect that it is on SSD and so is pointless to try) I'd love to
> hear about them anytime.

Here is a way to determine whether a table's data file resides on a
spinning disk or an SSD.

1. Find the device that the table's data file resides on, using the
following or similar:

df -P "filename" | awk 'NR==2{print $1}'

2. If the device is of type /dev/sd*, treat it as a disk system and
proceed with the prefetch optimization.

3. If we are not able to find the device details directly, we need to
get the information from the device-mapper layer.

Usually such devices are mapped like the following:

/dev/mapper/v** points to ../dm-*

4. Get the name of the "dm-*" device from the above details and check
whether it is an SSD or not by reading the following sysfs attribute:

/sys/block/dm-*/queue/rotational

5. If the value is 0 the device is non-rotational (an SSD); a value of
1 means a rotating disk.

The procedure described above works only on Linux; I haven't checked
other operating systems yet. Is this worth considering?
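
For illustration, a minimal Linux-only sketch of steps 4 and 5 above
(plain C, not PostgreSQL code; resolving a data file path down to its
underlying device through partitions or device-mapper is left out and
would follow the manual steps described earlier):

/*
 * Given a block device name such as "sda" or "dm-0", report whether the
 * kernel marks it as rotational.
 */
#include <stdio.h>

/* returns 1 for rotational (spinning disk), 0 for SSD, -1 on error */
static int
device_is_rotational(const char *devname)
{
    char    path[256];
    char    buf[8];
    FILE   *f;

    snprintf(path, sizeof(path), "/sys/block/%s/queue/rotational", devname);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL)
    {
        fclose(f);
        return -1;
    }
    fclose(f);
    return (buf[0] == '1') ? 1 : 0;
}

int
main(int argc, char **argv)
{
    int     rot = device_is_rotational(argc > 1 ? argv[1] : "sda");

    if (rot < 0)
        printf("could not determine device type\n");
    else
        printf("%s\n", rot ? "rotational (disk)" : "non-rotational (SSD)");
    return 0;
}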

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] Doubt in pgbench TPS number

2015-09-27 Thread Tatsuo Ishii
> I think that the degree of parallelism to consider is nclients, not
> nthreads: while connection time is accumulated in conn_time, other
> clients are possibly doing their transactions, in parallel, 

I'm not sure about this. I think pgbench will not start transactions
until all clients establish connections to PostgreSQL.

I found this while playing with pgpool-II. pgpool-II pre-forks child
processes, whose number is defined by the "num_init_children"
directive. What I observed was that pgbench keeps connecting to
pgpool-II until "-c" connections are established, so if "-c" is larger
than "num_init_children", no transactions start.

> even if it
> is in the same thread, so it is not "stopped time" for all clients. It
> starts to matter with "-j 1 -c 30" and slow transactions, the
> cumulated conn_time in each thread may be arbitrarily close to the whole
> time if there are many clients.
> 
> Now, I do not think that this tps computation makes that much
> sense. If you want to know the tps without reconnect, run without
> reconnecting... It is clear that I do not get this figure when running
> without -C, so maybe
> the tps with/without reconnection should be dropped?
> 
> Anyway, here is a patch, which:
>  - fixes the "exclude" computation (s/nthreads/nclients/)
>  - fixes the count total for skipped (s/threads/thread/)
>  - removes a useless parameter to doCustom
>(the conn_time is available through the thread parameter).

I have tested your patch. With it, the two tps figures agree much more
closely than before.

With the patch:
./pgbench -C -n -p 11002 -c 10 -T 30 -f test.sql  test
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 30 s
number of transactions actually processed: 2887
latency average: 103.914 ms
tps = 95.896850 (including connections establishing)
tps = 98.101662 (excluding connections establishing)

Without the patch:
./pgbench -C -n -p 11002 -c 10 -T 30 -f test.sql  test
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 30 s
number of transactions actually processed: 2887
latency average: 103.914 ms
tps = 95.919415 (including connections establishing)
tps = 124.732475 (excluding connections establishing)
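
For illustration, a rough sketch of the computation being discussed
(not the literal pgbench source; the connection-time total below is an
invented value chosen only to roughly reproduce the figures above).
The fix divides the accumulated connection time by nclients rather
than nthreads, because clients sharing a thread connect and run
concurrently, so their connection times overlap:

#include <stdio.h>

int
main(void)
{
    double  time_include = 30.0;    /* illustrative wall-clock duration (-T 30) */
    double  conn_total_time = 6.6;  /* invented sum of per-client connect time */
    int     normal_xacts = 2887;
    int     nclients = 10;
    int     nthreads = 1;

    printf("including connections: %.2f tps\n",
           normal_xacts / time_include);
    printf("excluding, old (/ nthreads): %.2f tps\n",
           normal_xacts / (time_include - conn_total_time / nthreads));
    printf("excluding, fixed (/ nclients): %.2f tps\n",
           normal_xacts / (time_include - conn_total_time / nclients));
    return 0;
}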

I'm going to commit your patch if there's no objection.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




[HACKERS] Patch: Revised documentation on base backups

2015-09-27 Thread Amir Rohan
On 09/28/2015 06:33 AM, Michael Paquier wrote:
> On Sun, Sep 27, 2015 at 11:27 AM, Amir Rohan wrote:
>> Further editing. See attached V3.
>
> I think that you should consider adding this patch to the next commit
> fest so that we do not lose track of it:
> https://commitfest.postgresql.org/7/
>


I've recently picked up the base backup section of the documentation:
http://www.postgresql.org/docs/9.4/static/continuous-archiving.html
(Section 24.3.2) and went through it like a tutorial. I'm a relative
newcomer to Postgres, so this was the real deal.

The docs seem to be in need of some improvement, primarily because
they have the feel of having been monkey-patched repeatedly in the
past. It needs someone with more experience to read through it and
comment on technical accuracy and on any knee-slapper errors I might
have introduced.


One pain point in rewriting is the definition of terms, or the lack
thereof, in the manual. I didn't find a section that defines terms
conclusively for consistent reference throughout the documentation.
For example, how should one refer to the directory housing
"postgresql.conf"? Is it:

1. The data directory
2. The storage directory
3. The PGDATA directory
4. The cluster directory
5. The server root directory
6. The config file directory
7. etc.

This lack of clear guidelines and uniformly applied terms
makes for some awkward verbal manoeuvring and compromises the
quality of the documentation (but admittedly, not its copiousness).

Amir

From f763259e6e91d69dd1c41f67169fa0666b1f82d9 Mon Sep 17 00:00:00 2001
From: root 
Date: Sat, 26 Sep 2015 11:50:43 +0300
Subject: [PATCH] Rewritten documentation on base backups V2


diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 7413666..e2727e1 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -427,8 +427,8 @@ tar -cf backup.tar /usr/local/pgsql/data
   
If simultaneous snapshots are not possible, one option is to shut down
the database server long enough to establish all the frozen snapshots.
-   Another option is to perform a continuous archiving base backup () because such backups are immune to file
+   Another option is to create a base backup using the continuous archiving feature
+   () because such backups are immune to file
system changes during the backup.  This requires enabling continuous
archiving just during the backup process; restore is done using
continuous archive recovery ().
@@ -752,60 +752,58 @@ test ! -f /mnt/server/archivedir/000100A90065 && cp pg_xlog/
Making a Base Backup
 

-The easiest way to perform a base backup is to use the
- tool. It can create
-a base backup either as regular files or as a tar archive. If more
-flexibility than  can provide is
-required, you can also make a base backup using the low level API
-(see ).
+A base backup consists of one or more WAL files and a .
+Together they are sufficient to recreate the database's state at some point in the past.
+Once a base backup is made, the WAL files that precede its creation are no longer
+necessary in order to recover the database to some later point in time.
+   
+
+   
+The interval between base backups should usually be
+chosen based on how much storage you want to expend on archived WAL
+files, since you must keep all the archived WAL files back to your
+last base backup.
+You should also consider how long you are prepared to spend
+recovering, if recovery should be necessary — the system will have to
+replay all those WAL segments, and that could take awhile if it has
+been a long time since the last base backup.

 

-It is not necessary to be concerned about the amount of time it takes
-to make a base backup. However, if you normally run the
-server with full_page_writes disabled, you might notice a drop
-in performance while the backup runs since full_page_writes is
-effectively forced on during backup mode.
+Creating a base backup may be a lengthy process if you have a lot of data.
+Be aware that if you normally run the server with full_page_writes
+disabled, you might notice a drop in performance while the backup runs since
+full_page_writes is effectively forced on during backup mode.

 

 To make use of the backup, you will need to keep all the WAL
 segment files generated during and after the file system backup.
-To aid you in doing this, the base backup process
-creates a backup history file that is immediately
-stored into the WAL archive area. This file is named after the first
-WAL segment file that you need for the file system backup.
-For example, if the starting WAL file is
-0001123455CD the backup history file will be
-named something like
-0001123455CD.007C9330.backup. (The second
-part of the file name stands for an exact position within the WAL
-file, and can ordinar

Re: [HACKERS] pg_xlogdump and .partial files

2015-09-27 Thread Michael Paquier
On Mon, Sep 28, 2015 at 9:47 AM, Peter Eisentraut  wrote:
> The pg_xlogdump man page states that it cannot read .partial WAL files
> and that they need to be renamed.  It seems to me with about the same
> number of lines we could just make it accept those files, no?

FWIW, the discussion has happened here:
http://www.postgresql.org/message-id/cab7npqs_x5-m+s9yo_xd7oxwq1etvwv-1tsjuqnovqyyazt...@mail.gmail.com
In short, I was under the impression that this was not worth it, and I
still am, given that this is not a common use case.

Now, we could allow a .partial file to be specified as the end
segment; I imagine that is actually what is disturbing for the user.
We would need to be careful in XLogDumpXLogRead, though, when doing
the segment switch while the last segment is being read. As
pg_xlogdump does not handle timeline jumps, this would let the user
look at the segments of a previous timeline up to the point where the
promoted standby switched, particularly if the .partial file, archived
by the promoted standby, and its complete version, archived by the old
master, are present at the same position. Still, this seems like a
very narrow use case to me, so opinions are welcome.
Regards,
-- 
Michael




[HACKERS] pg_xlogdump and .partial files

2015-09-27 Thread Peter Eisentraut
The pg_xlogdump man page states that it cannot read .partial WAL files
and that they need to be renamed.  It seems to me with about the same
number of lines we could just make it accept those files, no?




Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-27 Thread Gavin Wahl
> Yeah.  I would urgently recommend that people *not* try to build new
> things like planagg.c right now.  A large part of the point of upper
> planner path-ification is to have a less grotty way of dealing with
> things like specialized aggregate implementations.

Ok. I will wait and ask again later.


Re: [HACKERS] [PROPOSAL] Effective storage of duplicates in B-tree index.

2015-09-27 Thread Peter Geoghegan
On Sun, Sep 27, 2015 at 4:11 PM, Peter Geoghegan  wrote:
> Debugging this stuff is sometimes like keyhole surgery. If you could
> just see at/get to the structure that you care about, it would be 10
> times easier. Hopefully this tool makes it easier to identify problems.

I should add that the way the L&Y technique works, and the way that
Postgres code is generally very robust/defensive, can make direct
testing difficult. I have seen cases where a completely messed
up B-Tree still gave correct results most of the time, and was just
slower. That can happen, for example, because the "move right" thing
results in a degenerate linear scan of the entire index. The
comparisons in the internal pages were totally messed up, but it
"didn't matter" once a scan could get to leaf pages and could move
right and find the value that way.

I wrote amcheck because I thought it was scary how B-Tree indexes
could be *completely* messed up without it being obvious; what hope is
there of a test finding a subtle problem in their structure, then?
Testing the invariants directly seemed like the only way to have a
chance of not introducing bugs when adding new stuff to the B-Tree
code. I believe that adding optimizations to the B-Tree code will be
important in the next couple of years, and there is no other way to
approach it IMV.

-- 
Peter Geoghegan




Re: [HACKERS] [PROPOSAL] Effective storage of duplicates in B-tree index.

2015-09-27 Thread Peter Geoghegan
On Thu, Sep 3, 2015 at 8:35 AM, Anastasia Lubennikova
 wrote:
>> * Since everything is aligned within B-Tree, it's probably worth
>> considering the alignment boundaries when doing prefix compression, if
>> you want to go that way. We can probably imagine a world where
>> alignment is not required for B-Tree, which would work on x86
>> machines, but I can't see it happening soon. It isn't worth
>> compressing unless it compresses enough to cross an "alignment
>> boundary", where we're not actually obliged to store as much data on
>> disk. This point may be obvious, not sure.
>
> That is another reason why I doubt prefix compression, whereas effective
> duplicate storage doesn't have this problem.

Okay. That sounds reasonable. I think duplicate handling is a good project.

A good learning tool for Postgres B-Trees -- or at least one of the
better ones -- is my amcheck tool. See:

https://github.com/petergeoghegan/postgres/tree/amcheck

This is a tool for verifying that B-Tree invariants hold; it is loosely
based on pageinspect. It checks that certain conditions hold for
B-Trees. A simple example is that all items on each page must be in the
correct, logical order. Some invariants checked are far more
complicated, though, and span multiple pages or multiple levels. See
the source code for exact details. This tool works well when running
the regression tests (see stress.sql -- I used it with pgbench), with
no problems reported last I checked. It often only needs light locks
on relations, and single shared locks on buffers. (Buffers are copied
to local memory for the tool to operate on, much like
contrib/pageinspect).
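
For illustration, a toy sketch of the simplest invariant mentioned
above, that items on a page are in logically ascending order, using
plain integers as a stand-in for index tuples and a comparator standing
in for _bt_compare(); the real tool works on B-Tree pages and checks
much more than this:

#include <stdbool.h>
#include <stdio.h>

static int
compare_items(int a, int b)     /* stand-in for _bt_compare() */
{
    return (a > b) - (a < b);
}

static bool
page_items_in_order(const int *items, int nitems)
{
    for (int i = 1; i < nitems; i++)
    {
        if (compare_items(items[i - 1], items[i]) > 0)
        {
            fprintf(stderr, "invariant violated at offset %d\n", i);
            return false;
        }
    }
    return true;
}

int
main(void)
{
    int     page[] = {1, 3, 3, 7, 42};

    printf("in order: %s\n", page_items_in_order(page, 5) ? "yes" : "no");
    return 0;
}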

While I have yet to formally submit amcheck to a CF (I once asked for
input on the goals for the project on -hackers), the comments are
fairly comprehensive, and it wouldn't be too hard to adapt this to
guide your work on duplicate handling. Maybe it'll happen for 9.6.
Feedback appreciated.

The tool calls _bt_compare() for many things currently, but doesn't
care about many lower level details, which is (very roughly speaking)
the level that duplicate handling will work at. You aren't actually
proposing to change anything about the fundamental structure that
B-Tree indexes have, so the tool could be quite useful and low-effort
for debugging your code during development.

Debugging this stuff is sometimes like keyhole surgery. If you could
just see at/get to the structure that you care about, it would be 10
times easier. Hopefully this tool makes it easier to identify problems.

-- 
Peter Geoghegan




Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-27 Thread Tom Lane
Alvaro Herrera  writes:
> Gavin Wahl wrote:
>> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
>> just find the page range with the largest/smallest value, and then only
>> scan that one. Would that be hard to implement? I'm interested in working
>> on it if someone can give me some pointers.

> I think this means you need to represent this operation as a specific
> Path in some way.  See build_minmax_path() and its callers in planagg;
> you probably need to tweak preprocess_minmax_aggregates() to consider
> this.

> This doesn't look like a simple project to me, mind.

>> Somewhat harder but still possible would be using BRIN indexes to
>> accelerate ORDER BY. This would require a sorting algorithm that can take
>> advantage of mostly-sorted inputs. You would sort the page ranges by their
>> minimum or maximum value, then feed the sorting algorithm in that order.

> I wouldn't know where to start for this.  Maybe once Tom is done with
> planner rejiggering it would make sense to consider looking at how to do
> it.

Yeah.  I would urgently recommend that people *not* try to build new
things like planagg.c right now.  A large part of the point of upper
planner path-ification is to have a less grotty way of dealing with
things like specialized aggregate implementations.

(And yes, I am working on that.  Honest.)

regards, tom lane




Re: [HACKERS] Building with MinGW

2015-09-27 Thread Andrew Dunstan



On 09/27/2015 02:11 PM, Jeff Janes wrote:
Has anyone had success following the instructions at 
https://wiki.postgresql.org/wiki/Building_With_MinGW#Installing_Git 
recently?


I've followed the instructions to set up the build environment on a 
Windows box, and I can't build from git.  I can from the nightly 
tarball.  So I think the most recent environment you get when 
following these instructions has suffered some regression in the 
bison, yacc, or something else involved in building from git but not 
from tarball.



The Msys DTK contains a bison and flex that are adequate for our 
purposes. Are you sure you installed the developer tools?


I just followed this recipe. However, I had to do the following 
additional small steps on a fresh Windows Server 2012 instance on Amazon:


 * the git installer had to be run as Administrator or it got
   permissions violations
 * I had to add this to the end of the MSys PATH: ":/c/prog/git/cmd" (I
   installed it in c:/prog/git)
 * I had to add "C:/mingw /mingw" to the fstab


Given that, I was able to clone the code, build it and run the 
regression set without incident. This was using the gcc that comes with 
the mingw installer, not the separate one from mingw-w64.





Also, is there any way to know the exact git commit which the nightly 
tarballs are built against?






Not to my knowledge. Perhaps the tarball generator should create a file 
reporting the head commit on the branch.


cheers

andrew





Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-27 Thread Thomas Munro
On Mon, Sep 28, 2015 at 9:58 AM, Gavin Wahl  wrote:
> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> just find the page range with the largest/smallest value, and then only scan
> that one.

You might need to scan more than that if you don't find any rows that
are visible.

> Would that be hard to implement? I'm interested in working on it
> if someone can give me some pointers.
>
> Somewhat harder but still possible would be using BRIN indexes to accelerate
> ORDER BY. This would require a sorting algorithm that can take advantage of
> mostly-sorted inputs. You would sort the page ranges by their minimum or
> maximum value, then feed the sorting algorithm in that order.

Currently you get a Bitmap Index Scan and Bitmap Heap Scan, and then a
Sort node (quicksort or external sort).  So the sort is already
receiving data sorted in BRIN-block order, isn't it?  Are you saying
that the sort node should switch to something like insertion sort in
this case?

http://www.sorting-algorithms.com/nearly-sorted-initial-order
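
For what it's worth, here is the textbook illustration of why
nearly-sorted input is cheap for insertion sort (a generic example, not
a proposal for the executor): the inner loop does one shift per
inversion, so input that is already in BRIN-block order with only local
disorder sorts in near-linear time.

#include <stdio.h>

static void
insertion_sort(int *a, int n)
{
    for (int i = 1; i < n; i++)
    {
        int     key = a[i];
        int     j = i - 1;

        while (j >= 0 && a[j] > key)    /* each shift undoes one inversion */
        {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

int
main(void)
{
    int     v[] = {1, 2, 4, 3, 5, 7, 6, 8};     /* mostly sorted already */

    insertion_sort(v, 8);
    for (int i = 0; i < 8; i++)
        printf("%d ", v[i]);
    printf("\n");
    return 0;
}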

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-27 Thread Alvaro Herrera
Gavin Wahl wrote:
> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> just find the page range with the largest/smallest value, and then only
> scan that one. Would that be hard to implement? I'm interested in working
> on it if someone can give me some pointers.

I think this means you need to represent this operation as a specific
Path in some way.  See build_minmax_path() and its callers in planagg;
you probably need to tweak preprocess_minmax_aggregates() to consider
this.

This doesn't look like a simple project to me, mind.

> Somewhat harder but still possible would be using BRIN indexes to
> accelerate ORDER BY. This would require a sorting algorithm that can take
> advantage of mostly-sorted inputs. You would sort the page ranges by their
> minimum or maximum value, then feed the sorting algorithm in that order.

I wouldn't know where to start for this.  Maybe once Tom is done with
planner rejiggering it would make sense to consider looking at how to do
it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-27 Thread Gavin Wahl
It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
just find the page range with the largest/smallest value, and then only
scan that one. Would that be hard to implement? I'm interested in working
on it if someone can give me some pointers.

Somewhat harder but still possible would be using BRIN indexes to
accelerate ORDER BY. This would require a sorting algorithm that can take
advantage of mostly-sorted inputs. You would sort the page ranges by their
minimum or maximum value, then feed the sorting algorithm in that order.
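
For illustration, a toy model of the MAX/MIN idea in the first
paragraph above (plain C, not planner or BRIN code): keep a per-range
summary of min and max, pick the range with the largest maximum, and
scan only that range for MAX(). As noted elsewhere in the thread, the
real thing would have to fall back to more ranges if no visible row is
found there.

#include <stdio.h>

typedef struct RangeSummary
{
    int     minval;
    int     maxval;
} RangeSummary;

int
main(void)
{
    RangeSummary ranges[] = {{1, 80}, {40, 950}, {10, 300}};
    int     nranges = 3;
    int     best = 0;

    /* pick the block range whose summary maximum is largest */
    for (int i = 1; i < nranges; i++)
        if (ranges[i].maxval > ranges[best].maxval)
            best = i;

    printf("scan only block range %d (summary max %d) to answer MAX()\n",
           best, ranges[best].maxval);
    return 0;
}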


Re: [HACKERS] Rework the way multixact truncations work

2015-09-27 Thread Andres Freund
On 2015-09-27 14:21:08 -0500, Jim Nasby wrote:
> IMHO just doing a LOG for something this serious isn't enough; it should at
> least be a WARNING.

In Postgres, LOG is, somewhat confusingly, more severe than WARNING.

> I think the concern about upgrading a replica before the master is valid; is
> there some way we could override a PANIC when that's exactly what someone
> is trying to do? Check for a special file maybe?

I don't understand this concern - that's just the situation we have in
all released branches today.

> + boolsawTruncationInCkptCycle;
> What happens if someone downgrades the master, back to a version that no
> longer logs truncation? (I don't think assuming that the replica will need
> to restart if that happens is a safe bet...)

It'll just do legacy truncation again - without requiring a restart on
the standby.

> - if (MultiXactIdPrecedes(oldestMXact, earliest))
> + /* If there's nothing to remove, we can bail out early. */
> + if (MultiXactIdPrecedes(oldestMulti, earliest))
>   {
> - DetermineSafeOldestOffset(oldestMXact);
> + LWLockRelease(MultiXactTruncationLock);
> If/when this is backpatched, would it be safer to just leave this alone?

What do you mean? This can't just be left alone in isolation?




Re: [HACKERS] Rework the way multixact truncations work

2015-09-27 Thread Jim Nasby

On 9/23/15 1:48 PM, Andres Freund wrote:

Honestly, I wonder whether this message
>ereport(LOG,
>(errmsg("performing legacy multixact truncation"),
> errdetail("Legacy truncations are sometimes performed when replaying WAL from an older primary."),
> errhint("Upgrade the primary, it is susceptible to data corruption.")));
>shouldn't rather be a PANIC.  (The main reason not to, I think, is that
>once you see this, there is no way to put the standby in a working state
>without recloning).

Huh? The behaviour in that case is still better than what we have in
9.3+ today (not delayed till the restartpoint). Don't see why that
should be a panic. That'd imo make it pretty much impossible to upgrade
a pair of primary/master where you normally upgrade the standby first?


IMHO just doing a LOG for something this serious isn't enough; it should at 
least be a WARNING.


I think the concern about upgrading a replica before the master is 
valid; is there some way we could override a PANIC when that's exactly 
what someone is trying to do? Check for a special file maybe?


+   boolsawTruncationInCkptCycle;
What happens if someone downgrades the master, back to a version that no 
longer logs truncation? (I don't think assuming that the replica will 
need to restart if that happens is a safe bet...)


-   if (MultiXactIdPrecedes(oldestMXact, earliest))
+   /* If there's nothing to remove, we can bail out early. */
+   if (MultiXactIdPrecedes(oldestMulti, earliest))
{
-   DetermineSafeOldestOffset(oldestMXact);
+   LWLockRelease(MultiXactTruncationLock);
If/when this is backpatched, would it be safer to just leave this alone?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




[HACKERS] Building with MinGW

2015-09-27 Thread Jeff Janes
Has anyone had success following the instructions at
https://wiki.postgresql.org/wiki/Building_With_MinGW#Installing_Git
recently?

I've followed the instructions to set up the build environment on a Windows
box, and I can't build from git.  I can from the nightly tarball.  So I
think the most recent environment you get when following these instructions
has suffered some regression in the bison, yacc, or something else involved
in building from git but not from tarball.

Also, is there any way to know the exact git commit which the nightly
tarballs are built against?

Cheers,

Jeff


Re: [HACKERS] Patch: Optimize memory allocation in function 'bringetbitmap'

2015-09-27 Thread Alvaro Herrera
Jinyu Zhang wrote:
> 
> BRIN Scan: Optimize memory allocation in function 'bringetbitmap'.
> We can allocate memory for some pointers once, before the long loop,
> instead of allocating memory inside the loop.
> 
> Before optimizing code (warm run)
> postgres=# select count(*) from lineitem where l_orderkey=1;
> Time: 456.219 ms
> 
> After optimizing code (warm run)
> postgres=# select count(*) from lineitem where l_orderkey=1;
> Time: 349.219 ms

Hmm, did you measure this in a c-assert-enabled build?  Those are slower
in allocation because of the memory-clobber thingy; without that, the
allocations would be a lot faster, so perhaps the gains are not so
interesting.  Still, it's a lot of palloc/pfree calls that do not happen
with your patch, so perhaps it's still worthwhile, but please do measure
it.

However I wonder if it would be simpler to have the dtup structure have
the pointers, so that you can pass it as NULL in the first call and then
followup calls reuse the one allocated in the first call.  That makes
the callers a bit less ugly, I think.
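
For illustration, a sketch of that calling pattern with made-up
structure names (not the actual BRIN data structures): the work area
hangs off the deformed-tuple struct, is allocated on the first call,
and is reused by every later call.

#include <stdlib.h>

typedef struct WorkArea
{
    int    *values;             /* lazily allocated buffer */
    int     nvalues;
} WorkArea;

typedef struct DeformedTuple
{
    WorkArea   *work;           /* NULL before the first call */
} DeformedTuple;

static void
process_tuple(DeformedTuple *dtup, int nkeys)
{
    if (dtup->work == NULL)
    {
        /* first call: allocate once (palloc in the real code) */
        dtup->work = malloc(sizeof(WorkArea));
        dtup->work->values = malloc(nkeys * sizeof(int));
        dtup->work->nvalues = nkeys;
    }
    /* ... reuse dtup->work->values on every later call ... */
}

int
main(void)
{
    DeformedTuple dtup = {NULL};

    for (int i = 0; i < 1000; i++)
        process_tuple(&dtup, 16);

    free(dtup.work->values);
    free(dtup.work);
    return 0;
}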

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_dump LOCK TABLE ONLY question

2015-09-27 Thread Tom Lane
Filip Rembiałkowski  writes:
> I'm running pg_dump constrained to one schema. It appears that pg_dump runs

> "LOCK TABLE %s IN ACCESS SHARE MODE" for each table.

> Naturally it makes sense, but...

> This schema has a table that serves as parent for thousands of child
> tables (via INHERITS).

> So effectively, to dump this schema, I have to LOCK all these tables
> not only parent.

They'd all end up locked anyway, wouldn't they?

regards, tom lane




Re: [HACKERS] PATCH: use foreign keys to improve join estimates v1

2015-09-27 Thread David Rowley
On 26 September 2015 at 01:57, Tomas Vondra 
wrote:

> Hi,
>
> On 09/25/2015 03:39 AM, David Rowley wrote:
>
>> I looked at this again, and I'm not all that sure it's possible to
>> assume that 1.0 /  is valid when there's more than one
>> relation at either side of the join.
>>
>
>> My reasoning for this is that the whole basis for the patch is that
>> if we find a foreign key match, then we can be sure enough, as far as
>> row estimations go, that exactly 1 tuple will exist matching that
>> condition. This assumption of the 1 tuple no longer holds when 2
>> relations have already been joined, as this can either cause tuple
>> duplication, or elimination.
>>
>
> I don't see why that would be the case. Of course, you need to take the
> right , i.e. the "target" of the foreign key (the table with UNIQUE
> constraint) so that the selectivity matches the fact that exactly 1 tuple
> (on the PK side) matches.
>

hmm, ok. You're right. It appears I was a bit confused, but thanks for
explaining it again. I get it now.

I've been working on this again. I've put back the code that you wrote for
looping over each combination of relations from either side of the join.

I've also added some code to get around the problem with eclass joins and
the RestrictInfo having some alternative Vars that don't belong to the
foreign key. Basically I'm just checking if the RestrictInfo has a
parent_ec, and if it does just loop over the members to try and find the
Vars that belong to the foreign key. I've tested it with the following, and
it seems to work:

create table a as select i as a_id1, i as a_id2, i as dummy1 from
generate_series(0,999) s(i);
alter table a add unique (a_id1, a_id2);
create table b as select i as b_id1, i as b_id2 from generate_series(0,332)
s(i);

analyze a;
analyze b;

alter table b add foreign key (b_id1, b_id2) references a (a_id1, a_id2);

explain analyze select * from a inner join b on a.dummy1 = b.b_id1 and
a.a_id2 = b.b_id2 where a.a_id1 = a.dummy1;

 QUERY PLAN
---
 Hash Join  (cost=18.57..26.41 rows=2 width=20) (actual time=0.775..1.046
rows=333 loops=1)
   Hash Cond: ((b.b_id1 = a.dummy1) AND (b.b_id2 = a.a_id2))
   ->  Seq Scan on b  (cost=0.00..5.33 rows=333 width=8) (actual
time=0.013..0.046 rows=333 loops=1)
   ->  Hash  (cost=18.50..18.50 rows=5 width=12) (actual time=0.737..0.737
rows=1000 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 51kB
 ->  Seq Scan on a  (cost=0.00..18.50 rows=5 width=12) (actual
time=0.014..0.389 rows=1000 loops=1)
   Filter: (dummy1 = a_id1)

The non-patched version estimates 1 row. The patched estimates 2 rows, but
that's due to the bad estimate on dummy1 = a_id1.

The 2 comes from ceil(5 * 0.333).

Perhaps you have a better test case for this?

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


estimation-with-fkeys-v2_davidv3.patch
Description: Binary data



[HACKERS] pg_dump LOCK TABLE ONLY question

2015-09-27 Thread Filip Rembiałkowski
Hi.

I'm running pg_dump constrained to one schema. It appears that pg_dump runs

"LOCK TABLE %s IN ACCESS SHARE MODE" for each table.

Naturally it makes sense, but...

This schema has a table that serves as parent for thousands of child
tables (via INHERITS).

So effectively, to dump this schema, I have to LOCK all these tables
not only parent.

pg_dump does it automatically; I checked in current trunk - it does
not add the ONLY keyword. Should it?

I wonder if it counts as a bug.
If not a bug, maybe it counts as a feature request?

Thanks.




[HACKERS] Patch: Optimize memory allocation in function 'bringetbitmap'

2015-09-27 Thread Jinyu Zhang

BRIN Scan: Optimize memory allocation in function 'bringetbitmap'.
We can allocate memory for some pointers once, before the long loop,
instead of allocating memory inside the loop.
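
For illustration, a minimal sketch of the general idea, assuming a
generic hot loop (this is not the actual bringetbitmap code): hoist the
per-iteration allocation out of the loop and reuse the buffer.

#include <stdlib.h>
#include <string.h>

#define NKEYS 16

/* before: allocate and free inside every iteration */
static void
scan_slow(int nranges)
{
    for (int i = 0; i < nranges; i++)
    {
        int    *values = malloc(NKEYS * sizeof(int));   /* palloc in the real code */

        memset(values, 0, NKEYS * sizeof(int));
        /* ... evaluate the scan keys against this block range ... */
        free(values);                                   /* pfree in the real code */
    }
}

/* after: allocate once, reset in place each iteration */
static void
scan_fast(int nranges)
{
    int    *values = malloc(NKEYS * sizeof(int));

    for (int i = 0; i < nranges; i++)
    {
        memset(values, 0, NKEYS * sizeof(int));
        /* ... evaluate the scan keys against this block range ... */
    }
    free(values);
}

int
main(void)
{
    scan_slow(1000);
    scan_fast(1000);
    return 0;
}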

Before optimizing code (warm run)
postgres=# select count(*) from lineitem where l_orderkey=1;
 count
---
 6
(1 row)

Time: 456.219 ms

After optimizing code (warm run)
postgres=# select count(*) from lineitem where l_orderkey=1;
 count
---
 6
(1 row)

Time: 349.219 ms

The following shows the DDL of this test case.
CREATE TABLE LINEITEM ( L_ORDERKEY      INTEGER NOT NULL,
                        L_PARTKEY       INTEGER NOT NULL,
                        L_SUPPKEY       INTEGER NOT NULL,
                        L_LINENUMBER    INTEGER NOT NULL,
                        L_QUANTITY      DECIMAL(15,2) NOT NULL,
                        L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
                        L_DISCOUNT      DECIMAL(15,2) NOT NULL,
                        L_TAX           DECIMAL(15,2) NOT NULL,
                        L_RETURNFLAG    CHAR(1) NOT NULL,
                        L_LINESTATUS    CHAR(1) NOT NULL,
                        L_SHIPDATE      DATE NOT NULL,
                        L_COMMITDATE    DATE NOT NULL,
                        L_RECEIPTDATE   DATE NOT NULL,
                        L_SHIPINSTRUCT  CHAR(25) NOT NULL,
                        L_SHIPMODE      CHAR(10) NOT NULL,
                        L_COMMENT       VARCHAR(44) NOT NULL);

copy lineitem from '/home/jinyu/mywork/dbgen/lineitem.tbl' delimiter '|';
create index brinLineitem on lineitem using brin(L_ORDERKEY) 
with(pages_per_range = 1);

Jinyu Zhang






patch_optimize_mem
Description: Binary data


