Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Hannu Krosing
On 12/11/2013 01:44 AM, Greg Stark wrote:
> On Wed, Dec 11, 2013 at 12:40 AM, Simon Riggs  wrote:
>> When we select a block we should read all rows on that block, to help
>> identify the extent of clustering within the data.
> So how do you interpret the results of the sample read that way that
> doesn't introduce bias?
>
Initially/experimentally we could just compare it to our current approach :)

That is, implement *some* block sampling and then check it against what
we currently have. Then figure out the bad differences. Rinse. Repeat.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





[HACKERS] invalid magic number in log segment

2013-12-10 Thread Erikjan Rijkers
I don't know whether the below constitutes a bug, but:

Daily (sometimes even more often) I recompile 9.4devel (after git pull)  to run 
a large dev database (100 GB or so).

To avoid frequent initdb and many-hour-restore of data, I do this only when the 
following two #defines are unchanged:
  CATALOG_VERSION_NO in src/include/catalog/catversion.h, and
  PG_CONTROL_VERSION in src/include/catalog/pg_control.h

the goal being to always run the latest db, without having to reload the ~100 
GB dev db unexpectedly at inconvenient times.

Generally, this works OK.

However, in the last few weeks I have sometimes gotten, after such recompiling, 'invalid
magic number' errors from which I don't know how to recover (apparently it means
an initdb is needed, and I then have to reload the database).


2013-12-11 00:15:25.627 CET 25304 LOG:  received smart shutdown request
2013-12-11 00:15:25.631 CET 25306 LOG:  shutting down
2013-12-11 00:15:25.904 CET 25306 LOG:  database system is shut down
2013-12-11 08:11:59.858 CET 25490 LOG:  database system was shut down at 
2013-12-11 00:15:25 CET
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid magic number D078 in log 
segment 000100630034, offset 0
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid primary checkpoint record
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid magic number D078 in log 
segment 000100630034, offset 0
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid secondary checkpoint record
2013-12-11 08:11:59.901 CET 25490 PANIC:  could not locate a valid checkpoint 
record
2013-12-11 08:12:00.326 CET 25492 FATAL:  the database system is starting up
2013-12-11 08:12:01.328 CET 25493 FATAL:  the database system is starting up
2013-12-11 08:12:01.682 CET 25489 LOG:  startup process (PID 25490) was 
terminated by signal 6: Aborted
2013-12-11 08:12:01.682 CET 25489 LOG:  aborting startup due to startup process 
failure


My question is two-fold:

1. (general:)  is this 'invalid magic number' unexpected, and should it always be
reported?

2. (for my setup specifically:)  is there any way that I can recognize such an
impending 'invalid magic number' state beforehand, at the code-base level?
Can the db be recovered from easily? (Although this dev database is expendable,
it takes many hours to rebuild; I'd like to avoid that if possible.)

thanks,

Erikjan Rijkers











Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Peter Geoghegan
On Tue, Dec 10, 2013 at 10:34 PM, Simon Riggs  wrote:
> On 11 December 2013 01:27, Sergey E. Koposov  wrote:
>> For what it's worth.
>>
>> I'll quote Chaudhuri et al. first line from the abstract about the block
>> sampling.
>> "Block-level sampling is far more efficient than true uniform-random
>> sampling over a large database, but prone to  significant errors if used to
>> create database statistics."
>
> This glosses over the point that both SQLServer and Oracle use this technique.

That seems like an unusual omission for Microsoft Research to have made.

I didn't read that paper, because undoubtedly it's all patented. But
before I figured that out, after finding it on Google randomly, I did
read the first couple of paragraphs, which more or less said "what
follows - the entire paper - is an explanation as to why it's okay
that we do block sampling".


-- 
Peter Geoghegan




Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Simon Riggs
On 11 December 2013 01:27, Sergey E. Koposov  wrote:
> For what it's worth.
>
> I'll quote Chaudhuri et al. first line from the abstract about the block
> sampling.
> "Block-level sampling is far more efficient than true uniform-random
> sampling over a large database, but prone to  significant errors if used to
> create database statistics."

This glosses over the point that both SQLServer and Oracle use this technique.

> And after briefly glancing through the paper, my opinion on why it works is
> that after making one version of the statistics they cross-validate, see how
> well it does, and then collect more if the cross-validation error is large
> (for example because the data is clustered). Without this bit, as far as I
> can tell, a simple block-based sampler will be bound to make catastrophic
> mistakes depending on the distribution.

I don't think it's true that a block-based sampler will be *bound* to
make "catastrophic mistakes". They can clearly happen, just as they
can with random samples, hence the need for a parameter to control the
sample size.

Realistically, I never heard of an Oracle DBA doing advanced
statistical mathematics before setting the sample size on ANALYZE. You
use the default and bump it up if the sample is insufficient for the
data.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Time-Delayed Standbys

2013-12-10 Thread KONDO Mitsumasa

(2013/12/10 18:38), Andres Freund wrote:

"master PITR"? What's that? All PITR is based on recovery.conf and thus
not really a "master"?
"master PITR" is PITR with "standby_mode = off". It's just recovery from 
basebackup. They have difference between "master PITR" and "standby" that the 
former will be independent timelineID, but the latter is same timeline ID taht 
following the master sever. In the first place, purposes are different.



> Why should we prohibit using this feature in PITR? I don't see any
> advantage in doing so. If somebody doesn't want the delay, they
> shouldn't set it in the configuration file. End of story.
Unfortunately, there are a lot of stupid people in the world... I think you have
such clients, too.



> There's not really that meaningful a distinction between PITR and
> replication using archive_command. Especially when using
> *pause_after.
It is meaningless in "master PITR": the server will become a master with a new
timeline ID at an unexpected time.



> I think this feature will be used in a lot of scenarios in
> which PITR is currently used.

We have to judge which is better: gaining something with potential, or protecting
stupid users.
And we had better wait for the author's comment...

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center




Re: [HACKERS] Optimize kernel readahead using buffer access strategy

2013-12-10 Thread KONDO Mitsumasa

(2013/12/10 22:55), Claudio Freire wrote:

> On Tue, Dec 10, 2013 at 5:03 AM, KONDO Mitsumasa  wrote:
>
>> I revised this patch and re-ran the performance test; it works correctly on
>> Linux with no compile warnings. I added a GUC, enable_kernel_readahead, in
>> the new version. When this GUC is on (the default), it uses
>> POSIX_FADV_NORMAL, i.e. the general OS readahead. When it is off, it uses
>> POSIX_FADV_RANDOM or POSIX_FADV_SEQUENTIAL as judged from the buffer hint in
>> Postgres, so readahead is optimized by Postgres. This parameter can be
>> changed anywhere and at any time within a transaction.


> I'd change the naming to

OK. I agree that "on" or "off" naming is not good.


> enable_readahead=os|fadvise
>
> with os = on, fadvise = off

Hmm. fadvise is a mechanism, not a purpose. So I considered other ideas for this
GUC:

1) readahead_strategy=os|pg
  This naming leaves room for other future implementations. If we later want a
  maximum-readahead setting that always uses POSIX_FADV_SEQUENTIAL, we can add
  "max".

2) readahead_optimizer=os|pg or readahead_strategist=os|pg
  This naming makes it easy to see who optimizes readahead, but it is not
  extensible for other future implementations.


> And, if you want to keep the on/off values, I'd reverse them. Because
> off reads more like "I don't do anything special", and in your patch
> it's quite the opposite.

I understand your point. If we adopt an "on|off" setting, I would like to name
the GUC optimized_readahead=off|on.
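
For what it's worth, a minimal sketch of how such a setting could be mapped onto
kernel advice; the names here are placeholders rather than the patch's actual
symbols:

/*
 * Sketch only: map a hypothetical readahead_strategy setting plus the
 * caller's buffer access hint onto posix_fadvise() advice for an open data
 * file.  The enum and function names are placeholders, not the patch's.
 */
#include <fcntl.h>

typedef enum { READAHEAD_OS, READAHEAD_PG } ReadaheadStrategy;
typedef enum { ACCESS_RANDOM, ACCESS_SEQUENTIAL } AccessHint;

static int
apply_readahead_advice(int fd, ReadaheadStrategy strategy, AccessHint hint)
{
    int         advice;

    if (strategy == READAHEAD_OS)
        advice = POSIX_FADV_NORMAL;         /* leave readahead to the kernel */
    else if (hint == ACCESS_SEQUENTIAL)
        advice = POSIX_FADV_SEQUENTIAL;     /* e.g. sequential scans */
    else
        advice = POSIX_FADV_RANDOM;         /* e.g. index probes */

    /* offset 0, len 0 means the whole file; returns 0 or an errno value */
    return posix_fadvise(fd, 0, 0, advice);
}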


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center





Re: [HACKERS] COPY table FROM STDIN doesn't show count tag

2013-12-10 Thread Rajeev rastogi
On 9th December, Amit Khandelkar wrote:

>1.  slashcopyissuev1.patch :- This patch fixes the \COPY issue.
>You have removed the if condition in this statement, mentioning that it is 
>always true now:
>-   if (copystream == pset.cur_cmd_source)
>-   pset.lineno++;
>+   pset.lineno++;
>
 >But copystream can be different than pset.cur_cmd_source , right ?

As per the earlier code, the condition was always true, so pset.lineno was
always incremented.
In the earlier code pset.cur_cmd_source was passed as a parameter to the
function, and inside the function the same parameter was used under the name
copystream, so on entry to the function both are one and the same.
I checked inside handleCopyIn that neither of them changes before the above
check. Also, since pset is specific to a single session, it cannot change
concurrently.
Please let me know if I am missing something.

>+   FILE   *copyStream; /* Stream to read/write for copy 
>command */
>
>There is no tab between FILE and *copystream, hence it is not aligned.

OK. I shall change it accordingly.


>2.  initialcopyissuev1_ontopofslashcopy.patch : Fix for "COPY table FROM 
>STDIN/STDOUT doesn't show count tag".
>The following header comments of ProcessResult() need to be modified:
>* Changes its argument to point to the last PGresult of the command string,
>* or NULL if that result was for a COPY FROM STDIN or COPY TO STDOUT.

OK. I shall change it accordingly.

>Regression results show all passed.
>Other than this, the patch needs a new regression test.

I checked the existing regression test cases and observed that they already
cover all the relevant kinds of cases, like
copystdin,
copystdout,
\copy.stdin,
\copy.stdout.

But since the regression framework runs in quiet ("-q") mode, it does not show
any message except the query output.
So our new code change does not impact the regression framework.

Please let me know if you were expecting any other test cases.

>I don't think we need to do any doc changes, because the doc already mentions 
>that COPY should show the COUNT tag, and does not mention anything specific to 
>client-side COPY.
 OK.

Please provide your opinion, based on which I shall prepare a new patch and
share it.

Thanks and Regards,
Kumar Rajeev Rastogi



Re: [HACKERS] plpgsql_check_function - rebase for 9.3

2013-12-10 Thread Tom Lane
Amit Kapila  writes:
> On Tue, Dec 10, 2013 at 12:15 PM, Pavel Stehule  
> wrote:
>> Now, PG has no any tool for checking dependency between functions and other
>> objects.

> Isn't that already done for SQL function's (fmgr_sql_validator)?

Pavel's point is that the only way to find out if the validator will fail
is to run it and see if it fails; and even if it does, how much will you
know about why?  That's not particularly helpful for pg_dump, which needs
to understand dependencies in a fairly deep fashion.  It not only needs to
figure out how to dump the database objects in a dependency-safe order,
but what to do to break dependency loops.

Right now I believe that pg_dump has a workable strategy for every
possible case of circular dependencies, because they are all caused by
secondary attributes of objects that can be split out and applied later,
for example applying a column default via ALTER TABLE ALTER COLUMN SET
DEFAULT rather than listing the default right in the CREATE TABLE command.

However, if function A depends on B and also vice-versa (mutual recursion
is not exactly an unheard-of technique), there is no way to load them both
if the function bodies are both checked at creation time.

I guess we could invent some equivalent of a forward declaration, but
that still leaves us short of understanding what the function body is
depending on.

regards, tom lane




Re: [HACKERS] -d option for pg_isready is broken

2013-12-10 Thread Fujii Masao
On Thu, Nov 21, 2013 at 9:58 PM, Fujii Masao  wrote:
> On Thu, Nov 21, 2013 at 6:41 AM, Robert Haas  wrote:
>> On Wed, Nov 20, 2013 at 4:55 AM, Fujii Masao  wrote:
 Not that I can see, but it's not very future-proof.  If libpq changes
 its idea of what will provoke database-name expansion, this will again
 be broken.
>>>
>>> Yeah, I agree that we should make the logic of pg_isready more future-proof
>>> in 9.4dev. One idea is to expose internal_ping() as a libpq function. Then,
>>> instead of just calling PQpingParams(), we can do PQconnectStartParams(),
>>> libpq-function-version-of-internal_ping(), PQhost(), PQport(), and 
>>> PQfinish().
>>> That is, we get to know the host and port information by calling
>>> PQhost() and PQport(),
>>> after trying to connect to the server.

While I was investigating PQhost() for that approach, I found several
problems with PQhost().

(1) PQhost() can return the Unix-domain socket directory path even on a
platform that doesn't support Unix-domain sockets.

(2) On a platform that doesn't support Unix-domain sockets, when neither host
nor hostaddr is specified, the default host 'localhost' is used to connect
to the server and PQhost() should return that, but it doesn't.

(3) PQhost() cannot return the hostaddr.

As a result of these problems, you can see an incorrect result from
\conninfo, for example:

$ psql -d "hostaddr=127.0.0.1"
=# \conninfo
You are connected to database "postgres" as user "postgres" via socket
in "/tmp" at port "5432".

Obviously "/tmp" should be "127.0.0.1".

The attached patch fixes these problems with PQhost(). But I'm concerned that
this change may break existing applications using PQhost(). That is, some
existing application might not expect PQhost() to return the hostaddr. If that
concern is valid, maybe we need to implement something like PQhostaddr(). It's
too late to add such a new libpq function to the current stable releases,
though... Thoughts?

If it's okay to change the behavior of PQhost() as I explained, I will commit
the patch in all supported versions.

Regards,

-- 
Fujii Masao
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***
*** 5188,5194  PQhost(const PGconn *conn)
  {
  	if (!conn)
  		return NULL;
! 	return conn->pghost ? conn->pghost : conn->pgunixsocket;
  }
  
  char *
--- 5188,5205 
  {
  	if (!conn)
  		return NULL;
! 	if (conn->pghostaddr != NULL && conn->pghostaddr[0] != '\0')
! 		return conn->pghostaddr;
! 	else if (conn->pghost != NULL && conn->pghost[0] != '\0')
! 		return conn->pghost;
! 	else
! 	{
! #ifdef HAVE_UNIX_SOCKETS
! 		return conn->pgunixsocket;
! #else
! 		return DefaultHost;
! #endif
! 	}
  }
  
  char *



Re: [HACKERS] plpgsql_check_function - rebase for 9.3

2013-12-10 Thread Amit Kapila
On Tue, Dec 10, 2013 at 12:15 PM, Pavel Stehule  wrote:
> 2013/12/10 Amit Kapila 
>> On Mon, Dec 9, 2013 at 10:54 AM, Pavel Stehule 
>> wrote:
>> > 2013/12/9 Amit Kapila 
>> >> > There are two points, that should be solved
>> >> >
>> >> > a) introduction a dependency to other object in schema - now CREATE
>> >> > FUNCTION
>> >> > is fully independent on others
>> >> >
>> >> > b) slow start - if we check all paths on start, then start can be
>> >> > slower

>> >> > some like
>> >> >
>> >> > #option check_on_first_start
>> >> > #option check_on_create
>> >> > #option check_newer
>> >>
>> >> what exactly check_newer means, does it mean whenever a function is
>> >> replaced (changed)?
>> >>
>> >
>> > no, it means, so request for check will be ignored ever - some functions
>> > cannot be deeply checked due using dynamic SQL or dynamic created data
>> > types
>> > - temporary tables created in functions.
>>
>
>
> Now, PG has no any tool for checking dependency between functions and other
> objects. What has positive effects - we have very simply deploying, that
> works in almost use cases very well - and works with our temporary tables
> implementation. There is simple rule - depended object must living before
> they are >>used in runtime<<. But checking should not be runtime event and
> we would to decrease a false alarms. So we have to expect so almost all
> necessary object are created - it is reason, why we decided don't do check
> in create function statement time.

Isn't that already done for SQL function's (fmgr_sql_validator)?

postgres=# CREATE FUNCTION clean_emp() RETURNS void AS
postgres'# DELETE FROM emp
postgres'# WHERE salary < 0;
postgres'# ' LANGUAGE SQL;
ERROR:  relation "emp" does not exist
LINE 2: DELETE FROM emp
^

I mean to say that the above rule you stated ("There is simple rule
- depended object must living before they are >>used in runtime<<") doesn't
seem to hold for SQL functions.
So isn't it better to do the same for plpgsql functions as well?

For doing it at runtime (during the first execution of the function), are you
planning to add it as an extra step,
such that if the parameter check_on_first_start is set, then do it?

>We don't would to introduce new dependency if it will be possible.
  In that case, what exactly do you mean by point a) ("introduction
a dependency to other object..") above in your mail?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Why we are going to have to go DirectIO

2013-12-10 Thread KONDO Mitsumasa
(2013/12/11 10:25), Tom Lane wrote:
> Jeff Janes  writes:
>> On Tue, Dec 3, 2013 at 11:39 PM, Claudio Freire 
>> wrote:
>>> Problem is, Postgres relies on a working kernel cache for checkpoints.
>>> Checkpoint logic would have to be heavily reworked to account for an
>>> impaired kernel cache.
> 
>> I don't think it would need anything more than a sorted checkpoint.
> 
> Nonsense.  We don't have access to the physical-disk-layout information
> needed to do reasonable sorting;
The OS knows the physical disk layout, as shown below:
> [mitsu-ko@ssd ~]$ filefrag -v .bashrc
> Filesystem type is: ef53
> File size of .bashrc is 124 (1 block, blocksize 4096)
>  ext logical physical expected length flags
>0   0 15761410   1 eof
> .bashrc: 1 extent found
If we need to know this information, we can get the physical disk layout at any time.

> to say nothing of doing something
> intelligent in a multi-spindle environment, or whenever any other I/O
> is going on concurrently.
The I/O scheduler in the OS knows it best. So I think buffered I/O is faster
than DirectIO in general situations.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center




Re: [HACKERS] Why we are going to have to go DirectIO

2013-12-10 Thread Claudio Freire
On Tue, Dec 10, 2013 at 11:33 PM, Jeff Janes  wrote:
> On Tuesday, December 10, 2013, Tom Lane wrote:
>>
>> Jeff Janes  writes:
>> > On Tue, Dec 3, 2013 at 11:39 PM, Claudio Freire
>> > wrote:
>> >> Problem is, Postgres relies on a working kernel cache for checkpoints.
>> >> Checkpoint logic would have to be heavily reworked to account for an
>> >> impaired kernel cache.
>>
>> > I don't think it would need anything more than a sorted checkpoint.
>>
>> Nonsense.  We don't have access to the physical-disk-layout information
>> needed to do reasonable sorting; to say nothing of doing something
>> intelligent in a multi-spindle environment, or whenever any other I/O
>> is going on concurrently.
>
>
> The proposal I was responding to was simply to increase shared_buffers to
> 80% of RAM *instead of* implementing directIO.


If you do not leave a reasonable amount of RAM, writes will be direct
and synchronous.




Re: [HACKERS] Why the buildfarm is all pink

2013-12-10 Thread Tom Lane
Robert Haas  writes:
> On Tue, Dec 10, 2013 at 7:55 PM, Tom Lane  wrote:
>> Anyway, bottom line is that I think we need to institute, and
>> back-patch, some consistent scheme for when to analyze the standard
>> tables during the regression tests, so that we don't have hazards
>> like this for tests that want to check what plan gets selected.
>> 
>> Comments?

> Everything you're saying sounds reasonable from here.

I experimented with this and found out that an across-the-board ANALYZE
in the copy test causes a number of existing plan results to change,
because actually the very small tables like int4_tbl have never been
analyzed at all in the past.  Now, that might be just fine, or it might
be that those tests would no longer test the code path they were meant
to test.  I lack the energy right now to examine each one and decide
whether the change is OK.  What I propose instead is that the copy test
only analyze the tables that it itself loaded.  This lets us get rid
of most of the ad-hoc ANALYZEs of pre-existing tables, as in the attached
proposed patch against HEAD.  I ended up leaving in the "vacuum analyze
tenk1" in create_index.sql, because it turns out what that is actually
accomplishing is to fill the visibility map so that an index-only scan
will be chosen.  Maybe we should change all the added ANALYZEs in the
copy test to VACUUM ANALYZEs?  That would be redundant though with the
database-wide VACUUM in sanity_check.sql, which unfortunately runs after
create_index.sql.

I haven't touched matview.sql here; that seems like a distinct issue.

regards, tom lane

diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index e17881c..c05b39c 100644
*** a/src/test/regress/expected/aggregates.out
--- b/src/test/regress/expected/aggregates.out
*** FROM bool_test;
*** 506,512 
  -- Test cases that should be optimized into indexscans instead of
  -- the generic aggregate implementation.
  --
- analyze tenk1;		-- ensure we get consistent plans here
  -- Basic cases
  explain (costs off)
select min(unique1) from tenk1;
--- 506,511 
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 232a233..0f0c638 100644
*** a/src/test/regress/expected/alter_table.out
--- b/src/test/regress/expected/alter_table.out
*** ALTER INDEX tmp_onek_unique1 RENAME TO o
*** 140,146 
  CREATE VIEW tmp_view (unique1) AS SELECT unique1 FROM tenk1;
  ALTER TABLE tmp_view RENAME TO tmp_view_new;
  -- hack to ensure we get an indexscan here
- ANALYZE tenk1;
  set enable_seqscan to off;
  set enable_bitmapscan to off;
  -- 5 values, sorted
--- 140,145 
diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out
index d47861e..23b3902 100644
*** a/src/test/regress/expected/arrays.out
--- b/src/test/regress/expected/arrays.out
*** SELECT 0 || ARRAY[1,2] || 3 AS "{0,1,2,3
*** 421,427 
   {0,1,2,3}
  (1 row)
  
- ANALYZE array_op_test;
  SELECT * FROM array_op_test WHERE i @> '{32}' ORDER BY seqno;
   seqno |i| t  
  ---+-+
--- 421,426 
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index c3598e2..85113a9 100644
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
*** on (x1 = xx1) where (xx2 is not null);
*** 2129,2135 
  -- regression test: check for bug with propagation of implied equality
  -- to outside an IN
  --
- analyze tenk1;		-- ensure we get consistent plans here
  select count(*) from tenk1 a where unique1 in
(select unique1 from tenk1 b join tenk1 c using (unique1)
 where b.unique2 = 42);
--- 2129,2134 
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index ab3f508..80f911f 100644
*** a/src/test/regress/input/copy.source
--- b/src/test/regress/input/copy.source
*** COPY array_op_test FROM '@abs_srcdir@/da
*** 60,65 
--- 60,90 
  
  COPY array_index_op_test FROM '@abs_srcdir@/data/array.data';
  
+ -- analyze all the data we just loaded, to ensure plan consistency
+ -- in later tests
+ 
+ ANALYZE aggtest;
+ ANALYZE onek;
+ ANALYZE tenk1;
+ ANALYZE slow_emp4000;
+ ANALYZE person;
+ ANALYZE emp;
+ ANALYZE student;
+ ANALYZE stud_emp;
+ ANALYZE road;
+ ANALYZE real_city;
+ ANALYZE hash_i4_heap;
+ ANALYZE hash_name_heap;
+ ANALYZE hash_txt_heap;
+ ANALYZE hash_f8_heap;
+ ANALYZE test_tsvector;
+ ANALYZE bt_i4_heap;
+ ANALYZE bt_name_heap;
+ ANALYZE bt_txt_heap;
+ ANALYZE bt_f8_heap;
+ ANALYZE array_op_test;
+ ANALYZE array_index_op_test;
+ 
  --- test copying i

Re: [HACKERS] [COMMITTERS] pgsql: Add a new reloption, user_catalog_table.

2013-12-10 Thread Robert Haas
On Tue, Dec 10, 2013 at 7:56 PM, Andres Freund  wrote:
> On 2013-12-11 09:54:36 +0900, Michael Paquier wrote:
>> On Wed, Dec 11, 2013 at 9:34 AM, Robert Haas  wrote:
>> > Add a new reloption, user_catalog_table.
>> Sorry for not having completely followed the thread of logical
>> replication, but wouldn't this deserve some documentation?
>
> I'd say this needs to be documented when the later facilities are
> committed. It seems hard to explain without that context.

I actually think Michael is right.  There's no reason this can't or
shouldn't get a mention in Storage
Parameters at the very least.  We might not have the right
stuff to hyperlink to for more details yet, but we can at least get
the basic bits in there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical changeset generation v6.8

2013-12-10 Thread Robert Haas
On Wed, Dec 4, 2013 at 10:55 AM, Andres Freund  wrote:
> [ updated logical decoding patches ]

Regarding patch #4, introduce wal decoding via catalog timetravel,
which seems to be the bulk of what's not committed at this point...

- I think this needs SGML documentation, same kind of thing we have
for background workers, except probably significantly more.  A design
document with ASCII art in a different patch does not constitute
usable documentation.  I think it would fit best under section VII,
internals, perhaps entitled "Writing a Logical Decoding Plugin".
Looking at it, I rather wonder if the "Background Worker Processes"
ought to be there as well, rather than under section V, server
programming.

+   /* setup the redirected t_self for the benefit
of logical decoding */
...
+   /* reset to original, non-redirected, tid */

Clear as mud.

+ * rrow to disk but only do so in batches when we've collected several of them

Typo.

+ * position before those records back. Independently from WAL logging,

"before those records back"?

+ * position before those records back. Independently from WAL logging,
+ * everytime a rewrite is finished all generated mapping files are directly

I would delete "Independently from WAL logging" from this sentence.
And "everytime" is two words.

+ * file. That leaves the tail end that has not yet been fsync()ed to disk open
...
+ * fsynced.

Pick a spelling and stick with it.  My suggestion is "flushed to
disk", not actually using fsync per se at all.

+ * TransactionDidCommit() check. But we want to support physical replication
+ * for availability and to support logical decoding on the standbys.

What is physical replication for availability?

+ * necessary. If we detect that we don't need to log anything we'll prevent
+ * any further action by logical_*rewrite*

Sentences should end with a period, and the reason for the asterisks
is not clear.

+   logical_xmin =
+   ((volatile LogicalDecodingCtlData*) LogicalDecodingCtl)->xmin;

Ugly.

+errmsg("failed to write to
logical remapping file: %m")));

Message style.

+   ereport(ERROR,
+   (errcode_for_file_access(),
+errmsg("incomplete write to
logical remapping file, wrote %d of %u",
+   written, len)));

Message style.  I suggest treating a short write as ENOSPC; there is
precedent elsewhere.

I don't think there's much point in including "remapping" in all of
the error messages.  It adds burden for translators and users won't
know what a remapping file is anyway.

+   /*
+* We intentionally violate the usual WAL coding
practices here and
+* write to the file *first*. This way an eventual
checkpoint will
+* sync any part of the file that's not guaranteed to
be recovered by
+* the XLogInsert(). We deal with the potential
corruption in the tail
+* of the file by truncating it to the last safe point
during WAL
+* replay and by checking whether the xid performing
the mapping has
+* committed.
+*/

Don't have two different comments explaining this.  Either replace
this one with a reference to the other one, or get rid of the other
one and just keep this one.  I vote for the latter.

I don't see a clear explanation anywhere of what the
rs_logical_mappings hash is actually doing.  This is badly needed.
This code basically presupposes that you know what it's try to
accomplish, and even though I sort of do, it leaves a lot to be
desired in terms of clarity.

+   /* nothing to do if we're not working on a catalog table */
+   if (!state->rs_logical_rewrite)
+   return;

Comment doesn't accurately describe code.

+   /* use *GetUpdateXid to correctly deal with multixacts */
+   xmax = HeapTupleHeaderGetUpdateXid(new_tuple->t_data);

I don't feel enlightened by that comment.

+   if (!TransactionIdIsNormal(xmax))
+   {
+   /*
+* no xmax is set, can't have any permanent ones, so
this check is
+* sufficient
+*/
+   }
+   else if (HEAP_XMAX_IS_LOCKED_ONLY(new_tuple->t_data->t_infomask))
+   {
+   /* only locked, we don't care */
+   }
+   else if (!TransactionIdPrecedes(xmax, cutoff))
+   {
+   /* tuple has been deleted recently, log */
+   do_log_xmax = true;
+   }

Obfuscated.  Rewrite without empty blocks.

+   /*
+* Write out buffer everytime we've too many in-memory entries.
+*/
+   if (state->rs_num_rewrite_mappings >= 1000 /* arbitrary number */)
+   logical_heap_rewrite_flush_mappings(state);

What happens if the number of items per hash table entry is small but
th

Re: [HACKERS] Completing PL support for Event Triggers

2013-12-10 Thread Peter Eisentraut
On Mon, 2013-12-09 at 09:45 +0100, Dimitri Fontaine wrote:
> It looks like you started with the v1 of the plperl patch rather than
> the v2, where the only difference is in only using is_trigger or using
> both is_trigger and is_event_trigger. Your version currently uses both
> where I though we chose using is_trigger only.

I think you are mistaken.  My patch includes all changes between your v1
and v2 patch.






Re: [HACKERS] Why the buildfarm is all pink

2013-12-10 Thread Robert Haas
On Tue, Dec 10, 2013 at 7:55 PM, Tom Lane  wrote:
> This doesn't make me happy.  Aside from the sheer waste of cycles
> involved in re-analyzing the entire regression database, this
> test runs in parallel with half a dozen others, and it could cause
> plan instability in those.  Of course, if it does, then most likely
> those tests have a hazard from autovacuum anyway.  But this still
> looks to me like a poor bit of test design.

Agreed.

> Anyway, bottom line is that I think we need to institute, and
> back-patch, some consistent scheme for when to analyze the standard
> tables during the regression tests, so that we don't have hazards
> like this for tests that want to check what plan gets selected.
>
> Comments?

Everything you're saying sounds reasonable from here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Why we are going to have to go DirectIO

2013-12-10 Thread Jeff Janes
On Tuesday, December 10, 2013, Tom Lane wrote:

> Jeff Janes  writes:
> > On Tue, Dec 3, 2013 at 11:39 PM, Claudio Freire  wrote:
> >> Problem is, Postgres relies on a working kernel cache for checkpoints.
> >> Checkpoint logic would have to be heavily reworked to account for an
> >> impaired kernel cache.
>
> > I don't think it would need anything more than a sorted checkpoint.
>
> Nonsense.  We don't have access to the physical-disk-layout information
> needed to do reasonable sorting; to say nothing of doing something
> intelligent in a multi-spindle environment, or whenever any other I/O
> is going on concurrently.
>

The proposal I was responding to was simply to increase shared_buffers to
80% of RAM *instead of* implementing directIO.

Cheers,

Jeff


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Jeff Janes
On Tuesday, December 10, 2013, Simon Riggs wrote:

> On 11 December 2013 00:28, Greg Stark  wrote:
> > On Wed, Dec 11, 2013 at 12:14 AM, Simon Riggs  wrote:
> >> Block sampling, with parameter to specify sample size. +1
> >
> > Simon this is very frustrating. Can you define "block sampling"?
>
> Blocks selected using Vitter's algorithm, using a parameterised
> fraction of the total.
>

OK, thanks for defining that.

We only need Vitter's algorithm when we don't know in advance how many
items we are sampling from (such as for tuples--unless we want to rely on
the previous estimate for the current round of analysis).  But for blocks,
we do know how many there are, so there are simpler ways to pick them.
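
For instance, a minimal sketch of one such simpler method, selection sampling
over the known block count, might look like this (the visit() callback and
drand48() are just stand-ins, not anything in analyze.c):

/*
 * Sketch only: pick "want" of "nblocks" known blocks, each block included
 * with probability want/nblocks, visited in physical order (Knuth's
 * selection-sampling technique).
 */
#include <stdlib.h>

static void
sample_known_blocks(unsigned nblocks, unsigned want,
                    void (*visit) (unsigned blkno))
{
    unsigned    seen;

    for (seen = 0; seen < nblocks && want > 0; seen++)
    {
        /* include this block with probability want / (nblocks - seen) */
        if ((nblocks - seen) * drand48() < want)
        {
            visit(seen);
            want--;
        }
    }
}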


>
> When we select a block we should read all rows on that block, to help
> identify the extent of clustering within the data.
>

But we have no mechanism to store such information (or to use it if it were
stored), nor even ways to prevent the resulting skew in the sample from
seriously messing up the estimates which we do have ways of storing and
using.

Cheers,

Jeff


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Tom Lane
Peter Geoghegan  writes:
> Again, it isn't as if the likely efficacy of *some* block sampling
> approach is in question. I'm sure analyze.c is currently naive about
> many things.

It's not *that* naive; this is already about a third-generation algorithm.
The last major revision (commit 9d6570b8a4) was to address problems with
misestimating the number of live tuples due to nonuniform tuple density
in a table.  IIRC, the previous code could be seriously misled if the
first few pages in the table were significantly non-representative of the
live-tuple density further on.  I'm not sure how we can significantly
reduce the number of blocks examined without re-introducing that hazard in
some form.  In particular, given that you want to see at least N tuples,
how many blocks will you read if you don't have an a-priori estimate of
tuple density?  You have to decide that before you start sampling blocks,
if you want all blocks to have the same probability of being selected
and you want to read them in sequence.
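
(For reference, the row stage of the current sampler keeps a fixed-size uniform
sample without knowing the total row count in advance. A simplified sketch of
that idea follows; the real code uses Vitter's algorithm Z, which computes how
many rows to skip rather than drawing a random number per row.)

/*
 * Sketch only: keep a uniform random sample of at most "targrows" row ids
 * from a stream of unknown length (plain reservoir sampling).
 */
#include <stdlib.h>

typedef struct
{
    int        *rows;       /* the reservoir, sized for targrows entries */
    int         targrows;   /* reservoir capacity */
    long        seen;       /* rows offered so far */
} RowSample;

static void
sample_consider_row(RowSample *s, int row)
{
    if (s->seen < s->targrows)
        s->rows[s->seen] = row;             /* reservoir not yet full */
    else
    {
        /* replace a random slot with probability targrows / (seen + 1) */
        long        k = lrand48() % (s->seen + 1);

        if (k < s->targrows)
            s->rows[k] = row;
    }
    s->seen++;
}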

regards, tom lane




Re: [HACKERS] Get more from indices.

2013-12-10 Thread Etsuro Fujita
I wrote:
> Kyotaro HORIGUCHI wrote:
> > I'm convinced of the validity of your patch. I'm satisfied with it.

> Then, if there are no objections of others, I'll mark this as "Ready for
> Committer".

Done.

Thanks,

Best regards,
Etsuro Fujita





Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Tom Lane
Peter Geoghegan  writes:
> On Tue, Dec 10, 2013 at 3:08 PM, Tom Lane  wrote:
>> I'm wondering whether this doesn't indicate that we need to rethink where
>> the fingerprinter has been plugged in.  I'm not sure that somewhere in
>> the planner, post-constant-folding, would be a better place; but it's
>> worth thinking about.

> ... if you're not talking about blaming plans and not queries, you
> must be talking about making the planner do the constant folding in a
> way that facilitates tools like pg_stat_statements in getting the
> "essential nature" of the query (*not* the plan) post constant
> folding.

Yeah; if we were going to take this seriously, we'd need to do some
refactoring to separate normalization-type activities from other
planning activities.  I'm not sure it's worth the trouble.  Right
now, for instance, eval_const_expressions() also handles inlining
of SQL functions, which is a behavior we'd almost certainly *not*
want in front of query fingerprinting.  But it's hard to see how
we disentangle that without either lost optimization capacity
or duplicative processing.  A significant fraction of the point of
const-folding is to exploit opportunities revealed by inlining.

Anyway, I'm not volunteering to do this ;-) ... just idly speculating
about whether it'd be worth doing.

regards, tom lane




Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Sergey E. Koposov

For what it's worth.

I'll quote Chaudhuri et al. first line from the abstract about the block 
sampling.
"Block-level sampling is far more efficient than true uniform-random 
sampling over a large database, but prone to  significant errors if used 
to create database statistics."


And after briefly glancing through the paper, my opinion on why it works
is that after making one version of the statistics they cross-validate, see
how well it does, and then collect more if the cross-validation error is
large (for example because the data is clustered). Without this bit, as
far as I can tell, a simple block-based sampler will be bound to make
catastrophic mistakes depending on the distribution.
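
In other words, the loop is roughly the following. This is only a toy sketch,
not the paper's actual procedure; the "cross-validation" here is just a crude
split-half comparison of sampled block means on synthetic clustered data:

/*
 * Toy sketch of the loop described above, NOT the paper's procedure: sample
 * clustered data block by block, compare the estimates from two halves of
 * the sample, and keep doubling the sample while the halves disagree.
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define NROWS   100000
#define BLOCKSZ 100
#define NBLOCKS (NROWS / BLOCKSZ)

static double data[NROWS];

static double
block_mean(int blkno)
{
    double      sum = 0.0;
    int         i;

    for (i = 0; i < BLOCKSZ; i++)
        sum += data[blkno * BLOCKSZ + i];
    return sum / BLOCKSZ;
}

int
main(void)
{
    int         b, i, nsample = 8;

    /* clustered data: every block sits at its own level */
    for (b = 0; b < NBLOCKS; b++)
        for (i = 0; i < BLOCKSZ; i++)
            data[b * BLOCKSZ + i] = (b % 10) + drand48();

    for (;;)
    {
        double      half[2] = {0.0, 0.0};

        for (i = 0; i < nsample; i++)
            half[i % 2] += block_mean((int) (lrand48() % NBLOCKS));
        half[0] /= nsample / 2;
        half[1] /= nsample / 2;

        printf("%d blocks sampled: halves say %.3f vs %.3f\n",
               nsample, half[0], half[1]);

        /* crude stand-in for the cross-validation error check */
        if (fabs(half[0] - half[1]) < 0.05 || nsample >= NBLOCKS)
            break;
        nsample *= 2;           /* clustered data forces a larger sample */
    }
    return 0;
}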


Also, just another point about targets (e.g. X%) for estimating stuff from
the samples (as discussed in the thread). Basically, there is a point in
talking about sampling a fixed target (say 5%) of the data ONLY if you fix
the actual distribution of your data in the table and decide which statistic
you are trying to find, e.g. average, std. dev., a 90th percentile, ndistinct,
or a histogram and so forth. There won't be a general answer, as the
percentages will be distribution-dependent and statistic-dependent.


Cheers,
Sergey

PS I'm not a statistician, but I use statistics a lot

***
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/




Re: [HACKERS] Why we are going to have to go DirectIO

2013-12-10 Thread Tom Lane
Jeff Janes  writes:
> On Tue, Dec 3, 2013 at 11:39 PM, Claudio Freire wrote:
>> Problem is, Postgres relies on a working kernel cache for checkpoints.
>> Checkpoint logic would have to be heavily reworked to account for an
>> impaired kernel cache.

> I don't think it would need anything more than a sorted checkpoint.

Nonsense.  We don't have access to the physical-disk-layout information
needed to do reasonable sorting; to say nothing of doing something
intelligent in a multi-spindle environment, or whenever any other I/O
is going on concurrently.

regards, tom lane




Re: [HACKERS] Why we are going to have to go DirectIO

2013-12-10 Thread Claudio Freire
On Tue, Dec 10, 2013 at 9:22 PM, Jeff Janes  wrote:
>> Communicating more with the kernel (through posix_fadvise, fallocate,
>> aio, iovec, etc...) would probably be good, but it does expose more
>> kernel issues. posix_fadvise, for instance, is a double-edged sword
>> ATM. I do believe, however, that exposing those issues and prompting a
>> fix is far preferable than silently working around them.
>
>
> Getting the kernel to improve those things so PostgreSQL can be changed to
> use them more aggressively seems almost hopeless to me.  PostgreSQL would
> have to be coded to take advantage of the improved versions, while defending
> itself from the pre-improved versions.  And my understanding is that
> different distributions of Linux cherry pick changes to the kernel back and
> forth into their code, so just looking at the kernel version number without
> also looking at the distribution doesn't mean very much about whether we
> have the improved feature or not.  Or am I misinformed about that?
>
> If we can point things out to the kernel hackers things that would be
> absolute improvements, where PostgreSQL and everything else just magically
> start working better if that improvement makes it in, that is great. Both if
> both systems have to be changed in sync to derive any benefit, how do we
> coordinate that?


Well, posix_fadvise is one such thing. It's a cheap form of AIO used
by more than a few programs that want I/O performance, and in its
current form it is sub-optimal; the fix is rather simple, it just needs a
lot of testing.

But my report on LKML[0] spurred little actual work. So it's possible
this kind of thing will need patches attached.

On Tue, Dec 10, 2013 at 9:34 PM, Andres Freund  wrote:
> On 2013-12-04 05:39:23 -0200, Claudio Freire wrote:
>> Problem is, Postgres relies on a working kernel cache for checkpoints.
>> Checkpoint logic would have to be heavily reworked to account for an
>> impaired kernel cache.
>
> I don't think checkpoints are the critical problem with that, they are
> nicely in the background and we could easily add sorting.

Problem is, with DirectIO, they won't be so background.

Currently, checkpoints assume there's a background process catching
all I/O requests, sorting them, and flushing them as optimally as
possible. This makes the checkpoint's slow-paced write pattern
benignly background, since it will be scheduled opportunistically by
the kernel.

If you use DirectIO, however, a write will pretty much physically move
the writing head (when it reaches the queue's head at least) of
rotating media, causing delays on all other pending I/O requests.
That's quite un-backgroundly of it.

A few blocks per second like that can pretty much kill sequential
scans (I've seen that effect happen with fadvise).


[0] https://lkml.org/lkml/2012/11/9/353




Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Simon Riggs
On 11 December 2013 00:44, Greg Stark  wrote:
> On Wed, Dec 11, 2013 at 12:40 AM, Simon Riggs  wrote:
>> When we select a block we should read all rows on that block, to help
>> identify the extent of clustering within the data.
>
> So how do you interpret the results of the sample read that way that
> doesn't introduce bias?

Yes, it is not a perfect statistical sample. All sampling is subject
to an error that is data dependent.

I'm happy that we have an option to select this or not, and a default
that maintains current behaviour, since otherwise we might expect some
plan instability.

I would like to be able to

* allow ANALYZE to run faster in some cases
* increase/decrease sample size when it matters
* have the default sample size vary according to the size of the
table, i.e. a proportional sample

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Add a new reloption, user_catalog_table.

2013-12-10 Thread Andres Freund
On 2013-12-11 09:54:36 +0900, Michael Paquier wrote:
> On Wed, Dec 11, 2013 at 9:34 AM, Robert Haas  wrote:
> > Add a new reloption, user_catalog_table.
> Sorry for not having completely followed the thread of logical
> replication, but wouldn't this deserve some documentation?

I'd say this needs to be documented when the later facilities are
committed. It seems hard to explain without that context.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Why the buildfarm is all pink

2013-12-10 Thread Tom Lane
I was surprised to see that my back-patches of the recent SubLink
unpleasantness were failing on many of the buildfarm members, but
only in the 9.1 and 9.0 branches.  The difficulty appears to be
that the EXPLAIN output for the new test query changes depending on
whether or not "tenk1" has been analyzed yet.  In 9.2 and up,
it reliably has been, because create_index runs first and that script
does this:

create_index.sql:901:vacuum analyze tenk1;  -- ensure we get 
consistent plans here

But the older branches lack that.  Running the tests serially
usually fails in 9.1 and 9.0, and likely would fail in 8.4 except
that that branch isn't printing the selected plan for lack of
EXPLAIN (COSTS OFF).  Parallel tests sometimes succeed (and
did for me), because the subselect test runs concurrently with
"aggregates" and "join", which have

aggregates.sql:211:analyze tenk1;   -- ensure we get consistent 
plans here
join.sql:333:analyze tenk1; -- ensure we get consistent plans here

so depending on timing, one of those might have gotten the job done,
or maybe autovacuum would show up in time to save the day.

We need a more consistent strategy for this :-(

The minimum-change strategy for getting the buildfarm green again
would be to insert another ad-hoc "analyze tenk1" into subselect.sql
in the back branches.  I don't particularly want to fix it that way,
though, because it'd just be a problem waiting to happen anytime
someone back-patches a bug fix that includes EXPLAIN output.

What I think would be the best strategy, on the whole, is to put
a whole-database "ANALYZE;" at the end of the "copy" regression test,
which is the one that loads up tenk1 and the other large test tables.
It also comes after the tests that load up small static tables such
as int4_tbl.  This would ensure that all the tables that we typically
use for one-off EXPLAIN tests are analyzed early in the proceedings.
Then we could get rid of the various ad-hoc analyzes that have snuck
into various tests.

While I'm on the subject ... I noticed that the recently-added 
matview test has this:

matview.sql:133:VACUUM ANALYZE;

This doesn't make me happy.  Aside from the sheer waste of cycles
involved in re-analyzing the entire regression database, this
test runs in parallel with half a dozen others, and it could cause
plan instability in those.  Of course, if it does, then most likely
those tests have a hazard from autovacuum anyway.  But this still
looks to me like a poor bit of test design.

Anyway, bottom line is that I think we need to institute, and
back-patch, some consistent scheme for when to analyze the standard
tables during the regression tests, so that we don't have hazards
like this for tests that want to check what plan gets selected.

Comments?

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Add a new reloption, user_catalog_table.

2013-12-10 Thread Michael Paquier
On Wed, Dec 11, 2013 at 9:34 AM, Robert Haas  wrote:
> Add a new reloption, user_catalog_table.
Sorry for not having completely followed the thread of logical
replication, but wouldn't this deserve some documentation?
Regards,
-- 
Michael




Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Peter Geoghegan
On Tue, Dec 10, 2013 at 4:14 PM, Simon Riggs  wrote:
> err, so what does stats target mean exactly in statistical theory?

Why would I even mention that to a statistician? We want guidance. But
yes, I bet I could give a statistician an explanation of statistics
target that they'd understand without too much trouble.

> Waiting for a statistician, and confirming his credentials before you
> believe him above others here, seems like wasted time.
>
> What your statistician will tell you is it that YMMV, depending on the data.

I'm reasonably confident that they'd give me more than that.

> So we'll still need a parameter to fine tune things when the default
> is off. We can argue about the default later, in various level of
> rigour.
>
> Block sampling, with parameter to specify sample size. +1

Again, it isn't as if the likely efficacy of *some* block sampling
approach is in question. I'm sure analyze.c is currently naive about
many things. Everyone knows that there are big gains to be had.
Balancing those gains against the possible downsides in terms of
impact on the quality of statistics generated is pretty nuanced. I do
know enough to know that a lot of thought goes into mitigating and/or
detecting the downsides of block-based sampling.

-- 
Peter Geoghegan




Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Greg Stark
On Wed, Dec 11, 2013 at 12:40 AM, Simon Riggs  wrote:
> When we select a block we should read all rows on that block, to help
> identify the extent of clustering within the data.

So how do you interpret the results of the sample read that way that
doesn't introduce bias?

-- 
greg




Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Simon Riggs
On 11 December 2013 00:28, Greg Stark  wrote:
>On Wed, Dec 11, 2013 at 12:14 AM, Simon Riggs  
> wrote:
>> Block sampling, with parameter to specify sample size. +1
>
> Simon this is very frustrating. Can you define "block sampling"?

Blocks selected using Vitter's algorithm, using a parameterised
fraction of the total.

When we select a block we should read all rows on that block, to help
identify the extent of clustering within the data.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v6.8

2013-12-10 Thread Robert Haas
On Wed, Dec 4, 2013 at 10:55 AM, Andres Freund  wrote:
> I've primarily sent this, because I don't know of further required
> changes in 0001-0003. I am trying reviewing the other patches in detail
> atm.

Committed #3 also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Why we are going to have to go DirectIO

2013-12-10 Thread Andres Freund
On 2013-12-04 05:39:23 -0200, Claudio Freire wrote:
> Problem is, Postgres relies on a working kernel cache for checkpoints.
> Checkpoint logic would have to be heavily reworked to account for an
> impaired kernel cache.

I don't think checkpoints are the critical problem with that, they are
nicely in the background and we could easily add sorting.

Rather I think it would be the writeout of a dirty victim buffer when
acquiring a new buffer.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Greg Stark
On Wed, Dec 11, 2013 at 12:14 AM, Simon Riggs  wrote:
> Block sampling, with parameter to specify sample size. +1

Simon this is very frustrating. Can you define "block sampling"?

-- 
greg




Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Peter Geoghegan
On Tue, Dec 10, 2013 at 3:08 PM, Tom Lane  wrote:
> A different point of view is that it's more or less an implementation
> artifact that pg_stat_statements doesn't already see the cases as
> equivalent; that happens only because it looks at the querytree before
> the planner gets around to constant-folding ARRAY[1,2,3] into the single
> Const '{1,2,3}'::int[].

Right. It's also an implementation artifact, for example, that
pg_stat_statements sees two queries as distinct based solely on the
choice of JOIN syntax for (arguably) semantically equivalent inner
join queries. In a perfect world, I guess it would see them as
non-distinct.

> So my objection to what Peter is suggesting is not that it's a bad idea
> in isolation, but that I don't see where he's going to stop, short of
> reinventing every query-normalization behavior that exists in the planner.
> If this particular case is worthy of fixing with a hack in the
> fingerprinter, aren't there going to be dozens more with just as good
> claims?  (Perhaps not, but what's the basis for thinking this is far
> worse than any other normalization issue?)

The basis is simply that I've heard a number of complaints about it
(I'm shocked at just how big a problem Andres considers this to be),
and yet I've heard no complaints about any of the other cases. No one
cares about those, because those are distinctions that applications
and application frameworks are not at all inclined to unpredictably
inject. Doing some string interpolation to build what will become an
ArrayExpr during parse analysis is a common thing. But unpredictably
switching from one syntax to another equivalent one (like varying JOIN
syntax), with the relevant user code traceable to a single point in
the application, simply doesn't happen in the real world.

> I'm wondering whether this doesn't indicate that we need to rethink where
> the fingerprinter has been plugged in.  I'm not sure that somewhere in
> the planner, post-constant-folding, would be a better place; but it's
> worth thinking about.

Hardly seems worth the trouble. A nice property of pg_stat_statements
is that it has good temporal locality; fingerprinting is just an extra
step of parse analysis. I don't think that this would necessarily be
lost by what you outline, but it'd sure be a concern.

We had this discussion, or at least a similar discussion back when the
normalization work was underway (fingerprint at raw parse tree vs.
post-parse-analysis tree .vs after rule expansion .vs plan tree). My
position at the time was that assigning query execution costs at the
plan granularity is a fine thing, but doing so at the query
granularity is also a fine thing, which is more immediately useful. I
believe this so strongly that I went so far as to write a derivative
of pg_stat_statements to do the fingerprinting at the plan granularity
(while expanding that in a more experimental direction, towards
querying the structure of plans, relatively untested ideas not well
suited to even a contrib module). I could certainly see us
instrumenting plan costs as a separate, perhaps complementary thing,
but it should not happen at the expense of simple blaming of costs on
queries.

So, if you're not talking about blaming plans rather than queries, you
must be talking about making the planner do the constant folding in a
way that facilitates tools like pg_stat_statements in getting the
"essential nature" of the query (*not* the plan) post constant
folding. Which doesn't seem like a good allocation of resources, since
as I said this is the sole complaint of this nature that I've heard,
and people talk to me about pg_stat_statements a lot.

The role of pg_stat_statements is to instrument statements: to assign
costs traceable back to a point in the application. This implementation
artifact makes that harder, because some of the entries (those for the
less frequently executed ArrayExpr element counts) may be evicted while
others are not, and so on.

I just discussed the Django issue with Jacob Kaplan-Moss, Heroku's
director of security, who is on the Django core team. He's going to
take a look at it, and I wouldn't be surprised if it was fixed, but
that's hardly a scalable model.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why we are going to have to go DirectIO

2013-12-10 Thread Jeff Janes
On Tue, Dec 3, 2013 at 11:39 PM, Claudio Freire wrote:

> On Wed, Dec 4, 2013 at 4:28 AM, Tatsuo Ishii  wrote:
> >>> Can we avoid the Linux kernel problem by simply increasing our shared
> >>> buffer size, say up to 80% of memory?
> >> It will be swap more easier.
> >
> > Is that the case? If the system has not enough memory, the kernel
> > buffer will be used for other purpose, and the kernel cache will not
> > work very well anyway. In my understanding, the problem is, even if
> > there's enough memory, the kernel's cache does not work as expected.
>
>
> Problem is, Postgres relies on a working kernel cache for checkpoints.
> Checkpoint logic would have to be heavily reworked to account for an
> impaired kernel cache.
>

I don't think it would need anything more than a sorted checkpoint.  There
are patches around for doing those.  I can dig one up again and rebase it
to HEAD if anyone cares.  What else would be needed checkpoint-wise?

As far as I can tell, the main problem with large shared_buffers is some
poorly characterized locking issues related to either the buffer mapping or
the freelist.  And those locking issues seem to trigger even more poorly
characterized scheduling issues in the kernel, at least in some kernels.

But note that if we did do this (just cranked up shared_buffers so it
takes up 95% of RAM), our own ring buffer access strategy would be even
worse for the case which started this thread than the kernel's policy
being complained of.  That strategy is only acceptable because it
normally sits on top of a substantial cache at the kernel level.


>
> Really, there's no difference between fixing the I/O problems in the
> kernel(s) vs in postgres. The only difference is, in the kernel(s),
> everyone profits, and you've got a huge head start.
>

That assumes the type of problem the kernel faces is the same as the one a
database faces, which I kind of doubt.  Even if the changes were absolute
improvements with no trade-offs, we would need to convince a much larger
community of that fact.


>
> Communicating more with the kernel (through posix_fadvise, fallocate,
> aio, iovec, etc...) would probably be good, but it does expose more
> kernel issues. posix_fadvise, for instance, is a double-edged sword
> ATM. I do believe, however, that exposing those issues and prompting a
> fix is far preferable than silently working around them.
>

Getting the kernel to improve those things so PostgreSQL can be changed to
use them more aggressively seems almost hopeless to me.  PostgreSQL would
have to be coded to take advantage of the improved versions, while
defending itself from the pre-improved versions.  And my understanding is
that different distributions of Linux cherry pick changes to the kernel
back and forth into their code, so just looking at the kernel version
number without also looking at the distribution doesn't mean very much
about whether we have the improved feature or not.  Or am I misinformed
about that?

If we can point out to the kernel hackers things that would be
absolute improvements, where PostgreSQL and everything else just magically
start working better if that improvement makes it in, that is great. But
if both systems have to be changed in sync to derive any benefit, how do we
coordinate that?

Cheers,

Jeff


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Simon Riggs
On 10 December 2013 23:43, Peter Geoghegan  wrote:
> On Tue, Dec 10, 2013 at 3:26 PM, Jim Nasby  wrote:
>>> I agree that looking for information on block level sampling
>>> specifically, and its impact on estimation quality is likely to not
>>> turn up very much, and whatever it does turn up will have patent
>>> issues.
>>
>>
>> We have an entire analytics dept. at work that specializes in finding
>> patterns in our data. I might be able to get some time from them to at least
>> provide some guidance here, if the community is interested. They could
>> really only serve in a consulting role though.
>
> I think that Greg had this right several years ago: it would probably
> be very useful to have the input of someone with a strong background
> in statistics. It doesn't seem that important that they already know a
> lot about databases, provided they can understand what our constraints
> are, and what is important to us. It might just be a matter of having
> them point us in the right direction.

err, so what does stats target mean exactly in statistical theory?
Waiting for a statistician, and confirming his credentials before you
believe him above others here, seems like wasted time.

What your statistician will tell you is that YMMV, depending on the data.

So we'll still need a parameter to fine tune things when the default
is off. We can argue about the default later, at various levels of
rigour.

Block sampling, with parameter to specify sample size. +1
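For reference, the knobs we already have for the row-based sample look
like this (a sketch; table and column names are hypothetical), and a
block sample size parameter would presumably sit alongside them:

  -- per-column target; ANALYZE's row sample scales with it
  ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
  -- or raise the default for everything
  SET default_statistics_target = 500;
  ANALYZE orders;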

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.8

2013-12-10 Thread Robert Haas
On Wed, Dec 4, 2013 at 10:55 AM, Andres Freund  wrote:
> On 2013-12-03 15:19:26 -0500, Robert Haas wrote:
>> Yeah, you're right.  I think the current logic will terminate when all
>> flags are set to false or all attribute numbers have been checked, but
>> it doesn't know that if HOT's been disproven then we needn't consider
>> further HOT columns.  I think the way to fix that is to tweak this
>> part:
>>
>> +   if (next_hot_attnum > FirstLowInvalidHeapAttributeNumber)
>> check_now = next_hot_attnum;
>> +   else if (next_key_attnum > FirstLowInvalidHeapAttributeNumber)
>> +   check_now = next_key_attnum;
>> +   else if (next_id_attnum > FirstLowInvalidHeapAttributeNumber)
>> +   check_now = next_id_attnum;
>> else
>> +   break;
>>
>> What I think we ought to do there is change each of those criteria to
>> say if (hot_result && next_hot_attnum >
>> FirstLowInvalidHeapAttributeNumber) and similarly for the other two.
>> That way we consider each set a valid source of attribute numbers only
>> until the result flag for that set flips false.
>
> That seems to work well, yes.
>
> Updated & rebased series attached.
>
> * Rebased since the former patch 01 has been applied
> * Lots of smaller changes in the wal_level=logical patch
>   * Use Robert's version of wal_level=logical, with the above fixes
>   * Use only macros for RelationIsAccessibleInLogicalDecoding/LogicallyLogged
>   * Moved a bit more logic into ExtractReplicaIdentity
>   * some comment copy-editing
>   * Bug noted by Euler fixed, testcase added
> * Some copy editing in later patches, nothing significant.
>
> I've primarily sent this, because I don't know of further required
> changes in 0001-0003. I am trying reviewing the other patches in detail
> atm.

Committed #1 (again).  Regarding this:

+   /* XXX: we could also do this unconditionally, the space is used anyway */
+   if (copy_oid)
+   HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp));

I would like to put in a big +1 for doing that unconditionally.  I
didn't make that change before committing, but I think it'd be a very
good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Peter Geoghegan
On Tue, Dec 10, 2013 at 3:26 PM, Jim Nasby  wrote:
>> I agree that looking for information on block level sampling
>> specifically, and its impact on estimation quality is likely to not
>> turn up very much, and whatever it does turn up will have patent
>> issues.
>
>
> We have an entire analytics dept. at work that specializes in finding
> patterns in our data. I might be able to get some time from them to at least
> provide some guidance here, if the community is interested. They could
> really only serve in a consulting role though.

I think that Greg had this right several years ago: it would probably
be very useful to have the input of someone with a strong background
in statistics. It doesn't seem that important that they already know a
lot about databases, provided they can understand what our constraints
are, and what is important to us. It might just be a matter of having
them point us in the right direction.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why we are going to have to go DirectIO

2013-12-10 Thread Jim Nasby

Just to add a data point (and sorry, I can't find where someone was talking 
about numbers in the thread)...

For a while earlier this year we were running a 3.x kernel and saw a very 
modest (1-2%) improvement in overall performance. This would be on a server 
with 512G RAM running ext4.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-10 Thread Tom Lane
Noah Misch  writes:
> On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote:
>> Let's not add more cases like that, if we can avoid it.

> Only if we can avoid it for a modicum of effort and feature compromise.
> You're asking for PostgreSQL to reshape its use of persistent resources so you
> can throw around "killall -9 postgres; rm -rf $PGDATA" without so much as a
> memory leak.  That use case, not PostgreSQL, has the defect here.

The larger point is that such a shutdown process has never in the history
of Postgres been successful at removing shared-memory (or semaphore)
resources.  I do not feel a need to put a higher recoverability standard
onto the DSM code than what we've had for fifteen years with SysV shmem.

But, by the same token, it shouldn't be any *less* recoverable.  In this
context that means that starting a new postmaster should recover the
resources, at least 90% of the time (100% if we still have the old
postmaster lockfile).  I think the idea of keeping enough info in the
SysV segment to permit recovery of DSM resources is a good one.  Then,
any case where the existing logic is able to free the old SysV segment
is guaranteed to also work for DSM.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Jim Nasby

On 12/10/13 2:17 PM, Peter Geoghegan wrote:

On Tue, Dec 10, 2013 at 11:59 AM, Greg Stark  wrote:

But I don't really think this is the right way to go about this.
Research papers are going to turn up pretty specialized solutions that
are probably patented. We don't even have the basic understanding we
need. I suspect a basic textbook chapter on multistage sampling will
discuss at least the standard techniques.


I agree that looking for information on block level sampling
specifically, and its impact on estimation quality is likely to not
turn up very much, and whatever it does turn up will have patent
issues.


We have an entire analytics dept. at work that specializes in finding patterns 
in our data. I might be able to get some time from them to at least provide 
some guidance here, if the community is interested. They could really only 
serve in a consulting role though.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Daniel Farina
On Tue, Dec 10, 2013 at 3:08 PM, Tom Lane  wrote:
> So my objection to what Peter is suggesting is not that it's a bad idea
> in isolation, but that I don't see where he's going to stop, short of
> reinventing every query-normalization behavior that exists in the planner.
> If this particular case is worthy of fixing with a hack in the
> fingerprinter, aren't there going to be dozens more with just as good
> claims?  (Perhaps not, but what's the basis for thinking this is far
> worse than any other normalization issue?)

Qualitatively, the dynamic length values list is the worst offender.

There is no algebraic solution to where to stop with normalizations,
but as Peter points out, that bridge has been crossed already:
assumptions have already been made that toss some potentially useful
information, and yet the program is undoubtedly practical.

Based on my own experience (which sounds similar to Andres's), I am
completely convinced the canonicalization he proposes here is more
practical than the current definition to a large degree, so I suggest
the goal is worthwhile.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-10 Thread Andres Freund
On 2013-12-10 18:12:53 -0500, Noah Misch wrote:
> On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote:
> > On 12/10/2013 07:27 PM, Noah Misch wrote:
> > >On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote:
> > Let's not add more cases like that, if we can avoid it.
> 
> Only if we can avoid it for a modicum of effort and feature compromise.
> You're asking for PostgreSQL to reshape its use of persistent resources so you
> can throw around "killall -9 postgres; rm -rf $PGDATA" without so much as a
> memory leak.  That use case, not PostgreSQL, has the defect here.

Empathically seconded.

> > > Another refinement is to wait for all the processes to attach before 
> > > setting
> > > the segment's size with ftruncate(). That way, when the window is open for
> > > leaking the segment, it's still 0-sized so leaking it is not a big deal.
> 
> > >and it is less
> > >general: not every use of DSM is conducive to having all processes attach 
> > >in a
> > >short span of time.

> > Let's cross that bridge when we get there. AFAICS it fits all the
> > use cases discussed this far.

The primary use case I have for dsm, namely writing extensions that can
use shared memory without having to be listed in
shared_preload_libraries, certainly wouldn't work in any sensible way
with such a restriction.
And I don't think that's an insignificant usecase.

So I really fail to see what this would buy us.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-10 Thread Noah Misch
On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote:
> On 12/10/2013 07:27 PM, Noah Misch wrote:
> >On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote:
> >>>On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas 
> >>> wrote:
> * As discussed in the "Something fishy happening on frogmouth" thread, I
> don't like the fact that the dynamic shared memory segments will be
> permanently leaked if you kill -9 postmaster and destroy the data 
> directory.

> >>I really think we need to do something about it. To use your earlier
> >>example of parallel sort, it's not acceptable to permanently leak a 512
> >>GB segment on a system with 1 TB of RAM.
> >
> >I don't.  Erasing your data directory after an unclean shutdown voids any
> >expectations for a thorough, automatic release of system resources.  Don't do
> >that.  The next time some new use of a persistent resource violates your hope
> >for this scenario, there may be no remedy.
> 
> Well, the point of erasing the data directory is to release system
> resources. I would normally expect "killall -9 ; rm -rf
> " to thoroughly get rid of the running program and all the
> resources. It's surprising enough that the regular shared memory
> segment is left behind

Your expectation is misplaced.  Processes and files are simply not the only
persistent system resources of interest.

> but at least that one gets cleaned up when
> you start a new server (on same port).

In the most-typical case, yes.  In rare cases involving multiple postmasters
starting and stopping, the successor to the erased data directory will not
clean up the sysv segment.

> Let's not add more cases like that, if we can avoid it.

Only if we can avoid it for a modicum of effort and feature compromise.
You're asking for PostgreSQL to reshape its use of persistent resources so you
can throw around "killall -9 postgres; rm -rf $PGDATA" without so much as a
memory leak.  That use case, not PostgreSQL, has the defect here.

> BTW, what if the data directory is seriously borked, and the server
> won't start? Sure, don't do that, but it would be nice to have a way
> to recover if you do anyway. (docs?)

If something is corrupting your data directory in an open-ended manner, you
have bigger problems than a memory leak until reboot.  Recovering DSM happens
before we read the control file, so the damage would need to fall among a
short list of files for this to happen (bugs excluded).  Nonetheless, I don't
object to documenting the varieties of system resources that PostgreSQL may
reserve and referencing the OS facilities for inspecting them.

Are you actually using PostgreSQL this way: frequent "killall -9 postgres; rm
-rf $PGDATA" after arbitrarily-bad $PGDATA corruption?  Some automated fault
injection test rig, perhaps?

> >>One idea is to create the shared memory object with shm_open, and wait
> >>until all the worker processes that need it have attached to it. Then,
> >>shm_unlink() it, before using it for anything. That way the segment will
> >>be automatically released once all the processes close() it, or die. In
> >>particular, kill -9 will release it. (This is a variant of my earlier
> >>idea to create a small number of anonymous shared memory file
> >>descriptors in postmaster startup with shm_open(), and pass them down to
> >>child processes with fork()). I think you could use that approach with
> >>SysV shared memory as well, by destroying the segment with
> >>shmctl(IPC_RMID) immediately after all processes have attached to it.
> >
> >That leaves a window in which we still leak the segment,
> 
> A small window is better than a large one.

Yes.

> Another refinement is to wait for all the processes to attach before
> setting the segment's size with ftruncate(). That way, when the
> window is open for leaking the segment, it's still 0-sized so
> leaking it is not a big deal.
> 
> >and it is less
> >general: not every use of DSM is conducive to having all processes attach in 
> >a
> >short span of time.
> 
> Let's cross that bridge when we get there. AFAICS it fits all the
> use cases discussed this far.

It does fit the use cases discussed thus far.

nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Tom Lane
Robert Haas  writes:
> Right, but the flip side is that you could collapse things that people
> don't want collapsed.  If you've got lots of query that differ only in
> that some of them say user_id IN (const1, const2) and others say
> user_id IN (const1, const2, const3) and the constants vary a lot, then
> of course this seems attractive.  On the other hand if you have two
> queries and one of them looks like this:

> WHERE status IN ('active') AND user_id = ?

> and the other looks like this:

> WHERE status IN ('inactive', 'deleted') AND user_id = ?

> ...it might actually annoy you to have those two things conflated;
> it's easy to imagine one having much different performance
> characteristics than the other.

Of course, "status = 'active'" and "status = 'deleted'" might have very
different performance characteristics all by themselves, yet we've
already hard-wired a decision that pg_stat_statements will conflate them.
So I don't think the above argument holds a lot of water.

A different point of view is that it's more or less an implementation
artifact that pg_stat_statements doesn't already see the cases as
equivalent; that happens only because it looks at the querytree before
the planner gets around to constant-folding ARRAY[1,2,3] into the single
Const '{1,2,3}'::int[].

So my objection to what Peter is suggesting is not that it's a bad idea
in isolation, but that I don't see where he's going to stop, short of
reinventing every query-normalization behavior that exists in the planner.
If this particular case is worthy of fixing with a hack in the
fingerprinter, aren't there going to be dozens more with just as good
claims?  (Perhaps not, but what's the basis for thinking this is far
worse than any other normalization issue?)

I'm wondering whether this doesn't indicate that we need to rethink where
the fingerprinter has been plugged in.  I'm not sure that somewhere in
the planner, post-constant-folding, would be a better place; but it's
worth thinking about.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reference to parent query from ANY sublink

2013-12-10 Thread Kevin Grittner
Antonin Houska  wrote:

> I used the DDLs attached (tables.ddl) for this query too, not
> only for the queries in quaries.sql. Yes, if I had mentioned it
> and/or qualified the 'k' column reference, it wouldn't have
> broken anything.

Apologies; I missed the attachments.  It makes a lot more sense now
that I see those.

I see this was a patch originally posted on 2013-10-31 and not
added to the CommitFest.

I applied it to master and ran the regression tests, and one of the
subselect tests failed.

This query:

SELECT '' AS six, f1 AS "Correlated Field", f3 AS "Second Field"
  FROM SUBSELECT_TBL upper
  WHERE f1 IN
    (SELECT f2 FROM SUBSELECT_TBL WHERE CAST(upper.f2 AS float) = f3);

Should have this for a result set:

 six | Correlated Field | Second Field
-----+------------------+--------------
     |                2 |            4
     |                3 |            5
     |                1 |            1
     |                2 |            2
     |                3 |            3
(5 rows)

But during the `make check` or `make install-check` it is missing
the last two rows.  Oddly, if I go into the database later and try
it, the rows show up.  It's not immediately apparent to me what's
wrong.

Will look again later, or maybe someone more familiar with the
planner can spot the problem.
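A first debugging step might be to compare the plans with and without
the rest of the regression suite having run (a sketch, reusing the
query above minus the dummy column):

  EXPLAIN (COSTS OFF)
  SELECT f1, f3
    FROM subselect_tbl upper
   WHERE f1 IN
     (SELECT f2 FROM subselect_tbl WHERE CAST(upper.f2 AS float) = f3);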

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Jeff Janes
On Mon, Dec 9, 2013 at 2:37 PM, Robert Haas  wrote:

> On Mon, Dec 9, 2013 at 4:18 PM, Jeff Janes  wrote:
> > My reading of the code is that if it is not in the MCV, then it is
> assumed
> > to have the average selectivity (about 1/n_distinct, but deflating top
> and
> > bottom for the MCV list).  There is also a check that it is less than the
> > least common of the MCV, but I don't know why that situation would ever
> > prevail--that should always be higher or equal to the average
> selectivity.
>
> I've never seen an n_distinct value of more than 5 digits, regardless
> of reality.  Typically I've seen 20-50k, even if the real number is
> much higher.



I don't recall seeing an n_distinct that is literally above 100,000 in the
wild, but I've seen negative ones that multiply with reltuples to give
values more than that.  In test cases it is easy enough to get values in
the millions by creating tables using floor(random()*$something).

create table baz as select floor(random()*1000), md5(random()::text)
from generate_series(1,1);
create table baz2 as select * from baz order by floor;
create table baz3 as select * from baz order by md5(floor::text);

baz unclustered, baz2 is clustered with perfect correlation, baz3 is
clustered but without correlation.

After analyzing all of them:

select tablename, n_distinct,correlation  from pg_stats where tablename
 like 'baz%' and attname='floor'  ;
 tablename | n_distinct  | correlation
-----------+-------------+-------------
 baz       | 8.56006e+06 |  0.00497713
 baz2      |      333774 |           1
 baz3      |      361048 |  -0.0118147

So baz is pretty close, while the other two are way off.  But the
n_distinct computation doesn't depend on the order of the rows, as far as I
can tell, but only on the composition of the sample.  So this can only mean
that our "random" sampling method is desperately non-random, right?
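For comparison, the exact counts are easy enough to get directly, if
slowly (a sketch):

select count(distinct floor) from baz;
select count(distinct floor) from baz2;
select count(distinct floor) from baz3;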



>  But the n_distinct value is only for non-MCVs, so if we
> estimate the selectivity of column = 'rarevalue' to be
> (1-nullfrac-mcvfrac)/n_distinct, then making mcvfrac bigger reduces
> the estimate, and making the MCV list longer naturally makes mcvfrac
> bigger.


Ah, I see.  By including more things into MCV, we crowd out the rest of the
space implicitly.
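To make that concrete, the estimate for a non-MCV value can be
approximated straight from pg_stats (a sketch; assumes a positive
n_distinct, as in the baz example above):

select (1 - null_frac
          - coalesce((select sum(f) from unnest(most_common_freqs) as f), 0)
       ) / n_distinct as est_nonmcv_selectivity
from pg_stats
where tablename = 'baz' and attname = 'floor';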




> I'm not sure how important the
> less-frequent-than-the-least-common-MCV part is, but I'm very sure
> that raising the statistics target helps to solve the problem of
> overestimating the prevalence of uncommon values in a very big table.
>
> > I think that parts of the planner are N^2 in the size of histogram (or
> was
> > that the size of the MCV list?).  So we would probably need a way to use
> a
> > larger sample size to get more accurate n_distinct and MCV frequencies,
> but
> > not save the entire histogram that goes with that sample size.
>
> I think the saving the histogram part is important.  As you say, the
> MCVs are important for a variety of planning purposes, such as hash
> joins.  More than that, in my experience, people with large tables are
> typically very willing to spend more planning time to get a better
> plan, because mistakes are expensive and the queries are likely to run
> for a while anyway.  People with small tables care about planning

> time, because it makes no sense to spend an extra 1ms planning a query
> unless you improve the plan by enough to save at least 1ms when
> executing it, and when the tables are small and access is expected to
> be fast anyway that's often not the case.
>

I would think that the dichotomy is more about the size of the query-plan
than of the tables.  I think a lot of people with huge tables end up doing
mostly indexed lookups in unique or highly selective indexes, once all the
planning is done.

Does anyone have generators for examples of cases where increasing the
sample size to get better histograms (as opposed more accurate n_distinct
or more accurate MCV) was the key to fixing bad plans?  I wonder if it is
more bins, or more accurate boundaries, that makes the difference.
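One candidate generator, for what it's worth (a sketch): a skewed
distribution where the accuracy of the histogram bucket boundaries is
what drives the range estimates.

create table skewed as
  select (random() ^ 8 * 1000000)::int as v
  from generate_series(1, 1000000);
analyze skewed;
explain select count(*) from skewed where v between 900000 and 1000000;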

Cheers,

Jeff


Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Peter Geoghegan
On Tue, Dec 10, 2013 at 2:55 PM, Peter Geoghegan  wrote:
> You might get lucky and have this exact case, and be able to
> leverage the knowledge that the 2 constants in the ArrayExpr must be
> the latter and 1 must be the former, but experience suggests very
> probably not.

When things get this bad, you'd probably be better off adding a
tautology to the query with the problematic constants (or, dare I say,
installing pg_stat_plans). That seems weird, but then if you were able
to recognize the case you described without one it's nothing more than
dumb luck.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Andres Freund
On 2013-12-10 17:46:56 -0500, Robert Haas wrote:
> On Tue, Dec 10, 2013 at 5:38 PM, Andres Freund  wrote:
> > On 2013-12-10 14:30:36 -0800, Peter Geoghegan wrote:
> >> Did you really find pg_stat_statements to be almost useless in such
> >> situations? That seems worse than I thought.
> >
> > It's very hard to see where you should spend efforts when every "logical
> > query" is split into hundreds of pg_stat_statement entries. Suddenly
> > it's important whether a certain counts of parameters are more frequent
> > than others because in the equally distributed cases they fall out of
> > p_s_s again pretty soon. I think that's probably a worse than average
> > case, but certainly not something only I could have the bad fortune of
> > looking at.
> 
> Right, but the flip side is that you could collapse things that people
> don't want collapsed.  If you've got lots of query that differ only in
> that some of them say user_id IN (const1, const2) and others say
> user_id IN (const1, const2, const3) and the constants vary a lot, then
> of course this seems attractive.

Yea, completely agreed. It might also lead to users missing the fact
that their precious prepared-statement cache is just using up loads of
backend memory and individual prepared statements are seldom
re-executed because there are so many...

>  On the other hand if you have two
> queries and one of them looks like this:
> 
> WHERE status IN ('active') AND user_id = ?
> 
> and the other looks like this:
> 
> WHERE status IN ('inactive', 'deleted') AND user_id = ?

That too.

> Part of me wonders if the real solution here is to invent a way to
> support an arbitrarily large hash table of entries efficiently, and
> then let people do further roll-ups of the data in userland if they
> don't like our rollups.  Part of the pain here is that when you
> overflow the hash table, you start losing information that can't be
> recaptured after the fact.  If said hash table were by chance also
> suitable for use as part of the stats infrastructure, in place of the
> whole-file-rewrite technique we use today, massive win.
> 
> Of course, even if we had all this, it wouldn't necessarily make doing
> additional rollups *easy*; it's easy to construct cases that can be
> handled much better given access to the underlying parse tree
> representation than they can be with sed and awk.  But it's a thought.

That would obviously be neat, but I have roughly no clue how to achieve
that. Granular control over how such rollups would work sounds very hard
to achieve unless that granular control just is getting passed a tree
and returning another.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Peter Geoghegan
On Tue, Dec 10, 2013 at 2:46 PM, Robert Haas  wrote:
> Right, but the flip side is that you could collapse things that people
> don't want collapsed.  If you've got lots of query that differ only in
> that some of them say user_id IN (const1, const2) and others say
> user_id IN (const1, const2, const3) and the constants vary a lot, then
> of course this seems attractive.  On the other hand if you have two
> queries and one of them looks like this:
>
> WHERE status IN ('active') AND user_id = ?
>
> and the other looks like this:
>
> WHERE status IN ('inactive', 'deleted') AND user_id = ?
>
> ...it might actually annoy you to have those two things conflated;
> it's easy to imagine one having much different performance
> characteristics than the other.

That is true, but you're probably not going to be able to make any use
of the distinction in the first place, because it's just going to be a
bunch of ? ? characters as constants. You'd have to plan ahead most
likely. You might get lucky and have this exact case, and be able to
leverage the knowledge that the 2 constants in the ArrayExpr must be
the latter and 1 must be the former, but experience suggests very
probably not.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Robert Haas
On Tue, Dec 10, 2013 at 5:38 PM, Andres Freund  wrote:
> On 2013-12-10 14:30:36 -0800, Peter Geoghegan wrote:
>> Did you really find pg_stat_statements to be almost useless in such
>> situations? That seems worse than I thought.
>
> It's very hard to see where you should spend efforts when every "logical
> query" is split into hundreds of pg_stat_statement entries. Suddenly
> it's important whether a certain counts of parameters are more frequent
> than others because in the equally distributed cases they fall out of
> p_s_s again pretty soon. I think that's probably a worse than average
> case, but certainly not something only I could have the bad fortune of
> looking at.

Right, but the flip side is that you could collapse things that people
don't want collapsed.  If you've got lots of query that differ only in
that some of them say user_id IN (const1, const2) and others say
user_id IN (const1, const2, const3) and the constants vary a lot, then
of course this seems attractive.  On the other hand if you have two
queries and one of them looks like this:

WHERE status IN ('active') AND user_id = ?

and the other looks like this:

WHERE status IN ('inactive', 'deleted') AND user_id = ?

...it might actually annoy you to have those two things conflated;
it's easy to imagine one having much different performance
characteristics than the other.

Part of me wonders if the real solution here is to invent a way to
support an arbitrarily large hash table of entries efficiently, and
then let people do further roll-ups of the data in userland if they
don't like our rollups.  Part of the pain here is that when you
overflow the hash table, you start losing information that can't be
recaptured after the fact.  If said hash table were by chance also
suitable for use as part of the stats infrastructure, in place of the
whole-file-rewrite technique we use today, massive win.
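As a crude illustration of the userland roll-up idea, something along
these lines already works against pg_stat_statements today (a sketch;
the regexp is just for illustration, and normalized constants show up
as ? in the query text):

SELECT regexp_replace(query, 'IN \([^)]*\)', 'IN (...)', 'g') AS rolled_up,
       sum(calls) AS calls,
       sum(total_time) AS total_time
FROM pg_stat_statements
GROUP BY 1
ORDER BY sum(total_time) DESC
LIMIT 20;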

Of course, even if we had all this, it wouldn't necessarily make doing
additional rollups *easy*; it's easy to construct cases that can be
handled much better given access to the underlying parse tree
representation than they can be with sed and awk.  But it's a thought.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Peter Geoghegan
On Tue, Dec 10, 2013 at 2:38 PM, Andres Freund  wrote:
> It's very hard to see where you should spend efforts when every "logical
> query" is split into hundreds of pg_stat_statement entries. Suddenly
> it's important whether a certain counts of parameters are more frequent
> than others because in the equally distributed cases they fall out of
> p_s_s again pretty soon. I think that's probably a worse than average
> case, but certainly not something only I could have the bad fortune of
> looking at.

Another problem is that creating a new entry is relatively expensive,
because we need to acquire an exclusive lock to do so. If there was a
lot of churn, I'd worry about the performance overhead of
pg_stat_statements. It could be quite a lot higher than necessary.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Andres Freund
On 2013-12-10 14:30:36 -0800, Peter Geoghegan wrote:
> Did you really find pg_stat_statements to be almost useless in such
> situations? That seems worse than I thought.

It's very hard to see where you should spend efforts when every "logical
query" is split into hundreds of pg_stat_statement entries. Suddenly
it's important whether a certain counts of parameters are more frequent
than others because in the equally distributed cases they fall out of
p_s_s again pretty soon. I think that's probably a worse than average
case, but certainly not something only I could have the bad fortune of
looking at.
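A quick way to see how bad it is on a given system (a sketch, assuming
pg_stat_statements is installed): group entries by a prefix of the
normalized text and look for the fan-out.

SELECT left(query, 60) AS query_prefix,
       count(*) AS variants,
       sum(calls) AS calls
FROM pg_stat_statements
GROUP BY 1
ORDER BY variants DESC
LIMIT 20;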

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Peter Geoghegan
On Tue, Dec 10, 2013 at 1:41 PM, Andres Freund  wrote:
> FWIW, I hit exactly this issue every single time I have looked at
> pg_stat_statements in some client's database making it nearly useless
> for analysis. So improving it might be worthwile.

I think the thing that makes me lean towards doing this is the fact
that pgFouine and pgBadger have been doing something similar for
years, and those tools continue to be much more popular than I thought
they'd be now that pg_stat_statements with normalization is fairly
widely available. Yes, of course the problem can be solved by using "
= ANY(array)". But some people are lazy, or uninformed, or they don't
directly control this. As I mentioned, this has traditionally been a
problem with Django, where I imagine the fix is non-trivial. I haven't
asked, but it isn't too hard to imagine that the Django guys are
probably not all that keen on doing a lot of work that will only be
applicable to Postgres.

Did you really find pg_stat_statements to be almost useless in such
situations? That seems worse than I thought.

I guess it in no small part comes down to what the long term future of
the query finger-printer is.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Tom Lane
Robert Haas  writes:
> On Tue, Dec 10, 2013 at 4:30 AM, Peter Geoghegan  wrote:
>> pg_stat_statements' fingerprinting logic considers the following two
>> statements as distinct:
>> 
>> select 1 in (1, 2, 3);
>> select 1 in (1, 2, 3, 4);
>> 
>> [ and some people think it shouldn't ]

> I am very wary of implementing special-case logic here even though I
> know it could be useful to some people, simply because I fear that
> there could be a near-infinite variety of situations where, in a
> particular environment, a particular distinction isn't important.

FWIW, I didn't much care for this idea either, though Robert's expressed
it better than what was rattling around in my brain this morning.  There's
a very slippery slope from here to inserting all sorts of random hacks
into the query fingerprinter, and that's not someplace I want to go.

There are alternatives that the requestor could consider for making these
cases more alike, such as supplying the set of IDs as an array constant
(or parameter) to begin with.
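For instance (a sketch; the table and values are hypothetical), instead
of interpolating the list into the text:

SELECT * FROM users WHERE id IN (1, 2, 3);
SELECT * FROM users WHERE id IN (1, 2, 3, 4);   -- separate entry today

one can pass the whole list as a single array parameter, which
fingerprints the same no matter how many elements it carries:

PREPARE get_users(int[]) AS
    SELECT * FROM users WHERE id = ANY ($1);
EXECUTE get_users('{1,2,3}');
EXECUTE get_users('{1,2,3,4}');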

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stats for network traffic WIP

2013-12-10 Thread Tom Lane
Robert Haas  writes:
> Yes, I think the overhead of this patch is far, far too high to
> contemplate applying it.  It sends a stats collector message after
> *every socket operation*.  Once per transaction would likely be too
> much overhead already (think: pgbench -S) but once per socket op is
> insane.

Oh, is that what the problem is?  That seems trivially fixable --- only
flush the data to the collector once per query or so.  I'd be a bit
inclined to add it to the existing transaction-end messages instead of
adding any new traffic.

> Moreover, even if we found some way to reduce that overhead to an
> acceptable level, I think a lot of people would be unhappy about the
> statsfile bloat.

This could be a bigger problem, but what are we aggregating over?
If the stats are only recorded at say the database level, that's not
going to take much space.

Having said that, I can't get very excited about this feature anyway,
so I'm fine with rejecting the patch.  I'm not sure that enough people
care to justify any added overhead at all.  The long and the short of
it is that network traffic generally is what it is, for any given query
workload, and so it's not clear what's the point of counting it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Greg Stark
On Tue, Dec 10, 2013 at 7:49 PM, Peter Geoghegan  wrote:
>> Back in 2005/6, I advocated a block sampling method, as described by
>> Chaudri et al (ref?)
>
> I don't think that anyone believes that not doing block sampling is
> tenable, fwiw. Clearly some type of block sampling would be preferable
> for most or all purposes.

We do block sampling now. But then we select rows from those blocks uniformly.
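For reference, the number of rows that ends up in that sample scales
with the statistics target (roughly 300 * target rows), which ANALYZE
VERBOSE will report (a sketch; the table name is hypothetical):

SET default_statistics_target = 500;
ANALYZE VERBOSE some_table;
-- expect an INFO line along the lines of: scanned N of M pages,
-- containing X live rows; 150000 rows in sample, ...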

Incidentally, if you mean Surajit Chaudhuri, he's a Microsoft Research
lead so I would be nervous about anything he's published being
patented by Microsoft.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Josh Berkus
On 12/10/2013 01:33 PM, Mark Kirkwood wrote:
> Yeah - and we seem to be back to Josh's point about needing 'some math'
> to cope with the rows within a block not being a purely random selection.

Well, sometimes they are effectively random.  But sometimes they are
not.  The Chaudri et al paper had a formula for estimating randomness
based on the grouping of rows in each block, assuming that the sampled
blocks were widely spaced (if they aren't there's not much you can do).
 This is where you get up to needing a 5% sample; you need to take
enough blocks that you're confident that the blocks you sampled are
representative of the population.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Andres Freund
On 2013-12-10 16:09:02 -0500, Robert Haas wrote:
> On Tue, Dec 10, 2013 at 4:30 AM, Peter Geoghegan  wrote:
> > I'm not sure that I agree, but there is anecdata that suggests that it
> > isn't uncommon for these sorts of queries to be broken out when
> > they're all traceable back to a single point in the application
> > (apparently it's common for Django apps to do so, perhaps
> > questionably). If we assume that doing what I've described has no real
> > downside, then it would probably be worth implementing. Plus I'm
> > pretty sure that tools that do regex normalization are already doing
> > something analogous. Thoughts?
> 
> Sounds like this:
> 
> // What's the worst thing that could happen?
> pandoras_box.open();
> 
> I am very wary of implementing special-case logic here even though I
> know it could be useful to some people, simply because I fear that
> there could be a near-infinite variety of situations where, in a
> particular environment, a particular distinction isn't important.  We
> won't be serving anyone well if we ignore all of those distinctions,
> because not everyone will want to ignore all of them.  And adding a
> configuration option for each one doesn't sound like a good idea,
> either.  *Maybe* this particular case is narrow enough that it's OK to
> just ignore it unconditionally and maybe that'd be OK ... but I fear
> this is a rathole.

FWIW, I hit exactly this issue every single time I have looked at
pg_stat_statements in some client's database making it nearly useless
for analysis. So improving it might be worthwile.

On the other hand, so far all could fix their application in a
reasonable amount of time to generate = ANY(array) queries
instead. Which probably isn't a bad idea either, considering they now
use far fewer prepared statements.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Peter Geoghegan
On Tue, Dec 10, 2013 at 1:09 PM, Robert Haas  wrote:
> I am very wary of implementing special-case logic here even though I
> know it could be useful to some people, simply because I fear that
> there could be a near-infinite variety of situations where, in a
> particular environment, a particular distinction isn't important.

I am too, which is why I asked.

We're already in the business of deciding what is and isn't essential
to a query in this way. For example, we already determine that
Var.varcollid shouldn't appear in a query jumble - there is no better
reason for that then "it would hurt more than it helped", even though
it's possible that someone could care about such a distinction. Now, I
have no intention of avoiding the issue with a relativistic argument
("who is to say what the essential nature of a query is anyway?"), but
I know doctrinarianism isn't helpful either.

I do think I know who should determine what is the essential nature of
a query for fingerprinting purposes: we should. We should pick the
scheme that is most widely useful, while weighing the worst case. I'm
not asserting that this is closer to that, but it might be.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Mark Kirkwood

On 11/12/13 09:19, Heikki Linnakangas wrote:

On 12/10/2013 10:00 PM, Simon Riggs wrote:

On 10 December 2013 19:54, Josh Berkus  wrote:

On 12/10/2013 11:49 AM, Peter Geoghegan wrote:
On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs 
 wrote:

I don't think that anyone believes that not doing block sampling is
tenable, fwiw. Clearly some type of block sampling would be preferable
for most or all purposes.


As discussed, we need math though.  Does anyone have an ACM 
subscription

and time to do a search?  Someone must.  We can buy one with community
funds, but no reason to do so if we don't have to.


We already have that, just use Vitter's algorithm at the block level
rather than the row level.


And what do you do with the blocks? How many blocks do you choose? 
Details, please.





Yeah - and we seem to be back to Josh's point about needing 'some math' 
to cope with the rows within a block not being a purely random selection.


Regards

Mark





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql_check_function - rebase for 9.3

2013-12-10 Thread Josh Berkus
On 12/10/2013 12:50 PM, Tom Lane wrote:
> One would hope that turning off check_function_bodies would be sufficient
> to disable any added checking, though, so I don't see this being a problem
> for pg_dump.  But there might be other scenarios where an additional knob
> would be useful.

I can't think of one, offhand.  And +1 for NOT adding a new knob.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why standby.max_connections must be higher than primary.max_connections?

2013-12-10 Thread Robert Haas
On Tue, Dec 10, 2013 at 3:34 AM, 山田聡  wrote:
> Hello hackers.
>
> I am struggling to understand why standby.max_connections must be higher
> than
> primary.max_connections. Does someone know the reason why?

Because the KnownAssignedXIDs and lock tables on the standby need to
be large enough to contain the largest snapshot and greatest number of
AccessExclusiveLocks that could exist on the master at any given time.

> I think this restriction obstructs flexible load balancing.
> I'd like to use the standby database for load balancing. Of course
> the role of a standby server is different from that of a master server.
> So I think it is natural to want to set standby.max_connections less
> than
> primary.max_connections.

Your load balancer should probably have a configuration setting that
controls how many connections it will try to make to the database, and
you can set that to a lower value than max_connections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2013-12-10 Thread Robert Haas
On Tue, Dec 10, 2013 at 4:04 PM, Alvaro Herrera
 wrote:
> Robert Haas escribió:
>> On Tue, Dec 10, 2013 at 4:56 AM, Heikki Linnakangas
>>  wrote:
>> > Speaking of the functionality this does offer, it seems pretty limited. A
>> > commit timestamp is nice, but it isn't very interesting on its own. You
>> > really also want to know what the transaction did, who ran it, etc. ISTM
>> some kind of an auditing or log-parsing system that could tell you all that
>> > would be much more useful, but this patch doesn't get us any closer to 
>> > that.
>>
>> For what it's worth, I think that this has been requested numerous
>> times over the years by numerous developers of replication solutions.
>> My main question (apart from whether or not it may have bugs) is
>> whether it makes a noticeable performance difference.  If it does,
>> that sucks.  If it does not, maybe we ought to enable it by default.
>
> I expect it will have some performance impact -- this is why we made it
> disable-able in the first place, and why I went to the trouble of
> ensuring it can be turned on after initdb.  Normal pg_clog entries are 2
> bits per transaction, whereas the commit timestamp stuff adds 12 *bytes*
> per transaction.  Not something to be taken lightly, hence it's off by
> default.  Presumably people who are using one of those replication
> systems are okay with taking some (reasonable) performance hit.

Well, writing 12 extra bytes (why not 8?) on each commit is not
intrinsically that expensive.  The (poor) design of SLRU might make it
expensive, though, because since it has no fsync absorption queue, so
sometimes you end up waiting for an fsync, and doing that 48x more
often will indeed have some cost.  :-(
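To put rough numbers on it (a sketch): 12 bytes is 96 bits, i.e. 48x
the pg_clog footprint per transaction, which is where the 48x fsync
frequency comes from.

-- per 1 million transactions:
SELECT pg_size_pretty((1000000 * 2 / 8)::bigint) AS clog_bytes,     -- ~244 kB
       pg_size_pretty((1000000 * 12)::bigint)    AS committs_bytes; -- ~11 MB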

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr

2013-12-10 Thread Robert Haas
On Tue, Dec 10, 2013 at 4:30 AM, Peter Geoghegan  wrote:
> pg_stat_statements' fingerprinting logic considers the following two
> statements as distinct:
>
> select 1 in (1, 2, 3);
> select 1 in (1, 2, 3, 4);
>
> This is because the ArrayExpr jumble case jumbles any ArrayExpr's list
> of elements recursively. In this case it's a list of Const nodes, and
> the fingerprinting logic jumbles those nodes indifferently.
>
> Somebody told me that they think that pg_stat_statements should not do
> that. This person felt that it would be preferable for such
> expressions to be normalized without regard to the number of distinct
> Const elements. I suppose that that would work by determing if the
> ArrayExpr elements list was a list of Const nodes and only const
> nodes. Iff that turned out to be the case, something else would be
> jumbled (something other than the list) that would essentially be a
> representation of "some list of zero or more (or maybe one or more)
> Const nodes with consttype of, in this example, 23". I think that this
> would make at least one person happy, because of course the two
> statements above would have their costs aggregated within a single
> pg_stat_statements entry.
>
> I'm not sure that I agree, but there is anecdata that suggests that it
> isn't uncommon for these sorts of queries to be broken out when
> they're all traceable back to a single point in the application
> (apparently it's common for Django apps to do so, perhaps
> questionably). If we assume that doing what I've described has no real
> downside, then it would probably be worth implementing. Plus I'm
> pretty sure that tools that do regex normalization are already doing
> something analogous. Thoughts?

Sounds like this:

// What's the worst thing that could happen?
pandoras_box.open();

I am very wary of implementing special-case logic here even though I
know it could be useful to some people, simply because I fear that
there could be a near-infinite variety of situations where, in a
particular environment, a particular distinction isn't important.  We
won't be serving anyone well if we ignore all of those distinctions,
because not everyone will want to ignore all of them.  And adding a
configuration option for each one doesn't sound like a good idea,
either.  *Maybe* this particular case is narrow enough that it's OK to
just ignore it unconditionally and maybe that'd be OK ... but I fear
this is a rathole.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2013-12-10 Thread Alvaro Herrera
Robert Haas escribió:
> On Tue, Dec 10, 2013 at 4:56 AM, Heikki Linnakangas
>  wrote:

> > Speaking of the functionality this does offer, it seems pretty limited. A
> > commit timestamp is nice, but it isn't very interesting on its own. You
> > really also want to know what the transaction did, who ran it, etc. ISTM
> > some kind of auditing or log-parsing system that could tell you all that
> > would be much more useful, but this patch doesn't get us any closer to that.
> 
> For what it's worth, I think that this has been requested numerous
> times over the years by numerous developers of replication solutions.
> My main question (apart from whether or not it may have bugs) is
> whether it makes a noticeable performance difference.  If it does,
> that sucks.  If it does not, maybe we ought to enable it by default.

I expect it will have some performance impact -- this is why we made it
disable-able in the first place, and why I went to the trouble of
ensuring it can be turned on after initdb.  Normal pg_clog entries are 2
bits per transaction, whereas the commit timestamp stuff adds 12 *bytes*
per transaction.  Not something to be taken lightly, hence it's off by
default.  Presumably people who are using one of those replication
systems are okay with taking some (reasonable) performance hit.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql_check_function - rebase for 9.3

2013-12-10 Thread Tom Lane
Robert Haas  writes:
> This is a very good point.  Annotating the function itself with
> markers that cause it to be more strictly checked will create a
> dump/reload problem that we won't enjoy solving.  The decision to
> check the function more strictly or not would need to be based on some
> kind of session state that users could establish but dump restore
> would not.

One would hope that turning off check_function_bodies would be sufficient
to disable any added checking, though, so I don't see this being a problem
for pg_dump.  But there might be other scenarios where an additional knob
would be useful.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking commit timestamps

2013-12-10 Thread Robert Haas
On Tue, Dec 10, 2013 at 4:56 AM, Heikki Linnakangas
 wrote:
> Generally speaking, I'm not in favor of adding dead code, even if it might
> be useful to someone in the future. For one, it's going to get zero testing.
> Once someone comes up with an actual use case, let's add that stuff at that
> point. Otherwise there's a good chance that we build something that's almost
> but not quite useful.

Fair.

> Speaking of the functionality this does offer, it seems pretty limited. A
> commit timestamp is nice, but it isn't very interesting on its own. You
> really also want to know what the transaction did, who ran it, etc. ISTM
> some kind of auditing or log-parsing system that could tell you all that
> would be much more useful, but this patch doesn't get us any closer to that.

For what it's worth, I think that this has been requested numerous
times over the years by numerous developers of replication solutions.
My main question (apart from whether or not it may have bugs) is
whether it makes a noticeable performance difference.  If it does,
that sucks.  If it does not, maybe we ought to enable it by default.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql_check_function - rebase for 9.3

2013-12-10 Thread Robert Haas
On Tue, Dec 10, 2013 at 1:45 AM, Pavel Stehule  wrote:
> Now, PG has no tool for checking dependencies between functions and other
> objects. That has positive effects - we get very simple deployment, which
> works very well in almost all use cases - and it works with our temporary
> table implementation. There is a simple rule - a depended-on object must
> exist before it is >>used at runtime<<. But the check should not be a
> runtime event, and we want to reduce false alarms. So we have to assume
> that almost all necessary objects already exist - that is the reason why
> we decided not to do the check at CREATE FUNCTION time. We don't want to
> introduce a new dependency if we can avoid it.

This is a very good point.  Annotating the function itself with
markers that cause it to be more strictly checked will create a
dump/reload problem that we won't enjoy solving.  The decision to
check the function more strictly or not would need to be based on some
kind of session state that users could establish but dump restore
would not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stats for network traffic WIP

2013-12-10 Thread Robert Haas
On Tue, Dec 10, 2013 at 12:29 AM, Fujii Masao  wrote:
> On Tue, Dec 10, 2013 at 6:56 AM, Nigel Heron  wrote:
>> On Sat, Dec 7, 2013 at 1:17 PM, Fujii Masao  wrote:
>>>
>>> Could you share the performance numbers? I'm really concerned about
>>> the performance overhead caused by this patch.
>>>
>>
>> I've tried pgbench in select mode with small data sets to avoid disk
>> io and didn't see any difference. That was on my old core2duo laptop
>> though .. I'll have to retry it on some server class multi core
>> hardware.
>
> When I ran pgbench -i -s 100 in four parallel, I saw the performance 
> difference
> between the master and the patched one. I ran the following commands.
>
> psql -c "checkpoint"
> for i in $(seq 1 4); do time pgbench -i -s100 -q db$i & done
>
> The results are:
>
> * Master
>   10000000 of 10000000 tuples (100%) done (elapsed 13.91 s, remaining 0.00 s).
>   10000000 of 10000000 tuples (100%) done (elapsed 14.03 s, remaining 0.00 s).
>   10000000 of 10000000 tuples (100%) done (elapsed 14.01 s, remaining 0.00 s).
>   10000000 of 10000000 tuples (100%) done (elapsed 14.13 s, remaining 0.00 s).
>
>   It took almost 14.0 seconds to store 10000000 tuples.
>
> * Patched
>   10000000 of 10000000 tuples (100%) done (elapsed 14.90 s, remaining 0.00 s).
>   10000000 of 10000000 tuples (100%) done (elapsed 15.05 s, remaining 0.00 s).
>   10000000 of 10000000 tuples (100%) done (elapsed 15.42 s, remaining 0.00 s).
>   10000000 of 10000000 tuples (100%) done (elapsed 15.70 s, remaining 0.00 s).
>
>   It took almost 15.0 seconds to store 10000000 tuples.
>
> Thus, I'm afraid that enabling network statistics would cause serious
> performance
> degradation. Thought?

Yes, I think the overhead of this patch is far, far too high to
contemplate applying it.  It sends a stats collector message after
*every socket operation*.  Once per transaction would likely be too
much overhead already (think: pgbench -S) but once per socket op is
insane.

Moreover, even if we found some way to reduce that overhead to an
acceptable level, I think a lot of people would be unhappy about the
statsfile bloat.  Unfortunately, the bottom line here is that, until
someone overhauls the stats collector infrastructure to make
incremental updates to the statsfile cheap, we really can't afford to
add much of anything in the way of new statistics.  So I fear this
patch is doomed.
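
(For scale, the numbers above amount to roughly (15.0 - 14.0) / 14.0, i.e.
about a 7% slowdown for this bulk-load case; a short-transaction workload
like pgbench -S would likely be hit relatively harder, since it performs
far less work per socket operation.)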

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reference to parent query from ANY sublink

2013-12-10 Thread Antonin Houska
On 12/06/2013 03:33 PM, Kevin Grittner wrote:
> Antonin Houska  wrote:
> 
>> SELECT *
>> FROM    tab1 a
>>  LEFT JOIN
>>  tab2 b
>>  ON a.i = ANY (
>>  SELECT  k
>>  FROM    tab3 c
>>  WHERE   k = a.i);
> 
> This query works with k in any or all tables, but the semantics
> certainly vary depending on where k happens to be.  It would help a
> lot if you showed SQL statements to create and populate the tables
> involved and/or if you qualified all referenced column names with
> the table alias to avoid ambiguity.

I used the DDLs attached (tables.ddl) for this query too, not only for
the queries in quaries.sql. Yes, if I had mentioned it and/or qualified
the 'k' column reference, it wouldn't have broken anything.

> If I assume that the k reference is supposed to be a column in
> tab3, what you have is a query where you always get all rows from
> tab1, and for each row from tab1 you either match it to all rows
> from tab2 or no rows from tab2 depending on whether the tab1 row
> has a match in tab3.

I concede this particular query is not useful. But the important thing
to consider here is which side of the LEFT JOIN the subquery references.

>> SELECT  *
>> FROM    tab1 a
>>  LEFT JOIN
>>  (
>>SELECT *
>>tab2 b
>>SEMI JOIN
>>(  SELECT  k
>>FROM    tab3 c
>>WHERE  k = a.i
>>) AS ANY_subquery
>>ON a.i = ANY_subquery.k
>>  ) AS SJ_subquery
>>  ON true;
> 
> It is hard to see what you intend here, since this is not valid
> syntax.

This is what I - after having read the related source code - imagine to
happen internally when the ANY predicate of the first query is being
processed. In fact it should become something like this (also internal
stuff)

SELECT  *
FROM    tab1 a
LEFT JOIN
(
  tab2 b
  SEMI JOIN
  (  SELECT  k
  FROM    tab3 c
  WHERE  k = a.i
  ) AS ANY_subquery
  ON a.i = ANY_subquery.k
)
ON true;

that is, SEMI JOIN node inserted into the tree rather than a subquery
(SJ_subquery). I posted the construct with SJ_subquery to show how I
thought about the problem: I thought it's safe (even though not
necessarily beautiful) to wrap the SEMI JOIN into the SJ_subquery and
let the existing infrastructure decide whether it's legal to turn it
into a join node. I concluded that the subquery's references to the tab1
ensure that SJ_subquery won't be flattened, so the patch does nothing if
such a reference exists.

> PostgreSQL supports semi-joins; but that is an implementation detail
> for the EXISTS or IN syntax.

... and for ANY, see subselect.c:convert_ANY_sublink_to_join()

> Could you clarify your intent?

To get rid of a subplan in some cases that require it so far: when the
subquery references table exactly 1 level higher (i.e. the immediate
parent query).

(I got the idea while reading the source code, as opposed to query
tuning.)

// Antonin Houska (Tony)

> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Heikki Linnakangas

On 12/10/2013 10:00 PM, Simon Riggs wrote:

On 10 December 2013 19:54, Josh Berkus  wrote:

On 12/10/2013 11:49 AM, Peter Geoghegan wrote:

On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs  wrote:
I don't think that anyone believes that not doing block sampling is
tenable, fwiw. Clearly some type of block sampling would be preferable
for most or all purposes.


As discussed, we need math though.  Does anyone have an ACM subscription
and time to do a search?  Someone must.  We can buy one with community
funds, but no reason to do so if we don't have to.


We already have that, just use Vitter's algorithm at the block level
rather than the row level.


And what do you do with the blocks? How many blocks do you choose? 
Details, please.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Peter Geoghegan
On Tue, Dec 10, 2013 at 11:59 AM, Greg Stark  wrote:
> But I don't really think this is the right way to go about this.
> Research papers are going to turn up pretty specialized solutions that
> are probably patented. We don't even have the basic understanding we
> need. I suspect a basic textbook chapter on multistage sampling will
> discuss at least the standard techniques.

I agree that looking for information on block level sampling
specifically, and its impact on estimation quality is likely to not
turn up very much, and whatever it does turn up will have patent
issues.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Simon Riggs
On 10 December 2013 19:54, Josh Berkus  wrote:
> On 12/10/2013 11:49 AM, Peter Geoghegan wrote:
>> On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs  wrote:
>> I don't think that anyone believes that not doing block sampling is
>> tenable, fwiw. Clearly some type of block sampling would be preferable
>> for most or all purposes.
>
> As discussed, we need math though.  Does anyone have an ACM subscription
> and time to do a search?  Someone must.  We can buy one with community
> funds, but no reason to do so if we don't have to.

We already have that, just use Vitter's algorithm at the block level
rather than the row level.
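
For illustration only (this sketch is not from the thread or from
PostgreSQL's analyze.c), reservoir sampling applied to block numbers
instead of rows could look roughly like this; it shows the simple
Algorithm R variant rather than Vitter's skip-based refinement, and all
names are made up:

#include <stdlib.h>

/*
 * Pick "targblocks" block numbers uniformly at random out of "nblocks",
 * scanning the block numbers once and keeping a fixed-size reservoir.
 */
static void
sample_block_numbers(unsigned int nblocks, unsigned int targblocks,
					 unsigned int *reservoir)
{
	unsigned int blk;

	/* Fill the reservoir with the first targblocks block numbers. */
	for (blk = 0; blk < nblocks && blk < targblocks; blk++)
		reservoir[blk] = blk;

	/* Each later block replaces a random slot with probability t/(blk+1). */
	for (; blk < nblocks; blk++)
	{
		unsigned int j = (unsigned int) (random() % (blk + 1));

		if (j < targblocks)
			reservoir[j] = blk;
	}
}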

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Greg Stark
On Tue, Dec 10, 2013 at 7:54 PM, Josh Berkus  wrote:
> As discussed, we need math though.  Does anyone have an ACM subscription
> and time to do a search?  Someone must.  We can buy one with community
> funds, but no reason to do so if we don't have to.

Anyone in a university likely has access through their library.

But I don't really think this is the right way to go about this.
Research papers are going to turn up pretty specialized solutions that
are probably patented. We don't even have the basic understanding we
need. I suspect a basic textbook chapter on multistage sampling will
discuss at least the standard techniques.

Once we have a handle on the standard multistage sampling techniques
that would be safe from patents then we might want to go look at
research papers to find how they've been applied to databases in the
past but we would have to do that fairly carefully.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Josh Berkus
On 12/10/2013 11:49 AM, Peter Geoghegan wrote:
> On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs  wrote: 
> I don't think that anyone believes that not doing block sampling is
> tenable, fwiw. Clearly some type of block sampling would be preferable
> for most or all purposes.

As discussed, we need math though.  Does anyone have an ACM subscription
and time to do a search?  Someone must.  We can buy one with community
funds, but no reason to do so if we don't have to.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Simon Riggs
On 10 December 2013 19:49, Peter Geoghegan  wrote:
> On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs  wrote:
>> However, these things presume that we need to continue scanning most
>> of the blocks of the table, which I don't think needs to be the case.
>> There is a better way.
>
> Do they? I think it's one opportunistic way of ameliorating the cost.
>
>> Back in 2005/6, I advocated a block sampling method, as described by
>> Chaudri et al (ref?)
>
> I don't think that anyone believes that not doing block sampling is
> tenable, fwiw. Clearly some type of block sampling would be preferable
> for most or all purposes.

If we have one way of reducing cost of ANALYZE, I'd suggest we don't
need 2 ways - especially if the second way involves the interaction of
otherwise not fully related parts of the code.

Or to put it clearly, let's go with block sampling and then see if that
needs even more work.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Peter Geoghegan
On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs  wrote:
> However, these things presume that we need to continue scanning most
> of the blocks of the table, which I don't think needs to be the case.
> There is a better way.

Do they? I think it's one opportunistic way of ameliorating the cost.

> Back in 2005/6, I advocated a block sampling method, as described by
> Chaudri et al (ref?)

I don't think that anyone believes that not doing block sampling is
tenable, fwiw. Clearly some type of block sampling would be preferable
for most or all purposes.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-10 Thread Alexander Korotkov
On Tue, Dec 10, 2013 at 12:26 AM, Heikki Linnakangas <
hlinnakan...@vmware.com> wrote:

> On 12/09/2013 11:34 AM, Alexander Korotkov wrote:
>
>> On Mon, Dec 9, 2013 at 1:18 PM, Heikki Linnakangas
>> wrote:
>>
>>  Even if we use varbyte encoding, I wonder if it would be better to treat
>>> block + offset number as a single 48-bit integer, rather than encode them
>>> separately. That would allow the delta of two items on the same page to
>>> be
>>> stored as a single byte, rather than two bytes. Naturally it would be a
>>> loss on other values, but would be nice to see some kind of an analysis
>>> on
>>> that. I suspect it might make the code simpler, too.
>>>
>>
>> Yeah, I had that idea, but I thought it's not a better option. Will try to
>> do some analysis.
>>
>
> The more I think about that, the more convinced I am that it's a good
> idea. I don't think it will ever compress worse than the current approach
> of treating block and offset numbers separately, and, although I haven't
> actually tested it, I doubt it's any slower. About the same amount of
> arithmetic is required in both versions.
>
> Attached is a version that does that. Plus some other minor cleanup.
>
> (we should still investigate using a completely different algorithm,
> though)


Yes, when I thought about that, I didn't realize that we can reserve less
than 16 bits for the offset number.
I reran my benchmark and got the following results:

 event | period
---+-
 index_build   | 00:01:46.39056
 index_build_recovery  | 00:00:05
 index_update  | 00:06:01.557708
 index_update_recovery | 00:01:23
 search_new| 00:24:05.600366
 search_updated| 00:25:29.520642
(6 rows)

 label  | blocks_mark
+-
 search_new |   847509920
 search_updated |   883789826
(2 rows)

 label |   size
---+---
  new   | 364560384
 after_updates | 642736128
(2 rows)

Speed is the same while the index size is smaller. In the previous format it was:

 label |   size
---+---
 new   | 419299328
 after_updates | 715915264
(2 rows)

Good optimization, thanks. I'll try other datasets, but I expect similar
results.
However, the patch didn't apply to head. A corrected version is attached.
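
As a rough illustration of the encoding being discussed (a simplified
sketch, not the code in the attached patch): pack the block number and
offset number into one 48-bit value and store only the varbyte-encoded
difference from the previous item, so two items on the same heap page
usually cost a single byte:

#include <stdint.h>

/* Combine a 32-bit block number and a 16-bit offset into one 48-bit key. */
#define ITEMKEY(blk, off)	(((uint64_t) (blk) << 16) | (uint16_t) (off))

/*
 * Append the varbyte encoding of (cur - prev) to *ptr: 7 data bits per
 * byte, high bit set on all but the last byte.  Caller must ensure
 * cur > prev.  Returns the advanced write pointer.
 */
static unsigned char *
encode_item_delta(uint64_t prev, uint64_t cur, unsigned char *ptr)
{
	uint64_t	delta = cur - prev;

	while (delta >= 0x80)
	{
		*ptr++ = (unsigned char) ((delta & 0x7F) | 0x80);
		delta >>= 7;
	}
	*ptr++ = (unsigned char) delta;
	return ptr;
}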

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-20.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Andres Freund
On 2013-12-10 19:23:37 +, Simon Riggs wrote:
> On 6 December 2013 09:21, Andres Freund  wrote:
> > On 2013-12-05 17:52:34 -0800, Peter Geoghegan wrote:
> >> Has anyone ever thought about opportunistic ANALYZE piggy-backing on
> >> other full-table scans? That doesn't really help Greg, because his
> >> complaint is mostly that a fresh ANALYZE is too expensive, but it
> >> could be an interesting, albeit risky approach.
> >
> > What I've been thinking of is
> >
> > a) making it piggy back on scans vacuum is doing instead of doing
> > separate ones all the time (if possible, analyze needs to be more
> > frequent). Currently with quite some likelihood the cache will be gone
> > again when revisiting.
> 
> > b) make analyze incremental. In lots of bigger tables most of the table
> > is static - and we actually *do* know that, thanks to the vm. So keep a
> > rawer form of what ends in the catalogs around somewhere, chunked by the
> > region of the table the statistic is from. Everytime a part of the table
> > changes, re-sample only that part. Then recompute the aggregate.
> 
> Piggy-backing sounds like a bad thing. If I run a query, I don't want
> to be given some extra task thanks! Especially if we might need to run
> data type code I'm not authroised to run, or to sample data I may not
> be authorised to see. The only way that could work is to kick off an
> autovacuum worker to run the ANALYZE as a separate process and then
> use synchronous scan behaviour to derive benefit indirectly.

I was suggesting piggybacking on VACUUM, not on user queries. The latter
suggestion was somebody else's.
In combination with incremental or chunk-wise building of statistics,
doing more frequent partial vacuums which re-compute the changing part
of the stats would be great. All those blocks have to be read anyway,
and we should be more aggressive about vacuuming anyway.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-10 Thread Simon Riggs
On 6 December 2013 09:21, Andres Freund  wrote:
> On 2013-12-05 17:52:34 -0800, Peter Geoghegan wrote:
>> Has anyone ever thought about opportunistic ANALYZE piggy-backing on
>> other full-table scans? That doesn't really help Greg, because his
>> complaint is mostly that a fresh ANALYZE is too expensive, but it
>> could be an interesting, albeit risky approach.
>
> What I've been thinking of is
>
> a) making it piggy back on scans vacuum is doing instead of doing
> separate ones all the time (if possible, analyze needs to be more
> frequent). Currently with quite some likelihood the cache will be gone
> again when revisiting.

> b) make analyze incremental. In lots of bigger tables most of the table
> is static - and we actually *do* know that, thanks to the vm. So keep a
> rawer form of what ends in the catalogs around somewhere, chunked by the
> region of the table the statistic is from. Everytime a part of the table
> changes, re-sample only that part. Then recompute the aggregate.

Piggy-backing sounds like a bad thing. If I run a query, I don't want
to be given some extra task thanks! Especially if we might need to run
data type code I'm not authorised to run, or to sample data I may not
be authorised to see. The only way that could work is to kick off an
autovacuum worker to run the ANALYZE as a separate process and then
use synchronous scan behaviour to derive benefit indirectly.

However, these things presume that we need to continue scanning most
of the blocks of the table, which I don't think needs to be the case.
There is a better way.

Back in 2005/6, I advocated a block sampling method, as described by
Chaudri et al (ref?)
That has two advantages

* We don't need to visit all of the blocks, reducing I/O

* It would also give better analysis of clustered data (it's all in
that paper...)

The recent patch on TABLESAMPLE contained a rewrite of the guts of
ANALYZE into a generic sample scan, which would then be used as the
basis for a TABLESAMPLE BERNOULLI. A TABLESAMPLE SYSTEM could use a
block sample, as it does in some other DBMS (DB2, SQLServer).

While I was unimpressed with the TABLESAMPLE patch, all it needs is
some committer-love, so Greg...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3

2013-12-10 Thread Andres Freund
On 2013-12-10 11:12:03 -0800, Josh Berkus wrote:
> On 12/10/2013 10:48 AM, Andres Freund wrote:
> > On 2013-12-10 10:44:30 -0800, Josh Berkus wrote:
> >> On 12/10/2013 10:39 AM, Andres Freund wrote:
> >>> Hi,
> >>>
> >>> On 2013-12-10 10:38:32 -0800, Josh Berkus wrote:
>  We've just run across a case of this exact issue on 9.2.4.  I thought it
>  was supposed to be 9.3-only?
> >>>
> >>> Could you please describe "this exact issue"?
> >>
> >> Fatal errors due to missing pg_subtrans files on creating a new replica.
> > 
> >> If this is fixed in 9.2.6, great, but I didn't get that impression from
> >> the commits ...
> > 
> > I am pretty sure this thread isn't about the bug you're hitting. You get
> > errors during xid assignment, right? If so, upgrade to 9.2.6, that's
> > fixed (it's the bug in whose wake the replication bug was introduced).
> 
> I thought that only affected 9.2.5?  This machine is 9.2.4, as was the
> prior master.

9.2.5 tried to fix the issue (errors because of missing pg_subtrans
files during xid assignments), but the fix had unintended consequences,
namely the HS init bug. 9.2.6 removed those unintended consequences.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] coredump of 9.3.2

2013-12-10 Thread Tom Lane
Teodor Sigaev  writes:
> SELECT
>  *
> FROM
>  t
> WHERE (
>  CASE
>  WHEN a%2 IN (SELECT c FROM tt) THEN a
>  END IN (SELECT c FROM tt)
> );

> I suppose the problem is connected to the hashed subplan, but I'm not very
> familiar with the executor. And this affects all supported versions of pgsql.

It seems to be a planner bug: it's doing the wrong thing with the
PARAM_SUBLINK Params for the nested IN SubLinks.  (They don't look
to be nested textually, but they are, and convert_testexpr is
mistakenly replacing the inner one's Params when it should only
be replacing the outer one's Params.)

I think this has probably been broken since about 2005 :-(.
The comment on convert_testexpr claims it doesn't need to worry
about nested cases; which is true when it's called during
SS_process_sublinks, but not so much when it's called from
convert_ANY_sublink_to_join.  Some digging in the git history
suggests that the latter call existed at the time, meaning the
comment was wrong even when written.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3

2013-12-10 Thread Josh Berkus
On 12/10/2013 10:48 AM, Andres Freund wrote:
> On 2013-12-10 10:44:30 -0800, Josh Berkus wrote:
>> On 12/10/2013 10:39 AM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2013-12-10 10:38:32 -0800, Josh Berkus wrote:
 We've just run across a case of this exact issue on 9.2.4.  I thought it
 was supposed to be 9.3-only?
>>>
>>> Could you please describe "this exact issue"?
>>
>> Fatal errors due to missing pg_subtrans files on creating a new replica.
> 
>> If this is fixed in 9.2.6, great, but I didn't get that impression from
>> the commits ...
> 
> I am pretty sure this thread isn't about the bug you're hitting. You get
> errors during xid assignment, right? If so, upgrade to 9.2.6, that's
> fixed (it's the bug in whose wake the replication bug was introduced).

I thought that only affected 9.2.5?  This machine is 9.2.4, as was the
prior master.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3

2013-12-10 Thread Andres Freund
On 2013-12-10 10:44:30 -0800, Josh Berkus wrote:
> On 12/10/2013 10:39 AM, Andres Freund wrote:
> > Hi,
> > 
> > On 2013-12-10 10:38:32 -0800, Josh Berkus wrote:
> >> We've just run across a case of this exact issue on 9.2.4.  I thought it
> >> was supposed to be 9.3-only?
> > 
> > Could you please describe "this exact issue"?
> 
> Fatal errors due to missing pg_subtrans files on creating a new replica.

> If this is fixed in 9.2.6, great, but I didn't get that impression from
> the commits ...

I am pretty sure this thread isn't about the bug you're hitting. You get
errors during xid assignment, right? If so, upgrade to 9.2.6, that's
fixed (it's the bug in whose wake the replication bug was introduced).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] coredump of 9.3.2

2013-12-10 Thread Josh Berkus
On 12/10/2013 09:39 AM, Teodor Sigaev wrote:
> 
> SELECT
> *
> FROM
> t
> WHERE (
> CASE
> WHEN a%2 IN (SELECT c FROM tt) THEN a
> END IN (SELECT c FROM tt)
> );

Wow, it wouldn't have occurred to me that that was even supported syntax.
 I'm not surprised that it doesn't work ...


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3

2013-12-10 Thread Josh Berkus
On 12/10/2013 10:39 AM, Andres Freund wrote:
> Hi,
> 
> On 2013-12-10 10:38:32 -0800, Josh Berkus wrote:
>> We've just run across a case of this exact issue on 9.2.4.  I thought it
>> was supposed to be 9.3-only?
> 
> Could you please describe "this exact issue"?

Fatal errors due to missing pg_subtrans files on creating a new replica.

Sequence:

1. Failed over from master to replica #1

2. Remastered other replicas

3. Tried to create a new replica.

4. New replica started failing with errors similar to the original report.

If this is fixed in 9.2.6, great, but I didn't get that impression from
the commits ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3

2013-12-10 Thread Andres Freund
Hi,

On 2013-12-10 10:38:32 -0800, Josh Berkus wrote:
> We've just run across a case of this exact issue on 9.2.4.  I thought it
> was supposed to be 9.3-only?

Could you please describe "this exact issue"?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3

2013-12-10 Thread Josh Berkus
Andres, all:

We've just run across a case of this exact issue on 9.2.4.  I thought it
was supposed to be 9.3-only?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-10 Thread Heikki Linnakangas

On 12/10/2013 07:27 PM, Noah Misch wrote:

On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote:

On 11/20/2013 09:58 PM, Robert Haas wrote:

On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas  
wrote:

* As discussed in the "Something fishy happening on frogmouth" thread, I
don't like the fact that the dynamic shared memory segments will be
permanently leaked if you kill -9 postmaster and destroy the data directory.


Your test elicited different behavior for the dsm code vs. the main
shared memory segment because it involved running a new postmaster
with a different data directory but the same port number on the same
machine, and expecting that that new - and completely unrelated -
postmaster would clean up the resources left behind by the old,
now-destroyed cluster.  I tend to view that as a defect in your test
case more than anything else, but as I suggested previously, we could
potentially change the code to use something like 100 + (port *
100) with a forward search for the control segment identifier, instead
of using a state file, mimicking the behavior of the main shared
memory segment.  I'm not sure we ever reached consensus on whether
that was overall better than what we have now.


I really think we need to do something about it. To use your earlier
example of parallel sort, it's not acceptable to permanently leak a 512
GB segment on a system with 1 TB of RAM.


I don't.  Erasing your data directory after an unclean shutdown voids any
expectations for a thorough, automatic release of system resources.  Don't do
that.  The next time some new use of a persistent resource violates your hope
for this scenario, there may be no remedy.


Well, the point of erasing the data directory is to release system 
resources. I would normally expect "killall -9 <postgres processes>; rm -rf <data dir>" to thoroughly get rid of the running program and all the resources. 
It's surprising enough that the regular shared memory segment is left 
behind, but at least that one gets cleaned up when you start a new 
server (on same port). Let's not add more cases like that, if we can 
avoid it.


BTW, what if the data directory is seriously borked, and the server 
won't start? Sure, don't do that, but it would be nice to have a way to 
recover if you do anyway. (docs?)



One idea is to create the shared memory object with shm_open, and wait
until all the worker processes that need it have attached to it. Then,
shm_unlink() it, before using it for anything. That way the segment will
be automatically released once all the processes close() it, or die. In
particular, kill -9 will release it. (This is a variant of my earlier
idea to create a small number of anonymous shared memory file
descriptors in postmaster startup with shm_open(), and pass them down to
child processes with fork()). I think you could use that approach with
SysV shared memory as well, by destroying the segment with
shmctl(IPC_RMID) immediately after all processes have attached to it.


That leaves a window in which we still leak the segment,


A small window is better than a large one.

Another refinement is to wait for all the processes to attach before 
setting the segment's size with ftruncate(). That way, when the window 
is open for leaking the segment, it's still 0-sized so leaking it is not 
a big deal.



and it is less
general: not every use of DSM is conducive to having all processes attach in a
short span of time.


Let's cross that bridge when we get there. AFAICS it fits all the use 
cases discussed this far.
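
A minimal POSIX-level sketch of the pattern under discussion (illustrative
only, not PostgreSQL code, and using the fork()-passes-the-fd variant
rather than waiting for all workers to attach before unlinking): the name
is removed immediately, so the kernel reclaims the memory as soon as the
last file descriptor goes away, including after kill -9:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#include <stddef.h>

/*
 * Create a shared memory segment that cannot outlive the processes using
 * it: the name is unlinked right away, so only open or inherited fds keep
 * it alive, and the kernel frees it when the last one is closed.
 */
static void *
create_transient_segment(const char *name, size_t size, int *fd_out)
{
	void	   *addr;
	int			fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);

	if (fd < 0)
		return NULL;
	shm_unlink(name);			/* segment now lives only as long as open fds */

	if (ftruncate(fd, (off_t) size) < 0)
	{
		close(fd);
		return NULL;
	}
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
	{
		close(fd);
		return NULL;
	}
	*fd_out = fd;				/* children inherit the fd and mapping via fork() */
	return addr;
}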


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backup throttling

2013-12-10 Thread Antonin Houska
Thanks for checking. The new version addresses your findings.

// Antonin Houska (Tony)

On 12/09/2013 03:49 PM, Fujii Masao wrote:
> On Fri, Dec 6, 2013 at 6:43 PM, Boszormenyi Zoltan  wrote:
>> Hi,
>>
>> 2013-12-05 15:36 keltezéssel, Antonin Houska írta:
>>
>>> On 12/02/2013 02:23 PM, Boszormenyi Zoltan wrote:

 Hi,

 I am reviewing your patch.
>>>
>>> Thanks. New version attached.
>>
>>
>> I have reviewed and tested it and marked it as ready for committer.
> 
> Here are the review comments:
> 
> +  -r
> +  --max-rate
> 
> You need to add something like <replaceable class="parameter">rate</replaceable>.
> 
> +The purpose is to limit impact of
> pg_basebackup
> +on a running master server.
> 
> s/"master server"/"server" because we can take a backup from also the standby.
> 
> I think that it's better to document the default value and the accepted range 
> of
> the rate that we can specify.
> 
> You need to change the protocol.sgml because you changed BASE_BACKUP
> replication command.
> 
> +printf(_("  -r, --max-rate maximum transfer rate to
> transfer data directory\n"));
> 
> You need to add something like =RATE just after --max-rate.
> 
> +result = strtol(src, &after_num, 0);
> 
> errno should be set to 0 just before calling strtol().
> 
> +if (errno_copy == ERANGE || result != (uint64) ((uint32) result))
> +{
> +fprintf(stderr, _("%s: transfer rate \"%s\" exceeds integer
> range\n"), progname, src);
> +exit(1);
> +}
> 
> We can move this check after the check of "src == after_num" like
> parse_int() in guc.c does.
> If we do this, the local variable 'errno_copy' is no longer necessary.
> 
> I think that it's better to output the hint message like "Valid units for
> the transfer rate are \"k\" and \"M\"." when a user specified wrong unit.
> 
> +/*
> + * THROTTLING_SAMPLE_MIN / MAX_RATE_LOWER (in seconds) should be the
> + * longest possible time to sleep. Thus the cast to long is safe.
> + */
> +pg_usleep((long) sleep);
> 
> It's better to use the latch here so that we can interrupt immediately.
> 
> Regards,
> 

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0b2e60e..2f24fff 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1719,7 +1719,7 @@ The commands accepted in walsender mode are:
   
 
   
-BASE_BACKUP [LABEL 'label'] [PROGRESS] [FAST] [WAL] [NOWAIT]
+BASE_BACKUP [LABEL 'label'] [PROGRESS] [FAST] [WAL] [NOWAIT] [MAX_RATE]
 
  
   Instructs the server to start streaming a base backup.
@@ -1787,7 +1787,23 @@ The commands accepted in walsender mode are:
   the waiting and the warning, leaving the client responsible for
   ensuring the required log is available.
  
- 
+
+   
+   
+MAX_RATE
+
+ 
+  Limit the maximum amount of data transferred to client per unit of
+  time. The expected unit is bytes per second. If
+  MAX_RATE is passed, it must be either equal to
+  zero or fall to range 32 kB through 1 GB (inclusive). If zero is
+  passed or the option is not passed at all, no restriction is imposed
+  on the transfer.
+ 
+ 
+  MAX_RATE does not affect WAL streaming.
+ 
+

   
  
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index c379df5..caede77 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -189,6 +189,28 @@ PostgreSQL documentation
  
 
  
+  -r rate
+  --max-rate=rate
+  
+   
+The maximum amount of data transferred from server per second.
+The purpose is to limit impact of pg_basebackup
+on running server.
+   
+   
+This option always affects transfer of the data directory. Transfer of
+WAL files is only affected if the collection method is fetch.
+   
+   
+Value between 32 kB and 1024 MB
+is expected. Suffixes k (kilobytes) and
+M (megabytes) are accepted. For example:
+10M
+   
+  
+ 
+
+ 
   -R
   --write-recovery-conf
   
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ba8d173..e3ff2cf 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -30,9 +30,11 @@
 #include "replication/walsender_private.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/latch.h"
 #include "utils/builtins.h"
 #include "utils/elog.h"
 #include "utils/ps_status.h"
+#include "utils/timestamp.h"
 #include "pgtar.h"
 
 typedef struct
@@ -42,6 +44,7 @@ typedef struct
 	bool		fastcheckpoint;
 	bool		nowait;
 	bool		includewal;
+	uint32		maxrate;
 } basebackup_options;
 
 
@@ -59,6 +62,7 @@ static void perform_base_backup(basebackup_options *opt,

[HACKERS] coredump of 9.3.2

2013-12-10 Thread Teodor Sigaev

Hi!

A rather simple script (the original query was 500 lines of SQL with a >100 MB
gzipped dump; the query looks strange, but it was auto-generated):


CREATE TABLE t (a int, b int);
CREATE TABLE tt (c int);

INSERT INTO t VALUES (1,1), (2,2);
INSERT INTO tt VALUES (2);


SELECT
*
FROM
t
WHERE (
CASE
WHEN a%2 IN (SELECT c FROM tt) THEN a
END IN (SELECT c FROM tt)
);

I suppose the problem is connected to the hashed subplan, but I'm not very
familiar with the executor. And this affects all supported versions of pgsql.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic Shared Memory stuff

2013-12-10 Thread Noah Misch
On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote:
> On 11/20/2013 09:58 PM, Robert Haas wrote:
>> On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas 
>>  wrote:
>>> * As discussed in the "Something fishy happening on frogmouth" thread, I
>>> don't like the fact that the dynamic shared memory segments will be
>>> permanently leaked if you kill -9 postmaster and destroy the data directory.
>>
>> Your test elicited different behavior for the dsm code vs. the main
>> shared memory segment because it involved running a new postmaster
>> with a different data directory but the same port number on the same
>> machine, and expecting that that new - and completely unrelated -
>> postmaster would clean up the resources left behind by the old,
>> now-destroyed cluster.  I tend to view that as a defect in your test
>> case more than anything else, but as I suggested previously, we could
>> potentially change the code to use something like 100 + (port *
>> 100) with a forward search for the control segment identifier, instead
>> of using a state file, mimicking the behavior of the main shared
>> memory segment.  I'm not sure we ever reached consensus on whether
>> that was overall better than what we have now.
>
> I really think we need to do something about it. To use your earlier  
> example of parallel sort, it's not acceptable to permanently leak a 512  
> GB segment on a system with 1 TB of RAM.

I don't.  Erasing your data directory after an unclean shutdown voids any
expectations for a thorough, automatic release of system resources.  Don't do
that.  The next time some new use of a persistent resource violates your hope
for this scenario, there may be no remedy.

Having said that, I'm not opposed to storing the DSM control segment handle in
PGShmemHeader, which would enable DSM cleanup whenever we find cause to
reclaim a main sysv segment.

> One idea is to create the shared memory object with shm_open, and wait  
> until all the worker processes that need it have attached to it. Then,  
> shm_unlink() it, before using it for anything. That way the segment will  
> be automatically released once all the processes close() it, or die. In  
> particular, kill -9 will release it. (This is a variant of my earlier  
> idea to create a small number of anonymous shared memory file  
> descriptors in postmaster startup with shm_open(), and pass them down to  
> child processes with fork()). I think you could use that approach with  
> SysV shared memory as well, by destroying the segment with  
> shmctl(IPC_RMID) immediately after all processes have attached to it.

That leaves a window in which we still leak the segment, and it is less
general: not every use of DSM is conducive to having all processes attach in a
short span of time.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Why standby.max_connections must be higher than primary.max_connections?

2013-12-10 Thread 山田聡
Hello hackers. I am struggling to understand why standby.max_connections
must be higher than primary.max_connections. Does someone know the reason
why? I know that the standby cluster cannot start if standby.max_connections
is lower than primary.max_connections.

http://www.postgresql.org/docs/9.3/static/hot-standby.html
---
The setting of some parameters on the standby will need reconfiguration
if they have been changed on the primary. For these parameters, the value
on the standby must be equal to or greater than the value on the primary.
If these parameters are not set high enough then the standby will refuse
to start. Higher values can then be supplied and the server restarted to
begin recovery again. These parameters are:
max_connections
max_prepared_transactions
max_locks_per_transaction
---

I've also read src/backend/access/transam/xlog.c, but there are no comments
about a reason. (I cannot find the reason why standby.max_connections must
be higher than primary.max_connections.)

I think this restriction obstructs flexible load balancing. I'd like to use
the standby database for load balancing. Of course the role of a standby
server is different from that of a master server, so I think it is natural
to want to set standby.max_connections lower than primary.max_connections.

thanks & regards


Re: [HACKERS] pg_archivecleanup bug

2013-12-10 Thread Bruce Momjian
On Thu, Dec  5, 2013 at 12:06:07PM -0800, Kevin Grittner wrote:
> An EDB customer reported a problem with pg_archivecleanup which I
> have looked into and found a likely cause.  It is, in any event, a
> bug which I think should be fixed.  It has to do with our use of
> the readdir() function:
> 
> http://pubs.opengroup.org/onlinepubs/7908799/xsh/readdir_r.html
> 
> These are the relevant bits:
> 
> | Applications wishing to check for error situations should set
> | errno to 0 before calling readdir(). If errno is set to non-zero
> | on return, an error occurred.
> 
> | Upon successful completion, readdir() returns a pointer to an
> | object of type struct dirent. When an error is encountered, a
> | null pointer is returned and errno is set to indicate the error.
> | When the end of the directory is encountered, a null pointer is
> | returned and errno is not changed.

Wow, another case where errno clearing is necessary.  We were just
looking at this requirement for getpwuid() last week.
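
For reference, a minimal sketch of the readdir() pattern the spec calls for
(illustrative only, not the actual pg_archivecleanup fix): clear errno
before each call so a NULL return can be distinguished from end-of-directory:

#include <dirent.h>
#include <errno.h>

static int
scan_directory(const char *path)
{
	DIR		   *dir = opendir(path);
	struct dirent *de;

	if (dir == NULL)
		return -1;

	errno = 0;					/* required before readdir() */
	while ((de = readdir(dir)) != NULL)
	{
		/* ... examine de->d_name here ... */
		errno = 0;				/* reset before the next readdir() call */
	}
	if (errno != 0)				/* NULL plus nonzero errno means failure */
	{
		closedir(dir);
		return -1;
	}
	return closedir(dir);
}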

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON decoding plugin

2013-12-10 Thread Merlin Moncure
On Mon, Dec 9, 2013 at 10:53 AM, Euler Taveira  wrote:
> On 09-12-2013 13:12, Merlin Moncure wrote:
>> This is pretty neat.   Couple minor questions:
>> *) Aren't you *en*coding data into json, not the other way around (decoding?)
>>
> Yes. The 'decoding' came from the functionality (logical decoding) and
> because the POC plugin is named 'test_decoding'. I also think that
> 'json_decoding' doesn't say much about the module purpose. I confess
> that I don't like the name but can't come up with a good name. Maybe
> 'wal2json' or 'logrep2json'? Could you suggest something?

I'm partial to wal2json actually.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of tables

2013-12-10 Thread Merlin Moncure
On Tue, Dec 10, 2013 at 1:18 AM, Thomas Munro  wrote:
> Hi
>
> I have been wondering what the minimum useful heap table compression
> system would be for Postgres, in order to reduce disk footprint of
> large mostly static datasets.  Do you think an approach similar to the
> static row-level compression of DB2 could make sense?  I imagine
> something like this:
>
> 1.  You have a table which already has data in it.
>
> 2.  You run a COMPRESS operation, which builds a static dictionary,
> and rewrites the whole table with compressed frozen tuples.  Frozen
> tuples have CTIDs just like regular tuples, and can be pointed to by
> indexes.  They are decompressed on the fly when needed.
>
> Clearly things get tricky once you need to update rows.  Assume for
> simplicity that future UPDATEs and INSERTs produce normal,
> non-compressed tuples that would only be compressed by a subsequent
> COMPRESS operation.  The question is how to deal with the existing
> compressed rows when UPDATEd or DELETEd.  Some approaches:
>
> 1.  Just don't allow updates of compressed rows (!).
>
> 2.  Exclusively lock the whole page when trying to update any
> compressed row, while you explode it into regular uncompressed tuples
> on new pages which you can work on (!).
>
> 3.  Pull the minimum header fields out of the compressed tuples so
> that the MVCC machinery can work, to support updates of compressed
> tuples.  Perhaps just the t_xmax, t_ctid values (t_xmin == frozen is
> implied), so that a writer can update them.  This means an overhead of
> at least 10 bytes per tuple over the compressed size (plus the item
> offsets in the page header).
>
> 4.  Something far cleverer.
>
> Well, these are straw-man suggestions really and I probably don't
> understand enough about PG internals (MVCC and implications for
> VACUUM) to be making them.  But I'm curious to know if anyone has
> researched something like this.
>
> For example, I have a system that occupies a couple of TB on disk, but
> only a few to a few hundred MB per day change, mostly adding data to
> an active partition.  I periodically run CLUSTER on any partition that
> has pg_stat.correlation < 0.9 (this effectively just re-CLUSTERs the
> active one).  At the same times I would COMPRESS, and the DB could
> potentially fit on much smaller SSDs.
>
> Most commercial database systems I encounter these days are using
> compression of some sort (more sophisticated than the above,
> with dynamic dictionaries, and sometimes column based storage etc).

postgres compresses TOASTED data: one strategy could be to arrange
your data somehow to utilize TOAST.

I doubt you'll ever see generally compressed heap data in the way
you're thinking: postgres has a strong informal policy of not
implementing features which are dubious and/or excessively complicated
with limited benefit, particularly if there are ways to handle this
outside the database; there are various operating-system-level tricks
that can make a compressed file or even an entire tablespace (o/s
folder) masquerade as regular structures.  So maybe you are asking
for a feature we already have: CREATE TABLESPACE.

For example take a look here:
https://btrfs.wiki.kernel.org/index.php/Compression#How_do_I_enable_compression.3F

(out of curiosity, if this strategy fits the bill for you I wouldn't
mind seeing a follow up on how this handles your static data use
case).

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extra functionality to createuser

2013-12-10 Thread Amit Kapila
On Tue, Dec 10, 2013 at 12:20 AM, Robert Haas  wrote:
> On Sat, Dec 7, 2013 at 11:39 PM, Amit Kapila  wrote:
>> On Fri, Dec 6, 2013 at 10:31 AM, Peter Eisentraut  wrote:
>>>
>>> How about only one role name per -g option, but allowing the -g option
>>> to be repeated?
>>
>>I think that might simplify the problem and patch, but do you think
>> it is okay to have inconsistency
>>for usage of options between Create User statement and this utility?
>
> Yes.  In general, command-line utilities use a very different syntax
> for options-passing than SQL commands.  Trying to make them consistent
> feels unnecessary or perhaps even counterproductive.  And the proposed
> syntax is certainly a convention common to many other command-line
> utilities, so I think it's fine.

Okay, the new syntax suggested by Peter has simplified the problem.
Please find the updated patch and docs for multiple -g options.

If there are no objections, then I will mark this patch as Ready For Committer.

Christopher, please check once, if you have any comments/objections
for modifications.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index 2f1ea2f..63d4c6c 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -131,6 +131,19 @@ PostgreSQL documentation
  
 
  
+  -g role
+  --role=role
+  
+   
+ Indicates role to which this role will be added immediately as a new
+ member. Multiple roles to which this role will be added as a member
+ can be specified by writing multiple
+ -g switches.
+ 
+  
+ 
+
+ 
   -i
   --inherit
   
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index 83623ea..d98b420 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -24,6 +24,7 @@ main(int argc, char *argv[])
{"host", required_argument, NULL, 'h'},
{"port", required_argument, NULL, 'p'},
{"username", required_argument, NULL, 'U'},
+   {"role", required_argument, NULL, 'g'},
{"no-password", no_argument, NULL, 'w'},
{"password", no_argument, NULL, 'W'},
{"echo", no_argument, NULL, 'e'},
@@ -57,6 +58,7 @@ main(int argc, char *argv[])
char   *host = NULL;
char   *port = NULL;
char   *username = NULL;
+   SimpleStringList roles = {NULL, NULL};
enum trivalue prompt_password = TRI_DEFAULT;
boolecho = false;
boolinteractive = false;
@@ -83,7 +85,7 @@ main(int argc, char *argv[])
 
handle_help_version_opts(argc, argv, "createuser", help);
 
-   while ((c = getopt_long(argc, argv, "h:p:U:wWedDsSaArRiIlLc:PEN",
+   while ((c = getopt_long(argc, argv, "h:p:U:g:wWedDsSaArRiIlLc:PEN",
long_options, 
&optindex)) != -1)
{
switch (c)
@@ -97,6 +99,9 @@ main(int argc, char *argv[])
case 'U':
username = pg_strdup(optarg);
break;
+   case 'g':
+   simple_string_list_append(&roles, optarg);
+   break;
case 'w':
prompt_password = TRI_NO;
break;
@@ -302,6 +307,19 @@ main(int argc, char *argv[])
appendPQExpBufferStr(&sql, " NOREPLICATION");
if (conn_limit != NULL)
appendPQExpBuffer(&sql, " CONNECTION LIMIT %s", conn_limit);
+   if (roles.head != NULL)
+   {
+   SimpleStringListCell *cell;
+   appendPQExpBufferStr(&sql, " IN ROLE ");
+
+   for (cell = roles.head; cell; cell = cell->next)
+   {
+   if (cell->next)
+   appendPQExpBuffer(&sql, "%s,", 
fmtId(cell->val));
+   else
+   appendPQExpBuffer(&sql, "%s", fmtId(cell->val));
+   }
+   }
appendPQExpBufferStr(&sql, ";\n");
 
if (echo)
@@ -334,6 +352,7 @@ help(const char *progname)
printf(_("  -D, --no-createdb role cannot create databases 
(default)\n"));
printf(_("  -e, --echoshow the commands being sent to 
the server\n"));
printf(_("  -E, --encrypted   encrypt stored password\n"));
+   printf(_("  -g, --role=ROLE   role(s) to associate with this 
new role\n"));
printf(_("  -i, --inherit role inherits privileges of roles 
it is a\n"
 "member of (default)\n"));
printf(_("  -I, --no-inherit  role does not inherit 
privileges\n"));

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
