Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Robert Haas
On May 10, 2012, at 4:19 PM, Andrew Dunstan  wrote:
> On 05/10/2012 06:15 PM, Tom Lane wrote:
>> How about a hybrid: we continue to identify patch authors as now, that is 
>> with names attached to the feature/bugfix descriptions, and then have a 
>> separate section "Other Contributors" to recognize patch reviewers and other 
>> helpers?
> 
> works for me.

Me, too.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-10 Thread Kevin Grittner
[point splintered in quoting re-joined]
 
Florian Pflug  wrote:
> On May10, 2012, at 18:36 , Kevin Grittner wrote:
>> Robert Haas  wrote:
>>
>>> I wonder if you could do this with something akin to the Bitmap
>>> Heap Scan machinery. Populate a TID bitmap with a bunch of
>>> randomly chosen TIDs, fetch them all in physical order
>>> and if you don't get as many rows as you need, rinse and repeat
>>> until you do.
>>
>> If you get too many, it is important that you read all the way to
>> the end and then randomly omit some of them.  While a bit of a
>> bother, that's pretty straightforward and should be pretty fast,
>> assuming you're not, like, an order of magnitude high.
>
> Why is that? From a statistical point of view it shouldn't matter
> whether you pick N random samples, or pick M >= N random samples and
> then randomly pick N from M. (random implying uniformly distributed
> here).
 
That sounds to me like exactly what Robert and I both said.
While passing the heap with the bitmap, if you get to the number you
want you don't stop there -- you read all of them ("M" in your
parlance) and randomly drop M minus N of them.  Or, if you prefer, I
guess you could *pick* N of them.  I don't see a logical difference.
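
A minimal sketch of the "we selected too many" case, assuming the M
fetched rows are already in memory. The names `trim_sample` and
`fetched_rows` are made up for illustration and are not PostgreSQL code:

```python
import random

def trim_sample(fetched_rows, n, rng=random):
    """Return exactly n rows chosen uniformly from the M >= n fetched rows."""
    if len(fetched_rows) < n:
        raise ValueError("fell short: fewer rows fetched than requested")
    # Sample without replacement over the *whole* fetched set. Stopping at
    # the first n rows in physical order would favor the start of the table.
    return rng.sample(fetched_rows, n)

rows = list(range(100))  # stand-in for the M rows found via the bitmap
picked = trim_sample(rows, 10, random.Random(0))
```

Dropping M minus N rows and picking N rows are the same operation here;
the only requirement is that all M rows are read before the choice is made.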
 
>> But falling short is tougher; making up the difference could be an
>> iterative process, which could always wind up with having you read
>> all tuples in the table without filling your sample.
>
> But the likelihood of that happening is extremely low, no?
 
That depends.  What if someone just did a mass delete and your
statistics aren't yet up to date when they ask to pick a relatively
large percentage of the rows?
 
> Unless the sampling percentage is very high
 
Or the statistics are not current.  I agree, this shouldn't happen
often, but we can never know, going in, whether it *is* the case.
You *could* always wind up needing to read the entire table, and
still not hit the initially-targeted number of rows.  Now, arguably
you could use data gleaned from each pass to adjust the target or
inform the size of the next pass.  My point is that "we selected too
few" is a lot more complicated than the "we selected too many" case.
 
> but that case isn't of much practical importance anyway.
 
It's important that it's handled in some sane way when it happens.
And it will happen.
 
> But something else comes to mind. Does the standard permit samples
> taken with the BERNOULLI method to contain the same tuple multiple
> times?
 
I'm pretty sure not.  That would be nonsensical.
 
> If not, any kind of TID-based approach will have to remember all previously
> fetched TIDs, which seems doable but unfortunate...
 
Right.  You would always need to ignore a duplicate random choice in
any one cycle of generating ctid values; and if you are iterating
because you fell short, you would also need to ignore values from
previous iterations.  And OR your new bitmap against the previous
ones to save for the next iteration, if needed.  I never said it
couldn't be done; it's just very fussy, and you want to avoid a very
large number of iterations in the case that someone deleted 99.99% of
your rows right before you ran your sample and before autovacuum
caught up.
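
A rough sketch of that bookkeeping, assuming a toy model in which a TID
is a (block, offset) pair and the table is just a set of live TIDs. All
names here (`sample_by_tid`, `live_tids`, `batch`) are hypothetical; the
shrinking `untried` list stands in for OR-ing each new bitmap into the
previous ones:

```python
import random

def sample_by_tid(live_tids, nblocks, max_offset, target, rng, batch=None):
    """Pick up to `target` live TIDs uniformly, never probing a TID twice."""
    untried = [(b, o) for b in range(nblocks) for o in range(1, max_offset + 1)]
    rng.shuffle(untried)  # a random order gives random choices without repeats
    found = []
    passes = 0
    batch = batch or target
    while len(found) < target and untried:
        passes += 1
        # Take the next batch of never-tried TIDs and visit them in
        # physical order, as a bitmap heap scan would.
        probe, untried = untried[:batch], untried[batch:]
        for tid in sorted(probe):
            if tid in live_tids:
                found.append(tid)
    # Overshoot: drop the excess at random rather than keeping the first ones.
    if len(found) > target:
        found = rng.sample(found, target)
    return found, passes

# Mass-delete case: 1000 possible TIDs, only 10 still live, 20 requested.
live = {(b, 1) for b in range(10)}
found, passes = sample_by_tid(live, 100, 10, 20, random.Random(1))
```

In the mass-delete case the loop runs 50 passes, probes every possible
TID, and still returns only the 10 live rows, which is the "read the
whole table without filling the sample" outcome described above.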
 
-Kevin



Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file foreign tables

2012-05-10 Thread Etsuro Fujita
> -----Original Message-----
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Friday, May 11, 2012 1:36 AM
> To: Etsuro Fujita
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
> foreign tables
> 
> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita wrote:
> > I would like to propose to improve parsing efficiency of
> > contrib/file_fdw by selective parsing proposed by Alagiannis et
> > al.[1], which means that for a CSV/TEXT file foreign table, file_fdw
> > performs binary conversion only for the columns needed for query
> > processing.  Attached is a WIP patch implementing the feature.
> 
> Can you add this to the next CommitFest?  Looks interesting.

Done.

Best regards,
Etsuro Fujita

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> Company
> 





Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-10 Thread Tom Lane
"David E. Wheeler"  writes:
> Ooh, heisenbug. What version of Perl? Mine is 5.14.2 compiled from source.

I also tried this on a Fedora 16 box, which has

$ perl -v
This is perl 5, version 14, subversion 2 (v5.14.2) built for 
x86_64-linux-thread-multi

Works fine there too...

regards, tom lane



Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-10 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, May 10, 2012 at 05:27:26PM -0700, David E. Wheeler wrote:
>> Interesting. My build (from source):
>> 
>> PostgreSQL 9.1.3 on x86_64-apple-darwin11.3.0, compiled by 
>> i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 
>> 5658) (LLVM build 2336.1.00), 64-bit
>> (1 row)

> OK, still an abort on 9.1.X head:

I can't reproduce this problem either.  I tested HEAD and 9.1 branch tip
on my Mac laptop, using what appears to be the same compiler version
that David is using, as well as the Apple-supplied perl:

$ which perl
/usr/bin/perl
$ perl -v

This is perl 5, version 12, subversion 3 (v5.12.3) built for 
darwin-thread-multi-2level
(with 2 registered patches, see perl -V for more detail)
Copyright 1987-2010, Larry Wall

I wonder whether David is using some other perl ...

regards, tom lane



Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 05:46:06PM -0700, David E. Wheeler wrote:
> On May 10, 2012, at 5:41 PM, Bruce Momjian wrote:
> 
> > OK, still an abort on 9.1.X head:
> > 
> > $ psql test
> > psql (9.1.3)
> > Type "help" for help.
> > 
> > test=> begin;
> > BEGIN
> > test=> do language plperl $$ elog(ERROR, 'foo')$$;
> > ERROR:  foo at line 1.
> > CONTEXT:  PL/Perl anonymous code block
> > test=> select true;
> > ERROR:  current transaction is aborted, commands ignored until end of
> > transaction block
> 
> Ooh, heisenbug. What version of Perl? Mine is 5.14.2 compiled from source.

I have:

  This is perl, v5.10.1 (*) built for x86_64-linux-gnu-thread-multi

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-10 Thread David E. Wheeler
On May 10, 2012, at 5:41 PM, Bruce Momjian wrote:

> OK, still an abort on 9.1.X head:
> 
>   $ psql test
>   psql (9.1.3)
>   Type "help" for help.
>   
>   test=> begin;
>   BEGIN
>   test=> do language plperl $$ elog(ERROR, 'foo')$$;
>   ERROR:  foo at line 1.
>   CONTEXT:  PL/Perl anonymous code block
>   test=> select true;
>   ERROR:  current transaction is aborted, commands ignored until end of
>   transaction block

Ooh, heisenbug. What version of Perl? Mine is 5.14.2 compiled from source.

David




Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 05:27:26PM -0700, David E. Wheeler wrote:
> On May 10, 2012, at 5:20 PM, Bruce Momjian wrote:
> 
> >> 
> >> Shouldn't a call to elog(NOTICE) invalidate the current transaction?
> > 
> > I assume you mean elog(ERROR)?
> 
> Yes, sorry.
> 
> > Well, git head shows an error:
> > 
> > test=> begin;
> > BEGIN
> > test=> do language plperl $$ elog(ERROR, 'foo')$$;
> > ERROR:  foo at line 1.
> > CONTEXT:  PL/Perl anonymous code block
> > test=> select true;
> > ERROR:  current transaction is aborted, commands ignored until end of
> > transaction block
> 
> Interesting. My build (from source):
> 
>  PostgreSQL 9.1.3 on x86_64-apple-darwin11.3.0, compiled by 
> i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) 
> (LLVM build 2336.1.00), 64-bit
> (1 row)

OK, still an abort on 9.1.X head:

$ psql test
psql (9.1.3)
Type "help" for help.

test=> begin;
BEGIN
test=> do language plperl $$ elog(ERROR, 'foo')$$;
ERROR:  foo at line 1.
CONTEXT:  PL/Perl anonymous code block
test=> select true;
ERROR:  current transaction is aborted, commands ignored until end of
transaction block

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-10 Thread Florian Pflug
On May10, 2012, at 18:36 , Kevin Grittner wrote:
> Robert Haas  wrote:
> 
>> I wonder if you could do this with something akin to the Bitmap
>> Heap Scan machinery.  Populate a TID bitmap with a bunch of
>> randomly chosen TIDs, fetch them all in physical order
>> and if you don't get as many rows as you need, rinse and repeat
>> until you do.
> 
> Ay, there's the rub.  If you get too many, it is important that you
> read all the way to the end and then randomly omit some of them.

Why is that? From a statistical point of view it shouldn't matter
whether you pick N random samples, or pick M >= N random samples and
then randomly pick N from M. (random implying uniformly distributed
here).

> While a bit of a bother, that's pretty straightforward and should be
> pretty fast, assuming you're not, like, an order of magnitude high. 
> But falling short is tougher; making up the difference could be an
> iterative process, which could always wind up with having you read
> all tuples in the table without filling your sample.

But the likelihood of that happening is extremely low, no? Unless the
sampling percentage is very high, that is, but that case isn't of much
practical importance anyway.

But something else comes to mind. Does the standard permit samples taken
with the BERNOULLI method to contain the same tuple multiple times? If
not, any kind of TID-based approach will have to remember all previously fetched
TIDs, which seems doable but unfortunate...

best regards,
Florian Pflug




Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-10 Thread David E. Wheeler
On May 10, 2012, at 5:20 PM, Bruce Momjian wrote:

>> 
>> Shouldn't a call to elog(NOTICE) invalidate the current transaction?
> 
> I assume you mean elog(ERROR)?

Yes, sorry.

> Well, git head shows an error:
> 
>   test=> begin;
>   BEGIN
>   test=> do language plperl $$ elog(ERROR, 'foo')$$;
>   ERROR:  foo at line 1.
>   CONTEXT:  PL/Perl anonymous code block
>   test=> select true;
>   ERROR:  current transaction is aborted, commands ignored until end of
>   transaction block

Interesting. My build (from source):

 PostgreSQL 9.1.3 on x86_64-apple-darwin11.3.0, compiled by 
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) 
(LLVM build 2336.1.00), 64-bit
(1 row)

Best,

David




Re: [HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 04:21:24PM -0700, David E. Wheeler wrote:
> Hackers,
> 
> Shouldn't a call to elog(NOTICE) invalidate the current transaction?

I assume you mean elog(ERROR)?

> david=# begin;
> BEGIN
> Time: 0.178 ms
> david=# do language plperl $$ elog(ERROR, 'foo')$$;
> ERROR:  foo at line 1.
> CONTEXT:  PL/Perl anonymous code block
> david=# select true;
>  bool 
> ------
>  t
> (1 row)
> 
> Time: 0.203 ms
> 
> The docs say:
> 
> > ERROR raises an error condition; if this is not trapped by the surrounding 
> > Perl code, the error propagates out to the calling query, causing the 
> > current transaction or subtransaction to be aborted.
> 
> So I'm surprised that the transaction is not aborted. Bug?

Well, git head shows an error:

test=> begin;
BEGIN
test=> do language plperl $$ elog(ERROR, 'foo')$$;
ERROR:  foo at line 1.
CONTEXT:  PL/Perl anonymous code block
test=> select true;
ERROR:  current transaction is aborted, commands ignored until end of
transaction block

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] synchronous_commit and remote_write

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 03:55:45PM -0700, Josh Berkus wrote:
> 
> > So, are we shipping remote_write in beta1?
> 
> Given that it's Thursday afternoon US time, and we haven't changed it
> yet, yes.

Did we conclude just the docs are wrong and we do write (but not fsync)
on the remote?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



[HACKERS] PL/perl elog(ERROR) Does not Abort Transaction

2012-05-10 Thread David E. Wheeler
Hackers,

Shouldn't a call to elog(NOTICE) invalidate the current transaction?

david=# begin;
BEGIN
Time: 0.178 ms
david=# do language plperl $$ elog(ERROR, 'foo')$$;
ERROR:  foo at line 1.
CONTEXT:  PL/Perl anonymous code block
david=# select true;
 bool 
------
 t
(1 row)

Time: 0.203 ms

The docs say:

> ERROR raises an error condition; if this is not trapped by the surrounding 
> Perl code, the error propagates out to the calling query, causing the current 
> transaction or subtransaction to be aborted.

So I'm surprised that the transaction is not aborted. Bug?

David




Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Andrew Dunstan



On 05/10/2012 06:15 PM, Tom Lane wrote:
> How about a hybrid: we continue to identify patch authors as now, that
> is with names attached to the feature/bugfix descriptions, and then
> have a separate section "Other Contributors" to recognize patch
> reviewers and other helpers?

works for me.

cheers

andrew



Re: [HACKERS] synchronous_commit and remote_write

2012-05-10 Thread Josh Berkus

> So, are we shipping remote_write in beta1?

Given that it's Thursday afternoon US time, and we haven't changed it
yet, yes.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] synchronous_commit and remote_write

2012-05-10 Thread Bruce Momjian
On Wed, May 09, 2012 at 05:02:57PM -0700, Josh Berkus wrote:
> 
> > If so, we should also rename the column "write_location" in 
> > pg_stat_replication?
> 
> Now that you bring it up, probably.  Although not necessarily for 9.2.
> 
> > I named "remote_write (originally write)" after that column. And, in
> > "remote_write",
> > internally the master waits for replication until the wait LSN has
> > reached write_location.
> 
> Yeah, I get what it means.  But I'm not the person I'm worried about
> being confused.

So, are we shipping remote_write in beta1?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 04:16:01PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > Well, that would be fine, too.  What I think is bizarre is that I got
> > credit for some things I was barely involved in (like SP-gist) and no
> > credit for other things I spent a LOT of time on (like security views
> > and some of KaiGai's other stuff), and similarly for other people.
> > Similarly, some things I am credited on involve very significant
> > contributions from other people and others are cases where I did
> > nearly all the work.  I think it's weird to lump all those cases
> > together without any distinction.
> 
> Well, you know, these are *draft* release notes.  Feel free to correct
> them anywhere you believe they are inaccurate.

Yep.

> I think the bigger issue here is that we don't seem to have consensus
> about whether to include reviewers' names.  Bruce evidently thinks
> that's a good idea, else he wouldn't have done it, but I only recall one
> other person speaking in favor of it.  Everybody else seems to think
> that it'll be too verbose.

There were 2-3 who liked the reviewer names.  The bottom line is it is
easy to _remove_ names;  it requires a lot of research to add them.

One creative idea would be to keep the reviewer names as-is, but trim
the release notes down to a single name just before final release.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Josh Berkus

> The other problem with such an approach is that section (1) would be
> extremely duplicative of the main release-notes text.  How about a
> hybrid: we continue to identify patch authors as now, that is with names
> attached to the feature/bugfix descriptions, and then have a separate
> section "Other Contributors" to recognize patch reviewers and other
> helpers?

Sounds good to me.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Tom Lane
Josh Berkus  writes:
>> It's been said elsewhere that adding all this to the release notes as
>> found on the official docs would be too bulky.  How about having a
>> second copy of the release notes that contains authorship info as
>> proposed by Andrew?  Then the docs could have no names at all, and
>> credit would be given by some other page in the website (to which the
>> release notes would link).

> Personally, I'd love to have something really simple, with just 3 sections:

> "Contributors to PostgreSQL 9.2"

> (1) Specific Patch Contributors
>   patch, original (co) author name

> (2) Patch reviewers
>   list without specific patch affiliation

> (3) Other Contributors to this release
>   anyone who contributed but isn't mentioned above

> I think if we get any more complicated, this is going to become a major
> chore for someone, and then it won't get done.

The other problem with such an approach is that section (1) would be
extremely duplicative of the main release-notes text.  How about a
hybrid: we continue to identify patch authors as now, that is with names
attached to the feature/bugfix descriptions, and then have a separate
section "Other Contributors" to recognize patch reviewers and other
helpers?

regards, tom lane



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Josh Berkus

> It's been said elsewhere that adding all this to the release notes as
> found on the official docs would be too bulky.  How about having a
> second copy of the release notes that contains authorship info as
> proposed by Andrew?  Then the docs could have no names at all, and
> credit would be given by some other page in the website (to which the
> release notes would link).

Personally, I'd love to have something really simple, with just 3 sections:

"Contributors to PostgreSQL 9.2"

(1) Specific Patch Contributors
patch, original (co) author name

(2) Patch reviewers
list without specific patch affiliation

(3) Other Contributors to this release
anyone who contributed but isn't mentioned above

I think if we get any more complicated, this is going to become a major
chore for someone, and then it won't get done.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 01:51:28PM -0400, Robert Haas wrote:
> On Thu, May 10, 2012 at 1:44 PM, Bruce Momjian  wrote:
> > Not sure where to move that to.  Source Code doesn't seem right.  I
> > moved it lower in the performance section.
> 
> I'd just delete it.  Instead, under index-only scans, I'd mention it
> in the detail text: "This is possible because the visibility map has
> been improved to be robust even in the face of database or system
> crashes.  Various race conditions that could result in incorrect data
> in the visibility map have also been fixed."

OK, I merged it in in the attached, applied patch.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/release-9.2.sgml b/doc/src/sgml/release-9.2.sgml
new file mode 100644
index fc477c8..374ffb4
*** a/doc/src/sgml/release-9.2.sgml
--- b/doc/src/sgml/release-9.2.sgml
***
*** 413,422 
 
  This is often called "index-only scans" or "covering indexes".
  This is possible for heap pages with exclusively all-visible
! tuples, as reported by the visibility map.
 

!   

 
  Allow frequently uncontended locks to be recorded using a new
--- 413,423 
 
  This is often called "index-only scans" or "covering indexes".
  This is possible for heap pages with exclusively all-visible
! tuples, as reported by the visibility map.  The visibility map was
! made crash-safe as a necessary part of implementing this feature.
 

! 

 
  Allow frequently uncontended locks to be recorded using a new
***
*** 539,555 
 


-   
-
- Make the visibility map crash-safe (Robert Haas, Noah Misch)
-
-
-
- This helps vacuum be more efficient, and is necessary for
- index-only scans.
-
-   
-   

 
  Improve PowerPC and Itanium spinlock performance (Manabu Ori,
--- 540,545 



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Alvaro Herrera

Excerpts from Robert Haas's message of jue may 10 16:07:33 -0400 2012:
> On Thu, May 10, 2012 at 3:07 PM, Andrew Dunstan  wrote:
> > The important thing about the current mechanism is that it ties the
> > contributor's name to a feature in the only place where we currently list
> > features on a time basis. So if I (for example) want to put on my resume
> > that I contributed adding new values to an enum in the 9.1 release, there is
> > a really easy way for someone to check that that's true, without having to
> > search commit logs, which aren't always wonderfully reliable either. If you
> > want a little finer granularity, let me offer the following categories as a
> > way of opening up discussion:
> >
> >   Author: contributed a significant portion of the code of a feature
> >   (say, over 25%)
> >   Contributor: made a significant contribution to the code (say 10% or
> >   more?), but less than that of an author.
> >   Reviewer: did a significant review of the code but not a significant
> >   code contribution.
> >
> > These are intended as broad guidelines, rather than something to be
> > nitpicked and litigated, but you should get the idea.
> 
> Well, that would be fine, too.  What I think is bizarre is that I got
> credit for some things I was barely involved in (like SP-gist) and no
> credit for other things I spent a LOT of time on (like security views
> and some of KaiGai's other stuff), and similarly for other people.
> Similarly, some things I am credited on involve very significant
> contributions from other people and others are cases where I did
> nearly all the work.  I think it's weird to lump all those cases
> together without any distinction.

It's been said elsewhere that adding all this to the release notes as
found on the official docs would be too bulky.  How about having a
second copy of the release notes that contains authorship info as
proposed by Andrew?  Then the docs could have no names at all, and
credit would be given by some other page in the website (to which the
release notes would link).

We could even have both be built from a single source, if we made
inclusion depend on some DSSSL flag or something.

(Obviously I'm not proposing doing this for beta1).

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Tom Lane
Robert Haas  writes:
> Well, that would be fine, too.  What I think is bizarre is that I got
> credit for some things I was barely involved in (like SP-gist) and no
> credit for other things I spent a LOT of time on (like security views
> and some of KaiGai's other stuff), and similarly for other people.
> Similarly, some things I am credited on involve very significant
> contributions from other people and others are cases where I did
> nearly all the work.  I think it's weird to lump all those cases
> together without any distinction.

Well, you know, these are *draft* release notes.  Feel free to correct
them anywhere you believe they are inaccurate.

I think the bigger issue here is that we don't seem to have consensus
about whether to include reviewers' names.  Bruce evidently thinks
that's a good idea, else he wouldn't have done it, but I only recall one
other person speaking in favor of it.  Everybody else seems to think
that it'll be too verbose.

regards, tom lane



Re: [HACKERS] libpq URL syntax vs SQLAlchemy

2012-05-10 Thread Alex

Peter Eisentraut  writes:

> I have been reviewing how our new libpq URL syntax compares against
> existing implementations of URL syntaxes in other drivers or
> higher-level access libraries.  In the case of SQLAlchemy, there is an
> incompatibility regarding how Unix-domain sockets are specified.
>
> First, here is the documentation on that:
> http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html
>
> The recommended way to access a server over a Unix-domain socket is to
> leave off the host, as in:
>
> postgresql://user:password@/dbname
>
> In libpq, this is parsed as host='/dbname', no database.

Ah, good catch: thanks for the heads-up.

I believe this was introduced late in the dev cycle, when we noticed
that users would have to specify some defaults explicitly to be able to
override other defaults, while avoiding the whole "?keyword=value&..."
business.

I'll give this another look and will get back with a proposal to fix
this in the form of a patch.
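
For comparison, here is how a generic URL splitter reads the SQLAlchemy
form. This is only a sketch using Python's standard `urllib.parse`; it
is not libpq's parser and says nothing about how libpq should be fixed:

```python
from urllib.parse import urlsplit

# The SQLAlchemy-recommended form for a Unix-domain socket connection.
parts = urlsplit("postgresql://user:password@/dbname")

print(parts.username, parts.password)  # credentials before the "@"
print(repr(parts.hostname))            # host part between "@" and "/" is empty
print(parts.path)                      # everything from the first "/" onward
```

Under this reading the host is empty and `/dbname` is the path, which
matches the SQLAlchemy interpretation (no host, database "dbname")
rather than host='/dbname'.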

--
Regards,
Alex



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Robert Haas
On Thu, May 10, 2012 at 3:07 PM, Andrew Dunstan  wrote:
> The important thing about the current mechanism is that it ties the
> contributor's name to a feature in the only place where we currently list
> features on a time basis. So if I (for example) want to put on my resume
> that I contributed adding new values to an enum in the 9.1 release, there is
> a really easy way for someone to check that that's true, without having to
> search commit logs, which aren't always wonderfully reliable either. If you
> want a little finer granularity, let me offer the following categories as a
> way of opening up discussion:
>
>   Author: contributed a significant portion of the code of a feature
>   (say, over 25%)
>   Contributor: made a significant contribution to the code (say 10% or
>   more?), but less than that of an author.
>   Reviewer: did a significant review of the code but not a significant
>   code contribution.
>
> These are intended as broad guidelines, rather than something to be
> nitpicked and litigated, but you should get the idea.

Well, that would be fine, too.  What I think is bizarre is that I got
credit for some things I was barely involved in (like SP-gist) and no
credit for other things I spent a LOT of time on (like security views
and some of KaiGai's other stuff), and similarly for other people.
Similarly, some things I am credited on involve very significant
contributions from other people and others are cases where I did
nearly all the work.  I think it's weird to lump all those cases
together without any distinction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] WalSndWakeup() and synchronous_commit=off

2012-05-10 Thread Andres Freund
Hi all,

I noticed that when synchronous_commit=off we're not waking up the wal sender 
latch in xact.c:RecordTransactionCommit, which leads to ugly delays of approx 7 
seconds (1 + replication_timeout/10) with default settings.
Given that we're flushing the wal to disk much sooner this appears to be a bad 
idea - especially as this may happen even under load if we ever reach the 
'caughtup' state.
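
As a rough check of the 7-second figure, here is a minimal Python sketch. It assumes the delay formula quoted in this message (1 + replication_timeout/10, in seconds) and a default replication_timeout of 60 seconds; the function name is made up for illustration and is not part of PostgreSQL.

```python
def worst_case_wakeup_delay(replication_timeout_s: float = 60.0) -> float:
    """Delay formula as quoted in this message: 1 + replication_timeout/10 (seconds)."""
    return 1.0 + replication_timeout_s / 10.0


print(worst_case_wakeup_delay())      # assumed default of 60 s
print(worst_case_wakeup_delay(30.0))  # a shorter timeout shrinks the delay
```

With the assumed default this gives 7 seconds, matching the delay described above.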

I wonder why the WalSndWakeup isn't done like:

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ecb71b6..7a3224b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1906,6 +1906,10 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
         xlogctl->LogwrtRqst.Flush = LogwrtResult.Flush;
         SpinLockRelease(&xlogctl->info_lck);
     }
+
+   /* the walsender wasn't woken up in xact.c */
+   if(max_wal_senders > 1 && synchronous_commit == SYNCHRONOUS_COMMIT_OFF)
+       WalSndWakeup();
 }

Doing that for the synchronous_commit=off case can imo be considered a bugfix, 
but I wonder why we ever wake the senders somewhere else?
The only argument I can see for doing it at places like StartTransactionCommit 
is that that's the place after which the data will be visible on the client. I 
think that's a non-argument though, because if wal is flushed to disk outside of 
a commit there normally is enough data to make it worthwhile.

Doing the above results in a very noticeable reduction in lagginess and even a 
noticeable reduction in cpu-usage spikes on a busy replication test setup.

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, May 10, 2012 at 01:56:33AM -0400, Robert Haas wrote:
>> As a general comment, I think that your new policy of crediting the
>> reviewer on every feature except when that reviewer is also a
>> committer has produced a horrific mess.

> I assumed reviewers mentioned in the commit messages made substantive
> suggestions on improving the patch, rather than just +1.

I wouldn't assume that.  On most of what I committed from commitfest
items, I just listed whoever was named in the CF entry.

regards, tom lane



Re: [HACKERS] Can pg_trgm handle non-alphanumeric characters?

2012-05-10 Thread Tom Lane
Fujii Masao  writes:
> On Fri, May 11, 2012 at 12:07 AM, MauMau  wrote:
>> Thanks for your explanation. Although I haven't understood it well yet, I'll
>> consider what you taught. And I'll consider if the tentative measure of
>> removing KEEPONLYALNUM is correct for someone who wants to use pg_trgm
>> against Japanese text.

> In Japanese, it's common to do a text search with two characters keyword.
> But since pg_trgm is 3-gram, you basically would not be able to use index
> for such text search. So you might need something like pg_bigm or pg_unigm
> for Japanese text search.

I believe the trigrams are three *bytes* not three characters.  So a
couple of kanji should work just fine for this.

regards, tom lane



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Andrew Dunstan



On 05/10/2012 02:29 PM, Robert Haas wrote:
> On Thu, May 10, 2012 at 2:15 PM, Josh Berkus  wrote:
>>>>> Then reviewers should be removed.
>>>> I disagree.  We're trying to get more reviewers, and encourage them to
>>>> do more reviewing.  Giving credit is a big part of that.
>>> Are you disagreeing with Bruce's premise, my logic, or the conclusion?
>> Hah, good point.  I'm disagreeing with the conclusion that reviewers
>> should be removed, unless we're going to remove everyone *and* give them
>> credit elsewhere.  Which I would also be in favor of, I'm just not able
>> to do the work right now.
> Well, the problem with the way it is right now is that we're giving
> similar amounts of credit for very different amounts of contribution,
> which IMHO is no good.  I think that putting a "Credits" section at
> the bottom and listing contributors there would be a reasonable
> solution; I also think that crediting people on a web page or in some
> other place would be a fine solution.  What we have right now manages
> to be both unfair and unreadable.



I don't really believe either of these. It's certainly not unreadable, 
and it's largely fair, although there may be some room for improvement. 
Moreover, until we have something better I'm strongly opposed to 
removing what we currently do (or have done in the past.)


The important thing about the current mechanism is that it ties the 
contributor's name to a feature in the only place where we currently 
list features on a time basis. So if I (for example) want to put on my 
resume that I contributed adding new values to an enum in the 9.1 
release, there is a really easy way for someone to check that that's 
true, without having to search commit logs, which aren't always 
wonderfully reliable either. If you want a little finer granularity, let 
me offer the following categories as a way of opening up discussion:


   Author: contributed a significant portion of the code of a feature
   (say, over 25%)
   Contributor: made a significant contribution to the code (say 10% or
   more?), but less than that of an author.
   Reviewer: did a significant review of the code but not a significant
   code contribution.


These are intended as broad guidelines, rather than something to be 
nitpicked and litigated, but you should get the idea.


cheers

andrew



Re: [HACKERS] Corner cases with GiST n-way splits

2012-05-10 Thread Heikki Linnakangas

On 10.05.2012 21:04, Alexander Korotkov wrote:
> On Thu, May 10, 2012 at 9:14 PM, Heikki Linnakangas<
> heikki.linnakan...@enterprisedb.com>  wrote:
>
>> I found two corner cases with the current implementation when a page is
>> split into many halves:
>>
>> 1. If a page is split into more than 100 pages, you run into the same
>> limit of 100 simultaneous lwlocks that Tom Forbes reported with a
>> pathological intarray index. This time it's not because we hold locks on
>> many different levels, but because of a single split.
>>
>> 2. When the root page is split, there is no parent page to update, so we
>> just create a new root page with the downlinks. However, when you split a
>> page into a lot of siblings, it's possible that all the downlinks don't fit
>> on a single page. The code is not prepared for that situation. You get an
>> error, when it tries to add more downlinks on a single page than fit there.
>>
>> I'm not sure what to do about these. Neither issue is something you'd
>> actually bump into in an index that's doing something useful; there's been
>> no user complaints about these.
>
> If such cases are very rare, we could call genericPickSplit if decide user
> picksplit function result to be bad. We're already doing this if user
> picksplit puts all the tuples into one page.


Yeah. We just need to decide when we consider picksplit to be doing such 
a bad job that we fall back to genericPickSplit. Something like, if the 
split produces more than 5 pages, perhaps.



> GiST can split page into many pages because of nulls and multicolumn
> indexes independently on user picksplit function.
> Imagine we've following tuples:
>
> tuple key1  key2  key3  key4  key5  ...
> 1     value value value value value ...
> 2     NULL  value value value value ...
> 3     NULL  NULL  NULL  value value ...
> 4     NULL  NULL  NULL  NULL  value ...
> 5     NULL  NULL  NULL  NULL  NULL  ...
> ..
>
> In this case splitByKey will find only non-null value in the first key and
> splits it into separate page. Then it will do same thing for the second key
> etc. However, this process is limited by INDEX_MAX_KEYS.


Interesting, I didn't realize we handle NULLs like that. INDEX_MAX_KEYS 
is 32, which is less than the 100 lwlock limit, so I think we're safe 
with that.



> BTW, I was thinking that it's not very right to let user picksplit decide
> what split ratio (ratio of tuples in smaller and greater pages) is
> acceptable.


It's not hard to imagine a case where it really does make sense to split 
a page so that one tuple goes on one page, and all the rest go to 
another. For example, imagine that a page contains 100 identical tuples, 
plus one that's different from all the rest. Now you insert one more 
tuple that's identical to the 100 tuples, and the insert causes a page 
split. It makes sense to split off the single outlier to a page of its 
own, and put all the rest on one page. One more insert will make the 
page split again, but the tree is better organized.


Whether that's too marginal to worry about, and we should enforce a 
split ratio anyway, I'm not sure..


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Christopher Browne
On Thu, May 10, 2012 at 12:55 PM, Josh Berkus  wrote:
> On 5/10/12 9:44 AM, Peter Eisentraut wrote:
>> On tor, 2012-05-10 at 10:44 -0400, Bruce Momjian wrote:
>>> The big take-away is that the release notes are mostly for blame and
>>> to designate a go-to person for feature problems, not for giving
>>> credit,
>>
>> Then reviewers should be removed.
>
> I disagree.  We're trying to get more reviewers, and encourage them to
> do more reviewing.  Giving credit is a big part of that.

As much as that's nice, I don't think that's quite enough reason to do
so, at least not as a last minute afterthought in trying to finalize
the release notes.

On the other hand, if reviewers are considered extra "go-to" people
for the purposes of 'blamecasting' if something goes wrong with a new
feature, that's actually a fine reason to include them.  If both the
developer *and* the reviewer missed an issue, then *both* are
"blameworthy," and if we have any features gone desperately wrong,
both deserve to have appropriate things thrown at them.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-10 Thread Robert Haas
On Thu, May 10, 2012 at 2:07 PM, Kevin Grittner
 wrote:
> Ants Aasma  wrote:
>> It seems to me that the simplest thing to do would be to lift the
>> sampling done in analyze.c (acquire_sample_rows) and use that to
>> implement the SYSTEM sampling method.
>
> Definitely.  I thought we had all agreed on that ages ago.

Right, and I don't think we should be considering any of this other
stuff until that basic thing is implemented and working.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Robert Haas
On Thu, May 10, 2012 at 2:15 PM, Josh Berkus  wrote:
>>>> Then reviewers should be removed.
>>>
>>> I disagree.  We're trying to get more reviewers, and encourage them to
>>> do more reviewing.  Giving credit is a big part of that.
>>
>> Are you disagreeing with Bruce's premise, my logic, or the conclusion?
>
> Hah, good point.  I'm disagreeing with the conclusion that reviewers
> should be removed, unless we're going to remove everyone *and* give them
> credit elsewhere.  Which I would also be in favor of, I'm just not able
> to do the work right now.

Well, the problem with the way it is right now is that we're giving
similar amounts of credit for very different amounts of contribution,
which IMHO is no good.  I think that putting a "Credits" section at
the bottom and listing contributors there would be a reasonable
solution; I also think that crediting people on a web page or in some
other place would be a fine solution.  What we have right now manages
to be both unfair and unreadable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog

2012-05-10 Thread Fujii Masao
On Thu, May 10, 2012 at 11:51 PM, Magnus Hagander  wrote:
> And taking this a step further - we *already* send these GUCs.
> Previous references to us not doing that were incorrect :-)
>
> So this should be a much easier fix than we thought. And can be done
> entirely in pg_basebackup, meaning we don't need to worry about
> beta...

Sounds good!

Regards,

-- 
Fujii Masao



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Alexander Korotkov
"Improve GiST box and point index performance by producing better trees
with less memory allocation overhead (Alexander Korotkov, Heikki
Linnakangas, Kevin Grittner)"
Is this note about the following two commits?
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7f3bd86843e5aad84585a57d3f6b80db3c609916
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d50e1251946a6e59092f0a84fc903532eb599a4f
These improvements influence not only boxes and points but all geometrical
datatypes.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Josh Berkus

>>> Then reviewers should be removed.
>>
>> I disagree.  We're trying to get more reviewers, and encourage them to
>> do more reviewing.  Giving credit is a big part of that.
> 
> Are you disagreeing with Bruce's premise, my logic, or the conclusion?

Hah, good point.  I'm disagreeing with the conclusion that reviewers
should be removed, unless we're going to remove everyone *and* give them
credit elsewhere.  Which I would also be in favor of, I'm just not able
to do the work right now.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-10 Thread Kevin Grittner
Ants Aasma  wrote:
 
> It seems to me that the simplest thing to do would be to lift the
> sampling done in analyze.c (acquire_sample_rows) and use that to
> implement the SYSTEM sampling method. 
 
Definitely.  I thought we had all agreed on that ages ago.
 
-Kevin



Re: [HACKERS] Corner cases with GiST n-way splits

2012-05-10 Thread Alexander Korotkov
On Thu, May 10, 2012 at 9:14 PM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:

> GiST page splitting has the peculiarity that it sometimes needs to split a
> single page into more than two pages. It happens rarely in practice, but it
> possible (*). With a bad picksplit function, it happens more often.
>
> While testing with a custom gist opclass with truly evil helper functions,
> I found two corner cases with the current implementation when a page is
> split into many halves:
>
> 1. If a page is split into more than 100 pages, you run into the same
> limit of 100 simultaneous lwlocks that Tom Forbes reported with a
> pathological intarray index. This time it's not because we hold locks on
> many different levels, but because of a single split.
>
> 2. When the root page is split, there is no parent page to update, so we
> just create a new root page with the downlinks. However, when you split a
> page into a lot of siblings, it's possible that all the downlinks don't fit
> on a single page. The code is not prepared for that situation. You get an
> error, when it tries to add more downlinks on a single page than fit there.
>
> I'm not sure what to do about these. Neither issue is something you'd
> actually bump into in an index that's doing something useful; there's been
> no user complaints about these.
>

If such cases are very rare, we could call genericPickSplit if decide user
picksplit function result to be bad. We're already doing this if user
picksplit puts all the tuples into one page.

GiST can split page into many pages because of nulls and multicolumn
indexes independently on user picksplit function.
Imagine we've following tuples:

tuple key1  key2  key3  key4  key5  ...
1     value value value value value ...
2     NULL  value value value value ...
3     NULL  NULL  NULL  value value ...
4     NULL  NULL  NULL  NULL  value ...
5     NULL  NULL  NULL  NULL  NULL  ...
..

In this case splitByKey will find only non-null value in the first key and
splits it into separate page. Then it will do same thing for the second key
etc. However, this process is limited by INDEX_MAX_KEYS.

BTW, I was thinking that it's not very right to let user picksplit decide
what split ratio (ratio of tuples in smaller and greater pages) is
acceptable. We could pass minimal acceptable ratio to the user picksplit
function and call genericPickSplit if ratio is not followed. However, the
question arises if we're going to measure ratio in tuples count or in
tuples size. Ratio in tuples size seems to be more desirable while ratio in
tuples count seems to be easier for user picksplit to follow.
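
To make the count-versus-size question concrete, here is a minimal Python sketch of the check described above. Everything in it is an assumption for illustration: `generic_split` is only a stand-in for genericPickSplit (it simply cuts the page in half by position), tuples are modelled as byte strings, and none of the names correspond to actual GiST code.

```python
from typing import Callable, List, Sequence, Tuple

Page = Sequence[bytes]
Split = Tuple[List[bytes], List[bytes]]


def split_ratio(left: Page, right: Page, by_size: bool = False) -> float:
    """Smaller side divided by larger side, measured in tuples or in bytes."""
    if by_size:
        a, b = sum(len(t) for t in left), sum(len(t) for t in right)
    else:
        a, b = len(left), len(right)
    larger = max(a, b)
    return min(a, b) / larger if larger else 0.0


def generic_split(tuples: Page) -> Split:
    """Hypothetical stand-in for genericPickSplit: cut the page in half by position."""
    mid = len(tuples) // 2
    return list(tuples[:mid]), list(tuples[mid:])


def checked_split(tuples: Page, user_picksplit: Callable[[Page], Split],
                  min_ratio: float, by_size: bool = False) -> Split:
    """Use the user's split unless it is more lopsided than min_ratio allows."""
    left, right = user_picksplit(tuples)
    if split_ratio(left, right, by_size) < min_ratio:
        return generic_split(tuples)
    return left, right


def isolate_outlier(tuples: Page) -> Split:
    """A user picksplit that puts the one large tuple on a page of its own."""
    return [t for t in tuples if len(t) > 1], [t for t in tuples if len(t) == 1]


page = [b"a"] * 100 + [b"x" * 20]
by_count = checked_split(page, isolate_outlier, min_ratio=0.1)
by_bytes = checked_split(page, isolate_outlier, min_ratio=0.1, by_size=True)
print(len(by_count[0]), len(by_count[1]))
print(len(by_bytes[0]), len(by_bytes[1]))
```

In this made-up case the same user split is rejected when the ratio is counted in tuples (1 against 100) but accepted when it is counted in bytes (20 against 100), which is the difference between the two measures that the question is about.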

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Robert Haas
On Thu, May 10, 2012 at 1:44 PM, Bruce Momjian  wrote:
> Not sure where to move that to.  Source Code doesn't seem right.  I
> moved it lower in the performance section.

I'd just delete it.  Instead, under index-only scans, I'd mention it
in the detail text: "This is possible because the visibility map has
been improved to be robust even in the face of database or system
crashes.  Various race conditions that could result in incorrect data
in the visibility map have also been fixed."

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 05:57:01AM -0700, Josh Kupershmidt wrote:
> On Wed, May 9, 2012 at 8:11 PM, Bruce Momjian  wrote:
> > I have completed my draft of the 9.2 release notes, and committed it to
> > git.  I am waiting for our development docs to build, but after 40
> > minutes, I am still waiting:
> 
> This bit:
>   Previously supplied years and year masks of less than four digits
> wrapped inconsistently.
> 
> I first read as "Previously-supplied years..." instead of "Previously,
> years and year masks with...".

Good suggestion, I fixed that, and a few more, with the applied patch.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
commit 45f6fb2713feb21bc24fa907bad575008fd680ef
Author: Bruce Momjian 
Date:   Thu May 10 13:47:49 2012 -0400

Add comma after "Previously" as suggested by Josh Kupershmidt

diff --git a/doc/src/sgml/release-9.2.sgml b/doc/src/sgml/release-9.2.sgml
new file mode 100644
index ed8ce99..fc477c8
*** a/doc/src/sgml/release-9.2.sgml
--- b/doc/src/sgml/release-9.2.sgml
***
*** 153,159 
 
 
 
! Previously supplied years and year masks of less than four digits
  wrapped inconsistently.
 

--- 153,159 
 
 
 
! Previously, supplied years and year masks of less than four digits
  wrapped inconsistently.
 

***
*** 1010,1016 
 
 
 
! Previously such not-valid-for-session errors would cause all
  setting changes to be ignored by that backend.
 

--- 1010,1016 
 
 
 
! Previously, such not-valid-for-session errors would cause all
  setting changes to be ignored by that backend.
 

***
*** 1165,1171 
 
 
 
! Previously the generic label ?column? was used.
 


--- 1165,1171 
 
 
 
! Previously, the generic label ?column? was used.
 


***
*** 1852,1858 
 
 
 
! Previously default permissions generated NULL fields.
  (WAS IT NULL?)
 

--- 1852,1858 
 
 
 
! Previously, default permissions generated NULL fields.
  (WAS IT NULL?)
 

***
*** 2389,2395 
 
 
 
! Previously libpq collected the entire query result into memory
  before passing it back to the application.
 

--- 2389,2395 
 
 
 
! Previously, libpq collected the entire query result into memory
  before passing it back to the application.
 




Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-10 Thread Ants Aasma
On Thu, May 10, 2012 at 6:33 PM, Robert Haas  wrote:
> I'm worried this project is getting so complicated that it will be
> beyond the ability of a new hacker to get anything useful done.  Can
> we simplify the requirements here to something that is reasonable for
> a beginner?

It seems to me that the simplest thing to do would be to lift the
sampling done in analyze.c (acquire_sample_rows) and use that to
implement the SYSTEM sampling method. The language in the standard
leads me to believe that Vitter's algorithm used in analyze.c is
exactly what was intended by the authors. The difference between
Vitter's algorithm and a pure Bernoulli process is precisely that
Vitter's method increases the chances to see multiple rows picked from
a single page.

One tricky issue is that tablesample is defined in terms of a
percentage of the underlying table while Vitter's algorithm needs a
fixed number of rows. The standard does state that the result needs to
contain "approximately" the stated percentage of rows. I'm not sure if
calculating the amount of rows to return from reltuples would fit that
definition of approximate. If not, would re-estimating the amount of
reltuples after sampling and taking appropriate corrective action make
it better or would an accurate number be necessary. Getting an
accurate number efficiently would require solving of the COUNT(*)
issue.
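
To illustrate the percentage-to-row-count step, here is a minimal Python sketch. It is not the code in analyze.c: it uses the simple one-pass reservoir method (Algorithm R) in place of Vitter's skip-based algorithm, and `target_rows`, `reservoir_sample` and the numbers below are assumptions made up for the example.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def target_rows(percent: float, reltuples: float) -> int:
    """Turn a TABLESAMPLE-style percentage into a fixed row count from an estimate."""
    return max(0, round(reltuples * percent / 100.0))


def reservoir_sample(rows: Iterable[T], n: int, rng: random.Random) -> List[T]:
    """Algorithm R: keep a uniform random sample of n rows in a single pass."""
    sample: List[T] = []
    for seen, row in enumerate(rows, start=1):
        if len(sample) < n:
            sample.append(row)           # fill the reservoir first
        else:
            j = rng.randrange(seen)      # uniform in [0, seen)
            if j < n:
                sample[j] = row          # replace with probability n/seen
    return sample


rng = random.Random(42)
table = range(10_000)                            # pretend table with 10,000 rows
n = target_rows(percent=1.0, reltuples=9_000)    # stale estimate of the row count
sample = reservoir_sample(table, n, rng)
print(n, len(sample))
```

With a stale estimate of 9,000 rows, a 1 percent request yields 90 rows from a table that actually holds 10,000, which is the kind of gap the "approximately" wording would have to cover.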

For the Bernoulli case I can't think of anything simple that would
better than just scanning the table or poking it with random TIDs.
(the latter has the same problem of estimating the desired result set
size) It seems to me that Vitter's approach could be amended to
produce independent samples by selecting slightly more pages than
result tuples and tweaking the acceptance levels to cancel out the
bias. But that definitely isn't in the territory of simple and would
require rigorous statistical analysis.
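
For comparison, the "just scan the table" form of Bernoulli sampling can be sketched in a few lines of Python. This is only an illustration under the assumption that every row is visited once and kept independently with the requested probability; `bernoulli_sample` is a made-up name.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def bernoulli_sample(rows: Iterable[T], percent: float, rng: random.Random) -> List[T]:
    """Keep each row independently with probability percent/100 (full scan)."""
    p = percent / 100.0
    return [row for row in rows if rng.random() < p]


rng = random.Random(7)
picked = bernoulli_sample(range(10_000), 1.0, rng)
print(len(picked))  # close to 100, but not exactly 100 in general
```

The result size is itself random, so this approach needs no row-count estimate, at the cost of reading the whole table.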

And as for the monetary unit sampling, I agree that this is better
left as an optional extra.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] PL/Python result set slicing broken in Python 3

2012-05-10 Thread Peter Eisentraut
On lör, 2012-05-05 at 22:45 +0200, Jan Urbański wrote:
> Apparently once you implement PyMappingMethods.mp_subscript you can
> drop PySequenceMethods.sq_slice, but I guess there's no harm in
> keeping it (and I'm not sure it'd work on Python 2.3 with only
> mp_subscript implemented).

Committed this now.

From test coverage reports, I now see that PLy_result_ass_item() is no
longer called.  That's probably OK, if assignments are now handled
through the mapping methods.  But should we remove the function then?
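
At the Python level, the mapping slots correspond to `__getitem__` and `__setitem__`, which in Python 3 receive both integer indexes and slice objects. A minimal sketch follows; `ToyResult` is a made-up class for illustration and not the actual PL/Python result type.

```python
class ToyResult:
    """Toy stand-in for a result set; not the real PL/Python result object."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, key):
        # Both r[1] and r[1:3] arrive here; key is an int or a slice object.
        return self._rows[key]

    def __setitem__(self, key, value):
        # Item and slice assignment both arrive here as well.
        self._rows[key] = value


r = ToyResult([{"a": 1}, {"a": 2}, {"a": 3}])
print(r[1])        # single row
print(r[1:3])      # slice of rows
r[0] = {"a": 10}   # item assignment goes through __setitem__
print(r[0])
```

If assignment is routed this way, a separate sequence-style item-assignment hook would no longer be reached, which is consistent with the coverage observation above.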

> 
> Do we want to backpatch this? If so, I'd need to produce a version
> that applies to the monolithic plpython.c file from the previous
> releases. 

I don't think this should be backpatched.




Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 01:11:54PM +0100, Peter Geoghegan wrote:
> On 10 May 2012 04:11, Bruce Momjian  wrote:
> > I have completed my draft of the 9.2 release notes, and committed it to
> > git.  I am waiting for our development docs to build, but after 40
> > minutes, I am still waiting:
> 
> "Allow the bgwriter, walwriter, and statistics collector to sleep more
> efficiently during periods of inactivity (Peter Geoghegan, Heikki
> Linnakangas, Tom Lane)...This reduces CPU wake-ups."
> 
> I think that there should be mention of why this is a good thing. When
> fully idle the server reaches less than a single wake-up per second,

I added text that says it reduces power consumption on idle servers.

> which I think is a nice, relevant fact. You should add the archiver
> and checkpointer to that list, though I suppose you could argue that
> the checkpointer, as a "new" auxiliary process, shouldn't count.

I added the archiver and checkpointer to the list.  Seems there is no
doc section to link to for these processes.

> Why can't we call group commit group commit (and for that matter,
> index-only scans index-only scans), so that people will understand
> that we are now competitive with other RDBMSs in this area? "Improve
> performance of WAL writes when multiple transactions commit at the
> same time" seems like a pretty bad description, since it doesn't make
> any reference to batching of commits.  Also, I don't think that the

I didn't call it "group commit" because we have settings we used to
regard as group commit:

#commit_delay = 0           # range 0-100000, in microseconds
#commit_siblings = 5        # range 1-1000

These are still there.  Should they be removed?

I updated the release docs to call the item "group commit" because I now
don't see any reference to that term in our docs.

> placement of this as the second to last performance feature is
> commensurate with its actual importance. Still, the actual major

I am really unclear on how the performance items should be listed in
terms of importance, so I am ready for someone to tell me the proper
order.

> feature list is a much more relevant indicator of how it is felt that
> individual features should be weighed, and of course that hasn't been
> decided upon yet.
> 
> "Change pg_stat_statements' total_time column to be measured in
> milliseconds (Tom Lane)". Surely this should be under
> "pg_stat_statements"?

I had it above because it was a major incompatibility.  I do have some
incompatibilities, e.g. pg_upgrade, that I kept in their own section. 
Should I move it?  Can we assume people will also look in per-module
sections for incompatibility information?

> Does "Make the visibility map crash-safe" really belong under "Performance"?

Not sure where to move that to.  Source Code doesn't seem right.  I
moved it lower in the performance section.

> It's not clear that this isn't just within comments that will be later
> removed, but I'd remove all references to "we".

Fixed.  Attached patch applied.  Thanks.

I do appreciate all the feedback.  I think I got almost zero feedback on
9.1 and it was kind of weird.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
commit ffda90f3afe3f4db10127d2b853dfe4230720873
Author: Bruce Momjian 
Date:   Thu May 10 13:38:05 2012 -0400

9.2 release note updates from Peter Geoghegan

diff --git a/doc/src/sgml/release-9.2.sgml b/doc/src/sgml/release-9.2.sgml
new file mode 100644
index 0b43c3a..ed8ce99
*** a/doc/src/sgml/release-9.2.sgml
--- b/doc/src/sgml/release-9.2.sgml
***
*** 72,79 
 
 
  Users should now use hstore(text, text).  Since
! PostgreSQL 9.0, we have emitted a
! warning message when an operator named => is created because
  the SQL standard reserves that token for
  another use.
 
--- 72,79 
 
 
  Users should now use hstore(text, text).  Since
! PostgreSQL 9.0, a warning message is
! emitted when an operator named => is created because
  the SQL standard reserves that token for
  another use.
 
***
*** 462,478 


 
- Make the visibility map crash-safe (Robert Haas, Noah Misch)
-
-
-
- This helps vacuum be more efficient, and is necessary for
- index-only scans.
-
-   
-   
-   
-
  Improve GiST box and point index performance by producing better
  trees with less memory allocation overhead (Alexander Korotkov,
  Heikki Linnakangas, Kevin Grittner)
--- 462,467 
***
*** 545,553 


 
! Improve performance of WAL writes when multiple
! transactions commit at the same time (Peter Geoghegan, Simon Riggs,
! Heikki Linn

Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Peter Eisentraut
On tor, 2012-05-10 at 09:55 -0700, Josh Berkus wrote:
> On 5/10/12 9:44 AM, Peter Eisentraut wrote:
> > On tor, 2012-05-10 at 10:44 -0400, Bruce Momjian wrote:
> >> The big take-away is that the release notes are mostly for blame and
> >> to designate a go-to person for feature problems, not for giving
> >> credit,
> > 
> > Then reviewers should be removed.
> 
> I disagree.  We're trying to get more reviewers, and encourage them to
> do more reviewing.  Giving credit is a big part of that.

Are you disagreeing with Bruce's premise, my logic, or the conclusion?




Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 09:55:37AM -0700, Josh Berkus wrote:
> On 5/10/12 9:44 AM, Peter Eisentraut wrote:
> > On tor, 2012-05-10 at 10:44 -0400, Bruce Momjian wrote:
> >> The big take-away is that the release notes are mostly for blame and
> >> to designate a go-to person for feature problems, not for giving
> >> credit,
> > 
> > Then reviewers should be removed.
> 
> I disagree.  We're trying to get more reviewers, and encourage them to
> do more reviewing.  Giving credit is a big part of that.

OK, we are officially now "all over the map" on this!  I did favor
reviewers over committers for the purpose of encouragement.  I am fine
with anything that has the same or fewer names than we have now.  I
don't think anyone is arguing for more names, so at least we have a
direction.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 12:18:08PM +0200, Magnus Hagander wrote:
> On Thu, May 10, 2012 at 5:11 AM, Bruce Momjian  wrote:
> > (Why is there no time zone shown in the date/time at the top?)   I think
> > it will eventually show up here:
> >
> >        http://www.postgresql.org/docs/devel/static/release-9-2.html
> >
> 
> Other than the comments others have specified:
> 
> * Add libpq parameters for specifying the locations of server-side SSL
> files (Peter Eisentraut)
> 
> Those are regular server side gucs and not libpq parameters. You
> certainly can't control the location of server-side files with libpq
> parameters..

But imagine how much fun we could have if we could!  Anyway,  fixed.  ;-)
Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



[HACKERS] Corner cases with GiST n-way splits

2012-05-10 Thread Heikki Linnakangas
GiST page splitting has the peculiarity that it sometimes needs to split 
a single page into more than two pages. It happens rarely in practice, 
but it is possible (*). With a bad picksplit function, it happens more often.


While testing with a custom gist opclass with truly evil helper 
functions, I found two corner cases with the current implementation when 
a page is split into many halves:


1. If a page is split into more than 100 pages, you run into the same 
limit of 100 simultaneous lwlocks that Tom Forbes reported with a 
pathological intarray index. This time it's not because we hold locks on 
many different levels, but because of a single split.


2. When the root page is split, there is no parent page to update, so we 
just create a new root page with the downlinks. However, when you split 
a page into a lot of siblings, it's possible that all the downlinks 
don't fit on a single page. The code is not prepared for that situation. You 
get an error when it tries to add more downlinks on a single page than 
fit there.


I'm not sure what to do about these. Neither issue is something you'd 
actually bump into in an index that's doing something useful; there 
have been no user complaints about these.



(*) What follows is an explanation of how a page can be split into more 
than two halves, to help you (and me!) understand this:


In a very pathological case, it's possible for a single insertion to 
cause a page to be split into hundreds of pages. Imagine that you have a 
page full of very small tuples (let's imagine that a page can hold 8 
letters, and ignore all tuple overhead for now):


A B C D E F G H

Now you insert one large tuple on the page. The picksplit algorithm 
can choose to split this as:


A - B C D E F G H 

The right side is still too large to fit on a single page, so it's 
iteratively split again:


A - B - C D E F G H 

And again:

A - B - C - D E F G H 

And again:

A - B - C - D - E F G H 

In this example, the page was split into 5 halves, but in reality a page 
can hold many more tuples, and the difference between a small and a 
large tuple can be much greater, so you can end up with many more 
siblings in one split.
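
The iterative splitting described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the GiST code itself: the function names, the "X" label for the large tuple, and its size of 4 units are all hypothetical, chosen so that the result is five pages. The picksplit here is deliberately "evil": it always peels off a single tuple.

```python
def total_size(page):
    """Sum of tuple sizes on a page; each tuple is a (name, size) pair."""
    return sum(size for _, size in page)


def evil_picksplit(page):
    """Hypothetical worst-case picksplit: one tuple left, the rest right."""
    return page[:1], page[1:]


def split_page(page, capacity, picksplit):
    """Split a page repeatedly until every resulting page fits."""
    if total_size(page) <= capacity or len(page) <= 1:
        return [list(page)]
    left, right = picksplit(page)
    return (split_page(left, capacity, picksplit)
            + split_page(right, capacity, picksplit))


# Eight small tuples of size 1, plus one large tuple (assumed size 4),
# on a page that can hold 8 units in total.
page = [(name, 1) for name in "ABCDEFGH"] + [("X", 4)]
pages = split_page(page, capacity=8, picksplit=evil_picksplit)

for p in pages:
    print(" ".join(name for name, _ in p))
```

With a larger page and a bigger gap between small and large tuples, the same loop yields far more siblings from a single insertion.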


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 10:50:14AM +0300, Heikki Linnakangas wrote:
> On 10.05.2012 06:11, Bruce Momjian wrote:
> >I have completed my draft of the 9.2 release notes, and committed it to
> >git.
> 
> Thanks! I committed a few trivial fixes, below are a few more I
> wasn't sure about:
> 
> >* Add support for range data types (Jeff Davis, Tom Lane, Alexander Korotkov)
> >
> >The range data type records a lower and upper bound, and supports 
> >comparisons like contains, overlaps, and intersection.
> 
> /s/comparisons/operations/ ?
> 
> >* Allow a user to cancel queries in other owned sessions using 
> >pg_cancel_backend() (Magnus Hagander)
> >
> >Previously only the superuser could cancel queries.
> 
> "other owned sessions" is a bit ambiguous. It reads to me like
> "possessed sessions" or "0wned sessions". It's not clear it means
> sessions owned by the same user. How about "... to cancel queries in
> his other sessions, using ..." ? Or:
> 
> * Allow a non-superuser to cancel queries in another backend using
> pg_cancel_backend(), as long as the victim backend belongs to the
> same user
> 
> Previously only the superuser could cancel queries.
> 
> 
> >* Change default names of triggers to fire action triggers before check 
> >triggers (Tom Lane)
> >
> >This allows default-named check triggers to check post-action rows.
> 
> That's quite a mouthful :-). I don't understand what it means.
> 
> >In psql tab completion, complete SQL key words based on COMP_KEYWORD_CASE 
> >setting and the perhaps case of the partially-supplied word (Peter 
> >Eisentraut, Fujii Masao)
> 
> Which is correct spelling, "keyword" or "key word"? We seem to use
> both in the docs. "Keyword" sounds much better to me, but I think I
> might be more prone to write words together than native English
> speakers.

I have made adjustments based on your comments in the attached, applied
patch.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/release-9.2.sgml b/doc/src/sgml/release-9.2.sgml
new file mode 100644
index 86a4e04..d63b572
*** a/doc/src/sgml/release-9.2.sgml
--- b/doc/src/sgml/release-9.2.sgml
***
*** 1567,1573 
 
 
  The range data type records a lower and upper bound, and supports
! comparisons like contains, overlaps, and intersection.
 


--- 1567,1573 
 
 
  The range data type records a lower and upper bound, and supports
! operations like contains, overlaps, and intersection.
 


***
*** 1659,1665 
  

 
! Allow a user to cancel queries in other owned sessions using pg_cancel_backend()
  (Magnus Hagander)
 
--- 1659,1665 
  

 
! Allow users to cancel queries in user-matching sessions using pg_cancel_backend()
  (Magnus Hagander)
 
***
*** 1730,1737 
  

 
! Change default names of triggers to fire action triggers before
! check triggers (Tom Lane)
 
 
 
--- 1730,1737 
  

 
! Change default names of triggers to fire "action" triggers before
! "check" triggers (Tom Lane)
 
 
 
***
*** 2229,2235 

 
  In psql tab completion,
! complete SQL key words based on
  COMP_KEYWORD_CASE setting and the perhaps case of
  the partially-supplied word (Peter Eisentraut, Fujii Masao)
 
--- 2229,2235 

 
  In psql tab completion,
! complete SQL keywords based on
  COMP_KEYWORD_CASE setting and the perhaps case of
  the partially-supplied word (Peter Eisentraut, Fujii Masao)
 



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 01:56:33AM -0400, Robert Haas wrote:
> As a general comment, I think that your new policy of crediting the
> reviewer on every feature except when that reviewer is also a
> committer has produced a horrific mess.  Just to pick one of many
> examples, consider this item:
> 
> Add a security_barrier option for views (KaiGai Kohei, Noah Misch)
> 
> Here is what the commit message says:
> 
> Patch by KaiGai Kohei; original problem report by Heikki Linnakangas
> (in October 2009!).  Review (in earlier versions) by Noah Misch and
> others.  Design advice by Tom Lane and myself.  Further review and
> cleanup by me.
> 
> So there are four people mentioned in this commit message, and you've
> picked out two of them to credit, not on the basis of who did the most
> work, but rather on the basis of which ones happen to not be
> committers.  The result is that, as I read through these release
> notes, one gets what I believe to be a very misleading notion of who
> developed which features.  I don't object to not being credited on
> this one, but I don't think it makes sense to credit Noah and NOT
> credit me.  As you have it, people who did little more than say "yep,
> looks fine to me" are credited almost equally with the people who
> wrote the code, while a committer who heavily revised the patch may
> not be mentioned at all, or sometimes (seemingly at random) they are.

I assumed reviewers mentioned in the commit messages made substantive
suggestions on improving the patch, rather than just +1.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 01:56:33AM -0400, Robert Haas wrote:
> On Wed, May 9, 2012 at 11:11 PM, Bruce Momjian  wrote:
> > I have completed my draft of the 9.2 release notes, and committed it to
> > git.
> 
> Extra parens:
> Remove the spclocation field from pg_tablespace (Magnus Hagander, Tom Lane))
> Reduce overhead of creating virtual transaction id locks ((Robert
> Haas, Jeff Davis)

Done.

> The antecedent of "these" is unclear:
> Allow backends to detect postmaster death via a pipe read failure,
> rather than polling (Peter Geoghegan, Heikki Linnakangas)
> These are internally called "latches".

Fixed.

> Missing comma:
> Cancel queries if clients get disconnected (Florian Pflug Greg Jaskiewicz)

Fixed by someone else.

> You mean "effect":
> Such casts have no affect.

Fixed.

> 
> I think all three of these are the same thing:
> Avoid table and index rebuilds when NUMERIC, VARBIT, and temporal
> columns are changed in compatible ways (Noah Misch)
> Reduce need to rebuild indexes for various ALTER TABLE operations
> (Noah Misch) DUPLICATE?
> Avoid index rebuilds for no-rewrite ALTER TABLE / ALTER TYPE (Noah Misch)

Agreed, duplicates removed.

> This feature wasn't committed at all:
> Parallel pg_dump (Robert Haas, Joachim Wieland) DETAILS?

I was confused because there were infrastructure commits mentioning the
feature, so I thought it was lost somehow.

> Yes, this is still true:
> This is currently unused. STILL TRUE?

OK, fixed.

The attached patch includes all these fixes.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/release-9.2.sgml b/doc/src/sgml/release-9.2.sgml
new file mode 100644
index e1f5c90..86a4e04
*** a/doc/src/sgml/release-9.2.sgml
--- b/doc/src/sgml/release-9.2.sgml
***
*** 97,103 
  Remove the spclocation
  field from pg_tablespace (Magnus Hagander,
! Tom Lane))
 
 
 
--- 97,103 
  Remove the spclocation
  field from pg_tablespace (Magnus Hagander,
! Tom Lane)
 
 
 
***
*** 601,607 
 
 
 
! These are internally called "latches".
 


--- 601,607 
 
 
 
! The wait events are internally called "latches".
 


***
*** 1195,1201 
 
 
 
! Such casts have no affect.
 


--- 1195,1201 
 
 
 
! Such casts have no effect.
 


***
*** 1312,1326 
  

 
! Avoid table and index rebuilds when NUMERIC,
! VARBIT, and temporal columns are changed in compatible
! ways (Noah Misch)
!
!   
!   
!   
!
! Reduce need to rebuild indexes for various ALTER TABLE
  operations (Noah Misch) DUPLICATE?
 
--- 1312,1318 
  

 
! Reduce need to rebuild tables and indexes for various ALTER TABLE
  operations (Noah Misch) DUPLICATE?
 
***
*** 1328,1341 


 
- Avoid index rebuilds for no-rewrite ALTER TABLE
- / ALTER TYPE
- (Noah Misch)
-
-   
-   
-   
-
  Add IF EXIST clause to ALTER
  commands (Pavel Stehule)
 
--- 1320,1325 
***
*** 2291,2303 
  

 
- Parallel pg_dump (Robert Haas, Joachim Wieland)
- DETAILS?
-
-   
-   
-   
-
  Add an --exclude-table-data option to
  pg_dump (Andrew Dunstan)
 
--- 2275,2280 
***
*** 2513,2519 
 
 
 
! This is currently unused.  STILL TRUE?
 


--- 2490,2496 
 
 
 
! This is currently unused.
 





Re: [HACKERS] Can pg_trgm handle non-alphanumeric characters?

2012-05-10 Thread Fujii Masao
On Fri, May 11, 2012 at 12:07 AM, MauMau  wrote:
> Thanks for your explanation. Although I haven't understood it well yet, I'll
> consider what you taught. And I'll consider if the tentative measure of
> removing KEEPONLYALNUM is correct for someone who wants to use pg_trgm
> against Japanese text.

In Japanese, it's common to do a text search with a two-character keyword.
But since pg_trgm is 3-gram, you basically would not be able to use an index
for such a text search. So you might need something like pg_bigm or pg_unigm
for Japanese text search.
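
To illustrate the point, here is a minimal sketch of plain n-gram extraction. It is deliberately simplified and is not pg_trgm's actual algorithm: it ignores pg_trgm's word padding and normalization, and the function name is made up for this example. It only shows why a two-character substring yields no 3-grams to look up, while a 2-gram scheme would have something to work with.

```python
def ngrams(text, n):
    """Return the set of contiguous n-character substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


keyword = "東京"            # a two-character search keyword
document = "東京都庁"        # text that contains the keyword

print(ngrams(document, 3))  # 3-grams available in the indexed text
print(ngrams(keyword, 3))   # nothing to match against for the keyword
print(ngrams(keyword, 2))   # a 2-gram scheme does produce a key
```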

Regards,

-- 
Fujii Masao



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Josh Berkus
On 5/10/12 9:44 AM, Peter Eisentraut wrote:
> On tor, 2012-05-10 at 10:44 -0400, Bruce Momjian wrote:
>> The big take-away is that the release notes are mostly for blame and
>> to designate a go-to person for feature problems, not for giving
>> credit,
> 
> Then reviewers should be removed.

I disagree.  We're trying to get more reviewers, and encourage them to
do more reviewing.  Giving credit is a big part of that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 07:40:29PM +0300, Peter Eisentraut wrote:
> On tor, 2012-05-10 at 12:24 -0400, Tom Lane wrote:
> > >
> > openjade:/home/bf/bfr/root/HEAD/pgsql.9367/../pgsql/doc/src/sgml/release-9.2.sgml:1946:14:E:
> >  "324" is not a character number in the document character set
> > 
> > I get the same, and so do some of the buildfarm members.  I've changed
> > the text and added a note to release.sgml specifying not to use
> > numeric character entities. 
> 
> The problem is not using numeric character entities, it's using a
> character not in the document character set, which is Latin 1.

That's what I suspected.  I now see they were saying you have to define
the charset as Unicode, which we can't do.  I updated the docs again to
explain that.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] "pgstat wait timeout" just got a lot more common on Windows

2012-05-10 Thread Tom Lane
I wrote:
> Hence I think we oughta swap the order of those two array
> elements.  (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
> pgwin32_select.)

Oh ... while hacking win32 PGSemaphoreLock I saw that it has a *seriously*
nasty bug: it does not reset ImmediateInterruptOK before returning.
How is it that Windows machines aren't falling over constantly?

regards, tom lane



Re: [HACKERS] PL/Python result set slicing broken in Python 3

2012-05-10 Thread Peter Eisentraut
On tor, 2012-05-10 at 12:37 -0400, Robert Haas wrote:
> On Sat, May 5, 2012 at 4:45 PM, Jan Urbański  wrote:
> >> I found some instructions on how to deal with the Python 2/Python 3
> >> slicing mess:
> >>
> >>
> >> http://renesd.blogspot.com/2009/07/python3-c-api-simple-slicing-sqslice.html
> >
> >
> > Thanks to the helpful folk at #python I found out that the fix is much
> > easier. Attached is a patch that fixes the bug and passes regression tests
> > on Pythons 2.3 through 3.2.
> >
> > Apparently once you implement PyMappingMethods.mp_subscript you can drop
> > PySequenceMethods.sq_slice, but I guess there's no harm in keeping it (and
> > I'm not sure it'd work on Python 2.3 with only mp_subscript implemented).
> >
> > Do we want to backpatch this? If so, I'd need to produce a version that
> > applies to the monolithic plpython.c file from the previous releases.
> 
> Did this get forgotten about?

I'm working on it.




Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Peter Eisentraut
On tor, 2012-05-10 at 10:44 -0400, Bruce Momjian wrote:
> The big take-away is that the release notes are mostly for blame and
> to designate a go-to person for feature problems, not for giving
> credit,

Then reviewers should be removed.




Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Peter Eisentraut
On tor, 2012-05-10 at 12:24 -0400, Tom Lane wrote:
> >
> openjade:/home/bf/bfr/root/HEAD/pgsql.9367/../pgsql/doc/src/sgml/release-9.2.sgml:1946:14:E:
>  "324" is not a character number in the document character set
> 
> I get the same, and so do some of the buildfarm members.  I've changed
> the text and added a note to release.sgml specifying not to use
> numeric character entities. 

The problem is not using numeric character entities, it's using a
character not in the document character set, which is Latin 1.
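
A quick way to see the distinction, sketched in Python rather than with the SGML toolchain (the helper name is hypothetical): character number 324 is U+0144, which has no Latin 1 encoding, whereas other accented letters do.

```python
def in_latin1(ch):
    """Return True if the character can be encoded in Latin 1 (ISO 8859-1)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for ch in (chr(324), "ñ", "é"):
    print(repr(ch), ord(ch), in_latin1(ch))
```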





Re: [HACKERS] PL/Python result set slicing broken in Python 3

2012-05-10 Thread Robert Haas
On Sat, May 5, 2012 at 4:45 PM, Jan Urbański  wrote:
>> I found some instructions on how to deal with the Python 2/Python 3
>> slicing mess:
>>
>>
>> http://renesd.blogspot.com/2009/07/python3-c-api-simple-slicing-sqslice.html
>
>
> Thanks to the helpful folk at #python I found out that the fix is much
> easier. Attached is a patch that fixes the bug and passes regression tests
> on Pythons 2.3 through 3.2.
>
> Apparently once you implement PyMappingMethods.mp_subscript you can drop
> PySequenceMethods.sq_slice, but I guess there's no harm in keeping it (and
> I'm not sure it'd work on Python 2.3 with only mp_subscript implemented).
>
> Do we want to backpatch this? If so, I'd need to produce a version that
> applies to the monolithic plpython.c file from the previous releases.

Did this get forgotten about?
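
For context on the quoted fix, here is a Python-level analogy, not the C patch itself. In Python 3 a slice expression reaches an object through its subscript handler (`__getitem__`, the counterpart of mp_subscript) with a slice object as the key. The class below is hypothetical and only mimics a result set.

```python
class ResultRows:
    """Hypothetical stand-in for a result set supporting index and slice."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, key):
        # Slices arrive here as slice objects; plain indexes as integers.
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self._rows))
            return [self._rows[i] for i in range(start, stop, step)]
        return self._rows[key]


rows = ResultRows([{"id": i} for i in range(5)])
print(rows[1:3])
print(rows[0])
```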

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-10 Thread Kevin Grittner
Robert Haas  wrote:
 
> I wonder if you could do this with something akin to the Bitmap
> Heap Scan machinery.  Populate a TID bitmap with a bunch of
> randomly chosen TIDs, fetch them all in physical order
 
It would be pretty hard for any other plan to beat that by very
much, so it seems like a good approach which helps keep things
simple.
 
> and if you don't get as many rows as you need, rinse and repeat
> until you do.
 
Ay, there's the rub.  If you get too many, it is important that you
read all the way to the end and then randomly omit some of them. 
While a bit of a bother, that's pretty straightforward and should be
pretty fast, assuming you're not, like, an order of magnitude high. 
But falling short is tougher; making up the difference could be an
iterative process, which could always wind up with having you read
all tuples in the table without filling your sample.  Still, this
approach seems like it would perform better than generating random
ctid values and randomly fetching until you've tried them all.
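 
A rough sketch of that loop, with a plain Python set standing in for the heap and (block, offset) tuples standing in for TIDs. All names and the batch size are assumptions made for illustration, and nothing here reflects the real Bitmap Heap Scan code. Each batch is read to the end in physical order, any surplus is trimmed randomly afterwards, and the loop gives up once every TID has been tried.

```python
import random


def sample_tids(live_tids, all_tids, n, batch_size, rng):
    """Return up to n live TIDs chosen uniformly at random."""
    untried = list(all_tids)
    rng.shuffle(untried)              # random order of candidate TIDs
    found = []
    while len(found) < n and untried:
        batch, untried = untried[:batch_size], untried[batch_size:]
        for tid in sorted(batch):     # fetch the batch in physical order
            if tid in live_tids:
                found.append(tid)     # no early exit: read the whole batch
    if len(found) > n:
        found = rng.sample(found, n)  # randomly omit the surplus
    return found


all_tids = [(blk, off) for blk in range(10) for off in range(10)]
live_tids = {tid for tid in all_tids if (tid[0] + tid[1]) % 3 == 0}

print(sample_tids(live_tids, all_tids, 5, 20, random.Random(0)))
```

Stopping partway through a batch would favor rows in earlier blocks, which is why the surplus is dropped only after the whole batch has been read. If the table holds fewer than n live rows, the function returns all of them after trying every TID.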
 
> I'm worried this project is getting so complicated that it will be
> beyond the ability of a new hacker to get anything useful done. 
> Can we simplify the requirements here to something that is
> reasonable for a beginner?
 
I would be inclined to omit monetary unit sampling from the first
commit.  Do the parts specified in the standard first and get it
committed.  Useful as unit sampling is, it seems like the hardest to
do, and should probably be done "if time permits" or left as a
future enhancement.  It's probably enough to just remember that it's
there and make a "best effort" attempt not to paint ourselves in a
corner which precludes its development.
 
-Kevin



Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file foreign tables

2012-05-10 Thread Robert Haas
On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
 wrote:
> I would like to propose to improve parsing efficiency of contrib/file_fdw by
> selective parsing proposed by Alagiannis et al.[1], which means that for a
> CSV/TEXT file foreign table, file_fdw performs binary conversion only for
> the columns needed for query processing.  Attached is a WIP patch
> implementing the feature.

Can you add this to the next CommitFest?  Looks interesting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 12:24:10PM -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> > This has broken my docs build because of this line:
> 
> > release-9.2.sgml:1946:Urbańnski, Steve Singer)
> 
> > with this error:
> 
> > 
> > openjade:/home/bf/bfr/root/HEAD/pgsql.9367/../pgsql/doc/src/sgml/release-9.2.sgml:1946:14:E:
> >  "324" is not a character number in the document character set
> 
> I get the same, and so do some of the buildfarm members.  I've changed
> the text and added a note to release.sgml specifying not to use numeric
> character entities.

It does build on my Debian Squeeze toolchain and does render right, but I
was very concerned about its use because I saw it referenced in only one
URL:

http://webdesign.about.com/library/bl_htmlcodes.htm

I needed 'n' with an accent.  I have removed that URL from release.sgml.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Peter Eisentraut
On tor, 2012-05-10 at 17:31 +0200, Magnus Hagander wrote:
> If people want the main docs building more often that's not really a
> problem other than time - we just need to decouple it from the
> buildfarm and run a separate job for it. It's not rocket science.. 

Many years ago, Bruce and myself in particular put in a lot of work to
make the turnaround time on the docs build less than 5 minutes, based on
various requests.  I'm disappointed to learn that that was abandoned
without discussion.  We might as well just put the old job back.




Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Tom Lane
Andrew Dunstan  writes:
> This has broken my docs build because of this line:

> release-9.2.sgml:1946:Urbańnski, Steve Singer)

> with this error:

> 
> openjade:/home/bf/bfr/root/HEAD/pgsql.9367/../pgsql/doc/src/sgml/release-9.2.sgml:1946:14:E:
>  "324" is not a character number in the document character set

I get the same, and so do some of the buildfarm members.  I've changed
the text and added a note to release.sgml specifying not to use numeric
character entities.

regards, tom lane



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 11:54:36AM -0400, Andrew Dunstan wrote:
> >We could try cutting it down to one name and see if we have any problems
> >with it.  Robert is right that if you are thinking of this as "credit"
> >it is never going to work.
> >
> 
> 
> I don't really buy this at all. The fact that it's not perfect
> doesn't mean that it's wrong. Just about the only reward we give
> contributors is some kudos, and the more the better as far as I'm
> concerned. I'd almost like to see a "Credits" section of the release
> notes, but if we're not going to have that let's keep doing what we
> have been doing.

Well, the new change is that we now are listing reviewers, but frankly,
we are doing a lot more collaborative work than we have in the past,
meaning there is an increase in the number of names in this release, and
it has been steadily growing even if we don't include the reviewers.

I think the names are a balance between the release notes looking trim
and professional, and giving credit to individuals, however imperfect. 
I think giving credit to companies is going too far away from
trim/professional, and I think most agree on that. (I have suggested the
company names belong mostly in the release announcement.)

The question is how do we handle the explosion of names, and does it
still look trim/professional?  I think we are probably on the far edge
of that with the 9.2 release notes.  Putting names at the bottom or a
"Credit" section just seems also too far away from trim/professional
because there is no procedural reason for the names, and they are
literally listed as "Credit".

Let me also add I am embarrassed at the number of pg_upgrade release note
items with my name on them.  They are all user-visible changes, so
should be listed, but does having 9 pg_upgrade items out of 245 (4%)
seem fair credit-wise?  No.  There's another unfair example.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Andrew Dunstan



On 05/10/2012 11:32 AM, Bruce Momjian wrote:
> On Thu, May 10, 2012 at 11:26:14AM -0400, Robert Haas wrote:
>> There are some cases, like index-only scans, where I think it would be
>> very hard to get down to one name, because four different people wrote
>> code that ended up being part of that.  Now you could probably get it
>> down to just two by cutting Heikki (who isn't listed) and Ibrar (who
>> is) but saying that only one of Tom and I did that feature would be
>> quite misleading regardless of who you picked.  Similarly, there are a
>> couple of patches that I worked on with Simon where crediting only one
>> of us would be wrong, regardless of which one you picked, and I think
>> there are other cases of this involving other people as well.  So I
>> think a hard and fast rule of crediting exactly one person is not
>> going to work, but limiting it to the primary author or authors is
>> feasible.
>>
>> Honestly, I'm leaning more and more toward the view that we should
>> just rip the names out entirely.  I mean, look at something like
>> sortsupport.  That would never have gotten done without Peter
>> Geoghegan's work on it, but the code *as committed* was half mine and
>> half Tom's.  So what are you going to do with that?  It's weird to
>> credit Peter and not Tom or I, and it's weird to credit Tom or I and
>> not Peter, and it's even weird if you credit all three of us because
>> any decision about who to put first is arguable and maybe wrong.  The
>> simplest solution to my mind is to credit no one, which at least has
>> the advantage of being unarguably uniform.
>
> Keep in mind that the reason I originally had names in the release notes
> was so I could remember who to email when something broke.  That really
> isn't the case anymore.
>
> I agree that making these names give _credit_ is never going to work.
> Robert's example above is very clear on that.
>
> We could try cutting it down to one name and see if we have any problems
> with it.  Robert is right that if you are thinking of this as "credit"
> it is never going to work.




I don't really buy this at all. The fact that it's not perfect doesn't 
mean that it's wrong. Just about the only reward we give contributors is 
some kudos, and the more the better as far as I'm concerned. I'd almost 
like to see a "Credits" section of the release notes, but if we're not 
going to have that let's keep doing what we have been doing.



cheers

andrew




Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 11:46:20AM -0400, Andrew Dunstan wrote:
> >>I don't think 5 minutes is anywhere near necessary even for the docs,
> >>but there is a lot of room between 5 minutes and 4 hours, so we can
> >>definitely shorten it.
> >Do you want me to just setup a build on my machine like we did before;
> >5 minutes is no problem for me.
> >
> >I use the doc build to show patch submitters what their final work looks
> >like, and anything more than a few minutes delay makes that useless.
> >
> 
> It's been done the current way for quite a few months now. If you're
> only noticing it now is it really such an inconvenience? Having said
> that, I'm not at all opposed to reducing the lag time.

Well, I am not applying doc patches much anymore;  my point is that the
first time I actually needed to send out a URL of the docs, it wasn't
sufficient for me.

Yes, I can work around it, but is that what we want everyone to do?  I
don't remember this change being discussed anywhere I saw.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Andrew Dunstan



On 05/10/2012 11:24 AM, Bruce Momjian wrote:
> On Thu, May 10, 2012 at 12:49:51PM +0200, Magnus Hagander wrote:
>> On Thu, May 10, 2012 at 12:43 PM, Andrew Dunstan  wrote:
>>> On 05/10/2012 01:29 AM, Tom Lane wrote:
>>>> Bruce Momjian writes:
>>>>> The docs finally built 90 minutes after my commit, and the URL above is
>>>>> now working.  (Does it always take this long to update?)
>>>>
>>>> I believe the new implementation of that stuff is that the devel docs
>>>> are built whenever the buildfarm member guaibasaurus runs for HEAD,
>>>> which it seems to do on an hourly schedule.  This is definitely not as
>>>> fast-responding as Peter's former custom script, but I'm not sure if
>>>> it's worth thinking of another way.
>>>
>>> I don't see any reason it can't run more frequently, though. Currently a run
>>> takes 15 minutes or so. We could reduce that by making it skip some steps,
>>> and get it down to about 10 minutes. It would be perfectly reasonable to run
>>> every 5 minutes (it won't schedule concurrent runs - if the lock file is
>>> held by another run it exits gracefully). Of course, that's up to Magnus and
>>> Stefan.
>>
>> If we can make it do *just* the docs, we can certainly run it a bit
>> more often. But we don't want to make it run the full set of checks
>> more or less continuously, since the machine is shared with a number of
>> other tasks...
>>
>> I don't think 5 minutes is anywhere near necessary even for the docs,
>> but there is a lot of room between 5 minutes and 4 hours, so we can
>> definitely shorten it.
>
> Do you want me to just setup a build on my machine like we did before;
> 5 minutes is no problem for me.
>
> I use the doc build to show patch submitters what their final work looks
> like, and anything more than a few minutes delay makes that useless.



It's been done the current way for quite a few months now. If you're 
only noticing it now is it really such an inconvenience? Having said 
that, I'm not at all opposed to reducing the lag time.


cheers

andrew
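
The "won't schedule concurrent runs" behaviour quoted above (a run exits gracefully if the lock file is held) can be sketched roughly as follows. This is only an illustration in Python on a Unix-like system using an advisory lock. It is not the buildfarm client's actual code, and the names try_lock and run_once are hypothetical.

```python
import fcntl
import os


def try_lock(path):
    """Return an fd holding an exclusive advisory lock on path, or None if busy."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def run_once(path, job):
    """Run job() unless another run holds the lock; return True if job ran."""
    fd = try_lock(path)
    if fd is None:
        return False  # another run is active: exit gracefully
    try:
        job()
        return True
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

With a guard like this, a scheduler can fire every few minutes even when a run takes longer than the interval: the overlapping invocation simply returns without doing anything.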



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, May 10, 2012 at 11:26:14AM -0400, Robert Haas wrote:
>> Honestly, I'm leaning more and more toward the view that we should
>> just rip the names out entirely.

> We will need to make some decision in the next few hours.

I think this is a delicate question and we should *not* make a hasty
decision.  The release notes are almost certainly going to get worked
over quite a bit between now and 9.2 final; there is no need to assume
that the beta1 version has to reflect a final decision.

I'd vote for starting a separate thread to solicit people's opinions
on whether we need names in the release notes.  Is there anybody on
-hackers who would be offended, or would have a harder time persuading
$BOSS to let them spend time on Postgres if they weren't mentioned in
the release notes?  There'd still be a policy of crediting people in
commit messages of course, but it's not clear to me whether the release
note mentions are important to anybody.

regards, tom lane



Re: [HACKERS] "pgstat wait timeout" just got a lot more common on Windows

2012-05-10 Thread Tom Lane
Magnus Hagander  writes:
> On May 10, 2012 4:59 PM, "Tom Lane"  wrote:
>> I spent some time staring at the Windows WaitLatchOrSocket code myself.
>> The only thing I could find that seemed wrong is that in the event
>> array, we list the latch's event before pgwin32_signal_event.  The
>> Microsoft documentation I looked at says that if more than one event
>> is ready, WaitForMultipleObjects reports the first such array member.
>> This means that if the latch is already set when control gets here,
>> signal handlers will not be serviced.

> Yeah, that does seem wrong.

>> That doesn't match what would
>> happen on a Unix machine, so it seems like at least a violation of the
>> POLA.  Hence I think we oughta swap the order of those two array
>> elements.  (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
>> pgwin32_select.)  I do not however

> Maybe we need a loop that checks for all events?

I don't think so.  It's already the case that WaitLatch doesn't
guarantee that all possible flags are set in its result.  In connection
with Peter G's observation that we could simplify the API by rechecking
PostmasterIsAlive for WL_POSTMASTER_DEATH, I was planning to clarify
the API spec as "result bits that are set are guaranteed to reflect
reality, but it's not guaranteed that we set every bit that could
possibly be set".  This should not break any caller since the same
result could occur given a slight change in timing anyway; the caller
has to be prepared to come back and check for more conditions after it
services whatever WaitLatch does report.  However, signal service is
not a condition the caller is supposed to deal with, so I think we
want a guarantee that that happens inside WaitLatch.
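
The caller-side contract described here ("bits that are set reflect reality, but not every possible bit is set") can be sketched as a loop that keeps coming back until nothing is left to service. This is a simplified Python illustration, not PostgreSQL code: the flag names are taken from PostgreSQL, but their values here are illustrative, and fake_wait_latch is a hypothetical stand-in that deliberately reports only one ready condition per call.

```python
WL_LATCH_SET = 1 << 0         # values are illustrative only
WL_SOCKET_READABLE = 1 << 1
WL_POSTMASTER_DEATH = 1 << 2
ALL_FLAGS = WL_LATCH_SET | WL_SOCKET_READABLE | WL_POSTMASTER_DEATH


def fake_wait_latch(pending):
    """Hypothetical wait: report only the lowest pending bit (or 0 if none)."""
    return pending & -pending


def service_all(pending):
    """Service every pending condition, tolerating partial wait results."""
    pending &= ALL_FLAGS
    handled = []
    while pending:
        result = fake_wait_latch(pending)
        # Every reported bit is real, but other ready bits may be missing,
        # so the caller loops instead of assuming the result is complete.
        for flag in (WL_LATCH_SET, WL_SOCKET_READABLE, WL_POSTMASTER_DEATH):
            if result & flag:
                handled.append(flag)
                pending &= ~flag
    return handled
```

A caller written this way behaves the same whether the wait reports one condition or several at once, which is why the relaxed guarantee should not break existing callers.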

>> see a way that that would explain the
>> pgstat failures, because the stats collector's latch really shouldn't
>> ever get set during normal regression test runs.

> So could there be something wrong in the other end, meaning the latch
> *does* get set?

Even if it did, it'd get cleared at the top of the loop, so that the
next call ought to handle things.  Tis a puzzlement.  AFAICS the only
condition WaitForMultipleObjects is going to see in these tests is
read-ready on the socket; surely it wouldn't fail to notice that?

regards, tom lane



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 05:31:15PM +0200, Magnus Hagander wrote:
> > I use the doc build to show patch submitters what their final work looks
> > like, and anything more than a few minutes delay makes that useless.
> >
> 
> Anything that runs off the main git repo would be useless there, since it
> would never show up prior to commit.

I will commit something then send them a URL saying, "Hey, committed,
look here for the results."

> If people want the main docs building more often that's not really a problem
> other than time - we just need to decouple it from the buildfarm and run a
> separate job for it. It's not rocket science..

I do think we need to do that.  The release note publication was stalled
for 90 minutes just in this one case.  The docs are still hard enough to
build that I can imagine others would like to have quick feedback.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-10 Thread Robert Haas
On Thu, May 10, 2012 at 10:28 AM, Kevin Grittner
 wrote:
>> One problem I see with this approach is that its efficiency
>> depends on the average tuple length, at least with a naive
>> approach to random ctid generator. The simplest way to generate
>> those randomly without introducing bias is to generate a random
>> page index between 0 and the relation's size in pages,
>
> Right.
>
>> and then generate random tuple index between 0 and
>> MaxHeapTuplesPerPage, which is 291 on x86-64 assuming the standard
>> page size of 8k.
>
> I think we can do better than that without moving too far from "the
> simplest way".  It seems like we should be able to get a more
> accurate calculation of a minimum base tuple size based on the table
> definition, and calculate the maximum number of those which could
> fit on a page.  On the other hand, ctid uses a line pointer, doesn't
> it?  Do we need to worry about dead line pointers allowing higher
> tuple indexes than the calculated maximum number of tuples per page?

I wonder if you could do this with something akin to the Bitmap Heap
Scan machinery.  Populate a TID bitmap with a bunch of randomly chosen
TIDs, fetch them all in physical order, and if you don't get as many
rows as you need, rinse and repeat until you do.

I'm worried this project is getting so complicated that it will be
beyond the ability of a new hacker to get anything useful done.  Can
we simplify the requirements here to something that is reasonable for
a beginner?
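
As a rough illustration of the "random TID bitmap, fetch in physical order, rinse and repeat" idea, here is a small Python sketch against a simulated heap. It assumes the figure of 291 for MaxHeapTuplesPerPage quoted earlier in the thread, models the heap as a list of pages holding only live rows, and follows Kevin's suggestion of reading everything that was found and then randomly keeping N. The name sample_rows and the batch size are hypothetical, and nothing here reflects real executor code.

```python
import random

MAX_TUPLES_PER_PAGE = 291  # figure quoted in the thread for 8k pages on x86-64


def sample_rows(heap, n, rng, batch_size=None):
    """Pick n rows uniformly from heap, a list of pages (lists of live rows)."""
    total = sum(len(page) for page in heap)
    if n > total:
        raise ValueError("not enough rows to sample")
    batch_size = batch_size or max(4 * n, 1)
    hits = {}  # TID -> row for every probe that found a live tuple
    while len(hits) < n:
        # The "bitmap": a set of random (page, offset) TIDs, many of which miss.
        bitmap = {(rng.randrange(len(heap)), rng.randrange(MAX_TUPLES_PER_PAGE))
                  for _ in range(batch_size)}
        for page, offset in sorted(bitmap):  # visit pages in physical order
            if offset < len(heap[page]):
                hits[(page, offset)] = heap[page][offset]
    # Too many rows found: keep all of them, then randomly retain exactly n.
    chosen = rng.sample(sorted(hits), n)
    return [hits[tid] for tid in chosen]
```

The fraction of wasted probes grows as tuples get wider (fewer live offsets per page), which is the efficiency concern raised at the top of this message.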

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 11:26:14AM -0400, Robert Haas wrote:
> There are some cases, like index-only scans, where I think it would be
> very hard to get down to one name, because four different people wrote
> code that ended up being part of that.  Now you could probably get it
> down to just two by cutting Heikki (who isn't listed) and Ibrar (who
> is) but saying that only one of Tom and I did that feature would be
> quite misleading regardless of who you picked.  Similarly, there are a
> couple of patches that I worked on with Simon where crediting only one
> of us would be wrong, regardless of which one you picked, and I think
> there are other cases of this involving other people as well.  So I
> think a hard and fast rule of crediting exactly one person is not
> going to work, but limiting it to the primary author or authors is
> feasible.
> 
> Honestly, I'm leaning more and more toward the view that we should
> just rip the names out entirely.  I mean, look at something like
> sortsupport.  That would never have gotten done without Peter
> Geoghegan's work on it, but the code *as committed* was half mine and
> half Tom's.  So what are you going to do with that?  It's weird to
> credit Peter and not Tom or I, and it's weird to credit Tom or I and
> not Peter, and it's even weird if you credit all three of us because
> any decision about who to put first is arguable and maybe wrong.  The
> simplest solution to my mind is to credit no one, which at least has
> the advantage of being unarguably uniform.

Keep in mind that the reason I originally had names in the release notes
was so I could remember who to email when something broke.  That really
isn't the case anymore.

I agree that making these names give _credit_ is never going to work. 
Robert's example above is very clear on that.

We could try cutting it down to one name and see if we have any problems
with it.  Robert is right that if you are thinking of this as "credit"
it is never going to work.

We will need to make some decision in the next few hours.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Magnus Hagander
On May 10, 2012 5:24 PM, "Bruce Momjian"  wrote:
>
> On Thu, May 10, 2012 at 12:49:51PM +0200, Magnus Hagander wrote:
> > On Thu, May 10, 2012 at 12:43 PM, Andrew Dunstan  wrote:
> > >
> > >
> > > On 05/10/2012 01:29 AM, Tom Lane wrote:
> > >>
> > >> Bruce Momjian  writes:
> > >>>
> > >>> The docs finally built 90 minutes after my commit, and the URL above is
> > >>> now working.  (Does it always take this long to update?)
> > >>
> > >> I believe the new implementation of that stuff is that the devel docs
> > >> are built whenever the buildfarm member guaibasaurus runs for HEAD,
> > >> which it seems to do on an hourly schedule.  This is definitely not as
> > >> fast-responding as Peter's former custom script, but I'm not sure if
> > >> it's worth thinking of another way.
> > >>
> > >
> > > I don't see any reason it can't run more frequently, though. Currently a run
> > > takes 15 minutes or so. We could reduce that by making it skip some steps,
> > > and get it down to about 10 minutes. It would be perfectly reasonable to run
> > > every 5 minutes (it won't schedule concurrent runs - if the lock file is
> > > held by another run it exits gracefully). Of course, that's up to Magnus and
> > > Stefan.
> >
> > If we can make it do *just* the docs, we can certainly run it a bit
> > more often. But we don't want to make it run the full set of checks
> > more or less continuously, since the machine is shared with a number of
> > other tasks...
> >
> > I don't think 5 minutes is anywhere near necessary even for the docs,
> > but there is a lot of room between 5 minutes and 4 hours, so we can
> > definitely shorten it.
>
> Do you want me to just setup a build on my machine like we did before;
> 5 minutes is no problem for me.
>
> I use the doc build to show patch submitters what their final work looks
> like, and anything more than a few minutes delay makes that useless.
>

Anything that runs off the main git repo would be useless there, since it
would never show up prior to commit.

If people want the main docs building more often that's not really a
problem other than time - we just need to decouple it from the buildfarm
and run a separate job for it. It's not rocket science..

/Magnus


Re: [HACKERS] "pgstat wait timeout" just got a lot more common on Windows

2012-05-10 Thread Magnus Hagander
On May 10, 2012 4:59 PM, "Tom Lane"  wrote:
>
> I wrote:
> > Last night I changed the stats collector process to use
> > WaitLatchOrSocket instead of a periodic forced wakeup to see whether
> > the postmaster has died.  This morning I observe that several Windows
> > buildfarm members are showing regression test failures caused by
> > unexpected "pgstat wait timeout" warnings.  Everybody else is fine.
>
> > This suggests that there is something broken in the Windows
> > implementation of WaitLatchOrSocket.  I wonder whether it also
> > tells us something we did not know about the underlying cause of
> > those messages.  Not sure what though.  Ideas?  Can anyone who
> > knows Windows take another look at WaitLatchOrSocket?
>
> Anybody have any clues about that?  If not, I think I'll have to revert
> the pgstat changes for beta1, which isn't really forward progress.

Haven't had time to look at the code itself, and won't before wrap time.
Sorry.

> I spent some time staring at the Windows WaitLatchOrSocket code myself.
> The only thing I could find that seemed wrong is that in the event
> array, we list the latch's event before pgwin32_signal_event.  The
> Microsoft documentation I looked at says that if more than one event
> is ready, WaitForMultipleObjects reports the first such array member.
> This means that if the latch is already set when control gets here,
> signal handlers will not be serviced.

Yeah, that does seem wrong.

>  That doesn't match what would
> happen on a Unix machine, so it seems like at least a violation of the
> POLA.  Hence I think we oughta swap the order of those two array
> elements.  (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
> pgwin32_select.)  I do not however

Maybe we need a loop that checks for all events?

> see a way that that would explain the
> pgstat failures, because the stats collector's latch really shouldn't
> ever get set during normal regression test runs.

So could there be something wrong in the other end, meaning the latch
*does* get set?

/Magnus


Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Robert Haas
On Thu, May 10, 2012 at 11:16 AM, Bruce Momjian  wrote:
>> Yes.  This seems to be a policy change that was made without notice or
>> discussion, and I personally don't find it to be a good idea.  I think
>> the release notes should only credit the primary author(s) of a feature.
>> Face it, most people don't care about that, so we should not be
>> expending much space on it.
>
> Agreed on just using the primary author.  The first name is _always_ the
> primary author, so we can just go with that.  I didn't want to do:
>
>        (Tom Lane, Robert Haas;  reviewers Bruce Momjian, Jeff Davis)
>
> That was too complicated.
>
> Should I make the change now?  It is easy.  Should we remove the names
> completely?  We can consider going to a single name as a move toward
> removing names eventually.

There are some cases, like index-only scans, where I think it would be
very hard to get down to one name, because four different people wrote
code that ended up being part of that.  Now you could probably get it
down to just two by cutting Heikki (who isn't listed) and Ibrar (who
is) but saying that only one of Tom and I did that feature would be
quite misleading regardless of who you picked.  Similarly, there are a
couple of patches that I worked on with Simon where crediting only one
of us would be wrong, regardless of which one you picked, and I think
there are other cases of this involving other people as well.  So I
think a hard and fast rule of crediting exactly one person is not
going to work, but limiting it to the primary author or authors is
feasible.

Honestly, I'm leaning more and more toward the view that we should
just rip the names out entirely.  I mean, look at something like
sortsupport.  That would never have gotten done without Peter
Geoghegan's work on it, but the code *as committed* was half mine and
half Tom's.  So what are you going to do with that?  It's weird to
credit Peter and not Tom or I, and it's weird to credit Tom or I and
not Peter, and it's even weird if you credit all three of us because
any decision about who to put first is arguable and maybe wrong.  The
simplest solution to my mind is to credit no one, which at least has
the advantage of being unarguably uniform.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 12:49:51PM +0200, Magnus Hagander wrote:
> On Thu, May 10, 2012 at 12:43 PM, Andrew Dunstan  wrote:
> >
> >
> > On 05/10/2012 01:29 AM, Tom Lane wrote:
> >>
> >> Bruce Momjian  writes:
> >>>
> >>> The docs finally built 90 minutes after my commit, and the URL above is
> >>> now working.  (Does it always take this long to update?)
> >>
> >> I believe the new implementation of that stuff is that the devel docs
> >> are built whenever the buildfarm member guaibasaurus runs for HEAD,
> >> which it seems to do on an hourly schedule.  This is definitely not as
> >> fast-responding as Peter's former custom script, but I'm not sure if
> >> it's worth thinking of another way.
> >>
> >
> > I don't see any reason it can't run more frequently, though. Currently a run
> > takes 15 minutes or so. We could reduce that by making it skip some steps,
> > and get it down to about 10 minutes. It would be perfectly reasonable to run
> > every 5 minutes (it won't schedule concurrent runs - if the lock file is
> > held by another run it exits gracefully). Of course, that's up to Magnus and
> > Stefan.
> 
> If we can make it do *just* the docs, we can certainly run it a bit
> more often. But we don't want to make it run the full set of checks
> more or less continuously, since the machine is shared with a number of
> other tasks...
> 
> I don't think 5 minutes is anywhere near necessary even for the docs,
> but there is a lot of room between 5 minutes and 4 hours, so we can
> definitely shorten it.

Do you want me to just setup a build on my machine like we did before; 
5 minutes is no problem for me.

I use the doc build to show patch submitters what their final work looks
like, and anything more than a few minutes delay makes that useless.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Tom Lane
Bruce Momjian  writes:
> Should I make the change now?  It is easy.

Yes.

> Should we remove the names completely?

That would be a policy change too, and one that probably requires more
leisurely consideration than we have time for today.

regards, tom lane



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 07:20:51AM +0200, Erik Rijkers wrote:
> On Thu, May 10, 2012 06:33, Bruce Momjian wrote:
> > On Wed, May 09, 2012 at 11:11:02PM -0400, Bruce Momjian wrote:
> >>
> >>http://www.postgresql.org/docs/devel/static/release-9-2.html
> >
> 
> To "E.1.2.5. Monitoring" should be added:
> 
> "Rename pg_stat_activity.current_query to query (Magnus Hagander)"
> 
> 
> 
> And perhaps (same paragraph):
> 
> "The previous query values are preserved, allowing for enhanced analysis."
> 
> would be clearer as:
> 
> "The last query values are preserved, allowing for enhanced analysis."

Thanks for the feedback.  Adjustments made with the attached patch.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/release-9.2.sgml b/doc/src/sgml/release-9.2.sgml
new file mode 100644
index f7d185b..9bf08d6
*** a/doc/src/sgml/release-9.2.sgml
--- b/doc/src/sgml/release-9.2.sgml
***
*** 335,341 
 
  Rename pg_stat_activity.procpid
! to pid, to match other system tables (Magnus Hagander)
 


--- 335,341 
 
  Rename pg_stat_activity.procpid
! to pid, to match other system tables (Magnus Hagander)
 


***
*** 347,354 
 
 
 
! The previous query values are preserved, allowing for enhanced
! analysis.
 


--- 347,361 
 
 
 
! The last query values are preserved, allowing for enhanced analysis.
!
!   
!   
!   
!
! Rename pg_stat_activity.current_query to
! query because it is not cleared when the query
! completes (Magnus Hagander)
 





Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 11:04:47AM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > When we did the 9.1 release notes, reviewers weren't credited, and I
> > sort of assumed that policy would be the same this time around.
> 
> Yes.  This seems to be a policy change that was made without notice or
> discussion, and I personally don't find it to be a good idea.  I think
> the release notes should only credit the primary author(s) of a feature.
> Face it, most people don't care about that, so we should not be
> expending much space on it.

Agreed on just using the primary author.  The first name is _always_ the
primary author, so we can just go with that.  I didn't want to do:

(Tom Lane, Robert Haas;  reviewers Bruce Momjian, Jeff Davis)

That was too complicated.

Should I make the change now?  It is easy.  Should we remove the names
completely?  We can consider going to a single name as a move toward
removing names eventually.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] checkpointer code behaving strangely on postmaster -T

2012-05-10 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Tom Lane's message of jue may 10 02:27:32 -0400 2012:
>> Alvaro Herrera  writes:
> I noticed while doing some tests that the checkpointer process does not
> recover very nicely after a backend crashes under postmaster -T

> It seems to me that the bug is in the postmaster state machine rather
> than checkpointer itself.  After a few false starts, this seems to fix
> it:

> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -2136,6 +2136,8 @@ pmdie(SIGNAL_ARGS)
> signal_child(WalWriterPID, SIGTERM);
> if (BgWriterPID != 0)
> signal_child(BgWriterPID, SIGTERM);
> +   if (FatalError && CheckpointerPID != 0)
> +   signal_child(CheckpointerPID, SIGUSR2);

Surely we do not want the checkpointer doing a shutdown checkpoint here.
If we need it to die immediately, SIGQUIT is the way.  If we want a
shutdown checkpoint, that has to wait till after everything else is
known dead.  So while I agree this may be a state machine bug, that
doesn't look like a good fix.

regards, tom lane



Re: [HACKERS] Can pg_trgm handle non-alphanumeric characters?

2012-05-10 Thread MauMau

From: "Kevin Grittner" 
> "MauMau"  wrote:
>> For information, what kind of breakage would occur?
>>
>> I imagined removing KEEPONLYALNUM would just accept
>> non-alphanumeric characters and cause no harm to those who use
>> only alphanumeric characters.
>
> This would break our current usages because of the handling of
> trigrams at the "edges" of groups of qualifying characters.  It
> would make similarity (and distance) values less useful for our
> current name searches using it.  To simulate the effect, I used an
> '8' in place of a comma instead of recompiling with the suggested
> change.
>
> test=# select show_trgm('smith,john');
>                          show_trgm
> -----------------------------------------------------------
>  {"  j","  s"," jo"," sm","hn ",ith,joh,mit,ohn,smi,"th "}
> (1 row)
>
> test=# select show_trgm('smith8john');
>                       show_trgm
> -----------------------------------------------------
>  {"  s"," sm",8jo,h8j,"hn ",ith,joh,mit,ohn,smi,th8}
> (1 row)
>
> test=# select similarity('smith,john', 'jon smith');
>  similarity
> ------------
>    0.615385
> (1 row)
>
> test=# select similarity('smith8john', 'jon smith');
>  similarity
> ------------
>      0.3125
> (1 row)
>
> So making the proposed change unconditionally could indeed hurt
> current users of the technique.  On the other hand, if there was
> fine-grained control of this, it might make trigrams useful for
> searching statute cites (using all characters) as well as names
> (using the current character set); so I wouldn't want it to just be
> controlled by a global GUC.


Thanks for your explanation. Although I haven't understood it well yet, I'll 
consider what you taught. And I'll consider if the tentative measure of 
removing KEEPONLYALNUM is correct for someone who wants to use pg_trgm 
against Japanese text.


Regards
MauMau
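
For readers following the numbers in Kevin's example, the edge effect can be reproduced outside the database with a small Python sketch. It approximates pg_trgm's default behaviour under the assumption that words are runs of ASCII letters and digits, each padded with two leading spaces and one trailing space, and that similarity is the number of shared trigrams divided by the number of distinct trigrams in either string. The real module's notion of "alphanumeric" is locale-dependent, so treat this as an approximation; the function names are hypothetical.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction for ASCII alphanumeric words."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

With the comma acting as a word break, 'smith,john' and 'jon smith' share 8 of 13 distinct trigrams (8/13, about 0.615385). With '8' standing in for a non-breaking character they share only 5 of 16 (0.3125), because the edge trigrams "th ", "  j" and " jo" are replaced by th8, h8j and 8jo.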




Re: [HACKERS] checkpointer code behaving strangely on postmaster -T

2012-05-10 Thread Alvaro Herrera

Excerpts from Tom Lane's message of jue may 10 02:27:32 -0400 2012:
> Alvaro Herrera  writes:
> > I noticed while doing some tests that the checkpointer process does not
> > recover very nicely after a backend crashes under postmaster -T (after
> > all processes have been kill -CONTd, of course, and postmaster told to
> > shutdown via Ctrl-C on its console).  For some reason it seems to get
> > stuck on a loop doing sleep(0.5s).  In another case I caught it trying to
> > do a checkpoint, but it was progressing a single page each time and then
> > sleeping.  In that condition, the checkpoint took a very long time to
> > finish.
> 
> Is this still a problem as of HEAD?  I think I've fixed some issues in
> the checkpointer's outer loop logic, but not sure if what you saw is
> still there.

Yep, it's still there as far as I can tell.  A backtrace from the
checkpointer shows it's waiting on the latch.

It seems to me that the bug is in the postmaster state machine rather
than checkpointer itself.  After a few false starts, this seems to fix
it:

--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2136,6 +2136,8 @@ pmdie(SIGNAL_ARGS)
signal_child(WalWriterPID, SIGTERM);
if (BgWriterPID != 0)
signal_child(BgWriterPID, SIGTERM);
+   if (FatalError && CheckpointerPID != 0)
+   signal_child(CheckpointerPID, SIGUSR2);
 
/*
 * If we're in recovery, we can't kill the startup process
@@ -2178,6 +2180,8 @@ pmdie(SIGNAL_ARGS)
signal_child(WalReceiverPID, SIGTERM);
if (BgWriterPID != 0)
signal_child(BgWriterPID, SIGTERM);
+   if (FatalError && CheckpointerPID != 0)
+   signal_child(CheckpointerPID, SIGUSR2);
if (pmState == PM_RECOVERY)
{
/* only checkpointer is active in this state */


Note that since checkpointer can only be running after we enter
FatalError when the -T (send SIGSTOP instead of SIGQUIT) switch is used,
this bug doesn't seem to affect normal usage.  (I'm not sure SIGUSR2 is
the most appropriate signal to send at this time -- since we're in
FatalError, probably SIGQUIT is better suited.)

One good thing is that when I patched postmaster in a different way
(which I later realized to be bogus), I caused it to die with an
assertion while checkpointer was still running; the debug output let me
know that checkpointer went away immediately.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Tom Lane
Robert Haas  writes:
> When we did the 9.1 release notes, reviewers weren't credited, and I
> sort of assumed that policy would be the same this time around.

Yes.  This seems to be a policy change that was made without notice or
discussion, and I personally don't find it to be a good idea.  I think
the release notes should only credit the primary author(s) of a feature.
Face it, most people don't care about that, so we should not be
expending much space on it.

regards, tom lane



Re: [HACKERS] "pgstat wait timeout" just got a lot more common on Windows

2012-05-10 Thread Tom Lane
I wrote:
> Last night I changed the stats collector process to use
> WaitLatchOrSocket instead of a periodic forced wakeup to see whether
> the postmaster has died.  This morning I observe that several Windows
> buildfarm members are showing regression test failures caused by
> unexpected "pgstat wait timeout" warnings.  Everybody else is fine.

> This suggests that there is something broken in the Windows
> implementation of WaitLatchOrSocket.  I wonder whether it also
> tells us something we did not know about the underlying cause of
> those messages.  Not sure what though.  Ideas?  Can anyone who
> knows Windows take another look at WaitLatchOrSocket?

Anybody have any clues about that?  If not, I think I'll have to revert
the pgstat changes for beta1, which isn't really forward progress.

I spent some time staring at the Windows WaitLatchOrSocket code myself.
The only thing I could find that seemed wrong is that in the event
array, we list the latch's event before pgwin32_signal_event.  The
Microsoft documentation I looked at says that if more than one event
is ready, WaitForMultipleObjects reports the first such array member.
This means that if the latch is already set when control gets here,
signal handlers will not be serviced.  That doesn't match what would
happen on a Unix machine, so it seems like at least a violation of the
POLA.  Hence I think we oughta swap the order of those two array
elements.  (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
pgwin32_select.)  I do not however see a way that that would explain the
pgstat failures, because the stats collector's latch really shouldn't
ever get set during normal regression test runs.
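
The ordering concern can be illustrated with a toy model in Python. It assumes only the rule cited above, namely that when several events are ready the wait reports the first such array member. The function names and the event labels are hypothetical, and this does not call any Windows API.

```python
def wait_for_multiple_objects(events):
    """Toy model: return the lowest index whose event is signaled, else None."""
    for index, (_name, signaled) in enumerate(events):
        if signaled:
            return index
    return None


def reported_event(order, ready):
    """Name of the event a wait would report for a given array order."""
    events = [(name, name in ready) for name in order]
    index = wait_for_multiple_objects(events)
    return None if index is None else order[index]


ready = {"latch", "signal"}
latch_first = reported_event(["latch", "signal", "socket"], ready)
signal_first = reported_event(["signal", "latch", "socket"], ready)
```

Under this model latch_first is "latch" and signal_first is "signal": with the latch listed first, an already-set latch hides the pending signal event on every call, whereas swapping the two array elements lets signal handlers be serviced first.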

regards, tom lane



Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog

2012-05-10 Thread Magnus Hagander
On Thu, May 10, 2012 at 4:43 PM, Magnus Hagander  wrote:
> On Thu, May 10, 2012 at 3:04 PM, Magnus Hagander  wrote:
>> Argh. This thread appears to have been forgotten - sorry about that.
>>
>> Given that we're talking about a potential protocol change, we really
>> should resolve this before we wrap beta, no?
>
> Had a chat with Heikki about this, and we came to the conclusion that
> we don't actually have to fix it before beta. Because pg_basebackup is
> going to have to consider 9.1 servers anyway, and we can just treat
> 9.2beta1 as being a 9.1 from this perspective.
>
> We still have to fix it, but it's not as urgent :-)
>
> FWIW, the main plan we're onto is still to add the GUCs on new
> connections to walsender, so we have something to work with...

And taking this a step further - we *already* send these GUCs.
Previous references to us not doing that were incorrect :-)

So this should be a much easier fix than we thought. And can be done
entirely in pg_basebackup, meaning we don't need to worry about
beta...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Bruce Momjian
On Thu, May 10, 2012 at 09:20:32AM -0400, Alvaro Herrera wrote:
> 
> Excerpts from Peter Geoghegan's message of jue may 10 09:12:57 -0400 2012:
> > On 10 May 2012 13:45, Andrew Dunstan  wrote:
> > > Right, but I think it would be good to identify them explicitly as 
> > > reviewers
> > > if we're going to include the names.
> > 
> > +1. I think we should probably do more to credit reviewers. It's not
> > uncommon for a reviewer to end up becoming a co-author, particularly
> > if they're a committer, but it's a little misleading to add a reviewer
> > after the feature description without qualifying that they are the
> > reviewer.
> 
> Agreed.
> 
> What about crediting patch sponsors (other than the author's employer, I
> mean)?  I remember crediting one in a commit message and being told it
> wasn't okay.  Is it okay to credit them in the release notes?

No.  We discussed crediting companies in the release notes, and that was
agreed to be a bad idea, I think because the release notes live for so
long, and because the release notes would end up being an advertisement.
Can you imagine all our employers saying we should get their name into
the release notes more?

The big take-away is that the release notes are mostly for blame and to
designate a go-to person for feature problems, not for giving credit,
and especially not for company credit.  There are just too many people
reading those release notes for that to happen, and if we are going to
go any direction, it would be to remove user names completely from the
release notes.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog

2012-05-10 Thread Magnus Hagander
On Thu, May 10, 2012 at 3:04 PM, Magnus Hagander  wrote:
> Argh. This thread appears to have been forgotten - sorry about that.
>
> Given that we're talking about a potential protocol change, we really
> should resolve this before we wrap beta, no?

Had a chat with Heikki about this, and we came to the conclusion that
we don't actually have to fix it before beta. Because pg_basebackup is
going to have to consider 9.1 servers anyway, and we can just treat
9.2beta1 as being a 9.1 from this perspective.

We still have to fix it, but it's not as urgent :-)

FWIW, the main plan we're onto is still to add the GUCs on new
connections to walsender, so we have something to work with...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Can pg_trgm handle non-alphanumeric characters?

2012-05-10 Thread Kevin Grittner
"MauMau"  wrote:
 
>>> On 09-05-2012 19:17, MauMau wrote:
>>>> Then, does it make sense to remove "#define KEEPONLYALNUM" in
>>>> 9.1.4? Would it cause any problems?
>>>
>>> Yes, it will cause problems.
 
> For information, what kind of breakage would occur?
 
> I imagined removing KEEPONLYALNUM would just accept
> non-alphanumeric characters and cause no harm to those who use
> only alphanumeric characters.
 
This would break our current usages because of the handling of
trigrams at the "edges" of groups of qualifying characters.  It
would make similarity (and distance) values less useful for our
current name searches using it.  To simulate the effect, I used an
'8' in place of a comma instead of recompiling with the suggested
change.

test=# select show_trgm('smith,john');
                         show_trgm
-----------------------------------------------------------
 {"  j","  s"," jo"," sm","hn ",ith,joh,mit,ohn,smi,"th "}
(1 row)

test=# select show_trgm('smith8john');
                      show_trgm
-----------------------------------------------------
 {"  s"," sm",8jo,h8j,"hn ",ith,joh,mit,ohn,smi,th8}
(1 row)

test=# select similarity('smith,john', 'jon smith');
 similarity
------------
   0.615385
(1 row)

test=# select similarity('smith8john', 'jon smith');
 similarity
------------
     0.3125
(1 row)
 
So making the proposed change unconditionally could indeed hurt
current users of the technique.  On the other hand, if there was
fine-grained control of this, it might make trigrams useful for
searching statute cites (using all characters) as well as names
(using the current character set); so I wouldn't want it to just be
controlled by a global GUC.
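
For anyone who wants to check the numbers above without a server handy, here
is a rough Python sketch of the trigram arithmetic. It is not pg_trgm's code;
it assumes only what the show_trgm output suggests: text is lowercased, split
into runs of alphanumeric characters, each run is padded with two leading
blanks and one trailing blank, and similarity is shared trigrams divided by
the size of the union. The function names are made up for the example.

```python
import re


def trigrams(text):
    """Set of trigrams, extracted per alphanumeric word after blank padding."""
    found = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for start in range(len(padded) - 2):
            found.add(padded[start:start + 3])
    return found


def similarity(left, right):
    """Shared trigrams divided by all distinct trigrams in either string."""
    left_set, right_set = trigrams(left), trigrams(right)
    union = left_set | right_set
    if not union:
        return 0.0
    return len(left_set & right_set) / len(union)


with_comma = similarity("smith,john", "jon smith")   # 8 shared of 13
with_digit = similarity("smith8john", "jon smith")   # 5 shared of 16
```

The comma splits the name into two words, so both get their own edge
trigrams; the '8' glues them into one word and those edge trigrams vanish,
which is where the drop from 8/13 to 5/16 comes from.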
 
-Kevin



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Robert Haas
On Thu, May 10, 2012 at 9:12 AM, Peter Geoghegan  wrote:
> On 10 May 2012 13:45, Andrew Dunstan  wrote:
>> Right, but I think it would be good to identify them explicitly as reviewers
>> if we're going to include the names.
>
> +1. I think we should probably do more to credit reviewers. It's not
> uncommon for a reviewer to end up becoming a co-author, particularly
> if they're a committer, but it's a little misleading to add a reviewer
> after the feature description without qualifying that they are the
> reviewer.

Right.  Plus Bruce has arbitrarily excluded committer-reviewers even
when they substantially revised the patch as part of that review, and
included non-committer-reviewers even when they did little more than
say "good idea, +1".  There are patches on that list where I did A LOT
of work and am not credited, including some where other people did get
credited for much less work.  I don't feel a crying need to be
credited on the maximum possible number of items, but it seems weird
to see one group of people credited for what may well have been an
hour's work while another group of people isn't credited even when
they did two or three days worth of work.

When we did the 9.1 release notes, reviewers weren't credited, and I
sort of assumed that policy would be the same this time around.  I
also sort of assumed that the committer would be credited if the
commit message stated that they had done substantial further work on
the patch, but not if it said that they'd only done a little bit of
work or none.  Honestly, I don't really care what the standard for
inclusion is, but it's so glaringly non-uniform right now that it
really makes no sense.

I think my own personal preference would be to remove all the reviewer
names from individual items and list only the people who contributed
significantly to the code, and then have a section at the bottom where
we credit all the reviewers without reference to specific patches.  Or
maybe we should just remove all the names from the release notes, full
stop, since it's pretty clear that we're on the verge of having the
names take up more space than the items to which they refer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-05-10 Thread Kevin Grittner
Florian Pflug  wrote:
 
> One problem I see with this approach is that its efficiency
> depends on the average tuple length, at least with a naive
> approach to random ctid generation. The simplest way to generate
> those randomly without introducing bias is to generate a random
> page index between 0 and the relation's size in pages,
 
Right.
 
> and then generate random tuple index between 0 and
> MaxHeapTuplesPerPage, which is 291 on x86-64 assuming the standard
> page size of 8k.
 
I think we can do better than that without moving too far from "the
simplest way".  It seems like we should be able to get a more
accurate calculation of a minimum base tuple size based on the table
definition, and calculate the maximum number of those which could
fit on a page.  On the other hand, ctid uses a line pointer, doesn't
it?  Do we need to worry about dead line pointers allowing higher
tuple indexes than the calculated maximum number of tuples per page?
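
As a rough illustration of the naive approach being discussed, here is a
Python sketch of rejection sampling over random (page, offset) pairs. Nothing
here is PostgreSQL code: is_live is a hypothetical callback standing in for
"fetch the page and check the line pointer", offsets are taken as 1-based,
and max_probes is an assumed safety limit so the loop cannot spin forever.

```python
import random


def sample_live_ctids(n_pages, max_tuples_per_page, is_live, n_wanted, rng,
                      max_probes=100000):
    """Draw uniformly random ctids, keep distinct live ones, count the probes."""
    chosen = set()
    probes = 0
    while len(chosen) < n_wanted:
        if probes >= max_probes:
            raise RuntimeError("gave up before finding enough live tuples")
        ctid = (rng.randrange(n_pages), rng.randrange(1, max_tuples_per_page + 1))
        probes += 1
        # Each probe would cost a page read before liveness is known.
        if ctid not in chosen and is_live(ctid):
            chosen.add(ctid)
    # Return in physical order, as a bitmap-style fetch would visit them.
    return sorted(chosen), probes


# Toy table: 10 pages, offsets 1..50 live on every page, upper bound 291.
toy_is_live = lambda ctid: ctid[1] <= 50
ctids, probes = sample_live_ctids(10, 291, toy_is_live, 20, random.Random(42))
```

A tighter per-table bound on the offset range would shrink the number of
wasted probes; dead line pointers beyond that bound are exactly the case the
question above is about.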
 
> The current toasting threshold (TOAST_TUPLE_THRESHOLD) is
> approximately 2k, so having tables with an average heap tuple size
> of a few hundred bytes doesn't seem unlikely. Now, assume the
> average tuple length is 128 bytes, i.e. on average you'll have ~
> 8k/128 = 64 live tuples / page if the fill factor is 100% and all
> tuples are live. To account for lower fill factors and dead
> tuples, let's thus say there are 50 live tuples / page. Then, on 
> average, only every 6th randomly generated ctid will point to a
> live tuple. But whether or not it does can only be decided after
> reading the page from disk, so you end up with a rate of 6
> random-access reads per returned tuple.
> 
> IIRC, the cutoff point where an index scan loses compared to a
> sequential scan is somewhere around 10% of the table read, i.e. if
> a predicate selects more than 10% of the available rows, a
> sequential scan is more efficient than an index scan.
 
That ratio *might* be better for a ctid scan, since you don't have
the index access in the mix.
 
> Scaling that with the 1/6-th success rate from above means that
> Kevin's approach would only beat a sequential scan if the sampling
> percentage isn't much larger than 1%, assuming an average row size
> of 128 bytes.
> 
> The algorithm still seems like a good choice for very small
> sampling percentages, though.
 
Yeah, even with a maximum tuple count calculated by page, there are
certainly going to be cases where another approach will be faster,
especially where the sample is a relatively high percentage of the
table.  It would be good to have multiple plans compete on costs, if
possible.  It would not surprise me at all if the typical break-even
point between the two techniques was somewhere on the order of a 1%
sample size, but one would hope we could get there on the basis of
estimated costs rather than using arbitrary rules or forcing the
user to guess and code a choice explicitly.
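
The back-of-the-envelope numbers in the quoted text can be reproduced with a
few lines of Python. The 50 live tuples per page and the 10% index-scan
cutoff come from Florian's message; the header sizes used to arrive at 291
(24-byte page header, 24-byte aligned tuple header, 4-byte line pointer) are
my assumption about how that figure is derived for 8k pages.

```python
BLOCK_SIZE = 8192
PAGE_HEADER_BYTES = 24      # assumed page header size
TUPLE_HEADER_BYTES = 24     # assumed aligned heap tuple header size
LINE_POINTER_BYTES = 4      # assumed line pointer size

max_tuples_per_page = (BLOCK_SIZE - PAGE_HEADER_BYTES) // (
    TUPLE_HEADER_BYTES + LINE_POINTER_BYTES)


def break_even(live_tuples_per_page, index_scan_cutoff=0.10):
    """Return (probes per returned tuple, sampling fraction where seqscan wins)."""
    hit_rate = live_tuples_per_page / max_tuples_per_page
    return 1.0 / hit_rate, index_scan_cutoff * hit_rate


probes_per_tuple, sampling_cutoff = break_even(50)
```

With these inputs the hit rate is roughly one in six and the cutoff lands a
little under 2%, which matches the "not much larger than 1%" estimate. A real
planner would of course use cost estimates rather than fixed constants.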
 
-Kevin



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Alvaro Herrera

Excerpts from Peter Geoghegan's message of jue may 10 09:12:57 -0400 2012:
> On 10 May 2012 13:45, Andrew Dunstan  wrote:
> > Right, but I think it would be good to identify them explicitly as reviewers
> > if we're going to include the names.
> 
> +1. I think we should probably do more to credit reviewers. It's not
> uncommon for a reviewer to end up becoming a co-author, particularly
> if they're a committer, but it's a little misleading to add a reviewer
> after the feature description without qualifying that they are the
> reviewer.

Agreed.

What about crediting patch sponsors (other than the author's employer, I
mean)?  I remember crediting one in a commit message and being told it
wasn't okay.  Is it okay to credit them in the release notes?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Peter Geoghegan
On 10 May 2012 13:45, Andrew Dunstan  wrote:
> Right, but I think it would be good to identify them explicitly as reviewers
> if we're going to include the names.

+1. I think we should probably do more to credit reviewers. It's not
uncommon for a reviewer to end up becoming a co-author, particularly
if they're a committer, but it's a little misleading to add a reviewer
after the feature description without qualifying that they are the
reviewer.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Alvaro Herrera

Excerpts from Andrew Dunstan's message of jue may 10 07:19:53 -0400 2012:

> BTW, if there has been no change a buildfarm animal normally does no 
> work (other than a git pull followed by the check for updates), which is 
> why it's often safe to schedule it very frequently. However, if you need 
> to schedule tasks at times when it's known not to be running then a 
> sparse schedule makes sense.

Magnus was trying to say that the physical machine has other VMs doing
unrelated stuff, not that the BF animal VM itself had other things to
do.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog

2012-05-10 Thread Magnus Hagander
Argh. This thread appears to have been forgotten - sorry about that.

Given that we're talking about a potential protocol change, we really
should resolve this before we wrap beta, no?


On Thu, Mar 29, 2012 at 6:43 AM, Fujii Masao  wrote:
> On Tue, Feb 28, 2012 at 6:08 PM, Fujii Masao  wrote:
>> On Wed, Feb 8, 2012 at 1:33 AM, Magnus Hagander  wrote:
>>> Will it break using pg_basebackup 9.2 on a 9.1 server, though? that
>>> would also be very useful in the scenario of the central server...
>>
>> No, unless I'm missing something. Because pg_basebackup doesn't use
>> any message which is defined in walprotocol.h if "-x stream" option is
>> not specified.
>
> No, this is not right at all :( Changing TimestampTz fields in 9.2 would break
> that use case.
>
> If we support that use case, pg_basebackup 9.2 must know whether an integer
> or a double is used for TimestampTz in the 9.1 server. Otherwise pg_basebackup
> cannot process a WAL data message properly. But unfortunately there is no
> way for pg_basebackup 9.2 to know that... 9.1 has no API to report the actual
> datatype of its TimestampTz field.
>
> One idea to support that use case is to add a new command-line option to
> pg_basebackup, which specifies the datatype of the TimestampTz field. You can
> then use one pg_basebackup 9.2 executable on a 9.1 server whether
> --disable-integer-datetimes is specified or not. But I'm not really sure if
> it's worth doing this, because ISTM that it's rare to build a server and a
> client with different choices of TimestampTz datatype.

I think that's survivable - but what will things look like if they do
mismatch? Can we detect "abnormal values" somewhere and at least abort
in a controlled fashion saying "sorry, wrong build flags"?
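
One possible shape for such a sanity check, sketched in Python rather than in
pg_basebackup's C: decode the same 8 bytes both ways and see which reading
lands near the current time. This assumes integer timestamps are microseconds
and float timestamps are seconds, both counted from 2000-01-01; the function
name, the ten-year window and the byte_order parameter are all made up for
the example, and the byte order on the wire is not something I have verified.

```python
import struct
from datetime import datetime, timezone

PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def guess_timestamp_encoding(raw8, now, byte_order="<", window_days=3650):
    """Return "integer", "float" or "unknown" for an 8-byte timestamp field."""
    now_secs = (now - PG_EPOCH).total_seconds()
    limit = window_days * 86400

    def plausible(secs):
        # NaN and infinity fail this comparison, so they count as implausible.
        return abs(secs - now_secs) <= limit

    as_int_secs = struct.unpack(byte_order + "q", raw8)[0] / 1_000_000
    as_float_secs = struct.unpack(byte_order + "d", raw8)[0]
    int_ok, float_ok = plausible(as_int_secs), plausible(as_float_secs)
    if int_ok and not float_ok:
        return "integer"
    if float_ok and not int_ok:
        return "float"
    return "unknown"


now = datetime(2012, 5, 10, tzinfo=timezone.utc)
secs = (now - PG_EPOCH).total_seconds()
from_int_build = guess_timestamp_encoding(struct.pack("<q", int(secs * 1_000_000)), now)
from_float_build = guess_timestamp_encoding(struct.pack("<d", secs), now)
```

A client could abort with a "wrong build flags" style message whenever the
guess disagrees with its own compile-time choice or comes back "unknown".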


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Josh Kupershmidt
On Wed, May 9, 2012 at 8:11 PM, Bruce Momjian  wrote:
> I have completed my draft of the 9.2 release notes, and committed it to
> git.  I am waiting for our development docs to build, but after 40
> minutes, I am still waiting:

This bit:
  Previously supplied years and year masks of less than four digits
wrapped inconsistently.

I first read as "Previously-supplied years..." instead of "Previously,
years and year masks with...".

In line with what Robert said, IMO he should be credited on
pg_opfamily_is_visible(), particularly since it was his idea and code
originally IIRC. And a few more I'm familiar with, such as psql's
\ir which he substantially revised, not to mention his
much-appreciated assistance with all the psql comment-displaying under
'E.1.3.9.2.1. Comments'.

Josh



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Andrew Dunstan



On 05/10/2012 08:28 AM, Vik Reykja wrote:
> On Thu, May 10, 2012 at 2:24 PM, Andrew Dunstan wrote:
>> On 05/10/2012 08:11 AM, Peter Geoghegan wrote:
>>> I'm not really sure why you've listed Daniel Farina as a
>>> co-author of the pg_stat_statements normalisation feature. He
>>> did a good job of reviewing it, but he didn't actually
>>> contribute any code.
>>
>> It looks like reviewers have been given credit throughout.
>
> Which could be good incentive to become more involved in reviewing for
> some people.



Right, but I think it would be good to identify them explicitly as 
reviewers if we're going to include the names. Otherwise it could enable 
people to claim authorship of something they did not in fact author, and 
even without that would dilute the claim to authorship of the actual 
author(s).


cheers

andrew



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Heikki Linnakangas

On 10.05.2012 13:21, Thom Brown wrote:
> On 10 May 2012 04:11, Bruce Momjian  wrote:
>> I have completed my draft of the 9.2 release notes, and committed it to
>> git.
>> ...
>
> Couple typo corrections attached.

Applied.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Vik Reykja
On Thu, May 10, 2012 at 2:24 PM, Andrew Dunstan  wrote:

>
>
> On 05/10/2012 08:11 AM, Peter Geoghegan wrote:
>
>> I'm not really sure why you've listed Daniel Farina as a co-author of the
>> pg_stat_statements normalisation feature. He did a good job of reviewing
>> it, but he didn't actually contribute any code.
>>
>
>
> It looks like reviewers have been given credit throughout.
>

Which could be good incentive to become more involved in reviewing for some
people.


Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Simon Riggs
On 10 May 2012 13:11, Peter Geoghegan  wrote:

> Why can't we call group commit group commit (and for that matter,
> index-only scans index-only scans), so that people will understand
> that we are now competitive with other RDBMSs in this area? "Improve
> performance of WAL writes when multiple transactions commit at the
> same time" seems like a pretty bad description, since it doesn't make
> any reference to batching of commits.  Also, I don't think that the
> placement of this as the second to last performance feature is
> commensurate with its actual importance.

Agreed.

Group Commit is a recognised term, and other DBMSs, e.g. MariaDB, have
added that feature recently.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Draft release notes complete

2012-05-10 Thread Andrew Dunstan



On 05/10/2012 08:11 AM, Peter Geoghegan wrote:
> I'm not really sure why you've listed Daniel Farina as a co-author of
> the pg_stat_statements normalisation feature. He did a good job of
> reviewing it, but he didn't actually contribute any code.

It looks like reviewers have been given credit throughout.

cheers

andrew


