Re: [HACKERS] Improving our clauseless-join heuristics

2012-04-17 Thread Tom Lane
Amit Kapila  writes:
>> I'm afraid I'm still not following you very well.  Perhaps you could
>> submit a proposed patch?

> Before that, can you please explain in a little more detail (if possible
> with a small example) the idea from your original mail: "is there any
> join clause that both these relations participate in?"

Um ... wasn't that well enough explained already?

I think there are basically two cases.  You can have a join clause that
is immediately useful for joining two relations, say

select ... from a,b where a.x = b.y;

This is "immediate" in the sense that you can apply it when joining a
to b, regardless of any other relations involved in the query.

Or you can have a case like

select ... from a,b,c where (a.x + b.y) = c.z;

This clause is not immediately useful for joining any two of the three
relations in the query.  It will be useful when we get to level 3,
particularly so if we chose to join a and b first and there's an index
on c.z.  But we would have had to accept doing a cartesian join of a and
b to arrive at that situation.  In this example, we have no alternative
except to do some cartesian join at level 2 --- but as soon as we add
some more tables and join clauses to the example, we could get
distracted from the possibility that a cartesian join of a and b might
be a good idea.

Given that make_rels_by_joins doesn't (and shouldn't IMO) have any
detailed understanding of the semantics of particular join clauses,
I would not expect it to realize that joining a to b is the most likely
option out of the three possible clauseless joins that are available
at level 2 in this query.  It's going to have to generate all 3, and
then costing at the next level will figure out what's best to do.

However, I think it *does* need to understand that clauses relating
3 or more relations can work like this.  In the code as it stood before
last week, it would actively reject joining a to b if there were any
additional relations in the query.  That's just not right.
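The level-by-level search described above can be sketched in toy form. This is purely illustrative (Python stand-in; the real logic lives in the planner's joinrels code, and `has_usable_clause` / `join_search` are invented names, not planner APIs): the point it demonstrates is that every level-2 join of a, b, c is clauseless, yet the cartesian a x b join must still be generated so the 3-way clause can apply at level 3.

```python
# Toy model of level-wise join search: a "rel" is a frozenset of base
# relation names, and a join clause is modeled as the set of rels it
# references.  (Illustrative only, not the real planner data structures.)
clauses = [frozenset({"a", "b", "c"})]   # e.g. (a.x + b.y) = c.z

def has_usable_clause(left, right):
    # A clause is immediately useful for joining left to right only if
    # every rel it references is available and it draws on both sides.
    combined = left | right
    return any(c <= combined and c & left and c & right for c in clauses)

def join_search(base_rels):
    levels = {1: [frozenset({r}) for r in base_rels]}
    for size in range(2, len(base_rels) + 1):
        found = []
        for lsize in range(1, size // 2 + 1):
            for left in levels[lsize]:
                for right in levels[size - lsize]:
                    if left & right:
                        continue           # inputs overlap: not a join
                    if lsize * 2 == size and sorted(left) >= sorted(right):
                        continue           # skip mirror-image duplicates
                    # Cartesian (clauseless) joins are generated too:
                    # rejecting "a x b" here would hide the path that
                    # lets the 3-way clause apply at the next level.
                    found.append((left, right, has_usable_clause(left, right)))
        levels[size] = [l | r for (l, r, _) in found]
        yield size, found

for size, joins in join_search(["a", "b", "c"]):
    for left, right, clauseful in joins:
        kind = "clauseful" if clauseful else "cartesian"
        print(size, sorted(left), "JOIN", sorted(right), "->", kind)
```

Running this prints three cartesian joins at level 2 and three clauseful joins at level 3, matching the "generate all 3, let costing sort it out" behavior described above.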

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Wolfgang Wilhelm
First of all, I think a bug tracker will never be a system that is somehow
"loved". Bugs are things developers hate. Bugs in a cute nice small bug
house aren't nice, are they?


Question: Is there a reasonable chance of getting from, say, Tom Lane's idea
of a wiki to an improved version (however we define the improvements) that PG
developers would consider useful?

Wolfgang




 From: Robert Haas 
To: Magnus Hagander  
CC: Jay Levitt ; Alex ; Dimitri 
Fontaine ; Alvaro Herrera ; 
Peter Eisentraut ; Tom Lane ; Greg Smith 
; Pg Hackers  
Sent: 3:07 Wednesday, April 18, 2012
Subject: Re: [HACKERS] Bug tracker tool we need
 
On Tue, Apr 17, 2012 at 1:47 AM, Magnus Hagander  wrote:
> That's probably one reason people aren't jumping on this. Because
> there is no tracker out there that people actually *like*...

I think this is a point worth serious thought.  The bug trackers I've
used have been mostly terrible; saying that they are to bug tracking
what CVS is to version control is insulting CVS.  They're more like
what editing RCS files in vi is to version control - i.e. worse than
not having version control.

To put that in practical terms, I think everyone (including people
like Tom and me) who (a) is an old curmudgeon or anyway a middle-aged
curmudgeon and (b) would spend much more time in bed with any system
that we adopted than the average hacker would agree that the current
system is kind of a pain.  But there is no point in replacing it with
something else unless that new thing is going to be significantly
better than what we are doing now.  And it's not entirely clear that
such a thing exists.  There are certainly people out there, and even
on this list, who will tell you that system ABC is great.  But for any
given ABC there are also people who will tell you that it's got
significant problems.  We don't need to change anything to get a
system that's got significant problems; we already have one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Tom Lane
Magnus Hagander  writes:
> I think this clearly outlines that we need to remember that there are
> *two* different patterns that people are trying to solve with the
> bugtracker.

Yeah, remember we drifted to this topic from discussion of management of
CF patches, which might be yet a third use-case.  It's not obvious that
it's the same as tracking unfixed bugs, at least; though maybe the
requirements end up the same.

> Any tool we'd go for should aim to cover *both* usecases.

Not convinced that we should expect one tool to be good at both
(or all three) things.

regards, tom lane



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Magnus Hagander
On Wed, Apr 18, 2012 at 07:52, Tom Lane  wrote:
> Magnus Hagander  writes:
>> On Wed, Apr 18, 2012 at 04:30, Tom Lane  wrote:
>>> So when I read Andrew's recent suggestion that we use
>>> Bugzilla, my immediate reaction was "egad, can't we do better?".
>>> Maybe we can't :-(.
>
>> Personally, I'd say we *already* do better than that...
>
> Just meditating a little ... one of my big beefs with Bugzilla is that
> it shows basically a historical record of how a bug was discovered and
> dealt with.  While that surely has, er, historical value, it's not that
> useful to read when you want to know which bug matches your symptoms,
> what the possible consequences are, which versions it was fixed in,
> etc.  One particularly nasty point is that (AFAIK) it's impossible to
> delete or edit incorrect comments, only to add new ones.
>
> I wonder whether a better model would be a wiki page per bug, with
> an editable description and some links to reports, commits, etc.
> Not but what I hate every wiki I've ever used too ... but at least
> they let you fix the information when it's wrong or unhelpful.

There's no reason why a bugtracker couldn't and shouldn't make it
possible to do that. I think the reason Bugzilla doesn't goes back to
the "all comments are concatenated into a huge text field" design,
which means you'd have to parse that text field to get the actual data
back, instead of having it easily available...

The discussion should in that case be about whether history should be
kept for each individual comment, or just thrown away...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Tom Lane
Magnus Hagander  writes:
> On Wed, Apr 18, 2012 at 04:30, Tom Lane  wrote:
>> So when I read Andrew's recent suggestion that we use
>> Bugzilla, my immediate reaction was "egad, can't we do better?".
>> Maybe we can't :-(.

> Personally, I'd say we *already* do better than that...

Just meditating a little ... one of my big beefs with Bugzilla is that
it shows basically a historical record of how a bug was discovered and
dealt with.  While that surely has, er, historical value, it's not that
useful to read when you want to know which bug matches your symptoms,
what the possible consequences are, which versions it was fixed in,
etc.  One particularly nasty point is that (AFAIK) it's impossible to
delete or edit incorrect comments, only to add new ones.

I wonder whether a better model would be a wiki page per bug, with
an editable description and some links to reports, commits, etc.
Not but what I hate every wiki I've ever used too ... but at least
they let you fix the information when it's wrong or unhelpful.

regards, tom lane



Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-04-17 Thread Jameison Martin
Regarding the schema: I'm afraid the schema cannot be changed at this point, 
though I appreciate
the suggestions. 

Regarding an INSERT performance test, what kind of table shape would you like
me to exercise?  The patch as submitted may actually shave some cycles off of
the insertion of rows with trailing nulls even when there are fewer than 64
columns, because it avoids iterating over the null columns a 2nd time in
heap_fill_tuple(), so I want to be sure that I pick something that you feel
is properly representative.

Thanks.

-Jamie



 From: Tom Lane 
To: Jameison Martin  
Cc: "pgsql-hackers@postgresql.org"  
Sent: Tuesday, April 17, 2012 9:57 PM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows 
to reduce the size of the null bitmap 
 
Jameison Martin  writes:
> The use-case I'm targeting is a schema that has multiple tables with ~800 
> columns, most of which have only the first 50 or so values set. 800 columns 
> would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte 
> alignment the row bitmap would take up 104 bytes with the current 
> implementation. If only the first 50 or so columns are actually non-null, 
> then the minimum bitmap size wouldn't need to be more than 8 bytes, which 
> means the proposed change would save 96 bytes. For the data set I have in 
> mind roughly 90% of the rows would fall into the category of needing only 8 
> bytes for the null bitmap.

I can't help thinking that (a) this is an incredibly narrow use-case,
and (b) you'd be well advised to rethink your schema design anyway.
There are a whole lot of inefficiencies associated with having that many
columns; the size of the null bitmap is probably one of the smaller
ones.  I don't really want to suggest an EAV design, but perhaps some of
the columns could be collapsed into arrays, or something like that?

> What kind of test results would prove that this is a net win (or not a net 
> loss) for typical cases? Are you interested in some insert performance tests? 
> Also, how would you define a typical case (e.g. what kind of data shape)?

Hmm, well, most of the tables I've seen have fewer than 64 columns, so
that the probability of win is exactly zero.  Which would mean that
you've got to demonstrate that the added overhead is unmeasurably small.
Which maybe you can do, because there's certainly plenty of cycles
involved in a tuple insertion, but we need to see the numbers.
I'd suggest an INSERT/SELECT into a temp table as probably stressing
tuple formation speed the most.  Or maybe you could write a C function
that just exercises heap_form_tuple followed by heap_freetuple in a
tight loop --- if there's no slowdown measurable in that context, then
a fortiori we don't have to worry about it in the real world.

            regards, tom lane
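As a quick arithmetic check of the figures quoted above (a sketch only: the 8-byte alignment applied to the bitmap here matches the example's arithmetic, and actual heap tuple layout details may differ by build):

```python
def null_bitmap_bytes(ncols, align=8):
    # One bit per column, rounded up to whole bytes, then padded out to
    # the alignment boundary (8-byte MAXALIGN assumed, as in the example).
    raw = -(-ncols // 8)            # ceil(ncols / 8)
    return -(-raw // align) * align

full = null_bitmap_bytes(800)       # bitmap covering all 800 columns
truncated = null_bitmap_bytes(50)   # bitmap covering only the leading 50
print(full, truncated, full - truncated)   # → 104 8 96
```

This reproduces the 104-byte vs. 8-byte bitmaps and the 96 bytes saved per row claimed in the use-case description.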

Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Magnus Hagander
On Wed, Apr 18, 2012 at 05:44, Tom Lane  wrote:
> Greg Smith  writes:
>> Rather than talk about adopting one of the available torture devices,
>> I'd happily consider the simplest thing possible that would be useful
>> here instead.  Here's my proposed tiny tracker:
>
> Wasn't Jay just muttering about "writing your own bug tracker" being an
> anti-pattern?  But still, there's something in what you say, because ...

The case where it would work to do that is if we agreed it'd be a
"tiny tracker", and not actually try to do too much. As in, basically a
neater frontend over the mailing list archives.

I actually started on a demo of that at one point. It's currently blocked
behind the fact that we have to fix the mailing list archives as well;
in particular, the break on month boundaries kills any attempt to do such
a thing. Which is also being worked on, but backlogged as usual :S

I'll admit that one reason it's been sitting fairly low on the prio
list is the guaranteed months of bikeshedding that it would bring
along ;)


>> -Make commits that fix a bug reference it in one of the standard ways
>> that's done by every one of these bug trackers.  Just throw "Fixes
>> #6596" into the commit message.  These will probably work if a more
>> serious tool is adopted, too.
>
> ... I think you'll find a lot of that data could be mined out of our
> historical commit logs already.  I know I make a practice of mentioning
> "bug #" whenever there is a relevant bug number, and I think other
> committers do too.  It wouldn't be 100% coverage, but still, if we could
> bootstrap the tracker with a few hundred old bugs, we might have
> something that was immediately useful, instead of starting from scratch
> and hoping it would eventually contain enough data to be useful.

I have always considered that a *requirement*, not a neat addon.


> At the same time, I think we'd likely be a lot better off squirting this
> data into bugzilla or another standard tracker, instead of building our
> own infrastructure.

I'm somewhat doubtful.

As a first step, we should at least stick in a tracker with a
reasonable SQL schema so it's possible to extract and do something
with the data. IIRC, bugzilla (and others) at least used to just
concatenate all comments into a single text field, and not even keep
them apart, for example. Because clearly this whole JOIN thing is evil
and difficult. They may have fixed that by now, but what I've seen
from most trackers shows signs of people who have never seen a
database beyond an Excel sheet...
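For what it's worth, keeping comments apart rather than concatenated is a one-table affair. A minimal sketch (SQLite via Python here, purely illustrative; the table and column names are invented, not a proposal for a real schema):

```python
import sqlite3

# Minimal normalized tracker schema: comments are rows, not one blob,
# so editing or retracting a single comment is a plain UPDATE/DELETE
# rather than reparsing a concatenated text field.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bug (
        id      INTEGER PRIMARY KEY,
        subject TEXT NOT NULL,
        status  TEXT NOT NULL DEFAULT 'Unconfirmed'
    );
    CREATE TABLE comment (
        id     INTEGER PRIMARY KEY,
        bug_id INTEGER NOT NULL REFERENCES bug(id),
        author TEXT NOT NULL,
        body   TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO bug (id, subject) VALUES (6596, 'example bug')")
conn.execute("INSERT INTO comment (bug_id, author, body) "
             "VALUES (6596, 'alice', 'wrong analysis')")
conn.execute("INSERT INTO comment (bug_id, author, body) "
             "VALUES (6596, 'bob', 'actual fix notes')")
# Correcting one comment touches only that row:
conn.execute("UPDATE comment SET body = 'retracted' "
             "WHERE bug_id = 6596 AND author = 'alice'")
rows = conn.execute("SELECT author, body FROM comment "
                    "WHERE bug_id = 6596 ORDER BY id").fetchall()
print(rows)   # → [('alice', 'retracted'), ('bob', 'actual fix notes')]
```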

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Improving our clauseless-join heuristics

2012-04-17 Thread Amit Kapila
>>I'm afraid I'm still not following you very well.  Perhaps you could
>>submit a proposed patch?

Before that, can you please explain in a little more detail (if possible with
a small example) the idea from your original mail: "is there any join clause
that both these relations participate in?"

I wanted to know the details of that idea, to see whether what I am proposing
has any merit compared to it.


-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Wednesday, April 18, 2012 10:36 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Improving our clauseless-join heuristics 

Amit Kapila  writes:
>> I might still be misunderstanding, but I think what you are suggesting
>> is that in the loop in make_rels_by_clause_joins, if we find that the
>> old_rel doesn't have a join clause/restriction with the current
>> other_rel, we check to see whether other_rel has any join clauses at
>> all, and force the join to occur anyway if it doesn't.

> It is on similar lines, but the only difference is that it will try to
> join old_rel with the other_rel list in case old_rel is not able to join
> with any of the other_rels in the list with a proper join clause between
> them.

I'm afraid I'm still not following you very well.  Perhaps you could
submit a proposed patch?

regards, tom lane




Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Magnus Hagander
On Tue, Apr 17, 2012 at 19:59, Greg Smith  wrote:
> On 04/17/2012 09:20 AM, Jay Levitt wrote:
> Let's pick a real example from the last week of my life, where having a bug
> tracker would have helped me out.  This appears in a log:
>
> ERROR: missing chunk number 0 for toast value 1167375 in pg_toast_2619
>
> What I should be able to do here is search the bug tracker for these words
> and have it spit out an issue that looks like this

I'm snipping the actual usecase, because it's nice and detailed, but:
I think this clearly outlines that we need to remember that there are
*two* different patterns that people are trying to solve with the
bugtracker.

One is the simple "someone reported a bug. track until someone fixes
it to make sure we don't miss it. Possibly even flag who is currently
working on/responsible for this bug".

The other one is for the *outsider* (sorry, Greg, in this scenario you
get to represent an outsider for once). Who comes in and wants to know
whether the problem he/she is having has been seen before, whether it's
fixed, and which versions it's fixed in.

There is some overlap, but the general usecase is drastically
different. And we do a decent enough job of the first one today,
partially because we have a few people who are very good at
remembering these things and then finding them in the list archives.
It works. However, we do quite badly on the second one, IMHO.

Any tool we'd go for should aim to cover *both* usecases.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Magnus Hagander
On Wed, Apr 18, 2012 at 04:30, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Apr 17, 2012 at 1:47 AM, Magnus Hagander  wrote:
>>> That's probably one reason people aren't jumping on this. Because
>>> there is no tracker out there that people actually *like*...
>
>> I think this is a point worth serious thought.
>
> Indeed.  The only one I've got extensive experience with is Bugzilla
> (because Red Hat uses it) and I do cordially hate it.  At least some
> of that is due to bureaucratic practices RH has evolved, like cloning
> bugs N times for N affected releases, but I think the tool encourages
> such things.  So when I read Andrew's recent suggestion that we use
> Bugzilla, my immediate reaction was "egad, can't we do better?".
> Maybe we can't :-(.

Personally, I'd say we *already* do better than that...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-17 Thread Tom Lane
Robert Haas  writes:
> On Mon, Apr 16, 2012 at 1:43 PM, Jeff Davis  wrote:
>> ... Only one side really needs the mark and restore logic, but it was easier
>> to write the pseudocode in a symmetrical way (except step 7).

> I'm actually not sure these are equivalent formulations.  Suppose one
> side has [i,i] where i ranges from 1 to 1000 and the other side
> the exact same thing plus [1,1000].  That one really big range
> will come up second on the right side, and you will not be able to
> discard it until you reach the end of the join.  If you just keep
> rewinding the right side, you're going to end up with O(n^2) behavior,
> whereas if you can discard tuples "from the middle" on the right side,
> then you will get O(n) behavior, which is a big difference.  In other
> words, in your original algorithm the tuples that you discard in step
> 4 or 5 need not be the first remaining tuple on whichever side of the
> join we're talking about.

It would be a pretty weird implementation of mergejoin that could
"discard tuples from the middle" of an input stream.  Or to be more
specific, it wouldn't be the mergejoin itself that could do that at all
--- you'd need the input plan node to be some sort of tweaked version of
tuplestore or tuplesort that could respond to a request like that.

I can't escape the feeling that Jeff has chosen the wrong basis for his
thought experiment, and that what he really ought to be thinking about
is hashjoin, which keeps an in-memory table that it could easily modify
on the fly if it chose to.  The multi-batch aspect of hybrid hashjoin
could be useful too (IOW, when you're out of better ideas, throw the
tuple back in the water to process later).

This is just handwaving of course.  I think some digging in the
spatial-join literature would likely find ideas better than any of
these.

regards, tom lane
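To make the mark-and-restore constraint above concrete, here is a toy merge-style overlap join over sorted ranges (a sketch, not the proposed executor logic): only the *front* of the inner list can ever be discarded, so a wide range like [1,1000] near the front pins every tuple behind it, which is exactly the O(n^2) hazard Robert describes.

```python
def overlap_join(left, right):
    """Toy merge-style join on ranges: emit (l, r) pairs whose closed
    intervals overlap.  Both inputs must be sorted by lower bound.
    Only the front of `right` is ever discarded; a "middle" entry that
    can no longer match must still be re-scanned on every outer row."""
    start = 0
    out = []
    for llo, lhi in left:
        # Safe to drop right entries from the front whose upper bound is
        # below llo: later left ranges have lower bounds >= llo.
        while start < len(right) and right[start][1] < llo:
            start += 1
        i = start
        while i < len(right) and right[i][0] <= lhi:
            rlo, rhi = right[i]
            if rhi >= llo:            # middle entries may still miss
                out.append(((llo, lhi), (rlo, rhi)))
            i += 1
    return out

pairs = overlap_join([(1, 2), (3, 5)], [(2, 3), (4, 10)])
print(pairs)
```

A hash-join-style in-memory table, as suggested above, is what would let dead "middle" entries be evicted on the fly.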



Re: [HACKERS] Improving our clauseless-join heuristics

2012-04-17 Thread Tom Lane
Amit Kapila  writes:
>> I might still be misunderstanding, but I think what you are suggesting
>> is that in the loop in make_rels_by_clause_joins, if we find that the
>> old_rel doesn't have a join clause/restriction with the current
>> other_rel, we check to see whether other_rel has any join clauses at
>> all, and force the join to occur anyway if it doesn't.

> It is on similar lines, but the only difference is that it will try to
> join old_rel with the other_rel list in case old_rel is not able to join
> with any of the other_rels in the list with a proper join clause between
> them.

I'm afraid I'm still not following you very well.  Perhaps you could
submit a proposed patch?

regards, tom lane



Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-04-17 Thread Tom Lane
Jameison Martin  writes:
> The use-case I'm targeting is a schema that has multiple tables with ~800 
> columns, most of which have only the first 50 or so values set. 800 columns 
> would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte 
> alignment the row bitmap would take up 104 bytes with the current 
> implementation. If only the first 50 or so columns are actually non-null, 
> then the minimum bitmap size wouldn't need to be more than 8 bytes, which 
> means the proposed change would save 96 bytes. For the data set I have in 
> mind roughly 90% of the rows would fall into the category of needing only 8 
> bytes for the null bitmap.

I can't help thinking that (a) this is an incredibly narrow use-case,
and (b) you'd be well advised to rethink your schema design anyway.
There are a whole lot of inefficiencies associated with having that many
columns; the size of the null bitmap is probably one of the smaller
ones.  I don't really want to suggest an EAV design, but perhaps some of
the columns could be collapsed into arrays, or something like that?

> What kind of test results would prove that this is a net win (or not a net 
> loss) for typical cases? Are you interested in some insert performance tests? 
> Also, how would you define a typical case (e.g. what kind of data shape)?

Hmm, well, most of the tables I've seen have fewer than 64 columns, so
that the probability of win is exactly zero.  Which would mean that
you've got to demonstrate that the added overhead is unmeasurably small.
Which maybe you can do, because there's certainly plenty of cycles
involved in a tuple insertion, but we need to see the numbers.
I'd suggest an INSERT/SELECT into a temp table as probably stressing
tuple formation speed the most.  Or maybe you could write a C function
that just exercises heap_form_tuple followed by heap_freetuple in a
tight loop --- if there's no slowdown measurable in that context, then
a fortiori we don't have to worry about it in the real world.

regards, tom lane



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Brendan Jurd
On 18 April 2012 13:44, Tom Lane  wrote:
> ... I think you'll find a lot of that data could be mined out of our
> historical commit logs already.  I know I make a practice of mentioning
> "bug #" whenever there is a relevant bug number, and I think other
> committers do too.  It wouldn't be 100% coverage, but still, if we could
> bootstrap the tracker with a few hundred old bugs, we might have
> something that was immediately useful, instead of starting from scratch
> and hoping it would eventually contain enough data to be useful.

Just as a data point, git tells me that there are 387 commits where
the commit log message matches '#\d+', and 336 where it matches 'bug
#\d+'.

Cheers,
BJ



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Greg Smith

On 04/17/2012 11:44 PM, Tom Lane wrote:

At the same time, I think we'd likely be a lot better off squirting this
data into bugzilla or another standard tracker, instead of building our
own infrastructure.


Perhaps.  It just struck me that a lot of the custom bits needed here 
regardless could be built/added to the usual workflow bottom-up.  You 
don't have to assume that either a full tracker or something like my 
simple POC idea will show up at the end to get started.  Making goal #1 
involve just mining for the data that's already around and squirting it 
onto a web page would be a useful exercise, one that's likely to benefit 
any tracker adoption.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Tom Lane
Greg Smith  writes:
> Rather than talk about adopting one of the available torture devices, 
> I'd happily consider the simplest thing possible that would be useful 
> here instead.  Here's my proposed tiny tracker:

Wasn't Jay just muttering about "writing your own bug tracker" being an
anti-pattern?  But still, there's something in what you say, because ...

> -Make commits that fix a bug reference it in one of the standard ways 
> that's done by every one of these bug trackers.  Just throw "Fixes 
> #6596" into the commit message.  These will probably work if a more 
> serious tool is adopted, too.

... I think you'll find a lot of that data could be mined out of our
historical commit logs already.  I know I make a practice of mentioning
"bug #" whenever there is a relevant bug number, and I think other
committers do too.  It wouldn't be 100% coverage, but still, if we could
bootstrap the tracker with a few hundred old bugs, we might have
something that was immediately useful, instead of starting from scratch
and hoping it would eventually contain enough data to be useful.

At the same time, I think we'd likely be a lot better off squirting this
data into bugzilla or another standard tracker, instead of building our
own infrastructure.

regards, tom lane



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Greg Smith

On 04/17/2012 10:30 PM, Tom Lane wrote:


Indeed.  The only one I've got extensive experience with is Bugzilla
(because Red Hat uses it) and I do cordially hate it.  At least some
of that is due to bureaucratic practices RH has evolved, like cloning
bugs N times for N affected releases, but I think the tool encourages
such things.


That's along the same lines as my comments toward Jay Levitt, that bugs 
where the fixes span multiple releases are the part nobody seems to 
handle very well.


Rather than talk about adopting one of the available torture devices, 
I'd happily consider the simplest thing possible that would be useful 
here instead.  Here's my proposed tiny tracker:


-If a bug is found in a released version, but it didn't originate on 
pgsql-bugs, send a message to that list so it gets assigned a bug id.


-Write something that consumes pgsql-bugs and dumps all bug numbers and 
their subject lines into a database.  Start them with a state of 
"Unconfirmed".


-Make commits that fix a bug reference it in one of the standard ways 
that's done by every one of these bug trackers.  Just throw "Fixes 
#6596" into the commit message.  These will probably work if a more 
serious tool is adopted, too.


-Update src/tools/git_changelog to understand these messages and produce 
a delimited file about every bug fix discovered.  Import new entries 
into another table with the bug id as the foreign key.


-Provide a command line tool to change bug state, basically a thin 
wrapper around an UPDATE statement.  Make it easy to change ranges that 
are currently "Unconfirmed" to "Not a bug", for the bug reports that 
weren't really bugs.


-When point release tagging happens, run another script that looks for 
bug fix commits since the last one, and then save that version number 
into a "fixed in" table.


-Generate a web page out of the database.

I think I've outlined that in a way that would make useful steps toward 
adopting one of the full packages, too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



Re: [HACKERS] Re: [COMMITTERS] pgsql: Don't override arguments set via options with positional argumen

2012-04-17 Thread Robert Haas
On Tue, Apr 17, 2012 at 10:53 PM, Tom Lane  wrote:
> I can see both sides of this.  I agree that the old behavior is buggy,
> but what I imagine Robert is worried about is scripts that accidentally
> work okay today and would stop working once the PG programs are fixed
> to complain about bogus usage.  People tend not to like it if you make
> that kind of change in a minor release.

Exactly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Robert Haas
On Tue, Apr 17, 2012 at 11:07 PM, Greg Sabino Mullane  wrote:
> Let's not let perfect be the enemy of good. In this case, *anything*
> that actually tracks bugs (and they are all quite good at that,
> if nothing else) is an improvement over what we have now, and thus,
> quite good. :)

I respectfully disagree.  I would rather be chained to an angry cat
for a day than have to use Bugzilla on a regular basis.  RT is better,
but the UI is still laughably bad.  Try sticking a 74-email long
thread into an RT ticket and then try to do anything with it.  Now try
in Gmail (even the new, revised, slightly-harder-to-use Gmail).  It's
not close.

> Personally, I'm okay with, and have extensively hacked on, Bugzilla
> and RT, but anything should be fine as long as we have someone
> to take ownership.

Therein lies another problem...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Qi Huang




> Date: Wed, 18 Apr 2012 02:45:09 +0300
> Subject: Re: [HACKERS] Gsoc2012 idea, tablesample
> From: a...@cybertec.at
> To: cbbro...@gmail.com
> CC: sfr...@snowman.net; pgsql-hackers@postgresql.org
> 
> On Tue, Apr 17, 2012 at 7:33 PM, Christopher Browne  
> wrote:
> > Well, there may be cases where the quality of the sample isn't
> > terribly important, it just needs to be "reasonable."
> >
> > I browsed an article on the SYSTEM/BERNOULLI representations; they
> > both amount to simple picks of tuples.
> >
> > - BERNOULLI implies picking tuples with a specified probability.
> >
> > - SYSTEM implies picking pages with a specified probability.  (I think
> > we mess with this in ways that'll be fairly biased in view that tuples
> > mayn't be of uniform size, particularly if Slightly Smaller strings
> > stay in the main pages, whilst Slightly Larger strings get TOASTed...) 
Looking at the definition of the BERNOULLI method, which means scanning all
the tuples, I always have a question: what is the difference between using the
BERNOULLI method and using "select * ... where random() < 0.1"? Both go
through all the tuples and cost a seq scan. If the answer to that question is
"no difference", I have a proposal for another implementation of BERNOULLI.
For a relation, we can assign all its tuples a unique and continuous ID (we
may use ctid or something else). Then, for each number in the set of IDs, we
draw a random number and check whether it is smaller than the sampling
percentage; if it is, we retrieve the tuple corresponding to that ID. This
method does not seq-scan all the tuples, but it can still sample by picking
individual tuples.

Thanks.

Best Regards and Thanks
Huang Qi Victor
Computer Science, National University of Singapore
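As a rough illustration of the comparison (an in-memory list stands in for the
relation, and all names here are invented for the sketch; neither function
reflects how PostgreSQL would actually implement TABLESAMPLE):

```python
import random

def bernoulli_where_clause(tuples, fraction):
    # Analogue of: select * ... where random() < fraction
    # Visits every tuple (one sequential pass), keeping each
    # independently with the given probability.
    return [t for t in tuples if random.random() < fraction]

def sample_by_id(ids, fraction, fetch):
    # The proposal above: run the same per-tuple Bernoulli trial over
    # the ID set and fetch only the winners. The sample distribution
    # is identical; only the I/O pattern changes (point fetches
    # instead of reading every page).
    return [fetch(i) for i in ids if random.random() < fraction]

random.seed(42)
table = list(range(10000))
s1 = bernoulli_where_clause(table, 0.1)
s2 = sample_by_id(range(10000), 0.1, lambda i: table[i])
# Both samples hover around 10% of the table.
```

Note that the ID-based variant still performs one Bernoulli trial per ID, so
it saves page reads rather than random-number draws.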




  

Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


> But for any
> given ABC there are also people who will tell you that it's got
> significant problems.  We don't need to change anything to get a
> system that's got significant problems; we already have one.

Let's not let perfect be the enemy of good. In this case, *anything* 
that actually tracks bugs (and they are all quite good at that, 
if nothing else) is an improvement over what we have now, and thus, 
quite good. :)

Personally, I'm okay with, and have extensively hacked on, Bugzilla 
and RT, but anything should be fine as long as we have someone 
to take ownership.

- -- 
Greg Sabino Mullane g...@turnstep.com
PGP Key: 0x14964AC8 201204172131
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAk+OL/0ACgkQvJuQZxSWSshMxACeJdr+WO4ttA2mkrGLv98PTTSH
jSoAniKwQNPzokA3f0GYN8gB+hAOc0Hy
=oPn6
-END PGP SIGNATURE-





Re: [HACKERS] Re: [COMMITTERS] pgsql: Don't override arguments set via options with positional argumen

2012-04-17 Thread Tom Lane
Andrew Dunstan  writes:
> You know, I could have sworn it was discussed, but when I look back I 
> see it wasn't. I must have been remembering the recent logging protocol bug.

> I'll revert it if people want, although I still think it's a bug.

I think we discussed it to the extent of agreeing it was a bug, but
the question of whether to back-patch was not brought up.

I can see both sides of this.  I agree that the old behavior is buggy,
but what I imagine Robert is worried about is scripts that accidentally
work okay today and would stop working once the PG programs are fixed
to complain about bogus usage.  People tend not to like it if you make
that kind of change in a minor release.  Against that you have to set
the probability of mistaken interactive usage being caught, or not,
by a repaired program.  Stopping a disastrous command-line typo seems
potentially worth any pain from having to fix scripts that would have
to be fixed eventually anyway.

If you had patched only HEAD I would have been fine with that, but
seeing that you've done the work to back-patch I'm kind of inclined
to let it stand.

regards, tom lane



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 17, 2012 at 1:47 AM, Magnus Hagander  wrote:
>> That's probably one reason people aren't jumping on this. Because
>> there is no tracker out there that people actually *like*...

> I think this is a point worth serious thought.

Indeed.  The only one I've got extensive experience with is Bugzilla
(because Red Hat uses it) and I do cordially hate it.  At least some
of that is due to bureaucratic practices RH has evolved, like cloning
bugs N times for N affected releases, but I think the tool encourages
such things.  So when I read Andrew's recent suggestion that we use
Bugzilla, my immediate reaction was "egad, can't we do better?".
Maybe we can't :-(.

regards, tom lane



Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-17 Thread Kyotaro HORIGUCHI
> > Hmm. StandbyMode is a local variable which cannot be accessed in
> > checkpointer. But WalRcvInProgress() which shows if wal receiver
> > is running seems to be usable to ENABLE governing progress by
> > checkpoint_segments.
> 
> Even when walreceiver is not running and WAL files are read from the archive,
> checkpoint_segments can trigger a restartpoint. In this case, ISTM a
> restartpoint
> should be scheduled according to checkpoint_segments, so I don't think that
> checking WalRcvInProgress() for that purpose is right thing. Instead, what 
> about
> sharing StandbyMode flag among processes via shared memory like XLogCtl?

I tried that at first. But I suppose the requirement here is 'if
the segments being read come via the replication stream, enable
throttling by checkpoint_segments', and WalRcvInProgress() seems fit
to check that. Plus, adding SharedStartupStandbyMode to XLogCtlData
seems to bring some annoyances that would not pay off.


By the way, do you have any advice about GetStandbyFlushRecPtr()
and the ordering of the locations? I'm puzzled by that...

> > I agree with it basically. But I've get confused to look into
> > GetStandbyFlushRecPtr().
> >
> > |   if (XLByteLT(receivePtr, replayPtr))
> > | return XLByteLT(replayPtr, restorePtr) ? restorePtr : replayPtr;
> > |   else
> > | return XLByteLT(receivePtr, restorePtr) ? restorePtr : receivePtr;
>  - receivePtr seems always to be updated just after syncing received xlog.
>  - replayPtr is updated just BEFORE the xlog_redo operation, and
>  - restorePtr is updated AFTER xlog_redo().
>  - And replayPtr seems never to exceed receivePtr.
> 
> These seem quite reasonable. Together they make the following
> conditional expression hold:
> 
>  restorePtr <= replayPtr <= receivePtr
> 
> But XLByteLT(receivePtr, replayPtr) should then never return true
> under the condition above... Is something wrong in my assumptions?
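Transcribing the quoted logic into a throwaway sketch supports that reading:
under the assumed ordering the first branch never fires, and the function
degenerates to the maximum of the three pointers (illustrative Python only,
not the actual C):

```python
import itertools

def standby_flush_rec_ptr(receive, replay, restore):
    # Literal transcription of the quoted GetStandbyFlushRecPtr logic,
    # with XLByteLT(a, b) rendered as a < b.
    if receive < replay:
        return restore if replay < restore else replay
    return restore if receive < restore else receive

# combinations_with_replacement yields sorted triples, so the assumed
# ordering restore <= replay <= receive holds by construction; the
# function then always returns the maximum (i.e. receivePtr).
for restore, replay, receive in itertools.combinations_with_replacement(range(6), 3):
    assert standby_flush_rec_ptr(receive, replay, restore) == max(restore, replay, receive)
```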

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Robert Haas
On Tue, Apr 17, 2012 at 1:47 AM, Magnus Hagander  wrote:
> That's probably one reason people aren't jumping on this. Because
> there is no tracker out there that people actually *like*...

I think this is a point worth serious thought.  The bug trackers I've
used have been mostly terrible; saying that they are to bug tracking
what CVS is to version control is insulting CVS.  They're more like
what editing RCS files in vi is to version control - i.e. worse than
not having version control.

To put that in practical terms, I think everyone (including people
like Tom and I) who (a) are old curmudgeons or anyway middle-aged
curmudgeons and (b) would spend much more time in bed with any system
that we adopted than the average hacker would agree that the current
system is kind of a pain.  But there is no point in replacing it with
something else unless that new thing is going to be significantly
better than what we are doing now.  And it's not entirely clear that
such a thing exists.  There are certainly people out there, and even
on this list, who will tell you that system ABC is great.  But for any
given ABC there are also people who will tell you that it's got
significant problems.  We don't need to change anything to get a
system that's got significant problems; we already have one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Ants Aasma
On Tue, Apr 17, 2012 at 7:33 PM, Christopher Browne  wrote:
> Well, there may be cases where the quality of the sample isn't
> terribly important, it just needs to be "reasonable."
>
> I browsed an article on the SYSTEM/BERNOULLI representations; they
> both amount to simple picks of tuples.
>
> - BERNOULLI implies picking tuples with a specified probability.
>
> - SYSTEM implies picking pages with a specified probability.  (I think
> we mess with this in ways that'll be fairly biased in view that tuples
> mayn't be of uniform size, particularly if Slightly Smaller strings
> stay in the main pages, whilst Slightly Larger strings get TOASTed...)

Dealing with non-uniform sizes isn't too hard. analyze.c already does
that. Given a table with B blocks it takes a uniform sample of b
blocks, and runs Vitter's reservoir sampling algorithm over the
resulting blocks to get s random tuples in a single pass. It's quite
easy to prove that this results in each tuple having an equal
probability to appear in the final table. However, it isn't fully
independent sampling - depending on the value of b compared to s and
B, there is a slightly higher probability to see multiple tuples
picked from one page. I'm too lazy to do the math, but for the analyze
case of b = s it probably isn't significant for most practical
purposes, even if B is really large. And it seems to me that when b >=
s the reservoir sampling thresholds could be tweaked at block
boundaries to even out the dependencies.

The ratio of b to s could be tweaked to get lower quality sampling
(samples are more spatially clumped) in exchange for less random I/O.
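The scheme described above can be sketched as follows (the page layout and
parameter names are invented; this is only the shape of the algorithm, not
analyze.c's code):

```python
import random

def block_sample_reservoir(pages, b, s):
    # Stage 1: uniform sample of b page numbers.
    chosen = random.sample(range(len(pages)), min(b, len(pages)))
    # Stage 2: Vitter's algorithm R over the tuples on those pages,
    # keeping s tuples in a single pass.
    reservoir = []
    seen = 0
    for p in chosen:
        for tup in pages[p]:
            seen += 1
            if len(reservoir) < s:
                reservoir.append(tup)
            else:
                j = random.randrange(seen)  # replace with probability s/seen
                if j < s:
                    reservoir[j] = tup
    return reservoir

random.seed(0)
pages = [[(p, i) for i in range(50)] for p in range(100)]  # 100 pages x 50 tuples
sample = block_sample_reservoir(pages, b=30, s=30)
```

Tuning b against s trades sample quality (spatial clumping) against random
I/O, as noted above.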

> Possibly the forms of sampling that people *actually* need, most of
> the time, are more like Dollar Unit Sampling, which are pretty
> deterministic, in ways that mandate that they be rather expensive
> (e.g. - guaranteeing Seq Scan).

I have a gut feeling that Dollar Unit Sampling and other weighted
samples can be done with a similar approach of uniformly sampling
blocks and then running weighted reservoir sampling [1] over the
result.

[1]: http://utopia.duth.gr/~pefraimi/research/data/2007EncOfAlg.pdf
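A sketch of that weighted variant, per the A-Res algorithm described in [1]:
each item draws the key u**(1/w) with u uniform in (0,1), and the s largest
keys form the sample (weights and names here are made up for illustration):

```python
import heapq
import random

def weighted_reservoir(items, s):
    # items: iterable of (value, weight) pairs; one pass, O(n log s).
    heap = []  # min-heap of (key, value); the root is the weakest key
    for value, weight in items:
        key = random.random() ** (1.0 / weight)
        if len(heap) < s:
            heapq.heappush(heap, (key, value))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, value))
    return [value for _, value in heap]

random.seed(1)
# Dollar-unit-style weights: large-balance rows tend to dominate the sample.
rows = [("small", 1)] * 1000 + [("big", 100)] * 10
picked = weighted_reservoir(rows, 20)
```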

Cheers,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] Re: [COMMITTERS] pgsql: Don't override arguments set via options with positional argumen

2012-04-17 Thread Andrew Dunstan



On 04/17/2012 07:19 PM, Andrew Dunstan wrote:



On 04/17/2012 07:08 PM, Robert Haas wrote:
On Tue, Apr 17, 2012 at 6:39 PM, Andrew Dunstan  
wrote:

Don't override arguments set via options with positional arguments.

A number of utility programs were rather careless about parameters
that can be set via both an option argument and a positional
argument. This leads to results which can violate the Principle
of Least Astonishment. These changes refuse to use positional
arguments to override settings that have been made via option
arguments. The changes are backpatched to all live branches.

Branch
--
REL8_3_STABLE

Uh, isn't it kind of a bad idea to back-patch something like this?  It
seems like a behavior change.



It was discussed. I think the previous behaviour is a bug. It can't be 
sane to be allowed to do:


   initdb -D foo bar






You know, I could have sworn it was discussed, but when I look back I 
see it wasn't. I must have been remembering the recent logging protocol bug.


I'll revert it if people want, although I still think it's a bug.

cheers

andrew




[HACKERS] Re: [COMMITTERS] pgsql: Don't override arguments set via options with positional argumen

2012-04-17 Thread Andrew Dunstan



On 04/17/2012 07:08 PM, Robert Haas wrote:

On Tue, Apr 17, 2012 at 6:39 PM, Andrew Dunstan  wrote:

Don't override arguments set via options with positional arguments.

A number of utility programs were rather careless about parameters
that can be set via both an option argument and a positional
argument. This leads to results which can violate the Principle
of Least Astonishment. These changes refuse to use positional
arguments to override settings that have been made via option
arguments. The changes are backpatched to all live branches.

Branch
--
REL8_3_STABLE

Uh, isn't it kind of a bad idea to back-patch something like this?  It
seems like a behavior change.



It was discussed. I think the previous behaviour is a bug. It can't be 
sane to be allowed to do:


   initdb -D foo bar


cheers

andrew




Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Christopher Browne
On Tue, Apr 17, 2012 at 4:15 PM, Jay Levitt  wrote:
> That's a great point. Both GitHub and git itself have no real concept of
> releases, and can't tell you when a commit made it in.

Those factors likely play together in this.

Git is a tool, not a workflow, and intentionally allows its users to
use it in a variety of ways.  (Which includes some very interesting
pathologies visible with stuff like git-annex, git-mail.)

It's not a bug that git can do things differently than the Postgres
project wants things done.

Likewise, it's not a bug that github, which intends to support all
kinds of users using git, does not enforce the preferred Postgres
workflow.

I think it's pretty *normal* that we'd need tooling that won't be
identical to (say) GitHub, and we shouldn't get too exercised about
this.

I wonder if our "fix" instead involves:
a) Adding an ArchiveOpteryx instance to archive mailing list traffic;
http://archiveopteryx.org/
b) A bit more tooling to make it easier to link to threads in that instance
c) Perhaps some management tools based on debbugs to ease scripting of
issues being tracked

That's not prescriptive; just ideas.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



[HACKERS] Re: [COMMITTERS] pgsql: Don't override arguments set via options with positional argumen

2012-04-17 Thread Robert Haas
On Tue, Apr 17, 2012 at 6:39 PM, Andrew Dunstan  wrote:
> Don't override arguments set via options with positional arguments.
>
> A number of utility programs were rather careless about parameters
> that can be set via both an option argument and a positional
> argument. This leads to results which can violate the Principle
> of Least Astonishment. These changes refuse to use positional
> arguments to override settings that have been made via option
> arguments. The changes are backpatched to all live branches.
>
> Branch
> --
> REL8_3_STABLE

Uh, isn't it kind of a bad idea to back-patch something like this?  It
seems like a behavior change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Andrew Dunstan



On 04/17/2012 04:38 PM, Tom Lane wrote:

Jay Levitt  writes:

Greg Smith wrote:

Tracking when and how a bug is backported to older versions is one hard part
of the problem here.

That's a great point. Both GitHub and git itself have no real concept of
releases, and can't tell you when a commit made it in.

We do actually have a somewhat-workable solution for that, see
src/tools/git_changelog.  It relies on cooperation of the committers
to commit related patches with the same commit message and more or
less the same commit time, but that fits fairly well with our practices
anyway.  If we did have an issue tracker I could see expecting commit
messages to include a reference to the issue number, and then it would
not be hard to adapt this program to key on that instead of matching
commit message texts.





Yeah, that would be good.

BTW, since we're discussing trackers yet again, let me put in a plug for 
Bugzilla, which has mature Postgres support, is written in Perl (which a 
large number of hackers are familiar with and which we use extensively), 
has a long history and a large organization behind it (Mozilla) and last 
but not least has out-of-the-box support for creating, updating and 
closing bugs via email (I just set up an instance of the latest release 
with this enabled to assure myself that it works, and it does.) It also 
has XML-RPC and JSON-RPC interfaces, as well as standard browser 
support, although I have not tested the RPC interfaces.


cheers

andrew





Re: [HACKERS] extension allocating shared memory

2012-04-17 Thread Kevin Grittner
Alvaro Herrera  wrote:
> Excerpts from Kevin Grittner's message:
 
>> What is the best way for an extension to allocate shared memory
>> and to access it from every backend? 
 
> RequestAddinShmemSpace
 
Perfect!  That's exactly what I wanted.  I see that the
pg_stat_statements extension is already using it, so I even have a
good working example to look at.
 
Thanks,
 
-Kevin



Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-04-17 Thread Jameison Martin
Thanks for the response.



The use-case I'm targeting is a schema that has multiple tables with ~800 
columns, most of which have only the first 50 or so values set. 800 columns 
would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte 
alignment the row bitmap would take up 104 bytes with the current 
implementation. If only the first 50 or so columns are actually non-null, then 
the minimum bitmap size wouldn't need to be more than 8 bytes, which means the 
proposed change would save 96 bytes. For the data set I have in mind roughly 
90% of the rows would fall into the category of needing only 8 bytes for the 
null bitmap.
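Following that accounting, the arithmetic works out as below (assuming an
8-byte MAXALIGN and, as the figures above do, applying alignment to the
bitmap by itself; the real on-disk layout aligns the whole tuple header, so
treat this as back-of-the-envelope only):

```python
def maxalign(n, align=8):
    # Round n up to the next MAXALIGN boundary (8 bytes assumed).
    return (n + align - 1) // align * align

def bitmap_bytes(natts):
    # One null-bitmap bit per attribute, rounded up to whole bytes,
    # then aligned.
    return maxalign((natts + 7) // 8)

full = bitmap_bytes(800)      # all 800 columns: 100 bytes -> 104 aligned
truncated = bitmap_bytes(50)  # truncated after last non-null: 7 -> 8
saving = full - truncated     # 96 bytes per row, matching the estimate
```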


What kind of test results would prove that this is a net win (or not a net 
loss) for typical cases? Are you interested in some insert performance tests? 
Also, how would you define a typical case (e.g. what kind of data shape)?

Thanks.
-jamie



 From: Tom Lane 
To: Jameison Martin  
Cc: "pgsql-hackers@postgresql.org"  
Sent: Tuesday, April 17, 2012 9:38 AM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows 
to reduce the size of the null bitmap 
 
Jameison Martin  writes:
> The following patch truncates trailing null attributes from heap rows to 
> reduce the size of the row bitmap. 

This has been discussed before, but it always seemed that the
cost-benefit ratio was exceedingly questionable.  You don't get any
savings whatsoever unless you reduce the size of the null bitmap across
a MAXALIGN boundary, which more and more often is 64 bits, so that the
frequency with which the optimization wins anything doesn't look likely
to be that high.  And on the other side of the coin, you're adding
cycles to every single tuple-construction operation to make this work.
The introduction of bugs doesn't seem improbable either.  (Just because
tuples in user tables might have unexpected natts values doesn't mean
that the code is, or should be, prepared to tolerate that in system
tables or plan-constructed tuples.)

So what I'd like to see is some concrete test results proving that this
is a net win, or at least not a net loss, for typical cases.  Just
asserting that it might be a win for certain usage patterns doesn't do
it for me.

            regards, tom lane

Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-04-17 Thread Tom Lane
Greg Stark  writes:
> On Tue, Apr 17, 2012 at 5:38 PM, Tom Lane  wrote:
>> This has been discussed before, but it always seemed that the
>> cost-benefit ratio was exceedingly questionable.  You don't get any
>> savings whatsoever unless you reduce the size of the null bitmap across
>> a MAXALIGN boundary, which more and more often is 64 bits, so that the
>> frequency with which the optimization wins anything doesn't look likely
>> to be that high.

> There is the usage pattern where (brace yourself) people have
> thousands of columns in which they have all but a handful be null.
> They might be pretty happy about this.

Oh, I don't doubt that there are *some* use cases for this.  I'm just
dubious about how much we'd be slowing things down for everybody else.
As I said, what I'd like to see are some benchmarks, and not just
benchmarks that are tuned to match the sort of case where this wins.

regards, tom lane



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Tom Lane
Jay Levitt  writes:
> Greg Smith wrote:
>> Tracking when and how a bug is backported to older versions is one hard part
>> of the problem here.

> That's a great point. Both GitHub and git itself have no real concept of 
> releases, and can't tell you when a commit made it in.

We do actually have a somewhat-workable solution for that, see
src/tools/git_changelog.  It relies on cooperation of the committers
to commit related patches with the same commit message and more or
less the same commit time, but that fits fairly well with our practices
anyway.  If we did have an issue tracker I could see expecting commit
messages to include a reference to the issue number, and then it would
not be hard to adapt this program to key on that instead of matching
commit message texts.
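For illustration, keying on an issue reference instead of message texts might
look like this (the "Issue: #NNNN" convention and every name below are
invented for the sketch; this does not reflect the actual git_changelog
script):

```python
import re
from collections import defaultdict

# Hypothetical convention: commit messages carry a tracker reference
# such as "Issue: #1234", and grouping keys on that number, falling
# back to the subject line when no reference is present.
ISSUE_RE = re.compile(r"Issue:\s*#(\d+)")

def group_by_issue(commits):
    """commits: iterable of (branch, message) pairs."""
    groups = defaultdict(list)
    for branch, msg in commits:
        m = ISSUE_RE.search(msg)
        key = m.group(1) if m else msg.splitlines()[0]
        groups[key].append(branch)
    return dict(groups)

commits = [
    ("master", "Fix planner crash\n\nIssue: #1234"),
    ("REL9_1_STABLE", "Fix planner crash (backpatch)\n\nIssue: #1234"),
    ("master", "Unrelated cleanup"),
]
by_issue = group_by_issue(commits)
# by_issue["1234"] lists every branch the fix landed on.
```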

regards, tom lane



Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-04-17 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 17, 2012 at 3:05 PM, Tom Lane  wrote:
>>> Personally, I find required_outer more clear.  YMMV.

>> Perhaps.  What's bothering me is the potential for confusion with outer
>> joins; the parameter-supplying rels are *not* necessarily on the other
>> side of an outer join.  Anybody else have an opinion about that?

> Well, we also use the words "inner" and "outer" to refer to the sides
> of any join, regardless of type.

True.

>  The thing I don't like about "param_relids" is that "param" can refer
> to an awful lot of different things.

Fair enough.  I'll leave required_outer alone then, and adjust some
names in the new patch to be consistent with that.

As far as the other naming issue goes, it struck me that instead of
join_clause_is_parameterizable_xxx, we could call those functions
join_clause_is_movable_xxx, assuming it's okay to commandeer
the notion of "movable" for this particular usage.  It seems a bit
less generic than "parameterizable" anyway.  The "for" and "within"
bits don't fit with that though.  The first one could reasonably
be called "join_clause_is_movable_to", since we're checking if it's
okay to push the clause to precisely that base relation, but I'm a
bit at a loss for a modifier for the other one.  "into" would be
appropriate, but "to" and "into" are so close together that people
might get confused.

regards, tom lane



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Stephen Frost
Josh,

* Josh Berkus (j...@agliodbs.com) wrote:
> FWIW, the PostGIS folks would *really* love to have a TABLESAMPLE which
> worked with geographic indexes.  This would be tremendously useful for
> constructing low-resolution "zoom out" tiles on maps and similar.

I'm familiar with the concept of 'zoom out' tiles and PostGIS, but I
don't actually see the connection between that and TABLESAMPLE.  Perhaps
I'm missing something, but I've never seen a case where you create 'zoom
out' tiles by just grabbing some portion of the data at random, as that
would end up creating empty spots or missing pieces.

My experience has been with running an algorithm on each record in the
data set to reduce the number of points on a given record (which ends up
reducing the size of the record, etc).  You could also filter out
records which wouldn't be visible due to the small 'size' of the record.
Again, neither of those would benefit from a TABLESAMPLE command.
Perhaps they're thinking it's going to use a GIST index to only pull out
records matching a certain condition..?  Except we have WHERE for that..

Thanks,

Stephen




[HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-04-17 Thread Greg Stark
On Tue, Apr 17, 2012 at 5:38 PM, Tom Lane  wrote:
> This has been discussed before, but it always seemed that the
> cost-benefit ratio was exceedingly questionable.  You don't get any
> savings whatsoever unless you reduce the size of the null bitmap across
> a MAXALIGN boundary, which more and more often is 64 bits, so that the
> frequency with which the optimization wins anything doesn't look likely
> to be that high.

There is the usage pattern where (brace yourself) people have
thousands of columns in which they have all but a handful be null.
They might be pretty happy about this. I'm not sure if that's a use
case that makes sense to optimize for though -- even for them the
space overhead would be noticeable but not a showstopper.

-- 
greg



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Jay Levitt

Greg Smith wrote:

On 04/17/2012 09:20 AM, Jay Levitt wrote:

Antispam is (in the large) a technically unsolvable
problem; even in the '90s, we'd see hackers start poking at our newest
countermeasures within the hour. GitHub is a giant target, and PG
probably benefits here from NOT being one.

Everyone who deals with list moderation and spam issues around PostgreSQL
just got a belly laugh from that comment. Hint: the PostgreSQL lists had
already been around and therefore were being targeted by spammers for over
ten years before GitHub even existed.


Hehe.  OK, we will have to battle this out over drinks if I ever make it to 
PGCon.. but teaser: I've bankrupted Sanford Wallace and taught the DOJ what 
spam was.



Pedantic note/fun fact: There was no email antispam in 1994

I like it when Magnus really gets the details perfect when making a deadpan
joke.


Dammit.  I *fail*.


Anyway, back to serious talk, I believe GitHub is a dead end here because
the "primary key" as it were for issues is a repo. A bug tracker for
PostgreSQL would need to have issues broken down per branch and include
information similar to the release notes for each minor point release.
Tracking when and how a bug is backported to older versions is one hard part
of the problem here.


That's a great point. Both GitHub and git itself have no real concept of 
releases, and can't tell you when a commit made it in.


Although.. there's some sort of new release-note functionality. Maybe I'll 
play and see if it'd be applicable here.


Jay



Re: [HACKERS] libpq URI and regression testing

2012-04-17 Thread Alvaro Herrera

Excerpts from Andrew Dunstan's message of mar abr 17 16:03:50 -0300 2012:

> >> That's one reason for that, but there are probably others in the way of
> >> making this fully portable and automatable.
> 
> This test setup also appears to labor under the illusion that we live in 
> a Unix-only world. And for no good reason that I can tell. The shell 
> script should be ripped out and replaced by a perl script which could 
> actually be used on any windows build host. The MSVC build system also 
> needs adjusting to make it build the test driver, at least.

I'll see about it.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] extension allocating shared memory

2012-04-17 Thread Alvaro Herrera

Excerpts from Kevin Grittner's message of mar abr 17 16:27:21 -0300 2012:
> What is the best way for an extension to allocate shared memory and to
> access it from every backend?  Or, if there is no support existing for
> that, what advice do people have if I want to make that happen?  I don't
> need a lot (probably 1KB would do).

RequestAddinShmemSpace

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-04-17 Thread Robert Haas
On Tue, Apr 17, 2012 at 3:05 PM, Tom Lane  wrote:
> Well, we already made a policy decision that we weren't going to try
> very hard to support merge joins inside parameterized subtrees, because
> the potential growth in planning time looked nasty.  My thought was that
> we might resurrect the parameterized MergeAppendPath code when and if
> we reverse that decision.  At the moment, in fact, I believe that
> add_path is pretty nearly guaranteed to discard a parameterized
> MergeAppendPath immediately upon submission, because the fact that it's
> sorted isn't given any weight for add_path comparisons, and cost-wise
> it's going to look worse than the similarly parameterized plain Append
> that we would have submitted just before.

OK.

>>> Second, I've gotten dissatisfied
>>> with the terminology "required_outer" that was used in the original param
>>> plans patch.  I'm considering a global search and replace with
>>> "param_relids" or some variant of that.
>
>> Personally, I find required_outer more clear.  YMMV.
>
> Perhaps.  What's bothering me is the potential for confusion with outer
> joins; the parameter-supplying rels are *not* necessarily on the other
> side of an outer join.  Anybody else have an opinion about that?

Well, we also use the words "inner" and "outer" to refer to the sides
of any join, regardless of type.  Maybe we could call it
"requires_nestloop_with" or "requires_join_to" or something like that.
 The thing I don't like about "param_relids" is that "param" can refer
to an awful lot of different things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add new replication mode synchronous_commit = 'write'.

2012-04-17 Thread Fujii Masao
On Tue, Apr 17, 2012 at 9:52 PM, Thom Brown  wrote:
> On 16 April 2012 17:21, Fujii Masao  wrote:
>> On Sun, Apr 15, 2012 at 12:13 AM, Thom Brown  wrote:
>>> No, that's not what I was referring to.  If you don't have a standby
>>> (i.e. a single, isolated database cluster with no replication), and
>>> its synchronous_commit is set to 'remote_write', what effect does that
>>> have?
>>
>> It's the same effect as 'on' and 'local' do, i.e., transaction commit waits
>> for only local WAL flush. This behavior is not documented explicitly...
>> How should we change the document? What about adding the following
>> into the explanation of synchronous_commit parameter (maybe the end
>> of second paragraph of that)?
>>
>> -
>> If synchronous_standby_names is not set, on, remote_write and local
>> provide the same synchronization level; transaction commit only waits for
>> local flush.
>> -
>
> Yes, that sounds fine.

Okay, patch attached.

Regards,

-- 
Fujii Masao


synchronous_commit_doc_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] extension allocating shared memory

2012-04-17 Thread Kevin Grittner
What is the best way for an extension to allocate shared memory and to
access it from every backend?  Or, if there is no support existing for
that, what advice do people have if I want to make that happen?  I don't
need a lot (probably 1KB would do).
 
If this just "can't be done" I guess I could start a little daemon to
run alongside PostgreSQL and talk to it through a pipe or TCP on
localhost, but it seems cleaner to have it run within PostgreSQL if
feasible.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-17 Thread Fujii Masao
On Tue, Apr 17, 2012 at 11:50 PM, Kyotaro HORIGUCHI
 wrote:
> Hmm. StandbyMode is a local variable which cannot be accessed in
> checkpointer. But WalRcvInProgress(), which shows whether the wal receiver
> is running, seems to be usable to ENABLE governing progress by
> checkpoint_segments.

Even when walreceiver is not running and WAL files are read from the archive,
checkpoint_segments can trigger a restartpoint. In this case, ISTM a
restartpoint should be scheduled according to checkpoint_segments, so I don't
think that checking WalRcvInProgress() for that purpose is the right thing.
Instead, what about sharing the StandbyMode flag among processes via shared
memory, like XLogCtl?

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-17 Thread Robert Haas
On Sun, Apr 15, 2012 at 8:23 AM, Simon Riggs  wrote:
> On Sat, Apr 7, 2012 at 9:51 PM, Robert Haas  wrote:
>> I think this basically just boils down to too many patches and not
>> enough people.  I was interested in Command Triggers from the
>> beginning of this CommitFest, and I would have liked to pick it up
>> sooner, but there were a LOT of patches to work on for this
>> CommitFest.  The first three CommitFests of this cycle each had
>> between 52 and 60 patches, while this one had 106 which included
>> several very complex and invasive patches, command triggers among
>> them.  So there was just a lot more to do, and a number of the people
>> who submitted all of those patches didn't do a whole lot to help
>> review them, sometimes because they were still furiously rewriting
>> their submissions.  It's not surprising that more patches + fewer
>> reviewers = each patch getting less attention, or getting it later.
>
> This is a good point. The current process lacks inherent scalability.
>
> I would really like us to enforce a policy of 1 patch => 1 review.
> That way we automatically have enough review time, no matter how many
> patches we get. If we don't enforce that, then patch sponsors are more
> likely to take the attitude that review isn't something they need to
> pay for, just the dev work.

I would be generally in favor of that policy, but I would relax it for
people who have only ever submitted a handful of patches, so as to
continue encouraging them to become involved in the community.

It's also worth noting that not all reviews are created equal.  It
takes a lot more time to review command triggers than it does to
review pg_archivecleanup extension-skipping.  It's not important that
the review effort is *exactly* proportional to the size of what has
been submitted, but it *is* important that when we're having a
CommitFest, people are spending the majority of their time on
reviewing, rather than continuing to spend it developing patches that
aren't done yet.  It's perfectly right to say that people should be
given a chance to finish up patches that are almost there, but our
definition of "almost there" has a tendency to expand to the point
where it's laughable by the time the last CommitFest of the cycle rolls 
around.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-17 Thread Fujii Masao
On Tue, Apr 17, 2012 at 3:50 PM, Kyotaro HORIGUCHI
 wrote:
> These seem quite reasonable. These conditions make the following
> conditional expression:
>
>     restorePtr <= replayPtr <= receivePtr
>
> But XLByteLT(receivePtr, replayPtr) should not return true
> under the condition above... Is something wrong in my assumption?

When walreceiver is not running, i.e., when the startup process reads the WAL
files from the archival area, the replay location can advance past the
receive location.

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-04-17 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 17, 2012 at 12:14 PM, Tom Lane  wrote:
>> BTW, after writing the code for it I decided to remove creation of
>> parameterized MergeAppendPaths from allpaths.c, though there is still some
>> support for them elsewhere.  On reflection it seemed to me that the code
>> was willing to create far too many of these, much more than their
>> potential usefulness could justify (remember that parameterized paths must
>> be on the inside of a nestloop, so their sort ordering is typically of
>> marginal use).  We can put that back if we can think of a more restrictive
>> heuristic for when to create them.

> I guess the case in which this would matter is if you wrote something
> like A LJ (B IJ C) where B and/or C has child tables and the best
> method of joining them to each other is a merge join.

Well, we already made a policy decision that we weren't going to try
very hard to support merge joins inside parameterized subtrees, because
the potential growth in planning time looked nasty.  My thought was that
we might resurrect the parameterized MergeAppendPath code when and if
we reverse that decision.  At the moment, in fact, I believe that
add_path is pretty nearly guaranteed to discard a parameterized
MergeAppendPath immediately upon submission, because the fact that it's
sorted isn't given any weight for add_path comparisons, and cost-wise
it's going to look worse than the similarly parameterized plain Append
that we would have submitted just before.

>> Second, I've gotten dissatisfied
>> with the terminology "required_outer" that was used in the original param
>> plans patch.  I'm considering a global search and replace with
>> "param_relids" or some variant of that.

> Personally, I find required_outer more clear.  YMMV.

Perhaps.  What's bothering me is the potential for confusion with outer
joins; the parameter-supplying rels are *not* necessarily on the other
side of an outer join.  Anybody else have an opinion about that?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq URI and regression testing

2012-04-17 Thread Andrew Dunstan



On 04/17/2012 02:47 PM, Alvaro Herrera wrote:

Excerpts from Peter Eisentraut's message of mar abr 17 15:41:04 -0300 2012:

On tis, 2012-04-17 at 10:47 -0300, Alvaro Herrera wrote:

What's the preferred way to make it automatically tested as much as
possible?  I know the buildfarm does not run "installcheck-world", so if
we want it there, it'd need a bit more code on the client side.  I think
it would be wise to have it also run on installcheck-world.

It was agreed during the patch discussion that it shouldn't be run
automatically.

Oh, okay.  I didn't notice that.  I guess we should issue a
call-for-testing, then, so that we ensure it works (FSVO works) in all
(FSVO all) platforms.


Hmm.  Just had this thought: not all platforms support the same socket
types.  Maybe we should have separated the .in file in three parts:
IPv4, IPv6, unix-domain socket.  That way each platform would only run
tests that pertain to it.  Right now there's a single "regress.in" file
that lists all the tests.

That's one reason for that, but there are probably others in the way of
making this fully portable and automatable.



This test setup also appears to labor under the illusion that we live in 
a Unix-only world. And for no good reason that I can tell. The shell 
script should be ripped out and replaced by a perl script which could 
actually be used on any windows build host. The MSVC build system also 
needs adjusting to make it build the test driver, at least.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-04-17 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mar abr 17 15:46:23 -0300 2012:
> On Tue, Apr 17, 2012 at 12:14 PM, Tom Lane  wrote:

> > The core of the patch is in the new functions get_baserel_parampathinfo
> > and get_joinrel_parampathinfo, which look up or construct ParamPathInfos,
> > and join_clause_is_parameterizable_for and
> > join_clause_is_parameterizable_within, which control
> > movement of parameterizable join clauses.  (I'm not that thrilled with the
> > names of the latter two functions, anybody got a better idea?)  The rest
> > of it is pretty much boilerplate changes and replacing ad-hoc logic with
> > uses of this stuff.
> 
> Parameterizable is such a mouthful.  I wish we had a better word.

P13e ?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq URI and regression testing

2012-04-17 Thread Alvaro Herrera

Excerpts from Peter Eisentraut's message of mar abr 17 15:41:04 -0300 2012:
> On tis, 2012-04-17 at 10:47 -0300, Alvaro Herrera wrote:
> > What's the preferred way to make it automatically tested as much as
> > possible?  I know the buildfarm does not run "installcheck-world", so if
> > we want it there, it'd need a bit more code on the client side.  I think
> > it would be wise to have it also run on installcheck-world.
> 
> It was agreed during the patch discussion that it shouldn't be run
> automatically.

Oh, okay.  I didn't notice that.  I guess we should issue a
call-for-testing, then, so that we ensure it works (FSVO works) in all
(FSVO all) platforms.

> > Hmm.  Just had this thought: not all platforms support the same socket
> > types.  Maybe we should have separated the .in file in three parts:
> > IPv4, IPv6, unix-domain socket.  That way each platform would only run
> > tests that pertain to it.  Right now there's a single "regress.in" file
> > that lists all the tests.
> 
> That's one reason for that, but there are probably others in the way of
> making this fully portable and automatable.

Hmm.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-04-17 Thread Robert Haas
On Tue, Apr 17, 2012 at 12:14 PM, Tom Lane  wrote:
> I've been hacking away on a patch to do this, and attached is something
> that I think is pretty close to committable.  It needs another going-over
> and some new regression test cases, but it seems to work, and it fixes a
> number of things besides the above-mentioned issue.  In particular, this
> has a much more principled approach than HEAD does to the problem of where
> to place parameterizable join clauses in the plan tree; that can be seen
> in the one change in the existing regression tests, where we no longer
> generate a redundant upper-level copy of an OR join clause that the old
> code wasn't bright enough to get rid of.
>
> The patch is a bit large because I chose to revise the data representation.
> Instead of each Path having its own required_outer, rows, and
> param_clauses fields, now a parameterized Path has a pointer to a
> ParamPathInfo struct that it shares with other Paths for the same rel and
> the same parameterization.  This guarantees that such paths will have the
> same estimated rowcount, because we only compute that once per
> parameterization, which should save some work as well as making the world
> safe for add_path_precheck.

Seems reasonable.
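
The guarantee described above (one ParamPathInfo per rel and
parameterization, shared by all paths, so their row estimates can never
disagree) is easy to model. The sketch below is an illustration only,
not the planner code; the cache, class, and estimator callback are all
invented for the sketch:

```python
class ParamPathInfo:
    """One shared record per (rel, required_outer) parameterization."""
    def __init__(self, rows):
        self.rows = rows  # row estimate, computed exactly once

_ppi_cache = {}

def get_parampathinfo(rel, required_outer, estimate_rows):
    """Look up or construct the ParamPathInfo for this parameterization.

    Every path for the same rel and required_outer set shares the
    returned object, so the row estimate is computed once per
    parameterization and all such paths report the same rowcount."""
    key = (rel, frozenset(required_outer))
    if key not in _ppi_cache:
        _ppi_cache[key] = ParamPathInfo(estimate_rows(rel, required_outer))
    return _ppi_cache[key]

calls = []
def estimate_rows(rel, required_outer):
    calls.append(rel)   # record how often estimation actually runs
    return 100          # dummy estimate

a = get_parampathinfo("t1", {"t2"}, estimate_rows)
b = get_parampathinfo("t1", {"t2"}, estimate_rows)
assert a is b and len(calls) == 1   # shared struct, estimated once
```

Attaching the estimate to the shared struct rather than to each Path is
what makes a precheck such as add_path_precheck safe: two candidate
paths with the same parameterization are guaranteed to agree on rows,
so they can be compared on cost alone.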

> The only place where this approach proved a bit tricky was in handling
> AppendPaths and MergeAppendPaths, which didn't surprise me because that
> was a rough spot for the old way too (and indeed they aren't handled
> completely correctly in HEAD).  A parameterized path is now *required*
> to enforce all clauses that the join clause movement rules assign to it;
> but Append and MergeAppend don't do qual checking, and I didn't feel like
> changing that.  The method that I have settled on is to require all child
> paths of a parameterized append to have the exact same parameterization,
> IOW we push the qual checks down below the append.  Now the interesting
> point about that is that we want to support Appends wherein some children
> are seqscans and some are indexscans (consider a partitioned table where
> the parent is a dummy empty table with no indexes).  The "raw" situation
> there is that we'll have a plain seqscan path for the parent and then a
> collection of similarly-parameterized indexscan paths for the live
> partition children.  To make it possible to convert that case into a
> parameterized append path, I added parameterization support to seqscans
> and then wrote "reparameterize_path", which changes a Path to increase
> its parameterization level (and thereby assign it more pushed-down join
> clauses to check at runtime).  That allows us to reconfigure the seqscan
> to match the other children.  I've also added such support to
> SubqueryScan, on the grounds that the other common use of append paths is
> UNION ALL across subqueries.  We might later want to add parameterization
> support to other path types, but this seemed like enough for the moment.

OK.

> BTW, after writing the code for it I decided to remove creation of
> parameterized MergeAppendPaths from allpaths.c, though there is still some
> support for them elsewhere.  On reflection it seemed to me that the code
> was willing to create far too many of these, much more than their
> potential usefulness could justify (remember that parameterized paths must
> be on the inside of a nestloop, so their sort ordering is typically of
> marginal use).  We can put that back if we can think of a more restrictive
> heuristic for when to create them.

I guess the case in which this would matter is if you wrote something
like A LJ (B IJ C) where B and/or C has child tables and the best
method of joining them to each other is a merge join.  That doesn't
seem all that likely; normally a hash join or nested loop will be
better.  On the flip side I can't see that generating a bunch of extra
paths is going to hurt you much there either; they will fall away
pretty quickly if they aren't useful for merging.  Now, if you have
something like A IJ B or A LJ B, where B is partitioned, it's clearly
a waste of time to generate parameterized paths with pathkeys.

> The core of the patch is in the new functions get_baserel_parampathinfo
> and get_joinrel_parampathinfo, which look up or construct ParamPathInfos,
> and join_clause_is_parameterizable_for and
> join_clause_is_parameterizable_within, which control
> movement of parameterizable join clauses.  (I'm not that thrilled with the
> names of the latter two functions, anybody got a better idea?)  The rest
> of it is pretty much boilerplate changes and replacing ad-hoc logic with
> uses of this stuff.

Parameterizable is such a mouthful.  I wish we had a better word.

> I have a couple of other ideas in mind in the way of mop-up, but they are
> not in this patch to keep it from bloating even more.  First, I'm thinking
> we should get rid of RelOptInfo.baserestrictcost, thus forcing all scan
> cost estimators to invoke cost_qual_eval explicitly.  That field has been
> vestigial from a

Re: [HACKERS] libpq URI and regression testing

2012-04-17 Thread Peter Eisentraut
On tis, 2012-04-17 at 10:47 -0300, Alvaro Herrera wrote:
> What's the preferred way to make it automatically tested as much as
> possible?  I know the buildfarm does not run "installcheck-world", so if
> we want it there, it'd need a bit more code on the client side.  I think
> it would be wise to have it also run on installcheck-world.

It was agreed during the patch discussion that it shouldn't be run
automatically.

> Hmm.  Just had this thought: not all platforms support the same socket
> types.  Maybe we should have separated the .in file in three parts:
> IPv4, IPv6, unix-domain socket.  That way each platform would only run
> tests that pertain to it.  Right now there's a single "regress.in" file
> that lists all the tests.

That's one reason for that, but there are probably others in the way of
making this fully portable and automatable.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Josh Berkus
Qi, Hackers:

FWIW, the PostGIS folks would *really* love to have a TABLESAMPLE which
worked with geographic indexes.  This would be tremendously useful for
constructing low-resolution "zoom out" tiles on maps and similar.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-17 Thread Robert Haas
On Mon, Apr 16, 2012 at 7:05 PM, Tom Lane  wrote:
>> Hmm. This sounds like something that Tom's recent work on
>> parameterized plans ought to have fixed, or if not, it seems closely
>> related.
>
> Not really.  It's still going to be a nestloop, and as such not terribly
> well suited for queries where there are a lot of matchable rows on both
> sides.  The work I've been doing is really about making nestloops usable
> in cases where join order restrictions formerly prevented it --- but
> Jeff's complaint has nothing to do with that.  (This thought also makes
> me a bit dubious about the nearby suggestions that more indexes will
> fix it.)

I thought Jeff was parenthetically complaining about cases like A LEFT
JOIN (B INNER JOIN C ON b.y = c.y) ON a.x && b.x.  That presumably
would require the parameterized-path stuff to have any chance of doing
partial index scans over B.  However, I understand that's not the main
issue here.

One thing that I think needs some analysis is when the range join idea
is better or worse than a nested loop with inner index-scan, because
potentially those are the options the planner has to choose between,
and the costing model had better know enough to make the right thing
happen.  It strikes me that the nested loop with inner index-scan is
likely to be a win when there are large chunks of the indexed relation
that the nestloop never needs to visit at all - imagine small JOIN big
ON small.a && big.a, for example.  I suppose the really interesting
question is how much we can save when the entirety of both relations
has to be visited anyway - it seems promising, but I guess we won't
know for sure without testing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-17 Thread Robert Haas
On Mon, Apr 16, 2012 at 1:43 PM, Jeff Davis  wrote:
> On Mon, 2012-04-16 at 02:52 -0400, Tom Lane wrote:
>> Jeff Davis  writes:
>> >  1. Order the ranges on both sides by the lower bound, then upper bound.
>> > Empty ranges can be excluded entirely.
>> >  2. Left := first range on left, Right := first range on right
>> >  3. If Left or Right is empty, terminate.
>> >  4. If lower(Left) > upper(Right), discard Right, goto 2
>> >  5. If lower(Right) > upper(Left), discard Left, goto 2
>> >  6. return (Left, Right) as joined tuple
>> >  7. Right := next range on right
>> >  8. goto 3
>>
>> This is surely not correct in detail.  As written, it will be impossible
>> for any tuple on the right side to be joined to more than one left-side
>> tuple.  You will need something analogous to the mark-and-restore
>> rewinding logic in standard mergejoin to make this work.
>
> Every time you discard a tuple on the left, you go to step 2, which
> rewinds the right side back to the first non-discarded tuple.
>
> So, implemented using mark and restore:
>
>  * start off with the marks at the first tuple on each side
>  * "discard" means move the mark down a tuple
>  * setting it back to the first range means restoring to the mark
>  * going to the next range (step 7) just means getting another
>    tuple, without changing the mark
>
> Only one side really needs the mark and restore logic, but it was easier
> to write the pseudocode in a symmetrical way (except step 7).

I'm actually not sure these are equivalent formulations.  Suppose one
side has [i,i] where i ranges from 1 to 1000, and the other side has
the exact same thing plus [1,1000].  That one really big range
will come up second on the right side, and you will not be able to
discard it until you reach the end of the join.  If you just keep
rewinding the right side, you're going to end up with O(n^2) behavior,
whereas if you can discard tuples "from the middle" on the right side,
then you will get O(n) behavior, which is a big difference.  In other
words, in your original algorithm the tuples that you discard in step
4 or 5 need not be the first remaining tuple on whichever side of the
join we're talking about.
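
Jeff's mark-and-restore formulation is easy to model. The sketch below
is illustrative Python, not executor code: it joins two lists of closed
ranges, and it also exhibits the behavior criticized above, since the
mark only ever advances past the front of the right side, so a wide
early range pins it and the rescans degenerate toward O(n^2), even
though the output remains correct:

```python
def range_join(left, right):
    """Return every overlapping (l, r) pair of closed ranges (lo, hi).

    Empty ranges are assumed to have been excluded already (step 1)."""
    left = sorted(left)          # step 1: sort by lower, then upper bound
    right = sorted(right)
    pairs = []
    mark = 0                     # first not-yet-discarded right tuple
    for l_lo, l_hi in left:      # the left side is consumed once, in order
        # step 4: "discard" = advance the mark past right tuples that end
        # before this left begins; later lefts start no earlier, so those
        # rights can never match again
        while mark < len(right) and right[mark][1] < l_lo:
            mark += 1
        # "restore" to the mark, then scan rights while they can overlap
        for r_lo, r_hi in right[mark:]:
            if r_lo > l_hi:      # step 5: no later right starts in range
                break
            if r_hi >= l_lo:     # step 6: emit the joined pair
                pairs.append(((l_lo, l_hi), (r_lo, r_hi)))
    return pairs

# A wide range early on the right side cannot be discarded until the
# left side is exhausted, so every left tuple rescans from the mark:
assert len(range_join([(1, 1), (2, 2), (3, 3)],
                      [(1, 3), (1, 1), (2, 2), (3, 3)])) == 6
```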

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Greg Smith

On 04/17/2012 09:20 AM, Jay Levitt wrote:

Antispam is (in the large) a technically unsolvable
problem; even in the '90s, we'd see hackers start poking at our newest
countermeasures within the hour. GitHub is a giant target, and PG
probably benefits here from NOT being one.


Everyone who deals with list moderation and spam issues around 
PostgreSQL just got a belly laugh from that comment.  Hint:  the 
PostgreSQL lists had already been around and therefore were being 
targeted by spammers for over ten years before GitHub even existed.



Pedantic note/fun fact: There was no email antispam in 1994


I like it when Magnus really gets the details perfect when making a 
deadpan joke.


Anyway, back to serious talk, I believe GitHub is a dead end here 
because the "primary key" as it were for issues is a repo.  A bug 
tracker for PostgreSQL would need to have issues broken down per branch 
and include information similar to the release notes for each minor 
point release.  Tracking when and how a bug is backported to older 
versions is one hard part of the problem here.


For example, Trac, Redmine, and GitHub all have ways to make a commit 
message reference an issue, something like "fixes #X".  That's fine for 
projects that don't have a complicated backport policy, but I haven't 
been able to figure out how to make it work well enough for a PostgreSQL 
bug tracker to end up saving any work here.  In some cases, a bug 
shouldn't be closed until it's been backported to all supported 
releases; other bugs will only be fixed in the relevant releases.


Let's pick a real example from the last week of my life, where having a 
bug tracker would have helped me out.  This appears in a log:


ERROR: missing chunk number 0 for toast value 1167375 in pg_toast_2619

What I should be able to do here is search the bug tracker for these 
words and have it spit out an issue that looks like this


===

Bug:  Fix race condition during toast table access from stale syscache 
entries


Impact:  Transient query failures

Fixed in:

9.2.0:  http://archives.postgresql.org/pgsql-committers/2011-11/msg00012.php

9.1.2:  http://archives.postgresql.org/pgsql-committers/2011-11/msg00016.php

9.0.6:  http://archives.postgresql.org/pgsql-committers/2011-11/msg00014.php

8.4.10: 
http://archives.postgresql.org/pgsql-committers/2011-11/msg00013.php


8.3.17: 
http://archives.postgresql.org/pgsql-committers/2011-11/msg00017.php


8.2.23: 
http://archives.postgresql.org/pgsql-committers/2011-11/msg00015.php


===

Note that the "fixed in" version information doesn't show up until some 
time *after* the bug fix is committed, because they normally get rolled 
into the next minor release in bulk.


A bug tracking system for PostgreSQL will start looking attractive when 
it makes life easier for the people who do these backports and the 
associated release notes.  Start looking at the problem from their 
perspective if you want to figure out how to make that happen.  I don't 
have a good answer to that; I just know that Trac, Redmine, and GitHub 
haven't felt like a good fit, having used every one of that trio for 
multiple years now at some point.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Greg Stark
On Tue, Apr 17, 2012 at 5:33 PM, Christopher Browne  wrote:
> I get the feeling that this is a somewhat-magical feature (in that
> users haven't much hope of understanding in what ways the results are
> deterministic) that is sufficiently "magical" that anyone serious
> about their result sets is likely to be unhappy to use either SYSTEM
> or BERNOULLI.

These both sound pretty useful. "BERNOULLI" is fine for cases where
you aren't worried about time dependency on your data. If you're
looking for the average or total value of some column for example.

SYSTEM just means "I'm willing to trade some unspecified amount of
speed for some unspecified amount of accuracy" which presumably is
only good if you trust the database designers to make a reasonable
trade-off for cases where speed matters and the accuracy requirements
aren't very strict.

> Possibly the forms of sampling that people *actually* need, most of
> the time, are more like Dollar Unit Sampling, which are pretty
> deterministic, in ways that mandate that they be rather expensive
> (e.g. - guaranteeing Seq Scan).

I don't know about that but the cases I would expect to need other
distributions would be ones where you're looking at the tuples in a
non-linear way. Things like "what's the average gap between events" or
"what's the average number of instances per value".  These might
require a full table scan but might still be useful if the data is
going to be subsequently aggregated or joined in ways that would be
too expensive on the full data set.

But we shouldn't let best be the enemy of the good here. Having SYSTEM
and BERNOULLI would solve most use cases and having those would make
it easier to add more later.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-04-17 Thread Tom Lane
Jameison Martin  writes:
> The following patch truncates trailing null attributes from heap rows to 
> reduce the size of the row bitmap. 

This has been discussed before, but it always seemed that the
cost-benefit ratio was exceedingly questionable.  You don't get any
savings whatsoever unless you reduce the size of the null bitmap across
a MAXALIGN boundary, which more and more often is 64 bits, so that the
frequency with which the optimization wins anything doesn't look likely
to be that high.  And on the other side of the coin, you're adding
cycles to every single tuple-construction operation to make this work.
The introduction of bugs doesn't seem improbable either.  (Just because
tuples in user tables might have unexpected natts values doesn't mean
that the code is, or should be, prepared to tolerate that in system
tables or plan-constructed tuples.)
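
The MAXALIGN point above can be quantified with a back-of-envelope
model (assuming 64-bit MAXALIGN of 8 bytes and a 23-byte fixed tuple
header; the helper names are invented for the sketch):

```python
MAXALIGN = 8    # assumption: 64-bit platform
FIXED_HDR = 23  # fixed part of the heap tuple header, before the bitmap

def maxalign(n):
    """Round n up to the next MAXALIGN boundary."""
    return -(-n // MAXALIGN) * MAXALIGN

def header_size(natts):
    """Aligned header size for a tuple whose null bitmap covers natts
    attributes (one bit per attribute)."""
    bitmap_bytes = (natts + 7) // 8
    return maxalign(FIXED_HDR + bitmap_bytes)

# Truncating trailing nulls shrinks natts, but saves space only when the
# smaller bitmap pulls the aligned header across a MAXALIGN boundary:
print(header_size(16), header_size(8))    # 32 24 -> 8 bytes saved
print(header_size(40), header_size(33))   # 32 32 -> no saving at all
```

As the second line shows, dropping even seven trailing null attributes
can save nothing at all, which is the frequency-of-wins concern in a
nutshell.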

So what I'd like to see is some concrete test results proving that this
is a net win, or at least not a net loss, for typical cases.  Just
asserting that it might be a win for certain usage patterns doesn't do
it for me.

regards, tom lane



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Christopher Browne
On Tue, Apr 17, 2012 at 11:27 AM, Stephen Frost  wrote:
> Qi,
>
> * Qi Huang (huangq...@hotmail.com) wrote:
>> > Doing it 'right' certainly isn't going to be simply taking what Neil did
>> > and updating it, and I understand Tom's concerns about having this be
>> > more than a hack on seqscan, so I'm a bit nervous that this would turn
>> > into something bigger than a GSoC project.
>>
>> As Christopher Browne mentioned, this sampling method is not possible
>> without scanning the whole data set. It improves the sampling quality but
>> increases the sampling cost. I think it should also be used only for some
>> special sampling types, not in general. The general sampling methods, as in
>> the SQL standard, should be only the SYSTEM and BERNOULLI methods.
>
> I'm not sure what sampling method you're referring to here.  I agree
> that we need to be looking at implementing the specific sampling methods
> listed in the SQL standard.  How much information is provided in the
> standard about the requirements placed on these sampling methods?  Does
> the SQL standard only define SYSTEM and BERNOULLI?  What do the other
> databases support?  What does SQL say the requirements are for 'SYSTEM'?

Well, there may be cases where the quality of the sample isn't
terribly important, it just needs to be "reasonable."

I browsed an article on the SYSTEM/BERNOULLI representations; they
both amount to simple picks of tuples.

- BERNOULLI implies picking tuples with a specified probability.

- SYSTEM implies picking pages with a specified probability.  (I think
this will give fairly biased results, given that tuples may not be of
uniform size, particularly if Slightly Smaller strings stay in the main
pages whilst Slightly Larger strings get TOASTed...)

I get the feeling that this is a somewhat-magical feature (in that
users haven't much hope of understanding in what ways the results are
deterministic) that is sufficiently "magical" that anyone serious
about their result sets is likely to be unhappy to use either SYSTEM
or BERNOULLI.

Possibly the form of sampling that people *actually* need, most of
the time, is more like Dollar Unit Sampling, which is pretty
deterministic, in ways that mandate that it be rather expensive
(e.g. guaranteeing a Seq Scan).
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



[HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-04-17 Thread Jameison Martin
The following patch truncates trailing null attributes from heap rows to reduce 
the size of the row bitmap. 


Applications often have wide rows in which many of the trailing column values 
are null. On an insert/update, all of the trailing null columns are tracked in 
the row bitmap. This can add a substantial overhead for very wide rows. This 
change truncates heap rows such that the trailing nulls are elided. 

The intuition for this change is that ALTER TABLE t ADD COLUMN c type NULL is a
metadata-only change. Postgres works fine when a row's metadata (tuple
descriptor) is inconsistent with the actual row data: extra columns are assumed 
to be null. This change just adjusts the number of attributes for a row and the 
row bitmap to only track up to the last non-null attribute.

Thanks.

-Jamie Martin

0001-Truncate-trailing-null-attributes-from-heap-rows-to-.patch
Description: Binary data



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Stephen Frost
Qi,

* Qi Huang (huangq...@hotmail.com) wrote:
> > Doing it 'right' certainly isn't going to be simply taking what Neil did
> > and updating it, and I understand Tom's concerns about having this be
> > more than a hack on seqscan, so I'm a bit nervous that this would turn
> > into something bigger than a GSoC project.
> 
> As Christopher Browne mentioned, for this sampling method, it is not possible 
> without scanning the whole data set. It improves the sampling quality but 
> increases the sampling cost. I think it should also be using only for some 
> special sampling types, not for general. The general sampling methods, as in 
> the SQL standard, should have only SYSTEM and BERNOULLI methods. 

I'm not sure what sampling method you're referring to here.  I agree
that we need to be looking at implementing the specific sampling methods
listed in the SQL standard.  How much information is provided in the
standard about the requirements placed on these sampling methods?  Does
the SQL standard only define SYSTEM and BERNOULLI?  What do the other
databases support?  What does SQL say the requirements are for 'SYSTEM'?

Thanks,

Stephen




Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Qi Huang

> > 2. It's not very useful if it's just a dummy replacement for "WHERE
> > random() < ?". It has to be more advanced than that. Quality of the
> > sample is important, as is performance. There was also an
> > interesting idea of implementing monetary unit sampling.
> 
> In reviewing this, I got the impression (perhaps mistaken..), that
> different sampling methods are defined by the SQL standard and that it
> would simply be up to us to implement them according to what the standard
> requires.
> 
> > I think this would be a useful project if those two points are taken
> > care of.
> 
> Doing it 'right' certainly isn't going to be simply taking what Neil did
> and updating it, and I understand Tom's concerns about having this be
> more than a hack on seqscan, so I'm a bit nervous that this would turn
> into something bigger than a GSoC project.
> 

As Christopher Browne mentioned, this sampling method is not possible without
scanning the whole data set. It improves the sampling quality but increases
the sampling cost. I think it should also be used only for some special
sampling types, not in general. The general sampling methods, as in the SQL
standard, should be only the SYSTEM and BERNOULLI methods.

Best Regards and Thanks
Huang Qi Victor
Computer Science of National University of Singapore

Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Greg Stark
On Tue, Apr 17, 2012 at 2:49 PM, Stephen Frost  wrote:
> * Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote:
>> 1. We probably don't want the SQL syntax to be added to the grammar.
>> This should be written as an extension, using custom functions as
>> the API, instead of extra SQL syntax.
>
> Err, I missed that, and don't particularly agree with it..  Is there a
> serious issue with the grammar defined in the SQL standard?  The other
> DBs which provide this- do they use the SQL grammar or something else?
>
> I'm not sure that I particularly *like* the SQL grammar, but if we're
> going to implement this, we should really do it 'right'.

I think the danger is that fiddling with the grammar can be a rat's nest
of learning how to deal with bison grammars, learning how to interpret
the ANSI standard, and bikeshedding. These are all parts of the open
source world, so maybe an argument could be made that they should be
part of a GSOC project, but I fear they would swallow the whole
project.

But I think I agree that doing it as an external module would be
strange and not very useful. I picture it instead as a new scan type
which is basically a copy of heapscan or tidscan and uses various
algorithms to decide which tuples to return. For a first-cut PoC I
would leave out the grammar and just have a GUC that enabled replacing
the heap scan with a sample scan everywhere.

But that would have to be done as a patch to Postgres to add the new
scan type. I don't think it would be made much easier by having a hook
that replaced one scan type with another.

-- 
greg



Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-17 Thread Greg Stark
On Mon, Apr 16, 2012 at 10:20 PM, Simon Riggs  wrote:
> The thing I like most about temp indexes is that they needn't be temporary.
>
> I'd like to see something along the lines of demand-created optional
> indexes, that we reclaim space/maintenance overhead on according to
> some cache management scheme. More space you have, the more of the
> important ones hang around. The rough same idea applies to
> materialised views.

I find this a) really scary, b) a huge increase in scope, and c) kind
of pointless.

a) DDL and shared caches require all kinds of additional
synchronization that would be really annoying to be incurring in the
middle of DML queries unexpectedly. If these indexes are to be useful
for other transactions they would also impose a hidden and
uncontrollable cost on all updates and inserts on the table.

b) I think it would be a lot harder to get working cleanly. You would
need to invent a UI to control the lifetime and resources used by
these objects and deal with duplication between manually created and
dynamically created objects. It's fundamentally a huge shift in the
design of the database changing what things the user maintains as part
of their schema and what things the database maintains transparently.
This is a big project on its own aside from the technical side of
things.

c) If these indexes are useful for many queries then the user can
choose to create them themselves. The indexes you're proposing would
be just as expensive as any manually created indexes and offer few
advantages aside from their automaticness. The same could be
accomplished with some external daemon that just looked at the query
logs and determined which indexes to create or not.

The main advantage of creating dynamic indexes as part of the query
would be lost. Namely that these would be local data structures that
don't need to be synchronized with other transactions and don't need
to be updated by other transactions. They're just part of the query
execution the way spilled hash tables and tuplesort tapes are. You
could build indexes on materialized data resulting from earlier joins
or aggregates and so on.

The point is that if you make them equivalent to normal indexes, just
dynamically maintained, then all you've done is change the way normal
indexes work but haven't really changed the set of queries they're
useful for. That might be a neat UI feature, but if you want to change
the set of queries postgres can handle efficiently at all then you
need something that's fundamentally different from a table index.

As an aside I think some of what you're looking for could be better
handled with some kind of query result cache that could keep around
the materialized data from plan nodes until an event happens that
changes the data. This might be related to the underlying
infrastructure needed for materialized views but I haven't thought too
deeply about it. It seems like a lot of work and a big change from the
current very isolated per-process execution model.

-- 
greg



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Jay Levitt

Alex Shulgin wrote:

Jay Levitt  writes:

(A quick Google shows redmine and especially Trac having spam issues
of their own.)


Ugh, redmine (or trac for that matter) has nothing to do with handling
spam.  I believe a typical bug tracker doesn't handle spam itself; it
lets the mailing system do that.

Surely you can throw in some captcha plugins to try to reduce the spam
posted from the web UI.


Maybe I'm confused - Magnus et al, are we talking spammy issues/issue 
comments/etc, or are we talking more about exposed email addresses?


I assumed we meant spammy issues, like blog comments - spammers post issues 
and comments with links, to get PageRank to their sites.  Email defenses 
wouldn't help here.


Captchas are fairly pointless nowadays, assuming you have someone dedicated 
enough to write a spambot against your bug tracker.  Most of them (even 
reCAPTCHA!) can be >80% defeated by software - many 99% - and there are 
millions of humans hanging out on Mechanical Turk who'll solve them for you 
100%.  Modern anti-spam ends up being a machine learning and systems 
exercise... but that's another mailing list :) I think Google gets more use
out of reCAPTCHA for OCR tweaking than for anti-spam.


Jay



Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-17 Thread Kyotaro HORIGUCHI
Hello, attached to this message is a patch which has not been
tested; it is intended to show the approach.

On Tue, Apr 17, 2012 at 9:38 PM, Kyotaro HORIGUCHI
 wrote:
> But I think referring to checkpoint_segments in such a case should be
> inhibited, and I suppose it is possible using StandbyMode in
> IsCheckpointOnSchedule().
>
> I will correct the patch later.

Hmm. StandbyMode is a local variable which cannot be accessed in the
checkpointer. But WalRcvInProgress(), which shows whether the WAL
receiver is running, seems usable to enable governing progress by
checkpoint_segments.

| IsCheckpointOnSchedule(double progress)
| {
|     /*
|      * Inhibit governing progress by segments in archive recovery.
|      */
|     recovery_in_progress = RecoveryInProgress();
|     if (!recovery_in_progress || WalRcvInProgress())
|     {
|         recptr = recovery_in_progress ? GetXLogReplayRecPtr(NULL) :
|                  GetInsertRecPtr();


How about this?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.


standby_checkpoint_segments_9.2dev_fix_20120417v2.patch
Description: Binary data



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Tom Lane
Stephen Frost  writes:
> * Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote:
>> Another idea that Robert Haas suggested was to add support doing a
>> TID scan for a query like "WHERE ctid<  '(501,1)'". That's not
>> enough work for GSoC project on its own, but could certainly be a
>> part of it.

> I don't think Robert's suggestion would be part of a 'tablesample'
> patch.

Yeah, I don't see the connection either.  It seems more like this
would be a localized hack in tidpath.c and nodeTidscan.c.  I think
it'd be a neat beginning project for somebody, but it's not really
related to the GSoC project as proposed.

regards, tom lane



Re: [HACKERS] [JDBC] Regarding GSoc Application

2012-04-17 Thread Atri Sharma
On Tue, Apr 17, 2012 at 7:37 PM, Andrew Dunstan  wrote:
>
>
> On 04/17/2012 09:12 AM, Atri Sharma wrote:
>>
>> I just had a small doubt I wanted to clarify. I initially said in my
>> proposal that I would be using SPI for getting the FDW API to call PL/Java
>> functions, but now, after discussion with the community, I have changed the
>> approach and I will be using the JNI Invocation API instead for the same
>> purpose. I wanted to know if I need to send a modified proposal for this,
>> or whether the original proposal still holds good.
>
>
> These are not the right forums to ask this question. You need to ask on the
> GSOC specific forum.
>
> cheers
>
> andrew

Thanks, Andrew. I'll keep that in mind.


-- 
Regards,

Atri
l'apprenant



Re: [HACKERS] [JDBC] Regarding GSoc Application

2012-04-17 Thread Andrew Dunstan



On 04/17/2012 09:12 AM, Atri Sharma wrote:
I just had a small doubt I wanted to clarify. I initially said in my
proposal that I would be using SPI for getting the FDW API to call
PL/Java functions, but now, after discussion with the community, I have
changed the approach and I will be using the JNI Invocation API instead
for the same purpose. I wanted to know if I need to send a modified
proposal for this, or whether the original proposal still holds good.


These are not the right forums to ask this question. You need to ask on 
the GSOC specific forum.


cheers

andrew



Re: [HACKERS] Last gasp

2012-04-17 Thread Alex Shulgin

Jay Levitt  writes:
>
>> No meaningful search, eh?  Works for me.
>
> Redmine searches return partial-word matches, and there's no way to
> disable that.  Searching for "test" finds "latest". To me, that's
> broken.

Well, I believe one can plug in a different search engine, like Lucene
or Xapian.  However, it doesn't look like anyone has done that yet for
ticket/wiki history, though there is a Xapian search plugin[1] to index
attachments.

> Also, the UI is very 5 years ago; e.g., "compare revisions" uses the
> same columns-of-radio-buttons approach as MediaWiki. If the goal is a
> tool to reduce friction and increase involvement, you want a smoother
> UX.

Nothing that could not be tweaked with a plugin or core code
modification here either.  Not sure about the magnitude of the effort
required, though.

--
Alex

[1] https://github.com/xelkano/redmine_xapian



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Stephen Frost
* Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote:
> 1. We probably don't want the SQL syntax to be added to the grammar.
> This should be written as an extension, using custom functions as
> the API, instead of extra SQL syntax.

Err, I missed that, and don't particularly agree with it..  Is there a
serious issue with the grammar defined in the SQL standard?  The other
DBs which provide this- do they use the SQL grammar or something else?

I'm not sure that I particularly *like* the SQL grammar, but if we're
going to implement this, we should really do it 'right'.

> 2. It's not very useful if it's just a dummy replacement for "WHERE
> random() < ?". It has to be more advanced than that. Quality of the
> sample is important, as is performance. There was also an
> interesting idea of implementing monetary unit sampling.

In reviewing this, I got the impression (perhaps mistaken...) that
different sampling methods are defined by the SQL standard and that it
would simply be up to us to implement them according to what the
standard requires.

> I think this would be a useful project if those two points are taken
> care of.

Doing it 'right' certainly isn't going to be simply taking what Neil did
and updating it, and I understand Tom's concerns about having this be
more than a hack on seqscan, so I'm a bit nervous that this would turn
into something bigger than a GSoC project.

> Another idea that Robert Haas suggested was to add support doing a
> TID scan for a query like "WHERE ctid < '(501,1)'". That's not
> enough work for GSoC project on its own, but could certainly be a
> part of it.

I don't think Robert's suggestion would be part of a 'tablesample'
patch.  Perhaps a completely different project which was geared towards
allowing hidden columns to be used in various ways in a WHERE clause..
Of course, we'd need someone to actually define that; I don't think
someone relatively new to the project is going to know what experienced
hackers want to do with system columns.

Thanks,

Stephen




Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Heikki Linnakangas

On 17.04.2012 14:55, Qi Huang wrote:

Hi, Heikki. Thanks for your advice. I will change my plan accordingly. But
I have a few questions.

1. We probably don't want the SQL syntax to be added to the grammar.
This should be written as an extension, using custom functions as the
API, instead of extra SQL syntax.


1. "This should be written as an extension, using custom functions as the API". 
Could you explain a bit more what does this mean?


I mean, it won't be integrated into the PostgreSQL server code. Rather, 
it will be a standalone module that can be distributed as a separate 
.tar.gz file, and installed on a server. PostgreSQL has some facilities 
to help you package code as extensions that can be easily distributed 
and installed.



2. It's not very useful if it's just a dummy replacement for "WHERE
random() < ?". It has to be more advanced than that. Quality of the
sample is important, as is performance. There was also an interesting
idea of implementing monetary unit sampling.


2. In the plan, I mentioned using optimizer statistics to improve the quality 
of sampling.


Yeah, that's one approach. Would be nice to hear more about that, how 
exactly you can use optimizer statistics to help the sampling.



I may emphasize that point. I will read about monetary unit sampling and add
to the plan the possibility of implementing this idea.


Ok, sounds good.


Another idea that Robert Haas suggested was to add support doing a TID
scan for a query like "WHERE ctid < '(501,1)'". That's not enough work
for GSoC project on its own, but could certainly be a part of it.


3. I read about the replies on using ctid. But I don't quite understand how that might help. 
ctid is just the physical location of a row version within the table. If I do
"where ctid < '(501, 1)'", what is actually happening?


At the moment, if you do "WHERE ctid = '(501,1)'", you get an access plan 
with a TidScan, which quickly fetches the row from that exact physical 
location. But if you do "WHERE ctid < '(501,1)'", you get a SeqScan, 
which scans the whole table. That's clearly wasteful: you know the 
physical range of pages you need to scan, everything up to page 501, but 
the SeqScan will scan pages > 501, too. The idea is to improve that so 
that you'd only scan the pages up to page 501.



Can I add in this as an optional implementation? I think I can check how to do 
this if I can have enough time in this project.


Yeah, that sounds reasonable.


Besides, I saw the Gsoc site editing has been closed. Should I just submit 
through this mailing list with attachment?


Just post the updated details to this mailing list. Preferably inline, 
not as an attachment. You don't need to post the contact details, 
biography, etc, just updated inch-stones and project details parts.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] libpq URI and regression testing

2012-04-17 Thread Alvaro Herrera

Hi,

When I committed Alex Shulgin's patch to add URI support to libpq, I
included the test harness as well.  However, due to it being in a
separate subdirectory that did not previously have tests, it's not being
run by the buildfarm.

It's not considered in "make installcheck-world" either.

What's the preferred way to make it automatically tested as much as
possible?  I know the buildfarm does not run "installcheck-world", so if
we want it there, it'd need a bit more code on the client side.  I think
it would be wise to have it also run on installcheck-world.

Hmm.  Just had this thought: not all platforms support the same socket
types.  Maybe we should have separated the .in file into three parts:
IPv4, IPv6, and Unix-domain sockets.  That way each platform would only run
tests that pertain to it.  Right now there's a single "regress.in" file
that lists all the tests.

Opinions?

-- 
Álvaro Herrera 



Re: [HACKERS] Slow temporary tables when using sync rep

2012-04-17 Thread Thom Brown
On 17 April 2012 14:35, Heikki Linnakangas
 wrote:
> On 17.04.2012 14:10, Thom Brown wrote:
>>
>> On 17 April 2012 11:30, Heikki Linnakangas
>>   wrote:
>>>
>>> What happens is that we write the commit record if the transaction
>>> accesses
>>> a temporary table, but we don't flush it. However, we still wait until
>>> it's
>>> replicated to the standby. The obvious fix is to not wait for that, see
>>> attached.
>>
>>
>> Tested patch.  Yes, that fixes the problem.  Thanks.
>
>
> Ok, committed.

Thanks Heikki.

-- 
Thom



Re: [HACKERS] Slow temporary tables when using sync rep

2012-04-17 Thread Heikki Linnakangas

On 17.04.2012 14:10, Thom Brown wrote:

On 17 April 2012 11:30, Heikki Linnakangas
  wrote:

What happens is that we write the commit record if the transaction accesses
a temporary table, but we don't flush it. However, we still wait until it's
replicated to the standby. The obvious fix is to not wait for that, see
attached.


Tested patch.  Yes, that fixes the problem.  Thanks.


Ok, committed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Alex Shulgin

Jay Levitt  writes:
>
> (A quick Google shows redmine and especially Trac having spam issues
> of their own.)

Ugh, redmine (or trac for that matter) has nothing to do with handling
spam.  I believe a typical bug tracker doesn't handle spam itself; it
lets the mailing system do that.

Surely you can throw in some captcha plugins to try to reduce the spam
posted from the web UI.

--
Alex



Re: [HACKERS] Bug tracker tool we need

2012-04-17 Thread Jay Levitt

Magnus Hagander wrote:

On Mon, Apr 16, 2012 at 23:48, Jay Levitt  wrote:

- Familiarity: Many developers already have a GitHub account and use it

Most of the more senior developers don't use github. Other than
possibly as a place to store a plain git repository. So that's not
really relevant.


I meant outside developers - the folks you'd like to see more involved in 
the process.



- Patch commenting and git integration encourage actual review-resubmit
cycles instead of "Here, look, I fixed it for you" reviews


The amount of spam coming through that system, and the
inability/unwillingness of github to even care about it is a killer
argument *against* github.

We have working antispam for email. The github antispam is somewhere
around where email antispam was in 1994.


Interesting; I haven't run into this but you're the second person to mention 
it here.  Antispam is (in the large) a technically unsolvable problem; even 
in the '90s, we'd see hackers start poking at our newest countermeasures 
within the hour.  GitHub is a giant target, and PG probably benefits here 
from NOT being one. (A quick Google shows redmine and especially Trac having 
spam issues of their own.)


Pedantic note/fun fact: There was no email antispam in 1994; Canter & Siegel 
posted their infamous USENET Green Card spam that year, but it didn't really 
spread to email for another year or two. Once it did, there were fervent 
debates about whether it should be called "velveeta" to distinguish from the 
USENET variety.



GitHub could well be a non-starter, but if third-party-dependence is really
the holdup, I'd volunteer to write the tools - in fact, a google of [export
issues from github] shows a few that might already suffice.


It *is* a non-starter, because (a) it's a third party dependency, and
(b) AFAIK they don't provide *data access* to the issue trackers.


Sure they do:

http://developer.github.com/v3/issues/

Jay



Re: [HACKERS] [JDBC] Regarding GSoc Application

2012-04-17 Thread Atri Sharma
On Wed, Apr 11, 2012 at 8:17 PM, Atri Sharma  wrote:
>
> On Tue, Apr 10, 2012 at 10:41 PM, Atri Sharma  wrote:
Well, maybe I spoke too soon... JNI is probably the best route.  Since
SPI is off the table, all we're really pulling in from pl/java is the
(non-trivial) proper installation of a jvm into a postgres process.
pl/java is essentially a wrapper to JNI that does postgres to jni type
conversion and builds a database driven class lookup system that
allows building the java environment inside the database.   That part
is not needed at all; since we're not going to install our routines as
pl/java installed they will be installed into a separate and more
typical class path from a jvm point of view. There are no exposed
routines in pljava.so that allow for easy manual invocation of a java
routine -- it's all pretty much hardwired to the language function
call handler system.

This raises the question of why pl/java should be used at all. Well,
as I said the jvm insertion into postgres is non-trivial so that alone
is worth something.  Maybe though we should just rip out the jvm guts
that we really need (mostly some parts of backend.c and some of the
type files) and do a completely standalone jni wrapper.  pl/java's
build system is a mess anyways so we're not losing all that much in
trying to go off on our own and do something.  OTOH, throwing a build
system on top of the growing pile of things to do is turning this into
a daunting project vs the (admittedly cobbled together) approach I was
hoping to do earlier.  Anyways, it's really Atri's call if he's
comfortable proceeding.
>>>
>>> Hi All,
>>>
>>> I think we are back on the initial approach I proposed (hooking directly
>>> into the JVM and executing Java code that calls JDBC). I think the best
>>> way to do this is to create a JVM that executes the Java code and give
>>> control of the JVM to the native API.
>>>
>>> I agree, the only need of Pl/Java that is apparent here is the need of the
>>> Java internals (JDK et al). If we set them up independently, then we can
>>> have the FDW wrapping JDBC directly through JNI. JNI would call pure Java
>>> functions to connect to the JDBC.
>>>
>>> I think we can proceed with this. Once we are done with the API calling
>>> Java functions, I think the rest of the path is easily mapped (writing
>>> Java functions to connect to JDBC).
>>
>>yeah -- it sounds plausible.  I think the next step is to pull one of
>>the fdw projects that is working and currently builds with pgxs. Make
>>sure it is bsd licensed and that you retain the original copyright in
>>the code (feel free to add your own).   The mysql_fdw project for
>>example is a good candidate.  After that we need to strip out all the
>>mysql specific parts so that we have a basic skeleton of the fdw
>>wrapper that still builds.   From there we will need to build in the
>>jni calls as well as jvm initialization code we are going to more or
>>less directly copy from pl/java, but one thing at a time:  IMO,
>>getting the jvm installed and executing a 'hello world' jni is a good
>>milestone to reach before getting into all the mechanics of the remote
>>querying.
>>
>>The jvm initialization code in pl/java we are going to use is in
>>src/C/pljava/backend.c.  A lot of the initialization work done there
>>is irrelevant to what we are trying to do but there are some very
>>important bits like the on_proc_exit handler that does cleanup when
>>the postgres process exits.  Ideally we can drop just the stuff we
>>need from there and get our project to compile.
>>
>
> I agree; at the moment I'll work on getting JNI set up, building a JVM
> using it, and getting a "Hello, World" running in it in PostgreSQL.
>
> I'll keep you posted,
>
> Atri
>

Hi all,

I just had a small doubt I wanted to clarify. I initially said in my
proposal that I would be using SPI to get the FDW API to call
Pl/Java functions, but now, after discussion with the community, I have
changed the approach and will be using the JNI Invocation API instead
for the same purpose.
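For reference, the bare bones of the JNI Invocation API approach look roughly like this. This is a sketch only: it needs a JDK to build (compile against jni.h and link with libjvm), and the class and method names are placeholders for illustration, not anything taken from pl/java.

```c
#include <jni.h>

int main(void)
{
    JavaVM *jvm;
    JNIEnv *env;
    JavaVMInitArgs vm_args;
    JavaVMOption options[1];

    options[0].optionString = "-Djava.class.path=.";
    vm_args.version = JNI_VERSION_1_6;
    vm_args.nOptions = 1;
    vm_args.options = options;
    vm_args.ignoreUnrecognizedOptions = JNI_FALSE;

    /* Create the JVM inside this process. */
    if (JNI_CreateJavaVM(&jvm, (void **) &env, &vm_args) != JNI_OK)
        return 1;

    /* Call a static Java method, e.g. HelloWorld.greet() --
     * placeholder names, standing in for the eventual JDBC glue. */
    jclass cls = (*env)->FindClass(env, "HelloWorld");
    if (cls != NULL)
    {
        jmethodID mid = (*env)->GetStaticMethodID(env, cls, "greet", "()V");
        if (mid != NULL)
            (*env)->CallStaticVoidMethod(env, cls, mid);
    }

    (*jvm)->DestroyJavaVM(jvm);
    return 0;
}
```

In the FDW context the JVM creation would happen once per backend, with an on_proc_exit handler to destroy it, as pl/java's backend.c does.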

I wanted to know whether I need to send a modified proposal for this, or
whether the original proposal still holds good.

Atri

-- 
Regards,

Atri
l'apprenant

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add new replication mode synchronous_commit = 'write'.

2012-04-17 Thread Thom Brown
On 16 April 2012 17:21, Fujii Masao  wrote:
> On Sun, Apr 15, 2012 at 12:13 AM, Thom Brown  wrote:
>> No, that's not what I was referring to.  If you don't have a standby
>> (i.e. a single, isolated database cluster with no replication), and
>> its synchronous_commit is set to 'remote_write', what effect does that
>> have?
>
> It has the same effect as 'on' and 'local', i.e., transaction commit waits
> only for the local WAL flush. This behavior is not documented explicitly...
> How should we change the documentation? What about adding the following
> to the explanation of the synchronous_commit parameter (maybe at the end
> of its second paragraph)?
>
> -
> If synchronous_standby_names is not set, on, remote_write and local
> provide the same synchronization level; transaction commit only waits for
> local flush.
> -

Yes, that sounds fine.

-- 
Thom



Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-17 Thread Kyotaro HORIGUCHI
Sorry, I wrote something wrong.

>> The reason we haven't historically obeyed checkpoint_segments
>> during recovery is that it slows down the recovery
>> unnecessarily if you're restoring from a backup and you replay,
>
> The variable StandbyMode is false during archive recovery, so no
> checkpoint is triggered then.

Nevertheless, checkpoints will still be triggered by checkpoint_timeout,
and may then run at the higher speed governed by checkpoint_segments.
From that point of view this is undesirable behavior.

But I think referring to checkpoint_segments in such a case should be
inhibited, and I suppose that is possible by checking StandbyMode in
IsCheckpointOnSchedule().

I will correct the patch later.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.



Re: [HACKERS] Memory usage during sorting

2012-04-17 Thread Greg Stark
On Mon, Apr 16, 2012 at 10:42 PM, Peter Geoghegan  wrote:
> All but 4 regression tests pass, but the failures don't really count
> as such, since they come down to an assumption in the tests that the
> order certain tuples appear in will be the same as our current
> quicksort implementation returns them, even though, in these
> problematic cases, that order is partially dictated by implementation --
> our quicksort isn't stable, but timsort is.

This is an interesting point. If we use a stable sort we'll probably
be stuck with stable sorts indefinitely. People will start depending
on the stability and then we'll break their apps if we find a faster
sort that isn't stable.

Notably, though, tapesort is not stable (because heapsort is not stable,
so neither the runs nor the merge steps are stable). So people's apps
would appear to work while their sorts fit in memory and fail only on
large data sets. It's easily possible for a user's query to never need
to go to tape, though.

We don't have the luxury of offering both a sort and a stable_sort,
though, because ORDER BY is a single entry point.

All in all I think it's handier to have a stable ORDER BY sort than an
unstable one though. So I'm not necessarily opposed to it even if it
means we're stuck using a stable sort indefinitely.
-- 
greg



Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Qi Huang

Besides, I saw that editing on the GSoC site has been closed. Should I just
submit through this mailing list with an attachment?

Best Regards and Thanks
Huang Qi Victor
Computer Science, National University of Singapore

> Date: Tue, 17 Apr 2012 09:16:29 +0300
> From: heikki.linnakan...@enterprisedb.com
> To: j...@agliodbs.com
> CC: huangq...@hotmail.com; pgsql-hackers@postgresql.org; and...@anarazel.de; 
> alvhe...@commandprompt.com; neil.con...@gmail.com; dan...@heroku.com; 
> cbbro...@gmail.com; kevin.gritt...@wicourts.gov
> Subject: [HACKERS] Gsoc2012 idea, tablesample
> 
> On 24.03.2012 22:12, Joshua Berkus wrote:
> > Qi,
> >
> > Yeah, I can see that.  That's a sign that you had a good idea for a 
> > project, actually: your idea is interesting enough that people want to 
> > debate it.  Make a proposal on Monday and our potential mentors will help 
> > you refine the idea.
> 
> Yep. The discussion withered, so let me try to summarize:
> 
> 1. We probably don't want the SQL syntax to be added to the grammar. 
> This should be written as an extension, using custom functions as the 
> API, instead of extra SQL syntax.
> 
> 2. It's not very useful if it's just a dummy replacement for "WHERE 
> random() < ?". It has to be more advanced than that. Quality of the 
> sample is important, as is performance. There was also an interesting 
> idea of implementing monetary unit sampling.
> 
> I think this would be a useful project if those two points are taken 
> care of.
> 
> Another idea that Robert Haas suggested was to add support for doing a TID 
> scan for a query like "WHERE ctid <  '(501,1)'". That's not enough work 
> for a GSoC project on its own, but could certainly be a part of it.
> 
> -- 
>Heikki Linnakangas
>EnterpriseDB   http://www.enterprisedb.com
> 
  

Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-17 Thread Qi Huang




Hi, Heikki

Thanks for your advice. I will change my plan accordingly. But
I have a few questions.
> 1. We probably don't want the SQL syntax to be added to the grammar. 
> This should be written as an extension, using custom functions as the 
> API, instead of extra SQL syntax.
> 

1. "This should be written as an extension, using custom functions as the API". 
Could you explain a bit more what this means?
> 2. It's not very useful if it's just a dummy replacement for "WHERE 
> random() < ?". It has to be more advanced than that. Quality of the 
> sample is important, as is performance. There was also an interesting 
> idea of implementing monetary unit sampling.

2. In the plan, I mentioned using optimizer statistics to improve the quality 
of sampling; I may emphasize that point. I will read about monetary unit 
sampling and add the possibility of implementing this idea to the plan.
> Another idea that Robert Haas suggested was to add support for doing a TID 
> scan for a query like "WHERE ctid < '(501,1)'". That's not enough work 
> for a GSoC project on its own, but could certainly be a part of it.

3. I read the replies on using ctid, but I don't quite understand how that 
might help. ctid is just the physical location of a row version within the 
table. If I do "where ctid < '(501,1)'", what actually happens? Can I add 
this in as an optional implementation? I think I can check how to do this if 
I have enough time in this project.

Best Regards and Thanks
Huang Qi Victor
Computer Science Department, National University of Singapore


Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-17 Thread Kyotaro HORIGUCHI
Hello,

> The reason we haven't historically obeyed checkpoint_segments
> during recovery is that it slows down the recovery
> unnecessarily if you're restoring from a backup and you replay,

The variable StandbyMode is false during archive recovery, so no
checkpoint is triggered then.

xlog.c:10026 (in some version around 9.2)
| /*
|  * Request a restartpoint if we've replayed too much
|  * xlog since the last one.
|  */
| if (StandbyMode && bgwriterLaunched)
| {
|   if (XLogCheckpointNeeded(readId, readSeg))

> You could argue that you should obey checkpoint_segments in a
> standby server that's caught up with the master, but AFAICS the
> patch doesn't try to detect that.

Concerning checkpoints, there seems to be no need for the standby to
know whether it has caught up with its master or not. Checkpoints are
already triggered by checkpoint_timeout regardless of the
sync_state.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.



Re: [HACKERS] Slow temporary tables when using sync rep

2012-04-17 Thread Thom Brown
On 17 April 2012 11:30, Heikki Linnakangas
 wrote:
> What happens is that we write the commit record if the transaction accesses
> a temporary table, but we don't flush it. However, we still wait until it's
> replicated to the standby. The obvious fix is to not wait for that, see
> attached.

Tested patch.  Yes, that fixes the problem.  Thanks.

-- 
Thom



Re: [HACKERS] Slow temporary tables when using sync rep

2012-04-17 Thread Heikki Linnakangas

On 17.04.2012 02:54, Michael Nolan wrote:
> On Mon, Apr 16, 2012 at 6:27 PM, Thom Brown wrote:
>> Hi,
>>
>> I've noticed that when using synchronous replication (on 9.2devel at
>> least), temporary tables become really slow:
>
> Since temporary tables are only present until the session ends (or
> possibly only until a commit), why are they replicated at all?

They're not replicated.

What happens is that we write the commit record if the transaction 
accesses a temporary table, but we don't flush it. However, we still 
wait until it's replicated to the standby. The obvious fix is to not 
wait for that, see attached.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index e22bdac..d8523f3 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1168,7 +1168,8 @@ RecordTransactionCommit(void)
 	 * Note that at this stage we have marked clog, but still show as running
 	 * in the procarray and continue to hold locks.
 	 */
-	SyncRepWaitForLSN(XactLastRecEnd);
+	if (wrote_xlog)
+		SyncRepWaitForLSN(XactLastRecEnd);
 
 	/* Reset XactLastRecEnd until the next transaction writes something */
 	XactLastRecEnd.xrecoff = 0;



Re: [HACKERS] Last gasp

2012-04-17 Thread Greg Sabino Mullane



> If the feature set is desirable, though, I wonder if Postgres is 
> big/high profile enough for them to figure out some sort of better 
> arrangement. They *love* it when big open-source projects use GitHub 
> as their public repo - they'll email and blog announcements about 
> it - and if there's interest I'd be happy to open a conversation 
> with them.

No need to wonder, we've been in contact with them before and they 
are very pro Postgres. 

> I've looked at it in conjunction with Jenkins CI; it looked nice but was way 
> too heavy-weight for a four-person startup (what's code review?). It's 
> probably much more suitable for a project of this size.

> Gerrit's a full-featured code review app with a tolerable UI;

MediaWiki is just starting to jump into git/Gerrit and there are definitely 
a lot of rough edges in that workflow still.

...

Someone mentioned upthread that github spam was a problem. I'm not sure 
I see the issue here -- wouldn't mail from them still go through our lists 
and our current anti-spam measures anyway?

-- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201204170623
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: [HACKERS] Clobbered parameter names via DECLARE in PL/PgSQL

2012-04-17 Thread Pavel Stehule
Hello

here is a WIP patch of plpgsql_check_function that supports this warning

Regards

Pavel


2012/4/15 Pavel Stehule :
> 2012/4/15 Tom Lane :
>> Pavel Stehule  writes:
>>> We can raise warning from CREATE OR REPLACE FUNCTION - but I would to
>>> like have plpgsql_check_function inside core - and it is better place
>>> for this and similar issues.
>>
>> I agree.  This is a perfectly legal use of nested declaration scopes,
>> so it would be totally inappropriate to complain about it in normal
>> use of a plpgsql function.  On the other hand, it would probably be
>> sane and useful for CHECK FUNCTION to flag any case where an inner
>> declaration shadows an outer-scope name (not only the specific case
>> of topmost block vs function parameter).
>
> yes, it is a very simple check there. There should be "levels" of
> warnings in the future, and performance or semantic warnings.
>
> But we don't need to increase the complexity of CHECK FUNCTION now. The
> design of CHECK FUNCTION was rich enough for these purposes. And we need
> to find a way to push plpgsql_check_function into core first.
>
> Regards
>
> Pavel
>
>
>
>
>
>>
>>                        regards, tom lane


plpgsql_check_function-2012-04-17-1.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-17 Thread Heikki Linnakangas

On 17.04.2012 09:50, Kyotaro HORIGUCHI wrote:
> This is a new version of the patch.
> I replaced GetStandbyFlushRecPtr with GetXLogReplayRecPtr to
> check the progress of checkpoints, following Fujii's suggestion.


The reason we haven't historically obeyed checkpoint_segments during 
recovery is that it slows down the recovery unnecessarily if you're 
restoring from a backup and you replay, say, one week's worth of WAL 
files. If for example you have checkpoint_segments=10 and 
checkpoint_timeout='15 mins' in the server that generated the WAL, you 
would be constantly performing a restartpoint if you trigger one every 
10 segments.


You could argue that you should obey checkpoint_segments in a standby 
server that's caught up with the master, but AFAICS the patch doesn't 
try to detect that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Why can't I use pgxs to build a plpgsql plugin?

2012-04-17 Thread Pavel Stehule
2012/4/17 Heikki Linnakangas :
> On 17.04.2012 07:56, Pavel Stehule wrote:
>>
>> 2012/4/16 Heikki Linnakangas:
>>>
>>> Ok, committed. I fixed the .PHONY line as Tom pointed out, and changed
>>> MSVC
>>> install.pm to also copy the header file.
>>
>>
>> Hello,
>>
>> it doesn't work for modules from contrib directory
>>
>> pavel ~/src/postgresql/contrib/check_plpgsql $ make
>> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
>> -Wdeclaration-after-statement -Wendif-labels
>> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
>> -fwrapv -fexcess-precision=standard -g -fpic -I. -I.
>> -I../../src/include -D_GNU_SOURCE   -c -o check_plpgsql.o
>> check_plpgsql.c
>> check_plpgsql.c:16:21: fatal error: plpgsql.h: No such file or directory
>> compilation terminated.
>
>
> Hmm, the makefile rule I added copies the plpgsql.h file to the
> include/server directory when you do "make install". That makes the file
> available when you build with USE_PGXS=1, without access to the source
> tree, but it doesn't change the situation when you build inside contrib.
>
> If you plop the module directly into contrib, I guess you'll have to do
>
> CFLAGS += -I$(top_srcdir)/src/pl/plpgsql/src
>
> That's what pldebugger has always done.
>

ook, thank you

Pavel

> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
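Putting both build modes together, the arrangement Heikki describes can be sketched as a complete module Makefile. This is illustrative only (the module name is hypothetical); the in-tree branch adds the plpgsql include path, while the PGXS branch relies on the installed header.

```makefile
MODULES = check_plpgsql

ifdef USE_PGXS
# Out-of-tree build: plpgsql.h was installed into include/server
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
# In-tree build under contrib/: point at the plpgsql source directory
subdir = contrib/check_plpgsql
top_builddir = ../..
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
override CFLAGS += -I$(top_srcdir)/src/pl/plpgsql/src
endif
```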
