Re: [HACKERS] Ask help for putting SP-Gist into postgresql

2010-03-26 Thread Albe Laurenz
Pei He wrote:
 I am trying to put the SP-GiST package, a general index framework for
 space-partitioning trees, into the PostgreSQL source code.
 
 SP-GiST was developed for PostgreSQL 8.0. However, it does not
 work with newer versions.
 
 So, which version of PostgreSQL should the submitted patch target?
 And is there anybody who can help with that?
 
 Please cc my email address when replying.

Start here: http://wiki.postgresql.org/wiki/Submitting_a_Patch

Present your idea in detail on this mailing list as a first step,
and ask for help as you need it.

Currently PostgreSQL is in feature freeze for the upcoming 9.0
release, so you should target the first commitfest for 9.1.

Your patches should apply to CVS HEAD.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Streaming Replication and CopyOutResponse message

2010-03-26 Thread Fujii Masao
Hi,

At the end of the replication handshake, a CopyOutResponse
message might not arrive at the standby for a while if there is
no WAL record to send, and walreceiver would get stuck until
that message arrived. This is not a big problem, but it should
be fixed.

The cause is that walsender doesn't call pq_flush() there.
The attached patch changes walsender to call pq_flush()
immediately when sending that message. Is it worth applying
the patch?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


walsender_pg_flush_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)

2010-03-26 Thread Dimitri Fontaine

Disclaimer: the following is only my view on the matter and is not meant
as representative of the project views, which are reached through
discussion and consensus. IOW, my 2¢.

Joseph Adams joeyadams3.14...@gmail.com writes:
 Perhaps we could have some sort of LET statement that allows the
 client to pass data to the server, then have libpq automatically wrap
 queries with the LET statement (when necessary).  Here's what it would
 look like to the PHP scripter:

My first reaction reading this would be to keep the idea but extend the
WITH syntax instead, so you'd have

  WITH BINDINGS bob AS (
 current_user = 'bob'
  )
  SELECT answer FROM secrets WHERE user=current_user AND question=$1;

That said, you can already (as of 8.4) do the following:

  WITH bob(name) AS (
 SELECT 'bob'
  )
  SELECT answer FROM secrets, bob WHERE user=bob.name AND question=$1;

The syntax WITH bob(current_user) is not possible because the
standard reserves current_user as a keyword (IIUC), but you get the idea.

 Granted, it would be silly to pass the value itself to the server over
 and over, so a serious implementation would probably pass a context
 ID, and these variable assignments would live in the backend instead.

I wonder if creating a temporary (I mean per-backend) new catalog in
which to store the bindings (or whatever you name them), and then
allowing another syntax like the following, would help here:

  WITH BINDINGS bob ()
  SELECT ...

The real problem, though, is that when using a transaction-level
pooling system you want to tie your bindings to a transaction, not to a
session. So I'm not sure whether storing the bindings in a local backend
catalog is a must-have feature.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication and CopyOutResponse message

2010-03-26 Thread Heikki Linnakangas
Fujii Masao wrote:
 At the end of the replication handshake, a CopyOutResponse
 message might not arrive at the standby for a while if there is
 no WAL record to send, and walreceiver would get stuck until
 that message arrived. This is not a big problem, but it should
 be fixed.
 
 The cause is that walsender doesn't call pq_flush() there.
 The attached patch changes walsender to call pq_flush()
 immediately when sending that message. Is it worth applying
 the patch?

Yeah, applied.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Incrementally Updated Backups and restartpoints

2010-03-26 Thread Fujii Masao
On Thu, Mar 4, 2010 at 9:00 PM, Fujii Masao masao.fu...@gmail.com wrote:
 And, when we start an archive recovery from a backup taken from the standby,
 we seem to reach a safe starting point before the database has actually become
 consistent. That's because backupStartLoc is zero. Isn't this an issue?

 This issue still seems to happen. So should this be fixed for 9.0?
 Or is a note in the documentation enough for 9.0? I'm leaning
 towards the latter.

I'm thinking of adding something like the following to section
"25.6. Incrementally Updated Backups". Thoughts?


The pg_control file must be backed up first.
This avoids the problem that we might fail to restore a consistent
database state because recovery would start from a restart point
later than the start of the backup.

When recovering from an incrementally updated backup, the server
can begin accepting connections and complete the recovery before
the database has actually become consistent. To avoid these
problems, you must check whether the database has become consistent,
by comparing the progress of the recovery with the backup ending WAL
location, before your users try to connect to the server and when
archive recovery ends. To make that possible, the backup ending WAL
location must be recorded in advance by calling the
pg_last_xlog_replay_location function at the end of the backup. The
progress of the recovery is also obtained from the
pg_last_xlog_replay_location function.
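
For illustration, the check might look something like this (just a
sketch; the recorded value is an example):

-- At the end of taking the backup from the standby:
SELECT pg_last_xlog_replay_location();  -- record this, e.g. '0/1A2B3C4D'

-- While recovering from that backup, before letting users in and when
-- archive recovery ends, repeat until the result has passed the
-- recorded ending location:
SELECT pg_last_xlog_replay_location();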

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)

2010-03-26 Thread Marko Tiikkaja

On 3/26/10 5:42 AM +0200, Joseph Adams wrote:

// New libpq function
pg_set('current_user', 'bob');

$result = pg_query_params(
'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
array('Birth place'));


What this really does is something like:

$result = pg_query_params(
'LET current_user=$1 DO $2 $3',
array(
'bob',
'SELECT answer FROM secrets WHERE user=current_user AND 
question=$1',
'Birth place')
));


Looks to me like this is already achievable with custom GUCs and views.
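
A rough sketch of that approach (assuming an 8.4/9.0-era setup with
custom_variable_classes = 'myapp' in postgresql.conf; the table, view,
and setting names are made up):

SET myapp.username = 'bob';

CREATE VIEW my_secrets AS
    SELECT question, answer
      FROM secrets
     WHERE "user" = current_setting('myapp.username');

SELECT answer FROM my_secrets WHERE question = 'Birth place';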


Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Postgres 9.0 Alpha, GIN indexes, and intarray contrib module, and SQL Functions

2010-03-26 Thread Mike Lewis
Greetings,

I'm using the intarray contrib module[1] GIN indexes on arrays (obviously)
with Postgres 9 alpha 4.  I am querying for the existence of an element.
When I run the query directly, it performs as I'd expect (very fast).  The
explain plan looks like what I'd expect, using the index and whatnot.  When
I put this in a SQL function, it performs very slowly, as if I didn't have an
index at all.  (I thought it might be using the default @> operator, so I
tried the @@ operator, which isn't a normal array operator, but it still
had the issue.)  I also tried putting the query in a plpgsql function to see
if that changed things (it didn't).

Then what I did was uninstall the intarray contrib module and create
regular GIN indexes on the arrays.  When querying by hand it performs fast,
and it also seems to use those indexes inside the UDF (so everything works
when I use the standard GIN indexes).

Is there something I am missing in my function declarations or is this
possibly a bug?

My table looks like:
create table followship_rollups
(
    max_id bigint not null, -- for sorting
    user_id int not null,
    append_frozen bool default false not null,
    follower_ids int[] not null CHECK (my_array_length(follower_ids) <= 100),
    friend_ids int[] not null CHECK (my_array_length(friend_ids) <= 100)
);
create index followship_rollups_expanded_follower on followship_rollups
    using gin (follower_ids gin__int_ops);
create index followship_rollups_expanded_friend on followship_rollups
    using gin (friend_ids gin__int_ops);

My function is:

-- Return true or false if the friendship exists
create or replace function has_follower(user_id integer, follower_id integer)
returns boolean
language sql as $$
(select true from followship_rollups
  where user_id = $1 and follower_ids @> ARRAY[$2])
union all
(select false)
limit 1;
$$;
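
For reference, the direct form that runs fast looks like this (values
illustrative):

explain analyze
select true from followship_rollups
 where user_id = 1 and follower_ids @> ARRAY[2];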

Full SQL is here if you need more info:
http://github.com/mikelikespie/followships/blob/c1a7e8c16159018d7d1154a11169315ac6560178/followships_2.sql
(please forgive the sloppiness, I haven't cleaned it up yet)

Thanks,
Mike Lewis

[1] http://www.postgresql.org/docs/current/static/intarray.html
--
Michael Lewis
lolrus.org
mikelikes...@gmail.com


Re: [HACKERS] Postgres 9.0 Alpha, GIN indexes, and intarray contrib module, and SQL Functions

2010-03-26 Thread Tom Lane
Mike Lewis mikelikes...@gmail.com writes:
 I'm using the intarray contrib module[1] GIN indexes on arrays (obviously)
 with Postgres 9 alpha 4.  I am querying for the existence of an element.
 When I run the query directly, it performs as I'd expect (very fast).  The
 explain plan looks like what I'd expect, using the index and whatnot.  When
 I put this in a SQL function, it performs very slowly, as if I didn't have an
 index at all.  (I thought it might be using the default @> operator, so I
 tried the @@ operator, which isn't a normal array operator, but it still
 had the issue.)  I also tried putting the query in a plpgsql function to see
 if that changed things (it didn't).

 Then what I did was uninstall the intarray contrib module and create
 regular GIN indexes on the arrays.  When querying by hand it performs fast,
 and it also seems to use those indexes inside the UDF (so everything works
 when I use the standard GIN indexes).

I wonder whether you are dealing with a search_path issue.  Was the
function being created/used with the same search_path as you were
using when testing by hand?
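
If that turns out to be the issue, one way to take search_path out of
the equation (a sketch, assuming intarray was installed in the public
schema) is to pin it on the function:

alter function has_follower(integer, integer)
    set search_path = public, pg_catalog;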

Some other remarks not directly related to the complaint:

 My table looks like:
 create table followship_rollups
 (
     max_id bigint not null, -- for sorting
     user_id int not null,
     append_frozen bool default false not null,
     follower_ids int[] not null CHECK (my_array_length(follower_ids) <= 100),
     friend_ids int[] not null CHECK (my_array_length(friend_ids) <= 100)
 );
 create index followship_rollups_expanded_follower on followship_rollups
     using gin (follower_ids gin__int_ops);
 create index followship_rollups_expanded_friend on followship_rollups
     using gin (friend_ids gin__int_ops);

Isn't user_id the primary key for this table?  If so, why isn't it
declared that way?  If not, what the heck *is* the intended structure of
this table?  I would think that the performance-critical part of your
query ought to be the user_id = $1 and the GIN indexes wouldn't be
useful at all (for this particular query anyway).

 create or replace function has_follower(user_id integer, follower_id integer)
 returns boolean
 language sql as $$
 (select true from followship_rollups
   where user_id = $1 and follower_ids @> ARRAY[$2])
 union all
 (select false)
 limit 1;
 $$;

FWIW, this strikes me as pretty non-idiomatic SQL.  I'd have written it
as just
... as $$
  select exists(select 1 from followship_rollups
                where user_id = $1 and follower_ids @> ARRAY[$2])
$$;

That doesn't seem to explain your performance complaint though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)

2010-03-26 Thread Robert Haas
On Thu, Mar 25, 2010 at 11:42 PM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 From what I can tell, a big problem with my jails idea (as well as the
 variables Robert described) is that there really isn't a way to store
 context in the backend specifically for the end client (e.g. a PHP
 script) due to connection pooling.  Also, I almost feel that storing
 such context would be a disadvantage, as it would harm some of the
 referential transparency that pooling and caching take advantage of,
 now and in the future.  However, I'm not going to give up :)

 Perhaps we could have some sort of LET statement that allows the
 client to pass data to the server, then have libpq automatically wrap
 queries with the LET statement (when necessary).  Here's what it would
 look like to the PHP scripter:

 // New libpq function
 pg_set('current_user', 'bob');

 $result = pg_query_params(
        'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
        array('Birth place'));


 What this really does is something like:

 $result = pg_query_params(
        'LET current_user=$1 DO $2 $3',
        array(
                'bob',
                'SELECT answer FROM secrets WHERE user=current_user AND 
 question=$1',
                'Birth place')
        ));


 Here, the hypothetical LET statement executes a query string, binding
 current_user to our desired value.  The client library would wrap all
 future queries in this fashion.

 Granted, it would be silly to pass the value itself to the server over
 and over, so a serious implementation would probably pass a context
 ID, and these variable assignments would live in the backend instead.
 Moreover, LET is a terrible keyword choice here, considering most
 PostgreSQL users won't need to use it explicitly thanks to additional
 libpq support.

 Alternatively (this might require changing the client/server
 protocol), a context ID could be passed back and forth, thus providing
 a way to tell clients apart.

 Implementing this idea requires adding to the backend and to libpq.
 The backend would need at least two new statements.  One would set a
 variable of a session context, creating one if necessary and returning
 its ID.  Another would execute a string as a parameter and bind both
 immediate arguments and session context to it.  libpq would need a
 function to set a variable, and it would need to wrap queries it sends
 out with LET statements if necessary.

 Note that these variables can't be used in pre-defined functions
 unless they are somehow declared in advance.  One idea would be to
 first add global variable support, then make session-local contexts be
 able to temporarily reassign those variables.  Another would be to
 provide an explicit declaration statement.

 Would this make a good proposal for GSoC?:  Implement the backend part
 of my proposal, and create a proof-of-concept wrapper demonstrating
 it.  This way, I add the new statements, but don't mess around with
 existing functionality too much.

Hmm.  I'm not sure exactly what problem you're trying to solve here.
I don't think this is a particularly good design for supporting
variables inside the server, since, well, it doesn't actually support
variables inside the server.  If we just want a crude hack for
allowing the appearance of session-local server-side variables, that
could be implemented entirely in client code - in fact it could be
done as a thin wrapper around libpq that just does textual
substitution of the variables actually referenced by a particular
query.  That wouldn't require any modifications to core PostgreSQL at
all, and it would probably perform better too since you'd not send all
the unnecessary variables with every query.

Of course, you're 100% correct that connection pooling won't
necessarily play well with this feature, but that doesn't mean that we
shouldn't implement it.  For one thing, not everybody uses connection
pooling; for two things, I think global variables (that would behave
sort of like a sequence - they'd act sort of like a single column
single row relation) would also be useful, and those WOULD work in a
connection-pooling environment.

But, I think that implementing any kind of variable support in the
backend is way too ambitious a project for a first-time hacker to get
done in a couple of months.  I would guess that's a two-year project
for a first-time hacker or a one-year project for an experienced
hacker (or a three-week project for Tom Lane).  Here are some ideas
from http://wiki.postgresql.org/wiki/Todo that I think MIGHT be closer
to the right size for GSOC:

Allow administrators to cancel multi-statement idle transactions
Check for unreferenced table files created by transactions that were
in-progress when the server terminated abruptly
Add functions to check correctness of configuration files before they
are loaded live
Add JSON (JavaScript Object Notation) data type [tricky part will be
getting community buy-in on which JSON library to use]

[HACKERS] TODO list updates

2010-03-26 Thread Robert Haas
In reading through the TODO list, I noticed a few things that I think
are done, may be done, or may be partially done.  See below.
Thoughts?  ...Robert

Add missing operators for geometric data types
- this is at least partly done.  not sure if it is entirely done.

Add OR REPLACE to CREATE LANGUAGE
- Done.

Implement full support for window framing clauses.
- Not sure if we made any progress on this or not.

Add PQescapeIdentifierConn()
- Done, as PQescapeIdentifier().

Add UNIQUE capability to non-btree indexes
- This is one application of exclusion constraints, so I'd say this is done.

[GIN] Support empty indexed values (such as zero-element arrays) properly
- I think this might be done but I'm not sure.

Clean up VACUUM FULL's klugy transaction management
- I think this is moot with the new cluster-based VF.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO list updates

2010-03-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 In reading through the TODO list, I noticed a few things that I think
 are done, may be done, or may be partially done.  See below.
 Thoughts?  ...Robert

 Add missing operators for geometric data types
 - this is at least partly done.  not sure if it is entirely done.

I think you are thinking of Teodor's point_ops patch, but what that
did was add GIST index support for some existing geometric operators.
This TODO item suffers from the all-too-common sin of being uselessly
vague; it doesn't identify what operators it thinks are missing.

 Implement full support for window framing clauses.
 - Not sure if we made any progress on this or not.

We made some progress for 9.0, but there's still a lot of unimplemented
territory.

 Add UNIQUE capability to non-btree indexes
 - This is one application of exclusion constraints, so I'd say this is done.

I think exclusion constraints address this as much as it needs to be
addressed for GIST and GIN, neither of which have equality as a core
concept.  Hash, however, does have that.  So I'd vote for narrowing the
entry to support UNIQUE natively in hash indexes and moving it to the
hash-index section.

 [GIN] Support empty indexed values (such as zero-element arrays) properly
 - I think this might be done but I'm not sure.

Not done, not even started.  See the referenced discussions, or the
limitations acknowledged here:
http://developer.postgresql.org/pgdocs/postgres/gin-limit.html

 Clean up VACUUM FULL's klugy transaction management
 - I think this is moot with the new cluster-based VF.

Right, that one's either done or no longer relevant depending on how you
want to look at it.


There is another TODO item that I was just struck by while reading your
response to Joseph Adams:

Fix system views like pg_stat_all_tables to use set-returning
functions, rather than views of per-column functions

What is the point of this proposal?  The reason for the per-column function
implementation was to allow people to slice and dice the underlying data
their own way.  Changing to monolithic SRFs would make that harder, and
I don't see that it's buying anything especially useful.  I'd vote for
taking this off unless someone can explain why it's an improvement.
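
For reference, the slice-and-dice usage in question looks like this
sketch, picking just two of the existing per-table stats functions
(table name illustrative):

select relname,
       pg_stat_get_numscans(oid) as seq_scans,
       pg_stat_get_tuples_inserted(oid) as tuples_inserted
  from pg_class
 where relname = 'accounts';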

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] last_statrequest is in the future

2010-03-26 Thread Greg Smith

Tom Lane wrote:
 Should we redesign the stats signaling logic to work around this,
 or just hope we can nag kernel people into fixing it?

Even if there were something to be done in kernel space, how many years
from now would it be before it made this problem go away for the
majority of near-future 9.0 users?  We've been seeing a fairly regular
stream of "pgstat wait timeout" reports come in.  The one I reported was
from recent hardware and a very mainstream Linux setup.  I'm not real
optimistic that this one can get punted toward the OS and get anything
done about it in time to head off problems in the field.


This particularly pathological case with jaguar is great because it's made
it possible to nail down how to report the problem.  I don't think it's
possible to make a strong conclusion about how to resolve this just from
that data, though.  What we probably need is for your additional logging
code to catch this again on some systems that are not so obviously
broken, to get a better idea what a normal (rather than extreme)
manifestation looks like: how much skew is showing up, whether the skews
do in fact correspond with the wait timeouts, that sort of thing.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] last_statrequest is in the future

2010-03-26 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 Even if there were something to be done in kernel space, how many years
 from now would it be before it made this problem go away for the
 majority of near-future 9.0 users?  We've been seeing a fairly regular
 stream of "pgstat wait timeout" reports come in.  The one I reported was
 from recent hardware and a very mainstream Linux setup.

The behavior I'm seeing in jaguar's reports is difficult to characterize
as anything except a seriously broken machine (see the followup message
with further details).  It may be that there is some
not-obviously-broken-hardware case that we need to deal with, but I have
not seen evidence of it yet.  Can you reproduce your test case with CVS
HEAD and send the log messages?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO list updates

2010-03-26 Thread Robert Haas
On Fri, Mar 26, 2010 at 12:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 In reading through the TODO list, I noticed a few things that I think
 are done, may be done, or may be partially done.  See below.
 Thoughts?  ...Robert

 Add missing operators for geometric data types
 - this is at least partly done.  not sure if it is entirely done.

 I think you are thinking of Teodor's point_ops patch, but what that
 did was add GIST index support for some existing geometric operators.
 This TODO item suffers from the all-too-common sin of being uselessly
 vague; it doesn't identify what operators it thinks are missing.

Well, I'm thinking of that because that TODO item has a reference to
the point_ops patch.  I think we should remove this item and replace
it with any more specific suggestions someone cares to raise.

 Implement full support for window framing clauses.
 - Not sure if we made any progress on this or not.

 We made some progress for 9.0, but there's still a lot of unimplemented
 territory.

Perhaps it could be made more specific.

 Add UNIQUE capability to non-btree indexes
 - This is one application of exclusion constraints, so I'd say this is done.

 I think exclusion constraints address this as much as it needs to be
 addressed for GIST and GIN, neither of which have equality as a core
 concept.  Hash, however, does have that.  So I'd vote for narrowing the
 entry to support UNIQUE natively in hash indexes and moving it to the
 hash-index section.

As far as I know, exclusion constraints would work with hash opclasses
also.  Do you think there's an advantage to having something that is
hash-specific a la the btree-specific stuff we already have?
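
For instance, something like this sketch (table and constraint names
made up) should already enforce uniqueness through a hash index:

create table t (k int);
alter table t add constraint t_k_unique exclude using hash (k with =);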

 [GIN] Support empty indexed values (such as zero-element arrays) properly
 - I think this might be done but I'm not sure.

 Not done, not even started.  See the referenced discussions, or the
 limitations acknowledged here:
 http://developer.postgresql.org/pgdocs/postgres/gin-limit.html

OK.

 Clean up VACUUM FULL's klugy transaction management
 - I think this is moot with the new cluster-based VF.

 Right, that one's either done or no longer relevant depending on how you
 want to look at it.

OK.

 There is another TODO item that I was just struck by while reading your
 response to Joseph Adams:

        Fix system views like pg_stat_all_tables to use set-returning
        functions, rather than views of per-column functions

 What is the point of this proposal?  The reason for the per-column function
 implementation was to allow people to slice and dice the underlying data
 their own way.  Changing to monolithic SRFs would make that harder, and
 I don't see that it's buying anything especially useful.  I'd vote for
 taking this off unless someone can explain why it's an improvement.

I assumed it would be faster and less likely to return inconsistent
results.  If that's not the case then I agree.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] More idle thoughts

2010-03-26 Thread Greg Stark
The Linux kernel had a big push to reduce latency, and one of the
tricks they used was to replace the usual interrupt points with a
call which noted how long it had been since the last interrupt point.
It occurs to me we could do the same for CHECK_FOR_INTERRUPTS() by
conditionally having it call a function which calls gettimeofday and
compares it with the timestamp recorded at the previous CFI().

I think the only gotcha would be at CHECK_FOR_INTERRUPTS() calls which
occur after a syscall that an actual interrupt would cut short. They
might show a long wait even though an actual interrupt would trigger
them right away -- the two that come to mind are reading from the
client and blocking on a semaphore. So I think we would need to add a
parameter to indicate whether that's the case for a given call site. I
think it would be easy to know what value to put there, because I think
we always do a CFI() immediately after such syscalls, but I'm not 100%
sure that's the case.

Obviously this wouldn't run all the time. I'm not even sure it should
run on the build farm because I think doing an extra syscall at these
places might mask timing bugs by synchronizing the bus in a lot of
places. But even a few build farm animals might uncover places where
we don't respond to C-c or hold up the sinval messages etc.

It also doesn't replace our current method of responding to user
complaints -- many if not all of them are relatively subtle cases
where the user is doing something unusual to create a loop that
doesn't normally occur. But we won't know that unless we try.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO list updates

2010-03-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 As far as I know, exclusion constraints would work with hash opclasses
 also.

Yeah, they do.

 Do you think there's an advantage to having something that is
 hash-specific a la the btree-specific stuff we already have?

Sure: it'll be more efficient because of not needing an AFTER trigger.
Now of course this assumes that hash indexes ever grow up enough to play
in the big leagues, which is a pretty dubious proposition; but having
real unique-constraint support is one of the things people would want
from them if they ever did get to be credible production choices.
So this isn't something I'd put at the front of the TODO list for hash
indexes, but it ought to be in there somewhere.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More idle thoughts

2010-03-26 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 The Linux kernel had a big push to reduce latency, and one of the
 tricks they used was to replace the usual interrupt points with a
 call which noted how long it had been since the last interrupt point.
 It occurs to me we could do the same for CHECK_FOR_INTERRUPTS() by
 conditionally having it call a function which calls gettimeofday and
 compares it with the timestamp recorded at the previous CFI().

Hmm.  The only thing you could find out is which CFI call was reporting
the long delay, not what the path of control in between had been.  If
the long computation had been down inside some function that's no longer
active, this wouldn't help much to track it down.  Still, it'd be better
than no data at all.

 I think the only gotcha would be at CHECK_FOR_INTERRUPTS() calls which
 occur after a syscall that an actual interrupt would cut short. They
 might show a long wait even though an actual interrupt would trigger
 them right away -- the two that come to mind are reading from the
 client and blocking on a semaphore. So I think we would need to add a
 parameter to indicate whether that's the case for a given call site. I
 think it would be easy to know what value to put there, because I think
 we always do a CFI() immediately after such syscalls, but I'm not 100%
 sure that's the case.

I'm afraid you're being too optimistic about that.  What I'd think about
is just adding an extra call to reset the delay-time counter after each
such syscall, rather than having two kinds of CFI.

 Obviously this wouldn't run all the time.

Yeah, it would increase the overhead of CFI by orders of magnitude,
so you wouldn't even want to think about building a production version
that way.  But it might be a useful testing option.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-03-26 Thread Simon Riggs
On Sun, 2010-01-31 at 23:43 +0200, Heikki Linnakangas wrote:

 When replaying the deletion record, the standby could look at all the
 heap tuples whose index pointers are being removed, to see which one
 was newest.

Long after coding this, I now realise this really is a dumb-ass idea.

There is no spoon. The index tuples did once point at valid heap tuples.
1. heap tuples are deleted
2. heap tuples become dead
3. index tuples can now be marked killed
4. index tuples are removed
Heap tuples can be removed at step 2, but index tuples can't be removed
until step 4, so the dead index tuples can't be followed reliably to
read the xids. They might be the correct ones, might not, and there is
no way to tell.

So that puts this back to exactly the place we were on 31 Jan:

On Sun, 2010-01-31 at 17:10 -0500, Tom Lane wrote:
 We can always put it back later if nothing better gets
 implemented.

So, barring huge injections of insight, I'll be putting back the
generation of OldestXmin() into the btree delete path.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-03-26 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sun, 2010-01-31 at 23:43 +0200, Heikki Linnakangas wrote:
 When replaying the deletion record, the standby could look at all the
 heap tuples whose index pointers are being removed, to see which one
 was newest.

 Long after coding this, I now realise this really is a dumb-ass idea.

 There is no spoon. The index tuples did once point at valid heap tuples.
 1. heap tuples are deleted
 2. heap tuples become dead
 3. index tuples can now be marked killed
 4. index tuples are removed
 Heap tuples can be removed at step 2, but index tuples can't be removed
 until step 4.

Uh, no, heap tuples can't be removed until after all index entries that
are pointing at them have been removed.  Please tell me you have not
broken this.

 So, barring huge injections of insight, I'll be putting back the
 generation of OldestXmin() into the btree delete path.

I stand by my previous comments about where this is going to end up.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Performance improvement for unique checks

2010-03-26 Thread Gokulakannan Somasundaram
Hi,
   Since we insert a new index entry for every update made to the table,
we inevitably make a uniqueness check against the older version of the
newly inserted row, even when the indexed values are not updated. Of
course I am talking about non-HOT updates. (We will not go to the index
for HOT updates.)

During that uniqueness check:
a) The page which contains the index entry is exclusively locked.
b) We go ahead and visit the heap page for its HeapTupleSatisfiesDirty
check.

If we carry the old tuple's information (its tuple-id) from the heap
update into the index insert, we can avoid the uniqueness check against
that tuple, as we know for sure it won't satisfy the visibility
criteria. If the table has 'n' unique indexes, this avoids 'n' heap
tuple lookups and also increases concurrency in the btree, as the
write-lock duration is reduced.
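
To illustrate the case (a sketch; n_tup_hot_upd in pg_stat_user_tables
shows which path an update took):

create table accounts (
    id      int  primary key,  -- unique index #1
    email   text unique,       -- unique index #2
    balance numeric
);
insert into accounts values (1, 'old@example.com', 0);

-- A HOT update touches no indexed column: no new index entries, hence
-- no uniqueness checks at all.
update accounts set balance = 100 where id = 1;

-- A non-HOT update (an indexed column changes, or the page is full)
-- inserts new entries into BOTH unique indexes, and each insert
-- re-checks uniqueness against the now-dead older row version, even
-- though id never changed.
update accounts set email = 'new@example.com' where id = 1;

select n_tup_upd, n_tup_hot_upd from pg_stat_user_tables
 where relname = 'accounts';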

Any comments?

Thanks,
Gokul.


Re: [HACKERS] dtester-0.1 released

2010-03-26 Thread Steve Singer

Thanks,

I can now fetch off the postgres-dtester repository, but I'm still
getting the error when I try to clone the /dtester repository.


Markus Wanner wrote:
 Steve,

 Oh, thank you for pointing this out. I've fixed that for now.

 Do I need to run git update-server-info after every pull for the http
 protocol to work? Or just once to initialize the repository?

 Note that the git protocol is said to be more efficient. At least
 that's what I've heard. You might want to use that instead (especially
 if http continues to pose problems).

 Kind Regards

 Markus Wanner

--
Steve Singer
Afilias Canada
Data Services Developer
416-673-1142


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] join removal

2010-03-26 Thread Robert Haas
On Sun, Jul 19, 2009 at 10:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah.  Ideally this sort of thing would happen in prepjointree.c, but
 we don't have nearly enough information at that stage.

Tom,

You've mentioned this point a couple of times - what is ideal about
prepjointree?  Reading through the optimizer functions section of
src/backend/optimizer/README, it seems like the earliest point at
which we could do this would be just before the call to
make_one_rel().  I think that would eliminate some redundant
computation.  Right now, if we have A LJ B LJ C we'll try joining A to
C and discover that it's removable; later we'll try joining {A B} to C
and again discover that C is removable.  But maybe it could be
attacked from the other direction: look at C and try to figure out
whether there's some baserel or joinrel to which we can join it
removably.  If we find one, we don't need to bother generating seq
scan or index paths for C, RelOptInfos for joinrels that include C,
etc.
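
For context, a minimal example of the kind of removable join under
discussion (a sketch, assuming b.id is unique and no column of b is
referenced):

create table a (id int primary key, b_id int);
create table b (id int primary key, payload text);

-- 9.0 can drop the join entirely: b contributes no output columns and
-- the primary key on b.id guarantees the join cannot duplicate rows.
explain select a.id from a left join b on a.b_id = b.id;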

The problem with moving it back any further seems to be that we have
to know which clauses are mergejoinable before we can do the necessary
computations; I think flattening of the query tree has to already be
done too.

Obviously this is all 9.1 material, but PG East has me thinking about
this topic again...

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] join removal

2010-03-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Jul 19, 2009 at 10:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah.  Ideally this sort of thing would happen in prepjointree.c, but
 we don't have nearly enough information at that stage.

 You've mentioned this point a couple of times - what is ideal about
 prepjointree?

Well, it's the place where we like to rearrange the join tree, and
dropping a join altogether certainly counts as that.  We can't do it
there, at the moment anyway, for lack of supporting data --- but if
it were possible to do it at that time I think it'd be the cleanest
approach.

 Reading through the optimizer functions section of
 src/backend/optimizer/README, it seems like the earliest point at
 which we could do this would be just before the call to
 make_one_rel().  I think that would eliminate some redundant
 computation.

Maybe.  It would also add a new pass over the join tree that, in
99% of cases, would make no useful contribution whatever.  It's
possible that this would still be cheaper than a lot of failed calls
to join_is_removable, but I'm unconvinced --- we were able to make
the failure path through that pretty durn cheap for most simple cases.
The approach you're sketching still involves a combinatorial search
so I doubt it's going to be cheap.

I should maybe pause here a moment to say that my approach to
considering the cost of new planner optimizations is to focus on how
much time the added code will eat on queries where it fails to make any
useful contribution.  If the optimization wins, then presumably you will
make back at execution time whatever it might have cost you to plan
(if this is debatable, you probably shouldn't be bothering with the idea
at all).  So the pain will come from adding planning time on queries
where there isn't any runtime payoff; especially for something like join
removal, which only applies to a small minority of queries anyway.
Therefore I'm suspicious of adding new passes over the query structure
if they are only going to be used for low-probability wins.

 The problem with moving it back any further seems to be that we have
 to know which clauses are mergejoinable before we can do the necessary
 computations; I think flattening of the query tree has to already be
 done too.

Yeah.  I had been thinking that we could build the RelOptInfo and
IndexOptInfo structs earlier, but really all of the
clause-classification work done by deconstruct_jointree is important
as well for this function's purposes, so it's not that easy to push
back to prepjointree :-(.  I suspect that any such attempt would end
up requiring a massive rethinking of the order of operations in the
planner.  Maybe we should do that sometime but I'm not eager for it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)

2010-03-26 Thread Joseph Adams
On Fri, Mar 26, 2010 at 12:07 PM, Robert Haas robertmh...@gmail.com wrote:

 Hmm.  I'm not sure exactly what problem you're trying to solve here.
 I don't think this is a particularly good design for supporting
 variables inside the server, since, well, it doesn't actually support
 variables inside the server.  If we just want a crude hack for
 allowing the appearance of session-local server-side variables, that
 could be implemented entirely in client code - in fact it could be
 done as a thin wrapper around libpq that just does textual
 substitution of the variables actually referenced by a particular
 query.  That wouldn't require any modifications to core PostgreSQL at
 all, and it would probably perform better too since you'd not send all
 the unnecessary variables with every query.

One problem with a textual substitution is that implicit variable use
(e.g. selecting from a view) can't be substituted, at least not
trivially.  As for sending unnecessary variables with every query,
my idea was to store those variables in a global table keyed by
context ID, then just send that context ID with every query.

 But, I think that implementing any kind of variable support in the
 backend is way too ambitious a project for a first-time hacker to get
 done in a couple of months.  I would guess that's a two-year project
 for a first-time hacker or a one-year project for an experienced
 hacker (or a three-week project for Tom Lane).  Here are some ideas
 from http://wiki.postgresql.org/wiki/Todo that I think MIGHT be closer
 to the right size for GSOC:
 [...]
 Add JSON (JavaScript Object Notation) data type [tricky part will be
 getting community buy-in on which JSON library to use]

The JSON idea caught my eye.  I guess the best approach here would be
not to use an external library, but to implement it manually using
flex/bison.  Most of the work would probably revolve around converting
things to/from PostgreSQL types, writing test cases, and getting it
integrated; writing the parser itself should be a piece of cake.

At first, I figured adding JSON support would be almost too trivial:
just parse it, then you're done.  After seeing that
src/backend/utils/adt/xml.c is 3497 lines, I learned there's a bit
more to it :)

I skimmed through some JSON implementations in C, and I didn't find
any using bison/flex.  From the looks of it, I do like JSON_parser (
http://fara.cs.uni-potsdam.de/~jsg/json_parser/ ) because it appears
to be written for speed.

I think one benefit of adding JSON support is that it would provide a
way to store EAV-type data with less overhead than XML (and no
dependency on an external library).  If this were the only goal,
binary encoding would be even better.  However, I suppose JSON is more
popular and easier to work with in practice.
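
For what it's worth, usage might eventually look something like this
(purely hypothetical, since no json type exists yet):

create table events (id serial, doc json);
insert into events (doc) values ('{"user": "bob", "action": "login"}');
select doc from events where id = 1;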

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers