[HACKERS] Index-only scan performance regression

2012-01-28 Thread Dean Rasheed
Given a freshly created table (not vacuumed), and a query that uses an
index-only scan, for example:

CREATE TABLE foo(a int PRIMARY KEY);
INSERT INTO foo SELECT * FROM generate_series(1,100);
ANALYSE foo;

EXPLAIN ANALYSE SELECT count(*) FROM foo WHERE a <= 1;

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=322.86..322.87 rows=1 width=0) (actual time=23.646..23.646 rows=1 loops=1)
   ->  Index Only Scan using foo_pkey on foo  (cost=0.00..300.42 rows=8975 width=0) (actual time=0.027..22.291 rows=1 loops=1)
         Index Cond: (a <= 1)
         Heap Fetches: 1
 Total runtime: 23.673 ms
(5 rows)


the actual performance is much worse than the equivalent index scan,
as used in 9.1 and earlier:

SET enable_indexonlyscan = off;
EXPLAIN ANALYSE SELECT count(*) FROM foo WHERE a <= 1;

                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=322.86..322.87 rows=1 width=0) (actual time=3.219..3.219 rows=1 loops=1)
   ->  Index Scan using foo_pkey on foo  (cost=0.00..300.42 rows=8975 width=0) (actual time=0.014..2.302 rows=1 loops=1)
         Index Cond: (a <= 1)
 Total runtime: 3.240 ms
(4 rows)


Obviously this is the worst case for an index-only scan, since there
is no visibility map and it has to fetch each tuple from the heap,
but ideally it should perform about the same as an ordinary index
scan, since it's doing pretty much the same work.

Digging around, it looks like the additional cost is coming from
visibilitymap_test(), which is calling smgrexists() for each tuple to
see if the visibility map file has been created. So it's doing a
filesystem check for each row, for as long as the visibility map
doesn't exist.

I'm not really familiar with this code, but a possible fix seems to be
to send an invalidation message in vm_extend() when it creates or
extends the visibility map, so that vm_readbuf() only has to re-check
the visibility map file if smgr_vm_nblocks has been invalidated. With
this change (attached) the worst-case index-only scan time becomes
around the same as the index scan time.
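
For illustration, a minimal standalone C sketch of that caching idea
(hypothetical names, not the attached patch): probe for the file once,
remember the answer, and re-probe only after an explicit invalidation
like the one proposed for vm_extend():

#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Illustrative sketch only -- hypothetical names, not the attached patch.
 * The idea: remember whether the visibility map fork exists and re-check
 * the filesystem only after an invalidation, instead of once per tuple.
 */
typedef struct VMExistsCache
{
    bool    valid;      /* has the cached answer been invalidated? */
    bool    exists;     /* cached result of the existence probe */
} VMExistsCache;

/* would be called from the vm_extend() path, when the file is created */
static void
vm_exists_cache_invalidate(VMExistsCache *cache)
{
    cache->valid = false;
}

/* the hot path: cheap once the answer has been cached */
static bool
vm_file_exists(VMExistsCache *cache, const char *path)
{
    if (!cache->valid)
    {
        struct stat st;

        cache->exists = (stat(path, &st) == 0);
        cache->valid = true;
    }
    return cache->exists;
}

int
main(void)
{
    VMExistsCache cache = {false, false};

    printf("exists: %d\n", (int) vm_file_exists(&cache, "vm_fork_file"));
    printf("exists (cached): %d\n", (int) vm_file_exists(&cache, "vm_fork_file"));
    vm_exists_cache_invalidate(&cache);     /* e.g. after vm_extend() runs */
    printf("exists (re-probed): %d\n", (int) vm_file_exists(&cache, "vm_fork_file"));
    return 0;
}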

Regards,
Dean


index-only-scan.patch
Description: Binary data



Re: [HACKERS] TS: Limited cover density ranking

2012-01-28 Thread karavelov
- Quoting Oleg Bartunov (o...@sai.msu.su), 28.01.2012 at 21:04 - 

> I suggest you work on more general approach, see 
> http://www.sai.msu.su/~megera/wiki/2009-08-12 for example. 
> 
> btw, I don't like you changed ts_rank_cd arguments. 

Hello Oleg, 

Thanks for the feedback. 

Is it OK to begin with adding an extra argument and a check in calc_rank_cd? 

I could change the function names in order not to overload the ts_rank_cd 
arguments. My proposal is: 

at sql level: 
ts_rank_lcd([weights], tsvector, tsquery, limit, [method]) 

at C level: 
ts_ranklcd_wttlf 
ts_ranklcd_wttl 
ts_ranklcd_ttlf 
ts_ranklcd_ttl 

Adding the functions could be done as an extension but they are just 
trampolines into calc_rank_cd(). 

I agree that what you describe in the wiki page is a more general approach. So 
this: 

SELECT ts_rank_lcd(to_tsvector('a b c'), to_tsquery('a&c'),2 )>0; 

could be replaced with 

SELECT to_tsvector('a b c') @@ to_tsquery('(a ?2 c)|(c ?2 a) '); 

but if we need to look for 3 or more nearby terms regardless of order, the tsquery 
with the '?' operator becomes quite complicated. For example 

SELECT tsvec @@ 
'(a ? b ? c) | (a ? c ? b) | (b ? a ? c) | (b ? c ? a) | (c ? a ? b) | (c ? b ? 
a)'::tsquery; 

is the same as 

SELECT ts_rank_lcd(tsvec, 'a&b&c'::tsquery,2)>0; 

So this is the reason to think that the general approach does not exclude the 
usefulness of the approach that I am proposing. 

Best regards 

-- 
Luben Karavelov


Re: [HACKERS] pg_dumpall and temp_tablespaces dependency problem

2012-01-28 Thread Tom Lane
Heikki Linnakangas  writes:
> Barring objections, I'll write a patch to relax the checking on 
> default_text_search_config and temp_tablespaces to match search_path.

This seems like something that's going to come back again and again.
What do you think of changing things so that ALTER ROLE/DATABASE SET
*never* throw hard errors for bogus-seeming values?

regards, tom lane



Re: [HACKERS] Group commit, revised

2012-01-28 Thread Jeff Janes
On Fri, Jan 27, 2012 at 5:35 AM, Heikki Linnakangas
 wrote:
> On 26.01.2012 04:10, Robert Haas wrote:
>
>>
>> I think you should break this off into a new function,
>> LWLockWaitUntilFree(), rather than treating it as a new LWLockMode.
>> Also, instead of adding lwWaitOnly, I would suggest that we generalize
>> lwWaiting and lwExclusive into a field lwWaitRequest, which can be set
>> to 1 for exclusive, 2 for shared, 3 for wait-for-free, etc.  That way
>> we don't have to add another boolean every time someone invents a new
>> type of wait - not that that should hopefully happen very often.  A
>> side benefit of this is that you can simplify the test in
>> LWLockRelease(): keep releasing waiters until you come to an exclusive
>> waiter, then stop.  This possibly saves one shared memory fetch and
>> subsequent test instruction, which might not be trivial - all of this
>> code is extremely hot.
>
>
> Makes sense. Attached is a new version, doing exactly that.

Others are going to test this out on high-end systems.  I wanted to
try it out on the other end of the scale.  I used a Pentium 4, 3.2GHz,
with 2GB of RAM and a single IDE drive running ext4.  ext4 is
amazingly bad on IDE, giving about 25 fsyncs per second (and it lies
about fdatasync, but apparently not about fsync).

I ran three modes: head, head with commit_delay, and the group_commit patch.

shared_buffers = 600MB
wal_sync_method=fsync

optionally with:
commit_delay=5
commit_siblings=1

pgbench -i -s40

for clients in 1 5 10 15 20 25 30
pgbench -T 30 -M prepared -c $clients -j $clients

ran 5 times each, taking maximum tps from the repeat runs.

The results are impressive.

clients   head   head_commit_delay   group_commit
      1   23.9                23.0           23.9
      5   31.0                51.3           59.9
     10   35.0                56.5           95.7
     15   35.6                64.9          136.4
     20   34.3                68.7          159.3
     25   36.5                64.1          168.8
     30   37.2                83.8           71.5

I haven't inspected that deep fall off at 30 clients for the patch.

By way of reference, if I turn off synchronous commit, I get
tps=1245.8, which is 100% CPU limited.  This sets a theoretical upper
bound on what could be achieved by the best possible group commit
method.

If the group_commit patch goes in, would we then rip out commit_delay
and commit_siblings?



Cheers,

Jeff


Re: [HACKERS] unfriendly error when accessing NEW in statement-level trigger

2012-01-28 Thread Tom Lane
Jan Urbański  writes:
> When you try to use the NEW variable in a statement-level trigger, you get
> ERROR:  record "new" is not assigned yet
> DETAIL:  The tuple structure of a not-yet-assigned record is indeterminate.

> I'm not that familiar with PL/pgSQL code, so I'm not sure how or if this
> should be fixed. From a quick look it seems that the NEW and OLD
> variables could be defined as row variables instead of record, since the
> format of the tuple is known.

Yeah, maybe.  I suspect that that wouldn't have worked when the code was
first designed, but now that we create a separate execution context for
each trigger target table, it should be the case that NEW/OLD have
stable structure.  You'd need to watch out for ALTER TABLE commands
though --- I don't remember exactly what plpgsql does about rowtypes
changing underneath it.

regards, tom lane



Re: [HACKERS] Second thoughts on CheckIndexCompatible() vs. operator families

2012-01-28 Thread Tom Lane
Noah Misch  writes:
> On Thu, Jan 26, 2012 at 08:23:34AM -0500, Robert Haas wrote:
>> I'm just going to remove the test.  This is not very future-proof and

> [ objections ]

FWIW, I concur with Robert's choice here.  This test method is ugly and
fragile, and I'm not even thinking about the question of whether an
installation might have GUC settings that affect it.  My objection is
that you're assuming that nowhere else, anywhere in the large amount of
code executed by the queries under test, will anyone ever wish to insert
another elog(DEBUG) message.

> I used the same strategy in another ALTER TABLE patch this CF:
> http://archives.postgresql.org/message-id/20120126033956.gc15...@tornado.leadboat.com

That's going to need to be removed before commit too, then.

regards, tom lane



[HACKERS] unfriendly error when accessing NEW in statement-level trigger

2012-01-28 Thread Jan Urbański
When you try to use the NEW variable in a statement-level trigger, you get

ERROR:  record "new" is not assigned yet
DETAIL:  The tuple structure of a not-yet-assigned record is indeterminate.

which is not too friendly, given that people sometimes forget to specify
FOR EACH ROW at all, get the default behaviour of FOR EACH STATEMENT
and scratch their heads. A quick search on the error detail reveals a
few such confused users already.

What's more, the documentation for PL/pgSQL triggers says that "This
variable is NULL in statement-level triggers" but it's not really a
NULL, for instance you can't execute "x is NULL" with it (you get the
same error).

I'm not that familiar with PL/pgSQL code, so I'm not sure how or if this
should be fixed. From a quick look it seems that the NEW and OLD
variables could be defined as row variables instead of record, since the
format of the tuple is known.

Would that make sense?

Cheers,
Jan

PS: If changing them to row types would induce too much code churn, then
maybe we could use some trick to check whether the error comes from using the
OLD or NEW variable and add a HINT to the error message?

J



Re: [HACKERS] isolationtester seems uselessly rigid as to length of permutation

2012-01-28 Thread Alvaro Herrera

Excerpts from Tom Lane's message of Sat Jan 28 18:08:36 -0300 2012:
> I thought it'd be a good idea to put in some basic test cases for the
> EvalPlanQual code using the isolationtester infrastructure.  While
> fooling with it, I soon ran into this restriction:
> 
> if (p->nsteps != nallsteps)
> {
>     fprintf(stderr, "invalid number of steps in permutation %d\n", i + 1);
>     exit_nicely();
> }
> 
> ie, a "permutation" list has to specify exactly as many steps as there
> are in the spec file.  This seems to me to be a useless restriction,
> because it prevents running a test sequence that repeats some steps or
> leaves other steps out.  Barring objections, I'm going to remove the
> above lines.

Yes, sorry, that patch was a mistake (misdiagnosed problem, later
patched differently).  My FOR KEY SHARE patch deals with that too, but
I'm happy with just removing the test.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Second thoughts on CheckIndexCompatible() vs. operator families

2012-01-28 Thread Noah Misch
On Thu, Jan 26, 2012 at 08:23:34AM -0500, Robert Haas wrote:
> On Thu, Jan 26, 2012 at 6:55 AM, Noah Misch  wrote:
> > On Wed, Jan 25, 2012 at 11:53:10PM -0500, Tom Lane wrote:
> >> Not only is that code spectacularly unreadable, but has nobody noticed
> >> that this commit broke the buildfarm?
> >
> > Thanks for reporting the problem.  This arose because the new test case
> > temporarily sets client_min_messages=DEBUG1.  The default buildfarm
> > configuration sets log_statement=all in its postgresql.conf, so the client
> > gets those log_statement lines.  I would fix this as attached, resetting the
> > optional logging to defaults during the test cases in question.  Not
> > delightful, but that's what we have to work with.
> 
> I'm just going to remove the test.  This is not very future-proof and

It would deserve an update whenever we add a new optional-logging GUC
pertaining to user backends.  Other tests require similarly-infrequent
refreshes in response to other changes.  Of course, buildfarm members would
not be setting those new GUCs the day they're available.  Calling for an
update to the test could reasonably fall to the first buildfarm member owner
who actually decides to use a new GUC in his configuration.

> an ugly pattern if it gets copied to other places.  We need to work on

I would rather folks introduce ugliness to the test suite than introduce
behaviors having no test coverage.

> a more sensible way for ALTER TABLE to report what it did, but a
> solution based on what GUCs the build-farm happens to set doesn't seem
> like it's justified for the narrowness of the case we're testing here.

It's not based on what GUCs the buildfarm happens to set.  I looked up all
GUCs that might create problems such as the one observed here.  They were the
four I included in the patch, plus debug_pretty_print, debug_print_parse,
debug_print_plan and debug_print_rewritten.  I concluded that the four debug_*
ones were materially less likely to ever get set in postgresql.conf for a
"make installcheck" run, so I left them out for brevity.

The setting changed *by default* for buildfarm clients is log_statement.
Buildfarm member owners can do as they wish, though.

>  Whether or not we allow this case to work without a rewrite is in
> some sense arbitrary. There's no real reason it can't be done; rather,
> we're just exercising restraint to minimize the risk of future bugs.
> So I don't want to go to great lengths to test it.

I used the same strategy in another ALTER TABLE patch this CF:
http://archives.postgresql.org/message-id/20120126033956.gc15...@tornado.leadboat.com
If we pay its costs for those tests, we then may as well let this test case
ride their coattails.



Re: [HACKERS] CLOG contention

2012-01-28 Thread Jeff Janes
On Thu, Jan 12, 2012 at 4:49 AM, Simon Riggs  wrote:
> On Thu, Jan 5, 2012 at 6:26 PM, Simon Riggs  wrote:
>
>> Patch to remove clog contention caused by dirty clog LRU.
>
> v2, minor changes, updated for recent commits

This no longer applies to file src/backend/postmaster/bgwriter.c, due
to the latch code, and I'm not confident I know how to fix it.

Cheers,

Jeff



Re: [HACKERS] Inline Extension

2012-01-28 Thread Cédric Villemain
On 28 January 2012 21:46, David E. Wheeler  wrote:
> On Jan 27, 2012, at 2:19 AM, Cédric Villemain wrote:
>
>>> Also --exclude-extension?
>>
>> It might be the default.
>> We need something to dump the content of
>> pg_catalog.pg_extension_script (or whatever table is going to contain
>> SQL code), per extension or all.
>
> I think dim said --no-extensions would be the default, but I’m thinking it 
> would be useful to have --with-extensions to include them all, but then be 
> able to --exclude-extension for a select few.

So I am.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Development, Expertise and Training



[HACKERS] isolationtester seems uselessly rigid as to length of permutation

2012-01-28 Thread Tom Lane
I thought it'd be a good idea to put in some basic test cases for the
EvalPlanQual code using the isolationtester infrastructure.  While
fooling with it, I soon ran into this restriction:

if (p->nsteps != nallsteps)
{
    fprintf(stderr, "invalid number of steps in permutation %d\n", i + 1);
    exit_nicely();
}

ie, a "permutation" list has to specify exactly as many steps as there
are in the spec file.  This seems to me to be a useless restriction,
because it prevents running a test sequence that repeats some steps or
leaves other steps out.  Barring objections, I'm going to remove the
above lines.

regards, tom lane



Re: [HACKERS] Inline Extension

2012-01-28 Thread David E. Wheeler
On Jan 27, 2012, at 2:19 AM, Cédric Villemain wrote:

>> Also --exclude-extension?
> 
> It might be the default.
> We need something to dump the content of
> pg_catalog.pg_extension_script (or whatever table is going to contain
> SQL code), per extension or all.

I think dim said --no-extensions would be the default, but I’m thinking it 
would be useful to have --with-extensions to include them all, but then be able 
to --exclude-extension for a select few.

Best,

David




Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-28 Thread Heikki Linnakangas

On 28.01.2012 15:49, Simon Riggs wrote:
> On Fri, Jan 27, 2012 at 9:07 PM, Dan Scales  wrote:
>> Also, I missed this before:  don't you want to add the checksum calculation
>> (PageSetVerificationInfo) to mdextend() (or preferably smgrextend()) as well?
>> Otherwise, you won't be checksumming a bunch of the new pages.
>
> You don't need to checksum the extend because no data is written at
> that point.

That's not correct. smgrextend writes a block of data just like
smgrwrite does. When a relation is extended by the buffer manager, it
calls smgrextend with an all-zeros block, but not all callers do that.
See rewriteheap.c and nbtsort.c for counter-examples.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Simulating Clog Contention

2012-01-28 Thread Jeff Janes
On Fri, Jan 27, 2012 at 1:45 PM, Jeff Janes  wrote:
> On Thu, Jan 12, 2012 at 4:31 AM, Simon Riggs  wrote:
>
>> The following patch adds a pgbench option -I to load data using
>> INSERTs, so that we can begin benchmark testing with rows that have
>> large numbers of distinct un-hinted transaction ids. With a database
>> pre-created using this we will be better able to simulate and thus
>> more easily measure clog contention. Note that current clog has space
>> for 1 million xids, so a scale factor of greater than 10 is required
>> to really stress the clog.
>
> Running with this patch with a non-default scale factor generates the
> spurious notice:
>
> "Scale option ignored, using pgbench_branches table count = 10"
>
> In fact the scale option is not being ignored, because it was used to
> initialize the pgbench_branches table count earlier in this same
> invocation.
>
> I think that even in normal (non-initialization) usage, this message
> should be suppressed when the provided scale factor
> is equal to the pgbench_branches table count.

The attached patch does just that.  There is probably no reason to
warn people that we are doing what they told us to, but not for the
reason they think.

I think this change makes sense regardless of the disposition of the
thread topic.

Cheers,

Jeff


pgbench_scale.patch
Description: Binary data



Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label

2012-01-28 Thread Kohei KaiGai
2012/1/26 Robert Haas :
> On Thu, Jan 26, 2012 at 2:07 PM, Kohei KaiGai  wrote:
>> 2012/1/26 Robert Haas :
>>> I'm wondering if a function would be a better fit than a GUC.  I don't
>>> think you can really restrict the ability to revert a GUC change -
>>> i.e. if someone does a SET and then a RESET, you pretty much have to
>>> allow that.  I think.  But if you expose a function then it can work
>>> however you like.
>>>
>> One benefit to use GUC is that we can utilize existing mechanism to
>> revert a value set within a transaction block on error.
>> If we implement same functionality with functions, XactCallback allows
>> sepgsql to get control on appropriate timing?
>
> Not sure, but I thought the use case was to set this at connection
> startup time and then hand the connection off to a client.  What keeps
> the client from just issuing RESET?
>
In Joshua's use case, the original security label has no privileges to
reference any tables, and it cannot transition to any other domain without
the credential held within the IC card. Thus, RESET is harmless.

However, I also assume one other possible use case: the originator has
privileges to transition to 10 different domains, but is not allowed to
revert. In that case, a RESET without permission checks would be harmful.
(This scenario suits a multi-category model.)

Apart from this issue, I found a problem with the implementation using GUC
options. It crashes the backend if it tries to reset sepgsql.client_label
without permission inside the error handler, because of double errors.

So, I'll try to modify the patch with an idea to use a function.

Thanks,
-- 
KaiGai Kohei 



[HACKERS] pg_dumpall and temp_tablespaces dependency problem

2012-01-28 Thread Heikki Linnakangas

create user foouser;
create tablespace temptblspc location '/tmp/tmptblspc';
alter user foouser set temp_tablespaces='temptblspc';

Run pg_dumpall. It will produce a dump like:

...
CREATE ROLE foouser;
ALTER ROLE foouser WITH NOSUPERUSER INHERIT NOCREATEROLE NOCREATEDB 
LOGIN NOREPLICATION;

ALTER ROLE foouser SET temp_tablespaces TO 'temptblspc';
...
CREATE TABLESPACE temptblspc OWNER heikki LOCATION '/tmp/tmptblspc';

That latter ALTER ROLE statement fails at restore:

ERROR:  tablespace "temptblspc" does not exist

The problem here is that the ALTER ROLE statement refers to the 
tablespace, which is created afterwards. There are two possible solutions 
to this that I can see:


1. Teach pg_dumpall to dump the ALTER ROLE statement after creating 
tablespaces.


2. Relax the check on ALTER ROLE to not throw an error when you set 
temp_tablespaces to a non-existent tablespace.


There's another GUC that has the same problem: 
default_text_search_config. Only that is worse, because text search 
configurations are local to a database, so reordering the statements in 
the pg_dumpall output won't help. So I'm leaning towards option 2, also 
because moving the ALTER ROLE statement in the dump would make it less 
readable. Relaxing the check would be consistent with setting 
search_path, where you get a NOTICE rather than an ERROR if you refer to 
a non-existent schema in the ALTER ROLE statement.


Barring objections, I'll write a patch to relax the checking on 
default_text_search_config and temp_tablespaces to match search_path.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] TS: Limited cover density ranking

2012-01-28 Thread Oleg Bartunov
I suggest you work on more general approach, see 
http://www.sai.msu.su/~megera/wiki/2009-08-12 for example.


btw, I don't like you changed ts_rank_cd arguments.

Oleg
On Fri, 27 Jan 2012, karave...@mail.bg wrote:


Hello,

I have developed a variation of the cover density ranking functions that counts 
only covers that are shorter than a specified limit. It is useful for finding 
combinations of terms that appear near one another. Here is an example of 
usage:

-- normal cover density ranking : not changed
luben=> select ts_rank_cd(to_tsvector('a b c d e g h i j k'), to_tsquery('a&d'));
 ts_rank_cd
------------
      0.033
(1 row)

-- limited to 2
luben=> select ts_rank_cd(2, to_tsvector('a b c d e g h i j k'), to_tsquery('a&d'));
 ts_rank_cd
------------
          0
(1 row)

luben=> select ts_rank_cd(2, to_tsvector('a b c d e g h i j k a d'), to_tsquery('a&d'));
 ts_rank_cd
------------
        0.1
(1 row)

-- limited to 3
luben=> select ts_rank_cd(3, to_tsvector('a b c d e g h i j k'), to_tsquery('a&d'));
 ts_rank_cd
------------
      0.033
(1 row)

luben=> select ts_rank_cd(3, to_tsvector('a b c d e g h i j k a d'), to_tsquery('a&d'));
 ts_rank_cd
------------
       0.13
(1 row)

Find attached a patch against the 9.1.2 sources. I preferred to make a patch, not a 
separate extension, because it is only a one-statement change in the calc_rank_cd 
function. If I made an extension, a lot of code would be duplicated 
between backend/utils/adt/tsrank.c and the extension.

I have some questions:

1. Is it worth developing this further (documentation, cleanup, etc.) for 
inclusion in one of the next versions? If so, there are some 
further questions: 

- should I overload ts_rank_cd (as in the examples above and the patch), or should I 
define a new set of functions, for example ts_rank_lcd? 
- should I define these new SQL-level functions in core, or should I go 
only with the 2-line change in calc_rank_cd() and define the new functions as 
an extension? If we prefer the latter, can I overload core functions with 
functions defined in extensions? 
- and finally there is always the possibility to duplicate the code and make an 
independent extension.

2. If I run the patched version on a cluster that was initialized with an unpatched 
server, is there a way to register the new functions in the system catalog 
without reinitializing the cluster? 

Best regards
luben

--
Luben Karavelov


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] initdb and fsync

2012-01-28 Thread Andrew Dunstan



On 01/28/2012 01:46 PM, Jeff Davis wrote:
> On Sat, 2012-01-28 at 13:18 -0500, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> I'm curious what problem we're actually solving here, though. I've run
>>> the buildfarm countless thousands of times on different VMs, and five of
>>> my seven current animals run in VMs, and I don't think I've ever seen a
>>> failure ascribable to inadequately synced files from initdb.
>>
>> Yeah.  Personally I would be sad if initdb got noticeably slower, and
>> I've never seen or heard of a failure that this would fix.
>>
>> I wonder whether it wouldn't be sufficient to call sync(2) at the end,
>> anyway, rather than cluttering the entire initdb codebase with fsync
>> calls.
>
> I can always add a "sync" call to the test, also (rather than modifying
> initdb). Or, it could be an initdb option, which might be a good
> compromise. I don't have a strong opinion here.
>
> As machines get more memory and filesystems get more lazy, I wonder if
> it will be a more frequent occurrence, however. On the other hand, if
> filesystems are more lazy, that also increases the cost associated with
> extra "sync" calls. I think there would be a surprise factor if
> sometimes initdb had a long pause at the end and caused 10GB of data to
> be written out.



-1 for that. A very quick look at initdb.c suggests to me that there are 
only two places where we'd need to put fsync(), right before we call 
fclose() in write_file() and write_version_file(). If we're going to do 
anything, that seems to be the least painful and most portable way to go.
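
As a minimal sketch of that approach (illustrative only, not the actual
initdb code; write_file_synced and the path are made-up names, and it
assumes POSIX fsync, which on Windows would be the _commit() macro
mentioned upthread):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hedged sketch, not initdb's actual write_file(): flush the stdio
 * buffer and fsync the underlying descriptor before closing, so the
 * file survives an OS crash immediately after initdb finishes. */
static void
write_file_synced(const char *path, const char *contents)
{
    FILE   *fp = fopen(path, "w");

    if (fp == NULL)
        exit(1);
    fputs(contents, fp);
    fflush(fp);                     /* push stdio buffers to the kernel */
    if (fsync(fileno(fp)) != 0)     /* force the kernel to write to disk */
        exit(1);
    fclose(fp);
}

int
main(void)
{
    write_file_synced("PG_VERSION", "9.2\n");
    return 0;
}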


cheers

andrew




Re: [HACKERS] initdb and fsync

2012-01-28 Thread Jeff Janes
On Sat, Jan 28, 2012 at 10:18 AM, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> I'm curious what problem we're actually solving here, though. I've run
>> the buildfarm countless thousands of times on different VMs, and five of
>> my seven current animals run in VMs, and I don't think I've ever seen a
>> failure ascribable to inadequately synced files from initdb.
>
> Yeah.  Personally I would be sad if initdb got noticeably slower, and
> I've never seen or heard of a failure that this would fix.
>
> I wonder whether it wouldn't be sufficient to call sync(2) at the end,
> anyway, rather than cluttering the entire initdb codebase with fsync
> calls.
>
>                        regards, tom lane

Does sync(2) behave like sync(8) and flush the entire system cache, or
does it just flush the files opened by the process which called it?

The man page didn't enlighten me on that.

Sometimes sync(8) never returns.  It doesn't just flush what was dirty
at the time it was called; it actually keeps running until there are
simultaneously no dirty pages anywhere in the system.  On busy
systems, this condition might never be reached.  And it can't be
interrupted, not even with kill -9.

Cheers,

Jeff



Re: [HACKERS] [v9.2] sepgsql's DROP Permission checks

2012-01-28 Thread Kohei KaiGai
2012/1/26 Robert Haas :
> On Thu, Jan 26, 2012 at 7:27 AM, Kohei KaiGai  wrote:
>> It seems to me reasonable design.
>> The attached patch is rebased one according to your perform-deletion patch.
>
> That looks pretty sensible.  But I don't think this is true any more:
>
> +    Please note that it shall not be checked on the objects removed by
> +    cascaded deletion according to the standard manner in SQL.
>
> I've been thinking more about the question of object access hooks with
> arguments as well.  In a couple of designs you've proposed, we've
> needed InvokeObjectAccessHook0..11 or whatever, and I don't like that
> design - it seems grotty and not type-safe.  However, looking at what
> you did in this patch, I have another idea.  Suppose that we add ONE
> additional argument to the object access hook function, as you've done
> here, and if a particular type of hook invocation needs multiple
> arguments, it can pass them with a struct.  In fact, let's use a
> struct regardless, for uniformity, and pass the value as a void *.
> That is:
>
> typedef struct ObjectAccessDrop {
>    int dropflags;
> } ObjectAccessDrop;
>
> At the call site, we do this:
>
> if (object_access_hook)
> {
>    ObjectAccessDrop arg;
>    arg.dropflags = flags;
>    InvokeObjectAccessHook(..., arg);
> }
>
> If there's no argument, then we can just do:
>
> InvokeObjectAccessHook(..., NULL);
>
> The advantage of this is that if we change the structure definition,
> loadable modules compiled against a newer code base should either (1)
> still work or (2) fail to compile.  The way we have it right now, if
> we decide to pass a second argument (say, the DropBehavior) to the
> hook, we're potentially going to silently break sepgsql no matter how
> we do it.  But if we enforce use of a struct, then the only thing the
> client should ever be doing with the argument it gets is casting it to
> ObjectAccessDrop *.  Then, if we've added fields to the struct, the
> code will still do the right thing even if the field order has been
> changed or whatever.   If we've removed fields or changed their data
> types, things should blow up fairly obviously instead of seeming to
> work but actually failing.
>
> Thoughts?
>
I also think your idea is good; flexible and reliable toward future enhancement.

If there is one point where this idea could be improved: do we need to deliver
"access", "classid", "objectid" and "subid" as separate arguments?

If we define a type to deliver information on object access hook as follows:

typedef struct {
    ObjectAccessType    access;
    ObjectAddress       address;
    union {
        struct {
            int     flags;
        } drop;
    } arg;
} ObjectAccessHookData;

The only argument that object_access_hook takes would be a pointer to this
structure, so there is no need for a type cast on the module side.

One disadvantage is that the caller needs to set up this structure, including
the ObjectAccessType and ObjectAddress information. However, it can be wrapped
in a preprocessor macro to keep the number of lines down, as we currently do.

example:
#define InvokeDropAccessHook(classid, objectid, objsubid, flags)       \
    do {                                                               \
        if (object_access_hook)                                        \
        {                                                              \
            ObjectAccessHookData  __hook_data;                         \
                                                                       \
            __hook_data.access = OAT_DROP;                             \
            __hook_data.address.classId = (classid);                   \
            __hook_data.address.objectId = (objectid);                 \
            __hook_data.address.objectSubid = (objsubid);              \
            __hook_data.arg.drop.flags = (flags);                      \
                                                                       \
            (*object_access_hook)(&__hook_data);                       \
        }                                                              \
    } while (0)

What do you think?

Thanks,
-- 
KaiGai Kohei 



Re: [HACKERS] initdb and fsync

2012-01-28 Thread Jeff Davis
On Sat, 2012-01-28 at 13:18 -0500, Tom Lane wrote:
> Andrew Dunstan  writes:
> > I'm curious what problem we're actually solving here, though. I've run 
> > the buildfarm countless thousands of times on different VMs, and five of 
> > my seven current animals run in VMs, and I don't think I've ever seen a 
> > failure ascribable to inadequately synced files from initdb.
> 
> Yeah.  Personally I would be sad if initdb got noticeably slower, and
> I've never seen or heard of a failure that this would fix.
> 
> I wonder whether it wouldn't be sufficient to call sync(2) at the end,
> anyway, rather than cluttering the entire initdb codebase with fsync
> calls.

I can always add a "sync" call to the test, also (rather than modifying
initdb). Or, it could be an initdb option, which might be a good
compromise. I don't have a strong opinion here.

As machines get more memory and filesystems get more lazy, I wonder if
it will be a more frequent occurrence, however. On the other hand, if
filesystems are more lazy, that also increases the cost associated with
extra "sync" calls. I think there would be a surprise factor if
sometimes initdb had a long pause at the end and caused 10GB of data to
be written out.

Regards,
Jeff Davis




Re: [HACKERS] initdb and fsync

2012-01-28 Thread Jeff Janes
On Sat, Jan 28, 2012 at 7:31 AM, Andrew Dunstan  wrote:
>
>
> On 01/27/2012 11:52 PM, Noah Misch wrote:
>>>
>>> Is a platform-independent fsync be available at initdb time?
>>
>> Not sure.
>>
>
> It's a macro on Windows that calls _commit(fd), so it should be portable
> enough.
>
> I'm curious what problem we're actually solving here, though. I've run the
> buildfarm countless thousands of times on different VMs, and five of my
> seven current animals run in VMs, and I don't think I've ever seen a failure
> ascribable to inadequately synced files from initdb.

I wouldn't expect you to ever see that problem on the buildfarm.  If
the OS gets thunked during the middle of a regression test, when it
comes back up the code is not going to try to pick up where it left
off, it is just going to blow away the entire install and start over
from scratch.  So any crash-recoverability problems will never be
detected.  I would guess the original poster is doing a more stringent
kind of test.

Cheers,

Jeff



Re: [HACKERS] initdb and fsync

2012-01-28 Thread Tom Lane
Andrew Dunstan  writes:
> I'm curious what problem we're actually solving here, though. I've run 
> the buildfarm countless thousands of times on different VMs, and five of 
> my seven current animals run in VMs, and I don't think I've ever seen a 
> failure ascribable to inadequately synced files from initdb.

Yeah.  Personally I would be sad if initdb got noticeably slower, and
I've never seen or heard of a failure that this would fix.

I wonder whether it wouldn't be sufficient to call sync(2) at the end,
anyway, rather than cluttering the entire initdb codebase with fsync
calls.

regards, tom lane



Re: [HACKERS] initdb and fsync

2012-01-28 Thread Jeff Davis
On Sat, 2012-01-28 at 10:31 -0500, Andrew Dunstan wrote:
> I'm curious what problem we're actually solving here, though. I've run 
> the buildfarm countless thousands of times on different VMs, and five of 
> my seven current animals run in VMs, and I don't think I've ever seen a 
> failure ascribable to inadequately synced files from initdb.

I believe I have seen such a problem second hand in a situation where
the VM was known to be killed harshly (not sure if you do that
regularly).

It's a little difficult for me to _prove_ that this would have solved
the problem, and I think it was only observed once (though I could
probably reproduce it if I tried). The symptom was a log message
indicating that PG_VERSION was missing or corrupt on a system that was
previously started and online (albeit briefly for a test).

Regards,
Jeff Davis




Re: [HACKERS] initdb and fsync

2012-01-28 Thread Andrew Dunstan



On 01/27/2012 11:52 PM, Noah Misch wrote:
>> Is a platform-independent fsync be available at initdb time?
>
> Not sure.


It's a macro on Windows that calls _commit(fd), so it should be portable 
enough.


I'm curious what problem we're actually solving here, though. I've run 
the buildfarm countless thousands of times on different VMs, and five of 
my seven current animals run in VMs, and I don't think I've ever seen a 
failure ascribable to inadequately synced files from initdb.


cheers

andrew



Re: [HACKERS] Inlining comparators as a performance optimisation

2012-01-28 Thread Bruce Momjian
On Fri, Jan 13, 2012 at 10:48:56AM +0100, Pierre C wrote:
> Maybe an extra column in pg_proc would do (but then, the proargtypes
> and friends would describe only the sql-callable version) ?
> Or an extra table ? pg_cproc ?
> Or an in-memory hash : hashtable[ fmgr-callable function ] => C version
> - What happens if a C function has no SQL-callable equivalent ?
> Or (ugly) introduce an extra per-type function
> type_get_function_ptr( function_kind ) which returns the requested
> function ptr
> 
> If one of those happens, I'll dust off my old copy-optimization patch ;)

I agree that COPY is ripe for optimization, and I am excited you have
some ideas on this.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Avoiding shutdown checkpoint at failover

2012-01-28 Thread Simon Riggs
On Thu, Jan 26, 2012 at 5:27 AM, Fujii Masao  wrote:

> One thing I would like to ask is that why you think walreceiver is more
> appropriate for writing XLOG_END_OF_RECOVERY record than startup
> process. I was thinking the opposite, because if we do so, we might be
> able to skip the end-of-recovery checkpoint even in file-based log-shipping
> case.

Right now, WALReceiver has one code path/use case.

Startup has so many, it's much harder to know whether we'll screw up one of them.

If we can add it in either place then I choose the simplest, most
relevant place. If the code is the same, we can move it around later.

Let me write the code and then we can think some more.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] CLOG contention, part 2

2012-01-28 Thread Simon Riggs
On Fri, Jan 27, 2012 at 10:05 PM, Jeff Janes  wrote:
> On Sat, Jan 21, 2012 at 7:31 AM, Simon Riggs  wrote:
>>
>> Yes, it was. Sorry about that. New version attached, retesting while
>> you read this.
>
> In my hands I could never get this patch to do anything.  The new
> cache was never used.
>
> I think that that was because RecentXminPageno never budged from -1.
>
> I think that that, in turn, is because the comparison below can never
> return true, because the comparison is casting both sides to uint, and
> -1 cast to uint is very large
>
>        /* When we commit advance ClogCtl's shared RecentXminPageno if needed */
>        if (ClogCtl->shared->RecentXminPageno < TransactionIdToPage(RecentXmin))
>                 ClogCtl->shared->RecentXminPageno = TransactionIdToPage(RecentXmin);
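
A standalone C illustration of that pitfall (generic stand-in variables,
not the actual ClogCtl fields): once either operand is unsigned, -1 is
promoted to a huge unsigned value and the "less than" test can never
succeed.

#include <stdio.h>

int
main(void)
{
    int          recent_xmin_pageno = -1;   /* the never-advanced initial value */
    unsigned int current_pageno = 42;

    /* mixed signed/unsigned: -1 is converted to UINT_MAX, so this is false */
    if (recent_xmin_pageno < current_pageno)
        puts("unsigned compare: cache would advance");
    else
        puts("unsigned compare: -1 wraps to a huge value, cache never advances");

    /* with both operands signed, the comparison behaves as intended */
    if (recent_xmin_pageno < (int) current_pageno)
        puts("signed compare: cache would advance");

    return 0;
}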

Thanks, will look again.

> Also, I think the general approach is wrong.  The only reason to have
> these pages in shared memory is that we can control access to them to
> prevent write/write and read/write corruption.  Since these pages are
> never written, they don't need to be in shared memory.   Just read
> each page into backend-local memory as it is needed, either
> palloc/pfree each time or using a single reserved block for the
> lifetime of the session.  Let the kernel worry about caching them so
> that the above mentioned reads are cheap.

Will think on that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-28 Thread Simon Riggs
On Fri, Jan 27, 2012 at 9:07 PM, Dan Scales  wrote:

> The advantage of putting the checksum calculation in smgrwrite() (or 
> mdwrite()) is that it catches a bunch of page writes that don't go through 
> the buffer pool (see calls to smgrwrite() in nbtree.c, nbtsort.c, spginsert.c)

I'll have another look at that. Seems like we can make it various
ways, we just need to decide the code placement.

> Also, I missed this before:  don't you want to add the checksum calculation 
> (PageSetVerificationInfo) to mdextend() (or preferably smgrextend()) as well? 
>  Otherwise, you won't be checksumming a bunch of the new pages.

You don't need to checksum the extend because no data is written at
that point. It creates a new block that will become dirty at some point
and then be written out, which will trigger the checksum.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Multithread Query Planner

2012-01-28 Thread
Ok, thanks.

Att,

Fred

2012/1/24 Robert Haas 

> On Tue, Jan 24, 2012 at 11:25 AM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> I doubt it.  Almost nothing in the backend is thread-safe.  You can't
> >> acquire a heavyweight lock, a lightweight lock, or a spinlock. You
> >> can't do anything that might elog() or ereport().  None of those
> >> things are reentrant.
> >
> > Not to mention palloc, another extremely fundamental and non-reentrant
> > subsystem.
> >
> > Possibly we could work on making all that stuff re-entrant, but it would
> > be a huge amount of work for a distant and uncertain payoff.
>
> Right.  I think it makes more sense to try to get parallelism working
> first with the infrastructure we have.  Converting to use threading,
> if we ever do it at all, should be something we view as a later
> performance optimization.  But I suspect we won't want to do it
> anyway; I think there will be easier ways to get where we want to be.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Temp file missing during large pgbench data set

2012-01-28 Thread Thom Brown
On 28 January 2012 09:04, Magnus Hagander  wrote:
> On Sat, Jan 28, 2012 at 09:57, Magnus Hagander  wrote:
>> On Sat, Jan 28, 2012 at 05:30, Tom Lane  wrote:
>>> Thom Brown  writes:
 I'm using latest git master (latest entry
 0816fad6eebddb8f1f0e21635e46625815d690b9) and I'm getting an error
 when trying to create a large data set with pgbench:
>>>
 LOG:  could not stat file "base/pgsql_tmp/pgsql_tmp8056.0": Success
 STATEMENT:  alter table pgbench_accounts add primary key (aid)
>>>
>>> I'll bet lunch this is due to some bug in commit
>>> bc3347484a7bf9eddb98e4352d84599cae9a31c6
>>
>> Not taking that bet. There definitely is a pretty obvious bug in that.
>> Will look into it. Pretty sure the else branch is at the wrong
>> level...
>
> Yup, it was. Patch applied, can you confirm if that fixes your problem?

Yes, looks fine now.

Thanks

-- 
Thom



Re: [HACKERS] Temp file missing during large pgbench data set

2012-01-28 Thread Magnus Hagander
On Sat, Jan 28, 2012 at 09:57, Magnus Hagander  wrote:
> On Sat, Jan 28, 2012 at 05:30, Tom Lane  wrote:
>> Thom Brown  writes:
>>> I'm using latest git master (latest entry
>>> 0816fad6eebddb8f1f0e21635e46625815d690b9) and I'm getting an error
>>> when trying to create a large data set with pgbench:
>>
>>> LOG:  could not stat file "base/pgsql_tmp/pgsql_tmp8056.0": Success
>>> STATEMENT:  alter table pgbench_accounts add primary key (aid)
>>
>> I'll bet lunch this is due to some bug in commit
>> bc3347484a7bf9eddb98e4352d84599cae9a31c6
>
> Not taking that bet. There definitely is a pretty obvious bug in that.
> Will look into it. Pretty sure the else branch is at the wrong
> level...

Yup, it was. Patch applied, can you confirm if that fixes your problem?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Temp file missing during large pgbench data set

2012-01-28 Thread Magnus Hagander
On Sat, Jan 28, 2012 at 05:30, Tom Lane  wrote:
> Thom Brown  writes:
>> I'm using latest git master (latest entry
>> 0816fad6eebddb8f1f0e21635e46625815d690b9) and I'm getting an error
>> when trying to create a large data set with pgbench:
>
>> LOG:  could not stat file "base/pgsql_tmp/pgsql_tmp8056.0": Success
>> STATEMENT:  alter table pgbench_accounts add primary key (aid)
>
> I'll bet lunch this is due to some bug in commit
> bc3347484a7bf9eddb98e4352d84599cae9a31c6

Not taking that bet. There definitely is a pretty obvious bug in that.
Will look into it. Pretty sure the else branch is at the wrong
level...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
