[HACKERS] ANALYZE to be ignored by VACUUM

2008-02-15 Thread ITAGAKI Takahiro
When there are a heavily updated table and a large table at the same time,
ANALYZE against the large table disturbs other autovacuums and HOT updates.
In my workload, ANALYZE takes long time (1min at statistics_target = 10,
and 5min at 100), but the updated table needs to be vacuumed every 30 seconds
because seqscans run on the table repeatedly.

ANALYZE is a transaction. As long as long transactions are alive,
VACUUM cannot remove dead tuples deleted after those transaction began.
HOT also cannot work under long transactions. We will be happy if VACUUM
can get along with long transactions, but it requires some kinds of complex
managements of vacuum horizon. I have no idea for it...


So, I'm targeting only ANALZYE for now by changing ANALYZE to be ignored
by VACUUM. It is just same as VACUUM that has already been ignored by other
VACUUMs since version 8.2.

My proposal is splitting ANALYZEs with use_own_xacts (by VACUUM ANALYZE,
autovacuum or database-wide analyze) to two transactions:
   T1: acquire_sample_rows()
   T2: compute_stats() and update_attstats()
and set PROC_IN_VACUUM during T1.
T1 takes long time because read pages to sample rows.
T2 is relatively short because stats are calculated in memory, without i/o.
T2 is needed for consistent snapshot because user-defined functions
in expression indexes might use it.

Is it an acceptable approach? Or am I missing something?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Reworking WAL locking

2008-02-15 Thread Simon Riggs
On Thu, 2008-02-14 at 16:50 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Thu, 2008-02-14 at 13:52 -0500, Tom Lane wrote:
  [ still staring at the code ... ]  Something that might be interesting
  though is to try to move some of the buffer control logic overhead out
  of WALInsertLock's domain and into WALWriteLock's domain.
 
  I like this one because its another example that contention is not
  static, seems like its more often cyclical. The WALInsertLock is held
  longer than normal when we cross page boundaries, creating a pulsing
  effect thru the lock queue.
 
 Yeah, significantly longer than normal if you assume the normal case is
 to write a WAL record that's much less than a full page.  But has anyone
 seen direct evidence that that's an important effect?  I was just
 proposing this on speculation --- if there's already evidence that that
 behavior is a problem, it'd be interesting to see it.

A tracepoint in AdvanceXLInsertBuffer() would help there.

I would expect lock time to vary with number of cache lines touched
(approximated by record size) as well as whether the record crossed a
page boundary. Very large WAL records are more likely to cross
boundaries, so the effect will show itself most when we write small WAL
records.

 BTW, we'd probably need to do something like this even if we then go
 forward with your original idea.  If we're going to allow multiple
 backends to be inserting WAL records into the-same-or-different WAL
 buffers concurrently, we can't have that same code responsible for
 initializing empty buffers.

You've sold me already!

I will return to the other part of the idea, but just too busy now to
think and reply in full. I agree with the issues you raised earlier.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] wishlist for 8.4

2008-02-15 Thread Richard Huxton

Decibel! wrote:

On Feb 14, 2008, at 10:06 AM, Tom Lane wrote:

=?UTF-8?B?UmFwaGHDq2wgSmFjcXVvdA==?= [EMAIL PROTECTED] writes:

so, I propose the use of
NEW[variable_containing_the_column_name]
(which can obviously be extended to any tuples)
to allow such access.



what do you experts think ?


Zero chance.  plplgsql is a strongly typed language, and a construct
like that couldn't have any known-in-advance data type.


Would it be reasonable to teach EXECUTE about NEW and OLD? That should 
allow the OP to do what he's looking for...


You could have a function get_attribute_as_text(NEW, 'id') or even 
get_attribute_quoted(NEW, 'id')


It would be nice to have a more dynamic language built-in. I'm not aware 
of any BSD-licensed dynamic languages though.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


[HACKERS] subquery in limit

2008-02-15 Thread Grzegorz Jaskiewicz

(just as an example):
select * from test order by a limit (select count(*)*0.9 from test);

is not doable in postgresql. Someone recently asked on IRC about,  
SELECT TOP 90 PERCENT type of query in m$sql.
Any ideas how should this be approach in psql. I ask here, because you  
guys probably can tell why the first query won't work (subquery is not  
allowed as limit's argument, why?).


cheers.

--
Grzegorz Jaskiewicz
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] wishlist for 8.4

2008-02-15 Thread Sam Mason
On Thu, Feb 14, 2008 at 11:06:42AM -0500, Tom Lane wrote:
 =?UTF-8?B?UmFwaGHDq2wgSmFjcXVvdA==?= [EMAIL PROTECTED] writes:
  so, I propose the use of
  NEW[variable_containing_the_column_name]
  (which can obviously be extended to any tuples)
  to allow such access.
 
  what do you experts think ?
 
 Zero chance.  plplgsql is a strongly typed language, and a construct
 like that couldn't have any known-in-advance data type.

If variable_containing_the_column_name was a string then I agree,
if columns were first class objects then it wouldn't be a problem.
Introducing sub-types would make this all nice and tidy, but
unfortunately sub-types are a long way away from what PG currently
supports and I don't know the SQL spec well enough to know if there
would be any serious incompatibilities with them--they are pretty
expressive so I'd doubt there would be any fundamental incompatibility.

If you want an example, Java has sub-types and 1.5+ has parametric
polymorphism making it possible to create an interface that named the
column you were interested in and then to write a function like:

  interface ColumnRow,ColType {
public ColType get(Row r);
  }
  public Row,ColType ColType foo(Row row, ColumnRow,ColType col) {
return col.get(row);
  }

In PG you'd want the sub-types to be generated automatically, the above
example is somewhat unreadable.  The only real purpose of the above was
to show how a strong type system can express what the OP wanted.

The anyelement and supporting polymorphism would all be subsumed into
the features provided by sub-typing and would all be done in a much
tidier way.

It's a very big change for PG though!


  Sam

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] wishlist for 8.4

2008-02-15 Thread Heikki Linnakangas

Raphaƫl Jacquot wrote:
I recently found myself trying to build a trigger to modify some fields 
in a good dozen similarly structured tables in which the similar columns 
had different names.
in fact, I got stuck in pl/pgsql with the fact that there's no way to 
access the NEW tuple in an indirect way, having the name of the column 
in some variable. (I found that it could be done in plperl, but that 
left me with a taste of un-completeness...)


It's ugly, but you could play tricks with EXECUTE. Like:

CREATE OR REPLACE FUNCTION emp_stamp() RETURNS trigger AS $emp_stamp$
DECLARE
empname text;
BEGIN
EXECUTE 'SELECT ('''||new||'''::emp).empname' INTO empname;
RAISE NOTICE 'new empname is: %', empname;
RETURN NEW;
END;
$emp_stamp$ LANGUAGE plpgsql;

Not sure the quoting is right...

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] subquery in limit

2008-02-15 Thread Roberts, Jon
I have no idea why you can't do a subquery in the limit but you can
reference a function:

create table test as select * from pg_tables;

create or replace function fn_count(p_sql varchar) returns int as
$$
declare
  v_count int;
begin
  execute p_sql into v_count;
  return v_count;
end;
$$
language 'plpgsql' security definer;


select * from test limit fn_count('select round(count(*)*0.9) from
test');

And I'm sure someone will point out a more efficient way to write my
function without using pl/pgsql.  :)


Jon


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Grzegorz Jaskiewicz
 Sent: Friday, February 15, 2008 5:35 AM
 To: pgsql-hackers@postgresql.org
 Subject: [HACKERS] subquery in limit
 
 (just as an example):
 select * from test order by a limit (select count(*)*0.9 from test);
 
 is not doable in postgresql. Someone recently asked on IRC about,
 SELECT TOP 90 PERCENT type of query in m$sql.
 Any ideas how should this be approach in psql. I ask here, because you
 guys probably can tell why the first query won't work (subquery is not
 allowed as limit's argument, why?).
 
 cheers.
 
 --
 Grzegorz Jaskiewicz
 [EMAIL PROTECTED]
 
 
 
 
 ---(end of
broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ANALYZE to be ignored by VACUUM

2008-02-15 Thread Tom Lane
ITAGAKI Takahiro [EMAIL PROTECTED] writes:
 In my workload, ANALYZE takes long time (1min at statistics_target = 10,
 and 5min at 100), but the updated table needs to be vacuumed every 30 seconds
 because seqscans run on the table repeatedly.

There is something *seriously* wrong with that.  If vacuum can complete
in under 30 seconds, how can analyze take a minute?  (I'm also wondering
whether you'll still need such frantic vacuuming with HOT...)

 So, I'm targeting only ANALZYE for now by changing ANALYZE to be ignored
 by VACUUM.

I think we need to understand what the real problem is with your test
case.  This proposal seems very messy/ugly to me, and I'm unconvinced
that it solves anything.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] subquery in limit

2008-02-15 Thread Tom Lane
Roberts, Jon [EMAIL PROTECTED] writes:
 I have no idea why you can't do a subquery in the limit

It hasn't seemed worth putting any effort into --- AFAIR this is the
first time anyone's even inquired about it.  As you say, you can always
use a function.

 And I'm sure someone will point out a more efficient way to write my
 function without using pl/pgsql.  :)

Only that it doesn't seem a particularly bright idea to use SECURITY
DEFINER for a function that will execute any arbitrary caller-provided
SQL ...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] subquery in limit

2008-02-15 Thread Roberts, Jon

 
 Roberts, Jon [EMAIL PROTECTED] writes:
  I have no idea why you can't do a subquery in the limit
 
 It hasn't seemed worth putting any effort into --- AFAIR this is the
 first time anyone's even inquired about it.  As you say, you can
always
 use a function.
 
  And I'm sure someone will point out a more efficient way to write my
  function without using pl/pgsql.  :)
 
 Only that it doesn't seem a particularly bright idea to use SECURITY
 DEFINER for a function that will execute any arbitrary caller-provided
 SQL ...
 

LOL!  I knew something in my code would trigger a response.  :)



Jon

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Last call for objections: SELECT alias without AS for IDENTs only?

2008-02-15 Thread Tom Lane
Is anyone opposed to applying the patch shown here
http://archives.postgresql.org/pgsql-hackers/2008-02/msg00434.php
(plus appropriate docs changes etc)?  This allows AS to be omitted
in a SELECT output-column alias clause, but only if the desired
alias name lexes as an IDENT, that is, it's not any known keyword
(reserved or otherwise).  It would be nicer if it allowed any
unreserved keyword too; but per discussion, the only way that could
happen is if we made some currently unreserved keywords reserved,
which does not seem like a net improvement.

The documentation would probably have to say something like You
can omit AS only if the desired alias name is not any known
keyword (see Appendix C).  For protection against possible future
keyword additions, it is recommended that you always either use AS
or double-quote the alias name.

On the whole this form of the patch seems like a reasonable compromise
--- it doesn't lose any existing functionality, it doesn't take much
work, and it should at least dampen complaints about our lack of spec
compliance on this point.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] wishlist for 8.4

2008-02-15 Thread David Fetter
On Fri, Feb 15, 2008 at 10:06:49AM +, Richard Huxton wrote:
 Decibel! wrote:
 On Feb 14, 2008, at 10:06 AM, Tom Lane wrote:
 =?UTF-8?B?UmFwaGHDq2wgSmFjcXVvdA==?= [EMAIL PROTECTED] writes:
 so, I propose the use of
 NEW[variable_containing_the_column_name]
 (which can obviously be extended to any tuples) to allow such
 access.

 what do you experts think ?

 Zero chance.  plplgsql is a strongly typed language, and a
 construct like that couldn't have any known-in-advance data type.

 Would it be reasonable to teach EXECUTE about NEW and OLD? That
 should allow the OP to do what he's looking for...

 You could have a function get_attribute_as_text(NEW, 'id') or even
 get_attribute_quoted(NEW, 'id')

 It would be nice to have a more dynamic language built-in. I'm not
 aware of any BSD-licensed dynamic languages though.

Perl is BSD-compatible.

Cheers,
David.
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] CVS repository invalid revision

2008-02-15 Thread Christian Robottom Reis
Hello there,

I'm running a conversion of PostgreSQL's CVS repository, but I'm
stuck on a revision that cscvs fails to parse. The hint that the error
gives me is:

Parser error: failed to parse revision data line (line: 'date: 2000/12/04 
01:20:38;  author: tgl;  state: Exp;  lines:
')

That's the literal output -- I'm not sure what the linebreak after
lines: means.

Can someone help me out by finding and fixing the corrupted revision?
Id be most grateful.

-- 
Christian Robottom Reis | http://async.com.br/~kiko/ | [+55 16] 3376 0125

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] NetBSD/dtime_t

2008-02-15 Thread Michael Meskes
Could anyone please tell me how NetBSD defines dtime_t? We have a
buildfarm failure on canary. I know which change triggered this. I
enabled 

typedef timestamp dtime_t;

Which was mishandled by ecpg in Informix mode before and didn't work at
all if you were using the data type.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go VfL Borussia! Go SF 49ers! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] NetBSD/dtime_t

2008-02-15 Thread Kris Jurka



On Fri, 15 Feb 2008, Michael Meskes wrote:


Could anyone please tell me how NetBSD defines dtime_t? We have a
buildfarm failure on canary.


/usr/include/sys/types.h says:

typedef int32_t dtime_t;/* on-disk time_t */

Kris Jurka


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[HACKERS] PG-relevant bug fixes just pushed in Fedora 8

2008-02-15 Thread Tom Lane
If you happen to use Fedora 8, you might be interested in these
just-closed bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=430983

Fixes the problem of gssapi reporting uninterpreted numeric error codes
instead of useful error messages when there's a Kerberos problem, as
we previously saw in this thread:
http://archives.postgresql.org/pgsql-hackers/2008-01/msg01004.php

https://bugzilla.redhat.com/show_bug.cgi?id=432203

Fixes flex 2.5.33 to not generate half a dozen no previous prototype
warnings for each .l file.  Note if you are working with a CVS pull
you'll need to make maintainer-clean to get rid of the broken current
output files, else the warnings will remain.

If you're on another distro and see these problems, you might want to
pester your distributor to apply the same fixes.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings