date:20090402

Re: [HACKERS] [SQL] How would I get rid of trailing blank line?

2009-04-02 Thread Tena Sakai

Hi Andrew,

> Right. There's a simple pipeline way to get rid of it:
>   psql -t -f query.sql | sed -e '$d' > query.out

Hi Scott,

> Tired of those blank lines in your text files?  Grep them away:
> psql -tf query.sql mydatabase | grep -v "^$" > query.out
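The two pipelines quoted above are not equivalent. `sed -e '$d'` deletes only the final line. `grep -v "^$"` deletes every empty line, including any that belong to the data itself; for example, an empty-string value in unaligned (-A) output can produce one. The C sketch below shows the two behaviors on an in-memory buffer. The function names are hypothetical, and neither function reproduces every edge case of sed or grep.

```c
#include <string.h>

/* Roughly what `sed -e '$d'` does: remove the final line of buf, in place. */
static void
drop_last_line(char *buf)
{
    size_t      end = strlen(buf);

    if (end == 0)
        return;
    /* Step over the newline that terminates the last line, if present. */
    if (buf[end - 1] == '\n')
        end--;
    /* Walk back to just after the newline that ends the previous line. */
    while (end > 0 && buf[end - 1] != '\n')
        end--;
    buf[end] = '\0';
}

/* Roughly what `grep -v '^$'` does: remove every empty line, in place. */
static void
drop_empty_lines(char *buf)
{
    char       *src = buf;
    char       *dst = buf;
    int         at_line_start = 1;

    while (*src != '\0')
    {
        if (at_line_start && *src == '\n')
        {
            src++;              /* an empty line: skip its newline */
            continue;
        }
        at_line_start = (*src == '\n');
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

Take the input "1\n\n2\n\n", which is a result whose second row is empty, followed by psql's trailing blank line. drop_last_line keeps the empty data row and yields "1\n\n2\n". drop_empty_lines removes that row as well and yields "1\n2\n".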

Thank you both.

Regards,

Tena Sakai
tsa...@gallo.ucsf.edu


-----Original Message-----
From: Andrew Dunstan [mailto:and...@dunslane.net]
Sent: Thu 4/2/2009 6:34 PM
To: Tom Lane
Cc: Tena Sakai; pgsql-...@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [SQL] How would I get rid of trailing blank line?
 


Tom Lane wrote:
> "Tena Sakai"  writes:
>> I often use a line like:
>>   psql -tf query.sql mydatabase > query.out
>>
>> -t option gets rid of the heading and count
>> report at the bottom.  There is a blank line
>> at the bottom, however.  Is there any way to
>> have psql not give me that blank line?
>
> Doesn't look like it --- the final fputc('\n', fout); seems to be
> done unconditionally in all the output formats.  I wonder if we should
> change that?  I'm afraid it might break programs that are used to it :-(

Right. There's a simple pipeline way to get rid of it:

psql -t -f query.sql | sed -e '$d' > query.out


cheers

andrew

Re: [HACKERS] a few crazy ideas about hash joins

2009-04-02 Thread Heikki Linnakangas


Robert Haas wrote:

> While investigating some performance problems recently I've had cause
> to think about the way PostgreSQL uses hash joins.  So here are a few
> thoughts.  Some of these have been brought up before.
>
> 1. When the hash is not expected to spill to disk, it preserves the
> pathkeys of the outer side of the join.  If the optimizer were allowed
> to assume that, it could produce significantly more efficient query
> plans in some cases.  The problem is what to do if we start executing
> the query and find out that we have more stuff to hash than we expect,
> such that we need multiple batches?  Now the results won't be sorted.
> I think we could handle this as follows: Don't count on the hash join
> to preserve pathkeys unless it helps, and only rely on it when it
> seems as if the hash table will still fit even if it turns out to be,
> say, three times as big as expected.  But if you are counting on the
> hash join to preserve pathkeys, then pass that information to the
> executor.  When the executor is asked to perform a hash join, it will
> first hash the inner side of the relation.  At that point, we know
> whether we've successfully gotten everything into a single batch, or
> not.  If we have, perform the join normally.  If the worst has
> happened and we've gone multi-batch, then perform the join and sort
> the output before returning it.  The performance will suck, but at
> least you'll get the right answer.
>
> Previous in-passing reference to this idea here:
> http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php

Hmm, instead of sorting the output if the worst happens, a final merge
step as in a merge sort would be enough.

> 2. Consider building the hash table lazily.  I often see the query planner
> pick a hash join over a nested inner indexscan because it thinks that
> it'll save enough time making hash probes rather than index probes to
> justify the time spent building the hash table up front.  But
> sometimes the relation that's being hashed has a couple thousand rows,
> only a tiny fraction of which will ever be retrieved from the hash
> table.  We can predict when this is going to happen because n_distinct
> for the outer column will be much less than the size of the inner rel.
> In that case, we could consider starting with an empty hash table
> that effectively acts as a cache.  Each time a value is probed, we
> look it up in the hash table.  If there's no entry, we use an index
> scan to find the matching rows and insert them into the hash table.
> Negative results must also be cached.

Yeah, that would be quite nice. One problem is that our ndistinct
estimates are not very accurate.

> 3. Avoid building the exact same hash table twice in the same query.
> This happens more often than you'd think.  For example, a table may have
> two columns creator_id and last_updater_id which both reference person
> (id).  If you're considering a hash join between paths A and B, you
> could conceivably check whether what is essentially a duplicate of B
> has already been hashed somewhere within path A.  If so, you can reuse
> that same hash table at zero startup-cost.

That seems like quite a simple thing to do. But would it work for a
multi-batch hash table?

> 4. As previously discussed, avoid hashing for distinct and then
> hashing the results for a hash join on the same column with the same
> operators.

This seems essentially an extension of idea 3.
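The final-merge suggestion under idea 1 works for the following reason. In a multi-batch hash join, the outer input is scanned once, and the tuples of each batch are joined in the order they were read. Each batch's output is therefore already sorted on the outer pathkeys, so the batches only need to be merged, not re-sorted.

The C sketch below assumes each batch's output has been materialized as a sorted array of integer keys. BatchRun and merge_batches are hypothetical names. For brevity it picks the smallest head with a linear scan; a real implementation would more likely use a binary heap, as PostgreSQL's tuplesort does in its merge phase.

```c
#include <stddef.h>

/* One batch's join output, already sorted on the outer sort key. */
typedef struct
{
    const int  *keys;           /* sort keys of the batch's output rows */
    size_t      len;            /* number of rows in the batch */
    size_t      pos;            /* next row not yet emitted */
} BatchRun;

/*
 * Merge nruns sorted batch outputs into out, which must have room for the
 * total number of rows.  Returns the number of rows written.  Ties go to the
 * lower-numbered batch.  O(nruns) work per row, fine for a few batches.
 */
static size_t
merge_batches(BatchRun *runs, size_t nruns, int *out)
{
    size_t      n = 0;

    for (;;)
    {
        BatchRun   *best = NULL;

        /* Pick the run whose next key is smallest. */
        for (size_t i = 0; i < nruns; i++)
        {
            BatchRun   *r = &runs[i];

            if (r->pos < r->len &&
                (best == NULL || r->keys[r->pos] < best->keys[best->pos]))
                best = r;
        }
        if (best == NULL)
            return n;           /* every run is exhausted */
        out[n++] = best->keys[best->pos++];
    }
}
```

With batches {1, 4, 9}, {2, 3, 10} and {5}, the merged output is 1, 2, 3, 4, 5, 9, 10.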

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Documentation Update: Document pg_start_backup checkpoint behavior

2009-04-02 Thread Heikki Linnakangas


Bruce Momjian wrote:

> Michael Renner wrote:
>> + processing.  Unfortunately it's currently not possible to expedite
>> + the checkpointing done by pg_start_backup.
>
> I have combined the above patch with another change that reports a
> checkpoint is taking place:
>
> test=> select pg_start_backup('12');
> NOTICE:  performing checkpoint
>  pg_start_backup
> -----------------
>  0/220
> (1 row)

Rather than deplore that you can't expedite the checkpoint, why don't we 
just make it possible? It's trivial to do, and in hindsight I think we 
should've implemented that option when we got smoothed checkpoints. 
Let's just decide what the command should look like.


The first question is what the default behavior should be. We've seen
enough complaints, and I've been bitten by that myself during development
of other stuff often enough, that I think we should change the default to
immediate. From a backwards-compatibility point of view, we shouldn't
change the default, but then again an immediate checkpoint was what you
got before 8.3.


For the interface, I can see two options:

1. New function

pg_start_backup('label') -> immediate checkpoint
pg_start_backup_lazy('label') -> lazy checkpoint

2. New argument

pg_start_backup('label') -> immediate checkpoint
pg_start_backup('label', false) -> immediate checkpoint
pg_start_backup('label', true) -> lazy checkpoint

The first looks nicer, IMHO, because the word 'lazy' makes it 
self-documenting. In the second form, you have to look at the manual to 
figure out what the 2nd argument does.
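Either interface can sit on top of the same internal code path. The only difference is how the caller chooses whether the checkpoint is throttled. The C sketch below assumes a single internal helper that turns that choice into checkpoint-request flags.

The flag names and values are local stand-ins. PostgreSQL's own flags are CHECKPOINT_FORCE, CHECKPOINT_WAIT and CHECKPOINT_IMMEDIATE, and their values are not reproduced here. start_backup_flags and its wrappers are hypothetical.

```c
#include <stdbool.h>

/* Local stand-ins for checkpoint-request flags; values are illustrative. */
#define CKPT_FORCE      0x01    /* checkpoint even if the system is idle */
#define CKPT_WAIT       0x02    /* caller waits for the checkpoint to finish */
#define CKPT_IMMEDIATE  0x04    /* no throttling per checkpoint_completion_target */

/* Option 2: one entry point with a "lazy" argument, immediate by default. */
static int
start_backup_flags(bool lazy)
{
    int         flags = CKPT_FORCE | CKPT_WAIT;

    if (!lazy)
        flags |= CKPT_IMMEDIATE;
    return flags;
}

/* Option 1: two entry points that share the same helper. */
static int
pg_start_backup_flags(void)
{
    return start_backup_flags(false);
}

static int
pg_start_backup_lazy_flags(void)
{
    return start_backup_flags(true);
}
```

Both options reduce to the same flag computation, so the decision is purely about the SQL-level interface.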


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Documentation Update: Document pg_start_backup checkpoint behavior

2009-04-02 Thread Tom Lane

Bruce Momjian  writes:
> + ereport(NOTICE,
> + (errmsg("performing checkpoint")));

You've *got* to be kidding.

regards, tom lane


[HACKERS] patch for small omission in psql \? help

2009-04-02 Thread Andrew Gierth

Happened to notice this while looking for something else; the \ef
command appears to be missing from \? output. Suggested patch below.

-- 
Andrew (irc:RhodiumToad)

Index: src/bin/psql/help.c
===
RCS file: /projects/cvsroot/pgsql/src/bin/psql/help.c,v
retrieving revision 1.144
diff -c -r1.144 help.c
*** src/bin/psql/help.c	25 Mar 2009 13:15:55 -	1.144
--- src/bin/psql/help.c	3 Apr 2009 04:16:51 -
***
*** 175,180 
--- 175,181 
  
  	fprintf(output, _("Query Buffer\n"));
  	fprintf(output, _("  \\e [FILE]  edit the query buffer (or file) with external editor\n"));
+ 	fprintf(output, _("  \\ef [FUNCNAME] edit a function definition with external editor\n"));
  	fprintf(output, _("  \\p show the contents of the query buffer\n"));
  	fprintf(output, _("  \\r reset (clear) the query buffer\n"));
  #ifdef USE_READLINE


Re: [HACKERS] Documentation Update: Document pg_start_backup checkpoint behavior

2009-04-02 Thread Bruce Momjian

Michael Renner wrote:
> Hi,
> 
> small patch for the documentation describing the current pg_start_backup 
> checkpoint behavior as per 
> http://archives.postgresql.org//pgsql-general/2008-09/msg01124.php .
> 
> Should we note down a TODO to revisit the current checkpoint handling?
> 
> best regards,
> Michael

> diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
> index 02545f1..6ea9488 100644
> --- a/doc/src/sgml/backup.sgml
> +++ b/doc/src/sgml/backup.sgml
> @@ -737,12 +737,8 @@ SELECT pg_start_backup('label');
>   (see the configuration parameter
>   ).  Usually
>   this is what you want because it minimizes the impact on query
> - processing.  If you just want to start the backup as soon as
> - possible, execute a CHECKPOINT command
> - (which performs a checkpoint as quickly as possible) and then
> - immediately execute pg_start_backup.  Then there
> - will be very little for pg_start_backup's checkpoint
> - to do, and it won't take long.
> + processing.  Unfortunately it's currently not possible to expedite
> + the checkpointing done by pg_start_backup.
>  
> 
> 

I have combined the above patch with another change that reports a
checkpoint is taking place:

test=> select pg_start_backup('12');
NOTICE:  performing checkpoint
 pg_start_backup
-----------------
 0/220
(1 row)

Patch attached.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/backup.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/backup.sgml,v
retrieving revision 2.123
diff -c -c -r2.123 backup.sgml
*** doc/src/sgml/backup.sgml	5 Mar 2009 19:50:03 -	2.123
--- doc/src/sgml/backup.sgml	3 Apr 2009 03:35:42 -
***
*** 737,748 
   (see the configuration parameter
   ).  Usually
   this is what you want because it minimizes the impact on query
!  processing.  If you just want to start the backup as soon as
!  possible, execute a CHECKPOINT command
!  (which performs a checkpoint as quickly as possible) and then
!  immediately execute pg_start_backup.  Then there
!  will be very little for pg_start_backup's checkpoint
!  to do, and it won't take long.
  
 
 
--- 737,744 
   (see the configuration parameter
   ).  Usually
   this is what you want because it minimizes the impact on query
!  processing.  Unfortunately it's currently not possible to expedite
!  the checkpointing done by pg_start_backup.
  
 
 
Index: src/backend/access/transam/xlog.c
===
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.334
diff -c -c -r1.334 xlog.c
*** src/backend/access/transam/xlog.c	11 Mar 2009 23:19:24 -	1.334
--- src/backend/access/transam/xlog.c	3 Apr 2009 03:35:42 -
***
*** 6977,6982 
--- 6977,6984 
  	/* Ensure we release forcePageWrites if fail below */
  	PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) 0);
  	{
+ 		ereport(NOTICE,
+ (errmsg("performing checkpoint")));
  		/*
  		 * Force a CHECKPOINT.	Aside from being necessary to prevent torn
  		 * page problems, this guarantees that two successive backup runs will


Re: [HACKERS] More message encoding woes

2009-04-02 Thread Hiroshi Inoue


Hiroshi Inoue wrote:
> Heikki Linnakangas wrote:
>> Tom Lane wrote:
>>> Heikki Linnakangas writes:
>>>> Tom Lane wrote:
>>>>> Maybe use a special string "Translate Me First" that
>>>>> doesn't actually need to be end-user-visible, just so no one sweats
>>>>> over getting it right in context.
>>>>
>>>> Yep, something like that. There seems to be a magic empty string
>>>> translation at the beginning of every po file that returns the
>>>> meta-information about the translation, like translation author and
>>>> date. Assuming that works reliably, I'll use that.
>>>
>>> At first that sounded like an ideal answer, but I can see a gotcha:
>>> suppose the translation's author's name contains some characters that
>>> don't convert to the database encoding.  I suppose that would result in
>>> failure, when we'd prefer it not to.  A single-purpose string could be
>>> documented as "whatever you translate this to should be pure ASCII,
>>> never mind if it's sensible".
>>
>> I just tried that, and it seems that gettext() does transliteration,
>> so any characters that have no counterpart in the database encoding
>> will be replaced with something similar, or question marks. Assuming
>> that's universal across platforms, and I think it is, using the empty
>> string should work.
>>
>> It also means that you can use lc_messages='ja' with
>> server_encoding='latin1', but it will be unreadable because all the
>> non-ascii characters are replaced with question marks. For something
>> like lc_messages='es_ES' and server_encoding='koi8-r', it will still
>> look quite nice.
>>
>> Attached is a patch I've been testing. Seems to work quite well. It
>> would be nice if someone could test it on Windows, which seems to be a
>> bit special in this regard.
>
> Unfortunately it doesn't seem to work on Windows.


Is it inappropriate to call iconv_open() to check if the codeset is
valid for bind_textdomain_codeset()?
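One way to do the check Hiroshi describes is to try opening an iconv conversion descriptor into the candidate codeset and treat failure as "not supported". The C sketch below does that under two assumptions. First, the source encoding "UTF-8" is assumed, not taken from the thread. Second, the helper name codeset_is_valid is hypothetical.

It is an open question whether the names iconv accepts match those a given platform's bind_textdomain_codeset() accepts, particularly on Windows, where gettext usually comes from a separate libintl build. The sketch does not settle that question.

```c
#include <iconv.h>
#include <stdbool.h>

/*
 * Return true if the C library's iconv can convert from UTF-8 (assumed here
 * to be the message catalogs' encoding) to codeset.
 */
static bool
codeset_is_valid(const char *codeset)
{
    iconv_t     cd = iconv_open(codeset, "UTF-8");

    if (cd == (iconv_t) -1)
        return false;           /* conversion not supported (errno is EINVAL) */
    iconv_close(cd);
    return true;
}
```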

regards,
Hiroshi Inoue



Re: [HACKERS] benchmarking the query planner

2009-04-02 Thread Robert Haas

On Thu, Mar 19, 2009 at 4:04 AM, ITAGAKI Takahiro
 wrote:
> Robert Haas  wrote:
>> >> Works for me. Especially if you want to think more about ANALYZE before
>> >> changing that.
>> >
>> > Well, it's something that would be sane to contemplate adding in 8.4.
>> > It's way too late for any of this other stuff to happen in this release.
>>
>> I'm thinking about trying to implement this, unless someone else is
>> already planning to do it.  I'm not sure it's practical to think about
>> getting this into 8.4 at this point, but it's worth doing whether it
>> does or not.
>
> Can we use get_relation_stats_hook on 8.4? The pg_statistic catalog
> will be still modified by ANALYZEs, but we can rewrite the statistics
> just before it is used.
>
> your_relation_stats_hook(root, rte, attnum, vardata)
> {
>    Call default implementation;
>    if (rte->relid == YourRelation && attnum == YourColumn)
>        ((Form_pg_statistic) GETSTRUCT(vardata->statsTuple))->stadistinct =
>            YourNDistinct;
> }

I don't know, can you run a query from inside the stats hook?  It
sounds like this could be made to work for a hard-coded relation and
column, but ideally you'd like to get this data out of a table
somewhere.

I started implementing this by adding attdistinct to pg_attribute and
making it a float8, with 0 meaning "don't override the results of the
normal stats computation" and any other value meaning "override the
results of the normal stats computation with this value".  I'm not
sure, however, whether I can count on the result of an equality test
against a floating-point zero to be reliable on every platform.  It
also seems like something of a waste of space, since the only positive
values that are useful are integers (and presumably less than 2^31-1)
and the only negative values that are useful are > -1.  So I'm
thinking about making it an integer, to be interpreted as follows:

0 => compute ndistinct normally
positive value => use this value for ndistinct
negative value => use this value * 10^-6 for ndistinct
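The proposed integer encoding can be decoded in a few lines. The C sketch below is illustrative only. The function name and the range check are assumptions, and `computed` stands for whatever estimate the normal statistics code produced.

Negative results follow the pg_statistic.stadistinct convention, in which -x means x times the number of rows. Storing an integer also sidesteps the floating-point equality question, because comparing an int32 to 0 is exact.

```c
#include <stdint.h>

/*
 * Decode the proposed integer attdistinct setting:
 *   0        -> keep the normally computed ndistinct estimate
 *   positive -> use the value itself as ndistinct
 *   negative -> use value * 10^-6, a multiplier in [-1, 0) of the row
 *               count, as in pg_statistic.stadistinct
 */
static double
effective_ndistinct(int32_t attdistinct, double computed)
{
    if (attdistinct == 0)
        return computed;
    if (attdistinct > 0)
        return (double) attdistinct;
    /* Hypothetical policy: values below -1000000 (a multiplier below -1)
     * are meaningless, so treat them as "no override". */
    if (attdistinct < -1000000)
        return computed;
    return attdistinct * 1e-6;
}
```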

Any thoughts?

...Robert


[HACKERS] a few crazy ideas about hash joins

2009-04-02 Thread Robert Haas

While investigating some performance problems recently I've had cause
to think about the way PostgreSQL uses hash joins. So here are a few
thoughts. Some of these have been brought up before.

1. When the hash is not expected to spill to disk, it preserves the
pathkeys of the outer side of the join. If the optimizer were allowed
to assume that, it could produce significantly more efficient query
plans in some cases. The problem is what to do if we start executing
the query and find out that we have more stuff to hash than we expect,
such that we need multiple batches? Now the results won't be sorted.
I think we could handle this as follows: Don't count on the hash join
to preserve pathkeys unless it helps, and only rely on it when it
seems as if the hash table will still fit even if it turns out to be,
say, three times as big as expected. But if you are counting on the
hash join to preserve pathkeys, then pass that information to the
executor. When the executor is asked to perform a hash join, it will
first hash the inner side of the relation. At that point, we know
whether we've successfully gotten everything into a single batch, or
not. If we have, perform the join normally. If the worst has
happened and we've gone multi-batch, then perform the join and sort
the output before returning it. The performance will suck, but at
least you'll get the right answer.

Previous in-passing reference to this idea here:
http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php

2. Consider building the hash table lazily. I often see the query planner
pick a hash join over a nested inner indexscan because it thinks that
it'll save enough time making hash probes rather than index probes to
justify the time spent building the hash table up front. But
sometimes the relation that's being hashed has a couple thousand rows,
only a tiny fraction of which will ever be retrieved from the hash
table. We can predict when this is going to happen because n_distinct
for the outer column will be much less than the size of the inner rel.
In that case, we could consider starting with an empty hash table
that effectively acts as a cache. Each time a value is probed, we
look it up in the hash table. If there's no entry, we use an index
scan to find the matching rows and insert them into the hash table.
Negative results must also be cached.

3. Avoid building the exact same hash table twice in the same query.
This happens more often than you'd think. For example, a table may have
two columns creator_id and last_updater_id which both reference person
(id). If you're considering a hash join between paths A and B, you
could conceivably check whether what is essentially a duplicate of B
has already been hashed somewhere within path A. If so, you can reuse
that same hash table at zero startup-cost.

4. As previously discussed, avoid hashing for distinct and then
hashing the results for a hash join on the same column with the same
operators.

http://archives.postgresql.org/message-id/4136ffa0902191346g62081081v8607f0b92c206...@mail.gmail.com
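Idea 2 above, building the hash table lazily, amounts to putting a memoizing cache in front of the inner index scan. The C sketch below makes several simplifying assumptions:

- join keys are integers;
- the cache is a fixed-size open-addressing table;
- the index lookup returns only the number of matching inner rows, not the rows themselves.

LazyHash, lazy_hash_probe and demo_index_lookup are hypothetical names. The demo index stands in for a real inner index scan.

```c
#include <stdbool.h>
#include <stddef.h>

#define LAZY_HASH_SIZE 64       /* must be a power of two */

typedef struct
{
    bool        used;           /* slot holds a cached probe result */
    int         key;            /* outer join key that was probed */
    int         nmatches;       /* inner rows found; 0 is a cached miss */
} LazyEntry;

typedef struct
{
    LazyEntry   slots[LAZY_HASH_SIZE];
    int         (*index_lookup) (int key);  /* stand-in for an inner index scan */
} LazyHash;

static void
lazy_hash_init(LazyHash *h, int (*index_lookup) (int key))
{
    for (size_t i = 0; i < LAZY_HASH_SIZE; i++)
        h->slots[i].used = false;
    h->index_lookup = index_lookup;
}

/* Return the number of inner matches for key, scanning the index once per key. */
static int
lazy_hash_probe(LazyHash *h, int key)
{
    size_t      i = ((unsigned) key * 2654435761u) & (LAZY_HASH_SIZE - 1);

    for (size_t n = 0; n < LAZY_HASH_SIZE; n++)
    {
        LazyEntry  *e = &h->slots[i];

        if (!e->used)
        {
            /* First probe for this key: do the index scan and cache the
             * result, including a result of zero (negative caching). */
            e->used = true;
            e->key = key;
            e->nmatches = h->index_lookup(key);
            return e->nmatches;
        }
        if (e->key == key)
            return e->nmatches; /* cache hit, positive or negative */
        i = (i + 1) & (LAZY_HASH_SIZE - 1);     /* linear probing */
    }
    return h->index_lookup(key);        /* table full: probe without caching */
}

/* Hypothetical inner index for the demo: even keys have 3 matches, odd none. */
static int  demo_index_calls = 0;

static int
demo_index_lookup(int key)
{
    demo_index_calls++;
    return (key % 2 == 0) ? 3 : 0;
}
```

Probing keys 4, 4, 7, 7 triggers only two index lookups. The design choice that matters is caching misses as well as hits. Without it, every outer row whose key has no inner match would repeat the index scan.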

Thoughts on the value and/or complexity of implementation of any of these?

...Robert

