Re: [GENERAL] overhead of plpgsql functions over simple select

2008-10-24 Thread Ivan Sergio Borgonovo
On Fri, 24 Oct 2008 07:03:35 +0200
"Pavel Stehule" <[EMAIL PROTECTED]> wrote:

> 2008/10/24 Tom Lane <[EMAIL PROTECTED]>:
> > "Pavel Stehule" <[EMAIL PROTECTED]> writes:
> >> postgres=# create function simplefce(a int, b int) returns int
> >> as $$select $1 + $2$$ language sql immutable strict;
> >> CREATE FUNCTION
> >> postgres=# create function simplefce1(a int, b int) returns int
> >> as $$begin return a+b; end;$$ language plpgsql immutable strict;
> >> CREATE FUNCTION
> >
> > That's a pretty unfair comparison, because that SQL function is
> > simple enough to be inlined.  The place to use plpgsql is when
> > you need some procedural logic; at which point a SQL function
> > simply fails to provide the required functionality.
> >
> 
> Yes, this test is maximally unfair to plpgsql - the function is far
> too simple. But that was the original question: what is the overhead
> of a plpgsql call for a simple function? For any even slightly more
> complicated function the relative overhead will be smaller. And this
> example shows the point of using SQL functions.

It's just one case. Furthermore, I was interested in a plain select
statement vs. a plpgsql function encapsulating a simple select
statement. But since we are at it, it would be nice to have a larger
picture.

I avoided writing a test simply because I didn't know what to test.

E.g. if I'm using a stable function that returns records, plpgsql
functions are more complicated just to interpret - they are simply
longer - and, as I'm learning now, they can't be inlined while sql
functions can (am I right?).

To make a meaningful test I should know the potential factors that
make the difference between the two (three actually: a plain sql
statement, sql functions and plpgsql functions).

I can't even tell whether all immutable sql functions can be inlined.
The more fields that are returned (unless I have a custom type or a
matching table), the longer the plpgsql function gets, etc...

I couldn't think of anything other than the cost of interpretation
(or does PostgreSQL have some sort of JIT?) and the cost of the call
itself that could account for the difference.

I still can't tell when, and whether, this is going to make a
difference.
I did understand that, at least for immutable functions, sql is
faster. I ran some simple tests and sql looks roughly three times
faster. With higher row counts the difference seems to get smaller,
maybe because of the higher cost of allocating memory in
generate_series(?).
So I know that immutable simple(?) functions are much faster in
sql... anything else to avoid? What are the factors that play a role
in execution times?
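
For reference, a minimal timing harness for exactly this comparison could
look like the one below (a sketch assuming the simplefce/simplefce1
functions from Pavel's example above; the absolute numbers will of course
vary):

  \timing on

  -- plain expression, no function call at all
  SELECT sum(i + i) FROM generate_series(1, 100000) AS g(i);

  -- SQL function: simple enough to be inlined into the query
  SELECT sum(simplefce(i, i)) FROM generate_series(1, 100000) AS g(i);

  -- plpgsql function: pays a per-call interpretation overhead
  SELECT sum(simplefce1(i, i)) FROM generate_series(1, 100000) AS g(i);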

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] PostgreSQL 8.3.4 Solaris x86 compilation issues

2008-10-24 Thread Dot Yet
Hi everyone,
I am facing a problem while compiling PostgreSQL 8.3.4 on Solaris 10 x86
(10u5). The compiler is Sun Studio 12.

The compilation finishes without errors, but make check fails:

OpenSSL 0.9.8i compiled as:
./Configure --prefix=/opt/usr/local -m64 -xmodel=medium

PostgreSQL 8.3.4 compiled as:
./configure --prefix=/opt/usr/local --with-libs=/opt/usr/local/lib
--without-readline --with-openssl CFLAGS=" -m64 -xmodel=medium"

The error happens while running make check:

./pg_regress --temp-install=./tmp_check --top-builddir=../../..
--srcdir=/export/home/dotyet/pginst/postgresql-8.3.4/src/test/regress
--temp-port=55432 --schedule=./parallel_schedule --multibyte=SQL_ASCII
--load-language=plpgsql
== removing existing temp installation==
== creating temporary installation==
== initializing database system   ==
== starting postmaster==
Killed
Killed
Killed
Killed
Killed
Killed

if I remove the --with-openssl, everything works as expected.

any thoughts?

thanks in advance.
dotyet


Re: [GENERAL] Storing questionnaire data

2008-10-24 Thread Thom Brown
Thanks David and Jeff.

I can see your point.  The provided link might actually be useful,
although I think I'd make some changes to it.

I wouldn't have trouble data-mining such a structure for individual
questionnaire results.  The planner will be shrugging its shoulders,
but I haven't actually tested that solution with many massive
questionnaires for its query performance.

I pretty much have my answer.  Thanks for your input guys.

Thom

On Fri, Oct 24, 2008 at 12:31 AM, Jeff Davis <[EMAIL PROTECTED]> wrote:
> On Thu, 2008-10-23 at 21:38 +0100, Thom Brown wrote:
>> I'm afraid such a rigid structure is completely tailored for a
>> specific questionnaire.  What if I, or even a client, wanted to
>> generate different questionnaires?  I would like the data to indicate
>> the flow of questions and answers rather than just use the database as
>> pure storage for a completely coded solution.
>
> If you construct a schema in such a way that there's really no
> constraint on the data at all, then the user of that permissive schema
> is effectively designing the database.
>
> This is not necessarily a bad thing (or rather, it may not be
> avoidable). For instance, if you have no idea what kind of questions
> might be asked by the questionnaire, nor any idea what kind of questions
> might be asked about the responses they receive to the questionnaire,
> there's not much you can do. Pretty much anything is going to look a lot
> like EAV.
>
> The cost of this, however, is that you (as the designer of the
> permissive schema) can no longer see any meaning in the data at all. You
> may be able to dig around manually a bit and find out a few specific
> things, but you can't do it in any automated way. This is because you
> aren't the real designer of the database, you've passed that job along
> to your users. Only they (hopefully) have any idea what it might mean.
> The users might not be good database designers, in which case they'll
> end up with a mess, and you won't be able to help them.
>
> Also, as a performance matter, the optimizer also has no idea what your
> data means, and so it can't take any useful shortcuts. So, it will
> probably be slow.
>
> The best you can really do is try to find whatever basic meaning you
> can. Usually there is something there: there are basic data types people
> will want (e.g. string, numeric, timestamp). There are questions,
> perhaps groups of questions, order in which the questions should be
> asked, order in which the questions are answered, time the question was
> answered, and respondents. There is at most one answer per question
> per respondent. Try to piece this stuff together in such a way as to
> provide maximum meaning to you (and to PostgreSQL) without destroying
> the usefulness to your customers.
>
> I think the article David mentioned:
> http://www.varlena.com/GeneralBits/110.php
> is a pretty reasonable compromise for many use-cases. Perhaps more can
> be done, but usually questionnaires are either too unimportant to really
> dig in, or so important that designing a database around it is the
> obvious thing to do.
>
> Regards,
>Jeff Davis
>
>
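
A minimal sketch of the kind of middle ground Jeff describes above (all
table, column and type names here are invented, so treat it as one possible
shape rather than the design):

  CREATE TABLE question (
      question_id    serial PRIMARY KEY,
      question_group text,
      ordinal        int  NOT NULL,   -- order in which questions are asked
      prompt         text NOT NULL,
      answer_type    text NOT NULL    -- e.g. 'string', 'numeric', 'timestamp'
  );

  CREATE TABLE respondent (
      respondent_id  serial PRIMARY KEY,
      name           text
  );

  CREATE TABLE answer (
      question_id    int NOT NULL REFERENCES question,
      respondent_id  int NOT NULL REFERENCES respondent,
      answered_at    timestamptz NOT NULL DEFAULT now(),
      value_text     text,
      value_num      numeric,
      value_ts       timestamptz,
      PRIMARY KEY (question_id, respondent_id)
  );

The primary key on (question_id, respondent_id) is what enforces the "at
most one answer per question per respondent" rule mentioned above.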

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] partitioning question. need current month and archive partitions.

2008-10-24 Thread Sergey Levchenko
Hi All!

I have a table - a transaction pool - with a lot of rows, but my
computations only use data for the latest month or the current year.
How can I split the data into partitions like the ones below, given that
I can't use CHECK constraints with non-constant expressions such as
extract('month' from CURRENT_DATE) or extract('year' from CURRENT_DATE):

  transaction_pool - master table

  transaction_currentmonth - table containing transactions committed
_only_ in this (current) month
  transaction_currentyear - table containing all transactions for this
year except for the transaction_currentmonth rows

  transaction_archive - table containing everything except for the
transaction_currentmonth and transaction_currentyear rows
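
For concreteness, in 8.x this layout is usually built with inheritance and
constant CHECK constraints - a sketch with invented column names and
hard-coded date ranges; precisely because the constraints cannot reference
CURRENT_DATE, a maintenance job has to move rows and redefine the
constraints whenever the month or year rolls over:

  CREATE TABLE transaction_pool (
      txn_id    bigint,
      txn_date  date NOT NULL,
      amount    numeric
      -- ... other columns ...
  );

  -- current month (the range must be a constant, e.g. October 2008)
  CREATE TABLE transaction_currentmonth (
      CHECK (txn_date >= DATE '2008-10-01' AND txn_date < DATE '2008-11-01')
  ) INHERITS (transaction_pool);

  -- rest of the current year
  CREATE TABLE transaction_currentyear (
      CHECK (txn_date >= DATE '2008-01-01' AND txn_date < DATE '2008-10-01')
  ) INHERITS (transaction_pool);

  -- everything older
  CREATE TABLE transaction_archive (
      CHECK (txn_date < DATE '2008-01-01')
  ) INHERITS (transaction_pool);

With constraint_exclusion enabled, queries that filter on a constant date
range then skip the irrelevant children.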

Will it be helpful to make such a kind of partitioning?

Thanks a lot!

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query m:n-Combination

2008-10-24 Thread Ludwig Kniprath

Dear list,
I have to solve a simple problem, explained below with some sample data.

A typical m:n constellation: rivers in one table, communities in the
other table, and the m:n join information (which river runs through
which community) in a third table.


Table rivers:
R_ID  R_Name
1 river_1
2 river_2
3 river_3
4 river_4
5 river_5

Table communities :
C_ID   C_Name
1  community_1
2  community_2
3  community_3
4  community_4
5  community_5

Join-table
mn_2_r_id   mn_2_c_id
1   1
1   2
1   3
1   4
2   1
3   2
3   5
4   3
...

(In the real database this is a GIS relation with thousands of rivers
and communities, related by a spatial join, but the problem is the
same...)


I want to know: which river runs through communities 1, 2, 3 *and* 4?
You can see the solution just by looking at the data above (only
"river_1" runs through all of these communities), but how do I query
this in SQL?


Thanks in advance
Ludwig

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need Tool to sync databases with 8.3.1

2008-10-24 Thread Stefan Sturm
Hello,

> reopening this thread, because I just received an e-mail from EMS saying
> they just released EMS DB Comparer with PostgreSQL 8.3 support.
> I'm going to evaluate it, and I figured people here may be interested.
> http://sqlmanager.net/en/products/postgresql/dbcomparer/download

this app looks and works really well, but there is no version for OS X :-(
Does anybody know of an application like this for OS X?

Thanks,
Stefan Sturm

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] docbook xml into/out-of sql-tables

2008-10-24 Thread Peter Eisentraut

Otto Hirr wrote:

I'm looking for pointers to info on storing / retrieving DocBook, or other
document type systems, in SQL tables.


Make a column of type xml and store it there.  But we don't have schema 
validation for xml data yet.
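
For example (a minimal sketch - the table and column names are made up; in
8.3 the xml type checks well-formedness on input, but not validity against
a DTD or schema):

  CREATE TABLE docbook_docs (
      id   serial PRIMARY KEY,
      doc  xml NOT NULL
  );

  INSERT INTO docbook_docs (doc)
  VALUES (xmlparse(document '<book><title>An example</title></book>'));

A plain cast ('<book/>'::xml) works as well; well-formedness is checked
either way.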



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query m:n-Combination

2008-10-24 Thread Albe Laurenz
Ludwig Kniprath wrote:
> A typical M:N-constellation, rivers in one table, communities in the 
> other table, m:n-join-informations (which river is running in which 
> community) in a third table.
> 
> Table rivers:
> R_ID  R_Name
> 1 river_1
> 2 river_2
> 3 river_3
> 4 river_4
> 5 river_5
> 
> Table communities :
> C_ID   C_Name
> 1  community_1
> 2  community_2
> 3  community_3
> 4  community_4
> 5  community_5
> 
> Join-table
> mn_2_r_id   mn_2_c_id
> 1   1
> 1   2
> 1   3
> 1   4
> 2   1
> 3   2
> 3   5
> 4   3
> ...
> 
> I want to know, which river is running through communities 
> 1,2,3 *and* 4?
> You can see the solution by just looking at the data above (only 
> "river_1" is running through all these countries), but how to 
> query this by sql?

SELECT r.r_name FROM rivers AS r
  JOIN "join-table" j1 ON (r.r_id = j1.mn_2_r_id)
  JOIN "join-table" j2 ON (r.r_id = j2.mn_2_r_id)
  JOIN "join-table" j3 ON (r.r_id = j3.mn_2_r_id)
  JOIN "join-table" j4 ON (r.r_id = j4.mn_2_r_id)
WHERE j1.mn_2_c_id = (SELECT c_id FROM communities WHERE c_name = 'community_1')
  AND j2.mn_2_c_id = (SELECT c_id FROM communities WHERE c_name = 'community_2')
  AND j3.mn_2_c_id = (SELECT c_id FROM communities WHERE c_name = 'community_3')
  AND j4.mn_2_c_id = (SELECT c_id FROM communities WHERE c_name = 'community_4')

(untested)

Is that what you are looking for?

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query m:n-Combination

2008-10-24 Thread Thomas Markus

hi,

try

select
    r.*
from
    rivers r
    join jointable j1 on r.r_id = j1.mn_2_r_id
    join communities c1 on j1.mn_2_c_id = c1.c_id and c1.C_Name = 'community_1'
    join jointable j2 on r.r_id = j2.mn_2_r_id
    join communities c2 on j2.mn_2_c_id = c2.c_id and c2.C_Name = 'community_2'
    join jointable j3 on r.r_id = j3.mn_2_r_id
    join communities c3 on j3.mn_2_c_id = c3.c_id and c3.C_Name = 'community_3'
    join jointable j4 on r.r_id = j4.mn_2_r_id
    join communities c4 on j4.mn_2_c_id = c4.c_id and c4.C_Name = 'community_4'

where
    r.R_Name = 'river_1'

/tm


Ludwig Kniprath wrote:

Dear list,
I have to solve a simple Problem, explained below with some sample-Data.

A typical M:N-constellation, rivers in one table, communities in the 
other table, m:n-join-informations (which river is running in which 
community) in a third table.


Table rivers:
R_ID  R_Name
1 river_1
2 river_2
3 river_3
4 river_4
5 river_5

Table communities :
C_ID   C_Name
1  community_1
2  community_2
3  community_3
4  community_4
5  community_5

Join-table
mn_2_r_id   mn_2_c_id
1   1
1   2
1   3
1   4
2   1
3   2
3   5
4   3
...

(in real database this relation is an gis-relation with thousands of 
rivers and countries, related by spatial join, but the problem is the 
same...)


I want to know, which river is running through communities 1,2,3 *and* 4?
You can see the solution by just looking at the data above (only 
"river_1" is running through all these countries), but how to query 
this by sql?


Thanks in advance
Ludwig





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query m:n-Combination

2008-10-24 Thread Sam Mason
On Fri, Oct 24, 2008 at 03:05:33PM +0200, Albe Laurenz wrote:
> Ludwig Kniprath wrote:
> > I want to know, which river is running through communities 
> > 1,2,3 *and* 4?
> > You can see the solution by just looking at the data above (only 
> > "river_1" is running through all these countries), but how to 
> > query this by sql?
> 
> SELECT r.r_name FROM rivers AS r
>   JOIN "join-table" j1 ON (r.r_id = j1.mn_2_r_id)
>   JOIN "join-table" j2 ON (r.r_id = j2.mn_2_r_id)
>   JOIN "join-table" j3 ON (r.r_id = j3.mn_2_r_id)
>   JOIN "join-table" j4 ON (r.r_id = j4.mn_2_r_id)
> WHERE j1.mn_2_c_id = (SELECT c_id FROM communities WHERE c_name = 'community_1')
>   AND j2.mn_2_c_id = (SELECT c_id FROM communities WHERE c_name = 'community_2')
>   AND j3.mn_2_c_id = (SELECT c_id FROM communities WHERE c_name = 'community_3')
>   AND j4.mn_2_c_id = (SELECT c_id FROM communities WHERE c_name = 'community_4')

An alternative would be to move the work into the aggregation stage:

  SELECT r.r_id
  FROM rivers r, communities c, "join-table" j
  WHERE r.r_id = j.mn_2_r_id
    AND c.c_id = j.mn_2_c_id
  GROUP BY r.r_id
  HAVING bool_or(c.c_name = 'community_1')
     AND bool_or(c.c_name = 'community_2')
     AND bool_or(c.c_name = 'community_3')
     AND bool_or(c.c_name = 'community_4');

You may need to put a "c.c_name IN ('community_1', 'community_2'"...
expression into the WHERE clause to give the planner some traction to
optimize things, but it's not needed for correctness.
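
Putting those two suggestions together, the combined form would look
something like this (untested sketch):

  SELECT r.r_id
  FROM rivers r, communities c, "join-table" j
  WHERE r.r_id = j.mn_2_r_id
    AND c.c_id = j.mn_2_c_id
    AND c.c_name IN ('community_1', 'community_2',
                     'community_3', 'community_4')
  GROUP BY r.r_id
  HAVING bool_or(c.c_name = 'community_1')
     AND bool_or(c.c_name = 'community_2')
     AND bool_or(c.c_name = 'community_3')
     AND bool_or(c.c_name = 'community_4');

With the IN filter in place, the HAVING list could equally be collapsed to
HAVING count(DISTINCT c.c_name) = 4.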


  Sam

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Escape wildcard problems.

2008-10-24 Thread Gauthier, Dave
I read in the docs (section 9.7.1) that the backslash... \ ... is the default 
escape char to use in "like" expressions.  Yet when I try it, it doesn't seem 
to work the way I expect.  Here's an example...

select name from templates where name like '%\_cont\_%';

  name
--
cgidvcontrol
x8idvcontrol
etc

I would expect to NOT see these because the "cont" is not preceded by and 
followed by an underscore (because I escaped them with \).

Please advise.

Thanks
-dave



Re: [GENERAL] Annoying Reply-To

2008-10-24 Thread Michelle Konzack
On 2008-10-23 15:52:30, ries van Twisk wrote:
> anyways.. I don't care anymore... I will do a reply all.

I do normally:  killall   ;-)

Thanks, Greetings and nice Day/Evening
Michelle Konzack
Systemadministrator
24V Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
+49/177/935194750, rue de Soultz MSN LinuxMichi
+33/6/61925193 67100 Strasbourg/France   IRC #Debian (irc.icq.com)




Re: [GENERAL] Query m:n-Combination

2008-10-24 Thread Harald Fuchs
In article <[EMAIL PROTECTED]>,
Ludwig Kniprath <[EMAIL PROTECTED]> writes:

> Dear list,
> I have to solve a simple Problem, explained below with some sample-Data.

> A typical M:N-constellation, rivers in one table, communities in the
> other table, m:n-join-informations (which river is running in which
> community) in a third table.

> Table rivers:
> R_ID  R_Name
> 1 river_1
> 2 river_2
> 3 river_3
> 4 river_4
> 5 river_5

> Table communities :
> C_ID   C_Name
> 1  community_1
> 2  community_2
> 3  community_3
> 4  community_4
> 5  community_5

> Join-table
> mn_2_r_id   mn_2_c_id
> 1   1
> 1   2
> 1   3
> 1   4
> 2   1
> 3   2
> 3   5
> 4   3
> ...

> (in real database this relation is an gis-relation with thousands of
> rivers and countries, related by spatial join, but the problem is the
> same...)

> I want to know, which river is running through communities 1,2,3 *and* 4?
> You can see the solution by just looking at the data above (only
> "river_1" is running through all these countries), but how to query
> this by sql?

Probably the fastest way is to do an OR join and count the matches:

  SELECT r.r_name
  FROM rivers r
  JOIN join_table j ON j.mn_2_r_id = r.r_id
  JOIN communities c ON c.c_id = j.mn_2_c_id
  WHERE c.c_name IN ('community_1', 'community_2',
                     'community_3', 'community_4')
  GROUP BY r.r_name
  HAVING count(*) = 4


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Escape wildcard problems.

2008-10-24 Thread Sam Mason
On Fri, Oct 24, 2008 at 08:12:38AM -0700, Gauthier, Dave wrote:
> select name from templates where name like '%\_cont\_%';
> 
>   name
> --
> cgidvcontrol
> x8idvcontrol
> etc
> 
> I would expect to NOT see these because the "cont" is not preceded by
> and followed by an underscore (because I escaped them with \).

You need to escape the escape!  Backslash is the escape character in
string literals as well as in LIKE patterns, so you need to double it up.
I think you want to be doing:

  name LIKE '%\\_cont\\_%'



  Sam

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Escape wildcard problems.

2008-10-24 Thread Alan Hodgson
On Friday 24 October 2008, "Gauthier, Dave" <[EMAIL PROTECTED]> wrote:
> I read in the docs (section 9.7.1) that the backslash... \ ... is the
> default escape char to use in "like" expressions.  Yet when I try it, it
> doesn't seem to work the ay I expect.  Here's an example...
>
> select name from templates where name like '%\_cont\_%';
>

Use double \\ for underscores. I don't know why it's necessary, but it works 
here.


-- 
Alan

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Escape wildcard problems.

2008-10-24 Thread Craig Ringer
Alan Hodgson wrote:
> On Friday 24 October 2008, "Gauthier, Dave" <[EMAIL PROTECTED]> wrote:
>> I read in the docs (section 9.7.1) that the backslash... \ ... is the
>> default escape char to use in "like" expressions.  Yet when I try it, it
>> doesn't seem to work the ay I expect.  Here's an example...
>>
>> select name from templates where name like '%\_cont\_%';
>>
> 
> Use double \\ for underscores. I don't know why it's necessary, but it works 
> here.

Here's why. See the documentation for more information:


craig=> show standard_conforming_strings;
 standard_conforming_strings
-----------------------------
 off
(1 row)

craig=> SELECT '%\_cont\_%';
WARNING:  nonstandard use of escape in a string literal
LINE 1: SELECT '%\_cont\_%';
               ^
HINT:  Use the escape string syntax for escapes, e.g., E'\r\n'.
 ?column?
----------
 %_cont_%
(1 row)

craig=> SELECT E'%\\_cont\\_%';
  ?column?
------------
 %\_cont\_%
(1 row)

craig=> set standard_conforming_strings = 1;
SET

craig=> SELECT '%\_cont\_%';
  ?column?
------------
 %\_cont\_%
(1 row)



--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Escape wildcard problems.

2008-10-24 Thread Thom Brown
Or you could use:

SELECT name
FROM templates
WHERE name ~ '_cont_';

This matches it as a regular expression; underscore is not a special
character in a regex, so it doesn't need escaping there.  Use ~* instead
of ~ if you want a case-insensitive match.

On Fri, Oct 24, 2008 at 5:07 PM, Craig Ringer
<[EMAIL PROTECTED]> wrote:
> Alan Hodgson wrote:
>> On Friday 24 October 2008, "Gauthier, Dave" <[EMAIL PROTECTED]> wrote:
>>> I read in the docs (section 9.7.1) that the backslash... \ ... is the
>>> default escape char to use in "like" expressions.  Yet when I try it, it
>>> doesn't seem to work the ay I expect.  Here's an example...
>>>
>>> select name from templates where name like '%\_cont\_%';
>>>
>>
>> Use double \\ for underscores. I don't know why it's necessary, but it works
>> here.
>
> Here's why. See the documentation for more information:
>
>
> craig=> show standard_conforming_strings;
>  standard_conforming_strings
> -----------------------------
>  off
> (1 row)
>
> craig=> SELECT '%\_cont\_%';
> WARNING:  nonstandard use of escape in a string literal
> LINE 1: SELECT '%\_cont\_%';
>                ^
> HINT:  Use the escape string syntax for escapes, e.g., E'\r\n'.
>  ?column?
> ----------
>  %_cont_%
> (1 row)
>
> craig=> SELECT E'%\\_cont\\_%';
>   ?column?
> ------------
>  %\_cont\_%
> (1 row)
>
> craig=> set standard_conforming_strings = 1;
> SET
>
> craig=> SELECT '%\_cont\_%';
>   ?column?
> ------------
>  %\_cont\_%
> (1 row)
>
>
>
> --
> Craig Ringer
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] max time in a table query takes ages

2008-10-24 Thread Scott Ribe
> Can postgres use combined indices for queries that would only constrain
> part of them?

Even if not, if there is at least one index that reduces the potential
matches to a small set, then scanning those rows against the other criteria
won't take so long. (Assuming good stats and PG choosing a good plan.)
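
For what it's worth, the quickest way to see whether a combined index gets
used for a query that only constrains some of its columns is to ask the
planner directly - a hypothetical example:

  -- hypothetical table and two-column index
  CREATE TABLE demo (a int, b int, payload text);
  CREATE INDEX demo_a_b_idx ON demo (a, b);

  -- constrain only the leading column and look at the chosen plan
  EXPLAIN SELECT * FROM demo WHERE a = 42;

  -- constrain only the trailing column, for comparison
  EXPLAIN SELECT * FROM demo WHERE b = 42;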

-- 
Scott Ribe
[EMAIL PROTECTED]
http://www.killerbytes.com/
(303) 722-0567 voice



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres optimizer choosing wrong index

2008-10-24 Thread Jack Orenstein

Tom Lane wrote:

Jack Orenstein <[EMAIL PROTECTED]> writes:

I may have simplified too far. Our application runs a number of
different queries. All our WHERE clauses restrict dh and fh. For a
given pair of (dh, fh) values, the initial query should come up empty
and then insert this pair, and then there is further processing
(SELECT, UPDATE). Something is causing a huge number of index row
reads (according to pg_stat_user_indexes) but only in tables that have
been vacuumed.


Well, that's a bit more concrete but it's still difficult to tell where
the problem is.  Are you by any chance creating new tables and then
vacuuming them while they're still empty?  That would cause
pg_class.relpages to get set to zero, and 7.4.x is not bright enough to
change plans when you later fill the table (until you do another vacuum
or analyze).  However, I think 7.4 would always choose a seqscan on a
table it thinks is zero-size, so I'm not sure that that's what's going
on here.


I've collected much more detailed information on this problem.

(Recap: In 7.4, table T(int dh, int fh, int nm, ...) has indexes idx_df on (dh, 
fh), and idx_dn on (dh, nm). Queries of the form "... where dh = ? and fh = ?" 
appear to be using the (dh, nm) index leading to horrendous performance in cases 
where there are very few unique dh values, and T has been vacuumed.)


The optimizer is definitely picking the wrong index even when the query 
restricts dh to a value actually present in the table. (Yesterday, I was misled 
by running EXPLAIN with a value not in the table.) Tom's speculation above is 
mostly correct except that, as shown below, the optimizer is not choosing a seqscan.


EXPERIMENT:

- I created two schemas, NOVAC and VAC, each with a table T as described above.

- Before loading data, I ran VACUUM ANALYZE on VAC.T.

- I then started loading data. The workload is a mixture of INSERT,   SELECT and 
UPDATE. For SELECT and UPDATE the WHERE clause always includes "dh = ? and fh = ?".


PG_STAT RESULTS:

Stats collection was enabled. Here is sample output during the run, (yes, those 
are real numbers, I just happened to run the query when the 2nd row counters hit 
those nice round numbers).


ris=# select schemaname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
      from pg_stat_user_indexes
      where schemaname in ('novac', 'vac') and relname = 'external_file'
      order by 1, 2;

 schemaname | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch
------------+--------------+----------+--------------+---------------
 novac      | idx_dn       |        0 |            0 |             0
 novac      | idx_dh       |    25000 |            2 |             2
 vac        | idx_dn       |    16093 |     25991728 |      25991728
 vac        | idx_dh       |        0 |            0 |             0

This shows the different index usage in the two schemas. The schema containing 
the vacuumed table used idx_dn exclusively; in the other schema, idx_dh was used 
exclusively.


EXPLAIN PLAN RESULTS:

This query shows the distribution of T.dh values in NOVAC:

ris=# select dh, count(*) from novac.t group by dh;
    dh     | count
-----------+-------
 280433162 |  5000
 601454890 |  5000

Using one of these dh values, the optimizer selects idx_df as I expected (the fh 
values are nearly unique; 0 is the median of the possible values):


ris=# explain select * from novac.t where dh = 280433162 and fh = 0;
                            QUERY PLAN
-------------------------------------------------------------------
 Index Scan using idx_df on t  (cost=0.00..4.83 rows=1 width=454)
   Index Cond: ((dh = 280433162) AND (fh = 0))
(2 rows)

But in the VAC schema:

ris=# select dh, count(*) from vac.t group by dh;
    dh     | count
-----------+-------
 758082190 |  5000
 980351022 |  5000
(2 rows)

ris=# explain select * from vac.t where dh = 758082190 and fh = 0;
                            QUERY PLAN
-------------------------------------------------------------------
 Index Scan using idx_dn on t  (cost=0.00..3.68 rows=1 width=454)
   Index Cond: (dh = 758082190)
   Filter: (fh = 0)
(3 rows)

Here are page and tuple counts:

ris=# select n.nspname, relname, relpages, reltuples
      from pg_class, pg_namespace n
      where relname = 't' and relnamespace = n.oid;

 nspname | relname | relpages | reltuples
---------+---------+----------+-----------
 novac   | t       |       10 |      1000
 vac     | t       |        0 |         0

From Tom Lane's earlier email, the vacuum prior to the load explains the 0s in 
the second row.


pg_stats has no data for table T in either schema. In the case of the VAC 
schema, I'm guessing this is because the table was actually empty when analyzed.



CONCLUSION:

The optimizer is objectively making the wrong choice. The pg_stat results 
included above, (as well as t

Re: [GENERAL] max time in a table query takes ages

2008-10-24 Thread Grzegorz Jaśkiewicz
That index did the job; reindexing, and getting rid of two other rarely
used indices, also helped a lot.  Now, this whole machine is a fairly
simple two-way P4 with two SATA discs in software RAID 1 on Linux, and it
seems to spend a lot of its time (40-60% sometimes) waiting on I/O.  I
guess this is due to the lack of AIO support in PostgreSQL, but it's still
many times better than it used to be on 8.1 (the machine is running 8.3).


[GENERAL] Order by with spaces and other characters

2008-10-24 Thread mike stanton
Hello everyone.
We have a simple problem: we have keys that include blanks and various
other commonly used characters such as ",", ";" and "-".  For some reason
the select we have, nothing complicated, ignores these "special"
characters and happily sorts only by the A-z range.  How do we sort by,
say, the ASCII 32-125 range?
This must be a known issue; perhaps we need to configure a parameter.
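
For context, this kind of ordering is normally governed by the database's
collation (lc_collate, fixed at initdb time on 8.x) rather than by a
per-query parameter; a quick way to see what is in effect:

  -- Show the collation the cluster was initialized with.
  SHOW lc_collate;

A C/POSIX collation sorts strictly by byte value (so spaces and punctuation
order ahead of letters), while language collations such as en_US typically
ignore punctuation at the first comparison level, which matches the
behaviour described above.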

Michael Stanton W.
Depto. Informática
AutoCastillo S.A
Santiago Chile

[GENERAL] JDBC - Call stored function that returns user defined type

2008-10-24 Thread cyw
I am looking for info on how to call a stored function that returns a user 
defined type.

Assume I have a type defined as:
CREATE TYPE XYType AS (x int4, y int4);

and I use XYType in a function such as this:
CREATE  FUNCTION test(IN z int4, OUT xy XYType, OUT status character) 
RETURNS RECORD AS $BODY$.

I found info on how to formulate a CallableStatement, but no info on how
to process the results from the call. Or does the Postgres JDBC driver
support this type of call at all?
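
For concreteness, here is how such a function behaves on the SQL side (the
function body below is invented just to make the example self-contained);
the OUT parameters simply come back as result columns, which is also what a
JDBC ResultSet sees if the function is invoked through an ordinary SELECT
rather than a CallableStatement:

  -- hypothetical body, just to have something runnable
  CREATE FUNCTION test(IN z int4, OUT xy XYType, OUT status character)
  RETURNS RECORD AS $BODY$
  BEGIN
      xy.x := z;
      xy.y := z * 2;
      status := 'A';
  END;
  $BODY$ LANGUAGE plpgsql;

  -- the OUT parameters are returned as ordinary columns
  SELECT * FROM test(21);                       -- columns: xy, status

  -- fields of the composite column can be expanded as well
  SELECT (xy).x, (xy).y, status FROM test(21);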

Any help is greatly appreciated.

Regards,
CYW






   

Re: [GENERAL] How to get user list and privileges?

2008-10-24 Thread Tomasz Myrta

Q wrote on 24.10.2008 10:47:

How can I get (through an SQL command, in PostgreSQL 8.0.14) the list of
users / groups / rules and the privileges for a group / rule / table?
(Something like psql's "\z ..." command.)


If you need the SQL behind some psql feature, start psql with the "-E"
option: psql will then echo all of its internal SQL queries to the screen.
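
Those echoed queries can then be reused directly. If you just want
something \z-like as a plain SQL statement, a rough sketch (not exactly
what psql runs) is:

  -- list relation access privileges, roughly what \z reports
  SELECT n.nspname AS schema,
         c.relname AS name,
         c.relacl  AS access_privileges
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE c.relkind IN ('r', 'v', 'S')
    AND n.nspname NOT IN ('pg_catalog', 'information_schema');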


--
Regards,
Tomasz Myrta

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] max time in a table query takes ages

2008-10-24 Thread Alan Hodgson
On Friday 24 October 2008, "Grzegorz Jaśkiewicz" <[EMAIL PROTECTED]> wrote:
> with two SATA discs in software RAID 1 on Linux. And it seems to spend
> loads of time (40-60% sometimes) on waits.  I guess this is due to lack of
> AIO support in PostgreSQL,

No, it's due to the fact that hard disks are slow; much, much slower than 
CPU or memory.

-- 
Alan

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Storing questionnaire data

2008-10-24 Thread David Fetter
On Fri, Oct 24, 2008 at 09:34:20AM +0100, Thom Brown wrote:
> Thanks David and Jeff.
> 
> I can see your point.  The provided link might actually be useful,
> although I think I'd make some changes to it.

Good :)

It's not meant to be holy writ, just a way to see how you might
approach this problem without getting the EAV monkey on your back.
That monkey slowly turns into an 800-lb gorilla, and then grows very
quickly, crushing your app.

> I wouldn't have trouble data-mining such a structure for individual
> questionnaire results.  The planner will be shrugging its shoulders,
> but I haven't actually tested that solution with many massive
> questionnaires for its query performance.
> 
> I pretty much have my answer.  Thanks for your input guys.

Happy to help :)

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] stackbuilder updates

2008-10-24 Thread Dave Page
On Wed, Oct 22, 2008 at 11:38 PM, Jeff <[EMAIL PROTECTED]> wrote:
> Greetings:
>
> I successfully installed PostgreSQL 8.3.4-1 on Windows 2003 Server.
> Additionally, I used the stackbuilder to install Apache 2.2.4 and PHP 5.1.3.
>
> What is the suggested method of updating Apache and PHP?

Re-run stackbuilder and install updates as they are released.
Currently there is no update for ApachePHP, but I expect to push one
out sometime after Diwali.

> BTW, I found it interesting that Apache and PHP would live within my
> PostgreSQL dir by default (with no opt to change it during install).

Hmmm, that's perhaps an oversight in how ApachePHP is registered in
StackBuilder. I'll look at that.

-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres optimizer choosing wrong index

2008-10-24 Thread Tom Lane
Jack Orenstein <[EMAIL PROTECTED]> writes:
> - I created two schemas, NOVAC and VAC, each with a table T as described 
> above.

> - Before loading data, I ran VACUUM ANALYZE on VAC.T.

> - I then started loading data. The workload is a mixture of INSERT,   SELECT 
> and 
> UPDATE. For SELECT and UPDATE the WHERE clause always includes "dh = ? and fh 
> = ?".

Basically your problem here is that vacuum records the size of the table
as zero (in pg_class.relpages/reltuples) and that causes the computed
costs of the two indexscans to be exactly the same, so it's a tossup
which one gets used.  (In recent versions I think the index with higher
OID would typically get chosen in a tie, but I forget if 7.4 worked that
way.)

8.0 and up are smart enough not to believe pg_class.relpages anymore
after you've loaded a lot of data, but 7.4 isn't.  In testing similar
cases here, I get reasonable cost estimates and a sane plan choice
from 7.4 so long as the stats are up to date.

Bottom line: you need to vacuum (or preferably analyze) *after*
initially populating a table, not before.
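
In other words, the safe pattern is roughly this (a sketch with the names
from this thread; the point is simply that statistics are gathered once the
data is actually in the table):

  CREATE TABLE t (dh int, fh int, nm int /* ... */);
  CREATE INDEX idx_df ON t (dh, fh);
  CREATE INDEX idx_dn ON t (dh, nm);

  -- bulk load first ...
  COPY t FROM '/path/to/data';

  -- ... then gather statistics, so relpages/reltuples are realistic
  ANALYZE t;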

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general