Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Sim Zacks
It would be an administrative nightmare unless you had very few where clauses 
that you were tracking.


Instead of using a trigger, you could use Listen/Notify to call a daemon on the 
server to run the procedure and then you have no insert/delete overhead.

Or you could call the function on a cron job every 10 minutes...
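
As a rough sketch of the Listen/Notify idea (all names here are made up;
the daemon is assumed to have executed LISTEN rec_count_dirty and to
recompute the counts when woken):

create or replace function notify_rec_count_dirty()
returns trigger as $$
begin
    -- fire-and-forget: no summary row is touched here, so the
    -- insert/delete path stays cheap
    notify rec_count_dirty;
    return null;
end; $$ language plpgsql;

create trigger foo_rec_count_dirty
    after insert or delete on foo
    for each statement execute procedure notify_rec_count_dirty();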

Chris Browne wrote:

> [EMAIL PROTECTED] (Zoltan Boszormenyi) writes:
>> which will be fast and depending on the initial value of COUNT(*)
>> it will be very close to the exact figure. You can extend the example
>> with more columns if you know your SELECT COUNT(*) ... WHERE
>> conditions in advance but this way you have to keep several administrative
>> tables for different monitored tables. Again, this trades some disk space
>> and INSERT/DELETE operation speed on the monitored tables for
>> quicker count.
>
> Actually, this approach will be Really Terrible for any cases where
> multiple connections are adding/deleting tuples concurrently, as it
> will force ALL updates to serialize behind the update to the central
> table.
>
> Occasionally, you'll have something even worse, namely a deadlock,
> where two or more of the updates fighting over the single summary
> tuple fall into a bad state, and one of them is forced to give up,
> potentially rolling back its whole transaction.
>
> [Waving hands for a moment]
>
> What I would do *instead* would be for each INSERT to add a tuple with
> a count of 1, and for each DELETE to add a tuple with count of -1, and
> then to periodically have a single process walk through to summarize
> the table.  There may be a further optimization to be had by doing a
> per-statement trigger that counts the number of INSERTs/DELETEs done,
> so that inserting 30 tuples (in the table being tracked) leads to
> adding a single tuple with count of 30 in the summary table.
>
> That changes the tradeoffs, again...
>
>  - Since each INSERT/DELETE is simply doing an INSERT into the summary
>    table, the ongoing activity is *never* blocking anything
>
>  - You get the count by requesting
>      SELECT SUM(rec_cnt) as record_count from rec_count where tablename = 'foo';
>
>  - Note that the query is MVCC-consistent with the table!
>
>  - Once in a while, you'll want to run a single procedure that, for
>    each table, deletes all the existing records, and replaces them
>    with a single one consisting of the sum of the individual values.
>
>  - You can re-sync a table by running the query:
>      begin;
>      delete from rec_count where tablename = 'foo';
>      insert into rec_count(tablename, rec_cnt)
>          select 'foo', (select count(*) from foo);
>      commit;


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Query to get column-names in table via PG tables?

2008-01-09 Thread Ken Johanson
>> I am looking for expertise on how to program the equivalent to this
>> query, but using the pg_catalog tables, which I understand have fewer
>> security restrictions than information_schema in some cases:
>>
>> SELECT column_name
>> FROM information_schema.columns
>> WHERE table_catalog=? AND table_schema=? AND table_name=?
>> ORDER BY ordinal_position
>>
>> I need this to look up the column names and their ordinal position for
>> a given table (implementing a driver call).
>
> Just curious... but why is ordinal position important here?





Because the API spec (JDBC) for the driver supports an argument of 
column indexes (int array) which are the table's natural position. This 
is to specify which columns' auto-generated keys to return.


http://java.sun.com/j2se/1.5.0/docs/api/java/sql/Statement.html#executeUpdate(java.lang.String,%20int[])

So in this case I must pre-fetch the column names from the indexes, and 
append a RETURNING clause. Inefficient but the only strategy I know of.
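
For what it's worth, a rough pg_catalog equivalent of the
information_schema query above might look like this (untested sketch;
table_catalog drops out because pg_catalog only describes the current
database; attnum > 0 skips system columns and attisdropped skips
dropped ones):

SELECT a.attname, a.attnum
FROM pg_catalog.pg_attribute a
JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = ? AND c.relname = ?
  AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum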


I won't argue whether this API is somewhat dubious in ordinary applications,
but the interface requires it be implemented anyway.


Ken



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] vacuum, dead rows, usual solutions didn't help

2008-01-09 Thread Gábor Farkas

Joshua D. Drake wrote:

> Gábor Farkas wrote:
>> hi,
>>
>> i have a postgresql-8.2.4 db,
>>
>> and vacuuming it does not remove the dead rows
>>
>> basically, the problem is this part of the vacuum-output:
>> [...]
>>
>> on the db-server, 4 postgres processes are "idle in transaction", but
>> none is older than 2 days.
>
> If you have something idle in transaction, your vacuums are useless. You
> need to fix your app.




maybe i described it the wrong way:

there is only 1 process, that is constantly "idle in transaction".

the remaining 3 were only idle-in-transaction at that point. so if i
kept checking for idle-in-transaction processes, the list of them
would keep changing.


are you saying that a process should NEVER be idle-in-transaction? not
even for a short time? (like a few seconds?)


also, even if it is wrong, can an 'idle-in-transaction' connection that 
was opened today block the vacuuming of rows that were deleted yesterday?



thanks,
gabor

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] vacuum, dead rows, usual solutions didn't help

2008-01-09 Thread Gábor Farkas

Tom Lane wrote:

> =?iso-8859-1?Q?G=E1bor?= Farkas <[EMAIL PROTECTED]> writes:
>> basically, the problem is this part of the vacuum-output:
>>
>> INFO:  "sessions": found 2 removable, 6157654 nonremovable row versions
>> in 478069 pages
>> DETAIL:  6155746 dead row versions cannot be removed yet.
>
> The problem is that you've got some old open transactions that could
> potentially see those recently-dead rows, so VACUUM can't remove the
> rows without breaking MVCC rules for those transactions.
>
> Find the clients that are holding open transactions, and zap 'em.

well, that's the problem :-)

if i do a "ps aux | grep postgres" on the db-server

(ignoring the bin/postgres, writer-process and stats-collector-process)

then the oldest process is 3 days old.

but we have had this dead-row issue for weeks.

is it possible that a recently opened connection is blocking the
vacuuming of older dead rows?

>> previously we were running this application with postgresql-7.4, and
>> there the vacuuming worked fine. now we migrated this to
>> postgresql-8.2.4, and it does not want to vacuum it properly.
>
> I wonder whether you updated the client-side support libraries?

yes, they were updated too.


a related question:

on the db-server, let's say i have 3 databases: A, B, C.

if i have problems vacuuming B, then i only have to look for
processes/transactions touching B, correct? in other words, when i do a
"ps aux | grep postgres", i also see the username and the
database-name in the process-list. and if a postgres-process is not
touching B, then i do not have to check it. is this correct?


gabor

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Query to get column-names in table via PG tables?

2008-01-09 Thread Joshua D. Drake

Ken Johanson wrote:

> Hi all,
>
> I am looking for expertise on how to program the equivalent to this
> query, but using the pg_catalog tables, which I understand have fewer
> security restrictions than information_schema in some cases:
>
> SELECT column_name
> FROM information_schema.columns
> WHERE table_catalog=? AND table_schema=? AND table_name=?
> ORDER BY ordinal_position
>
> I need this to look up the column names and their ordinal position for a
> given table (implementing a driver call).
>
> Thanks in advance,
> Ken

Just curious... but why is ordinal position important here?

Joshua D. Drake

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] vacuum, dead rows, usual solutions didn't help

2008-01-09 Thread Joshua D. Drake

Gábor Farkas wrote:

> hi,
>
> i have a postgresql-8.2.4 db,
>
> and vacuuming it does not remove the dead rows
>
> basically, the problem is this part of the vacuum-output:
> [...]
>
> on the db-server, 4 postgres processes are "idle in transaction", but
> none is older than 2 days.

If you have something idle in transaction, your vacuums are useless. You
need to fix your app.


Sincerely,

Joshua D. Drake



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


[GENERAL] Query to get column-names in table via PG tables?

2008-01-09 Thread Ken Johanson

Hi all,

I am looking for expertise on how to program the equivalent to this 
query, but using the pg_catalog tables, which I understand have fewer 
security restrictions than information_schema in some cases:


SELECT column_name
FROM information_schema.columns
WHERE table_catalog=? AND table_schema=? AND table_name=?
ORDER BY ordinal_position

I need this to look up the column names and their ordinal position for a
given table (implementing a driver call).


Thanks in advance,
Ken



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] vacuum, dead rows, usual solutions didn't help

2008-01-09 Thread Tom Lane
=?iso-8859-1?Q?G=E1bor?= Farkas <[EMAIL PROTECTED]> writes:
> basically, the problem is this part of the vacuum-output:

> INFO:  "sessions": found 2 removable, 6157654 nonremovable row versions
> in 478069 pages
> DETAIL:  6155746 dead row versions cannot be removed yet.

The problem is that you've got some old open transactions that could
potentially see those recently-dead rows, so VACUUM can't remove the
rows without breaking MVCC rules for those transactions.

Find the clients that are holding open transactions, and zap 'em.
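
(As an untested sketch, one way to spot such clients on 8.2; the
pg_stat_activity column names vary across versions:)

SELECT procpid, usename, datname, backend_start, query_start
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY query_start;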

> previously we were running this application with postgresql-7.4, and
> there the vacuuming worked fine. now we migrated this to
> postgresql-8.2.4, and it does not want to vacuum it properly.

I wonder whether you updated the client-side support libraries?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] vacuum, dead rows, usual solutions didn't help

2008-01-09 Thread Gábor Farkas
hi,

i have a postgresql-8.2.4 db,

and vacuuming it does not remove the dead rows


basically, the problem is this part of the vacuum-output:

"
HINT:  Close open transactions soon to avoid wraparound problems.
INFO:  vacuuming "public.sessions"
INFO:  scanned index "sessions_pkey" to remove 2 row versions
DETAIL:  CPU 0.60s/0.25u sec elapsed 61.57 sec.
INFO:  "sessions": removed 2 row versions in 2 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "sessions_pkey" now contains 6157654 row versions in 52923
pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "sessions": found 2 removable, 6157654 nonremovable row versions
in 478069 pages
DETAIL:  6155746 dead row versions cannot be removed yet.
There were 8735 unused item pointers.
107 pages contain useful free space.
0 pages are entirely empty.
CPU 6.02s/1.58u sec elapsed 598.05 sec.
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
INFO:  vacuuming "pg_toast.pg_toast_5525738"
INFO:  index "pg_toast_5525738_index" now contains 13957669 row versions
in 38328 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.40s/0.04u sec elapsed 22.26 sec.
INFO:  "pg_toast_5525738": found 0 removable, 13957669 nonremovable row
versions in 3461686 pages
DETAIL:  13938280 dead row versions cannot be removed yet.
There were 154 unused item pointers.
69 pages contain useful free space.
0 pages are entirely empty.
CPU 39.95s/6.19u sec elapsed 1139.50 sec.
INFO:  analyzing "public.sessions"
INFO:  "sessions": scanned 3000 of 478438 pages, containing 12 live rows
and 38419 dead rows; 12 rows in sample, 1914 estimated total rows
INFO:  free space map contains 26849 pages in 444 relations
DETAIL:  A total of 30736 page slots are in use (including overhead).
30736 page slots are required to track all free space.
Current limits are:  153600 page slots, 1000 relations, using 1005 kB.
"

(the full vacuum-log is at http://www.nekomancer.net/tmp/vacuum.txt)

the "sessions" table holds session-data for a web-application (the code
uses the perl Apache::Session module btw.), so it
changes very often, and is vacuumed every hour (using a cronjob).

previously we were running this application with postgresql-7.4, and
there the vacuuming worked fine. now we migrated this to
postgresql-8.2.4, and it does not want to vacuum it properly.

the migration to 8.2.4 happened approx. one month ago, and this dead-row
count has been growing since then.

what i tried:

ps aux | grep postgres on the db-server, and found some connections that
were quite old. i restarted the applications that "caused" those
connections, so right now there are no overly old connections.

pg_stat_activity: the query_start of every entry is today; for the
entries with null query_start, the postgres processes are not older than
2 days.

on the db-server, 4 postgres processes are "idle in transaction", but
none is older than 2 days.

in pg_locks, all the locks that are for the "sessions" table are from
"young" (today-created) connections, and their locks are RowShareLock or
AccessShareLock.


so currently i am out of ideas what to check...

well, actually there is one more idea: maybe the autovacuuming process
somehow "conflicts" with the manual-vacuuming cronjob? is that possible?

any other ideas?

thanks,
gabor

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] XML path function

2008-01-09 Thread x asasaxax
Can you tell me how much time the query will take with indexes +
tsearch2? How much time does a satisfactory query take?
Can you show me some examples with tsearch2 and xml indexes?
Can you show me some examples with tsearch2 and xml indexes?

Thanks


Re: [GENERAL] Storing and querying boolean fields

2008-01-09 Thread Robert Treat
On Wednesday 09 January 2008 20:09, Greg Smith wrote:
> On Wed, 9 Jan 2008, Richard Brown wrote:
> > - We are using PostgreSQL 7.3.4, and am locked into this version. I would
> > upgrade if I could, but the decision is not mine.
>
> Just make sure you CYA so when said server eats itself the decision maker
> can't point the finger at you.  Give them a copy of a dated written
> statement from you with text like "PostgreSQL 7.3.4 has several major bugs
> that are known to cause data corruption" and quote some of the choicer
> items in the trail of release notes at
> http://www.postgresql.org/docs/7.3/static/release.html Some juicy ones are
> in 7.3.7, 7.3.12, 7.3.19.
>
> The bonus to this approach, where you disclaim the whole platform as known
> to be unreliable, is that if you ever make a huge mistake you can shrug
> off the blame by saying the old database that you told them was buggy, but
> weren't allowed to upgrade, ate the data and it was totally not your
> fault.
>

It would probably also be nice to reference the last security announcement 
(http://www.postgresql.org/about/news.905) which includes this relevant 
message:

"Minor release 7.3.21 for PostgreSQL version 7.3 will be the last update to 
the 7.3 branch. As version 7.3 is now over five years old, the community will 
no longer release patches for it after today's release. Users of version 7.3 
are encouraged to upgrade to a more current version as soon as possible, or 
to seek support from a commercial support vendor who is willing to continue 
backpatching for them." 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Kernel kills postgres process - help need

2008-01-09 Thread Robert Treat
On Wednesday 09 January 2008 18:21, Joshua D. Drake wrote:
> On Wed, 09 Jan 2008 14:17:14 -0800
>
> Jeff Davis <[EMAIL PROTECTED]> wrote:
> > I posted to LKML here:
> >
> > http://kerneltrap.org/mailarchive/linux-kernel/2007/2/12/54202
> >
> > because linux has a behavior -- which in my opinion is a bug -- that
> > causes the OOM killer to almost always kill PostgreSQL first,
> > regardless of whether it was truly the offending process or not.
>
> If that isn't an argument for FreeBSD I don't know what is...
>

Funny, it looked like an argument for Solaris to me. ;-)

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Gregory Stark
"Josh Harrison" <[EMAIL PROTECTED]> writes:

>  Aggregate  (cost=342178.51..342178.52 rows=1 width=0)
>->  Bitmap Heap Scan on person  (cost=3120.72..341806.71 rows=148721 
> width=0)
>  Recheck Cond: (person_id > 114600::numeric)
>  ->  Bitmap Index Scan on person_pk  (cost=0.00..3083.53 rows=148721 
> width=0)
>Index Cond: (person_id > 114600::numeric)
...
> How does the planner choose the plan?

The short answer is that it looks at the "cost" for all the reasonable plans
it can come up with and picks the plan with the lowest cost.

Scans which return many rows will tend to prefer bitmap index scans (when they
don't do a full sequential scan of the table), since they reduce random access.
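
One way to see the alternatives is to disable a plan type for the session
and compare the estimated costs in EXPLAIN (a diagnostic sketch only, not
something for production):

EXPLAIN SELECT count(*) FROM person WHERE person_id > 114600::numeric;
SET enable_bitmapscan = off;
EXPLAIN SELECT count(*) FROM person WHERE person_id > 114600::numeric;
RESET enable_bitmapscan;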


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] performance differences of major versions

2008-01-09 Thread Matthew T. O'Connor

Pavel Stehule wrote:

> pgbench test - default configuration
>
> Version  7.3.15  7.4.13  8.0.8  8.1.4  8.2.beta1  8.3beta1
> tps      311     340     334    398    423        585
>
> but pgbench is a simple test and these numbers don't have great value.


Was that the same version of pgbench each time?  Or was it the pgbench 
that came with each version?  I think pgbench has changed a few times; 
if you are using different versions of pgbench, are these numbers at all 
meaningful?


Matt

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Storing and querying boolean fields

2008-01-09 Thread Greg Smith

On Wed, 9 Jan 2008, Richard Brown wrote:

> - We are using PostgreSQL 7.3.4, and am locked into this version. I would
> upgrade if I could, but the decision is not mine.


Just make sure you CYA so when said server eats itself the decision maker 
can't point the finger at you.  Give them a copy of a dated written 
statement from you with text like "PostgreSQL 7.3.4 has several major bugs 
that are known to cause data corruption" and quote some of the choicer 
items in the trail of release notes at 
http://www.postgresql.org/docs/7.3/static/release.html Some juicy ones are 
in 7.3.7, 7.3.12, 7.3.19.


The bonus to this approach, where you disclaim the whole platform as known 
to be unreliable, is that if you ever make a huge mistake you can shrug 
off the blame by saying the old database that you told them was buggy, but 
weren't allowed to upgrade, ate the data and it was totally not your 
fault.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Experiences with extensibility

2008-01-09 Thread Andrew Sullivan
On Wed, Jan 09, 2008 at 12:38:43PM -0700, Guido Neitzer wrote:
> >>Easy multi-master clustering with just two machines.

> As I said: FrontBase is offering that.

It looks like a two-phase commit answer, if I'm reading correctly.  You can
do this today on many systems (including Postgres), but the classical
problem with 2PC clustering is that it is very slow compared to single-node
systems.  Anyway, FrontBase looks interesting.  Thanks for the reference.

A


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] performance differences of major versions

2008-01-09 Thread Bruce Momjian
Pavel Stehule wrote:
> Hello
> 
> pgbench test - default configuration
> 
> Version  7.3.15  7.4.13  8.0.8  8.1.4  8.2.beta1  8.3beta1
> tps      311     340     334    398    423        585
> 
> but pgbench is a simple test and these numbers don't have great value.

Wow, even though it is a single benchmark, I have never seen such a
clear comparison between Postgres versions, and the 8.2->8.3 improvement
is huge: +38%.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Storing and querying boolean fields

2008-01-09 Thread Tom Lane
"Richard Brown" <[EMAIL PROTECTED]> writes:
> - We are using PostgreSQL 7.3.4, and am locked into this version. I would
> upgrade if I could, but the decision is not mine.

They won't even let you update to 7.3.something-reasonably-current ?

Resign.  Go find a job with a boss whose IQ is above room temperature.
You want to be out of there before the finger-pointing starts, which
it certainly will when one of the many fixed-since-7.3.4 bugs eats
your database.

> I need to add many boolean flags to my database that need to be retrieved by
> primary key, as well as be searched upon with decent speed. By "many", I
> mean about 30 booleans total. The added bonus is that some of these values
> need to be searched in an OR manner, while others need to be searched as an
> AND. For example, I will need to do a search that looks something like:
> select * from table where A && B && C && (D || E) && F

As for the problem at hand, the contrib/intarray module would probably
fill the bill nicely, though I dunno how far I'd trust either it or the
GIST index code back in 7.3.  You'd represent the flags as an array of
the numbers of the flags that are ON, and search it with the @@
operator:
  int[] @@ query_int  - returns TRUE if array satisfies query (like '1&(2|3)') 
This on-disk representation might be a bit larger than you were hoping
for, but it'd not break if you suddenly found you needed 33 flags not 32...
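
A minimal untested sketch of that, with a made-up table and flag
numbering (gist__int_ops is the intarray GiST operator class):

CREATE TABLE t (id int PRIMARY KEY, flags int[]);
CREATE INDEX t_flags_idx ON t USING gist (flags gist__int_ops);
-- A && B && C && (D || E) && F, with the six flags numbered 1..6:
SELECT * FROM t WHERE flags @@ '1&2&3&(4|5)&6'::query_int;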

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Experiences with extensibility

2008-01-09 Thread Guido Neitzer

On 09.01.2008, at 13:51, Martin wrote:

> I've been working with FrontBase a lot lately and I wouldn't say
> anything about it qualifies as "incredibly easy" and reliable it
> is not.

We never had any reliability issues with FrontBase as long as we
didn't try to insert garbage. It really doesn't like that.

> Performance of FrontBase is just plain terrible. One of
> our reports takes 9 minutes on FrontBase and 10 seconds on
> Postgres.

As I said: it depends on what you are doing.

Which version did you use where you got that terrible performance? The
latest one? There was a bug in their query planner in an old version
that totally killed some queries.

There is also the or-query problem (not sure whether that one is still
in there): if your report does something like "select bar from foo
where a = 1 or b = 1;" it doesn't use indexes on a or b, which is
terrible. If you have that, use separate selects combined with a union
if you can.

In my cases it was never as fast as PostgreSQL as soon as there were
more than two tables involved. But except for a couple of cases where I
had to use PostgreSQL for performance reasons, I was okay with it. But
performance is not everything.

PostgreSQL is the better product for most cases, but for some cases,
FrontBase is. It depends on what you're doing which side has the edge.

> Then there's the documentation issue...

The PostgreSQL documentation is better, but what is your issue exactly?
The main problem I have with the FrontBase documentation is that it
mostly gives no useful examples.

As this is off-topic here: we can transfer that either to the
FrontBase list or to private mail if you like.


cug

--
http://www.event-s.net


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] Storing and querying boolean fields

2008-01-09 Thread Richard Brown
Hi All,

First, some background:

- We are using PostgreSQL 7.3.4, and am locked into this version. I would
upgrade if I could, but the decision is not mine.
- The table referred to below is 120+ million rows, and has a width of 27
columns (15 smallints, 5 integers, 4 dates, 1 integer[], 1 single char text
field, and 1 double precision). This table already has 3 multi-field
indexes, as well as a primary key index.

I need to add many boolean flags to my database that need to be retrieved by
primary key, as well as be searched upon with decent speed. By "many", I
mean about 30 booleans total. The added bonus is that some of these values
need to be searched in an OR manner, while others need to be searched as an
AND. For example, I will need to do a search that looks something like:
select * from table where A && B && C && (D || E) && F

Our first thought was to "pack" these booleans into integer fields to save
space and help with searching (so we thought). The design calls for a
smallint field and an integer field to be added to the table. UPDATE/INSERT
would be handled by calculating the values of the two fields outside of the
database. Searching would be accomplished by taking the bits that were
requested, creating a bitmask corresponding to the 2 fields, and then doing
something like:  select * from table where (field & BITMASK) = BITMASK.  We
felt that putting this information in this table rather than a separate one
would help performance by eliminating a JOIN, and it was assumed (ugh) that
if we built an index on those two fields, that we would be able to use those
indexes for searching in the aforementioned manner.
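
To make the packing scheme concrete, a hypothetical sketch with made-up
column names and bit assignments (note that a plain btree index on the
packed field cannot serve this kind of predicate, though a partial index
per frequently-used mask could):

-- bits 0, 1 and 3 requested => bitmask 1 + 2 + 8 = 11
SELECT * FROM table_name WHERE (flags1 & 11) = 11;
-- a partial index matching that exact predicate:
CREATE INDEX flags1_mask11_idx ON table_name (id) WHERE (flags1 & 11) = 11;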

Unfortunately we have come up with the following problems with this
approach:
1. Postgres will only use 1 index per table, so putting these fields in the
same table as several other searchable fields actually hurts us more than
helps.
2. We haven't been able to get any index scans in preliminary testing (all
seq. scans).

After all of that background, my actual question is:  what is the best way
to effectively store and query this type of data? It seems more and more
that our initial design is just flat-out wrong, but we are at a loss as to
what the other options are.

Thanks very much in advance, and I apologize for the length of the message,
but I felt background info would be important.

Regards,
Richard


Re: [GENERAL] Kernel kills postgres process - help need

2008-01-09 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Wed, 09 Jan 2008 14:17:14 -0800
Jeff Davis <[EMAIL PROTECTED]> wrote:


> I posted to LKML here:
> 
> http://kerneltrap.org/mailarchive/linux-kernel/2007/2/12/54202
> 
> because linux has a behavior -- which in my opinion is a bug -- that
> causes the OOM killer to almost always kill PostgreSQL first,
> regardless of whether it was truly the offending process or not.

If that isn't an argument for FreeBSD I don't know what is...

/linuxpoet

> 
> So, find out which process truly caused the memory pressure that led
> to the OOM being invoked, and fix that problem.
> 
> You may also consider some other linux configuration options that make
> invocation of OOM killer less likely.
> 
> Regards,
>   Jeff Davis
> 
> 
> ---(end of
> broadcast)--- TIP 2: Don't 'kill -9' the
> postmaster
> 


- -- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHhVcVATb/zqfZUUQRAp0YAJ4ooisf5xRDvXegEl2f/r3TTTB4jACfSuFn
O1MUlow1sg++4zdoh6TGu6Y=
=JMWG
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Kernel kills postgres process - help need

2008-01-09 Thread Scott Marlowe
On Jan 9, 2008 3:57 PM, Hervé Piedvache <[EMAIL PROTECTED]> wrote:

SNIP

> 0+0
> Jan  9 20:30:48 db2 kernel: Free swap  = 15623168kB
> Jan  9 20:30:48 db2 kernel: Total swap = 15623172kB
> Jan  9 20:30:48 db2 kernel: Free swap:   15623168kB
> Jan  9 20:30:48 db2 kernel: oom-killer: gfp_mask=0x84d0, order=0
> Jan  9 20:30:48 db2 kernel: Mem-info:
> Jan  9 20:30:48 db2 kernel: DMA per-cpu:
> Jan  9 20:30:48 db2 postgres[7634]: [2-1] LOG:  background writer process (PID
> 7639) was terminated by signal 9

This makes no sense to me.  The OS is showing that there's
16G free swap.  Why is it killing things?  I'm betting there's some
bug with too large of a swap resulting in some kind of wrap around or
something.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Chris Browne
[EMAIL PROTECTED] (Zoltan Boszormenyi) writes:
> which will be fast and depending on the initial value of COUNT(*)
> it will be very close to the exact figure. You can extend the example
> with more columns if you know your SELECT COUNT(*) ... WHERE
> conditions in advance but this way you have to keep several administrative
> tables for different monitored tables. Again, this trades some disk space
> and INSERT/DELETE operation speed on the monitored tables for
> quicker count.

Actually, this approach will be Really Terrible for any cases where
multiple connections are adding/deleting tuples concurrently, as it
will force ALL updates to serialize behind the update to the central
table.

Occasionally, you'll have something even worse, namely a deadlock,
where two or more of the updates fighting over the single summary
tuple fall into a bad state, and one of them is forced to give up,
potentially rolling back its whole transaction.

[Waving hands for a moment]

What I would do *instead* would be for each INSERT to add a tuple with
a count of 1, and for each DELETE to add a tuple with count of -1, and
then to periodically have a single process walk through to summarize
the table.  There may be a further optimization to be had by doing a
per-statement trigger that counts the number of INSERTs/DELETEs done,
so that inserting 30 tuples (in the table being tracked) leads to
adding a single tuple with count of 30 in the summary table.
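
A rough sketch of the row-level variant, assuming the
rec_count(tablename, rec_cnt) table queried below, with no unique
constraint on tablename (trigger and table names are hypothetical):

create or replace function foo_rec_count_delta()
returns trigger as $$
begin
    -- each INSERT adds +1, each DELETE adds -1; no row is ever updated,
    -- so concurrent writers never block on the summary table
    if TG_OP = 'INSERT' then
        insert into rec_count(tablename, rec_cnt) values ('foo', 1);
    else
        insert into rec_count(tablename, rec_cnt) values ('foo', -1);
    end if;
    return null;
end; $$ language plpgsql;

create trigger foo_cnt_ins after insert on foo
    for each row execute procedure foo_rec_count_delta();
create trigger foo_cnt_del after delete on foo
    for each row execute procedure foo_rec_count_delta();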

That changes the tradeoffs, again...

 - Since each INSERT/DELETE is simply doing an INSERT into the summary
   table, the ongoing activity is *never* blocking anything

 - You get the count by requesting 
SELECT SUM(rec_cnt) as record_count from rec_count where tablename = 'foo';

 - Note that the query is MVCC-consistent with the table!

 - Once in a while, you'll want to run a single procedure that, for
   each table, deletes all the existing records, and replaces them
   with a single one consisting of the sum of the individual values.

 - You can re-sync a table by running the query:
     begin;
     delete from rec_count where tablename = 'foo';
     insert into rec_count(tablename, rec_cnt)
         select 'foo', (select count(*) from foo);
     commit;
-- 
output = reverse("ofni.sesabatadxunil" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/languages.html
Rules of  the Evil Overlord #174.  "If I am dangling  over a precipice
and the hero reaches  his hand down to me, I will  not attempt to pull
him down with  me. I will allow him to rescue  me, thank him properly,
then return to the safety of my fortress and order his execution."


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Kernel kills postgres process - help need

2008-01-09 Thread Hervé Piedvache

On Wednesday, January 9, 2008, Jeff Davis wrote:
> On Wed, 2008-01-09 at 22:57 +0100, Hervé Piedvache wrote:
> > Hi,
> >
> > I'm having big trouble with a PostgreSQL server ... regularly, since I
> > added 8 GB of memory to a server that already had 8 GB, I have
> > problems. Nothing else has changed ... I'm on a Dell server, and all
> > the memory diagnostics from Dell seem to be good ...
> > When I have a lot of connections (persistent connections from 6 web
> > apache/php servers using PDO, about 110 processes on each web server)
> > on the server, or a long request, it's difficult for me to know when
> > it's happening; the kernel seems to kill my postgresql processes and
> > then the server becomes completely unstable, and most of the time
> > needs a reboot ...
> >
> > I'm on Linux kernel 2.6.15 with version 8.1.10 of PostgreSQL.
> > My database is 56 GB in size.
> > RAM = 16 GB
>
> [snip]
>
> > Jan  9 20:30:47 db2 kernel: oom-killer: gfp_mask=0x84d0, order=0
>
> It looks like the Out Of Memory Killer was invoked, and you need to find
> out why it was invoked.
>
> I posted to LKML here:
>
> http://kerneltrap.org/mailarchive/linux-kernel/2007/2/12/54202
>
> because linux has a behavior -- which in my opinion is a bug -- that
> causes the OOM killer to almost always kill PostgreSQL first, regardless
> of whether it was truly the offending process or not.
>
> So, find out which process truly caused the memory pressure that led to
> the OOM being invoked, and fix that problem.

How can I proceed to find this? It's a production server for a web service, 
and I have no idea how to find which process was the cause of this ... !?

> You may also consider some other linux configuration options that make
> invocation of OOM killer less likely.

On this server only Postgresql, slony, and sshd are running; the rest is 
only basic Linux processes (cron, atd, getty etc.)

regards,
-- 
Hervé Piedvache

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Kernel kills postgres process - help need

2008-01-09 Thread Hervé Piedvache
Tom,

On Wednesday, January 9, 2008, Tom Lane wrote:
> =?utf-8?q?Herv=C3=A9_Piedvache?= <[EMAIL PROTECTED]> writes:
> > When I have a lot of connections (persistent connections from 6 web
> > apache/php servers using PDO, about 110 processes on each web server)
> > on the server, or a long request, it's difficult for me to know when
> > it's happening; the kernel seems to kill my postgresql processes and
> > then the server becomes completely unstable, and most of the time
> > needs a reboot ...
>
> Turn off memory overcommit.

My sysctl.conf file looks like this :
kernel.shmmax= 941604096
kernel.sem = 250 32000 100 400
fs.file-max=655360
vm.overcommit_memory=2
vm.overcommit_ratio=30

> > max_connections = 2048
>
> Have you considered using a connection pooler in front of a smaller
> number of backends?

Which system do you recommend for this?

> If you really need that many backends, it'd likely be a good idea to
> reduce max_files_per_process to perhaps 100 or so.  If you manage
> to run the kernel out of filetable slots, all sorts of userland stuff
> is going to get very unhappy.

I'll try this ...

regards,
-- 
Hervé

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Kernel kills postgres process - help need

2008-01-09 Thread Tom Lane
=?utf-8?q?Herv=C3=A9_Piedvache?= <[EMAIL PROTECTED]> writes:
> When I have a lot of connections (persistent connections from 6 web
> apache/php servers using PDO, about 110 processes on each web server) on
> the server, or a long request, it's difficult for me to know when it's
> happening; the kernel seems to kill my postgresql processes and then the
> server becomes completely unstable, and most of the time needs a reboot ...

Turn off memory overcommit.

> max_connections = 2048

Have you considered using a connection pooler in front of a smaller
number of backends?

If you really need that many backends, it'd likely be a good idea to
reduce max_files_per_process to perhaps 100 or so.  If you manage
to run the kernel out of filetable slots, all sorts of userland stuff
is going to get very unhappy.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Kernel kills postgres process - help need

2008-01-09 Thread Jeff Davis
On Wed, 2008-01-09 at 22:57 +0100, Hervé Piedvache wrote:
> Hi,
> 
> I'm having big trouble with a PostgreSQL server ... regularly, since I
> added 8 GB of memory to a server that already had 8 GB, I have problems.
> Nothing else has changed ... I'm on a Dell server, and all the memory
> diagnostics from Dell seem to be good ...
> When I have a lot of connections (persistent connections from 6 web
> apache/php servers using PDO, about 110 processes on each web server) on
> the server, or a long request, it's difficult for me to know when it's
> happening; the kernel seems to kill my postgresql processes and then the
> server becomes completely unstable, and most of the time needs a reboot ...
> 
> I'm on Linux kernel 2.6.15 with version 8.1.10 of PostgreSQL.
> My database is 56 GB in size.
> RAM = 16 GB
> 

[snip]

> Jan  9 20:30:47 db2 kernel: oom-killer: gfp_mask=0x84d0, order=0

It looks like the Out Of Memory Killer was invoked, and you need to find
out why it was invoked.

I posted to LKML here:

http://kerneltrap.org/mailarchive/linux-kernel/2007/2/12/54202

because linux has a behavior -- which in my opinion is a bug -- that
causes the OOM killer to almost always kill PostgreSQL first, regardless
of whether it was truly the offending process or not.

So, find out which process truly caused the memory pressure that led to
the OOM being invoked, and fix that problem.

You may also consider some other linux configuration options that make
invocation of OOM killer less likely.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] Kernel kills postgres process - help need

2008-01-09 Thread Hervé Piedvache
Hi,

I'm having big trouble with a PostgreSQL server ... regularly, since I
added 8 GB of memory to a server that already had 8 GB, I have problems.
Nothing else has changed ... I'm on a Dell server, and all the memory
diagnostics from Dell seem to be good ...
When I have a lot of connections (persistent connections from 6 web
apache/php servers using PDO, about 110 processes on each web server) on
the server, or a long request, it's difficult for me to know when it's
happening; the kernel seems to kill my postgresql processes and then the
server becomes completely unstable, and most of the time needs a reboot ...

I'm on Linux kernel 2.6.15 with version 8.1.10 of PostgreSQL.
My database is 56 GB in size.
RAM = 16 GB

kernel shmmax : 941604096

Postgresql config :
max_connections = 2048
shared_buffers = 4
#temp_buffers = 1000# min 100, 8KB each
work_mem = 2048 # min 64, size in KB
maintenance_work_mem = 512000   # min 1024, size in KB
max_stack_depth = 4096  # min 100, size in KB
max_fsm_pages = 2500
max_fsm_relations = 2000# min 100, ~70 bytes each
max_files_per_process = 255 # min 25
fsync = on
wal_buffers = 128   # min 4, 8KB each
commit_delay = 500  # range 0-10, in microseconds
commit_siblings = 5 # range 1-1000
checkpoint_segments = 160
effective_cache_size = 60   # typically 8KB each
random_page_cost = 2

Syslog when crashing :
Jan  9 20:30:47 db2 kernel: oom-killer: gfp_mask=0x84d0, order=0
Jan  9 20:30:48 db2 kernel: Mem-info:
Jan  9 20:30:48 db2 kernel: DMA per-cpu:
Jan  9 20:30:48 db2 kernel: cpu 0 hot: low 0, high 0, batch 1 used:0
Jan  9 20:30:48 db2 kernel: cpu 0 cold: low 0, high 0, batch 1 used:0
Jan  9 20:30:48 db2 kernel: cpu 1 hot: low 0, high 0, batch 1 used:0
Jan  9 20:30:48 db2 kernel: cpu 1 cold: low 0, high 0, batch 1 used:0
Jan  9 20:30:48 db2 kernel: cpu 2 hot: low 0, high 0, batch 1 used:0
Jan  9 20:30:48 db2 kernel: cpu 2 cold: low 0, high 0, batch 1 used:0
Jan  9 20:30:48 db2 kernel: cpu 3 hot: low 0, high 0, batch 1 used:0
Jan  9 20:30:48 db2 kernel: cpu 3 cold: low 0, high 0, batch 1 used:0
Jan  9 20:30:48 db2 kernel: DMA32 per-cpu: empty
Jan  9 20:30:48 db2 kernel: Normal per-cpu:
Jan  9 20:30:48 db2 kernel: cpu 0 hot: low 0, high 186, batch 31 used:5
Jan  9 20:30:48 db2 kernel: cpu 0 cold: low 0, high 62, batch 15 used:59
Jan  9 20:30:48 db2 kernel: cpu 1 hot: low 0, high 186, batch 31 used:22
Jan  9 20:30:48 db2 kernel: cpu 1 cold: low 0, high 62, batch 15 used:49
Jan  9 20:30:48 db2 kernel: cpu 2 hot: low 0, high 186, batch 31 used:33
Jan  9 20:30:48 db2 kernel: cpu 2 cold: low 0, high 62, batch 15 used:60
Jan  9 20:30:48 db2 kernel: cpu 3 hot: low 0, high 186, batch 31 used:3
Jan  9 20:30:48 db2 kernel: cpu 3 cold: low 0, high 62, batch 15 used:55
Jan  9 20:30:48 db2 kernel: HighMem per-cpu:
Jan  9 20:30:48 db2 kernel: cpu 0 hot: low 0, high 186, batch 31 used:5
Jan  9 20:30:48 db2 kernel: cpu 0 cold: low 0, high 62, batch 15 used:5
Jan  9 20:30:48 db2 kernel: cpu 1 hot: low 0, high 186, batch 31 used:11
Jan  9 20:30:48 db2 kernel: cpu 1 cold: low 0, high 62, batch 15 used:4
Jan  9 20:30:48 db2 kernel: cpu 2 hot: low 0, high 186, batch 31 used:17
Jan  9 20:30:48 db2 kernel: cpu 2 cold: low 0, high 62, batch 15 used:14
Jan  9 20:30:48 db2 kernel: cpu 3 hot: low 0, high 186, batch 31 used:14
Jan  9 20:30:48 db2 kernel: cpu 3 cold: low 0, high 62, batch 15 used:9
Jan  9 20:30:48 db2 kernel: Free pages:  497624kB (490232kB HighMem)
Jan  9 20:30:48 db2 kernel: Active:3604892 inactive:234379 dirty:20273 
writeback:210 unstable:0 free:124406 slab:49119 mapped:547571 
pagetables:139724
Jan  9 20:30:48 db2 kernel: DMA free:3588kB min:68kB low:84kB high:100kB 
active:0kB inactive:0kB present:16384kB pages_scanned:1 all_unreclaimable? 
yes
Jan  9 20:30:48 db2 kernel: lowmem_reserve[]: 0 0 880 17392
Jan  9 20:30:48 db2 kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB 
inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Jan  9 20:30:48 db2 kernel: lowmem_reserve[]: 0 0 880 17392
Jan  9 20:30:48 db2 kernel: Normal free:3804kB min:3756kB low:4692kB 
high:5632kB active:508kB inactive:464kB present:901120kB pages_scanned:975 
all_unreclaimable? yes
Jan  9 20:30:48 db2 kernel: lowmem_reserve[]: 0 0 0 132096
Jan  9 20:30:48 db2 kernel: HighMem free:490108kB min:512kB low:18148kB 
high:35784kB active:14419044kB inactive:937112kB present:16908288kB 
pages_scanned:0 all_unreclaimable? no
Jan  9 20:30:48 db2 kernel: lowmem_reserve[]: 0 0 0 0
Jan  9 20:30:48 db2 kernel: DMA: 1*4kB 0*8kB 2*16kB 1*32kB 1*64kB 1*128kB 
1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB
Jan  9 20:30:48 db2 kernel: DMA32: empty
Jan  9 20:30:48 db2 kernel: Normal: 35*4kB 0*8kB 7*16kB 5*32kB 1*64kB 0*128kB 
1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3804kB
Jan  9 20:30:48 db2 kernel: HighMem: 29171*4kB 43358*8kB 162

Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Zoltan Boszormenyi

Ivan Sergio Borgonovo wrote:

> On Wed, 09 Jan 2008 20:29:39 +0100
> Zoltan Boszormenyi <[EMAIL PROTECTED]> wrote:
>
>> The decision to use MVCC in PostgreSQL makes the point moot.
>
> ...
>
> thanks.
>
>> In PostgreSQL, COUNT(*) responds at close to the same speed
>> regardless of other transactions. Which way do you prefer?
>
> Considering the relative value of count my interest was for something
> that is even less precise than the "usual" count but performs better.
> I'm not proposing to turn Postgres into MySQL.

The approach below might be a good compromise,
although every INSERT/DELETE will be a bit slower
because of the additional UPDATE on the administrative table.

create table rec_count (tablename text unique, rec_cnt bigint)
with (fillfactor = 50);


Add any tables you want to watch to it with their current record count,
and then for each watched table:

create or replace function inc_tablename_rec_cnt()
returns trigger as $$
begin
   update rec_count set rec_cnt = rec_cnt + 1 where tablename = 
'tablename';

   return new;
end; $$ language plpgsql;

create or replace function dec_tablename_rec_cnt()
returns trigger as $$
begin
   update rec_count set rec_cnt = rec_cnt - 1 where tablename = 'tablename';

   -- AFTER DELETE triggers have no NEW row; the return value is ignored
   return old;
end; $$ language plpgsql;

create trigger ins_tablename_rec_cnt after insert on tablename for each
row execute procedure inc_tablename_rec_cnt();
-- note: the delete counter must fire on DELETE, not INSERT
create trigger del_tablename_rec_cnt after delete on tablename for each
row execute procedure dec_tablename_rec_cnt();


The administrative table will be a highly updated one if you want
to watch a high-traffic table, hence the FILLFACTOR setting.
You may need to adjust max_fsm_pages. Later, you can do a

SELECT rec_cnt FROM rec_count WHERE tablename = 'tablename';

which will be fast and depending on the initial value of COUNT(*)
it will be very close to the exact figure. You can extend the example
with more columns if you know your SELECT COUNT(*) ... WHERE
conditions in advance but this way you have to keep several administrative
tables for different monitored tables. Again, this trades some disk space
and INSERT/DELETE operation speed on the monitored tables for
quicker count.

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Andrew Sullivan
On Wed, Jan 09, 2008 at 12:24:37PM -0500, Josh Harrison wrote:
> For example if I have a query like
> select  column2 from ABC where column1 > 20
> and table ABC is indexed on (column1,column2) then Oracle will not goto the
> heap to fetch the tuples. It will return them from the index itself since
> the column information is available in the index. 

Ah, I didn't know that.  If your description is right, there would seem to
be some pretty severe I/O penalties when using an index that doesn't cover
all the fields you want.  I mean, I can see the gain, but it would seem
that's a significant cost.  Anyway, no point in speculating about the
implementation details of a system I don't know in detail.

> I had vacuumed and analysed the tables ...tried increasing the statistics
> too. But the performance compared to oracle for these types of queries(that
> i had mentioned above) is pretty slow

I see.  Without rather more information, we're not going to be able to help
you.  What you could do is take some detailed examples to the -performance
list.  Oracle is terrifically capable, but Postgres can usually hold its own
when correctly tuned (there are some cases where it can't, though).

A

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Experiences with extensibility

2008-01-09 Thread Martin
In article <[EMAIL PROTECTED]>,
Guido Neitzer <[EMAIL PROTECTED]> wrote:

>FrontBase. It has an incredibly easy to configure replication and  
>multi master clustering support, is very reliable and can also handle  
>really big databases.

I've been working with FrontBase a lot lately and I wouldn't say
anything about it qualifies as "incredibly easy" and reliable it
is not. Performance of FrontBase is just plain terrible. One of 
our reports takes 9 minutes on FrontBase and 10 seconds on 
Postgres.

Then there's the documentation issue...


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] XML path function

2008-01-09 Thread Nikolay Samokhvalov
On Jan 9, 2008 6:00 PM, x asasaxax <[EMAIL PROTECTED]> wrote:
> My Postgres version is 8.2. I've managed to do the path I wanted, but
> when I do an explain analyze on the select it returns 500 milliseconds.
> Is this a good search? Is there a way to bring this time down with
> Postgres 8.3? What is a good time for xml xpath's?

Unfortunately, this is the normal speed. The reason is that Postgres
does full XML parsing and XPath evaluation at runtime.

If you use

SELECT <columns>
FROM <table>
WHERE <expression using the xpath function> = <constant>,

then functional indexes over xpath function may help. Also, you could
use tsearch2 + functional XPath indexes to index text data from your
XML, if you have corresponding needs.
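
(An untested sketch of such a functional index, against the 8.3-style
xpath() function and a hypothetical docs(data xml) table:)

CREATE INDEX docs_title_idx
    ON docs (((xpath('/doc/title/text()', data))[1]::text));

SELECT *
FROM docs
WHERE (xpath('/doc/title/text()', data))[1]::text = 'some title';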

Unfortunately, that is almost all you can do to speed up your XPath
queries at the moment, and in terms of performance there are no
major improvements in 8.3 either.

-- 
Nikolay Samokhvalov  <[EMAIL PROTECTED]>
http://nikolay.samokhvalov.com

Postgresmen http://postgresmen.ru
OpenWebTechnologies http://openwebtech.ru

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Prepared Statements

2008-01-09 Thread Kris Jurka



On Wed, 9 Jan 2008, [EMAIL PROTECTED] wrote:

> - I know there is a PREPARE Statement in Postgresql and read the docs.
> - in PostgresqlJDBC i have a prepareThreshold parameter which i left at
> the default of 5.
> - in DBCP i have a property "poolPreparedStatements", set to true. Does it
> just configure prepareThreshold of JDBC or does it maintain a statementPool
> of its own?

The postgresql JDBC driver does not have a statement pool of its own, so 
the two options prepareThreshold and poolPreparedStatements are 
complementary.  By itself the JDBC driver will switch to a long-term 
prepared plan once you've used the exact same PreparedStatement object 
prepareThreshold number of times.  Since it doesn't pool things behind the 
scenes, you've got to retain a reference to the same PreparedStatement 
object, which is difficult or impossible in many applications.  This is 
where the DBCP statement pool comes in.  It proxies PreparedStatement 
wrappers to one underlying PG PreparedStatement object so that you can hit 
prepareThreshold.



> In my Log files of postgresql each query is called like this:
>
>    EXECUTE  [PREPARE: select ...]
>
> I have not found anything about preparing "unnamed" statements. What
> does it mean?

Unnamed statements are what the driver uses before it hits the 
prepareThreshold limit.  Once it has determined the statement will be 
reused many times it changes to a named statement that has a longer 
lifespan.
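
For reference, the server-side commands underlying named statements look
roughly like this (the statement and table names are made up):

PREPARE get_item(int) AS SELECT * FROM items WHERE id = $1;
EXECUTE get_item(42);
DEALLOCATE get_item;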


Kris Jurka

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Ivan Sergio Borgonovo
On Wed, 09 Jan 2008 20:29:39 +0100
Zoltan Boszormenyi <[EMAIL PROTECTED]> wrote:

> The decision to use MVCC in PostgreSQL makes the point moot.

...

thanks.

> In PostgreSQL, COUNT(*) responds at close to the same speed
> regardless of other transactions. Which way do you prefer?

Considering the relative value of count my interest was for something
that is even less precise than the "usual" count but performs better.
I'm not proposing to turn Postgres into MySQL.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Experiences with extensibility

2008-01-09 Thread Ivan Sergio Borgonovo
On Wed, 9 Jan 2008 13:45:10 -0600
"Scott Marlowe" <[EMAIL PROTECTED]> wrote:

> But my account rep told me it was easy, and he'd never lie to me,
> would he?  <@_@>

If he uses count(*) maybe, otherwise he is locking your $.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Prepared Statements

2008-01-09 Thread Martin Gainty
straight from jdbc2.1 doc
http://java.sun.com/j2se/1.3/docs/guide/jdbc/spec2/jdbc2.1.frame6.html
Statement
Statement object to submit a set of heterogeneous update commands together
as a single unit, or batch, to the underlying DBMS
i.e. execute Statement without parameters
http://java.sun.com/j2se/1.4.2/docs/api/java/sql/Statement.html

PreparedStatement
An element in a batch consists of a parameterized command and an associated
set of parameters when a PreparedStatement is used. (i.e. A Statement with
params)
http://java.sun.com/j2se/1.4.2/docs/api/java/sql/PreparedStatement.html

CallableStatement
Multiple sets of input parameter values may be associated with a callable
statement and sent to the DBMS together.
(For procedures)
http://java.sun.com/j2se/1.4.2/docs/api/java/sql/CallableStatement.html

Good luck
Martin
- Original Message -
From: <[EMAIL PROTECTED]>
To: 
Sent: Wednesday, January 09, 2008 2:02 PM
Subject: [GENERAL] Prepared Statements


> Hi,
>
> i am trying to understand "Prepared Statements". I am asking because i
> want to understand the impact of "Prepared statements" to my application.
>
> Actually i use Hibernate, DBCP Connection Pool with Postgresql-JDBC
> Driver and Postgresql 8.1.
>
> - I know there is a PREPARE Statement in Postgresql and read the docs.
> - in PostgresqlJDBC i have a prepareThreshold parameter which i left at
> the default of 5.
> - in DBCP i have a property "poolPreparedStatements", set to true. Does it
> just configure prepareThreshold of JDBC or does it maintain a
> statementPool of its own?
>
> In my Log files of postgresql each query is called like this:
>
> EXECUTE  [PREPARE: select ...]
>
> I have not found anything about preparing "unnamed" statements. What does
> it mean?
>
> many questions, but i was googling a lot and "Prepared Statement" is a
> somewhat ambiguous expression, isn't it? Can someone clarify this to me,
> please?
>
> kind regards,
> Janning
>
>
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
>
>    http://archives.postgresql.org/
>


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Experiences with extensibility

2008-01-09 Thread Scott Marlowe
On Jan 9, 2008 10:05 AM, Andrew Sullivan <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 08, 2008 at 10:59:56PM -0700, Guido Neitzer wrote:
> >
> > Easy multi-master clustering with just two machines.
>
> To my knowledge, _nobody_ actually offers that.
>
> There are three companies I know of that have done effective marketing of
> systems.
>
> Company O has a very advanced system with plenty of features.  When you see
> it functioning, it is very impressive.  Casual acquaintance with anyone who
> has attempted to implement it, however, will yield many stories that give
> the lie to any claims of "easy" multi-master.  Some implementors would be
> happy to get to "hard to do, but working" multi-master, as far as I've been
> able to ascertain.

But my account rep told me it was easy, and he'd never lie to me,
would he?  <@_@>

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Experiences with extensibility

2008-01-09 Thread Guido Neitzer

On 09.01.2008, at 09:05, Andrew Sullivan wrote:

>> Easy multi-master clustering with just two machines.
>
> To my knowledge, _nobody_ actually offers that.

As I said: FrontBase is offering that.

cug

--
http://www.event-s.net


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Experiences with extensibility

2008-01-09 Thread Andrew Sullivan
On Tue, Jan 08, 2008 at 10:59:56PM -0700, Guido Neitzer wrote:
> 
> Easy multi-master clustering with just two machines.

To my knowledge, _nobody_ actually offers that.

There are three companies I know of that have done effective marketing of
systems.

Company O has a very advanced system with plenty of features.  When you see
it functioning, it is very impressive.  Casual acquaintance with anyone who
has attempted to implement it, however, will yield many stories that give
the lie to any claims of "easy" multi-master.  Some implementors would be
happy to get to "hard to do, but working" multi-master, as far as I've been
able to ascertain.

Company M has a nifty 80% solution for in-memory clustering.  It's a cool
hack.  But it has a remarkably large number of failure modes and corner
cases that make it a risky answer for really high-value data.  If my bank
were using this technology, I would transfer my money to another bank.

Company I actually has the most mature technology in this area, if you're
willing to use VMS.  It relies on the features of VMS to do this.  Given
that those features were delivered precisely for the finance and insurance
industries where extremely valuable data was being stored, there is a long
history of reliable field deployments.  Unfortunately, the continued life of
VMS is in some doubt, and skilled VMS operators are increasingly hard to
find and expensive.

There are other systems, including PostgreSQL, that can do a kind of
"clustering" with multiple machines, shared disk, and some sort of heartbeat
arrangement. 

A

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Zoltan Boszormenyi

Ivan Sergio Borgonovo wrote:
> On Wed, 9 Jan 2008 13:04:39 +0100
> "Harald Armin Massa" <[EMAIL PROTECTED]> wrote:
>
> > Ivan,
> >
> > > Please forgive my naiveness in this field but what does it mean an
> > > "exact count" and what other DB means with "an exact count" and
> > > how other DB deal with it?
> >
> > PostgreSQL will give you an exact count of the contents of the
> > database as it is in the moment you begin your count (i.e. the
> > transaction starts).
> >
> > BUT as the table is not locked, in parallel somebody can bulkload
> > MANY items into the database, so at the moment (start of your
> > transaction) + 1msec your count may be invalid already.
>
> That's reasonable. What do other DBs do, and what is the SQL standard
> requirement for count(*)?
>
> > > I'd expect it to perform as well as or better than other DBs since
> > > now the bottleneck should be how efficiently it can filter records...
> > > but still a count(*) with a where clause will incur the same
> > > problem of what "exact" means.
> >
> > I know of 3 concepts to answer count() faster than PostgreSQL:
> >
> > 1) just lie, present an estimate
> > 2) do not have MVCC
> > 3) store "record deleted info" in index, so you can answer count()
> > with only scanning the index
>
> Sorry if I insist but I think this is a beaten selling point against
> postgresql.
> One of the most recurring arguments about why someone else's db is
> better than pg is that count is slow.
>
> Who lies?
> If it is possible to do otherwise to make count run faster, what are
> the trade-offs that make it unreasonable to implement it in pg?

The decision to use MVCC in PostgreSQL makes the point moot.
What your transaction cannot see (e.g. another transaction inserted
a new record but hasn't COMMITted yet) cannot be counted as visible.
It's not only theoretical but also practical. Compare that to the
MaxDB way. At the very lowest level MaxDB keeps the records
in a balanced b-tree based on the primary key. Look at this URL:
http://blog.ulf-wendel.de/?p=76
MaxDB knows how many records are in the tree at all times, which is
handy if your database is close to idle. Consequences:

- very fast COUNT(*) when idle
- records unCOMMITted by other transactions can be seen by COUNT(*)
  if your transaction is in "read committed" isolation level, but not the
  actual records
- COUNT(*) waits for all other transactions that modified the table
  in question to finish if your transaction is in "repeatable read" or
  "serializable" isolation level. Consider that transactions can take
  a long time to finish if they process many things. This way your
  SELECT COUNT(*) doesn't respond instantly but doesn't slow down
  your server either. The end user perception is the same, though:
  COUNT(*) is slow!

In PostgreSQL, COUNT(*) responds at close to the same speed regardless
of other transactions. Which way do you prefer?
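
To make the visibility rule concrete, a small two-session sketch (table
foo is hypothetical):

-- Session A:
BEGIN;
INSERT INTO foo VALUES (1);   -- row not yet COMMITted

-- Session B, in "read committed" isolation:
SELECT COUNT(*) FROM foo;     -- does not include session A's uncommitted row,
                              -- and does not block waiting for session A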


> This is not a very useful question, but 3) implies that select scans the
> index, returns the rows, and only later checks if they are still there.
> Is that it?
> And since indexes aren't updated "on the fly" you may get back a
> larger number than what is actually the real value.
>
> Let me consider an everyday use where count() looks like the most
> obvious solution: paging.
>
> I search through a table and I need to know which is the last page.
> Concurrently someone is deleting a ton of records.
> No matter if count() is fast or not, when I output the pager it will
> be "wrong". But still I'll need an estimate of the number of pages;
> it is not fun if that estimate is wrong *and* slow.
> And once you add the where clauses there is no way to cache the count.
> Is there a way to count based on indexes without taking into account
> deleted rows, so as to "count" faster?
>
> I can make the search faster using indexes as Dann Corbit suggested,
> but as you imply that won't make count as fast as the
> "concurrence"[1] that lie or don't use MVCC or store deleted info in
> indexes.
>
> SELECT reltuples FROM pg_class WHERE relname = 'tablename';
>
> doesn't apply since you can't add "conditions".
>
> Please be patient. I hear this over and over and over. Postgresql is
> faulty because it can't count fast.
> And the only reply I've seen is: it is bad design to use count,
> you're a dumb ass. I admit I may be a dumb ass, but it is hard to
> sell Postgres if I continue to be a dumb ass ;)
>
> - What does count(*) really do?
> Now I understand that count(*) returns the # of rows as it sees them
> at the moment it was invoked. That should be true for other DBs as
> well. That means that unless other DBs lock the table they can't take
> into account records that are going to be deleted once the count has
> been issued.
>
> - When can count be avoided?
> Well, since even other DBs may report the "wrong" number, this makes
> count() look less like a Holy Grail. But still, if you need an estimate,
> wouldn't it be better to have it quickly?
> How does postgresql compare to other DBs when you run:
> select count(*) from table where conditions
> once you use indexes?
> If such kind of query will have anyway to scan the results to see if
> they are still there since info about deleted records aren't s

Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Ivan Sergio Borgonovo
On Wed, 9 Jan 2008 10:58:29 -0800
"Joshua D. Drake" <[EMAIL PROTECTED]> wrote:

> > OK... we are getting near to the point. I understand the trade-off
> > problem in storing into indexes id the row is still there.
> > Is there a way to get the count of the rows that *may be* there, 

> If you analyze regularly you can use pg_class. It isn't exact but is
> usually close enough (especially if you are just using it for
> something like pagination).

But what if I have a

select count(*) from table where condition;

where the condition involves just indexed columns and I want to trust the
indexes and I'm not concerned about deleted rows?
Just to get an estimate between reindexing cycles; that would be
perfect for paging.

pg_class, though, looks like it only returns a count for all the rows.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Scott Marlowe
On Jan 9, 2008 12:58 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On Wed, 9 Jan 2008 20:01:05 +0100
> Ivan Sergio Borgonovo <[EMAIL PROTECTED]> wrote:
>
> > On Wed, 9 Jan 2008 10:30:45 -0600
> > "Scott Marlowe" <[EMAIL PROTECTED]> wrote:
> >
> > > Now, everything's a tradeoff.  If PostgreSQL had visibility
> > > information in the indexes, it would have to lock both the table and
> > > index for every write, thus slowing down all the other queries that
> > > are trying to access the table.  It would be a tradeoff that
> > > sacrificed write speed for read speed. In a db that was used mostly
> > > for reading, it would likely be a fair trade.  In a db that did a
> > > lot of writing, it might slow the whole thing to a crawl.
> >
> > OK... we are getting near to the point. I understand the trade-off
> > problem in storing into indexes id the row is still there.
> > Is there a way to get the count of the rows that *may be* there,
>
> If you analyze regularly you can use pg_class. It isn't exact but is
> usually close enough (especially if you are just using it for something
> like pagination).

Yeah, but the OP's point was that it doesn't work if you have a where clause.

I could see a use for an approximate count(*) with where clause, just
like I could see a use for the ability to retrieve random rows from a
table without using order by random() on it.  And those are both
things that would require some form of hacking in the db that I'm
certainly not capable of pulling off...

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] Prepared Statements

2008-01-09 Thread mljv
Hi,

I am trying to understand "Prepared Statements", because I want to
understand the impact of prepared statements on my application.

Currently I use Hibernate, the DBCP connection pool with the Postgresql JDBC
driver, and Postgresql 8.1.

- I know there is a PREPARE statement in Postgresql and have read the docs.
- In the Postgresql JDBC driver I have a prepareThreshold parameter which I
left at the default of 5.
- In DBCP I have a property "poolPreparedStatements", set to true. Does it
just configure the JDBC prepareThreshold or does it maintain a statement
pool of its own?

In my postgresql log files each query shows up like this:

EXECUTE  [PREPARE: select ...]
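
For readers following along, a minimal sketch of what a named server-side
prepared statement looks like in plain SQL; the statement name q1, table t,
and column id are hypothetical:

PREPARE q1 (int) AS
    SELECT * FROM t WHERE id = $1;  -- parsed and planned once

EXECUTE q1(42);                     -- executed with a bound parameter
DEALLOCATE q1;                      -- free the prepared statement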

I have not found anything about preparing "unnamed" statements. What does
that mean?

Many questions, but I was googling a lot and "Prepared Statement" is a
somewhat ambiguous expression, isn't it? Can someone clarify this for me,
please?

kind regards,
Janning




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Wed, 9 Jan 2008 20:01:05 +0100
Ivan Sergio Borgonovo <[EMAIL PROTECTED]> wrote:

> On Wed, 9 Jan 2008 10:30:45 -0600
> "Scott Marlowe" <[EMAIL PROTECTED]> wrote:
> 
> > Now, everything's a tradeoff.  If PostgreSQL had visibility
> > information in the indexes, it would have to lock both the table and
> > index for every write, thus slowing down all the other queries that
> > are trying to access the table.  It would be a tradeoff that
> > sacrificed write speed for read speed. In a db that was used mostly
> > for reading, it would likely be a fair trade.  In a db that did a
> > lot of writing, it might slow the whole thing to a crawl.
> 
> OK... we are getting near to the point. I understand the trade-off
> > problem in storing into indexes if the row is still there.
> Is there a way to get the count of the rows that *may be* there, 

If you analyze regularly you can use pg_class. It isn't exact but is
usually close enough (especially if you are just using it for something
like pagination).
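
For instance, a minimal sketch (table name hypothetical; reltuples is only
as fresh as the last VACUUM or ANALYZE):

SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'foo';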

Joshua D. Drake



- -- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHhRlVATb/zqfZUUQRAoQmAJ4nkBHmZEsC8UusCT7+qul1Qa9/0QCeJFru
gnBj3ROCMz+vqbF/1Z78roY=
=iVOP
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Ivan Sergio Borgonovo
On Wed, 9 Jan 2008 10:30:45 -0600
"Scott Marlowe" <[EMAIL PROTECTED]> wrote:

> Now, everything's a tradeoff.  If PostgreSQL had visibility
> information in the indexes, it would have to lock both the table and
> index for every write, thus slowing down all the other queries that
> are trying to access the table.  It would be a tradeoff that
> sacrificed write speed for read speed. In a db that was used mostly
> for reading, it would likely be a fair trade.  In a db that did a
> lot of writing, it might slow the whole thing to a crawl.

OK... we are getting near to the point. I understand the trade-off
problem in storing into indexes if the row is still there.
Is there a way to get the count of the rows that *may be* there, I
mean the ones that satisfy the where clauses no matter if in the
meanwhile they were deleted or not, and reindex on a regular basis?
That would be simple, and I think fast, and an acceptable solution for
most CMSes.

> Now, if there's a where clause that's selective enough, then a
> count(*) query may use the index and be a bit faster, but remember,
> in pgsql, it's still got to actually hit the table to see if each
> tuple really is visible to this transaction, so the index needs to
> be fairly selective to be a win.

But then if the query is not selective enough, I think the problem is
shared with other DBs as well.

> A possible workaround is to have something like a separate table
> with nothing but the IDs and whatever would be in your where clause
> for the tables you're accessing with a foreign key to it, and use
> THAT for a count(*).  Since the rows are skinnier, the count(*)
> will be faster. Another alternative is to have a trigger fire that
> keeps a track of the size of the table in a summary table when rows
> are added and deleted.

If the where clause is dynamic, how can it help?

> select count(*) from bigtable where inserttime > now() - interval '1 week';
>   count
> ---------
>  1254269
> (1 row)
> 
> Time: 21422.368 ms
> 
> Second run:
> select count(*) from bigtable where inserttime > now() - interval '1 week';
>   count
> ---------
>  1254320
> (1 row)
> 
> Time: 2001.204 ms
> 
> With the data loaded into shared_buffers / linux kernel cache,
> that's not too bad.
> 
> Now, I run the same query against our production oracle machine,
> which is a MUCH more powerful server...
> 
> SQL> select count(*) from bigtable where inserttime > SYSDATE-7;
> 
>   COUNT(*)
> ----------
>    1255972
> 
> Elapsed: 00:00:18.62
> 
> second run:
> SQL> select count(*) from bigtable where inserttime > SYSDATE-7;
> 
>   COUNT(*)
> ----------
>    1255973
> Elapsed: 00:00:00.98
> 
> Now, obviously, Oracle's got some optimizations for what it's got in
> the buffer there, but the first run isn't really any faster. In
> fact, for a much more powerful machine, the performance was,
> relatively speaking, pretty bad compared to my little 1 CPU 1 sw
> RAID-10 reporting server.

Interesting... oh let's put some emotions in it: cool ;)

> So, while PostgreSQL's count(*) performance isn't blindingly fast,
> it's not the dog some people make it out to be either.

Continuing here from your previous post... while you guys are
concerned about competing with big boxes running Oracle or thinking
about Google's webfarms, I'm thinking about stuff that is in between a
mere CMS for mortals and stuff that deals with money, where you don't
have enough budget and hits to justify complex logic to do stats
analysis, or to collect enough stats to make any forecast reasonable.

In this context a simpler, faster, even if less accurate count may be
very handy.

thanks again

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Josh Harrison
On Jan 9, 2008 12:11 PM, Andrew Sullivan <[EMAIL PROTECTED]> wrote:

> On Wed, Jan 09, 2008 at 11:51:16AM -0500, Josh Harrison wrote:
>
> > accessed frequently. So clustering the table according to one index will
> > yield poor performance to queries involving other indexes.
>
> Maybe not poor, but certainly not optimised.
>
> > Index-only scan is a good solution for this I guess for queries
> involving
> > indexed columns (like in oracle) !!!
>
> I think I don't know what you mean by "index-only scan".  Oracle can't be
> returning you data just by scanning an index, can it?  It needs to get the
> tuple if you need it back.


For example, if I have a query like
select column2 from ABC where column1 > 20
and table ABC is indexed on (column1, column2), then Oracle will not go to
the heap to fetch the tuples. It will return them from the index itself,
since the column information is available in the index. But postgres always
goes to the heap even if it has all the queried columns in the index.
For queries that involve all the columns (like select * from ABC where
column1 > 20), of course Oracle goes to the heap to fetch them.
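
A sketch of the covering index being described (index name hypothetical):

CREATE INDEX abc_col1_col2 ON ABC (column1, column2);

-- Oracle can answer this from the index alone; Postgres of this era
-- still visits the heap to check tuple visibility:
SELECT column2 FROM ABC WHERE column1 > 20;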

>
> My bet is that you haven't tuned your vacuums correctly, or you aren't
> doing
> ANALYSE often enough on the affected tables, or you need to SET STATISTICS
> higher on some of the columns in order to get better estimates (and
> therefore better plans).


I had vacuumed and analysed the tables, and tried increasing the statistics
too. But the performance compared to oracle for these types of queries (the
ones mentioned above) is pretty slow.

josh


Re: [GENERAL] Insert waiting for update?

2008-01-09 Thread Albe Laurenz
Ashish Karalkar wrote:
> I am having table with 4M rows.
> I am trying to update all these rows with statement
> 
> update mytable set mycolumn=0;
> 
> At the same time there are insert happening on the table.
> but all these insert are in waiting mode. 
> does update is locking the table for insert?
> 
> does insert and update confilict with each other?
> 
> For more details:
> 
> I have two tables master,child.
> with child having fk to master.
> 
> Now that master table contains 4M rows . while I update them 
> (Master table) the inserts are going into waiting mode on child table.
> 
> 
> Update  acquired row exclusive lock on master table  , and 
> the insert on child table acquired access share and share 
> lock,row share on master. 
> 
> Is this is what preventing for inserting records into child 
> table, I mean is the row exclusive lock conflicting with the 
> locks insert wants to hold?

Please post the table definitions including indexes and constraints.

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Josh Harrison
On Jan 9, 2008 11:56 AM, Andreas Kretschmer <[EMAIL PROTECTED]>
wrote:

> Josh Harrison <[EMAIL PROTECTED]> schrieb:
> > My questions
> > 1.  I pg_dumped dummy and Shuffled_dummy (from database1) to another
> database
> > (database2)
> > When I issued the query in both database (database1 and database2)
> >
> > select * from dummy limit 1000 ( the planner chooses seq scan for this
> query)
> > select * from shuffled_dummy limit 1000 (planner chooses seq scan)
> >
> >
> > 2. Also when does the planner switch from choosing index scan to bitmap
> index
> > scan? Is it dependent on the number of rows to be retrieved or the
> position of
> > the relevant data in the blocks or something else?
>
> For a select * ... without a WHERE the db can't use an index; such a
> query forces a seq-scan.
>
> An index is used when:
> - an index has been created
> - there is a proper WHERE or ORDER BY in the query
> - the index is useful
>   (an index isn't useful, for instance, for a small table or when almost
>   all rows are in the result set)
>
> A bitmap index scan is performed when:
> - 2 or more proper indexes are available
> - see above

Thanks.
In my database, I have a table 'person' containing roughly 30,000,000
records

explain select count(*) from person where person_id > 114700

                                  QUERY PLAN
------------------------------------------------------------------------------------
 Aggregate  (cost=307708.20..307708.21 rows=1 width=0)
   ->  Index Scan using person_pk on person  (cost=0.00..307379.79 rows=131364 width=0)
         Index Cond: (person_id > 114700::numeric)

 3 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms]

This returns the result
 count
-------
 78718

But for this query where the condition is slightly different the query plan
is different. The planner goes for bitmap index

explain select count(*) from person where person_id > 114600

                                  QUERY PLAN
------------------------------------------------------------------------------------
 Aggregate  (cost=342178.51..342178.52 rows=1 width=0)
   ->  Bitmap Heap Scan on person  (cost=3120.72..341806.71 rows=148721 width=0)
         Recheck Cond: (person_id > 114600::numeric)
         ->  Bitmap Index Scan on person_pk  (cost=0.00..3083.53 rows=148721 width=0)
               Index Cond: (person_id > 114600::numeric)

 5 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms]

and the result is
 count
-------
 90625

How does the planner choose the plan?
josh


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Andrew Sullivan
On Wed, Jan 09, 2008 at 11:03:59AM -0600, Scott Marlowe wrote:
> 
> And if, for some god forsaken reason, you need to operate on that
> number, there's always "lock table"...

Yes.  You could also store the data in ISAM :-P

> I feel dirty. :)

You should.  Go wash your brain out with soap.  LOCK TABLE indeed.

A


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Andrew Sullivan
On Wed, Jan 09, 2008 at 11:51:16AM -0500, Josh Harrison wrote:

> accessed frequently. So clustering the table according to one index will
> yield poor performance to queries involving other indexes. 

Maybe not poor, but certainly not optimised.

> Index-only scan is a good solution for this I guess for queries involving
> indexed columns (like in oracle) !!!

I think I don't know what you mean by "index-only scan".  Oracle can't be
returning you data just by scanning an index, can it?  It needs to get the
tuple if you need it back.

My bet is that you haven't tuned your vacuums correctly, or you aren't doing
ANALYSE often enough on the affected tables, or you need to SET STATISTICS
higher on some of the columns in order to get better estimates (and
therefore better plans).

A

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Scott Marlowe
On Jan 9, 2008 10:46 AM, Andrew Sullivan <[EMAIL PROTECTED]> wrote:
> On Wed, Jan 09, 2008 at 05:21:24PM +0100, Ivan Sergio Borgonovo wrote:
> >
> > > I got the impression that even counting with clauses on indexed
> > > columns means you'll have to check if the rows are still there. That
> > seems to imply that the extra cost make pg under perform compared to
> > other DB even in that scenario.
>
> You have to do this for any row you need to see, for _any_ database
> operation in Postgres.  But that's no different from any other database
> system: they all have to locate all the rows that satisfy the condition, and
> then tell you how many there are.
>
> Many other systems, however, "know" how many rows there are in the table.
> In some sense, they have optimised for that case at the expense of other
> cases (like, for instance, more aggressive locks than Postgres takes, or
> failures due to rollback segment exhaustion, or whatever your favourite
> limitation of your system of choice is).  When you build a system, you're
> going to trade some features for others more than likely, and the real
> question is what things you trade away.  The speed of counting all the rows
> in the table seems to me to be a good thing to trade away, because it's very
> rare that you actually need to know that.
>
> > If you're interested in all the records in a table, there is no way to
> > have an "engraved in stone" answer
>
> Sure there is: SELECT count(*) from table.  That tells you how many rows
> there were in the table when your transaction started (modulo read
> visibility rules), just like in any other database system.

And if, for some god forsaken reason, you need to operate on that
number, there's always "lock table"...

I feel dirty. :)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Andreas Kretschmer
Josh Harrison <[EMAIL PROTECTED]> schrieb:
> My questions
> 1.  I pg_dumped dummy and Shuffled_dummy (from database1) to another database
> (database2)
> When I issued the query in both database (database1 and database2)
> 
> select * from dummy limit 1000 ( the planner chooses seq scan for this query)
> select * from shuffled_dummy limit 1000 (planner chooses seq scan)
> 
> 
> 2. Also when does the planner switch from choosing index scan to bitmap index
> scan? Is it dependent on the number of rows to be retrieved or the position of
> the relevant data in the blocks or something else?

For a select * ... without a WHERE the db can't use an index; such a query
forces a seq-scan.

An index is used when:
- an index has been created
- there is a proper WHERE or ORDER BY in the query
- the index is useful
  (an index isn't useful, for instance, for a small table or when almost
  all rows are in the result set)

A bitmap index scan is performed when:
- 2 or more proper indexes are available
- see above



Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."(unknow)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Josh Harrison
On Jan 9, 2008 11:39 AM, Andrew Sullivan <[EMAIL PROTECTED]> wrote:

> On Wed, Jan 09, 2008 at 05:28:15PM +0100, Ivan Sergio Borgonovo wrote:
> > Does it make any sense *knowing* how the implementation works to load
> > records in a table in a specific order to improve performances?
>
> Well, this is more or less what CLUSTER does.  There are some cases where
> happening to know about the order the table is in will yield happy
> effects,
> yes.
>

You are right. Sometimes when I cluster the table according to the
frequently accessed indexes it makes queries pretty fast. But it's not
always a feasible solution, since some tables have several indexes which
are accessed frequently, and clustering the table according to one index
will yield poor performance for queries involving the other indexes. An
index-only scan would be a good solution for this, I guess, for queries
involving only indexed columns (like in oracle)!!!
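
For reference, a minimal sketch of the clustering step I mean (index and
table names hypothetical; this is the pre-8.4 CLUSTER syntax):

CLUSTER person_pk ON person;  -- rewrite the table in person_pk index order
ANALYZE person;               -- refresh planner statistics afterwards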

josh


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Josh Harrison
On Jan 9, 2008 11:28 AM, Ivan Sergio Borgonovo <[EMAIL PROTECTED]> wrote:

> On Wed, 09 Jan 2008 10:54:21 -0500
> Tom Lane <[EMAIL PROTECTED]> wrote:
>
> > Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > > Josh Harrison wrote:
> > >> Fine. I can use order by when I want  to order it in terms of
> > >> some columns. But What if I want to maintain the same order as
> > >> in the database1? ie., I want my rows of TableABC in Database2
> > >> to be the same order as the rows in TableABC in Database 1 ???
> >
> > > You can't.
> >
> > According to the SQL standard, a table is an *unordered* collection
> > of rows, and the results of any query are produced in an
> > unspecified order (unless you use ORDER BY).  The ambiguity about
> > row ordering is intentional and is exploited by most DBMSes
> > including Postgres to improve implementation efficiency.  If you
> > assume there is such a thing as a specific ordering within a table,
> > you'll live to regret it eventually.
>
> Does it make any sense *knowing* how the implementation works to load
> records in a table in a specific order to improve performances?
>
> And yeah I know that once you start deleting/updating row you may
> lose the advantage you gained betting on some peculiarity of the
> implementation... but in case you're dealing with a mostly static
> table?
>
> eg. if I'm importing a table does it make any sense to pre-sort it
> before importing it in postgres?
>

Okay. Let me explain this again
Let's say you load the data related to a particular person sequentially into
a table in the test database. This results in all of the data for any one
person being located in one or a very few sequential data blocks. Testing
access to the person's data then reveals access to be very fast. However,
the data is never loaded that way in the production database. It is
almost always spread out across many data blocks within the database,
roughly organized by the date-time at which the data arrived. In this case
access to a particular person's data is not as fast as in the
previous case, where the data are located close to each other.

We have this problem when we compare Oracle's performance with postgres,
since Oracle has an index-only scan where it can access the data just from
the index when the query involves only indexed columns. But since postgres
currently doesn't implement index-only scans and goes to the heap to
fetch the tuples, it becomes very slow when the data are shuffled.

Let me know if it makes sense now.
Thanks
josh


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Andrew Sullivan
On Wed, Jan 09, 2008 at 05:21:24PM +0100, Ivan Sergio Borgonovo wrote:
> 
> I got the impression that even counting with clauses on indexed
> columns means you'll have to check if the rows are still there. That
> seems to imply that the extra cost make pg under perform compared to
> other DB even in that scenario.

You have to do this for any row you need to see, for _any_ database
operation in Postgres.  But that's no different from any other database
system: they all have to locate all the rows that satisfy the condition, and
then tell you how many there are.

Many other systems, however, "know" how many rows there are in the table. 
In some sense, they have optimised for that case at the expense of other
cases (like, for instance, more aggressive locks than Postgres takes, or
failures due to rollback segment exhaustion, or whatever your favourite
limitation of your system of choice is).  When you build a system, you're
going to trade some features for others more than likely, and the real
question is what things you trade away.  The speed of counting all the rows
in the table seems to me to be a good thing to trade away, because it's very
rare that you actually need to know that.

> If you're interested in all the records in a table, there is no way to
> have an "engraved in stone" answer 

Sure there is: SELECT count(*) from table.  That tells you how many rows
there were in the table when your transaction started (modulo read
visibility rules), just like in any other database system.

A

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Andrew Sullivan
On Wed, Jan 09, 2008 at 05:28:15PM +0100, Ivan Sergio Borgonovo wrote:
> Does it make any sense *knowing* how the implementation works to load
> records in a table in a specific order to improve performances?

Well, this is more or less what CLUSTER does.  There are some cases where
happening to know about the order the table is in will yield happy effects,
yes.

A


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Scott Marlowe
On Jan 9, 2008 10:21 AM, Ivan Sergio Borgonovo <[EMAIL PROTECTED]> wrote:
> On Wed, 09 Jan 2008 16:33:54 +0200
> Sim Zacks <[EMAIL PROTECTED]> wrote:
>
> > Using count(*) is not bad design, though generally it makes sense
> > to use it with a where.
>
> I got the impression from others' comments that postgresql
> underperforms other DBs even when a where clause on an indexed column
> is involved.

Correct, see my other post.  Again though, it's a tradeoff.  This one
lower performing action allows other actions to be faster.

> Again: paging records. You can't do statistics.

Sure you can.  When you're near the front, no one cares how exact it
is.  Ever search for a common term on google?  You don't get an exact
count, you get an approximation, and you get it for a reason.

Other sites often lie, and give what looks like an exact count, but if
it's in the 100,000 who really cares?  Seriously, are you going to
10,000th page on google for a search term?  If someone does start
going that far out, the cost of limit/offset are going to kick in, and
not just in postgresql, and queries are going to take longer and
longer.  At that point, you can switch to an exact count(*) if you
need to have it.  It won't be the most expensive thing you're doing.

> Still everybody knows that a frequent complaint about postgresql is that
> it has a slow count.

I've never heard it before (just kidding).  I do think it's frequent.
I also think it's overstated.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Scott Marlowe
On Jan 9, 2008 8:33 AM, Sim Zacks <[EMAIL PROTECTED]> wrote:
> Using count(*) is not bad design, though generally it makes sense to use
> it with a where.
>
> Saying using count(*) is bad design means that the only design that you
> can visualize is the specific one that you are using.
>
> There are tons of real world examples where you need count. That is why
> so many people use it as a benchmark.
>
> Obviously if you have an application where millions of rows are added and
> subtracted every minute, then the value of count is kind of vague.
>
> However, if you are querying a data warehouse that gets populated once a
> month, then count has a huge value. You could use statistics in that
> case, but it is much harder to join against other tables when you are
> using statistics. It is also less intuitive than using the SQL standard
> for finding the number of rows.

I think part of the problem is people think of count() as something
other than an aggregate function.

If I ran "select avg(i) from table" on a 20M row table, I'd expect it
to take a few seconds, after all, I'm running a function across 20
Million rows.

Some databases have the ability to short-circuit count(*) without a
where clause, some with a where clause.  But the basic model of
count(*) is that it's an aggregate function, and what you're asking
the db to do is to count every single row in the db that matches your
where clause.

Now, everything's a tradeoff.  If PostgreSQL had visibility
information in the indexes, it would have to lock both the table and
index for every write, thus slowing down all the other queries that
are trying to access the table.  It would be a tradeoff that
sacrificed write speed for read speed. In a db that was used mostly
for reading, it would likely be a fair trade.  In a db that did a lot
of writing, it might slow the whole thing to a crawl.

So, the slow count(*) performance of postgresql, especially count(*)
without a where clause, is a result of the TANSTAAFL principle (there
ain't no such thing as a free lunch).

Now, if there's a where clause that's selective enough, then a
count(*) query may use the index and be a bit faster, but remember, in
pgsql, it's still got to actually hit the table to see if each tuple
really is visible to this transaction, so the index needs to be fairly
selective to be a win.

A possible workaround is to have something like a separate table with
nothing but the IDs and whatever would be in your where clause for the
tables you're accessing with a foreign key to it, and use THAT for a
count(*).  Since the rows are skinnier, the count(*) will be faster.
Another alternative is to have a trigger fire that keeps a track of
the size of the table in a summary table when rows are added and
deleted.
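
A minimal sketch of that trigger idea, assuming a tracked table foo (all
names hypothetical; 8.x-era plpgsql syntax):

-- summary table, one row per tracked table
CREATE TABLE row_counts (
    tablename text PRIMARY KEY,
    rec_cnt   bigint NOT NULL
);
INSERT INTO row_counts VALUES ('foo', 0);

CREATE OR REPLACE FUNCTION track_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE row_counts SET rec_cnt = rec_cnt + 1 WHERE tablename = TG_RELNAME;
    ELSE  -- DELETE
        UPDATE row_counts SET rec_cnt = rec_cnt - 1 WHERE tablename = TG_RELNAME;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_track_count
    AFTER INSERT OR DELETE ON foo
    FOR EACH ROW EXECUTE PROCEDURE track_count();

-- the count is then just (caveat: concurrent writers all update one row):
SELECT rec_cnt FROM row_counts WHERE tablename = 'foo';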

Each of these methods "costs" you something, in time and effort or
performance, and that's why they're not automatic.  For instance, I
have an 80M row stats db that grows by about 1M rows a week.  I do NOT
need to count the whole thing, hardly ever, and can live with the fact
that I don't know exactly how many rows it has at any given time.
When I do a select count(*) with a where clause it's usually
restricted to < 1 weeks data and can use an index and come back pretty
quickly.

select count(*) from bigtable where inserttime > now() - interval '1 week';
  count
---------
 1254269
(1 row)

Time: 21422.368 ms

Second run:
select count(*) from bigtable where inserttime > now() - interval '1 week';
  count
---------
 1254320
(1 row)

Time: 2001.204 ms

With the data loaded into shared_buffers / linux kernel cache, that's
not too bad.

Now, I run the same query against our production oracle machine, which
is a MUCH more powerful server...

SQL> select count(*) from bigtable where inserttime > SYSDATE-7;

  COUNT(*)
----------
   1255972

Elapsed: 00:00:18.62

second run:
SQL> select count(*) from bigtable where inserttime > SYSDATE-7;

  COUNT(*)
----------
   1255973
Elapsed: 00:00:00.98

Now, obviously, Oracle's got some optimizations for what it's got in
the buffer there, but the first run isn't really any faster. In fact,
for a much more powerful machine, the performance was, relatively
speaking, pretty bad compared to my little 1 CPU 1 sw RAID-10
reporting server.

So, while PostgreSQL's count(*) performance isn't blindingly fast,
it's not the dog some people make it out to be either.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Installation problem: failed to initialize lc_messages to ""

2008-01-09 Thread Tom Lane
Stefan Schwarzer <[EMAIL PROTECTED]> writes:
> FATAL:  XX000: failed to initialize lc_messages to ""
> LOCATION:  InitializeGUCOptions, guc.c:2666

Typically what this means is that you have an improper setting of LANG
or LC_ALL in your environment ("improper" meaning that it doesn't match
any of the locales that are actually installed on your system).  You
can use "locale -a" to find out what's installed.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Ivan Sergio Borgonovo
On Wed, 09 Jan 2008 10:54:21 -0500
Tom Lane <[EMAIL PROTECTED]> wrote:

> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Josh Harrison wrote:
> >> Fine. I can use order by when I want  to order it in terms of
> >> some columns. But What if I want to maintain the same order as
> >> in the database1? ie., I want my rows of TableABC in Database2
> >> to be the same order as the rows in TableABC in Database 1 ???
> 
> > You can't.
> 
> According to the SQL standard, a table is an *unordered* collection
> of rows, and the results of any query are produced in an
> unspecified order (unless you use ORDER BY).  The ambiguity about
> row ordering is intentional and is exploited by most DBMSes
> including Postgres to improve implementation efficiency.  If you
> assume there is such a thing as a specific ordering within a table,
> you'll live to regret it eventually.

Does it make any sense *knowing* how the implementation works to load
records in a table in a specific order to improve performances?

And yeah I know that once you start deleting/updating row you may
lose the advantage you gained betting on some peculiarity of the
implementation... but in case you're dealing with a mostly static
table?

eg. if I'm importing a table does it make any sense to pre-sort it
before importing it in postgres?

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Ivan Sergio Borgonovo
On Wed, 09 Jan 2008 16:33:54 +0200
Sim Zacks <[EMAIL PROTECTED]> wrote:

> Using count(*) is not bad design, though generally it makes sense
> to use it with a where.

I got the impression from others' comments that postgresql
underperforms other DBs even when a where clause on an indexed column
is involved.

I may have misinterpreted this but still could someone clarify?

Harald Armin Massa wrote:

> 1) just lie, present an estimate
> 2) do not have MVCC
> 3) store "record deleted info" in index, so you can answer count()
> with only scanning the index

I got the impression that even counting with clauses on indexed
columns means you'll have to check if the rows are still there. That
seems to imply that the extra cost makes pg underperform compared to
other DBs even in that scenario.

I beg Harald's pardon if I misinterpreted his words.

> Saying using count(*) is bad design means that the only design that
> you can visualize is the specific one that you are using.

I'd be interested in some examples from Joshua; otherwise I can't
understand what he meant.

If you're interested in all the records in a table, there is no way to
have an "engraved in stone" answer, and since there are no "where
clauses" you can cache that info and update it once in a while.
If you have a where clause I'm not expecting an "engraved in stone"
answer, but I'd expect to have a quick way to get an estimate, and I
still haven't understood if:
a) I could improve my design to avoid count
b) postgres performs as well as other dbs on count when there is a
where clause
c) there is a way to get a quick estimate, avoiding count, when there
is a where clause

> There are tons of real world examples where you need count. That is
> why so many people use it as a benchmark.

> Obviously if you have an application where millions of rows are
> added and subtracted every minute, then the value of count is kind
> of vague.

> However, if you are querying a data warehouse that gets populated
> once a month, then count has a huge value. You could use statistics
> in that case, but it is much harder to join against other tables
> when you are using statistics. It is also less intuitive then using
> the SQL standard for finding the number of rows.

Again: paging records. You can't do statistics.
Surely you could optimise and lie... but that comes at a cost
compared to the simplicity of count.

Still everybody knows that a frequent complaint about postgresql is that
it has a slow count.

I can understand grey answers provided they are coherent:
a) the above claim is false
b) this claim is true just in cases where you could opt for a better
design
c) this claim is false for count without a where clause
d) this claim is true
e) ...

details on b) would be much appreciated. Other cases require just a
yes/no answer.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Experiences with extensibility

2008-01-09 Thread Andrew Sullivan
On Tue, Jan 08, 2008 at 11:37:38PM -0700, Guido Neitzer wrote:
> Like, I have a situation where I need multi-master just for  
> availability. Two small servers are good enough for that. But  
> unfortunately with PostgreSQL the whole setup is a major pain in the ...

Really?  I don't think a RAID array with heartbeat and failover is that big
a pain.  It requires some careful implementation, but it can be made to work
well, I think.

A


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Josh Harrison
On Jan 9, 2008 10:27 AM, Alvaro Herrera <[EMAIL PROTECTED]> wrote:

> Josh Harrison wrote:
>
> > Another quick question...When you issue a query like this
> > select * from dummy limit 10
> > What 10 rows are fetched? like first 10  or last 10 or  the first 10
> from
> > first block or ?
>
> Any 10.  (First 10 in the physical table _if_ a seqscan is used).
>

Okay. Here is another scenario where I'm confused.
I have a table with around 30,000,000 recs. This is not a production
system but a test system. So in the test system we generally upload the rows
in some order, say rows corresponding to a particular patient or something
like that. But in the production system it generally doesn't happen like
that. The rows of one particular patient can be shuffled anywhere (i.e.,
inserted in any order). We are trying to duplicate the same by shuffling the
data in the table so that the rows are not in any order and also not stored
in contiguous blocks.

So now I have a table Dummy with 30,000,000 recs and a table Shuffled_Dummy
(Create table Shuffled_Dummy as select * from Dummy order by random() ) with
the same shuffled rows of dummy.

My questions
1.  I pg_dumped dummy and Shuffled_dummy (from database1) to another
database(database2)
When I issued the query in both database (database1 and database2)

select * from dummy limit 1000 ( the planner chooses seq scan for this
query)
 - the output results from dummy are different in the 2 databases
But
select * from shuffled_dummy limit 1000 (planner chooses seq scan)
 - the outputs from shuffled_dummy are the same in both databases

Why?

2. Also when does the planner switch from choosing index scan to bitmap
index scan? Is it dependent on the number of rows to be retrieved or the
position of the relevant data in the blocks or something else?

Thanks
josh


Re: [GENERAL] PgSql Mirroring/Fail Over Server

2008-01-09 Thread Scott Marlowe
On Jan 9, 2008 3:24 AM, Rayudu Madhava <[EMAIL PROTECTED]> wrote:
>
> Sir,
>
>   I am very new to Pgsql. I have a server serving 200
> clients. I want to prepare a failover /mirroring
> server which in case the original server fails should
> take over automatically. Kindly Guide me.

Probably the easiest way to accomplish this is through pgpool II.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Josh Harrison wrote:
>> Fine. I can use order by when I want  to order it in terms of some columns.
>> But What if I want to maintain the same order as in the database1? ie., I
>> want my rows of TableABC in Database2 to be the same order as the rows in
>> TableABC in Database 1 ???

> You can't.

According to the SQL standard, a table is an *unordered* collection of
rows, and the results of any query are produced in an unspecified order
(unless you use ORDER BY).  The ambiguity about row ordering is
intentional and is exploited by most DBMSes including Postgres to
improve implementation efficiency.  If you assume there is such a thing
as a specific ordering within a table, you'll live to regret it
eventually.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Insert waiting for update?

2008-01-09 Thread Ashish Karalkar
Thanks for the reply. I think you missed the second mail with more details:

For more details:

I have two tables, master and child, with child having an fk to master.

Now that master table contains 4M rows. While I update them (master table)
the inserts are going into waiting mode on the child table.

Update acquired a row exclusive lock on the master table, and the insert on
the child table acquired access share and share lock, row share on master.

Is this what is preventing records from being inserted into the child
table? I mean, is the row exclusive lock conflicting with the locks the
insert (which is again row exclusive lock) wants to hold?

With Regards
Ashish...

--- On Wed, 9/1/08, Albe Laurenz <[EMAIL PROTECTED]> wrote:

From: Albe Laurenz <[EMAIL PROTECTED]>
Subject: Re: [GENERAL] Insert waiting for update?
To: [EMAIL PROTECTED], "pggeneral"
Cc: [EMAIL PROTECTED]
Date: Wednesday, 9 January, 2008, 8:53 PM

> Ashish Karalkar wrote:
> > I am having table with 4M rows.
> > I am trying to update all these rows with statement
> >
> > update mytable set mycolumn=0;
> >
> > At the same time there are insert happening on the table.
> > but all these insert are in waiting mode.
> > does update is locking the table for insert?
> >
> > does insert and update confilict with each other?
>
> Not normally, but it can happen.
>
> You could
>
> SELECT l.locktype, t.relname, l.pid, l.transactionid, l.mode, l.granted
> FROM pg_catalog.pg_locks l LEFT OUTER JOIN
>      pg_catalog.pg_class t ON l.relation = t.oid;
>
> and
>
> SELECT procpid, current_query FROM pg_stat_activity;
>
> while the inserts hang. Maybe the result will indicate why.
>
> Are there any triggers or rules defined?
> What indexes are defined on the table?
>
> Yours,
> Laurenz Albe




[GENERAL] Installation problem: failed to initialize lc_messages to ""

2008-01-09 Thread Stefan Schwarzer

Hi there,

I am trying to install Postgres 8.1.11 on Mac Leopard. Compilation was  
ok. Now, the initdb has some problems:



$ /usr/local/pgsql/bin/initdb -D --locale=C /Users/schwarzer/Documents/data_postgres

...
selecting default max_connections ... 10
selecting default shared_buffers ... 50
creating configuration files ... ok
creating template1 database in /Users/schwarzer/Documents/data_postgres/base/1 ... FATAL:  XX000: failed to initialize lc_messages to ""
LOCATION:  InitializeGUCOptions, guc.c:2666
child process exited with exit code 1
initdb: removing contents of data directory "/Users/schwarzer/Documents/data_postgres"


What is it? What do I have to do? Can anyone give me a hint please?!

Thanks a lot,

Stef

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Alvaro Herrera
Josh Harrison wrote:

> Another quick question...When you issue a query like this
> select * from dummy limit 10
> What 10 rows are fetched? like first 10  or last 10 or  the first 10 from
> first block or ?

Any 10.  (First 10 in the physical table _if_ a seqscan is used).

> And this query always returns the same 10 rows (when there are no
> updates/deletes)

Yes, assuming there is no VACUUM or CLUSTER either.  And no inserts
either if there is free space on the first pages.

You really shouldn't trust it.
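
A small illustration of the difference (column id is hypothetical):

-- some 10 rows; which ones is unspecified and may change across runs:
SELECT * FROM dummy LIMIT 10;

-- deterministic, provided id is unique:
SELECT * FROM dummy ORDER BY id LIMIT 10;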

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Insert waiting for update?

2008-01-09 Thread Albe Laurenz
Ashish Karalkar wrote:
> I am having table with 4M rows.
> I am trying to update all these rows with statement
> 
> update mytable set mycolumn=0;
> 
> At the same time there are insert happening on the table.
> but all these insert are in waiting mode. 
> does update is locking the table for insert?
> 
> does insert and update confilict with each other?

Not normally, but it can happen.

You could

SELECT l.locktype, t.relname, l.pid, l.transactionid, l.mode, l.granted
FROM pg_catalog.pg_locks l LEFT OUTER JOIN
 pg_catalog.pg_class t ON l.relation = t.oid;

and

SELECT procpid, current_query FROM pg_stat_activity;

while the inserts hang. Maybe the result will indicate why.

Are there any triggers or rules defined?
What indexes are defined on the table?

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Experiences with extensibility

2008-01-09 Thread Sim Zacks
I believe I was misunderstood. The fact that a product is closed source does not 
make it a better product. Some companies that are using Oracle would be better 
off using PostgreSQL. Other companies that need the features that Oracle offers 
would not be better off using Postgresql.


However, there are a lot of closed source products that are better than their
open source counterparts.
Oracle vs. Postgresql is one of them. Obviously if you don't need the
feature-set provided by Oracle then you would be foolish to pay for it. But
if you do need the extra features, then it is worth it.


Sim




Clodoaldo wrote:
> 2008/1/9, Sim Zacks <[EMAIL PROTECTED]>:
> > The reason companies go with the closed source, expensive solutions is
> > because they are better products.
>
> Not necessarily. FOSS products don't have a sales team to persuade
> and bribe people. Expensive solutions, and that is in part what makes
> them expensive, can spend lots of time persuading and can offer good
> money to those who decide which is the "best" product. Those who
> decide are not the coders or db admins and in general don't really
> care much.
>
> Regards, Clodoaldo Pinto Neto



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Josh Harrison
On Jan 9, 2008 9:59 AM, A. Kretschmer <[EMAIL PROTECTED]>
wrote:

> On Wed, 09.01.2008 at 9:45:11 -0500, Josh Harrison wrote the following:
> > What if I want to maintain the same order as in the database1? ie., I
> > want my rows of TableABC in Database2 to be the same order as the rows
> > in TableABC in Database 1 ???
>
> For what reason?
>
> Again: there is no order within the database.

Thanks...
Another quick question...When you issue a query like this
select * from dummy limit 10
What 10 rows are fetched? like first 10  or last 10 or  the first 10 from
first block or ?
And this query always returns the same 10 rows (when there are no
updates/deletes)


josh


Re: [GENERAL] XML path function

2008-01-09 Thread x asasaxax
My Postgres version is 8.2. I've managed to get the xpath I wanted, but
when I run explain analyze on the select it reports 500 milliseconds. Is
this a good search time? Is there a way to bring this time down with
Postgres 8.3? What is a good time for xml xpaths?
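
For what it's worth, a sketch against 8.3's xpath() function, with
hypothetical element names since the XML sample in the quoted message below
was lost in transit (the text column must be cast to xml):

-- all <name> text nodes:
SELECT xpath('//name/text()', xml::xml) FROM temp;

-- id attributes of every element3:
SELECT xpath('//element3/@id', xml::xml) FROM temp;

-- element5 nodes under an element3 whose name attribute is "x":
SELECT xpath('//element3[@name="x"]//element5', xml::xml) FROM temp;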

Thanks

2008/1/8, x asasaxax <[EMAIL PROTECTED]>:
>
> Hi everyone,
>
> I'm trying to understand how to do a select with an xml path.
>
> I have this xml example:
>
> create table temp(id integer, xml text, Primary Key(id));
>
> 
> 
> 
> 
>  
>  
>  
>  
>  
>  
> 
> 
> 
>
> Here go the questions:
> 1) How can i select all name elements from the xml?
> 2) How can i select all element3 id´s?
> 3) How can i select all  element5 that have element3 with attribute
> name="x"?
>
>
> Thanks very much.  :)
>


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread A. Kretschmer
On Wed, 09.01.2008 at 9:45:11 -0500, Josh Harrison wrote the following:
> What if I want to maintain the same order as in the database1? ie., I want my
> rows of TableABC in Database2 to be the same order as the rows in TableABC in
> Database 1 ???

For what reason?

Again: there is no order within the database.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Josh Harrison
On Jan 9, 2008 9:35 AM, A. Kretschmer <[EMAIL PROTECTED]>
wrote:

> On Wed, 09.01.2008, at 14:07:13, Raymond O'Donnell wrote the
> following:
> > On 09/01/2008 14:02, Josh Harrison wrote:
> >
> > >When restoring the pg_dumped data thro psql does the rows of the table
> > >are restored in the same order? ie for example if
> > >Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and
> > >restore it to another database, will it have the rows in the same order
>
> > >r1,r2,r3,r4,r5? Does this apply to big tables also?
> >
> > If you use the text dump format, you can see the order in which the rows
> > are restored.
>
> Right, but within the table the rows don't have an order. You can see this
> when you select the ctid-column:
>
> test=# create table order_test (i int);
> CREATE TABLE
> test=*# insert into order_test values (1);
> INSERT 0 1
> test=*# insert into order_test values (2);
> INSERT 0 1
> test=*# insert into order_test values (3);
> INSERT 0 1
> test=*# select ctid, i from order_test ;
>  ctid  | i
> ---+---
>  (0,1) | 1
>  (0,2) | 2
>  (0,3) | 3
> (3 rows)
>
> test=*# update order_test set i=20 where i=2;
> UPDATE 1
> test=*# update order_test set i=2 where i=20;
> UPDATE 1
> test=*# select ctid, i from order_test ;
>  ctid  | i
> ---+---
>  (0,1) | 1
>  (0,3) | 3
>  (0,5) | 2
> (3 rows)
>
>
>
> Now a pg_dump:
>
> ALTER TABLE public.order_test OWNER TO webmaster;
>
> --
> -- Data for Name: order_test; Type: TABLE DATA; Schema: public; Owner:
> webmaster
> --
>
> COPY order_test (i) FROM stdin;
> 1
> 3
> 2
> \.
>
>
> Now the question: what is the correct order?
>
All I require is that the dumped table in database2 be in the same
order as the original table (at the time of the dump) in database1.
Thanks
josh


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Sim Zacks
Using count(*) is not bad design, though generally it makes sense to use it
with a WHERE clause.


Saying that using count(*) is bad design implies that the only design you can
visualize is the specific one you are using.


There are tons of real world examples where you need count. That is why so many 
people use it as a benchmark.


Obviously if you have an application where millions of rows are added and 
subtracted every minute, then the value of count is kind of vague.


However, if you are querying a data warehouse that gets populated once a month,
then count has huge value. You could use statistics in that case, but it is
much harder to join against other tables when you are using statistics. It is
also less intuitive than using the SQL standard way of finding the number of rows.
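
For instance (warehouse table and join names hypothetical), an exact count
filters and joins naturally, while the statistics shortcut is whole-table only:

select r.name, count(*)
from sales s
join regions r on r.id = s.region_id
where s.sold_at >= date '2008-01-01'
group by r.name;                          -- exact, filterable, joinable

select reltuples from pg_class
where relname = 'sales';                  -- estimate, whole table only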


Sim

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Alvaro Herrera
Josh Harrison wrote:

> Fine. I can use order by when I want to order it in terms of some columns.
> But what if I want to maintain the same order as in database1? i.e., I
> want my rows of TableABC in Database2 to be in the same order as the rows in
> TableABC in Database 1?

You can't.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Josh Harrison
On Jan 9, 2008 9:12 AM, A. Kretschmer <[EMAIL PROTECTED]>
wrote:

> On Wed, 09 Jan 2008 at 9:02:23 -0500, Josh Harrison wrote:
> > Hi,
> > When restoring pg_dumped data through psql, are the rows of the table
> > restored in the same order? i.e., for example, if
> > Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and
> > restore it to another database, will it have the rows in the same order
> > r1,r2,r3,r4,r5?
>
> No. If you need an order then you need an ORDER BY in the
> SELECT statement.
>

Fine. I can use order by when I want to order it in terms of some columns.
But what if I want to maintain the same order as in database1? i.e., I
want my rows of TableABC in Database2 to be in the same order as the rows in
TableABC in Database 1?

Thanks
josh


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread A. Kretschmer
On Wed, 09 Jan 2008 at 14:07:13, Raymond O'Donnell wrote:
> On 09/01/2008 14:02, Josh Harrison wrote:
> 
> > When restoring pg_dumped data through psql, are the rows of the table
> > restored in the same order? i.e., for example, if
> > Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and
> > restore it to another database, will it have the rows in the same order
> > r1,r2,r3,r4,r5? Does this apply to big tables also?
> 
> If you use the text dump format, you can see the order in which the rows 
> are restored.

Right, but within the table the rows don't have an order. You can see this
when you select the ctid column:

test=# create table order_test (i int);
CREATE TABLE
test=*# insert into order_test values (1);
INSERT 0 1
test=*# insert into order_test values (2);
INSERT 0 1
test=*# insert into order_test values (3);
INSERT 0 1
test=*# select ctid, i from order_test ;
 ctid  | i
---+---
 (0,1) | 1
 (0,2) | 2
 (0,3) | 3
(3 rows)

test=*# update order_test set i=20 where i=2;
UPDATE 1
test=*# update order_test set i=2 where i=20;
UPDATE 1
test=*# select ctid, i from order_test ;
 ctid  | i
---+---
 (0,1) | 1
 (0,3) | 3
 (0,5) | 2
(3 rows)



Now a pg_dump:

ALTER TABLE public.order_test OWNER TO webmaster;

--
-- Data for Name: order_test; Type: TABLE DATA; Schema: public; Owner:
webmaster
--

COPY order_test (i) FROM stdin;
1
3
2
\.


Now the question: what is the correct order?
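
If a stable dump order is really wanted, one workaround is to rewrite the
table in index order before dumping, since the dump's COPY follows the heap.
A sketch against the example table, using the pre-8.3 CLUSTER syntax:

create index order_test_i_idx on order_test (i);
cluster order_test_i_idx on order_test;
-- pg_dump's COPY now emits rows in index order, but only an
-- ORDER BY guarantees any order on later reads.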


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Experiences with extensibility

2008-01-09 Thread Clodoaldo
2008/1/9, Sim Zacks <[EMAIL PROTECTED]>:
>
> The reason companies go with the closed source, expensive solutions is because
> they are better products.

Not necessarily. FOSS products don't have a sales team to persuade
and bribe people. Expensive solutions can spend lots of time
persuading, and can offer good money to those who decide which is the
"best" product; that is in part what makes them expensive. Those who
decide are not the coders or DB admins, and in general they don't
really care much.

Regards, Clodoaldo Pinto Neto

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Ivan Sergio Borgonovo
On Wed, 9 Jan 2008 13:04:39 +0100
"Harald Armin Massa" <[EMAIL PROTECTED]> wrote:

> Ivan,

> > Please forgive my naiveté in this field, but what does an "exact
> > count" mean, what do other DBs mean by "an exact count", and how
> > do other DBs deal with it?

> PostgreSQL will give you an exact count of the contents of the
> database as it is in the moment you begin your count. (i.e. the
> transaction starts)

> BUT as the table is not locked, in parallel somebody can bulkload
> MANY items into the database, so at the moment (start of your
> transaction)
> + 1msec your count may be invalid allready.

That's reasonable. What do other DBs do, and what does the SQL standard
require of count(*)?

> > I'd expect it to perform as well as or better than other DBs, since
> > now the bottleneck should be how efficiently it can filter records...
> > but still, a count(*) with a where clause will run into the same
> > problem of what "exact" means.
> 
> I know of 3 concepts to answer count() faster than PostgreSQL:
> 
> 1) just lie, present an estimate
> 2) do not have MVCC
> 3) store "record deleted info" in index, so you can answer count()
> with only scanning the index

Sorry if I insist, but I think this is a well-worn selling point against
PostgreSQL.
One of the most recurrent arguments for why someone else's DB is
better than pg is that count is slow.

Who lies?
If it is possible to make count run faster, what are the trade-offs
that make it unreasonable to implement in pg?

This is not a very useful question, but does 3) imply that a select
scans the index, returns the rows, and only later checks whether they
are still there?
And since indexes aren't updated "on the fly", you may get back a
larger number than the real value.

Let me consider an everyday use where count() looks like the most
obvious solution: paging.

I search through a table and I need to know which is the last page.
Concurrently, someone is deleting a ton of records.
Whether count() is fast or not, by the time I output the pager it will
be "wrong". But I'll still need an estimate of the number of pages,
and it is no fun if that estimate is wrong *and* slow.
And once you add the where clauses, there is no way to cache the count.
Is there a way to count based on indexes, without taking deleted rows
into account, so as to "count" faster?
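
Concretely, the pattern in question, with a hypothetical table and a page
size of 20:

select count(*) from articles where category_id = 7;  -- total for the pager

select * from articles
where category_id = 7
order by id
limit 20 offset 40;                                    -- page 3

Each statement sees its own snapshot, so under concurrent deletes the total
and the page can disagree, which is exactly the complaint here.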

I can make the search faster using indexes, as Dann Corbit suggested,
but, as you imply, that won't make count as fast as the
competition[1], which lies, does without MVCC, or stores deleted-row
info in indexes.

SELECT reltuples FROM pg_class WHERE relname = <tablename>;

doesn't apply, since you can't add "conditions".

Please be patient. I hear this over and over and over. PostgreSQL is
faulty because it can't count fast.
And the only replies I've seen are: it is bad design to use count;
you're a dumb ass. I admit I may be a dumb ass, but it is hard to
sell Postgres if I continue to be a dumb ass ;)

- What does count(*) really do?
Now I understand that count(*) returns the # of rows as it sees them
at the moment it is invoked. That should be true for other DBs as
well. That means that unless other DBs lock the table, they can't take
into account records that are going to be deleted once the count has
been issued.

- When can count be avoided?
Well, since even other DBs may report the "wrong" number, this makes
count() look less like a Holy Grail. But still, if you need an
estimate, wouldn't it be better to have it quickly?
How does PostgreSQL compare to other DBs when you run:
select count(*) from table where conditions
once you use indexes?
If such a query has to scan the results anyway, to see whether they
are still there, since info about deleted records isn't stored in
indexes, is there a way to skip this and just get a faster estimate?
I still can't see why it is bad design to use count().

- When can't count() be avoided?
In all the situations where you may really need count(), I think you
also need to lock the table; but well, I'd be curious to see an
example where you need count().

Still, can somebody give an example of bad design, and one where
count() couldn't be avoided, if any exists?

Consider that while it makes little sense to rely on "wrong" numbers
in a "business" environment, where data integrity/coherence matters *a
lot*, it is not so clear-cut in the CMS world, where most of these
criticisms come from.

I know that the arguments for promoting Postgres in the "business"
world where DB2, Oracle and MS SQL play (?) may be different, and
count() may lose its importance in that context; you could say that
other advantages amply pay off the "slowness" of an operation that is
rare in such a context.


thanks


[1] or does Postgres perform like the competition once you add where
clauses?


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread A. Kretschmer
On Wed, 09 Jan 2008 at 9:02:23 -0500, Josh Harrison wrote:
> Hi,
> When restoring pg_dumped data through psql, are the rows of the table
> restored in the same order? i.e., for example, if
> Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and restore
> it to another database, will it have the rows in the same order
> r1,r2,r3,r4,r5?

No. If you need an order then you need an ORDER BY in the
SELECT statement.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Raymond O'Donnell

On 09/01/2008 14:02, Josh Harrison wrote:

When restoring pg_dumped data through psql, are the rows of the table
restored in the same order? i.e., for example, if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and
restore it to another database, will it have the rows in the same order
r1,r2,r3,r4,r5? Does this apply to big tables also?


If you use the text dump format, you can see the order in which the rows 
are restored.


Ray.

---
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
[EMAIL PROTECTED]
---

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[GENERAL] quick question abt pg_dump and restore

2008-01-09 Thread Josh Harrison
Hi,
When restoring pg_dumped data through psql, are the rows of the table
restored in the same order? i.e., for example, if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and restore
it to another database, will it have the rows in the same order
r1,r2,r3,r4,r5? Does this apply to big tables also?
Thanks
josh


Re: [GENERAL] Insert waiting for update?

2008-01-09 Thread Ashish Karalkar
--- On Wed, 9/1/08, Ashish Karalkar <[EMAIL PROTECTED]> wrote:

> From: Ashish Karalkar <[EMAIL PROTECTED]>
> Subject: [GENERAL] Insert waiting for update?
> To: "pggeneral"
> Cc: [EMAIL PROTECTED]
> Date: Wednesday, 9 January, 2008, 4:29 PM
>
> Dear list members,
> I have a table with 4M rows. I am trying to update all these rows with the
> statement
>
> update mytable set mycolumn=0;
>
> At the same time there are inserts happening on the table, but all these
> inserts are in waiting mode. Does the update lock the table against
> inserts? Do insert and update conflict with each other?
>
> With regards,
> Ashish

For more details:
I have two tables, master and child, with child having an FK to master. The
master table contains 4M rows. While I update them (the master table), the
inserts on the child table go into waiting mode.
The update acquired a ROW EXCLUSIVE lock on the master table, and the insert
on the child table acquired ACCESS SHARE and SHARE locks, plus ROW SHARE on
master. Is this what is preventing records from being inserted into the child
table? I mean, does the ROW EXCLUSIVE lock conflict with the locks the insert
wants to hold?

With regards,
Ashish
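
As a side note, the waiting inserts can be inspected while they hang; one way
to look, a sketch (column names as in the 8.x catalogs):

select locktype, relation::regclass, mode, granted, pid
from pg_locks
where not granted;

Each row shows which lock mode a waiting backend is asking for, and on which
relation.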




Re: [GENERAL] Startup scripts - Use -m fast or smart?

2008-01-09 Thread Glyn Astill
Thanks Laurenz, that's a good point, I shall leave them as is.

Glyn


--- Albe Laurenz <[EMAIL PROTECTED]> wrote:

> Glyn Astill wrote:
> > I've just changed my startup scripts to use the Linux one supplied in
> > contrib.
> > 
> > I noticed this uses the "-m fast" argument for start and stop.
> > 
> > Before I set up the scripts I was using "-m smart" to make sure all
> > queries were finished before shutting down on all but my WAL slave.
> > 
> > I was going to change these to -m smart just to be safe, however I
> > just wanted to check if there was any reason not to do this?
> 
> If you have "pg_ctl stop -m smart" in your shutdown script, this
> will prevent your host from shutting down as long as somebody is still
> connected to the database, even if the connections are idle.
> 
> This is most likely *not* what you want.
> 
> -m fast is the correct thing for server shutdown. It will do no damage
> as all active transactions will be rolled back.
> 
> If there are database transactions that you do not want to interrupt,
> the best approach is not to shut down the database server.
> 
> Yours,
> Laurenz Albe
> 






---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Startup scripts - Use -m fast or smart?

2008-01-09 Thread Albe Laurenz
Glyn Astill wrote:
> I've just changed my startup scripts to use the linux one supplied in
> contrib.
> 
> I noticed this uses the "-m fast" argument for start and stop.
> 
> Before I set up the scripts I was using "-m smart" to make sure all
> queries were finished before shutting down on all but my WAL slave.
> 
> I was going to change these to -m smart just to be safe, however I
> just wanted to check if there was any reason not to do this?

If you have "pg_ctl stop -m smart" in your shutdown script, this
will prevent your host from shutting down as long as somebody is still
connected to the database, even if the connections are idle.

This is most likely *not* what you want.

-m fast is the correct thing for server shutdown. It will do no damage
as all active transactions will be rolled back.

If there are database transactions that you do not want to interrupt,
the best approach is not to shut down the database server.

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Read/Write restriction mechanism

2008-01-09 Thread Michael Glaesemann


On Jan 9, 2008, at 1:39 , Naz Gassiep wrote:

In a PHP project I have several functions that I use for DB
operations. I only want to allow one of them to write; all the
others are for reading only.



(Using DB-level perms is out, as it is the function usage I'm
trying to control, not the connections).


Um, why are DB-level permissions out? It seems like a natural fit:  
your writer connects as one role while the readers connect as  
another. Only grant SELECT access to the readers.
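
A minimal sketch of that split, with hypothetical role and table names:

create role app_reader login password 'readerpw';
create role app_writer login password 'writerpw';

grant select on mytable to app_reader;                          -- readers
grant select, insert, update, delete on mytable to app_writer;  -- the writer

The writing function opens its connection as app_writer; everything else
connects as app_reader, and any stray write fails with a permission error.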


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Harald Armin Massa
Ivan,

> Please forgive my naiveté in this field, but what does an "exact
> count" mean, what do other DBs mean by "an exact count", and how
> do other DBs deal with it?

PostgreSQL will give you an exact count of the contents of the
database as it is in the moment you begin your count. (i.e. the
transaction starts)

BUT as the table is not locked, in parallel somebody can bulkload MANY
items into the database, so at the moment (start of your transaction)
+ 1msec your count may be invalid allready.

> I'd expect it to perform as well as or better than other DBs, since
> now the bottleneck should be how efficiently it can filter records...
> but still, a count(*) with a where clause will run into the same
> problem of what "exact" means.

I know of 3 concepts to answer count() faster than PostgreSQL:

1) just lie, present an estimate
2) do not have MVCC
3) store "record deleted info" in index, so you can answer count()
with only scanning the index

Harald


-- 
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
fx 01212-5-13695179
-
EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] Startup scripts - Use -m fast or smart?

2008-01-09 Thread Glyn Astill
Hi chaps,

I've just changed my startup scripts to use the linux one supplied in
contrib.

I noticed this uses the "-m fast" argument for start and stop.

Before I set up the scripts I was using "-m smart" to make sure all
queries were finished before shutting down on all but my WAL slave.

I was going to change these to -m smart just to be safe, however I
just wanted to check if there was any reason not to do this?

Thanks
Glyn





---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] performance differences of major versions

2008-01-09 Thread Pavel Stehule
Hello

pgbench test - default configuration

Version 7.3.15  7.4.13  8.0.8   8.1.4   8.2beta1  8.3beta1
tps     311     340     334     398     423       585

but pgbench is a simple test, and these numbers don't carry much weight.

Regards
Pavel



On 09/01/2008, Willy-Bas Loos <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Are there any benchmarks that compare different major versions of
> PostgreSQL?
>
> Cheers,
>
> WBL
>

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] performance differences of major versions

2008-01-09 Thread Willy-Bas Loos
Hi,

Are there any benchmarks that compare different major versions of
PostgreSQL?

Cheers,

WBL


[GENERAL] Insert waiting for update?

2008-01-09 Thread Ashish Karalkar
Dear list members,
I have a table with 4M rows. I am trying to update all these rows with the
statement

update mytable set mycolumn=0;

At the same time there are inserts happening on the table, but all these
inserts are in waiting mode. Does the update lock the table against inserts?
Do insert and update conflict with each other?

With regards,
Ashish




Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Ivan Sergio Borgonovo
On Wed, 9 Jan 2008 01:39:34 -0800
"Dann Corbit" <[EMAIL PROTECTED]> wrote:

> > On Wed, 09 Jan 2008 00:06:45 -0800
> > "Joshua D. Drake" <[EMAIL PROTECTED]> wrote:

> > > Granted there are scenarios where others are FASTER (SELECT
> > > COUNT(*)) but I find that if you are doing those items, you
> > > normally have a weird design anyway.

> > > Sincerely,

> > Sincerely, would you give an example of such a bad design?

> A program that estimates cardinality by doing SELECT COUNT(*) is a
> bad design.  Assuming you have the wherewithal to vacuum your
> tables (or have autovacuum enabled) a query against the system
> tables will be a much better estimate of cardinality.
> 
> Now (some may argue) what if we want an _EXACT_ value for
> COUNT(*)?  We had better ask ourselves (in that circumstance) "Am I
> willing to lock the entire table and scan it?" because that is what
> will be necessary to get a truly exact value.  Otherwise, you can
> get totals that are wildly off-base if someone is doing a bulk
> import or deleting a large number of records.

Please forgive my naiveté in this field, but what does an "exact
count" mean, what do other DBs mean by "an exact count", and how do
other DBs deal with it?

How is "count" defined in the SQL standard?

Is there a real situation then where you really need the "exact"
count?

Am I right in saying that:

select count(*) from ...
-- here count may already be different

and that:

select count(*) from ... for update

could be very expensive? Or what would it even mean to do a
select count(*) ... for update?


> I think this should be a FAQ because it is a (F)requently (A)sked
> (Q)uestion.

After reading your email I think the real problem is not how to do
otherwise, but to understand what count(*) really means, and when and
if it is really useful, and when it can be avoided.

I'd write in the FAQ something in the line of:

- What does count(*) really do?
- When can it be avoided?
- When can't it be avoided?

In my everyday use of count(*), after your email, I can hardly spot a
place where I need an exact count.
But to better understand, and to be convinced that using count(*) is
bad design, I think the last question could help a lot.

How does count(*) with a where clause generally perform on PostgreSQL,
compared to other DBs?
I'd expect it to perform as well as or better than other DBs, since
now the bottleneck should be how efficiently it can filter records...
but still, a count(*) with a where clause will run into the same
problem of what "exact" means.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] data transfer/migrate from win to linux

2008-01-09 Thread Harald Armin Massa
Antonio,

> After looking for a way to transfer PostgreSQL/PostGIS data from Windows XP
> to Linux (Ubuntu 7.10), I did not find one.
> Please, does anyone know an easy way or a free tool for it?

I do this via

pg_dump on the sender

and

pg_restore or psql -f

on the receiver site. Both are included with PostgreSQL. To avoid
challenges, make sure that the PostgreSQL versions on both systems
match.

best wishes,

Harald

-- 
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
fx 01212-5-13695179
-
EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


[GENERAL] data transfer/migrate from win to linux

2008-01-09 Thread Antonio Muñoz
Hello.

After looking for a way to transfer PostgreSQL/PostGIS data from Windows XP
to Linux (Ubuntu 7.10), I did not find one.
Please, does anyone know an easy way or a free tool for it?

Thanks in advance
Best regards

Antonio


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Dann Corbit
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:pgsql-general-
> [EMAIL PROTECTED] On Behalf Of Ivan Sergio Borgonovo
> Sent: Wednesday, January 09, 2008 1:30 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] count(*) and bad design was: Experiences with
> extensibility
> 
> On Wed, 09 Jan 2008 00:06:45 -0800
> "Joshua D. Drake" <[EMAIL PROTECTED]> wrote:
> 
> > Granted there are scenarios where others are FASTER (SELECT
> > COUNT(*)) but I find that if you are doing those items, you
> > normally have a weird design anyway.
> 
> > Sincerely,
> 
> Sincerely, would you give an example of such a bad design?

A program that estimates cardinality by doing SELECT COUNT(*) is a bad
design.  Assuming you have the wherewithal to vacuum your tables (or
have autovacuum enabled) a query against the system tables will be a
much better estimate of cardinality.

Now (some may argue) what if we want an _EXACT_ value for COUNT(*)?  We
had better ask ourselves (in that circumstance) "Am I willing to lock
the entire table and scan it?" because that is what will be necessary to
get a truly exact value.  Otherwise, you can get totals that are wildly
off-base if someone is doing a bulk import or deleting a large number of
records.
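
In the exact-value case, that looks like the following (table name
hypothetical; SHARE mode blocks writers but not readers):

begin;
lock table foo in share mode;  -- no inserts/updates/deletes until commit
select count(*) from foo;      -- now truly exact for the duration
commit;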
 
So:
SELECT reltuples FROM pg_class WHERE relname = <tablename>;

Is more often what is really wanted.

> Or did you just mean that count(*) is bad design in postgresql since
> there are usually better alternatives in postgresql?

If you are using COUNT(*) as an existence test, then substitute:

WHERE EXISTS (<subquery>)


Use the indexes (if possible) by WHERE clause restriction:

SELECT count(1) FROM <table> WHERE <condition>

will use indexes if appropriate.
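
Spelled out with hypothetical table and column names, the two substitutions
look like:

select exists (select 1 from orders where customer_id = 42);  -- existence test

select count(1) from orders where shipped = false;            -- filtered count

The EXISTS form can stop at the first matching row, while a count must visit
every qualifying row.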

 
> I'm not joking. I'd like to learn.

I think this should be a FAQ because it is a (F)requently (A)sked
(Q)uestion.

IMO-YMMV.


 
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
> 
> 
> ---(end of
broadcast)---
> TIP 6: explain analyze is your friend

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL]

2008-01-09 Thread Evgeny Shepelyuk

--
Best Regards
Evgeny K. Shepelyuk

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[GENERAL] PgSql Mirroring/Fail Over Server

2008-01-09 Thread Rayudu Madhava

Sir,

I am very new to PostgreSQL. I have a server serving 200
clients. I want to prepare a failover/mirroring
server which, in case the original server fails, should
take over automatically. Kindly guide me.

With Regards,

Rayudu.


  



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Ivan Sergio Borgonovo
On Wed, 09 Jan 2008 00:06:45 -0800
"Joshua D. Drake" <[EMAIL PROTECTED]> wrote:

> Granted there are scenarios where others are FASTER (SELECT
> COUNT(*)) but I find that if you are doing those items, you
> normally have a weird design anyway.

> Sincerely,

Sincerely, would you give an example of such a bad design?

Or did you just mean that count(*) is bad design in postgresql since
there are usually better alternatives in postgresql?

I'm not joking. I'd like to learn.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Experiences with extensibility

2008-01-09 Thread Joshua D. Drake

Sim Zacks wrote:

We use postgresql because it is open source, we have in-house experience 
to deal with it so we don't have any extra support costs and we don't 
need the features that are offered in commercial products that 
PostgreSQL does not have. We also don't need the speed that commercial 
products offer that is missing in PostgreSQL.


I use PostgreSQL because it has a ton of features the closed source 
products don't offer and is generally faster than the closed source 
solutions.


Granted there are scenarios where others are FASTER (SELECT COUNT(*)) 
but I find that if you are doing those items, you normally have a weird 
design anyway.


Sincerely,

Joshua D. Drake


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

