Re: [GENERAL] GiST and Gin technical doc

2006-10-02 Thread Teodor Sigaev

GiST:

http://gist.cs.berkeley.edu/
http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/

GIN:
GIN is an ordinary inverted index, which is described in many docs.
Look for some implementation details in my conference presentation:
http://www.sigaev.ru/gin/Gin.pdf
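
For the practical side of the question, a minimal tsearch2-style sketch (the table docs, column tsv and the search terms are hypothetical; assumes the contrib/tsearch2 module is installed):

    -- keep a tsvector column and index it with GiST
    ALTER TABLE docs ADD COLUMN tsv tsvector;
    UPDATE docs SET tsv = to_tsvector(body);
    CREATE INDEX idx_docs_tsv ON docs USING gist (tsv);

    -- full-text query through the tsearch2 operator
    SELECT id FROM docs WHERE tsv @@ to_tsquery('fast & search');

    -- with GIN (8.2 and later) the index line becomes:
    -- CREATE INDEX idx_docs_tsv_gin ON docs USING gin (tsv);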

Yannick Warnier wrote:

Hi there,

I'm having trouble finding technical documentation about GiST, Gin and
TSearch2.

I am particularly interested in the internal data structure of a GiST-ed
or Gin-ed index, and the availability of searches by proximity.

Does anyone know of a good place to find such docs, outside of the
source code (I'm not too much into C/C++)?

Thank you,

Yannick




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



[GENERAL] PostgreSQL 8.1 - SSL:Required

2006-10-02 Thread Andrew
Hi,

I am using the binary install for PostgreSQL 8.1 Win32 on Windows XP. I know how to install SSL if I were installing from source (--with-openssl), but I am installing onto Windows XP for the first time, so my question is:

1. How do I install the SSL module via the install wizard?

Blessings,
Andrew








Re: [GENERAL] Major Performance decrease after some hours

2006-10-02 Thread Peter Bauer

2006/10/2, Tom Lane [EMAIL PROTECTED]:

Peter Bauer [EMAIL PROTECTED] writes:
 Attached you can find the postgresql logfiles and a logfile which
 contains all SQL statements executed in the relevant time together
 with the exceptions thrown. I also attached a file with all used
 PL/pgSQL functions. Since we were not able to find a problem for so
 long, I think it makes no sense to filter the information because we
 are probably not looking in the right spots.

I hope the pgsql-general moderators are not going to approve an 8MB
message :-(.  I'll hang onto my copy though, just in case it proves
useful.


Sorry, I put the files on http://dagobert.apus.co.at/pglogfiles/ so
please use them.


 It might be worth attaching to a few of them with
 gdb to get stack traces,

 Should this be done when the problems are appearing or during normal
 operation or both to compare the results?

When the problems are appearing.


OK, I will try to reproduce it and post the stack traces.

thx,
Peter



Re: [GENERAL] PostgreSQL 8.1 - SSL:Required

2006-10-02 Thread Magnus Hagander
 Hi,
 I am using the binary install for postgresql 8.1 Win32 on Windows
 XP. I know how to install SSL if I were installing from source, --
 with-openssl, but I am installing onto Windows XP for the first
 time, so my question is:
 
 1. How do I install the SSL module via the install wizard?

SSL is built into all the binaries that are distributed. You just need
to create and install your certificate, and then change the config files
appropriately (the same way you would when installing from source).

//Magnus



Re: [GENERAL] pg web hosting with tsearch2?

2006-10-02 Thread Michelle Konzack
Am 2006-09-29 13:45:34, schrieb Rick Schumeyer:

 3)All the web hosts I am aware of are still offering only pg 7.4.
 Does anybody offer pg 8.x ?

I am thinking about a root server for 39 Euro/month
on which I can install what I want.

Greetings
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
   50, rue de Soultz MSM LinuxMichi
0033/6/61925193   67100 Strasbourg/France   IRC #Debian (irc.icq.com)




Re: [GENERAL] Major Performance decrease after some hours

2006-10-02 Thread Ray Stell
On Sun, Oct 01, 2006 at 12:55:51PM +0200, MaXX wrote:
 
 Pure speculation: are you sure you aren't vacuuming too aggressively?
 The DELETE waiting and SELECT waiting sound to me like they are waiting 
 for a lock that another vacuum is holding. 

How would one determine the lock situation definitively?  Is there 
an internal mechanism that can be queried?



[GENERAL] How to trace index to table?

2006-10-02 Thread Chris Hoover
I'm trying to build some queries to gather metrics on my PG database. When looking at pg_class and pulling a row that is an index, how do I use SQL to pull the table the index belongs to?

Thanks,
Chris


Re: [GENERAL] How to trace index to table?

2006-10-02 Thread Ardian Xharra



If you look in pg_indexes, you will find both the index and the table.
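
A sketch of both routes (the index name my_index is hypothetical):

    -- via the pg_indexes view
    SELECT schemaname, tablename, indexname
    FROM pg_indexes
    WHERE indexname = 'my_index';

    -- or via the pg_index catalog, if you are already walking pg_class
    SELECT indrelid::regclass   AS table_name,
           indexrelid::regclass AS index_name
    FROM pg_index
    WHERE indexrelid = 'my_index'::regclass;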


- Original Message -
From: Chris Hoover
To: pgsql-general@postgresql.org
Sent: Monday, October 02, 2006 9:19 AM
Subject: [GENERAL] How to trace index to table?

I'm trying to build some queries to gather metrics on my PG database. When looking at pg_class and pulling a row that is an index, how do I use SQL to pull the table the index belongs to?
Thanks,
Chris



Re: [GENERAL] Major Performance decrease after some hours

2006-10-02 Thread Tom Lane
Ray Stell [EMAIL PROTECTED] writes:
 How would one determine the lock situation definitively?  Is there 
 an internal mechanism that can be queried?

pg_locks view.
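
A minimal sketch of querying it (the ungranted rows are the sessions stuck waiting; the regclass cast just makes the relation OID readable):

    SELECT pid, relation::regclass AS relation, mode, granted
    FROM pg_locks
    WHERE NOT granted;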

regards, tom lane



Re: [GENERAL] Normal vs Surrogate Primary Keys...

2006-10-02 Thread rlee0001

Stephan Szabo wrote:
 On Sun, 1 Oct 2006, rlee0001 wrote:

  I know, for example, that by default PostgreSQL assigns every record a
  small unique identifier called an OID. It seems reasonable then, that
  when the DBA creates a cascading foreign key to a record, that the DBMS
  could, instead of storing the record's entire natural key, store only a
  reference to the OID and abstract/hide this behavior from the
  environment just as PostgreSQL does with its OID feature now. Of
  course, this would require that the OID be guaranteed unique, which I
  don't believe is the case in the current versions.
 
  This would completely eliminate concerns related to the performance of
  cascading updates because no actual cascade would take place, but
  rather the update would affect all referencing records implicitly via
  the abstraction.

 Well, that alone isn't enough I think.

 MATCH SIMPLE allows you to pass the constraint for a row if any of the
 columns in a multi-column foreign key are NULL, so there isn't always a
 matching row, but there's also meaningful information in the column
 values. MATCH PARTIAL (which we admittedly don't support yet) allows you
 to have a valid key if the non-NULL portions of the multi-column foreign
 key match to one or more rows in the referenced table, so there may be
 more than one matching row. The all NULL case is pretty easy to handle in
 general.

 In addition, AFAICT for cascades you would potentially be trading the cost
 at cascade time with the cost at select time, so that would itself not
 always be a win. Also, I don't see how you get away with not needing two
 indexes on the referenced table to do this well unless you're storing
 something like a ctid which has its own problems with updates.

 I think there may be some better options than what we've got, but there's
 tradeoffs as well.

LOL, I hadn't even considered that. In my proposed solution, every
select against a table containing a foreign key (which selects at least
part of the foreign key) would require an implicit join to retrieve the
actual foreign key values. Additionally, selects with explicit joins
would need to be converted, which would require the DBMS to execute
several additional queries (to retrieve the internal surrogate key for
the given natural key values) before executing the original join query.

Oh well, can't kill a guy for trying.

Thanks for pointing out what I was too blind to see. :o)

-Robert





[GENERAL] pg_dump design problem (bug??)

2006-10-02 Thread Karl O. Pinc

Postgresql 8.1.3

Hi,

I'm wondering if there's a problem with pg_dump --create,
or if I'm just missing something.
It does not seem to restore things like:

ALTER DATABASE foo SET DateStyle TO European;

Shouldn't the database that is re-created be like
the database that is being dumped?

For our purposes we do a pg_dumpall --globals-only
and then pg_dumps of each of our databases.  We like
this because we can restore blobs this way, get a
nice compressed and flexible --format=c, and are able
to restore individual databases.  But there is clearly
a problem because we lose the database meta-information
like datestyle, timezones and all that other per-db
SET stuff.  It seems the only way to get that is with
a pg_dumpall, and then it's not per-database.

What should we do to work around this problem?
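
In the meantime, one stopgap is to read the per-database SETs back out of the catalog and replay them by hand after restoring; a sketch against 8.1's pg_database.datconfig column, reusing the example database foo from above:

    -- see which per-database SETs a plain pg_dump/pg_restore would lose
    SELECT datname, datconfig
    FROM pg_database
    WHERE datconfig IS NOT NULL;

    -- then re-apply them on the restored database, e.g.:
    ALTER DATABASE foo SET DateStyle TO European;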

Should pg_dump/pg_restore have arguments like:

--dbmeta  (the default when --create is used)
   Dumps the database's SET options.

--no-dbmeta (the default when --create is not used)
   Does not dump the database's SET options.

--dbowner (the default when --create is used)
   Dumps the database's owner.

--no-dbowner (the default when --create is not used)
   Does not dump the database's owner.


Hmmm, for complete control, consider the following:

Or maybe pg_dump/pg_restore should augment/replace
--data-only --schema-only --create with:

--content=ctype[, ...]

where

ctype=db|dbowner|meta|schema|schemaowner|table|tableowner|data

db  create the database
dbowner set the database owner as in the dumped db
metaset the database SETs as in the dumped db
schema  create the schema (not data definitions/table structure)
as in the dumped db
schemaowner set the schema owner as in the dumped db
table   create the table(s) as in the dumped db
tableowner  set the table owners as in the dumped db
dataload the data as in the dumped db

I'd also want to add functions, triggers, views and the other sorts
of things that go into databases to the above list, but that's
enough for now.

Thanks for listening.

Karl [EMAIL PROTECTED]
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein




[GENERAL] using float4, I get incorrect value when select

2006-10-02 Thread Anibal David Acosta F.
Hello, I have a table with a float4 column. When I insert a new row and, for
example, use the value 5000542,5 in the column of type float4, the
insert is OK, but when I select (select myColumn from myTable) I get the
value 5000540 and I don't know why.

But if I cast to float8 (select cast(myColumn as float8) from myTable) I get 
the correct value (5000542,5).

I don't know if I am doing something wrong. Please help me. Thanks.

P.D.: I am using postgres 8.1 for windows

Anibal 






Re: [GENERAL] Normal vs Surrogate Primary Keys...

2006-10-02 Thread rlee0001

Tom Lane wrote:
 rlee0001 [EMAIL PROTECTED] writes:
  ... I know, for example, that by default PostgreSQL assigns every record a
  small unique identifier called an OID.

 Well, actually, that hasn't been the default for some time, and even if
 you turn it on it's not guaranteed unique without additional steps, and
 seeing that it's only 32bits wide it's not usable as a unique ID for
 very large tables.

 unique and small are more or less contradictory in this context.

  ... it seems like this is the sort of thing that even a fully SQL-compliant
  DBMS could do internally to compensate for the performance issues with
  using large natural keys in relationships.

 What performance issues are those, exactly?  I have seen no data that
 proves that large key size is a bottleneck for reasonable schema designs
 in Postgres.  Maybe that just means that we have more fundamental
 problems to fix :-( ... but there's no point in investing a lot of
 effort on an aspect that isn't the next bottleneck.

Well, from what I understand, there are basically two reasons to use
surrogate primary keys: 1) no reliable natural candidate key exists, or
2) the natural candidate keys are perceived to be far too large/complex
to use as the primary key. I have yet to hear anyone recommend the use
of surrogate keys in modern databases for any other reason. Obviously,
the fact that some entities in practice have no reliable natural keys
cannot be helped, and in those cases a surrogate key pretty much has to
be used, unless the data modeler finds it acceptable to use every
attribute in the entity as part of the primary key (in order to ensure
uniqueness of each record overall).

The second argument for the use of surrogate keys is simply that
reliable natural candidate keys are often perceived to be too large to
use as primary keys. The perception seems to be that large primary keys
consume a considerable amount of storage space when used in foreign
keys. For example, if I key employee by Last Name, First Name, Date
of Hire and Department, I would need to store copies of all this data
in any entity that relates to an employee (e.g. payroll, benefits and
so on). In addition, if any of these fields change in value, that
update would need to cascade to any related entities, which might be
perceived as a performance issue if there are many related records.

I'm not saying that PostgreSQL specifically has performance problems
but that using large natural keys can hamper performance (both in terms
of storage space required and cascading update time) when a lot of
relationships exist between entities with such large keys.

Personally, I hate using surrogate keys except in places where no
reliable natural key exists, but I find it necessary in order to improve
the efficiency of foreign keys in the database. So my proposal was
simply to have the DBMS internally create and use an invisible
surrogate key to identify and relate records, but expose the natural
key to the environment. The current OID implementation cannot be used
for this, as you've already stated, but I think a similar implementation
could work.

I'm just wondering if anyone else would take advantage of the
performance benefit, or perhaps sees a flaw in my logic.

   regards, tom lane



[GENERAL] Postgres backup

2006-10-02 Thread pd
guys,

I have a Java web app with a Postgres backend. Now I want to implement
a database backup feature from within my web app. So far I have got
this:

String[] args = {"\"C:\\Program Files\\PostgreSQL\\8.1\\bin\\pg_dump\"", " -i", " -h", " localhost",
    " -p", " 5432", " -U", " postgres", " -F", " t", " -b", " -v", " -f",
    " \"C:\\Test.backup\"", " TESTDB"};

String result = "";
if (args.length > 0) {
    result = args[0];    // start with the first element
    for (int i = 1; i < args.length; i++) {
        result = result + args[i];
    }
}

Process p = Runtime.getRuntime().exec(result);

Now on executing this I am getting a file called Test.backup, but it is
0 KB, completely empty. Can somebody tell me what I am doing wrong and
how I can fix it? Somebody also told me that this is not the best way
of doing this, and I was wondering if there is a better way of
implementing this.

Thanks.




[GENERAL]

2006-10-02 Thread Dirk Hartmann
Hi,

I am using PostgreSQL 8.0. I have a problem doing any operation with the
database, like vacuumdb or dumping the database. The PostgreSQL log shows
that it can't find the file /var/lib/pgsql/data/pg_clog/ . The database
starts up and some tables can be queried.

Any help is welcome

Dirk




[GENERAL] Advantages of postgresql

2006-10-02 Thread Iulian Manea








Hello everybody,

So far I have only been working with MySQL. Today I was talking to a friend and he was suggesting I migrate to PostgreSQL, as it is way better ...

My question is ... why?

I mean, could someone please tell me some advantages and disadvantages of working with PostgreSQL?

Thanks in advance,
Iulian!










[GENERAL] FlushRelationBuffers Error

2006-10-02 Thread Joey Adams
Last night during a vacuum full, one of our production 7.4.12 databases threw 
this warning:

WARNING:  FlushRelationBuffers(idxtype26, 6137): block 5752 is referenced 
(private 0, global 1)
vacuumdb: vacuuming of database cvdb failed: ERROR:  FlushRelationBuffers 
returned -2
PANIC:  cannot abort transaction 35650725, it was already committed
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

We do use some external C functions that deal with creating indexes and locking 
tables, but we're pretty confident that they're locking correctly.  Another bit 
of information, the database gets restarted before the vacuum process begins.

Is this something that will resolve itself the next time the database gets
vacuumed, or does it need some kind of immediate action? Does the index need
to be reindexed?

Thanks



Re: [GENERAL] Advantages of postgresql

2006-10-02 Thread Andreas Kretschmer
Iulian Manea [EMAIL PROTECTED] schrieb:

 
 
 Hello everybody,
  
 So far I have only been working with MySQL. Today I was talking to a friend 
 and
 he was suggesting I migrated to postgreSQL, as it is way better 
  
 My question is  why?
 I mean could someone pls tell me some advantages and disadvantages of working
 with postgresql?

Read http://sql-info.de/mysql/gotchas.html versus
http://sql-info.de/postgresql/postgres-gotchas.html  


HTH, Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
If I was god, I would recompile penguin with --enable-fly.(unknow)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°



Re: [GENERAL] Dynamic access to record columns (in PL/pgsql)

2006-10-02 Thread Erik Jones

Tom Lane wrote:

Daniele Mazzini [EMAIL PROTECTED] writes:
  
I have a trigger procedure which I want to be called after insert or 
update on different tables. In this procedure, I need to find the values 
of key fields for the affected table in the NEW record, but I can't find 
a way to access a field the name of which I only know at execution time. 



There is no way to do that in plpgsql.  You could do it in the other PLs
(eg plperl, pltcl) since they are not as strongly typed as plpgsql.

regards, tom lane

  
Well, I wouldn't go that far.  A little more information might point to 
a solution here.  Given any one table that this trigger might fire on, 
are the fields that you need to check defined?  I.e., if  the trigger 
fires on table A, will you always be looking at column A.b and on table 
B column B.a?  If so, when you create the trigger, pass in some kind of 
variable that the trigger can use to know what field it's looking for.  
If that doesn't make enough sense, let me know and I can give you a more 
fleshed out example of what I'm talking about.
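
A rough sketch of that approach (table and column names here are made up; the function still has to name each possible column explicitly, since plpgsql cannot dereference NEW by a computed name):

    CREATE OR REPLACE FUNCTION log_key_field() RETURNS trigger AS $$
    BEGIN
        -- TG_ARGV[0] carries the key column name given at CREATE TRIGGER time
        IF TG_ARGV[0] = 'order_id' THEN
            RAISE NOTICE 'key value: %', NEW.order_id;
        ELSIF TG_ARGV[0] = 'customer_id' THEN
            RAISE NOTICE 'key value: %', NEW.customer_id;
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_log_key AFTER INSERT OR UPDATE ON orders
        FOR EACH ROW EXECUTE PROCEDURE log_key_field('order_id');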


--
erik jones [EMAIL PROTECTED]
software development
emma(r)




[GENERAL] Weird disk write load caused by PostgreSQL?

2006-10-02 Thread Alexander Staubo
I have a production PostgreSQL instance (8.1 on Linux 2.6.15) that  
seems to be writing data to disk at rates that I think are  
disproportional to the update load imposed on the database. I am  
looking for ways to determine the cause of this I/O.


As an example, here is a typical graph produced by Munin:

  http://purefiction.net/paste/pg-iostat.png

Running an hourly iostat produces this output:

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda  43.50 0.21 0.04 427915  72736
sda  43.62 0.21 0.04 428183  88904
sda  43.74 0.21 0.05 428440 104877
sda  43.90 0.21 0.06 428808 124681
sda  44.06 0.21 0.07 429111 145447
sda  44.27 0.21 0.08 429532 170317
sda  44.46 0.21 0.09 429985 193594

In other words, it's reading about 400MB/hour and writing around  
15-20GB/hour, or exactly 118GB during the last six hours. To  
determine how well this correlates to the actual inserts and updates  
being performed on the database, I ran a loop alongside iostat that  
executed select sum(n_tup_upd), sum(n_tup_ins) from  
pg_stat_all_tables against PostgreSQL every hour and output the  
difference. Here are a few samples:


| delta_upd | delta_ins |
+---+---+
|  7111 |  2343 |
|  7956 |  2302 |
|  7876 |  2181 |
|  9269 |  2477 |
|  8553 |  2205 |

For the write numbers to match the tuple numbers, each updated/ 
inserted tuple would have to average at least 1.5MB (15 GB divided by  
10,000 tuples), which is not the case; the total size of the raw  
tuples updated/inserted during the above session probably does not  
exceed a couple of megabytes. Even considering overhead, page size,  
MVCC, etc., this does not compute.


I have not narrowed this explicitly down to PostgreSQL, but since the  
stock Linux kernel we are running does not offer per-process I/O  
statistics, I cannot determine this for sure. However, except for the  
PostgreSQL database, everything else on the box should be identical  
to what we are running on other boxes, which are not exhibiting the  
same kind of load. Oh, and I have PostgreSQL logging turned off.


Note that PostgreSQL's performance in itself seems fine, and  
according to top/ps it's only very rarely in iowait.


Alexander.



[GENERAL] test

2006-10-02 Thread Alban Hertroys
Mail from this ML doesn't seem to arrive at our office anymore (you 
haven't been silent for 4 days, have you?). Hence a small test.


Sorry for the inconvenience.
--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //



Re: [GENERAL] Weird disk write load caused by PostgreSQL?

2006-10-02 Thread Tom Lane
Alexander Staubo [EMAIL PROTECTED] writes:
 I have a production PostgreSQL instance (8.1 on Linux 2.6.15) that  
 seems to be writing data to disk at rates that I think are  
 disproportional to the update load imposed on the database. I am  
 looking for ways to determine the cause of this I/O.

Are you sure that iostat is to be trusted?  The read numbers in
particular look suspiciously uniform ... it would be a strange
query load that would create a read demand changing less than 1%
from hour to hour, unless perhaps that represented the disk's
saturation point, which is not the case if you're not seeing
obvious performance problems.

regards, tom lane



Re: [GENERAL] Normal vs Surrogate Primary Keys...

2006-10-02 Thread Martijn van Oosterhout
On Sun, Oct 01, 2006 at 07:48:14PM -0700, rlee0001 wrote:
 snip For example, if I key employee by Last Name, First Name, Date
 of Hire and Department, I would need to store copies of all this data
 in any entity that relates to an employee (e.g. payroll, benefits and
 so on). In addition, if any of these fields change in value, that
 update would need to cascade to any related entities, which might be
 perceived as a performance issue if there are many related records.

Err, those fields don't make a natural key since they have no guarantee
of uniqueness. You've simply decided that the chance of collision is
low enough that you don't care, but for me that's not really good
enough for use as a key.

Secondly, three of the four fields you suggest are subject to change,
so that indeed makes them a bad choice. My definition of a key includes
"unchanged for the lifetime of the tuple".

In that situation your idea may work well, but that's just a surrogate
key in disguise...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [GENERAL] using float4, I get incorrect value when select

2006-10-02 Thread Martijn van Oosterhout
On Sun, Oct 01, 2006 at 10:47:27PM -0400, Anibal David Acosta F. wrote:
 Hello, I have a table, with a float4 column, when I insert a new row and for 
 example I use this value 5000542,5 in the column of type float4, the 
 insert do OK, but when I select (select myColumn from myTable) I get the 
 value 5000540 I don't know why.
 
 But if I cast to float8 (select cast(myColumn as float8) from myTable) I get 
 the correct value (5000542,5).

Do you understand the concept of floating point and the difference
between float4 and float8? Floating point throws away digits that it
doesn't have room to store, and with float4 that occurs after about 6
digits.

If you want to remember exact numbers, use numeric.
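
A quick way to see the difference (a sketch; the exact float4 rendering can vary):

    CREATE TABLE price_test (f4 float4, n numeric(12,1));
    INSERT INTO price_test VALUES (5000542.5, 5000542.5);
    -- f4 keeps only ~6 significant digits, so it comes back rounded;
    -- n reproduces 5000542.5 exactly
    SELECT f4, n FROM price_test;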

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [GENERAL] Postgres backup

2006-10-02 Thread Martijn van Oosterhout
On Sun, Oct 01, 2006 at 09:20:12PM -0700, pd wrote:
 guys,
 
 i have a java web app with a postgres backend. now i want to implement
 a db database functionality from within my web app. now i have got
 this so far,
 
 String[] args = {\C:\\Program
 Files\\PostgreSQL\\8.1\\bin\\pg_dump\,  -i,  -h,  localhost, 
 -p,  5432,  -U,  postgres,  -F,  t,  -b,  -v,  -f, 
 \C:\\Test.backup\,  TESTDB};

Do you really have space in those strings? Because that's not going to
work.

Secondly, work out where your STDERR is going, because any error
messages from pg_dump will have gone there...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [GENERAL] String handling function, substring vs. substr

2006-10-02 Thread arsi


Hi all,


brian [EMAIL PROTECTED] writes:

[EMAIL PROTECTED] wrote:

Does this mean that substr calls substring internally?? Or is it the
other way around?? Or are they independent of each other??



Looks like they're pretty evenly matched.


Actually, a bit of poking into the contents of pg_proc will show you
that they are both aliases for the same C function (text_substr_no_len).
So they should be *exactly* the same speed.
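
That bit of poking looks something like this (for built-in functions, prosrc holds the internal C function name):

    SELECT oid::regprocedure AS signature, prosrc
    FROM pg_proc
    WHERE proname IN ('substr', 'substring');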

regards, tom lane

Thanks Brian for your answer (although that wasn't what I was looking for, 
I was looking for Tom's answer).


Thanks Tom for your answer.

Archie



Re: [GENERAL] Weird disk write load caused by PostgreSQL?

2006-10-02 Thread Alexander Staubo

On Oct 2, 2006, at 17:50 , Tom Lane wrote:


Alexander Staubo [EMAIL PROTECTED] writes:

I have a production PostgreSQL instance (8.1 on Linux 2.6.15) that
seems to be writing data to disk at rates that I think are
disproportional to the update load imposed on the database. I am
looking for ways to determine the cause of this I/O.


Are you sure that iostat is to be trusted?


No. :) But iostat reads directly from /proc/diskstats, which should be
reliable. Of course, it still doesn't say anything about which  
process is doing the writing; for that I would need to install the  
atop kernel patches or similar.


...

The read numbers in
particular look suspiciously uniform ... it would be a strange
query load that would create a read demand changing less than 1%
from hour to hour, unless perhaps that represented the disk's
saturation point, which is not the case if you're not seeing
obvious performance problems.


They are not uniform at all -- they correlate perfectly with the web  
traffic; it just so happens that the samples I quoted were from peak  
hours. Take a look at the Munin graph. (The spikes correspond to  
scheduled maintenance tasks like backups.)


Alexander.




Re: [GENERAL] Advantages of postgresql

2006-10-02 Thread Brandon Aiken








I've recently done the same thing.

Basically, it boils down to philosophy. MySQL's primary goal is speed. Speed over features, and even speed over data integrity. PostgreSQL's (and most RDBMS systems') primary goal is to present the complete relational model and maintain ACID compliance.

If you're using MySQL 4.x or earlier, you've got a terrible DBMS. Prior to MySQL 5, non-integer math was always inaccurate. There was no precise datatype. Additionally, MySQL 4 lacked a number of features like views, triggers, and stored procedures. MySQL 5 adds these features. Even then, however, the default engine for MySQL, MyISAM, is *not* a transactional engine, so updates are not atomic. MyISAM also doesn't support foreign key constraints, which, if your schema is even remotely complex, is nightmarish. You must use the InnoDB engine in MySQL to get the benefits of transactions.

Essentially, it boils down to this:

1. If you have a very simple database of one or two unrelated tables for a shopping cart or a guest book, MySQL is fine. (But so is SQLite.)

2. If all you care about is speed and aren't terribly concerned if some of your records break or don't commit, MySQL is also fine. This is why some sites (Slashdot, Digg) use MySQL databases. It's no big deal if one of the forums loses some random guy's anti-MS rant.

3. If you plan to do all your data checking in your control code and not enforce referential integrity, MySQL is fine. This method is generally considered poor design, however.

Part of the problem many DBAs have with MySQL is that the primary developer is a bit strange. Early versions of the MySQL documentation called foreign keys "tools for weak developers", and said that ACID compliance could be emulated in your application code so it wasn't necessary in the database. It should be pointed out that no credible DBA (and, I should hope, no credible app devs) would agree with these statements. Essentially, instead of properly citing limitations of the database, early MySQL docs simply said not only that every other DBMS in the world had it wrong, but that the relational model itself was essentially not useful. To DBAs, MySQL came to be seen as one step above the MBA who makes his department use a central Excel spreadsheet as a database. This reputation continues to stick with MySQL in spite of the strides it has made with MySQL 5. Another huge problem with MySQL is that it silently truncates data. If you have a DECIMAL(5) field and try to INSERT 100000 or 1000000 or what have you, instead of throwing an error MySQL instead inserts 99999 (the maximum value). That's just bad. An RDBMS should do exactly everything you tell it and complain *loudly* when it can't.
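
For what it's worth, this is the strictness being contrasted; a small sketch against PostgreSQL (the table is made up):

    CREATE TABLE amounts (d numeric(5,0));
    INSERT INTO amounts VALUES (100000);
    -- ERROR:  numeric field overflow
    -- PostgreSQL rejects the row rather than silently storing 99999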



If you're used to MySQL, the problems with PostgreSQL are basically that it's not quite as friendly as MySQL. The command line for Postgre, psql, is less user-friendly. The Windows GUI app, pgAdmin III, is also less user-friendly. Additionally, the default install for PostgreSQL on nearly every Linux system I've seen is configured to run at minimal requirements. So you'll have to edit the configuration file in order to get the database to perform correctly. Also, since PostgreSQL has so many more features than MySQL, it can be a bit daunting to get started. It's like you've worked with Notepad for years and years, and now you're starting to use Word or EMACS.







--
Brandon Aiken
CS/IT Systems Engineer













From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Iulian Manea
Sent: Monday, October 02, 2006 3:38 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Advantages of postgresql

Hello everybody,

So far I have only been working with MySQL. Today I was talking to a friend and he was suggesting I migrate to PostgreSQL, as it is way better ...

My question is ... why?

I mean, could someone please tell me some advantages and disadvantages of working with PostgreSQL?

Thanks in advance,
Iulian!










Re: [GENERAL] Weird disk write load caused by PostgreSQL?

2006-10-02 Thread Alexander Staubo
I'm surprised that I have to reply to myself, since in hindsight this  
should be bloody obvious: It's the pgsql_tmp directory. I just  
monitored the file creation in that directory, and found PostgreSQL  
to be creating huge temporary, extremely short-lived files ranging  
from 1MB to 20MB in size. I increased work_mem to 25MB -- should be  
perfectly safe on a box with 4GB of RAM, I hope - and the iostat  
traffic seems to have dropped to near zero.
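
For reference, the knobs involved look roughly like this (a sketch; on 8.1 work_mem is given in kilobytes, and webapp is a hypothetical role name):

    SHOW work_mem;                            -- per-sort/per-hash memory budget
    SET work_mem = 25600;                     -- about 25MB, current session only
    ALTER USER webapp SET work_mem = 25600;   -- default for one role across sessions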


I'm a bit baffled as to why PostgreSQL would ever be sorting 20MB of  
data in the first place, even with ~12 connections running queries  
concurrently, but I suppose I will have to look more closely at our  
query patterns.


Alexander.

On Oct 2, 2006, at 19:25 , Alexander Staubo wrote:


On Oct 2, 2006, at 17:50 , Tom Lane wrote:


Alexander Staubo [EMAIL PROTECTED] writes:

I have a production PostgreSQL instance (8.1 on Linux 2.6.15) that
seems to be writing data to disk at rates that I think are
disproportional to the update load imposed on the database. I am
looking for ways to determine the cause of this I/O.


Are you sure that iostat is to be trusted?


No. :) But iostat reads directly from /proc/diskstats, which should
be reliable. Of course, it still doesn't say anything about which  
process is doing the writing; for that I would need to install the  
atop kernel patches or similar.


...

The read numbers in
particular look suspiciously uniform ... it would be a strange
query load that would create a read demand changing less than 1%
from hour to hour, unless perhaps that represented the disk's
saturation point, which is not the case if you're not seeing
obvious performance problems.


They are not uniform at all -- they correlate perfectly with the  
web traffic; it just so happens that the samples I quoted were from  
peak hours. Take a look at the Munin graph. (The spikes correspond  
to scheduled maintenance tasks like backups.)


Alexander.




[GENERAL] Performance and Generic Config after install

2006-10-02 Thread Oisin Glynn
As an aside to the [GENERAL] Advantages of PostgreSQL thread going on 
today, I have wondered why the initial on install config of PostgreSQL 
is (according to most posts) very conservative.  I can see how this 
would be a plus when people may be getting PostgreSQL as part of an OS 
in the Linux world who may never/rarely use it.


I know that in reality tuning and sizing all of the parameters is a very
database-specific thing, but it would seem that if some default choices
were available it would give those testing/evaluating and trying to
get started a shot at quicker progress. Obviously they would still need
to tune to their own application.


Some dreadfully named, possibly pointless options?

Unoptimized / low performance -  Low load on Server   (The current out 
of the box)
Production Non Dedicated   - PostgreSQL is one of the apps sharing 
server but is important.
Production Dedicated Server   - The only purpose of this box is to run 
PostgreSQL


Perhaps this has already been suggested and or shot down?

Oisin







Re: [GENERAL] Performance and Generic Config after install

2006-10-02 Thread Andrew Sullivan
On Mon, Oct 02, 2006 at 02:40:03PM -0400, Oisin Glynn wrote:
 As an aside to the [GENERAL] Advantages of PostgreSQL thread going on 
 today, I have wondered why the initial on install config of PostgreSQL 
 is (according to most posts) very conservative.  I can see how this 

Actually, that's a bit of a hangover in much the way the reputation
of MySQL as having no support for ACID is: the last couple of
releases of Postgres try to make at least some effort at estimating
sane but safe basic configuration for the system when it's installed.  

That said, there is something of a problem in auto-configuring a
cost-based planner and optimiser: some of the tuning is likely to be
extremely sensitive to other things you're doing on the box, which
means that you need to do a good, careful job for optimal
performance.  

I often hear people complaining about this feature of Postgres in
comparison to MySQL.  But it's not a reasonable comparison, because
MySQL basically uses a rule-based optimiser.  And systems like DB2
and Oracle, that use a cost-based optimiser, are often far from
perfect after a fresh install, too.  

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The plural of anecdote is not data.
--Roger Brinner



Re: [GENERAL] Performance and Generic Config after install

2006-10-02 Thread Jeff Davis
On Mon, 2006-10-02 at 14:40 -0400, Oisin Glynn wrote:
 As an aside to the [GENERAL] Advantages of PostgreSQL thread going on 
 today, I have wondered why the initial on install config of PostgreSQL 
 is (according to most posts) very conservative.  I can see how this 
 would be a plus when people may be getting PostgreSQL as part of an OS 
 in the Linux world who may never/rarely use it.
 
 I know that in reality tuning and sizing all of the parameters is a very 
 database specific thing, but it would seem that if some default  choices 
 would be available it would give those testing/evaluation and trying to 
 get started a shot at quicker progress. Obviously they would still need 
 to tune to your own application.
 
 Some dreadfully named, possibly pointless options?
 
 Unoptimized / low performance -  Low load on Server   (The current out 
 of the box)

Keep in mind that PostgreSQL doesn't really restrict itself as a whole.
If you set the settings too low, and throw costly queries at it, the
load on the server will be very high. We don't want to imply that
PostgreSQL's settings restrict its CPU, memory, or disk usage as a
whole.

 Production Non Dedicated   - PostgreSQL is one of the apps sharing 
 server but is important.
 Production Dedicated Server   - The only purpose of this box is to run 
 PostgreSQL
 

In my opinion, this is more the job of distributions packaging it.
Distributions have these advantages when they are choosing the settings:

(1) They have more information about the target computer
(2) They have more information about the intended use of the system as a
whole
(3) They have more ability to ask questions of the user

PostgreSQL itself can't easily do those things in a portable way. If
someone is compiling from source, it is more reasonable to expect them
to know what settings to use.

However, that said, I think that distributions certainly do take a cue
from the default settings in the source distribution. That's why lately
the default settings have been growing more aggressive with each
release.

Regards,
Jeff Davis










Re: [GENERAL] Postgres backup

2006-10-02 Thread Damian C

On 10/3/06, Martijn van Oosterhout kleptog@svana.org wrote:

On Sun, Oct 01, 2006 at 09:20:12PM -0700, pd wrote:
 guys,

 i have a java web app with a postgres backend. now i want to implement
 a db database functionality from within my web app. now i have got
 this so far,

 String[] args = {\C:\\Program
 Files\\PostgreSQL\\8.1\\bin\\pg_dump\,  -i,  -h,  localhost, 
 -p,  5432,  -U,  postgres,  -F,  t,  -b,  -v,  -f, 
 \C:\\Test.backup\,  TESTDB};




Secondly, work out where your STDERR is going, because any error
messages from pg_dump will have gone there...


For some useful info on draining STDERR (and seeing error codes etc)
from with java see
http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html
-Damian



[GENERAL] Determining size of a database before dumping

2006-10-02 Thread Madison Kelly

Hi all,

  I am (re)writing a backup program and I want to add a section for
backing up PostgreSQL DBs. In the planning steps (making sure a given
destination has enough space) I try to calculate how much space will be
needed by a 'pg_dump' run *before* actually dumping it.


  Is there a relatively easy way to do that? Also, is it possible to
do this from an unprivileged account? If not, is there a way to add the
permissions to a specific pg user to allow that user to perform this?


  Thanks in advance!

Madison



Re: [GENERAL] Determining size of a database before dumping

2006-10-02 Thread Steve Wampler
Madison Kelly wrote:
 Hi all,
 
   I am (re)writing a backup program and I want to add a section for
 backing up pSQL DBs. In the planning steps (making sure a given
 destination has enough space) I try to calculate how much space will be
 needed by a 'pg_dump' run *before* actually dumping it.

I suppose:

   pg_dump $PGD_OPTIONS | wc -c

isn't efficient enough, right?  Without knowing the options you plan
to use with pg_dump (compression?  dump just tables?, etc.) this is
going to be hard to get a decent estimate from...

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [GENERAL] Determining size of a database before dumping

2006-10-02 Thread Alexander Staubo

On Oct 2, 2006, at 22:17 , Madison Kelly wrote:

  I am (re)writing a backup program and I want to add a section for  
backing up pSQL DBs. In the planning steps (making sure a given  
destination has enough space) I try to calculate how much space  
will be needed by a 'pg_dump' run *before* actually dumping it.


  Is there a relatively easy way to do that? Moreso, if it possible  
to do this from an unpriviledged account? If not, is there a way to  
add the permissions to a specific pg user to allow that user to  
perform this?


You could dump the database to /dev/null, piping it through wc to  
catch the size, but that would of course be wasteful.


You could count the disk space usage of the actual stored tuples,  
though this will necessarily be inexact:


  http://www.postgresql.org/docs/8.1/static/diskusage.html

Or you could count the size of the physical database files (/var/lib/
postgresql or wherever). While these would be estimates, you could at
least guarantee that the dump would not *exceed* the estimate.
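
The built-in size functions that arrived in 8.1 cover both of those estimates from SQL; a sketch:

    -- total on-disk size of the current database
    SELECT pg_size_pretty(pg_database_size(current_database()));

    -- the largest tables, including their indexes and TOAST data
    SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY pg_total_relation_size(oid) DESC
    LIMIT 10;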


Keep in mind that pg_dump can compress the dump and (iirc) will do so  
by default when you use the custom format (-Fc or --format=c).


Alexander.




Re: [GENERAL] Determining size of a database before dumping

2006-10-02 Thread Madison Kelly

Steve Wampler wrote:

Madison Kelly wrote:

Hi all,

  I am (re)writing a backup program and I want to add a section for
backing up pSQL DBs. In the planning steps (making sure a given
destination has enough space) I try to calculate how much space will be
needed by a 'pg_dump' run *before* actually dumping it.


I suppose:

   pg_dump $PGD_OPTIONS | wc -c

isn't efficient enough, right?  Without knowing the options you plan
to use with pg_dump (compression?  dump just tables?, etc.) this is
going to be hard to get a decent estimate from...



For now, let's assume I am doing a raw dump (no compression) and no fancy
switches. I would probably err on the side of caution and try dumping
OIDs and all schema (plus whatever else is needed to ensure a full
restore to a clean DB).


I could try piping the dump into something like 'wc' but with very large 
DBs I'd be worried about the (tremendous) disk I/O that would cause. 
This is also why I am hoping Pg keeps this info somewhere.


Madison



Re: [GENERAL] Performance and Generic Config after install

2006-10-02 Thread Brandon Aiken
I think the problem would be partly mitigated by better or more obvious
documentation that makes it clear a) that PostgreSQL is probably not
configured optimally, and b) where exactly to go to get server
optimization information.  Even basic docs on postgresql.conf seem
lacking.  The fact that something like these exist:

http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

http://www.varlena.com/GeneralBits/Tidbits/perf.html#conf

Should be a giant red flag that documentation is a tad sparse for the
config file.  Those docs would take hours of work, they're the only
thing I've really found, *and they're still 3 years out of date*.

It took me a lot of digging to find the docs on optimizing PostgreSQL
from postgresql.org.  It's in the documentation area, but it's not
documented in the manual at all that I could find (which is highly
counter-intuitive).  Instead, it's listed under 'Technical
Documentation' (making the manual casual documentation?  I thought all
Postgre docs were technical.) then under 'Community Guides and Docs',
and finally under the subheading Optimizing (note that the above links
are listed here):

http://www.postgresql.org/docs/techdocs.2

Either the server installer or (preferably) the manual needs to make
it very clear about this documentation.  If nobody can find it nobody
will use it, and it's very well hidden at the moment.

The manual gets updated with every release, but more and more I realize
that the manual isn't comprehensive.  The manual explains the SQL syntax
and how PostgreSQL interprets the relational model, but it has very
little information on how to really *use* PostgreSQL as a server.  The
manual is all app dev and no sysadmin.  For example, *what* compile time
options are available?  I know they exist, but I've never seen them
listed.

For another example, take a look at this so-called detailed guide to
installing PG on Fedora, which is linked from the 'Technical
Documentation' area of postgresql.org:

http://dancameron.org/pages/how-to-install-and-setup-postgresql-for-fedoralinux/

Now, really, this 'guide' is little more than what yum command to run
and which config lines to edit to limit remote TCP connections.

Now take a look at the first comment:
Thanks for the advice. For an Oracle DBA this really helped me in
comming up to speed on Postgres administration.

There should be an Administration Guide companion to the Manual.

--
Brandon Aiken
CS/IT Systems Engineer
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff Davis
Sent: Monday, October 02, 2006 2:58 PM
To: Oisin Glynn
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Performance and Generic Config after install

On Mon, 2006-10-02 at 14:40 -0400, Oisin Glynn wrote:
 As an aside to the [GENERAL] Advantages of PostgreSQL thread going
on 
 today, I have wondered why the initial on install config of PostgreSQL

 is (according to most posts) very conservative.  I can see how this 
 would be a plus when people may be getting PostgreSQL as part of an OS

 in the Linux world who may never/rarely use it.
 
 I know that in reality tuning and sizing all of the parameters is a
very 
 database specific thing, but it would seem that if some default
choices 
 would be available it would give those testing/evaluation and trying
to 
 get started a shot at quicker progress. Obviously they would still
need 
 to tune to your own application.
 
 Some dreadfully named, possibly pointless options?
 
 Unoptimized / low performance -  Low load on Server   (The current out

 of the box)

Keep in mind that PostgreSQL doesn't really restrict itself as a whole.
If you set the settings too low, and throw costly queries at it, the
load on the server will be very high. We don't want to imply that
PostgreSQL's settings restrict it's cpu, memory, or disk usage as a
whole.

 Production Non Dedicated   - PostgreSQL is one of the apps sharing 
 server but is important.
 Production Dedicated Server   - The only purpose of this box is to run

 PostgreSQL
 

In my opinion, this is more the job of distributions packaging it.
Distributions have these advantages when they are choosing the settings:

(1) They have more information about the target computer
(2) They have more information about the intended use of the system as a
whole
(3) They have more ability to ask questions of the user

PostgreSQL itself can't easily do those things in a portable way. If
someone is compiling from source, it is more reasonable to expect them
to know what settings to use.

However, that said, I think that distributions certainly do take a cue
from the default settings in the source distribution. That's why lately
the default settings have been growing more aggressive with each
release.

Regards,
Jeff Davis









Re: [GENERAL] Performance and Generic Config after install

2006-10-02 Thread Jim C. Nasby
Patches welcome. :)

BTW, -docs or -www might be a better place to discuss this.

On Mon, Oct 02, 2006 at 05:11:20PM -0400, Brandon Aiken wrote:
 I think the problem would be partly mitigated be better or more obvious
 documentation that makes it clear that a) PostgreSQL is probably not
 configured optimally, and b) where exactly to go to get server
 optimization information.  Even basic docs on postgresql.conf seem
 lacking.  The fact that something like these exist: 
 
 http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html
 
 http://www.varlena.com/GeneralBits/Tidbits/perf.html#conf
 
 Should be a giant red flag that documentation is a tad sparse for the
 config file.  Those docs would take hours of work, they're the only
 thing I've really found, *and they're still 3 years out of date*.
 
 It took me a lot of digging to find the docs on optimizing PostgreSQL
 from postgresql.org.  It's in the documentation area, but it's not
 documented in the manual at all that I could find (which is highly
 counter-intuitive).  Instead, it's listed under 'Technical
 Documentation' (making the manual casual documentation?  I thought all
 Postgre docs were technical.) then under 'Community Guides and Docs',
 and finally under the subheading Optimizing (note that the above links
 are listed here):
 
 http://www.postgresql.org/docs/techdocs.2
 
 Either the server installer or the (preferably) the manual needs to make
 it very clear about this documentation.  If nobody can find it nobody
 will use it, and it's very well hidden at the moment.
 
 The manual gets updated with every release, but more and more I realize
 that the manual isn't comprehensive.  The manual explains the SQL syntax
 and how PostgreSQL interprets the relational model, but it has very
 little information on how to really *use* PostgreSQL as a server.  The
 manual is all app dev and no sysadmin.  For example, *what* compile time
 options are available?  I know they exist, but I've never seen them
 listed.
 
 For another example, take a look at this so-called detailed guide to
 installing PG on Fedora, which is linked from the 'Technical
 Documentation' area of postgresql.org:
 
 http://dancameron.org/pages/how-to-install-and-setup-postgresql-for-fedo
 ralinux/
 
 Now, really, this 'guide' is little more than what yum command to run
 and which config lines to edit to limit remote TCP connections.
 
 Now take a look at the first comment:
 Thanks for the advice. For an Oracle DBA this really helped me in
 comming up to speed on Postgres administration.
 
 There should be an Administration Guide companion to the Manual.
 
 --
 Brandon Aiken
 CS/IT Systems Engineer
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Davis
 Sent: Monday, October 02, 2006 2:58 PM
 To: Oisin Glynn
 Cc: pgsql-general@postgresql.org
 Subject: Re: [GENERAL] Performance and Generic Config after install
 
 On Mon, 2006-10-02 at 14:40 -0400, Oisin Glynn wrote:
  As an aside to the [GENERAL] Advantages of PostgreSQL thread going
 on 
  today, I have wondered why the initial on install config of PostgreSQL
 
  is (according to most posts) very conservative.  I can see how this 
  would be a plus when people may be getting PostgreSQL as part of an OS
 
  in the Linux world who may never/rarely use it.
  
  I know that in reality tuning and sizing all of the parameters is a
 very 
  database specific thing, but it would seem that if some default
 choices 
  would be available it would give those testing/evaluation and trying
 to 
  get started a shot at quicker progress. Obviously they would still
 need 
  to tune to your own application.
  
  Some dreadfully named, possibly pointless options?
  
  Unoptimized / low performance -  Low load on Server   (The current out
 
  of the box)
 
 Keep in mind that PostgreSQL doesn't really restrict itself as a whole.
 If you set the settings too low, and throw costly queries at it, the
 load on the server will be very high. We don't want to imply that
 PostgreSQL's settings restrict it's cpu, memory, or disk usage as a
 whole.
 
  Production Non Dedicated   - PostgreSQL is one of the apps sharing 
  server but is important.
  Production Dedicated Server   - The only purpose of this box is to run
 
  PostgreSQL
  
 
 In my opinion, this is more the job of distributions packaging it.
 Distributions have these advantages when they are choosing the settings:
 
 (1) They have more information about the target computer
 (2) They have more information about the intended use of the system as a
 whole
 (3) They have more ability to ask questions of the user
 
 PostgreSQL itself can't easily do those things in a portable way. If
 someone is compiling from source, it is more reasonable to expect them
 to know what settings to use.
 
 However, that said, I think that distributions certainly do take a cue
 from the default settings in the source distribution. That's why lately
 the default 

Re: [GENERAL] Determining size of a database before dumping

2006-10-02 Thread Madison Kelly

Alexander Staubo wrote:

On Oct 2, 2006, at 22:17 , Madison Kelly wrote:

  I am (re)writing a backup program and I want to add a section for 
backing up pSQL DBs. In the planning steps (making sure a given 
destination has enough space) I try to calculate how much space will 
be needed by a 'pg_dump' run *before* actually dumping it.


  Is there a relatively easy way to do that? Moreover, is it possible to 
do this from an unprivileged account? If not, is there a way to add 
the permissions to a specific pg user to allow that user to perform this?


You could dump the database to /dev/null, piping it through wc to catch 
the size, but that would of course be wasteful.


You could count the disk space usage of the actual stored tuples, though 
this will necessarily be inexact:


  http://www.postgresql.org/docs/8.1/static/diskusage.html
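
For example, with the size functions that page documents (the database
name below is just a placeholder), something along these lines should
give you a rough figure to plan around:

-- approximate on-disk size of one database
SELECT pg_size_pretty(pg_database_size('mydb'));

-- per-table breakdown, including indexes and TOAST data
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC;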

Or you could count the size of the physical database files 
(/var/lib/postgresql or wherever). While these would be estimates, you 
could at least guarantee that the dump would not *exceed* the estimate.


Keep in mind that pg_dump can compress the dump and (iirc) will do so by 
default when you use the custom format (-Fc or --format=c).


Alexander.


Heh, that looks like just the article I would have wanted if I had 
properly RTFM. :D


Many thanks!!

Madi



Re: [GENERAL] Determining size of a database before dumping

2006-10-02 Thread Tom Lane
Alexander Staubo [EMAIL PROTECTED] writes:
 You could count the disk space usage of the actual stored tuples,  
 though this will necessarily be inexact:
http://www.postgresql.org/docs/8.1/static/diskusage.html
 Or you could count the size of the physical database files (/var/lib/ 
 postgresql or wherever). While these would be estimates, you could at  
 least guarantee that the dump would not *exceed* the estimate.

You could guarantee no such thing; consider compression of TOAST values.
Even for uncompressed data, datatypes such as int and float can easily
print as more bytes than they occupy on-disk.
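
For instance (an int4 is always 4 bytes in the heap, but its text form
in a dump can run to ten or eleven characters):

SELECT octet_length(2000000000::int4::text);     -- 10 characters
SELECT octet_length((-2000000000)::int4::text);  -- 11 with the sign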

Given all the non-data overhead involved (eg for indexes), it's probably
unlikely that a text dump would exceed the du size of the database,
but it's far from guaranteed.

regards, tom lane



[GENERAL] Experiences with 3PAR

2006-10-02 Thread Aaron Glenn

I'm curious if anyone on the list has any hands-on performance
experience with running PostgreSQL on 3PAR appliances (big and small).
If you do, please contact me offlist - I'll be happy to summarize for
the list archives.

Thanks,
Aaron



Re: [GENERAL] Determining size of a database before dumping

2006-10-02 Thread Alexander Staubo

On Oct 2, 2006, at 23:19 , Tom Lane wrote:


Alexander Staubo [EMAIL PROTECTED] writes:

You could count the disk space usage of the actual stored tuples,
though this will necessarily be inexact:
   http://www.postgresql.org/docs/8.1/static/diskusage.html
Or you could count the size of the physical database files (/var/lib/
postgresql or wherever). While these would be estimates, you could at
least guarantee that the dump would not *exceed* the estimate.


You could guarantee no such thing; consider compression of TOAST values.
Even for uncompressed data, datatypes such as int and float can easily
print as more bytes than they occupy on-disk.


Why does pg_dump serialize data less efficiently than PostgreSQL when  
using the custom format? (Pg_dump arguably has greater freedom in  
being able to apply space-saving optimizations to the output format.  
For example, one could use table statistics to selectively apply  
something like Rice coding for numeric data, or vertically decompose  
the tuples and emit sorted vectors using delta compression.) As for  
TOAST, should not pg_dump's compression compress just as well, or  
better?


Alexander.



Re: [GENERAL] Determining size of a database before dumping

2006-10-02 Thread Ron Johnson

On 10/02/06 16:19, Tom Lane wrote:
 Alexander Staubo [EMAIL PROTECTED] writes:
 You could count the disk space usage of the actual stored tuples,  
 though this will necessarily be inexact:
http://www.postgresql.org/docs/8.1/static/diskusage.html
 Or you could count the size of the physical database files (/var/lib/ 
 postgresql or wherever). While these would be estimates, you could at  
 least guarantee that the dump would not *exceed* the estimate.
 
 You could guarantee no such thing; consider compression of TOAST values.
 Even for uncompressed data, datatypes such as int and float can easily
 print as more bytes than they occupy on-disk.
 
 Given all the non-data overhead involved (eg for indexes), it's probably
 unlikely that a text dump would exceed the du size of the database,
 but it's far from guaranteed.

It's my experience that when there are lots of numeric fields,
fixed-width text records are approx 2.5x larger than the original
binary records.

--
Ron Johnson, Jr.
Jefferson LA  USA

Is common sense really valid?
For example, it is common sense to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that common sense is obviously wrong.



Re: [GENERAL] Determining size of a database before dumping

2006-10-02 Thread Jeff Davis
On Tue, 2006-10-03 at 00:42 +0200, Alexander Staubo wrote:
 Why does pg_dump serialize data less efficiently than PostgreSQL when  
 using the custom format? (Pg_dump arguably has greater freedom in  
 being able to apply space-saving optimizations to the output format.  
 For example, one could use table statistics to selectively apply  
 something like Rice coding for numeric data, or vertically decompose  
 the tuples and emit sorted vectors using delta compression.) As for  
 TOAST, should not pg_dump's compression compress just as well, or  
 better?

It would be a strange set of data that had a larger representation as a
compressed pg_dump than the data directory itself. However, one could
imagine a contrived case where that might happen.

Let's say you had a single table with 10,000 columns of type INT4, 100M
records, all with random numbers in the columns. I don't think standard
gzip compression will compress random INT4s down to 32 bits.

Another example is NULLs. What if only a few of those records had non-
NULL values? If I understand correctly, PostgreSQL will represent those
NULLs with just one bit.
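
Here's a scaled-down, made-up sketch of that sort of pathological table
(column and row counts shrunk so it would actually run on a test box):

-- random int4s: 4 bytes each on disk, but up to 10 digits of text in a
-- dump, which gzip can't compress back down to 4 bytes per value
CREATE TABLE random_ints AS
SELECT (random() * 2000000000)::int4 AS c1,
       (random() * 2000000000)::int4 AS c2,
       (random() * 2000000000)::int4 AS c3,
       (random() * 2000000000)::int4 AS c4
FROM generate_series(1, 100000);

-- a mostly-NULL column costs about one bit per row in the heap, but a
-- text dump still spends two characters (\N) on it for every row
ALTER TABLE random_ints ADD COLUMN sparse int4;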

What you're saying is more theoretical. If pg_dump used specialized
compression based on the data type of the columns, and everything was
optimal, you're correct: there's no situation in which the dump *must*
be bigger. However, since there is no practical demand for such
compression, and it would be a lot of work, there is no *guarantee* that
the data directory will be bigger than the dump. In practice, though, it
usually is.

Regards,
Jeff Davis




Re: [GENERAL] Determining size of a database before dumping

2006-10-02 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 On Tue, 2006-10-03 at 00:42 +0200, Alexander Staubo wrote:
 Why does pg_dump serialize data less efficiently than PostgreSQL when  
 using the custom format?

 What you're saying is more theoretical. If pg_dump used specialized
 compression based on the data type of the columns, and everything was
 optimal, you're correct. There's no situation in which the dump *must*
 be bigger. However, since there is no practical demand for such
 compression, and it would be a lot of work ...

There are several reasons for not being overly tense about the pg_dump
format:

* We don't have infinite manpower

* Cross-version and cross-platform portability of the dump files is
  critical

* The more complicated it is, the more chance for bugs, which you'd
  possibly not notice until you *really needed* that dump.

In practice, pushing the data through gzip gets most of the potential
win, for a very small fraction of the effort it would take to have a
smart custom compression mechanism.

regards, tom lane



[GENERAL] Foreign keys, arrays, and uniqueness

2006-10-02 Thread Morgan Kita
Let me start by saying I understand that PostgreSQL does not support the
following: composite data types with individual components acting as
foreign keys, arrays of composite data types, and arrays with elements
acting as foreign keys. I will lay out my example using them for clarity
even though they don't exist.

OK, let me lay out a rough example of my problem:

CREATE TABLE target_node1 (
id int4,
value text
);

CREATE TABLE target_node2 (
id int4,
value text
);

CREATE TYPE composite1 AS (
idx int4,
reference int4
);

CREATE TYPE composite2 AS (
idx int4,
reference1 int4,
reference2 int4
);

CREATE TABLE example_table (
id int4,
value text,
type1 composite1[],
type2 composite2[]
);

ALTER TABLE example_table ADD FOREIGN KEY ((type1).reference)
REFERENCES target_node1 (id);
ALTER TABLE example_table ADD FOREIGN KEY ((type2).reference1)
REFERENCES target_node1 (id);
ALTER TABLE example_table ADD FOREIGN KEY ((type2).reference2)
REFERENCES target_node2 (id);

In addition I want a UNIQUE check constraint on INSERT and UPDATE that
first checks that the text in the value column is unique. If that fails
then I need to check that the type1 and type2 arrays are unique and act
accordingly.

Yes I understand all of that is completely wrong, but hopefully it
clarifies my intentions.

Essentially, I want a list (or lists) of a custom type (hopefully a
composite type) where individual data members of the type are foreign
keys to other tables. Also I need to use those custom types for
uniqueness in combination with other information in the table.

The solutions as I see it:
1. Parallel arrays for the data members of the types. In other words, an
array for composite1.idx, an array for composite1.reference, and so on.
Then I could take the array of interest and create triggers that would
allow the array to act as a set of foreign keys. However, this is slow
when updating or deleting from a table like target_node1. Also it just
seems accident-prone given that the values are all separate from each
other, and hardly human-readable.

2. Normalize this and move the composite type into its own table that
references example_table. It is a many-to-one relationship after all,
and I am sure this is the route that will probably be suggested by you
guys (a rough sketch of what I mean is below, after solution 3).
However... if I do that, how the hell do I maintain my uniqueness check
on insert into example_table, given that half of the data I am checking
on is in another table and hasn't been inserted yet for the new row? I
am totally stuck in regards to this...

3. Something with truly custom data types (i.e. C functions loaded into
the backend). To be honest I don't even know how feasible that is or
how much work it would take. I have defined my own custom types
before, but not any indexing functions for them. What would I have to
define beyond the type itself? How would the foreign keys work in this
situation, if at all?
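
To make solution 2 concrete, this is roughly the shape I picture (names
invented, nothing here tested):

-- example_table reworked without the array columns
CREATE TABLE example_table (
id int4 PRIMARY KEY,
value text
);

-- one row per former array element; assumes target_node1.id and
-- target_node2.id have been made primary keys so they can be referenced
CREATE TABLE example_component1 (
example_id int4 REFERENCES example_table (id) ON DELETE CASCADE,
idx int4,
reference int4 REFERENCES target_node1 (id),
UNIQUE (example_id, idx)
);

CREATE TABLE example_component2 (
example_id int4 REFERENCES example_table (id) ON DELETE CASCADE,
idx int4,
reference1 int4 REFERENCES target_node1 (id),
reference2 int4 REFERENCES target_node2 (id),
UNIQUE (example_id, idx)
);

But the uniqueness rule I described spans all three tables, so I assume
it would have to live in some AFTER trigger that only runs once the
component rows are in place - and that is exactly the part I can't see
how to do cleanly.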

Help! This is seriously important to my project... but naturally the
less complicated the solution the better.

Thanks in advance,
Morgan

