Re: [GENERAL] pg_autovacuum not doing anything

2005-09-12 Thread Sim Zacks
I verified that it was not doing anything by erasing the pg_statistics table;
it was not refilled within 24 hours. When I ran VACUUM ANALYZE manually,
it filled it in.
I am using PostgreSQL 8.0.3 on Gentoo, installed via Portage, which
included pg_autovacuum as part of the setup.
I will try setting the debug output to see if that gives me any more 
information.

Thank You
Sim
Thomas F. O'Connell [EMAIL PROTECTED] wrote in message news:[EMAIL 
PROTECTED]
 How do you know that it is not vacuuming or analyzing anything?
 
 And which version of postgres did you build/acquire pg_autovacuum from?
 
 It seems that in post-8.0 versions there is almost no output by
 default. You could try increasing the debug output using -d 1 or -d 2
 on the command line to verify whether any activity is occurring.
 
 --
 Thomas F. O'Connell
 Co-Founder, Information Architect
 Sitening, LLC
 
 Strategic Open Source: Open Your i™
 
 http://www.sitening.com/
 110 30th Avenue North, Suite 6
 Nashville, TN 37203-6320
 615-469-5150
 615-469-5151 (fax)
 
 On Sep 11, 2005, at 4:26 AM, Sim Zacks wrote:
 
  I have pg_autovacuum running on my Gentoo server, the same server as
  the PostgreSQL database.
  ps shows that it is running, and I have it start automatically with the
  system. It is not vacuuming or analyzing the tables, though.
  Am I doing something wrong? Is there a better way to do this?
 
  Thank You
  Sim
 
 


Re: [GENERAL] SQL - planet redundant data

2005-09-12 Thread Poul Jensen

Thank you for your input! Individual replies follow below.

##

Chris Travers wrote:


Ok.  Imagine two huge huge tables:

file_data    and    additional_data

create table file_data (
   file_id serial primary key,
   station_id text,
   ...
);

create table additional_data (
   data_id bigserial primary key,
   file_id int REFERENCES file_data (file_id),
   temp,
   ...
);


So the 500 million entries in additional_data.file_id are now restricted
to the set of values found in file_data.file_id. But I don't see how this
rids me of redundant data - I still have 500M entries. Consider the column
year: my 500M records come from 13 years, so I am saving 500M values of
which only 13 are distinct - very redundant.




Also you can normalize your way out of the redundant data problem.

I can see how to do this for filename and station name - I save
something by storing an integer rather than a text string. But all the
rest of my columns are already smallints. Do I save anything by storing
5 rather than 1997?
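
For concreteness, the normalization I can see for filename and station
name would be something like this sketch (names made up, untested):

   create table stations (
       station_id  integer primary key,
       name        text not null unique   -- the repeated text string
   );

   create table file_data (
       file_id     serial primary key,
       station_id  integer references stations (station_id)
       -- ...
   );

That replaces a repeated text string with an integer key, but for
columns that are already smallints there is no equivalent saving.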




I would *highly* recommend waiting until 8.1 goes live to implement
this in production.



It will be built/rebuilt after 8.1 is released.

##

John D. Burger wrote:

Why not effectively append all of your per-file tables into one huge 
table, with an additional column indicating which file the data comes 
from?  Then you have only two tables, one with a row for each 
observation in your data, one with a row for each file.  Some queries 
need to join the two tables, but that's not a big deal.


That big table with a row for each observation will have to include all
columns, such as year and station_id. For such columns I store 500M
values of which only ~15 are distinct in my case - the redundancy I'm
trying to minimize. By splitting into smaller tables, many columns will
contain only one distinct value. It *should* be possible to delete such
columns and instead store their corresponding values in a "table
header". If I had one table per file, the table you suggest with one row
per file could effectively serve as the collection of table headers.




It also seems to me that you may be tying your schema design too 
closely to the current way that the data is represented.  Do you 
really need to have the data file figure so prominently in your design?


The archive of data files is shared by a number of research groups 
around the world. There's a whole software package that people use for 
data analysis, accessing the data files as they are. So I expect a lot 
of file-specific queries.


Hmm, in fact if the redundant values you're worried about come in long 
stretches (e.g., temperature is the same for many observations in a 
row), I suppose you could do the same thing - map a constant value to 
the range of observation IDs for which it holds.  This gets back to 
having many tables, though.


Is this, in effect, normalization? But if the observation ID takes just as
much storage as the original value, have we gained anything? BTW, I'm
not aiming at redundancy in the measurements - this is minimal compared
to e.g. year and station ID. I do have an idea of how it *should* be
possible to get rid of much of it - see the response to Tom Lane below.


##

Tom Lane wrote:


No, tableoid is sort of a virtual column ... it doesn't exist on disk.
When you query it you get a value fetched from the internal data
structure representing the table.
 

So virtual columns are possible - THIS is a way to clear redundant data!
Is it possible for a user to create a virtual column? If not, adding
that capability would be a big improvement.


What I really need are partial virtual columns. I'm imagining an
alternative version of VACUUM ANALYZE that could do the following:
1) Order the rows in the table so that for each column, identical values
are placed next to each other as far as possible (the row order that
optimizes one column will probably not be optimal for other columns).
2) For each column, identify the stretches that contain only one
distinct value. Save that value together with the IDs of the start and
end rows, and delete the stretch.
It is not obvious how to do a perfect optimization in step 1), at
least not to me - I'm sure a skilled mathematician would know exactly
how to do it. But here's a simple approach that would get us part of the
way:
1.1) Grab the column with the most redundancy (fewest distinct values)
and sort it into groups according to the distinct values.
1.2) Within each of these groups, grab the column with the next-most
redundancy and sort into groups according to its distinct values.
And so on; stop whenever groups become so small that there's nothing to
gain.
Such an analysis would make it much less expensive to combine
same-schema tables, and having everything in the same table is really
convenient. It would obviously save a lot of storage space, but I
imagine it would enable more efficient queries too.
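
A crude approximation of steps 1.1) and 1.2) with existing SQL would be
to rebuild the table ordered by the most redundant columns first - a
sketch with made-up names, assuming station_id has the fewest distinct
values, then year, then month:

   create table observations_sorted as
   select * from observations
   order by station_id, year, month;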

[GENERAL] help me learn

2005-09-12 Thread suresh ramasamy
Hi,

I'm new to PostgreSQL as well as to database concepts. Please tell me
how I can learn - I mean the easiest and fastest way. Your help will be
appreciated.


Re: [GENERAL] help me learn

2005-09-12 Thread A. Kretschmer
On 2005-09-12 17:08:31 +0530, suresh ramasamy wrote:
 Hi,
 
 I'm new to PostgreSQL as well as to database concepts. Please tell me
 how I can learn - I mean the easiest and fastest way. Your help will be

Read a book.
http://techdocs.postgresql.org/techdocs/bookreviews.php


Regards, Andreas
-- 
Andreas Kretschmer(Kontakt: siehe Header)
Heynitz:  035242/47212,  D1: 0160/7141639
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net
 ===Schollglas Unternehmensgruppe=== 



Re: [GENERAL] help me learn

2005-09-12 Thread Berend Tober

suresh ramasamy wrote:

I'm new to PostgreSQL as well as to database concepts. Please tell me
how I can learn - I mean the easiest and fastest way. Your help will be
appreciated.




Make an appropriate posting to pgsql-jobs?



[GENERAL] Replication

2005-09-12 Thread barry
Hi,

I currently have a postgresql 8 system which I want to replicate
(ideally in realtime) with a spare host in order to introduce some
redundancy - eg. if the master server dies then I've got a ready-to-go
backup.  Switchover does not have to be automated.

I've looked into commandprompt.com's mammoth system, but it only
supports up to 1000 tables (the documentation doesn't mention this!) -
the database in question has more than 1000 tables, and adds new tables
regularly.  Slony-I and pgpool apparently don't support dynamic
schemas, which I'd obviously need, so they're not quite up to the job
either.

I'm currently looking at some sort of hack-job with the WAL archives
(see http://www.issociate.de/board/index.php?t=msg&goto=443099), but
this looks like a slightly flaky approach - have I missed the obvious
solution?  Is there any stable software available which can replicate a
large and dynamic number of tables? 

Cheers,

Barry




Re: [GENERAL] Replication

2005-09-12 Thread Peter Eisentraut
On Monday, 12 September 2005 13:52, [EMAIL PROTECTED] wrote:
 I currently have a postgresql 8 system which I want to replicate

Look at DRBD.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



[GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?

2005-09-12 Thread Reid Thompson
Example:
assume a table of 10 columns, three of which are fname, lname, and dob.
If an index is created on (fname, lname, dob), will a query that
utilizes two of the columns ( select 'data' from table where fname = 'X'
and lname = 'Y') utilize the index? 

thanks,
reid





Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?

2005-09-12 Thread Alvaro Herrera
On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:
 Example:
 assume a table of 10 columns, three of which are fname, lname, and dob.
 If an index is created on (fname, lname, dob), will a query that
 utilizes two of the columns ( select 'data' from table where fname = 'X'
 and lname = 'Y') utilize the index? 

Yes, if it is selective enough.  (It _can_ use the index, which does not
mean that it _will_ use it.)  Note that if your example query used the
columns (lname, dob), the answer would be no.
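
To illustrate with a hedged sketch (hypothetical table; untested):

    create index person_idx on person (fname, lname, dob);

    -- can use the index: the leading column is constrained
    select data from person where fname = 'X' and lname = 'Y';

    -- cannot (pre-8.1): the leading column fname is unconstrained
    select data from person where lname = 'Y' and dob = '1970-01-01';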

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, Gee, Officer Krupke)



Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?

2005-09-12 Thread Reid Thompson
Alvaro Herrera wrote:
 On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:
 Example:
 assume a table of 10 columns, three of which are fname, lname, and
 dob. If an index is created on (fname, lname, dob), will a query that
 utilizes two of the columns ( select 'data' from table where fname =
 'X' and lname = 'Y') utilize the index?
 
 Yes, if it is selective enough.  (It _can_ use the index,
 which does not mean that it _will_ use it.)  Note that if
 your example query used the columns (lname, dob), the answer would be
 no. 
 
Why is that? In order to use an index, does the query have to utilize
the 'first' element of the index?



reid



Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?

2005-09-12 Thread Michael Fuhr
On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:
 assume a table of 10 columns, three of which are fname, lname, and dob.
 If an index is created on (fname, lname, dob), will a query that
 utilizes two of the columns ( select 'data' from table where fname = 'X'
 and lname = 'Y') utilize the index? 

See "Multicolumn Indexes" in the "Indexes" chapter of the documentation.

http://www.postgresql.org/docs/8.0/interactive/indexes-multicolumn.html

You can use EXPLAIN to see whether the planner will use an index for
a particular query.

http://www.postgresql.org/docs/8.0/interactive/performance-tips.html#USING-EXPLAIN

Note, however, that the planner will ignore an index and use a
sequential scan if it thinks the latter will be faster, so if you
want to see whether the query *can* use an index (as opposed to
*will* use it) then you could execute SET enable_seqscan TO off
and then run EXPLAIN (don't forget to RESET enable_seqscan or SET
it back to on when you're done testing).
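
A minimal session illustrating that test (sketch only; "mytable" stands
in for your real table):

    SET enable_seqscan TO off;
    EXPLAIN SELECT 'data' FROM mytable WHERE fname = 'X' AND lname = 'Y';
    RESET enable_seqscan;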

-- 
Michael Fuhr



Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?

2005-09-12 Thread Michael Fuhr
On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote:
 Alvaro Herrera wrote:
  Note that if your example query used the columns (lname, dob),
  the answer would be no. 

 Why is that? In order to use an index, does the query have to utilize
 the 'first' element of the index?

In released versions of PostgreSQL, yes.  Version 8.1 will remove
that restriction.

-- 
Michael Fuhr



Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?

2005-09-12 Thread Alvaro Herrera
On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote:
 Alvaro Herrera wrote:
  On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:
  Example:
  assume a table of 10 columns, three of which are fname, lname, and
  dob. If an index is created on (fname, lname, dob), will a query that
  utilizes two of the columns ( select 'data' from table where fname =
  'X' and lname = 'Y') utilize the index?
  
  Yes, if it is selective enough.  (It _can_ use the index,
  which does not mean that it _will_ use it.)  Note that if
  your example query used the columns (lname, dob), the answer would be
  no. 

 Why is that? In order to use an index, does the query have to utilize
 the 'first' element of the index?

The leftmost part.  There's no way to scan an index if you don't know
the key.  On a btree index, the key is ordered, and the columns at the
left are more significant than those at the right.  If you don't provide
a value for the leftmost (first) column, there's no way to start
scanning the index because there's no starting point.

I don't think that was nearly clear enough, but OTOH I haven't had any
coffee today yet.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
If you have nothing to say, maybe you need just the right tool to help you
not say it.   (New York Times, about Microsoft PowerPoint)



Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?

2005-09-12 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:
 Example:
 assume a table of 10 columns, three of which are fname, lname, and dob.
 If an index is created on (fname, lname, dob), will a query that
 utilizes two of the columns ( select 'data' from table where fname = 'X'
 and lname = 'Y') utilize the index? 

 Yes, if it is selective enough.  (It _can_ use the index, which does not
 mean that it _will_ use it.)  Note that if your example query used the
 columns (lname, dob), the answer would be no.

Actually, that last point is not true anymore as of 8.1 --- see this
thread:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg00939.php
which led to this patch:
http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php

I missed the fact that the documentation said it wouldn't work though.
Will fix...

regards, tom lane



Re: [GENERAL] SQL - planet redundant data

2005-09-12 Thread John D. Burger
Hmm, in fact if the redundant values you're worried about come in 
long stretches (e.g., temperature is the same for many observations 
in a row), I suppose you could do the same thing - map a constant 
value to the range of observation IDs for which it holds.  This gets 
back to having many tables, though.


Is this, in effect, normalization? But if the observation ID takes just
as much storage as the original value, have we gained anything? BTW,
I'm not aiming at redundancy in the measurements - this is minimal
compared to e.g. year and station ID.


I don't think this is exactly what people meant by normalization, but
maybe.  My basic thought was that since you have some redundancy in
your data, you might want to use some form of compression.  One of the
simplest forms of compression is called run-length encoding
(http://en.wikipedia.org/wiki/Run_length_encoding).  So you'd have most
of your data in a main table:


  create table observations (
    obsID       integer primary key,   -- Maybe a BIGINT
    temperature float,
    -- etc.
  );

and some other compressed tables for those features that have long 
runs of repetitive values:


  create table obsYears (
    startObs   integer primary key references observations (obsID),
    endObs     integer references observations (obsID),
    year       integer);

  create table obsStations (
    startObs   integer primary key references observations (obsID),
    endObs     integer references observations (obsID),
    stationID  integer);

(Caution, I haven't checked these for syntax.)  I've introduced an 
observation ID, and then I have compressed tables that map =ranges= 
of these IDs to values that are constant for long stretches.  Each year 
occupies only one row, same with each station.  (I think your reply to 
Tom may have been getting at something like this.)  Now you can do 
queries like this, say, for temperature statistics in a particular 
year:


  select avg(temperature), stddev(temperature)
  from observations, obsYears
  where obsID between startObs and endObs
    and year = 2001;

You could join in other compressed tables in the same way.  In fact, 
you could glue them all together with a VIEW, and you'd be able to 
treat the whole thing like one giant table, with much of the redundancy 
removed.  Note that if you define indexes on the startObs and endObs 
columns, PostgreSQL might avoid scanning through the compressed tables 
every time you do a query.  You might also benefit from a composite 
index on (startObs, endObs).  For features like year, which are 
massively repetitive, this might even be faster than storing the 
feature in the main table, since the compressed table will easily fit 
in memory.
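
Such a view might look like this (a sketch building on the tables
above; untested):

  create view all_observations as
  select o.obsID, o.temperature, y.year, s.stationID
  from observations o
  join obsYears    y on o.obsID between y.startObs and y.endObs
  join obsStations s on o.obsID between s.startObs and s.endObs;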


So the basic idea is run-length encoding for repetitive values.  I 
think I can use this in some of my own data - I don't know why I never 
thought of it before.


- John D. Burger
  MITRE





Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?

2005-09-12 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote:
 Why is that? In order to use an index, does the query have to utilize
 the 'first' element of the index?

 The leftmost part.  There's no way to scan an index if you don't know
 the key.  On a btree index, the key is ordered, and the columns at the
 left are more significant than those at the right.  If you don't provide
 a value for the leftmost (first) column, there's no way to start
 scanning the index because there's no starting point.

Actually, btree doesn't have any particular problem with that --- it
just starts the scan at the beginning of the index.  However the other
index types do all require a constraint on the first index column;
for instance hash has to be able to determine a hash value.

Greg Stark suggests here:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg00966.php
that GiST could also be fixed to work with any subset of the index
columns, but it hasn't been done yet, unless Teodor and Oleg snuck
something in during that last round of GiST work.

regards, tom lane



Re: [GENERAL] Replication

2005-09-12 Thread Joshua D. Drake

[EMAIL PROTECTED] wrote:


I've looked into commandprompt.com's mammoth system, but it only
supports up to 1000 tables (the documentation doesn't mention this!) -
the database in question has more than 1000 tables, and adds new tables
regularly.  Slony-I and pgpool apparently don't support dynamic
schemas, which I'd obviously need, so they're not quite up to the job
either.


Actually, the theoretical limit for the Mammoth system is 10,000 tables;
we have only tested up to 1000. This is mentioned quite clearly on the
website. It should be noted that if Slony-I won't do what you need, then
Mammoth probably would not either.

Mammoth can, however, add new tables on the fly as long as those tables
are empty. If they are not empty, they will cause a full sync to occur.

Sincerely,

Joshua D. Drake




I'm currently looking at some sort of hack-job with the WAL archives
(see http://www.issociate.de/board/index.php?t=msg&goto=443099), but
this looks like a slightly flaky approach - have I missed the obvious
solution?  Is there any stable software available which can replicate a
large and dynamic number of tables? 


Cheers,

Barry






--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/




Re: [GENERAL] If an index is based on 3 columns will a query using

2005-09-12 Thread Teodor Sigaev

Greg Stark suggests here:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg00966.php
that GiST could also be fixed to work with any subset of the index
columns, but it hasn't been done yet, unless Teodor and Oleg snuck
something in during that last round of GiST work.


GiST may work with any subset of index columns too. Even in the existing
code I don't see any problem except a NULL in the first column. GiST
doesn't store tuples with a leading NULL value (gist.c lines 174, 326),
so the index doesn't contain them.

After our work on WAL-izing GiST, it can cope with invalid tuples
(which can occur after crash recovery), so it isn't a big deal to add
support for NULL in the first column. But the freeze date has passed...
Should I add it, or leave it for 8.2?




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?

2005-09-12 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes:

 Alvaro Herrera [EMAIL PROTECTED] writes:

  Yes, if it is selective enough.  (It _can_ use the index, which does not
  mean that it _will_ use it.)  Note that if your example query used the
  columns (lname, dob), the answer would be no.
 
 Actually, that last point is not true anymore as of 8.1 --- see this
 thread:
 http://archives.postgresql.org/pgsql-hackers/2005-05/msg00939.php
 which led to this patch:
 http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php

Did that patch actually implement skip scanning? 

The comment seems to describe only removing the restriction from the
planner, which would make it theoretically possible; but presumably the
cost estimator should ensure it essentially never gets chosen for btree
indexes, since a btree index would very rarely help - it would require a
complete index scan.

I guess I could see some corner cases where it would help: very wide
tables with an index on a few very selective, relatively narrow columns,
so the index could be scanned in its entirety much faster than a full
table scan. But the index would have to be *much* narrower than the
table and quite selective to overcome the random access penalty.

Skip scanning would make it much more likely to be helpful.

Also, I think Oracle has another scan method called a fast index scan that
basically does a full sequential scan of the index. So the tuples come out
unordered but the access pattern is sequential. Would that be a good TODO for
Postgres? Is it feasible given the index disk structures in Postgres?

-- 
greg




Re: [GENERAL] ERROR: type temp_gc already exists

2005-09-12 Thread Janning Vygen
On Saturday, 10 September 2005 18:05, Tom Lane wrote:
 Janning Vygen [EMAIL PROTECTED] writes:
  i guess the table was dropped but not the corresponding type.
  How can things like this happen?

 Corrupted pg_depend table maybe?  You might try REINDEXing pg_depend
 to be on the safe side.  Also please look to see if there are any
 relevant entries in it (look for objid = the type's OID, or refobjid
 = 16562879 which we can see was the table's OID).

  How can i fix it?  Can i just drop the type from pg_type?

 If there's no pg_depend entry then DROP TYPE should work.  Otherwise
 you might have to resort to manually DELETEing the pg_type row.

Thanks for your detailed answer. I don't want to do anything wrong, so to
be sure, I have some more questions:

- There is no entry in pg_depend. Should I just drop the entry from
pg_type, or should I REINDEX anyway?

- Can I REINDEX pg_depend in normal operation mode, or do I have to take
the precautions mentioned in the docs? [1]

- How can things like this happen? Hardware failure? If so, should I
replace my hard disk?

kind regards,
Janning

[1] It's not clear to me whether pg_depend is a shared system catalog,
because the docs say:

   "any of the shared system catalogs (pg_database,
   pg_group, pg_shadow, or pg_tablespace)"

Maybe the enumeration is complete, maybe it shows only examples.





Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?

2005-09-12 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
 GiST may work with any subset of index columns too. Even in the existing
 code I don't see any problem except a NULL in the first column. GiST
 doesn't store tuples with a leading NULL value (gist.c lines 174, 326),
 so the index doesn't contain them.

Well, that's exactly the problem :-(.  Or at least one of the problems;
the other being what you'd use as search key to find such tuples.

 After our work on WAL-izing GiST, it can cope with invalid tuples
 (which can occur after crash recovery), so it isn't a big deal to add
 support for NULL in the first column. But the freeze date has
 passed... Should I add it, or leave it for 8.2?

Too late for 8.1 I'd say --- this definitely sounds like a new feature
rather than a bug fix.

regards, tom lane



Re: [GENERAL] If an index is based on 3 columns will a query using two of the columns utilize the index?

2005-09-12 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php

 Did that patch actually implement skip scanning? 

No, it just removed the planner's arbitrary assumption that the index
methods wouldn't cope.  Skip scanning is actually something rather
different anyway.

 The comment seems to describe only removing the restriction from the
 planner, which would make it theoretically possible; but presumably the
 cost estimator should ensure it essentially never gets chosen for btree
 indexes.

btcostestimate does understand this now.

 I guess I could see some corner cases where it would help: very wide
 tables with an index on a few very selective, relatively narrow columns,
 so the index could be scanned in its entirety much faster than a full
 table scan. But the index would have to be *much* narrower than the
 table and quite selective to overcome the random access penalty.

With a bitmap index scan the penalty wouldn't be so high.

 Also, I think Oracle has another scan method called a fast index scan that
 basically does a full sequential scan of the index. So the tuples come out
 unordered but the access pattern is sequential. Would that be a good TODO for
 Postgres? Is it feasible given the index disk structures in Postgres?

I think this would probably fail under concurrent update conditions: you
couldn't guarantee not to miss or multiply return index entries.  There
is interlocking in an index-order scan that prevents such problems, but
I don't see how it'd work for a physical-order scan.

You could probably make it work if you were willing to lock out writers
for the duration of the scan, but that'd severely restrict the
usefulness I would think.  I'm also not sure how we'd express such a
constraint within the system...

regards, tom lane



Re: [GENERAL] Replication

2005-09-12 Thread Simon Riggs

Barry,

You can use PITR to archive transaction logs to a second server that is
kept in standby mode.

This will cope with any number of tables and with dynamic changes to
tables.

This is fairly straightforward and very low overhead.
Set archive_command to a program that transfers xlog files to the second
server.
Then set restore_command on the second server to a program that loops
until the next file is available.
Switchover time is low.
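
For example (a sketch only - the script names and paths here are made
up; archive_command lives in postgresql.conf on the primary, and
restore_command in recovery.conf on the standby):

  # postgresql.conf (primary)
  archive_command = 'copy_to_standby.sh %p %f'

  # recovery.conf (standby)
  restore_command = 'wait_for_and_copy.sh /wal_archive/%f %p'

%p expands to the file's path and %f to its name; the standby-side
script is the part that loops until the named xlog file appears.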

Best Regards, Simon Riggs

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of
 [EMAIL PROTECTED]
 Sent: 12 September 2005 04:52
 To: pgsql-general@postgresql.org
 Subject: [GENERAL] Replication


 Hi,

 I currently have a postgresql 8 system which I want to replicate
 (ideally in realtime) with a spare host in order to introduce some
 redundancy - eg. if the master server dies then I've got a ready-to-go
 backup.  Switchover does not have to be automated.

 I've looked into commandprompt.com's mammoth system, but it only
 supports up to 1000 tables (the documentation doesn't mention this!) -
 the database in question has more than 1000 tables, and adds
 new tables
 regularly.  Slony-I and pgpool apparently don't support dynamic
 schemas, which I'd obviously need, so they're not quite up to the job
 either.

 I'm currently looking at some sort of hack-job with the WAL archives
 (see http://www.issociate.de/board/index.php?t=msg&goto=443099), but
 this looks like a slightly flaky approach - have I missed the obvious
 solution?  Is there any stable software available which can
 replicate a
 large and dynamic number of tables?

 Cheers,

 Barry






Re: [GENERAL] linux pg pointed to windows partition for db

2005-09-12 Thread Steve Atkins
On Sun, Sep 11, 2005 at 06:36:26PM +0100, mike dixon wrote:
 Tried a couple of other places and am not getting anywhere.
 
 A windows xp program I use uses pgsql; I'd like to create a backup of the db 
 but from within linux without xp running (I run xp in vmware; and the xp db 
 backup will be written to an ext* partition).  I have the same rev pgsql 
 installed on xp and linux (8.0.3 IIRC); the xp/pgsql db is on an NTFS 
 partition and I mount the disc read-only. 


You'll be unable to start the database with the disk mounted read-only.

If you want to create a backup of the DB you have three options that
are likely to work.

1) Create a filesystem level backup of the data directory

2) Boot the virtual machine, start postgresql and run a pg_dump backup
   under XP.

3) Boot the virtual machine, start postgresql and then run a pg_dump
   backup under Linux, connecting to the virtual machine over the
   VMWare virtual network.

If you remount the NTFS partition read-write (does that work on Linux these
days reliably?) you may well be able to get a linux postmaster to look at
it, but I'd be concerned about trashing the data. Not something to try as
a backup attempt.

Cheers,
  Steve


 My problem is two-fold:
 
 1.  What parameters to start the linux pgsql with for this circumstance if 
 not the defaults;
 
 2.  How to point the linux pgsql/pg_dump to the xp pg's data to do the backup.
 
 Can I just start the linux pgsql with its defaults?  If so, how do I point 
 linux pg_dump to the pgsql db data in an xp dir (/mnt/Program 
 Files/Postgresql/8.0/data)?
 
 If linux pgsql defaults aren't good enough for this circumstance, what do I 
 need to change?
 
 Also, the xp pgsql db requires a pw when working with it under xp; if xp
 isn't running and I can use the linux pg_dump for this, do I still need
 the xp pgsql pw? Is there any way to pass that to the linux pg_dump on
 the cmd line so I can put this operation in a script?



Re: [GENERAL] ERROR: type temp_gc already exists

2005-09-12 Thread Tom Lane
Janning Vygen [EMAIL PROTECTED] writes:
 Am Samstag, 10. September 2005 18:05 schrieb Tom Lane:
 If there's no pg_depend entry then DROP TYPE should work.  Otherwise
 you might have to resort to manually DELETEing the pg_type row.

 Thanks for your detailed answer. I don't want to do anything wrong, so to
 be sure, I have some more questions:

 - There is no entry in pg_depend. Should I just drop the entry from
 pg_type, or should I REINDEX anyway?

Well, what did you do to check that there was no entry?  If the index is
corrupt and you issued a query that used the index, it might have failed
to find an entry that's actually there in the table (in fact, if we're
assuming the DROP TYPE didn't happen because the system didn't find the
dependency row while dropping the table, this is pretty much exactly
what you'd expect).  I'd REINDEX and then check again.
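
For reference, that check might look like this (the type's OID is a
placeholder to fill in):

    reindex table pg_depend;
    select * from pg_depend
    where objid = <the type's OID> or refobjid = 16562879;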

 - How can things like this happen? Hardware failure? If yes, should i change 
 my harddisk?

Insufficient information to say.  It wouldn't be a bad idea to run some
disk tests though.

 [1] It's not clear to me whether pg_depend is a shared system catalog,
 because the docs say:
 "any of the shared system catalogs (pg_database,
 pg_group, pg_shadow, or pg_tablespace)"
 Maybe the enumeration is complete, maybe it shows only examples.

That's meant to be a complete list --- I've updated the documentation to
make this clearer.  But you could check for yourself:
select relname from pg_class where relisshared;

regards, tom lane



Re: [GENERAL] SQL - planet redundant data

2005-09-12 Thread Poul Jensen

Thank you, John!
I misunderstood you the first time, but I now see we have the same thing 
in mind.



So you'd have most of your data in a main table:

  create table observations (
    obsID       integer primary key,   -- Maybe a BIGINT
    temperature float,
    -- etc.
  );

and some other compressed tables for those features that have long 
runs of repetitive values:


  create table obsYears (
    startObs   integer primary key references observations (obsID),
    endObs     integer references observations (obsID),
    year       integer);

  create table obsStations (
    startObs   integer primary key references observations (obsID),
    endObs     integer references observations (obsID),
    stationID  integer);

(Caution, I haven't checked these for syntax.)  I've introduced an 
observation ID, and then I have compressed tables that map =ranges= 
of these IDs to values that are constant for long stretches.  Each 
year occupies only one row, same with each station.


One complication: in applying the observation ID you're in effect
ordering the rows. If you order them chronologically, year is perfectly
lined up, giving you one row per value in your compressed table, but
e.g. month will be split into n_years*12 stretches of obsIDs, and
station_id may not have any continuous stretches of obsIDs at all. I
don't see any solution to this, but better compression can be achieved
by ordering rows optimally when assigning the obsID. The reply to Tom
Lane in my previous post suggested one way to do this - it may not
always be optimal, but at least it's simple.


Now you can do queries like this, say, for temperature statistics in a 
particular year:


  select avg(temperature), stddev(temperature)
  from observations, obsYears
  where obsID between startObs and endObs
    and year = 2001;


This works! I had not yet realized how to make this connection between 
two tables, so that was a major help - thank you.




You could join in other compressed tables in the same way.  In fact, 
you could glue them all together with a VIEW, and you'd be able to 
treat the whole thing like one giant table, with much of the 
redundancy removed.


That is exactly what I want, and now I finally see how to do it (I 
think!). However, it is a considerable amount of work to set this up 
manually, plus, it has been a headache realizing how to get there at 
all. I'm hoping that one or more of the developers think it would be a 
good idea for PostgreSQL to perform an internal table optimization 
process using run-length encoding. Imagine you could just throw all your 
data into one table, run OPTIMIZE TABLE and you'd be done. With SQL 
being all about tables I'm surprised this idea (or something even 
better) hasn't been implemented already.


Poul Jensen



[GENERAL] PQtrace doesn't work

2005-09-12 Thread W. van den Akker

Hello,

I sent this message on 2004-01-29 as well, and have had no solution for
this problem since... :o

I have a little test program (see the end of this message).
The program crashes when PQtrace is called ("the instruction referenced
memory at 0x0010; the memory could not be written" - obvious...). I use
the library libpqdll.lib and PostgreSQL v8.0.1, but it also happens in
7.4.9.
Running under Windows 2000 SP4 with VC++ 6 SP5. If I compile under Linux
there is no problem, so obviously there is something wrong with the use
under Windows.

If I comment out the PQtrace call, all works fine.

Any ideas?

gr,

Willem.

#include <stdio.h>
#include <stdlib.h>
#include "libpq-fe.h"
#include <winsock.h>

void main ()
{
   int nFields;

   int i, j;

   PGconn   *conn;
   PGresult *res;

   char *pghost = "linux";
   char *dbName = "some_db";

   FILE *debug;

   WSADATA wsadata;
   WSAStartup(0x0101, &wsadata);


  conn = PQsetdbLogin (pghost, NULL, NULL, NULL, dbName, "user", "");

  if (PQstatus(conn) == CONNECTION_BAD)
  {
      printf ("Connection to database %s is failed\n", dbName);
      printf ("%s", PQerrorMessage (conn));
      PQfinish (conn);
      exit (1);
  }

  debug = fopen ("trace.out", "w");
  PQtrace (conn, debug);   /* <--- crashes here */

  res = PQexec (conn, "BEGIN");
  if (!res || PQresultStatus (res) != PGRES_COMMAND_OK)
  {
      printf ("BEGIN command failed\n");
      PQclear (res);
      PQfinish (conn);
      exit (1);
  }

  PQclear (res);

  res = PQexec (conn, "DECLARE mycursor CURSOR FOR select sum(id) from relaties");

  if (!res || PQresultStatus (res) != PGRES_COMMAND_OK)
  {
      printf ("DECLARE CURSOR command failed\n");
      PQclear (res);
      PQfinish (conn);
      exit (1);
  }

  PQclear (res);
  res = PQexec (conn, "FETCH ALL in mycursor");
  if (!res || PQresultStatus (res) != PGRES_TUPLES_OK)
  {
      printf ("FETCH ALL command didn't return tuples properly\n");
      PQclear (res);
      PQfinish (conn);
      exit (1);
  }

  nFields = PQnfields (res);
  for (i = 0; i < nFields; i++)
      printf ("%-15s", PQfname (res, i));

  printf ("\n\n");

  for (i = 0; i < PQntuples (res); i++)
  {
      for (j = 0; j < nFields; j++)
          printf ("%-15s", PQgetvalue (res, i, j));
      printf ("\n");
  }

  PQclear (res);

  res = PQexec (conn, "CLOSE mycursor");
  PQclear (res);

  res = PQexec (conn, "COMMIT");
  PQclear (res);
  PQfinish (conn);

  fclose (debug);

  WSACleanup();
}



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[GENERAL] Building postgres on SuSE

2005-09-12 Thread Christian Goetze

I'm trying to build postgres on SuSE with --with-pam, and it tells me:

/usr/lib/gcc-lib/i586-suse-linux/3.3.5/../../../../i586-suse-linux/bin/ld: 
cannot find -lpam

I know it is actually installed, and dissecting the configure script and
hand-compiling the test program works if I say /lib/libpam.so.0, but
fails if I say -lpam. Very bizarre - has anyone encountered this?

I realize this is probably a SuSE issue and not a postgres issue, but I
hope someone here has seen this problem.


Thanks in advance!
--
cg




Re: [GENERAL] Replication

2005-09-12 Thread Tatsuo Ishii
 I currently have a postgresql 8 system which I want to replicate
 (ideally in realtime) with a spare host in order to introduce some
 redundancy - eg. if the master server dies then I've got a ready-to-go
 backup.  Switchover does not have to be automated.
 
 I've looked into commandprompt.com's mammoth system, but it only
 supports up to 1000 tables (the documentation doesn't mention this!) -
 the database in question has more than 1000 tables, and adds new tables
 regularly.  Slony-I and pgpool apparently don't support dynamic
 schemas, which I'd obviously need, so they're not quite up to the job
 either.

pgpool (without Slony-I) replicates schema changes, and PGCluster does too.
--
SRA OSS, Inc. Japan
Tatsuo Ishii



[GENERAL] Utility that creates table schema from csv data?

2005-09-12 Thread CSN
Probably wishful thinking, but who knows - maybe
there's something in contrib! I have a bunch of csv
data with the field names specified on the first line
of the various files. Is there any such utility that
will create a table schema using the field names AND
look through the data and determine what data types
each field should be?

Thanks,
CSN






Re: [GENERAL] SQL - planet redundant data

2005-09-12 Thread Brent Wood


 That is exactly what I want, and now I finally see how to do it (I
 think!). However, it is a considerable amount of work to set this up
 manually, plus, it has been a headache realizing how to get there at
 all. I'm hoping that one or more of the developers think it would be a
 good idea for PostgreSQL to perform an internal table optimization
 process using run-length encoding. Imagine you could just throw all your
 data into one table, run OPTIMIZE TABLE and you'd be done. With SQL
 being all about tables I'm surprised this idea (or something even
 better) hasn't been implemented already.

There was a recent brief thread here on storing time-series data, where
the use of clustered indices for static tables was suggested. This might
also be useful in your situation...
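
For example, something along these lines (hypothetical index name and
columns, using the "CLUSTER indexname ON tablename" form; untested):

  create index obs_cluster_idx on observations (station_id, year);
  cluster obs_cluster_idx on observations;
  analyze observations;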

Cheers,

 Brent Wood
