[GENERAL] Backup Policy Disk Space Issues

2008-12-22 Thread Volkan YAZICI
Hi,

In the company, we're facing with serious disk space problems which is
not caused by PostgreSQL, but the nature of our data. Database sizes are
around 200-300GB, which is relatively not that much, but databases
require strict backup policies:

- Incremental backup for each day. (250GB)
- Full backup for each week of the last month. (4 x 250GB)
- Full backup for each month of the last year. (12 x 250GB)

As a result, we require a space of size (roughly)

  250 + 4x250 + 12x250 = 17x250 = 4250GB = 4.15TB

for each server per year. Considering we have ~15 servers,

  15x4250 = 63750 = 62.25TB

as can be seen, growth of the backed up data sizes have almost no
relations with the actual data sizes. At the moment, we're using tape
drive cartridges for weekly and monthly backups. But the incremental
backups, plus the database itself requires a constant space of size
~500GB.

To summarize, as a DBA most of my time is wasting with validating if the
backup policies performed right, cartridges captioned correctly, etc.
What are your experiences with similar sizes of data? How do you cope
with backups? Do you recommend any other hardware/software solutions?


Regards.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How are locks managed in PG?

2008-12-22 Thread Grzegorz Jaśkiewicz
On Mon, Dec 22, 2008 at 5:41 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 I think one of the points that proves this is the chunks of innovative
 code that have been put into postgresql that were basically written by
 one or two guys in  1 year.  Small sharp teams can tackle one
 particular problem and do it very well in an open source project.

Which is precisely why big smart companies divide up projects into
smaller teams - to achieve same goal. it is well known fact, that more
developers means more chaos, and less done on time. As my friend puts
it - you cannot expect 9 pregnant woman to deliver in 1 month :) I
know for a fact that microsoft, xensource and few others tackle
projects in small teams of brilliant engineers.
 I don't know how oracle does it, but the whole thing is rather hudge,
so there must be quite few developers involved - at least in whole
middleware. Installing it on my laptop took about 2 hours (MBP, 2GB of
ram, centos) - compared to postgresql... Thank god pg developers not
decided to use java gui to 'script' whole thing, I think oracle would
be much better off without whole java crap around it (but that's just
my opinion).

-- 
GJ

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How are locks managed in PG?

2008-12-22 Thread Geoffrey

Jonah H. Harris wrote:

On Sun, Dec 21, 2008 at 11:02 PM, Scott Marlowe scott.marl...@gmail.com wrote:

The difference is HE put forth an opinion about the pg developers
being smarter, but you put forth what seems like a statement of fact
with no evidence to back it up.  One is quite subjective and open for
debate on both sides, and often to good effect.  The other is a
statement of fact regarding scalability in apparently all usage
circumstances, since it wasn't in any way clarified if you were
talking about a narrow usage case or all of the possible and / or
probably ones.


Agreed.  It's just that, because I know quite a few of the engineers
working on Oracle and SQL Server, it generally pisses me off to see
people make blanket statements about one group being smarter than
another when they probably have no basis for comparison.  It's all
good though, I'm just cranky tonight.


I still haven't seen a post regarding the Oracle scalability issue. 
Where is the data??


--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] lack of consequence with domains and types

2008-12-22 Thread Grzegorz Jaśkiewicz
so, consider this one:

create sequence seq1;

create domain foo1 as bigint default nextval('seq1') not null;
create domain foo2 as timestamp without time zone default now() not null;
create type footype as
(
  a foo1,
  b foo2
) ;

create table bar(a bigint not null, b varchar(20));
insert into bar(a) select generate_series(1,100);
alter table bar add column blah footype not null;


ERROR:  column blah contains null values


:/


I was expecting domains to kick in with their default values again. I
presume this is somehow similar to problem with enums I raised before.

Obviously I can work around that thing with:

create sequence seq1;

create type footype as
(
  a bigint,
  b timestamp without time zone
);

create table bar(a bigint not null, b varchar(20));
insert into bar(a) select generate_series(1,100);

alter table bar add column blah footype not null default (
nextval('seq1'), now()) ;


but that defeats whole purpose of domains, doesn't it ?

well, on top of that - I could create another domain with default
(nextval, now), but still
The feature of domains and types is really great, but I see a lack of
consequence here. It would be great to see that fixed in future
versions of pg.


Thanks :)

-- 
GJ

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] lack of consequence with domains and types

2008-12-22 Thread Grzegorz Jaśkiewicz
On Mon, Dec 22, 2008 at 1:49 PM, Grzegorz Jaśkiewicz gryz...@gmail.com wrote:
 but that defeats whole purpose of domains, doesn't it ?

 well, on top of that - I could create another domain with default
 (nextval, now), but still

Well I can't, it doesn't work :(

create domain xyz as footype default(nextval('seq1'), now()) not null;
ERROR:  xyz is not a valid base type for a domain


huh

-- 
GJ

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How are locks managed in PG?

2008-12-22 Thread Jonah H. Harris
On Mon, Dec 22, 2008 at 7:37 AM, Geoffrey li...@serioustechnology.com wrote:
 I still haven't seen a post regarding the Oracle scalability issue. Where is
 the data??

You mean the PG scalability issue in comparison to Oracle?

-- 
Jonah H. Harris, Senior DBA
myYearbook.com

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How are locks managed in PG?

2008-12-22 Thread Alvaro Herrera
Scott Marlowe escribió:

 The difference is HE put forth an opinion about the pg developers
 being smarter, but you put forth what seems like a statement of fact
 with no evidence to back it up.

The other difference is that I said it jokingly, whereas you (Jonah)
seem to be bitter about the whole matter.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] design available tricks: traversing heterogeneous tree (table-level + linked list)

2008-12-22 Thread Ivan Sergio Borgonovo
I've such a structure:

create table catalog_fam (
  famid int primary key,
  name varchar(255),
  action smallint
);
create table catalog_macro (
  macroid int primary key,
  famid int references catalog_fam (famid),
  name varchar(255),
  action smallint
);
create table catalog_cat (
  catid int primary key,
  macroid int references catalog_macro (macroid),
  name varchar(255),
  action smallint
);
create table catalog_group (
  groupid int primary key,
  catid int references catalog_cat (catid),
  groupup references catalog_group (groupid),
  action smallint
);
create table catalog_categoryitem(
  itemid int references catalog_item (itemid),
  famid int references catalog_fam (famid),
  macroid int references catalog_macro (macroid),
  catid int references catalog_cat (catid),
  groupid references catalog_group (groupid),
  action smallint
);

where action is (update, delete, insert).

And I've to build up functions that eg.
- tell me what are the categories (where category may stand for
famid, macroid, catid, groupid) have the same parent
- tell me which are the items that belong to the same category
- tell me which are the categories that belong to the same level of
a parent...

And I'd like to:
- offer a uniform interface to the client (php) through plpsql
functions
- avoid to get crazy writing and maintaining such functions.

I could even implement a completely different schema to solve the
above, but then I'll have to face the problem of importing the data.

I've just a constraint if I'll have to change the schema: there is
already some logic depending on:

create table catalog_categoryitem(
  itemid int references catalog_item (itemid),
  famid int references catalog_fam (famid),
  -- ... the rest doesn't matter
);

and

catalog_fam

but refactoring this could be worth since:
a) there is no item that belong to more than one fam
b) famid is frequently accessed when items are and I could avoid a
join

Still there may be some interest in having fast access/grouping and
traversing of stuff with same catid and macroid in the future.
So a nested set (mptt) to represent the hierarchy may not be optimal
and wring a safe and *fast* import function may not be trivial.

I was wondering if there is some cool feature or cool contrib
(8.3) that could make the choice much easier to take or just some
suggestion.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How are locks managed in PG?

2008-12-22 Thread Jonah H. Harris
On Mon, Dec 22, 2008 at 8:22 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 The other difference is that I said it jokingly, whereas you (Jonah)
 seem to be bitter about the whole matter.

Well, it wasn't clear and I was just in a generally bad mood.  Usually
you'd add a :) at the end, which you didn't this time.  So, I wasn't
sure whether you were being serious or not.

I'm only bitter about people bashing things they don't know just for
the sake of bashing them.  It wasn't anything directly against you,
it's just that the anti-any-other-database types of comments seem to
perpetuate more misunderstanding of the other systems.  For the
record, the rest of your post was full of information, so I know
that's not what you were doing.  It was just the aforementioned
comment, which I wasn't sure was a joke.  That's why my response to
you was written as a question rather than a lengthy discussion of
how/why Oracle does things that way.

-Jonah

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How are locks managed in PG?

2008-12-22 Thread James B. Byrne
In-Reply-to: 200812220435.mbm4zmd07...@momjian.us

On: Sun, 21 Dec 2008 23:35:48 -0500 (EST), Bruce Momjian
br...@momjian.us wrote:

 I am sure there are smart people at all the database companies.  I do
 believe that open source development harnesses the abilities of its
 intelligent people better than commercial companies.

I doubt that this is the case.  In fact, I would venture that the chief
advantage of open source / free software projects over their commercial
brethren is that there is no harness at all.

The advantage of OS/FS is that theirs is truly a Darwinian struggle.  OS
Projects that prove less fit for their environment pass away fairly
quickly for want of any real support. This is especially evident when a
significantly superior approach evolves outside the project or the social
behaviour of the team leads to self destructive actions within.

Commercial projects are not as directly susceptible to this process as
they do not exist for their own sake, but rather as an artifact of another
process, that of a commercial enterprise.  Those projects survival is more
a consequence of, and dependent upon, the survival of the their supporting
social structure, the enterprise itself.

I think that to describe either OS or commercial software as better or
worse is misleading.  The most that can be said is that each approach
serves a different purpose and exists in a different environment.

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrnemailto:byrn...@harte-lyne.ca
Harte  Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How are locks managed in PG?

2008-12-22 Thread Jonah H. Harris
On Mon, Dec 22, 2008 at 9:35 AM, James B. Byrne byrn...@harte-lyne.ca wrote:
 I think that to describe either OS or commercial software as better or
 worse is misleading.  The most that can be said is that each approach
 serves a different purpose and exists in a different environment.

Well said.

-- 
Jonah H. Harris, Senior DBA
myYearbook.com

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How are locks managed in PG?

2008-12-22 Thread Geoffrey

Jonah H. Harris wrote:

On Mon, Dec 22, 2008 at 7:37 AM, Geoffrey li...@serioustechnology.com wrote:

I still haven't seen a post regarding the Oracle scalability issue. Where is
the data??


You mean the PG scalability issue in comparison to Oracle?


Yes.


--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Backup Policy Disk Space Issues

2008-12-22 Thread David Fetter
On Mon, Dec 22, 2008 at 10:07:21AM +0200, Volkan YAZICI wrote:
 Hi,
 
 In the company, we're facing with serious disk space problems which
 is not caused by PostgreSQL, but the nature of our data. Database
 sizes are around 200-300GB, which is relatively not that much, but
 databases require strict backup policies:
 
 - Incremental backup for each day. (250GB)

What exactly does this mean in the context of PostgreSQL?  We don't,
as far as I've been able to determine, support this in either the
community branch or even in any proprietary one.

 - Full backup for each week of the last month. (4 x 250GB)
 - Full backup for each month of the last year. (12 x 250GB)
 
 As a result, we require a space of size (roughly)
 
   250 + 4x250 + 12x250 = 17x250 = 4250GB = 4.15TB
 
 for each server per year. Considering we have ~15 servers,
 
   15x4250 = 63750 = 62.25TB

SATA disk space is quite cheap these days, so unless something is very
badly wrong with your funding model, this is not really a problem.

Here's one outfit that will build and configure storage hardware for
you:

http://www.capricorn-tech.com/

Cheers,
David.

 as can be seen, growth of the backed up data sizes have almost no
 relations with the actual data sizes. At the moment, we're using tape
 drive cartridges for weekly and monthly backups. But the incremental
 backups, plus the database itself requires a constant space of size
 ~500GB.
 
 To summarize, as a DBA most of my time is wasting with validating if the
 backup policies performed right, cartridges captioned correctly, etc.
 What are your experiences with similar sizes of data? How do you cope
 with backups? Do you recommend any other hardware/software solutions?

 
 
 Regards.
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general

-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] lack of consequence with domains and types

2008-12-22 Thread Erik Jones

On Dec 22, 2008, at 4:49 AM, Grzegorz Jaśkiewicz wrote:


so, consider this one:

create sequence seq1;

create domain foo1 as bigint default nextval('seq1') not null;
create domain foo2 as timestamp without time zone default now() not  
null;

create type footype as
(
 a foo1,
 b foo2
) ;

create table bar(a bigint not null, b varchar(20));
insert into bar(a) select generate_series(1,100);
alter table bar add column blah footype not null;


ERROR:  column blah contains null values


:/


I was expecting domains to kick in with their default values again. I
presume this is somehow similar to problem with enums I raised before.


Hmm...  While I see what you mean I don't think that would be  
desirable as it would conflict with the expected behavior of ALTER  
TABLE when adding columns without explicit defaults set in the ALTER  
TABLE command:


 When a column is added with ADD COLUMN, all existing rows in the  
table are initialized with the column's default value (NULL if no  
DEFAULT clause is specified).


Adding a column with a non-null default or changing the type of an  
existing column will require the entire table to be rewritten. This  
may take a significant amount of time for a large table; and it will  
temporarily require double the disk space. 


This says that it you add a column without an explicit default in the  
ALTER TABLE statement it definitely will *not* write any values and  
will be a quick operation.  Since it doesn't look to write any values  
the domain's default is never looked at.  If it did then it would  
contradict the defined behavior of adding a column without a default  
value in the ALTER TABLE statement.



Obviously I can work around that thing with:

create sequence seq1;

create type footype as
(
 a bigint,
 b timestamp without time zone
);

create table bar(a bigint not null, b varchar(20));
insert into bar(a) select generate_series(1,100);

alter table bar add column blah footype not null default (
nextval('seq1'), now()) ;


but that defeats whole purpose of domains, doesn't it ?

well, on top of that - I could create another domain with default
(nextval, now), but still
The feature of domains and types is really great, but I see a lack of
consequence here. It would be great to see that fixed in future
versions of pg.


As mentioned above, by fixing the behavior to be what you're  
expecting you'd be breaking the defined behavior of ALTER TABLE.


Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Ordering returned rows according to a list

2008-12-22 Thread Sebastian Tennant
Hi all,

I'd like to make a single query that returns a number of rows using a
'WHERE id IN (list-of-ids)' condition, but I'd like the rows to be
returned in the order in which the ids are given in the list.

Is this possible?

Sebastian


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Erro in vaccum

2008-12-22 Thread paulo matadr

My vacuum was follow error below:
WARNING:  oldest xmin is far in the past
HINT:  Close open transactions soon to avoid wraparound problems.
No have   transactions in locked ,

what's  could be happen?
Paulo Moraes


  Veja quais são os assuntos do momento no Yahoo! +Buscados
http://br.maisbuscados.yahoo.com

Re: [GENERAL] How are locks managed in PG?

2008-12-22 Thread Christophe
Playing the straight man, I have to ask: Scalability issues with locks  
in PG vs Oracle?


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Ordering returned rows according to a list

2008-12-22 Thread Adam Rich
 
 Hi all,
 
 I'd like to make a single query that returns a number of rows using a
 'WHERE id IN (list-of-ids)' condition, but I'd like the rows to be
 returned in the order in which the ids are given in the list.
 
 Is this possible?
 

Depending on how many IDs you have in your list, you can accomplish this
with a CASE statement:

SELECT *
FROM MYTABLE
WHERE id IN (6, 9, 3)
ORDER BY CASE id 
WHEN 6 then 1
WHEN 9 then 2
WHEN 3 then 3 END










-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Ordering returned rows according to a list

2008-12-22 Thread Sebastian Tennant
Quoth Adam Rich ada...@sbcglobal.net:
 I'd like to make a single query that returns a number of rows using a
 'WHERE id IN (list-of-ids)' condition, but I'd like the rows to be
 returned in the order in which the ids are given in the list.

 Depending on how many IDs you have in your list, you can accomplish this
 with a CASE statement:

 SELECT *
 FROM MYTABLE
 WHERE id IN (6, 9, 3)
 ORDER BY CASE id 
 WHEN 6 then 1
 WHEN 9 then 2
 WHEN 3 then 3 END

Thanks for this suggestion Adam.

You say depending on how many IDs there are in the list.  The query is
constructed programatically so unless there's a limit on the number of
conditions a CASE clause can handle, this is the way I'll go about it.

Sebastian


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Erro in vaccum

2008-12-22 Thread Scott Marlowe
On Mon, Dec 22, 2008 at 10:22 AM, paulo matadr saddon...@yahoo.com.br wrote:

 My vacuum was follow error below:
 WARNING:  oldest xmin is far in the past
 HINT:  Close open transactions soon to avoid wraparound problems.
 No have   transactions in locked ,

Transactions don't have to hold locks to cause problems.  Generally
speaking you've got an old transaction being held open.  what does:

select min(xact_start) from pg_stat_activity;

say?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Using the wrong index (very suboptimal), why?

2008-12-22 Thread Shane Wright
Hi,

I have somewhat of a quandary with a large table in my database;
PostgreSQL is choosing the 'wrong' index for a certain kind of query;
causing performance to become an order of magnitude slower (query times
usually measured in milliseconds now become seconds/minutes!).

It's not that it isn't using any index (so enable_seqscan=off doesn't
help), it's that the index it picks is suboptimal.

The query is based on 3 of the table columns - there is an index on all
three, but it prefers to use an index on just two of them, then
filtering by the 3rd (EXPLAIN output is below - showing the query as
well).

I've tried everything I can find to make it not do this (config
parameters, statistics target, vacuuming more...)  - what am I missing?

It happened a couple of times before - after some unusual activity
patterns in a client script - but repeatedly vacuuming until the planner
changed its mind made it go away.  We then made config changes
(random_page_cost and effective_cache_size) thinking that fixed the
reason why it made the bad decision in the first place...but it would
appear not

The only thing remaining I can think of is rebuilding the 'preferred'
index; that should reduce its size from 27Gb down to something more
manageable; maybe that will let the database want to use it more because
there'll be less I/O?

I've considered upgrading to 8.3.5 as well, but I can't see anything in
the changelogs that would fix this issue (I appreciate upgrading is a
good idea anyway, that will be done at some point soon anyway).


Hopefully all relevant info is listed below - if anyone's got any ideas
I'd appreciate any help or pointers anyone can give, thanks...



The server is PostgreSQL 8.3.0 on Linux with 32Gb RAM.  /var/lib/pgsql/
is on a fibre-channel SAN.  This table has around 680 million rows - and
has been reasonably regularly vacuumed, but is probably in dire need of
a VACUUM FULL and REINDEX to reclaim dead space (see disk space info at
the bottom of the post).


emystery=# \d answers
  Table public.answers
 Column |  Type   | Modifiers
+-+-
---
 aid| integer | not null default
nextval(('seq_answers_aid'::text)::regclass)
 rid| integer | not null
 nid| integer | not null
 iid1   | integer |
 iid2   | integer |
 iid3   | integer |
 text   | text|
 extra  | bigint  |
Indexes:
answers_pkey PRIMARY KEY, btree (aid)
index_answers_iid1 btree (iid1) WHERE iid1  0
index_answers_iid2 btree (iid2) WHERE iid2  0
index_answers_iid3 btree (iid3) WHERE iid3  0
index_answers_nidiid1 btree (nid, iid1)
index_answers_ridnidiid1 btree (rid, nid, iid1)



This is what it is doing [slowly] - but for many values of rid/nid/iid1
there are a lot of rows to filter through (tens/hundreds of thousands)
so this can take many seconds or minutes:

emystery=# explain select * from answers where rid=1 and nid=2 and
iid1=3;

-
 Index Scan using index_answers_nidiid1 on answers  (cost=0.00..28.74
rows=1 width=62)
   Index Cond: ((nid = 2) AND (iid1 = 3))
   Filter: (rid = 1)



This is the pattern it *should* use (and does use on other installations
of similar/older data).  When this pattern is used the query always
completes in 1 second, usually ~0.2 seconds!


emystery20080821=# explain select * from answers where rid=1 and nid=2
and iid1=3;


 Index Scan using index_answers_ridnidiid1 on answers  (cost=0.00..99.04
rows=1 width=67)
   Index Cond: ((rid = 1) AND (nid = 2) AND (iid1 = 3))




The PostgreSQL configuration has been altered to help prefer random
lookups [via an index], and to indicate to the database how much data
the OS 

#seq_page_cost = 1.0# measured on an arbitrary scale
random_page_cost = 1.0  # same scale as above
#cpu_tuple_cost = 0.01  # same scale as above
#cpu_index_tuple_cost = 0.005   # same scale as above
#cpu_operator_cost = 0.0025 # same scale as above
effective_cache_size = 31GB


The table has been VACUUM ANALYSE'd (repeatedly!) to no avail.

We've also increased the statistics target for the columns in the table
to 200 (from 10) which still isn't making any difference (I thought
because the table is so large and the dataset is quite uneven that this
should help...it's now at 1000 and vacuuming again so we might be
lucky...)

alter table answers alter column rid set statistics 200;
alter table answers alter column nid set statistics 200;
alter table answers alter column iid1 set statistics 200;



Here is the [full] output from the most recent VACUUM:

emystery=# vacuum verbose analyse answers;
INFO:  vacuuming public.answers
INFO:  scanned index index_answers_nidiid1 to 

Re: [GENERAL] Backup Policy Disk Space Issues

2008-12-22 Thread Volkan YAZICI
On Mon, 22 Dec 2008, David Fetter da...@fetter.org writes:
 On Mon, Dec 22, 2008 at 10:07:21AM +0200, Volkan YAZICI wrote:
 Hi,
 
 In the company, we're facing with serious disk space problems which
 is not caused by PostgreSQL, but the nature of our data. Database
 sizes are around 200-300GB, which is relatively not that much, but
 databases require strict backup policies:
 
 - Incremental backup for each day. (250GB)

 What exactly does this mean in the context of PostgreSQL?  We don't,
 as far as I've been able to determine, support this in either the
 community branch or even in any proprietary one.

I tried to mean WAL shipping in here. (You know, business terminology
for n00b boss staff.)

 - Full backup for each week of the last month. (4 x 250GB)
 - Full backup for each month of the last year. (12 x 250GB)
 
 As a result, we require a space of size (roughly)
 
   250 + 4x250 + 12x250 = 17x250 = 4250GB = 4.15TB
 
 for each server per year. Considering we have ~15 servers,
 
   15x4250 = 63750 = 62.25TB

 SATA disk space is quite cheap these days, so unless something is very
 badly wrong with your funding model, this is not really a problem.

Umm... A minority of the servers have SATA interface. (Most of 'em use
SAS drives and SAN systems.)

 Here's one outfit that will build and configure storage hardware for
 you:

 http://www.capricorn-tech.com/

Interesting I'll check it out.


Regards.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How are locks managed in PG?

2008-12-22 Thread Jonah H. Harris
On Mon, Dec 22, 2008 at 12:34 PM, Christophe x...@thebuild.com wrote:
 Playing the straight man, I have to ask: Scalability issues with locks in PG
 vs Oracle?

(in slow motion) no.  Locks aren't something particular I'd
like to discuss, this topic just came from a post upthread.

-- 
Jonah H. Harris, Senior DBA
myYearbook.com

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] lack of consequence with domains and types

2008-12-22 Thread Grzegorz Jaśkiewicz
On Mon, Dec 22, 2008 at 6:10 PM, Erik Jones ejo...@engineyard.com wrote:
 As mentioned above, by fixing the behavior to be what you're expecting
 you'd be breaking the defined behavior of ALTER TABLE.

I don't understand. The domain's have default values, how will it
break alter table ? Please explain.



-- 
GJ

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Backup Policy Disk Space Issues

2008-12-22 Thread Craig Ringer
Volkan YAZICI wrote:
 On Mon, 22 Dec 2008, David Fetter da...@fetter.org writes:
 On Mon, Dec 22, 2008 at 10:07:21AM +0200, Volkan YAZICI wrote:
   15x4250 = 63750 = 62.25TB
 SATA disk space is quite cheap these days, so unless something is very
 badly wrong with your funding model, this is not really a problem.
 
 Umm... A minority of the servers have SATA interface. (Most of 'em use
 SAS drives and SAN systems.)

Yes... but you can buy new SATA storage enclosures or storage servers.
SATA storage enclosures with SAS and/or Fibre Channel interfaces to the
host exist, and are suitable for exactly this sort of bulk
low-performance archival storage role. You have enough data that you
should be expecting to spend a bit on backups I'm afraid.

SATA storage arrays might not be ideal for your long-term storage full
backups, but they're perfect for the storage of WAL archives 
snapshots, and for the shorter-lived backups that you'll periodically
rotate out.

I built an 8TB storage server for the (small) company I work for at a
pitiful cost to ensure that we always had at least two versions of all
backups in storage that was reliable*, immediately accessible, and
encrypted in case of theft. It's not the only backup mechanism, but it's
the main one and by its self is adequate for all but the most critical
data. It's hard to overemphasise the benefits its had in terms of
improved backup reliability and quick access to backups.

About 100tb, which is about what I'd plan for in your case, is ... more
expensive.  That said, with redundancy within each enclosure and between
them it'd be a pretty solid way to store your backups. It helps that you
may not want to store your long-term archival backups on SATA arrays,
and it's also not clear to what extent you've investigated options for
reducing your backup sizes in the first place. 40-50 TB is not an
unreasonable amount of storage to pick up in the form of arrays of large
external SATA enclosures.

In particular, if you're backing up the database cluster at the file
system level, you might want to look into using dumps for your
longer-lived backups instead. For one thing, a compressed dump tends to
be a LOT smaller then a filesystem-level cluster backup of a Pg cluster,
and for another you protect yourself against most forms of undiscovered
corruption in the cluster.

If you do go for SATA storage, avoid systems that rely on SATA
multiplexers if possible. They're REALLY slow, and are particularly
awful in RAID environments. Given that alternatives that have many SATA
interfaces and a single SAS port for the host interface exist, as do
internally RAID-ed Fibre Channel options, multiplexer based systems
don't seem worth it.

* with RAID and proper array scrubbing on a server attached to a UPS
it's WAY more reliable than the previous DDS-4 DAT backups. It also has
the advantage of not needing five or six tapes per day and operating
completely unattended, so risk of human error is drastically reduced.

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Using the wrong index (very suboptimal), why?

2008-12-22 Thread Tom Lane
Shane Wright shane.wri...@edigitalresearch.com writes:
 It's not that it isn't using any index (so enable_seqscan=off doesn't
 help), it's that the index it picks is suboptimal.

 The query is based on 3 of the table columns - there is an index on all
 three, but it prefers to use an index on just two of them, then
 filtering by the 3rd (EXPLAIN output is below - showing the query as
 well).

Well, that isn't ipso facto a stupid choice to make.  The 3-column
index is larger and slower to search, so it should only use it if
there's a substantial reduction in the number of heap rows to be
visited.  You say that there is, but the estimated costs in the EXPLAINs
you provide sure don't indicate that the planner thinks so.  You need to
look into what the estimated vs actual rowcounts are for just the
two-column condition (ie, where nid = something AND iid1 = something).
Getting those estimates to match reality is the key to getting a sane
plan choice here.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general