[HACKERS] SET WITHOUT OIDS and VACUUM badness?

2004-01-21 Thread Christopher Kings-Lynne
This is what we did:

0. BEGIN;

1. ALTER TABLE ... SET WITHOUT OIDS

2. A bunch of things are selected out of this table and inserted into 
another (using INSERT ... SELECT)

3. An index is created on a timestamp field on this table

4. Then there's an update on a related table, that selects stuff from 
this table.

5. Renames a column

6. Drops a constraint

7. Adds a foreign key

8. Drops 8 columns

9. Drops 2 indexes

10. Drops 3 triggers

11. Then a tsearch 'txtidx' field is updated, and the update is cancelled 
halfway through

12. ROLLBACK;

13. VACUUM FULL forums_posts;

Then we get thousands of these:

WARNING:  relation forums_posts TID 22763/10: OID is invalid
WARNING:  relation forums_posts TID 22763/11: OID is invalid
WARNING:  relation forums_posts TID 22763/12: OID is invalid
WARNING:  relation forums_posts TID 22763/13: OID is invalid
WARNING:  relation forums_posts TID 22763/14: OID is invalid
WARNING:  relation forums_posts TID 22763/15: OID is invalid
WARNING:  relation forums_posts TID 22763/16: OID is invalid
WARNING:  relation forums_posts TID 22763/17: OID is invalid
WARNING:  relation forums_posts TID 22764/1: OID is invalid
WARNING:  relation forums_posts TID 22764/2: OID is invalid
WARNING:  relation forums_posts TID 22764/3: OID is invalid
WARNING:  relation forums_posts TID 22764/4: OID is invalid
This seems to be reproducible...

Chris



Re: [HACKERS] SET WITHOUT OIDS and VACUUM badness?

2004-01-21 Thread Dennis Bjorklund
On Wed, 21 Jan 2004, Christopher Kings-Lynne wrote:

> This seems to be reproducible...

Here is a smaller example that shows the problem:

CREATE TABLE foo (a INT);

BEGIN;
  ALTER TABLE foo SET WITHOUT OIDS;
  INSERT INTO foo values (5);
ROLLBACK;

VACUUM FULL foo;

It's easy to guess what is causing this, but I'll leave that to the person 
who wants to fix it.

-- 
/Dennis Björklund




Re: [HACKERS] SET WITHOUT OIDS and VACUUM badness?

2004-01-21 Thread Gavin Sherry
On Wed, 21 Jan 2004, Christopher Kings-Lynne wrote:

> This is what we did:
>
> 0. BEGIN;
>
> 1. ALTER TABLE ... SET WITHOUT OIDS
>
> 12. ROLLBACK;
>
> 13. VACUUM FULL forums_posts;

The problem here is that this conditional doesn't take into account the
change in state which the above transaction causes:

if (onerel->rd_rel->relhasoids &&
    !OidIsValid(HeapTupleGetOid(tuple)))

Tuples inserted after step one have no (valid) OID. However, since we
roll back, the change to pg_class.relhasoids = 'f' is rolled back. The
only solution I can think of is removing the test or storing relhasoids as
a per-tuple flag (argh).
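
To see the mismatch directly, here is the same recipe as Dennis's minimal
example elsewhere in this thread, annotated with pg_class checks (a sketch;
it assumes the current default of CREATE TABLE implying WITH OIDS):

CREATE TABLE foo (a INT);                                -- relhasoids = 't'
BEGIN;
  ALTER TABLE foo SET WITHOUT OIDS;
  SELECT relhasoids FROM pg_class WHERE relname = 'foo'; -- 'f' inside the xact
  INSERT INTO foo VALUES (5);                            -- tuple stored with no OID
ROLLBACK;
SELECT relhasoids FROM pg_class WHERE relname = 'foo';   -- 't' again after rollback
VACUUM FULL foo;                                         -- emits the "OID is invalid" warnings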

Gavin



Re: [HACKERS] SET WITHOUT OIDS and VACUUM badness?

2004-01-21 Thread Gavin Sherry
On Wed, 21 Jan 2004, Gavin Sherry wrote:

> On Wed, 21 Jan 2004, Christopher Kings-Lynne wrote:
>
> > This is what we did:
> >
> > 0. BEGIN;
> >
> > 1. ALTER TABLE ... SET WITHOUT OIDS
> >
> > 12. ROLLBACK;
> >
> > 13. VACUUM FULL forums_posts;
>
> The problem here is that this conditional doesn't take into account the
> change in state which the above transaction causes:
>
> if (onerel->rd_rel->relhasoids &&
>     !OidIsValid(HeapTupleGetOid(tuple)))
>
> Tuples inserted after step one have no (valid) OID. However, since we
> roll back, the change to pg_class.relhasoids = 'f' is rolled back. The
> only solution I can think of is removing the test or storing relhasoids as
> a per-tuple flag (argh).

What am I talking about? Can't we test for:

(tuple)->t_infomask & HEAP_HASOID

Instead of:

onerel->rd_rel->relhasoids

Gavin



Re: [HACKERS] Slow query - index not used

2004-01-21 Thread Michael Brusser
> What's the actual distribution of values in these columns?
> Are you searching for values that are particularly common
> or uncommon?

This column always has a predefined set of values.
Usually the app would search for one of the
existing values.

---
Total records:  74654
---
    nt_note_name    | count
--------------------+-------
 Collection         | 10068
 Component          |  1200
 CustTicket         | 15009
 Deliver            |  1201
 Download           |  1999
 GroupProv          |   464
 IP_News            |  5950
 IP_Publish_Request |  4000
 IP_Usage           |  2000
 KnowledgeBase      | 15002
 LevelInfo          |    10
 OtherParam         |  4000
 Request            |  4501
 TestMethod         |  4050
 VerTech            |  4000
 Version            |  1200
---

I started from scratch: took the DEFAULT_STATISTICS_TARGET parameter out of
the config file, restarted the db, ran vacuum analyze, then got these statistics:


# explain select count (1) from note_links_aux
 where nt_note_name = 'KnowledgeBase';

 Aggregate  (cost=1982.68..1982.68 rows=1 width=0)
   ->  Seq Scan on note_links_aux  (cost=0.00..1970.18 rows=5002 width=0)
         Filter: (nt_note_name = 'KnowledgeBase'::character varying)


# explain select count (1) from note_links_aux
 where nt_note_name = 'OtherParam';

 Aggregate  (cost=1984.78..1984.78 rows=1 width=0)
   ->  Seq Scan on note_links_aux  (cost=0.00..1970.18 rows=5840 width=0)
         Filter: (nt_note_name = 'OtherParam'::character varying)


# explain select count (1) from note_links_aux
 where nt_note_name = 'LevelInfo';

 Aggregate  (cost=58.91..58.91 rows=1 width=0)
   ->  Index Scan using nla_nt_note_name_fk_i on note_links_aux
       (cost=0.00..58.87 rows=15 width=0)
         Index Cond: (nt_note_name = 'LevelInfo'::character varying)


# explain select count (1) from note_links_aux
 where nt_note_name = 'NoSuchThing';

 Aggregate  (cost=5.83..5.83 rows=1 width=0)
   ->  Index Scan using nla_nt_note_name_fk_i on note_links_aux
       (cost=0.00..5.83 rows=1 width=0)
         Index Cond: (nt_note_name = 'NoSuchThing'::character varying)



So the 'rows' values are incorrect. Also, looking at the queries for
'KnowledgeBase' and 'OtherParam' - does a seq. scan make sense?

I mean, 'rows' is about 5000 records out of the total of 75000
records in the table. That ratio does not seem high enough to assume
an index scan won't be beneficial. And even if we consider the real
number of records - 15000 - that is still only 20% of the total.
Would an index scan still be faster?

Sorry if I put more info here than you need.
Thanks,
Mike.





Re: [HACKERS] Slow query - index not used

2004-01-21 Thread Dennis Bjorklund
On Wed, 21 Jan 2004, Michael Brusser wrote:

> So the 'rows' values are incorrect.

You can increase the statistics-gathering for that column with ALTER TABLE 
and probably get better estimates.
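
For example (a sketch against the column from this thread; 100 is just an
arbitrary larger target):

ALTER TABLE note_links_aux ALTER COLUMN nt_note_name SET STATISTICS 100;
ANALYZE note_links_aux;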

> Also, looking at the queries for 'KnowledgeBase'
> and 'OtherParam' - does a seq. scan make sense?
>
> I mean, 'rows' is about 5000 records out of the total of 75000
> records in the table.

It doesn't sound so strange to me. I don't know exactly what limits 
postgresql uses, but it probably needs to fetch every page in the table to 
find all those 5000 records. If it has to do that, then the index scan 
would not help much (it might even make things slower).

It's easy to test what happens if it does an index scan instead of the seq. 
scan: just do SET enable_seqscan TO false; before you try the query. Then 
you can compare the times with and without the index scan.
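
For example, a sketch using one of the queries from upthread:

SET enable_seqscan TO false;
EXPLAIN ANALYZE SELECT count(1) FROM note_links_aux
 WHERE nt_note_name = 'KnowledgeBase';
SET enable_seqscan TO true;   -- put the default back afterwards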

Remember, even if it finds a row in the index, it still has to fetch the 
actual row from the table as well. So if it needs to fetch all pages from the 
table, the total amount of I/O is all of the table plus the relevant part of 
the index. Whether it's faster or not depends on such things as whether the 
data is already cached in memory. Setting effective_cache_size correctly lets 
postgresql take into account how much file cache you have, which can affect 
the plan.

-- 
/Dennis Björklund




Re: [HACKERS] Allow backend to output result sets in XML

2004-01-21 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Brian Moore <[EMAIL PROTECTED]> writes:
> > i would like to begin work on the TODO item
> >   Allow backend to output result sets in XML
>
> I am not sure why it's phrased that way --- surely the code to hack on
> is the client side, not the backend.  Otherwise you need a protocol
> revision to make this happen, which implies hacking *both* ends.

Presumably libpq would continue to use the binary protocol, but other clients
could bypass libpq and just stream ascii xml queries.

Personally I don't see any point in xml, but if there was a standard query
protocol then a client could send queries to any database that supported it
without using any libraries. That might be useful. Of course you could do that
without xml, but people seem to get more excited about complying with
standards when they invoke xml.

> psql already has some code to output results as HTML tables; I'd think
> adding functionality in that vicinity would be the way to go.

That could also be useful, mainly in that it could include the data from the
query as well as some metadata, allowing import tools for programs like
spreadsheets to do more intelligent things with the data than they currently can.

-- 
greg




Re: [HACKERS] Allow backend to output result sets in XML

2004-01-21 Thread Merlin Moncure
Greg Stark wrote:
> Personally I don't see any point in xml, but if there was a standard query
> protocol then a client could send queries to any database that supported it
> without using any libraries. That might be useful. Of course you could do
> that without xml, but people seem to get more excited about complying with
> standards when they invoke xml.

hm.  I have to deal with xml quite frequently because I do a lot of DX
with the gov't and other entities that are rapidly standardizing on xml.

I like Oracle's approach to xml using object relational mappings to
allow composition of documents server side based on natural data
relationships.  The XML document becomes something like a specialized
view.  It would save me tons of app-level coding if the server could do
this for me.

Since postgres is already fairly Oracle-ish in design, IMO this is
definitely the way to go (XQuery = Insanity.).  A FE/BE protocol
revision would be useful but not necessary...the XML doc could be
returned as a scalar.

Right now I think all xml processing is done in app-level code, because
the server (due to limitations of sql) is usually unable to return data
the way you want it...so simply adding xml output to psql would be
fairly useless for most real tasks (if it weren't, someone would have
done it a long time ago).  Also, contrib/xml can already handle most of
the simple things.

Merlin



Re: [HACKERS] Allow backend to output result sets in XML

2004-01-21 Thread Joshua D. Drake
Greg Stark wrote:
> Tom Lane <[EMAIL PROTECTED]> writes:
>
> > Brian Moore <[EMAIL PROTECTED]> writes:
> > > i would like to begin work on the TODO item
> > >   Allow backend to output result sets in XML
> >
> > I am not sure why it's phrased that way --- surely the code to hack on
> > is the client side, not the backend.  Otherwise you need a protocol
> > revision to make this happen, which implies hacking *both* ends.
>
> Presumably libpq would continue to use the binary protocol, but other clients
> could bypass libpq and just stream ascii xml queries.

I would think that you would still use libpq, with the binary protocol
understanding an xml header request of some sort??

J

> Personally I don't see any point in xml, but if there was a standard query
> protocol then a client could send queries to any database that supported it
> without using any libraries. That might be useful. Of course you could do
> that without xml, but people seem to get more excited about complying with
> standards when they invoke xml.
>
> > psql already has some code to output results as HTML tables; I'd think
> > adding functionality in that vicinity would be the way to go.
>
> That could also be useful, mainly in that it could include the data from the
> query as well as some metadata, allowing import tools for programs like
> spreadsheets to do more intelligent things with the data than they currently can.


-- 
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL




Re: [HACKERS] Slow query - index not used

2004-01-21 Thread Tom Lane
Michael Brusser <[EMAIL PROTECTED]> writes:
> So the 'rows' values are incorrect.

You sound like you are expecting them to be exact.  They're just
estimates.  They're all plenty close enough for planning purposes,
except maybe the one for 'KnowledgeBase' is a little further off
than I would have expected.  That could just be a chance sampling
thing --- if you rerun ANALYZE and then check again, how much does that
estimate move around in a few tries?

> I mean, 'rows' is about 5000 records out of the total of 75000
> records in the table. That ratio does not seem high enough to assume
> an index scan won't be beneficial.

You're mistaken.  You didn't give any indication of how wide the rows
are in this table, but supposing for purposes of argument that they're
roughly 100 bytes apiece, there would be room for 80 rows on each 8K
page of the table.  A scan that needs to visit 1/15th of the table is
statistically certain to read nearly every page of the table, many of
them multiple times.  (This assumes the needed rows are randomly
scattered, which wouldn't be true if the table is clustered on the
index, but you didn't say anything about having done that.)  So an
indexscan doesn't save any I/O, and may cost I/O if some of the pages
fall out of buffer cache between visits.  Add to that the penalty for
random instead of sequential reads, and the cost to read the index
itself, and you come out way behind a seqscan.
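
To put rough numbers on that, under the same 100-bytes-per-row assumption:
74654 rows / 80 rows per page is about 933 pages, and the chance that a given
page holds none of 5000 randomly scattered matches is (1 - 1/933)^5000, which
is around half a percent.  So the indexscan ends up reading essentially every
page of the table anyway, just in random order.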

This all assumes that I/O is far more expensive than CPU cycles.
If you have a database that's small enough to fit in RAM then the cost
of checking the WHERE clauses at each tuple could become the dominant
factor.  If that's your situation you should look at altering the
planner's cost parameters --- in particular lower random_page_cost and
increase the various CPU cost settings.
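
For instance, a sketch (the values are illustrative, not recommendations):

SET random_page_cost = 2;           -- default 4; lower favors indexscans
SET cpu_tuple_cost = 0.02;          -- default 0.01
SET cpu_operator_cost = 0.005;      -- default 0.0025
SET effective_cache_size = 50000;   -- in 8K pages; size it to your file cache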

With the default cost settings, the planner will not pick an indexscan
if it thinks that more than a couple percent of the table needs to be
visited.  (I think the breakeven point is dependent on table size, but
don't have the details in my head right now.)  You can experiment with
EXPLAIN ANALYZE and the various enable_xxx settings to see if it's right
or not in your environment.

See the pgsql-performance archives for more discussion.

regards, tom lane



Re: [HACKERS] Allow backend to output result sets in XML

2004-01-21 Thread Bort, Paul
Please forgive me if this is silly, but if you wanted XML from the server,
couldn't you just write a PL/Perl untrusted function that takes a SELECT
statement as its parameter, and returns a single scalar containing the XML?
- The XML:: modules in Perl help with the XML formatting
- DBD::PgSPI could be handed the query as-is
- No change to BE/FE or wire protocols
- No impact on people who don't want it
- Probably works across versions with minimal fuss

Returning a simple XML structure with column names and rows should only take
a few lines. (I'd write an example if I knew XML:: better.)

I'll go back to lurking now. Thanks all for the great database! 




Re: [HACKERS] Allow backend to output result sets in XML

2004-01-21 Thread Peter Eisentraut
Tom Lane wrote:
> Brian Moore <[EMAIL PROTECTED]> writes:
> > i would like to begin work on the TODO item
> >   Allow backend to output result sets in XML
>
> I am not sure why it's phrased that way --- surely the code to hack
> on is the client side, not the backend.  Otherwise you need a
> protocol revision to make this happen, which implies hacking *both*
> ends.
>
> psql already has some code to output results as HTML tables; I'd
> think adding functionality in that vicinity would be the way to go.

Converting a libpq result set (or a JDBC result set or ...) to an XML 
document should be a trivial string concatenation job that anyone can 
implement in half an hour.  The more interesting questions are: what 
XML schema do you want to use, and why?  What do you want to do with the 
XML in the first place?  Would a streaming interface be better?  Do 
you want a text document or a preparsed structure?  What good would a, 
say, libpq implementation be if it's more work to make a wrapper in one 
of the other language bindings than to implement it from scratch there?
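
Indeed, a crude version of that concatenation can even be done in plain SQL
(a sketch, with a hypothetical person(name, age) table and no XML-escaping
of the values):

SELECT '<row>'
    || '<name>' || name || '</name>'
    || '<age>'  || age::text || '</age>'
    || '</row>'
FROM person;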

I think output XML is just buzz.  Give us a real use scenario and an 
indication that a majority also has that use scenario (vs. the other 
ones listed above), then we can talk.




[HACKERS] testing mail relays ...

2004-01-21 Thread Marc G. Fournier

we lost a mail relay, so testing to make sure the configuration is still
working after replacing it with other local machines we have more control
over ...

 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



Re: [HACKERS] Allow backend to output result sets in XML

2004-01-21 Thread Merlin Moncure
Peter Eisentraut wrote:
> I think output XML is just buzz.  Give us a real use scenario and an
> indication that a majority also has that use scenario (vs. the other
> ones listed above), then we can talk.

Consider:

create table person (name varchar primary key, age int);
create table account (number varchar primary key, name varchar
references person);
insert into person values ('Fred', 35);
insert into person values ('Barney', 37);
insert into account values ('1234', 'Fred');
insert into account values ('5678', 'Fred');
insert into account values ('abcd', 'Barney');

test=# select * from person into xml natural;
<?xml version="1.0" encoding="UTF-8" ?>
<result>
  <row n="1">
    <person>
      <name>Fred</name>
      <age>35</age>
      <account>
        <number>1234</number>
      </account>
      <account>
        <number>5678</number>
      </account>
    </person>
  </row>
  <row n="2">
  [...]

now consider:
select * from person into xml natural 
  namespace is 'some_uri'
  schema is 'person.xsd';

this returns the result set above, but with schema and namespace
declarations included.  Of course, there is tons of complexity hiding in
there, but is this worth considering?

Merlin






Re: [HACKERS] CTTAS w/ DISTINCT ON crashes backend

2004-01-21 Thread Tom Lane
The crash I'm getting can be boiled down to this:

regression=# create table fooey(f1 int) without oids;
CREATE TABLE
regression=# insert into fooey values(11);
INSERT 0 1
regression=# create temp table fooey2 as select distinct * from fooey;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!

I'm getting an Assert failure, which no doubt detects the problem much
sooner than you were doing.  The problem is in adding OIDs to rows that
initially did not have 'em when returned from the SELECT DISTINCT plan.
So your best immediate workaround is to create the first temp table with
oids, or create the second one without.
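
In other words, something along these lines (a sketch that relies on CREATE
TABLE defaulting to WITH OIDS in this era):

create table fooey(f1 int);          -- no WITHOUT OIDS this time
insert into fooey values(11);
create temp table fooey2 as select distinct * from fooey;   -- both have OIDs now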

regards, tom lane



Re: [HACKERS] CTTAS w/ DISTINCT ON crashes backend

2004-01-21 Thread Mike Mascari
Tom Lane wrote:

> The crash I'm getting can be boiled down to this:
>
> regression=# create table fooey(f1 int) without oids;
> CREATE TABLE
> regression=# insert into fooey values(11);
> INSERT 0 1
> regression=# create temp table fooey2 as select distinct * from fooey;
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !
>
> I'm getting an Assert failure, which no doubt detects the problem much
> sooner than you were doing.  The problem is in adding OIDs to rows that
> initially did not have 'em when returned from the SELECT DISTINCT plan.

Okay.

> So your best immediate workaround is to create the first temp table with
> oids, or create the second one without.

Thanks!

Mike Mascari


