date:20060916

Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-16 Thread Stefan Kaltenbrunner

[already sent a variant of that yesterday but it doesn't look like it
made it to the list]

Tom Lane wrote:
> Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Apparently we've made the planner a bit too optimistic about the savings
>>> that can be expected from repeated indexscans occurring on the inside of
>>> a join.
> 
>> effective_cache_size was set to 10GB(my fault for copying over the conf
>> from a 16GB box) during the run - lowering it just a few megabytes(!) or
>> to a more realistic 6GB results in the following MUCH better plan:
>> http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
> 
> Interesting.  It used to be that effective_cache_size wasn't all that
> critical... what I think this report is showing is that with the 8.2
> changes to try to account for caching effects in repeated indexscans,
> we've turned that into a pretty significant parameter.

took me a while due to hardware issues on my testbox - but there are new
results(with 6GB for effective_cache_size) up at:

http://www.kaltenbrunner.cc/files/5/

there are still a few issues with the validity of the run like the rf
tests not actually being done right - but lowering effective_cache_size
gave a dramtic speedup on Q5,Q7 and Q8.

that is the explain for the 4h+ Q9:

http://www.kaltenbrunner.cc/files/analyze_q9.txt

increasing the the statistic_target up to 1000 does not seem to change
the plan btw.

disabling nested loop leads to the following (4 times faster) plan:

http://www.kaltenbrunner.cc/files/analyze_q9_no_nest.txt

since the hash-joins in there look rather slow (inappropriate hashtable
set up due to the wrong estimates?) I disabled hash_joins too:

http://www.kaltenbrunner.cc/files/analyze_q9_no_nest_no_hashjoin.txt

and amazingly this plan is by far the fastest one in runtime (15min vs
4,5h ...) except that the planner thinks it is 20 times more expensive ...


Stefan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] GIN documentation

2006-09-16 Thread David Fuhry


Teodor,

   Attached is a diff -c against your original gindocs patch.  I did my 
best not to change any of the semantics.  My changes no doubt overlap & 
conflict with those Jeff Davis sent you earlier, so consider both of our 
diffs.


Thanks,

Dave Fuhry

Teodor Sigaev wrote:

Patch adds GIN documentation and slightly improves GiST docs.

Somebody of native English speakers, pls, check the text... Thank you.





---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org
*** gindocs.orig	2006-09-17 00:21:38.0 -0400
--- gindocs	2006-09-17 00:57:12.0 -0400
***
*** 22,28 
  ! 	  
  ! 	  
  ! 	   
! ! 		Soft upper limit of the size of the returned set by GIN index. For more
  ! 		information see .
  ! 	   
  ! 	  
--- 22,28 
  ! 	  
  ! 	  
  ! 	   
! ! 		Soft upper limit of the size of the set returned by the GIN index. For more
  ! 		information see .
  ! 	   
  ! 	  
***
*** 88,95 
  +  
  +GIN stands for Generalized Inverted Index.  It is
  +an index structure storing a set of (key, posting list) pairs, where
! +'posting list' is a set of rows in which the key occurs. The
! +row may contains a lot of keys.
  +  
  + 
  +  
--- 88,95 
  +  
  +GIN stands for Generalized Inverted Index.  It is
  +an index structure storing a set of (key, posting list) pairs, where
! +'posting list' is a set of rows in which the key occurs. Each
! +row may contain many keys.
  +  
  + 
  +  
***
*** 178,184 
  +  
  +   
  + 	   Returns an array of keys of the query to be executed. n contains
! + 	   strategy number of operation (see ).
  + 	   Depending on n, query may be different type.
  +   
  +  
--- 178,184 
  +  
  +   
  + 	   Returns an array of keys of the query to be executed. n contains
! + 	   the strategy number of the operation (see ).
  + 	   Depending on n, query may be different type.
  +   
  +  
***
*** 188,196 
  +  bool consistent( bool check[], StrategyNumber n, Datum query)
  +  
  +   
! + 	   Returns TRUE if indexed value satisfies query qualifier with strategy n 
  + 	   (or may satisfy in case of RECHECK mark in operator class). 
! + 	   Each element of the check array is TRUE if indexed value has a 
  + 	   corresponding key in the query: if (check[i] == TRUE ) the i-th key of 
  + 	   the query is present in the indexed value.
  +   
--- 188,196 
  +  bool consistent( bool check[], StrategyNumber n, Datum query)
  +  
  +   
! + 	   Returns TRUE if the indexed value satisfies the query qualifier with strategy n 
  + 	   (or may satisfy in case of RECHECK mark in operator class). 
! + 	   Each element of the check array is TRUE if the indexed value has a 
  + 	   corresponding key in the query: if (check[i] == TRUE ) the i-th key of 
  + 	   the query is present in the indexed value.
  +   
***
*** 209,218 
  +Create vs insert
  +
  + 	
! + 	 In most cases, insertion into GIN index is slow enough
! + 	 due to a lot keys should be inserted per one value. So, for bulk upload
! + 	 data in table it will be useful to drop index and create it
! + 	 after finishing upload.
  + 	
  +
  +   
--- 209,218 
  +Create vs insert
  +
  + 	
! + 	 In most cases, insertion into a GIN index is slow
! + 	 due to the likelihood of many keys being inserted for each value. So, for bulk insertions into a
! + 	 table it is advisable to to drop the GIN index and recreate it
! + 	 after finishing bulk insertion.
  + 	
  +
  +   
***
*** 221,227 
  +gin_fuzzy_search_limit
  +
  + 	
! + 	 The primary goal of development GIN indices was 
  + 	 support for highly scalable, full-text search in 
  + 	 PostgreSQL and there are often situations when 
  + 	 a full-text search returns a very large set of results.  Since reading 
--- 221,227 
  +gin_fuzzy_search_limit
  +
  + 	
! + 	 The primary goal of developing GIN indices was 
  + 	 support for highly scalable, full-text search in 
  + 	 PostgreSQL and there are often situations when 
  + 	 a full-text search returns a very large set of results.  Since reading 
***
*** 232,238 
  + 	
  + 	 Such queries usually contain very frequent words, so the results are not 
  + 	 very helpful. To facilitate execution of such queries 
! + 	 GIN has a configurable  soft upper limit of the size 
  + 	 of the returned set, determined by the 
  + 	 gin_fuzzy_search_limit GUC variable.  It is set to 0 by
  + 	 default (no limit).
--- 232,238 
  + 	
  + 	 Such queries usually contain very frequent words, so the results are not 
  + 	 very helpful. To facilitate execution of such queries 
! + 	 GIN has a configurable soft upper limit o

Re: [HACKERS] Timezone List

2006-09-16 Thread Tom Lane

"Magnus Hagander" <[EMAIL PROTECTED]> writes:
>> Any view over the full timezone names should also include the
>> corresponding data from zone.tab in the timezone library source.

> Just noticed this mail, so that's not included in my patch.

BTW, now that the view is in, I can't help noticing that it shows 550
different zone names, while there are only 392 entries in the zone.tab
file.  I conclude that the zic people don't take maintenance of zone.tab
very seriously, and hence that we probably shouldn't rely on it.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Mid cycle release?

2006-09-16 Thread Tom Lane

"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> Then we should change autovacuum so that it stays out of the way when
> tables are being vacuumed frequently enough via an external means.

What makes you think it doesn't do that already?  Of course, it has its
own ideas about what "frequently enough" is, but it won't re-vacuum a
table that's been vacuumed within that interval.

While I personally don't really want autovac on in my development
environment, I find it hard to deny the argument that it ought to be
on by default.  If you know enough to set up a cron job to do your own
vacuuming schedule, you *certainly* know enough to turn off autovac
if you don't want it, or better dial it down to the point where it's
just an emergency backstop for your cron job.  If you don't know enough
to turn off autovac, then you need it on.

Also, as noted already, having autovac on by default will encourage the
developers to work out the remaining kinks in it ;-)

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Opinion about macro for the uuid datatype.

2006-09-16 Thread Tom Lane

Gevik Babakhani <[EMAIL PROTECTED]> writes:
> I was wondering if I should go ahead and add a macro datatype like the
> SERIAL, only this time for the uuid.

This assumes a fact not in evidence, which is that we're going to accept
a uuid-generation function as part of core.  AFAIK the only reasonably
non-contentious part of this proposal is the ability to *store* uuids.
Generating new ones introduces a host of portability and other issues.

Considering the amount of pain involved in supporting SERIAL in the
parser, pg_dump, etc, I'd say that adding the above is a pretty certain
route to getting your patch rejected as too invasive.  If, three or four
versions down the road, large numbers of people are using uuid with the
same generation function, *then* it might be time to think about
introducing a macro type.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [pgsql-www] [HACKERS] Developer's Wiki

2006-09-16 Thread Tom Lane

"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> Gregory Stark wrote:
>> A TODO list people can freely add stuff to is precisely what would make it
>> useful. It would have things we don't already know.

> I am just going to hope that you are kidding about this one.

Fortunately, none of the real developers would have to pay any attention
to any such page ... and you can bet they wouldn't.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] minor feature request: Secure defaults during function creation

2006-09-16 Thread Tom Lane

"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> On Thu, Sep 14, 2006 at 10:24:43AM -0400, Pascal Meunier wrote:
>> My request is to allow changing default permissions for function creation, a
>> la "umask", or at least not give PUBLIC execute permissions by default.

> Hrm... do we have any other objects that default to granting permissions
> on creation?

Yes; see the GRANT reference page.

I'm disinclined to change it.  We've had the current behavior since we
introduced ACLs for functions at all, and there have been very few
complaints.  I think we'd get a lot more complaints if we denied public
EXECUTE by default.  One reason is that given the way pg_dump and
default permissions work, any such change would break existing
applications, because an existing schema loaded into a new backend
would suddenly have different permissions behavior.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Tom Lane

Hannu Krosing <[EMAIL PROTECTED]> writes:
> why not go all the way, and do utf-7 encoded header if hi bit is set ?
> or just always have an utf-8 encoded header.

That definition is (a) very expensive to scan, and (b) useless for
anything except utf-8 encoded text.  Whatever mechanism we select should
be more flexible than that (eg, should work for inet values).

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] The enormous s->childXids problem

2006-09-16 Thread Tom Lane

Gregory Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> The real question is why does the subtransaction actually assign itself
>> an XID --- a simple RETURN NEXT operation ought not do that, AFAICS.

> I suspect the answer to that is the same as the answer to what's actually
> creating the subtransaction. plperl_return_next doesn't. I think something
> must be doing an actual SPI query, not just a return next.

The other question on the table is why it didn't respond to QueryCancel
in a reasonable amount of time.  I'd really like to see a complete test
case for this problem ...

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] New version of money type

2006-09-16 Thread Stephen Frost

* Gregory Stark ([EMAIL PROTECTED]) wrote:
> In any case I think Jim was suggesting this be handled internally to the
> numeric data type which wouldn't cause this problem. However I'm not sure
> anything has to be done. A numeric is an array of 16 bit integers, so anything
> under 64k *is* stored just as an integer. 

Right, which is fine, but for >64k (Actually, isn't it 10,000?),
operations could be done in 1 step using 64bit ints instead of in
multiple steps.  On systems with fast 64bit integer ops (quite a few of
them out there these days...) this seems likely to be an improvement in
performance.

Of course, there's the question of how much of an improvement, how
complicated it makes the code, backwards-compatibility issues, and what
to do about the binary in/out operations.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Mid cycle release?

2006-09-16 Thread Joshua D. Drake


Jim C. Nasby wrote:

On Thu, Sep 14, 2006 at 02:36:22PM -0700, Joshua D. Drake wrote:
But on a serious note, the problem I run into is exactly the opposite. 
Someone will turn on autovacuum because they thought it was a good idea 
and for their work load, it isn't. So instead of creating new and 
interesting ways to allow their database to be more efficient, I am 
dealing with snafu's created by my own community.
 
Then we should change autovacuum so that it stays out of the way when

tables are being vacuumed frequently enough via an external means.


ALTER TABLE foo DISABLE autovacuum? :)

Joshua D. Drake


--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Mid cycle release?

2006-09-16 Thread Jim C. Nasby

On Thu, Sep 14, 2006 at 02:36:22PM -0700, Joshua D. Drake wrote:
> But on a serious note, the problem I run into is exactly the opposite. 
> Someone will turn on autovacuum because they thought it was a good idea 
> and for their work load, it isn't. So instead of creating new and 
> interesting ways to allow their database to be more efficient, I am 
> dealing with snafu's created by my own community.
 
Then we should change autovacuum so that it stays out of the way when
tables are being vacuumed frequently enough via an external means.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [pgsql-www] [HACKERS] Developer's Wiki

2006-09-16 Thread Joshua D. Drake


Gregory Stark wrote:

Josh Berkus  writes:

The other lesson of Wikipedia is that maintaining wiki quality for a generally 
editable wiki requires a full-time dedicated staff.   We don't even have any 
volunteers who have 4 hours/week to commit to cleaning up the wiki, unless 
you're volunteering.


Bullshit. Most pages on wikipedia don't require any attention from such staff.


This does not help your argument.


The wiki has been sitting there for two weeks and hasn't had any problems.
It's already getting more attention and updates than the techdocs wiki which
still has articles up from 2001 that are no longer relevant and in some cases
are actively misleading.


Techdocs is a different problem all together. Josh has already mentioned 
some problems with it. I can mention more.


1. It isn't easy to login
2. It is even harder to create a login
3. There is no creation of login for most people because they don't know 
they have to go to the community portion of the www site to get to it.


I am sure their are other problems on the inside, I haven't actually 
ever logged in ;)




Putting barriers up blocking people trying to help isn't any guarantee of
quality. What it does guarantee is irrelevance.


Again you argue without actual evidence. Wikipedia is a success it is 
however it does have quite a bit of problems as well. A simple but very 
straightforward signup mechanism isn't going to stop most people.




Frankly that's what we have today and that's why it's useless. Things only get
put on the list when everyone who cares already knows what has to be done and
then nobody looks at it because there's nothing there they don't already know
about.


Anytime I have asked for something to be put on the TODO list, it is. As 
long as I can provide a practical reason as to what it is and why it 
would be good.


That part of the TODO works just fine.

Now, do I think there is improvement to be made? Of course but the 
current TODO is far from useless.





A TODO list people can freely add stuff to is precisely what would make it
useful. It would have things we don't already know.



I am just going to hope that you are kidding about this one.

Sincerely,

Joshua D. Drake



--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] The enormous s->childXids problem

2006-09-16 Thread Gregory Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> The real question is why does the subtransaction actually assign itself
> an XID --- a simple RETURN NEXT operation ought not do that, AFAICS.
> What is it you're doing in there that changes the database?

I suspect the answer to that is the same as the answer to what's actually
creating the subtransaction. plperl_return_next doesn't. I think something
must be doing an actual SPI query, not just a return next.

-- 
greg

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

[HACKERS] Opinion about macro for the uuid datatype.

2006-09-16 Thread Gevik Babakhani

The development of the uuid datatype is yet in progress...
I was wondering if I should go ahead and add a macro datatype like the
SERIAL, only this time for the uuid.

something like:
create table tbl
(
mypk SERIALGUID;
)

which creates

create table tbl
(
mypk uuid default new_guid();
)

or do you think this would be an overkill?

regards,
Gevik.





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Opinion wanted on UUID/GUID datatype output formats.

2006-09-16 Thread Jim C. Nasby

On Thu, Sep 14, 2006 at 11:32:09PM +0200, Peter Eisentraut wrote:
> Gevik Babakhani wrote:
> > As suggested in earlier discussion we provide a raw/plain output of
> > the uuid type:
> >
> > devdb=# select * from tbluuid;
> > pk|
> > --+
> >  6b13c5a1afb4dcf5ce8f8b4656b6c93c |
> >  01e40a79b55b6e226bffb577e960453d |
> > (2 rows)
> 
> The UUID standards define a single perfectly clear format, and the one 
> you show is not it.
> 
> > I was wondering if we want to have a formatting function to be able
> > to provide other common formats of the uuid/guid?
> 
> If you stick to the standard format, I don't think that will be 
> necessary.

+1. For people that care about the non-standard MSSQL format, they can
easily create their own function that will wrap it in {}.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] New version of money type

2006-09-16 Thread Gregory Stark

Theo Schlossnagle <[EMAIL PROTECTED]> writes:

> Would that pose indexing issues?  It would also mean that when joining two
> tables you'd have to handle some interesting type  conversion issues (at
> times).  We had someone accidentally create a  largish table with userid as
> "numeric" and other tables are "bigint",  it was disastrous for performance
> (joining).  I'd imagine that if the  above wasn't done cleverly, that
> performance problem would be repeated.

That used to be a problem but Tom solved it a little while back. Not a perfect
solution in that it requires lots of cross-data-type operators as the number
of data types grows but it works.

In any case I think Jim was suggesting this be handled internally to the
numeric data type which wouldn't cause this problem. However I'm not sure
anything has to be done. A numeric is an array of 16 bit integers, so anything
under 64k *is* stored just as an integer. 

Well, just an integer plus a useless exponent. I think it would be a neat
trick to normalize the exponent to the end of the last element of the mantissa
rather than the first digit so that integers don't need an exponent.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [pgsql-www] [HACKERS] Developer's Wiki

2006-09-16 Thread Gregory Stark

Josh Berkus  writes:

> The other lesson of Wikipedia is that maintaining wiki quality for a 
> generally 
> editable wiki requires a full-time dedicated staff.   We don't even have any 
> volunteers who have 4 hours/week to commit to cleaning up the wiki, unless 
> you're volunteering.

Bullshit. Most pages on wikipedia don't require any attention from such staff.
There are *millions* of pages constantly being updated something that only
works because of that dynamic. Only a small number of pages need any special
attention.

The wiki has been sitting there for two weeks and hasn't had any problems.
It's already getting more attention and updates than the techdocs wiki which
still has articles up from 2001 that are no longer relevant and in some cases
are actively misleading.

Putting barriers up blocking people trying to help isn't any guarantee of
quality. What it does guarantee is irrelevance.

> This is *particularly* true of the TODO stuff.  We simply don't want Joe User 
> adding their personal wishlist to the TODOs, and that's exactly what will 
> happen if the TODO list is world-writable.  TODOs should be items which have 
> been hashed out here on the Hackers list, and the wiki page should list the 
> specification which is the general consensus.

Frankly that's what we have today and that's why it's useless. Things only get
put on the list when everyone who cares already knows what has to be done and
then nobody looks at it because there's nothing there they don't already know
about.

A TODO list people can freely add stuff to is precisely what would make it
useful. It would have things we don't already know.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] index help for uuid datatype

2006-09-16 Thread Gevik Babakhani

I followed your advice 6 million records are inserted without any
problems :)

Thank you.

On Sat, 2006-09-16 at 14:03 -0400, Tom Lane wrote:
> Gevik Babakhani <[EMAIL PROTECTED]> writes:
> > I was doing strncmp at some point but it did not work because 
> > of the '\0'. I have created a custom comparison function and it seems to
> > work.
> 
> Perhaps you just need memcmp instead of strncmp?
> 
>   regards, tom lane
> 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] [pgsql-www] Developer's Wiki

2006-09-16 Thread Lukas Kahwe Smith


Joshua D. Drake wrote:

Gregory Stark wrote:

Josh Berkus  writes:


I was actually hoping for more feedback on the content itself. I'm
still not clear if it's supposed to be "developers only - to the
exclusion of users" or "developers only - but accessable to anyone".
It should be readable by everyone, but editable only by authorized 
users.


I think the lessons of wikipedia is precisely that you *don't* want to 
add

such barriers. You want to let people add stuff pretty much freely. That
encourages people to get involved and put up information. 


I don't agree, you should also look at the recent post and fork by one 
of wikipedia's co-founders. The developers wiki should only be edited by 
authorized users.


Now, getting authorized should be easy as reasonably possible, but 
having a wholesale editing orgy on the wiki responsible for tracking 
postgresql developer information is not a good idea.


I agree.
Banning IPs is simply not feasible.
I think a minor moderation step during the signup is little overhead and 
ensures we know who changed what etc. This is obviously not only 
important for blaming but also great for talking to people about a given 
page when it comes time to update it.


I think however there should be a "section" that is free for all. It 
should be clearly labeled with parts are free for all and which are not. 
It should be easy to move pages from one section to the other and back.


Essentially I would say the wiki should be open to anyone who signs up, 
however there should be pages that are only writeable to people inside a 
special group. I am not sure how the ACL works in the current wiki. SOme 
wikis allow you to define ACL's by page, some allow you to create 
subwikis with different ACLs etc.


regards,
Lukas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Gregory Stark

Bruce Momjian <[EMAIL PROTECTED]> writes:

> Gregory Stark wrote:
>> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> 
>> Sure, this helps with CHAR(1) but there were plen
>
> OK.  

Ooops, sorry, I guess I sent that before I was finished editing it. I'm glad
you could divine what I meant because I'm not entirely sure myself :)

> Well, if you are using TEXT, it is hard to say you are worried about
> storage size.  I can't imagine many one-byte values are stored in TEXT.

Sure, what about "Middle name or initial". Or "Apartment Number". Or for that
matter "Drive Name" on a windows box. Just because the user doesn't want to
enforce a limit on the field doesn't mean the data will always be so large.

>> Part of the reason I think the smallfoo data types may be a bright idea in
>> their own right is that the datatypes might be able to do clever things about
>> their internal storage. For instance, smallnumeric could use base 100 where
>> largenumeric uses base 1.
>
> I hardly think modifying the numeric routines to do a two different
> bases is worth it.  

It doesn't actually require any modification, it's already a #define. It may
be worth doing the work to make it a run-time parameter so we don't need to
recompile the functions twice.

I'm pretty sure it's worthwhile as far as space conservation goes. a datum
holding a value like "10" currently takes 10 bytes including the length
header:

postgres=# select sizeof('10'::numeric);
 sizeof 

 10
(1 row)

That would go down to 7 bytes with a 1-byte length header. And down to 4 bytes
with base 100. Ie, reading a table full of small numeric values would be 75%
faster.

With some clever hacking I think we could get it to go down to a single byte
with no length header just like ascii characters for integers under 128. But
that's a separate little side project.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Heikki Linnakangas

Tom Lane wrote:

Gregory Stark <[EMAIL PROTECTED]> writes:
The user would have to decide that he'll never need a value over 127 
bytes

long ever in order to get the benefit.

Weren't you the one that's been going on at great length about how
wastefully we store CHAR(1) ? Sure, this has a somewhat restricted
use case, but it's about as efficient as we could possibly get within
that use case.

I like the idea of having variable length headers much more than a new 
short character type. It solves a more general problem, and it 
compresses VARCHAR(>255) TEXT fields nicely when the actual data in the 
field is small.

I'd like to propose one more encoding scheme, based on on Tom's earlier 
proposals. The use cases I care about are:

* support uncompressed data up to 1G, like we do now
* 1 byte length word for short data.
* store typical CHAR(1) values in just 1 byte.

Tom wrote:
> * 0xxx uncompressed 4-byte length word as stated above
> * 10xx 1-byte length word, up to 62 bytes of data
> * 110x 2-byte length word, uncompressed inline data
> * 1110 2-byte length word, compressed inline data
> *  1-byte length word, out-of-line TOAST pointer

My proposal is:

00xx uncompressed, aligned 4-byte length word
010x 1-byte length word, uncompressed inline data (up to 32 bytes)
011x 2-byte length word, uncompressed inline data (up to 8k)
1xxx 1 byte data in range 0x20-0x7E
1000 2-byte length word, compressed inline data (up to 4k)
 TOAST pointer

The decoding algorithm is similar to Tom's proposal, and relies on using 
0x00 for padding.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] [PATCHES] plpgsql, return can contains any

2006-09-16 Thread Bruce Momjian


This has been saved for the 8.3 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---


Pavel Stehule wrote:
> 
> 
> >
> >"Pavel Stehule" <[EMAIL PROTECTED]> writes:
> > >> This patch doesn't seem to cope with cases where the supplied tuple has
> > >> the wrong number of columns, and it doesn't look like it's being 
> >careful
> > >> about dropped columns either.  Also, that's a mighty bizarre-looking
> > >> choice of cache memory context in coerce_to_tuple ... but then again,
> > >> why are you bothering with a cache at all for temporary arrays?
> >
> > > I am sorry, Tom. But I don't understand. I can check number of columns,
> > > ofcourse and I'll do it. What cache for temporary arrays do you mean?
> >
> >I thought that making coerce_to_tuple depend on estate->err_func was
> >pretty bizarre, and that there was no need for any "cache" at all for
> >arrays that need only live as long as the function runs.  All you are
> >saving here is a palloc/pfree cycle, which is not worth the obscurantism
> >and risk of bugs (are you sure natts can never change?).
> 
> No, cache there is ugly. But I don't have idea about more efective 
> implementation of it :-(. First version of this patch was more clean. and 
> little bit slow. This cache save 10%.
> 
> >
> >BTW, if you want this patch to make it into 8.2, it needs to be fixed
> >and resubmitted *very* soon.
> 
> I understand, but I am not able work on it in next four days. And I need 
> help with it from Neil. It will be for 8.3.
> 
> Thank you
> Pavel
> 
> _
> Emotikony a pozadi programu MSN Messenger ozivi vasi konverzaci. 
> http://messenger.msn.cz/
> 
> 
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] New version of money type

2006-09-16 Thread Stephen Frost

* Theo Schlossnagle ([EMAIL PROTECTED]) wrote:
> Would that pose indexing issues?  It would also mean that when  
> joining two tables you'd have to handle some interesting type  
> conversion issues (at times).  We had someone accidentally create a  
> largish table with userid as "numeric" and other tables are "bigint",  
> it was disastrous for performance (joining).  I'd imagine that if the  
> above wasn't done cleverly, that performance problem would be repeated.

The performance issue you ran into with joins was more likely because
there's no hash function for numeric than the way numerics are stored.
I'm not really sure how I feel about this idea...  If it's handled
completely inside numeric then it might be reasonable to do (there
wouldn't *be* any real 'type conversion', numeric would just be modified
to support both sizes and would handle an upgrading/downgrading, I don't
think the code would be all *that* complex, honestly...).

I don't think the indexing would be an issue either as you can provide
the appropriate operations regardless of the size..  It might make
writing the hash function a bit more interesting, but probably not...

We might want to have a compile-time option for this tho, as not all
architectures handle 64bit integer ops very well.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Bruce Momjian

Gregory Stark wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> 
> > Tom Lane wrote:
> >> Gregory Stark <[EMAIL PROTECTED]> writes:
> >> > The user would have to decide that he'll never need a value over 127 
> >> > bytes
> >> > long ever in order to get the benefit.
> >> 
> >> Weren't you the one that's been going on at great length about how
> >> wastefully we store CHAR(1) ?  Sure, this has a somewhat restricted
> >> use case, but it's about as efficient as we could possibly get within
> >> that use case.
> 
> Sure, this helps with CHAR(1) but there were plen

OK.  One thing that we have to remember is that the goal isn't to
squeeze every byte out of the storage format.  That would be
inefficient, performance-wise.  We need just a reasonble storage layout.

> > To summarize what we are now considering:
> >
> > Originally, there was the idea of doing 1,2, and 4-byte headers.  The
> > 2-byte case is probably not worth the extra complexity (saving 2 bytes
> > on a 128-byte length isn't very useful).
> 
> Well don't forget we virtually *never* use more than 2 bytes out of the 4 byte
> headers for on-disk data. The only way we ever store a datum larger than 16k
> is you compile with 32k blocks *and* you explicitly disable toasting on the
> column.

Well, if we went with 2-byte, then we are saying we are not going to
store the TOAST length in the heap header, but store it somewhere else,
probably in TOAST.  I can see how that could be done.  This would leave
us with 0, 1, and 2-byte headers, and 4-byte headers in TOAST.  Is that
something to consider?  I think one complexity is that we are going to
need 4-byte headers in the backend to move around values, so there is
going to need to be a 2-byte to 4-byte mapping for all data types, not
just the short ones.  If this only applies to TEXT, bytea, and a few
other types, it is uncertain whether it is worth it.

(We do store the TOAST length in heap, right, just before the TOAST
pointer?)

> Worse, if we don't do anything about fields like text it's not true that this
> only occurs on 128-byte columns and larger. It occurs on any column that
> *could* contain 128 bytes or more. Ie, any column declared as varchar(128)
> even if it contains only "Bruce" or any column declared as text or numeric.

Well, if you are using TEXT, it is hard to say you are worried about
storage size.  I can't imagine many one-byte values are stored in TEXT.

> I'm not sure myself whether the smallfoo data types are a bad idea in
> themselves though. I just think it probably doesn't replace trying to shorten
> the largefoo varlena headers as well.

See above.  Using just 2-byte headers in heap is a possibility.  I am
just not sure if the overhead is worth it.  With the 0-1 header, we
don't have any backend changes as data is passed around from the disk to
memory.  Doing the 2-byte header would require that.

> Part of the reason I think the smallfoo data types may be a bright idea in
> their own right is that the datatypes might be able to do clever things about
> their internal storage. For instance, smallnumeric could use base 100 where
> largenumeric uses base 1.

I hardly think modifying the numeric routines to do a two different
bases is worth it.  

> > I am slightly worried about having short version of many of our types. 
> > Not only char, varchar, and text, but also numeric.  I see these varlena
> > types in the system:
> 
> I think only the following ones make sense for smallfoo types:
> 
> >  bpchar
> >  varchar
> >  bit
> >  varbit
> >  numeric

OK, bit and numeric are ones we didn't talk about yet.

> These don't currently take typmods so we'll never know when they could use a
> smallfoo representation, it might be useful if they did though:
> 
> >  bytea
> >  text
> >  path
> >  polygon

Good point.

> 
> 
> Why are these varlena? Just for ipv6 addresses? Is the network mask length not
> stored if it's not present? This gives us a strange corner case in that ipv4
> addresses will *always* fit in the smallfoo data type and ipv6 *never* fit.
> Ie, we'll essentially end up with an ipv4inet and an ipv6inet. Sad in a way.
> 
> >  inet
> >  cidr

Yes, I think so.

> 
> I have to read up on what this is.
> 
> >  refcursor
> 
> 
> > Are these shorter headers going to have the same alignment requirements
> > as the 4-byte headers?  I am thinking not, meaning we will not have as
> > much padding overhead we have now.
> 
> Well a 1-byte length header doesn't need any alignment so they would have only
> the alignment that the data type itself declares. I'm not sure how interacts
> with heap_deform_tuple but it's probably simpler than finding out only once
> you parse the length header what alignment you need.

That is as big a win as the shorter header.  Doing a variable length
header with big-endian encoding and stuff would be a mess, for sure.
With 0-1 header, your alignment doesn't need to change from the disk to
memory.

-- 
  B

Re: [HACKERS] New version of money type

2006-09-16 Thread Theo Schlossnagle



On Sep 16, 2006, at 5:27 PM, Jim C. Nasby wrote:


On Thu, Sep 14, 2006 at 11:12:14AM -0400, D'Arcy J.M. Cain wrote:

The benefit of the money type is speed.  Because internal operations
are done on integers they can generally be handled by single CPU ops.
My tests on the 64 bit version show 10% to 25% improvement over  
numeric

for many operations.


Has anyone looked at changing numeric so that for numbers with less  
than
9 digits it stores/uses an int, and for between 10 and 18 digits it  
uses
a bigint? Perhaps that would net every numeric user a speed  
improvement.


Would that pose indexing issues?  It would also mean that when  
joining two tables you'd have to handle some interesting type  
conversion issues (at times).  We had someone accidentally create a  
largish table with userid as "numeric" and other tables are "bigint",  
it was disastrous for performance (joining).  I'd imagine that if the  
above wasn't done cleverly, that performance problem would be repeated.


// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread mark

On Sat, Sep 16, 2006 at 02:13:49PM -0700, Mark Dilger wrote:
> Mark Dilger wrote:
> >Wouldn't a 4-byte numeric be a "float4" and an 8-byte numeric be a 
> >"float8".  I'm not sure I see the difference.
> Nevermind.  I don't normally think about numeric as anything other than 
> an arbitrarily large floating point type.  But it does differ in that 
> you can specify the range you want it to cover.

Range and the base, both being important.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] New version of money type

2006-09-16 Thread Jim C. Nasby

On Thu, Sep 14, 2006 at 11:12:14AM -0400, D'Arcy J.M. Cain wrote:
> The benefit of the money type is speed.  Because internal operations
> are done on integers they can generally be handled by single CPU ops.
> My tests on the 64 bit version show 10% to 25% improvement over numeric
> for many operations.

Has anyone looked at changing numeric so that for numbers with less than
9 digits it stores/uses an int, and for between 10 and 18 digits it uses
a bigint? Perhaps that would net every numeric user a speed improvement.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Mark Dilger


Mark Dilger wrote:

Wouldn't a 4-byte numeric be a "float4" and an 8-byte numeric be a 
"float8".  I'm not sure I see the difference.


Nevermind.  I don't normally think about numeric as anything other than 
an arbitrarily large floating point type.  But it does differ in that 
you can specify the range you want it to cover.


mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] minor feature request: Secure defaults during function creation

2006-09-16 Thread Jim C. Nasby

On Thu, Sep 14, 2006 at 10:24:43AM -0400, Pascal Meunier wrote:
> First, I asked about this on #postgresql, and I realize that this request
> would be a low priority item.  Yet, it would be an improvement for security
> reasons.
> 
> When creating a function using EXTERNAL SECURITY DEFINER, by default PUBLIC
> has execute privileges on it.  That's unexpected given that when I create a
> new table, PUBLIC doesn't have any privileges on it.  It's also not a secure
> default.
> 
> My request is to allow changing default permissions for function creation, a
> la "umask", or at least not give PUBLIC execute permissions by default.  I
> am aware that it is possible to wrap the create function statement with the
> necessary grants/revokes inside a transaction, as a work-around, but it is
> not obvious and makes things unnecessarily inconvenient.  This increases the
> chances of beginner and even medium-skill admins to get their security
> wrong.

Hrm... do we have any other objects that default to granting permissions
on creation? ISTM all objects should be created with no permissions.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [pgsql-www] [HACKERS] Developer's Wiki

2006-09-16 Thread Joshua D. Drake


Gregory Stark wrote:

Josh Berkus  writes:


I was actually hoping for more feedback on the content itself. I'm
still not clear if it's supposed to be "developers only - to the
exclusion of users" or "developers only - but accessable to anyone".

It should be readable by everyone, but editable only by authorized users.


I think the lessons of wikipedia is precisely that you *don't* want to add
such barriers. You want to let people add stuff pretty much freely. That
encourages people to get involved and put up information. 


I don't agree, you should also look at the recent post and fork by one 
of wikipedia's co-founders. The developers wiki should only be edited by 
authorized users.


Now, getting authorized should be easy as reasonably possible, but 
having a wholesale editing orgy on the wiki responsible for tracking 
postgresql developer information is not a good idea.


Joshua D. Drake


--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PATCHES] [HACKERS] Timezone List

2006-09-16 Thread Tom Lane

"Magnus Hagander" <[EMAIL PROTECTED]> writes:
>>> Assuming we can sneak this in even though it's feature-freeze,
>>> want me to look for it?

>> Yeah, please take a look --- seeing the size of the code will
>> probably help us decide if it's too late for 8.2 or not.

> Here goes. Tested only on win32 so far, but works there. No docs yet
> either - need to know if it goes in first ;)

I've applied this along with some extra work to get it to show GMT
offsets and DST status, which should be useful for helping people
to choose which setting they want.  This effectively obsoletes
Table B-5 as well as B-4 in the SGML docs ... we should probably
remove both of those in favor of recommending people look at the
views.

I did the basic documentation work in catalogs.sgml for these views,
but Appendix B still needs an update.  Joachim, you were going to do
that, right?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [pgsql-www] [HACKERS] Developer's Wiki

2006-09-16 Thread Josh Berkus

Greg,

> I think the lessons of wikipedia is precisely that you *don't* want to add
> such barriers. You want to let people add stuff pretty much freely. That
> encourages people to get involved and put up information.

The other lesson of Wikipedia is that maintaining wiki quality for a generally 
editable wiki requires a full-time dedicated staff.   We don't even have any 
volunteers who have 4 hours/week to commit to cleaning up the wiki, unless 
you're volunteering.

This is *particularly* true of the TODO stuff.  We simply don't want Joe User 
adding their personal wishlist to the TODOs, and that's exactly what will 
happen if the TODO list is world-writable.  TODOs should be items which have 
been hashed out here on the Hackers list, and the wiki page should list the 
specification which is the general consensus.

If we had a "user documentation wiki", then *that* should be world-editable, 
but again that would require community volunteers to dedicate to cleaning it 
up.  The developer wiki is by and for actual contributors.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Gregory Stark

Bruce Momjian <[EMAIL PROTECTED]> writes:

> Tom Lane wrote:
>> Gregory Stark <[EMAIL PROTECTED]> writes:
>> > The user would have to decide that he'll never need a value over 127 bytes
>> > long ever in order to get the benefit.
>> 
>> Weren't you the one that's been going on at great length about how
>> wastefully we store CHAR(1) ?  Sure, this has a somewhat restricted
>> use case, but it's about as efficient as we could possibly get within
>> that use case.

Sure, this helps with CHAR(1) but there were plen

>
> To summarize what we are now considering:
>
> Originally, there was the idea of doing 1,2, and 4-byte headers.  The
> 2-byte case is probably not worth the extra complexity (saving 2 bytes
> on a 128-byte length isn't very useful).

Well don't forget we virtually *never* use more than 2 bytes out of the 4 byte
headers for on-disk data. The only way we ever store a datum larger than 16k
is you compile with 32k blocks *and* you explicitly disable toasting on the
column.

Worse, if we don't do anything about fields like text it's not true that this
only occurs on 128-byte columns and larger. It occurs on any column that
*could* contain 128 bytes or more. Ie, any column declared as varchar(128)
even if it contains only "Bruce" or any column declared as text or numeric.

I'm not sure myself whether the smallfoo data types are a bad idea in
themselves though. I just think it probably doesn't replace trying to shorten
the largefoo varlena headers as well.

Part of the reason I think the smallfoo data types may be a bright idea in
their own right is that the datatypes might be able to do clever things about
their internal storage. For instance, smallnumeric could use base 100 where
largenumeric uses base 1.

> I am slightly worried about having short version of many of our types. 
> Not only char, varchar, and text, but also numeric.  I see these varlena
> types in the system:

I think only the following ones make sense for smallfoo types:

>bpchar
>varchar
>bit
>varbit
>numeric

These don't currently take typmods so we'll never know when they could use a
smallfoo representation, it might be useful if they did though:

>bytea
>text
>path
>polygon

Why are these varlena? Just for ipv6 addresses? Is the network mask length not
stored if it's not present? This gives us a strange corner case in that ipv4
addresses will *always* fit in the smallfoo data type and ipv6 *never* fit.
Ie, we'll essentially end up with an ipv4inet and an ipv6inet. Sad in a way.

>inet
>cidr

I have to read up on what this is.

>refcursor

> Are these shorter headers going to have the same alignment requirements
> as the 4-byte headers?  I am thinking not, meaning we will not have as
> much padding overhead we have now.

Well a 1-byte length header doesn't need any alignment so they would have only
the alignment that the data type itself declares. I'm not sure how interacts
with heap_deform_tuple but it's probably simpler than finding out only once
you parse the length header what alignment you need.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [pgsql-www] [HACKERS] Developer's Wiki

2006-09-16 Thread Gregory Stark

Josh Berkus  writes:

>> I was actually hoping for more feedback on the content itself. I'm
>> still not clear if it's supposed to be "developers only - to the
>> exclusion of users" or "developers only - but accessable to anyone".
>
> It should be readable by everyone, but editable only by authorized users.

I think the lessons of wikipedia is precisely that you *don't* want to add
such barriers. You want to let people add stuff pretty much freely. That
encourages people to get involved and put up information. 

Experience shows that most people are cooperative most of the time. If there
turns out to be particularly contentious areas you can restrict access to
those areas to authorized users or ban ip addresses.

I've already put some stuff up there. I didn't plan to, but when I was
browsing I had ideas and the ability to add content was just one click away...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] XML support wiki page

2006-09-16 Thread Josh Berkus

Peter,

> The short status is that we have quite a bit of code ready and willing
> for 8.3.  Some factions are working on sneaking some of that into 8.2,
> but not me. :)

Is there a reason to have this code on pgFoundry in advance of applying it as 
patches against the main code?   Nickolay submitted for a project, but I'm a 
little reluctant to approve a pgFoundry project which is going to be 
abandoned as soon as we branch 8.3.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [pgsql-www] [HACKERS] Developer's Wiki

2006-09-16 Thread Josh Berkus

Martjin,

> I was actually hoping for more feedback on the content itself. I'm
> still not clear if it's supposed to be "developers only - to the
> exclusion of users" or "developers only - but accessable to anyone".

It should be readable by everyone, but editable only by authorized users.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Bruce Momjian

Tom Lane wrote:
> Gregory Stark <[EMAIL PROTECTED]> writes:
> > The user would have to decide that he'll never need a value over 127 bytes
> > long ever in order to get the benefit.
> 
> Weren't you the one that's been going on at great length about how
> wastefully we store CHAR(1) ?  Sure, this has a somewhat restricted
> use case, but it's about as efficient as we could possibly get within
> that use case.

To summarize what we are now considering:

Originally, there was the idea of doing 1,2, and 4-byte headers.  The
2-byte case is probably not worth the extra complexity (saving 2 bytes
on a 128-byte length isn't very useful).

What has come about is the idea of 0, 1, and 4-byte headers.  0-byte
headers store only one 7-bit ASCII byte, 1-byte headers can store 127
bytes or 127 / max_encoding_len characters.  4-byte headers store what
we have now.

The system is split into two types of headers, 0/1 headers which are
identified by a special data type (or mapped to a data type that can't
exceed that length, like inet), and 4-byte headers.  The code that deals
with 0/1 headers is independent of the 4-byte header code we have now.

I am slightly worried about having short version of many of our types. 
Not only char, varchar, and text, but also numeric.  I see these varlena
types in the system:

test=> SELECT typname FROM pg_type WHERE typlen = -1 AND typtype = 'b'
   AND typelem = 0;
  typname
---
 bytea
 text
 path
 polygon
 inet
 cidr
 bpchar
 varchar
 bit
 varbit
 numeric
 refcursor
(12 rows)

Are these shorter headers going to have the same alignment requirements
as the 4-byte headers?  I am thinking not, meaning we will not have as
much padding overhead we have now.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Bruce Momjian

Hannu Krosing wrote:
> ?hel kenal p?eval, R, 2006-09-15 kell 19:18, kirjutas Tom Lane:
> > Bruce Momjian <[EMAIL PROTECTED]> writes:
> > > Tom Lane wrote:
> > >> No, it'll be a 1-byte header with length indicating that no bytes
> > >> follow,
> > 
> > > Well, in my idea, 1001 would be 0x01.  I was going to use the
> > > remaining 7 bits for the 7-bit ascii value.
> > 
> > Huh?  I thought you said 0001 would be 0x01, that is, high bit
> > clear means a single byte containing an ASCII character. 
> 
> why not go all the way, and do utf-7 encoded header if hi bit is set ?
> 
> or just always have an utf-8 encoded header.

This is a special zero-length header case.  We only have one byte. 
Doing a utf8 length call to span to the next column is too expensive.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Hannu Krosing

Ühel kenal päeval, R, 2006-09-15 kell 19:18, kirjutas Tom Lane:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> No, it'll be a 1-byte header with length indicating that no bytes
> >> follow,
> 
> > Well, in my idea, 1001 would be 0x01.  I was going to use the
> > remaining 7 bits for the 7-bit ascii value.
> 
> Huh?  I thought you said 0001 would be 0x01, that is, high bit
> clear means a single byte containing an ASCII character. 

why not go all the way, and do utf-7 encoded header if hi bit is set ?

or just always have an utf-8 encoded header.

> You could
> reverse that but it just seems to make things harder --- the byte
> isn't a correct data byte by itself, as it would be with the other
> convention.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] TODO item: update source/timezone for 64-bit tz

2006-09-16 Thread Bruce Momjian


This has been saved for the 8.3 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Tom Lane wrote:
> Back when we converted src/timezone to use int64 for pg_time_t, we
> wondered what to do about extending the compiled timezone data file
> format for int64, so that it would work for years beyound 2038.  We
> shelved the problem waiting to see what the upstream zic folks would do.
> Well, it looks like they've done something about it.  So I think we
> ought to plan on updating our code to match theirs, so that we fix the
> y2038 problem while keeping it possible to use a standard zic-database
> installation with Postgres.  This is not urgent (I surely see no need
> to hold up 8.2 to fix it), but it ought to go on the TODO list.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
> 
>http://archives.postgresql.org

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Truncation of email subject lines

2006-09-16 Thread Douglas McNaught

Bruce Momjian <[EMAIL PROTECTED]> writes:

> Should I try hacking my mail reader to prevent this?  I think I see
> where it is happening in the code.

I'd say it'd be better to hack MajorDomo to be RFC-compliant.  :)

-Doug

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

[HACKERS] Truncation of email subject lines

2006-09-16 Thread Bruce Momjian

I have confirmed that my email client, elm-ME+, is wrapping long subject
lines on output, e.g.:

Subject: Re: [COMMITTERS] pgsql: sslinfo contrib module - information
about current SSL

and because majordomo is stripping any secondary lines, the subjects are
getting truncated.  It took me a while to figure this out because my
mail reader displays multi-line subjects just fine.

When we started talking about the problems with multi-line subjects, I
figured it wasn't a problem --- who would make a multi-line subject?  I
didn't realize mail readers would do that automatically.

Should I try hacking my mail reader to prevent this?  I think I see
where it is happening in the code.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] index help for uuid datatype

2006-09-16 Thread Tom Lane

Gevik Babakhani <[EMAIL PROTECTED]> writes:
> I was doing strncmp at some point but it did not work because 
> of the '\0'. I have created a custom comparison function and it seems to
> work.

Perhaps you just need memcmp instead of strncmp?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] index help for uuid datatype

2006-09-16 Thread Gevik Babakhani

Thank you Tom :) 

I was doing strncmp at some point but it did not work because 
of the '\0'. I have created a custom comparison function and it seems to
work. I am now inserting 6 million records to see if it will break again
and start the other tests from scratch.
Thank you for your help.

On Sat, 2006-09-16 at 12:43 -0400, Tom Lane wrote:
> Gevik Babakhani <[EMAIL PROTECTED]> writes:
> > I must be doing something very wrong.
> > Does anyone ever seen such a thing?
> 
> Your comparison functions for uuid are inconsistent.
> 
>   regards, tom lane
> 

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] [COMMITTERS] pgsql: sslinfo contrib module - information

2006-09-16 Thread Bruce Momjian

Subject wrap test, please ignore.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

[HACKERS] TODO item: update source/timezone for 64-bit tz files

2006-09-16 Thread Tom Lane

Back when we converted src/timezone to use int64 for pg_time_t, we
wondered what to do about extending the compiled timezone data file
format for int64, so that it would work for years beyound 2038.  We
shelved the problem waiting to see what the upstream zic folks would do.
Well, it looks like they've done something about it.  So I think we
ought to plan on updating our code to match theirs, so that we fix the
y2038 problem while keeping it possible to use a standard zic-database
installation with Postgres.  This is not urgent (I surely see no need
to hold up 8.2 to fix it), but it ought to go on the TODO list.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Foreign keys

2006-09-16 Thread Jim C. Nasby

On Sun, Sep 10, 2006 at 09:40:51AM -0700, Joshua D. Drake wrote:
> 
> >In any case the same logic that leads to it being desirable to report all 
> >the
> >errors to the user in a UI and not just report them one by one also 
> >applies to
> >the database. I'm not sure it's the most important issue in the world, but 
> >it
> >does seem like a "it would be nice" feature if it reported all the errors 
> >in
> >the statement, not just the first one it finds.
> >
> 
> Seems kind of extraneous to me. I am guessing it would cause yet further 
> overhead with our foreign key checks.
> 
> My testing shows that the use of foreign keys on high velocity single 
> transaction loads, can cause easily a 50% reduction in performance. Why 
> add to that? What we need to be doing is finding a way to decrease the 
> impact of foreign key checks.

IIRC, a big chunk of that overhead is simply having triggers on the
table. I tested it once and found something like a 30% overhead for
having a trigger that did nothing on insert. Granted, that was a simple
test on a single machine, but still...

Obviously one place to look is in the trigger code to see if there's
performance gains to be had there. But something else to consider is
moving away from using a general-purpose trigger framework to impliment
RI. I suspect a dedicate code path for RI could be a lot leaner than the
general-purpose trigger code is.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Andrew Dunstan




Tom Lane wrote:

Andrew Dunstan <[EMAIL PROTECTED]> writes:
  

I like this scheme a lot - maximum bang for buck.



  
Is there any chance we can do it transparently, without exposing new 
types? It is in effect an implementation detail ISTM, and ideally the 
user would not need to have any knowledge of it.



Well, they'd have to be separate types, but the parser handling of them
would be reasonably transparent I think.  It would work pretty much
exactly like the way that CHAR(N) maps to "bpchar" now --- is that
sufficiently well hidden for your taste?


  


Yeah, probably. At least to the stage where it's not worth a herculean 
effort to overcome.


cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] index help for uuid datatype

2006-09-16 Thread Tom Lane

Gevik Babakhani <[EMAIL PROTECTED]> writes:
> I must be doing something very wrong.
> Does anyone ever seen such a thing?

Your comparison functions for uuid are inconsistent.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

[HACKERS] index help for uuid datatype

2006-09-16 Thread Gevik Babakhani

I am testing the uuid datatype with unique indexing.
I have the following script to generate a table with uuid types:

create table guid(
pk uuid primary key default new_guid(),
f1 varchar(38)
);

insert into guid(f1) values('bla bla');
insert into guid(f1) values('bla bla');

insert into guid(f1) select f1 from guid;

-- I repeat the insert above 10 times to get may records.

-- then save the generated guid into the varchar field
update guid set f1=pk;

-- check for duplicates in the varchar field
select f1,count(*) from guid group by f1 having(count(f1) > 1);

-- no duplicates there:

 f1 | count
+---
(0 rows)

-- and here it comes.
dev=# create unique index idx1 on guid using btree (pk);
ERROR:  could not create unique index
DETAIL:  Table contains duplicated values.

I must be doing something very wrong.
Does anyone ever seen such a thing?

Regards,
Gevik.











---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] [DOCS] New XML section for documentation

2006-09-16 Thread Tom Lane

I wrote:
> ISTM the right answer is to add xml_is_well_formed() in this release
> and have xml_valid as an alias for it, with documentation explaining
> that xml_valid is deprecated and will be removed in the next release.

Not hearing any objection, I've done this.

> His patch also adds an xpath_array() function --- what do people
> think about that?  It's well past feature freeze ... now we've always
> been laxer about contrib than the core code, but still I'm inclined
> to say that that function should wait for 8.3.

I didn't add xpath_array(), but am still open to doing it if there
is any consensus in favor of it.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Tom Lane

Andrew Dunstan <[EMAIL PROTECTED]> writes:
> I like this scheme a lot - maximum bang for buck.

> Is there any chance we can do it transparently, without exposing new 
> types? It is in effect an implementation detail ISTM, and ideally the 
> user would not need to have any knowledge of it.

Well, they'd have to be separate types, but the parser handling of them
would be reasonably transparent I think.  It would work pretty much
exactly like the way that CHAR(N) maps to "bpchar" now --- is that
sufficiently well hidden for your taste?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Andrew Dunstan




Tom Lane wrote:

To review: Bruce is
proposing a var-length type structure with the properties

first byte 0xxx   field length 1 byte, exactly that value
first byte 1xxx   xxx data bytes follow

This can support *any* stored value from zero to 127 bytes long.
We can imagine creating new datatypes "short varchar" and "short char",
and then having the parser silently substitute these types for varchar(N)
or char(N) whenever N <= 127 / max_encoding_length.  Add some
appropriate implicit casts to convert these to the normal varlena types
for computation, and away you go.  No breakage of any existing
datatype-specific code, just a few additions in places like
heap_form_tuple.


  


I like this scheme a lot - maximum bang for buck.

Is there any chance we can do it transparently, without exposing new 
types? It is in effect an implementation detail ISTM, and ideally the 
user would not need to have any knowledge of it.


cheers

andrew

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Tom Lane

Gregory Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> Weren't you the one that's been going on at great length about how
>> wastefully we store CHAR(1) ?  Sure, this has a somewhat restricted
>> use case, but it's about as efficient as we could possibly get within
>> that use case.

> Sure, but are you saying you would have this in addition to do variable sized
> varlena headers?

We could ... but right at the moment I'm thinking this would solve 80%
of the problem with about 1% of the work needed for the varlena change.
So my thought is to do this for 8.3 and then wait a couple releases to
see if the more extensive hack is really needed.  Even if we did
eventually install variable-sized varlena headers, this layout would
still be useful for types like inet/cidr.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] [PATCHES] Tiny plpython fix

2006-09-16 Thread Tom Lane

"Magnus Hagander" <[EMAIL PROTECTED]> writes:
>> If we need this change in plpython, why not also 
>> src/include/port/win32.h?

> That's a very good question. It is because something that's pulled in
> from the python headers causes the deprecation to show. Whereas when we
> compile other things, the deprecation is hidden somewhere in an #ifdef
> or so. But its definitly not *wrong* to fix it in win32.h as well.

OK, changed it both places.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] [PATCHES] pg_strcasecmp in fe-connect.c

2006-09-16 Thread Tom Lane

"Magnus Hagander" <[EMAIL PROTECTED]> writes:
>> There are also some occurrences in pgbench.c, but I'm unsure 
>> that we need be paranoid about changing those.

> If we ever want to be able to compile it on a platform that doesn't have
> strcasecmp() (such as MSVC++), we would, no?

OK, replaced 'em all.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] [PATCHES] Include file in regress.c

2006-09-16 Thread Magnus Hagander

> > (I seem to have something funky in my cvs repo in general - doing a 
> > cvs diff gives me a *huge* diff for files like gram.c that 
> I thought 
> > weren't supposed to be in cvs at all. Any ideas on why that 
> would be? 
> > (I'm rsync:ing to a local repository and then running against that 
> > one))
> 
> Hm, gram.c once was in CVS, years ago ... but I don't know 
> why you'd be seeing anything about it now.

That's kinda what baffles me as well...

if I look into my cvs repository directory, it shows only gram.y,v, with
gram.c,v in Attic - which seems to make sense. Must be my client that's
gone crazy. In fact, mmy output ends up as:

Index: src\backend\parser/gram.c
===
RCS file: c:/prog/cvsrepo/pgsql/pgsql/src/backend/parser/Attic/gram.c,v
retrieving revision 2.90
diff -c -r2.90 gram.c
*** src\backend\parser/gram.c   7 May 1999 01:22:54 -   2.90
--- src\backend\parser/gram.c   14 Sep 2006 06:10:08 -


So I guess the question is down to why the hell it's looking at the
attic file at all. When I run cvs update on it, I get things like:

cvs.exe update: Updating src\backend\parser
M src\backend\parser/gram.c
cvs.exe update: use `cvs.exe add' to create an entry for
src\backend\parser/scan.c


which is kinda weird, because gram.c is in .cvsignore.

//Magnus

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] The enormous s->childXids problem

2006-09-16 Thread Tom Lane

Theo Schlossnagle <[EMAIL PROTECTED]> writes:
> The select function is dbi-link's remote_select.   
> remote_select will perform the query and then for each row  
> return_next which calls the SPI.xs stuff to do plperl_return_next  
> which is wrapped in a PG_TRY block.  I see the value of the try block  
> to kick back sensible errors to perl, but creating childXids for  
> every row of a setof seems wildly excessive.  What's the harm in  
> simply not TRY'ing around there?

Failing.

The real question is why does the subtransaction actually assign itself
an XID --- a simple RETURN NEXT operation ought not do that, AFAICS.
What is it you're doing in there that changes the database?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Gregory Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Gregory Stark <[EMAIL PROTECTED]> writes:
>> The user would have to decide that he'll never need a value over 127 bytes
>> long ever in order to get the benefit.
>
> Weren't you the one that's been going on at great length about how
> wastefully we store CHAR(1) ?  Sure, this has a somewhat restricted
> use case, but it's about as efficient as we could possibly get within
> that use case.

Sure, but are you saying you would have this in addition to do variable sized
varlena headers?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] log_duration is redundant, no?

2006-09-16 Thread Tom Lane

"Guillaume Smet" <[EMAIL PROTECTED]> writes:
> My only concern was that we now have less information with
> log_statement='all' than with log_min_duration_statement.

Well, you don't have the durations, but log_statement isn't supposed to
tell you that.  So I'm still quite confused about what it is that you
want to do differently.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] [PATCHES] Include file in regress.c

2006-09-16 Thread Tom Lane

"Magnus Hagander" <[EMAIL PROTECTED]> writes:
> (I seem to have something funky in my cvs repo in general - doing a cvs
> diff gives me a *huge* diff for files like gram.c that I thought weren't
> supposed to be in cvs at all. Any ideas on why that would be? (I'm
> rsync:ing to a local repository and then running against that one))

Hm, gram.c once was in CVS, years ago ... but I don't know why you'd be
seeing anything about it now.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Reducing data type space usage

2006-09-16 Thread Tom Lane

Gregory Stark <[EMAIL PROTECTED]> writes:
> The user would have to decide that he'll never need a value over 127 bytes
> long ever in order to get the benefit.

Weren't you the one that's been going on at great length about how
wastefully we store CHAR(1) ?  Sure, this has a somewhat restricted
use case, but it's about as efficient as we could possibly get within
that use case.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

[HACKERS] minor feature request: Secure defaults during function creation

2006-09-16 Thread Pascal Meunier

First, I asked about this on #postgresql, and I realize that this request
would be a low priority item.  Yet, it would be an improvement for security
reasons.

When creating a function using EXTERNAL SECURITY DEFINER, by default PUBLIC
has execute privileges on it.  That's unexpected given that when I create a
new table, PUBLIC doesn't have any privileges on it.  It's also not a secure
default.

My request is to allow changing default permissions for function creation, a
la "umask", or at least not give PUBLIC execute permissions by default.  I
am aware that it is possible to wrap the create function statement with the
necessary grants/revokes inside a transaction, as a work-around, but it is
not obvious and makes things unnecessarily inconvenient.  This increases the
chances of beginner and even medium-skill admins to get their security
wrong.


Thanks,
Pascal Meunier
Purdue University CERIAS



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Mid cycle release?

2006-09-16 Thread Tom Dunstan


Joshua D. Drake wrote:
O.k. that was negative, sorry. Frankly I think that turning autovacuum 
on by default pretty much equates to, "I am lazy, and I don't want to 
actually evaluate my needs. Lets just go with MS Access"


Please ignore my negativity today. I apologize. I do not want autovacuum 
turned on by default but it isn't that big of a deal.


Dammit, I was halfway through a brilliant rebuttal! I'm going to post it 
anyway, since I think it's important to discuss the issues if we're 
going to make the right call. Your repented negativity is noted, though. :)


I can definitely see where you're coming from, it's a sort of tough-love 
scenario. There are legitimate counter arguments, though. The most 
obvious is that anyone who *does* evaluate their needs properly 
shouldn't have too much trouble turning it off, whereas there are lots 
of small database users out there who find having to set up a vacuum 
cron a pain. Example: I'm in the process of setting up a typo blog, 
using postgresql of course, but the database setup was secondary to the 
main thing that I was doing, and I'd completely forgotten about setting 
up a cron. Now I'm unlikely to produce blog posts at a rate that will 
cause the database to grow out of the "minuscule" range, but it should 
still be done, right?


I have to ask, what's wrong with lazy users? Software which allows you 
to be lazy gives you a warm tingly feeling, and you install it on your 
intranet server when no-one's looking. We want people to think of 
postgresql that way.


There are lots of MySQL specific pieces of software out there that 
started out as some guy/girl with a PHP and MySQL type of book. We can't 
turn that clock back, but making postgresql easier for the masses has to 
be a good thing for its adoption. The native win32 port is the poster 
child for this. It was a big PR win, no?


I would argue that leaving autovacuum off is only justifiable if we feel 
that it's going to be a bad choice for the majority of users. Many of 
the users who frequent postgresql lists understand the trade-off, but 
the ones that we're trying to attract don't. Is it better for them to 
discover manual vacuums when they're trying to incrementally improve 
performance (with the risk that they never discover them at all), or 
when their database is running like a dog because they've never vacuumed 
it at all?


One solution might be to turn it on in turn-key solutions: linux distro 
RPMs, Win32 installer (is it on there already?) etc, but leave it turned 
off in the source release. Would that help you, or are your clients 
using RPMs or whatever?


Cheers

Tom


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] [ADMIN] Vacuum error on database postgres

2006-09-16 Thread andy


Tom Lane wrote:

Jeff Davis <[EMAIL PROTECTED]> writes:

Couldn't you just sort by the table names, and ANALYZE the tables in
that order? Would that effectively prevent the deadlocks?


That'd work too, I think (I suggested the variant of ordering by OID,
which is simpler and more reliable).  Not sure if it's really worth the
trouble though --- how many people do you think are doing concurrent
whole-database ANALYZEs inside transaction blocks?

As-is the code will do the analyzes in pg_class physical row order,
which is almost good enough --- only if someone did a schema change that
forced a pg_class row update between the starts of the two ANALYZE runs
would it possibly fail.  So the use-case for a fix is really kinda narrow.

regards, tom lane


Honestly, its not that big a problem, and if there were some doc's, 
faq's, etc (and people on the newsgroups) I dont think you should even 
worry about it.


It makes sense to me, and if Tom had come back and said, yeah, here is 
why, cuz you run autovacuum and at then end of the script you did a 
vacuum... they are conflicting... dont do that.  I'd be cool with that. 
 As soon as its common knowledge I think it could be avoided.


Really, isn't it just bulk loads anyway where a person might do this?

-Andy

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] log_duration is redundant, no?

2006-09-16 Thread Guillaume Smet


On 9/16/06, Tom Lane <[EMAIL PROTECTED]> wrote:

The only asymmetry in the thing is that if log_statement fired then
we suppress duplicate printing of the query in the later duration log
message (if any) for that query.  But that seems like the right thing
if you're at all concerned about log volume.


Perhaps I'm not representative of the users of these settings but when
I used log_statement='all', I didn't really care about the log volume.
I knew it really generates a lot of log lines and it slows down my
database.

My only concern was that we now have less information with
log_statement='all' than with log_min_duration_statement.

That said, I don't use it myself now: I use exclusively
log_min_duration_statement and log_duration. So if you think it's
better like that, it's ok for me.

Does anyone else have an opinion about this?

--
Guillaume

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] [PATCHES] Include file in regress.c

2006-09-16 Thread Magnus Hagander

> > This patch adds a required include file to regress.c, 
> required to get 
> > at InvalidTransactionId.
> 
> If that's needed, why isn't everybody else's build falling over too?

Uh, because it's already included 4 lines up?! I must've been tired when
I wrote that patch.

Must've been something wrong in my cvs pull, because I cleaned out the
directory and did a new cvs update, and now it works without it. Sorry
about the fuss.

(I seem to have something funky in my cvs repo in general - doing a cvs
diff gives me a *huge* diff for files like gram.c that I thought weren't
supposed to be in cvs at all. Any ideas on why that would be? (I'm
rsync:ing to a local repository and then running against that one))

//Magnus

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] log_duration is redundant, no?

2006-09-16 Thread Guillaume Smet


On 9/16/06, Alvaro Herrera <[EMAIL PROTECTED]> wrote:

It might make sense to log _what_ is going on, without telling all the
little details, for example

LOG:  parse duration: 0.250 ms
LOG:  bind duration: 0.057 ms
LOG:  execute my_query: SELECT * FROM shop WHERE $1 = $2
DETAIL:  parameters: $1 = 'Clothes Clothes Clothes', $2 = 'Joe''s Widgets'


It's not really the idea when you use log_statement and log_duration.
Lines are completely different semantically speaking.
So you should have:
LOG:  parse (log_statement)
LOG:  duration: 0.250 ms (log_duration)

--
Guillaume

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] [PATCHES] Tiny plpython fix

2006-09-16 Thread Magnus Hagander

> > Seems __vc_errcode was used during Visual C++ beta at some 
> point, and 
> > is now declared deprecated in the system headers. This 
> patch renames 
> > our use of it to __msvc_errcode, so we don't conflict anymore.
> 
> If we need this change in plpython, why not also 
> src/include/port/win32.h?

That's a very good question. It is because something that's pulled in
from the python headers causes the deprecation to show. Whereas when we
compile other things, the deprecation is hidden somewhere in an #ifdef
or so. But its definitly not *wrong* to fix it in win32.h as well.

The level of header files can be quite horrible to find this in, so I
haven't dug deep enough to find exactly which define it comes from. But
given that it's somewhere in the python headers, it's not something we
can change anyway.

//Magnus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] [PATCHES] pg_strcasecmp in fe-connect.c

2006-09-16 Thread Magnus Hagander

> > This patch fixes a couple of cases where we use 
> strcasecmp() instead 
> > of
> > pg_strcasecmp() in fe_connect.c, coming from the LDAP client pathc.
> 
> Applied.  I found another instance in contrib/hstore, too.  

Ah. msvc builds don't currently build /contrib, that's why I missed that
one.

> There are also some occurrences in pgbench.c, but I'm unsure 
> that we need be paranoid about changing those.


If we ever want to be able to compile it on a platform that doesn't have
strcasecmp() (such as MSVC++), we would, no?

//Magnus

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Optimize ORDER BY ... LIMIT

2006-09-16 Thread Nicolas Barbier

2006/9/16, Gregory Stark <[EMAIL PROTECTED]>:

Alvaro Herrera <[EMAIL PROTECTED]> writes:

> I don't know if this is the same thing you are talking about, but Oleg
> talked to me on the conference about "partial sort", which AFAICS it's
> about the same thing you are talking about.  I think Teodor submitted a
> patch to implement it, which was rejected because of not being general
> enough.

Oof, you have a long memory. Oleg does reference such a thing in his 2002 post
that ended up resulting in the TODO item. I can't find the original patch but
I doubt any patch against 7.1 is going to be all that helpful in understanding
what to do today.

I'm also confused how he only saw a factor of 6 improvement in reading the top
100 out of a million. I would expect much better.

For example, consider the case in which 6 passes are needed to do the
full sort. Then, for a "partial sort", at least the first of these
passes has to be fully executed, because one needs to read at least
all the data once to find the "top n".

greetings,
Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

72 matches

Mail list logo