Re: [HACKERS] Reducing stats collection overhead

2007-07-30 Thread Arjen van der Meijden

On 31-7-2007 5:07 Alvaro Herrera wrote:

Arjen van der Meijden wrote:
Afaik Tom hadn't finished his patch when I was testing things, so I don't 
know. But we're in the process of benchmarking a new system (dual quad-core 
Xeon) and we'll have a look at how it performs in the postgres 8.2dev we 
used before, the stable 8.2.4 and a fresh HEAD-checkout (which we'll call 
8.3dev). I'll let you guys (or at least Tom) know how they compare in our 
benchmark.


So, ahem, did it work? :-)


The machine turned out to have a faulty mainboard, so we had to
concentrate first on figuring out why it was unstable and then on
whether the replacement mainboard made it stable in a long durability
test. Of course, that behaviour only appeared with mysql and not with
postgresql, so we had to run our mysql-version of the benchmark a few
hundred times, rather than testing various versions, until the machine
had to go into production.


So we haven't tested postgresql 8.3dev on that machine, sorry.

Best regards,

Arjen





On 18-5-2007 15:12 Alvaro Herrera wrote:

Tom Lane wrote:

Arjen van der Meijden told me that according to the tweakers.net
benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed
here that for small SELECT queries issued as separate transactions,
there's a significant difference.  I think much of the difference stems
from the fact that we now have stats_row_level ON by default, and so
every transaction sends a stats message that wasn't there by default
in 8.2.  When you're doing a few thousand transactions per second
(not hard for small read-only queries) that adds up.

So, did this patch make the performance problem go away?
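For reference, the setting Tom refers to is the stats_row_level GUC; a
site hitting this overhead on a read-mostly workload could confirm the
diagnosis by turning it off. A sketch only -- autovacuum in 8.2/8.3dev
depends on row-level stats, so this is a test knob, not a fix:

    # postgresql.conf -- diagnostic sketch only; autovacuum needs
    # row-level stats to be collected
    stats_row_level = off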





---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Machine available for community use

2007-07-30 Thread Tom Lane
Devrim GÜNDÜZ <[EMAIL PROTECTED]> writes:
> On Mon, 2007-07-30 at 23:36 -0400, Tom Lane wrote:
>> There isn't any "secret sauce" in the RHEL build process

> Really? Are the compiler options, etc, public?

Certainly.  If you doubt it, try comparing pg_config output for the RHEL
and CentOS packages.  (And if the CFLAGS entries are different, you
should be mentioning it to the CentOS package maintainer, not me.)

> Actually I did not compare -- But both of them were 4.3 (RHEL 4.3 and
> CentOS 4.3). I'm assuming that they have the same package versions,
> right?

> BTW, they were stock 4.3 -- no updates, etc.

RHEL 4.3 was obsoleted more than a year ago, so I'd like to think that
nobody finds "no update" comparisons to be very relevant today ...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Machine available for community use

2007-07-30 Thread Devrim GÜNDÜZ
Hi,

On Mon, 2007-07-30 at 23:36 -0400, Tom Lane wrote:
> > Each test took 1-2 days -- I maintain that CentOS performs worse
> > than RHEL.
> 
> I'm finding that hard to believe too.

I felt the same, which is why I repeated the test twice.

> There isn't any "secret sauce" in the RHEL build process 

Really? Are the compiler options, etc, public?

> --- the CentOS guys should have been able to duplicate the RHEL RPMs
> exactly.  Now it's possible that CentOS had lagged in updating some
> performance-relevant package; did you compare package versions across
> both OSes? 

Actually I did not compare -- But both of them were 4.3 (RHEL 4.3 and
CentOS 4.3). I'm assuming that they have the same package versions,
right? 

BTW, they were stock 4.3 -- no updates, etc.

I hope I will be able to publish at least the graphs, so that the
community can take a look at what is going on.

Regards,

-- 
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/






Re: [HACKERS] [GENERAL] ascii() for utf8

2007-07-30 Thread Decibel!

Moving to -hackers.

On Jul 27, 2007, at 1:22 PM, Stuart wrote:

Does PostgreSQL have a function like ascii() that will
return the Unicode code point value for a UTF-8 character?
(And, symmetrically, the same question for chr(), of course.)

I didn't find anything in the docs, so I think the answer
is no, which leads me to ask...  Why not?  (It's hard to believe
there's a lack of need without concluding that either ascii() is
not needed, or UTF-8 text is little used.)

Are there technical problems in implementing such a
function?  Has anyone else already done this (ie, is
there somewhere I could get it from?)

Is there some other, non-obvious way to get the code point
value for a UTF-8 character?

I think I could use plperl or plpython for this but
this seems like an awful lot of overhead for such a
basic task.


I suspect that this is just a matter of no one scratching the itch. I  
suspect a patch would be accepted, or you could possibly put  
something on pgFoundry. I'd set it up so that ascii() and chr() act  
according to the appropriate locale setting (I'm not sure which one  
would be appropriate).
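In the meantime, here is a minimal sketch in C of the decoding such a
function would have to do (this assumes already-validated UTF-8 input;
the function name is ours, not an existing API):

    #include <stdint.h>

    /* Decode the first character of a valid UTF-8 string into a
     * Unicode code point. */
    static uint32_t
    utf8_codepoint(const unsigned char *s)
    {
        if (s[0] < 0x80)                   /* 1 byte: 0xxxxxxx */
            return s[0];
        if ((s[0] & 0xE0) == 0xC0)         /* 2 bytes: 110xxxxx 10xxxxxx */
            return ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        if ((s[0] & 0xF0) == 0xE0)         /* 3 bytes */
            return ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6)
                | (s[2] & 0x3F);
        /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        return ((s[0] & 0x07) << 18) | ((s[1] & 0x3F) << 12)
            | ((s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    }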

--
Decibel!, aka Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Machine available for community use

2007-07-30 Thread Tom Lane
Devrim GÜNDÜZ <[EMAIL PROTECTED]> writes:
> On Mon, 2007-07-30 at 19:14 -0700, Joshua D. Drake wrote:
>>> and RHEL performed much better than CentOS.

>> Not to be unkind, but I doubt that on an identical configuration.

> Each test took 1-2 days -- I maintain that CentOS performs worse
> than RHEL.

I'm finding that hard to believe too.  There isn't any "secret sauce"
in the RHEL build process --- the CentOS guys should have been able to
duplicate the RHEL RPMs exactly.  Now it's possible that CentOS had
lagged in updating some performance-relevant package; did you compare
package versions across both OSes?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Reducing stats collection overhead

2007-07-30 Thread Alvaro Herrera
Arjen van der Meijden wrote:
> Afaik Tom hadn't finished his patch when I was testing things, so I don't 
> know. But we're in the process of benchmarking a new system (dual quad-core 
> Xeon) and we'll have a look at how it performs in the postgres 8.2dev we 
> used before, the stable 8.2.4 and a fresh HEAD-checkout (which we'll call 
> 8.3dev). I'll let you guys (or at least Tom) know how they compare in our 
> benchmark.

So, ahem, did it work? :-)


> On 18-5-2007 15:12 Alvaro Herrera wrote:
>> Tom Lane wrote:
>>> Arjen van der Meijden told me that according to the tweakers.net
>>> benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed
>>> here that for small SELECT queries issued as separate transactions,
>>> there's a significant difference.  I think much of the difference stems
>>> from the fact that we now have stats_row_level ON by default, and so
>>> every transaction sends a stats message that wasn't there by default
>>> in 8.2.  When you're doing a few thousand transactions per second
>>> (not hard for small read-only queries) that adds up.
>> So, did this patch make the performance problem go away?


-- 
Alvaro Herrera  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Quick idea for reducing VACUUM contention

2007-07-30 Thread Alvaro Herrera
ITAGAKI Takahiro wrote:
> 
> Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> 
> > > I think we might need additional "freezing-xmax" operations to avoid
> > > XID-wraparound in the first pass of vacuum, though it hardly occurs.
> > 
> > I'm not sure I follow.  Can you elaborate?  Do you mean storing a
> > separate relfrozenxmax for each table or something like that?
> 
> We need to work around wraparound of xmax in dead tuples. If we fail to
> vacuum them and the XID counter wraps around, we cannot remove them until
> the next wraparound, because they appear to have been deleted in the *future*.

Oh, but this should not be a problem, because a tuple is either frozen
or removed completely -- xmax cannot precede xmin.


> > We just add the XID of the vacuum to dead tuples we see in the
> > first pass. When backends find a dead tuple and see that the transaction
> > identified by that XID has committed, they can freely reuse the area of
> > the dead tuple, because we can assume the index entries pointing to the
> > tuple have been removed by the vacuum.
> > 
> > I would be worried about leftover index entries being later used by new
> > tuples in the heap.  Then when you visit the index, find that entry, go
> > to the heap and find the new tuple and return it, which could be bogus.
> 
> To avoid dangling index entries, I'm thinking about reusing dead tuples
> only if we see that the VACUUM transaction has committed successfully.
> That means the VACUUM transaction removed all index entries corresponding
> to those dead tuples; they are now heap-only tuples, so we can recycle
> them in the same manner as HOT-updated tuples.

Hmm.  OK, I admit I have no idea how HOT works.

-- 
Alvaro Herrera  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] default_text_search_config and expression indexes

2007-07-30 Thread Bruce Momjian
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > Bruce Momjian wrote:
> 
> > >   3) Remove default_text_search_config and require the
> > >  configuration to be specified in each function call.
> > > 
> > > If we remove default_text_search_config, it would also make ::tsvector
> > > casting useless.
> > 
> > OK, I just found a case that I think is going to make #3 a requirement
> > (remove default_text_search_config).
> > 
> > How is a CREATE INDEX ... to_tsvector(col) going to restore from a
> > pg_dump?  I see no way of guaranteeing that the
> > default_text_search_config is correct on the restore, and in fact I
> > don't think we have any way of knowing the default_text_search_config
> > used for the index.
> 
> Make pg_dump emit only CREATE INDEX statements in the two-param format.  In
> fact I think it would make sense to convert the one-param format to
> two-param internally, before hitting the catalogs.
>
> This would also solve your problem about usability of WHERE clauses, if
> you rewrite the one-param calls to two-params before the optimizer kicks
> in.

Yes, that could be done but as far as I know we weren't planning to have
those areas of our backend be aware of configuration parameters to
text search functions, and I doubt we want to do that for 8.3, if ever.
There are many tsearch functions and doing this would spill tsearch
function checks all over the backend, reducing our modularity.

The default capability just isn't worth it, and in addition is
error-prone.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Quick idea for reducing VACUUM contention

2007-07-30 Thread ITAGAKI Takahiro

Alvaro Herrera <[EMAIL PROTECTED]> wrote:

> > I think we might need additional "freezing-xmax" operations to avoid
> > XID-wraparound in the first pass of vacuum, though it hardly occurs.
> 
> I'm not sure I follow.  Can you elaborate?  Do you mean storing a
> separate relfrozenxmax for each table or something like that?

We need to work around wraparound of xmax in dead tuples. If we fail to
vacuum them and the XID counter wraps around, we cannot remove them until
the next wraparound, because they appear to have been deleted in the *future*.


> > We just add the XID of the vacuum to dead tuples we see in the
> > first pass. When backends find a dead tuple and see that the transaction
> > identified by that XID has committed, they can freely reuse the area of
> > the dead tuple, because we can assume the index entries pointing to the
> > tuple have been removed by the vacuum.
> 
> I would be worried about leftover index entries being later used by new
> tuples in the heap.  Then when you visit the index, find that entry, go
> to the heap and find the new tuple and return it, which could be bogus.

To avoid dangling index entries, I'm thinking about reusing dead tuples
only if we see that the VACUUM transaction has committed successfully.
That means the VACUUM transaction removed all index entries corresponding
to those dead tuples; they are now heap-only tuples, so we can recycle
them in the same manner as HOT-updated tuples.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] default_text_search_config and expression indexes

2007-07-30 Thread Alvaro Herrera
Bruce Momjian wrote:
> Bruce Momjian wrote:

> > 3) Remove default_text_search_config and require the
> >configuration to be specified in each function call.
> > 
> > If we remove default_text_search_config, it would also make ::tsvector
> > casting useless.
> 
> OK, I just found a case that I think is going to make #3 a requirement
> (remove default_text_search_config).
> 
> How is a CREATE INDEX ... to_tsvector(col) going to restore from a
> pg_dump?  I see no way of guaranteeing that the
> default_text_search_config is correct on the restore, and in fact I
> don't think we have any way of knowing the default_text_search_config
> used for the index.

Make pg_dump emit only CREATE INDEX statements in the two-param format.  In
fact I think it would make sense to convert the one-param format to
two-param internally, before hitting the catalogs.

This would also solve your problem about usability of WHERE clauses, if
you rewrite the one-param calls to two-params before the optimizer kicks
in.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"Nadie esta tan esclavizado como el que se cree libre no siendolo" (Goethe)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Machine available for community use

2007-07-30 Thread Devrim GÜNDÜZ
Hi,

On Mon, 2007-07-30 at 19:14 -0700, Joshua D. Drake wrote:
> > and RHEL performed much better than CentOS. 
> 
> Not to be unkind, but I doubt that on an identical configuration.

Since I don't have permission to distribute the benchmark results, I
will be happy to spend time re-running these tests if someone
provides me with an identical machine.

Each test took 1-2 days -- I maintain that CentOS performs worse
than RHEL.

BTW, I will ask for permission to distribute the graphs that I produced
using gnuplot -- maybe those graphs will shed some light.

Regards,
-- 
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/






Re: [HACKERS] Machine available for community use

2007-07-30 Thread Joshua D. Drake

Devrim GÜNDÜZ wrote:

Hi,



RHEL has better performance than CentOS -- I guess it is the compiler
options that Red Hat is using while compiling their RPMs.

I have performed a test using OSDL test suite a few months ago on a
system that has:

* 8 x86_64 CPUs @ 3200.263 MHz
* 16 Gigabytes of RAM
* PostgreSQL 8.1.5 (PGDG packages)

and RHEL performed much better than CentOS. 


Not to be unkind, but I doubt that on an identical configuration.

Joshua D. Drake



Regards,



--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] default_text_search_config and expression indexes

2007-07-30 Thread Bruce Momjian
Bruce Momjian wrote:
> We have to decide if we want a GUC default_text_search_config, and if so
> when can it be changed.
> 
> Right now there are three ways to create a tsvector (or tsquery)
> 
>   ::tsvector
>   to_tsvector(value)
>   to_tsvector(config, value)
> 
> (ignoring plainto_tsquery)
> 
> Only the last one specifies the configuration. The others use the
> configuration specified by default_text_search_config.  (We had a
> previous discussion on what the default value of
> default_text_search_config should be, and it was decided it should be
> set via initdb based on a flag or the locale.)
> 
> Now, because most people use a single configuration, they can just set
> default_text_search_config and there is no need to specify the
> configuration name.
> 
> However, expression indexes cause a problem here:
> 
>   
> http://momjian.us/expire/fulltext/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX
> 
> We recommend that users create an expression index on the column they
> want to do a full text search on, e.g.
> 
>   CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(body));
> 
> However, the big problem is that the expressions used in expression
> indexes should not change their output based on the value of a GUC
> variable (because it would corrupt the index), but in the case above,
> default_text_search_config controls what configuration is used, and
> hence the output of to_tsvector is changed if default_text_search_config
> changes.
> 
> We have a few possible options:
> 
>   1) Document the problem and do nothing else.
>   2) Make default_text_search_config a postgresql.conf-only
>  setting, thereby making it impossible to change by non-super
>  users, or make it a super-user-only setting.
>   3) Remove default_text_search_config and require the
>  configuration to be specified in each function call.
> 
> If we remove default_text_search_config, it would also make ::tsvector
> casting useless.

OK, I just found a case that I think is going to make #3 a requirement
(remove default_text_search_config).

How is a CREATE INDEX ... to_tsvector(col) going to restore from a
pg_dump?  I see no way of guaranteeing that the
default_text_search_config is correct on the restore, and in fact I
don't think we have any way of knowing the default_text_search_config
used for the index.

And if we have to require the configuration name in CREATE INDEX, it has
to be used in WHERE, so we might as well just remove the default
capability and always require the configuration name.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Building Hash Index by Presorting Tuples

2007-07-30 Thread twraney
Hi,

We are trying to sort the index tuples before inserting them into hash buckets, 
to improve build speed.  

Here is our plan:

1. Build a spool that contains all the index tuples to be inserted into the
buckets.  (This is done.)

2. Sort the index tuples in the spool according to the bucket number to which
they should belong. This results in accessing each bucket once and only once.

3. For (2) to work, we need an estimate of the number of buckets. This is done.

4. After sorting the index tuples, insert them into the hash index in bucket order.

Our challenge: we need to determine the final bucket number for the itup (index 
tuple). 

1. To do the above, we need to apply a mask to the hash value of the index
tuple. First, we calculate the hash value of the index tuple. Then we
calculate the mask using:

(1 << ceiling(log2(estimate of buckets needed))) - 1

So, if we need 6 buckets, the mask would be 7, or binary 111.  If we needed
100 buckets, the mask would be 127, or binary 1111111.  If we AND this mask
with the hash of the key, we keep only the least significant bits needed to
do the compare.  (A code sketch follows after point 3.)

A 32-bit hash value may look like (high-order zero bits omitted):

  1011010100101010110101010101

Let's say we just need 6 buckets; apply the mask 111 and we get:

  1011010100101010110101010101 (the hash value of the key)
& 0000000000000000000000000111 (the mask)
  ----------------------------
  0000000000000000000000000101 (the resulting bucket number = 5)

If we needed 100 buckets, the calculation would look like:

  1011010100101010110101010101 (the hash value of the key)
& 0000000000000000000001111111 (the mask)
  ----------------------------
  0000000000000000000001010101 (the resulting bucket number = 85)

2. However, in practice, when we apply such a mask, the resulting bucket
numbers are not evenly distributed.

3. Do we look for a better hash function, or can we modify the existing hash?
Comments are welcome.
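A minimal C sketch of the mask-and-AND computation from point 1 (the
helper names are ours; the real code would build on the existing
hash_any() machinery):

    #include <stdint.h>

    /* Round the bucket estimate up to the next power of two, minus
     * one: 6 -> 7 (binary 111), 100 -> 127 (binary 1111111). */
    static uint32_t
    bucket_mask(uint32_t est_buckets)
    {
        uint32_t n = 1;

        while (n < est_buckets)
            n <<= 1;
        return n - 1;
    }

    /* The bucket number is just the low-order bits of the hash. */
    static uint32_t
    bucket_number(uint32_t hashval, uint32_t est_buckets)
    {
        return hashval & bucket_mask(est_buckets);
    }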

-Tom


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] Reducing Transaction Start/End Contention

2007-07-30 Thread Simon Riggs
Jignesh Shah's scalability testing on Solaris has revealed further
tuning opportunities surrounding the start and end of a transaction.
That tuning should be especially important since async commit is likely
to allow much higher transaction rates than were previously possible.

There is strong contention on the ProcArrayLock in Exclusive mode, with
the top path being CommitTransaction(). This becomes clear as the number
of connections increases, but it seems likely that the contention can be
caused in a range of other circumstances. My thoughts on the causes of
this contention are that the following 3 tasks contend with each other
in the following way:

CommitTransaction(): takes ProcArrayLock Exclusive
but only needs access to one ProcArray element

waits for

GetSnapshotData():ProcArrayLock Shared
ReadNewTransactionId():XidGenLock Shared

which waits for

GetNextTransactionId()
takes XidGenLock Exclusive
ExtendCLOG(): takes ClogControlLock Exclusive, WALInsertLock Exclusive
two possible places where I/O is required
ExtendSubtrans(): takes SubtransControlLock()
one possible place where I/O is required
Avoids lock on ProcArrayLock: atomically updates one ProcArray element


or more simply:

CommitTransaction() -- i.e. once per transaction
waits for
GetSnapshotData() -- i.e. once per SQL statement
which waits for
GetNextTransactionId() -- i.e. once per transaction

This gives some goals for scalability improvements and some proposals.
(1) and (2) are proposals for 8.3 tuning; the others are directions for
further research.


Goal: Reduce total time that GetSnapshotData() waits for
GetNextTransactionId()

1. Introduce a larger clog-specific block size (CLOG_BLCKSZ)
Clog currently uses BLCKSZ to define the size of clog buffers. This can
be changed to use CLOG_BLCKSZ, which would then be set to 32768.
This will naturally increase the amount of memory allocated to the clog,
so we need not alter CLOG_BUFFERS above 8 if we do this (as previously
suggested, with successful results). This will also reduce the number of
ExtendClog() calls, which will probably reduce the overall contention
also.

2. Perform ExtendClog() as a background activity
Background process can look at the next transactionid once each cycle
without holding any lock. If the xid is almost at the point where a new
clog page would be allocated, then it will allocate one prior to the new
page being absolutely required. Doing this as a background task would
mean that we do not need to hold the XidGenLock in exclusive mode while
we do this, which means that GetSnapshotData() and CommitTransaction()
would also be less likely to block. Also, if any clog writes need to be
performed when the page is moved forwards this would also be performed
in the background.
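A rough sketch of proposal 2 (ExtendCLOG() exists in clog.c; the
unlocked-read helper and the slack threshold are assumptions of the
sketch, and CLOG_BLCKSZ is the symbol from proposal 1):

    #include <stdint.h>

    typedef uint32_t TransactionId;

    #define CLOG_BLCKSZ          32768              /* from proposal 1 */
    #define CLOG_XACTS_PER_PAGE  (CLOG_BLCKSZ * 4)  /* 2 status bits/xact */
    #define PREALLOC_SLACK       1024               /* assumed threshold */

    extern TransactionId ReadNextTransactionIdUnlocked(void); /* assumed */
    extern void ExtendCLOG(TransactionId newestXact);

    /* Run once per background-process cycle: when the next XID nears a
     * clog page boundary, allocate the coming page early so foreground
     * GetNextTransactionId() callers rarely do I/O under XidGenLock. */
    void
    BackgroundExtendClog(void)
    {
        TransactionId next = ReadNextTransactionIdUnlocked();

        if (next % CLOG_XACTS_PER_PAGE >
            CLOG_XACTS_PER_PAGE - PREALLOC_SLACK)
            ExtendCLOG(next + PREALLOC_SLACK);
    }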

3. Consider whether ProcArrayLock should use a new queued-shared lock
mode that puts a maximum wait time on ExclusiveLock requests. It would
be fairly hard to implement this well as a timer, but it might be
possible to place a limit on queue length, i.e. allow Shared locks to be
granted immediately if a Shared holder already exists, but only if there
is a queue of no more than N exclusive mode requests queued. This might
prevent the worst cases of exclusive lock starvation. 

4. Since shared locks are currently queued behind exclusive requests
when they cannot be immediately satisfied, it might be worth
reconsidering the way LWLockRelease works also. When we wake up the
queue we only wake the Shared requests that are adjacent to the head of
the queue. Instead we could wake *all* waiting Shared requestors.

e.g. with a lock queue like this:
(HEAD)  S<-S<-X<-S<-X<-S<-X<-S
Currently we would wake the 1st and 2nd waiters only. 

If we were to wake the 3rd, 5th and 7th waiters also, then the queue
would reduce in length very quickly, if we assume generally uniform
service times. (If the head of the queue is X, then we wake only that
one process, and I'm not proposing we change that.) That would mean queue
jumping, right? Well, that's what already happens in other circumstances,
so there cannot be anything intrinsically wrong with allowing it; the
only question is: would it help?

We need not wake the whole queue; there may be some generally more
beneficial heuristic. The reason for considering this is not to speed up
Shared requests but to reduce the queue length and thus the waiting time
for the Exclusive requestors. Each time a Shared request is dequeued, we
effectively re-enable queue jumping, so a Shared request arriving during
that point will actually jump ahead of Shared requests that were unlucky
enough to arrive while an Exclusive lock was held. Worse than that, the
new incoming Shared requests exacerbate the starvation, so the more
non-adjacent groups of Shared lock requests there are in the queue, the
worse the starvation of the exclusive requestors becomes. We are
effectively randomly starving some shared locks as well as exclusive
locks in the current scheme, based upon the state of the lock when they
make their request.
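To make proposal 4 concrete, a standalone sketch of the suggested
wake-up rule (the types and helper are illustrative, not the actual
lwlock.c queue code):

    typedef enum { LW_SHARED, LW_EXCLUSIVE } LWMode;

    typedef struct Waiter
    {
        LWMode         mode;
        struct Waiter *next;        /* toward the tail of the queue */
    } Waiter;

    extern void WakeWaiter(Waiter *w);  /* assumed: unblocks one process */

    /* On release: if an exclusive request heads the queue, wake it alone
     * (unchanged behaviour); otherwise wake every shared waiter in the
     * whole queue, not just the shared run adjacent to the head. */
    static void
    ReleaseWakeup(Waiter *head)
    {
        Waiter *w;

        if (head == NULL)
            return;
        if (head->mode == LW_EXCLUSIVE)
        {
            WakeWaiter(head);
            return;
        }
        for (w = head; w != NULL; w = w->next)
            if (w->mode == LW_SHARED)
                WakeWaiter(w);
    }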

Re: [HACKERS] Quick idea for reducing VACUUM contention

2007-07-30 Thread Alvaro Herrera
Jim Nasby wrote:
> On Jul 27, 2007, at 1:49 AM, Alvaro Herrera wrote:
>> ITAGAKI Takahiro wrote:
>>> "Simon Riggs" <[EMAIL PROTECTED]> wrote:
>>>
 Read the heap blocks in sequence, but make a conditional lock for
 cleanup on each block. If we don't get it, sleep, then try again when we
 wake up. If we fail the second time, just skip the block completely.
>>
>> It would be cool if we could do something like sweep a range of pages,
>> initiate IO for those that are not in shared buffers, and while that is
>> running, lock and clean up the ones that are in shared buffers, skipping
>> those that are not lockable right away; when that's done, go back to
>> those buffers that were gotten from I/O and clean those up.  And retry
>> the locking for those that couldn't be locked the first time around,
>> also conditionally.  And when that's all done, a third pass could get
>> those blocks that weren't cleaned up in any of the previous passes (and
>> this time the lock would not be conditional).
>
> Would that be substantially easier than just creating a bgreader?

I'm not sure about easier, but I'm not sure that the bgreader can do the
same job.  ISTM that the bgreader would be mostly in charge of reading
in advance of backends, whereas what I'm proposing is mostly about
finding the best spot for locking.  It might turn out to be more trouble
than it's worth though, for sure.  And in any case I'm not in a hurry to
implement it.

In any case I'm not so sure about skipping vacuuming a block if it's not
lockable.
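For what it's worth, a standalone sketch of the sweep described above
(every type and helper here is hypothetical, not backend API):

    typedef unsigned int BlockNumber;

    /* hypothetical helpers */
    extern int  BlockIsCached(BlockNumber blk);
    extern int  ConditionalLockForCleanup(BlockNumber blk);
    extern void VacuumBlock(BlockNumber blk);
    extern void StartAsyncRead(BlockNumber blk);
    extern void Remember(BlockNumber blk);      /* queue for a later pass */

    /* Pass 1: clean cached blocks under a conditional lock while reads
     * for uncached blocks proceed in the background. */
    static void
    sweep_pass1(BlockNumber nblocks)
    {
        BlockNumber blk;

        for (blk = 0; blk < nblocks; blk++)
        {
            if (!BlockIsCached(blk))
            {
                StartAsyncRead(blk);    /* overlap I/O with cleanup */
                Remember(blk);          /* clean it up in pass 2 */
            }
            else if (ConditionalLockForCleanup(blk))
                VacuumBlock(blk);
            else
                Remember(blk);          /* retry the lock in pass 2 */
        }
        /* Pass 2: completed reads plus one conditional-lock retry;
         * pass 3 (not shown) takes the remaining locks unconditionally. */
    }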

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Crear es tan difícil como ser libre" (Elsa Triolet)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Machine available for community use

2007-07-30 Thread Devrim GÜNDÜZ
Hi,

On Wed, 2007-07-25 at 20:22 -0700, Joshua D. Drake wrote:
> > I'm checking into this, but it may take a few days to get an answer
> > (particularly since I'm planning to take Friday through Monday off).
> 
> Well if we go RHEL why not CentOS5 and just call it good? 

...because RHEL and CentOS are not really identical. They are just
binary-compatible.

RHEL has better performance than CentOS -- I guess it is the compiler
options that Red Hat is using while compiling their RPMs.

I have performed a test using OSDL test suite a few months ago on a
system that has:

* 8 x86_64 CPUs @ 3200.263 MHz
* 16 Gigabytes of RAM
* PostgreSQL 8.1.5 (PGDG packages)

and RHEL performed much better than CentOS. 

Regards,
-- 
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/






Re: [HACKERS] default_text_search_config and expression indexes

2007-07-30 Thread Bruce Momjian
Oleg Bartunov wrote:
> > OK, here is what I am thinking.  If we make default_text_search_config
> > super-user-only, then the user can't do SET (using "zero_damaged_pages"
> > as a superuser-only example):
> >
> > test=> set zero_damaged_pages = on;
> > ERROR:  permission denied to set parameter "zero_damaged_pages"
> >
> > test=> alter user guest set zero_damaged_pages = on;
> > ERROR:  permission denied to set parameter "zero_damaged_pages"
> >
> > but the super-user can set it in postgresql.conf, or:
> >
> > test=# alter user guest set zero_damaged_pages = on;
> > ALTER ROLE
> >
> > or
> >
> > test=# alter database vendor3 set zero_damaged_pages = on;
> > ALTER DATABASE
> >
> > meaning while it will be super-user-only, the administrator can set the
> > default for specific databases and users.  Is that the best approach?
> >
> > A user can still over-ride the default by specifying the configuration
> > in the function call.
> 
> This is OK, but it will not work in a hosting environment and it still
> doesn't prevent errors.

Agreed.  Super-user-only now seems strange to me because it isn't a
security issue, but rather an attempt to keep people from causing errors.

The fundamental issue is that if you do a query using tsvector and
tsquery, everything will work fine because default_text_search_config
will be the same for both queries.  The problem is if you do an expression
index lookup that doesn't specify the configuration name and your
default_text_search_config doesn't match the index, or you INSERT or
UPDATE into an expression index with a mismatched
default_text_search_config.

If we do make default_text_search_config super-user-only it prevents a
database owner from doing ALTER DATABASE db1 SET
default_text_search_config = 'english', which seems like a pretty big
limitation because I think per-database default_text_search_config makes
the most sense.

And, again, if you specify the configuration in the expression index you
have to specify it in the WHERE clause and then
default_text_search_config is pretty useless.

If we required the configuration to always be specified, you could still
store multiple configurations in the same column by having a secondary
column hold the configuration name:

CREATE INDEX i on x USING gist (to_tsvector(config_col, body));

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Export convert_to_scalar

2007-07-30 Thread Magnus Hagander
On Mon, Jul 30, 2007 at 01:04:35PM +0100, Gregory Stark wrote:
> 
> One of the ideas for pgadmin for the future will be a statistics table
> visualizer which amongst other things could chart the histograms so people can
> see if they represent the distribution of data they expect.
> 
> To do this pgadmin would need access to convert_to_scalar so it understands
> where to plot values from the histogram (and because where to place them is
> the only way to tell how tall to make the bars).
> 
> As the comments indicate convert_to_scalar is a bit of a hack and we may want
> to reimplement it some day but there's no particular need to do so for this
> task.
> 
> There are two options here, we can either provide a user-accessible catalog
> function to access convert_to_scalar in pg_proc, or we can just export
> convert_to_scalar and allow pgadmin's adminpack to define the corresponding
> function.
> 
> I've actually already done both so I could play with it; I can post whichever
> people prefer, or both, to -patches. It would be good to get one or the other
> done in 8.3 so that if I or anyone else implements the charting for
> pgadmin they can support 8.3 instead of having to wait until 8.4.

Unless there are very good arguments for not doing it, I'd very much like
to see it in core rather than adminpack.

Most people don't install adminpack, really. And it doesn't play too well
in a hosted scenario. So I'd rather see more functionality moved from
adminpack into core than the other way around..

//Magnus


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Export convert_to_scalar

2007-07-30 Thread Gregory Stark

One of the ideas for pgadmin for the future will be a statistics table
visualizer which amongst other things could chart the histograms so people can
see if they represent the distribution of data they expect.

To do this pgadmin would need access to convert_to_scalar so it understands
where to plot values from the histogram (and because where to place them is
the only way to tell how tall to make the bars).

As the comments indicate convert_to_scalar is a bit of a hack and we may want
to reimplement it some day but there's no particular need to do so for this
task.

There are two options here, we can either provide a user-accessible catalog
function to access convert_to_scalar in pg_proc, or we can just export
convert_to_scalar and allow pgadmin's adminpack to define the corresponding
function.

I've actually already done both so I could play with it; I can post whichever
people prefer, or both, to -patches. It would be good to get one or the other
done in 8.3 so that if I or anyone else implements the charting for
pgadmin they can support 8.3 instead of having to wait until 8.4.
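For the first option, a hedged sketch of the SQL-callable wrapper,
assuming convert_to_scalar() were exported with its current signature
(today it is static in utils/adt/selfuncs.c; the SQL-level argument
layout here is ours):

    #include "postgres.h"
    #include "fmgr.h"

    /* assumed export of the existing static function */
    extern bool convert_to_scalar(Datum value, Oid valuetypid,
                                  double *scaledvalue,
                                  Datum lobound, Datum hibound,
                                  Oid boundstypid,
                                  double *scaledlobound,
                                  double *scaledhibound);

    PG_FUNCTION_INFO_V1(pg_convert_to_scalar);

    /* pg_convert_to_scalar(value, lo_bound, hi_bound) -> float8
     * Scales value onto the axis defined by the histogram bounds. */
    Datum
    pg_convert_to_scalar(PG_FUNCTION_ARGS)
    {
        Datum   value = PG_GETARG_DATUM(0);
        Datum   lo    = PG_GETARG_DATUM(1);
        Datum   hi    = PG_GETARG_DATUM(2);
        Oid     typid = get_fn_expr_argtype(fcinfo->flinfo, 0);
        double  sval, slo, shi;

        if (!convert_to_scalar(value, typid, &sval, lo, hi, typid,
                               &slo, &shi))
            PG_RETURN_NULL();   /* type not handled by the estimator */

        PG_RETURN_FLOAT8(sval);
    }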

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] pipe chunking vs Windows

2007-07-30 Thread Andrew Dunstan



Magnus Hagander wrote:
However, the problem is that in backporting it we'd make a slight 
behaviour change - the log file just gets LF instead of CRLF line 
endings. I'm inclined to say that's a better result than living with the 
bug, though.



Can't we add back the CRLF combo when writing the file out from the
syslogger process?

Yes, you're right. I can do this by setting text mode on the syslogFile. 
I'll have a patch for review shortly.
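The fix amounts to something like this in the syslogger (syslogFile is
the FILE * that syslogger.c writes to; text mode makes the C runtime
emit CR before each LF on output):

    /* Windows only: re-add CRLF at write time, instead of reading the
     * chunked pipe in text mode, which corrupted the protocol. */
    #ifdef WIN32
        _setmode(_fileno(syslogFile), _O_TEXT); /* <io.h>, <fcntl.h> */
    #endif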


Note: this is behaviour we definitely won't want for CSVlogs, because it 
will translate embedded LF as well as LF at the end of a log line. This 
whole little mess chewed up all the time I wanted to spend yesterday 
making progress on CSVlogs :-(


cheers

andrew

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] pipe chunking vs Windows

2007-07-30 Thread Magnus Hagander
On Sun, Jul 29, 2007 at 07:43:34PM -0400, Andrew Dunstan wrote:
> 
> 
> Andreas Pflug wrote:
> >Andrew Dunstan wrote:
> >  
> >>I have no idea why that's done - it goes back to the origins of the
> >>syslogger - probably because someone mistakenly thinks all Windows
> >>text files have to have CRLF line endings.
> >>
> >>I tried changing that to _O_BINARY, and calling _setmode on both the
> >>pipe before it's duped into stderr and stderr after the dup and both.
> >>Nothing seemed to work.
> >>
> >AFAIR the flag has to be set again in each child process.
> >
> >  
> 
> OK, this was the bit of info I was missing. I put a call in 
> SubPostmasterMain() like this:
> 
>_setmode(fileno(stderr), _O_BINARY);
> 
> and one in the syslogger code the same, just after the pipe is dup'ed 
> into stderr.
> 
> After that I saw no more corruption issues.

Seems right - for anything EXEC_BACKEND, if you change a state from the
default it won't be inherited, so you need to explicitly add it in the
child.

> However, the problem is that in backporting it we'd make a slight 
> behaviour change - the log file just gets LF instead of CRLF line 
> endings. I'm inclined to say that's a better result than living with the 
> bug, though.

Can't we add back the CRLF combo when writing the file out from the
syslogger process?

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] pipe chunking vs Windows

2007-07-30 Thread Magnus Hagander
On Sun, Jul 29, 2007 at 06:31:04PM -0400, Andrew Dunstan wrote:
> 
> 
> Andreas Pflug wrote:
> >Andrew Dunstan wrote:
> >  
> >>> 
> >>>  
> I have no idea why that's done - it goes back to the origins of the
> syslogger - probably because someone mistakenly thinks all Windows
> text files have to have CRLF line endings.
> 
> >Yes this was intentional, notepad still doesn't like LF line endings.
> >Not my preferred text viewer, but the only one that's always available.
> >
> >
> >  
> 
> Not for Wordpad though, and it's pretty universal too. And Notepad won't 
> load a file of any great size anyway. Furthermore, we just can't have 

Just for the record, that hasn't been true for a long time. Notepad opens
large files just fine (well, fine is a matter of definition, since it needs
to load the whole file in RAM which is interesting when you open gigabyte
size files. But IIRC, so does wordpad).

And notepad doesn't carry the risk of people saving the file in word format
(yes, this happens all too frequently from what I've seen)

//Magnus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster