Re: [PATCHES] [HACKERS] ARC Memory Usage analysis

2004-10-26 Thread Curt Sampson
On Tue, 26 Oct 2004, Greg Stark wrote:

 I see mmap or O_DIRECT being the only viable long-term stable states. My
 natural inclination was the former but after the latest thread on the subject
 I suspect it'll be forever out of reach. That makes O_DIRECT and a
 Postgres-managed cache the only real choice. Having both caches is just a
 waste of memory and a waste of CPU cycles.

I don't see why mmap is any more out of reach than O_DIRECT; it's not
all that much harder to implement, and mmap (and madvise!) is more
widely available.
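
For reference, a minimal C sketch (not PostgreSQL code) of the calls in
question; the path and mapping length are purely illustrative:

#include <fcntl.h>
#include <sys/mman.h>

/* Map a block range of a relation file read-only and hint the kernel
 * about upcoming access.  Everything here is illustrative. */
static void
map_and_advise(const char *path)
{
    size_t len = 128 * 8192;               /* 128 blocks, for illustration */
    int    fd  = open(path, O_RDONLY);
    void  *buf;

    if (fd < 0)
        return;
    buf = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (buf != MAP_FAILED)
        madvise(buf, len, MADV_WILLNEED);  /* read-ahead hint */
}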

But if using two caches is only costing us 1% in performance, there's
not really much point.

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.NetBSD.org
 Make up enjoying your city life...produced by BIC CAMERA



Re: [HACKERS] Using ALTER TABLESPACE in pg_dump

2004-10-26 Thread Fabien COELHO
Dear Tom,

ISTM that the core business of a database is to help organize and protect
data, and it is plainly that. You just hope you won't need it, so it is
somehow abstract, but when and if you need it, it is not second-order
at all ;-) and it is much too late to redo the dump.

 So you create some tablespaces by hand.  Big deal.

I agree that it is doable this way, although not really nice.

 This objection is not strong enough to justify an ugly, klugy definition
 for where tables get created.

I do also agree about this.

My real point is that while reading the thread quickly, I was afraid the
problem would not be better addressed at all in the coming release.

It seems that I was wrong, as that does not look to be the case.
Any fix instead of nothing is fair enough for me.
Thanks for your answer, have a nice day,
--
Fabien Coelho - [EMAIL PROTECTED]


Re: [PATCHES] [HACKERS] ARC Memory Usage analysis

2004-10-26 Thread Simon Riggs
On Mon, 2004-10-25 at 16:34, Jan Wieck wrote: 
 The problem is, with a too small directory ARC cannot guesstimate what 
 might be in the kernel buffers. Nor can it guesstimate what recently was 
 in the kernel buffers and got pushed out from there. That results in a 
 way too small B1 list, and therefore we don't get B1 hits when in fact 
 the data was found in memory. B1 hits are what increase the T1target, 
 and since we are missing them with a too small directory size, our 
 implementation of ARC is probably using a T2 size larger than the 
 working set. That is not optimal.

I think I have seen that the T1 list shrinks too much, but I need more
tests...with some good test results.

The effectiveness of ARC relies upon the balance between the often
conflicting requirements of recency and frequency. It seems
possible, even likely, that pgsql's version of ARC may need some subtle
changes to rebalance it - if we are unlucky enough to find cases where
it genuinely is out of balance. Many performance tests are required,
together with a few ideas on extra parameters to include...hence my
support of Jan's ideas.
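
For readers following the discussion, this is the adaptation rule from
Megiddo and Modha's ARC paper, sketched in C with hypothetical names (it
is the paper's rule, not a quote of the PostgreSQL implementation):

#define Min(a,b) ((a) < (b) ? (a) : (b))
#define Max(a,b) ((a) > (b) ? (a) : (b))

/* A hit in ghost list B1 means T1 has been evicting too eagerly, so the
 * T1 target grows; a hit in B2 means the opposite.  The caller only
 * passes hit_in_b1 when b1_len > 0, and hit_in_b2 when b2_len > 0. */
static void
arc_adapt(int hit_in_b1, int hit_in_b2,
          long b1_len, long b2_len, long c, long *t1_target)
{
    if (hit_in_b1)
        *t1_target = Min(*t1_target + Max(b2_len / b1_len, 1), c);
    else if (hit_in_b2)
        *t1_target = Max(*t1_target - Max(b1_len / b2_len, 1), 0);
}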

That's also why I called the B1+B2 hit ratio "turbulence": it
relates to how much oscillation is happening between T1 and T2. In
physical systems, we expect the oscillations to be damped, but there is
no guarantee that we have a nearly critically damped oscillator. (Note
that the absence of turbulence doesn't imply that T1+T2 is optimally
sized, just that it is balanced.)

[...and although the discussion has wandered away from my original
patch...would anybody like to commit, or decline, the patch?]

 If we would replace the dynamic T1 buffers with a max_backends*2 area of 
 shared buffers, use a C value representing the effective cache size and 
 limit the T1target on the lower bound to effective cache size - shared 
 buffers, then we basically moved the T1 cache into the OS buffers.

Limiting the minimum size of T1len to be 2 * max_backends sounds like an
easy way to prevent overbalancing of T2, but I would like to follow up
on ways to have T1 naturally stay larger. I'll do a patch with this idea
in, for testing. I'll call this "T1 minimum size" so we can discuss it.
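
The clamp itself could be as small as this sketch (hypothetical names,
not from the source):

/* Enforce the proposed floor on the T1 target: two buffers per backend. */
static long
clamp_t1_target(long t1_target, long max_backends)
{
    long t1_floor = 2 * max_backends;

    return (t1_target < t1_floor) ? t1_floor : t1_target;
}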

Any other patches are welcome...

It could be that B1 is too small and so we could use a larger value of C
to keep track of more blocks. I think what is being suggested is two
GUCs: shared_buffers (as is), plus another one, larger, which would
allow us to track what is in shared_buffers and what is in OS cache. 

I have comments on effective_cache_size below.

On Mon, 2004-10-25 at 17:03, Tom Lane wrote:
 Jan Wieck [EMAIL PROTECTED] writes:
  This all only holds water, if the OS is allowed to swap out shared 
  memory. And that was my initial question, how likely is it to find this 
  to be true these days?
 
 I think it's more likely than not that the OS will consider shared
 memory to be potentially swappable.  On some platforms there is a shmctl
 call you can make to lock your shmem in memory, but (a) we don't use it
 and (b) it may well require privileges we haven't got anyway.

Are you saying we shouldn't, or that we don't yet? I simply assumed that we
did use that function - surely it must be at least an option? RHEL
supports this, at least.

It may well be that we don't have those privileges, in which case we
turn off the option. Often, we (or I?) will want to install a dedicated
server, so we should have all the permissions we need, in which case...
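
The call Tom refers to is the System V shmctl() with SHM_LOCK, on
platforms that have it (e.g. Linux); a minimal sketch, bearing in mind
that PostgreSQL does not currently make this call and that it usually
requires root or an equivalent privilege:

#include <stdio.h>
#include <sys/shm.h>

/* Pin a shared memory segment in RAM; shmid comes from shmget(). */
static void
lock_shared_memory(int shmid)
{
    if (shmctl(shmid, SHM_LOCK, NULL) < 0)
        perror("shmctl(SHM_LOCK)");    /* often EPERM without privilege */
}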

 This has always been one of the arguments against making shared_buffers
 really large, of course --- if the buffers aren't all heavily used, and
 the OS decides to swap them to disk, you are worse off than you would
 have been with a smaller shared_buffers setting.

Not really, just an argument against making them *too* large. Large
*and* utilised is OK, so we need ways of judging optimal sizing.

 However, I'm still really nervous about the idea of using
 effective_cache_size to control the ARC algorithm.  That number is
 usually entirely bogus.  Right now it is only a second-order influence
 on certain planner estimates, and I am afraid to rely on it any more
 heavily than that.

...ah yes, effective_cache_size.

The manual describes effective_cache_size as if it had something to do
with the OS, and some of this discussion has picked up on that.

effective_cache_size is used in only two places in the code (both in the
planner), as an estimate for calculating the cost of a) nonsequential
access and b) index access, mainly as a way of avoiding overestimates of
access costs for small tables.

There is absolutely no implication in the code that effective_cache_size
measures anything in the OS; what it gives is an estimate of the number
of blocks that will be available from *somewhere* in memory (i.e. in
shared_buffers OR OS cache) for one particular table (the one currently
being considered by the planner).
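
As a rough illustration of that role (simplified; the real costsize.c
arithmetic follows Mackert and Lohman's index-fetch model rather than
this linear discount):

#include <math.h>

/* Illustrative only -- not actual PostgreSQL source.  The planner treats
 * effective_cache_size as the number of this relation's pages likely to
 * be resident *somewhere* in memory and discounts repeated fetches. */
static double
illustrative_fetch_cost(double pages_fetched, double rel_pages,
                        double effective_cache_size,
                        double random_page_cost)
{
    double cached   = fmin(rel_pages, effective_cache_size);
    double hit_frac = (rel_pages > 0.0) ? cached / rel_pages : 0.0;

    /* fetches assumed cached are charged nothing here; the rest pay
     * the full random-I/O price */
    return pages_fetched * (1.0 - hit_frac) * random_page_cost;
}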

Crucially, the size referred to is the size of the *estimate*, not the
size [...]

Re: [PATCHES] [HACKERS] ARC Memory Usage analysis

2004-10-26 Thread Simon Riggs
On Tue, 2004-10-26 at 06:53, Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Tom Lane [EMAIL PROTECTED] writes:
  Another issue is what we do with the effective_cache_size value once we
  have a number we trust.  We can't readily change the size of the ARC
  lists on the fly.
 
  Huh? I thought effective_cache_size was just used as a factor in the cost
  estimation equation.
 
 Today, that is true.  Jan is speculating about using it as a parameter
 of the ARC cache management algorithm ... and that worries me.
 

ISTM that we should be optimizing the use of shared_buffers, not what's
outside. Didn't you (Tom) already say that?

BTW, very good ideas on how to proceed, but why bother?

For me, if the sysadmin didn't give shared_buffers to PostgreSQL, it's
because the memory is intended for use by something else and so is not
available at all. At least not dependably. The argument against large
shared_buffers because of swapping applies to that assumption also...the
OS cache is too volatile to attempt to gauge sensibly.

There's an argument for improving performance for people who haven't
set their parameters correctly, but that's got to be a secondary
consideration anyhow.

-- 
Best Regards, Simon Riggs




[HACKERS]

2004-10-26 Thread Bernd Helmle
[sorry if this mail appears more than once, but it seems the others didn't 
make it through the list]

This is a short preview of the view update code I'm currently working on. 
It is far from ready, but I want to share the current code and get some 
hints on what could be done better:

http://www.oopsware.de/pgsql/pgsql-view_update.tar.gz
The patchfile is against beta4; view_update.h belongs to 
src/include/rewrite, view_update.c to src/backend/rewrite. The code 
requires an initdb; however, the catalog change contained in this code is 
used but not very useful yet.

The code has the following known problems:
- User-defined update rules on views are created without any intervention 
from the code. This is why the rule regression tests (and others) fail: 
there suddenly are two INSERT/UPDATE/DELETE rules.

- Indexed array fields in a view's column list cause SIGSEGV (due to some 
experiments I am doing with array subscripting operations).

- There are problems with NULL fields in WHERE conditions.
- gram.y is only an ugly hack to get the CHECK OPTION working; it needs 
deeper effort, because it makes WITH a reserved keyword.

The following items need deeper discussion, I think:
- DEFAULT values on the underlying base tables need to be specified 
explicitly on the view itself (via ALTER TABLE); another problem is 
SERIALs ...
- What should happen if someone throws a pg_dump or SQL script at the 
backend that holds its own update rules for a view? In this case the 
implicit ones should be removed or deactivated.
- The code only supports cascaded rules, and all rules are created on the 
base relations only. So if one underlying base relation is not updatable, 
the view itself is not updatable either.

--
TIA
 Bernd


Re: [HACKERS] plans for bitmap indexes?

2004-10-26 Thread Hannu Krosing
On K, 2004-10-20 at 03:03, Simon Riggs wrote:

 Well, that's the best one yet. That's the solution, if ever I heard it.
 
 The reduction in bitmap size makes their use much safer. Size matters, since
 we're likely to start using these techniques on very large databases, which
 obviously implies very large CTID lists. The problem with guessing the
 number of rows is you're never too sure whether it's worth the startup
 overhead of using the bitmap technique. My next question was going to
 be: so how will you know when to use the technique?
 
 Hmmm...think...you'd need to be clear that the cost of scanning a block
 didn't make the whole thing impractical. Generally, since we're using this
 technique to access infrequent row combinations, we'd be looking at no more
 than one row per block usually anyway. So the technique is still I/O bound -
 a bit of extra post-I/O CPU work won't hurt much. OK, cool.

I still think that an initial implementation could be done with a plain
bitmap kind of bitmap, if we are content with storing one bit per page
only - a simple page bitmap for a 1 TB table with 8 kB pages takes only 16
MB, and that's a backend's private memory, not the scarcer shared memory.

It takes only 4 MB with 32 kB pages.

I guess that anyone working with terabyte size tables can afford a few
megabytes of RAM for query processing.
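
The arithmetic as a small C sketch (hypothetical helper):

#include <stdint.h>

/* Bytes needed for a one-bit-per-page bitmap:
 * 1 TB / 8 kB pages  -> 134,217,728 pages -> 16 MB of bitmap;
 * 1 TB / 32 kB pages ->  33,554,432 pages ->  4 MB. */
static uint64_t
page_bitmap_bytes(uint64_t table_bytes, uint64_t page_bytes)
{
    uint64_t pages = table_bytes / page_bytes;

    return (pages + 7) / 8;
}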


Hannu




Re: [HACKERS] Automatic view update rules

2004-10-26 Thread Bernd Helmle
--On Tuesday, October 26, 2004 13:02:27 +0200 Bernd Helmle 
[EMAIL PROTECTED] wrote:

 Sorry, forgot the subject :(
--
 Bernd


Re: [HACKERS] plans for bitmap indexes?

2004-10-26 Thread Hannu Krosing
On K, 2004-10-20 at 01:52, Mark Kirkwood wrote:

 I don't believe that read-only is required. The update/insert 
 performance impact of bitmap indexes is however very high (in Oracle's 
 implementation anyway) - to the point where many sites drop them before 
 adding in new data, and recreate 'em afterwards!
 
 In the event that there is a benefit from the small on-disk footprint, 
 the insert/update throughput implications will need to be taken into 
 account.

I repeat here my earlier proposal of making the bitmap indexes
page-level and clustering data automatically on the AND of all defined
bitmap indexes.

This would mostly solve this problem too, as there will be only one
insert per page per index (when the first tuple is inserted) and one
delete (when the page gets empty).

This has a downside of suboptimal space usage but this should not (tm)
be an issue for large tables, where most combinations of bits will get
enough hits to fill several pages.

Such clustering would also help (probably a lot) all queries actually
using these indexes.

-
Hannu



Re: [PATCHES] [HACKERS] ARC Memory Usage analysis

2004-10-26 Thread Simon Riggs
On Tue, 2004-10-26 at 09:49, Simon Riggs wrote:
 On Mon, 2004-10-25 at 16:34, Jan Wieck wrote: 
  The problem is, with a too small directory ARC cannot guesstimate what 
  might be in the kernel buffers. Nor can it guesstimate what recently was 
  in the kernel buffers and got pushed out from there. That results in a 
  way too small B1 list, and therefore we don't get B1 hits when in fact 
  the data was found in memory. B1 hits are what increase the T1target, 
  and since we are missing them with a too small directory size, our 
  implementation of ARC is probably using a T2 size larger than the 
  working set. That is not optimal.
 
 I think I have seen that the T1 list shrinks too much, but need more
 tests...with some good test results
 
  If we would replace the dynamic T1 buffers with a max_backends*2 area of 
  shared buffers, use a C value representing the effective cache size and 
  limit the T1target on the lower bound to effective cache size - shared 
  buffers, then we basically moved the T1 cache into the OS buffers.
 
 Limiting the minimum size of T1len to be 2 * max_backends sounds like an
 easy way to prevent overbalancing of T2, but I would like to follow up
 on ways to have T1 naturally stay larger. I'll do a patch with this idea
 in, for testing. I'll call this "T1 minimum size" so we can discuss it.
 

Don't know whether you've seen this latest update on the ARC idea:
Sorav Bansal and Dharmendra S. Modha, 
CAR: Clock with Adaptive Replacement,
in Proceedings of the USENIX Conference on File and Storage Technologies
(FAST), pages 187--200, March 2004.
[I picked up the .pdf here http://citeseer.ist.psu.edu/bansal04car.html]

In that paper Bansal and Modha introduce an update to ARC called CART,
which they say is more appropriate for databases. Their idea is to
introduce a "temporal locality window" as a way of making sure that
blocks referenced twice within a short period don't fall out of T1, though
they don't make it into T2 either. Strangely enough, the temporal locality
window is made by increasing the size of T1... in an adaptive way, of
course.

If we were going to put a limit on the minimum size of T1, then this
would put a minimal temporal locality window in place...rather than
the increased complexity they go to in order to make T1 larger. I note
test results from both the ARC and CAR papers showing that T2 usually
represents most of C, so the observation that T1 is very small is not
atypical. That implies that the cost of managing the temporal locality
window in CART is usually wasted, even though it does pay off as an
overall benefit: the results show that CART is better than ARC over the
whole range of cache sizes tested (16MB to 4GB) and workloads (apart
from 1 out of 22).

If we were to implement a minimum size of T1, related as suggested to the
number of users, then this would provide a reasonable approximation of
the temporal locality window. This wouldn't prevent T1 from adapting to
be larger than this when required.

Jan has already optimised ARC for PostgreSQL by adding a special
lookup on transactionId, required to optimise for the double cache
lookup of select/update that occurs on a T1 hit. It seems likely that
this could be removed as a result of having a larger T1.

I'd suggest limiting T1 to be a value of:
shared_buffers <= 1000   T1limit = max_backends * 0.75
shared_buffers <= 2000   T1limit = max_backends
shared_buffers <= 5000   T1limit = max_backends * 1.5
shared_buffers >  5000   T1limit = max_backends * 2
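
Expressed as a hypothetical helper (illustrative only, not a patch):

/* Suggested floor for the T1 list size, tiered by shared_buffers. */
static int
t1_minimum(int shared_buffers, int max_backends)
{
    if (shared_buffers <= 1000)
        return (max_backends * 3) / 4;
    if (shared_buffers <= 2000)
        return max_backends;
    if (shared_buffers <= 5000)
        return (max_backends * 3) / 2;
    return max_backends * 2;
}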

I'll try some tests with both
- minimum size of T1
- update optimisation removed

Thoughts?

-- 
Best Regards, Simon Riggs




Re: [HACKERS] Possible make_oidjoins_check Security Issue

2004-10-26 Thread Bruce Momjian
Tom Lane wrote:
 Neil Conway [EMAIL PROTECTED] writes:
  On Wed, 2004-10-20 at 06:18, Rod Taylor wrote:
  http://secunia.com/advisories/12860/
 
  This seems like a rather inconsequential problem,
 
 Indeed, since ordinary users have no use for make_oidjoins_check.
 It's surely very implausible that anyone would run it as root; and
 even if someone did, the attacker cannot control what gets written.
 
  but it should be fixed. The first two ideas that come to mind: use
  temporary files in $PWD rather than /tmp, or create a subdirectory in
  /tmp to use for the temporary files.
 
 I believe that the subdirectory idea is also vulnerable without great
 care.

I believe the proper way to handle this is a new directory under /tmp. 
I use this in my local scripts:

TMP=/tmp/$$
OMASK=`umask`
umask 077
if ! mkdir $TMP
then    echo "Can't create temporary directory $TMP." 1>&2
        exit 1
fi
umask $OMASK
unset OMASK

It basically makes sure it creates a new directory under /tmp with a
umask that guarantees no one else can create a file in that directory. 
All temp files are accessed as $TMP/XXX.
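
For comparison, where it is available, mkdtemp(3) performs the same
create-private-directory dance atomically (the directory is created with
mode 0700); a minimal C sketch with an illustrative template name:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    char template[] = "/tmp/oidjoins.XXXXXX";

    if (mkdtemp(template) == NULL)   /* declared in <stdlib.h> on most systems */
    {
        perror("mkdtemp");
        return 1;
    }
    printf("created %s\n", template);
    return 0;
}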

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] rmtree() failure on Windows

2004-10-26 Thread Andrew Dunstan

Here is some more info. Below is a trace from dropdb. There is a loop 
around the rmdir() calls which I have set to time out at 600 seconds. 
The call eventually succeeds after around 300 seconds (I've seen this 
several times). It looks like we are the victim of some caching - the 
directory still thinks it has some of the files it has told us we have 
deleted successfully.

Bottom line, this is a real mess. Surely postgres is not the only 
application in the world that wants to be able to delete a directory 
tree reliably on Windows. What do other apps do?

cheers
andrew
2004-10-26 10:26:35 [2496] LOG:  connection received: host=127.0.0.1 
port=1918
2004-10-26 10:26:35 [2496] LOG:  connection authorized: user=pgrunner 
database=template1
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/1247
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/1249
[... some thirty further identical "unlinking" entries for the remaining 
files in base/17230 ...]

Re: [HACKERS] Possible make_oidjoins_check Security Issue

2004-10-26 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 I believe the proper way to handle this is a new directory under /tmp. 

It's definitely not worth the trouble.  I looked at what configure does
to make /tmp subdirectories portably, and it is spectacularly ugly
(not to mention long).  If make_oidjoins_check were a user-facing tool
that would be one thing, but it isn't ...

regards, tom lane



[HACKERS] making pdf of docs

2004-10-26 Thread Dennis Bjorklund
Is there something wrong that makes it impossible to build the docs as a
PDF?

I started a build 4 hours ago, and it has still not finished (stuck at
100% CPU on my old 800 MHz, 1 GB RAM machine).

I know that openjade is very slow so for the first 3 hours I didn't worry.
Now I'm starting to think that it will never finish.

-- 
/Dennis Björklund




Re: [HACKERS] making pdf of docs

2004-10-26 Thread Kris Jurka


On Tue, 26 Oct 2004, Dennis Bjorklund wrote:

 Is there something wrong that makes it impossible to build the doc as a 
 pdf?

My experience is that the latest openjade crashes.  The latest jade takes 
about 10 days on an Athlon 1600, but I can build it in a very reasonable 
timeframe with an older version of jade.

Well, scratch that.  I just checked the jade versions and they are
identical; perhaps different stylesheets or something?  For now all I know
is that it works on Debian stable, but takes forever on unstable.  I'll 
have to do some more investigating to try and find the real difference.


Kris Jurka



Re: [PATCHES] [HACKERS] ARC Memory Usage analysis

2004-10-26 Thread Greg Stark

Curt Sampson [EMAIL PROTECTED] writes:

 On Tue, 26 Oct 2004, Greg Stark wrote:
 
  I see mmap or O_DIRECT being the only viable long-term stable states. My
  natural inclination was the former but after the latest thread on the subject
  I suspect it'll be forever out of reach. That makes O_DIRECT and a
  Postgres-managed cache the only real choice. Having both caches is just a
  waste of memory and a waste of CPU cycles.
 
 I don't see why mmap is any more out of reach than O_DIRECT; it's not
 all that much harder to implement, and mmap (and madvise!) is more
 widely available.

Because there's no way to prevent a write-out from occurring and no way to be
notified by mmap before a write-out occurs, and Postgres wants to do its WAL
logging at that time if it hasn't already happened.

 But if using two caches is only costing us 1% in performance, there's
  not really much point.

Well, firstly, it depends on the work profile. It can probably get much higher
than we saw in that profile if your workload is causing more fresh buffers to
be fetched.

Secondly, it also reduces the amount of cache available. If you have 256 MB of
RAM with about 200 MB free, and 40 MB of RAM set aside for Postgres's buffer
cache, then you really only get 160 MB. It's costing you 20% of your cache, and
reducing the cache hit rate accordingly.

Thirdly, the kernel doesn't know as much as Postgres about the load. Postgres
could optimize its use of cache based on whether it knows the data is being
loaded by a vacuum or sequential scan rather than an index lookup. In practice
Postgres has gone with ARC, which I suppose a kernel could implement anyway,
but AFAIK neither Linux nor BSD chooses to do anything like it.

-- 
greg




Re: [HACKERS] making pdf of docs

2004-10-26 Thread Mike Mascari
Dennis Bjorklund wrote:
Is there something wrong that makes it impossible to build the doc as a 
pdf?

I started a build 4 hours ago, and it has still not finished (stuck at
100% CPU on my old 800Mhz 1G RAM machine).
I know that openjade is very slow so for the first 3 hours I didn't worry.
Now I'm starting to think that it will never finish.
I've never tried building PDF from PostgreSQL DocBook source. However, 
in other DocBook documents, I've found that if there is an embedded 
image that is too large to fit on a single page, various PDF renderers 
will paginate the image onto the next page, discover it is too large to 
fit on the next page, generate a page break, and the process continues 
ad infinitum.

Maybe a recent large image was added to the docs?
FWIW,
Mike Mascari


Re: [HACKERS] plans for bitmap indexes?

2004-10-26 Thread Greg Stark

Hannu Krosing [EMAIL PROTECTED] writes:

 I repeat here my earlier proposal of making the bitmap indexes
 page-level and clustering data automatically on AND of all defined
 bitmap indexes. 

The problem with page-level bitmaps is that they could be much less effective.
Consider a query like "WHERE foo = ? AND bar = ? AND baz = ?" where each of
those matches about 1% of your tuples. If you have 100 tuples per page then
each of those bitmaps will find a tuple in about half the pages. So the
resulting AND will find about 1/8th of the pages as candidates. In reality the
number of pages it should have to fetch should be more like 1 in a million.

The other problem is that persistent on-disk indexes require more work
to update. You would have to recheck every other tuple in the page to
recalculate the bit value, instead of just being able to flip one bit.

-- 
greg




[HACKERS] Postgres performs a Seq Scan instead of an Index Scan!

2004-10-26 Thread Jos van Roosmalen
Hello,

I have a little question. Why does PostgreSQL perform a Seq Scan in the 
following SELECT statement instead of an Index Scan?

I have an index on (ATTR1,ATTR2,ATTR3), so why is PostgreSQL not 
performing an index keyed read in the SELECT?

I agree that the tables are empty, so maybe this influences the decision 
to do a Seq Scan, but my app uses a DB with around 100,000 records and 
it still does a Seq Scan.

CREATE TABLE TESTTABLE (ATTR1 INT8,ATTR2 INT4,ATTR3 TIMESTAMP);
CREATE UNIQUE INDEX TESTINDEX ON TESTTABLE(ATTR1,ATTR2,ATTR3);
EXPLAIN SELECT * FROM TESTTABLE WHERE ATTR1=1 AND ATTR2=2 AND 
ATTR3='2004-01-01';

Result in:

                         QUERY PLAN
------------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..27.50 rows=1 width=20)
   Filter: ((attr1 = 1) AND (attr2 = 2) AND (attr3 = '2004-01-01 
00:00:00'::timestamp without time zone))
(2 rows)

If I add an INDEXHELPER it helps a bit. But it's not a 100% index scan.
CREATE TABLE TESTTABLE (ATTR1 INT8,ATTR2 INT4,ATTR3 TIMESTAMP);
CREATE UNIQUE INDEX TESTINDEX ON TESTTABLE(ATTR1,ATTR2,ATTR3);
CREATE INDEX INDEXHELPER ON TESTTABLE(ATTR3);
EXPLAIN SELECT * FROM TESTTABLE WHERE ATTR1=1 AND ATTR2=2 AND 
ATTR3='2004-01-01';

                         QUERY PLAN
------------------------------------------------------------------
 Index Scan using indexhelper on testtable  (cost=0.00..17.09 rows=1 
width=20)
   Index Cond: (attr3 = '2004-01-01 00:00:00'::timestamp without time zone)
   Filter: ((attr1 = 1) AND (attr2 = 2))
(3 rows)

Changing from TIMESTAMP to DATE doesn't help (I am not using the time 
component in my app):

CREATE TABLE TESTTABLE (ATTR1 INT8,ATTR2 INT4,ATTR3 DATE);
CREATE UNIQUE INDEX TESTINDEX ON TESTTABLE(ATTR1,ATTR2,ATTR3);
EXPLAIN SELECT * FROM TESTTABLE WHERE ATTR1=1 AND ATTR2=2 AND 
ATTR3='2004-01-01';

                         QUERY PLAN
------------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..27.50 rows=1 width=16)
   Filter: ((attr1 = 1) AND (attr2 = 2) AND (attr3 = '2004-01-01'::date))
(2 rows)

Thanks in Advance,
Jos


Re: [HACKERS] Postgres performs a Seq Scan instead of an Index Scan!

2004-10-26 Thread James Robinson
On Oct 26, 2004, at 12:12 PM, Jos van Roosmalen wrote:
ATTR1 INT8
Looks like your column is int8, yet your query is sending in an int4. 
Therefore the index is not used. This is fixed in PG 8.0. In the 
meantime, you can:

SELECT * FROM TESTTABLE WHERE ATTR1=1::INT8 ...

which explicitly casts the literal int4 to an int8, making the int8 
column index usable.


James Robinson
Socialserve.com


Re: [HACKERS] Postgres performs a Seq Scan instead of an Index Scan!

2004-10-26 Thread Jochem van Dieten
On Tue, 26 Oct 2004 18:12:36 +0200, Jos van Roosmalen wrote:
 
 I have a little question. Why performs Postgresql a Seq. Scan in the
 next Select statement instead of a Index Read?

That is a FAQ: http://www.postgresql.org/docs/faqs/FAQ.html#4.8


Please direct any further questions of this nature that are not
covered in the FAQ or the documentation to the pgsql-performance list.

Jochem



Re: [HACKERS] Postgres performs a Seq Scan instead of an Index Scan!

2004-10-26 Thread Kurt Roeckx
On Tue, Oct 26, 2004 at 06:12:36PM +0200, Jos van Roosmalen wrote:
 
 CREATE TABLE TESTTABLE (ATTR1 INT8,ATTR2 INT4,ATTR3 TIMESTAMP);
 CREATE UNIQUE INDEX TESTINDEX ON TESTTABLE(ATTR1,ATTR2,ATTR3);
 EXPLAIN SELECT * FROM TESTTABLE WHERE ATTR1=1 AND ATTR2=2 AND 
 ATTR3='2004-01-01';


try:
explain select * from testtable where attr1=1::int8 and attr2=2
and attr3='2004-01-01';

Or change the int8 to bigint.

Isn't this in some FAQ yet?

It will not do an index scan if the types don't match.


Kurt




Re: [HACKERS] rmtree() failure on Windows

2004-10-26 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Here is some more info. Below is a trace from dropdb. There is a loop 
 around the rmdir() calls which I have set to time out at 600 seconds. 
 The call eventually succeeds after around 300 seconds (I've seen this 
 several times). It looks like we are the victim of some caching - the 
 directory still thinks it has some of the files it has told us we have 
 deleted successfully.

If you rescan the directory after deleting the files, does it show
as empty?

 Bottom line, this is a real mess. Surely postgres is not the only 
 application in the world that wants to be able to delete a directory 
 tree reliably on Windows. What do other apps do?

I'm wondering if this is a side effect of the way win32_open does
things.  It's hard to believe that rmdir is that bogus in general,
but perhaps win32_open is causing us to exercise a corner case?

regards, tom lane



Re: [HACKERS] to_char/to_number loses sign

2004-10-26 Thread Tom Lane
Karel Zak [EMAIL PROTECTED] writes:
 Yes, you're right. It is strange, but NUM_S is missing there. The conversion
 from string to number is the less stable part of formatting.c...

 The patch is in the attachment.

This patch causes the regression tests to fail.  I think you need to
consider the to_char() side of it more carefully.

regards, tom lane



Re: [HACKERS] Postresql 8.0 Beta 3 - SELECT ... FOR UPDATE

2004-10-26 Thread Bruce Momjian

There is a statement_timeout that will control how long a statement can
execute before being cancelled.  We have never agreed that controlling
how long we wait for an individual lock is valuable.

---

Robert Treat wrote:
 On Thursday 21 October 2004 06:44, you wrote:
  Hi
 
  Has the timeout on SELECT ... FOR UPDATE (non-blocking lock) been
  implemented already, and/or is it possible to know if the lock exists on
  the specified ROW before executing the SELECT?
 
 No.
 
  Please note: our need is the timeout/verify at the ROW level, not at the
  table level.
 
  Is it already OK? Is it in the TODO list?
  Can you suggest an alternative method?
 
  Thank you.
 
 You would need a more extensive implementation of row level locks than 
 PostgreSQL currently offers. There have been discussions about this in the 
 past, but afaik no one is actively working on it.  You can probably find more 
 info in the archives about it, also I believe it is on the TODO list, so you 
 might find some more detail by looking there.  
 
 -- 
 Robert Treat
 Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



[HACKERS] New compile warnings in CVS

2004-10-26 Thread Bruce Momjian
I just updated my CVS copy and am seeing four new warnings from
pgstat.c:

pgstat.c:2352: warning: variable `dbentry' might be clobbered by
`longjmp' or `vfork'
pgstat.c:2360: warning: variable `havebackends' might be clobbered by
`longjmp' or `vfork'
pgstat.c:2362: warning: variable `use_mcxt' might be clobbered by
`longjmp' or `vfork'
pgstat.c:2363: warning: variable `mcxt_flags' might be clobbered by
`longjmp' or `vfork'

Any ideas on this?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] New compile warnings in CVS

2004-10-26 Thread Bruce Momjian
Bruce Momjian wrote:
 I just updated my CVS copy and am seeing four new warnings from
 pgstat.c:
   
   pgstat.c:2352: warning: variable `dbentry' might be clobbered by
   `longjmp' or `vfork'
   pgstat.c:2360: warning: variable `havebackends' might be clobbered by
   `longjmp' or `vfork'
   pgstat.c:2362: warning: variable `use_mcxt' might be clobbered by
   `longjmp' or `vfork'
   pgstat.c:2363: warning: variable `mcxt_flags' might be clobbered by
   `longjmp' or `vfork'
 
 Any ideas on this?

Oh this is BSD/OS 4.3 using gcc:

gcc version 2.95.3 20010315 (release)

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] rmtree() failure on Windows

2004-10-26 Thread Andrew Dunstan

Tom Lane wrote:
 Andrew Dunstan [EMAIL PROTECTED] writes:
  Here is some more info. Below is a trace from dropdb. There is a loop
  around the rmdir() calls which I have set to time out at 600 seconds.
  The call eventually succeeds after around 300 seconds (I've seen this
  several times). It looks like we are the victim of some caching - the
  directory still thinks it has some of the files it has told us we have
  deleted successfully.

 If you rescan the directory after deleting the files, does it show
 as empty?

No! That's how I got the list of files it still thinks are there.
Gross, eh?

  Bottom line, this is a real mess. Surely postgres is not the only
  application in the world that wants to be able to delete a directory
  tree reliably on Windows. What do other apps do?

 I'm wondering if this is a side effect of the way win32_open does
 things.  It's hard to believe that rmdir is that bogus in general,
 but perhaps win32_open is causing us to exercise a corner case?

I don't know. I tried to reproduce it in a simple case using
fopen/fclose and wasn't able to.

cheers
andrew


Re: [HACKERS] New compile warnings in CVS

2004-10-26 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 I just updated my CVS copy and am seeing four new warnings from
 pgstat.c:

   pgstat.c:2352: warning: variable `dbentry' might be clobbered by
   `longjmp' or `vfork'
   pgstat.c:2360: warning: variable `havebackends' might be clobbered by
   `longjmp' or `vfork'
   pgstat.c:2362: warning: variable `use_mcxt' might be clobbered by
   `longjmp' or `vfork'
   pgstat.c:2363: warning: variable `mcxt_flags' might be clobbered by
   `longjmp' or `vfork'

No doubt this is from the PG_TRY that Neil added a couple days ago.
I think he is going to take it out again in favor of using AllocateFile,
so ignore the warnings for now (they're obviously bogus anyway).

regards, tom lane



[HACKERS] 8.0b4: COMMIT outside of a transaction echoes ROLLBACK

2004-10-26 Thread Ian Barwick
just wondering:

test=> select version();
                            version
--------------------------------------------------------------
 PostgreSQL 8.0.0beta4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
 3.3.3 (SuSE Linux)
(1 row)

test=> begin;
BEGIN
test=> commit;
COMMIT
test=> commit;
WARNING:  there is no transaction in progress
ROLLBACK

Is there any reason ROLLBACK and not COMMIT is echoed here?

Ian Barwick
[EMAIL PROTECTED]



Re: [HACKERS] timestamp with time zone a la sql99

2004-10-26 Thread Bruce Momjian

Added to TODO:

* Once we expand timestamptz to bigger than 8 bytes, there's essentially
  zero cost to making it 12 bytes, and for that matter we could go to 16
  without much penalty, because of alignment considerations.

---

Tom Lane wrote:
 Dennis Bjorklund [EMAIL PROTECTED] writes:
  So if I understand you correctly you are planning to extend the current
  timestamp type to work with both named time zones and HH:MM ones? I didn't
  think you wanted the last one since your plan was to store a UTC+OID where
  the OID pointed to a named time zone. And I guess that you don't plan to
  add 00:00, 00:01, 00:02, ... as named zones with an OID.
 
 I missed getting back to you on this, but I think we can do both.  Some
 random points:
 
 * Once we expand timestamptz to bigger than 8 bytes, there's essentially
 zero cost to making it 12 bytes, and for that matter we could go to 16
 without much penalty, because of alignment considerations.  So there's
 plenty of space.
 
 * What we need is to be able to represent either a fixed offset from UTC
 or a reference of some kind to a zic database entry.  The most
 bit-splurging way of doing the former is a signed offset in seconds from
 Greenwich, which would take 17 bits.  It'd be good enough to represent
 the offset in minutes, which needs only 11 bits.
 
 * I suggested OIDs for referencing zic entries, but we don't have to do
 that; any old mapping table will do.  16 bits would surely be plenty to
 assign a unique label to every present and future zic entry.
 
 * My inclination therefore is to extend timestamptz with two 16-bit
 fields, one being the offset from UTC (in minutes) and one being the
 zic identifier.  If the identifier is zero then it's a straight numeric
 offset from UTC and the offset field is all you need (this is the SQL
 spec compatible case).  If the identifier is not zero then it gives you
 an index to look up the timezone rules.  However, there is no need for
 the offset field to go to waste; we should store the offset anyway,
 since that might save a trip to the zic database in some cases.
 
 * It's not clear to me yet whether the stored offset in the second case
 should be the zone's standard UTC offset (thus always the same for a
 given zone ID) or the current-time offset for the timestamp (thus
 different if the timestamp is in daylight-savings or standard time).
 
 * If we store the current-time offset then it almost doesn't matter
 whether the timestamp itself is stored as a UTC or local time value;
 you can trivially translate either to the other by adding or subtracting
 the offset (*60).  But I'm inclined to store UTC for consistency with
 past practice, and because it will make comparisons a bit faster: you
 can compare the timestamps without adjusting first.  Generally I think
 comparisons ought to be the best-optimized operations in a Postgres
 datatype, because index operations will do a ton of 'em.  (We definitely
 do NOT want to have to visit the zic database in order to compare two
 timestamptz values.)
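
For concreteness, a hypothetical C layout for the proposal sketched
above (field names and order are illustrative, not actual PostgreSQL
code):

#include <stdint.h>

/* zone_id == 0 means a plain numeric offset (the SQL-spec case); a
 * nonzero zone_id indexes a zic-derived mapping table, with
 * utc_offset_min caching the offset to avoid a zic lookup. */
typedef struct
{
    int64_t time;           /* the instant itself, stored in UTC */
    int16_t utc_offset_min; /* offset from UTC in minutes */
    int16_t zone_id;        /* 0 = none; else index into a zone table */
} TimestampTzWide;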
 
   regards, tom lane
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS]

2004-10-26 Thread Hannu Krosing
On T, 2004-10-26 at 14:02, Bernd Helmle wrote:

 - gram.y is only an ugly hack to get the CHECK OPTION working. needs deeper 
 efforts, because it makes WITH a reserved keyword

IMHO it should be a reserved keyword. I once wrangled to fit ANSI SQL
recursive queries into the postgres grammar, and the only way I could do
that also forced me to make WITH a reserved keyword.

Thus I think that a reserved keyword is what it was meant to be in the
first place ;)

--
Hannu




[HACKERS] DBT-3 Query 2 EXPLAIN ANALYZE differences

2004-10-26 Thread Mark Wong
I was doing some testing with DBT-3 on our 8-way STP systems and
noticed a significant difference in the execution of Query 2 using
8.0beta3.

Here is the query template we're using:
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone,
   s_comment
from part, supplier, partsupp, nation, region
where p_partkey = ps_partkey
  and s_suppkey = ps_suppkey
  and p_size = :1
  and p_type like '%:2'
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = ':3'
  and ps_supplycost = ( select min(ps_supplycost)
from partsupp, supplier, nation, region
where p_partkey = ps_partkey
  and s_suppkey = ps_suppkey
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = ':3'
  )
order by s_acctbal desc, n_name, s_name, p_partkey

This first run executes it pretty fast:
http://khack.osdl.org/stp/298338/  1555.414 ms

This second run executes it relatively slow:
http://khack.osdl.org/stp/298340/  42532.855 ms

The plans are different, and I suspect that's where the differences lie.
For brevity (and readability) I won't copy the plans here, but I'll
provide the links.  Search for 'PERF1.POWER.Q2' in the file; it's the
second query executed, and you'll notice the differences under the
SubPlan:

The first run:
http://khack.osdl.org/stp/298338/results/run/power_query.result

The second run:
http://khack.osdl.org/stp/298340/results/run/power_query.result

I know Jenny has previously presented a problem that was solved by
doing a setseed(0), but I noticed the kit doesn't do that anymore.
Does anyone know if this might be the same or a related issue?

Thanks,
Mark



Re: [HACKERS] plans for bitmap indexes?

2004-10-26 Thread Hannu Krosing
On T, 2004-10-26 at 18:42, Greg Stark wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
 
  I repeat here my earlier proposal of making the bitmap indexes
  page-level and clustering data automatically on AND of all defined
  bitmap indexes. 
 
  The problem with page-level bitmaps is that they could be much less effective.
  Consider a query like "WHERE foo = ? AND bar = ? AND baz = ?" where each of
  those matches about 1% of your tuples. If you have 100 tuples per page then
  each of those bitmaps will find a tuple in about half the pages. So the
  resulting AND will find about 1/8th of the pages as candidates. In reality the
  number of pages it should have to fetch should be more like 1 in a million.
  
  The other problem is that persistent on-disk indexes require more work
  to update. You would have to recheck every other tuple in the page to
  recalculate the bit value, instead of just being able to flip one bit.

Read again ;)

The per-page clustering would make sure that all the tuples would be on
1 (or on a few) pages.

And as for updating the index, you have to do it only once per
100 pages ;)


Hannu




Re: [HACKERS] plans for bitmap indexes?

2004-10-26 Thread Hannu Krosing
On T, 2004-10-26 at 23:53, Hannu Krosing wrote:
 On T, 2004-10-26 at 18:42, Greg Stark wrote:
  Hannu Krosing [EMAIL PROTECTED] writes:
  
   I repeat here my earlier proposal of making the bitmap indexes
   page-level and clustering data automatically on AND of all defined
   bitmap indexes. 
  
  The problem with page-level bitmaps is that they could be much less effective.
  Consider a query like 'WHERE foo = ? AND bar = ? AND baz = ? where each of
  those matches about 1% of your tuples. If you have 100 tuples per page then
  each of those bitmaps will find a tuple in about half the pages. So the
  resulting AND will find about 1/8th of the pages as candidates. In reality the
  number of pages it should have to fetch should be more like 1 in a million.

Another way to solve the unequal-number-of-tuples-per-page problem would
be to have a fixed-length bitmap with rollover (mod length) for each page.

So your 100-tuples-per-page-on-average table would get a 32 (or 64) bit
per-page bitmap where bits 1, 33, 65 and 97 would all be in the same
position (for 32 bits), but one could still do fast ANDs and ORs with a
high degree of accuracy.
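
A sketch of that scheme (hypothetical types and names, not from any
patch):

#include <stdint.h>

/* Mod-length page bitmap: tuple i on a page sets bit (i % 32), so
 * tuples 1, 33, 65 and 97 share a position.  ANDs and ORs stay cheap
 * at the cost of occasional false positives, resolved by rechecking
 * the page. */
typedef uint32_t PageBitmap;

static void
set_tuple_bit(PageBitmap *bm, unsigned tuple_index)
{
    *bm |= (PageBitmap) 1 << (tuple_index % 32);
}

static int
page_may_match(PageBitmap a, PageBitmap b)
{
    return (a & b) != 0;    /* candidate page if any bit position matches */
}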

I guess the per-page clustering idea described in my previous mail can
even be extended inside the pages (i.e. cluster on the same bits in a
2/4/8/16/32-bit page bitmap) if simple per-page bitmaps would waste too
much space (many different values, few actual rows - i.e. not a good
candidate for real bitmap indexes ;-p)

  The other problem is that for persist on-disk indexes they require more work
  to update. You would have to recheck every other tuple in the page to
  recalculate the bit value instead of just being able to flip one bit.
 
 Read again ;)
 
 the per-page clustering would make sure that all the tuples would be on
 1 (or on a few) pages.
 
 and what comes to updating the index, you have to do it only once per
 100 pages ;)

This kind of clustering index works best when created on an empty table,
so all tuples can be inserted on their rightful pages.

If this kind of BM index is created on a table with some data, we need
an additional bitmap for "gray" pages - that is, pages containing tuples
matching several combinations of index bits.

The way to sharpen a table with gray pages would be either a CLUSTER
command or VACUUM (which could check for same-bit-combination-ness).

At least an empty page would initially (or after becoming empty
during vacuum) be marked non-gray, and it should also never become gray
unless a new bitmap index is added.

-
Hannu





Re: [HACKERS] plans for bitmap indexes?

2004-10-26 Thread Andre Maasikas
Hannu Krosing wrote:
 the per-page clustering would make sure that all the tuples would be on
 1 (or on a few) pages.

I understand that you can cluster on one column, but how do you do it for
indexes on other columns?

BTW, lossy variants also lose count() and GROUP BY served from the index
alone.

 and what comes to updating the index, you have to do it only once per
 100 pages ;)

Sorry, how does that work? If I update foo = 'bar' -> 'baz', I can flip
the 'baz' bit on right away, but I have to check every other row to see
if I can turn the 'bar' bit off.
Andre


Re: [HACKERS] Unixware 714 pthreads

2004-10-26 Thread Bruce Momjian

The only help I can be is that on Unixware (only) the backend is
compiled with threading enabled.  This might be showing some thread
bugs.


---

[EMAIL PROTECTED] wrote:
 Hi everyone,
 
 I need help debugging a problem I have on Unixware 714 and beta3.
 postgresql make check hangs on the plpgsql test when compiled with
 --enable-thread-safety.
 
 It hangs on select block_me();
 
 This select should be canceled by the set statement_timeout=1000; instead,
 the backend is 100% CPU bound and only kill -9 can kill it.
 
 It works OK when compiled without --enable-thread-safety.
 
 I've tried almost everything I could think of, but not knowing so much
 about threads and the PG source code, I ask that someone help me find
 a way to debug this. It worked up until beta2, but I'm not sure
 block_me() was there.
 
 I really need someone to tell me where to begin.
 
 TIA
 
 -- 
 Olivier PRENANT   Tel: +33-5-61-50-97-00 (Work)
 6, Chemin d'Harraud Turrou   +33-5-61-50-97-01 (Fax)
 31190 AUTERIVE   +33-6-07-63-80-64 (GSM)
 FRANCE  Email: [EMAIL PROTECTED]
 --
 Make your life a dream, make your dream a reality. (St Exupery)
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] rmtree() failure on Windows

2004-10-26 Thread Reini Urban
Andrew Dunstan wrote:
 Here is some more info. Below is a trace from dropdb. There is a loop 
 around the rmdir() calls which I have set to time out at 600 seconds. 
 The call eventually succeeds after around 300 seconds (I've seen this 
 several times). It looks like we are the victim of some caching - the 
 directory still thinks it has some of the files it has told us we have 
 deleted successfully.

A 300 sec (!) fs timeout is really broken.
This looks more like a locking or network timeout issue.
What error codes does unlink(3) return?
Why don't you use DeleteFileA() instead of unlink()?
Or, even better, why don't you use this delete-on-close snippet instead:
HANDLE h;
h = CreateFile (win32_name, 0, FILE_SHARE_READ, sec_none_nih,
                OPEN_EXISTING, FILE_FLAG_DELETE_ON_CLOSE, 0);
if (h != INVALID_HANDLE_VALUE)
  {
    (void) SetFileAttributes (win32_name, (DWORD) win32_name);
    BOOL res = CloseHandle (h);
    //syscall_printf ("%d = CloseHandle (%p)", res, h);
    if (GetFileAttributes (win32_name) == INVALID_FILE_ATTRIBUTES)
      {
        //syscall_printf ("CreateFile (FILE_FLAG_DELETE_ON_CLOSE) succeeded");
        goto ok;
      }
    else
      {
        //syscall_printf ("CreateFile (FILE_FLAG_DELETE_ON_CLOSE) failed");
        SetFileAttributes (win32_name, (DWORD) win32_name &
                           ~(FILE_ATTRIBUTE_READONLY | FILE_ATTRIBUTE_SYSTEM));
      }
  }

/* Try a delete with attributes reset */
if (DeleteFile (win32_name))
  {
    syscall_printf ("DeleteFile after CreateFile/CloseHandle succeeded");
    goto ok;
  }
A timeout that long should only happen with ERROR_SHARING_VIOLATION on NT
systems; that would make this a concurrency problem. Win95 will not
return ERROR_SHARING_VIOLATION, only ERROR_ACCESS_DENIED.
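
If it is ERROR_SHARING_VIOLATION, a bounded retry around the delete,
rather than spinning on rmdir() for minutes, would look roughly like
this (a sketch only, untested):

#include <windows.h>

/* Retry DeleteFileA while some other process still holds the file open. */
static int
unlink_with_retry(const char *path, int max_retries)
{
	int i;

	for (i = 0; i < max_retries; i++)
	{
		DWORD err;

		if (DeleteFileA(path))
			return 0;		/* deleted */
		err = GetLastError();
		if (err != ERROR_SHARING_VIOLATION && err != ERROR_ACCESS_DENIED)
			return -1;		/* a real failure; don't loop */
		Sleep(100);			/* give the other handle time to close */
	}
	return -1;				/* still locked after max_retries tries */
}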

...
2004-10-26 10:31:09 [2496] WARNING:  rmtree: rmdir took 274 secs/loops
2004-10-26 10:31:09 [2496] LOG:  disconnection: session time: 0:04:34.11 
user=pgrunner database=template1 host=127.0.0.1 port=1918
--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/
---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] Should bgwriter log checkpoint start/end?

2004-10-26 Thread Tom Lane
In previous releases it was possible to observe whether an automatic
checkpoint was in progress by looking to see if there was a postmaster
child process doing one.  In 8.0 this will not work because the bgwriter
is always there.  I am thinking that for tasks such as performance
debugging it would be a good idea if the bgwriter could emit postmaster
log messages at start and end of a checkpoint.  However, this should
probably not happen at the default LOG level since it would clutter the
logs with perfectly routine messages.  Any opinions about what elog
level to use for this?
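
For concreteness, a minimal sketch of what I have in mind, with DEBUG1
standing in only for whatever level we settle on:

/* In the bgwriter's checkpoint code, roughly: */
ereport(DEBUG1,
        (errmsg("checkpoint starting")));
/* ... write out dirty buffers, fsync, log the checkpoint record ... */
ereport(DEBUG1,
        (errmsg("checkpoint complete")));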

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] DBT-3 Query 2 EXPLAIN ANALYZE differences

2004-10-26 Thread Josh Berkus
Mark,

 The plans are different and I suspect thats where the differences lie.
 For brevity (and readability) I won't copy the plans here but I'll
 provide the links. Search for 'PERF1.POWER.Q2' in the file, it's the
 second query executed and you'll notice the differences under the
 SubPlan:

Yeah, the difference seems to be an arbitrary choice of which table to join on 
first.  The error is only +10 ms, but when it's being executed 3000 
times ...

Will delve into this when I can re-format it, and try to figure out why the 
planner is choosing what it is.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [PATCHES] [HACKERS] ARC Memory Usage analysis

2004-10-26 Thread Josh Berkus
Thomas,

 As a result, I was intending to inflate the value of
 effective_cache_size to closer to the amount of unused RAM on some of
 the machines I admin (once I've verified that they all have a unified
 buffer cache). Is that correct?

Currently, yes.  Right now, e_c_s is used just to inform the planner and make 
index vs. table scan and join order decisions.
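
As a toy illustration (this is not the planner's actual cost
arithmetic), the flavor of that decision is something like:

/* Toy model: the larger effective_cache_size is relative to the
 * relation, the more repeated index fetches are assumed to hit cache
 * rather than disk. */
static double
assumed_cached_fraction(double rel_pages, double effective_cache_size)
{
    if (rel_pages <= 0 || rel_pages <= effective_cache_size)
        return 1.0;                           /* relation may fit entirely */
    return effective_cache_size / rel_pages;  /* crude proportional guess */
}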

The problem which Simon is bringing up is part of a discussion about doing 
*more* with the information supplied by e_c_s.  He points out that it's not 
really related to the *real* probability of any particular table being 
cached.   At least, if I'm reading him right.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] New compile warnings in CVS

2004-10-26 Thread Neil Conway
On Wed, 2004-10-27 at 03:57, Tom Lane wrote:
 No doubt this is from the PG_TRY that Neil added a couple days ago.
 I think he is going to take it out again in favor of using AllocateFile,
 so ignore the warnings for now (they're obviously bogus anyway).

Sorry, I didn't see those compile warnings locally (using GCC 3.3).

I'll send the patch for AllocateFile() to -patches shortly.

-Neil



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] [HACKERS] ARC Memory Usage analysis

2004-10-26 Thread Curt Sampson
On Wed, 26 Oct 2004, Greg Stark wrote:

  I don't see why mmap is any more out of reach than O_DIRECT; it's not
  all that much harder to implement, and mmap (and madvise!) is more
  widely available.

 Because there's no way to prevent a write-out from occurring and no way to be
 notified by mmap before a write-out occurs, and Postgres wants to do its WAL
 logging at that time if it hasn't already happened.

I already described a solution to that problem in a post earlier in this
thread (a write queue on the block). I may even have described it on
this list a couple of years ago, that being about the time I thought
it up. (The mmap idea just won't die, but at least I wasn't the one to
bring it up this time. :-))
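
The invariant any such write queue has to preserve is just
WAL-before-data; in sketch form, with page_lsn, data_fd and page as
stand-ins:

/* Before a dirty page may reach disk, the log must be flushed past the
 * page's LSN.  With write() the buffer manager controls this ordering;
 * with plain mmap the kernel can write the page back at any moment,
 * which is the objection above. */
XLogFlush(page_lsn);            /* WAL up to the page's LSN hits disk first */
write(data_fd, page, BLCKSZ);   /* only then may the data page follow */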

 Well firstly it depends on the work profile. It can probably get much higher
 than we saw in that profile

True, but 1% is much, much lower than I'd expected. That tells me
that my intuitive idea of the performance model is wrong, which means,
for me at least, it's time to shut up or put up some benchmarks.

 Secondly it also reduces the amount of cache available. If you have 256M of
 ram with about 200M free, and 40Mb of ram set aside for Postgres's buffer
 cache then you really only get 160Mb. It's costing you 20% of your cache, and
 reducing the cache hit rate accordingly.

Yeah, no question about that.

 Thirdly the kernel doesn't know as much as Postgres about the load. Postgres
 could optimize its use of cache based on whether it knows the data is being
 loaded by a vacuum or sequential scan rather than an index lookup. In practice
 Postgres has gone with ARC which I suppose a kernel could implement anyways,
 but afaik neither linux nor BSD choose to do anything like it.

madvise().
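
Something like this minimal sketch, say (map_for_seqscan is a made-up
name):

#include <sys/types.h>
#include <sys/mman.h>

/* Tell the kernel an mmap'd relation segment will be read sequentially;
 * MADV_SEQUENTIAL favors aggressive read-ahead and early reclaim. */
static void *
map_for_seqscan(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

    if (p != MAP_FAILED)
        (void) madvise(p, len, MADV_SEQUENTIAL);
    return p;
}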

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.NetBSD.org
 Make up enjoying your city life...produced by BIC CAMERA

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] 8.0b4: COMMIT outside of a transaction echoes ROLLBACK

2004-10-26 Thread Bruce Momjian
Ian Barwick wrote:
 just wondering:
 
 test=> select version();
                               version
 ----------------------------------------------------------------------
  PostgreSQL 8.0.0beta4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
 3.3.3 (SuSE Linux)
 (1 row)
 
 test=> begin;
 BEGIN
 test=> commit;
 COMMIT
 test=> commit;
 WARNING:  there is no transaction in progress
 ROLLBACK
 
 Is there any reason ROLLBACK and not COMMIT is echoed here?

Because the transaction was not committed, but rather rolled back.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PATCHES] [HACKERS] ARC Memory Usage analysis

2004-10-26 Thread Scott Marlowe
On Mon, 2004-10-25 at 23:53, Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Tom Lane [EMAIL PROTECTED] writes:
  Another issue is what we do with the effective_cache_size value once we
  have a number we trust.  We can't readily change the size of the ARC
  lists on the fly.
 
  Huh? I thought effective_cache_size was just used as an factor the cost
  estimation equation.
 
 Today, that is true.  Jan is speculating about using it as a parameter
 of the ARC cache management algorithm ... and that worries me.

Because it's so often set wrong, I take it.  But if it's set right, and
it makes the database faster to pay attention to it, then I'd be in
favor of it.  Or at least having a switch to turn on the ARC buffer's
ability to look at it.

Or is it some other issue, having to do with whether knowing the
effective cache size would actually have a positive effect overall on
the ARC algorithm?


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] pg_dump test success

2004-10-26 Thread Christopher Kings-Lynne
Hi guys,
I just thought I'd let you know that I just dumped our production 
database with 8.0's pg_dumpall and reloaded it into a test 8.0 database.

It worked.
No errors.
For the first time in our company's history with PostgreSQL, we can 
upgrade without editing the dump file!!!

I feel like my life's work is complete :D
Chris
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match