Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Neil Conway
On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
> 2. Evaluate the performance of different hash index implementations
>and/or changes to the current implementation. My current plan is
>to keep the implementation as simple as possible and still provide
>the desired performance. Several hash index suggestions deal with
>changing the layout of the keys on a page to improve lookup
>performance, including reducing the bucket size to a fraction of
>a page or only storing the hash value on the page, instead of
>the index value itself.

You might find this patch useful:

http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php

It implements the "just store the hash in the index" idea; it also sorts
the entries in a bucket by the hash value, which allows binary search to
be used to locate candidate matches.

I was surprised that this didn't result in a performance improvement for
the benchmarks that I ran, but I never got around to investigating
further (either running more benchmarks or checking whether there was a
bug in the implementation).

Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
it up to HEAD if you'd like.

-Neil



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Martijn van Oosterhout
On Thu, Sep 06, 2007 at 01:08:59PM -0500, Kenneth Marshall wrote:
> Since we already have to check the actual tuple values for any index
> lookup in postgresql, we could only store the full hash value and the
> corresponding TIDs in the bucket. Then when we lookup an item by
> calculating its hash, if the exact hash is not present in the bucket,
> then we know that the item is not in the index.

Sounds like you'd be returning a bitmap for use with a bitmap scan.
That's a different take on other suggestions I've heard and would allow
a hash index to have an almost unlimited key size yet flexible
matching... (combined with other index, or even just the same index).

Neat.

Have a nice day,
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-06 Thread Florian G. Pflug

Tom Lane wrote:

"Florian G. Pflug" <[EMAIL PROTECTED]> writes:

So I believe you're right, and we can skip taking the lock in the no
xid case - I actually think with just a little bit of more work, we
can go even further, and get rid of the ReadNewTransactionId() call
completely during snapshotting.


[ squint... ]  This goes a bit far for me.  In particular, I think this
will fail in the edge case when there are no live XIDs visible in
ProcArray.  You cannot go back and do ReadNewTransactionId afterward,
at least not without re-scanning the ProcArray a second time, which
makes it at best a questionable win.


Why would it? The idea was to remember the largest committed xid, and that
won't go away just because the proc array is rather empty xid-wise. Actually,
in that case the "largest comitted xid"+1 will (nearly) be what
ReadNewTransactionId() returns. (Nearly because the transaction with the xid
ReadNewTransactionId()-1 might have aborted, so largestCommittedXid might be
a bit further behind ReadNewTransactionId().)

(That slightly lagging of largestCommittedXid might cause some tuples not to
be VACUUMED though, so we might want to update largestCommittedXid for
ABORTS too, and probably rename it to largestNonRunningXid or whatever ;-) ).

I would go as far as saying that largestCommittedXid+1 is the natural choice
for xmax - after all, xmax is the cutoff point after which a xid *cannot*
be seen as committed, and largestCommittedXid+1 is the smallest xmax that
guarantees that we see xacts committed before the snapshot as committed.

The xmin computation won't change - apart from using some other initial value.

This would rid us of the rather complicated entanglement of XidGenLock and
the ProcArrayLock, lessen the lock contention, and reduce the average snapshot
size a bit.

greetings, Florian Pflug

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-06 Thread Tom Lane
"Florian G. Pflug" <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> The lazy-XID patch, as committed, doesn't help that situation at all,

> I think the comment is correct in principle - If we remove the oldest
> xmin without locking, then two concurrent OldestXmin calculations
> will get two different results. The question is if that has any
> negative effects, though.

My point is that there's a difference between what you compute (and
publish) as your own xmin, and what you compute as the RecentGlobalXmin.
I don't think there's any need for a guarantee that two concurrent
processes get the same estimate of RecentGlobalXmin, as long as
they do not get an estimate less than reality, ie, that someone cannot
later compute and publish a smaller xmin.

There are reasons why we want two concurrent GetSnapshotDatas to compute
the same xmin, but I think in the end it just comes down to being a
prerequisite for the above constraint --- without that you're not sure
that someone might not be about to publish an xmin less than what you
obtained as RecentGlobalXmin.

Dropping a live xid is a whole different issue.  There, you have the
problem that you need everyone to see a consistent commit order, which
is what the example in GetSnapshotData is about.  But I don't think that
xmin enters into that.  xmin is only about "is it safe to drop this
tuple because no one can see it?".  There, we don't have to be exactly
correct, we only have to err in the conservative direction.

> It was this comment in GetSnapshotData that made me keep the locking
> in the first place:

>   * It is sufficient to get shared lock on ProcArrayLock, even if we are
>   * computing a serializable snapshot and therefore will be setting
>   * MyProc->xmin. This is because any two backends that have overlapping
>   * shared holds on ProcArrayLock will certainly compute the same xmin

If I recall correctly, that text was written to justify downgrading
GetSnapshotData's hold on ProcArrayLock from exclusive to shared --- it
was merely arguing that the results wouldn't change if we did that.
I don't see an argument there that this condition is really *necessary*.
We do have to think carefully about whether GetOldestXmin can compute a
value that's too large, but right at the moment I see no problem there.

> So I believe you're right, and we can skip taking the lock in the no
> xid case - I actually think with just a little bit of more work, we
> can go even further, and get rid of the ReadNewTransactionId() call
> completely during snapshotting.

[ squint... ]  This goes a bit far for me.  In particular, I think this
will fail in the edge case when there are no live XIDs visible in
ProcArray.  You cannot go back and do ReadNewTransactionId afterward,
at least not without re-scanning the ProcArray a second time, which
makes it at best a questionable win.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-06 Thread Florian G. Pflug

Tom Lane wrote:

Simon was complaining a bit ago that we still have problems with
excessive contention for the ProcArrayLock, and that much of this stems
from the need for transaction exit to take that lock exclusively.
The lazy-XID patch, as committed, doesn't help that situation at all,
saying

/*
 * Lock ProcArrayLock because that's what GetSnapshotData uses.
 * You might assume that we can skip this step if we had no
 * transaction id assigned, because the failure case outlined
 * in GetSnapshotData cannot happen in that case. This is true,
 * but we *still* need the lock guarantee that two concurrent
 * computations of the *oldest* xmin will get the same result.
 */


I think the comment is correct in principle - If we remove the oldest
xmin without locking, then two concurrent OldestXmin calculations
will get two different results. The question is if that has any
negative effects, though.


That leaves xmin, which AFAICS is
only interesting for the computations of GetOldestXmin() and
RecentGlobalXmin.  And I assert it doesn't matter if those numbers
advance asynchronously, so long as they never go backward.

Yes, the xmin is surely the only field that might need need the locking.

It was this comment in GetSnapshotData that made me keep the locking
in the first place:

 * It is sufficient to get shared lock on ProcArrayLock, even if we are
 * computing a serializable snapshot and therefore will be setting
 * MyProc->xmin. This is because any two backends that have overlapping
 * shared holds on ProcArrayLock will certainly compute the same xmin
 * (since no xact, in particular not the oldest, can exit the set of
 * running transactions while we hold ProcArrayLock --- see further
 * discussion just below). So it doesn't matter whether another backend
 * concurrently doing GetSnapshotData or GetOldestXmin sees our xmin as
 * set or not; he'd compute the same xmin for himself either way.
 * (We are assuming here that xmin can be set and read atomically,
 * just like xid.)

But now that I read this again, I think that comment is just missleading -
especially the part "So it doesn't matter whether another backend concurrently
doing GetSnapshotData or GetOldestXmin sees our xmin as set or not; he'd compute
the same xmin for himself either way."
This sounds as if the Proc->xmin that *one* backend announces had
influence over the Proc->xmin that *another* backend might compute.
Which isn't true - it only influences the GlobalXmin that another backend might
compute.

So I believe you're right, and we can skip taking the lock in the no xid case -
I actually think with just a little bit of more work, we can go even further,
and get rid of the ReadNewTransactionId() call completely during snapshotting.

There are two things we must ensure when I comes to snapshots, commits and
xid assignment.

1) A transaction must either be not in progress, be in our snapshot, or
   have an xid >= xmax.
2) If transaction A sees B as committed, and B sees C as committed, then
   A must see C as committed.

ad 1): We guarantee that by storing the xid in the proc array before releasing
   the XidGenLock. Therefore, when we later obtain our xmax value,
   we can be sure that we see all xacts in the proc array that have an
   xid < xmax and are in progress.

ad 2): We guarantee that by serializing snapshotting against committing. Since
   we use ReadNewTransactionId() as the snapshot's xmax this implies that
   we take the ProcArrayLock *before* reading our xmax value.

Now, ReadNewTransactionId() is actually larger than necessary as a xmax.
The minimal xmax that we can set is "largest committed xid"+1. We can
easily track that value during commit when we hold the ProcArrayLock
(If we have no xid, and therefor don't have to hold the lock, we also
don't need to update that value).

If we used this "LatestCommittedXid" as xmax, we'd still guarantee (2),
but without having to hold the XidGenLock during GetSnapshotData().

I wouldn't have dared to suggest this for 8.3, but since you came up with
locking improvements in the first place... ;-)

greetings, Florian Pflug


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-06 Thread Greg Smith

On Thu, 6 Sep 2007, Kevin Grittner wrote:

I thought that the bgwriter_lru_percent was scanned from the lru end 
each time; I would not expect that it would ever get beyond the oldest 
10%.


You're correct; I stated that badly.  What I should have said is that your 
LRU writer could potentially scan the pool as fast as once per second if 
there were enough allocations going on.



How low does the count have to go, or does it track the count when it
becomes dirty and look for a decrease?


The usage count has to be 0 before a page can be re-used for a new 
allocation, and the LRU background writer only writes out potentially 
reusable pages that are dirty.  So the count has to be 0 before it will 
write it.



This should keep us reading new pages, which covers this, yes?


One would hope.  Your whole arrangement of shared_buffers, 
checkpoint_segments, and related parameters will need to be reconsidered 
for 8.3; you've got a delicated balanced arrangement for your 8.2 setup 
right now that's working for you, but just translating it straight to 8.3 
won't get you what you want.  I'll get back to the message you already 
sent on that subject when I get enough time to address it fully.



I'm not clear on the benefit of not writing the recently accessed dirty
pages when there are no less recently used dirty pages.


This presumes PostgreSQL has some notion of the balance of recently 
accessed vs. not accessed dirty pages, which it does not.  Buffers get 
updated individually, and there's no mechanism summarizing what's in 
there; you have to scan the buffer cache yourself to figure that out.  I 
do some of that in this new patch, tracking things like how many buffers 
are scanned on average to find reusable ones.


Many months ago, I wrote a very complicated re-implementation of the 
all-scan portion of the background writer that tracked the usage count of 
everything it looked at, kept statistics about how many pages were dirty 
at each usage count, then targeted how high of a usage count could be 
written given some information about what I/O rate you felt your devices 
could sustain.  This did exactly what you're asking for here:  wrote 
whatever dirty pages were around starting with the ones that hadn't been 
recently used, then worked its way up to pages with a higher usage count 
if the recently used ones were all clean.


As far as I've been able to tell, and from Heikki's test results, the load 
distributed checkpoint was a better answer to this problem.  Rather than 
constantly fight to get pages with high usage counts out all the time, 
just spread the checkpoint out instead and deal with them only then.  I 
gave up on that branch of code while he removed the all-scan writer 
altogether as part of committing LDC.  I suspect the path I was following 
was exactly what you think you'd like to have, but it seems that it's not 
actually needed.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Win32 build Large Address Aware?

2007-09-06 Thread Trevor Talbot
While reading one of the recent -perform threads, it occurred to me to
check, and the 8.2.4 Win32 release binaries aren't marked "large
address aware".  This means the process gets a 2GB VM space, which is
normal for 32bit Windows.  On x64, my understanding is that each 32
bit process can actually get 4GB if the appropriate flag is set in the
binary.  (I don't have the hardware to verify this.)

The reason documented for this behavior is that 2GB VM space was the
hard limit for a very long time, so some applications borrowed the
high bit for themselves to use, and couldn't cope with addresses over
2GB.  Essentially just a default for backwards compatibility.

So with that in mind, is there a reason the Win32 binaries aren't
marked that way?  Unless there are problems with it, it might be worth
doing until 64bit builds are supported.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Low hanging fruit in lazy-XID-assignment patch?

2007-09-06 Thread Tom Lane
Simon was complaining a bit ago that we still have problems with
excessive contention for the ProcArrayLock, and that much of this stems
from the need for transaction exit to take that lock exclusively.
The lazy-XID patch, as committed, doesn't help that situation at all,
saying

/*
 * Lock ProcArrayLock because that's what GetSnapshotData uses.
 * You might assume that we can skip this step if we had no
 * transaction id assigned, because the failure case outlined
 * in GetSnapshotData cannot happen in that case. This is true,
 * but we *still* need the lock guarantee that two concurrent
 * computations of the *oldest* xmin will get the same result.
 */

On reflection though this seems wrong: I believe that we could skip
taking the lock when exiting a transaction with no XID.  The actions
being guarded with the lock are

MyProc->xid = InvalidTransactionId;
MyProc->lxid = InvalidLocalTransactionId;
MyProc->xmin = InvalidTransactionId;
MyProc->inVacuum = false;/* must be cleared with xid/xmin */

/* Clear the subtransaction-XID cache too while holding the lock */
MyProc->subxids.nxids = 0;
MyProc->subxids.overflowed = false;

Clearing xid is obviously a no-op if we had no xid, and if we had no xid
we have no subxids either, so the last 2 lines are also no-ops.  I
cannot see any reason why we need a guard on clearing lxid, either.
inVacuum is only interesting if xmin is, since if there's no xid
assigned then it's effectively just a filter on whether other backends
pay attention to this one's xmin.  That leaves xmin, which AFAICS is
only interesting for the computations of GetOldestXmin() and
RecentGlobalXmin.  And I assert it doesn't matter if those numbers
advance asynchronously, so long as they never go backward.

Comments?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-06 Thread Tom Lane
"Kevin Grittner" <[EMAIL PROTECTED]> writes:
> On Thu, Sep 6, 2007 at 11:27 AM, in message
> <[EMAIL PROTECTED]>, Greg Smith
> <[EMAIL PROTECTED]> wrote: 
>> With the default delay of 200ms, this has the LRU-writer scanning the 
>> whole pool every 1 second,
>  
> Whoa!  Apparently I've totally misread the documentation.  I thought that
> the bgwriter_lru_percent was scanned from the lru end each time; I would
> not expect that it would ever get beyond the oldest 10%.

I believe you're correct and Greg got this wrong.  I won't draw any
conclusions about whether the LRU stuff is actually doing you any good
though.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Kevin Grittner
>>> On Thu, Sep 6, 2007 at  7:03 PM, in message
<[EMAIL PROTECTED]>, Jeff Davis <[EMAIL PROTECTED]>
wrote: 
> 
> I think ... there's still room for a simple tool that can zero out
> the meaningless data in a partially-used WAL segment before compression.
> It seems reasonable to me, so long as you keep archive_timeout at
> something reasonably high.
> 
> If nothing else, people that already have a collection of archived WAL
> segments would then be able to compact them.
 
That would be a *very* useful tool for us, particularly if it could work
against our existing collection of old WAL files.
 
-Kevin
 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-06 Thread Kevin Grittner
>>> On Thu, Sep 6, 2007 at 11:27 AM, in message
<[EMAIL PROTECTED]>, Greg Smith
<[EMAIL PROTECTED]> wrote: 
> On Thu, 6 Sep 2007, Kevin Grittner wrote:
> 
> I have been staring carefully at your configuration recently, and I would 
> wager that you could turn off the LRU writer altogether and still meet 
> your requirements in 8.2.
 
I totally agree that it is of minor benefit compared to the all-writer,
if it even matters at all.  I knew that when I chose the settings.
 
> Here's what you've got right now:
> 
>> shared_buffers = 160MB (=2 buffers)
>> bgwriter_lru_percent = 20.0
>> bgwriter_lru_maxpages = 200
>> bgwriter_all_percent = 10.0
>> bgwriter_all_maxpages = 600
> 
> With the default delay of 200ms, this has the LRU-writer scanning the 
> whole pool every 1 second,
 
Whoa!  Apparently I've totally misread the documentation.  I thought that
the bgwriter_lru_percent was scanned from the lru end each time; I would
not expect that it would ever get beyond the oldest 10%.  I put that in
just as a guard to keep the backends from having to wait for the OS write.
I've always doubted whether it was helping, but "it wasn't broke"
 
> while the all-writer scans every two 
> seconds--assuming they don't hit the write limits.  If some event were to 
> dirty the whole pool in 200ms, it might take as much as 6.7 seconds to 
> write everything out (2 / 600 * 200 ms) via the all-scan.
 
Right.  Since the file system didn't seem to be able to accept writes
faster than 800 PostgreSQL pages per second, and I wanted to leave a
LITTLE slack, I set that limit.  We don't seem to hit it, as far as I can
tell.  In fact, the output rate would be naturally fairly smooth, if not
for the "hold all dirty pages until the last possible moment, then write
them all to the OS and fsync" approach.
 
> There's a second low-level issue involved here.  When a page becomes 
> dirty, that implies it was also recently used, which means the LRU writer 
> won't touch it.  That page can't be written out by the LRU writer until an 
> entire pass has been made over the shared_buffer pool while looking for 
> buffers to allocate for new activity.  When the allocation clock-sweep 
> passes over the newly dirtied buffer again, its usage count will drop by 
> one and it will no longer be considered recently used.  At that point the 
> LRU writer can write it out.
 
How low does the count have to go, or does it track the count when it
becomes dirty and look for a decrease?
 
> So unless there is other allocation activity 
> going on, the scan_whole_pool_seconds mechanism will never provide the 
> bound on time to scan and write everything you hope it will.
 
That may not be an issue for the environment where this has been a problem
for us -- the web hits are coming in at a pretty good rate 24/7.  (We have
a couple dozen large companies scanning data through HTTP SOAP requests
all the time.)  This should keep us reading new pages, which covers this,
yes?
 
> where the buffer cache was 
> filled with mostly dirty buffers that couldn't be re-used
 
That would be the condition that would be the killer with a synchronous
checkpoint if the OS cache has already had some dirty pages trickled out.
If we can hit this condition in our web database, either the load
distributed checkpoint will save us, or we can't use 8.3.  Period.
 
> The completely understandable line of thinking that led to your request 
> here is one of my concerns with exposing scan_whole_pool_seconds as a 
> tunable.  It may suggest to people that if they set the number very low, 
> it will assure all dirty buffers will be scanned and written within that 
> time bound.  That's certainly not the case; both the maxpages and the 
> usage count information will actually drive the speed that mechanism plods 
> through the buffer cache.  It really isn't useful for scanning fast.
 
I'm not clear on the benefit of not writing the recently accessed dirty
pages when there are no less recently used dirty pages.  I do trust the OS
to not write them before they age out in that cache, and the OS cache
doesn't start writing dirty pages from its cache until they reach a
certain percentage of the cache space, so I'd just as soon let the OS know
that the MRU dirty pages are there, so it knows that it's time to start
working on the LRU pages in its cache.
 
-Kevin
 


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup(Maybe OFFTOPIC)

2007-09-06 Thread Simon Riggs
On Thu, 2007-09-06 at 21:50 +0100, Gregory Stark wrote:

> > - Improve scalability of recovery for large I/O bound databases
> 
> That seems too vague for the TODO. Did you have specific items in mind?

I think we should parallelise recovery. Heikki wanted to do this another
way, so I worded it vaguely (i.e. as a requirement) to cover multiple
approaches. Some of that was discussed on -hackers already (where?).

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup(Maybe OFFTOPIC)

2007-09-06 Thread Gregory Stark

"Simon Riggs" <[EMAIL PROTECTED]> writes:

> High Availability
> -
> - Allow a warm standby system to also allow read-only statements
> - Allow WAL traffic to be streamed to another server for stand-by
> replication (synchronous/asynchronous options)

Asynchronous streaming of WAL would be a very good feature.

Synchronous streaming where a commit doesn't return until the backup server
(or a quorum of backup servers if you have more than one) acknowledges receipt
of the logs past the commit record would be an *extremely* good feature.

Those could be broken out into two separate points. Streaming WAL is one
thing, feedback and pitr-synchronous commits would be a second point.

Knowing what WAL record the backup server had received so far and what WAL
record it had processed so far would be useful for plenty of other purposes as
well.

> - Improve scalability of recovery for large I/O bound databases

That seems too vague for the TODO. Did you have specific items in mind?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Jeff Davis
On Thu, 2007-09-06 at 19:23 -0400, Tom Lane wrote:
> Jeff Davis <[EMAIL PROTECTED]> writes:
> > On Thu, 2007-09-06 at 12:08 +0100, Heikki Linnakangas wrote:
> >> With file-based log shipping, you can get down to 1 second, by using the
> >> archive_timeout setting. It will produce a lot of log files with very
> >> little content in them, but they will compress well.
> 
> > I tried doing a couple pg_switch_xlog() in quick succession, and the WAL
> > files that represent less than a second of actual data don't seem much
> > more compressible than the rest.
> 
> That's because we do not try to zero out a WAL file when recycling it,
> so the part after what's been used is not any more compressible than the
> valid part.
> 

That was my point.

> pg_switch_xlog is not, and was never intended to be, a solution for a
> low-latency log-shipping scheme.  The right solution for that is to make
> a smarter log-shipping daemon that transmits only part of a WAL segment
> at a time.  (As Hannu is getting tired of repeating, you can find a
> working example in SkyTools.)

I think that in addition to WalMgr (which is the tool I assume you're
referring to), there's still room for a simple tool that can zero out
the meaningless data in a partially-used WAL segment before compression.
It seems reasonable to me, so long as you keep archive_timeout at
something reasonably high.

If nothing else, people that already have a collection of archived WAL
segments would then be able to compact them.

I agree that something like WalMgr is better for low-latency, however.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Trouble with the PL/pgSQL debugger and VC++

2007-09-06 Thread Charlie Savage

Hi Korry,

Can you e-mail the VC++ projects that you created? (You can do that 
off-list).  I have VC++ here but I haven't tried to do a PG build in 
that environment yet (guess it's time to learn).


Done.



However, I can't set any breakpoints using PgAdmin.  I know the dlls 
are loaded via Process Explorer, and in fact I can attach to them with 
the VC++ debugger.
When you say that you can't set any breakpoints using PgAdmin, does that 
mean that the menu choices ("Set Breakpoint" and "Debug") just don't 
appear? Or are they there but don't do anything?  Or are you getting an 
error message?


Let me see if I can clarify.  Both choices are available from the 
PgAdmin menu.


When I choose Debug a window opens asking me to set the value for the 
parameter to the fucntion  I do that, hit OK.  But then the window just 
reappears again.


If instead I do "Set Breakpoint" then I get a window that says "Waiting 
to set breakpoint in core.testwhere" with a progress bar (note there is 
a debugger window behind it also).  That window never goes away.  When I 
press cancel I get a Debug Assertion Failure:


close.c, line 47
Expression (fh >= 0 && (unsigned)fh < (unsigned)_nhandle)

Can you gather a PgAdmin log file (see Options on the File Menu, then 
choose the Logging tab, check "Debug") and send it to me.  That will 
give me some clues.


Yes, will do.

*  Opening a new pgadmin window and doing step 6 (SELECT * FROM 
pldbg_wait_for_breakpoint(1);).  That didn't work.
That won't work... you want to open another session and SELECT 
testwhere(7) from the new session.


I thought each pgadmin sql window was its own session though (they have 
different backend pids)?  No?


The first session is your debugger client, the second session is the 
target process (the application that you are debugging).
* I tried executing the function (select testwhere(7);).  That didn't 
work.
I presume that you mean that the debugger session was still hung in the 
call to pldbg_wait_for_target(1), right?


Did you remember to set shared_preload_librarys = 
'$libdir/plugins/plugin_debugger' in your postgresql.conf file (and 
restart the server aftwards)?.


Yes.  And checked it a few times :)

FYI the readme includes the .so ($libdir/plugins/plugin_debugger.so') if 
I remember correctly, might want to remove that.


Charlie


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [HACKERS] Some more msvc++ 8.2.4 build feedback

2007-09-06 Thread Magnus Hagander
Charlie Savage wrote:
> Hi Magnus,
> 
>> Yeah, this is a known problem, and it's fixed for 8.3. Really, the msvc
>> build process in 8.2 has a *lot* of rough edges ;-) The code itself builds
>> fine, but the buildsystem needed a lot more work.
>>   
> Great - figured that might be the case.
> 
>>> 3.  If you have a contrib module that is not known to the build system 
>>> it blows up.  For example, I use PostGIS.  When running build.bat it 
>>> stops at postgis (sorry, I didn't write down the error message but can 
>>> easily get it if needed).  Could unknown contrib modules just be skipped 
>>> instead?
>>> 
>>
>> Uh, can you explain more what you mean? How can the pg build be affected by
>> postgis?
>>   
> Download postgis source and put it under contrib/postis. Then:
> 
> C:\Development\msys\src\postgresql-8.2.4\src\tools\msvc>perl mkvcbuild.pl
> Could not determine contrib module type for postgis-1.2.1
>  at mkvcbuild.pl line 326
> main::AddContrib('postgis-1.2.1') called at mkvcbuild.pl line 200
> 
> There is a loop in mkvcbuild.pl that processes the contrib modules:
> 
> my $D;
> opendir($D, 'contrib') || croak "Could not opendir on contrib!\n";
> while (my $d = readdir($D)) {
> next if ($d =~ /^\./);
> next unless (-f "contrib/$d/Makefile");
> next if (grep {/^$d$/} @contrib_excludes);
> AddContrib($d);
> }
> closedir($D);
> 
> 
> Perhaps I shouldn't be installing the postgis source to contrib, but
> that used to be the recommended practice (at least for PostGis).

Oh, didn't realise you'd stick it in the actual contrib tree. I thought
it would build with pgxs or something.

I'll put it on my TODO to change that code to look at the Makefile
instead of the directories. Unless you want to send in a patch :-P

//Magnus


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Trouble with the PL/pgSQL debugger and VC++

2007-09-06 Thread Charlie Savage

Hi Korry,

I am having problems with getting the debugger to work.  Setup:

OS: Windows XP
Postgresql Version: 8.2.4 compiled with VC++ 2005

Since my version of pg is built with VC++ I thought it wise to also use 
VC++ for the debugger plugin.  So I converted the Makefile into 3 
different VC++ projects - pldbgapi, plugin_debugger and targetinfo. 
Note that targetinfo is not mentioned at all in the comments at the top 
of Makefile - you may wish to update them.


VC++ did not compile the code as is, I've attached a patch below with 
the changes I had to make.  I also generated the appropriate DEF files 
(using the perl scripts in postgresql-8.2.4\src\tools\msvc).  I also had 
to define a preprocess define, __WIN32__ (line 1524, pldgbapi.c).  Maybe 
you could use something more standard, like _WIN32?


I was then able to build the dlls except the profiler (for some reason 
the struct timezone wasn't being picked up via the includes - but I left 
that for another day).


I then installed the dlls as per the instructions and updated my 
postgresql.conf file.


However, I can't set any breakpoints using PgAdmin.  I know the dlls are 
loaded via Process Explorer, and in fact I can attach to them with the 
VC++ debugger.


So then I tried running through your command line example (using the 
PgAdmin sql window and then psql) and here are the results:




1. CREATE OR REPLACE FUNCTION testwhere(x int)
RETURNS int AS
$$
DECLARE
  result int;
BEGIN
  result := x + 1;
  RETURN x;
END;
$$
LANGUAGE 'plpgsql' STABLE;


2.  SELECT * FROM pldbg_get_target_info( 'testwhere', 'f' );

target;schema;nargs;argtypes;targetname;argmodes;argnames;targetlang;fqname;returnsset;returntype
80655;79041;1;23;testwhere;;{x};77823;core.testwhere;f;23

3.  SELECT * FROM pldbg_create_listener();

pldbg_create_listener
1

4.  SELECT * from pldbg_set_global_breakpoint(1, 80655, NULL, NULL);

 pldbg_set_global_breakpoint
 t

5. SELECT * FROM pldbg_wait_for_target(1);

At this point the session hangs, as explained in your email.  So what 
happens next?  I tried:


*  Opening a new pgadmin window and doing step 6 (SELECT * FROM 
pldbg_wait_for_breakpoint(1);).  That didn't work.


* I tried executing the function (select testwhere(7);).  That didn't work.

Any tips/help appreciated.  Like I said, I can step through the code in 
the VC++ debugger so I can pretty much look at anything that might be 
helpful.


And I'd be happy to send along the VC++ project files and DEF files if 
you'd like them.


Thanks,

Charlie




Only in .: msvc
diff -u /c/temp/contrib/pldebugger/pldbgapi.c ./pldbgapi.c
--- /c/temp/contrib/pldebugger/pldbgapi.c   2007-07-29 17:09:46 -0600
+++ ./pldbgapi.c2007-09-06 00:34:29 -0600
@@ -1560,6 +1560,12 @@
 }
 static int allocateServerListener( int * port )
 {
+#ifdef WIN32
+   WORDwVersionRequested;
+   WSADATA wsaData;
+   int err;
+   u_long blockingMode = 0;
+#endif
int sockfd  
= socket( AF_INET, SOCK_STREAM, 0 );
struct sockaddr_in  proxy_addr  = {0};
socklen_t   proxy_addr_len  = 
sizeof( proxy_addr );
@@ -1571,9 +1577,6 @@
proxy_addr.sin_addr.s_addr = htonl( INADDR_ANY );

 #ifdef WIN32
-   WORDwVersionRequested;
-   WSADATA wsaData;
-   int err;

wVersionRequested = MAKEWORD( 2, 2 );

@@ -1617,7 +1620,6 @@
listen( sockfd, 2 );

 #ifdef WIN32
-   u_long blockingMode = 0;

ioctlsocket( sockfd, FIONBIO,  &blockingMode );
 #endif
Only in .: pldebugger
diff -u /c/temp/contrib/pldebugger/plugin_debugger.c ./plugin_debugger.c
--- /c/temp/contrib/pldebugger/plugin_debugger.c2007-08-07 10:37:14 
-0600
+++ ./plugin_debugger.c 2007-09-06 00:34:58 -0600
@@ -1143,16 +1143,18 @@
int client_sock;
int reuse_addr_flag = 1;

+#ifdef WIN32
+   WORD wVersionRequested;
+   WSADATA wsaData;
+   int err;
+   u_long blockingMode = 0;
+#endif
/* Ask the TCP/IP stack for an unused port */
srv_addr.sin_family  = AF_INET;
srv_addr.sin_port= htons( 0 );
srv_addr.sin_addr.s_addr = htonl( INADDR_ANY );

 #ifdef WIN32
-   WORD wVersionRequested;
-   WSADATA wsaData;
-   int err;
-
wVersionRequested = MAKEWORD( 2, 2 );

err = WSAStartup( wVersionRequested, &wsaData );
@@ -1197,7 +1199,6 @@
listen( sockfd, 2 );

 #ifdef WIN32
-   u_long blockingMode = 0;

ioctlsocket( sockfd, FIONBIO,  &blockingMode );
 #endif


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Kevin Grittner
>>> On Thu, Sep 6, 2007 at  3:25 PM, in message
<[EMAIL PROTECTED]>, Jeff Davis <[EMAIL PROTECTED]>
wrote: 
> On Thu, 2007-09-06 at 12:08 +0100, Heikki Linnakangas wrote:
>> With file-based log shipping, you can get down to 1 second, by using the
>> archive_timeout setting. It will produce a lot of log files with very
>> little content in them, but they will compress well.
> 
> I tried doing a couple pg_switch_xlog() in quick succession, and the WAL
> files that represent less than a second of actual data don't seem much
> more compressible than the rest.
 
Agreed.  We kick out a WAL file once per hour (if they don't fill first)
and have found that a compressed WAL file during normal activity averages
a little over 4 MB.  During the nightly VACUUM ANALYZE we get a few over
10 MB.  The interesting thing is that about the time that these WAL files
would get reused, even when the system is idle, they are at the VACUUM
ANALYZE size again.  Note how the first three 18:00 file sizes echo at the
next morning, before the users are in and working.
 
  5293110 2007-08-22 17:14 000100180044.gz
  5205720 2007-08-22 17:14 000100180045.gz
  5104595 2007-08-22 17:14 000100180046.gz
  3747524 2007-08-22 17:14 000100180047.gz
  3118762 2007-08-22 17:14 000100180048.gz
  3342918 2007-08-22 17:14 000100180049.gz
  4600438 2007-08-22 17:14 00010018004A.gz
  2715708 2007-08-22 17:15 00010018004B.gz
  2865803 2007-08-22 17:15 00010018004C.gz
 10111700 2007-08-22 18:00 00010018004D.gz
 11561630 2007-08-22 18:00 00010018004E.gz
 11619590 2007-08-22 18:00 00010018004F.gz
  7165231 2007-08-22 18:00 000100180050.gz
  4012164 2007-08-22 18:00 000100180051.gz
  4590502 2007-08-22 18:00 000100180052.gz
  3617203 2007-08-22 18:01 000100180053.gz
  3552210 2007-08-22 18:01 000100180054.gz
  4006261 2007-08-22 18:01 000100180055.gz
  3538293 2007-08-22 18:01 000100180056.gz
  4291776 2007-08-22 18:02 000100180057.gz
  4837268 2007-08-22 18:02 000100180058.gz
  3948408 2007-08-22 19:02 000100180059.gz
  2714635 2007-08-22 20:02 00010018005A.gz
  4989698 2007-08-22 21:02 00010018005B.gz
  5290729 2007-08-22 22:02 00010018005C.gz
  5203306 2007-08-22 23:02 00010018005D.gz
  5101976 2007-08-23 00:02 00010018005E.gz
  3745125 2007-08-23 01:02 00010018005F.gz
  3119218 2007-08-23 02:02 000100180060.gz
  3340691 2007-08-23 03:02 000100180061.gz
  4599279 2007-08-23 04:02 000100180062.gz
 10110899 2007-08-23 05:02 000100180063.gz
 11555895 2007-08-23 06:02 000100180064.gz
 11587646 2007-08-23 07:02 000100180065.gz


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Trouble with the PL/pgSQL debugger and VC++

2007-09-06 Thread korry.douglas


Since my version of pg is built with VC++ I thought it wise to also 
use VC++ for the debugger plugin.  So I converted the Makefile into 3 
different VC++ projects - pldbgapi, plugin_debugger and targetinfo. 
Note that targetinfo is not mentioned at all in the comments at the 
top of Makefile - you may wish to update them.

Hi Charlie, thanks for the feedback (and thanks for the patches!).

Can you e-mail the VC++ projects that you created? (You can do that 
off-list).  I have VC++ here but I haven't tried to do a PG build in 
that environment yet (guess it's time to learn).


However, I can't set any breakpoints using PgAdmin.  I know the dlls 
are loaded via Process Explorer, and in fact I can attach to them with 
the VC++ debugger.
When you say that you can't set any breakpoints using PgAdmin, does that 
mean that the menu choices ("Set Breakpoint" and "Debug") just don't 
appear? Or are they there but don't do anything?  Or are you getting an 
error message?


Can you gather a PgAdmin log file (see Options on the File Menu, then 
choose the Logging tab, check "Debug") and send it to me.  That will 
give me some clues.




5. SELECT * FROM pldbg_wait_for_target(1);

At this point the session hangs, as explained in your email.  So what 
happens next?  I tried:


*  Opening a new pgadmin window and doing step 6 (SELECT * FROM 
pldbg_wait_for_breakpoint(1);).  That didn't work.
That won't work... you want to open another session and SELECT 
testwhere(7) from the new session.


The first session is your debugger client, the second session is the 
target process (the application that you are debugging).
* I tried executing the function (select testwhere(7);).  That didn't 
work.
I presume that you mean that the debugger session was still hung in the 
call to pldbg_wait_for_target(1), right?


Did you remember to set shared_preload_librarys = 
'$libdir/plugins/plugin_debugger' in your postgresql.conf file (and 
restart the server aftwards)?.



-- Korry

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Some more msvc++ 8.2.4 build feedback

2007-09-06 Thread Charlie Savage

Oh, didn't realise you'd stick it in the actual contrib tree. I thought
it would build with pgxs or something.


You used to have to build postgis from the contrib directory.  That 
changed a while back, but I guess its just habit.  It has its own 
configure and makefile scripts, so doesn't seem to use pgxs.



I'll put it on my TODO to change that code to look at the Makefile
instead of the directories. Unless you want to send in a patch :-P


Thanks.  No patch from me - my perl and autoconf skills leave something 
to be desired :)


Charlie


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Tom Lane
Jeff Davis <[EMAIL PROTECTED]> writes:
> On Thu, 2007-09-06 at 12:08 +0100, Heikki Linnakangas wrote:
>> With file-based log shipping, you can get down to 1 second, by using the
>> archive_timeout setting. It will produce a lot of log files with very
>> little content in them, but they will compress well.

> I tried doing a couple pg_switch_xlog() in quick succession, and the WAL
> files that represent less than a second of actual data don't seem much
> more compressible than the rest.

That's because we do not try to zero out a WAL file when recycling it,
so the part after what's been used is not any more compressible than the
valid part.

pg_switch_xlog is not, and was never intended to be, a solution for a
low-latency log-shipping scheme.  The right solution for that is to make
a smarter log-shipping daemon that transmits only part of a WAL segment
at a time.  (As Hannu is getting tired of repeating, you can find a
working example in SkyTools.)

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-06 Thread Decibel!
On Thu, Sep 06, 2007 at 09:20:31AM -0500, Kevin Grittner wrote:
> >>> On Wed, Sep 5, 2007 at 10:31 PM, in message
> <[EMAIL PROTECTED]>, Greg Smith
> <[EMAIL PROTECTED]> wrote: 
> > 
> > -There are two magic constants in the code:
> > 
> >  int smoothing_samples = 16;
> >  float   scan_whole_pool_seconds = 120.0;
> > 
> 
> > I personally 
> > don't feel like these constants need to be exposed for tuning purposes;
> 
> > Determining 
> > whether these should be exposed as GUC tunables is certainly an open 
> > question though.
>  
> If you exposed the scan_whole_pool_seconds as a tunable GUC, that would
> allay all of my concerns about this patch.  Basically, our problems were

I like the idea of not having that as a GUC, but I'm doubtful that it
can be hard-coded like that. What if checkpoint_timeout is set to 120?
Or 60? Or 2000?

I don't know that there should be a direct correlation, but ISTM that
scan_whole_pool_seconds should take checkpoint intervals into account
somehow.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)


pgpiBGkQouND3.pgp
Description: PGP signature


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Mark Mielke

Michael Glaesemann wrote:

On Sep 6, 2007, at 10:53 , Mark Mielke wrote:
I don't like the truncating hash suggestion because it limits the 
ability of a hash code to uniquely identify a key.
AIUI, a hash can't be used as a unique identifier: it always needs to 
be rechecked due to the chance of collisions. There might be other 
issues with truncation, but preventing hashes from being unique isn't 
one of them.


Of course - that's why I used the word "limit".

Hash works best, when the key is unique, however. A 32-bit hash will be 
many powers of 2 more unique than a 8-bit hash.


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Hannu Krosing
Ühel kenal päeval, N, 2007-09-06 kell 19:33, kirjutas apoc9009:
> Simon Riggs schrieb:
> > I'm not clear whether you are looking for Backup, or High Availability
> > Replication.
> >
> > There is no data loss with the online backup technique described in the
> > manual.
> >   
> No, there is a lost of Data.
> 
> The WAL File musst bei generated by the Postmaster and this File must be 
> copied to the
> safe Remote Backupserver. If the Machine, where the Database is running 
> crashed, then the Last
> WAL is lost and the Backup isnt complet,e this is the Center of the Problem.

read and re-read 23.4.4

you DO NOT have to wait for the whole file to be complete to copy it
over

you can query the position where postgres has currently completed
writing and then copy over that part. doing so you can have wal-based
backup that is good up to last second (if you poll the function and copy
over the newly written part of the file every second)

Look at WalMgr.py in SkyTools package how this can be done in practice.

Skytools is available at http://pgfoundry.org/projects/skytools/ 

If used for setting up WAL-based backup on pg 8.2, it runs automatically
in "syncdaemon" mode, which means that parts of WAL file are copied over
as soon as they are written by postgres.

--
Hannu



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Jeff Davis
On Thu, 2007-09-06 at 12:08 +0100, Heikki Linnakangas wrote:
> With file-based log shipping, you can get down to 1 second, by using the
> archive_timeout setting. It will produce a lot of log files with very
> little content in them, but they will compress well.

I tried doing a couple pg_switch_xlog() in quick succession, and the WAL
files that represent less than a second of actual data don't seem much
more compressible than the rest.

I think WAL compression has been talked about before, with ideas such as
removing unneeded full page images. However, it seems like it would be
easy to get a lot of the benefit by just having a utility that could
discard useless contents that are left over from a previously-recycled
file, and then could just reconstruct it later with zeros.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Simon Riggs
On Thu, 2007-09-06 at 17:53 +0300, Hannu Krosing wrote:
> Ühel kenal päeval, N, 2007-09-06 kell 16:15, kirjutas apoc9009:
> > > So you want the user to still be connected to the failed machine, but at
> > > the same time be connected to the new live failover machine ?
> > >
> > > -
> > > Hannu
> > No.
> > The User should be connected to the running db without restrictions 
> > while backup is in progress
> 
> And what's wrong with WAL-based backup then ?
> 
> Especially the variant described in 23.4.4 wich keeps the WAL copied to
> backup machine  up to last second ?

I think the OP means
- synchronous replication
- ability to access the standby node for queries

Yes, both requests are planned for later releases. 



Bruce, can we edit the TODO to include a section called "High
Availability"?

We already have a few scattered references to such things, but in Admin
and WAL. We need a specific section.

We currently have these items, that can be moved to the new section:
- Allow a warm standby system to also allow read-only statements
- Allow WAL traffic to be streamed to another server for stand-by
replication

new section would look like this:

High Availability
-
- Allow a warm standby system to also allow read-only statements
- Allow WAL traffic to be streamed to another server for stand-by
replication (synchronous/asynchronous options)
- Improve scalability of recovery for large I/O bound databases

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Andrew Sullivan
On Thu, Sep 06, 2007 at 04:08:10PM +0200, apoc9009 wrote:
> >archive_timeout setting. It will produce a lot of log files with very
> >little content in them, but they will compress well.
> >  
> Yes, it is possible but not recommended . My Backup Servers Filesystem 
> will explode :D

. . .

> Correct, but  this is not good enought and i think there are a lot of 
> Peoples having the same Problem.
> It was wishfull, having a Online Streaming Backupsubsystem, thadt can 
> produce portable Backupdatabases,
> to prevent users for Millions of Archive Files on the Backup FTP-Server.

It seems that what you want is near-real-time online backups with _no
cost_, which is not a feature that I think anyone will ever work on.  

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Users never remark, "Wow, this software may be buggy and hard 
to use, but at least there is a lot of code underneath."
--Damien Katz

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Trevor Talbot
On 9/6/07, apoc9009 <[EMAIL PROTECTED]> wrote:

> Backup 12/24/2008 Version 2
> /pg/backup/12_24_2008/base/rcvry.rcv  <--- Basebackup
> /pg/backup/12_24_2008/changes/0001.chg  <--- Changed Data
>   /changes/0002.chg  <--- Changed Data
>   /changes/0003.chg  <--- Changed Data
>
>   /changes/0010.chg  <--- Changed Data
>
>   /changes/0001.rsf   <---  Recovery
> Stripeset File (10 MByte) addon of Basebackup
>   delete *.chg
>
> if a Stripeset of 10 *.chg Files exist, they should be converted or merged
> to one greater Recovery Stripe File (*.RSF)

Why?  What does this actually do?

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Kenneth Marshall
On Thu, Sep 06, 2007 at 11:53:45AM -0400, Mark Mielke wrote:
> Hannu Krosing wrote:
 One approahc is not to mix hashes, but to partition the hash, so that
 each column gets its N bits in the hash.   
>>> How does that help?  You still need all the keys to find out which
>>> bucket to look in.
>>> 
>>
>> no. you need to look at only the buckets where that part of hash matches
>>
>> say you allocate bits 4-7 for column 2 and then need to look up column 2
>> value with hash 3 . here you need to look at only buckets N*16 + 3, that
>> is, you need to examine only each 16th bucket
>>
>>   
>
> I don't like the truncating hash suggestion because it limits the ability 
> of a hash code to uniquely identify a key.
>
> If a user requires the ability to search on both (column1) and (column1, 
> column2), they can create two hash indexes and the planner can decide which 
> to use.
> Or, they can use a btree. I think hash has a subset of uses where it would 
> be a significant gain, and focus should be spent on this subset.
>
> Cheers,
> mark
>
I agree that we should focus primarily on the subset of uses for hash
indexes where there would be a significant gain. I do think that being
able to use a single O(1) hash lookup against all the values specified
in a pseudo-multi-column index could be very beneficial in reducing
access time and I/O. 

Since we already have to check the actual tuple values for any index
lookup in postgresql, we could only store the full hash value and the
corresponding TIDs in the bucket. Then when we lookup an item by
calculating its hash, if the exact hash is not present in the bucket,
then we know that the item is not in the index. If the value exists,
then we would check the heap tuples before returning the results. Thus
a negative lookup only needs to check the index and if the hash function
is "good" there will be optimally only 1 possibly valid heap tuple if
there is a match. One very big win for this change is to allow a much
smaller index size (hash value + relevant TIDs) and the large column
values are only stored in the actual data tuples.

Regards,
Ken
> -- 
> Mark Mielke <[EMAIL PROTECTED]>
>

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Hannu Krosing
Ühel kenal päeval, N, 2007-09-06 kell 09:38, kirjutas Tom Lane:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane:
> >> No, because part of the deal is that you can do lookups using only the
> >> leading index columns.  At least, all the existing multicolumn index
> >> types can do that.
> 
> > One approahc is not to mix hashes, but to partition the hash, so that
> > each column gets its N bits in the hash. 
> 
> How does that help?  You still need all the keys to find out which
> bucket to look in.

no. you need to look at only the buckets where that part of hash matches

say you allocate bits 4-7 for column 2 and then need to look up column 2
value with hash 3 . here you need to look at only buckets N*16 + 3, that
is, you need to examine only each 16th bucket

-
Hannu


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Has anyone tried out the PL/pgSQL debugger?

2007-09-06 Thread John DeSoi

Hi Korry,

On Sep 6, 2007, at 10:23 AM, korry.douglas wrote:

John, I started writing up the API documentation and then noticed  
that most of what I intended to write is already described in the  
pldbgapi.c module.


Take a look at that module and let me know if you have any  
questions (you can e-mail me off-list if you like).  I'll update  
the documentation in pldbgapi.c as needed.



I just noticed that when digging around last night. It helped a lot  
with my understanding of how things work. I think that needs to go in  
the readme file or at least reference it from the readme file.


I would still like to see a simple example using psql. I know you  
would not really use psql for this, but I think it would help a lot  
with getting started for folks that want to use the debugger. I did  
not spend lots of time on it, but even after reading pldbgapi.c I was  
not able to get simple session going (e.g. how to start a session and  
request the source for a procedure).


Thanks,

John




John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread apoc9009

Simon Riggs schrieb:

On Thu, 2007-09-06 at 12:53 +0200, apoc9009 wrote:
  
You've either not read 23.4.4 or haven't understood it. If the text is 
unclear, documentation additions/changes are always welcome.
  

I have read this:

PostgreSQL directly supports file-based log shipping as described above. 
It is also possible to implement record-based log shipping, though this 
requires custom development.



We may expect that feature in later releases, but definitely not in 8.3

  
I wish to have an Solution, thadt backup my Database DB wihout 
Datalosses,  without locking Tables, without Shutdown
and without any User must be forced for logging out (Backup in 
Production State Online without Datalosses).



I'm not clear whether you are looking for Backup, or High Availability
Replication.

There is no data loss with the online backup technique described in the
manual.
  

No, there is a lost of Data.

The WAL File musst bei generated by the Postmaster and this File must be 
copied to the
safe Remote Backupserver. If the Machine, where the Database is running 
crashed, then the Last

WAL is lost and the Backup isnt complet,e this is the Center of the Problem.

If you require HA replication with zero data-loss then you are currently
restricted to non-database techniques, which you already mentioned, so
you have your answer already.
  

we talking about Backup this isnt the same as Replication.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread apoc9009

Heikki Linnakangas schrieb:

apoc9009 wrote:

"Without datalosses" is utopy. For that, you'd need something like
synchronous replication, otherwise you're always going to have a window
where you have something committed in the server, but not yet in the
backup. So it's just a question of how wide that window is, how much
data loss can you live with.

With file-based log shipping, you can get down to 1 second, by using the
archive_timeout setting. It will produce a lot of log files with very
little content in them, but they will compress well.
  
Yes, it is possible but not recommended . My Backup Servers Filesystem 
will explode :D



The record-based log shipping will give you a very narrow window, down
to < 1 second or even less if you're willing to poll the server that
often, but requires some custom development.
  
Correct, but  this is not good enought and i think there are a lot of 
Peoples having the same Problem.
It was wishfull, having a Online Streaming Backupsubsystem, thadt can 
produce portable Backupdatabases,

to prevent users for Millions of Archive Files on the Backup FTP-Server.

My Ideaa is the following Folder Structure:

/pg/backup/

Backup 12/24/2007 Version 1
/pg/backup/12_24_2007/base/rcvry.rcv  <--- Basebackup
/pg/backup/12_24_2007/changes/0001.chg  <--- Changed Data
 /changes/0002.chg  <--- Changed Data
 /changes/0003.chg  <--- Changed Data

Backup 12/24/2008 Version 2
/pg/backup/12_24_2008/base/rcvry.rcv  <--- Basebackup
/pg/backup/12_24_2008/changes/0001.chg  <--- Changed Data
 /changes/0002.chg  <--- Changed Data
 /changes/0003.chg  <--- Changed Data
  
 /changes/0010.chg  <--- Changed Data

 /changes/0001.rsf   <---  Recovery 
Stripeset File (10 MByte) addon of Basebackup

 delete *.chg

if a Stripeset of 10 *.chg Files exist, they should be converted or merged

to one greater Recovery Stripe File (*.RSF)

Things to do:

1.A Integradted FTP-Client to the Postmaster Process
2.Online Streamingbackup Logic inside the Postmaster

Apoc


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Simon Riggs
On Thu, 2007-09-06 at 12:53 +0200, apoc9009 wrote:
> > You've either not read 23.4.4 or haven't understood it. If the text is 
> > unclear, documentation additions/changes are always welcome.
> I have read this:
> 
> PostgreSQL directly supports file-based log shipping as described above. 
> It is also possible to implement record-based log shipping, though this 
> requires custom development.

We may expect that feature in later releases, but definitely not in 8.3

> I wish to have an Solution, thadt backup my Database DB wihout 
> Datalosses,  without locking Tables, without Shutdown
> and without any User must be forced for logging out (Backup in 
> Production State Online without Datalosses).

I'm not clear whether you are looking for Backup, or High Availability
Replication.

There is no data loss with the online backup technique described in the
manual.

If you require HA replication with zero data-loss then you are currently
restricted to non-database techniques, which you already mentioned, so
you have your answer already.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Michael Glaesemann


On Sep 6, 2007, at 10:53 , Mark Mielke wrote:

I don't like the truncating hash suggestion because it limits the  
ability of a hash code to uniquely identify a key.


AIUI, a hash can't be used as a unique identifier: it always needs to  
be rechecked due to the chance of collisions. There might be other  
issues with truncation, but preventing hashes from being unique isn't  
one of them.


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Hannu Krosing
Ühel kenal päeval, N, 2007-09-06 kell 16:15, kirjutas apoc9009:
> > So you want the user to still be connected to the failed machine, but at
> > the same time be connected to the new live failover machine ?
> >
> > -
> > Hannu
> No.
> The User should be connected to the running db without restrictions 
> while backup is in progress

And what's wrong with WAL-based backup then ?

Especially the variant described in 23.4.4 wich keeps the WAL copied to
backup machine  up to last second ?

---
Hannu



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-06 Thread Kevin Grittner
>>> On Wed, Sep 5, 2007 at 10:31 PM, in message
<[EMAIL PROTECTED]>, Greg Smith
<[EMAIL PROTECTED]> wrote: 
> 
> -There are two magic constants in the code:
> 
>  int smoothing_samples = 16;
>  float   scan_whole_pool_seconds = 120.0;
> 

> I personally 
> don't feel like these constants need to be exposed for tuning purposes;

> Determining 
> whether these should be exposed as GUC tunables is certainly an open 
> question though.
 
If you exposed the scan_whole_pool_seconds as a tunable GUC, that would
allay all of my concerns about this patch.  Basically, our problems were
resolved by getting all dirty buffers out to the OS cache within two
seconds; any longer than that and the OS cache didn't reach its trigger
point for pushing out to the controller cache in time to prevent the glut
which locks everything up.  I also suspect that this interval kept the OS
cache more aware of frequently updated pages, so that it could avoid
unnecessary physical writes under its own logic.
 
While I'm hoping that the new checkpoint techniques will be a better
solution, I can't count on that without significant testing in our
environment, and I really want a fall-back.  The metric you emphasized was
the percentage of PostgreSQL writes to the OS cache which were handled by
the background writer, which doesn't necessarily correspond to a solution
to the glut, which is based on the peak number of total writes presented
to the controller by the OS within a small window of time.
 
-Kevin
 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Has anyone tried out the PL/pgSQL debugger?

2007-09-06 Thread korry.douglas


Is there any documentation that describes how to use the SQL 
functions? Some are obvious enough, but a simple example showing a 
debugging session would be helpful.
John, I started writing up the API documentation and then noticed that 
most of what I intended to write is already described in the pldbgapi.c 
module.


Take a look at that module and let me know if you have any questions 
(you can e-mail me off-list if you like).  I'll update the documentation 
in pldbgapi.c as needed.


-- Korry


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Mark Mielke

Hannu Krosing wrote:

One approahc is not to mix hashes, but to partition the hash, so that
each column gets its N bits in the hash. 
  

How does that help?  You still need all the keys to find out which
bucket to look in.



no. you need to look at only the buckets where that part of hash matches

say you allocate bits 4-7 for column 2 and then need to look up column 2
value with hash 3 . here you need to look at only buckets N*16 + 3, that
is, you need to examine only each 16th bucket

  


I don't like the truncating hash suggestion because it limits the 
ability of a hash code to uniquely identify a key.


If a user requires the ability to search on both (column1) and (column1, 
column2), they can create two hash indexes and the planner can decide 
which to use.
Or, they can use a btree. I think hash has a subset of uses where it 
would be a significant gain, and focus should be spent on this subset.


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>



Re: [HACKERS] Just-in-time Background Writer Patch+Test Results

2007-09-06 Thread Greg Smith

On Thu, 6 Sep 2007, Kevin Grittner wrote:


If you exposed the scan_whole_pool_seconds as a tunable GUC, that would
allay all of my concerns about this patch.  Basically, our problems were
resolved by getting all dirty buffers out to the OS cache within two
seconds


Unfortunately it wouldn't make my concerns about your system go away or 
I'd have recommended exposing it specifically to address your situation. 
I have been staring carefully at your configuration recently, and I would 
wager that you could turn off the LRU writer altogether and still meet 
your requirements in 8.2.  Here's what you've got right now:



shared_buffers = 160MB (=2 buffers)
bgwriter_lru_percent = 20.0
bgwriter_lru_maxpages = 200
bgwriter_all_percent = 10.0
bgwriter_all_maxpages = 600


With the default delay of 200ms, this has the LRU-writer scanning the 
whole pool every 1 second, while the all-writer scans every two 
seconds--assuming they don't hit the write limits.  If some event were to 
dirty the whole pool in 200ms, it might take as much as 6.7 seconds to 
write everything out (2 / 600 * 200 ms) via the all-scan.  The 
all-scan is already gone in 8.3.  Your LRU scan will take much longer than 
that to clear everything out.  At least (2 / 200 * 200ms) 20 seconds 
to clear a fully dirty cache.


But in fact, it's impossible to even bound how long it will take before 
the LRU writer (which is the only part this new patch tries to improve) 
gets around to writing even a single dirty buffer no matter what 
bgwriter_lru_percent (8.2) or scan_whole_pool_seconds (JIT patch) is set 
to.


There's a second low-level issue involved here.  When a page becomes 
dirty, that implies it was also recently used, which means the LRU writer 
won't touch it.  That page can't be written out by the LRU writer until an 
entire pass has been made over the shared_buffer pool while looking for 
buffers to allocate for new activity.  When the allocation clock-sweep 
passes over the newly dirtied buffer again, its usage count will drop by 
one and it will no longer be considered recently used.  At that point the 
LRU writer can write it out.  So unless there is other allocation activity 
going on, the scan_whole_pool_seconds mechanism will never provide the 
bound on time to scan and write everything you hope it will.


And if there's other allocations going on, the much more powerful JIT 
mechanism will scan the whole pool plenty fast if you bump the already 
exposed multiplier tunable up.  In my tests where the buffer cache was 
filled with mostly dirty buffers that couldn't be re-used (something 
relatively easy to trigger with pgbench tests), I've actually watched the 
new code scan >90% of the buffer cache looking for those few reusable 
buffers in the pool in a single invocation.  This would be like setting 
bgwriter_lru_percent=90.0 in the old configuration, but it only gets that 
aggressive when the distribution of pages in the buffer cache demands it, 
and when it has reason to believe going that fast will be helpful.


The completely understandable line of thinking that led to your request 
here is one of my concerns with exposing scan_whole_pool_seconds as a 
tunable.  It may suggest to people that if they set the number very low, 
it will assure all dirty buffers will be scanned and written within that 
time bound.  That's certainly not the case; both the maxpages and the 
usage count information will actually drive the speed that mechanism plods 
through the buffer cache.  It really isn't useful for scanning fast.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread apoc9009



So you want the user to still be connected to the failed machine, but at
the same time be connected to the new live failover machine ?

-
Hannu

No.
The User should be connected to the running db without restrictions 
while backup is in progress


Apoc

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Hannu Krosing
Ühel kenal päeval, N, 2007-09-06 kell 12:53, kirjutas apoc9009:
> I wish to have an Solution, thadt backup my Database DB wihout 
> Datalosses, 

Then you need the backup record to be completed on the backup machine
before the transaction commit returns on master.

This is quaranteed to be really slow for any significant transaction
rate but can be done using DRBD.

>  without locking Tables, 

No backup I know of needs locking tables

> without Shutdown 

This one I just don't understand. Shtdown what ?

> without any User must be forced for logging out (Backup in 
> Production State Online without Datalosses). 

So you want the user to still be connected to the failed machine, but at
the same time be connected to the new live failover machine ?

-
Hannu




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes:
> Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane:
>> No, because part of the deal is that you can do lookups using only the
>> leading index columns.  At least, all the existing multicolumn index
>> types can do that.

> One approahc is not to mix hashes, but to partition the hash, so that
> each column gets its N bits in the hash. 

How does that help?  You still need all the keys to find out which
bucket to look in.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Simon Riggs
On Thu, 2007-09-06 at 12:21 +0200, apoc9009 wrote:
>  If the System Crash, you have Dataloss of 
> over 16 MBytes thats Fatal and not acceptable! 1MByte Dataloss of ASCII Data 
> is also 
> not acceptable!

Is any data loss acceptable in the case of a disaster? How much?

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Hannu Krosing
Ühel kenal päeval, N, 2007-09-06 kell 12:53, kirjutas apoc9009:
> > You've either not read 23.4.4 or haven't understood it. If the text is 
> > unclear, documentation additions/changes are always welcome.
> I have read this:
> 
> PostgreSQL directly supports file-based log shipping as described above. 
> It is also possible to implement record-based log shipping, though this 
> requires custom development.

check out SkyTools, it seems to make use of 23.4.4 to have sub-second
failure window

-
Hannu



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Hannu Krosing
Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane:
> Gregory Stark <[EMAIL PROTECTED]> writes:
> > "Kenneth Marshall" <[EMAIL PROTECTED]> writes:
> >> - What about multi-column indexes? The current implementation
> >> only supports 1 column.
> 
> > That seems kind of weird. It seems obvious that you mix the three hashes
> > together which reduces it to the solved problem. 
> 
> No, because part of the deal is that you can do lookups using only the
> leading index columns.  At least, all the existing multicolumn index
> types can do that.

One approahc is not to mix hashes, but to partition the hash, so that
each column gets its N bits in the hash. 

If you do it smartly you can use any column for index lookups, not just
the leading one.

>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>choose an index scan if your joining column's datatypes do not
>match


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Richard Huxton

apoc9009 wrote:


You've either not read 23.4.4 or haven't understood it. If the text is 
unclear, documentation additions/changes are always welcome.

I have read this:

PostgreSQL directly supports file-based log shipping as described above. 
It is also possible to implement record-based log shipping, though this 
requires custom development.


But thadt is not thadt what iam looking for!

Filebased Logship backups having a big Problem for doing continous 
Backups. You have to wait until
the Postmaster has written the WAL File, after this you can save it to 
the Backupserver. But 1 WAL
has a size of 16 MByte ny default! (thadt is a big Datahole in your  
"Online Backup"!)


Which is why that entire section is about copying just the changed parts 
of WAL files.


It makes no sense to reduce the Filesize. If the Filesize is smaller 
then 16 MBytes for WAL Files
you have still the same Problem, there are Data losses and thadt the 
Center of the Problem.


But in your original email you said:
> All Users of Hugh Databases (Missioncritical and allways Online) can
> bring up its
> Databases with the same information with differences 1-5 Sec. before
> the Crash occurs!

That suggested to me that you didn't want per-transaction backup, just 
one backup every second. OK, what you actually want is a continuous 
backup with one copy made per transaction.


I wish to have an Solution, thadt backup my Database DB wihout 
Datalosses,  without locking Tables, without Shutdown
and without any User must be forced for logging out (Backup in 
Production State Online without Datalosses).


So, if I understand, you want on of:
1. External RAID array. If main machine dies, turn backup machine on. 
Both share the same disks.


2. Something like DRBD to copy individual disk blocks between machines. 
You could do this just for WAL.


3. Replication.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Guillaume Smet
Hi,

On 9/6/07, apoc9009 <[EMAIL PROTECTED]> wrote:
> Filebased Logship backups having a big Problem for doing continous
> Backups. You have to wait until
> the Postmaster has written the WAL File, after this you can save it to
> the Backupserver. But 1 WAL
> has a size of 16 MByte ny default! (thadt is a big Datahole in your
> "Online Backup"!)

You should read the documentation more carefully:
"archive_timeout (integer)

The archive_command is only invoked on completed WAL segments.
Hence, if your server generates little WAL traffic (or has slack
periods where it does so), there could be a long delay between the
completion of a transaction and its safe recording in archive storage.
To put a limit on how old unarchived data can be, you can set
archive_timeout to force the server to switch to a new WAL segment
file periodically. When this parameter is greater than zero, the
server will switch to a new segment file whenever this many seconds
have elapsed since the last segment file switch. Note that archived
files that are closed early due to a forced switch are still the same
length as completely full files. Therefore, it is unwise to use a very
short archive_timeout — it will bloat your archive storage.
archive_timeout settings of a minute or so are usually reasonable.
This parameter can only be set in the postgresql.conf file or on the
server command line. "

(Taken from 
http://www.postgresql.org/docs/8.2/static/runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT)

> I wish to have an Solution, thadt backup my Database DB wihout
> Datalosses,  without locking Tables, without Shutdown
> and without any User must be forced for logging out (Backup in
> Production State Online without Datalosses).

Well, there's what you want and there's what is possible. Warm standby
is definitely the best solution for your problem, even if not perfect.

--
Guillaume

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Heikki Linnakangas
apoc9009 wrote:
> I wish to have an Solution, thadt backup my Database DB wihout
> Datalosses,  without locking Tables, without Shutdown
> and without any User must be forced for logging out (Backup in
> Production State Online without Datalosses).

"Without datalosses" is utopy. For that, you'd need something like
synchronous replication, otherwise you're always going to have a window
where you have something committed in the server, but not yet in the
backup. So it's just a question of how wide that window is, how much
data loss can you live with.

With file-based log shipping, you can get down to 1 second, by using the
archive_timeout setting. It will produce a lot of log files with very
little content in them, but they will compress well.

The record-based log shipping will give you a very narrow window, down
to < 1 second or even less if you're willing to poll the server that
often, but requires some custom development.

I wonder, do you really need such a super real time backup solution,
when you have the remote SAN? Don't you trust that the SAN hardware?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread apoc9009


You've either not read 23.4.4 or haven't understood it. If the text is 
unclear, documentation additions/changes are always welcome.

I have read this:

PostgreSQL directly supports file-based log shipping as described above. 
It is also possible to implement record-based log shipping, though this 
requires custom development.


But thadt is not thadt what iam looking for!

Filebased Logship backups having a big Problem for doing continous 
Backups. You have to wait until
the Postmaster has written the WAL File, after this you can save it to 
the Backupserver. But 1 WAL
has a size of 16 MByte ny default! (thadt is a big Datahole in your  
"Online Backup"!)


It makes no sense to reduce the Filesize. If the Filesize is smaller 
then 16 MBytes for WAL Files
you have still the same Problem, there are Data losses and thadt the 
Center of the Problem.


I wish to have an Solution, thadt backup my Database DB wihout 
Datalosses,  without locking Tables, without Shutdown
and without any User must be forced for logging out (Backup in 
Production State Online without Datalosses).



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Richard Huxton

apoc9009 wrote:



http://www.postgresql.org/docs/8.2/static/warm-standby.html
Particularly section 23.4.4


23.4.4 is thadt what iam using just im Time but this ist not eneought 
for me!


No Versioning, no chances to prevent data losses

You have to wait until a WAL File ist written (Default Value for WAL 
Files is 16 MBytes).


You've either not read 23.4.4 or haven't understood it. If the text is 
unclear, documentation additions/changes are always welcome.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread apoc9009



http://www.postgresql.org/docs/8.2/static/warm-standby.html
Particularly section 23.4.4


23.4.4 is thadt what iam using just im Time but this ist not eneought 
for me!


No Versioning, no chances to prevent data losses

You have to wait until a WAL File ist written (Default Value for WAL 
Files is 16 MBytes).


I need an EXCAT Copy from the Database and only the last changes in it 
to for
updating my Backupresitory. If the System Crash, you have Dataloss of 
over 16 MBytes
thadts Fatal and not acceptable! 1MByte Dataloss of ASCII Data is also 
not acceptable!


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread apoc9009



http://www.postgresql.org/docs/8.2/static/warm-standby.html
Particularly section 23.4.4


Thadt is Replication NOT Backup



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread Richard Huxton

apoc9009 wrote:

Hi Hackers,

In my Project i have to handle a Database with 600 GByte Text only, 
distributed on 4 Tablespaces
on multiple Harddisks and Remote SAN's connected via Gigaethernet to the 
Remote SAN-Storage.


I need more flexibility by doing Backups of my big Database, but the 
built in Online Backupsystem
dont work for my Setup good enought for me. I Can not accept 16 MByte 
big WAL's Files
for securing it on Tape. 16 MByte Data loss on a Crash Situation is 
Fatal and no helpfully

(1 MByte to). I wish to have a continoues Backup without any data losses.


http://www.postgresql.org/docs/8.2/static/warm-standby.html
Particularly section 23.4.4

That can get you to 1 second or less.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Has anyone tried out the PL/pgSQL debugger?

2007-09-06 Thread Dave Page
John DeSoi wrote:
> Hi Dave,
> 
> On Sep 5, 2007, at 3:54 AM, Dave Page wrote:
> 
>> That's odd - I cannot reproduce that on OS X using beta 4 (which has no
>> important changes in the debugger over beta 3).
>>
>> Can you provide a simple test case?
> 
> I get the same error with this:
> 
> create or replace function debug_test(out t text, out i integer)
> returns record as $$
> begin
> t := 'test 1';
> i := 10;
> return;
> end;
> $$ language plpgsql;
> 
> 
> I did the following:
> 
> 1. Right click the function and chose "Debug" from the "Debugging" submenu.
> 2. Clicked the OK button on the dialog.

Thanks John - bug found and fixed in SVN.

Regards Dave

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)

2007-09-06 Thread apoc9009

Hi Hackers,

In my Project i have to handle a Database with 600 GByte Text only, 
distributed on 4 Tablespaces
on multiple Harddisks and Remote SAN's connected via Gigaethernet to the 
Remote SAN-Storage.


I need more flexibility by doing Backups of my big Database, but the 
built in Online Backupsystem
dont work for my Setup good enought for me. I Can not accept 16 MByte 
big WAL's Files
for securing it on Tape. 16 MByte Data loss on a Crash Situation is 
Fatal and no helpfully

(1 MByte to). I wish to have a continoues Backup without any data losses.

My Idea:
- 1 A Complete Database Backup from Scratch (its implemented right now)
- 2 Online streaming  Backup thadt, updates my Basebackup continously 
every Time Changes was made (the half functionality is allready implemented)
- 3 Able to redirect the Online Streamingbackup Files to an Remote 
Servermachine (FTP) (the ARCHIVE Param in postgresql.conf can do thadt 
allready
 but the Method is not 100% Continously, big Holes of Datalosses 
can occur)
- 4 Version Management of Multiple Backuplines by Timestamp (is not yet 
implemented)
- 5 Recovery Option inside the PSQL-Client, for doing the Desaster 
Recovery. (is not  yet implemented)


Benefitts:

All Users of Hugh Databases (Missioncritical and allways Online) can 
bring up its
Databases with the same information with differences 1-5 Sec. before the 
Crash occurs!


ps:
At EMCSoftware there is an Tool thadt has can do thadt for ORACLE and 
MSSQL but there

is not Option for Postgres avaiable );

Sry for my bad english and i hope there is some one thadt can understand 
the Problem.


Apoc


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] left outer join vs subplan

2007-09-06 Thread Simon Riggs
On Wed, 2007-09-05 at 20:11 +0400, Teodor Sigaev wrote:

> I found two queries which do the same thing but they is very different in 
> time. 
> For test suite it's about 10^3 times, but on real data it can be 10^5 times. 
> It's observed on 8.1-current, 8.2-current and CVS HEAD versions. Interesting 
> that even without LIMIT clause they take approximately the same time, but 
> costs 
> is differ in 30 times. Is any way to tweaking pgsql to produce more 
> reasonable 
> plan for first query?

Times I get are:
Q1: ~950ms
Q2: ~5ms

> This query is auto-generated, so they may be more complex and I choose 
> simplest 
> example.

I think we need to build up a library of autogenerated queries, so we
can do things which address multiple use cases. Can you tell us more
about who/what generated it, so we can research?

The query formulation does seem a fairly common one.

> First query:
> explain analyze
> select *
> from
>  a
>  left outer join (
>  select b.id, sum(b.val)
>  from b
>  group by b.id
>  ) bagg
>  on bagg.id = a.id
> where
>  a.id > 1
> order by a.addon, a.id
> limit 100;
>   Limit  (cost=9923.36..9923.61 rows=100 width=20) (actual 
> time=2232.437..2233.273 rows=100 loops=1)
> ->  Sort  (cost=9923.36..10031.41 rows=43221 width=20) (actual 
> time=2232.428..2232.709 rows=100 loops=1)
>   Sort Key: a.addon, a.id
>   Sort Method:  top-N heapsort  Memory: 24kB
>   ->  Merge Right Join  (cost=0.00..8271.48 rows=43221 width=20) 
> (actual 
> time=313.198..2052.559 rows=4 loops=1)
> Merge Cond: (b.id = a.id)
> ->  GroupAggregate  (cost=0.00..5725.41 rows=53292 width=12) 
> (actual time=0.266..1422.522 rows=5 loops=1)
>   ->  Index Scan using bidx on b  (cost=0.00..4309.26 
> rows=15 width=12) (actual time=0.217..547.402 rows=15 loops=1)
> ->  Index Scan using a1idx on a  (cost=0.00..1256.90 
> rows=40551 
> width=8) (actual time=0.171..155.073 rows=4 loops=1)
>   Index Cond: (a.id > 1)
>   Total runtime: 2233.940 ms

The value of sum(b.val) is never used in the query, so the aggregate
itself could be discarded. I suspect there are other conditions you
aren't showing us that would make this impossible?

The aggregate prevents the condition bagg.id = a.id from being pushed
down so that we know b.id = a.id. If we knew that then we could use b.id
= ? as an index condition to retrieve the rows.

Since we can't use the best technique, we use another. That then hits a
third optimization problem. When an IndexScan is used to enforce order,
we don't estimate how much of the table needs to be scanned before we
start hitting rows. In the example you give we need to scan 65% of the
table using an IndexScan before we hit any rows. So we would probably be
better off doing a Sort<-SeqScan to apply the condition.

I think we need to do all 3 eventually.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster