Re: [HACKERS] Hash index todo list item
On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
> 2. Evaluate the performance of different hash index implementations
>    and/or changes to the current implementation. My current plan is
>    to keep the implementation as simple as possible and still provide
>    the desired performance. Several hash index suggestions deal with
>    changing the layout of the keys on a page to improve lookup
>    performance, including reducing the bucket size to a fraction of
>    a page or only storing the hash value on the page, instead of
>    the index value itself.

You might find this patch useful: http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php

It implements the "just store the hash in the index" idea; it also sorts the entries in a bucket by the hash value, which allows binary search to be used to locate candidate matches.

I was surprised that this didn't result in a performance improvement for the benchmarks that I ran, but I never got around to investigating further (either running more benchmarks or checking whether there was a bug in the implementation).

Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge it up to HEAD if you'd like.

-Neil
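As a concrete illustration of the "sort bucket entries by hash value and binary-search them" idea Neil describes, a minimal sketch might look like the following. This is not the code from the patch above; the entry layout and names are assumptions made only for the example.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical bucket entry: only the hash of the key plus the heap tuple id. */
    typedef struct HashEntry
    {
        uint32_t    hash;        /* hash of the index key */
        uint32_t    tid_block;   /* heap block number */
        uint16_t    tid_offset;  /* heap line pointer */
    } HashEntry;

    /*
     * Return the position of the first entry whose hash is >= target,
     * assuming the bucket keeps its entries sorted by hash.  Entries at
     * that position with an equal hash are only candidates: the heap tuple
     * still has to be checked, since different keys can share a hash.
     */
    static size_t
    bucket_lower_bound(const HashEntry *entries, size_t nentries, uint32_t target)
    {
        size_t  lo = 0;
        size_t  hi = nentries;

        while (lo < hi)
        {
            size_t  mid = lo + (hi - lo) / 2;

            if (entries[mid].hash < target)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;    /* may equal nentries when no entry has hash >= target */
    }

The payoff is that a probe into a large bucket costs O(log n) comparisons of 32-bit integers instead of a linear walk over full key values.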
Re: [HACKERS] Hash index todo list item
On Thu, Sep 06, 2007 at 01:08:59PM -0500, Kenneth Marshall wrote:
> Since we already have to check the actual tuple values for any index
> lookup in postgresql, we could only store the full hash value and the
> corresponding TIDs in the bucket. Then when we lookup an item by
> calculating its hash, if the exact hash is not present in the bucket,
> then we know that the item is not in the index.

Sounds like you'd be returning a bitmap for use with a bitmap scan. That's a different take on other suggestions I've heard and would allow a hash index to have an almost unlimited key size yet flexible matching... (combined with other indexes, or even just the same index). Neat.

Have a nice day,
Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?
Tom Lane wrote: "Florian G. Pflug" <[EMAIL PROTECTED]> writes: So I believe you're right, and we can skip taking the lock in the no xid case - I actually think with just a little bit of more work, we can go even further, and get rid of the ReadNewTransactionId() call completely during snapshotting. [ squint... ] This goes a bit far for me. In particular, I think this will fail in the edge case when there are no live XIDs visible in ProcArray. You cannot go back and do ReadNewTransactionId afterward, at least not without re-scanning the ProcArray a second time, which makes it at best a questionable win. Why would it? The idea was to remember the largest committed xid, and that won't go away just because the proc array is rather empty xid-wise. Actually, in that case the "largest comitted xid"+1 will (nearly) be what ReadNewTransactionId() returns. (Nearly because the transaction with the xid ReadNewTransactionId()-1 might have aborted, so largestCommittedXid might be a bit further behind ReadNewTransactionId().) (That slightly lagging of largestCommittedXid might cause some tuples not to be VACUUMED though, so we might want to update largestCommittedXid for ABORTS too, and probably rename it to largestNonRunningXid or whatever ;-) ). I would go as far as saying that largestCommittedXid+1 is the natural choice for xmax - after all, xmax is the cutoff point after which a xid *cannot* be seen as committed, and largestCommittedXid+1 is the smallest xmax that guarantees that we see xacts committed before the snapshot as committed. The xmin computation won't change - apart from using some other initial value. This would rid us of the rather complicated entanglement of XidGenLock and the ProcArrayLock, lessen the lock contention, and reduce the average snapshot size a bit. greetings, Florian Pflug ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?
"Florian G. Pflug" <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> The lazy-XID patch, as committed, doesn't help that situation at all, > I think the comment is correct in principle - If we remove the oldest > xmin without locking, then two concurrent OldestXmin calculations > will get two different results. The question is if that has any > negative effects, though. My point is that there's a difference between what you compute (and publish) as your own xmin, and what you compute as the RecentGlobalXmin. I don't think there's any need for a guarantee that two concurrent processes get the same estimate of RecentGlobalXmin, as long as they do not get an estimate less than reality, ie, that someone cannot later compute and publish a smaller xmin. There are reasons why we want two concurrent GetSnapshotDatas to compute the same xmin, but I think in the end it just comes down to being a prerequisite for the above constraint --- without that you're not sure that someone might not be about to publish an xmin less than what you obtained as RecentGlobalXmin. Dropping a live xid is a whole different issue. There, you have the problem that you need everyone to see a consistent commit order, which is what the example in GetSnapshotData is about. But I don't think that xmin enters into that. xmin is only about "is it safe to drop this tuple because no one can see it?". There, we don't have to be exactly correct, we only have to err in the conservative direction. > It was this comment in GetSnapshotData that made me keep the locking > in the first place: > * It is sufficient to get shared lock on ProcArrayLock, even if we are > * computing a serializable snapshot and therefore will be setting > * MyProc->xmin. This is because any two backends that have overlapping > * shared holds on ProcArrayLock will certainly compute the same xmin If I recall correctly, that text was written to justify downgrading GetSnapshotData's hold on ProcArrayLock from exclusive to shared --- it was merely arguing that the results wouldn't change if we did that. I don't see an argument there that this condition is really *necessary*. We do have to think carefully about whether GetOldestXmin can compute a value that's too large, but right at the moment I see no problem there. > So I believe you're right, and we can skip taking the lock in the no > xid case - I actually think with just a little bit of more work, we > can go even further, and get rid of the ReadNewTransactionId() call > completely during snapshotting. [ squint... ] This goes a bit far for me. In particular, I think this will fail in the edge case when there are no live XIDs visible in ProcArray. You cannot go back and do ReadNewTransactionId afterward, at least not without re-scanning the ProcArray a second time, which makes it at best a questionable win. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Low hanging fruit in lazy-XID-assignment patch?
Tom Lane wrote:
> Simon was complaining a bit ago that we still have problems with
> excessive contention for the ProcArrayLock, and that much of this stems
> from the need for transaction exit to take that lock exclusively.
> The lazy-XID patch, as committed, doesn't help that situation at all,
> saying
>
>    /*
>     * Lock ProcArrayLock because that's what GetSnapshotData uses.
>     * You might assume that we can skip this step if we had no
>     * transaction id assigned, because the failure case outlined
>     * in GetSnapshotData cannot happen in that case. This is true,
>     * but we *still* need the lock guarantee that two concurrent
>     * computations of the *oldest* xmin will get the same result.
>     */

I think the comment is correct in principle - if we remove the oldest xmin without locking, then two concurrent OldestXmin calculations will get two different results. The question is if that has any negative effects, though.

> That leaves xmin, which AFAICS is only interesting for the computations
> of GetOldestXmin() and RecentGlobalXmin. And I assert it doesn't matter
> if those numbers advance asynchronously, so long as they never go
> backward.

Yes, the xmin is surely the only field that might need the locking. It was this comment in GetSnapshotData that made me keep the locking in the first place:

 * It is sufficient to get shared lock on ProcArrayLock, even if we are
 * computing a serializable snapshot and therefore will be setting
 * MyProc->xmin. This is because any two backends that have overlapping
 * shared holds on ProcArrayLock will certainly compute the same xmin
 * (since no xact, in particular not the oldest, can exit the set of
 * running transactions while we hold ProcArrayLock --- see further
 * discussion just below). So it doesn't matter whether another backend
 * concurrently doing GetSnapshotData or GetOldestXmin sees our xmin as
 * set or not; he'd compute the same xmin for himself either way.
 * (We are assuming here that xmin can be set and read atomically,
 * just like xid.)

But now that I read this again, I think that comment is just misleading - especially the part "So it doesn't matter whether another backend concurrently doing GetSnapshotData or GetOldestXmin sees our xmin as set or not; he'd compute the same xmin for himself either way."

This sounds as if the Proc->xmin that *one* backend announces had influence over the Proc->xmin that *another* backend might compute. Which isn't true - it only influences the GlobalXmin that another backend might compute.

So I believe you're right, and we can skip taking the lock in the no-xid case - I actually think that with just a little bit more work, we can go even further, and get rid of the ReadNewTransactionId() call completely during snapshotting.

There are two things we must ensure when it comes to snapshots, commits and xid assignment:

1) A transaction must either be not in progress, be in our snapshot, or have an xid >= xmax.
2) If transaction A sees B as committed, and B sees C as committed, then A must see C as committed.

ad 1): We guarantee that by storing the xid in the proc array before releasing the XidGenLock. Therefore, when we later obtain our xmax value, we can be sure that we see all xacts in the proc array that have an xid < xmax and are in progress.

ad 2): We guarantee that by serializing snapshotting against committing. Since we use ReadNewTransactionId() as the snapshot's xmax, this implies that we take the ProcArrayLock *before* reading our xmax value.

Now, ReadNewTransactionId() is actually larger than necessary as an xmax.
The minimal xmax that we can set is "largest committed xid"+1. We can easily track that value during commit when we hold the ProcArrayLock (if we have no xid, and therefore don't have to hold the lock, we also don't need to update that value).

If we used this "LatestCommittedXid" as xmax, we'd still guarantee (2), but without having to hold the XidGenLock during GetSnapshotData().

I wouldn't have dared to suggest this for 8.3, but since you came up with locking improvements in the first place... ;-)

greetings, Florian Pflug
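In outline, the bookkeeping Florian is proposing could look like the sketch below. The names and the plain mutex are stand-ins chosen for the example, not the actual PostgreSQL shared-memory structures, and real transaction ids need wraparound-aware comparisons rather than a plain ">".

    #include <pthread.h>
    #include <stdint.h>

    typedef uint32_t TransactionId;

    /* Stand-ins for shared state normally protected by ProcArrayLock. */
    static pthread_mutex_t proc_array_lock = PTHREAD_MUTEX_INITIALIZER;
    static TransactionId latestCommittedXid = 0;

    /* Commit path: while holding the lock, remember the largest committed xid. */
    static void
    record_commit(TransactionId xid)
    {
        pthread_mutex_lock(&proc_array_lock);
        if (xid > latestCommittedXid)       /* wraparound ignored for brevity */
            latestCommittedXid = xid;
        /* ... also clear this backend's entry in the proc array ... */
        pthread_mutex_unlock(&proc_array_lock);
    }

    /* Snapshot path: an xmax without consulting ReadNewTransactionId()/XidGenLock. */
    static TransactionId
    snapshot_xmax(void)
    {
        TransactionId xmax;

        pthread_mutex_lock(&proc_array_lock);
        /*
         * No xid above latestCommittedXid can yet be seen as committed, so
         * latestCommittedXid + 1 is the smallest valid xmax, as argued above.
         */
        xmax = latestCommittedXid + 1;
        pthread_mutex_unlock(&proc_array_lock);
        return xmax;
    }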
Re: [HACKERS] Just-in-time Background Writer Patch+Test Results
On Thu, 6 Sep 2007, Kevin Grittner wrote:

> I thought that the bgwriter_lru_percent was scanned from the lru end
> each time; I would not expect that it would ever get beyond the oldest
> 10%.

You're correct; I stated that badly. What I should have said is that your LRU writer could potentially scan the pool as fast as once per second if there were enough allocations going on.

> How low does the count have to go, or does it track the count when it
> becomes dirty and look for a decrease?

The usage count has to be 0 before a page can be re-used for a new allocation, and the LRU background writer only writes out potentially reusable pages that are dirty. So the count has to be 0 before it will write it.

> This should keep us reading new pages, which covers this, yes?

One would hope. Your whole arrangement of shared_buffers, checkpoint_segments, and related parameters will need to be reconsidered for 8.3; you've got a delicately balanced arrangement for your 8.2 setup right now that's working for you, but just translating it straight to 8.3 won't get you what you want. I'll get back to the message you already sent on that subject when I get enough time to address it fully.

> I'm not clear on the benefit of not writing the recently accessed dirty
> pages when there are no less recently used dirty pages.

This presumes PostgreSQL has some notion of the balance of recently accessed vs. not accessed dirty pages, which it does not. Buffers get updated individually, and there's no mechanism summarizing what's in there; you have to scan the buffer cache yourself to figure that out. I do some of that in this new patch, tracking things like how many buffers are scanned on average to find reusable ones.

Many months ago, I wrote a very complicated re-implementation of the all-scan portion of the background writer that tracked the usage count of everything it looked at, kept statistics about how many pages were dirty at each usage count, then targeted how high of a usage count could be written given some information about what I/O rate you felt your devices could sustain. This did exactly what you're asking for here: wrote whatever dirty pages were around starting with the ones that hadn't been recently used, then worked its way up to pages with a higher usage count if the recently used ones were all clean.

As far as I've been able to tell, and from Heikki's test results, the load distributed checkpoint was a better answer to this problem. Rather than constantly fight to get pages with high usage counts out all the time, just spread the checkpoint out instead and deal with them only then. I gave up on that branch of code while he removed the all-scan writer altogether as part of committing LDC. I suspect the path I was following was exactly what you think you'd like to have, but it seems that it's not actually needed.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
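To make the usage-count discussion concrete, here is a simplified model of the allocation clock sweep Greg describes; the structure and names are invented for the example and are much simpler than the real buffer manager. The point is that a freshly dirtied (hence recently used) buffer only becomes visible to the LRU writer after the sweep has decremented its usage count back to zero.

    typedef struct BufferDesc
    {
        int     usage_count;    /* bumped on each access, capped at a small max */
        int     is_dirty;
    } BufferDesc;

    /*
     * Advance the clock hand until a reusable buffer (usage_count == 0) is
     * found, decrementing the usage count of every buffer passed over.
     * Only once a dirty buffer has been swept down to zero can the LRU
     * background writer consider writing it out.
     */
    static int
    clock_sweep_next(BufferDesc *buf, int nbuffers, int *clock_hand)
    {
        for (;;)
        {
            int         idx = *clock_hand;
            BufferDesc *b = &buf[idx];

            *clock_hand = (idx + 1) % nbuffers;

            if (b->usage_count == 0)
                return idx;         /* candidate for reuse (and for LRU writing) */

            b->usage_count--;       /* no longer "recently used" after this pass */
        }
    }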
[HACKERS] Win32 build Large Address Aware?
While reading one of the recent -perform threads, it occurred to me to check, and the 8.2.4 Win32 release binaries aren't marked "large address aware". This means the process gets a 2GB VM space, which is normal for 32-bit Windows. On x64, my understanding is that each 32-bit process can actually get 4GB if the appropriate flag is set in the binary. (I don't have the hardware to verify this.)

The reason documented for this behavior is that a 2GB VM space was the hard limit for a very long time, so some applications borrowed the high bit of pointers for their own use and couldn't cope with addresses over 2GB. Essentially it is just a default for backwards compatibility.

So with that in mind, is there a reason the Win32 binaries aren't marked that way? Unless there are problems with it, it might be worth doing until 64-bit builds are supported.
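For anyone who wants to try this, the flag can be inspected and toggled on an existing binary from a Visual Studio command prompt; the file name here is just an example, and rebuilding with the /LARGEADDRESSAWARE linker option achieves the same result:

    dumpbin /headers postgres.exe | findstr /i "large"
        (prints "Application can handle large (>2GB) addresses" when the flag is set)

    editbin /LARGEADDRESSAWARE postgres.exe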
[HACKERS] Low hanging fruit in lazy-XID-assignment patch?
Simon was complaining a bit ago that we still have problems with excessive contention for the ProcArrayLock, and that much of this stems from the need for transaction exit to take that lock exclusively. The lazy-XID patch, as committed, doesn't help that situation at all, saying

    /*
     * Lock ProcArrayLock because that's what GetSnapshotData uses.
     * You might assume that we can skip this step if we had no
     * transaction id assigned, because the failure case outlined
     * in GetSnapshotData cannot happen in that case. This is true,
     * but we *still* need the lock guarantee that two concurrent
     * computations of the *oldest* xmin will get the same result.
     */

On reflection though this seems wrong: I believe that we could skip taking the lock when exiting a transaction with no XID. The actions being guarded with the lock are

    MyProc->xid = InvalidTransactionId;
    MyProc->lxid = InvalidLocalTransactionId;
    MyProc->xmin = InvalidTransactionId;
    MyProc->inVacuum = false;    /* must be cleared with xid/xmin */

    /* Clear the subtransaction-XID cache too while holding the lock */
    MyProc->subxids.nxids = 0;
    MyProc->subxids.overflowed = false;

Clearing xid is obviously a no-op if we had no xid, and if we had no xid we have no subxids either, so the last 2 lines are also no-ops. I cannot see any reason why we need a guard on clearing lxid, either. inVacuum is only interesting if xmin is, since if there's no xid assigned then it's effectively just a filter on whether other backends pay attention to this one's xmin.

That leaves xmin, which AFAICS is only interesting for the computations of GetOldestXmin() and RecentGlobalXmin. And I assert it doesn't matter if those numbers advance asynchronously, so long as they never go backward.

Comments?

regards, tom lane
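The shape of the change being proposed is roughly the sketch below. The types and the mutex are simplified stand-ins for PGPROC and ProcArrayLock, used here only to show where the lock could be skipped; they are not the real backend structures.

    #include <pthread.h>
    #include <stdbool.h>

    typedef struct ProcEntry
    {
        unsigned int xid;        /* 0 = no transaction id assigned */
        unsigned int xmin;
        bool         in_vacuum;
    } ProcEntry;

    static pthread_mutex_t proc_array_lock = PTHREAD_MUTEX_INITIALIZER;

    static void
    clear_entry(ProcEntry *p)
    {
        p->xid = 0;
        p->xmin = 0;
        p->in_vacuum = false;
    }

    /* Transaction exit: only serialize against snapshots if an xid was assigned. */
    static void
    end_transaction(ProcEntry *p, bool had_xid)
    {
        if (had_xid)
        {
            /* Concurrent snapshots must see a consistent commit, so lock. */
            pthread_mutex_lock(&proc_array_lock);
            clear_entry(p);
            pthread_mutex_unlock(&proc_array_lock);
        }
        else
        {
            /*
             * No xid: clearing xid/subxids is a no-op, and xmin may advance
             * asynchronously as long as it never goes backwards, so the
             * exclusive lock can be skipped entirely.
             */
            clear_entry(p);
        }
    }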
Re: [HACKERS] Just-in-time Background Writer Patch+Test Results
"Kevin Grittner" <[EMAIL PROTECTED]> writes: > On Thu, Sep 6, 2007 at 11:27 AM, in message > <[EMAIL PROTECTED]>, Greg Smith > <[EMAIL PROTECTED]> wrote: >> With the default delay of 200ms, this has the LRU-writer scanning the >> whole pool every 1 second, > > Whoa! Apparently I've totally misread the documentation. I thought that > the bgwriter_lru_percent was scanned from the lru end each time; I would > not expect that it would ever get beyond the oldest 10%. I believe you're correct and Greg got this wrong. I won't draw any conclusions about whether the LRU stuff is actually doing you any good though. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
>>> On Thu, Sep 6, 2007 at 7:03 PM, in message <[EMAIL PROTECTED]>, Jeff Davis <[EMAIL PROTECTED]> wrote:
>
> I think ... there's still room for a simple tool that can zero out
> the meaningless data in a partially-used WAL segment before compression.
> It seems reasonable to me, so long as you keep archive_timeout at
> something reasonably high.
>
> If nothing else, people that already have a collection of archived WAL
> segments would then be able to compact them.

That would be a *very* useful tool for us, particularly if it could work against our existing collection of old WAL files.

-Kevin
Re: [HACKERS] Just-in-time Background Writer Patch+Test Results
>>> On Thu, Sep 6, 2007 at 11:27 AM, in message <[EMAIL PROTECTED]>, Greg Smith <[EMAIL PROTECTED]> wrote:
> On Thu, 6 Sep 2007, Kevin Grittner wrote:
>
> I have been staring carefully at your configuration recently, and I would
> wager that you could turn off the LRU writer altogether and still meet
> your requirements in 8.2.

I totally agree that it is of minor benefit compared to the all-writer, if it even matters at all. I knew that when I chose the settings.

> Here's what you've got right now:
>
>> shared_buffers = 160MB (=20000 buffers)
>> bgwriter_lru_percent = 20.0
>> bgwriter_lru_maxpages = 200
>> bgwriter_all_percent = 10.0
>> bgwriter_all_maxpages = 600
>
> With the default delay of 200ms, this has the LRU-writer scanning the
> whole pool every 1 second,

Whoa! Apparently I've totally misread the documentation. I thought that the bgwriter_lru_percent was scanned from the lru end each time; I would not expect that it would ever get beyond the oldest 10%. I put that in just as a guard to keep the backends from having to wait for the OS write. I've always doubted whether it was helping, but "it wasn't broke"

> while the all-writer scans every two
> seconds--assuming they don't hit the write limits. If some event were to
> dirty the whole pool in 200ms, it might take as much as 6.7 seconds to
> write everything out (20000 / 600 * 200 ms) via the all-scan.

Right. Since the file system didn't seem to be able to accept writes faster than 800 PostgreSQL pages per second, and I wanted to leave a LITTLE slack, I set that limit. We don't seem to hit it, as far as I can tell. In fact, the output rate would be naturally fairly smooth, if not for the "hold all dirty pages until the last possible moment, then write them all to the OS and fsync" approach.

> There's a second low-level issue involved here. When a page becomes
> dirty, that implies it was also recently used, which means the LRU writer
> won't touch it. That page can't be written out by the LRU writer until an
> entire pass has been made over the shared_buffer pool while looking for
> buffers to allocate for new activity. When the allocation clock-sweep
> passes over the newly dirtied buffer again, its usage count will drop by
> one and it will no longer be considered recently used. At that point the
> LRU writer can write it out.

How low does the count have to go, or does it track the count when it becomes dirty and look for a decrease?

> So unless there is other allocation activity
> going on, the scan_whole_pool_seconds mechanism will never provide the
> bound on time to scan and write everything you hope it will.

That may not be an issue for the environment where this has been a problem for us -- the web hits are coming in at a pretty good rate 24/7. (We have a couple dozen large companies scanning data through HTTP SOAP requests all the time.) This should keep us reading new pages, which covers this, yes?

> where the buffer cache was
> filled with mostly dirty buffers that couldn't be re-used

That would be the condition that would be the killer with a synchronous checkpoint if the OS cache has already had some dirty pages trickled out. If we can hit this condition in our web database, either the load distributed checkpoint will save us, or we can't use 8.3. Period.

> The completely understandable line of thinking that led to your request
> here is one of my concerns with exposing scan_whole_pool_seconds as a
> tunable.
> It may suggest to people that if they set the number very low,
> it will assure all dirty buffers will be scanned and written within that
> time bound. That's certainly not the case; both the maxpages and the
> usage count information will actually drive the speed that mechanism plods
> through the buffer cache. It really isn't useful for scanning fast.

I'm not clear on the benefit of not writing the recently accessed dirty pages when there are no less recently used dirty pages. I do trust the OS to not write them before they age out in that cache, and the OS cache doesn't start writing dirty pages from its cache until they reach a certain percentage of the cache space, so I'd just as soon let the OS know that the MRU dirty pages are there, so it knows that it's time to start working on the LRU pages in its cache.

-Kevin
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup(Maybe OFFTOPIC)
On Thu, 2007-09-06 at 21:50 +0100, Gregory Stark wrote:
> > - Improve scalability of recovery for large I/O bound databases
>
> That seems too vague for the TODO. Did you have specific items in mind?

I think we should parallelise recovery. Heikki wanted to do this another way, so I worded it vaguely (i.e. as a requirement) to cover multiple approaches. Some of that was discussed on -hackers already (where?).

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup(Maybe OFFTOPIC)
"Simon Riggs" <[EMAIL PROTECTED]> writes: > High Availability > - > - Allow a warm standby system to also allow read-only statements > - Allow WAL traffic to be streamed to another server for stand-by > replication (synchronous/asynchronous options) Asynchronous streaming of WAL would be a very good feature. Synchronous streaming where a commit doesn't return until the backup server (or a quorum of backup servers if you have more than one) acknowledges receipt of the logs past the commit record would be an *extremely* good feature. Those could be broken out into two separate points. Streaming WAL is one thing, feedback and pitr-synchronous commits would be a second point. Knowing what WAL record the backup server had received so far and what WAL record it had processed so far would be useful for plenty of other purposes as well. > - Improve scalability of recovery for large I/O bound databases That seems too vague for the TODO. Did you have specific items in mind? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
On Thu, 2007-09-06 at 19:23 -0400, Tom Lane wrote:
> Jeff Davis <[EMAIL PROTECTED]> writes:
> > On Thu, 2007-09-06 at 12:08 +0100, Heikki Linnakangas wrote:
> >> With file-based log shipping, you can get down to 1 second, by using the
> >> archive_timeout setting. It will produce a lot of log files with very
> >> little content in them, but they will compress well.
>
> > I tried doing a couple pg_switch_xlog() in quick succession, and the WAL
> > files that represent less than a second of actual data don't seem much
> > more compressible than the rest.
>
> That's because we do not try to zero out a WAL file when recycling it,
> so the part after what's been used is not any more compressible than the
> valid part.

That was my point.

> pg_switch_xlog is not, and was never intended to be, a solution for a
> low-latency log-shipping scheme. The right solution for that is to make
> a smarter log-shipping daemon that transmits only part of a WAL segment
> at a time. (As Hannu is getting tired of repeating, you can find a
> working example in SkyTools.)

I think that in addition to WalMgr (which is the tool I assume you're referring to), there's still room for a simple tool that can zero out the meaningless data in a partially-used WAL segment before compression. It seems reasonable to me, so long as you keep archive_timeout at something reasonably high.

If nothing else, people that already have a collection of archived WAL segments would then be able to compact them.

I agree that something like WalMgr is better for low-latency, however.

Regards,
Jeff Davis
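A minimal sketch of such a compaction tool is below. It assumes you already know the byte offset at which the useful data in the (copied-aside, not live) segment ends, for instance derived from the location returned by pg_switch_xlog(), and it simply zero-fills the rest of the default 16MB segment so the leftover recycled content compresses away. This is an illustration only, not a supplied utility.

    #include <stdio.h>
    #include <stdlib.h>

    #define WAL_SEGMENT_SIZE  (16 * 1024 * 1024)   /* default XLOG segment size */

    /* Usage: zero_wal_tail <segment-file> <used-bytes> */
    int
    main(int argc, char **argv)
    {
        static char zeros[8192];                   /* zero-initialized buffer */
        FILE       *f;
        long        used, remaining;

        if (argc != 3)
        {
            fprintf(stderr, "usage: %s <segment-file> <used-bytes>\n", argv[0]);
            return 1;
        }
        used = atol(argv[2]);
        if (used < 0 || used > WAL_SEGMENT_SIZE)
        {
            fprintf(stderr, "used-bytes out of range\n");
            return 1;
        }
        f = fopen(argv[1], "r+b");
        if (f == NULL || fseek(f, used, SEEK_SET) != 0)
        {
            perror(argv[1]);
            return 1;
        }
        for (remaining = WAL_SEGMENT_SIZE - used; remaining > 0; )
        {
            size_t chunk = remaining > (long) sizeof(zeros)
                           ? sizeof(zeros) : (size_t) remaining;

            if (fwrite(zeros, 1, chunk, f) != chunk)
            {
                perror("write");
                return 1;
            }
            remaining -= (long) chunk;
        }
        fclose(f);
        return 0;
    }

Reconstructing the file later is free: decompressing the gzipped copy restores the full 16MB including the zeroed tail.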
Re: [HACKERS] Trouble with the PL/pgSQL debugger and VC++
Hi Korry,

> Can you e-mail the VC++ projects that you created? (You can do that
> off-list). I have VC++ here but I haven't tried to do a PG build in that
> environment yet (guess it's time to learn).

Done.

>> However, I can't set any breakpoints using PgAdmin. I know the dlls are
>> loaded via Process Explorer, and in fact I can attach to them with the
>> VC++ debugger.
>
> When you say that you can't set any breakpoints using PgAdmin, does that
> mean that the menu choices ("Set Breakpoint" and "Debug") just don't
> appear? Or are they there but don't do anything? Or are you getting an
> error message?

Let me see if I can clarify. Both choices are available from the PgAdmin menu. When I choose Debug, a window opens asking me to set the value for the parameter to the function. I do that, hit OK. But then the window just reappears again.

If instead I do "Set Breakpoint" then I get a window that says "Waiting to set breakpoint in core.testwhere" with a progress bar (note there is a debugger window behind it also). That window never goes away. When I press cancel I get a Debug Assertion Failure:

    close.c, line 47
    Expression (fh >= 0 && (unsigned)fh < (unsigned)_nhandle)

> Can you gather a PgAdmin log file (see Options on the File Menu, then
> choose the Logging tab, check "Debug") and send it to me. That will give
> me some clues.

Yes, will do.

>> * Opening a new pgadmin window and doing step 6 (SELECT * FROM
>>   pldbg_wait_for_breakpoint(1);). That didn't work.
>
> That won't work... you want to open another session and SELECT
> testwhere(7) from the new session.

I thought each pgadmin sql window was its own session though (they have different backend pids)? No?

> The first session is your debugger client, the second session is the
> target process (the application that you are debugging).
>
>> * I tried executing the function (select testwhere(7);). That didn't
>>   work.
>
> I presume that you mean that the debugger session was still hung in the
> call to pldbg_wait_for_target(1), right? Did you remember to set
> shared_preload_libraries = '$libdir/plugins/plugin_debugger' in your
> postgresql.conf file (and restart the server afterwards)?

Yes. And checked it a few times :)

FYI, the readme includes the .so extension ($libdir/plugins/plugin_debugger.so') if I remember correctly; you might want to remove that.

Charlie
Re: [HACKERS] Some more msvc++ 8.2.4 build feedback
Charlie Savage wrote:
> Hi Magnus,
>
>> Yeah, this is a known problem, and it's fixed for 8.3. Really, the msvc
>> build process in 8.2 has a *lot* of rough edges ;-) The code itself builds
>> fine, but the buildsystem needed a lot more work.
>
> Great - figured that might be the case.
>
>>> 3. If you have a contrib module that is not known to the build system
>>> it blows up. For example, I use PostGIS. When running build.bat it
>>> stops at postgis (sorry, I didn't write down the error message but can
>>> easily get it if needed). Could unknown contrib modules just be skipped
>>> instead?
>>
>> Uh, can you explain more what you mean? How can the pg build be affected by
>> postgis?
>
> Download the postgis source and put it under contrib/postgis. Then:
>
> C:\Development\msys\src\postgresql-8.2.4\src\tools\msvc>perl mkvcbuild.pl
> Could not determine contrib module type for postgis-1.2.1
>  at mkvcbuild.pl line 326
>  main::AddContrib('postgis-1.2.1') called at mkvcbuild.pl line 200
>
> There is a loop in mkvcbuild.pl that processes the contrib modules:
>
> my $D;
> opendir($D, 'contrib') || croak "Could not opendir on contrib!\n";
> while (my $d = readdir($D)) {
>     next if ($d =~ /^\./);
>     next unless (-f "contrib/$d/Makefile");
>     next if (grep {/^$d$/} @contrib_excludes);
>     AddContrib($d);
> }
> closedir($D);
>
> Perhaps I shouldn't be installing the postgis source to contrib, but
> that used to be the recommended practice (at least for PostGIS).

Oh, didn't realise you'd stick it in the actual contrib tree. I thought it would build with pgxs or something.

I'll put it on my TODO to change that code to look at the Makefile instead of the directories. Unless you want to send in a patch :-P

//Magnus
[HACKERS] Trouble with the PL/pgSQL debugger and VC++
Hi Korry,

I am having problems with getting the debugger to work.

Setup:
OS: Windows XP
PostgreSQL version: 8.2.4 compiled with VC++ 2005

Since my version of pg is built with VC++ I thought it wise to also use VC++ for the debugger plugin. So I converted the Makefile into 3 different VC++ projects - pldbgapi, plugin_debugger and targetinfo. Note that targetinfo is not mentioned at all in the comments at the top of the Makefile - you may wish to update them.

VC++ did not compile the code as is; I've attached a patch below with the changes I had to make. I also generated the appropriate DEF files (using the perl scripts in postgresql-8.2.4\src\tools\msvc). I also had to define a preprocessor define, __WIN32__ (line 1524, pldbgapi.c). Maybe you could use something more standard, like _WIN32?

I was then able to build the dlls except the profiler (for some reason the struct timezone wasn't being picked up via the includes - but I left that for another day). I then installed the dlls as per the instructions and updated my postgresql.conf file.

However, I can't set any breakpoints using PgAdmin. I know the dlls are loaded via Process Explorer, and in fact I can attach to them with the VC++ debugger.

So then I tried running through your command line example (using the PgAdmin sql window and then psql) and here are the results:

1. CREATE OR REPLACE FUNCTION testwhere(x int) RETURNS int AS $$
   DECLARE
      result int;
   BEGIN
      result := x + 1;
      RETURN x;
   END;
   $$ LANGUAGE 'plpgsql' STABLE;

2. SELECT * FROM pldbg_get_target_info( 'testwhere', 'f' );

   target;schema;nargs;argtypes;targetname;argmodes;argnames;targetlang;fqname;returnsset;returntype
   80655;79041;1;23;testwhere;;{x};77823;core.testwhere;f;23

3. SELECT * FROM pldbg_create_listener();

   pldbg_create_listener
   1

4. SELECT * from pldbg_set_global_breakpoint(1, 80655, NULL, NULL);

   pldbg_set_global_breakpoint
   t

5. SELECT * FROM pldbg_wait_for_target(1);

At this point the session hangs, as explained in your email. So what happens next? I tried:

* Opening a new pgadmin window and doing step 6 (SELECT * FROM pldbg_wait_for_breakpoint(1);). That didn't work.
* I tried executing the function (select testwhere(7);). That didn't work.

Any tips/help appreciated. Like I said, I can step through the code in the VC++ debugger so I can pretty much look at anything that might be helpful. And I'd be happy to send along the VC++ project files and DEF files if you'd like them.
Thanks,

Charlie

Only in .: msvc
diff -u /c/temp/contrib/pldebugger/pldbgapi.c ./pldbgapi.c
--- /c/temp/contrib/pldebugger/pldbgapi.c	2007-07-29 17:09:46 -0600
+++ ./pldbgapi.c	2007-09-06 00:34:29 -0600
@@ -1560,6 +1560,12 @@
 }
 static int allocateServerListener( int * port )
 {
+#ifdef WIN32
+	WORD	wVersionRequested;
+	WSADATA	wsaData;
+	int	err;
+	u_long	blockingMode = 0;
+#endif
 	int	sockfd = socket( AF_INET, SOCK_STREAM, 0 );
 	struct	sockaddr_in	proxy_addr = {0};
 	socklen_t	proxy_addr_len = sizeof( proxy_addr );
@@ -1571,9 +1577,6 @@
 	proxy_addr.sin_addr.s_addr = htonl( INADDR_ANY );
 #ifdef WIN32
-	WORD	wVersionRequested;
-	WSADATA	wsaData;
-	int	err;
 	wVersionRequested = MAKEWORD( 2, 2 );
@@ -1617,7 +1620,6 @@
 	listen( sockfd, 2 );
 #ifdef WIN32
-	u_long	blockingMode = 0;
 	ioctlsocket( sockfd, FIONBIO, &blockingMode );
 #endif
Only in .: pldebugger
diff -u /c/temp/contrib/pldebugger/plugin_debugger.c ./plugin_debugger.c
--- /c/temp/contrib/pldebugger/plugin_debugger.c	2007-08-07 10:37:14 -0600
+++ ./plugin_debugger.c	2007-09-06 00:34:58 -0600
@@ -1143,16 +1143,18 @@
 	int	client_sock;
 	int	reuse_addr_flag = 1;
+#ifdef WIN32
+	WORD	wVersionRequested;
+	WSADATA	wsaData;
+	int	err;
+	u_long	blockingMode = 0;
+#endif
 	/* Ask the TCP/IP stack for an unused port */
 	srv_addr.sin_family = AF_INET;
 	srv_addr.sin_port = htons( 0 );
 	srv_addr.sin_addr.s_addr = htonl( INADDR_ANY );
 #ifdef WIN32
-	WORD	wVersionRequested;
-	WSADATA	wsaData;
-	int	err;
-
 	wVersionRequested = MAKEWORD( 2, 2 );
 	err = WSAStartup( wVersionRequested, &wsaData );
@@ -1197,7 +1199,6 @@
 	listen( sockfd, 2 );
 #ifdef WIN32
-	u_long	blockingMode = 0;
 	ioctlsocket( sockfd, FIONBIO, &blockingMode );
 #endif
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
>>> On Thu, Sep 6, 2007 at 3:25 PM, in message <[EMAIL PROTECTED]>, Jeff Davis <[EMAIL PROTECTED]> wrote:
> On Thu, 2007-09-06 at 12:08 +0100, Heikki Linnakangas wrote:
>> With file-based log shipping, you can get down to 1 second, by using the
>> archive_timeout setting. It will produce a lot of log files with very
>> little content in them, but they will compress well.
>
> I tried doing a couple pg_switch_xlog() in quick succession, and the WAL
> files that represent less than a second of actual data don't seem much
> more compressible than the rest.

Agreed. We kick out a WAL file once per hour (if they don't fill first) and have found that a compressed WAL file during normal activity averages a little over 4 MB. During the nightly VACUUM ANALYZE we get a few over 10 MB. The interesting thing is that about the time that these WAL files would get reused, even when the system is idle, they are at the VACUUM ANALYZE size again. Note how the first three 18:00 file sizes echo at the next morning, before the users are in and working.

 5293110 2007-08-22 17:14 000100180044.gz
 5205720 2007-08-22 17:14 000100180045.gz
 5104595 2007-08-22 17:14 000100180046.gz
 3747524 2007-08-22 17:14 000100180047.gz
 3118762 2007-08-22 17:14 000100180048.gz
 3342918 2007-08-22 17:14 000100180049.gz
 4600438 2007-08-22 17:14 00010018004A.gz
 2715708 2007-08-22 17:15 00010018004B.gz
 2865803 2007-08-22 17:15 00010018004C.gz
10111700 2007-08-22 18:00 00010018004D.gz
11561630 2007-08-22 18:00 00010018004E.gz
11619590 2007-08-22 18:00 00010018004F.gz
 7165231 2007-08-22 18:00 000100180050.gz
 4012164 2007-08-22 18:00 000100180051.gz
 4590502 2007-08-22 18:00 000100180052.gz
 3617203 2007-08-22 18:01 000100180053.gz
 3552210 2007-08-22 18:01 000100180054.gz
 4006261 2007-08-22 18:01 000100180055.gz
 3538293 2007-08-22 18:01 000100180056.gz
 4291776 2007-08-22 18:02 000100180057.gz
 4837268 2007-08-22 18:02 000100180058.gz
 3948408 2007-08-22 19:02 000100180059.gz
 2714635 2007-08-22 20:02 00010018005A.gz
 4989698 2007-08-22 21:02 00010018005B.gz
 5290729 2007-08-22 22:02 00010018005C.gz
 5203306 2007-08-22 23:02 00010018005D.gz
 5101976 2007-08-23 00:02 00010018005E.gz
 3745125 2007-08-23 01:02 00010018005F.gz
 3119218 2007-08-23 02:02 000100180060.gz
 3340691 2007-08-23 03:02 000100180061.gz
 4599279 2007-08-23 04:02 000100180062.gz
10110899 2007-08-23 05:02 000100180063.gz
11555895 2007-08-23 06:02 000100180064.gz
11587646 2007-08-23 07:02 000100180065.gz
Re: [HACKERS] Trouble with the PL/pgSQL debugger and VC++
> Since my version of pg is built with VC++ I thought it wise to also use
> VC++ for the debugger plugin. So I converted the Makefile into 3
> different VC++ projects - pldbgapi, plugin_debugger and targetinfo. Note
> that targetinfo is not mentioned at all in the comments at the top of
> the Makefile - you may wish to update them.

Hi Charlie, thanks for the feedback (and thanks for the patches!).

Can you e-mail the VC++ projects that you created? (You can do that off-list). I have VC++ here but I haven't tried to do a PG build in that environment yet (guess it's time to learn).

> However, I can't set any breakpoints using PgAdmin. I know the dlls are
> loaded via Process Explorer, and in fact I can attach to them with the
> VC++ debugger.

When you say that you can't set any breakpoints using PgAdmin, does that mean that the menu choices ("Set Breakpoint" and "Debug") just don't appear? Or are they there but don't do anything? Or are you getting an error message?

Can you gather a PgAdmin log file (see Options on the File Menu, then choose the Logging tab, check "Debug") and send it to me? That will give me some clues.

> 5. SELECT * FROM pldbg_wait_for_target(1);
>
> At this point the session hangs, as explained in your email. So what
> happens next? I tried:
>
> * Opening a new pgadmin window and doing step 6 (SELECT * FROM
>   pldbg_wait_for_breakpoint(1);). That didn't work.

That won't work... you want to open another session and SELECT testwhere(7) from the new session. The first session is your debugger client, the second session is the target process (the application that you are debugging).

> * I tried executing the function (select testwhere(7);). That didn't
>   work.

I presume that you mean that the debugger session was still hung in the call to pldbg_wait_for_target(1), right? Did you remember to set shared_preload_libraries = '$libdir/plugins/plugin_debugger' in your postgresql.conf file (and restart the server afterwards)?

-- Korry
Re: [HACKERS] Some more msvc++ 8.2.4 build feedback
> Oh, didn't realise you'd stick it in the actual contrib tree. I thought
> it would build with pgxs or something.

You used to have to build postgis from the contrib directory. That changed a while back, but I guess it's just habit. It has its own configure and makefile scripts, so it doesn't seem to use pgxs.

> I'll put it on my TODO to change that code to look at the Makefile
> instead of the directories. Unless you want to send in a patch :-P

Thanks. No patch from me - my perl and autoconf skills leave something to be desired :)

Charlie
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
Jeff Davis <[EMAIL PROTECTED]> writes:
> On Thu, 2007-09-06 at 12:08 +0100, Heikki Linnakangas wrote:
>> With file-based log shipping, you can get down to 1 second, by using the
>> archive_timeout setting. It will produce a lot of log files with very
>> little content in them, but they will compress well.

> I tried doing a couple pg_switch_xlog() in quick succession, and the WAL
> files that represent less than a second of actual data don't seem much
> more compressible than the rest.

That's because we do not try to zero out a WAL file when recycling it, so the part after what's been used is not any more compressible than the valid part.

pg_switch_xlog is not, and was never intended to be, a solution for a low-latency log-shipping scheme. The right solution for that is to make a smarter log-shipping daemon that transmits only part of a WAL segment at a time. (As Hannu is getting tired of repeating, you can find a working example in SkyTools.)

regards, tom lane
Re: [HACKERS] Just-in-time Background Writer Patch+Test Results
On Thu, Sep 06, 2007 at 09:20:31AM -0500, Kevin Grittner wrote:
> >>> On Wed, Sep 5, 2007 at 10:31 PM, in message
> <[EMAIL PROTECTED]>, Greg Smith
> <[EMAIL PROTECTED]> wrote:
> >
> > -There are two magic constants in the code:
> >
> >     int smoothing_samples = 16;
> >     float scan_whole_pool_seconds = 120.0;
> >
> > I personally
> > don't feel like these constants need to be exposed for tuning purposes;
> > Determining
> > whether these should be exposed as GUC tunables is certainly an open
> > question though.
>
> If you exposed the scan_whole_pool_seconds as a tunable GUC, that would
> allay all of my concerns about this patch. Basically, our problems were

I like the idea of not having that as a GUC, but I'm doubtful that it can be hard-coded like that. What if checkpoint_timeout is set to 120? Or 60? Or 2000? I don't know that there should be a direct correlation, but ISTM that scan_whole_pool_seconds should take checkpoint intervals into account somehow.

--
Decibel!, aka Jim Nasby [EMAIL PROTECTED]
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [HACKERS] Hash index todo list item
Michael Glaesemann wrote:
> On Sep 6, 2007, at 10:53 , Mark Mielke wrote:
>> I don't like the truncating hash suggestion because it limits the
>> ability of a hash code to uniquely identify a key.
>
> AIUI, a hash can't be used as a unique identifier: it always needs to be
> rechecked due to the chance of collisions. There might be other issues
> with truncation, but preventing hashes from being unique isn't one of
> them.

Of course - that's why I used the word "limit". Hash works best when the key is unique, however. A 32-bit hash will be many powers of 2 more unique than an 8-bit hash.

Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
On Thursday, 2007-09-06 at 19:33, apoc9009 wrote:
> Simon Riggs schrieb:
> > I'm not clear whether you are looking for Backup, or High Availability
> > Replication.
> >
> > There is no data loss with the online backup technique described in the
> > manual.
>
> No, there is a loss of data.
>
> The WAL file must be generated by the postmaster and this file must be
> copied to the safe remote backup server. If the machine where the
> database is running crashes, then the last WAL is lost and the backup
> isn't complete; this is the center of the problem.

Read and re-read 23.4.4.

You DO NOT have to wait for the whole file to be complete to copy it over. You can query the position where postgres has currently completed writing and then copy over that part. Doing so you can have a WAL-based backup that is good up to the last second (if you poll the function and copy over the newly written part of the file every second).

Look at WalMgr.py in the SkyTools package for how this can be done in practice. SkyTools is available at http://pgfoundry.org/projects/skytools/

If used for setting up WAL-based backup on pg 8.2, it runs automatically in "syncdaemon" mode, which means that parts of a WAL file are copied over as soon as they are written by postgres.

--
Hannu
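The "query the position" step Hannu mentions can be done with the 8.2 functions pg_current_xlog_location() and pg_xlogfile_name_offset(). Below is a hedged libpq sketch of the polling query (connection parameters come from the environment, error handling is minimal); a copying daemon would then ship only the first file_offset bytes of the named segment on each pass.

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("");   /* uses PG* environment variables */
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* Which segment is being written, and how far into it are we? */
        res = PQexec(conn,
                     "SELECT * FROM pg_xlogfile_name_offset(pg_current_xlog_location())");
        if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
            printf("segment %s written up to byte offset %s\n",
                   PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1));
        else
            fprintf(stderr, "query failed: %s", PQerrorMessage(conn));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }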
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
On Thu, 2007-09-06 at 12:08 +0100, Heikki Linnakangas wrote:
> With file-based log shipping, you can get down to 1 second, by using the
> archive_timeout setting. It will produce a lot of log files with very
> little content in them, but they will compress well.

I tried doing a couple pg_switch_xlog() in quick succession, and the WAL files that represent less than a second of actual data don't seem much more compressible than the rest.

I think WAL compression has been talked about before, with ideas such as removing unneeded full page images. However, it seems like it would be easy to get a lot of the benefit by just having a utility that could discard useless contents that are left over from a previously-recycled file, and then could just reconstruct it later with zeros.

Regards,
Jeff Davis
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
On Thu, 2007-09-06 at 17:53 +0300, Hannu Krosing wrote:
> On Thursday, 2007-09-06 at 16:15, apoc9009 wrote:
> > > So you want the user to still be connected to the failed machine, but
> > > at the same time be connected to the new live failover machine ?
> > >
> > > -
> > > Hannu
> >
> > No.
> > The User should be connected to the running db without restrictions
> > while backup is in progress
>
> And what's wrong with WAL-based backup then ?
>
> Especially the variant described in 23.4.4 which keeps the WAL copied to
> the backup machine up to the last second ?

I think the OP means
- synchronous replication
- ability to access the standby node for queries

Yes, both requests are planned for later releases.

Bruce, can we edit the TODO to include a section called "High Availability"? We already have a few scattered references to such things, but in Admin and WAL. We need a specific section. We currently have these items, which can be moved to the new section:

- Allow a warm standby system to also allow read-only statements
- Allow WAL traffic to be streamed to another server for stand-by replication

The new section would look like this:

High Availability
-----------------
- Allow a warm standby system to also allow read-only statements
- Allow WAL traffic to be streamed to another server for stand-by
  replication (synchronous/asynchronous options)
- Improve scalability of recovery for large I/O bound databases

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
On Thu, Sep 06, 2007 at 04:08:10PM +0200, apoc9009 wrote:
> > archive_timeout setting. It will produce a lot of log files with very
> > little content in them, but they will compress well.
>
> Yes, it is possible but not recommended. My backup server's filesystem
> will explode :D

. . .

> Correct, but this is not good enough and I think there are a lot of
> people having the same problem. It would be desirable to have an online
> streaming backup subsystem that can produce portable backup databases,
> to keep users from having millions of archive files on the backup FTP
> server.

It seems that what you want is near-real-time online backups with _no cost_, which is not a feature that I think anyone will ever work on.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
Users never remark, "Wow, this software may be buggy and hard to use, but at least there is a lot of code underneath."
--Damien Katz
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
On 9/6/07, apoc9009 <[EMAIL PROTECTED]> wrote:
> Backup 12/24/2008 Version 2
> /pg/backup/12_24_2008/base/rcvry.rcv          <--- Basebackup
> /pg/backup/12_24_2008/changes/0001.chg        <--- Changed Data
>                      /changes/0002.chg        <--- Changed Data
>                      /changes/0003.chg        <--- Changed Data
>
>                      /changes/0010.chg        <--- Changed Data
>
>                      /changes/0001.rsf        <--- Recovery Stripeset File (10 MByte) addon of Basebackup
>                                                     delete *.chg
>
> If a stripeset of 10 *.chg files exists, they should be converted or
> merged into one greater Recovery Stripe File (*.RSF)

Why? What does this actually do?
Re: [HACKERS] Hash index todo list item
On Thu, Sep 06, 2007 at 11:53:45AM -0400, Mark Mielke wrote:
> Hannu Krosing wrote:
>>>> One approach is not to mix hashes, but to partition the hash, so that
>>>> each column gets its N bits in the hash.
>>>
>>> How does that help? You still need all the keys to find out which
>>> bucket to look in.
>>
>> no. you need to look at only the buckets where that part of hash matches
>>
>> say you allocate bits 4-7 for column 2 and then need to look up column 2
>> value with hash 3. here you need to look at only buckets N*16 + 3, that
>> is, you need to examine only each 16th bucket
>
> I don't like the truncating hash suggestion because it limits the ability
> of a hash code to uniquely identify a key.
>
> If a user requires the ability to search on both (column1) and (column1,
> column2), they can create two hash indexes and the planner can decide
> which to use.
>
> Or, they can use a btree. I think hash has a subset of uses where it would
> be a significant gain, and focus should be spent on this subset.
>
> Cheers,
> mark

I agree that we should focus primarily on the subset of uses for hash indexes where there would be a significant gain. I do think that being able to use a single O(1) hash lookup against all the values specified in a pseudo-multi-column index could be very beneficial in reducing access time and I/O.

Since we already have to check the actual tuple values for any index lookup in postgresql, we could only store the full hash value and the corresponding TIDs in the bucket. Then when we lookup an item by calculating its hash, if the exact hash is not present in the bucket, then we know that the item is not in the index. If the value exists, then we would check the heap tuples before returning the results. Thus a negative lookup only needs to check the index, and if the hash function is "good" there will optimally be only 1 possibly valid heap tuple if there is a match.

One very big win for this change is to allow a much smaller index size (hash value + relevant TIDs); the large column values are only stored in the actual data tuples.

Regards,
Ken
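To illustrate what a hash-only index entry and the negative-lookup property would look like, here is a small sketch. The entry layout, the tid type, and heap_tuple_matches_key() are placeholders invented for the example; a real index would use PostgreSQL's own ItemPointer and heap access routines instead.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    /* The key itself is not stored in the index, only its full hash. */
    typedef struct HashIndexEntry
    {
        uint32_t    hash;
        uint64_t    tid;        /* stand-in for a heap tuple identifier */
    } HashIndexEntry;

    /* Placeholder: fetch the heap tuple and compare its key with ours. */
    extern bool heap_tuple_matches_key(uint64_t tid, const void *key, size_t keylen);

    /*
     * Probe one bucket.  If no entry carries exactly the search hash, the key
     * is certainly absent: that is the cheap negative lookup.  If an entry
     * matches, the heap tuple must still be checked, because distinct keys
     * can hash to the same value.
     */
    static bool
    bucket_probe(const HashIndexEntry *bucket, size_t n,
                 uint32_t search_hash, const void *key, size_t keylen)
    {
        size_t  i;

        for (i = 0; i < n; i++)
        {
            if (bucket[i].hash == search_hash &&
                heap_tuple_matches_key(bucket[i].tid, key, keylen))
                return true;
        }
        return false;
    }

With a well-behaved hash function, a miss costs no heap access at all and a hit usually costs exactly one, which is where the access-time and I/O savings come from.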
Re: [HACKERS] Hash index todo list item
On Thursday, 2007-09-06 at 09:38, Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > On Monday, 2007-09-03 at 19:55, Tom Lane wrote:
> >> No, because part of the deal is that you can do lookups using only the
> >> leading index columns. At least, all the existing multicolumn index
> >> types can do that.
>
> > One approach is not to mix hashes, but to partition the hash, so that
> > each column gets its N bits in the hash.
>
> How does that help? You still need all the keys to find out which
> bucket to look in.

No. You need to look at only the buckets where that part of the hash matches.

Say you allocate bits 4-7 for column 2 and then need to look up a column 2 value with hash 3. Here you need to look at only buckets N*16 + 3, that is, you need to examine only each 16th bucket.

-
Hannu
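A toy version of the bit-partitioning idea looks like this; the 4-bit slice for column 2 and the helper names are arbitrary choices made for the example, not a proposal for an on-disk format.

    #include <stdint.h>

    #define COL2_BITS   4
    #define COL2_MASK   ((1u << COL2_BITS) - 1)   /* low 4 bits belong to column 2 */

    /* Combine per-column hashes so each column owns a fixed slice of bits. */
    static uint32_t
    combined_hash(uint32_t col1_hash, uint32_t col2_hash)
    {
        return (col1_hash & ~COL2_MASK) | (col2_hash & COL2_MASK);
    }

    /*
     * A lookup on column 2 alone only needs to visit buckets whose low 4 bits
     * match the column-2 hash -- i.e. one bucket in 16 -- at the price of
     * column 1 contributing fewer bits to the bucket number.
     */
    static int
    bucket_matches_col2(uint32_t bucket_number, uint32_t col2_hash)
    {
        return (bucket_number & COL2_MASK) == (col2_hash & COL2_MASK);
    }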
Re: [HACKERS] Has anyone tried out the PL/pgSQL debugger?
Hi Korry,

On Sep 6, 2007, at 10:23 AM, korry.douglas wrote:
> John, I started writing up the API documentation and then noticed that
> most of what I intended to write is already described in the pldbgapi.c
> module. Take a look at that module and let me know if you have any
> questions (you can e-mail me off-list if you like). I'll update the
> documentation in pldbgapi.c as needed.

I just noticed that when digging around last night. It helped a lot with my understanding of how things work. I think that needs to go in the readme file, or at least be referenced from the readme file.

I would still like to see a simple example using psql. I know you would not really use psql for this, but I think it would help a lot with getting started for folks that want to use the debugger. I did not spend lots of time on it, but even after reading pldbgapi.c I was not able to get a simple session going (e.g. how to start a session and request the source for a procedure).

Thanks,

John

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
Simon Riggs schrieb:
> On Thu, 2007-09-06 at 12:53 +0200, apoc9009 wrote:
>>> You've either not read 23.4.4 or haven't understood it. If the text is
>>> unclear, documentation additions/changes are always welcome.
>>
>> I have read this:
>>
>> PostgreSQL directly supports file-based log shipping as described above.
>> It is also possible to implement record-based log shipping, though this
>> requires custom development.
>
> We may expect that feature in later releases, but definitely not in 8.3.
>
>> I wish to have a solution that backs up my database without data loss,
>> without locking tables, without a shutdown, and without any user being
>> forced to log out (backup in production state, online, without data
>> loss).
>
> I'm not clear whether you are looking for Backup, or High Availability
> Replication.
>
> There is no data loss with the online backup technique described in the
> manual.

No, there is a loss of data.

The WAL file must be generated by the postmaster and this file must be copied to the safe remote backup server. If the machine where the database is running crashes, then the last WAL is lost and the backup isn't complete; this is the center of the problem.

> If you require HA replication with zero data-loss then you are currently
> restricted to non-database techniques, which you already mentioned, so
> you have your answer already.

We are talking about backup; this isn't the same as replication.
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
Heikki Linnakangas schrieb:
> apoc9009 wrote:
>
> "Without datalosses" is utopy. For that, you'd need something like
> synchronous replication, otherwise you're always going to have a window
> where you have something committed in the server, but not yet in the
> backup. So it's just a question of how wide that window is, how much
> data loss can you live with.
>
> With file-based log shipping, you can get down to 1 second, by using the
> archive_timeout setting. It will produce a lot of log files with very
> little content in them, but they will compress well.

Yes, it is possible but not recommended. My backup server's filesystem will explode :D

> The record-based log shipping will give you a very narrow window, down
> to < 1 second or even less if you're willing to poll the server that
> often, but requires some custom development.

Correct, but this is not good enough and I think there are a lot of people having the same problem. It would be desirable to have an online streaming backup subsystem that can produce portable backup databases, to keep users from having millions of archive files on the backup FTP server.

My idea is the following folder structure:

/pg/backup/

Backup 12/24/2007 Version 1
/pg/backup/12_24_2007/base/rcvry.rcv          <--- Basebackup
/pg/backup/12_24_2007/changes/0001.chg        <--- Changed Data
                     /changes/0002.chg        <--- Changed Data
                     /changes/0003.chg        <--- Changed Data

Backup 12/24/2008 Version 2
/pg/backup/12_24_2008/base/rcvry.rcv          <--- Basebackup
/pg/backup/12_24_2008/changes/0001.chg        <--- Changed Data
                     /changes/0002.chg        <--- Changed Data
                     /changes/0003.chg        <--- Changed Data

                     /changes/0010.chg        <--- Changed Data

                     /changes/0001.rsf        <--- Recovery Stripeset File (10 MByte) addon of Basebackup
                                                    delete *.chg

If a stripeset of 10 *.chg files exists, they should be converted or merged into one greater Recovery Stripe File (*.RSF).

Things to do:
1. An integrated FTP client in the postmaster process
2. Online streaming backup logic inside the postmaster

Apoc
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
On Thu, 2007-09-06 at 12:53 +0200, apoc9009 wrote: > > You've either not read 23.4.4 or haven't understood it. If the text is > > unclear, documentation additions/changes are always welcome. > I have read this: > > PostgreSQL directly supports file-based log shipping as described above. > It is also possible to implement record-based log shipping, though this > requires custom development. We may expect that feature in later releases, but definitely not in 8.3 > I wish to have a solution that backs up my database without > data loss, without locking tables, without a shutdown, > and without forcing any user to log out (an online backup, in > production, without data loss). I'm not clear whether you are looking for Backup, or High Availability Replication. There is no data loss with the online backup technique described in the manual. If you require HA replication with zero data-loss then you are currently restricted to non-database techniques, which you already mentioned, so you have your answer already. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Hash index todo list item
On Sep 6, 2007, at 10:53 , Mark Mielke wrote: I don't like the truncating hash suggestion because it limits the ability of a hash code to uniquely identify a key. AIUI, a hash can't be used as a unique identifier: it always needs to be rechecked due to the chance of collisions. There might be other issues with truncation, but preventing hashes from being unique isn't one of them. Michael Glaesemann grzm seespotcode net ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
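To make the recheck point concrete, here is a rough SQL sketch of what a hash-only lookup amounts to (the table t and column key are hypothetical; hashtext() is PostgreSQL's internal text hash function, which happens to be callable from SQL):

-- Candidate matches: every row whose stored hash equals the probe hash.
-- Distinct keys can collide, so this set may contain false positives.
SELECT ctid FROM t WHERE hashtext(key) = hashtext('wanted value');

-- The executor therefore still has to recheck the real key against the heap:
SELECT ctid FROM t WHERE hashtext(key) = hashtext('wanted value')
                     AND key = 'wanted value';

Whether the index stores the full key or only its hash, the heap tuple has to be visited anyway, so the extra recheck adds little cost.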
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
Ühel kenal päeval, N, 2007-09-06 kell 16:15, kirjutas apoc9009: > > So you want the user to still be connected to the failed machine, but at > > the same time be connected to the new live failover machine ? > > > > - > > Hannu > No. > The user should stay connected to the running db without restrictions > while the backup is in progress And what's wrong with WAL-based backup then? Especially the variant described in 23.4.4, which keeps the WAL copied to the backup machine up to the last second? --- Hannu ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Just-in-time Background Writer Patch+Test Results
>>> On Wed, Sep 5, 2007 at 10:31 PM, in message <[EMAIL PROTECTED]>, Greg Smith <[EMAIL PROTECTED]> wrote: > > -There are two magic constants in the code: > > int smoothing_samples = 16; > float scan_whole_pool_seconds = 120.0; > > I personally > don't feel like these constants need to be exposed for tuning purposes; > Determining > whether these should be exposed as GUC tunables is certainly an open > question though. If you exposed the scan_whole_pool_seconds as a tunable GUC, that would allay all of my concerns about this patch. Basically, our problems were resolved by getting all dirty buffers out to the OS cache within two seconds; any longer than that and the OS cache didn't reach its trigger point for pushing out to the controller cache in time to prevent the glut which locks everything up. I also suspect that this interval kept the OS cache more aware of frequently updated pages, so that it could avoid unnecessary physical writes under its own logic. While I'm hoping that the new checkpoint techniques will be a better solution, I can't count on that without significant testing in our environment, and I really want a fall-back. The metric you emphasized was the percentage of PostgreSQL writes to the OS cache which were handled by the background writer, which doesn't necessarily correspond to a solution to the glut, which is based on the peak number of total writes presented to the controller by the OS within a small window of time. -Kevin ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Has anyone tried out the PL/pgSQL debugger?
Is there any documentation that describes how to use the SQL functions? Some are obvious enough, but a simple example showing a debugging session would be helpful. John, I started writing up the API documentation and then noticed that most of what I intended to write is already described in the pldbgapi.c module. Take a look at that module and let me know if you have any questions (you can e-mail me off-list if you like). I'll update the documentation in pldbgapi.c as needed. -- Korry ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Hash index todo list item
Hannu Krosing wrote: One approach is not to mix hashes, but to partition the hash, so that each column gets its N bits in the hash. How does that help? You still need all the keys to find out which bucket to look in. No: you need to look only at the buckets where that part of the hash matches. Say you allocate bits 4-7 for column 2 and then need to look up a column 2 value with hash 3. Here you need to look at only buckets N*16 + 3; that is, you need to examine only every 16th bucket. I don't like the truncating hash suggestion because it limits the ability of a hash code to uniquely identify a key. If a user requires the ability to search on both (column1) and (column1, column2), they can create two hash indexes and the planner can decide which to use. Or, they can use a btree. I think hash has a subset of uses where it would be a significant gain, and focus should be spent on this subset. Cheers, mark -- Mark Mielke <[EMAIL PROTECTED]>
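To illustrate the bit-partitioning scheme being discussed, a small sketch under the assumption that each column owns a fixed slice of the combined bucket number (the table and text columns are hypothetical, and hashtext() stands in for whatever per-type hash function the index would really use):

-- Combined 8-bit bucket number: bits 0-3 from column 1, bits 4-7 from column 2.
SELECT (hashtext(col1) & 15)
     | ((hashtext(col2) & 15) << 4) AS bucket
FROM t;

A predicate on col2 alone pins bits 4-7 of the bucket number and leaves bits 0-3 free, so with 256 buckets only the 16 buckets whose high nibble matches (hashtext('value') & 15) would need to be examined; the trade-off is that each column gets only a few bits, so buckets are coarser for single-column lookups.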
Re: [HACKERS] Just-in-time Background Writer Patch+Test Results
On Thu, 6 Sep 2007, Kevin Grittner wrote: If you exposed the scan_whole_pool_seconds as a tunable GUC, that would allay all of my concerns about this patch. Basically, our problems were resolved by getting all dirty buffers out to the OS cache within two seconds Unfortunately it wouldn't make my concerns about your system go away or I'd have recommended exposing it specifically to address your situation. I have been staring carefully at your configuration recently, and I would wager that you could turn off the LRU writer altogether and still meet your requirements in 8.2. Here's what you've got right now:

shared_buffers = 160MB (=20000 buffers)
bgwriter_lru_percent = 20.0
bgwriter_lru_maxpages = 200
bgwriter_all_percent = 10.0
bgwriter_all_maxpages = 600

With the default delay of 200ms, this has the LRU-writer scanning the whole pool every 1 second, while the all-writer scans every two seconds--assuming they don't hit the write limits. If some event were to dirty the whole pool in 200ms, it might take as much as 6.7 seconds to write everything out (20000 / 600 * 200 ms) via the all-scan. The all-scan is already gone in 8.3. Your LRU scan will take much longer than that to clear everything out: at least (20000 / 200 * 200 ms) 20 seconds to clear a fully dirty cache. But in fact, it's impossible to even bound how long it will take before the LRU writer (which is the only part this new patch tries to improve) gets around to writing even a single dirty buffer no matter what bgwriter_lru_percent (8.2) or scan_whole_pool_seconds (JIT patch) is set to. There's a second low-level issue involved here. When a page becomes dirty, that implies it was also recently used, which means the LRU writer won't touch it. That page can't be written out by the LRU writer until an entire pass has been made over the shared_buffer pool while looking for buffers to allocate for new activity. When the allocation clock-sweep passes over the newly dirtied buffer again, its usage count will drop by one and it will no longer be considered recently used. At that point the LRU writer can write it out. So unless there is other allocation activity going on, the scan_whole_pool_seconds mechanism will never provide the bound on time to scan and write everything you hope it will. And if there's other allocations going on, the much more powerful JIT mechanism will scan the whole pool plenty fast if you bump the already exposed multiplier tunable up. In my tests where the buffer cache was filled with mostly dirty buffers that couldn't be re-used (something relatively easy to trigger with pgbench tests), I've actually watched the new code scan >90% of the buffer cache looking for those few reusable buffers in the pool in a single invocation. This would be like setting bgwriter_lru_percent=90.0 in the old configuration, but it only gets that aggressive when the distribution of pages in the buffer cache demands it, and when it has reason to believe going that fast will be helpful. The completely understandable line of thinking that led to your request here is one of my concerns with exposing scan_whole_pool_seconds as a tunable. It may suggest to people that if they set the number very low, it will assure all dirty buffers will be scanned and written within that time bound. That's certainly not the case; both the maxpages and the usage count information will actually drive the speed that mechanism plods through the buffer cache. It really isn't useful for scanning fast.
-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
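For what it's worth, those worst-case figures fall straight out of the settings above; a quick back-of-the-envelope check, assuming roughly 20000 buffers and the default 200 ms bgwriter_delay:

SELECT 20000 / 600.0 * 0.2 AS all_scan_seconds;  -- ~6.7 s at 600 pages per round
SELECT 20000 / 200.0 * 0.2 AS lru_scan_seconds;  -- 20 s at 200 pages per round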
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
So you want the user to still be connected to the failed machine, but at the same time be connected to the new live failover machine ? - Hannu No. The user should stay connected to the running database without restrictions while the backup is in progress. Apoc ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
Ühel kenal päeval, N, 2007-09-06 kell 12:53, kirjutas apoc9009: > I wish to have a solution that backs up my database without > data loss, Then you need the backup record to be completed on the backup machine before the transaction commit returns on the master. This is guaranteed to be really slow for any significant transaction rate, but can be done using DRBD. > without locking tables, No backup I know of needs to lock tables. > without a shutdown This one I just don't understand. Shutdown of what? > and without forcing any user to log out (an online backup, in > production, without data loss). So you want the user to still be connected to the failed machine, but at the same time be connected to the new live failover machine ? - Hannu ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Hash index todo list item
Hannu Krosing <[EMAIL PROTECTED]> writes: > Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane: >> No, because part of the deal is that you can do lookups using only the >> leading index columns. At least, all the existing multicolumn index >> types can do that. > One approach is not to mix hashes, but to partition the hash, so that > each column gets its N bits in the hash. How does that help? You still need all the keys to find out which bucket to look in. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
On Thu, 2007-09-06 at 12:21 +0200, apoc9009 wrote: > If the system crashes, you have a data loss of > over 16 MBytes; that is fatal and not acceptable! 1 MByte data loss of ASCII data > is also > not acceptable! Is any data loss acceptable in the case of a disaster? How much? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
Ühel kenal päeval, N, 2007-09-06 kell 12:53, kirjutas apoc9009: > > You've either not read 23.4.4 or haven't understood it. If the text is > > unclear, documentation additions/changes are always welcome. > I have read this: > > PostgreSQL directly supports file-based log shipping as described above. > It is also possible to implement record-based log shipping, though this > requires custom development. Check out SkyTools; it seems to make use of 23.4.4 to achieve a sub-second failure window. - Hannu ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Hash index todo list item
Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane: > Gregory Stark <[EMAIL PROTECTED]> writes: > > "Kenneth Marshall" <[EMAIL PROTECTED]> writes: > >> - What about multi-column indexes? The current implementation > >> only supports 1 column. > > > That seems kind of weird. It seems obvious that you mix the three hashes > > together which reduces it to the solved problem. > > No, because part of the deal is that you can do lookups using only the > leading index columns. At least, all the existing multicolumn index > types can do that. One approach is not to mix hashes, but to partition the hash, so that each column gets its N bits in the hash. If you do it smartly you can use any column for index lookups, not just the leading one. > regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
apoc9009 wrote: You've either not read 23.4.4 or haven't understood it. If the text is unclear, documentation additions/changes are always welcome. I have read this: PostgreSQL directly supports file-based log shipping as described above. It is also possible to implement record-based log shipping, though this requires custom development. But that is not what I am looking for! File-based log-shipping backups have a big problem for doing continuous backups: you have to wait until the postmaster has finished writing the WAL file, and only then can you copy it to the backup server. But a WAL file is 16 MByte by default! (That is a big data hole in your "online backup"!) Which is why that entire section is about copying just the changed parts of WAL files. It makes no sense to reduce the file size. If the file size for WAL files is smaller than 16 MBytes, you still have the same problem: there is data loss, and that is the heart of the problem. But in your original email you said: > All users of huge databases (mission-critical and always online) can > bring up their > databases with the same information as 1-5 seconds before > the crash occurred! That suggested to me that you didn't want per-transaction backup, just one backup every second. OK, what you actually want is a continuous backup with one copy made per transaction. I wish to have a solution that backs up my database without data loss, without locking tables, without a shutdown, and without forcing any user to log out (an online backup, in production, without data loss). So, if I understand, you want one of: 1. External RAID array. If the main machine dies, turn the backup machine on. Both share the same disks. 2. Something like DRBD to copy individual disk blocks between machines. You could do this just for WAL. 3. Replication. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
Hi, On 9/6/07, apoc9009 <[EMAIL PROTECTED]> wrote: > File-based log-shipping backups have a big problem for doing continuous > backups: you have to wait until > the postmaster has finished writing the WAL file, and only then can you copy it to > the backup server. But a WAL file > is 16 MByte by default! (That is a big data hole in your > "online backup"!) You should read the documentation more carefully: "archive_timeout (integer) The archive_command is only invoked on completed WAL segments. Hence, if your server generates little WAL traffic (or has slack periods where it does so), there could be a long delay between the completion of a transaction and its safe recording in archive storage. To put a limit on how old unarchived data can be, you can set archive_timeout to force the server to switch to a new WAL segment file periodically. When this parameter is greater than zero, the server will switch to a new segment file whenever this many seconds have elapsed since the last segment file switch. Note that archived files that are closed early due to a forced switch are still the same length as completely full files. Therefore, it is unwise to use a very short archive_timeout — it will bloat your archive storage. archive_timeout settings of a minute or so are usually reasonable. This parameter can only be set in the postgresql.conf file or on the server command line. " (Taken from http://www.postgresql.org/docs/8.2/static/runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT) > I wish to have a solution that backs up my database without > data loss, without locking tables, without a shutdown, > and without forcing any user to log out (an online backup, in > production, without data loss). Well, there's what you want and there's what is possible. Warm standby is definitely the best solution for your problem, even if not perfect. -- Guillaume ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
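As a sketch of what this looks like in practice (assuming an 8.2 server): archive_timeout forces a segment switch on a timer, and the same switch can be requested by hand, so a partially filled 16 MB segment is completed and handed to archive_command right away:

-- Force completion of the current WAL segment so it becomes archivable now,
-- instead of waiting for 16 MB of WAL traffic:
SELECT pg_switch_xlog();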
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
apoc9009 wrote: > I wish to have a solution that backs up my database without > data loss, without locking tables, without a shutdown, > and without forcing any user to log out (an online backup, in > production, without data loss). "Without data loss" is utopian. For that, you'd need something like synchronous replication, otherwise you're always going to have a window where you have something committed in the server, but not yet in the backup. So it's just a question of how wide that window is, how much data loss can you live with. With file-based log shipping, you can get down to 1 second, by using the archive_timeout setting. It will produce a lot of log files with very little content in them, but they will compress well. The record-based log shipping will give you a very narrow window, down to < 1 second or even less if you're willing to poll the server that often, but requires some custom development. I wonder, do you really need such a super real-time backup solution, when you have the remote SAN? Don't you trust the SAN hardware? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
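The "custom development" behind record-based log shipping essentially means polling the primary for its current WAL write position and copying the partially filled segment up to that offset. A minimal sketch of the polling side, assuming an 8.2 server where these functions are available:

-- Current WAL write location:
SELECT pg_current_xlog_location();
-- Segment file name and byte offset an external copy script would need:
SELECT * FROM pg_xlogfile_name_offset(pg_current_xlog_location());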
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
You've either not read 23.4.4 or haven't understood it. If the text is unclear, documentation additions/changes are always welcome. I have read this: PostgreSQL directly supports file-based log shipping as described above. It is also possible to implement record-based log shipping, though this requires custom development. But that is not what I am looking for! File-based log-shipping backups have a big problem for doing continuous backups: you have to wait until the postmaster has finished writing the WAL file, and only then can you copy it to the backup server. But a WAL file is 16 MByte by default! (That is a big data hole in your "online backup"!) It makes no sense to reduce the file size. If the file size for WAL files is smaller than 16 MBytes, you still have the same problem: there is data loss, and that is the heart of the problem. I wish to have a solution that backs up my database without data loss, without locking tables, without a shutdown, and without forcing any user to log out (an online backup, in production, without data loss). ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
apoc9009 wrote: http://www.postgresql.org/docs/8.2/static/warm-standby.html Particularly section 23.4.4 23.4.4 is what I am using at the moment, but it is not enough for me! No versioning, no way to prevent data loss. You have to wait until a WAL file is written (the default size for WAL files is 16 MBytes). You've either not read 23.4.4 or haven't understood it. If the text is unclear, documentation additions/changes are always welcome. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
http://www.postgresql.org/docs/8.2/static/warm-standby.html Particularly section 23.4.4 23.4.4 is what I am using at the moment, but it is not enough for me! No versioning, no way to prevent data loss. You have to wait until a WAL file is written (the default size for WAL files is 16 MBytes). I need an EXACT copy of the database, and only the latest changes to it, for updating my backup repository. If the system crashes, you have a data loss of over 16 MBytes; that is fatal and not acceptable! 1 MByte data loss of ASCII data is also not acceptable! ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
http://www.postgresql.org/docs/8.2/static/warm-standby.html Particularly section 23.4.4 That is replication, NOT backup. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
apoc9009 wrote: Hi Hackers, In my project I have to handle a database with 600 GByte of text only, distributed over 4 tablespaces on multiple hard disks and remote SANs connected via Gigabit Ethernet to the remote SAN storage. I need more flexibility in backing up my big database, but the built-in online backup system doesn't work well enough for my setup. I cannot accept 16 MByte WAL files for securing it on tape. A 16 MByte data loss in a crash situation is fatal and not helpful (1 MByte too). I wish to have a continuous backup without any data loss. http://www.postgresql.org/docs/8.2/static/warm-standby.html Particularly section 23.4.4 That can get you to 1 second or less. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Has anyone tried out the PL/pgSQL debugger?
John DeSoi wrote: > Hi Dave, > > On Sep 5, 2007, at 3:54 AM, Dave Page wrote: > >> That's odd - I cannot reproduce that on OS X using beta 4 (which has no >> important changes in the debugger over beta 3). >> >> Can you provide a simple test case? > > I get the same error with this: > > create or replace function debug_test(out t text, out i integer) > returns record as $$ > begin > t := 'test 1'; > i := 10; > return; > end; > $$ language plpgsql; > > > I did the following: > > 1. Right click the function and chose "Debug" from the "Debugging" submenu. > 2. Clicked the OK button on the dialog. Thanks John - bug found and fixed in SVN. Regards Dave ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
Hi Hackers, In my project I have to handle a database with 600 GByte of text only, distributed over 4 tablespaces on multiple hard disks and remote SANs connected via Gigabit Ethernet to the remote SAN storage. I need more flexibility in backing up my big database, but the built-in online backup system doesn't work well enough for my setup. I cannot accept 16 MByte WAL files for securing it on tape. A 16 MByte data loss in a crash situation is fatal and not helpful (1 MByte too). I wish to have a continuous backup without any data loss. My idea:

- 1 A complete database backup from scratch (this is implemented right now)
- 2 An online streaming backup that updates my base backup continuously every time a change is made (half of this functionality is already implemented)
- 3 The ability to redirect the online streaming backup files to a remote server machine (FTP) (the ARCHIVE parameter in postgresql.conf can already do that, but the method is not 100% continuous; big holes of data loss can occur)
- 4 Version management of multiple backup lines by timestamp (not yet implemented)
- 5 A recovery option inside the psql client, for performing the disaster recovery (not yet implemented)

Benefits: All users of huge databases (mission-critical and always online) can bring up their databases with the same information as 1-5 seconds before the crash occurred! ps: At EMC Software there is a tool that can do that for ORACLE and MSSQL, but there is no option for Postgres available ); Sorry for my bad English, and I hope there is someone who can understand the problem. Apoc ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] left outer join vs subplan
On Wed, 2007-09-05 at 20:11 +0400, Teodor Sigaev wrote: > I found two queries which do the same thing but they are very different in > time. > For the test suite it's about 10^3 times, but on real data it can be 10^5 times. > It's observed on 8.1-current, 8.2-current and CVS HEAD versions. Interesting > that even without the LIMIT clause they take approximately the same time, but the costs > differ by a factor of 30. Is there any way to tweak pgsql to produce a more reasonable > plan for the first query? Times I get are: Q1: ~950ms Q2: ~5ms > This query is auto-generated, so real ones may be more complex; I chose the simplest > example. I think we need to build up a library of autogenerated queries, so we can do things which address multiple use cases. Can you tell us more about who/what generated it, so we can research? The query formulation does seem a fairly common one.
> First query:
> explain analyze
> select *
> from
> a
> left outer join (
> select b.id, sum(b.val)
> from b
> group by b.id
> ) bagg
> on bagg.id = a.id
> where
> a.id > 1
> order by a.addon, a.id
> limit 100;
> Limit (cost=9923.36..9923.61 rows=100 width=20) (actual time=2232.437..2233.273 rows=100 loops=1)
>   -> Sort (cost=9923.36..10031.41 rows=43221 width=20) (actual time=2232.428..2232.709 rows=100 loops=1)
>        Sort Key: a.addon, a.id
>        Sort Method: top-N heapsort Memory: 24kB
>        -> Merge Right Join (cost=0.00..8271.48 rows=43221 width=20) (actual time=313.198..2052.559 rows=4 loops=1)
>             Merge Cond: (b.id = a.id)
>             -> GroupAggregate (cost=0.00..5725.41 rows=53292 width=12) (actual time=0.266..1422.522 rows=5 loops=1)
>                  -> Index Scan using bidx on b (cost=0.00..4309.26 rows=15 width=12) (actual time=0.217..547.402 rows=15 loops=1)
>             -> Index Scan using a1idx on a (cost=0.00..1256.90 rows=40551 width=8) (actual time=0.171..155.073 rows=4 loops=1)
>                  Index Cond: (a.id > 1)
> Total runtime: 2233.940 ms
The value of sum(b.val) is never used in the query, so the aggregate itself could be discarded. I suspect there are other conditions you aren't showing us that would make this impossible? The aggregate prevents the condition bagg.id = a.id from being pushed down so that we know b.id = a.id. If we knew that then we could use b.id = ? as an index condition to retrieve the rows. Since we can't use the best technique, we use another. That then hits a third optimization problem. When an IndexScan is used to enforce order, we don't estimate how much of the table needs to be scanned before we start hitting rows. In the example you give we need to scan 65% of the table using an IndexScan before we hit any rows. So we would probably be better off doing a Sort<-SeqScan to apply the condition. I think we need to do all 3 eventually. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
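The second query is not shown here, but given the subject line it was presumably the correlated-subplan formulation, along these lines (a hypothetical reconstruction, not Teodor's exact SQL):

select a.*,
       (select sum(b.val) from b where b.id = a.id) as agg_val
from a
where a.id > 1
order by a.addon, a.id
limit 100;

Written this way the aggregate runs only for the rows the outer query actually returns, and b can be probed through its index on b.id, which is consistent with the ~5 ms timing quoted above.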