Re: [HACKERS] Patch queue concern
My feeling is we should have more regular sync points where the patch queue is emptied and everything committed or rejected. No doubt, but the real problem here is that reviewing/committing other people's patches is not fun, it's just work :-(. So it's no surprise that it tends to get put off. Not sure what to do about that. In my experience it mostly pays to keep people directly responsible for their own work. Every intermediate tester/reviewer/coordinator tends to reduce the submitter's feeling of responsibility. So I could imagine a modus operandi where a submitter states: I feel confident that you can commit without review and will be available for fixes/additional work required. Maybe we have that already, in the form of committers who commit their own work. But I do feel that some patches Bruce is talking about need agreement and help, not only review. Andreas
Re: [HACKERS] Patch queue concern
On Thu, 2007-03-29 at 01:34 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: My feeling is we should have more regular sync points where the patch queue is emptied and everything committed or rejected. No doubt, but the real problem here is that reviewing/committing other people's patches is not fun, it's just work :-(. Gosh, you always seemed to enjoy my patches so much ;-) We can all see how hard you and Bruce work and it's very much appreciated, even if we don't often say so. That's why everybody else works so hard too. Sometimes we only communicate the tensions caused by external expectations. So it's no surprise that it tends to get put off. Not sure what to do about that. Well, one thing I can do is say Thanks now and try to do that more regularly in the future. The enjoyment you and others take from working on PostgreSQL is infectious, so whatever else we do it's gotta stay fun. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] CREATE INDEX and HOT - revised design
Pavan Deolasee wrote: In this specific context, this particular case is easy to handle because we are only concerned about the serializable transactions started before CREATE INDEX commits. If PREPARE can see the new index, it implies that the CI transaction is committed. So a transaction starting after that can only see the tuple version that we have indexed. Yes, but the non-index plan PREPARE generated will be used until the end of the session, not only until the end of the transaction. Imagine that it wasn't explicitly PREPAREd (where you might say this is acceptable), but rather just a query inside a plpgsql function, maybe even called from some app using connection pooling. This means that the non-index-using plan might get used for quite a long time, which contradicts the work Tom did on plan invalidation, I think. Maybe Tom can comment on whether it's possible to use plan invalidation to eventually get rid of a stale plan in this context? greetings, Florian Pflug
Re: [HACKERS] CREATE INDEX and HOT - revised design
On 3/29/07, Florian G. Pflug [EMAIL PROTECTED] wrote: Yes, but the non-index plan PREPARE generated will be used until the end of the session, not only until the end of the transaction. Frankly I don't know how this works, but are you sure that the plan will be used until the end of the session? Even if that's the case, it can happen even today if we create a new index: the existing sessions will use the stale plan (assuming what you said is true). Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
Hi, here's some feedback on the comments. Simon Riggs wrote: On Wed, 2007-03-28 at 10:54 +0900, Koichi Suzuki wrote: As written below, full page writes can be categorized as follows: 1) Needed for crash recovery: first page update after each checkpoint. This has to be kept in WAL. 2) Needed for archive recovery: page updates between pg_start_backup and pg_stop_backup. These have to be kept in the archive log. 3) For a log-shipping slave such as pg_standby: no full page writes will be needed for this purpose. My proposal deals with 2). So, if we mark each full_page_write, I'd rather mark when this is needed. We still need only one bit because case 3) does not need any mark.

I'm very happy with this proposal, though I do still have some points in detailed areas. If you accept that 1 & 2 are valid goals, then 1 & 3, or 1, 2 & 3 are also valid goals, ISTM. i.e. you might choose to use full_page_writes on the primary and yet would like to see optimised data transfer to the standby server. In that case, you would need the mark.

Yes, I need the mark. In my proposal, only unmarked full-page writes, which were written as the first update after a checkpoint, are to be removed offline (pg_compresslog).

- Not sure why we need full_page_compress, why not just mark them always? That harms no one. (Did someone else ask for that? If so, keep it)

No, no one asked to have a separate option. There'll be no bad influence in doing so. So, if we mark each full_page_write, I'd rather mark when this is needed. We still need only one bit because case 3) does not need any mark.

OK, different question: Why would anyone ever set full_page_compress = off? Why have a parameter that does so little? ISTM this is: i) one more thing to get wrong; ii) cheaper to mark the block when appropriate than to perform the if() test each time. That can be done only in the path where backup blocks are present. iii) If we mark the blocks every time, it allows us to do an offline WAL compression. If the blocks aren't marked, that option is lost. The bit is useful information, so we should have it in all cases.

Not only full-page writes are written as WAL records. In my proposal, both the full-page write and the logical log are written in a WAL record, which will make WAL size slightly bigger (five percent or so). If full_page_compress = off, only a full-page write will be written in a WAL record. I thought someone would not be happy with this size growth. I agree to make this mandatory if everybody is happy with the extra logical log in WAL records with full page writes. I'd like to have your opinion.

- OTOH I'd like to see an explicit parameter set during recovery since you're asking the main recovery path to act differently in case a single bit is set/unset. If you are using that form of recovery, we should say so explicitly, to keep everybody else safe.

The only thing I had to do was to create dummy full page writes to maintain LSNs. Full page writes are omitted in the archive log. We have to keep LSNs the same as those in the original WAL. In this case, recovery has to read the logical log, not the dummy full page writes. On the other hand, if both a logical log and a real full page write are found in a log record, recovery has to use the real full page write.

I apologise for not understanding your reply, perhaps my original request was unclear. In recovery.conf, I'd like to see a parameter such as dummy_backup_blocks = off (default) | on to explicitly indicate to the recovery process that backup blocks are present, yet they are garbage and should be ignored.
Having garbage data within the system is potentially dangerous and I want to be told by the user that they were expecting that and it's OK to ignore that data. Otherwise I want to throw informative errors. Maybe it seems OK now, but the next change to the system may have unintended consequences and it may not be us making the change. "It's OK, the Alien will never escape from the lab" is the starting premise for many good sci-fi horrors and I want to watch them, not be in one myself. :-) We can call it other things, of course. e.g. ignore_dummy_blocks decompressed_blocks apply_backup_blocks

So far, we don't need any modification to the recovery and redo functions. They ignore the dummy and apply the logical logs. Also, if there are both full page writes and a logical log, current recovery selects the full page writes to apply. I agree to introduce this option if 8.3 code introduces any conflict with the current behavior. Or, we could introduce this option for future safety. Do you think we should introduce this option? If this should be introduced now, what we should do is check this option when a dummy full-page write appears.

Yes I believe so. As pg_standby does not include any chance to meet partial writes of pages, I believe you can omit all the full page writes. Of course, as Tom Lane suggested in http://archives.postgresql.org/pgsql-hackers/2007-02/msg00034.php removing full
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
On Thu, 2007-03-29 at 17:50 +0900, Koichi Suzuki wrote: Not only full-page writes are written as WAL records. In my proposal, both the full-page write and the logical log are written in a WAL record, which will make WAL size slightly bigger (five percent or so). If full_page_compress = off, only a full-page write will be written in a WAL record. I thought someone would not be happy with this size growth.

OK, I see what you're doing now and agree with you that we do need a parameter. Not sure about the name you've chosen though - it certainly confused me until you explained. A parameter called ..._compress indicates to me that it would reduce something in size, whereas what it actually does is increase the size of WAL slightly. We should have a parameter name that indicates what it actually does, otherwise some people will choose to use this parameter even when they are not using archive_command with pg_compresslog. Some possible names... additional_wal_info = 'COMPRESS' add_wal_info wal_additional_info wal_auxiliary_info wal_extra_data attach_wal_info ... others? I've got some ideas for the future for adding additional WAL info for various purposes, so it might be useful to have a parameter that can cater for multiple types of additional WAL data. Or maybe we go for something more specific like wal_add_compress_info = on wal_add__info ...

In recovery.conf, I'd like to see a parameter such as dummy_backup_blocks = off (default) | on to explicitly indicate to the recovery process that backup blocks are present, yet they are garbage and should be ignored. Having garbage data within the system is potentially dangerous and I want to be told by the user that they were expecting that and it's OK to ignore that data. Otherwise I want to throw informative errors. Maybe it seems OK now, but the next change to the system may have unintended consequences and it may not be us making the change. "It's OK, the Alien will never escape from the lab" is the starting premise for many good sci-fi horrors and I want to watch them, not be in one myself. :-) We can call it other things, of course. e.g. ignore_dummy_blocks decompressed_blocks apply_backup_blocks So far, we don't need any modification to the recovery and redo functions. They ignore the dummy and apply the logical logs. Also, if there are both full page writes and a logical log, current recovery selects the full page writes to apply. I agree to introduce this option if 8.3 code introduces any conflict with the current behavior. Or, we could introduce this option for future safety. Do you think we should introduce this option?

Yes. You are skipping a correctness test and that should be by explicit command only. It's no problem to include that as well, since you are already having to specify pg_...decompress..., but the recovery process doesn't know whether or not you've done that.

Anyway, could you try to run pg_standby with pg_compresslog and pg_decompresslog? After freeze, yes.

An additional comment on page header removal: I found that it is not simple to keep the page header in the compressed archive log. Because we eliminate unmarked full page writes and shift the rest of the WAL file data, it is not simple to keep the page header as the page header in the compressed archive log. It is much simpler to remove the page header as well and rebuild it. I'd like to keep the current implementation on this point.

OK. This is a good feature. Thanks for your patience with my comments.
-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
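To make the marking scheme above concrete, here is a rough sketch of the per-record decision pg_compresslog would make. The flag name XLR_BKP_REMOVABLE, the stub record type, and the emit_*() helpers are illustrative assumptions for this sketch, not the actual patch:

    /* Illustrative sketch only -- names are assumptions, not the real patch. */
    #include <stdbool.h>

    #define XLR_BKP_BLOCK_MASK  0x0F    /* which backup blocks are attached */
    #define XLR_BKP_REMOVABLE   0x10    /* hypothetical mark: full-page image
                                         * needed only for crash recovery */

    typedef struct XLogRecordStub
    {
        unsigned char xl_info;          /* flag bits, like the real xl_info */
    } XLogRecordStub;

    static void emit_record_unchanged(XLogRecordStub *rec)  { /* copy record as-is */ }
    static void emit_logical_part(XLogRecordStub *rec)      { /* record minus page image */ }
    static void emit_dummy_backup_block(XLogRecordStub *rec){ /* keeps LSNs aligned */ }

    static void
    compress_record(XLogRecordStub *rec)
    {
        bool has_image = (rec->xl_info & XLR_BKP_BLOCK_MASK) != 0;

        if (has_image && (rec->xl_info & XLR_BKP_REMOVABLE))
        {
            /*
             * First update after a checkpoint: the image is needed only for
             * crash recovery, so the archive copy drops it and substitutes a
             * dummy block so LSNs match the original WAL.
             */
            emit_logical_part(rec);
            emit_dummy_backup_block(rec);
        }
        else
            emit_record_unchanged(rec); /* needed for archive recovery */
    }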
[HACKERS] Group Commit
I've been working on the patch to enhance our group commit behavior. The patch is a dirty hack at the moment, but I'm settled on the algorithm I'm going to use and I know the issues involved. Here's the patch as it is if you want to try it out: http://community.enterprisedb.com/groupcommit-pghead-2.patch but it needs a rewrite before being accepted. It'll only work on systems that use sysv semaphores; I needed to add a function to acquire a semaphore with timeout and I only did it for sysv_sema.c for now. What are the chances of getting this in 8.3, assuming that I rewrite and submit a patch within the next week or two?

Algorithm
---------

Instead of starting a WAL flush immediately after a commit record is inserted, we wait a while to give other backends a chance to finish their transactions and have them flushed by the same fsync call. There are two things we can control: how many commits to wait for (commit group size), and for how long (timeout). We try to estimate the optimal commit group size. The estimate is:

commit group size = (# of commit records flushed + # of commit records arrived while fsyncing)

This is a relatively simple estimate that works reasonably well with very short transactions, and the timeout limits the damage when the estimate is not working. There are a lot more factors we could take into account in the estimate, for example: - # of backends and their states (affects how many are likely to commit soon) - amount of WAL written since last XLogFlush (affects the duration of fsync) - when exactly the commit records arrive (we don't want to wait 10 ms to get one more commit record in, when an fsync takes 11 ms) but I wanted to keep this simple for now. The timeout is currently hard-coded at 1 ms. I wanted to keep it short compared to the time it takes to fsync (somewhere in the 5-15 ms range depending on hardware), to limit the damage when the algorithm isn't getting the estimate right. We could also vary the timeout, but I'm not sure how to calculate the optimal value, and the real granularity will depend on the system anyhow.

Implementation
--------------

To count the # of commits since last XLogFlush, I added a new XLogCtlCommit struct in shared memory:

typedef struct XLogCtlCommit
{
    slock_t     commit_lock;    /* protects the struct */
    int         commitCount;    /* # of commit records inserted since XLogFlush */
    int         groupSize;      /* current commit group size */
    XLogRecPtr  lastCommitPtr;  /* location of the latest commit record */
    PGPROC     *waiter;         /* process to signal when groupSize is reached */
} XLogCtlCommit;

Whenever a commit record is inserted in XLogInsert, commitCount is incremented and lastCommitPtr is updated. When it reaches groupSize, the waiter process is woken up. In XLogFlush, after acquiring WALWriteLock, we wait until groupSize is reached (or the timeout expires) before doing the flush. Instead of the current logic to flush as much WAL as possible, we flush up to the last commit record. Flushing any more wouldn't save us an fsync later on, but might make the current fsync take longer. By doing that, we avoid the conditional acquire of the WALInsertLock that's in there currently. We make note of commitCount before starting the fsync; that's the # of commit records that arrived in time for the fsync to flush them. Let's call that value intime. After the fsync is finished, we update groupSize for the next round. The new groupSize is the current commitCount after the fsync, IOW the number of commit records that arrived since the previous XLogFlush, including during the time it took to do the fsync.
We update the commitCount by decrementing it by intime. Now we're ready for the next round, and we can release WALWriteLock.

WALWriteLock
------------

The above would work nicely, except that a normal lwlock doesn't play nicely. You can release and reacquire a lightweight lock in the same time slice even when there are other backends queuing for the lock, effectively cutting the queue. Here's what sometimes happens, with 2 clients:

Client 1                        Client 2
--------                        --------
do work                         do work
insert commit record            insert commit record
acquire WALWriteLock            try to acquire WALWriteLock, blocks
fsync
release WALWriteLock
begin new transaction
do work
insert commit record
reacquire WALWriteLock
wait for 2nd commit to arrive

Client 1 will eventually time out and commit just its own commit record. Client 2 should be released immediately after client 1 releases the WALWriteLock; it only needs to observe that its commit record has already been flushed and doesn't need to do anything. To fix the above, and other race conditions like that, we need a specialized WALWriteLock that orders the waiters by their commit record XLogRecPtrs. WALWriteLockRelease wakes up all waiters that have their commit record already flushed. They will just fall through without
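Pulling the pieces above together, here is a rough sketch of the flush path. PGSemaphoreTimedLock() stands in for the hypothetical acquire-with-timeout primitive mentioned for sysv_sema.c, and XLogWriteAndFsync() for the existing write+fsync path; backend headers are assumed and locking details are simplified:

    /* Sketch only: assumes backend headers; XLogCtlCommit as defined above. */
    static void
    GroupCommitXLogFlush(volatile XLogCtlCommit *cc, PGPROC *me)
    {
        int         intime;
        XLogRecPtr  flushUpto;

        LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);

        /* Wait until the estimated group size is reached, or we time out. */
        SpinLockAcquire(&cc->commit_lock);
        while (cc->commitCount < cc->groupSize)
        {
            cc->waiter = me;
            SpinLockRelease(&cc->commit_lock);
            if (!PGSemaphoreTimedLock(&me->sem, 1))   /* 1 ms; false = timeout */
            {
                SpinLockAcquire(&cc->commit_lock);
                break;
            }
            SpinLockAcquire(&cc->commit_lock);
        }
        intime = cc->commitCount;          /* commits this fsync will cover */
        flushUpto = cc->lastCommitPtr;     /* flush only up to the last commit */
        SpinLockRelease(&cc->commit_lock);

        XLogWriteAndFsync(flushUpto);      /* stand-in for the real write path */

        SpinLockAcquire(&cc->commit_lock);
        cc->groupSize = cc->commitCount;   /* commits seen since previous flush,
                                            * including those during the fsync */
        cc->commitCount -= intime;         /* the ones we just flushed */
        SpinLockRelease(&cc->commit_lock);

        LWLockRelease(WALWriteLock);
    }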
Re: [HACKERS] Group Commit
I wrote: What are the chances of getting this in 8.3, assuming that I rewrite and submit a patch within the next week or two? I also intend to do performance testing with different workloads to ensure the patch doesn't introduce a performance regression under some conditions. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Modifying TOAST thresholds
On Wed, 2007-03-28 at 14:08 -0400, Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: I also think that we ought to add TOAST_MAX_CHUNK_SIZE to the set of compiled-in parameters that are recorded in pg_control and checked for compatibility at startup (like BLCKSZ) --- this will prevent anyone from shooting themselves in the foot while experimenting. Is there any reason to experiment with this? I would have thought we would divorce TOAST_MAX_CHUNK_SIZE from TOAST_THRESHOLD and hard code it as the same expression that's there now. Ie, the largest size that can fit in a page. No, right now it's the largest size such that you can fit 4 on a page. It's not obvious to me that 4 is optimal once it's divorced from TOAST_THRESHOLD. It seems possible that the correct number is 1, and even if it's useful to keep the tuples smaller than that, there's no reason to assume 4 is the best number per page. Well it certainly seems worth separating them. It does seem possible that recursive toasting affected some of the earlier results we looked at. Would you like me to do this, or will you? I'll look again at the possibility of setting TOAST_THRESHOLD and re-cast the test patch I have for production use. But either way it's going to be a couple of days after freeze now. I'd like to get some mechanism for reducing WAL volume into 8.3, whether it's configurable toast or WAL reduction for UPDATEs. If for no other reason than making backup and availability solutions more manageable. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Patch queue concern
Tom Lane [EMAIL PROTECTED] writes: Simon Riggs [EMAIL PROTECTED] writes: My feeling is we should have more regular sync points where the patch queue is emptied and everything committed or rejected. No doubt, but the real problem here is that reviewing/committing other people's patches is not fun, it's just work :-(. So it's no surprise that it tends to get put off. Not sure what to do about that. Obviously a big part of that is that we just don't have enough committers. I'm hopeful that in time that situation will improve, but in the meantime we do have a problem and the burden falls unfairly on a few. Is there anything others can do to help? If non-committers like Simon or I reviewed patches, would it be easier for you to give a quick agreement to the comments or a "that's not an issue" comment? It seems like we do have a few committers who should be able to review code quality but are uncertain about making major design decisions. If, for example, Bruce or Jan reviewed patches more invasive than they usually do for code quality and checked with you on design questions, would that be helpful? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] CREATE INDEX and HOT - revised design
Pavan Deolasee wrote: On 3/29/07, Florian G. Pflug [EMAIL PROTECTED] wrote: Yes, but the non-index plan PREPARE generated will be used until the end of the session, not only until the end of the transaction. Frankly I don't know how this works, but are you sure that the plan will be used until the end of the session? Even if that's the case, it can happen even today if we create a new index, but the existing sessions will use the stale plan (assuming what you said is true)

I've checked that:

test=# prepare myplan as select * from test where id=1 ;
PREPARE
test=# explain execute myplan ;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on test  (cost=0.00..22897.70 rows=5421 width=36)
   Filter: (id = 1)
(2 rows)

Now I create an index in another session.

test=# explain select * from test where id=1 ;
                             QUERY PLAN
---------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=95.11..8248.45 rows=5000 width=36)
   Recheck Cond: (id = 1)
   ->  Bitmap Index Scan on idx  (cost=0.00..93.86 rows=5000 width=0)
         Index Cond: (id = 1)
(4 rows)

test=# explain execute myplan ;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on test  (cost=0.00..22897.70 rows=5421 width=36)
   Filter: (id = 1)
(2 rows)

!!! Index got used by the select .. but not by execute myplan ... !!!

test=# prepare myplan2 as select * from test where id=1 ;
PREPARE
test=# explain execute myplan2 ;
                           QUERY PLAN
-----------------------------------------------------------------
 Index Scan using idx on test  (cost=0.00..8.38 rows=1 width=37)
   Index Cond: (id = 1)
(2 rows)

!!! A newly prepared plan of course uses the index !!!

So yes, plans get cached until the end of the session, and yes, 8.2 won't notice index creation either ;-) The open question is how CVS HEAD with plan invalidation behaves. If it replans after the index-creating transaction commits, then basing index validity on a snapshot will break this, because upon replan the index might not be usable, but later on it may very well be (but the plan invalidation machinery won't realize that). So this might not introduce a regression compared to 8.2, but to a future 8.3 with plan invalidation... Sorry for being so unclear in my previous emails - I had confused myself ;-) greetings, Florian Pflug
Re: [HACKERS] ECPG threads test
On Wed, Mar 28, 2007 at 04:56:28PM +0200, Magnus Hagander wrote: From what I can see, the ecpg thread tests (src/interfaces/ecpg/threads) don't ever run. They rely on ENABLE_THREAD_SAFETY to be set, but even when I build with --enable-thread-safety, it's not set. This is because ecpg does not pull in pg_config.h, and also does not specify it on the command line. It used to include pg_config.h, but we removed it since it was too much and instead created a file named ecpg_config.h. However, we missed ENABLE_THREAD_SAFETY there. I just committed a fix. Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED] Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
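The fix described amounts to carrying the configure result into ecpg's own generated header, roughly like this (a schematic sketch, not the exact committed text):

    /* ecpg_config.h -- generated at configure time */
    #ifndef ENABLE_THREAD_SAFETY
    /* defined when PostgreSQL is built with --enable-thread-safety */
    #define ENABLE_THREAD_SAFETY 1
    #endif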
Re: [HACKERS] ECPG regression tests expected files
On Wed, Mar 28, 2007 at 07:30:21PM +0200, Magnus Hagander wrote: If you want to pick one early, please look at the one about the thread regression tests not appearing to run at all. I'd like to have that confirmed before I try to dig into how to fix it - in case it's not actually broken, and it's just me who's doing something wrong... I just committed some stuff including your patch and a fix to the regression test problem. At least it works on my system. Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED] Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
Re: [HACKERS] Patch queue concern
Gregory Stark wrote: Obviously a big part of that is that we just don't have enough committers. I'm hopeful that in time that situation will improve but in the meantime we do have a problem and the burden falls unfairly on a few. Is there anything others can do to help? If non-committers like Simon or I reviewed patches would it be easier for you to give a quick agreement to the comments or a "that's not an issue" comment? It seems like we do have a few committers who should be able to review code quality but are uncertain about making major design decisions. If, for example, Bruce or Jan reviewed patches more invasive than they usually do for code quality and checked with you on design questions would that be helpful? I try to review things that I feel are well within my area of competence (e.g. plperl, SQL-level commands) but I feel more hesitant about things very deep inside the backend - there's more danger I'll miss something subtle there. Outside events have conspired to make both reviewing and coding harder for me to get done this cycle. As for major design decisions, these should not be in the hands of a reviewer anyway - they should be explicitly discussed on list. There is plenty of scope for people to review patches if they aren't committers. In fact, it is highly encouraged. Please review anything on the patch list you feel able to. cheers andrew
Re: [HACKERS] Patch queue concern
Andrew Dunstan [EMAIL PROTECTED] writes: There is plenty of scope for people to review patches if they aren't committers. In fact, it is highly encouraged. Please review anything on the patch list you feel able to. Sure. Even if you miss things, every problem you do spot is one less... and there's no guarantee that the eventual committer would have seen it. regards, tom lane
Re: [HACKERS] ECPG regression tests expected files
On Thu, Mar 29, 2007 at 02:04:48PM +0200, Michael Meskes wrote: On Wed, Mar 28, 2007 at 07:30:21PM +0200, Magnus Hagander wrote: If you want to pick one early, please look at the one about the thread regression tests not appearing to run at all. I'd like to have that confirmed before I try to dig into how to fix it - in case it's not actually broken, and it's just me who's doing something wrong... I just committed some stuff including your patch and a fix to the regression test problem. At least it works on my system. Thanks. Passes the regression tests on win32 with the native threading. I've updated and committed updates to the thread regression tests to have them use the native threading as well. This removes the final requirement on pthreads on win32, so that's one more non-standard build dependency no longer needed. Yay! //Magnus
Re: [HACKERS] CREATE INDEX and HOT - revised design
Florian G. Pflug [EMAIL PROTECTED] writes: Pavan Deolasee wrote: Frankly I don't know how this works, but are you sure that the plan will be used until the end of the session? Even if that's the case, it can happen even today if we create a new index, but the existing sessions will use the stale plan (assuming what you said is true) I've checked that: Evidently you weren't testing on HEAD. The open question is how CVS HEAD with plan invalidation behaves. If it replans after the index-creating transaction commits, then basing index validity on a snapshot will break this, because upon replan the index might not be usable, but later on it may very well be (but the plan invalidation machinery won't realize that) It will replan at the first use of the plan after seeing the relcache inval sent by commit of the index-creating transaction. If you have two separate transactions to create an index and then mark it valid later, everything's fine because there are two inval events. However, if you design something where an index becomes usable due to the passage of time rather than an explicit mark-valid step, there's gonna be a problem. I'd suggest trying to stick to the way CREATE INDEX CONCURRENTLY does it... regards, tom lane
Re: [HACKERS] Modifying TOAST thresholds
Simon Riggs [EMAIL PROTECTED] writes: Well it certainly seems worth separating them. It does seem possible that recursive toasting affected some of the earlier results we looked at. Would you like me to do this, or will you? I'm willing to do the code changes to separate TOAST_THRESHOLD from the toast chunk size, but I do not have the time or facilities to do any performance testing for different parameter choices. Anyone want to work on that? I'd like to get some mechanism for reducing WAL volume into 8.3, whether it's configurable toast or WAL reduction for UPDATEs. If for no other reason than making backup and availability solutions more manageable. I think the WAL-reduction proposal needs more time and thought than is feasible before 8.3. OTOH, tuning the TOAST parameters seems like something we understand well enough already, we just need to put some cycles into testing different alternatives. I would have no objection to someone working on that during April and delivering a final patch sometime before beta. regards, tom lane
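To see what separating the two values means, here is a schematic of how the macros could be split. The names, the knob values, and TOAST_CHUNK_OVERHEAD are illustrative assumptions (alignment rounding is omitted); this is not the real macro text or a settled design:

    /* Schematic only -- assumes backend headers; not the real macros. */
    #define TOAST_TUPLES_PER_PAGE   4       /* knob: when to start toasting */
    #define TOAST_CHUNKS_PER_PAGE   4       /* now-independent knob: chunk size */

    /* Try toasting when a tuple exceeds 1/TOAST_TUPLES_PER_PAGE of a page */
    #define TOAST_TUPLE_THRESHOLD \
        ((BLCKSZ - SizeOfPageHeaderData) / TOAST_TUPLES_PER_PAGE)

    /* Size chunks so TOAST_CHUNKS_PER_PAGE chunk tuples fit on a toast page */
    #define TOAST_MAX_CHUNK_SIZE \
        ((BLCKSZ - SizeOfPageHeaderData) / TOAST_CHUNKS_PER_PAGE \
         - TOAST_CHUNK_OVERHEAD)            /* per-chunk tuple overhead, elided */

With the knobs separated, the performance question becomes choosing TOAST_CHUNKS_PER_PAGE (1 vs. 4, say) independently of when toasting kicks in, which is exactly the testing being asked for.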
Re: [HACKERS] Warning on contrib/tsearch2
code: random backend code should not, not, not be using fopen() directly. If you lose control to an elog, which is certainly possible seeing that this loop calls into the utils/mb subsystem, you'll leak the file descriptor. Use AllocateFile/FreeFile instead of fopen/fclose. Will fix soon in the tsearch_core patch. I find the direct use of malloc/realloc/strdup to be poor style as well --- backend code that is not using palloc needs to have *very* good reason to do so, and I see none here. Already fixed in the tsearch_core patch. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
[HACKERS] tsearch_core patch for inclusion
http://www.sigaev.ru/misc/tsearch_core-0.43.gz Changes: 1) The Ispell dictionary now supports the hunspell dictionary format which is used by OpenOffice >= 2.0.2 http://wiki.services.openoffice.org/wiki/Dictionaries The format changes are aimed, basically, at better support for compound words (German, Norwegian). So, please, test it - we don't know those languages at all. 2) added recent fixes of contrib/tsearch2 3) fixed usage of fopen/fclose -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [HACKERS] Group Commit
Heikki Linnakangas [EMAIL PROTECTED] writes: I've been working on the patch to enhance our group commit behavior. The patch is a dirty hack at the moment, but I'm settled on the algorithm I'm going to use and I know the issues involved. ... The timeout is currently hard-coded at 1 ms. This is where my bogometer triggered. There's way too many platforms where 1 msec timeout is a sheer fantasy. If you cannot make it perform well with a 10-msec timeout then I don't think it's going to be at all portable. Now I know that newer Linux kernels tend to ship with 1KHz scheduler tick rate, so there's a useful set of platforms where you could make it work even so, but I'm not really satisfied with saying this facility is only usable if you have a fast kernel tick rate ... regards, tom lane
Re: [HACKERS] CREATE INDEX and HOT - revised design
On 3/29/07, Tom Lane [EMAIL PROTECTED] wrote: It will replan at the first use of the plan after seeing the relcache inval sent by commit of the index-creating transaction. If you have two separate transactions to create an index and then mark it valid later, everything's fine because there are two inval events. However, if you design something where an index becomes usable due to the passage of time rather than an explicit mark-valid step, there's gonna be a problem. I'd suggest trying to stick to the way CREATE INDEX CONCURRENTLY does it... I had earlier proposed to do things the CIC way. But there were objections to the additional wait introduced in CREATE INDEX, and I don't think they were unreasonable. Maybe we can avoid the waits if there are no HOT-chains in the table, but we still need agreement on that. OTOH ISTM that the pg_index:xcreate solution may work fine if we can keep the index unusable for those transactions which started before CREATE INDEX could commit. I coded a quick prototype where I use ActiveSnapshot in get_relation_info() to test if the CREATE INDEX transaction is seen as in-progress by the transaction. If seen as in-progress, the index is not used (even though the CREATE INDEX is committed and hence can be seen by SnapshotNow). If an index which was earlier seen as unusable is marked as valid as time progresses, could there be some trouble? I mean, as long as we don't make the index usable before all tuples which are not indexed are DEAD, we should be fine. Is there something I am missing? Would it help to explain the idea if I post the patch? CREATE INDEX and CREATE INDEX CONCURRENTLY turned out to be much more difficult than I imagined earlier. While we are discussing CREATE INDEX, I will post a design for CIC. I restrained myself till now to avoid confusion, but with time running out, it would be helpful to get agreement so that we can finish the patch on time. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
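A minimal sketch of the prototype check described above, as it might sit in the planner's index-gathering loop. Here xcreate is the proposed pg_index column and XidSeenInProgressBySnapshot() is a stand-in for whatever visibility test gets used; both are assumptions, not existing backend API:

    /* Sketch only: a fragment inside get_relation_info()'s index loop. */
    ListCell   *l;

    foreach(l, RelationGetIndexList(relation))
    {
        Oid         indexoid = lfirst_oid(l);
        Relation    indexRel = index_open(indexoid, AccessShareLock);

        /* xcreate: proposed column recording the creating transaction */
        TransactionId xcreate = indexRel->rd_index->xcreate;

        if (TransactionIdIsValid(xcreate) &&
            XidSeenInProgressBySnapshot(xcreate, ActiveSnapshot))
        {
            /*
             * CREATE INDEX had not committed when our snapshot was taken,
             * so the index may lack entries for tuple versions we can see:
             * skip it, even though SnapshotNow already shows it committed.
             */
            index_close(indexRel, AccessShareLock);
            continue;
        }

        /* ... otherwise build the IndexOptInfo for it as usual ... */
        index_close(indexRel, AccessShareLock);
    }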
Re: [HACKERS] CREATE INDEX and HOT - revised design
Tom Lane [EMAIL PROTECTED] writes: However, if you design something where an index becomes usable due to the passage of time rather than an explicit mark-valid step, there's gonna be a problem. I'd suggest trying to stick to the way CREATE INDEX CONCURRENTLY does it... I'm a bit skeptical about the idea of CREATE INDEX (ie, non-concurrent) creating an index that won't be used for a while. We get enough people asking why Postgres isn't using an index as it is... Besides, it seems if people are happy to have indexes take a long time to build they could just do a concurrent build. The reason they do non-concurrent builds is precisely because they're willing to take an exclusive lock in order to have them complete as soon as possible. Earlier we were talking about not inserting any HOT tuples until the index became valid. The goal of having an xid on the index was so we would know when we could start doing HOT updates again. That seems like a much lesser cost than not being able to use the index until all live transactions exit. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Group Commit
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: I've been working on the patch to enhance our group commit behavior. The patch is a dirty hack at the moment, but I'm settled on the algorithm I'm going to use and I know the issues involved. ... The timeout is currently hard-coded at 1 ms. This is where my bogometer triggered. There's way too many platforms where 1 msec timeout is a sheer fantasy. If you cannot make it perform well with a 10-msec timeout then I don't think it's going to be at all portable. Now I know that newer Linux kernels tend to ship with 1KHz scheduler tick rate, so there's a useful set of platforms where you could make it work even so, but I'm not really satisfied with saying this facility is only usable if you have a fast kernel tick rate ... The 1 ms timeout isn't essential for the algorithm. In fact, I chose it arbitrarily; in the quick tests I did the length of the timeout didn't seem to matter much. I'm running with a CONFIG_HZ=250 kernel myself, which means that the timeout is really 4 ms on my laptop. I suspect the tick rate largely explains why the current commit_delay isn't very good: even though you specify it in microseconds, it really waits a lot longer. With the proposed algorithm, the fsync is started immediately when enough commit records have been inserted, so the timeout only comes into play when the estimate for the group size is too high. With a higher-precision timer, we could vary not only the commit group size but also the timeout. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Group Commit
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: This is where my bogometer triggered. There's way too many platforms where 1 msec timeout is a sheer fantasy. If you cannot make it perform well with a 10-msec timeout then I don't think it's going to be at all portable. The 1 ms timeout isn't essential for the algorithm. OK, but when you get to performance testing, please see how well it works at CONFIG_HZ=100. regards, tom lane
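The CONFIG_HZ concern is easy to check empirically. This small standalone probe (not part of the patch) asks the kernel for a 1 ms sleep and reports what it actually delivers; on a 100 Hz tick expect something like 10-20 ms, on 1 kHz more like 1-2 ms:

    /* ticktest.c: measure the real granularity of a requested 1 ms sleep */
    #include <stdio.h>
    #include <sys/select.h>
    #include <sys/time.h>

    int
    main(void)
    {
        struct timeval before, after, delay;
        long        elapsed_us;

        gettimeofday(&before, NULL);
        delay.tv_sec = 0;
        delay.tv_usec = 1000;           /* request a 1 ms sleep */
        select(0, NULL, NULL, NULL, &delay);
        gettimeofday(&after, NULL);

        elapsed_us = (after.tv_sec - before.tv_sec) * 1000000L +
                     (after.tv_usec - before.tv_usec);
        printf("requested 1000 us, actually slept %ld us\n", elapsed_us);
        return 0;
    }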
[HACKERS] CREATE INDEX CONCURRENTLY and HOT
Sorry to start another thread while we are still discussing the CREATE INDEX design, but I need help/suggestions to finish the patch on time for 8.3. We earlier thought that CREATE INDEX CONCURRENTLY (CIC) would be simpler to do because of the existing waits in CIC. But one major problem with CIC is that UPDATEs are allowed while we are building the index, and these UPDATEs can create HOT-chains which have different values for the attributes on which we are building the new index. To keep the HOT-chain semantics consistent across old and new indexes, we might be forced to delete the old index entry and reinsert a new one during the validate_index() phase. This is of course not easy. May I propose the following design, which is less intrusive. We do CIC in three phases:

In the first phase we just create the catalog entry for the new index, mark the index read-only and commit the transaction. By read-only, I mean that the index is not ready for inserts, but is consulted during UPDATEs to decide whether to do a HOT UPDATE or not (just like other existing indexes). We then wait for all transactions conflicting on ShareLock to complete. That would guarantee that all the existing transactions which cannot see the new index catalog entry are finished. A new transaction is started.

We then build the index just the way we do today. While we are building the index, no new HOT-chains are created where the index keys do not match, because the new index is consulted while deciding whether to do a HOT UPDATE or not. At the end of this step, the index is marked ready for inserts; we once again wait for all transactions conflicting on ShareLock to finish and commit the transaction.

In the third phase, we validate the index, inserting any missing index entries for tuples which are not HEAP_ONLY. For HEAP_ONLY tuples we already have the index entry, though it points to the root tuple. That's OK because we guarantee that all tuples in the chain share the same key with respect to old as well as new indexes. We then mark the index valid and commit.

In summary, this design introduces one more transaction and wait. But that should not be a problem because we would anyway wait for those transactions to finish today, though a bit later in the process. Comments/suggestions? Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
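Schematically, the three phases above amount to an index state machine like the following. The state names are illustrative, in the spirit of the existing indisvalid flag, not committed catalog columns:

    /* Illustrative state machine for the proposed three-phase CIC. */
    typedef enum IndexBuildPhase
    {
        INDEX_READONLY,     /* phase 1: in the catalog; consulted by the
                             * HOT-update test, but gets no inserts and is
                             * never used for scans */
        INDEX_READY,        /* phase 2: built; now receives inserts, but is
                             * still not used for scans */
        INDEX_VALID         /* phase 3: validated; usable for scans */
    } IndexBuildPhase;

    static bool
    index_accepts_inserts(IndexBuildPhase p)
    {
        return p >= INDEX_READY;
    }

    static bool
    index_usable_for_scans(IndexBuildPhase p)
    {
        return p == INDEX_VALID;
    }

    /*
     * The HOT-safety test consults the index in every phase: an UPDATE that
     * changes a column of even an INDEX_READONLY index must not be done as
     * a HOT update.
     */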
Re: [HACKERS] Patch queue concern
Gregory Stark wrote: Bruce Momjian [EMAIL PROTECTED] writes: It favours people who are short-sighted and don't see what possible improvements their code has. No code in an ongoing project like this is ever completed anyways. It favors those who do not wait until the last minute, but complete them well before the freeze date. What is this complete you keep talking about? Should I stop working on the sort/limit patch even though Heikki pointed out a few things to clean up and the cost model isn't updated yet, just so that you'll consider it complete and put it on the patch queue? If I don't stop working on it you think we should just ignore it even if it's in a usable state now? Even the cost model changes could be done pretty easily with some guidance from a review. Complete means the author _thinks_ he is done, and has responded to all community comments on the patch. It's also an artifact of the working model we have where patches are sent in big chunks and reviewed much later during a feature freeze. If we were committing directly into a CVS repository we would have wanted to commit these changes as soon as they were ready for committing, not wait until they're completed. Then continue working and commit further changes. It's only This would have CVS containing incomplete features --- and before beta, we would either have to beg the authors to complete them, or rip them out, neither of which we want to do. You don't want to commit something if it's in an unusable state and would have to be ripped out without more work. I said as soon as they're ready for committing, as opposed to completed. You're asking people if they've stopped working on patches and you're surprised to find that there are a lot of patches people are still working on. That's silly; of course people are still working on them, many of these tasks are open ended and can be improved as long as we have time. Just because they're still working on them doesn't necessarily mean what they have so far isn't worth committing as-is yet. We don't want open-ended a few days before feature freeze. We want them to be done, at some complete stopping point, and submitted. OK, but we don't want something that is ready to be committed, we need it complete. So how many more releases before you think Postgres is complete? I am getting tired of your semantic games here, Greg. I have no idea what you are trying to accomplish, but I have better things to do. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Fixing insecure security definer functions
As was pointed out awhile ago http://archives.postgresql.org/pgsql-general/2007-02/msg00673.php it's insecure to run a SECURITY DEFINER function with a search_path setting that's under the control of someone who wishes to subvert the function. Even for non-security-definer functions, it seems useful to be able to select the search path for the function to use; we've had requests for that before. Right now, this is possible but tedious and slow, because you really have to use a subtransaction to ensure that the path is reset correctly on exit:

BEGIN
    SET LOCAL search_path = ...;
    ... useful work here ...
EXCEPTION
END

(In fact it's worse than that, since you can't write an EXCEPTION without at least one WHEN clause, which is maybe something to change?) Also, this approach isn't available in plain SQL functions. I would like to fix this for 8.3. I don't have a patch yet but want to get buy-in on a design before feature freeze. I propose the following, fully-backward-compatible design:

1. Add a text column propath to pg_proc. It can be either NULL or a search path specification (interpreted the same as values for the search_path GUC variable). NULL means use the caller's setting, ie, current behavior.

2. When fmgr.c sees either prosecdef or propath set for a function to be called, it will insert the fmgr_security_definer hook into the call. fmgr_security_definer will be responsible for establishing the correct current-user and/or path settings and restoring them on exit. (We could use two independent hooks, but since these features will often be used together, implementing both with just one hook seems reasonable.)

3. Add optional clauses to CREATE FUNCTION and ALTER FUNCTION to specify the propath value. I suggest, but am not wedded to,

PATH 'foo, bar'
PATH NONE

Since PATH NONE is the default, it's not really needed in CREATE FUNCTION, but it seems useful to allow it for ALTER FUNCTION. Comments? regards, tom lane
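A rough sketch of the propath half of point 2, assuming backend headers. Here get_propath() is a hypothetical lookup of the proposed pg_proc column, and the real hook would also have to swap in the target function's flinfo and handle the prosecdef half; this is not the actual implementation:

    static Datum
    fmgr_security_definer(PG_FUNCTION_ARGS)
    {
        const char *propath = get_propath(fcinfo->flinfo);       /* hypothetical */
        char       *save_path = pstrdup(namespace_search_path);  /* caller's path */
        Datum       result;

        if (propath != NULL)
            SetConfigOption("search_path", propath,
                            PGC_USERSET, PGC_S_SESSION);

        PG_TRY();
        {
            /* simplified: really invokes the target function's own flinfo */
            result = FunctionCallInvoke(fcinfo);
        }
        PG_CATCH();
        {
            /*
             * Restore the caller's path even on error, without needing a
             * full subtransaction as the plpgsql workaround above does.
             */
            SetConfigOption("search_path", save_path,
                            PGC_USERSET, PGC_S_SESSION);
            PG_RE_THROW();
        }
        PG_END_TRY();

        SetConfigOption("search_path", save_path, PGC_USERSET, PGC_S_SESSION);
        return result;
    }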
Re: [HACKERS] Patch queue concern
We don't want open-ended a few days before feature freeze. We want them to be done, at some complete stopping point, and submitted. OK, but we don't want something that is ready to be committed, we need it complete. So how many more releases before you think Postgres is complete? I am getting tired of your semantic games here, Greg. I have no idea what you are trying to accomplish, but I have better things to do. I have to concur here. Everyone is doing the best that they can. Greg, how about reviewing some patches? Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [HACKERS] CREATE INDEX and HOT - revised design
On 3/29/07, Gregory Stark [EMAIL PROTECTED] wrote: Besides, it seems if people are happy to have indexes take a long time to build they could just do a concurrent build. I think we discussed this earlier. One of the downsides of CIC is that it needs two complete heap scans. Apart from that, CIC itself needs to wait for all existing transactions to finish, and more than one instance of CIC cannot be run on a table. Earlier we were talking about not inserting any HOT tuples until the index became valid. The goal of having an xid on the index was so we would know when we could start doing HOT updates again. That seems like a much lesser cost than not being able to use the index until all live transactions exit. What I am proposing is to keep the index unusable for existing transactions. The index is available for all new transactions even if there are unfinished existing transactions. Is that a big problem? Well, I still need buy-in and review from Tom and others on the design, but it seems workable to me. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Patch queue concern
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: My feeling is we should have more regular sync points where the patch queue is emptied and everything committed or rejected. No doubt, but the real problem here is that reviewing/committing other people's patches is not fun, it's just work :-(. So it's no surprise that it tends to get put off. Not sure what to do about that. Of course, writing patches isn't totally _fun_ either. The big problem is shown in this chart:

                  Patch Complexity
Developer     |   Simple     Complex
--------------+----------------------
Experienced   |   Easy       Medium
Novice        |   Medium     Hard

The basic problem is we have a lot of complex patches coming in, and many from people who do not have years of experience with submitting patches to PostgreSQL. A complex patch from a novice user takes a lot of time to review, and frankly, we don't have enough experienced developers doing such reviews. If the patch deals with an area of the code where I am not experienced, often even I am incapable of reviewing the patch. The bottom line is that we are getting more novice developers faster than we grow experienced developers. This is no big surprise, and I don't see a simple solution. Odds are this is going to continue. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Patch queue concern
Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Simon Riggs [EMAIL PROTECTED] writes: My feeling is we should have more regular sync points where the patch queue is emptied and everything committed or rejected. No doubt, but the real problem here is that reviewing/committing other people's patches is not fun, it's just work :-(. So it's no surprise that it tends to get put off. Not sure what to do about that. Obviously a big part of that is that we just don't have enough committers. I'm hopeful that in time that situation will improve but in the meantime we do have a problem and the burden falls unfairly on a few. Is there anything others can do to help? If non-committers like Simon or I reviewed patches would it be easier for you to give a quick agreement to the comments or a "that's not an issue" comment? Just to clarify, the committing is the easy part. I can do that all day and not break a sweat. It is making sure the patch is correct in all aspects --- functionality, clarity, modularity, reliability, design, etc. --- that takes lots of time, and really can be done by anyone in the community. We already have people commenting on other people's patches, and new versions appearing, and every new version makes the final job of review/commit easier, because someone was going to have to make those changes before the patch was applied. It seems like we do have a few committers who should be able to review code quality but are uncertain about making major design decisions. If, for example, Bruce or Jan reviewed patches more invasive than they usually do for code quality and checked with you on design questions would that be helpful? I wish that would work, but the big trick is getting the entire problem into your head, matching user behavior with our existing code, and making those link up. There is really no _staged_ nature to final patch review. People can still comment on the patch, and improve it, but the final decision has to be a holistic one that makes sure the entire patch is in harmony. (I am sounding new-age here. :-) ) You might remember that during, I think, 8.1 I started pushing patches because no one was objecting to the patches, and people complained because the patches were not complete. The patches had problems, but I was unable to fully understand some of them, and they had to be backed out. Since then, I haven't applied anything I didn't fully understand, so the patches now languish in the patch queue until an experienced PostgreSQL developer who does fully understand them can give me a green light on it. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Patch queue concern
Bruce Momjian wrote: OK, but we don't want something that is ready to be committed, we need it complete. So how many more releases before you think Postgres is complete? I am getting tired of your semantic games here, Greg. I have no idea what you are trying to accomplish, but I have better things to do. Why not just post a specific list of the patches you are thinking of? Is it the patch queue list in total? Did I miss it? Without specifics these things just spiral on forever, as all debates about code do when there is no code to actually look at. With specifics it is self-documenting and definitional. You are thinking / concerned about x patches. Folks can look at how to move them forward, and it would probably help guide community attention.
Re: [HACKERS] CREATE INDEX and HOT - revised design
Pavan Deolasee wrote: Earlier we were talking about not inserting any HOT tuples until the index became valid. The goal of having an xid on the index was so we would know when we could start doing HOT updates again. That seems like a much lesser cost than not being able to use the index until all live transactions exit. What I am proposing is to keep the index unusable for existing transactions. The index is available for all new transactions even if there are unfinished existing transactions. Is that a big problem? Well, I still need buy-in and review from Tom and others on the design, but it seems workable to me. Yes, that seems totally acceptable to me. As I remember, the index is usable by the transaction that created it, and new transactions. Hard to see how someone would have a problem with that. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Fixing insecure security definer functions
On 3/29/07, Tom Lane [EMAIL PROTECTED] wrote: As was pointed out awhile ago http://archives.postgresql.org/pgsql-general/2007-02/msg00673.php it's insecure to run a SECURITY DEFINER function with a search_path setting that's under the control of someone who wishes to subvert the function. Even for non-security-definer functions, it seems useful to be able to select the search path for the function to use; we've had requests for that before. Right now, this is possible but tedious and slow, because you really have to use a subtransaction to ensure that the path is reset correctly on exit: BEGIN SET LOCAL search_path = ...; ... useful work here ... EXCEPTION END (In fact it's worse than that, since you can't write an EXCEPTION without at least one WHEN clause, which is maybe something to change?) Also, this approach isn't available in plain SQL functions. I would like to fix this for 8.3. I don't have a patch yet but want to get buy-in on a design before feature freeze. I propose the following, fully-backward-compatible design: 1. Add a text column propath to pg_proc. It can be either NULL or a search path specification (interpreted the same as values for the search_path GUC variable). NULL means use the caller's setting, ie, current behavior. 2. When fmgr.c sees either prosecdef or propath set for a function to be called, it will insert the fmgr_security_definer hook into the call. fmgr_security_definer will be responsible for establishing the correct current-user and/or path settings and restoring them on exit. (We could use two independent hooks, but since these features will often be used together, implementing both with just one hook seems reasonable.) 3. Add optional clauses to CREATE FUNCTION and ALTER FUNCTION to specify the propath value. I suggest, but am not wedded to, PATH 'foo, bar' PATH NONE Since PATH NONE is the default, it's not really needed in CREATE FUNCTION, but it seems useful to allow it for ALTER FUNCTION. fwiw, I think this is a great solution... because the default behavior is preserved, you get through without any extra GUC settings (although you may want to add one anyways). Maybe security definer functions should raise a warning for implicit PATH NONE, and possibly even deprecate that behavior and force people to type it out in future (8.4+) releases. merlin
Re: [HACKERS] Fixing insecure security definer functions
* Merlin Moncure ([EMAIL PROTECTED]) wrote: fwiw, I think this is a great solution... because the default behavior is preserved, you get through without any extra GUC settings (although you may want to add one anyways). I agree that the proposed solution looks good. maybe security definer functions should raise a warning for implicit PATH NONE, and possibly even deprecate that behavior and force people to type it out in future (8.4+) releases. While I agree that raising a warning makes sense I don't believe it should be forced. There may be cases where, even in security definer functions, the current search_path should be used (though, of course, care must be taken in writing such functions). Thanks, Stephen
Re: [HACKERS] Patch queue concern
Bruce Momjian [EMAIL PROTECTED] writes: The basic problem is we have a lot of complex patches coming in, and many from people who do not have years of experience with submitting patches to PostgreSQL. A complex patch from a novice user takes a lot of time to review, and frankly, we don't have enough experienced developers doing such reviews. Part of the issue is that we have a lot of new developers who are trying to solve hard problems without having learned their way around the code by fixing easy stuff. It was easier some years ago for people to learn that way, because there was way more low-hanging fruit back then. But there's still some out there. I have a distinct sense that we are getting patches from people who are trying to run before they've learned to walk. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Fixing insecure security definer functions
On 3/29/07, Stephen Frost [EMAIL PROTECTED] wrote: * Merlin Moncure ([EMAIL PROTECTED]) wrote: fwiw, I think this is a great solution... because the default behavior is preserved, you get by without any extra GUC settings (although you may want to add one anyway). I agree that the proposed solution looks good. maybe security definer functions should raise a warning for implicit PATH NONE, and possibly even deprecate that behavior and force people to type it out in future (8.4+) releases. While I agree that raising a warning makes sense, I don't believe it should be forced. There may be cases where, even in security definer functions, the current search_path should be used (though, of course, care must be taken in writing such functions). I agree... I'm just suggesting requiring people to explicitly write 'PATH NONE' for security definer functions because of the security risk... just a thought, though. merlin ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixing insecure security definer functions
Stephen Frost [EMAIL PROTECTED] writes: * Merlin Moncure ([EMAIL PROTECTED]) wrote: maybe security definer functions should raise a warning for implicit PATH NONE, and possibly even deprecate that behavior and force people to type it out in future (8.4+) releases. While I agree that raising a warning makes sense I don't believe it should be forced. A WARNING seems reasonable to me too. I'd just do it on the combination of SECURITY DEFINER with PATH NONE, regardless of how you typed it exactly. ALTERing a function into that configuration should draw the same warning. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] tsearch_core patch for inclusion
Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. --- Teodor Sigaev wrote: http://www.sigaev.ru/misc/tsearch_core-0.43.gz Changes: 1. The Ispell dictionary now supports the hunspell dictionary format, which is used by OpenOffice >= 2.0.2 http://wiki.services.openoffice.org/wiki/Dictionaries The format changes are aimed, basically, at better support of compound words (German, Norwegian). So please test it - we don't know those languages at all. 2. Added recent fixes of contrib/tsearch2. 3. Fixed usage of fopen/fclose. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Concurrent connections in psql
+++ We'd love this feature as it would really help us write better test cases! Regards Sailesh -- Sailesh Krishnamurthy Amalgamated Insight [W] (650) 242-3503 [C] (650) 804-6585 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gregory Stark Sent: Tuesday, March 27, 2007 6:39 AM To: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Concurrent connections in psql Would people be interested in this feature? There was some positive reaction from users, but I'm not sure how excited developers are about complicating the logic in psql (which is already pretty tangled). This code bitrotted severely when Tom added the cursor support to psql. I don't mind redoing it if people want it, though. I already did a first pass at doing so, but it wasn't clear to me how best to integrate it with that cursor support change. I elected to treat each chunk of results from the cursor as a separate result set, which makes it possible to switch connections between chunks. That's nice but probably not really acceptable, judging by how much effort Tom went through in the cursor code to avoid having the chunks appear as separate result sets. Probably I'll have to do more work in that area. Are people interested in having this? The reason I think it's particularly interesting is writing regression tests -- especially to test HOT cases. Gregory Stark [EMAIL PROTECTED] writes: I mentioned this a while back; now that 8.2 is out, perhaps others will be more interested in new code. Currently Postgres regression tests only test functionality within a single session. There are no regression tests that test the transaction semantics or locking behaviour across multiple transactions. I modified psql to allow you to open multiple connections and switch between them with a sort of csh-job-control-style interface. It actually works out pretty well. It's fairly easy to write regression tests for basic 2-client or 3-client cases. The actual user interface may need some discussion though. I didn't want to play the name game, so I just prefixed all my commands with \c and figured we can always rename them later. And experience with actually writing the tests shows that the explicit \cwait command, which was needed to eliminate (in practice if not in theory) race conditions in regression tests, turns out to be more flexibility than necessary. Since you end up having to insert one in precisely predictable locations -- namely after every asynchronous command and after every connection switch -- perhaps it would be simpler to just have a \pset cwait command that automatically introduces timeouts in precisely those places. A brief explanation, including an example regression test (the SAVEPOINT locking bug discovered recently) and the patch, is here: http://community.enterprisedb.com/concurrent/index.html -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
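To picture the interface: a two-session lock test might read roughly as below. Only \cwait and \pset cwait are named in the thread; the other backslash commands here (\cnew, \cswitch) and the accounts table are hypothetical stand-ins for whatever the patch actually calls its connection-open and connection-switch commands:

    \cnew s1
    BEGIN;
    UPDATE accounts SET balance = balance - 10 WHERE id = 1;
    \cwait
    \cnew s2
    BEGIN;
    -- this one blocks behind s1's row lock
    UPDATE accounts SET balance = balance + 10 WHERE id = 1;
    \cswitch s1
    COMMIT;
    \cwait
    \cswitch s2
    COMMIT;

Note the \cwait after each asynchronous command and connection switch -- exactly the boilerplate the \pset cwait suggestion would fold away.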
Re: [HACKERS] Server-side support of all encodings
Hello everyone, I very much understand why SJIS is not a server encoding. It contains ASCII second bytes (including \ and ', both of which can be really nasty inside a normal SQL statement), and further, half-width katakana is represented as one-byte characters, incidentally two of which coincide with a kanji. My question is, however: what would be the best practice if it were imperative to use the SJIS encoding for texts and no built-in conversions were useful? To elaborate, I need to support Japanese emoji characters, which are special emoticons for mobile phones. These characters are usually in a region that is not specified by standard SJIS, therefore they are not properly converted either to EUC or UTF8 (which would be my preferred choice, but unfortunately not all mobile phones support it, so conversion is still necessary; from what I've seen, the new SJIS_2004 map seems to define these entities, but I'm not 100% sure they all get converted properly). I inherited a system in which this problem is bypassed by setting the SQL_ASCII server encoding, but that is not the best solution (full text search is rendered useless, and occasionally the special-character issue rears its ugly head: not only do we have to deal with normal SQL injection, but also with encoding-based injection). (And for the real WTF, my predecessor converted everything to EUC before inserting, eventually losing all the emojis and creating all sorts of strange phenomena, like tables with one column in EUC until a certain date and SJIS from then on, while EUC for all other columns.) Is there a way to properly deal with SJIS plus emoji extensions (a patch I'm not aware of, for example)? Is it considered as a todo for future releases, or should I consider augmenting postgres in some way (if the latter, could you provide any pointers on how to proceed)? Thank you, Zaki -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane Sent: Monday, March 26, 2007 11:20 AM To: ITAGAKI Takahiro Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Server-side support of all encodings ITAGAKI Takahiro [EMAIL PROTECTED] writes: PostgreSQL supports SJIS, BIG5, GBK, UHC and GB18030 as client encodings, but we cannot use them as server encodings. Is there any reason for it? Very much so --- they aren't safe ASCII-supersets, and thus for example the parser will fail on them. Backend encodings must have the property that all bytes of a multibyte character are >= 128. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
Simon, OK, different question: Why would anyone ever set full_page_compress = off? The only reason I can see is if compression costs us CPU but gains RAM I/O. I can think of a lot of applications ... benchmarks included ... which are CPU-bound but not RAM or I/O bound. For those applications, compression is a bad tradeoff. If, however, CPU used for compression is made up elsewhere through smaller file processing, then I'd agree that we don't need a switch. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CREATE INDEX and HOT - revised design
Pavan Deolasee [EMAIL PROTECTED] writes: What I am proposing is to keep index unusable for existing transactions. The index is available for all new transactions even if there are unfinished existing transactions. Ah thanks, that makes a lot more sense. Sorry for the false alarm. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Server-side support of all encodings
On Wed, Mar 28, 2007 at 10:44:00AM +0900, Dezso Zoltan wrote: My question is, however: what would be the best practice if it were imperative to use the SJIS encoding for texts and no built-in conversions were useful? To elaborate, I need to support Japanese emoji characters, which are special emoticons for mobile phones. These characters are usually in a region that is not specified by standard SJIS. Unicode also defines a region for user-defined characters. While it may be odd, perhaps it would be reasonable to map the user-defined space in SJIS to somewhere in the user-defined space in Unicode, so that you at least get round-trip consistency. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] problems with plan invalidation
Kris Jurka [EMAIL PROTECTED] writes: Running the JDBC driver's regression test suite for the first time in a while I got a lot of failures that I would have to guess are related to plan invalidation work. Attached is a self contained test case and the JDBC driver's log of what protocol messages it is sending. I've committed a fix for this case --- please give it another try to see if you find any other problems. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CREATE INDEX and HOT - revised design
On Thu, 2007-03-29 at 22:08 +0530, Pavan Deolasee wrote: On 3/29/07, Tom Lane [EMAIL PROTECTED] wrote: It will replan at the first use of the plan after seeing the relcache inval sent by commit of the index-creating transaction. If you have two separate transactions to create an index and then mark it valid later, everything's fine because there are two inval events. However, if you design something where an index becomes usable due to the passage of time rather than an explicit mark-valid step, there's gonna be a problem. I'd suggest trying to stick to the way CREATE INDEX CONCURRENTLY does it... I had earlier proposed to do things the CIC way. But there were objections to the additional wait introduced in CREATE INDEX, and I don't think they were unreasonable. Maybe we can avoid the wait if there are no HOT chains in the table, but we still need agreement on that. OTOH ISTM that the pg_index:xcreate solution may work fine if we can keep the index unusable for those transactions that started before CREATE INDEX committed. Pavan, ISTM you have misunderstood Tom slightly. Having the index invisible to all current transactions is acceptable. However, the other backends will not receive an invalidation event, which means even when they start new transactions they will still not see it, which is not acceptable. ISTM that the run-another-transaction-afterwards idea is the only one that does everything I think we need. I really do wish we could put in a wait, like CIC, but I just think it will break existing programs. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] CREATE INDEX and HOT - revised design
On Thu, 2007-03-29 at 13:55 -0400, Bruce Momjian wrote: Pavan Deolasee wrote: Earlier we were talking about not inserting any HOT tuples until the index became valid. The goal of having an xid on the index was so we would know when we could start doing HOT updates again. That seems like a much lesser cost than not being able to use the index until all live transactions exit. What I am proposing is to keep the index unusable for existing transactions. The index is available to all new transactions even if there are unfinished existing transactions. Is that a big problem? Well, I still need buy-in and review from Tom and others on the design, but it seems workable to me. Yes, that seems totally acceptable to me. As I remember, the index is usable by the transaction that created it and by new transactions. Hard to see how someone would have a problem with that. Agreed. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] Feature thought: idle in transaction timeout
Hello, I ran into an interesting problem with a customer today. They are running Jabber XCP (not the one we use). Unfortunately, the product has a bug that causes it to leave connections sitting idle inside a transaction. This is what it does: BEGIN; SELECT 1; Basically it is verifying that the connection is live. However, it never calls COMMIT. So what happens? We can't vacuum ;). Anyway, my thought is: we know when a transaction is idle, so why not have an idle timeout after which we explicitly close the connection, or roll back, or something? User configurable, of course. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
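Until such a timeout exists, the stuck sessions can at least be found from SQL. A sketch against the 8.2-era pg_stat_activity columns (seeing current_query requires superuser or being the same user); the 10-minute threshold is an arbitrary example:

    SELECT procpid, usename, datname,
           now() - query_start AS idle_since_last_query
      FROM pg_stat_activity
     WHERE current_query = '<IDLE> in transaction'
       AND now() - query_start > interval '10 minutes';

The offending procpids then have to be dealt with from the shell (e.g. kill -TERM, with the usual caveats about signalling individual backends), since there is no running query for pg_cancel_backend() to cancel.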
Re: [HACKERS] Modifying TOAST thresholds
On Thu, 2007-03-29 at 12:05 -0400, Tom Lane wrote: I think the WAL-reduction proposal needs more time and thought than is feasible before 8.3. Agreed. We really need to focus on the major features. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [JDBC] problems with plan invalidation
On Thu, 29 Mar 2007, Tom Lane wrote: Kris Jurka [EMAIL PROTECTED] writes: Running the JDBC driver's regression test suite for the first time in a while I got a lot of failures that I would have to guess are related to plan invalidation work. Attached is a self contained test case and the JDBC driver's log of what protocol messages it is sending. I've committed a fix for this case --- please give it another try to see if you find any other problems. That seems to fix all of them. Thanks. Kris Jurka ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] timing in PostgreSQL
Hi, I need your help, Postgres gurus... I have been modifying postgres for a while, and I need to test the performance (for now, only time) of my modifications against the original postgres. My first approach was to use the clock() function of the time.h C library in the postgres.c file, but reading the file, I noticed that you can set a group of options that print statistics (general, or specific to postgres modules) to the server log for each query that you send to the postgres server. So I did that. Now I don't have a clue how to interpret what postgres prints; I searched the postgres manual but found nothing about it. So, where can I find information about how to read these statistics? (This sounds like a pgadmin question, but read the second one.) And are those times exclusively postgres times, or do they include the time postgres spends processing the query plus the time postgres waits while other processes are running in the OS (you know, multitasking)? By the way, is this a good approach, or will I have to take the first approach that I mentioned? Thanks. Waiting for your answers, Carlos Chacon.
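For what it's worth, a sketch of two simple ways to time a query without modifying the source:

    -- 1. Wall-clock time as seen by the client (includes network time
    --    and any time spent waiting on other processes):
    \timing
    SELECT count(*) FROM pg_class;

    -- 2. Per-query resource statistics written to the server log:
    SET log_statement_stats = on;
    SELECT count(*) FROM pg_class;

The log_statement_stats output (derived from getrusage()) reports elapsed time and user/system CPU time separately, so time lost to other processes on the machine is distinguishable from CPU actually spent by postgres; log_parser_stats, log_planner_stats, and log_executor_stats break the same numbers down per module. That output is the statistics block the question refers to.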
Re: [HACKERS] CREATE INDEX and HOT - revised design
Simon Riggs [EMAIL PROTECTED] writes: ISTM that the run-another-transaction-afterwards idea is the only one that does everything I think we need. I really do wish we could put in a wait, like CIC, but I just think it will break existing programs. Actually, there's a showstopper objection to that: plain CREATE INDEX has to be able to run within a larger transaction. (To do otherwise breaks pg_dump --single-transaction, just for starters.) This means it can *not* commit partway through. Back to the drawing board :-( regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CREATE INDEX and HOT - revised design
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: ISTM that the run-another-transaction-afterwards idea is the only one that does everything I think we need. I really do wish we could put in a wait, like CIC, but I just think it will break existing programs. Actually, there's a showstopper objection to that: plain CREATE INDEX has to be able to run within a larger transaction. (To do otherwise breaks pg_dump --single-transaction, just for starters.) This means it can *not* commit partway through. I believe the original idea was to invent some kind of on-commit-run-this-transaction hook - similar to how files are deleted on commit, I think. At least, that is how I understood the run-another-transaction-afterwards idea... greetings, Florian Pflug ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] List of uncompleted patches for 8.2
Here is a web site showing all the uncompleted patches for 8.2 that I am worried about: http://momjian.postgresql.org/cgi-bin/pgpatches_hold I can update the list as I get feedback. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] List of uncompleted patches for 8.2
Bruce Momjian wrote: Here is a web site showing all the uncompleted patches for 8.2 that I am worried about: http://momjian.postgresql.org/cgi-bin/pgpatches_hold I can update the list as I get feedback. Sorry, I meant 8.3. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
On Thu, 2007-03-29 at 11:45 -0700, Josh Berkus wrote: OK, different question: Why would anyone ever set full_page_compress = off? The only reason I can see is if compression costs us CPU but gains RAM I/O. I can think of a lot of applications ... benchmarks included ... which are CPU-bound but not RAM or I/O bound. For those applications, compression is a bad tradeoff. If, however, CPU used for compression is made up elsewhere through smaller file processing, then I'd agree that we don't need a switch. Koichi-san has explained things for me now. I misunderstood what the parameter did and reading your post, ISTM you have as well. I do hope Koichi-san will alter the name to allow everybody to understand what it does. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] List of uncompleted patches for 8.3
[ Sorry, reposted with correct subject line.] Here is a web site showing all the uncompleted patches for 8.3 that I am worried about: http://momjian.postgresql.org/cgi-bin/pgpatches_hold -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] CREATE INDEX and HOT - revised design
On Thu, 2007-03-29 at 17:27 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: ISTM that the run-another-transaction-afterwards idea is the only one that does everything I think we need. I really do wish we could put in a wait, like CIC, but I just think it will break existing programs. Actually, there's a showstopper objection to that: plain CREATE INDEX has to be able to run within a larger transaction. (To do otherwise breaks pg_dump --single-transaction, just for starters.) This means it can *not* commit partway through. I agree with most of that, but that's why we are where we are, and I don't think it's a showstopper at all. The idea is to make note that the transaction has created an index within a transaction block, so that after the top-level transaction commits we sneak in an extra hidden transaction to update the pg_index tuple with the xcreate of the first transaction. We don't do this after the CREATE INDEX statement ends, only at the end of the transaction in which it ran. We only do this if we are creating an index on a table that is not a temporary table and was not created during the transaction (so --single-transaction isn't broken and doesn't require this additional action), i.e. MyTransactionCreatedVisibleIndex, with special processing in xact.c. The only other alternative is to forcibly throw a relcache inval event in the same circumstances without running the additional transaction, but the solution is mostly the same. I agree this is weird, but no more weird a solution than CIC was when it first came out. I don't like it, or think it's clever; I just think it's the only thing on the table. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Is this an psql Error???
Hi... I was trying to execute \timing in a psql console, but the command that I always get is \t. Is it something I'm missing, or is it an error? My version of postgres is 8.1.4. Bye.
Re: [HACKERS] CREATE INDEX and HOT - revised design
Simon Riggs wrote: On Thu, 2007-03-29 at 17:27 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: ISTM that the run-another-transaction-afterwards idea is the only one that does everything I think we need. I really do wish we could put in a wait, like CIC, but I just think it will break existing programs. Actually, there's a showstopper objection to that: plain CREATE INDEX has to be able to run within a larger transaction. (To do otherwise breaks pg_dump --single-transaction, just for starters.) This means it can *not* commit partway through. The idea is to make note that the transaction has created an index within a transaction block, so that after the top-level transaction commits we sneak in an extra hidden transaction to update the pg_index tuple with the xcreate of the first transaction. The only other alternative is to forcibly throw a relcache inval event in the same circumstances without running the additional transaction, but the solution is mostly the same. I think one alternative might be to store a list of xids together with a cached plan, and replan if the commit status (as perceived by the transaction the plan will be executed in) of one of those xids changes. greetings, Florian Pflug ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
Josh; I'd like to explain again what the term compression means in my proposal, and to show the resource consumption compared with cp and gzip. My proposal is to remove unnecessary full page writes (they are needed only in crash recovery from inconsistent or partial writes) when we copy WAL to the archive log, and to rebuild them as dummies when we restore from the archive log. The dummy is needed to maintain LSNs. So it is very, very different from general-purpose compression such as gzip, although pg_compresslog compresses the archive log as a result. As to CPU and I/O consumption, I've already evaluated it as follows: 1) collect all the WAL segments, 2) copy them by different means (cp, pg_compresslog, and gzip), and compare the elapsed time as well as other resource consumption. Benchmark: DBT-2; database size: 120WH (12.3GB); total WAL size: 4.2GB (after a 60min. run).

    Elapsed time:                cp: 120.6sec    gzip: 590.0sec    pg_compresslog: 79.4sec
    Resultant archive log size:  cp: 4.2GB       gzip: 2.2GB       pg_compresslog: 0.3GB
    Resource consumption:
        cp:              user: 0.5sec    system: 15.8sec   idle: 16.9sec    I/O wait: 87.7sec
        gzip:            user: 286.2sec  system: 8.6sec    idle: 260.5sec   I/O wait: 36.0sec
        pg_compresslog:  user: 7.9sec    system: 5.5sec    idle: 37.8sec    I/O wait: 28.4sec

Because the resultant log size is considerably smaller than with cp or gzip, pg_compresslog needs much less I/O, and because the logic is much simpler than gzip's, it consumes far less CPU. The term compress may not be appropriate; we may call this log optimization instead. So I don't see any reason why this (at least an optimization mark in each log record) can't be integrated. Simon Riggs wrote: On Thu, 2007-03-29 at 11:45 -0700, Josh Berkus wrote: OK, different question: Why would anyone ever set full_page_compress = off? The only reason I can see is if compression costs us CPU but gains RAM I/O. I can think of a lot of applications ... benchmarks included ... which are CPU-bound but not RAM or I/O bound. For those applications, compression is a bad tradeoff. If, however, CPU used for compression is made up elsewhere through smaller file processing, then I'd agree that we don't need a switch. As I wrote in reply to Simon's comment, I have only one concern. Without a switch, because both the full page writes and the corresponding logical log are included in WAL, this will increase WAL size slightly (maybe about five percent or so). If everybody is happy with this, we don't need a switch. Koichi-san has explained things for me now. I misunderstood what the parameter did and, reading your post, ISTM you have as well. I do hope Koichi-san will alter the name to allow everybody to understand what it does. Here are some candidates: full_page_writes_optimize; full_page_writes_mark (meaning it marks each full page write as needed in crash recovery, needed in archive recovery, and so on). I don't insist on these names. It would be very helpful if you have any suggestion that reflects what it really means. Regards; -- Koichi Suzuki ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
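For context, a sketch of how the pair would sit in the archive path; the exact argument syntax is assumed from the tool descriptions above, and the paths are placeholders:

    # postgresql.conf: strip removable full-page writes while archiving
    archive_command = 'pg_compresslog %p /mnt/archive/%f'

    # recovery.conf: reinsert dummy full-page records on the way back,
    # so LSNs still line up
    restore_command = 'pg_decompresslog /mnt/archive/%f %p'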
[HACKERS] Autovacuum vs statement_timeout
I seem to remember that we'd agreed that autovacuum should ignore any globally set statement_timeout, on the grounds that a poorly chosen setting could indefinitely prevent large tables from being vacuumed. But I do not see anything in autovacuum.c that resets the variable. Am I just being blind? (Quite possible, as I'm tired and under the weather.) The thing that brought this to mind was the idea that Mark Shuttleworth's open problem might be triggered in part by a statement timeout interrupting autovacuum at an inopportune point --- some logs he sent me offlist show that he is using statement_timeout ... regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Server-side support of all encodings
Hello everyone, I very much understand why SJIS is not a server encoding. It contains ASCII second bytes (including \ and ', both of which can be really nasty inside a normal SQL statement), and further, half-width katakana is represented as one-byte characters, incidentally two of which coincide with a kanji. My question is, however: what would be the best practice if it were imperative to use the SJIS encoding for texts and no built-in conversions were useful? To elaborate, I need to support Japanese emoji characters, which are special emoticons for mobile phones. These characters are usually in a region that is not specified by standard SJIS, therefore they are not properly converted either to EUC or UTF8 (which would be my preferred choice, but unfortunately not all mobile phones support it, so conversion is still necessary; from what I've seen, the new SJIS_2004 map seems to define these entities, but I'm not 100% sure they all get converted properly). I inherited a system in which this problem is bypassed by setting the SQL_ASCII server encoding, but that is not the best solution (full text search is rendered useless, and occasionally the special-character issue rears its ugly head: not only do we have to deal with normal SQL injection, but also with encoding-based injection). (And for the real WTF, my predecessor converted everything to EUC before inserting, eventually losing all the emojis and creating all sorts of strange phenomena, like tables with one column in EUC until a certain date and SJIS from then on, while EUC for all other columns.) Is there a way to properly deal with SJIS plus emoji extensions (a patch I'm not aware of, for example)? Is it considered as a todo for future releases, or should I consider augmenting postgres in some way (if the latter, could you provide any pointers on how to proceed)? You can always use CREATE CONVERSION for this kind of purpose. Create your own conversion map between SJIS and EUC or UTF-8. -- Tatsuo Ishii SRA OSS, Inc. Japan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
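A sketch of what that looks like with the CREATE CONVERSION machinery of that era; the C function emoji_sjis_to_utf8 and the emoji_conv library are hypothetical and would carry the custom emoji mapping:

    -- A conversion function takes (source encoding id, dest encoding id,
    -- source cstring, dest buffer, source length) and returns void.
    CREATE FUNCTION emoji_sjis_to_utf8(integer, integer, cstring, internal, integer)
        RETURNS void
        AS '$libdir/emoji_conv', 'emoji_sjis_to_utf8'
        LANGUAGE C STRICT;

    CREATE CONVERSION emoji_sjis_to_utf8_conv
        FOR 'SJIS' TO 'UTF8' FROM emoji_sjis_to_utf8;

One caveat: automatic client/server translation goes through the default conversion for an encoding pair, and the built-in default lives in pg_catalog, so getting a custom conversion used for client_encoding traffic may take extra care beyond this sketch.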
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
Hi, Here's a patch reflecting some of Simon's comments: 1) Removed an elog call in a critical section. 2) Changed the names of the commands to pg_compresslog and pg_decompresslog. 3) Changed the diff option used to make the patch. -- Koichi Suzuki pg_lesslog.tgz Description: Binary data ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] CREATE INDEX and HOT - revised design
On 3/30/07, Simon Riggs [EMAIL PROTECTED] wrote: Pavan, ISTM you have misunderstood Tom slightly. Oh, yes. Now that I re-read Tom's comment and his plan invalidation design and code, I understand things better. Having the index invisible to all current transactions is acceptable. Ok. However, the other backends will not receive an invalidation event, which means even when they start new transactions they will still not see it, which is not acceptable. Agree. ISTM that the run-another-transaction-afterwards idea is the only one that does everything I think we need. I really do wish we could put in a wait, like CIC, but I just think it will break existing programs. ISTM that the run-another-transaction-afterwards idea would have the same problem with plan invalidation. When the second transaction commits, the relcache invalidation event is generated. The event may get consumed by other backends, but the index may still not be usable to them because their xid < xcreate. If no more relcache invalidation events are generated after that, the backends would continue to use the cached plan, even if the index becomes usable to them later. How about storing the snapshot we used during planning in CachedPlanSource, if at least one index was seen as unusable because its CREATE INDEX transaction was seen as in progress? In RevalidateCachedPlan(), we check whether a snapshot is set in CachedPlanSource and whether we are now using a different snapshot. This triggers plan invalidation and re-planning. This would also help us use the index early in read-committed transactions even if the transaction was started before CREATE INDEX committed. Does this sound good? Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com