Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs wrote:

On Fri, 2007-10-12 at 07:17 +0100, Simon Riggs wrote:
On Fri, 2007-10-12 at 01:24 -0400, Alvaro Herrera wrote:

Michael Paesold wrote:
> Simon Riggs wrote:
> Hmm, I am not sure we are there, yet. Autovacuum does take extra care to
> vacuum tables nearing xid wrap-around, right? It even does so when
> autovacuum is disabled in the configuration. So in case a vacuum is needed
> for that very reason, the vacuum should *not* be canceled, of course. So we
> don't really need the information whether the AV worker is doing VACUUM or
> ANALYZE, but whether it is critical against xid wrap-around. Could that be
> done as easily as in Alvaro's patch for distinguishing vacuum/analyze?
> Alvaro?

Alvaro Herrera wrote:
> Yes, I think it is easy to mark the "is for xid wraparound" bit in the
> WorkerInfo struct and have the cancel work only if it's off. However, what
> I think should happen is that the signal handler for SIGINT in a worker for
> xid wraparound should not cancel the current vacuum. Instead turn it into a
> no-op, if possible. That way we also disallow a user from cancelling
> vacuums for xid wraparound. I think he can do that with pg_cancel_backend,
> and it could be dangerous.

I think that is dangerous too, because the user may have specifically turned AV off. That anti-wraparound vacuum might spring up right in a busy period and start working its way through many tables, all of which cause massive writes to occur. That's about as close to us causing an outage as I ever want to see. We need a way through that to allow the user to realise his predicament and find a good time to VACUUM. I never want to say to anybody "nothing you can do, just sit and watch, your production system will be working again in no time. Restart? No, that won't work either."

I think the best way to handle this is to have two limits. The first limit triggers an autovacuum that can be cancelled. When we hit the second limit, sometime later, the autovacuum cannot be cancelled. That would give us a breathing space if we need it.
Just a few thoughts:

1) In postgresql.conf you can define whether you use autovacuum. Add a parameter that states a time of day. If an autovacuum is cancelled once and not performed manually before that time, then it executes at that time (or just after the next system restart after that time). That way you ensure that it isn't delayed indefinitely, and you execute it at a time when the database is normally not under heavy load. As a default value you could take 2am or so.

2) If you can cancel an autovacuum that way, could you also prevent it from starting in the first place with one statement, and then restart execution with another? There are a few situations where vacuuming is entirely pointless. Example:
a) Everyone logs out, upgrade procedure of the db is started
b) drop indexes
c) add tables/change tables/add columns/change columns
d) convert data
e) drop tables/drop columns
f) add indexes
g) vacuum full analyze
h) Everyone starts the new app

BTW: I like pg83, already looking at deploying it when it hits the shelves...
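The two-limit idea discussed above can be sketched in a few lines of C. This is purely illustrative - the names, types and thresholds are made up for the sketch, not PostgreSQL code:

```c
#include <stdint.h>

/* Below a soft limit no anti-wraparound vacuum is forced; between the
 * soft and hard limits the vacuum runs but may be cancelled; past the
 * hard limit it can no longer be cancelled. */
typedef enum { AV_NOT_NEEDED, AV_CANCELLABLE, AV_UNCANCELLABLE } AvUrgency;

static AvUrgency
wraparound_urgency(uint32_t xid_age, uint32_t soft_limit, uint32_t hard_limit)
{
    if (xid_age < soft_limit)
        return AV_NOT_NEEDED;
    if (xid_age < hard_limit)
        return AV_CANCELLABLE;      /* breathing space: cancel allowed */
    return AV_UNCANCELLABLE;        /* too close to xid wraparound */
}
```

The point of the second threshold is exactly the "breathing space" Simon asks for: the DBA gets a window in which to cancel and reschedule, before safety takes over.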
Re: [HACKERS] Release notes introductory text
On Fri, Oct 12, 2007 at 07:51:13AM +0100, Simon Riggs wrote:
> On Thu, 2007-10-11 at 17:46 -0400, Tom Lane wrote:
> > Bruce Momjian <[EMAIL PROTECTED]> writes:
> > > Kevin Grittner wrote:
> > >> If the goal is to provide fair warning of a higher-than-usual-risk
> > >> release, you've got it covered.
> > > No, that was not the intent. The intent was to say we got a lot done in
> > > one year. You have a suggestion?
> > Yeah: take the entire paragraph out again. I concur with Neil that
> > it's nothing but marketing fluff, and inaccurate fluff at that.
>
> I think it is important that we are upbeat about what we have achieved,
> but perhaps we should avoid opinions in the release notes.
> Perhaps it should be part of the press release?

It certainly sounds a lot more like a press release than release notes to me...

//Magnus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Some questions about mammoth replication
Hannu Krosing wrote:
> > We have hooks in the executor calling our own collecting functions, so we
> > don't need the trigger machinery to launch replication.
>
> But where do you store the collected info - in your own replication_log
> table, or do you reuse data in the WAL, extracting it on the master before
> replication to the slave (or on the slave after moving the WAL)?

We use neither a log table in the database nor the WAL. The data to replicate is stored in disk files, one per transaction. As Joshua said, the WAL is used to ensure that only those transactions that are recorded as committed in the WAL are sent to slaves.

> > > Do you make use of snapshot data, to make sure what parts of the WAL log
> > > are worth migrating to slaves, or do you just apply everything in the WAL
> > > in separate transactions and abort if you find out that the original
> > > transaction aborted?
> >
> > We check if a data transaction is recorded in WAL before sending
> > it to a slave. For an aborted transaction we just discard all data
> > collected from that transaction.
>
> Do you duplicate postgresql's MVCC code for that, or will this happen
> automatically via using MVCC itself for collected data?

Every transaction command that changes data in a replicated relation is stored on disk. PostgreSQL MVCC code is used on a slave in a natural way when transaction commands are replayed there.

> How do you handle really large inserts/updates/deletes, which change say 10M
> rows in one transaction?

We produce really large disk files ;). When a transaction commits, a special queue lock is acquired and the transaction is enqueued to a sending queue. Since the locking mode for that lock is exclusive, a commit of a very large transaction would delay commits of other transactions while the lock is held. We are working on minimizing the time this lock is held in the new version of Replicator.
> > > Do you extract / generate full sql DML queries from data in WAL logs, or
> > > do you apply the changes at some lower level ?
> >
> > We replicate the binary data along with a command type. Only the data
> > necessary to replay the command on a slave are replicated.
>
> Do you replay it as SQL insert/update/delete commands, or directly on
> heap/indexes ?

We replay the commands directly using heap/index functions on a slave.

Regards,
-- Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
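The commit-path serialization Alexey describes (a single exclusive queue lock taken at commit to enqueue the transaction's spill file) can be sketched roughly as below. The names and the pthread mutex are illustrative stand-ins, not Replicator's actual code:

```c
#include <pthread.h>

/* A single exclusive "queue lock" serializes enqueueing committed
 * transactions for sending, so a very large transaction delays other
 * commits for as long as it holds the lock. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static int queue_len = 0;

static void
enqueue_committed_transaction(void)
{
    pthread_mutex_lock(&queue_lock);   /* exclusive: other commits wait */
    /* ... append this transaction's spill file to the sending queue ... */
    queue_len++;
    pthread_mutex_unlock(&queue_lock);
}
```

This shows why the hold time matters: everything inside the critical section is added to every concurrent committer's latency, which is the cost the Replicator team says it is working to minimize.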
Re: [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram wrote:
> Last two mails were sent by mistake without completion. I couldn't
> curse my system any further :-)
> a) Dead Space, if it is successful in its implementation of what it claims,
> will have the means to point out that all the tuples of certain chunks are
> frozen for registered relations and registered chunks. There would be lots
> of blocks which won't fall under this category.
> i) For example, if the records are newly inserted, that block
> can't be marked as containing all frozen tuples.

If records have just been inserted into a block, it is in cache. Therefore hitting that block to check visibility isn't going to cost much. There might be some middle ground where a tuple was inserted a while ago, so that the block has already been evicted from cache, but the transaction hasn't yet been committed, but that's a pretty narrow use case.

Note that we can flag a page in the DSM not only in VACUUM, but from any backend as soon as all tuples are visible to everyone. You do have to scan the tuple headers on the page to determine that, but that's not much overhead, and it could be offloaded to the bgwriter.

> ii) Imagine the case where there is a batch job / huge select
> query running in an enterprise for more than 6 hrs. All the blocks which
> have got inserted into the tables during this period might not be able to
> get the advantage of DSM

Yep, true. A long-running transaction like that is problematic anyway, because we can't vacuum away any dead rows generated during that period.

> iii) Imagine the case for which I am actually proposing the
> index with snapshot infos: partitioned tables. Every time a new table gets
> created, it has to get registered into the dead space. This requires more
> maintenance on the DBA side

Why do you think that the DBA needs to register tables to the DSM manually? Surely that would happen automatically.
> iv) I understand the DeadSpaceLock to be a global lock (if one
> transaction is updating the dead space for any unregistered chunk, no one
> else can query the dead space). If my statement is right, then partitioned
> tables might not be able to benefit from DSM. We have to remember that for
> tables with daily partitions, this would prove to be a nightmare

The patch submitted for 8.3 did use a global lock, and a fixed-size shared memory area, but those were exactly the reasons the patch was rejected. It will have to be reworked for 8.4.

> Other than that, there are lots of advantages I foresee with including the
> indexes with snapshots:
> i) Vacuuming of these indexes need not be done with super-exclusive locks.
> It is possible to design a strategy to vacuum these indexes with exclusive
> locks on pages

I'm not convinced that's true. We only need super-exclusive locks on index pages for interlocking index and heap accesses with non-MVCC snapshots, IOW system tables. And since the lock is only held for a short time and infrequently, it hasn't been a problem at all.

> ii) The above would mean that the index can be in operation while the
> vacuum is happening

Huh? VACUUM hasn't locked out other access since version 7.2!

> iii) As we have already stated, it provides an efficient clustering of
> related data.

Sorry, I missed that part. What's that?

> iv) The reverse mapping index, if present, provides an efficient solution
> to the retail vacuum problem. So HOT can be improved further, with no need
> to place the restriction that the updated tuple should be in the same page
> v) Updates on tables with primary keys (where the primary key is not
> updated) will be able to resolve the unique constraint faster. This is a
> minor advantage.
>
> The complexity of the reverse mapping index will only be there for
> user-defined functional indexes.

The *run-time* complexity of that will only be there for UDF indexes, but the *code* complexity will always be there.
Sorry, I think the probability of a reverse mapping index being accepted to Postgres is very close to zero.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
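Heikki's point above, that any backend can flag a DSM page once every tuple on it is visible to everyone, might look roughly like this. The tuple layout and the "oldest xmin" cutoff are simplified stand-ins for the real structures, not PostgreSQL code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified tuple header: inserting transaction id plus a
 * committed hint, standing in for the real HeapTupleHeader. */
typedef struct { uint32_t xmin; bool xmin_committed; } TupleHdrSketch;

/* Scan the tuple headers of a page; return true only when every tuple
 * was inserted by a committed transaction older than the oldest xmin
 * any open snapshot could still care about. */
static bool
page_all_visible(const TupleHdrSketch *tuples, int ntuples,
                 uint32_t oldest_xmin)
{
    for (int i = 0; i < ntuples; i++)
    {
        if (!tuples[i].xmin_committed || tuples[i].xmin >= oldest_xmin)
            return false;   /* some snapshot may not see this tuple yet */
    }
    return true;            /* safe to flag the page in the DSM */
}
```

Because the check only reads tuple headers already in the buffer, it is cheap enough to piggyback on any backend that touches the page, or on the bgwriter, as suggested in the thread.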
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs wrote:
> I think the best way to handle this is to have two limits. First limit
> attempts to autovacuum, but can be cancelled. When we hit second limit,
> sometime later, then autovacuum cannot be cancelled. That would give us
> a breathing space if we need it.

Sounds quite reasonable.

Best Regards
Michael Paesold
Re: [HACKERS] Including Snapshot Info with Indexes
Will $SUBJECT make it possible for count(*) to use an index? This is a much wanted feature.

-- Andreas Joseph Krogh <[EMAIL PROTECTED]> Senior Software Developer / Manager, OfficeNet AS, Oslo, Norway
Re: [HACKERS] pg_restore oddity?
Mario Weilguni wrote:
> I cannot use "-1" for performance, because some gist stuff has changed
> and the restore fails. But there seems to be no option for pg_restore to
> use transactions for data restore, so it's very very slow (one million
> records, each obviously in its own transaction - because a separate
> session "select count(1) from logins" shows a growing number).

By default, pg_dump/pg_restore uses a COPY command for each table, and each COPY executes as a single transaction, so you shouldn't see the row count growing like that. Is the dump file in --inserts format?

> It would be nice to use transactions for the data stuff itself, but not
> for schema changes or functions. I know I can use separate pg_restore
> runs for schema and data, but it's complicated IMHO.

pg_restore -s foo
pg_restore -a -1 foo

doesn't seem too complicated to me. Am I missing something?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Locales and Encodings
On Fri, Oct 12, 2007 at 02:03:47PM +0100, Gregory Stark wrote:
> This approach doesn't address that but I don't think it makes the problems
> there any worse either. That is, I think we already have these problems
> around shared tables.

Or we could just set up encodings/locales per column and the problem goes away entirely. Most of the code's already been written, it's not even terribly difficult. Where we're stuck is that we can't agree on a source of locale data. People don't want the ICU or glibc data and there's no other source as readily available. Perhaps we should fix that problem, rather than making more workarounds.

Have a nice day,
-- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
Re: [HACKERS] Locales and Encodings
"Peter Eisentraut" <[EMAIL PROTECTED]> writes:
> On Friday, 12 October 2007, Gregory Stark wrote:
>> . when creating a new database from a template the new locale and encoding
>> must be identical to the template database's encoding and locale, unless
>> the template is template0, in which case we rebuild all indexes after
>> copying.
>
> Why would you restrict the index rebuilding only to this particular case?
> It could be done for any database.

Well, there's no guarantee there isn't 8-bit data in other databases which would be invalid in the new encoding. I think it's reasonable to assume there's only 7-bit ASCII in template0, however.

An alternative would be introducing an ASCII7 encoding which template0 would use; any other database in that encoding could then be used as a template for any encoding. However, that would still require index rebuilds, which could potentially take a long time.

Another alternative would be recoding all the data from the template database's encoding to the new encoding and throwing an error if a non-encodable character is found. I think it's a lot simpler to just declare it a non-problem by saying there won't be any non-ASCII text in template0.

> The other issue are shared catalogs.

This approach doesn't address that, but I don't think it makes the problems there any worse either. That is, I think we already have these problems around shared tables.

. If you have two databases with locales that don't agree, then the indexes on those tables won't function properly.
. What happens if you create a user while connected to a latin1 database with an é in his username and then connect to a UTF8 database? That username is now an invalidly encoded UTF8 string.

Perhaps we should be using pattern_ops for the indexes on the shared tables? Or using bytea with UTF8-encoded strings instead of name and text?
That actually sounds reasonable now that we have convert() functions which take and generate bytea, at least for the text fields like in pltemplate -- less so for the name columns.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] pg_restore oddity?
Heikki Linnakangas wrote:
> Mario Weilguni wrote:
>> I cannot use "-1" for performance, because some gist stuff has changed
>> and the restore fails. But there seems to be no option for pg_restore to
>> use transactions for data restore, so it's very very slow (one million
>> records, each obviously in its own transaction - because a separate
>> session "select count(1) from logins" shows a growing number).
>
> By default, pg_dump/pg_restore uses a COPY command for each table, and
> each COPY executes as a single transaction, so you shouldn't see the row
> count growing like that. Is the dump file in --inserts format?
>
>> It would be nice to use transactions for the data stuff itself, but not
>> for schema changes or functions. I know I can use separate pg_restore
>> runs for schema and data, but it's complicated IMHO.
>
> pg_restore -s foo
> pg_restore -a -1 foo
>
> doesn't seem too complicated to me. Am I missing something?

Doesn't pg_restore create the indices *after* loading the data if you let it restore the schema *and* the data in one step? The above workaround would disable that optimization, thereby making the data-restore phase much more costly.

Now that I think about it, I remember that I've often wished that we not only had --schema-only and --data-only, but also --schema-unconstrained-only and --constraints-only.

regards, Florian Pflug
Re: [HACKERS] pg_tablespace_size()
Tom Lane wrote:
> Magnus Hagander <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> * Just remove the above-quoted lines. Superusers should be allowed to
>>> shoot themselves in the foot. (I'm not actually sure that there would
>>> be any bad consequences from putting an ordinary table into pg_global
>>> anyway.
>
>> Is there ever *any* reason for doing this?
>
> Probably not a good one, and I suspect there would be some funny
> misbehaviors if you were to clone the database containing the table.
> The table would be physically shared but logically not.

yuck.

> What I'm inclined to do about it is to adopt my suggestion #2 (move the
> location of the defense), since "permission denied" for a superuser is
> a pretty unhelpful error message anyway.

Ok. Works for me.

>>> * Decide that we should allow anyone to do pg_tablespace_size('pg_global')
>>> and put in a special wart for that in dbsize.c. This wasn't part of
>>> the original agreement but maybe there's a case to be made for it.
>
>> That's pretty much the same thing, right?
>
> Well, no, I was suggesting that we might want to special-case pg_global
> as a tablespace that anyone (superuser or no) could get the size of.
> This is actually independent of whether we change the aclmask behavior.

Oh, ok, I see. Then my vote is for the other solution = not allowing anybody to do this.

//Magnus
Re: [HACKERS] pg_tablespace_size()
Magnus Hagander <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> * Just remove the above-quoted lines. Superusers should be allowed to
>> shoot themselves in the foot. (I'm not actually sure that there would
>> be any bad consequences from putting an ordinary table into pg_global
>> anyway.
> Is there ever *any* reason for doing this?

Probably not a good one, and I suspect there would be some funny misbehaviors if you were to clone the database containing the table. The table would be physically shared but logically not.

What I'm inclined to do about it is to adopt my suggestion #2 (move the location of the defense), since "permission denied" for a superuser is a pretty unhelpful error message anyway.

>> * Decide that we should allow anyone to do pg_tablespace_size('pg_global')
>> and put in a special wart for that in dbsize.c. This wasn't part of
>> the original agreement but maybe there's a case to be made for it.
> That's pretty much the same thing, right?

Well, no, I was suggesting that we might want to special-case pg_global as a tablespace that anyone (superuser or no) could get the size of. This is actually independent of whether we change the aclmask behavior.

regards, tom lane
Re: [HACKERS] pg_tablespace_size()
Tom Lane wrote:
> I wrote:
>> [ squint... ] There is something wrong here, because a superuser should
>> certainly pass the aclcheck test. I don't know where the bug is but
>> this is not the correct fix.
>
> OK, after looking, the issue is this wart in pg_tablespace_aclmask():
>
>     /*
>      * Only shared relations can be stored in global space; don't let even
>      * superusers override this
>      */
>     if (spc_oid == GLOBALTABLESPACE_OID && !IsBootstrapProcessingMode())
>         return 0;
>
>     /* Otherwise, superusers bypass all permission checking. */

Yup, that was my point.

> There are a number of ways that we could deal with this:
>
> * Just remove the above-quoted lines. Superusers should be allowed to
> shoot themselves in the foot. (I'm not actually sure that there would
> be any bad consequences from putting an ordinary table into pg_global
> anyway. I think I wrote the above code in fear that some parts of the
> system would equate reltablespace = pg_global with relisshared, but
> AFAICS that's not the case now.)

Is there ever *any* reason for doing this? If there isn't, I don't think we should provide just that foot-gun. But if there is any case where it makes sense to do that, then the superuser should probably be allowed to do it.

> * Remove the above lines and instead put a defense into heap_create.
> This might be better design anyway since a more specific error could
> be reported.
>
> * Leave aclchk.c as-is and apply Magnus' patch to allow superusers
> to bypass the check in pg_tablespace_size.

See foot-gun above. If we want to keep the check, I think that my patch is fine. If we don't, then taking out that code is better.

> * Decide that we should allow anyone to do pg_tablespace_size('pg_global')
> and put in a special wart for that in dbsize.c. This wasn't part of
> the original agreement but maybe there's a case to be made for it.

That's pretty much the same thing, right? Since the acl check will check for pg_global, and if it's anything else, let superuser in.
It's gotta be easier to read if it's just a plain superuser check, I think.

//Magnus
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
On Fri, 2007-10-12 at 13:51 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Can you explain further what you meant by "don't disable manual
> > cancels".
>
> I meant that pg_cancel_backend() should still work on autovac workers,
> contrary to Alvaro's suggestion that autovac workers should sometimes
> ignore SIGINT.
>
> Basically the implementation vision I have is that the SIGINT catcher in
> an autovac worker should remain stupid, and any intelligence involved
> should be on the side where we're deciding whether to send a signal or
> not. This probably does involve exposing more state in PGPROC but I see
> nothing much wrong with that. (It might be time to merge inVacuum,
> isAutovacuum, and the additional state into a bitwise vacuumFlags field.)

Gotcha

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Release notes introductory text
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> With respect to you Kevin, your managers should wait. You don't
>> install .0 releases of "any" software into production without "months"
>> of testing. At which point, normally a .1 release has come out anyway.
>
> How exactly do you expect the software to get from a .0 to a .1 release,
> or to have addressed the bugs that might bite you when it does get to .1,
> if you aren't helping to test it?

In most environments I've seen, developer and QA systems don't hesitate to move to .0 releases (or even beta). I agree with Joshua that it's nerve-wracking to move _production_ systems to .0 releases from most software vendors.

> Now I realize that you did say "test" above, but way too often I see
> this sort of argument as a justification for doing nothing and expecting
> somebody else to fix it.
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs <[EMAIL PROTECTED]> writes:
> Can you explain further what you meant by "don't disable manual
> cancels".

I meant that pg_cancel_backend() should still work on autovac workers, contrary to Alvaro's suggestion that autovac workers should sometimes ignore SIGINT.

Basically the implementation vision I have is that the SIGINT catcher in an autovac worker should remain stupid, and any intelligence involved should be on the side where we're deciding whether to send a signal or not. This probably does involve exposing more state in PGPROC but I see nothing much wrong with that. (It might be time to merge inVacuum, isAutovacuum, and the additional state into a bitwise vacuumFlags field.)

regards, tom lane
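Tom's suggestion of merging inVacuum and isAutovacuum into a bitwise vacuumFlags field, with the sending side holding all the intelligence, could look roughly like this. The flag names are illustrative, not necessarily what ends up committed:

```c
#include <stdint.h>

/* Candidate flag bits for a merged PGPROC vacuumFlags field. */
#define PROC_IN_VACUUM              0x01    /* running a lazy vacuum */
#define PROC_IS_AUTOVACUUM          0x02    /* is an autovacuum worker */
#define PROC_VACUUM_FOR_WRAPAROUND  0x04    /* anti-wraparound vacuum */

/* Sender-side decision: the worker's SIGINT handler stays stupid, and
 * we only signal a backend that is an autovacuum worker not vacuuming
 * to prevent xid wraparound. */
static int
should_send_cancel(uint8_t vacuumFlags)
{
    return (vacuumFlags & PROC_IS_AUTOVACUUM) &&
           !(vacuumFlags & PROC_VACUUM_FOR_WRAPAROUND);
}
```

Manual pg_cancel_backend() remains untouched by this; the test only gates the automatic cancels the conflict-detection path would send.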
Re: [HACKERS] Locales and Encodings
Tom Lane wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
>> On Friday, 12 October 2007, Gregory Stark wrote:
>>> It would make Postgres inconsistent and less integrated with the rest of
>>> the OS. How do you explain that Postgres doesn't follow the system's
>>> configurations and the collations don't agree with the system collations?
>> We already have our own encoding support (for better or worse), and I
>> don't think having one's own locale support would be that much different.
>
> Well, yes it would be, because encodings are pretty well standardized;
> there is not likely to be any user-visible difference between one
> platform's idea of UTF8 and another's. This is very very far from being
> the case for locales. See for instance the recent thread in which we found
> out that "en_US" locale has utterly different sort orders on Linux and OS X.

For me, this paragraph is more of an argument *in favour* of having our own locale support. At least for me, consistency between PG running on different platforms would bring more benefits than consistency between PG and the platform it runs on.

At the company I used to work for, we had all our databases running with encoding=utf-8 and locale=C, because I didn't want our applications to depend on platform-specific locale issues. Plus, some of the applications supported multiple languages, making a cluster-global locale unworkable anyway - a restriction which would go away if we went with ICU.

regards, Florian Pflug
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
On Fri, 2007-10-12 at 11:26 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > That seemed more complex when I thought about that, but if we just use
> > SIGUSR2 for automatic cancels then this would be very simple.
>
> Why not SIGINT?

I must be missing something. How would I tell the difference between manual and automatic cancels if we use SIGINT for both cases? Or did you think to put another flag on the PGPROC to allow us to tell whether it isAvoidingWraparound?

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Locales and Encodings
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> On Friday, 12 October 2007, Gregory Stark wrote:
>> It would make Postgres inconsistent and less integrated with the rest of
>> the OS. How do you explain that Postgres doesn't follow the system's
>> configurations and the collations don't agree with the system collations?
> We already have our own encoding support (for better or worse), and I
> don't think having one's own locale support would be that much different.

Well, yes it would be, because encodings are pretty well standardized; there is not likely to be any user-visible difference between one platform's idea of UTF8 and another's. This is very very far from being the case for locales. See for instance the recent thread in which we found out that the "en_US" locale has utterly different sort orders on Linux and OS X.

regards, tom lane
Re: [HACKERS] pg_tablespace_size()
Tom Lane wrote:
> Magnus Hagander <[EMAIL PROTECTED]> writes:
>> Attached is a patch I want to apply for this. Tom's message at
>> http://archives.postgresql.org/pgsql-hackers/2007-10/msg00448.php makes me
>> bring it up here before I apply it.
>
> [ squint... ] There is something wrong here, because a superuser should
> certainly pass the aclcheck test. I don't know where the bug is but
> this is not the correct fix.

Are you sure? pg_tablespace_aclmask() has:

    /*
     * Only shared relations can be stored in global space; don't let even
     * superusers override this
     */
    if (spc_oid == GLOBALTABLESPACE_OID && !IsBootstrapProcessingMode())
        return 0;

And this is what causes the failure. I certainly didn't want to take out that check, so short-circuiting it in the way I did seemed right..

//Magnus
Re: [HACKERS] Locales and Encodings
On Fri, Oct 12, 2007 at 03:28:26PM +0100, Gregory Stark wrote:
> Fix the problem by making ICU a smaller less complex dependency?

How? It's 95% data, you can't reduce that. glibc also has 10MB of locale data. The actual code is much smaller than postgres and doesn't depend on any other non-system libraries.

> I think realistically we're basically waiting for strcoll_l to become
> standardized by POSIX so we can depend on it.

I think we could be waiting forever then. It's supported by Win32, Mac OS X and glibc. The systems that don't support it tend not to support multibyte collation anyway. Patches have been created to use this and rejected because not enough platforms support it...

> Personally I think we should just implement our own strcoll_l as a wrapper
> around setlocale-strcoll-setlocale and use strcoll_l if it's available and
> our, possibly slow, wrapper if not. If we ban direct use of strcoll and
> other lc_collate sensitive functions in Postgres we could also remember the
> last locale used and not do unnecessary setlocales so existing use cases
> aren't slowed down at all.

Been done also. As I recall it was *really* slow, not just a little bit.

Have a nice day,
-- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
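The wrapper Gregory proposes, emulating strcoll_l() with setlocale() plus strcoll() while caching the last locale to skip redundant setlocale() calls, is roughly this shape. This is a sketch of the idea under discussion, not the patch Martijn refers to; error handling and thread safety are omitted:

```c
#include <locale.h>
#include <string.h>

/* Remember the last LC_COLLATE locale we set, so comparisons that stay
 * in one locale avoid the (expensive) setlocale() round trip. */
static char last_collate_locale[64] = "";

static int
my_strcoll_l(const char *a, const char *b, const char *locale_name)
{
    if (strcmp(last_collate_locale, locale_name) != 0)
    {
        setlocale(LC_COLLATE, locale_name);
        strncpy(last_collate_locale, locale_name,
                sizeof(last_collate_locale) - 1);
    }
    return strcoll(a, b);
}
```

The caching helps the common single-locale case, but as Martijn notes, any workload that alternates locales still pays two setlocale() calls per switch, which is where the earlier attempt reportedly got very slow.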
Re: [HACKERS] Locales and Encodings
Michael Glaesemann wrote:
> On Oct 12, 2007, at 10:19, Gregory Stark wrote:
>> It would make Postgres inconsistent and less integrated with the rest of
>> the OS. How do you explain that Postgres doesn't follow the system's
>> configurations and the collations don't agree with the system collations?
>
> How is this fundamentally different from PostgreSQL using a separate
> users/roles system than the OS?

Even more, eliminating dependencies on an OS's correct implementation of locale stuff appears A Good Thing to me. I wonder if a compile-time option to use ICU in 8.4 should be considered, regarding all those lengthy threads about encoding/locale/collation problems.

Regards, Andreas
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs <[EMAIL PROTECTED]> writes: > That seemed more complex when I thought about that, but if we just use > SIGUSR2 for automatic cancels then this would be very simple. Why not SIGINT? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pg_tablespace_size()
Tom Lane wrote: > Dave Page <[EMAIL PROTECTED]> writes: > > select pg_tablespace_size('pg_global'); > > ERROR: permission denied for tablespace pg_global > > > Can this be fixed please? > > What's to be fixed? That's exactly the result I'd expect given the > previously-agreed-to behavior for pg_tablespace_size. I know we said tablespace requires CREATE privs, but certainly the super-user should be able to use the function too. -- Bruce Momjian <[EMAIL PROTECTED]>http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] pg_tablespace_size()
Magnus Hagander <[EMAIL PROTECTED]> writes: > Attached is a patch I want to apply for this. Toms message at > http://archives.postgresql.org/pgsql-hackers/2007-10/msg00448.php makes me > bring it up here before I apply it. [ squint... ] There is something wrong here, because a superuser should certainly pass the aclcheck test. I don't know where the bug is but this is not the correct fix. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Including Snapshot Info with Indexes
Andreas Joseph Krogh <[EMAIL PROTECTED]> writes: > Will $SUBJECT make it possible for count(*) to use index? This is a much > wanted feature. If you mean "count(*) will become instantaneous", no it won't. It would get faster, but probably not by more than a factor of 10 or so, corresponding to the size ratio between index and table. Remember that the proposal is going to bloat indexes pretty badly: a simple index on an int4 column will double in size, more or less. So that ratio will be much less favorable than it is today. Personally I think the bloat problem will doom this entire approach. The distributed costs of that will outweigh the speedups that can be achieved for specific queries. The OP is free to try to prove this fear wrong, but no amount of discussion will constitute such a proof; only a testable implementation. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Locale + encoding combinations
On Fri, Oct 12, 2007 at 06:03:52AM -0700, Trevor Talbot wrote: > On 10/12/07, Dave Page <[EMAIL PROTECTED]> wrote: > > Tom Lane wrote > > > That still leaves us with the problem of how to tell whether a locale > > > spec is bad on Windows. Judging by your example, Windows checks whether > > > the code page is present but not whether it is sane for the base locale. > > > What happens when there's a mismatch --- eg, what encoding do system > > > messages come out in? > > > > I'm not sure how to test that specifically, but it seems that accented > > characters simply fall back to their undecorated equivalents if the > > encoding is not appropriate, eg: > > > > [EMAIL PROTECTED]:~$ ./setlc French_France.1252 > > Locale: French_France.1252 > > The date is: sam. 01 of août 2007 > > [EMAIL PROTECTED]:~$ ./setlc French_France.28597 > > Locale: French_France.28597 > > The date is: sam. 01 of aout 2007 > > > > (the encodings used there are WIN1252 and ISO8859-7 (Greek)). > > > > I'm happy to test further is you can suggest how I can figure out the > > encoding actually output. > > The encoding output is the one you specified. Keep in mind, > underneath Windows is mostly working with Unicode, so all characters > exist and the locale rules specify their behavior there. The encoding > is just the byte stream it needs to force them all into after doing > whatever it does to them. As you've seen, it uses some sort of > best-fit mapping I don't know the details of. (It will drop accent > marks and choose characters with similar shape where possible, by > default.) > > I think it's a bit more complex for input/transform cases where you > operate on the byte stream directly without intermediate conversion to > Unicode, which is why UTF-8 doesn't work as a codepage, but again I > don't have the details nearby. I can try to do more digging if > needed. Just so the non-windows-savvy people get it.. When Windows documentation or users refer to Unicode, they mean UTF-16. 
//Magnus ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] pg_tablespace_size()
On Fri, Oct 12, 2007 at 12:20:08PM +0100, Dave Page wrote: > Following on from Hiroshi's earlier post: > http://archives.postgresql.org/pgsql-hackers/2007-10/msg00324.php > > Per http://archives.postgresql.org/pgsql-hackers/2007-08/msg01018.php, > superusers should be able to use pg_tablespace_size() on any tablespace, > however when logged in as postgres on a beta1 server, I get: > > -- Executing query: > select pg_tablespace_size('pg_global'); > > ERROR: permission denied for tablespace pg_global > > Can this be fixed please? Attached is a patch I want to apply for this. Tom's message at http://archives.postgresql.org/pgsql-hackers/2007-10/msg00448.php makes me bring it up here before I apply it. I think the correct behaviour is that superusers should be able to do it, and that's in the thread that Dave referred to. But since Tom said it's "by design" in the message above, I'd like confirmation that this is ok. Since I think that allowing superusers to do it is the only sensible thing, I will apply this patch unless someone objects :-) //Magnus

Index: src/backend/utils/adt/dbsize.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/dbsize.c,v
retrieving revision 1.14
diff -c -r1.14 dbsize.c
*** src/backend/utils/adt/dbsize.c	29 Aug 2007 17:24:29 -0000	1.14
--- src/backend/utils/adt/dbsize.c	12 Oct 2007 14:38:32 -0000
***************
*** 164,171 ****
  	/*
  	 * User must have CREATE privilege for target tablespace, either explicitly
  	 * granted or implicitly because it is default for current database.
  	 */
! 	if (tblspcOid != MyDatabaseTableSpace)
  	{
  		aclresult = pg_tablespace_aclcheck(tblspcOid, GetUserId(), ACL_CREATE);
  		if (aclresult != ACLCHECK_OK)
--- 164,175 ----
  	/*
  	 * User must have CREATE privilege for target tablespace, either explicitly
  	 * granted or implicitly because it is default for current database.
+ 	 *
+ 	 * Specifically check for superuser permissions here, since
+ 	 * pg_tablespace_aclcheck() will deny access to pg_global even for
+ 	 * superusers.
  	 */
! 	if (tblspcOid != MyDatabaseTableSpace && !superuser())
  	{
  		aclresult = pg_tablespace_aclcheck(tblspcOid, GetUserId(), ACL_CREATE);
  		if (aclresult != ACLCHECK_OK)

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Locales and Encodings
On Friday, 12 October 2007, Martijn van Oosterhout wrote: > Where we're stuck is that we can't agree on a > source of locale data. People don't want the ICU or glibc data and > there's no other source as readily available. What were the objections to ICU? -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Locales and Encodings
"Martijn van Oosterhout" <[EMAIL PROTECTED]> writes: > People don't want the ICU or glibc data and there's no other source as > readily available. > > Perhaps we should fix that problem, rather than making more > workarounds. Fix the problem by making ICU a smaller less complex dependency? Or fix the problem that glibc isn't everyone's libc? I think realistically we're basically waiting for strcoll_l to become standardized by POSIX so we can depend on it. Personally I think we should just implement our own strcoll_l as a wrapper around setlocale-strcoll-setlocale and use strcoll_l if it's available and our, possibly slow, wrapper if not. If we ban direct use of strcoll and other lc_collate sensitive functions in Postgres we could also remember the last locale used and not do unnecessary setlocales so existing use cases aren't slowed down at all. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
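For reference, the setlocale-strcoll-setlocale wrapper Gregory sketches (and which Martijn recalls was tried before and found to be very slow) would look roughly like this. This is a minimal sketch, not any patch from the archives: the function name, buffer size, and fallback behaviour are our own assumptions, and a real version would cache the last locale used to skip redundant setlocale() calls, as Gregory suggests.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical fallback for platforms without strcoll_l(): temporarily
 * switch LC_COLLATE around a plain strcoll() call, then restore it. */
int pg_strcoll_in_locale(const char *a, const char *b, const char *locale)
{
    char        saved[64];
    const char *cur = setlocale(LC_COLLATE, NULL);
    int         result;

    /* setlocale() returns a pointer into static storage, so copy it
     * before the next setlocale() call can clobber it */
    strncpy(saved, cur ? cur : "C", sizeof(saved) - 1);
    saved[sizeof(saved) - 1] = '\0';

    if (setlocale(LC_COLLATE, locale) == NULL)
        return strcoll(a, b);       /* locale unavailable: plain compare */

    result = strcoll(a, b);
    setlocale(LC_COLLATE, saved);   /* restore previous collation locale */
    return result;
}
```

The two extra setlocale() calls per comparison are exactly why this approach measured so badly; caching only helps when consecutive comparisons use the same locale.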
Re: [HACKERS] Locale + encoding combinations
On 10/12/07, Dave Page <[EMAIL PROTECTED]> wrote: > Tom Lane wrote > > That still leaves us with the problem of how to tell whether a locale > > spec is bad on Windows. Judging by your example, Windows checks whether > > the code page is present but not whether it is sane for the base locale. > > What happens when there's a mismatch --- eg, what encoding do system > > messages come out in? > > I'm not sure how to test that specifically, but it seems that accented > characters simply fall back to their undecorated equivalents if the > encoding is not appropriate, eg: > > [EMAIL PROTECTED]:~$ ./setlc French_France.1252 > Locale: French_France.1252 > The date is: sam. 01 of août 2007 > [EMAIL PROTECTED]:~$ ./setlc French_France.28597 > Locale: French_France.28597 > The date is: sam. 01 of aout 2007 > > (the encodings used there are WIN1252 and ISO8859-7 (Greek)). > > I'm happy to test further is you can suggest how I can figure out the > encoding actually output. The encoding output is the one you specified. Keep in mind, underneath Windows is mostly working with Unicode, so all characters exist and the locale rules specify their behavior there. The encoding is just the byte stream it needs to force them all into after doing whatever it does to them. As you've seen, it uses some sort of best-fit mapping I don't know the details of. (It will drop accent marks and choose characters with similar shape where possible, by default.) I think it's a bit more complex for input/transform cases where you operate on the byte stream directly without intermediate conversion to Unicode, which is why UTF-8 doesn't work as a codepage, but again I don't have the details nearby. I can try to do more digging if needed. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs wrote: On Fri, 2007-10-12 at 11:44 +0200, Michael Paesold wrote: Simon Riggs wrote: On Fri, 2007-10-12 at 01:24 -0400, Alvaro Herrera wrote: Yes, I think it is easy to mark the "is for xid wraparound" bit in the WorkerInfo struct and have the cancel work only if it's off. However, what I think should happen is that the signal handler for SIGINT in a worker for xid wraparound should not cancel the current vacuum. Instead turn it into a no-op, if possible. That way we also disallow a user from cancelling vacuums for xid wraparound. I think he can do that with pg_cancel_backend, and it could be dangerous. I think that is dangerous too because the user may have specifically turned AV off. That anti-wraparound vacuum might spring up right in a busy period and start working its way through many tables, all of which cause massive writes to occur. That's about as close to us causing an outage as I ever want to see. We need a way through that to allow the user to realise his predicament and find a good time to VACUUM. I never want to say to anybody "nothing you can do, just sit and watch, your production system will be working again in no time. Restart? no that won't work either." You are probably right that VACUUM going full-steam is a bad idea in most situations. Except for anti-wraparound vacuum, cancellation seems the most reasonable thing to do. Because autovacuum will usually pick up the table in time again. Yeh, if we do have to do the second emergency anti-wraparound, then that should be at full speed, since there's nothing else to do at that point. The only problem I would see is if someone has an application that does a lot of schema changes (doesn't sound like a good idea anyway). In that case they would better issue manual vacuums on such tables. I can't see a use case for regular DDL as part of an application, on an otherwise integral table (lots of updates and deletes). As part of an application there's no use. 
As part of an upgrade between two different versions of that application there is. And that's exactly the kind of situation where temporarily disabling autovacuum could become handy.
[HACKERS] pg_restore oddity?
There's IMO a problem with pg_restore; it should be easy to fix (I hope - and I could try to fix it and send a patch). * I've a dump taken from an 8.1 database * I'm using gist and ltree * I'm restoring to an 8.2 database Problem: I cannot use "-1" for performance, because some gist stuff has changed and the restore fails. But there seems to be no option for pg_restore to use transactions for data restore, so it's very very slow (one million records, each obviously in its own transaction - because a separate session "select count(1) from logins" shows a growing number). It would be nice to use transactions for the data stuff itself, but not for schema changes or functions. I know I can use separate pg_restore runs for schema and data, but it's complicated IMHO. I see several options: * Use transactions for data, maybe with a separate command line option * Use transactions every time, and place savepoints to recover from errors? Any ideas what I could do? Regards Mario ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Locales and Encodings
On Friday, 12 October 2007, Gregory Stark wrote: > . when creating a new database from a template the new locale and encoding > must be identical to the template database's encoding and locale. Unless > the template is template0 in which case we rebuild all indexes after > copying. Why would you restrict the index rebuilding only to this particular case? It could be done for any database. The other issue is shared catalogs. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/12/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote: > > Gokulakannan Somasundaram wrote: > If records have just been inserted to a block, it is in cache. Therefore > hitting that block to check visibility isn't going to cost much. There > might be some middle-ground where a tuple has been inserted a while ago, > so that the block has already been evicted from cache, but the > transaction hasn't yet been committed, but that's a pretty narrow use > case. > > Note that we can flag a page in the DSM not only by VACUUM, but by any > backend as soon as all tuples are visible to everyone. You do have to > scan the tuple headers on the page to determine that, but that's not so > much overhead, and it could be offloaded to the bgwriter. The first step in any database tuning of course is to reduce Random I/Os. But then the next step is to reduce the logical I/Os, Whether the I/O happens from the Disk/Memory, we should try to reduce the access to a shared resource as much as possible. So even if the newly inserted tuples are in memory / disk, restricting the access to it would improve the overall performance of the system(That place can be taken over by other blocks). If we see the overall picture, scalability of the database gets increased, as we reduce the number of locks being taken. Yep, true. A long-running transaction like that is problematic anyway, > because we can't vacuum away any dead rows generated during that period. It is not problematic for the Indexes with snapshot. They will be working as usual. And i think one reason why timestamp based databases got an advantage over locking based databases is that batch jobs can run together with OLTP transactions. In order to maintain that advantage in PostgreSQL, Indexes with snapshot helps. Why do you think that the DBA needs to register tables to the DSM > manually? Surely that would happen automatically. Accepted. 
The patch submitted for 8.3 did use a global lock, and a fixed size > shared memory area, but those were exactly the reasons the patch was > rejected. It will have to be reworked for 8.4. Ok, then the best case rework to my understanding would be to include the bitmap DSM block number into the locking attributes. But still one DSM block would be mapping to 8192 * 8 = 65536 Table blocks. So if you have to add a unregistered chunk of a newly created partition, then any access into the 65536 blocks will have to get stalled, which may not be acceptable under the OLTP performance requirements. This becomes a performance overhead for people maintaining daily partitions, as they create a new table everyday and the space would be increasing from morning till evening and the same table would be queried the most. I'm not convinced that's true. We only need super-exclusive locks on > index pages for interlocking index and heap accesses with non-MVCC > snapshots, IOW system tables. And since the lock is only held for a > short time and infrequently, it hasn't been a problem at all. For a heap with no indexes, we don't take super-exclusive lock to do Vacuum on it. We just need to take Exclusive lock on each block and do the Vacuum on it. That's because the table contains the necessary visibility information. But with indexes, we may need to refer to the table in order to do Vacuum. In the mean-while we don't want any page splits to happen. That's why we take a super exclusive lock on all the leaf pages (no body should even have a pin on one of them Ref : README file in nbtree directory) But if we have the snapshot info into the indexes, then we can just do a index scan(similar to the heap scan described before) by taking Exclusive lock on pages one by one and Vacuuming them. > ii) The above would mean that index can be in operation while the vacuum > is > > happening > > Huh? VACUUM hasn't locked out other access since version 7.2! I might have missed something here. 
If we need a Super-Exclusive lock on all leaf pages in the index to do the Vacuum (no-one should even have a PIN on it), then how are we able to allow index scans while the Vacuum is happening? In my case, the index will have the snapshot information, so no super exclusive locks, so other leaf pages can be accessed. If there is an explanation, it might also be useful to update the README file in the nbtree directory > iii) As we have already stated, it provides a efficient clustering of > > related data. > > Sorry, I missed that part. What's that? Say I create an index on SalespersonId, Date, Customer Name, Details of the Transaction on a table. For a query like select Customer Name, Details of the transaction from table where salespersonid='xyz' order by date. The entire query gets satisfied by the Index. We will not go to the table. In short the index acts like an IOT in Oracle / clustered indexes in other databases. The necessary information is obtained from one place, since snapshot is stored in the index. It will be very useful, especially when the query is goi
Re: [HACKERS] Some questions about mammoth replication
Hannu Krosing wrote: > > We don't use either a log table in database or WAL. The data to > > replicate is stored in disk files, one per transaction. > > Clever :) > > How well does it scale ? That is, at what transaction rate can your > replication keep up with database ? This depends on the number of concurrent transactions (the data is collected by every backend process), max transaction size etc. I don't have numbers here, sorry. > > > As Joshua said, > > the WAL is used to ensure that only those transactions that are recorded > > as committed in WAL are sent to slaves. > > How do you force correct commit order of applying the transactions ? The first transaction that is committed in PostgreSQL is the first transaction placed into the queue, and the first that is restored by the slave. > > > > > > > > > Do you make use of snapshot data, to make sure, what parts of WAL log > > > > > are worth migrating to slaves , or do you just apply everything in WAL > > > > > in separate transactions and abort if you find out that original > > > > > transaction aborted ? > > > > > > > > We check if a data transaction is recorded in WAL before sending > > > > it to a slave. For an aborted transaction we just discard all data > > > > collected > > > > from that transaction. > > > > > > Do you duplicate postgresql's MVCC code for that, or will this happen > > > automatically via using MVCC itself for collected data ? > > > > Every transaction command that changes data in a replicated relation is > > stored on disk. PostgreSQL MVCC code is used on a slave in a natural way > > when transaction commands are replayed there. > > Do you replay several transaction files in the same transaction on > slave ? > Can you replay several transaction files in parallel ? No, we have plans for concurrent restore of replicated data, but currently we have a single slave process responsible for restoring data received from MCP. 
> > > > How do you handle really large inserts/updates/deletes, which change say > > > 10M > > > rows in one transaction ? > > > > We produce really large disk files ;). When a transaction commits - a > > special queue lock is acquired and transaction is enqueued to a sending > > queue. > > Since the locking mode for that lock is exclusive a commit of a > > very large transaction would delay commits of other transactions until > > the lock is held. We are working on minimizing the time of holding this > > lock in the new version of Replicator. > > Why does it take longer to queue a large file ? dou you copy data from > one file to another ? Yes, currently the data is copied from the transaction files into the queue (this doesn't apply to dump transactions). However, we have recently changed this, the new code will acquire the queue lock only to record transaction as committed in replication log without moving the data. > > > Do you replay it as SQL insert/update/delete commands, or directly on > > > heap/indexes ? > > > > We replay the commands directly using heap/index functions on a slave. > > Does that mean that the table structures will be exactly the same on > both master slave ? Yes, the table structure on the slaves should match the table structure on master. > That is, do you replicate a physical table image > (maybe not including transaction ids on master) ? Yes, we call this 'full dump', and it is fired automatically for every replicated table. We replicate only data however, not DDL commands to create/alter table or sequence. > > Or you just use lower-level versions on INSERT/UPDATE/DELETE ? > > - > Hannu > > > Regards, -- Alexey Klyukin http://www.commandprompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
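The commit-time queue lock Alexey describes can be modeled in miniature. Replicator's actual structures are not shown in this thread, so every name below is made up; the toy only illustrates why an exclusive lock taken while enqueueing a committed transaction serializes committers, and why a very large transaction (a big file to copy into the queue) delays everyone else holding off on that lock.

```c
#include <pthread.h>

#define TXQUEUE_MAX 64

/* Toy model of a sending queue protected by one exclusive lock.
 * All names are hypothetical; this is not Replicator code. */
typedef struct TxQueue
{
    pthread_mutex_t lock;
    int             entries[TXQUEUE_MAX];
    int             count;
} TxQueue;

void txqueue_init(TxQueue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->count = 0;
}

void txqueue_enqueue(TxQueue *q, int txid)
{
    /* Exclusive: every other committing backend blocks here until the
     * enqueue (in the real system, copying the transaction file) is done */
    pthread_mutex_lock(&q->lock);
    if (q->count < TXQUEUE_MAX)
        q->entries[q->count++] = txid;
    pthread_mutex_unlock(&q->lock);
}
```

The fix mentioned later in the thread, recording the transaction as committed in the replication log without moving its data, amounts to shrinking the work done inside that critical section.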
[HACKERS] Locales and Encodings
It seems like the root of the problems we're butting our heads against with encoding and locale is all the same issue: it's nonsensical to take the locale at initdb time per-cluster and then allow user-specified encoding per-database. If anything it would make more sense to go the other way around. But actually it seems to me we could allow changing both on a per-database basis with certain restrictions: . template0 is always SQL_ASCII with locale C . when creating a new database you can specify the encoding and locale and we check that they're compatible. . when creating a new database from a template the new locale and encoding must be identical to the template database's encoding and locale. Unless the template is template0 in which case we rebuild all indexes after copying. We could liberalize this last restriction if we created a new encoding like SQL_ASCII but which enforces 7-bit ascii. But then the index rebuild step could take a long time. This would make the whole locale/encoding issue much more transparent. In database listings you would see both listed alongside, you wouldn't be bound by any initdb environment choices, and errors when running create database would be able to tell you exactly what you're doing wrong and what you have to do to avoid the problem. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
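The template rule in this proposal is simple enough to state as a predicate. This is only a sketch of the rule as written above, with hypothetical names; a real createdb() check would compare encoding and locale as separate catalog fields rather than one string, and would also have to deal with the shared-catalog issue Peter raises.

```c
#include <string.h>

/* Sketch of the proposed CREATE DATABASE compatibility rule:
 * settings must match the template's, unless the template is
 * template0 (in which case all indexes are rebuilt after copying).
 * Names and the single-string locale comparison are simplifications. */
int createdb_settings_ok(const char *template_name,
                         const char *template_locale,
                         const char *new_locale)
{
    if (strcmp(template_name, "template0") == 0)
        return 1;               /* anything goes; pay for an index rebuild */
    return strcmp(template_locale, new_locale) == 0;
}
```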
[HACKERS] pg_tablespace_size()
Following on from Hiroshi's earlier post: http://archives.postgresql.org/pgsql-hackers/2007-10/msg00324.php Per http://archives.postgresql.org/pgsql-hackers/2007-08/msg01018.php, superusers should be able to use pg_tablespace_size() on any tablespace, however when logged in as postgres on a beta1 server, I get: -- Executing query: select pg_tablespace_size('pg_global'); ERROR: permission denied for tablespace pg_global Can this be fixed please? Regards, Dave ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Some questions about mammoth replication
On Fri, 2007-10-12 at 12:39, Alexey Klyukin wrote: > Hannu Krosing wrote: > > > > We have hooks in executor calling our own collecting functions, so we > > > don't need the trigger machinery to launch replication. > > > > But where do you store the collected info - in your own replication_log > > table, or do reuse data in WAL you extract it on master befor > > replication to slave (or on slave after moving the WAL) ? > > We don't use either a log table in database or WAL. The data to > replicate is stored in disk files, one per transaction. Clever :) How well does it scale ? That is, at what transaction rate can your replication keep up with database ? > As Joshua said, > the WAL is used to ensure that only those transactions that are recorded > as committed in WAL are sent to slaves. How do you force correct commit order of applying the transactions ? > > > > > > Do you make use of snapshot data, to make sure, what parts of WAL log > > > > are worth migrating to slaves , or do you just apply everything in WAL > > > > in separate transactions and abort if you find out that original > > > > transaction aborted ? > > > > > > We check if a data transaction is recorded in WAL before sending > > > it to a slave. For an aborted transaction we just discard all data > > > collected > > > from that transaction. > > > > Do you duplicate postgresql's MVCC code for that, or will this happen > > automatically via using MVCC itself for collected data ? > > Every transaction command that changes data in a replicated relation is > stored on disk. PostgreSQL MVCC code is used on a slave in a natural way > when transaction commands are replayed there. Do you replay several transaction files in the same transaction on slave ? Can you replay several transaction files in parallel ? > > How do you handle really large inserts/updates/deletes, which change say > > 10M > > rows in one transaction ? > > We produce really large disk files ;). 
When a transaction commits - a > special queue lock is acquired and transaction is enqueued to a sending > queue. > Since the locking mode for that lock is exclusive a commit of a > very large transaction would delay commits of other transactions until > the lock is held. We are working on minimizing the time of holding this > lock in the new version of Replicator. Why does it take longer to queue a large file ? do you copy data from one file to another ? > > > > Do you extract / generate full sql DML queries from data in WAL logs, or > > > > do you apply the changes at some lower level ? > > > > > > We replicate the binary data along with a command type. Only the data > > > necessary to replay the command on a slave are replicated. > > > > Do you replay it as SQL insert/update/delete commands, or directly on > > heap/indexes ? > > We replay the commands directly using heap/index functions on a slave. Does that mean that the table structures will be exactly the same on both master and slave ? That is, do you replicate a physical table image (maybe not including transaction ids on master) ? Or do you just use lower-level versions of INSERT/UPDATE/DELETE ? - Hannu ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] ECPG regression tests
On Tue, Oct 09, 2007 at 05:44:22PM -0400, Andrew Dunstan wrote: > > > Andrew Dunstan wrote: > > > > > >Magnus Hagander wrote: > >>>(Hint: if you don't put the PlatformSDK directories first in the > >>>INCLUDE and LIB lists bad and inexplicable things can happen.) > >>> > >>>Pick up the latest version of run_build.pl in CVS if you want to run > >>>this in your buildfarm animal now. > >>> > >>>A release will be forthcoming very soon. > >>> > >> > >>I put it in, but it doesn't work. It works when running ecpg tests > >>manual, > >>but from run_build I get: > >>http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=skylark&dt=2007-10-09%20090814&stg=ecpg-check > >> > >> > >> > >>which seems similar to what you had before. How did you fix that > >>one? Is > >>that the one requiring a reorder? > >> > >> > >> > > > >Yes. compare its build_env INCLUDE and LIB and possibly PATH values > >with those of red_bat: > >http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=red_bat&dt=2007-10-09%2004:30:04 > > > > > > > >BTW, following some advice I found on the net those PlatformSDK > >directories were copied manually from the SDK install, back when I > >installed MSVC. > > You also appear to have simply a configuration bug - there is a missing > semicolon or two in: > > LIB' => 'D:\\Program Files (x86)\\Microsoft Visual Studio > 8\\VC\\ATLMFC\\LIB;D:\\Program Files (x86)\\Microsoft Visual Studio > 8\\VC\\LIBD:\\Program Files (x86)\\Microsoft Visual Studio > 8\\VC\\PlatformSDK\\lib;D:\\Program Files (x86)\\Microsoft Visual Studio > 8\\SDK\\v2.0\\libD:\\Program Files (x86)\\Microsoft Visual Studio .NET > 2003\\SDK\\v1.1\\Lib\\;C:\\Program Files\\SQLXML 4.0\\bin\\', Gah. That was it, I had . instead of , in the config file :-( Fixing that made the ecpg regression tests pass, didn't have to reorder or change anything at all. //Magnus ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Including Snapshot Info with Indexes
Andreas Joseph Krogh wrote: > Will $SUBJECT make it possible for count(*) to use index? This is a much > wanted feature. Yes, both the DSM approach and the approach proposed by Gokul. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Including Snapshot Info with Indexes
On Friday 12 October 2007 11:49:17 Heikki Linnakangas wrote: > Andreas Joseph Krogh wrote: > > Will $SUBJECT make it possible for count(*) to use index? This is a much > > wanted feature. > > Yes, both the DSM approach and the approach proposed by Gokul. Good. -- Andreas Joseph Krogh <[EMAIL PROTECTED]> Senior Software Developer / Manager +-+ OfficeNet AS| The most difficult thing in the world is to | Karenslyst Allé 11 | know how to do a thing and to watch | PO. Box 529 Skøyen | somebody else doing it wrong, without | 0214 Oslo | comment.| NORWAY | | Tlf:+47 24 15 38 90 | | Fax:+47 24 15 38 91 | | Mobile: +47 909 56 963 | | +-+ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs wrote: On Fri, 2007-10-12 at 01:24 -0400, Alvaro Herrera wrote: Yes, I think it is easy to mark the "is for xid wraparound" bit in the WorkerInfo struct and have the cancel work only if it's off. However, what I think should happen is that the signal handler for SIGINT in a worker for xid wraparound should not cancel the current vacuum. Instead turn it into a no-op, if possible. That way we also disallow a user from cancelling vacuums for xid wraparound. I think he can do that with pg_cancel_backend, and it could be dangerous. I think that is dangerous too because the user may have specifically turned AV off. That anti-wraparound vacuum might spring up right in a busy period and start working its way through many tables, all of which cause massive writes to occur. That's about as close to us causing an outage as I ever want to see. We need a way through that to allow the user to realise his predicament and find a good time to VACUUM. I never want to say to anybody "nothing you can do, just sit and watch, your production system will be working again in no time. Restart? no that won't work either." You are probably right that VACUUM going full-steam is a bad idea in most situations. Except for anti-wraparound vacuum, cancellation seems the most reasonable thing to do. Because autovacuum will usually pick up the table in time again. The only problem I would see is if someone has an application that does a lot of schema changes (doesn't sound like a good idea anyway). In that case they would better issue manual vacuums on such tables. Best Regards Michael Paesold ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Timezone database changes
On Thursday, 11 October 2007, Gregory Stark wrote:
> Thinking of it as UTC is the wrong way to think about it. A birth occurred
> at a specific moment in time. You want to record that precise moment, not
> what it happened to show on the clock at the time. If the clock turns out
> to have been in the wrong timezone the birth isn't going to move.
>
> The use case for storing a local timestamp with a timezone attached is for
> things like appointments. If the time zone rules change you would want the
> appointment to move with them, not to stay at the same moment in time.

The difference here is that one occurred in the past and one is planned for
the future. Appointments in the past will still stay at the same time even
if the time zone rules change afterwards.

The supercorrect way to handle this would likely be to introduce some sort
of time-zone rules changeset that describes "as of point in time X, the time
zone designation ABC changes in the following way", which would then fix up
all data items past point X in the database in some clever way. Obviously
this is quite a bit too much for us to manage.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs <[EMAIL PROTECTED]> writes: > On Fri, 2007-10-12 at 11:26 -0400, Tom Lane wrote: >> Why not SIGINT? > I must be missing something. How would I tell the difference between > manual and automatic cancels if we use SIGINT for both cases? Why do you need to? I thought the plan was that DeadlockCheck would only try to signal autovac workers. regards, tom lane
Re: [HACKERS] pg_tablespace_size()
I wrote:
> [ squint... ] There is something wrong here, because a superuser should
> certainly pass the aclcheck test. I don't know where the bug is but
> this is not the correct fix.

OK, after looking, the issue is this wart in pg_tablespace_aclmask():

    /*
     * Only shared relations can be stored in global space; don't let even
     * superusers override this
     */
    if (spc_oid == GLOBALTABLESPACE_OID && !IsBootstrapProcessingMode())
        return 0;

    /* Otherwise, superusers bypass all permission checking. */

There are a number of ways that we could deal with this:

* Just remove the above-quoted lines. Superusers should be allowed to shoot
themselves in the foot. (I'm not actually sure that there would be any bad
consequences from putting an ordinary table into pg_global anyway. I think
I wrote the above code in fear that some parts of the system would equate
reltablespace = pg_global with relisshared, but AFAICS that's not the case
now.)

* Remove the above lines and instead put a defense into heap_create. This
might be better design anyway since a more specific error could be
reported.

* Leave aclchk.c as-is and apply Magnus' patch to allow superusers to
bypass the check in pg_tablespace_size.

* Decide that we should allow anyone to do pg_tablespace_size('pg_global')
and put in a special wart for that in dbsize.c. This wasn't part of the
original agreement but maybe there's a case to be made for it.

Thoughts?

regards, tom lane
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
On Fri, 2007-10-12 at 11:44 +0200, Michael Paesold wrote:
> Simon Riggs wrote:
> > On Fri, 2007-10-12 at 01:24 -0400, Alvaro Herrera wrote:
> >> Yes, I think it is easy to mark the "is for xid wraparound" bit in the
> >> WorkerInfo struct and have the cancel work only if it's off.
> >>
> >> However, what I think should happen is that the signal handler for
> >> SIGINT in a worker for xid wraparound should not cancel the current
> >> vacuum. Instead turn it into a no-op, if possible. That way we also
> >> disallow a user from cancelling vacuums for xid wraparound. I think he
> >> can do that with pg_cancel_backend, and it could be dangerous.
> >
> > I think that is dangerous too because the user may have specifically
> > turned AV off. That anti-wraparound vacuum might spring up right in a
> > busy period and start working its way through many tables, all of which
> > cause massive writes to occur. That's about as close to us causing an
> > outage as I ever want to see. We need a way through that to allow the
> > user to realise his predicament and find a good time to VACUUM. I never
> > want to say to anybody "nothing you can do, just sit and watch, your
> > production system will be working again in no time. Restart? no that
> > won't work either."
>
> You are probably right that VACUUM going full-steam is a bad idea in most
> situations. Except for anti-wraparound vacuum, cancellation seems the most
> reasonable thing to do. Because autovacuum will usually pickup the table in
> time again.

Yeh, if we do have to do the second emergency anti-wraparound, then that
should be at full speed, since there's nothing else to do at that point.

> The only problem I would see is if someone has an application that does a
> lot of schema changes (doesn't sound like a good idea anyway). In that case
> they would better issue manual vacuums on such tables.

I can't see a use case for regular DDL as part of an application, on an
otherwise integral table (lots of updates and deletes).

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [HACKERS] Including Snapshot Info with Indexes
I agree with that. I will go ahead and do a test implementation and share the results with everyone. Thanks, Gokul. On 10/12/07, Tom Lane <[EMAIL PROTECTED]> wrote: > > Andreas Joseph Krogh <[EMAIL PROTECTED]> writes: > > Will $SUBJECT make it possible for count(*) to use index? This is a much > > wanted feature. > > If you mean "count(*) will become instantaneous", no it won't. It would > get faster, but probably not by more than a factor of 10 or so, > corresponding to the size ratio between index and table. Remember that > the proposal is going to bloat indexes pretty badly: a simple index on > an int4 column will double in size, more or less. So that ratio will > be much less favorable than it is today. > > Personally I think the bloat problem will doom this entire approach. > The distributed costs of that will outweigh the speedups that can be > achieved for specific queries. The OP is free to try to prove this > fear wrong, but no amount of discussion will constitute such a proof; > only a testable implementation. > > regards, tom lane
Re: [HACKERS] Locales and Encodings
On Oct 12, 2007, at 10:19 , Gregory Stark wrote: It would make Postgres inconsistent and less integrated with the rest of the OS. How do you explain that Postgres doesn't follow the system's configurations and the collations don't agree with the system collations? How is this fundamentally different from PostgreSQL using a separate users/roles system than the OS? Michael Glaesemann grzm seespotcode net
Re: [HACKERS] pg_tablespace_size()
Dave Page <[EMAIL PROTECTED]> writes: > In the message I quoted at the end of the long discussion on the topic > you assured me it would work for superusers. > Is there any reason the superuser shouldn't see the size of pg_global? Oh, you were superuser? Then something's broken, agreed. regards, tom lane
Re: [HACKERS] pg_tablespace_size()
Tom Lane wrote: > Dave Page <[EMAIL PROTECTED]> writes: >> select pg_tablespace_size('pg_global'); >> ERROR: permission denied for tablespace pg_global > >> Can this be fixed please? > > What's to be fixed? That's exactly the result I'd expect given the > previously-agreed-to behavior for pg_tablespace_size. In the message I quoted at the end of the long discussion on the topic you assured me it would work for superusers. Is there any reason the superuser shouldn't see the size of pg_global? Regards, Dave.
Re: [HACKERS] Locale + encoding combinations
Tom Lane wrote:
> That still leaves us with the problem of how to tell whether a locale
> spec is bad on Windows. Judging by your example, Windows checks whether
> the code page is present but not whether it is sane for the base locale.
> What happens when there's a mismatch --- eg, what encoding do system
> messages come out in?

I'm not sure how to test that specifically, but it seems that accented
characters simply fall back to their undecorated equivalents if the
encoding is not appropriate, eg:

[EMAIL PROTECTED]:~$ ./setlc French_France.1252
Locale: French_France.1252
The date is: sam. 01 of août 2007

[EMAIL PROTECTED]:~$ ./setlc French_France.28597
Locale: French_France.28597
The date is: sam. 01 of aout 2007

(the encodings used there are WIN1252 and ISO8859-7 (Greek)). I'm happy to
test further if you can suggest how I can figure out the encoding actually
output.

Regards, Dave.
Re: [HACKERS] pg_restore oddity?
Heikki Linnakangas wrote:
> Mario Weilguni wrote:
> > I cannot use "-1" for performance, because some gist stuff has changed
> > and the restore fails. But there seems to be no option for pg_restore
> > to use transactions for data restore, so it's very very slow (one
> > million records, each obviously in its own transaction - because a
> > separate session "select count(1) from logins" shows a growing number).
>
> By default, pg_dump/pg_restore uses a COPY command for each table, and
> each COPY executes as a single transaction, so you shouldn't see the row
> count growing like that. Is the dump file in --inserts format?

You are right, it was my fault. I was confused about the pg_dump syntax and
used "-d" (because pg_restore needs it for the destination database, not
the dump itself), so it was using "--inserts". Everything is working fine.

I've done dump/restore cycles hundreds of times, and now such a mistake. I
can't believe it. Seems like I need to take some vacations.

Thanks for the help!
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs <[EMAIL PROTECTED]> writes:
> I think the best way to handle this is to have two limits.
> First limit attempts to autovacuum, but can be cancelled.
> When we hit second limit, sometime later, then autovacuum cannot be
> cancelled.

This seems like uselessly complex overdesign. Remember that we still have
the 8.1-era mechanism of warning messages and eventual shutdown of normal
backends as the wrap point gets closer. A DBA who persists in cancelling
antiwraparound vacuums will start getting nasty warnings, and if he still
persists will eventually be forced to do the vacuuming manually in a
standalone backend (which will go at full speed btw).

I concur with Simon's original argument that it's a terrible idea to make a
background vacuum cancel-proof. Pretty much the entire argument for having
vacuum at all (as opposed to say Oracle-style rollback) is that it's good
to be able to postpone maintenance overhead to a time of the DBA's
choosing. I don't see a reason why that doesn't apply to anti-wraparound
vacuuming too. If he postpones it too long, the recovery will get
unpleasant, but that's his choice.

An independent reason for minimizing complexity in this area is that it's a
corner case that (I trust) will never be hit by most installations, which
means it'll never get much field testing. Poorly tested complicated code is
dangerous.

In short: put in the automatic lock cancel for regular vacuums, disable it
for antiwraparound vacuums, but don't disable manual cancels; and
definitely don't invent a complicated new set of behaviors around that.

regards, tom lane
Re: [HACKERS] Locale + encoding combinations
Trevor Talbot wrote: > The encoding output is the one you specified. OK. > Keep in mind, > underneath Windows is mostly working with Unicode, so all characters > exist and the locale rules specify their behavior there. The encoding > is just the byte stream it needs to force them all into after doing > whatever it does to them. As you've seen, it uses some sort of > best-fit mapping I don't know the details of. (It will drop accent > marks and choose characters with similar shape where possible, by > default.) Right, that makes sense. The codepages used by setlocale etc. are just translation tables to/from the internal unicode representation. > I think it's a bit more complex for input/transform cases where you > operate on the byte stream directly without intermediate conversion to > Unicode, which is why UTF-8 doesn't work as a codepage, but again I > don't have the details nearby. I can try to do more digging if > needed. It does (sort of) work as a codepage, it just doesn't have the NLS file to define how things like UPPER() and LOWER() should work. Regards, Dave
Re: [HACKERS] pg_tablespace_size()
Dave Page <[EMAIL PROTECTED]> writes: > select pg_tablespace_size('pg_global'); > ERROR: permission denied for tablespace pg_global > Can this be fixed please? What's to be fixed? That's exactly the result I'd expect given the previously-agreed-to behavior for pg_tablespace_size. regards, tom lane
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
On Fri, 2007-10-12 at 10:19 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > I think the best way to handle this is to have two limits. > > First limit attempts to autovacuum, but can be cancelled. > > When we hit second limit, sometime later, then autovacuum cannot be > > cancelled. > > This seems like uselessly complex overdesign. OK > In short: put in the automatic lock cancel for regular vacuums, disable > it for antiwraparound vacuums, but don't disable manual cancels; and > definitely don't invent a complicated new set of behaviors around that. That seemed more complex when I thought about that, but if we just use SIGUSR2 for automatic cancels then this would be very simple. Or did you have another design? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Locales and Encodings
"Peter Eisentraut" <[EMAIL PROTECTED]> writes:
> On Friday, 12 October 2007, Martijn van Oosterhout wrote:
>> Where we're stuck is that we can't agree on a
>> source of locale data. People don't want the ICU or glibc data and
>> there's no other source as readily available.
>
> What were the objections to ICU?

It's introducing a new dependency to do something fundamental to Postgres,
one that's larger than all of Postgres. It would make Postgres inconsistent
and less integrated with the rest of the OS. How do you explain that
Postgres doesn't follow the system's configurations and the collations
don't agree with the system collations?

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Re: [HACKERS] Locales and Encodings
On Friday, 12 October 2007, Gregory Stark wrote:
> It would make Postgres inconsistent and less integrated with the rest of
> the OS. How do you explain that Postgres doesn't follow the system's
> configurations and the collations don't agree with the system collations?

We already have our own encoding support (for better or worse), and I don't
think having one's own locale support would be that much different.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] Locales and Encodings
Martijn van Oosterhout <[EMAIL PROTECTED]> writes:
> On Fri, Oct 12, 2007 at 03:28:26PM +0100, Gregory Stark wrote:
>> I think realistically we're basically waiting for strcoll_l to become
>> standardized by POSIX so we can depend on it.

> I think we could be waiting forever then.

strcoll is only a small fraction of the problem anyway. The <ctype.h> and
<wctype.h> functions are another chunk of it, and then there's the issues
of system message spellings, LC_MONETARY info, etc etc.

regards, tom lane
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
On Fri, 2007-10-12 at 12:42 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Fri, 2007-10-12 at 11:26 -0400, Tom Lane wrote:
> >> Why not SIGINT?
>
> > I must be missing something. How would I tell the difference between
> > manual and automatic cancels if we use SIGINT for both cases?
>
> Why do you need to? I thought the plan was that DeadlockCheck would
> only try to signal autovac workers.

...thinks...

On Fri, 2007-10-12 at 10:19 -0400, Tom Lane wrote:
> In short: put in the automatic lock cancel for regular vacuums, disable
> it for antiwraparound vacuums, but don't disable manual cancels; and
> definitely don't invent a complicated new set of behaviors around that.

So you mean "don't allow automatic cancels of manually submitted VACUUMs".
I thought you meant don't disable manually-requested cancels of
autovacuums. Can you explain further what you meant by "don't disable
manual cancels"?

(laughs) wish I had a pound for every time we'd misunderstood each other,
or at least, a pound every time I misunderstood you. :-)

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
[HACKERS] fmgr_info_cxt_security() screwing memory ?
Hi, I'm calling an arbitrary user-defined function from the backend.
Although I can do it via FunctionCallInvoke, I have a weird problem when
calling fmgr_info. The call results in an argument variable (eventType)
being cleared. A gdb hardware watch says that the variable is modified by
fmgr_info_cxt_security (fmgr.c:185): "finfo->fn_extra = NULL"; Any ideas of
what is happening?

For simplicity, in the following snippet I have hardcoded foid to some
(PGPSQL) function id and removed the rest of the call-related statements.

static void
execEventTypeFunc(char *eventType, Oid funcId)
{
    FmgrInfo   *finfo;
    FunctionCallInfoData fcinfo;
    Datum       result;
    Oid         foid = 17283;

    finfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo));
    fmgr_info(foid, finfo);
    ...
}

Thanks,
Luis Vargas
Re: [HACKERS] fmgr_info_cxt_security() screwing memory ?
Luis Vargas <[EMAIL PROTECTED]> writes: > Hi, I'm calling an arbitrary user-defined function from the backend. > Although I can do it via FunctionCallInvoke, I have a weird problem when > calling fmgr_info. The call results in a argument variable (eventType) > being cleared. A gdb hardware watch says that the variable is modified by > fmgr_info_cxt_security (fmgr.c:185): "finfo->fn_extra = NULL"; Any ideas of > what is happening? One wonders whether eventType points at already-pfreed memory. If not, the problem must lie somewhere in the code you removed from the example. regards, tom lane
[HACKERS] Seems we need a post-beta1 initdb already
As Martin Pitt pointed out in this pgsql-bugs thread
http://archives.postgresql.org/pgsql-bugs/2007-10/msg00089.php
we have managed to create an ABI break between 8.2 and 8.3 libpq by
renumbering encoding IDs in pg_wchar.h. Although perhaps this does not hurt
any third-party clients, it does break our own initdb and psql programs,
which turn out to be dependent on libpq and the backend having the same
numbering. Going forward we should try to fix things so that the exact
values of those IDs need not be considered part of libpq's ABI, but that
doesn't get us out of the fact that 8.2 psql will fail if you try to use it
with CVS HEAD libpq.so.

It seems that we are faced with a choice of two evils:

1. Accept that there's an ABI break and increment libpq.so's major version
number for 8.3. This will be a PITA for packagers, who will have to carry a
compatibility package to provide 8.2 libpq.so.

2. Renumber 8.3's encoding IDs to preserve compatibility with the 8.2
values. It turns out that we can do that, but we will have to force initdb
because the contents of pg_database.encoding will change.

I'm of the opinion that #2 is the lesser evil, but maybe I'm overly
influenced by my Red Hat packaging responsibilities --- I'll personally
have to spend time on a compatibility package if we go with #1. Other
opinions out there?

Also, if we do #2 it means that we have the option to resolve the
contrib/txid mess by pushing txid into the core backend before beta2. Any
votes pro or con on that?

regards, tom lane
Re: [HACKERS] Seems we need a post-beta1 initdb already
Tom Lane wrote: I'm of the opinion that #2 is the lesser evil, but maybe I'm overly influenced by my Red Hat packaging responsibilities --- I'll personally have to spend time on a compatibility package if we go with #1. Other opinions out there? Also, if we do #2 it means that we have the option to resolve the contrib/txid mess by pushing txid into the core backend before beta2. Any votes pro or con on that? #2 by all means. Post-beta initdb is mildly unfortunate; compatibility packages are to be avoided far more. +1 to put txid in core, where it belongs. cheers andrew
Re: [HACKERS] Seems we need a post-beta1 initdb already
On Friday 12 October 2007 15:41:58 Tom Lane wrote: > As Martin Pitt pointed out in this pgsql-bugs thread > http://archives.postgresql.org/pgsql-bugs/2007-10/msg00089.php > we have managed to create an ABI break between 8.2 and 8.3 libpq > by renumbering encoding IDs in pg_wchar.h. Although perhaps this > does not hurt any third-party clients, it does break our own initdb > and psql programs, which turn out to be dependent on libpq and the > backend having the same numbering. Going forward we should try to > fix things so that the exact values of those IDs need not be considered > part of libpq's ABI, but that doesn't get us out of the fact that 8.2 > psql will fail if you try to use it with CVS HEAD libpq.so. > > It seems that we are faced with a choice of two evils: > > 1. Accept that there's an ABI break and increment libpq.so's major > version number for 8.3. This will be a PITA for packagers, who will > have to carry a compatibility package to provide 8.2 libpq.so. > > 2. Renumber 8.3's encoding IDs to preserve compatibility with the > 8.2 values. It turns out that we can do that, but we will have to > force initdb because the contents of pg_database.encoding will change. > > I'm of the opinion that #2 is the lesser evil, but maybe I'm overly > influenced by my Red Hat packaging responsibilities --- I'll personally > have to spend time on a compatibility package if we go with #1. > Other opinions out there? > > Also, if we do #2 it means that we have the option to resolve the > contrib/txid mess by pushing txid into the core backend before beta2. > Any votes pro or con on that? I vote that adding txid to the backend now will be of better value for all parties involved than not, given it's coming without any particular pain. (though I may be a bit biased on this). By having it in core it's one less thing that skytools/slony needs to detect and error out on with an message saying install contrib/txid during their install phases. 
Not to mention this makes replication that much more likely to be available
in shared hosting environments, whose operators by all reports loathe
installing contrib for their customers.

> regards, tom lane
Re: [HACKERS] Seems we need a post-beta1 initdb already
On Oct 12, 2007, at 17:41 , Tom Lane wrote: Also, if we do #2 it means that we have the option to resolve the contrib/txid mess by pushing txid into the core backend before beta2. Any votes pro or con on that? +1 Michael Glaesemann grzm seespotcode net
Re: [HACKERS] Seems we need a post-beta1 initdb already
Hi,

On Fri, 2007-10-12 at 18:41 -0400, Tom Lane wrote:
> I'm of the opinion that #2 is the lesser evil, but maybe I'm overly
> influenced by my Red Hat packaging responsibilities --- I'll personally
> have to spend time on a compatibility package if we go with #1.
> Other opinions out there?
>
> Also, if we do #2 it means that we have the option to resolve the
> contrib/txid mess by pushing txid into the core backend before beta2.
> Any votes pro or con on that?

I am also for #2 (and yes, I want to avoid yet another compat rpm
problem...)

Regards,
--
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
Re: [HACKERS] Hash index todo list item
On Wed, Sep 05, 2007 at 03:07:03PM -0500, Kenneth Marshall wrote:
> On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote:
> > Dear PostgreSQL Hackers:
> >
> > After following the hackers mailing list for quite a while,
> > I am going to start investigating what will need to be done
> > to improve hash index performance. Below are the pieces of
> > this project that I am currently considering:
> >
> > 1. Characterize the current hash index implementation against
> >    the BTree index, with a focus on space utilization and
> >    lookup performance against a collection of test data. This
> >    will give a baseline performance test to evaluate the impact
> >    of changes. I initially do not plan to bench the hash creation
> >    process since my initial focus will be on lookup performance.
>
> Here are very basic results for a table with 1.6m entries:
>
> postgres=# CREATE TABLE dict (word varchar(100));
> CREATE TABLE
> postgres=# COPY dict FROM '/tmp/words';
> COPY 1648379
> postgres=# select count(*) from dict;
>   count
> ---------
>  1648379
> (1 row)
> Time: 11187.418 ms
> postgres=# select count(*) from dict;
>   count
> ---------
>  1648379
> (1 row)
> Time: 6040.912 ms
> postgres=# CREATE INDEX wordhash ON dict USING hash (word);
> CREATE INDEX
> Time: 11108707.160 ms
> postgres=# select * from dict where word = 'avatar';
>   word
> --------
>  avatar
> (1 row)
> Time: 79.823 ms
> postgres=# select * from dict where word = 'zebra';
>  word
> -------
>  zebra
> (1 row)
> Time: 9.864 ms
> postgres=# select * from dict where word = 'turkey';
>   word
> --------
>  turkey
> (1 row)
> Time: 18.418 ms
> Time: 1.045 ms
> Time: 1.257 ms
> Time: 1.080 ms
> postgres=# CREATE INDEX wordbtree ON dict USING btree (word);
> CREATE INDEX
> Time: 25438.884 ms
> postgres=# select * from dict where word = 'avatar';
>   word
> --------
>  avatar
> (1 row)
> Time: 13.400 ms
> postgres=# select * from dict where word = 'zebra';
>  word
> -------
>  zebra
> (1 row)
> Time: 1.173 ms
> postgres=# select * from dict where word = 'turkey';
>   word
> --------
>  turkey
> (1 row)
> Time: 1.186 ms
> Time: 1.103 ms
> Time: 1.099 ms
> Time: 1.108 ms
>
> Size of table = 87556096
> Size of hash index = 268451840
> Size of btree index = 53510144
>
> From my very small sample on an unloaded machine, a hash index lookup
> took the least amount of time. It had a much larger initial time which
> could be attributable to cache population effects. The size is 5X that
> of the Btree index. I will continue to improve the test suite as more
> granularity is needed. If anyone has a good data generator, please let
> me know. Otherwise I will just roll my own.
>
> Regards,
> Ken

I have just re-run the previous benchmark against 8.3beta1 (versus 8.2),
including the hash index sorted build patch, and the results are below:

test=# CREATE TABLE dict (word varchar(100));
CREATE TABLE
Time: 24.463 ms
test=# COPY dict FROM '/tmp/cracklib-words';
LOG:  checkpoints are occurring too frequently (21 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
COPY 1648379
Time: 44470.235 ms
test=# select count(*) from dict;
  count
---------
 1648379
(1 row)
Time: 4553.924 ms
test=# CREATE INDEX wordhash ON dict USING hash (word);
CREATE INDEX
Time: 85905.960 ms
test=# select * from dict where word = 'avatar';
  word
--------
 avatar
(1 row)
Time: 592.906 ms
test=# select * from dict where word = 'zebra';
 word
-------
 zebra
(1 row)
Time: 21.458 ms
test=# select * from dict where word = 'turkey';
  word
--------
 turkey
(1 row)
Time: 22.532 ms
Time: 1.224 ms
Time: 1.200 ms
Time: 1.264 ms
test=# CREATE INDEX wordbtree ON dict USING btree (word);
CREATE INDEX
Time: 33714.874 ms
test=# select * from dict where word = 'avatar';
  word
--------
 avatar
(1 row)
Time: 24.296 ms
test=# select * from dict where word = 'zebra';
 word
-------
 zebra
(1 row)
Time: 1.400 ms
test=# select * from dict where word = 'turkey';
  word
--------
 turkey
(1 row)
Time: 28.302 ms
Time: 1.284 ms
Time: 1.313 ms

Size of table = 69877760
Size of hash index = 268451840
Size of btree index = 48570368

I just wanted to have this baseline for future patches since 8.3 is just
around the bend. As expected, the sorted build process is a huge
improvement over the unsorted original.

Regards,
Ken Marshall
Re: [HACKERS] Another Idea: Try Including snapshot with TOAST (was: Including Snapshot Info with Indexes)
Hi Simon/Hannu,

a) I agree with storing the ctid in place of the oid. That would definitely
improve the performance. I guess it would also remove the restriction on
the size of TOASTABLE ATTRIBUTES. I guess different chunks have to be
linked together, without the index.

b) But I don't understand how storing the visibility info in the TOAST
table would help you. In that case you may need to update it for every
delete/update happening in the main table. The only thing it would help
with is if someone wants to do a full table scan on the TOAST table alone.
Maybe vacuum of TOAST tables can be done independently. But do you think it
is worth the loss of performance in update/delete?

Thanks,
Gokul.

On 10/8/07, Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-10-08 at 14:58 +0300, Hannu Krosing wrote:
> > 1. get rid of indexes for TOAST tables
> >
> > instead of TOAST tuple id store CTID of first TOAST block directly, and
> > use something like skip lists inside the TOAST block headers to access
> > next TOAST tuples.
>
> It should be possible to optimise TOAST for when there is just a single
> chunk that is toasted. Since we often (and by default) compress data
> stored in TOAST tables this would be a frequently used optimisation.
>
> Instead of storing the TOAST OID, which is then looked-up in the index
> to find the TOAST tid, we can just store the tid directly in the toast
> pointer on the main heap. That way we'll get faster read and write
> access for a good proportion of rows by avoiding the toast index and
> going straight to the toast heap.
>
> We'd need a different kind of toast pointer which would be 2 bytes
> longer than the normal pointer. I think we have a spare flag bit to
> indicate this.
>
> That's just a rough sketch, I've not checked the details on this.
>
> --
> Simon Riggs
> 2ndQuadrant  http://www.2ndQuadrant.com
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/12/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>
> Andreas Joseph Krogh <[EMAIL PROTECTED]> writes:
> > Will $SUBJECT make it possible for count(*) to use index? This is a much
> > wanted feature.
>
> If you mean "count(*) will become instantaneous", no it won't. It would
> get faster, but probably not by more than a factor of 10 or so,
> corresponding to the size ratio between index and table. Remember that
> the proposal is going to bloat indexes pretty badly: a simple index on
> an int4 column will double in size, more or less. So that ratio will
> be much less favorable than it is today.

I accept that the indexes will be bigger with this approach. You might
need more disk space and more memory to accommodate the same amount of
information. But I think disk and memory costs have come down a lot;
people can afford to buy more disk and memory. The one fact that remains
true is that the disk, the last mechanical device, is slow at random
I/O. So this proposal is aimed at spending more memory and disk space to
reduce random I/Os.

Say we are accommodating 200 tuples per index page in today's index (for
a typical configuration); as you said, in the worst case (only one
indexed field) we would be occupying 100 tuples per index page. But we
would take away probably 100 random I/Os (say with a bitmap scan it
reduces to 75). 1 GB of memory is around $100 and 1 GB of disk is around
$1, but one hour of a performance tuner would cost around $200 :)). So
that's the trade-off for enterprises, if they want to choose between the
two kinds of indexes.

> Personally I think the bloat problem will doom this entire approach.
> The distributed costs of that will outweigh the speedups that can be
> achieved for specific queries. The OP is free to try to prove this
> fear wrong, but no amount of discussion will constitute such a proof;
> only a testable implementation.
>
>         regards, tom lane
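Gokul's trade-off above can be put into rough numbers (a back-of-the-envelope sketch using his example figures of 200 vs. 100 entries per page; real per-page overheads and key widths are ignored):

```python
import math

PAGE_SIZE = 8192  # default PostgreSQL block size

# Example figures from the mail: ~200 index entries per page today,
# halved to ~100 once each entry also carries snapshot info.
entries_per_page_today = 200
entries_per_page_bloated = 100

matches = 100  # rows satisfying the query, assumed adjacent in the index

# Extra index pages read because of the bloat.
pages_today = math.ceil(matches / entries_per_page_today)
pages_bloated = math.ceil(matches / entries_per_page_bloated)
extra_index_ios = pages_bloated - pages_today

# Random heap visits avoided if visibility can be checked in the index
# alone (the mail estimates ~75 saved under a bitmap scan).
heap_ios_saved = matches

print(extra_index_ios, heap_ios_saved)
```

On these illustrative numbers the extra sequential index reads are tiny next to the random heap fetches avoided, which is exactly the bet being argued over: Tom's objection is that the bloat cost is paid by *every* index user, while the saving only shows up for queries that would otherwise do many random heap visits.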
Re: [HACKERS] Another Idea: Try Including snapshot with TOAS (was: Including Snapshot Info with Indexes)
On Sat, 2007-10-13 at 11:12 +0530, Gokulakannan Somasundaram wrote:
> b) But i don't understand how storing the visibility info in TOAST
> table would help you. In that case you may need to update it for every
> delete/ update happening in the main table. Only thing, it would help
> is if someone wants to do a full table scan on TOAST table alone. May
> be Vacuum of TOAST tables can be done independently. But do you think
> it is worth the loss of performance in Update/Delete?

No

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com