Re: CRC was: Re: [HACKERS] beta testing version
Horst Herb wrote: This may be implemented very fast (if someone points me where I can find CRC func). And I could implement "physical log" till next monday. I have been experimenting with CRCs for the past 6 months in our database for internal logging purposes. Downloaded a lot of hash libraries, tried different algorithms, and implemented a few myself. Which algorithm do you want? Have a look at the openssl libraries (www.openssl.org) for a start - if you don't find what you want, let me know. As the logging might include large data blocks, especially now that we can TOAST our data, I would strongly suggest to use strong hashes like RIPEMD or MD5 instead of CRC-32 and the like. Sure, it takes more time to calculate and more space on the hard disk, but then: a database without data integrity (and means of _proving_ integrity) is pretty worthless. The choice of hash algorithm could be made a compile-time switch quite easily, I guess. - Hannu
[HACKERS] Re: [DOCS] organization file
hi, yes I was talking about the first: the file format of tables. I was reading about different file organizations (structures): sequential, heap, ring, multi-ring, etc... AFAIK most of the files are sequential in nature, with some record updates happening in the middle to mark records as "obsolete". So data is added on to the end, which is why running VACUUM is so important. I looked for some info in the documentation but I didn't find anything; I'm also interested in the recovery system of PostgreSQL. I hope you can give me some hints about where I can look for it. In previous releases, since all files are written sequentially, the recovery system is very simple. For the upcoming 7.1 release with WAL, there is likely more to it, but I'm not familiar with the details. Somebody want to write a (short) description? I'll include it in the docs... - Thomas
Re: [HACKERS] RFC C++ Interface
Randy Jonasz writes: The following are my ideas for implementing the C++ API: My feeling is that if you want to create a new API, don't. Instead, imitate an existing one. For your purposes you could either take the C++ API of another RDBMS product, try to shoehorn the SQL CLI onto C++, write a C++ version of JDBC, or something of that nature. Designing a complete and good API from scratch is a really painful job when done well, and given that the market for C++ combined with PostgreSQL traditionally hasn't exactly been huge, you need all the head starts you can get. -- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] copy from stdin limits
Greetings, and thank you for your reply. OK, I have 4 tables, and a view on a merge of the 4. I have a trigger on insert into table 3, and a trigger on insert into the view, which basically just takes the input data, does a few selects on the tables, and inserts the appropriate portions of the data into each table as necessary. When I copy up to ~ 1000 lines of a file into this view, everything goes fine. More than that, and after a while, CPU activity stops, disk activity stops, and the job hangs indefinitely. Control-C then gives the message "You have to get out of copy state yourself". I can provide the schema if needed. Thanks for your help! Tom Lane [EMAIL PROTECTED] writes: Camm Maguire [EMAIL PROTECTED] writes: Greetings! 'copy from stdin' on 7.0.2 appears to simply hang if more than ~ 1000 records follow in one shot. I couldn't see this behavior documented anywhere. Is this a bug? I've never heard of any such behavior ... and you can be sure that we'd have heard about this, since any moderately large pg_dump file would trigger such a bug. You must have something else going on. Details please? regards, tom lane -- Camm Maguire [EMAIL PROTECTED] == "The earth is but one country, and mankind its citizens." -- Baha'u'llah
Re: [HACKERS] copy from stdin limits
Camm Maguire [EMAIL PROTECTED] writes: OK, I have 4 tables, and a view on a merge of the 4. I have a trigger on insert into table 3, and a trigger on insert into the view, which basically just takes the input data, does a few selects on the tables, and inserts the appropriate portions of the data into each table as necessary. When I copy up to ~ 1000 lines of a file into this view, everything goes fine. I'm a bit startled that COPY to a view works at all ;-). But it does look like copy honors triggers, so in principle the above ought to work. I'll need to see the complete details of the trigger and all the referenced table declarations --- probably don't need the data, though, unless the trigger needs to have nonempty tables to start with. regards, tom lane
Re: [HACKERS] RFC C++ Interface
Thanks for responding. I will definitely keep your comments in mind. Cheers, Randy On Wed, 6 Dec 2000, Nathan Myers wrote: On Wed, Dec 06, 2000 at 05:09:31PM -0500, Randy Jonasz wrote: I know it's been a while since we last discussed a possible rewrite of the C++ API but I now have some time to devote to it. The following are my ideas for implementing the C++ API: I need suggestions, additions, comments etc! It would be helpful if the interface elements were to satisfy the STL requirements on iterators and collections. Those specify a minimum interface, which may be extended as needed. The one caveat is, don't try to "shoehorn" any semantics into the interface; anything that doesn't fit precisely should be treated as an extension instead, and the corresponding standard interface left unimplemented. Nathan Myers [EMAIL PROTECTED] Randy Jonasz Software Engineer Click2net Inc. Web: http://www.click2net.com Phone: (905) 271-3550 "You cannot possibly pay a philosopher what he's worth, but try your best" -- Aristotle
[HACKERS] Switch pg_ctl's default about waiting?
Now that the postmaster takes a noticeable amount of time to shut down, I'm wondering if pg_ctl's default about whether or not to wait ought to be reversed. That is, "-w" would become the norm, and some new switch ("-n" maybe) would be needed if you didn't want it to wait. Comments? regards, tom lane
Re: [HACKERS] BUG WITH CREATE FUNCTION.......
[EMAIL PROTECTED] writes: create function test(text) returns text AS '' LANGUAGE 'sql'; [crashes] OK, now it says: regression=# create function test(text) returns text AS '' LANGUAGE 'sql'; ERROR: function declared to return text, but no SELECT provided Thanks for the report! regards, tom lane
Re: CRC was: Re: [HACKERS] beta testing version
On Thu, Dec 07, 2000 at 06:40:49PM +1100, Horst Herb wrote: This may be implemented very fast (if someone points me where I can find CRC func). And I could implement "physical log" till next monday. As the logging might include large data blocks, especially now that we can TOAST our data, I would strongly suggest to use strong hashes like RIPEMD or MD5 instead of CRC-32 and the like. Cryptographically-secure hashes are unnecessarily expensive to compute. A simple 64-bit CRC would be of equal value, at much less expense. Nathan Myers [EMAIL PROTECTED]
RE: [HACKERS] beta testing version
This may be implemented very fast (if someone points me where I can find CRC func). Lifted from the PNG spec (RFC 2083): Thanks! What about Copyrights/licence? Vadim
RE: [HACKERS] Switch pg_ctl's default about waiting?
Now that the postmaster takes a noticeable amount of time to shut down, I'm wondering if pg_ctl's default about whether or not to wait ought to be reversed. That is, "-w" would become the norm, and some new switch ("-n" maybe) would be needed if you didn't want it to wait. Comments? Agreed. Actually, without the -m f|i flag to pg_ctl, and with active sessions, the 7.0.X postmaster takes a long time to shut down too. Vadim
Re: [HACKERS] CRCs (was: beta testing version)
On Wed, Dec 06, 2000 at 06:53:37PM -0600, Bruce Guenter wrote: On Wed, Dec 06, 2000 at 11:08:00AM -0800, Nathan Myers wrote: On Wed, Dec 06, 2000 at 11:49:10AM -0600, Bruce Guenter wrote: I don't know how pgsql does it, but the only safe way I know of is to include an "end" marker after each record. An "end" marker is not sufficient, unless all writes are done in one-sector units with an fsync between, and the drive buffering is turned off. That's why an end marker must follow all valid records. When you write records, you don't touch the marker, and add an end marker to the end of the records you've written. After writing and syncing the records, you rewrite the end marker to indicate that the data following it is valid, and sync again. There is no state in that sequence in which partially-written data could be confused as real data, assuming either your drives aren't doing write-back caching or you have a UPS, and fsync doesn't return until the drives return success. That requires an extra out-of-sequence write. Any other way I've seen discussed (here and elsewhere) either - Assume that a CRC is a guarantee. We are already assuming a CRC is a guarantee. The drive computes a CRC for each sector, and if the CRC is OK the drive is happy. CRC errors within the drive are quite frequent, and the drive re-reads when a bad CRC comes up. The kind of data failures that a CRC is guaranteed to catch (N-bit errors) are almost precisely those that a mis-read on a hardware sector would cause. They catch a single mis-read, but not necessarily the quite likely double mis-read. ... A CRC would be a good addition to help ensure the data wasn't broken by flakey drive firmware, but doesn't guarantee consistency. No, a CRC would be a good addition to compensate for sector write reordering, which is done both by the OS and by the drive, even for "atomic" writes. But it doesn't guarantee consistency, even in that case. There is a possibility (however small) that the random data that was located in the sectors before the write will match the CRC. Generally, there are no guarantees, only reasonable expectations. A 64-bit CRC would give sufficient confidence without the out-of-sequence write, and also detect corruption from any source including power outage. (I'd also like to see CRCs on all the table blocks as well; is there a place to put them?) Nathan Myers [EMAIL PROTECTED]
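To make the two-fsync end-marker protocol described above concrete, here is a rough C sketch. The marker values, layout, and function name are purely illustrative (nothing here comes from the actual WAL code); the point is only the ordering of the writes and syncs:

#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>

#define MARK_END   0u   /* "log ends here" */
#define MARK_VALID 1u   /* "more records follow this marker" */

/*
 * Append 'len' bytes of log records to a file whose current END marker
 * sits at offset 'end'.  Illustrative only: real log records would carry
 * their own headers and checksums.
 */
int
append_records(int fd, off_t end, const char *recs, size_t len)
{
    uint32_t valid = MARK_VALID;
    uint32_t endmark = MARK_END;

    /*
     * Step 1: write the new records *after* the existing END marker and
     * terminate them with a fresh END marker, then fsync.  The old marker
     * is untouched, so a crash here leaves the old log tail authoritative.
     */
    if (pwrite(fd, recs, len, end + sizeof(uint32_t)) != (ssize_t) len ||
        pwrite(fd, &endmark, sizeof(endmark),
               end + sizeof(uint32_t) + len) != (ssize_t) sizeof(endmark) ||
        fsync(fd) != 0)
        return -1;

    /*
     * Step 2: flip the old END marker to VALID, committing the new records,
     * and fsync again.  This is the "extra out-of-sequence write".
     */
    if (pwrite(fd, &valid, sizeof(valid), end) != (ssize_t) sizeof(valid) ||
        fsync(fd) != 0)
        return -1;

    return 0;
}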
RE: [HACKERS] pre-beta is slow
Recently I downloaded a pre-beta PostgreSQL, and I found insert and update speed is slower than 7.0.3; even if I turn off the sync flag, it is still slower than 7.0. Why? How can I make it faster? Try to compare 7.0.3 and 7.1beta in a multi-user environment. Vadim
Re: [HACKERS] beta testing version
"Mikheev, Vadim" [EMAIL PROTECTED] writes: This may be implemented very fast (if someone points me where I can find CRC func). Lifted from the PNG spec (RFC 2083): Thanks! What about Copyrights/licence? Should fit fine under our regular BSD license. CRC as such is long since in the public domain... regards, tom lane
Re: [HACKERS] v7.1 beta 1 ...packaged, finally ...
On Thursday 07 December 2000 16:48, The Hermit Hacker wrote: Okay, since I haven't gotten word back on where to find the docs for v7.1, it still contains those for v7.0, but I just put up beta1 tarballs in the /pub/dev directory ... can someone take a look at these before we announce them to make sure they look okay? I'm in the process of downloading. What would be the diff between the beta1 and the snapshot? Regards... :-) -- "And I'm happy, because you make me feel good, about me." - Melvin Udall - Martín Marqués email: [EMAIL PROTECTED] Santa Fe - Argentina http://math.unl.edu.ar/~martin/ Systems administrator at math.unl.edu.ar -
RE: CRC was: Re: [HACKERS] beta testing version
This may be implemented very fast (if someone points me where I can find CRC func). And I could implement "physical log" till next monday. I have been experimenting with CRCs for the past 6 months in our database for internal logging purposes. Downloaded a lot of hash libraries, tried different algorithms, and implemented a few myself. Which algorithm do you want? Have a look at the openssl libraries (www.openssl.org) for a start - if you don't find what you want, let me know. Thanks. As the logging might include large data blocks, especially now that we can TOAST our data, TOAST breaks data into a few 2K (or so) tuples to be inserted separately. But the first btree split after a checkpoint will require logging of a 2x8K record -:( I would strongly suggest to use strong hashes like RIPEMD or MD5 instead of CRC-32 and the like. Sure, it takes more time to calculate and more space on the hard disk, but then: a database without data integrity (and means of _proving_ integrity) is pretty worthless. Other opinions? Also, we shouldn't forget licence issues. Vadim
RE: [HACKERS] CRCs (was: beta testing version)
That's why an end marker must follow all valid records. ... That requires an extra out-of-sequence write. Yes, and it also increases the probability of corrupting data already committed to the log. (I'd also like to see CRCs on all the table blocks as well; is there a place to put them?) Do we need it? The "physical log" feature suggested by Andreas will protect us from non-atomic data block writes. Vadim
Re: [HACKERS] v7.1 beta 1 ...packaged, finally ...
The Hermit Hacker [EMAIL PROTECTED] writes: it still contains those for v7.0, but I just put up beta1 tarballs in the /pub/dev directory ... can someone take a look at these before we announce them to make sure they look okay? The tarballs match what I have locally ... ship 'em ... regards, tom lane
RE: [HACKERS] How to reset WAL environment
Probably this is caused by my trial (local) change, which generated illegal log output. However, it seems to mean that WAL isn't always redo-able. Illegal log output is like a disk crash - only BAR can help. But redo-recovery after restore would also fail. The operation which corresponds to the illegal log output aborted at execution time, and rolling it back by redo also failed. It seems preferable to me that the transaction be rolled back by undo. What exactly did you change in the code? What kind of illegal log output? Was something that breaks the btree/WAL logic written to the log? Vadim
Re: [HACKERS] Switch pg_ctl's default about waiting?
Tom Lane writes: Now that the postmaster takes a noticeable amount of time to shut down, I'm wondering if pg_ctl's default about whether or not to wait ought to be reversed. That is, "-w" would become the norm, and some new switch ("-n" maybe) would be needed if you didn't want it to wait. Two concerns: 1. The waiting isn't very reliable as we recently found out. (If you wait on shutdown, then wait on startup would be default as well, no?) 2. Why would you necessarily care to wait for shutdown? Startup I can see, but shutdown doesn't seem so important. -- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] v7.1 beta 1 ...packaged, finally ...
The Hermit Hacker writes: Okay, since I haven't gotten word back on where to find the docs for v7.1, /home/projects/pgsql/ftp/www/html/devel-corner/docs Ideally (IMHO) we'd build the documentation right in place when making the distribution tarball, i.e., broken docs, no release. I'm not sure how to usefully extrapolate that to the snapshot builds, though. Another thing we should think about is to not tar.gz the documentation files. That way we could create useful incremental diffs between releases later on. Any comments here? it still contains those for v7.0, but I just put up beta1 tarballs in the /pub/dev directory ... can someone take a look at these before we announce them to make sure they look okay? -- Peter Eisentraut [EMAIL PROTECTED] http://yi.org/peter-e/
Re: [HACKERS] Switch pg_ctl's default about waiting?
Peter Eisentraut [EMAIL PROTECTED] writes: 2. Why would you necessarily care to wait for shutdown? Startup I can see, but shutdown doesn't seem so important. Well, maybe I'm the only one who has a script like pg_ctl -w stop cd ~/.../backend; make installbin pg_ctl start but I got burnt regularly until I put -w in there ;-) regards, tom lane
Re: CRC was: Re: [HACKERS] beta testing version
"Mikheev, Vadim" [EMAIL PROTECTED] writes: I would strongly suggest to use strong hashes like RIPEMD or MD5 instead of CRC-32 and the like. Other opinions? Also, we shouldn't forget licence issues. I agree with whoever commented that crypto hashes are silly for this application. A 64-bit CRC *might* be enough stronger than a 32-bit CRC to be worth the extra calculation, but frankly I doubt that too. Remember that we are already sitting atop hardware that's really pretty reliable, despite the carping that's been going on in this thread. All that we have to do is detect the infrequent case where a block of data didn't get written due to system failure. It's wildly pessimistic to think that we might get called on to do so as much as once a day (if you are trying to run a reliable database, and are suffering power failures once a day, and haven't bought a UPS, you're a lost cause). A 32-bit CRC will fail to detect such an error with a probability of about 1 in 2^32. So, a 32-bit CRC will have an MBTF of 2^32 days, or 11 million years, on the wildly pessimistic side --- real installations probably 100 times better. That's plenty for me, and improving the odds to 2^64 or 2^128 is not worth any slowdown IMHO. regards, tom lane
Re: [HACKERS] v7.1 beta 1 ...packaged, finally ...
Peter Eisentraut [EMAIL PROTECTED] writes: Another thing we should think about is to not tar.gz the documentation files. That way we could create useful incremental diffs between releases later on. Any comments here? I've never figured out why we do that. Since the thing is going to be inside a tarball anyway, there's no possible savings from distributing the built doco that way, rather than as ordinary files. regards, tom lane
Re: [HACKERS] v7.1 beta 1 ...packaged, finally ...
On Thursday 07 December 2000 18:35, Peter Eisentraut wrote: Ideally (IMHO) we'd build the documentation right in place when making the distribution tarball, i.e., broken docs, no release. I'm not sure how to usefully extrapolate that to the snapshot builds, though. Another thing we should think about is to not tar.gz the documentation files. That way we could create useful incremental diffs between releases later on. Any comments here? If you don't tar.gz the docs, what should the downloadable format be? CVS? I think CVS would be great. -- "And I'm happy, because you make me feel good, about me." - Melvin Udall - Martín Marqués email: [EMAIL PROTECTED] Santa Fe - Argentina http://math.unl.edu.ar/~martin/ Systems administrator at math.unl.edu.ar -
Re: [HACKERS] CRCs (was: beta testing version)
On Thu, Dec 07, 2000 at 12:22:12PM -0800, Mikheev, Vadim wrote: That's why an end marker must follow all valid records. ... That requires an extra out-of-sequence write. Yes, and it also increases the probability of corrupting data already committed to the log. (I'd also like to see CRCs on all the table blocks as well; is there a place to put them?) Do we need it? The "physical log" feature suggested by Andreas will protect us from non-atomic data block writes. There are myriad sources of corruption, including RAM bit rot and software bugs. The earlier and more reliably it's caught, the better. The goal is to be able to say that a power outage won't invisibly corrupt your database. Here are sources for a 64-bit CRC computation, under BSD license: http://gcc.gnu.org/ml/gcc/1999-11n/msg00592.html Nathan Myers [EMAIL PROTECTED]
[HACKERS] abstract: fix poor constant folding in 7.0.x, fixed in 7.1?
I have an abstract solution for a problem in PostgreSQL's handling of what should be constant data. We had a problem with a query taking way too long; basically we had this: select date_part('hour',t_date) as hour, transval as val from st where id = 500 AND hit_date >= '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan AND hit_date <= '2000-12-07 14:27:24-08'::timestamp ; turning it into: select date_part('hour',t_date) as hour, transval as val from st where id = 500 AND hit_date >= '2000-12-07 14:27:24-08'::timestamp AND hit_date <= '2000-12-07 14:27:24-08'::timestamp ; (doing the -24 hours separately) The values of cost went from: (cost=0.00..127.24 rows=11 width=12) to: (cost=0.00..4.94 rows=1 width=12) By simply assigning each SQL "function" a taint value for constness, one could easily reduce: '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan to: '2000-12-07 14:27:24-08'::timestamp by applying the expression and rewriting the query. Each function should have a marker that explains whether, when given a const input, the output might vary; that way subexpressions can be collapsed until an input becomes non-const. Here, let's break up: '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan What we have is: timestamp(const) - timespan(const) We have timestamp defined like so: const timestamp(const string) non-const timestamp(non-const) and timespan like so: const timespan(const string) non-const timespan(non-const) So now we have: const timestamp((const string)'2000-12-07 14:27:24-08') - const timespan((const string)'24 hours') => const - const => const, then eval the query. You may want to allow a function to have a hook where it can eval a const, because depending on the const it may or may not be able to return a const; for instance, some string you passed to timestamp() could cause it to return non-const data. Or maybe this is fixed in 7.1? -- -Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]] "I have the heart of a child; I keep it in a jar on my desk."
Re: CRC was: Re: [HACKERS] beta testing version
On Thu, Dec 07, 2000 at 04:35:00PM -0500, Tom Lane wrote: Remember that we are already sitting atop hardware that's really pretty reliable, despite the carping that's been going on in this thread. All that we have to do is detect the infrequent case where a block of data didn't get written due to system failure. It's wildly pessimistic to think that we might get called on to do so as much as once a day (if you are trying to run a reliable database, and are suffering power failures once a day, and haven't bought a UPS, you're a lost cause). A 32-bit CRC will fail to detect such an error with a probability of about 1 in 2^32. So, a 32-bit CRC will have an MTBF of 2^32 days, or 11 million years, on the wildly pessimistic side --- real installations probably 100 times better. That's plenty for me, and improving the odds to 2^64 or 2^128 is not worth any slowdown IMHO. 1. Computing a CRC-64 takes only about twice as long as a CRC-32, for 2^32 times the confidence. That's pretty cheap confidence. 2. I disagree with the way the above statistics were computed. That eleven million-year figure gets whittled down pretty quickly when you factor in all the sources of corruption, even without crashes. (Power failures are only one of many sources of corruption.) They grow with the size and activity of the database. Databases are getting very large and busy indeed. 3. Many users clearly hope to be able to pull the plug on their hardware and get back up confidently. While we can't promise they won't have to go to their backups, we should at least be equipped to promise, with confidence, that they will know whether they need to. 4. For a way to mark the "current final" log entry, you want a lot more confidence, because you read a lot more of them, and reading beyond the end may cause you to corrupt a currently-valid database, which seems a lot worse than just using a corrupted database. Still, I agree that a 32-bit CRC is better than none at all. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] abstract: fix poor constant folding in 7.0.x, fixed in 7.1?
We had a problem with a query taking way too long; basically we had this: select date_part('hour',t_date) as hour, transval as val from st where id = 500 AND hit_date >= '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan AND hit_date <= '2000-12-07 14:27:24-08'::timestamp ; turning it into: select date_part('hour',t_date) as hour, transval as val from st where id = 500 AND hit_date >= '2000-12-07 14:27:24-08'::timestamp AND hit_date <= '2000-12-07 14:27:24-08'::timestamp ; Perhaps I'm being daft, but why should hit_date be both >= and <= the exact same time and date? (or did you mean to subtract 24 hours from your example and forgot?) (doing the -24 hours separately) The values of cost went from: (cost=0.00..127.24 rows=11 width=12) to: (cost=0.00..4.94 rows=1 width=12) By simply assigning each SQL "function" a taint value for constness, one could easily reduce: '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan to: '2000-12-07 14:27:24-08'::timestamp You mean '2000-12-06', don't you? Each function should have a marker that explains whether, when given a const input, the output might vary; that way subexpressions can be collapsed until an input becomes non-const. There is "with (iscachable)". Does CREATE FUNCTION YESTERDAY(timestamp) RETURNS timestamp AS 'SELECT $1-''24 hours''::interval' WITH (iscachable) work faster? -- Joel Burton, Director of Information Systems -*- [EMAIL PROTECTED] Support Center of Washington (www.scw.org)
Re: [HACKERS] v7.1 beta 1 (ODBC driver?)
The official ODBC driver from pg7.0.x doesn't work w/7.1 (b/c of the changes in the system catalogs, IIRC). The CVS 7.1devel code works and builds easily, but I suspect 99% of the beta testers won't have Visual C++ or won't be able to compile the driver. Is there an official driver-compiler-person that could package this up for 7.1beta? (I know that a binary driver isn't part of the beta per se, and that it's not *unreasonable* to think that everyone could compile their own, but I bought VC++ just to compile this driver, and would hate to see M$ get richer off even more people. Also, I doubt we'd want to impugn the perceived quality of 7.1beta b/c people don't understand that it's just the ODBC drivers that are out of date.) If there's no one official tasked w/this, I'd be happy to submit my compiled version, at http://www.scw.org/pgaccess. -- Joel Burton, Director of Information Systems -*- [EMAIL PROTECTED] Support Center of Washington (www.scw.org)
Re: [HACKERS] v7.1 beta 1 ...packaged, finally ...
On Thu, 7 Dec 2000, Martin A. Marques wrote: On Thursday 07 December 2000 16:48, The Hermit Hacker wrote: Okay, since I haven't gotten word back on where to find the docs for v7.1, it still contains those for v7.0, but I just put up beta1 tarballs in the /pub/dev directory ... can someone take a look at these before we announce them to make sure they look okay? I'm in the process of downloading. What would be the diff between the beta1 and the snapshot? None for today ... snapshots are built daily, beta1 right now is "a release candidate, if nobody reports any problems, we release what is packaged" ... we usually wait for a two week period or so after each beta is released for bug reports before saying "it's clean" ... if nobody changes the code in CVS in two weeks, beta1 goes out as v7.1 ... if we release a beta2, it's two weeks from that, and so on ...
Re: [HACKERS] abstract: fix poor constant folding in 7.0.x, fixed in 7.1?
Alfred Perlstein [EMAIL PROTECTED] writes: Each function should have a marker that explains whether when given a const input if the output might vary, that way subexpressions can be collapsed until an input becomes non-const. We already have that and do that. The reason the datetime-related routines are generally not marked 'proiscachable' is that there's this weird notion of a CURRENT time value, which means that the result of a datetime calculation may vary depending on when you do it, even though the inputs don't. Note that CURRENT here does not mean translating 'now' to current time during input conversion, it's a special-case data value inside the system. I proposed awhile back (see pghackers thread "Constant propagation and similar issues" from mid-September) that we should eliminate the CURRENT concept, so that datetime calculations can be constant-folded safely. That, um, didn't meet with universal approval... but I still think it would be a good idea. In the meantime you can cheat by defining functions that you choose to mark ISCACHABLE, as has been discussed several times in the archives. regards, tom lane
Re: CRC was: Re: [HACKERS] beta testing version
[EMAIL PROTECTED] (Nathan Myers) writes: 2. I disagree with way the above statistics were computed. That eleven million-year figure gets whittled down pretty quickly when you factor in all the sources of corruption, even without crashes. (Power failures are only one of many sources of corruption.) They grow with the size and activity of the database. Databases are getting very large and busy indeed. Sure, but the argument still holds. If the net MTBF of your underlying system is less than a day, it's too unreliable to run a database that you want to trust. Doesn't matter what the contributing failure mechanisms are. In practice, I'd demand an MTBF of a lot more than a day before I'd accept a hardware system as satisfactory... 3. Many users clearly hope to be able to pull the plug on their hardware and get back up confidently. While we can't promise they won't have to go to their backups, we should at least be equipped to promise, with confidence, that they will know whether they need to. And the difference in odds between 2^32 and 2^64 matters here? I made a numerical case that it doesn't, and you haven't refuted it. By your logic, we might as well say that we should be using a 128-bit CRC, or 256-bit, or heck, a few kilobytes. It only takes a little longer to go up each step, right, so where should you stop? I say MTBF measured in megayears ought to be plenty. Show me the numerical argument that 64 bits is the right place on the curve. 4. For a way to mark the "current final" log entry, you want a lot more confidence, because you read a lot more of them, You only need to make the distinction during a restart, so I don't think that argument is correct. regards, tom lane
Re: [HACKERS] Patches with vacuum fixes available for 7.0.x
Alfred Perlstein [EMAIL PROTECTED] writes: Basically Vadim has been able to reduce the amount of time taken by a vacuum from 10-15 minutes down to under 10 seconds. Cool. What's it do, exactly? regards, tom lane
Re: [HACKERS] abstract: fix poor constant folding in 7.0.x, fixed in 7.1?
* Tom Lane [EMAIL PROTECTED] [001207 16:45] wrote: Alfred Perlstein [EMAIL PROTECTED] writes: Each function should have a marker that explains whether, when given a const input, the output might vary; that way subexpressions can be collapsed until an input becomes non-const. We already have that and do that. The reason the datetime-related routines are generally not marked 'proiscachable' is that there's this weird notion of a CURRENT time value, which means that the result of a datetime calculation may vary depending on when you do it, even though the inputs don't. Note that CURRENT here does not mean translating 'now' to current time during input conversion, it's a special-case data value inside the system. I proposed awhile back (see pghackers thread "Constant propagation and similar issues" from mid-September) that we should eliminate the CURRENT concept, so that datetime calculations can be constant-folded safely. That, um, didn't meet with universal approval... but I still think it would be a good idea. I agree with you that doing anything to be able to fold these would be nice. However, there's a hook mentioned in my abstract that explains that if a constant makes it into a function, you can provide a hook so that the function can return whether or not that constant is cacheable. If the date functions used that hook to get a glimpse of the constant data passed in, they could return 'cachable' if it doesn't contain the 'CURRENT' stuff you're talking about. Something like this could be called on input to "maybe-cachable" functions: int date_cachable_hook(const char *datestr) { if (strcasecmp("current", datestr) == 0) return (UNCACHEABLE); return (CACHEABLE); } Or maybe I'm misunderstanding what CURRENT implies? I do see that on: http://www.postgresql.org/mhonarc/pgsql-hackers/2000-09/msg00408.html both you and Thomas Lockhart agree that CURRENT is a broken concept because it can cause btree inconsistencies and should probably be removed anyway. No one seems to dispute that, and then the thread leads off into discussions about optimizer hints. In the meantime you can cheat by defining functions that you choose to mark ISCACHABLE, as has been discussed several times in the archives. Yes, but it doesn't help the naive user (me :) ) much. :( Somehow I doubt that if 'CURRENT' were ifdef'd out people would complain. -- -Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]] "I have the heart of a child; I keep it in a jar on my desk."
Re: [HACKERS] Patches with vacuum fixes available for 7.0.x
* Tom Lane [EMAIL PROTECTED] [001207 17:10] wrote: Alfred Perlstein [EMAIL PROTECTED] writes: Basically Vadim has been able to reduce the amount of time taken by a vacuum from 10-15 minutes down to under 10 seconds. Cool. What's it do, exactly? The first is a bonus that Vadim gave us to speed up index vacuums; I'm not sure I understand it completely, but it works really well. :) Here's the README he gave us: Vacuum LAZY index cleanup option The LAZY vacuum option introduces a new way of cleaning up indices. Instead of reading the entire index file to remove index tuples pointing to deleted table records, with the LAZY option vacuum performs index scans using keys fetched from the table records to be deleted. Vacuum checks each result returned by the index scan, and if it points to the target heap record, removes the corresponding index tuple. This can greatly speed up index cleanup if not too many table records were deleted/modified between vacuum runs. Vacuum uses the new option on the user's demand. The new vacuum syntax is: vacuum [verbose] [analyze] [lazy] [table [(columns)]] The second is one of the suggestions I gave on the lists a while back, keeping track of the "last dirtied" block in the data files to only scan the tail end of the file for deleted rows; I think what he instead did was keep a table that holds all the modified blocks, and vacuum only scans those: Minimal Number Modified Block (MNMB) This feature tracks the MNMB of required tables with triggers, to avoid vacuum reading unmodified table pages. Triggers store the MNMB in per-table files in a specified directory ($LIBDIR/contrib/mnmb by default) and create these files if they don't exist. Vacuum first looks up the functions mnmb_getblock(Oid databaseId, Oid tableId) mnmb_setblock(Oid databaseId, Oid tableId, Oid block) in the catalog. If *both* functions were found *and* there was no ANALYZE option specified, then vacuum calls mnmb_getblock to obtain the MNMB for the table being vacuumed and starts reading this table from the block number returned. After the table is processed, vacuum calls mnmb_setblock to update the data in the file to the last table block number. Neither mnmb_getblock nor mnmb_setblock tries to create the file. If there was no file for the table being vacuumed, then mnmb_getblock returns 0 and mnmb_setblock does nothing. mnmb_setblock() may be used to set the MNMB in the file to 0 and force vacuum to read the entire table if required. To compile MNMB you have to add -DMNMB to CUSTOM_COPT in src/Makefile.custom. -- -Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]] "I have the heart of a child; I keep it in a jar on my desk."
Re: [HACKERS] abstract: fix poor constant folding in 7.0.x, fixed in 7.1?
Alfred Perlstein [EMAIL PROTECTED] writes: ... However there's a hook mentioned in my abstract that explains that if a constant makes it into a function, you can provide a hook so that the function can return whether or not that constant is cacheable. Oh, I see --- you're right, I missed that part of your proposal. I dunno ... if we had more than one example of a case where this was needed (and if that example weren't demonstrably broken for other reasons), maybe that'd be worth doing. But it seems like a lot of mechanism to add to solve a problem we shouldn't have anyway. I do see that on: http://www.postgresql.org/mhonarc/pgsql-hackers/2000-09/msg00408.html both you and Thomas Lockhart agree that CURRENT is a broken concept because it can cause btree inconsistancies and should probably be removed anyway. I had forgotten the btree argument, actually ... thanks for reminding me! I think it's too late to do anything about this for 7.1, in any case, but I'll put removing CURRENT back on the front burner for 7.2. regards, tom lane
Re: [HACKERS] Patches with vacuum fixes available for 7.0.x
On Thu, 7 Dec 2000, Alfred Perlstein wrote: We recently had a very satisfactory contract completed by Vadim. Basically Vadim has been able to reduce the amount of time taken by a vacuum from 10-15 minutes down to under 10 seconds. ... What size database was that on? I'm looking at moving a 2GB database from MySQL to Postgres. Most of that data is one table with 12 million records, to which we post about 1.5 million records a month. MySQL's table locking sucks, but as long as we are careful about what reports we run and when, we can avoid the problem. However, Postgres' vacuum also sucks. I have no idea how long our particular database would take to vacuum, but I don't think it would be very nice. That also leads to the erserver thing. erserver sounds nice, but I sure wish it was possible to get more details on it. It seems rather intangible right now. If erserver is payware, where do I buy it? This is getting a bit off-topic now... Tom
Re: [HACKERS] Patches with vacuum fixes available for 7.0.x
* Tom Samplonius [EMAIL PROTECTED] [001207 18:55] wrote: On Thu, 7 Dec 2000, Alfred Perlstein wrote: We recently had a very satisfactory contract completed by Vadim. Basically Vadim has been able to reduce the amount of time taken by a vacuum from 10-15 minutes down to under 10 seconds. ... What size database was that on? Tables were around 300 megabytes. I'm looking at moving a 2GB database from MySQL to Postgres. Most of that data is one table with 12 million records, to which we post about 1.5 million records a month. MySQL's table locking sucks, but as long as we are careful about what reports we run and when, we can avoid the problem. However, Postgres' vacuum also sucks. I have no idea how long our particular database would take to vacuum, but I don't think it would be very nice. We only do about 54,000,000 updates to a single table per month. That also leads to the erserver thing. erserver sounds nice, but I sure wish it was possible to get more details on it. It seems rather intangible right now. If erserver is payware, where do I buy it? Contact Pgsql Inc. I think it's free, but you have to discuss terms with them. This is getting a bit off-topic now... Scalability is hardly ever off-topic. :) -- -Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]] "I have the heart of a child; I keep it in a jar on my desk."
Re: [HACKERS] Re: Sorry
| | That's ok, you work with Dan Moschuk right? | | He's my bitch. :-) | | And as such, I've donated him to do neat things with postgres' C++ interface | (for whatever reason, he's of the less-enlightened opinion that C++ shouldn't | be dragged out into the backyard and shot). *sigh* No more crack for me. How that _should_ have read (well, everything except the C++ thing :-) is that as Click2Net uses postgres exclusively for all our database needs, Randy has been kind enough to volunteer his time (or perhaps look for an excuse to stop doing PHP for a while :-) to work on this. I hope this will be the first of a string of projects that Randy, myself, and the rest of our band of merry men will be undertaking and giving back to the postgres community. An area that I'm currently examining for the FreeBSD project is server clustering, and you can bet that one of the requirements is to make sure postgres can take full advantage of that. Well, assuming trying to get shared memory to work across multiple machines doesn't turn me off of programming for good. :P Cheers! -Dan -- Man is a rational animal who always loses his temper when he is called upon to act in accordance with the dictates of reason. -- Oscar Wilde
Re: [HACKERS] v7.1 beta 1 ...packaged, finally ...
Another thing we should think about is to not tar.gz the documentation files. That way we could create useful incremental diffs between releases later on. Any comments here? I've never figured out why we do that. Well... Since the thing is going to be inside a tarball anyway, there's no possible savings from distributing the built doco that way, rather than as ordinary files. A couple of reasons, historically: 1) I was building docs locally, and moving them across to postgresql.org over a modem. It wasn't for another year (?) before postgresql.org could build them locally. 2) The first html docs were available before a release, and they needed to be distributed. 3) We put the docs into cvs, but the jade/docbook output did not have predictable file names. So each release would require wiping the output docs and somehow guessing which files were obsolete and which were new. 4) We would have to install these individual files, and we didn't have a technique for installing docs. Untarring seemed compatible with (2) and (3). Anyway, since we no longer put the docs tarball into cvs, then we could rethink the techniques. Peter, you seem to have done enough work on this to have an opinion, so what exactly would you prefer for packaging? I recall that an unpacked tree was the suggestion?? I think that *requiring* that the html docs be built in place to effect a release is an extra toolset burden that we should not accept. The fact that the docs tools work well on postgresql.org as well as on other machines is something to be enjoyed, not put into the critical path ;) - Thomas
Re: [HACKERS] v7.1 beta 1 ...packaged, finally ...
Thomas Lockhart [EMAIL PROTECTED] writes: [ various good reasons ] 3) We put the docs into cvs, but the jade/docbook output did not have predictable file names. So each release would require wiping the output docs and somehow guessing which files were obsolete and which were new. That's something that's annoyed me for a good while in a different context, namely that URLs for particular pages of the docs on postgresql.org aren't stable. (Well, maybe they are? but foo58342.htm doesn't give one a warm feeling about it. chap3sec7.htm would look a lot better.) Is there any prospect of making the output filenames more predictable? Who should I annoy about it? I think that *requiring* that the html docs be built in place to effect a release is an extra toolset burden that we should not accept. Agreed on that one... regards, tom lane
Re: [HACKERS] v7.1 beta 1 ...packaged, finally ...
Is there any prospect of making the output filenames more predictable? Who should I annoy about it? Well, you could annoy me about it... ... and I would go to my local installation of the source tree... ... and build the docs to confirm that the *chapters* have good predictable names... ... and find that *every* .htm file has a "good" name. Hmm. Is it the fact that someone went through and added an "id field" to every chapter and section header? Whoever it was, good job! It wasn't me, but whoever it was: good job :) Ah, a perusal of the cvs log shows that Peter E. is the culprit. Looks like it is a non-issue from here on. - Thomas
Re: [HACKERS] Re: COPY BINARY file format proposal
Philip Warner [EMAIL PROTECTED] writes: I don't want to continue being picky, but you could just use 4 bytes for a maj-min-rev-patch version number (in that order), and avoid the endian issues by reading and writing each byte. No big deal, though. Well, the thing is that we need to protect the contents of datatype-specific structures. If it were just a matter of byte-flipping the counts and lengths defined by the (proposed) file format, I'd have specified that we write 'em all in network byte order and be done with it. But knowing the internal structure of every datatype in the system is a very different game, and I don't want to try to play that game ... at least not yet. So the proposal is just to identify the endianness that the file is being written with. Recovering the data on a machine of different endianness is a project for future data archeologists. This allows for both backwards-compatible header additions (extend the header without changing the version number) and non-backwards-compatible changes (bump the version number). That's where the rev patch levels help if you adopt the above version numbering - 1.0-** should all be compatible, 1.1 should be able to read <= 1.1-**, 1.0-** should not be expected to read 1.1-** etc. Tell you the truth, I don't believe in file-format version numbers at all. My experience with such things is that they defeat portability rather than promote it, because readers tend to reject files that they could have actually have read as a result of insignificant version number issues. You can read all about my view of this issue in the PNG spec (RFC 2083, esp section 12.13) --- the versioning philosophy described there is largely yours truly's. I will not complain about sticking a "version 1.0" field into a format when there is no real intention of changing it in the future ... but assigning deep significance to major/minor numbers, or something like that, is wrongheaded. You need a much finer-grained view of compatibility issues than that if you want to achieve anything much in cross-version compatibility. Feature-based versioning, like PNG's notion of critical vs. ancillary chunks, is the thing you need for that. I didn't bring up the issue in this morning's proposal --- but if we ever do add stuff to the proposed extensible header, I will hold out for self-identifying feature-related items much like PNG chunks. regards, tom lane
Re: [HACKERS] Re: COPY BINARY file format proposal
I wrote: Next 4 bytes: integer layout field. This consists of the int32 constant 0x0A820D0A expressed in the source machine's endianness. (Again, value chosen with malice aforethought, to catch files munged by things like DOS/Unix newline conversion or high-bit-stripping.) Actually, that won't do. A little-endian machine would write 0A 0D 82 0A which would fail to trigger newline converters that are looking for \r followed by \n (0D 0A). If we're going to take seriously the idea of detecting newline transforms, then we need to incorporate the test pattern into the fixed-byte-order signature. How about: Signature: 12-byte sequence "PGBCOPY\n\377\r\n\0" (detects newline replacements, dropped nulls, dropped high bits, parity changes); Integer layout field: int32 constant 0x01020304 in source's byte order. The rest as before. regards, tom lane
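To illustrate how the revised signature and byte-order word would be produced and checked, here is a rough C sketch; the function names and error handling are mine, not part of the proposal:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* 12-byte signature "PGBCOPY\n\377\r\n\0"; the string literal is 11 chars
 * plus its terminating NUL, which exactly fills the 12-byte array. */
static const char signature[12] = "PGBCOPY\n\377\r\n";

int
write_header(FILE *fp)
{
    const uint32_t layout = 0x01020304;     /* written in native byte order */

    if (fwrite(signature, 1, sizeof(signature), fp) != sizeof(signature) ||
        fwrite(&layout, sizeof(layout), 1, fp) != 1)
        return -1;
    return 0;
}

int
check_header(FILE *fp, int *flip_needed)
{
    char     sig[12];
    uint32_t layout;

    if (fread(sig, 1, sizeof(sig), fp) != sizeof(sig) ||
        memcmp(sig, signature, sizeof(sig)) != 0)
        return -1;          /* mangled by newline/NUL/high-bit filters */
    if (fread(&layout, sizeof(layout), 1, fp) != 1)
        return -1;
    if (layout == 0x01020304)
        *flip_needed = 0;   /* same endianness as the writer */
    else if (layout == 0x04030201)
        *flip_needed = 1;   /* opposite endianness */
    else
        return -1;          /* not a recognizable header */
    return 0;
}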
[HACKERS] Oracle-compatible lpad/rpad behavior
Zeugswetter Andreas SB [EMAIL PROTECTED] writes: lpad and rpad never truncate, they only pad. Perhaps they *should* truncate if the specified length is less than the original string length. Does Oracle do that? Yes, it truncates, same as Informix. I went to fix this and then realized I still don't have an adequate spec of how Oracle defines these functions. It would seem logical, for example, that lpad might truncate on the left instead of the right, ie lpad('abcd', 3, 'whatever') might yield 'bcd' not 'abc'. Would someone check? Also, what happens if the specified length is less than zero? Error, or is it treated as zero? regards, tom lane
Re: [GENERAL] Oracle-compatible lpad/rpad behavior
I went to fix this and then realized I still don't have an adequate spec of how Oracle defines these functions. It would seem logical, for example, that lpad might truncate on the left instead of the right, ie lpad('abcd', 3, 'whatever') might yield 'bcd' not 'abc'. Would someone check? SQL> select lpad('abcd', 3, 'foobar') from dual; LPA --- abc Also, what happens if the specified length is less than zero? Error, or is it treated as zero? SQL> select ':' || lpad('abcd', -1, 'foobar') || ':' from dual; ': -- :: (colons added so it's obvious that it's a zero-length string) -Jonathan
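Based on those results, the behavior to emulate appears to be: pad on the left up to the target length, truncate the source (keeping its leftmost characters) when it is already longer, and treat a negative length as zero. A throwaway C sketch of that rule, plain char strings only (the real backend code would of course operate on text datums, and what Oracle does with an empty fill string is not covered by the test above):

#include <stdlib.h>
#include <string.h>

char *
lpad_sketch(const char *src, int len, const char *fill)
{
    char   *out;
    size_t  srclen, filllen;
    int     pad, i;

    if (len < 0)
        len = 0;                            /* negative length => empty result */

    out = malloc((size_t) len + 1);
    if (out == NULL)
        return NULL;

    srclen = strlen(src);
    filllen = strlen(fill);
    pad = (srclen < (size_t) len) ? len - (int) srclen : 0;

    for (i = 0; i < pad; i++)
        out[i] = filllen ? fill[i % filllen] : ' ';   /* cycle the fill string; space fallback is my assumption */
    memcpy(out + pad, src, (size_t) (len - pad));     /* truncates src, keeping its leftmost chars */
    out[len] = '\0';

    return out;
}

/* lpad_sketch("abcd", 3, "foobar") => "abc";  lpad_sketch("abcd", -1, "x") => "" */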