Re: [HACKERS] Backup and restore through JDBC
Marlon Petry wrote:
> I am trying to develop an API to carry out backup and restore through JDBC. I think that the best approach is to use JNI. Any suggestions?

Do you mean 'backup' or 'export/dump'?

If you mean 'export', do you need anything besides SQL?

If you mean 'backup', how do you want to restore from a client machine when there is no server running? And if you're on the server, why would you want to use Java?

Yours,
Laurenz Albe

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] JAVA Support
> I would if we could get some -hackers buy-in on the idea. Adding more and more auth methods is something they're not excited about unless there's a good reason (which I think this is).
>
> Actually, I've been trying to get some of the Sun engineers to contribute patches for Solaris authentication methods, of which GSSAPI is one. So in theory someone from Sun should be looking at coding this.

Well, if they are, I hope they would be speaking up now, so work isn't duplicated... So if you're out there, please speak up! ;-)

//Magnus
Re: [HACKERS] JAVA Support
>> This being SASL: I know I tried to make it work on win32 once and failed miserably. (Then again, I've failed on Linux as well, but not quite as badly. And it's not included in all Linux distributions; at least it wasn't when I checked a while back.)
>
> Well, I know Redhat has RPMs that look reasonable. I'm not a big Linux user myself. (More a BSD bigot, to be honest.)

Well, Redhat != Linux, really :P

Over to GSSAPI: in theory, you get to plug in mechanisms other than Kerberos. In practice I think this only worked on Solaris until very recently. FWIW, Microsoft has supported NTLM over GSSAPI since... eh, back in 1999, I guess, with the first pre-releases of Windows 2000. Wire compatibility with a native Windows API (the SSPI), if it's used correctly. (Google for posts by Jeffrey Altman for references to example code.)

This, IMHO, is a big win if we can pull it off. It would significantly lower the barrier for getting Kerberos working properly in pg on Win32.

//Magnus
Re: [HACKERS] JAVA Support
> However, that doesn't change that some people would like us to support GSSAPI, and there may be some benefit (additional applications, better network authentication, etc.) for doing so. If we can get additional programmers to code the support (i.e. Sun, JPL) I don't see any reason not to support the *additional* authentication methods.
>
> Well, as I said already, a lot depends on the size of the patch. As a reductio ad absurdum, if they drop 100K lines of code on us, it *will* get rejected, no matter how cool it is.

Oh, absolutely.

> The current Kerberos support seems to require about 50 lines in configure.in and circa 200 lines of C code in each of the backend and libpq. Plus a dependency on an outside library that happens to be readily available and compatibly licensed.

I would expect, without looking at the details of the API, GSSAPI to be about the same amount of code, if not less.

> What amount of code are we talking about adding here, and what dependencies exactly? What portability and license hazards will be added?

The Kerberos5 libraries that we rely on today provide GSSAPI, so it would work with the same external library. Now, it could *also* work with other libraries in some cases (for example, the Win32 SSPI libraries), but with the same libraries it should work fine.

//Magnus
Re: [HACKERS] Another idea for dealing with cmin/cmax
ITAGAKI Takahiro wrote:
> However, I think our next goal to shrink the headers is 16 bytes. The headers become 23 bytes using phantom cids, and we are limited by alignment, so we will have no more advantages unless we delete an extra 7 bytes from the headers. ...and it seems to be very difficult.

Yeah, I thought about that too earlier.

If we get rid of VACUUM FULL, or replace it with something that doesn't need xvac, and keep cmin and cmax in backend-private storage, we could get rid of the overlaid t_field4, which is 4 bytes. Then we're down to 19 bytes.

We could get rid of t_hoff, because we should always be able to calculate the header size. Then we're down to 18 bytes.

There are currently 15 bits in use in the infomask. After we remove the HEAP_MOVED_* flags that we don't need without VACUUM FULL, that's down to 13 bits. t_natts only needs 11 bits, because MaxHeapAttributeNumber is 1600. We could move 5 of the bits in the infomask to the high 5 bits of t_natts, and save one byte. We're now down to 17 bytes.

That's as far as I got. So it seems we could shave off some bytes, but we still can't get down to 16. And the changes needed in total would be quite massive.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Faster StrNCpy
Hi, Tom,

Tom Lane wrote:
> Strong, David [EMAIL PROTECTED] writes:
>> Just wondering - are any of these cases where a memcpy() would work just as well? Or are you not sure that the source string is at least 64 bytes in length?
>
> In most cases, we're pretty sure that it's *not* --- it'll just be a palloc'd C string. I'm disinclined to fool with the restriction that namestrcpy zero-pad Name values, because they might end up on disk, and allowing random memory contents to get written out is ungood from a security point of view.

There's another disadvantage of always copying 64 bytes: the 64-byte range may cross a page boundary. Now guess what happens when that next page is not mapped - segfault.

Markus

-- Markus Schaber | Logical Tracking & Tracing International AG Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org
Re: [HACKERS] Block B-Tree concept
Heikki Linnakangas wrote:
> If we want to keep the property that VACUUM doesn't re-evaluate index entries, any proposal that doesn't keep track of every heap tuple isn't going to work. I'll try to modify the design I had in mind so that it does keep track of every heap tuple in some form.

After some thought: imagine a normal B-tree just like what we have now. But when there is more than one tuple on the same heap page with consecutive index keys, we represent all of them in a single index tuple that contains the key of the first one of them and a (run-length encoded) bitmap of the OffsetNumbers. We should get most of the space and I/O savings as with the original proposal, but we can vacuum it without re-evaluating index expressions.

It does change the format of an index tuple, unlike the original proposal. That adds some complexity, but it's doable.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Another idea for dealing with cmin/cmax
On Fri, Sep 29, 2006 at 09:35:31AM +0100, Heikki Linnakangas wrote:
> We could get rid of t_hoff, because we should always be able to calculate the header size. Then we're down to 18 bytes.

Without t_hoff, how do you know the size of the null bitmap? You could probably do it only if you assume the null bitmap, if present, always covers all the columns...

Have a nice day,
-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] Another idea for dealing with cmin/cmax
Martijn van Oosterhout wrote:
> On Fri, Sep 29, 2006 at 09:35:31AM +0100, Heikki Linnakangas wrote:
>> We could get rid of t_hoff, because we should always be able to calculate the header size. Then we're down to 18 bytes.
>
> Without t_hoff, how do you know the size of the null bitmap? You could probably do it only if you assume the null bitmap, if present, always covers all the columns...

I think we assume that already. heap_form_tuple reserves space for the bitmap like this:

    if (hasnull)
        len += BITMAPLEN(numberOfAttributes);

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Another idea for dealing with cmin/cmax
On Fri, Sep 29, 2006 at 10:59:13AM +0100, Heikki Linnakangas wrote:
> Martijn van Oosterhout wrote:
>> Without t_hoff, how do you know the size of the null bitmap? You could probably do it only if you assume the null bitmap, if present, always covers all the columns...
>
> I think we assume that already. heap_form_tuple reserves space for the bitmap like this: if (hasnull) len += BITMAPLEN(numberOfAttributes);

Ok, now we do an ALTER TABLE blah ADD COLUMN ..., and we have to expand the bitmaps for the entire table?

Have a nice day,
-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
Re: [HACKERS] Another idea for dealing with cmin/cmax
Martijn van Oosterhout wrote:
> Ok, now we do an ALTER TABLE blah ADD COLUMN ..., and we have to expand the bitmaps for the entire table?

No, you'd still have the number of attributes (t_natts) in the header.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Block B-Tree concept
On Fri, Sep 29, 2006 at 10:51:32AM +0100, Heikki Linnakangas wrote:
> After some thought: imagine a normal B-tree just like what we have now. But when there is more than one tuple on the same heap page with consecutive index keys, we represent all of them in a single index tuple that contains the key of the first one of them and a (run-length encoded) bitmap of the OffsetNumbers. We should get most of the space and I/O savings as with the original proposal, but we can vacuum it without re-evaluating index expressions.

I think something like this has been discussed before. Where an index tuple currently has the key values followed by a ctid, simply change that so that it can be a list of ctids, in order. This works on having the same key, but doesn't care if the tuples are on the same page.

It used to be impossible because of how to handle forward and backward scanning within an index entry during updates. I think with the new page-at-a-time index scanning, this got a lot easier. One issue is that a single index page could hold more than 1000 index entries, which might cause problems for callers.

> It does change the format of an index tuple, unlike the original proposal. That adds some complexity, but it's doable.

This way doesn't change the current index format much.

Have a nice day,
-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
Re: [HACKERS] New version of money type
On Thu, Sep 28, 2006 at 06:32:11PM -0500, Jim C. Nasby wrote:
> What would be ideal is a money type that stored what currency was used and let you change precision (within reason).

The taggedtypes version of currency does half of that, by storing the currency and allowing the output format to depend on it. It doesn't allow you to easily change the precision, though; that would require user-defined typmod, which is still under discussion.

It would be possible to create a taggedtypes version of currency based on int64. With the currency tag it would be 12 bytes total, and the number of decimal places could be defined per currency... Interesting thought; it probably wouldn't take more than an hour to whip up.

Have a nice day,
-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
Re: [HACKERS] Block B-Tree concept
Martijn van Oosterhout wrote:
> I think something like this has been discussed before. Where an index tuple currently has the key values followed by a ctid, simply change that so that it can be a list of ctids, in order.

Actually it's t_tid followed by t_info (which is size + flags) followed by key values.

> This works on having the same key, but doesn't care if the tuples are on the same page. It used to be impossible because of how to handle forward and backward scanning within an index entry during updates. I think with the new page-at-a-time index scanning, this got a lot easier.

I'm not very interested in the case where you have a lot of equal keys; I think the bitmap index AM is more suitable for that. The Block B-tree could be used whenever you have a clustered table (including unique indexes). Some DBMSs have index-organized tables for the same use case.

When I tested a prototype of the original idea with TPC-C (DBT-2) data, a block index on the order_line table was under 2% of the size of a normal B-tree. That's very close to a best-case scenario: narrow rows and a completely clustered table. I'm aiming at that order of magnitude reduction in storage size.

> One issue is that a single index page could hold more than 1000 index entries, which might cause problems for callers.

I don't think that's a problem.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Backup and restore through JDBC
The idea is to make one pg_dump of the server and keep it on the client machine, and to restore that pg_dump from the client machine when it becomes necessary. Perhaps I will have to use some stored procedure on the server, I do not know.

regards,
Marlon

2006/9/29, Albe Laurenz [EMAIL PROTECTED]:
> Marlon Petry wrote:
>> I am trying to develop an API to carry out backup and restore through JDBC. I think that the best approach is to use JNI. Any suggestions?
>
> Do you mean 'backup' or 'export/dump'? If you mean 'export', do you need anything besides SQL? If you mean 'backup', how do you want to restore from a client machine when there is no server running? And if you're on the server, why would you want to use Java?
>
> Yours,
> Laurenz Albe

-- Marlon
"The endless search for happiness keeps us from living it each day..." (Elba Lucas)
Re: [HACKERS] Backup and restore through JDBC
Marlon Petry wrote:
> The idea is to make one pg_dump of the server and keep it on the client machine, and to restore that pg_dump from the client machine when it becomes necessary. Perhaps I will have to use some stored procedure on the server, I do not know.

pg_dump and pg_restore do not need to run on the server machine. Why not just run them where you want the dump stored?

cheers
andrew
Re: [HACKERS] Backup and restore through JDBC
>> The idea is to make one pg_dump of the server and keep it on the client machine, and to restore that pg_dump from the client machine when it becomes necessary. Perhaps I will have to use some stored procedure on the server, I do not know.
>
> pg_dump and pg_restore do not need to run on the server machine. Why not just run them where you want the dump stored?
>
> cheers
> andrew

But would I need to have pg_dump and pg_restore installed on the client machine? Without having pg_dump and pg_restore installed, how could I do it?

regards,
Marlon
Re: [HACKERS] Backup and restore through JDBC
Marlon Petry wrote:
>> pg_dump and pg_restore do not need to run on the server machine. Why not just run them where you want the dump stored?
>
> But would I need to have pg_dump and pg_restore installed on the client machine? Without having pg_dump and pg_restore installed, how could I do it?

You can't. pg_dump in particular embodies an enormous amount of knowledge that simply does not exist elsewhere. There is no dump/restore API, and there is nothing you can hook up to using JNI, AFAIK.

cheers
andrew
Re: [HACKERS] Block B-Tree concept
On Fri, 2006-09-29 at 10:51 +0100, Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
>> If we want to keep the property that VACUUM doesn't re-evaluate index entries, any proposal that doesn't keep track of every heap tuple isn't going to work. I'll try to modify the design I had in mind so that it does keep track of every heap tuple in some form.

The ideal situation is that we have one index pointer per block, so we should look for that and optimize accordingly. We mark the heap block to indicate how many block index pointers there are to it. If we have only a single pointer, then VACUUM will only have to touch the index pointer when the whole heap block is removed. In that case we have no re-evaluation of the index, AFAICS.

> After some thought: imagine a normal B-tree just like what we have now. But when there is more than one tuple on the same heap page with consecutive index keys, we represent all of them in a single index tuple that contains the key of the first one of them and a (run-length encoded) bitmap of the OffsetNumbers. We should get most of the space and I/O savings as with the original proposal, but we can vacuum it without re-evaluating index expressions.

The benefit we're seeking with a block index is that most INSERTs don't write to the index. With that scheme we'd need to continually update the index tuple so that it exactly represented the heap after each inserted tuple, which is going to cause a hot-block problem. Much of that can go away if we have a bulk_heap_insert() which puts the index entries in at the end of the block, though that needs some heavy thought also.

Can we have this scheme enabled *only* for functional block indexes? The normal case is likely to be monotonically ascending keys for real-world objects like Orders, Calls, Transactions etc. It sounds like the original proposal is still valid for those cases.

The bitmap would allow us to access heap rows faster in some circumstances, I suppose.

Multi-block bitmaps do make this too much like bitmap indexes, and the use case is very different. [Is there some kind of hybrid solution of block bitmap indexes?]

> It does change the format of an index tuple, unlike the original proposal. That adds some complexity, but it's doable.

Can we use an info bit to have two index tuple formats?
- single tuple (as now)
- multiple tuple block bitmap (as you suggest)

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Backup and restore through JDBC
Andrew Dunstan wrote:
> You can't. pg_dump in particular embodies an enormous amount of knowledge that simply does not exist elsewhere. There is no dump/restore API, and there is nothing you can hook up to using JNI, AFAIK.

Recently, there was a proposal to extract that knowledge into a library (making pg_dump itself just a wrapper). This sounds more and more valuable; is anybody working on this for 8.3?

Regards,
Andreas
Re: [HACKERS] Backup and restore through JDBC
Hi, Marlon,

Marlon Petry wrote:
> But would I need to have pg_dump and pg_restore installed on the client machine? Without having pg_dump and pg_restore installed, how could I do it?

pg_dump and pg_restore should be runnable (possibly with a small shell/bash wrapper script) without any installation, simply by having them and all needed libs lying in the current directory. They don't need any registry keys, weird environment settings or such, just some libs which should be present on most platforms, except libpq.

Using a Java application for dump/restore will burden you with installing a JVM, the PostgreSQL JDBC drivers, and your application, which seems at least equal effort and more resources.

Btw, another idea is to run pg_dump on the server, but pipe its output to the client, e.g. by running it through telnet or SSH (yes, there are SSH servers for Windows), and then piping it to a file on the client (e.g. using plink.exe from the putty package). Or use netcat or so.

On a Unix box, when you're really crazy and want to ignore all security restrictions, you could even install pg_dump via inetd, and then everyone connecting via TCP on the appropriate port gets a dump of the database. :-)

HTH,
Markus

-- Markus Schaber | Logical Tracking & Tracing International AG Dipl. Inf. | Software Development GIS
Re: [HACKERS] Backup and restore through JDBC
Marlon Petry wrote: pg_dump and pg_restore do not need to run on the server machine. Why not just run them where you want the dump stored? But I would need to have installed pg_dump and pg_restore in machine client? Without having installed pg_dump and pg_restore,how I could make You can't. pg_dump in particular embodies an enormous amount of knowledge that simply does not exist elsewhere. There is no dump/restore API, and there is nothing you can hook up to using JNI, AFAIK.Recently, there was the proposal to extract that knowledge to a library(making pg_dump itself just a wrapper). This sounds valuable more and more, is anybody working on this for 8.3?Regards,AndreasI have interest in working,how i could start ?regards,marlon
Re: [HACKERS] Block B-Tree concept
Csaba Nagy [EMAIL PROTECTED] writes:
>>> I think you build a whole new index named something like .temp-reindex and then as the last step of the second transaction delete the old index and rename the new index.
>>
>> That would require getting exclusive lock on the table.
>
> Just out of curiosity: creating a new index concurrently (or online, whatever you call it) doesn't require taking an exclusive lock on the table? I thought it would, at least briefly at the end of the operation; after all, it's modifying the table...

Nope. As I understand it, the step that fundamentally requires a table lock is actually dropping the old index. You have to be sure nobody is actually using it before you do anything that causes people to stop maintaining it.

We could do something like how the online index build creates the index, but in reverse. We mark the index invalid and then wait out any transactions that could be using it. Then we can drop it safely. But I think even that has some practical problems. Transactions that have that index in their relcache structure for the table will try to maintain it and get confused if it's gone.

It seems to me that taking a brief lock on the table at the end of the reindex isn't actually much of a problem. It only needs to be held briefly, and it can be done in a separate transaction so there isn't a deadlock risk.

-- greg
Re: [HACKERS] Backup and restore through JDBC
Marlon Petry wrote:
>> Recently, there was a proposal to extract that knowledge into a library (making pg_dump itself just a wrapper). This sounds more and more valuable; is anybody working on this for 8.3?
>
> I have interest in working on it; how could I start?

Start by reading the code in src/bin/pg_dump. Then, after you recover from your head exploding, you start devising some sort of sane API ...

cheers
andrew
Re: [HACKERS] Backup and restore through JDBC
> Start by reading the code in src/bin/pg_dump. Then, after you recover from your head exploding, you start devising some sort of sane API ...
>
> cheers
> andrew

OK, I'm trying to start.
[HACKERS] send()/receive() and on-disk storage
Hi,

Currently, there's a discussion on the pljava list, and we're confused about a small detail: does PostgreSQL call the datatypes' defined send() function before storing the tuple data in the table, on disk, and call receive() when reading it again?

My position was that send()/receive() are only used for client communication in the V3 protocol binary mode, and COPY in binary mode, but the backend stores the values in the tables as-is (modulo compression/TOASTing). Is my assumption correct?

Btw, I'm going to triple-check this in the source and via tests, too.

Thanks,
Markus

-- Markus Schaber | Logical Tracking & Tracing International AG Dipl. Inf. | Software Development GIS
Re: [HACKERS] send()/receive() and on-disk storage
On Fri, Sep 29, 2006 at 03:44:07PM +0200, Markus Schaber wrote:
> Does PostgreSQL call the datatypes' defined send() function before storing the tuple data in the table, on disk, and call receive() when reading it again?

No.

> My position was that send()/receive() are only used for client communication in the V3 protocol binary mode, and COPY in binary mode, but the backend stores the values in the tables as-is (modulo compression/TOASTing).

Yes. It would be terribly inefficient to call those functions for each read/write. The disk has the internal format; send/receive deal with a portable, not-host-dependent representation of the data.

Have a nice day,
-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
Re: [HACKERS] Block B-Tree concept
Simon Riggs wrote:
> The ideal situation is that we have one index pointer per block, so we should look for that and optimize accordingly. We mark the heap block to indicate how many block index pointers there are to it. If we have only a single pointer, then VACUUM will only have to touch the index pointer when the whole heap block is removed. In that case we have no re-evaluation of the index, AFAICS.

I don't see how that would work. It sounds similar to the reference counting option I proposed, which had the same re-evaluation problem. And in addition, it requires adding index-specific information to the heap page format, which troubles me from a modularization viewpoint.

> The benefit we're seeking with a block index is that most INSERTs don't write to the index. With that scheme we'd need to continually update the index tuple so that it exactly represented the heap after each inserted tuple, which is going to cause a hot-block problem.

That's just one of the benefits. I think the main benefit is the dramatic reduction in index size, which means that more of the index is cached. An INSERT will have to find the corresponding leaf page anyway. Having to dirty it isn't a big deal, assuming that the hot blocks stay in cache.

The hot-block problem worries me a bit too. Any indexing scheme that packs more items on a block is going to suffer from it. Testing will show if it becomes a problem.

> Can we have this scheme enabled *only* for functional block indexes?

No. As Tom pointed out, data-type-specific functions have potentially the same problem. And having both versions seems like a lot of code and complexity.

> The bitmap would allow us to access heap rows faster in some circumstances, I suppose.

Yes: you wouldn't have to re-evaluate index quals on every tuple when the whole range represented by the index tuple falls within the range you're searching for. And when there are only a few tuples with consecutive keys on a heap page (which is not a good use case for block B-trees), you don't need to scan the whole page to find the matches.

> Multi-block bitmaps do make this too much like bitmap indexes, and the use case is very different. [Is there some kind of hybrid solution of block bitmap indexes?]

Not that I know of, though there are different kinds of bitmap indexes. The one that didn't make it into 8.2 uses equality encoding, where you have a bitmap for every distinct value. You can also have range encoding, where you have a bitmap for ranges of values, for example one bitmap for 1-10, another for 10-15, etc. If you choose the ranges dynamically so that you have one range for each heap page (when the table is clustered), you get something similar to the proposed Block B-tree. The current bitmap encoding scheme is optimized for large bitmaps, though, so the performance wouldn't be as good.

> Can we use an info bit to have two index tuple formats?
> - single tuple (as now)
> - multiple tuple block bitmap (as you suggest)

Yes. There's one bit free in the index tuple header.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Backup and restore through JDBC
Markus Schaber wrote: Marlon Petry wrote: But I would need to have installed pg_dump and pg_restore in machine client? Without having installed pg_dump and pg_restore, how I could make ... pg_dump and pg_restore should be runnable (possibly with a small shell / bash wrapper script) without any installation, simply having them and all needed libs lying in the current directory. There's probably a case for having static builds of pg_dump and pg_restore around for various architectures, if only to help people out when they don't have access to a build environment etc. Either a set of static binaries on the website, or an easy way to build them from the source tree (they could then be copied to the target system). It strikes me that Marlon hasn't really explained why he wants to use JDBC. I assume that your application is Java based, but trust me, invoking pg_dump through Runtime.exec() or whatever is going to be much, much easier than any of the other things you've suggested, such as making a pg_dump API and using JNI to call it. That's just pain city, in a bunch of ways. Do you need to process the dump inside your program in some way? Or do you just need to store a dump and restore it later? Why the fascination with using an API? On a unix box, when you're really crazy, and want to ignore all security restrictions, you could even install pg_dump via inetd, and then everyone connecting via TCP on the appropriate port gets a dump of the database. :-) Oh, man, my head just exploded reading that. That's taking evil and being *creative* with it. :) Cheers Tom
Re: [HACKERS] [COMMITTERS] pgsql: Fix IS NULL and IS NOT NULL tests on row-valued
The documentation (9.17.5. Row-wise Comparison) says: These constructs test a row value for null or not null. A row value is considered not null if it has at least one field that is not null. I suppose it should be changed too. Tom Lane wrote: the SQL spec, viz IS NULL is true if all the row's fields are null, IS NOT NULL is true if all the row's fields are not null. The former coding got ... -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [HACKERS] send()/receive() and on-disk storage
Hi, Martijn, Martijn van Oosterhout wrote: It would be terribly inefficient to call those functions for each read/write. The disk has the internal format; send/receive deal with a portable, non-host-dependent representation of the data. Thanks. Markus -- Markus Schaber | Logical Tracking & Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org
Re: [HACKERS] Another idea for dealing with cmin/cmax
On Fri, Sep 29, 2006 at 01:15:06PM +0900, ITAGAKI Takahiro wrote: Jim C. Nasby [EMAIL PROTECTED] wrote: The reason I thought of this is because once the transaction commits, we have no use for the cid info. So we could do something like have bgwriter look for tuples that belong to committed transactions before it writes a page, and strip the cid out of them. Your concept is just like the experimental method that I suggested before in http://archives.postgresql.org/pgsql-hackers/2005-08/msg01193.php We can remove cmin and cmax from committed tuples and xmin from frozen tuples, and we might save some bytes in tuple headers. However, I think our next goal to shrink the headers is 16 bytes. The headers become 23 bytes using phantom cids and we are limited by alignments, so we will have no more advantages unless we delete extra 7 bytes in the headers. ...and it seems to be very difficult. Dumb question... wouldn't getting down to 20 bytes buy us something? -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [HACKERS] Block B-Tree concept
Heikki Linnakangas [EMAIL PROTECTED] writes: Imagine a normal B-tree just like what we have now. But when there is more than one tuple on the same heap page with consecutive index keys, we represent all of them in a single index tuple that contains the key of the first one of them, and a (run-length encoded) bitmap of the OffsetNumbers. At first I thought that was a typo, and instead of "consecutive" you meant to write "equal". I gather from the later statement I'm not very interested in the case where you have a lot of equal keys, I think the bitmap index am is more suitable for that. that indeed you meant to write "consecutive", and I've got a problem with that: define "consecutive". In a datatype independent fashion, please. I also wonder how you are going to implement splitting and merging of runs, which will certainly be necessary if this isn't to be a constantly-requires-REINDEX thing. regards, tom lane
Re: [HACKERS] Block B-Tree concept
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: I'm not very interested in the case where you have a lot of equal keys, I think the bitmap index am is more suitable for that. that indeed you meant to write "consecutive", and I've got a problem with that: define "consecutive". In a datatype independent fashion, please. I also wonder how you are going to implement splitting and merging of runs, which will certainly be necessary if this isn't to be a constantly-requires-REINDEX thing. I don't mean consecutive as in 1, 2, 3, 4, ... without gaps but as in A and B are consecutive in the index, if there's no value X in the index so that A < X < B. Maybe there's a better word for that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Faster StrNCpy
Markus Schaber [EMAIL PROTECTED] writes: There's another disadvantage of always copying 64 bytes: It may be that the 64-byte range crosses a page boundary. Now guess what happens when this next page is not mapped - segfault. Irrelevant, because in all interesting cases the Name field is part of a larger record that would stretch into that other page anyway. regards, tom lane
Re: [HACKERS] Block B-Tree concept
On Friday 29 September 2006 10:55, Heikki Linnakangas wrote: Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: I'm not very interested in the case where you have a lot of equal keys, I think the bitmap index am is more suitable for that. that indeed you meant to write "consecutive", and I've got a problem with that: define "consecutive". In a datatype independent fashion, please. I also wonder how you are going to implement splitting and merging of runs, which will certainly be necessary if this isn't to be a constantly-requires-REINDEX thing. I don't mean consecutive as in 1, 2, 3, 4, ... without gaps but as in A and B are consecutive in the index, if there's no value X in the index so that A < X < B. Maybe there's a better word for that. http://en.wikipedia.org/wiki/Monotonic jan -- Jan de Visser [EMAIL PROTECTED] Baruk Khazad! Khazad ai-menu!
Re: [HACKERS] Backup and restore through JDBC
Andrew Dunstan [EMAIL PROTECTED] writes: Then after you recover from your head exploding you start devising some sort of sane API ... That's the hard part. There is no percentage in having a library if it doesn't do anything significantly different from what you could accomplish via system("pg_dump ...switches"); What is it you hope to accomplish by having a library, exactly? (And don't say "more control over the dump process". pg_dump is already on the hairy edge of maintainability; we do *not* need to try to deal with making it still function correctly after an application programmer makes some random intervention in the process.) regards, tom lane
Re: [HACKERS] Another idea for dealing with cmin/cmax
Jim C. Nasby [EMAIL PROTECTED] writes: Dumb question... wouldn't getting down to 20 bytes buy us something? Only on 32-bit machines, which are getting less interesting as database servers every day. (Just last night I was reading somebody opining that the transition to 64-bit hardware would be effectively complete by 2008 ... and he was talking about desktop PCs, not serious iron.) BTW, the apparently useless byte after the 27- or 23-byte header actually has some good use: in a table of up to 8 columns, you can fit a null bitmap there for free. In a scheme that took us down to 20 rather than 19 bytes, even a narrow table would pay the full maxalign price for having a null. I'm in favor of combining cmin/cmax/xvac to get us down to 23 bytes, but I think anything beyond that is going to face a serious problem of greatly increased cost for diminishing returns. regards, tom lane
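The alignment arithmetic behind the null-bitmap remark can be sketched quickly (my numbers as an illustration; real PostgreSQL padding depends on the platform's MAXALIGN):

```python
def maxalign(n, align):
    """Round n up to the next multiple of align, like MAXALIGN()."""
    return -(-n // align) * align

# With MAXALIGN 4 (32-bit): a 19- or 23-byte header leaves free padding
# for a one-byte null bitmap (covering up to 8 columns), while a 20-byte
# header would pay a full alignment step for the same bitmap byte.
for hdr in (27, 23, 20, 19):
    print(hdr, maxalign(hdr, 4), maxalign(hdr + 1, 4))
```

On this model, 23 rounds to 24 with or without the extra byte (the null bitmap is free), but 20 rounds to 20 alone and 24 with it, which is the "full maxalign price" in the message above.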
Re: [HACKERS] Block B-Tree concept
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: define consecutive. I don't mean consecutive as in 1, 2, 3, 4, ... without gaps but as in A and B are consecutive in the index, if there's no value X in the index so that A X B. Maybe there's a better word for that. Um, but how are you going to make that work without storing the keys for both A and B? You won't be able to tell whether an incoming key C that's just a bit bigger than A should go before or after B. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] Array assignment behavior (was Re: [ADMIN] Stored procedure array limits)
[ expanding this thread, as it now needs wider discussion ] Paul B. Anderson [EMAIL PROTECTED] writes: Actually, I was not filling all of the arrays in sequential order. I added code to initialize them in order and the function seems to be working now. Is that a known problem? Well, it's a documented behavior: section 8.10.4 saith "A stored array value can be enlarged by assigning to an element adjacent to those already present, or by assigning to a slice that is adjacent to or overlaps the data already present." Up to 8.2 we didn't have a lot of choice about this, because without any ability to have nulls embedded in arrays, there wasn't any sane thing to do with the intermediate positions if you assigned to an element not adjacent to the existing range. As of 8.2 we could allow assignment to arbitrary positions by filling the intermediate positions with nulls. The code hasn't actually been changed to allow that, but it's something we could consider doing now. Comments? regards, tom lane
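The proposed fill-with-nulls behavior looks like this in Python terms (my analogy, with Python's None standing in for SQL NULL; plain Python lists raise IndexError instead, much like PostgreSQL's current refusal):

```python
def assign(arr, idx, value):
    """Assign arr[idx] = value, padding any gap with None,
    as the proposal would pad with SQL NULLs."""
    if idx >= len(arr):
        arr.extend([None] * (idx - len(arr) + 1))
    arr[idx] = value
    return arr

print(assign([1, 2, 3], 6, 99))  # [1, 2, 3, None, None, None, 99]
```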
Re: [HACKERS] Block B-Tree concept
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: I don't mean consecutive as in 1, 2, 3, 4, ... without gaps but as in A and B are consecutive in the index, if there's no value X in the index so that A < X < B. Maybe there's a better word for that. Um, but how are you going to make that work without storing the keys for both A and B? You won't be able to tell whether an incoming key C that's just a bit bigger than A should go before or after B. Let me describe the insertion algorithm: 1. To insert a tuple with key X, we find the position in the index where the new tuple would go, just like with a normal B-tree. Let's call the index tuple right before the position A and the next tuple B. So according to normal B-tree rules, X should go between A and B. 2. If A points to the same heap page as X, we set the bit representing the offset of the new tuple in the index tuple A (this might require enlarging the index tuple and even splitting the page), and we're done. If it points to a different page, we need to split the range A-B to A-X-B, proceed to step 3. 3. To split the range A-B, scan the heap page to see which of the tuples pointed to by A are >= X and which are < X. If there are no tuples >= X, insert a new index tuple for X, and we're done. Otherwise, let Y be the smallest tuple >= X. Insert a new index tuple for Y, containing all the offsets with keys >= X, and remove those offsets from A. We have now split A-B to A-Y-B so that A < X <= Y < B. 4. Insert the new index tuple for X. (I'm not sure I got the above description correct for cases with equal keys.) Note that we don't keep track of the ordering of tuples that are clumped into a single index tuple. That's not important, I/O wise, because if you're going to fetch a heap page into memory, you might as well scan all the tuples on it and sort them if necessary. That's where the space and I/O savings come from. 
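The four steps can be condensed into a toy model (assumptions mine, not working PostgreSQL code: plain Python structures stand in for index entries and heap pages, and the equal-keys caveat above applies here too):

```python
def find_pos(index, key):
    """Index of the last entry whose first key is <= key, or -1 (entry A)."""
    pos = -1
    for i, entry in enumerate(index):
        if entry[0] <= key:
            pos = i
    return pos

def insert(index, heap, key, block, off):
    """index: sorted list of [first_key, heap_block, offset_set].
    heap: dict block -> {offset: key}."""
    heap.setdefault(block, {})[off] = key
    pos = find_pos(index, key)                    # step 1: locate A
    if pos >= 0 and index[pos][1] == block:
        index[pos][2].add(off)                    # step 2: same heap page
        return
    if pos >= 0:
        a_key, a_blk, a_offs = index[pos]
        moved = {o for o in a_offs if heap[a_blk][o] >= key}
        if moved:                                 # step 3: split A into A, Y
            index[pos][2] -= moved
            y_key = min(heap[a_blk][o] for o in moved)
            index.insert(pos + 1, [y_key, a_blk, moved])
    index.insert(pos + 1, [key, block, {off}])    # step 4: new entry for X
```

A short run: inserting keys 10 and 12 on heap block 1 yields one packed entry; inserting key 11 on block 2 then splits that entry's run around the newcomer.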
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[HACKERS] a little doubt about domains and pl/python
Hi all. I'm playing with pl/python and I have a doubt. I want to create a domain for roman numerals. I have made:

CREATE OR REPLACE FUNCTION valid_roman(text) RETURNS BOOLEAN
IMMUTABLE LANGUAGE plpythonu AS $$
from roman9 import fromRoman
roman = args[0]
try:
    r = fromRoman(roman)
    return 'true'
except:
    return 'false'
$$;

CREATE DOMAIN romano AS text CHECK (valid_roman(VALUE));

CREATE OR REPLACE FUNCTION to_roman(integer) RETURNS romano
IMMUTABLE LANGUAGE plpythonu AS $$
from roman9 import toRoman
number = args[0]
try:
    r = toRoman(number)
    return r
except:
    return 'false'
$$;

CREATE CAST (integer AS romano) WITH FUNCTION to_roman(integer);

And that works:

SELECT to_roman(50)
SELECT to_roman(100)
SELECT valid_roman('XX')
SELECT valid_roman('XXX')
SELECT valid_roman('')

But my cast doesn't work:

SELECT 5::romano
ERROR: value for domain romano violates check constraint "romano_check"

What's wrong? []'s - Walter
Re: [HACKERS] [COMMITTERS] pgsql: Fix IS NULL and IS NOT NULL tests on row-valued expressions to
Teodor Sigaev [EMAIL PROTECTED] writes: 9.17.5. Row-wise Comparison These constructs test a row value for null or not null. A row value is considered not null if it has at least one field that is not null. Wups, I missed that part of the docs, will fix. Thanks. regards, tom lane
Re: [HACKERS] [GENERAL] Array assignment behavior (was Re: [ADMIN] Stored procedure array limits)
On Sep 29, 2006, at 9:14 AM, Tom Lane wrote: [...] As of 8.2 we could allow assignment to arbitrary positions by filling the intermediate positions with nulls. The code hasn't actually been changed to allow that, but it's something we could consider doing now. Comments? At first blush, this strikes me as a bit too magical/implicit. Are there other languages where sequences behave similarly? The best analogy that comes to mind is sparse files, but in that case there is an implicit contract that the intervening empty regions do not actually occupy physical space; it doesn't sound like that's true here. I think the result of this change would be more difficult debugging of off-by-one errors and their ilk, rather than actually being a real benefit. OTOH, perhaps there is a real use-case I am missing here. I don't see the rest of this thread on GENERAL and I couldn't find it searching the archives; where did it come from? -Casey
Re: [HACKERS] [GENERAL] Array assignment behavior (was Re: [ADMIN] Stored procedure array limits)
As of 8.2 we could allow assignment to arbitrary positions by filling the intermediate positions with nulls. The code hasn't actually been changed to allow that, but it's something we could consider doing now. At first blush, this strikes me as a bit too magical/implicit. Are there other languages where sequences behave similarly?

perl -e '@A = (1, 2, 3); print "@A\n"; $A[10] = 10; print "@A\n";'
1 2 3
1 2 3        10

- John D. Burger MITRE
Re: [HACKERS] Backup and restore through JDBC
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Then after you recover from your head exploding you start devising some sort of sane API ... That's the hard part. There is no percentage in having a library if it doesn't do anything significantly different from what you could accomplish via system("pg_dump ...switches"); What is it you hope to accomplish by having a library, exactly? (And don't say "more control over the dump process". Some more progress feedback would be really nice. pg_dump is already on the hairy edge of maintainability; we do *not* need to try to deal with making it still function correctly after an application programmer makes some random intervention in the process.) Agreed. The only sane approach seems to be a single dump function call (that takes a set of parameters as prepared by command line scanning) and a set of callbacks that enable API users to do sensible stuff at different stages of the backup process. Regards, Andreas
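The callback shape Andreas describes could look roughly like this (a hypothetical sketch, all names mine; real pg_dump exposes no such API today, and the stages here are invented for illustration):

```python
def dump(params, on_stage=None):
    """Pretend-dump: walk through fixed stages, reporting progress through
    a caller-supplied callback instead of letting the caller intervene."""
    stages = ("connect", "lock tables", "dump schema", "dump data", "finish")
    for n, stage in enumerate(stages, 1):
        if on_stage:
            on_stage(stage, n, len(stages))
        # ... the real work for each stage would happen here ...
    return "done"

progress = []
dump({"dbname": "test"}, on_stage=lambda s, n, total: progress.append(s))
print(progress)
```

The design point matches the thread: callbacks give applications the progress feedback Andreas wants, without handing them hooks to change the dump's behavior mid-flight.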
[HACKERS] Nulls, arrays, records, IS NULL, IS DISTINCT FROM
Following up yesterday's discussion, I've been studying the SQL spec for <null predicate> and <distinct predicate>, and it seems a bit inconsistent. The rules for <distinct predicate> make it clear that you are supposed to drill down into row and array values to determine distinctness. SQL99 has a) If the declared type of X or Y is an array type, then X IS DISTINCT FROM Y is effectively computed as follows: i) Let NX be the number of elements in X; let NY be the number of elements in Y. ii) Let EX(i) be the i-th element of X; let EY(i) be the i-th element of Y. iii) Case: 1) If NX is not equal to NY, then X IS DISTINCT FROM Y is true. 2) If NX equals zero and NY equals zero, then X IS DISTINCT FROM Y is false. 3) If EX(i) IS DISTINCT FROM EY(i) is false for all i between 1 (one) and NX, then X IS DISTINCT FROM Y is false. 4) Otherwise, X IS DISTINCT FROM Y is true. SQL2003 has completely rewritten the text but the meaning seems the same. I suppose we want to generalize the NX/NY business to say if the array bounds are not identical then the arrays are distinct. We are clearly getting this wrong since the introduction of nulls in arrays, but I'll go fix that. Similarly, given two row expressions, distinctness is determined field-wise: X and Y are distinct if any two corresponding fields are distinct. We are currently getting this correct only for the case of parse-time ROW expressions, ie ROW(x,y,z) IS [NOT] DISTINCT FROM ROW(xx,yy,zz) This is pretty much analogous to the case Teodor noted yesterday for IS NULL: it's not being done in gram.y but it's still being done much too early. We need to be able to do it in the executor to handle situations where a row value is coming from a function or some other source that's not disassemblable at parse time. What's bothering me is that for foo IS [NOT] NULL, the spec clearly prescribes drilling down into a rowtype value to examine the individual fields, but I can't find any language that prescribes the same for arrays. 
Is this intentional, or an oversight? In particular, the spec says ROW(1,2,NULL) IS NOT NULL is false, because the row fields must be *all* not null to make it true. But it's very unclear whether ARRAY[1,2,NULL] IS NOT NULL should be false on the same reasoning. Right now, we respond true on the grounds that the array object as-a-whole isn't null, without examining its contents. Comments? Does anyone see any guidance in the spec? If there is none, which behavior do we think is most useful/consistent? regards, tom lane
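The row-value semantics under discussion condense into executable form as follows (my paraphrase of the spec rules quoted in the thread): IS NULL requires all fields null, IS NOT NULL requires all fields non-null, so a mixed row makes both predicates false.

```python
def row_is_null(fields):
    """Spec semantics of <row> IS NULL: every field is null."""
    return all(f is None for f in fields)

def row_is_not_null(fields):
    """Spec semantics of <row> IS NOT NULL: every field is non-null."""
    return all(f is not None for f in fields)

row = (1, 2, None)
print(row_is_null(row), row_is_not_null(row))  # False False
```

This is exactly the non-inverse behavior at issue: for ROW(1,2,NULL) both predicates come out false, whereas PostgreSQL's then-current treatment of ARRAY[1,2,NULL] ignored the contents entirely.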
Re: [HACKERS] Win32 hard crash problem
Joshua D. Drake wrote: Tom Lane wrote: Joshua D. Drake [EMAIL PROTECTED] writes: Yes, unfortunately there isn't much more to be had for another 2 weeks ;) I trust they've got the reboot time and they will know exactly how long from reboot to problem? I'm not all that sold on the GetTickCount overflow theory, but certainly we ought not be missing a chance to test or disprove it. Yes I documented all conversations and disclaimers :) O.k. further on this.. the crashing is happening quickly now but not predictably (as in sometimes a week, sometimes 2 days). I just now got them to send some further logs... Interestingly: 2006-09-28 16:38:37.406 LOG: could not send data to client: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full. That log entry is the last (of consequence) entry before the machine says: 2006-09-28 16:40:36.921 LOG: received fast shutdown request 2006-09-28 16:40:36.921 LOG: aborting any active transactions 2006-09-28 16:40:36.921 FATAL: terminating connection due to administrator command On the ERROR side of things I have a bunch of standard, unique key violations etc... AND: postgresql-2006-09-27_00.log:2006-09-27 23:49:57.671 FATAL: could not read from statistics collector pipe: No error I have requested a clean run with entire log at DEBUG2. Hopefully that will give us more info. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Re: [HACKERS] JAVA Support
On Sep 28, 2006, at 9:35 PM, Tom Lane wrote: Joshua D. Drake [EMAIL PROTECTED] writes: Is there any reason why we haven't built a generic authentication API? Something like PAM, except cross platform? We're database geeks, not security/crypto/authentication geeks. What makes you think we have any particular competence to do the above? Actually, the part of this proposal that raised my hackles the most was the claim that GSSAPI provides a generic auth API, because that was exactly the bill of goods we were sold in connection with PAM. (So why is this our problem at all --- can't you make a PAM plugin for it??) It didn't help any that that was shortly followed by the lame admission that no one has ever implemented anything except Kerberos underneath it. Word to the wise, guys: go *real* soft on vaporware claims for auth stuff, because we've seen enough of those before. Well, that's why I was pushing SASL instead of GSSAPI. There are multiple mechanisms that are actually in use. PAM turned out not to be sufficiently specified for cross-platform behavioral compatibility, and it only does password checking anyway. Calling it a security solution is a big overstatement IMO. I guess a lot of people use PAM with SSL and don't worry about the gap between the two (which SASL or GSSAPI close). In defense of GSSAPI non-Kerberos mechanisms do exist. They just cost money and they aren't very cross-platform. AFAIK GSSAPI has no simple password mechanisms. There's a Microsoft-compatible SPNEGO mechanism for GSSAPI that's being implemented fairly widely now, but it's just a sub-negotiation mech that lets you choose between a Kerberos 5 (that's practically identical to the direct one), and NTLM. If you allow NTLM you'd better limit it to NTLMv2! The opinions expressed in this message are mine, not those of Caltech, JPL, NASA, or the US Government. 
[EMAIL PROTECTED], or [EMAIL PROTECTED]
Re: [HACKERS] JAVA Support
On Sep 29, 2006, at 12:31 AM, Magnus Hagander wrote: However, that doesn't change that some people would like us to support GSSAPI, and there may be some benefit (additional applications, better network authentication, etc.) for doing so. If we can get additional programmers to code the support (i.e. Sun, JPL) I don't see any reason not to support the *additional* authentication methods. Well, as I said already, a lot depends on the size of the patch. As a reductio ad absurdum, if they drop 100K lines of code on us, it *will* get rejected, no matter how cool it is. Oh, absolutely. The current Kerberos support seems to require about 50 lines in configure.in and circa 200 lines of C code in each of the backend and libpq. Plus a dependency on an outside library that happens to be readily available and compatibly licensed. I would expect, without looking at the details of the API, GSSAPI to be about the same amount of code if not less. Probably save some Kerberos bookkeeping. Probably lose it with GSSAPI bookkeeping, including name translation (which is far less obvious). Net, I would expect to lose, but not by very much. What amount of code are we talking about adding here, and what dependencies exactly? What portability and license hazards will be added? The Kerberos5 libraries that we rely on today provide GSSAPI. So it would work with the same external library. Now, it could *also* work with other libraries in some cases (for example, the Win32 SSPI libraries), but with the same libraries it should work fine. //Magnus If I had a lot of time to spend on this I would write a SASL-like wrapper so it could be used on platforms with GSSAPI, but not SASL support in the OS. As you may have noticed, I believe SASL is the way to go. I'm not up for it though. There's probably room in the world for a SASL-lite library though. Cyrus is great, but if your OS doesn't supply it for you, it's supposed to be really hard to build. 
The opinions expressed in this message are mine, not those of Caltech, JPL, NASA, or the US Government. [EMAIL PROTECTED], or [EMAIL PROTECTED]
[HACKERS] Testing strlcpy ()
We've completed some tests comparing Postgres 8.2beta1 (beta1) and Postgres 8.2beta1 with strlcpy() (strlcpy). First and foremost, the patch seems to be stable; we have not run into any issues with it. After a database reload, there is an 11% difference between strlcpy and beta1: strlcpy (107 TPS) performs ~11% better than beta1 (96 TPS). This seems to be consistent over a number of runs. Originally, we had only seen a 3% difference here. After 8 (or so) runs, the difference between the two versions disappears and both versions produce ~250 TPS. Both versions, over time, spend more and more time in LWLockAcquire. David
Re: [HACKERS] Nulls, arrays, records, IS NULL, IS DISTINCT FROM
On Fri, Sep 29, 2006 at 12:53:19PM -0400, Tom Lane wrote: Following up yesterday's discussion, I've been studying the SQL spec for null predicate and distinct predicate, and it seems a bit inconsistent. Do you have the official spec in hand, or just the draft from wiscorp? SQL2003 has completely rewritten the text but the meaning seems the same. I suppose we want to generalize the NX/NY business to say if the array bounds are not identical then the arrays are distinct. That sounds like a reasonable generalization. We are clearly getting this wrong since the introduction of nulls in arrays, but I'll go fix that. Similarly, given two row expressions, distinctness is determined field-wise: X and Y are distinct if any two corresponding fields are distinct. We are currently getting this correct only for the case of parse-time ROW expressions, ie ROW(x,y,z) IS [NOT] DISTINCT FROM ROW(xx,yy,zz) This is pretty much analogous to the case Teodor noted yesterday for IS NULL: it's not being done in gram.y but it's still being done much too early. We need to be able to do it in the executor to handle situations where a row value is coming from a function or some other source that's not disassemblable at parse time. What's bothering me is that for foo IS [NOT] NULL, the spec clearly prescribes drilling down into a rowtype value to examine the individual fields, but I can't find any language that prescribes the same for arrays. Is this intentional, or an oversight? In particular, the spec says ROW(1,2,NULL) IS NOT NULL is false, because the row fields must be *all* not null to make it true. That's odd because as I understand the above, ROW(1,2,NULL) IS NULL is also false. Have I misunderstood? But it's very unclear whether ARRAY[1,2,NULL] IS NOT NULL should be false on the same reasoning. Right now, we respond true on the grounds that the array object as-a-whole isn't null, without examining its contents. Comments? Does anyone see any guidance in the spec? 
Sadly, I don't have the official one in hand. Is there one available? If there is none, which behavior do we think is most useful/consistent? My reading of the (provisional) spec is that a complex type (ROW, ARRAY, MULTISET) is NULL iff all of its elements are NULL or the whole of it is NULL. Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote!
Re: [HACKERS] Nulls, arrays, records, IS NULL, IS DISTINCT FROM
David Fetter [EMAIL PROTECTED] writes: On Fri, Sep 29, 2006 at 12:53:19PM -0400, Tom Lane wrote: In particular, the spec says ROW(1,2,NULL) IS NOT NULL is false, because the row fields must be *all* not null to make it true. That's odd because as I understand the above, ROW(1,2,NULL) IS NULL is also false. Have I misunderstood? Yup, they are both false. The spec goes to some trouble to make it clear that IS NULL/IS NOT NULL are not inverses for row values: "NOTE 135 - For all R, R IS NOT NULL has the same result as NOT R IS NULL if and only if R is of degree 1." Table 16, <null predicate> semantics, specifies this behavior. What I find odd is the lack of comparable language about arrays. regards, tom lane
Re: [HACKERS] a little doubr about domains and pl/python
Walter Cruz [EMAIL PROTECTED] writes: CREATE CAST (integer AS romano) WITH FUNCTION to_roman(integer); What's wrong? Domains aren't supported as cast source/targets at the moment; I don't think the system is finding your cast specification at all, but is instead doing this as 5::text::romano which of course fails. See past discussions about the subtleties of trying to fix this. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Array assignment behavior (was Re: [ADMIN] Stored procedure array
It seems that the suggestion to fill intermediate positions with NULLs would be preferable to the current behavior. I know of no requirement to populate arrays in sequence in any other language so I think other programmers would be surprised too by the current behavior. Paul Tom Lane wrote: [ expanding this thread, as it now needs wider discussion ] "Paul B. Anderson" [EMAIL PROTECTED] writes: Actually, I was not filling all of the arrays in sequential order. I added code to initialize them in order and the function seems to be working now. Is that a known problem? Well, it's a documented behavior: section 8.10.4 saith A stored array value can be enlarged by assigning to an element adjacent to those already present, or by assigning to a slice that is adjacent to or overlaps the data already present. Up to 8.2 we didn't have a lot of choice about this, because without any ability to have nulls embedded in arrays, there wasn't any sane thing to do with the intermediate positions if you assigned to an element not adjacent to the existing range. As of 8.2 we could allow assignment to arbitrary positions by filling the intermediate positions with nulls. The code hasn't actually been changed to allow that, but it's something we could consider doing now. Comments? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match .
Re: [HACKERS] [GENERAL] Array assignment behavior (was Re: [ADMIN] Stored procedure
Yep, that definitely threw me the first time I encountered it. Paul B. Anderson wrote: It seems that the suggestion to fill intermediate positions with NULLs would be preferable to the current behavior. I know of no requirement to populate arrays in sequence in any other language so I think other programmers would be surprised too by the current behavior. Paul Tom Lane wrote: [ expanding this thread, as it now needs wider discussion ] Paul B. Anderson [EMAIL PROTECTED] writes: Actually, I was not filling all of the arrays in sequential order. I added code to initialize them in order and the function seems to be working now. Is that a known problem? Well, it's a documented behavior: section 8.10.4 saith A stored array value can be enlarged by assigning to an element adjacent to those already present, or by assigning to a slice that is adjacent to or overlaps the data already present. Up to 8.2 we didn't have a lot of choice about this, because without any ability to have nulls embedded in arrays, there wasn't any sane thing to do with the intermediate positions if you assigned to an element not adjacent to the existing range. As of 8.2 we could allow assignment to arbitrary positions by filling the intermediate positions with nulls. The code hasn't actually been changed to allow that, but it's something we could consider doing now. Comments? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match . -- erik jones [EMAIL PROTECTED] software development emma(r) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [GENERAL] Array assignment behavior (was Re: [ADMIN] Stored procedure array limits)
John D. Burger [EMAIL PROTECTED] writes: As of 8.2 we could allow assignment to arbitrary positions by filling the intermediate positions with nulls. The code hasn't actually been changed to allow that, but it's something we could consider doing now. At first blush, this strikes me as a bit too magical/implicit. Are there other languages where sequences behave similarly?

perl -e '@A = (1, 2, 3); print "@A\n"; $A[10] = 10; print "@A\n";'
1 2 3
1 2 3        10

Actually, now that I look closely, I think the SQL spec demands exactly this. Recall that SQL99 only allows one-dimensional, lower-bound-one arrays. The specification for UPDATE ... SET C[I] = SV ... reads Case: i) If the value of C is null, then an exception condition is raised: data exception - null value in array target. ii) Otherwise: 1) Let N be the maximum cardinality of C. 2) Let M be the cardinality of the value of C. 3) Let I be the value of the simple value specification immediately contained in update target. 4) Let EDT be the element type of C. 5) Case: A) If I is greater than zero and less than or equal to M, then the value of C is replaced by an array A with element type EDT and cardinality M derived as follows: I) For j varying from 1 (one) to I-1 and from I+1 to M, the j-th element in A is the value of the j-th element in C. II) The I-th element of A is set to the specified update value, denoted by SV, by applying the General Rules of Subclause 9.2, Store assignment, to the I-th element of A and SV as TARGET and VALUE, respectively. B) If I is greater than M and less than or equal to N, then the value of C is replaced by an array A with element type EDT and cardinality I derived as follows: I) For j varying from 1 (one) to M, the j-th element in A is the value of the j-th element in C. II) For j varying from M+1 to I-1, the j-th element in A is the null value. 
III) The I-th element of A is set to the specified update value, denoted by SV, by applying the General Rules of Subclause 9.2, Store assignment, to the I-th element of A and SV as TARGET and VALUE, respectively. C) Otherwise, an exception condition is raised: data exception - array element error. We currently violate case i by allowing the null array value to be replaced by a single-element array. I'm disinclined to change that, as I think our behavior is more useful than the spec's. But case ii.5.B pretty clearly describes null-fill, so I think we'd better do that, now that we can. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
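Read literally, case ii.5.B amounts to the null-fill rule discussed in this thread. Here is a rough C sketch of that rule; the array representation, MAXCARD, and all names are invented for illustration and are not backend code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch of case ii.5.B: assigning to C[I] with I beyond the current
 * cardinality M extends the array to cardinality I and null-fills
 * positions M+1 .. I-1.  Nulls are modeled with a parallel bool array. */
#define MAXCARD 16				/* N, the maximum cardinality, for this sketch */

typedef struct
{
	int		elem[MAXCARD];
	bool	isnull[MAXCARD];
	size_t	card;				/* M, the current cardinality */
} IntArray;

/* Returns false for case ii.5.C: "data exception - array element error". */
static bool
array_set(IntArray *a, size_t i, int sv)
{
	if (i < 1 || i > MAXCARD)
		return false;			/* I out of range: case ii.5.C */
	for (size_t j = a->card + 1; j < i; j++)
	{
		/* case ii.5.B.II: null-fill the gap between M+1 and I-1 */
		a->elem[j - 1] = 0;
		a->isnull[j - 1] = true;
	}
	a->elem[i - 1] = sv;		/* store assignment of SV into the I-th element */
	a->isnull[i - 1] = false;
	if (i > a->card)
		a->card = i;			/* new cardinality is I (case ii.5.B) */
	return true;
}
```

Assigning to element 6 of a three-element array thus yields cardinality 6 with elements 4 and 5 null, which is the behavior being proposed for 8.2.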
Re: [HACKERS] [GENERAL] Array assignment behavior (was Re: [ADMIN] Stored procedure
Ok, just so I can be sure I understand what I just read: i. says that you can assign to an array that has not been initialized. ii. states that the index of an insertion into an array should not be limited by the current range of index values of the array and requires any gaps in the index range to be set with values of null. I really don't have anything to add to discussion other than that I agree with Tom's assessment, but rather want clarify what might be a slightly hazy interpretation of the specs listed below. Tom Lane wrote: Actually, now that I look closely, I think the SQL spec demands exactly this. Recall that SQL99 only allows one-dimensional, lower-bound-one arrays. The specification for UPDATE ... SET C[I] = SV ... reads Case: i) If the value of C is null, then an exception condition is raised: data exception - null value in array target. ii) Otherwise: 1) Let N be the maximum cardinality of C. 2) Let M be the cardinality of the value of C. 3) Let I be the value of the simple value specification immediately contained in update target. 4) Let EDT be the element type of C. 5) Case: A) If I is greater than zero and less than or equal to M, then the value of C is replaced by an array A with element type EDT and cardinality M derived as follows: I) For j varying from 1 (one) to I-1 and from I+1 to M, the j-th element in A is the value of the j-th element in C. II) The I-th element of A is set to the specified update value, denoted by SV, by applying the General Rules of Subclause 9.2, Store assignment, to the I-th element of A and SV as TARGET and VALUE, respectively. B) If I is greater than M and less than or equal to N, then the value of C is replaced by an array A with element type EDT and cardinality I derived as follows: I) For j varying from 1 (one) to M, the j-th element in A is the value of the j-th element in C. II) For j varying from M+1 to I-1, the j-th element in A is the null value. 
III) The I-th element of A is set to the specified update value, denoted by SV, by applying the General Rules of Subclause 9.2, Store assignment, to the I-th element of A and SV as TARGET and VALUE, respectively. C) Otherwise, an exception condition is raised: data exception - array element error. We currently violate case i by allowing the null array value to be replaced by a single-element array. I'm disinclined to change that, as I think our behavior is more useful than the spec's. But case ii.5.B pretty clearly describes null-fill, so I think we'd better do that, now that we can. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- erik jones [EMAIL PROTECTED] software development emma(r) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Per-database search_path
Folks, I'd like to propose a feature for 8.3, which would let login roles have a default search_path both globally, as it is now, and per-database. This is because in general no two databases have any schemas in common, and a login role should be able to do something reasonable just by connecting. What would be involved with making this so? So far, I can see:
* Storing this information in pg_catalog somehow
* Setting the search_path when a login role connects
Cheers, D
Re: [HACKERS] Per-database search_path
DF, I'd like to propose a feature for 8.3, which would let login roles have a default search_path both globally, as it is now, and per-database. This is because in general no two databases have any schemas in common, and a login role should be able to do something reasonable just by connecting. Yes. I've been bitten by this more than once ... However, it almost seems like this would become a piece of the other per-database-user stuff we'd like to do, like local superuser. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Per-database search_path
On Fri, Sep 29, 2006 at 01:06:09PM -0700, Josh Berkus wrote: DF, I'd like to propose a feature for 8.3, which would let login roles have a default search_path both globally, as it is now, and per-database. This is because in general no two databases have any schemas in common, and a login role should be able to do something reasonable just by connecting. Yes. I've been bitten by this more than once ... Come to think of it, this ties in with the ON (DIS)CONNECT TRIGGER idea that others have brought up in the past. The trigger idea may be too big a hammer, and might even be ill-advised from a design point of view, but I thought I'd bring it up anyhow. However, it almost seems like this would become a piece of the other per-database-user stuff we'd like to do, like local superuser. I'm not sure that's the same. The thing about superuser as it exists now is the ability to write to the filesystem, which means that there's no boundary really possible. Maybe some kind of tiered system of users with more than 2 tiers...but that sounds pretty byzantine to me. Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778AIM: dfetter666 Skype: davidfetter Remember to vote! ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Faster StrNCpy
On Fri, Sep 29, 2006 at 11:21:21AM +0200, Markus Schaber wrote: Tom Lane wrote: Just wondering - are any of these cases where a memcpy() would work just as well? Or are you not sure that the source string is at least 64 bytes in length? In most cases, we're pretty sure that it's *not* --- it'll just be a palloc'd C string. I'm disinclined to fool with the restriction that namestrcpy zero-pad Name values, because they might end up on disk, and allowing random memory contents to get written out is ungood from a security point of view. There's another disadvantage of always copying 64 bytes: It may be that the 64-byte range crosses a page boundary. Now guess what happens when this next page is not mapped - segfault. With strncpy(), this possibility already exists. If it is a real problem, that stand-alone 64-byte allocations are crossing page boundaries, the fault is with the memory allocator, not with the user of the memory. For strlcpy(), my suggestion that Tom quotes was that modern processors do best when instructions can be fully parallelized. It is a lot easier to parallelize a 64-byte copy, than a tight loop looking for '\0' or n == 64. 64 bytes easily fits into cache memory, and modern processors write cache memory in blocks of 16, 32, or 64 bytes anyways, meaning that any savings in terms of not writing are minimal. But it's only safe if you know that the source string allocation is >= 64 bytes. Often you don't, therefore it isn't safe, and the suggestion is unworkable. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... 
http://mark.mielke.cc/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
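The zero-padding restriction Tom wants to keep is exactly what strncpy() guarantees and what a plain 64-byte memcpy() from a short palloc'd string would break. A small stand-alone C demonstration (a sketch only; the 64-byte buffer mirrors a Name's size and the function name is made up):

```c
#include <assert.h>
#include <string.h>

/* strncpy() copies at most n bytes and, if the source is shorter,
 * zero-fills the rest of the destination.  That padding is what keeps
 * stale memory contents from leaking into on-disk Name values; a plain
 * 64-byte memcpy() from a short string would copy whatever happens to
 * follow the string in memory. */
static void
demo_zero_pad(char *dst)
{
	memset(dst, 'x', 64);		/* simulate stale memory contents */
	strncpy(dst, "hi", 64);		/* copies "hi\0" and zero-fills the rest */
}
```

After the call, every byte past the copied string is '\0' rather than the old 'x' filler, which is the security property being discussed.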
Re: [HACKERS] Faster StrNCpy
If anybody is curious, here are my numbers for an AMD X2 3800+:

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to be slow.' -o x x.c y.c strlcpy.c ; ./x
NONE:    620268 us
MEMCPY:  683135 us
STRNCPY: 7952930 us
STRLCPY: 10042364 us

$ gcc -O3 -std=c99 -DSTRING='Short sentence.' -o x x.c y.c strlcpy.c ; ./x
NONE:    554694 us
MEMCPY:  691390 us
STRNCPY: 7759933 us
STRLCPY: 3710627 us

$ gcc -O3 -std=c99 -DSTRING='' -o x x.c y.c strlcpy.c ; ./x
NONE:    631266 us
MEMCPY:  775340 us
STRNCPY: 7789267 us
STRLCPY: 550430 us

Each invocation represents 100 million calls to each of the functions. Each function accepts a 'dst' and 'src' argument, and assumes that it is copying 64 bytes from 'src' to 'dst'. The none function does nothing. The memcpy calls memcpy(), the strncpy calls strncpy(), and the strlcpy calls the strlcpy() that was posted from the BSD sources. (GLIBC doesn't have strlcpy() on my machine). This makes it clear what the overhead of the additional logic involves. memcpy() is approximately equal to nothing at all. strncpy() is always expensive. strlcpy() is often more expensive than memcpy(), except in the empty string case. These tests do not properly model the effects of real memory, however, they do model the effects of cache memory. I would suggest that the results are exaggerated, but not invalid. For anybody doubting the none vs memcpy, I've included the generated assembly code. I chalk it entirely up to fully utilizing the parallelization capability of the CPU. Although 16 movq instructions are executed, they can be executed fully in parallel. It almost makes it clear to me that all of these instructions are pretty fast. Are we sure this is a real bottleneck? Even the slowest operation above, strlcpy() on a very long string, appears to execute 10 per microsecond? Perhaps my tests are too easy for my CPU and I need to make it access many different 64-byte blocks? :-) Cheers, mark
	.file	"x.c"
	.text
	.p2align 4,,15
.globl x_none
	.type	x_none, @function
x_none:
.LFB14:
	rep ; ret
.LFE14:
	.size	x_none, .-x_none
	.p2align 4,,15
.globl x_strlcpy
	.type	x_strlcpy, @function
x_strlcpy:
.LFB17:
	movl	$64, %edx
	jmp	strlcpy
.LFE17:
	.size	x_strlcpy, .-x_strlcpy
	.p2align 4,,15
.globl x_strncpy
	.type	x_strncpy, @function
x_strncpy:
.LFB16:
	movl	$64, %edx
	jmp	strncpy
.LFE16:
	.size	x_strncpy, .-x_strncpy
	.p2align 4,,15
.globl x_memcpy
	.type	x_memcpy, @function
x_memcpy:
.LFB15:
	movq	(%rsi), %rax
	movq	%rax, (%rdi)
	movq	8(%rsi), %rax
	movq	%rax, 8(%rdi)
	movq	16(%rsi), %rax
	movq	%rax, 16(%rdi)
	movq	24(%rsi), %rax
	movq	%rax, 24(%rdi)
	movq	32(%rsi), %rax
	movq	%rax, 32(%rdi)
	movq	40(%rsi), %rax
	movq	%rax, 40(%rdi)
	movq	48(%rsi), %rax
	movq	%rax, 48(%rdi)
	movq	56(%rsi), %rax
	movq	%rax, 56(%rdi)
	ret
.LFE15:
	.size	x_memcpy, .-x_memcpy
	.section	.eh_frame,"a",@progbits
.Lframe1:
	.long	.LECIE1-.LSCIE1
.LSCIE1:
	.long	0x0
	.byte	0x1
	.string	"zR"
	.uleb128 0x1
	.sleb128 -8
	.byte	0x10
	.uleb128 0x1
	.byte	0x3
	.byte	0xc
	.uleb128 0x7
	.uleb128 0x8
	.byte	0x90
	.uleb128 0x1
	.align 8
.LECIE1:
.LSFDE1:
	.long	.LEFDE1-.LASFDE1
.LASFDE1:
	.long	.LASFDE1-.Lframe1
	.long	.LFB14
	.long	.LFE14-.LFB14
	.uleb128 0x0
	.align 8
.LEFDE1:
.LSFDE3:
	.long	.LEFDE3-.LASFDE3
.LASFDE3:
	.long	.LASFDE3-.Lframe1
	.long	.LFB17
	.long	.LFE17-.LFB17
	.uleb128 0x0
	.align 8
.LEFDE3:
.LSFDE5:
	.long	.LEFDE5-.LASFDE5
.LASFDE5:
	.long	.LASFDE5-.Lframe1
	.long	.LFB16
	.long	.LFE16-.LFB16
	.uleb128 0x0
	.align 8
.LEFDE5:
.LSFDE7:
	.long	.LEFDE7-.LASFDE7
.LASFDE7:
	.long	.LASFDE7-.Lframe1
	.long	.LFB15
	.long	.LFE15-.LFB15
	.uleb128 0x0
	.align 8
.LEFDE7:
	.ident	"GCC: (GNU) 4.1.1 20060525 (Red Hat 4.1.1-1)"
	.section	.note.GNU-stack,"",@progbits
Re: [HACKERS] Faster StrNCpy
[EMAIL PROTECTED] writes: If anybody is curious, here are my numbers for an AMD X2 3800+: You did not show your C code, so no one else can reproduce the test on other hardware. However, it looks like your compiler has unrolled the memcpy into straight-line 8-byte moves, which makes it pretty hard for anything operating byte-wise to compete, and is a bit dubious for the general case anyway (since it requires assuming that the size and alignment are known at compile time). This does make me wonder about whether we shouldn't try the strlen+memcpy implementation I proposed earlier ... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Win32 hard crash problem
Joshua D. Drake [EMAIL PROTECTED] writes: O.k. further on this.. the crashing is happening quickly now but not predictably. (as in sometimes a week sometimes 2 days). OK, that seems to eliminate the GetTickCount-overflow theory anyway. That log entry is the last (of consequence) entry before the machine says: 2006-09-28 16:40:36.921 LOG: received fast shutdown request Oh? That's pretty interesting on a Windows machine, because AFAIK there wouldn't be any standard mechanism that might tie into our homegrown signal facility. Anyone have a theory on what might trigger a SIGINT to the postmaster, other than intentional pg_ctl invocation? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Faster StrNCpy
On Fri, Sep 29, 2006 at 05:34:30PM -0400, Tom Lane wrote: [EMAIL PROTECTED] writes: If anybody is curious, here are my numbers for an AMD X2 3800+: You did not show your C code, so no one else can reproduce the test on other hardware. However, it looks like your compiler has unrolled the memcpy into straight-line 8-byte moves, which makes it pretty hard for anything operating byte-wise to compete, and is a bit dubious for the general case anyway (since it requires assuming that the size and alignment are known at compile time). I did show the .s code. I call into x_memcpy(a, b), meaning that the compiler can't assume anything. It may happen to be aligned. Here are results over 64 Mbytes of memory, to ensure that every call is a cache miss:

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to be very slow.' '-DN=(1024*1024)' -o x x.c y.c strlcpy.c ; ./x
NONE:    767243 us
MEMCPY:  6044137 us
STRNCPY: 10741759 us
STRLCPY: 12061630 us
LENCPY:  9459099 us

$ gcc -O3 -std=c99 -DSTRING='Short sentence.' '-DN=(1024*1024)' -o x x.c y.c strlcpy.c ; ./x
NONE:    712193 us
MEMCPY:  6072312 us
STRNCPY: 9982983 us
STRLCPY: 6605052 us
LENCPY:  7128258 us

$ gcc -O3 -std=c99 -DSTRING='' '-DN=(1024*1024)' -o x x.c y.c strlcpy.c ; ./x
NONE:    708164 us
MEMCPY:  6042817 us
STRNCPY: 8885791 us
STRLCPY: 5592477 us
LENCPY:  6135550 us

At least on my machine, memcpy() still comes out on top. Yes, assuming that it is aligned correctly for the machine. Here is unaligned (all arrays are stored +1 offset in memory):

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to be very slow.' '-DN=(1024*1024)' -DALIGN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:    790932 us
MEMCPY:  6591559 us
STRNCPY: 10622291 us
STRLCPY: 12070007 us
LENCPY:  10322541 us

$ gcc -O3 -std=c99 -DSTRING='Short sentence.'
'-DN=(1024*1024)' -DALIGN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:    764577 us
MEMCPY:  6631731 us
STRNCPY: 9513540 us
STRLCPY: 6615345 us
LENCPY:  7263392 us

$ gcc -O3 -std=c99 -DSTRING='' '-DN=(1024*1024)' -DALIGN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:    825689 us
MEMCPY:  660 us
STRNCPY: 8976487 us
STRLCPY: 5878088 us
LENCPY:  6180358 us

Alignment looks like it does impact the results for memcpy(). memcpy() changes from around 6.0 seconds to 6.6 seconds. Overall, though, it is still the winner in all cases except for strlcpy(), which beats it on very short strings. Here is the cache hit case including your strlen+memcpy as 'LENCPY':

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to be very slow.' -DN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:    696157 us
MEMCPY:  825118 us
STRNCPY: 7983159 us
STRLCPY: 10787462 us
LENCPY:  6048339 us

$ gcc -O3 -std=c99 -DSTRING='Short sentence.' -DN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:    700201 us
MEMCPY:  593701 us
STRNCPY: 7577380 us
STRLCPY: 3727801 us
LENCPY:  3169783 us

$ gcc -O3 -std=c99 -DSTRING='' -DN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:    706283 us
MEMCPY:  792719 us
STRNCPY: 7870425 us
STRLCPY: 681334 us
LENCPY:  2062983 us

First call was every call being a cache hit. With this one, every one is a cache miss, and the 64-byte blocks are spread equally over 64 Mbytes of memory. I've attached the code for your consideration. x.c is the routines I used to perform the tests. y.c is the main program. strlcpy.c is copied from the online reference as is without change. The compilation steps are described above. STRING is the string to try out. N is the number of 64-byte blocks to allocate. ALIGN is the number of bytes to offset the array by when storing / reading / writing. ALIGN should be >= 0. At N=1, it's all in cache. At N=1024*1024 it is taking up 64 Mbytes of RAM. Cheers, mark
#include <string.h>
#include <sys/types.h>

size_t strlcpy(char *dst, const char *src, size_t siz);

void x_none(char * restrict a, const char * restrict b)
{
    // Do nothing.
}

void x_memcpy(char * restrict a, const char * restrict b)
{
    memcpy(a, b, 64);
}

void x_strncpy(char * restrict a, const char * restrict b)
{
    strncpy(a, b, 64);
}

void x_strlcpy(char * restrict a, const char * restrict b)
{
    strlcpy(a, b, 64);
}

void x_strlenmemcpy(char * restrict a, const char * restrict b)
{
    size_t len = strlen(b) + 1;
    memcpy(a, b, len < 64 ? len : 64);
}

#include <stdio.h>
#include <string.h>
#include
Re: [HACKERS] Per-database search_path
On Fri, Sep 29, 2006 at 05:41:35PM -0400, Tom Lane wrote: David Fetter [EMAIL PROTECTED] writes: On Fri, Sep 29, 2006 at 01:06:09PM -0700, Josh Berkus wrote: However, it almost seems like this would become a piece of the other per-database-user stuff we'd like to do, like local superuser. I'm not sure that's the same. The thing about superuser as it exists now is the ability to write to the filesystem, which means that there's no boundary really possible. Yeah. ISTM the correct generalization is per-user per-database default GUC settings, which has nothing to do with superuserness. This sounds like a TODO for 8.3. What wrinkles might this involve? Offhand, I'm thinking that it would touch the inheritance stuff that roles have. Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778AIM: dfetter666 Skype: davidfetter Remember to vote! ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Per-database search_path
David Fetter [EMAIL PROTECTED] writes: On Fri, Sep 29, 2006 at 05:41:35PM -0400, Tom Lane wrote: Yeah. ISTM the correct generalization is per-user per-database default GUC settings, which has nothing to do with superuserness. This sounds like a TODO for 8.3. What wrinkles might this involve? Probably rethink the rolconfig/datconfig representation. Maybe it's time for a separate catalog for the settings. Offhand, I'm thinking that it would touch the inheritance stuff that roles have. No, it wouldn't, because defaults only apply at the instant of connection, so there's no inheritance or SET ROLE to worry about. Whatever role you log in as is what you get. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] The enormous s-childXids problem
On Saturday 16 September 2006 20:34, Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: The real question is why does the subtransaction actually assign itself an XID --- a simple RETURN NEXT operation ought not do that, AFAICS. I suspect the answer to that is the same as the answer to what's actually creating the subtransaction. plperl_return_next doesn't. I think something must be doing an actual SPI query, not just a return next. The other question on the table is why it didn't respond to QueryCancel in a reasonable amount of time. I'd really like to see a complete test case for this problem ... I think the plperl was a red herring. Once dbi-link grabs a recordset, the rows are looped over, processed, and then inserted (based on some conditionals) into another table. Those inserts are wrapped in a begin/exception block, which, since it is in a loop, I suspect is creating the large number of childXids in cases where there are a large number of inserts. I haven't tested that theory, but it seems logical, and should be easy enough to reproduce with a simple LOOP ... END LOOP in plpgsql. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Win32 hard crash problem
Tom Lane wrote: Joshua D. Drake [EMAIL PROTECTED] writes: O.k. further on this.. the crashing is happening quickly now but not predictably. (as in sometimes a week sometimes 2 days). OK, that seems to eliminate the GetTickCount-overflow theory anyway. That log entry is the last (of consequence) entry before the machine says: 2006-09-28 16:40:36.921 LOG: received fast shutdown request Oh? That's pretty interesting on a Windows machine, because AFAIK there wouldn't be any standard mechanism that might tie into our homegrown signal facility. Anyone have a theory on what might trigger a SIGINT to the postmaster, other than intentional pg_ctl invocation? Well the other option would be a windows restart. On windows would that send a SIGINT to the backend? Joshua D. Drake regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] JAVA Support
Henry B. Hotz wrote: Well, that's why I was pushing SASL instead of GSSAPI. There are multiple mechanisms that are actually in use. PAM turned out not to be sufficiently specified for cross-platform behavioral compatibility, and it only does password checking anyway. Calling it a security solution is a big overstatement IMO. I guess a lot of people use PAM with SSL and don't worry about the gap between the two (which SASL or GSSAPI close). In defense of GSSAPI non-Kerberos mechanisms do exist. They just cost money and they aren't very cross-platform. AFAIK GSSAPI has no simple password mechanisms. There's a Microsoft-compatible SPNEGO mechanism for GSSAPI that's being implemented fairly widely now, but it's just a sub-negotiation mech that lets you choose between a Kerberos 5 (that's practically identical to the direct one), and NTLM. If you allow NTLM you'd better limit it to NTLMv2! As already mentioned, the limitations of PAM weren't clear until after we implemented it, so I expect the same to happen here, and the number of acronyms flying around in this discussion is a bad sign too. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Per-database search_path
Added to TODO: * Allow more complex user/database default GUC settings Currently, ALTER USER and ALTER DATABASE support per-user and per-database defaults. Consider adding per-user-and-database defaults so things like search_path can be defaulted for a specific user connecting to a specific database. --- Tom Lane wrote: David Fetter [EMAIL PROTECTED] writes: On Fri, Sep 29, 2006 at 05:41:35PM -0400, Tom Lane wrote: Yeah. ISTM the correct generalization is per-user per-database default GUC settings, which has nothing to do with superuserness. This sounds like a TODO for 8.3. What wrinkles might this involve? Probably rethink the rolconfig/datconfig representation. Maybe it's time for a separate catalog for the settings. Offhand, I'm thinking that it would touch the inheritance stuff that roles have. No, it wouldn't, because defaults only apply at the instant of connection, so there's no inheritance or SET ROLE to worry about. Whatever role you log in as is what you get. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] New version of money type
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, Sep 29, 2006 at 12:19:07PM +0200, Martijn van Oosterhout wrote: On Thu, Sep 28, 2006 at 06:32:11PM -0500, Jim C. Nasby wrote: What would be ideal is a money type that stored what currency was used and let you change precision (within reason). The taggedtypes version of currency does half of that, by storing the currency and allowing the output format to depend on that. It doesn't allow you to easily change the precision though; that would require user-defined typmod, which is still under discussion. It would be possible to create a taggedtypes version of currency based on int64. With the currency tag it would be 12 bytes total. And the number of decimal places could be defined per currency... Interesting thought, probably wouldn't take more than an hour to whip up. If you are at that, it's worth noting that the currency tag consists of three capital ASCII letters. That would be fifteen bits, give or take. That leaves forty-eightish bits for the number, or about 10^14. This is only half-serious, since there are other problems with currencies (their value is dependent on time, all that Pandora's box). Regards - -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFFHfWlBcgs9XrR2kYRAhieAJ9GYKruXeW3nqGyg0TO8Mo5bFBNMQCfbfzK
lbARH+l5PxIexOElcxTg3WE=
=//LX
-----END PGP SIGNATURE-----
---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] New version of money type
On Sat, Sep 30, 2006 at 04:42:13AM +0000, [EMAIL PROTECTED] wrote: On Fri, Sep 29, 2006 at 12:19:07PM +0200, Martijn van Oosterhout wrote: On Thu, Sep 28, 2006 at 06:32:11PM -0500, Jim C. Nasby wrote: If you are at that, it's worth noting that the currency tag consists of three capital ASCII letters. That would be fifteen bits, give or take. I feel silly for even mentioning this, but there are fewer than 256 countries in the UN, and as far as I know, each has at most one currency, so you could use 8 bits instead of 15. That leaves forty-eightish bits for the number or about 10^14. By the above calculation, 56 bits or about 7.2 * 10^16. This is only half-serious, since there are other problems with currencies (their value is dependent on time, all that Pandora's box). It's not just dependent on time. Exchange rates vary in such a way that the relationships are not always transitive :P Cheers, D
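The bit-budget arithmetic in this subthread can be made concrete. Below is a hypothetical C sketch packing a three-capital-letter tag (5 bits per letter, 15 bits total) into the high bits of an int64, leaving 48 bits, roughly 2.8 * 10^14 values, for the amount. This only illustrates the idea being discussed; it is not the taggedtypes implementation, and the function names are invented:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AMOUNT_MASK ((1ULL << 48) - 1)

/* Pack a 3-letter tag (code must be 'A'..'Z') plus a non-negative
 * amount that fits in 48 bits into a single 64-bit value. */
int64_t
pack_currency(const char *code, int64_t amount)
{
	uint64_t	tag = 0;

	for (int i = 0; i < 3; i++)
		tag = (tag << 5) | (uint64_t) (code[i] - 'A');
	/* tag < 2^15, so tag << 48 stays below 2^63: still a valid int64 */
	return (int64_t) ((tag << 48) | ((uint64_t) amount & AMOUNT_MASK));
}

void
unpack_currency(int64_t packed, char *code, int64_t *amount)
{
	uint64_t	v = (uint64_t) packed;
	uint64_t	tag = v >> 48;	/* the 15-bit currency tag */

	code[0] = (char) ('A' + ((tag >> 10) & 31));
	code[1] = (char) ('A' + ((tag >> 5) & 31));
	code[2] = (char) ('A' + (tag & 31));
	code[3] = '\0';
	*amount = (int64_t) (v & AMOUNT_MASK);
}
```

The 8-bit-tag variant suggested above works the same way, just with `v >> 56` and a 56-bit amount mask, giving the ~7.2 * 10^16 range computed in the message.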