[HACKERS] Fwd: Starting off with the development
I wanted to know what type of data structures are created by the parser/planner and how they are read by the executor. I know that it is some sort of tree, but how many child nodes does a node of the tree have, and what structure is the node itself? Which files should I look into to get these details? Thanks -Vaibhav
Re: [HACKERS] [PATCH] Custom code int(32|64) = text conversions out of performance reasons
On sön, 2010-10-31 at 22:41 +0100, Andres Freund wrote:
> * I renamed pg_[il]toa to pg_s(16|32|64)toa - I found the names confusing. Not sure if it's worth it.

Given that there are widely established functions atoi() and atol(), naming the reverse itoa() and ltoa() makes a lot of sense. The changed versions read like "string to ASCII".

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SR fails to send existing WAL file after off-line copy
On 02.11.2010 00:47, Tom Lane wrote:
> Greg Stark gsst...@mit.edu writes:
>> On Mon, Nov 1, 2010 at 12:37 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
>>> Yes, indeed there is a corner-case bug when you try to stream the very first WAL segment, with log==seg==0.
>> This smells very much like http://article.gmane.org/gmane.comp.db.postgresql.devel.general/137052
> I wonder if there's some defensive programming way to avoid bugs of this sort. It strikes me that it's not good if there isn't a recognizable invalid value for WAL locations. These bits of code show that there is reason to have one. Maybe we should teach initdb to start the WAL one segment later, and then 0/0 *would* mean invalid, and we could revert these other hacks.

Good idea. That can even be back-patched to 9.0; it should have no effect on existing installations.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[HACKERS] B-tree parent pointer and checkpoints
We have the rm_safe_restartpoint mechanism to ensure that we don't use a checkpoint that splits a multi-level B-tree insertion as a restart point. But to my surprise, we don't have anything to protect against the analogous case during normal operation. This is possible:

1. Split a child page. Write WAL records for the child pages.
2. Begin and finish a checkpoint.
3. Crash, before writing the WAL record that inserts the child pointer into the parent B-tree page.
4. Recovery begins at the new checkpoint and never sees the incomplete split, so it stays incomplete.

In practice that's pretty hard to hit, because a checkpoint takes some time, while locking the parent page and writing the child pointer is usually very quick. But it's possible. It surprises me that we thought of this when we introduced restartpoints, but this more obvious case during normal operation seems to have been there forever. Nothing very bad happens if you lose the parent update, but it would be nice to fix nevertheless. I bumped into this while thinking about archive recovery - the above can happen during archive recovery too if the checkpoint is caused by pg_start_backup().

I think we can fix this by requiring that any multi-WAL-record actions that are in progress when a checkpoint starts (at the REDO pointer) must finish before the checkpoint record is written. That will close the issue with restartpoints, archive recovery etc. as well, so we no longer need to worry about this anywhere else than while performing an online checkpoint.

I'm thinking of using the inCommit flag for this, to delay writing the checkpoint record until all incomplete splits are finished. inCommit protects against a similar race condition between writing the commit record and flushing the clog page. It will obviously need to be renamed, and we'll have to double-check that it's safe: B-tree splits take longer, and there's no critical section there like there is in the commit codepath. Comments?
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[HACKERS] Intelligent RDBMS
Hello All, I am Vijay Ghatpande from Pune, India. I have been in the IT field for the last 30 years, mainly in ERP design, development and implementation. I have worked on JD Edwards, Oracle Apps and SAP. I was involved in the design and development of an ERP package called ISP - Integrated Software for Production. Now I am independent; I started my own consultancy, AMS ERP Solutions, and work for SME industries in and around Pune. I am technical and have worked with everything from COBOL and DGL to .NET and C++. I have also worked as an ORACLE and DB2 DBA for many years.

I have a dream project in mind, on relational database management. Please read the paragraph below from the angle of users of applications developed on an RDBMS.

Any application contains many tables, and they are related to each other. The RDBMS keeps the relations. Developers and users have to know these relations to extract the proper data and convert it into information. If multiple tables are involved, the query can be more complex. The main problem with an RDBMS is writing SQL. End users avoid SQL and think that this is technical work. This is a limitation of RDBMSs. If SQL becomes free of tables and relations, then I am sure everyone will like it. For example, MS Dynamics has many tables to store its data. We can keep this data in such a way that for the developer MS Dynamics is one table, say MSD, and the user can extract any data from that table. He need not know the internal table relations. If this happens, life will be easy for many. Application development will be very easy and cost-effective. This can be achieved by creating views.

I am thinking of an intelligent database where the relations are built in. If this is available in the product itself, then the domain knowledge will be in the RDBMS. My intention is not to hide relations from users, but this will be one of the advantages of an intelligent RDBMS. This will be a next-generation RDBMS, and it is going to change the RDBMS and ERP world.
I would appreciate your feedback. With Warm Regards,
[HACKERS] timestamp of the last replayed transaction
Hi, After the 9.0 release, I've often heard that some people want to know how far transactions have been replayed in the standby as a timestamp rather than an LSN. So I'm thinking of including in core a function which returns the timestamp of the last applied transaction (i.e., commit/abort WAL record).

Name: pg_last_replay_xact_timestamp (better name?)
Return Type: timestamp with time zone

Thoughts? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] timestamp of the last replayed transaction
Fujii Masao masao.fu...@gmail.com writes:
> After 9.0 release, I've often heard that some people want to know how far transactions have been replayed in the standby in timestamp rather than LSN. So I'm thinking to include the function which returns the timestamp of the last applied transaction (i.e., commit/abort WAL record) in the core. Name: pg_last_replay_xact_timestamp (better name?) Return Type: timestamp with time zone Thought?

How do you want to implement the tracking? Will it look like the proposal in this thread: http://archives.postgresql.org/pgsql-hackers/2010-05/msg01209.php

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] [PATCH] Custom code int(32|64) = text conversions out of performance reasons
Peter Eisentraut pete...@gmx.net writes:
> On sön, 2010-10-31 at 22:41 +0100, Andres Freund wrote:
>> * I renamed pg_[il]toa to pg_s(16|32|64)toa - I found the names confusing. Not sure if its worth it.
> Given that there are widely established functions atoi() and atol(), naming the reverse itoa() and ltoa() makes a lot of sense. The changed versions read like string to ASCII.

Yeah, and s32 makes no sense at all. I think we should either leave well enough alone (to avoid introducing a cross-version backpatch hazard) or use pg_i32toa etc.

regards, tom lane
Re: [HACKERS] improved parallel make support
Peter Eisentraut pete...@gmx.net writes:
> This patch requires GNU make 3.80, because of the above | feature and the $(eval) function. Version 3.80 is dated October 2002, so it should be no problem, but I do occasionally read of make 3.79 around here; maybe it's time to get rid of that. I did put in a check that makes the build fail right away if a wrong version of make is used.

Do we have a handle on how many buildfarm members this will break? (fwiw, my hpux box is running 3.79.1)

regards, tom lane
Re: [HACKERS] B-tree parent pointer and checkpoints
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
> I think we can fix this by requiring that any multi-WAL-record actions that are in-progress when a checkpoint starts (at the REDO-pointer) must finish before the checkpoint record is written.

What happens if someone wants to start a new split while the checkpoint is hanging fire?

regards, tom lane
Re: [HACKERS] B-tree parent pointer and checkpoints
On 02.11.2010 16:30, Tom Lane wrote:
> Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
>> I think we can fix this by requiring that any multi-WAL-record actions that are in-progress when a checkpoint starts (at the REDO-pointer) must finish before the checkpoint record is written.
> What happens if someone wants to start a new split while the checkpoint is hanging fire?

You mean after CreateCheckPoint has determined the redo pointer, but before it has written the checkpoint record? The new split can go ahead, and the checkpoint doesn't need to care about it. Recovery will start at the redo pointer, so it will see the split record, and will know to finish the incomplete split if necessary.

The logic is the same as with inCommit. Checkpoint will fetch the list of in-progress splits some time after determining the redo pointer. It will then wait until all of those splits have finished. Any new splits that begin after fetching the list don't affect the checkpoint. inCommit can't be used as is, because it's tied to the Xid, but something similar should work.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Starting off with the development
Vaibhav Kaushal vaibhavkaushal...@gmail.com wrote:
> What version should I start from? I guess postgresql 9.1 alpha would be good.

The HEAD of the master branch of the git repository is where development normally takes place. You should start with the developer section of the main PostgreSQL web site and the pages to which it links -- especially the Developer's FAQ page:

http://www.postgresql.org/developer/
http://wiki.postgresql.org/wiki/Developer_FAQ

You also might want to browse the README files in the source directories. -Kevin
Re: [HACKERS] Tracking latest timeline in standby mode
On 02.11.2010 07:15, Fujii Masao wrote:
> On Mon, Nov 1, 2010 at 8:32 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
>> Yeah, that's one approach. Another is to validate the TLI in the xlog page header, it should always match the current timeline we're on. That would feel more robust to me.
> Yeah, that seems better.

We're a bit fuzzy about what TLI is written in the page header when the timeline-changing checkpoint record is written, though. If the checkpoint record fits on the previous page, the page will carry the old TLI, but if the checkpoint record begins a new WAL page, the new page is initialized with the new TLI. I think we should rearrange that so that the page header always carries the old TLI.

> Or after rescanning the timeline history files, what about refetching the last applied record and checking whether the TLI in the xlog page header is the same as the previous TLI? IOW, what about using the header of the xlog page including the last applied record instead of the following checkpoint record?

I guess that would work too, but it seems problematic to move backwards during recovery.

> Anyway ISTM we should also check that the min recovery point is not ahead of the TLI switch location. So we need to fetch the record in the min recovery point and validate the TLI of the xlog page header. Otherwise, the database might get corrupted. This can happen, for example, when you remove all the WAL files in the pg_xlog directory and restart the standby.

Yes, that's another problem. We don't know which timeline the min recovery point refers to. We should store the TLI along with minRecoveryPoint; then we can at least check that we're on the right timeline when we reach minRecoveryPoint and throw an error.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] create custom collation from case insensitive portuguese
Alexandre Riveira escreveu:
> I've achieved some success in changing the collate of the operating system (Linux) to generate the sort order Brazilian Portuguese expects, by adding the following code in LC_COLLATE

This was already discussed; search the archives [1] [2].

> So far, I understood the mechanism of changing the collate and reproducing it in PostgreSQL, but I could not generate a case-insensitive search. I believe that would be a change within the LC_COLLATE variable, but I could not go any further than that.

PostgreSQL doesn't support case-insensitive searches specifying the collate per column yet. Look at [3]. But you could use ILIKE or a regular expression to achieve that.

[1] http://pgfoundry.org/pipermail/brasil-usuarios/20060330/001667.html
[2] http://www.mail-archive.com/brasil-usuar...@pgfoundry.org/msg00895.html
[3] http://archives.postgresql.org/pgsql-hackers/2010-07/msg00512.php

-- Euler Taveira de Oliveira http://www.timbira.com/
[HACKERS] ALTER TYPE recursion to typed tables
I'm working on propagating ALTER TYPE commands to typed tables. This is currently prohibited. For example, take these regression test cases:

CREATE TYPE test_type2 AS (a int, b text);
CREATE TABLE test_tbl2 OF test_type2;
ALTER TYPE test_type2 ADD ATTRIBUTE c text; -- fails
ALTER TYPE test_type2 ALTER ATTRIBUTE b TYPE varchar; -- fails
ALTER TYPE test_type2 DROP ATTRIBUTE b; -- fails
ALTER TYPE test_type2 RENAME ATTRIBUTE b TO bb; -- fails

The actual implementation isn't very difficult, because the ALTER TABLE code already knows everything about recursion. Now I'm wondering what kind of syntax should be used to control this. I think you don't want to automatically propagate such innocent-looking operations to tables in a potentially data-destroying manner. The natural idea would be RESTRICT/CASCADE. This is currently only associated with DROP operations, but I suppose ADD/ALTER/RENAME ATTRIBUTE x ... CASCADE doesn't sound too odd. Comments, other ideas?
Re: [HACKERS] Comparison with true in source code
On Mon, Nov 01, 2010 at 12:17:02PM +0900, Itagaki Takahiro wrote:
> There are some "== true" in the codes, but they might not be safe because all non-zero values are true in C. Is it worth cleaning them up? ...
> src/interfaces/ecpg/ecpglib/connect.c(168): if (con->autocommit == true && strncmp(mode, "off", strlen("off")) == 0)
> src/interfaces/ecpg/preproc/ecpg.addons(356): if (compat == ECPG_COMPAT_INFORMIX_SE && autocommit == true)
> src/interfaces/ecpg/preproc/ecpg.c(310): ptr2ext[3] = (header_mode == true) ? 'h' : 'c';
> src/interfaces/ecpg/preproc/ecpg.c(327): ptr2ext[1] = (header_mode == true) ? 'h' : 'c';

I actually see no reason why these variables are not defined as bool instead of int, so I changed this. Hopefully I found all of them.

Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at googlemail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation
Andres Freund and...@anarazel.de wrote:
> * You won't see an error if the next command after the IDLE in transaction is a COMMIT/ROLLBACK. I don't see any sensible way around that.

Well, on a ROLLBACK I'm not sure it's a problem. On a COMMIT, couldn't you call a function to check for it in CommitTransaction and PrepareTransaction?

-Kevin
Re: [HACKERS] ALTER TYPE recursion to typed tables
On Tue, Nov 2, 2010 at 9:15 AM, Peter Eisentraut pete...@gmx.net wrote:
> I'm working on propagating ALTER TYPE commands to typed tables. [...] The natural idea would be RESTRICT/CASCADE. This is currently only associated with DROP operations, but I suppose ADD/ALTER/RENAME ATTRIBUTE x ... CASCADE doesn't sound too odd. Comments, other ideas?

That seems reasonable. What do you plan to do about this case?

CREATE TYPE test_type AS (a int, b text);
CREATE TABLE test_tbl (x test_type);
ALTER TYPE test_type ADD ATTRIBUTE c text;

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] create custom collation from case insensitive portuguese
On Tue, Nov 2, 2010 at 8:40 AM, Euler Taveira de Oliveira eu...@timbira.com wrote:
> PostgreSQL doesn't support case-insensitive searches specifying the collate per column yet. Look at [3]. But you could use ILIKE or regular expression to achieve that.

Is citext also useful for this?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash support for arrays
On Sat, Oct 30, 2010 at 10:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> marcin mank marcin.m...@gmail.com writes:
>> On Sat, Oct 30, 2010 at 6:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> 3. To hash, apply the element type's hash function to each array element. Combine these values by rotating the accumulator left one bit at each step and xor'ing in the next element's hash value. Thoughts? In particular, is anyone aware of a better method for combining the element hash values?
>> This would make the hash the same for arrays with elements 32 apart swapped.
> Well, there are *always* going to be cases where you get the same hash value for two different inputs; it's unavoidable given that you have to combine N 32-bit hash values into only one 32-bit output.

Sure. The goal is to make those hard to predict, though. I think "multiply by 31 and add the next value" is a fairly standard way of getting that behavior. It mixes up the bits a lot more than just left-shifting by a variable offset.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] ALTER TYPE recursion to typed tables
On tis, 2010-11-02 at 10:54 -0700, Robert Haas wrote:
> What do you plan to do about this case? CREATE TYPE test_type AS (a int, b text); CREATE TABLE test_tbl (x test_type); ALTER TYPE test_type ADD ATTRIBUTE c text;

This is currently prohibited, and I'm not planning to do anything about that.
Re: [HACKERS] Hash support for arrays
Robert Haas robertmh...@gmail.com writes:
> On Sat, Oct 30, 2010 at 10:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> marcin mank marcin.m...@gmail.com writes:
>>> This would make the hash the same for arrays with elements 32 apart swapped.
>> Well, there are *always* going to be cases where you get the same hash value for two different inputs; it's unavoidable given that you have to combine N 32-bit hash values into only one 32-bit output.
> Sure. The goal is to make those hard to predict, though.

Really? I think "I don't understand when this fails" isn't obviously better than being able to predict when it fails ...

> I think multiply by 31 and add the next value is a fairly standard way of getting that behavior. It mixes up the bits a lot more than just left-shifting by a variable offset.

What concerns me about that is that it tends to push the bits to the left --- I think the effects of the earlier inputs are going to overflow out as you incorporate a lot of newer inputs. What you want is a scheme where every input item affects all the bits of the final result about equally, and my gut feeling is this doesn't provide that. I'd be happier about this approach if there were some actual theory behind it ... maybe there's some analysis out there, but the one link that was given was just about entirely unconvincing.

regards, tom lane
Re: [HACKERS] Hash support for arrays
Excerpts from Tom Lane's message of mar nov 02 15:21:31 -0300 2010:
> What concerns me about that is that it tends to push the bits to the left --- I think the effects of the earlier inputs are going to overflow out as you incorporate a lot of newer inputs. What you want is a scheme where every input item affects all the bits of the final result about equally, and my gut feeling is this doesn't provide that.

Ahh, now I understand what you meant with rotating the bits in the original email in this thread. You're not simply shifting. Clever.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Hash support for arrays
On Tue, Nov 2, 2010 at 11:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> Sure. The goal is to make those hard to predict, though.
> Really? I think "I don't understand when this fails" isn't obviously better than being able to predict when it fails ...

Isn't that the whole point of hash functions? Collisions are inevitable, but you want them to be unpredictable. If you want a hash function with predictable collision behavior, just hash the first element. It'll be highly predictable AND wicked fast.

>> I think multiply by 31 and add the next value is a fairly standard way of getting that behavior. It mixes up the bits a lot more than just left-shifting by a variable offset.
> What concerns me about that is that it tends to push the bits to the left --- I think the effects of the earlier inputs are going to overflow out as you incorporate a lot of newer inputs. What you want is a scheme where every input item affects all the bits of the final result about equally, and my gut feeling is this doesn't provide that.

I don't think so. That would be a problem if you multiplied by an even number. You can see that you don't get dups if you just add in the same value over and over:

#include <stdio.h>

int main(int argc, char **argv)
{
    unsigned hv = 1;
    unsigned i;

    for (i = 0; i < 100; ++i)
    {
        hv = hv * 31 + 1;
        printf("%08x\n", hv);
    }
    return 0;
}

No dups. Also, if you change that first "hv = 1" line to "hv = 2" and rerun, that sequence also has no dups, and no collisions with the "hv = 1" sequence.

> I'd be happier about this approach if there were some actual theory behind it ... maybe there's some analysis out there, but the one link that was given was just about entirely unconvincing.

I think it's from Knuth, though unfortunately I don't have a copy to check. There are probably better algorithms out there, but this one's pretty simple.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash support for arrays
Robert Haas robertmh...@gmail.com wrote:
> Tom Lane t...@sss.pgh.pa.us wrote:
>> Robert Haas robertmh...@gmail.com writes:
>>> The goal is to make those hard to predict, though.
>> Really? I think "I don't understand when this fails" isn't obviously better than being able to predict when it fails ...
> Isn't that the whole point of hash functions? Collisions are inevitable, but you want them to be unpredictable.

Are you sure you're not confusing attributes of a good random number generator with hash generation? (They have much in common, but I've only seen concern with "hard to predict" when working on random number algorithms, as for financial audits or jury selection.)

> If you want a hash function with predictable collision behavior, just hash the first element. It'll be highly predictable AND wicked fast.

You're confusing unpredictable with widely distributed, I think. There's no reason that the hash value of an integer of the same size as the hash shouldn't be the integer itself, for example. It's hard to get more predictable than that, yet it works well for hash lookups.

I know that for a hash of a character string, total = (total * 31) + nextcharacter has been shown to be effective. That's hardly random or hard to predict, but it does tend to spread out the hash values well. Whether it is as good for arrays, I don't know. It seems like a reasonable place to start, though, since a character string is rather like an array of characters.

>> What concerns me about that is that it tends to push the bits to the left --- I think the effects of the earlier inputs are going to overflow out as you incorporate a lot of newer inputs. What you want is a scheme where every input item affects all the bits of the final result about equally, and my gut feeling is this doesn't provide that.
> I don't think so. That would be a problem if you multiplied by an even number. You can see that you don't get dups if you just add in the same value over and over:

Yeah, that 1 in the low-order position ensures that the impact of any value which is ever added into the total is never entirely lost.

-Kevin
Re: [HACKERS] ALTER TYPE recursion to typed tables
On Nov 2, 2010, at 11:17 AM, Peter Eisentraut pete...@gmx.net wrote:
> On tis, 2010-11-02 at 10:54 -0700, Robert Haas wrote:
>> What do you plan to do about this case? CREATE TYPE test_type AS (a int, b text); CREATE TABLE test_tbl (x test_type); ALTER TYPE test_type ADD ATTRIBUTE c text;
> This is currently prohibited, and I'm not planning to do anything about that.

OK. ...Robert
Re: [HACKERS] Hash support for arrays
On Tue, Nov 2, 2010 at 12:34 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: The goal is to make those hard to predict, though. Really? I think I don't understand when this fails isn't obviously better than being able to predict when it fails ... Isn't that the whole point of hash functions? Collisions are inevitable, but you want them to be unpredictable. Are you sure you're not confusing attributes of a good random number generator with hash generation? (They have much in common, but I've only seen concern with hard to predict when working on random number algorithms, as for financial audits or jury selection.) If you want a hash function with predictable collision behavior, just has the first element. It'll be highly predictable AND wicked fast. You're confusing unpredictable with widely distributed, I think. There's no reason that the hash value of an integer of the same size as the hash shouldn't be the integer itself, for example. It's hard to get more predictable than that, yet it works well for hash lookups. Well, no, not really. For example, it may be that you have a hash table with N buckets and values that of the form N+k, 2N+k, 3N+k, and everything will collide. If you do some mixing of the bits, that case is far less likely to occur in practice. See for example http://www.azillionmonkeys.com/qed/hash.html I know that for a hash of a character string, the total = (total * 31) + nextcharacter has been shown to be effective. That's hardly random or hard to predict, but it does tend to spread out the hash values well. Whether it is as good for arrays, I don't know. It seems like a reasonable place to start, though, since a character string is rather like an array of characters. That was my thought. 
What concerns me about that is that it tends to push the bits to the left --- I think the effects of the earlier inputs are going to overflow out as you incorporate a lot of newer inputs. What you want is a scheme where every input item affects all the bits of the final result about equally, and my gut feeling is this doesn't provide that. I don't think so. That would be a problem if you multiplied by an even number. You can see that you don't get dups if you just add in the same value over and over: Yeah, that 1 in the low order position ensures that the impact of any value which is ever added into the total is never entirely lost. Right... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash support for arrays
Robert Haas robertmh...@gmail.com wrote: There's no reason that the hash value of an integer of the same size as the hash shouldn't be the integer itself, for example. It's hard to get more predictable than that, yet it works well for hash lookups. Well, no, not really. For example, it may be that you have a hash table with N buckets and values of the form N+k, 2N+k, 3N+k, and everything will collide. That hardly seems convincing if the integer is a sequentially assigned number, as with many ID columns; but if you want an algorithm which will work well with numbers which might be assigned in a pattern with a high correlation to some unpredictable number of buckets -- sure, it won't work well without some scrambling. For sequentially assigned numbers, that scrambling would have a cost and would be of no value. I guess it depends a bit on your use case and goals. -Kevin
Re: [HACKERS] Hash support for arrays
Robert Haas robertmh...@gmail.com writes: On Tue, Nov 2, 2010 at 11:21 AM, Tom Lane t...@sss.pgh.pa.us wrote: Really? I think "I don't understand when this fails" isn't obviously better than being able to predict when it fails ... Isn't that the whole point of hash functions? Collisions are inevitable, but you want them to be unpredictable. If you want a hash function with predictable collision behavior, just hash the first element. It'll be highly predictable AND wicked fast. That seems like a rather poor straw man, since it suffers from exactly the defect I'm complaining about, namely failing to consider all the input values equally. I'd be happier about this approach if there were some actual theory behind it ... maybe there's some analysis out there, but the one link that was given was just about entirely unconvincing. I think it's from Knuth, though unfortunately I don't have a copy to check. There are probably better algorithms out there, but this one's pretty simple. I don't see anything in Knuth suggesting a multiplier of 31. His recommendation for a multiplier, if you're going to use multiplicative hashing, is wordsize/phi (phi being the golden ratio) ... and he also wants you to keep the high-order, not the low-order, bits of the product. However, this is largely beside the point, because that theory, as well as the Java code you're arguing from, has to do with the initial hashing of a raw sequence of input items, not with combining some existing hash values. The rotate-and-xor method I suggested for that is borrowed exactly from section 6.4 of Knuth (page 512, in the first edition of volume 3). regards, tom lane
Re: [HACKERS] Hash support for arrays
On Tue, Nov 02, 2010 at 04:42:19PM -0400, Tom Lane wrote: [...] However, this is largely beside the point, because that theory, as well as the Java code you're arguing from, has to do with the initial hashing of a raw sequence of input items. Not with combining some existing hash values. The rotate-and-xor method I suggested for that is borrowed exactly from section 6.4 of Knuth (page 512, in the first edition of volume 3). Given that our hash implementation mixes the input data well (it does -- I tested it), a simple rotate-and-xor method is all that should be needed to maintain all of the needed information. The original hash function has done the heavy lifting in this case.
Regards, Ken
Re: [HACKERS] Hash support for arrays
On Nov 2, 2010, at 1:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: However, this is largely beside the point, because that theory, as well as the Java code you're arguing from, has to do with the initial hashing of a raw sequence of input items. Not with combining some existing hash values. The rotate-and-xor method I suggested for that is borrowed exactly from section 6.4 of Knuth (page 512, in the first edition of volume 3). It seems undesirable to me to have a situation where transposing two array elements doesn't change the hash value. But I just work here. ...Robert
Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation
On Tuesday 02 November 2010 18:33:15 Kevin Grittner wrote: Andres Freund and...@anarazel.de wrote: * You won't see an error if the next command after the IDLE in transaction is a COMMIT/ROLLBACK. I don't see any sensible way around that. Well, on a ROLLBACK I'm not sure it's a problem. On a COMMIT, couldn't you call a function to check for it in CommitTransaction and PrepareTransaction? Sure, throwing an error somewhere wouldn't be that hard. But at the moment a COMMIT is always successful (just reporting a ROLLBACK in a failed txn). That doesn't seem to be something to change lightly. Does anybody have any idea why COMMIT is allowed there? Seems pretty strange to me. Andres
Re: [HACKERS] Hash support for arrays
Robert Haas robertmh...@gmail.com writes: On Nov 2, 2010, at 1:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: However, this is largely beside the point, because that theory, as well as the Java code you're arguing from, has to do with the initial hashing of a raw sequence of input items. Not with combining some existing hash values. The rotate-and-xor method I suggested for that is borrowed exactly from section 6.4 of Knuth (page 512, in the first edition of volume 3). It seems undesirable to me to have a situation where transposing two array elements doesn't change the hash value. But I just work here. [ shrug... ] There are always going to be collisions, and you're overstating the importance of this one (only some transpositions will result in a duplicate hash, not any transposition). What's actually *important*, for our purposes, is that all bits of the final hash value depend in a reasonably uniform way on all bits of the input hash values. If we don't have that property, then bucket numbers (which we form by taking the low-order N bits of the final hash, where N isn't known in advance) won't be as well distributed as we'd like. It's possible that the multiply-by-31 method is as good as the rotate-and-xor method by that measure, or even better; but it's far from obvious that it's better. And I'm not convinced that the multiply method has a pedigree that should encourage me to take it on faith. regards, tom lane
Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation
Excerpts from Andres Freund's message of mar nov 02 18:36:19 -0300 2010: On Tuesday 02 November 2010 18:33:15 Kevin Grittner wrote: Andres Freund and...@anarazel.de wrote: * You won't see an error if the next command after the IDLE in transaction is a COMMIT/ROLLBACK. I don't see any sensible way around that. Well, on a ROLLBACK I'm not sure it's a problem. On a COMMIT, couldn't you call a function to check for it in CommitTransaction and PrepareTransaction? Sure, throwing an error somewhere wouldn't be that hard. But at the moment a COMMIT is always successful (just reporting a ROLLBACK in a failed txn). That doesn't seem to be something to change lightly. If the user calls COMMIT, it calls EndTransactionBlock, which ends up calling AbortTransaction. Can't you do it at that point? (Says he who hasn't looked at the patch at all) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation
Andres Freund and...@anarazel.de wrote: On Tuesday 02 November 2010 18:33:15 Kevin Grittner wrote: Well, on a ROLLBACK I'm not sure it's a problem. On a COMMIT, couldn't you call a function to check for it in CommitTransaction and PrepareTransaction? Sure, throwing an error somewhere wouldn't be that hard. But at the moment a COMMIT is always successful (just reporting a ROLLBACK in a failed txn). That doesn't seem to be something to change lightly. Well, I'm looking at using this with the Serializable Snapshot Isolation (SSI) patch, which can throw an error from a COMMIT if the commit completes the conditions necessary for an anomaly to occur (i.e., the committing transaction is on the rw-conflict *out* side of a pivot, and it is the first in the set to commit). If we succeed in enhancing SSI to use lists of rw-conflicted transactions rather than its current techniques, we might be able to (and find it desirable to) always commit in that circumstance and roll back some *other* transaction which is part of the problem. Of course, that other transaction might be idle at the time, and the next thing *it* tries to execute *might* be a COMMIT. So if the SSI patch goes in, there will always be some chance that a COMMIT can fail, but doing it through the mechanism your patch provides could improve performance, because we could guarantee that nobody ever has a serialization failure without first successfully committing a transaction which wrote to one or more tables. If a commit fails due to SSI, it is highly desirable that the error use the serialization failure SQLSTATE, so that an application framework can know that it is reasonable to retry the transaction. Does anybody have any idea why COMMIT is allowed there? Seems pretty strange to me. So that the failed transaction state can be cleared. The transaction as a whole has failed, but you don't want the connection to become useless.
-Kevin
Re: [HACKERS] Hash support for arrays
On Sat, Oct 30, 2010 at 01:01:44PM -0400, Tom Lane wrote: marcin mank marcin.m...@gmail.com writes: This is what boost does: http://www.systomath.com/include/Boost-1_34/doc/html/boost/hash_combine.html Hmm. I am reminded of Knuth's famous dictum: never generate random numbers with a method chosen at random. Is there any actual theory behind that algorithm, and if so what is it? The combination of shifting with addition (not xor) seems more likely to lead to weird cancellations than any improvement in the hash behavior. As far as I can tell the boost combiner comes from: http://goanna.cs.rmit.edu.au/~jz/fulltext/jasist-tch.pdf Which seems to be solving a completely different problem, and I can't see how relevant it is to the design of a hash combiner. The fact that they get trivial details wrong, like 32-bit integers going up to 2^32 rather than 2^32-1, doesn't inspire confidence. One subtle point I noticed on the boost mailing list was that they didn't want [[1],2] hashing to the same value as [1,2]. I'm pretty sure that this sort of construction isn't possible to express in PG, but thought I should mention it just in case. -- Sam http://samason.me.uk/
Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation
On Tuesday 02 November 2010 22:59:15 Kevin Grittner wrote: Does anybody have any idea why COMMIT is allowed there? Seems pretty strange to me. So that the failed transaction state can be cleared. The transaction as a whole has failed, but you don't want the connection to become useless. Well, allowing ROLLBACK is enough there, isn't it?
Re: [HACKERS] create custom collation from case insensitive portuguese
Thank you all for your help! When I mentioned case-insensitive for Portuguese, in fact we are also talking about accent-insensitive matching. A common example is a name that can be written as both Jose and José; citext or ILIKE alone do not solve the problem. My progress so far: I edited the file /usr/share/i18n/locales/i18n and altered the tolower section. For example, I changed (U00C9,U00E9) to (U00C9,U0065), so that LOWER reproduces: LOWER('ITAPAGÉ') = 'itapage'. Successful example: SELECT * FROM endereco WHERE LOWER(logradouro) LIKE LOWER('itapage%'). This behavior is reproduced in citext (logradouro is a citext column): SELECT * FROM endereco WHERE logradouro = 'itapage' or SELECT * FROM endereco WHERE logradouro LIKE 'itapage%'. All examples return the desired value ITAPAGÉ. The issue: SELECT * FROM endereco WHERE logradouro LIKE 'itapage%' does NOT use an index. I tried: CREATE INDEX cep_ik_logradouro ON cep USING btree (logradouro); CREATE INDEX like_index ON cep(logradouro varchar_pattern_ops); CREATE INDEX cep_ci_index ON cep(lower(logradouro) varchar_pattern_ops); I have had success using an index with SELECT * FROM endereco WHERE LOWER(logradouro) LIKE LOWER('itapage%') together with CREATE INDEX cep_ci_index ON cep(lower(logradouro) varchar_pattern_ops), but I want to solve this using only citext. Thanks, Alexandre Riveira Brazil Robert Haas wrote: On Tue, Nov 2, 2010 at 8:40 AM, Euler Taveira de Oliveira eu...@timbira.com wrote: Alexandre Riveira wrote: I've achieved some success changing the operating system (Linux) collation to generate the Brazilian Portuguese sort order by adding the following code in LC_COLLATE. This was already discussed; search the archives [1] [2]. So far, I understood the mechanism of changing the collation and reproducing it in PostgreSQL, but I could not generate a case-insensitive search; I believe that would be a change within the LC_COLLATE variable, but I could not go any further than that. PostgreSQL doesn't support case-insensitive searches with a per-column collation yet. Look at [3].
But you could use ILIKE or regular expressions to achieve that. Is citext also useful for this?
Re: [HACKERS] create custom collation from case insensitive portuguese
Alexandre Riveira alexan...@objectdata.com.br writes: When I mentioned case-insensitive for Portuguese, in fact we are also talking about accent-insensitive. See the unaccent dictionary, but don't use only this one in your text search configuration, IIRC. http://www.postgresql.org/docs/9/static/unaccent.html Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation
Andres Freund and...@anarazel.de writes: On Tuesday 02 November 2010 22:59:15 Kevin Grittner wrote: Does anybody have any idea why COMMIT is allowed there? Seems pretty strange to me. So that the failed transaction state can be cleared. The transaction as a whole has failed, but you don't want the connection to become useless. Well, allowing ROLLBACK is enough there, isn't it? The client has no reason to think the transaction has failed, so what it's going to send is COMMIT, not ROLLBACK. From the point of view of the client, this should look like its COMMIT failed (or in general its next command failed); not like there was some magic action-at-a-distance on the state of its transaction. Now you could argue that if you send COMMIT, and that fails, you should have to send ROLLBACK to get to an idle state ... but that's not how this ever worked before, and I don't think it's what the SQL standard expects either. After a COMMIT, you're out of the transaction either way. regards, tom lane
Re: [HACKERS] Hash support for arrays
Tom Lane wrote: It's possible that the multiply-by-31 method is as good as the rotate-and-xor method by that measure, or even better; but it's far from obvious that it's better. And I'm not convinced that the multiply method has a pedigree that should encourage me to take it on faith. Short summary: * some history of its trivia follows * (nothing here suggests it's better - just old and common and cheap) Longer - some trivia regarding its pedigree: It (or at least a *31 variant) seems to have a history of advocacy going back to Chris Torek in 1990: http://groups.google.com/group/comp.lang.c/browse_thread/thread/28c2095282f0c1b5/193be99e9836791b?q=#193be99e9836791b #define HASH(str, h, p) \ for (p = str, h = 0; *p;) h = (h << 5) - h + *p++ and it gets referred to in Usenet papers in the early 90's as well: http://www.usenix.com/publications/library/proceedings/sa92/salz.pdf Regarding why the magic number 31 [or 33, which also often comes up]: apparently the only thing magic about it is that it's an odd number != 1. The rest of the odd numbers work about as well, according to this guy who tried to explain it: http://svn.eu.apache.org/repos/asf/apr/apr/trunk/tables/apr_hash.c * The magic of number 33, i.e. why it works better than many other * constants, prime or not, has never been adequately explained by * anyone. So I try an explanation: if one experimentally tests all * multipliers between 1 and 256 (as I did while writing a low-level * data structure library some time ago) one detects that even * numbers are not useable at all. The remaining 128 odd numbers * (except for the number 1) work more or less all equally well. * They all distribute in an acceptable way and this way fill a hash * table with an average percent of approx. 86%. * * If one compares the chi^2 values of the variants (see * Bob Jenkins ``Hashing Frequently Asked Questions'' at * http://burtleburtle.net/bob/hash/hashfaq.html for a description * of chi^2), the number 33 not even has the best value.
But the * number 33 and a few other equally good numbers like 17, 31, 63, * 127 and 129 have nevertheless a great advantage to the remaining * numbers in the large set of possible multipliers: their multiply * operation can be replaced by a faster operation based on just one * shift plus either a single addition or subtraction operation. And * because a hash function has to both distribute good _and_ has to * be very fast to compute, those few numbers should be preferred. ...