Re: [HACKERS] [PATCH] Support for foreign keys with arrays
On Thu, Mar 22, 2012 at 03:02:45PM -0400, Tom Lane wrote:
> It's even less clear about what the semantics are in multi-key cases.
> Right offhand I would say that multi-key cases are nonsensical and
> should be forbidden outright, because there is no way to figure out
> which collections of elements of different arrays should be considered
> to be a referencing item.

Currently, multi-column keys with more than one EACH column are unsupported, mainly because it is unclear how they should work (and I agree that they might not work at all).

> Could we see a specification of what the referencing semantics are
> intended to be, please?

You are right, the discussion has never been put together in a single place, as it should have been. Please find below an updated version of the specification, which Marco and I put together from the discussion on this list, taking into account the changes made during the review phase. Comments have also been added to explain why some choices have been forbidden.

Best regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.cio...@2ndquadrant.it | www.2ndquadrant.it

---8<------8<------8<------8<---

ON (DELETE | UPDATE) actions for EACH foreign keys
==================================================

                     |    ON     |    ON     |
    Action           |  DELETE   |  UPDATE   |
    -----------------+-----------+-----------+
    CASCADE          | Row       | Forbidden |
    SET NULL         | Row       | Row       |
    SET DEFAULT      | Row       | Row       |
    EACH CASCADE     | Element   | Element   |
    EACH SET NULL    | Element   | Element   |
    EACH SET DEFAULT | Forbidden | Forbidden |
    NO ACTION        | -         | -         |
    RESTRICT         | -         | -         |
    -----------------+-----------+-----------+

Example 1. Table C references table B via a (non-array) foreign key.

Example 2. The referencing table A is constructed as a GROUP BY of table C in Example 1. There is an EACH foreign key on A which references B, representing the same relationship as the foreign key in Example 1.

Remark 3. Examples 1 and 2 are related, because they represent the same model; in making choices about a certain action on Example 2 we will consider its relationship with Example 1.

Example 4.
Assume that the FK in Example 1 has an ON DELETE CASCADE action. Deleting one row in table B will delete all the referencing rows in table C. The state that we get after the DELETE is the same as that obtained in Example 2 with the ON DELETE EACH CASCADE action after removing the same row.

Example 4 suggests associating the Element behaviour with the ON DELETE EACH CASCADE action. The user can choose between two different options for a CASCADE-style action when a referenced row is deleted; both have use cases, as the following Example 5 shows.

Example 5. If you remove a vertex from a polygon (represented as an array of vertices), you can either destroy the polygon (ON DELETE CASCADE) or turn it into a polygon with fewer vertices (ON DELETE EACH CASCADE).

ON UPDATE SET NULL has its own purpose as a behaviour distinct from ON UPDATE EACH SET NULL; again, both options are provided to the user, essentially as with ON DELETE CASCADE and ON DELETE EACH CASCADE.

ON (UPDATE | DELETE) EACH SET DEFAULT is forbidden, because table A does not carry a default value for an array element. In theory the default value could be retrieved from the referenced table B, but that would be unusual and in any case different from the corresponding case of Example 1 with ON (UPDATE | DELETE) SET DEFAULT.

ON UPDATE CASCADE is forbidden because, as far as we can see, the only meaningful action for propagating updated values is ON UPDATE EACH CASCADE.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)
Hi again,

[...]

> However, your example is enough unlike the actual code that the
> conclusion you state following the word "clearly" isn't actually clear
> to me.  According to latch.h, the correct method of using a latch is
> like this:
>
>  * for (;;)
>  * {
>  *     ResetLatch();
>  *     if (work to do)
>  *         Do Stuff();
>  *     WaitLatch();
>  * }
>
> Meanwhile, anyone who is creating additional work to do should add the
> work to the queue and then set the latch.

When writing the above statement, including the "clearly", we were possibly thinking too much of the above usage hint, which uses only ResetLatch and WaitLatch, and not considering that SetLatch is to be part of "Do Stuff()". So here are once again our version and the more properly translated one, this time including SetLatch (in line 16).

Our version:

     1  #define WORKERS 2
     2  volatile _Bool latch[WORKERS];
     3  volatile _Bool flag[WORKERS];
     4
     5  void worker(int i)
     6  {
     7      while (!latch[i]);
     8      for (;;)
     9      {
    10          assert(!latch[i] || flag[i]);
    11          latch[i] = 0;
    12          if (flag[i])
    13          {
    14              flag[i] = 0;
    15              flag[(i+1) % WORKERS] = 1;
    16              latch[(i+1) % WORKERS] = 1;
    17          }
    18
    19          while (!latch[i]);
    20      }
    21  }

In PostgreSQL-function terms (lines 7, 11, 16 and 19 now use the latch API):

     1  #define WORKERS 2
     2  volatile _Bool latch[WORKERS];
     3  volatile _Bool flag[WORKERS];
     4
     5  void worker(int i)
     6  {
     7      WaitLatch(i);
     8      for (;;)
     9      {
    10          assert(!latch[i] || flag[i]);
    11          ResetLatch(i);
    12          if (flag[i])
    13          {
    14              flag[i] = 0;
    15              flag[(i+1) % WORKERS] = 1;
    16              SetLatch((i+1) % WORKERS);
    17          }
    18
    19          WaitLatch(i);
    20      }
    21  }

> So it seems to me that we could potentially fix this by inserting
> barriers at the end of ResetLatch and at the beginning of SetLatch and
> WaitLatch.  Then the latch has to get reset before we check whether
> there's work to do; and we've got to finish checking for work before we
> again try to wait for the latch.  Similarly, any work that was in
> progress before SetLatch was called will be forced to be committed to
> memory before SetLatch does anything else.
> Adding that many barriers might not be very good for performance but it
> seems OK from a correctness point of view, unless I am missing
> something, which is definitely possible.  I'd appreciate any thoughts
> you have on this, as this is clearly subtle and tricky to get exactly
> right.

So we had suggested the following bugfixes:

In [3] it was suggested to fix the problem by placing a barrier in ResetLatch, which corresponds to placing it between lines 11 and 12 in the code above. This amounts to placing a barrier between the two reads of Worker 1 (lines 7/19 and 12, i.e. between WaitLatch and the if(flag[i])). Placing a sync (the strongest Power barrier) there would, however, still be insufficient for the second problem: it would only fix the reordering of read-read pairs by Worker 1 and the store-atomicity issue from Worker 0. The writes on Worker 0 could still be reordered (problem number 2).

One possible fix consists of placing a sync between the two writes on Worker 0, and an address dependency between the two reads on Worker 1. Clearly, however, these are changes that can no longer be hidden behind the ResetLatch/WaitLatch interface, but rather go into the code using them. Here, the two writes on Worker 0 correspond to lines 15 and 16, and indeed line 16 is exactly the call to SetLatch.

For solving problem 1, the mp (message-passing) idiom, the following options are possible (in all cases stronger synchronisation primitives may be used, i.e. the strongest Power barrier, sync, may be used, or lwsync may be used instead of an address dependency):

1. An lwsync at the beginning of SetLatch, and an lwsync in ResetLatch (preferably after the write).
2. An lwsync at the beginning of SetLatch, and an address dependency in ResetLatch.

To address the second problem, the lb (load-buffering) idiom, an address dependency has to be put either in WaitLatch or in SetLatch.
To fix both problems, the performance-wise cheapest option would thus be placing an address dependency in ResetLatch and an lwsync in SetLatch. For practical reasons, however, placing an lwsync in both places (at the beginning of SetLatch and after the write in ResetLatch) might be preferable: address dependencies may be optimised away by the C compiler, or require inline assembly in a form not as easy to factor out as lwsync; plus the interface of
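As a rough illustration of the barrier placement discussed above (this is a sketch with invented names, not the actual PostgreSQL latch code; C11 fences stand in for the Power lwsync/sync and the real WaitLatch blocks rather than spinning):

```c
/* Sketch only: hypothetical wrappers showing where the proposed barriers
 * would sit, using C11 fences in place of the Power lwsync/sync barriers.
 * Not the actual PostgreSQL implementation. */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool latch;
static atomic_bool flag;            /* stands in for "work to do" */

static void SetLatchSketch(void)
{
    /* Barrier at the beginning of SetLatch: work queued before the call
     * (the write to flag) must become visible before the latch is seen
     * as set. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&latch, true, memory_order_relaxed);
}

static void ResetLatchSketch(void)
{
    atomic_store_explicit(&latch, false, memory_order_relaxed);
    /* Barrier after the write: the reset must be ordered before the
     * subsequent read of flag, or we might sleep on freshly queued work. */
    atomic_thread_fence(memory_order_seq_cst);
}

static void WaitLatchSketch(void)
{
    /* The real WaitLatch blocks; a spin loop keeps the sketch short. */
    while (!atomic_load_explicit(&latch, memory_order_relaxed))
        ;
    atomic_thread_fence(memory_order_acquire);
}
```

A single-threaded caller can queue work with `flag = true`, call SetLatchSketch(), and a worker looping on WaitLatchSketch()/ResetLatchSketch() will observe the work; the fences are what make the same observation reliable across two Power CPUs.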
Re: [HACKERS] Fix PL/Python metadata when there is no result
On ons, 2012-03-07 at 17:14 -0500, Tom Lane wrote:
> Peter Eisentraut pete...@gmx.net writes:
> > On ons, 2012-03-07 at 16:49 -0500, Tom Lane wrote:
> > > Still, it seems rather arbitrary to say that the row count property
> > > is the thing to test for that purpose and no other is.  Why not
> > > return None for any property that's not sensible?
> >
> > Hmm, above you said you were in favor of throwing an error rather
> > than returning None?
>
> I said it was a reasonable alternative, not that it was the only one we
> should consider.  The behavior of .nrows() might be accidental, but
> perhaps it is a preferable model to adopt.

It turns out I was mistaken about the .nrows() behavior. It returns 0 even for utility commands, because the value comes straight from SPI_processed. But SPI_processed is a C variable, which can't have a "not applicable" value, so that doesn't necessarily mean other languages can't handle this differently.

After pondering this for several days now, I still think the best approach is to change .nrows() to return None for utility commands and have the other metadata functions throw exceptions. Then the programming style would be "if there are rows, give me metadata about them". The alternative would be to introduce another function, has_rows or something, but then how would that be more intuitive than saying nrows() is not None?
Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
Qi,

Yeah, I can see that. That's a sign that you had a good idea for a project, actually: your idea is interesting enough that people want to debate it. Make a proposal on Monday and our potential mentors will help you refine the idea.

- Original Message -
Date: Thu, 22 Mar 2012 13:17:01 -0400
Subject: Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
From: cbbro...@gmail.com
To: kevin.gritt...@wicourts.gov
CC: pgsql-hackers@postgresql.org

On Thu, Mar 22, 2012 at 12:38 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
> Tom Lane t...@sss.pgh.pa.us wrote:
> > Robert Haas robertmh...@gmail.com writes:
> > > Well, the standard syntax apparently aims to reduce the number of
> > > returned rows, which ORDER BY does not.  Maybe you could do it with
> > > ORDER BY .. LIMIT, but the idea here I think is that we'd like to
> > > sample the table without reading all of it first, so that seems to
> > > miss the point.
> >
> > I think actually the traditional locution is more like
> >     WHERE random() < constant
> > where the constant is the fraction of the table you want.  And yeah,
> > the presumption is that you'd like it to not actually read every row.
> > (Though unless the sampling density is quite a bit less than 1 row
> > per page, it's not clear how much you're really going to win.)
>
> It's all going to depend on the use cases, which I don't think I've
> heard described very well yet.
>
> I've had to pick random rows from, for example, a table of
> disbursements to support a financial audit.  In those cases it has
> been the sample size that mattered, and order didn't.  One interesting
> twist there is that for some of these financial audits they wanted the
> probability of a row being selected to be proportional to the dollar
> amount of the disbursement.  I don't think you can do this without a
> first pass across the whole data set.

This one was commonly called Dollar Unit Sampling, though the terminology has gradually gotten internationalized.
http://www.dummies.com/how-to/content/how-does-monetary-unit-sampling-work.html

What the article doesn't mention is that some particularly large items might wind up covering multiple samples. In the example, they're looking for a sample every $3125 down the list. If there was a single transaction valued at $3, that (roughly) covers 10 of the desired samples.

It isn't possible to do this without scanning across the entire table. If you want repeatability, you probably want to instantiate a copy of enough information to indicate the ordering chosen. That's probably something that needs to be captured as part of the work of the audit, so not only does it need to involve a pass across the data, it probably requires capturing a fair bit of data for posterity.

--
"When confronted by a difficult problem, solve it by reducing it to the question, 'How would the Lone Ranger handle this?'"

The discussion till now has gone far beyond my understanding. Could anyone explain briefly what the idea is for now? The design details are still unfamiliar to me; I can only take the time to understand them after being selected and putting time into reading the relevant material. For now, I'm still curious why Neil's implementation is no longer working. Postgres has been patched a lot, but the general idea behind Neil's implementation should still work, shouldn't it? Besides, whether this query is even needed is still not decided, which seems to be another hard-to-decide point. Is this topic perhaps not yet ready for GSoC? If so, I think I still have time to switch to other topics. Any suggestions? Thanks.

Best Regards and Thanks
Huang Qi Victor
Computer Science of National University of Singapore
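The sampling rule Christopher describes can be sketched in a few lines (the function name, the interval, and the amounts in the test are invented for illustration): walk the cumulative dollar total and take one sample at each fixed interval, so a single large item can cover several sampling points.

```c
/* Sketch of Dollar Unit Sampling as described above.  For each row, count
 * how many sampling points (spaced 'interval' dollars apart on the
 * cumulative total) fall inside that row's amount; big items may cover
 * several points.  Returns the total number of samples taken. */
#include <stdio.h>

static int dus_samples(const double *amounts, int n, double interval,
                       int *picked /* out: samples covered by each row */)
{
    double cum = 0.0, next = interval;
    int total = 0;

    for (int i = 0; i < n; i++) {
        picked[i] = 0;
        cum += amounts[i];
        while (cum >= next) {   /* a big item covers multiple samples */
            picked[i]++;
            total++;
            next += interval;
        }
    }
    return total;
}
```

With an interval of $3125, for instance, a single $31,250 transaction covers ten sampling points by itself, which is the "one item, many samples" effect described above.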
Re: [HACKERS] query cache
Billy,

> I've done a brief search of the postgresql mail archives, and I've
> noticed a few projects for adding query caches to postgresql (for
> example, Masanori Yamazaki's query cache proposal for GSOC 2011),

... which was completed, btw. Take a look at the current release of pgPool.

Are you proposing this for GSOC 2012, or is this just a general idea?

> I'm wondering if anyone would be interested in a query cache as a
> backend to postgresql?  I've been playing around with the postgresql
> code, and if I'm understanding the code, I believe this is possible.

Well, you'd have to start by demonstrating the benefit of it. The advantage of query caches in proxies and clients is well known: you can offload some of the work of the database onto other servers, thus increasing capacity. Adding a query cache to the database server would require the query identity recognition of the cache to be far cheaper (as in 10X cheaper) than planning and running the query, which seems unlikely at best.

There are a number of proven caching models which PostgreSQL currently does not implement. I'd think it would be more profitable to pursue one of those, such as:

* parse caching in the client (JDBC has this, but libpq does not)
* shared cached plans between sessions (snapshot issues here could be nasty)
* fully automated materialized views

If you want to do something radical and new, then come up with a way for a client to request and then reuse a complete query plan by passing it to the server. That would pave the way for client-side plan caching (and plan manipulation) code written in a variety of languages, and thus further innovation through creative algorithms and other ideas.

--Josh Berkus
Re: [HACKERS] Fix PL/Python metadata when there is no result
Peter Eisentraut pete...@gmx.net writes:
> On ons, 2012-03-07 at 17:14 -0500, Tom Lane wrote:
> > I said it was a reasonable alternative, not that it was the only one
> > we should consider.  The behavior of .nrows() might be accidental,
> > but perhaps it is a preferable model to adopt.
>
> After pondering this for several days now I still think the best
> approach is to change .nrows() to return None for utility commands and
> have the other metadata functions throw exceptions.

OK, I don't have strong feelings about it.

regards, tom lane
Re: [HACKERS] query cache
Joshua Berkus j...@agliodbs.com writes:
> If you want to do something radical and new, then come up with a way
> for a client to request and then reuse a complete query plan by
> passing it to the server.

[ raised eyebrow ] That seems like a complete nonstarter on two different grounds: cache invalidation needs (the client won't know if the plan is stale) and security issues (pass a broken plan to the server, crash the server). Those problems could be avoided if the client simply has a token for a plan that's kept on the server side ... but how is that concept different from a prepared statement?

regards, tom lane
Re: [HACKERS] query cache
On Sat, Mar 24, 2012 at 3:22 PM, Joshua Berkus j...@agliodbs.com wrote:
> Billy,
>
> > I've done a brief search of the postgresql mail archives, and I've
> > noticed a few projects for adding query caches to postgresql (for
> > example, Masanori Yamazaki's query cache proposal for GSOC 2011),
>
> ... which was completed, btw.  Take a look at the current release of
> pgPool.
>
> Are you proposing this for GSOC 2012, or is this just a general idea?

Just a general idea, but if someone wants to work on it for GSOC 2012, I wouldn't mind giving a helping hand. I'm not a student, so GSOC probably doesn't apply to me.

> > I'm wondering if anyone would be interested in a query cache as a
> > backend to postgresql?  I've been playing around with the postgresql
> > code, and if I'm understanding the code, I believe this is possible.
>
> Well, you'd have to start by demonstrating the benefit of it.  The
> advantage of query caches in proxies and clients is well known: you
> can offload some of the work of the database onto other servers, thus
> increasing capacity.  Adding a query cache to the database server
> would require the query identity recognition of the cache to be far
> cheaper (as in 10X cheaper) than planning and running the query, which
> seems unlikely at best.

I figured I'd create the md5 digest of the sourceText of a query, and then look that up in a hash. I don't think that will be very expensive. I'll have another hash to keep track of which queries are dependent on which relations, so that when a relation is changed somehow (and committed), the query is then invalidated and removed from the query hash.

Billy
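A toy sketch of the bookkeeping Billy describes (all names are hypothetical; this is not PostgreSQL code): one table maps a digest of the query text to its cached result, and each entry records the relation it depends on so that a committed change to that relation can evict the dependent entries. A trivial djb2 string hash stands in for the md5 digest, and a single dependency per entry stands in for the second hash.

```c
/* Sketch of a digest-keyed query result cache with per-relation
 * invalidation.  Invented names; djb2 stands in for md5. */
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 64

typedef struct {
    unsigned long key;      /* digest of the query text */
    char result[128];       /* cached result; in reality a tuplestore */
    char relation[64];      /* one dependency; real code needs a list */
    int  valid;
} CacheEntry;

static CacheEntry cache[MAX_ENTRIES];

/* djb2 string hash, standing in for the md5 digest of sourceText */
static unsigned long digest(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h;
}

static void cache_put(const char *query, const char *relation,
                      const char *result)
{
    CacheEntry *e = &cache[digest(query) % MAX_ENTRIES];
    e->key = digest(query);
    snprintf(e->result, sizeof e->result, "%s", result);
    snprintf(e->relation, sizeof e->relation, "%s", relation);
    e->valid = 1;
}

static const char *cache_get(const char *query)
{
    CacheEntry *e = &cache[digest(query) % MAX_ENTRIES];
    return (e->valid && e->key == digest(query)) ? e->result : NULL;
}

/* Called when a change to 'relation' commits: evict dependent entries. */
static void invalidate_relation(const char *relation)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (cache[i].valid && strcmp(cache[i].relation, relation) == 0)
            cache[i].valid = 0;
}
```

The lookup cost is one digest of the query text plus one hash-table probe, which illustrates Billy's point that recognition can be cheap; whether it is 10X cheaper than planning, as Josh asks, is the open question.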
Re: [HACKERS] query cache
> > Well, you'd have to start by demonstrating the benefit of it.  The
> > advantage of query caches in proxies and clients is well known: you
> > can offload some of the work of the database onto other servers,
> > thus increasing capacity.  Adding a query cache to the database
> > server would require the query identity recognition of the cache to
> > be far cheaper (as in 10X cheaper) than planning and running the
> > query, which seems unlikely at best.
>
> I figured I'd create the md5 digest of the sourceText of a query, and
> then look that up in a hash.  I don't think that will be very
> expensive.  I'll have another hash to keep track of which queries are
> dependent on which relations, so that when a relation is changed
> somehow (and committed), the query is then invalidated and removed
> from the query hash.

From the experience of implementing the query cache in pgpool-II, I have some suggestions:

- A query result cache should not be created if the transaction including the SELECT is not committed.
- Since a transaction can contain many SELECTs, you need to keep those query results somewhere in temporary storage. You then either discard them or register them in the query cache storage, depending on the transaction's fate (aborted or committed).
- If a SELECT uses non-immutable functions, its result should not be cached.
- If a SELECT uses temporary tables, its result should not be cached.
- If a SELECT uses unlogged tables, its result should not be cached, because their data could vanish after crash recovery. Of course this only applies if you plan to use cache storage which does not survive a crash.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
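The first two suggestions above can be sketched like this (all names invented; this is neither pgpool-II nor PostgreSQL code, and a real implementation would live in shared memory and handle concurrency): results produced inside a transaction sit in a staging area and are only promoted to the cache at commit.

```c
/* Sketch of per-transaction staging for a query result cache. */
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 16

typedef struct {
    char query[128];
    char result[128];
} Entry;

static Entry staged[MAX_ENTRIES];   /* results of the current transaction */
static int   nstaged;
static Entry cache[MAX_ENTRIES];    /* results of committed transactions */
static int   ncached;

/* Called after each cacheable SELECT inside a transaction. */
static void stage_result(const char *query, const char *result)
{
    if (nstaged < MAX_ENTRIES) {
        snprintf(staged[nstaged].query,  sizeof staged[nstaged].query,
                 "%s", query);
        snprintf(staged[nstaged].result, sizeof staged[nstaged].result,
                 "%s", result);
        nstaged++;
    }
}

/* On commit, promote the staged results to the shared cache... */
static void txn_commit(void)
{
    for (int i = 0; i < nstaged && ncached < MAX_ENTRIES; i++)
        cache[ncached++] = staged[i];
    nstaged = 0;
}

/* ...and on abort, simply throw them away. */
static void txn_abort(void)
{
    nstaged = 0;
}

static const char *cache_lookup(const char *query)
{
    for (int i = 0; i < ncached; i++)
        if (strcmp(cache[i].query, query) == 0)
            return cache[i].result;
    return NULL;
}
```

This keeps uncommitted results invisible to other sessions, which is exactly why the first suggestion forbids creating cache entries before commit.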