Re: [PERFORM] store A LOT of 3-tuples for comparisons
I am also wondering about the ordering and whether that matters. Can you have he, can, drink as well as drink, he, can and should they be considered the same? If so you will need a different tactic. ordering matters. So the 3-column tactic should work. Thanks for your advice! ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Q on views and performance
On Fri, Feb 22, 2008 at 8:48 PM, Dean Gibson (DB Administrator) [EMAIL PROTECTED] wrote: On 2008-02-22 12:49, Kynn Jones wrote: Of course, I expect that using views Vint1 and Vint2... would result in a loss in performance relative to a version that used bona fide tables Tint1 and Tint2. My question is, how can I minimize this performance loss? That used to be my thoughts too, but I have found over the years that the PostgreSQL execution planner is able to flatten SELECTs using VIEWs, ALMOST ALWAYS in a way that does not adversely affect performance, and often gives an IMPROVEMENT in performance, probably because by using VIEWs I am stating the query problem in a better way than if I try to guess the best way to optimize a SELECT. Well, the last consideration you mention there does not apply to the two alternatives I'm comparing because they differ only in that one uses views V1, V2, V3, ... , V100 where the other one uses the corresponding tables T1, T2, T3, ... , T100, so the query statements would be identical in both cases. I have at least a 10:1 ratio of VIEWs to TABLEs. Occasionally, with some query that is slow, I will try to rewrite it without VIEWs. This ALMOST NEVER results in an improvement in performance... That's truly amazing! Just to make sure I get you right, you're saying that when you replace a view by its equivalent table you see no performance gain? How could it be? With views every query entails the additional work of searching the underlying tables for the records that make up the views... OK, if I think a bit more about it I suppose that a view could be implemented for performance as a special sort of table consisting of a single column of pointers to the true records, in which case using views would entail only the cost of this indirection, and not the cost of a search... (And also the cost of maintaining this pointer table, if the underlying tables are mutable.) So I guess views could be implemented in such a way that the difference in SELECT performance relative to replacing them with tables would be negligible... Anyway, your post once again reminded me of awesomeness of PostgreSQL. Props to the developers! kynn
Re: [PERFORM] Q on views and performance
On Fri, Feb 22, 2008 at 8:48 PM, Dean Gibson (DB Administrator) [EMAIL PROTECTED] wrote: On 2008-02-22 12:49, Kynn Jones wrote: Of course, I expect that using views Vint1 and Vint2... would result in a loss in performance relative to a version that used bona fide tables Tint1 and Tint2. My question is, how can I minimize this performance loss? That used to be my thoughts too, but I have found over the years that the PostgreSQL execution planner is able to flatten SELECTs using VIEWs, ALMOST ALWAYS in a way that does not adversely affect performance, and often gives an IMPROVEMENT in performance, probably because by using VIEWs I am stating the query problem in a better way than if I try to guess the best way to optimize a SELECT. I have at least a 10:1 ratio of VIEWs to TABLEs. Occasionally, with some query that is slow, I will try to rewrite it without VIEWs. This ALMOST NEVER results in an improvement in performance, and when it does, I am able to find another way to write the VIEW and SELECT to recapture the gain. Since you have experience working with views, let me ask you this. The converse strategy to the one I described originally would be to create the individual tables T1, T2, T3, ..., T100, but instead of keeping around the original (and now redundant) table T, replace it with a view V made up of the union of T1, T2, T3, ..., T100. The problem with this alternative is that one cannot index V, or define a primary key constraint for it, because it's a view. This means that a search in V, even for a primary key value, would be *have to be* very inefficient (i.e. I don't see how even the very clever PostgreSQL implementers could get around this one!), because the engine would have to search *all* the underlying tables, T1 through T100, even if it found the desired record in T1, since it has no way of knowing that the value is unique all across V. Is there a way around this? kynn
Re: [PERFORM] Q on views and performance
On 2008-02-23 05:59, Kynn Jones wrote: On Fri, Feb 22, 2008 at 8:48 PM, Dean Gibson (DB Administrator) [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: ... Since you have experience working with views, let me ask you this. The converse strategy to the one I described originally would be to create the individual tables T1, T2, T3, ..., T100, but instead of keeping around the original (and now redundant) table T, replace it with a view V made up of the union of T1, T2, T3, ..., T100. The problem with this alternative is that one cannot index V, or define a primary key constraint for it, because it's a view. This means that a search in V, even for a primary key value, would be *have to be* very inefficient (i.e. I don't see how even the very clever PostgreSQL implementers could get around this one!), because the engine would have to search *all* the underlying tables, T1 through T100, even if it found the desired record in T1, since it has no way of knowing that the value is unique all across V. Is there a way around this? kynn Oh, I wouldn't create separate tables and do a UNION of them, I'd think that would be inefficient. I didn't look in detail at your previous eMail, but I will now: 1. You haven't told us the distribution of zipk, or what the tables are indexed on, or what type of performance you are expecting. Your initial examples don't help much unless you actually have performance numbers or EXPLAIN output for them, since adding the third JOIN significantly changes the picture, as does changing one of the JOINs to a LEFT JOIN. 2. In your actual (Q1** and Q2**) examples, why is one JOIN an INNER JOIN and the other one a LEFT JOIN? Given your description of Q1 at the top of your message, that doesn't make sense to me. 3. Why not write: CREATE VIEW txt AS SELECT a1.word AS word1, a1.type AS type1, a2.word AS word2, a2.type AS type2 FROM T a1 [LEFT] JOIN T a2 USING( zipk ); -- Use LEFT if appropriate SELECT word1, word1 FROM S JOIN txt ON word = word1 WHERE type1 = int1 AND type2 = int2; If either of those (either with or without the LEFT) are not equivalent to your problem, how about just: SELECT a1.word AS word1, a2.word AS word2 FROM S JOIN T a1 USING( word) [LEFT] JOIN T a2 USING( zipk ) -- Use LEFT if appropriate WHERE a1.type = int1 AND a2.type = int2; Show us (using EXPLAIN) what the query planner thinks of each of these. -- Mail to my list address MUST be sent via the mailing list. All other mail to my list address will bounce.
Re: [PERFORM] Q on views and performance
On 2008-02-23 07:08, Dean Gibson (DB Administrator) wrote: ... SELECT word1, word1 FROM S JOIN txt ON word = word1 WHERE type1 = int1 AND type2 = int2; ... Oops that should be: SELECT word1, word2 FROM S JOIN txt ON word = word1 WHERE type1 = int1 AND type2 = int2; -- Mail to my list address MUST be sent via the mailing list. All other mail to my list address will bounce.
Re: [PERFORM] Q on views and performance
Hi, Dean. The system I'm working with is very similar in spirit to a large multilingual dictionary covering 100 languages. Using this analogy, the type column would correspond to the language, and the zipk column would correspond to some language-independent key associated with a concept (concept key for short). So, if it were indeed a multilingual dictionary, records in T would look like word | zipk | language -+--+--- house | 1234 | english casa| 1234 | spanish haus| 1234 | german piano | 2345 | english piano | 2345 | spanish cat | 3456 | english chat| 3456 | french chat| 4567 | english plausch | 4567 | german ...where I used the notation lang to denote the integer id assigned to language lang. Therefore typically there are about 100 records in T for any given zipk, one for each language. But the correspondence is not perfect, since, for example, some languages have, proverbially, more than one word for snow, and some (maybe from some tropical island in the South Pacific) have none. (This last case, BTW, is what accounts for the use of left joins, as will become clear in a minute.) The table S can be thought of a table consisting of a collection of words to be translated to some target language. In the first type of query (Q1), all the words in S are effectively declared to belong to the same source language, whereas in the second type of query (Q2) the source language for the words in S is left unspecified (in this case S may contain words from various languages, or words--like piano or chat in the example above--that belong simultaneously to different languages, and which may (e.g. piano) or may not (e.g. chat) have the same zipk [concept key] for each of these languages). So, regarding your question about (Q1**) and (Q2**): (Q1**) SELECT a1.word, sq.word FROM S JOIN T a1 USING ( word ) LEFT JOIN ( SELECT * FROM T a2 WHERE a2.type = int2 ) sq USING ( zipk ) WHERE a1.type = int1; (Q2**) SELECT a1.word, sq.word FROM S JOIN T a1 USING ( word ) LEFT JOIN ( SELECT * FROM T a2 WHERE a2.type = int2 ) sq USING ( zipk ) ...the inner join with S is intended to pick out all the records in the source table (either Tint1 in Q1** or T in Q2**) corresponding to words in S, while the second (left) join, is there to find all the translations in the target language. I use a left join so that even those words in S for which no translations exist will show up in the query results. 3. Why not write: CREATE VIEW txt AS SELECT a1.word AS word1, a1.type AS type1, a2.word AS word2, a2.type AS type2 FROM T a1 [LEFT] JOIN T a2 USING( zipk ); -- Use LEFT if appropriate SELECT word1, word1 FROM S JOIN txt ON word = word1 WHERE type1 = int1 AND type2 = int2; This is would indeed produce the same results as Q1, but this approach would require defining about 10,000 views, one for each possible pair of int1 and int2 (or pair of languages, to continue the multilingual dictionary analogy), which freaks me out for some reason. (Actually, the number of such views would be many more than that, because in the actual application there is not just one T but several dozen, similar to what would happen to the schema in the multilingual dictionary analogy if we wanted to pre-segregate the words according to some categories, say a T for animals, a T for fruits, a T for verbs, a T for professions, etc.) (I need to do a bit more work before I can post the EXPLAIN results.) kynn
Re: [PERFORM] Q on views and performance
On 2008-02-23 08:21, Kynn Jones wrote: ... 3. Why not write: CREATE VIEW txt AS SELECT a1.word AS word1, a1.type AS type1, a2.word AS word2, a2.type AS type2 FROM T a1 [LEFT] JOIN T a2 USING( zipk ); -- Use LEFT if appropriate SELECT word1, word1 FROM S JOIN txt ON word = word1 WHERE type1 = int1 AND type2 = int2; This is would indeed produce the same results as Q1, but this approach would require defining about 10,000 views, one for each possible pair of int1 and int2 Why 10,000 views??? What's wrong with the ONE view above? You DON'T want to be defining VIEWs based on actual tables VALUES; leave that to the SELECT. For that matter, what's wrong with the final SELECT I listed (below)? SELECT a1.word AS word1, a2.word AS word2 FROM S JOIN T a1 USING( word ) LEFT JOIN T a2 USING( zipk ) WHERE a1.type = int1 AND a2.type = int2; -- Dean -- Mail to my list address MUST be sent via the mailing list. All other mail to my list address will bounce.
Re: [PERFORM] Q on views and performance
On 2008-02-23 08:49, Dean Gibson (DB Administrator) wrote: Why 10,000 views??? What's wrong with the ONE view above? You DON'T want to be defining VIEWs based on actual tables VALUES; leave that to the SELECT. For that matter, what's wrong with the final SELECT I listed (below)? SELECT a1.word AS word1, a2.word AS word2 FROM S JOIN T a1 USING( word ) LEFT JOIN T a2 USING( zipk ) WHERE a1.type = int1 AND a2.type = int2; -- Dean Amendment: I forgot, that if it's a LEFT JOIN you have to write it as: SELECT a1.word AS word1, a2.word AS word2 FROM S JOIN T a1 USING( word ) LEFT JOIN T a2 USING( zipk ) WHERE a1.type = int1 AND (a2.type = int2 OR a2.type IS NULL); -- Dean -- Mail to my list address MUST be sent via the mailing list. All other mail to my list address will bounce.
[PERFORM] LISTEN / NOTIFY performance in 8.3
Hi, I'm noticing a strange increase in the amount of time it takes to issue a NOTIFY statement. I have an existing app that provides a producer / consumer type of queue and that uses the LISTEN / NOTIFY mechanism to signal the consumers of new items arriving in the queue. The consumers then process these items and issue a notify to signal that they have been processed. In the past issuing these notifications happened very quickly, now on 8.3 I'm seeing all of them taking over 300ms and many of them taking 1500ms or more! The notifications are happening outside of any transactions (which is itself a probable area for performance improvement, I realize) but I'm wondering what might have changed between 8.1 (the version I was using in the past) and 8.3? TIA, Joel ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] LISTEN / NOTIFY performance in 8.3
Joel Stevenson [EMAIL PROTECTED] writes: I have an existing app that provides a producer / consumer type of queue and that uses the LISTEN / NOTIFY mechanism to signal the consumers of new items arriving in the queue. The consumers then process these items and issue a notify to signal that they have been processed. In the past issuing these notifications happened very quickly, now on 8.3 I'm seeing all of them taking over 300ms and many of them taking 1500ms or more! That's strange, I would not have thought that listen/notify behavior would change at all. How are you measuring this delay exactly? Can you put together a self-contained test case? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] LISTEN / NOTIFY performance in 8.3
At 11:58 PM -0500 2/23/08, Tom Lane wrote: Joel Stevenson [EMAIL PROTECTED] writes: That's strange, I would not have thought that listen/notify behavior would change at all. How are you measuring this delay exactly? Can you put together a self-contained test case? Attached is a perl script that sort of simulates what's going on. Thanks for the script. It's not showing any particular problems here, though. With log_min_duration_statement = 10, the only statements that (slightly) exceed 10ms are the select count(*) from generate_series(1, 15000) ones. Also of note, the iowait percentages on this quad core linux box jump to 30-40% while this test script is running, event though there's no table activity involved and the producer consumers pause for up to a second between iterations. This sounds a bit like pg_listener has gotten bloated. Try a VACUUM VERBOSE pg_listener (as superuser) and see what it says. At the moment (server is inactive): pcdb=# VACUUM VERBOSE pg_listener; INFO: vacuuming pg_catalog.pg_listener INFO: pg_listener: removed 1 row versions in 1 pages INFO: pg_listener: found 1 removable, 21 nonremovable row versions in 28 pages DETAIL: 0 dead row versions cannot be removed yet. There were 2319 unused item pointers. 28 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM running the test script and then the above command: pcdb=# VACUUM VERBOSE pg_listener; INFO: vacuuming pg_catalog.pg_listener INFO: pg_listener: removed 693 row versions in 12 pages INFO: pg_listener: found 693 removable, 21 nonremovable row versions in 28 pages DETAIL: 0 dead row versions cannot be removed yet. There were 2308 unused item pointers. 28 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM Numerous notifications took 1000ms or so to complete in the test script execution between those two vacuum runs. If that is the problem then the next question is why it got so much more bloated than you were used to --- something wrong with vacuuming procedures, perhaps? I have autovacuum on and using default settings. I have an explicit vacuum routine that runs nightly over the whole DB. -Joel ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq