Re: [GENERAL] request help forming query
yes, this could get called on quite large tables (maybe not billions ...). The second solution looks useful - I'll try it on some test data. thanks both of you. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Planner: rows=1 after similar to where condition.
-Original Message- From: Scott Marlowe [mailto:[EMAIL PROTECTED] Sent: Monday, 25 February 2008 7:14 To: Joris Dobbelsteen Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Planner: rows=1 after similar to where condition. On Sun, Feb 24, 2008 at 4:35 PM, Joris Dobbelsteen [EMAIL PROTECTED] wrote: I seem to have some planner oddity, where it seems to completely mispredict the output after a regex compare. I've seem it on other occasions, where it completely screws up the join. You can note the rows=1 after the filter. A similar sitution has occurred when doing a regex filter in a subquery, which was subsequently predited as 1 row and triggered (oddly enough) a sequencial scan. Doing the same using equality on the result to substring(text from regex) seemed to work and produced a useful plan, since it did a hash-join (as it should have). Is this a known problem? Otherwise I think I should build a smaller test case... Using Postgresql 8.2.6 from Debian Etch-backports. Should be: PostGreSQL 8.2.5 on x86_64-pc-linux-gnu (GCC 4.1.2.20061115) (Debian 4.1.1-21). Should have paid closer attention. Bitmap Heap Scan on log_syslog syslog (cost=13124.26..51855.25 rows=1 width=270) Recheck Cond: (((program)::text = 'amavis'::text) AND ((facility)::text = 'mail'::text)) Filter: ***SOME VERY LONG SIMILAR TO REGEX - BitmapAnd (cost=13124.26..13124.26 rows=18957 width=0) - Bitmap Index Scan on IX_log_syslog_program (cost=0.00..2223.95 rows=92323 width=0) Index Cond: ((program)::text = 'amavis'::text) - Bitmap Index Scan on IX_log_syslog_facility (cost=0.00..10899.81 rows=463621 width=0) Index Cond: ((facility)::text = 'mail'::text) It's not saying it will only get one row back for sure, it's saying it thinks it will return one row. and depending on your query, it might. What's the query, and what's the explain analyze of that query? See the attached file for the query and the explain (hopefully this gives a consistent view and maintains the layout for easier reading). The point is that it will NOT, not even close. The planner guesses 1 row, but the output was arround 13000 rows (of the 2.2M rows in the table). Oddly enough the 18k rows on the bitmap and seems a very good estimation. In fact, if I omit the SIMILAR TO, it estimates ~12000 rows, which is spot on. So it seems the SIMILAR TO really gets the planner confused. The real value was 12981 rows that were returned by the query in the first case. However, since I removed this data from the original table (its now somewhere else), I cannot present the original EXPLAIN ANALYZE any more. The new dataset only contains ~137 (but I still have the old statistics, I think, or at least they provide the same predictions). I also included a run after EXPLAIN ANALYZE on the current dataset. Hopefully this helps. Thanks, - Joris QUERY= SELECT a.msgid, a.rcv_time, a.sent_time, a.hostname, a.passedas, a.from, a.to, a.message_id, a.resent_message_id, a.mail_id, a.queue_id, convert_score(a.spamscore) AS spamscore, CAST(a.duration AS integer) FROM(SELECT syslog.msgid, syslog.rcv_time, syslog.sent_time, syslog.hostname, substring(syslog.text from 'amavis% Passed #%#, % % - %, Message-ID: %, mail_id: %, Hits: %, queued_as: %, % ms' for '#') as passedas, substring(syslog.text from 'amavis% Passed %, % #%# - %, Message-ID: %, mail_id: %, Hits: %, queued_as: %, % ms' for '#') as from, substring(syslog.text from 'amavis% Passed %, % % - #%#, Message-ID: %, mail_id: %, Hits: %, queued_as: %, % ms' for '#') as to, substring(syslog.text from 'amavis% Passed %, % % - %, Message-ID: #[^]+#, (Resent-Message-ID: #%#, |)mail_id: %, Hits: %, queued_as: %, % ms' for '#') as message_id, substring(syslog.text from 'amavis% Passed %, % % - %, Message-ID: %, Resent-Message-ID: #%#, mail_id: %, Hits: %, queued_as: %, % ms' for '#') as resent_message_id, substring(syslog.text from 'amavis% Passed %, % % - %, Message-ID: %, mail_id: #%#, Hits: %, queued_as: %, % ms' for '#') as mail_id, substring(syslog.text from 'amavis% Passed %, % % - %, Message-ID: %, mail_id: %, Hits: #%#, queued_as: %, % ms' for '#') as spamscore, substring(syslog.text from 'amavis% Passed %, % % - %, Message-ID: %, mail_id: %, Hits: %, queued_as: #%#, % ms' for '#') as queue_id, substring(syslog.text from 'amavis% Passed %, % % - %, Message-ID: %, mail_id: %, Hits: %, queued_as: %, #%# ms' for '#') as duration, syslog.text, syslog.facility, syslog.priority, syslog.program FROM public.log_syslog syslog WHERE syslog.program = 'amavis' AND syslog.facility = 'mail' AND syslog.priority =
Re: [GENERAL] Planner: rows=1 after similar to where condition.
Joris Dobbelsteen [EMAIL PROTECTED] writes: Should be: PostGreSQL 8.2.5 on x86_64-pc-linux-gnu (GCC 4.1.2.20061115) (Debian 4.1.1-21). In this case that may matter. One of the changes in the 8.2.6 update was: . Improve planner's handling of LIKE/regex estimation in non-C locales (Tom) I seem to recall this largely had to do with negated regexp matches but I might only be remembering part of it. I would start by taking all the fixes for known bugs before trying to diagnose a new one :) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Planner: rows=1 after similar to where condition.
-Original Message- From: Gregory Stark [mailto:[EMAIL PROTECTED] Sent: Monday, 25 February 2008 12:31 To: Joris Dobbelsteen Cc: Scott Marlowe; pgsql-general@postgresql.org Subject: Re: Planner: rows=1 after similar to where condition. Joris Dobbelsteen [EMAIL PROTECTED] writes: Should be: PostGreSQL 8.2.5 on x86_64-pc-linux-gnu (GCC 4.1.2.20061115) (Debian 4.1.1-21). In this case that may matter. One of the changes in the 8.2.6 update was: . Improve planner's handling of LIKE/regex estimation in non-C locales (Tom) I seem to recall this largely had to do with negated regexp matches but I might only be remembering part of it. I would start by taking all the fixes for known bugs before trying to diagnose a new one :) It seems debian backports are not upgraded on my system. So I performed the upgrade to: PostgreSQL 8.2.6 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21) And indeed, I should have upgraded it before to the latest version. It still spits out the same EXPLAIN ANALYZE result. Bitmap Heap Scan on log_syslog syslog (cost=11168.32..16988.84 rows=1 width=221) (actual time=11145.729..30067.606 rows=212 loops=1) Recheck Cond: (((program)::text = 'amavis'::text) AND ((facility)::text = 'mail'::text)) Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~ '***:^(?:amavis\\[[0-9]+\\]: \\([0-9]+-[-0-9]+\\) Passed [A-Za-z0-9]+, [][0-9\\.]* [^]+ - [^]+, Message-ID: [^]+, (Resent-Message-ID: [^]+, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+, queued.as: [^ ,]+, [0-9]+ ms)$'::text)) - BitmapAnd (cost=11168.32..11168.32 rows=2095 width=0) (actual time=608.771..608.771 rows=0 loops=1) - Bitmap Index Scan on IX_log_syslog_program (cost=0.00..490.06 rows=19160 width=0) (actual time=94.982..94.982 rows=85238 loops=1) Index Cond: ((program)::text = 'amavis'::text) - Bitmap Index Scan on IX_log_syslog_facility (cost=0.00..10677.75 rows=426214 width=0) (actual time=504.960..504.960 rows=455084 loops=1) Index Cond: ((facility)::text = 'mail'::text) Total runtime: 30068.868 ms From what little I know statistics and how, in postgres, the filter expressions are computed, the rows=1 prediction seems still a bit odd. It seems the fix did not effect this query. Thanks for your support so far, - Joris ---(end of broadcast)--- TIP 6: explain analyze is your friend
[GENERAL] HOW does 8.3 CREATE REPLACe .. FUNCTION ..COST work
I think I am missing something about how the new CREATE OR REPLACE FUNCTION ...COST works or I am missing some setting in postgresql conf. I was hoping I could use it to control the function that is used in cases where only one needs to be evaluated. Regardless of what I do it seems to always evaluate the first function in the list. I'm running on PostgreSQL 8.3.0, compiled by Visual C++ build 1400 Here is an example of my test: Functions and tables CREATE TABLE log_call ( fn_name character varying(100) NOT NULL, fn_calltime timestamp with time zone NOT NULL DEFAULT now() ) WITH (OIDS=FALSE); CREATE OR REPLACE FUNCTION fn_pg_costlyfunction() RETURNS integer AS $$ BEGIN INSERT INTO log_call(fn_name) VALUES('fn_pg_costlyfunction()'); RETURN 5; END$$ LANGUAGE 'plpgsql' VOLATILE COST 100; CREATE OR REPLACE FUNCTION fn_pg_cheapfunction() RETURNS integer AS $$ BEGIN INSERT INTO log_call(fn_name) VALUES('fn_pg_cheapfunction()'); RETURN 5; END$$ LANGUAGE 'plpgsql' VOLATILE COST 1; --- Now for the test - --Test 1: This shows that fn_pg_costlyfunction() is the only function that is run - -- unexpected to me shouldn't no function be evaluated or the cheap one? --What's the difference between Test 1 and Test 2 that makes Test 2 do the RIGHT thing? TRUNCATE TABLE log_call; SELECT (fn_pg_costlyfunction() 2 OR fn_pg_cheapfunction() 2 OR 5 2); --Test 2: This works as I would expect - shows that none of the functions are run presumably its going straight for 5 2 --becuase it recognizes its the cheapest route TRUNCATE TABLE log_call; SELECT foo.value FROM (SELECT (fn_pg_costlyfunction() 2 OR fn_pg_cheapfunction() 2 OR 5 2 ) as value) as foo --Test 3: It always runs the first function even though the cost of the first is higher than the second (in this case log_call contains fn_pg_costlyfunction()) TRUNCATE TABLE log_call; SELECT foo.value FROM (SELECT (fn_pg_costlyfunction() 2 OR fn_pg_cheapfunction() 2) as value) as foo; TRUNCATE TABLE log_call; SELECT (fn_pg_costlyfunction() 2 OR fn_pg_cheapfunction() 2) as value; --Test 4: It always runs the first function even though the cost of the first is higher than the second (in this case log_call contains fn_pg_cheapfunction()) TRUNCATE TABLE log_call; SELECT foo.value FROM (SELECT (fn_pg_cheapfunction() 2 OR fn_pg_costlyfunction() 2 ) as value) as foo; TRUNCATE TABLE log_call; SELECT (fn_pg_cheapfunction() 2 OR fn_pg_costlyfunction() 2 ) as value; Thanks, Regina - The substance of this message, including any attachments, may be confidential, legally privileged and/or exempt from disclosure pursuant to Massachusetts law. It is intended solely for the addressee. If you received this in error, please contact the sender and delete the material from any computer.
Re: [GENERAL] text and bytea
Umm, I think all you showed was that the to_ascii() function was broken. Postgres knows exactly what encoding the string is in, the backend encoding: in your case UTF-8. That would be fine, if it were true; then, one could assume that every postgresql function that returns a text gets ALWAYS the standard backend encoding (again: as in Java). But consider the result postgresql gets from this (from my example): encode(convert_to(c,'LATIN9'),'escape') That's something of type text (a strign), postgresql believes it's UTF8, but it's not (it probably woud not even validate as a valid utf8 sequence). IMHO, the semantics of encode() and decode() are correct (the bridge between bytea and text ... in the backend encoding; they should be the only bridge), convert() is also ok (deals with bytes), but convert_to() and convert_from() are dubious if not broken: they imply texts in arbitrary encodings (for output or input) , lead to anomalities and shouldnt be necessary at all. Cheers Hernán J. González http://hjg.com.ar/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] text and bytea
hernan gonzalez [EMAIL PROTECTED] writes: IMHO, the semantics of encode() and decode() are correct (the bridge between bytea and text ... in the backend encoding; they should be the only bridge), convert() is also ok (deals with bytes), but convert_to() and convert_from() are dubious if not broken: they imply texts in arbitrary encodings (for output or input) , lead to anomalities and shouldnt be necessary at all. postgres=# \df convert_from List of functions Schema | Name | Result data type | Argument data types +--+--+- pg_catalog | convert_from | text | bytea, name (1 row) postgres=# \df convert_to List of functions Schema |Name| Result data type | Argument data types ++--+- pg_catalog | convert_to | bytea| text, name (1 row) Looks like they produce and consume byteas to me. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Planner: rows=1 after similar to where condition.
Joris Dobbelsteen [EMAIL PROTECTED] writes: Bitmap Heap Scan on log_syslog syslog (cost=11168.32..16988.84 rows=1 width=221) (actual time=11145.729..30067.606 rows=212 loops=1) Recheck Cond: (((program)::text = 'amavis'::text) AND ((facility)::text = 'mail'::text)) Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~ '***:^(?:amavis\\[[0-9]+\\]: \\([0-9]+-[-0-9]+\\) Passed [A-Za-z0-9]+, [][0-9\\.]* [^]+ - [^]+, Message-ID: [^]+, (Resent-Message-ID: [^]+, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+, queued.as: [^ ,]+, [0-9]+ ms)$'::text)) It's not too surprising that you'd get a small selectivity estimate for such a long regexp; the default estimate is just based on the amount of fixed text in the pattern, and you've got a lot. If you increase the stats target for the column to 100 or more then it will try actually applying the regexp to all the histogram entries. That might or might not give you a better estimate. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Planner: rows=1 after similar to where condition.
Resent due to bounce... orange.nl #5.0.0 X-SMTP-Server; host sss.pgh.pa.us[66.207.139.130] said: 550 -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Monday, 25 February 2008 16:34 To: Joris Dobbelsteen Cc: Gregory Stark; Scott Marlowe; pgsql-general@postgresql.org Subject: Re: [GENERAL] Planner: rows=1 after similar to where condition. Joris Dobbelsteen [EMAIL PROTECTED] writes: Bitmap Heap Scan on log_syslog syslog (cost=11168.32..16988.84 rows=1 width=221) (actual time=11145.729..30067.606 rows=212 loops=1) Recheck Cond: (((program)::text = 'amavis'::text) AND ((facility)::text = 'mail'::text)) Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~ '***:^(?:amavis\\[[0-9]+\\]: \\([0-9]+-[-0-9]+\\) Passed [A-Za-z0-9]+, [][0-9\\.]* [^]+ - [^]+, Message-ID: [^]+, (Resent-Message-ID: [^]+, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+, queued.as: [^ ,]+, [0-9]+ ms)$'::text)) It's not too surprising that you'd get a small selectivity estimate for such a long regexp; the default estimate is just based on the amount of fixed text in the pattern, and you've got a lot. If you increase the stats target for the column to 100 or more then it will try actually applying the regexp to all the histogram entries. That might or might not give you a better estimate. I will try that, expect result back within a few days (have it collect some better sample set). Unfortunally the regex is not so much for narrowing down the selection, but rather guarenteeing the format of the messages. You seem to consider the common case differently, and I can agree for most part. Unfortunally my use-case is different from the expected. That said, might a less aggressive selectivity estimation for long strings work better in the common case? Might an alternative be to use a function and check for a positive result, i.e. something that the predictor cannot take into account? - Joris ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] HOW does 8.3 CREATE REPLACe .. FUNCTION ..COST work
Obe, Regina [EMAIL PROTECTED] writes: --Test 1: This shows that fn_pg_costlyfunction() is the only function that is run - -- unexpected to me shouldn't no function be evaluated or the cheap one? --What's the difference between Test 1 and Test 2 that makes Test 2 do the RIGHT thing? TRUNCATE TABLE log_call; SELECT (fn_pg_costlyfunction() 2 OR fn_pg_cheapfunction() 2 OR 5 2); In a SELECT with no FROM we don't run the optimizer at all; the assumption is that when the expression will only be evaluated once, it's not worth trying to do expression simplification on it first. --Test 2: This works as I would expect - shows that none of the functions are run presumably its going straight for 5 2 --becuase it recognizes its the cheapest route TRUNCATE TABLE log_call; SELECT foo.value FROM (SELECT (fn_pg_costlyfunction() 2 OR fn_pg_cheapfunction() 2 OR 5 2 ) as value) as foo That's just constant-folding: x OR TRUE is TRUE. It has exactly zero to do with the cost of anything. Offhand I think the behavior you are looking for of choosing to run more expensive subexpressions later only occurs for top-level WHERE clauses that are combined with AND. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Planner: rows=1 after similar to where condition.
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Monday, 25 February 2008 16:34 To: Joris Dobbelsteen Cc: Gregory Stark; Scott Marlowe; pgsql-general@postgresql.org Subject: Re: [GENERAL] Planner: rows=1 after similar to where condition. Joris Dobbelsteen [EMAIL PROTECTED] writes: Bitmap Heap Scan on log_syslog syslog (cost=11168.32..16988.84 rows=1 width=221) (actual time=11145.729..30067.606 rows=212 loops=1) Recheck Cond: (((program)::text = 'amavis'::text) AND ((facility)::text = 'mail'::text)) Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~ '***:^(?:amavis\\[[0-9]+\\]: \\([0-9]+-[-0-9]+\\) Passed [A-Za-z0-9]+, [][0-9\\.]* [^]+ - [^]+, Message-ID: [^]+, (Resent-Message-ID: [^]+, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+, queued.as: [^ ,]+, [0-9]+ ms)$'::text)) It's not too surprising that you'd get a small selectivity estimate for such a long regexp; the default estimate is just based on the amount of fixed text in the pattern, and you've got a lot. If you increase the stats target for the column to 100 or more then it will try actually applying the regexp to all the histogram entries. That might or might not give you a better estimate. I will try that, expect result back within a few days (have it collect some better sample set). Unfortunally the regex is not so much for narrowing down the selection, but rather guarenteeing the format of the messages. You seem to consider the common case differently, and I can agree for most part. Unfortunally my use-case is different from the expected. That said, might a less aggressive selectivity estimation for long strings work better in the common case? Might an alternative be to use a function and check for a positive result, i.e. something that the predictor cannot take into account? - Joris ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] HOW does 8.3 CREATE REPLACe .. FUNCTION ..COST work
--Test 2: This works as I would expect - shows that none of the functions are run presumably its going straight for 5 2 --becuase it recognizes its the cheapest route TRUNCATE TABLE log_call; SELECT foo.value FROM (SELECT (fn_pg_costlyfunction() 2 OR fn_pg_cheapfunction() 2 OR 5 2 ) as value) as foo That's just constant-folding: x OR TRUE is TRUE. It has exactly zero to do with the cost of anything. Offhand I think the behavior you are looking for of choosing to run more expensive subexpressions later only occurs for top-level WHERE clauses that are combined with AND. regards, tom lane Tom thanks for the clarification - based on your comment I verified with these -- fn_pg_cheapfunction() is the only one run as you predicted TRUNCATE TABLE log_call; SELECT true as value WHERE (fn_pg_costlyfunction() 2 AND fn_pg_cheapfunction() 5 ); -- fn_pg_costlyfunction() is the only one run - again as predicted by your statement TRUNCATE TABLE log_call; SELECT true as value WHERE (fn_pg_costlyfunction() 2 OR fn_pg_cheapfunction() 2 ); It would be really nice if this worked with OR as well. Is it just much harder to deal with the OR case in the planner or was there some other reason why the OR case was left out? Thanks, Regina - The substance of this message, including any attachments, may be confidential, legally privileged and/or exempt from disclosure pursuant to Massachusetts law. It is intended solely for the addressee. If you received this in error, please contact the sender and delete the material from any computer. ---(end of broadcast)--- TIP 6: explain analyze is your friend
[GENERAL] process pool
Hi, sorry for this question, but as far as I know postgresql does not use a process pool, rather a new process is created for any connection on demand. If this is true, what is the reason for this? Thanks, Luca ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] HOW does 8.3 CREATE REPLACe .. FUNCTION ..COST work
Obe, Regina [EMAIL PROTECTED] writes: It would be really nice if this worked with OR as well. Is it just much harder to deal with the OR case in the planner or was there some other reason why the OR case was left out? Nobody's really made a case why we should have the planner expend cycles on that. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[GENERAL] Cascading Trigger - changing row on delete does not delete row
All, I have 2 tables which both have triggers on them. When I delete a row on table A, a cascading trigger ends up modifying rows in table B. The modified rows in table B trigger an update on rows in table A which happens to be the same row that I am trying to delete. I don't get any errors from the delete, yet PostgreSQL tells me 0 rows affected by the delete and sure enough the row I just tried to delete is still there. Running the delete a 2nd time works because the trigger does not cascade and effect the deleted row. Is there a way to know that a row I am deleting is being deleted so I don't update it? I thought about adding a boolean column 'is_being_deleted' but I can't set that to true without updating the row (which I'm trying to avoid). I've thought about using PL/Perl to access transaction-level global variables where I could store the ID of the row I'm deleting and fetch that value in order to avoid it in my updates ... but I don't want invoke the PL/Perl interpreter and slow down what I'm already doing in PL/PGSQL. Are there transaction-level variables in PL/PGSQL (globals)? Suggestions? -- Dante ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] process pool
Luca Ferrari [EMAIL PROTECTED] writes: Hi, sorry for this question, but as far as I know postgresql does not use a process pool, rather a new process is created for any connection on demand. If this is true, what is the reason for this? Generally to add code the question is not why not, but why? That is, the question you should be asking is why would we do it? What advantage do you anticipate it would have. I assume you think it would be faster. But forking processes on Unix is very fast compared to starting a database session and transaction so the gain would be minimal. And in any case you can always add a pool outside of Postgres using things like pgpool or pgbouncer. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] text and bytea
IMHO, the semantics of encode() and decode() are correct (the bridge between bytea and text ... in the backend encoding; they should be the only bridge), convert() is also ok (deals with bytes), but convert_to() and convert_from() are dubious if not broken: they imply texts in arbitrary encodings (for output or input) , lead to anomalities and shouldnt be necessary at all. Sorry, my mistake. I meant the opposite: convert_to() and convert_from() are the correct bridge (text = bytea) functions. Also convert() is ok. The objetionable ones IMHO are decode()/encode(), which can consume/produce a non-utf8 string (I mean, not the backend encoding) Going back to the line: encode(convert_to(c,'LATIN9'),'escape') Here we have: c = text (ut8) convert_to(..). = bytea (represents a char sequence in latin9 encoding) encode(...) = text (in latin9 encoding?) Cheers Hernán J. González http://hjg.com.ar/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] text and bytea
Another example (Psotgresql 8.3.0, UTF-8 server/client encoding) test=# create table chartest ( c text); test=# insert into chartest (c) values ('¡Hasta mañana!'); test=# create view vchartest as select encode(convert_to(c,'LATIN9'),'escape') as c1 from chartest; test=# select c,octet_length(c) from chartest ; c| octet_length +-- ¡Hasta mañana! | 16 test=# select c1,octet_length(c1) from vchartest ; c1 | octet_length --+-- Hasta maana! | 14 (the field is seen as as text by postgresql, with the default encoding.. UTF8; it is actually not) test=# select * from vchartest where c1 like '%a%'; c1 -- Hasta maana! (1 row) test=# select * from vchartest where c1 ilike '%a%'; ERROR: invalid byte sequence for encoding UTF8: 0xa1 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding. That ilike breaks and like is rather random, it seems that the later has some optimization does not check the validty of the utf8 stream. But thats not the point. The point is that IMO postgresql should always handle text in the backend encoding, there should no exists funcions that are designed to produce/consume texts in other encodings. Perhaps the encode function is ill defined, and should be rethinked. Two alternatives: 1. For special binary-to-ascii encodings (base64,hex). Keep its present signature but remove/deprecate the escape type. It returns a text in the backend encoding. 2 For arbitrary binary encodings. Change its signature so that it returns bytea. Of course, all this applies symmetrically to decode(). Appart, and in accordance with this, I think to_ascii() should accept only one text argument. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] APEX / HTML DB for PostgreSQL
Il giorno 24/feb/08, alle ore 04:53, justin tocci ha scritto: In its most basic form a great tool would just start as a pl/pgsql or pl/perl function that could be used to call a table and have it output a batch of records to an editable html form or a colored pdf for nice reports. There would need to be some kind of interface for Apaché of course. I guess the XML functions of 8.3 + an XSL stylesheet shoud do the job... my 2 cents, e. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] autovacuum not freeing up unused space on 8.3.0
Stuart Brooks wrote: ERROR: canceling autovacuum task CONTEXT: automatic vacuum of table metadb.test.transactions Are these happening regularly? They indicate that something is happening on the table that collides with what autovacuum needs to do, and autovacuum defers its task. For this to happen you need to be doing ALTER TABLE or similar however; normal UPDATE/INSERT/DELETE should not cause autovacuum to cancel itself. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] text and bytea
hernan gonzalez [EMAIL PROTECTED] writes: The objetionable ones IMHO are decode()/encode(), which can consume/produce a non-utf8 string (I mean, not the backend encoding) Huh? Those deal with bytea too --- in fact, they've got nothing at all to do with multibyte character representations. They're for handling hex and base64 and suchlike representations of binary data. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] text and bytea
hernan gonzalez [EMAIL PROTECTED] writes: test=# create view vchartest as select encode(convert_to(c,'LATIN9'),'escape') as c1 from chartest; Hmm. This isn't a very sensible combination that you've written here, but I see the point: encode(..., 'escape') is broken in that it fails to convert high-bit-set bytes into \nnn sequences. At least in multibyte backend encodings, we *must* do that to produce valid textual output. I suspect that for consistency we should do it regardless of backend encoding. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
[GENERAL] give a hand to a user having trouble with MySQL?
Came across a blog from Ovid, one of the Perl gurus at BBC. http://use.perl.org/~Ovid/journal/35752?from=rss Quote They're not kidding, either. Subquery optimization in MySQL is terribly brokenhttp://www.xaprb.com/blog/2006/04/30/how-to-optimize-subqueries-and-joins-in-mysql/, as we've discovered today. We have Matt Trout in for some consulting and even ridiculously stupid subqueries have terrible query plans, forcing us to to consider nasty workarounds, one of which is called PostgreSQL. Even though he counts PostgreSQL as one of th enasty workarounds, maybe we should give them a helping hand? Wenjian
[GENERAL] Windows 8.3 installer leaves a debug flag on?
To follow up on my old thread - I tested another install of Postgres 8.3.0.1 - and on windows, the postgresql.conf file has this in it: shared_preload_libraries = '$libdir/plugins/plugin_debugger.dll' # (change requires restart) I got the impression that that debugger wasn't intended to be enabled, by default. It looks like the windows installer is enabling it, however. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] disabling triggers, constaints and so on
We are still in a pickle with trying to resolve our trigger issues without affecting slony triggers. The point is, we need to be able to disable triggers, check constraints, and foreign-key constraints without affecting slony triggers in certain situations. This is all running on 7.4.19, thus, it's our understanding that using tgenabled is not going to be a solution. Making our triggers smarter doesn't get us all the way there. Suggestions? -- Until later, Geoffrey Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Use index for upper(customername) like 'ABC%'
CREATE INDEX ie_cust_upper_name ON customer ( UPPER( customername ) ) WHERE ( UPPER(customername) LIKE 'ABC%' ); ABC is different at every time. This is first letters of customer name entered by user. So I cannot add ABC to index. Andrus. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] RETURNS SETOF function question
On Sun, Feb 24, 2008 at 7:08 PM, Erik Jones [EMAIL PROTECTED] wrote: There was an article that covered this in the Postgres Online Journal ( http://www.postgresonline.com/journal/index.php?/categories/6-pl-programming). Basically, do this: CREATE OR REPLACE FUNCTION foo(text, text) RETURNS SETOF text $$ SELECT * FROM foo($1, $2); $$ LANGUAGE sql; That one goes straight into my (still tiny) bag o' Pg tricks. It worked like a charm. Thanks! G.
[GENERAL] how to auto GRANT custom ACL on a new table?
Hello, I have asked the question on novice ML but it seems there is not much traffic there. I hope nobody minds I'm re-asking here, on a much more active list. The problem is as follows: There are 2 role groups in my cluster: editors and viewers. In each group there are several users. In the DB, members of editors are allowed to create new tables in one schema. My problem is that only the very user who created the table is allowed to view and edit it, by default. I need to modify this default PostgreSQL's behaviour, so that the ACL on a new table in this schema is set to SELECT, INSERT, UPDATE, DELETE, REFERENCES for editors, and SELECT for viewers, without having to manually GRANT rights each time a new table is created. I can't control this setting from the client software as these are various programs, and even if I could it'd be still better anyway to have it controlled in one single place on the server side. But how? From reading so far I *suppose* I should create a function which calls an appropriate GRANT, and trigger it when a new record is added to pg_class. Is this doable? If the function+trigger approach is optimal, can Anybody share a working example? I have a muddy idea it would be something like this pseudo-code: CREATE TRIGGER trig_acl AFTER INSERT ON pg_catalog.pg_class FOR EACH ROW EXECUTE PROCEDURE fnct_acl(); CREATE FUNCTION fnct_acl () RETURNS opaque AS ' DECLARE table_name TEXT #how do I fetch the table name??? BEGIN GRANT ALL ON myschema.table_name TO editors; END; ' LANGUAGE 'plpgsql'; What should I modify to make it real code? An extra, but desired functionality, would be if I could also prevent other editors from modifying the table *if* it is being currently being edited by somebody. Is that feasible at all? Thank you for any pointers! Regards, Maciek P.S. I searched the archives and only found this [1] thread related, but it does not provide an obvious (for me) solution. [1]http://www.nabble.com/grant-select-on-all-tables-to1029307.html#a1029307 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Windows 8.3 installer leaves a debug flag on?
Dan Armbrust [EMAIL PROTECTED] writes: To follow up on my old thread - I tested another install of Postgres 8.3.0.1 - and on windows, the postgresql.conf file has this in it: shared_preload_libraries = '$libdir/plugins/plugin_debugger.dll' # (change requires restart) I got the impression that that debugger wasn't intended to be enabled, by default. It looks like the windows installer is enabling it, however. I agree, this does not seem like a wise default. I'm not sure what the performance penalty is for having that preloaded, but it's likely not zero ... and production machines really shouldn't have it installed anyway. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[GENERAL] copy with escape
I have data that I'm running through pg_escape_sting in php and then adding to stdin for a copy command. The problem is O'reilly is being changed to O''Reilly in the string and then in the db. I saw with the copy command I can specify the escape but it isn't working for me. Should this command fix this double 'single' quote issue when I put it in the db? And what is the proper syntax? COPY mytable FROM stdin with escape Thanks!
Re: [GENERAL] copy with escape
blackwater dev [EMAIL PROTECTED] writes: I have data that I'm running through pg_escape_sting in php and then adding to stdin for a copy command. The problem is O'reilly is being changed to O''Reilly in the string and then in the db. pg_escape_string is designed to produce a string properly quoted for use as a literal in a SQL command. It is completely wrong for data that is to go into COPY input. I kinda doubt that PHP has anything built-in that's suitable for COPY, though I could be wrong ... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] copy with escape
blackwater dev wrote: I have data that I'm running through pg_escape_sting in php and then adding to stdin for a copy command. The problem is O'reilly is being changed to O''Reilly in the string and then in the db. I saw with the copy command I can specify the escape but it isn't working for me. Should this command fix this double 'single' quote issue when I put it in the db? And what is the proper syntax? COPY mytable FROM stdin with escape COPY mytable (...) FROM STDIN WITH CSV ESCAPE '; http://www.postgresql.org/docs/8.3/static/sql-copy.html But CSV comes with a lot of baggage. You'd be far better off doing tab-delimited, unquoted fields (if you have no tabs in your data). COPY mytable (...) FROM STDIN; ... \. I can't remember precisely all of what pg_escape_string() does, but if you need it for something else you could always do this afterward ;-) implode(\t, str_replace('', ', $row)) b ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] Deploying PostgreSQL on virtualized hardware
In a couple months, I'm going to be considering how best to deploy an application I have that uses PostgreSQL as its back-end. It also makes heavy use of Perl under mod_perl and UMN MapServer with a sprinkling of PostGIS. I've recently become intrigued by the idea of virtualized servers [0], but I don't know anyone who uses them, so I thought I'd ask the list if any of you have tried deploying PostgreSQL or any other disk-heavy software on them? I'm not familiar with how the virtualized disks are arranged, how volatile the memory is, and so forth. And, unfortunately, the advertising folk don't seem to be very keen on divulging the technical details. It would certainly be nice to pay $60 per month for eight virtualization slots and scale up from there as the service grows, rather than buying $10,000 worth of servers when they won't be even close to fully utilized for a several months. Any thoughts? Thanks. Colin [0] eg. http://www.gandi.net/hebergement/offre/xen/ ---(end of broadcast)--- TIP 6: explain analyze is your friend