Re: [GENERAL] Optimizing query?
On 2013-01-31, haman...@t-online.de haman...@t-online.de wrote:

Pavel Stehule wrote: Hi, I am trying to match items from 2 tables based on a common string. One is a big table which has one column with entries like XY123, ABC44, etc. The table has an index on that column. The second table is, typically, much smaller.

select from tab1, tab2 where tab1.code = tab2.code;

This works fine and fast. Now, as a variant, I have some entries like XY423A, XY423B, GF55A, GF55D in the big table and want them to match XY423, GF55 in the second table. Variants I have tried:

select from tab1, tab2 where tab1.code ~ (tab2.code||'($|[A-Z])');
select from tab1, tab2 where tab1.code ~ ('^'||tab2.code||'($|[A-Z])');

Have you tried the substring function?

select from tab1, tab2 where substring(tab1.code from 1 for 5) = tab2.code

Hi Pavel, it was just by chance that a fixed-size substring would match the data at hand. It is more common to have a digit/letter (or vice versa) boundary or a hyphen there.

both take an enormous time. In the better case that I can subset (e.g. all candidates in table 2 share the initial AX) I get back to manageable times by adding

and tab1.code ~ '^AX'

into the recipe. Actual runtime with about a million entries in tab1 and 800 entries in tab2 is about 40 seconds.

any join where result is related to some function result can be very slow, because estimation will be out and any repeated function evaluation is just expensive.

I see the problem: since the ~ operator with a non-constant pattern is evidently recompiling the pattern every time, I wonder whether it would be possible to invent a prefix-match operator that approaches the performance of string equality. I noted in the past (not sure whether anything has changed in regex matching) that a constant leading part of a regex would improve performance, i.e. use an index scan to select possible candidates.

you could write a set-returning function that opens cursors on both tables using ORDER BY code and merges the results

-- ⚂⚃ 100% natural
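(An illustration, not part of the message above, using the thread's assumed tab1/tab2 tables. It shows the constant-leading-prefix point: the planner can only turn a left-anchored pattern into an index range scan when the pattern is a constant, which is why a per-row pattern built from tab2.code cannot use tab1's index.)

-- Constant prefix: a btree index on tab1.code can be used
-- (with the C locale, or an index declared with text_pattern_ops).
CREATE INDEX tab1_code_pattern_idx ON tab1 (code text_pattern_ops);
SELECT * FROM tab1 WHERE code ~ '^AX';    -- index range scan is possible

-- Per-row prefix taken from tab2: the pattern is not a constant,
-- so no range scan can be derived and every row is re-tested.
SELECT t1.*
FROM tab2 t2
JOIN tab1 t1 ON t1.code ~ ('^' || t2.code || '($|[A-Z])');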
Re: [GENERAL] Optimizing query?
Pavel Stehule wrote: but maybe https://github.com/dimitri/prefix can help

Hi Pavel, thanks - this works perfectly. However, it does not seem to play well with the optimizer, so I ended up with:
- select all candidates into a temp table using the prefix operator
- apply all other conditions by joining that temp table to the original ones

Regards Wolfgang
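(A hedged sketch of the workaround described above, not the poster's actual SQL. It assumes the prefix extension's prefix_range type and its @> operator, plus the thread's tab1/tab2 names.)

-- Step 1: collect candidate matches with the prefix operator.
CREATE TEMP TABLE candidates AS
SELECT t2.code AS prefix_code, t1.*
FROM   tab2 t2
JOIN   tab1 t1 ON t2.code::prefix_range @> t1.code::text;

-- Step 2: apply the remaining conditions by joining the temp table
-- back to the original tables with ordinary equality joins.
SELECT c.*
FROM   candidates c
JOIN   tab2 t2 ON t2.code = c.prefix_code;
-- (the other filters from the real query would be added here)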
Re: [GENERAL] Optimizing query?
2013/2/2 haman...@t-online.de:

Pavel Stehule wrote: but maybe https://github.com/dimitri/prefix can help

Hi Pavel, thanks - this works perfectly. However, it does not seem to play well with the optimizer, so I ended up with:
- select all candidates into a temp table using the prefix operator
- apply all other conditions by joining that temp table to the original ones

you can send enhancement proposals to Dimitri - he will be happy :)

Regards Pavel

Regards Wolfgang
Re: [GENERAL] Optimizing query?
2013/2/2 Pavel Stehule pavel.steh...@gmail.com:

2013/2/2 haman...@t-online.de:

Pavel Stehule wrote: but maybe https://github.com/dimitri/prefix can help

Hi Pavel, thanks - this works perfectly. However, it does not seem to play well with the optimizer, so I ended up with:
- select all candidates into a temp table using the prefix operator
- apply all other conditions by joining that temp table to the original ones

seriously - it is a typical solution - and it is great that PostgreSQL helps you :)

Regards Pavel

you can send enhancement proposals to Dimitri - he will be happy :)

Regards Pavel

Regards Wolfgang
Re: [GENERAL] Optimizing query?
2013/1/31 Vincent Veyron vv.li...@wanadoo.fr:

On Wednesday, January 30, 2013 at 11:08 +, wolfg...@noten5.maas-noten.de wrote:

Hi, I am trying to match items from 2 tables based on a common string. One is a big table which has one column with entries like XY123, ABC44, etc. The table has an index on that column. The second table is, typically, much smaller.

select from tab1, tab2 where tab1.code = tab2.code;

This works fine and fast. Now, as a variant, I have some entries like XY423A, XY423B, GF55A, GF55D in the big table and want them to match XY423, GF55 in the second table. Variants I have tried:

select from tab1, tab2 where tab1.code ~ (tab2.code||'($|[A-Z])');
select from tab1, tab2 where tab1.code ~ ('^'||tab2.code||'($|[A-Z])');

Have you tried the substring function?

select from tab1, tab2 where substring(tab1.code from 1 for 5) = tab2.code

both take an enormous time. In the better case that I can subset (e.g. all candidates in table 2 share the initial AX) I get back to manageable times by adding

and tab1.code ~ '^AX'

into the recipe. Actual runtime with about a million entries in tab1 and 800 entries in tab2 is about 40 seconds.

any join where result is related to some function result can be very slow, because estimation will be out and any repeated function evaluation is just expensive. You can try to use a functional index.

create index on tab2 ((substring(tab1.code from 1 for 5))

Regards Pavel Stehule

Regards Wolfgang Hamann

-- Salutations, Vincent Veyron http://marica.fr/site/demonstration Logiciel de gestion des contentieux juridiques et des sinistres d'assurance
Re: [GENERAL] Optimizing query?
On Thursday, January 31, 2013 at 09:49 +0100, Pavel Stehule wrote:

any join where result is related to some function result can be very slow, because estimation will be out and any repeated function evaluation is just expensive.

Hi Pavel,

Thank you for the correction. Since we're at it, I have a question regarding functions in a query: suppose I have a query of the form

SELECT my_function(column_1), column_2
FROM my_table
GROUP BY my_function(column_1)
ORDER BY my_function(column_1);

where my_function is a user-defined function. How many times is the function computed?

-- Salutations, Vincent Veyron http://marica.fr/site/demonstration Logiciel de gestion des contentieux juridiques et des sinistres d'assurance
Re: [GENERAL] Optimizing query?
2013/1/31 Vincent Veyron vv.li...@wanadoo.fr:

On Thursday, January 31, 2013 at 09:49 +0100, Pavel Stehule wrote:

any join where result is related to some function result can be very slow, because estimation will be out and any repeated function evaluation is just expensive.

Hi Pavel,

Thank you for the correction. Since we're at it, I have a question regarding functions in a query: suppose I have a query of the form

SELECT my_function(column_1), column_2
FROM my_table
GROUP BY my_function(column_1)
ORDER BY my_function(column_1);

where my_function is a user-defined function. How many times is the function computed?

if the function is stable or immutable, then once per row

Pavel

-- Salutations, Vincent Veyron http://marica.fr/site/demonstration Logiciel de gestion des contentieux juridiques et des sinistres d'assurance
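(An illustration, not part of the reply above: volatility is declared when the function is created. The function name and body here are hypothetical.)

-- Declaring a user-defined function IMMUTABLE:
CREATE FUNCTION my_function(val integer) RETURNS integer AS
$$
BEGIN
    RETURN val * 2;
END;
$$ LANGUAGE plpgsql IMMUTABLE;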
Re: [GENERAL] Optimizing query?
On Thursday, January 31, 2013 at 11:06 +0100, Pavel Stehule wrote:

2013/1/31 Vincent Veyron vv.li...@wanadoo.fr:

Suppose I have a query of the form

SELECT my_function(column_1), column_2
FROM my_table
GROUP BY my_function(column_1)
ORDER BY my_function(column_1);

where my_function is a user defined function. How many times is the function computed?

if function is stable or immutable, then once per row

In this post (watch for line-wrap):

http://www.postgresql.org/message-id/CAFj8pRAdYL1-hCxH
+qszqkht9ynoaoigkfx4cnc9mzutimc...@mail.gmail.com

you wrote that it is usually better not to mark SQL functions (as opposed to plpgsql functions). So should I mark SQL functions stable/immutable if I use them in a query like the one above, or is it unnecessary?

-- Salutations, Vincent Veyron http://marica.fr/site/demonstration Logiciel de gestion des contentieux juridiques et des sinistres d'assurance
Re: [GENERAL] Optimizing query?
2013/1/31 Vincent Veyron vv.li...@wanadoo.fr:

On Thursday, January 31, 2013 at 11:06 +0100, Pavel Stehule wrote:

2013/1/31 Vincent Veyron vv.li...@wanadoo.fr:

Suppose I have a query of the form

SELECT my_function(column_1), column_2
FROM my_table
GROUP BY my_function(column_1)
ORDER BY my_function(column_1);

where my_function is a user defined function. How many times is the function computed?

if function is stable or immutable, then once per row

In this post (watch for line-wrap):

http://www.postgresql.org/message-id/CAFj8pRAdYL1-hCxH
+qszqkht9ynoaoigkfx4cnc9mzutimc...@mail.gmail.com

you wrote that it is usually better not to mark SQL functions (as opposed to plpgsql functions). So should I mark SQL functions stable/immutable if I use them in a query like the one above, or is it unnecessary?

It should not be marked

Regards Pavel

-- Salutations, Vincent Veyron http://marica.fr/site/demonstration Logiciel de gestion des contentieux juridiques et des sinistres d'assurance
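(An illustration of the point above, not from the thread; the function is hypothetical. A plain SQL function left unmarked can be inlined into the calling query, so the planner sees the underlying expression directly and no volatility label is needed.)

CREATE FUNCTION short_code(t text) RETURNS text AS
$$ SELECT substring(t from 1 for 5) $$
LANGUAGE sql;

-- After inlining, this behaves as if it had been written
-- WHERE substring(code from 1 for 5) = 'XY423':
SELECT * FROM tab1 WHERE short_code(code) = 'XY423';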
Re: [GENERAL] Optimizing query?
Pavel Stehule wrote:

Hi, I am trying to match items from 2 tables based on a common string. One is a big table which has one column with entries like XY123, ABC44, etc. The table has an index on that column. The second table is, typically, much smaller.

select from tab1, tab2 where tab1.code = tab2.code;

This works fine and fast. Now, as a variant, I have some entries like XY423A, XY423B, GF55A, GF55D in the big table and want them to match XY423, GF55 in the second table. Variants I have tried:

select from tab1, tab2 where tab1.code ~ (tab2.code||'($|[A-Z])');
select from tab1, tab2 where tab1.code ~ ('^'||tab2.code||'($|[A-Z])');

Have you tried the substring function?

select from tab1, tab2 where substring(tab1.code from 1 for 5) = tab2.code

Hi Pavel, it was just by chance that a fixed-size substring would match the data at hand. It is more common to have a digit/letter (or vice versa) boundary or a hyphen there.

both take an enormous time. In the better case that I can subset (e.g. all candidates in table 2 share the initial AX) I get back to manageable times by adding

and tab1.code ~ '^AX'

into the recipe. Actual runtime with about a million entries in tab1 and 800 entries in tab2 is about 40 seconds.

any join where result is related to some function result can be very slow, because estimation will be out and any repeated function evaluation is just expensive.

I see the problem: since the ~ operator with a non-constant pattern is evidently recompiling the pattern every time, I wonder whether it would be possible to invent a prefix-match operator that approaches the performance of string equality. I noted in the past (not sure whether anything has changed in regex matching) that a constant leading part of a regex would improve performance, i.e. use an index scan to select possible candidates.

You can try to use a functional index.

create index on tab2 ((substring(tab1.code from 1 for 5))

What kind of trick is that - mixing two tables into a functional index? What would the exact syntax be for that?

Regards Wolfgang Hamann
Re: [GENERAL] Optimizing query?
Hello

2013/1/31 haman...@t-online.de:

Pavel Stehule wrote:

Hi, I am trying to match items from 2 tables based on a common string. One is a big table which has one column with entries like XY123, ABC44, etc. The table has an index on that column. The second table is, typically, much smaller.

select from tab1, tab2 where tab1.code = tab2.code;

This works fine and fast. Now, as a variant, I have some entries like XY423A, XY423B, GF55A, GF55D in the big table and want them to match XY423, GF55 in the second table. Variants I have tried:

select from tab1, tab2 where tab1.code ~ (tab2.code||'($|[A-Z])');
select from tab1, tab2 where tab1.code ~ ('^'||tab2.code||'($|[A-Z])');

Have you tried the substring function?

select from tab1, tab2 where substring(tab1.code from 1 for 5) = tab2.code

Hi Pavel, it was just by chance that a fixed-size substring would match the data at hand. It is more common to have a digit/letter (or vice versa) boundary or a hyphen there.

both take an enormous time. In the better case that I can subset (e.g. all candidates in table 2 share the initial AX) I get back to manageable times by adding

and tab1.code ~ '^AX'

into the recipe. Actual runtime with about a million entries in tab1 and 800 entries in tab2 is about 40 seconds.

any join where result is related to some function result can be very slow, because estimation will be out and any repeated function evaluation is just expensive.

I see the problem: since the ~ operator with a non-constant pattern is evidently recompiling the pattern every time, I wonder whether it would be possible to invent a prefix-match operator that approaches the performance of string equality. I noted in the past (not sure whether anything has changed in regex matching) that a constant leading part of a regex would improve performance, i.e. use an index scan to select possible candidates.

You can try to use a functional index.

create index on tab2 ((substring(tab1.code from 1 for 5))

What kind of trick is that - mixing two tables into a functional index?

it is not possible - you can do an auxiliary table and create indexes over this table, but maybe https://github.com/dimitri/prefix can help

Regards Pavel

What would the exact syntax be for that?

Regards Wolfgang Hamann
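(A minimal sketch of the auxiliary-table idea mentioned above, not from the thread itself. It assumes the thread's table names and a fixed 5-character prefix, which, as noted earlier, only fits some of the data.)

-- Precompute the prefix on the big table and index it:
CREATE TABLE tab1_aux AS
SELECT code, substring(code from 1 for 5) AS code5
FROM   tab1;
CREATE INDEX tab1_aux_code5_idx ON tab1_aux (code5);

-- Then the match becomes a plain equality join:
SELECT a.code, t2.code
FROM   tab2 t2
JOIN   tab1_aux a ON a.code5 = t2.code;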
[GENERAL] Optimizing query?
Hi, I am trying to match items from 2 tables based on a common string. One is a big table which has one column with entries like XY123, ABC44, etc. The table has an index on that column. The second table is, typically, much smaller.

select from tab1, tab2 where tab1.code = tab2.code;

This works fine and fast. Now, as a variant, I have some entries like XY423A, XY423B, GF55A, GF55D in the big table and want them to match XY423, GF55 in the second table. Variants I have tried:

select from tab1, tab2 where tab1.code ~ (tab2.code||'($|[A-Z])');
select from tab1, tab2 where tab1.code ~ ('^'||tab2.code||'($|[A-Z])');

both take an enormous time. In the better case that I can subset (e.g. all candidates in table 2 share the initial AX) I get back to manageable times by adding

and tab1.code ~ '^AX'

into the recipe. Actual runtime with about a million entries in tab1 and 800 entries in tab2 is about 40 seconds.

Regards Wolfgang Hamann
Re: [GENERAL] Optimizing query?
On Wednesday, January 30, 2013 at 11:08 +, wolfg...@noten5.maas-noten.de wrote:

Hi, I am trying to match items from 2 tables based on a common string. One is a big table which has one column with entries like XY123, ABC44, etc. The table has an index on that column. The second table is, typically, much smaller.

select from tab1, tab2 where tab1.code = tab2.code;

This works fine and fast. Now, as a variant, I have some entries like XY423A, XY423B, GF55A, GF55D in the big table and want them to match XY423, GF55 in the second table. Variants I have tried:

select from tab1, tab2 where tab1.code ~ (tab2.code||'($|[A-Z])');
select from tab1, tab2 where tab1.code ~ ('^'||tab2.code||'($|[A-Z])');

Have you tried the substring function?

select from tab1, tab2 where substring(tab1.code from 1 for 5) = tab2.code

both take an enormous time. In the better case that I can subset (e.g. all candidates in table 2 share the initial AX) I get back to manageable times by adding

and tab1.code ~ '^AX'

into the recipe. Actual runtime with about a million entries in tab1 and 800 entries in tab2 is about 40 seconds.

Regards Wolfgang Hamann

-- Salutations, Vincent Veyron http://marica.fr/site/demonstration Logiciel de gestion des contentieux juridiques et des sinistres d'assurance
[GENERAL] Optimizing query
Hello. I have a query which works a bit slowly. It's run on a desktop computer: AMD Athlon X2 2GHz, Win XP SP2, 1GB RAM. Postgres 8.4.5 with some changes in config:

shared_buffers = 200MB        # min 128kB  # (change requires restart)
temp_buffers = 8MB            # min 800kB
work_mem = 12MB               # min 64kB
maintenance_work_mem = 32MB   # min 1MB

Indexes in table NumeryA:
  NTA, NKA, KodBłędu, Plik - primary key
  DataPliku, KodBłędu - index dp_kb
  NKA, NTA - index nka_nta

Indexes in table Rejestr stacji do naprawy:
  LP - primary key
  Numer kierunkowy, substr("Numer stacji"::text, 1, 5) - index 3
  Data weryfikacji - index Data weryfikacji_1
  Numer kierunkowy, Numer stacji, Data odrzucania bilingu z Serat - index Powtórzenia

Query is:
------------------------------------------------------------
SELECT A."NKA", A."NTA",
       Min("PołączeniaMin") || ',' || Max("PołączeniaMax") AS "Biling",
       Sum("Ile")::text AS "Ilość CDR",
       R."LP"::text AS "Sprawa",
       (R."Osoba weryfikująca") AS "Osoba",
       to_char(min("Wartość"),'FM990D00') AS "Wartość po kontroli",
       max(R."Kontrola po naprawie w Serat - CDR")::text AS "CDR po kontroli",
       min(A."KodBłędu")::text AS "KodBłędu",
       Max(to_char(R."Data kontroli",'-MM-DD')) AS "Ostatnia Kontrola",
       max("Skutek wprowadzenia błednej ewidencji w Serat") as "Skutek",
       sum(www.a_biling_070("NRB"))::text,
       sum(www.a_biling_darmowy("NRB"))::text
FROM (SELECT "NumeryA".* FROM ONLY "NumeryA"
      WHERE "DataPliku" >= current_date-4*30 and "KodBłędu"=74::text) AS A
LEFT JOIN (SELECT * FROM "Rejestr stacji do naprawy"
           WHERE "Data weryfikacji" >= current_date-4*30) AS R
       ON A."NKA" = R."Numer kierunkowy"
      and substr(A."NTA",1,5) = substr(R."Numer stacji",1,5)
      and A."NTA" like R."Numer stacji"
GROUP BY R."Osoba weryfikująca", R."LP", A."NKA", A."NTA"
ORDER BY Sum("Ile") DESC
LIMIT 5000
------------------------------------------------------------

Explain analyze:
------------------------------------------------------------
Limit  (cost=30999.84..31012.34 rows=5000 width=149) (actual time=7448.483..7480.094 rows=5000 loops=1)
  ->  Sort  (cost=30999.84..31073.19 rows=29341 width=149) (actual time=7448.475..7459.663 rows=5000 loops=1)
        Sort Key: (sum(NumeryA.Ile))
        Sort Method: top-N heapsort  Memory: 1488kB
        ->  GroupAggregate  (cost=11093.77..29050.46 rows=29341 width=149) (actual time=4700.654..7377.762 rows=14225 loops=1)
              ->  Sort  (cost=11093.77..11167.12 rows=29341 width=149) (actual time=4699.587..4812.776 rows=46732 loops=1)
                    Sort Key: Rejestr stacji do naprawy.Osoba weryfikująca, Rejestr stacji do naprawy.LP, NumeryA.NKA, NumeryA.NTA
                    Sort Method: quicksort  Memory: 9856kB
                    ->  Merge Left Join  (cost=8297.99..8916.58 rows=29341 width=149) (actual time=2931.449..3735.876 rows=46732 loops=1)
                          Merge Cond: (((NumeryA.NKA)::text = (Rejestr stacji do naprawy.Numer kierunkowy)::text) AND ((substr((NumeryA.NTA)::text, 1, 5)) = (substr((Rejestr stacji do naprawy.Numer stacji)::text, 1, 5))))
                          Join Filter: ((NumeryA.NTA)::text ~~ (Rejestr stacji do naprawy.Numer stacji)::text)
                          ->  Sort  (cost=6062.18..6135.53 rows=29341 width=95) (actual time=2131.297..2241.303 rows=46694 loops=1)
                                Sort Key: NumeryA.NKA, (substr((NumeryA.NTA)::text, 1, 5))
                                Sort Method: quicksort  Memory: 7327kB
                                ->  Bitmap Heap Scan on NumeryA  (cost=1502.09..3884.98 rows=29341 width=95) (actual time=282.570..1215.355 rows=46694 loops=1)
                                      Recheck Cond: ((DataPliku >= (('now'::text)::date - 120)) AND ((KodBłędu)::text = '74'::text))
                                      ->  Bitmap Index Scan on dp_kb  (cost=0.00..1494.75 rows=29341 width=0) (actual time=281.991..281.991 rows=46694 loops=1)
                                            Index Cond: ((DataPliku >= (('now'::text)::date - 120)) AND ((KodBłędu)::text = '74'::text))
                          ->  Sort  (cost=2235.82..2285.03 rows=19684 width=64) (actual time=800.101..922.463 rows=54902 loops=1)
                                Sort Key: Rejestr stacji do naprawy.Numer kierunkowy, (substr((Rejestr stacji do naprawy.Numer stacji)::text, 1, 5))
                                Sort Method: quicksort  Memory: 3105kB
                                ->  Seq Scan on Rejestr stacji do naprawy  (cost=0.00..831.88 rows=19684 width=64) (actual time=2.118..361.463 rows=19529 loops=1)
                                      Filter: (Data weryfikacji >= (('now'::text)::date - 120))
Total runtime: 7495.697 ms
------------------------------------------------------------

How to make it faster?

pasman
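(Not a reply from the thread; an illustrative sketch only, reusing the names from the message above. One generic thing to try with a join like this is an expression index on the big table that matches the join key, so the planner has the option of reading NumeryA in join order instead of sorting it. Whether it actually helps here would have to be verified with EXPLAIN ANALYZE.)

CREATE INDEX numerya_nka_nta5_idx
    ON "NumeryA" ("NKA", substr("NTA"::text, 1, 5));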
[GENERAL] Optimizing query: select ... where id = 4 and md5(...) = '...'
Hello,

in my application I'm trying to authenticate users against a table called users. The integer column id should match, but also an md5 hash of the password column (salted with a string) should match. My authentication function (written in C, using libpq) should return a username (a varchar(200) field).

I wonder what is faster: fetching 2 columns - the username and the md5 result - and then comparing the md5 string against the argument in my app, like here:

punbb=> select username, md5('deadbeef' || password) from users where id = 4;
 username |               md5
----------+----------------------------------
 Vasja    | dcde745cc304742e26d62e683a9ecb0a
(1 row)

punbb=> explain select username, md5('deadbeef' || password) from users where id = 4;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Index Scan using users_pkey on users  (cost=0.00..5.95 rows=1 width=156)
   Index Cond: (id = 4)
(2 rows)

Or letting the database do this comparison for me:

punbb=> select username from users where id = 4 and md5('deadbeef' || password) = 'dcde745cc304742e26d62e683a9ecb0a';
 username
----------
 Vasja
(1 row)

punbb=> explain select username from users where id = 4 and md5('deadbeef' || password) = 'dcde745cc304742e26d62e683a9ecb0a';
                                 QUERY PLAN
----------------------------------------------------------------------------
 Index Scan using users_pkey on users  (cost=0.00..5.95 rows=1 width=118)
   Index Cond: (id = 4)
   Filter: (md5(('deadbeef'::text || (password)::text)) = 'dcde745cc304742e26d62e683a9ecb0a'::text)
(3 rows)

I've prepared a test case with the code listed at the bottom and have run it 1000 times, but am still unsure:

$ time perl -e 'for (1..1000) {system("./fetch-user", "APP_QUERY") and die $!}'
username: Vasja
5.038u 5.734s 0:26.29 40.9%  0+0k 0+4io 0pf+0w
$ time perl -e 'for (1..1000) {system("./fetch-user", "DB_QUERY") and die $!}'
username: Vasja
4.757u 5.890s 0:26.52 40.1%  0+0k 0+8io 0pf+0w

How does one profile PostgreSQL queries in general?

Thank you
Alex

PS: Using PostgreSQL 8.1.0 (from packages) on OpenBSD/386 -current

PPS: My test program, call with APP_QUERY or DB_QUERY:

#include <err.h>
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

#define DB_CONN_STR "host=/var/www/tmp user=punbb dbname=punbb"

#define APP_QUERY "select username, md5('deadbeef' || password) " \
                  "from users where id = $1"
#define DB_QUERY  "select username from users where id = $1 and " \
                  "md5('deadbeef' || password) = $2"

int
main(int argc, char *argv[])
{
    PGconn      *conn;
    PGresult    *res;
    const char  *query;
    const char  *args[2];
    unsigned     nargs;
    char         username[201];

    if (! strcmp(argv[1], "APP_QUERY")) {
        query = APP_QUERY;
        nargs = 1;
    } else if (! strcmp(argv[1], "DB_QUERY")) {
        query = DB_QUERY;
        nargs = 2;
    } else
        errx(1, "wrong usage: supply APP_QUERY or DB_QUERY");

    if ((conn = PQconnectdb(DB_CONN_STR)) == NULL)
        err(1, "Connect failed: out of memory");
    if (PQstatus(conn) != CONNECTION_OK)
        err(1, "Connect failed: %s", PQerrorMessage(conn));

    if ((res = PQprepare(conn, "sql_fetch_username", query, nargs, NULL)) == NULL)
        err(1, "Preparing '%s' failed: out of memory", query);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        err(1, "Preparing statement failed: %s", PQerrorMessage(conn));
    PQclear(res);

    args[0] = "4";
    args[1] = "dcde745cc304742e26d62e683a9ecb0a";
    if ((res = PQexecPrepared(conn, "sql_fetch_username", nargs, args,
        NULL, NULL, 0)) == NULL)
        err(1, "Executing statement '%s' failed: out of memory", query);
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
        err(1, "Executing statement '%s' failed: %s", query,
            PQerrorMessage(conn));

    /* read the result before clearing it */
    if (nargs == 1)
        (void) strcmp(args[1], PQgetvalue(res, 0, 1));
    fprintf(stderr, "username: %s\n", PQgetvalue(res, 0, 0));
    PQclear(res);
    PQfinish(conn);
    return 0;
}

-- http://preferans.de
Re: [GENERAL] Optimizing query: select ... where id = 4 and md5(...) = '...'
On Mon, Jul 03, 2006 at 03:13:15PM +0200, Alexander Farber wrote:

Hello, in my application I'm trying to authenticate users against a table called users. The integer column id should match, but also an md5 hash of the password column (salted with a string) should match. My authentication function (written in C, using libpq) should return a username (a varchar(200) field). I wonder what is faster: fetching 2 columns - the username and the md5 result - and then comparing the md5 string against the argument in my app, like here:

I don't know about speed, but I think the choice should really be based on whether you want to be able to tell the difference between unknown user and bad password. You can still do the comparison in the database by doing something like:

select username, md5('deadbeef' || password) = 'blah' from users where id = 4;

So the second field will be true or false.

In any case, the testing you're doing is bogus, since you're probably testing backend startup time as well, which is probably longer than the query you're running anyway. Even then, 5ms for the whole process is not to be sneezed at.

Have a nice day,

-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
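(An illustration, not part of the reply above: one common way to time the statement itself, without client or backend startup, is to run it under EXPLAIN ANALYZE, or to use psql's \timing.)

EXPLAIN ANALYZE
SELECT username
FROM   users
WHERE  id = 4
  AND  md5('deadbeef' || password) = 'dcde745cc304742e26d62e683a9ecb0a';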
Re: [GENERAL] Optimizing query: select ... where id = 4 and md5(...) = '...'
Yes, you're probably right. I'm just trying to ensure that the slow md5() function isn't called for every row in the table. If that's not the case, then the other tiny speed differences are not that important for me. Your query works too, thanks for the hint.

punbb=> select username, md5('deadbeef' || password) = 'blah' from users where id = 4;
 username | ?column?
----------+----------
 Vasja    | f
(1 row)

punbb=> explain select username, md5('deadbeef' || password) = 'blah' from users where id = 4;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Index Scan using users_pkey on users  (cost=0.00..5.95 rows=1 width=156)
   Index Cond: (id = 4)
(2 rows)

Regards Alex

On 7/3/06, Martijn van Oosterhout kleptog@svana.org wrote:

On Mon, Jul 03, 2006 at 03:13:15PM +0200, Alexander Farber wrote: in my application I'm trying to authenticate users against a table called users. The integer column id should match, but also an md5 hash of the password column (salted with a string) should match. My authentication function (written in C, using libpq) should return a username (a varchar(200) field). I wonder what is faster: fetching 2 columns - the username and the md5 result - and then comparing the md5 string against the argument in my app, like here:

I don't know about speed, but I think the choice should really be based on whether you want to be able to tell the difference between unknown user and bad password. You can still do the comparison in the database by doing something like:

select username, md5('deadbeef' || password) = 'blah' from users where id = 4;

So the second field will be true or false.

In any case, the testing you're doing is bogus, since you're probably testing backend startup time as well, which is probably longer than the query you're running anyway. Even then, 5ms for the whole process is not to be sneezed at.

-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.

-- http://preferans.de
Re: [GENERAL] Optimizing query: select ... where id = 4 and md5(...)
Alexander Farber wrote:

I wonder what is faster: fetching 2 columns - the username and the md5 result - and then comparing the md5 string against the argument in my app, like here:

punbb=> select username, md5('deadbeef' || password) from users where id = 4;
 username |               md5
----------+----------------------------------
 Vasja    | dcde745cc304742e26d62e683a9ecb0a
(1 row)

Why don't you store the hashed value instead?

-- Alban Hertroys [EMAIL PROTECTED] magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Re: [GENERAL] Optimizing query: select ... where id = 4 and md5(...) = '...'
Hi Alban,

On 7/3/06, Alban Hertroys [EMAIL PROTECTED] wrote:

Alexander Farber wrote:

punbb=> select username, md5('deadbeef' || password) from users where id = 4;
 username |               md5
----------+----------------------------------
 Vasja    | dcde745cc304742e26d62e683a9ecb0a
(1 row)

Why don't you store the hashed value instead?

actually it's not me - it's the punBB forum software ( http://docs.punbb.org/dev.html#dbtables ) against which I want my libpq program to authenticate. An additional reason is that I could change my salt string if I wanted and the users could still authenticate without changing their passwords.

Regards Alex

-- http://preferans.de
[GENERAL] Optimizing query
I have a problem creating a usable index for the following simple query:

SELECT * FROM my.table WHERE node = '10' ORDER BY id DESC LIMIT 1

id is a serial, so the query is to find the latest entry for a given node, and id is the primary key. The table contains around 1 million records and the query takes around 2 seconds. I have tried to make an index on node and also on both id and node, but it doesn't lower the query time.

What am I doing wrong?

Thanks, Poul
Re: [GENERAL] Optimizing query
Poul Møller Hansen wrote:

I have a problem creating a usable index for the following simple query:

SELECT * FROM my.table WHERE node = '10' ORDER BY id DESC LIMIT 1

id is a serial, so the query is to find the latest entry for a given node, and id is the primary key.

You're not necessarily getting the latest entry, just the one with the highest id. Sequences guarantee uniqueness, but if you have concurrent inserts, not necessarily ordering.

The table contains around 1 million records and the query takes around 2 seconds.

Well, you don't say how many different values for node there are, nor how many rows you would expect where node='10'.

I have tried to make an index on node and also on both id and node, but it doesn't lower the query time.

Difficult to say what's happening since you don't supply any EXPLAIN ANALYSE output. However, if you have an index on (node,id) you might want to try:

SELECT ... ORDER BY node DESC, id DESC LIMIT 1;

That way the ORDER BY part clearly tells the planner that a reverse-order on your index will be useful.

-- Richard Huxton Archonet Ltd
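(A sketch of the two-column index this advice relies on, not from the original reply; node, id and my.table are the placeholder names used in the thread.)

-- "table" is quoted only because the thread's placeholder name is a reserved word.
CREATE INDEX node_id_idx ON my."table" (node, id);

SELECT * FROM my."table"
WHERE node = '10'
ORDER BY node DESC, id DESC
LIMIT 1;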
Re: [GENERAL] Optimizing query
I have a problem creating a usable index for the following simple query:

SELECT * FROM my.table WHERE node = '10' ORDER BY id DESC LIMIT 1

id is a serial, so the query is to find the latest entry for a given node, and id is the primary key.

You're not necessarily getting the latest entry, just the one with the highest id. Sequences guarantee uniqueness, but if you have concurrent inserts, not necessarily ordering.

Right you are, but I have no concurrent inserts from the same node.

Difficult to say what's happening since you don't supply any EXPLAIN ANALYSE output. However, if you have an index on (node,id) you might want to try:

SELECT ... ORDER BY node DESC, id DESC LIMIT 1;

That way the ORDER BY part clearly tells the planner that a reverse-order on your index will be useful.

Thanks a lot, that did the trick!

explain analyze SELECT * FROM my.table WHERE node = '10' ORDER BY id DESC LIMIT 1

                                 QUERY PLAN
----------------------------------------------------------------------------
 Limit  (cost=0.00..764.00 rows=1 width=246) (actual time=1874.890..1874.896 rows=1 loops=1)
   ->  Index Scan Backward using table_pkey on table  (cost=0.00..4347913.94 rows=5691 width=246) (actual time=1874.867..1874.867 rows=1 loops=1)
         Filter: ((node)::text = '10'::text)
 Total runtime: 1875.111 ms

explain analyze SELECT * FROM my.table WHERE node = '10' ORDER BY node, id DESC LIMIT 1

                                 QUERY PLAN
----------------------------------------------------------------------------
 Limit  (cost=22638.36..22638.36 rows=1 width=246) (actual time=3.001..3.007 rows=1 loops=1)
   ->  Sort  (cost=22638.36..22652.59 rows=5691 width=246) (actual time=2.984..2.984 rows=1 loops=1)
         Sort Key: node, id
         ->  Index Scan using node_date on table  (cost=0.00..21898.65 rows=5691 width=246) (actual time=0.077..1.852 rows=62 loops=1)
               Index Cond: ((node)::text = '10'::text)
 Total runtime: 3.127 ms

Poul
Re: [GENERAL] Optimizing query
On Mon, 15 Aug 2005, Poul Møller Hansen wrote:

I have a problem creating a usable index for the following simple query:

SELECT * FROM my.table WHERE node = '10' ORDER BY id DESC LIMIT 1

id is a serial, so the query is to find the latest entry for a given node, and id is the primary key. The table contains around 1 million records and the query takes around 2 seconds. I have tried to make an index on node and also on both id and node, but it doesn't lower the query time.

Try to make an index on (node,id) and write the query as:

SELECT * FROM my.table WHERE node = '10' ORDER BY node desc, id desc LIMIT 1;

Then I'm pretty sure it will use that index.

-- /Dennis Björklund
Re: [GENERAL] Optimizing query
Poul Møller Hansen [EMAIL PROTECTED] writes:

explain analyze SELECT * FROM my.table WHERE node = '10' ORDER BY node, id DESC LIMIT 1

                                 QUERY PLAN
----------------------------------------------------------------------------
 Limit  (cost=22638.36..22638.36 rows=1 width=246) (actual time=3.001..3.007 rows=1 loops=1)
   ->  Sort  (cost=22638.36..22652.59 rows=5691 width=246) (actual time=2.984..2.984 rows=1 loops=1)
         Sort Key: node, id
         ->  Index Scan using node_date on table  (cost=0.00..21898.65 rows=5691 width=246) (actual time=0.077..1.852 rows=62 loops=1)
               Index Cond: ((node)::text = '10'::text)
 Total runtime: 3.127 ms

You're not there yet: you want what Richard said, namely

explain analyze SELECT * FROM my.table WHERE node = '10' ORDER BY node DESC, id DESC LIMIT 1

There shouldn't be any Sort in the plan, just the indexscan and Limit. The plan above is going to suck if there are a lot of rows with node = '10'.

regards, tom lane
Re: [GENERAL] Optimizing query
You're not there yet: you want what Richard said, namely

I realized that it wasn't optimal for all nodes, namely those with a lot of rows. So you are absolutely right; I followed the suggestion of Richard and it works perfectly. Thank you all, I learned a lesson about indexes today...

Poul
[GENERAL] Optimizing query
Hello!

I have some trouble getting good results from my query. Here is the structure:

stat_views
  id      | integer
  id_zone | integer
  created | timestamp

I have a btree index on created and also on id, and there are 1633832 records in that table.

First of all I have to manually set seq_scan to OFF because I always get a seq scan. When I set it to off my explain shows:

explain SELECT count(*) as views FROM stat_views WHERE id = 12;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=122734.86..122734.86 rows=1 width=0)
   ->  Index Scan using stat_views_id_idx on stat_views  (cost=0.00..122632.60 rows=40904 width=0)
         Index Cond: (id = 12)

But what I need is to count views for some day, so I use

explain SELECT count(*) as views FROM stat_views WHERE date_part('day', created) = 18;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=100101618.08..100101618.08 rows=1 width=0)
   ->  Seq Scan on stat_views  (cost=1.00..100101565.62 rows=20984 width=0)
         Filter: (date_part('day'::text, created) = 18::double precision)

How can I make this use the index and speed up the query? Now it takes about 12 seconds.

-- Best regards, Uros mailto:[EMAIL PROTECTED]
Re: [GENERAL] Optimizing query
Do something like:

CREATE OR REPLACE FUNCTION my_date_part(timestamp) RETURNS double precision AS '
DECLARE
    mydate ALIAS FOR $1;
BEGIN
    return date_part( ''day'', mydate );
END;
' LANGUAGE 'plpgsql' IMMUTABLE;

create index idx_tmp on stat_views( my_date_part( created) );

or add an extra date_part column to your table which pre-calculates date_part('day', created) and put an index on this.

Cheers
Matthew

----- Original Message -----
From: Uros
To: [EMAIL PROTECTED]
Sent: Wednesday, November 19, 2003 10:41 AM
Subject: [GENERAL] Optimizing query

Hello! I have some trouble getting good results from my query. Here is the structure:

stat_views
  id      | integer
  id_zone | integer
  created | timestamp

I have a btree index on created and also on id, and there are 1633832 records in that table.

First of all I have to manually set seq_scan to OFF because I always get a seq scan. When I set it to off my explain shows:

explain SELECT count(*) as views FROM stat_views WHERE id = 12;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=122734.86..122734.86 rows=1 width=0)
   ->  Index Scan using stat_views_id_idx on stat_views  (cost=0.00..122632.60 rows=40904 width=0)
         Index Cond: (id = 12)

But what I need is to count views for some day, so I use

explain SELECT count(*) as views FROM stat_views WHERE date_part('day', created) = 18;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=100101618.08..100101618.08 rows=1 width=0)
   ->  Seq Scan on stat_views  (cost=1.00..100101565.62 rows=20984 width=0)
         Filter: (date_part('day'::text, created) = 18::double precision)

How can I make this use the index and speed up the query? Now it takes about 12 seconds.

-- Best regards, Uros
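(A usage note, not from the original reply: the idx_tmp index above is presumably only considered when the query calls the same wrapper function, so the counting query would be rewritten to use it.)

SELECT count(*) AS views FROM stat_views WHERE my_date_part(created) = 18;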
Re: [GENERAL] Optimizing query
Uros writes:

explain SELECT count(*) as views FROM stat_views WHERE date_part('day', created) = 18;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=100101618.08..100101618.08 rows=1 width=0)
   ->  Seq Scan on stat_views  (cost=1.00..100101565.62 rows=20984 width=0)
         Filter: (date_part('day'::text, created) = 18::double precision)

Create an index on date_part('day', created). In 7.3 and earlier you need to create a wrapper function and index that; in 7.4 you can index arbitrary expressions directly. The documentation contains more information about that.

-- Peter Eisentraut [EMAIL PROTECTED]
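(An illustration of the 7.4-style expression index mentioned above, not part of the original reply; the index name is made up.)

CREATE INDEX stat_views_day_idx ON stat_views ((date_part('day', created)));

-- With the expression index in place, the same query can use it:
SELECT count(*) AS views FROM stat_views WHERE date_part('day', created) = 18;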
Re: [GENERAL] Optimizing query
Uros wrote:

Hello! I have some trouble getting good results from my query. Here is the structure:

stat_views
  id      | integer
  id_zone | integer
  created | timestamp

I have a btree index on created and also on id, and there are 1633832 records in that table.

First of all I have to manually set seq_scan to OFF because I always get a seq scan. When I set it to off my explain shows:

explain SELECT count(*) as views FROM stat_views WHERE id = 12;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=122734.86..122734.86 rows=1 width=0)
   ->  Index Scan using stat_views_id_idx on stat_views  (cost=0.00..122632.60 rows=40904 width=0)
         Index Cond: (id = 12)

But what I need is to count views for some day, so I use

explain SELECT count(*) as views FROM stat_views WHERE date_part('day', created) = 18;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=100101618.08..100101618.08 rows=1 width=0)
   ->  Seq Scan on stat_views  (cost=1.00..100101565.62 rows=20984 width=0)
         Filter: (date_part('day'::text, created) = 18::double precision)

How can I make this use the index and speed up the query? Now it takes about 12 seconds.

Can you post explain analyze for the same?

Shridhar
Re: [GENERAL] Optimizing query
Hello Shridhar,

I used Matthew's solution and it works. The query takes only half a second. I didn't know that I can index a function too.

Thanks
Uros

Wednesday, November 19, 2003, 1:23:26 PM, you wrote:

SD> Uros wrote:

Hello! I have some trouble getting good results from my query. Here is the structure:

stat_views
  id      | integer
  id_zone | integer
  created | timestamp

I have a btree index on created and also on id, and there are 1633832 records in that table.

First of all I have to manually set seq_scan to OFF because I always get a seq scan. When I set it to off my explain shows:

explain SELECT count(*) as views FROM stat_views WHERE id = 12;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=122734.86..122734.86 rows=1 width=0)
   ->  Index Scan using stat_views_id_idx on stat_views  (cost=0.00..122632.60 rows=40904 width=0)
         Index Cond: (id = 12)

But what I need is to count views for some day, so I use

explain SELECT count(*) as views FROM stat_views WHERE date_part('day', created) = 18;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=100101618.08..100101618.08 rows=1 width=0)
   ->  Seq Scan on stat_views  (cost=1.00..100101565.62 rows=20984 width=0)
         Filter: (date_part('day'::text, created) = 18::double precision)

How can I make this use the index and speed up the query? Now it takes about 12 seconds.

SD> Can you post explain analyze for the same?

SD> Shridhar