Re: [GENERAL] column names query
>> On Thu, Sep 7, 2017 at 3:28 PM, wrote:
>> > Example query
>> >   select a.col1, case when a.col2 > 0 then a.col3 else b.xcol1 end as mycol3
>> >   from a left join b on
>> > Expected response
>> >   col1 mycol3
>>
>> This may be overkill, but works:
>>
>> postgres=# \copy (select 1 as foo, 2 as bar limit 0) to stdout with (format csv, header, delimiter ' ')
>> foo bar

Hi Manuel,

this works perfectly, many thanks

Wolfgang

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] column names query
>> SELECT table_name, column_name
>> FROM information_schema.columns
>> WHERE table_name = 'your_name';
>>
>> ----- Original Message -----
>> From: hamann w <haman...@t-online.de>
>> To: pgsql-general@postgresql.org
>> Sent: Thu, 07 Sep 2017 07:18:12 - (UTC)
>> Subject: [GENERAL] column names query
>>
>> > Hi,
>> > is there a simple way to retrieve column names from a query - basically
>> > the way psql adds column headings when I do a select?
>> > Best regards
>> > Wolfgang Hamann

Hi,

I am already using something like this (but using the pg_attribute table) to
retrieve the current table layout for static tables. At the moment I am
looking for a solution for ad hoc queries.

Example query
  select a.col1, case when a.col2 > 0 then a.col3 else b.xcol1 end as mycol3
  from a left join b on

Expected response
  col1 mycol3

Obviously, I could save the result into a temp table and retrieve the
table's heading.

Best regards
Wolfgang Hamann
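The temp-table idea mentioned above can be scripted; a minimal sketch (the join condition `a.id = b.id` is only illustrative, since the original query is truncated): materialize zero rows of the query, then read the column names back from information_schema.

```sql
-- Create an empty table with exactly the query's output columns:
CREATE TEMP TABLE heading AS
  SELECT a.col1,
         CASE WHEN a.col2 > 0 THEN a.col3 ELSE b.xcol1 END AS mycol3
  FROM a LEFT JOIN b ON a.id = b.id
  LIMIT 0;

-- Then retrieve the column names in output order:
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'heading'
ORDER BY ordinal_position;
```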
Re: [GENERAL] column names query
>> On Thu, Sep 7, 2017 at 9:18 AM, wrote:
>> > is there a simple way to retrieve column names from a query - basically
>> > the way psql adds column headings when I do a select?
>>
>> How do you do the query? I mean, JDBC, Perl? After all psql is just a
>> C program doing a query using libpq and can do it; we may provide some
>> useful info if you show yours first.
>>
>> Francisco Olarte.

Hi,

I imagine this situation: I start working on a query... when I am happy with
the result, I see a table (with headers) in psql. Now I can do

  \copy (select my query here) to /tmp/testfile1

I would like to do something like

  \copy (select heading(select my query here)) to /tmp/heading_testfile1

Best regards
Wolfgang Hamann
[GENERAL] column names query
Hi,

is there a simple way to retrieve column names from a query - basically the
way psql adds column headings when I do a select?

Best regards
Wolfgang Hamann
Re: [GENERAL] Table create time
>> On 31/08/2017 09:56, haman...@t-online.de wrote:
>> > Hi,
>> >
>> > is there a way to add a table create (and perhaps schema modify)
>> > timestamp to the system? I do occasionally create semi-temporary tables
>> > (meant to live until a problem is solved, i.e. longer than a session)
>> > with conveniently short names.
>>
>> In FreeBSD you'd do something like this to find the file creation time:
>>   ls -lU /data/PG_9.3_201306121/16425/12344
>> where 12344 is the filenode of the relation in question. In ext4 you may
>> do this, albeit with more difficulty.

Hello Achilleas,

many thanks for responding. There are two problems:
a) accessing the filesystem will likely require some extra effort
   (e.g. installing an untrusted programming language)
b) a dump/restore will modify the dates

Best regards
Wolfgang Hamann
[GENERAL] Table create time
Hi,

is there a way to add a table create (and perhaps schema modify) timestamp
to the system? I do occasionally create semi-temporary tables (meant to live
until a problem is solved, i.e. longer than a session) with conveniently
short names.

Also, is there a simple query to identify tables without a table comment?
(so a weekly cron could remind me of tables that are already a few days old
but have no explanatory comment)

I am running PG 9.3

Best regards
Wolfgang Hamann
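For the second question, a sketch of the "tables without a comment" query
(this assumes the tables live in the default public schema; adjust the
filter as needed):

```sql
-- List ordinary tables in 'public' that have no table comment.
SELECT c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'public'
  AND obj_description(c.oid, 'pg_class') IS NULL;
```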
Re: [GENERAL] How pg_dump works
>> Hi all,
>> I need to pg_dump a database to another server.
>> The particularity is that the database is bigger than the remaining space
>> on disk. Obviously, this is not a problem because I'm dumping to another
>> host, but I need to know whether the procedure streams data to the remote
>> host or the data itself is packed up in temporary file(s) that are sent
>> to the remote host. The latter would be a problem, because there are good
>> chances I'll have my disk filled up...
>>
>> I've not found details on this in the documentation.
>>
>> Thanks
>> Moreno.

Hi Moreno,

it is one big stream. I am sending a compressed pg_dump and can actually use

  zcat dumpfile.gz | psql mydatabase

on the other end.

Regards
Wolfgang
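The "one big stream" behaviour means the whole transfer can be piped without
touching the local disk at all; a sketch (host, path and database names are
placeholders):

```
# Stream a compressed dump straight to the remote host:
pg_dump mydatabase | gzip | ssh backup.example.com 'cat > /backup/dumpfile.gz'

# ...and restore on the other end without decompressing to disk first:
ssh backup.example.com 'zcat /backup/dumpfile.gz' | psql mydatabase
```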
[GENERAL] SQL help - multiple aggregates
Hi,

I have a table cv with custid and vendid columns. Every entry represents the
purchase of a product available from a specific vendor. Now, for a set of
"interesting" vendors, I would like to select a new table

  custid, c415, c983, c1256

based upon part queries like

  select custid, count(vendid) as c415 from cv
  where vendid = 415 group by custid

The only way I managed to achieve that was

  select distinct custid into temp table cv1 from cv;
  alter table cv1 add column c415 int;
  update cv1 set c415 = part.c415
    from (select custid, count(vendid) as c415 from cv
          where vendid = 415 group by custid) part
    where cv1.custid = part.custid;

and repeating the process for every column requested. Is there a better way
(by creating an aggregate function, perhaps)?

Regards
Wolfgang
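One common answer to this kind of pivot is conditional aggregation, which
produces all the per-vendor counts in a single pass over cv; a sketch (the
FILTER clause needs PostgreSQL 9.4 or later, the CASE form works on any
version):

```sql
-- One pass over cv, one output column per "interesting" vendor (9.4+):
SELECT custid,
       count(*) FILTER (WHERE vendid = 415)  AS c415,
       count(*) FILTER (WHERE vendid = 983)  AS c983,
       count(*) FILTER (WHERE vendid = 1256) AS c1256
FROM cv
GROUP BY custid;

-- Equivalent on older versions:
SELECT custid,
       sum(CASE WHEN vendid = 415  THEN 1 ELSE 0 END) AS c415,
       sum(CASE WHEN vendid = 983  THEN 1 ELSE 0 END) AS c983,
       sum(CASE WHEN vendid = 1256 THEN 1 ELSE 0 END) AS c1256
FROM cv
GROUP BY custid;
```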
Re: [GENERAL] Slow SQL?
Hi Bjorn,

I have experienced that some subqueries can be quite slow, and would suspect
the NOT IN clause. I occasionally rewrite

  NOT IN (select key from unwanted_candidates)

as

  IN (select key from possible_candidates
      except select key from unwanted_candidates)

Admittedly, I am not running the latest version.

Regards
Wolfgang Hamann

>> I am trying to move a small system from Oracle to PostgreSQL and I have
>> come upon an SQL statement that runs really slowly compared to on the
>> Oracle database, and I am not able to interpret why this is slow.
>>
>> The SQL looks like this:
>>
>> Select a.status, a.plass, a.navn, a.avlsnr,
>>        date_part('day',(now() - s.dato)) dato_diff, v.tekst,
>>        COALESCE(a.avlsverdi,0)
>> From sed_uttak s, sem_avlsverdi a, semin_vare v
>> Where a.aktiv = 1
>> And   s.dato = (Select Max(y.dato)
>>                 From sed_uttak y
>>                 Where y.avlsnr = s.avlsnr)
>> And   a.avlsnr = s.avlsnr
>> And   s.sedtype = v.tallkode
>> And   a.avlsnr Not In (Select avlsnr
>>                        From dyr_pause_mot)
>> Union
>> Select a.status, a.plass, a.navn, a.avlsnr,
>>        date_part('day',(now() - s.dato)) dato_diff, 'Tappe pause',
>>        COALESCE(a.avlsverdi,0)
>> From sed_uttak s, sem_avlsverdi a, dyr_pause_mot p
>> Where s.dato = (Select Max(x.dato)
>>                 From sed_uttak x
>>                 Where x.avlsnr = s.avlsnr)
>> And   a.avlsnr = s.avlsnr
>> And   a.avlsnr = p.avlsnr
>> Union
>> Select a.status, a.plass, a.navn, a.avlsnr, null dato_diff, 'IKKE TAPPET',
>>        COALESCE(a.avlsverdi,0)
>> From sem_avlsverdi a
>> Where a.aktiv = 1
>> And   a.avlsnr Not In (Select avlsnr From sed_uttak)
>> And   a.avlsnr Not In (Select avlsnr From dyr_pause_mot)
>>
>> And the explain result looks like this:
>>
>> HashAggregate  (cost=7288068.92..7288092.10 rows=2318 width=36) (actual time=10740.366..10741.879 rows=6475 loops=1)
>>   Group Key: a.status, a.plass, a.navn, a.avlsnr, (date_part('day'::text, (now() - (s.dato)::timestamp with time zone))), v.tekst, (COALESCE((a.avlsverdi)::integer, 0))
>>   ->  Append  (cost=1.46..7288028.35 rows=2318 width=36) (actual time=0.203..10730.906 rows=8915 loops=1)
>>         ->  Nested Loop  (cost=1.46..7274678.41 rows=698 width=82) (actual time=0.203..10638.870 rows=8602 loops=1)
>>               Join Filter: (s.sedtype = v.tallkode)
>>               Rows Removed by Join Filter: 127006
>>               ->  Nested Loop  (cost=1.46..7274438.07 rows=698 width=26) (actual time=0.189..10607.509 rows=6164 loops=1)
>>                     ->  Seq Scan on sem_avlsverdi a  (cost=1.04..153.19 rows=3238 width=16) (actual time=0.024..4.027 rows=6474 loops=1)
>>                           Filter: ((NOT (hashed SubPlan 5)) AND (aktiv = 1))
>>                           Rows Removed by Filter: 3
>>                           SubPlan 5
>>                             ->  Seq Scan on dyr_pause_mot dyr_pause_mot_1  (cost=0.00..1.03 rows=3 width=4) (actual time=0.002..0.003 rows=3 loops=1)
>>                     ->  Index Scan using idx_seduttak_avlsnr on sed_uttak s  (cost=0.42..2246.53 rows=1 width=14) (actual time=1.404..1.637 rows=1 loops=6474)
>>                           Index Cond: (avlsnr = a.avlsnr)
>>                           Filter: (dato = (SubPlan 4))
>>                           Rows Removed by Filter: 42
>>                           SubPlan 4
>>
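Besides the EXCEPT rewrite, the NOT IN subqueries can also be expressed as
NOT EXISTS, which the planner can usually execute as a (hashed) anti-join
and which also behaves sanely if avlsnr can be NULL; a sketch for the third
UNION branch:

```sql
SELECT a.status, a.plass, a.navn, a.avlsnr, NULL AS dato_diff, 'IKKE TAPPET',
       COALESCE(a.avlsverdi, 0)
FROM sem_avlsverdi a
WHERE a.aktiv = 1
  AND NOT EXISTS (SELECT 1 FROM sed_uttak s     WHERE s.avlsnr = a.avlsnr)
  AND NOT EXISTS (SELECT 1 FROM dyr_pause_mot p WHERE p.avlsnr = a.avlsnr);
```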
Re: [GENERAL] Running query without trigger?
>> On 2016-07-09 08:20, haman...@t-online.de wrote:
>> > Hi,
>> >
>> > a table is associated with a trigger for normal use.
>> > An admin (someone with ALTER privilege) can disable the trigger, run
>> > some bulk update, and then re-enable it. This means, however, that
>> > normal user activity has to be locked out.
>> >
>> > There are two possible scenarios: the bulk update would not cause
>> > trigger activity at all, because of the values and columns involved.
>> > Or - the bulk update is followed by another bulk transaction that is
>> > equivalent to trigger invocations per row.
>> > At least in the first case, running this particular query without
>> > triggering the trigger, but with normal activity still going on, would
>> > be really great. Is there a way to achieve this?
>> >
>> > Regards
>> > Wolfgang Hamann
>>
>> Well, for temporarily disabling triggers the easiest is to run:
>>   SET session_replication_role = replica;
>>   UPDATE ...
>>   SET session_replication_role = DEFAULT;
>>
>> This only affects the current session, i.e. you and not any other
>> sessions, which is what you seemed to require. All this assuming you run
>> a supported version of postgres.
>>
>> Regards,
>> Christian Elmerot, Systems Engineer One.com

Hello Christian,

thanks for the tip. Out of the suggestions I got, specifying a list of
relevant columns seems to be the best one for my specific case. I believe
your suggestion will be helpful in other - more ad hoc - situations.

Regards
Wolfgang
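The "list of relevant columns" approach mentioned in the reply can be
sketched as follows (table, column, and function names are illustrative):
the row trigger fires only when an interesting column is named in the
UPDATE, and only when a value really changed.

```sql
CREATE TRIGGER mytable_changed
  BEFORE UPDATE OF col2, col3 ON mytable  -- skipped when only other columns are updated
  FOR EACH ROW
  WHEN (OLD.col2 IS DISTINCT FROM NEW.col2
        OR OLD.col3 IS DISTINCT FROM NEW.col3)
  EXECUTE PROCEDURE mark_changed();
```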
[GENERAL] Running query without trigger?
Hi,

a table is associated with a trigger for normal use. An admin (someone with
ALTER privilege) can disable the trigger, run some bulk update, and then
re-enable it. This means, however, that normal user activity has to be
locked out.

There are two possible scenarios: the bulk update would not cause trigger
activity at all, because of the values and columns involved.
Or - the bulk update is followed by another bulk transaction that is
equivalent to trigger invocations per row.

At least in the first case, running this particular query without triggering
the trigger, but with normal activity still going on, would be really great.
Is there a way to achieve this?

Regards
Wolfgang Hamann
Re: [GENERAL] Skip trigger?
>> On Sat, Apr 30, 2016 at 1:38 AM, wrote:
>> > I have a table with a row update trigger that is quite slow.
>> > The trigger function basically sets some bits in a "changed" column
>> > depending on which values really changed.
>> > For some bulk updates it can be determined in advance that the
>> > trigger function will not have any effect.
>> > Is there any way to run an update query and specify that it should not
>> > activate the trigger?
>> > I know that I can disable the trigger and re-enable it later;
>> > however, other concurrent updates might need it.
>>
>> Indeed the main issue is how you want to handle concurrency. ALTER
>> TABLE statements to disable triggers work and are transactional, but
>> they lock the table, which may be undesired. Here are some useful
>> pointers:
>> http://blog.endpoint.com/2015/07/selectively-firing-postgres-triggers.html

Hi Manuel,

many thanks ... this seems to be just what I was looking for. I will give it
a try tomorrow.

Regards
Wolfgang Hamann
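One pattern for per-session trigger skipping, in the spirit of the blog post
linked above (names are illustrative; the two-argument form of
current_setting needs PostgreSQL 9.6+): guard the trigger with a WHEN clause
on a custom setting.

```sql
CREATE TRIGGER track_changes
  BEFORE UPDATE ON mytable
  FOR EACH ROW
  WHEN (current_setting('myapp.skip_triggers', true) IS DISTINCT FROM 'on')
  EXECUTE PROCEDURE set_changed_bits();

-- A bulk-update session can then opt out without locking anyone else:
SET myapp.skip_triggers = 'on';
UPDATE mytable SET ...;
RESET myapp.skip_triggers;
```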
Re: [GENERAL] Skip trigger?
>> On 2016-04-30 02:08, wolfg...@alle-noten.de wrote:
>> > Hi,
>> >
>> > I have a table with a row update trigger that is quite slow.
>> > The trigger function basically sets some bits in a "changed" column
>> > depending on which values really changed.
>> > For some bulk updates it can be determined in advance that the
>> > trigger function will not have any effect.
>> > Is there any way to run an update query and specify that it should not
>> > activate the trigger?
>> > I know that I can disable the trigger and re-enable it later;
>> > however, other concurrent updates might need it.
>>
>> I always disable the trigger, run the update, and enable the trigger
>> within a transaction. This locks the table and will prevent other
>> sessions from doing updates without the trigger (I run it during
>> off-hours if it is going to take more time than is acceptable).

Hi Stephen,

this is what I do now occasionally. However, I would like to be able to run
some things right away. The solutions in
http://blog.endpoint.com/2015/07/selectively-firing-postgres-triggers.html
look very promising.

Regards
Wolfgang Hamann
[GENERAL] running script on the server
Hi,

can any of the programming languages (perl, tcl) be used to exec a process
on the server and open files? The scenario I have in mind would
- copy some fields of a table to an external text file
- have a script generate binary data from the text file
- load that data into a large object

The client would just select rebuild()

Best regards
Wolfgang Hamann
--
gpg key: 92FCFB25
Re: [GENERAL] [SOLVED] running script on the server
Adrian Klaver wrote:
>> On 07/28/2015 08:28 PM, haman...@t-online.de wrote:
>> > Hi,
>> > can any of the programming languages (perl, tcl) be used to exec a
>> > process on the server and open files?
>>
>> You are looking for untrusted. For Python that is the only way it comes,
>> hence plpythonu(ntrusted). For Perl see here:
>> http://www.postgresql.org/docs/9.4/interactive/plperl-trusted.html
>> That is all I know about.
>>
>> > The scenario I have in mind would
>> > - copy some fields of a table to an external text file
>> > - have a script generate binary data from the text file
>>
>> So what exactly does the above step entail? Just wondering if it could be
>> done without leaving the database?

Hi Adrian,

the untrusted part seems to be the solution. The script consists of a perl
part - text preprocessing likely to be changed - as well as a C program that
actually builds the blob. I have a little experience with building
postgresql binary modules, but I would prefer to keep the - tested - program
separate. Copying data from the table and text preprocessing could run in
plperl.

>> > - load that data into a large object
>> > The client would just select rebuild()

Best regards
Wolfgang Hamann
--
gpg key: 92FCFB25
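A minimal sketch of the plperlu approach discussed here (the function name
rebuild() is from the thread; the external program path and the file names
are illustrative): shell out to the external builder, then import the result
as a large object.

```sql
CREATE OR REPLACE FUNCTION rebuild() RETURNS oid AS $$
    # plperlu (untrusted) may run external programs and touch the filesystem
    system("/usr/local/bin/build_blob /tmp/fields.txt /tmp/blob.bin") == 0
        or elog(ERROR, "build_blob failed: $?");
    # Import the generated binary file as a large object and return its oid
    my $rv = spi_exec_query("SELECT lo_import('/tmp/blob.bin') AS loid");
    return $rv->{rows}[0]->{loid};
$$ LANGUAGE plperlu;
```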
[GENERAL] pg_attribute growing extremely
Hi,

on a server running 8.4 I observe that, since a while, the pg_attribute
table is growing enormously. Soon after reloading I have one file:

  ls -s 1249
  1048580 1249

A day later this is:

  1048580 1249     1048580 1249.1   1048580 1249.10  1048580 1249.11
  1048580 1249.12  1048580 1249.13  1048580 1249.14  1048580 1249.15
   682212 1249.16  1048580 1249.2   1048580 1249.3   1048580 1249.4
  1048580 1249.5   1048580 1249.6   1048580 1249.7   1048580 1249.8
  1048580 1249.9      4316 1249_fsm      24 1249_vm

and 5 days later the system had arrived at 102 files.

The server has been running for quite some time without any major change.
The workload is mostly reading, a little text updating and a little more
binary updating. Every two weeks or so the database is recreated and
reloaded. The total size of the data is growing slowly (say at a rate of 10%
per year).

Regards
Wolfgang Hamann
Re: [GENERAL] Debugging of C functions
>> Hi,
>>
>> How is it possible to debug user defined functions written in C? I can't
>> just include all the files and run it.
>>
>> Does anyone have an idea?
>>
>> Janek Sendrowski

Hi,

I recall compiling with debug as normal, adding a printf followed by a 30
sec sleep, and using the pause to attach gdb. If it is at all possible, you
should try to run the database in single-user mode rather than normal server
operation.

Regards
Wolfgang Hamann
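The attach-gdb workflow described above looks roughly like this (function
and path names are illustrative):

```
-- inside psql: find the backend process serving this session
SELECT pg_backend_pid();

# from a shell on the server, attach to that backend before calling the
# C function, set a breakpoint, and let it run:
gdb -p <backend_pid>
(gdb) break my_c_function
(gdb) continue
-- now execute "SELECT my_c_function(...);" in the same psql session
```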
Re: [GENERAL] incremental dumps
On 8/1/13, haman...@t-online.de wrote:
> Hi,
> I want to store copies of our data on a remote machine as a security
> measure.
> Wolfgang

2 questions:
1. How secure is the remote site?
2. How much data are we talking about?
--
Mike Nolan

Hi Mike,

currently the source uses some 20 GB in a database partition and about 700
GB in a general data partition. For the database, a diff -e grows to about
10% of the size of a full dump in a week. The remote site is a raid box at a
hosting center, with paid backup.

Regards
Wolfgang
Re: [GENERAL] incremental dumps
Luca Ferrari wrote:
> On Fri, Aug 2, 2013 at 6:55 PM, haman...@t-online.de wrote:
> > thanks for the hint - this is probably one of the things to do.
> > I have something else in mind, but at present I just suspect that this
> > might happen: when I modify data and select _without an ordering_, I am
> > pretty sure to get the data in a different sequence than before. So I
> > wonder whether forcing the dump to honor a particular ordering (if that
> > is at all possible) would also reduce the size of dumps ... or the time
> > diff takes to produce them.
>
> May I ask what the final purpose is? Because if it is to take a backup I
> guess this is not the right way to go, while if it is keeping (and
> rebuilding) a history of data, then using a more specific approach (like
> logging) could give you fewer headaches.
>
> Luca

Hi Luca,

we recently decided to have off-site backups rather than burning piles of
DVDs that are kept on-site. The backup server sits in a data center and is
fed nightly via rsync. The link is not too fast.

One thought in favor of text files: if disaster really strikes (the regular
machine goes on fire) it is quite likely that a replacement would be
installed with the latest versions of all software. Now, if I had binary
files, I would probably have to install the old version of the software just
to be able to do a regular dump, and then reload into the newer one.

With the planned setup, I would be able to look up previous states of the
database as well. (Sample scenario: when was the price of product xyz
actually changed?) This is likely not too convenient ... but loading
successive dumps into a secondary installation of the database is definitely
worse.

Regards
Wolfgang
Re: [GENERAL] incremental dumps
On 08/01/2013 02:59 AM, haman...@t-online.de wrote:
> However, the diff files seem to be considerably larger than one would
> expect. One obvious part of the problem is the fact that diff shows old
> and new text,

You could try using

  diff --suppress-common-lines -ed

which in my experience creates the smallest diffs (actually ed scripts).
Apply them with patch -e

Martin

Hi Martin,

thanks for the hint - this is probably one of the things to do. I have
something else in mind, but at present I just suspect that this might
happen: when I modify data and select _without an ordering_, I am pretty
sure to get the data in a different sequence than before. So I wonder
whether forcing the dump to honor a particular ordering (if that is at all
possible) would also reduce the size of dumps ... or the time diff takes to
produce them.

Regards
Wolfgang Hamann
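pg_dump itself does not guarantee row order, but the ordering idea can be
approximated by dumping each table with an explicit ORDER BY, so unchanged
rows stay on the same lines between runs; a sketch (database, table, and key
names are placeholders):

```
# Deterministic per-table dumps:
psql mydb -c "COPY (SELECT * FROM products  ORDER BY product_id) TO STDOUT" > dump/products.txt
psql mydb -c "COPY (SELECT * FROM customers ORDER BY custid)     TO STDOUT" > dump/customers.txt

# Then version the files with ed-script diffs, as discussed in the thread:
diff -e dump.old/products.txt dump/products.txt > products.ed
```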
[GENERAL] incremental dumps
Hi,

I want to store copies of our data on a remote machine as a security
measure. My first attempt was a full dump (which takes too long to upload)
followed by diffs between the pg_dump files. This provides readable /
searchable versioned data (I could always apply the diffs on the remote
machine and search the text file, without having an instance of postgres
running on that machine).

However, the diff files seem to be considerably larger than one would
expect. One obvious part of the problem is the fact that diff shows old and
new text, so e.g. changing the amount of stock for a product with a 1 kB
description would generate at least 2 kB of text in the diff file.

What would be a better approach?

Best regards
Wolfgang
Re: [GENERAL] File System backup
> yes, when I tried to start the postgresql service using init.d it gave me
> the error "removed stale pid", postgresql failed to start.
>
> Regards,
> Itishree

Hi,

have you tried to execute the startup sequence step by step? On systems
without systemd, something like

  sh -x /etc/init.d/postgresql start 2>&1 | less

should reveal the entire sequence. I don't know whether systemd has
something to help; I really did a step-by-step run one time.

Regards
Wolfgang Hamann
[GENERAL] how can this get faster
Hi,

I have a table with mainly a list of keywords and a few odd columns
referring to their origin. There is a standard btree index, and searches are
carried out as anchored regexes. Entries in the table and queries are
uppercased; this turned out to have a big speed advantage over a ~*
condition. So queries are like:

  select * from mytable where word ~ '^CARNEVAL'

The table currently holds 15.000.000 rows, and query times are on the order
of 10-20 msec.

Now I am trying something different: I have a temp table with words
appearing in the table plus some extra data, and I want to get the other
columns from that table too:

  select aux.col1, aux.col2, mytable.inf1, mytable.inf2
  from aux, mytable where aux.word = mytable.word;

I specifically do "analyze aux;" before the operation. On a sample my aux
table has 433 entries (different words), and I get this behaviour:

  Hash Join  (cost=12.74..321957.88 rows=472279 width=12) (actual time=0.530..2242.837 rows=29342 loops=1)
    Hash Cond: (mytable.word = aux.word)
    ->  Seq Scan on mytable  (cost=0.00..259779.89 rows=15317989 width=12) (actual time=0.009..808.577 rows=15317717 loops=1)
    ->  Hash  (cost=7.33..7.33 rows=433 width=14) (actual time=0.173..0.173 rows=433 loops=1)
          ->  Seq Scan on aux  (cost=0.00..7.33 rows=433 width=14) (actual time=0.004..0.063 rows=433 loops=1)
  Total runtime: 2243.924 ms

Experimenting with different query data (i.e. different collections of
words) reveals a strong dependency on the actual data. In particular, I
split the above into two subtables. The first one, containing 45 rows,
produced:

  Nested Loop  (cost=31.41..169444.22 rows=49082 width=12) (actual time=0.040..37.739 rows=590 loops=1)
    ->  Seq Scan on aux1  (cost=0.00..1.45 rows=45 width=18) (actual time=0.003..0.011 rows=45 loops=1)
    ->  Bitmap Heap Scan on mytable  (cost=31.41..3751.76 rows=1091 width=12) (actual time=0.014..0.054 rows=13 loops=45)
          Recheck Cond: (mytable.word = aux1.word)
          ->  Bitmap Index Scan on mytable_word_idx  (cost=0.00..31.14 rows=1091 width=0) (actual time=0.008..0.008 rows=13 loops=45)
                Index Cond: (mytable.word = aux1.word)
  Total runtime: 37.800 ms

while the second one (the remaining 388 rows) resulted in another sequential
scan on the large table (and even no result within 3 minutes when omitting
the analyze step).

Is there a way to help the planner choose the index scan?

Regards
Wolfgang Hamann
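For experimentation, the planner can be steered away from the hash-join plan
session-locally, to see whether the nested-loop/index plan would actually
win on the slow sample (these settings are for testing, not production):

```sql
SET enable_seqscan = off;
SET enable_hashjoin = off;

EXPLAIN ANALYZE
  SELECT aux.col1, aux.col2, mytable.inf1, mytable.inf2
  FROM aux, mytable WHERE aux.word = mytable.word;

RESET enable_seqscan;
RESET enable_hashjoin;
```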
Re: [GENERAL] pltcl and modules question
>> > is there any way to use a module within a pltcl script, i.e. have
>> >   load /path/to/mystuff.so
>> > or
>> >   package require mystuff
>> > in a script?
>>
>> You can load tcl code by putting it in the pltcl_modules table. See:
>> http://www.postgresql.org/docs/9.2/static/pltcl-unknown.html
>>
>> > Similarly, would it be possible to access a loaded module from a perl
>> > script, i.e. have
>> >   use mystuff;
>>
>> You can load any Perl module you want within a Pl/PerlU function. It's
>> possible to access shared code with Pl/Perl, but it's a little more
>> involved. See:
>> http://www.postgresql.org/docs/9.2/static/plperl-under-the-hood.html

Hi Greg,

thanks a lot, I will give it a try next week. Comparing the two references,
I sort of fear that there might be a problem when tcl wants to load a binary
rather than a script module. I can try pltclu as well.

Regards
Wolfgang Hamann
[GENERAL] pltcl and modules question
Hi,

is there any way to use a module within a pltcl script, i.e. have

  load /path/to/mystuff.so
or
  package require mystuff

in a script? The reason why I am asking: I have recently converted a fairly
slow script (lots of regexes) into one compiled module that basically does
all the regex work at once, and critcl made it fairly easy to build a .so or
a tcl package out of the code.

Similarly, would it be possible to access a loaded module from a perl
script, i.e. have a

  use mystuff;

line?

Regards
Wolfgang Hamann
Re: [GENERAL] Optimizing query?
Pavel Stehule wrote:
> but maybe https://github.com/dimitri/prefix can help

Hi Pavel,

thanks - this works perfectly. However, it does not seem to play well with
the optimizer, so I ended up with:
- select all candidates into a temp table using the prefix operator
- apply all other conditions by joining that temp table to the original ones

Regards
Wolfgang
Re: [GENERAL] Deleting 173000 records takes forever
Alexander Farber wrote:
> delete from pref_users where id not in (select distinct id from pref_money);

Hi,

try instead:

  delete from pref_users
  where id in (select id from pref_users except select id from pref_money);

Regards
Wolfgang Hamann
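Another formulation worth trying is an anti-join via NOT EXISTS, which is
usually fast and, unlike NOT IN, is not derailed by NULL ids:

```sql
DELETE FROM pref_users u
WHERE NOT EXISTS (SELECT 1 FROM pref_money m WHERE m.id = u.id);
```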
Re: [GENERAL] Optimizing query?
Pavel Stehlule wrote: Hi, I am trying to match items from 2 tables based on a common string. One is a big table which has one column with entries like XY123, ABC44, = etc The table has an index on that column. The second table is, typically, much smaller select from tab1, tab2 where tab1.code =3D tab2.code; This works fine and fast. Now, as a variant, I have some entries like XY423A, XY423B, GF55A, GF55D= in the big table and want them to match XY423, GF55 in the second table Variants I have tried select from tab1, tab2 where tab1.code ~ (tab2.code||'($|[A-Z])'); select from tab1, tab2 where tab1.code ~ ('^'||tab2.code||'($|[A-Z= ])'); Have you tried the substring function? select from tab1, tab2 where substring(tab1.code from 1 for 5) =3D tab2.code Hi Pavel, it was just by chance that a fixed size substring would match the data at hand. It is more common to have a digit/letter (or vice versa) boundary or a hyphen there both take an enormous time. In the better case that I can subset (e.g. a= ll candidates in table 2 share initial AX) I get back to manageable times by adding and tab1.code ~ '^AX' into the recipe. Actual runtime with about a million entries in tab1 and= 800 entries in tab2 is about 40 seconds. any join where result is related to some function result can be very slow, because estimation will be out and any repeated function evaluation is just expensive. I see the problem since obviously every the ~ operator with a non-constant pattern is constantly recompiling the pattern. I wonder whether it would be possible to invent a prefix-match operator that approaches the performance of string equality. I noted in the past (not sure whether anything has changed in regex matching) that a constant leading part of regex would improve performance, i.e. use an index scan to select possible candidates. You can try use a functional index. create index on tab2 ((substring(tab1.code from 1 for 5)) What kind of trick is that - mixing two tables into a functional index? 
What would the exact syntax be for that? Regards Wolfgang Hamann -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
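For what it's worth, an expression index can only reference columns of the table it is created on, so the intended recipe presumably looks something like the sketch below. The index name is invented, and the fixed 5-character prefix is an assumption carried over from Pavel's example, which the thread already notes only fits this data by chance:

```sql
-- Sketch only: the expression index belongs on the big table (tab1),
-- over the same expression that appears in the join condition.
create index tab1_code_prefix on tab1 ((substring(code from 1 for 5)));

-- The planner can then match the join expression against the index:
select tab1.code, tab2.code
from tab1
join tab2 on substring(tab1.code from 1 for 5) = tab2.code;
```

The expression in the query must match the indexed expression exactly, otherwise the index is not considered.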
[GENERAL] optimize query?
Hi, I am trying to match items from 2 tables based on a common string. One is a big table which has one column with entries like XY123, ABC44, etc. The table has an index on that column. The second table is, typically, much smaller.

select ... from tab1, tab2 where tab1.code = tab2.code;

This works fine and fast. Now, as a variant, I have some entries like XY423A, XY423B, GF55A, GF55D in the big table and want them to match XY423, GF55 in the second table. Variants I have tried:

select ... from tab1, tab2 where tab1.code ~ (tab2.code||'($|[A-Z])');
select ... from tab1, tab2 where tab1.code ~ ('^'||tab2.code||'($|[A-Z])');

Both take an enormous time. In the better case, where I can subset (e.g. all candidates in table 2 share the initial AX), I get back to manageable times by adding "and tab1.code ~ '^AX'" to the recipe. Actual runtime with about a million entries in tab1 and 800 entries in tab2 is about 40 seconds. Regards Wolfgang Hamann
Re: [GENERAL] optimize query?
Bob Futrelle wrote: If looking for the variants with a single suffixed character is all you'll ever need to do: do a single pass on the large table, after creating a field 'trunc' that contains a truncated version of the item, e.g. adding XY423 to trunc for each entry of the form XY423A, XY423B, etc. This would be a one-time operation.

Hi Bob, unfortunately it is not, because the suffix pattern is not always the same. On the other hand, I tried to split the query into two steps:

select substring(code from '^[A-Z]+[0-9]+') as code, ... into temp table t from tab1;
select ... from t, tab2 where tab2.code = t.code;

which definitely was a big improvement. I believe that most of the time, passing in a pattern like that would work. So thank you for that tip. Regards Wolfgang
Re: [GENERAL] Using POSIX Regular Expressions on xml type fields gives inconsistent results
David Johnston pol...@yahoo.com writes: [optionally go look at ~* in the documentation at this point; or just try a simple flip-flop of the expression] SELECT 'Kumar' ~* 'kuma' - TRUE !!! (ah ha! I had the order reversed) SELECT any( ARRAY['Kumar','Gozales']::text ) ... wait... ANY doesn't work on the other side... :( [At this point I'd confirm or question why ANY hasn't been made to go both ways, but also realize that I will have to approach this in a different way to achieve my goal.]

It's been a while since I looked at the point in detail, but I seem to recall that there are fundamental syntactic-ambiguity reasons why the ANY/ALL part has to be on the right-hand side of the comparison operator. There's a much easier fix to this problem, though, which is to invent a reverse ~ operator that does POSIX comparison with the pattern on the left. The hardest part of doing that for yourself is choosing a name for the reverse operator --- it just goes like

create function reverse_regexeq(text, text) returns bool
    as 'select $2 ~ $1' language sql strict immutable;

create operator ~~~ (procedure = reverse_regexeq, leftarg = text, rightarg = text);

and similarly for the case-insensitive version, and there you go: pattern ~~~ ANY (whatever) solves the problem.

Hi Tom, while it might work to reverse the order in a regex, I have experienced severe slowdown when the pattern comes from table data. Regards Wolfgang Hamann
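The case-insensitive twin Tom alludes to could look like this (the function and operator names below are invented; the only substantive change is ~* in place of ~):

```sql
-- Hypothetical case-insensitive counterpart to Tom's sketch.
create function reverse_iregexeq(text, text) returns bool
    as 'select $2 ~* $1' language sql strict immutable;

create operator ~~~* (procedure = reverse_iregexeq,
                      leftarg = text, rightarg = text);

-- usage, with the pattern on the left as discussed above:
--   'kuma' ~~~* ANY (ARRAY['Kumar','Gozales'])
```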
[GENERAL] Backslashitis
Hi, I have a column declared as an array of text. I can get a single backslash into one of the array elements by

update ... set mycol[1] = E'blah \\here'

If I try to update the whole array

update ... set mycol = E'{blah \\here}'

the backslash is missing. I can get two backslashes there. Is there a good way to solve the problem, other than rewriting my update script to do array updates one element at a time? Regards Wolfgang Hamann
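One workaround worth noting (a sketch, not from the thread; the table name is a placeholder for the unnamed table in the question): the ARRAY[] constructor takes ordinary string literals, so the array-literal parser - and its extra round of backslash interpretation - never sees the value:

```sql
-- mytable stands in for the table in the question.
update mytable set mycol = ARRAY[E'blah \\here'];
-- element 1 now holds a single backslash, just as with
--   set mycol[1] = E'blah \\here'
```

With the `E'{...}'` array-literal form, the string is unescaped twice (once as a string literal, once by the array parser), which is why one level of backslash disappears.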
[GENERAL] Horizontal aggregation?
Hi, I am looking for a feature that would select from a table with

k1 a
k1 b
k1 c
k2 a
k3 b
k3 c

something like

k1 a b c
k2 a
k3 b c

(just all elements next to each other, with a space in between) or perhaps an array output

k1 {a,b,c}
k2 {a}
k3 {b,c}

If such an operator exists, would there be a remove-duplicates option? Regards Wolfgang Hamann
Re: [GENERAL] non-static LIKE patterns
haman...@t-online.de writes: Tom Lane wrote: If you want it to be bulletproof, what I'd think about is something like WHERE second.path LIKE quote_like(first.path)||'%' Just out of curiosity: wouldn't that (as well as using a non-static LIKE) be an enormous performance problem?

Well, it won't be free, but I think you've already doomed yourself to a not-very-bright plan by using LIKE in this way at all. In any case, as a wise man once said, you can make it run arbitrarily fast if it doesn't have to give the right answer. Correctness trumps any micro-optimization questions, so if you have to have prefix matching of this sort, it's gonna cost ya somehow.

Hi Tom, I just stumbled across this question because I regularly come across problems that, at first, look like they should be solved with non-static LIKE or REGEX patterns. I actually have two situations where I would need a better plan. One is, fortunately, fairly static (mostly lookups, hardly any inserts): name matching. Many famous people appear in different spellings, say these two musicians:

Franz|Ferenc Liszt
Fr(e|y)der(ic|yk) Chopin

So the first plan would be to regex-compare the sought name against the first-name (or last-name) regexes. Run-time is astronomical, though. My current approach is to a) keep the regexes in a separate table/column, so names with a regex entry are handled in a smaller query, b) reverse the query: for every regex (they are well-behaved in this context) I pre-create a pattern so that my actual query becomes where pre-made-pattern ~ searched_name, c) while preparing the pattern, a common initial character (the F for Franz and Ferenc) is identified to build an index. In the rare case that the first letter is already different, there would be two entries in the table. So the actual query can check the first letter before it runs the regex.
The other situation, unfortunately, is ad-hoc queries where I cannot do that kind of preparation. Typically, the DB would contain strings like XY4711A, XY271, XY17321AAA, and I want to check whether an input like XY17321 matches a database entry up to the end of the numerals. So I add [^0-9]*$ to the end of my candidates, select where candidate ~ entry-in-table, and go for a coffee or two. Of course I would prefer to see a pre-built solution do all that mess for me... Regards Wolfgang Hamann
Re: [GENERAL] Horizontal aggregation?
Hi, I am looking for a feature that would select from a table with ... If such an operator exists, would there be a remove-duplicates option? Regards Wolfgang Hamann

select k, array_agg(distinct val) from t;

See the docs: http://www.postgresql.org/docs/9.1/static/sql-expressions.html#SYNTAX-AGGREGATES

obviously I forgot the group by:

select k, array_agg(distinct val) from t group by k;

Hi Abel, thanks a lot. Regards Wolfgang Hamann
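For the space-separated output the original question asked for, string_agg (also available since 9.0) works the same way and likewise accepts DISTINCT - a sketch, assuming the same table t(k, val):

```sql
-- One row per k, with the distinct values concatenated,
-- separated by a single space.
select k, string_agg(distinct val, ' ') from t group by k;
```

Without an ORDER BY inside the aggregate, the order of the concatenated values is not guaranteed.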
Re: [GENERAL] non-static LIKE patterns
Tom Lane wrote: patrick keshishian pkesh...@gmail.com writes: Thanks for the quick reply. Would be tough choosing another reasonable ESCAPE character while dealing with paths. Will think more about this.

If you want it to be bulletproof, what I'd think about is something like

WHERE second.path LIKE quote_like(first.path)||'%'

where quote_like() is a function that inserts a backslash before each backslash, percent, and underscore in the given value. Probably not hard to cons that up from regexp_replace(). regards, tom lane

Just out of curiosity: wouldn't that (as well as using a non-static LIKE) be an enormous performance problem? I tried something with normal ~ regex matching some time ago but gave up on the idea pretty soon. Regards Wolfgang Hamann
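A possible quote_like() along the lines Tom sketches - prefixing each backslash, percent, and underscore with a backslash via regexp_replace() - might look like this. It is a sketch only, written with E'' literals so it behaves the same regardless of the standard_conforming_strings setting:

```sql
create function quote_like(text) returns text as $$
    -- pattern: a character class of backslash, % and _
    -- replacement: a literal backslash followed by the matched character
    select regexp_replace($1, E'([\\\\%_])', E'\\\\\\1', 'g')
$$ language sql strict immutable;

-- select quote_like('100%_sure');  -- yields 100\%\_sure
```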
Re: [GENERAL] question about alternate ordering of results
Hi Tom, declaring another operator class helped. At first, however, results were sorted differently than expected. A little gdb session revealed that in fact only the FUNCTION 1 entry in the operator class is used. Regards Wolfgang Hamann

haman...@t-online.de writes: Now, in versions 8 and later the using <- is rejected; the ordering op needs to be the < or > member of a btree operator class. What is needed to create the old behaviour again - create a complete operator class, including new names for the unchanged equals/not-equals functions?

Yes. It sounds like you have pretty much all the spare parts you need, you just have to collect them together into an opclass for each ordering you want.

Is this relevant to performance?

Somewhat, in that it helps the planner optimize ordering considerations. But IIRC the main argument for tightening it up was to catch mistakes wherein somebody says ORDER BY x USING <>, or some other operator that doesn't produce a consistent sort order. regards, tom lane
[GENERAL] pltcl and modules
Hi, I recently found out about critcl, which does a fairly decent job of converting a C fragment inside a tcl procedure into a loadable module. Now, if I wanted to use the same code inside pltcl, would that be possible? If yes, any special precautions other than getting the file paths right? Regards Wolfgang Hamann -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] question about alternate ordering of results
Hi, in the old days (version 7.x) I created a set of btree operators

create operator < (
    procedure = mytext_lt,
    leftarg = text, rightarg = text,
    commutator = >, negator = >=,
    restrict = scalarltsel, join = scalarltjoinsel
);

etc. for a complete operator class

create operator class mytext_ops for type text using btree as
    operator 1 < (text,text),
    ...
);

Some tables have an index using this operator class

create index blah_t on blah using btree (col1 mytext_ops);

I also created a few extra operators

create operator <- (
    procedure = mytext_lt_x,
    leftarg = text, rightarg = text,
    commutator = >-, negator = >=-,
    restrict = scalarltsel, join = scalarltjoinsel
);

and could use them in a query, like

select ... from blah order by col1;
select ... from blah order by col1 using <;
select ... from blah order by col1 using <-;

(resulting in 3 different result orderings) BTW: the difference between the two ops is natural ordering of numerics: if two strings start to differ at 3 vs 17, the < returns them in alpha sort and the <- produces proper numeric order.

Now, in versions 8 and later the using <- is rejected; the ordering op needs to be the < or > member of a btree operator class. What is needed to create the old behaviour again - create a complete operator class, including new names for the unchanged equals/not-equals functions? - adding another index to get the feature?

Is this relevant to performance? I guess that an index using one opclass and ordering using the other one would result in an index scan to locate the data and then an extra ordering pass to order them the other way. Regards Wolfgang Hamann
[GENERAL] can these queries be combined into one?
Hi, I am currently doing something like

select ordercode, descr, codes into temp table x from products where ...;

Here codes is a bit-mapped field.

update x set codes = codes | 512 from othertable t where ordercode = t.ordercode and ...;

select * from x;

Is there a way to avoid that temp table? Regards Wolfgang Hamann
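One way the temp table might be folded away (a sketch; the elided WHERE conditions stay as placeholders): do the bit-twiddling inline with a LEFT JOIN, so rows without a partner in othertable keep their codes unchanged:

```sql
select p.ordercode, p.descr,
       case when t.ordercode is null
            then p.codes
            else p.codes | 512
       end as codes
from products p
left join othertable t on t.ordercode = p.ordercode
where ...;  -- same filter as the original select ... into
```

If othertable can hold several rows per ordercode, this produces duplicate output rows; an EXISTS test in the CASE instead of the join would avoid that.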
[GENERAL] table alias for update
Is there a way to use a table alias for the target table in an UPDATE query? I would like to avoid spelling out things like this in psql:

update myverylongtablename set col1 = r.col1repl
from repltable r
where myverylongtablename.id = r.id and myverylongtablename.col2 <> 0;

Regards Wolfgang Hamann
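PostgreSQL's UPDATE does accept an alias on the target table (see the UPDATE reference page), which shortens this to something like the following sketch; note that once an alias is given, the original table name may no longer be used in the rest of the statement:

```sql
update myverylongtablename m
   set col1 = r.col1repl
  from repltable r
 where m.id = r.id
   and m.col2 <> 0;  -- the col2 comparison operator is assumed; it was garbled upstream
```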
Re: [GENERAL] COPY and indices?
Scott Marlowe wrote: 2012/3/12 François Beausoleil franc...@teksol.info: Hi all, When using COPY FROM STDIN to stream thousands of rows (20k and more hourly), what happens with indices? Are they updated only once after the operation, or are they updated once per row? Note that I'm not replacing the table's data: I'm appending to what's already there. I suspect batching writes will be faster than writing each individual row using an INSERT statement.

COPY is faster, but if a single row fails (dup key etc.) the whole COPY fails.

Hi Scott, my impression: it would be even faster to drop the indices, do the bulk copy, and rebuild the indices after the fact. Regards Wolfgang Hamann
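The drop-and-rebuild recipe, in psql terms (table and index names are placeholders):

```sql
drop index if exists items_code_idx;
\copy items from 'items.dat'
create index items_code_idx on items (code);
```

Whether this wins depends on how large the load is relative to the existing table: for appends that are small compared to the table, rebuilding the whole index usually costs more than the per-row index maintenance it saves, and the index is unavailable to other queries in the meantime.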
Re: [GENERAL] what Linux to run
If we move to Linux, what is the preferred Linux for running Postgres on? This machine would be dedicated to the database only. I'd like a recommendation for both a GUI hosted version and a non-GUI version. I haven't used Linux in the past but did spend several years in a mixed Unix and IBM mainframe environment at the console level.

Hi, one thing you might want to consider is system lifetime: some distros may be set up so that you more or less have to reinstall within 2 years if you plan to use the update service - others may be longer. Now, fast development is great AND allows you to change to better hardware easily. It does, however, mean that you might get surprised with a different Postgres version at times you don't really like. If you plan to install from source, this would not be of any concern. Regards Wolfgang Hamann
Re: [GENERAL] what Linux to run
On Tue, 28 Feb 2012, haman...@t-online.de wrote: one thing you might want to consider is system lifetime: some distros may be set up so that you more or less have to reinstall within 2 years if you plan to use the update service - others may be longer. Now, fast development is great AND allows you to change to better hardware easily. It does, however, mean that you might get surprised with a different Postgres version at times you don't really like it. If you plan to install from source, this would not be of any concern.

Wolfgang, most updates fix security vulnerabilities. If you keep current with those, there's no compelling need to upgrade the distribution itself unless you want to do so. There's a distinction between the distribution itself (kernel and GNU tools) and the end-user applications bundled with the distribution. Also, the distributions with which I'm familiar allow you to select the applications to upgrade, so you can avoid surprises.

Hi Rich, if, after say 18 months, I no longer get updates (this seems to be the lifecycle of the locally popular SuSE), it means that I either have to do an upgrade install or forget about security fixes. Now the upgrade install might bring some software with incompatible changes, or even might replace some software I used to rely on with something different. After some unpleasant surprises I stopped upgrading: rather, get a fresh box, install everything there, and once it plays nicely, swap them. Regards Wolfgang
[GENERAL] problem setting up
Hi, I am just trying to build a duplicate of our database server. Like the main one, I compiled it from source, ran initdb, created a superuser, and then proceeded to create a database. However, db creation is denied to the superuser. Likewise, if I create the db as user postgres and then change ownership, I still do not have enough rights to, say, install a language. The pg_database.datdba as well as the pg_user tables seem to have correct values. Where could I look next? Regards Wolfgang Hamann
Re: [GENERAL] Searching for bare letters
Reuven M. Lerner wrote:

Hi, everyone. I'm working on a project on PostgreSQL 9.0 (soon to be upgraded to 9.1, given that we haven't yet launched). The project will involve numerous text fields containing English, Spanish, and Portuguese. Some of those text fields will be searchable by the user. That's easy enough to do; for our purposes, I was planning to use some combination of LIKE searches; the database is small enough that this doesn't take very much time, and we don't expect the number of searchable records (or columns within those records) to be all that large.

The thing is, the people running the site want searches to work on what I'm calling (for lack of a better term) "bare letters". That is, if the user searches for "n", then the search should also match Spanish words containing "ñ". I'm told by Spanish-speaking members of the team that this is how they would expect searches to work. However, when I just did a quick test using a UTF-8 encoded 9.0 database, I found that PostgreSQL didn't see the two characters as identical. (I must say, this is the behavior that I would have expected, had the Spanish-speaking team member not said anything on the subject.)

So my question is whether I can somehow wrangle PostgreSQL into thinking that "n" and "ñ" are the same character for search purposes, or if I need to do something else -- use regexps, keep a "naked", searchable version of each column alongside the native one, or something else entirely -- to get this to work.

Any ideas?

Thanks,

Reuven

I had the same problem with German (there is ä ö ü). I ended up with a normalized version of the database (for many purposes, this could be just an extra column) plus preprocessing the input. There is one difficulty with German searches: these letters are commonly transliterated into ue etc., like in Muenchen. So depending on culture, some people would expect a "u" search term to match, and others the "ue".
So preprocessing the query means replacing a bare "u" (not followed by e) with a "ue?" regex. BTW: if your search form does not explicitly tell the browser to use utf8 to encode the search field, you might expect a small proportion of iso-latin1 requests. Regards Wolfgang
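For the accent-folding part of the question (n matching ñ), the contrib module unaccent is one packaged option - a sketch, assuming PostgreSQL 9.1's CREATE EXTENSION (on 9.0, the contrib SQL script has to be loaded instead), with table and column names invented. Note it maps ü to u, so it does not by itself handle the German ue transliteration convention discussed above:

```sql
create extension unaccent;

-- compare normalized forms on both sides of the match
select * from people
 where unaccent(name) ilike '%' || unaccent('muñoz') || '%';
```

Keeping a pre-unaccented extra column (as described above for German) avoids calling unaccent() per row at search time.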
[GENERAL] how to improve this similarity query?
Hi, I have a table of names, and searches are usually performed on prefix match. This could nicely translate into an index search. Suppose a first name is stored as either 'Jim' or 'Jimmy'; searching

... where firstname ~* '^jim';

gets the proper result. I had hoped that creating a functional index on lower(firstname) and using a query like

where lower(firstname) ~ '^jim'

would improve the search, but it does not. I ended up adding a lowercased column for matching. Now a few names (a few percent of the entire lot) have alternates, like 'James'. These could be nicknames, informal variants, language variants, alternate spellings. I have already split off these few percent into a separate table and can query that like

... where 'jim' ~* firstname_pattern;

There are two problems with this approach: when I use 'Jimbo' for the plain query, I do not get any 'Jim' or 'Jimmy', as expected. To achieve the same effect on the pattern query, I need to decorate the pattern somewhat. Actually, when I specify 'J(im|ames)' for the pattern, it gets preprocessed - and is stored in the database as - 'J(im|am($|e($|s)))$'. Unfortunately there are regex patterns which the preprocessing script cannot handle, so I might have to try a different regex. The other, bigger, problem: the search cannot make use of an index, and it has to compile a regex for every entry in the table. I am considering a change to that part: in the Jim/James case it is obvious that I could speed up the query with

where firstname_pattern ~* '^j' and 'jim' ~* firstname_pattern;

If the pattern was 'Bob|Robert' instead, I would have to change the preprocessing so the 'B' and 'R' parts would be separate. So, I wonder whether there is any better way of doing these. I have looked into tsquery resp. fulltext, but they do not seem to support prefix matches, only exact ones.
Regards Wolfgang Hamann
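On the index question buried in there: in a non-C locale, a plain btree index is not usable for pattern matching, which may be why the functional index on lower(firstname) did not help. The text_pattern_ops operator class is the usual fix - a sketch, with the table name invented:

```sql
-- btree index usable for left-anchored pattern matches
-- regardless of locale:
create index names_lower_pat_idx
    on names (lower(firstname) text_pattern_ops);

-- a pattern with a constant prefix can then use the index:
select * from names where lower(firstname) like 'jim%';
select * from names where lower(firstname) ~ '^jim';
```

The planner extracts the constant prefix ('jim') as an index range condition; a pattern with a variable prefix still cannot use the index.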
Re: [GENERAL] looking for a faster way to do that
Eduardo Morras nec...@retena.com wrote: At 08:04 25/09/2011, haman...@t-online.de wrote:

select * from items where regexp_matches(items.code,'(ABC) (DE1) (any_substring)'){};

Hi Eduardo, it is clear that scanning the table once with a list of matches will outperform rescanning the table for every string wanted. Now, my problem is that the patterns are dynamic as well. So if I could translate a table with one column and a few thousand rows into something like

regexp_matches(code,'string1|string2|...|string2781')

it would ideally be a performant query. Unfortunately I have no idea how I could achieve this transformation inside the database. Doing it externally fails, because any single query cannot be more than so few characters.

You can create a plsql function and pass a setof text that does it. Sorry, but instead of saying you What Must You Type, WMYT(c), I prefer the How Should You Do way, HSYD(c). Note that you can get the same results using other approaches (f.ex. using FTS described in chapter 12). Check these topics:

Function Creation http://www.postgresql.org/docs/9.0/static/sql-createfunction.html
Tutorial about Function Creation http://www.adderpit.com/practical-postgresql/x10374.htm

Hi, I tried the PL/pgSQL approach to convert the contents of that patterns table into a regex. Results: 40 seconds runtime for 9500 candidates and 815 patterns; 718 seconds for the same set of 9500 candidates, but using 4000 patterns instead. So it seems that I am reaching the limits of pattern matching. As for the fulltext index (and the underlying tsquery): this is an exact match rather than a prefix match, so I would need to know the match patterns in advance in order to build the index. I am thinking about that anyway (because ABC1234 likely should not match an ABC123 pattern in my context), but I would sort of prefer a system where I can state the rules when I see the data set, rather than having to pre-create an index.
Thanks for the tutorial link :) It seems that the responses to my post give all sorts of input that will help me on other tasks. Regards Wolfgang Hamann
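The "translate a table of patterns into one alternation" step can also be done inside the database, e.g. with string_agg (available since 9.0) - a sketch using the items/n tables from this thread, which sidesteps the client-side query-length limit:

```sql
-- Build one big '^(p1|p2|...|pN)' pattern from the pattern table,
-- then match it against every code. The same pattern value is used
-- for every row, so it should be compiled only once rather than
-- once per pattern row.
select i.*
from items i
cross join (select '^(' || string_agg(wantcode, '|') || ')' as pat
            from n) p
where i.code ~ p.pat;
```

A multi-thousand-branch regex can still be slow to compile and match, so this does not remove the scaling limit observed above, only the per-row recompilation.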
Re: [GENERAL] looking for a faster way to do that
Alban Hertroys haram...@gmail.com wrote:

Hi, the strings are not really known before. Let me explain the scenario: there is one table about products, and code is the manufacturer's (or reseller's) product id. So, if ABC were a maker of laptops, ABC123 and ABC456 might be two different machines, and ABC123G might have G3 mobile installed, or ABC123X might be the same thing with extra memory. Obviously these device variants all look the same. Now a reseller sends us a collection of product images, so there would be ABC123.jpg and ABC456.jpg. The database task at hand is matching products to images (and then inserting the image name into a column of the products table).

I guessed right then. The origin of your problem is that you have similar items in your database, but the database doesn't know they are similar. I'd fix that first, it makes the problem a whole lot easier to handle. For example, if an image comes in named ABC123G.jpg, you look up the product and manufacturer and update its image. Then you query for products of the same manufacturer that are similar to ABC123G (result: ABC123 and ABC123X) and update their images as well (if appropriate; perhaps they have a recent enough image of their own?).

As another whacky alternative to your regular expressions: I think it would be possible to abuse the text-search functionality in Postgres to match product id's. Those id's are basically a language per manufacturer, describing product details. If you can split the product id's up into lexemes that describe the base product id and its options, then you can use full-text search to match up expressions similar to the lexemes derived from the image name. For example:

productid | lexemes
----------+-------------------
ABC123    | {'ABC' '123'}
ABC123G   | {'ABC' '123' 'G'}
ABC123X   | {'ABC' '123' 'X'}
ABC456    | {'ABC' '456'}

I'm not really sure if that's possible, or how much work it would be per manufacturer - I haven't used FTS much.
I'd first see if I couldn't add that similarity information to the products table, though ;) Alban Hertroys -- The scale of a problem often equals the size of an ego.

Hi, the actual process tends to be: add products to the database, then receive images and try to match them to products. So I know about the naming scheme only when I see a list of files, and would want to have a cmdline option for my matching script that distinguishes formats like a) exact, b) alpha suffix following numeric main body, c) period or slash between main and related. To make things even more complex, I might receive images from a reseller that offers a few manufacturers using different conventions. I wonder whether this would translate well into building a temporary index, if I detect b or c patterns. When I asked first, I also started working on a different approach. This did work on the current case (I got similar datasets before, but with considerably fewer items), so I will probably try that and some ideas I got from this discussion, and see how far I get. This approach is a separate perl script that builds a tree structure of all the image names and then tries to create a regex for a crude filter. In the example it would have determined that all images in the lot match an ABC prefix. Then it selects all matching codes (so it can work with the entire database) and runs them through the same tree structure. Regards Wolfgang Hamann
Re: [GENERAL] looking for a faster way to do that
Eduardo Morras wrote: Hi, if I understand this right, it does not mean "check if the string appears at position 0", which could translate into an index query, but rather "check if the string appears anywhere and then check if that is position 0", so the entire table is checked.

The second one yes, as it checks all the patterns you want only one time per row, so they only need one table scan. The first one eliminates the substring 'ABC' from the string; if the lengths of both strings are equal, the substring 'ABC' wasn't in it. If they are different, the trimmed string will be shorter.

explain analyze select items.num, wantcode from items, n where strpos(code, wantcode) = 0;

Nested Loop  (cost=167.14..196066.54 rows=39178 width=36) (actual time=0.074..36639.312 rows=7832539 loops=1)
  Join Filter: (strpos(("inner".code)::text, "outer".wantcode) = 0)
  ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32) (actual time=0.005..2.212 rows=815 loops=1)
  ->  Materialize  (cost=167.14..263.28 rows=9614 width=42) (actual time=0.007..13.970 rows=9614 loops=815)
        ->  Seq Scan on items  (cost=0.00..167.14 rows=9614 width=42) (actual time=0.044..14.855 rows=9614 loops=1)
Total runtime: 46229.836 ms

The query ran much faster than the pattern query, however. This seems to be the performance of just searching for a plain string vs. initializing the regex engine every time (for 815 queries in a test set).

It will do only one table scan, while your original code will do one for each substring you want to test. You can add more and more substrings without too much cost. If you want to use the regex engine instead of the postgresql string functions, check regexp_matches(); it should be faster if you have 3000 substrings.

select * from items where regexp_matches(items.code,'(ABC) (DE1) (any_substring)'){};

Hi Eduardo, it is clear that scanning the table once with a list of matches will outperform rescanning the table for every string wanted. Now, my problem is that the patterns are dynamic as well.
So if I could translate a table with one column and a few thousand rows into something like

regexp_matches(code,'string1|string2|...|string2781')

it would ideally be a performant query. Unfortunately I have no idea how I could achieve this transformation inside the database. Doing it externally fails, because any single query cannot be more than so few characters. Regards Wolfgang Hamann
Re: [GENERAL] looking for a faster way to do that
Alban Hertroys haram...@gmail.com wrote: What is the output of explain? You say 'the other table', so presumably we're dealing with a foreign key here. Is there an index on that column?

Albe Laurenz wrote: Is the index used for where code ~ '^ABC3563'? If not, then the result is fast only because the table is scanned only once, and it's just the factor of 3000 that's killing you. The second query (where code ~ wantcode) can never use an index because the pattern wantcode is unknown at query planning time. Yours, Laurenz Albe

Here I created a subset (just number and code matching a certain prefix):

\d items
   Table "pg_temp_1.items"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 num    | integer               |
 code   | character varying(40) |

create index itemsc on items (code);

select count(*) from items;
 count
-------
  9614

A single anchored query

select * from items where code ~ '^ABC';

does indeed use the index to retrieve data. Next I copied a file of wanted codes:

create temp table n (wantcode text);
\copy n from /tmp/rmartin.tmp

The file contains plain names, i.e. unanchored matches.

explain analyze select num, n.wantcode from items, n where items.code ~ n.wantcode;

Nested Loop  (cost=20.00..216502.14 rows=48070 width=36) (actual time=148.479..336280.488 rows=2871 loops=1)
  Join Filter: (("outer".code)::text ~ "inner".wantcode)
  ->  Seq Scan on items  (cost=0.00..167.14 rows=9614 width=42) (actual time=0.048..38.666 rows=9614 loops=1)
  ->  Materialize  (cost=20.00..30.00 rows=1000 width=32) (actual time=0.001..1.049 rows=815 loops=9614)
        ->  Seq Scan on n  (cost=0.00..20.00 rows=1000 width=32) (actual time=0.003..1.839 rows=815 loops=1)
Total runtime: 336286.692 ms

An exact match (where items.code = n.wantcode) on the same data completes in 40 ms. BTW: indexing the second table does not affect the query plan or the runtime, it just shows an actual row count rather than an estimate. This is, of course, bad; an anchored match could be faster and also is more appropriate to the scenario.
So I change the contents of the second table

update n set wantcode = textcat('^', wantcode);

and try again, with similar results:

 Nested Loop  (cost=14.15..176478.01 rows=39178 width=36) (actual time=125.114..308831.697 rows=2871 loops=1)
   Join Filter: (("outer".code)::text ~ "inner".wantcode)
   ->  Seq Scan on items  (cost=0.00..167.14 rows=9614 width=42) (actual time=0.061..2034.572 rows=9614 loops=1)
   ->  Materialize  (cost=14.15..22.30 rows=815 width=32) (actual time=0.001..1.095 rows=815 loops=9614)
         ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32) (actual time=0.114..1.893 rows=815 loops=1)
 Total runtime: 308837.746 ms

I am aware that this is unlikely to work fast: the planner would perhaps need a hint in the query, rather than in the data column, to choose an anchored match algorithm (in case there is such an algorithm, of course). So I wonder whether there might be a different approach to this problem rather than pattern matching.

I recall I had a similar problem before, with a contacts column possibly containing one or more email addresses. There, searches would also scale as number of people times number of requests. I finally ended up with an @@ match (contrib/tsquery) and a supporting GiST index, but that only supports exact match, not prefix.

Regards
Wolfgang Hamann

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
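[Editorial aside: one approach not tried in the thread is to exploit the fact that every wanted code is a plain prefix. A prefix match is equivalent to a range comparison, which a btree index on items.code can serve. This is a hedged sketch, not tested against the thread's data; it assumes n.wantcode holds the bare prefix (no '^' anchor) and that codes sort bytewise (C locale), so chr(255) can act as a high sentinel. With a non-C collation, a text_pattern_ops index would be the usual variant.]

```sql
-- Hypothetical rewrite of the prefix join as a range join.
-- Assumes n.wantcode holds the bare prefix (no '^' anchor) and that
-- codes sort bytewise (C locale); chr(255) bounds the range from above.
SELECT i.num, n.wantcode
FROM   n
JOIN   items i
  ON   i.code >= n.wantcode
 AND   i.code <  n.wantcode || chr(255);
```

Whether the planner actually picks the index here depends on statistics and collation, but unlike `code ~ wantcode`, the range form is at least indexable in principle.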
Re: [GENERAL] looking for a faster way to do that
>> Eduardo Morras wrote:
>> You can try these, i doubt they will use any index but its a different
>> approach:
>>
>> select * from items where length(items.code) <> length(rtrim(items.code,'ABC'));
>>
>> select * from items where strpos(items.code,'ABC')=0 or
>> strpos(items.code,'any_substring')=0;

Hi,

if I understand this right, it does not mean "check whether the string appears at position 0", which could translate into an index query, but rather "check whether the string appears anywhere and then check whether that is position 0", so the entire table is checked.

explain analyze select items.num, wantcode from items, n where strpos(code, wantcode) = 0;

 Nested Loop  (cost=167.14..196066.54 rows=39178 width=36) (actual time=0.074..36639.312 rows=7832539 loops=1)
   Join Filter: (strpos(("inner".code)::text, "outer".wantcode) = 0)
   ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32) (actual time=0.005..2.212 rows=815 loops=1)
   ->  Materialize  (cost=167.14..263.28 rows=9614 width=42) (actual time=0.007..13.970 rows=9614 loops=815)
         ->  Seq Scan on items  (cost=0.00..167.14 rows=9614 width=42) (actual time=0.044..14.855 rows=9614 loops=1)
 Total runtime: 46229.836 ms

The query ran much faster than the pattern query, however. This seems to be the difference between just searching for a plain string and initializing the regex engine every time (for 815 patterns in the test set).

Regards
Wolfgang Hamann
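[Editorial aside: strpos() in PostgreSQL is 1-based and returns 0 when the substring is absent, so the prefix test is position 1, not 0. The "= 0" join above actually pairs each code with every wantcode it does not contain, which is why it returned 7832539 rows: exactly the 815 x 9614 candidate pairs minus the 2871 matches. A sketch of the intended prefix test, using the thread's table and column names:]

```sql
-- strpos() returns the 1-based position of the first occurrence,
-- or 0 if the substring does not occur at all.
-- So "wantcode is a prefix of code" is strpos(code, wantcode) = 1.
SELECT items.num, wantcode
FROM   items, n
WHERE  strpos(code, wantcode) = 1;
```

Still a sequential comparison with no index use, but it selects the intended rows.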
Re: [GENERAL] looking for a faster way to do that
>> Alban Hertroys wrote:
>> So you're comparing a variable field value to a variable pattern - yeah,
>> that's going to hurt. There's no way you could index exactly that.
>>
>> Perhaps there's some way you can transform the problem so that you get
>> something indexable? For example, if your match patterns follow a
>> certain pattern by themselves, you could add a column with the longest
>> match pattern that would match the string. Then you could just do a
>> query for which records have the match pattern (in that new column)
>> that you're looking for and voila!
>> If something like that is possible strongly depends on what kind of
>> match patterns you're using, of course.

Hi Alban,

I already did that - the test set is just all records from the real table (about a million entries) that match the common 'ABC' prefix.

>> An exact match (where items.code = n.wantcode) on the same data
>> completes in 40 ms
>> That's an exact string match, of course that will be fast ;)

The main difference is: the fast query looks like

explain select items.num, wantcode from items, n where code = wantcode;

 Merge Join  (cost=53.56..1104.02 rows=39178 width=36)
   Merge Cond: (("outer".code)::text = "inner".wantcode)
   ->  Index Scan using itemsc on items  (cost=0.00..438.75 rows=9614 width=42)
   ->  Sort  (cost=53.56..55.60 rows=815 width=32)
         Sort Key: n.wantcode
         ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32)

and the slow ones look like this one:

 Nested Loop  (cost=14.15..176478.01 rows=39178 width=36)
   Join Filter: (("outer".code)::text ~ "inner".wantcode)

So the database takes an entirely different approach to retrieving the entries.

Regards
Wolfgang Hamann
Re: [GENERAL] looking for a faster way to do that
>> Hi Alban, I already did that - the test set is just all records from the
>> real table (about a million entries) that match the common 'ABC' prefix

I think you misunderstood what I wrote. Notice the difference between "which strings match the pattern" and "which records have the match pattern (in that new column)" - the first is a regular expression match (unindexable), while the second is a string equality match (indexable).

What I'm suggesting is to add a column, which for the string 'ABCDEFG' would contain 'ABC%'. Data would look like:

SELECT str, pattern FROM tbl;
   str   | pattern
---------+---------
 ABCDEFG | ABC%
 ABCDEF  | ABC%
 BCDEFGH | BCD%
etc.

(can't format this properly in webmail, sorry)

When you look for records that match the pattern 'ABC%', you would normally perform a query like:

SELECT str FROM tbl WHERE str LIKE 'ABC%';

But with this new column, you would query:

SELECT str FROM tbl WHERE pattern = 'ABC%';

As I said, it depends a lot on your pattern needs whether this solution would work at all for you. If you only ever use a few patterns, it will work. If you use many different patterns or don't know beforehand which patterns will be used, it won't work well at all.

>> The main difference is: the fast query looks like
>>
>> explain select items.num, wantcode from items, n where code = wantcode;
>>
>>  Merge Join  (cost=53.56..1104.02 rows=39178 width=36)
>>    Merge Cond: (("outer".code)::text = "inner".wantcode)
>>    ->  Index Scan using itemsc on items  (cost=0.00..438.75 rows=9614 width=42)
>>    ->  Sort  (cost=53.56..55.60 rows=815 width=32)
>>          Sort Key: n.wantcode
>>          ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32)

Is there an index on wantcode? If you have a million or more records, I would expect an index scan for a measly 815 matches...
>> and the slow ones look like this one:
>>
>>  Nested Loop  (cost=14.15..176478.01 rows=39178 width=36)
>>    Join Filter: (("outer".code)::text ~ "inner".wantcode)
>>
>> So the database takes an entirely different approach to retrieving the
>> entries.

Yes, because you're still using ~ there, with a pattern that's unknown at query planning time. That will only be fast under some fairly rare circumstances.

Hi,

the problem is that I read the patterns from a file, as part of the script. They are not known separately, so it seems that creating the extra column is just the same effort as running the original query. The processing is one-time in nature. The one thing I can do is selecting a range of items on a common prefix, if all the codes in the second table have some characters in common.

>> Is there an index on wantcode? If you have a million or more records, I
>> would expect an index scan for a measly 815 matches...

When I ran a test, there was no real difference with wantcode indexed or not. It was interesting to try another suggestion and note the difference between comparison functions, with identical (lack of) use of an index.

Regards
Wolfgang Hamann
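[Editorial aside: the "selecting a range of items on a common prefix" idea mentioned above can be sketched as a pre-filter. This is a hedged illustration using the thread's table and column names; it assumes every wanted code starts with the literal prefix 'ABC' and that wantcode holds the bare, unanchored code:]

```sql
-- Hedged sketch: one indexed anchored scan cuts the candidate set down
-- before the per-pattern comparison runs. Only valid when every
-- wantcode shares the literal prefix used in the anchored match.
WITH candidates AS (
    SELECT num, code FROM items WHERE code ~ '^ABC'
)
SELECT c.num, n.wantcode
FROM   candidates c, n
WHERE  strpos(c.code, n.wantcode) = 1;
```

For a one-time job against a million-row table this trades one index range scan plus a small nested loop for a full-table nested loop.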
[GENERAL] looking for a faster way to do that
Hi,

I have one large table (about a million entries) with an indexed column containing codes like ABC3561A, ABC3563X, 72-451-823 etc. (lots of order numbers from different manufacturers).

When I ask for a specific item

select code ... where code = 'ABC3563X'

I get a fast result. I also get a fast result when doing a prefix match

select code ... where code ~ '^ABC3563'

If I am retrieving many items by joining with another table

select code ... where code = wantcode

this is still fast. If I try to get many items on a prefix match

select code ... where code ~ wantcode

things go very slow. Explain shows a nested loop, so seemingly the table is rescanned for every wanted item in the other table. A test run (3000 wanted codes against a shortened table) took about 200 seconds to complete.

What other queries could I use to get the requested selection?

Regards
Wolfgang