[GENERAL] arrays, inline to pointer
Hi, I have some whopper tables for machine learning. One table has a handful of columns holding 200-500 element double precision arrays (representing feature vectors). It's a 9.5 on an SSD (over USB3). Each table has 5-15M rows. For each array column I've added and populated, any dealings with the table have become way, way slower. I can only assume this is because the array data is inline in the data block on disk that stores the row. Is it possible to "have a pointer instead", or is the only workaround a child table with the vector only in it? Best regards, Marcus -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
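For context: values wider than roughly 2 kB are normally compressed and moved out of line into the table's TOAST table automatically, so a 200-500 element float8 array (1.6-4 kB) sits right at that threshold. Two things worth trying, sketched here with hypothetical table and column names, are forcing out-of-line storage for the column, or the child-table split mentioned in the question:

```sql
-- Force the array column out of the main heap rows (TOAST), so scans that
-- don't touch feature_vec no longer drag the vector data through the cache.
-- Note: SET STORAGE only affects newly stored values; existing rows keep
-- their layout until rewritten.
ALTER TABLE items ALTER COLUMN feature_vec SET STORAGE EXTERNAL;

-- Or keep the vectors in a side table, joined only when actually needed:
CREATE TABLE item_vectors (
    item_id     integer PRIMARY KEY REFERENCES items(objectid),
    feature_vec double precision[] NOT NULL
);
```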
[GENERAL] efficient math vector operations on arrays
Hi, Are there highly efficient C extensions out there for math operations on arrays? Dot product and whatnot. Example use case: sort items by Euclidean distance. Kind regards, Marcus
Re: [GENERAL] efficient math vector operations on arrays
On 24/12/15 07:13, Pavel Stehule wrote: Hi 2015-12-24 8:05 GMT+01:00 Marcus Engene <meng...@engene.se <mailto:meng...@engene.se>>: Hi, Are there highly efficient C extensions out there for math operations on arrays? Dot product and whatnot. what do you mean by "highly efficient"? Implemented as a C module so I won't have to unnest or use plpgsql. Kind regards, Marcus
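For comparison, the pure-SQL baseline being avoided here can be written as ordinary SQL functions (a sketch; `unnest ... WITH ORDINALITY` needs 9.4+, and a C implementation would avoid the unnest/join cost):

```sql
-- Dot product of two float8[] of equal length:
CREATE OR REPLACE FUNCTION dot(a float8[], b float8[]) RETURNS float8 AS $$
    SELECT sum(x * y)
      FROM unnest(a) WITH ORDINALITY AS t1(x, i)
      JOIN unnest(b) WITH ORDINALITY AS t2(y, j) ON i = j
$$ LANGUAGE sql IMMUTABLE;

-- Euclidean distance, usable in an ORDER BY:
CREATE OR REPLACE FUNCTION euclid(a float8[], b float8[]) RETURNS float8 AS $$
    SELECT sqrt(sum((x - y)^2))
      FROM unnest(a) WITH ORDINALITY AS t1(x, i)
      JOIN unnest(b) WITH ORDINALITY AS t2(y, j) ON i = j
$$ LANGUAGE sql IMMUTABLE;
```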
[GENERAL] Trigger to a queue for sending updates to a cache layer
Hi, I'm working with a retail site with tens of millions of products in several languages. For the detail pages, we try to cache in memcached. We also have quite a bit of keyword translation lookups (for international queries to Solr). We're thinking of adding a nosql layer that takes the big beating with the mundane reads. I don't fancy manually adding code to every possible place in the PHP code that touches the related tables. What fast trigger solution would you use for this? I'm thinking of something that would just push (via fdw?) data to a queue; table name and item id. Naturally it should not slow down pg noticeably, and it should have stellar uptime. Is the whole concept a stupid idea? Thanks, Marcus
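One common shape for this, sketched with hypothetical names (NOTIFY payloads need 9.0+): an AFTER trigger publishes "table:id" on a channel, and a small daemon LISTENs and forwards invalidations to the cache, keeping the in-transaction cost to a single pg_notify() call.

```sql
CREATE OR REPLACE FUNCTION cache_invalidate() RETURNS trigger AS $$
BEGIN
    -- payload is "table_name:id"; a listening daemon pushes it to the cache
    PERFORM pg_notify('cache_queue', TG_TABLE_NAME || ':' || NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER product_cache_inval
AFTER INSERT OR UPDATE ON product
FOR EACH ROW EXECUTE PROCEDURE cache_invalidate();
```

Caveat: notifications are dropped if no listener is connected, so a durable queue table is the safer variant if missed invalidations matter.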
[GENERAL] copy expensive local view to an RDS instance
Hi, I have a local db behind a firewall etc. Basically, I'd like to do what I would do locally:

create table abc as select * from local_expensive_view;

abc - on RDS
local_expensive_view - on local machine

How would you go about doing this? Thanks, Marcus
Re: [GENERAL] copy expensive local view to an RDS instance
On 06/05/14 16:58, bricklen wrote: A very quick search shows that RDS supports dblink, so perhaps that would work. http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html Then I'd need to open our servers to external visits. It would be lovely if dblink_exec could push a subselect of data instead of pull from RDS. Does this make sense? Thanks, Marcus
Re: [GENERAL] copy expensive local view to an RDS instance
On 06/05/14 17:15, bricklen wrote: On Tue, May 6, 2014 at 8:07 AM, Marcus Engene meng...@engene.se mailto:meng...@engene.se wrote: On 06/05/14 16:58, bricklen wrote: A very quick search shows that RDS supports dblink, so perhaps that would work. http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html Then I'd need to open our servers to external visits. It would be lovely if dblink_exec could push a subselect of data instead of pull from RDS. Does this make sense? Is the idea to pull data from the RDS to your local machine? If so, dblink or possibly plproxy[1] should be able to do that. [1] https://wiki.postgresql.org/wiki/PL/Proxy Sorry, no, I want to push data from my local machine, but I've only seen examples of push with dblink_exec and literal values rather than a subselect or something. Best regards, Marcus
[GENERAL] insert data over dblink from bulk select of local table
Hi, I have a production database with some heavy views. I'd like to make an hourly cron (or something) that bulk inserts new entries into a table on RDS. It's easy to get data down to the prod side, but is it possible to do a bulk insert the other way around? All examples I've seen add the data in a literal way. Thanks, Marcus
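On 9.3 and later, one way to push rows without literal VALUES is postgres_fdw: define the RDS table as a local foreign table and run a plain INSERT ... SELECT. A sketch, with hypothetical connection details and columns:

```sql
-- One-time setup on the local (pushing) side:
CREATE EXTENSION postgres_fdw;
CREATE SERVER rds FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'myinstance.rds.amazonaws.com', dbname 'prod');
CREATE USER MAPPING FOR CURRENT_USER SERVER rds
    OPTIONS (user 'rds_user', password 'secret');
CREATE FOREIGN TABLE abc_remote (id integer, payload text)
    SERVER rds OPTIONS (table_name 'abc');

-- Hourly cron job: rows travel from the local view up to RDS
INSERT INTO abc_remote SELECT id, payload FROM local_expensive_view;
```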
[GENERAL] varchar lengths
Hi list, In Oracle I can...

create table a
(
 b varchar2(10 chars)
);

...and then, regardless of character encoding and how much space an ascii character vs an ö takes, 10 characters will fit there. If I do say a web-thing in PHP I have to do horrors like...

if (10 < mb_strlen($b, '8bit')) {
    // fail to explain to the user why it's too long
}

I could be liberal with the length and instead do a...

create table a
(
 b varchar(40)
);

...but I like constraints and this isn't a mysql list so I don't have to motivate that. Is there anything I've misunderstood? How do the rest of you deal with this situation? Thanks, Marcus
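For what it's worth, varchar(n) in PostgreSQL limits characters, not bytes, so a multibyte 'ö' counts as one. A quick check (assuming a UTF-8 database):

```sql
CREATE TABLE a (b varchar(10));
INSERT INTO a VALUES ('öööööööööö');              -- 10 characters: accepted
SELECT char_length(b), octet_length(b) FROM a;    -- 10 characters, 20 bytes
```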
Re: [GENERAL] varchar lengths
On 9/21/10 1:29 , Terry Lee Tucker wrote: On Tuesday, September 21, 2010 07:23:45 Massa, Harald Armin wrote: I recommend to use TEXT as type for that kind of columns. 99 out of 100 theories of "this value will never be longer than xx characters" fail in the long run. And text, limited only by PostgreSQL's limits, performs as well as or better than varchar(length_limit). The time of "we can only allow n chars for first name for performance reasons" has gone by, together with walkmen and VHS. Harald Also, if you are absolutely set on a constraint on the length of the text, you can use a trigger for this and when the constraint changes, and it will, you simply modify the trigger. Thanks for your answers! Richard was completely right of course. I hadn't actually tested this since 8.0 but now it works splendidly. Apologies for the noise. I do use text in several places but in some, where it's motivated, I like to use constrained lengths. May it be tables that hold data that goes to legacy systems, indexed columns (such as username) or the name of a product. Have a lovely Tuesday everyone, Marcus
[GENERAL] Plan for in with one argument
Hi List, With automated queries where I have COLUMN IN (<subselect>), I get a different plan from COLUMN = (<subselect>). That would make sense if there are several arguments, but in the case with just one argument the penalty can be severe. One query went from 5s execution time to a few hundred milliseconds when I changed IN to = when the number of arguments is 1. Is there a technical reason for not treating IN with one argument as = in that case?

pondDump=> explain analyze
select pic.objectid as pic_objectid
from pond_item_common pic
where pic.pond_user IN (select pu2.objectid
                        from pond_user pu2
                        where username_locase IN ('iceberger'))
limit 100;

QUERY PLAN
Limit (cost=15.41..396.89 rows=100 width=4) (actual time=0.047..0.061 rows=11 loops=1)
  -> Nested Loop (cost=15.41..1400.19 rows=363 width=4) (actual time=0.046..0.056 rows=11 loops=1)
     -> HashAggregate (cost=8.28..8.29 rows=1 width=4) (actual time=0.026..0.026 rows=1 loops=1)
        -> Index Scan using pond_user_c2 on pond_user pu2 (cost=0.00..8.27 rows=1 width=4) (actual time=0.017..0.018 rows=1 loops=1)
           Index Cond: ((username_locase)::text = 'iceberger'::text)
     -> Bitmap Heap Scan on pond_item_common pic (cost=7.13..1387.36 rows=363 width=8) (actual time=0.015..0.024 rows=11 loops=1)
        Recheck Cond: (pic.pond_user = pu2.objectid)
        -> Bitmap Index Scan on pond_item_common_x1 (cost=0.00..7.04 rows=363 width=0) (actual time=0.012..0.012 rows=11 loops=1)
           Index Cond: (pic.pond_user = pu2.objectid)
Total runtime: 0.181 ms
(10 rows)

pondDump=> explain analyze
select pic.objectid as pic_objectid
from pond_item_common pic
where pic.pond_user = (select pu2.objectid
                       from pond_user pu2
                       where username_locase IN ('iceberger'))
limit 100;

QUERY PLAN
Limit (cost=15.41..395.88 rows=100 width=4) (actual time=0.043..0.055 rows=11 loops=1)
  InitPlan 1 (returns $0)
    -> Index Scan using pond_user_c2 on pond_user pu2 (cost=0.00..8.27 rows=1 width=4) (actual time=0.017..0.018 rows=1 loops=1)
       Index Cond: ((username_locase)::text = 'iceberger'::text)
  -> Bitmap Heap Scan on pond_item_common pic (cost=7.13..1388.27 rows=363 width=4) (actual time=0.042..0.053 rows=11 loops=1)
     Recheck Cond: (pond_user = $0)
     -> Bitmap Index Scan on pond_item_common_x1 (cost=0.00..7.04 rows=363 width=0) (actual time=0.038..0.038 rows=11 loops=1)
        Index Cond: (pond_user = $0)
Total runtime: 0.096 ms
(9 rows)

Best regards, Marcus
Re: [GENERAL] Plan for in with one argument
On 7/11/10 12:42 , Alban Hertroys wrote: On 11 Jul 2010, at 11:38, Marcus Engene wrote: Hi List, With automated queries where I have COLUMN IN (<subselect>), I get a different plan from COLUMN = (<subselect>). That would make sense if there are several arguments, but in the case with just one argument the penalty can be severe. One query went from 5s execution time to a few hundred milliseconds when I changed IN to = when the number of arguments is 1. Is there a technical reason for not treating IN with one argument as = in that case? It does that already for constant IN-lists:

=> create table test (id serial PRIMARY KEY);
=> insert into test (id) SELECT nextval('test_id_seq') from generate_series(1, 1);
=> ANALYZE test;
=> explain analyse select * from test where id IN (15);

QUERY PLAN
Index Scan using test_pkey on test (cost=0.00..8.27 rows=1 width=4) (actual time=0.024..0.029 rows=1 loops=1)
  Index Cond: (id = 15)
Total runtime: 0.102 ms
(3 rows)

However, you're using a subquery to get the IN-list. I'm pretty sure the planner cannot know for certain that your subquery will return only one row, so it cannot substitute your IN(subquery) with =(subquery). You'd probably be better off using an EXISTS instead of an IN there; that should certainly help for cases where the subquery returns many records, but it also gives the planner a better idea of your intentions. Hi Alban, This makes great sense both in theory and in empirical tests. Thanks for the explanation. Best regards, Marcus
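Applied to the query in question, the suggested EXISTS form would read something like this (a sketch):

```sql
select pic.objectid as pic_objectid
from pond_item_common pic
where exists (select 1
              from pond_user pu2
              where pu2.objectid = pic.pond_user
              and pu2.username_locase = 'iceberger')
limit 100;
```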
[GENERAL] short circuit joins
Hi,

-- a
select ...
from base_table bt
left outer join sub_table_1 st1 on (st1.base_table = bt.id)
left outer join sub_table_2 st2 on (st2.base_table = bt.id)

-- b
select ...
from base_table bt
left outer join sub_table_1 st1 on (bt.objecttype = 1 AND st1.base_table = bt.id)
left outer join sub_table_2 st2 on (bt.objecttype = 2 AND st2.base_table = bt.id)

Pretend this is some stupid way of object orientation. base_table is the base class and sub_table_x are subclasses. base_table.objecttype tells which instance it is. Just for the sake of discussion, it could also be something like bt.is_married_monogamous = 1 AND wife.husband = bt.id for example. In case b, does Postgres avoid unnecessarily looking in the respective sub_table? Is it worthwhile to have? I'm on 8.3 presently. Still curious if newer versions have some optimization here. Best regards, Marcus
[GENERAL] somewhat slow query with subselect
Hi list, I have a table, not that many rows (400k) but with about 55 columns. There are some maintenance selects I want to do (every now and then) that I don't want to add indexes for.

select ...
from quite_big_table qbt
where qbt.owner = 123
AND exists (
    select null
    from quite_big_table qbt2
    where qbt2.owner = qbt.owner
    AND qbt2.id <> qbt.id
    AND qbt2.filelength = qbt.filelength
)

In a case with some 5000 rows belonging to owner 123, this select really takes a long time. Way longer than without the subselect and order by filelength. I agree that with the subselect it would take some extra juice, but in my mind it would build some hash in memory which wouldn't be too slow to look up in.

shared_buffers = 16GB
temp_buffers = 1GB
work_mem = 32MB
maintenance_work_mem = 1GB
# server has 64GB (64bit)

I guess there is some penalty because of the size (# of columns) of the table since it has to go through more blocks on the disk. Would it have been beneficial if filelength was in a joined child table instead? How would you do this? Create a temporary table with owner+filelength? Thanks! Marcus
Re: [GENERAL] somewhat slow query with subselect
Tom Lane wrote: Marcus Engene meng...@engene.se writes: ... In a case with some 5000 rows belonging to owner 123, this select really takes a long time. Way longer than without the subselect and order by filelength. I agree that with the subselect it would take some extra juice, but in my mind it would build some hash in memory which wouldn't be too slow to look up in. 8.4 can turn EXISTS subqueries into hash joins, but previous releases won't... regards, tom lane Thank you very much for your answer, Tom. I tried to join the table instead and it was way faster.

Sort (cost=46769.87..46770.51 rows=258 width=48)
  Sort Key: pic.filesize
  -> Nested Loop (cost=34.30..46759.54 rows=258 width=48)
     Join Filter: ((picsame.objectid <> pic.objectid) AND (pic.filesize = picsame.filesize))
     -> Nested Loop (cost=8.27..3099.28 rows=16 width=56)
        -> HashAggregate (cost=8.27..8.28 rows=1 width=4)
           -> Index Scan using user_c2 on user pu2 (cost=0.00..8.27 rows=1 width=4)
              Index Cond: ((username_locase)::text = 'prolificarts'::text)
        -> Index Scan using item_common_x1 on item_common pic (cost=0.00..3081.41 rows=767 width=52)
           Index Cond: (pic.user = pu2.objectid)
     -> Bitmap Heap Scan on item_common picsame (cost=26.03..2715.34 rows=767 width=16)
        Recheck Cond: (picsame.user = pic.user)
        -> Bitmap Index Scan on item_common_x1 (cost=0.00..25.84 rows=767 width=0)
           Index Cond: (picsame.user = pic.user)

Best regards, Marcus
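The join form behind that plan, reconstructed as a sketch from the plan's aliases and conditions (the full query isn't shown in the message):

```sql
select pic.objectid
from "user" pu2
join item_common pic on pic."user" = pu2.objectid
join item_common picsame
     on picsame."user" = pic."user"
    and picsame.objectid <> pic.objectid
    and picsame.filesize = pic.filesize
where pu2.username_locase = 'prolificarts'
order by pic.filesize;
```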
Re: [GENERAL] where in (select array)
Richard Huxton wrote: I imagine it's cheaper disk dump wise to do the array thing in this and some other similar usages, and therefore it would be nice to have a non-ugly usage pattern. Don't imagine, test. And then factor in the cost of fiddling around with arrays when you need to access individual values. And the cost of the time you spent working on all this. On my dev 8.2.4 I get, using real values from a db dump:

931873 might-like rows
46539 might-like vector rows

Might like (row version): 10s to dump the second time, 38MB txt, 4MB gzip
Might like vector: 2s to dump the second time, 7.6MB text, 2MB gzip

Might like (row version):
explain cost, my in (...) version: ~200
explain cost, join on: ~670
explain cost, virtual table *): ~670
*) select ... from (select ...) as a.b

Might like vector:
explain cost, my in (...): 1669

If there had been a generate_series function for vectors, the choice would have been easy I think. Best regards, Marcus
[GENERAL] where in (select array)
Hi List, I have the might_like table that contains products a user might like if he likes the present one (item).

CREATE TABLE might_like (
    item INTEGER NOT NULL
    ,created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
    ,child INTEGER NOT NULL
) WITHOUT OIDS;
CREATE INDEX might_like_x1 ON might_like(item);

Since there are (will be) hundreds of thousands of items, each with 20+ might-like items, I thought it would be nice to reduce the set to 1/20th by using a vector.

CREATE TABLE might_like_vector (
    item INTEGER NOT NULL
    ,created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
    ,child_arr INTEGER[]
) WITHOUT OIDS;
CREATE INDEX might_like_vector_x1 ON might_like_vector(item);

But then this doesn't work:

select ...
from item pic
where pic.objectid in (
    select mlv.child_arr
    from might_like_vector mlv
    where mlv.item = 125
    AND mlv.created_at > now() - interval '1 week'
)
limit 16

This seems to work but is ugly:

select ...
from item pic
where pic.objectid in (
    select mlv.child_arr[s.a]
    from might_like_vector mlv
    ,generate_series(1,20) as s(a)
    where mlv.item = 125
    AND mlv.created_at > now() - interval '1 week'
)
limit 16

Is there a better way to do it? Thanks, Marcus
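On 8.4 and later, unnest() removes the need for the generate_series trick (a sketch; on 8.2 an equivalent set-returning function can be defined by hand):

```sql
select ...
from item pic
where pic.objectid in (
    -- unnest() expands the array into one row per element
    select unnest(mlv.child_arr)
    from might_like_vector mlv
    where mlv.item = 125
    AND mlv.created_at > now() - interval '1 week'
)
limit 16
```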
Re: [GENERAL] where in (select array)
Richard Huxton wrote: Marcus Engene wrote: Hi List, I have the might_like table that contains products a user might like if he likes the present one (item). CREATE TABLE might_like ( item INTEGER NOT NULL ,created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL ,child INTEGER NOT NULL ) WITHOUT OIDS; CREATE INDEX might_like_x1 ON might_like(item); Since there are (will be) hundreds of thousands of items, each with 20+ might-like items, I thought it would be nice to reduce the set to 1/20th by using a vector. CREATE TABLE might_like_vector ( item INTEGER NOT NULL ,created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL ,child_arr INTEGER[] ) WITHOUT OIDS; You haven't reduced the set at all, you've just turned part of it sideways. You might gain something on your search, but I'm guessing you've not tested it. Hmm - the attached script generates 100,000 items and 10 liked ones for each (well, for the first 99,990 it says you like the next 10 items). They're all given different timestamps at day intervals, which means you'll end up with six or seven matches for your sample query. Sorry, I was a bit unclear. This is run on a web server. The might-like rows are generated if they are not available for an item when the corresponding page is generated. The one-row-per-might-like-item version has been online since yesterday and has, as this is written, 1/2M rows in it. The caching is primarily initiated by a Google indexer agent. This might-like generation is expensive, so caching it in the db is a must, and the CPU use of the web server halved when this caching was put live and had cached the most commonly used items. When the might-like data is read from the database, the generated presentation html is put in memcached with a timeout of 1h (presently). Memcached here is probably way overkill, but using it in situations like this makes the site more scalable to add webservers and postpones the problem of clustering pg.
So with memcached I care less about saving a few ms of select latency and more about postponing other approaching problems, like keeping the db dump manageable. Right now it's a 100MB gzipped dump, which is very manageable, so where it's possible I'd like to keep the data compact. I imagine it's cheaper, disk-dump wise, to do the array thing in this and some other similar usages, and therefore it would be nice to have a non-ugly usage pattern. Also, we're going to collect usage statistics to further optimize the behavior of the site, and I'm really worried about those millions of rows. If this is a bad approach to the problem I'm very interested to hear it. Regardless of the best approach, a proper solution to the subquery IN would be super appreciated too :) Thanks for your answer! Best regards, Marcus
Re: [GENERAL] Stroring html form settings
Dianne Yumul wrote: Hello, I have some html forms that I save the settings into the database, things like which item was selected in the menu and if a checkbox was checked. The table looks like this:

user_id | report_id | info
--------+-----------+--------------------------------------------
    111 |         1 | A:CHECKEDCHECKEDCHECKED::
    111 |         2 | A:::CHECKED::
    111 |         3 | A::CHECKED:CHECKED::CHECKEDCHECKED:::

The info column has the settings separated with a : and consecutive colons mean the user didn't make a selection. Would this be the way to store them? I've done some searching and I could use XML (I will read some tutorials after writing this email). But there may be other ways and I'm just too much of a newbie to know. I'm using Postgresql 8.1.11 and PHP on CentOS 5.2. Thanks in advance. Dianne Hi, I would serialize to JSON instead of XML. http://www.aurore.net/projects/php-json/ A simple json_encode($_POST) might do the trick. You could either use json_decode() to make a PHP struct of it or send it more or less as-is to Javascript. I usually find XML horrible to work with in a web context. Best regards, Marcus
[GENERAL] regexp_replace() [noindex] thing
Hi! I'm using tsearch2 and sometimes there are blocks of text that shouldn't be indexed. There is a trigger function that gathers data from the usual suspects and updates the index. In this trigger, I'd like to exclude things in a [noindex] tag:

select regexp_replace ('innan[noindex]apa[/noindex]klas[noindex]banan[/noindex]', '(\\\[noindex\\\])+.*?(\\\[/noindex\\\])+', ' ')

I would like to have a function like the above that returns 'innan klas' for this data. I would have expected it to, as I use the non-greedy version. Could someone please point me in the general direction here? Best regards, Marcus
Re: [GENERAL] regexp_replace() [noindex] thing
Tom Lane wrote: Marcus Engene [EMAIL PROTECTED] writes: I would like to have a function like the above that returns 'innan klas' for this data. I would have expected it to, as I use the non-greedy version.

regression=# select regexp_replace ('innan[noindex]apa[/noindex]klas[noindex]banan[/noindex]', '\\[noindex\\].*?\\[/noindex\\]', ' ', 'g');
 regexp_replace
----------------
 innan klas
(1 row)

Super! Best regards, Marcus
Re: [GENERAL] match an IP address
Phoenix Kiula wrote: If you don't want to store IPs for registered users, I'd use: user_id INTEGER, ip cidr, CONSTRAINT must_have_userstamp CHECK ( user_id IS NOT NULL OR ip IS NOT NULL) ... and yes, I'd use a functional index to look it up, or even a trigger-maintained cache of the text representation if I had to. Then Ok, this is an idea. And I do agree that multifunction fields are a potential pain in the distant future. My questions: 1. What extra tax will this constraint levy on an INSERT or UPDATE on this table? There are about 100,000 inserts a day, and over three times as many UPDATES. The concurrency is pretty high -- I mean sometimes 1,000 users at the same time but no more than that. If the additional cost of insertion/updating is not too heavy, I suppose this could be a nice approach. 2. Why not have an INET field...why a CIDR? What's the benefit? It stores those pesky .../8 type additional data which one has to mask with functions. Would INET work just as well? 3. Storage wise does this add significantly? How much space does an INET field take as opposed to, say, a VARCHAR field? 4. Most importantly, how would you structure the index for this? I would much rather have a fast = in my sql's WHERE clause. No OR etc. Any thoughts? Thanks Use the best of two worlds - consider memcached and use the db only when you create/update an entry so that you can restore it if memcached (perhaps as a consequence of a server reboot) gets restarted. http://www.slideshare.net/vishnu/livejournals-backend-a-history-of-scaling best regards, Marcus
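A minimal sketch of the layout discussed above (table and index names hypothetical): inet is fixed-size (7 bytes for an IPv4 host address, 19 for IPv6) versus up to 15 characters plus overhead as varchar, it validates on input, and a plain btree index serves straight = lookups with no functional index needed.

```sql
CREATE TABLE hits (
    user_id integer,
    ip      inet,
    CONSTRAINT must_have_userstamp CHECK (user_id IS NOT NULL OR ip IS NOT NULL)
);
CREATE INDEX hits_ip_x1 ON hits (ip);

-- A fast equality probe; the CHECK costs one cheap expression per write:
SELECT * FROM hits WHERE ip = '192.0.2.1';
```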
Re: [GENERAL] Getting several columns from subselect with LIMIT 1
Pierre Thibaudeau wrote: I have a PERSONS table. I also have a PROGENY table, which is a many-to-many association table with two foreign keys relating the PERSONS table to itself. (In this day and age, not only can an individual have any number of children, but also a person can have any number of parents! At least, let's assume that's true for the sake of this setup.) Suppose I wish to construct a view of the persons, along with the name of their first-born (if they have one; NULL otherwise). The following SELECT does just that: SELECT persons.*, ( SELECT child.name FROM progeny JOIN persons child ON child.id = progeny.child WHERE progeny.parent = persons.id ORDER BY child.birthdate ASC LIMIT 1 ) AS firstborn_name FROM persons; Now, this is probably not the most elegant piece of code, but the real problem is that I cannot see how to extend it to the case where I want not only the firstborn's name but also the firstborn's ID (short of repeating the entire subselect a second time). At the moment, with this current syntax, my subSELECT statement would not be allowed to return more than a single column. Any suggestion? Would this work?

select p.*
      ,kid.*
from persons p
    ,( SELECT child.name, child.id
       FROM progeny
       JOIN persons child ON child.id = progeny.child
       WHERE progeny.parent = p.id
       ORDER BY child.birthdate ASC
       LIMIT 1
     ) as kid(kid_name, kid_id)

best regards, Marcus
[GENERAL] index on id and created_at
Hi, If I have a table like...

CREATE TABLE apa (
    objectid SERIAL PRIMARY KEY NOT NULL
    ,created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
    ,somedata text
) WITHOUT OIDS;

...where, if rowX has a higher objectid than rowY, it is implied that rowX has a later created_at than rowY.

select objectid
from apa
where created_at > now() - interval '1 day'
order by objectid desc;

In this select, it would have been nice to dodge the full table scan without adding an index to created_at. Is this possible somehow? Best regards, Marcus
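If the objectid/created_at correlation is guaranteed, one workaround is to record a daily watermark and turn the timestamp filter into a primary-key range scan (a sketch; the watermark table is hypothetical):

```sql
-- Once a day, e.g. from cron:
CREATE TABLE apa_watermark (day date PRIMARY KEY, max_objectid integer NOT NULL);
INSERT INTO apa_watermark SELECT current_date, max(objectid) FROM apa;

-- "Rows from the last day" then walks the primary key index backwards:
select objectid
from apa
where objectid > (select max_objectid from apa_watermark
                  where day = current_date - 1)
order by objectid desc;
```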
Re: [GENERAL] stem tsearch2, want different stemmed words
Oleg Bartunov wrote: On Sat, 30 Jun 2007, Marcus Engene wrote: Hi! bond=> SELECT to_tsvector('default','animation animal'); to_tsvector: 'anim':1,2 (1 row) Sorry for a silly question, I wonder, how do I override this? I would want different stemmed words for these. create synonym dictionary. Read about this http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes Many thanks! For future googlers: do check what was in your pg_ts_cfgmap before updating;

update pg_ts_cfgmap
set dict_name='{ts_p5_syn,en_stem}'
where ts_name='default'
and tok_alias in ('lword', 'lpart_hword', 'lhword');

;-P Best regards, Marcus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] stem tsearch2, want different stemmed words
Hi!

bond=> SELECT to_tsvector('default','animation animal');
 to_tsvector
-------------
 'anim':1,2
(1 row)

Sorry for a silly question, I wonder, how do I override this? I would want different stemmed words for these. Best regards, Marcus
Re: [GENERAL] performance; disk bad or something?
Tom Lane wrote: Marcus Engene [EMAIL PROTECTED] writes: -> Index Scan using apa_item_common_x1 on apa_item_common aic (cost=0.00..4956.68 rows=1174 width=8) (actual time=19.854..9557.606 rows=1226 loops=1) If the table only has 12000 rows then it should never have used an index scan here at all --- a plain seqscan is usually the best bet for retrieving 10% of a table. Are you using nondefault planner settings? How big is the table physically (VACUUM VERBOSE output about it might tell something)? Hi and thanks for your answer! All planner settings in postgresql.conf are commented out. Until yesterday the only setting I've poked with is shared_buffers. Best regards, Marcus

apa=> vacuum verbose apa_item_common;
INFO: vacuuming "public.apa_item_common"
INFO: index "apa_item_common_pkey" now contains 12863 row versions in 36 pages
DETAIL: 1246 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.01u sec elapsed 0.21 sec.
INFO: index "apa_item_common_x1" now contains 12863 row versions in 38 pages
DETAIL: 1246 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.01u sec elapsed 0.72 sec.
INFO: index "apa_item_common_fts" now contains 12863 row versions in 391 pages
DETAIL: 1246 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.01s/0.01u sec elapsed 1.16 sec.
INFO: index "apa_item_common_x2" now contains 12863 row versions in 36 pages
DETAIL: 1246 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.51 sec.
INFO: "apa_item_common": removed 9028 row versions in 3651 pages
DETAIL: CPU 0.24s/0.36u sec elapsed 30.69 sec.
INFO: "apa_item_common": found 9028 removable, 12863 nonremovable row versions in 14489 pages
DETAIL: 0 dead row versions cannot be removed yet. There were 76646 unused item pointers. 0 pages are entirely empty. CPU 0.64s/0.47u sec elapsed 84.91 sec.
INFO: vacuuming "pg_toast.pg_toast_181470"
INFO: index "pg_toast_181470_index" now contains 1040 row versions in 5 pages
DETAIL: 71 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.07 sec.
INFO: "pg_toast_181470": removed 618 row versions in 383 pages
DETAIL: CPU 0.01s/0.03u sec elapsed 4.55 sec.
INFO: "pg_toast_181470": found 618 removable, 1040 nonremovable row versions in 1288 pages
DETAIL: 0 dead row versions cannot be removed yet. There were 6121 unused item pointers. 0 pages are entirely empty. CPU 0.03s/0.04u sec elapsed 16.64 sec.
VACUUM
apa=>
Re: [GENERAL] performance; disk bad or something?
Michael Fuhr wrote: On Sat, Apr 07, 2007 at 09:28:58AM +0200, Marcus Engene wrote: INFO: apa_item_common: removed 9028 row versions in 3651 pages DETAIL: CPU 0.24s/0.36u sec elapsed 30.69 sec. INFO: apa_item_common: found 9028 removable, 12863 nonremovable row versions in 14489 pages DETAIL: 0 dead row versions cannot be removed yet. There were 76646 unused item pointers. How often does this table receive updates and deletes and how often are you vacuuming it? It averages less than one row per page (12863 nonremovable row versions in 14489 pages) and appears to have become quite bloated with dead rows sometime in the past (76646 unused item pointers). Use CLUSTER or VACUUM FULL + REINDEX to compact the table and run ANALYZE afterwards to update the statistics, then make sure you're vacuuming it often enough to keep it from becoming bloated again. Is your free space map sufficiently sized? If you do a database-wide VACUUM VERBOSE, what are the last few lines of the output that mention free space map settings? If I should take a guess, there are 5 deletes per day and 5 updates or inserts per hour. The table is 1.5 years old and I try to vacuum it once a week, although without FULL. I normally do a reindex as well. I've googled a bit to find optimizer hints a la Oracle's /*+ index(asdasd) */ but from what I can tell pg has chosen not to use that? I find them convenient for testing at least, even if I agree that one perhaps should avoid having them in a final product. Toggling rule/choose in Oracle is something I've often had use for too. The original select seems to be consistently fast now. That is good, but do I have a ticking bomb? 12k rows is little by any measure, and if it got so slow from a little bloat, won't it inevitably happen again? Worth mentioning is perhaps that I also have a tsearch2 index on each row, about 50 words each. But timed lookups on that index, which one would expect to be the slowest(?), always seem to be blazingly fast.
Is 8.2.x better at these simple things too, or is it mainly complex multithreadable queries that will benefit from it?

I hadn't touched any fsm settings but I've now set them to:

max_fsm_pages = 20          # min max_fsm_relations*16, 6 bytes each 20k
max_fsm_relations = 1       # min 100, ~50 bytes each 1k

Thanks for your help!
Marcus

apa=# vacuum full verbose apa_item_common;
INFO: vacuuming public.apa_item_common
INFO: apa_item_common: found 176 removable, 12866 nonremovable row versions in 14489 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 220 to 2032 bytes long.
There were 85496 unused item pointers.
Total free space (including removable row versions) is 103371272 bytes.
8673 pages are or will become empty, including 0 at the end of the table.
14479 pages containing 103370096 free bytes are potential move destinations.
CPU 0.38s/0.04u sec elapsed 60.17 sec.
INFO: index apa_item_common_pkey now contains 12866 row versions in 36 pages
DETAIL: 176 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.20 sec.
INFO: index apa_item_common_x1 now contains 12866 row versions in 38 pages
DETAIL: 176 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.84 sec.
INFO: index apa_item_common_fts now contains 12866 row versions in 396 pages
DETAIL: 176 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.01u sec elapsed 1.67 sec.
INFO: index apa_item_common_x2 now contains 12866 row versions in 36 pages
DETAIL: 176 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.88 sec.
INFO: apa_item_common: moved 10868 row versions, truncated 14489 to 1832 pages
DETAIL: CPU 1.77s/21.13u sec elapsed 294.11 sec.
INFO: index apa_item_common_pkey now contains 12866 row versions in 58 pages
DETAIL: 10868 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.02u sec elapsed 0.35 sec.
INFO: index apa_item_common_x1 now contains 12866 row versions in 69 pages
DETAIL: 10868 index row versions were removed.
4 index pages have been deleted, 4 are currently reusable.
CPU 0.00s/0.02u sec elapsed 0.40 sec.
INFO: index apa_item_common_fts now contains 12866 row versions in 671 pages
DETAIL: 10868 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.02s/0.04u sec elapsed 1.80 sec.
INFO: index apa_item_common_x2 now contains 12866 row versions in 67 pages
DETAIL: 10868 index row versions were removed.
21 index pages have been deleted, 21 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.34 sec.
INFO: vacuuming pg_toast.pg_toast_181470
INFO: pg_toast_181470: found 10 removable, 1040 nonremovable row
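Michael's compaction advice spelled out as commands (a sketch; the table and index names are the ones from this thread — both rewrites take exclusive locks, so run them in a quiet period):

```sql
-- Option 1: compact the heap, rebuild the indexes, then refresh statistics.
VACUUM FULL apa_item_common;
REINDEX TABLE apa_item_common;
ANALYZE apa_item_common;

-- Option 2: one pass that rewrites the heap in primary-key order and
-- rebuilds the indexes as a side effect.
CLUSTER apa_item_common_pkey ON apa_item_common;
ANALYZE apa_item_common;
```

Afterwards, plain VACUUM run often enough (or autovacuum, where available) keeps the table from bloating again.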
[GENERAL] performance; disk bad or something?
Hi,

There are some performance issues I am still confused over. I've got a Linux box, raid1, 1GB memory. CPU-wise the server is mostly idle. PG 8.0.10, with:

shared_buffers = 1
work_mem = 16348
maintenance_work_mem = 65536

Parameters are tweaked without much difference. The following query is over a 12000 row table, and apa_user is indexed. I've recently vacuumed the db, analyzed, and recreated indexes on apa_item_common. So how on earth could this query take 10s to complete? Are my disks bad? Are there other possible reasons? If there were a couple of million rows I would still think it would run way faster than this. I cannot see odd messages in dmesg. It more or less started to act like this over night.

Best regards,
Marcus

apa=> explain analyze
apa-> select
apa->     aic.objectid as aic_objectid
apa-> from
apa->     apa_item_common aic
apa-> where
apa->     aic.apa_user = 704 AND
apa->     aic.status = 30
apa-> ORDER BY aic.calc_rating desc
apa-> LIMIT 1000;

Limit (cost=5016.54..5019.04 rows=1000 width=8) (actual time=9560.471..9562.201 rows=1000 loops=1)
  -> Sort (cost=5016.54..5019.47 rows=1174 width=8) (actual time=9560.469..9561.065 rows=1000 loops=1)
       Sort Key: calc_rating
       -> Index Scan using apa_item_common_x1 on apa_item_common aic (cost=0.00..4956.68 rows=1174 width=8) (actual time=19.854..9557.606 rows=1226 loops=1)
            Index Cond: (apa_user = 704)
            Filter: (status = 30)
Total runtime: 9563.016 ms
(7 rows)

Running the same query again; now it is cached, then?

Limit (cost=5016.54..5019.04 rows=1000 width=8) (actual time=7.261..9.016 rows=1000 loops=1)
  -> Sort (cost=5016.54..5019.47 rows=1174 width=8) (actual time=7.258..7.870 rows=1000 loops=1)
       Sort Key: calc_rating
       -> Index Scan using apa_item_common_x1 on apa_item_common aic (cost=0.00..4956.68 rows=1174 width=8) (actual time=0.029..5.483 rows=1226 loops=1)
            Index Cond: (apa_user = 704)
            Filter: (status = 30)
Total runtime: 9.840 ms
(7 rows)

---(end of broadcast)---
TIP 6: explain analyze is your friend
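For what it's worth, a composite index matching both the filter and the sort is one way to let the planner satisfy the LIMIT without a separate sort step (a sketch, not from the thread; the index name is made up). It cannot cure the cold-cache heap reads that dominate the first run, though:

```sql
-- Hypothetical index: equality columns first, then the ORDER BY column,
-- so the planner can walk calc_rating in order and stop after 1000 rows.
CREATE INDEX apa_item_common_x9
    ON apa_item_common (apa_user, status, calc_rating);
```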
[GENERAL] speeding up a query
Hi,

I'm on 8.0.10 and there is a query I cannot quite get adequately fast. Should it take 2.5s to sort these 442 rows? Are my settings bad? Is my query stupid? Would appreciate any tips.

Best regards,
Marcus

apa=> explain analyze
apa-> select
apa->     ai.objectid as ai_objectid
apa-> from
apa->     apa_item ai
apa-> where
apa->     idxfti @@ to_tsquery('default', 'KCA0304') AND
apa->     ai.status = 30
apa-> ORDER BY ai.calc_rating desc
apa-> LIMIT 1000;

Limit (cost=54.40..54.43 rows=12 width=8) (actual time=2650.254..2651.093 rows=442 loops=1)
  -> Sort (cost=54.40..54.43 rows=12 width=8) (actual time=2650.251..2650.515 rows=442 loops=1)
       Sort Key: calc_rating
       -> Index Scan using apa_item_fts on apa_item ai (cost=0.00..54.18 rows=12 width=8) (actual time=61.261..2649.045 rows=442 loops=1)
            Index Cond: (idxfti @@ '''kca0304'''::tsquery)
            Filter: (status = 30)
Total runtime: 2651.659 ms
(7 rows)

apa=> explain analyze
apa-> select
apa->     ai.objectid as ai_objectid
apa-> from
apa->     apa_item ai
apa-> where
apa->     idxfti @@ to_tsquery('default', 'KCA0304') AND
apa->     ai.status = 30
apa-> LIMIT 1000;

Limit (cost=0.00..54.18 rows=12 width=4) (actual time=0.186..18.628 rows=442 loops=1)
  -> Index Scan using apa_item_fts on apa_item ai (cost=0.00..54.18 rows=12 width=4) (actual time=0.183..17.999 rows=442 loops=1)
       Index Cond: (idxfti @@ '''kca0304'''::tsquery)
       Filter: (status = 30)
Total runtime: 19.062 ms
(5 rows)
Re: [GENERAL] speeding up a query
Hi again,

I was thinking: in my slow query it seems the sorting is the villain. Doing a simple qsort test I notice that:

[EMAIL PROTECTED] /cygdrive/c/pond/dev/tt
$ time ./a.exe 430
real    0m0.051s
user    0m0.030s
sys     0m0.000s

[EMAIL PROTECTED] /cygdrive/c/pond/dev/tt
$ time ./a.exe 43000
real    0m0.238s
user    0m0.218s
sys     0m0.015s

[EMAIL PROTECTED] /cygdrive/c/pond/dev/tt
$ time ./a.exe 4300000
real    0m2.594s
user    0m2.061s
sys     0m0.108s

From this admittedly very unfair test I see that my machine has the capability to sort 4.3 million entries in the same time my pg is sorting 430. And I cannot stop wondering if there is some generic sorting routine that is incredibly slow? Would it be possible, in the situations where ORDER BY is on a single column of a simple datatype, to do a specialized sort, like the qsort example at the end of this mail? Is this already addressed in later versions? If no, why? And if yes, where in the pg code do I look?

Best regards,
Marcus

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int val;
    void *pek;
} QSORTSTRUCT_INT_S;

int sortstruct_int_compare(void const *a, void const *b)
{
    return ((QSORTSTRUCT_INT_S *)a)->val - ((QSORTSTRUCT_INT_S *)b)->val;
}

int main (int argc, char **argv)
{
    int nbr = 0;
    int i = 0;
    QSORTSTRUCT_INT_S *sort_arr = 0;

    if (1 == argc) {
        printf("forgot amount argument\n");
        exit(1);
    }
    nbr = atoi(argv[1]);
    if (0 == (sort_arr = malloc(sizeof(QSORTSTRUCT_INT_S) * nbr))) {
        printf("cannot alloc\n");
        exit(1);
    }
    srand(123);
    for (i = 0; i < nbr; i++) {
        sort_arr[i].val = rand();
    }
    qsort(sort_arr, nbr, sizeof(QSORTSTRUCT_INT_S), sortstruct_int_compare);
    return 0;
}
Re: [GENERAL] speeding up a query
Tom Lane skrev:

Marcus Engene [EMAIL PROTECTED] writes:

Should it take 2.5s to sort these 442 rows?

Limit (cost=54.40..54.43 rows=12 width=8) (actual time=2650.254..2651.093 rows=442 loops=1)
  -> Sort (cost=54.40..54.43 rows=12 width=8) (actual time=2650.251..2650.515 rows=442 loops=1)
       Sort Key: calc_rating
       -> Index Scan using apa_item_fts on apa_item ai (cost=0.00..54.18 rows=12 width=8) (actual time=61.261..2649.045 rows=442 loops=1)
            Index Cond: (idxfti @@ '''kca0304'''::tsquery)
            Filter: (status = 30)
Total runtime: 2651.659 ms

It's not the sort that's taking 2.5s --- the sort looks to be taking about a millisec and a half. The indexscan is eating the other 2649 msec. The question that seems to be interesting is what's the difference between the contexts of your two queries, because they sure look like the indexscans were the same. Maybe the second one is merely benefiting from the first one having already sucked all the data into cache?

regards, tom lane

Yes indeed you are completely right! Both queries take about the same time when run one after the other. And I just made a fool of myself with an optimizing idea I had... Sorry for the noise and thanks for your answer!

Best regards,
Marcus
[GENERAL] generate_series with month intervals
Hi list,

I'd like to generate the latest year dynamically with generate_series. This select works day-wise:

select date_trunc('month', now())::date + s.a
from generate_series(0, 11) as s(a)

I tried this but it didn't work:

select date_trunc('month', now())::date + interval s.a || ' months'
from generate_series(0, 11) as s(a)

This works but looks grotesque:

select distinct date_trunc('month', now()::date + s.a)::date
from generate_series(0, 365) as s(a)

Is there a way to do this more elegantly?

Best regards,
Marcus
Re: [GENERAL] generate_series with month intervals
Tom Lane skrev:

Marcus Engene [EMAIL PROTECTED] writes:

I tried this but it didn't work:

select date_trunc('month', now())::date + interval s.a || ' months'
from generate_series(0, 11) as s(a)

People keep trying that :-(. The typename 'foo' syntax is for a *literal constant* only. Instead use multiplication, something like

select (date_trunc('month', now())::date + s.a * '1 month'::interval)::date
from generate_series(0, 11) as s(a)

You'll want the cast back to date as the last step here because date + interval will give timestamp.

regards, tom lane

Wonderful! Worked like a charm. Thanks!

Marcus
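The same pattern anchored to a fixed starting month, which makes the result easy to eyeball (the anchor date here is just an example):

```sql
-- Twelve month starts beginning at an assumed anchor date.
select (date '2007-01-01' + s.a * interval '1 month')::date as month_start
from generate_series(0, 11) as s(a);
-- month_start: 2007-01-01, 2007-02-01, ..., 2007-12-01
```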
Re: [GENERAL] Changes in 8.2's PHP behaviour?
semi-ambivalent skrev:

All, I have a simple web page that inserts data into a table in 8.1.5 using PHP4. It's pretty amateurish but gets the job done. A few days ago I upgraded to 8.2 (this is on a FreeBSD system, and I used the port), but when I tried to use the web page this morning I got errors about:

PHP Fatal error: Call to undefined function: pg_escape_string() in /usr/local/www/data-dist/some_dir/some_dir/dataInsert.php on line 9, referer: http://localhost/some_dir/some_dir/dataEnter.php

I'm a pretty poor PHP person, and I think its error messages can be cryptic, but nothing has changed here except the version of postgresql. Does 8.2 handle input differently than 8.1.x does? Is that where to begin looking? I have seen talk about the handling of escape strings but thought it was a 'this way is good and this way is bad' kind of thing, not a 'this way will no longer work with postgresql' thing. FWIW, a quick pg_dump, a fallback to 8.1.5 and a restore, and things work again.

thanks for any pointers,
r

If a recompile of php is necessary, perhaps you might want to consider php 5.1, in which a kind soul implemented support for bind variables:

http://ch2.php.net/manual/en/function.pg-query-params.php

Best regards,
Marcus
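On the server side, pg_query_params() boils down to a parameterized statement; the same idea can be sketched in plain SQL (the table and statement names here are hypothetical):

```sql
-- A prepared statement with typed parameters: values travel separately
-- from the SQL text, so no string escaping is needed.
PREPARE ins_row(int, text) AS
    INSERT INTO some_table (id, txt) VALUES ($1, $2);
EXECUTE ins_row(1, 'O''Reilly');
DEALLOCATE ins_row;
```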
Re: [GENERAL] TOAD-like query builder for PostgreSQL?
[EMAIL PROTECTED] skrev:

I really like TOAD for building Oracle queries. Is there a TOAD-like, FOSS query builder for PostgreSQL?

Hi,

I use Toad for Oracle too. pgAdmin III is included with Postgres and it's good, but I prefer PG Lightning Admin, which I've been using daily for almost a year now. It's sold for $10 at the moment.

http://amsoftwaredesign.com/

The author seems responsive; I made him add the AltGr-Return from the Toad query editor ;-)

Best regards,
Marcus
Re: [GENERAL] Only MONO/WinForms is a way to go
John DeSoi skrev:

There are highly productive IDEs for the Mac with all the goodies you mention. But few are cross-platform. Your statement about Windows desktop market share is correct, but it is not the relevant point. Many people are interested in cross-platform tools because they want to serve the Windows desktop market, but not have to give up Linux or OS X to do it.

Personally, I bothered with cross-platform for my product because I wanted to develop it in Windows/Visual Studio. I have found nothing that comes near it on Linux, and I'm too old to waste time with emacs for idealistic reasons. Emacs serves a purpose, but not mine.

Best regards,
Marcus
Re: [GENERAL] IN clause
I see we have a C J Date fan on the list! ;-)

There is one other case where I personally find nullable columns a good thing: process_me-ish flags. When a row is not supposed to be processed, that field is null, and when a field is null it won't be in the index (at least on Oracle).

Best regards,
Marcus

Brandon Aiken skrev:

Hasn't it been said enough? Don't allow NULLs in your database. Databases are for storing data, not a lack of it. The only time NULL should appear is during outer joins.

--
Brandon Aiken
CS/IT Systems Engineer

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martijn van Oosterhout
Sent: Friday, November 24, 2006 7:20 AM
To: surabhi.ahuja
Cc: A. Kretschmer; pgsql-general@postgresql.org
Subject: Re: [GENERAL] IN clause

On Fri, Nov 24, 2006 at 05:31:07PM +0530, surabhi.ahuja wrote:

That is fine, but what I was actually expecting is this: I want

select * from table where col_name in (null, 'a', 'b');

to return those rows where col_name is null, or where it = a, or where it = b. But I think IN does not support null queries, am I right?

You'll need to check the standard, but IN() treats NULL specially. I think it returns NULL if any of the elements is null, or something like that. It certainly doesn't work the way you think it does.

Have a nice day,
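Surabhi's question deserves the concrete spelling-out (the table name here is hypothetical, the column name is from the quoted mail; the behaviour is per the SQL standard):

```sql
-- col_name IN (NULL, 'a', 'b') can never be true for a NULL col_name,
-- because NULL = NULL evaluates to NULL rather than true.
select * from some_table where col_name in (null, 'a', 'b');

-- The intended query has to test for NULL explicitly:
select * from some_table where col_name in ('a', 'b') or col_name is null;
```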
Re: [GENERAL] Discover temporary INDEX/TABLE name
Ilja Golshtein skrev:

Hello! How could I find out if a temporary table (or an index on a temporary table) was created by the current session? The problem is that something like

SELECT COUNT(*) FROM PG_INDEXES WHERE INDEXNAME='tmpind1'

does not work, since temporary indexes from other sessions are visible. I need a way to distinguish between temporary things belonging to the current session and others. I really do appreciate any help.

Hi,

Would it be terribly stupid of me to suggest you name the temporary things with, f.ex, pg_backend_pid() appended to the name? Or is the naming outside of your control?

Best regards,
Marcus
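The naming workaround in practice (a sketch; tmpind1 is the name from the question, and the index itself has to be created via dynamic SQL so the pid ends up in its name):

```sql
-- Each session creates and looks up its own uniquely named object.
select 'tmpind1_' || pg_backend_pid() as my_index_name;

select count(*)
from pg_indexes
where indexname = 'tmpind1_' || pg_backend_pid();
```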
Re: [GENERAL] encoding advice requested
Rick Schumeyer skrev:

I will have to try the WIN1252 encoding. On the client side, my application is a web browser. On the server side, it is php scripts on a linux box. The data comes from copying data from a browser window (pointing to another web site) and pasting it into an html textarea, which is then submitted. Given this, would you still suggest the WIN1252 encoding?

In my setup I compiled php with

--enable-zend-multibyte

...which makes all strings unicode internally (I suppose they use wchar_t instead of char or something). Thus mb_*() are (from what I can tell) not necessary (for me) anymore. Do use a fairly recent php, if not only for the bind variables in the pg api. In php.ini I've got

default_charset = utf-8
mbstring.internal_encoding = UTF-8

In the html head:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

The db is in utf-8. It has flawlessly saved everything I've tossed at it, including all sorts of apostrophes. I've copy-pasted chinese, hebrew, swedish, arabic... texts into the textarea with no other problem than that hebrew and arabic make most sense written from right to left ;-)

Best regards,
Marcus
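The corresponding server-side checks, for completeness (a sketch):

```sql
-- What encoding the database stores, and what this session claims to send.
SHOW server_encoding;            -- UTF8 in the setup described above
SET client_encoding = 'UTF8';    -- make the client's claim explicit
```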
[GENERAL] tsearch2() with data from other table
Hi,

I have a table with items. One field is site_user, which fk:s to the user row that owns the item. There are a few fields I throw to tsearch2() via an index right now, but one user wanted his name to be in the search as well. So, what I would want to do is:

CREATE FUNCTION euits(int) RETURNS text AS
    'select username || \' \' || firstname || \' \' || lastname
     from site_user where objectid = $1;'
LANGUAGE SQL;

CREATE TRIGGER site_item_fts
    BEFORE UPDATE OR INSERT ON site_item
    FOR EACH ROW EXECUTE PROCEDURE
    tsearch2(idxfti, name, keywords, keywords_cb, location_country,
             location_city, media_source, description, euits, site_user);

Pg accepts this, but when I do

update site_item set description = description

I get:

Query executed in 78 ms, 6 Row(s) Affected
WARNING: TSearch: 'site_user' is not of character type
WARNING: TSearch: 'site_user' is not of character type
...

I suppose I am doing something stupid here? If I remove euits I still receive the same error. Any suggestions are very welcome.

Best regards,
Marcus
Re: [GENERAL] tsearch2() with data from other table
Teodor Sigaev skrev:

CREATE FUNCTION euits(int) RETURNS text AS
    'select username || \' \' || firstname || \' \' || lastname
     from site_user where objectid = $1;'
LANGUAGE SQL;

CREATE TRIGGER site_item_fts
    BEFORE UPDATE OR INSERT ON site_item
    FOR EACH ROW EXECUTE PROCEDURE
    tsearch2(idxfti, name, keywords, keywords_cb, location_country,
             location_city, media_source, description, euits, site_user);

site_user is a table, isn't it? The tsearch2 trigger accepts only column names and functions with the prototype TEXT func(TEXT). To clarify, show your table definitions.

Hi,

I'm sorry I was unclear:

create table site_user (
     objectid integer
    ,firstname varchar(60)
    ,lastname varchar(60)
    ,username varchar(10)
    ...
) without oids;

create table site_item (
     objectid integer
    ,site_user integer -- fk to site_user.objectid
    ,description text
    ,idxfti tsvector
    ...
) without oids;

So, when updating site_item I want to fetch the names from site_user and give this data to tsearch2() along with other stuff from site_item.

Best regards,
Marcus
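One way around the limitation Teodor describes is to skip the generic tsearch2() trigger and compute the tsvector in a hand-written plpgsql trigger, where arbitrary expressions (including the euits() lookup) are allowed. A sketch against the definitions above, abbreviated to two columns for brevity:

```sql
CREATE FUNCTION site_item_fts_trig() RETURNS trigger AS $$
BEGIN
    -- Build the document from this row's columns plus the owner's names.
    NEW.idxfti := to_tsvector('default',
           coalesce(NEW.description, '')
        || ' ' || coalesce(euits(NEW.site_user), ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER site_item_fts
    BEFORE INSERT OR UPDATE ON site_item
    FOR EACH ROW EXECUTE PROCEDURE site_item_fts_trig();
```

The real version would concatenate the remaining columns (name, keywords, and so on) the same way.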
Re: [GENERAL] Indices for select count(*)?
Greg Stark wrote:

Alexander Scholz [EMAIL PROTECTED] writes:

Hi, thank you for your answer. Regarding the performance flaw when trying to find out how many records are currently being stored in the table, I don't see how an index should help... Nevertheless we've created a unique index on ID, but SELECT count(ID) from XYZ still takes 35 seconds*. (ID is the primary key, based on a sequence; select count(*) isn't faster.) So - what kind of indexing would speed this up then?

No form of indexing can speed this up. To answer, the server has to look at every record and count up how many of them should be included in your result.

Why couldn't it be possible to count the # of items in an index? The density of the information (items per inode/block/whatever it's called in btrees) is likely to be much higher, giving less disk i/o. I'm sorry if this has been discussed recently.

Best regards,
Marcus
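The stock answer of that era when an exact live count is needed cheaply (a sketch, not from the thread; XYZ is the table from the quoted mail): maintain the count with triggers and read one row instead of scanning millions. The single counter row serializes concurrent writers, so it suits read-mostly tables.

```sql
create table xyz_rowcount (n bigint not null);
insert into xyz_rowcount values (0);

create function xyz_count_trig() returns trigger as $$
begin
    if tg_op = 'INSERT' then
        update xyz_rowcount set n = n + 1;
    else  -- DELETE
        update xyz_rowcount set n = n - 1;
    end if;
    return null;  -- AFTER trigger: return value is ignored
end;
$$ language plpgsql;

create trigger xyz_count after insert or delete on xyz
    for each row execute procedure xyz_count_trig();

select n from xyz_rowcount;  -- instead of: select count(*) from xyz
```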
Re: [GENERAL] fts, compond words?
That's a simple case; what about languages such as Norwegian or German? They have compound words, and the ispell dictionary can split them into lexemes. But usually there is more than one variant of separation:

forbruksvaremerkelov
forbruk vare merke lov
forbruk vare merkelov
forbruk varemerke lov
forbruk varemerkelov
forbruksvare merke lov
forbruksvare merkelov

(notice: I don't know the translation, it is just an example. When we were working on compound word support we found a word which has 24 variants of separation!!) So the query 'a + forbruksvaremerkelov' will be awful:

a + ( (forbruk vare merke lov) | (forbruk vare merkelov) | ... )

Of course, that is an example just from the top of my head, but a solution for phrase search should work reasonably with such corner cases.

(Sorry for replying in the wrong place in the thread; I was away on a trip and unsubscribed meanwhile.)

I'm a Swede, and Swedish is similar to Norwegian and German. Take this example:

lång hårig kvinna
långhårig kvinna

Words are put together to make a new word with a different meaning. The first example means "tall hairy woman" and the second "woman with long hair". If I were on f.ex a dating site, I'd want the distinction. ;-) If not, I could enter both strings

(lång hårig | långhårig) kvinna

...which is perfectly acceptable. IMHO I don't see any point in splitting these words. Let's go back to the subject: what about a syntax like this:

idxfti @@ to_tsquery('default', 'pizza (Chicago | [New York])')

I.e. the exact-match string is always atomic. Wouldn't that be doable without any logical implications?

Best regards,
Marcus
[GENERAL] fts, compond words?
Hi,

I use the tsearch full text search with pg 8.0.3. It works great, but I wonder if it's possible to search for compound words? I.e. if I search for "New York" I want to get a match on "New York has traffic problems." but not on "New axe murderer incident in British York." Is this possible? I don't use any wrapper, just

select ... from ... where idxfti @@ to_tsquery('default', 'searchstring')

Thanks,
Marcus
Re: [GENERAL] fts, compond words?
Oleg Bartunov wrote:

On Mon, 5 Dec 2005, Marcus Engene wrote:

Hi, I use the tsearch full text search with pg 8.0.3. It works great, but I wonder if it's possible to search for compound words? I.e. if I search for "New York" I want to get a match on "New York has traffic problems." but not on "New axe murderer incident in British York." Is this possible? I don't use any wrapper, just

select ... from ... where idxfti @@ to_tsquery('default', 'searchstring')

The ranking function is what you need. Read the documentation.

Hi,

I realized from the documentation that I'm not looking for compound words after all; I meant an exact phrase. I can't see how to make rank tell me which results have an exact phrase? Like, there must be an occurrence of 'new' before 'york' (stemmed, so not really an exact phrase)? Is there something new in rank for pg 8.1?

Thanks!
Marcus
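Lacking a phrase operator in tsearch2 of that era, a common workaround was to let the index narrow the result to rows containing both words and then recheck the literal phrase on the raw text (a sketch; the table and column names are assumed, and the ILIKE recheck is stricter than the stemmed match, so stemmed variants of the phrase are lost):

```sql
-- Index does the heavy lifting; ILIKE enforces word order and adjacency.
select objectid
from some_item_table
where idxfti @@ to_tsquery('default', 'new & york')
  and description ilike '%new york%';
```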