[SQL] Partitioning by letter question
Hi, I was wondering if this was possible. I'm trying to partition a table, which is straightforward enough thanks to the great documentation, but I have a question: if I partition using something like a product_id, and have check constraints such as (id >= 1000 AND id < 2000), then everything is fine and the planner correctly uses the right subset of the tables. However, I would like to partition by the first letter, and a check constraint like substr(word, 1, 1) = 'a' is ignored by the planner. From reading the docs I understand that complicated check constraints are ignored, but this doesn't seem overly complicated. Am I doing something wrong, or is there another, better way to do this?

Thanks, John

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Partitioning by letter question
John Lister wrote:
> However, I would like to partition by the first letter, and a check constraint like substr(word, 1, 1) = 'a' is ignored by the planner. Am I doing something wrong, or is there another, better way to do this?

Have you tried: (word >= 'a' AND word < 'b')

Cheers

Thanks, I had my programming head on. One question: any ideas about what to put for the last one in the list? I thought something like (word >= 'z' AND word < '{'), which is based on the ASCII ordering, but my db is using UTF-8. I tried to check this by doing

select * from words where word > '' order by word limit 10;

which returns '.' as the first result (OK, not a word, but that is a different issue). But if I do

select * from words where word < '.' order by word desc limit 10;

I get '/...' as the first result, where I would expect ''; this doesn't seem consistent. I'm obviously missing some inherent sorting behaviour here, but I'm not sure what.

Thanks, John
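The range-style constraints suggested above can be sketched as a full inheritance-partitioning setup (table and column names here are illustrative, not from the original thread; note that in a UTF-8 database the comparisons follow the collation rather than raw byte order):

```sql
-- Hypothetical parent/child layout (PostgreSQL 8.x inheritance partitioning).
CREATE TABLE words (word text NOT NULL);

-- One child per initial letter: simple range checks that constraint
-- exclusion can use, unlike substr(word, 1, 1) = 'a'.
CREATE TABLE words_a (CHECK (word >= 'a' AND word < 'b')) INHERITS (words);
CREATE TABLE words_b (CHECK (word >= 'b' AND word < 'c')) INHERITS (words);
-- ... and so on for the remaining letters ...
-- Last partition: an open-ended range avoids relying on byte tricks like '{'.
CREATE TABLE words_z (CHECK (word >= 'z')) INHERITS (words);

-- With constraint exclusion enabled, this query should scan only words_a:
SET constraint_exclusion = on;
SELECT * FROM words WHERE word >= 'a' AND word < 'b';
```

Whether a given partition is excluded can be checked with EXPLAIN, which should show only the matching child table being scanned.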
[SQL] Partitioning improvements query
Hi all, I was just wondering if any progress has been made on improving partitioning, particularly performance-wise. I've found a few documents on the web, for example: http://wiki.postgresql.org/wiki/Table_partitioning, http://wiki.postgresql.org/wiki/Image:Partitioning_Requirements.pdf and http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#Partitioning_Roadmap, which mention improvements to partitioning, but I can't find any info on whether these have been acted on. Just curious, as things like pushing limits down to the sub-queries would be a great feature.

Cheers, John
[SQL] obtaining difference between minimum value and next in size
Hi, I was wondering if it is possible to do this with a single query rather than iterating over all of the rows in an application. I have a table which for brevity looks like:

create table offers (
    id integer,
    product_id integer,
    price double precision
);

where for each product there are a number of offers in this table. Now my question: is it possible to obtain the difference between just the minimum price and the next one up, per product? So say I have the following data:

id, product_id, price
123, 2, 10.01
125, 2, 10.05
128, 2, 11.30
134, 3, 9.45
147, 3, 11.42
157, 3, 12.08
167, 3, 12.09

then I would like the following returned:

product_id, difference
2, 0.04 (10.05 - 10.01)
3, 1.97 (11.42 - 9.45)

etc. Any ideas?

Thanks, John

-- Got needs? Get Goblin'! - http://www.pricegoblin.co.uk/
Re: [SQL] obtaining difference between minimum value and next in size
Cheers Oliveiros and Tom for your speedy replies. Unfortunately I'm using v8.3, so the new functions are out. A big credit to Oliveiros for his SQL fu; that seems to do exactly what I want, and I think I pretty much understand the query. I always forget the comparison on the rows when thinking about groups. John

----- Original Message -----
From: Oliveiros d'Azevedo Cristina
To: John Lister; pgsql-sql@postgresql.org
Sent: Wednesday, November 17, 2010 4:09 PM
Subject: Re: [SQL] obtaining difference between minimum value and next in size

Hi, John. I am not familiar with the functions Tom indicated, but I'm sure they constitute a much more straightforward way to solve your problem. Meanwhile, if you'd like to solve it with just SQL, give this a try and see if it gives you the result you want. Best, Oliveiros

SELECT product_id, MIN(pv2) - pv1
FROM (
    (SELECT product_id, MIN(price) AS pv1
     FROM offers
     GROUP BY product_id) firstSubQuery
    NATURAL JOIN
    (SELECT product_id, price AS pv2
     FROM offers) secondSubQuery
) total
WHERE pv1 <> pv2
GROUP BY product_id, pv1;
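For reference, on 8.4 and later the same result can be had with the window functions Tom alluded to; a minimal sketch against the offers table above:

```sql
-- Rank each product's offers by price, then diff the two cheapest.
SELECT product_id,
       MAX(price) - MIN(price) AS difference
FROM (
    SELECT product_id,
           price,
           ROW_NUMBER() OVER (PARTITION BY product_id
                              ORDER BY price) AS rn
    FROM offers
) ranked
WHERE rn <= 2          -- keep only the minimum and the next one up
GROUP BY product_id;
```

Note that a product with a single offer comes back with a difference of 0 here, whereas the join-based query omits it; which behaviour is wanted depends on the application.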
[SQL] Full text search ordering question
Hi, is it possible to order the results of a full text search using another field? For example, with the following table:

CREATE TABLE breadcrumbs (
    node_id integer NOT NULL,
    breadcrumb character varying,
    textsearchable tsvector,
    views integer,
    CONSTRAINT pk_breadcrumbs PRIMARY KEY (node_id)
);

I'd like to do something like this:

select node_id, views from breadcrumbs where textsearchable @@ to_tsquery('word') order by views desc limit 100;

As such, I'd like to create a full text search index on the textsearchable field and the views field such that it orders the results by the views column. At the moment this table has over 3M rows (and is likely to be magnitudes bigger), and some words match hundreds of thousands of rows. The best I've got so far is to create an FTS index, which is used, and then the resulting rows are sorted in memory. Unfortunately, because of the number of rows returned, this takes a few seconds. With a btree index I could index on the 2 columns and it would only hit the index and take a fraction of a second. I've tried the btree_gist module, but it doesn't make any difference (except in letting me use an int in the GiST index). Any ideas, or is this simply not possible?

Thanks
Re: [SQL] Full text search ordering question
Thanks for the reply, and apologies, it was my first post. I'm running on PG 8.3.3 (Ubuntu). I haven't tried GIN, as that doesn't support multi-column indexes, and I haven't used 8.4 as it is still in development. A sample query is as follows:

select node_id from breadcrumbs where textsearchable @@ to_tsquery('book') order by views desc limit 100;

explain analyze results in this:

Limit  (cost=10300.58..10300.83 rows=100 width=381) (actual time=69887.851..69887.880 rows=100 loops=1)
  ->  Sort  (cost=10300.58..10307.61 rows=2812 width=381) (actual time=69887.849..69887.862 rows=100 loops=1)
        Sort Key: views
        Sort Method: top-N heapsort  Memory: 84kB
        ->  Bitmap Heap Scan on breadcrumbs  (cost=171.49..10193.10 rows=2812 width=381) (actual time=60311.197..69574.742 rows=569519 loops=1)
              Filter: (textsearchable @@ to_tsquery('book'::text))
              ->  Bitmap Index Scan on idx_breadcr  (cost=0.00..170.79 rows=2812 width=0) (actual time=60261.959..60261.959 rows=569519 loops=1)
                    Index Cond: (textsearchable @@ to_tsquery('book'::text))
Total runtime: 69896.896 ms

As you can see, it sorts the full result set from the search. Ideally I'd like to use an index on the views. How stable is 8.4? Is it worth trying that, or is the multi-column GIN likely to be back-ported?

Thanks

John, it's a good tradition to include queries and their EXPLAIN ANALYZE output. The Pg version is also useful. Did you try a GIN index? In 8.4 you can use a GIN index on (views, tsvector).

On Tue, 25 Nov 2008, John Lister wrote:
> Hi, is it possible to order the results of a full text search using another field?

Regards, Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
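Given the popularity skew described above (common words match hundreds of thousands of rows), one workaround available on 8.3 is to let a plain btree index on views drive the scan for frequent terms; a sketch, assuming the breadcrumbs schema above:

```sql
-- A btree index ordered the way the query wants its output:
CREATE INDEX idx_breadcrumbs_views ON breadcrumbs (views DESC);

-- For a frequent term, walking views in descending order and filtering
-- each row against the tsquery can find 100 matches after visiting
-- relatively few rows, avoiding the sort of 500k+ results:
SELECT node_id, views
FROM breadcrumbs
WHERE textsearchable @@ to_tsquery('book')
ORDER BY views DESC
LIMIT 100;
```

Whether the planner actually chooses this path depends on its selectivity estimates for the @@ condition, which (as the plan above shows, 2812 estimated vs 569519 actual) can be badly off; for rare words this approach degrades to scanning most of the table, so it is a trade-off rather than a general fix.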
Re: [SQL] adding "order by" to a "group by" query
>(still curious about the "must be used in an aggregate function" error >though... because I do use it in an aggregate)

Your original query grouped on the person id and name; therefore you can only return (and order by) those columns, or the result of an aggregate function on other columns (such as the array_accum function). I'm no expert, but I think the error is slightly misleading: normally you would order by the result of an aggregate function, but maybe the parser does this implicitly for you sometimes. Does

select p.id_person, person_name(p), array_accum(distinct pt.type_fr)
from person p
left join person_to_event x using (id_person)
left join person_type pt using (id_person_type)
where person_name(p) ilike '%will%'
group by p.id_person, person_name(p)
order by 3;

work for you?
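For completeness, array_accum is not built in on 8.x; the PostgreSQL documentation's user-defined aggregates section defines it roughly like this:

```sql
-- Custom aggregate that collects values into an array,
-- per the example in the PostgreSQL manual.
CREATE AGGREGATE array_accum (anyelement)
(
    sfunc    = array_append,
    stype    = anyarray,
    initcond = '{}'
);
```

On 8.4 and later the built-in array_agg does the same job.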
Re: [SQL] Best way to "and" from a one-to-many joined table?
I guess it depends on the optimiser and how clever it is. With the former, the db will probably generate 2 sets of ids for the 2 joined tables (a, b) which only contain the values you require; these lists are probably much smaller than the total number of rows in the table, so any merges and sorts on them operate on fewer rows and will be quicker. With the latter query it has to fetch all the rows regardless of the attribute and then do the restriction at the end, which results in more rows, bigger merges and sorts, and takes longer. Obviously PostgreSQL may be clever enough to realise what you want and rearrange the query internally into a more efficient form.

Generally, to find out what it is doing, stick "EXPLAIN" (optionally "EXPLAIN ANALYZE") in front. This will show you the steps the db is taking to perform the query and in what order. If you include ANALYZE then the db actually executes the query (throwing away the results) and gives you accurate values, etc.; otherwise it shows you estimated values based on the various stats collected for the table.

SELECT person_name FROM test_people p
JOIN test_attributes a ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr))
JOIN test_attributes b ON ((b."people_id" = p."people_id") AND (b."attribute" = @secondAttr));

Hi, I saw a few people post answers to this question and it raised another related question for me. What are the differences between the above query and this one? Are they semantically/functionally identical but might differ in performance? Or would they be optimized down to an identical query? Or am I misreading them and they are actually different?

SELECT person_name FROM test_people p
JOIN test_attributes a ON (a.people_id = p.people_id)
JOIN test_attributes b ON (b."people_id" = p."people_id")
WHERE (a."attribute" = @firstAttr) AND (b."attribute" = @secondAttr);

Also, any suggestions about how to figure this out on my own without bugging the list in the future would be great. Thanks for any insight!

Steve

p.s. I'm posting in the same thread, but if you think I should have started a new thread let me know for the future.
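The EXPLAIN suggestion above looks like this in practice (the literal attribute values here are placeholders standing in for the @firstAttr/@secondAttr parameters):

```sql
-- Estimated plan only; the query is not executed:
EXPLAIN
SELECT person_name FROM test_people p
JOIN test_attributes a ON (a.people_id = p.people_id AND a."attribute" = 'height')
JOIN test_attributes b ON (b.people_id = p.people_id AND b."attribute" = 'weight');

-- Actually runs the query and reports real row counts and timings
-- alongside the estimates, so mis-estimates stand out:
EXPLAIN ANALYZE
SELECT person_name FROM test_people p
JOIN test_attributes a ON (a.people_id = p.people_id AND a."attribute" = 'height')
JOIN test_attributes b ON (b.people_id = p.people_id AND b."attribute" = 'weight');
```

Comparing the two plans for the ON-clause and WHERE-clause forms of the query is exactly how you'd answer the "are they optimized identically?" question yourself.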
[SQL] temp tables versus normal tables
I've got a process that every minute or so selects some data from a number of tables. At the minute I dump this into a normal table, where I do some more processing with it before truncating the table and starting again. I don't have any indexes on the temporary table, but have thought about adding some for the processing stage. My question is: would a proper temp table be any faster/better? Are they stored entirely in memory or written to disk? (I don't care if the data is lost on server restart/crash.) I suppose I could also use a view for the initial "insert into" and bypass the table altogether, but I'm guessing this would be slower, as the data would need to be looked up each time the view is used for the subsequent processing steps. Any thoughts?

Thanks, JOHN
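A sketch of the temp-table variant (table and column names are illustrative): temporary tables are private to the session, held in per-session temp buffers (spilling to disk only if they exceed temp_buffers), are not WAL-logged, and the ON COMMIT clause can replace the manual TRUNCATE:

```sql
-- Created once per session; dropped automatically when it ends.
CREATE TEMPORARY TABLE staging (
    id      integer,
    payload text
) ON COMMIT DELETE ROWS;  -- emptied at each commit, replacing TRUNCATE

BEGIN;
INSERT INTO staging
SELECT id, payload
FROM source_table
WHERE updated > now() - interval '1 minute';
-- ... further processing against staging here ...
COMMIT;  -- rows are discarded at this point
```

Indexes can be created on a temp table just like a normal one, and for a table rebuilt every minute it's worth running ANALYZE on it after each load so the processing queries get sensible plans.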
[SQL] Query planning question
Doing the following query:

select distinct m.id, m.name
from manufacturer_manufacturer m
join product_product p on (p.manufacturer_id = m.id)
join retailer_offer o on (o.product_id = p.id)
where o.retailer_id = XXX and o.active;

results in one of 2 query plans depending upon the value of XXX. The first ignores the index on products and does a hash join, which is very slow; the second uses the index and does a nested loop, which is fast. Am I right in assuming the planner thinks a sequential scan is quicker than 10k index hits? Would tweaking the costs fix this, or would I be better off updating the stats for the product_id and manufacturer_id fields?

Unique  (cost=318308.62..321110.94 rows=1029 width=13) (actual time=5057.271..5296.973 rows=699 loops=1)
  ->  Sort  (cost=318308.62..319242.73 rows=373642 width=13) (actual time=5057.270..5196.780 rows=455733 loops=1)
        Sort Key: m.id, m.name
        Sort Method: external merge  Disk: 11032kB
        ->  Hash Join  (cost=110196.74..283725.63 rows=373642 width=13) (actual time=1706.287..3451.352 rows=455733 loops=1)
              Hash Cond: (p.manufacturer_id = m.id)
              ->  Hash Join  (cost=110163.59..278554.90 rows=373642 width=4) (actual time=1705.652..3230.879 rows=455733 loops=1)
                    Hash Cond: (o.product_id = p.id)
                    ->  Bitmap Heap Scan on retailer_offer o  (cost=9418.68..157960.21 rows=373642 width=4) (actual time=120.277..382.208 rows=455733 loops=1)
                          Recheck Cond: ((retailer_id = 1347) AND active)
                          ->  Bitmap Index Scan on idx_retaileroffer_retailerid  (cost=0.00..9325.27 rows=373642 width=0) (actual time=79.503..79.503 rows=455829 loops=1)
                                Index Cond: (retailer_id = 1347)
                    ->  Hash  (cost=59067.07..59067.07 rows=2540307 width=8) (actual time=1584.994..1584.994 rows=2540324 loops=1)
                          ->  Seq Scan on product_product p  (cost=0.00..59067.07 rows=2540307 width=8) (actual time=0.008..698.313 rows=2540324 loops=1)
              ->  Hash  (cost=20.29..20.29 rows=1029 width=13) (actual time=0.627..0.627 rows=1029 loops=1)
                    ->  Seq Scan on manufacturer_manufacturer m  (cost=0.00..20.29 rows=1029 width=13) (actual time=0.007..0.278 rows=1029 loops=1)
Total runtime: 5310.663 ms

Unique  (cost=43237.52..43266.80 rows=1029 width=13) (actual time=190.978..196.625 rows=276 loops=1)
  ->  Sort  (cost=43237.52..43247.28 rows=3903 width=13) (actual time=190.977..192.431 rows=11298 loops=1)
        Sort Key: m.id, m.name
        Sort Method: quicksort  Memory: 1037kB
        ->  Hash Join  (cost=134.14..43004.70 rows=3903 width=13) (actual time=5.006..155.188 rows=11298 loops=1)
              Hash Cond: (p.manufacturer_id = m.id)
              ->  Nested Loop  (cost=100.99..42917.88 rows=3903 width=4) (actual time=4.363..146.421 rows=11298 loops=1)
                    ->  Bitmap Heap Scan on retailer_offer o  (cost=100.99..13663.63 rows=3903 width=4) (actual time=4.345..29.871 rows=11298 loops=1)
                          Recheck Cond: ((retailer_id = 1710) AND active)
                          ->  Bitmap Index Scan on idx_retaileroffer_retailerid  (cost=0.00..100.02 rows=3903 width=0) (actual time=2.368..2.368 rows=11380 loops=1)
                                Index Cond: (retailer_id = 1710)
                    ->  Index Scan using product_product_pkey on product_product p  (cost=0.00..7.48 rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=11298)
                          Index Cond: (p.id = o.product_id)
              ->  Hash  (cost=20.29..20.29 rows=1029 width=13) (actual time=0.634..0.634 rows=1029 loops=1)
                    ->  Seq Scan on manufacturer_manufacturer m  (cost=0.00..20.29 rows=1029 width=13) (actual time=0.009..0.275 rows=1029 loops=1)
Total runtime: 196.716 ms

Thanks
Re: [SQL] Query planning question
"John Lister" writes: Am I right in assuming the planner thinks a sequential scan is quicker than 10k index hits, would tweaking the costs fix this or would i be better updating the stats for the product_id and manufacturer_id fields? AFAICT the planner did exactly the right things here. Your first example is fetching 40 times as many rows from retailer_offer as the second one is. If the planner had stuck with the nestloop plan, it would've taken about 40x as long, and been significantly slower than the hash join. Cheers for the quick reply, maybe not the best values, see the following 2 plans with approx the same number of product rows but different results and times. I forgot to mention that the product table has 2.5M rows although this is apparent from the plans: with hash join: "Unique (cost=199627.47..199900.51 rows=1029 width=13) (actual time=2226.358..2238.255 rows=49 loops=1)" " -> Sort (cost=199627.47..199718.48 rows=36406 width=13) (actual time=2226.356..2230.342 rows=37086 loops=1)" "Sort Key: m.name, m.id" "Sort Method: quicksort Memory: 3276kB" "-> Hash Join (cost=101700.78..196869.37 rows=36406 width=13) (actual time=1759.983..2193.453 rows=37086 loops=1)" " Hash Cond: (p.manufacturer_id = m.id)" " -> Hash Join (cost=101667.62..196335.64 rows=36406 width=4) (actual time=1759.338..2174.826 rows=37086 loops=1)" "Hash Cond: (o.product_id = p.id)" "-> Bitmap Heap Scan on retailer_offer o (cost=921.66..84697.06 rows=36406 width=4) (actual time=12.168..49.759 rows=37086 loops=1)" " Recheck Cond: ((retailer_id = 5149) AND active)" " -> Bitmap Index Scan on idx_retaileroffer_retailerid (cost=0.00..912.56 rows=36406 width=0) (actual time=7.136..7.136 rows=37089 loops=1)" "Index Cond: (retailer_id = 5149)" "-> Hash (cost=59067.54..59067.54 rows=2540354 width=8) (actual time=1746.670..1746.670 rows=2540383 loops=1)" " -> Seq Scan on product_product p (cost=0.00..59067.54 rows=2540354 width=8) (actual time=0.012..787.095 rows=2540383 loops=1)" " -> Hash 
(cost=20.29..20.29 rows=1029 width=13) (actual time=0.635..0.635 rows=1029 loops=1)" "-> Seq Scan on manufacturer_manufacturer m (cost=0.00..20.29 rows=1029 width=13) (actual time=0.009..0.296 rows=1029 loops=1)" "Total runtime: 2244.036 ms" and without: "Unique (cost=43237.53..43266.80 rows=1029 width=13) (actual time=410.191..421.953 rows=332 loops=1)" " -> Sort (cost=43237.53..43247.29 rows=3903 width=13) (actual time=410.189..414.351 rows=32959 loops=1)" "Sort Key: m.name, m.id" "Sort Method: quicksort Memory: 3384kB" "-> Hash Join (cost=134.15..43004.71 rows=3903 width=13) (actual time=16.356..328.938 rows=32959 loops=1)" " Hash Cond: (p.manufacturer_id = m.id)" " -> Nested Loop (cost=100.99..42917.89 rows=3903 width=4) (actual time=15.716..308.037 rows=32959 loops=1)" "-> Bitmap Heap Scan on retailer_offer o (cost=100.99..13663.64 rows=3903 width=4) (actual time=15.693..67.479 rows=32959 loops=1)" " Recheck Cond: ((retailer_id = 2016) AND active)" " -> Bitmap Index Scan on idx_retaileroffer_retailerid (cost=0.00..100.02 rows=3903 width=0) (actual time=7.863..7.863 rows=33369 loops=1)" "Index Cond: (retailer_id = 2016)" "-> Index Scan using product_product_pkey on product_product p (cost=0.00..7.48 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=32959)" " Index Cond: (p.id = o.product_id)" " -> Hash (cost=20.29..20.29 rows=1029 width=13) (actual time=0.627..0.627 rows=1029 loops=1)" "-> Seq Scan on manufacturer_manufacturer m (cost=0.00..20.29 rows=1029 width=13) (actual time=0.009..0.270 rows=1029 loops=1)" "Total runtime: 422.058 ms" You can see that the sequential scan is significantly slower than the index scan (i've tried to mitigate any caching by the OS with these results). Postgresql 8.3.5 running on a Quad Core Xeon 2Ghz with 12Gb ram. All costs set to defaults, shared_buffers=4.2GB and effective_cache=6Gb. I thought with the later versions more shared_buffers was better, is this too much?? 
Thanks JOHN -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
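If you do want to nudge the planner rather than accept its default trade-off, the usual knobs are per-column statistics targets and the page-cost settings; a hedged sketch (the values are illustrative, not recommendations):

```sql
-- Better estimates for the skewed distribution of retailer_id
-- (the plans above estimated 3903 rows where 32959 came back):
ALTER TABLE retailer_offer ALTER COLUMN retailer_id SET STATISTICS 1000;
ANALYZE retailer_offer;

-- If most of the working set is cached, telling the planner that random
-- I/O is cheaper makes index scans look more attractive:
SET random_page_cost = 2.0;        -- default is 4.0
SET effective_cache_size = '6GB';  -- a planner hint only; allocates nothing
```

Raising the statistics target helps the planner pick the nestloop for retailers it currently mis-estimates, while lowering random_page_cost shifts the break-even point between the two plans globally, so it's worth re-checking EXPLAIN ANALYZE for both the large and small retailers after any change.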