Re: [GENERAL] Values larger than 1/3 of a buffer page cannot be indexed (hstore)
Stefan Keller writes: > Hi Tom, hi all > Thanks, Tom, for your tipps. You answered 2011/5/1: >> (...), and there's no point in having the >> index column contents be the entire tags value (which is what's leading >> to the failure). Consider >> >> create index planet_osm_point_amenity on planet_osm_point ((tags->amenity)); > To get a more general purpose index I tried also: > CREATE INDEX planet_osm_point_tags ON planet_osm_point USING gist(tags); > -- ERROR: invalid hstore value found > -- SQL state: XX000 > And I'm really interested in being able to use GIST. GIST is also > recommended here: > http://www.bostongis.com/PrinterFriendly.aspx?content_name=loading_osm_postgis > But I still get error 'invalid hstore value found' on my machine - > whatever I do! Do you have a self-contained test case for this? The symptom seems similar to a bug we found back around 9.0 release --- maybe there's another instance of the same mistake someplace. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Values larger than 1/3 of a buffer page cannot be indexed (hstore)
Hi Tom, hi all Thanks, Tom, for your tipps. You answered 2011/5/1: > (...), and there's no point in having the > index column contents be the entire tags value (which is what's leading > to the failure). Consider > > create index planet_osm_point_amenity on planet_osm_point ((tags->amenity)); To get a more general purpose index I tried also: CREATE INDEX planet_osm_point_tags ON planet_osm_point USING gist(tags); -- ERROR: invalid hstore value found -- SQL state: XX000 And I'm really interested in being able to use GIST. GIST is also recommended here: http://www.bostongis.com/PrinterFriendly.aspx?content_name=loading_osm_postgis But I still get error 'invalid hstore value found' on my machine - whatever I do! I tried hard to find out the reason and also to remedy the cause of this failure. For example I did an update like this: UPDATE planet_osm_point SET tags = hstore(hstore_to_array(tags)); ...with no success. I'm running "PostgreSQL 9.1alpha1, compiled by Visual C++ build 1500, 32-bit" on Windows XP SP3. I have a dump of the table/database at hand to anyone who is interested in this possible bug. Yours, Stefan 2011/5/1 Tom Lane : > Stefan Keller writes: >> I'm doing an equality search with success with the '->' operator on >> the same field 'tags' like in this query (1): > >> -- Count all restaurants in database ("amenity = restaurant''): >> select count(*) from osm_all_v >> where hstore(tags)->'amenity'='restaurant' > >> This query 1 is reasonably fast and could be accelerated using this >> functional index: >> CREATE INDEX planet_osm_point_tags_restaurant >> ON planet_osm_point >> USING btree (tags) >> WHERE (tags -> 'amenity'::text) = 'restaurant'::text; > > This index seems a bit carelessly defined. There's no need to confine > its usefulness to exactly that query, and there's no point in having the > index column contents be the entire tags value (which is what's leading > to the failure). Consider > > create index planet_osm_point_amenity on planet_osm_point ((tags->amenity)); > > which will work for the above query and any other that's looking for a > specific value of tags->amenity. > >> SELECT MIN(keys.key), hstore(p.tags)->keys.key, count(*) >> FROM planet_osm_point p, ( >> SELECT key, count(*) >> FROM ( >> SELECT (each(tags)).key FROM planet_osm_point >> ) AS stat >> GROUP BY key HAVING count(*) > 1 >> AND key NOT LIKE 'note:%' >> ... >> AND key NOT IN >> ('ele','ref','url','website','email','maxspeed','converted_by', ... ) >> ) keys >> WHERE hstore(p.tags)->keys.key >'' >> GROUP BY hstore(p.tags)->keys.key HAVING count(*) > 1 >> ORDER by 1,3 desc > > It's pretty much useless to think about indexes for queries like this. > If it's going to scan the whole table anyway, as this surely is, then > an index is not going to make it faster. > > regards, tom lane > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Values larger than 1/3 of a buffer page cannot be indexed (hstore)
Stefan Keller writes: > I'm doing an equality search with success with the '->' operator on > the same field 'tags' like in this query (1): > -- Count all restaurants in database ("amenity = restaurant''): > select count(*) from osm_all_v > where hstore(tags)->'amenity'='restaurant' > This query 1 is reasonably fast and could be accelerated using this > functional index: > CREATE INDEX planet_osm_point_tags_restaurant > ON planet_osm_point > USING btree (tags) > WHERE (tags -> 'amenity'::text) = 'restaurant'::text; This index seems a bit carelessly defined. There's no need to confine its usefulness to exactly that query, and there's no point in having the index column contents be the entire tags value (which is what's leading to the failure). Consider create index planet_osm_point_amenity on planet_osm_point ((tags->amenity)); which will work for the above query and any other that's looking for a specific value of tags->amenity. > SELECT MIN(keys.key), hstore(p.tags)->keys.key, count(*) > FROM planet_osm_point p, ( > SELECT key, count(*) > FROM ( > SELECT (each(tags)).key FROM planet_osm_point > ) AS stat > GROUP BY key HAVING count(*) > 1 > AND key NOT LIKE 'note:%' > ... > AND key NOT IN > ('ele','ref','url','website','email','maxspeed','converted_by', ... ) > ) keys > WHERE hstore(p.tags)->keys.key >'' > GROUP BY hstore(p.tags)->keys.key HAVING count(*) > 1 > ORDER by 1,3 desc It's pretty much useless to think about indexes for queries like this. If it's going to scan the whole table anyway, as this surely is, then an index is not going to make it faster. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Values larger than 1/3 of a buffer page cannot be indexed (hstore)
Hi Tom, Thanks for the hint! I'm actually doing a GROUP BY with an inquality search (HAVING...) and still get long lasting queries, see query (2) below. I'm doing an equality search with success with the '->' operator on the same field 'tags' like in this query (1): -- Count all restaurants in database ("amenity = restaurant''): select count(*) from osm_all_v where hstore(tags)->'amenity'='restaurant' This query 1 is reasonably fast and could be accelerated using this functional index: CREATE INDEX planet_osm_point_tags_restaurant ON planet_osm_point USING btree (tags) WHERE (tags -> 'amenity'::text) = 'restaurant'::text; But look at query 2: This GROUP BY query gives following results (after a VACUUM FULL ANALYZE): * 26265ms without any index on field 'tags' * 26000ms with GIN index on field 'tags' * 26078ms with HASH index on field 'tags' To sum up: * GIN and HASH indexes don't make a difference. * BTree index fails (as reported) on field 'tags' with ERROR 'index row size 3120 exceeds maximum 2712' SQL state: 54000). * GIST index fails too on field 'tags' but with ERROR 'invalid hstore value found', SQL state: XX000. Q: => The latter indicates an "invalid value" - but does not say what's invalid. => I have no clue how to boost query 2...? See below the query and the explain output. Yours, Stefan QUERY 2 === -- Return all key-values of semantic type 'enum' without types numeric, date/time etc. -- (actually hstore handles all tag/values as of type text): SELECT MIN(keys.key), hstore(p.tags)->keys.key, count(*) FROM planet_osm_point p, ( SELECT key, count(*) FROM ( SELECT (each(tags)).key FROM planet_osm_point ) AS stat GROUP BY key HAVING count(*) > 1 AND key NOT LIKE 'note:%' ... AND key NOT IN ('ele','ref','url','website','email','maxspeed','converted_by', ... ) ) keys WHERE hstore(p.tags)->keys.key >'' GROUP BY hstore(p.tags)->keys.key HAVING count(*) > 1 ORDER by 1,3 desc ... "amenity";"bus_stop";24414 "amenity";"restaurant";5423 "amenity";"bench";5041 "amenity";"parking";4232 "amenity";"fire_hydrant";2363 "amenity";"post_box";1838 "amenity";"fuel";1628 "amenity";"place_of_worship";1615 ... EXPLAIN output: "Sort (cost=6043108.58..6054613.93 rows=4602142 width=118) (actual time=331415.987..331417.914 rows=964 loops=1)" " Sort Key: (min(keys.key)), (count(*))" " Sort Method: quicksort Memory: 85kB" " -> GroupAggregate (cost=4173505.61..4401220.42 rows=4602142 width=118) (actual time=330101.581..331409.834 rows=964 loops=1)" "Filter: (count(*) > 1)" "-> Sort (cost=4173505.61..4198338.94 rows=9933329 width=118) (actual time=330101.487..330870.237 rows=196227 loops=1)" " Sort Key: ((p.tags -> keys.key))" " Sort Method: external merge Disk: 21960kB" " -> Nested Loop (cost=22534.58..574521.77 rows=9933329 width=118) (actual time=6747.314..327192.811 rows=196227 loops=1)" "Join Filter: ((p.tags -> keys.key) > ''::text)" "-> Seq Scan on planet_osm_point p (cost=0.00..5649.22 rows=182822 width=86) (actual time=0.015..516.191 rows=182822 loops=1)" "-> Materialize (cost=22534.58..22539.88 rows=163 width=32) (actual time=0.039..0.890 rows=420 loops=182822)" " -> Subquery Scan on keys (cost=22534.58..22539.07 rows=163 width=32) (actual time=6740.620..6743.358 rows=420 loops=1)" "-> HashAggregate (cost=22534.58..22537.44 rows=163 width=32) (actual time=6740.613..6741.651 rows=420 loops=1)" " Filter: (count(*) > 1)" " -> Subquery Scan on stat (cost=0.00..21417.62 rows=148929 width=32) (actual time=0.093..6230.640 rows=196458 loops=1)" "Filter: ((stat.key !~~ 'name%'::text) AND (stat.key !~~ 'addr:%'::text) AND (stat.key !~~* 'FIXME'::text) AND (stat.key !~~* 'openGeoDB:%'::text) AND (stat.key !~~ 'note:%'::text) AND (stat.key !~~ '%_ref'::text) AND (stat.key !~~ '%description%'::text) AND (stat.key !~~ 'contact:%'::text) AND (stat.key !~~ 'operator:%'::text) AND (stat.key !~~ 'uic_%'::text) AND (stat.key !~~ 'TMC%'::text) AND (stat.key !~~ 'uic_%'::text) AND (stat.key !~~ 'direction%'::text) AND (stat.key !~~ 'is_in%'::text) AND (stat.key !~~ 'wikipedia%'::text) AND (stat.key <> ALL ('{ele,ref,url,website,email,maxspeed,converted_by,layer,level,phone,mobility:station_id,information,opening_hours,date,time,collection_times,operator,colour,fee,nat_name,alt_name,population,seats,postal_code,capacity,line,lines,maxheight,reg_name}'::text[])))" "-> Seq Scan on planet_osm_point (cost=0.00..6106.28 rows=182822 width=86) (actual time=0.025..2363.603 rows=575700 loops=1)" "Total runtime: 331426.018 ms" 2011/5/1 Tom Lane : > Stefan Keller writes: >> Any ideas on how to index my hstore attribute? > > Use a GIST or GIN index. The only
Re: [GENERAL] Values larger than 1/3 of a buffer page cannot be indexed (hstore)
Stefan Keller writes: > Any ideas on how to index my hstore attribute? Use a GIST or GIN index. The only thing that a btree index on hstore can do for you is to support equality comparisons on the whole hstore value, which is pretty unlikely to be what you're after. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Values larger than 1/3 of a buffer page cannot be indexed (hstore)
Hi, 2011/3/13 Viktor Nagy > when trying to insert a long-long value, I get the following error: > > ERROR: Index row size 3120 exceeds maximum 2712 for index > "ir_translation_ltns" > HINT: Values larger than 1/3 of a buffer page cannot be indexed. > Consider a function index of an MD5 hash of the value, or use full text > indexing. I get the same error but I'm using a hstore attribute (called 'tags'). Unfortunately, the trick with the MD5 function index does not work neither: CREATE INDEX planet_osm_point_tags ON planet_osm_point ((md5(tags))); ERROR: Funktion md5(hstore) does not exist SQL state: 42883 Any ideas on how to index my hstore attribute? Yours, S. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general