Re: [GENERAL] full text search on hstore or json with materialized view?
On Thu, 20 Apr 2017 07:56:18 -0700, Rj Ewing wrote:

>On Wed, Apr 19, 2017 at 6:44 PM, George Neuner wrote:
>>
>> If you can restrict the FTS query to certain keys:
>>
>>    SELECT id FROM mytable
>>     WHERE to_tsquery( ... ) @@ to_tsvector(v)
>>       AND k IN ( ... )
>>     GROUP BY id
>>
>> [note: according to David Rowley, GROUP BY may be parallelized
>> whereas DISTINCT currently cannot be.]
>>
>> then given an index on 'k' it may be much faster than just the FTS
>> query alone. Subject to key variability, it also may be improved by
>> table partitioning to reduce the search space.
>>
>> If the FTS query is key restricted, you can parallelize either on the
>> client or on the server. If the FTS query is not key restricted, you
>> pretty much are limited to server side (and 9.6 or later).
>
>I'll look into parallelism if we can't get the performance we need.
>
>What do you mean if I can restrict the FTS query to certain keys? I'm not
>a sql expert, but it seems like the above query would match multiple keys
>to 1 tsquery value.

You weren't specific as to the types of queries you wanted ... you
mentioned somewhere higher up in the discussion:

> ... a basic full text query on 44 million rows is taking approx. 20ms.

That implied you wanted to FTS search every row. Only later did you give
an example that tied FTS patterns to particular keys. Until you did that,
there was no reason to assume the FTS search was targeted - you might have
wanted, e.g., records where *any* k:v value matched the FTS pattern.

[The takeaway here is: "try to be as specific as possible". 8-) ]

Obviously you can associate an FTS pattern with a particular key value -
just AND the conditions in the WHERE or HAVING clauses. But be aware
that, in general, the more conditions you place on a query, the slower
it runs.

George


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
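George's key-restricted sketch above can be made concrete. A minimal version, assuming a hypothetical shard table `mytable(id, k, v)` holding one k:v pair per row (the table and index names are illustrative, not from the thread):

```sql
-- Hypothetical schema: one row per k:v shard of a source record.
CREATE TABLE mytable (
    id integer,   -- id of the source record
    k  text,      -- attribute key
    v  text       -- attribute value
);

-- The key filter benefits from a btree index; the FTS match needs a
-- GIN index over the *same* tsvector expression used in the query.
CREATE INDEX mytable_k_idx   ON mytable (k);
CREATE INDEX mytable_fts_idx ON mytable
    USING gin (to_tsvector('english', v));

-- Key-restricted FTS, using GROUP BY (parallelizable) rather than DISTINCT:
SELECT id
  FROM mytable
 WHERE to_tsvector('english', v) @@ to_tsquery('english', 'value1')
   AND k IN ('key1', 'key2')
 GROUP BY id;
```

Note that the expression in the WHERE clause must match the index expression exactly (including the text search configuration), or the planner cannot use the GIN index.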
Re: [GENERAL] full text search on hstore or json with materialized view?
On Thu, 20 Apr 2017 08:50:31 -0700, Rj Ewing wrote:

>On Wed, Apr 19, 2017 at 9:55 PM, George Neuner wrote:
>
>> ... Since you are *testing* with 1M records (that
>> create 44M k:v shards), I am assuming you will need to deal with much
>> more than that in deployment. And if you think you need FTS, then you
>> must be expecting more than simple word matches [as below], else you
>> might do something simpler like
>>
>>    SELECT ...
>>     WHERE val ILIKE ...
>
>the 1M records would most likely be the max. On average the tables would
>have more like 100,000 records each.

Ok, so my assumption was way off ... you should be able to achieve the
timing you want with appropriate indexing.

>from my understanding, *ILIKE* doesn't do any text normalization, which is
>something we would like to have.

Right. If you want rooting/stemming or dictionary translation, then you
do need to use FTS.

>> >how would I write an AND query that filtered on 2 separate keys from the
>> >samples_lg_txt table?
>> >
>> >something like:
>> >
>> >  SELECT COUNT(*) FROM samples WHERE id IN ( SELECT DISTINCT(s.id) FROM
>> >  samples_lg_txt s JOIN keys k ON s.key = k.id WHERE (name = 'key1' AND
>> >  tsv @@ to_tsquery('value1')) AND (name = 'key2' AND tsv @@
>> >  to_tsquery('value2')) );
>>
>> You're overthinking it
>>
>>    SELECT count(distinct s.id)
>>      FROM samples_lg_txt AS s
>>      JOIN keys AS k ON k.id = s.key
>>     WHERE (k.name = 'key1' AND s.tsv @@ to_tsquery('value1'))
>>        OR (k.name = 'key2' AND s.tsv @@ to_tsquery('value2'))
>
>but that is an OR query, I'm trying to do an AND query.

Sorry, I missed the AND in your original query. Still, the nested SELECT
is unnecessary. Postgresql's planner/optimizer is pretty smart, and
probably would collapse your code into mine (modulo the AND/OR goof), but
I prefer not to rely on the planner to be smart ... that gets you into
trouble when you have to switch between DBMS.

>> There's actually no need to join if you can use the key name instead
>> of an integer id. You can FK on strings, so you can still maintain an
>> identity table of keys. E.g.,
>>
>> > id  | integer   |
>> > key | vchar(32) | FK key(name) ...
>> > val | text      |
>> > tsv | tsvector  |
>>
>> Then the query could be just
>>
>>    SELECT count(distinct id)
>>      FROM samples_lg_txt
>>     WHERE (key = 'key1' AND tsv @@ to_tsquery('value1'))
>>        OR (key = 'key2' AND tsv @@ to_tsquery('value2'))
>
>this would make queries simpler.

Yes - it eliminates the joins, and the query runs on a single table.

>I guess a disadvantage to using a string
>for the key is that the db size would be larger, and thus not as likely to
>fit the entire table in ram. If there are only 63 keys across 44M rows, it
>seems that storing a smallint would take less space than storing the
>string.

Maybe. Using the integer FK reduces the table size, but it requires a
join with the foreign table. A join of two tables requires indexes for
the join columns on both tables [which may or may not already exist], and
produces [variously] a temporary hash or key relation table that
represents the rows of the "joined" table. These temporary structures
can grow very large and may have to spill onto disk.

You can somewhat control that with the work_mem setting. But remember
that the setting applies to every operation of every concurrent query ...
so setting work_mem very high can backfire. So saving one place can cost
you in another. TANSTAAFL.

>I don't really have a need for the identity table of keys. Its only
>purpose was to shrink the database size.
>
>Thanks again for the detailed responses!

George
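For reference, the AND semantics Rj is after cannot be expressed as a single-row predicate on the shard table: each row holds only one k:v pair, so "key1 matches AND key2 matches" is a property of the *group* of rows sharing an id. One common sketch, assuming the single-table string-key layout George describes above:

```sql
-- Each row is one k:v shard, so require that BOTH per-key conditions
-- matched somewhere within the group of rows for a given id.
SELECT id
  FROM samples_lg_txt
 WHERE (key = 'key1' AND tsv @@ to_tsquery('value1'))
    OR (key = 'key2' AND tsv @@ to_tsquery('value2'))
 GROUP BY id
HAVING count(DISTINCT key) = 2;   -- both 'key1' and 'key2' matched
```

To get the count of matching records, wrap it: `SELECT count(*) FROM ( ... ) AS t`. Using `count(DISTINCT key)` rather than `count(*)` keeps the test correct even if an id somehow carries duplicate rows for the same key.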
Re: [GENERAL] full text search on hstore or json with materialized view?
On Wed, Apr 19, 2017 at 9:55 PM, George Neuner wrote:

> On Wed, 19 Apr 2017 16:28:13 -0700, Rj Ewing wrote:
>
> >okay, messing around a bit more with the secondary k,v table it seems like
> >this could be a good solution..
> >
> >I created a keys table to hold the 63 key values, then I dropped and
> >recreated the secondary table, using a FK referencing the keys table. I'm
> >not really sure why, but a basic full text query on 44 million rows is
> >taking approx. 20ms.
>
> That pretty much confirms your statistics were bad ... using the FK
> table or not wouldn't make any difference to the planner.
>
> But if you are getting 20ms on 44M rows, then one or more of the
> following must be true:
>  - your text values must be very short
>  - your FTS queries must be very simple
>  - you aren't reading the results

text is on average very short. 1-3 words per value. It was a count(*)
query with only a single condition.

> For comparison:
>
> I have an application that does FTS on a table of NAICS descriptions
> indexed using tsvectors with an average length of 4.8 tokens per. It
> does a 3-part All/Any/None term search.
>
> On my 24-core 2.4GHz server, a single threaded query with the whole
> table and index in memory takes ~1 ms to search 20K rows using a
> realistic tsquery: e.g.,
>
>    SELECT code,description
>      FROM naics
>     WHERE ts_index @@ to_tsquery('packaged & software & !(wholesale)')
>
> [getting the data out of Postgresql takes longer than the search]
>
> GIN indexes don't exactly scale linearly, and tsquery is, in general,
> much more dependent on the lengths of the tsvectors than on the
> complexity of the match, but with 44M rows of similarly distributed
> data, a similarly realistic query would be expected to take well over
> 1 second.
>
> My example is genuine but too small to bother parallelizing [mentioned
> in a previous message]. Since you are *testing* with 1M records (that
> create 44M k:v shards), I am assuming you will need to deal with much
> more than that in deployment. And if you think you need FTS, then you
> must be expecting more than simple word matches [as below], else you
> might do something simpler like
>
>    SELECT ...
>     WHERE val ILIKE ...

the 1M records would most likely be the max. On average the tables would
have more like 100,000 records each. I also realized that I inserted all
k:v pairs into the secondary k:v table. In reality, I would only index
strings, which would eliminate approx 25% of the k:v pairs, bringing that
number down closer to 30M.

from my understanding, *ILIKE* doesn't do any text normalization, which
is something we would like to have.

> >my table structure is:
> >
> >      Table "public.samples_lg_txt"
> >  Column |   Type   | Modifiers
> > --------+----------+-----------
> >  id     | integer  |
> >  key    | integer  |
> >  val    | text     |
> >  tsv    | tsvector |
> >Indexes:
> >    "idx_tsv_samples_lg_text" gin (tsv)
> >Foreign-key constraints:
> >    "samples_lg_txt_id_fkey" FOREIGN KEY (id) REFERENCES samples_lg(id)
> >    ON DELETE CASCADE
> >    "samples_lg_txt_key_fkey" FOREIGN KEY (key) REFERENCES keys(id)
> >
> >how would I write an AND query that filtered on 2 separate keys from the
> >samples_lg_txt table?
> >
> >something like:
> >
> >  SELECT COUNT(*) FROM samples WHERE id IN ( SELECT DISTINCT(s.id) FROM
> >  samples_lg_txt s JOIN keys k ON s.key = k.id WHERE (name = 'key1' AND
> >  tsv @@ to_tsquery('value1')) AND (name = 'key2' AND tsv @@
> >  to_tsquery('value2')) );
>
> You're overthinking it
>
>    SELECT count(distinct s.id)
>      FROM samples_lg_txt AS s
>      JOIN keys AS k ON k.id = s.key
>     WHERE (k.name = 'key1' AND s.tsv @@ to_tsquery('value1'))
>        OR (k.name = 'key2' AND s.tsv @@ to_tsquery('value2'))

but that is an OR query, I'm trying to do an AND query.

> There's actually no need to join if you can use the key name instead
> of an integer id. You can FK on strings, so you can still maintain an
> identity table of keys. E.g.,
>
> > id  | integer   |
> > key | vchar(32) | FK key(name) ...
> > val | text      |
> > tsv | tsvector  |
>
> Then the query could be just
>
>    SELECT count(distinct id)
>      FROM samples_lg_txt
>     WHERE (key = 'key1' AND tsv @@ to_tsquery('value1'))
>        OR (key = 'key2' AND tsv @@ to_tsquery('value2'))
>
> Just a reminder [it's late here 8-)]: FK columns contain values - not
> weird references to the foreign tables. The constraint just enforces
> that any value inserted/updated into the FK column matches an existing
> value in the relevant foreign table.

this would make queries simpler. I guess a disadvantage to using a string
for the key is that the db size would be larger, and thus not as likely to
fit the entire table in ram. If there are only 63 keys across 44M rows, it
seems that storing a smallint would take less space than storing the
string.

I don't really have a need for the identity table of keys. Its only
purpose was to shrink the database size.
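George's string-keyed layout might look like the following DDL. This is a sketch of his ASCII-table description: `vchar(32)` presumably means `varchar(32)`, the `keys` identity table is reduced to just its `name` column (a string FK target must be unique or a primary key), and `samples_lg(id)` is assumed to exist as in the thread's earlier FK constraint:

```sql
-- Identity table of allowed keys; the key *name* is the FK target.
CREATE TABLE keys (
    name varchar(32) PRIMARY KEY
);

-- Shard table keyed directly by the string key name: queries need no join,
-- but the key domain is still constrained by the FK.
CREATE TABLE samples_lg_txt (
    id  integer REFERENCES samples_lg(id) ON DELETE CASCADE,
    key varchar(32) REFERENCES keys(name),
    val text,
    tsv tsvector
);

CREATE INDEX idx_tsv_samples_lg_txt ON samples_lg_txt USING gin (tsv);
```

As the exchange notes, the FK column stores the actual string value, so the trade is larger rows (string vs. smallint) against join-free queries.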
Re: [GENERAL] full text search on hstore or json with materialized view?
On Wed, Apr 19, 2017 at 8:09 PM, Jeff Janes wrote:

> Your best bet might be to ignore the per-field searching in the initial
> (indexed) pass of the query to get everything that has all the search
> terms, regardless of which field they occur in. And then re-check whether
> each of the found values was found in the appropriate field in a later
> pass.
>
> Something like
>
>    select * from sample where
>        to_tsvector(json_thing->>:key1) @@ :value1
>        and to_tsvector(json_thing->>:key2) @@ :value2
>        and to_tsvector('english',json_thing) @@ (:value1 && :value2)

that worked pretty well when there was an AND condition with multiple k:v
pairs as you have. However, replacing it with an OR condition across k:v
pairs was pretty slow. I do like the simplicity though. Maybe indexing
the 10ish most common columns would be a "good enough" solution.

> From the initial email:
>
> > An idea that has come up is to use a materialized view or secondary
> > table with triggers, where we would have 3 columns (id, key, value).
>
> How would this be different from the "triple store" you are abandoning?

it would be fairly similar. One advantage would be that we could simplify
the backend to just a RDBMS (which we use already), and not have to
maintain a separate "triple store" instance.
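Rj's idea of indexing the ~10 most common fields could be done with one expression index per field, so that each per-key predicate in Jeff's query is itself indexable. A sketch, reusing Jeff's hypothetical `sample(json_thing)` column; the field name `'genus'` is purely illustrative:

```sql
-- One expression GIN index per frequently-queried field.  The query's
-- expression must match the index expression exactly, configuration included.
CREATE INDEX sample_genus_fts_idx
    ON sample USING gin (to_tsvector('english', json_thing->>'genus'));

-- An indexed per-field search then looks like:
SELECT *
  FROM sample
 WHERE to_tsvector('english', json_thing->>'genus')
       @@ to_tsquery('english', 'acer');
```

This trades a handful of extra indexes (and slower writes) for fast per-field lookups on the common keys, while rarer keys fall back to Jeff's recheck approach.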
Re: [GENERAL] full text search on hstore or json with materialized view?
On Wed, Apr 19, 2017 at 6:44 PM, George Neuner wrote:

> On Wed, 19 Apr 2017 11:57:26 -0700, Rj Ewing wrote:
>
> >I did some testing using a secondary table with the key, value column.
> >However I don't think this will provide the performance that we need.
> >Queries were taking 60+ seconds just for a count.
>
> SELECT count(*) or filtered?
>
> Either way, your statistics may be way off. Did you vacuum analyze
> the table after the inserts (and the index creation if it was done
> separately)?

I think my statistics were off. I never ran vacuum analyze.

> Without more detail re: your hardware, Postgresql version, what
> indexes are/will be available, the types of queries you want to run,
> etc., it's very hard to give really meaningful suggestions.

postgresql 9.6, currently testing on 8gb ram, but have up to 64gb for
production. 7-core 2.10GHz. mostly want to run ad-hoc queries, returning
the entire row, matching 1-3 k:v conditions.

> The kind of query you have alluded to is pretty easily parallelized:
> it can be spread over multiple sessions with result aggregation done
> on the client side.
>
> Or, if you have 9.6, you might try using backend parallelism:
> https://www.postgresql.org/docs/9.6/static/parallel-query.html
> [I've not used this, but some people have done it successfully.]
>
> If you can restrict the FTS query to certain keys:
>
>    SELECT id FROM mytable
>     WHERE to_tsquery( ... ) @@ to_tsvector(v)
>       AND k IN ( ... )
>     GROUP BY id
>
> [note: according to David Rowley, GROUP BY may be parallelized
> whereas DISTINCT currently cannot be.]
>
> then given an index on 'k' it may be much faster than just the FTS
> query alone. Subject to key variability, it also may be improved by
> table partitioning to reduce the search space.
>
> If the FTS query is key restricted, you can parallelize either on the
> client or on the server. If the FTS query is not key restricted, you
> pretty much are limited to server side (and 9.6 or later).

I'll look into parallelism if we can't get the performance we need.

What do you mean if I can restrict the FTS query to certain keys? I'm
not a sql expert, but it seems like the above query would match multiple
keys to 1 tsquery value? We need to be able to do AND conditions with
separate k:v pairs. Our keys are known ahead of time, and would vary for
a given table, but would be in the range of 30-60 keys per table.
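For the 9.6 backend parallelism George links to, the relevant settings are roughly as follows (a sketch; the values are illustrative, and note that 9.6 parallelizes only certain plan nodes such as sequential scans and some aggregates, so a query answered entirely from a GIN index scan may not benefit):

```sql
-- 9.6: parallel query is effectively off until this is raised above 0.
SET max_parallel_workers_per_gather = 4;   -- per-query worker cap

-- Minimum relation size before a parallel scan is even considered
-- (renamed min_parallel_table_scan_size in PostgreSQL 10):
SET min_parallel_relation_size = '8MB';

-- Verify the planner actually chose a parallel plan (look for
-- "Gather" / "Workers Launched" nodes in the output):
EXPLAIN (ANALYZE)
SELECT id
  FROM mytable
 WHERE to_tsvector('english', v) @@ to_tsquery('value1')
 GROUP BY id;
```

Checking the plan with EXPLAIN is the important step: raising the worker cap is no guarantee the planner will use workers for any particular query.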
Re: [GENERAL] full text search on hstore or json with materialized view?
On Thu, 20 Apr 2017 00:55:48 -0400, George Neuner wrote:

Doh!

>    SELECT count(distinct s.id)
>      FROM samples_lg_txt AS s
>      JOIN keys AS k ON k.id = s.key
>     WHERE (k.name = 'key1' AND s.tsv @@ to_tsquery('value1'))
>        OR (k.name = 'key2' AND s.tsv @@ to_tsquery('value2'))
>
>  :
>
>    SELECT count(distinct id)
>      FROM samples_lg_txt
>     WHERE (key = 'key1' AND tsv @@ to_tsquery('value1'))
>        OR (key = 'key2' AND tsv @@ to_tsquery('value2'))

All the WHERE clauses needed closing parentheses, and "to_query" should
have been "to_tsquery" - corrected above.

Time for bed,
George
Re: [GENERAL] full text search on hstore or json with materialized view?
On Wed, 19 Apr 2017 16:28:13 -0700, Rj Ewing wrote:

>okay, messing around a bit more with the secondary k,v table it seems like
>this could be a good solution..
>
>I created a keys table to hold the 63 key values, then I dropped and
>recreated the secondary table, using a FK referencing the keys table. I'm
>not really sure why, but a basic full text query on 44 million rows is
>taking approx. 20ms.

That pretty much confirms your statistics were bad ... using the FK
table or not wouldn't make any difference to the planner.

But if you are getting 20ms on 44M rows, then one or more of the
following must be true:
 - your text values must be very short
 - your FTS queries must be very simple
 - you aren't reading the results

For comparison:

I have an application that does FTS on a table of NAICS descriptions
indexed using tsvectors with an average length of 4.8 tokens per. It
does a 3-part All/Any/None term search.

On my 24-core 2.4GHz server, a single threaded query with the whole
table and index in memory takes ~1 ms to search 20K rows using a
realistic tsquery: e.g.,

    SELECT code,description
      FROM naics
     WHERE ts_index @@ to_tsquery('packaged & software & !(wholesale)')

[getting the data out of Postgresql takes longer than the search]

GIN indexes don't exactly scale linearly, and tsquery is, in general,
much more dependent on the lengths of the tsvectors than on the
complexity of the match, but with 44M rows of similarly distributed
data, a similarly realistic query would be expected to take well over
1 second.

My example is genuine but too small to bother parallelizing [mentioned
in a previous message]. Since you are *testing* with 1M records (that
create 44M k:v shards), I am assuming you will need to deal with much
more than that in deployment. And if you think you need FTS, then you
must be expecting more than simple word matches [as below], else you
might do something simpler like

    SELECT ...
     WHERE val ILIKE ...

>my table structure is:
>
>      Table "public.samples_lg_txt"
>  Column |   Type   | Modifiers
> --------+----------+-----------
>  id     | integer  |
>  key    | integer  |
>  val    | text     |
>  tsv    | tsvector |
>Indexes:
>    "idx_tsv_samples_lg_text" gin (tsv)
>Foreign-key constraints:
>    "samples_lg_txt_id_fkey" FOREIGN KEY (id) REFERENCES samples_lg(id)
>    ON DELETE CASCADE
>    "samples_lg_txt_key_fkey" FOREIGN KEY (key) REFERENCES keys(id)
>
>how would I write an AND query that filtered on 2 separate keys from the
>samples_lg_txt table?
>
>something like:
>
>  SELECT COUNT(*) FROM samples WHERE id IN ( SELECT DISTINCT(s.id) FROM
>  samples_lg_txt s JOIN keys k ON s.key = k.id WHERE (name = 'key1' AND
>  tsv @@ to_tsquery('value1')) AND (name = 'key2' AND tsv @@
>  to_tsquery('value2')) );

You're overthinking it

    SELECT count(distinct s.id)
      FROM samples_lg_txt AS s
      JOIN keys AS k ON k.id = s.key
     WHERE (k.name = 'key1' AND s.tsv @@ to_tsquery('value1'))
        OR (k.name = 'key2' AND s.tsv @@ to_tsquery('value2'))

There's actually no need to join if you can use the key name instead
of an integer id. You can FK on strings, so you can still maintain an
identity table of keys. E.g.,

> id  | integer   |
> key | vchar(32) | FK key(name) ...
> val | text      |
> tsv | tsvector  |

Then the query could be just

    SELECT count(distinct id)
      FROM samples_lg_txt
     WHERE (key = 'key1' AND tsv @@ to_tsquery('value1'))
        OR (key = 'key2' AND tsv @@ to_tsquery('value2'))

Just a reminder [it's late here 8-)]: FK columns contain values - not
weird references to the foreign tables. The constraint just enforces
that any value inserted/updated into the FK column matches an existing
value in the relevant foreign table.

George
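The `tsv` column in the table above has to be populated somehow. One conventional sketch, using the built-in `tsvector_update_trigger` to keep `tsv` in sync with `val` (column and trigger names match the thread's schema where possible; the trigger name itself is invented):

```sql
-- Keep samples_lg_txt.tsv synchronized with val on every INSERT/UPDATE.
CREATE TRIGGER samples_lg_txt_tsv_update
    BEFORE INSERT OR UPDATE ON samples_lg_txt
    FOR EACH ROW
    EXECUTE PROCEDURE
        tsvector_update_trigger(tsv, 'pg_catalog.english', val);

-- For bulk loads it can be cheaper to populate once afterwards, and -- as
-- discussed earlier in the thread -- refresh planner statistics:
UPDATE samples_lg_txt SET tsv = to_tsvector('english', val);
VACUUM ANALYZE samples_lg_txt;
```

The explicit `VACUUM ANALYZE` is exactly the step whose omission produced the 60-second counts earlier in this thread.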
Re: [GENERAL] full text search on hstore or json with materialized view?
Please don't top-post, thanks.

On Tue, Apr 18, 2017 at 3:18 PM, Rj Ewing wrote:

> On Tue, Apr 18, 2017 at 3:00 PM, Bruce Momjian wrote:
>
>> Full text search of JSON and JSONB data is coming in Postgres 10, which
>> is to be released in September of this year:
>>
>> https://www.depesz.com/2017/04/04/waiting-for-postgresql-10-full-text-search-support-for-json-and-jsonb/
>
> A step in the right direction for me, however it doesn't appear to
> support per-field full text searching.
> It is exciting though!

Your best bet might be to ignore the per-field searching in the initial
(indexed) pass of the query to get everything that has all the search
terms, regardless of which field they occur in. And then re-check whether
each of the found values was found in the appropriate field in a later
pass.

Something like

    select * from sample where
        to_tsvector(json_thing->>:key1) @@ :value1
        and to_tsvector(json_thing->>:key2) @@ :value2
        and to_tsvector('english',json_thing) @@ (:value1 && :value2)

From the initial email:

> An idea that has come up is to use a materialized view or secondary
> table with triggers, where we would have 3 columns (id, key, value).

How would this be different from the "triple store" you are abandoning?

Cheers,
Jeff
Re: [GENERAL] full text search on hstore or json with materialized view?
Please don't top post.

https://en.wikipedia.org/wiki/Posting_style#Placement_of_replies
https://en.wikipedia.org/wiki/Posting_style#Choosing_the_proper_posting_style

>> On Tue, 18 Apr 2017 14:38:15 -0700, Rj Ewing wrote:
>>
>> > :
>> >An idea that has come up is to use a materialized view or secondary table
>> >with triggers, where we would have 3 columns (id, key, value).
>> >
>> >I think this would allow us to store a tsvector and gin index. Giving us
>> >the ability to use fulltext search on k:v pairs, then join the original
>> >data on the id field to return the entire record.
>> > :

On Wed, 19 Apr 2017 11:57:26 -0700, Rj Ewing wrote:

>I did some testing using a secondary table with the key, value column.
>However I don't think this will provide the performance that we need.
>Queries were taking 60+ seconds just for a count.

SELECT count(*) or filtered?

Either way, your statistics may be way off. Did you vacuum analyze
the table after the inserts (and the index creation if it was done
separately)?

>With 1 million rows in the primary table, this resulted in 44 million rows
>in the secondary k,v table for full text searching. The same query in ES
>takes ~50 ms on my local machine with 1/10th the ram allocated to ES than
>was allocated to psql.
>
>I'm gonna test using trigram indexes on approx 10 json fields, and see if
>that gives us what we are looking for.
>
>any thought on getting sub 1 sec queries on a table with 44 million rows?
>
>RJ

Based on your description of the data [at top], I rather doubt trigrams
will be an improvement over tsvector. And they're more cumbersome to use
if you don't need better similarity matching than what tsvector offers
[which itself is somewhat adjustable via dictionaries].

Without more detail re: your hardware, Postgresql version, what
indexes are/will be available, the types of queries you want to run,
etc., it's very hard to give really meaningful suggestions.

The kind of query you have alluded to is pretty easily parallelized:
it can be spread over multiple sessions with result aggregation done
on the client side.

Or, if you have 9.6, you might try using backend parallelism:
https://www.postgresql.org/docs/9.6/static/parallel-query.html
[I've not used this, but some people have done it successfully.]

If you can restrict the FTS query to certain keys:

    SELECT id FROM mytable
     WHERE to_tsquery( ... ) @@ to_tsvector(v)
       AND k IN ( ... )
     GROUP BY id

[note: according to David Rowley, GROUP BY may be parallelized
whereas DISTINCT currently cannot be.]

then given an index on 'k' it may be much faster than just the FTS
query alone. Subject to key variability, it also may be improved by
table partitioning to reduce the search space.

If the FTS query is key restricted, you can parallelize either on the
client or on the server. If the FTS query is not key restricted, you
pretty much are limited to server side (and 9.6 or later).

And I'm out of suggestions for now. Parallel query is your best bet
for maximum performance, but unless you have enough RAM to hold the
entire table and its indexes, and all the query workspaces, then I
doubt you will be able to get anywhere near your optimistic execution
target for FTS on 40+ million rows.

YMMV,
George
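For context on George's skepticism about trigrams: a `pg_trgm` index accelerates substring and similarity matching (including `ILIKE`), but does no stemming or dictionary normalization. A minimal sketch against the thread's shard table:

```sql
-- Trigram support is an extension (contrib module).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- A trigram GIN index accelerates ILIKE / % (similarity) predicates:
CREATE INDEX samples_val_trgm_idx
    ON samples_lg_txt USING gin (val gin_trgm_ops);

-- Fast indexed substring match -- but no normalization, so a search
-- for 'running' will NOT match a stored 'run' the way FTS stemming would:
SELECT id
  FROM samples_lg_txt
 WHERE val ILIKE '%value1%';
```

That is the trade-off named earlier in the thread: trigrams buy fuzzy/substring matching, tsvector buys stemming and dictionaries; with 1-3 word values and a need for normalization, tsvector is the better fit.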
Re: [GENERAL] full text search on hstore or json with materialized view?
okay, messing around a bit more with the secondary k,v table it seems
like this could be a good solution..

I created a keys table to hold the 63 key values, then I dropped and
recreated the secondary table, using a FK referencing the keys table. I'm
not really sure why, but a basic full text query on 44 million rows is
taking approx. 20ms.

my table structure is:

     Table "public.samples_lg_txt"
 Column |   Type   | Modifiers
--------+----------+-----------
 id     | integer  |
 key    | integer  |
 val    | text     |
 tsv    | tsvector |
Indexes:
    "idx_tsv_samples_lg_text" gin (tsv)
Foreign-key constraints:
    "samples_lg_txt_id_fkey" FOREIGN KEY (id) REFERENCES samples_lg(id)
    ON DELETE CASCADE
    "samples_lg_txt_key_fkey" FOREIGN KEY (key) REFERENCES keys(id)

how would I write an AND query that filtered on 2 separate keys from the
samples_lg_txt table?

something like:

  SELECT COUNT(*) FROM samples WHERE id IN ( SELECT DISTINCT(s.id) FROM
  samples_lg_txt s JOIN keys k ON s.key = k.id WHERE (name = 'key1' AND
  tsv @@ to_tsquery('value1')) AND (name = 'key2' AND tsv @@
  to_tsquery('value2')) );

On Wed, Apr 19, 2017 at 11:57 AM, Rj Ewing wrote:

> I did some testing using a secondary table with the key, value column.
> However I don't think this will provide the performance that we need.
> Queries were taking 60+ seconds just for a count.
>
> With 1 million rows in the primary table, this resulted in 44 million
> rows in the secondary k,v table for full text searching. The same query
> in ES takes ~50 ms on my local machine with 1/10th the ram allocated to
> ES than was allocated to psql.
>
> I'm gonna test using trigram indexes on approx 10 json fields, and see
> if that gives us what we are looking for.
>
> any thought on getting sub 1 sec queries on a table with 44 million rows?
>
> RJ
>
> On Tue, Apr 18, 2017 at 10:35 PM, George Neuner wrote:
>
>> On Tue, 18 Apr 2017 14:38:15 -0700, Rj Ewing wrote:
>>
>> >I am evaluating postgres as a datastore for our webapp. We are moving
>> >away from a triple store db due to performance issues.
>> >
>> >Our data model consists of sets of user defined attributes. Approx 10%
>> >of the attributes tend to be 100% filled, with 50% of the attributes
>> >having approx 25% filled. This is fairly sparse data, and it seems that
>> >jsonb or hstore will be best for us.
>> >
>> >Unfortunately, from my understanding, postgres doesn't support fulltext
>> >search across hstore or jsonb key:values or even the entire document.
>> >While this is not a deal breaker, this would be a great feature to
>> >have. We have been experimenting w/ elasticsearch a bit, and
>> >particularly enjoy this feature, however we don't really want to
>> >involve the complexity and overhead of adding elasticsearch in front of
>> >our datasource right now.
>>
>> hstore and JSON values all really are just formatted text with a
>> custom column type. You can create tsvectors from the values if you
>> cast them to text.
>>
>> Note that a tsvector can only work on a /flat/ key:value structure: it
>> won't understand nesting, and even with a flat store it won't
>> understand the difference between keys/tags and the associated values.
>>
>> E.g., you will be able to see that a value contains both "foo" and
>> "bar", but to distinguish 'foo:bar' from 'bar:foo' or 'foo:q,bar:z'
>> you either must check the token positions (from the tsvector) or *try*
>> to extract the key(s) you are interested in and check the associated
>> value(s).
>>
>> This might work ok if you search only for keys in a "document" ... but
>> trying to search values, I think, would be far too complicated.
>>
>> It might help if you stored a 2D array instead of a flat structure,
>> but even that would be fairly complicated to work with.
>>
>> >An idea that has come up is to use a materialized view or secondary
>> >table with triggers, where we would have 3 columns (id, key, value).
>> >
>> >I think this would allow us to store a tsvector and gin index. Giving
>> >us the ability to use fulltext search on k:v pairs, then join the
>> >original data on the id field to return the entire record.
>>
>> This is a much better idea because it separates the key from the
>> value, and unlike the full "document" case [above], you will know that
>> the FTS index is covering only the values.
>>
>> If you need to preserve key order to reconstruct records, you will
>> need an additional column to maintain that ordering.
>>
>> >is anyone currently doing this? Is there a better alternative? Any
>> >performance issues that immediately jump out (I realize the writes
>> >will take longer)?
>> >
>> >the nature of our data is "relatively" static with bulk uploads (100 -
>> >1000 records). So we can sacrifice some write performance.
>> >
>> >RJ
>>
>> Having to "reconstruct" records will make reads take longer as well,
>> but I think separating the keys and values is the best way to do it.
Re: [GENERAL] full text search on hstore or json with materialized view?
I did some testing using a secondary table with the key, value column.
However I don't think this will provide the performance that we need.
Queries were taking 60+ seconds just for a count.

With 1 million rows in the primary table, this resulted in 44 million
rows in the secondary k,v table for full text searching. The same query
in ES takes ~50 ms on my local machine with 1/10th the ram allocated to
ES than was allocated to psql.

I'm gonna test using trigram indexes on approx 10 json fields, and see
if that gives us what we are looking for.

any thought on getting sub 1 sec queries on a table with 44 million rows?

RJ

On Tue, Apr 18, 2017 at 10:35 PM, George Neuner wrote:

> On Tue, 18 Apr 2017 14:38:15 -0700, Rj Ewing wrote:
>
> >I am evaluating postgres as a datastore for our webapp. We are moving
> >away from a triple store db due to performance issues.
> >
> >Our data model consists of sets of user defined attributes. Approx 10%
> >of the attributes tend to be 100% filled, with 50% of the attributes
> >having approx 25% filled. This is fairly sparse data, and it seems that
> >jsonb or hstore will be best for us.
> >
> >Unfortunately, from my understanding, postgres doesn't support fulltext
> >search across hstore or jsonb key:values or even the entire document.
> >While this is not a deal breaker, this would be a great feature to have.
> >We have been experimenting w/ elasticsearch a bit, and particularly
> >enjoy this feature, however we don't really want to involve the
> >complexity and overhead of adding elasticsearch in front of our
> >datasource right now.
>
> hstore and JSON values all really are just formatted text with a
> custom column type. You can create tsvectors from the values if you
> cast them to text.
>
> Note that a tsvector can only work on a /flat/ key:value structure: it
> won't understand nesting, and even with a flat store it won't
> understand the difference between keys/tags and the associated values.
>
> E.g., you will be able to see that a value contains both "foo" and
> "bar", but to distinguish 'foo:bar' from 'bar:foo' or 'foo:q,bar:z'
> you either must check the token positions (from the tsvector) or *try*
> to extract the key(s) you are interested in and check the associated
> value(s).
>
> This might work ok if you search only for keys in a "document" ... but
> trying to search values, I think, would be far too complicated.
>
> It might help if you stored a 2D array instead of a flat structure,
> but even that would be fairly complicated to work with.
>
> >An idea that has come up is to use a materialized view or secondary
> >table with triggers, where we would have 3 columns (id, key, value).
> >
> >I think this would allow us to store a tsvector and gin index. Giving
> >us the ability to use fulltext search on k:v pairs, then join the
> >original data on the id field to return the entire record.
>
> This is a much better idea because it separates the key from the
> value, and unlike the full "document" case [above], you will know that
> the FTS index is covering only the values.
>
> If you need to preserve key order to reconstruct records, you will
> need an additional column to maintain that ordering.
>
> >is anyone currently doing this? Is there a better alternative? Any
> >performance issues that immediately jump out (I realize the writes
> >will take longer)?
> >
> >the nature of our data is "relatively" static with bulk uploads (100 -
> >1000 records). So we can sacrifice some write performance.
> >
> >RJ
>
> Having to "reconstruct" records will make reads take longer as well,
> but I think separating the keys and values is the best way to do it.
>
> YMMV,
> George
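George's point that hstore/jsonb values "are just formatted text" can be seen directly: casting the document to text and feeding it to `to_tsvector` gives whole-document FTS, with exactly the key/value ambiguity he warns about. A sketch, assuming a hypothetical `sample(id, json_thing jsonb)` table:

```sql
-- Whole-document FTS over a jsonb column by casting it to text.
-- Keys and values land in the same tsvector, so 'foo:bar' and 'bar:foo'
-- yield the same lexemes (position info aside) -- George's caveat.
CREATE INDEX sample_doc_fts_idx
    ON sample USING gin (to_tsvector('english', json_thing::text));

SELECT id
  FROM sample
 WHERE to_tsvector('english', json_thing::text)
       @@ to_tsquery('foo & bar');
```

This is useful as a coarse first-pass filter, but it cannot by itself say *which* key a term occurred under, which is what motivates the (id, key, value) secondary table in this thread.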
Re: [GENERAL] full text search on hstore or json with materialized view?
On Tue, 18 Apr 2017 14:38:15 -0700, Rj Ewing wrote: >I am evaluating postgres as a datastore for our webapp. We are moving >away from a triple store db due to performance issues. > >Our data model consists of sets of user defined attributes. Approx 10% of >the attributes tend to be 100% filled with 50% of the attributes having >approx 25% filled. This is fairly sparse data, and it seems that jsonb or >hstore will be best for us. > >Unfortunately, from my understanding, postgres doesn't support fulltext >search across hstore or jsonb key:values or even the entire document. While >this is not a deal breaker, this would be a great feature to have. We have >been experimenting w/ elasticsearch a bit, and particularly enjoy this >feature, however we don't really want to involve the complexity and >overhead of adding elasticsearch in front of our datasource right now. hstore and JSON values really are just formatted text with a custom column type. You can create tsvectors from the values if you cast them to text. Note that a tsvector can only work on a /flat/ key:value structure: it won't understand nesting, and even with a flat store it won't understand the difference between keys/tags and the associated values. E.g., you will be able to see that a value contains both "foo" and "bar", but to distinguish 'foo:bar' from 'bar:foo' or 'foo:q,bar:z' you either must check the token positions (from the tsvector) or *try* to extract the key(s) you are interested in and check the associated value(s). This might work ok if you search only for keys in a "document" ... but trying to search values would, I think, be far too complicated. It might help if you stored a 2D array instead of a flat structure, but even that would be fairly complicated to work with. >An idea that has come up is to use a materialized view or secondary table >with triggers, where we would have 3 columns (id, key, value). > >I think this would allow us to store a tsvector and gin index. 
Giving us >the ability to use fulltext search on k:v pairs, then join the original >data on the id field to return the entire record. This is a much better idea because it separates the key from the value, and unlike the full "document" case [above], you will know that the FTS index is covering only the values. If you need to preserve key order to reconstruct records, you will need an additional column to maintain that ordering. >is anyone currently doing this? Is there a better alternative? Any >performance issues that immediately jump out ( I realize the writes will >take longer)? > >the nature of our data is "relatively" static with bulk uploads (100 - 1000 >records). So we can sacrifice some write performance. > >RJ Having to "reconstruct" records will make reads take longer as well, but I think separating the keys and values is the best way to do it. YMMV, George -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full text search on hstore or json with materialized view?
A step in the right direction for me; however, it doesn't appear to support per-field full text searching. It is exciting though! On Tue, Apr 18, 2017 at 3:00 PM, Bruce Momjian wrote: > On Tue, Apr 18, 2017 at 02:38:15PM -0700, Rj Ewing wrote: > > I am evaluating postgres as a datastore for our webapp. We are > moving away > > from a triple store db due to performance issues. > > > > Our data model consists of sets of user defined attributes. Approx 10% > of the > > attributes tend to be 100% filled with 50% of the attributes having > approx 25% > > filled. This is fairly sparse data, and it seems that jsonb or hstore > will be > > best for us. > > > > Unfortunately, from my understanding, postgres doesn't support fulltext > search > > across hstore or jsonb key:values or even the entire document. While > this is > > not a deal breaker, this would be a great feature to have. We have been > > experimenting w/ elasticsearch a bit, and particularly enjoy this > feature, > > however we don't really want to involve the complexity and overhead of > adding > > elasticsearch in front of our datasource right now. > > Full text search of JSON and JSONB data is coming in Postgres 10, which > is to be released in September of this year: > > https://www.depesz.com/2017/04/04/waiting-for-postgresql-10-full-text-search-support-for-json-and-jsonb/ > > -- > Bruce Momjian http://momjian.us > EnterpriseDB http://enterprisedb.com > > + As you are, so once was I. As I am, so you will be. + > + Ancient Roman grave inscription + >
Re: [GENERAL] full text search on hstore or json with materialized view?
On Tue, Apr 18, 2017 at 02:38:15PM -0700, Rj Ewing wrote: > I am evaluating postgres as a datastore for our webapp. We are moving away > from a triple store db due to performance issues. > > Our data model consists of sets of user defined attributes. Approx 10% of the > attributes tend to be 100% filled with 50% of the attributes having approx 25% > filled. This is fairly sparse data, and it seems that jsonb or hstore will be > best for us. > > Unfortunately, from my understanding, postgres doesn't support fulltext search > across hstore or jsonb key:values or even the entire document. While this is > not a deal breaker, this would be a great feature to have. We have been > experimenting w/ elasticsearch a bit, and particularly enjoy this feature, > however we don't really want to involve the complexity and overhead of adding > elasticsearch in front of our datasource right now. Full text search of JSON and JSONB data is coming in Postgres 10, which is to be released in September of this year: https://www.depesz.com/2017/04/04/waiting-for-postgresql-10-full-text-search-support-for-json-and-jsonb/ -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
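[Editor's note] Per the depesz post linked above, Postgres 10 adds json/jsonb variants of to_tsvector that index only the values of a document, not its keys. A sketch of what that usage looks like, assuming PG 10+ and an illustrative table name:

```sql
-- Postgres 10+: to_tsvector accepts jsonb and extracts string values only.
SELECT to_tsvector('english', '{"name": "hello world"}'::jsonb);

-- Which permits an expression index over a jsonb column ("doc" is illustrative):
CREATE INDEX mytable_doc_fts_idx
    ON mytable USING gin (to_tsvector('english', doc));

SELECT * FROM mytable
WHERE to_tsvector('english', doc) @@ to_tsquery('english', 'hello');
```

As the reply above notes, this searches the whole document; it does not by itself provide the per-field restriction discussed elsewhere in the thread.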
[GENERAL] full text search on hstore or json with materialized view?
I am evaluating postgres as a datastore for our webapp. We are moving away from a triple store db due to performance issues. Our data model consists of sets of user defined attributes. Approx 10% of the attributes tend to be 100% filled, with 50% of the attributes having approx 25% filled. This is fairly sparse data, and it seems that jsonb or hstore will be best for us. Unfortunately, from my understanding, postgres doesn't support fulltext search across hstore or jsonb key:values or even the entire document. While this is not a deal breaker, this would be a great feature to have. We have been experimenting w/ elasticsearch a bit, and particularly enjoy this feature, however we don't really want to involve the complexity and overhead of adding elasticsearch in front of our datasource right now. An idea that has come up is to use a materialized view or secondary table with triggers, where we would have 3 columns (id, key, value). I think this would allow us to store a tsvector and gin index, giving us the ability to use fulltext search on k:v pairs, then join the original data on the id field to return the entire record. Is anyone currently doing this? Is there a better alternative? Any performance issues that immediately jump out (I realize the writes will take longer)? The nature of our data is "relatively" static with bulk uploads (100 - 1000 records), so we can sacrifice some write performance. RJ
Re: [GENERAL] Full Text Search combined with Fuzzy
On 03.03.2017 16:17, Nicolas Paris wrote: Nice! I do have version 9.6. Would this kind of index handle more than 20M large texts? The recheck condition looks resource-consuming. You are right. I think pg_trgm will not be good for such large texts, unfortunately. Is the full text index + phrase search + synonym dictionary the only other alternative for dealing with typo-phrase mining? I suppose there are no other options now. Though, prefix search may help you [1]. Is there any possibility of adding typo handling to the full text search roadmap in the future? As far as I know, there are no plans in the near future to add similarity full text search. 1. https://www.postgresql.org/docs/current/static/textsearch-dictionaries.html -- Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full Text Search combined with Fuzzy
Le 03 mars 2017 à 14:08, Artur Zakirov écrivait : > On 03.03.2017 15:49, Nicolas Paris wrote: > > > >Hi Oleg, > > > >Thanks. I thought pgtrgm was not able to index my long texts because of > >limitation of 8191 bytes per index row for btree. > > > >Then I found out it is possible to use pgtrgm over a GIN/GIST index. > >My final use case is phrase mining in texts. > > > >I want my application returns texts that contains approximatly the user > >entry: > > > >Eg: user search "Hello Word" > >a text containing "blah blah blah hello world blah blah blah" would be > >returned. > > > >Test: > >postgres=# CREATE table test_trgm (texts text); > >CREATE TABLE > >postgres=# CREATE INDEX ON test_trgm USING GIN(texts gin_trgm_ops); > >CREATE INDEX > >postgres=# SET enable_seqscan = OFF; > >SET > >postgres=# insert into test_trgm VALUES ('blah blah blah hello world blah > >blah blah'); > >INSERT 0 1 > >postgres=# insert into test_trgm VALUES ('blah blah blah hello word blah > >blah blah'); > >INSERT 0 1 > >postgres=# SELECT texts, similarity(texts, 'hello word') FROM test_trgm > >WHERE texts % 'hello word'; > > texts | similarity > >---+ > > blah blah blah hello world blah blah blah | 0.473684 > > blah blah blah hello word blah blah blah | 0.6875 > >(2 rows) > > > >postgres=# EXPLAIN SELECT texts, similarity(texts, 'hello word') FROM > >test_trgm WHERE texts % 'hello word'; > >QUERY PLAN > >--- > > Bitmap Heap Scan on test_trgm (cost=52.01..56.03 rows=1 width=32) > > Recheck Cond: (texts % 'hello word'::text) > > -> Bitmap Index Scan on test_trgm_texts_idx (cost=0.00..52.01 rows=1 > > width=0) > > Index Cond: (texts % 'hello word'::text) > >(4 rows) > > > >Conclusion: If I d'say 0.4 is my threshold, would this methodology meet > >my requirements ? > > > >Thanks for the help ! > > > > Hello, > > If you use PostgreSQL 9.6, then word_similarity() can help you [1]. 
For > example: > > postgres=# SELECT texts, word_similarity('hello word', texts) FROM test_trgm > WHERE 'hello word' <% texts; > texts | word_similarity > ---+- > blah blah blah hello world blah blah blah | 0.818182 > blah blah blah hello word blah blah blah | 1 > (2 rows) > > 1. https://www.postgresql.org/docs/9.6/static/pgtrgm.html > Nice! I do have version 9.6. Would this kind of index handle more than 20M large texts? The recheck condition looks resource-consuming. Is the full text index + phrase search + synonym dictionary the only other alternative for dealing with typo-phrase mining? Is there any possibility of adding typo handling to the full text search roadmap in the future? Thanks, > -- > Artur Zakirov > Postgres Professional: http://www.postgrespro.com > Russian Postgres Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full Text Search combined with Fuzzy
On 03.03.2017 15:49, Nicolas Paris wrote: Hi Oleg, Thanks. I thought pgtrgm was not able to index my long texts because of limitation of 8191 bytes per index row for btree. Then I found out it is possible to use pgtrgm over a GIN/GIST index. My final use case is phrase mining in texts. I want my application returns texts that contains approximatly the user entry: Eg: user search "Hello Word" a text containing "blah blah blah hello world blah blah blah" would be returned. Test: postgres=# CREATE table test_trgm (texts text); CREATE TABLE postgres=# CREATE INDEX ON test_trgm USING GIN(texts gin_trgm_ops); CREATE INDEX postgres=# SET enable_seqscan = OFF; SET postgres=# insert into test_trgm VALUES ('blah blah blah hello world blah blah blah'); INSERT 0 1 postgres=# insert into test_trgm VALUES ('blah blah blah hello word blah blah blah'); INSERT 0 1 postgres=# SELECT texts, similarity(texts, 'hello word') FROM test_trgm WHERE texts % 'hello word'; texts | similarity ---+ blah blah blah hello world blah blah blah | 0.473684 blah blah blah hello word blah blah blah | 0.6875 (2 rows) postgres=# EXPLAIN SELECT texts, similarity(texts, 'hello word') FROM test_trgm WHERE texts % 'hello word'; QUERY PLAN --- Bitmap Heap Scan on test_trgm (cost=52.01..56.03 rows=1 width=32) Recheck Cond: (texts % 'hello word'::text) -> Bitmap Index Scan on test_trgm_texts_idx (cost=0.00..52.01 rows=1 width=0) Index Cond: (texts % 'hello word'::text) (4 rows) Conclusion: If I d'say 0.4 is my threshold, would this methodology meet my requirements ? Thanks for the help ! Hello, If you use PostgreSQL 9.6, then word_similarity() can help you [1]. For example: postgres=# SELECT texts, word_similarity('hello word', texts) FROM test_trgm WHERE 'hello word' <% texts; texts | word_similarity ---+- blah blah blah hello world blah blah blah |0.818182 blah blah blah hello word blah blah blah | 1 (2 rows) 1. 
https://www.postgresql.org/docs/9.6/static/pgtrgm.html -- Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full Text Search combined with Fuzzy
Le 27 févr. 2017 à 10:32, Oleg Bartunov écrivait : > > > On Sun, Feb 26, 2017 at 3:52 PM, Nicolas Pariswrote: > > Hello, > > AFAIK there is no built-in way to combine full text search and fuzzy > matching > (https://www.postgresql.org/docs/current/static/fuzzystrmatch.html). > By example, phrase searching with tipos in it. > > First I don't know if postgresql concurrents (lucene based...) are able > to do so. > > > Usually, https://www.postgresql.org/docs/current/static/pgtrgm.html is used > for > this. > Hi Oleg, Thanks. I thought pgtrgm was not able to index my long texts because of limitation of 8191 bytes per index row for btree. Then I found out it is possible to use pgtrgm over a GIN/GIST index. My final use case is phrase mining in texts. I want my application returns texts that contains approximatly the user entry: Eg: user search "Hello Word" a text containing "blah blah blah hello world blah blah blah" would be returned. Test: postgres=# CREATE table test_trgm (texts text); CREATE TABLE postgres=# CREATE INDEX ON test_trgm USING GIN(texts gin_trgm_ops); CREATE INDEX postgres=# SET enable_seqscan = OFF; SET postgres=# insert into test_trgm VALUES ('blah blah blah hello world blah blah blah'); INSERT 0 1 postgres=# insert into test_trgm VALUES ('blah blah blah hello word blah blah blah'); INSERT 0 1 postgres=# SELECT texts, similarity(texts, 'hello word') FROM test_trgm WHERE texts % 'hello word'; texts | similarity ---+ blah blah blah hello world blah blah blah | 0.473684 blah blah blah hello word blah blah blah | 0.6875 (2 rows) postgres=# EXPLAIN SELECT texts, similarity(texts, 'hello word') FROM test_trgm WHERE texts % 'hello word'; QUERY PLAN --- Bitmap Heap Scan on test_trgm (cost=52.01..56.03 rows=1 width=32) Recheck Cond: (texts % 'hello word'::text) -> Bitmap Index Scan on test_trgm_texts_idx (cost=0.00..52.01 rows=1 width=0) Index Cond: (texts % 'hello word'::text) (4 rows) Conclusion: If I d'say 0.4 is my threshold, would this methodology meet my 
requirements ? Thanks for the help ! > > Second, is such feature is in the road map ? > > Third, I wonder if it is a good idea to use the postgresql synonyms > feature for such prupose.(https://www.postgresql.org/docs/current/static/ > textsearch-dictionaries.html) > I mean, building up a synonyms dictionnary containing tipos. By eg: > > postgres pgsql > postgresql pgsql > postgrez pgsql > postgre pgsql > gogle googl > gooogle googl > > There is multiple way to build such dictionary. But my question is about > the implementation of dictionnaries in postgresql: Is postgresql > supposed to take advantage of billion entries dictionaries ? > > > dictionary is just a program, so it's up to developer how to write efficient > program to deal with billion entries. Specifically to synonym dictionary, it's > not intended to work with a lot of entries. btw, have a look on contrib/ > dict_xsyn dictionary, which is more flexible than synonym. > > > Thanks by advance for you answers, > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full Text Search combined with Fuzzy
On Sun, Feb 26, 2017 at 3:52 PM, Nicolas Pariswrote: > Hello, > > AFAIK there is no built-in way to combine full text search and fuzzy > matching > (https://www.postgresql.org/docs/current/static/fuzzystrmatch.html). > By example, phrase searching with tipos in it. > > First I don't know if postgresql concurrents (lucene based...) are able > to do so. > Usually, https://www.postgresql.org/docs/current/static/pgtrgm.html is used for this. > > Second, is such feature is in the road map ? > > Third, I wonder if it is a good idea to use the postgresql synonyms > feature for such prupose.(https://www.postgresql.org/docs/current/ > static/textsearch-dictionaries.html) > I mean, building up a synonyms dictionnary containing tipos. By eg: > > postgrespgsql > postgresql pgsql > postgrezpgsql > postgre pgsql > gogle googl > gooogle googl > > There is multiple way to build such dictionary. But my question is about > the implementation of dictionnaries in postgresql: Is postgresql > supposed to take advantage of billion entries dictionaries ? > dictionary is just a program, so it's up to developer how to write efficient program to deal with billion entries. Specifically to synonym dictionary, it's not intended to work with a lot of entries. btw, have a look on contrib/dict_xsyn dictionary, which is more flexible than synonym. > > Thanks by advance for you answers, > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
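[Editor's note] Oleg's pointer to contrib/dict_xsyn can be sketched as follows. The dictionary and rules-file names are illustrative; dict_xsyn reads a `.rules` file from the tsearch_data directory, where each line maps a word to its synonyms:

```sql
CREATE EXTENSION dict_xsyn;

-- Assumes a file $SHAREDIR/tsearch_data/my_typos.rules containing lines like:
--   pgsql postgres postgresql postgrez postgre
-- so that the misspellings normalize to the same lexeme.
CREATE TEXT SEARCH DICTIONARY my_typos (
    TEMPLATE = xsyn_template,
    RULES = my_typos          -- refers to my_typos.rules
);

SELECT ts_lexize('my_typos', 'postgrez');
```

As Oleg notes, a synonym-style dictionary is not intended for billions of entries; this only shows the mechanics.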
[GENERAL] Full Text Search combined with Fuzzy
Hello, AFAIK there is no built-in way to combine full text search and fuzzy matching (https://www.postgresql.org/docs/current/static/fuzzystrmatch.html). By example, phrase searching with tipos in it. First I don't know if postgresql concurrents (lucene based...) are able to do so. Second, is such feature is in the road map ? Third, I wonder if it is a good idea to use the postgresql synonyms feature for such prupose.(https://www.postgresql.org/docs/current/static/textsearch-dictionaries.html) I mean, building up a synonyms dictionnary containing tipos. By eg: postgrespgsql postgresql pgsql postgrezpgsql postgre pgsql gogle googl gooogle googl There is multiple way to build such dictionary. But my question is about the implementation of dictionnaries in postgresql: Is postgresql supposed to take advantage of billion entries dictionaries ? Thanks by advance for you answers, -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full text search randomly not working for short prefixes?
On Friday, 2 December 2016 at 16:33:12, Tom Lane wrote: cen writes: > Something funny going on with my full text search.. and I have no idea what. The way to debug this sort of thing is generally to look at what tsquery you're actually getting. I get regression=# select to_tsquery(unaccent('a:*')); NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored to_tsquery (1 row) regression=# select to_tsquery(unaccent('an:*')); NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored to_tsquery (1 row) regression=# select to_tsquery(unaccent('ana:*')); to_tsquery 'ana':* (1 row) Of course, only the last is going to match 'ana'. So you need to use a text search configuration in which a/an are not stop words. Or possibly you could cast the unaccent result directly to tsquery rather than passing it through to_tsquery(), though likely that would just have a different set of failure modes with queries where you do wish stemming would occur. The problem with "no" seems to be the same. One can always specify 'simple' as the config, eliminating any "stop-word smartness": andreak=> select to_tsquery('simple', 'a:*'); to_tsquery 'a':* (1 row) -- Andreas Joseph Krogh
Re: [GENERAL] Full text search randomly not working for short prefixes?
Thanks, that makes sense. I think I'll go with the cast approach; I don't really need stemming anywhere. Tom Lane wrote on 02. 12. 2016 at 16:33: cen writes: Something funny going on with my full text search.. and I have no idea what. The way to debug this sort of thing is generally to look at what tsquery you're actually getting. I get regression=# select to_tsquery(unaccent('a:*')); NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored to_tsquery (1 row) regression=# select to_tsquery(unaccent('an:*')); NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored to_tsquery (1 row) regression=# select to_tsquery(unaccent('ana:*')); to_tsquery 'ana':* (1 row) Of course, only the last is going to match 'ana'. So you need to use a text search configuration in which a/an are not stop words. Or possibly you could cast the unaccent result directly to tsquery rather than passing it through to_tsquery(), though likely that would just have a different set of failure modes with queries where you do wish stemming would occur. The problem with "no" seems to be the same. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
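[Editor's note] The cast approach Tom mentions can be sketched like this. Raw tsquery input syntax performs no stemming or stop-word removal, so short prefixes survive; the table and column names follow the example earlier in the thread, and the unaccent extension is assumed installed:

```sql
-- to_tsquery drops 'a'/'an' as English stop words; a direct cast does not:
SELECT 'an:*'::tsquery;   -- yields the prefix query 'an':*

SELECT * FROM receivers r
WHERE r.tsv @@ (unaccent('an:*'))::tsquery;
```

The trade-off Tom notes still applies: with a direct cast, queries that would benefit from stemming no longer get it.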
Re: [GENERAL] Full text search randomly not working for short prefixes?
cen writes: > Something funny going on with my full text search.. and I have no idea what. The way to debug this sort of thing is generally to look at what tsquery you're actually getting. I get regression=# select to_tsquery(unaccent('a:*')); NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored to_tsquery (1 row) regression=# select to_tsquery(unaccent('an:*')); NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored to_tsquery (1 row) regression=# select to_tsquery(unaccent('ana:*')); to_tsquery 'ana':* (1 row) Of course, only the last is going to match 'ana'. So you need to use a text search configuration in which a/an are not stop words. Or possibly you could cast the unaccent result directly to tsquery rather than passing it through to_tsquery(), though likely that would just have a different set of failure modes with queries where you do wish stemming would occur. The problem with "no" seems to be the same. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Full text search randomly not working for short prefixes?
Something funny is going on with my full text search... and I have no idea what. I have a receiver called "Ana"; this is her tsv column: '3865100':4 'acevent...@mailinator.com':3B 'ana':1A 'novak':2A These queries do not find her: SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('a:*')); SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('an:*')); This one does: SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('ana:*')); Now to an even more interesting part: I have 3 people with last name "Novak" and one with first name "Nov". This query finds all 4: SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('n:*')); This finds NONE: SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('no:*')); This finds all 4 again: SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('nov:*')); ...and this finds all with the last name only: SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('nova:*')); These are the tsv columns for the last name: "'3865100':4 'janez':1A 'janezno...@mailinator.com':3B 'novak':2A" "'3865100':4 'acevent...@mailinator.com':3B 'ana':1A 'novak':2A" "'3865100':4 'novak':2A 'tine':1A 'tno...@mailinator.com':3B" "'2141500':4 'alen.n...@gmailer.com':3B 'allan':1A 'novak':2A" And the first name: "'38651604724':6 'brez':3A 'list':4A 'nov':1A 'novreceiver...@mailinator.com':5B 'receiv':2A" What is going on here? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full text search tsv column aproach vs concat confusion
Hi, On 16.11.2016 11:54, cen wrote: Hi, I am seeking some clarification in regard to full text search across multiple tables and what the best approach is. Documentation talks about two approaches when it comes to building a document: on-the-fly concat of columns and a dedicated tsv column approach. Let's say I want to perform a search for "txt1 & txt2 & txt3" on columns table1.col1, table1.col2 and table2.col1. I see the following solutions:

1. Concat all three into a document and perform a FTS.

SELECT * FROM (
  SELECT to_tsvector(table1.col1) || to_tsvector(table1.col2) || to_tsvector(table2.col1) AS document
  FROM table1
  LEFT JOIN table2 ON table1.table2_id = table2.id
) subquery
WHERE subquery.document @@ to_tsquery(unaccent(?));

2. Create a tsv column in each table, concat tsv columns and perform FTS on that.

SELECT * FROM table1
LEFT JOIN table2 ON table1.table2_id = table2.id
WHERE table1.tsv || table2.tsv @@ to_tsquery(unaccent(?));

3. Have a tsv column only in table1 and insert table2.col1 into the tsv via triggers. Works but seems very hacky.

It seems to me that option #2 is fast and easy to implement, but I am not sure what the concat of tsvs really means from an index usage and performance standpoint. Option #1 is the most flexible, and I'd use that all the time if it were not THAT much slower than the tsv column approach. Documentation on tsv columns states: "Another advantage is that searches will be faster, since it will not be necessary to redo the to_tsvector calls to verify index matches." The question is, how much faster are tsv columns really? Are there any benchmarks about this? If the performance difference is negligible I'd advocate that using tsv columns is a waste of time and space in most general cases. But since there is no information on how much faster, it's hard to decide. I don't have any such benchmarks. But if you have a real database, you can perform tests on your solutions using it, because it depends on your task and what you need. 
By the way, I suppose it is better to use COALESCE() function if your columns could have NULL value: SELECT * FROM ( SELECT to_tsvector(coalesce(table1.col1,'')) || to_tsvector(coalesce(table1.col2,'')) || to_tsvector(coalesce(table2.col1,'')) as document FROM table1 LEFT JOIN table2 ON table1.table2_id=table2.id ) subquery WHERE subquery.document @@ to_tsquery(unaccent(?)); And specifying a text search configuration makes queries a little bit faster: ... to_tsvector('english', coalesce(table1.col1,'')) ... -- Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Full text search tsv column aproach vs concat confusion
Hi, I am seeking some clarification in regard to full text search across multiple tables and what the best approach is. Documentation talks about two approaches when it comes to building a document: on-the-fly concat of columns and a dedicated tsv column approach. Let's say I want to perform a search for "txt1 & txt2 & txt3" on columns table1.col1, table1.col2 and table2.col1. I see the following solutions:

1. Concat all three into a document and perform a FTS.

SELECT * FROM (
  SELECT to_tsvector(table1.col1) || to_tsvector(table1.col2) || to_tsvector(table2.col1) AS document
  FROM table1
  LEFT JOIN table2 ON table1.table2_id = table2.id
) subquery
WHERE subquery.document @@ to_tsquery(unaccent(?));

2. Create a tsv column in each table, concat tsv columns and perform FTS on that.

SELECT * FROM table1
LEFT JOIN table2 ON table1.table2_id = table2.id
WHERE table1.tsv || table2.tsv @@ to_tsquery(unaccent(?));

3. Have a tsv column only in table1 and insert table2.col1 into the tsv via triggers. Works but seems very hacky.

It seems to me that option #2 is fast and easy to implement, but I am not sure what the concat of tsvs really means from an index usage and performance standpoint. Option #1 is the most flexible, and I'd use that all the time if it were not THAT much slower than the tsv column approach. Documentation on tsv columns states: "Another advantage is that searches will be faster, since it will not be necessary to redo the to_tsvector calls to verify index matches." The question is, how much faster are tsv columns really? Are there any benchmarks about this? If the performance difference is negligible I'd advocate that using tsv columns is a waste of time and space in most general cases. But since there is no information on how much faster, it's hard to decide. Best regards, Klemen
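[Editor's note] The dedicated tsv column from option 2 above is usually kept current with the built-in trigger helper. A sketch using the table and column names from the example (the configuration and weight-free setup are illustrative):

```sql
ALTER TABLE table1 ADD COLUMN tsv tsvector;
CREATE INDEX table1_tsv_idx ON table1 USING gin (tsv);

-- tsvector_update_trigger(tsv_column, config, text_columns...) recomputes
-- the tsvector on every INSERT or UPDATE of the row:
CREATE TRIGGER table1_tsv_update
    BEFORE INSERT OR UPDATE ON table1
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger(tsv, 'pg_catalog.english', col1, col2);
```

The same setup on table2 then allows the `table1.tsv || table2.tsv @@ ...` predicate from option 2, though as the reply in this thread notes, only benchmarks on the real data settle the concat-vs-precomputed question.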
Re: [GENERAL] Full text search on Chemistry text
On Wed, Jun 8, 2016 at 8:46 AM, Allan Kamau wrote: > I would like to generate tsvectors on documents that contain chemistry > related text. > Is there a synonym dictionary for chemistry terms available? > > -Allan > I did a Google search on "chemistry terms dictionary" and got a lot of hits. Are you asking for a "canonical" and "official" list? -- The unfacts, did we have them, are too imprecisely few to warrant our certitude. Maranatha! <>< John McKown
[GENERAL] Full text search on Chemistry text
I would like to generate tsvectors on documents that contain chemistry related text. Is there a synonym dictionary for chemistry terms available? -Allan
Re: [GENERAL] full text search index
You don't provide much context, like PostgreSQL version or machine characteristics. https://wiki.postgresql.org/wiki/SlowQueryQuestions On Wed, May 25, 2016 at 11:04 PM, Patrick Baker wrote: >> SELECT COUNT(DISTINCT j0_.id) AS sclr10 >> FROM customers j0_ >> WHERE ((LOWER(j0_.name_first) LIKE '%some%' >> OR LOWER(j0_.name_last) LIKE '%some%') >>AND j0_.id = 5) >> AND j0_.id = 5 > > The query is taking ages to run. > > I read about wildcards and it seems I have to use a function with > to_tsvector ? I very much doubt that full text search is going to be helpful here -- perhaps trigrams with an appropriate gist or gin index could help. Depending on table sizes and the data present, picking out rows based on the OR of scanning for a sequence of characters in a couple of character string columns might not be your fastest query to run. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
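[Editor's note] Kevin's trigram suggestion, sketched against the table and columns from the query above. It assumes the pg_trgm extension is available; trigram GIN indexes can serve `LIKE '%...%'` predicates that a btree cannot:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Expression indexes matching the lower(...) form used in the query:
CREATE INDEX customers_name_first_trgm
    ON customers USING gin (lower(name_first) gin_trgm_ops);
CREATE INDEX customers_name_last_trgm
    ON customers USING gin (lower(name_last) gin_trgm_ops);

-- The original predicate can then use the indexes directly:
SELECT count(DISTINCT id)
FROM customers
WHERE (lower(name_first) LIKE '%some%' OR lower(name_last) LIKE '%some%')
  AND id = 5;
```

Whether the planner chooses the trigram indexes here depends on selectivity; with `id = 5` already filtering to few rows, it may reasonably prefer the primary key.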
Re: [GENERAL] full text search index
On 26 May 2016 at 06:04, Patrick Baker wrote: > Hi there, > > I've got the following query: >> >> >> SELECT COUNT(DISTINCT j0_.id) AS sclr10 >> FROM customers j0_ >> WHERE ((LOWER(j0_.name_first) LIKE '%some%' >> OR LOWER(j0_.name_last) LIKE '%some%') >> AND j0_.id = 5) >> AND j0_.id = 5 > > > The query is taking ages to run. Your guess is as good as ours without knowing what query plan the database decided on. Post the output of explain analyze. > I read about wildcards and it seems I have to use a function with > to_tsvector ? In general, you have to use an expression of which the query planner can see that it's equivalent to the expression used in the index. Otherwise the query planner has no way of knowing whether the index is suitable for the query and it won't use the index. >> CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid", >> ("full_text_universal_cast"("name_first"::"text")), >> ("full_text_universal_cast"("name_last"::"text"))); In your case, you should query on full_text_universal_cast(your_field) instead of on like '%some%'. Alternatively, if your query always uses the same wildcard expression, you could create indexes on your_field like '%some%'. > full_text_universal_cast: >> >> CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data >> "text") >> RETURNS "tsvector" AS >> $BODY$ >> SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), '')); >> $BODY$ >> LANGUAGE sql IMMUTABLE >> COST 1000; The query planner has no way of knowing what this function does internally, so it certainly won't match the function results in the index up with your like expression. Regards, Alban Hertroys -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full text search index
> > > > Maybe Lucas Possamai can help. He seems to be a little bit further along > in this exercise. > > I'm too tired to care at the moment. And I haven't had much practical > work here anyway. > > David J. > > > > I subscribed to the list today, so don't have the old emails. I had a look at the archives though. Lucas, do you have any progress? If so, please let me know :) Thanks Patrick
[GENERAL] full text search index
On Thursday, May 26, 2016, Patrick Baker wrote: > Hi there, > > I've got the following query: > >> >> SELECT COUNT(DISTINCT j0_.id) AS sclr10 >> FROM customers j0_ >> WHERE ((LOWER(j0_.name_first) LIKE '%some%' >> OR LOWER(j0_.name_last) LIKE '%some%') >> AND j0_.id = 5) >> AND j0_.id = 5 > > > The query is taking ages to run. > > I read about wildcards and it seems I have to use a function with to_tsvector > ? > >> >> CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid", >> ("full_text_universal_cast"("name_first"::"text")), >> ("full_text_universal_cast"("name_last"::"text"))); > > > full_text_universal_cast: > >> CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data >> "text") >> RETURNS "tsvector" AS >> $BODY$ >> SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), '')); >> $BODY$ >> LANGUAGE sql IMMUTABLE >> COST 1000; > > > > > Would be something like above? Because it's not working... > > What am I missing guys? > Thanks > Maybe Lucas Possamai can help. He seems to be a little bit further along in this exercise. I'm too tired to care at the moment. And I haven't had much practical work here anyway. David J.
[GENERAL] full text search index
Hi there, I've got the following query: > > SELECT COUNT(DISTINCT j0_.id) AS sclr10 > FROM customers j0_ > WHERE ((LOWER(j0_.name_first) LIKE '%some%' > OR LOWER(j0_.name_last) LIKE '%some%') >AND j0_.id = 5) > AND j0_.id = 5 The query is taking ages to run. I read about wildcards and it seems I have to use a function with to_tsvector ? > > CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid", > ("full_text_universal_cast"("name_first"::"text")), > ("full_text_universal_cast"("name_last"::"text"))); full_text_universal_cast: > CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data "text") > RETURNS "tsvector" AS > $BODY$ > SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), '')); > $BODY$ > LANGUAGE sql IMMUTABLE > COST 1000; Would be something like above? Because it's not working... What am I missing guys? Thanks
Re: [GENERAL] Full text search question: "01.Bez." --> "Erster Bezirk"
Hello,

On Sat, Mar 12, 2016 at 11:40 AM, Johann Höchtl wrote:
> I fear I have an involved challenge concerning FTS.
>
> Assume I have the following text in a column:
>
> Graz,06.Bez.:Blah
>
> This parses as:
> SELECT alias, description, token FROM ts_debug('german', 'Graz,06.Bez.:Blah');
>    alias   |   description   | token
> -----------+-----------------+--------
>  asciiword | Word, all ASCII | Graz
>  blank     | Space symbols   | ,
>  host      | Host            | 06.Bez
>  blank     | Space symbols   | .:
>  asciiword | Word, all ASCII | Blah
>
> Bez. is the abbreviation for "Bezirk" (German for ~district). 06.Bez means "6th district"
>
> My first problem might be that the parser identifies "06.Bez." as a host lexeme, but ...
>
> I already defined a synonym dictionary to enable searching for "Bezirk", when there is only "Bez." in the database:
>
> file: bevaddress_host.syn:
> 01.bez bezirk
> 06.bez bezirk
>
> CREATE TEXT SEARCH DICTIONARY bevaddress_host_syn (
>     TEMPLATE = synonym,
>     SYNONYMS = bevaddress_host
> );
> ALTER TEXT SEARCH CONFIGURATION german ALTER MAPPING FOR host WITH bevaddress_host_syn, simple;
>
> I wonder how I can achieve to be able to search for "Erster Bezirk" ("First district") to match e.g. "01.Bez."
>
> Thank you for your help, Johann
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

As of the time of writing this I haven't seen any replies to your post, so you may not be aware that an answer was provided to your specific question in a blog: http://obartunov.livejournal.com/185579.html

Regards, Dane
[GENERAL] Full text search question: "01.Bez." --> "Erster Bezirk"
I fear I have an involved challenge concerning FTS.

Assume I have the following text in a column:

Graz,06.Bez.:Blah

This parses as:

SELECT alias, description, token FROM ts_debug('german', 'Graz,06.Bez.:Blah');
   alias   |   description   | token
-----------+-----------------+--------
 asciiword | Word, all ASCII | Graz
 blank     | Space symbols   | ,
 host      | Host            | 06.Bez
 blank     | Space symbols   | .:
 asciiword | Word, all ASCII | Blah

Bez. is the abbreviation for "Bezirk" (German for ~district). 06.Bez means "6th district".

My first problem might be that the parser identifies "06.Bez." as a host lexeme, but ...

I already defined a synonym dictionary to enable searching for "Bezirk", when there is only "Bez." in the database:

file: bevaddress_host.syn:
01.bez bezirk
06.bez bezirk

CREATE TEXT SEARCH DICTIONARY bevaddress_host_syn (
    TEMPLATE = synonym,
    SYNONYMS = bevaddress_host
);
ALTER TEXT SEARCH CONFIGURATION german ALTER MAPPING FOR host WITH bevaddress_host_syn, simple;

I wonder how I can achieve to be able to search for "Erster Bezirk" ("First district") to match e.g. "01.Bez."

Thank you for your help, Johann

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] : :Full text search query ::
Hi All, please find herewith the following queries:

1. select * from partdetails where scode=118 and (to_tsvector('english'::regconfig, part_number::text)) @@ to_tsquery('104')

It is showing only 1 record as output; it is expected to give 17 records as output.

2. select * from partdetails where scode=118 and (to_tsvector('english'::regconfig, part_number::text)) @@ to_tsquery('104/')

It is showing only 1 record as output; it is expected to give 17 records as output.

3. select * from partdetails where scode=118 and (to_tsvector('english'::regconfig, part_number::text)) @@ to_tsquery('104/1')

It is showing 17 records as output.

In our search case we are passing the parameter value 104 and expect to get 17 records. Kindly, could someone guide us here?
Re: [GENERAL] : :Full text search query ::
Hi,

On 25.2.2015 12:50, JD wrote:
> Hi All, please find herewith the following queries:
> 1. select * from partdetails where scode=118 and (to_tsvector('english'::regconfig, part_number::text)) @@ to_tsquery('104')
> It is showing only 1 record as output; it is expected to give 17 records as output.
> 2. select * from partdetails where scode=118 and (to_tsvector('english'::regconfig, part_number::text)) @@ to_tsquery('104/')
> It is showing only 1 record as output; it is expected to give 17 records as output.
> 3. select * from partdetails where scode=118 and (to_tsvector('english'::regconfig, part_number::text)) @@ to_tsquery('104/1')
> It is showing 17 records as output.
> In our search case we are passing the parameter value 104 and expect to get 17 records. Kindly, could someone guide us here?

You need to post 'to_tsvector('english', part_number)' for the 16 rows that you think should be returned but aren't. Fulltext works so that it transforms the source (part_number in this case) as defined in the text search configuration ('english'), and compares this with the tsquery. My bet is that the transformation keeps the whole string ('104/1') in this case, so that it does not match the tsquery.

ISTM you're trying to do a prefix search on the part_number. In that case fulltext may not be the right solution, because it's fuzzy by nature.

If you have two-part part numbers (i.e. it's always A/B) then maybe split that into two fields, and use simple equality conditions on each field. So instead of column 'part_number' containing value '104/1', use two columns part_number_a and part_number_b, containing values '104' and '1', and simple equality queries

WHERE part_number_a = '104' AND part_number_b = '1'

or (if you want to match just the first part)

WHERE part_number_a = '104'

Another option is to use an index with a 'varchar_pattern_ops' opclass, which allows you to do prefix LIKE queries [1]:

CREATE INDEX custom_partnum_idx ON partdetails (part_number varchar_pattern_ops);

SELECT ... FROM partdetails WHERE part_number LIKE '104/%'

[1] http://www.postgresql.org/docs/9.2/static/indexes-opclass.html

regards

--
Tomas Vondra  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
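A sketch of the two-column split suggested above; the new column names and the backfill are illustrative assumptions:

```sql
-- Split 'A/B' part numbers into two plain columns and index the first.
ALTER TABLE partdetails
  ADD COLUMN part_number_a text,
  ADD COLUMN part_number_b text;

-- split_part returns '' when the delimiter is absent, so one-part
-- numbers end up with an empty part_number_b.
UPDATE partdetails
SET part_number_a = split_part(part_number, '/', 1),
    part_number_b = split_part(part_number, '/', 2);

CREATE INDEX partdetails_pn_a_idx ON partdetails (part_number_a);

-- "all parts whose number starts with 104":
SELECT * FROM partdetails WHERE scode = 118 AND part_number_a = '104';
```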
[GENERAL] Full text search prefix matching
Hi, I’m trying to use PostgreSQL's full text search for searching names with prefix matching. I’ve got a materialized view with the tsvector’s in an indexed column which I’m then searching with prefix matching, as in the sqlfiddle: http://sqlfiddle.com/#!15/a2389/6 and below. My problem is that when using the ‘finnish’ text search configuration, the names are split before the end, and they’re not matched when searching with a prefix search that has exactly one character more than the lexeme, as also demonstrated in the above fiddle. When there are two characters after the lexeme, it does match. I’m working around the issue by using the ‘simple’ configuration, which includes the full words in the lexemes. Should text search prefixes work predictably as documented in [1] even if the lexemes are shorter than the query? How can I get it to work?

Best regards, Heikki Rauhala

[1] http://www.postgresql.org/docs/9.3/static/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES

create table names (name varchar);
insert into names (name) values ('Sofia');
create materialized view name_fulltext as
select name, to_tsvector('finnish', name) as searchable_index_col
from names;

select * from name_fulltext;
select to_tsquery('finnish','sof:*');
select 'found sof', name from name_fulltext
where searchable_index_col @@ to_tsquery('finnish','sof:*');
select 'notfound sofi', name from name_fulltext
where searchable_index_col @@ to_tsquery('finnish','sofi:*');
select 'found sofia', name from name_fulltext
where searchable_index_col @@ to_tsquery('finnish','sofia:*');

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
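The workaround mentioned above, building the materialized view with the 'simple' configuration so lexemes keep the full word, might look roughly like this (view and index names are illustrative):

```sql
-- 'simple' does no stemming, so 'Sofia' is stored as the lexeme 'sofia'
-- and every prefix query sofi:*, sof:*, ... matches it.
CREATE MATERIALIZED VIEW name_fulltext_simple AS
SELECT name,
       to_tsvector('simple', name) AS searchable_index_col
FROM names;

CREATE INDEX name_fulltext_simple_idx
  ON name_fulltext_simple USING gin (searchable_index_col);

SELECT name
FROM name_fulltext_simple
WHERE searchable_index_col @@ to_tsquery('simple', 'sofi:*');
```

The trade-off is that 'simple' also gives up the language-aware matching (e.g. inflected Finnish forms will no longer match their stem).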
Re: [GENERAL] Full text search prefix matching
On Tue, 16 Dec 2014 14:59:51 +0200 Heikki Rauhala heikki.rauh...@reaktor.fi wrote:

Hi Heikki,

There is a typo:

create materialized view name_fulltext as
ERROR: syntax error at or near « materialized »
LINE 1: create materialized view name_fulltext as

You might want to post the results too, next time; I posted them with a corrected script below:

create table names (name varchar);
insert into names (name) values ('Sofia');
create view name_fulltext as
select name, to_tsvector('finnish', name) as searchable_index_col
from names;

select * from name_fulltext;
select to_tsquery('finnish','sof:*');
select 'found sof', name from name_fulltext
where searchable_index_col @@ to_tsquery('finnish','sof:*');
select 'notfound sofi', name from name_fulltext
where searchable_index_col @@ to_tsquery('finnish','sofi:*');
select 'found sofia', name from name_fulltext
where searchable_index_col @@ to_tsquery('finnish','sofia:*');

vv=> select * from name_fulltext;
 name  | searchable_index_col
-------+----------------------
 Sofia | 'sof':1
(1 row)

vv=> select to_tsquery('finnish','sof:*');
 to_tsquery
------------
 'sof':*
(1 row)

vv=> select 'found sof', name
vv-> from name_fulltext
vv-> where searchable_index_col @@ to_tsquery('finnish','sof:*');
 ?column?  | name
-----------+-------
 found sof | Sofia
(1 row)

vv=> select 'notfound sofi', name
vv-> from name_fulltext
vv-> where searchable_index_col @@ to_tsquery('finnish','sofi:*');
 ?column? | name
----------+------
(0 rows)

vv=> select 'found sofia', name
vv-> from name_fulltext
vv-> where searchable_index_col @@ to_tsquery('finnish','sofia:*');
  ?column?   | name
-------------+-------
 found sofia | Sofia
(1 row)

--
Regards, Vincent Veyron
https://libremen.com/
Legal case, contract and insurance claim management software

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full text search prefix matching
Heikki Rauhala heikki.rauh...@reaktor.fi writes:
> Should text search prefixes work predictably as documented in [1] even if the lexemes are shorter than the query? How can I get it to work?

I believe what you're seeing can be explained by these observations:

regression=# select to_tsvector('finnish', 'sofia');
 to_tsvector
-------------
 'sof':1
(1 row)

regression=# select to_tsquery('finnish','sofia:*');
 to_tsquery
------------
 'sof':*
(1 row)

regression=# select to_tsquery('finnish','sofi:*');
 to_tsquery
------------
 'sofi':*
(1 row)

regression=# select to_tsquery('finnish','sof:*');
 to_tsquery
------------
 'sof':*
(1 row)

What this shows is that the finnish configuration includes a word-stemming rule that strips off "ia". It won't strip off just "i" though, so "sofi" doesn't get reduced to the same root and therefore doesn't match "sofia". The ":*" addition does nothing for you here since it allows matching in the other direction (query shorter than target).

I know nothing of Finnish so I can't say just how correct these particular stemming rules are for that language; perhaps they need adjustment. But it seems to me that if you want blind non-language-aware prefix matching, you probably don't want the full-text-search machinery at all. Full text search is meant to deal with words, both in the documents and the queries. You might take a look at pg_trgm as an alternative.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
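The pg_trgm alternative mentioned above could be sketched like this, reusing the names table from the thread (index name is illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- A trigram GIN index supports LIKE/ILIKE pattern scans directly,
-- with no stemming involved, so 'sofi%' matches 'Sofia' regardless
-- of any text search configuration.
CREATE INDEX names_name_trgm ON names USING gin (name gin_trgm_ops);

SELECT name FROM names WHERE name ILIKE 'sofi%';
```

Note that for pure left-anchored prefix matching, a plain btree index with text_pattern_ops and LIKE 'sofi%' would also work; pg_trgm becomes more interesting once substring ('%sofi%') or similarity matching is needed.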
Re: [GENERAL] Full text search prefix matching
On 16.12.2014, at 17:09, Vincent Veyron vv.li...@wanadoo.fr wrote: On Tue, 16 Dec 2014 14:59:51 +0200 Heikki Rauhala heikki.rauh...@reaktor.fi wrote: Hi Heikki, There is a typo : create materialized view name_fulltext as ERREUR: erreur de syntaxe sur ou près de « materialized » LIGNE 1 : create materialized view name_fulltext as I forgot to mention the postgres version, which is 9.3. Materialized views were introduced in 9.3 and are important to me in this case, because they can be indexed. Next time I’ll include both the version and the full output. Best regards, - Heikki Rauhala -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full-Text Search question
Search for fulltext tutorial + json functions http://www.postgresql.org/docs/9.3/static/functions-json.html On Wed, May 14, 2014 at 1:00 AM, Jesus Rafael Sanchez Medrano jesusraf...@gmail.com wrote: thanks... could you please be so kind to post some snippet/code for this? Att. == Jesus Rafael Sanchez Medrano Life is a dream, of which all must wake up On Tue, May 13, 2014 at 5:33 PM, Oleg Bartunov obartu...@gmail.com wrote: Easy, you need to extract text fields from json and construct tsvector from them (use concatenation, for example). On Tue, May 13, 2014 at 7:38 PM, Jesus Rafael Sanchez Medrano jesusraf...@gmail.com wrote: can postgres do FTS (full text search) on a json column? if possible, please be so kind to give some snippet/example. Att. == Jesus Rafael Sanchez Medrano Life is a dream, of which all must wake up
[GENERAL] Full-Text Search question
can postgres do FTS (full text search) on a json column? if possible, please be so kind to give some snippet/example. Att. == Jesus Rafael Sanchez Medrano Life is a dream, of which all must wake up
Re: [GENERAL] Full-Text Search question
Easy, you need to extract text fields from json and construct tsvector from them (use concatenation, for example). On Tue, May 13, 2014 at 7:38 PM, Jesus Rafael Sanchez Medrano jesusraf...@gmail.com wrote: can postgres do FTS (full text search) on a json column? if possible, please be so kind to give some snippet/example. Att. == Jesus Rafael Sanchez Medrano Life is a dream, of which all must wake up -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
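The suggestion above (extract the text fields and concatenate) might look like this for a hypothetical table docs(id, doc json); the key names 'title' and 'body' are made-up examples:

```sql
-- Search the concatenated text of two json fields. coalesce guards
-- against missing keys, which would otherwise NULL out the whole string.
SELECT id
FROM docs
WHERE to_tsvector('english',
        coalesce(doc->>'title', '') || ' ' ||
        coalesce(doc->>'body',  ''))
      @@ to_tsquery('english', 'example');

-- An expression index on the same expression (config spelled out, so the
-- expression is immutable) makes the search indexable:
CREATE INDEX docs_fts_idx ON docs USING gin (
  to_tsvector('english',
    coalesce(doc->>'title', '') || ' ' ||
    coalesce(doc->>'body',  '')));
```

The query's to_tsvector expression must match the indexed expression exactly for the planner to use the index.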
Re: [GENERAL] Full-Text Search question
thanks... could you please be so kind to post some snippet/code for this? Att. == Jesus Rafael Sanchez Medrano Life is a dream, of which all must wake up On Tue, May 13, 2014 at 5:33 PM, Oleg Bartunov obartu...@gmail.com wrote: Easy, you need to extract text fields from json and construct tsvector from them (use concatenation, for example). On Tue, May 13, 2014 at 7:38 PM, Jesus Rafael Sanchez Medrano jesusraf...@gmail.com wrote: can postgres do FTS (full text search) on a json column? if possible, please be so kind to give some snippet/example. Att. == Jesus Rafael Sanchez Medrano Life is a dream, of which all must wake up
Re: [GENERAL] Full text search on partial URLs
On 11/15/2013 07:40 PM, Zev Benjamin wrote:

One problem that I've run into here is that I would also like to highlight matched text in my application. For my existing search solution, I do this with ts_headline. For partial matches, it's unfortunately not just a matter of searching for the text and adding the appropriate markup because my documents are HTML (the FTS lexer helpfully pulls out all the HTML tags so it hasn't been a problem so far) and we don't want to accidentally highlight some of the attributes of the markup.

One way to solve this would be if there were a way to turn a tsvector and tsquery pair into a list of the offsets and lengths of the lexemes that match. The highlighting could then be done at the application level rather than the database level while still leveraging Postgres's FTS functionality.

I've written C functions to implement this and attached them to this email. The support files necessary for making a module are available at https://github.com/zbenjamin/tsearch_extras. I'm new to the PostgreSQL code base so any feedback or comments would be greatly appreciated. Would these be appropriate to submit as patches to PostgreSQL?

Thanks,
Zev

/*-------------------------------------------------------------------------
 *
 * This use of this program is subject to the terms of The PostgreSQL License:
 *
 * Copyright (c) 2013, Zulip, Inc.
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation for any purpose, without fee, and without a written agreement
 * is hereby granted, provided that the above copyright notice and this
 * paragraph and the following two paragraphs appear in all copies.
 *
 * IN NO EVENT SHALL ZULIP, INC. BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT,
 * SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS,
 * ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF ZULIP,
 * INC. HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * ZULIP, INC. SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED
 * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND Zulip,
 * Inc. HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES,
 * ENHANCEMENTS, OR MODIFICATIONS.
 *
 *-------------------------------------------------------------------------
 */

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "catalog/pg_type.h"
#include "tsearch/ts_utils.h"
#include "tsearch/ts_public.h"
#include "tsearch/ts_cache.h"
#include "utils/elog.h"
#include "utils/array.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;

typedef struct
{
    int4 cur_word;
    int4 num_words;
    int4 char_offset;
    HeadlineWordEntry *words;
} TsMatchesData;

typedef struct
{
    int4 offset;
    int4 len;
} TsMatchLocation;

PG_FUNCTION_INFO_V1(ts_match_locs_array);
PG_FUNCTION_INFO_V1(ts_match_locs_array_byid);
PG_FUNCTION_INFO_V1(tsvector_lexemes);

Datum ts_match_locs_array(PG_FUNCTION_ARGS);
Datum ts_match_locs_array_byid(PG_FUNCTION_ARGS);
Datum tsvector_lexemes(PG_FUNCTION_ARGS);

static void
ts_match_locs_setup(Oid cfgId, TsMatchesData *mdata, text *in, TSQuery query)
{
    HeadlineParsedText prs;
    TSConfigCacheEntry *cfg;
    TSParserCacheEntry *prsobj;

    cfg = lookup_ts_config_cache(cfgId);
    prsobj = lookup_ts_parser_cache(cfg->prsId);

    memset(&prs, 0, sizeof(HeadlineParsedText));
    prs.lenwords = 32;
    prs.words = (HeadlineWordEntry *) palloc(sizeof(HeadlineWordEntry) * prs.lenwords);

    hlparsetext(cfgId, &prs, query, VARDATA(in), VARSIZE(in) - VARHDRSZ);
    FunctionCall3(&(prsobj->prsheadline),
                  PointerGetDatum(&prs),
                  PointerGetDatum(NIL),
                  PointerGetDatum(query));

    mdata->cur_word = 0;
    mdata->char_offset = 0;
    mdata->num_words = prs.curwords;
    mdata->words = prs.words;
}

static bool
ts_match_locs_next_match(TsMatchesData *mdata, TsMatchLocation *match)
{
    while (mdata->cur_word < mdata->num_words)
    {
        HeadlineWordEntry *word = mdata->words + mdata->cur_word;
        int offset = mdata->char_offset;

        mdata->cur_word++;
        if (!word->skip)
        {
            mdata->char_offset += word->len;
            if (word->selected)
            {
                match->offset = offset;
                match->len = word->len;
                return true;
            }
        }
    }
    return false;
}

Datum
ts_match_locs_array_byid(PG_FUNCTION_ARGS)
{
    TsMatchesData mdata;
    TsMatchLocation match;
    Oid cfgId = PG_GETARG_OID(0);
    text *in = PG_GETARG_TEXT_P(1);
    TSQuery query = PG_GETARG_TSQUERY(2);
    ArrayType *result;
    Datum *elems;
    int num_matches_allocd = 6;  /* a random guess */
    int num_matches = 0;
    int result_dims[2];
    int result_lbs[2];

    elems = palloc(sizeof(Datum) * 2 * num_matches_allocd);
    ts_match_locs_setup(cfgId, &mdata, in, query);

    while (ts_match_locs_next_match(&mdata, &match))
    {
        if (num_matches >= num_matches_allocd)
        {
            num_matches_allocd *= 1.5;
            elems = repalloc(elems, sizeof(Datum) * 2 * num_matches_allocd);
        }
        elems[num_matches * 2] = Int32GetDatum(match.offset);
        elems[num_matches * 2 + 1] =
[GENERAL] Full text search on partial URLs
Hi,

I have Postgres full text search set up for my application and it's been working great! However, my users would like their searches to turn up parts of URLs. For example, they would like a search for "foobar" to turn up a document that contains the string "http://example.com/foobar/blah" (and similarly for queries like "example" and "blah"). With the default dictionaries for host, url, and url_path, the search query would have to contain the complete host or url path.

What is the best way to accomplish this? Should I be looking at building a custom dictionary that breaks down hosts and urls, or is there something simpler I can do?

Thanks,
Zev

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full text search on partial URLs
On Wed, Nov 6, 2013 at 10:45 AM, Zev Benjamin zev-pg...@strangersgate.com wrote:

> Hi, I have Postgres full text search set up for my application and it's been working great! However, my users would like their searches to turn up parts of URLs. For example, they would like a search for "foobar" to turn up a document that contains the string "http://example.com/foobar/blah" (and similarly for queries like "example" and "blah"). With the default dictionaries for host, url, and url_path, the search query would have to contain the complete host or url path. What is the best way to accomplish this? Should I be looking at building a custom dictionary that breaks down hosts and urls or is there something simpler I can do?

Have you looked into trigrams? http://www.postgresql.org/docs/current/static/pgtrgm.html
Re: [GENERAL] Full text search on partial URLs
On 11/06/2013 01:47 PM, bricklen wrote:
> On Wed, Nov 6, 2013 at 10:45 AM, Zev Benjamin zev-pg...@strangersgate.com wrote:
>> Hi, I have Postgres full text search set up for my application and it's been working great! However, my users would like their searches to turn up parts of URLs. For example, they would like a search for "foobar" to turn up a document that contains the string "http://example.com/foobar/blah" (and similarly for queries like "example" and "blah"). With the default dictionaries for host, url, and url_path, the search query would have to contain the complete host or url path. What is the best way to accomplish this? Should I be looking at building a custom dictionary that breaks down hosts and urls or is there something simpler I can do?
> Have you looked into trigrams? http://www.postgresql.org/docs/current/static/pgtrgm.html

I've looked at it in the context of adding fuzzy search. But my understanding is that doing a fuzzy search here would only work if the query were a significant fraction of, say, the url path. For example, I would expect a fuzzy search of foobar on /foobar/x to return a high similarity, but a fuzzy search of foobar on /foobar/some/very/long/path/x to have a low similarity. Or are you suggesting using trigrams in a different way?

Zev

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full text search on partial URLs
On Wed, Nov 6, 2013 at 10:53 AM, Zev Benjamin zev-pg...@strangersgate.com wrote:

> On 11/06/2013 01:47 PM, bricklen wrote:
>> On Wed, Nov 6, 2013 at 10:45 AM, Zev Benjamin zev-pg...@strangersgate.com wrote:
>>> Hi, I have Postgres full text search set up for my application and it's been working great! However, my users would like their searches to turn up parts of URLs. For example, they would like a search for "foobar" to turn up a document that contains the string "http://example.com/foobar/blah" (and similarly for queries like "example" and "blah"). With the default dictionaries for host, url, and url_path, the search query would have to contain the complete host or url path. What is the best way to accomplish this? Should I be looking at building a custom dictionary that breaks down hosts and urls or is there something simpler I can do?
>> Have you looked into trigrams? http://www.postgresql.org/docs/current/static/pgtrgm.html
> I've looked at it in the context of adding fuzzy search. But my understanding is that doing a fuzzy search here would only work if the query were a significant fraction of, say, the url path. For example, I would expect a fuzzy search of foobar on /foobar/x to return a high similarity, but a fuzzy search of foobar on /foobar/some/very/long/path/x to have a low similarity. Or are you suggesting using trigrams in a different way?

Yeah, I was thinking more along the lines of allowing wildcard searching, not similarity. Eg.

CREATE INDEX yourtable_yourcol_gist_fbi ON yourtable USING gist (yourcol gist_trgm_ops);

SELECT * FROM yourtable WHERE yourcol ~~ '%foobar%';
Re: [GENERAL] Full text search on partial URLs
On 11/06/2013 02:04 PM, bricklen wrote:
> [quoted: the exchange above, ending with the wildcard-search suggestion]
>
> CREATE INDEX yourtable_yourcol_gist_fbi ON yourtable USING GIST (yourcol gist_trgm_ops);
> SELECT * FROM yourtable WHERE yourcol ~~ '%foobar%';

Hrm. That might work.
So the application-level search functionality would be the union of tsearch and trigram wildcard matching. If anyone else has other ideas, I'd be interested in hearing them as well.

Thanks,
Zev
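The union Zev describes could be written as a single query (a sketch; the table and column names `docs`, `id`, `tsv`, and `body` are assumptions, not from the thread):

```sql
-- Rows matching either the full text search or the raw wildcard match
SELECT id FROM docs WHERE tsv @@ plainto_tsquery('english', 'foobar')
UNION
SELECT id FROM docs WHERE body ~~ '%foobar%';
```

With a GIN index on `tsv` and a trigram index on `body`, each arm of the UNION can use its own index.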
Re: [GENERAL] Full text search regression tests
I am sorry this mail was not meant for this list.
[GENERAL] Full text search regression tests
Hello All,

Attached patch adds regression tests to check the full-text search capability of pg_bigm.

Regards,
Beena Emerson

bigm-fulltext-search-regression.patch
Description: Binary data
[GENERAL] Full text search
Hello everyone, I am using full text search, but it is not helping me find partial values. For example, if my document contains "Pizza Hut", a search returns values only for the full word "Pizza"; a partial term or spelling mistake like "pizz" does not return anything. Is there any workaround for this? Please suggest. Regards, Itishree
Re: [GENERAL] Full text search
On Fri, Jul 12, 2013 at 11:48 AM, itishree sukla <itishree.su...@gmail.com> wrote:
> [quoted: the question above about partial values and spelling mistakes]

Hope you are looking for something like this:

create table foo(v text);
insert into foo values('Near to my house there''s no any Pizza Hut restuarant');
insert into foo values('I wont like pizza, but friends are crazy of it');

postgres=# select * from foo where v ~* 'pizz';
                           v
-------------------------------------------------------
 Near to my house there's no any Pizza Hut restuarant
 I wont like pizza, but friends are crazy of it
(2 rows)

postgres=# select * from foo where v ~* 'pizza hut';
                           v
-------------------------------------------------------
 Near to my house there's no any Pizza Hut restuarant
(1 row)

postgres=# select * from foo where v ~* 'pizza';
                           v
-------------------------------------------------------
 Near to my house there's no any Pizza Hut restuarant
 I wont like pizza, but friends are crazy of it
(2 rows)

Or with ILIKE:

select * from foo where v ilike '%hut%';

---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/
Re: [GENERAL] Full text search
Hi Raghavendra,

Thanks for your response; however, I want the same kind of result using full text search. Is that possible?

Regards,
Itishree

On Fri, Jul 12, 2013 at 12:14 PM, Raghavendra <raghavendra@enterprisedb.com> wrote:
> [quoted: the regex and ILIKE examples above]
Re: [GENERAL] Full text search
On Fri, Jul 12, 2013 at 12:56 PM, itishree sukla <itishree.su...@gmail.com> wrote:
> Thanks for your response, however i want same kind of result using full text search. is it possible?

What do you mean when you say you are using full text search? As in, what is the query that you used? That would be helpful.

--
Amit Langote
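For the record, prefix matches like "pizz" can be expressed in full text search itself with tsquery prefix matching (a sketch reusing the `foo` table from upthread; genuine misspellings still need trigram similarity from pg_trgm rather than FTS):

```sql
-- ':*' marks a prefix, so 'pizz:*' matches 'pizza', 'Pizza Hut', etc.
SELECT * FROM foo
WHERE to_tsvector('english', v) @@ to_tsquery('english', 'pizz:*');
```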
[GENERAL] Full text search, SQL functions, and the planner
Hi,

To maintain an externally defined interface, I'd like to define a function hasword(haystack, needle) that (essentially) returns 1 when to_tsvector(haystack) @@ to_tsquery(needle), 0 otherwise. I've tried:

CREATE OR REPLACE FUNCTION ivo_hasword(haystack TEXT, needle TEXT)
RETURNS INTEGER AS $func$
  SELECT CASE WHEN to_tsvector('english', $1) @@ plainto_tsquery($2)
    THEN 1 ELSE 0 END
$func$ LANGUAGE SQL STABLE;

It seems the planner inlines the function body, as intended:

# explain select * from rr.resource where 1=ivo_hasword(res_title, 'optical');
...
 Seq Scan on resource  (cost=0.00..2269.72 rows=69 width=924)
   Filter: (1 = CASE WHEN (to_tsvector('english'::regconfig, res_title) @@ plainto_tsquery('optical'::text)) THEN 1 ELSE 0 END)

-- but it doesn't use the index on res_title, as it could, as exhibited by this equivalent query:

# explain select * from rr.resource where to_tsvector('english'::regconfig, res_title) @@ plainto_tsquery('optical'::text);
...
 Bitmap Heap Scan on resource  (cost=21.96..731.76 rows=252 width=924)
   Recheck Cond: (to_tsvector('english'::regconfig, res_title) @@ plainto_tsquery('optical'::text))
   ->  Bitmap Index Scan on resource_res_title  (cost=0.00..21.89 rows=252 width=0)
         Index Cond: (to_tsvector('english'::regconfig, res_title) @@ plainto_tsquery('optical'::text))

Indeed, the index isn't used either when I write the 1 = CASE WHEN expression into the query directly (this is on Postgres 9.1). Sure enough, if I define the function like this:

CREATE OR REPLACE FUNCTION bool_hasword(haystack TEXT, needle TEXT)
RETURNS BOOLEAN AS $func$
  SELECT to_tsvector('english', $1) @@ plainto_tsquery($2)
$func$ LANGUAGE SQL STABLE;

(i.e., returning a boolean instead of the lousy integer), the index is used.

So -- is there a way to let the planner look through the CASE? Or some way of turning the boolean into an integer that the planner can see through? The CAST(... AS INTEGER) that fortunately would even produce the right numbers appears to be opaque to the planner.

Anticipating the sane and logical answer: the boolean function doesn't really help me; this is about handing that function through directly to ADQL (http://www.ivoa.net/documents/latest/ADQL.html) as a user defined function, and those user defined functions cannot be boolean-valued.

Cheers, Markus
Re: [GENERAL] Full text search, SQL functions, and the planner
Marc Dahn <d...@tfiu.de> writes:
> So -- is there a way to let the planner look through the CASE?

No. It would actually be wrong to simplify the expression in the way you're imagining, since

    1 = case when bool_expr then 1 else 0 end

does not give the same result as the plain bool_expr if the latter yields null. If you're sufficiently desperate, though, you might consider some hack like this:

regression=# CREATE OR REPLACE FUNCTION b_hasword(haystack TEXT, needle TEXT)
RETURNS boolean AS $func$
  SELECT to_tsvector('english', $1) @@ plainto_tsquery($2)
$func$ LANGUAGE SQL STABLE;
CREATE FUNCTION
regression=# create function inteqbool(int,bool) returns bool as
regression-# $$ select $1::bool = $2 $$ language sql stable;
CREATE FUNCTION
regression=# CREATE OPERATOR = (procedure = inteqbool, leftarg = int, rightarg = bool);
CREATE OPERATOR
regression=# explain select * from resource where 1=b_hasword(res_title, 'optical');
                                  QUERY PLAN
 Bitmap Heap Scan on resource  (cost=4.20..14.38 rows=7 width=32)
   Recheck Cond: (to_tsvector('english'::regconfig, res_title) @@ plainto_tsquery('optical'::text))
   ->  Bitmap Index Scan on resource_to_tsvector_idx  (cost=0.00..4.20 rows=7 width=0)
         Index Cond: (to_tsvector('english'::regconfig, res_title) @@ plainto_tsquery('optical'::text))
(4 rows)

which relies on the fact that the planner *will* simplify boolexpr = true to just boolexpr, so that after a couple of rounds of inlining and simplification we get to just the bare @@ expression. But aside from the time spent doing that, this approach could have unpleasant side effects in the form of causing int = bool expressions to be accepted generally, thus masking errors. Might be better to fix your client-side code.

regards, tom lane
[GENERAL] Full text search in Chinese
Can anyone shed any light on how I might go about implementing full text search for Chinese? I have heard I need to go down the NGRAM route, but would love to get some external feedback. Our application has a per-user 'ignore' list that we do a 'left outer join where null' sort of deal on, so I'd really rather not go out to Solr or anything external if possible. I bet there is something simple, as it seems like each character is an individual search term. Thoughts? Hints? Am I posting in the right place? Thanks!

--
*Christian Jensen*
724 Ioco Rd
Port Moody, BC V3H 2W8
+1 (778) 996-4283
christ...@jensenbox.com
Re: [GENERAL] Full text search ranking: ordering using index and proximiti ranking with OR queries
There is some good news coming from Oleg Bartunov and Alexander Korotkov about improving ranking speed: http://wiki.postgresql.org/images/2/25/Full-text_search_in_PostgreSQL_in_milliseconds-extended-version.pdf

It's worth reading their slides to gain a better understanding of PostgreSQL fulltext internals.

On Wed, Mar 7, 2012 at 8:05 PM, Nicolas Grilly <nico...@gardentechno.com> wrote:
> In a previous discussion thread, Oleg suggested that ts_rank is unable to use GIN indices: http://archives.postgresql.org/pgsql-general/2011-07/msg00351.php
> This is the only information I have about this.
Re: [GENERAL] Full text search ranking: ordering using index and proximiti ranking with OR queries
On Wed, Mar 7, 2012 at 11:59 AM, Andrey Chursin <an...@danasoft.ws> wrote:
> Is there any way to sort by ranking, avoiding a seq scan? The only way I see now is to use pg_trgm instead of ts_rank, but we have not checked yet how applicable it is for our purposes.

pg_trgm works very well in terms of measuring similarity between two ascii strings; many non-english languages will struggle. I doubt (although I never tried) it's useful for matching a small phrase to a large document.

merlin
[GENERAL] Full-text search: Problems with dictionaries and periods
I made a custom synonym dictionary for my full-text search, using the following code from the documentation:

CREATE TEXT SEARCH DICTIONARY my_synonym (
    TEMPLATE = synonym,
    SYNONYMS = my_synonyms
);

ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR asciiword
    WITH my_synonym, english_stem;

It works great, except on words with periods in them. For example, B.B.Q.:

=> select * from ts_debug('english', 'B.B.Q.');
 alias |    description    | token | dictionaries | dictionary | lexemes
-------+-------------------+-------+--------------+------------+---------
 file  | File or path name | B.B.Q | {simple}     | simple     | {b.b.q}
 blank | Space symbols     | .     | {}           |            |
(2 rows)

It interprets the string as a filename (that's what alias: file means, right?), so the asciiword dictionary I set up above doesn't apply. How can I change this?

— Theron
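One possible workaround (an assumption on my part, not something suggested in the thread) is to strip the periods before parsing, so the parser classifies the tokens as asciiword and the custom mapping applies:

```sql
-- 'B.B.Q.' is parsed as a single 'file' token; 'B B Q' is parsed as
-- asciiwords, which the my_synonym / english_stem mapping then processes
SELECT to_tsvector('english', translate('B.B.Q.', '.', ' '));
```

The same `translate()` expression has to be applied consistently on both the indexing and the querying side.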
Re: [GENERAL] Full text search ts_heading strange result
Hallo Tom,

> I believe the problem is that the one-argument form of to_tsquery() uses the default TS configuration, which you have probably not got set to simple. For me, the default TS configuration is english, which will stem polity as politi:
>
> regression=# select to_tsquery('(polity & church)');
>      to_tsquery
> ---------------------
>  'politi' & 'church'
> (1 row)
>
> However the simple configuration doesn't do anything to that lexeme:

Thanks for the explanation. I am working with a multi-language database, and that was the reason for using the 'simple' configuration. I have asked, in an earlier message on this list, for advice on how to handle full text searches in a multi-language database, but got no reaction to it. If there is a better way than using the 'simple' configuration in this case, I would gladly try it.

Regards
Johann.

--
Johann Spies                    Telefoon: 021-808 4699
Databestuurder / Data manager
Sentrum vir Navorsing oor Evaluasie, Wetenskap en Tegnologie
Centre for Research on Evaluation, Science and Technology
Universiteit Stellenbosch.

"If any of you lack wisdom, let him ask of God, that giveth to all men liberally, and upbraideth not; and it shall be given him." James 1:5

E-pos vrywaringsklousule: Hierdie e-pos mag vertroulike inligting bevat en mag regtens geprivilegeerd wees en is slegs bedoel vir die persoon aan wie dit geadresseer is. Indien u nie die bedoelde ontvanger is nie, word u hiermee in kennis gestel dat u hierdie dokument geensins mag gebruik, versprei of kopieer nie. Stel ook asseblief die sender onmiddellik per telefoon in kennis en vee die e-pos uit. Die Universiteit aanvaar nie aanspreeklikheid vir enige skade, verlies of uitgawe wat voortspruit uit hierdie e-pos en/of die oopmaak van enige lêers aangeheg by hierdie e-pos nie.

E-mail disclaimer: This e-mail may contain confidential information and may be legally privileged and is intended only for the person to whom it is addressed. If you are not the intended recipient, you are notified that you may not use, distribute or copy this document in any manner whatsoever. Kindly also notify the sender immediately by telephone, and delete the e-mail. The University does not accept liability for any damage, loss or expense arising from this e-mail and/or accessing any files attached to this e-mail.
Re: [GENERAL] Full text search ts_heading strange result
On 07/26/2012 02:14 PM, Johann Spies wrote:
> [quoted: the exchange above about the 'simple' configuration and the multi-language database]
> If there is a better way than using the 'simple' configuration in this case, I would gladly try it.

You'll need to store language information alongside each text value if you want to do anything more sophisticated. If you have mixed languages within a single text value, or if you don't store information about the language a text value is in, then you're largely out of luck.

--
Craig Ringer
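Craig's "store language information alongside each text value" can be made concrete with a per-row regconfig column (a sketch with assumed table and column names; the same pattern appears in the PostgreSQL full text search documentation):

```sql
-- One text search configuration per row; unknown languages fall back to 'simple'
ALTER TABLE docs ADD COLUMN cfg regconfig NOT NULL DEFAULT 'simple';

-- A single expression index covers all languages at once
CREATE INDEX docs_tsv_idx ON docs USING gin (to_tsvector(cfg, body));

-- Queries must use the same expression to hit the index
SELECT * FROM docs
WHERE to_tsvector(cfg, body) @@ to_tsquery('english', 'wisdom');
```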
Re: [GENERAL] Full text search ts_heading strange result
On Thu, Jul 26, 2012 at 04:19:02PM +0800, Craig Ringer wrote:
> You'll need to store language information alongside each text value if you want to do anything more sophisticated.

I was afraid that that would be the case :) I will have to update more than 32 entries which currently have inconsistent language indications, and some of them none at all.

Thanks for responding.

Regards
Johann
Re: [GENERAL] Full text search ts_heading strange result
Johann Spies <jsp...@sun.ac.za> writes:
> On Thu, Jul 26, 2012 at 04:19:02PM +0800, Craig Ringer wrote:
>> You'll need to store language information alongside each text value if you want to do anything more sophisticated.
> I was afraid that that would be the case :)

I'm not sure that there's anything horribly wrong with the strategy of using simple for everything. You won't get language-aware stemming, but maybe you don't need that. The problem with what you originally posted was not that simple was inadequate, but that you weren't applying it consistently --- you didn't have default_text_search_config set to match.

regards, tom lane
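Tom's consistency point can be checked and fixed in one place (a sketch; setting the GUC per-database is one option among several, and the database name is an assumption):

```sql
-- See which configuration unadorned to_tsquery()/ts_headline() will use
SHOW default_text_search_config;

-- Make it match the configuration used explicitly elsewhere
SET default_text_search_config = 'simple';

-- Or persistently, for one database:
ALTER DATABASE mydb SET default_text_search_config = 'simple';
```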
[GENERAL] Full text search ts_heading strange result
I am beginning to use the full text search facilities in PostgreSQL (9.0) and find the result of this query a bit strange:

SELECT ts_headline('simple', title, to_tsquery('kerkreg | (church & polity)'))
FROM akb_articles a
WHERE a.tsv @@ 'kerkreg | (church & polity)';

Result:

Kerkvereniging en <b>Kerkreg</b>: Geskiedenis, beginsel en praktyk. (<b>Church</b> unity and <b>church</b> polity: History, principle and practice.)

Why is 'polity' not highlighted?

Regards
Johann

--
Johann Spies                    Telefoon: 021-808 4699
Databestuurder / Data manager
Sentrum vir Navorsing oor Evaluasie, Wetenskap en Tegnologie
Centre for Research on Evaluation, Science and Technology
Universiteit Stellenbosch.

"Look not every man on his own things, but every man also on the things of others." Philippians 2:4
Re: [GENERAL] Full text search ts_heading strange result
Johann Spies <jsp...@sun.ac.za> writes:
> I am beginning to use the full text search facilities in Postgresql (9.0) and find the result of this query a bit strange:
> SELECT ts_headline('simple', title, to_tsquery('kerkreg | (church & polity)'))
> FROM akb_articles a
> WHERE a.tsv @@ 'kerkreg | (church & polity)';
> [...]
> Why is 'polity' not highlighted?

I believe the problem is that the one-argument form of to_tsquery() uses the default TS configuration, which you have probably not got set to simple. For me, the default TS configuration is english, which will stem polity as politi:

regression=# select to_tsquery('(polity & church)');
      to_tsquery
---------------------
 'politi' & 'church'
(1 row)

However the simple configuration doesn't do anything to that lexeme:

regression=# select to_tsquery('simple', '(polity & church)');
      to_tsquery
---------------------
 'polity' & 'church'
(1 row)

So what you've got is ts_headline() parsing the given title against the simple configuration and getting polity, but the tsquery is looking for politi, hence no match.

In short: omit the 'simple' argument from the ts_headline call, and things should play together better. You could alternatively insert to_tsquery('simple', '(polity & church)'), but that won't exactly match what the @@ in WHERE is doing: that's going to use the default configuration.

regards, tom lane
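Tom's suggestion applied to the original query would look roughly like this (a sketch; it assumes, as the thread implies, that akb_articles.tsv was built with the default configuration):

```sql
-- Let ts_headline and to_tsquery both use the default TS configuration,
-- so the headline parser and the query stem words the same way
SELECT ts_headline(title, to_tsquery('kerkreg | (church & polity)'))
FROM akb_articles a
WHERE a.tsv @@ to_tsquery('kerkreg | (church & polity)');
```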
[GENERAL] Full text search advice requested
I have a table with bibliometric information on published articles. Fields of interest for full text searches are the 'title' and 'abstract' fields. Those fields can contain several languages, but most of the entries use English. A grouped query on the 'language' field reveals that the following languages are involved:

Afrikaans, Chinese, Dutch, English, French, Gaelic (?), German, Hungarian, Italian, Japanese, Korean, Polish, Portuguese, Rumanian, Russian, Slovene, Sotho, Spanish, Turkish, Xhosa, Zulu

Now my questions:

1. Is it possible at all to use full text search in such a setup?
2. If so, how would I approach the different languages in indexing and querying?
3. How do I ask PostgreSQL which dictionaries are already available in the installation for full text search?
4. If full text searches cannot be utilised in such a setup, can trgm-related indexing using 'similarity' be a replacement? I think not.

Regards
Johann

--
Johann Spies                    Telefoon: 021-808 4699
Databestuurder / Data manager
Sentrum vir Navorsing oor Evaluasie, Wetenskap en Tegnologie
Centre for Research on Evaluation, Science and Technology
Universiteit Stellenbosch.

"Delight thyself also in the LORD: and he shall give thee the desires of thine heart." Psalms 37:4
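Question 3 at least has a direct answer in the system catalogs (a side note, not an answer given in the thread):

```sql
-- Dictionaries and configurations installed in this database
SELECT dictname FROM pg_catalog.pg_ts_dict;
SELECT cfgname  FROM pg_catalog.pg_ts_config;
-- In psql, \dFd and \dF display the same information
```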
Re: [GENERAL] full text search and ILIKE type clauses.
Tom,

We made most of our text and varchar columns citext data types so that we could do case-insensitive searches. Is this going to negate most of the index searches? It appeared to our DBA that it would be easier to use the citext data type than to need to use ILIKE instead.

Michael Gould
Intermodal Software Solutions, LLC
904-226-0978

-------- Original Message --------
Subject: Re: [GENERAL] full text search and ILIKE type clauses.
From: Tom Lane <t...@sss.pgh.pa.us>
Date: Sun, March 11, 2012 7:56 pm
To: Tim Uckun <timuc...@gmail.com>
Cc: pgsql-general <pgsql-general@postgresql.org>
> [quoted: Tom Lane's reply suggesting contrib/pg_trgm, reproduced in full later in this thread]
Re: [GENERAL] full text search and ILIKE type clauses.
> We made most of our text, varchar columns citext data types so that we could do case insensitive searches.

In the same vein... does Postgres have case-insensitive collations yet? Now that 9.1 supports column-level collations, that would be a really great option for case-insensitive queries.
Re: [GENERAL] full text search and ILIKE type clauses.
Tim,

It is my understanding that the citext extension gives you what you're asking for, and at least at this point it isn't going to be part of core.

Michael Gould
Intermodal Software Solutions, LLC
904-226-0978

-------- Original Message --------
Subject: Re: [GENERAL] full text search and ILIKE type clauses.
From: Tim Uckun <timuc...@gmail.com>
Date: Mon, March 12, 2012 1:20 pm
> [quoted: the question above about case-insensitive collations]
Re: [GENERAL] full text search and ILIKE type clauses.
> It is my understanding that since the extension citext is available, this gives you what you're asking for, and at least at this point it isn't going to be part of the core.

For me it's more of a workaround than a solution, but yes, probably good enough. Collation is more subtle than case-insensitive comparisons, but for English anyway it should be good enough.
Re: [GENERAL] full text search and ILIKE type clauses.
2012/3/12 Tim Uckun <timuc...@gmail.com>:
> For me it's more of a workaround than a solution, but yes, probably good enough. Collation is more subtle than case-insensitive comparisons, but for English anyway it should be good enough.

Postgres uses system locales - so theoretically you can write your own case-insensitive locale.

Regards
Pavel Stehule
[GENERAL] full text search and ILIKE type clauses.
I want to be able to search a lot of fields using queries that use ILIKE, and unfortunately many of the queries will be using '%SOMETHING%' or '%SOMETHING' type clauses. Since indexes are useless on those, I was thinking I could use tsvectors, but I can't figure out how to accomplish this. One option I have seen recommended is to create a tsvector field and concat the various fields into that field as a tsvector. That would work for general text search, but I am not sure how I could query FIELD1 ILIKE '%SOMETHING%' vs FIELD2 ILIKE '%SOMETHING%'. Would using hstore help in this case? I should point out that some of the fields are numbers. I am willing to deal with those separately by building other indexes for them, but of course a unified approach would be preferable. Any help would be much appreciated. Thanks.
Re: [GENERAL] full text search and ILIKE type clauses.
Tim Uckun timuc...@gmail.com writes: I want to be able to search a lot of fields using queries that use ILIKE and unfortunately many of the queries will be using the '%SOMETHING%' or '%SOMETHING' type clauses. Since indexes are useless on those I was thinking I could use tsvectors but I can't figure out how to accomplish this. Full text search is not going to help for this unless you are willing to be very lax about replicating the semantics of ILIKE. For example, ILIKE '%foo%' should match foo anywhere within a word, but FTS is not going to be able to do better than finding words that begin with foo. If you're using 9.1, you might look into contrib/pg_trgm instead. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full text search and ILIKE type clauses.
If you're using 9.1, you might look into contrib/pg_trgm instead. If I were to use pg_trgm, would it be better to create a trigram index on each text field? In the past I have created a text field which contains the rest of the fields concatenated. That works great as long as you are looking for any word. Is there a way to specify which word should match? Could I combine hstore and pg_trgm to match against individual words? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
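[Editor's note: a minimal sketch of Tom's pg_trgm suggestion, assuming PostgreSQL 9.1+ with the contrib module available; the table and column names are illustrative, not from the thread. A trigram index per field lets the planner accelerate ILIKE '%...%' on each field independently:]

```sql
-- Assumed: PostgreSQL 9.1+; "docs", "field1", "field2" are hypothetical names.
CREATE EXTENSION pg_trgm;

-- One trigram index per text column you want to search independently.
CREATE INDEX docs_field1_trgm ON docs USING gin (field1 gin_trgm_ops);
CREATE INDEX docs_field2_trgm ON docs USING gin (field2 gin_trgm_ops);

-- Since 9.1, pg_trgm indexes can serve LIKE/ILIKE, including '%...%' patterns:
SELECT id FROM docs WHERE field1 ILIKE '%something%';
SELECT id FROM docs WHERE field2 ILIKE '%something%';
```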
Re: [GENERAL] Full text search ranking: ordering using index and proximity ranking with OR queries
According to Oleg in a previous discussion, ts_rank does not use the index because the index does not store enough information for ranking: http://archives.postgresql.org/pgsql-general/2011-07/msg00351.php On Sat, Feb 18, 2012 at 12:39, Andrey Chursin an...@danasoft.ws wrote: Hello, I have two unrelated questions about the fts function ts_rank: 1) I've created a GiST index on a column with an fts vector, but the query SELECT * FROM table ORDER BY ts_rank(field, :query) LIMIT 20 is performed with a sequential table scan. The index was created on the field column. Does this mean FTS indexes do not support ordering by ranking? Or do I somehow need to create a separate index for ranking? 2) I have a misunderstanding of how proximity ranking works. Given the two vectors 'a:1 b:2' and 'a:1 b:1000', I am measuring ts_rank(vector, 'a' | 'b'). And it is equal! But when I replace the query operator, e.g. asking for ts_rank(vector, 'a' & 'b'), I get different numbers. Why do I get proximity ranking only for AND fts queries? This is a problem because to_tsquery produces OR queries, so I need self-written postprocessing of the query to replace OR with AND. -- Regards, Andrey -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Nicolas Grilly Garden / Vocation City +33 1 45 72 48 78 - office +33 6 03 00 25 34 - mobile www.gardentechno.com - Développement web reporting / *Web development data analytics* www.vocationcity.com - Plateforme de recrutement sur le web / *Web recruitment platform*
Re: [GENERAL] Full text search ranking: ordering using index and proximity ranking with OR queries
Is there any way to sort by ranking while avoiding a seq scan? The only way I see now is to use pg_trgm instead of ts_rank, but we have not yet checked how applicable it is for our purposes. On 7 March 2012 at 20:53, Nicolas Grilly nico...@gardentechno.com wrote: According to Oleg in a previous discussion, ts_rank does not use the index because the index does not store enough information for ranking: http://archives.postgresql.org/pgsql-general/2011-07/msg00351.php On Sat, Feb 18, 2012 at 12:39, Andrey Chursin an...@danasoft.ws wrote: Hello, I have two unrelated questions about the fts function ts_rank: 1) I've created a GiST index on a column with an fts vector, but the query SELECT * FROM table ORDER BY ts_rank(field, :query) LIMIT 20 is performed with a sequential table scan. The index was created on the field column. Does this mean FTS indexes do not support ordering by ranking? Or do I somehow need to create a separate index for ranking? 2) I have a misunderstanding of how proximity ranking works. Given the two vectors 'a:1 b:2' and 'a:1 b:1000', I am measuring ts_rank(vector, 'a' | 'b'). And it is equal! But when I replace the query operator, e.g. asking for ts_rank(vector, 'a' & 'b'), I get different numbers. Why do I get proximity ranking only for AND fts queries? This is a problem because to_tsquery produces OR queries, so I need self-written postprocessing of the query to replace OR with AND.
-- Regards, Andrey -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full text search ranking: ordering using index and proximity ranking with OR queries
In a previous discussion thread, Oleg suggested that ts_rank is unable to use GIN indices: http://archives.postgresql.org/pgsql-general/2011-07/msg00351.php This is the only information I have about this. On Wed, Mar 7, 2012 at 18:59, Andrey Chursin an...@danasoft.ws wrote: Is there any way to sort by ranking, avoiding seq scan? The only way i see now is to use pg_trgm instead of ts_rank, but we did not check yet how applicable is it for our purposes.
[GENERAL] Full text search ranking: ordering using index and proximity ranking with OR queries
Hello, I have two unrelated questions about the fts function ts_rank: 1) I've created a GiST index on a column with an fts vector, but the query SELECT * FROM table ORDER BY ts_rank(field, :query) LIMIT 20 is performed with a sequential table scan. The index was created on the field column. Does this mean FTS indexes do not support ordering by ranking? Or do I somehow need to create a separate index for ranking? 2) I have a misunderstanding of how proximity ranking works. Given the two vectors 'a:1 b:2' and 'a:1 b:1000', I am measuring ts_rank(vector, 'a' | 'b'). And it is equal! But when I replace the query operator, e.g. asking for ts_rank(vector, 'a' & 'b'), I get different numbers. Why do I get proximity ranking only for AND fts queries? This is a problem because to_tsquery produces OR queries, so I need self-written postprocessing of the query to replace OR with AND. -- Regards, Andrey -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
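[Editor's note: a common mitigation, sketched here with hypothetical table and column names: ts_rank itself cannot use the index, but a GIN/GiST-backed @@ filter can narrow the candidate set first, so ranking only runs over the matching rows:]

```sql
-- Assumed schema: table "articles" with a tsvector column "fts" and
--   CREATE INDEX articles_fts_idx ON articles USING gin (fts);
SELECT id, ts_rank(fts, q) AS rank
FROM articles, to_tsquery('a & b') AS q
WHERE fts @@ q          -- index-assisted filter shrinks the row set
ORDER BY rank DESC      -- ts_rank evaluated only for matching rows
LIMIT 20;
```

This does not avoid a sort over the matches, but for selective queries it avoids ranking the whole table.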
[GENERAL] Full Text Search, avoiding lexemes search
Hi guys! Full text search searches by lexemes; this means that if you are searching for the word gato you are really searching for the lexeme {gat}. If you construct vectors for the words gato, gatos, gata, gatas, all have the same lexeme {gat}. Then a search for gato (that is to say, a search for {gat}) matches all the previous vectors. Is there some way (configuration, query) to match only gato and avoid gatos, gata, gatas with FTS? Or to match only gato and gatos but not gata and gatas? Thanks!
Re: [GENERAL] Full Text Search, avoiding lexemes search
Daniel, just use a different fts configuration for search, one which doesn't include stemmers. Regards, Oleg On Fri, 27 Jan 2012, Daniel Vázquez wrote: Hi guys! Full text search searches by lexemes; this means that if you are searching for the word gato you are really searching for the lexeme {gat}. If you construct vectors for the words gato, gatos, gata, gatas, all have the same lexeme {gat}. Then a search for gato (that is to say, a search for {gat}) matches all the previous vectors. Is there some way (configuration, query) to match only gato and avoid gatos, gata, gatas with FTS? Or to match only gato and gatos but not gata and gatas? Thanks! Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
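[Editor's note: a minimal sketch of Oleg's suggestion; the configuration name is illustrative. Mapping the token types to the simple dictionary keeps words whole (lowercased but not stemmed), so 'gato' no longer collapses to {gat}:]

```sql
-- A search configuration without stemmers: tokens are kept as literal words.
CREATE TEXT SEARCH CONFIGURATION public.spanish_exact (COPY = pg_catalog.spanish);
ALTER TEXT SEARCH CONFIGURATION spanish_exact
  ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
  WITH simple;

-- The query lexeme is now 'gato', which matches only the literal token 'gato',
-- not 'gatos', 'gata', or 'gatas':
SELECT to_tsvector('spanish_exact', 'gatos gata gatas')
       @@ to_tsquery('spanish_exact', 'gato');
```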
[GENERAL] Full text search - How to build a filtering dictionary
Hi all, I need to build a synonym dictionary that performs a normalization of tokens just like a filtering dictionary does. I've searched for a filtering dictionary template but I haven't found it. Where can I find it? Or, if there isn't such a template, how can I build a simple filter dictionary that simply maps a term to another (in a synonym dict-like way)? Thanks in advance, Antonio -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full text search - How to build a filtering dictionary
Antonio, you can look at the contrib/unaccent dictionary, which is a filtering dictionary. I have a page about it - http://mira.sai.msu.su/~megera/wiki/unaccent Oleg On Wed, 18 Jan 2012, Antonio Franzoso wrote: Hi all, I need to build a synonym dictionary that performs a normalization of tokens just like a filtering dictionary does. I've searched for a filtering dictionary template but I haven't found it. Where can I find it? Or, if there isn't such a template, how can I build a simple filter dictionary that simply maps a term to another (in a synonym dict-like way)? Thanks in advance, Antonio Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
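[Editor's note: for reference, wiring the unaccent filtering dictionary into a configuration looks roughly like this, assuming a PostgreSQL version where unaccent ships as an extension; the configuration name is illustrative. unaccent normalizes the token and passes it on to the next dictionary in the chain, which is exactly the filtering behavior Antonio is after:]

```sql
CREATE EXTENSION unaccent;

-- unaccent filters (strips accents from) each token, then french_stem
-- receives the normalized token.
CREATE TEXT SEARCH CONFIGURATION fr_unaccent (COPY = french);
ALTER TEXT SEARCH CONFIGURATION fr_unaccent
  ALTER MAPPING FOR hword, hword_part, word
  WITH unaccent, french_stem;

SELECT to_tsvector('fr_unaccent', 'Hôtels') @@ to_tsquery('fr_unaccent', 'hotel');
```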
Re: [GENERAL] Full text search - How to build a filtering dictionary
Thanks for the reply. Is there any simpler way? I just have to do a simple mapping (similar to a synonym dictionary), set the TSL_FILTER flag (if there's a mapping for a token), and then pass the normalized token to my own thesaurus dictionary. I'm working on Windows and I'd have to write a C library to do these operations (and I cannot see the unaccent code because it's a dll file). If there is no other solution, I thought I could integrate this filtering dictionary into the thesaurus in a similar way: token: lemma, term1, term2, ... where token is the denormalized term, lemma is one entry of the thesaurus, and term1, term2, ... are the terms associated with lemma in the original thesaurus structure. What do you think about this solution? On 18/01/2012 17:40, Oleg Bartunov wrote: Antonio, you can look at the contrib/unaccent dictionary, which is a filtering dictionary. I have a page about it - http://mira.sai.msu.su/~megera/wiki/unaccent Oleg On Wed, 18 Jan 2012, Antonio Franzoso wrote: Hi all, I need to build a synonym dictionary that performs a normalization of tokens just like a filtering dictionary does. I've searched for a filtering dictionary template but I haven't found it. Where can I find it? Or, if there isn't such a template, how can I build a simple filter dictionary that simply maps a term to another (in a synonym dict-like way)? Thanks in advance, Antonio Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full text search to_tsquery performance with ispell dictionary
On 11.05.11 17:04, t...@fuzzy.cz t...@fuzzy.cz wrote: We had exactly the same problem and persistent connections solved it. First testing with persistent connections seems to work like a charm. Will do some thorough testing and watch the memory load. Hopefully, I will not trip over some sort of pitfall. Google seems to be full of people who have problems with persistent connections. Big thanks for your advice. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full text search to_tsquery performance with ispell dictionary
On 11.05.11 17:31, Tom Lane t...@sss.pgh.pa.us wrote: You really, really, really need to fix whatever is preventing you from using pooling. Opening a database connection to run one query is just horridly inefficient. Very true. I did not mean that anything actually prevents us from using pooling. We just have no idea how it will interfere with our production pgcluster setup. I imagine the evaluation, testing and verification of pooling systems in combination with our setup to be quite tedious. Of course, I don't open a connection for each query. One is opened for each service call. The services are designed to start their own process for every call. Such a process - for now - needs its own connection. It usually handles dozens of queries. Until now, we never ran into performance problems. The time consumed by DB operations is usually negligible compared to the rest. First tests with a simple persistent connection setup seem to work fine and solve the performance issue. I plan to do some thorough testing on this setup and see if I step into a pitfall. Big kudos to you and this list. You were a great help, as always. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] full text search to_tsquery performance with ispell dictionary
Hello everybody, I was experimenting with the FTS feature on postgres 8.3.4 lately and encountered a weird performance issue when using a custom FTS configuration. I use this german ispell dictionary, re-encoded to utf8: http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz With the following configuration:

CREATE TEXT SEARCH CONFIGURATION public.german_de (COPY = pg_catalog.german);
CREATE TEXT SEARCH DICTIONARY german_de_ispell (
    TEMPLATE = ispell,
    DictFile = german_de_utf8,
    AffFile = german_de_utf8,
    StopWords = german_de_utf8
);
ALTER TEXT SEARCH CONFIGURATION german_de
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH german_de_ispell, german_stem;

So far so good. Indexing and creation of tsvectors works like a charm. The problem is that if I open a new connection to the database and do something like this

SELECT to_tsquery('german_de', 'abcd');

it takes A LOT of time for the query to complete for the first time - about 1-1.5s. If I submit the same query for a second, third, fourth time and so on, it takes only some 10-20ms, which is what I would expect. It almost seems as if the dictionary is somehow analyzed or indexed and the results cached for each connection, which seems counter-intuitive to me. After all, the dictionaries should not change that often. Did I miss something or did I do something wrong? I'd be thankful for any advice. Kind Regards -- Stanislav Raskin livn GmbH Campus Freudenberg Rainer-Gruenter-Str. 21 42119 Wuppertal +49(0)202-8 50 66 921 ras...@livn.de http://www.livn.de livn local individual video news GmbH Registergericht Wuppertal HRB 20086 Geschäftsführer: Dr. Stefan Brües Alexander Jacob
Re: [GENERAL] full text search to_tsquery performance with ispell dictionary
Hello 2011/5/11 Stanislav Raskin ras...@livn.de: Hello everybody, I was experimenting with the FTS feature on postgres 8.3.4 lately and encountered a weird performance issue when using a custom FTS configuration. I use this german ispell dictionary, re-encoded to utf8: http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz With the following configuration: CREATE TEXT SEARCH CONFIGURATION public.german_de (COPY = pg_catalog.german); CREATE TEXT SEARCH DICTIONARY german_de_ispell ( TEMPLATE = ispell, DictFile = german_de_utf8, AffFile = german_de_utf8, StopWords = german_de_utf8 ); ALTER TEXT SEARCH CONFIGURATION german_de ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part WITH german_de_ispell, german_stem; So far so good. Indexing and creation of tsvectors works like a charm. The problem is that if I open a new connection to the database and do something like this SELECT to_tsquery('german_de', 'abcd'); it takes A LOT of time for the query to complete for the first time - about 1-1.5s. If I submit the same query for a second, third, fourth time and so on, it takes only some 10-20ms, which is what I would expect. It almost seems as if the dictionary is somehow analyzed or indexed and the results cached for each connection, which seems counter-intuitive to me. After all, the dictionaries should not change that often. Did I miss something or did I do something wrong? I'd be thankful for any advice. Kind Regards This is expected behaviour :( . Loading an ispell dictionary is very slow. Use the german snowball stemmer instead. You can also use some connection pooling software. Regards Pavel Stehule -- Stanislav Raskin livn GmbH Campus Freudenberg Rainer-Gruenter-Str. 21 42119 Wuppertal +49(0)202-8 50 66 921 ras...@livn.de http://www.livn.de livn local individual video news GmbH Registergericht Wuppertal HRB 20086 Geschäftsführer: Dr.
Stefan Brües Alexander Jacob -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full text search to_tsquery performance with ispell dictionary
Stanislav Raskin ras...@livn.de writes: The problem is, that if I open a new connection to the database and do something like this SELECT to_tsquery('german_de', 'abcd'); it takes A LOT of time for the query to complete for the first time. About 1-1,5s. If I submit the same query for a second, third, fourth time and so on, it takes only some 10-20ms, which is what I would expect. It almost seems as if the dictionary is somehow analyzed or indexed and the results cached for each connection, which seems counter-intuitive to me. Yes, loading a large dictionary is known to be a fairly expensive operation. There's been discussions about how to make it cheaper, but nothing's been done yet. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
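[Editor's note: one pragmatic workaround, sketched here, that only pays off together with long-lived or pooled connections: pre-warm each new session so the dictionary-load cost is paid once at connect time rather than on the first user-facing query. The query text is illustrative; any word run through the ispell configuration triggers the load:]

```sql
-- Run once per new connection (e.g. from a pooler's connect hook):
-- the first to_tsquery() call in a session loads the ispell dictionary;
-- all later calls in the same session reuse the cached copy.
SELECT to_tsquery('german_de', 'warmup');
```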
Re: [GENERAL] full text search to_tsquery performance with ispell dictionary
Hello 2011/5/11 Stanislav Raskin ras...@livn.de: On 11.05.11 15:45, Pavel Stehule pavel.steh...@gmail.com wrote: This is expected behaviour :( . Loading an ispell dictionary is very slow. Use the german snowball stemmer instead. You can also use some connection pooling software. Thank you for the response. Is the dictionary german_stem supplied with postgresql a snowball stemmer? If yes, it sure is incredibly fast, but yields much worse results and thus fewer and worse matches for search queries. Using connection pooling is... difficult in my situation, to say the least. We currently use quite a complex pgcluster/corosync setup for multi-master replication, load balancing and high availability. To introduce connection pooling to this setup could turn out to be quite a big project. German_stem is part of the distribution. I think the stemmer's results are usable, because reports about slow speed are not frequent. There is no Czech stemmer, so we have to use ispell. I wrote a patch that stores the loaded dictionary in shared memory. You can find the source code in the pgsql-hackers mailing list archive. But it isn't well tested and it is just a prototype - not accepted into pg. You can test it. Sometimes people use the simple configuration here. It isn't the best, but it is fast. Regards Pavel Stehule -- Stanislav Raskin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full text search to_tsquery performance with ispell dictionary
Yes, loading a large dictionary is known to be a fairly expensive operation. There's been discussions about how to make it cheaper, but nothing's been done yet. regards, tom lane Hi Tom, thanks for the quick response. Bad news for me ;( We develop ajax-driven web apps, which sort of rely on quick calls to data services. Each call to a service opens a new connection. This makes the search service, if using fts and ispell, about 100 times slower than a dumb ILIKE-implementation. Is there any way of hack or compromise to achieve good performance without losing fts ability? I am thinking, for example, of a way to permanently keep a loaded dictionary in memory instead of loading it for every connection. As I wrote in response to Pavel Stehule's post, connection pooling is not really an option. Our front-end is strictly PHP, so I was thinking about using a single persistent connection (http://de.php.net/manual/en/function.pg-pconnect.php) for all calls. Is there some sort of major disadvantage in this approach from the database point of view? Kind regards -- Stanislav Raskin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full text search to_tsquery performance with ispell dictionary
On 11.05.11 15:45, Pavel Stehule pavel.steh...@gmail.com wrote: This is expected behaviour :( . Loading an ispell dictionary is very slow. Use the german snowball stemmer instead. You can also use some connection pooling software. Thank you for the response. Is the dictionary german_stem supplied with postgresql a snowball stemmer? If yes, it sure is incredibly fast, but yields much worse results and thus fewer and worse matches for search queries. Using connection pooling is... difficult in my situation, to say the least. We currently use quite a complex pgcluster/corosync setup for multi-master replication, load balancing and high availability. To introduce connection pooling to this setup could turn out to be quite a big project. -- Stanislav Raskin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full text search to_tsquery performance with ispell dictionary
Yes, loading a large dictionary is known to be a fairly expensive operation. There's been discussions about how to make it cheaper, but nothing's been done yet. regards, tom lane Hi Tom, thanks for the quick response. Bad news for me ;( We develop ajax-driven web apps, which sort of rely on quick calls to data services. Each call to a service opens a new connection. This makes the search service, if using fts and ispell, about 100 times slower than a dumb ILIKE-implementation. Is there any way of hack or compromise to achieve good performance without losing fts ability? I am thinking, for example, of a way to permanently keep a loaded dictionary in memory instead of loading it for every connection. As I wrote in response to Pavel Stehule's post, connection pooling is not really an option. Our front-end is strictly PHP, so I was thinking about using a single persistent connection (http://de.php.net/manual/en/function.pg-pconnect.php) for all calls. Is there some sort of major disadvantage in this approach from the database point of view? Kind regards Hi, opening a completely new connection for each request may be a bit expensive, so I'd recommend using some kind of connection pooling, especially when you're doing 'small' transactions (because that's when the overhead matters). We had exactly the same problem and persistent connections solved it. But it has some drawbacks too - each connection has its own copy of the dictionary. So if the dictionary takes 30MB and you have 10 connections, then 300 MB of memory is used. regards Tomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full text search to_tsquery performance with ispell dictionary
2011/5/11 Stanislav Raskin ras...@livn.de: On 11.05.11 16:42, Pavel Stehule pavel.steh...@gmail.com wrote: I wrote a patch that stores the loaded dictionary in shared memory. Hi Pavel, very interesting. I will give it a closer look. What do you think about using ispell to create, store and index tsvectors, but at the same time using the stemmer to create tsqueries? Thus, I hope to create a fast user experience while searching, but at the same time have more detailed tsvectors and more matches. I know nothing - but there can be a risk - two different methods can return two different results. Regards Pavel Regards -- Stanislav Raskin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general