Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-21 Thread George Neuner
On Thu, 20 Apr 2017 07:56:18 -0700, Rj Ewing 
wrote:

>On Wed, Apr 19, 2017 at 6:44 PM, George Neuner  wrote:
>>
>> If you can restrict the FTS query to certain keys:
>>
>>   SELECT id FROM mytable
>> WHERE tsquery( ... ) @@ to_tsvector(v)
>> AND k IN ( ... )
>> GROUP BY id
>>
>>   [note: according to David Rowley, GROUP BY may be parallelized
>>  whereas  DISTINCT currently cannot be.]
>>
>> then given an index on 'k' it may be much faster than just the FTS
>> query alone.  Subject to key variability, it also may be improved by
>> table partitioning to reduce the search space.
>>
>> If the FTS query is key restricted, you can parallelize either on the
>> client or on the server.  If the FTS query is not key restricted, you
>> pretty much are limited to server side (and 9.6 or later).
>>
>I'll look into parallelism if we can't get the performance we need.
>
>What do you mean if I can restrict the FTS query to certain keys? I'm not
>an SQL expert, but it seems like the above query would match multiple keys
>to 1 tsquery value


You weren't specific as to the types of queries you wanted ... you
mentioned somewhere higher up in the discussion:

> ... a basic full text query on 44 million rows is taking approx. 20ms.

That implied you wanted to FTS search every row.  Only later did you
give an example that tied FTS patterns to particular keys.  Until you
did that, there was no reason to assume the FTS search was targeted -
you might have wanted e.g., records where *any* k:v value matched the
FTS pattern.

[The takeaway here is: "try to be as specific as possible". 8-) ]


Obviously you can associate a FTS pattern with a particular key value
- just AND the conditions in the WHERE or HAVING clauses.

But be aware that, in general, the more conditions you place on a
query, the slower it runs.


George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-21 Thread George Neuner
On Thu, 20 Apr 2017 08:50:31 -0700, Rj Ewing 
wrote:

>On Wed, Apr 19, 2017 at 9:55 PM, George Neuner  wrote:
>
>> ... Since you are *testing* with 1M records (that
>> create 44M k:v shards), I am assuming you will need to deal with much
>> more than that in deployment.  And if you think you need FTS, then you
>> must be expecting more than simple word matches [as below], else you
>> might do something simpler like
>>
>>   SELECT ...
>> WHERE val ILIKE <pattern>
>
>the 1M records would most likely be the max. On average the tables would
>have more like 100,000 records each.

Ok, so my assumption was way off ... you should be able to achieve the
timing you want with appropriate indexing. 

>from my understanding, *ILIKE* doesn't do any text normalization, which is
>something we would like to have.

Right. If you want rooting/stemming or dictionary translation, then
you do need to use FTS.
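For a quick illustration of the difference (built-in functions only):
ILIKE is a plain case-insensitive substring match, while
to_tsvector/to_tsquery normalize and stem both sides of the comparison:

  SELECT 'Running quickly' ILIKE '%runs%';     -- false: raw substring match, no stemming

  SELECT to_tsvector('english', 'Running quickly')
         @@ to_tsquery('english', 'runs');     -- true: 'Running' and 'runs' both stem to 'run'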


>> >how would I write an AND query that filtered on 2 separate keys from the
>> >samples_lg_txt table?
>> >
>> >something like:
>> >
>> > SELECT COUNT(*) FROM samples WHERE id IN ( SELECT DISTINCT(s.id) FROM
>> > samples_lg_txt s JOIN keys k ON s.key = k.id WHERE (*name = 'key1' AND
>> > tsv @@ to_tsquery('value1')) AND (name = 'key2' AND tsv @@
>> > to_tsquery('value2'))*;
>>
>> You're overthinking it
>>
>>   SELECT count(distinct s.id)
>> FROM  samples_lg_txt AS s
>> JOIN  keys AS k ON k.id = s.key
>> WHERE (k.name = 'key1' AND s.tsv @@ to_tsquery('value1')
>>    OR (k.name = 'key2' AND s.tsv @@ to_tsquery('value2')
>
>but that is an OR query, I'm trying to do an AND query.

Sorry, I missed the AND in your original query.  Still the nested
SELECT is unnecessary.

Postgresql's planner/optimizer is pretty smart, and probably would
collapse your code into mine (modulo the AND/OR goof), but I prefer
not to rely on the planner to be smart ... that gets you into trouble
when you have to switch between DBMS.
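For the AND-across-keys case itself, one sketch (reusing the table and
column names from your example, and spelling the matches with to_tsquery)
is to let each k:v row satisfy its own condition and then require that
both keys matched for a given id:

  SELECT count(*)
  FROM (
    SELECT s.id
    FROM   samples_lg_txt AS s
    JOIN   keys AS k ON k.id = s.key
    WHERE (k.name = 'key1' AND s.tsv @@ to_tsquery('value1'))
       OR (k.name = 'key2' AND s.tsv @@ to_tsquery('value2'))
    GROUP  BY s.id
    HAVING count(DISTINCT k.name) = 2    -- both keys matched for this id
  ) AS matched;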


>> There's actually no need to join if you can use the key name instead
>> of an integer id.  You can FK on strings, so you can still maintain an
>> identity table of keys.  E.g.,
>>
>> > id | integer   |
>> > key| varchar(32) | FK key(name) ...
>> > val| text  |
>> > tsv| tsvector  |
>>
>>
>> Then the query could be just
>>
>>   SELECT count(distinct id)
>> FROM  samples_lg_txt
>> WHERE (key = 'key1' AND tsv @@ to_tsquery('value1')
>>    OR (key = 'key2' AND tsv @@ to_tsquery('value2')
>>
>
>This would make queries simpler.

Yes - it eliminates the joins, and the query runs on a single table.

>I guess a disadvantage to using a string
>for the key is that the db size would be larger, and thus not as likely to
>fit the entire table in RAM. If there are only 63 keys across 44M rows, it
>seems that storing a smallint would take less space than storing the
>string.

Maybe.  Using the integer FK reduces the table size, but it requires a
join with the foreign table.  A join of two tables requires indexes
for the join columns on both tables [which may or may not already
exist], and produces [variously] a temporary hash or key relation
table that represents the rows of the "joined" table.  These temporary
structures can grow very large and may have to spill onto disk.

You can somewhat control that with the work_mem setting.  But remember
that the setting applies to every operation of every concurrent query
... so setting work_mem very high can backfire.
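If one particular query really does need more memory, a safer pattern than
raising work_mem globally is a session- or transaction-local setting, e.g.:

  SET work_mem = '64MB';           -- affects only the current session

  BEGIN;
  SET LOCAL work_mem = '256MB';    -- reverts automatically at COMMIT/ROLLBACK
  -- ... run the heavy query here ...
  COMMIT;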


So saving one place can cost you in another.  TANSTAAFL.


>I don't really have a need for the identity table of keys. It's only
>purpose was to shrink the database size.
>
>Thanks again for the detailed responses!


George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-20 Thread Rj Ewing
On Wed, Apr 19, 2017 at 9:55 PM, George Neuner  wrote:

> On Wed, 19 Apr 2017 16:28:13 -0700, Rj Ewing 
> wrote:
>
> >okay, messing around a bit more with the secondary k,v table it seems like
> >this could be a good solution..
> >
> >I created a keys table to hold the 63 key values, then I dropped and
> >recreated the secondary table, using a FK referencing the keys table. I'm
> >not really sure why, but a basic full text query on 44 million rows is
> >taking approx. 20ms.
>
> That pretty much confirms your statistics were bad ... using the FK
> table or not wouldn't make any difference to the planner.
>
> But if you are getting 20ms on 44M rows, then one or more of the
> following must be true:
>  - your text values must be very short
>  - your FTS queries must be very simple
>  - you aren't reading the results
>

Text is on average very short: 1-3 words per value. It was a count(*)
query with only a single condition.


> For comparison:
>
> I have an application that does FTS on a table of NAICS descriptions
> indexed using tsvectors with an average length of 4.8 tokens per.  It
> does a 3-part All/Any/None term search.
>
> On my 24-core 2.4GHz server, a single threaded query with the whole
> table and index in memory takes ~1 ms to search 20K rows using a
> realistic tsquery:  e.g.,
>
>   SELECT code,description
> FROM naics
> WHERE ts_index @@ to_tsquery('packaged & software & !(wholesale)')
>
> [getting the data out of Postgresql takes longer than the search]
>
>
> GIN indexes don't exactly scale linearly, and tsquery is, in general,
> much more dependent on the lengths of the tsvectors than on the
> complexity of the match, but with 44M rows of similarly distributed
> data, a similarly realistic query would be expected to take well over
> 1 second.
>
>
> My example is genuine but too small to bother parallelizing [mentioned
> in a previous message].  Since you are *testing* with 1M records (that
> create 44M k:v shards), I am assuming you will need to deal with much
> more than that in deployment.  And if you think you need FTS, then you
> must be expecting more than simple word matches [as below], else you
> might do something simpler like
>
>   SELECT ...
> WHERE val ILIKE <pattern>
>
>

The 1M records would most likely be the max. On average the tables would
have more like 100,000 records each.

I also realized that I inserted all k:v pairs into the secondary k:v table.
In reality, I would only index strings, which would eliminate approx 25% of
the k:v pairs, bringing that number down closer to 30M.

From my understanding, *ILIKE* doesn't do any text normalization, which is
something we would like to have.

> ​
>
>
> >my table structure is:
> >
> > Table "public.samples_lg_txt"
> > Column |   Type   | Modifiers
> >+--+---
> > id | integer  |
> > key| integer  |
> > val| text |
> > tsv| tsvector |
> >Indexes:
> >"idx_tsv_samples_lg_text" gin (tsv)
> >Foreign-key constraints:
> >"samples_lg_txt_id_fkey" FOREIGN KEY (id) REFERENCES samples_lg(id)
> >ON DELETE CASCADE
> >"samples_lg_txt_key_fkey" FOREIGN KEY (key) REFERENCES keys(id)
> >
> >
> >how would I write an AND query that filtered on 2 separate keys from the
> >samples_lg_txt table?
> >
> >something like:
> >
> >SELECT COUNT(*) FROM samples WHERE id IN ( SELECT DISTINCT(s.id) FROM
> >samples_lg_txt s JOIN keys k ON s.key = k.id WHERE (*name = 'key1' AND
> tsv
> >@@ to_tsquery('value1')) AND (name = 'key2' AND tsv @@
> >to_tsquery('value2'))*;
>
> You're overthinking it
>
>   SELECT count(distinct s.id)
> FROM  samples_lg_txt AS s
> JOIN  keys AS k ON k.id = s.key
> WHERE (k.name = 'key1' AND s.tsv @@ to_tsquery('value1')
>    OR (k.name = 'key2' AND s.tsv @@ to_tsquery('value2')
>
But that is an OR query; I'm trying to do an AND query.

> There's actually no need to join if you can use the key name instead
> of an integer id.  You can FK on strings, so you can still maintain an
> identity table of keys.  E.g.,
>
> > id | integer   |
> > key| varchar(32) | FK key(name) ...
> > val| text  |
> > tsv| tsvector  |
>
>
> Then the query could be just
>
>   SELECT count(distinct id)
> FROM  samples_lg_txt
> WHERE (key = 'key1' AND tsv @@ to_tsquery('value1')
>    OR (key = 'key2' AND tsv @@ to_tsquery('value2')
>
>
> Just a reminder [it's late here 8-)]: FK columns contain values - not
> weird references to the foreign tables.  The constraint just enforces
> that any value inserted/updated into the FK column matches an existing
> value in the relevant foreign table.


This would make queries simpler. I guess a disadvantage to using a string
for the key is that the db size would be larger, and thus not as likely to
fit the entire table in RAM. If there are only 63 keys across 44M rows, it
seems that storing a smallint would take less space than storing the
string.

I don't 

Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-20 Thread Rj Ewing
On Wed, Apr 19, 2017 at 8:09 PM, Jeff Janes  wrote:

>
> Your best bet might be to ignore the per-field searching in the initial
> (indexed) pass of the query to get everything that has all the search
> terms, regardless of which field they occur in.  And then re-check whether
> each of the found values was found in the appropriate field in a later pass.
>
> Something like
>
> select * from sample where
>  to_tsvector(json_thing->>:key1) @@ :value1
>   and to_tsvector(json_thing->>:key2) @@ :value2
>   and to_tsvector('english',json_thing) @@ (:value1 || :value2)
>

That worked pretty well when there was an AND condition with multiple k:v
pairs as you have. However, replacing it with an OR condition across k:v
pairs was pretty slow. I do like the simplicity though. Maybe indexing
the 10ish most common columns would be a "good enough" solution.

>
> From the initial email:
>
> > An idea that has come up is to use a materialized view or secondary
> table with triggers, where we would have 3 columns (id, key, value).
>
> How would this be different from the "triple store" you are abandoning?
>

It would be fairly similar. One advantage would be that we could simplify
the backend to just an RDBMS (which we use already), and not have to maintain
a separate "triple store" instance.


Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-20 Thread Rj Ewing
On Wed, Apr 19, 2017 at 6:44 PM, George Neuner  wrote:
>
> On Wed, 19 Apr 2017 11:57:26 -0700, Rj Ewing 
> wrote:
>
> >I did some testing using a secondary table with the key, value column.
> >However I don't think this will provide the performance that we need.
> >Queries were taking 60+ seconds just for a count.
>
> SELECT count(*) or filtered?
>
> Either way, your statistics may be way off.  Did you vacuum analyze
> the table after the inserts (and the index creation if it was done
> separately)?


I think my statistics were off. I never ran vacuum analyze.

>
> Without more detail re: your hardware, Postgresql version, what
> indexes are/will be available, the types of queries you want to run,
> etc., it's very hard to give really meaningful suggestions.
>

Postgresql 9.6, currently testing on 8GB RAM, but have up to 64GB for
production. 7-core 2.10GHz.

We mostly want to run ad-hoc queries, returning the entire row, matching 1-3 k:v
conditions.

> The kind of query you have alluded to is pretty easily parallelized:
> it can be spread over multiple sessions with result aggregation done
> on the client side.
>
> Or, if you have 9.6, you might try using backend parallelism:
> https://www.postgresql.org/docs/9.6/static/parallel-query.html
> [I've not used this, but some people have done it successfully.]


>

> If you can restrict the FTS query to certain keys:
>
>   SELECT id FROM mytable
> WHERE tsquery( ... ) @@ to_tsvector(v)
> AND k IN ( ... )
> GROUP BY id
>
>   [note: according to David Rowley, GROUP BY may be parallelized
>  whereas  DISTINCT currently cannot be.]
>
> then given an index on 'k' it may be much faster than just the FTS
> query alone.  Subject to key variability, it also may be improved by
> table partitioning to reduce the search space.
>
> If the FTS query is key restricted, you can parallelize either on the
> client or on the server.  If the FTS query is not key restricted, you
> pretty much are limited to server side (and 9.6 or later).
>
I'll look into parallelism if we can't get the performance we need.

What do you mean if I can restrict the FTS query to certain keys? I'm not
an SQL expert, but it seems like the above query would match multiple keys
to one tsquery value?

We need to be able to do AND conditions with separate k:v pairs. Our keys
are known ahead of time, and would vary for a given table, but would be in
the range of 30-60 keys per table.


Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-19 Thread George Neuner
On Thu, 20 Apr 2017 00:55:48 -0400, George Neuner
 wrote:

Doh!

>  SELECT count(distinct s.id)
>FROM  samples_lg_txt AS s
>JOIN  keys AS k ON k.id = s.key
>WHERE (k.name = 'key1' AND s.tsv @@ to_tsquery('value1')
>   OR (k.name = 'key2' AND s.tsv @@ to_tsquery('value2')
>
>  :
>
>  SELECT count(distinct id)
>FROM  samples_lg_txt
>WHERE (key = 'key1' AND tsv @@ to_tsquery('value1')
>   OR (key = 'key2' AND tsv @@ to_tsquery('value2')


All the WHERE clauses need closing parentheses.

Time for bed,
George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-19 Thread George Neuner
On Wed, 19 Apr 2017 16:28:13 -0700, Rj Ewing 
wrote:

>okay, messing around a bit more with the secondary k,v table it seems like
>this could be a good solution..
>
>I created a keys table to hold the 63 key values, then I dropped and
>recreated the secondary table, using a FK referencing the keys table. I'm
>not really sure why, but a basic full text query on 44 million rows is
>taking approx. 20ms.

That pretty much confirms your statistics were bad ... using the FK
table or not wouldn't make any difference to the planner.

But if you are getting 20ms on 44M rows, then one or more of the
following must be true:
 - your text values must be very short
 - your FTS queries must be very simple
 - you aren't reading the results


For comparison: 

I have an application that does FTS on a table of NAICS descriptions
indexed using tsvectors with an average length of 4.8 tokens per.  It
does a 3-part All/Any/None term search.

On my 24-core 2.4GHz server, a single threaded query with the whole
table and index in memory takes ~1 ms to search 20K rows using a
realistic tsquery:  e.g., 

  SELECT code,description
FROM naics
WHERE ts_index @@ to_tsquery('packaged & software & !(wholesale)')

[getting the data out of Postgresql takes longer than the search]


GIN indexes don't exactly scale linearly, and tsquery is, in general,
much more dependent on the lengths of the tsvectors than on the
complexity of the match, but with 44M rows of similarly distributed
data, a similarly realistic query would be expected to take well over
1 second.


My example is genuine but too small to bother parallelizing [mentioned
in a previous message].  Since you are *testing* with 1M records (that
create 44M k:v shards), I am assuming you will need to deal with much
more than that in deployment.  And if you think you need FTS, then you
must be expecting more than simple word matches [as below], else you
might do something simpler like

  SELECT ...
WHERE val ILIKE <pattern>



>my table structure is:
>
> Table "public.samples_lg_txt"
> Column |   Type   | Modifiers
>+--+---
> id | integer  |
> key| integer  |
> val| text |
> tsv| tsvector |
>Indexes:
>"idx_tsv_samples_lg_text" gin (tsv)
>Foreign-key constraints:
>"samples_lg_txt_id_fkey" FOREIGN KEY (id) REFERENCES samples_lg(id) 
>ON DELETE CASCADE
>"samples_lg_txt_key_fkey" FOREIGN KEY (key) REFERENCES keys(id)
>
>
>how would I write an AND query that filtered on 2 separate keys from the
>samples_lg_txt table?
>
>something like:
>
>SELECT COUNT(*) FROM samples WHERE id IN ( SELECT DISTINCT(s.id) FROM
>samples_lg_txt s JOIN keys k ON s.key = k.id WHERE (*name = 'key1' AND tsv
>@@ to_tsquery('value1')) AND (name = 'key2' AND tsv @@
>to_tsquery('value2'))*;

You're overthinking it

  SELECT count(distinct s.id)
FROM  samples_lg_txt AS s
JOIN  keys AS k ON k.id = s.key
WHERE (k.name = 'key1' AND s.tsv @@ to_tsquery('value1')
   OR (k.name = 'key2' AND s.tsv @@ to_tsquery('value2')


There's actually no need to join if you can use the key name instead
of an integer id.  You can FK on strings, so you can still maintain an
identity table of keys.  E.g.,

> id | integer   |
> key| varchar(32) | FK key(name) ...
> val| text  |
> tsv| tsvector  |


Then the query could be just

  SELECT count(distinct id)
FROM  samples_lg_txt
WHERE (key = 'key1' AND tsv @@ to_tsquery('value1')
   OR (key = 'key2' AND tsv @@ to_tsquery('value2')


Just a reminder [it's late here 8-)]: FK columns contain values - not
weird references to the foreign tables.  The constraint just enforces
that any value inserted/updated into the FK column matches an existing
value in the relevant foreign table.
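Spelled out as DDL, that layout might look roughly like this (a sketch
only - the table names come from your earlier mails, the sizes are guesses):

  CREATE TABLE keys (
      name varchar(32) PRIMARY KEY
  );

  CREATE TABLE samples_lg_txt (
      id  integer REFERENCES samples_lg(id) ON DELETE CASCADE,
      key varchar(32) REFERENCES keys(name),
      val text,
      tsv tsvector
  );

  -- GIN index for the FTS predicate, btree index for key-restricted queries
  CREATE INDEX idx_tsv_samples_lg_txt ON samples_lg_txt USING gin (tsv);
  CREATE INDEX idx_key_samples_lg_txt ON samples_lg_txt (key);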


George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-19 Thread Jeff Janes
Please don't top-post, thanks.

On Tue, Apr 18, 2017 at 3:18 PM, Rj Ewing  wrote:

> On Tue, Apr 18, 2017 at 3:00 PM, Bruce Momjian  wrote:
>
>>
>> Full text search of JSON and JSONB data is coming in Postgres 10, which
>> is to to be released in September of this year:
>>
>> https://www.depesz.com/2017/04/04/waiting-for-postgresql-10-
>> full-text-search-support-for-json-and-jsonb/
>
>

> A step in the right direction for me; however, it doesn't appear to support
> per-field full text searching.
> It is exciting though!
>


Your best bet might be to ignore the per-field searching in the initial
(indexed) pass of the query to get everything that has all the search
terms, regardless of which field they occur in.  And then re-check whether
each of the found values was found in the appropriate field in a later pass.

Something like

select * from sample where
 to_tsvector(json_thing->>:key1) @@ :value1
  and to_tsvector(json_thing->>:key2) @@ :value2
  and to_tsvector('english',json_thing) @@ (:value1 || :value2)
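Only the last condition needs index support for that first pass.  A sketch
of the index (assuming the column is a jsonb named json_thing on a table
named sample; on 9.6 it has to be cast to text, since to_tsvector over
json/jsonb only arrives in v10):

  CREATE INDEX sample_json_fts_idx
      ON sample
      USING gin (to_tsvector('english', json_thing::text));

The query has to spell that expression identically for the index to be
considered.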

From the initial email:

> An idea that has come up is to use a materialized view or secondary table
with triggers, where we would have 3 columns (id, key, value).

How would this be different from the "triple store" you are abandoning?

Cheers,

Jeff


Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-19 Thread George Neuner


Please don't top post.
https://en.wikipedia.org/wiki/Posting_style#Placement_of_replies
https://en.wikipedia.org/wiki/Posting_style#Choosing_the_proper_posting_style



>> On Tue, 18 Apr 2017 14:38:15 -0700, Rj Ewing 
>> wrote:
>>
>> > :
>> >An idea that has come up is to use a materialized view or secondary table
>> >with triggers, where we would have 3 columns (id, key, value).
>> >
>> >I think this would allow us to store a tsvector and gin index. Giving us
>> >the ability to use fulltext search on k:v pairs, then join the original
>> >data on the id field to return the entire record.
>> > :


On Wed, 19 Apr 2017 11:57:26 -0700, Rj Ewing 
wrote:

>I did some testing using a secondary table with the key, value column.
>However I don't think this will provide the performance that we need.
>Queries were taking 60+ seconds just for a count.

SELECT count(*) or filtered?
 
Either way, your statistics may be way off.  Did you vacuum analyze
the table after the inserts (and the index creation if it was done
separately)?


>With 1 million rows in the primary table, this resulted in 44 million rows
>in the secondary k,v table for full text searching. The same query in ES
>takes ~50 ms on my local machine, with ES allocated 1/10th the RAM that
>was allocated to psql.
>
>I'm gonna test using trigram indexes on approx 10 json fields, and see if
>that gives us what we are looking for.
>
>Any thoughts on getting sub-1 sec queries on a table with 44 million rows?
>
>RJ

Based on your description of the data [at top], I rather doubt
trigrams will be an improvement over tsvector.  And they're more
cumbersome to use if you don't need better similarity matching than
what tsvector offers [which itself is somewhat adjustable via
dictionaries].

Without more detail re: your hardware, Postgresql version, what
indexes are/will be available, the types of queries you want to run,
etc., it's very hard to give really meaningful suggestions.


The kind of query you have alluded to is pretty easily parallelized:
it can be spread over multiple sessions with result aggregation done
on the client side.

Or, if you have 9.6, you might try using backend parallelism:
https://www.postgresql.org/docs/9.6/static/parallel-query.html
[I've not used this, but some people have done it successfully.]
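If you want to experiment with it, the relevant 9.6 setting is
max_parallel_workers_per_gather; whether the planner actually produces a
parallel plan for a given FTS query is up to it (only some scan and
aggregate nodes can run in parallel in 9.6), so check with EXPLAIN.
A sketch, with names assumed from your earlier mails:

  SET max_parallel_workers_per_gather = 4;   -- 0 disables parallel plans

  EXPLAIN (ANALYZE)
  SELECT count(*)
  FROM   samples_lg_txt
  WHERE  tsv @@ to_tsquery('value1');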


If you can restrict the FTS query to certain keys:

  SELECT id FROM mytable
WHERE tsquery( ... ) @@ to_tsvector(v)
AND k IN ( ... )
GROUP BY id

  [note: according to David Rowley, GROUP BY may be parallelized
 whereas  DISTINCT currently cannot be.]

then given an index on 'k' it may be much faster than just the FTS
query alone.  Subject to key variability, it also may be improved by
table partitioning to reduce the search space.
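A sketch of the supporting indexes (names assumed; note that an expression
index on the tsvector needs the two-argument to_tsvector(config, text)
form, because the one-argument form depends on a session setting and is
not immutable):

  CREATE INDEX mytable_k_idx   ON mytable (k);
  CREATE INDEX mytable_fts_idx ON mytable USING gin (to_tsvector('english', v));

The query then has to use the same to_tsvector('english', v) expression
for the GIN index to be usable.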

If the FTS query is key restricted, you can parallelize either on the
client or on the server.  If the FTS query is not key restricted, you
pretty much are limited to server side (and 9.6 or later).


And I'm out of suggestions for now.  

Parallel query is your best bet for maximum performance, but unless
you have enough RAM to hold the entire table and its indexes, and all
the query workspaces, then I doubt you will be able to get anywhere
near your optimistic execution target for FTS on 40+ million rows.  


YMMV,
George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-19 Thread Rj Ewing
Okay, messing around a bit more with the secondary k,v table, it seems like
this could be a good solution.

I created a keys table to hold the 63 key values, then I dropped and
recreated the secondary table, using a FK referencing the keys table. I'm
not really sure why, but a basic full text query on 44 million rows is
taking approx. 20ms.

my table structure is:

 Table "public.samples_lg_txt"
 Column |   Type   | Modifiers
+--+---
 id | integer  |
 key| integer  |
 val| text |
 tsv| tsvector |
Indexes:
"idx_tsv_samples_lg_text" gin (tsv)
Foreign-key constraints:
"samples_lg_txt_id_fkey" FOREIGN KEY (id) REFERENCES samples_lg(id) ON
DELETE CASCADE
"samples_lg_txt_key_fkey" FOREIGN KEY (key) REFERENCES keys(id)


how would I write an AND query that filtered on 2 separate keys from the
samples_lg_txt table?

something like:

SELECT COUNT(*) FROM samples WHERE id IN ( SELECT DISTINCT(s.id) FROM
samples_lg_txt s JOIN keys k ON s.key = k.id WHERE (*name = 'key1' AND tsv
@@ to_tsquery('value1')) AND (name = 'key2' AND tsv @@
to_tsquery('value2'))*;

On Wed, Apr 19, 2017 at 11:57 AM, Rj Ewing  wrote:

> I did some testing using a secondary table with the key, value column.
> However I don't think this will provide the performance that we need.
> Queries were taking 60+ seconds just for a count.
>
> With 1 million rows in the primary table, this resulted in 44 million rows
> in the secondary k,v table for full text searching. The same query in ES
> takes ~50 ms on my local machine, with ES allocated 1/10th the RAM that
> was allocated to psql.
>
> I'm gonna test using trigram indexes on approx 10 json fields, and see if
> that gives us what we are looking for.
>
> Any thoughts on getting sub-1 sec queries on a table with 44 million rows?
>
> RJ
>
> On Tue, Apr 18, 2017 at 10:35 PM, George Neuner 
> wrote:
>
>> On Tue, 18 Apr 2017 14:38:15 -0700, Rj Ewing 
>> wrote:
>>
>> >I am evaluating postgres as a datastore for our webapp. We are moving
>> >away from a triple store db due to performance issues.
>> >
>> >Our data model consists of sets of user defined attributes. Approx 10% of
>> >the attributes tend to be 100% filled with 50% of the attributes having
>> >approx 25% filled. This is fairly sparse data, and it seems that jsonb or
>> >hstore will be best for us.
>> >
>> >Unfortunately, from my understanding, postgres doesn't support fulltext
>> >search across hstore or jsonb key:values or even the entire document.
>> While
>> >this is not a deal breaker, this would be a great feature to have. We
>> have
>> >been experimenting w/ elasticsearch a bit, and particularly enjoy this
>> >feature, however we don't really want to involve the complexity and
>> >overhead of adding elasticsearch in front of our datasource right now.
>>
>> hstore and JSON values all really are just formatted text with a
>> custom column type.  You can create tsvectors from the values if you
>> cast them to text.
>>
>> Note that a tsvector can only work on a /flat/ key:value structure: it
>> won't understand nesting, and it and even with a flat store it won't
>> understand the difference between keys/tags and the associated values.
>>
>> E.g., you will be able to see that a value contains both "foo" and
>> "bar", but to distinguish 'foo:bar' from 'bar:foo' or 'foo:q,bar:z'
>> you either must check the token positions (from the tsvector) or *try*
>> to extract the key(s) you are interested in and check the associated
>> value(s).
>>
>> This might work ok if you search only for keys in a "document" ... but
>> trying to search values, I think would be far too complicated.
>>
>> It might help if you stored a 2D array instead of a flat structure,
>> but even that would be fairly complicated to work with.
>>
>>
>>
>> >An idea that has come up is to use a materialized view or secondary table
>> >with triggers, where we would have 3 columns (id, key, value).
>> >
>> >I think this would allow us to store a tsvector and gin index. Giving us
>> >the ability to use fulltext search on k:v pairs, then join the original
>> >data on the id field to return the entire record.
>>
>> This is a much better idea because it separates the key from the
>> value, and unlike the full "document" case [above], you will know that
>> the FTS index is covering only the values.
>>
>> If you need to preserve key order to reconstruct records, you will
>> need an additional column to maintain that ordering.
>>
>>
>> >is anyone currently doing this? Is there a better alternative? Any
>> >performance issues that immediately jump out ( I realize the writes will
>> >take longer)?
>> >
>> >the nature of our data is "relatively" static with bulk uploads (100 -
>> 1000
>> >records). So we can sacrifice some write performance.
>> >
>> >RJ
>>
>> Having to "reconstruct" records will make reads take longer as well,
>> but I think separating the keys and values is the 

Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-19 Thread Rj Ewing
I did some testing using a secondary table with the key, value column.
However I don't think this will provide the performance that we need.
Queries were taking 60+ seconds just for a count.

With 1 million rows in the primary table, this resulted in 44 million rows
in the secondary k,v table for full text searching. The same query in ES
takes ~50 ms on my local machine, with ES allocated 1/10th the RAM that
was allocated to psql.

I'm gonna test using trigram indexes on approx 10 json fields, and see if
that gives us what we are looking for.

Any thoughts on getting sub-1 sec queries on a table with 44 million rows?

RJ

On Tue, Apr 18, 2017 at 10:35 PM, George Neuner 
wrote:

> On Tue, 18 Apr 2017 14:38:15 -0700, Rj Ewing 
> wrote:
>
> >I am evaluating postgres as a datastore for our webapp. We are moving
> >away from a triple store db due to performance issues.
> >
> >Our data model consists of sets of user defined attributes. Approx 10% of
> >the attributes tend to be 100% filled with 50% of the attributes having
> >approx 25% filled. This is fairly sparse data, and it seems that jsonb or
> >hstore will be best for us.
> >
> >Unfortunately, from my understanding, postgres doesn't support fulltext
> >search across hstore or jsonb key:values or even the entire document.
> While
> >this is not a deal breaker, this would be a great feature to have. We have
> >been experimenting w/ elasticsearch a bit, and particularly enjoy this
> >feature, however we don't really want to involve the complexity and
> >overhead of adding elasticsearch in front of our datasource right now.
>
> hstore and JSON values all really are just formatted text with a
> custom column type.  You can create tsvectors from the values if you
> cast them to text.
>
> Note that a tsvector can only work on a /flat/ key:value structure: it
> won't understand nesting, and even with a flat store it won't
> understand the difference between keys/tags and the associated values.
>
> E.g., you will be able to see that a value contains both "foo" and
> "bar", but to distinguish 'foo:bar' from 'bar:foo' or 'foo:q,bar:z'
> you either must check the token positions (from the tsvector) or *try*
> to extract the key(s) you are interested in and check the associated
> value(s).
>
> This might work ok if you search only for keys in a "document" ... but
> trying to search values, I think would be far too complicated.
>
> It might help if you stored a 2D array instead of a flat structure,
> but even that would be fairly complicated to work with.
>
>
>
> >An idea that has come up is to use a materialized view or secondary table
> >with triggers, where we would have 3 columns (id, key, value).
> >
> >I think this would allow us to store a tsvector and gin index. Giving us
> >the ability to use fulltext search on k:v pairs, then join the original
> >data on the id field to return the entire record.
>
> This is a much better idea because it separates the key from the
> value, and unlike the full "document" case [above], you will know that
> the FTS index is covering only the values.
>
> If you need to preserve key order to reconstruct records, you will
> need an additional column to maintain that ordering.
>
>
> >is anyone currently doing this? Is there a better alternative? Any
> >performance issues that immediately jump out ( I realize the writes will
> >take longer)?
> >
> >the nature of our data is "relatively" static with bulk uploads (100 -
> 1000
> >records). So we can sacrifice some write performance.
> >
> >RJ
>
> Having to "reconstruct" records will make reads take longer as well,
> but I think separating the keys and values is the best way to do it.
>
>
> YMMV,
> George
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-18 Thread George Neuner
On Tue, 18 Apr 2017 14:38:15 -0700, Rj Ewing 
wrote:

>I am evaluating postgres as a datastore for our webapp. We are moving
>away from a triple store db due to performance issues.
>
>Our data model consists of sets of user defined attributes. Approx 10% of
>the attributes tend to be 100% filled with 50% of the attributes having
>approx 25% filled. This is fairly sparse data, and it seems that jsonb or
>hstore will be best for us.
>
>Unfortunately, from my understanding, postgres doesn't support fulltext
>search across hstore or jsonb key:values or even the entire document. While
>this is not a deal breaker, this would be a great feature to have. We have
>been experimenting w/ elasticsearch a bit, and particularly enjoy this
>feature, however we don't really want to involve the complexity and
>overhead of adding elasticsearch in front of our datasource right now.

hstore and JSON values all really are just formatted text with a
custom column type.  You can create tsvectors from the values if you
cast them to text.

Note that a tsvector can only work on a /flat/ key:value structure: it
won't understand nesting, and even with a flat store it won't
understand the difference between keys/tags and the associated values.

E.g., you will be able to see that a value contains both "foo" and
"bar", but to distinguish 'foo:bar' from 'bar:foo' or 'foo:q,bar:z'
you either must check the token positions (from the tsvector) or *try*
to extract the key(s) you are interested in and check the associated
value(s).

This might work ok if you search only for keys in a "document" ... but
trying to search values, I think would be far too complicated.  

It might help if you stored a 2D array instead of a flat structure,
but even that would be fairly complicated to work with.



>An idea that has come up is to use a materialized view or secondary table
>with triggers, where we would have 3 columns (id, key, value).
>
>I think this would allow us to store a tsvector and gin index. Giving us
>the ability to use fulltext search on k:v pairs, then join the original
>data on the id field to return the entire record.

This is a much better idea because it separates the key from the
value, and unlike the full "document" case [above], you will know that
the FTS index is covering only the values.

If you need to preserve key order to reconstruct records, you will
need an additional column to maintain that ordering.
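To make the idea concrete, a rough sketch of such a secondary table kept
in sync by a trigger (all names - samples, json_thing, samples_kv - are
placeholders, and the ordering column mentioned above is left out):

  CREATE TABLE samples_kv (
      id  integer,
      key text,
      val text,
      tsv tsvector
  );
  CREATE INDEX samples_kv_tsv_idx ON samples_kv USING gin (tsv);

  CREATE OR REPLACE FUNCTION samples_kv_sync() RETURNS trigger AS $$
  BEGIN
      -- rebuild this record's k:v shards from its jsonb document
      DELETE FROM samples_kv WHERE id = NEW.id;
      INSERT INTO samples_kv (id, key, val, tsv)
      SELECT NEW.id, j.key, j.value, to_tsvector('english', j.value)
      FROM   jsonb_each_text(NEW.json_thing) AS j(key, value);
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER samples_kv_sync_trg
  AFTER INSERT OR UPDATE ON samples
  FOR EACH ROW EXECUTE PROCEDURE samples_kv_sync();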


>is anyone currently doing this? Is there a better alternative? Any
>performance issues that immediately jump out ( I realize the writes will
>take longer)?
>
>the nature of our data is "relatively" static with bulk uploads (100 - 1000
>records). So we can sacrifice some write performance.
>
>RJ

Having to "reconstruct" records will make reads take longer as well,
but I think separating the keys and values is the best way to do it.


YMMV,
George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-18 Thread Rj Ewing
A step in the right direction for me; however, it doesn't appear to support
per-field full text searching.
It is exciting though!

On Tue, Apr 18, 2017 at 3:00 PM, Bruce Momjian  wrote:

> On Tue, Apr 18, 2017 at 02:38:15PM -0700, Rj Ewing wrote:
> > I am evaluating postgres as a datastore for our webapp. We are
> moving away
> > from a triple store db due to performance issues.
> >
> > Our data model consists of sets of user defined attributes. Approx 10%
> of the
> > attributes tend to be 100% filled with 50% of the attributes having
> approx 25%
> > filled. This is fairly sparse data, and it seems that jsonb or hstore
> will be
> > best for us.
> >
> > Unfortunately, from my understanding, postgres doesn't support fulltext
> search
> > across hstore or jsonb key:values or even the entire document. While
> this is
> > not a deal breaker, this would be a great feature to have. We have been
> > experimenting w/ elasticsearch a bit, and particularly enjoy this
> feature,
> > however we don't really want to involve the complexity and overhead of
> adding
> > elasticsearch in front of our datasource right now.
>
> Full text search of JSON and JSONB data is coming in Postgres 10, which
> is to be released in September of this year:
>
> https://www.depesz.com/2017/04/04/waiting-for-postgresql-
> 10-full-text-search-support-for-json-and-jsonb/
>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +  Ancient Roman grave inscription +
>


Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-18 Thread Bruce Momjian
On Tue, Apr 18, 2017 at 02:38:15PM -0700, Rj Ewing wrote:
> I am evaluating postgres as a datastore for our webapp. We are moving away
> from a triple store db due to performance issues.
> 
> Our data model consists of sets of user defined attributes. Approx 10% of the
> attributes tend to be 100% filled with 50% of the attributes having approx 25%
> filled. This is fairly sparse data, and it seems that jsonb or hstore will be
> best for us.
> 
> Unfortunately, from my understanding, postgres doesn't support fulltext search
> across hstore or jsonb key:values or even the entire document. While this is
> not a deal breaker, this would be a great feature to have. We have been
> experimenting w/ elasticsearch a bit, and particularly enjoy this feature,
> however we don't really want to involve the complexity and overhead of adding
> elasticsearch in front of our datasource right now.

Full text search of JSON and JSONB data is coming in Postgres 10, which
is to be released in September of this year:


https://www.depesz.com/2017/04/04/waiting-for-postgresql-10-full-text-search-support-for-json-and-jsonb/

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] full text search on hstore or json with materialized view?

2017-04-18 Thread Rj Ewing
I am evaluating postgres as a datastore for our webapp. We are moving
away from a triple store db due to performance issues.

Our data model consists of sets of user-defined attributes. Approx 10% of
the attributes tend to be 100% filled, with 50% of the attributes being
approx 25% filled. This is fairly sparse data, and it seems that jsonb or
hstore will be best for us.

Unfortunately, from my understanding, postgres doesn't support fulltext
search across hstore or jsonb key:values or even the entire document. While
this is not a deal breaker, this would be a great feature to have. We have
been experimenting w/ elasticsearch a bit, and particularly enjoy this
feature, however we don't really want to involve the complexity and
overhead of adding elasticsearch in front of our datasource right now.

An idea that has come up is to use a materialized view or secondary table
with triggers, where we would have 3 columns (id, key, value).

I think this would allow us to store a tsvector and gin index. Giving us
the ability to use fulltext search on k:v pairs, then join the original
data on the id field to return the entire record.

is anyone currently doing this? Is there a better alternative? Any
performance issues that immediately jump out ( I realize the writes will
take longer)?

the nature of our data is "relatively" static with bulk uploads (100 - 1000
records). So we can sacrifice some write performance.

RJ


Re: [GENERAL] Full Text Search combined with Fuzzy

2017-03-03 Thread Artur Zakirov

On 03.03.2017 16:17, Nicolas Paris wrote:

Nice! I do have the 9.6 version.

Could this kind of index handle more than 20M large texts? The
recheck condition looks resource-consuming.


You are right. I think pg_trgm will not be good for such large texts,
unfortunately.




The full text index + phrase search + synonym dictionary is the only
other alternative to deal with typo-phrase mining?


I suppose there are no other options now. Though, prefix search may
help you [1].
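For example, prefix matching with tsquery's ':*' lets a partial word in
the query match any lexeme that starts with it (though it gives no typo
tolerance):

  SELECT to_tsvector('simple', 'postgresql 9.6 full text search')
         @@ to_tsquery('simple', 'postgre:*');   -- true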




Is there any possibility in the future to add typo tolerance to the full text
search road-map?


As far as I know, there are no plans in the near future to add similarity
full text search.


1. 
https://www.postgresql.org/docs/current/static/textsearch-dictionaries.html


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full Text Search combined with Fuzzy

2017-03-03 Thread Nicolas Paris
On 3 March 2017 at 14:08, Artur Zakirov wrote:
> On 03.03.2017 15:49, Nicolas Paris wrote:
> >
> >Hi Oleg,
> >
> >Thanks. I thought pgtrgm was not able to index my long texts because of
> >the limitation of 8191 bytes per index row for btree.
> >
> >Then I found out it is possible to use pgtrgm over a GIN/GIST index.
> >My final use case is phrase mining in texts.
> >
> >I want my application to return texts that contain approximately the user
> >entry:
> >
> >Eg: user search "Hello Word"
> >a text containing "blah blah blah hello world blah blah blah" would be
> >returned.
> >
> >Test:
> >postgres=# CREATE table test_trgm (texts text);
> >CREATE TABLE
> >postgres=# CREATE INDEX ON test_trgm USING GIN(texts gin_trgm_ops);
> >CREATE INDEX
> >postgres=# SET enable_seqscan = OFF;
> >SET
> >postgres=# insert into test_trgm VALUES ('blah blah blah hello world blah 
> >blah blah');
> >INSERT 0 1
> >postgres=# insert into test_trgm VALUES ('blah blah blah hello word blah 
> >blah blah');
> >INSERT 0 1
> >postgres=# SELECT texts, similarity(texts, 'hello word') FROM test_trgm 
> >WHERE texts % 'hello word';
> >   texts   | similarity
> >---+
> > blah blah blah hello world blah blah blah |   0.473684
> > blah blah blah hello word blah blah blah  | 0.6875
> >(2 rows)
> >
> >postgres=# EXPLAIN SELECT texts, similarity(texts, 'hello word') FROM 
> >test_trgm WHERE texts % 'hello word';
> >QUERY PLAN
> >---
> > Bitmap Heap Scan on test_trgm  (cost=52.01..56.03 rows=1 width=32)
> >   Recheck Cond: (texts % 'hello word'::text)
> >   ->  Bitmap Index Scan on test_trgm_texts_idx  (cost=0.00..52.01 rows=1 
> > width=0)
> > Index Cond: (texts % 'hello word'::text)
> >(4 rows)
> >
> >Conclusion: If I say 0.4 is my threshold, would this methodology meet
> >my requirements?
> >
> >Thanks for the help !
> >
> 
> Hello,
> 
> If you use PostgreSQL 9.6, then word_similarity() can help you [1]. For
> example:
> 
> postgres=# SELECT texts, word_similarity('hello word', texts) FROM test_trgm
> WHERE 'hello word' <% texts;
>texts   | word_similarity
> ---+-
>  blah blah blah hello world blah blah blah |0.818182
>  blah blah blah hello word blah blah blah  |   1
> (2 rows)
> 
> 1. https://www.postgresql.org/docs/9.6/static/pgtrgm.html
> 

Nice! I do have the 9.6 version.

Could this kind of index handle more than 20M large texts? The
recheck condition looks resource-consuming.

The full text index + phrase search + synonym dictionary is the only
other alternative to deal with typo-phrase mining?

Is there any possibility in the future to add typo tolerance to the full text
search road-map?

Thanks,

> -- 
> Artur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full Text Search combined with Fuzzy

2017-03-03 Thread Artur Zakirov

On 03.03.2017 15:49, Nicolas Paris wrote:


Hi Oleg,

Thanks. I thought pgtrgm was not able to index my long texts because of
the limitation of 8191 bytes per index row for btree.

Then I found out it is possible to use pgtrgm over a GIN/GIST index.
My final use case is phrase mining in texts.

I want my application to return texts that contain approximately the user
entry:

Eg: user search "Hello Word"
a text containing "blah blah blah hello world blah blah blah" would be
returned.

Test:
postgres=# CREATE table test_trgm (texts text);
CREATE TABLE
postgres=# CREATE INDEX ON test_trgm USING GIN(texts gin_trgm_ops);
CREATE INDEX
postgres=# SET enable_seqscan = OFF;
SET
postgres=# insert into test_trgm VALUES ('blah blah blah hello world blah blah 
blah');
INSERT 0 1
postgres=# insert into test_trgm VALUES ('blah blah blah hello word blah blah 
blah');
INSERT 0 1
postgres=# SELECT texts, similarity(texts, 'hello word') FROM test_trgm WHERE 
texts % 'hello word';
   texts   | similarity
---+
 blah blah blah hello world blah blah blah |   0.473684
 blah blah blah hello word blah blah blah  | 0.6875
(2 rows)

postgres=# EXPLAIN SELECT texts, similarity(texts, 'hello word') FROM test_trgm 
WHERE texts % 'hello word';
QUERY PLAN
---
 Bitmap Heap Scan on test_trgm  (cost=52.01..56.03 rows=1 width=32)
   Recheck Cond: (texts % 'hello word'::text)
   ->  Bitmap Index Scan on test_trgm_texts_idx  (cost=0.00..52.01 rows=1 
width=0)
 Index Cond: (texts % 'hello word'::text)
(4 rows)

Conclusion: If I say 0.4 is my threshold, would this methodology meet
my requirements?

Thanks for the help !



Hello,

If you use PostgreSQL 9.6, then word_similarity() can help you [1]. For 
example:


postgres=# SELECT texts, word_similarity('hello word', texts) FROM 
test_trgm WHERE 'hello word' <% texts;

   texts   | word_similarity
---+-
 blah blah blah hello world blah blah blah |0.818182
 blah blah blah hello word blah blah blah  |   1
(2 rows)

1. https://www.postgresql.org/docs/9.6/static/pgtrgm.html

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full Text Search combined with Fuzzy

2017-03-03 Thread Nicolas Paris
On 27 February 2017 at 10:32, Oleg Bartunov wrote:
> 
> 
> On Sun, Feb 26, 2017 at 3:52 PM, Nicolas Paris  wrote:
> 
> Hello,
> 
> AFAIK there is no built-in way to combine full text search and fuzzy
> matching
> (https://www.postgresql.org/docs/current/static/fuzzystrmatch.html).
> For example, phrase searching with typos in it.
> 
> First, I don't know if postgresql competitors (Lucene-based...) are able
> to do so.
> 
> 
> Usually, https://www.postgresql.org/docs/current/static/pgtrgm.html is used 
> for
> this.
> 

Hi Oleg,

Thanks. I thought pgtrgm was not able to index my long texts because of
the limitation of 8191 bytes per index row for btree.

Then I found out it is possible to use pgtrgm over a GIN/GIST index.
My final use case is phrase mining in texts.

I want my application to return texts that contain approximately the user
entry:

Eg: user search "Hello Word"
a text containing "blah blah blah hello world blah blah blah" would be
returned.

Test:
postgres=# CREATE table test_trgm (texts text);
CREATE TABLE
postgres=# CREATE INDEX ON test_trgm USING GIN(texts gin_trgm_ops);
CREATE INDEX
postgres=# SET enable_seqscan = OFF;
SET
postgres=# insert into test_trgm VALUES ('blah blah blah hello world blah blah 
blah');
INSERT 0 1
postgres=# insert into test_trgm VALUES ('blah blah blah hello word blah blah 
blah');
INSERT 0 1
postgres=# SELECT texts, similarity(texts, 'hello word') FROM test_trgm WHERE 
texts % 'hello word';
   texts   | similarity 
---+
 blah blah blah hello world blah blah blah |   0.473684
 blah blah blah hello word blah blah blah  | 0.6875
(2 rows)

postgres=# EXPLAIN SELECT texts, similarity(texts, 'hello word') FROM test_trgm 
WHERE texts % 'hello word';
QUERY PLAN  
   
---
 Bitmap Heap Scan on test_trgm  (cost=52.01..56.03 rows=1 width=32)
   Recheck Cond: (texts % 'hello word'::text)
   ->  Bitmap Index Scan on test_trgm_texts_idx  (cost=0.00..52.01 rows=1 
width=0)
 Index Cond: (texts % 'hello word'::text)
(4 rows)

Conclusion: If I say 0.4 is my threshold, would this methodology meet
my requirements?

Thanks for the help !


> 
> Second, is such a feature on the road map?
> 
> Third, I wonder if it is a good idea to use the postgresql synonyms
> feature for such a purpose (https://www.postgresql.org/docs/current/static/
> textsearch-dictionaries.html).
> I mean, building up a synonym dictionary containing typos. E.g.:
> 
> postgres        pgsql
> postgresql      pgsql
> postgrez        pgsql
> postgre         pgsql
> gogle           googl
> gooogle         googl
> 
> There are multiple ways to build such a dictionary. But my question is about
> the implementation of dictionaries in postgresql: Is postgresql
> supposed to take advantage of billion-entry dictionaries?
> 
> 
> A dictionary is just a program, so it's up to the developer how to write an
> efficient program to deal with billions of entries. Specifically for the synonym
> dictionary, it's not intended to work with a lot of entries. BTW, have a look at
> contrib/dict_xsyn, which is more flexible than synonym.
> 
> 
> Thanks in advance for your answers,
>
> 
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
> 
> 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full Text Search combined with Fuzzy

2017-02-27 Thread Oleg Bartunov
On Sun, Feb 26, 2017 at 3:52 PM, Nicolas Paris  wrote:

> Hello,
>
> AFAIK there is no built-in way to combine full text search and fuzzy
> matching
> (https://www.postgresql.org/docs/current/static/fuzzystrmatch.html).
> For example, phrase searching with typos in it.
>
> First, I don't know if postgresql competitors (Lucene-based...) are able
> to do so.
>

Usually, https://www.postgresql.org/docs/current/static/pgtrgm.html is used
for this.

>
> Second, is such a feature on the road map?
>
> Third, I wonder if it is a good idea to use the postgresql synonyms
> feature for such a purpose (https://www.postgresql.org/docs/current/
> static/textsearch-dictionaries.html).
> I mean, building up a synonym dictionary containing typos. E.g.:
>
> postgrespgsql
> postgresql  pgsql
> postgrezpgsql
> postgre pgsql
> gogle   googl
> gooogle googl
>
> There are multiple ways to build such a dictionary. But my question is about
> the implementation of dictionaries in postgresql: Is postgresql
> supposed to take advantage of billion-entry dictionaries?
>

A dictionary is just a program, so it's up to the developer how to write
an efficient program to deal with billions of entries. Specifically for the
synonym dictionary, it's not intended to work with a lot of entries. BTW,
have a look at contrib/dict_xsyn, which is more flexible than synonym.
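A sketch based on the contrib/dict_xsyn documentation (the rules file has
one group per line - a word followed by its extended synonyms - and
'xsyn_sample' is the sample file shipped with the extension):

  CREATE EXTENSION dict_xsyn;

  CREATE TEXT SEARCH DICTIONARY typo_xsyn (
      TEMPLATE = xsyn_template,
      RULES    = 'xsyn_sample',
      KEEPORIG = false
  );

  SELECT ts_lexize('typo_xsyn', 'word');   -- expands to the synonyms listed for 'word'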

>
> Thanks in advance for your answers,
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


[GENERAL] Full Text Search combined with Fuzzy

2017-02-26 Thread Nicolas Paris
Hello,

AFAIK there is no built-in way to combine full text search and fuzzy matching
(https://www.postgresql.org/docs/current/static/fuzzystrmatch.html).
For example, phrase searching with typos in it.

First, I don't know if postgresql competitors (Lucene-based...) are able
to do so.

Second, is such a feature on the road map?

Third, I wonder if it is a good idea to use the postgresql synonyms
feature for such a purpose
(https://www.postgresql.org/docs/current/static/textsearch-dictionaries.html).
I mean, building up a synonym dictionary containing typos. E.g.:

postgrespgsql
postgresql  pgsql
postgrezpgsql
postgre pgsql
gogle   googl
gooogle googl

There are multiple ways to build such a dictionary. But my question is about
the implementation of dictionaries in postgresql: Is postgresql
supposed to take advantage of billion-entry dictionaries?

Thanks in advance for your answers,


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search randomly not working for short prefixes?

2016-12-02 Thread Andreas Joseph Krogh
On Friday, 2 December 2016 at 16:33:12, Tom Lane wrote:
cen  writes:
 > Something funny going on with my full text search.. and I have no idea what.

 The way to debug this sort of thing is generally to look at what tsquery
 you're actually getting.  I get

 regression=# select to_tsquery(unaccent('a:*'));
 NOTICE:  text-search query contains only stop words or doesn't contain 
lexemes, ignored
  to_tsquery
 
  
 (1 row)

 regression=# select to_tsquery(unaccent('an:*'));
 NOTICE:  text-search query contains only stop words or doesn't contain 
lexemes, ignored
  to_tsquery
 
  
 (1 row)

 regression=# select to_tsquery(unaccent('ana:*'));
  to_tsquery
 
  'ana':*
 (1 row)

 Of course, only the last is going to match 'ana'.

 So you need to use a text search configuration in which a/an are
 not stop words.  Or possibly you could cast the unaccent result
 directly to tsquery rather than passing it through to_tsquery(),
 though likely that would just have a different set of failure modes
 with queries where you do wish stemming would occur.

 The problem with "no" seems to be the same.
 
One can always specify 'simple' as the config, eliminating any "stop-word
smartness":
 
andreak=> select to_tsquery('simple', 'a:*');
 to_tsquery 
------------
 'a':*
(1 row)

  
-- Andreas Joseph Krogh




Re: [GENERAL] Full text search randomly not working for short prefixes?

2016-12-02 Thread cen
Thanks, that makes sense. I think I'll go with the cast approach, I 
don't really need stemming anywhere.
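A sketch of that cast approach against the tsv column from the earlier
mails - building the prefix query by hand and casting it, so no stop-word
or stemming processing is applied:

  SELECT *
  FROM   receivers r
  WHERE  r.tsv @@ (unaccent('an') || ':*')::tsquery;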



Tom Lane wrote on 02. 12. 2016 at 16:33:

cen  writes:

Something funny going on with my full text search.. and I have no idea what.

The way to debug this sort of thing is generally to look at what tsquery
you're actually getting.  I get

regression=# select to_tsquery(unaccent('a:*'));
NOTICE:  text-search query contains only stop words or doesn't contain lexemes, 
ignored
  to_tsquery

  
(1 row)


regression=# select to_tsquery(unaccent('an:*'));
NOTICE:  text-search query contains only stop words or doesn't contain lexemes, 
ignored
  to_tsquery

  
(1 row)


regression=# select to_tsquery(unaccent('ana:*'));
  to_tsquery

  'ana':*
(1 row)

Of course, only the last is going to match 'ana'.

So you need to use a text search configuration in which a/an are
not stop words.  Or possibly you could cast the unaccent result
directly to tsquery rather than passing it through to_tsquery(),
though likely that would just have a different set of failure modes
with queries where you do wish stemming would occur.

The problem with "no" seems to be the same.

regards, tom lane




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search randomly not working for short prefixes?

2016-12-02 Thread Tom Lane
cen  writes:
> Something funny going on with my full text search.. and I have no idea what.

The way to debug this sort of thing is generally to look at what tsquery
you're actually getting.  I get

regression=# select to_tsquery(unaccent('a:*')); 
NOTICE:  text-search query contains only stop words or doesn't contain lexemes, 
ignored
 to_tsquery 

 
(1 row)

regression=# select to_tsquery(unaccent('an:*'));
NOTICE:  text-search query contains only stop words or doesn't contain lexemes, 
ignored
 to_tsquery 

 
(1 row)

regression=# select to_tsquery(unaccent('ana:*'));
 to_tsquery 

 'ana':*
(1 row)

Of course, only the last is going to match 'ana'.

So you need to use a text search configuration in which a/an are
not stop words.  Or possibly you could cast the unaccent result
directly to tsquery rather than passing it through to_tsquery(),
though likely that would just have a different set of failure modes
with queries where you do wish stemming would occur.

The problem with "no" seems to be the same.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Full text search randomly not working for short prefixes?

2016-12-02 Thread cen

Something funny going on with my full text search.. and I have no idea what.
I have a receiver called "Ana", this is her tsv column:

'3865100':4 'acevent...@mailinator.com':3B 'ana':1A 'novak':2A

These queries do not find her:
SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('a:*'));
SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('an:*'));

This does:
SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('ana:*'));

Now to an even more interesting part: I have 3 people with last name 
"Novak" and one with name "Nov"


This query finds all 4:
SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('n:*'));

This finds NONE:
SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('no:*'));

This finds all 4 again:
SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('nov:*'));

..and this finds all with the last name only:
SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('nova:*'));


These are the TSV columns of last name:
"'3865100':4 'janez':1A 'janezno...@mailinator.com':3B 'novak':2A"
"'3865100':4 'acevent...@mailinator.com':3B 'ana':1A 'novak':2A"
"'3865100':4 'novak':2A 'tine':1A 'tno...@mailinator.com':3B"
"'2141500':4 'alen.n...@gmailer.com':3B 'allan':1A 'novak':2A"

And the first name:
"'38651604724':6 'brez':3A 'list':4A 'nov':1A 
'novreceiver...@mailinator.com':5B 'receiv':2A"



What is going on here?



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search tsv column aproach vs concat confusion

2016-11-18 Thread Artur Zakirov

Hi,

On 16.11.2016 11:54, cen wrote:

Hi

I am seeking some clarification in regard to full text search across
multiple tables and what the best approach is. Documentation talks about
two approaches when it comes to building a document: on-the-fly concat
of columns and a dedicated tsv column approach. Let's say I want to
perform a search for "txt1 & txt2 & txt3" on columns table1.col1,
table1.col2 and table2.col1. I see the following solutions:

1. Concat all three into a document and perform a FTS.

SELECT * FROM (
  SELECT to_tsvector(table1.col1) || to_tsvector(table1.col2) || to_tsvector(table2.col1) AS document
  FROM table1 LEFT JOIN table2 ON table1.table2_id = table2.id
) subquery
WHERE subquery.document @@ to_tsquery(unaccent(?));

2. Create a tsv column in each table, concat tsv columns and perform
FTS on that.

SELECT * FROM table1 LEFT JOIN table2 ON table1.table2_id = table2.id
WHERE table1.tsv || table2.tsv @@ to_tsquery(unaccent(?));

3. Have a tsv column only in table1 and insert table2.col1 to the tsv
via triggers. Works but seems very hacky.

It seems to me that option #2 is fast and easy to implement, but I am
not sure what the concat of tsvs really means from an index usage and
performance standpoint. Option #1 is the most flexible and I'd use that
all the time if it was not THAT much slower than the tsv column approach.
Documentation on TSV columns states: "Another advantage is that
searches will be faster, since it will not be necessary to redo the
to_tsvector calls to verify index matches."

The question is, how much faster are tsv columns really? Are there any
benchmarks about this? If the performance difference is negligible, I'd
advocate that using tsv columns is a waste of time and space in most
general cases. But since there is no information on how much faster,
it's hard to decide.



I don't have any such benchmarks. But if you have a real database, you can 
run tests on it against your candidate solutions, because it depends on your 
task and what you need.


By the way, I suppose it is better to use COALESCE() function if your 
columns could have NULL value:


SELECT * FROM (

SELECT to_tsvector(coalesce(table1.col1,'')) ||
   to_tsvector(coalesce(table1.col2,'')) ||
   to_tsvector(coalesce(table2.col1,'')) as document FROM table1 
LEFT JOIN table2 ON table1.table2_id=table2.id


) subquery WHERE subquery.document @@ to_tsquery(unaccent(?));

And specifying a text search configuration makes queries a little bit 
faster:


... to_tsvector('english', coalesce(table1.col1,'')) ...

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Full text search tsv column aproach vs concat confusion

2016-11-16 Thread cen

Hi

I am seeking some clarification in regard to full text search across 
multiple tables and what the best approach is. Documentation talks about 
two approaches when it comes to building a document: on-the-fly concat 
of columns and a dedicated tsv column approach. Let's say I want to 
perform a search for "txt1 & txt2 & txt3" on columns table1.col1, 
table1.col2 and table2.col1. I see the following solutions:


1. Concat all three into a document and perform a FTS.

SELECT * FROM (
  SELECT to_tsvector(table1.col1) || to_tsvector(table1.col2) || to_tsvector(table2.col1) AS document
  FROM table1 LEFT JOIN table2 ON table1.table2_id = table2.id
) subquery
WHERE subquery.document @@ to_tsquery(unaccent(?));


2. Create a tsv column in each table, concat tsv columns and perform 
FTS on that.

SELECT * FROM table1 LEFT JOIN table2 ON table1.table2_id = table2.id
WHERE table1.tsv || table2.tsv @@ to_tsquery(unaccent(?));


3. Have a tsv column only in table1 and insert table2.col1 to the tsv 
via triggers. Works but seems very hacky.


It seems to me that option #2 is fast and easy to implement, but I am 
not sure what the concat of tsvs really means from an index usage and 
performance standpoint. Option #1 is the most flexible and I'd use that 
all the time if it was not THAT much slower than the tsv column approach. 
Documentation on TSV columns states: "Another advantage is that 
searches will be faster, since it will not be necessary to redo the 
to_tsvector calls to verify index matches."

The question is, how much faster are tsv columns really? Are there any 
benchmarks about this? If the performance difference is negligible, I'd 
advocate that using tsv columns is a waste of time and space in most 
general cases. But since there is no information on how much faster,
it's hard to decide.
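
For reference, a minimal sketch of the per-table tsv column in option #2
(column names are taken from the example above; the trigger and index names
are made up):

ALTER TABLE table1 ADD COLUMN tsv tsvector;
UPDATE table1
   SET tsv = to_tsvector('english', coalesce(col1, '') || ' ' || coalesce(col2, ''));
CREATE INDEX table1_tsv_idx ON table1 USING gin (tsv);
-- Keep the column up to date on writes
CREATE TRIGGER table1_tsv_update
    BEFORE INSERT OR UPDATE ON table1
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger(tsv, 'pg_catalog.english', col1, col2);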



Best regards,
Klemen




Re: [GENERAL] Full text search on Chemistry text

2016-06-08 Thread John McKown
On Wed, Jun 8, 2016 at 8:46 AM, Allan Kamau  wrote:

> I would like to generate tsvectors on documents that contain chemistry
> related text.
> Is there a synonym dictionary for chemistry terms available?
>
> -Allan
>

I did a Google search on "chemistry terms dictionary" and got a lot of
hits. Are you asking for a "canonical" and "official" list?



-- 
The unfacts, did we have them, are too imprecisely few to warrant our
certitude.

Maranatha! <><
John McKown


[GENERAL] Full text search on Chemistry text

2016-06-08 Thread Allan Kamau
I would like to generate tsvectors on documents that contain chemistry
related text.
Is there a synonym dictionary for chemistry terms available?

-Allan


Re: [GENERAL] full text search index

2016-05-26 Thread Kevin Grittner
You don't provide much context, like PostgreSQL version or machine
characteristics.

https://wiki.postgresql.org/wiki/SlowQueryQuestions

On Wed, May 25, 2016 at 11:04 PM, Patrick Baker
 wrote:

>> SELECT COUNT(DISTINCT j0_.id) AS sclr10
>> FROM customers j0_
>> WHERE ((LOWER(j0_.name_first) LIKE '%some%'
>> OR LOWER(j0_.name_last) LIKE '%some%')
>>AND j0_.id = 5)
>>   AND j0_.id = 5
>
> The query is taking ages to run.
>
> I read about wildcards and it seems I have to use a function with
> to_tsvector ?

I very much doubt that full text search is going to be helpful here
-- perhaps trigrams with an appropriate gist or gin index could
help.  Depending on table sizes and data present, picking out rows
based on the OR of scanning for a sequence of characters in a
couple character string columns might not be your fastest query to
run.
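
A hedged sketch of the trigram route (index names are made up; pg_trgm's
gin_trgm_ops supports LIKE/ILIKE lookups, so the original predicate on
LOWER(name_first)/LOWER(name_last) can stay as it is):

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX customers_name_first_trgm ON customers USING gin (lower(name_first) gin_trgm_ops);
CREATE INDEX customers_name_last_trgm  ON customers USING gin (lower(name_last)  gin_trgm_ops);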

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search index

2016-05-26 Thread Alban Hertroys
On 26 May 2016 at 06:04, Patrick Baker  wrote:
> Hi there,
>
> I've got the following query:
>>
>>
>> SELECT COUNT(DISTINCT j0_.id) AS sclr10
>> FROM customers j0_
>> WHERE ((LOWER(j0_.name_first) LIKE '%some%'
>> OR LOWER(j0_.name_last) LIKE '%some%')
>>AND j0_.id = 5)
>>   AND j0_.id = 5
>
>
> The query is taking ages to run.

Your guess is as good as ours without knowing what query plan the
database decided on. Post the output of explain analyze.

> I read about wildcards and it seems I have to use a function with
> to_tsvector ?

In general, you have to use an expression of which the query planner
can see that it's equivalent to the expression used in the index.
Otherwise the query planner has no way of knowing whether the index is
suitable for the query and it won't use the index.

>> CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid",
>> ("full_text_universal_cast"("name_first"::"text")),
>> ("full_text_universal_cast"("name_last"::"text")));

In your case, you should query on full_text_universal_cast(your_field)
instead of on like '%some%'.

Alternatively, if your query always uses the same wildcard expression
you could create indexes on your_field like '%some%'.

> full_text_universal_cast:
>>
>> CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data
>> "text")
>>   RETURNS "tsvector" AS
>> $BODY$
>> SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), ''));
>> $BODY$
>>   LANGUAGE sql IMMUTABLE
>>   COST 1000;

The query planner has no way of knowing what this function does
internally, so it certainly won't match the function results in the
index up with your like expression.
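
A hedged sketch of querying through the indexed expression instead of LIKE
(note that full text matching finds whole lexemes, not arbitrary substrings,
and whether the multicolumn GIN index is actually chosen is something only
EXPLAIN ANALYZE on the real data can confirm):

SELECT COUNT(DISTINCT j0_.id) AS sclr10
FROM customers j0_
WHERE j0_.id = 5
  AND (   full_text_universal_cast(j0_.name_first::text) @@ plainto_tsquery('english', 'some')
       OR full_text_universal_cast(j0_.name_last::text)  @@ plainto_tsquery('english', 'some'));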

Regards,

Alban Hertroys
-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search index

2016-05-26 Thread Patrick Baker
>
>
>
> Maybe Lucas Possamai can help.  He seems to be a little bit further along
> in this exercise.
>
> I'm too tired to care at the moment.  And I haven't had much practical
> work here anyway.
>
> David J.
>
>
>
>

I subscribed to the list today, so I don't have the old emails.

I had a look at the archives though. Lucas, do you have any progress? If so,
please let me know :)


Thanks
Patrick


[GENERAL] full text search index

2016-05-25 Thread David G. Johnston
On Thursday, May 26, 2016, Patrick Baker  wrote:

> Hi there,
>
> I've got the following query:
>
>>
>> SELECT COUNT(DISTINCT j0_.id) AS sclr10
>> FROM customers j0_
>> WHERE ((LOWER(j0_.name_first) LIKE '%some%'
>> OR LOWER(j0_.name_last) LIKE '%some%')
>>AND j0_.id = 5)
>>   AND j0_.id = 5
>
>
> The query is taking ages to run.
>
> I read about wildcards and it seems I have to use a function with to_tsvector
> ?
>
>>
>> CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid",
>> ("full_text_universal_cast"("name_first"::"text")),
>> ("full_text_universal_cast"("name_last"::"text")));
>
>
> full_text_universal_cast:
>
>> CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data
>> "text")
>>   RETURNS "tsvector" AS
>> $BODY$
>> SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), ''));
>> $BODY$
>>   LANGUAGE sql IMMUTABLE
>>   COST 1000;
>
>
>
>
> Would be something like above? Because it's not working...
>
> What am I missing guys?
> Thanks
>

Maybe Lucas Possamai can help.  He seems to be a little bit further along
in this exercise.

I'm too tired to care at the moment.  And I haven't had much practical work
here anyway.

David J.


[GENERAL] full text search index

2016-05-25 Thread Patrick Baker
Hi there,

I've got the following query:

>
> SELECT COUNT(DISTINCT j0_.id) AS sclr10
> FROM customers j0_
> WHERE ((LOWER(j0_.name_first) LIKE '%some%'
> OR LOWER(j0_.name_last) LIKE '%some%')
>AND j0_.id = 5)
>   AND j0_.id = 5


The query is taking ages to run.

I read about wildcards and it seems I have to use a function with to_tsvector
?

>
> CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid",
> ("full_text_universal_cast"("name_first"::"text")),
> ("full_text_universal_cast"("name_last"::"text")));


full_text_universal_cast:

> CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data "text")
>   RETURNS "tsvector" AS
> $BODY$
> SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), ''));
> $BODY$
>   LANGUAGE sql IMMUTABLE
>   COST 1000;




Would be something like above? Because it's not working...

What am I missing guys?
Thanks


Re: [GENERAL] Full text search question: "01.Bez." --> "Erster Bezirk"

2016-03-13 Thread Dane Foster
Hello,

On Sat, Mar 12, 2016 at 11:40 AM, Johann Höchtl 
wrote:

> I fear I have an involved challenge concerning FTS.
>
> Assume I have the following text in a column:
>
> Graz,06.Bez.:Blah
>
> This parses as:
> SELECT alias, description, token FROM ts_debug('german',
> 'Graz,06.Bez.:Blah');
>alias   |   description   | token
> ---+-+
>  asciiword | Word, all ASCII | Graz
>  blank | Space symbols   | ,
>  host  | Host| 06.Bez
>  blank | Space symbols   | .:
>  asciiword | Word, all ASCII | Blah
>
>
> Bez. ist the abbreviation for "Bezirk" (german for ~district). 06.Bez
> means "6th district"
>
> My first problem might be that the parser identifies "06.Bez." as a host
> lexeme, but ...
>
> I already defined a synonym dictionary to enable searching for "Bezirk",
> when there is only "Bez." in the database:
>
> file: bevaddress_host.syn:
> 01.bez bezirk
> 06.bez bezirk
> 
>
>
> CREATE TEXT SEARCH DICTIONARY bevaddress_host_syn (
> TEMPLATE = synonym,
> SYNONYMS = bevaddress_host
> );
> ALTER TEXT SEARCH CONFIGURATION german ALTER MAPPING FOR host WITH
> bevaddress_host_syn, simple;
>
>
> I wonder how I can achieve to be able to search for "Erster Bezirk"
> ("First district") to match eg. "01.Bez."
>
> Thank you for your help, Johann
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

As of the time of writing this I haven't seen any replies to your post so
you may not be aware that an answer was provided to your specific question
in a blog. http://obartunov.livejournal.com/185579.html

Regards,

Dane


[GENERAL] Full text search question: "01.Bez." --> "Erster Bezirk"

2016-03-12 Thread Johann Höchtl

I fear I have an involved challenge concerning FTS.

Assume I have the following text in a column:

Graz,06.Bez.:Blah

This parses as:
SELECT alias, description, token FROM ts_debug('german', 
'Graz,06.Bez.:Blah');

   alias   |   description   | token
-----------+-----------------+--------
 asciiword | Word, all ASCII | Graz
 blank | Space symbols   | ,
 host  | Host| 06.Bez
 blank | Space symbols   | .:
 asciiword | Word, all ASCII | Blah


Bez. is the abbreviation for "Bezirk" (German for ~district). 06.Bez 
means "6th district"


My first problem might be that the parser identifies "06.Bez." as a host 
lexeme, but ...


I already defined a synonym dictionary to enable searching for "Bezirk", 
when there is only "Bez." in the database:


file: bevaddress_host.syn:
01.bez bezirk
06.bez bezirk



CREATE TEXT SEARCH DICTIONARY bevaddress_host_syn (
TEMPLATE = synonym,
SYNONYMS = bevaddress_host
);
ALTER TEXT SEARCH CONFIGURATION german ALTER MAPPING FOR host WITH 
bevaddress_host_syn, simple;



I wonder how I can make a search for "Erster Bezirk" 
("First district") match e.g. "01.Bez."


Thank you for your help, Johann


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] : :Full text search query ::

2015-02-25 Thread JD
Hi All,

please find herewith the following  query

1. select * from partdetails where scode=118 and
(to_tsvector('english'::regconfig, part_number::text)) @@ to_tsquery('104')

it is showing only 1 record as output, it is expected to give 17 records as
output.

2. select * from partdetails where scode=118 and
(to_tsvector('english'::regconfig, part_number::text)) @@ to_tsquery('104/')

it is showing only 1 record as output, it is expected to give 17 records as
output.

3. select * from partdetails where scode=118 and
(to_tsvector('english'::regconfig, part_number::text)) @@
to_tsquery('104/1')

it is showing  17 records as output.

In our search case we are passing the parameter value 104 and expect to
get 17 records.


Could someone kindly guide us here?


Re: [GENERAL] : :Full text search query ::

2015-02-25 Thread Tomas Vondra
Hi,

On 25.2.2015 12:50, JD wrote:
 Hi All,
 
 please find herewith the following  query
 
 1. select * from partdetails where scode=118 and 
 (to_tsvector('english'::regconfig, part_number::text)) @@ to_tsquery('104')
 
 it is showing only 1 record as output, it is expected to give 17 records
 as output.
 
 2. select * from partdetails where scode=118 and 
 (to_tsvector('english'::regconfig, part_number::text)) @@ to_tsquery('104/')
 
 it is showing only 1 record as output, it is expected to give 17 records
 as output.
 
 3. select * from partdetails where scode=118 and 
 (to_tsvector('english'::regconfig, part_number::text)) @@
 to_tsquery('104/1')
 
 it is showing  17 records as output.
 
 In our search case we are passing parameter value as 104 and expected to
 get 17 records.
 
 
 Kindly some one guide here.

You need to post 'to_tsvector('english', part_number)' for the 16 rows
that you think should be returned but aren't.

Fulltext works so that it transforms the source (part_number in this
case) as defined in the text search configuration ('english'), and
compares this with the tsquery.

My bet is that the transformation keeps the whole string ('104/1')  in
this case, so that it does not match the tsquery.

ISTM you're trying to do a prefix search on the part_number. In that
case fulltext may not be the right solution, because it's fuzzy by
nature. If you have two-part part numbers (i.e. it's always A/B) then
maybe split that into two fields, and use simple equality conditions on
each field.

So instead of column 'part_number' containing value '104/1', use two
columns part_number_a and part_number_b, containing values '104' and
'1', and simple equality queries

WHERE part_number_a = '104' and part_number_b = '1'

or (if you want to match just the first part)

   WHERE part_number_a = '104'

Another option is to use an index with a 'varchar_pattern_ops' opclass,
which allows you to do prefix LIKE queries [1]

   CREATE INDEX custom_partnum_idx
 ON partdetails (part_number varchar_pattern_ops);


   SELECT ... FROM partdetails WHERE part_number LIKE '104/%'


[1] http://www.postgresql.org/docs/9.2/static/indexes-opclass.html

regards

-- 
Tomas Vondrahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Full text search prefix matching

2014-12-16 Thread Heikki Rauhala
Hi,

I’m trying to use PostgreSQL's full text search for searching names with prefix 
matching. I’ve got a materialized view with the  tsvector’s in an indexed 
column which I’m then searching with prefix matching, as in the sqlfiddle: 
http://sqlfiddle.com/#!15/a2389/6 and below.

My problem is that when using the ‘finnish’ text search configuration, the 
names are split before the end, and they’re not matched when searching with 
prefix search that has exactly one character more than the lexeme, as also 
demonstrated in the above fiddle. When there are two characters after the 
lexeme, it does match.

I’m working around the issue by using ‘simple’ configuration, which includes 
the full words in the lexemes.

Should text search prefixes work predictably as documented in [1] even if the 
lexemes are shorter than the query? How can I get it to work?

Best regards,

Heikki Rauhala


[1] 
http://www.postgresql.org/docs/9.3/static/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES

create table names (name varchar);
insert into names (name) values ('Sofia');

create materialized view name_fulltext as
  select
name,
to_tsvector('finnish', name) as
  searchable_index_col
  from names;

select * from name_fulltext;
select to_tsquery('finnish','sof:*');

select 'found sof', name 
from name_fulltext 
where searchable_index_col@@to_tsquery('finnish','sof:*');

select 'notfound sofi', name 
from name_fulltext 
where searchable_index_col@@to_tsquery('finnish','sofi:*');

select 'found sofia', name 
from name_fulltext 
where searchable_index_col@@to_tsquery('finnish','sofia:*');

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search prefix matching

2014-12-16 Thread Vincent Veyron
On Tue, 16 Dec 2014 14:59:51 +0200
Heikki Rauhala heikki.rauh...@reaktor.fi wrote:

Hi Heikki,

There is a typo :

 
 create materialized view name_fulltext as

ERROR:  syntax error at or near "materialized"
LINE 1: create materialized view name_fulltext as

You might want to post the results too, next time; I posted them with a 
corrected script below:

create table names (name varchar);
insert into names (name) values ('Sofia');

create view name_fulltext as
  select
name,
to_tsvector('finnish', name) as
  searchable_index_col
  from names;

select * from name_fulltext;
select to_tsquery('finnish','sof:*');

select 'found sof', name 
from name_fulltext 
where searchable_index_col@@to_tsquery('finnish','sof:*');

select 'notfound sofi', name 
from name_fulltext 
where searchable_index_col@@to_tsquery('finnish','sofi:*');

select 'found sofia', name 
from name_fulltext 
where searchable_index_col@@to_tsquery('finnish','sofia:*');


vv= select * from name_fulltext;
 name  | searchable_index_col 
---+--
 Sofia | 'sof':1
(1 ligne)

vv= select to_tsquery('finnish','sof:*');
 to_tsquery 

 'sof':*
(1 ligne)

vv= 
vv= select 'found sof', name 
vv- from name_fulltext 
vv- where searchable_index_col@@to_tsquery('finnish','sof:*');
 ?column?  | name  
---+---
 found sof | Sofia
(1 ligne)

vv= 
vv= select 'notfound sofi', name 
vv- from name_fulltext 
vv- where searchable_index_col@@to_tsquery('finnish','sofi:*');
 ?column? | name 
--+--
(0 ligne)

vv= 
vv= select 'found sofia', name 
vv- from name_fulltext 
vv- where searchable_index_col@@to_tsquery('finnish','sofia:*');
  ?column?   | name  
-+---
 found sofia | Sofia
(1 ligne)




-- 
Regards, Vincent Veyron 

https://libremen.com/ 
Legal case, contract and insurance claim management software


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search prefix matching

2014-12-16 Thread Tom Lane
Heikki Rauhala heikki.rauh...@reaktor.fi writes:
 Should text search prefixes work predicatably as documented in [1] even if 
 the lexemes are shorter than the query? How can I get it to work?

I believe what you're seeing can be explained by these observations:

regression=# select to_tsvector('finnish', 'sofia');
 to_tsvector 
-
 'sof':1
(1 row)

regression=# select to_tsquery('finnish','sofia:*');
 to_tsquery 

 'sof':*
(1 row)

regression=# select to_tsquery('finnish','sofi:*');
 to_tsquery 

 'sofi':*
(1 row)

regression=# select to_tsquery('finnish','sof:*');
 to_tsquery 

 'sof':*
(1 row)

What this shows is that the finnish configuration includes a word-stemming
rule that strips off "ia".  It won't strip off just "i" though, so "sofi"
doesn't get reduced to the same root and therefore doesn't match "sofia".
The * addition does nothing for you here since it allows matching in
the other direction (query shorter than target).

I know nothing of Finnish so I can't say just how correct these particular
stemming rules are for that language; perhaps they need adjustment.  But
it seems to me that if you want blind non-language-aware prefix matching,
you probably don't want the full-text-search machinery at all.  Full text
search is meant to deal with words, both in the documents and the queries.
You might take a look at pg_trgm as an alternative.
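
A hedged sketch of that alternative on the names table from the original post
(index name is made up; gin_trgm_ops supports LIKE/ILIKE searches, so this is
plain, non-linguistic prefix matching with no stemming involved):

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX names_name_trgm ON names USING gin (name gin_trgm_ops);

SELECT name FROM names WHERE name ILIKE 'sofi%';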

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search prefix matching

2014-12-16 Thread Heikki Rauhala
On 16.12.2014, at 17:09, Vincent Veyron vv.li...@wanadoo.fr wrote:

 On Tue, 16 Dec 2014 14:59:51 +0200
 Heikki Rauhala heikki.rauh...@reaktor.fi wrote:
 
 Hi Heikki,
 
 There is a typo :
 
 create materialized view name_fulltext as
 
 ERREUR:  erreur de syntaxe sur ou près de « materialized »
 LIGNE 1 : create materialized view name_fulltext as
 

I forgot to mention the postgres version, which is 9.3. Materialized views were 
introduced in 9.3 and are important to me in this case, because they can be 
indexed.

Next time I’ll include both the version and the full output.

Best regards, 

 - Heikki Rauhala




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full-Text Search question

2014-05-14 Thread Dorian Hoxha
Search for fulltext tutorial  + json functions
http://www.postgresql.org/docs/9.3/static/functions-json.html


On Wed, May 14, 2014 at 1:00 AM, Jesus Rafael Sanchez Medrano 
jesusraf...@gmail.com wrote:

 thanks... could you please be so kind to post some snippet/code for this?

 Att.
 ==
 Jesus Rafael Sanchez Medrano
 Life is a dream, of which all must wake up


 On Tue, May 13, 2014 at 5:33 PM, Oleg Bartunov obartu...@gmail.comwrote:

 Easy, you need to extract text fields from json and construct tsvector
 from them (use concatenation, for example).

 On Tue, May 13, 2014 at 7:38 PM, Jesus Rafael Sanchez Medrano
 jesusraf...@gmail.com wrote:
  can postgres do FTS (full text search) on a json column? if possible,
 please
  be so kindd to give some snippet/example.
 
 
  Att.
  ==
  Jesus Rafael Sanchez Medrano
  Life is a dream, of which all must wake up





[GENERAL] Full-Text Search question

2014-05-13 Thread Jesus Rafael Sanchez Medrano
can postgres do FTS (full text search) on a json column? if possible,
please be so kindd to give some snippet/example.


Att.
==
Jesus Rafael Sanchez Medrano
Life is a dream, of which all must wake up


Re: [GENERAL] Full-Text Search question

2014-05-13 Thread Oleg Bartunov
Easy, you need to extract text fields from json and construct tsvector
from them (use concatenation, for example).
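
A hedged sketch, for a hypothetical table docs with a json column named data
holding text fields 'title' and 'body' (names are assumptions, not from the
thread; the json ->> operator needs 9.3 or later):

SELECT *
FROM docs
WHERE to_tsvector('english',
        coalesce(data->>'title', '') || ' ' || coalesce(data->>'body', ''))
      @@ plainto_tsquery('english', 'search words');

-- An expression index on exactly the same expression keeps this indexable:
CREATE INDEX docs_fts_idx ON docs USING gin (
    to_tsvector('english',
        coalesce(data->>'title', '') || ' ' || coalesce(data->>'body', '')));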

On Tue, May 13, 2014 at 7:38 PM, Jesus Rafael Sanchez Medrano
jesusraf...@gmail.com wrote:
 can postgres do FTS (full text search) on a json column? if possible, please
 be so kindd to give some snippet/example.


 Att.
 ==
 Jesus Rafael Sanchez Medrano
 Life is a dream, of which all must wake up


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full-Text Search question

2014-05-13 Thread Jesus Rafael Sanchez Medrano
thanks... could you please be so kind to post some snippet/code for this?

Att.
==
Jesus Rafael Sanchez Medrano
Life is a dream, of which all must wake up


On Tue, May 13, 2014 at 5:33 PM, Oleg Bartunov obartu...@gmail.com wrote:

 Easy, you need to extract text fields from json and construct tsvector
 from them (use concatenation, for example).

 On Tue, May 13, 2014 at 7:38 PM, Jesus Rafael Sanchez Medrano
 jesusraf...@gmail.com wrote:
  can postgres do FTS (full text search) on a json column? if possible,
 please
  be so kindd to give some snippet/example.
 
 
  Att.
  ==
  Jesus Rafael Sanchez Medrano
  Life is a dream, of which all must wake up



Re: [GENERAL] Full text search on partial URLs

2014-01-03 Thread Zev Benjamin

On 11/15/2013 07:40 PM, Zev Benjamin wrote:


One problem that I've run into here is that I would also like to
highlight matched text in my application.  For my existing search
solution, I do this with ts_headline.  For partial matches, it's
unfortunately not just a matter of searching for the text and adding the
appropriate markup because my documents are HTML (the FTS lexer
helpfully pulls out all the HTML tags so it hasn't been a problem so
far) and we don't want to accidentally highlight some of the
attributes of the markup.

One way to solve this would be if there were a way to turn a tsvector
and tsquery pair into a list of the offsets and lengths of the lexemes
that match.  The highlighting could then be done at the application
level rather than the database level while still leveraging Postgres's
FTS functionality.


I've written C functions to implement this and attached them to this 
email.  The support files necessary for making a module are available at
https://github.com/zbenjamin/tsearch_extras.  I'm new to the PostgreSQL 
code base so any feedback or comments would be greatly appreciated. 
Would these be appropriate to submit as patches to PostgreSQL?



Thanks,
Zev

/*-
 *
 * This use of this program is subject to the terms of The PostgreSQL License:
 *
 * Copyright (c) 2013, Zulip, Inc.
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation for any purpose, without fee, and without a written agreement
 * is hereby granted, provided that the above copyright notice and this
 * paragraph and the following two paragraphs appear in all copies.
 *
 * IN NO EVENT SHALL ZULIP, INC. BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT,
 * SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS,
 * ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF ZULIP,
 * INC. HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * ZULIP, INC. SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED
 * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN AS IS BASIS, AND Zulip,
 * Inc. HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES,
 * ENHANCEMENTS, OR MODIFICATIONS.
 *
 *-
 */

#include "postgres.h"

#include "fmgr.h"
#include "funcapi.h"
#include "catalog/pg_type.h"
#include "tsearch/ts_utils.h"
#include "tsearch/ts_public.h"
#include "tsearch/ts_cache.h"
#include "utils/elog.h"
#include "utils/array.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;


typedef struct {
	int4 cur_word;
	int4 num_words;
	int4 char_offset;
	HeadlineWordEntry* words;
} TsMatchesData;

typedef struct {
	int4 offset;
	int4 len;
} TsMatchLocation;

PG_FUNCTION_INFO_V1(ts_match_locs_array);
PG_FUNCTION_INFO_V1(ts_match_locs_array_byid);
PG_FUNCTION_INFO_V1(tsvector_lexemes);

Datum ts_match_locs_array(PG_FUNCTION_ARGS);
Datum ts_match_locs_array_byid(PG_FUNCTION_ARGS);
Datum tsvector_lexemes(PG_FUNCTION_ARGS);

/* Run the headline machinery so that words matching the query get marked "selected". */
static void
ts_match_locs_setup(Oid cfgId, TsMatchesData *mdata, text* in, TSQuery query)
{
	HeadlineParsedText prs;
	TSConfigCacheEntry *cfg;
	TSParserCacheEntry *prsobj;

	cfg = lookup_ts_config_cache(cfgId);
	prsobj = lookup_ts_parser_cache(cfg->prsId);

	memset(&prs, 0, sizeof(HeadlineParsedText));
	prs.lenwords = 32;
	prs.words = (HeadlineWordEntry *) palloc(sizeof(HeadlineWordEntry) * prs.lenwords);

	hlparsetext(cfgId, &prs, query, VARDATA(in), VARSIZE(in) - VARHDRSZ);

	FunctionCall3(&(prsobj->prsheadline),
				  PointerGetDatum(&prs),
				  PointerGetDatum(NIL),
				  PointerGetDatum(query));

	mdata->cur_word = 0;
	mdata->char_offset = 0;
	mdata->num_words = prs.curwords;
	mdata->words = prs.words;
}

/* Walk the parsed words, reporting the character offset and length of each selected word. */
static bool
ts_match_locs_next_match(TsMatchesData *mdata, TsMatchLocation *match)
{
	while (mdata->cur_word < mdata->num_words)
	{
		HeadlineWordEntry* word = mdata->words + mdata->cur_word;
		int offset = mdata->char_offset;

		mdata->cur_word++;
		if (! word->skip)
		{
			mdata->char_offset += word->len;

			if (word->selected)
			{
				match->offset = offset;
				match->len = word->len;
				return true;
			}
		}
	}

	return false;
}

Datum
ts_match_locs_array_byid(PG_FUNCTION_ARGS)
{
	TsMatchesData mdata;
	TsMatchLocation match;
	Oid cfgId = PG_GETARG_OID(0);
	text *in = PG_GETARG_TEXT_P(1);
	TSQuery query = PG_GETARG_TSQUERY(2);
	ArrayType *result;
	Datum *elems;
	int num_matches_allocd = 6; /* a random guess */
	int num_matches = 0;
	int result_dims[2];
	int result_lbs[2];

	elems = palloc(sizeof(Datum) * 2 * num_matches_allocd);

	ts_match_locs_setup(cfgId, &mdata, in, query);

	while (ts_match_locs_next_match(&mdata, &match))
	{
		if (num_matches >= num_matches_allocd) {
			num_matches_allocd *= 1.5;
			elems = repalloc(elems, sizeof(Datum) * 2 * num_matches_allocd);
		}
		elems[num_matches * 2] = Int32GetDatum(match.offset);
		elems[num_matches * 2 + 1] = 

[GENERAL] Full text search on partial URLs

2013-11-06 Thread Zev Benjamin

Hi,

I have Postgres full text search set up for my application and it's been 
working great!  However, my users would like their searches to turn up 
parts of URLs.  For example, they would like a search for "foobar" to 
turn up a document that contains the string 
"http://example.com/foobar/blah" (and similarly for queries like 
"example" and "blah").  With the default dictionaries for host, url, and 
url_path, the search query would have to contain the complete host or 
url path.


What is the best way to accomplish this?  Should I be looking at 
building a custom dictionary that breaks down hosts and urls or is there 
something simpler I can do?



Thanks,
Zev


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search on partial URLs

2013-11-06 Thread bricklen
On Wed, Nov 6, 2013 at 10:45 AM, Zev Benjamin
zev-pg...@strangersgate.comwrote:

 Hi,

 I have Postgres full text search set up for my application and it's been
 working great!  However, my users would like their searches to turn up
 parts of URLs.  For example, they would like a search for foobar to turn
 up a document that contains the string http://example.com/foobar/blah;
 (and similarly for queries like example and blah).  With the default
 dictionaries for host, url, and url_path, the search query would have to
 contain the complete host or url path.

 What is the best way to accomplish this?  Should I be looking at building
 a custom dictionary that breaks down hosts and urls or is there something
 simpler I can do?


Have you looked into trigrams?
http://www.postgresql.org/docs/current/static/pgtrgm.html


Re: [GENERAL] Full text search on partial URLs

2013-11-06 Thread Zev Benjamin

On 11/06/2013 01:47 PM, bricklen wrote:


On Wed, Nov 6, 2013 at 10:45 AM, Zev Benjamin
zev-pg...@strangersgate.com wrote:

Hi,

I have Postgres full text search set up for my application and it's
been working great!  However, my users would like their searches to
turn up parts of URLs.  For example, they would like a search for
foobar to turn up a document that contains the string
http://example.com/foobar/blah
(and similarly for queries like example and blah).  With the
default dictionaries for host, url, and url_path, the search query
would have to contain the complete host or url path.

What is the best way to accomplish this?  Should I be looking at
building a custom dictionary that breaks down hosts and urls or is
there something simpler I can do?


Have you looked into trigrams?
http://www.postgresql.org/docs/current/static/pgtrgm.html


I've looked at it in the context of adding fuzzy search.  But my 
understanding is that doing a fuzzy search here would only work if the 
query were a significant fraction of, say, the url path.  For example, I 
would expect a fuzzy search of foobar on /foobar/x to return a high 
similarity, but a fuzzy search of foobar on 
/foobar/some/very/long/path/x to have a low similarity.


Or are you suggesting using trigrams in a different way?


Zev


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search on partial URLs

2013-11-06 Thread bricklen
On Wed, Nov 6, 2013 at 10:53 AM, Zev Benjamin
zev-pg...@strangersgate.comwrote:

 On 11/06/2013 01:47 PM, bricklen wrote:


 On Wed, Nov 6, 2013 at 10:45 AM, Zev Benjamin
 zev-pg...@strangersgate.com wrote:

 Hi,

 I have Postgres full text search set up for my application and it's
 been working great!  However, my users would like their searches to
 turn up parts of URLs.  For example, they would like a search for
 foobar to turn up a document that contains the string
 http://example.com/foobar/blah
 (and similarly for queries like example and blah).  With the
 default dictionaries for host, url, and url_path, the search query
 would have to contain the complete host or url path.

 What is the best way to accomplish this?  Should I be looking at
 building a custom dictionary that breaks down hosts and urls or is
 there something simpler I can do?


 Have you looked into trigrams?
 http://www.postgresql.org/docs/current/static/pgtrgm.html


 I've looked at it in the context of adding fuzzy search.  But my
 understanding is that doing a fuzzy search here would only work if the
 query were a significant fraction of, say, the url path.  For example, I
 would expect a fuzzy search of foobar on /foobar/x to return a high
 similarity, but a fuzzy search of foobar on /foobar/some/very/long/path/x
 to have a low similarity.

 Or are you suggesting using trigrams in a different way?


Yeah, I was thinking more along the lines of allowing wildcard searching,
not similarity.

Eg.
CREATE INDEX yourtable_yourcol_gist_fbi ON yourtable using GIST ( yourcol
gist_trgm_ops );
select * from yourtable where yourcol ~~ '%foobar%';


Re: [GENERAL] Full text search on partial URLs

2013-11-06 Thread Zev Benjamin



On 11/06/2013 02:04 PM, bricklen wrote:


On Wed, Nov 6, 2013 at 10:53 AM, Zev Benjamin
zev-pg...@strangersgate.com wrote:

On 11/06/2013 01:47 PM, bricklen wrote:


On Wed, Nov 6, 2013 at 10:45 AM, Zev Benjamin
zev-pg...@strangersgate.com wrote:

 Hi,

 I have Postgres full text search set up for my application
and it's
 been working great!  However, my users would like their
searches to
 turn up parts of URLs.  For example, they would like a
search for
 foobar to turn up a document that contains the string
 http://example.com/foobar/blah
 (and similarly for queries like example and blah).  With the
 default dictionaries for host, url, and url_path, the
search query
 would have to contain the complete host or url path.

 What is the best way to accomplish this?  Should I be
looking at
 building a custom dictionary that breaks down hosts and
urls or is
 there something simpler I can do?


Have you looked into trigrams?
http://www.postgresql.org/docs/current/static/pgtrgm.html


I've looked at it in the context of adding fuzzy search.  But my
understanding is that doing a fuzzy search here would only work if
the query were a significant fraction of, say, the url path.  For
example, I would expect a fuzzy search of foobar on /foobar/x to
return a high similarity, but a fuzzy search of foobar on
/foobar/some/very/long/path/x to have a low similarity.

Or are you suggesting using trigrams in a different way?


Yeah, I was thinking more along the lines of allowing wildcard
searching, not similarity.

Eg.
CREATE INDEX yourtable_yourcol_gist_fbi ON yourtable using GIST (
yourcol gist_trgm_ops );
select * from yourtable where yourcol ~~ '%foobar%';



Hrm.  That might work.  So the application-level search functionality 
would be the union of tsearch and trigram wildcard matching.


If anyone else has other ideas, I'd be interested in hearing them as well.


Thanks,
Zev


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search regression tests

2013-09-17 Thread Beena Emerson
I am sorry this mail was not meant for this list.


[GENERAL] Full text search regression tests

2013-09-17 Thread Beena Emerson
Hello All,

Attached patch adds regression tests to check the full-text search
capability of pg_bigm.


Regards,

Beena Emerson


bigm-fulltext-search-regression.patch
Description: Binary data

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Full text search

2013-07-12 Thread itishree sukla
Hello everyone,

I am using full text search; however, it is not helping me to get
partial matches.

For example, suppose "Pizza Hut" is in my document: if I am searching for
Pizza Hut it is giving me the values only for Pizza, and a spelling mistake
like pizz is not returning anything. Any workaround for this? Please suggest.

Regards,
Itishree


Re: [GENERAL] Full text search

2013-07-12 Thread Raghavendra
On Fri, Jul 12, 2013 at 11:48 AM, itishree sukla
itishree.su...@gmail.comwrote:

 Hello everyone,

 I am using full text search,  however it is not helping me to get the
 partial value.

 For example in  my document let Pizza Hut is there, if i am searching for
 Pizza Hut is it giving me the values for only Pizza or a spell mistake like
 pizz is not returning any thing. any work around for this, please suggest.

 Regards,
 Itishree


Hope you are looking like this.

create table foo(v text);
insert into foo values('Near to my house there\'s no any Pizza Hut
restuarant');
insert into foo values('I wont like pizza, but friends are crazy of it');


postgres=# select * from foo  where v ~* 'pizz';
  v
--
 Near to my house there's no any Pizza Hut restuarant
 I wont like pizza, but friends are crazy of it
(2 rows)

postgres=# select * from foo  where v ~* 'pizza hut';
  v
--
 Near to my house there's no any Pizza Hut restuarant
(1 row)

postgres=# select * from foo  where v ~* 'pizza';
  v
--
 Near to my house there's no any Pizza Hut restuarant
 I wont like pizza, but friends are crazy of it
(2 rows)

or

with ILIKE

select * from foo where v ilike '%hut%';


---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/


Re: [GENERAL] Full text search

2013-07-12 Thread itishree sukla
Hi Raghavendra,

Thanks for your response; however, I want the same kind of result using full
text search. Is it possible?

Regards,
Itishree


On Fri, Jul 12, 2013 at 12:14 PM, Raghavendra 
raghavendra@enterprisedb.com wrote:


 On Fri, Jul 12, 2013 at 11:48 AM, itishree sukla itishree.su...@gmail.com
  wrote:

 Hello everyone,

 I am using full text search,  however it is not helping me to get the
 partial value.

 For example in  my document let Pizza Hut is there, if i am searching for
 Pizza Hut is it giving me the values for only Pizza or a spell mistake like
 pizz is not returning any thing. any work around for this, please suggest.

 Regards,
 Itishree


 Hope you are looking like this.

 create table foo(v text);
 insert into foo values('Near to my house there\'s no any Pizza Hut
 restuarant');
 insert into foo values('I wont like pizza, but friends are crazy of it');


 postgres=# select * from foo  where v ~* 'pizz';
   v
 --
  Near to my house there's no any Pizza Hut restuarant
  I wont like pizza, but friends are crazy of it
 (2 rows)

 postgres=# select * from foo  where v ~* 'pizza hut';
   v
 --
  Near to my house there's no any Pizza Hut restuarant
 (1 row)

 postgres=# select * from foo  where v ~* 'pizza';
   v
 --
  Near to my house there's no any Pizza Hut restuarant
  I wont like pizza, but friends are crazy of it
 (2 rows)

 or

 with ILIKE

 select * from foo where v ilike '%hut%';


 ---
 Regards,
 Raghavendra
 EnterpriseDB Corporation
 Blog: http://raghavt.blogspot.com/




Re: [GENERAL] Full text search

2013-07-12 Thread Amit Langote
On Fri, Jul 12, 2013 at 12:56 PM, itishree sukla
itishree.su...@gmail.com wrote:
 Hi Raghavendra,

 Thanks for your response, however i want same kind of result using full text
 search. is it possible?


What do you mean when you say you are using full text search? Like
what is the query that you used? That would be helpful.


--
Amit Langote


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Full text search, SQL functions, and the planner

2013-06-07 Thread Marc Dahn
Hi,

To maintain an externally defined interface, I'd like to define a
function hasword(haystack, needle) that (essentially) returns 1 when
to_tsvector(haystack) @@ to_tsquery(needle), 0 otherwise.

I've tried

CREATE OR REPLACE FUNCTION ivo_hasword(haystack TEXT, needle TEXT)
RETURNS INTEGER AS $func$
SELECT CASE WHEN to_tsvector('english', $1) @@ 
plainto_tsquery($2) 
THEN 1 
ELSE 0 
END
$func$ LANGUAGE SQL STABLE;

It seems the planner inlines the function body, as intended:

# explain select * from rr.resource where 1=ivo_hasword(res_title, 'optical');
...
 Seq Scan on resource  (cost=0.00..2269.72 rows=69 width=924)
   Filter: (1 = CASE WHEN (to_tsvector('english'::regconfig, res_title) @@ 
plainto_tsquery('optical'::text)) THEN 1 ELSE 0 END)

-- but it doesn't use the index on res_title, as it could, as
exhibited by this equivalent query:

# explain select * from rr.resource where to_tsvector('english'::regconfig, 
res_title) @@ plainto_tsquery('optical'::text);
...
 Bitmap Heap Scan on resource  (cost=21.96..731.76 rows=252 width=924)
   Recheck Cond: (to_tsvector('english'::regconfig, res_title) @@ 
plainto_tsquery('optical'::text))
   ->  Bitmap Index Scan on resource_res_title  (cost=0.00..21.89 rows=252 
width=0)
 Index Cond: (to_tsvector('english'::regconfig, res_title) @@ 
plainto_tsquery('optical'::text))

Indeed, the index isn't used either when I'm writing the 1 = CASE WHEN
statement into the query directly (this is on postgres 9.1).

Sure enough, if I define the function like this:

CREATE OR REPLACE FUNCTION bool_hasword(haystack TEXT, needle TEXT)
RETURNS BOOLEAN AS $func$
SELECT to_tsvector('english', $1) @@ plainto_tsquery($2) 
$func$ LANGUAGE SQL STABLE;

(i.e., returning a boolean instead of the lousy integer), the index
is used.

So -- is there a way to let the planner look through the CASE?  Or
some way of turning the boolean to an integer that the planner can
see through?  The CAST(... AS INTEGER) that fortunately would even
produce the right numbers appears to be opaque to the planner.


Anticipating the sane and logical answer: The boolean function
doesn't really help me; this is about handing through that function
directly to ADQL (http://www.ivoa.net/documents/latest/ADQL.html)
as a user defined function, and those user defined functions cannot
be boolean-valued.

Cheers,

   Markus




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search, SQL functions, and the planner

2013-06-07 Thread Tom Lane
Marc Dahn d...@tfiu.de writes:
 So -- is there a way to let the planner look through the CASE?

No.  It would actually be wrong to simplify the expression in the way
you're imagining, since 1 = case when bool_expr then 1 else 0 end
does not give the same result as the plain bool_expr if the latter
yields null.

If you're sufficiently desperate, though, you might consider some hack
like this:

regression=# CREATE OR REPLACE FUNCTION b_hasword(haystack TEXT, needle TEXT)
RETURNS boolean AS $func$
SELECT to_tsvector('english', $1) @@ plainto_tsquery($2) 
$func$ LANGUAGE SQL STABLE;
CREATE FUNCTION
regression=# create function inteqbool(int,bool) returns bool as
regression-# $$select $1::bool = $2 $$ language sql stable;
CREATE FUNCTION
regression=# CREATE OPERATOR = (procedure = inteqbool, leftarg=int, 
rightarg=bool);
CREATE OPERATOR
regression=# explain select * from resource where 1=b_hasword(res_title, 
'optical');
   QUERY PLAN   

 Bitmap Heap Scan on resource  (cost=4.20..14.38 rows=7 width=32)
   Recheck Cond: (to_tsvector('english'::regconfig, res_title) @@ 
plainto_tsquery('optical'::text))
   ->  Bitmap Index Scan on resource_to_tsvector_idx  (cost=0.00..4.20 rows=7 
width=0)
 Index Cond: (to_tsvector('english'::regconfig, res_title) @@ 
plainto_tsquery('optical'::text))
(4 rows)

which relies on the fact that the planner *will* simplify boolexpr = true
to just boolexpr, so that after a couple of rounds of inlining and
simplification we get to just the bare @@ expression.  But aside from
the time spent doing that, this approach could have unpleasant side
effects in the form of causing int = bool expressions to be accepted
generally, thus masking errors.  Might be better to fix your client-side
code.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Full text search in Chinese

2012-11-18 Thread Christian Jensen
Can anyone shed any light on how I might go about implementing full text
search for Chinese?

I have heard I need to go down the NGRAM route but would love to get some
external feedback.

Our application has a per-user 'ignore' list that we do a 'left outer join
where null' sort of deal on, so I'd really rather not go out to Solr or
anything external if possible.

I bet there is something simple as it seems like each character is an
individual search term.

Thoughts? Hints? Am I posting in the right place?

Thanks!

-- 

*Christian Jensen*
724 Ioco Rd
Port Moody, BC V3H 2W8
+1 (778) 996-4283
christ...@jensenbox.com


Re: [GENERAL] Full text search ranking: ordering using index and proximiti ranking with OR queries

2012-11-16 Thread Nicolas Grilly
There is some good news coming from Oleg Bartunov and Alexander Korotkov
about improving ranking speed:
http://wiki.postgresql.org/images/2/25/Full-text_search_in_PostgreSQL_in_milliseconds-extended-version.pdf

It's worth reading their slides to gain a better understanding of
PostgreSQL fulltext internals.


On Wed, Mar 7, 2012 at 8:05 PM, Nicolas Grilly nico...@gardentechno.comwrote:

 In a previous discussion thread, Oleg suggested that ts_rank is unable to
 use GIN indices:
 http://archives.postgresql.org/pgsql-general/2011-07/msg00351.php

 This is the only information I have about this.



Re: [GENERAL] Full text search ranking: ordering using index and proximiti ranking with OR queries

2012-11-16 Thread Merlin Moncure
On Wed, Mar 7, 2012 at 11:59 AM, Andrey Chursin an...@danasoft.ws wrote:
 Is there any way to sort by ranking, avoiding seq scan?
 The only way i see now is to use pg_trgm instead of ts_rank, but we
 did not check yet how applicable is it for our purposes.

pg_tgrm works very well in terms of measuring similarity between two
ascii strings...many non-english languages will struggle.  I doubt
(although I ever tried) it's useful for matching a small phrase to a
large document.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Full-text search: Problems with dictionaries and periods

2012-08-10 Thread Theron Luhn
I made a custom synonym dictionary for my full-text search, using the
following code from the documentation.

CREATE TEXT SEARCH DICTIONARY my_synonym (
TEMPLATE = synonym,
SYNONYMS = my_synonyms
);

ALTER TEXT SEARCH CONFIGURATION english
ALTER MAPPING FOR asciiword WITH my_synonym, english_stem;

It works great, except on words with periods in them.  For example, B.B.Q.:

=> select * from ts_debug('english', 'B.B.Q.');
 alias |    description    | token | dictionaries | dictionary | lexemes 
-------+-------------------+-------+--------------+------------+---------
 file  | File or path name | B.B.Q | {simple}     | simple     | {b.b.q}
 blank | Space symbols     | .     | {}           |            | 
(2 rows)

It interprets the string as a filename (that's what alias: file means,
right?), so the asciiword dictionary I set up above doesn't apply.

How can I change this?
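
One possible direction (untested sketch; whether it helps depends on the synonym
file containing a matching lowercased entry such as "b.b.q"): the token is
classified as type "file", so that token type can also be mapped to the same
dictionaries.

ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR file WITH my_synonym, english_stem;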

— Theron


Re: [GENERAL] Full text search ts_heading strange result

2012-07-26 Thread Johann Spies
Hallo Tom,

 I believe the problem is that the one-argument form of to_tsquery() uses
 the default TS configuration, which you have probably not got set to
 "simple".  For me, the default TS configuration is "english", which will
 stem "polity" as "politi":
 
 regression=# select to_tsquery('(polity & church)');
      to_tsquery      
 ---------------------
  'politi' & 'church'
 (1 row)
 
 However the "simple" configuration doesn't do anything to that lexeme:

Thanks for the explanation.  I am working with a multi-language database
and that was the reason for using the 'simple' configuration. 

I have asked, in an earlier message on this list, advice on how to
handle full text searches in a multi-language database, but got no
reaction to it.  If there is a better way than using the 'simple'
configuration in this case, I would gladly try it.

Regards
Johann.

-- 
Johann SpiesTelefoon: 021-808 4699
Databestuurder /  Data manager

Sentrum vir Navorsing oor Evaluasie, Wetenskap en Tegnologie
Centre for Research on Evaluation, Science and Technology 
Universiteit Stellenbosch.

 If any of you lack wisdom, let him ask of God, that 
  giveth to all men liberally, and upbraideth not; and 
  it shall be given him.  James 1:5 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search ts_heading strange result

2012-07-26 Thread Craig Ringer

On 07/26/2012 02:14 PM, Johann Spies wrote:

Hallo Tom,


I believe the problem is that the one-argument form of to_tsquery() uses
the default TS configuration, which you have probably not got set to
"simple".  For me, the default TS configuration is "english", which will
stem "polity" as "politi":

regression=# select to_tsquery('(polity & church)');
      to_tsquery      
----------------------
 'politi' & 'church'
(1 row)

However the "simple" configuration doesn't do anything to that lexeme:

Thanks for the explanation.  I am working with a multi-language database
and that was the reason for using the 'simple' configuration.

I have asked, in an earlier message on this list, advice on how to
handle full text searches in a multi-language database, but got no
reaction to it.  If there is a better way than using the 'simple'
configuration in this case, I would gladly try it.
You'll need to store language information alongside each text value if 
you want to do anything more sophisticated. If you have mixed languages 
within a single text value or if you don't store information about the 
language a text value is in then you're largely out of luck.
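
A minimal sketch of that approach (table and column names are illustrative):
keep a regconfig column per row and use it both when building and when querying
the tsvector, so each row is processed with its own language configuration.

ALTER TABLE articles ADD COLUMN fts_config regconfig NOT NULL DEFAULT 'simple';

CREATE INDEX articles_fts_idx ON articles
    USING gin (to_tsvector(fts_config, coalesce(title,'') || ' ' || coalesce(abstract,'')));

-- The query must repeat the same expression so the index can be used:
SELECT id
FROM articles
WHERE to_tsvector(fts_config, coalesce(title,'') || ' ' || coalesce(abstract,''))
      @@ to_tsquery('english', 'church & polity');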


--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search ts_heading strange result

2012-07-26 Thread Johann Spies
On Thu, Jul 26, 2012 at 04:19:02PM +0800, Craig Ringer wrote:

 You'll need to store language information alongside each text value
 if you want to do anything more sophisticated. 

I was afraid that that will be the case :)

I will have to update more than 32 entries which currently have
inconsistent language indications and some of them none at all.

Thanks for responding.

Regards
Johann

-- 
Johann SpiesTelefoon: 021-808 4699
Databestuurder /  Data manager

Sentrum vir Navorsing oor Evaluasie, Wetenskap en Tegnologie
Centre for Research on Evaluation, Science and Technology 
Universiteit Stellenbosch.

 If any of you lack wisdom, let him ask of God, that 
  giveth to all men liberally, and upbraideth not; and 
  it shall be given him.  James 1:5 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search ts_heading strange result

2012-07-26 Thread Tom Lane
Johann Spies jsp...@sun.ac.za writes:
 On Thu, Jul 26, 2012 at 04:19:02PM +0800, Craig Ringer wrote:
 You'll need to store language information alongside each text value
 if you want to do anything more sophisticated. 

 I was afraid that that will be the case :)

I'm not sure that there's anything horribly wrong with the strategy
of using simple for everything.  You won't get language-aware stemming,
but maybe you don't need that.  The problem with what you originally
posted was not that simple was inadequate, but that you weren't
applying it consistently --- you didn't have
default_text_search_configuration set to match.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Full text search ts_heading strange result

2012-07-25 Thread Johann Spies
I am beginning to use the full text search facilities in Postgresql
(9.0) and find the result of this query a bit strange:

query:

SELECT  ts_headline('simple',title, to_tsquery('kerkreg|(church & polity)'))
from akb_articles A
where A.tsv @@ 'kerkreg|(church & polity)'

Result

Kerkvereniging en <b>Kerkreg</b>: Geskiedenis, beginsel en 
praktyk.(<b>Church</b> unity and <b>church</b> polity: History, principle and 
practice.)

Why is 'polity' not highlighted?

Regards
Johann

-- 
Johann SpiesTelefoon: 021-808 4699
Databestuurder /  Data manager

Sentrum vir Navorsing oor Evaluasie, Wetenskap en Tegnologie
Centre for Research on Evaluation, Science and Technology 
Universiteit Stellenbosch.

 Look not every man on his own things, but every man 
  also on the things of others.Philippians 2:4 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search ts_heading strange result

2012-07-25 Thread Tom Lane
Johann Spies jsp...@sun.ac.za writes:
 I am beginning to use the full text search facilities in Postgresql
 (9.0) and find the result of this query a bit strange:

 query:

 SELECT  ts_headline('simple',title, to_tsquery('kerkreg|(church & polity)'))
 from akb_articles A
 where A.tsv @@ 'kerkreg|(church & polity)'

 Result

 Kerkvereniging en <b>Kerkreg</b>: Geskiedenis, beginsel en 
 praktyk.(<b>Church</b> unity and <b>church</b> polity: History, principle and 
 practice.)

 Why is 'polity' not highlighted?

I believe the problem is that the one-argument form of to_tsquery() uses
the default TS configuration, which you have probably not got set to
"simple".  For me, the default TS configuration is "english", which will
stem "polity" as "politi":

regression=# select to_tsquery('(polity & church)');
      to_tsquery      
----------------------
 'politi' & 'church'
(1 row)

However the "simple" configuration doesn't do anything to that lexeme:

regression=# select to_tsquery('simple', '(polity & church)');
      to_tsquery      
----------------------
 'polity' & 'church'
(1 row)

So what you've got is ts_headline() parsing the given title against
the "simple" configuration and getting "polity", but the tsquery is
looking for "politi", hence no match.

In short: omit the 'simple' argument from the ts_headline call, and
things should play together better.  You could alternatively insert
to_tsquery('simple', '(polity & church)'), but that won't exactly
match what the @@ in WHERE is doing: that's going to use the default
configuration.
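
A literal sketch of that advice applied to the query from the original post
(whether the lexemes then match depends on default_text_search_config, as
discussed elsewhere in the thread):

SELECT ts_headline(title, to_tsquery('kerkreg|(church & polity)'))
from akb_articles A
where A.tsv @@ 'kerkreg|(church & polity)'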

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Full text search advice requested

2012-07-12 Thread Johann Spies
I have a table with bibliometric information on published articles.
Fields of interest for full text searches are the 'title' and 'abstract'
fields.

Those fields can contain several languages but most of the entries use
English. A grouped query on the 'language' field reveals that the
following languages are involved: 

Afrikaans
Chinese
Dutch
English
French
Gaelic (?)
German
Hungarian
Italian
Japanese
Korean
Polish
Portuguese
Rumanian
Russian
Slovene
Sotho
Spanish
Turkish
Xhosa
Zulu

Now my questions:

1. Is it possible at all to use full text search in such a setup?
2. If so, how would I approach the different languages in indexing and
   querying.
3. How do I ask postgresql which dictionaries are already available in
   the installation for full text search? (see the sketch after this list)
4. If full text searches cannot be utilised in such a setup, can
   trgm-related indexing using 'similarity' be a replacement?  I think
   not.
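
For question 3, a quick check (a sketch; in psql, \dF and \dFd list
configurations and dictionaries, and the system catalogs can be queried
from any client):

SELECT cfgname  FROM pg_ts_config;   -- available text search configurations
SELECT dictname FROM pg_ts_dict;     -- available dictionaries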

Regards
Johann
-- 
Johann SpiesTelefoon: 021-808 4699
Databestuurder /  Data manager

Sentrum vir Navorsing oor Evaluasie, Wetenskap en Tegnologie
Centre for Research on Evaluation, Science and Technology 
Universiteit Stellenbosch.

 Delight thyself also in the LORD: and he shall give
  thee the desires of thine heart.
  Psalms 37:4 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search and ILIKE type clauses.

2012-03-12 Thread mgould
Tom,

We made most of our text, varchar columns citext data types so that we
could do case insensitive searches.  Is this going to negate most of the
index searches?  It appeared to our DBA that it would be easier to use
the citext data type than to need to use ILIKE instead?  
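
A small sketch of the trade-off (illustrative table; assuming 9.1's CREATE
EXTENSION): citext makes equality comparisons case-insensitive and lets an
ordinary b-tree index serve them, but it does not help '%substring%' style
ILIKE patterns.

CREATE EXTENSION IF NOT EXISTS citext;          -- contrib module
CREATE TABLE people (name citext);
CREATE INDEX people_name_idx ON people (name);  -- plain b-tree

SELECT * FROM people WHERE name = 'smith';      -- matches 'Smith', 'SMITH', ...
SELECT * FROM people WHERE name ILIKE '%mit%';  -- still a sequential scan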

Michael Gould
Intermodal Software Solutions, LLC
904-226-0978
 
 
 Original Message 
Subject: Re: [GENERAL] full text search and ILIKE type clauses.
From: Tom Lane t...@sss.pgh.pa.us
Date: Sun, March 11, 2012 7:56 pm
To: Tim Uckun timuc...@gmail.com
Cc: pgsql-general pgsql-general@postgresql.org

Tim Uckun timuc...@gmail.com writes:
 I want to be able to search a lot of fields using queries that use
 ILIKE and unfortunately many of the queries will be using the
 '%SOMETHING%' or '%SOMETHING' type clauses. Since indexes are useless
 on those I was thinking I could use tsvectors but I can't figure out
 how to accomplish this.

Full text search is not going to help for this unless you are willing to
be very lax about replicating the semantics of ILIKE. For example,
ILIKE '%foo%' should match foo anywhere within a word, but FTS is not
going to be able to do better than finding words that begin with foo.

If you're using 9.1, you might look into contrib/pg_trgm instead.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search and ILIKE type clauses.

2012-03-12 Thread Tim Uckun

 We made most of our text, varchar columns citext data types so that we
 could do case insensitive searches.  Is this going to negate most of the
 index searches?  It appeared to our DBA that it would be easier to use
 citext data type than to need to use ILIKE instead?


In the same vein...

Does postgres have case insensitive collations yet? Now that 9.1
supports column level collations that would be a really great option
for case insensitive queries.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search and ILIKE type clauses.

2012-03-12 Thread mgould
Tim,

It is my understanding that since the citext extension is available,
this gives you what you're asking for, and at least at this point it isn't
going to be part of the core.

Michael Gould
Intermodal Software Solutions, LLC
904-226-0978
 
 
 Original Message 
Subject: Re: [GENERAL] full text search and ILIKE type clauses.
From: Tim Uckun timuc...@gmail.com
Date: Mon, March 12, 2012 1:20 pm
To: mgo...@isstrucksoftware.net
Cc: Tom Lane t...@sss.pgh.pa.us, pgsql-general
pgsql-general@postgresql.org


 We made most of our text, varchar columns citext data types so that we
 could do case insensitive searches.  Is this going to negate most of the
 index searches?  It appeared to our DBA that it would be easier to use
 citext data type than to need to use ILIKE instead?


In the same vein...

Does postgres have case insensitive collations yet? Now that 9.1
supports column level collations that would be a really great option
for case insensitive queries.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search and ILIKE type clauses.

2012-03-12 Thread Tim Uckun
 It is my understanding that since the citext extension is available,
 this gives you what you're asking for, and at least at this point it isn't
 going to be part of the core.


For me it's more of a workaround than a solution but yes probably good
enough. Collation is more subtle than case insensitive comparisons but
for english anyway it should be good enough.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search and ILIKE type clauses.

2012-03-12 Thread Pavel Stehule
2012/3/12 Tim Uckun timuc...@gmail.com:
 It is my understanding that since the citext extension is available,
 this gives you what you're asking for, and at least at this point it isn't
 going to be part of the core.


 For me it's more of a workaround than a solution but yes probably good
 enough. Collation is more subtle than case insensitive comparisons but
 for english anyway it should be good enough.

Postgres uses system locales - so theoretically you can write your own case
insensitive locale.

Regards

Pavel Stehule


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] full text search and ILIKE type clauses.

2012-03-11 Thread Tim Uckun
I want to be able to search a lot of fields using queries that use
ILIKE and unfortunately many of the queries will be  using the
'%SOMETHING%' or '%SOMETHING' type clauses.  Since indexes are useless
on those I was thinking I could use tsvectors but I can't figure out
how to accomplish this.

One option I have seen reccomended is to create a tsvector field and
concat the various fields into that field as a tsvector. That would
work for general text search but I am not sure how I could query
FIELD1 ILIKE '%SOMETHING%' vs FIELD2 ILIKE '%SOMETHING%'. Would
using hstore help in this case?

I should point out that some of the fields are numbers. I am willing
to deal with those separately by building other indexes for them but
of course a unified approach would be preferable.
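
For the per-field part, one hedged option (illustrative names; as the reply
below notes, this matches whole lexemes, or prefixes with 'something:*', not
arbitrary '%substring%' patterns):

CREATE INDEX t_field1_fts ON mytable USING gin (to_tsvector('english', field1));
CREATE INDEX t_field2_fts ON mytable USING gin (to_tsvector('english', field2));

SELECT * FROM mytable
WHERE to_tsvector('english', field1) @@ to_tsquery('english', 'something');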

Any help would be much appreciated. Thanks.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search and ILIKE type clauses.

2012-03-11 Thread Tom Lane
Tim Uckun timuc...@gmail.com writes:
 I want to be able to search a lot of fields using queries that use
 ILIKE and unfortunately many of the queries will be  using the
 '%SOMETHING%' or '%SOMETHING' type clauses.  Since indexes are useless
 on those I was thinking I could use tsvectors but I can't figure out
 how to accomplish this.

Full text search is not going to help for this unless you are willing to
be very lax about replicating the semantics of ILIKE.  For example,
ILIKE '%foo%' should match foo anywhere within a word, but FTS is not
going to be able to do better than finding words that begin with foo.

If you're using 9.1, you might look into contrib/pg_trgm instead.
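
A minimal sketch of the pg_trgm route (assuming 9.1, where LIKE/ILIKE can use
trigram indexes; table and column names are illustrative):

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX mytable_field1_trgm ON mytable USING gin (field1 gin_trgm_ops);

-- This can now use the trigram index instead of a sequential scan:
SELECT * FROM mytable WHERE field1 ILIKE '%something%';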

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search and ILIKE type clauses.

2012-03-11 Thread Tim Uckun

 If you're using 9.1, you might look into contrib/pg_trgm instead.


If I was to use trgm would it be better to create a trigram index on
each text field? In the past I have created a text field which
contains the rest of the fields concatenated. That works great as long
as you are looking for any word. Is there a way to specify which word
should match?  Could I combine hstore and trgm to match against
individual words?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search ranking: ordering using index and proximiti ranking with OR queries

2012-03-07 Thread Nicolas Grilly
According to Oleg in a previous discussion, ts_rank does not use the index
because the index does not store enough information for ranking:
http://archives.postgresql.org/pgsql-general/2011-07/msg00351.php

On Sat, Feb 18, 2012 at 12:39, Andrey Chursin an...@danasoft.ws wrote:

 Hello

 I have two unrelated questions about fts function ts_rank:

 1) I've created GiST index on column with fts vector, but query
 SELECT * FROM table ORDER BY ts_rank(field, :query) LIMIT 20
 is perfomed with sequential table scan. Index was created on field
 column. Does it mean FTS indexes does not support order by ranking? Or
 I need somehow to create separated index for ranking?

 2) I have a misunderstanding with proximity ranking work. Given two
 vectors 'a:1 b:2' and 'a:1 b:1000', i am measuring ts_rank(vector, 'a'
 | 'b'). And it is equal! But when i am replacing query with &
 operator, e.g. asking for ts_rank(vector, 'a' & 'b') i am getting
 different numbers. Why do I get proximity ranking only for AND fts
 queries? This is a problem as far as to_tsquery produces OR queries,
 so i need self-written postprocessing of query to replace OR with AND.

 --
 Regards,
 Andrey

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general




-- 
Nicolas Grilly
Garden / Vocation City
+33 1 45 72 48 78 - office
+33 6 03 00 25 34 - mobile
www.gardentechno.com - Développement web & reporting / *Web development &
data analytics*
www.vocationcity.com - Plateforme de recrutement sur le web / *Web
recruitment platform*


Re: [GENERAL] Full text search ranking: ordering using index and proximiti ranking with OR queries

2012-03-07 Thread Andrey Chursin
Is there any way to sort by ranking, avoiding seq scan?
The only way I see now is to use pg_trgm instead of ts_rank, but we
have not yet checked how applicable it is for our purposes.

On 7 March 2012 at 20:53, Nicolas Grilly
nico...@gardentechno.com wrote:
 According to Oleg in a previous discussion, ts_rank does not use index
 because index does not store enough information for ranking:
 http://archives.postgresql.org/pgsql-general/2011-07/msg00351.php

 On Sat, Feb 18, 2012 at 12:39, Andrey Chursin an...@danasoft.ws wrote:

 Hello

 I have two unrelated questions about fts function ts_rank:

 1) I've created GiST index on column with fts vector, but query
 SELECT * FROM table ORDER BY ts_rank(field, :query) LIMIT 20
 is perfomed with sequential table scan. Index was created on field
 column. Does it mean FTS indexes does not support order by ranking? Or
 I need somehow to create separated index for ranking?

 2) I have a misunderstanding with proximity ranking work. Given two
 vectors 'a:1 b:2' and 'a:1 b:1000', i am measuring ts_rank(vector, 'a'
 | 'b'). And it is equal! But when i am replacing query with &
 operator, e.g. asking for ts_rank(vector, 'a' & 'b') i am getting
 different numbers. Why do I get proximity ranking only for AND fts
 queries? This is a problem as far as to_tsquery produces OR queries,
 so i need self-written postprocessing of query to replace OR with AND.

 --
 Regards,
 Andrey

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general




 --
 Nicolas Grilly
 Garden / Vocation City
 +33 1 45 72 48 78 - office
 +33 6 03 00 25 34 - mobile
 www.gardentechno.com - Développement web & reporting / Web development &
 data analytics
 www.vocationcity.com - Plateforme de recrutement sur le web / Web
 recruitment platform



-- 
Regards,
Andrey

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search ranking: ordering using index and proximiti ranking with OR queries

2012-03-07 Thread Nicolas Grilly
In a previous discussion thread, Oleg suggested that ts_rank is unable to
use GIN indices:
http://archives.postgresql.org/pgsql-general/2011-07/msg00351.php

This is the only information I have about this.
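
A common workaround sketch (not from the slides; names are illustrative): let
the @@ predicate use the GIN index to narrow the candidate set, and compute
ts_rank only over the rows that match.

SELECT id, ts_rank(tsv, query) AS rank
FROM documents, to_tsquery('a | b') AS query
WHERE tsv @@ query
ORDER BY rank DESC
LIMIT 20;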

On Wed, Mar 7, 2012 at 18:59, Andrey Chursin an...@danasoft.ws wrote:

 Is there any way to sort by ranking, avoiding seq scan?
 The only way i see now is to use pg_trgm instead of ts_rank, but we
 did not check yet how applicable is it for our purposes.



[GENERAL] Full text search ranking: ordering using index and proximiti ranking with OR queries

2012-02-18 Thread Andrey Chursin
Hello

I have two unrelated questions about fts function ts_rank:

1) I've created a GiST index on the column with the fts vector, but the query
SELECT * FROM table ORDER BY ts_rank(field, :query) LIMIT 20
is performed with a sequential table scan. The index was created on the field
column. Does it mean FTS indexes do not support ordering by ranking? Or
do I need to somehow create a separate index for ranking?

2) I have a misunderstanding about how proximity ranking works. Given two
vectors 'a:1 b:2' and 'a:1 b:1000', I am measuring ts_rank(vector, 'a'
| 'b'). And it is equal! But when I replace the query with the &
operator, e.g. asking for ts_rank(vector, 'a' & 'b'), I am getting
different numbers. Why do I get proximity ranking only for AND fts
queries? This is a problem insofar as to_tsquery produces OR queries,
so I need self-written postprocessing of the query to replace OR with AND.

-- 
Regards,
Andrey

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Full Text Search, avoiding lexemes search

2012-01-27 Thread Daniel Vázquez
Hi guys!

Full text search searches by lexemes; this means that if you are searching
for the word "gato" you are really searching for the lexeme {gat}.
If you construct vectors for the words "gato", "gatos", "gata", "gatas", they all
have the same lexeme {gat}.
Then the search "gato", that is to say the search {gat}, matches all the
previous vectors.

Is there some way (configuration, query) to match only "gato" and
avoid "gatos", "gata", "gatas" with FTS?
Or match only "gato" and "gatos" but not "gata" and "gatas"?

Tnks!


Re: [GENERAL] Full Text Search, avoiding lexemes search

2012-01-27 Thread Oleg Bartunov

Daniel,

just use a different fts configuration for search, one which doesn't
include stemmers.
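
A minimal sketch of that suggestion (configuration name is illustrative): copy
an existing configuration and map the word token types to the 'simple'
dictionary only, so 'gato' stays 'gato' instead of being stemmed to 'gat'.

CREATE TEXT SEARCH CONFIGURATION exact_es (COPY = pg_catalog.spanish);

ALTER TEXT SEARCH CONFIGURATION exact_es
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH simple;

SELECT to_tsquery('exact_es', 'gato');   -- 'gato' rather than 'gat'

Note that the stored tsvectors would need to be built with the same
non-stemming configuration for such exact terms to match.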

Regards,
Oleg
On Fri, 27 Jan 2012, Daniel Vázquez wrote:


Hi guys!

Full text search searches by lexemes; this means that if you are searching
for the word "gato" you are really searching for the lexeme {gat}.
If you construct vectors for the words "gato", "gatos", "gata", "gatas", they all
have the same lexeme {gat}.
Then the search "gato", that is to say the search {gat}, matches all the
previous vectors.

Is there some way (configuration, query) to match only "gato" and
avoid "gatos", "gata", "gatas" with FTS?
Or match only "gato" and "gatos" but not "gata" and "gatas"?

Tnks!



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Full text search - How to build a filtering dictionary

2012-01-18 Thread Antonio Franzoso

Hi all,
I need to build a synonym dictionary that performs a normalization of 
tokens just like a filtering dictionary does. I've searched for a 
filtering dictionary template but I haven't found it. Where can I find it? 
Or, if there isn't such a template, how can I build a simple filter 
dictionary that simply maps one term to another (in a synonym dict-like 
way)?


Thanks in advance,
Antonio

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search - How to build a filtering dictionary

2012-01-18 Thread Oleg Bartunov

Antonio,

you can see contrib/unaccent dictionary, which is a filtering 
dictionary. I have a page about it - http://mira.sai.msu.su/~megera/wiki/unaccent
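
A small sketch of wiring a filtering dictionary into a configuration, using
unaccent as the example (assuming 9.1's CREATE EXTENSION; on earlier versions
the contrib SQL script does the same):

CREATE EXTENSION IF NOT EXISTS unaccent;

CREATE TEXT SEARCH CONFIGURATION fr_noaccent (COPY = french);

ALTER TEXT SEARCH CONFIGURATION fr_noaccent
    ALTER MAPPING FOR asciiword, word, hword, hword_part
    WITH unaccent, french_stem;

SELECT to_tsvector('fr_noaccent', 'Hôtels');   -- 'hotel':1

A custom filtering dictionary would plug into the configuration in the same
position, ahead of the normal dictionaries.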





Oleg
On Wed, 18 Jan 2012, Antonio Franzoso wrote:


Hi all,
I need to build a synonym dictionary that performs a normalization of 
tokens just like a filtering dictionary does. I've searched for a filtering 
dictionary template but I haven't found it. Where can I find it? Or, if there 
isn't such a template, how can I build a simple filter dictionary that 
simply maps one term to another (in a synonym dict-like way)?


Thanks in advance,
Antonio




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search - How to build a filtering dictionary

2012-01-18 Thread Antonio Franzoso

Thanks for the reply,
is there any simpler way? I have to do just a simple mapping (in a similar 
way to the synonym dictionary), set the TSL_FILTER flag (if there's a mapping 
for a token) and then pass the normalized token to my own thesaurus 
dictionary. I'm working on Windows and I would have to write a C library to do 
these operations (and I cannot see the unaccent code because it's a dll 
file).
If there is no other solution, I thought that I could integrate this 
filtering dictionary into the thesaurus in a similar way:


token: lemma, term1, term2,

where token is the denormalized term, lemma is one entry of the thesaurus, 
and term1, term2, ... are the terms associated with lemma in the original 
thesaurus structure. What do you think about this solution?


Il 18/01/2012 17:40, Oleg Bartunov ha scritto:

Antonio,

you can see contrib/unaccent dictionary, which is a filtering 
dictionary. I have a page about it - 
http://mira.sai.msu.su/~megera/wiki/unaccent





Oleg
On Wed, 18 Jan 2012, Antonio Franzoso wrote:


Hi all,
I need to build a synonym dictionary that performs a normalization of 
tokens just like a filtering dictionary does. I've searched for a 
filtering dictionary template but I haven't found it. Where can I find it? 
Or, if there isn't such a template, how can I build a simple filter 
dictionary that simply maps one term to another (in a synonym 
dict-like way)?


Thanks in advance,
Antonio




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search to_tsquery performance with ispell dictionary

2011-05-12 Thread Stanislav Raskin

On 11.05.11 17:04, t...@fuzzy.cz t...@fuzzy.cz wrote:

We had exactly the same problem and persistent connection solved it.

First testing with persistent connections seems to work like a charm. Will
do some thorough testing and watch the memory load. Hopefully, I will not
trip over some sort of pitfall. Goole seems to be full of people who have
problems with persistent connections.

Big thanks for your advice.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search to_tsquery performance with ispell dictionary

2011-05-12 Thread Stanislav Raskin


On 11.05.11 17:31, Tom Lane t...@sss.pgh.pa.us wrote:

You really, really, really need to fix whatever is preventing you from
using pooling.  Opening a database connection to run one query is just
horridly inefficient.


Very true. I did not mean that anything actually prevents us from using
pooling. We just have no idea how it will interfere with our production
pgcluster setup. I imagine the evaluation, testing and verification of
pooling systems in combination with our setup to be quite tedious.

Of course, I don't open a connection for each query. One is opened for
each service call. The services are designed to start their own process for
every call. Such a process - for now - needs its own connection. It usually
handles dozens of queries. Until now, we never ran into performance
problems. The time consumed by DB operations is usually negligible
compared to the rest.

First tests with a simple persistent connection setup seem to work fine
and solve the performance issue. I tend to put some thorough testing on
this setup and see if I step into a pitfall.

Big kudos to you and this list. You were a great help, as always.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] full text search to_tsquery performance with ispell dictionary

2011-05-11 Thread Stanislav Raskin
Hello everybody,

I was experimenting with the FTS feature on postgres 8.3.4 lately and
encountered a weird performance issue when using a custom FTS configuration.

I use this german ispell dictionary, re-encoded to utf8:

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz

With the following configuration:

CREATE TEXT SEARCH CONFIGURATION public.german_de (COPY =
pg_catalog.german);



CREATE TEXT SEARCH DICTIONARY german_de_ispell (

TEMPLATE = ispell,

DictFile = german_de_utf8,

AffFile = german_de_utf8,

StopWords = german_de_utf8

);



ALTER TEXT SEARCH CONFIGURATION german_de

ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,

  word, hword, hword_part

WITH german_de_ispell, german_stem;



So far so good. Indexing and creation of tsvectors works like a charm.

The problem is, that if I open a new connection to the database and do
something like this

SELECT to_tsquery('german_de', 'abcd');

it takes A LOT of time for the query to complete for the first time. About
1-1,5s. If I submit the same query for a second, third, fourth time and so
on, it takes only some 10-20ms, which is what I would expect.

It almost seems as if the dictionary is somehow analyzed or indexed and the
results cached for each connection, which seems counter-intuitive to me.
After all, the dictionaries should not change that often.

Did I miss something or did I do something wrong?

I'd be thankful for any advice.

Kind Regards

--

Stanislav Raskin

livn GmbH
Campus Freudenberg
Rainer-Gruenter-Str. 21
42119 Wuppertal

+49(0)202-8 50 66 921
ras...@livn.de
http://www.livn.de

livn
local individual video news GmbH
Registergericht Wuppertal HRB 20086

Geschäftsführer:
Dr. Stefan Brües
Alexander Jacob




Re: [GENERAL] full text search to_tsquery performance with ispell dictionary

2011-05-11 Thread Pavel Stehule
Hello

2011/5/11 Stanislav Raskin ras...@livn.de:
 Hello everybody,
 I was experimenting with the FTS feature on postgres 8.3.4 lately and
 encountered a weird performance issue when using a custom FTS configuration.
 I use this german ispell dictionary, re-encoded to utf8:
 http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz
 With the following configuration:

 CREATE TEXT SEARCH CONFIGURATION public.german_de (COPY =
 pg_catalog.german);

 CREATE TEXT SEARCH DICTIONARY german_de_ispell (

     TEMPLATE = ispell,

     DictFile = german_de_utf8,

     AffFile = german_de_utf8,

     StopWords = german_de_utf8

 );

 ALTER TEXT SEARCH CONFIGURATION german_de

     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,

                       word, hword, hword_part

     WITH german_de_ispell, german_stem;

 So far so good. Indexing and creation of tsvectors works like a charm.
 The problem is, that if I open a new connection to the database and do
 something like this
 SELECT to_tsquery('german_de', 'abcd');
 it takes A LOT of time for the query to complete for the first time. About
 1-1,5s. If I submit the same query for a second, third, fourth time and so
 on, it takes only some 10-20ms, which is what I would expect.
 It almost seems as if the dictionary is somehow analyzed or indexed and the
 results cached for each connection, which seems counter-intuitive to me.
 After all, the dictionaries should not change that often.
 Did I miss something or did I do something wrong?
 I'd be thankful for any advice.
 Kind Regards

it is expected behaviour :( . Loading an ispell dictionary is very slow.

Use the German snowball stemmer instead.

You can also use some connection pooling software.

Regards

Pavel Stehule

 --

 Stanislav Raskin

 livn GmbH
 Campus Freudenberg
 Rainer-Gruenter-Str. 21
 42119 Wuppertal

 +49(0)202-8 50 66 921
 ras...@livn.de
 http://www.livn.de

 livn
 local individual video news GmbH
 Registergericht Wuppertal HRB 20086

 Geschäftsführer:
 Dr. Stefan Brües
 Alexander Jacob

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search to_tsquery performance with ispell dictionary

2011-05-11 Thread Tom Lane
Stanislav Raskin ras...@livn.de writes:
 The problem is, that if I open a new connection to the database and do
 something like this
 SELECT to_tsquery('german_de', 'abcd');
 it takes A LOT of time for the query to complete for the first time. About
 1-1,5s. If I submit the same query for a second, third, fourth time and so
 on, it takes only some 10-20ms, which is what I would expect.

 It almost seems as if the dictionary is somehow analyzed or indexed and the
 results cached for each connection, which seems counter-intuitive to me.

Yes, loading a large dictionary is known to be a fairly expensive
operation.  There's been discussions about how to make it cheaper, but
nothing's been done yet.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search to_tsquery performance with ispell dictionary

2011-05-11 Thread Pavel Stehule
Hello

2011/5/11 Stanislav Raskin ras...@livn.de:

 On 11.05.11 15:45, Pavel Stehule pavel.steh...@gmail.com wrote:

it is expected behave :( . A loading of ispell dictionary is very slow.

Use a german snowball instead.

You can you a some pooling connection software too.


 Thank you for the response.
 Is the dictionary german_stem supplied with postgresql a snowball stemmer?
 If yes, it sure is incredibly fast, but yields much worse results and thus
 fewer and worse matches for search queries.

 To use connections pooling is...difficult in my situation, to say the
 least. We currently use quite a complex pgcluster/corosync setup for
 multi-master replication, load balancing and high availability. To
 introduce connection pooling to this setup could turn out to be quite a
 big project.


German_stem is part of the distribution. I think the stemmed results
are usable, since reports about slow speed are not frequent.

There is no Czech stemmer, so we have to use ispell. I wrote a
patch that stores the loaded dictionary in shared memory. You can find
the source code in the pgsql-hackers mailing list archive. But it isn't well
tested and it is just a prototype - not accepted into PostgreSQL. You can test it.
Sometimes people use a "simple" configuration here. It isn't the best
but it is fast.

Regards

Pavel Stehule

 --

 Stanislav Raskin





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search to_tsquery performance with ispell dictionary

2011-05-11 Thread Stanislav Raskin



Yes, loading a large dictionary is known to be a fairly expensive
operation.  There's been discussions about how to make it cheaper, but
nothing's been done yet.

regards, tom lane

Hi Tom,

thanks for the quick response. Bad news for me ;(
We develop ajax-driven web apps, which sort of rely on quick calls to data
services. Each call to a service opens a new connection. This makes the
search service, if using fts and ispell, about 100 times slower than a
dumb ILIKE-implementation.

Is there any way of hack or compromise to achieve good performance without
losing fts ability?
I am thinking, for example, of a way to permanently keep a loaded
dictionary in memory instead of loading it for every connection. As I
wrote in response to Pavel Stehule's post, connection pooling is not
really an option.
Our front-end is strictly PHP, so I was thinking about using a single
persistent connection
(http://de.php.net/manual/en/function.pg-pconnect.php) for all calls. Is
there some sort of major disadvantage in this approach from the database
point of view?

Kind regards

--

Stanislav Raskin



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search to_tsquery performance with ispell dictionary

2011-05-11 Thread Stanislav Raskin

On 11.05.11 15:45, Pavel Stehule pavel.steh...@gmail.com wrote:

it is expected behave :( . A loading of ispell dictionary is very slow.

Use a german snowball instead.

You can you a some pooling connection software too.


Thank you for the response.
Is the dictionary german_stem supplied with postgresql a snowball stemmer?
If yes, it sure is incredibly fast, but yields much worse results and thus
fewer and worse matches for search queries.

To use connection pooling is... difficult in my situation, to say the
least. We currently use quite a complex pgcluster/corosync setup for
multi-master replication, load balancing and high availability. To
introduce connection pooling to this setup could turn out to be quite a
big project. 

--

Stanislav Raskin




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search to_tsquery performance with ispell dictionary

2011-05-11 Thread tv



Yes, loading a large dictionary is known to be a fairly expensive
operation.  There's been discussions about how to make it cheaper, but
nothing's been done yet.

regards, tom lane

 Hi Tom,

 thanks for the quick response. Bad news for me ;(
 We develop ajax-driven web apps, which sort of rely on quick calls to data
 services. Each call to a service opens a new connection. This makes the
 search service, if using fts and ispell, about 100 times slower than a
 dumb ILIKE-implementation.

 Is there any way of hack or compromise to achieve good performance without
 losing fts ability?
 I am thinking, for example, of a way to permanently keep a loaded
 dictionary in memory instead of loading it for every connection. As I
 wrote in response to Pavel Stehule's post, connection pooling is not
 really an option.
 Our front-end is strictly PHP, so I was thinking about using a single
 persistent connection
 (http://de.php.net/manual/en/function.pg-pconnect.php) for all calls. Is
 there some sort of major disadvantage in this approach from the database
 point of view?

 Kind regards

Hi,

opening a completely new connection for each request may be a bit
expensive, so I'd recommend using some kind of connection pooling,
especially when you're doing 'small' transactions (because that's when the
overhead matters).

We had exactly the same problem and persistent connection solved it.

But it has some drawbacks too - each connection has its own copy of the
dictionary. So if the dictionary takes 30MB and you have 10 connections,
then 300 MB of memory is used.

regards
Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search to_tsquery performance with ispell dictionary

2011-05-11 Thread Pavel Stehule
2011/5/11 Stanislav Raskin ras...@livn.de:
 On 11.05.11 16:42, Pavel Stehule pavel.steh...@gmail.com wrote:


I wrote a
patch that stores loaded dictionary in shared memory.

 Hi Pavel,

 very interesting. I will give it a closer look.

 What do you think about using ispell to create, store and index tsvectors,
 but at the same time to use the stemmer to create tsqueries? Thus, I hope
 to create a fast user experience while searching, but at the same time
 have more detailed tsvectors and more matches.


I don't know for certain - but there can be a risk: two different methods can
return two different results.

Regards

Pavel

 Regards

 --

 Stanislav Raskin





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

