Re: [HACKERS] Prefix support for synonym dictionary
> 1. The docs should be clarified a little. For instance, it should have a
> link back to the definition of a prefix search (12.3.2). I included my
> doc suggestions as an attachment.

Thank you, merged.

> 2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
> fragile) way. After calling findwrd(), the end pointer is pointing at
> either the end of the string, or the *; depending on whether the string
> ends in * and whether flags is NULL. I only mention this because I had to
> take a more careful look to see what was happening. Perhaps add a comment
> to make it more clear?

Added comments:

/*
 * Finds the next whitespace-delimited word within the 'in' string.
 * Returns a pointer to the first character of the word, and a pointer
 * to the next byte after the last character in the word (in *end).
 * Character '*' at the end of word will not be treated as word
 * character if flags is not null.
 */
static char *
findwrd(char *in, char **end, uint16 *flags)

> 3. The patch looks for the special byte '*'. I think that's fine, because
> we depend on the files being in UTF-8 encoding, where it's the same byte.
> However, I thought it was worth mentioning in case we want to support
> other encodings for text search files later.

tsearch_readline() converts the file's UTF-8 encoding into the server
encoding. pgsql supports only encodings which are supersets of ASCII, so
it's safe to use the asterisk with any encoding.

--
Teodor Sigaev                      E-mail: teo...@sigaev.ru
                                   WWW: http://www.sigaev.ru/

synonym_prefix-0.2.gz
Description: Unix tar archive

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
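For readers following along without the patch, here is a minimal sketch of the behaviour that comment describes. It is not the code from synonym_prefix-0.2.gz: the name findwrd_sketch, the flag value SKETCH_PREFIX_FLAG, the use of isspace() instead of the multibyte-aware helpers, and the rule that a lone "*" is left alone are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PREFIX_FLAG 0x0001	/* hypothetical flag value */

/*
 * Sketch of a findwrd()-like helper: returns the start of the next
 * whitespace-delimited word in 'in' (or NULL if there is none) and stores
 * in *end a pointer to the byte just past the word.  When flags is not
 * NULL, a trailing '*' is excluded from the word and reported via *flags.
 */
static char *
findwrd_sketch(char *in, char **end, uint16_t *flags)
{
	char	   *start;
	char	   *last = NULL;

	*end = NULL;

	/* Skip leading whitespace. */
	while (*in && isspace((unsigned char) *in))
		in++;

	/* No word found: *end stays NULL. */
	if (*in == '\0')
		return NULL;

	start = in;

	/* Advance to the end of the word, remembering its last byte. */
	while (*in && !isspace((unsigned char) *in))
	{
		last = in;
		in++;
	}

	if (flags != NULL)
	{
		*flags = 0;

		/*
		 * A trailing '*' marks a prefix and is not part of the word.
		 * Assumption of this sketch: a word consisting only of "*" is
		 * kept as an ordinary word.
		 */
		if (in - start > 1 && *last == '*')
		{
			*flags = SKETCH_PREFIX_FLAG;
			in = last;
		}
	}

	*end = in;
	return start;
}
```

With flags == NULL the word "index*" is six bytes long and *end points past the asterisk; with a non-NULL flags pointer the word is five bytes long and *end points at the asterisk itself, which is the subtlety raised in point 2. Comparing a single byte against '*' is only safe because, as noted in the reply to point 3, every supported server encoding is a superset of ASCII.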
Re: [HACKERS] Prefix support for synonym dictionary
2009/8/6 Teodor Sigaev <teo...@sigaev.ru>:
>> 1. The docs should be clarified a little. For instance, it should have a
>> link back to the definition of a prefix search (12.3.2). I included my
>> doc suggestions as an attachment.
>
> Thank you, merged.
>
>> 2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
>> fragile) way. After calling findwrd(), the end pointer is pointing at
>> either the end of the string, or the *; depending on whether the string
>> ends in * and whether flags is NULL. I only mention this because I had to
>> take a more careful look to see what was happening. Perhaps add a comment
>> to make it more clear?
>
> Added comments:
>
> /*
>  * Finds the next whitespace-delimited word within the 'in' string.
>  * Returns a pointer to the first character of the word, and a pointer
>  * to the next byte after the last character in the word (in *end).
>  * Character '*' at the end of word will not be treated as word
>  * character if flags is not null.
>  */
> static char *
> findwrd(char *in, char **end, uint16 *flags)
>
>> 3. The patch looks for the special byte '*'. I think that's fine, because
>> we depend on the files being in UTF-8 encoding, where it's the same byte.
>> However, I thought it was worth mentioning in case we want to support
>> other encodings for text search files later.
>
> tsearch_readline() converts the file's UTF-8 encoding into the server
> encoding. pgsql supports only encodings which are supersets of ASCII, so
> it's safe to use the asterisk with any encoding.

Jeff,

Based on these comments, do you want to go ahead and mark this Ready for
Committer?

https://commitfest.postgresql.org/action/patch_view?id=133

...Robert
Re: [HACKERS] Prefix support for synonym dictionary
On Thu, 2009-08-06 at 12:19 -0400, Robert Haas wrote:
> Based on these comments, do you want to go ahead and mark this Ready for
> Committer?

Done, thanks Teodor.

However, on the commitfest page, the patches got updated in the wrong
places: prefix support and filtering dictionary support are pointing at
each other's patches.

Regards,
	Jeff Davis
Re: [HACKERS] Prefix support for synonym dictionary
On Thu, Aug 6, 2009 at 12:53 PM, Jeff Davis <pg...@j-davis.com> wrote:
> On Thu, 2009-08-06 at 12:19 -0400, Robert Haas wrote:
>> Based on these comments, do you want to go ahead and mark this Ready for
>> Committer?
>
> Done, thanks Teodor.
>
> However, on the commitfest page, the patches got updated in the wrong
> places: prefix support and filtering dictionary support are pointing at
> each other's patches.

Fixed.

...Robert
Re: [HACKERS] Prefix support for synonym dictionary
On Sun, Aug 2, 2009 at 3:05 PM, Jeff Davis <pg...@j-davis.com> wrote:
> The patch looks good.
>
> Comments:
>
> 1. The docs should be clarified a little. For instance, it should have a
> link back to the definition of a prefix search (12.3.2). I included my
> doc suggestions as an attachment.
>
> 2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
> fragile) way. After calling findwrd(), the end pointer is pointing at
> either the end of the string, or the *; depending on whether the string
> ends in * and whether flags is NULL. I only mention this because I had to
> take a more careful look to see what was happening. Perhaps add a comment
> to make it more clear?
>
> 3. The patch looks for the special byte '*'. I think that's fine, because
> we depend on the files being in UTF-8 encoding, where it's the same byte.
> However, I thought it was worth mentioning in case we want to support
> other encodings for text search files later.

Oleg,

Are you planning to update this patch this week? If not I will set it
to Returned with Feedback.

Thanks,

...Robert
Re: [HACKERS] Prefix support for synonym dictionary
On Wed, 2009-08-05 at 12:34 -0400, Robert Haas wrote:
> Oleg,
>
> Are you planning to update this patch this week? If not I will set it
> to Returned with Feedback.

My only comments were related to docs and comments, and I supplied a
patch as a suggested fix for the docs.

Also, the patch is very small. I'd hate to hold it up over such a minor
issue, and it seems like a useful feature.

If Oleg is unavailable, would you mind just having a second reviewer look
at the patch to see if they agree with my suggestions, and then mark it
ready for committer review?

Regards,
	Jeff Davis
Re: [HACKERS] Prefix support for synonym dictionary
Hi,

The patch looks good.

Comments:

1. The docs should be clarified a little. For instance, it should have a
link back to the definition of a prefix search (12.3.2). I included my
doc suggestions as an attachment.

2. dsynonym_init() uses findwrd() in a slightly confusing (and perhaps
fragile) way. After calling findwrd(), the end pointer is pointing at
either the end of the string, or the *; depending on whether the string
ends in * and whether flags is NULL. I only mention this because I had to
take a more careful look to see what was happening. Perhaps add a comment
to make it more clear?

3. The patch looks for the special byte '*'. I think that's fine, because
we depend on the files being in UTF-8 encoding, where it's the same byte.
However, I thought it was worth mentioning in case we want to support
other encodings for text search files later.

Regards,
	Jeff Davis

*** textsearch.sgml	2009-08-02 11:22:38.0 -0700
--- textsearch.sgml.new	2009-08-02 11:22:27.0 -0700
***************
*** 2290,2315 ****
      </para>
      <para>
!      Star sign <literal>*</literal> at the end of definition word indicates,
!      that definition word is a prefix and <function>to_tsquery()</function>
!      function will transform that definition to the prefix search format.
!      Notice, it is ignored in <function>to_tsvector()</function>.
      </para>
  <programlisting>
- cat $SHAREDIR/tsearch_data/synonym_sample.syn
- postgres        pgsql
- postgresql      pgsql
- postgre pgsql
- gogle   googl
- indices index*
- cat $SHAREDIR/tsearch_data/synonym_sample.syn
  postgres        pgsql
  postgresql      pgsql
  postgre pgsql
  gogle   googl
  indices index*

  =# create text search dictionary syn( template=synonym,synonyms='synonym_sample');
  =# select ts_lexize('syn','indices');
   ts_lexize
--- 2290,2317 ----
      </para>
      <para>
!      An asterisk (<literal>*</literal>) at the end of definition word indicates
!      that definition word is a prefix, and <function>to_tsquery()</function>
!      function will transform that definition to the prefix search format (see
!      <xref linkend="textsearch-parsing-queries">).
!      Notice that it is ignored in <function>to_tsvector()</function>.
      </para>
+ <para>
+  Contents of <filename>$SHAREDIR/tsearch_data/synonym_sample.syn</>:
+ </para>
  <programlisting>
  postgres        pgsql
  postgresql      pgsql
  postgre pgsql
  gogle   googl
  indices index*
+ </programlisting>
+ <para>
+  Results:
+ </para>
+ <programlisting>
  =# create text search dictionary syn( template=synonym,synonyms='synonym_sample');
  =# select ts_lexize('syn','indices');
   ts_lexize
***************
*** 2324,2329 ****
--- 2326,2338 ----
   'index':*
  (1 row)

+
+ =# select 'indexes are very useful'::tsvector;
+             tsvector
+ ---------------------------------
+  'are' 'indexes' 'useful' 'very'
+ (1 row)
+
  =# select 'indexes are very useful'::tsvector @@ to_tsquery('tst','indices');
   ?column?
  ----------
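To make the documented file format concrete, here is a small sketch of how one line of synonym_sample.syn could be split into a source word, a replacement, and a prefix marker. It is an illustration only, not the dictionary's actual code: SynEntrySketch, parse_syn_line_sketch and format_lexeme_sketch are hypothetical names, the 63-byte word limit is an assumption of the sketch, and the formatter merely imitates the 'index':* text form that appears in the docs example.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one synonym file line. */
typedef struct SynEntrySketch
{
	char		in[64];			/* word to be replaced */
	char		out[64];		/* replacement, without any trailing '*' */
	bool		is_prefix;		/* true if the replacement ended in '*' */
} SynEntrySketch;

/*
 * Parse "word replacement" or "word replacement*".
 * Returns false if the line does not contain two words.
 * Assumption of this sketch: both words are shorter than 64 bytes.
 */
static bool
parse_syn_line_sketch(const char *line, SynEntrySketch *entry)
{
	size_t		len;

	if (sscanf(line, "%63s %63s", entry->in, entry->out) != 2)
		return false;

	/* A trailing asterisk marks the replacement as a prefix. */
	len = strlen(entry->out);
	entry->is_prefix = (len > 1 && entry->out[len - 1] == '*');
	if (entry->is_prefix)
		entry->out[len - 1] = '\0';

	return true;
}

/* Render the replacement the way a query lexeme is displayed. */
static void
format_lexeme_sketch(const SynEntrySketch *entry, char *buf, size_t buflen)
{
	snprintf(buf, buflen, "'%s'%s", entry->out,
			 entry->is_prefix ? ":*" : "");
}
```

For the sample line "indices index*" this yields the replacement "index" with the prefix marker set, while "postgres pgsql" yields "pgsql" with no marker. Per the doc text in the diff, only to_tsquery() acts on the marker; to_tsvector() ignores it.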