fulltext question
Hello all- I have a question on searching via fulltext. I have the following SQL statement: var('SQLResultsID') = 'select *, MATCH (product_id,product_name,product_desc) AGAINST(' + $sqlKeywordSearch + ') AS SCORE from products WHERE MATCH (product_id,product_name,product_desc) AGAINST(' + $sqlKeywordSearchB + ' IN BOOLEAN MODE) AND active NOT LIKE %no% ORDER BY score DESC First off, the variable $sqlKeywordSearch contains the search keywords separated by spaces. The second variable, $sqlKeywordSearchB, has keywords separated by spaces but also adds a “+ sign to the beginning of each keyword to do the Boolean search and match all the keywords in the search. question #1 - Is this the best way to do a boolean search but also return a usable “score”? The search is doing what I expect it to do so no problem there. Here’s my main question: I want to be able to “boost the rankings(score) of the results based on the fields. Anything that matches on the product_id field I would like to get a higher ranking, then the product_name field next, then the product_desc last (obviously, if something matches in the product_id field it is of greater importance than if it matches in the product_description field). I know I can boost, or adjust, the score to the keywords if I want but is there a way to add to the search score rankings based on the field searched? as always, Thank everyone for any help! James James Sheffer j...@higherpowered.com Lasso Developerhttp://www.higherpowered.com phone: 469-256-0268 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
Re: MySQL fulltext. Question about the stopword list
Hi! On Jan 07, Erlend Hopso Stromsvik wrote: What I can easily do without breaking 4.0.x gamma status, is to add command line switch --disable-fulltext-stopwords. It can help as a temporary solution, untill a proper fix - per-index options, that is - will be implemented. That would be helpful for me, but what about Thomas Spahni's suggestion? In todo already :) I remember working on a project when I was school where we wrote this program using autogenerated stopword lists and N-gram matching for the text and search string. By this the stopword list was not hard coded.. What is N-gram matching ? n-grams are used to describe objects as vectors. This makes it possible to apply geometric, statistical and other mathematical techniques, which are well defined for vectors, but not for objects in general. For example, one of the most common uses is to define a similarity measure between textual documents based on the application of a mathematical function to the vector representations of the documents Yep. This is how fulltext search in MySQL works (not IN BOOLEAN MODE, of course). It is called vector space model, though words are used as vector coordinates, not n-grams. An n-gram is a set of n consecutive characters extracted from a word. The main idea behind this approach is that, similar words will have a high proportion of n-grams in common. Typical values for n are 2 or 3, these corresponding to the use of digrams or trigrams, respectively. So if you have the word 'computer' you'll get the following digrams: *c, co, om, mp, pu, ut, te, er, r* and the trigrams: **c,*co,com,omp,mpu,put,ute,ter,er*,r** Regards, Sergei -- MySQL Development Team __ ___ ___ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, http://www.mysql.com/ /_/ /_/\_, /___/\___\_\___/ Osnabrueck, Germany ___/ - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
FW: MySQL fulltext. Question about the stopword list
Resending this, since the first one didn't seem to get posted on the MySQL list. -Original Message- From: Erlend HopsÛ StrÛmsvik Sent: 7. januar 2003 10:18 To: [EMAIL PROTECTED] Subject: RE: MySQL fulltext. Question about the stopword list What I can easily do without breaking 4.0.x gamma status, is to add command line switch --disable-fulltext-stopwords. It can help as a temporary solution, untill a proper fix - per-index options, that is - will be implemented. That would be helpful for me, but what about Thomas Spahni's suggestion? Sergei, but then, could you also add a command line switch --read-stopwords-from-file=filename ??? Please. That could solve half of my problem. Best regards, Thomas Spahni I was mere wondering why the stopword list was 'hardcoded' since it seems to me that it's one of those things a user should be able to change/modify without to much hassle and on a more frequent basis than whenever one recompile MySQL. Also a stopword list is very dependent on what kind of text/data one wants to search in so a large system with multiple users and databases might want different stopword lists... I remember working on a project when I was school where we wrote this program using autogenerated stopword lists and N-gram matching for the text and search string. By this the stopword list was not hard coded.. What is N-gram matching ? I post this to the MySQL board, since maybe someone else has something to add/say about it too :) Don't know where I got these texts from, but it should give you a general idea about n-grams. n-grams are used to describe objects as vectors. This makes it possible to apply geometric, statistical and other mathematical techniques, which are well defined for vectors, but not for objects in general. For example, one of the most common uses is to define a similarity measure between textual documents based on the application of a mathematical function to the vector representations of the documents N-Grams String-similarity approaches to conflation involve the system calculating a measure of similarity between an input query term and each of the distinct terms in the database. Those database terms that have a high similarity to a query term are then displayed to the user for possible inclusion in the query. N-gram matching techniques are one of the most common of these approaches (Freund Willett, 1982). An n-gram is a set of n consecutive characters extracted from a word. The main idea behind this approach is that, similar words will have a high proportion of n-grams in common. Typical values for n are 2 or 3, these corresponding to the use of digrams or trigrams, respectively. So if you have the word 'computer' you'll get the following digrams: *c, co, om, mp, pu, ut, te, er, r* and the trigrams: **c,*co,com,omp,mpu,put,ute,ter,er*,r** where '*' denotes a padding space. There are n+1 such digrams and n+2 such trigrams in a word containing n characters. Found this link after some 'googling about' http://web.umr.edu/~tauritzd/ngram/ This is probably the original text for the first text I had: http://web.umr.edu/~tauritzd/ngram/tutorial.html Regards, Sergei -- MySQL Development Team __ ___ ___ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, http://www.mysql.com/ /_/ /_/\_, /___/\___\_\___/ Osnabrueck, Germany ___/ - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
RE: MySQL fulltext. Question about the stopword list
What I can easily do without breaking 4.0.x gamma status, is to add command line switch --disable-fulltext-stopwords. It can help as a temporary solution, untill a proper fix - per-index options, that is - will be implemented. That would be helpful for me, but what about Thomas Spahni's suggestion? Sergei, but then, could you also add a command line switch --read-stopwords-from-file=filename ??? Please. That could solve half of my problem. Best regards, Thomas Spahni I was mere wondering why the stopword list was 'hardcoded' since it seems to me that it's one of those things a user should be able to change/modify without to much hassle and on a more frequent basis than whenever one recompile MySQL. Also a stopword list is very dependent on what kind of text/data one wants to search in so a large system with multiple users and databases might want different stopword lists... I remember working on a project when I was school where we wrote this program using autogenerated stopword lists and N-gram matching for the text and search string. By this the stopword list was not hard coded.. What is N-gram matching ? I post this to the MySQL board, since maybe someone else has something to add/say about it too :) Don't know where I got these texts from, but it should give you a general idea about n-grams. n-grams are used to describe objects as vectors. This makes it possible to apply geometric, statistical and other mathematical techniques, which are well defined for vectors, but not for objects in general. For example, one of the most common uses is to define a similarity measure between textual documents based on the application of a mathematical function to the vector representations of the documents N-Grams String-similarity approaches to conflation involve the system calculating a measure of similarity between an input query term and each of the distinct terms in the database. Those database terms that have a high similarity to a query term are then displayed to the user for possible inclusion in the query. N-gram matching techniques are one of the most common of these approaches (Freund Willett, 1982). An n-gram is a set of n consecutive characters extracted from a word. The main idea behind this approach is that, similar words will have a high proportion of n-grams in common. Typical values for n are 2 or 3, these corresponding to the use of digrams or trigrams, respectively. So if you have the word 'computer' you'll get the following digrams: *c, co, om, mp, pu, ut, te, er, r* and the trigrams: **c,*co,com,omp,mpu,put,ute,ter,er*,r** where '*' denotes a padding space. There are n+1 such digrams and n+2 such trigrams in a word containing n characters. Found this link after some 'googling about' http://web.umr.edu/~tauritzd/ngram/ This is probably the original text for the first text I had: http://web.umr.edu/~tauritzd/ngram/tutorial.html Regards, Sergei -- MySQL Development Team __ ___ ___ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, http://www.mysql.com/ /_/ /_/\_, /___/\___\_\___/ Osnabrueck, Germany ___/ - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: MySQL fulltext. Question about the stopword list
On Mon, 6 Jan 2003, Sergei Golubchik wrote: Does the stopword list have to be 'hardcoded' in myisam/ft_static.c? No, of course, it has not. The problem is, that I cannot implement something like CREATE TABLE ... ( ... FULLTEXT (...) WITHOUT STOPWORD LIST ) or CREATE TABLE ... ( ... FULLTEXT (...) WITH STOPWORD LIST=filename ) or whatever, as there is simply no room in .frm file to store extra table/column/index attributes :( Changing .frm format is in our todo for a long time already, but it includes few subtasks that makes it very non-trivial. :( It's something that Monty is going to do himself. What I can easily do without breaking 4.0.x gamma status, is to add command line switch --disable-fulltext-stopwords. It can help as a temporary solution, untill a proper fix - per-index options, that is - will be implemented. Sergei, but then, could you also add a command line switch --read-stopwords-from-file=filename ??? Please. That could solve half of my problem. Best regards, Thomas Spahni -- filter: sql,query - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
MySQL fulltext. Question about the stopword list
Hi I'm currently using MySQL (4.0.4 recompiled) as our database for storing information about books. There are about 1.5mill book titles + other information in the database now. The fulltext search capabilities of MySQL have turned out to become very good with the 4+ version and with a little tweaking of the ft_min_word_len and myisam/ft_static.c we have gotten all we wished for feature wise :) ft_min_word_len is 2 and I've completely removed the stopword list. A fulltext search with 1.5mill book titles and sub titles takes about 0.17 sec so there is no speed issues here. (hands down for MySQL) :) But... Does the stopword list have to be 'hardcoded' in myisam/ft_static.c? I remember working on a project when I was school where we wrote this program using autogenerated stopword lists and N-gram matching for the text and search string. By this the stopword list was not hard coded.. So is there a special reason for MySQL to have the stopword list on the source forcing you to recompile everytime you change it or whenever I download a new release? I find this to be a bit cumbersome since I run several development/backup servers and on different platforms (Windows/Linx). I'm still not able to compile MySQL for Windows so I'm using the prebuilt installer for Windows. So my Windows server won't be able to display the same results as the main server or the linux servers. Erlend Stromsvik - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: MySQL fulltext. Question about the stopword list
Hi! On Jan 06, Erlend Hopsø Strømsvik wrote: Hi I'm currently using MySQL (4.0.4 recompiled) as our database for storing information about books. There are about 1.5mill book titles + other information in the database now. The fulltext search capabilities of MySQL have turned out to become very good with the 4+ version and with a little tweaking of the ft_min_word_len and myisam/ft_static.c we have gotten all we wished for feature wise :) ft_min_word_len is 2 and I've completely removed the stopword list. A fulltext search with 1.5mill book titles and sub titles takes about 0.17 sec so there is no speed issues here. (hands down for MySQL) :) Thanks :) But... Does the stopword list have to be 'hardcoded' in myisam/ft_static.c? No, of course, it has not. The problem is, that I cannot implement something like CREATE TABLE ... ( ... FULLTEXT (...) WITHOUT STOPWORD LIST ) or CREATE TABLE ... ( ... FULLTEXT (...) WITH STOPWORD LIST=filename ) or whatever, as there is simply no room in .frm file to store extra table/column/index attributes :( Changing .frm format is in our todo for a long time already, but it includes few subtasks that makes it very non-trivial. :( It's something that Monty is going to do himself. What I can easily do without breaking 4.0.x gamma status, is to add command line switch --disable-fulltext-stopwords. It can help as a temporary solution, untill a proper fix - per-index options, that is - will be implemented. I remember working on a project when I was school where we wrote this program using autogenerated stopword lists and N-gram matching for the text and search string. By this the stopword list was not hard coded.. What is N-gram matching ? Regards, Sergei -- MySQL Development Team __ ___ ___ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, http://www.mysql.com/ /_/ /_/\_, /___/\___\_\___/ Osnabrueck, Germany ___/ - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Fulltext question
Hi there, I am trying fulltext to speed up my queries; but what would the query line be, when I would like to match the phrase: This is a text? With a like query i would do WHERE column LIKE '%This Is a text%'; I've tried MATCH column AGAINST('+this +is +a +test'); but this returns, all records which have this is etc. but not the phrase. Thanks for the advice, Jeroen - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php