FW: MySQL fulltext. Question about the stopword list

Erlend Hopso Stromsvik Wed, 08 Jan 2003 13:06:53 -0800

Resending this, since the first one didn't seem to get posted on the MySQL
list.


> -----Original Message-----
> From: Erlend Hopsø Strømsvik
> Sent: 7. januar 2003 10:18
> To: [EMAIL PROTECTED]
> Subject: RE: MySQL fulltext. Question about the stopword list
> 
> 
> > What I can easily do without breaking 4.0.x "gamma" status is to add
> > a command line switch --disable-fulltext-stopwords. It can help as a
> > temporary solution until a proper fix (per-index options, that is)
> > is implemented.
> 
> That would be helpful for me, but what about Thomas Spahni's suggestion?
> 
> > 
> > Sergei,
> > 
> > but then, could you also add a command line switch
> > 
> > --read-stopwords-from-file="filename" ???
> > 
> > Please. That could solve half of my problem.
> > 
> > Best regards,
> > Thomas Spahni
> 
> I was merely wondering why the stopword list is hardcoded, since it
> seems to me that it's one of those things a user should be able to
> change without too much hassle, and more often than whenever one
> recompiles MySQL. Also, a stopword list depends heavily on what kind
> of text/data one wants to search, so a large system with multiple
> users and databases might want different stopword lists...
> 
> > 
> > > I remember working on a project when I was in school where we wrote
> > > a program using autogenerated stopword lists and N-gram matching for
> > > the text and search string. That way the stopword list was not
> > > hardcoded.
> > 
> > What is "N-gram matching" ?
> > 
> 
> I'm posting this to the MySQL list, since maybe someone else has
> something to add about it too :)
> I don't know where I got these texts from, but they should give
> you a general idea about n-grams.
> 
> ************************
> n-grams are used to describe objects as vectors. This makes 
> it possible to apply geometric, statistical and other 
> mathematical techniques, which are well defined for vectors, 
> but not for objects in general. For example, one of the most 
> common uses is to define a similarity measure between textual 
> documents based on the application of a mathematical function 
> to the vector representations of the documents.
> ************************
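
To make the "objects as vectors" idea above a bit more concrete, here is a
minimal sketch in Python. It is only an illustration: the quoted text does
not say which mathematical function is applied, so the use of cosine
similarity, character trigrams without padding, and the function names
ngram_vector and cosine_similarity are all my own assumptions.

```python
import math
from collections import Counter


def ngram_vector(text, n=3):
    """Count every run of n consecutive characters in `text`.

    The Counter acts as a sparse vector: one dimension per distinct n-gram.
    (Assumption: no padding and no case folding, to keep the sketch short.)
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse n-gram count vectors."""
    # A Counter returns 0 for n-grams it has never seen.
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a text shorter than n has no n-grams at all
    return dot / (norm_u * norm_v)


doc_a = "fulltext search with a stopword list"
doc_b = "fulltext search without a stopword list"
print(cosine_similarity(ngram_vector(doc_a), ngram_vector(doc_b)))
```

Two texts with no trigram in common score 0.0, and a text compared with
itself scores 1.0 (up to floating-point rounding).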
> N-Grams
> String-similarity approaches to conflation involve the system 
> calculating a measure of similarity between an input query 
> term and each of the distinct terms in the database. Those 
> database terms that have a high similarity to a query term 
> are then displayed to the user for possible inclusion in the query. 
> N-gram matching techniques are among the most common of
> these approaches (Freund & Willett, 1982). An n-gram is a set
> of n consecutive characters extracted from a word. The main
> idea behind this approach is that similar words will have a
> high proportion of n-grams in common. Typical values for n
> are 2 or 3, corresponding to the use of digrams or
> trigrams, respectively.
> 
> So if you have the word 'computer' you'll get the following digrams:
> *c, co, om, mp, pu, ut, te, er, r*
> 
> and the trigrams:
> **c,*co,com,omp,mpu,put,ute,ter,er*,r**
> 
> where '*' denotes a padding space. There are n+1 such digrams 
> and n+2 such trigrams in a word containing n characters.
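
The padding scheme above is easy to reproduce. The sketch below is my own
illustration in Python, not code from either quoted text: padded_ngrams
pads the word with n-1 '*' characters on each side, which gives exactly the
digrams and trigrams listed for 'computer'. The texts do not say how the
"proportion of n-grams in common" is computed, so dice_similarity assumes
the Dice coefficient over the distinct n-grams of the two words, which is
one common choice.

```python
def padded_ngrams(word, n, pad="*"):
    """Return the n-grams of `word`, padded with n-1 pad characters per side."""
    padded = pad * (n - 1) + word + pad * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def dice_similarity(a, b, n=2):
    """Dice coefficient over the distinct n-grams of two words.

    Assumption: Dice is used here only as an example measure.
    """
    grams_a = set(padded_ngrams(a, n))
    grams_b = set(padded_ngrams(b, n))
    if not grams_a and not grams_b:
        return 1.0  # two empty words with n=1: treat as identical
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


print(padded_ngrams("computer", 2))  # 9 digrams for an 8-letter word
print(padded_ngrams("computer", 3))  # 10 trigrams for an 8-letter word
print(dice_similarity("computer", "computers"))
```

A search could then offer the user every indexed term whose similarity to
the query term is above some threshold, which is the "possible inclusion in
the query" step described above.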
> 
> 
> Found this link after some 'googling about' 
> http://web.umr.edu/~tauritzd/ngram/
> This is probably the original text for the first text I had:
> http://web.umr.edu/~tauritzd/ngram/tutorial.html
> 
> 
> > Regards,
> > Sergei
> > 
> > -- 
> > MySQL Development Team
> >    __  ___     ___ ____  __
> >   /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik <[EMAIL PROTECTED]>
> >  / /|_/ / // /\ \/ /_/ / /__  MySQL AB, http://www.mysql.com/
> > /_/  /_/\_, /___/\___\_\___/  Osnabrueck, Germany
> >        <___/
> > 
> 

