From: Freddie Bingham [EMAIL PROTECTED]
(1) Why is such a query as this allowed?
WHERE MATCH (text)
AGAINST ('+s*' IN BOOLEAN MODE)
This returns everything that starts with an 's', as expected
by the syntax.
Why is it not stopped by the mysql minimum word length? I
The minimum word length still applies to the items that are found! A record
with so very cruel in the `text` column will certainly not be found.
have no desire to allow a search for every word that begins
with an 's' as it can be resourceful and will return results
with questionable usage. So I'll take care of not allowing
these searches in a higher level
It is always wise to check user input before applying it to a query!
(2) Why does this query also return everything that begins with an 's'
WHERE MATCH (text)
AGAINST ('[EMAIL PROTECTED]' IN BOOLEAN MODE)
Mysql is obviously silently stripping the list of garbage
characters away from the word, which I assume was also done
with the indexed list of words.
See: http://dev.mysql.com/doc/mysql/en/fulltext-search.html
--
MySQL uses a very simple parser to split text into words. A ``word'' is any
sequence of true word characters (letters, digits, and underscores),
optionally separated by no more than one sequential ''' character. For
example, wasn't is parsed as a single word, but wasn''t is parsed as two
words wasn and t. (And then t would be ignored as too short according to the
rules following.) Also, single quotes at the ends of words are stripped;
only embedded single quotes are retained.
---
As documented, MySQL will 'ignore' those characters while building the index
as they are not part of a 'word' ('word' as defined in the paragraph I
quoted).
As far as the search 'word' is concerned, MySQL supports a limited list of
operator, so you can assume that everything else is silently ignored...
I now have a serious issue since I what appears to be a long
word, but is really only one character long. Putting quotes
around the word seems to force it to be taken literally but
that also takes the * literally, breaking the wildcard
search. Where can I find a list of these characters that are
silently stripped out?
http://dev.mysql.com/doc/mysql/en/fulltext-boolean.html lists the operators
that are supported:
+ - () ~ *
WHERE MATCH (text)
AGAINST ('[EMAIL PROTECTED]' IN BOOLEAN MODE)
This returns all results that begin with a 't', but not 's'!
This will probably be equal to : '+t* s*' or: must contain word starting
with t, possible containing word(s) starting with s. Which is reflected in
these results:
text relevance
passe0
Type error 1
type sweetheart! 1.730698
supergranny 0
then this query:
WHERE MATCH(text) AGAINST ('[EMAIL PROTECTED]' IN
BOOLEAN MODE)
'+ qwerty* s* t*'
The first + is not leading a word, so it will be ignored: 'qwerty* s* t*'
So sweet sour set sells sins per se 1
so 0
passe 0
Type error 1
type sweetheart!! 2
supergranny 1
qwertyuiop 1
qwertyuiop same type2
Matches all words that beginning with a 's' or a 't'!
or beginning with 'qwerty'
Anyone have a concise explanation just exactly how the
wildcard character works in terms of real world strings such as this?
Hopefully the examples above are clear enough.
If you want to only support searches for words longer than say three
characters it would be wise to:
- only allow alphanumeric characters, single quotes, white space and
supported operators
- remove anything else and check for length of 'words'
If the entry passes these checks you can probably apply it to a query (after
escaping quotes!!). The checks can be performed fairly easy if you use some
kind of regular expressions in the application that uses the query.
Regards Jigal.
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]