You're still doing a full table scan with REGEX, so you'll never get
it really fast. I was thinking it would be slightly faster because of
less comparisons. It's the full table scan and no use of indexes that
you want to get away from. Without doing that, the only way to get
things faster is with faster disks and more RAM for caching.
I assume you created a full text index on the title field? Sorry, but
have to ask. You can perform a full text search without full text
index, but it's not going to be fast since no index is used.
Try adding a plus sign in front of the words or phrases you want to
search on when using IN BOOLEAN MODE.
MATCH(emp.title) AGAINST('+"Vice President"' IN BOOLEAN MODE)
Although I don't know why it would be matching Vice Chairman, that's
just not right. Are you patched to the latest version of MySQL? I
remember getting some really weird results with full text searches
with an older version of 4.1.
Regardless, you will have to adjust the full text settings and
reindex to find words like VP, since that is shorter than the default
4 character word length minimum. I had to do this to search on things
like "VB" (Visual Basic). MySQL still won't index "V.P." since
periods aren't valid characters for words and you wouldn't want MySQL
indexing single letter "words". I have a similar problem trying to
search on "C", the programming language. I've figured out a work
around for "C++", but not "C". I think you're stuck with doing at
least a little data scrubbing and do cleanup for new data before saving.
On Oct 5, 2005, at 7:05 AM, Harini Raghavan wrote:
Hi Brent,
Using REGEXP did not really help with the performance. I need to do
whole word matching sowould prefer not to do LIKE '%Vice President
%' as it may return ome negative results.
I separated out some of the text based columns in to a different
table using MYISAM storage engine. Using FULLTEXT with IN BOOLEAN
MODE was a bit slow, but without the IN BOOLEAN MODE, it seems
fast. However I think the phrase searches are not working properly.
For example the below query returned records where the title was
'Vice Chairman':
select emp.title from employment_title emp where MATCH(emp.title)
AGAINST('"Vice President"')
I have verified the syntax of phrase query, and it seems to bve
correct. Any idea why this is happening? Also if I have multiple
phrases is the following query syntax correct?
select emp.title from employment_title emp where MATCH(emp.title)
AGAINST('"V.P." "VP" "Vice President" "Vice-President"')
Thanks,
Harini
Brent Baisley wrote:
Egads! That's a lot of OR's. You probably want to use REGEXP
instead of all those OR's. REGEXP can be slow, but you'll be
doing far less comparisons than what you have in your current
query so it may be faster than what you have.
Something like this:
SELECT ...
WHERE ... emp.title REGEXP 'V[.]?P[.]?' OR emp.title REGEXP 'Vice
[-]? President'
I think that will match everything you have. At the very least
you don't have to check for periods, commas, space and everything
else before and after what you are searching on. Searching on
emp.title LIKE '%Vice President%', will find 'Vice President'
anywhere in the text, regardless of what come before or after it.
One thing you should think about trying to do is breaking out the
fields you need to do a full text search on into a separate table
that you can make MYISAM. You'll be adding a JOIN to your queries
that need to do the full text search, but it should be a lot
quicker and your queries much simpler. You'll then have a mix of
InnoDB and MYISAM tables, which is perfectly legal.
On Oct 4, 2005, at 11:17 AM, Harini Raghavan wrote:
Hi,
I am using MYSQL 4.1 database in my J2ee application. I am
facing performance issues with some queries that are being run
on text fields. Since MYISAM storage engine does not support
transactions (and my application requires the database tables to
support transaction), I have not been able to use FULL TEXT
searches. I need to perform token matching against the text
fields and so use a lot of LIKE expressions in the query. Below
is one such query which is taking almost 2 mins to execute.
select count(emp.id) from executive as exec1 , company comp,
target_company targetComp, employment as emp
where emp.executive_id = exec1.id
and emp.company_id = comp.id
and comp.id = targetComp.company_id and ((emp.title like 'V.P.'
OR emp.title like 'V.P. %' OR emp.title like 'V.P., %' OR
emp.title like 'V.P.. %' OR emp.title like '% V.P. %'
OR emp.title like '% V.P., %' OR emp.title like '% V.P.. %' OR
emp.title like '% V.P.' OR emp.title like '% V.P..')
OR (emp.title like 'VP' OR emp.title like 'VP %' OR emp.title
like 'VP, %' OR emp.title like 'VP. %' OR emp.title like '% VP %'
OR emp.title like '% VP, %' OR emp.title like '% VP. %' OR
emp.title like '% VP' OR emp.title like '% VP.')
OR (emp.title like 'Vice President' OR emp.title like 'Vice
President %' OR emp.title like 'Vice President, %' OR emp.title
like 'Vice President. %'
OR emp.title like '% Vice President %' OR emp.title like '% Vice
President, %' OR emp.title like '% Vice President. %'
OR emp.title like '% Vice President' OR emp.title like '% Vice
President.') OR (emp.title like 'Vice-President' OR emp.title
like 'Vice-President %'
OR emp.title like 'Vice-President, %' OR emp.title like 'Vice-
President. %' OR emp.title like '% Vice-President %'
OR emp.title like '% Vice-President, %' OR emp.title like '%
Vice- President. %' OR emp.title like '% Vice-President'
OR emp.title like '% Vice-President.'))
and emp.active = 1
Does MYSQL provide any other option to perform text based
searches? Can someone suggest any tips for performance tuning
the database in this scenario?
Thanks,
Harini
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?
[EMAIL PROTECTED]
--
Brent Baisley
Systems Architect
Landover Associates, Inc.
Search & Advisory Services for Advanced Technology Environments
p: 212.759.6400/800.759.0577
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]