Hi Green,
Scrubbing out the data is a great suggestion, I will definitely try that
out. I did try out the other option using REGEXP instead of matching
individual conditions. It definitely cleaned up the implementation, but
did not really improve the performance.
-Harini
[EMAIL PROTECTED] wrote:
Harini Raghavan <[EMAIL PROTECTED]> wrote on 10/04/2005
11:17:48 AM:
Hi,
I am using MYSQL 4.1 database in my J2ee application. I am facing
performance issues with some queries that are being run on text fields.
Since MYISAM storage engine does not support transactions(and my
application requires the database tables to support transaction), I have
not been able to use FULL TEXT searches. I need to perform token
matching against the text fields and so use a lot of LIKE expressions
in the query. Below is one such query which is taking almost 2 mins to
execute.
select count(emp.id) from executive as exec1 , company comp,
target_company targetComp, employment as emp
where emp.executive_id = exec1.id
and emp.company_id = comp.id
and comp.id = targetComp.company_id
and ((emp.title like 'V.P.' OR emp.title like 'V.P. %' OR emp.title like
'V.P., %' OR emp.title like 'V.P.. %' OR emp.title like '% V.P. %'
OR emp.title like '% V.P., %' OR emp.title like '% V.P.. %' OR emp.title
like '% V.P.' OR emp.title like '% V.P..')
OR (emp.title like 'VP' OR emp.title like 'VP %' OR emp.title like 'VP,
%' OR emp.title like 'VP. %' OR emp.title like '% VP %'
OR emp.title like '% VP, %' OR emp.title like '% VP. %' OR emp.title
like '% VP' OR emp.title like '% VP.')
OR (emp.title like 'Vice President' OR emp.title like 'Vice President %'
OR emp.title like 'Vice President, %' OR emp.title like 'Vice President.
%'
OR emp.title like '% Vice President %' OR emp.title like '% Vice
President, %' OR emp.title like '% Vice President. %'
OR emp.title like '% Vice President' OR emp.title like '% Vice
President.') OR (emp.title like 'Vice-President' OR emp.title like
'Vice-President %'
OR emp.title like 'Vice-President, %' OR emp.title like 'Vice-President.
%' OR emp.title like '% Vice-President %'
OR emp.title like '% Vice-President, %' OR emp.title like '%
Vice-President. %' OR emp.title like '% Vice-President'
OR emp.title like '% Vice-President.'))
and emp.active = 1
Does MYSQL provide any other option to perform text based searches? Can
someone suggest any tips for performance tuning the database in this
scenario?
Thanks,
Harini
It seems to me that you are trying to search on unscrubbed data. I can
guess that your are collecting this data from a variety of sources and
that those sources don't always use the same abbreviation or punctuation.
However, it is better do deal with this kind of issue as the data arrives
(before it enters your database) and not to deal with it during retrieval
(as you are trying to do).
You need to scrub your data and standardize on certain names and
abbreviations. Decide that "V.P." is going to be your standard for "Vice
President" (and any of it's variants) and update all of your data to
match.
Until then, you can use RLIKE or REGEXP to minimize how many search terms
you are evaluating.
http://dev.mysql.com/doc/mysql/en/string-comparison-functions.html
http://dev.mysql.com/doc/mysql/en/regexp.html
For example, all of these terms:
(emp.title like 'V.P.' OR emp.title like 'V.P. %' OR emp.title like
'V.P., %' OR emp.title like 'V.P.. %' OR emp.title like '% V.P. %'
OR emp.title like '% V.P., %' OR emp.title like '% V.P.. %' OR emp.title
like '% V.P.' OR emp.title like '% V.P..')
OR (emp.title like 'VP' OR emp.title like 'VP %' OR emp.title like 'VP,
%' OR emp.title like 'VP. %' OR emp.title like '% VP %'
OR emp.title like '% VP, %' OR emp.title like '% VP. %' OR emp.title
like '% VP' OR emp.title like '% VP.')
can be simplified to just:
emp.title REGEXP '[:space:]*V.*P[ ,.]*'
Would that help?
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]