Hi Green,

Scrubbing out the data is a great suggestion, I will definitely try that
out. I did try out the other option using REGEXP instead of matching
individual conditions.  It definitely cleaned up the implementation, but
did not really improve the performance.

-Harini

[EMAIL PROTECTED] wrote:

Harini Raghavan <[EMAIL PROTECTED]> wrote on 10/04/2005 11:17:48 AM:

Hi,

I am using MYSQL 4.1 database in my J2ee application. I am facing performance issues with some queries that are being run on text fields. Since MYISAM storage engine does not support transactions(and my application requires the database tables to support transaction), I have

not been able to use FULL TEXT searches. I need to perform token matching against the text fields and so use a lot of LIKE expressions in the query. Below is one such query which is taking almost 2 mins to execute.

select count(emp.id) from executive as exec1 , company comp, target_company targetComp, employment as emp
where emp.executive_id = exec1.id
and emp.company_id = comp.id
and comp.id = targetComp.company_id and ((emp.title like 'V.P.' OR emp.title like 'V.P. %' OR emp.title like

'V.P., %' OR emp.title like 'V.P.. %' OR emp.title like '% V.P. %'
OR emp.title like '% V.P., %' OR emp.title like '% V.P.. %' OR emp.title

like '% V.P.' OR emp.title like '% V.P..')
OR (emp.title like 'VP' OR emp.title like 'VP %' OR emp.title like 'VP, %' OR emp.title like 'VP. %' OR emp.title like '% VP %' OR emp.title like '% VP, %' OR emp.title like '% VP. %' OR emp.title like '% VP' OR emp.title like '% VP.') OR (emp.title like 'Vice President' OR emp.title like 'Vice President %'

OR emp.title like 'Vice President, %' OR emp.title like 'Vice President.
%'
OR emp.title like '% Vice President %' OR emp.title like '% Vice President, %' OR emp.title like '% Vice President. %' OR emp.title like '% Vice President' OR emp.title like '% Vice President.') OR (emp.title like 'Vice-President' OR emp.title like 'Vice-President %' OR emp.title like 'Vice-President, %' OR emp.title like 'Vice-President.

%' OR emp.title like '% Vice-President %'
OR emp.title like '% Vice-President, %' OR emp.title like '% Vice-President. %' OR emp.title like '% Vice-President'
OR emp.title like '% Vice-President.'))
and emp.active = 1

Does MYSQL provide any other option to perform text based searches? Can someone suggest any tips for performance tuning the database in this scenario?

Thanks,
Harini


It seems to me that you are trying to search on unscrubbed data. I can guess that your are collecting this data from a variety of sources and that those sources don't always use the same abbreviation or punctuation. However, it is better do deal with this kind of issue as the data arrives (before it enters your database) and not to deal with it during retrieval (as you are trying to do).

You need to scrub your data and standardize on certain names and abbreviations. Decide that "V.P." is going to be your standard for "Vice President" (and any of it's variants) and update all of your data to match.

Until then, you can use RLIKE or REGEXP to minimize how many search terms you are evaluating.
http://dev.mysql.com/doc/mysql/en/string-comparison-functions.html
http://dev.mysql.com/doc/mysql/en/regexp.html

For example, all of these terms:

(emp.title like 'V.P.' OR emp.title like 'V.P. %' OR emp.title like 'V.P., %' OR emp.title like 'V.P.. %' OR emp.title like '% V.P. %' OR emp.title like '% V.P., %' OR emp.title like '% V.P.. %' OR emp.title like '% V.P.' OR emp.title like '% V.P..') OR (emp.title like 'VP' OR emp.title like 'VP %' OR emp.title like 'VP, %' OR emp.title like 'VP. %' OR emp.title like '% VP %' OR emp.title like '% VP, %' OR emp.title like '% VP. %' OR emp.title like '% VP' OR emp.title like '% VP.')

can be simplified to just:

emp.title REGEXP '[:space:]*V.*P[ ,.]*'

Would that help?

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to