At 1:35 PM +1000 4/11/06, Taco Fleur wrote:
Thanks Steve,

Much appreciated, I was hoping there was something a little simpler, but I
will have a go at it.
Anyway of doing this with RegEx, would that simplify things?


Hi, and you're welcome -

Unfortunately, I don't think this can be done with regex/rlike; those
only give a boolean result (pattern matched/not matched), but can't
as far as I know be used for counting/replacing strings. This doesn't
appear to have changed even in MySQL 5.1.

        steve


-----Original Message-----
From: Steve Edberg [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 11 April 2006 9:50 AM
To: [EMAIL PROTECTED]; mysql@lists.mysql.com
Subject: Re: counting keywords

At 7:37 AM +1000 4/11/06, Taco Fleur wrote:
Hi all,

I am trying to find out whether it is possible to return the count of
keywords found in a text field, does anyone know?
For example; "ColdFusion or Java" is entered in the search string and
20 records are found that match, I need to count how many times
"ColdFusion"
and "Java" appears in each match, add those two and than sort
descending on that total.

I was actually using verity for the search, but the client insists he
sees the number of keywords found, which Verity does not do.

The text searched are Résumé's, I initially thought that MySQL could
search the résumé's when stored as binary data, but I was wrong, so I
now have the CV's converted to HTML and then store them in the DB as
VARCHAR

Any help would be much appreciated, I am having a hard time coming from
a MS SQL background ;-)


There's no function that I know of to do that directly; however, you could
do something like this:

    select
       (length(your_text_field)-length(replace(your_text_field,
'coldfusion','')))/length('coldfusion')
       as wordcount
    from
       your_table

That is, it removes all instances of 'coldfusion' from  your string, gets
the difference in length from the unaltered string, and divides that by the
number of characters in your search string. I've used this method several
times.

Of course, if you want to avoid matching against terms like 'javalike' or
'coldfusionista' then you've gotta do some additional checking, for example:

    select
       (length(your_text_field)-length(replace(concat('
',your_text_field,' '), ' coldfusion ','')))/length(' coldfusion ')
       as wordcount
    from
       your_table

See

    http://dev.mysql.com/doc/refman/4.1/en/string-functions.html

for more info.

If you're doing this alot, it might be more efficient to build a word index
table like:

    word        char(32) not null         # or whatever your max word
length is likely to be
    word_count  integer unsigned not null
    document_id integer unsigned not null

where document_id is a foreign key pointing at the table containing your
fulltext. This would be easier to extend to handle synonym handling too, and
you could do all the suffix handling/stemming you need (eg; to take care of
plurals). I've done something like that as well, and included an extra field
for the metaphone version of the word, to match approximate spellings. If
the text fields were all in plain text, you could even include character
positions like

    word          char(32) not null
    word_position integer unsigned not null
    document_id   integer unsigned not null

then you could get word counts by doing a select count(word).


        steve


  Kind regards,


Taco Fleur

Free Call 1800 032 982 or Mobile 0421 851 786 Pacific Fox
<http://www.pacificfox.com.au/> http://www.pacificfox.com.au an
industry leader with commercial IT experience since 1994 …

*

        Web Design and Development
*

        SMS Solutions, including developer API
*

        Domain Registration, .COM for as low as fifteen dollars a year,
.COM.AU for fifty dollars two years!
 >



--
+--------------- my people are the people of the dessert, ---------------+
| Steve Edberg                                http://pgfsun.ucdavis.edu/ |
| UC Davis Genome Center                            [EMAIL PROTECTED] |
| Bioinformatics programming/database/sysadmin             (530)754-9127 |
+---------------- said t e lawrence, picking up his fork ----------------+

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to