I'm trying to create a full-text index on a large (2.1GB, 1.6M records)
two-column table (a primary key and a TEXT field), using MySQL 4.0.12 on
Win2000.
At first everything seems to proceed well: I see .TMP files in the database
directory and a couple of temporary files in TEMP.  It chugs along for an
hour or so, then seems to stop doing anything at all; the process uses no
CPU cycles and no I/O.  The first time I tried this, I left it running
overnight, so it had many, many hours to complete.
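For what it's worth, here's a rough Python sketch of the "has it really
stopped?" check: snapshot the sizes of the .TMP files in the database
directory, wait a while, snapshot again, and see whether any of them grew.
The directory path is whatever your datadir is, and matching on a .TMP
suffix is only based on what I see there; treat it as an assumption about
MySQL's temp-file naming, not something I've confirmed.

```python
import os


def tmp_file_sizes(directory, suffix=".TMP"):
    """Return {file name: size in bytes} for files in `directory` ending in `suffix`."""
    sizes = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Assumption: the index-build temp files end in .TMP (what shows up in the data dir).
        # Match case-insensitively, since Windows file names are case-insensitive.
        if name.upper().endswith(suffix.upper()) and os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes


def looks_stalled(before, after):
    """True if the same temp files exist in both snapshots and none of them grew."""
    if not before or before.keys() != after.keys():
        return False
    return all(after[name] <= before[name] for name in before)
```

Call tmp_file_sizes() twice a few minutes apart (time.sleep() in between)
and pass both snapshots to looks_stalled().  Several stalled results in a
row, together with zero CPU in Task Manager, would suggest a hang rather
than a slow sort; it says nothing about the cause.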

The error log shows nothing related to this.

It dawned on me that the problem might have to do with double-byte
characters, so I moved those records into another table, cleaned out some
leftovers and optimized the table.  However, I'm not absolutely certain I
got them all.  Then it occurred to me to look at ASCII() of the first
character of each field... I'm trying that now, but I'm getting a long
delay while the processlist says "Opening tables."  Perhaps this is a clue
to what went wrong with FT indexing?  I killed the process and am now
trying to stop and restart the daemon, but it isn't responding.  Ouch.
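One weakness of the ASCII() check: it only looks at the first character,
so a record with a double-byte character halfway through slips past.  A
client-side scan of the whole field is slower but exhaustive.  Here's a
minimal Python sketch; it assumes rows arrive as (id, body) pairs from a
DB-API cursor (MySQLdb, for example), and the query that produces them is
up to you.

```python
def first_non_ascii(text):
    """Offset of the first character/byte >= 0x80 in a str or bytes value, or -1 if none."""
    for offset, ch in enumerate(text):
        # Iterating bytes yields ints; iterating str yields one-character strings.
        code = ch if isinstance(ch, int) else ord(ch)
        if code >= 0x80:
            return offset
    return -1


def non_ascii_ids(rows):
    """Yield the primary key of every (id, body) row whose body has a non-ASCII byte anywhere."""
    for row_id, body in rows:
        if body is not None and first_non_ascii(body) != -1:
            yield row_id
```

On a 2.1GB table you'd probably want an unbuffered cursor (MySQLdb's
SSCursor, if I remember right) so the client doesn't try to hold all 1.6M
rows in memory at once.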

The text field contains bodies of e-mail, newsgroup and web forum messages.
Some are multi-part MIME messages, so there are some long lines that are
essentially garbage as far as full-text indexing is concerned.  Might those
lines also cause the problem I'm seeing?
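To see how much of each record is this kind of garbage, one simple
heuristic is the share of non-whitespace characters that sit in very long
tokens; base64-encoded attachments usually arrive as unbroken lines of up
to 76 characters.  A Python sketch follows.  The 40-character cutoff is an
arbitrary guess, and the score only tells you which records to test first,
not whether they actually cause the hang.

```python
import re

TOKEN = re.compile(r"\S+")  # a "token" here is any run of non-whitespace characters


def garbage_ratio(text, max_word_len=40):
    """Fraction of non-whitespace characters that belong to tokens longer than max_word_len."""
    tokens = TOKEN.findall(text)
    total = sum(len(t) for t in tokens)
    if total == 0:
        return 0.0
    # 40 is an arbitrary cutoff chosen for illustration, not a MySQL setting.
    long_chars = sum(len(t) for t in tokens if len(t) > max_word_len)
    return long_chars / total
```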

I have some Python code that will strip out the MIME and double-byte stuff,
but if there's a way to convince indexing to work, I'd rather go that way
than have to build a cleaned-up copy of the table.
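Roughly, that kind of cleanup might look like the minimal Python sketch
below: replace runs of non-ASCII characters with spaces, then drop tokens
longer than some cutoff, which also throws away base64 lines.  The
40-character limit is a hypothetical choice, and the cleanup is lossy: an
accented word such as "café" comes out as "caf".

```python
import re

NON_ASCII = re.compile(r"[^\x00-\x7f]+")


def clean_body(text, max_word_len=40):
    """Replace non-ASCII runs with spaces, then drop tokens longer than max_word_len.

    Lines left with no tokens are removed; surviving tokens are re-joined with single spaces.
    """
    # Use a space rather than deleting, so ASCII words on either side don't get fused.
    ascii_only = NON_ASCII.sub(" ", text)
    kept = []
    for line in ascii_only.splitlines():
        # Hypothetical cutoff: long unbroken tokens (e.g. base64 lines) are discarded.
        words = [w for w in line.split() if len(w) <= max_word_len]
        if words:
            kept.append(" ".join(words))
    return "\n".join(kept)
```

Each cleaned body would then go into the cleaned-up copy of the table,
which is exactly the extra step I'd prefer to avoid.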

A couple of related feature ideas for FT indexing: skip words longer than
N characters, and skip records using a WHERE clause.

Thanks for any suggestions.  If I make any progress, I'll post.  If I can't
solve this soon, I'll be turning to Swish-E.  Anybody here have any Python
code for MySQL <-> Swish-E?
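In case it helps anyone answering: as I understand Swish-E's "prog" input
method (swish-e -S prog), the external program writes each document to
stdout as a few headers (at least Path-Name and Content-Length), a blank
line, then exactly Content-Length bytes of text.  The Python sketch below
formats (id, body) rows that way.  The header set is from memory and the
mysql/messages/ path prefix is hypothetical, so check both against the
Swish-E docs; you may also need a Document-Type header or an IndexContents
setting so the right parser is used.

```python
def swish_prog_stream(rows, path_prefix="mysql/messages/"):
    """Yield one bytes chunk per (id, body) row in the assumed Swish-E "prog" layout."""
    for row_id, body in rows:
        # Content-Length counts bytes, so encode str bodies before measuring them.
        data = body.encode("utf-8") if isinstance(body, str) else body
        # Assumed header layout (Path-Name, Content-Length, blank line); verify in the docs.
        # path_prefix is hypothetical; the row id lets search hits map back to MySQL.
        header = (
            f"Path-Name: {path_prefix}{row_id}\n"
            f"Content-Length: {len(data)}\n"
            "\n"
        ).encode("ascii")
        yield header + data
```

A driver script would loop over a cursor and write each chunk with
sys.stdout.buffer.write().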

By the way, I'd be happy to discuss full-text indexing in some depth here.
I'd very much like to see it working well, and I have a lot of expertise in
that area; I used to be the product manager for advanced technology at
Verity.

--
Nick Arnett
Phone/fax: (408) 904-7198
[EMAIL PROTECTED]



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql