I'm trying to create a full-text index on a large two-column table (2.1 GB, 1.6M records; a primary key and a TEXT field), using MySQL 4.0.12 on Win2000. Everything looks like it is proceeding well at first: I see .TMP files in the database directory and a couple of temporary files in TEMP. It chugs along for an hour or so, then seems to stop doing anything at all. The process is not using any CPU cycles or I/O. The first time I tried this, I left it overnight, so it had many, many hours to complete.
The error log shows nothing related to this. It occurred to me that the problem might have to do with double-byte characters, so I moved those records into another table, cleaned out some leftovers and optimized the table. However, I'm not absolutely certain I got them all.

It then occurred to me to look at ASCII() of the first character of each field, and I'm trying that, but now I'm getting a long delay while the processlist says "Opening tables." Perhaps this is a clue as to what went wrong with FT indexing? I killed the process and am now trying to stop and restart the daemon, but it isn't responding. Ouch.

The text field contains bodies of e-mail, newsgroup and web forum messages. Some are multi-part MIME messages, so there are some long lines that are essentially garbage as far as full-text indexing is concerned. Might those also cause the problem I'm seeing?

I have some Python code that will strip out the MIME and double-byte stuff, but if there's a way to convince indexing to work, I'd rather go that way than have to build a cleaned-up copy of the table.

A couple of related feature ideas for FT indexing: skip words over N chars, and skip records using a WHERE clause.

Thanks for any suggestions. If I make any progress, I'll post. If I can't solve this soon, I'll be turning to Swish-E. Does anybody here have any Python code for MySQL <-> Swish-E?

By the way, I'd be happy to discuss full-text indexing in some depth here. I'd very much like to see it working well, and I have a lot of expertise in that area -- I used to be the product manager for advanced technology at Verity.

--
Nick Arnett
Phone/fax: (408) 904-7198
[EMAIL PROTECTED]

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
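
For anyone who ends up building the cleaned-up copy instead, the kind of cleanup pass described in the post might look roughly like the sketch below. This is not the poster's actual code: the function name `clean_for_fulltext` and the 40-character cutoff are assumptions made up for illustration. It drops non-ASCII characters and any whitespace-delimited token longer than the cutoff (which is how encoded MIME parts tend to show up), and it collapses runs of whitespace as a side effect.

```python
import re

# Hypothetical cutoff: tokens longer than this are treated as garbage
# (e.g. base64-encoded MIME parts). Not a value taken from the post.
MAX_WORD_LEN = 40


def clean_for_fulltext(text, max_word_len=MAX_WORD_LEN):
    """Return text with non-ASCII characters and over-long tokens removed."""
    # Replace every non-ASCII character with a space so neighbouring
    # words are not glued together.
    ascii_only = re.sub(r"[^\x00-\x7f]", " ", text)

    kept_lines = []
    for line in ascii_only.splitlines():
        # Keep only whitespace-delimited tokens at or under the cutoff.
        words = [w for w in line.split() if len(w) <= max_word_len]
        if words:
            kept_lines.append(" ".join(words))
    return "\n".join(kept_lines)
```

The result could then be written into a separate column or table and indexed in place of the raw message body.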