Re: Bug in Boolean mode fulltext searching.
Joe Rhett wrote: mysql select Notice_ID from Notices where match (Text) against ('+pollution +control' in boolean mode); Empty set (0.00 sec) mysql select Notice_ID from Notices where match (Text) against ('pollution control' in boolean mode); Empty set (0.02 sec) mysql select Notice_ID from Notices where Text like '%pollution control%'; +---+ | Notice_ID | +---+ |192090 | +---+ 1 row in set (5.00 sec) Your LIKE query is not equivalent to your MATCH AGAINST. For example, Text containing pollution controls would match LIKE '%pollution control%', but would not MATCH AGAINST ('+pollution +control'...). Have you looked at Text for this row to be sure it contains exactly pollution and control? No, it isn't equivalent but it would match less than the AND would, since it is an exact phrase. Yes, those words and that exact phrase exists multiple times. It has been confirmed as a bug. mysql select Notice_ID from Notices where match (Text) against ('+pollution +air' in boolean mode); Empty set (0.03 sec) mysql select Notice_ID from Notices where match (Text) against ('air pollution' in boolean mode); Empty set (0.00 sec) mysql select Notice_ID from Notices where Text like '%air pollution%'; +---+ | Notice_ID | +---+ |196349 | |196569 | |188183 | |192090 | |192686 | |199283 | +---+ 6 rows in set (0.17 sec) (NOTE on the search for air -- my.cnf has ft_min_word_len=3) Are you certain that air is indexed? What does SELECT COUNT(*) FROM Notices WHERE MATCH (Text) AGAINST ('air'); return? Just over 6 thousand results. And it was answered just below, here: All OR searches work perfectly fine, as per: mysql select count(*) from Notices where match (Text) against ('air pollution'); +--+ | count(*) | +--+ | 100 | +--+ 1 row in set (0.03 sec) How many do you get with SELECT COUNT(*) FROM Notices WHERE MATCH (Text) AGAINST ('pollution'); If air isn't indexed, I'd expect 0 hits for 'air' and 100 hits for 'pollution'. 17 hits for pollution. Do you normally talk down to people as if they are too retarted to have tested the obvious themselves? This wasn't a complaint or a how-to question, but a researched and tested bug. I specifically demonstrated that the phrases exist and that clearly stated that the OR searches worked. I'm sorry. It was not my intention to talk down to you or otherwise give offense, but given your obvious frustration with this problem, I chose my words poorly. I apologize. I wrote my reply before Sergei recognized the problem and told you how to fix it (and even he asked for more info). While you may have done days of research, we on the list only had the evidence you gave to go on. It is important when troubleshooting not to make assumptions. As you only provided query results, not actual data, I was hoping to elicit some additional info to help with the diagnosis. For example, you gave a query using LIKE to demonstrate that there should be at least one match for your first MATCH AGAINST, but your LIKE test is not a subset of your MATCH AGAINST. You are correct that LIKE will only return the subset of returns from MATCH AGAINST that are exact phrases, but it will also return things which MATCH AGAINST will not. I gave one examaple, namely a phrase containing pollution controls. The 's' at the end would satisfy the % of the LIKE test but 'controls' is not 'control' to MATCH AGAINST. You may know that there are many rows which should have matched, but you didn't show us one. In the case of 'air pollution', I must admit I failed to say what I was getting at. There have been questions before where mysql was not using ft_min_word_len=3 (something as trivial as putting it in the wrong section in my.cnf, for example) as intended. I thought it important to rule that out. Knowing that OR searches work does not rule that out. Try it: SELECT COUNT(*) FROM Notices WHERE MATCH (Text) AGAINST ('no pollution'); should return the same 17 rows as the simple MATCH AGAINST ('pollution') did. The word 'no' is too short, isn't indexed, and is thus ignored in your query. If ft_min_word_len had not been successfully set to 3, 'air' would have been ignored. In this message you provided the fact that a MATCH AGAINST 'air' returns over 6 thousand hits, which clearly shows that ft_min_word_len is set to 3, as you intended. Again, while you had probably already verified that air was indexed, you didn't show us. I did not take your message to be a complaint or a how-to question, but a real problem to be solved. In rereading my message, I can see how someone, especially someone already frustrated with the problem, could take my questions as a suggestion that you didn't know what you were doing. I didn't mean to do that, and I'm sorry I wrote it that way. But, to be fair, you must admit that for all your research, this was not confirmed as
Re: Bug in Boolean mode fulltext searching.
mysql select Notice_ID from Notices where match (Text) against ('+pollution +control' in boolean mode); Empty set (0.00 sec) mysql select Notice_ID from Notices where match (Text) against ('pollution control' in boolean mode); Empty set (0.02 sec) mysql select Notice_ID from Notices where Text like '%pollution control%'; +---+ | Notice_ID | +---+ |192090 | +---+ 1 row in set (5.00 sec) Your LIKE query is not equivalent to your MATCH AGAINST. For example, Text containing pollution controls would match LIKE '%pollution control%', but would not MATCH AGAINST ('+pollution +control'...). Have you looked at Text for this row to be sure it contains exactly pollution and control? No, it isn't equivalent but it would match less than the AND would, since it is an exact phrase. Yes, those words and that exact phrase exists multiple times. It has been confirmed as a bug. mysql select Notice_ID from Notices where match (Text) against ('+pollution +air' in boolean mode); Empty set (0.03 sec) mysql select Notice_ID from Notices where match (Text) against ('air pollution' in boolean mode); Empty set (0.00 sec) mysql select Notice_ID from Notices where Text like '%air pollution%'; +---+ | Notice_ID | +---+ |196349 | |196569 | |188183 | |192090 | |192686 | |199283 | +---+ 6 rows in set (0.17 sec) (NOTE on the search for air -- my.cnf has ft_min_word_len=3) Are you certain that air is indexed? What does SELECT COUNT(*) FROM Notices WHERE MATCH (Text) AGAINST ('air'); return? Just over 6 thousand results. And it was answered just below, here: All OR searches work perfectly fine, as per: mysql select count(*) from Notices where match (Text) against ('air pollution'); +--+ | count(*) | +--+ | 100 | +--+ 1 row in set (0.03 sec) How many do you get with SELECT COUNT(*) FROM Notices WHERE MATCH (Text) AGAINST ('pollution'); If air isn't indexed, I'd expect 0 hits for 'air' and 100 hits for 'pollution'. 17 hits for pollution. Do you normally talk down to people as if they are too retarted to have tested the obvious themselves? This wasn't a complaint or a how-to question, but a researched and tested bug. I specifically demonstrated that the phrases exist and that clearly stated that the OR searches worked. -- Joe Rhett Chief Geek [EMAIL PROTECTED] Isite Services, Inc. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Bug in Boolean mode fulltext searching.
Not a bug. In the manual, section Upgrading from Version 3.23 to 4.0, there is * To use `MATCH ... AGAINST (... IN BOOLEAN MODE)' with your tables, you need to rebuild them with `REPAIR TABLE table_name USE_FRM'. Based on a guess, or did you analyze the data file I sent? And if so, may I suggest that the upgrade documentation REALLY needs to be broken into sections? 1. Table changes 2. Privilege changes 3. Configuration changes 4. API/result changes Yeah, I did overlook that statement -- stuck between a large number of notes about result changes relative to character set implementations. -- Joe Rhett Chief Geek [EMAIL PROTECTED] Isite Services, Inc. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Bug in Boolean mode fulltext searching.
Hi! On Jan 13, Joe Rhett wrote: Not a bug. In the manual, section Upgrading from Version 3.23 to 4.0, there is * To use `MATCH ... AGAINST (... IN BOOLEAN MODE)' with your tables, you need to rebuild them with `REPAIR TABLE table_name USE_FRM'. Based on a guess, or did you analyze the data file I sent? Based on your data. There is a ft_dump utility program that comes from source distribution. I did ft_dump -d Notices 0|grep '\\(control\|pollution\)\' log then I noticed that entries are ordered by weight, not by rowid. It is what was changed in 4.0 to make boolean search to work, and it's what is fixed by `REPAIR TABLE table_name USE_FRM' And of course I tried this myself before writing to you :) And if so, may I suggest that the upgrade documentation REALLY needs to be broken into sections? 1. Table changes 2. Privilege changes 3. Configuration changes 4. API/result changes Yes, I agree. Not necesarily to these particular division, but to the fact that our upgrading sections are difficult to follow. I'm forwarding this request to our doc team. Regards, Sergei -- __ ___ ___ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Senior Software Developer /_/ /_/\_, /___/\___\_\___/ Osnabrueck, Germany ___/ www.mysql.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Bug in Boolean mode fulltext searching.
Based on a guess, or did you analyze the data file I sent? Based on your data. .. .. And of course I tried this myself before writing to you :) Cool. Thanks for the analysis. And if so, may I suggest that the upgrade documentation REALLY needs to be broken into sections? Yes, I agree. Not necesarily to these particular division, but to the fact that our upgrading sections are difficult to follow. I'm forwarding this request to our doc team. I don't really care what divisions, just that it's easier to separate out changes that need to be made to production databases during the upgrade versus changes that need to be made to code using the new system. -- Joe Rhett Chief Geek [EMAIL PROTECTED] Isite Services, Inc. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Bug in Boolean mode fulltext searching.
Hi! On Jan 07, [EMAIL PROTECTED] wrote: Description: Boolean mode fulltext searching returns zero hits for valid queries. How-To-Repeat: Create a database with a Text column. Add a fulltext index on it. Try to search for multiple words with AND or phrase syntax. Here are examples: mysql select Notice_ID from Notices where match (Text) against ('+pollution +control' in boolean mode); Empty set (0.00 sec) mysql select Notice_ID from Notices where match (Text) against ('pollution control' in boolean mode); Empty set (0.02 sec) mysql select Notice_ID from Notices where Text like '%pollution control%'; +---+ | Notice_ID | +---+ |192090 | +---+ 1 row in set (5.00 sec) Not a bug. In the manual, section Upgrading from Version 3.23 to 4.0, there is * To use `MATCH ... AGAINST (... IN BOOLEAN MODE)' with your tables, you need to rebuild them with `REPAIR TABLE table_name USE_FRM'. Regards, Sergei -- __ ___ ___ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Senior Software Developer /_/ /_/\_, /___/\___\_\___/ Osnabrueck, Germany ___/ www.mysql.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Bug in Boolean mode fulltext searching.
[EMAIL PROTECTED] wrote: Description: Boolean mode fulltext searching returns zero hits for valid queries. How-To-Repeat: Create a database with a Text column. Add a fulltext index on it. Try to search for multiple words with AND or phrase syntax. Here are examples: mysql select Notice_ID from Notices where match (Text) against ('+pollution +control' in boolean mode); Empty set (0.00 sec) mysql select Notice_ID from Notices where match (Text) against ('pollution control' in boolean mode); Empty set (0.02 sec) mysql select Notice_ID from Notices where Text like '%pollution control%'; +---+ | Notice_ID | +---+ |192090 | +---+ 1 row in set (5.00 sec) Your LIKE query is not equivalent to your MATCH AGAINST. For example, Text containing pollution controls would match LIKE '%pollution control%', but would not MATCH AGAINST ('+pollution +control'...). Have you looked at Text for this row to be sure it contains exactly pollution and control? mysql select Notice_ID from Notices where match (Text) against ('+pollution +air' in boolean mode); Empty set (0.03 sec) mysql select Notice_ID from Notices where match (Text) against ('air pollution' in boolean mode); Empty set (0.00 sec) mysql select Notice_ID from Notices where Text like '%air pollution%'; +---+ | Notice_ID | +---+ |196349 | |196569 | |188183 | |192090 | |192686 | |199283 | +---+ 6 rows in set (0.17 sec) (NOTE on the search for air -- my.cnf has ft_min_word_len=3) Are you certain that air is indexed? What does SELECT COUNT(*) FROM Notices WHERE MATCH (Text) AGAINST ('air'); return? All OR searches work perfectly fine, as per: mysql select count(*) from Notices where match (Text) against ('air pollution'); +--+ | count(*) | +--+ | 100 | +--+ 1 row in set (0.03 sec) How many do you get with SELECT COUNT(*) FROM Notices WHERE MATCH (Text) AGAINST ('pollution'); If air isn't indexed, I'd expect 0 hits for 'air' and 100 hits for 'pollution'. These are not overly common words: mysql select count(*) from Notices; +--+ | count(*) | +--+ |11990 | +--+ 1 row in set (0.00 sec) Fix: Use a WHERE text-column LIKE %phrase% for phrase searching. No known workaround for AND searches. Submitter-Id: submitter ID Originator: Joe Rhett Organization: Isite Services, Inc. MySQL support: none Synopsis: Boolean mode fulltext searching fails. Severity: serious Priority: high Category: mysql Class: sw-bug Release:mysql-4.0.16 (Source distribution) C compiler:2.95.3 C++ compiler: 2.95.3 Environment: System: SunOS web031 5.8 Generic_108529-23 i86pc i386 i86pc Architecture: i86pc Some paths: /usr/bin/perl Compilation info: CC='gcc' CFLAGS='' CXX='g++' CXXFLAGS='' LDFLAGS='' ASFLAGS='' LIBC: lrwxrwxrwx 1 root root 11 Sep 15 18:17 /lib/libc.so - ./libc.so.1 -rwxr-xr-x 1 root bin 956112 Jul 29 20:10 /lib/libc.so.1 lrwxrwxrwx 1 root root 11 Sep 15 18:17 /usr/lib/libc.so - ./libc.so.1 -rwxr-xr-x 1 root bin 956112 Jul 29 20:10 /usr/lib/libc.so.1 Configure command: ./configure '--prefix=/opt/mysql' '--localstatedir=/var/mysql' '--without-debug' -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Bug in Boolean mode fulltext searching.
Hi! On Jan 07, [EMAIL PROTECTED] wrote: Description: Boolean mode fulltext searching returns zero hits for valid queries. How-To-Repeat: Create a database with a Text column. Add a fulltext index on it. Try to search for multiple words with AND or phrase syntax. I tried - works ok. So - examples are not enough, I need a repeatable test case (e.g. your table data). Regards, Sergei -- __ ___ ___ __ / |/ /_ __/ __/ __ \/ / Sergei Golubchik [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Senior Software Developer /_/ /_/\_, /___/\___\_\___/ Osnabrueck, Germany ___/ www.mysql.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Bug in Boolean mode fulltext searching.
Description: Boolean mode fulltext searching returns zero hits for valid queries. How-To-Repeat: Create a database with a Text column. Add a fulltext index on it. Try to search for multiple words with AND or phrase syntax. Here are examples: mysql select Notice_ID from Notices where match (Text) against ('+pollution +control' in boolean mode); Empty set (0.00 sec) mysql select Notice_ID from Notices where match (Text) against ('pollution control' in boolean mode); Empty set (0.02 sec) mysql select Notice_ID from Notices where Text like '%pollution control%'; +---+ | Notice_ID | +---+ |192090 | +---+ 1 row in set (5.00 sec) mysql select Notice_ID from Notices where match (Text) against ('+pollution +air' in boolean mode); Empty set (0.03 sec) mysql select Notice_ID from Notices where match (Text) against ('air pollution' in boolean mode); Empty set (0.00 sec) mysql select Notice_ID from Notices where Text like '%air pollution%'; +---+ | Notice_ID | +---+ |196349 | |196569 | |188183 | |192090 | |192686 | |199283 | +---+ 6 rows in set (0.17 sec) (NOTE on the search for air -- my.cnf has ft_min_word_len=3) All OR searches work perfectly fine, as per: mysql select count(*) from Notices where match (Text) against ('air pollution'); +--+ | count(*) | +--+ | 100 | +--+ 1 row in set (0.03 sec) These are not overly common words: mysql select count(*) from Notices; +--+ | count(*) | +--+ |11990 | +--+ 1 row in set (0.00 sec) Fix: Use a WHERE text-column LIKE %phrase% for phrase searching. No known workaround for AND searches. Submitter-Id: submitter ID Originator:Joe Rhett Organization: Isite Services, Inc. MySQL support: none Synopsis: Boolean mode fulltext searching fails. Severity: serious Priority: high Category: mysql Class: sw-bug Release: mysql-4.0.16 (Source distribution) C compiler:2.95.3 C++ compiler: 2.95.3 Environment: System: SunOS web031 5.8 Generic_108529-23 i86pc i386 i86pc Architecture: i86pc Some paths: /usr/bin/perl Compilation info: CC='gcc' CFLAGS='' CXX='g++' CXXFLAGS='' LDFLAGS='' ASFLAGS='' LIBC: lrwxrwxrwx 1 root root 11 Sep 15 18:17 /lib/libc.so - ./libc.so.1 -rwxr-xr-x 1 root bin 956112 Jul 29 20:10 /lib/libc.so.1 lrwxrwxrwx 1 root root 11 Sep 15 18:17 /usr/lib/libc.so - ./libc.so.1 -rwxr-xr-x 1 root bin 956112 Jul 29 20:10 /usr/lib/libc.so.1 Configure command: ./configure '--prefix=/opt/mysql' '--localstatedir=/var/mysql' '--without-debug' -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]