Re: Bug in Boolean mode fulltext searching.

2004-01-25 Thread Michael Stassen
Joe Rhett wrote:

mysql select Notice_ID from Notices where match (Text) against 
('+pollution +control' in boolean mode);
Empty set (0.00 sec)

mysql select Notice_ID from Notices where match (Text) against 
('pollution control' in boolean mode);
Empty set (0.02 sec)

mysql select Notice_ID from Notices where Text like '%pollution 
control%';
+---+
| Notice_ID |
+---+
|192090 |
+---+
1 row in set (5.00 sec)
Your LIKE query is not equivalent to your MATCH AGAINST.  For example, 
Text containing pollution controls would match LIKE '%pollution 
control%', but would not MATCH AGAINST ('+pollution +control'...).  Have 
you looked at Text for this row to be sure it contains exactly 
pollution and control?
No, it isn't equivalent but it would match less than the AND would,
since it is an exact phrase.  Yes, those words and that exact phrase 
exists multiple times.  It has been confirmed as a bug.

mysql select Notice_ID from Notices where match (Text) against 
('+pollution +air' in boolean mode);
Empty set (0.03 sec)

mysql select Notice_ID from Notices where match (Text) against ('air 
pollution' in boolean mode);
Empty set (0.00 sec)

mysql select Notice_ID from Notices where Text like '%air pollution%';
+---+
| Notice_ID |
+---+
|196349 |
|196569 |
|188183 |
|192090 |
|192686 |
|199283 |
+---+
6 rows in set (0.17 sec)
(NOTE on the search for air -- my.cnf has ft_min_word_len=3)
Are you certain that air is indexed?  What does

 SELECT COUNT(*) FROM Notices WHERE MATCH (Text) AGAINST ('air');

return?


Just over 6 thousand results.  And it was answered just below, here:


All OR searches work perfectly fine, as per:

mysql select count(*) from Notices where match (Text) against ('air 
pollution');
+--+
| count(*) |
+--+
|  100 |
+--+
1 row in set (0.03 sec)
How many do you get with

 SELECT COUNT(*) FROM Notices WHERE MATCH (Text) AGAINST ('pollution');

If air isn't indexed, I'd expect 0 hits for 'air' and 100 hits for 
'pollution'.
 
17 hits for pollution.

Do you normally talk down to people as if they are too retarted to have
tested the obvious themselves?  This wasn't a complaint or a how-to
question, but a researched and tested bug.  I specifically demonstrated
that the phrases exist and that clearly stated that the OR searches worked.
I'm sorry.  It was not my intention to talk down to you or otherwise 
give offense, but given your obvious frustration with this problem, I 
chose my words poorly.  I apologize.

I wrote my reply before Sergei recognized the problem and told you how 
to fix it (and even he asked for more info).  While you may have done 
days of research, we on the list only had the evidence you gave to go 
on.  It is important when troubleshooting not to make assumptions.  As 
you only provided query results, not actual data, I was hoping to elicit 
some additional info to help with the diagnosis.

For example, you gave a query using LIKE to demonstrate that there 
should be at least one match for your first MATCH AGAINST, but your LIKE 
test is not a subset of your MATCH AGAINST.  You are correct that LIKE 
will only return the subset of returns from MATCH AGAINST that are exact 
phrases, but it will also return things which MATCH AGAINST will not.  I 
gave one examaple, namely a phrase containing pollution controls.  The 
's' at the end would satisfy the % of the LIKE test but 'controls' is 
not 'control' to MATCH AGAINST.  You may know that there are many rows 
which should have matched, but you didn't show us one.

In the case of 'air pollution', I must admit I failed to say what I was 
getting at.  There have been questions before where mysql was not using 
ft_min_word_len=3 (something as trivial as putting it in the wrong 
section in my.cnf, for example) as intended.  I thought it important to 
rule that out.  Knowing that OR searches work does not rule that out. 
Try it:

  SELECT COUNT(*) FROM Notices
  WHERE MATCH (Text) AGAINST ('no pollution');
should return the same 17 rows as the simple MATCH AGAINST ('pollution') 
did.  The word 'no' is too short, isn't indexed, and is thus ignored in 
your query.  If ft_min_word_len had not been successfully set to 3, 
'air' would have been ignored.  In this message you provided the fact 
that a MATCH AGAINST 'air' returns over 6 thousand hits, which clearly 
shows that ft_min_word_len is set to 3, as you intended.  Again, while 
you had probably already verified that air was indexed, you didn't show us.

I did not take your message to be a complaint or a how-to question, 
but a real problem to be solved.  In rereading my message, I can see how 
someone, especially someone already frustrated with the problem, could 
take my questions as a suggestion that you didn't know what you were 
doing.  I didn't mean to do that, and I'm sorry I wrote it that way. 
But, to be fair, you must admit that for all your research, this was not 
confirmed as 

Re: Bug in Boolean mode fulltext searching.

2004-01-13 Thread Joe Rhett
 mysql select Notice_ID from Notices where match (Text) against 
 ('+pollution +control' in boolean mode);
 Empty set (0.00 sec)
 
 mysql select Notice_ID from Notices where match (Text) against 
 ('pollution control' in boolean mode);
 Empty set (0.02 sec)
 
 mysql select Notice_ID from Notices where Text like '%pollution 
 control%';
 +---+
 | Notice_ID |
 +---+
 |192090 |
 +---+
 1 row in set (5.00 sec)
 
 Your LIKE query is not equivalent to your MATCH AGAINST.  For example, 
 Text containing pollution controls would match LIKE '%pollution 
 control%', but would not MATCH AGAINST ('+pollution +control'...).  Have 
 you looked at Text for this row to be sure it contains exactly 
 pollution and control?

No, it isn't equivalent but it would match less than the AND would,
since it is an exact phrase.  Yes, those words and that exact phrase 
exists multiple times.  It has been confirmed as a bug.

 mysql select Notice_ID from Notices where match (Text) against 
 ('+pollution +air' in boolean mode);
 Empty set (0.03 sec)
 
 mysql select Notice_ID from Notices where match (Text) against ('air 
 pollution' in boolean mode);
 Empty set (0.00 sec)
 
 mysql select Notice_ID from Notices where Text like '%air pollution%';
 +---+
 | Notice_ID |
 +---+
 |196349 |
 |196569 |
 |188183 |
 |192090 |
 |192686 |
 |199283 |
 +---+
 6 rows in set (0.17 sec)
 
 (NOTE on the search for air -- my.cnf has ft_min_word_len=3)
 
 Are you certain that air is indexed?  What does
 
   SELECT COUNT(*) FROM Notices WHERE MATCH (Text) AGAINST ('air');
 
 return?

Just over 6 thousand results.  And it was answered just below, here:

 All OR searches work perfectly fine, as per:
 
 mysql select count(*) from Notices where match (Text) against ('air 
 pollution');
 +--+
 | count(*) |
 +--+
 |  100 |
 +--+
 1 row in set (0.03 sec)
 
 How many do you get with
 
   SELECT COUNT(*) FROM Notices WHERE MATCH (Text) AGAINST ('pollution');
 
 If air isn't indexed, I'd expect 0 hits for 'air' and 100 hits for 
 'pollution'.
 
17 hits for pollution.

Do you normally talk down to people as if they are too retarted to have
tested the obvious themselves?  This wasn't a complaint or a how-to
question, but a researched and tested bug.  I specifically demonstrated
that the phrases exist and that clearly stated that the OR searches worked.

-- 
Joe Rhett  Chief Geek
[EMAIL PROTECTED]  Isite Services, Inc.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Bug in Boolean mode fulltext searching.

2004-01-13 Thread Joe Rhett
 Not a bug.
 In the manual, section Upgrading from Version 3.23 to 4.0, there is
 
* To use `MATCH ... AGAINST (... IN BOOLEAN MODE)' with your tables,
  you need to rebuild them with `REPAIR TABLE table_name USE_FRM'.
 
Based on a guess, or did you analyze the data file I sent?

And if so, may I suggest that the upgrade documentation REALLY needs to
be broken into sections?

1. Table changes
2. Privilege changes
3. Configuration changes
4. API/result changes

Yeah, I did overlook that statement -- stuck between a large number of
notes about result changes relative to character set implementations.

-- 
Joe Rhett  Chief Geek
[EMAIL PROTECTED]  Isite Services, Inc.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Bug in Boolean mode fulltext searching.

2004-01-13 Thread Sergei Golubchik
Hi!

On Jan 13, Joe Rhett wrote:
  Not a bug.
  In the manual, section Upgrading from Version 3.23 to 4.0, there is
  
 * To use `MATCH ... AGAINST (... IN BOOLEAN MODE)' with your tables,
   you need to rebuild them with `REPAIR TABLE table_name USE_FRM'.
  
 Based on a guess, or did you analyze the data file I sent?

Based on your data.
There is a ft_dump utility program that comes from source distribution.

I did 

  ft_dump -d Notices 0|grep '\\(control\|pollution\)\'  log

then I noticed that entries are ordered by weight, not by rowid.
It is what was changed in 4.0 to make boolean search to work,
and it's what is fixed by `REPAIR TABLE table_name USE_FRM'

And of course I tried this myself before writing to you :)
 
 And if so, may I suggest that the upgrade documentation REALLY needs to
 be broken into sections?
 
 1. Table changes
 2. Privilege changes
 3. Configuration changes
 4. API/result changes

Yes, I agree. Not necesarily to these particular division, but to the
fact that our upgrading sections are difficult to follow.
I'm forwarding this request to our doc team.
 
Regards,
Sergei

-- 
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik [EMAIL PROTECTED]
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, Senior Software Developer
/_/  /_/\_, /___/\___\_\___/  Osnabrueck, Germany
   ___/  www.mysql.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Bug in Boolean mode fulltext searching.

2004-01-13 Thread Joe Rhett
  Based on a guess, or did you analyze the data file I sent?
 
 Based on your data.
..  ..
 And of course I tried this myself before writing to you :)
  
Cool. Thanks for the analysis.

  And if so, may I suggest that the upgrade documentation REALLY needs to
  be broken into sections?
 
 Yes, I agree. Not necesarily to these particular division, but to the
 fact that our upgrading sections are difficult to follow.
 I'm forwarding this request to our doc team.
  
I don't really care what divisions, just that it's easier to separate out
changes that need to be made to production databases during the upgrade
versus changes that need to be made to code using the new system.

-- 
Joe Rhett  Chief Geek
[EMAIL PROTECTED]  Isite Services, Inc.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Bug in Boolean mode fulltext searching.

2004-01-12 Thread Sergei Golubchik
Hi!

On Jan 07, [EMAIL PROTECTED] wrote:
 Description:
   Boolean mode fulltext searching returns zero hits for valid queries. 
 How-To-Repeat:
   Create a database with a Text column.  Add a fulltext index on it.
   Try to search for multiple words with AND or phrase syntax.
 
   Here are examples:
 
 mysql select Notice_ID from Notices where match (Text) against
 ('+pollution +control' in boolean mode);
 Empty set (0.00 sec)
 
 mysql select Notice_ID from Notices where match (Text) against
 ('pollution control' in boolean mode);
 Empty set (0.02 sec)
 
 mysql select Notice_ID from Notices where Text like '%pollution control%';  

 +---+
 | Notice_ID |
 +---+
 |192090 |
 +---+
 1 row in set (5.00 sec)

Not a bug.
In the manual, section Upgrading from Version 3.23 to 4.0, there is

   * To use `MATCH ... AGAINST (... IN BOOLEAN MODE)' with your tables,
 you need to rebuild them with `REPAIR TABLE table_name USE_FRM'.

Regards,
Sergei

-- 
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik [EMAIL PROTECTED]
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, Senior Software Developer
/_/  /_/\_, /___/\___\_\___/  Osnabrueck, Germany
   ___/  www.mysql.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Bug in Boolean mode fulltext searching.

2004-01-11 Thread Michael Stassen
[EMAIL PROTECTED] wrote:

Description:
	Boolean mode fulltext searching returns zero hits for valid queries. 

How-To-Repeat:
Create a database with a Text column.  Add a fulltext index on it.
Try to search for multiple words with AND or phrase syntax.
	Here are examples:

mysql select Notice_ID from Notices where match (Text) against ('+pollution +control' 
in boolean mode);
Empty set (0.00 sec)
mysql select Notice_ID from Notices where match (Text) against ('pollution control' 
in boolean mode);
Empty set (0.02 sec)
mysql select Notice_ID from Notices where Text like '%pollution control%'; 
+---+
| Notice_ID |
+---+
|192090 |
+---+
1 row in set (5.00 sec)
Your LIKE query is not equivalent to your MATCH AGAINST.  For example, 
Text containing pollution controls would match LIKE '%pollution 
control%', but would not MATCH AGAINST ('+pollution +control'...).  Have 
you looked at Text for this row to be sure it contains exactly 
pollution and control?

mysql select Notice_ID from Notices where match (Text) against ('+pollution +air' in 
boolean mode);
Empty set (0.03 sec)
mysql select Notice_ID from Notices where match (Text) against ('air pollution' in 
boolean mode);
Empty set (0.00 sec)
mysql select Notice_ID from Notices where Text like '%air pollution%';
+---+
| Notice_ID |
+---+
|196349 |
|196569 |
|188183 |
|192090 |
|192686 |
|199283 |
+---+
6 rows in set (0.17 sec)
(NOTE on the search for air -- my.cnf has ft_min_word_len=3)
Are you certain that air is indexed?  What does

  SELECT COUNT(*) FROM Notices WHERE MATCH (Text) AGAINST ('air');

return?

All OR searches work perfectly fine, as per:

mysql select count(*) from Notices where match (Text) against ('air pollution');
+--+
| count(*) |
+--+
|  100 |
+--+
1 row in set (0.03 sec)
How many do you get with

  SELECT COUNT(*) FROM Notices WHERE MATCH (Text) AGAINST ('pollution');

If air isn't indexed, I'd expect 0 hits for 'air' and 100 hits for 
'pollution'.

These are not overly common words:

mysql select count(*) from Notices;
+--+
| count(*) |
+--+
|11990 |
+--+
1 row in set (0.00 sec)

Fix:
Use a WHERE text-column LIKE %phrase%  for phrase searching.
No known workaround for AND searches.

Submitter-Id:   submitter ID
Originator: Joe Rhett
Organization:   
	Isite Services, Inc.

MySQL support: none
Synopsis:   Boolean mode fulltext searching fails.
Severity:   serious
Priority:   high
Category:   mysql
Class:  sw-bug
Release:mysql-4.0.16 (Source distribution)


C compiler:2.95.3
C++ compiler:  2.95.3
Environment:

System: SunOS web031 5.8 Generic_108529-23 i86pc i386 i86pc
Architecture: i86pc
Some paths:  /usr/bin/perl

Compilation info: CC='gcc'  CFLAGS=''  CXX='g++'  CXXFLAGS=''  LDFLAGS=''  ASFLAGS=''
LIBC: 
lrwxrwxrwx   1 root root  11 Sep 15 18:17 /lib/libc.so - ./libc.so.1
-rwxr-xr-x   1 root bin   956112 Jul 29 20:10 /lib/libc.so.1
lrwxrwxrwx   1 root root  11 Sep 15 18:17 /usr/lib/libc.so - ./libc.so.1
-rwxr-xr-x   1 root bin   956112 Jul 29 20:10 /usr/lib/libc.so.1
Configure command: ./configure '--prefix=/opt/mysql' '--localstatedir=/var/mysql' '--without-debug'




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: Bug in Boolean mode fulltext searching.

2004-01-08 Thread Sergei Golubchik
Hi!

On Jan 07, [EMAIL PROTECTED] wrote:
 Description:
   Boolean mode fulltext searching returns zero hits for valid queries. 
 How-To-Repeat:
   Create a database with a Text column.  Add a fulltext index on it.
   Try to search for multiple words with AND or phrase syntax.

I tried - works ok.

So - examples are not enough, I need a repeatable test case (e.g. your
table data).

Regards,
Sergei

-- 
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik [EMAIL PROTECTED]
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, Senior Software Developer
/_/  /_/\_, /___/\___\_\___/  Osnabrueck, Germany
   ___/  www.mysql.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Bug in Boolean mode fulltext searching.

2004-01-07 Thread jrhett
Description:
Boolean mode fulltext searching returns zero hits for valid queries. 
How-To-Repeat:
Create a database with a Text column.  Add a fulltext index on it.
Try to search for multiple words with AND or phrase syntax.

Here are examples:

mysql select Notice_ID from Notices where match (Text) against ('+pollution +control' 
in boolean mode);
Empty set (0.00 sec)

mysql select Notice_ID from Notices where match (Text) against ('pollution control' 
in boolean mode);
Empty set (0.02 sec)

mysql select Notice_ID from Notices where Text like '%pollution control%';
 
+---+
| Notice_ID |
+---+
|192090 |
+---+
1 row in set (5.00 sec)

mysql select Notice_ID from Notices where match (Text) against ('+pollution +air' in 
boolean mode);
Empty set (0.03 sec)

mysql select Notice_ID from Notices where match (Text) against ('air pollution' in 
boolean mode);
Empty set (0.00 sec)

mysql select Notice_ID from Notices where Text like '%air pollution%';
+---+
| Notice_ID |
+---+
|196349 |
|196569 |
|188183 |
|192090 |
|192686 |
|199283 |
+---+
6 rows in set (0.17 sec)

(NOTE on the search for air -- my.cnf has ft_min_word_len=3)

All OR searches work perfectly fine, as per:

mysql select count(*) from Notices where match (Text) against ('air pollution');
+--+
| count(*) |
+--+
|  100 |
+--+
1 row in set (0.03 sec)

These are not overly common words:

mysql select count(*) from Notices;
+--+
| count(*) |
+--+
|11990 |
+--+
1 row in set (0.00 sec)

Fix:
Use a WHERE text-column LIKE %phrase%  for phrase searching.
No known workaround for AND searches.

Submitter-Id:  submitter ID
Originator:Joe Rhett
Organization:  
Isite Services, Inc.
MySQL support: none
Synopsis:  Boolean mode fulltext searching fails.
Severity:  serious
Priority:  high
Category:  mysql
Class: sw-bug
Release:   mysql-4.0.16 (Source distribution)

C compiler:2.95.3
C++ compiler:  2.95.3
Environment:

System: SunOS web031 5.8 Generic_108529-23 i86pc i386 i86pc
Architecture: i86pc

Some paths:  /usr/bin/perl

Compilation info: CC='gcc'  CFLAGS=''  CXX='g++'  CXXFLAGS=''  LDFLAGS=''  ASFLAGS=''
LIBC: 
lrwxrwxrwx   1 root root  11 Sep 15 18:17 /lib/libc.so - ./libc.so.1
-rwxr-xr-x   1 root bin   956112 Jul 29 20:10 /lib/libc.so.1
lrwxrwxrwx   1 root root  11 Sep 15 18:17 /usr/lib/libc.so - ./libc.so.1
-rwxr-xr-x   1 root bin   956112 Jul 29 20:10 /usr/lib/libc.so.1
Configure command: ./configure '--prefix=/opt/mysql' '--localstatedir=/var/mysql' 
'--without-debug'


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]