Re: Distributed Fulltext?

2002-02-14 Thread Alex Aulbach

Wednesday, from Mike Wexler:
 I don't think that would be appropriate. My example, is our site (tias.com) has
 lots of antiques and collectibles. One popular categories is jewelry. If
 somebody does a search for gold jewelry and the search engine interprets this
 as anything that mentions gold or jewelry. It is going to match a lot of items.
 It would be nice if we could use explain or something like it to get a rough
 estimate of how many results a query would generate, and if it was really bad,
 we could tell the user to be more specific.

This is not a solution, but we make it by using the sql query

SELECT COUNT(*) FROM table WHERE MATCH(index) AGAINST ('gold')
(results e.g. in 100)

and

SELECT COUNT(*) FROM table WHERE MATCH(index) AGAINST ('jewelry')
(results e.g. in 200)

OR-Search: The result is between 300 and 500 matches.
AND-Search: The result is between 0 and 200 matches.

The problem is: The queries lasts nearly as fast, as the complete search.
:)

-- 

SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Distributed Fulltext?

2002-02-14 Thread Alex Aulbach

Wednesday, from David Axmark:
  Your other point about exact vs. approximate answers is unclear, I expect
  that Google's answers are exact for their currently available indexes at any
  given time.  But even if they are approximate, I'd be happy with that too.
  The scoring on a FULLTEXT search in Mysql is exact but based on a
  formula that is approximate anyway.

 No, MySQL returns all data according to a search. Web engines return
 what they they find on one search machine. So you can get different
 results with Google every time you hit refresh if you are routed to
 different machines. This had happened to me when I was looking for the
 number of matches and not the result itself.

 So we should try to make fulltext searches with a limit between 10 and
 100 be fast to be closer to google.

 I have also head about some other things web search engines do since I
 know some people at FAST but I have forgot that already.

My opinion is, that mySQL itself never should try to find approximate
matches. This is against the definition of SQL itself. SQL is a fourth
generation language. That means, if you say SELECT, the engine selects.
And it has to be as exactly that, what I have searched, every time, on
every machine in any combination with the same data.

So SQL needs a new language construct to make an approximate search. But
what is an approximate search? How is approximate defined?

I don't think it is a good idea to implement it in this way.
Approximazation must be always done on the application level, cause it is
highly dependend on application, what an approximate result could be.

 We will try to make every feature as good as possible. But we do have
 limited resources.

Exactly. FTS is not so important as other features and people which want
you to include a new feature should think about supporting mysql with
money. :-)

But (yes, we support mysql! :-) I think the need is growing rapidly, cause
the amount of data, that has to be indexed is growing over the years. And
other DB's have much more experices with it. Currently we can live with
the speed. Those who cannot live with it should buy better machines,
think about their SE-concept or support mysql.

Search engines techniques are *not* trivial, so the last way is in my eyes
one of the cheapest.


 Well there is always the option of sponsoring further fulltext
 development. We have a guy who has been working on the GNU fulltext
 engines who is interesting in working with MySQL fulltext. But for the
 moment we can not afford it.

This was my first thought: People write about speed problems and how to
cluster and so on. Things, that I would calculate with weeks and high TCO.

But it maybe much cheaper to pay mySQL for this. How much do you estimate
would it cost to implement inverted files? I think this is difficult,
cause Sergei told me, that he couldn't use mySQL-index files any more.

I just ask, nothing special in sight, but many questions from everyone who
needs it. Cause FTS is a feature which highly improves the value of a web
site. And coustomers have no problem to pay for things they think they get
money for. FTS is such a thing.

But perhaps if we know, under which circumstances FTS is improved, it is
easier for us to find a possible way to share the costs for it or find a
compromise. I also understand, if mySQL don't want to speak about it
here.

I think it is also important for us, how much it can be theoretically
improved. My calculations showed me a theoretical speed up of factor 100
or so. This is ... wow. But in live everything is most times slower...


 So if some of you are interested in sponsoring this (or know about
 others who might be) write to [EMAIL PROTECTED]

Or like this... maybe we find coustomers who needs it. Think it's
possible.

My personal feeling is and my stomach says, that fulltext indexing is a
feature, which needs to be expanded.

-- 

SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris



-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




RE: long query on php

2002-02-13 Thread Alex Aulbach

Yesterday, from savaidis:

 The obious question is: (before I test it)
 This is concatenation to $query that is a string type, no?

Yea. The following works either:

mysql_query(create bla.
bla.
bla
bla blabla
.
bla
.bla
);



 So the  limit isn't  also set to 255 chars too?

There is no such limit in PHP.

 Or is a trick especially for this case?

There is no any bit nothing trick not. Read PHP-Doku about defintions of
strings.

-- 

SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: long query on php

2002-02-11 Thread Alex Aulbach

Yesterday, from Nathan:

 Since PHP ignores whitespace, I think this is also acceptable:

PHP doesn't ignore whitespace in a quote. But your code is of course
correct.

 $query = CREATE TABLE query
   tabledef for col a,
   tabledef for col b,
   lots more table defs,
   .
   welcome to line 400;

Another way: (without test)

$bladefault=24;
$query = eoq;
CREATE TABLE query (
  idchar(2) default 24 # doublequotes work
  hugo  char(2) default '24' # singlequotes work either
  bla   char(2) default '$bladefault' # inline vars work
...
)
eoq


--
 How is possible to pass a long query to MySQL server with php?
 I mean i.e a create table statement with more than 400 chars.
 Have I to use shorter create and then alter?

Strings in PHP can be really, really long and are binary safe. Perhaps the
problem is the editor?


-- 

SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: FULL-TEXT Index / Searching Question (fwd)

2002-02-08 Thread Alex Aulbach


Tuesday, from Matt Rudderham:

 Hello, I have two tables in my database as such:

 CREATE TABLE `skill_names` (
   `id` bigint(20) NOT NULL auto_increment,
   `name` varchar(30) NOT NULL default '',
   PRIMARY KEY  (`id`)
 );

 CREATE TABLE `skills` (
   `skills_id` int(11) NOT NULL auto_increment,
   `member_id` int(11) NOT NULL default '0',
   `schooling` varchar(100) default NULL,
   `certifications` varchar(20) NOT NULL default '',
   `description` blob NOT NULL,
   `skill_name_id` bigint(20) NOT NULL default '0',
   PRIMARY KEY  (`skills_id`)
 );

 I would like to make full text indexes of the skills table as well as
 the other tables in the database.  My question is that I would like to
 be able to search for the Member_ID's that have a certain skill name.
 How would I accomplish this? Also, right now the database has about 300
 records, the database runs on a Pentium 200  with 96Mb. Can it handle
 this not much traffic? Thanks.

I must read this a while to understand... sql sql sql

This is, what you mean:

CREATE FULLTEXT INDEX name (name)... hhhm forgot the correct syntax be
free to look into docs

SELECT * FROM skills,skillnames WHERE skill_name_id=id and
  MATCH(name) AGAINST('YOUR SEARCHED SKILLS');


BTW: These tables are suboptimal. You can reduce them to one table. This
kind of parting the tables makes only sense, if you have 3 Billion not 300
records.



-- 

SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris



-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Distributed Fulltext?

2002-02-08 Thread Alex Aulbach

Yesterday, from Brian DeFeyter:

 Has anyone made a suggestion or thought about ways to distribute
 databases which focus on fulltext indexes?

 fulltext indexes do a good job of indexing a moderate amount of data,
 but when you get a lot of data to be indexed, the queries slow down
 significantly.

Yea, this is cause the method to search the index is a geometric
algorithm.

 I have an example table, with about 90 million rows.. and has a fulltext
 index on a varchar(100) field. A single-word query which would return
 approx 300k results takes an average of 15 seconds. A query with smaller
 results (~ 10k) can be as quick as 1 sec.. which I would consider
 acceptable.

The only interesting thing is, how many words have to be indexed (how many
rows is not very important), how big grows your index (does it go in the
memory) and how many rows can be found for one word.

These are the most depending things.

 Has any thought about splitting the data into distributed files or even
 machines? ie: something as simple as 'words' starting with 'X' are split
 into a-h, i-p, q-z... or something more advanced? (maybe mysqld could
 automatically split results based on (#results per unique 'word' /
 desired # of 'split files/machines') Would such a system give any
 advantages to searching speed and concurrenct query scalability? I
 haven't looked at the fulltext internals.. so I don't know if such
 query routing could take place or not.

Hum, I think it's *much* cheaper is to come together and pay the mySQL
people to introduce a new feature into mySQL called inverted files. This
method is in short, that you only store the word and in which records it
can be found. This can redurce the size of indexes and so the speed
dramatically.

 If nothing else, does anyone else have experience with a table of this
 size or even larger? What kind of tuning have you done?

We have made for example an extra table for indexing on an extra server.
This is good, cause

- mySQL can concentrate only for this table, so the
machine dosn't need to swap

- big searches or reindex dosn't bother the rest of the system

- indexing can be done via a cron-job

- we optimized the write routines, for example we stripped all tags out
and wrote special stop-lists, before we write it.



BTW: My wishlist for fulltext indexing:
---

- inverted files

- rules to define words

- stop-word-lists and stop-word-regex

- a function which returns a table, which can tell me, what can be
searched.
E.g SHOW MATCH(table.index) AGAINST('hugobla hugo bla');

 word  count rows stopword   autostopword   score wscore
 hugobla  22no no 10  1
 hugo106no no  8  1
 bla  00no yes 0  1

Words can be searched, depending on rules, stop-words and what has been
indexed.

- configurable scoring, e.g. very fast (and simple) scoring for speed

- special scores for special words

- a cache, which stores often used words and the rows


-- 

SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




RE: long query on php

2002-02-07 Thread Alex Aulbach

Yesterday, from savaidis:

 The obious question is: (before I test it)
 This is concatenation to $query that is a string type, no?

Yea. The following works either:

mysql_query(create bla.
bla.
bla
bla blabla
.
bla
.bla
);



 So the  limit isn't  also set to 255 chars too?

There is no such limit in PHP.

 Or is a trick especially for this case?

There is no any bit nothing trick not. Read PHP-Doku about defintions of
strings.

-- 

SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: long query on php

2002-02-07 Thread Alex Aulbach

Yesterday, from Nathan:

 Since PHP ignores whitespace, I think this is also acceptable:

PHP doesn't ignore whitespace in a quote. But your code is of course
correct.

 $query = CREATE TABLE query
   tabledef for col a,
   tabledef for col b,
   lots more table defs,
   .
   welcome to line 400;

Another way: (without test)

$bladefault=24;
$query = eoq;
CREATE TABLE query (
  idchar(2) default 24 # doublequotes work
  hugo  char(2) default '24' # singlequotes work either
  bla   char(2) default '$bladefault' # inline vars work
...
)
eoq


--
 How is possible to pass a long query to MySQL server with php?
 I mean i.e a create table statement with more than 400 chars.
 Have I to use shorter create and then alter?

Strings in PHP can be really, really long and are binary safe. Perhaps the
problem is the editor?


-- 

SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Fulltext problems 4.0.1 part two (fwd)

2002-02-03 Thread Alex Aulbach


I found out, that the fulltext engine is misbehaving with less than 5
entries. Seems to me, that this is exactly the same case...

Perhaps a flush inside the engine or so isn't made with such less entries?
The bug is also in 3.23 gamma.

Workaround: Enter more rows to fix it. :)

I send this bug 1 month ago to the developer who has done the fulltext
engine (I just forgot the name and the adresses are in the company - a
russian name). Perhaps I should have sent it better to bugs...

-- 

SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris

-- Forwarded message --
Date: Sat, 2 Feb 2002 11:41:36 +0300
From: Alexander Belyaev [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: Fulltext problems 4.0.1 part two

Hello,

I have found a bug( or bad feature? :) ) in 4.0.1
boolean fulltext search

Please help...


Test case with table can be found at http://logo.nino.ru/fulltextbug.tar.gz

Alexander

( sql database query table for you, spam protector:)

--
show variables like 'ft%'
--

+--++
| Variable_name| Value  |
+--++
| ft_min_word_len  | 2  |
| ft_max_word_len  | 254|
| ft_max_word_len_for_sort | 20 |
| ft_boolean_syntax| + -()~*:| |
+--++
4 rows in set (0.00 sec)

--
show index from logo_text
--

+---++--+--+-+--
-+-+--++--+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation
| Cardinality | Sub_part | Packed | Comment  |
+---++--+--+-+--
-+-+--++--+
| logo_text |  0 | PRIMARY  |1 | id  | A
|   38347 | NULL | NULL   |  |
| logo_text |  1 |  |1 | txt | A
|   38347 |1 | NULL   | FULLTEXT |
+---++--+--+-+--
-+-+--++--+
2 rows in set (0.00 sec)

--
check table logo_text
--

++---+--+---
--+
| Table  | Op| Msg_type | Msg_text
|
++---+--+---
--+
| test.logo_text | check | warning  | Found 38347 parts
Should be: 0 parts |
| test.logo_text | check | status   | OK
|
++---+--+---
--+
2 rows in set (0.40 sec)

--
repair table logo_text
--

+++--+--+
| Table  | Op | Msg_type | Msg_text |
+++--+--+
| test.logo_text | repair | status   | OK   |
+++--+--+
1 row in set (3.02 sec)

--
check table logo_text
--

++---+--+---
--+
| Table  | Op| Msg_type | Msg_text
|
++---+--+---
--+
| test.logo_text | check | warning  | Found 38347 parts
Should be: 0 parts |
| test.logo_text | check | status   | OK
|
++---+--+---
--+
2 rows in set (0.45 sec)

--
select id,txt from test.logo_text where match(txt) against ('+desig*
+service' in boolean mode)
--

+---+---
-+
| id| txt
|
+---+---
-+
| 26570 | wave design a full service design agency seth brewer
http://www.wavedesign.com |
| 38847 | manucom hands on it service j.shannon design
|
+---+---
-+
2 rows in set (0.03 sec)

--
select id,txt from test.logo_text where match(txt) against ('+design
+service' in boolean mode)
--

+---+---
-+
| id| txt
|
+---+---
-+
|  1405 | design service ?? ??
|
|  1480 | edv-service grafik-design
|
| 26570 | wave design a full service design agency seth brewer
http://www.wavedesign.com |
| 38847 | manucom hands on it service j.shannon design
|
+---+---
-+
4 rows in set (0.03 sec)

Bye





-
Before posting, please check: