Re: Distributed Fulltext?
Wednesday, from Mike Wexler: I don't think that would be appropriate. My example, is our site (tias.com) has lots of antiques and collectibles. One popular categories is jewelry. If somebody does a search for gold jewelry and the search engine interprets this as anything that mentions gold or jewelry. It is going to match a lot of items. It would be nice if we could use explain or something like it to get a rough estimate of how many results a query would generate, and if it was really bad, we could tell the user to be more specific. This is not a solution, but we make it by using the sql query SELECT COUNT(*) FROM table WHERE MATCH(index) AGAINST ('gold') (results e.g. in 100) and SELECT COUNT(*) FROM table WHERE MATCH(index) AGAINST ('jewelry') (results e.g. in 200) OR-Search: The result is between 300 and 500 matches. AND-Search: The result is between 0 and 200 matches. The problem is: The queries lasts nearly as fast, as the complete search. :) -- SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Distributed Fulltext?
Wednesday, from David Axmark: Your other point about exact vs. approximate answers is unclear, I expect that Google's answers are exact for their currently available indexes at any given time. But even if they are approximate, I'd be happy with that too. The scoring on a FULLTEXT search in Mysql is exact but based on a formula that is approximate anyway. No, MySQL returns all data according to a search. Web engines return what they they find on one search machine. So you can get different results with Google every time you hit refresh if you are routed to different machines. This had happened to me when I was looking for the number of matches and not the result itself. So we should try to make fulltext searches with a limit between 10 and 100 be fast to be closer to google. I have also head about some other things web search engines do since I know some people at FAST but I have forgot that already. My opinion is, that mySQL itself never should try to find approximate matches. This is against the definition of SQL itself. SQL is a fourth generation language. That means, if you say SELECT, the engine selects. And it has to be as exactly that, what I have searched, every time, on every machine in any combination with the same data. So SQL needs a new language construct to make an approximate search. But what is an approximate search? How is approximate defined? I don't think it is a good idea to implement it in this way. Approximazation must be always done on the application level, cause it is highly dependend on application, what an approximate result could be. We will try to make every feature as good as possible. But we do have limited resources. Exactly. FTS is not so important as other features and people which want you to include a new feature should think about supporting mysql with money. :-) But (yes, we support mysql! :-) I think the need is growing rapidly, cause the amount of data, that has to be indexed is growing over the years. And other DB's have much more experices with it. Currently we can live with the speed. Those who cannot live with it should buy better machines, think about their SE-concept or support mysql. Search engines techniques are *not* trivial, so the last way is in my eyes one of the cheapest. Well there is always the option of sponsoring further fulltext development. We have a guy who has been working on the GNU fulltext engines who is interesting in working with MySQL fulltext. But for the moment we can not afford it. This was my first thought: People write about speed problems and how to cluster and so on. Things, that I would calculate with weeks and high TCO. But it maybe much cheaper to pay mySQL for this. How much do you estimate would it cost to implement inverted files? I think this is difficult, cause Sergei told me, that he couldn't use mySQL-index files any more. I just ask, nothing special in sight, but many questions from everyone who needs it. Cause FTS is a feature which highly improves the value of a web site. And coustomers have no problem to pay for things they think they get money for. FTS is such a thing. But perhaps if we know, under which circumstances FTS is improved, it is easier for us to find a possible way to share the costs for it or find a compromise. I also understand, if mySQL don't want to speak about it here. I think it is also important for us, how much it can be theoretically improved. My calculations showed me a theoretical speed up of factor 100 or so. This is ... wow. But in live everything is most times slower... So if some of you are interested in sponsoring this (or know about others who might be) write to [EMAIL PROTECTED] Or like this... maybe we find coustomers who needs it. Think it's possible. My personal feeling is and my stomach says, that fulltext indexing is a feature, which needs to be expanded. -- SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
RE: long query on php
Yesterday, from savaidis: The obious question is: (before I test it) This is concatenation to $query that is a string type, no? Yea. The following works either: mysql_query(create bla. bla. bla bla blabla . bla .bla ); So the limit isn't also set to 255 chars too? There is no such limit in PHP. Or is a trick especially for this case? There is no any bit nothing trick not. Read PHP-Doku about defintions of strings. -- SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: long query on php
Yesterday, from Nathan: Since PHP ignores whitespace, I think this is also acceptable: PHP doesn't ignore whitespace in a quote. But your code is of course correct. $query = CREATE TABLE query tabledef for col a, tabledef for col b, lots more table defs, . welcome to line 400; Another way: (without test) $bladefault=24; $query = eoq; CREATE TABLE query ( idchar(2) default 24 # doublequotes work hugo char(2) default '24' # singlequotes work either bla char(2) default '$bladefault' # inline vars work ... ) eoq -- How is possible to pass a long query to MySQL server with php? I mean i.e a create table statement with more than 400 chars. Have I to use shorter create and then alter? Strings in PHP can be really, really long and are binary safe. Perhaps the problem is the editor? -- SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: FULL-TEXT Index / Searching Question (fwd)
Tuesday, from Matt Rudderham: Hello, I have two tables in my database as such: CREATE TABLE `skill_names` ( `id` bigint(20) NOT NULL auto_increment, `name` varchar(30) NOT NULL default '', PRIMARY KEY (`id`) ); CREATE TABLE `skills` ( `skills_id` int(11) NOT NULL auto_increment, `member_id` int(11) NOT NULL default '0', `schooling` varchar(100) default NULL, `certifications` varchar(20) NOT NULL default '', `description` blob NOT NULL, `skill_name_id` bigint(20) NOT NULL default '0', PRIMARY KEY (`skills_id`) ); I would like to make full text indexes of the skills table as well as the other tables in the database. My question is that I would like to be able to search for the Member_ID's that have a certain skill name. How would I accomplish this? Also, right now the database has about 300 records, the database runs on a Pentium 200 with 96Mb. Can it handle this not much traffic? Thanks. I must read this a while to understand... sql sql sql This is, what you mean: CREATE FULLTEXT INDEX name (name)... hhhm forgot the correct syntax be free to look into docs SELECT * FROM skills,skillnames WHERE skill_name_id=id and MATCH(name) AGAINST('YOUR SEARCHED SKILLS'); BTW: These tables are suboptimal. You can reduce them to one table. This kind of parting the tables makes only sense, if you have 3 Billion not 300 records. -- SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Distributed Fulltext?
Yesterday, from Brian DeFeyter: Has anyone made a suggestion or thought about ways to distribute databases which focus on fulltext indexes? fulltext indexes do a good job of indexing a moderate amount of data, but when you get a lot of data to be indexed, the queries slow down significantly. Yea, this is cause the method to search the index is a geometric algorithm. I have an example table, with about 90 million rows.. and has a fulltext index on a varchar(100) field. A single-word query which would return approx 300k results takes an average of 15 seconds. A query with smaller results (~ 10k) can be as quick as 1 sec.. which I would consider acceptable. The only interesting thing is, how many words have to be indexed (how many rows is not very important), how big grows your index (does it go in the memory) and how many rows can be found for one word. These are the most depending things. Has any thought about splitting the data into distributed files or even machines? ie: something as simple as 'words' starting with 'X' are split into a-h, i-p, q-z... or something more advanced? (maybe mysqld could automatically split results based on (#results per unique 'word' / desired # of 'split files/machines') Would such a system give any advantages to searching speed and concurrenct query scalability? I haven't looked at the fulltext internals.. so I don't know if such query routing could take place or not. Hum, I think it's *much* cheaper is to come together and pay the mySQL people to introduce a new feature into mySQL called inverted files. This method is in short, that you only store the word and in which records it can be found. This can redurce the size of indexes and so the speed dramatically. If nothing else, does anyone else have experience with a table of this size or even larger? What kind of tuning have you done? We have made for example an extra table for indexing on an extra server. This is good, cause - mySQL can concentrate only for this table, so the machine dosn't need to swap - big searches or reindex dosn't bother the rest of the system - indexing can be done via a cron-job - we optimized the write routines, for example we stripped all tags out and wrote special stop-lists, before we write it. BTW: My wishlist for fulltext indexing: --- - inverted files - rules to define words - stop-word-lists and stop-word-regex - a function which returns a table, which can tell me, what can be searched. E.g SHOW MATCH(table.index) AGAINST('hugobla hugo bla'); word count rows stopword autostopword score wscore hugobla 22no no 10 1 hugo106no no 8 1 bla 00no yes 0 1 Words can be searched, depending on rules, stop-words and what has been indexed. - configurable scoring, e.g. very fast (and simple) scoring for speed - special scores for special words - a cache, which stores often used words and the rows -- SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
RE: long query on php
Yesterday, from savaidis: The obious question is: (before I test it) This is concatenation to $query that is a string type, no? Yea. The following works either: mysql_query(create bla. bla. bla bla blabla . bla .bla ); So the limit isn't also set to 255 chars too? There is no such limit in PHP. Or is a trick especially for this case? There is no any bit nothing trick not. Read PHP-Doku about defintions of strings. -- SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: long query on php
Yesterday, from Nathan: Since PHP ignores whitespace, I think this is also acceptable: PHP doesn't ignore whitespace in a quote. But your code is of course correct. $query = CREATE TABLE query tabledef for col a, tabledef for col b, lots more table defs, . welcome to line 400; Another way: (without test) $bladefault=24; $query = eoq; CREATE TABLE query ( idchar(2) default 24 # doublequotes work hugo char(2) default '24' # singlequotes work either bla char(2) default '$bladefault' # inline vars work ... ) eoq -- How is possible to pass a long query to MySQL server with php? I mean i.e a create table statement with more than 400 chars. Have I to use shorter create and then alter? Strings in PHP can be really, really long and are binary safe. Perhaps the problem is the editor? -- SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Fulltext problems 4.0.1 part two (fwd)
I found out, that the fulltext engine is misbehaving with less than 5 entries. Seems to me, that this is exactly the same case... Perhaps a flush inside the engine or so isn't made with such less entries? The bug is also in 3.23 gamma. Workaround: Enter more rows to fix it. :) I send this bug 1 month ago to the developer who has done the fulltext engine (I just forgot the name and the adresses are in the company - a russian name). Perhaps I should have sent it better to bugs... -- SSilk - Alexander Aulbach - Herbipolis/Frankonia Minoris -- Forwarded message -- Date: Sat, 2 Feb 2002 11:41:36 +0300 From: Alexander Belyaev [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: Re: Fulltext problems 4.0.1 part two Hello, I have found a bug( or bad feature? :) ) in 4.0.1 boolean fulltext search Please help... Test case with table can be found at http://logo.nino.ru/fulltextbug.tar.gz Alexander ( sql database query table for you, spam protector:) -- show variables like 'ft%' -- +--++ | Variable_name| Value | +--++ | ft_min_word_len | 2 | | ft_max_word_len | 254| | ft_max_word_len_for_sort | 20 | | ft_boolean_syntax| + -()~*:| | +--++ 4 rows in set (0.00 sec) -- show index from logo_text -- +---++--+--+-+-- -+-+--++--+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment | +---++--+--+-+-- -+-+--++--+ | logo_text | 0 | PRIMARY |1 | id | A | 38347 | NULL | NULL | | | logo_text | 1 | |1 | txt | A | 38347 |1 | NULL | FULLTEXT | +---++--+--+-+-- -+-+--++--+ 2 rows in set (0.00 sec) -- check table logo_text -- ++---+--+--- --+ | Table | Op| Msg_type | Msg_text | ++---+--+--- --+ | test.logo_text | check | warning | Found 38347 parts Should be: 0 parts | | test.logo_text | check | status | OK | ++---+--+--- --+ 2 rows in set (0.40 sec) -- repair table logo_text -- +++--+--+ | Table | Op | Msg_type | Msg_text | +++--+--+ | test.logo_text | repair | status | OK | +++--+--+ 1 row in set (3.02 sec) -- check table logo_text -- ++---+--+--- --+ | Table | Op| Msg_type | Msg_text | ++---+--+--- --+ | test.logo_text | check | warning | Found 38347 parts Should be: 0 parts | | test.logo_text | check | status | OK | ++---+--+--- --+ 2 rows in set (0.45 sec) -- select id,txt from test.logo_text where match(txt) against ('+desig* +service' in boolean mode) -- +---+--- -+ | id| txt | +---+--- -+ | 26570 | wave design a full service design agency seth brewer http://www.wavedesign.com | | 38847 | manucom hands on it service j.shannon design | +---+--- -+ 2 rows in set (0.03 sec) -- select id,txt from test.logo_text where match(txt) against ('+design +service' in boolean mode) -- +---+--- -+ | id| txt | +---+--- -+ | 1405 | design service ?? ?? | | 1480 | edv-service grafik-design | | 26570 | wave design a full service design agency seth brewer http://www.wavedesign.com | | 38847 | manucom hands on it service j.shannon design | +---+--- -+ 4 rows in set (0.03 sec) Bye - Before posting, please check: