Re: [PHP] what would a c extension buy me
Something that is chiefly bottlenecked at the database isn't going to be improved enough at the C level to be worth the trouble. Always optimize where it's slow. Database design and indexing helps. Minimizing unnecessary queries, writing good ones, helps too. Once you get to the point that your code is the problem, you should look into your algorithms to see if they're as fast as they should be before you dive into C. A crappy algorithm will run like crap in any language. I had a program in C run in three minutes which was, in my estimation, crappy. I re-worked it and used a more sensible algorithm and it now runs in less than a second. Obviously it wasn't the language. In the end maybe an extension will be the way to go for you, depending on your needs, but do everything else you can first. Just my opinion, Ben On Mon, 13 Mar 2006 08:04:53 -0500, Jochem Maas <[EMAIL PROTECTED]> wrote: ... word,def,wordid,pos,posn,wordsize,syn from korean_english where word like '운전할 %' order by wordsize desc oh would you look at this you're ordering by WORDSIZE. stick an index on WORDSIZE!!! http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html In some cases, MySQL cannot use indexes to resolve the ORDER BY, note 'In Some Cases'. so stick an index on WORDSIZE and find out. although it still uses indexes to find the rows that match the WHERE clause. These cases include the following: The key used to fetch the rows is not the same as the one used in the ORDER BY: SELECT * FROM t1 WHERE key2=constant ORDER BY key1; 2) you have an iceballs' chance in hell that I'm going to even read the 500+ lines of code that followed here ... let alone try to optimize it. ;-) it was just a 40 line summary of 550 lines of code ah, talk about being caught out :-) regardless a 40line summary won't cut it either - you have to take the block as a whole. questions i may deem myself to answer: ) i get this: [EMAIL PROTECTED] mysql]# tail -f /var/lib/mysql/mysqld_query.log > out [EMAIL PROTECTED] mysql]# cat out | wc -l 15910<<< --- that's line count how many queries? 15, 910 queries so roughly (15 * 60) seconds to run 15,000+ queries and do the processing? that doesn't actually sound so bad. things already taken care of: 1) 9795 Query select word,def,wordid,pos,posn,wordsize,syn from korean_english where word like '운전할' order by wordsize desc in cases when you are not using the wildcard tokens (percentage signs) try changing the query to use something like: ... word = '운전할' ... your suggestion and a line from the query_log match exactly. 2) then, it sends each token to CallmatchThis (line 14) which calls matchThis (line 27 - 47 below) matchThis may be called twice (2 sql queeries) (line 51) select * where word = '$token' and another (take that, but if it's not there issue the next sql ) (line 55) select * where word like '$token%'; Dont do "SELECT *" - always explicitly specify the fields you want. that was just a paraphrase. the previous email points to the line number of the code summary never paraphrase code - you only end up with smart ass comments like mine! have you reordered you fields in the db yet? adn made as many VARCHARs as possible into CHARs? --+ | korean_english | CREATE TABLE `korean_english` ( `wordid` int(11) NOT NULL auto_increment, `word` varchar(130) default NULL, `syn` varchar(190) default NULL, `def` blob, `posn` int(2) default '1', `pos` varchar(13) default '1', `submitter` varchar(25) default NULL, `doe` datetime NOT NULL default '-00-00 00:00:00', `wordsize` tinyint(3) unsigned default NULL, PRIMARY KEY (`wordid`), KEY `word_idx` (`word`), KEY `wordid_idx` (`wordid`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 | reorder the fields so that the VARCHARS are at the end of the table (and the BLOB field at the very, very end of the table). also change VARCHARs to CHARs where you can. ++- --- - 1 function MainLoop() -- -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] what would a c extension buy me
On Mon, Mar 13, 2006 at 10:32:55PM +0900, joseph wrote: > mr. maas, > > psychic me you are a man. > > i created an index, no change. but i already knew that because one of > the cases where an index is never used is when > > The key used to fetch the rows is not the same as the one used in the > > > ORDER BY: > > > > which is the case here. > > (index creation stuff follows) > > i'm just going to bite the bullet for a few months and do a massive code > rehaul. move all the matching code preferably to the sql-server. or > pre-compiled C code for the php [ make an extension ]. i think the mb_ > functions in php are very slow ( reportedly ) ... so move those > somewhere else is a good idea. maybe the 'select ... like' clauses to > the sql-server Personally I think you are making a wrong decision. If you think having a store_procedure of what you are wanting to do it may buy you perhaps 1 second of compile time. The problem is design. Put stuff in a pre-compile C code for php, well, i'd like to see that happen but besides that, let me ask you, if you have query 'A' that you are exeucting in php. how long does query 'A' take to execute without using php? blaming mb_* functions on speed isn't even the question of your problems. Curt. -- cat .signature: No such file or directory -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] what would a c extension buy me
mr. maas, psychic me you are a man. i created an index, no change. but i already knew that because one of the cases where an index is never used is when > The key used to fetch the rows is not the same as the one used in the > > ORDER BY: > > which is the case here. (index creation stuff follows) i'm just going to bite the bullet for a few months and do a massive code rehaul. move all the matching code preferably to the sql-server. or pre-compiled C code for the php [ make an extension ]. i think the mb_ functions in php are very slow ( reportedly ) ... so move those somewhere else is a good idea. maybe the 'select ... like' clauses to the sql-server maybe the improvements you suggest will make a difference, maybe not, but when i think about how fast mysql runs its mysqlimport --local --fields-terminated-by='*X*X' dict_explicit french_english french_english compared with mysql < a.sql (bunches of insert statements) that's the kind of improvement i am hoping for. more than 500 queeries a second i am getting now, i need to really really improve that if i am going to parse html on the fly with the technology i have. mysql> describe korean_english; +---+-+--+-+-++ | Field | Type| Null | Key | Default | Extra | +---+-+--+-+-++ | wordid| int(11) | | PRI | NULL| auto_increment | | word | varchar(130)| YES | MUL | NULL| | | syn | varchar(190)| YES | | NULL| | | def | blob| YES | | NULL| | | posn | int(2) | YES | | 1 | | | pos | varchar(13) | YES | | 1 | | | submitter | varchar(25) | YES | | NULL| | | doe | datetime| | | -00-00 00:00:00 | | | wordsize | tinyint(3) unsigned | YES | | NULL| | +---+-+--+-+-++ 9 rows in set (0.00 sec) mysql> explain select -> word,def,wordid,pos,posn,wordsize,syn from korean_english where word -> like '운전할 %' order by wordsize desc; ++-++---+---+--+-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-++---+---+--+-+--+--+-+ | 1 | SIMPLE | korean_english | range | word_idx | word_idx | 391 | NULL |1 | Using where; Using filesort | ++-++---+---+--+-+--+--+-+ 1 row in set (0.01 sec) mysql> CREATE INDEX wordsize_index USING BTREE ON korean_english(wordsize); Query OK, 205265 rows affected (11.16 sec) Records: 205265 Duplicates: 0 Warnings: 0 mysql> explain select word,def,wordid,pos,posn,wordsize,syn from korean_english where word like '운전할 %' order by wordsize desc;e); ++-++---+---+--+-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-++---+---+--+-+--+--+-+ | 1 | SIMPLE | korean_english | range | word_idx | word_idx | 391 | NULL |1 | Using where; Using filesort | ++-++---+---+--+-+--+--+-+ 1 row in set (0.00 sec) mysql> 2006-03-13 (월), 14:04 +0100, Jochem Maas 쓰시길: > ... > > >>>word,def,wordid,pos,posn,wordsize,syn from korean_english where word > >>>like '운전할 %' order by wordsize desc > >> > >>oh would you look at this > >>you're ordering by WORDSIZE. > >>stick an index on WORDSIZE!!! > > > > > > > > http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html > > > > > > In some cases, MySQL cannot use indexes to resolve the ORDER BY, > > note 'In Some Cases'. > > so stick an index on WORDSIZE and find out. > > > although it still uses indexes to find the rows that match the WHERE > > clause. These cases include the following: > > > > The key used to fetch the rows is not the same as the one used in the > > ORDER BY: > > > > SELECT * FROM t1 WHERE key2=constant ORDER BY key1; > > > > 2) > > > > > >>you have an iceballs' chance in hell that I'm going to even read > >>the 500+ lines of code that f
Re: [PHP] what would a c extension buy me
joseph wrote: > sorry, i made a mistake before. > > >>> 9795 Query select >>>word,def,wordid,pos,posn,wordsize,syn from korean_english where word >>>like '운전할' order by wordsize desc >> >>in cases when you are not using the wildcard tokens (percentage signs) >>try changing the query to use something like: >> >> ... word = '운전할' ... >> > > > i have to stick with 'like' because it matches case-insensitive, else > the first word of a sentence (with a capital letter) doesn't match the > db word. FYI do you really care about case (in languages where characters even have case)? if not stick everything into the DB as uppercase and test against that (always normalizing everything to uppercase before doing the queries). it would be faster than a case-insensitive search also consider it might be worth creating a special index for the first (couple?) of letters of the WORD field, that would allow something like this (not in all cases obviously): SELECT * FROM korean_english WHERE word = '운전할' AND SUBSTR(word,0,1) = '운' (now I'm allowed to sue 'SELECT * ...' because I'm offering a possible concept/idea - as opposed to pasting code for people look at) - basically the idea entails creating an index (possibly based on an extra column) that would allow the mysql engine to start searching in a smaller subset based on the 2nd (in this case) WHERE clause and using the more compact 'SUBSTR(word,0,1)' index. this might not have any effect whatsoever depending on how indexes are used internally by the mysql engine. so basically this is me guessing :-) > -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] what would a c extension buy me
... >>>word,def,wordid,pos,posn,wordsize,syn from korean_english where word >>>like '운전할 %' order by wordsize desc >> >>oh would you look at this >>you're ordering by WORDSIZE. >>stick an index on WORDSIZE!!! > > > > http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html > > > In some cases, MySQL cannot use indexes to resolve the ORDER BY, note 'In Some Cases'. so stick an index on WORDSIZE and find out. > although it still uses indexes to find the rows that match the WHERE > clause. These cases include the following: > > The key used to fetch the rows is not the same as the one used in the > ORDER BY: > > SELECT * FROM t1 WHERE key2=constant ORDER BY key1; > > 2) > > >>you have an iceballs' chance in hell that I'm going to even read >>the 500+ lines of code that followed here ... let alone try to >>optimize it. ;-) >> > > > it was just a 40 line summary of 550 lines of code ah, talk about being caught out :-) regardless a 40line summary won't cut it either - you have to take the block as a whole. > > > > questions i may deem myself to answer: > >>>) i get this: >>>[EMAIL PROTECTED] mysql]# tail -f /var/lib/mysql/mysqld_query.log > out >>> >>>[EMAIL PROTECTED] mysql]# cat out | wc -l >>>15910<<< --- that's line count >> >>how many queries? > > 15, 910 queries so roughly (15 * 60) seconds to run 15,000+ queries and do the processing? that doesn't actually sound so bad. > > > > things already taken care of: > 1) > >>> 9795 Query select >>>word,def,wordid,pos,posn,wordsize,syn from korean_english where word >>>like '운전할' order by wordsize desc >> >>in cases when you are not using the wildcard tokens (percentage signs) >>try changing the query to use something like: >> >> ... word = '운전할' ... > > > your suggestion and a line from the query_log match exactly. > > > 2) > >> >> >>>then, it sends each token to CallmatchThis (line 14) which calls >>>matchThis (line 27 - 47 below) >>>matchThis may be called twice (2 sql queeries) >>>(line 51) select * where word = '$token' and another (take that, but if >>>it's not there issue the next sql ) >>>(line 55) select * where word like '$token%'; >> >>Dont do "SELECT *" - always explicitly specify the fields you want. >> > > that was just a paraphrase. the previous email points to the line number > of the code summary never paraphrase code - you only end up with smart ass comments like mine! have you reordered you fields in the db yet? adn made as many VARCHARs as possible into CHARs? > > >> >> >>>--+ >>>| korean_english | CREATE TABLE `korean_english` ( >>> `wordid` int(11) NOT NULL auto_increment, >>> `word` varchar(130) default NULL, >>> `syn` varchar(190) default NULL, >>> `def` blob, >>> `posn` int(2) default '1', >>> `pos` varchar(13) default '1', >>> `submitter` varchar(25) default NULL, >>> `doe` datetime NOT NULL default '-00-00 00:00:00', >>> `wordsize` tinyint(3) unsigned default NULL, >>> PRIMARY KEY (`wordid`), >>> KEY `word_idx` (`word`), >>> KEY `wordid_idx` (`wordid`) >>>) ENGINE=MyISAM DEFAULT CHARSET=utf8 | >> >>reorder the fields so that the VARCHARS are at the end of the >>table (and the BLOB field at the very, very end of the table). >> >>also change VARCHARs to CHARs where you can. >> >> >>>++- >>>--- >>>- >>> >>> >>> >>> >>> >>> 1 function MainLoop() > > -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] what would a c extension buy me
sorry, i made a mistake before. > >9795 Query select > > word,def,wordid,pos,posn,wordsize,syn from korean_english where word > > like '운전할' order by wordsize desc > > in cases when you are not using the wildcard tokens (percentage signs) > try changing the query to use something like: > > ... word = '운전할' ... > i have to stick with 'like' because it matches case-insensitive, else the first word of a sentence (with a capital letter) doesn't match the db word. FYI -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] what would a c extension buy me
many points given to my esteemed and humble correspondant for: 1) > psychics-php is a seperate mailing lists, please channel/mindmeld the correct > subscription procedure from John Nichel ;-) > points to me for a quick comeback ( word is indexed, wordsize is not) 1) per mysql website ... > > > > > as for indexes, i only search against word, so i have an index for that. > > er, you might want to read up a bit on indexes :-) > > [EMAIL PROTECTED] mysql]# head out > >9795 Query select > > word,def,wordid,pos,posn,wordsize,syn from korean_english where word > > like '운전할 %' order by wordsize desc > > oh would you look at this > you're ordering by WORDSIZE. > stick an index on WORDSIZE!!! http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following: The key used to fetch the rows is not the same as the one used in the ORDER BY: SELECT * FROM t1 WHERE key2=constant ORDER BY key1; 2) > you have an iceballs' chance in hell that I'm going to even read > the 500+ lines of code that followed here ... let alone try to > optimize it. ;-) > it was just a 40 line summary of 550 lines of code questions i may deem myself to answer: > > > > ) i get this: > > [EMAIL PROTECTED] mysql]# tail -f /var/lib/mysql/mysqld_query.log > out > > > > [EMAIL PROTECTED] mysql]# cat out | wc -l > > 15910<<< --- that's line count > > how many queries? 15, 910 queries > things already taken care of: 1) > >9795 Query select > > word,def,wordid,pos,posn,wordsize,syn from korean_english where word > > like '운전할' order by wordsize desc > > in cases when you are not using the wildcard tokens (percentage signs) > try changing the query to use something like: > > ... word = '운전할' ... your suggestion and a line from the query_log match exactly. 2) > > > > > > then, it sends each token to CallmatchThis (line 14) which calls > > matchThis (line 27 - 47 below) > > matchThis may be called twice (2 sql queeries) > > (line 51) select * where word = '$token' and another (take that, but if > > it's not there issue the next sql ) > > (line 55) select * where word like '$token%'; > > Dont do "SELECT *" - always explicitly specify the fields you want. > that was just a paraphrase. the previous email points to the line number of the code summary > > > > > --+ > > | korean_english | CREATE TABLE `korean_english` ( > > `wordid` int(11) NOT NULL auto_increment, > > `word` varchar(130) default NULL, > > `syn` varchar(190) default NULL, > > `def` blob, > > `posn` int(2) default '1', > > `pos` varchar(13) default '1', > > `submitter` varchar(25) default NULL, > > `doe` datetime NOT NULL default '-00-00 00:00:00', > > `wordsize` tinyint(3) unsigned default NULL, > > PRIMARY KEY (`wordid`), > > KEY `word_idx` (`word`), > > KEY `wordid_idx` (`wordid`) > > ) ENGINE=MyISAM DEFAULT CHARSET=utf8 | > > reorder the fields so that the VARCHARS are at the end of the > table (and the BLOB field at the very, very end of the table). > > also change VARCHARs to CHARs where you can. > > > ++- > > --- > > - > > > > > > > > > > > > 1 function MainLoop() -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] what would a c extension buy me
joseph wrote: > fellow php programmers, > > sorry to you two who were kind enough to take the time to attempt to > answer my question. i thought it was enough to say "i had 550 lines > with a bunch of sql calls and loops and big data structures" psychics-php is a seperate mailing lists, please channel/mindmeld the correct subscription procedure from John Nichel ;-) > > to help you make a better assessment i will summarize in more detail > now. > > for my biggest file ( > > http://www.myowndictionary.com/index.php?old_cfile=rss&cfile=rss&rss_id=198&cid=242&from_lang=korean > > ) i get this: > [EMAIL PROTECTED] mysql]# tail -f /var/lib/mysql/mysqld_query.log > out > > [EMAIL PROTECTED] mysql]# cat out | wc -l > 15910<<< --- that's line count how many queries? > [EMAIL PROTECTED] mysql]# head out >9795 Query select > word,def,wordid,pos,posn,wordsize,syn from korean_english where word > like '운전할 %' order by wordsize desc oh would you look at this you're ordering by WORDSIZE. stick an index on WORDSIZE!!! >9795 Query select > word,def,wordid,pos,posn,wordsize,syn from korean_english where word > like '운전할' order by wordsize desc in cases when you are not using the wildcard tokens (percentage signs) try changing the query to use something like: ... word = '운전할' ... > > then, it sends each token to CallmatchThis (line 14) which calls > matchThis (line 27 - 47 below) > matchThis may be called twice (2 sql queeries) > (line 51) select * where word = '$token' and another (take that, but if > it's not there issue the next sql ) > (line 55) select * where word like '$token%'; Dont do "SELECT *" - always explicitly specify the fields you want. ... > > as for indexes, i only search against word, so i have an index for that. er, you might want to read up a bit on indexes :-) > --+ > | korean_english | CREATE TABLE `korean_english` ( > `wordid` int(11) NOT NULL auto_increment, > `word` varchar(130) default NULL, > `syn` varchar(190) default NULL, > `def` blob, > `posn` int(2) default '1', > `pos` varchar(13) default '1', > `submitter` varchar(25) default NULL, > `doe` datetime NOT NULL default '-00-00 00:00:00', > `wordsize` tinyint(3) unsigned default NULL, > PRIMARY KEY (`wordid`), > KEY `word_idx` (`word`), > KEY `wordid_idx` (`wordid`) > ) ENGINE=MyISAM DEFAULT CHARSET=utf8 | reorder the fields so that the VARCHARS are at the end of the table (and the BLOB field at the very, very end of the table). also change VARCHARs to CHARs where you can. > ++- > --- > - > > > > > > 1 function MainLoop() you have an iceballs' chance in hell that I'm going to even read the 500+ lines of code that followed here ... let alone try to optimize it. ;-) -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] what would a c extension buy me
fellow php programmers, sorry to you two who were kind enough to take the time to attempt to answer my question. i thought it was enough to say "i had 550 lines with a bunch of sql calls and loops and big data structures" to help you make a better assessment i will summarize in more detail now. for my biggest file ( http://www.myowndictionary.com/index.php?old_cfile=rss&cfile=rss&rss_id=198&cid=242&from_lang=korean ) i get this: [EMAIL PROTECTED] mysql]# tail -f /var/lib/mysql/mysqld_query.log > out [EMAIL PROTECTED] mysql]# cat out | wc -l 15910<<< --- that's line count [EMAIL PROTECTED] mysql]# head out 9795 Query select word,def,wordid,pos,posn,wordsize,syn from korean_english where word like '운전할 %' order by wordsize desc 9795 Query select word,def,wordid,pos,posn,wordsize,syn from korean_english where word like '운전할' order by wordsize desc 9795 Query select word,def,wordid,pos,posn,wordsize,syn from korean_english where word like ' %' order by wordsize desc 9795 Query select word,def,wordid,pos,posn,wordsize,syn from korean_english where word like '수 %' order by wordsize desc 9795 Query select word,def,wordid,pos,posn,wordsize,syn from korean_english where word like '수' order by wordsize desc takes 6 minutes [EMAIL PROTECTED] current]$ date 2006. 03. 13. (월) 20:25:16 KST [EMAIL PROTECTED] current]$ date 2006. 03. 13. (월) 20:31:37 KST [EMAIL PROTECTED] current]$ in general, you will see what is involved in processing each token of text and sending that token through a match against databases and processing results: a summary of the code is: mainLoop starts, it breaks some text into tokens using my_mb_preg_split. (line 3) the reg exp is like $split_pattern = "/((?:[\.!\s\?,:-]|\\\"는)+)/"; mb_preg_split i'm sure is broken, but haven't prepared a legal case at this time ;) then, it sends each token to CallmatchThis (line 14) which calls matchThis (line 27 - 47 below) matchThis may be called twice (2 sql queeries) (line 51) select * where word = '$token' and another (take that, but if it's not there issue the next sql ) (line 55) select * where word like '$token%'; those can result sometimes in arrays from mysql of ~ what 100 words and 6 fields each word for those queeries. then all that array is narrowed down by finding the longest word (yes, a field in the DB is pre-set with that value) there is an algo involved, i cannot simply select the longest match, because i am also selecting against not just that single token but against the following tokens as well, to stop getting a match for "hair" if "hair of the dog that bit you" is in the databse, i want the longer match. that was a necessary embellishment. the array of arrays returned from matchThis are processed again by MainLoop at line 16 to produce a javascript overlib for every word. as for indexes, i only search against word, so i have an index for that. --- --+ | korean_english | CREATE TABLE `korean_english` ( `wordid` int(11) NOT NULL auto_increment, `word` varchar(130) default NULL, `syn` varchar(190) default NULL, `def` blob, `posn` int(2) default '1', `pos` varchar(13) default '1', `submitter` varchar(25) default NULL, `doe` datetime NOT NULL default '-00-00 00:00:00', `wordsize` tinyint(3) unsigned default NULL, PRIMARY KEY (`wordid`), KEY `word_idx` (`word`), KEY `wordid_idx` (`wordid`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 | ++- --- - 1 function MainLoop() 2 3 -> $text_wordArr = my_mb_preg_split($subject, $utf8_split_pattern,$debug_my_mb); 4 while (!empty($text_word) ) { 5 // somehow this allows matches for 'in the bag' (idiom) by some algorithm or other i have 6 if (preg_match($split_pattern,$text_word) 7 && (!(preg_match("/^$one_word_matches$/", $text_wordArr[$i-2][0]) && ($text_wordArr[$i-1][0]==' '))) // gobble up the space following those in such list, for previuos one 8 && (!(preg_match("/^$one_word_matches$/", $text_wordArr[$i-3][0])) && ($text_wordArr[$i-2][0]==' '))) { // try to limit it to just blanks and quotes, other such garbage for previous TWO 9 } else if ( preg_match("/^$one_word_matches$/i", $text_word)) {
Re: [PHP] what would a c extension buy me
joseph wrote: hi, my site is www.myowndictionary.com i use javascript to create definitions for words (from open source dictionaries) as pop-ups and hook word-lists for vocabulary study with that. i can now parse the html from rss feeds and match only text of interest. i just wrote my own multibyte html parser by combining various other code i already had on hand for the parsing of multibyte text anyhow. but, that's not the slowest part. that takes maybe 3 seconds to get the page, 3 seconds to parse it, but it sometimes takes 5 minutes for the next step on huge rss files. (I know the time because they are cached) that last step... The slow part that's left is the dictionary searching itself. It's the parsing of complicated php code -- 550 lines of complex data structure loops, function calls, looping through arrays to make sql queeries -- lots of sql queeries the seconds ~ 5 minute it takes it all takes so much time. use the 'EXPLAIN' syntax of MySQL to figure out if there are any bottlenecks related to crap or 'missing' indexes with regard to the sql queries you mention above. - chances are adding a couple of well placed indexes will probably shave massive ammounts from the process time. i want to ask for opinion about: 1) rewriting the php as C extension. Would it slow my down to the below 15 second range? How Long is a chinaman. How Long is a chinaman? yes the C extension would be much faster (assuming you wrote it correctly - but it's much harder to write and maintain) 2) does mysql have server-side functions yet? there are stored procedures in mysql5. 3) if i moved the php code off mysql to postgresql and wrote server-side functions (which i've been trained to do), how much of a time improvement would we be talking about? how on gods' (or whoever runs the place these days) earth can we tell whether an unwritten stored procedure in postgres would be faster or slower than an unknown (to us) piece of php code? that said I doubt postgres will beat mysql in terms of pure speed assuming you db schema is sound (indexes are good, etc). please send me your input! i am at a growing stage and need direction here. because any of those steps will require considerable time. it's monday morning here too ;-) interesting language tool your building, you might consider repackaging it in a modular form so that other people could implement in their own site. (you do call it 'open source') rgds, Jochem thank you. joseph. -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] what would a c extension buy me
joseph wrote: hi, my site is www.myowndictionary.com i use javascript to create definitions for words (from open source dictionaries) as pop-ups and hook word-lists for vocabulary study with that. i can now parse the html from rss feeds and match only text of interest. i just wrote my own multibyte html parser by combining various other code i already had on hand for the parsing of multibyte text anyhow. but, that's not the slowest part. that takes maybe 3 seconds to get the page, 3 seconds to parse it, but it sometimes takes 5 minutes for the next step on huge rss files. (I know the time because they are cached) that last step... The slow part that's left is the dictionary searching itself. It's the parsing of complicated php code -- 550 lines of complex data structure loops, function calls, looping through arrays to make sql queeries -- lots of sql queeries the seconds ~ 5 minute it takes it all takes so much time. i want to ask for opinion about: 1) rewriting the php as C extension. Would it slow my down to the below 15 second range? Who knows? We don't know what the code does or how it works. C will be quicker because it's already compiled. It will be harder to maintain and debug because it's compiled. 2) does mysql have server-side functions yet? Mysql 5 has stored procedures if that's what you mean. 3) if i moved the php code off mysql to postgresql and wrote server-side functions (which i've been trained to do), how much of a time improvement would we be talking about? See # 1. -- Postgresql & php tutorials http://www.designmagick.com/ -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php