Hey John,

I had never noticed this before, so today I ran index -D and got exactly the 
same error. After applying your my.cnf change, index -D ran without errors. 
I'm glad you brought this to my attention. From now on I'll scroll back and 
review the output that index prints when it finishes.

I wonder if this is a bug? I also wonder how many URLs I can index before 
even the 16m maximum stops being enough. That would be terrible, because I've 
been indexing for a week now and have about 2,500,000 URLs indexed.
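
To make the sizes in the quoted my.cnf lines easier to compare, here is a 
rough Python sketch that converts them to bytes and shows how much room is 
left below the ceiling. The helper names parse_size and headroom are made up 
for illustration, and the 16m ceiling is simply the MySQL 3.23 figure from 
John's message below, not something verified here.

```python
import re

# Multipliers for the k/m/g suffixes accepted in my.cnf size values
# (MySQL treats them as powers of 1024).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Turn a my.cnf-style size such as '10m' into a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([kmg]?)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, suffix = match.groups()
    return int(number) * _UNITS[suffix]


def headroom(current, ceiling="16m"):
    """Fraction of the ceiling not yet used by the current setting.

    The default ceiling is an assumption taken from the quoted message.
    """
    return 1 - parse_size(current) / parse_size(ceiling)


if __name__ == "__main__":
    # The three values tried in the quoted message.
    for setting in ("1m", "3m", "10m"):
        print(setting, parse_size(setting), f"{headroom(setting):.0%} left")
```

This only does the arithmetic. It says nothing about how the packet size 
grows with the number of URLs, which is the real open question.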

I too wonder how to get rid of all the URLs whose status is not 200. I have 
a few million 404s, and 404 means not found. In most cases it also means the 
site owner has removed the page from their HTML tree, so it will most likely 
never be available again. Actually, I don't really care to keep anything 
above 200 around. I'd like to be able to remove everything that was not 
returned as a 200, but I don't know how to do this.
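
To put a number on how much of the database this is, here is a small Python 
sketch that totals the statistics table quoted below and splits it into what 
John wants to keep (status 0 and status 200) and everything else. It only 
reads the numbers from the table; it does not touch the ASPseek database, 
and the names TABLE and summarize are made up for illustration.

```python
# Status, Expired and Total columns copied from the
# "ASPseek database statistics" table in the quoted message.
TABLE = """\
0 61298 61324
1 0 132
200 0 1974309
202 0 75
204 0 45
205 0 3
300 0 19
301 0 62552
302 0 121177
303 0 7
307 0 8
400 0 107
401 0 181
402 0 5
403 0 3115
404 0 1129545
405 0 1
407 0 1
408 0 5
410 0 10
415 0 1
500 0 344
501 0 11
502 0 13
503 0 328
504 0 8
508 0 275
"""


def summarize(table, keep_statuses=(0, 200)):
    """Return (all URLs, URLs to keep, URLs that would be dropped)."""
    totals = {}
    for line in table.splitlines():
        status, _expired, total = (int(field) for field in line.split()[:3])
        totals[status] = total
    everything = sum(totals.values())
    keep = sum(totals.get(status, 0) for status in keep_statuses)
    return everything, keep, everything - keep


if __name__ == "__main__":
    everything, keep, drop = summarize(TABLE)
    print(f"total URLs:       {everything}")
    print(f"status 0 or 200:  {keep}")
    print(f"everything else:  {drop} ({drop / everything:.1%})")
```

The keep_statuses default follows John's wish to keep status 0 and 200; pass 
(200,) instead to count only the pages that came back OK.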

Maybe Kir will be so kind as to tell us how to get rid of all the non-200 
URLs without breaking something. Maybe he'll also know why index -D causes 
this error. Who knows?

Thanks again for your post John,
Karen

>
>The following error appears after every index run. It can be reproduced 
>every time by running:
>
>./index -D
>
>
>OUTPUT:
>
>Loading configuration from /usr/local/aspseek/etc/db.conf
>Loading configuration from /usr/local/aspseek/etc/ucharset.conf
>Loading configuration from /usr/local/aspseek/etc/stopwords.conf
>Loading configuration from /usr/local/aspseek/etc/aspseek.conf
>Saving real-time ... done
>Saving redirects ... done
>Saving direct href deleta files ... done
>Calculating ranks [........] done.
>Saving lastmods ... done
>Generating word site ..Error: MySQL server has gone away <INSERT INTO wordsite(word, sites) VALUES(:1, :2)>
>Error: MySQL server has gone away <INSERT INTO wordsite(word, sites) VALUES(:1, :2)>
>. done
>
>I don't know how long this has been going on, because this output has 
>usually scrolled off the screen by the time indexing completes. This time I 
>scrolled back to review the run and that's what I found. To reproduce the 
>error I ran index -D and, sure enough, the error occurred again.
>
>I'm running MySQL's my-huge.cnf with no modifications other than commenting 
>out log-bin. The MySQL server has NOT gone away; it is live. There are no 
>errors in MySQL's sitename.err file either, and search works fine. There are 
>no other users on this box, just the indexer and me. The box is not open to 
>the world.
>
>I have read information at MySQL's Website concerning this type of error. 
>This page can be located at:
>
>http://www.mysql.com/doc/en/Gone_away.html
>
>I made a change to the my.cnf:
>
>from:
>set-variable = max_allowed_packet = 1m
>
>to:
>
>set-variable = max_allowed_packet = 3m
>
>and that didn't help. So....
>
>set-variable = max_allowed_packet = 10m
>
>and it then was able to generate word site without an error.
>
>My next step is of course to find out what I need to set this to so I don't 
>run into the same problem in the future. The maximum this can be set to in 
>MySQL 3.23 is 16m. So now I'm wondering how many URLs we can index if the 
>packets are already this large with just 2 million URLs. I also found this 
>page interesting regarding this problem:
>
>http://www.mysql.com/doc/en/Packet_too_large.html
>
>Here are my current stats for index (I wish I could get rid of everything 
>other than status 200 and status 0 though):
>
>
>ASPseek database statistics
>
>    Status    Expired      Total
>   -----------------------------
>         0      61298      61324 Not indexed yet
>         1          0        132 Unknown status
>       200          0    1974309 OK
>       202          0         75 Unknown status
>       204          0         45 No content
>       205          0          3 Unknown status
>       300          0         19 Multiple Choices
>       301          0      62552 Moved Permanently
>       302          0     121177 Moved Temporarily
>       303          0          7 See Other
>       307          0          8 Unknown status
>       400          0        107 Bad Request
>       401          0        181 Unauthorized
>       402          0          5 Payment Required
>       403          0       3115 Forbidden
>       404          0    1129545 Not found
>       405          0          1 Method Not Allowed
>       407          0          1 Proxy Authentication Required
>       408          0          5 Request Timeout
>       410          0         10 Gone
>       415          0          1 Unsupported Media Type
>       500          0        344 Internal Server Error
>       501          0         11 Not Implemented
>       502          0         13 Bad Gateway
>       503          0        328 Service Unavailable
>       504          0          8 Gateway Timeout
>       508          0        275 Unknown status
>   -----------------------------
>     Total      61298    3353601
>
>Maybe Kir has an answer to this situation?
>
>Thanks,
>John