J S said:

>>> Hi,
>>>
>>> I need some help please! I have 60GB of proxy logs to parse and load
>>> into a MySQL database. I've written a parsing script but I'm stuck
>>> now on how to load the data in.
>>>
>>> I have a database called PROXY_LOG with 2 tables:
>>>
>>> USER_TABLE
>>>   user_id  date_time  url_id  size
>>>
>>> and
>>>
>>> URL_TABLE
>>>   url_id  url  category
>>>
>>> The data values I have for each record are:
>>>
>>>   user_id  date_time  size  url  category
>>>
>>> for example:
>>>
>>>   u752359  2004-04-02 12:33:04  3403  http://www.mysql.com/index.html  business
>>>
>>> The problem is I'm not sure how to load the data into the 2 tables
>>> and set the url_id. Could anyone point me in the right direction
>>> please?
>>>
>>> Many thanks,
>>>
>>> js.
>>
>> What language did you use to do the parsing? If it was Perl, I'd
>> recommend looking at Perl's DBI interface and doing it
>> programmatically. Do you have any idea of how many different URLs you
>> have?
>>
>> Basic steps:
>>   Get a record.
>>   Check whether the URL is in the database; if it is, get its url_id.
>>   If not, insert it and get the generated url_id.
>>   Insert the user record using the url_id you now have.
>>   Repeat until you run out of records.
>>
>> Not elegant, but it will get the job done. Note: look in the
>> documentation for how to get the new url_id after you do an insert.
>> It's in DBD::mysql for Perl.
>
> Thanks for your reply William. I am using a Perl script. If I have to
> insert these records one by one it's going to be really slow, isn't it?
> Maybe the quickest way is to parse the logs twice, i.e. fill the
> URL_TABLE first using your procedure above, then on the second run
> create a LOAD file for USER_TABLE?
>
> js.

How will you get the information for the url_ids? I can see splitting the
logs and using a load file for URL_TABLE (if you can eliminate duplicates).

You can save some time if you can build a Perl hash with
$url{urlvalue} = url_id. Test against that and only do inserts when you
need to. A hash lookup is faster than a database query, but you will have
to hold the hash in memory. You can use the hash to prepare the USER_TABLE
data and then LOAD DATA INFILE that.

Just a thought: url is going to have to be a unique key, isn't it? You can
speed up the initial inserts by inserting without that key (using the Perl
hash to avoid collisions) and then altering the table to add the key
afterwards. However, the question comes back to: do you have enough memory
for the hash in Perl?

Notice also that you don't have a rowID equivalent in the USER_TABLE.
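Something along these lines might work as a single pass (untested, just a
sketch): it assumes the parsing script writes tab-separated fields in the
order user_id, date_time, size, url, category, that url_id in URL_TABLE is
AUTO_INCREMENT, and that the file names and connection details below are
placeholders you would replace.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # placeholder DSN, user and password -- adjust for your setup
    my $dbh = DBI->connect('DBI:mysql:database=PROXY_LOG;host=localhost',
                           'user', 'password',
                           { RaiseError => 1, AutoCommit => 1 })
        or die $DBI::errstr;

    my $ins_url = $dbh->prepare(
        'INSERT INTO URL_TABLE (url, category) VALUES (?, ?)');

    my %url_id;   # $url_id{$url} = url_id -- must fit in memory

    open my $parsed, '<', 'parsed_log.txt' or die $!;  # output of your parser (assumed name)
    open my $load,   '>', 'user_table.txt' or die $!;  # load file for USER_TABLE (assumed name)

    while (my $line = <$parsed>) {
        chomp $line;
        # assumed tab-separated fields: user_id, date_time, size, url, category
        my ($user_id, $date_time, $size, $url, $category) = split /\t/, $line;

        # only touch the database for URLs we have not seen before
        unless (exists $url_id{$url}) {
            $ins_url->execute($url, $category);
            $url_id{$url} =
                $dbh->last_insert_id(undef, undef, 'URL_TABLE', 'url_id');
            # $dbh->{mysql_insertid} works too with DBD::mysql
        }

        # tab-delimited row in USER_TABLE column order: user_id, date_time, url_id, size
        print $load join("\t", $user_id, $date_time, $url_id{$url}, $size), "\n";
    }

    close $parsed;
    close $load;

    # bulk-load the prepared file in one statement
    # (LOCAL INFILE may need to be enabled on both client and server)
    $dbh->do(q{LOAD DATA LOCAL INFILE 'user_table.txt'
               INTO TABLE USER_TABLE (user_id, date_time, url_id, size)});

    $dbh->disconnect;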
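And the "add the unique key afterwards" idea, again only a sketch using the
same $dbh handle as above: create URL_TABLE without the UNIQUE index on url,
let the %url_id hash guard against duplicate inserts during the load, then
add the index once the table is populated. The key name and the url(255)
prefix length here are made up.

    # run once after the bulk load; key name and prefix length are assumptions
    $dbh->do(q{ALTER TABLE URL_TABLE ADD UNIQUE KEY uniq_url (url(255))});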