J S said:
>
>
>> > Hi,
>> >
>> > I need some help please! I have 60GB of proxy logs to parse and load
>> > into a MySQL database. I've written a parsing script but I'm stuck
>> > now on how to load the data in.
>> >
>> > I have a database called PROXY_LOG with 2 tables:
>> >
>> > USER_TABLE
>> > user_id   date_time   url_id   size
>> >
>> > and
>> >
>> > URL_TABLE
>> > url_id   url   category
>> >
>> >
>> > The data values I have for each record are:
>> >
>> > user_id   date_time   size   url   category
>> >
>> > for example:
>> >
>> > u752359   2004-04-02 12:33:04   3403   http://www.mysql.com/index.html   business
>> >
>> > The problem is I'm not sure how to load the data into the 2 tables
>> > and set the url_id. Could anyone point me in the right direction
>> > please?
>> >
>> > Many thanks,
>> >
>> > js.
>> >
>>What language did you use to do the parsing?  If it was perl I'd
>> recommend looking at using perl's DBI interface and doing it
>> programmatically. Do you have any idea how many different URLs you
>> have?
>>
>>Basic steps:
>>    Get record,
>>    check to see if url is in database, if it is get the url_id.
>>        if not insert it and get the generated url_id.
>>    insert the user record using the url_id you now have.
>>repeat until you run out of records.
>>
>>Not elegant but it will get the job done.  Note: look into the
>> documentation on how to get the new url_id after you do an insert.
>> It's in DBD::mysql for perl.
>>
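For what it's worth, the loop William describes might look something like
this with DBI. The table and column names come from the messages above; the
connection details and the assumption that the parsing script emits
tab-separated records on STDIN are mine:

#!/usr/bin/perl
# Sketch of the check-or-insert loop described above (connection details
# and input format are assumptions, not anything from the original script).
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=PROXY_LOG;host=localhost',
                       'user', 'password', { RaiseError => 1 });

my $sel_url = $dbh->prepare('SELECT url_id FROM URL_TABLE WHERE url = ?');
my $ins_url = $dbh->prepare('INSERT INTO URL_TABLE (url, category) VALUES (?, ?)');
my $ins_rec = $dbh->prepare(
    'INSERT INTO USER_TABLE (user_id, date_time, url_id, size) VALUES (?, ?, ?, ?)');

while (my $line = <STDIN>) {
    chomp $line;
    # assume the parser emits: user_id, date_time, size, url, category
    my ($user_id, $date_time, $size, $url, $category) = split /\t/, $line;

    # look the url up; insert it if it is not there yet
    $sel_url->execute($url);
    my ($url_id) = $sel_url->fetchrow_array;
    unless (defined $url_id) {
        $ins_url->execute($url, $category);
        $url_id = $dbh->{mysql_insertid};   # generated AUTO_INCREMENT value
    }

    # insert the user record with the url_id we now have
    $ins_rec->execute($user_id, $date_time, $url_id, $size);
}

$dbh->disconnect;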
>
> Thanks for your reply William. I am using a perl script. If I have to
> insert these records one by one it's going to be really slow, isn't it?
> Maybe the quickest way is to parse the logs twice, i.e. fill the
> URL_TABLE first using your procedure above, then on the second run,
> create a LOAD file for USER_TABLE?
>
> js.
How will you get the information for the url_ids?  I can see splitting
the logs and using a load file for the URL_TABLE (if you can eliminate
duplicates).  You can save some time if you build a perl hash with
$url{urlvalue} = url_id. Test against that and only do inserts when you
need to. A hash lookup is faster than a db query, but you will have to
hold the whole hash in memory.  You can use the hash to prepare the
USER_TABLE data and then LOAD DATA INFILE that.
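A rough sketch of that hash-plus-load-file idea, with the same assumptions
as above (tab-separated parsed records, placeholder file names and
connection details):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=PROXY_LOG;host=localhost',
                       'user', 'password', { RaiseError => 1 });

# pull any existing url -> url_id mapping into memory once
my %url = map { $_->[1] => $_->[0] }
          @{ $dbh->selectall_arrayref('SELECT url_id, url FROM URL_TABLE') };

my $ins_url = $dbh->prepare('INSERT INTO URL_TABLE (url, category) VALUES (?, ?)');

open my $log, '<', 'proxy.log'       or die "proxy.log: $!";
open my $out, '>', 'user_table.load' or die "user_table.load: $!";

while (my $line = <$log>) {
    chomp $line;
    my ($user_id, $date_time, $size, $url, $category) = split /\t/, $line;

    # only hit the database when the hash has not seen this url before
    unless (exists $url{$url}) {
        $ins_url->execute($url, $category);
        $url{$url} = $dbh->{mysql_insertid};
    }

    # write a row in LOAD DATA INFILE's default tab/newline format
    print $out join("\t", $user_id, $date_time, $url{$url}, $size), "\n";
}
close $out;

# bulk-load the prepared file (LOCAL must be enabled on client and server)
$dbh->do(q{LOAD DATA LOCAL INFILE 'user_table.load'
           INTO TABLE USER_TABLE (user_id, date_time, url_id, size)});
$dbh->disconnect;

One pass over the log fills URL_TABLE (inserting only the URLs the hash has
not seen yet) and writes the USER_TABLE rows to a file that a single LOAD
DATA INFILE can swallow at the end, which should be much faster than
row-by-row inserts.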
Just a thought: url is going to have to be a unique key, isn't it? You can
speed up the initial inserts by inserting without that key (using the perl
hash to avoid collisions) and then altering the table to add the key in
afterwards. However, the question comes back to whether you have enough
memory for the hash in perl.
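The key deferral could look something like this, assuming url is a VARCHAR
column (a TEXT column would need a prefix length on the key):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=PROXY_LOG;host=localhost',
                       'user', 'password', { RaiseError => 1 });

# bulk-insert URL_TABLE without the unique key first (the perl hash keeps
# duplicates out), then build the index once at the end
$dbh->do('ALTER TABLE URL_TABLE ADD UNIQUE KEY uniq_url (url)');
$dbh->disconnect;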
Notice also that you don't have a rowID equivalent in the USER_TABLE.


