----- Original Message ----- From: "Dan Harrington" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Wednesday, May 22, 2002 5:41 PM Subject: Loading massive data set from CSV
> Greetings everyone,
>
> I have an ASCII CSV or Tab Delimited file that is
> roughly 3.5 gigabytes, and I want to load it into a mysql
> database so I can do some analysis.
>
> First of all, I'm wondering, is there anything I should be aware of,
> or worried about, size-wise?
>
> I know that I can't even look at the file using basic text functions
> in my Linux box like 'head' or 'split'
>
> Initially, I was thinking I'd use 'split' to break it into smaller chunks
> but split won't read it.
>
> I can't even use 'wc -l' to find out how many lines or records are in the file.

I thought most Linux utilities were quite good at dealing with files, i.e. only reading what was needed into memory, but I suppose it depends on the program. Then again, I have never tried to manage a 3.5 gig file :)

> There is a list of the fields in the file, so I know what my table should
> look like, but I don't want to crash the SQL server if its too large a file,
> or something else like that. I didn't know how big it was originally, so I
> was just going to use phpMyAdmin to load the file through a web browser....
> though I don't know if that will work either. Is there a size limitation
> to HTTP-POST (I assume it uses that method to upload).

I wouldn't think that phpMyAdmin / Apache (guessing at the web server here) would be able to handle such a big file through CGI. PHP puts its own caps on uploads (the upload_max_filesize and post_max_size settings), and if, like most programs, it just tries to read the whole uploaded file into memory, then something is going to keel over, either Apache or PHP.

You might be able to load the file into mysql from the mysql client program, i.e. with the command that lets you load a table straight from a file (I can't remember its exact name). Again, this will depend on how the program handles files. If that doesn't work, my advice would be to split the file up into manageable chunks, a few hundred megs at most, and try them one at a time.
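The command being reached for here is almost certainly MySQL's LOAD DATA INFILE, which has the server read the file off disk itself rather than pushing it through a client upload. A rough sketch, assuming a made-up table and column layout (replace these with the actual field list from the file):

```sql
-- Hypothetical table; substitute the real field list.
CREATE TABLE big_import (
  id INT,
  name VARCHAR(255),
  amount DECIMAL(12,2)
);

-- The server streams the file incrementally instead of
-- reading it all into memory at once.
LOAD DATA INFILE '/path/to/bigfile.csv'
INTO TABLE big_import
FIELDS TERMINATED BY ','   -- use '\t' for the tab-delimited version
LINES TERMINATED BY '\n';
```

Whether a 3.5 GB file works in one go still depends on the server's filesystem and table-size limits, so trying a small chunk first is prudent.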
Will 'grep' parse the file, i.e. can you do "grep someWord myBigFile.txt"? If so, is there anything "splittable" in the file? E.g. if the file is a log file with a date field, then you could try grep "Sun 5th May" myBigFile.txt > 5thMay.txt, etc. Just out of curiosity, where did you get the file from?

If grep won't look at it, how's your C? You could write a C program to split the file, just reading and writing a few thousand lines at a time. If you don't know C then you could either try to learn it (if you are a programmer), or alternatively give me a shout and I'll try to knock something together for you. (I'm a bit rusty with my C though :)

I have no idea about the maximum amount of data mysql can hold; I have never had to deal with anything that was over a few hundred thousand records, unfortunately. Maybe someone else can shed light on that - I keep hearing that mysql can handle a LOT of rows.

Hope some of this helps...

Gav Brown

> Comments?
>
> Thanks
> Dan
>
> ---------------------------------------------------------------------
> Before posting, please check:
> http://www.mysql.com/manual.php (the manual)
> http://lists.mysql.com/ (the list archive)
>
> To request this thread, e-mail <[EMAIL PROTECTED]>
> To unsubscribe, e-mail <[EMAIL PROTECTED]>
> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
> ---------------------------------------------------------------------
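P.S. The C splitter mentioned above could look roughly like this - a sketch only, with file names and chunk size picked arbitrarily. It reads one small buffer at a time, so it never needs the whole file in memory; on 32-bit Linux, compiling with -D_FILE_OFFSET_BITS=64 lets stdio open files past the 2 GB mark, which is probably exactly why 'split' and 'wc' choke on this one.

```c
/* split_by_lines.c - sketch of a line-based file splitter.
 * Copies lines from `in` into files named <prefix>.0, <prefix>.1, ...
 * with at most `lines_per_chunk` lines each.
 * Returns the number of chunk files written, or -1 on error. */
#include <stdio.h>
#include <string.h>

int split_by_lines(FILE *in, const char *prefix, long lines_per_chunk)
{
    char buf[8192];
    char name[256];
    long line = 0;
    int chunk = 0;
    FILE *out = NULL;

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (out == NULL) {
            snprintf(name, sizeof name, "%s.%d", prefix, chunk);
            out = fopen(name, "w");
            if (out == NULL)
                return -1;
        }
        fputs(buf, out);
        /* Only count a line once its newline has been seen, so a very
         * long line spanning several fgets() calls is counted once. */
        if (strchr(buf, '\n') != NULL && ++line == lines_per_chunk) {
            fclose(out);
            out = NULL;
            line = 0;
            chunk++;
        }
    }
    if (out != NULL) {   /* final, partially filled chunk */
        fclose(out);
        chunk++;
    }
    return chunk;
}
```

Called with lines_per_chunk around a few million, each chunk of a 3.5 GB CSV would come out in the few-hundred-megabyte range suggested above.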