----- Original Message -----
From: "Dan Harrington" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, May 22, 2002 5:41 PM
Subject: Loading massive data set from CSV


> Greetings everyone,
>
> I have an ASCII CSV or Tab Delimited file that is
> roughly 3.5 gigabytes, and I want to load it into a mysql
> database so I can do some analysis.
>
> First of all, I'm wondering, is there anything I should be aware of,
> or worried about, size-wise?
>
> I know that I can't even look at the file using basic text functions
> in my Linux box like 'head' or 'split'
>
> Initially, I was thinking I'd use 'split' to break it into smaller chunks
> but split won't read it.
>
> I can't even use 'wc -l' to find out how many lines or records are in the
> file.

I thought most Linux utils were quite good at dealing with files, ie only
reading what's needed into memory, but I suppose it depends on the program.
(My guess is you're hitting the 2 gig large-file limit - on a 32-bit box,
anything compiled without large file support simply can't open a file that
big.)

But then again I have never tried to manage a 3.5 gig file :)

>
> There is a list of the fields in the file, so I know what my table should
> look like, but I don't want to crash the SQL server if it's too large a
> file, or something else like that.  I didn't know how big it was
> originally, so I was just going to use phpMyAdmin to load the file through
> a web browser.... though I don't know if that will work either.  Is there
> a size limitation to HTTP-POST (I assume it uses that method to upload).

I wouldn't think that phpMyAdmin / Apache (guessing at the web server here)
would be able to handle such a big file through CGI.
It would depend on how PHP handles the POST - if, like most programs, it just
tries to read the whole uploaded file into memory, then something is going to
keel over, either Apache or PHP. If I remember right PHP also has hard limits
in php.ini (upload_max_filesize and post_max_size), which default to only a
few meg, so the upload would probably get cut off long before memory became a
problem anyway.

You might be able to load the file into mysql from the mysql client prog -
ie do a LOAD DATA INFILE (that's the command that lets you load a table
straight from a file); again this will depend on how the program handles big
files.
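
A rough sketch of what that statement might look like - the path and table
name are just placeholders, and it assumes the table already exists and the
data really is comma-separated (the tab-delimited version would use '\t',
which is actually the default):

    LOAD DATA INFILE '/path/to/bigfile.csv'
        INTO TABLE my_table
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\n';

(Add LOCAL after DATA if the file sits on the client machine rather than on
the server.) As far as I know LOAD DATA works through the file a row at a
time rather than slurping the whole thing into memory, so the 3.5 gig itself
shouldn't be the problem.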

If that doesn't work my advice would be to split the file up into manageable
chunks.

I would say a few hundred megs at most, and try them one at a time.

Will 'grep' parse the file - ie can you do "grep someWord myBigFile.txt"?
If so, is there anything "splittable" in the file - ie if the file is a log
file with a date field then you could try
 grep "Sun 5th May" myBigFile.txt > 5thMay.txt, etc.

Just out of curiosity where did you get the file from?

If grep won't look at it:

How's your C?

You could write a C program to split the file, just reading and writing a few
thousand lines at a time.

If you don't know C then you could either try and learn it (if you are a
programmer), or alternatively give me a shout and I'll try and knock
something together for you. (I'm a bit rusty with my C though :)
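
Something along these lines is what I had in mind - just a sketch, untested,
and the chunk size, output names and maximum line length are all arbitrary:

/* splitcsv.c - chop a huge text file into chunks of LINES_PER_CHUNK lines.
 * Reads and writes one line at a time, so memory use stays tiny.
 */
#include <stdio.h>

#define LINES_PER_CHUNK 1000000L  /* tune this to get chunks of a few hundred meg */
#define MAX_LINE        65536     /* assumes no single line is longer than this */

int main(int argc, char *argv[])
{
    char line[MAX_LINE];
    char outname[64];
    long lines = 0;
    int  chunk = 0;
    FILE *in, *out = NULL;

    if (argc < 2) {
        fprintf(stderr, "usage: %s bigfile.csv\n", argv[0]);
        return 1;
    }
    if ((in = fopen(argv[1], "r")) == NULL) {
        perror("fopen input");
        return 1;
    }
    while (fgets(line, sizeof(line), in) != NULL) {
        if (lines % LINES_PER_CHUNK == 0) {   /* time to start a new chunk file */
            if (out != NULL)
                fclose(out);
            sprintf(outname, "chunk_%04d.csv", chunk++);
            if ((out = fopen(outname, "w")) == NULL) {
                perror("fopen output");
                return 1;
            }
        }
        fputs(line, out);
        lines++;
    }
    if (out != NULL)
        fclose(out);
    fclose(in);
    printf("%ld lines written to %d chunk files\n", lines, chunk);
    return 0;
}

You would probably need to build it with large file support, eg
gcc -D_FILE_OFFSET_BITS=64 -o splitcsv splitcsv.c, otherwise fopen() on a
32-bit box will refuse the 3.5 gig input just like head and split did.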

I have no idea about the maximum amount of data mysql can hold - I have never
had to deal with anything that was over a few hundred thousand records,
unfortunately. Maybe someone else can shed light on that; I keep hearing
that mysql can handle a LOT of rows.

Hope some of this helps...

Gav Brown

>
> Comments?
>
> Thanks
> Dan
>
>


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
