On 25/11/2016 15:17, Heli wrote:
I have a huge ASCII file (40 GB) with around 100M lines. I read this file
using:
f1 = np.loadtxt(os.path.join(dir, myfile), delimiter=None, skiprows=0)
x = f1[:, 1]
y = f1[:, 2]
z = f1[:, 3]
id = f1[:, 0]
I will need the x, y, z and id arrays later for interpolation. The problem is
that reading the file takes around 80 minutes, while the interpolation only
takes 15 minutes. Is there a more optimized way to read the file that would
reduce the read time?
I have the same problem when writing the output using np.savetxt.
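One common workaround (not from this thread, and the file names here are hypothetical) is to pay the text-parsing cost only once: convert the ASCII file to NumPy's binary .npy format, then load the binary file on every later run, optionally memory-mapped so it need not all sit in RAM at once. A minimal sketch, using a tiny stand-in file in place of the 40 GB one:

```python
import os
import numpy as np

txt_path = "data.txt"   # hypothetical paths for illustration
npy_path = "data.npy"

# Tiny stand-in for the 40 GB file: four whitespace-separated columns.
with open(txt_path, "w") as fh:
    fh.write("0 1.0 2.0 3.0\n"
             "1 4.0 5.0 6.0\n")

# One-time conversion: parse the text once, save as binary .npy.
if not os.path.exists(npy_path):
    arr = np.loadtxt(txt_path)
    np.save(npy_path, arr)

# Every later run loads the binary file; mmap_mode="r" memory-maps it,
# so data is paged in from disk on demand rather than read up front.
f1 = np.load(npy_path, mmap_mode="r")
ids, x, y, z = f1[:, 0], f1[:, 1], f1[:, 2], f1[:, 3]
print(x[0])   # -> 1.0
```

Binary loading skips text parsing entirely, which is where most of the 80 minutes goes; np.savetxt output can likewise be replaced with np.save. If the file must stay text, pandas.read_csv is typically much faster than np.loadtxt for large files, and np.loadtxt's usecols parameter avoids parsing columns you don't need.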
Is that read entirely into RAM? I suppose lines are discarded once they
are read, otherwise it would have to load 40 GB before it could do anything.
How much of your RAM is used up during the operation? If that starts to
get full then it can get very slow. (Same with a fragmented hard drive.)
Where does the file come from (with savetxt?); could it be generated
more compactly?
I don't quite understand what f1[:,1] does, but if that's a slice, slicing
can involve copying (and extra memory). But presumably you've already
measured the 80 minutes as just the np.loadtxt part. (I guess the
processing won't allow the file to be split into smaller pieces.)
--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list