On 25/11/2016 15:17, Heli wrote:

I have a huge ASCII file (40 GB) with around 100M lines. I read this file
using:

import os
import numpy as np

f1 = np.loadtxt(os.path.join(dir, myfile), delimiter=None, skiprows=0)

x = f1[:, 1]
y = f1[:, 2]
z = f1[:, 3]
id = f1[:, 0]

I will need the x, y, z and id arrays later for interpolation. The problem is 
that reading the file takes around 80 minutes, while the interpolation only takes 15 minutes.

I was wondering if there is a more optimized way to read the file that would 
reduce the load time.
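
One commonly suggested alternative for the reading side is to let pandas do the text parsing and hand the columns to NumPy afterwards; its C parser is generally much faster than np.loadtxt on large files. A minimal sketch, assuming pandas is available, the columns really are laid out as id, x, y, z, and dir/myfile are the same names as in the snippet above:

import os
import numpy as np
import pandas as pd

# pandas' C parser is usually much faster than np.loadtxt on big text files.
# sep=r"\s+" means "any run of whitespace", like loadtxt's delimiter=None.
df = pd.read_csv(os.path.join(dir, myfile),
                 sep=r"\s+",
                 header=None,
                 names=["id", "x", "y", "z"],
                 dtype=np.float64)

id = df["id"].values
x = df["x"].values
y = df["y"].values
z = df["z"].values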

I have the same problem when writing the output using np.savetxt.

Is that read entirely into RAM? I suppose lines are discarded once they are read, otherwise it would have to load 40 GB before it could do anything.

How much of your RAM is used up during the operation? If that starts to get full, things can get very slow. (Same with a fragmented hard drive.)

Where does the file come from (with savetxt?); could it be generated more compactly?
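
If the producing program can be changed, a binary format sidesteps the text formatting and parsing entirely. A minimal sketch using np.save/np.load, assuming the data already exists as NumPy arrays on the writing side; "points.npy" is just a placeholder filename:

import numpy as np

# Writing side: one binary .npy file instead of 100M formatted text lines.
data = np.column_stack((id, x, y, z))
np.save("points.npy", data)

# Reading side: no per-line parsing, the raw float64 values are read back
# directly (or memory-mapped with np.load("points.npy", mmap_mode="r")).
data = np.load("points.npy")
id, x, y, z = data[:, 0], data[:, 1], data[:, 2], data[:, 3]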

I don't quite understand what f1[:,1] does, but if that's a slice, that usually involves copying (and extra memory). But presumably you've already measured the 80 minutes just to do the np.loadtxt part. (I guess the processing won't allow the file to be split into smaller pieces.)
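
For what it's worth, a basic column slice like f1[:,1] returns a view of the array rather than a copy, so the slices themselves shouldn't cost much extra memory; np.loadtxt can also hand back one array per column directly with unpack=True. A small sketch, with f1 standing in for the loaded array:

import numpy as np

f1 = np.zeros((5, 4))                 # small stand-in for the loaded array

col = f1[:, 1]                        # basic slice: a view, not a copy
print(np.shares_memory(col, f1))      # True -> no extra data was allocated

# loadtxt can also return one 1-D array per column directly:
# id, x, y, z = np.loadtxt(os.path.join(dir, myfile), unpack=True)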


--
Bartc

