On 25/11/2016 15:17, Heli wrote:
I have a huge ASCII file (40 GB) with around 100M lines. I read this file
using:
f1 = np.loadtxt(os.path.join(dir, myfile), delimiter=None, skiprows=0)
x = f1[:, 1]
y = f1[:, 2]
z = f1[:, 3]
id = f1[:, 0]
I will need the x, y, z and id arrays later for interpolation. The problem is
that reading the file takes around 80 minutes, while the interpolation only
takes 15 minutes. Is there a more optimized way to read the file that would
reduce the read time?
I have the same problem when writing the output using np.savetxt.
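One common workaround (not from this thread, and the file names here are hypothetical) is to pay the text-parsing cost only once: convert the ASCII file to NumPy's binary .npy format, then load the binary file on every later run, optionally memory-mapped so it need not all sit in RAM at once. A minimal sketch, using a tiny stand-in file in place of the 40 GB one:

```python
import os
import numpy as np

txt_path = "data.txt"   # hypothetical paths for illustration
npy_path = "data.npy"

# Tiny stand-in for the 40 GB file: four whitespace-separated columns.
with open(txt_path, "w") as fh:
    fh.write("0 1.0 2.0 3.0\n"
             "1 4.0 5.0 6.0\n")

# One-time conversion: parse the text once, save as binary .npy.
if not os.path.exists(npy_path):
    arr = np.loadtxt(txt_path)
    np.save(npy_path, arr)

# Every later run loads the binary file; mmap_mode="r" memory-maps it,
# so data is paged in from disk on demand rather than read up front.
f1 = np.load(npy_path, mmap_mode="r")
ids, x, y, z = f1[:, 0], f1[:, 1], f1[:, 2], f1[:, 3]
print(x[0])   # -> 1.0
```

Binary loading skips text parsing entirely, which is where most of the 80 minutes goes; np.savetxt output can likewise be replaced with np.save. If the file must stay text, pandas.read_csv is typically much faster than np.loadtxt for large files, and np.loadtxt's usecols parameter avoids parsing columns you don't need.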
Is that read entirely into RAM? I suppose lines are discarded once they
are read, otherwise it would have to load 40 GB before it could do anything.
How much of your RAM is used up during the operation? If that starts to
get full then it can get very slow. (Same with a fragmented hard drive.)
Where does the file come from (with savetxt?); could it be generated
more compactly?
I don't quite understand what f1[:,1] does, but if that's a slice, slicing
can involve copying (and extra memory). But presumably you've already
measured the 80 minutes as just the np.loadtxt part. (I guess the
processing won't allow the file to be split into smaller pieces.)
--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list