On Sat, 26 Nov 2016 02:17 am, Heli wrote:

> Hi,
>
> I have a huge ascii file(40G) and I have around 100M lines. I read this
> file using :
>
> f=np.loadtxt(os.path.join(dir,myfile),delimiter=None,skiprows=0)
[...]
> I will need the x,y,z and id arrays later for interpolations. The problem
> is reading the file takes around 80 min while the interpolation only takes
> 15 mins.
There's no way of telling whether this is good performance or bad performance.
Where are you reading it from? Over a network file share on the other side of
the world? From a USB hard drive with a USB 2 cable? From a blazing fast SSD
running on a server-class machine with a TB of RAM?

My suggestion is that before you spend any more time trying to optimize the
software, you try a simple test that will tell you whether or not you are
wasting your time: try making a copy of this 40GB file. I'd expect that making
a copy should take *longer* than just reading the file, because you have to
both read and write 40GB. If you find that it takes (let's say) 30 minutes to
read and write a copy of the file, and 80 minutes for numpy to read the file,
then it's worth looking at optimizing the process. But if it takes (say) 200
minutes to read and write the file, then probably not. When you're dealing
with large quantities of data, it simply takes time to move that many bytes
from place to place.

It would also help if you told us a bit more about the machine you are
running on, specifically how much RAM you have. If you're trying to process a
40GB file in memory on a machine with 2GB of RAM, you're going to have a bad
time...

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list
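The copy-vs-parse test described above can be sketched roughly as follows. This is only an illustration, not anything from the thread: the file paths and the `time_copy` helper name are made up, and on a real 40GB file you would run each step once rather than in a loop.

```python
import shutil
import time


def time_copy(src, dst):
    """Time a raw byte-for-byte copy of the file.

    If the copy takes about as long as (or longer than) the
    np.loadtxt call, the bottleneck is disk/network I/O and
    optimizing the parsing code won't buy much. If the copy is
    much faster, the time is going into parsing, and a faster
    reader is worth investigating.
    """
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    return time.perf_counter() - start
```

Compare the result against the 80 minutes the loadtxt call takes; the ratio between the two numbers is what tells you whether optimizing the software is worth the effort.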