Re: dict is really slow for big truck

Bruno Desthuilliers Wed, 29 Apr 2009 08:11:05 -0700

bearophileh...@lycos.com wrote:

On Apr 28, 2:54 pm, forrest yang <gforrest.y...@gmail.com> wrote:

I am trying to load a big file, about 9,000,000 lines, into a dict,
something like
1 2 3 4
2 2 3 4
3 4 5 6


code
for line in open(file):
   arr=line.strip().split('\t')
   dict[line.split(None, 1)[0]]=arr

but the dict gets really slow as I load more data into memory. By
the way, the Mac I use has 16 GB of memory.
Is this caused by poor performance when the dict grows in memory, or by
some other reason?
Can anyone provide a better solution?


Keys are integers,


Actually strings. But this is probably not the problem here.

so they are very efficiently managed by the dict.
If I do this:
d = dict.fromkeys(xrange(9000000))
It takes only a little more than a second on my normal PC.
So probably the problem isn't in the dict, it's the I/O

If the OP experiences a noticeable slowdown during the process, then I doubt the problem is with IO. If he finds the process to be slow but of constant slowness, then it may or may not have to do with IO, but possibly not as the single factor.


Hint: don't guess, profile.
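
A minimal sketch of what profiling the load could look like, using the standard cProfile and pstats modules under Python 3. The load() function and the synthetic sample lines are made up for illustration; the real file and its size will give different numbers.

```python
import cProfile
import io
import pstats


def load(lines):
    """Build the dict with one split per line, keyed on the first field."""
    d = {}
    for line in lines:
        arr = line.strip().split()
        d[arr[0]] = arr
    return d


# Synthetic stand-in for the real file (hypothetical data, far smaller).
sample = ["%d\t2\t3\t4\n" % i for i in range(200000)]

profiler = cProfile.Profile()
profiler.enable()
result = load(sample)
profiler.disable()

# Print the five most expensive entries, sorted by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

If the report shows most of the time inside the split and strip calls, the parsing dominates; if the loop itself is cheap but the wall-clock time is long, look elsewhere (IO, memory).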

and/or the
list allocation. A possible suggestion is to not split the arrays,


The OP is actually splitting a string.

but
keep it as strings, and split them only when you use them:

d = {}
for line in open(file):
  line = line.strip()
  d[line.split(None, 1)[0]] = line


You still split the string - but only once, which is indeed better !-)

But you can have your cake and eat it too:

d = {}
for line in open(thefile):
   arr = line.strip().split()
   d[arr[0]] = arr

if that's not fast enough you can simplify it:

d = {}
for line in open(file):
  d[line.split(None, 1)[0]] = line


I doubt this will save that much processing time...
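
Rather than guess, the two variants can be timed against each other. A minimal sketch in Python 3 with the standard timeit module, assuming synthetic in-memory lines; the sample data and the function names split_all and split_key_only are made up for illustration, and the numbers will depend on the real file and machine.

```python
import timeit

# Hypothetical sample data standing in for the real file.
lines = ["%d\t2\t3\t4\n" % i for i in range(100000)]


def split_all(lines):
    """Split every field once and store the resulting list."""
    d = {}
    for line in lines:
        arr = line.strip().split()
        d[arr[0]] = arr
    return d


def split_key_only(lines):
    """Extract only the key and store the raw line."""
    d = {}
    for line in lines:
        d[line.split(None, 1)[0]] = line
    return d


# Take the best of three runs to reduce noise from other processes.
t_all = min(timeit.repeat(lambda: split_all(lines), number=1, repeat=3))
t_key = min(timeit.repeat(lambda: split_key_only(lines), number=1, repeat=3))
print("split every field: %.3f s" % t_all)
print("split key only:    %.3f s" % t_key)
```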

If you still have memory problems, you can keep only the line
number as dict values, or even absolute file positions, to seek later.
You can also use memory mapped files.
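
The file-position idea could look roughly like this. A sketch in Python 3, assuming the key is the first whitespace-separated field of each line; build_offset_index and fetch are hypothetical helper names, and the small temporary file only stands in for the real data. The file is opened in binary mode so that the offsets are plain byte counts, which means keys and fields come back as bytes.

```python
import os
import tempfile


def build_offset_index(path):
    """Map each line's first field to the byte offset where the line starts."""
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            fields = line.split(None, 1)
            if fields:  # skip blank lines
                index[fields[0]] = offset
            offset += len(line)
    return index


def fetch(path, index, key):
    """Seek to the stored offset and split that single line on demand."""
    with open(path, "rb") as f:
        f.seek(index[key])
        return f.readline().split()


# Small demo file using the sample rows from the original question.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"1\t2\t3\t4\n2\t2\t3\t4\n3\t4\t5\t6\n")
    demo_path = tmp.name

index = build_offset_index(demo_path)
row = fetch(demo_path, index, b"3")
print(row)
os.remove(demo_path)
```

The dict then holds one small integer per key instead of a list of strings, at the cost of a seek and a read for every lookup.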

Tell us how the performance is now.


IMHO, not much better...
--
http://mail.python.org/mailman/listinfo/python-list
