On 28/02/13 13:27, DoanVietTrungAtGmail wrote:
Dear tutors
My function below simply populates a large dict. When measured by timeit
populating 10 million items once, versus populating 1 million items ten
times, the times are noticeably different:
I cannot replicate your results. When I try it, I get more or less the same
result each time:
py> for count, N in ((1, 10000000), (10, 1000000)):
... t = timeit.Timer('f(N)', 'from __main__ import N, writeDict as f')
... print min(t.repeat(number=count))
...
7.2105910778
7.17914915085
The difference is insignificant.
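(For readers who don't have the original post: writeDict wasn't quoted here, but from the description it is something like this hypothetical reconstruction, using list values so the items stay mutable:)

```python
# Hypothetical reconstruction of the kind of function being timed:
# populate a dict mapping each integer key to a small mutable list.
def writeDict(N):
    D = {}
    for i in range(N):
        D[i] = [i, i + 1]  # lists, not tuples, so entries stay mutable
    return D

d = writeDict(1000)
print(len(d))  # 1000
```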
However, I did notice that when I ran your code, memory consumption went to
80% on my computer, and the load average exceeded 4. I suggest that perhaps
the results you are seeing have something to do with your operating system's
response to memory usage, or some other external factor.
[...]
My guess is that this discrepancy is a result either of some sort of
overhead in timeit, or of Python having to allocate memory space for a dict
10 times. What do you think, and how to find out for sure?
Whatever the answer is, it is neither of the above.
Firstly, while timeit does have some overhead, it is very small. After all,
timeit is designed for timing tiny sub-microsecond code snippets.
You can get an idea of timeit's overhead like this:
py> from timeit import Timer
py> t1 = Timer("x = 2")
py> t2 = Timer("x = 1;x = 2")
py> min(t1.repeat())
0.048729896545410156
py> min(t2.repeat())
0.06900882720947266
If timeit had no overhead at all, t2 should take twice as long as t1 since
it has two instructions rather than one. But it doesn't, so we can calculate
the (approximate) overhead with a bit of maths:
overhead + t = 0.0487
overhead + 2t = 0.0690
Solving this gives me an overhead of 0.0284s, which is spread over the one
million loops that timeit does by default. So as you can see, it's quite
small: about 28 nanoseconds per loop on my computer. Even if it were a
million times greater, it wouldn't be enough to explain the results you see.
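Spelled out as code, the two simultaneous equations solve like this (using the timings from my run above):

```python
# Two measurements, each over timeit's default one million loops:
#   overhead + t  = 0.0487   (one statement)
#   overhead + 2t = 0.0690   (two statements)
t1 = 0.0487
t2 = 0.0690
t = t2 - t1        # time for one statement, per million loops
overhead = t1 - t  # timeit's fixed overhead, per million loops
print(overhead)        # about 0.028 seconds total
print(overhead / 1e6)  # about 28 nanoseconds per loop
```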
As for your other suggestion, about the memory space allocation, it is also
unlikely to be correct. You are using the same dict on every test! On the
first run, Python has to repeatedly resize the dict to make it big enough
for 10 million entries. After that, the dict has already grown and never
gets any bigger.
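You can see that resizing happens only occasionally, not on every insert, with sys.getsizeof (a rough sketch; exact sizes vary by Python version):

```python
import sys

# Watch a dict's memory footprint while inserting 1000 items.
# It only jumps at a handful of resize points, not on each insert.
d = {}
sizes = []
for i in range(1000):
    d[i] = [i]
    sizes.append(sys.getsizeof(d))

print(len(set(sizes)))  # a small number of distinct sizes
```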
Second (for me, this question is more important), how to improve
performance? (I tried a tuple rather than a list for the dict values, and it
was slightly faster, but I need the dict values to be mutable.)
Firstly, are you sure you need to improve performance?
Secondly, performance of what? Have you profiled your application to see
which parts are slow, or are you just guessing?
As it stands, I cannot advise you how to speed your application up, because
I don't know what it does or what bits are slow. But I doubt very much that
the slow part is storing a small list inside a dict.
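To find out for sure, profile it. A minimal session with the standard library's cProfile looks like this (build_data is a hypothetical stand-in for whatever your real application does):

```python
import cProfile
import io
import pstats

def build_data(n):
    # Stand-in for the real work being profiled.
    d = {}
    for i in range(n):
        d[i] = [i, i * 2]
    return d

profiler = cProfile.Profile()
profiler.enable()
build_data(100000)
profiler.disable()

# Report the five functions with the largest cumulative time.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

The report tells you which functions actually dominate the runtime, which is a much better guide than guessing.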
--
Steven
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor