Re: Reentrancy of Python interpreter

2007-09-29 Thread bvukov
On Sep 28, 11:31 pm, Brad Johnson <[EMAIL PROTECTED]>
wrote:
> I have embedded a single threaded instance of the Python interpreter in my
> application.
>
> I have a place where I execute a Python command that calls into C++ code which
> then in turn calls back into Python using the same interpreter. I get a fatal
> error which is "PyThreadStage_Get: no current thread."
>
> I guess I understand why I can't have two invocations of the same interpreter
> thread in one call stack, but how would I go about solving this?

Looks like ( judging from the PyThreadState_Get error ) you have lost the
GIL. You probably entered some C++ code and wrapped your work in

Py_BEGIN_ALLOW_THREADS
...
Py_END_ALLOW_THREADS

but your C++ code is calling back into a Python function, and you forgot
to re-acquire the GIL first. The usual way to do that is to bracket the
callback with PyGILState_Ensure() / PyGILState_Release().
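
Roughly ( an untested sketch, assuming the callback reaches your C++ layer
as a plain PyObject* callable rather than through whatever wrapper you
actually use ), the C side could look like this:

#include <Python.h>

/* Re-acquire the GIL before touching the interpreter from C++ code that
 * may be running inside a Py_BEGIN/END_ALLOW_THREADS region. */
void call_back_into_python(PyObject *callable)
{
    PyGILState_STATE state = PyGILState_Ensure();   /* grab the GIL */

    PyObject *result = PyObject_CallObject(callable, NULL);
    if (result == NULL)
        PyErr_Print();          /* report the Python exception, if any */
    else
        Py_DECREF(result);

    PyGILState_Release(state);  /* hand the GIL back */
}

PyGILState_Ensure() can be called whether or not the thread already holds
the GIL, so it also covers the single-threaded embedding case.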

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Adding tuples to a dictionary

2007-05-31 Thread bvukov
On May 31, 8:30 pm, Maciej BliziƄski <[EMAIL PROTECTED]>
wrote:
> Hi Pythonistas!
>
> I've got a question about storing tuples in a dictionary. First, a
> small test case which creates a list of dictionaries:
>
> import time
>
> list_of_dicts = []
> keys = [str(x) for x in range(20)]
> prev_clk = time.clock()
> for i in range(20):
>     my_dict = {}
>     for key in keys:
>         my_dict[key] = key
>     list_of_dicts.append(my_dict)
>     new_clk = time.clock()
>     print i, new_clk - prev_clk
>     prev_clk = new_clk
>
> It creates dictionaries and stores them in a list, printing out
> execution times. The size of each dictionary is constant, so is the
> execution time for each iteration.
>
> 0 0.1
> 1 0.1
> 2 0.1
> 3 0.08
> 4 0.09
>
> ...and so on.
>
> Then, just one line is changed:
> my_dict[key] = key
> into:
> my_dict[key] = (key, key)
>
> Full code:
>
> list_of_dicts = []
> keys = [str(x) for x in range(20)]
> prev_clk = time.clock()
> for i in range(20):
>     my_dict = {}
>     for key in keys:
>         my_dict[key] = (key, key)
>     list_of_dicts.append(my_dict)
>     new_clk = time.clock()
>     print i, new_clk - prev_clk
>     prev_clk = new_clk
>
> The difference is that tuples are added to the dictionary instead of
> single values. When the program is run again:
>
> 0 0.27
> 1 0.37
> 2 0.49
> 3 0.6
> ...
> 16 2.32
> 17 2.45
> 18 2.54
> 19 2.68
>
> The execution time is rising linearly with every new dictionary
> created.
>
> Next experiment: dictionaries are not stored in a list, they are just
> left out when an iteration has finished. It's done by removing two
> lines:
>
> list_of_dicts = []
>
> and
>
> list_of_dicts.append(my_dict)
>
> Full code:
>
> keys = [str(x) for x in range(20)]
> prev_clk = time.clock()
> for i in range(20):
>     my_dict = {}
>     for key in keys:
>         my_dict[key] = (key, key)
>     new_clk = time.clock()
>     print i, new_clk - prev_clk
>     prev_clk = new_clk
>
> The time is constant again:
>
> 0 0.28
> 1 0.28
> 2 0.28
> 3 0.26
> 4 0.26
>
> I see no reason for this kind of performance problem, really. It
> happens when both things are true: dictionaries are kept in a list (or
> more generally, in memory) and they store tuples.
>
> As this goes beyond my understanding of Python internals, I would like
> to kindly ask, if anyone has an idea about how to create this data
> structure (list of dictionaries of tuples, assuming that size of all
> dictionaries is the same), in constant time?
>
> Regards,
> Maciej

Let me comment on what happens in your code.
The places where you create new objects are:

keys = [str(x) for x in range(20)]   # here you create 20 strings, which
                                     # are then reused by reference
and

my_dict[key] = (key, key)   # here you create a new tuple with 2 elements
                            # ( both elements are key, so it just takes two
                            # references to the existing key object )
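
To make the sharing concrete, here is a tiny snippet ( mine, not from your
code ) showing that the key strings are shared while every tuple is a
fresh object:

keys = [str(x) for x in range(20)]

d1 = {}
d2 = {}
for key in keys:
    d1[key] = (key, key)
    d2[key] = (key, key)

k = keys[0]
print d1[k][0] is d2[k][0]   # True: both tuples reference the same key string
print d1[k] is d2[k]         # False: the tuples themselves are separate allocations
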
The tricky part is where you wrote:

for key in keys:
    my_dict[key] = (key, key)
list_of_dicts.append(my_dict)   # note that list_of_dicts.append is inside
                                # the outer loop! check upstairs!

This means that a reference to every my_dict is kept ( 20 of them by the
end ), so none of them can be released. The statement

my_dict = {}

will always create a new dictionary ( 20 iterations means 20 new
dictionaries ) and start over.
Since Python caches freed dictionaries ( they are used everywhere, so they
get recycled after deletion ), that reuse cannot happen here, and memory
has to be allocated again every time.
Lists are internally like arrays: when there is not enough space for the
next element, the pointer array is grown geometrically, so there is no
huge penalty in the append function. Lists are also reused from an
internal cache.
Dictionaries have a growth policy as well: when there is no room for the
next key, the internal hash table is resized upward.
The reason the time keeps growing is that memory allocation is taking
place instead of existing objects being reused by reference. Check the
memory usage, and you'll see that the test time is pretty much
proportional to the overall memory usage.
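
For example, a quick variation of your test ( assuming a Unix-like box, so
the resource module can report the peak RSS ) that prints the
per-iteration time next to the memory high-water mark:

import resource   # Unix only
import time

keys = [str(x) for x in range(20)]
list_of_dicts = []
prev_clk = time.clock()
for i in range(20):
    my_dict = {}
    for key in keys:
        my_dict[key] = (key, key)
    list_of_dicts.append(my_dict)
    new_clk = time.clock()
    # ru_maxrss is the peak resident set size so far ( kilobytes on Linux )
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print i, new_clk - prev_clk, rss
    prev_clk = new_clk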

Regards, Bosko

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Binary file output using python

2007-04-17 Thread bvukov
On Apr 17, 10:30 pm, Thomas Dybdahl Ahle <[EMAIL PROTECTED]> wrote:
> On Tue, 17 Apr 2007 11:07:38 -0700, kyosohma wrote:
>
> > On Apr 17, 12:41 pm, Chi Yin Cheung <[EMAIL PROTECTED]> wrote:
> >> Hi,
> >> Is there a way in python to output binary files? I need to python to
> >> write out a stream of 5 million floating point numbers, separated by
> >> some separator, but it seems that all python supports natively is
> >> string information output, which is extremely space inefficient.
>
> I don't understand. To me it seems like there is no space difference:
>
> [EMAIL PROTECTED] ~]$ python
> Python 2.4.4 (#1, Oct 23 2006, 13:58:00)
> [GCC 4.1.1 20061011 (Red Hat 4.1.1-30)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.>>> f = 
> open("test2", "w")
> >>> f.write(str(range(10**7)))
> >>> f.close()
> >>> f = open("test", "wb")
> >>> f.write(str(range(10**7)))
> >>> f.close()
>
> [EMAIL PROTECTED] ~]$ ls -l test test2
> -rw-rw-r-- 1 thomas thomas 8890 17 apr 22:28 test
> -rw-rw-r-- 1 thomas thomas 8890 17 apr 22:27 test2
> [EMAIL PROTECTED] ~]$

That's OK, but he might also take a look at the 'struct' module, which can
solve the "stream of 5 million floating point numbers, separated by some
separator" part of the issue ( if a binary format is needed ). From the
python docs...
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
>>> calcsize('hhl')
8
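
For the actual use case, something along these lines ( an untested sketch;
the file name and sample data are made up ) would dump 5 million doubles
with no separator at all, since every record has a fixed size:

import struct

values = [x * 0.5 for x in xrange(5000000)]   # 5 million floats, sample data

f = open("floats.bin", "wb")
chunk = 4096
for start in xrange(0, len(values), chunk):
    part = values[start:start + chunk]
    # '<%dd' packs the chunk as little-endian 8-byte doubles
    f.write(struct.pack("<%dd" % len(part), *part))
f.close()

# reading it back
f = open("floats.bin", "rb")
data = f.read()
values_back = struct.unpack("<%dd" % (len(data) // 8), data)
f.close()

With fixed-size records the reader can also seek straight to record i at
offset i * 8 instead of scanning for separators.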


-- 
http://mail.python.org/mailman/listinfo/python-list