Re: Please help with Threading

2013-06-02 Thread Jurgens de Bruin
On Saturday, 18 May 2013 10:58:13 UTC+2, Jurgens de Bruin  wrote:
 This is my first script where I want to use the Python threading module. I 
 have a large dataset which is a list of dicts; there can be as many as 200 
 dictionaries in the list. The final goal is a histogram for each dict, 16 
 histograms on a page (4x4) - this already works. 
 
 What I currently do is create a nested list [ [ {} ], [ {} ] ]; each inner 
 list contains 16 dictionaries, so each inner list is a single page of 16 
 histograms. Iterating over the outer list and creating the graphs takes too 
 long, so I would like multiple inner lists to be processed simultaneously, 
 creating the graphs in parallel. 
 
 I am trying to use Python threading for this. I create 4 threads, loop 
 over the outer list and send an inner list to each thread. This seems to work 
 if my nested list only contains 2 elements - thus fewer elements than 
 threads. Currently the script runs and then seems to get hung up. I monitor 
 the resources on my Mac and Python starts off well, using 80%, but when the 
 fourth thread is created the CPU usage drops to 0%. 
 
 My thread creation is based on the following: 
 http://www.tutorialspoint.com/python/python_multithreading.htm
 
 Any help would be great!!!

Thanks to all for the discussion/comments on threading; although I have not 
been commenting, I have been following. I have learnt a lot and I am still 
reading up on everything mentioned. Thanks again.
Will see how I am going to solve my scenario. 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Fábio Santos
On 18 May 2013 20:33, Dennis Lee Bieber wlfr...@ix.netcom.com wrote:
 Python threads work fine if the threads either rely on intelligent
 DLLs for number crunching (instead of doing nested Python loops to
 process a numeric array you pass it to something like NumPy which
 releases the GIL while crunching a copy of the array) or they do lots of
 I/O and have to wait for I/O devices (while one thread is waiting for
 the write/read operation to complete, another thread can do some number
 crunching).

Has nobody thought of a context manager to allow a part of your code to
free up the GIL? I think the GIL is not inherently bad, but if it poses a
problem at times, there should be a way to get it out of your... Way.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Cameron Simpson
On 20May2013 07:25, Fábio Santos fabiosantos...@gmail.com wrote:
| On 18 May 2013 20:33, Dennis Lee Bieber wlfr...@ix.netcom.com wrote:
|  Python threads work fine if the threads either rely on intelligent
|  DLLs for number crunching (instead of doing nested Python loops to
|  process a numeric array you pass it to something like NumPy which
|  releases the GIL while crunching a copy of the array) or they do lots of
|  I/O and have to wait for I/O devices (while one thread is waiting for
|  the write/read operation to complete, another thread can do some number
|  crunching).
| 
| Has nobody thought of a context manager to allow a part of your code to
| free up the GIL? I think the GIL is not inherently bad, but if it poses a
| problem at times, there should be a way to get it out of your... Way.

The GIL makes individual python operations thread safe by never
running two at once. This makes the implementation of the operations
simpler, faster and safer. It is probably totally infeasible to
write meaningful python code inside your suggested context
manager that didn't rely on the GIL; if the GIL were not held the
code would be unsafe.

It is easy for a C extension to release the GIL, and then to do
meaningful work until it needs to return to python land. Most C
extensions will do that around non-trivial sections, and anything
that may stall in the OS.

So your use case for the context manager doesn't fit well.
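
The practical effect is easy to see from pure Python; a minimal sketch
(timings are approximate and machine dependent): four threads that block in
time.sleep() overlap almost completely, because the sleep releases the GIL
inside the OS call, while four CPU-bound pure-Python threads take roughly as
long as running them one after another.

from __future__ import print_function
import threading, time

def blocked():
    time.sleep(1)              # sleep releases the GIL while waiting in the OS

def crunch():
    total, i = 0, 0
    while i < 3 * 10**6:       # pure-Python loop: holds the GIL at every step
        total += i
        i += 1

def timed(target):
    threads = [threading.Thread(target=target) for _ in range(4)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

print('4 sleeping threads:  %.2fs' % timed(blocked))   # ~1s: they overlap
print('4 crunching threads: %.2fs' % timed(crunch))    # roughly serial time, or worse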
-- 
Cameron Simpson c...@zip.com.au

Gentle suggestions being those which are written on rocks of less than 5lbs.
- Tracy Nelson in comp.lang.c
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Please help with Threading

2013-05-20 Thread Carlos Nepomuceno

 Date: Sun, 19 May 2013 13:10:36 +1000
 From: c...@zip.com.au
 To: carlosnepomuc...@outlook.com
 CC: python-list@python.org
 Subject: Re: Please help with Threading

 On 19May2013 03:02, Carlos Nepomuceno carlosnepomuc...@outlook.com wrote:
 | Just been told that GIL doesn't make things slower, but as I
 | didn't know that such a thing even existed I went out looking for
 | more info and found that document:
 | http://www.dabeaz.com/python/UnderstandingGIL.pdf
 |
 | Is it current? I didn't know Python threads aren't preemptive.
 | Seems to be something really old considering the state of the art
 | on parallel execution on multi-cores.
 | What's the catch on making Python threads preemptive? Are there any ongoing 
 projects to make that?

 Depends what you mean by preemptive. If you have multiple CPU bound
 pure Python threads they will all get CPU time without any of them
 explicitly yielding control. But thread switching happens between
 python instructions, mediated by the interpreter.

I meant operating system preemptive. I've just checked and Python does not 
start Windows threads.

 The standard answers for using multiple cores are to either run
 multiple processes (either explicitly spawning other executables,
 or spawning child python processes using the multiprocessing module),
 or to use (as suggested) libraries that can do the compute intensive
 bits themselves, releasing the GIL while doing so, so that the Python
 interpreter can run other bits of your python code.

I've just discovered the multiprocessing module[1] and will make some tests 
with it later. Are there any other modules for that purpose?

I've found the following articles about Python threads. Any suggestions?

http://www.ibm.com/developerworks/aix/library/au-threadingpython/
http://pymotw.com/2/threading/index.html
http://www.laurentluce.com/posts/python-threads-synchronization-locks-rlocks-semaphores-conditions-events-and-queues/


[1] http://docs.python.org/2/library/multiprocessing.html
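
A first test with the module could be as small as this (just a sketch;
square() is a stand-in workload, and the __main__ guard matters on platforms
that spawn fresh worker processes):

from multiprocessing import Pool

def square(n):                           # runs in a separate worker process
    return n * n

if __name__ == '__main__':
    pool = Pool(processes=4)
    print(pool.map(square, range(10)))   # [0, 1, 4, 9, ..., 81]
    pool.close()
    pool.join()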


 Plenty of OS system calls (and calls to other libraries from the
 interpreter) release the GIL during the call. Other python threads
 can run during that window.

 And there are other Python implementations other than CPython.

 Cheers,
 --
 Cameron Simpson c...@zip.com.au

 Processes are like potatoes. - NCR device driver manual   
   
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Please help with Threading

2013-05-20 Thread Carlos Nepomuceno

 Date: Mon, 20 May 2013 17:45:14 +1000
 From: c...@zip.com.au
 To: fabiosantos...@gmail.com
 Subject: Re: Please help with Threading
 CC: python-list@python.org; wlfr...@ix.netcom.com

 On 20May2013 07:25, Fábio Santos fabiosantos...@gmail.com wrote:
 | On 18 May 2013 20:33, Dennis Lee Bieber wlfr...@ix.netcom.com wrote:
 | Python threads work fine if the threads either rely on intelligent
 | DLLs for number crunching (instead of doing nested Python loops to
 | process a numeric array you pass it to something like NumPy which
 | releases the GIL while crunching a copy of the array) or they do lots of
 | I/O and have to wait for I/O devices (while one thread is waiting for
 | the write/read operation to complete, another thread can do some number
 | crunching).
 |
 | Has nobody thought of a context manager to allow a part of your code to
 | free up the GIL? I think the GIL is not inherently bad, but if it poses a
 | problem at times, there should be a way to get it out of your... Way.

 The GIL makes individual python operations thread safe by never
 running two at once. This makes the implementation of the operations
 simpler, faster and safer. It is probably totally infeasible to
 write meaningful python code inside your suggested context
 manager that didn't rely on the GIL; if the GIL were not held the
 code would be unsafe.

I just got my hands dirty trying to synchronize Python prints from many threads.
Sometimes they mess up when printing the newlines. 

I tried several approaches using threading.Lock and Condition. None of them 
worked perfectly and all of them made the code sluggish. 

Is there a 100% sure method to make print thread safe? Can it be fast???


 It is easy for a C extension to release the GIL, and then to do
 meaningful work until it needs to return to python land. Most C
 extensions will do that around non-trivial sections, and anything
 that may stall in the OS.

 So your use case for the context manager doesn't fit well.
 --
 Cameron Simpson c...@zip.com.au

 Gentle suggestions being those which are written on rocks of less than 5lbs.
 - Tracy Nelson in comp.lang.c
 --
 http://mail.python.org/mailman/listinfo/python-list   
   
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Fábio Santos
My use case was a tight loop processing an image pixel by pixel, or
crunching a CSV file. If it only uses local variables (and probably holds a
lock before releasing the GIL) it should be safe, no?

My idea is that it's a little bad to have to write C or use multiprocessing
just to do simultaneous calculations. I think an application using a
reactor loop such as twisted would actually benefit from this. Sure, it
will be slower than a C implementation of the same loop, but isn't fast
prototyping a very important feature of the Python language?
On 20 May 2013 08:45, Cameron Simpson c...@zip.com.au wrote:

 On 20May2013 07:25, Fábio Santos fabiosantos...@gmail.com wrote:
 | On 18 May 2013 20:33, Dennis Lee Bieber wlfr...@ix.netcom.com wrote:
 |  Python threads work fine if the threads either rely on
 intelligent
 |  DLLs for number crunching (instead of doing nested Python loops to
 |  process a numeric array you pass it to something like NumPy which
 |  releases the GIL while crunching a copy of the array) or they do lots
 of
 |  I/O and have to wait for I/O devices (while one thread is waiting for
 |  the write/read operation to complete, another thread can do some number
 |  crunching).
 |
 | Has nobody thought of a context manager to allow a part of your code to
 | free up the GIL? I think the GIL is not inherently bad, but if it poses a
 | problem at times, there should be a way to get it out of your... Way.

 The GIL makes individual python operations thread safe by never
 running two at once. This makes the implementation of the operations
 simpler, faster and safer. It is probably totally infeasible to
 write meaningful python code inside your suggested context
 manager that didn't rely on the GIL; if the GIL were not held the
 code would be unsafe.

 It is easy for a C extension to release the GIL, and then to do
 meaningful work until it needs to return to python land. Most C
 extensions will do that around non-trivial sections, and anything
 that may stall in the OS.

 So your use case for the context manager doesn't fit well.
 --
 Cameron Simpson c...@zip.com.au

 Gentle suggestions being those which are written on rocks of less than
 5lbs.
 - Tracy Nelson in comp.lang.c

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Cameron Simpson
On 20May2013 10:53, Carlos Nepomuceno carlosnepomuc...@outlook.com wrote:
| I just got my hands dirty trying to synchronize Python prints from many 
threads.
| Sometimes they mess up when printing the newlines. 
| I tried several approaches using threading.Lock and Condition.
| None of them worked perfectly and all of them made the code sluggish.

Show us some code, with specific complaints.

Did you try this?

  _lock = Lock()

  def lprint(*a, **kw):
      global _lock
      with _lock:
          print(*a, **kw)

and use lprint() everywhere?

For generality the lock should be per file: the above hack uses one
lock for any file, so that's going to stall overlapping prints to
different files; inefficient.

There are other things than the above, but at least individual prints will
never overlap. If you have interleaved prints, show us.
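
For what it's worth, a per-file variant of the same idea could look something
like this (only a sketch; it keys the locks on whatever object is passed as
file=, defaulting to sys.stdout):

from __future__ import print_function
import sys
from collections import defaultdict
from threading import Lock

_registry_lock = Lock()
_file_locks = defaultdict(Lock)      # one lock per output file

def lprint(*a, **kw):
    f = kw.get('file', sys.stdout)
    with _registry_lock:             # fetch/create this file's lock atomically
        lock = _file_locks[id(f)]
    with lock:                       # only prints to the same file serialize
        print(*a, **kw)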

| Is there a 100% sure method to make print thread safe? Can it be fast???

Depends on what you mean by fast. It will be slower than code
with no lock; how much would require measurement.

Cheers,
-- 
Cameron Simpson c...@zip.com.au

My own suspicion is that the universe is not only queerer than we suppose,
but queerer than we *can* suppose.
- J.B.S. Haldane On Being the Right Size
  in the (1928) book Possible Worlds
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Chris Angelico
On Mon, May 20, 2013 at 6:35 PM, Cameron Simpson c...@zip.com.au wrote:
   _lock = Lock()

   def lprint(*a, **kw):
       global _lock
       with _lock:
           print(*a, **kw)

 and use lprint() everywhere?

Fun little hack:

def print(*args,print=print,lock=Lock(),**kwargs):
  with lock:
    print(*args,**kwargs)

Question: Is this a cool use or a horrible abuse of the scoping rules?

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Fábio Santos
It is pretty cool although it looks like a recursive function at first ;)
On 20 May 2013 10:13, Chris Angelico ros...@gmail.com wrote:

 On Mon, May 20, 2013 at 6:35 PM, Cameron Simpson c...@zip.com.au wrote:
 _lock = Lock()
 
 def lprint(*a, **kw):
     global _lock
     with _lock:
         print(*a, **kw)
 
  and use lprint() everywhere?

 Fun little hack:

 def print(*args,print=print,lock=Lock(),**kwargs):
   with lock:
     print(*args,**kwargs)

 Question: Is this a cool use or a horrible abuse of the scoping rules?

 ChrisA
 --
 http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Cameron Simpson
On 20May2013 19:09, Chris Angelico ros...@gmail.com wrote:
| On Mon, May 20, 2013 at 6:35 PM, Cameron Simpson c...@zip.com.au wrote:
|  _lock = Lock()
| 
|  def lprint(*a, **kw):
|      global _lock
|      with _lock:
|          print(*a, **kw)
| 
|  and use lprint() everywhere?
| 
| Fun little hack:
| 
| def print(*args,print=print,lock=Lock(),**kwargs):
|   with lock:
|     print(*args,**kwargs)
| 
| Question: Is this a cool use or a horrible abuse of the scoping rules?

I carefully avoided monkey patching print itself:-)

That's... mad! I can see what the end result is meant to be, but
it looks like a debugging nightmare. Certainly my scoping-fu is too
weak to see at a glance how it works.
-- 
Cameron Simpson c...@zip.com.au

I will not do it as a hack   I will not do it for my friends
I will not do it on a MacI will not write for Uncle Sam
I will not do it on weekends I won't do ADA, Sam-I-Am
- Gregory Bond g...@bby.com.au
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Please help with Threading

2013-05-20 Thread Carlos Nepomuceno

 Date: Mon, 20 May 2013 18:35:20 +1000
 From: c...@zip.com.au
 To: carlosnepomuc...@outlook.com
 CC: python-list@python.org
 Subject: Re: Please help with Threading

 On 20May2013 10:53, Carlos Nepomuceno carlosnepomuc...@outlook.com wrote:
 | I just got my hands dirty trying to synchronize Python prints from many 
 threads.
 | Sometimes they mess up when printing the newlines.
 | I tried several approaches using threading.Lock and Condition.
 | None of them worked perfectly and all of them made the code sluggish.

 Show us some code, with specific complaints.

 Did you try this?

 _lock = Lock()

 def lprint(*a, **kw):
     global _lock
     with _lock:
         print(*a, **kw)

 and use lprint() everywhere?


It works! Think I was running the wrong script...

Anyway, the suggestion you've made is the third and latest attempt that I've 
tried to synchronize the print outputs from the threads.

I've also used:

### 1st approach ###
lock  = threading.Lock()
[...]
try:
    lock.acquire()
    [thread protected code]
finally:
    lock.release()


### 2nd approach ###
cond  = threading.Condition()
[...]
try:
    [thread protected code]
    with cond:
        print '[...]'


### 3rd approach ###
from __future__ import print_function

def safe_print(*args, **kwargs):
    global print_lock
    with print_lock:
        print(*args, **kwargs)
[...]
try:
    [thread protected code]
    safe_print('[...]')



Except for the first one, they all have roughly the same performance. The 
problem was that I placed the acquire/release around the whole code block 
instead of only around the print statements.
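
In other words, something like this (a sketch; crunch() just stands in for the
real per-item work):

from __future__ import print_function
import threading, time

lock = threading.Lock()

def crunch(item):                 # stand-in for the real per-item work
    time.sleep(0.01)
    return item * item

def coarse(items):                # lock wraps the work too: threads barely overlap
    for item in items:
        with lock:
            print(crunch(item))

def fine(items):                  # lock only around the print: the work overlaps
    for item in items:            # (to the extent the GIL allows)
        result = crunch(item)
        with lock:
            print(result)

threads = [threading.Thread(target=fine, args=(range(i, i + 5),))
           for i in range(0, 20, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()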

Thanks a lot! ;)

 For generality the lock should be per file: the above hack uses one
 lock for any file, so that's going to stall overlapping prints to
 different files; inefficient.

 There are other things than the above, but at least individual prints will
 never overlap. If you have interleaved prints, show us.

 | Is there a 100% sure method to make print thread safe? Can it be fast???

 Depends on what you mean by fast. It will be slower than code
 with no lock; how much would require measurement.

 Cheers,
 --
 Cameron Simpson c...@zip.com.au

 My own suspicion is that the universe is not only queerer than we suppose,
 but queerer than we *can* suppose.
 - J.B.S. Haldane On Being the Right Size
 in the (1928) book Possible Worlds
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Chris Angelico
On Mon, May 20, 2013 at 7:54 PM, Cameron Simpson c...@zip.com.au wrote:
 On 20May2013 19:09, Chris Angelico ros...@gmail.com wrote:
 | On Mon, May 20, 2013 at 6:35 PM, Cameron Simpson c...@zip.com.au wrote:
 |  _lock = Lock()
 | 
 |  def lprint(*a, **kw):
 |      global _lock
 |      with _lock:
 |          print(*a, **kw)
 | 
 |  and use lprint() everywhere?
 |
 | Fun little hack:
 |
 | def print(*args,print=print,lock=Lock(),**kwargs):
 |   with lock:
 |     print(*args,**kwargs)
 |
 | Question: Is this a cool use or a horrible abuse of the scoping rules?

 I carefully avoided monkey patching print itself:-)

 That's... mad! I can see what the end result is meant to be, but
 it looks like a debugging nightmare. Certainly my scoping-fu is too
 weak to see at a glance how it works.

Hehe. Like I said, could easily be called abuse.

Referencing a function's own name in a default has to have one of
these interpretations:

1) It's a self-reference, which can be used to guarantee recursion
even if the name is rebound
2) It references whatever previously held that name before this def statement.

Either would be useful. Python happens to follow #2; though I can't
point to any piece of specification that mandates that, so all I can
really say is that CPython 3.3 appears to follow #2. But both
interpretations make sense, and both would be of use, and use of
either could be called abusive of the rules. Figure that out. :)

The second defaulted argument (lock=Lock()), of course, is a common
idiom. No abuse there, that's pretty Pythonic.

This same sort of code could be done as a decorator:

def serialize(fn):
    lock = Lock()
    def locked(*args, **kw):
        with lock:
            fn(*args, **kw)
    return locked

print=serialize(print)

Spelled like this, it's obvious that the argument to serialize has to
be the previous 'print'. The other notation achieves the same thing,
just in a quirkier way :)
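
Used on some other chatty function it reads like this (a sketch; log() is a
made-up example, and it assumes the serialize() above is in scope):

from threading import Thread

@serialize                    # decorator syntax for: log = serialize(log)
def log(msg):
    print(msg)                # each call now holds the per-function lock

threads = [Thread(target=log, args=('message %d' % i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()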

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Ned Batchelder

On 5/20/2013 6:09 AM, Chris Angelico wrote:

Referencing a function's own name in a default has to have one of
these interpretations:

1) It's a self-reference, which can be used to guarantee recursion
even if the name is rebound
2) It references whatever previously held that name before this def statement.


The meaning must be #2.  A def statement is nothing more than a fancy 
assignment statement.  This:


def foo(a):
return a + 1

is really just the same as:

foo = lambda a: a+1

(in fact, they compile to identical bytecode).  More complex def's don't 
have equivalent lambdas, but are still assignments to the name of the 
function.  So your apparently recursive print function is no more 
ambiguous than x = x + 1.  The x on the right hand side is the old value of 
x; the x on the left hand side will be the new value of x.


# Each of these updates a name
x = x + 1
def print(*args,print=print,lock=Lock(),**kwargs):
  with lock:
    print(*args,**kwargs)

Of course, if you're going to use that code, a comment might be in order 
to help the next reader through the trickiness...
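
(The identical-bytecode claim is easy to check with the dis module; a quick
sketch, as observed on CPython 2.7/3.3:)

import dis

def foo(a):
    return a + 1

bar = lambda a: a + 1

dis.dis(foo)   # both disassemblies list the same instructions:
dis.dis(bar)   # LOAD_FAST a, LOAD_CONST 1, BINARY_ADD, RETURN_VALUE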


--Ned.

--
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Dave Angel

On 05/20/2013 03:55 AM, Fábio Santos wrote:

My use case was a tight loop processing an image pixel by pixel, or
crunching a CSV file. If it only uses local variables (and probably hold a
lock before releasing the GIL) it should be safe, no?



Are you making function calls, using system libraries, or creating or 
deleting any objects?  All of these use the GIL because they use common 
data structures shared among all threads.  At the lowest level, creating 
an object requires locked access to the memory manager.



Don't forget, the GIL gets used much more for Python internals than it 
does for the visible stuff.



--
DaveA
--
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Chris Angelico
On Mon, May 20, 2013 at 8:46 PM, Ned Batchelder n...@nedbatchelder.com wrote:
 On 5/20/2013 6:09 AM, Chris Angelico wrote:

 Referencing a function's own name in a default has to have one of
 these interpretations:

 1) It's a self-reference, which can be used to guarantee recursion
 even if the name is rebound
 2) It references whatever previously held that name before this def
 statement.


 The meaning must be #2.  A def statement is nothing more than a fancy
 assignment statement.

Sure, but the language could have been specced up somewhat
differently, with the same syntax. I was fairly confident that this
would be universally true (well, can't do it with 'print' per se in
older Pythons, but for others); my statement about CPython 3.3 was
just because I hadn't actually hunted down specification proof.

 So your apparently recursive print function is no more
 ambiguous than x = x + 1.  The x on the right hand side is the old value of x;
 the x on the left hand side will be the new value of x.

 # Each of these updates a name
 x = x + 1

 def print(*args,print=print,lock=Lock(),**kwargs):
   with lock:
     print(*args,**kwargs)

Yeah. The decorator example makes that fairly clear.

 Of course, if you're going to use that code, a comment might be in order to
 help the next reader through the trickiness...

Absolutely!!

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Fábio Santos
I didn't know that.
On 20 May 2013 12:10, Dave Angel da...@davea.name wrote:
 Are you making function calls, using system libraries, or creating or
deleting any objects?  All of these use the GIL because they use common
data structures shared among all threads.  At the lowest level, creating an
object requires locked access to the memory manager.


 Don't forget, the GIL gets used much more for Python internals than it
does for the visible stuff.

I did not know that. It's both interesting and somehow obvious, although I
didn't know it yet.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread 88888 Dihedral
Chris Angelico wrote on Monday, 20 May 2013 at 17:09:13 UTC+8:
 On Mon, May 20, 2013 at 6:35 PM, Cameron Simpson c...@zip.com.au wrote:
   _lock = Lock()

   def lprint(*a, **kw):
       global _lock
       with _lock:
           print(*a, **kw)

  and use lprint() everywhere?

 Fun little hack:

 def print(*args,print=print,lock=Lock(),**kwargs):
   with lock:
     print(*args,**kwargs)

 Question: Is this a cool use or a horrible abuse of the scoping rules?

 ChrisA

OK, if the python interpreter has a global hidden print-out
buffer of, say, 2 to 16 KB, and all string print functions
just construct the output string from the format into this buffer
in an efficient low-level way, then the next question
would be whether users can use functions on this
low-level buffer for other string formatting jobs.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Chris Angelico
On Tue, May 21, 2013 at 11:44 AM, 88888 Dihedral
dihedral88...@googlemail.com wrote:
 OK, if the python interpreter has a global hidden print-out
 buffer of, say, 2 to 16 KB, and all string print functions
 just construct the output string from the format into this buffer
 in an efficient low-level way, then the next question
 would be whether users can use functions on this
 low-level buffer for other string formatting jobs.

You remind me of George.
http://www.chroniclesofgeorge.com/

Both make great reading when I'm at work and poking around with random
stuff in our .SQL file of carefully constructed mayhem.

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Please help with Threading

2013-05-20 Thread Carlos Nepomuceno
sys.stdout.write() does not suffer from the newline mess-up when printing from 
many threads the way the print statement does.

The only usage difference, AFAIK, is that you have to add '\n' at the end of the string yourself.

It's faster and thread safe (really?) by default.
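
A rough illustration of the pattern (a sketch, not a guarantee of atomicity;
whether a single write() call can still be split depends on the underlying
stream): the whole line, newline included, reaches the file object in one
call, instead of the print statement's separate writes for the text and the
trailing '\n'.

import sys
import threading

def worker(n):
    # Build the complete line first, then hand it over in a single call.
    sys.stdout.write('worker %d says hello\n' % n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()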

BTW, why didn't I find the source code to the sys module in the 'Lib' directory?


 Date: Tue, 21 May 2013 11:50:17 +1000
 Subject: Re: Please help with Threading
 From: ros...@gmail.com
 To: python-list@python.org

 On Tue, May 21, 2013 at 11:44 AM, 88888 Dihedral
 dihedral88...@googlemail.com wrote:
 OK, if the python interpreter has a global hidden print-out
 buffer of, say, 2 to 16 KB, and all string print functions
 just construct the output string from the format into this buffer
 in an efficient low-level way, then the next question
 would be whether users can use functions on this
 low-level buffer for other string formatting jobs.

 You remind me of George.
 http://www.chroniclesofgeorge.com/

 Both make great reading when I'm at work and poking around with random
 stuff in our .SQL file of carefully constructed mayhem.

 ChrisA
 --
 http://mail.python.org/mailman/listinfo/python-list   
   
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Please help with Threading

2013-05-20 Thread Carlos Nepomuceno
 On Tue, May 21, 2013 at 11:44 AM, 88888 Dihedral
 dihedral88...@googlemail.com wrote:
 OK, if the python interpreter has a global hidden print-out
 buffer of, say, 2 to 16 KB, and all string print functions
 just construct the output string from the format into this buffer
 in an efficient low-level way, then the next question
 would be whether users can use functions on this
 low-level buffer for other string formatting jobs.

 You remind me of George.
 http://www.chroniclesofgeorge.com/

 Both make great reading when I'm at work and poking around with random
 stuff in our .SQL file of carefully constructed mayhem.

 ChrisA


lol I need more cowbell!!! Please!!! lol
  
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-20 Thread Steven D'Aprano
On Tue, 21 May 2013 05:53:46 +0300, Carlos Nepomuceno wrote:

 BTW, why didn't I find the source code to the sys module in the 'Lib'
 directory?

Because sys is a built-in module. It is embedded in the Python 
interpreter.
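
A quick way to confirm that (a sketch):

import sys

# Built-in modules are compiled into the interpreter binary, so they have
# no .py file under Lib/ and no __file__ attribute.
print('sys' in sys.builtin_module_names)   # True
print(hasattr(sys, '__file__'))            # False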

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-19 Thread Chris Angelico
On Mon, May 20, 2013 at 7:46 AM, Dennis Lee Bieber
wlfr...@ix.netcom.com wrote:
 On Sun, 19 May 2013 10:38:14 +1000, Chris Angelico ros...@gmail.com
 declaimed the following in gmane.comp.python.general:
 With interpreted code eg in CPython, it's easy to implement preemption
 in the interpreter. I don't know how it's actually done, but one easy
 implementation would be every N bytecode instructions, context
 switch. It's still done at a lower level than user code (N bytecode

 Which IS how the common Python interpreter does it -- barring the
 thread making some system call that triggers a preemption ahead of time
 (even time.sleep(0.0) triggers scheduling). Forget if the default is 20
 or 100 byte-code instructions -- as I recall, it DID change a few
 versions back.

Incidentally, is the context-switch check the same as the check for
interrupt signal raising KeyboardInterrupt? ISTR that was another
every N instructions check.
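
For reference, the knob in question is visible from Python code (a sketch;
the Python 2 and Python 3 spellings differ):

import sys

# Python 2.x: the interpreter considers switching threads (and runs any
# pending calls, such as signal handlers) every N bytecode instructions.
print(sys.getcheckinterval())        # default: 100

# Python 3.2+ replaced the instruction count with a time slice:
#   sys.getswitchinterval()          # default: 0.005 seconds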

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-19 Thread Dave Angel

On 05/19/2013 05:46 PM, Dennis Lee Bieber wrote:

On Sun, 19 May 2013 10:38:14 +1000, Chris Angelico ros...@gmail.com
declaimed the following in gmane.comp.python.general:


On Sun, May 19, 2013 at 10:02 AM, Carlos Nepomuceno
carlosnepomuc...@outlook.com wrote:

I didn't know Python threads aren't preemptive. Seems to be something really 
old considering the state of the art on parallel execution on multi-cores.

What's the catch on making Python threads preemptive? Are there any ongoing 
projects to make that?



snip


With interpreted code eg in CPython, it's easy to implement preemption
in the interpreter. I don't know how it's actually done, but one easy
implementation would be every N bytecode instructions, context
switch. It's still done at a lower level than user code (N bytecode


Which IS how the common Python interpreter does it -- barring the
thread making some system call that triggers a preemption ahead of time
(even time.sleep(0.0) triggers scheduling). Forget if the default is 20
or 100 byte-code instructions -- as I recall, it DID change a few
versions back.

Part of the context switch is to transfer the GIL from the preempted
thread to the new thread.

So, overall, on a SINGLE CORE processor running multiple CPU bound
threads takes a bit longer just due to the overhead of thread swapping.

On a multi-core processor, the effect is the same, since -- even
though one may have a thread running on each core -- the GIL is only
assigned to one thread, and other threads get blocked when trying to
access runtime data structures. And you may have even more overhead from
processor cache misses if a thread gets assigned to a different
core.

(yes -- I'm restating the same thing as I had just trimmed below
this point... but the target is really the OP, where repetition may be
helpful in understanding)



So what's the mapping between real (OS) threads, and the fake ones 
Python uses?  The OS keeps track of a separate stack and context for 
each thread it knows about;  are they one-to-one with the ones you're 
describing here?  If so, then any OS thread that gets scheduled will 
almost always find it can't get the GIL, and spend time thrashing.   But 
the change that CPython does intentionally would be equivalent to a 
sleep(0).


On the other hand, if these threads are distinct from the OS threads, is 
it done with some sort of thread pool, where CPython has its own stack, 
and doesn't really use the one managed by the OS?


Understand the only OS threading I really understand is the one in 
Windows (which I no longer use).  So assuming Linux has some form of 
lightweight threading, the distinction above may not map very well.




--
DaveA
--
http://mail.python.org/mailman/listinfo/python-list


Please help with Threading

2013-05-18 Thread Jurgens de Bruin
This is my first script where I want to use the Python threading module. I have 
a large dataset which is a list of dicts; there can be as many as 200 dictionaries 
in the list. The final goal is a histogram for each dict, 16 histograms on a 
page (4x4) - this already works. 
What I currently do is create a nested list [ [ {} ], [ {} ] ]; each inner 
list contains 16 dictionaries, so each inner list is a single page of 16 
histograms. Iterating over the outer list and creating the graphs takes too 
long, so I would like multiple inner lists to be processed simultaneously, 
creating the graphs in parallel. 
I am trying to use Python threading for this. I create 4 threads, loop over 
the outer list and send an inner list to each thread. This seems to work if my 
nested list only contains 2 elements - thus fewer elements than threads. 
Currently the script runs and then seems to get hung up. I monitor the 
resources on my Mac and Python starts off well, using 80%, but when the fourth 
thread is created the CPU usage drops to 0%. 

My thread creation is based on the following: 
http://www.tutorialspoint.com/python/python_multithreading.htm

Any help would be great!!!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-18 Thread Peter Otten
Jurgens de Bruin wrote:

 This is my first script where I want to use the Python threading module. I
 have a large dataset which is a list of dicts; there can be as many as 200
 dictionaries in the list. The final goal is a histogram for each dict, 16
 histograms on a page (4x4) - this already works.
 What I currently do is create a nested list [ [ {} ], [ {} ] ]; each inner
 list contains 16 dictionaries, so each inner list is a single page of 16
 histograms. Iterating over the outer list and creating the graphs takes too
 long, so I would like multiple inner lists to be processed simultaneously,
 creating the graphs in parallel.
 I am trying to use Python threading for this. I create 4 threads, loop over
 the outer list and send an inner list to each thread. This seems to work if
 my nested list only contains 2 elements - thus fewer elements than threads.
 Currently the script runs and then seems to get hung up. I monitor the
 resources on my Mac and Python starts off well, using 80%, but when the
 fourth thread is created the CPU usage drops to 0%.
 
 My thread creation is based on the following:
 http://www.tutorialspoint.com/python/python_multithreading.htm
 
 Any help would be great!!!

Can you show us the code?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-18 Thread Jurgens de Bruin
I will post code - the entire script is 1000 lines of code - can I post the 
threading functions only?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-18 Thread Peter Otten
Jurgens de Bruin wrote:

 I will post code - the entire script is 1000 lines of code - can I post
 the threading functions only?

Try to condense it to the relevant parts, but make sure that it can be run 
by us.

As a general note, when you add new stuff to an existing longish script it 
is always a good idea to write it in such a way that you can test it 
standalone so that you can have some confidence that it will work as 
designed once you integrate it with your old code.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-18 Thread Dave Angel

On 05/18/2013 04:58 AM, Jurgens de Bruin wrote:

This is my first script where I want to use the Python threading module. I have 
a large dataset which is a list of dicts; there can be as many as 200 dictionaries 
in the list. The final goal is a histogram for each dict, 16 histograms on a 
page (4x4) - this already works.
What I currently do is create a nested list [ [ {} ], [ {} ] ]; each inner 
list contains 16 dictionaries, so each inner list is a single page of 16 
histograms. Iterating over the outer list and creating the graphs takes too 
long, so I would like multiple inner lists to be processed simultaneously, 
creating the graphs in parallel.
I am trying to use Python threading for this. I create 4 threads, loop over 
the outer list and send an inner list to each thread. This seems to work if my 
nested list only contains 2 elements - thus fewer elements than threads. 
Currently the script runs and then seems to get hung up. I monitor the 
resources on my Mac and Python starts off well, using 80%, but when the fourth 
thread is created the CPU usage drops to 0%.

My thread creation is based on the following: 
http://www.tutorialspoint.com/python/python_multithreading.htm

Any help would be great!!!



CPython, and apparently (all of?) the other current Python 
implementations, uses a GIL to prevent multi-threaded applications from 
shooting themselves in the foot.


However the practical effect of the GIL is that CPU-bound applications 
do not multi-thread efficiently;  the single-threaded version usually 
runs faster.


The place where CPython programs gain from multithreading is where each 
thread spends much of its time waiting for some external trigger.


(More specifically, if such a wait is inside well-written C code, it 
releases the GIL so other threads can get useful work done.  Example is 
a thread waiting for internet activity, and blocks inside a system call)



--
DaveA
--
http://mail.python.org/mailman/listinfo/python-list


RE: Please help with Threading

2013-05-18 Thread Carlos Nepomuceno

 To: python-list@python.org
 From: wlfr...@ix.netcom.com
 Subject: Re: Please help with Threading
 Date: Sat, 18 May 2013 15:28:56 -0400

 On Sat, 18 May 2013 01:58:13 -0700 (PDT), Jurgens de Bruin
 debrui...@gmail.com declaimed the following in
 gmane.comp.python.general:

 This is my first script where I want to use the Python threading module. I 
 have a large dataset which is a list of dicts; there can be as many as 200 
 dictionaries in the list. The final goal is a histogram for each dict, 16 
 histograms on a page (4x4) - this already works.
 What I currently do is create a nested list [ [ {} ], [ {} ] ]; each inner 
 list contains 16 dictionaries, so each inner list is a single page of 16 
 histograms. Iterating over the outer list and creating the graphs takes too 
 long, so I would like multiple inner lists to be processed simultaneously, 
 creating the graphs in parallel.
 I am trying to use Python threading for this. I create 4 threads, loop 
 over the outer list and send an inner list to each thread. This seems to work 
 if my nested list only contains 2 elements - thus fewer elements than 
 threads. Currently the script runs and then seems to get hung up. I monitor 
 the resources on my Mac and Python starts off well, using 80%, but when the 
 fourth thread is created the CPU usage drops to 0%.


 The odds are good that this is just going to run slower...

Just been told that GIL doesn't make things slower, but as I didn't know that 
such a thing even existed I went out looking for more info and found that 
document: http://www.dabeaz.com/python/UnderstandingGIL.pdf

Is it current? I didn't know Python threads aren't preemptive. Seems to be 
something really old considering the state of the art on parallel execution on 
multi-cores.

What's the catch on making Python threads preemptive? Are there any ongoing 
projects to make that?

 One: The common Python implementation uses a global interpreter lock
 to prevent interpreted code from interfering with itself in multiple
 threads. So number cruncher applications don't gain any speed from
 being partitioned into threads -- even on a multicore processor, only one
 thread can have the GIL at a time. On top of that, you have the overhead
 of the interpreter switching between threads (GIL release on one thread,
 GIL acquire for the next thread).

 Python threads work fine if the threads either rely on intelligent
 DLLs for number crunching (instead of doing nested Python loops to
 process a numeric array you pass it to something like NumPy which
 releases the GIL while crunching a copy of the array) or they do lots of
 I/O and have to wait for I/O devices (while one thread is waiting for
 the write/read operation to complete, another thread can do some number
 crunching).

 If you really need to do this type of number crunching in Python
 level code, you'll want to look into the multiprocessing library
 instead. That will create actual OS processes (each with a copy of the
 interpreter, and not sharing memory) and each of those can run on a core
 without conflicting on the GIL.
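
Applied to the original problem, the quoted advice might look something like
this (only a sketch; make_page() stands in for whatever code draws and saves
one page of 16 histograms, and the dummy data just makes it runnable):

from multiprocessing import Pool

def make_page(args):
    page_number, inner_list = args      # one inner list of up to 16 dicts
    # Real code would draw the 16 histograms here and save the page,
    # e.g. to 'page_%03d.png' % page_number.
    return page_number, len(inner_list)

if __name__ == '__main__':
    # Dummy data standing in for the real nested list of dicts.
    outer_list = [[{'value': i * 16 + j} for j in range(16)] for i in range(10)]

    pool = Pool(processes=4)            # one worker per core; no GIL contention
    for page, count in pool.imap_unordered(make_page, enumerate(outer_list)):
        print('finished page %d (%d histograms)' % (page, count))
    pool.close()
    pool.join()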

Which library do you suggest?

 --
 Wulfraed Dennis Lee Bieber AF6VN
 wlfr...@ix.netcom.com HTTP://wlfraed.home.netcom.com/

 --
 http://mail.python.org/mailman/listinfo/python-list   
   
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-18 Thread Chris Angelico
On Sun, May 19, 2013 at 10:02 AM, Carlos Nepomuceno
carlosnepomuc...@outlook.com wrote:
 I didn't know Python threads aren't preemptive. Seems to be something really 
 old considering the state of the art on parallel execution on multi-cores.

 What's the catch on making Python threads preemptive? Are there any ongoing 
 projects to make that?

Preemption isn't really the issue here. On the C level, preemptive vs
cooperative usually means the difference between a stalled thread
locking everyone else out and not doing so. Preemption is done at a
lower level than user code (eg the operating system or the CPU),
meaning that user code can't retain control of the CPU.

With interpreted code eg in CPython, it's easy to implement preemption
in the interpreter. I don't know how it's actually done, but one easy
implementation would be every N bytecode instructions, context
switch. It's still done at a lower level than user code (N bytecode
instructions might all actually be a single tight loop that the
programmer didn't realize was infinite), but it's not at the OS level.

But none of that has anything to do with multiple core usage. The
problem there is that shared data structures need to be accessed
simultaneously, and in CPython, there's a Global Interpreter Lock to
simplify that; but the consequence of the GIL is that no two threads
can simultaneously execute user-level code. There have been
GIL-removal proposals at various times, but the fact remains that a
global lock makes a huge amount of sense and gives pretty good
performance across the board. There's always multiprocessing when you
need multiple CPU-bound threads; it's an explicit way to separate the
shared data (what gets transferred) from local (what doesn't).

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please help with Threading

2013-05-18 Thread Cameron Simpson
On 19May2013 03:02, Carlos Nepomuceno carlosnepomuc...@outlook.com wrote:
| Just been told that GIL doesn't make things slower, but as I
| didn't know that such a thing even existed I went out looking for
| more info and found that document:
| http://www.dabeaz.com/python/UnderstandingGIL.pdf
| 
| Is it current? I didn't know Python threads aren't preemptive.
| Seems to be something really old considering the state of the art
| on parallel execution on multi-cores.
| What's the catch on making Python threads preemptive? Are there any ongoing 
projects to make that?

Depends what you mean by preemptive. If you have multiple CPU bound
pure Python threads they will all get CPU time without any of them
explicitly yielding control. But thread switching happens between
python instructions, mediated by the interpreter.

The standard answers for using multiple cores are to either run
multiple processes (either explicitly spawning other executables,
or spawning child python processes using the multiprocessing module),
or to use (as suggested) libraries that can do the compute intensive
bits themselves, releasing the GIL while doing so, so that the Python
interpreter can run other bits of your python code.

Plenty of OS system calls (and calls to other libraries from the
interpreter) release the GIL during the call. Other python threads
can run during that window.

And there are other Python implementations other than CPython.

Cheers,
-- 
Cameron Simpson c...@zip.com.au

Processes are like potatoes.- NCR device driver manual
-- 
http://mail.python.org/mailman/listinfo/python-list