Re: [Tutor] threading mind set

2012-05-14 Thread Russel Winder
On Mon, 2012-05-14 at 10:31 +1000, Steven D'Aprano wrote:
[...]
 Not hard compared to what?

Compared to sequential programming.

[...]
 My argument is that once you move beyond the one-operation-after-another 
 programming model, almost any parallel processing problem is harder than the 
 equivalent sequential version, inherently due to the parallelism. Except 
 perhaps for embarrassingly parallel problems, parallelism adds complexity 
 even if your framework abstracts away most of the tedious detail like 
 semaphores.
 
 http://en.wikipedia.org/wiki/Embarrassingly_parallel
 
 Once you move beyond sequential execution, you have to think about issues
 that don't apply to sequential programs: how to divide the task up between
 processes/threads/actors/whatever, how to manage their synchronization,
 resource starvation (e.g. deadlocks, livelocks), etc.

Actor systems, dataflow systems and CSP (Communicating Sequential
Processes) do not guarantee freedom from deadlock or livelock, but having
processes communicate by passing messages rather than by sharing data
makes it hugely easier to reason about what is happening.

Moreover, if, as with CSP, your actor or dataflow system enforces
sequential actors/operators, then it gets even better.
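To make the sequential-actor idea concrete, here is a minimal sketch in Python 3 using only the standard library's threading and queue modules. CounterActor and its message names are hypothetical, not the API of any actor framework: one thread owns the running total and handles one message at a time, so the total itself never needs a lock (queue.Queue does its own internal locking).

```python
import queue
import threading


class CounterActor:
    """Hypothetical actor: one thread owns the total and processes one
    message at a time, so the total is never touched concurrently."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._total = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Sequential message loop: the only code that reads or writes
        # self._total.
        while True:
            message, payload, reply_to = self._inbox.get()
            if message == "add":
                self._total += payload
            elif message == "get":
                reply_to.put(self._total)
            elif message == "stop":
                break

    def add(self, amount):
        self._inbox.put(("add", amount, None))

    def get(self):
        # Ask for the total and wait for the actor's reply.
        reply = queue.Queue(maxsize=1)
        self._inbox.put(("get", None, reply))
        return reply.get()

    def stop(self):
        self._inbox.put(("stop", None, None))
        self._thread.join()


if __name__ == "__main__":
    actor = CounterActor()

    def send_ones():
        for _ in range(1000):
            actor.add(1)

    senders = [threading.Thread(target=send_ones) for _ in range(4)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()
    print(actor.get())  # 4000: every "add" was queued before the "get"
    actor.stop()
```

Other threads only ever send messages; they never read or write the total directly.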

The secret to parallel processing (in general, there are always
exceptions/corner cases) is to write sequential bits that then
communicate using queues or channels.
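A minimal sketch of that shape, assuming Python 3 and only the standard library: three plain sequential functions, each running in its own thread, connected by queue.Queue objects. The stage functions and the _DONE end-of-stream sentinel are hypothetical names chosen for illustration; no user code takes a lock.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel marking the end of the stream


def produce(numbers, out_q):
    # Stage 1: emit items, then the end-of-stream sentinel.
    for n in numbers:
        out_q.put(n)
    out_q.put(_DONE)


def square(in_q, out_q):
    # Stage 2: a purely sequential loop over incoming messages.
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        out_q.put(item * item)
    out_q.put(_DONE)


def collect(in_q, results):
    # Stage 3: only this thread writes to `results`.
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        results.append(item)


def run_pipeline(numbers):
    q1, q2 = queue.Queue(), queue.Queue()
    results = []
    stages = [
        threading.Thread(target=produce, args=(numbers, q1)),
        threading.Thread(target=square, args=(q1, q2)),
        threading.Thread(target=collect, args=(q2, results)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()  # results is read only after every stage has finished
    return results


print(run_pipeline(range(5)))  # [0, 1, 4, 9, 16]
```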

No semaphores. No locks. No monitors. These are tools for operating
systems folk and for folk creating actor, dataflow and CSP queues and
channels.

 We have linear minds and it doesn't take that many real-time parallel tasks
 to overwhelm the human brain. I'm not saying that people can't reason in
 parallel, because we clearly can and do, but it's inherently harder than
 sequential reasoning.

I think if you delve into the psychology of it, our minds are far from
linear. Certainly at the electro-chemical level the brain is a massively
parallel machine.

Over the last 50 years, we have enshrined the single-processor,
single-memory model in our entire thinking about computing and
programming. Our education systems enforce sequential programming for
all but the final parallel programming option. The main reason for
parallel programming being labelled hard is that we have the wrong tools
for reasoning about it. This is the beauty of the 1960s/1970s models of
actors, dataflow and CSP: you deconstruct the problem into small bits,
each of which is sequential and comprehensible, and then the overall
behaviour of the system is an emergent property of the interaction
between these small subsystems.

Instead of trying to reason about all the communications system-wide,
we just worry about what happens within a small subsystem.

The hard part is the decomposition. But then the hard part of software
has always been the algorithm.

You highlight embarrassingly parallel problems, which are the simplest
decomposition possible: straight scatter/gather, aka map/reduce. More
often than not this is handled by a façade such as a parallel reduce.
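As a sketch of the scatter/gather shape in Python 3's standard library: ThreadPoolExecutor.map scatters independent work items and gathers the results in input order, and functools.reduce folds the partial results together. The word-counting task and count_words are hypothetical; for CPU-heavy work on CPython, ProcessPoolExecutor offers the same map interface.

```python
import operator
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(chunk):
    # Hypothetical map step: each chunk is handled independently,
    # with no shared state.
    return Counter(chunk.split())


def word_count(chunks, workers=4):
    # Scatter the chunks across a pool, gather the partial Counters in
    # input order, then reduce them with Counter addition.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_words, chunks)
        return reduce(operator.add, partials, Counter())


chunks = ["the cat sat", "the dog sat", "a cat"]
print(word_count(chunks)["sat"])  # 2
```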

It is perhaps worth noting that Big Data is moving to dataflow
processing in a Big Way :-) Data mining and the like have been
revolutionized by changing their perception of algorithms and of how to
decompose problems.

[...]
 Python doesn't have a GIL. Some Python implementations do, most obviously
 CPython, the reference implementation. But Jython and IronPython don't. If
 the GIL is a problem for your program, consider running it on Jython or
 IronPython.

It is true that Python doesn't have a GIL, thanks for the correction.
CPython and (until recently) PyPy have a GIL. The PyPy folk are
experimenting with software transactional memory (STM) in the
interpreter to be able to remove the GIL. To date things are looking
very positive. PyPy will rock :-)

Although Guido said (EuroPython 2010) that he is happy to continue with
the GIL in CPython, there are subversive elements (notably the PyPy
folk) who are trying to show that STM will work with CPython as well.

Jython is sadly lagging behind in terms of versions of Python supported
and is increasingly becoming irrelevant -- unless someone does something
soon. Groovy, JRuby and Clojure are the dynamic languages of choice on
the JVM.

IronPython is an interesting option except that there is all the FUD
about use of the CLR and having to buy extortion^H^H^H^H^H^H^H^H^H
licencing money to Microsoft. Also Microsoft ceasing to fund IronPython
(and IronRuby) is a clear indicator that Microsoft have no intention of
supporting use of Python on CLR. Thus it could end up in the same state
as Jython.

-- 
Russel.
=
Dr Russel Winder  t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


signature.asc
Description: This is a digitally signed message part

Re: [Tutor] threading mind set

2012-05-13 Thread Steven D'Aprano

bob gailer wrote:

On 5/12/2012 8:22 PM, Steven D'Aprano wrote:
By the way, in future, please don't decorate your code with stars: 
I think you got stars because the code was posted in HTML and bolded. 
Plain text readers add the * to show emphasis.


I think you have it the other way around: if you add asterisks around text, 
some plain text readers hide the * and bold the text. At least, I've never 
seen anything which does it the other way around. (Possibly until now.)


In any case, I'm using Thunderbird, and it does NOT show stars around text 
unless they are already there. When I look at the raw email source, I can see 
the asterisks there.


Perhaps Carlo's mail client is trying to be helpful, and failing miserably.
While converting HTML <b>...</b> tags into simple markup is a nice thing to do
for plain text, it plays havoc with code.




--
Steven

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] threading mind set

2012-05-13 Thread Russel Winder
Steven,

On Sun, 2012-05-13 at 10:22 +1000, Steven D'Aprano wrote:
 carlo locci wrote:
  Hello All,
  I've started to study Python a couple of months ago (and I truly love it :)),
  however I'm having some problems understanding how to modify a sequential
  script and make it multithreaded (I think it's because I'm not used to
  thinking that way), 
 
 No, that's because multithreading and parallel processing are hard.

Shared-memory multithreading may be hard due to locks, semaphores,
monitors, etc., but concurrency and parallelism need not be hard. Using
processes and message passing, using dataflow, actors or CSP,
parallelism and concurrency are far more straightforward. Not easy,
agreed, but then programming isn't easy.

  as well as when it's best to use it (some say that
  because of the GIL I won't get any real benefit from threading my script).
 
 That depends on what your script does.
 
 In a nutshell, if your program is limited by CPU processing, then using
 threads in Python won't help. (There are other things you can do instead,
 such as launching new Python processes.)

The GIL in Python is a bad thing for parallelism. Using the
multiprocessing package or concurrent.futures gets around the problem.
Well, sort of: these processes are a bit heavyweight compared to what
can be achieved on the JVM or with Erlang.
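A minimal sketch of the concurrent.futures route for CPU-bound work on CPython: each task runs in a separate worker process with its own interpreter, and so its own GIL. The trial-division is_prime is a hypothetical stand-in for real work, and the executor_cls parameter is an illustrative choice rather than part of any library API; it works because ProcessPoolExecutor and ThreadPoolExecutor share the same Executor interface.

```python
from concurrent.futures import ProcessPoolExecutor


def is_prime(n):
    # Hypothetical CPU-bound task: naive trial division.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(numbers, executor_cls=ProcessPoolExecutor, workers=4):
    # is_prime is defined at module level so it can be pickled and sent
    # to worker processes; chunksize batches items to cut per-task
    # overhead (ThreadPoolExecutor ignores it).
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(is_prime, numbers, chunksize=1000))


if __name__ == "__main__":
    # The __main__ guard matters on platforms that start workers by
    # re-importing this module (e.g. Windows).
    print(count_primes(range(10_000)))  # 1229
```

Pickling arguments and results to and from the workers, and starting the processes, is part of the weight mentioned above, so each task needs to do enough work to pay for it.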

 If your program is limited by disk or network I/O, then there is a
 possibility you can speed it up with threads.

Or, better still, use an event-based system, cf. Twisted.
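Twisted is the example named here; as a swapped-in illustration that needs nothing outside the standard library, this minimal sketch uses asyncio instead (Python 3.7+ for asyncio.run). One thread runs an event loop that interleaves many waiting tasks. fake_fetch and its delays are hypothetical stand-ins for real network I/O.

```python
import asyncio


async def fake_fetch(name, delay):
    # Hypothetical I/O: asyncio.sleep stands in for waiting on a socket.
    # While this task waits, the event loop runs the other tasks.
    await asyncio.sleep(delay)
    return f"{name} done"


async def fetch_all(jobs):
    # gather() runs the coroutines concurrently on one thread and returns
    # their results in the order the jobs were given.
    return await asyncio.gather(*(fake_fetch(n, d) for n, d in jobs))


if __name__ == "__main__":
    jobs = [("a", 0.3), ("b", 0.1), ("c", 0.2)]
    print(asyncio.run(fetch_all(jobs)))  # ['a done', 'b done', 'c done']
```

The total wait is roughly the longest single delay rather than the sum of all of them, without any threads.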

[...]
 

-- 
Russel.




Re: [Tutor] threading mind set

2012-05-13 Thread Steven D'Aprano

Russel Winder wrote:

Steven,

On Sun, 2012-05-13 at 10:22 +1000, Steven D'Aprano wrote:

carlo locci wrote:

Hello All,
I've started to study Python a couple of months ago (and I truly love it :)),
however I'm having some problems understanding how to modify a sequential
script and make it multithreaded (I think it's because I'm not used to
thinking that way), 

No, that's because multithreading and parallel processing are hard.


Shared memory multithreading may be hard due to locks, semaphores,
monitors, etc., but concurrency and parallelism need not be hard. 


Not hard compared to what?



Using processes and message passing, using dataflow, actors or CSP,
parallelism and concurrency are far more straightforward. Not easy,
agreed, but then programming isn't easy.



My argument is that once you move beyond the one-operation-after-another 
programming model, almost any parallel processing problem is harder than the 
equivalent sequential version, inherently due to the parallelism. Except 
perhaps for embarrassingly parallel problems, parallelism adds complexity 
even if your framework abstracts away most of the tedious detail like semaphores.


http://en.wikipedia.org/wiki/Embarrassingly_parallel

Once you move beyond sequential execution, you have to think about issues that 
don't apply to sequential programs: how to divide the task up between 
processes/threads/actors/whatever, how to manage their synchronization, 
resource starvation (e.g. deadlocks, livelocks), etc.


We have linear minds and it doesn't take that many real-time parallel tasks to 
overwhelm the human brain. I'm not saying that people can't reason in 
parallel, because we clearly can and do, but it's inherently harder than 
sequential reasoning.




The GIL in Python is a bad thing for parallelism. Using the
multiprocessing package or concurrent.futures gets around the problem.
Well, sort of: these processes are a bit heavyweight compared to what
can be achieved on the JVM or with Erlang.


Python doesn't have a GIL. Some Python implementations do, most obviously 
CPython, the reference implementation. But Jython and IronPython don't. If the 
GIL is a problem for your program, consider running it on Jython or IronPython.




--
Steven



Re: [Tutor] threading mind set

2012-05-13 Thread Devin Jeanpierre
On Sun, May 13, 2012 at 8:31 PM, Steven D'Aprano st...@pearwood.info wrote:
 Using processes and message passing, using dataflow, actors or CSP,
 parallelism and concurrency are far more straightforward. Not easy,
 agreed, but then programming isn't easy.

 My argument is that once you move beyond the one-operation-after-another
 programming model, almost any parallel processing problem is harder than the
 equivalent sequential version, inherently due to the parallelism. Except
 perhaps for embarrassingly parallel problems, parallelism adds complexity
 even if your framework abstracts away most of the tedious detail like
 semaphores.

If you agree that embarrassingly parallel multithreaded frameworks are
easy, what do you think of dataflow programming? It is exactly the
same, except that you can have multiple tasks, where one task depends
on the output of a previous task. It shares the property that it makes
no difference in what order things are executed (or sequential vs
parallel), so long as the data dependencies are respected -- so it's
another case where you don't actually have to think in a
non-sequential manner. (Rather, think in a vectorized per-work-item
manner.)
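A minimal dataflow sketch, assuming Python 3.9+ for the standard library's graphlib: each node is a function of its inputs, and a node is handed to a thread pool as soon as all of its inputs exist. run_dataflow and the example graph are hypothetical. Independent nodes (here total and count) may run in either order or at the same time, and the result is the same, which is the property described above.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dataflow(tasks, deps, workers=4):
    """Hypothetical runner. tasks: node -> function taking its inputs'
    results as arguments (in the order listed in deps);
    deps: node -> tuple of input nodes."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on a cyclic graph
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        running = {}
        while sorter.is_active():
            # Submit every node whose inputs are all available.
            for node in sorter.get_ready():
                args = [results[d] for d in deps.get(node, ())]
                running[pool.submit(tasks[node], *args)] = node
            # Wait for at least one node to finish, then record it.
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                node = running.pop(fut)
                results[node] = fut.result()
                sorter.done(node)  # may make dependent nodes ready
    return results


# Hypothetical graph: load -> (total, count) -> mean
tasks = {
    "load": lambda: [3, 1, 2],
    "total": sum,
    "count": len,
    "mean": lambda total, count: total / count,
}
deps = {"total": ("load",), "count": ("load",), "mean": ("total", "count")}
print(run_dataflow(tasks, deps)["mean"])  # 2.0
```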

http://en.wikipedia.org/wiki/Dataflow_programming

It should be clear that not all ways of programming multithreaded code
are equal, and some are easier than others. In particular, having
mutable state shared between two concurrently-executing procedures is
phenomenally hard, and when it's avoided things become simpler.
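A minimal Python 3 sketch of the contrast; both function names are hypothetical. In count_shared every thread mutates one shared counter, so correctness depends on remembering the lock, since counter += 1 is a read-modify-write that is not guaranteed to be atomic. In count_unshared each worker keeps purely local state and returns its result, so there is nothing to lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def count_shared(n_threads, per_thread):
    # Hypothetical: shared mutable state guarded by a lock.
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(per_thread):
            # Without this lock, updates may be lost: another thread's
            # read, add and write can interleave with ours.
            with lock:
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def count_unshared(n_threads, per_thread):
    # Hypothetical: no shared state; results are returned, then combined.
    def work(worker_id):
        local = 0
        for _ in range(per_thread):
            local += 1
        return local

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(work, range(n_threads)))


print(count_shared(4, 10_000), count_unshared(4, 10_000))  # 40000 40000
```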

-- Devin


Re: [Tutor] threading mind set

2012-05-12 Thread Steven D'Aprano

carlo locci wrote:

Hello All,
I've started to study Python a couple of months ago (and I truly love it :)),
however I'm having some problems understanding how to modify a sequential
script and make it multithreaded (I think it's because I'm not used to
thinking that way), 


No, that's because multithreading and parallel processing are hard.



as well as when it's best to use it (some say that
because of the GIL I won't get any real benefit from threading my script).


That depends on what your script does.

In a nutshell, if your program is limited by CPU processing, then using 
threads in Python won't help. (There are other things you can do instead, such 
as launching new Python processes.)


If your program is limited by disk or network I/O, then there is a possibility 
you can speed it up with threads.
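A minimal sketch of the I/O-bound case with concurrent.futures.ThreadPoolExecutor (Python 3 standard library). time.sleep stands in for a blocking disk or network call; like blocking I/O in CPython, it releases the GIL, so the waits overlap. slow_io is a hypothetical placeholder.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_io(item):
    # Hypothetical blocking read/request; sleeping releases the GIL.
    time.sleep(0.2)
    return item * 2


def run_threaded(items, workers=8):
    # Results come back in input order, whatever order the threads finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(slow_io, items))


if __name__ == "__main__":
    start = time.perf_counter()
    results = run_threaded(range(8))
    elapsed = time.perf_counter() - start
    # Sequentially this would take about 8 * 0.2 = 1.6 s; with 8 threads
    # the waits overlap, so expect roughly 0.2 s plus overhead.
    print(results, f"{elapsed:.2f}s")
```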




It's my understanding that threading a program in Python can be useful when
we've got some I/O involved,


To see the benefit of threads, it's not enough to have some I/O, you need 
*lots* of I/O. Threads have some overhead. Unless you save at least as much 
time as just starting and managing the threads consumes, you won't see any 
speed up.


In my experience, for what little it's worth [emphasis on little], unless 
you can keep at least four threads busy doing separate I/O, it probably isn't 
worth the time and effort. And it's probably not worth it for trivial scripts 
-- who cares if you speed your script up from 0.2 seconds to 0.1 seconds?


But as a learning exercise, sure, go ahead and convert your script to threads. 
One experiment is worth a dozen opinions.
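In that spirit, a minimal timing harness, assuming Python 3; compare and tiny_task are hypothetical names. It runs the same function over the same inputs sequentially and through a thread pool, checks that the answers agree, and reports wall-clock times, so you can see whether threading pays for its overhead on your own workload. For trivial work like tiny_task, expect the threaded run to be no faster and quite possibly slower.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def compare(func, items, workers=4):
    """Hypothetical helper: time func over items sequentially and with
    a thread pool, returning the two wall-clock durations in seconds."""
    items = list(items)

    start = time.perf_counter()
    sequential = [func(x) for x in items]
    t_seq = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        threaded = list(pool.map(func, items))
    t_thr = time.perf_counter() - start

    # Same answers either way; only the timing should differ.
    if sequential != threaded:
        raise RuntimeError("sequential and threaded results differ")
    return {"sequential_s": t_seq, "threaded_s": t_thr}


def tiny_task(x):
    # Hypothetical: far too little work to amortize thread overhead.
    return x + 1


if __name__ == "__main__":
    print(compare(tiny_task, range(10_000)))
```

Swap tiny_task for a function that does real I/O from your own script to run the experiment that matters.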


You can learn more about threading from here:

http://www.doughellmann.com/PyMOTW/threading/


By the way, in future, please don't decorate your code with stars:


* def read():*
*import csv*
*with open('C:\\test\\VDB.csv', 'rb') as somefile:*

[...]


We should be able to copy and paste your code and have it run immediately, not 
have to spend time editing it by hand to turn it back into valid Python code 
that doesn't give a SyntaxError on every line.


See also this: http://sscce.org/



--
Steven



Re: [Tutor] threading mind set

2012-05-12 Thread bob gailer

On 5/12/2012 8:22 PM, Steven D'Aprano wrote:
By the way, in future, please don't decorate your code with stars: 
I think you got stars because the code was posted in HTML and bolded. 
Plain text readers add the * to show emphasis.


When I copied and pasted the code it came out fine.

Carlo: in future please post plain text rather than HTML.

--
Bob Gailer
919-636-4239
Chapel Hill NC



Re: [Tutor] threading mind set

2012-05-12 Thread bob gailer

def read():
A couple of observations:
1 - It is customary to put all import statements at the beginning of the
file.
2 - It is customary to begin variable and function names with a lower-case
letter.
3 - It is better to avoid using built-in function names or common method
names (e.g. read).


def read():
    import csv
    with open('C:\\test\\VDB.csv', 'rb') as somefile:
        read = csv.reader(somefile)
        l = []
        for row in read:
            l += row
        return l

def DirGetSize(cartella):
    import os
    cartella_size = 0
    for (path, dirs, files) in os.walk(cartella):
        for x in files:
            filename = os.path.join(path, x)
            cartella_size += os.path.getsize(filename)
    return cartella_size

import os.path
for x in read():
    if not os.path.exists(x):
        print ' DOES NOT EXIST ON', x
    else:
        S = DirGetSize(x)
        print 'the file size of', x, 'is',S
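For comparison, a minimal Python 3 sketch of the same script with the three observations above applied (imports at the top, lower-case names, no function called read). It also walks the directories in a thread pool, on the assumption that the work is dominated by disk I/O; whether that helps in practice depends on the disk and the number of directories, so measure it. read_paths, dir_size, report and main are illustrative names; the CSV path is the one from the original.

```python
import csv
import os
from concurrent.futures import ThreadPoolExecutor


def read_paths(csv_path):
    # Flatten every field of every row into one list of paths,
    # as the original read() did with l += row.
    with open(csv_path, newline="") as f:
        return [field for row in csv.reader(f) for field in row]


def dir_size(folder):
    # Total size in bytes of all files below folder.
    total = 0
    for path, _dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(path, name))
    return total


def report(paths, workers=4):
    """Illustrative helper: {path: size in bytes, or None if missing}."""
    existing = [p for p in paths if os.path.exists(p)]
    # Directory walks are mostly waiting on the disk, so threads may help.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = dict(zip(existing, pool.map(dir_size, existing)))
    return {p: sizes.get(p) for p in paths}


def main(csv_path):
    for path, size in report(read_paths(csv_path)).items():
        if size is None:
            print("DOES NOT EXIST:", path)
        else:
            print("the file size of", path, "is", size)
```

To run it on the original data: main(r'C:\test\VDB.csv'). A path listed more than once in the CSV is reported once, since the results are keyed by path.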



--
Bob Gailer
919-636-4239
Chapel Hill NC
