Re: [Tutor] threading mind set

2012-05-13 Thread Russel Winder
On Mon, 2012-05-14 at 10:31 +1000, Steven D'Aprano wrote:
[...]
> Not hard compared to what?

Compared to sequential programming.

[...]
> My argument is that once you move beyond the one-operation-after-another 
> programming model, almost any parallel processing problem is harder than the 
> equivalent sequential version, inherently due to the parallelism. Except 
> perhaps for "embarrassingly parallel" problems, parallelism adds complexity 
> even if your framework abstracts away most of the tedious detail like 
> semaphores.
> 
> http://en.wikipedia.org/wiki/Embarrassingly_parallel
> 
> Once you move beyond sequential execution, you have to think about issues
> that don't apply to sequential programs: how to divide the task up between
> processes/threads/actors/whatever, how to manage their synchronization,
> resource starvation (e.g. deadlocks, livelocks), etc.

Actor systems, dataflow systems and CSP (Communicating Sequential
Processes) do not guarantee lack of deadlock or livelock, but the whole
"processes communicating by passing messages not by sharing data"
approach makes it hugely easier to reason about what is happening.

Moreover, if, as with CSP, your actor or dataflow system enforces
sequential actors/operators, then it gets even better.

The secret to parallel processing (in general; there are always
exceptions and corner cases) is to write sequential bits that then
communicate using queues or channels.

No semaphores. No locks. No monitors. These are tools for operating
systems folk and for folk creating actor, dataflow and CSP queues and
channels.
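
A minimal sketch of that style in Python, assuming threads from the
standard library and queue.Queue as the channel. The stage names and the
SENTINEL end-of-stream marker are made up for illustration. Each stage is
plain sequential code; whatever locking is needed lives inside the queue
implementation, not in the application code.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, not a library constant


def producer(out_q, items):
    """Sequential stage: emit each item, then signal end of stream."""
    for item in items:
        out_q.put(item)
    out_q.put(SENTINEL)


def squarer(in_q, out_q):
    """Sequential stage: read numbers, square them, pass them on."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)  # forward the end-of-stream marker
            break
        out_q.put(item * item)


def run_pipeline(items):
    """Wire the stages together with queues and collect the results."""
    q1, q2 = queue.Queue(), queue.Queue()
    stages = [
        threading.Thread(target=producer, args=(q1, items)),
        threading.Thread(target=squarer, args=(q1, q2)),
    ]
    for stage in stages:
        stage.start()
    results = []
    while True:
        item = q2.get()
        if item is SENTINEL:
            break
        results.append(item)
    for stage in stages:
        stage.join()
    return results
```

Calling run_pipeline([1, 2, 3]) pushes the numbers through both stages.
No stage touches another stage's data directly; they only exchange
messages.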

> We have linear minds and it doesn't take that many real-time parallel tasks
> to overwhelm the human brain. I'm not saying that people can't reason in
> parallel, because we clearly can and do, but it's inherently harder than
> sequential reasoning.

I think if you delve into the psychology of it, our minds are far from
linear. Certainly at the electro-chemical level the brain is a massively
parallel machine.

Over the last 50 years, we have enshrined single processor, single
memory into our entire thinking about computing and programming. Our
education systems enforce sequential programming for all but the final
parallel programming option. The main reason for parallel programming
being labelled hard is that we have the wrong tools for reasoning about
it. This is the beauty of the 1960s/1970s models of actors, dataflow and
CSP: you deconstruct the problem into small bits, each of which is
sequential and comprehensible, then the overall behaviour of the system
is an emergent property of the interaction between these small
subsystems.

Instead of trying to reason about all the communications system-wide,
we just worry about what happens with a small subsystem.

The hard part is the decomposition. But then the hard part of software
has always been the algorithm.

You highlight "embarrassingly parallel" which is the simplest
decomposition possible, straight scatter/gather, aka map/reduce. More
often than not this is handled by a façade such as "parallel reduce".
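
As a rough illustration of scatter/gather behind a façade, here is a
sketch using concurrent.futures from the standard library. The word_count
and total_words names are invented for the example. It defaults to a
thread pool so it runs anywhere; on CPython, passing ProcessPoolExecutor
as executor_cls would be the way to get real CPU parallelism, since the
per-item work here is CPU-bound.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """The per-item work: independent of every other item."""
    return len(text.split())


def total_words(texts, executor_cls=ThreadPoolExecutor, workers=4):
    """Scatter word_count over texts, then gather (reduce) with sum()."""
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(word_count, texts))
```

The caller only sees total_words(texts); how the work is split up and
collected again is hidden inside the pool.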

It is perhaps worth noting that "Big Data" is moving to dataflow
processing in a "Big Way" :-) Data mining and the like have been
revolutionized by changing their perception of algorithms and how to
decompose problems.

[...]
> Python doesn't have a GIL. Some Python implementations do, most obviously
> CPython, the reference implementation. But Jython and IronPython don't. If
> the GIL is a problem for your program, consider running it on Jython or
> IronPython.

It is true that Python doesn't have a GIL, thanks for the correction.
CPython and (until recently) PyPy have a GIL. The PyPy folk are
experimenting with software transactional memory (STM) in the
interpreter to be able to remove the GIL. To date things are looking
very positive. PyPy will rock :-)

Although Guido has said (EuroPython 2010) that he is happy to continue with
the GIL in CPython, there are subversive elements (notably the PyPy
folk) who are trying to show that STM will work with CPython as well.

Jython is sadly lagging behind in terms of versions of Python supported
and is increasingly becoming irrelevant -- unless someone does something
soon. Groovy, JRuby and Clojure are the dynamic languages of choice on
the JVM.

IronPython is an interesting option except that there is all the FUD
about use of the CLR and having to pay extortion^H^H^H^H^H^H^H^H^H
licencing money to Microsoft. Also, Microsoft ceasing to fund IronPython
(and IronRuby) is a clear indicator that Microsoft have no intention of
supporting use of Python on the CLR. Thus it could end up in the same state
as Jython.

-- 
Russel.
=
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder



Re: [Tutor] threading mind set

2012-05-13 Thread Devin Jeanpierre
On Sun, May 13, 2012 at 8:31 PM, Steven D'Aprano  wrote:
>> Using processes and message passing, using dataflow, actors or CSP,
>> parallelism and concurrency is far more straightforward. Not easy,
>> agreed, but then programming isn't easy.
>
> My argument is that once you move beyond the one-operation-after-another
> programming model, almost any parallel processing problem is harder than the
> equivalent sequential version, inherently due to the parallelism. Except
> perhaps for "embarrassingly parallel" problems, parallelism adds complexity
> even if your framework abstracts away most of the tedious detail like
> semaphores.

If you agree that embarrassingly parallel multithreaded frameworks are
easy, what do you think of dataflow programming? It is exactly the
same, except that you can have multiple tasks, where one task depends
on the output of a previous task. It shares the property that it makes
no difference in what order things are executed (or sequential vs
parallel), so long as the data dependencies are respected -- so it's
another case where you don't actually have to think in a
non-sequential manner. (Rather, think in a "vectorized" per-work-item
manner.)

http://en.wikipedia.org/wiki/Dataflow_programming
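
A small sketch of that idea, assuming futures from the standard
concurrent.futures module stand in for a dataflow framework. The task
names (load, total, count, mean) are invented for the example. The two
middle tasks depend only on the loaded data, so they may run in either
order or in parallel; the final task waits for both of its inputs.

```python
from concurrent.futures import ThreadPoolExecutor


def load(n):
    """Source task: produce the data."""
    return list(range(n))


def total(values):
    """Depends only on the loaded data."""
    return sum(values)


def count(values):
    """Depends only on the loaded data."""
    return len(values)


def mean(total_value, count_value):
    """Depends on the outputs of both total() and count()."""
    return total_value / count_value


def mean_dataflow(n):
    """Run the graph: load -> (total, count) -> mean."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        data = pool.submit(load, n).result()
        total_f = pool.submit(total, data)   # independent of count
        count_f = pool.submit(count, data)   # independent of total
        return mean(total_f.result(), count_f.result())
```

Each task is written as ordinary sequential code; only the wiring in
mean_dataflow expresses which results feed which task.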

It should be clear that not all ways of programming multithreaded code
are equal, and some are easier than others. In particular, having
mutable state shared between two concurrently-executing procedures is
phenomenally hard, and when it's avoided things become simpler.

-- Devin
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] threading mind set

2012-05-13 Thread Steven D'Aprano

Russel Winder wrote:
> Steven,
>
> On Sun, 2012-05-13 at 10:22 +1000, Steven D'Aprano wrote:
>> carlo locci wrote:
>>> Hello All,
>>> I've started to study python a couple of months ago (and I truly love it :)),
>>> however I'm having some problems understanding how to modify a sequential
>>> script and make it multithreaded (I think it's because I'm not used to
>>> think in that way),
>>
>> No, that's because multithreading and parallel processing is hard.
>
> Shared memory multithreading may be hard due to locks, semaphores,
> monitors, etc., but concurrency and parallelism need not be hard.

Not hard compared to what?

> Using processes and message passing, using dataflow, actors or CSP,
> parallelism and concurrency is far more straightforward. Not easy,
> agreed, but then programming isn't easy.



My argument is that once you move beyond the one-operation-after-another 
programming model, almost any parallel processing problem is harder than the 
equivalent sequential version, inherently due to the parallelism. Except 
perhaps for "embarrassingly parallel" problems, parallelism adds complexity 
even if your framework abstracts away most of the tedious detail like semaphores.


http://en.wikipedia.org/wiki/Embarrassingly_parallel

Once you move beyond sequential execution, you have to think about issues that 
don't apply to sequential programs: how to divide the task up between 
processes/threads/actors/whatever, how to manage their synchronization, 
resource starvation (e.g. deadlocks, livelocks), etc.


We have linear minds and it doesn't take that many real-time parallel tasks to 
overwhelm the human brain. I'm not saying that people can't reason in 
parallel, because we clearly can and do, but it's inherently harder than 
sequential reasoning.




> The GIL in Python is a bad thing for parallelism. Using the
> multiprocessing package or concurrent.futures gets over the problem.
> Well sort of, these processes are a bit heavyweight compared to what can
> be achieved on the JVM or with Erlang.


Python doesn't have a GIL. Some Python implementations do, most obviously 
CPython, the reference implementation. But Jython and IronPython don't. If the 
GIL is a problem for your program, consider running it on Jython or IronPython.




--
Steven



Re: [Tutor] threading mind set

2012-05-13 Thread Russel Winder
Steven,

On Sun, 2012-05-13 at 10:22 +1000, Steven D'Aprano wrote:
> carlo locci wrote:
> > Hello All,
> > I've started to study python a couple of months ago (and I truly love it :)),
> > however I'm having some problems understanding how to modify a sequential
> > script and make it multithreaded (I think it's because I'm not used to
> > think in that way), 
> 
> No, that's because multithreading and parallel processing is hard.

Shared memory multithreading may be hard due to locks, semaphores,
monitors, etc., but concurrency and parallelism need not be hard. Using
processes and message passing, using dataflow, actors or CSP,
parallelism and concurrency is far more straightforward. Not easy,
agreed, but then programming isn't easy.

> > as well as when it's best to use it (some say that
> > because of the GIL I won't get any real benefit from threading my script).
> 
> That depends on what your script does.
> 
> In a nutshell, if your program is limited by CPU processing, then using
> threads in Python won't help. (There are other things you can do instead,
> such as launching new Python processes.)

The GIL in Python is a bad thing for parallelism. Using the
multiprocessing package or concurrent.futures gets over the problem.
Well sort of, these processes are a bit heavyweight compared to what can
be achieved on the JVM or with Erlang.

> If your program is limited by disk or network I/O, then there is a
> possibility you can speed it up with threads.

Or better still use an event based system, cf Twisted.
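
To give a flavour of the event-based style without depending on Twisted,
here is a sketch using asyncio, which is in the standard library of later
Python 3 versions. It is not Twisted's API, and the fetch function is a
hypothetical stand-in that only sleeps to simulate waiting on I/O.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for one network or disk request (hypothetical)."""
    await asyncio.sleep(delay)  # yields to the event loop while "waiting"
    return name


async def fetch_all(jobs):
    """Run every job on one thread; the waits overlap instead of adding up."""
    return await asyncio.gather(*(fetch(name, delay) for name, delay in jobs))
```

Something like asyncio.run(fetch_all([("a", 0.02), ("b", 0.01)])) drives
the loop. Everything runs on a single thread, so there is no shared-state
locking to think about.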

[...]
> 

-- 
Russel.


Re: [Tutor] threading mind set

2012-05-13 Thread Steven D'Aprano

bob gailer wrote:
> On 5/12/2012 8:22 PM, Steven D'Aprano wrote:
>> By the way, in future, please don't decorate your code with stars:
> I think you got stars because the code was posted in HTML and bolded.
> Plain text readers add the * to show emphasis.


I think you have it the other way around: if you add asterisks around text, 
some plain text readers hide the * and bold the text. At least, I've never 
seen anything which does it the other way around. (Possibly until now.)


In any case, I'm using Thunderbird, and it does NOT show stars around text 
unless they are already there. When I look at the raw email source, I can see 
the asterisks there.


Perhaps Carlo's mail client is trying to be helpful, and failing miserably.
While converting HTML <b> tags into simple markup is a nice thing to do
for plain text, it plays havoc with code.




--
Steven



Re: [Tutor] threading mind set

2012-05-12 Thread bob gailer

def read():
A couple of observations:
1 - it is customary to put all import statements at the beginning of the
    file.
2 - it is customary to begin variable and function names with a lower
    case letter.
3 - it is better to avoid using built-in function names and common method
    names (e.g. read).
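
One way the script might look with those three points applied, as a
sketch only. It assumes Python 3 (print as a function, text-mode csv
files), and the names read_paths, dir_get_size and report are suggestions
rather than anything from the original post.

```python
import csv
import os


def read_paths(csv_path):
    """Return every cell of the CSV file as one flat list of paths."""
    paths = []
    with open(csv_path, newline='') as csv_file:
        for row in csv.reader(csv_file):
            paths.extend(row)
    return paths


def dir_get_size(folder):
    """Return the total size in bytes of all files under folder."""
    folder_size = 0
    for path, dirs, files in os.walk(folder):
        for name in files:
            folder_size += os.path.getsize(os.path.join(path, name))
    return folder_size


def report(csv_path):
    """Print the size of each folder listed in the CSV file."""
    for folder in read_paths(csv_path):
        if not os.path.exists(folder):
            print('DOES NOT EXIST:', folder)
        else:
            print('the size of', folder, 'is', dir_get_size(folder))
```

The original would then be run as report('C:\\test\\VDB.csv').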


def read():
    import csv
    with open('C:\\test\\VDB.csv', 'rb') as somefile:
        read = csv.reader(somefile)
        l = []
        for row in read:
            l += row
    return l

def DirGetSize(cartella):
    import os
    cartella_size = 0
    for (path, dirs, files) in os.walk(cartella):
        for x in files:
            filename = os.path.join(path, x)
            cartella_size += os.path.getsize(filename)
    return cartella_size

import os.path
for x in read():
    if not os.path.exists(x):
        print ' DOES NOT EXIST ON', x
    else:
        S = DirGetSize(x)
        print 'the file size of', x, 'is',S



--
Bob Gailer
919-636-4239
Chapel Hill NC



Re: [Tutor] threading mind set

2012-05-12 Thread bob gailer

On 5/12/2012 8:22 PM, Steven D'Aprano wrote:
> By the way, in future, please don't decorate your code with stars:

I think you got stars because the code was posted in HTML and bolded.
Plain text readers add the * to show emphasis.

When I copied and pasted the code it came out fine.

carlo: in future please post plain text rather than HTML.

--
Bob Gailer
919-636-4239
Chapel Hill NC



Re: [Tutor] threading mind set

2012-05-12 Thread Steven D'Aprano

carlo locci wrote:
> Hello All,
> I've started to study python a couple of months ago (and I truly love it :)),
> however I'm having some problems understanding how to modify a sequential
> script and make it multithreaded (I think it's because I'm not used to
> think in that way),


No, that's because multithreading and parallel processing is hard.



> as well as when it's best to use it (some say that
> because of the GIL I won't get any real benefit from threading my script).


That depends on what your script does.

In a nutshell, if your program is limited by CPU processing, then using 
threads in Python won't help. (There are other things you can do instead, such 
as launching new Python processes.)


If your program is limited by disk or network I/O, then there is a possibility 
you can speed it up with threads.




> It's my understanding that threading a program in python can be useful when
> we've got some I/O involved,


To see the benefit of threads, it's not enough to have "some" I/O, you need 
*lots* of I/O. Threads have some overhead. Unless you save at least as much 
time as just starting and managing the threads consumes, you won't see any 
speed up.


In my experience, for what little it's worth [emphasis on "little"], unless 
you can keep at least four threads busy doing separate I/O, it probably isn't 
worth the time and effort. And it's probably not worth it for trivial scripts 
-- who cares if you speed your script up from 0.2 seconds to 0.1 seconds?


But as a learning exercise, sure, go ahead and convert your script to threads. 
One experiment is worth a dozen opinions.
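
For anyone wanting to run that experiment, here is one possible shape for
it, as a sketch only. It assumes Python 3 and the standard
concurrent.futures module, uses a lower-case dir_get_size in place of the
original DirGetSize, and the default of four workers is just a starting
point to tune. The timed helper is hypothetical and only wraps
time.perf_counter.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def dir_get_size(folder):
    """Return the total size in bytes of all files under folder."""
    size = 0
    for path, dirs, files in os.walk(folder):
        for name in files:
            size += os.path.getsize(os.path.join(path, name))
    return size


def sizes_sequential(folders):
    """Measure one folder after another."""
    return [dir_get_size(folder) for folder in folders]


def sizes_threaded(folders, workers=4):
    """Measure several folders at once; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(dir_get_size, folders))


def timed(func, *args):
    """Return (result, elapsed seconds) for one call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Comparing timed(sizes_sequential, folders) with
timed(sizes_threaded, folders) on the real server paths shows whether the
threads pay for themselves. On a handful of small local folders they
probably will not.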


You can learn more about threading from here:

http://www.doughellmann.com/PyMOTW/threading/


By the way, in future, please don't decorate your code with stars:


* def read():*
*import csv*
*with open('C:\\test\\VDB.csv', 'rb') as somefile:*

[...]


We should be able to copy and paste your code and have it run immediately, not 
have to spend time editing it by hand to turn it back into valid Python code 
that doesn't give a SyntaxError on every line.


See also this: http://sscce.org/



--
Steven



[Tutor] threading mind set

2012-05-12 Thread carlo locci
Hello All,
I started studying Python a couple of months ago (and I truly love it :)),
however I'm having some problems understanding how to modify a sequential
script and make it multithreaded (I think it's because I'm not used to
thinking in that way), as well as when it's best to use it (some say that
because of the GIL I won't get any real benefit from threading my script).
It's my understanding that threading a program in Python can be useful when
we've got some I/O involved, so here is my case: I wrote a quite simple
script that reads the first column from a csv file and inserts the value
from every row into a list; then I created a function which gets me the
size of a given path/folder, and I made it loop so that it prints the
folder size of each path in the list previously created. Here's the code:

* def read():*
*import csv*
*with open('C:\\test\\VDB.csv', 'rb') as somefile:*
*read = csv.reader(somefile)*
*l = []*
*for row in read:*
*l += row*
*return l*
*
*
*def DirGetSize(cartella):*
*import os*
*cartella_size = 0*
*for (path, dirs, files) in os.walk(cartella):*
*for x in files:*
*filename = os.path.join(path, x)*
*cartella_size += os.path.getsize(filename)*
*return cartella_size*
*
*
*import os.path*
*for x in read():*
*if not os.path.exists(x):*
*print ' DOES NOT EXIST ON', x*
*else:*
*S = DirGetSize(x)*
*print 'the file size of', x, 'is',S*
*
*
The script works quite well (at least it does what I want), but my real
question is: will I gain any better performance, in terms of speed, if I
multithread it? The csv file contains a list of server/path/folder entries,
so I thought that if I multithreaded it, it would become much faster, since
it would run the DirGetSize function almost concurrently. I'm quite
confused by the subject, though, so I'm not really sure. I would really
appreciate it if anyone could help me understand when it's useful to
implement a multithreaded script and when it's not, and why :) (maybe I'm
asking too much), as well as any good resources where I can study.
Thank you in advance to anyone who replies, and thank you for having such
a mailing list (I discovered it when I watched a Google I/O conference
talk on YouTube). Thank you guys.