Re: [Numpy-discussion] Slow Numpy/MKL vs Matlab/MKL

2011-12-07 Thread Pauli Virtanen
On 06.12.2011 23:31, Oleg Mikulya wrote:
> How to make Numpy match Matlab in terms of performance? I have tried
> with different options, using different MKL libraries and ICC versions,
> but Numpy is still below Matlab for certain basic tasks by ~2x. About 5
> years ago I was able to get about the same speed, but not anymore. Matlab
> is supposed to use the same MKL, so what is the reason for such Numpy
> slowness (besides one, yet fundamental, task)?

There should be no reason for a difference. Numpy simply makes calls to
the external library, and the wrapper code is straightforward.

If Numpy indeed is linked against MKL (check the build log), then one
possible reason could be different threading options passed to MKL.
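
A quick way to verify which BLAS/LAPACK libraries Numpy was built against,
without digging through the build log (the exact section names in the
output vary between builds):

import numpy as np

# Prints the library and include paths recorded at build time; an
# MKL-linked build will list the mkl libraries here.
np.show_config()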

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy 1.7.0 release?

2011-12-07 Thread Pierre Haessig
On 06/12/2011 23:13, Wes McKinney wrote:
> I think R has two functions read.csv and read.csv2, where read.csv2 is
> capable of dealing with things like European decimal format.
>
I may be wrong, but from R's help I understand that read.csv, read.csv2, 
read.delim, ...
are just calls to read.table with different default values (for 
separator, decimal sign, ...).
This function read.table is indeed pretty flexible (see signatures below).

Having a dedicated fast function for properly formatted CSV tables may be 
a good idea.
But how to define "properly formatted"? I've seen many tiny 
variations, so I'm not sure!

Now, for my personal use, I was not so much frustrated by loading 
performance as by the lack of NA support, so I wrote my own loadCsv 
function to get a masked array. It is neither beautiful nor very 
efficient, but it does the job!

Best,
Pierre

read.table & co. signatures:

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", row.names, col.names,
           as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text)

read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".",
         fill = TRUE, comment.char = "", ...)

read.csv2(file, header = TRUE, sep = ";", quote = "\"", dec = ",",
          fill = TRUE, comment.char = "", ...)

-
Copy paste from my own dirty "csv toolbox"

import numpy as np

NA = -1e30  # sentinel float for missing values (the original constant was
            # garbled in the archive; any value absent from the data works)

def _NA_conv(s):
    '''Convert a string number representation into a float,
    with special behaviour for "NA" values:
    if s == "" or s == "NA", return the sentinel value NA.
    '''
    if s == '' or s == 'NA':
        return NA
    else:
        return float(s)

def loadCsv(filename, delimiter=',', usecols=None, skiprows=1):
    '''Wrapper around numpy.loadtxt to load a properly R-formatted
    CSV file with NA values, whose first row is a header row.

    Returns
    -------
    (headers, data, dataNAs)
    '''
    # 1) Read the header line
    headers = []
    with open(filename) as f:
        line = f.readline().strip()
        headers = line.split(delimiter)

    if usecols:
        headers = [headers[i] for i in usecols]

    # 2) Read the data, converting NA strings to the sentinel value
    converters = None
    if usecols is not None:
        converters = dict(zip(usecols, [_NA_conv] * len(usecols)))
    data = np.loadtxt(filename,
                      delimiter=delimiter, usecols=usecols,
                      skiprows=skiprows,
                      converters=converters)

    dataNAs = (data == NA)
    # Set NAs to zero...
    data[dataNAs] = 0.
    # ... and mask them out in a masked array
    data = np.ma.masked_array(data, dataNAs)

    return (headers, data, dataNAs)
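
A hypothetical usage sketch (file name and column indices invented for
illustration); because the result is masked, reductions skip the NA cells:

headers, data, dataNAs = loadCsv('measures.csv', usecols=(0, 1, 2))
print(headers)
print(data.mean(axis=0))   # masked NA cells don't bias the column means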


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy 1.7.0 release?

2011-12-07 Thread Pierre GM

On Dec 07, 2011, at 11:24 , Pierre Haessig wrote:
> 
> Now, for my personal use, I was not so much frustrated by loading 
> performance as by the lack of NA support, so I wrote my own loadCsv 
> function to get a masked array. It is neither beautiful nor very 
> efficient, but it does the job!

Ever tried to use genfromtxt ?
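
For reference, genfromtxt can already produce a masked array straight from
a file with NA holes. A minimal sketch (file name and options chosen for
illustration):

import numpy as np

# names=True takes column names from the header row; usemask=True returns
# a masked array with the 'NA' and empty cells masked out.
data = np.genfromtxt('measures.csv', delimiter=',', names=True,
                     missing_values='NA', usemask=True)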
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy 1.7.0 release?

2011-12-07 Thread Pierre Haessig
On 07/12/2011 12:42, Pierre GM wrote:
> Ever tried to use genfromtxt ?
You'll guess I didn't... I'll give it a try next time ;-)
Thanks for the tip!

Best,
Pierre
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Apparently non-deterministic behaviour of complex array multiplication

2011-12-07 Thread Olivier Delalleau
I was trying to see if I could reproduce this problem, but your code fails
with numpy 1.6.1 with:
AttributeError: 'numpy.ndarray' object has no attribute 'H'
Is X supposed to be a regular ndarray with dtype = 'complex128', or
something else?
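
(For context: the .H conjugate-transpose attribute exists on np.matrix but
not on plain ndarrays, which is why the code only runs after an asmatrix()
call. A minimal sketch, with invented array contents:)

import numpy as np

X = np.ones((2, 2), dtype='complex128') * (1 + 2j)
# X.H raises AttributeError on a plain ndarray...
Xm = np.asmatrix(X)
S = Xm * Xm.H            # ...but works on a matrix (conjugate transpose)
S2 = X.dot(X.conj().T)   # equivalent computation for a plain ndarray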

-=- Olivier

2011/12/5 kneil 

>
> Hi Nathaniel,
> Thanks for the suggestion.  I more or less implemented it:
>
> np.save('X', X)
> X2 = np.load('X.npy')
> X2 = np.asmatrix(X2)
> diffy = (X != X2)
> if diffy.any():
>     print X[diffy]
>     print X2[diffy]
>     print X[diffy][0].view(np.uint8)
>     print X2[diffy][0].view(np.uint8)
> S = X*X.H/k
> S2 = X2*X2.H/k
>
> nanElts = find(isnan(S))   # find/isnan presumably from pylab
> if len(nanElts) != 0:
>     print 'WARNING: Nans in S: ' + str(find(isnan(S)))
>     print 'WARNING: Nans in S2: ' + str(find(isnan(S2)))
>
>
>
> My output (when I got NaN) mostly indicated that both arrays are
> numerically
> identical, and that they evaluated to have the same nan-valued entries.
>
> For example
> >>WARNING: Nans in S:[ 6 16]
> >>WARNING: Nans in S2:[ 6 16]
>
> Another time I got as output:
>
> >>WARNING: Nans in S:[ 26  36  46  54  64  72  82  92 100 110 128 138 146
> 156 166 174 184 192
>  202 212 220 230 240 250 260 268 278 279 296 297 306 314 324 334 335 342
>  352 360 370 380 388 398 416 426 434 444 454 464 474]
> >>WARNING: Nans in S2:[ 26  36  46  54  64  72  82  92 100 110 128 138 146
> 156 166 174 184 192
>  202 212 220 230 240 250 260 268 278 279 296 297 306 314 324 334 335 342
>  352 360 370 380 388 398 416 426 434 444 454 464 474]
>
> These were different arrays, I think. At any rate, those two results
> appeared
> from two runs of the exact same code. I do not use any random numbers in
> the code, by the way. Most of the time the code runs without any NaN
> showing
> up at all, so this is an improvement.
>
> *I am pretty sure that one time there were NaNs in S, but not in S2, yet
> still no difference was observed between the two matrices X and X2. But I
> did not save that output, so I can't prove it to myself... but I am pretty
> sure I saw that.
>
> I will try to run memtest tonight. I am going out of town for a week and
> probably won't be able to test until next week.
> cheers,
> Karl
>
> One more note:
> 1. I have many fewer NaNs than I used to, but I still get NaN in S,
> but NOT in S2!
>
>
>
> Nathaniel Smith wrote:
> >
> > If save/load actually makes a reliable difference, then it would be
> useful
> > to do something like this, and see what you see:
> >
> > save("X", X)
> > X2 = load("X.npy")
> > diff = (X != X2)
> > # did save/load change anything?
> > any(diff)
> > # if so, then what changed?
> > X[diff]
> > X2[diff]
> > # any subtle differences in floating point representation?
> > X[diff][0].view(np.uint8)
> > X2[diff][0].view(np.uint8)
> >
> > (You should still run memtest. It's very easy - just install it with your
> > package manager, then reboot. Hold down the shift key while booting, and
> > you'll get a boot menu. Choose memtest, and then leave it to run
> > overnight.)
> >
> > - Nathaniel
> > On Dec 2, 2011 10:10 PM, "kneil"  wrote:
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Apparently-non-deterministic-behaviour-of-complex-array-multiplication-tp32893004p32922174.html
> Sent from the Numpy-discussion mailing list archive at Nabble.com.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy 1.7.0 release?

2011-12-07 Thread Thouis (Ray) Jones
On Tue, Dec 6, 2011 at 22:11, Ralf Gommers  wrote:
> To be a bit more detailed here, these are the most significant pull requests
> / patches that I think can be merged with a limited amount of work:
> meshgrid enhancements: http://projects.scipy.org/numpy/ticket/966
> sample_from function: https://github.com/numpy/numpy/pull/151
> loadtable function: https://github.com/numpy/numpy/pull/143
>
> Other maintenance things:
> - un-deprecate putmask
> - clean up causes of "DType strings 'O4' and 'O8' are deprecated..."
> - fix failing einsum and polyfit tests
> - update release notes

I'd suggest that, if possible, someone with sufficient knowledge to
evaluate it look at ticket #1990 (data truncation from arrays of
strings and integers), since it's both potentially dangerous and a new
bug introduced between 1.5.1 and 1.6.1.  It might be straightforward to
fix, and if so, I think it's probably worth doing for this release.

My opinion might be unduly influenced by a collaborator having been
bitten by this bug recently, and having to throw away and redo a few
weeks of calculations and analysis.

Ray Jones
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Fast Reading of ASCII files

2011-12-07 Thread Chris.Barker
Hi folks,

This is a continuation of a conversation already started, but I gave it 
a new, more appropriate thread and subject.

On 12/6/11 2:13 PM, Wes McKinney wrote:
> we should start talking
> about building a *high performance* flat file loading solution with
> good column type inference and sensible defaults, etc.
...

>  I personally don't
> believe in sacrificing an order of magnitude of performance in the 90%
> case for the 10% case-- so maybe it makes sense to have two functions
> around: a superfast custom CSV reader for well-behaved data, and a
> slower, but highly flexible, function like loadtable to fall back on.

I've wanted this for ages, and have done some work towards it, but like 
others, I only had time for a my-use-case-specific solution. A few 
thoughts:

* If we have a good, fast ascii (or unicode?) to array reader, hopefully 
it could be leveraged for the more complex cases, so that rather than 
genfromtxt() being written from scratch, it would be a wrapper around 
the lower-level reader.

* Key to performance is to have the text-to-number-to-numpy-type 
conversion happen in C -- if you read the text with Python, then convert 
to numbers, then to numpy arrays, it's simply going to be slow (see the 
sketch after this list).

* I think we want a solution that can be adapted to arbitrary text files 
-- not just tabular, CSV-style data. I have a lot of those to read - and 
some thoughts about how.
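
A rough sketch of the difference (sizes invented; timings vary by machine).
The point is only that the C-level parser in np.fromstring avoids the
Python-level float() loop:

import time
import numpy as np

text = '\n'.join('1.0,2.5,3.5' for _ in range(200000))

t0 = time.time()
slow = np.array([[float(v) for v in line.split(',')]
                 for line in text.split('\n')])      # Python-level parsing
t1 = time.time()
fast = np.fromstring(text.replace('\n', ','), sep=',').reshape(-1, 3)  # C parser
t2 = time.time()

print('python-level parse: %.3f s' % (t1 - t0))
print('C-level parse:      %.3f s' % (t2 - t1))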

Efforts I have made so far, and what I've learned from them:

1) fromfile():
 fromfile() (for text) is nice and fast, but buggy and a bit too 
limited. I've posted various notes about this in the past (and, I'm 
pretty sure, a couple of tickets). The key missing features are:
   a) no support for commented lines (this is a lesser need, I think)
   b) there can be only one delimiter, and newlines are treated as 
generic whitespace. This means that with a whitespace-delimited file you 
can read multiple lines, but if it is, for instance, comma-delimited, 
then you can only read one line at a time, killing performance.
   c) there are various bugs if the text is malformed or doesn't quite 
match what you're asking for (e.g. reading integers, but the text is 
float) -- mostly really limited error checking.

I spent some time digging into the code, and found it to be really 
hard-to-track C code, and very hard to update. The core idea is pretty 
nice -- each dtype should know how to read itself from a text file -- 
but the implementation is painful. The key issue is that for floats and 
ints, anyway, it relies on the C atoi and atof functions. However, there 
have been patches to these that handle NaN better, etc., for numpy, and 
I think a python patch as well. So the code calls the numpy atoi, which 
does some checks, then calls the python atoi, which then calls the C lib 
atoi (I think that's the chain...). In any case, the core bugs are due 
to the fact that atoi and friends don't return an error code, so you 
have to check whether the pointer has been incremented to see if the 
read was successful -- and this error checking is not propagated through 
all those levels of calls. It got really ugly to try to fix! Also, the 
use of the C atoi() means that locales may only be handled in the 
default way -- i.e. no way to read European-style floats on a system 
with a US locale.

My conclusion -- the current code is too much of a mess to try to deal 
with and fix!

I also think it's a mistake to have text file reading be a special case 
of fromfile(); it really should be a separate function, though that's a 
minor API question.

2) FileScanner:

FileScanner is some code I wrote years ago as a C extension -- it's 
limited, but it does the job and is pretty fast. It essentially calls 
fscanf() as many times as it gets a successful scan, skipping all 
invalid text, then returns a numpy array. You can also specify how many 
numbers you want read from the file. It only supports floats. 
Travis O. asked if it could be included in Scipy way back when, but I 
suspect none of my code actually made it in.

If I had to do it again, I might write something similar in Cython, 
though I am still using it.


My Conclusions:

I think what we need is something similar to MATLAB's fscanf():

what it does is take a C-style format string and apply it to your file 
over and over again, as many times as it can, and return an array. What's 
nice about this is that it can be repurposed to efficiently read a wide 
variety of text files fast.

For numpy, I imagine something like:

fromtextfile(f, dtype=np.float64, comment=None, shape=None):
"""
read data from a text file, returning a numpy array

f: is a filename or file-like object

comment: is a string of the comment signifier. Anything on a line
 after this string will be ignored.

dtype: is a numpy dtype that you want read from the file

shape: is the shape of the resulting array. If shape==None, the
   file will be read until EOF or until there is a read error.
   By 

Re: [Numpy-discussion] numpy 1.7.0 release?

2011-12-07 Thread Bruce Southey
On Tue, Dec 6, 2011 at 4:13 PM, Wes McKinney  wrote:
> On Tue, Dec 6, 2011 at 4:11 PM, Ralf Gommers
>  wrote:
>>
>>
>> On Mon, Dec 5, 2011 at 8:43 PM, Ralf Gommers 
>> wrote:
>>>
>>> Hi all,
>>>
>>> It's been a little over 6 months since the release of 1.6.0 and the NA
>>> debate has quieted down, so I'd like to ask your opinion on the timing of
>>> 1.7.0. It looks to me like we have a healthy amount of bug fixes and small
>>> improvements, plus three larger chunks of work:
>>>
>>> - datetime
>>> - NA
>>> - Bento support
>>>
>>> My impression is that both datetime and NA are releasable, but should be
>>> labeled "tech preview" or something similar, because they may still see
>>> significant changes. Please correct me if I'm wrong.
>>>
>>> There's still some maintenance work to do and pull requests to merge, but
>>> a beta release by Christmas should be feasible.
>>
>>
>> To be a bit more detailed here, these are the most significant pull requests
>> / patches that I think can be merged with a limited amount of work:
>> meshgrid enhancements: http://projects.scipy.org/numpy/ticket/966
>> sample_from function: https://github.com/numpy/numpy/pull/151
>> loadtable function: https://github.com/numpy/numpy/pull/143
>>
>> Other maintenance things:
>> - un-deprecate putmask
>> - clean up causes of "DType strings 'O4' and 'O8' are deprecated..."
>> - fix failing einsum and polyfit tests
>> - update release notes
>>
>> Cheers,
>> Ralf
>>
>>
>>> What do you all think?
>>>
>>>
>>> Cheers,
>>> Ralf
>>
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> This isn't the place for this discussion but we should start talking
> about building a *high performance* flat file loading solution with
> good column type inference and sensible defaults, etc. It's clear that
> loadtable is aiming for highest compatibility-- for example I can read
> a 2800x30 file in < 50 ms with the read_table / read_csv functions I
> wrote myself recently in Cython (compared with loadtable taking > 1s as
> quoted in the pull request), but I don't handle European decimal
> formats and lots of other sources of unruliness. I personally don't
> believe in sacrificing an order of magnitude of performance in the 90%
> case for the 10% case-- so maybe it makes sense to have two functions
> around: a superfast custom CSV reader for well-behaved data, and a
> slower, but highly flexible, function like loadtable to fall back on.
> I think R has two functions read.csv and read.csv2, where read.csv2 is
> capable of dealing with things like European decimal format.
>
> - Wes
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

I do not agree with the loadtable request, simply because I do not want
to have functions that do virtually the same thing - see the comments on
the pull request (and Chris's email on 'Fast Reading of ASCII files').
I would like to see a valid user-space justification for including it,
because just using regexes is not a suitable justification (though I
agree it is an interesting feature):
If loadtable will be a complete replacement for genfromtxt, then there
needs to be a plan towards supporting all the features of genfromtxt,
like 'skip_footer', and genfromtxt then needs to be set on the path to
deprecation.
If loadtable is an intermediate between loadtxt and genfromtxt, then it
needs to be made clear exactly what loadtable does not do that
genfromtxt does (anything that loadtable does and genfromtxt does not
do should be filed as a bug against genfromtxt).

Knowing which case applies makes it easier to provide help by directing
users to the appropriate function, and to know which function bug
reports should be filed against. For example, loadtxt requires that
'Each row in the text file must have the same number of values', so one
can direct a user to genfromtxt for that case rather than filing a bug
report against loadtxt.

I am also somewhat concerned about the NA object because of the limited
implementation available. For example, numpy.dot is not implemented.
There also appears to be no plan to extend the implementation across
numpy or to support it long term. So while I have no problem with it
being included, I do think there must be a serious commitment to fully
supporting it in the near future, as well as a suitable long-term
roadmap. Otherwise it will just be a problematic code dump that will be
difficult to support.

Bruce
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Simple way to launch python processes?

2011-12-07 Thread Lou Pecora
I would like to launch python modules or functions (I don't know which is 
easier to do, modules or functions) in separate Terminal windows so I can see 
the output from each as they execute.  I need to be able to pass each module or 
function a set of parameters.  I would like to do this from a python script 
already running in a Terminal window.  In other words, I'd start up a "master" 
script and it would launch, say, three processes using another module or a 
function with different parameter values for each launch, and each would run 
independently in its own Terminal window so stdout from each process would go 
to its own respective window.  When a process terminated, its window would 
remain open.

I've begun to look at the subprocess module, etc., but that's pretty 
confusing. I can do what I describe above manually, but it's gotten clumsy 
as I eventually want to run on 12 cores.

I have a Mac Pro running Mac OS X 10.6.

If there is a better forum to ask this question, please let me know. 

Thanks for any advice.

 
-- Lou Pecora,   my views are my own.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Slow Numpy/MKL vs Matlab/MKL

2011-12-07 Thread Oleg Mikulya
Agree with your statement. Yes, it is MKL, indeed. For linear equations
there is no difference, but there is a difference for other functions. And
yes, my suspicion is just the threading options. How do I pass them to MKL
from Python? Should I change some compile options or environment variables?

On Wed, Dec 7, 2011 at 2:02 AM, Pauli Virtanen  wrote:

> On 06.12.2011 23:31, Oleg Mikulya wrote:
> > How to make Numpy match Matlab in terms of performance? I have tried
> > with different options, using different MKL libraries and ICC versions,
> > but Numpy is still below Matlab for certain basic tasks by ~2x. About 5
> > years ago I was able to get about the same speed, but not anymore. Matlab
> > is supposed to use the same MKL, so what is the reason for such Numpy
> > slowness (besides one, yet fundamental, task)?
>
> There should be no reason for a difference. Numpy simply makes the calls to
> the external library, and the wrapper code is straightforward.
>
> If Numpy indeed is linked against MKL (check the build log), then one
> possible reason could be different threading options passed to MKL.
>
> --
> Pauli Virtanen
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Simple way to launch python processes?

2011-12-07 Thread Olivier Delalleau
Maybe try stackoverflow, since this isn't really a numpy question.
To run a command like "python myscript.py arg1 arg2" in a separate process,
you can do:
p = subprocess.Popen("python myscript.py arg1 arg2".split())
You can launch many of these, and if you want to know if a process p is
over, you can call p.poll().
I'm sure there are other (and better) options though.
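
(Since the goal is separate Terminal windows on OS X, here is a hedged
sketch of one way to do it - the script name and arguments are invented,
and it assumes Terminal.app plus the standard osascript tool:)

import subprocess

cmd = 'python myscript.py arg1 arg2'
# Ask Terminal.app, via AppleScript, to open a new window running cmd;
# the window stays open after the command finishes.
subprocess.Popen(['osascript', '-e',
                  'tell application "Terminal" to do script "%s"' % cmd])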

-=- Olivier

2011/12/7 Lou Pecora 

> I would like to launch python modules or functions (I don't know which is
> easier to do, modules or functions) in separate Terminal windows so I can
> see the output from each as they execute.  I need to be able to pass each
> module or function a set of parameters.  I would like to do this from a
> python script already running in a Terminal window.  In other words, I'd
> start up a "master" script and it would launch, say, three processes using
> another module or a function with different parameter values for each
> launch and each would run independently in its own Terminal window so
> stdout from each process would go to its own respective window.  When a
> process terminated, its window would remain open.
>
> I've begun to look at the subprocess module, etc., but that's pretty
> confusing. I can do what I describe above manually, but it's gotten clumsy
> as I eventually want to run on 12 cores.
>
> I have a Mac Pro running Mac OS X 10.6.
>
> If there is a better forum to ask this question, please let me know.
>
> Thanks for any advice.
>
>
> -- Lou Pecora, my views are my own.
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Simple way to launch python processes?

2011-12-07 Thread Jean-Baptiste Marquette
You should consider the powerful multiprocessing package. Have a look at this 
piece of code:

import glob
import os
import multiprocessing as multi
import subprocess as sub
import time

# DirWrk, DirSrc, DirLog and DirImg are path variables defined elsewhere
# in the original script.

NPROC = 4
Python = '/Library/Frameworks/EPD64.framework/Versions/Current/bin/python'
Xterm = '/usr/X11/bin/xterm '

coord = []
Size = '100x10'
XPos = 810
YPos = 170
XOffset = 0
YOffset = 0

# Build one xterm geometry string (WxH+X+Y) per worker, alternating columns.
for i in range(NPROC):
    if i % 2 == 0:
        coord.append(Size + '+' + str(YPos) + '+' + str(YOffset))
    else:
        coord.append(Size + '+' + str(XPos) + '+' + str(YOffset))
        YOffset = YOffset + YPos

def CompareColourRef(Champ):
    BaseChamp = os.path.basename(Champ)
    NameProc = int(multi.current_process().name[-1]) - 1
    print 'Processing', BaseChamp, 'on processor', NameProc + 1
    os.putenv('ADAM_USER', DirWrk + 'adam_' + str(NameProc + 1))
    # Run the worker script in its own xterm, teeing its output to a log file.
    Command = Xterm + '-geometry ' + '"' + coord[NameProc] + '" -T " Proc' + \
        str(NameProc + 1) + ' ' + BaseChamp + ' ' + '" -e " ' + Python + ' ' + \
        DirSrc + 'CompareColourRef.py ' + BaseChamp + ' 2>&1 | tee ' + \
        DirLog + BaseChamp + '.log"'
    Process = sub.Popen([Command], shell=True)
    Process.wait()
    print BaseChamp, 'processed on processor', NameProc + 1
    return

pool = multi.Pool(processes=NPROC)

Champs = glob.glob(DirImg + '*/*')
results = pool.map_async(CompareColourRef, Champs)
pool.close()

# _number_left is a private attribute of the MapResult object, used here
# as a crude progress indicator.
while results._number_left > 0:
    print "Waiting for", results._number_left, 'tasks to complete'
    time.sleep(15)

pool.join()

print 'Process completed'
exit(0)

Cheers
Jean-Baptiste


On 7 Dec 2011, at 15:43, Olivier Delalleau wrote:

> Maybe try stackoverflow, since this isn't really a numpy question.
> To run a command like "python myscript.py arg1 arg2" in a separate process, 
> you can do:
> p = subprocess.Popen("python myscript.py arg1 arg2".split())
> You can launch many of these, and if you want to know if a process p is over, 
> you can call p.poll().
> I'm sure there are other (and better) options though.
> 
> -=- Olivier
> 
> 2011/12/7 Lou Pecora 
> I would like to launch python modules or functions (I don't know which is 
> easier to do, modules or functions) in separate Terminal windows so I can see 
> the output from each as they execute.  I need to be able to pass each module 
> or function a set of parameters.  I would like to do this from a python 
> script already running in a Terminal window.  In other words, I'd start up a 
> "master" script and it would launch, say, three processes using another 
> module or a function with different parameter values for each launch and each 
> would run independently in its own Terminal window so stdout from each 
> process would go to its own respective window.  When a process terminated, 
> its window would remain open.
> 
> I've begun to look at the subprocess module, etc., but that's pretty 
> confusing. I can do what I describe above manually, but it's gotten clumsy 
> as I eventually want to run on 12 cores.
> 
> I have a Mac Pro running Mac OS X 10.6.
> 
> If there is a better forum to ask this question, please let me know. 
> 
> Thanks for any advice.
> 
>  
> -- Lou Pecora, my views are my own.
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Slow Numpy/MKL vs Matlab/MKL

2011-12-07 Thread Derek Homeier
On 07.12.2011, at 9:38PM, Oleg Mikulya wrote:

> Agree with your statement. Yes, it is MKL, indeed. For linear equations
> there is no difference, but there is a difference for other functions. And
> yes, my suspicion is just the threading options. How do I pass them to MKL
> from Python? Should I change some compile options or environment variables?
> 
You could check by monitoring the CPU usage while running the tasks - if it
stays around 100%, it is probably not using multiple threads. Generally MKL
(if you linked the multi-threaded version, which seems to be the case, as
mkl_intel_thread is in the libs) heeds the OMP_NUM_THREADS environment
variable like other OpenMP programs. If that's set to your number of cores
before starting Python, it should be inherited; it might also be possible
to set it within Python (in any case you can check it with os.getenv()).
I don't know if Matlab sets different defaults so that multiple threads are
automatically used; normally I'd also expect Python to use all available
cores if OMP_NUM_THREADS is not set at all…
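
(A minimal sketch of checking and setting this from Python - the variable
must be set before NumPy/MKL is first loaded to reliably take effect, and
the thread count of 4 is an arbitrary example:)

import os

# Must happen before the first 'import numpy' in the process, since the
# OpenMP runtime reads the variable at initialization.
os.environ['OMP_NUM_THREADS'] = '4'
print(os.getenv('OMP_NUM_THREADS'))

import numpy as np
a = np.random.rand(2000, 2000)
b = np.dot(a, a)   # watch CPU usage: above 100% suggests MKL is threading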

Cheers,
Derek

> On Wed, Dec 7, 2011 at 2:02 AM, Pauli Virtanen  wrote:
> On 06.12.2011 23:31, Oleg Mikulya wrote:
> > How to make Numpy match Matlab in terms of performance? I have tried
> > with different options, using different MKL libraries and ICC versions,
> > but Numpy is still below Matlab for certain basic tasks by ~2x. About 5
> > years ago I was able to get about the same speed, but not anymore. Matlab
> > is supposed to use the same MKL, so what is the reason for such Numpy
> > slowness (besides one, yet fundamental, task)?
> 
> There should be no reason for a difference. Numpy simply makes the calls to
> the external library, and the wrapper code is straightforward.
> 
> If Numpy indeed is linked against MKL (check the build log), then one
> possible reason could be different threading options passed to MKL.
> 
> --
> Pauli Virtanen
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Slow Numpy/MKL vs Matlab/MKL

2011-12-07 Thread David Cournapeau
On Tue, Dec 6, 2011 at 5:31 PM, Oleg Mikulya  wrote:
> Hi,
>
> How to make Numpy match Matlab in terms of performance? I have tried
> with different options, using different MKL libraries and ICC versions,
> but Numpy is still below Matlab for certain basic tasks by ~2x. About 5
> years ago I was able to get about the same speed, but not anymore. Matlab
> is supposed to use the same MKL, so what is the reason for such Numpy
> slowness (besides one, yet fundamental, task)?

Have you checked that the returned values are the same (up to some
precision)? It may be that we don't use the same underlying LAPACK
function.

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Simple way to launch python processes?

2011-12-07 Thread Lou Pecora
From: Olivier Delalleau 

To: Discussion of Numerical Python  
Sent: Wednesday, December 7, 2011 3:43 PM
Subject: Re: [Numpy-discussion] Simple way to launch python processes?
 

Maybe try stackoverflow, since this isn't really a numpy question.
To run a command like "python myscript.py arg1 arg2" in a separate process, you 
can do:
    p = subprocess.Popen("python myscript.py arg1 arg2".split())
You can launch many of these, and if you want to know if a process p is over, 
you can call p.poll().
I'm sure there are other (and better) options though.

-=- Olivier



Thank you.
 
-- Lou Pecora, my views are my own.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Simple way to launch python processes?

2011-12-07 Thread Lou Pecora
From: Jean-Baptiste Marquette 

To: Discussion of Numerical Python  
Sent: Wednesday, December 7, 2011 4:23 PM
Subject: Re: [Numpy-discussion] Simple way to launch python processes?
 

You should consider the powerful multiprocessing package. Have a look on this 
piece of code:

import glob
import os
import multiprocessing as multi
import subprocess as sub
import time

NPROC = 4
Python = '/Library/Frameworks/EPD64.framework/Versions/Current/bin/python'
Xterm = '/usr/X11/bin/xterm '

coord = []
Size = '100x10'
XPos = 810
YPos = 170
XOffset = 0
YOffset = 0

for i in range(NPROC):
    if i % 2 == 0:
        coord.append(Size + '+' + str(YPos) + '+' + str(YOffset))
    else:
        coord.append(Size + '+' + str(XPos) + '+' + str(YOffset))
        YOffset = YOffset + YPos

def CompareColourRef(Champ):
    BaseChamp = os.path.basename(Champ)
    NameProc = int(multi.current_process().name[-1]) - 1
    print 'Processing', BaseChamp, 'on processor', NameProc+1
    os.putenv('ADAM_USER', DirWrk + 'adam_' + str(NameProc+1))
    Command =  Xterm + '-geometry ' + '"' + coord[NameProc] + '" -T " Proc' + 
str(NameProc+1) + ' ' + BaseChamp + ' ' + '" -e " ' + Python + ' ' + DirSrc + \
        'CompareColourRef.py ' + BaseChamp + ' 2>&1 | tee ' + DirLog + BaseChamp 
+ '.log"'
    Process = sub.Popen([Command], shell=True)
    Process.wait()
    print BaseChamp, 'processed on processor', NameProc+1
    return

pool = multi.Pool(processes=NPROC)

Champs = glob.glob(DirImg + '*/*')
results = pool.map_async(CompareColourRef, Champs)
pool.close()

while results._number_left > 0:
    print"Waiting for", results._number_left, 'tasks to complete'
    time.sleep(15)
    

pool.join()

print 'Process completed'
exit(0)

Cheers
Jean-Baptiste

--

Wow.  I will have to digest that, but thank you.

 
-- Lou Pecora, my views are my own.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] idea of optimisation?

2011-12-07 Thread Xavier Barthelemy
Actually this could be a good idea; I hadn't thought of using the sorting.
I'll try it.

Thanks for your ideas,
Xavier
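
(To make the suggestion below concrete - a minimal sketch of grouping
crests between consecutive zero-crossings with searchsorted, using the
example positions from the thread:)

import numpy as np

zeros = np.array([1.0, 2.0, 3.0, 4.0])
crests = np.array([1.5, 1.7, 3.5])

idx = np.searchsorted(zeros, crests)   # -> array([1, 1, 3])
# Crests sharing the same index fall between the same pair of zeros,
# which exposes the "two crests between two zero-crossings" grouping.
for i in np.unique(idx):
    group = crests[idx == i]
    print('between zeros %g and %g: %s' % (zeros[i - 1], zeros[i], group))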

2011/12/7 Tony Yu 

>
>
> On Tue, Dec 6, 2011 at 2:51 AM, Xavier Barthelemy wrote:
>
>> ok let me be more precise
>>
>> I have an Z array which is the elevation
>> from this I extract a discrete array of Zero Crossing, and another
>> discrete array of Crests.
>> len(crests) is different from len(zeros). I have a threshold method to
>> detect my "valid" crests, and sometimes there are 2 crests between two
>> zero-crossings (grouping effect).
>>
>> Crests and Zeros are 2 different arrays of positions. Example:
>> Zeros=[1,2,3,4] Crests=[1.5,1.7,3.5]
>>
>>
>> And yes, the arrays can be sorted; not a problem with this.
>>
>> Xavier
>>
>> I may be oversimplifying this, but does searchsorted do what you want?
>
> In [314]: xzeros=[1,2,3,4]; xcrests=[1.5,1.7,3.5]
>
> In [315]: np.searchsorted(xzeros, xcrests)
> Out[315]: array([1, 1, 3])
>
>  This returns, for each crest, the index i such that the crest lies
> between xzeros[i-1] and xzeros[i].
>
> -Tony
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
 « When the government violates the rights of the people, insurrection is,
for the people and for each portion of the people, the most sacred of
rights and the most indispensable of duties »

Déclaration des droits de l'homme et du citoyen, article 35, 1793
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] type checking, what's recommended?

2011-12-07 Thread josef . pktd
If I want to know whether something that might be an array is really a
plain ndarray and not a subclass, is using `type` the safest bet?

All the other forms don't discriminate against subclasses.

>>> type(np.ma.zeros(3)) is np.ndarray
False
>>> type(np.zeros(3)) is np.ndarray
True

>>> isinstance(np.ma.zeros(3), np.ndarray)
True
>>> isinstance(np.zeros(3), np.ndarray)
True

>>> issubclass(np.ma.zeros(3).__class__, np.ndarray)
True
>>> issubclass(np.zeros(3).__class__, np.ndarray)
True

>>> isinstance(np.matrix(np.zeros(3)), np.ndarray)
True
>>> type(np.matrix(np.zeros(3))) is np.ndarray
False

Thanks,

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] type checking, what's recommended?

2011-12-07 Thread Olivier Delalleau
We have indeed been using "type(a) is np.ndarray" in Theano to check that.
If there's a better way, I'm interested to know as well :)

-=- Olivier

2011/12/7 

> If I want to know whether something that might be an array is really a
> plain ndarray and not a subclass, is using `type` the safest bet?
>
> All the other forms don't discriminate against subclasses.
>
> >>> type(np.ma.zeros(3)) is np.ndarray
> False
> >>> type(np.zeros(3)) is np.ndarray
> True
>
> >>> isinstance(np.ma.zeros(3), np.ndarray)
> True
> >>> isinstance(np.zeros(3), np.ndarray)
> True
>
> >>> issubclass(np.ma.zeros(3).__class__, np.ndarray)
> True
> >>> issubclass(np.zeros(3).__class__, np.ndarray)
> True
>
> >>> isinstance(np.matrix(np.zeros(3)), np.ndarray)
> True
> >>> type(np.matrix(np.zeros(3))) is np.ndarray
> False
>
> Thanks,
>
> Josef
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion