[Numpy-discussion] confusion about eigenvector

2008-02-27 Thread [EMAIL PROTECTED]
Hi all,
I am learning the PCA method by reading the Turk & Pentland papers etc.
While trying out PCA on a set of greyscale images using Python and
NumPy, I tried to create the eigenvectors and the facespace.

I have:
 facesarray -- an N x P numpy.ndarray that contains the image data
   (N = number of images, P = pixels per image)
 avgarray -- a 1 x P array containing the average value of each pixel

from numpy import matrix
from numpy.linalg import eigh

adjustedfaces = facesarray - avgarray
adjustedmatrix = matrix(adjustedfaces)
adjustedmatrix_trans = adjustedmatrix.transpose()
covariancematrix = adjustedmatrix * adjustedmatrix_trans
evalues, evect = eigh(covariancematrix)

After sorting so that the most significant eigenvectors come first,
evectmatrix is my eigenvector matrix.
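numpy.linalg.eigh returns the eigenvalues in ascending order, with the eigenvector for evalues[i] stored in column evect[:, i]; that is why the largest eigenvalue comes last in the sample output in this post. A minimal sketch of the sorting step, assuming plain ndarrays with the @ operator (rather than numpy.matrix) and random stand-in data in place of real face images; the helper name sorted_eigh is hypothetical:

```python
import numpy as np

def sorted_eigh(cov):
    """Return eigenvalues in descending order, with eigenvector columns reordered to match."""
    evalues, evect = np.linalg.eigh(cov)    # ascending order; column i pairs with evalues[i]
    order = np.argsort(evalues)[::-1]       # indices of the largest eigenvalue first
    return evalues[order], evect[:, order]  # reorder columns, not rows

rng = np.random.default_rng(0)
faces = rng.random((4, 3))                  # stand-in data: N=4 images, P=3 pixels
adjusted = faces - faces.mean(axis=0)       # subtract the average face
cov = adjusted @ adjusted.T                 # N x N matrix, as in the post
evalues, evectmatrix = sorted_eigh(cov)
print(evalues)
```

The key point is that the reordering is applied to columns, because each column (not each row) is one eigenvector.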

Here is a sample using 4 x 3 greyscale images:

evalues
[ -1.85852801e-13   6.31143639e+02   3.31182765e+03   5.29077871e+03]
evect
[[ 0.5        -0.06727772  0.6496399  -0.56871936]
 [ 0.5        -0.77317718 -0.37697426  0.10043632]
 [ 0.5         0.27108233  0.31014514  0.76179023]
 [ 0.5         0.56937257 -0.58281078 -0.29350719]]
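The first eigenvalue above is zero up to rounding error, and its eigenvector is (0.5, 0.5, 0.5, 0.5). That is expected: subtracting the mean face makes each pixel column of adjustedfaces sum to zero, so the all-ones vector is sent to zero by adjustedfaces * adjustedfaces.T, and normalized to unit length it is 1/sqrt(N) = 0.5 for N = 4. A quick check, assuming random stand-in data (any centered N x P data behaves the same):

```python
import numpy as np

rng = np.random.default_rng(1)
faces = rng.random((4, 3))               # N=4 images, P=3 pixels (stand-in data)
adjusted = faces - faces.mean(axis=0)    # each pixel column now sums to zero
cov = adjusted @ adjusted.T

evalues, evect = np.linalg.eigh(cov)     # ascending, so the ~0 eigenvalue comes first
print(evalues[0])                        # approximately zero (rounding error only)
print(evect[:, 0])                       # +/- [0.5, 0.5, 0.5, 0.5]
```

Such a zero-eigenvalue direction carries no information about the faces and can be dropped from the facespace.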

evectmatrix  (sorted according to largest evalue first)
[[-0.56871936  0.6496399  -0.06727772  0.5   ]
 [ 0.10043632 -0.37697426 -0.77317718  0.5   ]
 [ 0.76179023  0.31014514  0.27108233  0.5   ]
 [-0.29350719 -0.58281078  0.56937257  0.5   ]]

Then I can create the facespace with
facespace = evectmatrix * adjustedfaces
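If the eigenvectors are the columns of evectmatrix (as numpy.linalg.eigh returns them), each eigenface is the centered face images weighted by one column, so the eigenfaces are the rows of evectmatrix.T * adjustedfaces. This is the small-matrix trick from Turk and Pentland: if v is an eigenvector of A A^T with eigenvalue lambda, then A^T v is an eigenvector of the much larger P x P matrix A^T A with the same eigenvalue. A sketch with plain ndarrays and random stand-in data; the helper name eigenfaces_from is hypothetical:

```python
import numpy as np

def eigenfaces_from(adjusted, k):
    """Top-k unit-length eigenfaces (as rows) via the small N x N eigenproblem."""
    evalues, evect = np.linalg.eigh(adjusted @ adjusted.T)   # columns are eigenvectors
    order = np.argsort(evalues)[::-1][:k]                     # k largest eigenvalues
    faces = evect[:, order].T @ adjusted                      # row i = (A^T v_i)^T, length P
    faces /= np.linalg.norm(faces, axis=1, keepdims=True)     # normalize each eigenface
    return evalues[order], faces

rng = np.random.default_rng(2)
data = rng.random((5, 40))                # stand-in: N=5 images of P=40 pixels
adjusted = data - data.mean(axis=0)
vals, eigenfaces = eigenfaces_from(adjusted, k=3)
weights = adjusted @ eigenfaces.T         # project each face into a 3-D facespace
print(weights.shape)                      # (5, 3)
```

Only k must be smaller than N, since centering leaves at most N - 1 nonzero eigenvalues.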

So far I've been following the steps described in the PCA
tutorial (by Lindsay Smith and others).
What I want to know is whether, in the above evectmatrix, each row
([-0.56871936  0.6496399  -0.06727772  0.5   ] etc.) is an eigenvector,
or whether each column represents an eigenvector.
To put it differently,
should I represent an eigenvector by
evectmatrix[i] or by
(get_column_i_of(evectmatrix)).transpose()?

If someone can make this clear, please do.
D
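For the question above: numpy.linalg.eigh documents that column v[:, i] is the normalized eigenvector for eigenvalue w[i], so the eigenvector is evect[:, i], not the row evect[i]. The sorted evectmatrix in the post is consistent with that, since its columns are the columns of evect in reverse order. A quick way to check the convention is to test M v = lambda v directly; this sketch uses a random symmetric stand-in matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.random((4, 4))
m = x + x.T                               # random symmetric stand-in matrix
evalues, evect = np.linalg.eigh(m)

col_ok = [np.allclose(m @ evect[:, i], evalues[i] * evect[:, i]) for i in range(4)]
row_ok = [np.allclose(m @ evect[i], evalues[i] * evect[i]) for i in range(4)]
print(col_ok)   # all True: column i pairs with evalues[i]
print(row_ok)   # rows are not eigenvectors in general
```

With numpy.matrix, evectmatrix[:, i] is already an N x 1 column, so no transpose is needed.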
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Robert Kern
On Thu, Feb 28, 2008 at 12:12 AM, Alan G Isaac <[EMAIL PROTECTED]> wrote:
> > On Wed, 27 Feb 2008, Robert Kern apparently wrote:
>  >> Fixed in r4827.
>
>
>
>  > On Wed, Feb 27, 2008 at 6:31 PM, Christopher Barker wrote:
>  >> For the record, this is the fixed version:
>  >> comment_start = line.find(comments)
>  >> if comment_start > 0:
>  >>     line = line[:comments_start].strip()
>  >> else:
>  >>     line = line.strip()
>
>
>  Three problems.
>  1. I do not see this change here:
>  http://svn.scipy.org/svn/numpy/trunk/numpy/core/numeric.py
>  Am I looking in the wrong place?

I fixed the version in numpy/lib/io.py. I didn't know there was a
second version lying around. It was moved there in the lib_io
branch but did not get removed from numpy/core during the merge.

>  2. Can I assume this was not cut and paste?
>  Otherwise, I see two problems.
>
> 2a.  comment_start vs. comments_start (spelling)
> 2b.  >0 instead of >=0   (e.g., "#try me!" would not be skipped)
>
>  So I think the desired lines are actually::
>
>
> comment_start = line.find(comments)
> if comment_start >= 0:
>     line = line[:comment_start].strip()
> else:
>     line = line.strip()
> return line

The errors were real. They are now fixed, thank you.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] ANN: Enthought Python Distribution - Beta

2008-02-27 Thread Alan G Isaac
On Wed, 27 Feb 2008, Travis Vaught apparently wrote:
> http://www.enthought.com/epd 

Looks good.
An increasing number of my students are buying Macs,
so the OSX support will be very welcome.

Cheers,
Alan Isaac





Re: [Numpy-discussion] Handling of numpy.power(0, )

2008-02-27 Thread Alan G Isaac
On Wed, 27 Feb 2008, Stuart Brorson apparently wrote:
> **  0^0:  This is problematic.


Accessible discussion:
http://en.wikipedia.org/wiki/Exponentiation#Zero_to_the_zero_power

Cheers,
Alan Isaac





Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Alan G Isaac
> On Wed, 27 Feb 2008, Robert Kern apparently wrote:
>> Fixed in r4827.


> On Wed, Feb 27, 2008 at 6:31 PM, Christopher Barker wrote:
>> For the record, this is the fixed version:
>> comment_start = line.find(comments)
>> if comment_start > 0:
>>     line = line[:comments_start].strip()
>> else:
>>     line = line.strip()


Three problems.
1. I do not see this change here: 
http://svn.scipy.org/svn/numpy/trunk/numpy/core/numeric.py
Am I looking in the wrong place?

2. Can I assume this was not cut and paste?
Otherwise, I see two problems.

2a.  comment_start vs. comments_start (spelling)
2b.  >0 instead of >=0   (e.g., "#try me!" would not be skipped)

So I think the desired lines are actually::

comment_start = line.find(comments)
if comment_start >= 0:
    line = line[:comment_start].strip()
else:
    line = line.strip()
return line
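Both problems come from str.find returning -1 when the comment character is absent: line[:-1] then silently drops the last character, and a test of > 0 misses a comment that starts in column 0. A minimal sketch of the corrected logic wrapped in a function (the name strip_comment is hypothetical, not numpy's), checked against the kinds of lines discussed in this thread:

```python
def strip_comment(line, comments="#"):
    """Drop everything from the first comment marker on, then strip whitespace."""
    comment_start = line.find(comments)
    if comment_start >= 0:                 # >= 0 so a comment in column 0 is caught
        line = line[:comment_start]
    return line.strip()

samples = [" 1 2 3 4 5\n",   # ordinary line with newline
           " 1 2 3 4 5",     # no newline, no comment
           " 1 2 3 4 5#",    # trailing comment character, no newline
           "#try me!",       # comment-only line starting in column 0
           "  # 1 2 3 4 5"]  # indented comment-only line
for s in samples:
    print(repr(s), "->", repr(strip_comment(s)))
```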

Cheers,
Alan Isaac





[Numpy-discussion] ANN: Enthought Python Distribution - Beta

2008-02-27 Thread Travis Vaught
Greetings,

Enthought is very excited about our pending wide-release of the  
Enthought Python Distribution (EPD).  After much effort, we finally  
think we're close to the first non-beta release.  As one more quality  
check, we'd love to impose on you guys one more time to try out a just-minted
beta release for Windows (EPD 2.5.2001_beta1) and give us some
feedback.  Any major problems will, of course, be fixed for the next  
release, but we're open to any suggestions for improvement for future  
releases as well.

http://www.enthought.com/epd

For those of you unfamiliar with EPD, it's a "kitchen-sink-included"  
distribution of Python with over 60 additional tools and libraries.   
It's bundled into a nice MSI installer on Windows and includes NumPy,  
SciPy, IPython, 2D and 3D visualization, database adapters and a lot  
of other tools right out of the box.  We'll have support for RedHat  
and Mac OS X in a general release very soon.

For academic, non-profit or hobbyist use, EPD is, and will remain,  
free.  We are charging an annual subscription for commercial and  
governmental access to downloads and updates of EPD.  Downloaded files  
may be used indefinitely past the subscription term.  You are welcome  
to try out the beta indefinitely, regardless of your commercial/non-commercial
persuasion.  When the final (non-beta) version is released,
commercial folks can try it for 30 days. You can check out the license  
terms (http://www.enthought.com/products/epdlicense.php) if you're  
interested in the details.

EPD is compelling because it solves a lingering packaging and  
distribution problem, but also because of the libraries which it  
includes. We owe many folks on this list a debt of gratitude for their  
work on some really great tools. So, thanks ... and enjoy!

Best Regards,

Travis N. Vaught






Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Robert Kern
On Wed, Feb 27, 2008 at 6:31 PM, Christopher Barker
<[EMAIL PROTECTED]> wrote:
> Robert Kern wrote:
>  > Fixed in r4827.
>
>  Thanks Robert. For the record, this is the fixed version:
>
> comment_start = line.find(comments)
> if comment_start > 0:
>     line = line[:comments_start].strip()
> else:
>     line = line.strip()
>
>  Just as a matter of interest, why this, rather than line.index()? Are
>  exceptions slower than an if test?

Yes.
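A rough way to compare the two styles with the standard timeit module; both helpers below are hypothetical, for illustration only. Raising and catching ValueError is the expensive part, so the difference mostly shows up on lines without a comment, and the exact numbers depend on the machine and Python version:

```python
import timeit

def with_find(line, comments="#"):
    i = line.find(comments)                  # -1 when absent, no exception
    return (line[:i] if i >= 0 else line).strip()

def with_index(line, comments="#"):
    try:
        return line[:line.index(comments)].strip()
    except ValueError:                       # raised when comments is absent
        return line.strip()

no_comment = " 1 2 3 4 5"                   # worst case for index(): always raises
for func in (with_find, with_index):
    seconds = timeit.timeit(lambda: func(no_comment), number=100000)
    print(func.__name__, round(seconds, 4))
```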

>  Also,
>
>  I don't see any io tests in:
>
>  numpy/lib/tests
>
>  Is that where they should be? It seems like a good idea to have a few...

Yes.

>  If I did find the time to write some tests -- how does one go about it
>  for this sort of thing? Do I put a couple sample input files in SVN? Or
>  does the test code write out the sample files, then read them in to
>  test? Or maybe do it all in memory with cStringIO or something.

Any of the above depending on the situation. Use cStringIO if you can.
Put files into numpy/lib/tests/data/ otherwise. Locate them using
os.path.join(os.path.dirname(__file__), 'data', 'mytestfile.dat').
Write things out at runtime *only* if you use tempfile correctly and
are sure you clean up properly after yourself whether the test passes
or fails.
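The in-memory and temporary-file approaches can be sketched as follows. This assumes today's numpy.loadtxt and Python 3, so io.StringIO stands in for cStringIO, and tempfile.TemporaryDirectory removes the file whether or not the assertions pass; the check_* function names are hypothetical:

```python
import io
import os
import tempfile

import numpy as np

EXPECTED = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

def check_in_memory():
    # Last line ends in a comment character and has no trailing newline.
    text = "# header comment\n1 2 3\n4 5 6#"
    data = np.loadtxt(io.StringIO(text), comments="#")
    assert np.array_equal(data, EXPECTED)

def check_temp_file():
    # The directory and its contents are deleted even if an assertion fails.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "sample.dat")
        with open(path, "w") as fh:
            fh.write("1 2 3\n4 5 6")          # no newline at end of file
        data = np.loadtxt(path)
        assert np.array_equal(data, EXPECTED)

check_in_memory()
check_temp_file()
print("ok")
```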

>  Are
>  there any examples of tests of file reading code that I could borrow from?

numpy/lib/tests/test_format.py

Unfortunately, they have been written for nose, which numpy itself
has not moved to yet.

-- 
Robert Kern



Re: [Numpy-discussion] Handling of numpy.power(0, )

2008-02-27 Thread Robert Kern
On Wed, Feb 27, 2008 at 5:10 PM, Stuart Brorson <[EMAIL PROTECTED]> wrote:
> I have been poking at the limits of NumPy's handling of powers of
>  zero.   I find some results which are disturbing, at least to me.
>  Here they are:
>
>  In [67]: A = numpy.array([0, 0, 0])
>
>  In [68]: B = numpy.array([-1, 0, 1+1j])
>
>  In [69]: numpy.power(A, B)
>  Out[69]: array([ 0.+0.j,  1.+0.j,  0.+0.j])
>
>  IMO, the answers should be [Inf, NaN, and NaN].  The reasons:
>
>  **  0^-1 is 1/0, which is infinity.  Not much argument here, I would
>  think.

I believe the failure is occurring because of the coercion to complex.
With plain floats:

In [14]: zeros(2) ** array([-1.0, 0.0])
Out[14]: array([ Inf,   1.])

>  **  0^0:  This is problematic.  People smarter than I have argued for
>  both NaN and for 1, although I understand that 1 is the preferred
>  value nowadays.  If the NumPy gurus also think so, then I buy it.

Python gives 1.0:

In [12]: 0.0 ** 0.0
Out[12]: 1.0

I'm not sure about the reasons for this, but I'm willing to assume
that they're acceptable.
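The difference between Python floats and NumPy arrays can be checked directly. One likely reason for 0.0 ** 0.0 == 1.0 is that C99's pow (Annex F) defines pow(x, +/-0) as 1 for every x. Note also that plain Python raises ZeroDivisionError for 0.0 ** -1.0 instead of returning inf, while NumPy returns inf and normally emits a divide-by-zero warning (silenced here):

```python
import math

import numpy as np

print(0.0 ** 0.0)                      # 1.0
print(math.pow(0.0, 0.0))              # 1.0

zero = 0.0
try:
    zero ** -1.0
except ZeroDivisionError as exc:       # plain Python raises instead of returning inf
    print("ZeroDivisionError:", exc)

with np.errstate(divide="ignore"):     # silence NumPy's divide-by-zero warning
    result = np.zeros(2) ** np.array([-1.0, 0.0])
print(result)                          # inf for 0**-1, 1.0 for 0**0
```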

>  **  0^(x+y*i):  This one is tricky; please bear with me and I'll walk
>  through the reason it should be NaN.
>
>  In general, one can write a^(x+y*i) = (r exp(i*theta))^(x+y*i) where
>  r, theta, x, and y are all reals.  Then, this expression can be
>  rearranged as:
>
>  (r^x) * (r^(i*y)) * exp(i*theta*(x+y*i))
>
>  = (r^x) * (r^(i*y)) * exp(i*theta*x) * exp(-theta*y)
>
>  Now consider what happens to each term if r = 0.

You could probably stop the analysis here. If a=0, then theta is
already undefined. I believe that NaN+NaN*j is the correct answer.

The relevant function is nc_pow() in numpy/core/src/umathmodule.c. The
problem is that a=(0+0j) is special-cased incorrectly:

if (ar == 0. && ai == 0.) {
    r->real = 0.;
    r->imag = 0.;
    return;
}

The preceding if clause (br == 0. && bi == 0.) takes care of the
(0+0j)**(0+0j) case. It's worth noting that the general case at the
bottom returns the expected (NaN+NaN*j). However, we can't just remove
this if-clause; it makes (0+0j)**(-1+0j) return (NaN+NaN*j). It also
makes (0+0j)**(1.5+0j) give (NaN+NaN*j), too.
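One possible shape for a narrower special case, sketched in Python rather than in the C of nc_pow. The function name and the choice of complex(inf, 0) for negative real exponents are assumptions, picked to mirror the Matlab output quoted in this thread: return 1 for a zero exponent, 0 for a positive real exponent, and NaN+NaN*j only when the exponent has a nonzero imaginary part.

```python
import math

def zero_base_pow(b):
    """Hypothetical result of (0+0j) ** b for a complex exponent b."""
    if b == 0:
        return complex(1.0, 0.0)                 # 0**0 is handled first, as in nc_pow
    if b.imag == 0.0:
        if b.real > 0.0:
            return complex(0.0, 0.0)             # e.g. 0**1.5 == 0
        return complex(math.inf, 0.0)            # e.g. 0**-1: assumption, mirrors Matlab's Inf
    return complex(math.nan, math.nan)           # theta undefined: NaN+NaN*j

for b in (0j, 1.5 + 0j, -1 + 0j, 1 + 1j):
    print(b, zero_base_pow(b))
```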

>  -- r^x is either 0^ = 1, or 0^ = Inf.
>
>  -- r^(i*y) = exp(i*y*ln(r)).  If y != 0 (i.e. complex power), then taking
>  the ln of r = 0 is -Inf.  But what's exp(i*-Inf)?  It's probably NaN,
>  since nothing else makes sense.
>
>  Note that if y == 0 (real power), then this term is still NaN (y*ln(r)
>  = 0*ln(0) = NaN).  However, by convention, 0^ is something other
>  than NaN.
>
>  -- exp(i*theta*x) is just a complex number.
>
>  -- exp(-theta*y) is just a real number.
>
>  Therefore, for 0^ we have Inf * NaN *  * ,
>  which is NaN.
>
>  Another observation to chew on.  I know NumPy != Matlab, but FWIW,
>  here's what Matlab says about these values:
>
>  >> A = [0, 0, 0]
>
>  A =
>
>   0 0 0
>
>  >> B = [-1, 0, 1+1*i]
>
>  B =
>
>-1.   0  1. + 1.i
>
>  >> A .^ B
>
>  ans =
>
>Inf  1. NaN +NaNi
>
>
>
>  Any reactions to this?  Does NumPy just make library calls when
>  computing power, or does it do any trapping of corner cases?  And
>  should the returns from power conform to the above suggestions?

In this case, I think Matlab looks about right.

-- 
Robert Kern



Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Christopher Barker
Robert Kern wrote:
> Fixed in r4827.

Thanks Robert. For the record, this is the fixed version:

comment_start = line.find(comments)
if comment_start > 0:
    line = line[:comments_start].strip()
else:
    line = line.strip()

Just as a matter of interest, why this, rather than line.index()? Are 
exceptions slower than an if test?

Also,

I don't see any io tests in:

numpy/lib/tests

Is that where they should be? It seems like a good idea to have a few...

If I did find the time to write some tests -- how does one go about it 
for this sort of thing? Do I put a couple sample input files in SVN? Or 
does the test code write out the sample files, then read them in to 
test? Or maybe do it all in memory with cStringIO or something. Are
there any examples of tests of file reading code that I could borrow from?

thanks,
-Chris





-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[EMAIL PROTECTED]


[Numpy-discussion] Handling of numpy.power(0, )

2008-02-27 Thread Stuart Brorson
I have been poking at the limits of NumPy's handling of powers of
zero.   I find some results which are disturbing, at least to me.
Here they are:

In [67]: A = numpy.array([0, 0, 0])

In [68]: B = numpy.array([-1, 0, 1+1j])

In [69]: numpy.power(A, B)
Out[69]: array([ 0.+0.j,  1.+0.j,  0.+0.j])

IMO, the answers should be [Inf, NaN, and NaN].  The reasons:

**  0^-1 is 1/0, which is infinity.  Not much argument here, I would
think.

**  0^0:  This is problematic.  People smarter than I have argued for
both NaN and for 1, although I understand that 1 is the preferred
value nowadays.  If the NumPy gurus also think so, then I buy it.

**  0^(x+y*i):  This one is tricky; please bear with me and I'll walk
through the reason it should be NaN.

In general, one can write a^(x+y*i) = (r exp(i*theta))^(x+y*i) where
r, theta, x, and y are all reals.  Then, this expression can be
rearranged as:

(r^x) * (r^(i*y)) * exp(i*theta*(x+y*i))

= (r^x) * (r^(i*y)) * exp(i*theta*x) * exp(-theta*y)

Now consider what happens to each term if r = 0.

-- r^x is either 0^ = 1, or 0^ = Inf.

-- r^(i*y) = exp(i*y*ln(r)).  If y != 0 (i.e. complex power), then taking
the ln of r = 0 is -Inf.  But what's exp(i*-Inf)?  It's probably NaN,
since nothing else makes sense.

Note that if y == 0 (real power), then this term is still NaN (y*ln(r)
= 0*ln(0) = NaN).  However, by convention, 0^ is something other
than NaN.

-- exp(i*theta*x) is just a complex number.

-- exp(-theta*y) is just a real number.

Therefore, for 0^ we have Inf * NaN *  * , 
which is NaN.

Another observation to chew on.  I know NumPy != Matlab, but FWIW,
here's what Matlab says about these values:

>> A = [0, 0, 0]

A =

  0 0 0

>> B = [-1, 0, 1+1*i]

B =

   -1.   0  1. + 1.i

>> A .^ B

ans =

   Inf  1. NaN +NaNi



Any reactions to this?  Does NumPy just make library calls when
computing power, or does it do any trapping of corner cases?  And
should the returns from power conform to the above suggestions?

Regards,

Stuart Brorson
Interactive Supercomputing, inc.
135 Beaver Street | Waltham | MA | 02452 | USA
http://www.interactivesupercomputing.com/



Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Robert Kern
On Wed, Feb 27, 2008 at 4:04 PM, Travis E. Oliphant
<[EMAIL PROTECTED]> wrote:
>  Did this discussion resolve with a fix that can go in before 1.0.5 is
>  released?

Fixed in r4827.

-- 
Robert Kern



Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Travis E. Oliphant
Lisandro Dalcin wrote:
> Well, with all that said, I'm also fine with either approach. Anyway,
> I would say that my personal preference is for the one using
> 'str.index', as it is the simplest change relative to the old code.
>
> Like Christopher, I rarely (never?) use 'loadtxt'. But this issue
> drove a coworker crazy (he is a newbie in Python/NumPy).
>
> BTW, I'm pretty sure that some time ago Guido agreed about the removal
> of str.find for Py3k, but it is still there in py3k-repo. Feel free to
> ask at python-dev if any of you consider it appropriate.
>
>   

Did this discussion resolve with a fix that can go in before 1.0.5 is 
released?

-Travis O.



Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Lisandro Dalcin
Well, with all that said, I'm also fine with either approach. Anyway,
I would say that my personal preference is for the one using
'str.index', as it is the simplest change relative to the old code.

Like Christopher, I rarely (never?) use 'loadtxt'. But this issue
drove a coworker crazy (he is a newbie in Python/NumPy).

BTW, I'm pretty sure that some time ago Guido agreed about the removal
of str.find for Py3k, but it is still there in py3k-repo. Feel free to
ask at python-dev if any of you consider it appropriate.

Regards,


On 2/27/08, Christopher Barker <[EMAIL PROTECTED]> wrote:
> David Huard wrote:
>  > The advantage of using regular expressions is that in this case it gives
>  > you some flexibility that wasn't there before. For instance, if for any
>  > reason there are two types of characters that coexist in the file to mark
>  > comments, using
>
>  > pattern = re.compile(comments)
>
> > can take care of that automatically if comments is a regular expression.
>
>
> OK -- but loadtxt() doesn't support that now anyway. I'm not writing the
>  code, nor using it at the moment, so it's fine with me either way, but
>  the re should certainly support the examples I gave that don't work now.
>  (plus probably others, that's not a comprehensive list of possibilities.)
>
>  -CHB
>
>
>  > 2008/2/27, Christopher Barker <[EMAIL PROTECTED]
>
>
> > This pattern fails if the last character of the line is a comment
>  > character, and if it is a comment only line
>
>
>


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594


Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Christopher Barker
David Huard wrote:
> The advantage of using regular expressions is that in this case it gives 
> you some flexibility that wasn't there before. For instance, if for any 
> reason there are two types of characters that coexist in the file to mark
> comments, using

> pattern = re.compile(comments)
> can take care of that automatically if comments is a regular expression.

OK -- but loadtxt() doesn't support that now anyway. I'm not writing the 
code, nor using it at the moment, so it's fine with me either way, but
the re should certainly support the examples I gave that don't work now. 
(plus probably others, that's not a comprehensive list of possibilities.)

-CHB

> 2008/2/27, Christopher Barker <[EMAIL PROTECTED] 

> This pattern fails if the last character of the line is a comment
> character, and if it is a comment only line



Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Christopher Barker

Alan Isaac wrote:
> Use index instead?

Yup, that'll work. Enclosed is another test file, with that approach and
one using string.split(comments) instead.


-Chris




#!/usr/bin/env python

"""
test of loadtxt issue
"""

comments = "#"

SampleLines = [" 1 2 3 4 5\n",
               " 1 2 3 4 5",
               " 1 2 3 4 5#",
               "  # 1 2 3 4 5",
               ]


#SampleLines = ["a line with a comment # this is the comment",
#               "# a comment-only line",
#               " a line with no comment, and no newline",
#               " a line with a trailing comment character, and no newline#",
#               ]

print "\nold way -- this fails with no comment or newline"
for line in SampleLines:
    print "input line: ", repr(line)
    line = line[:line.find(comments)].strip()
    print "output line:", repr(line)

print "\nwith regular expression:"
import re
pattern = re.compile(r"""
^\s* # leading white space
(.*) # Data
%s?  # Zero or one comment character
(.*) # Comments
\s*$ # Trailing white space
"""%comments, re.VERBOSE)

for line in SampleLines:
    print "input line: ", repr(line)
    match = pattern.search(line)
    line, comment = match.groups()
    print "output line:", repr(line)

print "\nsimply pad the line with a space:"
for line in SampleLines:
    print "input line: ", repr(line)
    line += " "
    line = line[:line.find(comments)].strip()
    print "output line:", repr(line)

print "\ntest for comment not found:"
for line in SampleLines:
    print "input line: ", repr(line)
    i = line.find(comments)
    if i == -1:
        line = line.strip()
    else:
        line = line[:i].strip()
    print "output line:", repr(line)

print "\nuse string.split()"
for line in SampleLines:
    print "input line: ", repr(line)
    # split first, then strip, so text before an inline comment loses its trailing space
    line = line.split(comments)[0].strip()
    print "output line:", repr(line)

print "\nuse string.index"
for line in SampleLines:
    print "input line: ", repr(line)
    try:
        line = line[:line.index(comments)].strip()
    except ValueError:
        # no comment character on this line
        line = line.strip()
    print "output line:", repr(line)

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Alan Isaac
On Wed, 27 Feb 2008, Christopher Barker wrote:
> The issue here is a result of what I consider a wart in python's string 
> methods -- string.find() returns a valid index( -1 ) when 
> it fails to find anything. 

Use index instead?

Cheers,
Alan Isaac






Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread David Huard
Hi Christopher,

The advantage of using regular expressions is that in this case it gives you
some flexibility that wasn't there before. For instance, if for any reason
there are two types of characters that coexist in the file to mark comments,
using

pattern = re.compile(comments)
for i,line in enumerate(fh):
 if i:
>
> David Huard wrote:
> > Would everyone be satisfied with a solution using regular expressions ?
>
>
> Maybe it's because regular expressions make me itch, but I think it's
> overkill for this.
>
> The issue here is a result of what I consider a wart in python's string
> methods -- string.find() returns a valid index( -1 ) when it fails to
> find anything. The usual way to work with this is to test for it:
>
> print "test for comment not found:"
> for line in SampleLines:
>     i = line.find(comments)
>     if i == -1:
>         line = line.strip()
>     else:
>         line = line[:i].strip()
>     print line
>
> which does seem like a lot of extra code.
>
> In this case, that wasn't done, as most of the time there is a newline
> at the end that can be thrown away anyway, so the -1 index is OK. So
> that inspired the following solution -- just add an extra space every
> time:
>
> print "simply pad the line with a space:"
> for line in SampleLines:
>     line += " "
>     line = line[:line.find(comments)].strip()
>     print line
>
> an extra string creation, but simple.
>
>
> > pattern = re.compile(r"""
> > ^\s* # leading white space
> > (.*) # Data
> > %s?  # Zero or one comment character
> > (.*) # Comments
> > \s*$ # Trailing white space
> > """%comments, re.VERBOSE)
>
>
> This pattern fails if the last character of the line is a comment
> character, and if it is a comment only line, though I'm sure that could
> be fixed. I still prefer the python string methods approaches, though.
>
> I've enclosed a little test code, that gives these results:
>
> old way -- this fails with no comment or newline
> 1 2 3 4 5
> 1 2 3 4
> 1 2 3 4 5
>
> with regular expression:
> 1 2 3 4 5
> 1 2 3 4 5
> 1 2 3 4 5#
> # 1 2 3 4 5
> simply pad the line with a space:
> 1 2 3 4 5
> 1 2 3 4 5
> 1 2 3 4 5
>
> test for comment not found:
> 1 2 3 4 5
> 1 2 3 4 5
> 1 2 3 4 5
>
> My suggestions work on all my test cases. We really should put these,
> and others, into a real unit test when this fix is added.
>
> -Chris
>


Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread Christopher Barker

David Huard wrote:
> Would everyone be satisfied with a solution using regular expressions?


Maybe it's because regular expressions make me itch, but I think it's 
overkill for this.


The issue here is a result of what I consider a wart in python's string 
methods -- string.find() returns a valid index( -1 ) when it fails to 
find anything. The usual way to work with this is to test for it:


print "test for comment not found:"
for line in SampleLines:
    i = line.find(comments)
    if i == -1:
        line = line.strip()
    else:
        line = line[:i].strip()
    print line

which does seem like a lot of extra code.

In this case, that wasn't done, as most of the time there is a newline
at the end that can be thrown away anyway, so the -1 index is OK. So
that inspired the following solution -- just add an extra space every time:


print "simply pad the line with a space:"
for line in SampleLines:
    line += " "
    line = line[:line.find(comments)].strip()
    print line

an extra string creation, but simple.
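Another string-method option, not proposed in this thread: str.partition (available since Python 2.5) always returns a 3-tuple, so the text before the first comment marker is element [0] whether or not the marker is present, with no -1 special case and no padding. A minimal sketch on the same sample lines:

```python
comments = "#"

sample_lines = [" 1 2 3 4 5\n", " 1 2 3 4 5", " 1 2 3 4 5#", "  # 1 2 3 4 5"]

for line in sample_lines:
    data = line.partition(comments)[0].strip()   # whole line if no marker is found
    print(repr(line), "->", repr(data))
```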


> pattern = re.compile(r"""
> ^\s* # leading white space
> (.*) # Data
> %s?  # Zero or one comment character
> (.*) # Comments
> \s*$ # Trailing white space
> """%comments, re.VERBOSE)


This pattern fails if the last character of the line is a comment 
character, and if it is a comment only line, though I'm sure that could 
be fixed. I still prefer the python string methods approaches, though.
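Part of why the pattern fails is that it is compiled with re.VERBOSE, and in verbose mode an unescaped # starts a regex comment. With comments = "#", the line `%s?  # Zero or one comment character` contributes nothing, so the pattern reduces to ^\s*(.*)(.*)\s*$ and the first group swallows the whole line. A sketch that avoids this by escaping each marker with re.escape and making the data group non-greedy; it also accepts several comment markers, in the spirit of David's suggestion. The helper name make_comment_stripper is hypothetical:

```python
import re

def make_comment_stripper(markers=("#",)):
    """Build a function that removes comments and surrounding whitespace."""
    alternatives = "|".join(re.escape(m) for m in markers)   # escape markers such as '#'
    pattern = re.compile(r"^\s*(.*?)\s*(?:(?:%s)(.*))?$" % alternatives)

    def strip(line):
        data, _comment = pattern.match(line).groups()
        return data
    return strip

strip = make_comment_stripper(("#", "%"))
for line in [" 1 2 3 4 5\n", " 1 2 3 4 5", " 1 2 3 4 5#", "  # 1 2 3 4 5", "1 2 % note"]:
    print(repr(line), "->", repr(strip(line)))
```

The non-greedy (.*?) stops at the first marker, which matches what the string-method versions do.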


I've enclosed a little test code, that gives these results:

old way -- this fails with no comment or newline
1 2 3 4 5
1 2 3 4
1 2 3 4 5

with regular expression:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5#
# 1 2 3 4 5
simply pad the line with a space:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

test for comment not found:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

My suggestions work on all my test cases. We really should put these, 
and others, into a real unit test when this fix is added.


-Chris

#!/usr/bin/env python

"""
test of loadtxt issue
"""

comments = "#"

SampleLines = [" 1 2 3 4 5\n",
               " 1 2 3 4 5",
               " 1 2 3 4 5#",
               "  # 1 2 3 4 5",
               ]


#SampleLines = ["a line with a comment # this is the comment",
#               "# a comment-only line",
#               " a line with no comment, and no newline",
#               " a line with a trailing comment character, and no newline#",
#               ]

print "old way -- this fails with no comment or newline"
for line in SampleLines:
    line = line[:line.find(comments)].strip()
    print line

print "with regular expression:"
import re
pattern = re.compile(r"""
^\s* # leading white space
(.*) # Data
%s?  # Zero or one comment character
(.*) # Comments
\s*$ # Trailing white space
"""%comments, re.VERBOSE)

for line in SampleLines:
    match = pattern.search(line)
    line, comment = match.groups()
    print line

print "simply pad the line with a space:"
for line in SampleLines:
    line += " "
    line = line[:line.find(comments)].strip()
    print line

print "test for comment not found:"
for line in SampleLines:
    i = line.find(comments)
    if i == -1:
        line = line.strip()
    else:
        line = line[:i].strip()
    print line



Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread David Huard
Lisandro,

When you have some time, could you check whether this patch solves your problem (and
does not introduce new ones)?

David


Index: numpy/lib/io.py
===
--- numpy/lib/io.py (revision 4824)
+++ numpy/lib/io.py (working copy)
@@ -11,6 +11,7 @@
 import cStringIO
 import tempfile
 import os
+import re

 from cPickle import load as _cload, loads
 from _datasource import DataSource
@@ -291,9 +292,12 @@
 converterseq = [_getconv(dtype.fields[name][0]) \
 for name in dtype.names]

+# Remove comments and leading/trailing white space
+pattern = re.compile(comments)
 for i,line in enumerate(fh):
 if i:


Re: [Numpy-discussion] loadtxt broken if file does not end in newline

2008-02-27 Thread David Huard
I can look at it.

Would everyone be satisfied with a solution using regular expressions?
That is, looking for the following pattern:

pattern = re.compile(r"""
^\s* # leading white space
(.*) # Data
%s?  # Zero or one comment character
(.*) # Comments
\s*$ # Trailing white space
"""%comments, re.VERBOSE)

match = pattern.search(line)
line, comment = match.groups()

instead of

line = line[:line.find(comments)].strip()
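One caveat with the pattern as proposed: under re.VERBOSE an unescaped "#" outside a character class starts a regex comment, so when comments is "#" the whole line "%s?  # Zero or one comment character" is discarded, and the greedy (.*) data group then swallows the comment. The sketch below, assuming only Python's standard re module (the names proposed and fixed are illustrative), reproduces the proposed pattern and shows a variant that escapes the comment string with re.escape and makes the data group non-greedy:

```python
import re

comments = "#"

# The pattern as proposed (indentation aside).
proposed = re.compile(r"""
^\s* # leading white space
(.*) # Data
%s?  # Zero or one comment character
(.*) # Comments
\s*$ # Trailing white space
""" % comments, re.VERBOSE)

# Variant: escape the comment string so VERBOSE mode treats it literally,
# make the data group non-greedy, and make the comment part optional.
fixed = re.compile(r"""
    ^\s*           # leading white space
    (.*?)          # Data (non-greedy, stops before the comment)
    \s*            # white space before the comment
    (?:%s(.*))?    # optional comment character followed by the comment
    \s*$           # trailing white space
    """ % re.escape(comments), re.VERBOSE)

line = " 1 2 3 4 5# a comment"
print(proposed.search(line).groups())  # the "#" and comment stay in the data group
print(fixed.search(line).groups())     # data and comment are separated
```

With the variant, a line with no comment character yields None for the comment group, so callers should not assume it is a string.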

By the way, is there a test function for loadtxt and savetxt? I couldn't
find one.
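On the testing question, a regression test for this bug could feed loadtxt an in-memory file whose last line has no newline. A minimal sketch, assuming a current NumPy whose loadtxt and savetxt accept file-like objects such as io.StringIO; the test function names are hypothetical:

```python
import io

import numpy as np


def test_loadtxt_last_line_without_newline():
    # The final row is not terminated by a newline.
    data = np.loadtxt(io.StringIO("1 2 3\n4 5 6"))
    np.testing.assert_array_equal(data, [[1, 2, 3], [4, 5, 6]])


def test_loadtxt_trailing_comment_without_newline():
    # A comment on a data line, and a comment character as the very last char.
    text = "1 2 3 # first row\n4 5 6#"
    data = np.loadtxt(io.StringIO(text), comments="#")
    np.testing.assert_array_equal(data, [[1, 2, 3], [4, 5, 6]])


def test_savetxt_loadtxt_round_trip():
    original = np.arange(6.0).reshape(2, 3)
    buf = io.StringIO()
    np.savetxt(buf, original)
    buf.seek(0)
    np.testing.assert_array_equal(np.loadtxt(buf), original)


if __name__ == "__main__":
    test_loadtxt_last_line_without_newline()
    test_loadtxt_trailing_comment_without_newline()
    test_savetxt_loadtxt_round_trip()
    print("all loadtxt tests passed")
```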


David

2008/2/26, Alan G Isaac <[EMAIL PROTECTED]>:
>
> On Tue, 26 Feb 2008, Lisandro Dalcin apparently wrote:
> > I believe the current 'loadtxt' function is broken
>
>
> I agree:
>  http://projects.scipy.org/pipermail/numpy-discussion/2007-November/030057.html
> >
>
> Cheers,
>
> Alan Isaac


Re: [Numpy-discussion] Trouble With MaskedArray and Shared Masks

2008-02-27 Thread Pierre GM
Alexander,

> create the MaskedArray to:
> >>> a = numpy.ma.MaskedArray(
>
> ... data=numpy.zeros((4,5), dtype=float),
> ... mask=True,
> ... fill_value=0.0
> ... )

By far the easiest indeed.


> >  So: should we introduce this extra parameter ?
>
> The propagation semantics and mechanics are definitely tricky,
> especially considering that it seems that the "right behavior" is
> context dependent. Are the mask propagation rules spelled out anywhere
> (aside from the code! :-))? 

Mmh, no: we tried to avoid mask propagation as much as possible, as it can 
have some fairly disastrous side-effects. In other words: no propagation by 
default when a mask is shared, propagation when the mask is not shared.


> I could see some potential value to an 
> additional argument, but the constructor is already quite complicated
> so I'm reluctant to say "Yes" outright, especially with my current
> level of understanding. 

Yes, there are already a lot of parameters, some more useful than others:
hard_mask : if True, prevents a masked value from being accidentally unmasked.
shrink : if True, forces a mask full of False to nomask.
keep_mask : when creating a new masked_array from an existing one, specifies 
whether the old mask should be taken into account or not. By default, 
keep_mask is True.

For example:
>>> import numpy.ma as ma
>>> x = ma.array([1,2,3,4,5], mask=[1,0,0,1,0])
>>> y = ma.array(x)
>>> y
masked_array(data = [-- 2 3 -- 5],
  mask = [ True False False  True False],
  fill_value=99)

We just inherited the mask from x: y._mask and x._mask are the same object, 
and y._sharedmask is True. Now, let's set keep_mask to False:

>>> y = ma.array(x, keep_mask=False)
>>> y
masked_array(data = [1 2 3 4 5],
  mask = False,
  fill_value=99)
We keep the data from x, but we force the mask to the default (viz., nomask).
Now for some more fun: remember that we keep the mask by default.

>>> y = ma.array(x, mask=[0,0,0,0,1])
>>> y
masked_array(data = [-- 2 3 -- --],
  mask = [ True False False  True  True],
  fill_value=99)

We kept the mask of x ([1,0,0,1,0]) and combined it with our new mask 
([0,0,0,0,1]), so y._mask=[1,0,0,1,1].
If you really want [0,0,0,0,1] as a mask, just drop the initial mask:
>>> y = ma.array(x, mask=[0,0,0,0,1], keep_mask=False)
>>> y
masked_array(data = [1 2 3 4 --],
  mask = [False False False False  True],
  fill_value=99)
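The hard_mask parameter from the list above can be illustrated the same way. A short sketch, assuming a current numpy.ma (its repr differs from the 2008 output shown here, so the sketch prints the mask and data directly); soften_mask() is the MaskedArray method for switching a hard mask back to soft:

```python
import numpy.ma as ma

# Soft mask (the default): assigning to a masked entry unmasks it.
soft = ma.array([1, 2, 3], mask=[0, 1, 0])
soft[1] = 10
print(ma.getmaskarray(soft), soft.data)

# Hard mask: assigning to a masked entry leaves both data and mask alone.
hard = ma.array([1, 2, 3], mask=[0, 1, 0], hard_mask=True)
hard[1] = 10
print(ma.getmaskarray(hard), hard.data)

# The hardness can also be changed after creation.
toggled = ma.array([1, 2, 3], mask=[0, 1, 0], hard_mask=True)
toggled.soften_mask()
toggled[1] = 10
print(ma.getmaskarray(toggled), toggled.data)
```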

> At the very least, perhaps the doc-string 
> should be amended to include the note that if a mask is provided, it
> is assumed to be shared and a copy of it will be made when/if it is
> modified.
Sounds like a good idea. Is there a wiki page for MaskedArrays somewhere? If 
not, Alexander, feel free to start one from your experience; I'll update it if 
needed.


Re: [Numpy-discussion] A little help please?

2008-02-27 Thread Travis E. Oliphant
Neal Becker wrote:
> Travis E. Oliphant wrote:
>
>
>   
>
> The code for this is a bit hard to understand.  It does appear that it only
> searches for a conversion on the 2nd argument.  I don't think that's
> desirable behavior.
>
> What I'm wondering is, this works fine for builtin types.  What is different
> in the handling of builtin types?
>   

There are quite a few differences which lead to the current issues. 

1) For built-in types there is a coercion order that can be searched 
intelligently; no such order exists for user-defined
types.
2) For built-in types all the 1d loops are stored in a single C-array in 
the same order as the signatures.  The entire signature list is scanned 
until a signature to which all inputs can be cast is found.  
3) For user-defined types the 1d loops (functions) for a particular 
user-defined type are stored in a linked-list that itself is stored in a 
Python dictionary (as a C-object) attached to the ufunc and keyed by the 
user-defined type (of the first argument).

Thus, what is missing is code to search all the linked lists in all the 
entries of all the user-defined types on input (only the linked list 
keyed by the first user-defined type is searched at the moment). This 
would allow behavior similar to that of the built-in types (at the cost of 
somewhat more expensive searching).
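To make the difference between the two lookup strategies concrete, here is a toy Python model. It is a hypothetical illustration only: the dictionary layout, the casting table and the loop names are made up, and NumPy's real C code (select_types in ufuncobject.c) is more involved. The type names are the ones from Neal's project.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Signature = Tuple[str, str]

# Toy casting rule (assumption): a type can always be used as itself, and
# cmplx_int32 can be safely cast up to cmplx_int64.
SAFE_CASTS = {("cmplx_int32", "cmplx_int64")}


def can_cast(src: str, dst: str) -> bool:
    return src == dst or (src, dst) in SAFE_CASTS


# Loops for user-defined types, stored per type in a dict of lists
# (standing in for the dict of C linked lists attached to the ufunc).
USER_LOOPS: Dict[str, List[Tuple[Signature, str]]] = {
    "cmplx_int32": [(("cmplx_int32", "cmplx_int32"), "add_c32_c32")],
    "cmplx_int64": [(("cmplx_int64", "cmplx_int64"), "add_c64_c64")],
}


def scan(loops: List[Tuple[Signature, str]], types: Sequence[str]) -> Optional[str]:
    """Return the first loop whose signature every input can be cast to."""
    for sig, loop in loops:
        if all(can_cast(t, s) for t, s in zip(types, sig)):
            return loop
    return None


def find_loop_first_type_only(types: Sequence[str]) -> Optional[str]:
    """Only the list keyed by the first user-defined input type is searched."""
    for t in types:
        if t in USER_LOOPS:
            return scan(USER_LOOPS[t], types)
    return None


def find_loop_all_types(types: Sequence[str]) -> Optional[str]:
    """Search the lists of every user-defined input type, in argument order."""
    for t in dict.fromkeys(types):  # unique types, order preserved
        if t in USER_LOOPS:
            loop = scan(USER_LOOPS[t], types)
            if loop is not None:
                return loop
    return None


mixed = ("cmplx_int32", "cmplx_int64")
print(find_loop_first_type_only(mixed))        # no loop found
print(find_loop_all_types(mixed))              # found via cmplx_int64's list
print(find_loop_first_type_only(mixed[::-1]))  # found when the wider type is first
```

In this model, mixed arithmetic works in one argument order but not the other, which is the kind of partial behavior a first-type-only search would produce.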

-Travis O.



Re: [Numpy-discussion] Trouble With MaskedArray and Shared Masks

2008-02-27 Thread Alexander Michael
On Tue, Feb 26, 2008 at 2:32 PM, Pierre GM <[EMAIL PROTECTED]> wrote:
> Alexander,
>  The rationale behind the current behavior is to avoid an accidental
>  propagation of the mask. Consider the following example:
>
>  >>>m = numpy.array([1,0,0,1,0], dtype=bool_)
>  >>>x = numpy.array([1,2,3,4,5])
>  >>>y = numpy.sqrt([5,4,3,2,1])
>  >>>mx = masked_array(x,mask=m)
>  >>>my = masked_array(y,mask=m)
>  >>>mx[0] = 0
>  >>>print mx,my, m
>  [0 2 3 -- 5] [-- 4 3 -- 1] [ True False False  True False]
>
>  At the creation, mx._sharedmask and my._sharedmask are both True. Setting
>  mx[0]=0 forces mx._mask to be copied, so that we don't affect the mask of my.
>
>  Now,
>  >>>m = numpy.array([1,0,0,1,0], dtype=bool_)
>  >>>x = numpy.array([1,2,3,4,5])
>  >>>y = numpy.sqrt([5,4,3,2,1])
>  >>>mx = masked_array(x,mask=m)
>  >>>my = masked_array(y,mask=m)
>  >>>mx._sharedmask = False
>  >>>mx[0] = 0
>  >>>print mx,my, m
>  [0 2 3 -- 5] [5 4 3 -- 1] [False False False  True False]
>
>  By mx._sharedmask=False, we deceived numpy.ma into thinking that it's OK to
>  update the mask of mx (that is, m), and my gets updated. Sometimes it's what
>  you want (your case for example), often it is not: I've been bitten more than
>  once before reintroducing the _sharedmask flag.
>
>  As you've observed, setting a private flag isn't a very good idea: you should
>  use the .unshare_mask() method instead, which copies the mask and sets
>  _sharedmask to False. OK, in your example, copying the mask is not needed,
>  but in more general cases, it is.
>
>  At the initialization, self._sharedmask is set to (not copy). That is, if you
>  didn't specify copy=True at the creation (the default being copy=False),
>  self._sharedmask is True. Now, I recognize it's not obvious, and perhaps we
>  could introduce yet another parameter to masked_array/array/MaskedArray,
>  share_mask, that would take a default value of True and set
>  self._sharedmask=(not copy)&share_mask
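The .unshare_mask() route recommended above can be sketched as follows, assuming a current numpy.ma. Mask-sharing details have been revised since 2008, so the sketch relies only on unshare_mask() giving the array its own copy of the mask:

```python
import numpy as np
import numpy.ma as ma

m = np.array([1, 0, 0, 1, 0], dtype=bool)
x = np.array([1, 2, 3, 4, 5])

mx = ma.masked_array(x, mask=m)
mx.unshare_mask()  # give mx a private copy of its mask
mx[0] = 0          # unmasks element 0 of mx only

print(mx)
print(m)  # the original mask array is left untouched
```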

Thank you for your thorough explanation. I was providing the mask
array to the constructor in order to do my own allocating, mostly to
ensure that the MaskedArray had a dense mask that *wouldn't* be
replaced with a copy without my intentional instruction. I didn't
realize that the MaskedArray was not taking ownership of the provided mask
(even though copy was False) because the implied usage for providing
the mask explicitly is to read-only alias another MaskedArray's mask.
I was working against my own goal! Now that I understand a little
better, the easiest/best thing for me to do is change the way I
create the MaskedArray to:

>>> a = numpy.ma.MaskedArray(
... data=numpy.zeros((4,5), dtype=float),
... mask=True,
... fill_value=0.0
... )

This appears to cause MaskedArray to create a dense mask which
persists (i.e. isn't replaced by a copy) for the lifetime of the
MaskedArray.
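A quick check of the behavior described above, assuming a current NumPy: with mask=True the constructor builds a full boolean mask with the data's shape, and assigning a value unmasks just that one element.

```python
import numpy as np
import numpy.ma as ma

a = ma.MaskedArray(
    data=np.zeros((4, 5), dtype=float),
    mask=True,
    fill_value=0.0,
)

print(a.mask.shape, a.mask.all())  # dense (4, 5) mask, everything masked

a[1, 2] = 3.0  # assigning a value unmasks that one element
print(a.mask.sum(), a[1, 2])
```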

>  So: should we introduce this extra parameter ?

The propagation semantics and mechanics are definitely tricky,
especially considering that it seems that the "right behavior" is
context dependent. Are the mask propagation rules spelled out anywhere
(aside from the code! :-))? I could see some potential value to an
additional argument, but the constructor is already quite complicated
so I'm reluctant to say "Yes" outright, especially with my current
level of understanding. At the very least, perhaps the doc-string
should be amended to include the note that if a mask is provided, it
is assumed to be shared and a copy of it will be made when/if it is
modified. How does the keep_mask option play into this? I don't
understand what that one does yet.

Thanks!
Alex


Re: [Numpy-discussion] Optimize speed of for loop using numpy

2008-02-27 Thread Trond Kristiansen
Hey all.

I would just like to thank you all for extremely good feedback on my problem
with optimizing loops. Thank you all for being so helpful.
Cheers, Trond




Re: [Numpy-discussion] A little help please?

2008-02-27 Thread Neal Becker
Travis E. Oliphant wrote:

> Neal Becker wrote:
>> My user-defined type project has mostly gone well, but I'm stuck on
>> mixed-type arithmetic.
>>
>> I have 2 types: cmplx_int32 and cmplx_int64.  I have added basic
>> arithmetic for those types, and for mix of those arrays and their
>> respective scalars. But mixed arithmetic only partly works
> This is an area that needs testing and possible fixes.   The relevant
> code is in ufuncobject.c (select_types) and in multiarraymodule.c
> (PyArray_CanCoerceScalar). If you can go through that code you may be
> able to see what the problem is and let us know.
> 
> I tried to support this kind of thing you are doing, but I'm not sure
> how well I succeeded because I didn't have time or the code to test it
> with.  Thus, there is still some work to do.
> 
> The fact that radd is not called is because ufuncs try to handle
> everything (the ufunc is more general than just the functions with "r"
> prefixes). I think one problem may be due to the fact that the first
> argument to a ufunc is the one that defines the search for the correctly
> registered function and there may be no code to allow other arguments to
> direct the search should that one fail.
> 
> I'm actually pleased you've gotten this far.   I'll keep trying to help
> as I get time.
> 

The code for this is a bit hard to understand.  It does appear that it only
searches for a conversion on the 2nd argument.  I don't think that's
desirable behavior.

What I'm wondering is, this works fine for builtin types.  What is different
in the handling of builtin types?

