[Numpy-discussion] life expectancy of scipy.stats nan statistics

2011-08-19 Thread josef . pktd
I'm just looking at http://projects.scipy.org/scipy/ticket/1200

I agree with Ralf that the bias keyword should be changed to ddof as
in the numpy functions. For functions in scipy.stats, and statistics
in general, I prefer the usual axis=0 default.

However, I think these functions, like scipy.stats.nanstd, should be
replaced by corresponding numpy functions, which might happen
relatively soon. But how soon?

Is it worth deprecating bias in scipy 0.10, and then deprecating it again
for removal in 0.11 or 0.12?
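For reference, the ddof convention works as in numpy.std: the divisor is n - ddof. A minimal pure-Python sketch of a NaN-skipping standard deviation with a ddof keyword (illustrative only, not the scipy implementation; the function name is made up here):

```python
import math

def nanstd(values, ddof=0):
    # Drop NaNs, then use a numpy-style (n - ddof) divisor instead of
    # scipy.stats' old bias=True/False flag.
    xs = [v for v in values if not math.isnan(v)]
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - ddof))

data = [1.0, 2.0, float("nan"), 4.0]
print(nanstd(data, ddof=0))  # biased (population) estimate, old bias=True
print(nanstd(data, ddof=1))  # unbiased (sample) estimate, old bias=False
```

With ddof as the keyword, bias=False maps to ddof=1 and bias=True to ddof=0, matching the numpy functions.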

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Bruce Southey
On Fri, Aug 19, 2011 at 3:05 PM, Mark Wiebe  wrote:
> On Fri, Aug 19, 2011 at 11:44 AM, Charles R Harris
>  wrote:
>>
>>
>> On Fri, Aug 19, 2011 at 12:37 PM, Bruce Southey 
>> wrote:
>>>
>>> Hi,
>>> Just some immediate minor observations that are really about trying to
>>> be consistent:
>>>
>>> 1) Could you keep the display of the NA dtype the same as the array's?
>>> For example, NA dtype is displayed as '<f8' but I expect 'float64' as that is the array dtype.
>>>  >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]])
>>> >>> a
>>> array([[  1.,   2.,   3., NA],
>>>       [  3.,   4.,  nan,   5.]])
>>> >>> a.dtype
>>> dtype('float64')
>>> >>> a.sum()
>>> NA(dtype='<f8')
>>>
>>> 2) Can the 'skipna' flag be added to the methods?
>>> >>> a.sum(skipna=True)
>>> Traceback (most recent call last):
>>>  File "", line 1, in 
>>> TypeError: 'skipna' is an invalid keyword argument for this function
>>> >>> np.sum(a,skipna=True)
>>> nan
>>>
>>> 3) Can the skipna flag be extended to exclude other non-finite cases like
>>> NaN?
>>>
>>> 4) Assigning a np.NA needs a better error message but the Integer
>>> array case is more informative:
>>> >>> b=np.array([1,2,3,4], dtype=np.float128)
>>> >>> b[0]=np.NA
>>> Traceback (most recent call last):
>>>  File "", line 1, in 
>>> TypeError: float() argument must be a string or a number
>>>
>>> >>> j=np.array([1,2,3])
>>> >>> j
>>> array([1, 2, 3])
>>> >>> j[0]=ina
>>> Traceback (most recent call last):
>>>  File "", line 1, in 
>>> TypeError: int() argument must be a string or a number, not
>>> 'numpy.NAType'
>>>
>>> But it is nice that np.NA 'adjusts' to the insertion array:
>>> >>> b.flags.maskna = True
>>> >>> ana
>>> NA(dtype='<f8')
>>> >>> b[0]=ana
>>> >>> b[0]
>>> NA(dtype='<f8')
>>>
>>> 5) Different display depending on masked state. That is I think that
>>> 'maskna=True' should be displayed always when flags.maskna is True :
>>> >>> j=np.array([1,2,3], dtype=np.int8)
>>> >>> j
>>> array([1, 2, 3], dtype=int8)
>>> >>> j.flags.maskna=True
>>> >>> j
>>> array([1, 2, 3], maskna=True, dtype=int8)
>>> >>> j[0]=np.NA
>>> >>> j
>>> array([NA, 2, 3], dtype=int8) # I think it should still display
>>> 'maskna=True'.
>>>
>>
>> My main peeve is that NA is upper case ;) I suppose that could use some
>> discussion.
>
> There is some proliferation of cases in the NaN case:
> >>> np.nan
> nan
> >>> np.NAN
> nan
> >>> np.NaN
> nan
> The pros I see for NA over na are:
> * less confusion of NA vs nan (should this carry over to the np.isna
> function, should it be np.isNA according to this point?)
> * more comfortable for switching between NumPy and R when people have to use
> both at the same time
> The main con is:
> * Inconsistent with current nan, inf printing. Here's a hackish workaround:
> >>> np.na = np.NA
> >>> np.set_printoptions(nastr='na')
> >>> np.array([np.na, 2.0])
> array([na,  2.])
> What's your list of pros and cons?
> -Mark
>
>>
>> Chuck
>>

In part I sort of like having both NA and nan, since with poor
eyesight/typing/editing it avoids the problem of dropping the last 'n'.

Regarding nan/NAN, do you mean something like my ticket 1051?
http://projects.scipy.org/numpy/ticket/1051
I do not care that much about the case (mixed case is not good)
provided that there is only one way to specify these.

Also should np.isfinite() return False for np.NA?
>>> np.isfinite([1,2,np.NA,4])
array([ True,  True, NA,  True], dtype=bool)

Anyhow, many thanks for the replies to my observations and your
amazing effort in getting this done.

Bruce


[Numpy-discussion] ImportError: dynamic module does not define init function (initmultiarray)

2011-08-19 Thread Dominique Orban
Dear list,

I'm embedding Python inside a C program to pull functions from
user-supplied Python modules. All is well except when the
user-supplied module imports numpy. Requesting a stack trace when an
exception occurs reveals the following:

---
Traceback (most recent call last):
  File "/Users/dpo/.virtualenvs/matrox/matrox/curve.py", line 3, in 
import numpy as np
  File 
"/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/__init__.py",
line 137, in 
import add_newdocs
  File 
"/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/add_newdocs.py",
line 9, in 
from numpy.lib import add_newdoc
  File 
"/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/lib/__init__.py",
line 4, in 
from type_check import *
  File 
"/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/lib/type_check.py",
line 8, in 
import numpy.core.numeric as _nx
  File 
"/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/core/__init__.py",
line 5, in 
import multiarray
ImportError: dynamic module does not define init function (initmultiarray)
---

(here, "curve.py" is the user-supplied module in question.)

The symbol initmultiarray *is* defined in multiarray.so so I'm
wondering if anybody has suggestions as to what the problem may be
here.

A bit of Googling reveals the following:

* The 3rd example of Section 31.2.5 of
http://www.swig.org/Doc1.3/Python.html says

   "This error is almost always caused when a bad name is given to the
shared object file. For example, if you created a file example.so
instead of _example.so you would get this error."

* Item #2 in the FAQ at http://biggles.sourceforge.net/doc/1.5/faq says

  "This is a problem with your module search path. Python is loading
[multiarray].so as a module instead of [multiarray].py"

But I don't have any multiarray.py. I have other multiarray.so's, but
they're not in my search path. And I'm not finding any _multiarray.so
with a leading underscore.
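One way to check for shadowing is to ask Python which file it would actually load for a given module name; if the resolved path is not the one inside the numpy package, something earlier on sys.path is shadowing it. A sketch using a stdlib module, since the exact paths are machine-specific (importlib.util.find_spec is Python 3; on the 2.7 in question, imp.find_module plays the same role):

```python
import importlib.util

def resolve(modname):
    # Report the file Python would load for this module name, following
    # the current sys.path order -- the first match wins, so a stray
    # multiarray.so earlier on the path would show up here.
    spec = importlib.util.find_spec(modname)
    return spec.origin if spec else None

print(resolve("json"))  # e.g. .../lib/python3.x/json/__init__.py
```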

So I am led to ask: should multiarray.so really be called
_multiarray.so? If not, any idea what the problem is?

I'm using Python 2.7.2 compiled as a framework using Homebrew on OSX
10.6.8 and Numpy 1.6.1 installed from PyPi a day or two ago.

Thanks much in advance!

-- 
Dominique


Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Mark Wiebe
On Thu, Aug 18, 2011 at 2:43 PM, Mark Wiebe  wrote:

> It's taken a lot of changes to get the NA mask support to its current
> point, but the code is ready for some testing now. You can read the
> work-in-progress release notes here:
>
>
> https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst
>
> To try it out, check out the missingdata branch from my github account,
> here, and build in the standard way:
>
> https://github.com/m-paradox/numpy
>
> The things most important to test are:
>
> * Confirm that existing code still works correctly. I've tested against
> SciPy and matplotlib.
> * Confirm that the performance of code not using NA masks is the same or
> better.
> * Try to do computations with the NA values, find places they don't work
> yet, and nominate unimplemented functionality important to you to be next on
> the development list. The release notes have a preliminary list of
> implemented/unimplemented functions.
> * Report any crashes, build problems, or unexpected behaviors.
>
> In addition to adding the NA mask, I've also added features and done a few
> performance changes here and there, like letting reductions like sum take
> lists of axes instead of being a single axis or all of them. These changes
> affect various bugs like http://projects.scipy.org/numpy/ticket/1143 and
> http://projects.scipy.org/numpy/ticket/533.
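The axis-tuple reductions mentioned above later became standard NumPy behavior; assuming a current NumPy, the feature looks like:

```python
import numpy as np

a = np.arange(24.0).reshape(2, 3, 4)

# Reduce over axes 0 and 2 in one call; only the length-3 axis remains.
s = a.sum(axis=(0, 2))
print(s.shape)             # (3,)
print(s.sum() == a.sum())  # same grand total either way
```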
>

With a new fix to the unitless reduction logic I just committed, the
situation for bug http://projects.scipy.org/numpy/ticket/450 is also
improved.

Cheers,
Mark


> Thanks!
> Mark
>
> Here's a small example run using NAs:
>
> >>> import numpy as np
> >>> np.__version__
> '2.0.0.dev-8a5e2a1'
> >>> a = np.random.rand(3,3,3)
> >>> a.flags.maskna = True
> >>> a[np.random.rand(3,3,3) < 0.5] = np.NA
> >>> a
> array([[[NA, NA,  0.11511708],
> [ 0.46661454,  0.47565512, NA],
> [NA, NA, NA]],
>
>[[NA,  0.57860351, NA],
> [NA, NA,  0.72012669],
> [ 0.36582123, NA,  0.76289794]],
>
>[[ 0.65322748,  0.92794386, NA],
> [ 0.53745165,  0.97520989,  0.17515083],
> [ 0.71219688,  0.5184328 ,  0.75802805]]])
> >>> np.mean(a, axis=-1)
> array([[NA, NA, NA],
>[NA, NA, NA],
>[NA,  0.56260412,  0.66288591]])
> >>> np.std(a, axis=-1)
> array([[NA, NA, NA],
>[NA, NA, NA],
>[NA,  0.32710662,  0.10384331]])
> >>> np.mean(a, axis=-1, skipna=True)
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474:
> RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
> array([[ 0.11511708,  0.47113483, nan],
>[ 0.57860351,  0.72012669,  0.56435958],
>[ 0.79058567,  0.56260412,  0.66288591]])
> >>> np.std(a, axis=-1, skipna=True)
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707:
> RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe')
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730:
> RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
> array([[ 0.,  0.00452029, nan],
>[ 0.,  0.,  0.19853835],
>[ 0.13735819,  0.32710662,  0.10384331]])
> >>> np.std(a, axis=(1,2), skipna=True)
> array([ 0.16786895,  0.15498008,  0.23811937])
>
>


Re: [Numpy-discussion] numpy segfaults with ctypes

2011-08-19 Thread Matthew Brett
Hi,

On Fri, Aug 19, 2011 at 1:04 PM, Angus McMorland  wrote:
> Hi all,
>
> I'm giving this email a new subject, in case that helps it catch the
> attention of someone who can fix my problem. I currently cannot
> upgrade numpy from git to any date more recent than 10 July. Git
> commit feb8079070b8a659d7ee is the first that causes the problem
> (according to github, the commit was authored by walshb and committed
> by m-paradox, in case that jogs anyone's memory). I've tried taking a
> look at the code diff, but I'm afraid I'm just a user, rather than a
> developer, and it didn't make much sense.
>
> My problem is that python segfaults when I run it with the following code:
>
>> from ctypes import Structure, c_double
>>
>> #-- copied out of an xml2py generated file
>> class S(Structure):
>>    pass
>> S._pack_ = 4
>> S._fields_ = [
>>    ('field', c_double * 2),
>>   ]
>> #--
>>
>> import numpy as np
>> print np.version.version
>> s = S()
>> print "S", np.asarray(s.field)

Just to say, that that commit is also the commit that causes a
segfault for np.lookfor:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg33114.html
http://projects.scipy.org/numpy/ticket/1937

The latter ticket is closed because Mark's missing-data development
branch does not have the segfault.

I guess you could try that branch and see whether it fixes the problem?

I guess also that means we'll have to merge in the missing data branch
in order to fix the problem.

See you,

matthew


Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Mark Wiebe
On Fri, Aug 19, 2011 at 11:44 AM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

>
>
> On Fri, Aug 19, 2011 at 12:37 PM, Bruce Southey wrote:
>
>> Hi,
>> Just some immediate minor observations that are really about trying to
>> be consistent:
>>
>> 1) Could you keep the display of the NA dtype the same as the array's?
>> For example, NA dtype is displayed as '<f8' but I expect 'float64' as that is the array dtype.
>>  >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]])
>> >>> a
>> array([[  1.,   2.,   3., NA],
>>   [  3.,   4.,  nan,   5.]])
>> >>> a.dtype
>> dtype('float64')
>> >>> a.sum()
>> NA(dtype='<f8')
>>
>> 2) Can the 'skipna' flag be added to the methods?
>> >>> a.sum(skipna=True)
>> Traceback (most recent call last):
>>   File "", line 1, in 
>> TypeError: 'skipna' is an invalid keyword argument for this function
>> >>> np.sum(a,skipna=True)
>> nan
>>
>> 3) Can the skipna flag be extended to exclude other non-finite cases like
>> NaN?
>>
>> 4) Assigning a np.NA needs a better error message but the Integer
>> array case is more informative:
>> >>> b=np.array([1,2,3,4], dtype=np.float128)
>> >>> b[0]=np.NA
>> Traceback (most recent call last):
>>   File "", line 1, in 
>> TypeError: float() argument must be a string or a number
>>
>> >>> j=np.array([1,2,3])
>> >>> j
>> array([1, 2, 3])
>> >>> j[0]=ina
>> Traceback (most recent call last):
>>   File "", line 1, in 
>> TypeError: int() argument must be a string or a number, not 'numpy.NAType'
>>
>> But it is nice that np.NA 'adjusts' to the insertion array:
>> >>> b.flags.maskna = True
>> >>> ana
>> NA(dtype='<f8')
>> >>> b[0]=ana
>> >>> b[0]
>> NA(dtype='<f8')
>>
>> 5) Different display depending on masked state. That is I think that
>> 'maskna=True' should be displayed always when flags.maskna is True :
>> >>> j=np.array([1,2,3], dtype=np.int8)
>> >>> j
>> array([1, 2, 3], dtype=int8)
>> >>> j.flags.maskna=True
>> >>> j
>> array([1, 2, 3], maskna=True, dtype=int8)
>> >>> j[0]=np.NA
>> >>> j
>> array([NA, 2, 3], dtype=int8) # I think it should still display
>> 'maskna=True'.
>>
>>
> My main peeve is that NA is upper case ;) I suppose that could use some
> discussion.
>

There is some proliferation of cases in the NaN case:

>>> np.nan
nan
>>> np.NAN
nan
>>> np.NaN
nan

The pros I see for NA over na are:

* less confusion of NA vs nan (should this carry over to the np.isna
function, should it be np.isNA according to this point?)
* more comfortable for switching between NumPy and R when people have to use
both at the same time

The main con is:

* Inconsistent with current nan, inf printing. Here's a hackish workaround:

>>> np.na = np.NA
>>> np.set_printoptions(nastr='na')
>>> np.array([np.na, 2.0])
array([na,  2.])

What's your list of pros and cons?

-Mark


>
> Chuck
>
>
>
>


[Numpy-discussion] numpy segfaults with ctypes

2011-08-19 Thread Angus McMorland
Hi all,

I'm giving this email a new subject, in case that helps it catch the
attention of someone who can fix my problem. I currently cannot
upgrade numpy from git to any date more recent than 10 July. Git
commit feb8079070b8a659d7ee is the first that causes the problem
(according to github, the commit was authored by walshb and committed
by m-paradox, in case that jogs anyone's memory). I've tried taking a
look at the code diff, but I'm afraid I'm just a user, rather than a
developer, and it didn't make much sense.

My problem is that python segfaults when I run it with the following code:

> from ctypes import Structure, c_double
>
> #-- copied out of an xml2py generated file
> class S(Structure):
>    pass
> S._pack_ = 4
> S._fields_ = [
>    ('field', c_double * 2),
>   ]
> #--
>
> import numpy as np
> print np.version.version
> s = S()
> print "S", np.asarray(s.field)

Thanks,

Angus
-- 
AJC McMorland
Post-doctoral research fellow
Neurobiology, University of Pittsburgh


Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Ralf Gommers
On Fri, Aug 19, 2011 at 9:15 PM, Mark Wiebe  wrote:

> On Fri, Aug 19, 2011 at 11:37 AM, Bruce Southey wrote:
>
>> Hi,
>> Just some immediate minor observations that are really about trying to
>> be consistent:
>>
>> 1) Could you keep the display of the NA dtype the same as the array's?
>> For example, NA dtype is displayed as '<f8' but I expect 'float64' as that is the array dtype.
>>  >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]])
>> >>> a
>> array([[  1.,   2.,   3., NA],
>>   [  3.,   4.,  nan,   5.]])
>> >>> a.dtype
>> dtype('float64')
>> >>> a.sum()
>> NA(dtype='<f8')
>>
>
> I suppose I can do it that way, sure. I think it would be good to change
> the 'float64' into '<f8'.
>
I don't think that looks better. It would also screw up people's doctests
again.

Ralf


Re: [Numpy-discussion] Statistical distributions on samples

2011-08-19 Thread alan
I have applied the update to the documentation (although that function
needs a general rewrite - later...)

>On Mon, Aug 15, 2011 at 8:53 AM, Andrea Gavana wrote:
>
>> Hi Chris and All,
>>
>> On 12 August 2011 16:53, Christopher Jordan-Squire wrote:
>> > Hi Andrea--An easy way to get something like this would be
>> >
>> > import numpy as np
>> > import scipy.stats as stats
>> >
>> > sigma = #some reasonable standard deviation for your application
>> > x = stats.norm.rvs(size=1000, loc=125, scale=sigma)
>> > x = x[x>50]
>> > x = x[x<200]
>> >
>> > That will give a roughly normal distribution to your velocities, as long
>> as,
>> > say, sigma<25. (I'm using the rule of thumb for the normal distribution
>> that
>> > normal random samples lie 3 standard deviations away from the mean about
>> 1
>> > out of 350 times.) Though you won't be able to get exactly normal errors
>> > about your mean since normal random samples can theoretically be of any
>> > size.
>> >
>> > You can use this same process for any other distribution, as long as
>> you've
>> > chosen a scale variable so that the probability of samples being outside
>> > your desired interval is really small. Of course, once again your random
>> > errors won't be exactly from the distribution you get your original
>> samples
>> > from.
>>
>> Thank you for your suggestion. There are a couple of things I am not
>> clear with, however. The first one (the easy one), is: let's suppose I
>> need 200 values, and the accept/discard procedure removes 5 of them
>> from the list. Is there any way to draw these 200 values from a bigger
>> sample so that the accept/reject procedure will not interfere too
>> much? And how do I get 200 values out of the bigger sample so that
>> these values are still representative?
>>
>
>FWIW, I'm not really advocating a truncated normal so much as making the
>standard deviation small enough so that there's no real difference between a
>true normal distribution and a truncated normal.
>
>If you're worried about getting exactly 200 samples, then you could sample N
>with N>200 and such that after throwing out the ones that lie outside your
>desired region you're left with M>200. Then just randomly pick 200 from
>those M. That shouldn't bias anything as long as you randomly pick them. (Or
>just pick the first 200, if you haven't done anything to impose any order on
>the samples, such as sorting them by size.) But I'm not sure why you'd want
>exactly 200 samples instead of some number of samples close to 200.
>
>
>>
>> Another thing, possibly completely unrelated. I am trying to design a
>> toy Latin Hypercube script (just for my own understanding). I found
>> this piece of code on the web (and I modified it slightly):
>>
>> def lhs(dist, size=100):
>>     '''
>>     Latin Hypercube sampling of any distribution.
>>     dist is a scipy.stats random number generator
>>     such as stats.norm, stats.beta, etc.
>>
>>     :Parameters:
>>         - `dist`: random number generator from the scipy.stats module.
>>         - `size`: size of the output sample.
>>     '''
>>
>>     n = size
>>
>>     # one equal-probability stratum per sample, in random order
>>     perc = numpy.arange(0.0, 1.0, 1.0/n)
>>     numpy.random.shuffle(perc)
>>
>>     # draw uniformly within each stratum, then map through the
>>     # inverse CDF (ppf) of the target distribution
>>     smp = [stats.uniform(i, 1.0/n).rvs() for i in perc]
>>
>>     v = dist.ppf(smp)
>>
>>     return v
>>
>>
>> Now, I am not 100% clear of what the percent point function is (I have
>> read around the web, but please keep in mind that my statistical
>> skills are close to minus infinity). From this page:
>>
>> http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm
>>
>>
>The ppf is what's called the quantile function elsewhere. I do not know why
>scipy calls it the ppf/percent point function.
>
>The quantile function is the inverse of the cumulative density function
>(cdf). So dist.ppf(z) is the x such that P(dist <= x) = z. Roughly. (Things
>get slightly more finicky if you think about discrete distributions because
>then you have to pick what happens at the jumps in the cdf.) So
>dist.ppf(0.5) gives the median of dist, and dist.ppf(0.25) gives the
>lower/first quartile of dist.
>
>
>> I gather that, if you plot the results of the ppf, with the horizontal
>> axis as probability, the vertical axis goes from the smallest to the
>> largest value of the cumulative distribution function. If i do this:
>>
>> numpy.random.seed(123456)
>>
>> distribution = stats.norm(loc=125, scale=25)
>>
>> my_lhs = lhs(distribution, 50)
>>
>> Will my_lhs always contain valid values (i.e., included between 50 and
>> 200)? I assume the answer is no... but even if this was the case, is
>> this my_lhs array ready to be used to setup a LHS experiment when I
>> have multi-dimensional problems (in which all the variables are
>> completely independent from each other - no correlation)?
>>
>>
>I'm not really sure if the above function is doing the lhs you want. To
>answer your question, it won't always generate values within [50,200].
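The two ideas in this thread — accept/reject sampling down to a fixed count, and mapping stratified uniforms through the inverse CDF (what scipy calls ppf) — can be sketched with the stdlib NormalDist standing in for scipy.stats.norm (function names here are illustrative, not from the thread):

```python
import random
from statistics import NormalDist

dist = NormalDist(mu=125, sigma=25)  # stand-in for stats.norm(loc=125, scale=25)

def truncated_sample(n, lo, hi):
    # Oversample, discard out-of-range draws, keep exactly n (accept/reject).
    out = []
    while len(out) < n:
        out.extend(x for x in dist.samples(2 * n) if lo <= x <= hi)
    return out[:n]

def lhs(n):
    # Toy Latin Hypercube: one uniform draw per probability stratum,
    # pushed through the inverse CDF (ppf); clamped away from 0 and 1.
    strata = [i / n for i in range(n)]
    random.shuffle(strata)
    smp = [min(max(p + random.random() / n, 1e-9), 1 - 1e-9) for p in strata]
    return [dist.inv_cdf(s) for s in smp]

print(dist.inv_cdf(0.5))  # 125.0 -- inv_cdf/ppf of 0.5 is the median
vals = truncated_sample(200, 50, 200)
print(len(vals), min(vals) >= 50, max(vals) <= 200)
```

With sigma=25 the window [50, 200] is ±3 standard deviations, so very few draws are rejected and the loop almost always finishes in one pass.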

Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Mark Wiebe
On Fri, Aug 19, 2011 at 11:37 AM, Bruce Southey  wrote:

> Hi,
> Just some immediate minor observations that are really about trying to
> be consistent:
>
> 1) Could you keep the display of the NA dtype the same as the array's?
> For example, NA dtype is displayed as '<f8' but I expect 'float64' as that is the array dtype.
>  >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]])
> >>> a
> array([[  1.,   2.,   3., NA],
>   [  3.,   4.,  nan,   5.]])
> >>> a.dtype
> dtype('float64')
> >>> a.sum()
> NA(dtype='<f8')

I suppose I can do it that way, sure. I think it would be good to change the
'float64' into '<f8'.

> 2) Can the 'skipna' flag be added to the methods?
> >>> a.sum(skipna=True)
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: 'skipna' is an invalid keyword argument for this function
> >>> np.sum(a,skipna=True)
> nan
>

Yeah, but I think this is low priority compared to a lot of other things
that need doing. The methods are written in C with a particular hardcoded
implementation pattern, whereas with the functions in the numpy namespace I
was able to adjust to call the ufunc reduce methods without much menial
effort.

3) Can the skipna flag be extended to exclude other non-finite cases like
> NaN?
>

That wasn't really within the scope of the original design, except for one
particular case of the NA-bitpattern dtypes. It's possible to make a new
mask and assign NA to the NaN values like this:

a = [array with NaNs]
aview = a.view(ownmaskna=True)
aview[np.isnan(aview)] = np.NA
np.sum(aview, skipna=True)

4) Assigning a np.NA needs a better error message but the Integer
> array case is more informative:
> >>> b=np.array([1,2,3,4], dtype=np.float128)
> >>> b[0]=np.NA
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: float() argument must be a string or a number
>
> >>> j=np.array([1,2,3])
> >>> j
> array([1, 2, 3])
> >>> j[0]=ina
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: int() argument must be a string or a number, not 'numpy.NAType'
>

I coded this up the way I did to ease the future transition to NA-bitpattern
dtypes, which would handle this conversion from the NA object. The error
message is being produced by CPython in both of these cases, so it looks
like they didn't make their messages consistent.

This could be changed to match the error message like this:

>>> a = np.array([np.NA, 3])
>>> b = np.array([3,4])
>>> b[...] = a
Traceback (most recent call last):
  File "", line 1, in 
ValueError: Cannot assign NA value to an array which does not support NAs


> But it is nice that np.NA 'adjusts' to the insertion array:
> >>> b.flags.maskna = True
> >>> ana
> NA(dtype='<f8')
> >>> b[0]=ana
> >>> b[0]
> NA(dtype='<f8')
>

It should generally follow the NumPy type promotion rules, but may be a bit
more liberal in places.
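The promotion rules referred to here can be queried directly; assuming a current NumPy, np.promote_types shows what the combined type of two inputs would be:

```python
import numpy as np

# Promotion of an int8 value into a float64 context yields float64;
# mixing uint8 and int8 widens to the smallest signed type holding both.
print(np.promote_types(np.int8, np.float64))  # float64
print(np.promote_types(np.uint8, np.int8))    # int16
```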


> 5) Different display depending on masked state. That is I think that
> 'maskna=True' should be displayed always when flags.maskna is True :
> >>> j=np.array([1,2,3], dtype=np.int8)
> >>> j
> array([1, 2, 3], dtype=int8)
> >>> j.flags.maskna=True
> >>> j
> array([1, 2, 3], maskna=True, dtype=int8)
> >>> j[0]=np.NA
> >>> j
> array([NA, 2, 3], dtype=int8) # I think it should still display
> 'maskna=True'.
>

This is just like how NumPy hides the dtype in some cases, it's hiding the
maskna=True whenever it would be automatically detected from the input list.

>>> np.array([1.0, 2.0])
array([ 1.,  2.])
>>> np.array([1.0, 2.0], dtype=np.float32)
array([ 1.,  2.], dtype=float32)

Cheers,
Mark


>
> Bruce
>


Re: [Numpy-discussion] Reconstruct multidimensional array from buffer without shape

2011-08-19 Thread Paul Anton Letnes

On 19. aug. 2011, at 19.57, Ian wrote:

> Right. I'm new to NumPy so I figured I'd check if there was some nifty way of 
> preserving the shape without storing it in the database that I hadn't 
> discovered yet. No worries, I'll store the shape alongside the array. Thanks 
> for the reply.
> 
I love the h5py package so I keep recommending it (and pytables is supposed to 
be good, I think?). h5py stores files in hdf5, which is readable from 
C,C++,fortran,java,python... It also keeps track of shape and you can store 
other metadata (e.g. strings) as desired.

Also I believe the numpy format (see e.g. 
http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html#numpy.savez)
 can do the same, although I don't think performance scales as well for huge 
arrays, and it's not language-neutral (to my knowledge).
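As a sketch of the .npy option (using an in-memory buffer in place of a file; the format header records dtype and shape, so nothing extra needs to be stored):

```python
import io
import numpy as np

a = np.arange(12.0).reshape(3, 4)

buf = io.BytesIO()
np.save(buf, a)      # the .npy header stores dtype, shape, and byte order
buf.seek(0)

b = np.load(buf)
print(b.shape)       # (3, 4) -- shape survives the round trip
```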

Cheers
Paul




Re: [Numpy-discussion] Reconstruct multidimensional array from buffer without shape

2011-08-19 Thread Ian
Right. I'm new to NumPy so I figured I'd check if there was some nifty way of 
preserving the shape without storing it in the database that I hadn't 
discovered yet. No worries, I'll store the shape alongside the array. Thanks 
for the reply.

Ian




>
>From: Olivier Delalleau 
>To: Discussion of Numerical Python 
>Sent: Friday, August 19, 2011 11:44 AM
>Subject: Re: [Numpy-discussion] Reconstruct multidimensional array from buffer 
>without shape
>
>
>How could it be possible? If you only have the buffer data, there could be 
>many different valid shapes associated to this data.
>
>-=- Olivier
>
>
>2011/8/19 Ian 
>
>Hello list,
>>
>>
>>I am storing a multidimensional array as binary in a Postgres 9.04 database. 
>>For retrieval of this array from the database I thought frombuffer() was my 
>>solution, however I see that this constructs a one-dimensional array. I read 
>>in the documentation about the buffer parameter in the ndarray() constructor, 
>>but that requires the shape of the array.
>>
>>
>>Is there a way to re-construct a multidimensional array from a buffer without 
>>knowing its shape?
>>
>>
>>Thanks.


Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Charles R Harris
On Fri, Aug 19, 2011 at 12:37 PM, Bruce Southey  wrote:

> Hi,
> Just some immediate minor observations that are really about trying to
> be consistent:
>
> 1) Could you keep the display of the NA dtype the same as the array's?
> For example, NA dtype is displayed as '<f8' but I expect 'float64' as that is the array dtype.
>  >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]])
> >>> a
> array([[  1.,   2.,   3., NA],
>   [  3.,   4.,  nan,   5.]])
> >>> a.dtype
> dtype('float64')
> >>> a.sum()
> NA(dtype='<f8')
>
> 2) Can the 'skipna' flag be added to the methods?
> >>> a.sum(skipna=True)
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: 'skipna' is an invalid keyword argument for this function
> >>> np.sum(a,skipna=True)
> nan
>
> 3) Can the skipna flag be extended to exclude other non-finite cases like
> NaN?
>
> 4) Assigning a np.NA needs a better error message but the Integer
> array case is more informative:
> >>> b=np.array([1,2,3,4], dtype=np.float128)
> >>> b[0]=np.NA
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: float() argument must be a string or a number
>
> >>> j=np.array([1,2,3])
> >>> j
> array([1, 2, 3])
> >>> j[0]=ina
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: int() argument must be a string or a number, not 'numpy.NAType'
>
> But it is nice that np.NA 'adjusts' to the insertion array:
> >>> b.flags.maskna = True
> >>> ana
> NA(dtype='<f8')
> >>> b[0]=ana
> >>> b[0]
> NA(dtype='<f8')
>
> 5) Different display depending on masked state. That is I think that
> 'maskna=True' should be displayed always when flags.maskna is True :
> >>> j=np.array([1,2,3], dtype=np.int8)
> >>> j
> array([1, 2, 3], dtype=int8)
> >>> j.flags.maskna=True
> >>> j
> array([1, 2, 3], maskna=True, dtype=int8)
> >>> j[0]=np.NA
> >>> j
> array([NA, 2, 3], dtype=int8) # I think it should still display
> 'maskna=True'.
>
>
My main peeve is that NA is upper case ;) I suppose that could use some
discussion.

Chuck


Re: [Numpy-discussion] Reconstruct multidimensional array from buffer without shape

2011-08-19 Thread Olivier Delalleau
How could it be possible? If you only have the buffer data, there could be
many different valid shapes associated to this data.

-=- Olivier

2011/8/19 Ian 

> Hello list,
>
> I am storing a multidimensional array as binary in a Postgres 9.04
> database. For retrieval of this array from the database I thought
> frombuffer() was my solution, however I see that this constructs a
> one-dimensional array. I read in the documentation about the buffer
> parameter in the ndarray() constructor, but that requires the shape of the
> array.
>
> Is there a way to re-construct a multidimensional array from a buffer
> without knowing its shape?
>
> Thanks.
>
>
>


[Numpy-discussion] Reconstruct multidimensional array from buffer without shape

2011-08-19 Thread Ian
Hello list,

I am storing a multidimensional array as binary in a Postgres 9.04 database.
For retrieval of this array from the database I thought frombuffer() was my
solution; however, I see that this constructs a one-dimensional array. I read
in the documentation about the buffer parameter in the ndarray() constructor,
but that requires the shape of the array.

Is there a way to re-construct a multidimensional array from a buffer without 
knowing its shape?

Thanks.


Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Bruce Southey
Hi,
Just some immediate minor observations that are really about trying to
be consistent:

1) Could you keep the display of the NA dtype the same as the array?
For example, the NA dtype is displayed as '<f8' but I think it should be
displayed as 'float64' as that is the array dtype.
>>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]])
>>> a
array([[  1.,   2.,   3., NA],
       [  3.,   4.,  nan,   5.]])
>>> a.dtype
dtype('float64')
>>> a.sum()
NA(dtype='<f8')

2) Can the 'skipna' flag be added to the methods?
>>> a.sum(skipna=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'skipna' is an invalid keyword argument for this function
>>> np.sum(a,skipna=True)
nan

3) Can the skipna flag be extended to exclude other non-finite cases like NaN?
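
Until such a flag exists, one sketch of skipping NaNs by hand with a
boolean mask (np.isfinite would additionally drop infinities):

```python
import numpy as np

a = np.array([1.0, 2.0, np.nan, 4.0])
total = a[~np.isnan(a)].sum()   # drop the NaN entries before reducing
```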

4) Assigning a np.NA needs a better error message but the Integer
array case is more informative:
>>> b=np.array([1,2,3,4], dtype=np.float128)
>>> b[0]=np.NA
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: float() argument must be a string or a number

>>> j=np.array([1,2,3])
>>> j
array([1, 2, 3])
>>> j[0]=ina
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: int() argument must be a string or a number, not 'numpy.NAType'

But it is nice that np.NA 'adjusts' to the insertion array:
>>> b.flags.maskna = True
>>> ana
NA(dtype='<f16')
>>> b[0]=ana
>>> b[0]
NA(dtype='<f16')

5) Different display depending on masked state. That is I think that
'maskna=True' should be displayed always when flags.maskna is True:
>>> j=np.array([1,2,3], dtype=np.int8)
>>> j
array([1, 2, 3], dtype=int8)
>>> j.flags.maskna=True
>>> j
array([1, 2, 3], maskna=True, dtype=int8)
>>> j[0]=np.NA
>>> j
array([NA, 2, 3], dtype=int8) # I think it should still display 'maskna=True'.

Bruce


Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Mark Wiebe
On Fri, Aug 19, 2011 at 11:07 AM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

>
>
> On Fri, Aug 19, 2011 at 11:55 AM, Bruce Southey wrote:
>
>> On Fri, Aug 19, 2011 at 10:48 AM, Mark Wiebe  wrote:
>> > On Fri, Aug 19, 2011 at 7:15 AM, Bruce Southey 
>> wrote:
>> >>
>> >> On 08/18/2011 04:43 PM, Mark Wiebe wrote:
>> >>
>> >> It's taken a lot of changes to get the NA mask support to its current
>> >> point, but the code is ready for some testing now. You can read the
>> >> work-in-progress release notes here:
>> >>
>> >>
>> https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst
>> >> To try it out, check out the missingdata branch from my github account,
>> >> here, and build in the standard way:
>> >> https://github.com/m-paradox/numpy
>> >> The things most important to test are:
>> >> * Confirm that existing code still works correctly. I've tested against
>> >> SciPy and matplotlib.
>> >> * Confirm that the performance of code not using NA masks is the same
>> or
>> >> better.
>> >> * Try to do computations with the NA values, find places they don't
>> work
>> >> yet, and nominate unimplemented functionality important to you to be
>> next on
>> >> the development list. The release notes have a preliminary list of
>> >> implemented/unimplemented functions.
>> >> * Report any crashes, build problems, or unexpected behaviors.
>> >> In addition to adding the NA mask, I've also added features and done a
>> few
>> >> performance changes here and there, like letting reductions like sum
>> take
>> >> lists of axes instead of being a single axis or all of them. These
>> changes
>> >> affect various bugs
>> >> like http://projects.scipy.org/numpy/ticket/1143 and
>> http://projects.scipy.org/numpy/ticket/533.
>> >> Thanks!
>> >> Mark
>> >> Here's a small example run using NAs:
>> >> >>> import numpy as np
>> >> >>> np.__version__
>> >> '2.0.0.dev-8a5e2a1'
>> >> >>> a = np.random.rand(3,3,3)
>> >> >>> a.flags.maskna = True
>> >> >>> a[np.random.rand(3,3,3) < 0.5] = np.NA
>> >> >>> a
>> >> array([[[NA, NA,  0.11511708],
>> >> [ 0.46661454,  0.47565512, NA],
>> >> [NA, NA, NA]],
>> >>[[NA,  0.57860351, NA],
>> >> [NA, NA,  0.72012669],
>> >> [ 0.36582123, NA,  0.76289794]],
>> >>[[ 0.65322748,  0.92794386, NA],
>> >> [ 0.53745165,  0.97520989,  0.17515083],
>> >> [ 0.71219688,  0.5184328 ,  0.75802805]]])
>> >> >>> np.mean(a, axis=-1)
>> >> array([[NA, NA, NA],
>> >>[NA, NA, NA],
>> >>[NA,  0.56260412,  0.66288591]])
>> >> >>> np.std(a, axis=-1)
>> >> array([[NA, NA, NA],
>> >>[NA, NA, NA],
>> >>[NA,  0.32710662,  0.10384331]])
>> >> >>> np.mean(a, axis=-1, skipna=True)
>> >>
>> >>
>> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474:
>> >> RuntimeWarning: invalid value encountered in true_divide
>> >>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
>> >> array([[ 0.11511708,  0.47113483, nan],
>> >>[ 0.57860351,  0.72012669,  0.56435958],
>> >>[ 0.79058567,  0.56260412,  0.66288591]])
>> >> >>> np.std(a, axis=-1, skipna=True)
>> >>
>> >>
>> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707:
>> >> RuntimeWarning: invalid value encountered in true_divide
>> >>   um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe')
>> >>
>> >>
>> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730:
>> >> RuntimeWarning: invalid value encountered in true_divide
>> >>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
>> >> array([[ 0.,  0.00452029, nan],
>> >>[ 0.,  0.,  0.19853835],
>> >>[ 0.13735819,  0.32710662,  0.10384331]])
>> >> >>> np.std(a, axis=(1,2), skipna=True)
>> >> array([ 0.16786895,  0.15498008,  0.23811937])
>> >>
>> >> ___
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >>
>> >> Hi,
>> >> That is great news!
>> >> (Python2.x will be another email.)
>> >>
>> >> Python3.1 and Python3.2 failed with building
>> 'multiarraymodule_onefile.o'
>> >> but I could not see any obvious reason.
>> >
>> > I've pushed a change to fix the Python 3 build, it was a use
>> > of Py_TPFLAGS_CHECKTYPES, which is no longer in Python3 but is always
>> > default now. Tested with 3.2.
>> > Thanks!
>> > Mark
>> >
>> >>
>> >> I had removed my build directory and then 'python3 setup.py build' but
>> I
>> >> saw this message:
>> >> Running from numpy source directory.
>> >> numpy/core/setup_common.py:86: MismatchCAPIWarning: API mismatch
>> detected,
>> >> the C API version numbers have to be updated. Current C api version is
>> 6,
>> >> with checksum ef5688af03ffa23dd8e11734f5b69313, but recorded checksum
>> for C
>> >> API version 6 in codegen_dir/cversions.txt is
>> >> e61d5dc51fa1c6459328266e215

Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Charles R Harris
On Fri, Aug 19, 2011 at 11:55 AM, Bruce Southey  wrote:

> On Fri, Aug 19, 2011 at 10:48 AM, Mark Wiebe  wrote:
> > On Fri, Aug 19, 2011 at 7:15 AM, Bruce Southey 
> wrote:
> >>
> >> On 08/18/2011 04:43 PM, Mark Wiebe wrote:
> >>
> >> It's taken a lot of changes to get the NA mask support to its current
> >> point, but the code is ready for some testing now. You can read the
> >> work-in-progress release notes here:
> >>
> >>
> https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst
> >> To try it out, check out the missingdata branch from my github account,
> >> here, and build in the standard way:
> >> https://github.com/m-paradox/numpy
> >> The things most important to test are:
> >> * Confirm that existing code still works correctly. I've tested against
> >> SciPy and matplotlib.
> >> * Confirm that the performance of code not using NA masks is the same or
> >> better.
> >> * Try to do computations with the NA values, find places they don't work
> >> yet, and nominate unimplemented functionality important to you to be
> next on
> >> the development list. The release notes have a preliminary list of
> >> implemented/unimplemented functions.
> >> * Report any crashes, build problems, or unexpected behaviors.
> >> In addition to adding the NA mask, I've also added features and done a
> few
> >> performance changes here and there, like letting reductions like sum
> take
> >> lists of axes instead of being a single axis or all of them. These
> changes
> >> affect various bugs
> >> like http://projects.scipy.org/numpy/ticket/1143 and
> http://projects.scipy.org/numpy/ticket/533.
> >> Thanks!
> >> Mark
> >> Here's a small example run using NAs:
> >> >>> import numpy as np
> >> >>> np.__version__
> >> '2.0.0.dev-8a5e2a1'
> >> >>> a = np.random.rand(3,3,3)
> >> >>> a.flags.maskna = True
> >> >>> a[np.random.rand(3,3,3) < 0.5] = np.NA
> >> >>> a
> >> array([[[NA, NA,  0.11511708],
> >> [ 0.46661454,  0.47565512, NA],
> >> [NA, NA, NA]],
> >>[[NA,  0.57860351, NA],
> >> [NA, NA,  0.72012669],
> >> [ 0.36582123, NA,  0.76289794]],
> >>[[ 0.65322748,  0.92794386, NA],
> >> [ 0.53745165,  0.97520989,  0.17515083],
> >> [ 0.71219688,  0.5184328 ,  0.75802805]]])
> >> >>> np.mean(a, axis=-1)
> >> array([[NA, NA, NA],
> >>[NA, NA, NA],
> >>[NA,  0.56260412,  0.66288591]])
> >> >>> np.std(a, axis=-1)
> >> array([[NA, NA, NA],
> >>[NA, NA, NA],
> >>[NA,  0.32710662,  0.10384331]])
> >> >>> np.mean(a, axis=-1, skipna=True)
> >>
> >>
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474:
> >> RuntimeWarning: invalid value encountered in true_divide
> >>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
> >> array([[ 0.11511708,  0.47113483, nan],
> >>[ 0.57860351,  0.72012669,  0.56435958],
> >>[ 0.79058567,  0.56260412,  0.66288591]])
> >> >>> np.std(a, axis=-1, skipna=True)
> >>
> >>
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707:
> >> RuntimeWarning: invalid value encountered in true_divide
> >>   um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe')
> >>
> >>
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730:
> >> RuntimeWarning: invalid value encountered in true_divide
> >>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
> >> array([[ 0.,  0.00452029, nan],
> >>[ 0.,  0.,  0.19853835],
> >>[ 0.13735819,  0.32710662,  0.10384331]])
> >> >>> np.std(a, axis=(1,2), skipna=True)
> >> array([ 0.16786895,  0.15498008,  0.23811937])
> >>
> >> ___
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >>
> >> Hi,
> >> That is great news!
> >> (Python2.x will be another email.)
> >>
> >> Python3.1 and Python3.2 failed with building
> 'multiarraymodule_onefile.o'
> >> but I could not see any obvious reason.
> >
> > I've pushed a change to fix the Python 3 build, it was a use
> > of Py_TPFLAGS_CHECKTYPES, which is no longer in Python3 but is always
> > default now. Tested with 3.2.
> > Thanks!
> > Mark
> >
> >>
> >> I had removed my build directory and then 'python3 setup.py build' but I
> >> saw this message:
> >> Running from numpy source directory.
> >> numpy/core/setup_common.py:86: MismatchCAPIWarning: API mismatch
> detected,
> >> the C API version numbers have to be updated. Current C api version is
> 6,
> >> with checksum ef5688af03ffa23dd8e11734f5b69313, but recorded checksum
> for C
> >> API version 6 in codegen_dir/cversions.txt is
> >> e61d5dc51fa1c6459328266e215d6987. If functions were added in the C API,
> you
> >> have to update C_API_VERSION  in numpy/core/setup_common.py.
> >>   MismatchCAPIWarning)
> >>
> >> Upstream of the build log is below.
> >>
> >> Bruce
> >>
> >> In fi

Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Bruce Southey
On Fri, Aug 19, 2011 at 10:27 AM, Ralf Gommers
 wrote:
>
>
> On Fri, Aug 19, 2011 at 5:23 PM, Bruce Southey  wrote:
>>
>> On 08/19/2011 10:04 AM, Ralf Gommers wrote:
>>
>> On Fri, Aug 19, 2011 at 4:55 PM, Bruce Southey  wrote:
>>>
>>> Hi,
>>> I had to rebuild my Python2.6 as a 'normal' version.
>>>
>>> Anyhow, Python2.4, 2.5, 2.6 and 2.7 all build and pass the numpy tests.
>>>
>>> Curiously, only tests in Python2.7 give almost no warnings but all the
>>> other Python2.x give lots of warnings - Python2.6 and Python2.7 are below.
>>> My expectation is that all versions should behave the same regarding
>>> printing messages.
>>
>> This is due to a change in Python 2.7 itself - deprecation warnings are
>> not shown anymore by default. Furthermore, all those messages are unrelated
>> to Mark's missing data commits.
>>
>> Cheers,
>> Ralf
>>
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>> Yet:
>> $ python2.6 -c "import numpy; numpy.test()"
>> Running unit tests for numpy
>> NumPy version 1.6.1
>> NumPy is installed in /usr/local/lib/python2.6/site-packages/numpy
>> Python version 2.6.6 (r266:84292, Aug 19 2011, 09:21:38) [GCC 4.5.1
>> 20100924 (Red Hat 4.5.1-4)]
>> nose version 1.0.0
>>
>> ..K...
>> ..K..K...
>> ...
>> ...
>> --
>> Ran 3533 tests in 22.062s
>>
>>

Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Bruce Southey
On Fri, Aug 19, 2011 at 10:48 AM, Mark Wiebe  wrote:
> On Fri, Aug 19, 2011 at 7:15 AM, Bruce Southey  wrote:
>>
>> On 08/18/2011 04:43 PM, Mark Wiebe wrote:
>>
>> It's taken a lot of changes to get the NA mask support to its current
>> point, but the code is ready for some testing now. You can read the
>> work-in-progress release notes here:
>>
>> https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst
>> To try it out, check out the missingdata branch from my github account,
>> here, and build in the standard way:
>> https://github.com/m-paradox/numpy
>> The things most important to test are:
>> * Confirm that existing code still works correctly. I've tested against
>> SciPy and matplotlib.
>> * Confirm that the performance of code not using NA masks is the same or
>> better.
>> * Try to do computations with the NA values, find places they don't work
>> yet, and nominate unimplemented functionality important to you to be next on
>> the development list. The release notes have a preliminary list of
>> implemented/unimplemented functions.
>> * Report any crashes, build problems, or unexpected behaviors.
>> In addition to adding the NA mask, I've also added features and done a few
>> performance changes here and there, like letting reductions like sum take
>> lists of axes instead of being a single axis or all of them. These changes
>> affect various bugs
>> like http://projects.scipy.org/numpy/ticket/1143 and http://projects.scipy.org/numpy/ticket/533.
>> Thanks!
>> Mark
>> Here's a small example run using NAs:
>> >>> import numpy as np
>> >>> np.__version__
>> '2.0.0.dev-8a5e2a1'
>> >>> a = np.random.rand(3,3,3)
>> >>> a.flags.maskna = True
>> >>> a[np.random.rand(3,3,3) < 0.5] = np.NA
>> >>> a
>> array([[[NA, NA,  0.11511708],
>>         [ 0.46661454,  0.47565512, NA],
>>         [NA, NA, NA]],
>>        [[NA,  0.57860351, NA],
>>         [NA, NA,  0.72012669],
>>         [ 0.36582123, NA,  0.76289794]],
>>        [[ 0.65322748,  0.92794386, NA],
>>         [ 0.53745165,  0.97520989,  0.17515083],
>>         [ 0.71219688,  0.5184328 ,  0.75802805]]])
>> >>> np.mean(a, axis=-1)
>> array([[NA, NA, NA],
>>        [NA, NA, NA],
>>        [NA,  0.56260412,  0.66288591]])
>> >>> np.std(a, axis=-1)
>> array([[NA, NA, NA],
>>        [NA, NA, NA],
>>        [NA,  0.32710662,  0.10384331]])
>> >>> np.mean(a, axis=-1, skipna=True)
>>
>> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474:
>> RuntimeWarning: invalid value encountered in true_divide
>>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
>> array([[ 0.11511708,  0.47113483,         nan],
>>        [ 0.57860351,  0.72012669,  0.56435958],
>>        [ 0.79058567,  0.56260412,  0.66288591]])
>> >>> np.std(a, axis=-1, skipna=True)
>>
>> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707:
>> RuntimeWarning: invalid value encountered in true_divide
>>   um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe')
>>
>> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730:
>> RuntimeWarning: invalid value encountered in true_divide
>>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
>> array([[ 0.        ,  0.00452029,         nan],
>>        [ 0.        ,  0.        ,  0.19853835],
>>        [ 0.13735819,  0.32710662,  0.10384331]])
>> >>> np.std(a, axis=(1,2), skipna=True)
>> array([ 0.16786895,  0.15498008,  0.23811937])
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>> Hi,
>> That is great news!
>> (Python2.x will be another email.)
>>
>> Python3.1 and Python3.2 failed with building 'multiarraymodule_onefile.o'
>> but I could not see any obvious reason.
>
> I've pushed a change to fix the Python 3 build, it was a use
> of Py_TPFLAGS_CHECKTYPES, which is no longer in Python3 but is always
> default now. Tested with 3.2.
> Thanks!
> Mark
>
>>
>> I had removed my build directory and then 'python3 setup.py build' but I
>> saw this message:
>> Running from numpy source directory.
>> numpy/core/setup_common.py:86: MismatchCAPIWarning: API mismatch detected,
>> the C API version numbers have to be updated. Current C api version is 6,
>> with checksum ef5688af03ffa23dd8e11734f5b69313, but recorded checksum for C
>> API version 6 in codegen_dir/cversions.txt is
>> e61d5dc51fa1c6459328266e215d6987. If functions were added in the C API, you
>> have to update C_API_VERSION  in numpy/core/setup_common.py.
>>   MismatchCAPIWarning)
>>
>> Upstream of the build log is below.
>>
>> Bruce
>>
>> In file included from
>> numpy/core/src/multiarray/multiarraymodule_onefile.c:53:0:
>> numpy/core/src/multiarray/na_singleton.c: At top level:
>> numpy/core/src/multiarray/na_singleton.c:708:25: error:
>> ‘Py_TPFLAGS_CHECKTYPES’ undeclared here (not in a function)
>> numpy/core/src/multiarray/common.c:48:1: warning: 

Re: [Numpy-discussion] summing an array

2011-08-19 Thread Bob Dowling

On 19/08/11 15:49, Chris Withers wrote:
> On 18/08/2011 07:58, Bob Dowling wrote:
>>
>> >>> numpy.add.accumulate(a)
>> array([ 0, 1, 3, 6, 10])
>>
>> >>> numpy.add.accumulate(a, out=a)
>> array([ 0, 1, 3, 6, 10])
>
> What's the difference between numpy.cumsum and numpy.add.accumulate?

I think they're equivalent, with numpy.cumprod() serving for
numpy.multiply.accumulate().

I have a preference for general procedures rather than special
shortcuts.  The numpy.<ufunc>.accumulate form works for any of the
binary ufuncs, I think.  The cumsum() and cumprod() functions only
exist for add and multiply.

e.g.

 >>> a = numpy.arange(2,5)
 >>> a
array([2, 3, 4])
 >>> numpy.power.accumulate(a)
array([   2,    8, 4096])
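
The claimed equivalences are easy to check directly; a quick sketch:

```python
import numpy

a = numpy.arange(5)
c1 = numpy.cumsum(a)                   # running sum: [ 0  1  3  6 10]
c2 = numpy.add.accumulate(a)           # same result via the ufunc method
p1 = numpy.cumprod(a + 1)              # running product of [1..5]
p2 = numpy.multiply.accumulate(a + 1)  # same result via the ufunc method
```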


> Where can I find the reference docs for these?

help(numpy.ufunc)
help(numpy.ufunc.accumulate)

is where I started.


[Numpy-discussion] Can't mix np.newaxis with boolean indexing

2011-08-19 Thread Benjamin Root
I could have sworn that this used to work:

import numpy as np
a = np.random.random((100,))
b = (a > 0.5)
print a[b, np.newaxis]

But instead, I get this error on the latest master:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: long() argument must be a string or a number, not 'NoneType'

Note, the simple work-around would be "a[b][:, np.newaxis]", but I can't
imagine why the intuitive syntax would not be valid.
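
For reference, the workaround spelled out (a sketch; reshape(-1, 1) is
an equivalent way to add the trailing axis):

```python
import numpy as np

a = np.random.random(100)
b = a > 0.5
col = a[b][:, np.newaxis]      # filter first, then add the new axis
# equivalently: a[b].reshape(-1, 1)
```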

Thanks,
Ben Root


Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Mark Wiebe
On Fri, Aug 19, 2011 at 7:15 AM, Bruce Southey  wrote:

> **
> On 08/18/2011 04:43 PM, Mark Wiebe wrote:
>
> It's taken a lot of changes to get the NA mask support to its current
> point, but the code is ready for some testing now. You can read the
> work-in-progress release notes here:
>
>
> https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst
>
>  To try it out, check out the missingdata branch from my github account,
> here, and build in the standard way:
>
>  https://github.com/m-paradox/numpy
>
>  The things most important to test are:
>
>  * Confirm that existing code still works correctly. I've tested against
> SciPy and matplotlib.
> * Confirm that the performance of code not using NA masks is the same or
> better.
> * Try to do computations with the NA values, find places they don't work
> yet, and nominate unimplemented functionality important to you to be next on
> the development list. The release notes have a preliminary list of
> implemented/unimplemented functions.
> * Report any crashes, build problems, or unexpected behaviors.
>
>  In addition to adding the NA mask, I've also added features and done a
> few performance changes here and there, like letting reductions like sum
> take lists of axes instead of being a single axis or all of them. These
> changes affect various bugs like
> http://projects.scipy.org/numpy/ticket/1143 and
> http://projects.scipy.org/numpy/ticket/533.
>
>  Thanks!
> Mark
>
>  Here's a small example run using NAs:
>
>  >>> import numpy as np
> >>> np.__version__
> '2.0.0.dev-8a5e2a1'
> >>> a = np.random.rand(3,3,3)
> >>> a.flags.maskna = True
> >>> a[np.random.rand(3,3,3) < 0.5] = np.NA
> >>> a
> array([[[NA, NA,  0.11511708],
> [ 0.46661454,  0.47565512, NA],
> [NA, NA, NA]],
>
> [[NA,  0.57860351, NA],
> [NA, NA,  0.72012669],
> [ 0.36582123, NA,  0.76289794]],
>
> [[ 0.65322748,  0.92794386, NA],
> [ 0.53745165,  0.97520989,  0.17515083],
> [ 0.71219688,  0.5184328 ,  0.75802805]]])
> >>> np.mean(a, axis=-1)
> array([[NA, NA, NA],
>[NA, NA, NA],
>[NA,  0.56260412,  0.66288591]])
> >>> np.std(a, axis=-1)
> array([[NA, NA, NA],
>[NA, NA, NA],
> [NA,  0.32710662,  0.10384331]])
> >>> np.mean(a, axis=-1, skipna=True)
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474:
> RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
> array([[ 0.11511708,  0.47113483, nan],
>[ 0.57860351,  0.72012669,  0.56435958],
>[ 0.79058567,  0.56260412,  0.66288591]])
> >>> np.std(a, axis=-1, skipna=True)
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707:
> RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe')
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730:
> RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
> array([[ 0.,  0.00452029, nan],
>[ 0.,  0.,  0.19853835],
>[ 0.13735819,  0.32710662,  0.10384331]])
>  >>> np.std(a, axis=(1,2), skipna=True)
> array([ 0.16786895,  0.15498008,  0.23811937])
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>  Hi,
> That is great news!
> (Python2.x will be another email.)
>
> Python3.1 and Python3.2 failed with building 'multiarraymodule_onefile.o'
> but I could not see any obvious reason.
>

I've pushed a change to fix the Python 3 build, it was a use
of Py_TPFLAGS_CHECKTYPES, which is no longer in Python3 but is always
default now. Tested with 3.2.

Thanks!
Mark


>
> I had removed my build directory and then 'python3 setup.py build' but I
> saw this message:
> Running from numpy source directory.
> numpy/core/setup_common.py:86: MismatchCAPIWarning: API mismatch detected,
> the C API version numbers have to be updated. Current C api version is 6,
> with checksum ef5688af03ffa23dd8e11734f5b69313, but recorded checksum for C
> API version 6 in codegen_dir/cversions.txt is
> e61d5dc51fa1c6459328266e215d6987. If functions were added in the C API, you
> have to update C_API_VERSION  in numpy/core/setup_common.py.
>   MismatchCAPIWarning)
>
> Upstream of the build log is below.
>
> Bruce
>
> In file included from
> numpy/core/src/multiarray/multiarraymodule_onefile.c:53:0:
> numpy/core/src/multiarray/na_singleton.c: At top level:
> numpy/core/src/multiarray/na_singleton.c:708:25: error:
> ‘Py_TPFLAGS_CHECKTYPES’ undeclared here (not in a function)
> numpy/core/src/multiarray/common.c:48:1: warning: ‘_use_default_type’
> defined but not used
> numpy/core/src/multiarray/ctors.h:93:1: warning: ‘_arrays_overlap’ declared
> ‘static’ but ne

Re: [Numpy-discussion] longlong format error with Python <= 2.6 in scalartypes.c

2011-08-19 Thread Alok Singhal
On Thu, Aug 18, 2011 at 9:01 PM, Mark Wiebe  wrote:
> On Thu, Aug 4, 2011 at 4:08 PM, Derek Homeier
>  wrote:
>>
>> Hi,
>>
>> commits c15a807e and c135371e (thus most immediately addressed to Mark,
>> but I am sending this to the list hoping for more insight on the issue)
>> introduce a test failure with Python 2.5+2.6 on Mac:
>>
>> FAIL: test_timedelta_scalar_construction (test_datetime.TestDateTime)
>> --
>> Traceback (most recent call last):
>>  File
>> "/Users/derek/lib/python2.6/site-packages/numpy/core/tests/test_datetime.py",
>> line 219, in test_timedelta_scalar_construction
>>    assert_equal(str(np.timedelta64(3, 's')), '3 seconds')
>>  File "/Users/derek/lib/python2.6/site-packages/numpy/testing/utils.py",
>> line 313, in assert_equal
>>    raise AssertionError(msg)
>> AssertionError:
>> Items are not equal:
>>  ACTUAL: '%lld seconds'
>>  DESIRED: '3 seconds'
>>
>> due to the "lld" format passed to PyUString_FromFormat in scalartypes.c.
>> In the current npy_common.h I found the comment
>>  *      in Python 2.6 the %lld formatter is not supported. In this
>>  *      case we work around the problem by using the %zd formatter.
>> though I did not notice that problem when I cleaned up the
>> NPY_LONGLONG_FMT definitions in that file (and it is not entirely clear
>> whether the comment only pertains to Windows...). Anyway changing the
>> formatters in scalartypes.c to "zd" as well removes the failure and still
>> works with Python 2.7 and 3.2 (at least on Mac OS). However I am wondering
>> if
>> a) NPY_[U]LONGLONG_FMT should also be defined conditional to the Python
>> version (and if "%zu" is a valid formatter), and
>> b) scalartypes.c should use NPY_LONGLONG_FMT from npy_common.h
>>
>> I am attaching a patch implementing a), but only the quick and dirty
>> solution to b).
>
> I've touched this stuff as little as possible, because I rather dislike the
> way the *_FMT macros are set up right now. I added a comment about
> NPY_INTP_FMT in npy_common.h which I see you read. If you're going to try to
> fix this, I hope you fix it deeper than this patch so it's not error-prone
> anymore.
> NPY_INTP_FMT is used together with PyErr_Format/PyString_FromFormat, whereas
> the other *_FMT are used with the *printf functions from the C libraries.
> These are not compatible, and the %zd hack was put in place because it
> exists even in Python 2.4, and Py_ssize_t seems matches the  pointer size in
> all CPython versions.
> Switching the timedelta64 format in scalartypes.c.src to "%zd" won't help on
> 32-bit platforms, because it won't be a 64-bit type there, unlike how it
> works ok for the NPY_INTP_FMT. In summary:
> * There need to be changes to create a clear distinction between the *_FMT
> for PyString_FromFormat vs the *_FMT for C library *printf functions
> * I suspect we're out of luck for 32-bit older versions of CPython with
> PyString_FromFormat
> Cheers,
> -Mark

By the way, the above bug is fixed in the current master (see
https://github.com/numpy/numpy/commit/730b861120094b1ab38670b9a8895a36c19296a7).
 I fixed it in the most direct way possible, because "the correct" way
would require changes to a lot of places.



Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Ralf Gommers
On Fri, Aug 19, 2011 at 5:23 PM, Bruce Southey  wrote:

> **
> On 08/19/2011 10:04 AM, Ralf Gommers wrote:
>
>
>
> On Fri, Aug 19, 2011 at 4:55 PM, Bruce Southey  wrote:
>
>>   Hi,
>> I had to rebuild my Python2.6 as a 'normal' version.
>>
>> Anyhow, Python2.4, 2.5, 2.6 and 2.7 all build and pass the numpy tests.
>>
>> Curiously, only tests in Python2.7 give almost no warnings but all the
>> other Python2.x give lots of warnings - Python2.6 and Python2.7 are below.
>> My expectation is that all versions should behave the same regarding
>> printing messages.
>>
>
> This is due to a change in Python 2.7 itself - deprecation warnings are not
> shown anymore by default. Furthermore, all those messages are unrelated to
> Mark's missing data commits.
>
> Cheers,
> Ralf
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>  Yet:
>
> $ python2.6 -c "import numpy; numpy.test()"
> Running unit tests for numpy
> NumPy version 1.6.1
>
> NumPy is installed in /usr/local/lib/python2.6/site-packages/numpy
> Python version 2.6.6 (r266:84292, Aug 19 2011, 09:21:38) [GCC 4.5.1
> 20100924 (Red Hat 4.5.1-4)]
> nose version 1.0.0
> ..K...
> ..K..K
> ..
> ...
> --
> Ran 3533 tests in 22.062s
>
> OK (KNOWNFAIL=3)
>
> Hence why I was curious about all the messages having not seen them.
>
> Is the

Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Jeremy Conlin
On Fri, Aug 19, 2011 at 9:23 AM, Warren Weckesser
 wrote:
>
>
> On Fri, Aug 19, 2011 at 10:09 AM, Jeremy Conlin  wrote:
>>
>> On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen 
>> wrote:
>> > On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin 
>> > wrote:
>> >> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
>> >>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
>>  I would like to use numpy's memmap on some data files I have. The
>>  first
>>  12 or so lines of the files contain text (header information) and the
>>  remainder has the numerical data. Is there a way I can tell memmap to
>>  skip a specified number of lines instead of a number of bytes?
>> >>>
>> >>> First use standard Python I/O functions to determine the number of
>> >>> bytes to skip at the beginning and the number of data items. Then pass
>> >>> in `offset` and `shape` parameters to numpy.memmap.
>> >>
>> >> Thanks for that suggestion. However, I'm unfamiliar with the I/O
>> >> functions you are referring to. Can you point me to do the
>> >> documentation?
>> >>
>> >> Thanks again,
>> >> Jeremy
>> >> ___
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >>
>> >
>> > this might get you started:
>> >
>> >
>> > import numpy as np
>> >
>> > # make some fake data with 12 header lines.
>> > with open('test.mm', 'w') as fhw:
>> >    print >> fhw, "\n".join('header' for i in range(12))
>> >    np.arange(100, dtype=np.uint).tofile(fhw)
>> >
>> > # use normal python io to determine of offset after 12 lines.
>> > with open('test.mm') as fhr:
>> >    for i in range(12): fhr.readline()
>> >    offset = fhr.tell()
>> >
>> > # use the offset in your call to np.memmap.
>> > a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)
>>
>> Thanks, that looks good. I tried it, but it doesn't get the correct
>> data. I really don't understand what is going on. A simple code and
>> sample data is attached if anyone has a chance to look at it.
>
>
> Your data file is all text.  memmap is generally for binary data; it won't
> work with this file.
>
> Warren

Yikes! I missed the "binary" in the first line of the documentation. Sorry!

Jeremy
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Warren Weckesser
On Fri, Aug 19, 2011 at 10:09 AM, Jeremy Conlin  wrote:

> On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen 
> wrote:
> > On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin 
> wrote:
> >> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
> >>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
>  I would like to use numpy's memmap on some data files I have. The
> first
>  12 or so lines of the files contain text (header information) and the
>  remainder has the numerical data. Is there a way I can tell memmap to
>  skip a specified number of lines instead of a number of bytes?
> >>>
> >>> First use standard Python I/O functions to determine the number of
> >>> bytes to skip at the beginning and the number of data items. Then pass
> >>> in `offset` and `shape` parameters to numpy.memmap.
> >>
> >> Thanks for that suggestion. However, I'm unfamiliar with the I/O
> >> functions you are referring to. Can you point me to the
> >> documentation?
> >>
> >> Thanks again,
> >> Jeremy
> >> ___
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >>
> >
> > this might get you started:
> >
> >
> > import numpy as np
> >
> > # make some fake data with 12 header lines.
> > with open('test.mm', 'w') as fhw:
> >print >> fhw, "\n".join('header' for i in range(12))
> >np.arange(100, dtype=np.uint).tofile(fhw)
> >
> > # use normal python io to determine the offset after 12 lines.
> > with open('test.mm') as fhr:
> >for i in range(12): fhr.readline()
> >offset = fhr.tell()
> >
> > # use the offset in your call to np.memmap.
> > a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)
>
> Thanks, that looks good. I tried it, but it doesn't get the correct
> data. I really don't understand what is going on. A simple code and
> sample data is attached if anyone has a chance to look at it.
>


Your data file is all text.  memmap is generally for binary data; it won't
work with this file.

Warren



>
> Thanks,
> Jeremy
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Bruce Southey

On 08/19/2011 10:04 AM, Ralf Gommers wrote:



On Fri, Aug 19, 2011 at 4:55 PM, Bruce Southey wrote:


Hi,
I had to rebuild my Python2.6 as a 'normal' version.

Anyhow, Python2.4, 2.5, 2.6 and 2.7 all build and pass the numpy
tests.

Curiously, only tests in Python2.7 give almost no warnings but all
the other Python2.x give lots of warnings - Python2.6 and
Python2.7 are below. My expectation is that all versions should
behave the same regarding printing messages.


This is due to a change in Python 2.7 itself - deprecation warnings 
are not shown anymore by default. Furthermore, all those messages are 
unrelated to Mark's missing data commits.


Cheers,
Ralf



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Yet:
$ python2.6 -c "import numpy; numpy.test()"
Running unit tests for numpy
NumPy version 1.6.1
NumPy is installed in /usr/local/lib/python2.6/site-packages/numpy
Python version 2.6.6 (r266:84292, Aug 19 2011, 09:21:38) [GCC 4.5.1 
20100924 (Red Hat 4.5.1-4)]

nose version 1.0.0
..K.K..K.
--
Ran 3533 tests in 22.062s

OK (KNOWNFAIL=3)

That's why I was curious about all the messages, not having seen them before.

Is there some plan to clean up these tests rather than 'hide' them?

Bruce
_

Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Brent Pedersen
On Fri, Aug 19, 2011 at 9:09 AM, Jeremy Conlin  wrote:
> On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen  wrote:
>> On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin  wrote:
>>> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
 Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
> I would like to use numpy's memmap on some data files I have. The first
> 12 or so lines of the files contain text (header information) and the
> remainder has the numerical data. Is there a way I can tell memmap to
> skip a specified number of lines instead of a number of bytes?

 First use standard Python I/O functions to determine the number of
 bytes to skip at the beginning and the number of data items. Then pass
 in `offset` and `shape` parameters to numpy.memmap.
>>>
>>> Thanks for that suggestion. However, I'm unfamiliar with the I/O
>>> functions you are referring to. Can you point me to the
>>> documentation?
>>>
>>> Thanks again,
>>> Jeremy
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>> this might get you started:
>>
>>
>> import numpy as np
>>
>> # make some fake data with 12 header lines.
>> with open('test.mm', 'w') as fhw:
>>    print >> fhw, "\n".join('header' for i in range(12))
>>    np.arange(100, dtype=np.uint).tofile(fhw)
>>
>> # use normal python io to determine the offset after 12 lines.
>> with open('test.mm') as fhr:
>>    for i in range(12): fhr.readline()
>>    offset = fhr.tell()
>>
>> # use the offset in your call to np.memmap.
>> a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)
>
> Thanks, that looks good. I tried it, but it doesn't get the correct
> data. I really don't understand what is going on. A simple code and
> sample data is attached if anyone has a chance to look at it.
>
> Thanks,
> Jeremy
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

In that case, I would use:

np.loadtxt('tmp.dat', skiprows=12)
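Since the file in question is plain text, np.loadtxt with skiprows handles the header directly. A minimal self-contained sketch (the file name 'tmp.dat', the 12-line header, and the numeric contents are made up here to mirror the thread):

```python
import numpy as np

# Build a small stand-in for the poster's text file: 12 header lines
# followed by whitespace-separated numeric rows.
with open("tmp.dat", "w") as fh:
    for i in range(12):
        fh.write("header line %d\n" % i)
    for row in range(5):
        fh.write("%d %d %d\n" % (row, 2 * row, 3 * row))

# skiprows makes loadtxt ignore the header before parsing numbers.
data = np.loadtxt("tmp.dat", skiprows=12)
print(data.shape)  # (5, 3)
```

Unlike memmap, this reads the whole file into memory, which is usually fine for text-sized data.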
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Mark Wiebe
On Fri, Aug 19, 2011 at 7:55 AM, Bruce Southey  wrote:

>
> On 08/18/2011 04:43 PM, Mark Wiebe wrote:
>
> It's taken a lot of changes to get the NA mask support to its current
> point, but the code ready for some testing now. You can read the
> work-in-progress release notes here:
>
>
> https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst
>
>  To try it out, check out the missingdata branch from my github account,
> here, and build in the standard way:
>
>  https://github.com/m-paradox/numpy
>
>  The things most important to test are:
>
>  * Confirm that existing code still works correctly. I've tested against
> SciPy and matplotlib.
> * Confirm that the performance of code not using NA masks is the same or
> better.
> * Try to do computations with the NA values, find places they don't work
> yet, and nominate unimplemented functionality important to you to be next on
> the development list. The release notes have a preliminary list of
> implemented/unimplemented functions.
> * Report any crashes, build problems, or unexpected behaviors.
>
>  In addition to adding the NA mask, I've also added features and done a
> few performance changes here and there, like letting reductions like sum
> take lists of axes instead of being a single axis or all of them. These
> changes affect various bugs like
> http://projects.scipy.org/numpy/ticket/1143 and
> http://projects.scipy.org/numpy/ticket/533.
>
>  Thanks!
> Mark
>
>  Here's a small example run using NAs:
>
>  >>> import numpy as np
> >>> np.__version__
> '2.0.0.dev-8a5e2a1'
> >>> a = np.random.rand(3,3,3)
> >>> a.flags.maskna = True
> >>> a[np.random.rand(3,3,3) < 0.5] = np.NA
> >>> a
> array([[[NA, NA,  0.11511708],
> [ 0.46661454,  0.47565512, NA],
> [NA, NA, NA]],
>
> [[NA,  0.57860351, NA],
> [NA, NA,  0.72012669],
> [ 0.36582123, NA,  0.76289794]],
>
> [[ 0.65322748,  0.92794386, NA],
> [ 0.53745165,  0.97520989,  0.17515083],
> [ 0.71219688,  0.5184328 ,  0.75802805]]])
> >>> np.mean(a, axis=-1)
> array([[NA, NA, NA],
>[NA, NA, NA],
>[NA,  0.56260412,  0.66288591]])
> >>> np.std(a, axis=-1)
> array([[NA, NA, NA],
>[NA, NA, NA],
> [NA,  0.32710662,  0.10384331]])
> >>> np.mean(a, axis=-1, skipna=True)
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474:
> RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
> array([[ 0.11511708,  0.47113483, nan],
>[ 0.57860351,  0.72012669,  0.56435958],
>[ 0.79058567,  0.56260412,  0.66288591]])
> >>> np.std(a, axis=-1, skipna=True)
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707:
> RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe')
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730:
> RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
> array([[ 0.,  0.00452029, nan],
>[ 0.,  0.,  0.19853835],
>[ 0.13735819,  0.32710662,  0.10384331]])
>  >>> np.std(a, axis=(1,2), skipna=True)
> array([ 0.16786895,  0.15498008,  0.23811937])
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>  Hi,
> I had to rebuild my Python2.6 as a 'normal' version.
>
> Anyhow, Python2.4, 2.5, 2.6 and 2.7 all build and pass the numpy tests.
>

Thanks for running the tests!

>
> Curiously, only tests in Python2.7 give almost no warnings but all the
> other Python2.x give lots of warnings - Python2.6 and Python2.7 are below.
> My expectation is that all versions should behave the same regarding
> printing messages.
>

The lack of deprecation warnings is because you need to add -Wd explicitly
when you run under 2.7. There was an idea to make this the default from
within the test suite execution code, but no one has stepped up and
implemented that. See here:

http://projects.scipy.org/numpy/ticket/1894
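Besides passing -Wd on the command line, the warning filter can also be reset from inside Python, which is roughly what the ticket proposes doing in the test-suite runner. A sketch (not NumPy-specific):

```python
import warnings

# DeprecationWarning is ignored by default from Python 2.7 onward;
# "always" re-enables it, similar in effect to running `python -Wd`.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    warnings.warn("this API is deprecated", DeprecationWarning)

# The warning was emitted and captured instead of being silenced.
assert len(caught) == 1
assert caught[0].category is DeprecationWarning
```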


> Also the message 'Need pytz library to test datetime timezones' means that
> there are invalid tests that have to be rewritten (ticket 1939:
> http://projects.scipy.org/numpy/ticket/1939 ).
>

I did it this way because Python has no timezone objects built in, just
provides the interface. If someone is willing to copy or write timezone
instances into the testsuite to fix this I would be very grateful!

I think all these policies I keep breaking should be written down somewhere.
I don't think it's reasonable to call something a community/project policy
unless a particular wording of it in an easily discoverable official
document has been agreed upon by the community. I nominate this as a new
policy. ;)

Thanks,
Mark


>
> Bruce
>
> $

Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Jeremy Conlin
On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen  wrote:
> On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin  wrote:
>> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
>>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
 I would like to use numpy's memmap on some data files I have. The first
 12 or so lines of the files contain text (header information) and the
 remainder has the numerical data. Is there a way I can tell memmap to
 skip a specified number of lines instead of a number of bytes?
>>>
>>> First use standard Python I/O functions to determine the number of
>>> bytes to skip at the beginning and the number of data items. Then pass
>>> in `offset` and `shape` parameters to numpy.memmap.
>>
>> Thanks for that suggestion. However, I'm unfamiliar with the I/O
>> functions you are referring to. Can you point me to the
>> documentation?
>>
>> Thanks again,
>> Jeremy
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> this might get you started:
>
>
> import numpy as np
>
> # make some fake data with 12 header lines.
> with open('test.mm', 'w') as fhw:
>    print >> fhw, "\n".join('header' for i in range(12))
>    np.arange(100, dtype=np.uint).tofile(fhw)
>
> # use normal python io to determine the offset after 12 lines.
> with open('test.mm') as fhr:
>    for i in range(12): fhr.readline()
>    offset = fhr.tell()
>
> # use the offset in your call to np.memmap.
> a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)

Thanks, that looks good. I tried it, but it doesn't get the correct
data. I really don't understand what is going on. A simple code and
sample data is attached if anyone has a chance to look at it.

Thanks,
Jeremy


tmp.dat
Description: Binary data


tmp.py
Description: Binary data
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Ralf Gommers
On Fri, Aug 19, 2011 at 4:55 PM, Bruce Southey  wrote:

>
>
>  Hi,
> I had to rebuild my Python2.6 as a 'normal' version.
>
> Anyhow, Python2.4, 2.5, 2.6 and 2.7 all build and pass the numpy tests.
>
> Curiously, only tests in Python2.7 give almost no warnings but all the
> other Python2.x give lots of warnings - Python2.6 and Python2.7 are below.
> My expectation is that all versions should behave the same regarding
> printing messages.
>

This is due to a change in Python 2.7 itself - deprecation warnings are not
shown anymore by default. Furthermore, all those messages are unrelated to
Mark's missing data commits.

Cheers,
Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Bruce Southey

On 08/18/2011 04:43 PM, Mark Wiebe wrote:
It's taken a lot of changes to get the NA mask support to its current 
point, but the code ready for some testing now. You can read the 
work-in-progress release notes here:


https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst

To try it out, check out the missingdata branch from my github 
account, here, and build in the standard way:


https://github.com/m-paradox/numpy

The things most important to test are:

* Confirm that existing code still works correctly. I've tested 
against SciPy and matplotlib.
* Confirm that the performance of code not using NA masks is the same 
or better.
* Try to do computations with the NA values, find places they don't 
work yet, and nominate unimplemented functionality important to you to 
be next on the development list. The release notes have a preliminary 
list of implemented/unimplemented functions.

* Report any crashes, build problems, or unexpected behaviors.

In addition to adding the NA mask, I've also added features and done a 
few performance changes here and there, like letting reductions like 
sum take lists of axes instead of being a single axis or all of them. 
These changes affect various bugs like 
http://projects.scipy.org/numpy/ticket/1143 and 
http://projects.scipy.org/numpy/ticket/533.


Thanks!
Mark

Here's a small example run using NAs:

>>> import numpy as np
>>> np.__version__
'2.0.0.dev-8a5e2a1'
>>> a = np.random.rand(3,3,3)
>>> a.flags.maskna = True
>>> a[np.random.rand(3,3,3) < 0.5] = np.NA
>>> a
array([[[NA, NA,  0.11511708],
[ 0.46661454,  0.47565512, NA],
[NA, NA, NA]],

   [[NA,  0.57860351, NA],
[NA, NA,  0.72012669],
[ 0.36582123, NA,  0.76289794]],

   [[ 0.65322748,  0.92794386, NA],
[ 0.53745165,  0.97520989,  0.17515083],
[ 0.71219688,  0.5184328 ,  0.75802805]]])
>>> np.mean(a, axis=-1)
array([[NA, NA, NA],
   [NA, NA, NA],
   [NA,  0.56260412,  0.66288591]])
>>> np.std(a, axis=-1)
array([[NA, NA, NA],
   [NA, NA, NA],
   [NA,  0.32710662,  0.10384331]])
>>> np.mean(a, axis=-1, skipna=True)
/home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474: 
RuntimeWarning: invalid value encountered in true_divide

  um.true_divide(ret, rcount, out=ret, casting='unsafe')
array([[ 0.11511708,  0.47113483, nan],
   [ 0.57860351,  0.72012669,  0.56435958],
   [ 0.79058567,  0.56260412,  0.66288591]])
>>> np.std(a, axis=-1, skipna=True)
/home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707: 
RuntimeWarning: invalid value encountered in true_divide

  um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe')
/home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730: 
RuntimeWarning: invalid value encountered in true_divide

  um.true_divide(ret, rcount, out=ret, casting='unsafe')
array([[ 0.,  0.00452029, nan],
   [ 0.,  0.,  0.19853835],
   [ 0.13735819,  0.32710662,  0.10384331]])
>>> np.std(a, axis=(1,2), skipna=True)
array([ 0.16786895,  0.15498008,  0.23811937])


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Hi,
I had to rebuild my Python2.6 as a 'normal' version.

Anyhow, Python2.4, 2.5, 2.6 and 2.7 all build and pass the numpy tests.

Curiously, only tests in Python2.7 give almost no warnings but all the 
other Python2.x give lots of warnings - Python2.6 and Python2.7 are 
below. My expectation is that all versions should behave the same 
regarding printing messages.


Also the message 'Need pytz library to test datetime timezones' means 
that there are invalid tests that have to be rewritten (ticket 1939:  
http://projects.scipy.org/numpy/ticket/1939 ).


Bruce

$ python2.6 -c "import numpy; numpy.test()"
Running unit tests for numpy
NumPy version 2.0.0.dev-93236a2
NumPy is installed in /usr/local/lib/python2.6/site-packages/numpy
Python version 2.6.6 (r266:84292, Aug 19 2011, 09:21:38) [GCC 4.5.1 
20100924 (Red Hat 4.5.1-4)]

nose version 1.0.0
/usr/local/lib/python2.6/site-packages/numpy/core/tests/test_datetime.py:1313: 
UserWarning: Need pytz library to test datetime timezones

  warnings.warn("Need pytz library to test datetime timezones")
.../usr/local/lib/python2.6/unittest.py:336: 
DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because 
they are platform specific. Use 'O' instead

  callableObj(*args, **kwargs)
./usr/local/lib/python2.6/site-packages/numpy/core/_internal.p

Re: [Numpy-discussion] summing an array

2011-08-19 Thread Chris Withers
On 18/08/2011 07:58, Bob Dowling wrote:
>
>   >>>  numpy.add.accumulate(a)
> array([ 0,  1,  3,  6, 10])
>
>   >>>  numpy.add.accumulate(a, out=a)
> array([ 0,  1,  3,  6, 10])

What's the difference between numpy.cumsum and numpy.add.accumulate?

Where can I find the reference docs for these?
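As a quick check outside the thread: the two produce identical results. np.cumsum is the convenience function, while np.add.accumulate is the generic ufunc method (documented under numpy.ufunc.accumulate), which works for any binary ufunc:

```python
import numpy as np

a = np.arange(5)

# cumsum is a thin convenience wrapper around add.accumulate.
assert np.array_equal(np.cumsum(a), np.add.accumulate(a))
print(np.add.accumulate(a).tolist())  # [0, 1, 3, 6, 10]

# accumulate also exists for other ufuncs, e.g. a running product:
print(np.multiply.accumulate(np.arange(1, 6)).tolist())  # [1, 2, 6, 24, 120]
```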

cheers,

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
 - http://www.simplistix.co.uk
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NA masks for NumPy are ready to test

2011-08-19 Thread Bruce Southey

On 08/18/2011 04:43 PM, Mark Wiebe wrote:
It's taken a lot of changes to get the NA mask support to its current 
point, but the code ready for some testing now. You can read the 
work-in-progress release notes here:


https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst

To try it out, check out the missingdata branch from my github 
account, here, and build in the standard way:


https://github.com/m-paradox/numpy

The things most important to test are:

* Confirm that existing code still works correctly. I've tested 
against SciPy and matplotlib.
* Confirm that the performance of code not using NA masks is the same 
or better.
* Try to do computations with the NA values, find places they don't 
work yet, and nominate unimplemented functionality important to you to 
be next on the development list. The release notes have a preliminary 
list of implemented/unimplemented functions.

* Report any crashes, build problems, or unexpected behaviors.

In addition to adding the NA mask, I've also added features and done a 
few performance changes here and there, like letting reductions like 
sum take lists of axes instead of being a single axis or all of them. 
These changes affect various bugs like 
http://projects.scipy.org/numpy/ticket/1143 and 
http://projects.scipy.org/numpy/ticket/533.


Thanks!
Mark

Here's a small example run using NAs:

>>> import numpy as np
>>> np.__version__
'2.0.0.dev-8a5e2a1'
>>> a = np.random.rand(3,3,3)
>>> a.flags.maskna = True
>>> a[np.random.rand(3,3,3) < 0.5] = np.NA
>>> a
array([[[NA, NA,  0.11511708],
[ 0.46661454,  0.47565512, NA],
[NA, NA, NA]],

   [[NA,  0.57860351, NA],
[NA, NA,  0.72012669],
[ 0.36582123, NA,  0.76289794]],

   [[ 0.65322748,  0.92794386, NA],
[ 0.53745165,  0.97520989,  0.17515083],
[ 0.71219688,  0.5184328 ,  0.75802805]]])
>>> np.mean(a, axis=-1)
array([[NA, NA, NA],
   [NA, NA, NA],
   [NA,  0.56260412,  0.66288591]])
>>> np.std(a, axis=-1)
array([[NA, NA, NA],
   [NA, NA, NA],
   [NA,  0.32710662,  0.10384331]])
>>> np.mean(a, axis=-1, skipna=True)
/home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474: 
RuntimeWarning: invalid value encountered in true_divide

  um.true_divide(ret, rcount, out=ret, casting='unsafe')
array([[ 0.11511708,  0.47113483, nan],
   [ 0.57860351,  0.72012669,  0.56435958],
   [ 0.79058567,  0.56260412,  0.66288591]])
>>> np.std(a, axis=-1, skipna=True)
/home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707: 
RuntimeWarning: invalid value encountered in true_divide

  um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe')
/home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730: 
RuntimeWarning: invalid value encountered in true_divide

  um.true_divide(ret, rcount, out=ret, casting='unsafe')
array([[ 0.,  0.00452029, nan],
   [ 0.,  0.,  0.19853835],
   [ 0.13735819,  0.32710662,  0.10384331]])
>>> np.std(a, axis=(1,2), skipna=True)
array([ 0.16786895,  0.15498008,  0.23811937])


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Hi,
That is great news!
(Python2.x will be another email.)

Python3.1 and Python3.2 failed while building 
'multiarraymodule_onefile.o', but I could not see any obvious reason.


I had removed my build directory and then ran 'python3 setup.py build', 
but I saw this message:

Running from numpy source directory.
numpy/core/setup_common.py:86: MismatchCAPIWarning: API mismatch 
detected, the C API version numbers have to be updated. Current C api 
version is 6, with checksum ef5688af03ffa23dd8e11734f5b69313, but 
recorded checksum for C API version 6 in codegen_dir/cversions.txt is 
e61d5dc51fa1c6459328266e215d6987. If functions were added in the C API, 
you have to update C_API_VERSION  in numpy/core/setup_common.py.

  MismatchCAPIWarning)

Upstream of the build log is below.

Bruce

In file included from 
numpy/core/src/multiarray/multiarraymodule_onefile.c:53:0:

numpy/core/src/multiarray/na_singleton.c: At top level:
numpy/core/src/multiarray/na_singleton.c:708:25: error: 
‘Py_TPFLAGS_CHECKTYPES’ undeclared here (not in a function)
numpy/core/src/multiarray/common.c:48:1: warning: ‘_use_default_type’ 
defined but not used
numpy/core/src/multiarray/ctors.h:93:1: warning: ‘_arrays_overlap’ 
declared ‘static’ but never defined
numpy/core/src/multiarray/scalartypes.c.src:2251:1: warning: 
‘gentype_getsegcount’ defined but not used
numpy/core/src/multiarray/scalartypes.c.src:2269:1: warning: 
‘gentype_getcharbuf’ defined but not used
numpy/core/src/multiarray/mapping.c:110:1: warning: ‘_array_ass_item’ 
defined but not used
numpy/core/src/multiarray/number.c:266:1: warning: ‘array_divide’ 
defined but not used
numpy/core/src/multiarray/number.c

Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Pearu Peterson


On 08/19/2011 05:01 PM, Brent Pedersen wrote:
> On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin  wrote:
>> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
>>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
 I would like to use numpy's memmap on some data files I have. The first
 12 or so lines of the files contain text (header information) and the
 remainder has the numerical data. Is there a way I can tell memmap to
 skip a specified number of lines instead of a number of bytes?
>>>
>>> First use standard Python I/O functions to determine the number of
>>> bytes to skip at the beginning and the number of data items. Then pass
>>> in `offset` and `shape` parameters to numpy.memmap.
>>
>> Thanks for that suggestion. However, I'm unfamiliar with the I/O
>> functions you are referring to. Can you point me to the
>> documentation?
>>
>> Thanks again,
>> Jeremy
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> this might get you started:
>
>
> import numpy as np
>
> # make some fake data with 12 header lines.
> with open('test.mm', 'w') as fhw:
>  print >> fhw, "\n".join('header' for i in range(12))
>  np.arange(100, dtype=np.uint).tofile(fhw)
>
> # use normal python io to determine of offset after 12 lines.
> with open('test.mm') as fhr:
>  for i in range(12): fhr.readline()
>  offset = fhr.tell()

I think that before reading a line the program should
check whether the line starts with "#". Otherwise fhr.readline()
may return a very large chunk of data (possibly the rest of the file
content) that ought to be read only via memmap.

HTH,
Pearu
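A sketch of that safeguard, reusing Brent's test.mm setup but marking header lines with '#' (the marker character and the readline size cap are assumptions, not from the thread):

```python
import numpy as np

# Fake data file: '#'-marked header lines followed by raw binary values.
with open("test.mm", "wb") as fhw:
    for i in range(12):
        fhw.write(b"# header %d\n" % i)
    np.arange(100, dtype=np.uint64).tofile(fhw)

# Consume lines only while they start with '#'; cap each read so that
# newline-free binary data is never slurped in one giant readline().
with open("test.mm", "rb") as fhr:
    while True:
        pos = fhr.tell()
        line = fhr.readline(80)
        if not line.startswith(b"#"):
            offset = pos  # first byte past the header
            break

a = np.memmap("test.mm", mode="r", dtype=np.uint64, offset=offset)
assert (a == np.arange(100)).all()
```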
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Brent Pedersen
On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin  wrote:
> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
>>> I would like to use numpy's memmap on some data files I have. The first
>>> 12 or so lines of the files contain text (header information) and the
>>> remainder has the numerical data. Is there a way I can tell memmap to
>>> skip a specified number of lines instead of a number of bytes?
>>
>> First use standard Python I/O functions to determine the number of
>> bytes to skip at the beginning and the number of data items. Then pass
>> in `offset` and `shape` parameters to numpy.memmap.
>
> Thanks for that suggestion. However, I'm unfamiliar with the I/O
> functions you are referring to. Can you point me to do the
> documentation?
>
> Thanks again,
> Jeremy
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

this might get you started:


import numpy as np

# make some fake data with 12 header lines.
with open('test.mm', 'w') as fhw:
    print >> fhw, "\n".join('header' for i in range(12))
    np.arange(100, dtype=np.uint).tofile(fhw)

# use normal python io to determine the offset after 12 lines.
with open('test.mm') as fhr:
    for i in range(12): fhr.readline()
    offset = fhr.tell()

# use the offset in your call to np.memmap.
a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)

assert all(a == np.arange(100))
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-User] disabling SVN (was: Trouble installing scipy after upgrading to Mac OS X 10.7 aka Lion)

2011-08-19 Thread Ralf Gommers
On Fri, Aug 19, 2011 at 2:48 PM, Pauli Virtanen  wrote:

> Fri, 19 Aug 2011 12:48:29 +0200, Ralf Gommers wrote:
> [clip]
> > Hi Ognen,
> >
> > Could you please disable http access to numpy and scipy svn?
>
> It turns out I also had enough permissions to disable this. Now:
>
> $ svn co http://svn.scipy.org/svn/numpy/trunk numpy
> svn: Repository moved permanently to 'http://github.com/numpy/numpy/';
> please relocate
>
>
A helpful message even, nice touch.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Jeremy Conlin
On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
>> I would like to use numpy's memmap on some data files I have. The first
>> 12 or so lines of the files contain text (header information) and the
>> remainder has the numerical data. Is there a way I can tell memmap to
>> skip a specified number of lines instead of a number of bytes?
>
> First use standard Python I/O functions to determine the number of
> bytes to skip at the beginning and the number of data items. Then pass
> in `offset` and `shape` parameters to numpy.memmap.

Thanks for that suggestion. However, I'm unfamiliar with the I/O
functions you are referring to. Can you point me to the
documentation?

Thanks again,
Jeremy
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Pauli Virtanen
Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
> I would like to use numpy's memmap on some data files I have. The first
> 12 or so lines of the files contain text (header information) and the
> remainder has the numerical data. Is there a way I can tell memmap to
> skip a specified number of lines instead of a number of bytes?

First use standard Python I/O functions to determine the number of
bytes to skip at the beginning and the number of data items. Then pass
in `offset` and `shape` parameters to numpy.memmap.
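A minimal sketch of those two steps, including the shape parameter (the file name, the 12-line header, and the float64 payload are invented for illustration):

```python
import os
import numpy as np

# Fake file: 12 text header lines followed by 100 binary float64 values.
with open("data.bin", "wb") as fh:
    fh.write(b"header line\n" * 12)
    np.linspace(0.0, 1.0, 100).tofile(fh)

# Step 1: ordinary file I/O finds where the binary payload starts.
with open("data.bin", "rb") as fh:
    for _ in range(12):
        fh.readline()
    offset = fh.tell()

# Step 2: derive the item count from the remaining bytes, then map it.
itemsize = np.dtype(np.float64).itemsize
n_items = (os.path.getsize("data.bin") - offset) // itemsize
mm = np.memmap("data.bin", mode="r", dtype=np.float64,
               offset=offset, shape=(n_items,))
print(mm.shape)  # (100,)
```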

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Jeremy Conlin
I would like to use numpy's memmap on some data files I have. The
first 12 or so lines of the files contain text (header information)
and the remainder has the numerical data. Is there a way I can tell
memmap to skip a specified number of lines instead of a number of
bytes?

Thanks,
Jeremy
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Build of current Git HEAD for NumPy fails

2011-08-19 Thread Dirk Ullrich
Hi Pearu,

2011/8/19 Pearu Peterson :
>
>
> On 08/19/2011 02:26 PM, Dirk Ullrich wrote:
>> Hi,
>>
>> when trying to build current Git HEAD of NumPy with - both for
>> $PYTHON=python2 or $PYTHON=python3:
>>
>> $PYTHON setup.py config_fc --fcompiler=gnu95 install --prefix=$WHATEVER
>>
>> I get the following error - here for PYTHON=python3.2
>
> The command works fine here with Numpy HEAD and Python 2.7.
> Btw, why do you specify --fcompiler=gnu95 for numpy? Numpy
> has no Fortran sources, so a Fortran compiler is not needed
> for building Numpy (unless you use Fortran libraries
> for numpy.linalg).
>
I do use LAPACK. Sorry for not mentioning it.
>> running build_clib
> ...
>>    File 
>> "/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/command/build_clib.py",
>> line 179, in build_a_library
>>      fcompiler.extra_f77_compile_args =
>> build_info.get('extra_f77_compile_args') or []
>> AttributeError: 'str' object has no attribute 'extra_f77_compile_args'
>
> Reading the code, I don't see how this can happen. Very strange.
> Anyway, I cleaned up build_clib to follow similar coding convention
> as in build_ext. Could you try numpy head again?
>[...]
Now it seems to work for both Python 3.2 and 2.7.

Thank you very much, Pearu!

Dirk


Re: [Numpy-discussion] [SciPy-User] disabling SVN (was: Trouble installing scipy after upgrading to Mac OS X 10.7 aka Lion)

2011-08-19 Thread Pauli Virtanen
Fri, 19 Aug 2011 12:48:29 +0200, Ralf Gommers wrote:
[clip]
> Hi Ognen,
> 
> Could you please disable http access to numpy and scipy svn?

Turns out also I had enough permissions to disable this. Now:

$ svn co http://svn.scipy.org/svn/numpy/trunk numpy
svn: Repository moved permanently to 'http://github.com/numpy/numpy/'; please 
relocate



Re: [Numpy-discussion] Build of current Git HEAD for NumPy fails

2011-08-19 Thread Pearu Peterson


On 08/19/2011 02:26 PM, Dirk Ullrich wrote:
> Hi,
>
> when trying to build the current Git HEAD of NumPy, for both
> $PYTHON=python2 and $PYTHON=python3, with:
>
> $PYTHON setup.py config_fc --fcompiler=gnu95 install --prefix=$WHATEVER
>
> I get the following error - here for PYTHON=python3.2

The command works fine here with Numpy HEAD and Python 2.7.
Btw, why do you specify --fcompiler=gnu95 for numpy? Numpy
has no Fortran sources, so a Fortran compiler is not needed
for building Numpy (unless you use Fortran libraries
for numpy.linalg).

> running build_clib
...
>File 
> "/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/command/build_clib.py",
> line 179, in build_a_library
>  fcompiler.extra_f77_compile_args =
> build_info.get('extra_f77_compile_args') or []
> AttributeError: 'str' object has no attribute 'extra_f77_compile_args'

Reading the code, I don't see how this can happen. Very strange.
Anyway, I cleaned up build_clib to follow similar coding convention
as in build_ext. Could you try numpy head again?

Regards,
Pearu


[Numpy-discussion] Build of current Git HEAD for NumPy fails

2011-08-19 Thread Dirk Ullrich
Hi,

when trying to build the current Git HEAD of NumPy, for both
$PYTHON=python2 and $PYTHON=python3, with:

$PYTHON setup.py config_fc --fcompiler=gnu95 install --prefix=$WHATEVER

I get the following error - here for PYTHON=python3.2

running build_clib
customize UnixCCompiler
customize UnixCCompiler using build_clib
building 'npymath' library
Traceback (most recent call last):
  File "setup.py", line 214, in 
setup_package()
  File "setup.py", line 207, in setup_package
configuration=configuration )
  File 
"/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/core.py",
line 186, in setup
return old_setup(**new_attr)
  File "/usr/lib/python3.2/distutils/core.py", line 150, in setup
dist.run_commands()
  File "/usr/lib/python3.2/distutils/dist.py", line 919, in run_commands
self.run_command(cmd)
  File "/usr/lib/python3.2/distutils/dist.py", line 938, in run_command
cmd_obj.run()
  File 
"/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/command/build.py",
line 37, in run
old_build.run(self)
  File "/usr/lib/python3.2/distutils/command/build.py", line 128, in run
self.run_command(cmd_name)
  File "/usr/lib/python3.2/distutils/cmd.py", line 315, in run_command
self.distribution.run_command(command)
  File "/usr/lib/python3.2/distutils/dist.py", line 938, in run_command
cmd_obj.run()
  File 
"/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/command/build_clib.py",
line 100, in run
self.build_libraries(self.libraries)
  File 
"/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/command/build_clib.py",
line 119, in build_libraries
self.build_a_library(build_info, lib_name, libraries)
  File 
"/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/command/build_clib.py",
line 179, in build_a_library
fcompiler.extra_f77_compile_args =
build_info.get('extra_f77_compile_args') or []
AttributeError: 'str' object has no attribute 'extra_f77_compile_args'

It seems that the value of `fcompiler' in line 179 of
`numpy/distutils/command/build_clib.py' is not properly initialized as
an appropriate compiler object.
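The diagnosis above can be illustrated with a minimal, hypothetical sketch: if `fcompiler` ends up being a plain string (e.g. the compiler name) rather than a compiler object, the attribute assignment at line 179 fails exactly as in the traceback.

```python
# Minimal illustration of the failure mode (hypothetical setup; the
# variable name mirrors the traceback above).  Assigning an attribute
# to a plain string raises AttributeError, because str instances have
# no __dict__ to store new attributes in.
fcompiler = "gnu95"  # a string where a compiler object was expected
try:
    fcompiler.extra_f77_compile_args = []
except AttributeError as exc:
    print(type(exc).__name__, exc)
```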

Dirk


Re: [Numpy-discussion] [SciPy-User] disabling SVN (was: Trouble installing scipy after upgrading to Mac OS X 10.7 aka Lion)

2011-08-19 Thread Ralf Gommers
On Tue, Aug 16, 2011 at 3:01 PM, Pauli Virtanen  wrote:

> Sat, 13 Aug 2011 22:00:33 -0400, josef.pktd wrote:
> [clip]
> > Does Trac require svn access to dig out old information? for example
> > links to old changesets, annotate/blame, ... ?
>
> It does not require HTTP access to SVN, as it looks directly at the
> SVN repo on the local disk.
>
> It also probably doesn't use the old SVN repo for anything in reality,
> as there's a simple Git plugin installed that just grabs the Git history
> to the timeline, and redirects source browsing etc to Github.
> However, I don't know whether the timeline views etc continue to
> function even without the local SVN repo, so I'd just disable the HTTP
> access and leave the local repo as it is as a backup.
>
>
Hi Ognen,

Could you please disable http access to numpy and scipy svn?

Thanks a lot,
Ralf