Re: [Numpy-discussion] finding elements that match any in a set

2011-05-29 Thread Neil Crighton
Michael Katz michaeladamkatz at yahoo.com writes:

 Yes, thanks, np.in1d is what I needed. I didn't know how to find that.

Did you check in the documentation? If so, where did you check? Would you have
found it if it had been in the 'See also' section of where()?

(http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html)

I ask because people often post to the list needing in1d() after not being 
able to find it via the docs, so it would be nice to add references in
the places people go looking for it.
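
For anyone who finds this thread via a search, the operation in question
looks like this (array values made up):

    import numpy as np

    data = np.array([1, 5, 7, 2, 5])
    wanted = [2, 5]
    mask = np.in1d(data, wanted)
    # mask -> array([False,  True, False,  True,  True])
    # data[mask] -> array([5, 2, 5])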

Neil




Re: [Numpy-discussion] Boolean arrays

2010-09-02 Thread Neil Crighton

 
 Ideally, I would like in1d to always be the right answer to this problem. It
 should be easy to put in an if statement to switch to a kern_in()-type
 function in the case of large ar1 but small ar2.  I will do some timing tests
 and make a patch.
 

I uploaded a timing test and a patch to arraysetops.py here:

http://projects.scipy.org/numpy/ticket/1603

The new in1d() uses the kern_in algorithm when it's faster, and the existing 
algorithm otherwise. The speedup compared to the old in1d() for cases with very 
large ar1 and small ar2 can be up to 10x on my laptop.
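
For anyone reading along without opening the ticket, the two regimes look
roughly like this (an illustrative sketch, not the actual patch; the
crossover ratio here is invented, the real one comes from the timing tests):

    import numpy as np

    def in1d_small_ar2(ar1, ar2):
        # kern_in-style test: one vectorised comparison per element of
        # ar2, so it's cheap when ar2 is small however large ar1 is.
        mask = np.zeros(len(ar1), dtype=bool)
        for item in ar2:
            mask |= (ar1 == item)
        return mask

    def in1d_hybrid(ar1, ar2, ratio=0.06):
        # Switch between the loop above and the sort-based np.in1d.
        if len(ar2) < ratio * len(ar1):
            return in1d_small_ar2(ar1, ar2)
        return np.in1d(ar1, ar2)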

If someone with commit access could take a look and apply it if it's OK, that
would be great.

Thanks,
Neil







Re: [Numpy-discussion] BOF notes: Fernando's proposal : NumPy ndarray with named axes

2010-07-12 Thread Neil Crighton
Gael Varoquaux gael.varoquaux at normalesup.org writes:

 Let's say that you have a dataset that is in a 3D array, where axis 0
 corresponds to days, axis 1 to hours of the day, and axis 2 to
 temperature, you might want to have the mean of the temperature in each
 day, which would be in current numpy:
 
 data.mean(axis=0)
 
 or the mean of the temperature at every hour, across the different days,
 which would be:
 
 data.mean(axis=1)
 
 I do such manipulation all the time, and keeping track of which axis is
 what is fairly tedious and error prone. It would be much nicer to be able
 to write:
 
 data.ax_day.mean(axis=0)
 data.ax_hour.mean(axis=0)
 

Thanks, that's a really nice description. Instead of

data.ax_day.mean(axis=0)

I think it would be clearer to do something like 

data.mean(axis='day')

but I see the motivation.
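
A minimal sketch of how the axis='name' idea could work as a thin wrapper
(names invented; this is not datarray's actual API):

    import numpy as np

    class NamedAxes(object):
        def __init__(self, data, axes):
            self.data = np.asarray(data)
            self.axes = tuple(axes)           # e.g. ('day', 'hour')

        def mean(self, axis):
            if isinstance(axis, str):
                axis = self.axes.index(axis)  # map name -> position
            return self.data.mean(axis=axis)

    data = NamedAxes(np.random.rand(365, 24), axes=('day', 'hour'))
    data.mean(axis='day')     # same as .mean(axis=0)
    data.mean(axis='hour')    # same as .mean(axis=1)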


Neil






Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-12 Thread Neil Crighton
Rob Speer rspeer at MIT.EDU writes:

 It's not just about the rows: a 2-D datarray can also index by
 columns, an operation that has no equivalent in a 1-D array of records
 like your example.

rec['305'] effectively indexes by column. This is one of the main attractions
of structured/record arrays.
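
For example (field names invented):

    import numpy as np

    rec = np.array([(1, 20.5), (2, 21.3), (3, 19.8)],
                   dtype=[('id', int), ('305', float)])
    rec['305']    # the whole '305' column: array([ 20.5,  21.3,  19.8])
    rec[1]        # the second row: (2, 21.3)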






Re: [Numpy-discussion] BOF notes: Fernando's proposal : NumPy ndarray with named axes

2010-07-11 Thread Neil Crighton
Robert Kern robert.kern at gmail.com writes:
 
 Please install Fernando's datarray package, play with it, read its
 documentation, then come back with objections or alternatives. I
 really don't think you understand what is being proposed.
 

Well the discussion has been pretty confusing. For mostly my
benefit, here's my understanding of the proposal.

Currently the only way to choose which axis of an array we want
is by the indexing position.  So to access a row of a 2d
array (axis=0)

a[0,:]  or just a[0] # first row

and a column (axis=1)

a[:,0]# first column

To choose an individual element along an axis we must use integer
indices:

a[0,3]   # the element that is 1st along the 1st axis and 
 # 4th along the 2nd axis

Fernando's proposal would allow us to specify the axis by a
name (called a label) instead of a position, and the element
number by a name (called a tick) instead of an integer, while
retaining the old position + integer indexing. Ticks are in
effect named indices.  I can see the attraction of accessing an
axis by name instead of indexing position, because it's easy to
get confused over position when you've got a 2d or higher
dimension array. But the utility of named indices is not so clear
to me. As I understand it, these new arrays will still only be
able to have a single type of data (one of float, str, int and so
on). This seems to be pretty limiting.

What is a use case for the new array type that can't be solved by
structured/record arrays?  Sounds like it was decided at the SciPy
BOF they were a good idea, several people have implemented a
version of them and Fernando and Gael have both said they find
them useful, so they must have something going for them.  Maybe
Fernando or Gael could share an example where arrays with named
axes and indices are especially useful, for the peanut gallery's
benefit?

Cheers, Neil






Re: [Numpy-discussion] reduce array by computing min/max every n samples

2010-06-21 Thread Neil Crighton
Warren Weckesser warren.weckesser at enthought.com writes:

 
 Benjamin Root wrote:
  Brad, I think you are doing it the right way, but I think what is 
  happening is that the reshape() call on the sliced array is forcing a 
  copy to be made first.  The fact that the copy has to be made twice 
  just worsens the issue.  I would save a copy of the reshape result (it 
  is usually a view of the original data, unless a copy is forced), and 
  then perform a min/max call on that with the appropriate axis.
 
  On that note, would it be a bad idea to have a function that returns a 
  min/max tuple?
 
 +1.  More than once I've wanted exactly such a function.
 

I also think this would be useful. For what it's worth, IDL also has a function
called minmax() that does this (e.g.
http://astro.uni-tuebingen.de/software/idl/astrolib/misc/minmax.html)
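
For the original question, the reshape approach might look like this (a
sketch; it assumes len(x) is a multiple of n):

    import numpy as np

    def minmax_every_n(x, n):
        # Each row of blocks holds n consecutive samples.
        blocks = x.reshape(-1, n)    # a view if x is contiguous
        return blocks.min(axis=1), blocks.max(axis=1)

    x = np.arange(12)
    mins, maxs = minmax_every_n(x, 4)
    # mins -> array([0, 4, 8]),  maxs -> array([ 3,  7, 11])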

Neil



Re: [Numpy-discussion] chararray stripping trailing whitespace a bug?

2010-05-10 Thread Neil Crighton

 
 This is an intentional feature, not a bug.
 
 Chris
 

Ah, ok, thanks. I missed the explanation in the doc string because I'm using
version 1.3 and forgot to check the web docs.

For the record, this was my bug: I read a FITS binary table with pyfits.  One
of the table fields was a chararray containing a bunch of flags
('A','B','C','D').
I tried to use in1d() to identify all entries with flags of 'C' or 'D'. So

>>> c = pyfits_table.chararray_column
>>> mask = np.in1d(c, ['C', 'D'])

It turns out the actual stored values in the chararray were 'A  ', 'B  ', 'C  '
and 'D  '. in1d() converts the chararray to an ndarray before performing the
comparison, so none of the entries matched 'C' or 'D'.

What is the best way to ensure this doesn't happen to other people?  We could
change the array set operations to special-case chararrays, but this seems like 
an ugly solution. Is it possible to change something in pyfits to avoid this?
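
In the meantime the problem can be avoided at the user level by stripping
explicitly before comparing; an untested sketch, with a stand-in for the
pyfits column:

    import numpy as np

    c = np.char.array(['A  ', 'B  ', 'C  ', 'D  '])   # stand-in column
    plain = np.asarray(c)                # plain ndarray; blanks preserved
    np.in1d(plain, ['C', 'D'])           # all False -- the bug above
    np.in1d(np.char.strip(plain), ['C', 'D'])
    # -> array([False, False,  True,  True])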


Neil



Re: [Numpy-discussion] chararray stripping trailing whitespace a bug?

2010-05-10 Thread Neil Crighton
 This inconsistency is fixed in Numpy 1.4 (which included a major 
 overhaul of chararrays).  in1d will perform the auto 
 whitespace-stripping on chararrays, but not on regular ndarrays of strings.

Great, thanks.

 Pyfits continues to use chararray since not doing so would break 
 existing code relying on this behavior.  And there are many use cases 
 where this behavior is desirable, particularly with fixed-length strings 
 in tables. 
 
 The best way to get around it from your code is to cast the chararray 
 pyfits returns to a regular ndarray.

My problem was that I didn't know I needed to get around it :)  But thanks for
the suggestion, I'll use that in future when I need to switch between
chararrays and ndarrays.

Neil





Re: [Numpy-discussion] "Match" two arrays

2010-04-01 Thread Neil Crighton

Shailendra shailendra.vikas at gmail.com writes:

 
 Hi All,
 I want to make a function which should be like this:
 
 cordinates1 = (x1, y1)  # x1 and y1 are the x- and y-coords of a large
                         # number of points
 cordinates2 = (x2, y2)  # similar to cordinates1
 indices1, indices2 = match_cordinates(cordinates1, cordinates2)
 
 (x1[indices1], y1[indices1]) matches (x2[indices2], y2[indices2])
 
 where the definition of match is such that:
 If A is the closest point to B and the distance between A and B is less
 than delta, then it is a match.
 If A is the closest point to B and the distance between A and B is more
 than delta, then there is no match.
 Every point has either 1 match (the closest point) or none.
 
 Also, cordinates1 and cordinates2 are quite large, so outer-product
 approaches should not be used. I can think of only C-style code to
 achieve this. Can anyone suggest a pythonic way of doing this?
 
 Thanks,
 Shailendra
 


A similar problem comes up when you have to match astronomical coordinates. I
wrote a python + numpy function that is fast enough for my use cases - you
might be able to adapt it:

http://bitbucket.org/nhmc/pyserpens/src/tip/coord.py

The matching function starts on line 166.

Disclaimer: I haven't looked at the kdtree code yet, that might be a better 
approach.
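
For reference, a kd-tree version might look roughly like this (a sketch,
not what coord.py does; it needs scipy, and it returns nearest neighbours
within delta rather than the strict one-to-one matching asked for above):

    import numpy as np
    from scipy.spatial import cKDTree

    def match(x1, y1, x2, y2, delta):
        # For each point in set 1, the index of its nearest neighbour in
        # set 2 if that neighbour is closer than delta, else -1.
        tree = cKDTree(np.column_stack([x2, y2]))
        dist, ind = tree.query(np.column_stack([x1, y1]), k=1)
        ind[dist > delta] = -1
        return ind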


Neil



Re: [Numpy-discussion] Test if one element of string array is in a defined list

2010-03-22 Thread Neil Crighton
Eric Emsellem eemselle at eso.org writes:

 Hi
 
 I would like to test whether strings in a numpy S array are in a given list 
but 
 I don't manage to do so. Any hint is welcome.
 
 ===
 # So here is an example of what I would like to do
 # I have a String numpy array:
 
 import numpy as num
 Sarray = num.asarray(["test1", "test2", "tutu", "toto"])
 Farray = num.arange(len(Sarray))
 mylist = ["tutu", "hello", "why"]
 

in1d() does what you want.

>>> import numpy as np
>>> Sarray = np.array(["test1", "test2", "tutu", "toto"])
>>> mylist = ["tutu", "hello", "why"]
>>> np.in1d(Sarray, mylist)
array([False, False,  True, False], dtype=bool)

Be careful of whitespace when doing string comparisons; "tutu " != "tutu" (I've
been burnt by this in the past).
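
For example:

>>> np.in1d(np.array(["tutu "]), ["tutu"])
array([False], dtype=bool)
>>> np.in1d(np.char.strip(np.array(["tutu "])), ["tutu"])
array([ True], dtype=bool)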

in1d() is only in more recent versions of numpy (1.4+). If you can't upgrade,
you can cut and paste the in1d() and unique() routines from here:

http://projects.scipy.org/numpy/browser/branches/datetime/numpy/lib/arraysetops.py

to use in your own modules.

Cheers, Neil



Re: [Numpy-discussion] Calling routines from a Fortran library using python

2010-02-18 Thread Neil Crighton
Nils Wagner nwagner at iam.uni-stuttgart.de writes:

 Hi David,
 
 you are right. It's a proprietary library.
 I found a header file (*.h) including prototype 
 declarations of externally callable procedures.
 
 How can I proceed ?

Apparently you can use ctypes to access Fortran libraries. See the first
paragraph of:

http://www.sagemath.org/doc/numerical_sage/ctypes.html

You may have to convert the .a library to a .so library.
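
As a sketch of what the call looks like (library and routine names here are
invented; most Fortran compilers append a trailing underscore to the symbol
name, and Fortran passes its arguments by reference):

    import ctypes

    # hypothetical routine:  subroutine dscal(n, a, x)  in libmysolver.so
    lib = ctypes.CDLL('./libmysolver.so')

    n = ctypes.c_int(3)
    a = ctypes.c_double(2.0)
    x = (ctypes.c_double * 3)(1.0, 2.0, 3.0)

    lib.dscal_(ctypes.byref(n), ctypes.byref(a), x)  # note the underscore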


Neil



Re: [Numpy-discussion] dtype=None as default for np.genfromtxt ?

2010-02-14 Thread Neil Crighton
Pierre GM pgmdevlist at gmail.com writes:

 
 It has been suggested (ticket #1262) to change the default dtype=float to 
dtype=None in np.genfromtxt.
 Any thoughts ?
 

I agree dtype=None should be default for the reasons given in the ticket. 
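
For anyone unfamiliar with the difference (made-up data):

    import numpy as np
    from StringIO import StringIO    # io.BytesIO on Python 3

    f = StringIO("Alice 25 1.62\nBob 30 1.81")

    # dtype=float (the current default) chokes on the string column;
    # dtype=None infers a type per column and gives a structured array.
    a = np.genfromtxt(f, dtype=None, names=['name', 'age', 'height'])
    # a['age'] -> array([25, 30])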

How do we handle the backwards-incompatible change?  A warning in the next 
release, then change it in the following release?

Neil



[Numpy-discussion] Release notes for arraysetops changes

2009-11-09 Thread Neil Crighton
Hi, 

I've written some release notes (below) describing the changes to
arraysetops.py. If someone with commit access could check that these sound ok
and add them to the release notes file, that would be great.

Cheers,

Neil




New features
============

Improved set operations
~~~~~~~~~~~~~~~~~~~~~~~

In previous versions of NumPy some set functions (intersect1d,
setxor1d, setdiff1d and setmember1d) could return incorrect results if
the input arrays contained duplicate items. These now work correctly
for input arrays with duplicates. setmember1d has been renamed to
in1d, as with the change to accept arrays with duplicates it is
no longer a set operation, and is conceptually similar to an
elementwise version of the Python operator 'in'.  All of these
functions now accept the boolean keyword assume_unique. This is False
by default, but can be set True if the input arrays are known not
to contain duplicates, which can increase the functions' execution
speed.
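
For example:

>>> import numpy as np
>>> a = np.array([2, 4, 8, 16])
>>> b = np.array([4, 16, 5])
>>> np.in1d(a, b)
array([False,  True, False,  True], dtype=bool)
>>> # both inputs are duplicate-free, so the check can be skipped:
>>> np.in1d(a, b, assume_unique=True)
array([False,  True, False,  True], dtype=bool)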


Deprecations
============

#. unique1d: use unique instead. unique1d raises a deprecation
   warning in 1.4, and will be removed in 1.5.

#. intersect1d_nu: use intersect1d instead. intersect1d_nu raises
   a deprecation warning in 1.4, and will be removed in 1.5.

#. setmember1d: use in1d instead. setmember1d raises a deprecation
   warning in 1.4, and will be removed in 1.5.




Re: [Numpy-discussion] Deprecate np.max/np.min ?

2009-11-07 Thread Neil Crighton
Charles R Harris charlesr.harris at gmail.com writes:

 People import these functions -- yes, they shouldn't do that -- and the python
builtin versions are overloaded, causing hard to locate errors.

While I would love less duplication in the numpy namespace, I don't think the 
small gain here is worth the pain of deprecation. 

 OTOH, one can ask, why is
 
   np.min(3, 2)
 
 allowed when
 
   np.min([3], 2)
 
 gives ValueError: axis(=2) out of bounds. It seems to me that
 0-dimensional objects should accept only None as the axis? (Fixing this
 would also make misuse of np.min and np.max more difficult.)

I think it would be better to fix this issue.  np.min(3,2) should also give
ValueError: axis(=2) out of bounds. Fixing this also removes any possibility
of generating hard-to-find errors by overwriting the builtin min/max. (Unless
there's some corner case I'm missing).
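
To spell the inconsistency out (behaviour of the releases under discussion):

>>> np.min(3, 2)      # scalar input: the axis argument is silently ignored
3
>>> np.min([3], 2)    # array input: the axis argument is range-checked
ValueError: axis(=2) out of bounds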

Neil



Re: [Numpy-discussion] converting discrete data to unique integers

2009-11-04 Thread Neil Crighton
 josef.pktd at gmail.com writes:

  Good point. With the return_inverse solution, is unique() guaranteed
  to give back the same array of unique values in the same (presumably
  sorted) order? That is, for two arrays A and B which have elements
  only drawn from a set S, is all(unique(A) == unique(B)) guaranteed?
  The code is a quite clever and a bit hard to follow, but it *looks*
  like it will provide a stable mapping since it's using a sort.
 
 I looked at it some time ago, and from what I remember, the sort
 is done if return_inverse=True but for some codepath it uses
 set.
 

unique always sorts, even if it uses set. So I'm pretty sure 
all(unique(A) == unique(B)) is guaranteed.
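
For example:

>>> A = np.array(['b', 'a', 'c', 'a'])
>>> vals, inv = np.unique(A, return_inverse=True)
>>> vals                  # always sorted
array(['a', 'b', 'c'], dtype='|S1')
>>> inv                   # integer codes such that vals[inv] == A
array([1, 0, 2, 0])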


Neil



Re: [Numpy-discussion] Help with np.where and datetime functions

2009-07-08 Thread Neil Crighton
John [H2O] washakie at gmail.com writes:

 What I am trying to do (obviously?) is find all the values of X that fall
 within a time range.
 
 Specifically, one point I do not understand is why the following two methods
 fail:
 
 --> 196 ind = np.where( (t1 < Y[:,0] < t2) )  # same result
 with/without inner parens
 TypeError: can't compare datetime.datetime to numpy.ndarray
 
 OR trying the 'and' method:
 
 --> 196 ind = np.where( (Y[:,0] > t1) and (Y[:,0] < t2) )
 ValueError: The truth value of an array with more than one element is
 ambiguous. Use a.any() or a.all()
 

Use (t1 < Y[:,0]) & (Y[:,0] < t2).  The python keywords 'and' and 'or' can't
be overloaded, but the bitwise operators (& and |) can be, and are, overloaded
by arrays.

Conditionals like a < x < b are converted to (a < x) and (x < b), which is why
they don't work either. There is a proposal to enable overloadable 'and' and
'or' methods (http://www.python.org/dev/peps/pep-0335/), but I don't think it
ever got enough support to be accepted.

Also, if you don't need the indices, you can just use the conditional
expression as a boolean mask:

>>> condition = (t1 < Y[:,0]) & (Y[:,0] < t2)
>>> Y[:,0][condition]
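
A self-contained version of the whole thing (made-up times):

    import datetime
    import numpy as np

    times = np.array([datetime.datetime(2009, 7, d) for d in range(1, 8)])
    t1 = datetime.datetime(2009, 7, 2)
    t2 = datetime.datetime(2009, 7, 5)

    # Keeping the array on the left of each comparison is safest with
    # object arrays of datetimes.
    mask = (times > t1) & (times < t2)
    times[mask]    # the times strictly inside (t1, t2)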

Neil




Re: [Numpy-discussion] Using loadtxt to read in mixed data types

2009-07-03 Thread Neil Crighton
Pierre GM pgmdevlist at gmail.com writes:

 What about
 'formats':[eval(b) for b in event_format]
 
 Should it fail, try something like:
 dtype([(x,eval(b)) for (x,b) in zip(event_fields, event_format)])
 
 At least you force dtype to have the same nb of names & formats.
 

You could use 

data = np.genfromtxt(filename, names=listofname, dtype=None)

Then you just need to specify the column names, and not the dtypes (they are
inferred from the data). There are probably backwards compatibility issues, but
it would be great if dtype=None was the default for genfromtxt.

Neil



[Numpy-discussion] Patch for review (improving arraysetops)

2009-06-22 Thread Neil Crighton
Robert Cimrman cimrman3 at ntc.zcu.cz writes:

 Hi Neil,
  This sounds good. If you don't have time to do it, I don't mind having a go
  at writing a patch to implement these changes (deprecate the existing
  unique1d, rename unique1d to unique and add the set approach from the old
  unique, and the other changes mentioned in
  http://projects.scipy.org/numpy/ticket/1133).
 
 That would be really great - I will not be online starting tomorrow till 
 the end of next week (more or less), so I can really look at the issue 
 after I return.
 

Here's a patch that implements most of the changes suggested in the ticket, 
and merges unique and unique1d functionality to a single function unique in
arraysetops:

http://projects.scipy.org/numpy/ticket/1133

Please review it. Thanks,

Neil






Re: [Numpy-discussion] Plans for Numpy 1.4.0 and scipy 0.8.0

2009-06-22 Thread Neil Crighton
David Cournapeau david at ar.media.kyoto-u.ac.jp writes:

 
 (Continuing the discussion initiated in the neighborhood iterator thread)
 
 Hi,
 
 I would like to gather people's opinion on what to target for numpy
 1.4.0.

 Are there any other features people would like to put into numpy for 1.4.0 ?
 

I'd like to get the patch in ticket 1133
(http://projects.scipy.org/numpy/ticket/1133), or some version of it, into 1.4.

It would also be great to get all the docstrings David Goldsmith and others are
working on into the next release.

Neil



Re: [Numpy-discussion] Plans for Numpy 1.4.0 and scipy 0.8.0

2009-06-22 Thread Neil Crighton
David Cournapeau cournape at gmail.com writes:

  David Cournapeau wrote:
   (Continuing the discussion initiated in the neighborhood iterator
   thread)
       - Chuck suggested to drop python < 2.6 support from now on. I am
   against it without a very strong and detailed rationale, because many OS
   still don't have python 2.6 (RHEL, Ubuntu LTS).
 
  I vote against dropping support for python 2.5. Personally I have no
  incentive to upgrade to 2.6 and am very happy with 2.5.
 
  Will requiring python-2.6 help the developers port numpy to python-3?
 
 
  Can't really say at this point, but it is the suggested path to
  python-3.
 
 OTOH, I don't find the python 3 official transition story very
 convincing. I have tried to gather all the information I could find,
 both on the python wiki and from transition stories. To support both
 python 2 and 3, the suggestion is to use the 2to3 script, but it is
 painfully slow for big packages like numpy. And there are very few
 stories for porting python 3 C extensions.
 
 Another suggestion is to avoid breaking the API when transitioning for
 python 3. But that seems quite unrealistic. How do we deal with the
 removing of string/long APIs ? This will impact the numpy API as well,
 so how do we deal with it ?
 

As I understand this suggestion, they just hope external packages don't say
'Hey, if we're breaking backwards compatibility anyway, let's take the chance
to do a whole lot of extra API breakage!'  That way, if people have problems
migrating to the new version, they know they're likely to be python 3 related.
Jarrod Millman's blog post about numpy and python 3 mentions this: 

http://jarrodmillman.blogspot.com/2009/01/when-will-numpy-and-scipy-migrate-to.html

 Also, there does not seem to be any advantages for python 3 for
 scientific people ?
 

I think there are lots of advantages in python 3 for scientific people.  The 
new integer division alone is a huge improvement.  I've been bitten by this 
(1/2 = 0) several times in the past, and the only reason I'm not bitten by it 
now is that I've trained myself to always type things like 1./x, which look 
ugly.
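
(In python 2 you can already get the new behaviour per module, without the
ugly casts:)

    from __future__ import division

    print 1 / 2     # 0.5 -- true division, as in python 3
    print 1 // 2    # 0   -- explicit floor division when you want it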

The reorganisation of the standard library and the removal of duplicate ways of
doing things in the core also makes the language much easier to learn. This 
isn't a huge gain for people already familiar with Python's idiosyncrasies, but
it's important for people first coming to the language.

Print becoming a function would have been a pain for interactive work, but 
happily ipython auto-parentheses takes care of that.

You could argue that moving to python 3 isn't attractive because there isn't 
any scientific library support, but then that's because numpy hasn't been 
ported to python 3 yet ;)


Neil



 




Re: [Numpy-discussion] all and alltrue

2009-06-17 Thread Neil Crighton
Shivaraj M S shivraj.ms at gmail.com writes:

 
 Hello, I just came across the 'all' and 'alltrue' functions in
 fromnumeric.py. They are one and the same. IMHO, alltrue = all would be
 sufficient.
 Regards, Shivaraj

There are other duplications too:

np.all
np.alltrue

np.any
np.sometrue

np.deg2rad
np.radians

np.rad2deg
np.degrees

And maybe more I've missed.

Can we deprecate alltrue and sometrue, and either deg2rad/rad2deg, or
radians/degrees? They would be deprecated in 1.4 and presumably removed in 1.5.

Neil



Re: [Numpy-discussion] improving arraysetops

2009-06-17 Thread Neil Crighton
  What about merging unique and unique1d?  They're essentially identical for
  an array input, but unique uses the builtin set() for non-array inputs and
  so is around 2x faster in this case - see below. Is it worth accepting a
  speed regression for unique to get rid of the function duplication?  (Or
  can they be combined?)

 unique1d can return the indices - can this be achieved by using set(), too?


No, set() can't return the indices as far as I know.

 The implementation for arrays is the same already, IMHO, so I would
 prefer adding return_index, return_inverse to unique (automatically
 converting input to array, if necessary), and deprecate unique1d.

 We can view it also as adding the set() approach to unique1d, when the
 return_index, return_inverse arguments are not set, and renaming
 unique1d -> unique.


This sounds good. If you don't have time to do it, I don't mind having a go
at writing a patch to implement these changes (deprecate the existing
unique1d, rename unique1d to unique and add the set approach from the old
unique, and the other changes mentioned in
http://projects.scipy.org/numpy/ticket/1133).

 I have found a strange bug in unique():

 In [24]: l = list(np.random.randint(100, size=1000))

 In [25]: %timeit np.unique(l)
 ---------------------------------------------------------------------------
 UnicodeEncodeError                        Traceback (most recent call last)
 
 /usr/lib64/python2.5/site-packages/IPython/iplib.py in ipmagic(self, arg_s)
     951         else:
     952             magic_args = self.var_expand(magic_args,1)
 --> 953             return fn(magic_args)
     954 
     955     def ipalias(self,arg_s):
 
 /usr/lib64/python2.5/site-packages/IPython/Magic.py in
 magic_timeit(self, parameter_s)
    1829                                               precision,
    1830                                               best * scaling[order],
 -> 1831                                               units[order])
    1832         if tc > tc_min:
    1833             print "Compiler time: %.2f s" % tc

 UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in
 position 28: ordinal not in range(128)

 It disappears after increasing the array size, or the integer size.
 In [39]: np.__version__
 Out[39]: '1.4.0.dev7047'

 r.

Weird! From the error message, it looks like a problem with ipython's timeit
function rather than unique. I can't reproduce it on my machine
(numpy 1.4.0.dev, r7059; IPython 0.10.bzr.r1163).

Neil


Re: [Numpy-discussion] improving arraysetops

2009-06-14 Thread Neil Crighton
Robert Cimrman cimrman3 at ntc.zcu.cz writes:

 
 Hi,
 
 I am starting a new thread, so that it reaches the interested people.
 Let us discuss improvements to arraysetops (array set operations) at [1] 
 (allowing non-unique arrays as function arguments, better naming 
 conventions and documentation).
 
 r.
 
 [1] http://projects.scipy.org/numpy/ticket/1133
 

Hi,

These changes look good to me.  For point (1) I think we should fold the
unique and _nu code into a single function. For point (3) I like in1d - it's 
shorter than isin1d but is still clear.

What about merging unique and unique1d?  They're essentially identical for an 
array input, but unique uses the builtin set() for non-array inputs and so is 
around 2x faster in this case - see below. Is it worth accepting a speed 
regression for unique to get rid of the function duplication?  (Or can they be 
combined?) 


Neil


In [24]: l = list(np.random.randint(100, size=1))
In [25]: %timeit np.unique1d(l)
1000 loops, best of 3: 1.9 ms per loop
In [26]: %timeit np.unique(l)
1000 loops, best of 3: 793 µs per loop
In [27]: l = list(np.random.randint(100, size=100))
In [28]: %timeit np.unique(l)
10 loops, best of 3: 78 ms per loop
In [29]: %timeit np.unique1d(l)
10 loops, best of 3: 233 ms per loop



Re: [Numpy-discussion] setmember1d_nu

2009-06-09 Thread Neil Crighton
Robert Cimrman cimrman3 at ntc.zcu.cz writes:

  I'd really like to see the setmember1d_nu function in ticket 1036 get into
  numpy. There's a patch waiting for review that includes tests:
 
  http://projects.scipy.org/numpy/ticket/1036
 
  Is there anything I can do to help get it applied?
  
  I guess I could commit it, if you review the patch and it works for you. 
Obviously, I cannot review it myself, but my SVN access may still work :)
 
 Thanks for the review, it is in!
 
 r.

Great - thanks!  People often post to the list asking for this functionality, so
it's nice to get it into numpy (whatever it ends up being called).


Neil




Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-06 Thread Neil Crighton
Robert Cimrman cimrman3 at ntc.zcu.cz writes:

 Anne Archibald wrote:

  1. add a keyword argument to intersect1d assume_unique; if it is not
  present, check for uniqueness and emit a warning if not unique
  2. change the warning to an exception
  Optionally:
  3. change the meaning of the function to that of intersect1d_nu if the
  keyword argument is not present
  
 You mean something like:
 
 def intersect1d(ar1, ar2, assume_unique=False):
     if not assume_unique:
         return intersect1d_nu(ar1, ar2)
     else:
         ...  # the current code
 
 intersect1d_nu could be still exported to numpy namespace, or not.
 

+1 - from the user's point of view there should just be intersect1d and
setmember1d (i.e. no '_nu' versions). The assume_unique keyword Robert suggests
can be used if speed is a problem.

I really like in1d (no underscore) as a new name for setmember1d_nu. inarray is
another possibility. I don't like 'ain'; 'a' in front of 'in' detracts from 
readability, unlike the extra a in arange.

Can we summarise the discussion in this thread and write up a short proposal
about what we'd like to change in arraysetops, and how to make the changes? 
Then it's easy for other people to give their opinion on any changes. I can do
this if no one else has time.


Neil




[Numpy-discussion] Changes to arraysetops

2009-06-06 Thread Neil Crighton
Thanks for the summary!  I'm +1 on points 1, 2 and 3.

+0 for points 4 and 5 (assume_unique keyword and renaming arraysetops).

Neil

PS. I think you mean deprecate, not depreciate :)



[Numpy-discussion] setmember1d_nu

2009-06-03 Thread Neil Crighton
Hi all,

I posted this message a couple of days ago, but gmane grouped it with an old
thread and it hasn't shown up on the front page.  So here it is again...

I'd really like to see the setmember1d_nu function in ticket 1036 get into
numpy. There's a patch waiting for review that includes tests:

http://projects.scipy.org/numpy/ticket/1036

Is there anything I can do to help get it applied?

Neil




Re: [Numpy-discussion] setmember1d_nu

2009-06-01 Thread Neil Crighton
Robert Cimrman cimrman3 at ntc.zcu.cz writes:

 Re-hi!
 
 Robert Cimrman wrote:
  Hi all,
  
  I have added to the ticket [1] a script that compares the proposed 
  setmember1d_nu() implementations of Neil and Kim. Comments are welcome!
  
  [1] http://projects.scipy.org/numpy/ticket/1036
 
 I have attached a patch incorporating the solution that the involved 
 people agreed on, so review, please.
 
 best regards,
 r.
 


Hi all,

I'd really like to see the setmember1d_nu function in ticket 1036 get into 
numpy. There's a patch waiting for review that includes tests for the new
function:

http://projects.scipy.org/numpy/ticket/1036

Is there anything I can do to help get it applied?

Neil



Re: [Numpy-discussion] List/location of consecutive integers

2009-05-25 Thread Neil Crighton
Andrea Gavana andrea.gavana at gmail.com writes:

 this should be a very easy question but I am trying to make a
 script run as fast as possible, so please bear with me if the solution
 is easy and I just overlooked it.

That's weird, I was trying to solve exactly the same problem a couple of weeks
ago for a program I was working on. It must come up a lot.

I ended up with a similar solution to Josef's, but it took me more than an hour
to work it out - I should have asked here first!
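
For the archive, the usual idiom is something like this (close in spirit to
Josef's solution, though I haven't reproduced his exactly):

    import numpy as np

    a = np.array([1, 2, 3, 7, 8, 20, 21, 22])

    # split wherever the gap between neighbours isn't 1
    runs = np.split(a, np.where(np.diff(a) != 1)[0] + 1)
    # runs -> [array([1, 2, 3]), array([7, 8]), array([20, 21, 22])]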


Neil






Re: [Numpy-discussion] Masking an array with another array

2009-04-23 Thread Neil Crighton
 josef.pktd at gmail.com writes:

 setmember1d is very fast compared to the other solutions for large b.
 
 However, setmember1d requires that both arrays only have unique elements.
 
 So it doesn't work if, for example, your first array is a data vector
 with membership in different groups (therefore not only uniques), and
 the second array is the sub group that you want to check.
 

Note there's a patch waiting to be reviewed that adds another version of
setmember1d for non-unique inputs.

http://projects.scipy.org/numpy/ticket/1036


Neil



Re: [Numpy-discussion] intersect1d and setmember1d

2009-03-03 Thread Neil Crighton
Robert Kern robert.kern at gmail.com writes:

 Do you mind if we just add you to the THANKS.txt file, and consider
 you as a NumPy Developer per the LICENSE.txt as having released that
 code under the numpy license? If we're dotting our i's and crossing
 our t's legally, that's a bit more straightforward (oddly enough).
 

No, I don't mind having it released under the numpy licence.

Neil



Re: [Numpy-discussion] intersect1d and setmember1d

2009-03-02 Thread Neil Crighton
Robert Cimrman cimrman3 at ntc.zcu.cz writes:

 Hi Neil!
 
 I would like to add your function to arraysetops.py - is it ok? Just the 
 name would be changed to setmember1d_nu, to follow the naming in the 
 module (like intersect1d_nu).
 
 Thank you,
 r.
 

That's fine!  There's no licence attached, it's in the public domain.

Neil




Re: [Numpy-discussion] [SciPy08] Documentation BoF

2008-08-26 Thread Neil Crighton

 - Should we have a separate User manual and a Reference manual, or only
 a single manual?


Are there still plans to write a 10 page 'Getting started with NumPy'
document?  I think this would be very useful.  Ideally a 'getting started'
document, the docstrings, and a reference manual is all the documentation
you'd need.  I try to avoid reading long user manuals unless I'm forced to,
but I don't know what other people do.

Neil


Re: [Numpy-discussion] Reference guide updated

2008-07-23 Thread Neil Crighton
Ok, thanks.

I meant the amount of vertical space between lines of text - for
example, the gaps between parameter values and their description, or
the large spacing between both lines of text and the text boxes in
the examples section. If other people agree it's a problem, I thought
the spacing could be tweaked.  It's not a problem that there's more
than one function per page.

I have been helping out with the docs (where I feel I'm qualified
enough - I'm no numpy expert!).  I think it will make numpy much
easier to learn to have easily-accessible, comprehensive docstrings,
and the documentation editor makes it very easy to contribute.  I'm
also learning a lot reading other people's docstrings :)

   Is there a reason why there's so much vertical space between all of the
  text sections?  I find the docstrings much easier to read in the editor:

 Roughly:
 * In the editor, you have one function per page, vs several function per page
 in the reference
 * in the editor, the blocks 'Parameters', 'Returns'... are considered as
 sections, while in the reference, they are Field lists (roughly).

 In the end, it's only a matter of taste, of course. But you raise an
 interesting point, we should provide some kind of options to choose between
 behaviors.

 Please note as well that the reference guide is a work in progress: you're
 more than welcome to join and work with us.


Re: [Numpy-discussion] Reference guide updated

2008-07-22 Thread Neil Crighton
 A new copy of the reference guide is now available at
 http://mentat.za.net/numpy/refguide/

Is there a reason why there's so much vertical space between all of the text
sections?  I find the docstrings much easier to read in the editor:

http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.fromnumeric.all/

than in the reference guide:

http://mentat.za.net/numpy/refguide/routines.logic.xhtml



Neil



Re: [Numpy-discussion] Making NumPy accessible to everyone (or no-one) (was Numpy-discussion Digest, Vol 19, Issue 44)

2008-04-10 Thread Neil Crighton
Thanks Joe for the excellent post. It mirrors my experience with
Python and Numpy very eloquently, and I think it presents a good
argument against the excessive use of namespaces. I'm not so worried
about N. vs np. though - I use the same method Matthew Brett suggests.
If I'm going to use, say, sin and cos a lot in a script such that all
the np. prefixes would make the code hard to read, I'll use:

import numpy as np
from numpy import sin,cos

To those people who have invoked 'Namespaces are a honking great idea
- let's do more of those', I'll cancel that with 'Flat is better than
nested' :)  I certainly wouldn't argue that using namespaces to
separate categories of functions is always a bad thing, but I think it
should only be done as a last resort.

Neil

On 10/04/2008, Joe Harrington [EMAIL PROTECTED] wrote:
   Absolutely.  Let's please standardize on:
   import numpy as np
   import scipy as sp

  I hope we do NOT standardize on these abbreviations.  While a few may
  have discussed it at a sprint, it hasn't seen broad discussion and
  there are reasons to prefer the other practice (numpy as N, scipy as
  S, pylab as P).  My reasons for saying this go back to my reasons for
  disliking lots of heirarchical namespaces at all: if we must have
  namespaces, let's minimize the visual and typing impact by making them
  short and visually distinct from the function names (by capitalizing
  them).

  What concerns me about the discussion is that we are still not
  thinking like communications and thought-process experts, we are
  thinking like categorizers and accountants.  The arguments we are
  raising don't have to do, positively or negatively, with the difficult
  acts of communicating with a computer and with other readers of our
  code.  Those are the sole purposes of computer languages.

  Namespaces add characters to code that have a high redundancy factor.
  This means they pollute code, make it slow and inaccurate to read, and
  making learning harder.  Lines get longer and may wrap if they contain
  several calls.  It is harder while visually scanning code to
  distinguish the function name if it's adjacent to a bunch of other
  text, particularly if that text appears commonly in the nearby code.
  It therefore becomes harder to spot bugs.  Mathematical code becomes
  less and less like the math expressions we write on paper when doing
  derivations, making it harder to interpret and verify.  You have to
  memorize which subpackage each function is in, which is hard to do for
  those functions that could naturally go in two subpackages.  While
  many math function names are obvious, subpackage names are not.  Is it
  .stat or .stats or .statistics?  .rand or .random?  .fin or
  .financial?  Some functions have this problem, but *every* namespace
  name has it in spades.

  The arguments people are raising are arguments related to how
  emotionally satisfying it is to have a place for everything and
  everything in its place, and to know you know everything there is to
  know.  While we like both those things, as scientists, engineers, and
  mathematicians, they are almost irrelevant to coding.  There is simply
  no reduction in readability, writeability, or debugability if you
  don't have namespace prefixes on everything, and knowing you know
  everything is easily accomplished now with the online categorized
  function list.  We can incorporate that functionality into the doc
  reading apparatus (help, currently) by using keywords in ReST
  comments in the docstrings and providing a way for help and its
  friends to list the keywords and what functions are connected to them.

  What nobody has said is "if we have lots of namespaces, my code will
  look prettier" or "if we have lots of namespaces, normal people will
  learn faster" or "if we have lots of namespaces, my code will be
  easier to verify and debug."  I don't believe any of these statements
  to be true.  Do you?

  Similarly, nobody has said, "if we have lots of namespaces, I'll be a
  faster coder."  There is a *very* high obnoxiousness factor in typing
  redundant stuff at an interpreter.  It's already annoying to type
  N.sin instead of sin, but N.T.sin?  Or worse, np.tg.sin?  Now the
  prefix has twice the characters of the function itself!  Most IDL
  users *hate* that you have to type "print, " in order to inspect the
  contents of a variable.  Yet, with multiple layers of namespaces we'd
  have lots more than seven extra characters on most lines of code, and
  unlike the IDL mess you'd have to *think* to recall what the right
  extra characters were for each function call, unlike just telling your
  hands to run the "print, " finger macro once again.

  The reasons we all like Python relate to how quick and easy it is to
  emit code from our fingertips that is similar to what we are thinking
  in our brains, compared to other languages.  The brain doesn't declare
  variables, nor run loops over arrays.  Neither does Python.  When we
  

Re: [Numpy-discussion] Simple financial functions for NumPy

2008-04-05 Thread Neil Crighton
I'm just a numpy user, but for what it's worth, I would much prefer to
have a single numpy namespace with a small as possible number of
objects inside that namespace. To me, 'as small as possible' means
that it only includes the array and associated array manipulation
functions (searchsorted, where, and the record array functions), and
the various functions that operate on arrays (exp, log, sin, cos, abs,
any, etc).

Having a small number of objects in a single namespace means that
numpy is much easier to learn for beginners, as it's easier to find
the appropriate thing for what you want to do (this is also helped by
reducing duplication *shakes fist at .compress* and good
documentation). It's also much easier to explore with dir() to jog
your memory as to what function you need for a task.

If I felt I contributed enough to this list to have a '1', I would be
-1 on adding financial functions to numpy.


  On Fri, Apr 4, 2008 at 3:31 PM, Anne Archibald [EMAIL PROTECTED]
  wrote:

   On 04/04/2008, Alan G Isaac [EMAIL PROTECTED] wrote:
  
   It seems to me that there are two separate issues people are talking
   about when they talk about packaging:
  
   * How should functions be arranged in the namespaces? numpy.foo(),
   scipy.foo(), numpy.lib.financial.foo(), scikits.foo(),
   numkitfull.foo()?
  
   * Which code should be distributed together? Should scipy require
   separate downloading and compilation from numpy?
  
   The two questions are not completely independent - it would be
   horribly confusing to have the set of functions available in a given
   namespace depend on which packages you had installed - but for the
   most part it's not a problem to have several toplevel namespaces in
   one package (python's library is the biggest example of this I know
   of).
  
   For the first question, there's definitely a question about how much
   should be done with namespaces and how much with documentation. The
   second is a different story.
  
   Personally, I would prefer if numpy and scipy were distributed
   together, preferably with matplotlib. Then anybody who used numpy
   would have available all the scpy tools and all the plotting tools; I
   think it would cut down on wheel reinvention and make application
   development easier. Teachers would not need to restrict themselves to
   using only functions built into numpy for fear that their students
   might not have scipy installed - how many students have learned to
   save their arrays in unportable binary formats because their teacher
   didn't want them to have to install scipy?
  
   I realize that this poses technical problems. For me installing scipy
   is just a matter of clicking on a checkbox and installing a 30 MB
   package, but I realize that some platforms make this much more
   difficult. If we can't just bundle the two, fine. But I think it is
   mad to consider subdividing further if we don't have to.


  If these were tightly tied together, for instance in one big dll , this
  would be unpleasant for me. I still have people downloading stuff over 56k
  modems and adding an extra 30 MB to the already somewhat bloated numpy
  distribution would make their lives more tedious than they already are.

   I think python's success is due in part to its "batteries included"

   library. The fact that you can just write a short python script with
   no extra dependencies that can download files from the Web, parse XML,
   manage subprocesses, and save persistent objects makes development
   much faster than if you had to forever decide between adding
   dependencies and reinventing the wheel. I think numpy and scipy should
   follow the same principle, of coming "batteries included."


  One thing they try to do in Python proper is think a lot more before adding
  stuff to the standard library. Generally packages need to exist separately
  for some period of time to prove their general utility and to stabilize
  before they get accepted.  Particularly in the core, but in the library as
  well, they make an effort to choose a compact set of primitive operations
  without a lot of duplication (the old "There should be one -- and preferably
  only one -- obvious way to do it"). The numpy community has, particularly of
  late, been rather quick to add things that seem like they *might* be useful.

  One of the advantages of having multiple namespaces would have been to
  enforce a certain amount of discipline on what went into numpy, since it
  would've been easier to look at and evaluate a few dozen functions that
  might have comprised some subpackage rather than, let's say, five hundred or
  so.

  I suspect it's too late now; numpy has chosen the path of matlab and the
  other array packages and is slowly accumulating nearly everything in one big
  flat namespace. I don't like it, but it seems pointless to fight it at this
  point.


  So in this specific case, yes, I think the financial functions should
   absolutely be included; whether 

Re: [Numpy-discussion] RAdian -- degres conversion

2007-12-16 Thread Neil Crighton
Do we really need these functions in numpy?  I mean it's just
multiplying/dividing the value by pi/180 (who knows why they're in the
math module...). Numpy doesn't have asin, acos, or atan either (they're
arcsin, arccos and arctan) so it isn't a superset of the math module.

I would like there to be fewer functions in numpy, not more.
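
(Each conversion is a one-line scaling:)

    import numpy as np

    def radians(x):
        return x * np.pi / 180.0

    def degrees(x):
        return x * 180.0 / np.pi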

  Someone on the wxPython list just pointed out that the math module now
  includes angle-conversion utilities:
 
   degrees.__doc__
   degrees(x) -> converts angle x from radians to degrees
   radians.__doc__
   radians(x) -> converts angle x from degrees to radians
 
  Not a big deal, but handy. As I generally like to think if numpy as a
  superset of the math module, perhaps is should include these too.


 Done.
