Re: [Numpy-discussion] np.histogram on arrays.

2011-03-30 Thread Éric Depagne
Hi.

Sorry for not having been clearer. I'll explain a little bit.

I have 4k x 4k images that I want to analyse. I turn them into numpy arrays, so
I have 4k x 4k np.arrays.

My analysis starts with determining the bias level. To do that, I compute a
histogram for each row, and then for each column.
So I compute 8000 histograms.

Here is the code I've used so far:

for i in range(self.data.shape[0]):
    # Compute a histogram along the columns
    # Gets counts and bounds
    self.countsC[i], self.boundsC[i] = np.histogram(data[i],
                                                    bins=self.bins)
for i in range(self.data.shape[1]):
    # Do the same, along the rows.
    self.countsR[i], self.boundsR[i] = np.histogram(data[:, i],
                                                    bins=self.bins)

And data.shape is (4000,4000).

If histogram had an axis parameter, I could avoid the loop and I guess it
would be faster.

Éric.
 So it seems that you give your array directly to histogramdd (asking a
 4000D histogram!). Surely that's not what you are trying to achieve. Can
 you elaborate more on your objectives? Perhaps some code (slow but
 working) to demonstrate the point.
 
 Regards,
 eat
 

Un clavier azerty en vaut deux
--
Éric Depagne                            e...@depagne.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.histogram on arrays.

2011-03-30 Thread Thouis (Ray) Jones
How about something like this:

# numpy 1.6
def rowhist(A, bins=100):
    assert bins > 0
    assert isinstance(bins, int)
    rownum = np.arange(A.shape[0]).reshape((-1, 1)).astype(int) * bins
    intA = (bins * (A - A.min()) / float(A.max() - A.min())).astype(int)
    intA[intA == bins] = bins - 1
    return np.bincount((intA + rownum).flatten(),
                       minlength=A.shape[0] * bins).reshape((A.shape[0], bins))

# numpy 1.5
def rowhist(A, bins=100):
    assert bins > 0
    assert isinstance(bins, int)
    rownum = np.arange(A.shape[0]).reshape((-1, 1)).astype(int) * bins
    intA = (bins * (A - A.min()) / float(A.max() - A.min())).astype(int)
    intA[intA == bins] = bins - 1
    counts = np.zeros(A.shape[0] * bins)
    bc = np.bincount((intA + rownum).flatten())
    counts[:len(bc)] = bc
    return counts.reshape((A.shape[0], bins))
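For readers of the archive, here is a self-contained version of the bincount trick with a small sanity check. Note that `minlength` must be `A.shape[0] * bins`, the reshape applies to bincount's result, and the bin edges span the *global* min/max rather than np.histogram's default per-row range:

```python
import numpy as np

def rowhist(A, bins=100):
    # One histogram per row: shift each row's bin indices by row * bins,
    # then count everything in a single bincount call.
    rownum = np.arange(A.shape[0]).reshape((-1, 1)) * bins
    intA = (bins * (A - A.min()) / float(A.max() - A.min())).astype(int)
    intA[intA == bins] = bins - 1   # fold the top edge into the last bin
    return np.bincount((intA + rownum).ravel(),
                       minlength=A.shape[0] * bins).reshape((A.shape[0], bins))

data = np.arange(12.).reshape(3, 4)
counts = rowhist(data, bins=4)
# counts has shape (3, 4); each row contributes exactly 4 counts.
```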


On Wed, Mar 30, 2011 at 09:04, Éric Depagne e...@depagne.org wrote:
 Hi.

 Sorry for not having been clearer. I'll explain a little bit.

 I have 4k x 4k images that I want to analyse. I turn them into numpy arrays so
 I have 4k x 4k np.array.

 My analysis starts with determining the bias level. To do that, I compute for
 each line, and then for each row, an histogram.
 So I compute 8000 histograms.

 Here is the code I've used sofar:

        for i in range(self.data.shape[0]):
           #Compute an histogram along the columns
           # Gets counts and bounds
            self.countsC[i], self.boundsC[i] = np.histogram(data[i],
 bins=self.bins)
        for i in range(self.data.shape[1]):
            # Do the same, along the rows.
            self.countsR[i], self.boundsR[i] = np.histogram(data[:,i],
 bins=self.bins)

 And data.shape is (4000,4000).

 If histogram  had an axis parameter, I could avoid the loop and I guess it
 would be faster.

 Éric.
 So it seems that you give your array directly to histogramdd (asking a
 4000D histogram!). Surely that's not what you are trying to achieve. Can
 you elaborate more on your objectives? Perhaps some code (slow but
 working) to demonstrate the point.

 Regards,
 eat


 Un clavier azerty en vaut deux
 --
 Éric Depagne                            e...@depagne.org


Re: [Numpy-discussion] np.histogram on arrays.

2011-03-30 Thread eat
Hi,

On Wed, Mar 30, 2011 at 10:04 AM, Éric Depagne e...@depagne.org wrote:

 Hi.

 Sorry for not having been clearer. I'll explain a little bit.

 I have 4k x 4k images that I want to analyse. I turn them into numpy arrays
 so
 I have 4k x 4k np.array.

 My analysis starts with determining the bias level. To do that, I compute
 for
 each line, and then for each row, an histogram.
 So I compute 8000 histograms.

 Here is the code I've used sofar:

for i in range(self.data.shape[0]):
   #Compute an histogram along the columns
   # Gets counts and bounds
self.countsC[i], self.boundsC[i] = np.histogram(data[i],
 bins=self.bins)
for i in range(self.data.shape[1]):
# Do the same, along the rows.
self.countsR[i], self.boundsR[i] = np.histogram(data[:,i],
 bins=self.bins)

 And data.shape is (4000,4000).

 If histogram  had an axis parameter, I could avoid the loop and I guess it
 would be faster.

Well I guess, for a slight performance improvement, you could create your
own streamlined histogrammer.

But, in order to better grasp your situation, it would be beneficial to know
how the counts and bounds are used later on. Just wondering if this kind of
massive histogramming could be avoided entirely.

Regards,
eat


 Éric.
  So it seems that you give your array directly to histogramdd (asking a
  4000D histogram!). Surely that's not what you are trying to achieve. Can
  you elaborate more on your objectives? Perhaps some code (slow but
  working) to demonstrate the point.
 
  Regards,
  eat
 

 Un clavier azerty en vaut deux
 --
 Éric Depagnee...@depagne.org


[Numpy-discussion] Question regarding concatenate/vstack.

2011-03-30 Thread andrew nelson
Dear List,
I have a quick question regarding vstack and concatenate.
In the docs for vstack it says that:

np.concatenate(tup, axis=0)

should be equivalent to:

np.vstack(tup)

However, I tried this out and it doesn't seem to be case, i.e.

>>> np.vstack((np.arange(5.), np.arange(5.)))
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.]])

>>> np.concatenate((np.arange(5.), np.arange(5.)), axis=0)
array([ 0.,  1.,  2.,  3.,  4.,  0.,  1.,  2.,  3.,  4.])

These aren't the same. Maybe I'm missing something?

regards,
Andrew.


Re: [Numpy-discussion] np.histogram on arrays.

2011-03-30 Thread Éric Depagne
 
 Well I guess, for a slight performance improvement, you could create your
 own streamlined histogrammer.
 
 But, in order to better grasp your situation, it would be beneficial to know
 how the counts and bounds are used later on. Just wondering if this kind of
 massive histogramming could be avoided entirely.
Indeed.
Here's what I do.
My images come from a CCD, and as such, the zero level in the image is not the
true zero level, but the true zero plus the background noise of each pixel.
By doing the histogram, I plan on detecting the most common value per
row.
Once I have the most common value, I can derive the interval where most of the
values are (the index of the largest occurrence is easily obtained by sorting
the counts, and I take a slice [index_max_count, index_max_count+1] in the
second array given by the histogram).
Then, I take the mean value of this interval and assume it is the value of
the bias for my row.

I do this procedure both on the row and columns as a sanity check.
And I know this procedure will not work if on any row/column there is a lot of 
signal and very little bias. I'll fix that afterwards ;-)
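The procedure described above can be sketched as follows — an illustrative reading of it, not the original code (argmax of the counts stands in for "sorting the counts"; the test image with a known bias of 100 is made up):

```python
import numpy as np

def row_bias(data, bins=100):
    """Per-row bias estimate: histogram each row, locate the most
    populated bin, and average the values falling in that bin."""
    bias = np.empty(data.shape[0])
    for i, row in enumerate(data):
        counts, edges = np.histogram(row, bins=bins)
        k = counts.argmax()                          # modal bin
        in_bin = (row >= edges[k]) & (row <= edges[k + 1])
        bias[i] = row[in_bin].mean()
    return bias

# Synthetic CCD-like rows: constant bias of 100 plus unit Gaussian noise.
rng = np.random.default_rng(0)
img = 100.0 + rng.normal(0.0, 1.0, size=(50, 200))
bias = row_bias(img)
```

On this synthetic image every per-row estimate lands close to the true bias level of 100.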

Éric.


 
 Regards,
 eat
 

Un clavier azerty en vaut deux
--
Éric Depagne                            e...@depagne.org


Re: [Numpy-discussion] Question regarding concatenate/vstack.

2011-03-30 Thread gary ruben
You're right, they are not equivalent. vstack will happily create an
array of higher rank than the parts it is stacking, whereas
concatenate requires the arrays it is working with to already be at
least 2d, so the equivalent is
np.concatenate((np.arange(5.)[np.newaxis], np.arange(5.)[np.newaxis]), axis=0)
or
np.concatenate((np.atleast_2d(np.arange(5.)),np.atleast_2d(np.arange(5.))),
axis=0)
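Gary's point can be checked in a couple of lines — once the 1-D inputs are promoted to 2-D, concatenate along axis 0 reproduces vstack exactly:

```python
import numpy as np

a = np.arange(5.)

v = np.vstack((a, a))             # 1-D inputs promoted to rows: shape (2, 5)
c = np.concatenate((np.atleast_2d(a), np.atleast_2d(a)), axis=0)  # same result

flat = np.concatenate((a, a), axis=0)   # no promotion: shape (10,)
```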

Gary R.

On Wed, Mar 30, 2011 at 9:30 PM, andrew nelson andyf...@gmail.com wrote:
 Dear List,
 I have a quick question regarding vstack and concatenate.
 In the docs for vstack it says that:

 np.concatenate(tup, axis=0)

 should be equivalent to:

 np.vstack(tup)

 However, I tried this out and it doesn't seem to be case, i.e.

 np.vstack((np.arange(5.), np.arange(5.)))
 array([[ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.]])

 np.concatenate((np.arange(5.),np.arange(5.)), axis=0)
 array([ 0.,  1.,  2.,  3.,  4.,  0.,  1.,  2.,  3.,  4.])

 These aren't the same. Maybe I'm missing something?

 regards,
 Andrew.


Re: [Numpy-discussion] Question regarding concatenate/vstack.

2011-03-30 Thread Ralf Gommers
On Wed, Mar 30, 2011 at 1:42 PM, gary ruben gru...@bigpond.net.au wrote:
 You're right, they are not equivalent. vstack will happily create an
 array of higher rank than the parts it is stacking, whereas
 concatenate requires the arrays it is working with to already be at
 least 2d, so the equivalent is
 np.concatenate((np.arange(5.)[newaxis],np.arange(5.)[newaxis]), axis=0)
 or
 np.concatenate((np.atleast_2d(np.arange(5.)),np.atleast_2d(np.arange(5.))),
 axis=0)

This is fixed in the docstring now.

Ralf

 On Wed, Mar 30, 2011 at 9:30 PM, andrew nelson andyf...@gmail.com wrote:
 Dear List,
 I have a quick question regarding vstack and concatenate.
 In the docs for vstack it says that:

 np.concatenate(tup, axis=0)

 should be equivalent to:

 np.vstack(tup)

 However, I tried this out and it doesn't seem to be case, i.e.

 np.vstack((np.arange(5.), np.arange(5.)))
 array([[ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.]])

 np.concatenate((np.arange(5.),np.arange(5.)), axis=0)
 array([ 0.,  1.,  2.,  3.,  4.,  0.,  1.,  2.,  3.,  4.])

 These aren't the same. Maybe I'm missing something?

 regards,
 Andrew.


Re: [Numpy-discussion] bug in genfromtxt for python 3.2

2011-03-30 Thread Ralf Gommers
On Wed, Mar 30, 2011 at 3:39 AM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Mon, Mar 28, 2011 at 11:29 PM,  josef.p...@gmail.com wrote:
 numpy/lib/test_io.py    only uses StringIO in the test, no actual csv file

 If I give the filename then I get a TypeError: Can't convert 'bytes'
 object to str implicitly


 from the statsmodels mailing list example

 >>> data = recfromtxt(open('./star98.csv', 'U'), delimiter=',',
 ...                   skip_header=1, dtype=float)
 Traceback (most recent call last):
   File "<pyshell#30>", line 1, in <module>
     data = recfromtxt(open('./star98.csv', 'U'), delimiter=',',
 skip_header=1, dtype=float)
   File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py",
 line 1633, in recfromtxt
     output = genfromtxt(fname, **kwargs)
   File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py",
 line 1181, in genfromtxt
     first_values = split_line(first_line)
   File "C:\Programs\Python32\lib\site-packages\numpy\lib\_iotools.py",
 line 206, in _delimited_splitter
     line = line.split(self.comments)[0].strip(asbytes(" \r\n"))
 TypeError: Can't convert 'bytes' object to str implicitly

 Is the right fix for this to open a 'filename' passed to genfromtxt,
 as 'binary' (bytes)?

 If so I will submit a pull request with a fix and a test,

Seems to work and is what was intended I think, see Pauli's
changes/notes in commit 0f2e7db0.

This is ticket #1607 by the way.

Cheers,
Ralf


[Numpy-discussion] Warning: invalid value encountered in true_divide?

2011-03-30 Thread Joon Ro
Hi,

After numpy upgrade, I started to get "Warning: invalid value encountered in
true_divide" when I run code which did not show any warning previously.
What does it mean, and where should I look to fix this? It does not stop my
debugger, so I could not identify where the message was from.

Thank you,
Joon


Re: [Numpy-discussion] Warning: invalid value encountered in true_divide?

2011-03-30 Thread Robert Kern
On Wed, Mar 30, 2011 at 12:12, Joon Ro joonp...@gmail.com wrote:
 Hi,
 After numpy upgrade, I started to get "Warning: invalid value encountered in
 true_divide" when I run code which did not show any warning previously.
 What does it mean and where should I look to fix this?

It means that a NaN popped up in a division somewhere. It always was
there, but some previous versions of numpy had the warnings
unintentionally silenced.

 It does not stop my
 debugger so I could not identify where the message was from.

You can use np.seterr() to change how these warnings are printed. In
particular, you can cause an exception to be raised so that you can
use a debugger to locate the source.

  http://docs.scipy.org/doc/numpy/reference/generated/numpy.seterr.html
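Robert's suggestion looks like this in practice — escalate the 'invalid' warning to an exception so a debugger stops at the offending division (the 0/0 below is just a stand-in for whatever produces the NaN):

```python
import numpy as np

old = np.seterr(invalid='raise')   # escalate 'invalid' warnings to exceptions
try:
    np.zeros(1) / np.zeros(1)      # 0/0 -> nan triggers the warning seen above
    caught = False
except FloatingPointError:
    caught = True                  # a debugger launched here lands on the culprit
finally:
    np.seterr(**old)               # restore the previous error state
```

Using `with np.errstate(invalid='raise'):` achieves the same scoped effect without manual save/restore.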

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] bug in genfromtxt for python 3.2

2011-03-30 Thread Matthew Brett
Hi,

On Wed, Mar 30, 2011 at 10:02 AM, Ralf Gommers
ralf.gomm...@googlemail.com wrote:
 On Wed, Mar 30, 2011 at 3:39 AM, Matthew Brett matthew.br...@gmail.com 
 wrote:
 Hi,

 On Mon, Mar 28, 2011 at 11:29 PM,  josef.p...@gmail.com wrote:
 numpy/lib/test_io.py    only uses StringIO in the test, no actual csv file

 If I give the filename than I get a  TypeError: Can't convert 'bytes'
 object to str implicitly


 from the statsmodels mailing list example

 >>> data = recfromtxt(open('./star98.csv', 'U'), delimiter=',',
 ...                   skip_header=1, dtype=float)
 Traceback (most recent call last):
   File "<pyshell#30>", line 1, in <module>
     data = recfromtxt(open('./star98.csv', 'U'), delimiter=',',
 skip_header=1, dtype=float)
   File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py",
 line 1633, in recfromtxt
     output = genfromtxt(fname, **kwargs)
   File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py",
 line 1181, in genfromtxt
     first_values = split_line(first_line)
   File "C:\Programs\Python32\lib\site-packages\numpy\lib\_iotools.py",
 line 206, in _delimited_splitter
     line = line.split(self.comments)[0].strip(asbytes(" \r\n"))
 TypeError: Can't convert 'bytes' object to str implicitly

 Is the right fix for this to open a 'filename' passed to genfromtxt,
 as 'binary' (bytes)?

 If so I will submit a pull request with a fix and a test,

 Seems to work and is what was intended I think, see Pauli's
 changes/notes in commit 0f2e7db0.

 This is ticket #1607 by the way.

Thanks for making a ticket.  I've submitted a pull request for the fix
and linked to it from the ticket.

The reason I asked whether this was the correct fix was:

imagine I'm working with a non-latin default encoding, and I've opened a file:

fobj = open('my_nonlatin.txt', 'rt')

in python 3.2.  That might contain numbers and non-latin text.   I
can't pass that into 'genfromtxt' because it will give me this error
above.  I can pass it is as binary but then I'll get garbled text.

Should those functions also allow unicode-providing files (perhaps
with binary as default for speed)?

See you,

Matthew


Re: [Numpy-discussion] bug in genfromtxt for python 3.2

2011-03-30 Thread Ralf Gommers
On Wed, Mar 30, 2011 at 7:37 PM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Wed, Mar 30, 2011 at 10:02 AM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
 On Wed, Mar 30, 2011 at 3:39 AM, Matthew Brett matthew.br...@gmail.com 
 wrote:
 Hi,

 On Mon, Mar 28, 2011 at 11:29 PM,  josef.p...@gmail.com wrote:
 numpy/lib/test_io.py    only uses StringIO in the test, no actual csv file

 If I give the filename than I get a  TypeError: Can't convert 'bytes'
 object to str implicitly


 from the statsmodels mailing list example

 >>> data = recfromtxt(open('./star98.csv', 'U'), delimiter=',',
 ...                   skip_header=1, dtype=float)
 Traceback (most recent call last):
   File "<pyshell#30>", line 1, in <module>
     data = recfromtxt(open('./star98.csv', 'U'), delimiter=',',
 skip_header=1, dtype=float)
   File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py",
 line 1633, in recfromtxt
     output = genfromtxt(fname, **kwargs)
   File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py",
 line 1181, in genfromtxt
     first_values = split_line(first_line)
   File "C:\Programs\Python32\lib\site-packages\numpy\lib\_iotools.py",
 line 206, in _delimited_splitter
     line = line.split(self.comments)[0].strip(asbytes(" \r\n"))
 TypeError: Can't convert 'bytes' object to str implicitly

 Is the right fix for this to open a 'filename' passed to genfromtxt,
 as 'binary' (bytes)?

 If so I will submit a pull request with a fix and a test,

 Seems to work and is what was intended I think, see Pauli's
 changes/notes in commit 0f2e7db0.

 This is ticket #1607 by the way.

 Thanks for making a ticket.  I've submitted a pull request for the fix
 and linked to it from the ticket.

 The reason I asked whether this was the correct fix was:

 imagine I'm working with a non-latin default encoding, and I've opened a file:

 fobj = open('my_nonlatin.txt', 'rt')

 in python 3.2.  That might contain numbers and non-latin text.   I
 can't pass that into 'genfromtxt' because it will give me this error
 above.  I can pass it is as binary but then I'll get garbled text.

I admit the string/bytes thing is still a little confusing to me, but
isn't that always going to be a problem (even with python 2.x)?
There's no way for genfromtxt to know what the encoding of an
arbitrary file is. So your choices are garbled text or an error.
Garbled text is better.

It may help to explicitly say in the docstring that this is an ASCII
routine (as it does in the source code).

Ralf


 Should those functions also allow unicode-providing files (perhaps
 with binary as default for speed)?


Re: [Numpy-discussion] bug in genfromtxt for python 3.2

2011-03-30 Thread Pauli Virtanen
On Wed, 30 Mar 2011 10:37:45 -0700, Matthew Brett wrote:
[clip]
 imagine I'm working with a non-latin default encoding, and I've opened a
 file:
 
 fobj = open('my_nonlatin.txt', 'rt')
 
 in python 3.2.  That might contain numbers and non-latin text.   I can't
 pass that into 'genfromtxt' because it will give me this error above.  I
 can pass it is as binary but then I'll get garbled text.

That's the way it also works on Python 2. The text is not garbled -- it's 
just in some binary representation that you can later on decode to 
unicode:

>>> np.array(['asd']).view(np.chararray).decode('utf-8')
array([u'asd'],
      dtype='<U3')

Granted, utf-16 and the ilk might be problematic.

 Should those functions also allow unicode-providing files (perhaps with
 binary as default for speed)?

Nobody has yet asked for this feature as far as I know, so I guess the 
need for it is pretty low.

Personally, I don't think going unicode makes much sense here. First, it 
would be a Py3-only feature. Second, there is a real need for it only 
when dealing with multibyte encodings, which are seldom used these days 
with utf-8 rightfully dominating.

-- 
Pauli Virtanen



[Numpy-discussion] Old tickets

2011-03-30 Thread Bruce Southey
Hi,
This is a followup on the tickets that I had previously flagged. I want to
thank Mark, Ralf, and everyone else for going over those!

For those that I followed, I generally agreed with the outcome.

Ticket 301: 'Make power and divide return floats from int inputs (like 
true_divide)'
http://projects.scipy.org/numpy/ticket/301
Invalid because the output dtype is the same as the input dtype unless 
you override using the dtype argument:
>>> np.power(3, 1, dtype=np.float128).dtype
dtype('float128')
Alternatively return a float and indicate in the docstring that the 
output dtype can be changed.

Ticket 354: 'Possible inconsistency in 0-dim and scalar empty array types'
http://projects.scipy.org/numpy/ticket/354
Invalid because an empty array is not the same as an empty string.

Ticket 1071: 'loadtxt fails if the last column contains empty value'
http://projects.scipy.org/numpy/ticket/1071
Invalid mainly because loadtxt states that 'Each row in the text file
must have the same number of values.' So of course loadtxt must fail when
there are missing values.

Ticket 1374: 'Ticket 628 not fixed for Solaris (polyfit uses 100% CPU 
and does not stop)'
http://projects.scipy.org/numpy/ticket/1374
Unless this can be verified it should be set as needs_info.

Bruce



Re: [Numpy-discussion] bug in genfromtxt for python 3.2

2011-03-30 Thread Matthew Brett
Hi,

On Wed, Mar 30, 2011 at 11:32 AM, Pauli Virtanen p...@iki.fi wrote:
 On Wed, 30 Mar 2011 10:37:45 -0700, Matthew Brett wrote:
 [clip]
 imagine I'm working with a non-latin default encoding, and I've opened a
 file:

 fobj = open('my_nonlatin.txt', 'rt')

 in python 3.2.  That might contain numbers and non-latin text.   I can't
 pass that into 'genfromtxt' because it will give me this error above.  I
 can pass it is as binary but then I'll get garbled text.

 That's the way it also works on Python 2. The text is not garbled -- it's
 just in some binary representation that you can later on decode to
 unicode:

 np.array(['asd']).view(np.chararray).decode('utf-8')
 array([u'asd'],
      dtype='U3')

 Granted, utf-16 and the ilk might be problematic.

 Should those functions also allow unicode-providing files (perhaps with
 binary as default for speed)?

 Nobody has yet asked for this feature as far as I know, so I guess the
 need for it is pretty low.

 Personally, I don't think going unicode makes much sense here. First, it
 would be a Py3-only feature. Second, there is a real need for it only
 when dealing with multibyte encodings, which are seldom used these days
 with utf-8 rightfully dominating.

It's not a feature I need, but then, I'm afraid all the languages I've
been taught are latin-1.  Oh, except I learnt a tiny bit of Greek.
But I don't use it for work :)

I suppose the annoyances would be:

1) Probably temporary surprise that genfromtxt(open('my_file.txt',
'rt')) generates this error
2) Having to go back over returned arrays decoding stuff for utf-8
3) Wrong results for other encodings

Maybe the best way is a graceful warning on entry to the routine?

Best,

Matthew


Re: [Numpy-discussion] should get rid of the annoying numpy STDERR output

2011-03-30 Thread Ralf Gommers
On Thu, Mar 24, 2011 at 5:25 PM, Ralf Gommers
ralf.gomm...@googlemail.com wrote:
 On Thu, Mar 24, 2011 at 5:11 PM, Robert Kern robert.k...@gmail.com wrote:
 2011/3/24 Dmitrey tm...@ukr.net:
 >>> from numpy import inf, array
 >>> inf*0
 nan

 (ok)

 >>> array(inf) * 0.0
 StdErr: Warning: invalid value encountered in multiply
 nan

 In my loops this fires thousands of times, slowing the computation and
 making the text output completely unreadable.

 >>> from numpy import __version__
 >>> __version__
 '2.0.0.dev-1fe8136'

 We really should change the default to 'warn' for numpy 2.0. Maybe
 even for numpy 1.6. We've talked about it before, and I think most
 people were in favor. We just never pulled the trigger.

 Old thread on this topic:
 http://thread.gmane.org/gmane.comp.python.numeric.general/35664

 Devs, what say you?

 Works for me, also for 1.6.

Hi, just pinging this issue. If this is to happen for 1.6 it should go
in the next beta (probably this weekend, only waiting for the
genfromtxt issue to be resolved).

Some more input would be good. As would a patch.

Thanks,
Ralf


Re: [Numpy-discussion] Old tickets

2011-03-30 Thread Benjamin Root
On Wed, Mar 30, 2011 at 2:37 PM, Bruce Southey bsout...@gmail.com wrote:

 Hi,
 This followup on tickets that I had previously indicated. So I want to
 thank Mark, Ralph and any people for going over those!

 For those that I followed I generally agreed with the outcome.

 Ticket 301: 'Make power and divide return floats from int inputs (like
 true_divide)'
 http://projects.scipy.org/numpy/ticket/301
 Invalid because the output dtype is the same as the input dtype unless
 you override using the dtype argument:
   np.power(3, 1, dtype=np.float128).dtype
 dtype('float128')
 Alternatively return a float and indicate in the docstring that the
 output dtype can be changed.


FWIW,

Just thought I'd note (on a python 2.6 system):

>>> import numpy as np
>>> a = np.array([1, 2, 3, 4])
>>> a.dtype
dtype('int32')
>>> 2 / a
array([2, 1, 0, 0])
>>> from __future__ import division
>>> 2 / a
array([ 2.        ,  1.        ,  0.66666667,  0.5       ])

So, numpy already does this when true division is imported (and therefore
consistent with whatever the python environment does), and python currently
also returns integers for exponentials when both inputs are integers.

Ben Root


Re: [Numpy-discussion] Old tickets

2011-03-30 Thread Derek Homeier

On 30 Mar 2011, at 23:26, Benjamin Root wrote:

 Ticket 301: 'Make power and divide return floats from int inputs (like
 true_divide)'
 http://projects.scipy.org/numpy/ticket/301
 Invalid because the output dtype is the same as the input dtype unless
 you override using the dtype argument:
   np.power(3, 1, dtype=np.float128).dtype
 dtype('float128')
 Alternatively return a float and indicate in the docstring that the
 output dtype can be changed.


 FWIW,

 Just thought I'd note (on a python 2.6 system):

 >>> import numpy as np
 >>> a = np.array([1, 2, 3, 4])
 >>> a.dtype
 dtype('int32')
 >>> 2 / a
 array([2, 1, 0, 0])
 >>> from __future__ import division
 >>> 2 / a
 array([ 2.        ,  1.        ,  0.66666667,  0.5       ])

 So, numpy already does this when true division is imported (and  
 therefore consistent with whatever the python environment does), and  
 python currently also returns integers for exponentials when both  
 inputs are integers.

I'd agree, and in my view power(3, -1) is well defined as 1 / 3 -  
also, in future (or Python3)

>>> a/2
array([ 0.5,  1. ,  1.5,  2. ])
>>> a//2
array([0, 1, 1, 2], dtype=int32)

so I think at least a**n should follow integer math rules; depends on  
whether we want
np.power to behave differently from ** (if they are internally handled  
separately at all)...
Not sure if I understand the overload suggestion in the ticket, but  
maybe a solution
could be using the output argument (if an explicit optional dtype is  
not an option):

>>> b = np.zeros(2, dtype=np.int32)
>>> np.power(np.arange(1,3), -2, b)
array([1, 0])
>>> b = np.zeros(2)
>>> np.power(np.arange(1,3), -2, b)
array([ 1.,  0.])
 ^^
this could be changed to array([ 1.,  0.25])
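A note for later readers of the archive: current NumPy releases reject integer arrays raised to negative integer powers outright, so the floating result discussed here has to be requested via a float operand (or dtype):

```python
import numpy as np

# np.power(np.arange(1, 3), -2) now raises ValueError for integer inputs;
# casting an operand to float yields the exact reciprocal squares.
r = np.power(np.arange(1, 3, dtype=float), -2)
# r == array([1.  , 0.25])
```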

Cheers,
Derek



Re: [Numpy-discussion] Old tickets

2011-03-30 Thread Derek Homeier
Hi,

On 30 Mar 2011, at 21:37, Bruce Southey wrote:

 Ticket 1071: 'loadtxt fails if the last column contains empty value'
 http://projects.scipy.org/numpy/ticket/1071
 Invalid mainly because loadtxt states that 'Each row in the text file
 must have the same number of values.' So of cause loadtxt must fail  
 when
 there are missing values.

I don't follow the line of argument - see my comment on the ticket.
This covers cases where missing values could always have been
caught by the user - Converters can also be used to
 provide a default value for missing data:
 ``converters = {3: lambda s: float(s or 0)}``.

The ticket simply addresses the issue that delimiter='\t' is treated
differently from other delimiters if (and only if) the missing value is
the last item in the row.

Cheers,
Derek



Re: [Numpy-discussion] should get rid of the annoying numpy STDERR output

2011-03-30 Thread Robert Kern
On Wed, Mar 30, 2011 at 16:03, Ralf Gommers ralf.gomm...@googlemail.com wrote:
 On Thu, Mar 24, 2011 at 5:25 PM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
 On Thu, Mar 24, 2011 at 5:11 PM, Robert Kern robert.k...@gmail.com wrote:

 We really should change the default to 'warn' for numpy 2.0. Maybe
 even for numpy 1.6. We've talked about it before, and I think most
 people were in favor. We just never pulled the trigger.

 Old thread on this topic:
 http://thread.gmane.org/gmane.comp.python.numeric.general/35664

 Devs, what say you?

 Works for me, also for 1.6.

 Hi, just pinging this issue. If this is to happen for 1.6 it should go
 in the next beta (probably this weekend, only waiting for the
 genfromtxt issue to be resolved).

 Some more input would be good. As would a patch.

Patch:

  https://github.com/numpy/numpy/pull/65

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] 1.6.0b1 half float buffer bug?

2011-03-30 Thread Eli Stevens (Gmail)
On Fri, Mar 25, 2011 at 10:00 AM, Eli Stevens (Gmail)
wickedg...@gmail.com wrote:
 Can anyone please give me some suggestions on how to go about writing
 a unit test for this?  Or should I just submit a pull request?

I've gotten a bit of positive feedback on adding the 'e' type to the
struct module on the python-ideas list (per my understanding, not
before Python 3.3, but I don't think that should hinder adoption in
other libraries), so I'd like to ask again about unit testing a change
like this. Can anyone offer some advice on where to start?

Also, what kind of timeframe / cutoff am I looking at to get this into
1.6.0 or 1.6.x?

Thanks,
Eli


Re: [Numpy-discussion] loadtxt/savetxt tickets

2011-03-30 Thread Charles R Harris
On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes 
paul.anton.let...@gmail.com wrote:


 On 26. mars 2011, at 21.44, Derek Homeier wrote:

  Hi Paul,
 
  having had a look at the other tickets you dug up,
 
  My opinions are my own, and in detail, they are:
  1752:
I attach a possible patch. FWIW, I agree with the request. The
  patch is written to be compatible with the fix in ticket #1562, but
  I did not test that yet.
 
  Tested, see also my comments on Trac.

 Great!

  1731:
This seems like a rather trivial feature enhancement. I attach a
  possible patch.
 
  Agreed. Haven't tested it though.

 Great!

  1616:
The suggested patch seems reasonable to me, but I do not have a
  full list of what objects loadtxt supports today as opposed to what
  this patch will support.

 Looks like you got this one. Just remember to make it compatible with
 #1752. Should be easy.

  1562:
I attach a possible patch. This could also be the default
  behavior to my mind, since the function caller can simply call
  numpy.squeeze if needed. Changing default behavior would probably
  break old code, however.
 
  See comments on Trac as well.

 Your patch is better, but there is one thing I disagree with.
 808    if X.ndim < ndmin:
 809        if ndmin == 1:
 810            X.shape = (X.size, )
 811        elif ndmin == 2:
 812            X.shape = (X.size, 1)
 The last line should be:
 812            X.shape = (1, X.size)
 If someone wants a 2D array out, they would most likely expect a one-row
 file to come out as a one-row array, not the other way around. IMHO.
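For illustration, the one-row case under discussion (ndmin was new in 1.6; modern NumPy does return one-row files as one-row arrays, as argued above):

```python
import io
import numpy as np

# One-row file: with ndmin=2, loadtxt keeps it as one row
a = np.loadtxt(io.StringIO("1.0 2.0 3.0\n"), ndmin=2)
assert a.shape == (1, 3)

# One-column file: the same option gives a column vector instead
b = np.loadtxt(io.StringIO("1.0\n2.0\n3.0\n"), ndmin=2)
assert b.shape == (3, 1)
```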

  1458:
The fix suggested in the ticket seems reasonable, but I have
  never used record arrays, so I am not sure of this.
 
  There were some issues with Python3, and I also had some general
  reservations
  as noted on Trac - basically, it makes 'unpack' equivalent to
  transposing for 2D-arrays,
  but to splitting into fields for 1D-recarrays. My question was, what's
  going to happen
  when you get to 2D-recarrays? Currently this is not an issue since
  loadtxt can only
  read 2D regular or 1D structured arrays. But this might change if the
  data block
  functionality (see below) were to be implemented - data could then be
  returned as
  3D arrays or 2D structured arrays... Still, it would probably make
  most sense (or at
  least give the widest functionality) to have 'unpack=True' always
  return a list or iterator
  over columns.

 OK, I don't know recarrays, as I said.
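For reference, what 'unpack' does in the regular 2-D case (a transpose, so each variable receives one column):

```python
import io
import numpy as np

# unpack=True transposes the result so columns unpack into variables
x, y = np.loadtxt(io.StringIO("1 4\n2 5\n3 6\n"), unpack=True)
assert x.tolist() == [1.0, 2.0, 3.0]
assert y.tolist() == [4.0, 5.0, 6.0]
```

The open question above is whether structured/record arrays should unpack the same way, i.e. into one array per field.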

  1445:
Adding this functionality could break old code, as some old
  datafiles may have empty lines which are now simply ignored. I do
  not think the feature is a good idea. It could rather be implemented
  as a separate function.
  1107:
I do not see the need for this enhancement. In my eyes, the
  usecols kwarg does this and more. Perhaps I am misunderstanding
  something here.
 
  Agree about #1445, and the bit about 'usecols' - 'numcols' would just
  provide a
  shorter call to e.g. read the first 20 columns of a file (well, not
  even that much
  over 'usecols=range(20)'...), don't think that justifies an extra
  argument.
  But the 'datablocks' provides something new, that a number of people
  seem
  to miss from e.g. gnuplot (including me, actually ;-). And it would
  also satisfy the
  request from #1445 without breaking backwards compatibility.
  I've been wondering if one could instead specify the separator lines
  through a parameter, e.g. blocksep=['None', 'blank', 'invalid'], not sure if
  that would make
  it more useful...

 What about writing a separate function, e.g. loadblocktxt, and have it
 separate the chunks and call loadtxt for each chunk? Just a thought. Another
 possibility would be to write a function that would let you load a set of
 text files in a directory, and return a dict of datasets, one per file. One
 could write a similar save-function, too. They would just need to call
 loadtxt/savetxt on a per-file basis.
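The separate-function idea could be sketched roughly like this (loadblocktxt is hypothetical, not part of NumPy; it splits on blank lines the way gnuplot does and delegates each block to loadtxt):

```python
import io
import numpy as np

def loadblocktxt(f, **kwargs):
    # Hypothetical helper: split gnuplot-style blocks on blank lines
    # and run np.loadtxt once per block, returning a list of arrays.
    blocks = [b for b in f.read().split("\n\n") if b.strip()]
    return [np.loadtxt(io.StringIO(b), **kwargs) for b in blocks]

data = io.StringIO("1 2\n3 4\n\n5 6\n7 8\n")
arrays = loadblocktxt(data)
assert len(arrays) == 2
assert arrays[1].tolist() == [[5.0, 6.0], [7.0, 8.0]]
```

This keeps loadtxt itself unchanged and sidesteps the backwards-compatibility concern about blank lines raised under #1445.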

  1071:
   It is not clear to me whether loadtxt is supposed to support
  missing values in the fashion indicated in the ticket.
 
  In principle it should at least allow you to, by the use of converters
  as described there.
  The problem is, the default delimiter is described as 'any
  whitespace', which in the
  present implementation obviously includes any number of blanks or
  tabs. These
  are therefore treated differently from delimiters like ',' or '&'. I'd
  reckon there are
  too many people actually relying on this behaviour to silently change it
  (e.g. I know plenty of tables with columns separated by either one or
  several
  tabs depending on the length of the previous entry). But the tab is
  apparently also
  treated differently if explicitly specified with delimiter='\t' -
  and in that case using
  a converter à la {2: lambda s: float(s or 'Nan')} is working for
  fields in the middle of
  the line, but not at the end - clearly warrants improvement. I've
  prepared a patch
  working for Python3