Re: [Numpy-discussion] something wrong with docs?

2009-09-23 Thread David Goldsmith
On Tue, Sep 22, 2009 at 9:29 PM, Fernando Perez fperez@gmail.com wrote:

 On Tue, Sep 22, 2009 at 7:31 PM, David Goldsmith wrote:
  is there a standard for these a la the docstring standard, or some other
  extant way to promulgate and strengthen your suggestion (after proper
  community vetting, of course)?

 I'm not sure what you mean here, sorry.  I simply don't understand
 what you are looking to strengthen or what standard there could be:
 this is regular code that goes into reST blocks.  Sorry if I missed
 your point...


 It would be nice if we could move gradually
 towards docs whose examples (at least those marked as such) were
 always run via sphinx.

That's a suggestion, but given your point, it seems like you'd advocate it
being more than that, no?

DG
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] something wrong with docs?

2009-09-23 Thread Pauli Virtanen
Tue, 22 Sep 2009 23:15:56 -0700, David Goldsmith wrote:
[clip]
 It would be nice if we could move gradually towards docs whose examples
 (at least those marked as such) were always run via sphinx.

Also the >>> examples are doctestable, via numpy.test(doctests=True), or by 
enabling Sphinx's doctest extension and its support for those.

What Fernando said about them being more clumsy to write and copy than 
separate code directives is of course true. I wonder if there's a 
technical fix that could be made in Sphinx, at least for HTML, to correct 
this...

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Deserialized arrays with base mutate strings

2009-09-23 Thread Hrvoje Niksic
Numpy arrays with the base property are deserialized as arrays
pointing to a storage contained within a Python string.  This is a
problem since such arrays are mutable and can mutate existing strings.
Here is how to create one:

>>> import numpy, cPickle as p
>>> a = numpy.array([1, 2, 3])   # create an array
>>> b = a[::-1]                  # create a view
>>> b
array([3, 2, 1])
>>> b.base                       # view's base is the original array
array([1, 2, 3])
>>> c = p.loads(p.dumps(b, -1))  # roundtrip the view through pickle
>>> c
array([3, 2, 1])
>>> c.base                       # base is now a simple string:
'\x03\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00'
>>> s = c.base
>>> s
'\x03\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00'
>>> type(s)
<type 'str'>
>>> c[0] = 4                     # when the array is mutated...
>>> s                            # ...the string changes value!
'\x04\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00'

This is somewhat disconcerting, as Python strings are supposed to be
immutable.  In this case the string was created by numpy and is probably
not shared by anyone, so it doesn't present a problem in practice.  But
in corner cases it can lead to serious bugs.  Python has a cache of
one-letter strings, which cannot be turned off.  This means that
one-byte array views can change existing Python strings used elsewhere
in the code.  For example:

>>> a = numpy.array([65], 'int8')
>>> b = a[::-1]
>>> c = p.loads(p.dumps(b, -1))
>>> c
array([65], dtype=int8)
>>> c.base
'A'
>>> c[0] = 66
>>> c.base
'B'
>>> 'A'
'B'

Note how changing a numpy array permanently changed the contents of all
'A' strings in this python instance, rendering python unusable.

The fix should be straightforward: use a string subclass (which will
skip the one-letter cache), or an entirely separate type for storage of
base memory referenced by deserialized arrays.
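As an aside (a sketch, not from the original mail): the one-letter cache can be observed directly, which is why a mutable buffer aliasing a cached string is so dangerous. The identity behaviour below is CPython-specific:

```python
# CPython caches single-character latin-1 strings: every runtime path
# that produces 'A' yields the *same* object.  A writable buffer that
# aliases and mutates that object would therefore corrupt every 'A'
# in the interpreter, as the report above demonstrates.
a = chr(65)      # 'A', constructed at runtime
b = 'AB'[0]      # a different runtime path to 'A'
print(a is b)    # True on CPython: both refer to the cached object
```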

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deserialized arrays with base mutate strings

2009-09-23 Thread Pauli Virtanen
Wed, 23 Sep 2009 09:15:44 +0200, Hrvoje Niksic wrote:
[clip]
 Numpy arrays with the base property are deserialized as arrays
 pointing to a storage contained within a Python string.  This is a
 problem since such arrays are mutable and can mutate existing strings.
 Here is how to create one:

Please file a bug ticket in the Trac, thanks!

Here is a simpler way, although one more difficult to trigger accidentally:

>>> a = numpy.frombuffer('A', dtype='S1')
>>> a.flags.writeable = True
>>> b = 'A'
>>> a[0] = 'B'
>>> b
'B'

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deserialized arrays with base mutate strings

2009-09-23 Thread Hrvoje Niksic
Pauli Virtanen wrote:
 Wed, 23 Sep 2009 09:15:44 +0200, Hrvoje Niksic wrote:
 [clip]
 Numpy arrays with the base property are deserialized as arrays
 pointing to a storage contained within a Python string.  This is a
 problem since such arrays are mutable and can mutate existing strings.
 Here is how to create one:
 
 Please file a bug ticket in the Trac, thanks!

Done - ticket #1233.

 Here is a simpler way, although one more difficult to trigger accidentally:
 
 >>> a = numpy.frombuffer('A', dtype='S1')
 >>> a.flags.writeable = True
 >>> b = 'A'
 >>> a[0] = 'B'
 >>> b
 'B'

I guess this one could be prevented by verifying that the buffer is 
writable when setting the writeable flag.  When deserializing arrays, I 
don't see a reason for the base property to even exist - sharing of the 
buffer between different views is not preserved anyway, as reported in 
my other thread.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] current status of numpy - Excel

2009-09-23 Thread Tim Michelsen
FYI:

Here is a summary of how one can 
1) write numpy arrays to Excel
2) interact with numpy/scipy/... from Excel

http://groups.google.com/group/python-excel/msg/3881b7e7ae210cc7

Best regards,
Timmie

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy and cython in pure python mode

2009-09-23 Thread Dag Sverre Seljebotn
Robert Kern wrote:
 On Tue, Sep 22, 2009 at 01:33, Sebastian Haase seb.ha...@gmail.com wrote:
   
 Hi,
 I'm not subscribed to the cython list - hoping enough people would
 care to justify my post here:
 
The post might be justified, but it is a question of available knowledge 
as well. I nearly missed this post here. The Cython user list is on:

http://groups.google.com/group/cython-users

 I know that cython's numpy is still getting better and better over
 time, but is it already today possible to have numpy support when
 using Cython in pure python mode?
 I like the idea of being able to develop and debug code the python
 way -- and then just switching on the cython-overdrive mode.
 (Otherwise I have very good experience using C/C++ with appropriate
 typemaps, and I don't mind the C syntax)

 I only recently learned about the pure python mode on the sympy list
 (and at the EuroScipy2009 workshop).
 My understanding is that Cython's pure Python mode could be played
 in two ways:
 a) either not having a .pyx-file at all and putting everything into a
 py-file (using the import cython stuff)
 or b) putting only cython specific declaration in to a pyx file having
 the same basename as the py-file next to it.
 
That should be a .pxd file with the same basename. And I think that mode 
(b, that is) should work.

Sturla's note on the memory view syntax doesn't apply as that's not in a 
released version of Cython yet, and won't be until 0.12.1 or 0.13. But 
that could be made to support Python mode a).

Finally there's been some recent discussion on cython-dev about a tool 
which can take a pyx file as input and output pure Python.
   
 One more: there is no way to reload cython modules (yet), right?
 

 Correct. There is no way to reload any extension module.
   
This can be worked around (in most situations that arise in practice) by 
compiling the module with a new name each time and importing things from 
it, though. Sage already kind of supports it (for the %attach feature 
only), and there are patches around for pyximport in Cython that are just 
lacking testing and review. Since pyximport lacks a test suite 
altogether, nobody ever seems to get around to that.

Dag Sverre
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Numpy 2D array from a list error

2009-09-23 Thread davew0000

Hi, 

I've got a fairly large (but not huge, 58 MB) tab-separated text file, with
approximately 200 columns and 56k rows of numbers and strings. 

Here's a snippet of my code to create a numpy matrix from the data file... 

 

data = map(lambda x : x.strip().split('\t'), sys.stdin.readlines())
data = array(data)

###

The second line, data = array(data), causes the following error:

 ValueError: setting an array element with a sequence 

If I take the first 40,000 lines of the file, it works fine. 
If I take the last 40,000 lines of the file, it also works fine, so it isn't
a problem with the file. 

I've found a few other posts complaining of the same problem, but none of
their fixes work. 

It seems like a memory problem to me. This was reinforced when I tried to
break the dataset into 3 chunks and stack the resulting arrays - I got an
error message saying memory error. 
I don't really understand why reading in this 58 MB text file is taking up
~2 GB of RAM.

Any advice? Thanks in advance 

Dave
-- 
View this message in context: 
http://www.nabble.com/Numpy-2D-array-from-a-list-error-tp25531145p25531145.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] simple indexing question

2009-09-23 Thread Neal Becker
I have an array:
In [12]: a
Out[12]: 
array([[0, 1, 2, 3, 4],
   [5, 6, 7, 8, 9]])

And a selection array:
In [13]: b
Out[13]: array([1, 1, 1, 1, 1])

I want a 1-dimensional output, where the array b selects an element from 
each column of a: if b[i] == k, select the element from the kth row of a 
in that column.

Easy way to do this?  (Not a[b], that gives 5x5 array output)

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] simple indexing question

2009-09-23 Thread Robert Cimrman
Neal Becker wrote:
 I have an array:
 In [12]: a
 Out[12]: 
 array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
 
 And a selection array:
 In [13]: b
 Out[13]: array([1, 1, 1, 1, 1])
 
 I want a 1-dimensional output, where the array b selects an element from 
 each column of a, where if b[i]=0 select element from 0th row of a and if 
 b[i]=k select element from kth row of a.
 
 Easy way to do this?  (Not a[b], that gives 5x5 array output)

It might be stupid, but it works...

In [51]: a
Out[51]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])

In [52]: b = [0,1,0,1,0]

In [53]: a.T.flat[a.shape[0]*np.arange(a.shape[1])+b]
Out[53]: array([0, 6, 2, 8, 4])

cheers,
r.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Create numpy array from a list error

2009-09-23 Thread Dave Wood
Hi all,

I've got a fairly large (but not huge, 58 MB) tab-separated text file, with
approximately 200 columns and 56k rows of numbers and strings.

Here's a snippet of my code to create a numpy matrix from the data file...



data = map(lambda x : x.strip().split('\t'), sys.stdin.readlines())
data = array(data)

###

It causes the following error:

data = array(data)
  ValueError: setting an array element with a sequence

If I take the first 40,000 lines of the file, it works fine.
If I take the last 40,000 lines of the file, it also works fine, so it isn't
a problem with the file.

I've found a few other posts complaining of the same problem, but none of
their fixes work.

It seems like a memory problem to me. This was reinforced when I tried to
break the dataset into 3 chunks and stack the resulting arrays - I got an
error message saying memory error.
Also, I don't really understand why reading in this 58 MB text file is
taking up ~2 GB of RAM.

Any advice? Thanks in advance

Dave
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy 2D array from a list error

2009-09-23 Thread Bruce Southey
On 09/23/2009 08:42 AM, davew wrote:
 Hi,

 I've got a fairly large (but not huge, 58mb) tab seperated text file, with
 approximately 200 columns and 56k rows of numbers and strings.

 Here's a snippet of my code to create a numpy matrix from the data file...

 

 data = map(lambda x : x.strip().split('\t'), sys.stdin.readlines())
 data = array(data)




 ###
 data = array(data)
 It causes the following error:


 ValueError: setting an array element with a sequence
  
 If I take the 1st 40,000 lines of the file, it works fine.
 If I take the last 40,000 lines of the file, it also works fine, so it isn't
 a problem with the file.

 I've found a few other posts complaining of the same problem, but none of
 their fixes work.

 It seems like a memory problem to me. This was reinforced when I tried to
 break the dataset into 3 chunks and stack the resulting arrays - I got an
 error message saying memory error.
 I don't really understand why reading in this 57mb txt file is taking up
 ~2gb's of RAM.

 Any advice? Thanks in advance

 Dave

If the text file has 'numbers and strings', how is numpy meant to know 
what dtype to use?
Please try genfromtxt especially if columns contain both numbers and 
strings.
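
For illustration (a hedged sketch with made-up sample data, not from the original mail): genfromtxt with dtype=None infers a per-column dtype, giving a structured array when the columns mix strings and numbers:

```python
import io
import numpy as np

# Hypothetical two-row sample resembling the poster's file:
# a string label followed by numeric columns, tab-separated.
text = u"gene1\t1.5\t10\ngene2\t2.5\t20\n"
data = np.genfromtxt(io.StringIO(text), delimiter='\t',
                     dtype=None, encoding='utf-8')
print(data.shape)        # (2,): one structured record per row
print(data.dtype.names)  # per-column fields, e.g. ('f0', 'f1', 'f2')
```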

What happens if you read a file instead of using stdin?

It is possible that one or more rows have multiple sequential delimiters.
Please check the row lengths of your 'data' variable after doing:

data = map(lambda x : x.strip().split('\t'), sys.stdin.readlines())

Really, without the input or system, it is hard to say anything.
If you really know your data, I would suggest preallocating the array and 
updating it one line at a time, to avoid the multiple large intermediate 
objects.
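
The preallocation suggestion might look like this (a sketch with made-up dimensions and an in-memory stand-in for sys.stdin):

```python
import numpy as np

# Allocate the full array up front, then fill it row by row, so the
# giant list-of-lists never exists alongside the final array.
lines = ["1\t2\t3", "4\t5\t6"]                 # stand-in for the real input
data = np.empty((len(lines), 3), dtype='U16')  # fixed-width strings
for i, line in enumerate(lines):
    data[i] = line.strip().split('\t')
print(data[1, 2])  # '6'
```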

Bruce



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] cummax

2009-09-23 Thread Nissim Karpenstein
Hi,

I want a cummax function where given an array inp it returns this:

numpy.array([inp[:i].max() for i in xrange(1,len(inp)+1)]).

Various pure-Python versions equivalent to the above are quite slow (though a
single Python loop is much faster than a Python loop with a nested numpy C
loop as shown above).
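
As an aside (not from the original mail), the requested running-maximum semantics can also be written with today's Python 3 standard library, since itertools.accumulate accepts a binary function:

```python
from itertools import accumulate

inp = [3, 1, 4, 1, 5, 2]
# out[i] == max(inp[:i + 1]) -- a running maximum
out = list(accumulate(inp, max))
print(out)  # [3, 3, 4, 4, 5, 5]
```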

I have numpy 1.3.0 source.  It looks to me like I could add cummax function
by simply adding PyArray_CumMax to multiarraymodule.c which would be the
same as PyArray_Max except it would call PyArray_GenericAccumulateFunction
instead of PyArray_GenericReduceFunction.  Also add array_cummax to
arraymethods.c.

Is there interest in adding this function to numpy?  If so, I will check out
the latest code and try to check in these changes.
If not, how can I write my own Python module in C that adds this UFunc and
still gets to reuse the code in PyArray_GenericReduceFunction?

Thanks,

   -Nissim
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy 2D array from a list error

2009-09-23 Thread Dave Wood
 If the text file has 'numbers and strings' how is numpy meant to know
what dtype to use?
Please try genfromtxt especially if columns contain both numbers and
strings.

Well, I suppose they are all considered to be strings here. I haven't tried
to convert the numbers to floats yet.

What happens if you read a file instead of using stdin?

Same problem

It is possible that one or more rows have multiple sequential delimiters.
Please check the row lengths of your 'data' variable after doing:

Already done; the rows all have the same number of fields.
The fact that the script works with the first 40k lines, and also with the
last 40k lines suggests to me that there is no problem with the file.
(I calculate column means and standard deviations later in the script - it's
only the first two columns which can't be cast to floating point numbers)

Really without the input or system, it is hard to say anything.
If you really know your data I would suggest preallocating the array and
updating the array one line at a time to avoid the large multiple
intermediate objects.

I'm running on Linux. My machine is Red Hat with 2 GB RAM, but when memory
became an issue I tried running on other Linux machines with much greater
RAM capacities. I don't know which distros.

I just tried preallocating the array and updating it one line at a time, and
that works fine. Thanks very much for the suggestion. :)
This doesn't seem like the expected behaviour though and the error message
seems wrong.

Many thanks,

Dave


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] cummax

2009-09-23 Thread Charles R Harris
On Wed, Sep 23, 2009 at 8:34 AM, Nissim Karpenstein niss...@gmail.com wrote:

 Hi,

 I want a cummax function where given an array inp it returns this:

 numpy.array([inp[:i].max() for i in xrange(1,len(inp)+1)]).

 Various python versions equivalent to the above are quite slow (though a
 single python loop is much faster than a python loop with a nested numpy C
 loop as shown above).

 I have numpy 1.3.0 source.  It looks to me like I could add cummax function
 by simply adding PyArray_CumMax to multiarraymodule.c which would be the
 same as PyArray_Max except it would call PyArray_GenericAccumulateFunction
 instead of PyArray_GenericReduceFunction.  Also add array_cummax to
 arraymethods.c.

 Is there interest in adding this function to numpy?  If so, I will check
 out the latest code and try to check in these changes.
 If not, how can I write my own Python module in C that adds this UFunc and
 still gets to reuse the code in PyArray_GenericReduceFunction?


It's already available:

In [5]: a = arange(10)

In [6]: a[5:] = 0

In [7]: maximum.accumulate(a)
Out[7]: array([0, 1, 2, 3, 4, 4, 4, 4, 4, 4])

PyArray_Max is there because it is an ndarray method.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] simple indexing question

2009-09-23 Thread Neal Becker
Robert Cimrman wrote:

 Neal Becker wrote:
 I have an array:
 In [12]: a
 Out[12]:
 array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
 
 And a selection array:
 In [13]: b
 Out[13]: array([1, 1, 1, 1, 1])
 
 I want a 1-dimensional output, where the array b selects an element from
 each column of a, where if b[i]=0 select element from 0th row of a and if
 b[i]=k select element from kth row of a.
 
 Easy way to do this?  (Not a[b], that gives 5x5 array output)
 
 It might be stupid, but it works...
 
 In [51]: a
 Out[51]:
 array([[0, 1, 2, 3, 4],
 [5, 6, 7, 8, 9]])
 
 In [52]: b = [0,1,0,1,0]
 
 In [53]: a.T.flat[a.shape[0]*np.arange(a.shape[1])+b]
 Out[53]: array([0, 6, 2, 8, 4])
 
 cheers,
 r.

Thanks.  Is there really no more elegant solution?

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] simple indexing question

2009-09-23 Thread josef . pktd
On Wed, Sep 23, 2009 at 11:12 AM, Neal Becker ndbeck...@gmail.com wrote:
 Robert Cimrman wrote:

 Neal Becker wrote:
 I have an array:
 In [12]: a
 Out[12]:
 array([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])

 And a selection array:
 In [13]: b
 Out[13]: array([1, 1, 1, 1, 1])

 I want a 1-dimensional output, where the array b selects an element from
 each column of a, where if b[i]=0 select element from 0th row of a and if
 b[i]=k select element from kth row of a.

 Easy way to do this?  (Not a[b], that gives 5x5 array output)

 It might be stupid, but it works...

 In [51]: a
 Out[51]:
 array([[0, 1, 2, 3, 4],
         [5, 6, 7, 8, 9]])

 In [52]: b = [0,1,0,1,0]

 In [53]: a.T.flat[a.shape[0]*np.arange(a.shape[1])+b]
 Out[53]: array([0, 6, 2, 8, 4])

 cheers,
 r.

 Thanks.  Is there really no more elegant solution?

How about this?

>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> b
array([0, 1, 0, 1, 0])

>>> a[b, np.arange(a.shape[1])]
array([0, 6, 2, 8, 4])

Josef


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy 2D array from a list error

2009-09-23 Thread Bruce Southey

On 09/23/2009 10:00 AM, Dave Wood wrote:

If the text file has 'numbers and strings' how is numpy meant to know
what dtype to use?
Please try genfromtxt especially if columns contain both numbers and
strings.
Well, I suppose they are all considered to be strings here. I haven't 
tried to convert the numbers to floats yet.

What happens if you read a file instead of using stdin?
Same problem

It is possible that one or more rows have multiple sequential delimiters.
Please check the row lengths of your 'data' variable after doing:
Already done, they all have the same number of rows.
The fact that the script works with the first 40k lines, and also with 
the last 40k lines suggests to me that there is no problem with the file.
(I calculate column means and standard deviations later in the script 
- it's only the first two columns which can't be cast to floating 
point numbers)


Really without the input or system, it is hard to say anything.
If you really know your data I would suggest preallocating the array 
and updating the array one line at a time to avoid the large multiple 
intermediate objects.
I'm running on linux. My machine is redhat with 2GB RAM, but when 
memory became an issue I tried running on other Linux machines with 
much greater RAM capacities. I don't know what distos.
I just tried preallocating the array and updating it one line at a 
time, and that works fine. Thanks very much for the suggestion. :)
This doesn't seem like the expected behaviour though and the error 
message seems wrong.

Many thanks,
Dave


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org mailto:NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
   

Glad you got a solution.

While I'm far from an expert: with 2 GB of RAM you do not have that much 
free memory outside the OS and other overheads. With your code, the OS has 
to read all the data in at least once, as well as allocate the storage for 
the result and any intermediate objects. So it is easy to exhaust memory.


I agree that the error message is too vague so you could file a ticket.

Use PyTables if memory is a problem for you.
For example, see the recent 'np.memmap and memory usage' thread on numpy 
discussion:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg18863.html
Especially the post by Francesc Alted:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg18868.html

Bruce

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy 2D array from a list error

2009-09-23 Thread Skipper Seabold
On Wed, Sep 23, 2009 at 9:42 AM, davew davejw...@gmail.com wrote:

 Hi,

 I've got a fairly large (but not huge, 58mb) tab seperated text file, with
 approximately 200 columns and 56k rows of numbers and strings.

 Here's a snippet of my code to create a numpy matrix from the data file...

 

 data = map(lambda x : x.strip().split('\t'), sys.stdin.readlines())
 data = array(data)

 ###
 data = array(data)
 It causes the following error:

 ValueError: setting an array element with a sequence

 If I take the 1st 40,000 lines of the file, it works fine.
 If I take the last 40,000 lines of the file, it also works fine, so it isn't
 a problem with the file.

 I've found a few other posts complaining of the same problem, but none of
 their fixes work.

 It seems like a memory problem to me. This was reinforced when I tried to
 break the dataset into 3 chunks and stack the resulting arrays - I got an
 error message saying memory error.
 I don't really understand why reading in this 57mb txt file is taking up
 ~2gb's of RAM.

 Any advice? Thanks in advance


Without knowing more, I wouldn't think that there's really a memory
error trying to load a 57 MB file or stacking it split into 3.  Try
using genfromtxt or loadtxt.  It should work without a problem unless
there is something funny about your file.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Numpy depends on OpenSSL ???

2009-09-23 Thread Mark Sienkiewicz
I have discovered the hard way that numpy depends on openssl.

I am building a 64-bit Python environment for the Macintosh.  I 
currently do not have a 64-bit OpenSSL library installed, so the Python 
interpreter does not have hashlib.  (hashlib gets its md5 function from 
the OpenSSL library.)

The problem is in numpy/core/code_generators/genapi.py, where it appears 
to be trying to make an md5 hash of the declarations of some of the C 
functions.

What is this hash used for?  Is there a particular reason that it needs 
to be cryptographically strong?

Mark S.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] simple indexing question

2009-09-23 Thread Robert Cimrman
josef.p...@gmail.com wrote:
 On Wed, Sep 23, 2009 at 11:12 AM, Neal Becker ndbeck...@gmail.com wrote:
 Robert Cimrman wrote:

 Neal Becker wrote:
 I have an array:
 In [12]: a
 Out[12]:
 array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])

 And a selection array:
 In [13]: b
 Out[13]: array([1, 1, 1, 1, 1])

 I want a 1-dimensional output, where the array b selects an element from
 each column of a, where if b[i]=0 select element from 0th row of a and if
 b[i]=k select element from kth row of a.

 Easy way to do this?  (Not a[b], that gives 5x5 array output)
 It might be stupid, but it works...

 In [51]: a
 Out[51]:
 array([[0, 1, 2, 3, 4],
 [5, 6, 7, 8, 9]])

 In [52]: b = [0,1,0,1,0]

 In [53]: a.T.flat[a.shape[0]*np.arange(a.shape[1])+b]
 Out[53]: array([0, 6, 2, 8, 4])

 cheers,
 r.
 Thanks.  Is there really no more elegant solution?
 
 How about this?
 
 a
 array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
 b
 array([0, 1, 0, 1, 0])
 
 a[b,np.arange(a.shape[1])]
 array([0, 6, 2, 8, 4])

So it was stupid :)

well, time to go home,
r.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy depends on OpenSSL ???

2009-09-23 Thread Robert Kern
On Wed, Sep 23, 2009 at 10:52, Mark Sienkiewicz sienk...@stsci.edu wrote:
 I have discovered the hard way that numpy depends on openssl.

 I am building a 64 bit python environment for the macintosh.  I
 currently do not have a 64 bit openssl library installed, so the python
 interpreter does not have hashlib.  (hashlib gets its md5 function from
 the openssl library.)

There are builtin implementations that do not depend on OpenSSL.
hashlib should be using them for MD5 and the standard SHA variants
when OpenSSL is not available. Try import _md5. But basically, we
expect you to have a reasonably complete standard library.

 The problem is in numpy/core/code_generators/genapi.py, where it appears
 to be trying to make an md5 hash of the declarations of some of the C
 functions.

 What is this hash used for?  Is there a particular reason that it needs
 to be cryptographically strong?

It is used for checking for changes in the API. While this use case
does not require all of the properties that would make a hash
cryptographically strong, it needs some of them.
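
For illustration (a hedged sketch with made-up declarations; this is not the actual genapi code), an API checksum can be as simple as hashing the concatenated C declarations, so that any signature change is detected:

```python
import hashlib

# Made-up declaration strings standing in for the real numpy C API.
decls = [
    "PyObject* PyArray_Max(PyArrayObject*, int)",
    "PyObject* PyArray_Min(PyArrayObject*, int)",
]
digest = hashlib.md5("".join(decls).encode("ascii")).hexdigest()
print(len(digest))  # 32 hex characters

# Dropping (or editing) any declaration yields a different digest.
changed = hashlib.md5("".join(decls[:1]).encode("ascii")).hexdigest()
print(digest != changed)  # True
```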

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy depends on OpenSSL ???

2009-09-23 Thread Charles R Harris
On Wed, Sep 23, 2009 at 9:52 AM, Mark Sienkiewicz sienk...@stsci.edu wrote:

 I have discovered the hard way that numpy depends on openssl.

 I am building a 64 bit python environment for the macintosh.  I
 currently do not have a 64 bit openssl library installed, so the python
 interpreter does not have hashlib.  (hashlib gets its md5 function from
 the openssl library.)

 The problem is in numpy/core/code_generators/genapi.py, where it appears
 to be trying to make an md5 hash of the declarations of some of the C
 functions.

 What is this hash used for?  Is there a particular reason that it needs
 to be cryptographically strong?


The hash is used as a way to check for any API changes. It doesn't have to
be cryptographically strong, it just needs to scatter the hashed values
effectively and we could probably use something simpler. I tend to regard
this problem as a Python bug because the standard python modules should be
available on all platforms. In any case, we should find a fix. Please open a
ticket.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Create numpy array from a list error

2009-09-23 Thread Gökhan Sever
On Wed, Sep 23, 2009 at 9:06 AM, Dave Wood davejw...@gmail.com wrote:

 Hi all,

 I've got a fairly large (but not huge, 58mb) tab seperated text file, with
 approximately 200 columns and 56k rows of numbers and strings.

 Here's a snippet of my code to create a numpy matrix from the data file...

 

 data = map(lambda x : x.strip().split('\t'), sys.stdin.readlines())
 data = array(data)

 ###

 It causes the following error:

 data = array(data)
   ValueError: setting an array element with a sequence

 If I take the 1st 40,000 lines of the file, it works fine.
 If I take the last 40,000 lines of the file, it also works fine, so it
 isn't a problem with the file.

 I've found a few other posts complaining of the same problem, but none of
 their fixes work.

 It seems like a memory problem to me. This was reinforced when I tried to
 break the dataset into 3 chunks and stack the resulting arrays - I got an
 error message saying memory error.
 Also, I don't really understand why reading in this 57mb txt file is taking
 up ~2gb's of RAM.

 Any advice? Thanks in advance

 Dave

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


Here I use loadtxt to read an ~89 MB text file. Can you use loadtxt and share
your results?

I[14]: data = np.loadtxt('09_03_18_07_55_33.sau', dtype='float',
skiprows=83).T

I[15]: len data
- len(data)
O[15]: 66

I[16]: len data[0]
- len(data[0])
O[16]: 117040

I[17]: whos
Variable   TypeData/Info

data   ndarray 66x117040: 7724640 elems, type `float64`, 61797120
bytes (58 Mb)



[gse...@ccn various]$ python sysinfo.py

Platform :
Linux-2.6.29.6-217.2.3.fc11.i686.PAE-i686-with-fedora-11-Leonidas
Python   : ('CPython', 'tags/r26', '66714')
IPython  : 0.10
NumPy: 1.4.0.dev
Matplotlib   : 1.0.svn



-- 
Gökhan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy depends on OpenSSL ???

2009-09-23 Thread David Warde-Farley

On 23-Sep-09, at 11:52 AM, Mark Sienkiewicz wrote:

 I am building a 64 bit python environment for the macintosh.  I
 currently do not have a 64 bit openssl library installed, so the  
 python
 interpreter does not have hashlib.  (hashlib gets its md5 function  
 from
 the openssl library.)

If you're interested in remedying this with your Python build, have a  
look at Mac/BuildScript, there is a bunch of logic there that  
downloads various optional dependencies and builds them with the  
selected architectures. It should not be difficult to modify it to  
also grab and build openssl.

David


Re: [Numpy-discussion] Numpy depends on OpenSSL ???

2009-09-23 Thread David Cournapeau
On Thu, Sep 24, 2009 at 1:20 AM, Charles R Harris
charlesr.har...@gmail.com wrote:
 In any case, we should find a fix.

I don't think we do - we require a standard python install, and a
python without hashlib is crippled. If you can't build python without
openssl, I would consider this a python bug.

cheers,

David


Re: [Numpy-discussion] Numpy 2D array from a list error

2009-09-23 Thread Christopher Barker
Dave Wood wrote:
 Well, I suppose they are all considered to be strings here. I haven't 
 tried to convert the numbers to floats yet.

This could be an issue. For strings, numpy creates an array of strings, 
all of the same length, so each element is as big as the largest one:

In [13]: l
Out[13]: ['5', '34', 'this is a much longer string']

In [14]: np.array(l)
Out[14]:
array(['5', '34', 'this is a much longer string'],
   dtype='|S28')


Note that each element is 28 bytes (that's what the S28 means).

this means that your array would be much larger than the text file if 
you have even one long string in it. Also, as mentioned in this thread, 
in order to figure out how big to make each string element, the array() 
constructor has to scan through your entire list first, and I don't know 
how much intermediate memory it may use in that process.

This really isn't how numpy is meant to be used -- why would you want a 
big ol' array of mixed numbers and strings, all stored as strings?

structured arrays were meant for this, and np.loadtxt() is the easiest 
way to get one.
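
A minimal sketch of that structured-array route (the column names and the
tiny sample here are made up, just to show the shape of the call):

```python
import io
import numpy as np

# Hypothetical stand-in for a tab-separated file with one string
# column followed by numeric columns.
sample = io.StringIO("gene1\t1.5\t2.5\ngene2\t3.0\t4.0\n")

# A structured dtype keeps each column in its natural type instead of
# forcing every value into one big fixed-width string.
dt = np.dtype([('name', 'S16'), ('x', 'f8'), ('y', 'f8')])
data = np.loadtxt(sample, dtype=dt, delimiter='\t')

print(data['name'])  # the string column
print(data['x'])     # a numeric column, already float64
```

Each column then costs only its own field width per row, rather than the
width of the longest string anywhere in the file.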

 I just tried preallocating the array and updating it one line at a time, 
 and that works fine.

what dtype do you end up with?

 This doesn't seem like the expected behaviour though and the error 
 message seems wrong.

yes, not a good error message at all -- it's hard to make sure good 
errors get triggered every time!


HTH,

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Create numpy array from a list error

2009-09-23 Thread Gökhan Sever
On Wed, Sep 23, 2009 at 9:06 AM, Dave Wood davejw...@gmail.com wrote:

 Hi all,

 I've got a fairly large (but not huge, 58mb) tab-separated text file, with
 approximately 200 columns and 56k rows of numbers and strings.

 Here's a snippet of my code to create a numpy matrix from the data file...

 

 data = map(lambda x : x.strip().split('\t'), sys.stdin.readlines())
 data = array(data)

 ###

 It causes the following error:

 data = array(data)
   ValueError: setting an array element with a sequence

 If I take the 1st 40,000 lines of the file, it works fine.
 If I take the last 40,000 lines of the file, it also works fine, so it
 isn't a problem with the file.

 I've found a few other posts complaining of the same problem, but none of
 their fixes work.

 It seems like a memory problem to me. This was reinforced when I tried to
 break the dataset into 3 chunks and stack the resulting arrays - I got an
 error message saying memory error.
 Also, I don't really understand why reading in this 57mb txt file is taking
 up ~2gb's of RAM.

 Any advice? Thanks in advance

 Dave

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


One more reply,

You are trying to read mixed data (strings and numbers) into an array; that
might be causing the problem. In my example, after skipping the meta-header,
all I have is numbers.

Additionally, when you are reading a chunk of data, if one of the column
elements is truncated or overflows its field, NumPy complains with that
ValueError.

-- 
Gökhan


Re: [Numpy-discussion] Numpy 2D array from a list error

2009-09-23 Thread Dave Wood
Apologies for the multiple posts, people. My posting to the forum was
pending for a long time, so I deleted it and tried emailing directly. I
didn't think they'd all be sent out.
Gökhan, thanks for the reply, I hope you get this one.

Here I use loadtxt to read ~89 MB txt file. Can you use loadtxt and share
your results?

I[14]: data = np.loadtxt('09_03_18_07_55_33.sau', dtype='float',
skiprows=83).T

I[15]: len data
- len(data)
O[15]: 66

I[16]: len data[0]
- len(data[0])
O[16]: 117040

I[17]: whos
Variable   Type       Data/Info
-------------------------------
data   ndarray 66x117040: 7724640 elems, type `float64`, 61797120
bytes (58 Mb)



[gse...@ccn various]$ python sysinfo.py

Platform :
Linux-2.6.29.6-217.2.3.fc11.i686.PAE-i686-with-fedora-11-Leonidas
Python   : ('CPython', 'tags/r26', '66714')
IPython  : 0.10
NumPy: 1.4.0.dev
Matplotlib   : 1.0.svn



-- 
Gökhan




I tried using loadtxt and got the same error as before (with a little more
information).



Traceback (most recent call last):
  File "/home/dwood/workspace/GeneralScripts/src/test_clab2R.py", line 140, in <module>
    main()
  File "/home/dwood/workspace/GeneralScripts/src/test_clab2R.py", line 45, in main
    data = loadtxt("inputfile.txt", dtype='string')
  File "/apps/python/2.5.4/rhel4/lib/python2.5/site-packages/numpy/lib/io.py", line 505, in loadtxt
    X = np.array(X, dtype)
ValueError: setting an array element with a sequence


@Christopher Barker
Thanks for the information. To fix my problem, I tried taking out the row
names (leaving only numerical information), and converting the 2D list to
floats. I still had the same problem.


On 9/23/09, Christopher Barker chris.bar...@noaa.gov wrote:

 Dave Wood wrote:
  Well, I suppose they are all considered to be strings here. I haven't
  tried to convert the numbers to floats yet.

 This could be an issue. For strings, numpy creates an array of strings,
 all of the same length, so each element is as big as the largest one:

 In [13]: l
 Out[13]: ['5', '34', 'this is a much longer string']

 In [14]: np.array(l)
 Out[14]:
 array(['5', '34', 'this is a much longer string'],
   dtype='|S28')


 Note that each element is 28 bytes (that's what the S28 means).

 this means that your array would be much larger than the text file if
 you have even one long string in it. Also, as mentioned in this thread,
 in order to figure out how big to make each string element, the array()
 constructor has to scan through your entire list first, and I don't know
 how much intermediate memory it may use in that process.

 This really isn't how numpy is meant to be used -- why would you want a
 big ol' array of mixed numbers and strings, all stored as strings?

 structured arrays were meant for this, and np.loadtxt() is the easiest
 way to get one.

  I just tried preallocating the array and updating it one line at a time,
  and that works fine.

 what dtype do you end up with?

  This doesn't seem like the expected behaviour though and the error
  message seems wrong.

 yes, not a good error message at all -- it's hard to make sure good
 errors get triggered every time!


 HTH,

 -Chris



 --
 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR(206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115   (206) 526-6317   main reception

 chris.bar...@noaa.gov
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] Numpy 2D array from a list error

2009-09-23 Thread Dave Wood
Ignore that last mail, I hit send instead of save by mistake.

Between you, you both seem to be right: it's a problem with loading the array of
strings. There must be some large strings in the first 'rowname' column. If
this column is left out, it works fine (even as strings).

Many thanks, sorry for all the emails.

Dave




On 9/23/09, Dave Wood davejw...@gmail.com wrote:

 Apologies for the multiple posts, people. My posting to the forum was
 pending for a long time, so I deleted it and tried emailing directly. I
 didn't think they'd all be sent out.
 Gökhan, thanks for the reply, I hope you get this one.

 Here I use loadtxt to read ~89 MB txt file. Can you use loadtxt and share
 your results?

 I[14]: data = np.loadtxt('09_03_18_07_55_33.sau', dtype='float',
 skiprows=83).T

 I[15]: len data
 - len(data)
 O[15]: 66

 I[16]: len data[0]
 - len(data[0])
 O[16]: 117040

 I[17]: whos
 Variable   Type       Data/Info
 -------------------------------
 data   ndarray 66x117040: 7724640 elems, type `float64`, 61797120
 bytes (58 Mb)



 [gse...@ccn various]$ python sysinfo.py

 
 Platform :
 Linux-2.6.29.6-217.2.3.fc11.i686.PAE-i686-with-fedora-11-Leonidas
 Python   : ('CPython', 'tags/r26', '66714')
 IPython  : 0.10
 NumPy: 1.4.0.dev
 Matplotlib   : 1.0.svn

 


 --
 Gökhan




 I tried using loadtxt and got the same error as before (with a little more
 information).

 

 Traceback (most recent call last):
   File "/home/dwood/workspace/GeneralScripts/src/test_clab2R.py", line 140, in <module>
     main()
   File "/home/dwood/workspace/GeneralScripts/src/test_clab2R.py", line 45, in main
     data = loadtxt("inputfile.txt", dtype='string')
   File "/apps/python/2.5.4/rhel4/lib/python2.5/site-packages/numpy/lib/io.py", line 505, in loadtxt
     X = np.array(X, dtype)
 ValueError: setting an array element with a sequence
 

 @Christopher Barker
 Thanks for the information. To fix my problem, I tried taking out the row
 names (leaving only numerical information), and converting the 2D list to
 floats. I still had the same problem.


 On 9/23/09, Christopher Barker chris.bar...@noaa.gov wrote:

 Dave Wood wrote:
  Well, I suppose they are all considered to be strings here. I haven't
  tried to convert the numbers to floats yet.

 This could be an issue. For strings, numpy creates an array of strings,
 all of the same length, so each element is as big as the largest one:

 In [13]: l
 Out[13]: ['5', '34', 'this is a much longer string']

 In [14]: np.array(l)
 Out[14]:
 array(['5', '34', 'this is a much longer string'],
   dtype='|S28')


 Note that each element is 28 bytes (that's what the S28 means).

 this means that your array would be much larger than the text file if
 you have even one long string in it. Also, as mentioned in this thread,
 in order to figure out how big to make each string element, the array()
 constructor has to scan through your entire list first, and I don't know
 how much intermediate memory it may use in that process.

 This really isn't how numpy is meant to be used -- why would you want a
 big ol' array of mixed numbers and strings, all stored as strings?

 structured arrays were meant for this, and np.loadtxt() is the easiest
 way to get one.

  I just tried preallocating the array and updating it one line at a time,
  and that works fine.

 what dtype do you end up with?

  This doesn't seem like the expected behaviour though and the error
  message seems wrong.

 yes, not a good error message at all -- it's hard to make sure good
 errors get triggered every time!


 HTH,

 -Chris



 --
 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR(206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115   (206) 526-6317   main reception

 chris.bar...@noaa.gov
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




Re: [Numpy-discussion] is ndarray.base the closest base or the ultimate base?

2009-09-23 Thread Robert Kern
On Tue, Sep 22, 2009 at 17:14, Citi, Luca lc...@essex.ac.uk wrote:
 My vote (if I am entitled to) goes to change the code.
 Whether or not the addressee of .base is an array, it should be the object 
 that has to be kept alive such that the data does not get deallocated, rather 
 than one object which will keep alive another object, which will keep alive 
 another object, ..., which will keep alive the object with the data.
 On creation of a new view B of object A, if A has OWNDATA true then B.base = 
 A, else B.base = A.base.

 When working on
 http://projects.scipy.org/numpy/ticket/1085
 I had to walk the chain of bases to establish whether any of the inputs and 
 the outputs were views of the same data.
 If base were the ultimate base, one would only need to check whether any of 
 the inputs have the same base as any of the outputs.

This is not reliable. You need to check memory addresses and extents
for overlap (unfortunately, slices complicate this;
numpy.may_share_memory() is a good heuristic, though). When
interfacing with other systems using __array_interface__ or similar
APIs, the other system may have multiple objects that point to the
same data. If you create ndarrays from each of these objects, their
.base attributes would all be different although they all point to the
same memory.
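
For illustration, a small sketch of that heuristic in action:

```python
import numpy as np

a = np.arange(10)
b = a[2:5]    # view into a's buffer
c = a.copy()  # separate allocation

# may_share_memory only compares address ranges, so it is cheap;
# it errs on the side of reporting True for addresses that overlap.
print(np.may_share_memory(a, b))  # True: b's buffer lies inside a's
print(np.may_share_memory(a, c))  # False: disjoint allocations
```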

 I tried to modify the code to change the behaviour.
 I have opened a ticket for this http://projects.scipy.org/numpy/ticket/1232
 and attached a patch but I am not 100% sure.
 I changed PyArray_View in convert.c and a few places in mapping.c and 
 sequence.c.

 But if there is any reason why the current behaviour should be kept, just 
 ignore the ticket.

Lacking a robust use case, I would prefer to keep the current
behavior. It is likely that nothing would break if we changed it, but
without a use case, I would prefer to be conservative.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] simple indexing question

2009-09-23 Thread Neal Becker
josef.p...@gmail.com wrote:

 On Wed, Sep 23, 2009 at 11:12 AM, Neal Becker ndbeck...@gmail.com wrote:
 Robert Cimrman wrote:

 Neal Becker wrote:
 I have an array:
 In [12]: a
 Out[12]:
 array([[0, 1, 2, 3, 4],
 [5, 6, 7, 8, 9]])

 And a selection array:
 In [13]: b
 Out[13]: array([1, 1, 1, 1, 1])

 I want a 1-dimensional output, where the array b selects an element
 from each column of a, where if b[i]=0 select element from 0th row of a
 and if b[i]=k select element from kth row of a.

 Easy way to do this?  (Not a[b], that gives 5x5 array output)

 It might be stupid, but it works...

 In [51]: a
 Out[51]:
 array([[0, 1, 2, 3, 4],
 [5, 6, 7, 8, 9]])

 In [52]: b = [0,1,0,1,0]

 In [53]: a.T.flat[a.shape[0]*np.arange(a.shape[1])+b]
 Out[53]: array([0, 6, 2, 8, 4])

 cheers,
 r.

 Thanks.  Is there really no more elegant solution?
 
 How about this?
 
 a
 array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
 b
 array([0, 1, 0, 1, 0])
 
 a[b,np.arange(a.shape[1])]
 array([0, 6, 2, 8, 4])
 
 Josef
 
Thanks, that's not bad.  I'm a little surprised that given the fancy 
indexing capabilities of np there isn't a more direct way to do this.  I'm 
still trying to wrap my mind around the fancy indexing stuff.
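
For what it's worth, np.choose is one built-in way to express this selection
directly (shown on the arrays from the thread):

```python
import numpy as np

a = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9]])
b = np.array([0, 1, 0, 1, 0])

# choose picks a[b[i], i] for each column i, i.e. b selects the row
# to take from in every column.
result = np.choose(b, a)
print(result)  # [0 6 2 8 4]
```

Note that np.choose is limited to a small number of rows to choose from, so
the fancy-indexing form a[b, np.arange(a.shape[1])] is the more general one.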



Re: [Numpy-discussion] Create numpy array from a list error

2009-09-23 Thread David Warde-Farley
On 23-Sep-09, at 10:06 AM, Dave Wood wrote:

 Hi all,

 I've got a fairly large (but not huge, 58mb) tab-separated text  
 file, with
 approximately 200 columns and 56k rows of numbers and strings.

 Here's a snippet of my code to create a numpy matrix from the data  
 file...

 

 data = map(lambda x : x.strip().split('\t'), sys.stdin.readlines())
 data = array(data)

In general I have found that the pattern you're using is a bad one,  
because it's first reading the entire file into memory and then making  
a complete copy of it when you call map.

I would instead use

data = [x.strip().split('\t') for x in sys.stdin]

or even defer the loop until array() is called, with a generator:

data = (x.strip().split('\t') for x in sys.stdin)

This difference still shouldn't be resulting in a memory error with  
only 57 MB of data, but it'll make things go faster at least.
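
As a related sketch, np.loadtxt can also consume the stream directly, line
by line, without building the intermediate list at all (the StringIO here
just stands in for a purely numeric sys.stdin):

```python
import io
import numpy as np

# Stand-in for a tab-separated stdin stream of numbers only.
stream = io.StringIO("1\t2\t3\n4\t5\t6\n")

# loadtxt iterates the file object one line at a time, so no second
# full copy of the raw text sits in memory beside the result array.
data = np.loadtxt(stream, delimiter='\t')
print(data.shape)  # (2, 3)
```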

David


Re: [Numpy-discussion] is ndarray.base the closest base or the ultimate base?

2009-09-23 Thread Robert Kern
On Wed, Sep 23, 2009 at 13:30, Citi, Luca lc...@essex.ac.uk wrote:

 http://projects.scipy.org/numpy/ticket/1085
 But I think in that case it was still an improvement w.r.t. the current 
 implementation
 which is buggy. At least it shields 95% of users from unexpected results.
 Using memory addresses and extents might be overkill (and expensive) in 
 that case.

numpy.may_share_memory() should be pretty cheap. It's just arithmetic.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Deserialized arrays with base mutate strings

2009-09-23 Thread Pauli Virtanen
On Wed, 2009-09-23 at 10:01 +0200, Hrvoje Niksic wrote:
[clip]
 I guess this one could be prevented by verifying that the buffer is 
 writable when setting the writable flag.  When deserializing arrays, I 
 don't see a reason for the base property to even exist - sharing of 
 the buffer between different views is unpreserved anyway, as reported in 
 my other thread.

IIRC, it avoids one copy: ndarray.__reduce__ pickles the raw data as a
string, and so ndarray.__setstate__ receives a Python string back.

I don't remember if it's in the end possible to emit raw byte stream to
a pickle somehow, not going through strings. If not, then a copy can't
be avoided.
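
For illustration, a hedged sketch of the round trip under discussion. Whether
the unpickled array owns its data or holds a reference back to the pickled
buffer has varied across numpy versions, so this only inspects the result
rather than asserting the old string-backed behaviour:

```python
import pickle
import numpy as np

a = np.arange(5)
b = pickle.loads(pickle.dumps(a, protocol=2))

# On the numpy of this era, b.base could be the pickle string itself;
# here we just report what this version produced.
print(b.flags.owndata, type(b.base))
```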

-- 
Pauli Virtanen





Re: [Numpy-discussion] Deserialized arrays with base mutate strings

2009-09-23 Thread Robert Kern
On Wed, Sep 23, 2009 at 13:59, Pauli Virtanen p...@iki.fi wrote:
 On Wed, 2009-09-23 at 10:01 +0200, Hrvoje Niksic wrote:
 [clip]
 I guess this one could be prevented by verifying that the buffer is
 writable when setting the writable flag.  When deserializing arrays, I
 don't see a reason for the base property to even exist - sharing of
 the buffer between different views is unpreserved anyway, as reported in
 my other thread.

 IIRC, it avoids one copy: ndarray.__reduce__ pickles the raw data as a
 string, and so ndarray.__setstate__ receives a Python string back.

Correct, that was the goal.

 I don't remember if it's in the end possible to emit raw byte stream to
 a pickle somehow, not going through strings. If not, then a copy can't
 be avoided.

No, I don't think you can.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


[Numpy-discussion] Coercing object arrays to string (or unicode) arrays

2009-09-23 Thread Michael Droettboom
As I'm looking into fixing a number of bugs in chararray, I'm running 
into some surprising behavior.  One of the things chararray needs to do 
occasionally is build up an object array of string objects, and then 
convert that back to a fixed-length string array.  This length is 
sometimes predetermined by a recarray data structure.  Unfortunately, 
I'm not getting what I would expect when coercing or assigning an object 
array to a string array.  Is this a bug, or am I just going about this 
the wrong way?  If a bug, I'm happy to look into it as part of my 
"fixing chararray" task, but I just wanted to confirm that it is a bug 
before proceeding.

In [14]: x = np.array(['abcdefgh', 'ijklmnop'], 'O')

# Without specifying the length, it seems to default to sizeof(int)... ???
In [15]: np.array(x, 'S')
Out[15]:
array(['abcd', 'ijkl'],
   dtype='|S4')

In [21]: np.array(x, np.string_)
Out[21]:
array(['abcd', 'ijkl'],
   dtype='|S4')

# Specifying a length gives strange results
In [16]: np.array(x, 'S8')
Out[16]:
array(['abcdijkl', 'mnop\xe0\x01\x85\x08'],
   dtype='|S8')

# This is what I expected to happen above, but the cast to a list seems 
like it should be unnecessary
In [17]: np.array(list(x))
Out[17]:
array(['abcdefgh', 'ijklmnop'],
   dtype='|S8')

# Assignment also seems broken
In [18]: y = np.empty(x.shape, dtype='S8')

In [19]: y[:] = x[:]

In [20]: y
Out[20]:
array(['abcdijkl', 'mnop\xc05\xf9\xb7'],
   dtype='|S8')
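
Until the coercion is fixed, one workaround (a sketch, going through a plain
list as in In [17] above) is to size the string dtype explicitly from the
data rather than trusting the S-cast:

```python
import numpy as np

x = np.array(['abcdefgh', 'ijklmnop'], dtype=object)

# Compute the needed field width ourselves instead of letting the
# object-to-string cast pick (and mangle) it.
width = max(len(s) for s in x)
y = np.array(x.tolist(), dtype='S%d' % width)
print(y)  # no truncation, no garbage bytes
```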

Cheers,
Mike


Re: [Numpy-discussion] Numpy depends on OpenSSL ???

2009-09-23 Thread Mark Sienkiewicz
Robert Kern wrote:
 On Wed, Sep 23, 2009 at 10:52, Mark Sienkiewicz sienk...@stsci.edu wrote:
   
 I have discovered the hard way that numpy depends on openssl.

 I am building a 64 bit python environment for the macintosh.  I
 currently do not have a 64 bit openssl library installed, so the python
 interpreter does not have hashlib.  (hashlib gets its md5 function from
 the openssl library.)
 

 There are builtin implementations that do not depend on OpenSSL.
 hashlib should be using them for MD5 and the standard SHA variants
 when OpenSSL is not available. 

This is the clue that I needed.  Here is where it led:

setup.py tries to detect the presence of openssl by looking for the 
library and the include files.  It detects the library that Apple 
provided in /usr/lib/libssl.dylib and tries to build the openssl version 
of hashlib.  But when it actually builds the module, the link fails 
because that library file is not for the correct architecture.  I am 
building for x86_64, but the library contains only ppc and i386.

The result is that hashlib cannot be imported, so the python installer 
decides not to install it at all.  That certainly appears to indicate 
that the python developers consider hashlib to be optional, but it 
_should_ work in most any python installation.

So, the problem is really about the python install automatically 
detecting libraries.  If I hack the setup.py that builds all the C 
modules so that it can't find the openssl library, then it uses the 
fallbacks that are distributed with python.

That gets me as far as "EnvironmentError: math library missing; rerun 
setup.py after setting the MATHLIB env variable", which is a big 
improvement.  (The math library is not missing, but this is a different 
problem entirely.)

Thanks, and sorry for the false alarm.

Mark S.



Re: [Numpy-discussion] is ndarray.base the closest base or the ultimate base?

2009-09-23 Thread Citi, Luca
 numpy.may_share_memory() should be pretty cheap. It's just arithmetic.
True, but it is in python. Not something that should go in construct_arrays of 
ufunc_object.c, I suppose.
But the same approach can be translated to C, probably.

I can try if we decide
http://projects.scipy.org/numpy/ticket/1085
is worth fixing.

Let me know.


Re: [Numpy-discussion] something wrong with docs?

2009-09-23 Thread Fernando Perez
On Tue, Sep 22, 2009 at 11:15 PM, David Goldsmith
d.l.goldsm...@gmail.com wrote:
 It would be nice if we could move gradually
 towards docs whose examples (at least those marked as such) were
 always run via sphinx.

 That's a suggestion, but given your point, it seems like you'd advocate it
 being more than that, no?


I was simply thinking that if this markup were to be used in the docs
for all examples where it makes sense, then one could simply use the
sphinx target

make doctest

to also validate the documentation.  Even if users don't run these by
default, developers and buildbots would, which helps raise the
reliability of the docs and reduces chances of code bitrot in the
examples from the main docs (that problem is taken care of for the
docstrings by np.test(doctest=True) ).
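
A sketch of the Sphinx side (a conf.py fragment; the extension name is the
standard Sphinx one, though the numpy doc build may wire this up differently):

```python
# conf.py fragment: enable the doctest builder so that `make doctest`
# collects and runs the ">>>" examples embedded in the docs.
extensions = ['sphinx.ext.doctest']

# Optional: code executed before each group of tested examples.
doctest_global_setup = "import numpy as np"
```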

Cheers,

f


[Numpy-discussion] dtype '|S0' not understood

2009-09-23 Thread David Warde-Farley
Howdy,

It seems it's possible using e.g.

In [25]: dtype([('foo', str)])
Out[25]: dtype([('foo', '|S0')])

to get yourself a zero-length string. However dtype('|S0') results in  
a TypeError: data type not understood.

I understand the stupidity of creating a 0-length string field but  
it's conceivable that it's accidental.

For example, it could lead to a situation where you've created that  
field, are missing all the data you had meant to put in it, serialize  
with np.save, and upon np.load aren't able to get _any_ of your data  
back because the dtype descriptor is considered bogus (can you guess  
why I thought of this scenario?).

It seems that either dtype(str) should do something more sensible than  
zero-length string, or it should be possible to create it with  
dtype('|S0').  Which should it be?

David
