[Numpy-discussion] How to concatenate two arrays without duplicating memory?

2009-09-02 Thread V. Armando Solé
Hello,

Let's say we have two arrays A and B of shapes (1, 2000) and (1, 
4000).

If I do C=numpy.concatenate((A, B), axis=1), I get a new array of 
dimension (1, 6000) with duplication of memory.

I am looking for a way to have a non contiguous array C in which the 
left (1, 2000) elements point to A and the right (1, 4000) 
elements point to B. 

Any hint will be appreciated.

Thanks,

Armando


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about np.savez

2009-09-02 Thread Francesc Alted
On Wednesday 02 September 2009 05:50:57, Robert Kern wrote:
 On Tue, Sep 1, 2009 at 21:11, Jorge Scandaliaris <jorgesmbox...@yahoo.es> wrote:
  David Warde-Farley dwf at cs.toronto.edu writes:
  If you actually want to save multiple arrays, you can use
  savez('fname', *[a,b,c]) and they will be accessible under the names
  arr_0, arr_1, etc. and a list of these names is in the 'files'
  attribute on the NpzFile object. To retrieve your list of arrays when
  you load, you can just do
 
  mynewlist = [data[arrname] for arrname in data.files]
 
  Thanks for the tip. I have realized, though, that I might need some more
  flexibility than just the ability to save ndarrays. The data I am dealing
  with is best kept in a hierarchical way (I could represent the structure
  with ndarrays also, but I think it would be messy and difficult). I am
  having a look at h5py to see if it fulfill my needs. I know there is
  pytables, too, but from having a quick look it seems h5py is simpler. Am
  I right on this?. I also get a nice side-effect, the data would be
  readable by the de-facto standard software used by most people in my
  field.

 If there is a particular format that uses HDF5 that you are trying to
 replicate, h5py is the clear answer. However, PyTables will, by and
 large, make files that are entirely readable by other HDF5 libraries
 when you just use the subset of features that is supported by
 HDF5-proper. For example, tables and arrays work just fine. What won't
 be supported by non-PyTables libraries are things like dataset
 attributes which are pickled objects. Your non-PyTables HDF5 apps will
 see some extraneous attributes on the arrays and tables, but those are
 typically not necessary for interpretation.

Most of these 'extraneous' attributes are derived from the use of the high 
level HDF5 interface (http://www.hdfgroup.org/HDF5/doc/HL/).  If they bother 
you, you can get rid of them by setting the parameter ``PYTABLES_SYS_ATTRS`` 
to false (either in tables/parameters.py or passing it to `tables.openFile`).
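For instance, a minimal sketch of the second option (this assumes a PyTables version that accepts parameter overrides as keyword arguments to openFile, as suggested above):

import tables

# Open the file with the PyTables system attributes disabled.
f = tables.openFile('data.h5', mode='w', PYTABLES_SYS_ATTRS=False)
f.createArray('/', 'x', [1, 2, 3])   # should carry no PyTables-specific attributes
f.close()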

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to concatenate two arrays without duplicating memory?

2009-09-02 Thread Gael Varoquaux
On Wed, Sep 02, 2009 at 09:40:49AM +0200, V. Armando Solé wrote:
 Let's say we have two arrays A and B of shapes (1, 2000) and (1, 
 4000).

 If I do C=numpy.concatenate((A, B), axis=1), I get a new array of 
 dimension (1, 6000) with duplication of memory.

 I am looking for a way to have a non contiguous array C in which the 
 left (1, 2000) elements point to A and the right (1, 4000) 
 elements point to B. 

You cannot in the numpy memory model. The numpy memory model defines an
array as something that has regular strides to jump from an element to
the next one.
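A small illustration of that model (nothing specific to the arrays in this thread):

import numpy as np

a = np.zeros((1, 6000))
print(a.strides)    # (48000, 8) for float64: one fixed byte step per axis

# A single (data pointer, shape, strides) description cannot span two
# independently allocated buffers, which is why concatenate has to copy.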

Gaël
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Citi, Luca
Hello Sturla,
I had a quick look at your code.
Looks fine.

A few notes...

In select you should replace numpy with np.

In _median how can you, if n==2, use s[] if s is not defined?
What if n==1?
Also, I think when returning an empty array, it should be of
the same type you would get in the other cases.

You could replace _median with the following.

Best,
Luca


def _median(x, inplace):
    assert(x.ndim == 1)
    n = x.shape[0]
    if n > 2:
        k = n >> 1
        s = select(x, k, inplace=inplace)
        if n & 1:
            return s[k]
        else:
            return 0.5*(s[k]+s[:k].max())
    elif n == 0:
        return np.empty(0, dtype=x.dtype)
    elif n == 2:
        return 0.5*(x[0]+x[1])
    else: # n == 1
        return x[0]

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to concatenate two arrays without duplicating memory?

2009-09-02 Thread Citi, Luca
As Gaël pointed out you cannot create A, B and then C
as the concatenation of A and B without duplicating
the vectors.

 I am looking for a way to have a non contiguous array C in which the 
 left (1, 2000) elements point to A and the right (1, 4000) 
 elements point to B. 

But you can still re-link A to the left elements
and B to the right ones afterwards by using views into C.

>>> C = numpy.concatenate((A, B), axis=1)
>>> A, B = C[:,:2000], C[:,2000:]
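A self-contained check of what the re-link buys (toy values, names as in the thread): after those two lines, A and B are plain views into C, so nothing extra is held in memory and writes through A show up in C.

import numpy
A = numpy.zeros((1, 2000))
B = numpy.ones((1, 4000))

C = numpy.concatenate((A, B), axis=1)   # the one unavoidable copy
A, B = C[:, :2000], C[:, 2000:]         # re-link as views

print(A.base is C and B.base is C)      # True: both share C's buffer
A[0, 0] = 42.0
print(C[0, 0])                          # 42.0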

Best,
Luca
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to concatenate two arrays without duplicating memory?

2009-09-02 Thread V. Armando Solé
Gael Varoquaux wrote:
 You cannot in the numpy memory model. The numpy memory model defines an
 array as something that has regular strides to jump from an element to
 the next one.
   
I expected problems in the suggested case (concatenating columns) but I
did not expect the problem would be so severe as to affect the case of row
concatenation.

I guess I am still considering a 2D array as an array of pointers and 
that does not apply to numpy arrays.

Thanks for the info.

Armando

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about np.savez

2009-09-02 Thread Jorge Scandaliaris
Thanks David, Robert and Francesc for the comments and suggestions. It's nice
having options, but that also means one has to choose ;)
I will have a closer look at PyTables. The thing that got me scared about it
was the word 'database'. I have close to zero experience using or, even worse,
designing databases. Maybe I am wrong; the way I was considering structuring
the data could be considered a database, if a rudimentary one.
I have the feeling this is turning into killing a fly with a cannon...

Jorge


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to concatenate two arrays without duplicating memory?

2009-09-02 Thread V. Armando Solé
Citi, Luca wrote:
 As Gaël pointed out you cannot create A, B and then C
 as the concatenation of A and B without duplicating
 the vectors.
   
 But you can still re-link A to the left elements
 and B to the right ones afterwards by using views into C.
   

Thanks for the hint. In my case the A array is already present and the 
contents of the B array can be read from disk.

At least I have two workarounds making use of your suggested solution of 
re-linking:

- create the C array, copy the contents of A into it, and read the contents
of B directly into C (sketched below); this duplicates the memory of A only
for some time.

- save the array A to disk, create the array C, read the contents of A and B
into it, and re-link A and B; no duplication, but ugly.
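A sketch of the first workaround (file name, dtype and shapes are only illustrative; A's memory is duplicated only until the original A is released):

import numpy as np

np.arange(4000.).tofile('B.dat')            # stand-in for B's data on disk

A = np.arange(2000.).reshape(1, 2000)       # A is already in memory
C = np.empty((1, 6000))
C[:, :2000] = A                             # copy A into its slot in C
C[:, 2000:] = np.fromfile('B.dat').reshape(1, 4000)   # read B into C's right part
A, B = C[:, :2000], C[:, 2000:]             # re-link A and B as views into C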

Thanks,

Armando


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about np.savez

2009-09-02 Thread Francesc Alted
On Wednesday 02 September 2009 11:20:55, Jorge Scandaliaris wrote:
 Thanks David, Robert and Francesc for comments and suggestions. It's nice
 having options, but that also means one has to choose ;)
 I will have a closer look at pytables. The thing that got me scared about
 it was the word database. I have close to zero experience using or, even
 worst, designing databases. Maybe I am wrong. The way I was considering for
 structuring could be considered like a, rudimentary at least, database.

Well, I agree that the term 'database' is perhaps a bit scary, and I don't
actually like that term being applied to PyTables -- I always like to say that
PyTables is not a database competitor, but rather a companion.

Just for completeness, here is my own comparison between PyTables and h5py:

http://www.pytables.org/moin/FAQ#HowdoesPyTablescomparewiththeh5pyproject.3F

 I have the feeling this is turning into killing a fly with a cannon...

Maybe.  But if you are going to keep a lot of data on disk, it can be a nice
advantage in the medium term.

HTH,

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to concatenate two arrays without duplicating memory?

2009-09-02 Thread Sebastian Haase
Hi,
depending on your needs you might be interested in my minimal
implementation of what I call a mock-ndarray.
I needed something like this to analyze higher-dimensional stacks of 2d
images, and what I needed was mostly the indexing features of nd-arrays.
A mock array is initialized with a list of nd-arrays. The result is a
mock array having one additional dimension in front.
>>> a = N.arange(9)
>>> b = N.arange(9)
>>> a.shape = 3,3
>>> b.shape = 3,3
>>> c = F.mockNDarray(a,b)
>>> c.shape
(2, 3, 3)
>>> c[2,2,2]
>>> c[1,2,2]
8

No memory copy is done.
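The idea in a stripped-down sketch (this is not the actual Priithon code; the class name is made up and it only handles the full tuple indexing shown above):

import numpy as N

class MockNDArray(object):
    # Hold a list of arrays and fake one extra leading axis; an integer
    # index on axis 0 dispatches to the stored array, so no copy is made.
    def __init__(self, *arrays):
        self._arrays = arrays
        self.shape = (len(arrays),) + arrays[0].shape

    def __getitem__(self, index):
        return self._arrays[index[0]][index[1:]]

a = N.arange(9).reshape(3, 3)
b = N.arange(9).reshape(3, 3)
c = MockNDArray(a, b)
print(c.shape)      # (2, 3, 3)
print(c[1, 2, 2])   # 8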

I put the module file here
http://drop.io/kpu4bib/asset/mockndarray-py
Otherwise this is part of my (BSD) Priithon image analysis framework.

Regards
Sebastian Haase

On Wed, Sep 2, 2009 at 11:31 AM, V. Armando Solé <s...@esrf.fr> wrote:
 Citi, Luca wrote:
 As Gaël pointed out you cannot create A, B and then C
 as the concatenation of A and B without duplicating
 the vectors.

 But you can still re-link A to the left elements
 and B to the right ones afterwards by using views into C.


 Thanks for the hint. In my case the A array is already present and the
 contents of the B array can be read from disk.

 At least I have two workarounds making use of your suggested solution of
 re-linking:

 - create the C array, copy the contents of A to it and read the contents
 of B directly into C with duplication of the memory of A during some time.

 - save the array A in disk, create the array C, read the contents of A
 and B into it and re-link A and B with no duplication but ugly.

 Thanks,

 Armando


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] snow leopard and Numeric

2009-09-02 Thread Stefano Covino

 
  Is there a way to constrain an old-style compilation just to make a code
  work? I have similar problems with other old pieces of code.
 
 Use -arch i686 in the CFLAGS and LDFLAGS. I think.
 


Unfortunately, it seems not to have any effect.

I'll try something else.

Thanks anyway.

   Stefano

   

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-02 Thread Romain Brette
Hi everyone,

In case anyone is interested, I just set up a google group to discuss 
GPU-based simulation for our Python neural simulator Brian:
http://groups.google.fr/group/brian-on-gpu
Our simulator relies heavily on Numpy. I would be very happy if the GPU
experts here would like to share their expertise.

Best,
Romain

Romain Brette wrote:
 Sturla Molden wrote:
 Thus, here is my plan:

 1. a special context-manager class
 2. immutable arrays inside with statement
 3. lazy evaluation: expressions build up a parse tree
 4. dynamic code generation
 5. evaluation on exit

 
 There seems to be some similarity with what we want to do to accelerate 
 our neural simulations (briansimulator.org), as described here:
 http://brian.svn.sourceforge.net/viewvc/brian/trunk/dev/BEPs/BEP-9-Automatic%20code%20generation.txt?view=markup
 (by the way BEP is Brian Enhancement Proposal)
 The speed-up factor we got in our experimental code with GPU is very 
 substantial when there are many neurons (= large vectors, e.g. 10 000 
 elements), even when operations are simple.
 
 Romain

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Dag Sverre Seljebotn
Sturla Molden wrote:
 Dag Sverre Seljebotn skrev:
   
 Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the 
 right type to use in this case?
   
 
 By the way, here is a more polished version, does it look ok?

 http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py
 http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx
   
I didn't look at the algorithm, but the types look OK (except for the 
gil as you say). Comments:

a) Is the cast to numpy.npy_intp really needed? I'm pretty sure shape is 
defined as numpy.npy_intp*.
b) If you want higher performance with contiguous arrays (which occur a 
lot as inplace=False is default I guess) you can do

  np.ndarray[T, ndim=1, mode="c"]

to tell the compiler the array is contiguous. That doubles the number of 
function instances though...


 Cython needs something like Java's generics by the way :-)
   
Yes, we all long for that. It will come as soon as somebody volunteers I 
suppose -- it shouldn't be all that difficult, but I don't think any of 
the existing devs will be up for it any time soon.

Dag Sverre
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Gökhan Sever
Hello,

I want to be able to parse a binary file which holds information regarding
experiment configuration and, obviously, data. Both configuration and data
sections are variable-length. A chunk of this data is shown below (after a
binary read operation):

'\x00\...@\x00$\x00\x02\x00\x12\x00\xff\x00\x00\x00u\xaa\xfa\xffd\x00\x08\x00\x01\x00\x08\x00\xff\x00\x00\x00u\xaa\xfb\xffl\x00\xab\x00\x01\x00\xab\x00\xff\x00\x00\x00u\xaa\xe7\x03\x17\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00u\xaa\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00prj.300\x00;
Version = 1\n', 'ProjectName = PME1 2009 King Air N825ST\n', 'FlightId =
\n', 'AircraftType = WMI King Air 200\n', 'AircraftId = N825ST\n',
'OperatorName = Weather Modification Inc.\n', 'Comments = \n', '\x00\x00@

In binary form the file is 1.3MB, and when written to a txt file it expands
to 3.7MB, totalling approximately 4 million characters. When fully processed
(with an IDL code) it produces 86 separate configuration files and 46 ascii
files for data, covering about 10-15 different instruments in various
combinations and sampling rates.

I attempted to use the re module, however the time it takes to parse the file
is much longer than I expected. What would be the wisest and fastest way to
tackle this issue? Upon successful re-construction of the data and metadata,
I am planning to use a more modular structure like HDF5 or netCDF4 for easy
data storage and analysis.

Thank you.


-- 
Gökhan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Sturla Molden
Dag Sverre Seljebotn wrote:

  a) Is the cast to numpy.npy_intp really needed? I'm pretty sure shape is
  defined as numpy.npy_intp*.

I don't know Cython internals in detail but you do, so I take your word
for it. I thought shape was a tuple of Python ints.


  b) If you want higher performance with contiguous arrays (which occur a
  lot as inplace=False is default I guess) you can do
 
   np.ndarray[T, ndim=1, mode="c"]
 
  to tell the compiler the array is contiguous. That doubles the number of
  function instances though...

Thanks. I could either double the number of specialized select
functions, or I could make a local copy using numpy.ascontiguousarray in
the select function.

Quickselect touches the discontiguous array on average 2*n times, whereas
numpy.ascontiguousarray touches the discontiguous array n times (but in an
orderly fashion). Then there is the question of cache use: contiguous
arrays are the more friendly case, and numpy.ascontiguousarray is more
friendly than quickselect. Also, if quickselect is not done inplace (the
common case for medians), we always have contiguous arrays, so mode="c" is
almost always wanted. And when quickselect is done inplace, we usually
have a contiguous input. This is also why I used a C pointer instead of
your buffer syntax in the first version; then I changed my mind, not sure
why. So I'll try with a local copy first. I don't think we want close to a
megabyte of Cython-generated gibberish C just for the median.
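For reference, the local-copy route is just a single np.ascontiguousarray pass up front (a small illustration only; the select code itself is unchanged):

import numpy as np

a = np.arange(20.)[::2]          # a strided, non-contiguous view
print(a.flags.c_contiguous)      # False

b = np.ascontiguousarray(a)      # one orderly pass over the strided input
print(b.flags.c_contiguous)      # True

c = np.ascontiguousarray(b)      # already contiguous: returned as-is, no copy
print(c is b)                    # True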

Sturla Molden
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Sturla Molden
Citi, Luca wrote:
 Hello Sturla,
 In _median how can you, if n==2, use s[] if s is not defined?
 What if n==1?
   
That was a typo.



 Also, I think when returning an empty array, it should be of
 the same type you would get in the other cases.
Currently median returns numpy.nan for empty input arrays. I'll do that 
instead. I want it to behave exactly as the current implementation, 
except for the sorting.

Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to concatenate two arrays without duplicating memory?

2009-09-02 Thread Sturla Molden
V. Armando Solé wrote:
 I am looking for a way to have a non contiguous array C in which the 
 left (1, 2000) elements point to A and the right (1, 4000) 
 elements point to B. 

 Any hint will be appreciated.

If you know in advance that A and B are going to be duplicated, you can 
use views:

C = np.zeros((1, 6000))
A = C[:,:2000]
B = C[:,2000:]

Now C is A and B concatenated horizontally.

If you can't do this, you could write the data to a temporary file and
read it back, but it would be slow.

Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Robert Kern
On Wed, Sep 2, 2009 at 09:38, Gökhan Sever <gokhanse...@gmail.com> wrote:
 Hello,

 I want to be able to parse a binary file which hold information regarding to
 experiment configuration and data obviously. Both configuration and data
 sections are variable-length. A chuck this data is shown as below (after a
 binary read operation)

 '\x00\...@\x00$\x00\x02\x00\x12\x00\xff\x00\x00\x00u\xaa\xfa\xffd\x00\x08\x00\x01\x00\x08\x00\xff\x00\x00\x00u\xaa\xfb\xffl\x00\xab\x00\x01\x00\xab\x00\xff\x00\x00\x00u\xaa\xe7\x03\x17\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00u\xaa\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00prj.300\x00;
 Version = 1\n', 'ProjectName = PME1 2009 King Air N825ST\n', 'FlightId =
 \n', 'AircraftType = WMI King Air 200\n', 'AircraftId = N825ST\n',
 'OperatorName = Weather Modification Inc.\n', 'Comments = \n', '\x00\x00@

 In binary form the file is 1.3MB, and when written to a txt file it expands
 to 3.7MB totalling approximately 4 million characters. When fully processed
 (with an IDL code) it produces 86 seperate configuration files, and 46 ascii
 files for data, about 10-15 different instruments and in various
 combinations plus sampling rates.

 I attemted to use RE module, however the time it takes parse the file is
 really longer than I expected. What would be wisest and fastest way to
 tackle this issue? Upon successful re-construction of the data and metadata,
 I am planning to use a much modular structure like HDF5 or netCDF4 for an
 easy data storage and analyses.

Are there fixed delimiters? Like '\x00\...@\x00' perhaps? It might be
faster to search for those using .find() instead of regexes.
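Something along these lines, for example (the marker value is made up; the real delimiter has to come from the file format):

def find_all(data, marker):
    # Collect every offset at which `marker` occurs in the byte string.
    positions = []
    i = data.find(marker)
    while i != -1:
        positions.append(i)
        i = data.find(marker, i + 1)
    return positions

offsets = find_all(open('raw.bin', 'rb').read(), b'\x00\x40\x00$\x00\x02')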

Without more information about how the file format gets split up, I'm
not sure we can make good suggestions.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to concatenate two arrays without duplicating memory?

2009-09-02 Thread Sturla Molden
Sebastian Haase wrote:
 A mockarray is initialized with a list of nd-arrays. The result is a
 mock array having one additional dimention in front.
This is important, because often in the case of  'concatenation' a real 
concatenation is not needed. But then there is a common tool called 
Matlab, which unlike Python has no concept of lists and makes numerical
programmers think they do: C = [A, B] is a horizontal concatenation in
Matlab. Too much exposure to Matlab cripples the mind easily.

Sturla

  
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Sturla Molden
Gökhan Sever wrote:
 What would be wisest and fastest way to tackle this issue? 
Get the format, read the binary data directly, skip the ascii/regex part.

I sometimes use recarrays with formatted binary data: just construct
a dtype and use numpy.fromfile to read. That works when the binary file
stores C structs written successively.
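For instance (the field names and layout are made up; the point is only the dtype-plus-fromfile pattern):

import numpy as np

rec_t = np.dtype([('tag',   '<u2'),     # little-endian unsigned short
                  ('count', '<u2'),
                  ('value', '<f8')])    # double

records = np.fromfile('data.bin', dtype=rec_t)   # one struct per record
print(records['tag'][:10])
print(records['value'][:10])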

Sturla Molden


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] np.bitwise_and.identity

2009-09-02 Thread Citi, Luca
Hello,
I know I am splitting hairs, but should not
np.bitwise_and.identity be -1 instead of 1?
I mean, something with all the bits set?

I am checking whether all elements of a vector 'v'
have a certain bit 'b' set:
if np.bitwise_and.reduce(v) & (1 << b):
    # do something

If v is empty, the expression is true for b==0 and
false otherwise.
In fact np.bitwise_and.identity is 1.

I like being able to use np.bitwise_and.reduce
because it is many times faster than (v & (1 << b)).all()
(it does not create the temporary vector).

Of course there are workarounds but I was wondering
if there is a reason for this behaviour.
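To make it concrete (the values below reflect the behaviour described in this thread; the workaround at the end is only one possibility):

import numpy as np

v = np.array([], dtype=np.int64)    # empty input
b = 3

print(np.bitwise_and.identity)      # 1 here, rather than -1 (all bits set)
print(np.bitwise_and.reduce(v))     # the identity, so the test only holds for b == 0

# One workaround: seed the reduction with an all-ones element, so an empty
# v behaves as "every bit set".
seeded = np.concatenate((np.array([-1], dtype=v.dtype), v))
print(bool(np.bitwise_and.reduce(seeded) & (1 << b)))   # True for empty v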

Best,
Luca
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.bitwise_and.identity

2009-09-02 Thread Robert Kern
On Wed, Sep 2, 2009 at 11:11, Citi, Luca <lc...@essex.ac.uk> wrote:
 Hello,
 I know I am splitting the hair, but should not
 np.bitwise_and.identity be -1 instead of 1?
 I mean, something with all the bits set?

Probably. However, the .identity parts of ufuncs were designed mostly
to support multiply and add, so .identity is restricted to 0, 1, or
nothing currently. It will take some effort to change that. In the C
code, the sentinel value for no identity is -1, alas.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] adaptive interpolation on a regular 2d grid

2009-09-02 Thread denis bzowy
Robert Kern robert.kern at gmail.com writes:

 Looks good! Where can we get the code? Can this be specialized for 1D
functions?



Re code: sure, I'll be happy to post it if anyone points me to a real test
case or two, to help me understand the envelope -- 100^2 - 500^2 grid ?
(Splines on regular grids are fast and robust, hard to beat.)

Re 1d: I have an old version using 2 point - 2 slope splines, overkill,
will trim it.

(Is there a sandbox or wiki of interpolation testcases, not images ?)




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] adaptive interpolation on a regular 2d grid

2009-09-02 Thread Robert Kern
On Wed, Sep 2, 2009 at 11:33, denis bzowy <denis-bz...@t-online.de> wrote:
 Robert Kern robert.kern at gmail.com writes:

 Looks good! Where can we get the code? Can this be specialized for 1D
 functions?



 Re code: sure, I'll be happy to post it if anyone points me to a real test
 case or two, to help me understand the envelope -- 100^2 - 500^2 grid ?
 (Splines on regular grids are fast and robust, hard to beat.)

 Re 1d: I have an old version using 2 point - 2 slope splines, overkill,
 will trim it.

 (Is there a sandbox or wiki of interpolation testcases, not images ?)

I have some test cases here:

http://svn.scipy.org/svn/scikits/trunk/delaunay/scikits/delaunay/testfuncs.py

They are meant to test scattered data interpolation. They aren't going
to exercise your adaptive interpolation very hard.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to concatenate two arrays without duplicating memory?

2009-09-02 Thread Sebastian Haase
I forgot to mention I also support transpose.
-S.


On Wed, Sep 2, 2009 at 5:23 PM, Sturla Molden <stu...@molden.no> wrote:
 Sebastian Haase skrev:
 A mockarray is initialized with a list of nd-arrays. The result is a
 mock array having one additional dimention in front.
 This is important, because often in the case of  'concatenation' a real
 concatenation is not needed. But then there is a common tool called
 Matlab, which unlike Python has no concept of lists and make numerical
 programmers think they do. C = [A, B] is a horizontal concatenation in
 Matlab. Too much exposure to Matlab cripples the mind easily.

 Sturla


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Gökhan Sever
On Wed, Sep 2, 2009 at 10:11 AM, Robert Kern robert.k...@gmail.com wrote:

 On Wed, Sep 2, 2009 at 09:38, Gökhan Severgokhanse...@gmail.com wrote:
  Hello,
 
  I want to be able to parse a binary file which hold information regarding
 to
  experiment configuration and data obviously. Both configuration and data
  sections are variable-length. A chuck this data is shown as below (after
 a
  binary read operation)
 
  '\x00\x00@
 \x00$\x00\x02\x00\x12\x00\xff\x00\x00\x00U\xaa\xfa\xffd\x00\x08\x00\x01\x00\x08\x00\xff\x00\x00\x00U\xaa\xfb\xffl\x00\xab\x00\x01\x00\xab\x00\xff\x00\x00\x00U\xaa\xe7\x03\x17\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00U\xaa\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00prj.300\x00;
  Version = 1\n', 'ProjectName = PME1 2009 King Air N825ST\n', 'FlightId =
  \n', 'AircraftType = WMI King Air 200\n', 'AircraftId = N825ST\n',
  'OperatorName = Weather Modification Inc.\n', 'Comments = \n', '\x00\x00@
 
  In binary form the file is 1.3MB, and when written to a txt file it
 expands
  to 3.7MB totalling approximately 4 million characters. When fully
 processed
  (with an IDL code) it produces 86 seperate configuration files, and 46
 ascii
  files for data, about 10-15 different instruments and in various
  combinations plus sampling rates.
 
  I attemted to use RE module, however the time it takes parse the file is
  really longer than I expected. What would be wisest and fastest way to
  tackle this issue? Upon successful re-construction of the data and
 metadata,
  I am planning to use a much modular structure like HDF5 or netCDF4 for an
  easy data storage and analyses.

 Are there fixed delimiters? Like '\x00\...@\x00' perhaps? It might be
 faster to search for those using .find() instead of regexes.

 Without more information about how the file format gets split up, I'm
 not sure we can make good suggestions.

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


Fixed delims... That is what I used to parse the metadata with a regex.

Something like:

r = re.compile('\0;.+?\...@\0\$', re.DOTALL)

which extracts the portions that I am interested in. However, I have yet to
figure out how to parse the separate data streams; I couldn't find a way to
see which data blocks go with which device.

I put the test binary file I am using at:

http://drop.io/1plh5rt


-- 
Gökhan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Gökhan Sever
On Wed, Sep 2, 2009 at 10:34 AM, Sturla Molden stu...@molden.no wrote:

 Gökhan Sever skrev:
  What would be wisest and fastest way to tackle this issue?
 Get the format, read the binary data directly, skip the ascii/regex part.

 I sometimes use recarrays with formatted binary data; just constructing
 a dtype and use numpy.fromfile to read. That works when the binary file
 store C structs written successively.

 Sturla Molden


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


How can I use recarrays with variable-length data fields as well as metadata?
Eventually I will record the data with numpy arrays, but I am not sure how to
utilize recarrays in the first stage.

-- 
Gökhan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Robert Bradshaw
On Wed, 2 Sep 2009, Dag Sverre Seljebotn wrote:

 Sturla Molden wrote:
 Dag Sverre Seljebotn skrev:

 Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the
 right type to use in this case?


 By the way, here is a more polished version, does it look ok?

 http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py
 http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx

 I didn't look at the algorithm, but the types look OK (except for the
 gil as you say). Comments:

 a) Is the cast to numpy.npy_intp really needed? I'm pretty sure shape is
 defined as numpy.npy_intp*.
 b) If you want higher performance with contiguous arrays (which occur a
 lot as inplace=False is default I guess) you can do

  np.ndarray[T, ndim=1, mode=c]

 to tell the compiler the array is contiguous. That doubles the number of
 function instances though...


 Cython needs something like Java's generics by the way :-)

 Yes, we all long for that. It will come as soon as somebody volunteers I
 suppose -- it shouldn't be all that difficult, but I don't think any of
 the existing devs will be up for it any time soon.

Danilo's C++ project has some baby steps in that direction, though it'll 
need to be expanded quite a bit to handle this.

- Robert
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Robert Kern
On Wed, Sep 2, 2009 at 11:53, Gökhan Sever <gokhanse...@gmail.com> wrote:

 How to use recarrays with variable-length data fields as well as metadata?

You don't.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Citi, Luca
If I understand the problem...
if you are 100% sure that ', ' only occurs between fields
and never within, you can use the 'split' method of the string
which could be faster than regexp in this simple case.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.bitwise_and.identity

2009-09-02 Thread Citi, Luca
Thank you, Robert, for the quick reply.

I just saw the line
#define PyUFunc_None -1
in the ufuncobject.h file.
It is always the same, you choose a sentinel thinking
that it doesn't conflict with any possible value and
you later find there is one such case.

As said it is not a big deal.
I wouldn't spend time on it.

Best,
Luca
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Gökhan Sever
On Wed, Sep 2, 2009 at 12:01 PM, Citi, Luca lc...@essex.ac.uk wrote:

 If I understand the problem...
 if you are 100% sure that ', ' only occurs between fields
 and never within, you can use the 'split' method of the string
 which could be faster than regexp in this simple case.
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


But it is not possible to extract a pattern such as within a field. A
construct like in regex starting with a ; till the end of the section. ??

-- 
Gökhan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Gökhan Sever
On Wed, Sep 2, 2009 at 12:04 PM, Robert Kern robert.k...@gmail.com wrote:

 On Wed, Sep 2, 2009 at 11:53, Gökhan Severgokhanse...@gmail.com wrote:

  How to use recarrays with variable-length data fields as well as
 metadata?

 You don't.

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



I was just confirming my guess :)

The data in the binary file was written in a variable-length fashion.
Although each chunk has a specific starting indication like
\x00\x...@\x00$\x00\x02
the amount of data in each section varies depending on what was in the
written stream.

How would your find suggestion work? It just returns the location of the
first occurrence.

-- 
Gökhan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Robert Kern
On Wed, Sep 2, 2009 at 12:27, Gökhan Sever <gokhanse...@gmail.com> wrote:

 On Wed, Sep 2, 2009 at 12:01 PM, Citi, Luca lc...@essex.ac.uk wrote:

 If I understand the problem...
 if you are 100% sure that ', ' only occurs between fields
 and never within, you can use the 'split' method of the string
 which could be faster than regexp in this simple case.
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 But it is not possible to extract a pattern such as within a field. A
 construct like in regex starting with a ; till the end of the section. ??

I can't parse that sentence. Can you describe the format in a little
more detail? Or point to documentation of the format? Or the IDL code
that parses it?

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Robert Kern
On Wed, Sep 2, 2009 at 12:33, Gökhan Sever <gokhanse...@gmail.com> wrote:
 How your find suggestion work? It just returns the location of the first
 occurrence.

http://docs.python.org/library/stdtypes.html#str.find

str.find(sub[, start[, end]])
Return the lowest index in the string where substring sub is
found, such that sub is contained in the range [start, end]. Optional
arguments start and end are interpreted as in slice notation. Return
-1 if sub is not found.

But perhaps you should profile your code to see where it is actually
taking up the time. Regexes on 1.3 MB of data should be quite fast.

In [21]: marker = '\x00\x...@\x00$\x00\x02'

In [22]: block = marker + '\xde\xca\xfb\xad' * ((1024-8) // 4)

In [23]: data = int(round(1.3 * 1024)) * block

In [24]: import re

In [25]: r = re.compile(re.escape(marker))

In [26]: %time r.findall(data)
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.01 s

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Gökhan Sever
On Wed, Sep 2, 2009 at 12:29 PM, Robert Kern robert.k...@gmail.com wrote:

 On Wed, Sep 2, 2009 at 12:27, Gökhan Severgokhanse...@gmail.com wrote:
 
  On Wed, Sep 2, 2009 at 12:01 PM, Citi, Luca lc...@essex.ac.uk wrote:
 
  If I understand the problem...
  if you are 100% sure that ', ' only occurs between fields
  and never within, you can use the 'split' method of the string
  which could be faster than regexp in this simple case.
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
  But it is not possible to extract a pattern such as within a field. A
  construct like in regex starting with a ; till the end of the section. ??

 I can't parse that sentence. Can you describe the format in a little
 more detail? Or point to documentation of the format? Or the IDL code
 that parses it?

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


I put the reference manual at:

http://drop.io/1plh5rt

The first few pages describe the data format they use.

-- 
Gökhan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Gökhan Sever
On Wed, Sep 2, 2009 at 12:29 PM, Robert Kern robert.k...@gmail.com wrote:

 On Wed, Sep 2, 2009 at 12:27, Gökhan Severgokhanse...@gmail.com wrote:
 
  On Wed, Sep 2, 2009 at 12:01 PM, Citi, Luca lc...@essex.ac.uk wrote:
 
  If I understand the problem...
  if you are 100% sure that ', ' only occurs between fields
  and never within, you can use the 'split' method of the string
  which could be faster than regexp in this simple case.
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
  But it is not possible to extract a pattern such as within a field. A
  construct like in regex starting with a ; till the end of the section. ??

 I can't parse that sentence. Can you describe the format in a little
 more detail? Or point to documentation of the format? Or the IDL code
 that parses it?

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


The IDL processing code is at:

http://adpaa.svn.sourceforge.net/viewvc/adpaa/trunk/src/Level1/process_raw/

It is part of ADPAA, the Aircraft Data Processing and Analysis project.

-- 
Gökhan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Gökhan Sever
On Wed, Sep 2, 2009 at 12:46 PM, Robert Kern robert.k...@gmail.com wrote:

 On Wed, Sep 2, 2009 at 12:33, Gökhan Severgokhanse...@gmail.com wrote:
  How your find suggestion work? It just returns the location of the first
  occurrence.

 http://docs.python.org/library/stdtypes.html#str.find

 str.find(sub[, start[, end]])
Return the lowest index in the string where substring sub is
 found, such that sub is contained in the range [start, end]. Optional
 arguments start and end are interpreted as in slice notation. Return
 -1 if sub is not found.

 But perhaps you should profile your code to see where it is actually
 taking up the time. Regexes on 1.3 MB of data should be quite fast.

 In [21]: marker = '\x00\x...@\x00$\x00\x02'

 In [22]: block = marker + '\xde\xca\xfb\xad' * ((1024-8) // 4)

 In [23]: data = int(round(1.3 * 1024)) * block

 In [24]: import re

 In [25]: r = re.compile(re.escape(marker))

 In [26]: %time r.findall(data)
 CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
 Wall time: 0.01 s

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



This is what I have been using. It's not returning exactly what I want, but
it is very close; it is also slow:

I[52]: mypattern = re.compile('\0\0\1\0.+?\...@\0\$', re.DOTALL)

I[53]: res = mypattern.findall(ss)

I[54]: len res
- len(res)
O[54]: 95

I[55]: %time mypattern.findall(ss);
CPU times: user 9.14 s, sys: 0.00 s, total: 9.14 s
Wall time: 9.16 s

I[57]: res[0]
O[57]:
'\x00\x00\x01\x00\x00\x00\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00
*prj.300*\x00; Version = 1\nProjectName = PME1 2009 King Air
N825ST\nFlightId = \nAircraftType = WMI King Air 200\nAircraftId =
N825ST\nOperatorName = Weather Modification Inc.\nComments = \n\x00\x00@
\x00$'

I need the part starting with the bold section (prj.300) through to the
end of the section. I need the bold part because I can construct file names
from it and write the following content into them.

Oh, and when it works, the search should return 86 occurrences.


-- 
Gökhan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Robert Kern
On Wed, Sep 2, 2009 at 13:28, Gökhan Sever <gokhanse...@gmail.com> wrote:
 Put the reference manual in:

 http://drop.io/1plh5rt

 First few pages describe the data format they use.

Ah. The fields are *not* delimited by a fixed value. Regexes are no
help to you for pulling out the information you need, except perhaps
later to parse the text fields. I think you are also getting spurious
results because your regex matches things inside data fields.

Instead, you have a header containing the length of the data field
followed by the data field. Create a structured dtype that corresponds
to the DataDir struct on page 15. Note that unsigned int there is
actually a numpy.uint16, not a uint32.

  dt = np.dtype([('tagNumber', np.uint16), ('dataOffset', np.uint16),
('numberBytes', np.uint16), ('samples', np.uint16), ('bytesPerSample',
np.uint16), ('type', np.uint8), ('param1', np.uint8), ('param2',
np.uint8), ('param3', np.uint8), ('address', np.uint16)])

Now read dt.itemsize bytes from the file and use

  header = fromstring(f.read(dt.itemsize), dt)[0]

to get a record object that corresponds to the header. Use the
dataOffset and numberBytes fields to extract the actual data bytes
from the file.

For example, if we go to the second header field:

In [28]: f.seek(dt.itemsize,0)

In [29]: header = np.fromstring(f.read(dt.itemsize), dt)[0]

In [30]: header
Out[30]: (65530, 100, 8, 1, 8, 255, 0, 0, 0, 43605)

In [31]: f.seek(header['dataOffset'], 0)

In [32]: f.read(header['numberBytes'])
Out[32]: 'prj.300\x00'


There are still some semantic issues you need to work out.
There are multiple buffers per file, and the dataOffsets are
relative to the start of the buffer, not the file.
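A sketch of that next step (the assumptions here are that each buffer starts with a run of DataDir entries and that n_entries is known from the format; dt is the structured dtype defined above):

import numpy as np

def read_buffer_fields(buf, n_entries, dt):
    # `buf` holds the raw bytes of ONE buffer; read n_entries directory
    # headers from its start and slice out each field's data bytes, with
    # dataOffset/numberBytes interpreted relative to the buffer.
    headers = np.fromstring(buf[:n_entries * dt.itemsize], dt)
    return [(int(h['tagNumber']),
             buf[h['dataOffset']:h['dataOffset'] + h['numberBytes']])
            for h in headers]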

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Chad Netzer
On Mon, Aug 31, 2009 at 9:06 PM, Sturla Molden <stu...@molden.no> wrote:

 We recently has a discussion regarding an optimization of NumPy's median
 to average O(n) complexity. After some searching, I found out there is a
 selection algorithm competitive in speed with Hoare's quick select. It
 has the advantage of being a lot simpler to implement. In plain Python:

 Chad, you can continue to write quick select using NumPy's C quick sort
 in numpy/core/src/_sortmodule.c.src.  When you are done, it might be
 about 10% faster than this. :-)

I was sick for a bit last week, so got stalled on my version, but I'll
be working on it this weekend.  I'm going for a more general partition
function, that could have slightly more general use cases than just a
median.  Nevertheless, its good to see there could be several options,
hopefully at least one of which can be put into numpy.

By the way, as far as I can tell, the above algorithm is exactly the
same idea as a non-recursive Hoare (i.e. quicksort) selection: do the
partition, then only proceed to the sub-partition that must contain
the nth element. My version is a bit more general, allowing
partitioning on a range of elements rather than just one, but the
concept is the same. The numpy quicksort already does non-recursive
sorting.

I'd also like to, if possible, have a specialized 2D version, since
image median filtering is one of my interests, and the C version works
on 1D (raveled) arrays only.

-C
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy core dump on linux

2009-09-02 Thread Jeremy Mayes
This one line causes python to core dump on linux.
numpy.lexsort([
numpy.array(['-','-','-','-','-','-','-','-','-','-','-','-','-'])[::-1],numpy.array([732685.,
732685.,  732685.,  732685.,  732685.,  732685.,732685.,  732685.,
732685.,  732685.,  732685.,  732685.,  732679.])[::-1]])

Here's some version info:

python 2.5.4
numpy 1.3.0

error is
*** glibc detected *** free(): invalid next size (fast): 0x00526be0
***

Any ideas?
-- 
--jlm
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy on Snow Leopard

2009-09-02 Thread Celil Rufat
I am unable to build numpy on Snow Leopard. The error that I am getting is
shown below. It is a linking issue related to the change in the the default
behavior of gcc under Snow Leopard. Before it used to compile for the 32 bit
i386 architecture, now the default is the 64 bit x86_64 architecture.

Has anybody successfully compiled numpy for MACOSX 10.6. If so I would
appreciate if you can tell me how you fixed this issue.

Regards,
Celil

...
C compiler: gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -O3
...
gcc: _configtest.c
_configtest.c:1: warning: conflicting types for built-in function ‘exp’
_configtest.c:1: warning: conflicting types for built-in function ‘exp’
gcc _configtest.o -o _configtest
ld: warning: in _configtest.o, missing required architecture x86_64 in file
Undefined symbols:
  _main, referenced from: start in crt1.10.6.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
ld: warning: in _configtest.o, missing required architecture x86_64 in file
Undefined symbols:
  _main, referenced from: start in crt1.10.6.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
failure.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy core dump on linux

2009-09-02 Thread Charles R Harris
On Wed, Sep 2, 2009 at 4:37 PM, Robert Kern robert.k...@gmail.com wrote:

 On Wed, Sep 2, 2009 at 17:23, Jeremy Mayesjeremy.ma...@gmail.com wrote:
  This one line causes python to core dump on linux.
  numpy.lexsort([
 
 numpy.array(['-','-','-','-','-','-','-','-','-','-','-','-','-'])[::-1],numpy.array([732685.,
  732685.,  732685.,  732685.,  732685.,  732685.,732685.,  732685.,
  732685.,  732685.,  732685.,  732685.,  732679.])[::-1]])
 
  Here's some version info:
 
  python 2.5.4
  numpy 1.3.0
 
  error is
  *** glibc detected *** free(): invalid next size (fast):
 0x00526be0
  ***
 
  Any ideas?

 Huh. The line executes for me on OS X, but the interpreter crashes
 when exiting. Here is my backtrace:


 Thread 0 Crashed:
 0   org.python.python   0x00270760 collect + 288
 1   org.python.python   0x002712ea PyGC_Collect + 42
 2   org.python.python   0x00260390 Py_Finalize + 208
 3   org.python.python   0x0026f750 Py_Main + 2768
 4   org.python.python   0x1f82 0x1000 + 3970
 5   org.python.python   0x1ea9 0x1000 + 3753


 Can you show us a gdb backtrace on your machine?


It's the [::-1] what done it. I suspect a copy is being made and has a bug.

In [1]: a = np.array(['-']*100)

In [2]: b = np.array([1.0]*100)

In [3]: i = lexsort((a,b))

In [4]: i = lexsort((a[::-1]))

In [5]: i = lexsort((b[::-1]))

In [6]: i = lexsort((a,b[::-1]))

In [7]: i = lexsort((a[::-1],b))

*Crash*

These also work:

In [3]: i = lexsort((b[::-1],a))

In [4]: i = lexsort((b[::-1],b[::-1]))

In [5]: i = lexsort((a[::-1],a[::-1]))

In [6]: i = lexsort((a,b[::-1]))

So it seems to be the combination of reversed string a with an array of
different type. Looks like a type setting is getting skipped somewhere.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] help creating a reversed cumulative histogram

2009-09-02 Thread Tim Michelsen
Hello fellow numpy users,
I posted some questions on histograms recently [1, 2] but still couldn't
find a solution.

I am trying to create an inverse cumulative histogram [3] which shall
look like [4] but with the higher values at the left.

The classification shall follow this exemplary rule:

class 1: 0
all values > 0

class 2: 10
all values > 10

class 3: 15
all values > 15

class 4: 20
all values > 20

class 5: 25
all values > 25

[...]

I could get this easily in a spreadsheet by creating a matrix with
conditional statements (IF VALUES_COL > CLASS_BOUNDARY; VALUES_COL; '-').

With python (numpy or pylab) I was not successful. The plotted histogram
envelope turned out to be just the inverted version of the curve created
with the spreadsheet app.

I have briefly visualised the issue here [5]. I hope that this makes it
more understandable.

Later I would like to sum and count all values in each bin, as discussed
in [2].

May someone give me a pointer or hint on how to improve my code below to
achieve the desired histogram?



Thanks a lot in advance,
Timmie

[1]: http://www.nabble.com/np.hist-with-masked-values-to25243905.html
[2]: 
http://www.nabble.com/histogram%3A-sum-up-values-in-each-bin-to25171265.html
[3]: http://en.wikipedia.org/wiki/Histogram#Cumulative_histogram
[4]: http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=126
[5]: http://www.scribd.com/doc/19371606/Distribution-Histogram

# CODE #
normed = False
values # loaded data as array
bins = 10


### sum
## taken from
## 
http://www.nabble.com/Scipy-and-statistics%3A-probability-density-function-to24683007.html#a24683304
sums = np.histogram(values, weights=values,
 normed=normed,
 bins=bins)
ecdf_sums = np.hstack([0.0, sums[0].cumsum() ])
ecdf_inv_sums = ecdf_sums[::-1]


pylab.plot(sums[1], ecdf_inv_sums)
pylab.show()

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy core dump on linux

2009-09-02 Thread Citi, Luca
I experience the same problem.
A few more additional test cases:

In [1]: import numpy

In [2]: numpy.lexsort([numpy.arange(5)[::-1].copy(), numpy.arange(5)])
Out[2]: array([0, 1, 2, 3, 4])

In [3]: numpy.lexsort([numpy.arange(5)[::-1].copy(), numpy.arange(5.)])
Out[3]: array([0, 1, 2, 3, 4])

In [4]: numpy.lexsort([numpy.arange(5), numpy.arange(5)])
Out[4]: array([0, 1, 2, 3, 4])

In [5]: numpy.lexsort([numpy.arange(5), numpy.arange(5.)])
Out[5]: array([0, 1, 2, 3, 4])

In [6]: numpy.lexsort([numpy.arange(5)[::-1], numpy.arange(5)])
Out[6]: array([0, 1, 2, 3, 4])

In [7]: numpy.lexsort([numpy.arange(5)[::-1], numpy.arange(5.)])
*** glibc detected *** /usr/bin/python: free(): invalid next size (fast): 
0x09be6eb8 ***

It looks like the problem is when the first array is reversed and the second is 
float.

I am not familiar with gdb. If I run gdb python, run it, and give the 
commands above,
it hangs at the glibc line without returning to gdb unless I hit CTRL-C. In 
this case,
I guess, the backtrace I get is related to the CTRL-C rather than the error.
Any hint in how to obtain useful information from gdb?

Best,
Luca

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Charles R Harris
On Wed, Sep 2, 2009 at 1:25 PM, Chad Netzer chad.net...@gmail.com wrote:

 On Mon, Aug 31, 2009 at 9:06 PM, Sturla Moldenstu...@molden.no wrote:
 
  We recently has a discussion regarding an optimization of NumPy's median
  to average O(n) complexity. After some searching, I found out there is a
  selection algorithm competitive in speed with Hoare's quick select. It
  has the advantage of being a lot simpler to implement. In plain Python:

  Chad, you can continue to write quick select using NumPy's C quick sort
  in numpy/core/src/_sortmodule.c.src.  When you are done, it might be
  about 10% faster than this. :-)

 I was sick for a bit last week, so got stalled on my version, but I'll
 be working on it this weekend.  I'm going for a more general partition
 function, that could have slightly more general use cases than just a
 median.  Nevertheless, its good to see there could be several options,
 hopefully at least one of which can be put into numpy.

 By the way, as far as I can tell, the above algorithm is exactly the
 same idea as a non-recursive Hoare (ie. quicksort) selection:  Do the
 partition, then only proceed to the sub-partition that must contain
 the nth element.My version is a bit more general, allowing
 partitioning on a range of elements rather than just one, but the
 concept is the same.  The numpy quicksort already does non recursive
 sorting.

 I'd also like to, if possible, have a specialized 2D version, since
 image media filtering is one of my interests, and the C version works
 on 1D (raveled) arrays only.


There are special hardwired medians for 2,3,5,9 elements, which covers a lot
of image processing. They aren't in numpy, though ;) David has implemented a
NeighborhoodIter that could help extract the elements if you want to deal
with images.

Chuck
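
Sturla's plain-Python listing was trimmed in the quoting above. As an
illustration of the partition-based selection idea discussed in this thread
(a rough sketch only, not Sturla's code and not what would go into NumPy's
C sort module), a non-recursive select could look like:

def select(data, k):
    """Return the k-th smallest element of data (k = 0 gives the minimum)."""
    data = list(data)                      # work on a copy
    lo, hi = 0, len(data) - 1
    while lo < hi:
        pivot = data[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:                      # Hoare-style partition around pivot
            while data[i] < pivot:
                i += 1
            while data[j] > pivot:
                j -= 1
            if i <= j:
                data[i], data[j] = data[j], data[i]
                i += 1
                j -= 1
        if k <= j:                         # keep only the side containing index k
            hi = j
        elif k >= i:
            lo = i
        else:
            break
    return data[k]

select([7, 1, 5, 3, 9], 2)                 # -> 5 (the median)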
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy core dump on linux

2009-09-02 Thread Charles R Harris
On Wed, Sep 2, 2009 at 5:19 PM, Citi, Luca lc...@essex.ac.uk wrote:

 I experience the same problem.
 A few more additional test cases:

 In [1]: import numpy

 In [2]: numpy.lexsort([numpy.arange(5)[::-1].copy(), numpy.arange(5)])
 Out[2]: array([0, 1, 2, 3, 4])

 In [3]: numpy.lexsort([numpy.arange(5)[::-1].copy(), numpy.arange(5.)])
 Out[3]: array([0, 1, 2, 3, 4])

 In [4]: numpy.lexsort([numpy.arange(5), numpy.arange(5)])
 Out[4]: array([0, 1, 2, 3, 4])

 In [5]: numpy.lexsort([numpy.arange(5), numpy.arange(5.)])
 Out[5]: array([0, 1, 2, 3, 4])

 In [6]: numpy.lexsort([numpy.arange(5)[::-1], numpy.arange(5)])
 Out[6]: array([0, 1, 2, 3, 4])

 In [7]: numpy.lexsort([numpy.arange(5)[::-1], numpy.arange(5.)])
 *** glibc detected *** /usr/bin/python: free(): invalid next size (fast):
 0x09be6eb8 ***

 It looks like the problem is when the first array is reversed and the
 second is float.

 I am not familiar with gdb. If I run gdb python, run it, and give the
 commands above,
 it hangs at the glibc line without returning to gdb unless I hit CTRL-C. In
 this case,
 I guess, the backtrace I get is related to the CTRL-C rather than the
 error.
 Any hint on how to obtain useful information from gdb?


The actual bug is probably not where the crash occurs. I think there is
enough info to track it down for anyone who wants to crawl through the
relevant code.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] help creating a reversed cumulative histogram

2009-09-02 Thread Robert Kern
On Wed, Sep 2, 2009 at 18:15, Tim Michelsentimmichel...@gmx-topmail.de wrote:
 Hello fellow numpy users,
 I posted some questions on histograms recently [1, 2] but still couldn't
 find a solution.

 I am trying to create an inverse cumulative histogram [3] which shall
 look like [4] but with the higher values at the left.

Okay. That is completely different from what you've asked before.

 The classification shall follow this exemplary rule:

 class 1: 0
 all values > 0

 class 2: 10
 all values > 10

 class 3: 15
 all values > 15

 class 4: 20
 all values > 20

 class 5: 25
 all values > 25

 [...]

 I could get this easily in a spreadsheet by creating a matrix with
 conditional statements (if VALUES_COL > CLASS_BOUNDARY; VALUES_COL; '-').

 With python (numpy or pylab) I was not successful. The plotted histogram
 envelope turned out to be just the inverted curve as the one created
 with the spreadsheet app.

 sums = np.histogram(values, weights=values,
                                     normed=normed,
                                     bins=bins)
 ecdf_sums = np.hstack([0.0, sums[0].cumsum() ])
 ecdf_inv_sums = ecdf_sums[::-1]

This is not the kind of inversion that you are looking for. You want

ecdf_inv_sums = ecdf_sums[-1] - ecdf_sums
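
To make the difference concrete with a small made-up example:

import numpy as np

ecdf_sums = np.array([0., 2., 5., 9.])
ecdf_sums[::-1]               # -> [ 9.,  5.,  2.,  0.]  (just mirrored)
ecdf_sums[-1] - ecdf_sums     # -> [ 9.,  7.,  4.,  0.]  (weight lying above each bin edge)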

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy core dump on linux

2009-09-02 Thread Charles R Harris
On Wed, Sep 2, 2009 at 5:19 PM, Citi, Luca lc...@essex.ac.uk wrote:

 I experience the same problem.
 A few more additional test cases:

 In [1]: import numpy

 In [2]: numpy.lexsort([numpy.arange(5)[::-1].copy(), numpy.arange(5)])
 Out[2]: array([0, 1, 2, 3, 4])

 In [3]: numpy.lexsort([numpy.arange(5)[::-1].copy(), numpy.arange(5.)])
 Out[3]: array([0, 1, 2, 3, 4])

 In [4]: numpy.lexsort([numpy.arange(5), numpy.arange(5)])
 Out[4]: array([0, 1, 2, 3, 4])

 In [5]: numpy.lexsort([numpy.arange(5), numpy.arange(5.)])
 Out[5]: array([0, 1, 2, 3, 4])

 In [6]: numpy.lexsort([numpy.arange(5)[::-1], numpy.arange(5)])
 Out[6]: array([0, 1, 2, 3, 4])

 In [7]: numpy.lexsort([numpy.arange(5)[::-1], numpy.arange(5.)])
 *** glibc detected *** /usr/bin/python: free(): invalid next size (fast):
 0x09be6eb8 ***

 It looks like the problem is when the first array is reversed and the
 second is float.


It's mixing types with different bit sizes, small type first.

In [6]: a = np.array([1.0]*100, dtype=int16)

In [7]: b = np.array([1.0]*100, dtype=int32)

In [8]: lexsort((a[::-1],b))

*Crash*

Probably the results are incorrect for the reverse order of types that
doesn't crash, but different arrays would be needed to check that.

Chuck
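
One illustrative way to cross-check a non-crashing lexsort call against a
pure-Python reference (made-up data; lexsort sorts primarily by the last key):

import numpy as np

a = np.array([1, 0, 1, 0, 2])          # secondary key
b = np.array([3, 3, 1, 1, 2])          # primary key (last key passed to lexsort)
idx = np.lexsort((a, b))
expected = sorted(range(len(a)), key=lambda i: (b[i], a[i]))
assert list(idx) == expected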
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] help creating a reversed cumulative histogram

2009-09-02 Thread josef . pktd
On Wed, Sep 2, 2009 at 7:26 PM, Robert Kernrobert.k...@gmail.com wrote:
 On Wed, Sep 2, 2009 at 18:15, Tim Michelsentimmichel...@gmx-topmail.de 
 wrote:
 Hello fellow numpy users,
 I posted some questions on histograms recently [1, 2] but still couldn't
 find a solution.

 I am trying to create an inverse cumulative histogram [3] which shall
 look like [4] but with the higher values at the left.

 Okay. That is completely different from what you've asked before.

 The classification shall follow this exemplary rule:

 class 1: 0
 all values > 0

 class 2: 10
 all values > 10

 class 3: 15
 all values > 15

 class 4: 20
 all values > 20

 class 5: 25
 all values > 25

 [...]

 I could get this easily in a spreadsheet by creating a matrix with
 conditional statements (if VALUES_COL > CLASS_BOUNDARY; VALUES_COL; '-').

 With python (numpy or pylab) I was not successful. The plotted histogram
 envelope turned out to be just the inverted curve as the one created
 with the spreadsheet app.

 sums = np.histogram(values, weights=values,
                                     normed=normed,
                                     bins=bins)
 ecdf_sums = np.hstack([0.0, sums[0].cumsum() ])
 ecdf_inv_sums = ecdf_sums[::-1]

 This is not the kind of inversion that you are looking for. You want

 ecdf_inv_sums = ecdf_sums[-1] - ecdf_sums

and you can plot the histogram with bar

eisf_sums = ecdf_sums[-1] - ecdf_sums   # empirical inverse survival function of weights
width = sums[1][1] - sums[1][0]
rects1 = plt.bar(sums[1], eisf_sums, width, color='b')

Are you sure you want cumulative weights in the histogram?

Josef
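
For reference, a self-contained sketch pulling the pieces of this thread
together (synthetic stand-in data, since the original `values` array is not
shown; assumes matplotlib is available):

import numpy as np
import matplotlib.pyplot as plt

values = np.random.gamma(2.0, 10.0, size=1000)   # stand-in for the loaded data
bins = 10

sums, edges = np.histogram(values, weights=values, bins=bins)
ecdf_sums = np.hstack([0.0, sums.cumsum()])
eisf_sums = ecdf_sums[-1] - ecdf_sums            # inverse cumulative (survival) curve

width = edges[1] - edges[0]
plt.bar(edges, eisf_sums, width=width, color='b')
plt.show()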


 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] help creating a reversed cumulative histogram

2009-09-02 Thread Tim Michelsen
Hello Robert and Josef,
thanks for the quick answers! I really appreciate this.

 I am trying to create an inverse cumulative histogram [3] which shall
 look like [4] but with the higher values at the left.
 Okay. That is completely different from what you've asked before.
You are right.
But it's sometimes hard to describe a desired and expected output in 
python terms and pseudocode.
I still have to learn more numpy vocabulary...

I will evaluate your answers and give feedback.

Regards,
Timmie

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] masked arrays of structured arrays

2009-09-02 Thread Ernest Adrogué
31/08/09 @ 14:37 (-0400), thus spake Pierre GM:
 On Aug 31, 2009, at 2:33 PM, Ernest Adrogué wrote:
 
  30/08/09 @ 13:19 (-0400), thus spake Pierre GM:
  I can't reproduce that with a recent SVN version (r7348). What  
  version
  of numpy are you using ?
 
  Version 1.2.1
 
 That must be it. Can you try w/ 1.3?

Yes, in version 1.3.0 it's fixed.

-- 
Ernest
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] help creating a reversed cumulative histogram

2009-09-02 Thread Robert Kern
On Wed, Sep 2, 2009 at 19:11, Tim Michelsentimmichel...@gmx-topmail.de wrote:
 Hello Robert and Josef,
 thanks for the quick answers! I really appreciate this.

 I am trying to create an inverse cumulative histogram [3] which shall
 look like [4] but with the higher values at the left.
 Okay. That is completely different from what you've asked before.
 You are right.
 But it's sometimes hard to describe a desired and expected output in
 python terms and pseudocode.
 I still have to learn more numpy vocabulary...

Actually, I apologize. I meant to delete that line before sending the
message. It was unnecessary and abusive.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy core dump on linux

2009-09-02 Thread Charles R Harris
On Wed, Sep 2, 2009 at 4:23 PM, Jeremy Mayes jeremy.ma...@gmail.com wrote:

 This one line causes python to core dump on linux.
 numpy.lexsort([
 numpy.array(['-','-','-','-','-','-','-','-','-','-','-','-','-'])[::-1],numpy.array([732685.,
 732685.,  732685.,  732685.,  732685.,  732685.,732685.,  732685.,
 732685.,  732685.,  732685.,  732685.,  732679.])[::-1]])

 Here's some version info:

 python 2.5.4
 numpy 1.3.0

 error is
 *** glibc detected *** free(): invalid next size (fast): 0x00526be0
 ***


I've opened ticket #1217 for this.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parse a specific binary file

2009-09-02 Thread Gökhan Sever
On Wed, Sep 2, 2009 at 1:58 PM, Robert Kern robert.k...@gmail.com wrote:

 On Wed, Sep 2, 2009 at 13:28, Gökhan Severgokhanse...@gmail.com wrote:
  Put the reference manual in:
 
  http://drop.io/1plh5rt
 
  First few pages describe the data format they use.

 Ah. The fields are *not* delimited by a fixed value. Regexes are no
 help to you for pulling out the information you need, except perhaps
 later to parse the text fields. I think you are also getting spurious
 results because your regex matches things inside data fields.

 Instead, you have a header containing the length of the data field
 followed by the data field. Create a structured dtype that corresponds
 to the DataDir struct on page 15. Note that unsigned int there is
 actually a numpy.uint16, not a uint32.

  dt = np.dtype([('tagNumber', np.uint16), ('dataOffset', np.uint16),
 ('numberBytes', np.uint16), ('samples', np.uint16), ('bytesPerSample',
 np.uint16), ('type', np.uint8), ('param1', np.uint8), ('param2',
 np.uint8), ('param3', np.uint8), ('address', np.uint16)])

 Now read dt.itemsize bytes from the file and use

  header = fromstring(f.read(dt.itemsize), dt)[0]

 to get a record object that corresponds to the header. Use the
 dataOffset and numberBytes fields to extract the actual data bytes
 from the file.

 For example, if we go to the second header field:

 In [28]: f.seek(dt.itemsize,0)

 In [29]: header = np.fromstring(f.read(dt.itemsize), dt)[0]

 In [30]: header
 Out[30]: (65530, 100, 8, 1, 8, 255, 0, 0, 0, 43605)

 In [31]: f.seek(header['dataOffset'], 0)

 In [32]: f.read(header['numberBytes'])
 Out[32]: 'prj.300\x00'


 There are still some semantic issues you need to work out.
 There are multiple buffers per file, and the dataOffsets are
 relative to the start of the buffer, not the file.

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


Robert,

You must have thrown a couple of RTFMs while replying to my emails :) I usually
take a trial-and-error approach initially, and don't give up unless I hit a
hurdle right away, which in this case resulted in the unsuccessful regex
approach. On the positive side, I have learnt the basics of regular
expressions and realized how powerful they can be for a text parsing
task.

Enough prattle, below is what I am working on:

So far I was successfully able to extract the file names and the data
associated with those names (with the exception of multiple buffer per file
cases).

However, I am not reading the time increments correctly: I should be seeing
1-second incremental time ticks from the time segment, but all I get back is
the same first time value.

Furthermore, I still couldn't figure out how to wrap the main loop: range(500)
is just a dummy number that lets me process the whole binary data, and I don't
yet know how to make it generic so that it works for any size of similar
binary file.


import numpy as np
import struct

f = open('test.sea', 'rb')

dt = np.dtype([('tagNumber', np.uint16), ('dataOffset', np.uint16),
('numberBytes', np.uint16), ('samples', np.uint16), ('bytesPerSample',
np.uint16), ('type', np.uint8), ('param1', np.uint8), ('param2',
np.uint8), ('param3', np.uint8), ('address', np.uint16)])


start = 0
ct = 0

for i in range(500):

    header = np.fromstring(f.read(dt.itemsize), dt)[0]

    if header['tagNumber'] == 65530:
        loc = f.tell()
        f.seek(start + header['dataOffset'])   # offsets are relative to the buffer start
        f.read(header['numberBytes'])
        f.seek(loc)
    elif header['tagNumber'] == 65531:
        loc = f.tell()
        f.seek(start + header['dataOffset'])
        f.read(header['numberBytes'])
        start = f.tell()                       # subsequent offsets are measured from here
    elif header['tagNumber'] == 0:
        loc = f.tell()
        f.seek(start + header['dataOffset'])
        print f.tell()
        k = f.read(header['numberBytes'])
        print struct.unpack('9h', k[:18])
        f.seek(loc)
        ct += 1



-- 
Gökhan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Sturla Molden
Chad Netzer wrote:
 By the way, as far as I can tell, the above algorithm is exactly the
 same idea as a non-recursive Hoare (i.e. quicksort) selection: do the
 partition, then only proceed to the sub-partition that must contain
 the nth element. My version is a bit more general, allowing
 partitioning on a range of elements rather than just one, but the
 concept is the same. The numpy quicksort already does non-recursive
 sorting.

 I'd also like to, if possible, have a specialized 2D version, since
 image median filtering is one of my interests, and the C version works
 on 1D (raveled) arrays only.
I agree. NumPy (or SciPy) could have a select module similar to the sort 
module. If the select function takes an axis argument similar to the 
sort functions, only a small change to the current np.median would be needed.

Take a look at this:

http://projects.scipy.org/numpy/attachment/ticket/1213/_selectmodule.pyx

Here is a select function that takes an axis argument. There are 
specialized versions for 1D, 2D, and 3D. Input can be contiguous or not. 
For 4D and above, axes are found by recursion on the shape array. Thus 
it should be fast regardless of dimensions.

I haven't tested the Cython code /thoroughly/, but at least it does compile.


Sturla Molden


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parse a specific binary file

2009-09-02 Thread Robert Kern
On Wed, Sep 2, 2009 at 23:59, Gökhan Severgokhanse...@gmail.com wrote:

 Robert,

 You must have thrown a couple of RTFMs while replying to my emails :)

Not really. There's no manual for this. Greg Wilson's _Data Crunching_
may be a good general introduction to how to think about these
problems.

http://www.pragprog.com/titles/gwd/data-crunching

 I usually
 take a trial-and-error approach initially, and don't give up unless I hit a
 hurdle right away, which in this case resulted in the unsuccessful regex
 approach. On the positive side, I have learnt the basics of regular
 expressions and realized how powerful they can be for a text parsing
 task.

 Enough prattle, below is what I am working on:

 So far I was successfully able to extract the file names and the data
 associated with those names (with the exception of multiple buffer per file
 cases).

 However, I am not reading the time increments correctly: I should be seeing
 1-second incremental time ticks from the time segment, but all I get back is
 the same first time value.

 Furthermore, I still couldn't figure out how to wrap the main loop: range(500)
 is just a dummy number that lets me process the whole binary data, and I don't
 yet know how to make it generic so that it works for any size of similar
 binary file.

while True:
   ...

   if no_more_data():
       break
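
Concretely, with the open file `f` and header dtype `dt` from the script
quoted below, the end-of-file test could be sketched as (a sketch only, not
tested against the actual .sea files):

import numpy as np

while True:
    raw = f.read(dt.itemsize)
    if len(raw) < dt.itemsize:        # no complete header left: end of file
        break
    header = np.fromstring(raw, dt)[0]
    # ... dispatch on header['tagNumber'] as before ...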

 import numpy as np
 import struct

 f = open('test.sea', 'rb')

 dt = np.dtype([('tagNumber', np.uint16), ('dataOffset', np.uint16),
 ('numberBytes', np.uint16), ('samples', np.uint16), ('bytesPerSample',
 np.uint16), ('type', np.uint8), ('param1', np.uint8), ('param2',
 np.uint8), ('param3', np.uint8), ('address', np.uint16)])


 start = 0
 ct = 0

 for i in range(500):

     header = np.fromstring(f.read(dt.itemsize), dt)[0]

     if header['tagNumber'] == 65530:
     loc = f.tell()
     f.seek(start + header['dataOffset'])
     f.read(header['numberBytes'])

Presumably you are doing something with this data, not just discarding it.

     f.seek(loc)

This should be f.seek(loc, 0). f.seek(nbytes) is to seek forward from
the current position by nbytes. The 0 tells it to start from the
beginning.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Robert Kern
On Thu, Sep 3, 2009 at 00:09, Sturla Moldenstu...@molden.no wrote:
 Chad Netzer skrev:

 I'd also like to, if possible, have a specialized 2D version, since
 image median filtering is one of my interests, and the C version works
 on 1D (raveled) arrays only.
 I agree. NumPy (or SciPy) could have a select module similar to the sort
 module. If the select function takes an axis argument similar to the
 sort functions, only a small change to the current np.median would be needed.

 Take a look at this:

 http://projects.scipy.org/numpy/attachment/ticket/1213/_selectmodule.pyx

 Here is a select function that takes an axis argument. There are
 specialized versions for 1D, 2D, and 3D. Input can be contiguous or not.
 For 4D and above, axes are found by recursion on the shape array. Thus
 it should be fast regardless of dimensions.

When he is talking about 2D, I believe he is referring to median
filtering rather than computing the median along an axis. I.e.,
replacing each pixel with the median of a specified neighborhood
around the pixel.
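
To make that distinction concrete, a naive pure-numpy sketch of a 3x3 median
filter (interior pixels only, no edge handling; this is only an illustration,
scipy.ndimage has a real median filter):

import numpy as np

def median3x3(img):
    # stack the nine shifted views of the interior, then take the median
    # across that new axis: each interior pixel gets the median of its
    # 3x3 neighbourhood
    h, w = img.shape
    stacked = np.dstack([img[i:i + h - 2, j:j + w - 2]
                         for i in range(3) for j in range(3)])
    return np.median(stacked, axis=2)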

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Chad Netzer
On Wed, Sep 2, 2009 at 10:28 PM, Robert Kernrobert.k...@gmail.com wrote:

 When he is talking about 2D, I believe he is referring to median
 filtering rather than computing the median along an axis. I.e.,
 replacing each pixel with the median of a specified neighborhood
 around the pixel.

That's right, Robert.  Basically, I meant doing a median on a square
(or rectangle) view of an array, without first having to ravel(),
thus generally saving a copy.  But actually, since my selection based
median overwrites the source array, it may not save a copy anyway.
But Charles Harris's earlier suggestion of some hard coded medians for
common filter template sizes (ie 3x3, 5x5, etc.) may be a nice
addition to scipy, especially if it can be generalized somewhat to
other filters.
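
As a toy example of what such a hardwired small-kernel median looks like,
the median of three values can be computed with a few comparisons and no sort:

def median3(a, b, c):
    # equivalent to sorted([a, b, c])[1]
    return max(min(a, b), min(c, max(a, b)))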

-C
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Sturla Molden
Robert Kern wrote:
 When he is talking about 2D, I believe he is referring to median
 filtering rather than computing the median along an axis. I.e.,
 replacing each pixel with the median of a specified neighborhood
 around the pixel.

   
That's not something numpy's median function should be specialized to 
do. IMHO, median filtering belongs to scipy.

Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Jon Wright
Chad Netzer wrote:
 But Charles Harris's earlier suggestion of some hard coded medians for
 common filter template sizes (ie 3x3, 5x5, etc.) may be a nice
 addition to scipy, especially if it can be generalized somewhat to
 other filters.
   
For 2D images, try looking into PIL: ImageFilter.MedianFilter
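
A rough usage sketch (the file name here is made up):

from PIL import Image, ImageFilter

im = Image.open('cells.png')
filtered = im.filter(ImageFilter.MedianFilter(3))   # 3x3 median filter
filtered.save('cells_median.png')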

Cheers,

Jon
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion