Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Jeremy Conlin
On Fri, Aug 19, 2011 at 9:23 AM, Warren Weckesser
 wrote:
>
>
> On Fri, Aug 19, 2011 at 10:09 AM, Jeremy Conlin  wrote:
>>
>> On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen 
>> wrote:
>> > On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin 
>> > wrote:
>> >> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
>> >>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
>> >>>> I would like to use numpy's memmap on some data files I have. The
>> >>>> first
>> >>>> 12 or so lines of the files contain text (header information) and the
>> >>>> remainder has the numerical data. Is there a way I can tell memmap to
>> >>>> skip a specified number of lines instead of a number of bytes?
>> >>>
>> >>> First use standard Python I/O functions to determine the number of
>> >>> bytes to skip at the beginning and the number of data items. Then pass
>> >>> in `offset` and `shape` parameters to numpy.memmap.
>> >>
>> >> Thanks for that suggestion. However, I'm unfamiliar with the I/O
>> >> functions you are referring to. Can you point me to the
>> >> documentation?
>> >>
>> >> Thanks again,
>> >> Jeremy
>> >> ___
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >>
>> >
>> > this might get you started:
>> >
>> >
>> > import numpy as np
>> >
>> > # make some fake data with 12 header lines.
>> > with open('test.mm', 'w') as fhw:
>> >    print >> fhw, "\n".join('header' for i in range(12))
>> >    np.arange(100, dtype=np.uint).tofile(fhw)
>> >
>> > # use normal python io to determine the offset after 12 lines.
>> > with open('test.mm') as fhr:
>> >    for i in range(12): fhr.readline()
>> >    offset = fhr.tell()
>> >
>> > # use the offset in your call to np.memmap.
>> > a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)
>>
>> Thanks, that looks good. I tried it, but it doesn't get the correct
>> data. I really don't understand what is going on. A simple code and
>> sample data is attached if anyone has a chance to look at it.
>
>
> Your data file is all text.  memmap is generally for binary data; it won't
> work with this file.
>
> Warren

Yikes! I missed the "binary" in the first line of the documentation. Sorry!

Jeremy
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Warren Weckesser
On Fri, Aug 19, 2011 at 10:09 AM, Jeremy Conlin  wrote:

> On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen 
> wrote:
> > On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin 
> wrote:
> >> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
> >>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
> >>>> I would like to use numpy's memmap on some data files I have. The
> >>>> first
> >>>> 12 or so lines of the files contain text (header information) and the
> >>>> remainder has the numerical data. Is there a way I can tell memmap to
> >>>> skip a specified number of lines instead of a number of bytes?
> >>>
> >>> First use standard Python I/O functions to determine the number of
> >>> bytes to skip at the beginning and the number of data items. Then pass
> >>> in `offset` and `shape` parameters to numpy.memmap.
> >>
> >> Thanks for that suggestion. However, I'm unfamiliar with the I/O
> >> functions you are referring to. Can you point me to the
> >> documentation?
> >>
> >> Thanks again,
> >> Jeremy
> >> ___
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >>
> >
> > this might get you started:
> >
> >
> > import numpy as np
> >
> > # make some fake data with 12 header lines.
> > with open('test.mm', 'w') as fhw:
> >    print >> fhw, "\n".join('header' for i in range(12))
> >    np.arange(100, dtype=np.uint).tofile(fhw)
> >
> > # use normal python io to determine the offset after 12 lines.
> > with open('test.mm') as fhr:
> >    for i in range(12): fhr.readline()
> >    offset = fhr.tell()
> >
> > # use the offset in your call to np.memmap.
> > a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)
>
> Thanks, that looks good. I tried it, but it doesn't get the correct
> data. I really don't understand what is going on. A simple code and
> sample data is attached if anyone has a chance to look at it.
>


Your data file is all text.  memmap is generally for binary data; it won't
work with this file.
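
To illustrate the point, here is a minimal sketch (the file name `numbers.txt` and its contents are made up for the demonstration). Mapping a text file gives you the bytes of the characters, not the numbers they spell:

```python
import os
import tempfile

import numpy as np

# Hypothetical text file holding the numbers 1, 2 and 3 as characters.
text_path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')
with open(text_path, 'wb') as fh:
    fh.write(b'1 2 3\n')

# memmap exposes the raw bytes on disk; it never parses text.
raw = np.memmap(text_path, mode='r', dtype=np.uint8)
print(raw.tolist())  # ASCII codes of '1', ' ', '2', ' ', '3', '\n'
```

With a wider dtype such as a float, the same bytes would simply be reinterpreted in groups, which is why the values look wrong.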

Warren



>
> Thanks,
> Jeremy
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Brent Pedersen
On Fri, Aug 19, 2011 at 9:09 AM, Jeremy Conlin  wrote:
> On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen  wrote:
>> On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin  wrote:
>>> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
>>>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
>>>>> I would like to use numpy's memmap on some data files I have. The first
>>>>> 12 or so lines of the files contain text (header information) and the
>>>>> remainder has the numerical data. Is there a way I can tell memmap to
>>>>> skip a specified number of lines instead of a number of bytes?
>>>>
>>>> First use standard Python I/O functions to determine the number of
>>>> bytes to skip at the beginning and the number of data items. Then pass
>>>> in `offset` and `shape` parameters to numpy.memmap.
>>>
>>> Thanks for that suggestion. However, I'm unfamiliar with the I/O
>>> functions you are referring to. Can you point me to the
>>> documentation?
>>>
>>> Thanks again,
>>> Jeremy
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>> this might get you started:
>>
>>
>> import numpy as np
>>
>> # make some fake data with 12 header lines.
>> with open('test.mm', 'w') as fhw:
>>    print >> fhw, "\n".join('header' for i in range(12))
>>    np.arange(100, dtype=np.uint).tofile(fhw)
>>
>> # use normal python io to determine the offset after 12 lines.
>> with open('test.mm') as fhr:
>>    for i in range(12): fhr.readline()
>>    offset = fhr.tell()
>>
>> # use the offset in your call to np.memmap.
>> a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)
>
> Thanks, that looks good. I tried it, but it doesn't get the correct
> data. I really don't understand what is going on. A simple code and
> sample data is attached if anyone has a chance to look at it.
>
> Thanks,
> Jeremy
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

in that case, i would use:

np.loadtxt('tmp.dat', skiprows=12)
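
If memory-mapped access is still wanted afterwards, one option is to parse the text once and save it in NumPy's binary `.npy` format. This is only a sketch: the sample file below is invented (12 header lines followed by two rows of numbers), and the `tmp.npy` name is an assumption.

```python
import os
import tempfile

import numpy as np

workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, 'tmp.dat')
npy_path = os.path.join(workdir, 'tmp.npy')  # hypothetical output name

# Made-up stand-in for the real file: 12 header lines, then numeric text.
with open(txt_path, 'w') as fh:
    fh.write('header\n' * 12)
    fh.write('1.0 2.0 3.0\n4.0 5.0 6.0\n')

# Parse the text once, skipping the header lines.
data = np.loadtxt(txt_path, skiprows=12)

# Store it as binary, then map it instead of reading it all into memory.
np.save(npy_path, data)
mapped = np.load(npy_path, mmap_mode='r')
```

The one-time conversion costs a full parse, but later runs can open the `.npy` file lazily.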


Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Jeremy Conlin
On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen  wrote:
> On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin  wrote:
>> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
>>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
>>>> I would like to use numpy's memmap on some data files I have. The first
>>>> 12 or so lines of the files contain text (header information) and the
>>>> remainder has the numerical data. Is there a way I can tell memmap to
>>>> skip a specified number of lines instead of a number of bytes?
>>>
>>> First use standard Python I/O functions to determine the number of
>>> bytes to skip at the beginning and the number of data items. Then pass
>>> in `offset` and `shape` parameters to numpy.memmap.
>>
>> Thanks for that suggestion. However, I'm unfamiliar with the I/O
>> functions you are referring to. Can you point me to the
>> documentation?
>>
>> Thanks again,
>> Jeremy
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> this might get you started:
>
>
> import numpy as np
>
> # make some fake data with 12 header lines.
> with open('test.mm', 'w') as fhw:
>    print >> fhw, "\n".join('header' for i in range(12))
>    np.arange(100, dtype=np.uint).tofile(fhw)
>
> # use normal python io to determine the offset after 12 lines.
> with open('test.mm') as fhr:
>    for i in range(12): fhr.readline()
>    offset = fhr.tell()
>
> # use the offset in your call to np.memmap.
> a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)

Thanks, that looks good. I tried it, but it doesn't get the correct
data. I really don't understand what is going on. A short script and
sample data are attached if anyone has a chance to look at them.

Thanks,
Jeremy


tmp.dat
Description: Binary data


tmp.py
Description: Binary data


Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Pearu Peterson


On 08/19/2011 05:01 PM, Brent Pedersen wrote:
> On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin  wrote:
>> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
>>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
>>>> I would like to use numpy's memmap on some data files I have. The first
>>>> 12 or so lines of the files contain text (header information) and the
>>>> remainder has the numerical data. Is there a way I can tell memmap to
>>>> skip a specified number of lines instead of a number of bytes?
>>>
>>> First use standard Python I/O functions to determine the number of
>>> bytes to skip at the beginning and the number of data items. Then pass
>>> in `offset` and `shape` parameters to numpy.memmap.
>>
>> Thanks for that suggestion. However, I'm unfamiliar with the I/O
>> functions you are referring to. Can you point me to the
>> documentation?
>>
>> Thanks again,
>> Jeremy
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> this might get you started:
>
>
> import numpy as np
>
> # make some fake data with 12 header lines.
> with open('test.mm', 'w') as fhw:
>  print>>  fhw, "\n".join('header' for i in range(12))
>  np.arange(100, dtype=np.uint).tofile(fhw)
>
> # use normal python io to determine the offset after 12 lines.
> with open('test.mm') as fhr:
>  for i in range(12): fhr.readline()
>  offset = fhr.tell()

I think that before reading a line the program should
check whether the line starts with "#". Otherwise fhr.readline()
may return a very large chunk of data (maybe the rest of the file
content) that ought to be read only via memmap.
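
Something along these lines, as a rough sketch. It assumes every header line begins with "#" and that the binary data does not itself begin with that byte; the helper name `comment_header_offset` and the demo file are made up.

```python
import os
import tempfile

import numpy as np

def comment_header_offset(path, marker=b'#'):
    """Return the byte offset of the first line not starting with marker."""
    with open(path, 'rb') as fh:
        while True:
            pos = fh.tell()
            # Peek at one byte, so a huge binary "line" is never read.
            if fh.read(1) != marker:
                return pos
            fh.readline()  # consume the rest of this header line

# Made-up demo file: three '#' header lines, then ten 64-bit integers.
demo_path = os.path.join(tempfile.mkdtemp(), 'commented.bin')
with open(demo_path, 'wb') as fh:
    fh.write(b'# field\n' * 3)
    np.arange(10, dtype=np.int64).tofile(fh)

offset = comment_header_offset(demo_path)
values = np.memmap(demo_path, mode='r', dtype=np.int64, offset=offset)
```

This also removes the need to know the number of header lines in advance.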

HTH,
Pearu


Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Brent Pedersen
On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin  wrote:
> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
>>> I would like to use numpy's memmap on some data files I have. The first
>>> 12 or so lines of the files contain text (header information) and the
>>> remainder has the numerical data. Is there a way I can tell memmap to
>>> skip a specified number of lines instead of a number of bytes?
>>
>> First use standard Python I/O functions to determine the number of
>> bytes to skip at the beginning and the number of data items. Then pass
>> in `offset` and `shape` parameters to numpy.memmap.
>
> Thanks for that suggestion. However, I'm unfamiliar with the I/O
> functions you are referring to. Can you point me to the
> documentation?
>
> Thanks again,
> Jeremy
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

this might get you started:


import numpy as np

# make some fake data with 12 header lines.
# binary mode, so the header bytes and the array bytes land in one file as-is.
with open('test.mm', 'wb') as fhw:
    fhw.write(b"header\n" * 12)
    np.arange(100, dtype=np.uint).tofile(fhw)

# use normal python io to determine the offset after 12 lines.
with open('test.mm', 'rb') as fhr:
    for i in range(12):
        fhr.readline()
    offset = fhr.tell()

# use the offset in your call to np.memmap.
a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)

assert all(a == np.arange(100))


Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Jeremy Conlin
On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen  wrote:
> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
>> I would like to use numpy's memmap on some data files I have. The first
>> 12 or so lines of the files contain text (header information) and the
>> remainder has the numerical data. Is there a way I can tell memmap to
>> skip a specified number of lines instead of a number of bytes?
>
> First use standard Python I/O functions to determine the number of
> bytes to skip at the beginning and the number of data items. Then pass
> in `offset` and `shape` parameters to numpy.memmap.

Thanks for that suggestion. However, I'm unfamiliar with the I/O
functions you are referring to. Can you point me to the
documentation?

Thanks again,
Jeremy


Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Pauli Virtanen
Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
> I would like to use numpy's memmap on some data files I have. The first
> 12 or so lines of the files contain text (header information) and the
> remainder has the numerical data. Is there a way I can tell memmap to
> skip a specified number of lines instead of a number of bytes?

First use standard Python I/O functions to determine the number of
bytes to skip at the beginning and the number of data items. Then pass
in `offset` and `shape` parameters to numpy.memmap.
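
A minimal sketch of that recipe, assuming the data after the header really is raw binary of a known dtype. The helper name `memmap_after_header` and the demo file are hypothetical; `readline()` and `tell()` are the standard file-object methods.

```python
import os
import tempfile

import numpy as np

def memmap_after_header(path, n_header_lines, dtype):
    """Map the binary payload that follows n_header_lines text lines."""
    with open(path, 'rb') as fh:
        for _ in range(n_header_lines):
            fh.readline()      # skip one header line
        offset = fh.tell()     # number of bytes to skip
    # Number of data items = remaining bytes / bytes per item.
    itemsize = np.dtype(dtype).itemsize
    n_items = (os.path.getsize(path) - offset) // itemsize
    return np.memmap(path, mode='r', dtype=dtype,
                     offset=offset, shape=(n_items,))

# Made-up demo file: 12 text header lines, then 100 doubles.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
with open(demo_path, 'wb') as fh:
    fh.write(b'header line\n' * 12)
    np.arange(100, dtype=np.float64).tofile(fh)

data = memmap_after_header(demo_path, 12, np.float64)
```

Opening the file in binary mode keeps `tell()` a plain byte count, which is what `offset` expects.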

-- 
Pauli Virtanen



[Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Jeremy Conlin
I would like to use numpy's memmap on some data files I have. The
first 12 or so lines of the files contain text (header information)
and the remainder has the numerical data. Is there a way I can tell
memmap to skip a specified number of lines instead of a number of
bytes?

Thanks,
Jeremy