Re: [Numpy-discussion] How to start at line # x when using numpy.memmap
On Fri, Aug 19, 2011 at 9:23 AM, Warren Weckesser wrote:
> Your data file is all text. memmap is generally for binary data; it won't
> work with this file.
>
> Warren

Yikes! I missed the "binary" in the first line of the documentation. Sorry!

Jeremy
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How to start at line # x when using numpy.memmap
On Fri, Aug 19, 2011 at 10:09 AM, Jeremy Conlin wrote:
> Thanks, that looks good. I tried it, but it doesn't get the correct
> data. I really don't understand what is going on. A simple script and
> sample data are attached if anyone has a chance to look at it.

Your data file is all text. memmap is generally for binary data; it won't
work with this file.
Warren
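A small illustration of Warren's point, as a sketch (the temporary file name is made up for the demo): np.memmap does no parsing, it only exposes the file's raw bytes. The digits of a text file therefore come back as character codes, and a wider dtype simply glues several characters together into one meaningless number.

```python
import os
import tempfile

import numpy as np

# Hypothetical demo file holding numbers written as plain ASCII text.
path = os.path.join(tempfile.gettempdir(), 'memmap_text_demo.txt')
with open(path, 'wb') as f:
    f.write(b'1 2 3\n')

# Viewed as uint8, each element is one byte of the file: ASCII codes, not values.
raw = np.memmap(path, mode='r', dtype=np.uint8)
codes = raw[:5].tolist()   # [49, 32, 50, 32, 51], i.e. ord('1'), ord(' '), ord('2'), ...
text = raw.tobytes()       # b'1 2 3\n'

# Viewed as little-endian int32, the bytes '1', ' ', '2', ' ' merge into one integer.
word = np.memmap(path, mode='r', dtype='<i4', shape=(1,))
first = int(word[0])       # 540155953 (0x20322031), not 1
```

Reading the numbers correctly requires a text parser such as np.loadtxt rather than memmap.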
Re: [Numpy-discussion] How to start at line # x when using numpy.memmap
On Fri, Aug 19, 2011 at 9:09 AM, Jeremy Conlin wrote:
> Thanks, that looks good. I tried it, but it doesn't get the correct
> data. I really don't understand what is going on. A simple script and
> sample data are attached if anyone has a chance to look at it.
In that case, I would use:

np.loadtxt('tmp.dat', skiprows=12)
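A sketch of the loadtxt route for a file like Jeremy's, where the numbers after the header are stored as text. The file below is a made-up stand-in for tmp.dat (12 header lines followed by two whitespace-separated columns), not the actual attachment.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for tmp.dat: 12 text header lines, then numeric rows.
path = os.path.join(tempfile.gettempdir(), 'loadtxt_demo.dat')
with open(path, 'w') as f:
    for i in range(12):
        f.write('header line %d\n' % i)
    for row in range(3):
        f.write('%d %d\n' % (2 * row, 2 * row + 1))

# skiprows discards the first 12 lines before any parsing happens.
data = np.loadtxt(path, skiprows=12)   # float64 array with shape (3, 2)
```

np.genfromtxt(path, skip_header=12) should give the same result here, and it additionally tolerates missing values.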
Re: [Numpy-discussion] How to start at line # x when using numpy.memmap
On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen wrote:
> # use the offset in your call to np.memmap.
> a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)

Thanks, that looks good. I tried it, but it doesn't get the correct data. I really don't understand what is going on. A simple script and sample data are attached if anyone has a chance to look at it.

Thanks,
Jeremy

tmp.dat
Description: Binary data
tmp.py
Description: Binary data
Re: [Numpy-discussion] How to start at line # x when using numpy.memmap
On 08/19/2011 05:01 PM, Brent Pedersen wrote:
> with open('test.mm') as fhr:
>     for i in range(12): fhr.readline()
>     offset = fhr.tell()

I think that before reading a line the program should check whether the line starts with "#". Otherwise fhr.readline() may return a very large chunk of data (possibly the rest of the file content) that ought to be read only via memmap.

HTH,
Pearu
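Pearu's suggestion can be sketched by peeking at the first bytes of each line before calling readline, and by capping how much a single readline may consume. This assumes, as Pearu does, that header lines start with "#" and that the binary payload does not; header_offset, max_line, and the demo file name are hypothetical.

```python
import os
import tempfile

import numpy as np

def header_offset(path, prefix=b'#', max_line=4096):
    """Byte offset of the first line that does not start with `prefix`.

    Assumes every header line starts with `prefix` and the binary payload
    does not; `max_line` caps how many bytes one readline() may consume.
    """
    with open(path, 'rb') as f:
        while True:
            pos = f.tell()
            # Peek at the start of the next line before reading it in full.
            if f.read(len(prefix)) != prefix:
                return pos            # payload (or EOF) starts here
            f.seek(pos)
            line = f.readline(max_line)
            if not line.endswith(b'\n'):
                raise ValueError('header line longer than max_line bytes')

# Demo: 12 '#' header lines followed by 100 little-endian uint32 values.
path = os.path.join(tempfile.gettempdir(), 'header_offset_demo.mm')
payload = np.arange(100, dtype='<u4')
with open(path, 'wb') as f:
    for i in range(12):
        f.write(b'# header %d\n' % i)
    f.write(payload.tobytes())

offset = header_offset(path)
a = np.memmap(path, mode='r', dtype='<u4', offset=offset)
```

If the payload could itself begin with the prefix byte, this check would misfire, so a fixed line count or an explicit byte offset stored in the header would be safer in that case.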
Re: [Numpy-discussion] How to start at line # x when using numpy.memmap
On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin wrote:
> Thanks for that suggestion. However, I'm unfamiliar with the I/O
> functions you are referring to. Can you point me to the
> documentation?

This might get you started:

import numpy as np

# make some fake data with 12 header lines.
with open('test.mm', 'wb') as fhw:
    fhw.write(b'header\n' * 12)
    np.arange(100, dtype=np.uint).tofile(fhw)

# use normal python io to determine the offset after 12 lines.
# (binary mode, so tell() returns a plain byte count.)
with open('test.mm', 'rb') as fhr:
    for i in range(12):
        fhr.readline()
    offset = fhr.tell()

# use the offset in your call to np.memmap.
a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)
assert (a == np.arange(100)).all()
Re: [Numpy-discussion] How to start at line # x when using numpy.memmap
On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen wrote:
> First use standard Python I/O functions to determine the number of
> bytes to skip at the beginning and the number of data items. Then pass
> in `offset` and `shape` parameters to numpy.memmap.

Thanks for that suggestion. However, I'm unfamiliar with the I/O functions you are referring to. Can you point me to the documentation?

Thanks again,
Jeremy
Re: [Numpy-discussion] How to start at line # x when using numpy.memmap
Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
> I would like to use numpy's memmap on some data files I have. The first
> 12 or so lines of the files contain text (header information) and the
> remainder has the numerical data. Is there a way I can tell memmap to
> skip a specified number of lines instead of a number of bytes?

First use standard Python I/O functions to determine the number of bytes to skip at the beginning and the number of data items. Then pass in `offset` and `shape` parameters to numpy.memmap.

--
Pauli Virtanen
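Pauli's recipe, with both offset and shape computed explicitly, might look like the following sketch. It assumes newline-terminated header lines followed by raw binary data of a known dtype; memmap_after_lines and the demo file name are hypothetical. Opening the file in binary mode keeps readline() from decoding the binary part and makes tell() a plain byte count.

```python
import os
import tempfile

import numpy as np

def memmap_after_lines(path, n_lines, dtype):
    """Map the raw binary data that follows `n_lines` text header lines."""
    # Binary mode: readline() never decodes, tell() is a byte offset.
    with open(path, 'rb') as f:
        for _ in range(n_lines):
            f.readline()
        offset = f.tell()
    # Number of whole items between the end of the header and end of file.
    itemsize = np.dtype(dtype).itemsize
    count = (os.path.getsize(path) - offset) // itemsize
    return np.memmap(path, mode='r', dtype=dtype, offset=offset, shape=(count,))

# Demo: 12 header lines, then 50 little-endian float64 values.
path = os.path.join(tempfile.gettempdir(), 'memmap_after_lines_demo.mm')
with open(path, 'wb') as f:
    f.write(b'header\n' * 12)
    f.write(np.arange(50, dtype='<f8').tobytes())

mm = memmap_after_lines(path, 12, '<f8')
```

Passing shape explicitly also makes it straightforward to map only part of the data, or to use a two-dimensional shape when the record layout is known.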
[Numpy-discussion] How to start at line # x when using numpy.memmap
I would like to use numpy's memmap on some data files I have. The first 12 or so lines of the files contain text (header information) and the remainder has the numerical data. Is there a way I can tell memmap to skip a specified number of lines instead of a number of bytes?

Thanks,
Jeremy