On Mon, Feb 20, 2012 at 1:35 PM, Brett Olsen <brett.ol...@gmail.com> wrote:
> On Sat, Feb 18, 2012 at 8:12 PM, Adam Hughes <hugad...@gwmail.gwu.edu> wrote:
>> Hey everyone,
>>
>> I have timeseries data in which the column label is simply a filename from
>> which the original data was taken. Here's some sample data:
>>
>> name1.txt    name2.txt    name3.txt
>> 32           34           953
>> 32           03           402
>>
>> I've noticed that the standard genfromtxt() method works great; however, the
>> names aren't written correctly. That is, if I use the command:
>>
>> print data['name1.txt']
>>
>> Nothing happens.
>>
>> However, when I remove the file extension, e.g.:
>>
>> name1    name2    name3
>> 32       34       953
>> 32       03       402
>>
>> Then print data['name1'] returns (32, 32) as expected. It seems that the
>> period in the name isn't compatible with the genfromtxt() names attribute.
>> Is there a workaround, or do I need to restructure my program to get the
>> extension removed? I'd rather not do this if possible for reasons that
>> aren't important for the discussion at hand.
>
> It looks like the period is just getting stripped out of the names:
>
> In [1]: import numpy as N
>
> In [2]: N.genfromtxt('sample.txt', names=True)
> Out[2]:
> array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
>       dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
>
> Interestingly, this still happens if you supply the names manually:
>
> In [17]: def reader(filename):
>    ....:     infile = open(filename, 'r')
>    ....:     names = infile.readline().split()
>    ....:     data = N.genfromtxt(infile, names=names)
>    ....:     infile.close()
>    ....:     return data
>    ....:
>
> In [20]: data = reader('sample.txt')
>
> In [21]: data
> Out[21]:
> array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
>       dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
>
> What you can do is reset the names after genfromtxt is through with it,
> though:
>
> In [34]: def reader(filename):
>    ....:     infile = open(filename, 'r')
>    ....:     names = infile.readline().split()
>    ....:     infile.close()
>    ....:     data = N.genfromtxt(filename, names=True)
>    ....:     data.dtype.names = names
>    ....:     return data
>    ....:
>
> In [35]: data = reader('sample.txt')
>
> In [36]: data
> Out[36]:
> array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
>       dtype=[('name1.txt', '<f8'), ('name2.txt', '<f8'), ('name3.txt', '<f8')])
>
> Be warned, I don't know why the period is getting stripped; there may
> be a good reason, and adding it in might cause problems.

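A self-contained sketch of the rename-after-reading workaround quoted above, in case it helps to try it without a file on disk. It assumes the sample data is held in a string and read through io.StringIO; the names SAMPLE and read_with_original_names are made up for illustration and are not part of NumPy.

```python
import io

import numpy as np

# Hypothetical in-memory copy of the sample file from the original question.
SAMPLE = "name1.txt name2.txt name3.txt\n32 34 953\n32 03 402\n"


def read_with_original_names(text):
    """Parse whitespace-delimited text and keep the header labels verbatim."""
    # Grab the labels from the header line before genfromtxt sanitises them.
    names = tuple(text.splitlines()[0].split())
    # genfromtxt reads the header itself and strips the periods from it.
    data = np.genfromtxt(io.StringIO(text), names=True)
    # Put the original labels back on the structured dtype afterwards.
    data.dtype.names = names
    return data


data = read_with_original_names(SAMPLE)
print(data.dtype.names)
print(data['name1.txt'])
```

Field names restored this way only work with item access such as data['name1.txt']; attribute-style access on a recarray view cannot reach a name that contains a period.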
I think the period is stripped because recarrays also offer attribute
access of names. So you wouldn't be able to do

your_array.sample.txt

All the names get passed through a name validator. IIRC it's something like

from numpy.lib import _iotools

validator = _iotools.NameValidator()

validator.validate('sample1.txt')
validator.validate('a name with spaces')

NameValidator has a good docstring and the gist of this should be in
the genfromtxt docs, if it's not already.

Skipper
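For anyone who wants to see what the validator does to a given label, here is a minimal runnable version of the snippet above. It assumes NameValidator is importable from the private module numpy.lib._iotools, which is not a public API and may move or change between NumPy releases; the example labels are only illustrations.

```python
# Private module: an assumption that it stays importable under this path.
from numpy.lib._iotools import NameValidator

validator = NameValidator()

# validate() accepts a single string or a sequence and returns a tuple.
dotted = validator.validate('sample1.txt')
spaced = validator.validate('a name with spaces')
several = validator.validate(['name1.txt', 'name2.txt', 'name3.txt'])

print(dotted)   # ('sample1txt',)
print(spaced)   # ('a_name_with_spaces',)
print(several)
```

With the default settings the period is deleted and spaces become underscores, which matches the 'name1txt' field names shown earlier in the thread.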