The sixteen lines of data you sent work for me in a little histogram
generator, ignoring the masking (as a nearly-newbie, I can say that
ignoring the stuff I don't yet care about usually works):
from matplotlib.mlab import csv2rec
import pylab as p

# The file has no header line, so supply the column names explicitly.
names = ('date', 'time', 'program', 'level', 'error_id', 'thread',
         'na', 'machine', 'request', 'detail')
r = csv2rec("/Users/clew/Documents/pycode/test.csv", names=names)
print r.shape
print r[3]

# Dump every column to check that each one was parsed.
for name in names:
    print 'Values of ', name, ':'
    print r[name]

# Pick out the rows logged by thread 537.
for row in r:
    if row['thread'] == 537:
        print row

print type(r['thread'])

# Histogram of the thread ids.
counts, bins, patches = p.hist(r['thread'])
print counts, bins, patches
p.savefig('csvhistogram')
p.show()
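
Since what you are after in the end is counts per day, here is a rough
sketch of that counting step. It uses only the standard library csv
module rather than csv2rec, so it does not depend on how csv2rec parses
the file. The three rows in SAMPLE are made-up stand-ins shaped like
your data (the request fields are placeholders, not real log values),
and counts_per_day is a name I invented for illustration.

```python
import csv
from collections import Counter
from datetime import datetime

# Column names from the original post; the file has no header line.
NAMES = ('date', 'time', 'program', 'level', 'error_id', 'thread',
         'na', 'machine', 'request', 'detail')

# Illustrative rows only: same layout as the posted data, placeholder values.
SAMPLE = [
    "8/17/2009,4:49:52 PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:example,Completed..",
    "8/17/2009,4:49:51 PM,CVAgent,Warning,8,556,N/A,THP-PR-APVL,example,",
    "8/18/2009,9:00:00 AM,CVAgent,Information,2,447,N/A,THP-PR-APVL,example,.",
]


def counts_per_day(lines):
    """Return a Counter mapping each calendar date to its record count."""
    counts = Counter()
    for row in csv.DictReader(lines, fieldnames=NAMES):
        # Dates look like 8/17/2009, i.e. month/day/four-digit year.
        day = datetime.strptime(row['date'], '%m/%d/%Y').date()
        counts[day] += 1
    return counts


counts = counts_per_day(SAMPLE)
for day in sorted(counts):
    print(day, counts[day])
```

For the real file, an open file object can be passed in place of
SAMPLE, and the sorted days and their counts can then be drawn as a bar
chart.
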
Does this work for you? On the whole file?
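
On the masking question, I can't say why you are getting masked
records, but if they behave like ordinary numpy masked arrays then the
values are still there underneath the mask. A small sketch of how to
look at them, using a hand-built masked array as a stand-in for one
column (the numbers are just thread ids from your sample, and the
variable names are mine):

```python
import numpy.ma as ma

# Stand-in for one fully masked column of a masked record array.
thread = ma.array([537, 556, 447], mask=[True, True, True])

# The underlying values are kept even while they are masked.
raw = thread.data.copy()

# filled() returns a plain ndarray with masked entries replaced.
filled = thread.filled(-1)

print(thread)   # masked entries are shown as --
print(raw)
print(filled)

# Clearing the mask makes every value visible again.
thread.mask = ma.nomask
print(thread)
```

Whether this is the right fix for your file is a separate matter: if
every field comes back masked, the parse itself may be what needs
attention.
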
&C
On Aug 21, 2009, at 9:27 AM, Phil Robare wrote:
> Hi folks,
>
> I have a (newbie) problem using csv2rec. I am a regular Python user,
> but this is my first time using matplotlib and numpy, after being
> inspired by a talk by Dr. John Hunter.
>
> I am trying to read a csv file that has >6000 lines that look like
> this:
>
> <code>
> 8/17/2009,4:49:52
> PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:
> 20090210::7881558:3893255:311247:166422::,Completed..
> 8/17/2009,4:49:52
> PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:
> 20090210::7881558:3888955:311247:166422::,From
> Disk..
> 8/17/2009,4:49:51 PM,CVAgent,Warning,8,556,N/A,THP-PR-APVL,Exception
> in CVProcess.GetNewfile: The process cannot access the file because it
> is being used by another process..,
> 8/17/2009,4:49:51 PM,CVAgent,Information,2,447,N/A,THP-PR-APVL,SDAY ->
> R:
> 20090210
> :::3893955:311247:166422::20090210:::3893955:388247:166422::50:,.
> 8/17/2009,4:29:55
> PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:
> 20090728::7881558:4888461:22088980:964878::,Completed..
> 8/17/2009,4:29:55
> PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:
> 20090728::7881558:4888461:22030980:964878::,From
> Disk..
> 8/17/2009,4:29:54 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,JJULIO
> -> R:
> 20090728
> :::4888461:22030980:964878::20090728:::4888461:22030980:964878::50:,.
> 8/17/2009,4:24:02
> PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:
> 20090226::7881558:2882501:325032:316888::,Completed..
> 8/17/2009,4:24:02
> PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:
> 20090226::7881558:8822501:325882:318816::,From
> Disk..
> 8/17/2009,4:23:56 PM,CVAgent,Information,2,556,N/A,THP-PR-APVL,tdietz
> -> R:20090226::::325882:318816::20090226::::325882:318816::50:,.
> 8/17/2009,4:21:41 PM,CVAgent,Information,2,3045,N/A,THP-PR-APVL,tdietz
> -> R:20090226::::325882:318816::20090226::::325032:318816::50:,.
> 8/17/2009,4:19:44
> PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:
> 20090210::7881558:2882613:278887:4020000::,Completed..
> 8/17/2009,4:19:43
> PM,CVAgent,Information,3,537,N/A,THP-PR-APVL,F:
> 20090210::7881558:2882613:278777:4020000::,From
> Disk..
> 8/17/2009,4:19:42 PM,CVAgent,Information,2,793,N/A,THP-PR-APVL,MUTSCH
> -> R:
> 20090210
> :::2882613:278887:4020000::20090210:::2882613:278887:4020000::50:,.
> 8/17/2009,4:11:02
> PM,CVAgent,Information,5,793,N/A,THP-PR-APVL,F:
> 20090817::7881558:1776517:1211:58800::,Completed..
> 8/17/2009,4:49:52
> PM,CVAgent,Information,5,537,N/A,THP-PR-APVL,F:
> 20090210::7881558:3893255:311247:166422::,Completed..
> </code>
>
> I have given the columns names since there is not a header line:
> <code>
> In [150]: print names
> ('date', 'time', 'program', 'level', 'error_id', 'thread', 'na',
> 'machine', 'request', 'detail')
> </code>
>
> and I have provided converter functions to be sure the data is read
> correctly:
> <code>
> In [152]: print converterd
> {'thread': <type 'int'>, 'level': <type 'str'>, 'na': <type 'str'>,
> 'request': <type 'str'>, 'detail': <type 'str'>, 'machine': <type
> 'str'>, 'program': <type 'str'>, 'time': <function str2time at
> 0x03795530>, 'date': <function str2date at
> 0x037950B0>}
> </code>
>
> (I'm not sure if this is needed. IPython seems to recognize csv2rec
> just fine but the sample program does an import like this.)
> <code>
> In [141]: import matplotlib.mlab as mlab
> </code>
>
> So now I call csv2rec on my file. It takes a second or so to gulp it
> all in and then returns without error.
> <code>
> In [142]: r=mlab.csv2rec(filename,converterd=converterd,names=names)
> </code>
>
> So now I look to see what I have. And it's nothing like I thought it
> would be. I expected thousands of records and I have 10. I expected
> times and dates, ints and strings. And all I have are masked values.
> <code>
> In [143]: r
> Out[143]:
> masked_records(
> date : [-- -- -- -- -- -- -- -- -- --]
> time : [-- -- -- -- -- -- -- -- -- --]
> program : [-- -- -- -- -- -- -- -- -- --]
> level : [-- -- -- -- -- -- -- -- -- --]
> error_id : [-- -- -- -- -- -- -- -- -- --]
> thread : [-- -- -- -- -- -- -- -- -- --]
> na : [-- -- -- -- -- -- -- -- -- --]
> machine : [-- -- -- -- -- -- -- -- -- --]
> request : [-- -- -- -- -- -- -- -- -- --]
> detail : [-- -- -- -- -- -- -- -- -- --]
> fill_value : ('?', '?', '?', '?', '?', '?', '?', '?', '?', '?')
> )
> </code>
>
> So I look at the mask. I see no clues here.
> <code>
> In [144]: r.mask
> Out[144]:
> array([(True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True),
> (True, True, True, True, True, True, True, True, True, True)],
> dtype=[('date', '|b1'), ('time', '|b1'), ('program', '|b1'),
> ('level', '|b1'), ('error_id', '|b1'), ('thread', '|b1'), ('na',
> '|b1'), ('machine', '|b1'),
> ('request', '|b1'), ('detail', '|b1')])
> </code>
>
> Well, maybe if I change the mask I can see what is being hidden.
> <code>
> In [145]: r.mask[0]
> Out[145]: (True, True, True, True, True, True, True, True, True, True)
>
> In [146]: r.mask[0]=(False,)*10
>
> In [147]: r
> Out[147]:
> masked_records(
> date : [2009-08-17 -- -- -- -- -- -- -- -- --]
> time : [2009-08-17 -- -- -- -- -- -- -- -- --]
> program : [2009-08-17 -- -- -- -- -- -- -- -- --]
> level : [2009-08-17 -- -- -- -- -- -- -- -- --]
> error_id : [2009-08-17 -- -- -- -- -- -- -- -- --]
> thread : [2009-08-17 -- -- -- -- -- -- -- -- --]
> na : [2009-08-17 -- -- -- -- -- -- -- -- --]
> machine : [2009-08-17 -- -- -- -- -- -- -- -- --]
> request : [2009-08-17 -- -- -- -- -- -- -- -- --]
> detail : [2009-08-17 -- -- -- -- -- -- -- -- --]
> fill_value : ('?', '?', '?', '?', '?', '?', '?', '?', '?', '?')
> )
> </code>
>
> So I think I see what is going on. Rather than taking each line of
> the input file as a record it is taking each column as a record.
> Since I said there are ten values per record it stopped after ten rows
> since that is all the columns it had to fill in.
>
> Now you know my problem.
>
> How do I get csv2rec to read my file so I can start getting nice
> histograms of counts per day?
>
> A further question: why am I getting masked records at all, and how
> do I control this? I don't see anything in the numpy or matplotlib
> user guides that answers this. I did find a helpful document on the
> web (http://www.bom.gov.au/bmrc/climdyn/staff/lih/pubs/docs/masks.pdf)
> that explained what masks are and why and how they can be used. I
> don't need them and would like to make sure that nothing is masked.
>
> Thanks in advance for helping a newbie over the hump.
>
> Phil Robare
>
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008
> 30-Day
> trial. Simplify your report design, integration and deployment - and
> focus on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now. http://p.sf.net/sfu/bobj-july
> _______________________________________________
> Matplotlib-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/matplotlib-users
Chloe Lewis
Graduate student, Amundson Lab
Division of Ecosystem Sciences, ESPM
University of California, Berkeley
137 Mulford Hall - #3114
Berkeley, CA 94720-3114
[email protected]