Re: [Tutor] Omitting lines matching a list of strings from a file

2010-02-28 Thread galaxywatcher
One formatting detail: there is a blank line after each line
printed. How do I get rid of the extra blank lines?


lines = [line.strip() for line in infile if line[146:148] not in omit_states]

print '\n'.join(lines)


This approach stripped leading blank spaces introducing errors into my  
fixed width file.



or alternatively

lines = [line for line in infile if line[146:148] not in omit_states]
print ''.join(lines)


This works beautifully leaving leading blank spaces intact. Thanks.

Just remember that doing a list comprehension like that on a large  
file will drastically reduce the speed of your application as well  
as introduce memory bloat.


Processing a file with well over 1 million records worked very
quickly, taking several seconds. I did not notice excessive memory
bloat, though I do have 2 GB of RAM on my MacBook Pro.
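
On the memory question, here is a minimal sketch (written for Python 3, unlike the Python 2 snippets in this thread) of the same filter as a generator, which hands back one line at a time instead of building the whole list first. The function name filter_lines and the sample records are hypothetical; the column offsets 146:148 and the state list come from the thread.

```python
import io

OMIT_STATES = {"KS", "KY", "MA", "ND", "NE", "NJ", "PR",
               "RI", "SD", "VI", "VT", "WI"}

def filter_lines(infile, omit=OMIT_STATES, start=146, end=148):
    """Yield lines whose state code (columns start:end) is not in omit."""
    for line in infile:
        if line[start:end] not in omit:
            yield line  # the line keeps its own newline and leading spaces

# Hypothetical fixed-width records: 146 filler characters, then the state code.
sample = io.StringIO(
    " " * 146 + "KS\n" +
    " " * 146 + "CA\n" +
    " " * 146 + "NY\n"
)
kept = list(filter_lines(sample))
```

With a real file, sys.stdout.writelines(filter_lines(infile)) would write each kept line as it is read, so memory use stays flat however large the file is.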


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Omitting lines matching a list of strings from a file

2010-02-25 Thread galaxywatcher

But I would do this with a list comprehension or generator
expression (depending on your Python version):


lines = [line for line in infile if line[146:148] not in omit_states]
print '\n'.join(lines)


That's very helpful. Thanks. One formatting detail: there is a blank
line after each line printed. How do I get rid of the extra blank lines?
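
The blank lines most likely appear because every line read from a file still ends with its own newline, and joining with '\n' puts a second one after it. A small sketch of the difference, in Python 3, with an in-memory stand-in for the real file:

```python
import io

infile = io.StringIO("first\nsecond\nthird\n")  # stand-in for the real file
lines = [line for line in infile]

doubled = "\n".join(lines)  # separator added on top of each line's own newline
intact = "".join(lines)     # the lines already carry their newlines
```

Joining on the empty string leaves the records exactly as they were read, which matters for a fixed-width file where strip() would also remove leading spaces.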



[Tutor] Omitting lines matching a list of strings from a file

2010-02-24 Thread galaxywatcher
I am trying to output a list of addresses that do not match a list of  
State abbreviations. What I have so far is:


def main():
    infile = open("list.txt", "r")
    for line in infile:
        state = line[146:148]
        omit_states = ['KS', 'KY', 'MA', 'ND', 'NE', 'NJ', 'PR',
                       'RI', 'SD', 'VI', 'VT', 'WI']

        for n in omit_states:
            if state != n:
                print line
    infile.close()
main()

This outputs multiple duplicate lines. The strange thing is that if I
change the test to 'if state == n:' then I correctly output all matching lines.
But I don't want that. I want to output all lines that do NOT match  
the States in the omit_states list.


I am probably overlooking something very simple. Thanks in advance.
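
The duplicates follow from the inner loop: the line is printed once for every abbreviation it differs from, so a state outside the list prints 12 times and a state inside it still prints 11 times. A short sketch in Python 3 that counts those prints and shows a single membership test instead; the helper names prints_per_line and should_print are hypothetical.

```python
omit_states = ['KS', 'KY', 'MA', 'ND', 'NE', 'NJ', 'PR',
               'RI', 'SD', 'VI', 'VT', 'WI']

def prints_per_line(state):
    """Count how often the nested loop from the post would print one line."""
    return sum(1 for n in omit_states if state != n)

def should_print(state):
    """Single membership test: True only when the state is not omitted."""
    return state not in omit_states
```

The '==' version happens to work because at most one abbreviation in the list can equal the state, so each matching line is printed exactly once.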


[Tutor] Python workspace - IDE and version control

2010-01-18 Thread galaxywatcher
I want to share a couple of insights that I had getting started with  
Python that I did not come across in the literature all that often. I  
am discovering that there are two primary supporting tools needed in  
order to create an efficient and productive Python programming  
workspace: IDE and Version Control. I didn't realize at first how  
important these supporting tools for Python would be. If Python is  
your first programming language, you will have to learn how to use a  
good text editor or IDE (Integrated Development Environment). If you  
use textpad, it gets old very fast. I have chosen vim as my IDE and I  
added a few key plugins that I think help a lot (snipMate, surround,  
nerd-tree, and repeat).  I believe that snipMate is a plugin made  
specifically for Python users on vim. Among other features, it auto  
indents your code which is very nice.


So now that I can do some Python scripting, I started to notice that  
my scripts were not very organized. Collaboration of code is  
difficult. I had multiple copies of the same script in different  
directories on my computer, and I did not have a good way to really  
keep track. This is the wrong way. Version Control Systems are tried  
and true technologies for collaborating with others (or even yourself)  
on your code. After some research, I have decided to go with Git. I  
have never used version control before, so I don't know the  
distinctions of the various systems out there. I chose Git mainly  
because github.com is really great. Some MAJOR open source (and  
closed) projects are happening on there and you can download the open  
source code so very easily. I am told Mercurial is good too; Bazaar
and SVN also came up in my research.


Obviously, no tool can think for you. The real programming work of  
course is going on in your brain. I am curious what combination of IDE  
and Version Control System you use and also perhaps, what other tools  
I should be looking at as well.


Thanks.




Re: [Tutor] Expanding a Python script to include a zcat and awk pre-process

2010-01-09 Thread galaxywatcher
After many more hours of reading and testing, I am still struggling to
finish this simple script. Bear in mind that I already got my
desired results by preprocessing with an awk one-liner.


I am opening a zipped file properly, so I did make some progress, but  
simply assigning num1 and num2 to the first 2 columns of the file  
remains elusive. Num3 here gets assigned, not to the 3rd column, but  
the rest of the entire file. I feel like I am missing a simple strip()  
or some other incantation that prevents the entire file from getting  
blobbed into num3. Any help is appreciated in advance.


#!/usr/bin/env python

import string
import re
import zipfile
highflag = flagcount = sum = sumtotal = 0
f = file("test.zip")
z = zipfile.ZipFile(f)
for f in z.namelist():
    ranges = z.read(f)
    ranges = ranges.strip()
    num1, num2, num3 = re.split('\W+', ranges, 2)  ## This line is the root of the problem.

    sum = int(num2) - int(num1)
    if sum > 1000:
        flag1 = " "
        flagcount += 1
    else:
        flag1 = ""
    if sum > highflag:
        highflag = sum
    print str(num2) + " - " + str(num1) + " = " + str(sum) + flag1
    sumtotal = sumtotal + sum

print "Total ranges = ", sumtotal
print "Total ranges over 10 million: ", flagcount
print "Largest range: ", highflag

==
$ zcat test.zip
134873600, 134873855, 32787 Protex Technologies, Inc.
135338240, 135338495, 40597
135338496, 135338751, 40993
201720832, 201721087, 12838 HFF Infrastructure  Operations
202739456, 202739711, 1623 Beseau Regional de la Region Languedoc Roussillon
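
One likely reason num3 swallows the rest of the file is that z.read() returns the whole member as a single string, so a split limited to two cuts leaves every later line in the last piece. A minimal sketch in Python 3 that splits line by line instead; the sample text copies two rows from the listing above, and plain str.split stands in for re.split.

```python
data = (
    "134873600, 134873855, 32787 Protex Technologies, Inc.\n"
    "135338240, 135338495, 40597\n"
)

# Splitting the whole text at most twice leaves every later line in the tail.
whole = data.strip().split(", ", 2)

# Splitting each line separately gives one (num1, num2, rest) triple per record.
rows = [line.split(", ", 2) for line in data.splitlines()]
ranges = [int(num2) - int(num1) for num1, num2, _rest in rows]
```

Because the split stops after the second comma, any commas inside the third column stay in the last piece and do no harm.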




Re: [Tutor] Expanding a Python script to include a zcat and awk pre-process

2010-01-09 Thread galaxywatcher
I finally got it working! I would do a victory lap around my apartment  
building if I wasn't recovering from a broken ankle.


Excuse my excitement, but this simple script marks a new level of  
Python proficiency for me. Thanks to Kent, Bob, Denis, and others who  
pointed me in the right direction.
It does quite a few things: it decompresses a zipped file (or several
files, if the archive holds more than one), processes a rather ugly csv
file (ugly because it uses a comma as a delimiter, yet there are commas
inside double-quoted fields), and subtracts one column from the other,
with a little summary to give me the data I need.


#!/usr/bin/env python
import string
import re
import zipfile
highflag = flagcount = sum = sumtotal = 0
z = zipfile.ZipFile('textfile.zip')
for subfile in z.namelist():
    print "Working on filename: " + subfile + "\n"
    data = z.read(subfile)
    pat = re.compile(r"(\d+), (\d+), (\".+\"|\w+)")
    for line in data.splitlines():
        result = pat.match(line)
        ranges = result.groups()
        num1 = ranges[0]
        num2 = ranges[1]
        sum = int(num2) - int(num1)
        if sum > 1000:
            flag1 = " "
            flagcount += 1
        else:
            flag1 = ""
        if sum > highflag:
            highflag = sum
        print str(num2) + " - " + str(num1) + " = " + str(sum) + flag1
        sumtotal = sumtotal + sum

print "Total ranges = ", sumtotal
print "Total ranges over 10 million: ", flagcount
print "Largest range: ", highflag

A few observations from a Python newbie: The zipfile and gzip modules  
should really be merged together. gzcat on unix reads both compression  
formats. It took me way too long to figure out the namelist() method.  
But I did learn a lot more about how zip actually works as a result.  
Is there really no way to extract the contents of a single zipped file  
without using a 'for in namelist' construct?
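
On the namelist question: a zip file is an archive that can hold many members, so the member name has to come from somewhere, but no loop is needed once the name is known. A gzip file, by contrast, wraps a single compressed stream and has no member list. A self-contained sketch in Python 3; the member name ranges.txt and the in-memory archive are hypothetical stand-ins.

```python
import gzip
import io
import zipfile

# Build a small archive in memory so the sketch is self-contained;
# the member name "ranges.txt" is hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("ranges.txt", "134873600, 134873855, 40597\n")

with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    # A known member can be read by name, with no loop.
    by_name = archive.read("ranges.txt").decode("utf-8")
    # For a single-member archive, take the only entry from the listing.
    only = archive.read(names[0]).decode("utf-8")

# gzip has no member list: it wraps exactly one compressed stream.
packed = gzip.compress(b"135338240, 135338495, 40597\n")
unpacked = gzip.decompress(packed).decode("utf-8")
```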


Trying to get split() to extract just two columns from my data was a  
dead end. The re module is the way to go.


I feel like I am in relatively new territory with Python's regex
engine. Denis did save me some valuable time with his regex, but my
file had values in the 3rd column that started with alphas as opposed
to numerics only, and flipping that (\".+\"|\d+)") to a (\".+\"|\w+)")
had me gnashing teeth and pulling hair the whole way through
the regex tutorial. When I finally figured it out, I smacked my forehead
and said "of course!". The compile() method of Python's regex engine is
new for me. Makes sense. Just something I have to get used to. I do
have the feeling that Perl's regex is better. But that is another story.
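
To make the \d versus \w change concrete, here is a small sketch in Python 3. It assumes the text values in the third column are wrapped in double quotes, as the post describes; the row ending in abc123 is a made-up example of a bare value that starts with a letter.

```python
import re

# Assumes the text column is wrapped in double quotes, as the post describes.
pat = re.compile(r'(\d+), (\d+), (".+"|\w+)')

quoted = pat.match('134873600, 134873855, "32787 Protex Technologies, Inc."')
bare = pat.match('135338240, 135338495, 40597')
alpha = pat.match('135338240, 135338495, abc123')

# The earlier pattern only accepted digits in an unquoted third column.
digits_only = re.compile(r'(\d+), (\d+), (".+"|\d+)')
rejected = digits_only.match('135338240, 135338495, abc123')
```

With \d+ the last match returns None, which is why calling .groups() on the result failed for rows whose third column began with a letter.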



[Tutor] Greetings Pythonistas

2010-01-07 Thread galaxywatcher
This is my first post to the Python tutor list and I just wanted to  
introduce myself and give a little background on my skill level prior  
to asking for Python advice and programming tips. I am relatively new  
to Python, but I have been dabbling with unix shell scripting for at  
least 10 years. I can construct powerful one liners using awk, sed,  
cut, uniq, sort, grep, etc. I definitely know my way around the unix  
file system. I use vim with various plugins, and I feel like I am the  
eternal unix student, constantly evolving my skills, but never  
reaching mastery. I have worked my way slightly past the novice level  
with Perl, but I never felt that I made the breakthrough that I needed  
to feel proficient enough to do anything serious with it.


Python feels different. I have the gut instinct that I can really  
develop my skill set to do great things with the language. By great, I  
mean, that I can take an idea, a big idea even, and efficiently  
transform that idea into a software reality. I really want to master  
this language. I am reading two books at the moment and working  
through the exercises: Dive into Python by Mark Pilgrim, and Python  
Programming - An Introduction to Computer Science by Zelle. I have  
ideas that I want to develop, I put the books down and start  
scripting, but I always seem to hit a wall based on my lack of  
knowledge, so I pick up the books again and continue reading. I often  
wish that I had a private tutor or a Python guru that I could just ask  
how to get past a certain wall. Perhaps this list has that person or  
people on it. With that said, I look forward to participating with the  
Python tutors here.


Thanks
Blake


[Tutor] Expanding a Python script to include a zcat and awk pre-process

2010-01-07 Thread galaxywatcher
I wrote a simple Python script to process a text file, but I had to  
run a shell one liner to get the text file primed for the script. I  
would much rather have the Python script handle the whole task without  
any pre-processing at all. I will show 1) a small sample of the text  
file, 2) my script, 3) the one liner that I want to fold into the  
script, and 4) the task at hand.


1) $ zcat textfile.txt.zip | head -5
134873600, 134873855, 32787 Protex Technologies, Inc.
135338240, 135338495, 40597
135338496, 135338751, 40993
201720832, 201721087, 12838 HFF Infrastructure  Operations
202739456, 202739711, 1623 Beseau Regional de la Region Languedoc Roussillon



2) $ cat getranges.py
#!/usr/bin/env python

import string

highflag = flagcount = sum = sumtotal = 0
infile = open("textfile.txt", "r")
# Find the range by subtracting column 1 from column 2
for line in infile:
    num1, num2 = string.split(line)
    sum = int(num2) - int(num1)
    if sum > 1000:
        flag1 = " "
        flagcount += 1
        if sum > highflag:
            highflag = sum
    else:
        flag1 = ""
    print str(num2) + " - " + str(num1) + " = " + str(sum) + flag1
    sumtotal = sumtotal + sum
print "Total ranges = ", sumtotal
print "Total # of ranges over 10 million: ", flagcount
print "Largest range: ", highflag

3) zcat textfile.txt.zip | awk -F, '{print $1, $2}' > textfile.txt

4) In my first iteration, I used string.split(num1, ",") but I ran
into trouble when I encountered commas within column 3, such as "32787
Protex Technologies, Inc.". I don't know how to handle this
exception. I also don't know how to uncompress the file in Python and
pass it to the rest of the script. Hence I used my zcat | awk one-liner
to get the job done. So how do I uncompress zip and gzipped files in
Python, and how do I force split to only evaluate the first two
columns? Better yet, can I tell split to not evaluate commas in the
double quoted 3rd column?


Regards,
Blake
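
One possible way to fold both steps into the script, sketched in Python 3: the standard zipfile module reads the archive, and the csv module copes with commas inside double-quoted fields. The function name read_ranges, the UTF-8 encoding, and the in-memory sample archive are assumptions for illustration, not details from the post.

```python
import csv
import io
import zipfile

def read_ranges(source):
    """Yield (num1, num2) integer pairs from every member of a zip archive.

    source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            # Assumption: the members are UTF-8 text.
            text = archive.read(name).decode("utf-8")
            # skipinitialspace lets the reader see the quote after ", "
            for row in csv.reader(io.StringIO(text), skipinitialspace=True):
                if row:
                    yield int(row[0]), int(row[1])

# Hypothetical in-memory archive standing in for textfile.txt.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "textfile.txt",
        '134873600, 134873855, "32787 Protex Technologies, Inc."\n'
        "135338240, 135338495, 40597\n",
    )

pairs = list(read_ranges(buffer))
sizes = [num2 - num1 for num1, num2 in pairs]
```

Passing a filename such as "textfile.txt.zip" to read_ranges should work the same way, since zipfile.ZipFile accepts either a path or a file-like object.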