Re: [Tutor] Omitting lines matching a list of strings from a file
One formatting detail: there is a blank line after each line printed, how do I get rid of the extra blank lines?

    lines = [line.strip() for line in infile if line[146:148] not in omit_states]
    print '\n'.join(lines)

This approach stripped the leading blank spaces, introducing errors into my fixed-width file.

or alternatively

    lines = [line for line in infile if line[146:148] not in omit_states]
    print ''.join(lines)

This works beautifully, leaving the leading blank spaces intact. Thanks.

Just remember that doing a list comprehension like that on a large file will drastically reduce the speed of your application as well as introduce memory bloat.

Processing a file with well over 1 million records worked very quickly, taking only several seconds. I did not notice excessive memory bloat. I do have 2 gigs of RAM on my MacBook Pro, however.

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
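If memory ever does become a concern, one option is to filter the file lazily instead of building the whole list first. The following is only a sketch under some assumptions: it is written for Python 3 (the code in this thread is Python 2), the names keep_lines and make_record are made up for illustration, and the sample records are invented so the example can run without the real file.

```python
import io

OMIT_STATES = {"KS", "KY", "MA", "ND", "NE", "NJ",
               "PR", "RI", "SD", "VI", "VT", "WI"}


def keep_lines(lines, omit_states=OMIT_STATES, start=146, end=148):
    """Yield, unchanged, each line whose state field is not in omit_states."""
    for line in lines:
        if line[start:end] not in omit_states:
            yield line


def make_record(state):
    # Hypothetical fixed-width record: 146 filler characters, then the state.
    return " " * 146 + state + "\n"


# A stand-in for the real input file.
infile = io.StringIO(make_record("KS") + make_record("CA") + make_record("TX"))

# Only one line is held in memory at a time while iterating.
kept = list(keep_lines(infile))
```

With a real file, something like sys.stdout.writelines(keep_lines(infile)) would write each surviving line as it is read, with its leading spaces and its own newline intact.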
Re: [Tutor] Omitting lines matching a list of strings from a file
But I would do this with a list comprehension or generator expression (depending on your Python version):

    lines = [line for line in infile if line[146:148] not in omit_states]
    print '\n'.join(lines)

That's very helpful. Thanks. One formatting detail: there is a blank line after each line printed, how do I get rid of the extra blank lines?
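The extra blank lines most likely come from the fact that every line read from a file still carries its own trailing newline, so joining with '\n' puts a second newline after each record. A small sketch of the effect, written for Python 3 and using two invented sample records:

```python
# Two invented records, each ending in a newline as file lines do.
lines = ["  12 Main St  KS\n", "  34 Oak Ave   CA\n"]

# Joining with "\n" adds a second newline between records.
doubled = "\n".join(lines)

# Joining with the empty string keeps exactly one newline per record.
fixed = "".join(lines)

# Alternatively, strip only the trailing newline; rstrip("\n") leaves the
# leading spaces alone, which matters for fixed-width data.
trimmed = "\n".join(line.rstrip("\n") for line in lines)
```

Calling strip() with no argument removes whitespace from both ends, which is why it is a poor fit for fixed-width records.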
[Tutor] Omitting lines matching a list of strings from a file
I am trying to output a list of addresses that do not match a list of State abbreviations. What I have so far is:

    def main():
        infile = open("list.txt", "r")
        for line in infile:
            state = line[146:148]
            omit_states = ['KS', 'KY', 'MA', 'ND', 'NE', 'NJ', 'PR', 'RI', 'SD', 'VI', 'VT', 'WI']
            for n in omit_states:
                if state != n:
                    print line
        infile.close()
    main()

This outputs multiple duplicate lines. The strange thing is that if I change the test to 'if state == n:', then I correctly output all matching lines. But I don't want that. I want to output all lines that do NOT match the States in the omit_states list.

I am probably overlooking something very simple. Thanks in advance.
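The duplicates can be explained by counting how often the inner loop's test is true for a single line: 'state != n' holds for every entry in the list except, at most, one. The sketch below is written for Python 3, and the helper names prints_per_line and should_print are made up for illustration.

```python
OMIT_STATES = ['KS', 'KY', 'MA', 'ND', 'NE', 'NJ',
               'PR', 'RI', 'SD', 'VI', 'VT', 'WI']


def prints_per_line(state, omit_states=OMIT_STATES):
    """Count how many times the inner loop would print one input line."""
    return sum(1 for n in omit_states if state != n)


def should_print(state, omit_states=OMIT_STATES):
    """A single membership test decides once per line."""
    return state not in omit_states


ca_prints = prints_per_line('CA')   # state is not in the list at all
ks_prints = prints_per_line('KS')   # state is in the list exactly once
```

A line for a state outside the list is printed once per list entry, and a line for a listed state is still printed for every other entry, so the loop never actually omits anything. Testing with 'not in' makes one decision per line.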
[Tutor] Python workspace - IDE and version control
I want to share a couple of insights that I had getting started with Python that I did not come across in the literature all that often. I am discovering that there are two primary supporting tools needed in order to create an efficient and productive Python programming workspace: IDE and Version Control. I didn't realize at first how important these supporting tools for Python would be.

If Python is your first programming language, you will have to learn how to use a good text editor or IDE (Integrated Development Environment). If you use textpad, it gets old very fast. I have chosen vim as my IDE and I added a few key plugins that I think help a lot (snipMate, surround, nerd-tree, and repeat). I believe that snipMate is a plugin made specifically for Python users on vim. Among other features, it auto indents your code which is very nice.

So now that I can do some Python scripting, I started to notice that my scripts were not very organized. Collaboration of code is difficult. I had multiple copies of the same script in different directories on my computer, and I did not have a good way to really keep track. This is the wrong way.

Version Control Systems are tried and true technologies for collaborating with others (or even yourself) on your code. After some research, I have decided to go with Git. I have never used version control before, so I don't know the distinctions of the various systems out there. I chose Git mainly because github.com is really great. Some MAJOR open source (and closed) projects are happening on there and you can download the open source code so very easily. I am told Mercurial is good too, Bazaar and SVN also came up in my research.

Obviously, no tool can think for you. The real programming work of course is going on in your brain. I am curious what combination of IDE and Version Control System you use and also perhaps, what other tools I should be looking at as well.

Thanks.
Re: [Tutor] Expanding a Python script to include a zcat and awk pre-process
After many more hours of reading and testing, I am still struggling to finish this simple script, even though, bear in mind, I already got my desired results by preprocessing with an awk one-liner.

I am opening a zipped file properly, so I did make some progress, but simply assigning num1 and num2 to the first 2 columns of the file remains elusive. Num3 here gets assigned, not to the 3rd column, but to the rest of the entire file. I feel like I am missing a simple strip() or some other incantation that prevents the entire file from getting blobbed into num3. Any help is appreciated in advance.

    #!/usr/bin/env python
    import string
    import re
    import zipfile

    highflag = flagcount = sum = sumtotal = 0
    f = file("test.zip")
    z = zipfile.ZipFile(f)
    for f in z.namelist():
        ranges = z.read(f)
        ranges = ranges.strip()
        num1, num2, num3 = re.split('\W+', ranges, 2)  ## This line is the root of the problem.
        sum = int(num2) - int(num1)
        if sum > 1000:
            flag1 = ""  # flag string missing from the archived post
            flagcount += 1
        else:
            flag1 = ""
        if sum > highflag:
            highflag = sum
        print str(num2) + " - " + str(num1) + " = " + str(sum) + flag1
        sumtotal = sumtotal + sum
    print "Total ranges = ", sumtotal
    print "Total ranges over 10 million: ", flagcount
    print "Largest range: ", highflag

==

    $ zcat test.zip
    134873600, 134873855, 32787 Protex Technologies, Inc.
    135338240, 135338495, 40597
    135338496, 135338751, 40993
    201720832, 201721087, 12838 HFF Infrastructure Operations
    202739456, 202739711, 1623 Beseau Regional de la Region Languedoc Roussillon
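The behaviour described above follows from applying the split to the whole file at once: with a maximum of 2 splits, the third piece is everything that remains, newlines included. One possible way around it is to split each line separately. The sketch below is written for Python 3, the function name parse_ranges is made up for illustration, and the sample text is taken from the zcat output above.

```python
import re

sample = (
    "134873600, 134873855, 32787 Protex Technologies, Inc.\n"
    "135338240, 135338495, 40597\n"
)

# Splitting the whole text with maxsplit=2 leaves the rest of the
# file, later lines included, in the third piece.
whole_file_pieces = re.split(r"\W+", sample.strip(), maxsplit=2)


def parse_ranges(text):
    """Yield (start, end) integer pairs, one per non-empty line."""
    for line in text.splitlines():
        if not line.strip():
            continue
        # At most two splits: columns 1 and 2, then the untouched remainder.
        parts = line.split(",", 2)
        yield int(parts[0]), int(parts[1])


pairs = list(parse_ranges(sample))
```

Because int() ignores surrounding whitespace, the space after each comma does not need to be stripped separately.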
Re: [Tutor] Expanding a Python script to include a zcat and awk pre-process
I finally got it working! I would do a victory lap around my apartment building if I weren't recovering from a broken ankle. Excuse my excitement, but this simple script marks a new level of Python proficiency for me. Thanks to Kent, Bob, Denis, and others who pointed me in the right direction.

It does quite a few things: it decompresses a zipped file, or files if there is an archive of them; it processes a rather ugly csv file (ugly because it uses a comma as a delimiter, yet there are commas inside double-quoted fields); and it does a simple subtraction of the two columns with a little summary to give me the data I need.

    #!/usr/bin/env python
    import string
    import re
    import zipfile

    highflag = flagcount = sum = sumtotal = 0
    z = zipfile.ZipFile('textfile.zip')
    for subfile in z.namelist():
        print "Working on filename: " + subfile + "\n"
        data = z.read(subfile)
        pat = re.compile(r"(\d+), (\d+), (\".+\"|\w+)")
        for line in data.splitlines():
            result = pat.match(line)
            ranges = result.groups()
            num1 = ranges[0]
            num2 = ranges[1]
            sum = int(num2) - int(num1)
            if sum > 1000:
                flag1 = ""  # flag string missing from the archived post
                flagcount += 1
            else:
                flag1 = ""
            if sum > highflag:
                highflag = sum
            print str(num2) + " - " + str(num1) + " = " + str(sum) + flag1
            sumtotal = sumtotal + sum
    print "Total ranges = ", sumtotal
    print "Total ranges over 10 million: ", flagcount
    print "Largest range: ", highflag

A few observations from a Python newbie:

The zipfile and gzip modules should really be merged together. gzcat on unix reads both compression formats.

It took me way too long to figure out the namelist() method. But I did learn a lot more about how zip actually works as a result. Is there really no way to extract the contents of a single zipped file without using a 'for in namelist' construct?

Trying to get split() to extract just two columns from my data was a dead end. The re module is the way to go.

I feel like I am in relatively new territory with Python's regex engine. Denis did save me some valuable time with his regex, but my file had values in the 3rd column that started with alphas as opposed to numerics only, and flipping that (\".+\"|\d+) to a (\".+\"|\w+) had me gnashing teeth and pulling hair the whole way through the regex tutorial. When I finally figured it out, I smacked my forehead and said "of course!".

The compile() method of Python's regex engine is new for me. Makes sense. Just something I have to get used to. I do have the feeling that Perl's regex is better. But that is another story.
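On the question of reading a single zipped file without a 'for in namelist' loop: ZipFile.read() accepts a member name directly, so the loop is only needed when the names are not known in advance. The sketch below is written for Python 3. The helper read_member and the member name ranges.txt are made up for illustration, and the archive is built in memory only so that the example can run without a file on disk.

```python
import io
import zipfile


def read_member(archive, name=None):
    """Return the bytes of one member of a zip archive.

    If name is None, the first member listed in the archive is used.
    """
    with zipfile.ZipFile(archive) as z:
        if name is None:
            name = z.namelist()[0]
        return z.read(name)


# Build a tiny archive in memory as a stand-in for a real .zip file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("ranges.txt", "135338240, 135338495, 40597\n")

by_name = read_member(buffer, "ranges.txt")
first_member = read_member(buffer)
```

A zip archive is a container that may hold many files, which is why the module exposes member names at all; a gzip file holds a single compressed stream, so the gzip module has no equivalent of namelist().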
[Tutor] Greetings Pythonistas
This is my first post to the Python tutor list and I just wanted to introduce myself and give a little background on my skill level prior to asking for Python advice and programming tips.

I am relatively new to Python, but I have been dabbling with unix shell scripting for at least 10 years. I can construct powerful one liners using awk, sed, cut, uniq, sort, grep, etc. I definitely know my way around the unix file system. I use vim with various plugins, and I feel like I am the eternal unix student, constantly evolving my skills, but never reaching mastery. I have worked my way slightly past the novice level with Perl, but I never felt that I made the breakthrough that I needed to feel proficient enough to do anything serious with it.

Python feels different. I have the gut instinct that I can really develop my skill set to do great things with the language. By great, I mean, that I can take an idea, a big idea even, and efficiently transform that idea into a software reality. I really want to master this language.

I am reading two books at the moment and working through the exercises: Dive into Python by Mark Pilgrim, and Python Programming - An Introduction to Computer Science by Zelle. I have ideas that I want to develop, I put the books down and start scripting, but I always seem to hit a wall based on my lack of knowledge, so I pick up the books again and continue reading. I often wish that I had a private tutor or a Python guru that I could just ask how to get past a certain wall. Perhaps this list has that person or people on it.

With that said, I look forward to participating with the Python tutors here.

Thanks
Blake
[Tutor] Expanding a Python script to include a zcat and awk pre-process
I wrote a simple Python script to process a text file, but I had to run a shell one-liner to get the text file primed for the script. I would much rather have the Python script handle the whole task without any pre-processing at all. I will show 1) a small sample of the text file, 2) my script, 3) the one-liner that I want to fold into the script, and 4) the task at hand.

1) $ zcat textfile.txt.zip | head -5

    134873600, 134873855, 32787 Protex Technologies, Inc.
    135338240, 135338495, 40597
    135338496, 135338751, 40993
    201720832, 201721087, 12838 HFF Infrastructure Operations
    202739456, 202739711, 1623 Beseau Regional de la Region Languedoc Roussillon

2) $ cat getranges.py

    #!/usr/bin/env python
    import string

    highflag = flagcount = sum = sumtotal = 0
    infile = open("textfile.txt", "r")
    # Find the range by subtracting column 1 from column 2
    for line in infile:
        num1, num2 = string.split(line)
        sum = int(num2) - int(num1)
        if sum > 1000:
            flag1 = ""  # flag string missing from the archived post
            flagcount += 1
            if sum > highflag:
                highflag = sum
        else:
            flag1 = ""
        print str(num2) + " - " + str(num1) + " = " + str(sum) + flag1
        sumtotal = sumtotal + sum
    print "Total ranges = ", sumtotal
    print "Total # of ranges over 10 million: ", flagcount
    print "Largest range: ", highflag

3) zcat textfile.txt.zip | awk -F, '{print $1, $2}' > textfile.txt

4) In my first iteration, I used string.split(num1, ",") but I ran into trouble when I encountered commas within column 3, such as "32787 Protexic Technologies, Inc.". I don't know how to handle this exception. I also don't know how to uncompress the file in Python and pass it to the rest of the script. Hence I used my zcat | awk one-liner to get the job done.

So how do I uncompress zip and gzipped files in Python, and how do I force split to only evaluate the first two columns? Better yet, can I tell split to not evaluate commas in the double quoted 3rd column?

Regards,
Blake
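One way to approach both questions is to let the gzip module do the decompression and the csv module do the splitting, since csv.reader already understands commas inside double-quoted fields. The sketch below is written for Python 3 and rests on a few assumptions: the third column is double-quoted in the real file when it contains commas, as described above; the function name read_ranges is made up for illustration; and the compressed data is built in memory from two sample rows so that the example runs on its own.

```python
import csv
import gzip
import io


def read_ranges(fileobj):
    """Yield (start, end, label) tuples from comma-separated text lines."""
    # skipinitialspace lets the quote after ", " be recognised as a quote.
    reader = csv.reader(fileobj, skipinitialspace=True)
    for row in reader:
        if not row:
            continue
        label = row[2] if len(row) > 2 else ""
        yield int(row[0]), int(row[1]), label


# Sample rows, with quoting assumed, compressed in memory as a stand-in
# for a real .gz file.
sample = (
    '134873600, 134873855, "32787 Protex Technologies, Inc."\n'
    '135338240, 135338495, 40597\n'
)
compressed = gzip.compress(sample.encode("utf-8"))

# gzip.open accepts a path or a file object; "rt" yields decoded text lines.
with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8", newline="") as infile:
    rows = list(read_ranges(infile))
```

For a real gzipped file, passing its path to gzip.open in place of the BytesIO object should work the same way. A .zip archive is a different format and would be opened with the zipfile module instead.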