Re: [Tutor] changing char list to int list isn't working
On 04/05/13 05:13, Jim Mooney wrote:
> I'm turning an integer into a string so I can make a list of separate
> chars, then turn those chars back into individual ints,

You don't actually need to convert to chars; you could use divmod to do it directly on the numbers:

>>> digits = []
>>> root = 455
>>> while root > 0:
...     root, n = divmod(root, 10)
...     digits.insert(0, n)
...
>>> digits
[4, 5, 5]

But I suspect the str() method is slightly faster...

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
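Both approaches can be compared side by side. The sketch below is Python 3 (the thread itself uses Python 2), the function names `digits_divmod` and `digits_str` are hypothetical, and the zero special case is an addition, since the bare `while` loop above returns an empty list for 0.

```python
def digits_divmod(number):
    """Return the decimal digits of a non-negative int, most significant first."""
    if number == 0:
        return [0]  # the bare while-loop would return [] for zero
    digits = []
    while number > 0:
        number, digit = divmod(number, 10)  # peel off the last digit
        digits.insert(0, digit)             # prepend, so digits stay in order
    return digits


def digits_str(number):
    """Same result via the str() round trip the original poster described."""
    return [int(ch) for ch in str(number)]


print(digits_divmod(455))  # [4, 5, 5]
print(digits_str(455))     # [4, 5, 5]
```

Neither function handles negative numbers: str(-455) contains a '-' that int() rejects, and the divmod loop never runs. Whether str() really is faster can be measured with the standard timeit module.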
Re: [Tutor] creating a corpus from a csv file
Treder, Robert wrote:

> I'm very new to python and am trying to figure out how to make a corpus
> from a text file. I have a csv file (actually pipe '|' delimited) where
> each row corresponds to a different text document. Each row contains a
> communication note. Other columns correspond to categories of types of
> communications. I am able to read the csv file and print the notes
> column as follows:
>
> import csv
> with open('notes.txt', 'rb') as infile:
>     reader = csv.reader(infile, delimiter = '|')
>     i = 0
>     for row in reader:
>         if i <= 25:
>             print row[8]
>         i = i+1
>
> I would like to convert this to a categorized corpus with some of the
> other columns corresponding to the categories. All of the columns are
> text (i.e., strings). I have looked for documentation on how to use
> csv.reader with PlaintextCorpusReader but have been unsuccessful in
> finding an example similar to what I want to do. Can someone please help?

This mailing list is for learning Python. For problems with a specific library you should use the general python list

http://mail.python.org/mailman/listinfo/python-list

or a forum dedicated to that library

http://groups.google.com/group/nltk-users

If you ask on a general forum you should give some context; the name of the library would be the bare minimum.
The following comes with no warranties as I'm not an nltk user:

import csv
from nltk.corpus.reader.plaintext import CategorizedPlaintextCorpusReader
from itertools import islice, chain

LIMIT_SIZE = 25 # set to None if not debugging

def pairs(filename):
    """Generate (filename, list_of_categories) pairs from a csv file
    """
    with open(filename, "rb") as infile:
        rows = islice(csv.reader(infile, delimiter="|"), LIMIT_SIZE)
        for row in rows:
            # assume that columns 10 and above contain categories
            yield row[8], row[9:]

if __name__ == "__main__":
    import random
    FILENAME = "notes.txt"

    # assume that every filename occurs only once in the file
    file_to_categories = dict(pairs(FILENAME))

    files = list(file_to_categories)
    all_categories = set(chain.from_iterable(file_to_categories.itervalues()))

    reader = CategorizedPlaintextCorpusReader(".", files, cat_map=file_to_categories)

    # print words for a random category
    category = random.choice(list(all_categories))
    print "words for category {}:".format(category)
    print sorted(set(reader.words(categories=category)))
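The csv/islice half of that reply can be tried without nltk. This is a stdlib-only Python 3 sketch, where files are opened in text mode with newline="" rather than "rb". The sample rows and the name_col parameter are hypothetical; like the reply, it assumes column index 8 holds the identifier and all later columns hold categories.

```python
import csv
import io
from itertools import chain, islice


def pairs(lines, limit=None, name_col=8):
    """Yield (identifier, categories) pairs from pipe-delimited rows.

    Assumes column `name_col` holds the identifier and every later
    column holds a category; `limit` caps the rows read (None = all).
    """
    reader = csv.reader(lines, delimiter="|")
    for row in islice(reader, limit):
        yield row[name_col], row[name_col + 1:]


# Hypothetical sample: eight filler columns, an identifier, then categories.
SAMPLE = (
    "a|b|c|d|e|f|g|h|note1.txt|billing|urgent\n"
    "a|b|c|d|e|f|g|h|note2.txt|support\n"
)

# With a real file you would pass open("notes.txt", newline="") instead.
file_to_categories = dict(pairs(io.StringIO(SAMPLE)))
all_categories = set(chain.from_iterable(file_to_categories.values()))

print(file_to_categories)      # {'note1.txt': ['billing', 'urgent'], 'note2.txt': ['support']}
print(sorted(all_categories))  # ['billing', 'support', 'urgent']
```

The resulting dict has the shape that the cat_map argument in the reply expects (file id mapped to a list of categories).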
[Tutor] Python internals
Hi,

I am trying to learn how Python stores variables in memory. For example:

my_var = 'test'

def func():
    pass

When I type dir() I get

['__builtins__', '__doc__', '__name__', '__package__', 'func', 'help', 'my_var']

Are these variables stored in a dict, so that calling dir() returns all the keys? Or are they stored in a list or a heap?

Can anyone suggest a document I can read to help me understand how the Python internals work?

Cheers
Kartik
Re: [Tutor] Python internals
On 04/05/13 23:04, kartik sundarajan wrote:
> Hi,
>
> I am trying to learn how Python stores variables in memory. For ex:
>
> my_var = 'test'
>
> def func():
>     pass
>
> when I type dir() I get
>
> ['__builtins__', '__doc__', '__name__', '__package__', 'func', 'help', 'my_var']
>
> are these variables stored in a dict and on calling dir() all the keys
> are returned?
> Or is it stored in a list or a heap?

Python objects are dynamically allocated in the heap.

Python variables are not variables in the C or Pascal sense; they are name bindings. When you do this:

my_var = 'test'

Python does the following:

- create a string object 'test'
- create a string object 'my_var'
- use 'my_var' as a key in the current namespace, with value 'test'.

Creating a function is a little more complicated, but the simplified version goes like this:

- create a string object 'func'
- compile the body of the function into a code object
- create a new function object named 'func' from the code object
- use 'func' as a key in the current namespace, with the function object as the value.

When you call dir(), by default it looks at the current namespace. The dunder names shown (Double leading and trailing UNDERscore) have special meaning to Python; the others are objects you have added.

The documentation for dir says:

py> help(dir)
Help on built-in function dir in module __builtin__:

dir(...)
    dir([object]) -> list of strings

    If called without an argument, return the names in the current scope.
    Else, return an alphabetized list of names comprising (some of) the attributes
    of the given object, and of attributes reachable from it.
    If the object supplies a method named __dir__, it will be used; otherwise
    the default dir() logic is used and returns:
      for a module object: the module's attributes.
      for a class object:  its attributes, and recursively the attributes
        of its bases.
      for any other object:  its attributes, its class's attributes, and
        recursively the attributes of its class's base classes.
> Can anyone suggest if there some document I can read to help me
> understand the Python internals work ?

The Python docs are a good place to start.

http://docs.python.org/3/index.html

Especially:

http://docs.python.org/3/reference/datamodel.html
http://docs.python.org/3/reference/executionmodel.html

--
Steven
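The "names are keys in a namespace dict" point can be checked directly. In this Python 3 sketch, the poster's two statements are run with exec() against an empty dict, so the namespace they write into is visible; exec() also inserts '__builtins__', much like the dir() output in the question. The Point class and its __dir__ override are hypothetical, added only to show the __dir__ hook mentioned in the help text.

```python
import types

# Run the poster's two statements in a fresh, empty namespace dict.
namespace = {}
exec("my_var = 'test'\ndef func():\n    pass\n", namespace)

print(sorted(namespace))                 # ['__builtins__', 'func', 'my_var']
print(namespace['my_var'])               # test
print(type(namespace['func']).__name__)  # function
print(isinstance(namespace['func'].__code__, types.CodeType))  # True: built from a code object


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __dir__(self):
        # dir() calls this hook instead of its default attribute listing
        return ['x', 'y']


p = Point(1, 2)
print(vars(p))  # {'x': 1, 'y': 2}: instance attributes live in a dict too
print(dir(p))   # ['x', 'y']
```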
Re: [Tutor] changing char list to int list isn't working
On 05/04/2013 12:13 AM, Jim Mooney wrote:
> for num in listOfNumChars:
>     num = int(num)

It seems like people learning Python run into this very often. I think the reason is that in most simple cases, it's easier and more intuitive to think that the name IS the object:

x = 1
y = 2
print x + y

Even though I know it's not a precise description, when I see this code, I think of it as "x is 1, y is 2, print x plus y". And you do get the expected result, which reinforces this intuition.

Of course, a more precise way to think is:

name 'x' is assigned to object with value=1
name 'y' is assigned to object with value=2
sum values that currently have assigned names of 'x' and 'y'

Therefore, what you are really doing is:

for each object in listOfNumChars:
    assign name 'num' to object (this is done automatically by the loop)
    assign name 'num' to int(value that has currently assigned name 'num')

 -m

--
Lark's Tongue Guide to Python: http://lightbird.net/larks/

Oaths are the fossils of piety.  George Santayana
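The effect of that rebinding, and two usual ways around it, can be seen in a short Python 3 sketch. It reuses the name listOfNumChars from the question; the list contents are made up.

```python
listOfNumChars = ['4', '5', '5']

for num in listOfNumChars:
    num = int(num)          # rebinds the name 'num' only; the list is untouched
print(listOfNumChars)       # ['4', '5', '5']

# Fix 1: build a new list of ints.
numbers = [int(num) for num in listOfNumChars]
print(numbers)              # [4, 5, 5]

# Fix 2: change the list in place by assigning through its indexes.
for i, num in enumerate(listOfNumChars):
    listOfNumChars[i] = int(num)
print(listOfNumChars)       # [4, 5, 5]
```

The list comprehension is usually the simpler choice; assigning through the index is only needed when other names refer to the same list object and must see the change.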
Re: [Tutor] urllib2 and tests
On 05/05/13 13:27, RJ Ewing wrote:
> When I run the following test.py, I get the following error:
[...]
> If I run the fetch_file function outside of the test, it works fine. Any
> ideas?

The code you are actually running, and the code you say you are running below, are different. Your error message refers to a file test_filefetcher.py, not the Test.py you show us.

As given, Test.py cannot possibly work, since it doesn't define filefetcher. I can only guess that this is meant to be the module you are trying to test, but since you don't show us what is in that module, I can only guess what it contains.

More comments below:

> ERROR: test_fetch_file (__main__.TestFileFetcher)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_filefetcher.py", line 12, in test_fetch_file
>     fetched_file = filefetcher.fetch_file(URL)

What's filefetcher? I'm guessing it's the module you are testing, which is consistent with the next line showing the file name filefetcher.py:

>   File "/Users/rjewing/Documents/Work/filefetcher.py", line 7, in fetch_file
>     return urllib2.urlopen(url).read()
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
>     return _opener.open(url, data, timeout)
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 392, in open
>     protocol = req.get_type()
> AttributeError: 'TestFileFetcher' object has no attribute 'get_type'

Somehow, your test suite, the TestFileFetcher object, is being passed down into the urllib2 library. I can only guess that somehow url is not an actual URL. I suggest you add a line:

print(url, type(url))

just before the failing line, and see what it prints.

> ----------------------------------------------------------------------
> Test.py:

This cannot be the actual test suite you are running, since it cannot run as shown. It doesn't import unittest or the module to be tested.

> class TestFileFetcher(unittest.TestCase):
>     def test_fetch_file(URL):
>         phrase = 'position = support-intern'
>         fetched_file = filefetcher.fetch_file(URL)

And here's your error!
Just as I thought, URL is not what you think it is; it is the TestFileFetcher instance.

Unittest cases do not take arguments. Since they are methods, they are always defined with a single argument, conventionally called self, representing the instance that the method is called on. So normally you would define a method like this:

    def test_fetch_file(self, url):

which then takes a single *implicit* argument self, provided by Python, plus a second *explicit* argument, url. But because this is a test method, the unittest framework does not expect to pass an argument to the method, so you have to write it like this:

    def test_fetch_file(self):

and get the url some other way. One common way would be to define an attribute on the test, and store the URL in that:

class TestFileFetcher(unittest.TestCase):
    URL = some_url_goes_here  # FIX THIS

    def test_fetch_file(self):
        phrase = 'position = support-intern'
        fetched_file = filefetcher.fetch_file(self.URL)
        ...

>         unittest.assertIsNone(fetched_file, 'The file was not fetched correctly')

This part of the test seems to be wrong to me. It says:

    compare the value of fetched_file to None;
    if it is None, the test passes;
    if it is some other value, the test fails with error message
    'The file was not fetched correctly'

But then you immediately go on to use fetched_file:

>         text = filefetcher.add_phrase(fetched_file)

but if the above assertIsNone test passed, then fetched_file is None, so this is equivalent to:

    text = filefetcher.add_phrase(None)

which surely isn't right?

>         unittest.assertNotIn(phrase, text, 'The phrase is not in the file')

This test also appears backwards. You're testing:

    check whether phrase is NOT in text;
    if it is NOT in, then the test passes;
    otherwise, if it IS in, then fail with an error message
    'The phrase is not in the file'

which is clearly wrong. The message should be:

    'The phrase is in the file'

since your test is checking that it isn't in.
--
Steven
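Putting those fixes together, a corrected test might look like the following Python 3 sketch. The real filefetcher module was never shown, so fetch_file and add_phrase here are hypothetical stand-ins that avoid the network, and the URL is a placeholder. The assertion helpers are methods of unittest.TestCase, called as self.assertIn(...); the unittest module itself has no module-level assertIsNone or assertNotIn functions.

```python
import unittest


# Hypothetical stand-ins for the unseen filefetcher module (no network access).
def fetch_file(url):
    return "<html>job listing</html>"


def add_phrase(text):
    return text + "\nposition = support-intern"


class TestFileFetcher(unittest.TestCase):
    URL = "http://example.com/listing.html"  # placeholder URL

    def test_fetch_file(self):  # only self: unittest passes no other arguments
        phrase = "position = support-intern"
        fetched_file = fetch_file(self.URL)
        # The file should have been fetched, i.e. the result is NOT None.
        self.assertIsNotNone(fetched_file, "The file was not fetched correctly")
        text = add_phrase(fetched_file)
        # The phrase should now be IN the text.
        self.assertIn(phrase, text, "The phrase is not in the file")
```

Saved as test_filefetcher.py, it could be run with `python -m unittest test_filefetcher`.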
Re: [Tutor] urllib2 and tests
Thank you, I figured out what the problem was. I was passing in url into the test_file_fetch function instead of self. URL was a global.

I did get the asserts mixed up. They were the opposite of what I wanted.

Sorry I didn't include the whole test.py file for reference.

Thanks again

On Sat, May 4, 2013 at 9:08 PM, Steven D'Aprano st...@pearwood.info wrote:
[...]
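The mix-up described here can be reproduced in a few lines. In this Python 3 sketch (the class and method names are hypothetical), a method whose first parameter is named URL receives the instance under that name and hides the module-level global, while a method declared with self can still read the global.

```python
URL = "http://example.com/listing.html"  # placeholder module-level global


class Demo:
    def broken(URL):  # first parameter receives the instance, whatever its name
        return URL    # ... so this is the Demo object, not the global string

    def fixed(self):
        return URL    # no local named URL here, so the global is used


d = Demo()
print(d.broken() is d)  # True
print(d.fixed())        # http://example.com/listing.html
```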