Re: [Tutor] duplication in unit tests
Serdar Tumgoren wrote:

> I'll admit, I learned the hard way on a project earlier this year. I
> got that project done (again with the help of folks on this list), but
> didn't do any test-writing up front. And now, as the inevitable bugs
> crop up, I'm forced to patch them hoping that I don't break something
> else. It drove home the fact that I need to get serious about testing,
> even if I don't go full-bore TDD on every project.

The great thing about testing is that once you have test suites for each module (in the general sense of the term), for the relations between them, and for global functionality, you can update, refactor, enhance, etc. with some confidence that you're not just adding bugs to bugs.

Denis

la vita e estrany

http://spir.wikidot.com/

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] duplication in unit tests
> Yes, this is much better. Notice how much less code it is! :-)

Yes, it was amazing to see how much code melted away when I gave up on the OO design. As you and others suggested, clearly this was not the correct approach for this portion of my program.

> If you expect unicode input then it makes sense to test for it. If you
> don't expect unicode input, it might make sense to test for an
> expected error - how do you want the function to behave with invalid
> inputs?

For invalid inputs, I'd like to log the error and place the data in a file for additional review. The input has to be unicode, and if it's not, it means my initial read and decoding of the source data was not performed properly. For that case, I'll plan to raise an exception and abort the program.

> You could add other tests as well, for example does it work if
> there are two dashes in a row? Does splitLines() correctly remove
> blank lines?

So it seems I have a healthy list of test cases to start with!

> By the way I applaud your effort, unit testing is a valuable skill.

I'll admit, I learned the hard way on a project earlier this year. I got that project done (again with the help of folks on this list), but didn't do any test-writing up front. And now, as the inevitable bugs crop up, I'm forced to patch them hoping that I don't break something else. It drove home the fact that I need to get serious about testing, even if I don't go full-bore TDD on every project.

I suspect an overdose of preparatory TDD reading had my head swirling a bit. I appreciate you all walking me through this first effort. No doubt I'll be writing again soon with more questions.

Meantime, many thanks!

Serdar
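Serdar plans to raise an exception and stop when the input is not unicode. That plan can be pinned down with a test that expects the failure. The sketch below is minimal and written for Python 3, where the thread's unicode type is str and undecoded data arrives as bytes. Using TypeError as the exception is an assumption; the thread does not settle on one.

```python
import unittest


def convertEmDashes(datastring):
    """Return datastring with each em dash (U+2014) replaced by a hyphen-minus.

    Raises TypeError for anything that is not decoded text, so a bad read
    or decode earlier in the program stops the run instead of slipping
    through (TypeError is an assumed choice).
    """
    if not isinstance(datastring, str):
        raise TypeError("expected decoded text (str), got %s"
                        % type(datastring).__name__)
    return datastring.replace("\u2014", "-")


class TestConvertEmDashesInput(unittest.TestCase):
    def test_unicode_input_is_converted(self):
        self.assertEqual(convertEmDashes("em\u2014dash"), "em-dash")

    def test_undecoded_bytes_are_rejected(self):
        # The function is expected to fail here; the test passes if it does.
        with self.assertRaises(TypeError):
            convertEmDashes("em\u2014dash".encode("utf-8"))
```

The tests can be run with a runner such as `python -m unittest`. The caller that catches such an exception is where you would log the bad record and write it to a review file, as Serdar describes; that part is not shown.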
Re: [Tutor] duplication in unit tests
Serdar Tumgoren wrote:

> Hi everyone,
>
> I'm trying to apply some lessons from the recent list discussions on
> unit testing and Test-Driven Development, but I seem to have hit a
> sticking point.
>
> As part of my program, I'm planning to create objects that perform
> some initial data clean-up and then parse and database the cleaned
> data. Currently I'm expecting to have a FileCleaner and Parser
> classes. Using the TDD approach, I've so far come up with the below:
>
> class FileCleaner(object):
>     def __init__(self, datastring):
>         self.source = datastring
>
>     def convertEmDashes(self):
>         """Convert unicode emdashes to minus signs"""
>         self.datastring = self.source.replace(u'\u2014','-')
>
>     def splitLines(self):
>         """Generate and store a list of cleaned, non-empty lines"""
>         self.data = [x.strip() for x in
>             self.datastring.strip().split('\n') if x.strip()]
>
>
> My confusion involves the test code for the above class and its
> methods. The only way I can get splitLines to pass its unit test is by
> first calling the convertEmDashes method, and then splitLines.
>
> class TestFileCleaner(unittest.TestCase):
>     def setUp(self):
>         self.sourcestring = u"""Thisline has an em\u2014dash.\n
>             So does this \u2014\n."""
>         self.cleaner = FileCleaner(self.sourcestring)
>
>     def test_convertEmDashes(self):
>         """convertEmDashes should remove minus signs from datastring
>         attribute"""
>         teststring = self.sourcestring.replace(u'\u2014','-')
>         self.cleaner.convertEmDashes()
>         self.assertEqual(teststring, self.cleaner.datastring)
>
>     def test_splitLines(self):
>         """splitLines should create a list of cleaned lines"""
>         teststring = self.sourcestring.replace(u'\u2014','-')
>         data = [x.strip() for x in teststring.strip().split('\n') if
>             x.strip()]
>         self.cleaner.convertEmDashes()
>         self.cleaner.splitLines()
>         self.assertEqual(data, self.cleaner.data)
>
> Basically, I'm duplicating the steps from the first test method in the
> second test method (and this duplication will accrue as I add more
> "cleaning" methods).
>
> I understand that TestCase's setUp method is called before each test
> is run (and therefore the FileCleaner object is created anew), but
> this coupling of a test to other methods of the class under test seems
> to violate the principle of testing methods in isolation.
>
> So my questions -- Am I misunderstanding how to properly write unit
> tests for this case? Or perhaps I've structured my program
> incorrectly, and that's what this duplication reveals? I suspected,
> for instance, that perhaps I should group these methods
> (convertEmDashes, splitLines, etc.) into a single larger function or
> method.
>
> But that approach seems to violate the "best practice" of writing
> small methods. As you can tell, I'm a bit at sea on this. Your
> guidance is greatly appreciated!!
>
> Regards,
> Serdar
>
> ps - recommendations on cleaning up and restructuring code are also welcome!

Hello,

I guess you're first confused at the design level of your app. Test and design both require you to clearly express your expectations. Here, the cleanup phase may be written as follows (I don't mean it's particularly good, just an example):

    plain source data = input --> output = ready-to-process data

As you see, this requirement is, conceptually speaking, a purely functional one, in the plain sense of the word "function". At least, this is the way I see it. Building an object to implement it is, in my opinion, a misreading of OO design. (It's also writing Java in Python ;-) I would rather choose to write it as a method of a higher-level object. Possibly, this method would split into smaller ones if needed.

Then, expressing your tests is in a sense translating the requirement above into code: feed the piece of code under test with raw input data and check that the output is as expected. As Kent put it well, you should test with typical, edge, *and wrong* input. In the latter case the code under test is expected to fail (for instance by raising an exception), and the test checks that it does.

You will have to hand-write or automatically produce input strings for each test. If the function is split, then you will have to do it for each mini-function to be tested. This can be rather unpleasant, especially in cases like yours where the functions seem to operate logically in sequence, but there is no way around it. Actually, the various cleanup tasks (translating special chars, skipping blank lines, etc.) are rather orthogonal: they don't need to be tested in sequence.

Denis

la vita e estrany

http://spir.wikidot.com/
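Denis's suggestion can be sketched as follows. Cleanup is a method of some higher-level object: it takes the raw source as an argument and returns the ready-to-process data. It delegates to small helpers that each take and return plain values. The class and method names (DataImporter, translateSpecialChars, skipBlankLines) are hypothetical, and the code assumes Python 3. No helper depends on state left behind by another, so each test feeds its helper its own hand-written input.

```python
import unittest


class DataImporter(object):
    """Hypothetical higher-level object; cleanup is one of its methods.

    Every method takes its input as an argument and returns its output,
    so no method depends on another having been called first.
    """

    def cleanup(self, source):
        """Raw source text in, list of ready-to-process lines out."""
        return self.skipBlankLines(self.translateSpecialChars(source).split("\n"))

    def translateSpecialChars(self, text):
        """Replace each em dash (U+2014) with a hyphen-minus."""
        return text.replace("\u2014", "-")

    def skipBlankLines(self, lines):
        """Strip each line and drop the ones that end up empty."""
        return [line.strip() for line in lines if line.strip()]


class TestDataImporter(unittest.TestCase):
    def setUp(self):
        self.importer = DataImporter()

    # The two helpers are orthogonal: each gets its own input, and
    # neither test has to call the other helper first.
    def test_translateSpecialChars(self):
        self.assertEqual(self.importer.translateSpecialChars("a\u2014b"), "a-b")

    def test_skipBlankLines(self):
        self.assertEqual(self.importer.skipBlankLines([" a ", "", "  ", "b"]),
                         ["a", "b"])

    # The whole requirement: plain source data in, ready-to-process data out.
    def test_cleanup(self):
        self.assertEqual(self.importer.cleanup("a\u2014b\n\nc\n"), ["a-b", "c"])
```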
Re: [Tutor] duplication in unit tests
Serdar Tumgoren wrote:
> Hi Kent and Lie,
>
> First, thanks to you both for the help. I reworked the tests and then
> the main code according to your suggestions (I really was muddling
> these TDD concepts!).
>
> The reworked code and tests are below. In the tests, I hard-coded the
> source data and the expected results; in the main program code, I
> eliminated the FileCleaner class and converted its methods to
> stand-alone functions. I'm planning to group them into a single,
> larger "process" function as you all suggested.
>
> Meantime, I'd be grateful if you could critique whether I've properly
> followed your advice. And of course, feel free to suggest other tests
> that might be appropriate. For instance, would it make sense to test
> convertEmDashes for non-unicode input?
>
> Thanks again!
> Serdar
>
> test_cleaner.py
>
> from cleaner import convertEmDashes, splitLines
>
> class TestCleanerMethods(unittest.TestCase):
>     def test_convertEmDashes(self):
>         """convertEmDashes to minus signs"""
>         srce = u"""Thisline has an em\u2014dash.\nSo does this \u2014.\n"""
>         expected = u"""Thisline has an em-dash.\nSo does this -.\n"""
>         result = convertEmDashes(srce)
>         self.assertEqual(result, expected)
>
>     def test_splitLines(self):
>         """splitLines should create a list of cleaned lines"""
>         srce = u"""Thisline has an em\u2014dash.\nSo does this \u2014.\n"""
>         expected = [u'Thisline has an em\u2014dash.', u'So does this \u2014.']
>         result = splitLines(srce)
>         self.assertEqual(result, expected)
>
> cleaner.py
>
> def convertEmDashes(datastring):
>     """Convert unicode emdashes to minus signs"""
>     datastring = datastring.replace(u'\u2014','-')

I think the 'dash' should be a unicode one, at least if you're expecting the datastring to be unicode:

    datastring = datastring.replace(u'\u2014',u'-')

It will probably be slightly more efficient, but more importantly, it'll make it clear what you're expecting.

>     return datastring
>
> def splitLines(datastring):
>     """Generate list of cleaned lines"""
>     data = [x.strip() for x in datastring.strip().split('\n') if x.strip()]
>     return data

And in both these functions, the docstring doesn't reflect the function very well (any more). They both should indicate what kind of data they expect (unicode?), and the latter one should not say that the lines are cleaned. What it should say is that the lines in the list have no leading or trailing whitespace, and that blank lines are dropped.

Once you have multiple "cleanup" functions, the unit tests become much more important. For example, the order of application of the cleanups could matter a lot. And pretty soon you'll have to document just what your public interface is. If your "user" may only call the overall cleanup() function, then blackbox testing only needs to examine that one, and whitebox testing can deal with the functions entirely independently.

DaveA
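Dave's remarks can be combined into a short sketch with three parts. Each function's docstring says what it expects and returns. The replacement string is unicode. A single public cleanup() function fixes the order of the steps. The third step, dropDashOnlyLines(), is a hypothetical extra cleanup, added only to show why order can matter: it recognises a dash-only line only after em dashes have been converted. Python 3 is assumed.

```python
def convertEmDashes(datastring):
    """Take a unicode string; return a copy with each em dash (U+2014)
    replaced by a hyphen-minus."""
    return datastring.replace(u"\u2014", u"-")


def splitLines(datastring):
    """Take a unicode string; return its lines as a list.

    Leading and trailing whitespace is removed from each line, and lines
    that are blank after that are dropped.
    """
    return [x.strip() for x in datastring.strip().split(u"\n") if x.strip()]


def dropDashOnlyLines(lines):
    """Hypothetical extra step: drop lines consisting only of hyphens."""
    return [line for line in lines if line.strip(u"-")]


def cleanup(datastring):
    """Public entry point: unicode string in, list of clean lines out.

    convertEmDashes must run before dropDashOnlyLines; otherwise a line
    holding only an em dash is not seen as a dash-only line.
    """
    return dropDashOnlyLines(splitLines(convertEmDashes(datastring)))


source = u"first\n\u2014\nsecond"

# Intended order: the em-dash line becomes "-" and is then dropped.
print(cleanup(source))  # ['first', 'second']

# Same steps with the dash conversion moved last: the line survives.
wrong = [convertEmDashes(l) for l in dropDashOnlyLines(splitLines(source))]
print(wrong)  # ['first', '-', 'second']
```

A black-box test of cleanup() that pins the first result would make an accidental reordering of the steps show up as a failure.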
Re: [Tutor] duplication in unit tests
On Tue, Dec 8, 2009 at 10:11 PM, Serdar Tumgoren wrote:
> Hi Kent and Lie,
>
> First, thanks to you both for the help. I reworked the tests and then
> the main code according to your suggestions (I really was muddling
> these TDD concepts!).
>
> The reworked code and tests are below. In the tests, I hard-coded the
> source data and the expected results; in the main program code, I
> eliminated the FileCleaner class and converted its methods to
> stand-alone functions. I'm planning to group them into a single,
> larger "process" function as you all suggested.
>
> Meantime, I'd be grateful if you could critique whether I've properly
> followed your advice.

Yes, this is much better. Notice how much less code it is! :-)

> And of course, feel free to suggest other tests
> that might be appropriate. For instance, would it make sense to test
> convertEmDashes for non-unicode input?

If you expect unicode input then it makes sense to test for it. If you don't expect unicode input, it might make sense to test for an expected error - how do you want the function to behave with invalid inputs?

You could add other tests as well, for example does it work if there are two dashes in a row? Does splitLines() correctly remove blank lines? These are simple functions but the idea is to think of all the desired behaviours and write test cases to cover them.

By the way I applaud your effort, unit testing is a valuable skill.

Kent
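Kent suggests thinking of all the desired behaviours and writing test cases for each. For convertEmDashes, that could start with the edge-case tests sketched below, each using hard-coded input and expected output. The function body is Serdar's, with DaveA's unicode replacement string. The particular cases, including the en dash one, are illustrative choices. Python 3 is assumed.

```python
import unittest


def convertEmDashes(datastring):
    """Replace each em dash (U+2014) in a unicode string with a hyphen-minus."""
    return datastring.replace(u"\u2014", u"-")


class TestConvertEmDashesEdgeCases(unittest.TestCase):
    def test_two_dashes_in_a_row(self):
        # Each em dash becomes its own hyphen; they are not merged.
        self.assertEqual(convertEmDashes(u"a\u2014\u2014b"), u"a--b")

    def test_no_dashes(self):
        self.assertEqual(convertEmDashes(u"plain text"), u"plain text")

    def test_empty_string(self):
        self.assertEqual(convertEmDashes(u""), u"")

    def test_en_dash_untouched(self):
        # Documents current behaviour: only U+2014 is converted, not the
        # en dash U+2013. Change this test if that is not what you want.
        self.assertEqual(convertEmDashes(u"1\u20132"), u"1\u20132")
```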
Re: [Tutor] duplication in unit tests
Hi Kent and Lie,

First, thanks to you both for the help. I reworked the tests and then the main code according to your suggestions (I really was muddling these TDD concepts!).

The reworked code and tests are below. In the tests, I hard-coded the source data and the expected results; in the main program code, I eliminated the FileCleaner class and converted its methods to stand-alone functions. I'm planning to group them into a single, larger "process" function as you all suggested.

Meantime, I'd be grateful if you could critique whether I've properly followed your advice. And of course, feel free to suggest other tests that might be appropriate. For instance, would it make sense to test convertEmDashes for non-unicode input?

Thanks again!
Serdar

test_cleaner.py

import unittest
from cleaner import convertEmDashes, splitLines

class TestCleanerMethods(unittest.TestCase):
    def test_convertEmDashes(self):
        """convertEmDashes to minus signs"""
        srce = u"""Thisline has an em\u2014dash.\nSo does this \u2014.\n"""
        expected = u"""Thisline has an em-dash.\nSo does this -.\n"""
        result = convertEmDashes(srce)
        self.assertEqual(result, expected)

    def test_splitLines(self):
        """splitLines should create a list of cleaned lines"""
        srce = u"""Thisline has an em\u2014dash.\nSo does this \u2014.\n"""
        expected = [u'Thisline has an em\u2014dash.', u'So does this \u2014.']
        result = splitLines(srce)
        self.assertEqual(result, expected)

cleaner.py

def convertEmDashes(datastring):
    """Convert unicode emdashes to minus signs"""
    datastring = datastring.replace(u'\u2014','-')
    return datastring

def splitLines(datastring):
    """Generate list of cleaned lines"""
    data = [x.strip() for x in datastring.strip().split('\n') if x.strip()]
    return data
Re: [Tutor] duplication in unit tests
On 12/9/2009 10:43 AM, Kent Johnson wrote:
>> So my questions -- Am I misunderstanding how to properly write unit
>> tests for this case? Or perhaps I've structured my program
>> incorrectly, and that's what this duplication reveals? I suspected,
>> for instance, that perhaps I should group these methods
>> (convertEmDashes, splitLines, etc.) into a single larger function or
>> method.
>
> Yes, your tests are revealing a problem with the structure. You should
> probably have a single process() method that does all the cleanup
> methods and the split. Then you could also have a test for this.

I should add that a unittest can also be a white-box test. You can have TestCases for the whole "process" (black-box test), but you can also have TestCases for each of splitLines, convertEmDashes, etc. (white-box tests). The test for the large "process" will then be, sort of, a simple integration test of all the sub-processes.
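Lie's distinction might look like this in practice. White-box TestCases exercise each helper on its own, and a black-box TestCase exercises only the top-level process() function, acting as a small integration test. process() is the combined function discussed in the thread. Its body here (convert em dashes, then split) is an assumption, and Python 3 is assumed.

```python
import unittest


def convertEmDashes(datastring):
    return datastring.replace(u"\u2014", u"-")


def splitLines(datastring):
    return [x.strip() for x in datastring.strip().split(u"\n") if x.strip()]


def process(datastring):
    """Assumed top-level cleanup: convert em dashes, then split into lines."""
    return splitLines(convertEmDashes(datastring))


class WhiteBoxTests(unittest.TestCase):
    """Each helper is tested on its own, with its own hand-written input."""

    def test_convertEmDashes(self):
        self.assertEqual(convertEmDashes(u"em\u2014dash"), u"em-dash")

    def test_splitLines(self):
        # No em dashes here: splitLines does not care about them.
        self.assertEqual(splitLines(u"  one \n\n two  \n"), [u"one", u"two"])


class BlackBoxTests(unittest.TestCase):
    """Only the public process() function is exercised."""

    def test_process(self):
        source = u"This line has an em\u2014dash.\n\nSo does this \u2014.\n"
        expected = [u"This line has an em-dash.", u"So does this -."]
        self.assertEqual(process(source), expected)
```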
Re: [Tutor] duplication in unit tests
On Tue, Dec 8, 2009 at 6:02 PM, Serdar Tumgoren wrote:
> As part of my program, I'm planning to create objects that perform
> some initial data clean-up and then parse and database the cleaned
> data. Currently I'm expecting to have a FileCleaner and Parser
> classes. Using the TDD approach, I've so far come up with the below:
>
> class FileCleaner(object):
>     def __init__(self, datastring):
>         self.source = datastring
>
>     def convertEmDashes(self):
>         """Convert unicode emdashes to minus signs"""
>         self.datastring = self.source.replace(u'\u2014','-')
>
>     def splitLines(self):
>         """Generate and store a list of cleaned, non-empty lines"""
>         self.data = [x.strip() for x in
>             self.datastring.strip().split('\n') if x.strip()]
>
>
> My confusion involves the test code for the above class and its
> methods. The only way I can get splitLines to pass its unit test is by
> first calling the convertEmDashes method, and then splitLines.
>
> class TestFileCleaner(unittest.TestCase):
>     def setUp(self):
>         self.sourcestring = u"""This line has an em\u2014dash.\n
>             So does this \u2014\n."""
>         self.cleaner = FileCleaner(self.sourcestring)
>
>     def test_convertEmDashes(self):
>         """convertEmDashes should remove minus signs from datastring
>         attribute"""
>         teststring = self.sourcestring.replace(u'\u2014','-')
>         self.cleaner.convertEmDashes()
>         self.assertEqual(teststring, self.cleaner.datastring)
>
>     def test_splitLines(self):
>         """splitLines should create a list of cleaned lines"""
>         teststring = self.sourcestring.replace(u'\u2014','-')
>         data = [x.strip() for x in teststring.strip().split('\n') if x.strip()]
>         self.cleaner.convertEmDashes()
>         self.cleaner.splitLines()
>         self.assertEqual(data, self.cleaner.data)
>
> Basically, I'm duplicating the steps from the first test method in the
> second test method (and this duplication will accrue as I add more
> "cleaning" methods).

I see a few problems with this. You are confused about what splitLines() does.
It does not create a list of cleaned lines, it just splits the lines. Because of your confusion about splitLines(), your test is not just testing splitLines(), it is testing convertEmDashes() and splitLines(). That is why you have code duplication. test_splitLines() could look like this:

    def test_splitLines(self):
        """splitLines should create a list of split lines"""
        teststring = self.sourcestring
        data = [x.strip() for x in teststring.strip().split('\n') if x.strip()]
        self.cleaner.splitLines()
        self.assertEqual(data, self.cleaner.data)

Your tests are not very good. They don't really test anything because they use the same code that you are trying to test. What if str.replace() or split() or strip() does not work the way you expect? Your tests would not discover this. You should just hard-code the expected result strings. I would write test_splitLines() like this:

    def test_splitLines(self):
        """splitLines should create a list of split lines"""
        data = [u"Thisline has an em\u2014dash.", u"So does this \u2014", u"."]
        self.cleaner.splitLines()
        self.assertEqual(data, self.cleaner.data)

You probably don't want to hard-code the source string in the setup method. Typically you want to test a function with multiple inputs so you can check its behaviour with typical values, edge cases and invalid input. For example test_splitLines() doesn't verify that splitLines() removes blank lines; that would be a good test case. You might make a list of pairs of (input value, expected result) and pass each one to splitLines().

> So my questions -- Am I misunderstanding how to properly write unit
> tests for this case? Or perhaps I've structured my program
> incorrectly, and that's what this duplication reveals? I suspected,
> for instance, that perhaps I should group these methods
> (convertEmDashes, splitLines, etc.) into a single larger function or
> method.

Yes, your tests are revealing a problem with the structure.
You should probably have a single process() method that calls all the cleanup methods and then the split. Then you could also have a test for this.

There is really no need for a class here. You could write separate functions for each cleanup and for the split, then another function that puts them all together. This would be easier to test, too.

Kent
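Kent's list of (input value, expected result) pairs can be sketched as a table-driven test. This version tests splitLines() as a stand-alone function, as Kent suggests at the end of his message. In the posted class, splitLines() reads self.datastring, which only convertEmDashes() sets, so the method cannot easily be tested alone. The cases are illustrative. unittest's subTest() requires Python 3.4 or later.

```python
import unittest


def splitLines(datastring):
    """Return the non-blank lines of a unicode string, each stripped of
    leading and trailing whitespace."""
    return [x.strip() for x in datastring.strip().split(u"\n") if x.strip()]


# (input value, expected result) pairs, with hard-coded expectations.
SPLIT_CASES = [
    (u"one\ntwo", [u"one", u"two"]),          # typical input
    (u"  padded  \n", [u"padded"]),           # surrounding whitespace
    (u"one\n\n\ntwo", [u"one", u"two"]),      # blank lines are dropped
    (u"one\n   \ntwo", [u"one", u"two"]),     # whitespace-only line
    (u"", []),                                # empty input
    (u"em\u2014dash", [u"em\u2014dash"]),     # em dashes are left alone
]


class TestSplitLines(unittest.TestCase):
    def test_pairs(self):
        for source, expected in SPLIT_CASES:
            # subTest reports each failing pair separately.
            with self.subTest(source=source):
                self.assertEqual(splitLines(source), expected)
```

Adding a case then means adding one tuple to SPLIT_CASES rather than writing a new test method.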
[Tutor] duplication in unit tests
Hi everyone,

I'm trying to apply some lessons from the recent list discussions on unit testing and Test-Driven Development, but I seem to have hit a sticking point.

As part of my program, I'm planning to create objects that perform some initial data clean-up and then parse and database the cleaned data. Currently I'm expecting to have a FileCleaner and Parser classes. Using the TDD approach, I've so far come up with the below:

class FileCleaner(object):
    def __init__(self, datastring):
        self.source = datastring

    def convertEmDashes(self):
        """Convert unicode emdashes to minus signs"""
        self.datastring = self.source.replace(u'\u2014','-')

    def splitLines(self):
        """Generate and store a list of cleaned, non-empty lines"""
        self.data = [x.strip() for x in
            self.datastring.strip().split('\n') if x.strip()]

My confusion involves the test code for the above class and its methods. The only way I can get splitLines to pass its unit test is by first calling the convertEmDashes method, and then splitLines.

class TestFileCleaner(unittest.TestCase):
    def setUp(self):
        self.sourcestring = u"""Thisline has an em\u2014dash.\n
            So does this \u2014\n."""
        self.cleaner = FileCleaner(self.sourcestring)

    def test_convertEmDashes(self):
        """convertEmDashes should remove minus signs from datastring
        attribute"""
        teststring = self.sourcestring.replace(u'\u2014','-')
        self.cleaner.convertEmDashes()
        self.assertEqual(teststring, self.cleaner.datastring)

    def test_splitLines(self):
        """splitLines should create a list of cleaned lines"""
        teststring = self.sourcestring.replace(u'\u2014','-')
        data = [x.strip() for x in teststring.strip().split('\n') if x.strip()]
        self.cleaner.convertEmDashes()
        self.cleaner.splitLines()
        self.assertEqual(data, self.cleaner.data)

Basically, I'm duplicating the steps from the first test method in the second test method (and this duplication will accrue as I add more "cleaning" methods).
I understand that TestCase's setUp method is called before each test is run (and therefore the FileCleaner object is created anew), but this coupling of a test to other methods of the class under test seems to violate the principle of testing methods in isolation.

So my questions -- Am I misunderstanding how to properly write unit tests for this case? Or perhaps I've structured my program incorrectly, and that's what this duplication reveals? I suspected, for instance, that perhaps I should group these methods (convertEmDashes, splitLines, etc.) into a single larger function or method.

But that approach seems to violate the "best practice" of writing small methods. As you can tell, I'm a bit at sea on this. Your guidance is greatly appreciated!!

Regards,
Serdar

ps - recommendations on cleaning up and restructuring code are also welcome!
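For comparison, the coupling in the posted class comes from state. splitLines() reads self.datastring, which only convertEmDashes() creates, so calling splitLines() on a fresh object raises AttributeError. The minimal sketch below keeps the class, even though later replies argue for plain functions instead. It initialises a working copy in __init__ and lets every step read and update that copy. Note that this sketch changes convertEmDashes() to read self.datastring instead of self.source. Expected values are hard-coded, as Kent recommends. Python 3 is assumed.

```python
import unittest


class FileCleaner(object):
    def __init__(self, datastring):
        self.source = datastring
        # Working copy that each cleanup step reads and updates, so any
        # step can be called first (a change from the posted class).
        self.datastring = datastring

    def convertEmDashes(self):
        """Replace each em dash (U+2014) in the working copy with a hyphen-minus."""
        self.datastring = self.datastring.replace(u"\u2014", u"-")

    def splitLines(self):
        """Store the stripped, non-blank lines of the working copy in self.data."""
        self.data = [x.strip() for x in self.datastring.strip().split(u"\n")
                     if x.strip()]


SOURCE = u"Thisline has an em\u2014dash.\n\n    So does this \u2014\n."


class TestFileCleaner(unittest.TestCase):
    def setUp(self):
        self.cleaner = FileCleaner(SOURCE)

    def test_convertEmDashes(self):
        self.cleaner.convertEmDashes()
        self.assertEqual(self.cleaner.datastring,
                         u"Thisline has an em-dash.\n\n    So does this -\n.")

    def test_splitLines(self):
        # No call to convertEmDashes() is needed any more.
        self.cleaner.splitLines()
        self.assertEqual(self.cleaner.data,
                         [u"Thisline has an em\u2014dash.", u"So does this \u2014", u"."])
```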