Re: [Tutor] duplication in unit tests

2009-12-09 Thread spir
Serdar Tumgoren dixit:

> I'll admit, I learned the hard way on a project earlier this year. I
> got that project done (again with the help of folks on this list), but
> didn't do any test-writing up front. And now, as the inevitable bugs
> crop up, I'm forced to patch them hoping that I don't break something
> else. It drove home the fact that I need to get serious about testing,
> even if I don't go full-bore TDD on every project.

The great thing about testing is that once you have test suites for each module
(in the general sense of the term), for the relations between them, and for
global functionality, you can update, refactor, enhance, etc. with some amount
of confidence that you're not just adding bugs to bugs.

Denis


la vita e estrany

http://spir.wikidot.com/
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] duplication in unit tests

2009-12-09 Thread Serdar Tumgoren
> Yes, this is much better. Notice how much less code it is! :-)

Yes, it was amazing to see how much code melted away when I gave up on
the OO design. As you and others suggested, clearly this was not the
correct approach for this portion of my program.

> If you expect unicode input then it makes sense to test for it. If you
> don't expect unicode input, it might make sense to test for an
> expected error - how do you want the function to behave with invalid
> inputs?

For invalid inputs, I'd like to log the error and place the data in a
file for additional review. The input has to be unicode, and if it's
not, it means my initial read and decoding of the source data was not
performed properly. For that case, I'll plan to raise an exception and
abort the program.
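
A minimal sketch of that "raise an exception" behaviour, plus a test for the
expected error. This is an illustration, not code from the thread: it assumes
Python 3, where decoded text is `str` (in the Python 2 of this discussion the
check would be against `unicode`), and the choice of TypeError is an assumption.

```python
import unittest


def convertEmDashes(datastring):
    """Replace each em dash (U+2014) in a text string with '-'.

    Raises TypeError for non-text input, which would mean the earlier
    read-and-decode step did not run properly (assumed policy).
    """
    # Assumption: Python 3, so decoded text is `str`, undecoded data is `bytes`.
    if not isinstance(datastring, str):
        raise TypeError("expected decoded text, got %s"
                        % type(datastring).__name__)
    return datastring.replace(u"\u2014", u"-")


class TestInvalidInput(unittest.TestCase):
    def test_bytes_input_raises(self):
        # Undecoded bytes should be rejected rather than silently processed.
        self.assertRaises(TypeError, convertEmDashes, b"not decoded")

    def test_text_input_is_converted(self):
        self.assertEqual(convertEmDashes(u"em\u2014dash"), u"em-dash")
```

The caller can let the TypeError propagate to abort the run, while the
log-and-set-aside path would handle data that is text but otherwise malformed.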

> You could add other tests as well, for example does it work if
> there are two dashes in a row? Does splitLines() correctly remove
> blank lines?

So it seems I have a healthy list of test cases to start with!

> By the way I applaud your effort, unit testing is a valuable skill.

I'll admit, I learned the hard way on a project earlier this year. I
got that project done (again with the help of folks on this list), but
didn't do any test-writing up front. And now, as the inevitable bugs
crop up, I'm forced to patch them hoping that I don't break something
else. It drove home the fact that I need to get serious about testing,
even if I don't go full-bore TDD on every project.

I suspect an overdose of preparatory TDD reading had my head swirling
a bit. I appreciate you all walking me through this first effort.

No doubt I'll be writing again soon with more questions. Meantime, many thanks!

Serdar


Re: [Tutor] duplication in unit tests

2009-12-09 Thread spir
Serdar Tumgoren dixit:

> Hi everyone,
> I'm trying to apply some lessons from the recent list discussions on
> unit testing and Test-Driven Development, but I seem to have hit a
> sticking point.
> 
> As part of my program, I'm planning to create objects that perform
> some initial data clean-up and then parse and database the cleaned
> data. Currently I'm expecting to have a FileCleaner and Parser
> classes. Using the TDD approach, I've so far come up with the below:
> 
> class FileCleaner(object):
>     def __init__(self, datastring):
>         self.source = datastring
> 
>     def convertEmDashes(self):
>         """Convert unicode emdashes to minus signs"""
>         self.datastring = self.source.replace(u'\u2014','-')
> 
>     def splitLines(self):
>         """Generate and store a list of cleaned, non-empty lines"""
>         self.data = [x.strip() for x in
>                      self.datastring.strip().split('\n') if x.strip()]
> 
> 
> My confusion involves the test code for the above class and its
> methods. The only way I can get splitLines to pass its unit test is by
> first calling the convertEmDashes method, and then splitLines.
> 
> class TestFileCleaner(unittest.TestCase):
>     def setUp(self):
>         self.sourcestring = u"""This    line   has an em\u2014dash.\n
>                 So   does this  \u2014\n."""
>         self.cleaner = FileCleaner(self.sourcestring)
> 
>     def test_convertEmDashes(self):
>         """convertEmDashes should remove minus signs from datastring
>         attribute"""
>         teststring = self.sourcestring.replace(u'\u2014','-')
>         self.cleaner.convertEmDashes()
>         self.assertEqual(teststring, self.cleaner.datastring)
> 
>     def test_splitLines(self):
>         """splitLines should create a list of cleaned lines"""
>         teststring = self.sourcestring.replace(u'\u2014','-')
>         data = [x.strip() for x in teststring.strip().split('\n') if x.strip()]
>         self.cleaner.convertEmDashes()
>         self.cleaner.splitLines()
>         self.assertEqual(data, self.cleaner.data)
> 
> Basically, I'm duplicating the steps from the first test method in the
> second test method (and this duplication will accrue as I add more
> "cleaning" methods).
> 
> I understand that TestCase's setUp method is called before each test
> is run (and therefore the FileCleaner object is created anew), but
> this coupling of a test to other methods of the class under test seems
> to violate the principle of testing methods in isolation.
> 
> So my questions -- Am I misunderstanding how to properly write unit
> tests for this case? Or perhaps I've structured my program
> incorrectly, and that's what this duplication reveals? I suspected,
> for instance, that perhaps I should group these methods
> (convertEmDashes, splitLines, etc.) into a single larger function or
> method.
> 
> But that approach seems to violate the "best practice" of writing
> small methods. As you can tell, I'm a bit at sea on this.  Your
> guidance is greatly appreciated!!
> 
> Regards,
> Serdar
> 
> ps - recommendations on cleaning up and restructuring code are also welcome!

Hello,

I guess you're first confused at the design level of your app. Testing and
design both require you to clearly express your expectations. Here, the cleanup
phase may be written as follows (I don't mean it's particularly good, just an
example):

plain source data = input   -->   output = ready-to-process data

As you see, this requirement is, conceptually speaking, a purely function-al
one, in the plain sense of the word "function". At least, this is the way I see
it.
Building an object to implement it is, imo, a wrong interpretation of OO design.
(It's also writing Java in Python ;-) I would rather choose to write it as a
method of a higher-level object. Possibly, this method would be split into
smaller ones if needed.

Then, expressing your tests is in a sense translating the requirement above
into code: feeding the piece of code to be tested with raw input data and
checking that the output is as expected. As well expressed by Kent, you should
test with typical, edge, *and wrong* input; in the latter case the code is
expected to fail, and the test should check that it does.
You will have to hand-write or automatically produce input strings for each
test. If the func is split, then you will have to do it for each mini-func to
be tested. This can be rather unpleasant, especially in cases like yours where
the funcs look like they logically operate in sequence, but there is no way
around it. Actually, the several cleanup tasks (translating special chars,
skipping blank lines, etc.) are rather orthogonal: they don't need to be tested
in sequence.


Denis


la vita e estrany

http://spir.wikidot.com/


Re: [Tutor] duplication in unit tests

2009-12-08 Thread Dave Angel

Serdar Tumgoren wrote:

> Hi Kent and Lie,
>
> First, thanks to you both for the help. I reworked the tests and then
> the main code according to your suggestions (I really was muddling
> these TDD concepts!).
>
> The reworked code and tests are below. In the tests, I hard-coded the
> source data and the expected results; in the main program code, I
> eliminated the FileCleaner class and converted its methods to
> stand-alone functions. I'm planning to group them into a single,
> larger "process" function as you all suggested.
>
> Meantime, I'd be grateful if you could critique whether I've properly
> followed your advice. And of course, feel free to suggest other tests
> that might be appropriate. For instance, would it make sense to test
> convertEmDashes for non-unicode input?
>
> Thanks again!
> Serdar
>
> test_cleaner.py
> import unittest
> from cleaner import convertEmDashes, splitLines
>
> class TestCleanerMethods(unittest.TestCase):
>     def test_convertEmDashes(self):
>         """convertEmDashes to minus signs"""
>         srce = u"""Thisline   has an em\u2014dash.\nSo   does this  \u2014.\n"""
>         expected = u"""Thisline   has an em-dash.\nSo   does this  -.\n"""
>         result = convertEmDashes(srce)
>         self.assertEqual(result, expected)
>
>     def test_splitLines(self):
>         """splitLines should create a list of cleaned lines"""
>         srce = u"""Thisline   has an em\u2014dash.\nSo   does this  \u2014.\n"""
>         expected = [u'Thisline   has an em\u2014dash.',
>                     u'So   does this  \u2014.']
>         result = splitLines(srce)
>         self.assertEqual(result, expected)
>
>
> cleaner.py
> def convertEmDashes(datastring):
>     """Convert unicode emdashes to minus signs"""
>     datastring = datastring.replace(u'\u2014','-')

I think the 'dash' should be a unicode one, at least if you're expecting 
the datastring to be unicode.


   datastring = datastring.replace(u'\u2014',u'-')

It will probably be slightly more efficient, but more importantly, it'll make 
it clear what you're expecting.



>     return datastring
>
> def splitLines(datastring):
>     """Generate list of cleaned lines"""
>     data = [x.strip() for x in datastring.strip().split('\n') if x.strip()]
>     return data

  
And in both these functions, the doc string doesn't reflect the function 
very well (any more).  They both should indicate what kind of data they 
expect (unicode?), and the latter one should not say that the lines are 
cleaned.  What it should say is that the lines in the list have no 
leading or trailing whitespace, and that blank lines are dropped.
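
As an illustration of the docstring advice above, the two functions might be
documented along these lines. The wording is only a suggestion, and the
function bodies are the ones posted in the thread with a unicode replacement
string.

```python
def convertEmDashes(datastring):
    """Return datastring with each em dash (U+2014) replaced by u'-'.

    Expects a unicode (text) string and returns a unicode string.
    """
    return datastring.replace(u"\u2014", u"-")


def splitLines(datastring):
    """Split a unicode (text) string on newlines and return a list of lines.

    Each line in the list has no leading or trailing whitespace, and
    blank lines are dropped.
    """
    return [x.strip() for x in datastring.strip().split(u"\n") if x.strip()]
```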



Once you have multiple "cleanup" functions, the unit tests become much 
more important.  For example, the order of application of the cleanups 
could matter a lot.  And pretty soon you'll have to document just what 
your public interface is.  If your "user" may only call the overall 
cleanup() function, then blackbox testing only needs to examine that 
one, and whitebox testing can deal with the functions entirely 
independently.
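
A small sketch of why the order of cleanups can matter. Note that
collapseDashes() and cleanup() are hypothetical names invented for this
example; only convertEmDashes() comes from the thread.

```python
import re
import unittest


def convertEmDashes(datastring):
    """Replace em dashes (U+2014) with u'-'."""
    return datastring.replace(u"\u2014", u"-")


def collapseDashes(datastring):
    """Hypothetical extra cleanup: squeeze runs of '-' into a single '-'."""
    return re.sub(u"-+", u"-", datastring)


def cleanup(datastring):
    """Assumed public entry point: convert em dashes first, then collapse."""
    return collapseDashes(convertEmDashes(datastring))


class TestCleanupBlackBox(unittest.TestCase):
    """Black-box test: only the public cleanup() function is exercised."""

    def test_em_dash_next_to_minus(self):
        self.assertEqual(cleanup(u"a\u2014-b"), u"a-b")


class TestOrderMatters(unittest.TestCase):
    """White-box check: the same two steps in reverse give another result."""

    def test_reversed_order_leaves_two_dashes(self):
        reversed_result = convertEmDashes(collapseDashes(u"a\u2014-b"))
        self.assertEqual(reversed_result, u"a--b")
```

The black-box test pins down the behaviour a user of cleanup() relies on,
while each helper can still get its own independent tests.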


DaveA


Re: [Tutor] duplication in unit tests

2009-12-08 Thread Kent Johnson
On Tue, Dec 8, 2009 at 10:11 PM, Serdar Tumgoren wrote:
> Hi Kent and Lie,
>
> First, thanks to you both for the help. I reworked the tests and then
> the main code according to your suggestions (I really was muddling
> these TDD concepts!).
>
> The reworked code and tests are below. In the tests, I hard-coded the
> source data and the expected results; in the main program code, I
> eliminated the FileCleaner class and converted its methods to
> stand-alone functions. I'm planning to group them into a single,
> larger "process" function as you all suggested.
>
> Meantime, I'd be grateful if you could critique whether I've properly
> followed your advice.

Yes, this is much better. Notice how much less code it is! :-)

> And of course, feel free to suggest other tests
> that might be appropriate. For instance, would it make sense to test
> convertEmDashes for non-unicode input?

If you expect unicode input then it makes sense to test for it. If you
don't expect unicode input, it might make sense to test for an
expected error - how do you want the function to behave with invalid
inputs? You could add other tests as well, for example does it work if
there are two dashes in a row? Does splitLines() correctly remove
blank lines?

These are simple functions but the idea is to think of all the desired
behaviours and write test cases to cover them.
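
For instance, the questions above could turn into test cases like the
following. This is a sketch only: the test names and sample inputs are made up
here, and the two functions are repeated from the posted cleaner.py so the
example is self-contained.

```python
import unittest


def convertEmDashes(datastring):
    """Replace em dashes (U+2014) with u'-'."""
    return datastring.replace(u"\u2014", u"-")


def splitLines(datastring):
    """Return stripped, non-blank lines of datastring."""
    return [x.strip() for x in datastring.strip().split(u"\n") if x.strip()]


class TestEdgeCases(unittest.TestCase):
    def test_two_dashes_in_a_row(self):
        self.assertEqual(convertEmDashes(u"a\u2014\u2014b"), u"a--b")

    def test_text_without_dashes_is_unchanged(self):
        self.assertEqual(convertEmDashes(u"plain text"), u"plain text")

    def test_blank_lines_are_removed(self):
        self.assertEqual(splitLines(u"one\n\n   \ntwo\n"), [u"one", u"two"])
```

Run with `python -m unittest` on the module that holds the class.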

By the way I applaud your effort, unit testing is a valuable skill.

Kent


Re: [Tutor] duplication in unit tests

2009-12-08 Thread Serdar Tumgoren
Hi Kent and Lie,

First, thanks to you both for the help. I reworked the tests and then
the main code according to your suggestions (I really was muddling
these TDD concepts!).

The reworked code and tests are below. In the tests, I hard-coded the
source data and the expected results; in the main program code, I
eliminated the FileCleaner class and converted its methods to
stand-alone functions. I'm planning to group them into a single,
larger "process" function as you all suggested.

Meantime, I'd be grateful if you could critique whether I've properly
followed your advice. And of course, feel free to suggest other tests
that might be appropriate. For instance, would it make sense to test
convertEmDashes for non-unicode input?

Thanks again!
Serdar

 test_cleaner.py 
import unittest
from cleaner import convertEmDashes, splitLines

class TestCleanerMethods(unittest.TestCase):
    def test_convertEmDashes(self):
        """convertEmDashes to minus signs"""
        srce = u"""Thisline   has an em\u2014dash.\nSo   does this  \u2014.\n"""
        expected = u"""Thisline   has an em-dash.\nSo   does this  -.\n"""
        result = convertEmDashes(srce)
        self.assertEqual(result, expected)

    def test_splitLines(self):
        """splitLines should create a list of cleaned lines"""
        srce = u"""Thisline   has an em\u2014dash.\nSo   does this  \u2014.\n"""
        expected = [u'Thisline   has an em\u2014dash.',
                    u'So   does this  \u2014.']
        result = splitLines(srce)
        self.assertEqual(result, expected)


 cleaner.py 
def convertEmDashes(datastring):
    """Convert unicode emdashes to minus signs"""
    datastring = datastring.replace(u'\u2014','-')
    return datastring

def splitLines(datastring):
    """Generate list of cleaned lines"""
    data = [x.strip() for x in datastring.strip().split('\n') if x.strip()]
    return data


Re: [Tutor] duplication in unit tests

2009-12-08 Thread Lie Ryan

On 12/9/2009 10:43 AM, Kent Johnson wrote:

> > So my questions -- Am I misunderstanding how to properly write unit
> > tests for this case? Or perhaps I've structured my program
> > incorrectly, and that's what this duplication reveals? I suspected,
> > for instance, that perhaps I should group these methods
> > (convertEmDashes, splitLines, etc.) into a single larger function or
> > method.
>
> Yes, your tests are revealing a problem with the structure. You should
> probably have a single process() method that does all the cleanup
> methods and the split. Then you could also have a test for this.


I should add that unit tests can be white-box tests. You can have
TestCases for the whole "process" (black-box test), but you can also have
TestCases for each of splitLines, convertEmDashes, etc. (white-box test).

The test for the large "process" will be, in a sense, a simple integration
test of the sub-processes.




Re: [Tutor] duplication in unit tests

2009-12-08 Thread Kent Johnson
On Tue, Dec 8, 2009 at 6:02 PM, Serdar Tumgoren wrote:

> As part of my program, I'm planning to create objects that perform
> some initial data clean-up and then parse and database the cleaned
> data. Currently I'm expecting to have a FileCleaner and Parser
> classes. Using the TDD approach, I've so far come up with the below:
>
> class FileCleaner(object):
>    def __init__(self, datastring):
>        self.source = datastring
>
>    def convertEmDashes(self):
>        """Convert unicode emdashes to minus signs"""
>        self.datastring = self.source.replace(u'\u2014','-')
>
>    def splitLines(self):
>        """Generate and store a list of cleaned, non-empty lines"""
>        self.data = [x.strip() for x in
> self.datastring.strip().split('\n') if x.strip()]
>
>
> My confusion involves the test code for the above class and its
> methods. The only way I can get splitLines to pass its unit test is by
> first calling the convertEmDashes method, and then splitLines.
>
> class TestFileCleaner(unittest.TestCase):
>    def setUp(self):
>        self.sourcestring = u"""This    line   has an em\u2014dash.\n
>                So   does this  \u2014\n."""
>        self.cleaner = FileCleaner(self.sourcestring)
>
>    def test_convertEmDashes(self):
>        """convertEmDashes should remove minus signs from datastring
> attribute"""
>        teststring = self.sourcestring.replace(u'\u2014','-')
>        self.cleaner.convertEmDashes()
>        self.assertEqual(teststring, self.cleaner.datastring)
>
>    def test_splitLines(self):
>        """splitLines should create a list of cleaned lines"""
>        teststring = self.sourcestring.replace(u'\u2014','-')
>        data = [x.strip() for x in teststring.strip().split('\n') if x.strip()]
>        self.cleaner.convertEmDashes()
>        self.cleaner.splitLines()
>        self.assertEqual(data, self.cleaner.data)
>
> Basically, I'm duplicating the steps from the first test method in the
> second test method (and this duplication will accrue as I add more
> "cleaning" methods).

I see a few problems with this.

You are confused about what splitLines() does. It does not create a
list of cleaned lines, it just splits the lines. Because of your
confusion about splitLines(), your test is not just testing
splitLines(), it is testing convertEmDashes() and splitLines(). That
is why you have code duplication. test_splitLines() could look like
this:
    def test_splitLines(self):
        """splitLines should create a list of split lines"""
        teststring = self.sourcestring
        data = [x.strip() for x in teststring.strip().split('\n') if x.strip()]
        self.cleaner.splitLines()
        self.assertEqual(data, self.cleaner.data)

Your tests are not very good. They don't really test anything because
they use the same code that you are trying to test. What if
str.replace() or split() or strip() does not work the way you expect?
Your tests would not discover this. You should just hard-code the
expected result strings.  I would write test_splitLines() like this:
    def test_splitLines(self):
        """splitLines should create a list of split lines"""
        data = [u"This    line   has an em\u2014dash.",
                u"So   does this  \u2014", u"."]
        self.cleaner.splitLines()
        self.assertEqual(data, self.cleaner.data)

You probably don't want to hard-code the source string in the setup
method. Typically you want to test a function with multiple inputs so
you can check its behaviour with typical values, edge cases and
invalid input. For example test_splitLines() doesn't verify that
splitLines() removes blank lines; that would be a good test case. You
might make a list of pairs of (input value, expected result) and pass
each one to splitLines().
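
One possible shape for such a list of pairs, written against a stand-alone
splitLines() function rather than the class. The sample inputs are invented for
illustration, and the function body is the list comprehension from the
original post.

```python
import unittest


def splitLines(datastring):
    """Return stripped, non-blank lines of datastring."""
    return [x.strip() for x in datastring.strip().split(u"\n") if x.strip()]


class TestSplitLines(unittest.TestCase):
    # Each pair is (input value, expected result).
    cases = [
        (u"one\ntwo", [u"one", u"two"]),          # typical input
        (u"  padded  \n", [u"padded"]),           # surrounding whitespace
        (u"one\n\n\ntwo", [u"one", u"two"]),      # blank lines in the middle
        (u"", []),                                # edge case: empty string
        (u"   \n \n", []),                        # edge case: only whitespace
    ]

    def test_all_cases(self):
        for source, expected in self.cases:
            # The message shows which input failed, since all pairs
            # share a single test method.
            self.assertEqual(splitLines(source), expected,
                             "input was %r" % (source,))
```

Adding a new case is then a one-line change to the list rather than a new
test method.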

> So my questions -- Am I misunderstanding how to properly write unit
> tests for this case? Or perhaps I've structured my program
> incorrectly, and that's what this duplication reveals? I suspected,
> for instance, that perhaps I should group these methods
> (convertEmDashes, splitLines, etc.) into a single larger function or
> method.

Yes, your tests are revealing a problem with the structure. You should
probably have a single process() method that does all the cleanup
methods and the split. Then you could also have a test for this.

There is really no need for a class here. You could write separate
functions for each cleanup and for the split, then another function
that puts them all together. This would be easier to test, too.
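
A rough sketch of that layout, using the function names from the thread. The
process() name comes from the suggestion above; its exact signature and the
sample text in the usage note are assumptions.

```python
def convertEmDashes(datastring):
    """Replace em dashes (U+2014) with u'-'."""
    return datastring.replace(u"\u2014", u"-")


def splitLines(datastring):
    """Return stripped, non-blank lines of datastring."""
    return [x.strip() for x in datastring.strip().split(u"\n") if x.strip()]


def process(datastring):
    """Apply every cleanup step, then split the result into lines."""
    return splitLines(convertEmDashes(datastring))
```

With this in place, process(u"An em\u2014dash.\n\nAnother \u2014.\n") returns
[u"An em-dash.", u"Another -."], and each helper can still be tested on its own
with plain strings.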

Kent