Re: Where to put the error handing test?

2009-11-24 Thread Bruno Desthuilliers

alex23 wrote:

On Nov 24, 1:15 pm, Peng Yu pengyu...@gmail.com wrote:

Suppose that I have a function f() that calls g(). I can put a test on
the argument 'x' in either g() or f(). I'm wondering what the
common practice is.

If I put the test in f(), then g() becomes more efficient when other
code calls g() and guarantees x will pass the test, even though the test
code is not in g(). But there might be some caller of g() that passes an
'x' that would not pass the test, if the test were in g().


What you should try to do is make each function as self-contained as
possible. f() shouldn't have to know what is a valid argument for g(),
that's the responsibility of g(). 


There's no such clear-cut rule IMHO - it really depends on the context. If f
is a user-interface function - a function that deals with user inputs in
whatever form - and g is a domain-specific library function, then it's
f's responsibility to validate user inputs before calling g (_and_ of
course to deal with any exception raised within g).


As a general rule, defensive code should go at the interface level - 
program's inputs of course, but also, sometimes, at sub-systems 
boundaries.
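
A minimal sketch of that split, assuming a made-up pair of functions (read_age and years_until_retirement are hypothetical names for illustration, not anything from this thread). The interface-level function validates the raw input; the domain-level function simply trusts what it is given:

```python
def years_until_retirement(age, retirement_age=65):
    """Domain-level function (hypothetical): expects a non-negative int."""
    return max(retirement_age - age, 0)


def read_age(raw):
    """Interface-level function (hypothetical): validates raw user input
    before anything reaches the domain code."""
    try:
        age = int(raw.strip())
    except ValueError:
        raise ValueError("age must be a whole number, got %r" % raw)
    if age < 0:
        raise ValueError("age must not be negative, got %d" % age)
    return age


def main(raw):
    # Defensive code lives here, at the program's input boundary.
    try:
        age = read_age(raw)
    except ValueError as exc:
        return "Invalid input: %s" % exc
    return "Years left: %d" % years_until_retirement(age)
```

The domain function stays free of checks, and every path from user input to it goes through read_age first.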

--
http://mail.python.org/mailman/listinfo/python-list


Re: Where to put the error handing test?

2009-11-24 Thread Paul Miller
On Mon, 23 Nov 2009 22:27:24 -0800, alex23 wrote:

 As a very rough example:
 
 def g(x):
     try:
         assert isinstance(x, int)
     except AssertionError:
         raise TypeError("expected int, got %s" % type(x))
     # ... function code goes here
 
 def f(x):
     try:
         g(x)
     except TypeError:
         pass  # handle the problem here
     # ... function code goes here

I know you say this is a very rough example, but, generally you don't 
want to do this kind of type checking with isinstance.  Rather, it's 
better to just simply manipulate x as if it were an integer and rely on 
Python to check to see if x supports the operations you're trying to do 
with it.  For instance, say we have

def g(x):
    return x * x

def f(x):
    return g(x) + 2

If you try to pass any value to either of these functions that doesn't 
support the required operations, Python itself will complain with a 
TypeError.  Since the interpreter needs to do this check *anyway*, 
there's no real sense in repeating it manually by checking isinstance.
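
A quick way to see this, as a sketch assuming Python 3 and reusing the two functions above. Any type that supports * and + works, and an unsupported type is rejected by the interpreter without a single isinstance check:

```python
def g(x):
    return x * x


def f(x):
    return g(x) + 2


print(f(3))      # an int works
print(f(2.5))    # a float works too, with no isinstance check in the way

try:
    f("spam")    # str * str is not defined
except TypeError as exc:
    print("Python complained on its own:", exc)
```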



Re: Where to put the error handing test?

2009-11-24 Thread Dave Angel

Peng Yu wrote:

On Mon, Nov 23, 2009 at 9:44 PM, Lie Ryan lie.1...@gmail.com wrote:
  

Peng Yu wrote:


Suppose that I have a function f() that calls g(). I can put a test on
the argument 'x' in either g() or f(). I'm wondering what the
common practice is.

My thought is that if I put the test in g(x), the code of g(x) is
safer, but the test is not necessary when g() is called by h().

If I put the test in f(), then g() becomes more efficient when other
code calls g() and guarantees x will pass the test, even though the test
code is not in g(). But there might be some caller of g() that passes an
'x' that would not pass the test, if the test were in g().
  

Typically, you test for x as early as possible, e.g. just after user input
(or file or url load or whatever). After that test, you can (or should be
able to) assume that all function calls will always be called with the
correct argument. This is the ideal situation, it's not always easy to do.

In any case though, don't optimize early.



Let's suppose that g() is refactored out from f() and is called not
only by f() but by other functions, and g() is likely to be called by new
functions.

If I don't optimize early, I should put the test in g(), rather than f(), right?

  
Your question is so open-ended as to be unanswerable.  All we should do 
in this case is supply some guidelines so you can guess which one might 
apply in your particular case.


You could be referring to a test that triggers alternate handling.  Or 
you could be referring to a test that notices bad input by a user, or 
bad data from an untrusted source.  Or you could be referring to a test 
that discovers bugs in your code.  And there are variations of these, 
depending on whether your user is also writing code (eval, or even 
import of user-supplied mixins), etc.


The first thing that's needed in the function g() is a docstring, 
defining what inputs it expects, and what it'll do with them.  Then if 
it gets any input that doesn't meet those requirements, it might throw 
an exception.  Or it might just get an arbitrary result.  That's all up 
to the docstring.  Without any documentation, nothing is correct.
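
As an illustration only (this g() is a made-up example, not a function from the thread), a docstring of that kind might spell out the accepted inputs, the promised failure mode, and what is left undefined:

```python
def g(x):
    """Return the reciprocal of x.

    x must be a non-zero number. Passing 0 raises ZeroDivisionError.
    Non-numeric x is outside the contract: the caller gets whatever
    Python happens to do with it (in practice, a TypeError).
    """
    return 1.0 / x
```

With the contract written down, it becomes possible to say whether a given call, or a given result, is correct.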


Functions that are only called by trusted code need not have explicit 
tests on their inputs, since you're writing it all.  Part of debugging 
is catching those cases where f () can pass bad data to g().  If it's 
caused because bad data is passed to f(), then you have a bug in that 
caller.  Eventually, you get to the user.  If the bad data comes from 
the user, it should be caught as soon as possible, and feedback supplied 
right then.


assert() ought to be the correct way to add tests in g() that test 
whether there's such a bug in f().  Unfortunately, in CPython it 
defaults to debug mode, so scripts that are run will execute those tests 
by default.  Consequently, people leave them out, to avoid slowing down 
code.
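
A small sketch of that use of assert (the functions are hypothetical, written for illustration). The check in g() exists to catch bugs in callers such as f(), not to validate user input. It runs by default because __debug__ is true, and it is skipped when the script is run with "python -O":

```python
def g(x):
    # Internal sanity check: documents the contract and catches bugs in
    # callers during development. Skipped entirely under "python -O".
    assert x >= 0, "g() requires a non-negative x, got %r" % (x,)
    return x ** 0.5


def f(x):
    # f() is responsible for handing g() only values in its domain.
    return g(abs(x))
```

Because the assert can be switched off, it should never be the only thing standing between untrusted input and g().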




It comes down to trust.  If you throw the code together without a test 
suite, you'll be a long time finding all the bugs in non-trivial code.  
So add lots of defensive tests throughout the code, and pretend that's 
equivalent to a good test system.  If you're writing a library to be 
used by others, then define your public interfaces with exceptions for 
any invalid code, and write careful documentation describing what's 
invalid.  And if you're writing an end-user application, test their 
input as soon as you get it, so none of the rest of the application ever 
gets invalid data.



DaveA


Re: Where to put the error handing test?

2009-11-24 Thread Steven D'Aprano
On Mon, 23 Nov 2009 21:15:48 -0600, Peng Yu wrote:

 Suppose that I have a function f() that calls g(). I can put a test on the
 argument 'x' in either g() or f(). I'm wondering what the common
 practice is.
 
 My thought is that if I put the test in g(x), the code of g(x) is safer,
 but the test is not necessary when g() is called by h().

If the function g requires the test, then put it in g. If it does not 
require the test, then don't put it in g.

If the test is only required by f, then it belongs in f.



-- 
Steven


Re: Where to put the error handing test?

2009-11-24 Thread Peng Yu
On Tue, Nov 24, 2009 at 12:27 AM, alex23 wuwe...@gmail.com wrote:
 On Nov 24, 1:15 pm, Peng Yu pengyu...@gmail.com wrote:
 Suppose that I have a function f() that calls g(). I can put a test on
 the argument 'x' in either g() or f(). I'm wondering what the
 common practice is.

 If I put the test in f(), then g() becomes more efficient when other
 code calls g() and guarantees x will pass the test, even though the test
 code is not in g(). But there might be some caller of g() that passes an
 'x' that would not pass the test, if the test were in g().

 What you should try to do is make each function as self-contained as
 possible. f() shouldn't have to know what is a valid argument for g(),
 that's the responsibility of g(). What f() needs to know is how to
 deal with any problems that arise while using g().

This may not always be possible, because g() might call third-party
software that I don't have complete knowledge of. What would you
do in this case?

Another scenario:

Suppose that f_1(),...,f_n(), g() are in a package, where g() is an
internal function that the end users are not supposed to call, and
f_1(),...,f_n() are the functions that the end users may call.

Since all the f_1 ... f_n functions know g(), they can be
programmed to guarantee not to pass any arguments that cannot be
handled by g(). In this case, I think it is reasonable to move the
test code out of g()? Is that the generally accepted practice?

 As a very rough example:

    def g(x):
        try:
            assert isinstance(x, int)
        except AssertionError:
            raise TypeError("expected int, got %s" % type(x))
        # ... function code goes here

    def f(x):
        try:
            g(x)
        except TypeError:
            pass  # handle the problem here
        # ... function code goes here

 My thought is that if I put the test in g(x), the code of g(x) is
 safer, but the test is not necessary when g() is called by h().

 This sounds strange to me. Are you stating that h() can pass values to
 g() that would be illegal for f() to pass? That sounds like a very
 dangerous design...you want each function's behaviour to be as
 consistent and predictable as it possibly can.

You misunderstood me.

h() doesn't pass any illegal arguments to g(). If I put the test code
in g(), it would be a waste of run time when h() calls g(). In this
case, and under the condition that g() is an internal function of a
package as I mentioned above, I think I should move the test code from
g() to f(). What do you think?


Re: Where to put the error handing test?

2009-11-24 Thread Peng Yu
On Tue, Nov 24, 2009 at 4:58 AM, Dave Angel da...@ieee.org wrote:
 Peng Yu wrote:

 On Mon, Nov 23, 2009 at 9:44 PM, Lie Ryan lie.1...@gmail.com wrote:


 Peng Yu wrote:


 Suppose that I have a function f() that calls g(). I can put a test on
 the argument 'x' in either g() or f(). I'm wondering what the
 common practice is.

 My thought is that if I put the test in g(x), the code of g(x) is
 safer, but the test is not necessary when g() is called by h().

 If I put the test in f(), then g() becomes more efficient when other
 code calls g() and guarantees x will pass the test, even though the test
 code is not in g(). But there might be some caller of g() that passes an
 'x' that would not pass the test, if the test were in g().


 Typically, you test for x as early as possible, e.g. just after user
 input
 (or file or url load or whatever). After that test, you can (or should be
 able to) assume that all function calls will always be called with the
 correct argument. This is the ideal situation, it's not always easy to
 do.

 In any case though, don't optimize early.


 Let's suppose that g() is refactored out from f() and is called not
 only by f() but by other functions, and g() is likely to be called by new
 functions.

 If I don't optimize early, I should put the test in g(), rather than f(),
 right?



 Your question is so open-ended as to be unanswerable.  All we should do in
 this case is supply some guidelines so you can guess which one might apply
 in your particular case.

 You could be referring to a test that triggers alternate handling.  Or you
 could be referring to a test that notices bad input by a user, or bad data
 from an untrusted source.  Or you could be referring to a test that
 discovers bugs in your code.  And there are variations of these, depending
 on whether your user is also writing code (eval, or even import of
 user-supplied mixins), etc.

 The first thing that's needed in the function g() is a docstring, defining
 what inputs it expects, and what it'll do with them.  Then if it gets any
 input that doesn't meet those requirements, it might throw an exception.  Or
 it might just get an arbitrary result.  That's all up to the docstring.
  Without any documentation, nothing is correct.

 Functions that are only called by trusted code need not have explicit tests
 on their inputs, since you're writing it all.  Part of debugging is catching
 those cases where f () can pass bad data to g().  If it's caused because bad
 data is passed to f(), then you have a bug in that caller.  Eventually, you
 get to the user.  If the bad data comes from the user, it should be caught
 as soon as possible, and feedback supplied right then.

I'm still confused by the guideline that an error should be caught as
early as possible.

Suppose I have the following call chain

f1() --> f2() --> f3() --> f4()

The input in f1() might cause an error in f4(). However, this error
can of course be caught by f1(), whenever I want to do so. In the worst
case, I could duplicate the code of f2 and f3, and the test code in f4
to f1(), to catch the error in f1 rather than f4. But I don't think
that this is what you mean.

Then the problem is where to put the test code more effectively. I
would consider 'whether it is obvious to test the condition in the
given function' as the guideline. However, it might be equally obvious to
test the same thing in two functions, for example, f1 and f4.

In this case, I thought originally that I should put the test code in
f1 rather than f4, if f1, f2, f3 and f4 are all the functions that I
have in the package that I am making. But it is possible that some
time later I add the functions f5(),...,f10() that call f4(). Since
f4 doesn't have the test code, f5(),...,f10() should have the same
test code. This is clearly a redundancy in the code. If I move the
test code to f4(), there is a redundancy of the code between f1 and
f4.

I'm wondering how you would solve the above problem?

 assert() ought to be the correct way to add tests in g() that test whether
 there's such a bug in f().  Unfortunately, in CPython it defaults to debug
 mode, so scripts that are run will execute those tests by default.
  Consequently, people leave them out, to avoid slowing down code.



 It comes down to trust.  If you throw the code together without a test
 suite, you'll be a long time finding all the bugs in non-trivial code.  So
 add lots of defensive tests throughout the code, and pretend that's
 equivalent to a good test system.  If you're writing a library to be used by
 others, then define your public interfaces with exceptions for any invalid
 code, and write careful documentation describing what's invalid.  And if
 you're writing an end-user application, test their input as soon as you get
 it, so none of the rest of the application ever gets invalid data.

Having the test code for any function and any class (even the ones
that are internal in the package) is basically what I am doing.
However, if I 

Re: Where to put the error handing test?

2009-11-24 Thread Lie Ryan

Peng Yu wrote:

On Tue, Nov 24, 2009 at 4:58 AM, Dave Angel da...@ieee.org wrote:


I'll put an extra emphasis on this:
Your question is so open-ended as to be unanswerable.  




I'm still confused by the guideline that an error should be caught as
early as possible.


but not too early. Errors must be RAISED as early as possible. Errors 
must be CAUGHT as soon as there is enough information to handle the errors.
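
A rough sketch of the difference, using made-up names (parse_port, load_config and start are hypothetical, not from the thread). The error is raised in the first function that can tell the value is bad, passes untouched through a layer that cannot do anything about it, and is caught only where there is enough context to pick a fallback:

```python
def parse_port(text):
    # RAISE early: this is the first place that can tell the value is bad.
    port = int(text)  # raises ValueError for non-numeric text
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return port


def load_config(settings):
    # No try/except here: this layer has nothing useful to do about
    # a bad port, so the exception simply passes through.
    return {"port": parse_port(settings["port"])}


def start(settings, default_port=8080):
    # CATCH here: only this layer knows that a fallback is acceptable.
    try:
        return load_config(settings)
    except ValueError:
        return {"port": default_port}
```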



Suppose I have the following call chain

f1() --> f2() --> f3() --> f4()

The input in f1() might cause an error in f4(). However, this error
can of course be caught by f1(), whenever I want to do so. In the worst
case, I could duplicate the code of f2 and f3, and the test code in f4
to f1(), to catch the error in f1 rather than f4. But I don't think
that this is what you mean.


Why would f1() have faulty data? Does it come from external input? Then
the input must be validated at f1(). Then f2(), f3(), and f4() do
not require any argument checking, since it is assumed that f1() calls
them with good input.


Of course, there are cases where data validation may not belong in f1. For
example, if f1() is a function that receives keyboard input for a
filename, f1 validates whether the filename is valid. Then f2 might open
the file and validate that the file is of the expected type (if f2
expects a .csv file but is given an .xls file, f2 would scream out an
error). f3 receives rows/lines of data from f2 and extracts
fields/columns from the rows; but some of the data might be faulty, and
these must be filtered out before the data is transformed by f4. f4
assumes f3 cleans the rows, so f4 does not do error-checking. The
transformed data is then written back by f3.

Now there is an f5 which creates new data; the new data needs to be
transformed by f4 as well, and since f5 creates the data itself, there is
no way for it to create invalid data. Now assume f6 merges new data from
another csv file; f6's data will also be transformed by f4. f6 could use
the same validation as f3, but why not reuse f2 and f3 to open the
other file instead? Now f7 will merge data from multiple f3 streams and
transform it with f4.

Then comes f8, which creates new data from user input. The user-inputted
data will need some checking, and now we have trouble since the data
validation is inside f3. But we can easily factor out the validation
part into f9 and call f9 from f3 and f8.
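
The last step can be sketched roughly like this. The function names follow the ones used above, but the row layout (a name plus a numeric amount) and the transformation are assumptions made up for the illustration:

```python
def f9(row):
    """Shared validation: a row is a (name, amount) pair with a non-empty
    name and a numeric amount. This row layout is an assumption."""
    name, amount = row  # wrong number of fields raises ValueError
    if not name:
        raise ValueError("empty name")
    return (name, float(amount))  # float() raises ValueError on bad text


def f4(row):
    """Transformer: trusts its callers, so there is no checking here."""
    name, amount = row
    return (name.upper(), amount * 2)


def f3(lines):
    """File path: filter out faulty rows, then transform the good ones."""
    result = []
    for line in lines:
        try:
            row = f9(tuple(line.split(",")))
        except ValueError:
            continue  # drop faulty rows
        result.append(f4(row))
    return result


def f8(name, amount):
    """User-input path: same validation, but errors reach the caller."""
    return f4(f9((name, amount)))
```

f3 silently drops bad rows while f8 lets the ValueError propagate to whoever is talking to the user, yet both rely on the same f9, so the rule itself is written only once.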


The example case still checks for problems as early as possible. It
checks for a problem when it is possible to determine whether a particular
condition is problematic. The "as early as possible" guideline does not
mean f1, a filename input function, must check whether the csv fields
contain valid data. But f2, the file opening function, must be given a
valid filename; f3, the file parser, must be given a valid file object
with the proper type; f4, the transformer, must be given a valid row to
transform; etc. f2 should not need to check that it is given a valid
filename, it's f1's job to validate it; f3 should not need to check
whether the file object is of the proper type; f4 should not need to
check that it is given a valid row tuple; and so on...



Then the problem is where to put the test code more effectively. I
would consider 'whether it is obvious to test the condition in the
given function' as the guideline. However, it might be equally obvious to
test the same thing in two functions, for example, f1 and f4.

In this case, I thought originally that I should put the test code in
f1 rather than f4, if f1, f2, f3 and f4 are all the functions that I
have in the package that I am making. But it is possible that some
time later I add the functions f5(),...,f10() that call f4(). Since
f4 doesn't have the test code, f5(),...,f10() should have the same
test code. This is clearly a redundancy in the code. If I move the
test code to f4(), there is a redundancy of the code between f1 and
f4.


I can't think of a good example where such redundancy would happen and
there is no other, better way to work around it. Could you provide a
CONCRETE example? One that might actually happen, instead of one that could
only theoretically happen.



Re: Where to put the error handing test?

2009-11-24 Thread Steven D'Aprano
On Tue, 24 Nov 2009 10:14:19 -0600, Peng Yu wrote:

 I'm still confused by the guideline that an error should be caught as
 early as possible.

I think you are confused by what that means. It means the error should be 
caught as early as possible in *time*, not in the function chain.

In Python, errors are always generated as soon as they occur. But imagine 
a system where you have a function that returns -1 on error and some 
integer on success. Then it is possible for that -1 error code to be 
passed along from one function to another to another to another before 
finally causing a fatal error, by which time you have no idea where it 
came from. That's what should be prohibited.
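
To illustrate the problem with error codes, here is a small sketch assuming Python 3 (find_c_style and find are made-up helpers, not standard functions). The -1 version lets the failure travel on and, in this case, silently produce a wrong answer; the raising version stops at the point of failure:

```python
def find_c_style(items, target):
    # Returns -1 on failure, in the error-code style described above.
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1


def find(items, target):
    # Raises at the point of failure instead.
    for i, item in enumerate(items):
        if item == target:
            return i
    raise ValueError("%r not found" % (target,))


names = ["ann", "bob", "cy"]

# The -1 silently travels on and picks the wrong element:
print(names[find_c_style(names, "dave")])  # prints "cy", no error

# The exception stops right where the problem is:
try:
    print(names[find(names, "dave")])
except ValueError as exc:
    print("error:", exc)
```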



 Suppose I have the following call chain
 
 f1() --> f2() --> f3() --> f4()
 
 The input in f1() might cause an error in f4(). 

Assuming that none of the functions have side-effects, then f4 is the 
right place to catch the error. To do otherwise means that f1 needs to 
know all the possible things that can go wrong in f2, and f3, and f4.


 However, this error can
 of course be caught by f1(), whenever I want to do so. In the worst case,
 I could duplicate the code of f2 and f3, and the test code in f4 to
 f1(), to catch the error in f1 rather than f4. But I don't think that
 this is what you mean.
 
 Then the problem is where to put the test code more effectively. I would
 consider 'whether it is obvious to test the condition in the given
 function' as the guideline. However, it might be equally obvious to test
 the same thing in two functions, for example, f1 and f4.

You're not a mathematician, are you?

For each function, define its domain: the values it must work with. 
Suppose f1 should return a result for (e.g.) all integers, but f2 only 
returns a result for *positive* integers.

Then it is very simple: f2 will raise an error if given zero or negative 
values, and so f1 must either:

* avoid that error by handling zero or negative separately; or
* catch the error and then handle zero or negative separately.

Simplified example:

def f2(x):
    if x > 0:
        return x+1
    raise ValueError


def f1(x):  # version 1
    if x > 0:
        return f2(x)
    else:
        return "something else"  # placeholder for the fallback result

# or

def f1(x):  # version 2
    try:
        return f2(x)
    except ValueError:
        return "something else"  # placeholder for the fallback result


Version two is the preferred way to do it, since that means f1 doesn't 
need to know what the domain of f2 is. But if f2 has side-effects 
(usually a bad idea) then version one is better.


But what if f1 and f2 have the same domain?

Then it is very simple: put the error checking wherever it makes sense. 
If f2 is public, put the error checking there, and then f1 can avoid 
duplicating the error checking.

But what if f1 has the more restrictive domain? Then put the error 
checking in f1.


-- 
Steven


Where to put the error handing test?

2009-11-23 Thread Peng Yu
Suppose that I have a function f() that calls g(). I can put a test on
the argument 'x' in either g() or f(). I'm wondering what the
common practice is.

My thought is that if I put the test in g(x), the code of g(x) is
safer, but the test is not necessary when g() is called by h().

If I put the test in f(), then g() becomes more efficient when other
code calls g() and guarantees x will pass the test, even though the test
code is not in g(). But there might be some caller of g() that passes an
'x' that would not pass the test, if the test were in g().

def g(x):
    # I can put the code here to test whether x satisfies certain conditions.
    pass  # blah, blah

def f(x):
    # blah, blah
    # do something on x
    # I can also put the code here to test whether x satisfies certain
    # conditions.
    g(x)


def h():
    # blah
    x = 0  # placeholder: x here is guaranteed to pass the test
    g(x)


Re: Where to put the error handing test?

2009-11-23 Thread Lie Ryan

Peng Yu wrote:

Suppose that I have a function f() that calls g(). I can put a test on
the argument 'x' in either g() or f(). I'm wondering what the
common practice is.

My thought is that if I put the test in g(x), the code of g(x) is
safer, but the test is not necessary when g() is called by h().

If I put the test in f(), then g() becomes more efficient when other
code calls g() and guarantees x will pass the test, even though the test
code is not in g(). But there might be some caller of g() that passes an
'x' that would not pass the test, if the test were in g().


Typically, you test for x as early as possible, e.g. just after user 
input (or file or url load or whatever). After that test, you can (or 
should be able to) assume that all function calls will always be called 
with the correct argument. This is the ideal situation, it's not always 
easy to do.


In any case though, don't optimize early.


Re: Where to put the error handing test?

2009-11-23 Thread Peng Yu
On Mon, Nov 23, 2009 at 9:44 PM, Lie Ryan lie.1...@gmail.com wrote:
 Peng Yu wrote:

 Suppose that I have a function f() that calls g(). I can put a test on
 the argument 'x' in either g() or f(). I'm wondering what the
 common practice is.

 My thought is that if I put the test in g(x), the code of g(x) is
 safer, but the test is not necessary when g() is called by h().

 If I put the test in f(), then g() becomes more efficient when other
 code calls g() and guarantees x will pass the test, even though the test
 code is not in g(). But there might be some caller of g() that passes an
 'x' that would not pass the test, if the test were in g().

 Typically, you test for x as early as possible, e.g. just after user input
 (or file or url load or whatever). After that test, you can (or should be
 able to) assume that all function calls will always be called with the
 correct argument. This is the ideal situation, it's not always easy to do.

 In any case though, don't optimize early.

Let's suppose that g() is refactored out from f() and is called not
only by f() but by other functions, and g() is likely to be called by new
functions.

If I don't optimize early, I should put the test in g(), rather than f(), right?


Re: Where to put the error handing test?

2009-11-23 Thread alex23
On Nov 24, 1:15 pm, Peng Yu pengyu...@gmail.com wrote:
 Suppose that I have a function f() that calls g(). I can put a test on
 the argument 'x' in either g() or f(). I'm wondering what the
 common practice is.

 If I put the test in f(), then g() becomes more efficient when other
 code calls g() and guarantees x will pass the test, even though the test
 code is not in g(). But there might be some caller of g() that passes an
 'x' that would not pass the test, if the test were in g().

What you should try to do is make each function as self-contained as
possible. f() shouldn't have to know what is a valid argument for g(),
that's the responsibility of g(). What f() needs to know is how to
deal with any problems that arise while using g().

As a very rough example:

def g(x):
    try:
        assert isinstance(x, int)
    except AssertionError:
        raise TypeError("expected int, got %s" % type(x))
    # ... function code goes here

def f(x):
    try:
        g(x)
    except TypeError:
        pass  # handle the problem here
    # ... function code goes here

 My thought is that if I put the test in g(x), the code of g(x) is
 safer, but the test is not necessary when g() is called by h().

This sounds strange to me. Are you stating that h() can pass values to
g() that would be illegal for f() to pass? That sounds like a very
dangerous design...you want each function's behaviour to be as
consistent and predictable as it possibly can.