Re: [Tutor] Limitation of int() in converting strings
On 2 January 2013 17:59, Alan Gauld wrote:
> On 01/02/2013 11:41 AM, Steven D'Aprano wrote:
>
>[SNIP]
>> But __index__ is a special method that converts to int without rounding
>> or truncating, intended only for types that emulate ints but not other
>> numeric types:
>
>
> And this was the new bit I didn't know about.
>
> [SNIP]
> help(int.__index__)
> Help on wrapper_descriptor:
>
> __index__(...)
>     x[y:z] <==> x[y.__index__():z.__index__()]
>
>
> Bingo! Although it still doesn't say anything explicitly about the need for an
> integer!

The operator.index builtin checks that an int/long is returned. The same is true of the underlying C-API that is used internally by indexable sequences (list, tuple, etc.).

$ python
Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import operator
>>> class A(object):
...     def __index__(self):
...         return 4.5
...
>>> a = A()
>>> a.__index__()
4.5
>>> operator.index(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __index__ returned non-(int,long) (type float)
>>> b = [1,2,3]
>>> b[a]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __index__ returned non-(int,long) (type float)

You only need to know about this feature if you are implementing a custom integer type or a custom sequence type (both of which are things that most Python users will never do). This particular special method is probably only really documented in the PEP:
http://www.python.org/dev/peps/pep-0357/

For my purposes, the important thing is that the method is only supposed to be implemented on types that always exactly represent integers, so it is not usable for converting e.g. floats to integers.

Oscar
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
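The session above is Python 2.7. A minimal Python 3 sketch of the same check follows; BadIndex and GoodIndex are hypothetical names. In Python 3 the error message reads "non-int" instead of "non-(int,long)", but the behaviour is the same: operator.index() and list indexing both accept a well-behaved __index__ and both reject one that returns a float.

```python
import operator

class BadIndex:
    """Hypothetical type whose __index__ breaks the contract by returning a float."""
    def __index__(self):
        return 4.5

class GoodIndex:
    """Hypothetical integer-like type whose __index__ returns a real int."""
    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value

b = [1, 2, 3]

# A well-behaved __index__ works with operator.index(), indexing and slicing.
print(operator.index(GoodIndex(2)))   # 2
print(b[GoodIndex(1)])                # 2
print(b[GoodIndex(0):GoodIndex(2)])   # [1, 2]

# A float-returning __index__ is rejected in both places with TypeError.
for attempt in (lambda: operator.index(BadIndex()), lambda: b[BadIndex()]):
    try:
        attempt()
    except TypeError as exc:
        print('TypeError:', exc)
```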
Re: [Tutor] Limitation of int() in converting strings
On Tue, Jan 1, 2013 at 12:07 AM, Steven D'Aprano wrote:
>
> Again, I was mistaken. x%1 is not suitable to get the fraction part of a
> number in Python: it returns the wrong result for negative values. You need
> math.modf:
>
> py> x = -99.25
> py> x % 1  # want -0.25
> 0.75
> py> math.modf(x)
> (-0.25, -99.0)

math.modf wraps libm's modf, which takes a double. This isn't suitable for Decimal, Fraction, or a custom number type. What's wrong with using math.trunc for this?
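The math.trunc approach can be sketched as follows; split_number is a hypothetical helper name. math.trunc() dispatches to the object's __trunc__ method, which float, Decimal and Fraction all provide, so subtracting the truncated value leaves a fraction part with the same sign as the input and of the input's own type, unlike x % 1.

```python
import math
from decimal import Decimal
from fractions import Fraction

def split_number(x):
    """Hypothetical helper: return (fraction part, integer part) of x.

    Uses math.trunc(), so it works for any type implementing __trunc__
    (float, Decimal, Fraction, ...), not only for C doubles like math.modf().
    """
    whole = math.trunc(x)   # an int, rounded towards zero
    return x - whole, whole

print(-99.25 % 1)                        # 0.75: wrong sign for a fraction part
print(math.modf(-99.25))                 # (-0.25, -99.0)
print(split_number(-99.25))              # (-0.25, -99)
print(split_number(Decimal('-99.25')))   # (Decimal('-0.25'), -99)
print(split_number(Fraction(-397, 4)))   # (Fraction(-1, 4), -99)
```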
Re: [Tutor] Limitation of int() in converting strings
On 02/01/13 16:55, Dave Angel wrote:
> On 01/02/2013 11:41 AM, Steven D'Aprano wrote:
>> The bit about __index__ refers to using trunc():

OK, that partially solves it :-)

>> I don't know what this "index() builtin" is, it doesn't appear to exist.

That was also what confused me. The only index() I could find was the one that finds the index of an item in a collection:

>>> [1,2,3].index(2)
1

And I didn't see how trunc() or division helped there...

>> But __index__ is a special method that converts to int without rounding
>> or truncating, intended only for types that emulate ints but not other
>> numeric types:

And this was the new bit I didn't know about.

> import operator
> print(operator.index(myobject))

But I did try

>>> import operator as op
>>> help(op.index)
Help on built-in function index in module operator:

index(...)
    index(a) -- Same as a.__index__()

Which was no help at all, so I tried

>>> help(a.__index__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined

Which was what I expected!

So now with your input I can try:

>>> help(int.__index__)
Help on wrapper_descriptor:

__index__(...)
    x[y:z] <==> x[y.__index__():z.__index__()]

Bingo! Although it still doesn't say anything explicitly about the need for an integer!

But that was harder than it should have been! :-(

Thanks guys,

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
Re: [Tutor] Limitation of int() in converting strings
On 01/02/2013 11:41 AM, Steven D'Aprano wrote:
>
>
> The bit about __index__ refers to using trunc():
>
> "I still really wish I had followed Pascal's lead instead of C's here:
> Pascal requires you to use trunc() to convert a real to an integer. ...
> If we had done it that way, we wouldn't have had to introduce the
> index() builtin and the corresponding infrastructure (__index__
> and a whole slew of C APIs)."
>
>
> I don't know what this "index() builtin" is, it doesn't appear to exist.
> But __index__ is a special method that converts to int without rounding
> or truncating, intended only for types that emulate ints but not other
> numeric types:

I suspect that at one time, an index() built-in was intended. It's now available in the operator module, and simply calls __index__() as you say.

import operator
print(operator.index(myobject))

works, at least in 2.7 and 3.x

--
DaveA
Re: [Tutor] Limitation of int() in converting strings
On 03/01/13 03:18, Alan Gauld wrote:
> This is going somewhat off-topic but my curiosity is roused...
>
> On 02/01/13 15:16, Oscar Benjamin wrote:
>> When the idea was discussed in the run up to Python 3, Guido raised exactly this case and said
>> """...
>> (BTW Pascal also had the division operator right, unlike C, and we're
>> ... If we had done it that way, we wouldn't have had to introduce the
>> index() builtin and the corresponding infrastructure (__index__
>> and a whole slew of C APIs).
>
> I don't get the reference to index here.
> Why would adopting Pascal style division remove the need for index?

No, the comment about division was a parenthetical aside:

"(BTW Pascal also had the division operator right, unlike C, and we're finally fixing this in Py3k by following Pascal's nearly-40-year-old lead.)"

The bit about __index__ refers to using trunc():

"I still really wish I had followed Pascal's lead instead of C's here: Pascal requires you to use trunc() to convert a real to an integer. ... If we had done it that way, we wouldn't have had to introduce the index() builtin and the corresponding infrastructure (__index__ and a whole slew of C APIs)."

I don't know what this "index() builtin" is, it doesn't appear to exist. But __index__ is a special method that converts to int without rounding or truncating, intended only for types that emulate ints but not other numeric types:

py> (123).__index__()
123
py> (123.0).__index__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute '__index__'

The purpose is to allow you to use custom integer-types in sequence indexing, e.g. if n = MyInteger(42), you can use mylist[n] and Python will call n.__index__() to convert to a proper int. In the past, Python would only allow actual ints or longs for indexing.
Python cannot use the int() builtin or the regular __int__ method to do the conversion because they will happily convert floats and strings, and you don't want to allow mylist['42'] or mylist[42.0] to succeed. So there needs to be a second special method for converting integer-types to real ints.

As is often the case, this need was driven by the numpy community, if I remember correctly, which has int8, int16, int32 and int64 types that don't inherit from int.

--
Steven
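Steven's MyInteger example can be sketched in Python 3 as below. MyInteger is the hypothetical name from his message; the class body is an illustrative assumption, not how numpy implements its integer types. Defining __index__ is what lets mylist[n], range(n) and hex(n) accept the object, while strings and floats are still rejected as indices.

```python
class MyInteger:
    """Hypothetical integer-like type that does not inherit from int."""

    def __init__(self, value):
        if not isinstance(value, int):
            raise TypeError('MyInteger needs an int')
        self._value = value

    def __index__(self):
        # Lossless by construction: only a real int is ever stored.
        return self._value

mylist = list(range(100))
n = MyInteger(42)

print(mylist[n])        # 42, via n.__index__()
print(len(range(n)))    # 42, range() also calls __index__
print(hex(n))           # 0x2a, as do hex(), oct() and bin()

for bad in ('42', 42.0):
    try:
        mylist[bad]
    except TypeError as exc:
        print('rejected %r: %s' % (bad, exc))
```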
Re: [Tutor] Limitation of int() in converting strings
This is going somewhat off-topic but my curiosity is roused...

On 02/01/13 15:16, Oscar Benjamin wrote:
> When the idea was discussed in the run up to Python 3, Guido raised exactly this case and said
> """...
> (BTW Pascal also had the division operator right, unlike C, and we're
> ... If we had done it that way, we wouldn't have had to introduce the
> index() builtin and the corresponding infrastructure (__index__
> and a whole slew of C APIs).

I don't get the reference to index here. Why would adopting Pascal style division remove the need for index?

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
Re: [Tutor] Limitation of int() in converting strings
On 1 January 2013 05:07, Steven D'Aprano wrote:
> On 23/12/12 04:38, Oscar Benjamin wrote:
>>
>> On 22 December 2012 01:34, Steven D'Aprano wrote:
>>>
>>> On 18/12/12 01:36, Oscar Benjamin wrote:
>>>> I think it's unfortunate that Python's int() function combines two
>>>> distinct behaviours in this way. In different situations int() is used
>>>> to:
>>>> 1) Coerce an object of some type other than int into an int without
>>>> changing the value of the integer that the object represents.
>>>
> [SNIP]
>
> Yes. And it is a demonstrable fact that int is *not* intended to coerce
> objects to int "without changing the value of the number", because
> changing the value of the number is precisely what int() does, in some
> circumstances.

I wonder how often the int() function is used in a situation where the dual behaviour is actually desired. I imagine that the common uses for int are (in descending order of prevalence):
1) Parse a string as an integer
2) Convert an integer-valued numeric object of some other type to the int type.
3) Truncate a non-integer valued object from a numeric type.

The int() function as it stands performs all three operations. I would describe 1) and 2) as changing type but not value and 3) as a value changing operation. I can see use cases where 1) and 2) go together. I can't really envisage a case in which 1) and 3) are wanted at the same time in some particular piece of code. In other words I can't imagine a use case that calls for a function that works precisely as the int() function currently does.

>
> If you would like to argue that it would have been better if int did
> not do this, then I might even agree with you.

That's exactly what I would argue.

> There is certainly
> precedent: if I remember correctly, you cannot convert floating point
> values to integers directly in Pascal, you first have to truncate them
> to an integer-valued float, then convert.
>
> # excuse my sloppy Pascal syntax, it has been a few years
> var
>   i: integer;
>   x: real;
> begin
>   i := integer(trunc(x));
> end;

When the idea was discussed in the run up to Python 3, Guido raised exactly this case and said

"""
I still really wish I had followed Pascal's lead instead of C's here: Pascal requires you to use trunc() to convert a real to an integer. (BTW Pascal also had the division operator right, unlike C, and we're finally fixing this in Py3k by following Pascal's nearly-40-year-old lead.) If we had done it that way, we wouldn't have had to introduce the index() builtin and the corresponding infrastructure (__index__ and a whole slew of C APIs).
"""
http://mail.python.org/pipermail/python-dev/2008-January/076546.html

> So I'm not entirely against the idea that Python should have had separate
> int() and trunc() functions, with int raising an exception on (non-whole
> number?) floats.

It's possible that the reason the idea was rejected before is because it was suggested that int(1.0) would raise an error, analogous to the way that float(1+0j) raises an error even though in both cases the conversion can exactly preserve value.

> But back to Python as it actually is, rather than how it might have been.
> There's no rule that int() must be numerically lossless. It is lossless
> with strings, and refuses to convert strings-that-look-like-floats to ints.
> And that makes sense: in an int, the "." character is just as illegal as
> the characters "k" or "&" or "Ω", int will raise on "123k456", so why
> wouldn't it raise on "123.456"?

I agree. To be fair, string handling is not my real complaint. I referred to that initially since the OP asked about that case.

>
> But that (good, conservative) design decision isn't required or enforced.
> Hence my reply that you cannot safely make the assumption that int() on a
> non-numeric type will be numerically exact.

This is precisely my complaint.
The currently available functions are int, operator.index, math.trunc, math.ceil, math.floor and round. Conspicuously absent is the function that simply converts an object to an integer type while refusing to change its numeric value. That is, I would like the special rounding mode that is "no rounding", just an error for non-integral values. It's probably to do with the kind of code I write but I seem to often find myself implementing a (lazy and bug-prone) function for this task.

>> [SNIP]
>>
>> This is precisely my point. I would prefer if int(obj) would fail
>> on non-integers leaving me with the option of calling an appropriate
>> rounding function. After catching RoundError (or whatever) you would
>> know that you have a number type object that can be passed to round,
>> ceil, floor etc.
>
> Well, I guess that comes down to the fact that Python is mostly aimed at
> mathematically and numerically naive users who would be scared off at a
> plethora of rounding modes :-)

That's true but I think there are some conceptual points that any user of any programming language should be forced to conside
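The "no rounding" conversion described above can be sketched as below. exact_int is a hypothetical name, not a standard library function; parsing strings with Decimal instead of float follows eryksun's suggestion in this thread, and the rest is one possible design under those assumptions.

```python
import math
from decimal import Decimal, InvalidOperation

def exact_int(x):
    """Hypothetical strict conversion: return an int only if no rounding is needed.

    Strings are parsed with Decimal rather than float, so a value such as
    '100000000000000001' is not silently rounded on the way in.
    """
    if isinstance(x, str):
        try:
            x = Decimal(x)
        except InvalidOperation:
            raise ValueError('not a number: %r' % (x,)) from None
    # math.trunc() works for int, float, Decimal, Fraction and anything else
    # with __trunc__; NaN raises ValueError and infinity raises OverflowError.
    n = math.trunc(x)
    if n != x:
        raise ValueError('not an integer: %r' % (x,))
    return n

print(exact_int('4.0'))                  # 4
print(exact_int('100000000000000001'))   # 100000000000000001
print(exact_int(2.0))                    # 2
for bad in ('4.1', 4.5, '10000000000000000.1'):
    try:
        exact_int(bad)
    except ValueError as exc:
        print(exc)
```

One consequence of parsing with Decimal is that strings int() rejects, such as '4.0' or '1e3', are accepted when they denote whole numbers, which matches what the docstring of the original make_int in this thread asks for.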
Re: [Tutor] Limitation of int() in converting strings
On 23/12/12 04:38, Oscar Benjamin wrote:
> On 22 December 2012 01:34, Steven D'Aprano wrote:
>> On 18/12/12 01:36, Oscar Benjamin wrote:
>>> I think it's unfortunate that Python's int() function combines two distinct behaviours in this way. In different situations int() is used to:
>>> 1) Coerce an object of some type other than int into an int without changing the value of the integer that the object represents.
>>
>> The second half of the sentence (starting from "without changing") is not justified. You can't safely make that assumption. All you know is that calling int() on an object is intended to convert the object to an int, in whatever way is suitable for that object. In some cases, that will be numerically exact (e.g. int("1234") will give 1234), in other cases it will not be.
>
> If I were to rewrite that sentence I would replace the word 'integer' with 'number' but otherwise I'm happy with it. Your reference to "numerically exact" shows that you understood exactly what I meant.

Yes. And it is a demonstrable fact that int is *not* intended to coerce objects to int "without changing the value of the number", because changing the value of the number is precisely what int() does, in some circumstances.

If you would like to argue that it would have been better if int did not do this, then I might even agree with you. There is certainly precedent: if I remember correctly, you cannot convert floating point values to integers directly in Pascal, you first have to truncate them to an integer-valued float, then convert.

# excuse my sloppy Pascal syntax, it has been a few years
var
  i: integer;
  x: real;
begin
  i := integer(trunc(x));
end;

So I'm not entirely against the idea that Python should have had separate int() and trunc() functions, with int raising an exception on (non-whole number?) floats.

But back to Python as it actually is, rather than how it might have been. There's no rule that int() must be numerically lossless.
It is lossless with strings, and refuses to convert strings-that-look-like-floats to ints. And that makes sense: in an int, the "." character is just as illegal as the characters "k" or "&" or "Ω", int will raise on "123k456", so why wouldn't it raise on "123.456"?

But that (good, conservative) design decision isn't required or enforced. Hence my reply that you cannot safely make the assumption that int() on a non-numeric type will be numerically exact.

>>> 2) Round an object with a non-integer value to an integer value.
>>
>> int() does not perform rounding (except in the most generic sense that any conversion from real-valued number to integer is "rounding"). That is what the round() function does. int() performs truncating: it returns the integer part of a numeric value, ignoring any fraction part:
>
> I was surprised by your objection to my use of the word "rounding" here. So I looked it up on Wikipedia:
> http://en.wikipedia.org/wiki/Rounding#Rounding_to_integer
>
> That section describes "round toward zero (or truncate..." which is essentially how I would have put it, and also how you put it below:

Well, yes. I explicitly referred to the generic sense where any conversion from real-valued to whole number is "rounding". But I think that it is a problematic, ambiguous term that needs qualification:

* sometimes truncation is explicitly included as a kind of rounding;
* sometimes truncation is used in opposition to rounding.

For example, I think that in everyday English, most people would be surprised to hear you describe "rounding 9.999 to 9". In the absence of an explicit rounding direction ("round down", "round up"), some form of "round to nearest" is assumed in everyday English, and as such is used in contrast to merely cutting off whatever fraction part is there (truncation). Hence the need for qualification.

>> So you shouldn't think of int(number) as "convert number to an int", since that is ambiguous.
>> There are at least six common ways to convert arbitrary numbers to ints:
>
> This is precisely my point. I would prefer if int(obj) would fail on non-integers leaving me with the option of calling an appropriate rounding function. After catching RoundError (or whatever) you would know that you have a number type object that can be passed to round, ceil, floor etc.

Well, I guess that comes down to the fact that Python is mostly aimed at mathematically and numerically naive users who would be scared off at a plethora of rounding modes :-)

>> Python provides truncation via the int and math.trunc functions, floor and ceiling via math.floor and math.ceil, and round to nearest via round. In Python 2, ties are rounded up, which is biased; in Python 3, the unbiased banker's rounding is used.
>
> I wasn't aware of this change. Thanks for that.

Actually, I appear to have been wrong: in Python 2, ties are rounded away from zero rather than up. Positive arguments round up, negative arguments round down:

py> round(1.5), round(2.5)
(2.0, 3.0)
py> round(-1.5), round(-2.5)
(-2.0, -3.0)
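The rounding functions listed above can be compared side by side with a short Python 3 sketch: int() and math.trunc() round towards zero, math.floor() and math.ceil() round down and up, and Python 3's round() sends exact ties to the nearest even integer. round_half_away is a hypothetical helper; it reproduces the Python 2 tie-breaking via the decimal module's ROUND_HALF_UP mode, which rounds ties away from zero.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

def round_half_away(x):
    """Hypothetical helper: Python 2 style rounding, ties away from zero."""
    # Decimal(x) converts a float exactly, so no extra error is introduced;
    # quantize() needs the result to fit in the context precision (28 digits).
    return int(Decimal(x).quantize(Decimal(1), rounding=ROUND_HALF_UP))

print('x     int trunc floor ceil round half_away')
for x in (2.7, -2.7, 2.5, -2.5, 1.5):
    print('%-5s %3d %5d %5d %4d %5d %9d' % (
        x, int(x), math.trunc(x), math.floor(x), math.ceil(x),
        round(x), round_half_away(x)))
```

For 2.5 and -2.5 the round column shows 2 and -2 while half_away shows 3 and -3, which is the Python 2 to Python 3 change discussed above.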
Re: [Tutor] Limitation of int() in converting strings
On Thu, Dec 27, 2012 at 12:13 PM, Oscar Benjamin wrote:
>
> I hadn't realised that. Does the int(obj) function use isinstance(obj,
> str) under the hood?

Yes. int_new and long_new use the macros PyString_Check (in 3.x PyBytes_Check) and PyUnicode_Check, which check the type's tp_flags. The C API can check for a subclass via tp_flags for the following types:

#define Py_TPFLAGS_INT_SUBCLASS         (1L<<23)
#define Py_TPFLAGS_LONG_SUBCLASS        (1L<<24)
#define Py_TPFLAGS_LIST_SUBCLASS        (1L<<25)
#define Py_TPFLAGS_TUPLE_SUBCLASS       (1L<<26)
#define Py_TPFLAGS_STRING_SUBCLASS      (1L<<27)
#define Py_TPFLAGS_UNICODE_SUBCLASS     (1L<<28)
#define Py_TPFLAGS_DICT_SUBCLASS        (1L<<29)
#define Py_TPFLAGS_BASE_EXC_SUBCLASS    (1L<<30)
#define Py_TPFLAGS_TYPE_SUBCLASS        (1L<<31)

In 3.x bit 27 is renamed Py_TPFLAGS_BYTES_SUBCLASS.

nb_int (__int__) in a type's PyNumberMethods is a unaryfunc, so __int__ as designed can't have the optional "base" argument that's used for strings. That has to be special cased.

Without a specified base, int_new (in 3.x long_new) redirects to the abstract function PyNumber_Int (in 3.x PyNumber_Long). This tries __int__ and __trunc__ (the latter returns an Integral, which is converted to int) before checking for a string or char buffer. Using the buffer interface is the reason the following works for a bytearray in 2.x:

>>> int(bytearray('123'))
123

but specifying a base fails:

>>> int(bytearray('123'), 10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: int() can't convert non-string with explicit base

long_new in 3.x adds a PyByteArray_Check:

>>> int(bytearray(b'123'), 10)
123

Regarding this whole debate, I think a separate constructor for strings would have been cleaner, but I'm not Dutch.
Source links:

3.3, long_new (see line 4277):
http://hg.python.org/cpython/file/bd8afb90ebf2/Objects/longobject.c#l4248

3.3, PyNumber_Long:
http://hg.python.org/cpython/file/bd8afb90ebf2/Objects/abstract.c#l1262

2.7.3, int_new:
http://hg.python.org/cpython/file/70274d53c1dd/Objects/intobject.c#l1049

2.7.3, PyNumber_Int:
http://hg.python.org/cpython/file/70274d53c1dd/Objects/abstract.c#l1610
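For readers on Python 3, the string handling described above can be checked from Python code with the sketch below. It relies only on documented int() behaviour: str, bytes and bytearray are parsed as text (optionally with a base), a str subclass is still treated as a string, and an explicit base is rejected for non-string arguments. Digits is a hypothetical str subclass.

```python
class Digits(str):
    """Hypothetical str subclass; int() still parses it as a string."""

print(int(b'123'))                  # 123: bytes are parsed like text
print(int(bytearray(b'123'), 10))   # 123: an explicit base is allowed in 3.x
print(int('ff', 16))                # 255
print(int(Digits('42')))            # 42

try:
    int(3.9, 10)                    # a base only makes sense for text
except TypeError as exc:
    print('TypeError:', exc)
```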
Re: [Tutor] Limitation of int() in converting strings
On 27 December 2012 17:49, Steven D'Aprano wrote:
> On 23/12/12 04:57, Oscar Benjamin wrote:
>>
>> On 22 December 2012 02:06, Steven D'Aprano wrote:
>>>
>>> On 18/12/12 01:36, Oscar Benjamin wrote:
>>>
>>>> I have often found myself writing awkward functions to prevent a
>>>> rounding error from occurring when coercing an object with int().
>>>> Here's one:
>>>>
>>>> def make_int(obj):
>>>>     '''Coerce str, float and int to int without rounding error
>>>>     Accepts strings like '4.0' but not '4.1'
>>>>     '''
>>>>     fnum = float('%s' % obj)
>>>>     inum = int(fnum)
>>>>     assert inum == fnum
>>>>     return inum
>>>
>>>
>>> Well, that function is dangerously wrong. In no particular order,
>>> I can find four bugs and one design flaw.
>>
>>
>> I expected someone to object to this function. I had hoped that they
>> might also offer an improved version, though. I can't see a good way
>> to do this without special casing the treatment of some or other type
>> (the obvious one being str).
>
>
> Why is that a problem?
>
>
> I think this should do the trick. However, I am lazy and have not
> tested it, so this is your opportunity to catch me writing buggy
> code :-)
>
>
> def make_int(obj):
>     try:
>         # Fail if obj is not numeric.
>         obj + 0
>     except TypeError:
>         # For simplicity, I require that objects that convert to
>         # ints always do so losslessly.
>         try:
>             return int(obj)
>         except ValueError:
>             obj = float(obj)
>     # If we get here, obj is numeric. But is it an int?
>     n = int(obj)  # This may fail if obj is a NAN or INF.
>     if n == obj:
>         return n
>     raise ValueError('not an integer')

This one has another large number related problem (also solved by using Decimal instead of float):

>>> make_int('10.1')
10

Otherwise the function is good and it demonstrates my original point quite nicely: the function we've ended up with is pretty horrific for such a simple operation. It's also not something that a novice programmer could be expected to write or perhaps even to fully understand.
In my ideal world the int() function would always raise an error for non-integers. People would have to get used to calling trunc() in place of int() but only in the (relatively few) places where they actually wanted that behaviour. The resulting code would be more explicit about when numeric values were being altered and what kind of rounding is being used, both of which are good things.

At one point a similar (perhaps better) idea was discussed on python-dev:
http://mail.python.org/pipermail/python-dev/2008-January/076481.html
but it was rejected citing backwards compatibility concerns:
http://mail.python.org/pipermail/python-dev/2008-January/076552.html

>
>> Whether or not assert is appropriate depends on the context (although
>> I imagine that some people would disapprove of it always). I would say
>> that if you are using assert then it should really be in a situation
>> where you're not really looking to handle errors but just to abort the
>> program and debug when something goes wrong. In that context I think
>> that, far from being confusing, assert statements make it plainly
>> clear what the programmer who wrote them was meaning to do.
>
> And what is that? "I only sometimes want to handle errors, sometimes I
> want errors to silently occur without warning"?
>
> Asserts can be disabled by the person running your code. That alone means
> that assert is *never* suitable for error checking, because you cannot be
> sure if your error checking is taking place or not. It is as simple as
> that.

Maybe no-one else will ever run your code. This is the case for much of the code that I write.

> So what is assert useful for?
>
> - Asserts are really handy for testing in the interactive interpreter;
> assert is a lazy person's test, but when you're being quick and dirty,
> that's a feature, not a bug.

This would be my number one reason for using an assert (probably also the reason in that particular case).
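The point that asserts can be disabled by whoever runs the code is easy to check. The sketch below is an illustration that assumes Python 3.7 or later (for the capture_output and text arguments of subprocess.run); it runs the same one-line program with and without the -O flag. Under -O, assert statements are compiled away, so the "error check" silently disappears.

```python
import subprocess
import sys

# A tiny program whose only "error check" is an assert.
program = 'assert 1 + 1 == 3, "check failed"; print("assert was skipped")'

for flags in ([], ['-O']):
    result = subprocess.run([sys.executable, *flags, '-c', program],
                            capture_output=True, text=True)
    label = ' '.join(flags) or 'no flags'
    output = result.stdout.strip() or result.stderr.strip().splitlines()[-1]
    print('%-8s exit code %d: %s' % (label, result.returncode, output))
```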
>
> - Asserts are also useful for test suites, although less so because you
> cannot run your test suite with optimizations on.

I've seen this done a few times, for example here:
https://github.com/sympy/sympy/blob/master/sympy/assumptions/tests/test_matrices.py
but I hadn't considered that particular problem. I think the reason for using them in sympy is that py.test has a special handler for pulling apart assert statements to show you the values in the expression that failed.

> - Asserts are good for checking the internal logic and/or state of your
> program. This is not error checking in the usual sense, since you are
> not checking that data is okay, but defensively checking that your
> code is okay.

This is always the case when I use an assert. I don't want to catch the error and I also think the condition is, for some reason, always true unless my own co
Re: [Tutor] Limitation of int() in converting strings
On 23/12/12 04:57, Oscar Benjamin wrote:
> On 22 December 2012 02:06, Steven D'Aprano wrote:
>> On 18/12/12 01:36, Oscar Benjamin wrote:
>>> I have often found myself writing awkward functions to prevent a rounding error from occurring when coercing an object with int(). Here's one:
>>>
>>> def make_int(obj):
>>>     '''Coerce str, float and int to int without rounding error
>>>     Accepts strings like '4.0' but not '4.1'
>>>     '''
>>>     fnum = float('%s' % obj)
>>>     inum = int(fnum)
>>>     assert inum == fnum
>>>     return inum
>>
>> Well, that function is dangerously wrong. In no particular order, I can find four bugs and one design flaw.
>
> I expected someone to object to this function. I had hoped that they might also offer an improved version, though. I can't see a good way to do this without special casing the treatment of some or other type (the obvious one being str).

Why is that a problem?

I think this should do the trick. However, I am lazy and have not tested it, so this is your opportunity to catch me writing buggy code :-)

def make_int(obj):
    try:
        # Fail if obj is not numeric.
        obj + 0
    except TypeError:
        # For simplicity, I require that objects that convert to
        # ints always do so losslessly.
        try:
            return int(obj)
        except ValueError:
            obj = float(obj)
    # If we get here, obj is numeric. But is it an int?
    n = int(obj)  # This may fail if obj is a NAN or INF.
    if n == obj:
        return n
    raise ValueError('not an integer')

> Although you have listed 5 errors I would have written the same list as 2 errors:
> 1) You don't like my use of assert.

That's more than a mere personal preference. See below.

> 2) The function doesn't work for large numbers (bigger than around 10).

It's not just that it "doesn't work", but it experiences distinct failure modes. If you were writing regression tests for these bugs, you would need *at least* two such tests:

- large strings convert exactly;
- for int n, make_int(n) always returns n

If I were writing unit tests, I would ensure that I had a unit test for each of the failures I showed.
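The two regression tests listed above can be sketched as a small checker. make_int_float reproduces Oscar's original float-based function from this thread; regression_failures and the test value 10**17 + 1 are hypothetical choices (any integer above 2**53 that a float cannot hold exactly would do). The builtin int is included only as a baseline that never goes through float.

```python
def make_int_float(obj):
    '''Oscar's original float-based version, reproduced for comparison.'''
    fnum = float('%s' % obj)
    inum = int(fnum)
    assert inum == fnum   # skipped entirely when Python runs with -O
    return inum

def regression_failures(func):
    '''Run the two regression tests against func; return failure messages.'''
    # Hypothetical test value: above 2**53, so a float cannot hold it exactly.
    big = 10**17 + 1
    failures = []
    if func(str(big)) != big:
        failures.append('large string did not convert exactly')
    if func(big) != big:
        failures.append('int n did not round-trip')
    return failures

print(regression_failures(make_int_float))  # both tests fail
print(regression_failures(int))             # [], int() never goes through float
```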
> I would also add:
> 3) It's ridiculous to convert types several times just to convert to an integer without rounding.

Perhaps. Even if that is the case, that's not a bug, merely a slightly less efficient implementation.

> Whether or not assert is appropriate depends on the context (although I imagine that some people would disapprove of it always). I would say that if you are using assert then it should really be in a situation where you're not really looking to handle errors but just to abort the program and debug when something goes wrong. In that context I think that, far from being confusing, assert statements make it plainly clear what the programmer who wrote them was meaning to do.

And what is that? "I only sometimes want to handle errors, sometimes I want errors to silently occur without warning"?

Asserts can be disabled by the person running your code. That alone means that assert is *never* suitable for error checking, because you cannot be sure if your error checking is taking place or not. It is as simple as that.

So what is assert useful for?

- Asserts are really handy for testing in the interactive interpreter; assert is a lazy person's test, but when you're being quick and dirty, that's a feature, not a bug.

- Asserts are also useful for test suites, although less so because you cannot run your test suite with optimizations on.

- Asserts are good for checking the internal logic and/or state of your program. This is not error checking in the usual sense, since you are not checking that data is okay, but defensively checking that your code is okay.

What do I mean by that last one? If you've ever written defensive code with a comment saying "This cannot ever happen", this is a good candidate for an assertion. Good defensive technique is to be very cautious about the assumptions you make: just because you think something cannot happen, doesn't mean you are correct.
So you test your own logic by checking that the thing you think must be true is true, and raise an error if it turns out you are wrong.

But it seems pretty wasteful and pointless to be checking something that you know is always correct. Especially if those checks are expensive, you might want to turn them off. Hence, you use assert, which can be turned off. This is a trade-off, of course: you're trading a bit of extra speed for a bit more risk of a silent failure. If you aren't confident enough to make that trade-off, you are better off using an explicit, non-assert check.

It's a subtle difference, and a matter of personal judgement where the line between "internal logic" and "error checking" lies. But here's an example of what I consider a check of internal logic, specifically that numbers must be zero, positive or negative:

# you can assume that x is a numeric type like int, float
Re: [Tutor] Limitation of int() in converting strings
On 24 December 2012 04:42, eryksun wrote:
> On Sat, Dec 22, 2012 at 12:57 PM, Oscar Benjamin wrote:
>>>> def make_int(obj):
>>>>     '''Coerce str, float and int to int without rounding error
>>>>     Accepts strings like '4.0' but not '4.1'
>>>>     '''
>>>>     fnum = float('%s' % obj)
>>>>     inum = int(fnum)
>>>>     assert inum == fnum
>>>>     return inum
>>>
>>> Well, that function is dangerously wrong. In no particular order,
>>> I can find four bugs and one design flaw.
>>
>> I expected someone to object to this function. I had hoped that they
>> might also offer an improved version, though. I can't see a good way
>> to do this without special casing the treatment of some or other type
>> (the obvious one being str).
>
> Strings don't implement __int__ or __trunc__; they aren't numbers, so
> why not special case them?

I hadn't realised that. Does the int(obj) function use isinstance(obj, str) under the hood?

> You can parse strings with obj =
> Decimal(obj) (this uses a regex). Then for all inputs set inum =
> int(obj) and raise ValueError if inum != obj.

It had occurred to me that this would be the obvious fix for the issue with large numbers. The result is:

from decimal import Decimal

def int_decimal(x):
    if isinstance(x, str):
        x = Decimal(x)
    ix = int(x)
    if ix != x:
        raise ValueError('Not an integer: %s' % x)
    return ix

Probably what I more often want, though, is a function that simply refuses to handle real-valued types as inputs. That way if a float sneaks in I can choose the appropriate rounding function (or bug fix) at the source of the number. I'm not sure what's the best way to detect real-valued types. At least for the stdlib using the numbers module works:

from numbers import Integral

def int_(x):
    if not isinstance(x, (Integral, str)):
        raise TypeError('Need Integral: use round() or trunc()')
    return int(x)

Oscar
Re: [Tutor] Limitation of int() in converting strings
On Sat, Dec 22, 2012 at 12:57 PM, Oscar Benjamin wrote:
>>>
>>> def make_int(obj):
>>>     '''Coerce str, float and int to int without rounding error
>>>     Accepts strings like '4.0' but not '4.1'
>>>     '''
>>>     fnum = float('%s' % obj)
>>>     inum = int(fnum)
>>>     assert inum == fnum
>>>     return inum
>>
>> Well, that function is dangerously wrong. In no particular order,
>> I can find four bugs and one design flaw.
>
> I expected someone to object to this function. I had hoped that they
> might also offer an improved version, though. I can't see a good way
> to do this without special casing the treatment of some or other type
> (the obvious one being str).

Strings don't implement __int__ or __trunc__; they aren't numbers, so
why not special case them? You can parse strings with obj =
Decimal(obj) (this uses a regex). Then for all inputs set inum =
int(obj) and raise ValueError if inum != obj.
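The claim that strings lack the numeric conversion hooks is easy to check. Here is a small Python 3 sketch. The helper name conversion_hooks is made up for illustration, and it only inspects which methods each type defines.

```python
from decimal import Decimal


def conversion_hooks(typ):
    """Report whether a type defines the __int__ and __trunc__ hooks."""
    return hasattr(typ, '__int__'), hasattr(typ, '__trunc__')


for typ in (str, int, float, Decimal):
    has_int, has_trunc = conversion_hooks(typ)
    print('%-8s __int__: %-5s __trunc__: %s' % (typ.__name__, has_int, has_trunc))
```

Only str reports False for both. So int() needs a separate string-parsing path anyway, and that path is the natural place to special-case text.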
Re: [Tutor] Limitation of int() in converting strings
On 22 December 2012 01:34, Steven D'Aprano wrote:
> On 18/12/12 01:36, Oscar Benjamin wrote:
>
>> I think it's unfortunate that Python's int() function combines two
>> distinct behaviours in this way. In different situations int() is used
>> to:
>> 1) Coerce an object of some type other than int into an int without
>> changing the value of the integer that the object represents.
>
> The second half of the sentence (starting from "without changing") is not
> justified. You can't safely make that assumption. All you know is that
> calling int() on an object is intended to convert the object to an int,
> in whatever way is suitable for that object. In some cases, that will
> be numerically exact (e.g. int("1234") will give 1234), in other cases it
> will not be.

If I were to rewrite that sentence I would replace the word 'integer' with
'number' but otherwise I'm happy with it. Your reference to "numerically
exact" shows that you understood exactly what I meant.

>> 2) Round an object with a non-integer value to an integer value.
>
>
> int() does not perform rounding (except in the most generic sense that any
> conversion from real-valued number to integer is "rounding"). That is what
> the round() function does. int() performs truncating: it returns the
> integer part of a numeric value, ignoring any fraction part:

I was surprised by your objection to my use of the word "rounding" here.
So I looked it up on Wikipedia:
http://en.wikipedia.org/wiki/Rounding#Rounding_to_integer

That section describes "round toward zero (or truncate..." which is
essentially how I would have put it, and also how you put it below:

>
> * truncate, or round towards zero (drop any fraction part);

So I'm not really sure what your objection is to that, though you are free
to prefer the word truncate to round in this case (and I am free to
disagree).

> So you shouldn't think of int(number) as "convert number to an int", since
> that is ambiguous. There are at least six common ways to convert arbitrary
> numbers to ints:

This is precisely my point. I would prefer it if int(obj) would fail on
non-integers, leaving me with the option of calling an appropriate rounding
function. After catching RoundError (or whatever) you would know that you
have a number type object that can be passed to round, ceil, floor etc.

> Python provides truncation via the int and math.trunc functions, floor and
> ceiling via math.floor and math.ceil, and round to nearest via round.
> In Python 2, ties are rounded up, which is biased; in Python 3, the
> unbiased banker's rounding is used.

I wasn't aware of this change. Thanks for that.

> Instead, you should consider int(number) to be one of a pair of functions,
> "return integer part", "return fraction part", where unfortunately the
> second function isn't provided directly. In general though, you can get
> the fractional part of a number with "x % 1". For floats, math.modf also
> works.

Assuming that you know you have an object that supports algebraic
operations in a sensible way then this works, although the complementary
function for "x % 1" would be "x // 1" or "math.floor(x)" rather than
"int(x)". To get the complementary function for "int(x)" you could do
"math.copysign(abs(x) % 1, x)" (maybe there's a simpler way):

$ python
Python 2.7.3 (default, Sep 26 2012, 21:51:14)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def reconstruct(x):
...     return int(x) + x % 1
...
>>> reconstruct(1)
1
>>> reconstruct(1.5)
1.5
>>> reconstruct(-2)
-2
>>> reconstruct(-2.5)
-1.5

> So, in a sense int() does double duty as both a constructor of ints
> from non-numbers such as strings, and as a "get integer part" function for
> numbers. I'm okay with that.

And I am not.

Oscar
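The pairing argument can be checked directly. In the Python 3 sketch below, frac_toward_zero and frac_floor are names made up here. Each fractional-part function rebuilds x only when it is combined with its matching integer-part function. Mixing int(x) with x % 1 gives the -1.5 seen above.

```python
import math


def frac_toward_zero(x):
    """Fractional part that pairs with int(x), which truncates toward zero."""
    return math.copysign(abs(x) % 1, x)


def frac_floor(x):
    """Fractional part that pairs with x // 1 (floor); never negative."""
    return x % 1


for x in (1.5, -2.5):
    # Matched pairs rebuild x for these values.
    print(x, int(x) + frac_toward_zero(x), x // 1 + frac_floor(x))
    # The mismatched pair is wrong for negative non-integers.
    print('  int(x) + x % 1 =', int(x) + x % 1)
```

For floats, frac_toward_zero(x) agrees with the first element of math.modf(x). That is the other option mentioned in the quoted text.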
Re: [Tutor] Limitation of int() in converting strings
Oh, another comment...

On 18/12/12 01:36, Oscar Benjamin wrote:

> I have often found myself writing awkward functions to prevent a
> rounding error from occurring when coercing an object with int().
> Here's one:
>
> def make_int(obj):
>     '''Coerce str, float and int to int without rounding error
>     Accepts strings like '4.0' but not '4.1'
>     '''
>     fnum = float('%s' % obj)
>     inum = int(fnum)
>     assert inum == fnum
>     return inum

Well, that function is dangerously wrong. In no particular order,
I can find four bugs and one design flaw.

1) It completely fails to work as advertised when Python runs with
optimizations on:

[steve@ando python]$ cat make_int.py
def make_int(obj):
    '''Coerce str, float and int to int without rounding error
    Accepts strings like '4.0' but not '4.1'
    '''
    fnum = float('%s' % obj)
    inum = int(fnum)
    assert inum == fnum
    return inum

print make_int('4.0')
print make_int('4.1')  # this should raise an exception

[steve@ando python]$ python -O make_int.py
4
4

2) Even when it does work, it is misleading and harmful to raise
AssertionError. The problem is with the argument's *value*, hence
*ValueError* is the appropriate exception, not ImportError or TypeError or
KeyError ... or AssertionError. Don't use assert as a lazy way to get
error checking for free.

3) Worse, it falls over when given a sufficiently large int value:

py> make_int(10**500)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in make_int
OverflowError: cannot convert float infinity to integer

but at least you get an exception to warn you that something has gone
wrong.

4) Disturbingly, the function silently does the wrong thing even for exact
integer arguments:

py> n = 10**220  # an exact integer value
py> make_int(n) == n
False

5) It loses precision for string values:

py> s = "1"*200
py> make_int(s) % 10
8L

And not by a little bit:

py> make_int(s) - int(s)  # should be zero
13582401819835255060712844221836126458722074364073358155901190901
52694241435026881979252811708675741954774190693711429563791133046
96544199238575935688832088595759108887701431234301497L

Lest you think that it is only humongous numbers where this is a problem,
it is not. A mere seventeen digits is enough:

py> s = "10000000000000001"
py> make_int(s) - int(s)
-1L

And at that point I stopped looking for faults.

--
Steven
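Bugs 4 and 5 both come from routing an exact value through a float, which keeps only about 15 to 17 significant decimal digits. Here is a quick Python 3 reproduction, where print is a function and int and long are unified. The values are simply the ones used in the message.

```python
# Bug 4: an exact integer does not survive a round trip through float.
n = 10**220
print(int(float(n)) == n)        # False: float(n) is only the nearest double

# Bug 5: seventeen digits are already too many for a double.
s = "10000000000000001"
print(int(float(s)) - int(s))    # -1: the trailing 1 is rounded away

# Parsing the string directly with int() keeps every digit.
print(int(s) == 10**16 + 1)      # True
```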
Re: [Tutor] Limitation of int() in converting strings
On 18/12/12 01:36, Oscar Benjamin wrote:

> I think it's unfortunate that Python's int() function combines two
> distinct behaviours in this way. In different situations int() is used to:
> 1) Coerce an object of some type other than int into an int without
> changing the value of the integer that the object represents.

The second half of the sentence (starting from "without changing") is not
justified. You can't safely make that assumption. All you know is that
calling int() on an object is intended to convert the object to an int,
in whatever way is suitable for that object. In some cases, that will
be numerically exact (e.g. int("1234") will give 1234), in other cases it
will not be.

> 2) Round an object with a non-integer value to an integer value.

int() does not perform rounding (except in the most generic sense that any
conversion from real-valued number to integer is "rounding"). That is what
the round() function does. int() performs truncating: it returns the
integer part of a numeric value, ignoring any fraction part:

py> from decimal import Decimal as D
py> from fractions import Fraction as F
py> int(D("-123."))
-123
py> int(F(999, 100))
9

So you shouldn't think of int(number) as "convert number to an int", since
that is ambiguous. There are at least six common ways to convert arbitrary
numbers to ints:

* truncate, or round towards zero (drop any fraction part);
* floor, or round towards -infinity (always round down);
* ceiling, or round towards +infinity (always round up);
* round to nearest, with ties rounding up;
* round to nearest, with ties rounding down;
* banker's rounding (round to nearest, with ties rounding to the nearest
  even number)

Python provides truncation via the int and math.trunc functions, floor and
ceiling via math.floor and math.ceil, and round to nearest via round.
In Python 2, ties are rounded up, which is biased; in Python 3, the
unbiased banker's rounding is used.

Instead, you should consider int(number) to be one of a pair of functions,
"return integer part", "return fraction part", where unfortunately the
second function isn't provided directly. In general though, you can get
the fractional part of a number with "x % 1". For floats, math.modf also
works.

So, in a sense int() does double duty as both a constructor of ints
from non-numbers such as strings, and as a "get integer part" function for
numbers. I'm okay with that.

--
Steven
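Here is a Python 3 sketch that puts the six modes side by side. The helper round_decimal is a name made up here. The tie-breaking modes come from the decimal module. That module's ROUND_HALF_UP breaks ties away from zero and ROUND_HALF_DOWN breaks them toward zero. For negative ties this differs from "ties toward +infinity" and "ties toward -infinity".

```python
import math
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_decimal(text, mode):
    """Round a numeric string to an int using a decimal tie-breaking mode."""
    return int(Decimal(text).quantize(Decimal(1), rounding=mode))


for text in ('2.5', '-2.5', '2.7'):
    f = float(text)
    print(text,
          math.trunc(f),                         # toward zero
          math.floor(f),                         # toward -infinity
          math.ceil(f),                          # toward +infinity
          round_decimal(text, ROUND_HALF_UP),    # ties away from zero
          round_decimal(text, ROUND_HALF_DOWN),  # ties toward zero
          round_decimal(text, ROUND_HALF_EVEN),  # banker's rounding
          round(f))                              # Python 3 round(): ties to even
```

The values are passed as strings so that Decimal sees exactly 2.5. For this value the float would also be exact, but that is not true of most decimal fractions.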
Re: [Tutor] Limitation of int() in converting strings
On Mon, Dec 17, 2012 at 1:00 PM, Alan Gauld wrote:
>
> Python uses its own C code for this.

The important point here is that they use the strtol/strtod interface,
however it's implemented. atoi and atof lack the end pointer argument that
enables raising a ValueError for an incomplete conversion. For example,
strtol("123e9", &end, 10) will merrily return 123, but with *end == 'e'.
I think it's good that Python raises a ValueError in this case.

Errors should never pass silently.
Unless explicitly silenced.
http://www.python.org/dev/peps/pep-0020/
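To make the "end pointer" idea concrete, here is a Python 3 sketch of a hypothetical strtol_like helper. It is not a real library function and it only imitates the C interface. It returns the partial value together with the index where parsing stopped, which is the extra information atoi lacks. It handles bases 2 to 36 but not '0x'-style prefixes.

```python
import re


def strtol_like(s, base=10):
    """Hypothetical imitation of C's strtol for bases 2-36.

    Returns (value, end), where end is the index just past the last digit
    consumed, playing the role of strtol's end pointer.
    """
    digits = '0123456789abcdefghijklmnopqrstuvwxyz'[:base]
    match = re.match(r'\s*[+-]?[%s]+' % digits, s, re.IGNORECASE)
    if match is None:
        return 0, 0  # like strtol: value 0 and end == start when nothing converts
    return int(match.group(), base), match.end()


value, end = strtol_like('123e9')
print(value, repr('123e9'[end:]))   # 123 'e9'

try:
    int('123e9')                     # Python refuses the incomplete conversion
except ValueError as exc:
    print('ValueError:', exc)
```

The caller of strtol_like has to inspect end to notice the leftover 'e9'. Python's int() makes that check itself and raises instead.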
Re: [Tutor] Limitation of int() in converting strings
On 17/12/12 14:36, Oscar Benjamin wrote:

>> Even stranger since the underlying atoi() C function
>
> Also, are you sure that atoi() is used in CPython?

Nope, just making assumptions! :-0
As Eryksun points out Python uses its own C code for this.

assume == ass u me :-(

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
Re: [Tutor] Limitation of int() in converting strings
On Mon, Dec 17, 2012 at 3:55 AM, Alan Gauld wrote:
>
> So you are right, there is an inconsistency between how int() converts
> floating point numbers and how it converts strings. Even stranger since the
> underlying atoi() C function appears to handle float strings quite
> happily...

If you're using strtol in C it's up to you how to interpret an incomplete
conversion due to an out-of-range number or bad literal for the given
base. Python, on the other hand, automatically switches the type for big
integers to a multiprecision long (2.x long) and raises ValueError for bad
literals. Where you go from there is up to you.

BTW, 2.x int() isn't using the libc atoi or strtol/strtoul functions. It
has its own implementation in mystrtoul.c. PyOS_strtol wraps the main
workhorse PyOS_strtoul (unsigned):

http://hg.python.org/cpython/file/70274d53c1dd/Python/mystrtoul.c#l80

The conversion loop proceeds up to the first non-base character, as
defined by the table _PyLong_DigitValue in longobject.c:

http://hg.python.org/cpython/file/70274d53c1dd/Objects/longobject.c#l1604

A pointer to the last scanned character is set. In PyInt_FromString, if
this points to the first character, or the previous character isn't
alphanumeric, or removing trailing whitespace starting from this point
doesn't end on a NUL terminator (e.g. *end == '.'), then the conversion
raises a ValueError. See lines 364-383 in intobject.c:

http://hg.python.org/cpython/file/70274d53c1dd/Objects/intobject.c#l340
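The scan-then-validate logic described above can be modelled in pure Python. The sketch below is a simplified illustration, not CPython's code. DIGIT_VALUE is a made-up stand-in for the _PyLong_DigitValue table. The checks mirror two of the conditions described: nothing was converted, or something other than trailing whitespace remains. The "previous character isn't alphanumeric" check is omitted, and so are '0x'-style prefixes.

```python
import string

# Hypothetical stand-in for a digit-value table: each character maps to its
# digit value; characters missing from the table are never digits.
DIGIT_VALUE = {c: i for i, c in enumerate(string.digits + string.ascii_lowercase)}
DIGIT_VALUE.update({c.upper(): v for c, v in DIGIT_VALUE.items() if c.isalpha()})


def strict_int(s, base=10):
    """Simplified model of a strict string-to-int conversion."""
    pos = 0
    while pos < len(s) and s[pos].isspace():      # skip leading whitespace
        pos += 1
    sign = 1
    if pos < len(s) and s[pos] in '+-':
        sign = -1 if s[pos] == '-' else 1
        pos += 1
    start = pos
    value = 0
    # Conversion loop: stop at the first character that is not a digit in base.
    while pos < len(s) and DIGIT_VALUE.get(s[pos], base) < base:
        value = value * base + DIGIT_VALUE[s[pos]]
        pos += 1
    end = pos                                     # the "end pointer"
    if end == start or s[end:].strip():           # nothing converted, or junk left
        raise ValueError('invalid literal for base %d: %r' % (base, s))
    return sign * value
```

For example, strict_int('10.0') stops at the '.', finds '.0' left over and raises ValueError, just as int('10.0') does.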
Re: [Tutor] Limitation of int() in converting strings
On 17 December 2012 08:55, Alan Gauld wrote:
>
> On 17/12/12 04:19, boB Stepp wrote:
>
>> It is apparent that int() does not like strings with floating-point
>> formats. None of my books (as far as my flipping can tell) or the
>> below built-in help clarify this:
>> ...
>>
>> Of course if I type int(float('10.0')) I get the desired 10 .
>
>
> as indeed will
>
> int(10.0)
>
> So you are right, there is an inconsistency between how int() converts
> floating point numbers and how it converts strings. Even stranger since the
> underlying atoi() C function appears to handle float strings quite happily...

The atoi() function, like many of the older C functions, has a major flaw
in that it indicates an error by returning zero even though zero is
actually a possible return value for the function. As far as I can tell it
doesn't even set an error code on failure. As a result it is not safe to
use without some additional checking either before or after the call.

Python's int() function and C's atoi() function also accept and ignore
whitespace around the number in the string:

>>> int(' 123 ')
123
>>> int('\t\n \n 123 \n ')
123
>>> int('\t\n \n 123 \n 456')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '123 \n 456'

In C, atoi() would have allowed that last example and given 123 as the
result.

Also, are you sure that atoi() is used in CPython? The int() function
accepts an optional base argument and can process non-decimal strings:

>>> int('0xff', 16)
255
>>> int('0o377', 8)
255
>>> int('0b11111111', 2)
255
>>> int('11111111', 2)
255

>> So, I am guessing that to convert strings to integers with int() that
>> the string must already be of integer format? What is the rationale
>> for setting up int() in this manner?

I think it's unfortunate that Python's int() function combines two
distinct behaviours in this way. In different situations int() is used to:
1) Coerce an object of some type other than int into an int without
changing the value of the integer that the object represents.
2) Round an object with a non-integer value to an integer value.

There are situations where behaviour 1) is required but behaviour 2) is
definitely not wanted. The inability to do this safely in Python resulted
in PEP 357 [1] that adds an __index__ method to objects that represent
integers but are not of type int(). Unfortunately, this was intended for
slicing and doesn't help when converting floats and strings to int().

I have often found myself writing awkward functions to prevent a rounding
error from occurring when coercing an object with int(). Here's one:

def make_int(obj):
    '''Coerce str, float and int to int without rounding error
    Accepts strings like '4.0' but not '4.1'
    '''
    fnum = float('%s' % obj)
    inum = int(fnum)
    assert inum == fnum
    return inum

References:
[1] http://www.python.org/dev/peps/pep-0357/

Oscar
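Here is a short Python 3 sketch of the PEP 357 mechanism. The class name Small is made up for illustration. A type that always represents an exact integer opts in by defining __index__. Sequences and operator.index then accept it, while float deliberately does not provide the hook.

```python
import operator


class Small:
    """Hypothetical integer-like type that opts in to __index__ (PEP 357)."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value  # must return an actual int


items = ['a', 'b', 'c', 'd']
k = Small(2)
print(items[k])             # c: subscripting calls __index__
print(items[:k])            # ['a', 'b']: so does slicing
print(operator.index(k))    # 2

# float has no __index__, so an exact-only conversion refuses it.
try:
    operator.index(4.0)
except TypeError as exc:
    print('TypeError:', exc)
```

This is why __index__ cannot solve the float and string case discussed above. Those types intentionally never claim to be exact integers.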
Re: [Tutor] Limitation of int() in converting strings
On 17/12/12 04:19, boB Stepp wrote:

> It is apparent that int() does not like strings with floating-point
> formats. None of my books (as far as my flipping can tell) or the
> below built-in help clarify this:
> ...
>
> Of course if I type int(float('10.0')) I get the desired 10 .

as indeed will

int(10.0)

So you are right, there is an inconsistency between how int() converts
floating point numbers and how it converts strings. Even stranger since the
underlying atoi() C function appears to handle float strings quite happily...

> So, I am guessing that to convert strings to integers with int() that
> the string must already be of integer format? What is the rationale
> for setting up int() in this manner?

No idea, you'd need to ask Guido, it's his language :-)

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
Re: [Tutor] Limitation of int() in converting strings
On 12/17/2012 12:00 AM, boB Stepp wrote:
> On Sun, Dec 16, 2012 at 10:41 PM, Mitya Sirenef wrote:
>>
>> What would you want to happen for int("10.5")? If 10.0 was accepted,
>> it would be consistent to accept 10.5, too.
>
> I was expecting int("10.5") to return 10 .

I just want to note that this is what you expect _now_, because this is
what you're doing at the moment. If you were parsing a text and a float
turned up at an unexpected spot, you may well be unpleasantly surprised if
python silently changed it into a numerically quite different number!

>
>> The issue, I think, is that a simple operation should not go too far
>> beyond what it is supposed to do - if you are sure you are converting a
>> float in a string, you need to do it explicitly, and if you're
>> converting a string to an int and the string is not actually an int,
>> then maybe it wasn't supposed to be a float and it's a mistake in the
>> program -- and therefore python should alert you.
>>
> And this is why I asked the question. If this is the rationale, it
> makes sense--an extra bit of double checking of the programmer's
> intent.

No problem. I'm not certain this is the main/only reason, but I'd expect
this reason alone to be enough to make this design choice.

 -m

--
Lark's Tongue Guide to Python: http://lightbird.net/larks/
Re: [Tutor] Limitation of int() in converting strings
On Sun, Dec 16, 2012 at 10:41 PM, Mitya Sirenef wrote:

> What would you want to happen for int("10.5")? If 10.0 was accepted,
> it would be consistent to accept 10.5, too.

I was expecting int("10.5") to return 10 .

> The issue, I think, is that a simple operation should not go too far
> beyond what it is supposed to do - if you are sure you are converting a
> float in a string, you need to do it explicitly, and if you're
> converting a string to an int and the string is not actually an int,
> then maybe it wasn't supposed to be a float and it's a mistake in the
> program -- and therefore python should alert you.
>

And this is why I asked the question. If this is the rationale, it makes
sense--an extra bit of double checking of the programmer's intent.

Thanks,
boB
Re: [Tutor] Limitation of int() in converting strings
On 12/16/2012 11:19 PM, boB Stepp wrote:
>>>> int('10.0')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: invalid literal for int() with base 10: '10.0'
>>>> int("10")
> 10
>
> It is apparent that int() does not like strings with floating-point
> formats. None of my books (as far as my flipping can tell) or the
> below built-in help clarify this:
>
>
> Help on int object:
>
> class int(object)
>  |  int(x[, base]) -> integer
>  |
>  |  Convert a string or number to an integer, if possible. A floating
>  |  point argument will be truncated towards zero (this does not include a
>  |  string representation of a floating point number!) When converting a
>  |  string, use the optional base. It is an error to supply a base when
>  |  converting a non-string.
>
> Of course if I type int(float('10.0')) I get the desired 10 .
>
> So, I am guessing that to convert strings to integers with int() that
> the string must already be of integer format? What is the rationale
> for setting up int() in this manner?
>
> Thanks as I continue to puzzle over the fine points of the basics...
> boB

What would you want to happen for int("10.5")? If 10.0 was accepted,
it would be consistent to accept 10.5, too.

The issue, I think, is that a simple operation should not go too far
beyond what it is supposed to do - if you are sure you are converting a
float in a string, you need to do it explicitly, and if you're
converting a string to an int and the string is not actually an int,
then maybe it wasn't supposed to be a float and it's a mistake in the
program -- and therefore python should alert you.

--
Lark's Tongue Guide to Python: http://lightbird.net/larks/
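A minimal Python 3 sketch of the "do it explicitly" advice follows. The first step shows int() refusing the float-looking string. The second shows the two-step conversion to write when truncation really is intended, with math.trunc making the rounding direction visible in the code.

```python
import math

text = "10.5"

# int() refuses a float-looking string, so an unexpected value is caught early.
try:
    int(text)
except ValueError as exc:
    print('ValueError:', exc)

# If truncation is really what you want, spell out both steps.
print(math.trunc(float(text)))   # 10
```

Swapping math.trunc for math.floor, math.ceil or round makes a different rounding choice equally explicit.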
[Tutor] Limitation of int() in converting strings
>>> int('10.0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '10.0'
>>> int("10")
10

It is apparent that int() does not like strings with floating-point
formats. None of my books (as far as my flipping can tell) or the below
built-in help clarify this:

Help on int object:

class int(object)
 |  int(x[, base]) -> integer
 |
 |  Convert a string or number to an integer, if possible. A floating
 |  point argument will be truncated towards zero (this does not include a
 |  string representation of a floating point number!) When converting a
 |  string, use the optional base. It is an error to supply a base when
 |  converting a non-string.

Of course if I type int(float('10.0')) I get the desired 10 .

So, I am guessing that to convert strings to integers with int() that the
string must already be of integer format? What is the rationale for setting
up int() in this manner?

Thanks as I continue to puzzle over the fine points of the basics...
boB
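The rules in the help text above can be checked in a few lines. This is a Python 3 sketch. The help text quoted is from Python 2, but these particular behaviours are the same in Python 3.

```python
print(int(-3.7))        # -3: a float argument is truncated toward zero
print(int('ff', 16))    # 255: a base is allowed when converting a string
print(int(' 42 '))      # 42: surrounding whitespace is ignored

try:
    int(10.0, 10)       # supplying a base with a non-string argument
except TypeError as exc:
    print('TypeError:', exc)

try:
    int('10.0')         # a float-format string is not an integer literal
except ValueError as exc:
    print('ValueError:', exc)
```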