Here's what I've got on overloading. This turned out to be a lot more work, and this mail much longer, than I'd hoped. Would appreciate a readthrough and comments/questions, I've done about all I can do. First a review of the problems, then a walkthrough of an implementation I have that fixes them.
1.a) Problem: Ambiguous calls from python to overloaded functions Boost.python's 'overloading' is and has always been broken w.r.t. is ambiguity. The library currently has no notion, given some tuple of arguments, of one function being a better match than another. It knows only "this function matches" or not. What the user observes is the following: when a call to a c++ function is made from python e.g. >>> m.f(1) boost.python looks through the overloads for f, in the reverse order of registration, and calls the first one that is callable. For instance if the registrations are void f_int(int); void f_bool(bool); BOOST_PYTHON_MODULE(m) { def("f", f_int); def("f", f_bool); } then f_bool will execute this call of f(1), since the python 'int' converts to c++ bool. If the overloads were registered in the reverse order, f_int would be executed. In this case the c++ overloads in question are potentially distinguishable from one another, as python has distinct types bool and int. So these aren't "necessarily" ambiguous. 1.b) Problem: "Necessarily" ambiguous overloads There are some overload sets that *are* necessarily ambiguous. This set, for instance, void f_double(double); void f_float(float); BOOST_PYTHON_MODULE(m) { def("f", f_double); def("f", f_float); } can never be unambiguous, since python has no 'float' type. I.e. >>> f(x) will call f_float for any value of x where the call succeeds at all. 1.c) Problem: Multiple values for keyword argument At the same place in the boost.python code where this 'first-match' overload resolution is done, the user error 'multiple values for keyword argument' is not checked. Neal Becker recently pointed this out. With boost.python it looks like this: int addem(int x, int y, int z) { return x*100 + y*10 + z; } BOOST_PYTHON_MODULE(M) { def("addem", &addem, (arg("x"), arg("y"), arg("z"))); } >>> from M import addem >>> addem(1, 8, 2, x=4) Traceback (most recent call last): ... ArgumentError: Python argument types in M.addem(int, int, int) did not match C++ signature: addem(int x, int y, int z) That error message is very confusing... f(int,int,int) doesn't match f(int,int,int)? The pure python version reports something more sensible: >>> def g(x,y,z, **kwargs): ... print x,y,z,kwargs ... >>> g(1,2,3,x=4) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: g() got multiple values for keyword argument 'x' 2.) An implemented solution I've got something implemented. Here's what it does. 2.a) Solution: multiple values for keyword The easiest case to catch is the last [1.c]. It is also orthogonal to the others. If a registered function has keyword arguments, check for multiple keyword values, and raise an error if so: >>> from M import addem >>> addem(1,2,3,x=1) Boost.Python.TypeError: M.addem() got multiple values for keyword argument 'x' 2.b) Solution: "necessarily" ambiguous overloads The next easiest thing to catch is case [1.b], "necessarily" ambiguous registrations. It proceeds as follows: at import time, as each new overload V for function F is registered, compare V to the existing overloads EO for F. If V has the same signature as something in EO, raise an AmbiguousOverload exception. For instance, if you load the module from [1.b] above, you get: >>> import m Traceback (most recent call last): ... AmbiguousOverload: Boost.Python function m.f has ambiguous overloads. C++ signatures f(float) -> None and f(double) -> None are indistinguishable to python. Again this is because c++ float and c++ double both map to python 'float'. This one is "easy" as it happens only once (not at every function call) and doesn't take up any space. 2.c) Solution: ambiguous calls The hard case is [1.a]: void f_bool(bool); void f_int(int); BOOST_PYTHON_MODULE(m) { def(f, f_bool); def(f, f_int); } For module 'm' above, a call to f(True) or f(1) should succeed and call the corresponding function. Passing a float, however, is ambiguous: >>> f(True) # ok >>> f(1) # ok >>> f(1.0) Traceback (most recent call last): File "<stdin>", line 1, in <module> Boost.Python.AmbiguousCall: Ambiguous call to Boost.Python function m.f C++ signatures: f(int) f(bool) So the implementation has some how 'scored' each possible overload and in each case either used the best one or reported ambiguity if multiple overloads are tied for 'best'. The scoring works as follows. In the guts of boost.python we recieve a tuple of arguments of type PyObject*. We also have a list of overloads, each with an mpl vector representing the signature of the associated C++ function. This is unchanged from the released version. What has been done until now is to attempt to call each overload in order and return the result of the first that succeeds. In this new implmentation, for each c++ argument of type T_n, we ask a class overload_score<T_n> to score the conversion of the type (PyTypeObject*) of this PyObject* to T_n. Example: template <> struct overload_score<int> { boost::optional<unsigned> operator()(PyObject* type) { if(PyInt_CheckExact(type)) return boost::optional<unsigned>(0); // int == perfect else if (PyBool_Check(type)) return boost::optional<unsigned>(1); // bool == okay else if (PyFloat_CheckExact(type)) return boost::optional<unsigned>(1); // float == okay else if(arg_from_python<int>(type).convertible()) return boost::optional<unsigned>(1); // fallback else return boost::optional<unsigned>(); // unsuitable } }; The "score" type is optional<unsigned>. A valueless, (default-constructed) optional<unsigned> means 'unsuitable'. optional<unsigned>(0) is a perfect match, and optional<unsigned>(value) for value > 0 is a workable but less than perfect match. These per-argument scores are added together for all arguments: this is the overload's total score. If any argument is 'unsuitable', the total score is 'unsuitable'. If there is a tie for the best (lowest) score, the call is ambiguous and a Boost.Python.AmbiguousCall exception is raised. If there is only one function in first place, call it. 3.) Implications This breaks a few corner cases that I've found in the tests. 3.a) implied init<> registration I have found a few instances like this one: struct X { X(); X(int); }; class_<X>("X") // #1 .def(init<>()) // #2 .def(init<int>()) ; Here, #1 causes a default constructor to be registered, as does #2. This will cause a throw at load time as in [2.b]. It is simple to fix. 3.b) raw_constructor The test suite for raw_constructor depends on our first_match overload resolution. The fix is to move a few things around to get the same effect without relying on this behavior. The "before" code looks like this: class Foo { public: Foo(tuple args, dict kw) : args(args), kw(kw) {} tuple args; dict kw; }; object init_foo(tuple args, dict kw) { tuple rest(args.slice(1,_)); return args[0].attr("__init__")(rest, kw); } BOOST_PYTHON_MODULE(raw_ctor_ext) { // using no_init postpones defining __init__ function until after // raw_function for proper overload resolution order, since later // defs get higher priority. class_<Foo>("Foo", no_init) .def("__init__", raw_function(&init_foo)) .def(init<tuple, dict>()) .def_readwrite("args", &Foo::args) .def_readwrite("kw", &Foo::kw) ; } The "after" code does the registration piecemeal: Maybe there is a better way to do this. To my mind this is a little better because it makes explicit what is happening rather than relying on some subtle property of overload resolution: class Foo { public: Foo(tuple args, dict kw) : args(args), kw(kw) {} tuple args; dict kw; }; // our custom factory function object init_foo(tuple args, dict kw) { tuple rest(args.slice(1,_)); return args[0].attr("__real_init__")(rest, kw); } BOOST_PYTHON_MODULE(raw_ctor_ext) { // to get the desired effect we register // the actual constructor and our 'raw' constructor, // and then rename them class_<Foo> c("Foo", init<tuple, dict>()); c .def("__tmp_init__", raw_function(&init_foo)) .def_readwrite("args", &Foo::args) .def_readwrite("kw", &Foo::kw) ; // // __init__ => __real_init__ // __tmp_init__ => __init__ // object real_constructor = getattr(c, "__init__"); object raw_constructor = getattr(c, "__tmp_init__"); setattr(c, "__init__", raw_constructor); delattr(c, "__tmp_init__"); setattr(c, "__real_init__", real_constructor); } And that basically covers it. Looking forward to comments/feedback. The code is here: http://gitorious.org/~straszheim/boost/straszheims-python/trees/master Note that the headers are in subdirectory 'include'... If there is enough response to this mail it I'll send out instructions on how to get a working build going. - t _______________________________________________ Cplusplus-sig mailing list Cplusplus-sig@python.org http://mail.python.org/mailman/listinfo/cplusplus-sig