On Fri, Sep 20, 2019 at 9:31 PM Robert Kern <robert.k...@gmail.com> wrote:
> On Fri, Sep 20, 2019 at 11:33 PM Ralf Gommers <ralf.gomm...@gmail.com> > wrote: > >> >> >> On Fri, Sep 20, 2019 at 7:09 AM Robert Kern <robert.k...@gmail.com> >> wrote: >> >>> >>> >>> On Fri, Sep 20, 2019 at 6:09 AM Ralf Gommers <ralf.gomm...@gmail.com> >>> wrote: >>> >>>> >>>> >>>> On Fri, Sep 20, 2019 at 5:29 AM Robert Kern <robert.k...@gmail.com> >>>> wrote: >>>> >>>>> >>>>> We might end up with more than 2 implementations if we need to change >>>>> something about the function signature, for whatever reason, and we want >>>>> to >>>>> retain C/Cython API compatibility with older code. The C functions aren't >>>>> necessarily going to be one-to-one to the Generator methods. They're just >>>>> part of the implementation. So for example, if we wanted to, say, >>>>> precompute some intermediate values from the given scalar parameters so we >>>>> don't have to recompute them for each element of the `size`-large >>>>> requested >>>>> output, we might do that in one C function and pass those intermediate >>>>> values as arguments to the C function that does the actual sampling. So >>>>> we'd have two C functions for that one Generator method, and the sampling >>>>> C >>>>> function will not have the same signature as it did before the >>>>> modification >>>>> that refactored the work into two functions. In that case, I would not be >>>>> so strict as to require that `Generator.foo` is one to one with >>>>> `random_foo`. >>>>> >>>> >>>> You're saying "be so strict" as if it were a bad thing, or a major >>>> effort. >>>> >>> >>> I am. It's an unnecessary limitation on the C API without a >>> corresponding benefit. Your original complaint >>> >> >> It's not a "complaint". >> > > Please forgive me. That word choice was not intended to be dismissive. I > don't view "complaints" as minor annoyances that the "complainer" should > just shut up and deal with, or that the "complainer" is just being > annoying, but I can see how it came across that I might. Please continue as > if I said "The problem you originally noted...". It's a real problem that > needs to be addressed. We just have different thoughts on exactly what is > needed to address it. > Okay, thank you:) > >> We're having this discussion because we shipped a partial API in 1.17.0 >> that we will now have to go back and either take out or clean up in 1.17.3. >> The PR for the new numpy.random grew so large that we didn't notice or >> discuss that (such things happen, no big deal - we have limited reviewer >> bandwidth). So now that we do, it makes sense to actually think about what >> needs to be in the API. For now I think that's only the parts that are >> matching the Python API plus what is needed to use them from C/Cython. >> Future additions require similar review and criteria as adding to the >> Python API and the existing NumPy C API. To me, your example seems to (a) >> not deal with API stability, and (b) expose too much implementation detail. >> >> To be clear about the actual status, we: >> - shipped one header file (bitgen.h) >> - shipped two pxd files (common.pxd, bit_generator.pxd) >> - removed a header file we used to ship (randomkit.h) >> - did not ship distributions.pxd, bounded_integers.pxd, >> legacy_distributions.pxd or related header files >> >> bit_generator.pxd looks fine, common.pxd contains parts that shouldn't be >> there. I think the intent was to ship at least distributions.pxd/h, and >> perhaps all of those pxd files. >> >> is much more directly addressed by a "don't gratuitously name related C >>> functions differently than the Python methods they implement" rule (e.g. >>> "gauss" instead of "normal"). >>> >>> >>>> I understand that in some cases a C API can not be evolved in the same >>>> way as a Python API, but in the example you're giving here I'd say you want >>>> one function to be public, and one private. Making both public just exposes >>>> more implementation details for no good reason, and will give us more >>>> maintenance issues long-term. >>>> >>> >>> Not at all. In this example, neither one of those functions is useful >>> without the other. If one is public, both must be. >>> >> >> If neither one is useful without the other, it sounds like both should be >> private and the third one that puts them together - the one that didn't >> change signature and implements `Generator.foo` - is the public one. >> > > That defeats the point of using the C API in this instance, though. The > reason it got split into two (in this plausible hypothetical; I'm thinking > of the binomial implementation here, which caches these intermediates in a > passed-in struct) is because in C you want to call them in different ways > (in this case, the prep function once and the sampling function many > times). It's not that you always call them in lockstep pairs: `prep(); > sample(); prep(); sample();`. A C function that combines them defeats the > efficiency that one wanted to gain by using the C API. The C API has > different needs than the Python API, because the Python API has a lot more > support from the Python language and numpy data structures to be able to > jam a lot of functionality into a single function signature that C just > doesn't give us. The purpose of the C API is not just to avoid Python > function call overhead. If there's a reason that the Generator method needs > the implementation split up into multiple C functions, that's a really > strong signal that *other* C code using the C API will need that same > split. It's not just an implementation detail; it's a documented use case. > > Given the prevalence of Cython, it's actually really easy to use the > Python API pretty easily in "C", so it's actually a huge waste if the C API > matches the Python API too closely. The power and utility of the C API will > be in how it *differs* from the Python API. For the distribution methods, > this is largely in how it lets you sample one number at a time without > bothering with the numpy and broadcasting overhead. That's the driving > motivation for having a C API for the distributions, and the algorithms > that we choose have consequences for the C API that will best satisfy that > motivation. > > The issue that this raises with API stability is that if you require a > one-to-one match between the C API function and the Generator method, we > can never change the function signature of the C function. That's going to > forbid us from moving from an algorithm that doesn't need any > precomputation to one that does. That precomputation either requires a > 2-function dance or a new argument to keep the cached values (c.f. > `random_binomial()`), so it's always going to affect the API. To use such a > new algorithm, we'll have to add a new function or two to the C API and > document the deprecation of the older API function. We can't just swap it > in under the same name, even if the new function is standalone. That's a > significant constraint on future development when the main issue that led > to the suggestion is that the names were sometimes gratuitously different > between the C API and the Python API, which hindered discoverability. We > can fix *that* problem easily without constraining the universe of > algorithms that we might consider using in the future. > Fair enough, now the use case is clear to me. In summary: there may be real reasons to deviate and add more functions; let's do so if and when it makes sense to. Cheers, Ralf
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion