@Ben: Have you found a solution to your problem? Are there thinks we could do in numpy to make it better?
-CHB On Mon, Feb 12, 2018 at 9:33 AM, Chris Barker <chris.bar...@noaa.gov> wrote: > I think it's all been said, but a few comments: > > On Sun, Feb 11, 2018 at 2:19 PM, Nils Becker <nilsc.bec...@gmail.com> > wrote: > >> Generating equidistantly spaced grids is simply not always possible. >> > > exactly -- and linspace gives pretty much teh best possible result, > guaranteeing tha tthe start an end points are exact, and the spacing is > within an ULP or two (maybe we could make that within 1 ULP always, but not > sure that's worth it). > > >> The reason is that the absolute spacing of the possible floating point >> numbers depends on their magnitude [1]. >> > > Also that the exact spacing may not be exactly representable in FP -- so > you have to have at least one space that's a bit off to get the end points > right (or have the endpoints not exact). > > >> If you - for some reason - want the same grid spacing everywhere you may >> choose an appropriate new spacing. >> > > well, yeah, but usually you are trying to fit to some other constraint. > I'm still confused as to where these couple of ULPs actually cause > problems, unless you are doing in appropriate FP comparisons elsewhere. > > Curiously, either by design or accident, arange() seems to do something >> similar as was mentioned by Eric. It creates a new grid spacing by adding >> and subtracting the starting point of the grid. This often has similar >> effect as adding and subtracting N*dx (e.g. if the grid is symmetric around >> 0.0). Consequently, arange() seems to trade keeping the grid spacing >> constant for a larger error in the grid size and consequently in the end >> point. >> > > interesting -- but it actually makes sense -- that is the definition of > arange(), borrowed from range(), which was designed for integers, and, in > fact, pretty much mirroered the classic C index for loop: > > > for (int i=0; i<N; i++) { > ... > > > or in python: > > i = start > while i < stop: > i += step > > The problem here is that termination criteria -- i < stop -- that is the > definition of the function, and works just fine for integers (where it came > from), but with FP, even with no error accumulation, stop may not be > exactly representable, so you could end up with a value for your last item > that is about (stop-step), or you could end up with a value that is a > couple ULPs less than step -- essentially including the end point when you > weren't supposed to. > > The truth is, making a floating point range() was simply a bad idea to > begin with -- it's not the way to define a range of numbers in floating > point. Whiuch is why the docs now say "When using a non-integer step, such > as 0.1, the results will often not > be consistent. It is better to use ``linspace`` for these cases." > > Ben wants a way to define grids in a consistent way -- make sense. And > yes, sometimes, the original source you are trying to match (like GDAL) > provides a starting point and step. But with FP, that is simply > problematic. If: > > start + step*num_steps != stop > > exactly in floating point, then you'll need to do the math one way or > another to get what you want -- and I'm not sure anyone but the user knows > what they want -- do you want step to be as exact as possible, or do you > want stop to be as exact as possible? > > All that being said -- if arange() could be made a tiny bit more accurate > with fma or other numerical technique, why not? it won't solve the > problem, but if someone writes and tests the code (and it does not require > compiler or hardware features that aren't supported everywhere numpy > compiles), then sure. (Same for linspace, though I'm not sure it's possible) > > There is one other option: a new function (or option) that makes a grid > from a specification of: start, step, num_points. If that is really a > common use case (that is, you don't care exactly what the end-point is), > then it might be handy to have it as a utility. > > We could also have an arange-like function that, rather than < stop, would > do "close to" stop. Someone that understands FP better than I might be able > to compute what the expected error might be, and find the closest end point > within that error. But I think that's a bad specification -- (stop - start) > / step may be nowhere near an integer -- then what is the function supposed > to do?? > > > BTW: I kind of wish that linspace specified the number of steps, rather > than the number of points, that is (num+points - 1) that would save a > little bit of logical thinking. So if something new is made, let's keep > that in mind. > > > >> 1. Comparison to calculations with decimal can be difficult as not all >> simple decimal step sizes are exactly representable as >> > finite floating point numbers. >> > > yeah, this is what I mean by inappropriate use of Decimal -- decimal is > not inherently "more accurate" than fp -- is just can represent _decimal_ > numbers exactly, which we are all use to -- we want 1 / 10 to be exact, > but dont mind that 1 / 3 isn't. > > Decimal also provided variable precision -- so it can be handy for that. I > kinda wish Python had an arbitrary precision binary floating point built > in... > > -CHB > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > chris.bar...@noaa.gov > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion