Re: [Numpy-discussion] `allclose` vs `assert_allclose`
On 16 Jul 2014 10:26, "Tony Yu" wrote: > > Is there any reason why the defaults for `allclose` and `assert_allclose` differ? This makes debugging a broken test much more difficult. More importantly, using an absolute tolerance of 0 causes failures for some common cases. For example, if two values are very close to zero, a test will fail: > > np.testing.assert_allclose(0, 1e-14) > > Git blame suggests the change was made in the following commit, but I guess that change only reverted to the original behavior. > > https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf > > It seems like the defaults for `allclose` and `assert_allclose` should match, and an absolute tolerance of 0 is probably not ideal. I guess this is a pretty big behavioral change, but the current default for `assert_allclose` doesn't seem ideal. What you say makes sense to me, and loosening the default tolerances won't break any existing tests. (And I'm not too worried about people who were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters that much to you exactly what tolerance you test, you should be setting the tolerance explicitly!) I vote that unless someone comes up with some terrible objection in the next few days then you should submit a PR :-) -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] `allclose` vs `assert_allclose`
On Wed, Jul 16, 2014 at 6:37 AM, Tony Yu wrote: > Is there any reason why the defaults for `allclose` and `assert_allclose` > differ? This makes debugging a broken test much more difficult. More > importantly, using an absolute tolerance of 0 causes failures for some > common cases. For example, if two values are very close to zero, a test > will fail: > > np.testing.assert_allclose(0, 1e-14) > > Git blame suggests the change was made in the following commit, but I > guess that change only reverted to the original behavior. > > > https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf > Indeed, it was reverting a change that crept into https://github.com/numpy/numpy/commit/f527b49a > > It seems like the defaults for `allclose` and `assert_allclose` should > match, and an absolute tolerance of 0 is probably not ideal. I guess this > is a pretty big behavioral change, but the current default for > `assert_allclose` doesn't seem ideal. > I agree, the current behavior is quite annoying. It would make sense to change the atol default to 1e-8, but technically it's a backwards-compatibility break. It would probably have a very minor impact, though. Changing the default for rtol in one of the functions would be much more painful, though; I don't think that should be done. Ralf ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
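For reference, the mismatch Tony and Ralf describe is easy to reproduce (default tolerances quoted from the docs of that era: allclose uses rtol=1e-5, atol=1e-8; assert_allclose uses rtol=1e-7, atol=0):

```python
import numpy as np

# allclose passes because its atol default is 1e-8:
assert np.allclose(0, 1e-14)

# assert_allclose raises because its atol default is 0, so the check
# reduces to |0 - 1e-14| <= rtol * 1e-14, which fails:
try:
    np.testing.assert_allclose(0, 1e-14)
    raised = False
except AssertionError:
    raised = True
assert raised

# Passing atol explicitly restores the expected behaviour:
np.testing.assert_allclose(0, 1e-14, atol=1e-8)
```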
[Numpy-discussion] Rounding float to integer while minimizing the difference between the two arrays?
Dear all, I have two arrays, both of float type, let's say X and Y. I want to round X to integers (intX) according to some decimal threshold, and at the same time keep the following difference as small as possible:

diff = np.sum(X*Y) - np.sum(intX*Y)

I don't necessarily have to minimize the "diff" variable (if that demand makes the computation time too long), but I would like to limit the "diff" to, let's say, within ten percent of np.sum(X*Y). I have tried to write some functions, but I don't know where to start the optimization.

def convert_integer(x, threshold=0):
    """
    This function converts the float number x to integer according to the
    threshold.
    """
    if abs(x-0) < 1e5:
        return 0
    else:
        pdec, pint = math.modf(x)
        if pdec > threshold:
            return int(math.ceil(pint)+1)
        else:
            return int(math.ceil(pint))

def convert_arr(arr, threshold=0):
    out = arr.copy()
    for i, num in enumerate(arr):
        out[i] = convert_integer(num, threshold=threshold)
    return out

In [147]: convert_arr(np.array([0.14,1.14,0.12]),0.13)
Out[147]: array([1, 2, 0])

Now my problem is, how can I minimize or limit the following? diff = np.sum(X*Y) - np.sum(convert_arr(X,threshold=?)*Y) Because it's the first time I have encountered this kind of question, please give me some clue on where to start :p Thanks a lot in advance. Best, Chao -- please visit: http://www.globalcarbonatlas.org/ *** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax: 01.69.08.77.16 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
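One brute-force way to sketch the threshold search Chao asks about (function names are ours; this vectorizes his rounding rule and uses the 1e-5 zero cutoff from his later correction):

```python
import numpy as np

def convert_arr(arr, threshold=0.0):
    # vectorized version of the poster's rule: round up when the
    # fractional part exceeds the threshold, round down otherwise
    frac, _ = np.modf(arr)
    out = np.floor(arr) + (frac > threshold)
    out[np.abs(arr) < 1e-5] = 0   # treat near-zero values as exactly zero
    return out.astype(int)

def best_threshold(X, Y, candidates=None):
    # scan candidate thresholds and keep the one whose rounding best
    # preserves the weighted sum np.sum(X*Y)
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 101)
    target = np.sum(X * Y)
    diffs = [abs(target - np.sum(convert_arr(X, t) * Y)) for t in candidates]
    return candidates[int(np.argmin(diffs))]
```

For example, `convert_arr(np.array([0.14, 1.14, 0.12]), 0.13)` reproduces the `array([1, 2, 0])` output from the original post. A scan over 101 candidates is cheap; for finer control one could hand `best_threshold`'s objective to a scalar optimizer instead, as the follow-up message suggests.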
Re: [Numpy-discussion] Rounding float to integer while minimizing the difference between the two arrays?
Sorry, there is one error in this part of the code; it should be:

def convert_integer(x, threshold=0):
    """
    This function converts the float number x to integer according to the
    threshold.
    """
    if abs(x-0) < 1e-5:
        return 0
    else:
        pdec, pint = math.modf(x)
        if pdec > threshold:
            return int(math.ceil(pint)+1)
        else:
            return int(math.ceil(pint))

On Wed, Jul 16, 2014 at 3:18 PM, Chao YUE wrote: > Dear all, > > I have two arrays with both float type, let's say X and Y. I want to round > the X to integers (intX) according to some decimal threshold, at the same > time I want to limit the following difference as small: > > diff = np.sum(X*Y) - np.sum(intX*Y) > > I don't have to necessarily minimize the "diff" variable (If with this > demand the computation time is too long). But I would like to limit the > "diff" to, let's say ten percent within np.sum(X*Y). > > I have tried to write some functions, but I don't know where to start the > opitimization. > > def convert_integer(x,threshold=0): > """ > This fucntion converts the float number x to integer according to the > threshold. > """ > if abs(x-0) < 1e5: > return 0 > else: > pdec,pint = math.modf(x) > if pdec > threshold: > return int(math.ceil(pint)+1) > else: > return int(math.ceil(pint)) > > def convert_arr(arr,threshold=0): > out = arr.copy() > for i,num in enumerate(arr): > out[i] = convert_integer(num,threshold=threshold) > return out > > In [147]: > convert_arr(np.array([0.14,1.14,0.12]),0.13) > > Out[147]: > array([1, 2, 0]) > > Now my problem is, how can I minimize or limit the following? > diff = np.sum(X*Y) - np.sum(convert_arr(X,threshold=?)*Y) > > Because it's the first time I encounter such kind of question, so please > give me some clue to start :p Thanks a lot in advance. 
> > Best, > > Chao > > -- > please visit: > http://www.globalcarbonatlas.org/ > > *** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > > -- please visit: http://www.globalcarbonatlas.org/ *** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Rounding float to integer while minimizing the difference between the two arrays?
Dear all, A bit sorry, this is not difficult. scipy.optimize.minimize_scalar seems to solve my problem. Thanks anyway, for this great tool. Cheers, Chao On Wed, Jul 16, 2014 at 3:18 PM, Chao YUE wrote: > Dear all, > > I have two arrays with both float type, let's say X and Y. I want to round > the X to integers (intX) according to some decimal threshold, at the same > time I want to limit the following difference as small: > > diff = np.sum(X*Y) - np.sum(intX*Y) > > I don't have to necessarily minimize the "diff" variable (If with this > demand the computation time is too long). But I would like to limit the > "diff" to, let's say ten percent within np.sum(X*Y). > > I have tried to write some functions, but I don't know where to start the > opitimization. > > def convert_integer(x,threshold=0): > """ > This fucntion converts the float number x to integer according to the > threshold. > """ > if abs(x-0) < 1e5: > return 0 > else: > pdec,pint = math.modf(x) > if pdec > threshold: > return int(math.ceil(pint)+1) > else: > return int(math.ceil(pint)) > > def convert_arr(arr,threshold=0): > out = arr.copy() > for i,num in enumerate(arr): > out[i] = convert_integer(num,threshold=threshold) > return out > > In [147]: > convert_arr(np.array([0.14,1.14,0.12]),0.13) > > Out[147]: > array([1, 2, 0]) > > Now my problem is, how can I minimize or limit the following? > diff = np.sum(X*Y) - np.sum(convert_arr(X,threshold=?)*Y) > > Because it's the first time I encounter such kind of question, so please > give me some clue to start :p Thanks a lot in advance. 
> > Best, > > Chao > > -- > please visit: > http://www.globalcarbonatlas.org/ > > *** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > > -- please visit: http://www.globalcarbonatlas.org/ *** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String type again.
> But HDF5 > additionally has a fixed-storage-width UTF8 type, so we could map to a > NumPy fixed-storage-width type trivially. Sure -- this is why *nix uses utf-8 for filenames -- it can just be a char*. But that just punts the problem to client code. I think a UTF-8 string type does not match the numpy model well, and I don't think we should support it just because it would be easier for the HDF 5 wrappers. ( to be fair, there are probably other similar systems numpy wants to interface with that could use this...) It seems if you want a 1:1 binary mapping between HDF and numpy for utf strings, then a bytes type in numpy makes more sense. Numpy could/should have encode and decode methods for converting byte arrays to/from Unicode arrays (does it already?). > "Custom" in this context means a user-created HDF5 data-conversion > filter, which is necessary since all data conversion is handled inside > the HDF5 library. > As far as generic Unicode goes, we currently don't support the NumPy > "U" dtype in h5py for similar reasons; there's no destination type in > HDF5 which (1) would preserve the dtype for round-trip write/read > operations and (2) doesn't risk truncation. It sounds to me like HDF5 simply doesn't support Unicode. Calling an array of bytes utf-8 simply pushes the problem on to client libs. Since that's where the problem lies, PyHDF may be the place to address it. If we put utf-8 in numpy, we have the truncation problem there instead -- which is exactly what I think we should avoid. > A Latin-1 based 'a' type > would have similar problems. Maybe not -- latin1 is fixed width. >> Does HDF enforce ascii-only? what does it do with the > 127 values? > > Unfortunately/fortunately the charset is not enforced for either ASCII So you can dump Latin-1 into and out of the HDF 'ASCII' type -- it's essentially the old char* / py2 string. An ugly situation, but why not use it? > or UTF-8, So ASCII and utf-8 are really the same thing, with different meta-data... 
> although the HDF Group has been thinking about it. I wonder if they would consider going Latin-1 instead of ASCII -- similarly to utf-8 it's backward compatible with ASCII, but gives you a little more. I don't know that there is another one-byte encoding worth using -- it may be my English bias, but it seems Latin-1 gives us ASCII+some extra stuff handy for science ( I use the degree symbol a lot, for instance) with nothing lost. > Ideally, NumPy would support variable-length > strings, in which case all these headaches would go away. Would they? That would push the problem back to PyHDF -- which I'm arguing is where it belongs, but I didn't think you were ;-) > > But I > imagine that's also somewhat complicated. :) That's a whole other kettle of fish, yes. -Chris ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
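On Chris's aside about encode/decode methods for byte arrays: NumPy does already expose element-wise codecs via np.char. A quick sketch (array contents ours), which also shows the bytes-vs-characters mismatch at the heart of the thread:

```python
import numpy as np

raw = np.array([b'caf\xc3\xa9'])          # UTF-8 bytes for 'café', dtype 'S5'
text = np.char.decode(raw, 'utf-8')        # unicode array, 4 characters
back = np.char.encode(text, 'latin-1')     # re-encoded: only 4 bytes now

# The same 4-character string needs 5 bytes in UTF-8 but 4 in Latin-1,
# so a fixed byte width cannot be sized from a character count alone.
```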
[Numpy-discussion] parallel distutils extensions build? use gcc -flto
hi, I have been playing around a bit with gcc's link-time optimization (LTO) feature and found that using it actually speeds up a from-scratch build of numpy, due to its ability to perform optimization and linking in parallel. As a bonus you should also get faster binaries, thanks to the better optimizations LTO allows. As compiling with LTO requires some possibly lesser-known details, I wanted to share them. Prerequisites are a working gcc toolchain of at least gcc-4.8 and binutils > 2.21; gcc-4.9 is better as it's faster. First of all, numpy checks the long double representation by compiling a file and looking at the binary. This won't work here, as the od -b reimplementation does not understand LTO objects, so on x86 we must short-circuit that:

--- a/numpy/core/setup_common.py
+++ b/numpy/core/setup_common.py
@@ -174,6 +174,7 @@ def check_long_double_representation(cmd):
     # We need to use _compile because we need the object filename
     src, object = cmd._compile(body, None, None, 'c')
     try:
+        return 'IEEE_DOUBLE_LE'
         type = long_double_representation(pyod(object))
         return type
     finally:

Next we build numpy as usual, but override the compiler, linker and ar to add our custom flags. The setup.py call would look like this:

CC='gcc -fno-fat-lto-objects -flto=4 -fuse-linker-plugin -O3' \
LDSHARED='gcc -fno-fat-lto-objects -flto=4 -fuse-linker-plugin -shared -O3' \
AR=gcc-ar \
python setup.py build_ext

Some explanation: the AR override is needed because numpy builds a static library, and ar needs to know about LTO objects; gcc-ar does exactly that. -flto=4 is the main flag: it tells gcc to perform link-time optimization using 4 parallel processes. -fno-fat-lto-objects tells gcc to build only LTO objects; normally it builds both an LTO object and a normal object for toolchain compatibility. If our toolchain can handle LTO objects, that is just a waste of time, so we skip it. 
(The flag is the default in gcc-4.9 but not 4.8.) -fuse-linker-plugin directs gcc to run its link-time optimizer plugin in the linking step; the linker must support plugins, and both the bfd (> 2.21) and gold linkers do. This allows for more optimizations. -O3 has to be added to the linker too, as that's where the optimization occurs. In general, a problem with LTO is that the compiler options of all steps must match the flags used for linking. If you are using C++ or gfortran you also have to override those to use LTO (CXX and FF(?)). See https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html for a lot more details. For some numbers: on my machine a from-scratch numpy build with no caching takes 1min55s; with lto=4 it takes only 55s. Pretty neat for a much more involved optimization process. Concerning the speed gain from the build itself, I ran our benchmark suite with this build; there were no really significant gains, which is somewhat expected, as numpy is simple C code with most function bottlenecks already inlined. So, conclusion: -flto seems to work well with recent gccs and allows for faster builds using the limited distutils. While probably not useful for development, where compiler caching (ccache) is of utmost importance, it is still interesting for projects doing one-shot uncached builds (travis-like CI) that have huge objects (e.g. swig or cython) and don't want to change to proper parallel build systems like bento. PS: As far as I know, clang also supports LTO, but I have never used it. PPS: Using NPY_SEPARATE_COMPILATION=0 crashes gcc-4.9; time for a bug report. Cheers, Julian ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] __numpy_ufunc__
On Wed, Jul 16, 2014 at 10:07 AM, Nathaniel Smith wrote: > Weirdly, I never received Chuck's original email in this thread. Should > some list admin be informed? > Also weirdly, my reply didn't show up on gmane. Not sure if it got through, so re-sending: It's already in, so do you mean not using? Would help to know what the issue is, because it's finished enough that it's already used in a released version of scipy (in sparse matrices). Ralf I also am not sure what/where Julian's comments were, so I second the call > for context :-). Putting it off until 1.10 doesn't seem like an obviously > bad idea to me, but specifics would help... > > (__numpy_ufunc__ is the new system for allowing arbitrary third party > objects to override how ufuncs are applied to them, i.e. it means > np.sin(sparsemat) and np.sin(gpuarray) can be defined to do something > sensible. Conceptually it replaces the old __array_prepare__/__array_wrap__ > system, which was limited to ndarray subclasses and has major limits on > what you can do. Of course __array_prepare/wrap__ will also continue to be > supported for compatibility.) > -n > On 16 Jul 2014 00:10, "Benjamin Root" wrote: > >> Perhaps a bit of context might be useful? How is numpy_ufunc different >> from the ufuncs that we know and love? What are the known implications? >> What are the known shortcomings? Are there ABI and/or API concerns between >> 1.9 and 1.10? >> >> Ben Root >> >> >> On Mon, Jul 14, 2014 at 2:22 PM, Charles R Harris < >> charlesr.har...@gmail.com> wrote: >> >>> Hi All, >>> >>> Julian has raised the question of including numpy_ufunc in numpy 1.9. I >>> don't feel strongly one way or the other, but it doesn't seem to be >>> finished yet and 1.10 might be a better place to work out the remaining >>> problems along with the astropy folks testing possible uses. >>> >>> Thoughts? 
>>> >>> Chuck >>> >>> ___ >>> NumPy-Discussion mailing list >>> NumPy-Discussion@scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] __numpy_ufunc__
Perhaps a bit of context might be useful? How is numpy_ufunc different from the ufuncs that we know and love? What are the known implications? What are the known shortcomings? Are there ABI and/or API concerns between 1.9 and 1.10? Ben Root On Mon, Jul 14, 2014 at 2:22 PM, Charles R Harris wrote: > Hi All, > > Julian has raised the question of including numpy_ufunc in numpy 1.9. I > don't feel strongly one way or the other, but it doesn't seem to be > finished yet and 1.10 might be a better place to work out the remaining > problems along with the astropy folks testing possible uses. > > Thoughts? > > Chuck > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String type again.
On Tue, Jul 15, 2014 at 9:15 AM, Charles R Harris wrote: > > > > On Tue, Jul 15, 2014 at 5:26 AM, Sebastian Berg < > sebast...@sipsolutions.net> wrote: > >> On Sa, 2014-07-12 at 12:17 -0500, Charles R Harris wrote: >> > As previous posts have pointed out, Numpy's `S` type is currently >> > treated as a byte string, which leads to more complicated code in >> > python3. OTOH, the unicode type is stored as UCS4, which consumes a >> > lot of space, especially for ascii strings. This note proposes to >> > adapt the currently existing 'a' type letter, currently aliased to >> > 'S', as a new fixed encoding dtype. Python 3.3 introduced two one byte >> > internal representations for unicode strings, ascii and latin1. Ascii >> > has the advantage that it is a subset of UTF-8, whereas latin1 has a >> > few more symbols. Another possibility is to just make it an UTF-8 >> > encoding, but I think this would involve more overhead as Python would >> > need to determine the maximum character size. These are just >> > preliminary thoughts, comments are welcome. >> > >> >> Just wondering, couldn't we have a type which actually has an >> (arbitrary, python supported) encoding (and "bytes" might even just be a >> special case of no encoding)? Basically storing bytes and on access do >> element[i].decode(specified_encoding) and on storing element[i] = >> value.encode(specified_encoding). >> >> There is always the never ending small issue of trailing null bytes. If >> we want to be fully compatible, such a type would have to store the >> string length explicitly to support trailing null bytes. >> > > UTF-8 encoding works with null bytes. That is one of the reasons it is so > popular. > > Thinking more about it, the easiest thing to do might be to make the S dtype a UTF-8 encoding. Most of the machinery to deal with that is already in place. That change might affect some users though, and we might need to do some work to make it backwards compatible with python 2. 
Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
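Chuck's remark that "UTF-8 encoding works with null bytes" can be made concrete: by design, no byte of a UTF-8 multi-byte sequence is ever 0x00 (only U+0000 itself encodes to a NUL), so NumPy's null-padding convention for fixed-width strings would stay unambiguous. A quick check over the first few thousand code points:

```python
# Encode every code point from U+0001 up to (but not including) U+2000
# and verify that no NUL byte ever appears in the encoded stream.
chars = [chr(c) for c in range(1, 0x2000)]
encoded = ''.join(chars).encode('utf-8')
assert 0 not in encoded            # no NUL bytes in any multi-byte sequence
assert len(encoded) > len(chars)   # non-ASCII code points take >1 byte
```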
Re: [Numpy-discussion] String type again.
In 0.15.0, pandas will have full-fledged support for categoricals, which in effect allow you to map a smaller number of strings to integers. This is now in pandas master: http://pandas-docs.github.io/pandas-docs-travis/categorical.html Feedback welcome! > On Jul 14, 2014, at 1:00 PM, Olivier Grisel wrote: > > 2014-07-13 19:05 GMT+02:00 Alexander Belopolsky : >> >>> On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith wrote: >>> >>> I feel like for most purposes, what we *really* want is a variable length >>> string dtype (I.e., where each element can be a different length.). >> >> >> >> I've been toying with the idea of creating an array type for interned >> strings. In many applications dealing with large arrays of variable size >> strings, the strings come from a relatively short set of names. Arrays of >> interned strings can be manipulated very efficiently because in may respects >> they are just like arrays of integers. > > +1 I think this is why pandas is using dtype=object to load string > data: in many cases short string values are used to represent > categorical variables with a comparatively small cardinality of > possible values for a dataset with comparatively numerous records. > > In that case the dtype=object is not that bad as it just stores > pointer on string objects managed by Python. It's possible to intern > the strings manually at load time (I don't know if pandas or python > already do it automatically in that case). The integer semantics is > good for that case. Having an explicit dtype might be even better. > > -- > Olivier > http://twitter.com/ogrisel - http://github.com/ogrisel > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String type again.
On Mon, Jul 14, 2014 at 10:39 AM, Andrew Collette wrote: > For storing data in HDF5 (PyTables or h5py), it would be somewhat > cleaner if either ASCII or UTF-8 are used, as these are the only two > charsets officially supported by the library. good argument for ASCII, but utf-8 is a bad idea, as there is no 1:1 correspondence between length of string in bytes and length in characters -- as numpy needs to pre-allocate a defined number of bytes for a dtype, there is a disconnect between the user and numpy as to how long a string is being stored...this isn't a problem for immutable strings, and less of a problem for HDF, as you can determine how many bytes you need before you write the file (or does HDF support var-length elements?) > Latin-1 would require a > custom read/write converter, which isn't the end of the world "custom"? it would be an encoding operation -- which you'd need to go from utf-8 to/from unicode anyway. So you would lose the ability to have a nice 1:1 binary representation map between numpy and HDF... good argument for ASCII, I guess. Or for HDF to use latin-1 ;-) Does HDF enforce ascii-only? what does it do with the > 127 values? > would be tricky to do in a correct way, and likely somewhat slow. > We'd also run into truncation issues since certain latin-1 chars > become multibyte sequences in UTF8. > that's the whole issue with UTF-8 -- it needs to be addressed somewhere, and the numpy-HDF interface seems like a smarter place to put it than the numpy-user interface! I assume 'a' strings would still be null-padded? yup. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
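Chris's point about bytes vs characters in UTF-8 can be made concrete (the example string is ours, chosen to include characters common in science):

```python
s = '5\xb0 \xb1 0.1\xb0'                 # '5° ± 0.1°': 9 characters
assert len(s) == 9
assert len(s.encode('latin-1')) == 9      # latin-1: one byte per character
assert len(s.encode('utf-8')) == 12       # utf-8: ° and ± take two bytes each
```

So a fixed-width UTF-8 field sized for 9 "characters" silently holds fewer than 9 actual characters once non-ASCII symbols appear, which is exactly the truncation risk being discussed.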
[Numpy-discussion] `allclose` vs `assert_allclose`
Is there any reason why the defaults for `allclose` and `assert_allclose` differ? This makes debugging a broken test much more difficult. More importantly, using an absolute tolerance of 0 causes failures for some common cases. For example, if two values are very close to zero, a test will fail: np.testing.assert_allclose(0, 1e-14) Git blame suggests the change was made in the following commit, but I guess that change only reverted to the original behavior. https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf It seems like the defaults for `allclose` and `assert_allclose` should match, and an absolute tolerance of 0 is probably not ideal. I guess this is a pretty big behavioral change, but the current default for `assert_allclose` doesn't seem ideal. Thanks, -Tony ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] __numpy_ufunc__
On Mon, Jul 14, 2014 at 8:22 PM, Charles R Harris wrote: > Hi All, > > Julian has raised the question of including numpy_ufunc in numpy 1.9. I > don't feel strongly one way or the other, but it doesn't seem to be > finished yet and 1.10 might be a better place to work out the remaining > problems along with the astropy folks testing possible uses. > > Thoughts? > It's already in, so do you mean not using? Would help to know what the issue is, because it's finished enough that it's already used in a released version of scipy (in sparse matrices). Ralf ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String type again.
On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith wrote: > On 12 Jul 2014 23:06, "Charles R Harris" > wrote: > > > > As previous posts have pointed out, Numpy's `S` type is currently > treated as a byte string, which leads to more complicated code in python3. > OTOH, the unicode type is stored as UCS4, which consumes a lot of space, > especially for ascii strings. This note proposes to adapt the currently > existing 'a' type letter, currently aliased to 'S', as a new fixed encoding > dtype. Python 3.3 introduced two one byte internal representations for > unicode strings, ascii and latin1. Ascii has the advantage that it is a > subset of UTF-8, whereas latin1 has a few more symbols. Another possibility > is to just make it an UTF-8 encoding, but I think this would involve more > overhead as Python would need to determine the maximum character size. > These are just preliminary thoughts, comments are welcome. > > I feel like for most purposes, what we *really* want is a variable length > string dtype (I.e., where each element can be a different length.). Pandas > pays quite some price in overhead to fake this right now. Adding such a > thing will cause some problems regarding compatibility (what to do with > array(["foo"])) and education, but I think it's worth it in the long run. A > variable length string with out of band storage also would allow for a lot > of py3.3-style storage tricks of we want then. > > Given that, though, I'm a little dubious about adding a third fixed length > string type, since it seems like it might be a temporary patch, yet raises > the prospect of having to indefinitely support *5* distinct string types (3 > of which will map to py3 str)... > > OTOH, fixed length nul padded latin1 would be useful for various flat file > reading tasks. 
> As one of the original agitators for this, let me re-iterate that what the astronomical community *really* wants is the original proposal as described by Chris Barker [1] and essentially what Charles said. We have large data archives that have ASCII string data in binary formats like FITS and HDF5. The current readers for those datasets present users with numpy S data types, which in Python 3 cannot be compared to str (unicode) literals. In many cases those datasets are large, and in my case I regularly deal with multi-Gb sized bytestring arrays. Converting those to a U dtype is not practical. This issue is the sole blocker that I personally have in beginning to move our operations code base to be Python 3 compatible, and eventually actually baselining Python 3. A variable length string would be great, but it feels like a different (and more difficult) problem to me. If, however, this can be the solution to the problem I described, and it can be implemented in a finite time, then I'm all for it! :-) I hate begging for features with no chance of contributing much to the implementation (lacking the necessary expertise in numpy internals). I would be happy to draft a NEP if that will help the process. Cheers, Tom [1]: http://mail.scipy.org/pipermail/numpy-discussion/2014-January/068622.html > -n > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
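For readers hitting the Python 3 comparison problem Tom describes, a small sketch (array contents ours): data read from FITS/HDF5 arrives as an S dtype, and a str literal never matches the bytes elements, so an explicit decode is needed before comparing.

```python
import numpy as np

names = np.array([b'alpha', b'beta'])   # dtype 'S5', as a FITS/HDF5 reader returns it
# On Python 3, `names == 'alpha'` does not match the bytes elements.
# Decoding first gives the comparison users expect:
mask = np.char.decode(names, 'ascii') == 'alpha'
```

For multi-Gb arrays this decode doubles (or, for UCS4, quadruples) memory, which is why a one-byte text dtype is being requested.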
Re: [Numpy-discussion] Bug in np.cross for 2D vectors
On Tue, Jul 15, 2014 at 2:22 AM, Neil Hodgson wrote: > Hi, > > We came across this bug while using np.cross on 3D arrays of 2D vectors. > What version of numpy are you using? This should already be solved in numpy master, and be part of the 1.9 release. Here's the relevant commit, although the code has been cleaned up a bit in later ones: https://github.com/numpy/numpy/commit/b9454f50f23516234c325490913224c3a69fb122 Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
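For context, a minimal example of what np.cross does with 2D vectors (values are ours): for 2-element inputs it returns the scalar z-component of the implied 3D cross product, and the bug in question concerned how this was broadcast over arrays of such vectors.

```python
import numpy as np

a = np.array([[1.0, 0.0], [0.0, 2.0]])   # two 2D vectors
b = np.array([[0.0, 1.0], [3.0, 0.0]])
z = np.cross(a, b)                        # x1*y2 - y1*x2 for each pair
# z == [1.0, -6.0]
```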
Re: [Numpy-discussion] __numpy_ufunc__
Weirdly, I never received Chuck's original email in this thread. Should some list admin be informed? I also am not sure what/where Julian's comments were, so I second the call for context :-). Putting it off until 1.10 doesn't seem like an obviously bad idea to me, but specifics would help... (__numpy_ufunc__ is the new system for allowing arbitrary third party objects to override how ufuncs are applied to them, i.e. it means np.sin(sparsemat) and np.sin(gpuarray) can be defined to do something sensible. Conceptually it replaces the old __array_prepare__/__array_wrap__ system, which was limited to ndarray subclasses and has major limits on what you can do. Of course __array_prepare/wrap__ will also continue to be supported for compatibility.) -n On 16 Jul 2014 00:10, "Benjamin Root" wrote: > Perhaps a bit of context might be useful? How is numpy_ufunc different > from the ufuncs that we know and love? What are the known implications? > What are the known shortcomings? Are there ABI and/or API concerns between > 1.9 and 1.10? > > Ben Root > > > On Mon, Jul 14, 2014 at 2:22 PM, Charles R Harris < > charlesr.har...@gmail.com> wrote: > >> Hi All, >> >> Julian has raised the question of including numpy_ufunc in numpy 1.9. I >> don't feel strongly one way or the other, but it doesn't seem to be >> finished yet and 1.10 might be a better place to work out the remaining >> problems along with the astropy folks testing possible uses. >> >> Thoughts? >> >> Chuck >> >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
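To illustrate the idea Nathaniel describes, here is a toy container defining the hook. Note the method name and signature follow the draft protocol as discussed at the time and are hypothetical here; released NumPy versions ultimately shipped a reworked `__array_ufunc__` instead, so the sketch invokes the hook manually rather than relying on ufunc dispatch:

```python
import numpy as np

class Wrapped:
    """Toy container sketching the proposed ufunc-override hook."""
    def __init__(self, data):
        self.data = np.asarray(data)

    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        # unwrap any Wrapped inputs, apply the ufunc, re-wrap the result
        arrays = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(getattr(ufunc, method)(*arrays, **kwargs))

obj = Wrapped([0.0, np.pi / 2])
# With the protocol in place, np.sin(obj) would route here automatically;
# we call the hook directly for illustration:
res = obj.__numpy_ufunc__(np.sin, '__call__', 0, (obj,))
```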