Re: [Python-Dev] range objects in 3.x
On Thu, 29 Sep 2011 11:36:21 +1300, Greg Ewing wrote:

>> I do hope, though, that the chosen name is *not*:
>>
>> - 'interval'
>>
>> - 'interpolate' or similar
>
> Would 'subdivide' be acceptable?

I'm not great at finding names, and I don't totally love it, but I certainly don't see any problems with it. It is, after all, a subdivision of an interval :)

I think 'grid' has been mentioned, and I think it's reasonable, even though most people probably associate the word with a two-dimensional object. But grids can have any desired dimensionality.

Now, in fact, numpy has a slightly demented (but extremely useful) ogrid object:

In [7]: ogrid[0:10:3]
Out[7]: array([0, 3, 6, 9])

In [8]: ogrid[0:10:3j]
Out[8]: array([ 0., 5., 10.])

Yup, that's a complex slice :)

So if python named the builtin 'grid', I think it would go well with existing numpy habits.

Cheers,

f
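For readers unfamiliar with the ogrid convention: a real step behaves like range (endpoint excluded), while an imaginary step means "this many points, endpoint included". A rough pure-Python sketch of those semantics follows; the name 'grid' is purely hypothetical and this is not numpy's implementation:

def grid(start, stop, step):
    # Hypothetical sketch of ogrid-style slice semantics.
    if isinstance(step, complex):
        # Imaginary step: magnitude is a point count, endpoint included,
        # mirroring ogrid[0:10:3j] -> [0., 5., 10.]
        n = int(abs(step))
        if n == 1:
            return [float(start)]
        d = (stop - start) / (n - 1)
        return [start + i * d for i in range(n)]
    # Real step: ordinary range semantics, endpoint excluded.
    result, x = [], start
    while x < stop:
        result.append(x)
        x += step
    return result

print(grid(0, 10, 3))   # [0, 3, 6, 9]
print(grid(0, 10, 3j))  # [0.0, 5.0, 10.0]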
Re: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array
On Thursday, September 29, 2011 at 02:07:02, Benjamin Peterson wrote:
> 2011/9/28 victor.stinner :
> > http://hg.python.org/cpython/rev/36fc514de7f0
> > changeset: 72512:36fc514de7f0
> > user: Victor Stinner
> > date: Thu Sep 29 01:12:24 2011 +0200
> > summary:
> > Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an
> > array
> >
> > Move other various macros to pymacro.h
> >
> > Thanks Rusty Russell for having written these amazing C macros!
> >
> > files:
> > Include/Python.h  | 19 +
> > Include/pymacro.h | 57 +++
>
> Do we really need a new file? Why not pyport.h where other compiler stuff
> goes?

I'm not sure that pyport.h is the right place to add Py_MIN, Py_MAX and Py_ARRAY_LENGTH. pyport.h looks to be related to things specific to the platform, like INT_MAX, Py_VA_COPY, ... pymacro.h contains platform-independent macros.

I would like to suggest the opposite: move platform-independent macros from pyport.h to pymacro.h :-) Suggestions:

- Py_ARITHMETIC_RIGHT_SHIFT
- Py_FORCE_EXPANSION
- Py_SAFE_DOWNCAST

Victor
Re: [Python-Dev] PEP 393 close to pronouncement
> Resizing
> --------
>
> Codecs use resizing a lot. Given that PyCompactUnicodeObject
> does not support resizing, most decoders will have to use
> PyUnicodeObject and thus not benefit from the memory footprint
> advantages of e.g. PyASCIIObject.

Wrong. Even if you create a string using the legacy API (e.g. PyUnicode_FromUnicode), the string will be quickly compacted to use the most efficient memory storage (depending on the maximum character). "Quickly": at the first call to PyUnicode_READY. Python tries to make all strings ready as early as possible.

> PyASCIIObject has a wchar_t *wstr pointer - I guess this should
> be a char *str pointer, otherwise, where's the memory footprint
> advantage (esp. on Linux where sizeof(wchar_t) == 4) ?

For pure ASCII strings, you don't have to store a pointer to the UTF-8 string, nor the length of the UTF-8 string (in bytes), nor the length of the wchar_t string (in wide characters): the length is always the length of the "ASCII" string, and the UTF-8 string is shared with the ASCII string. The structure is much smaller thanks to these optimizations, and so Python 3.3 uses less memory than 2.7 for ASCII strings, even for short strings.

> I also don't see a reason to limit the UCS1 storage version
> to ASCII. Accordingly, the object should be called PyLatin1Object
> or PyUCS1Object.

Latin1 is less interesting: you cannot share the length/data fields with utf8 or wstr. We didn't add a special case for Latin1 strings (except using Py_UCS1* strings to store their characters).

> Furthermore, determining len(obj) will require a loop over
> the data, checking for surrogate code points. A simple memcpy()
> is no longer enough.

Wrong. len(obj) gives the "right" result (see the long discussion about what the length of a string is in a previous thread...) in O(1), since it's computed when the string is created.

> ... in practice you only
> very rarely see any non-BMP code points in your data. Making
> all Python users pay for the needs of a tiny fraction is
> not really fair. Remember: practicality beats purity.

The creation of the string is maybe a little bit slower (especially when you have to scan the string twice to first get the maximum character), but I think that this slowdown is smaller than the speedup allowed by the PEP. Because ASCII strings are now char*, I think that processing ASCII strings is faster because the CPU can cache more data (close to the CPU).

We can do better optimization on ASCII and Latin1 strings (it's faster to manipulate char* than uint16_t* or uint32_t*). For example, str.center(), str.ljust(), str.rjust() and str.zfill() now use the very fast memset() function for Latin1 strings to pad the string. Another example: duplicating a string (or creating a substring) should be faster just because you have less data to copy (e.g. 10 bytes for a string of 10 Latin1 characters vs 20 or 40 bytes with Python 3.2).

The two most common encodings in the world are ASCII and UTF-8. With PEP 393, encoding to ASCII or UTF-8 is free: you don't have to encode anything, you have the encoded char* buffer directly (whereas you have to convert 16/32-bit wchar_t to char* in Python 3.2, even for pure ASCII). (It's also free to encode a "Latin1" Unicode string to Latin1.)

With PEP 393, we never have to decode UTF-16 anymore when iterating over code points to correctly support non-BMP characters (which was required before in narrow builds, e.g. on Windows). Iterating over code points is just a simple loop; there is no need to check whether each character is in the range U+D800-U+DFFF.
There are other funny tricks (optimizations). For example, text.replace(a, b) knows that there is nothing to do if maxchar(a) > maxchar(text), where maxchar(obj) just requires reading an attribute of the string. Think about ASCII and non-ASCII strings: pure_ascii.replace('\xe9', '') now just creates a new reference...

I don't think that Martin wrote his PEP to be able to implement all these optimizations, but they are an interesting side effect of his PEP :-)

> The table only lists string sizes up to 8 code points. The memory
> savings for these are really only significant for ASCII
> strings on 64-bit platforms, if you use the default UCS2
> Python build as basis.

In the 32 different cases, PEP 393 is better in 29 cases and "just" as good as Python 3.2 in 3 corner cases:

- 1 ASCII, 16-bit wchar, 32-bit
- 1 Latin1, 32-bit wchar, 32-bit
- 2 Latin1, 32-bit wchar, 32-bit

Do you really care about these corner cases? See the more realistic benchmark in Martin's previous email ("PEP 393 memory savings update"): PEP 393 not only uses 3x less memory than 3.2, it also uses *less* memory than Python 2.7, even though Python 3 uses Unicode for everything!

> For larger strings, I expect the savings to be more significant.

Sure.

> OTOH, a single non-BMP code point in such a string would cause
> the savings to drop significantly again.

In this case, it's just as good a
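The replace() shortcut is observable from pure Python on a PEP 393 build (CPython 3.3+); note that returning the original object is an implementation detail of CPython, not a documented guarantee:

pure_ascii = "hello world"

# '\xe9' (é) is above the string's maxchar, so there is nothing
# to replace and CPython can hand back the very same object.
result = pure_ascii.replace('\xe9', '')
print(result is pure_ascii)  # True on CPython 3.3+ (implementation detail)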
Re: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array
2011/9/28 victor.stinner :
> http://hg.python.org/cpython/rev/36fc514de7f0
> changeset: 72512:36fc514de7f0
> user: Victor Stinner
> date: Thu Sep 29 01:12:24 2011 +0200
> summary:
> Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array
>
> Move other various macros to pymacro.h
>
> Thanks Rusty Russell for having written these amazing C macros!
>
> files:
> Include/Python.h  | 19 +
> Include/pymacro.h | 57 +++

Do we really need a new file? Why not pyport.h where other compiler stuff goes?

--
Regards,
Benjamin
Re: [Python-Dev] [Python-checkins] cpython: Implement PEP 393.
Is there some reason str.format had such major surgery done to it? It appears parts of it were removed from stringlib.

I had not even thought to look at the code before it was merged, as it never occurred to me anyone would do that. I left it in stringlib even in 3.x because there's the occasional talk of adding bytes.bformat, and since all of the code works well with stringlib (since it was used by str and unicode in 2.x), it made sense to leave it there. In addition, there are outstanding patches that are now broken.

I'd prefer it return to how it used to be, with just the minimum changes required for PEP 393 made to it.

Thanks.
Eric.

On 9/28/2011 2:35 AM, martin.v.loewis wrote:
> http://hg.python.org/cpython/rev/8beaa9a37387
> changeset: 72475:8beaa9a37387
> user: Martin v. Löwis
> date: Wed Sep 28 07:41:54 2011 +0200
> summary:
> Implement PEP 393.
>
> files:
> Doc/c-api/unicode.rst | 9 +
> Include/Python.h | 5 +
> Include/complexobject.h | 5 +-
> Include/floatobject.h | 5 +-
> Include/longobject.h | 6 +-
> Include/pyerrors.h | 6 +
> Include/pyport.h | 3 +
> Include/unicodeobject.h | 783 +-
> Lib/json/decoder.py | 3 +-
> Lib/test/json_tests/test_scanstring.py | 11 +-
> Lib/test/test_codeccallbacks.py | 7 +-
> Lib/test/test_codecs.py | 4 +
> Lib/test/test_peepholer.py | 4 -
> Lib/test/test_re.py | 7 +
> Lib/test/test_sys.py | 38 +-
> Lib/test/test_unicode.py | 41 +-
> Makefile.pre.in | 6 +-
> Misc/NEWS | 2 +
> Modules/_codecsmodule.c | 8 +-
> Modules/_csv.c | 2 +-
> Modules/_ctypes/_ctypes.c | 6 +-
> Modules/_ctypes/callproc.c | 8 -
> Modules/_ctypes/cfield.c | 64 +-
> Modules/_cursesmodule.c | 7 +-
> Modules/_datetimemodule.c | 13 +-
> Modules/_dbmmodule.c | 12 +-
> Modules/_elementtree.c | 31 +-
> Modules/_io/_iomodule.h | 2 +-
> Modules/_io/stringio.c | 69 +-
> Modules/_io/textio.c | 352 +-
> Modules/_json.c | 252 +-
> Modules/_pickle.c | 4 +-
> Modules/_sqlite/connection.c | 19 +-
> Modules/_sre.c | 382 +-
> Modules/_testcapimodule.c | 2 +-
> Modules/_tkinter.c | 70 +-
> Modules/arraymodule.c | 8 +-
> Modules/md5module.c | 10 +-
> Modules/operator.c | 27 +-
> Modules/pyexpat.c | 11 +-
> Modules/sha1module.c | 10 +-
> Modules/sha256module.c | 10 +-
> Modules/sha512module.c | 10 +-
> Modules/sre.h | 4 +-
> Modules/syslogmodule.c | 14 +-
> Modules/unicodedata.c | 28 +-
> Modules/zipimport.c | 141 +-
> Objects/abstract.c | 4 +-
> Objects/bytearrayobject.c | 147 +-
> Objects/bytesobject.c | 127 +-
> Objects/codeobject.c | 15 +-
> Objects/complexobject.c | 19 +-
> Objects/dictobject.c | 20 +-
> Objects/exceptions.c | 26 +-
> Objects/fileobject.c | 17 +-
> Objects/floatobject.c | 19 +-
> Objects/longobject.c | 84 +-
> Objects/moduleobject.c | 9 +-
> Objects/object.c | 10 +-
> Objects/setobject.c | 40 +-
> Objects/stringlib/count.h | 9 +-
> Objects/stringlib/eq.h | 23 +-
> Objects/stringlib/fastsearch.h | 4 +-
> Objects/stringlib/find.h | 31 +-
> Objects/stringlib/formatter.h | 1516 --
> Objects/stringlib/localeutil.h | 27 +-
> Objects/stringlib/partition.h | 12 +-
> Objects/stringlib/split.h | 26 +-
> Objects/stringlib/string_format.h | 1385 --
> Objects/stringlib/stringdefs.h | 2 +
> Objects/stringlib/ucs1lib.h | 35 +
> Objects/stringlib/ucs2lib.h | 34 +
> Objects/stringlib/ucs4lib.h | 34 +
> Objects/stringlib/undef.h | 10 +
> Objects/stringlib/unicode_format.h | 1416 ++
> Objects/stringlib/unicodedefs.h | 2 +
> Obj
Re: [Python-Dev] range objects in 3.x
Fernando Perez wrote:
> Now, I *suspect* (but don't remember for sure) that the option to have it
> right-hand-open-ended was to match the mental model people have for range:
>
> In [5]: linspace(0, 10, 10, endpoint=False)
> Out[5]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>
> In [6]: range(0, 10)
> Out[6]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

My guess would be that it's so you can concatenate two sequences created with linspace covering adjacent ranges and get the same result as a single linspace call covering the whole range.

> I do hope, though, that the chosen name is *not*:
>
> - 'interval'
>
> - 'interpolate' or similar

Would 'subdivide' be acceptable?

--
Greg
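That guess is easy to check with numpy itself; with endpoint=False, two adjacent calls tile seamlessly into one:

import numpy as np

left = np.linspace(0, 5, 5, endpoint=False)    # [0., 1., 2., 3., 4.]
right = np.linspace(5, 10, 5, endpoint=False)  # [5., 6., 7., 8., 9.]
whole = np.linspace(0, 10, 10, endpoint=False)

# The two halves concatenate to exactly the single full-range call.
assert np.array_equal(np.concatenate([left, right]), whole)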
Re: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393
In article <74f6adfa-874d-4bac-b304-ce8b12d80...@masklinn.net>, Xavier Morel wrote:
> On 2011-09-28, at 19:49 , Martin v. Löwis wrote:
> > Thanks for the advice - I didn't expect that Apple ships three compilers
> Yeah I can understand that, they're in the middle of the transition but
> Clang is not quite there yet so...

BTW, at the moment we are still using gcc-4.2 (not llvm-gcc nor clang) from Xcode 3 on OS X 10.6 for the 64-bit/32-bit installer builds, and gcc-4.0 on 10.5 for the 32-bit-only installer builds. We will probably revisit that as we get closer to the 3.3 alphas and betas.

--
Ned Deily, n...@acm.org
Re: [Python-Dev] range objects in 3.x
On Tue, 27 Sep 2011 11:25:48 +1000, Steven D'Aprano wrote:

> The audience for numpy is a small minority of Python users, and they

Certainly, though I'd like to mention that scientific computing is a major success story for Python, so hopefully it's a minority with something to contribute.

> tend to be more sophisticated. I'm sure they can cope with two functions
> with different APIs

No problem with having different APIs, but in that case I'd hope the builtin wouldn't be named linspace, to avoid confusion. In numpy/scipy we try hard to avoid collisions with existing builtin names; hopefully in this case we can prevent the reverse by having a dialogue.

> While continuity of API might be a good thing, we shouldn't accept a
> poor API just for the sake of continuity. I have some criticisms of the
> linspace API.
>
> numpy.linspace(start, stop, num=50, endpoint=True, retstep=False)
>
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html
>
> * It returns a sequence, which is appropriate for numpy but in standard
> Python it should return an iterator or something like a range object.

Sure, no problem there.

> * Why does num have a default of 50? That seems to be an arbitrary
> choice.

Yup. linspace was modeled after matlab's identically named command:

http://www.mathworks.com/help/techdoc/ref/linspace.html

but I have no idea why the author went with 50 instead of 100 as the default (not that 100 is any better, just that it was matlab's choice). Given how linspace is often used for plotting, 100 is arguably a more sensible choice to get reasonable graphs on normal-resolution displays at typical sizes, absent adaptive plotting algorithms.

> * It arbitrarily singles out the end point for special treatment. When
> integrating, it is just as common for the first point to be singular as
> the end point, and therefore needing to be excluded.

Numerical integration is *not* the focus of linspace(): in numerical integration, if an end point is singular you have an improper integral and *must* approach the singularity much more carefully than by simply dropping the last point and hoping for the best. Whether you can get away with using (desired_end_point - very_small_number) --the dumb, naive approach-- or not depends a lot on the nature of the singularity. Since numerical integration is a complex and specialized domain and the subject of an entire subcomponent of the (much bigger than numpy) scipy library, there's no point in arguing the linspace API based on numerical integration considerations.

Now, I *suspect* (but don't remember for sure) that the option to have it right-hand-open-ended was to match the mental model people have for range:

In [5]: linspace(0, 10, 10, endpoint=False)
Out[5]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [6]: range(0, 10)
Out[6]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

I'm not arguing this was necessarily a good idea, just offering my theory on how it came to be. Perhaps R. Kern or one of the numpy lurkers in here will pitch in with a better recollection.

> * If you exclude the end point, the stepsize, and hence the values
> returned, change:
>
> >>> linspace(1, 2, 4)
> array([ 1.    ,  1.3333,  1.6667,  2.    ])
> >>> linspace(1, 2, 4, endpoint=False)
> array([ 1.  ,  1.25,  1.5 ,  1.75])
>
> This surprises me. I expect that excluding the end point will just
> exclude the end point, i.e. return one fewer point. That is, I expect
> num to count the number of subdivisions, not the number of points.

I find it very natural.
It's important to remember that *the whole point* of linspace's existence is to provide arrays with a known, fixed number of points:

In [17]: npts = 10

In [18]: len(linspace(0, 5, npts))
Out[18]: 10

In [19]: len(linspace(0, 5, npts, endpoint=False))
Out[19]: 10

So the invariant to preserve is *precisely* the number of points, not the step size. As Guido has pointed out several times, the value of this function is precisely to steer people *away* from thinking of step sizes in a context where they are more likely than not going to get it wrong. So linspace focuses on a guaranteed number of points, and lets the step-size chips fall where they may.

> * The retstep argument changes the return signature from => array to =>
> (array, number). I think that's a pretty ugly thing to do. If linspace
> returned a special iterator object, the step size could be exposed as an
> attribute.

Yup, it's not pretty, but it's understandable in numpy's context, a library that has a very strong design focus around arrays, and numpy arrays don't have writable attributes:

In [20]: a = linspace(0, 10)

In [21]: a.stepsize = 0.1
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/fperez/ in ()
----> 1 a.stepsize = 0.1

AttributeError: 'numpy.ndarray' object has no attribute 'stepsize'

So while
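The invariant Fernando describes fits in a few lines of pure Python; this is only a sketch of the semantics under discussion, not a proposed implementation:

def linspace(start, stop, num=50, endpoint=True):
    # The point count is the invariant; the step size falls out of it.
    if num == 1:
        yield float(start)
        return
    step = (stop - start) / ((num - 1) if endpoint else num)
    for i in range(num):
        yield start + i * step

print(list(linspace(1, 2, 4)))                  # [1.0, 1.333..., 1.666..., 2.0]
print(list(linspace(1, 2, 4, endpoint=False)))  # [1.0, 1.25, 1.5, 1.75]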
[Python-Dev] What it takes to change a single keyword.
Hi,

First of all, I am sincerely sorry if this is the wrong mailing list to ask this question. I checked the descriptions of a couple of other mailing lists, and this one seemed the most suitable.

Here is my question: let's say I want to change a single keyword, say the import keyword, to be spelled as something else, like its translation into my language. I guess it would be more complicated than modifying Grammar/Grammar, but I can't be sure which files should get edited. I'm asking this because I am trying to figure out whether I could translate keywords into another language without affecting the behaviour of the language.

--
http://yasar.serveblog.net/
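A lighter-weight experiment than editing Grammar/Grammar and rebuilding is a source-to-source pass with the tokenize module; the sketch below rewrites translated spellings back to real keywords before compiling, where 'içe_aktar' is purely a hypothetical example:

import io
import tokenize

# Hypothetical translated keyword -> real Python keyword.
TRANSLATIONS = {"içe_aktar": "import"}

def translate(source):
    # Rewrite translated spellings back to Python keywords.
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in TRANSLATIONS:
            tok = tok._replace(string=TRANSLATIONS[tok.string])
        result.append(tok)
    return tokenize.untokenize(result)

exec(translate("içe_aktar sys\n"))  # behaves exactly like "import sys"

Since only the surface spelling is rewritten, the language's behaviour is unaffected; a true keyword change would still mean editing Grammar/Grammar and regenerating the parser.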
Re: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393
On 2011-09-28, at 19:49 , Martin v. Löwis wrote:
> Thanks for the advice - I didn't expect that Apple ships three compilers…

Yeah I can understand that, they're in the middle of the transition but Clang is not quite there yet so...
Re: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393
> Does Clang also fail to compile this? Clang was updated from 1.6 to 2.0
> with Xcode 4, worth a try.

clang indeed works fine.

> Also, from your version listing it seems to be llvm-gcc (gcc frontend with
> llvm backend I think), is there no more straight gcc (with gcc frontend
> and backend)?

/usr/bin/cc and /usr/bin/gcc both link to llvm-gcc-4.2. However, there still is /usr/bin/gcc-4.2. Using that, Python also compiles correctly - so I have changed the gcc link on my system.

Thanks for the advice - I didn't expect that Apple ships three compilers...

Regards,
Martin
Re: [Python-Dev] PEP 393 close to pronouncement
> Codecs use resizing a lot. Given that PyCompactUnicodeObject
> does not support resizing, most decoders will have to use
> PyUnicodeObject and thus not benefit from the memory footprint
> advantages of e.g. PyASCIIObject.

No, codecs have been rewritten to not use resizing.

> PyASCIIObject has a wchar_t *wstr pointer - I guess this should
> be a char *str pointer, otherwise, where's the memory footprint
> advantage (esp. on Linux where sizeof(wchar_t) == 4) ?

That's the Py_UNICODE representation, for backwards compatibility. It's normally NULL.

> I also don't see a reason to limit the UCS1 storage version
> to ASCII. Accordingly, the object should be called PyLatin1Object
> or PyUCS1Object.

No: in the ASCII case, the UTF-8 length can be shared with the regular string length - not so for Latin-1 characters above 127.

> Typedef'ing Py_UNICODE to wchar_t and using wchar_t in existing
> code will cause problems on some systems where wchar_t is a
> signed type.
>
> Python assumes that Py_UNICODE is unsigned and thus doesn't
> check for negative values or takes these into account when
> doing range checks or code point arithmetic.
>
> On such platforms where wchar_t is signed, it is safer to
> typedef Py_UNICODE to unsigned wchar_t.

No. Py_UNICODE values *must* be in the range 0..17*2**16. Values larger than 17*2**16 are just as bad as negative values, so having Py_UNICODE unsigned doesn't improve anything.

> Py_UNICODE access to the objects assumes that len(obj) ==
> length of the Py_UNICODE buffer. The PEP suggests that length
> should not take surrogates into account on UCS2 platforms
> such as Windows. This causes len(obj) to not match len(wstr).

Correct.

> As a result, Py_UNICODE access to the Unicode objects breaks
> when surrogate code points are present in the Unicode object
> on UCS2 platforms.

Incorrect. What specifically do you think would break?

> The PEP also does not explain how lone surrogates will be
> handled with respect to the length information.

Just as any other code point. Python does not special-case surrogate code points anymore.

> Furthermore, determining len(obj) will require a loop over
> the data, checking for surrogate code points. A simple memcpy()
> is no longer enough.

No, it won't. The length of the Unicode object is stored in the length field.

> I suggest to drop the idea of having len(obj) not count
> wstr surrogate code points to maintain backwards compatibility
> and allow for working with lone surrogates.

Backwards compatibility is fully preserved by PyUnicode_GET_SIZE returning the size of the Py_UNICODE buffer. PyUnicode_GET_LENGTH returns the true length of the Unicode object.

> Note that the whole surrogate debate does not have much to
> do with this PEP, since it's mainly about memory footprint
> savings. I'd also urge to do a reality check with respect
> to surrogates and non-BMP code points: in practice you only
> very rarely see any non-BMP code points in your data. Making
> all Python users pay for the needs of a tiny fraction is
> not really fair. Remember: practicality beats purity.

That's the whole point of the PEP. You only pay for what you actually need, and in most cases, it's ASCII.

> For best performance, each algorithm will have to be implemented
> for all three storage types.

This will be a trade-off. I think most developers will be happy with a single version covering all three cases, especially as it's much more maintainable.
Kind regards,
Martin
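The pay-for-what-you-need effect is visible from pure Python on a PEP 393 build; exact byte counts vary by platform and CPython version, so treat the output as indicative only:

import sys

# Four 4-character strings whose widest character differs:
# ASCII and Latin-1 take 1 byte/char, BMP characters 2, non-BMP 4.
for s in ['abcd', '\xe9\xe9\xe9\xe9', '\u20ac\u20ac\u20ac\u20ac', '\U0001d120' * 4]:
    print(repr(s), sys.getsizeof(s))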
Re: [Python-Dev] PEP 393 close to pronouncement
2011/9/28 M.-A. Lemburg :
> Guido van Rossum wrote:
>> Given the feedback so far, I am happy to pronounce PEP 393 as
>> accepted. Martin, congratulations! Go ahead and mark it as Accepted.
>> (But please do fix up the small nits that Victor reported in his
>> earlier message.)
>
> I've been working on feedback for the last few days, but I guess it's
> too late. Here goes anyway...
>
> I've only read the PEP and not followed the discussion due to lack of
> time, so if any of this is no longer valid, that's probably because
> the PEP wasn't updated :-)
>
> Resizing
> --------
>
> Codecs use resizing a lot. Given that PyCompactUnicodeObject
> does not support resizing, most decoders will have to use
> PyUnicodeObject and thus not benefit from the memory footprint
> advantages of e.g. PyASCIIObject.
>
> Data structure
> --------------
>
> The data structure description in the PEP appears to be wrong:
>
> PyASCIIObject has a wchar_t *wstr pointer - I guess this should
> be a char *str pointer, otherwise, where's the memory footprint
> advantage (esp. on Linux where sizeof(wchar_t) == 4) ?
>
> I also don't see a reason to limit the UCS1 storage version
> to ASCII. Accordingly, the object should be called PyLatin1Object
> or PyUCS1Object.

I think the purpose is that if it's only ASCII, no work is needed to encode to UTF-8.

--
Regards,
Benjamin
Re: [Python-Dev] PEP 393 close to pronouncement
Guido van Rossum wrote:
> Given the feedback so far, I am happy to pronounce PEP 393 as
> accepted. Martin, congratulations! Go ahead and mark it as Accepted.
> (But please do fix up the small nits that Victor reported in his
> earlier message.)

I've been working on feedback for the last few days, but I guess it's too late. Here goes anyway...

I've only read the PEP and not followed the discussion due to lack of time, so if any of this is no longer valid, that's probably because the PEP wasn't updated :-)

Resizing
--------

Codecs use resizing a lot. Given that PyCompactUnicodeObject does not support resizing, most decoders will have to use PyUnicodeObject and thus not benefit from the memory footprint advantages of e.g. PyASCIIObject.

Data structure
--------------

The data structure description in the PEP appears to be wrong:

PyASCIIObject has a wchar_t *wstr pointer - I guess this should be a char *str pointer, otherwise, where's the memory footprint advantage (esp. on Linux where sizeof(wchar_t) == 4) ?

I also don't see a reason to limit the UCS1 storage version to ASCII. Accordingly, the object should be called PyLatin1Object or PyUCS1Object.

Here's the version from the PEP:

"""
typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    Py_hash_t hash;
    struct {
        unsigned int interned:2;
        unsigned int kind:2;
        unsigned int compact:1;
        unsigned int ascii:1;
        unsigned int ready:1;
    } state;
    wchar_t *wstr;
} PyASCIIObject;

typedef struct {
    PyASCIIObject _base;
    Py_ssize_t utf8_length;
    char *utf8;
    Py_ssize_t wstr_length;
} PyCompactUnicodeObject;
"""

Typedef'ing Py_UNICODE to wchar_t and using wchar_t in existing code will cause problems on some systems where wchar_t is a signed type.

Python assumes that Py_UNICODE is unsigned and thus doesn't check for negative values or take these into account when doing range checks or code point arithmetic.

On such platforms where wchar_t is signed, it is safer to typedef Py_UNICODE to unsigned wchar_t.

Accordingly, and to prevent further breakage, Py_UNICODE should not be deprecated and should be used instead of wchar_t throughout the code.

Length information
------------------

Py_UNICODE access to the objects assumes that len(obj) == length of the Py_UNICODE buffer. The PEP suggests that length should not take surrogates into account on UCS2 platforms such as Windows. This causes len(obj) to not match len(wstr).

As a result, Py_UNICODE access to the Unicode objects breaks when surrogate code points are present in the Unicode object on UCS2 platforms.

The PEP also does not explain how lone surrogates will be handled with respect to the length information.

Furthermore, determining len(obj) will require a loop over the data, checking for surrogate code points. A simple memcpy() is no longer enough.

I suggest to drop the idea of having len(obj) not count wstr surrogate code points, to maintain backwards compatibility and allow for working with lone surrogates.

Note that the whole surrogate debate does not have much to do with this PEP, since it's mainly about memory footprint savings. I'd also urge to do a reality check with respect to surrogates and non-BMP code points: in practice you only very rarely see any non-BMP code points in your data. Making all Python users pay for the needs of a tiny fraction is not really fair. Remember: practicality beats purity.

API
---

Victor already described the needed changes.

Performance
-----------

The PEP only lists a few low-level benchmarks as basis for the performance decrease. I'm missing some more adequate real-life tests, e.g.
using an application framework such as Django (to the extent this is possible with Python 3) or a server like the Radicale calendar server (which is available for Python 3).

I'd also like to see a performance comparison which specifically uses the existing Unicode APIs to create and work with Unicode objects. Most extensions will use this way of working with the Unicode API, either because they want to support Python 2 and 3, or because the effort it takes to port to the new APIs is too high. The PEP makes some statements that this is slower, but doesn't quantify those statements.

Memory savings
--------------

The table only lists string sizes up to 8 code points. The memory savings for these are really only significant for ASCII strings on 64-bit platforms, if you use the default UCS2 Python build as basis.

For larger strings, I expect the savings to be more significant. OTOH, a single non-BMP code point in such a string would cause the savings to drop significantly again.

Complexity
----------

In order to benefit from the new API, any code that has to deal with low-level Py_UNICODE access to the Unicode objects will have to be adapted. For best performance, each algorithm will have to be implemented for all three storage types. Not doing so will result in a slow-down, if I read the PEP correctly. It's difficult to say of what scale, since that in
Re: [Python-Dev] PEP 393 merged
Congrats! Python 3.3 will be better because of this.

On Wed, Sep 28, 2011 at 12:48 AM, "Martin v. Löwis" wrote:
> I have now merged the PEP 393 implementation into default.
> The main missing piece is the documentation; contributions are
> welcome.

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] unittest missing assertNotRaises
Oops, I accidentally hit Reply instead of Reply to All...

On Wed, Sep 28, 2011 at 1:05 PM, Michael Foord wrote:
> On 27/09/2011 19:59, Laurens Van Houtven wrote:
>> Sure, you just *do* it. The only advantage I see in assertNotRaises is that
>> when that exception is raised, you should (and would) get a failure, not an
>> error.
>
> There are some who don't see the distinction between a failure and an error
> as a useful distinction... I'm becoming more sympathetic to that view.

I agree. Maybe if there were fewer failures posing as errors and errors posing as failures, I'd consider taking the distinction seriously.

The only use case I've personally encountered is with fuzzy tests. The example that comes to mind is one where we had a fairly complex iterative algorithm for learning things from huge amounts of test data, and there were certain criteria (goodness of result, time taken) that had to be satisfied. In that case, "it blew up because someone messed up dependencies" and "it took 3% longer than is allowable" are pretty obviously different...

Considering how exotic that use case is, like I said, I'm not really convinced how generally useful it is :) especially since this isn't even a unit test...

> All the best,
>
> Michael

cheers
lvh
Re: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393
On 2011-09-28, at 13:24 , mar...@v.loewis.de wrote:
> The gcc that Apple ships with the Lion SDK (not sure what Xcode version that
> is)

Xcode 4.1

> I'm not aware of a work-around in the code. My work-around is to use gcc-4.0,
> which is still available on my system from an earlier Xcode installation
> (in /Developer-3.2.6)

Does Clang also fail to compile this? Clang was updated from 1.6 to 2.0 with Xcode 4, worth a try.

Also, from your version listing it seems to be llvm-gcc (gcc frontend with llvm backend, I think); is there no more straight gcc (with gcc frontend and backend)?

FWIW, on 10.6 the default gcc is a straight 4.2:

> gcc --version
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664)

There is an llvm-gcc 4.2, but it uses a slightly different revision of llvm:

> llvm-gcc --version
i686-apple-darwin10-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2333.4)
[Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393
The gcc that Apple ships with the Lion SDK (not sure what Xcode version that is) miscompiles Python now. I've reported this to Apple as bug 10143715; not sure whether there is a public link to this bug report.

In essence, the code

typedef struct {
    long length;
    long hash;
    int state;
    int *wstr;
} PyASCIIObject;

typedef struct {
    PyASCIIObject _base;
    long utf8_length;
    char *utf8;
    long wstr_length;
} PyCompactUnicodeObject;

void *_PyUnicode_compact_data(void *unicode) {
    return ((((PyASCIIObject*)unicode)->state & 0x20) ?
            ((void*)((PyASCIIObject*)(unicode) + 1)) :
            ((void*)((PyCompactUnicodeObject*)(unicode) + 1)));
}

miscompiles (with -O2 -fomit-frame-pointer) to

__PyUnicode_compact_data:
Leh_func_begin1:
    leaq 32(%rdi), %rax
    ret

The compiler version is

gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)

This unconditionally assumes that sizeof(PyASCIIObject) needs to be added to unicode, independent of whether the state bit is set or not.

I'm not aware of a work-around in the code. My work-around is to use gcc-4.0, which is still available on my system from an earlier Xcode installation (in /Developer-3.2.6).

Regards,
Martin
Re: [Python-Dev] range objects in 3.x
Ethan Furman wrote:
> Well, actually, I'd be using it with dates. ;)

Seems to me that one size isn't going to fit all. Maybe we really want two functions:

interpolate(start, end, count)
    Requires a type supporting addition and division, designed to
    work predictably and accurately with floats

extrapolate(start, step, end)
    Works for any type supporting addition, not recommended for floats

--
Greg
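A sketch of what that pair might look like as generators -- the names and signatures are Greg's, the bodies are only a guess at the intended semantics:

from datetime import date, timedelta

def interpolate(start, end, count):
    # count evenly spaced points from start to end inclusive;
    # needs addition and division, behaves predictably for floats.
    if count == 1:
        yield start
        return
    span = end - start
    for i in range(count):
        yield start + span * i / (count - 1)

def extrapolate(start, step, end):
    # start, start+step, ... while still short of end;
    # needs only addition and comparison, so dates work too.
    current = start
    while current < end:
        yield current
        current += step

print(list(interpolate(1.0, 2.0, 4)))  # [1.0, 1.333..., 1.666..., 2.0]
print(list(extrapolate(date(2011, 9, 1), timedelta(days=7), date(2011, 10, 1))))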
Re: [Python-Dev] unittest missing assertNotRaises
On 27/09/2011 19:59, Laurens Van Houtven wrote:
> Sure, you just *do* it. The only advantage I see in assertNotRaises is that
> when that exception is raised, you should (and would) get a failure, not an
> error.

There are some who don't see the distinction between a failure and an error as a useful distinction... I'm becoming more sympathetic to that view.

All the best,

Michael

--
http://www.voidspace.org.uk/
Re: [Python-Dev] unittest missing assertNotRaises
On 27/09/2011 19:46, Wilfred Hughes wrote:
> Hi folks
>
> I wasn't sure if this warranted a bug in the tracker, so I thought I'd
> raise it here first.
>
> unittest has assertIn, assertNotIn, assertEqual, assertNotEqual and so on.
> So, it seems odd to me that there isn't assertNotRaises. Is there any
> particular motivation for not putting it in?
>
> I've attached a simple patch against Python 3's trunk to give an idea of
> what I have in mind.

As others have said, the opposite of assertRaises is just calling the code!

I have several times needed regression tests that call code that *used* to raise an exception. It can look slightly odd to have a test without an assert, but the singular uselessness of assertNotRaises does not make it a better alternative. I usually add a comment:

def test_something_that_used_to_not_work(self):
    # this used to raise an exception
    do_something()

All the best,

Michael Foord

> Thanks
> Wilfred

--
http://www.voidspace.org.uk/
Re: [Python-Dev] unittest missing assertNotRaises
On Wed, Sep 28, 2011 at 09:43:13AM +1000, Steven D'Aprano wrote:
> Oleg Broytman wrote:
> > On Tue, Sep 27, 2011 at 07:46:52PM +0100, Wilfred Hughes wrote:
> >> +    def assertNotRaises(self, excClass, callableObj=None, *args, **kwargs):
> >> +        """Fail if an exception of class excClass is thrown by
> >> +           callableObj when invoked with arguments args and keyword
> >> +           arguments kwargs.
> >> +        """
> >> +        try:
> >> +            callableObj(*args, **kwargs)
> >> +        except excClass:
> >> +            raise self.failureException("%s was raised" % excClass)
> >> +
> But I can't see this being a useful test.

Me too.

Oleg.

--
Oleg Broytman  http://phdru.name/  p...@phdru.name
Programmers don't die, they just GOSUB without RETURN.
Re: [Python-Dev] unittest missing assertNotRaises
On 27 September 2011 19:59, Laurens Van Houtven <_...@lvh.cc> wrote:
> Sure, you just *do* it. The only advantage I see in assertNotRaises is that
> when that exception is raised, you should (and would) get a failure, not an
> error.

It's a useful distinction. I have found myself writing code of the form:

def test_old_exception_no_longer_raised(self):
    try:
        do_something()
    except OldException:
        self.assertTrue(False)

in order to distinguish between a regression and something new erroring. The limitation of this pattern is that the test failure message is not as good.
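One way to keep the failure/error distinction and still get a readable message is self.fail() in the except clause -- a sketch reusing the hypothetical do_something/OldException from the example above:

import unittest

class RegressionTest(unittest.TestCase):
    def test_old_exception_no_longer_raised(self):
        try:
            do_something()
        except OldException as e:
            # fail() reports a failure (not an error) and names the
            # regression explicitly in the message.
            self.fail("OldException unexpectedly raised: %r" % (e,))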
[Python-Dev] PEP 393 merged
I have now merged the PEP 393 implementation into default. The main missing piece is the documentation; contributions are welcome.

Regards,
Martin
Re: [Python-Dev] cpython: Implement PEP 393.
> Surely there must be more new APIs and changes that need documenting?

Correct. All documentation still needs to be written.

Regards,
Martin