Re: python IDE and function definition
Chris Friesen: where I could highlight the "stop" and ask it to go to the definition. (Where the definition is in a different file.) I'm running into issues where my current IDE (I'm playing with Komodo) can't seem to locate the definition, I suspect because it's too ambiguous. Some IDEs allow you to help them understand the context by adding type information. Here's some documentation for Wing IDE that uses an isinstance assertion: http://www.wingware.com/doc/edit/helping-wing-analyze-code Neil -- https://mail.python.org/mailman/listinfo/python-list
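A minimal sketch of the isinstance-assertion style of hint mentioned above. The assertion tells a static analyzer what type a parameter holds, so "go to definition" on `stop` has a concrete class to resolve against. The names `Motor`, `run` and `run_annotated` are hypothetical, and whether a particular IDE (Komodo included) honours either hint is an assumption to check against that IDE's documentation. The annotated variant shows the newer Python 3 alternative.

```python
class Motor:
    """Hypothetical class standing in for one defined in another module."""

    def stop(self):
        return "stopped"


def run(device):
    # The assertion checks at runtime that `device` is a Motor, and it gives
    # an analyzer enough context to resolve device.stop to Motor.stop.
    assert isinstance(device, Motor)
    return device.stop()


def run_annotated(device: Motor) -> str:
    # Annotation-based hint: no runtime check, same information for tools.
    return device.stop()
```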
Re: Stripping characters from windows clipboard with win32clipboard from excel
Dave Angel: So is the bug in Excel, in Windows, or in the Python library? Somebody is falling down on the job; if Windows defines the string as ending at the first null, then the Python interface should use that when defining the text defined with CF_UNICODETEXT. Everything is performing correctly. win32clipboard is low-level direct access to the Win32 clipboard API. A higher level API which is more easily used from Python could be defined on top of this if anyone was motivated. Neil
Re: Stripping characters from windows clipboard with win32clipboard from excel
Stephen Boulet: From the clipboard contents copied from the spreadsheet, the characters s[:80684] were the visible cell contents, and s[80684:] all started with "b'\x0" and lack any useful info for what I'm trying to accomplish. Looks like Excel is rounding up its clipboard allocation to the next 64K. There used to be good reasons for trying to leave some extra room on the clipboard and avoid reallocating the block but I thought that was over a long time ago. To strip NULs off the end of the string use s.rstrip('\0') Neil
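A small illustration of the difference between stripping only the trailing NUL padding and cutting at the first NUL (the C-style end of string Dave Angel describes). The function names and sample strings are made up; with real data `s` would be the text returned from the clipboard.

```python
def strip_trailing_nuls(s):
    # Remove only the NUL padding at the end (e.g. a rounded-up allocation).
    return s.rstrip('\0')


def cut_at_first_nul(s):
    # Treat the first NUL as the terminator, as C strings on Win32 do.
    return s.partition('\0')[0]
```

The two only differ when a NUL appears before the padding: `strip_trailing_nuls` keeps everything up to the last real character, while `cut_at_first_nul` discards everything after the first NUL.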
Re: Script that converts between indentation and curly braces in Python code
Musical Notation: Is there any script that converts indentation in Python code to curly braces? The indentation is sometimes lost when I copy my code to an application or a website. pindent.py in the Tools/Scripts directory of Python installations does something similar by adding or removing comments that look like # end if Neil
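For illustration, here is hand-written code in the block-closing comment style that pindent.py works with. It is not actual pindent.py output, and the exact comment forms the script emits (for example whether `# end def` carries the function name) should be checked against the script itself. Because every block is explicitly closed, the indentation can be reconstructed if a website or application strips it.

```python
def classify(n):
    if n < 0:
        result = "negative"
    elif n == 0:
        result = "zero"
    else:
        result = "positive"
    # end if
    return result
# end def classify
```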
Re: RE Module Performance
MRAB: The disadvantage there is that when you move the cursor you must move characters around. For example, what if the cursor was at the start and you wanted to move it to the end? Also, when the gap has been filled, you need to make a new one. The normal technique is to only move the gap when text is added or removed, not when the cursor moves. Code that reads the contents, such as for display, handles the gap by checking the requested position and using a different offset when the position is after the gap. Gap buffers work well because changes are generally close to the previous change, so require moving only a relatively small amount of text. Even an occasional move of the whole contents won't cause too much trouble for interactivity with current processors moving multiple megabytes per millisecond. Neil
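A minimal gap buffer sketch following the description above: the gap moves only when text is inserted or deleted, and reads map a logical position past the gap by adding the gap size. The class and method names, the list-of-characters storage and the growth policy are all illustrative choices, not taken from any particular editor.

```python
class GapBuffer:
    """Text stored as buf[:gap_start] + (gap) + buf[gap_end:]."""

    def __init__(self, text="", capacity=16):
        size = max(capacity, len(text) * 2)
        self.buf = list(text) + [None] * (size - len(text))
        self.gap_start = len(text)
        self.gap_end = size

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def _move_gap(self, pos):
        # Only called for edits: shift the text lying between pos and the gap.
        while pos < self.gap_start:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while pos > self.gap_start:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self, needed):
        # When the gap is too small, make a new, larger one.
        tail = self.buf[self.gap_end:]
        new_size = max(len(self.buf) * 2, len(self) + needed + 16)
        filler = [None] * (new_size - self.gap_start - len(tail))
        self.buf = self.buf[:self.gap_start] + filler + tail
        self.gap_end = new_size - len(tail)

    def insert(self, pos, text):
        self._move_gap(pos)
        if self.gap_end - self.gap_start < len(text):
            self._grow(len(text))
        for ch in text:
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos, count):
        self._move_gap(pos)
        # Deleting the characters just after the gap only widens the gap.
        self.gap_end += count

    def char_at(self, pos):
        # Reads never move the gap; positions after it skip over the gap.
        if pos < self.gap_start:
            return self.buf[pos]
        return self.buf[pos + (self.gap_end - self.gap_start)]

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Typing at one spot repeatedly costs one gap move followed by cheap appends into the gap, which is why edits clustered near the previous edit are fast.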
Re: hex dump w/ or w/out utf-8 chars
wxjmfa...@gmail.com: The FSR is naive and badly working. I can not force people to understand the coding of the characters [*]. You could at least *try*. If there really was a problem with the FSR and you truly understood this problem then surely you would be able to communicate the problem to at least one person on the list. Neil
Re: Python list code of conduct
Dennis Lee Bieber: So who would enforce any rules? I doubt it could be ported to a new (if approval could even be obtained) comp.lang.python.mod(erated) so nothing can be enforced on the comp.lang.python side; and what would you do with Google Groups?
The current news group charter doesn't really have any rules. While it may be possible to recharter an existing news group, it would likely be simpler to just create a new one.
CHARTER
Comp.lang.python is an unmoderated newsgroup which will serve as a forum for discussing the Python computer language. The group will serve both those who just program in Python and those who work on developing the language. Topics that may be discussed include:
- announcements of new versions of the language and applications written in Python.
- discussion on the internals of the Python language.
- general information about the language.
- discussion on programming in Python.
http://www.python.org/search/hypermail/python-1994q1/0377.html
Neil
Re: Is this PEP-able? fwhile
jim...@aol.com: Syntax: fwhile X in ListY and conditionZ:
There is precedent in Algol 68:
    for i from 0 to n while safe(i) do .. od
which would also make a python proposal that needs no new keywords:
    for i in range(n) while safe(i): ..
The benefit of the syntax would be to concentrate the code expressing the domain of the loop rather than have it in separate locations. Not a big win in my opinion.
Neil
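For comparison, the proposed `for i in range(n) while safe(i):` can already be expressed with `itertools.takewhile` or with an explicit `break`. `safe` here is a placeholder predicate. Both versions stop at the first index for which the condition is false, which matches the Algol 68 `while` clause.

```python
from itertools import takewhile


def safe(i):
    # Placeholder condition for illustration.
    return i * i < 50


def with_takewhile(n):
    visited = []
    for i in takewhile(safe, range(n)):
        visited.append(i)
    return visited


def with_break(n):
    visited = []
    for i in range(n):
        if not safe(i):
            break
        visited.append(i)
    return visited
```

The `takewhile` form keeps the loop's domain in one place, which is most of what the proposed syntax buys; the `break` form spreads it across the header and the body.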
Re: Version Control Software
Grant Edwards: The last time we made the choice (4-5 years ago), Windows support for git, bzr, and hg was definitely lacking compared to svn. The lack of something like tortoisesvn for hg/git/bzr was a killer. It looks like the situation has improved since then, but I'd be curious to hear from people who do their development on Windows.
GUIs for Hg/Git are now much more usable. On Windows, OS X, and Linux my GUI/command line use split is about 80/20. For Hg, TortoiseHg is quite good on Windows and Linux and so is SourceTree on OS X. I don't use Git as much but SourceTree works well on OS X. SourceTree is in beta on Windows and doesn't yet support Hg there.
http://tortoisehg.bitbucket.org/
http://www.sourcetreeapp.com/
Neil
Re: Installing PyGame?
Eam onn: ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pygame/base.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pygame/base.so: no matching architecture in universal wrapper
This is saying that the version of Python you are using is a different architecture to the installed pygame library. This could be because you are using a 64-bit version of Python with a 32-bit library or vice-versa. Or you have a PowerPC library and Python is compiled for Intel processors.
In Terminal, you can find the architecture of files with "otool -vh" followed by the file name. So try (on one line)
    otool -vh /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pygame/base.so
And the same with Python, first finding where Python is with
    whereis python
Then post all of the output text, not just your interpretation.
Neil
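Alongside `otool`, the running interpreter can report part of this from inside Python. This is a sketch using only the standard library. The pointer size shows whether the process is running as 32-bit or 64-bit, which is what has to match `base.so`. `platform.machine()` reports the hardware type as seen by the OS, so it can say `x86_64` even for a 32-bit process. The function name is illustrative.

```python
import platform
import struct
import sys


def describe_interpreter():
    # Pointer size of the running process: 32 or 64 bits.
    bits = struct.calcsize("P") * 8
    return {
        "executable": sys.executable,
        "pointer_bits": bits,
        "is_64bit": sys.maxsize > 2**32,
        # Hardware type reported by the OS (may be '' if unknown).
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(key, value)
```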
Re: Making safe file names
Andrew Berg: This is not a Unicode issue since (modern) file systems will happily accept it. The issue is that certain characters (which are ASCII) are not allowed on some file systems:
    \ / : * ? " < > | @ and the NUL character
The first 9 are not allowed on NTFS, the @ is not allowed on ext3cow, and NUL and / are not allowed on pretty much any file system. Locale settings and encodings aside, these 11 characters will need to be escaped.
There's also the Windows device name hole. There may be trouble with artists named 'COM4', 'CLOCK$', 'Con', or similar.
http://support.microsoft.com/kb/74496
http://en.wikipedia.org/wiki/Nul_%28band%29
Neil
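A minimal sketch that escapes those 11 characters and avoids the device-name hole. The `%XX` escape scheme, the leading-underscore fix-up and the function name are assumptions chosen for illustration. The reserved set follows the common CON/PRN/AUX/NUL/COMn/LPTn list plus CLOCK$. Windows also treats these names as devices when they carry an extension (`con.txt`), which is why only the part before the first dot is checked.

```python
import re

# \ / : * ? " < > | are rejected on Windows, @ on ext3cow, NUL almost
# everywhere; '%' is escaped as well so escapes can't be confused with text.
_UNSAFE = re.compile(r'[\\/:*?"<>|@\x00%]')

# Windows device names, matched case-insensitively.
_RESERVED = {"CON", "PRN", "AUX", "NUL", "CLOCK$"}
_RESERVED.update("COM%d" % i for i in range(1, 10))
_RESERVED.update("LPT%d" % i for i in range(1, 10))


def safe_filename(name):
    escaped = _UNSAFE.sub(lambda m: "%%%02X" % ord(m.group()), name)
    stem = escaped.split(".", 1)[0]
    if stem.upper() in _RESERVED:
        escaped = "_" + escaped  # 'Con' -> '_Con', 'com4.mp3' -> '_com4.mp3'
    return escaped
```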
Re: Why do Perl programmers make more money than Python programmers
jmfauth: 2) More critical, Py 3.3, just becomes non unicode compliant, (eg European languages or "ascii" typographers !) ... This is not demonstrating non-compliance. It is comparing performance, not compliance. Please show an example where Python 3.3 is not compliant with Unicode. Neil
Re: Is Unicode support so hard...
Hi jmf, This gives me plenty of ideas to test the "flexible string representation" (FSR). I should recognize this FSR is failing particularly very well...
This is too vague for me. Which string representation should Python use?
1) UTF-32
2) UTF-8
3) Python 3.3 -- 1, 2, or 4 bytes per character decided at runtime
4) Python 3.2 -- 2 or 4 bytes per character decided at Python build time
5) Something else
Neil
Re: Performance of int/long in Python 3
Neil Hodgson, replying to self: The assembler (32-bit build) for each PyUnicode_READ looks like
Don't have 64-bit MSVC 2010 set up but the code from 64-bit MSVC 2012 is better since there are an extra 8 registers in 64-bit mode:
    ; 10431: c1 = PyUnicode_READ(kind1, data1, i);
        cmp rsi, 1
        jne SHORT $LN17@unicode_co
        lea rax, QWORD PTR [r9+rcx]
        movzx r8d, BYTE PTR [rax+rbx]
        jmp SHORT $LN16@unicode_co
    $LN17@unicode_co:
        cmp rsi, 2
        jne SHORT $LN15@unicode_co
        movzx r8d, WORD PTR [r9+r11]
        jmp SHORT $LN16@unicode_co
    $LN15@unicode_co:
        mov r8d, DWORD PTR [r9+r10]
    $LN16@unicode_co:
All the variables used in the loop are now in registers but the tests and branches are the same. This lines up with 64-bit being better than 32-bit on Windows but not as good as Python 3.2 or Unix.
Neil
Re: Performance of int/long in Python 3
Dave Angel: That would seem to imply that the speed regression on your data is NOT caused by the differing size encodings. Perhaps it is the difference in MSC compiler version, or other changes made between 3.2 and 3.3
It's not caused by there actually being different size encodings but by the code checking the encoding size 2-4 times for each character. Back in 3.2 the comparison loop looked like:
    while (len1 > 0 && len2 > 0) {
        Py_UNICODE c1, c2;
        c1 = *s1++;
        c2 = *s2++;
        if (c1 != c2)
            return (c1 < c2) ? -1 : 1;
        len1--; len2--;
    }
For 3.3 this has changed to
    for (i = 0; i < len1 && i < len2; ++i) {
        Py_UCS4 c1, c2;
        c1 = PyUnicode_READ(kind1, data1, i);
        c2 = PyUnicode_READ(kind2, data2, i);
        if (c1 != c2)
            return (c1 < c2) ? -1 : 1;
    }
with PyUnicode_READ being
    #define PyUnicode_READ(kind, data, index) \
        ((Py_UCS4) \
        ((kind) == PyUnicode_1BYTE_KIND ? \
            ((const Py_UCS1 *)(data))[(index)] : \
            ((kind) == PyUnicode_2BYTE_KIND ? \
                ((const Py_UCS2 *)(data))[(index)] : \
                ((const Py_UCS4 *)(data))[(index)] \
            ) \
        ))
There are either 1 or 2 kind checks in each call to PyUnicode_READ and 2 calls to PyUnicode_READ inside the loop. A compiler may decide to move the kind checks out of the loop and specialize the loop but MSVC 2010 appears to not do so. The assembler (32-bit build) for each PyUnicode_READ looks like
        mov ecx, DWORD PTR _kind1$[ebp]
        cmp ecx, 1
        jne SHORT $LN17@unicode_co@2
        lea ecx, DWORD PTR [ebx+eax]
        movzx edx, BYTE PTR [ecx+edx]
        jmp SHORT $LN16@unicode_co@2
    $LN17@unicode_co@2:
        cmp ecx, 2
        jne SHORT $LN15@unicode_co@2
        movzx edx, WORD PTR [ebx+edi]
        jmp SHORT $LN16@unicode_co@2
    $LN15@unicode_co@2:
        mov edx, DWORD PTR [ebx+esi]
    $LN16@unicode_co@2:
The kind1/kind2 variables aren't even going into registers and at least one test+branch and a jump are executed for every character. The 2- and 4-byte kinds take two tests. len1 and len2 don't get to go into registers either.
Here's the full assembler output for unicode_compare:
    ; COMDAT _unicode_compare
    _TEXT SEGMENT
    _kind2$ = -20 ; size = 4
    _kind1$ = -16 ; size = 4
    _len2$ = -12 ; size = 4
    _len1$ = -8 ; size = 4
    _data2$ = -4 ; size = 4
    _unicode_compare PROC ; COMDAT
    ; _str1$ = ecx
    ; _str2$ = eax
    ; 10417: {
        push ebp
        mov ebp, esp
        sub esp, 20 ; 0014H
        push ebx
        push esi
        mov esi, eax
    ; 10418: int kind1, kind2;
    ; 10419: void *data1, *data2;
    ; 10420: Py_ssize_t len1, len2, i;
    ; 10421:
    ; 10422: kind1 = PyUnicode_KIND(str1);
        mov eax, DWORD PTR [ecx+16]
        mov edx, eax
        shr edx, 2
        and edx, 7
        push edi
        mov DWORD PTR _kind1$[ebp], edx
    ; 10423: kind2 = PyUnicode_KIND(str2);
        mov edx, DWORD PTR [esi+16]
        mov edi, edx
        shr edi, 2
        and edi, 7
        mov DWORD PTR _kind2$[ebp], edi
    ; 10424: data1 = PyUnicode_DATA(str1);
        test al, 32 ; 0020H
        je SHORT $LN9@unicode_co@2
        test al, 64 ; 0040H
        je SHORT $LN7@unicode_co@2
        lea ebx, DWORD PTR [ecx+24]
        jmp SHORT $LN10@unicode_co@2
    $LN7@unicode_co@2:
        lea ebx, DWORD PTR [ecx+36]
        jmp SHORT $LN10@unicode_co@2
    $LN9@unicode_co@2:
        mov ebx, DWORD PTR [ecx+36]
    $LN10@unicode_co@2:
    ; 10425: data2 = PyUnicode_DATA(str2);
        test dl, 32 ; 0020H
        je SHORT $LN13@unicode_co@2
        test dl, 64 ; 0040H
        je SHORT $LN11@unicode_co@2
        lea edx, DWORD PTR [esi+24]
        jmp SHORT $LN30@unicode_co@2
    $LN11@unicode_co@2:
        lea eax, DWORD PTR [esi+36]
        mov DWORD PTR _data2$[ebp], eax
        mov edx, eax
        jmp SHORT $LN14@unicode_co@2
    $LN13@unicode_co@2:
        mov edx, DWORD PTR [esi+36]
    $LN30@unicode_co@2:
        mov DWORD PTR _data2$[ebp], edx
    $LN14@unicode_co@2:
    ; 10426: len1 = PyUnicode_GET_LENGTH(str1);
        mov edi, DWORD PTR [ecx+8]
    ; 10427: len2 = PyUnicode_GET_LENGTH(str2);
        mov ecx, DWORD PTR [esi+8]
    ; 10428:
    ; 10429: for (i = 0; i < len1 && i < len2; ++i) {
        xor eax, eax
        mov DWORD PTR _len1$[ebp], edi
        mov
Re: Performance of int/long in Python 3
rusi: Can you please try one more experiment Neil? Knock off all non-ASCII strings (paths) from your dataset and try again. Results are the same: 0.40 (well, 0.001 less but I don't think the timer is that accurate) for Python 3.2 and 0.78 for Python 3.3. Neil
Re: Performance of int/long in Python 3
Roy Smith: On the other hand, how long did it take you to do the directory tree walk required to find those million paths? I'll bet a lot longer than 0.78 seconds, so this gets lost in the noise. About 2 minutes. But that's just getting an example data set. Other data sets may be loaded more quickly from databases or files or be created by processing. Reading the example data from a file takes around the same time as sorting. Neil
Re: Performance of int/long in Python 3
Reran the programs taking a bit more care with the encoding of the file. This had no effect on the speeds. There are only a small number of paths that don't fit into ASCII:
    ASCII 1076101
    Latin1 218
    BMP 113
    Astral 0
Counted with:
    # encoding:utf-8
    import codecs
    with codecs.open("filelist.txt", "r", "utf-8") as f:
        paths = f.read().split("\n")
    bucket = [0,0,0,0]
    for p in paths:
        b = 0
        # An empty line has no characters, so count it as ASCII.
        maxChar = max([ord(ch) for ch in p]) if p else 0
        if maxChar >= 65536:
            b = 3
        elif maxChar >= 256:
            b = 2
        elif maxChar >= 128:
            b = 1
        bucket[b] = bucket[b] + 1
    print("ASCII", bucket[0])
    print("Latin1", bucket[1])
    print("BMP", bucket[2])
    print("Astral", bucket[3])
Neil
Re: Performance of int/long in Python 3
Chris Angelico: I'd be curious to know the sorts of characters used. Given that it's probably a narrow-vs-wide Python difference we're talking here, the actual distribution of codepoints may well make a difference. I was going to upload it but then I thought of potential client-confidentiality problems and the need to audit a list that long. Neil
Re: Performance of int/long in Python 3
Terry Jan Reedy: What system *and* what compiler and compiler options. Unless 3.2 and 3.3 are both compiled with the same compiler and settings, we do not know the source of the difference.
The version signatures are:
    3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]
    3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)]
The machine is running Windows 8 64-bit (the Python installations are 32-bit though) and the processor is an i3 2350M running at 2.3 GHz.
Neil
Re: Performance of int/long in Python 3
rusi wrote: ... a 'micro-benchmark' - I'd just like to avoid adding email access to get this over the threshold. What does that last statement mean?
It's a reference to a comment by Jamie Zawinski (relatively famous developer of Netscape Navigator and other things): "Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can." One of the games played in bug reporting and avoidance is to deny that the report is a real problem. A short script is dismissed as unrepresentative of actual programs. Once it can read email though, it has to be a real program.
Neil
Re: Performance of int/long in Python 3
Ian Kelly: Micro-benchmarks like the ones you have been reporting are *useful* when it comes to determining what operations can be better optimized, but they are not *important* in and of themselves. What is important is that actual, real-world programs are not significantly slowed by these kinds of optimizations. Until you can demonstrate that real programs are adversely affected by PEP 393, there is not in my opinion any regression that is worth worrying over.
The problem with only responding to issues with real-world programs is that real-world programs are complex and their performance issues are often difficult to diagnose. See, for example, scons, which is written in Python and which has not been able to overcome performance problems over several years. (http://www.electric-cloud.com/blog/2010/07/21/a-second-look-at-scons-performance/)
Bottom-up performance work has advantages in that a narrow focus area can be more easily analyzed and tested and can produce widely applicable benefits.
The choice of comparison for the script wasn't arbitrary. Comparison is one of the main building blocks of higher-level code. Sorting, for example, depends strongly on comparison performance: any decrease in comparison speed is multiplied by the number of comparisons a sort performs. It's unfortunate that stringbench.py does not contain any comparison or sorting tests.
Sorting a million-string list (all the file paths on a particular computer) went from 0.4 seconds with Python 3.2 to 0.78 with 3.3, so we're out of the 'not noticeable by humans' range. Perhaps this is still a 'micro-benchmark' - I'd just like to avoid adding email access to get this over the threshold.
Here's some code. Replace the "if 1" with "if 0" on subsequent runs to avoid the costly file system walk.
    import os, time
    from os.path import join, getsize
    paths = []
    if 1:
        for root, dirs, files in os.walk('c:\\'):
            for name in files:
                paths.append(join(root, name))
        with open("filelist.txt", "w") as f:
            f.write("\n".join(paths))
    else:
        with open("filelist.txt", "r") as f:
            paths = f.read().split("\n")
    print(len(paths))
    timeStart = time.time()
    paths.sort()
    timeEnd = time.time()
    print("Time taken=", timeEnd - timeStart)
Neil
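To make the point about comparison cost being multiplied during sorting concrete, a small experiment can count how many comparisons `list.sort` performs. It wraps each value in a class that counts calls to `__lt__`, the only comparison Python's sort uses. The `Counted` wrapper and the sample data are illustrative. For random data the count should come out near n·log2(n), so for a million paths a slower single comparison is paid on the order of twenty million times.

```python
import math
import random


class Counted:
    """Wraps a value and counts '<' comparisons made while sorting."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value


def count_sort_comparisons(values):
    Counted.calls = 0
    items = [Counted(v) for v in values]
    items.sort()
    return Counted.calls, [item.value for item in items]


if __name__ == "__main__":
    n = 10000
    data = ["C:/path/%08d" % random.randrange(10**8) for _ in range(n)]
    calls, _ = count_sort_comparisons(data)
    print(n, "strings:", calls, "comparisons; n*log2(n) =", round(n * math.log2(n)))
```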
Re: Performance of int/long in Python 3
jmfauth:
    3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]
    [0.8343414906182101, 0.8336184057396241, 0.8330473419738562]
    3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)]
    [1.3840254166697845, 1.3933888932429768, 1.391664674507438]
That's a larger performance decrease than the 64-bit version.
Reported the issue as http://bugs.python.org/issue17615
Neil
Re: Performance of int/long in Python 3
Mark Lawrence: You've given many examples of the same type of micro benchmark, not many examples of different types of benchmark.
Trying to work out what jmfauth is on about I found what appears to be a performance regression with '<' string comparisons on Windows 64-bit. It's around 30% slower on a 25 character string that differs in the last character and 70-100% slower on a 100 character string that differs at the end.
Can someone else please try this to see if it's reproducible? Linux doesn't show this problem.
    >c:\python32\python -u "charwidth.py"
    3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)]
    a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']176
    [0.7116295577956576, 0.7055591343157613, 0.7203483026429418]
    a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']176
    [0.7664397841378787, 0.7199902325464409, 0.713719289812504]
    a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']176
    [0.7341851791817691, 0.6994205901833599, 0.7106807593741005]
    a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']180
    [0.7346812372666784, 0.699543377914, 0.7064768417728411]

    >c:\python33\python -u "charwidth.py"
    3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
    a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']108
    [0.9913326076446045, 0.9455845241056282, 0.9459076605341776]
    a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']192
    [1.0472289217234318, 1.0362342484091207, 1.0197109728048384]
    a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']192
    [1.0439643704533834, 0.9878581050301687, 0.9949265834034335]
    a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']312
    [1.0987483965446412, 1.0130257167690004, 1.024832248526499]
Here is the code:
    # encoding:utf-8
    import os, sys, timeit
    print(sys.version)
    examples = [
        "a=['$b','$z']",
        "a=['$λ','$η']",
        "a=['$b','$η']",
        "a=['$\U00020000','$\U00020001']"]
    baseDir = "C:/Users/Neil/Documents/"
    #~ baseDir = "C:/Users/Neil/Documents/Visual Studio 2012/Projects/Sigma/QtReimplementation/HLFKBase/Win32/x64/Debug"
    for t in examples:
        t = t.replace("$", baseDir)
        # Using os.write as a simple way to get UTF-8 to stdout
        os.write(sys.stdout.fileno(), t.encode("utf-8"))
        print(sys.getsizeof(t))
        print(timeit.repeat("a[0] < a[1]",t,number=500))
        print()
For a more significant performance difference try replacing the baseDir setting with (may be wrapped):
    baseDir = "C:/Users/Neil/Documents/Visual Studio 2012/Projects/Sigma/QtReimplementation/HLFKBase/Win32/x64/Debug"
Neil
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
Chris Angelico: But both this and your example of case conversion are, fundamentally, iterating over the string. What if you aren't doing that? What if you want to parse and process? Parsing is also normally a scanning operation. If you want to process pieces of the string based on the parse then you remember the positions (as iterators) at the significant places and extract/process the data based on those positions. Neil
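A sketch of the scan-and-remember-positions approach for a simple `key=value;key=value` format. Both the format and the function name are made up for illustration. The parser walks forward once, records where each key ends and each field starts, and slices only when extracting. No position is ever computed by counting characters from the start, and `=` and `;` are ASCII bytes that never occur inside a UTF-8 multi-byte sequence, so the same structure would carry over to byte offsets into a UTF-8 buffer.

```python
def parse_pairs(text):
    """Return a dict from 'k1=v1;k2=v2' using positions found by scanning."""
    result = {}
    start = 0        # where the current key begins
    key_end = None   # where the current key stops (position of '=')
    for pos, ch in enumerate(text):
        if ch == "=" and key_end is None:
            key_end = pos
        elif ch == ";":
            if key_end is not None:
                result[text[start:key_end]] = text[key_end + 1:pos]
            start, key_end = pos + 1, None
    if key_end is not None:
        result[text[start:key_end]] = text[key_end + 1:]
    return result
```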
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
MRAB: Implementing the regex module (http://pypi.python.org/pypi/regex) would have been more difficult if the internal representation had been UTF-8, because of the need to decode, and the implementation would also have been slower for that reason. One way to build regex support for UTF-8 is to build a fixed width version of the regex code and then interpose an object that converts between the UTF-8 representation and that code. The C++11 standard library contains a regex template that can be instantiated over a UTF-8 representation in this way. Neil
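A rough Python sketch of the "interposed object" idea. A decoder yields `(byte_offset, code_point)` pairs from UTF-8 bytes, and a matcher written purely in terms of code points runs on top of it and reports byte offsets back. The matcher only finds the first run of code points satisfying a predicate, a stand-in for real regex machinery. This is not how the `regex` module or C++11 `std::regex` is implemented, and the decoder assumes its input is valid UTF-8.

```python
def utf8_code_points(buf):
    """Yield (byte_offset, code_point) for valid UTF-8 bytes in buf."""
    i, n = 0, len(buf)
    while i < n:
        lead = buf[i]
        if lead < 0x80:
            length, cp = 1, lead
        elif lead >> 5 == 0b110:
            length, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:
            length, cp = 3, lead & 0x0F
        else:
            length, cp = 4, lead & 0x07
        for b in buf[i + 1:i + length]:
            cp = (cp << 6) | (b & 0x3F)  # each continuation byte adds 6 bits
        yield i, cp
        i += length


def find_run(buf, predicate):
    """Byte span (start, end) of the first run of code points matching predicate."""
    start = None
    for offset, cp in utf8_code_points(buf):
        if predicate(cp):
            if start is None:
                start = offset
        elif start is not None:
            return start, offset
    return (start, len(buf)) if start is not None else None
```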
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
Steven D'Aprano: Some string operations need to inspect every character, e.g. str.upper(). Even for them, the increased complexity of a variable-width encoding costs. It's not sufficient to walk the string inspecting a fixed 1, 2 or 4 bytes per character. You have to walk the string grabbing 1 byte at a time, and then decide whether you need another 1, 2 or 3 bytes. Even though it's still O(N), the added bit-masking and overhead of variable-width encoding adds to the overall cost.
It does add to implementation complexity but should only add a small amount of time.
To compare costs, I am using the text of the web site http://www.mofa.go.jp/mofaj/ since it has a reasonable amount (10%) of multi-byte characters. Since the document fits in the BMP, Python would choose a 2-byte wide implementation so I am emulating that choice with a very simple 16-bit table-based upper-caser. Real Unicode case conversion code is more concerned with edge cases like Turkic and Lithuanian locales and Greek combining characters and also allowing for measurement/reallocation for the cases where the result is smaller/larger. See, for example, glib's real_toupper in https://git.gnome.org/browse/glib/tree/glib/guniprop.c
Here is some simplified example code that implements upper-casing over 16-bit wide (utf16_up) and UTF-8 (utf8_up) buffers: http://www.scintilla.org/UTF8Up.cxx
Since I didn't want to spend too much time writing code it only handles the BMP and doesn't have upper-case table entries outside ASCII for now. If this was going to be worked on further to be made maintainable, most of the masking and so forth would be in macros similar to UTF8_COMPUTE/UTF8_GET in glib. The UTF-8 case ranges from around 5% slower on average in a 32-bit release build (VC2012 on an i7 870) to averaging a little faster in a 64-bit build. They're both around a billion characters per second.
    C:\u\hg\UpUTF\UpUTF>..\x64\Release\UpUTF.exe
    Time taken for UTF8 of 80449=0.006528
    Time taken for UTF16 of 71525=0.006610
    Relative time taken UTF8/UTF16 0.987581
Any string method that takes a starting offset requires the method to walk the string byte-by-byte. I've even seen languages put responsibility for dealing with that onto the programmer: the "start offset" is given in *bytes*, not characters. I don't remember what language this was... it might have been Haskell? Whatever it was, it horrified me.
It doesn't horrify me - I've been working this way for over 10 years and it seems completely natural. You can wrap access in iterators that hide the byte offsets if you like. This then ensures that all operations on those iterators are safe, since the iterator can only point at the start or end of valid characters.
Sure. And over a different set of samples, it is less compact. If you write a lot of Latin-1, Python will use one byte per character, while UTF-8 will use two bytes per character.
I think you mean writing a lot of Latin-1 characters outside ASCII. However, even people writing texts in, say, French will find that only a small proportion of their text is outside ASCII and so the cost of UTF-8 is correspondingly small.
The counter-problem is that a French document that needs to include one mathematical symbol (or emoji) outside Latin-1 will double in size as a Python string.
Neil
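One way to hide byte offsets behind an iterator, sketched in Python over a `bytes` object. The cursor can only step to the start of the next or previous character, so it never lands inside a multi-byte sequence. It relies on UTF-8 continuation bytes always having the bit pattern `10xxxxxx`. The class and method names are hypothetical, and the buffer is assumed to hold valid UTF-8.

```python
class Utf8Cursor:
    """A position in a UTF-8 buffer that always sits on a character boundary."""

    def __init__(self, buf, offset=0):
        self.buf = buf
        self.offset = offset

    @staticmethod
    def _is_continuation(byte):
        return (byte & 0xC0) == 0x80

    def next(self):
        # Step over the lead byte, then over any continuation bytes.
        i = self.offset + 1
        while i < len(self.buf) and self._is_continuation(self.buf[i]):
            i += 1
        return Utf8Cursor(self.buf, i)

    def prev(self):
        # Step back over continuation bytes to the previous lead byte.
        i = self.offset - 1
        while i > 0 and self._is_continuation(self.buf[i]):
            i -= 1
        return Utf8Cursor(self.buf, i)

    def char(self):
        end = self.next().offset
        return self.buf[self.offset:end].decode("utf-8")
```

Walking forward from offset 0 visits each character once, whatever its encoded length, while the byte offsets stay internal to the cursor.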
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
Ian Foote: Specifically, indexing a variable-length encoding like utf-8 is not as efficient as indexing a fixed-length encoding. Many common string operations do not require indexing by character, which reduces the impact of this inefficiency. UTF-8 seems like a reasonable choice for an internal representation to me. One benefit of UTF-8 over Python's flexible representation is that it is, on average, more compact over a wide set of samples. Neil
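The compactness trade-off can be estimated by counting payload bytes. For UTF-8 that is the encoded length. For the flexible representation it is the string length times the widest character's storage size (1 byte up to U+00FF, 2 up to U+FFFF, otherwise 4). This ignores the fixed per-object header that `sys.getsizeof` also reports, and the sample strings in the test are made up.

```python
def utf8_bytes(s):
    return len(s.encode("utf-8"))


def flexible_bytes(s):
    # PEP 393 style: every character is stored at the width of the widest one.
    widest = max(map(ord, s), default=0)
    width = 1 if widest < 0x100 else 2 if widest < 0x10000 else 4
    return len(s) * width
```

Mostly-ASCII text with a single bullet costs 3 extra bytes in UTF-8 but doubles under the flexible representation, while pure Latin-1 text outside ASCII goes the other way.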
Re: String performance regression from python 3.2 to 3.3
Steven D'Aprano: So while you might save memory by using "UTF-24" instead of UTF-32, it would probably be slower because you would have to grab three bytes at a time instead of four, and the hardware probably does not directly support that.
Low-level string manipulation often deals with blocks larger than an individual character for speed: generally 32 or 64 bits at a time using the CPU, or 128 or 256 bits using the vector unit. Then there may be entry/exit code to handle initial alignment to a block boundary and to deal with a tail smaller than the block size. For an example of this kind of thing, see find_max_char in python\Objects\stringlib\find_max_char.h which can examine a char* 32 or 64 bits at a time.
24-bit is likely to be a win in many circumstances due to decreased memory traffic. A 12-bit implementation may also be worthwhile as the low 0x1000 characters of Unicode contain Latin (with extensions), Greek, Cyrillic, Arabic, Hebrew, and most Indic scripts.
Neil
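The block-at-a-time idea can be sketched in Python. The function tests eight bytes at once against a mask with the high bit of every byte set, then checks the short tail byte by byte. This mirrors the technique rather than CPython's actual C code in find_max_char.h, omits the alignment entry code, and in Python the per-chunk overhead means it demonstrates the logic, not the speed.

```python
HIGH_BITS = 0x8080808080808080  # the high bit of each of 8 bytes


def is_ascii_wordwise(buf):
    """True if every byte in buf is < 0x80, testing 8 bytes per step."""
    n = len(buf)
    full = n - n % 8
    for i in range(0, full, 8):
        # Byte order doesn't matter: the mask is the same in every byte.
        word = int.from_bytes(buf[i:i + 8], "little")
        if word & HIGH_BITS:
            return False
    # Tail shorter than a word: check the remaining bytes individually.
    return all(b < 0x80 for b in buf[full:])
```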
Re: Largest possible size for executemany() in PEP-249 (Database API)
Roy Smith: _mysql_exceptions.OperationalError: (1153, "Got a packet bigger than 'max_allowed_packet' bytes") Is there any way (other than trial and error) to know how many records I can pass in one call before I blow up?
It's unlikely to be a limit on the number of records but rather a limit on the number of bytes in the serialized command stream. With a deep understanding of the format you could count bytes until about to go over and then flush.
I'd *guess* that the data is being sent as textual SQL INSERT statements so you could work out what your insertions look like as INSERT statements and see how many fit into max_allowed_packet. max_allowed_packet is probably 1 million, which would suggest around 100 bytes per INSERT, but it will depend on the data: inserting "Ko" should use fewer bytes than inserting "Naragarajan".
Neil
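A sketch of the count-bytes-and-flush approach. It groups rows into batches whose estimated size stays under a limit, then calls `executemany` once per batch. The size estimate (each value's `repr` length plus a fixed per-row overhead) is a crude stand-in for the real serialized INSERT text, and `max_bytes` and `per_row_overhead` are assumptions. Leave a generous margin below the server's actual `max_allowed_packet`.

```python
def batches_under_limit(rows, max_bytes, per_row_overhead=10):
    """Yield lists of rows whose estimated SQL size stays below max_bytes."""
    batch, size = [], 0
    for row in rows:
        # Rough guess at how many bytes this row adds to the INSERT statement.
        row_size = sum(len(repr(value)) for value in row) + per_row_overhead
        if batch and size + row_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(row)
        size += row_size
    if batch:
        yield batch


# Usage with a DB-API cursor (hypothetical table and columns):
# for chunk in batches_under_limit(rows, max_bytes=900000):
#     cursor.executemany("INSERT INTO people (name, age) VALUES (%s, %s)", chunk)
```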
Re: LangWart: Method congestion from mutate multiplicty
Rick Johnson: Really?
Yes.
    >> a = [1,2]
    => [1, 2]
    >> a.push(3)
    => [1, 2, 3]
    >> a
    => [1, 2, 3]
This could be called "mutation without exclamation".
    >> require 'WEBrick'
    => true
    >> vowels = "[aeiou]+"
    => "[aeiou]+"
    >> vowels.object_id
    => 2234951380
    >> WEBrick::HTTPUtils._make_regex!(vowels)
    => /([^\[aeiou\]\+])/n
    >> vowels
    => "[aeiou]+"
    >> vowels.object_id
    => 2234951380
The counterpart, exclamation without mutation.
Neil
Re: LangWart: Method congestion from mutate multiplicty
Rick Johnson: The Ruby language attempted to save the programmer from the scourge of obtaining a four year degree in linguistics just to create intuitive identifiers "on-the-fly", and they tried to remove this ambiguity by employing "post-fix-punctuation" of the exclamation mark as a visual cue for in-place modification of the object: Ruby does not use '!' to indicate in-place modification: http://dablog.rubypal.com/2007/8/15/bang-methods-or-danger-will-rubyist Neil
Re: How to debug pyd File in Vs???
Junze Liu: Third, use the embed interpreter to execute a .py File.The .py File include the module that in .pyd File I created. Here, the problem comes out! When I start my main project. I can only debug the problems in my main project, when my main project use the python interpreter to execute the python interpreter, I can't see what happened in my pyd File, the whole project collapsed.I know the error is in the pyd File, and if I set a break point in my resource files of pyd File, either the project will go to the break point. This normally works unless Visual Studio can't understand the relationship between the .pyd and source files. Make sure you have built the .pyd using a release configuration but with debugging information turned on for both compiling and linking. Or use a debug build of python along with a debug build of the .pyd. Is there any methods to debug the resource file in this condition! You can call DebugBreak() somewhere in the .pyd code which will start up Visual Studio as a debugger. It normally works out where the source code is and then you can add more breakpoints and step through. http://msdn.microsoft.com/en-us/library/windows/desktop/ms679297(v=vs.85).aspx Neil
Re: Comparing strings from the back?
Ethan Furman: *plonk* I can't work out who you are plonking. While more than one of the posters on this thread seem worthy of a good plonk, by not including sufficient context, you've left me feeling puzzled. Is there a guideline for this in basic netiquette? Neil
Re: Comparing strings from the back?
Roy Smith: I'm wondering if it might be faster to start at the ends of the strings instead of at the beginning? If the strings are indeed equal, it's the same amount of work starting from either end. Most people write loops that go forwards, which leads processor designers to prioritize mechanisms such as cache prefetch detection for that case. However, that is not always the situation: a couple of years ago Intel contributed a memcpy implementation to glibc that copied backwards for better performance. An explanation from a related patch involves speculative and committed operations and address bits on some processors, and quickly lost me: see paragraph 3 of http://lists.freedesktop.org/archives/pixman/2010-August/000423.html The memcpy patch was controversial as it broke Adobe Flash, which assumed memcpy was safe for overlapping buffers like memmove. Neil -- http://mail.python.org/mailman/listinfo/python-list
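A minimal Python sketch of the backwards comparison Roy asks about; `equal_from_back` is a hypothetical name, not a library function. For equal strings it examines every character whichever end it starts from, so starting at the back only helps when differences tend to sit near the end (for example, many paths sharing a long common prefix). In real Python code, plain `a == b` runs in C and will beat either loop.

```python
def equal_from_back(a: str, b: str) -> bool:
    """Compare two strings for equality, scanning from the last character."""
    if len(a) != len(b):
        return False  # different lengths can never be equal
    # Walk indices from the end towards the start; stop at the first mismatch.
    for i in range(len(a) - 1, -1, -1):
        if a[i] != b[i]:
            return False
    return True
```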
Re: Flexible string representation, unicode, typography, ...
wxjmfa...@gmail.com: Go "has" the integers int32 and int64. A rune ensure the usage of int32. "Text libs" use runes. Go has only bytes and runes. Go's text libraries use UTF-8 encoded byte strings, not arrays of runes. See, for example, http://golang.org/pkg/regexp/ Are you claiming that UTF-8 is the optimum string representation and therefore should be used by Python? Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Flexible string representation, unicode, typography, ...
wxjmfa...@gmail.com: Small illustration. Take an a4 page containing 50 lines of 80 ascii characters, add a single 'EM DASH' or an 'BULLET' (code points> 0x2000), and you will see all the optimization efforts destroyed.
>>> sys.getsizeof('a' * 80 * 50)
4025
>>> sys.getsizeof('a' * 80 * 50 + '•')
8040
Even with the bullet, this example still uses half as many bytes as Python 3.2, which used 32 bits per character:
>>> sys.getsizeof('a' * 80 * 50)
16032
>>> sys.getsizeof('a' * 80 * 50 + '•')
16036
Neil -- http://mail.python.org/mailman/listinfo/python-list
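One rough way to see the per-character width that CPython 3.3+ chooses under PEP 393 is to measure how much `sys.getsizeof` grows when one more character is added; the fixed object header cancels out. `storage_width` is a hypothetical helper, and the numbers are CPython implementation details rather than language guarantees.

```python
import sys

def storage_width(ch: str) -> int:
    """Bytes CPython spends per character in a string made only of `ch`."""
    # The fixed header cancels out, leaving the cost of one extra character.
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)

for ch in ("a", "\u00e9", "\u2022", "\U0001F600"):  # a, é, •, an emoji
    print(ascii(ch), storage_width(ch))
```

On CPython this reports 1 byte for 'a' and 'é', 2 for '•' and 4 for the emoji, so a single '•' doubles the storage of the whole 4000-character page, matching the measurement above.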
Re: How do I display unicode value stored in a string variable using ord()
Steven D'Aprano: Using variable-sized strings like UTF-8 and UTF-16 for in-memory representations is a terrible idea because you can't assume that people will only every want to index the first or last character. On average, you need to scan half the string, one character at a time. In Big-Oh, we can ignore the factor of 1/2 and just say we scan the string, O(N). In the majority of cases you can remove excessive scanning by caching the most recent index->offset result. If the next index request is nearer the cached index than to the beginning then iterate from that offset. This converts many operations from quadratic to linear. Locality of reference is common and can often be reasonably exploited. However, exposing the variable length nature of UTF-8 allows the application to choose efficient techniques for more cases. Neil -- http://mail.python.org/mailman/listinfo/python-list
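A rough sketch of the caching idea over a UTF-8 `bytes` object, assuming non-negative indexes only; the class `CachedUTF8` is hypothetical and is not how CPython or any particular library stores strings. It remembers the last code point index and its byte offset, and starts each lookup from whichever is closer: that cached position or the beginning.

```python
class CachedUTF8:
    """Code point indexing into UTF-8 bytes, caching the last index->offset result."""

    def __init__(self, data: bytes):
        self.data = data
        # Count code points: every byte that is not a continuation byte (10xxxxxx).
        self.length = sum(1 for b in data if b & 0xC0 != 0x80)
        self._cached_index = 0   # code point index of the last lookup
        self._cached_offset = 0  # byte offset where that code point starts

    def _is_continuation(self, offset: int) -> bool:
        return self.data[offset] & 0xC0 == 0x80

    def _offset_of(self, index: int) -> int:
        # Start from the cached position if it is nearer than the beginning.
        if abs(index - self._cached_index) < index:
            i, offset = self._cached_index, self._cached_offset
        else:
            i, offset = 0, 0
        while i < index:  # step forward one code point at a time
            offset += 1
            while offset < len(self.data) and self._is_continuation(offset):
                offset += 1
            i += 1
        while i > index:  # step backward one code point at a time
            offset -= 1
            while self._is_continuation(offset):
                offset -= 1
            i -= 1
        self._cached_index, self._cached_offset = index, offset
        return offset

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, index: int) -> str:
        if not 0 <= index < self.length:
            raise IndexError("index out of range")
        start = self._offset_of(index)
        end = start + 1
        while end < len(self.data) and self._is_continuation(end):
            end += 1
        return self.data[start:end].decode("utf-8")

text = "a\u00f1b\u20acc\U0001F600d"  # "añb€c😀d": 1-, 2-, 3- and 4-byte characters
u = CachedUTF8(text.encode("utf-8"))
forward = [u[i] for i in range(len(u))]             # each step reuses the cache
backward = [u[i] for i in reversed(range(len(u)))]  # walking back reuses it too
```

Sequential loops such as `for i in range(len(u))` then cost one step per character instead of a rescan from the start each time.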
Re: Why doesn't Python remember the initial directory?
Nobody: Maybe. On Unix, it's possible that the current directory no longer has a pathname. It's also possible that you do not have permission to successfully call getcwd. One example of this I have experienced is the OS X sandbox, where you can run Python starting in a directory where you have only limited permissions. The OS X getcwd works by calling readdir and lstat and looping up from the current directory to the root to build the whole path, so it will fail without read permission on the directories: http://www.opensource.apple.com/source/Libc/Libc-763.13/gen/FreeBSD/getcwd.c Neil -- http://mail.python.org/mailman/listinfo/python-list
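A Python sketch of that climbing algorithm (Unix only): stat `.` and `..`, list the parent, and find the entry with the same device and inode numbers, repeating until `..` is the directory itself. `getcwd_by_walking` is a hypothetical name and real C implementations differ in details, but the sketch shares the same weakness: `os.listdir` raises `PermissionError` on any ancestor you cannot read.

```python
import os
import tempfile

def getcwd_by_walking():
    """Rebuild the current directory's path by climbing '..' links (Unix only)."""
    parts = []
    current = "."
    while True:
        here = os.lstat(current)
        parent = os.path.join(current, "..")
        above = os.lstat(parent)
        if (here.st_dev, here.st_ino) == (above.st_dev, above.st_ino):
            break  # '..' is the same directory: this is the root
        # Requires read permission on the parent directory.
        for name in os.listdir(parent):
            try:
                st = os.lstat(os.path.join(parent, name))
            except OSError:
                continue  # entry vanished or cannot be examined
            if (st.st_dev, st.st_ino) == (here.st_dev, here.st_ino):
                parts.append(name)
                break
        else:
            raise FileNotFoundError("no entry in the parent refers to this directory")
        current = parent
    return "/" + "/".join(reversed(parts))

# Demo: compare with os.getcwd() inside a fresh temporary directory.
previous = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        rebuilt, expected = getcwd_by_walking(), os.getcwd()
    finally:
        os.chdir(previous)
print(rebuilt == expected)
```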
Re: Language Enhancement Idea to help with multi-processing (your opinions please)
jkn: > FWIW, this looks rather like the 'PAR' construct of Occam to me. > > http://en.wikipedia.org/wiki/Occam_%28programming_language%29 Earlier than that, 'par' is from Algol 68: http://en.wikipedia.org/wiki/ALGOL_68#par:_Parallel_processing Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: WxPython versus Tkinter.
rantingrick: > Not if we used the underlying MS library! Windows has such a rich > library why not use it? Why must we constantly re-invent the wheel? It is up to the GUI toolkit or application to implement the interfaces defined by Windows Automation API on every object it displays. The standard Windows controls have this implemented. Tk does not use these controls so would have to do all that work again. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: WxPython versus Tkinter.
Octavian Rasnita: > There are no many people that know about this thing, > but there are standards like MSAA that can be followed > by them if they really want to offer accessibility. I > guess that if Tkinter would support MSAA (Microsoft > Active Accessibility) in its Windows version, the screen > readers would be able to offer support for Tk (or it > might be offered by default... I don't know). MSAA was superseded by Microsoft UI Automation in 2005 which in turn was superseded by Windows Automation API in Windows 7. http://msdn.microsoft.com/en-us/library/dd561932(v=vs.85).aspx Making Tk as accessible as Windows or GTK+ would be a huge job. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Tkinter: The good, the bad, and the ugly!
Emile van Sebille: > The problem with QT is the license. > > From http://qt.nokia.com/products/licensing/: > > Qt Commercial Developer License > The Qt Commercial Developer License is the correct license to use for > the development of proprietary and/or commercial software ... The LGPL version is also useful for producing commercial software. From the same web page: """ Qt GNU LGPL v. 2.1 Version This version is available for development of proprietary and commercial applications in accordance with the terms and conditions of the GNU Lesser General Public License version 2.1. """ Developing a proprietary (closed source) application using LGPL libraries is normally not a problem as the only pieces of code you have to publish are changes to those LGPL libraries, not the application code. Most applications do not change the libraries. The "can't reuse LGPL code" clause is a restriction on what can be done with the Qt Commercial Developer License not on what can be done with the LGPL license. GTK+ has always been LGPL and that license has not been an obstacle to either open source or closed source projects. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: will Gnome 3.0 kill pygtk?
Tracubik: > i'm studying pygtk right now, am i wasting my time considering that my > preferred platform is linux/gnome? I expect the dynamic binding will be very similar to the current static binding but easier to keep up-to-date. There's already some use of dynamic binding in the recent PyGTK 2.22.0: http://www.daa.com.au/pipermail/pygtk/2010-September/019013.html Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: How Python works: What do you know about support for negative indices?
Ben Finney: > For those who think the problem may be with the recipient's software, I > see the same annoying line-wrapping problems in the archived message > <http://mail.python.org/pipermail/python-list/2010-September/1255167.html>. That looks well-formatted to me and just the same as I see in a news reader. There appear to be deliberate wraps at sentence end or automatic wraps to fit <80 columns. Which lines are wrong and why are they wrong? Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: How Python works: What do you know about support for negative indices?
Mark Tolonen: > It came across fine for me (on much maligned Outlook Express, no less). Yes, looks fine to me both in Thunderbird (news, not mailing list) and at Google Groups. There is a single text part with all lines except a URL easily within 80 columns. Perhaps there is a problem in Ben's reader or in the mailing list gateway. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Is there any way to minimize str()/unicode() objects memory usage [Python 2.6.4] ?
dmtr: > What I'm really looking for is a dict() that maps short unicode > strings into tuples with integers. But just having a *compact* list > container for unicode strings would help a lot (because I could add a > __dict__ and go from it). Add them all into one string or array and use indexes into that string. Neil -- http://mail.python.org/mailman/listinfo/python-list
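A minimal sketch of that suggestion, assuming the strings are built once and then only read: all text lives in one `str`, and entry `i` is described by two offsets held in a compact `array`. The class name `StringPool` is hypothetical.

```python
from array import array

class StringPool:
    """Many short strings stored as one str plus an offset table."""

    def __init__(self, strings):
        self._offsets = array("L", [0])  # entry i spans offsets[i]:offsets[i + 1]
        pieces = []
        total = 0
        for s in strings:
            pieces.append(s)
            total += len(s)
            self._offsets.append(total)
        self._text = "".join(pieces)  # one object instead of one per string

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError("StringPool index out of range")
        # A temporary str is created only when an entry is actually read.
        return self._text[self._offsets[i]:self._offsets[i + 1]]

pool = StringPool(["apple", "\u00f1and\u00fa", "", "kiwi"])
```

Each entry then costs two machine integers in the offset table rather than a full string object with its own header.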
Microsoft lessening commitment to IronPython and IronRuby
There is a blog post from Jimmy Schementi who previously worked at Microsoft on IronRuby about the state of dynamic language work there. http://blog.jimmy.schementi.com/2010/08/start-spreading-news-future-of-jimmy.html Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Why is python not written in C++ ?
Paul Rubin: > C has all kinds of undefined behavior. "Might need to rely on" is not > relevant for this kind of issue. Ada's designers had the goal that that > Ada programs should have NO undefined behavior. Ada achieves this by describing a long list of implementation defined behaviour (Annex M). http://oopweb.com/Ada/Documents/Ada95RM/Volume/m.htm > As a famous example of C's underspecification, the behavior of > >a[i++] = i; > > is undefined in C99. Ada does not define ordering in all cases either. For example the order of elaboration of library_items (essentially the order in which modules are run in the absence of explicit declarations) is defined for GNAT as """ first elaborating bodies as early as possible (i.e. in preference to specs where there is a choice), and second by evaluating the immediate with clauses of a unit to determine the probably best choice, and third by elaborating in alphabetical order of unit names where a choice still remains """ Other compilers use different orders. I just love that "probably". Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Why is python not written in C++ ?
Grant Edwards: > That said, the last time I looked the Ada spec was only something like > 100 pages long, so a case could be made that it won't take long to > learn. I don't know how long the C++ language spec is, but I'm > betting it's closer to 1000 than 100. The Ada 2012 Language Reference Manual is 860 pages and the Ada 2005 LRM was 790 pages. The annotated versions are even longer: http://www.ada-auth.org/standards/ada12.html Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Sun Grid Engine / NFS and Python shell execution question
J.B. Brown: > I believe the source of this problem is that os.popen() or os.system() > calls spawn subshells which then reference my shell resource files > (.zshrc, .cshrc, .bashrc, etc.). > But I don't see an alternative to os.popen{234} or os.system(). > os.exec*() cannot solve my problem, because it transfers execution to > that program and stops executing the script which called os.exec*(). Call fork then call exec from the new process. Search the web for "fork exec" to find examples in C. Neil -- http://mail.python.org/mailman/listinfo/python-list
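In Python the same pattern is available through `os.fork` and the `os.exec*` family (Unix only). A minimal sketch, where `spawn_and_wait` is a hypothetical helper: the child replaces itself with the target program while the parent script keeps running and later collects the exit status. Because `execvp` is given an argument list, the program runs directly and no intermediate shell (and so no shell startup file) is involved. On Unix the `subprocess` module builds on the same mechanism and is usually more convenient.

```python
import os
import sys

def spawn_and_wait(argv):
    """Run argv in a child process without a shell and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed; skip the parent's cleanup
    # Parent: the script carries on here, then collects the child's status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # negative value: killed by that signal

exit_code = spawn_and_wait([sys.executable, "-c", "raise SystemExit(3)"])
print(exit_code)  # 3
```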
Re: Download Microsoft C/C++ compiler for use with Python 2.6/2.7 ASAP
sturlamolden: > Windows did this too (msvcrt.dll) up to the VS2003 release, which came > with msvcr71.dll in addition. Since then, M$ (pronounced Megadollar > Corp.) have published msvcr80.dll, msvcr90.dll, and msvcr100.dll (and > corresponding C++ versions) to annoy C and C++ developers into > converting to C# .NET. (And yes, programs using third-party DLL and > OCX components become unstable from this. You have to check each DLL/ > OCX you use, and each DLL used by each DLL, etc. How fun...) One of the benefits to COM is that it acts as a C runtime firewall - it has its own memory allocation interface (IMalloc / CoGetMalloc) and file I/O is performed through interfaces like IStream. It is quite common to use an OCX compiled with one compiler in an application compiled with another. If you break the rules by using malloc rather than IMalloc for memory that is deallocated by a different component to that which allocated it or try to pass around FILE* objects then you will see failures. So, always follow the COM rules. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Python dynamic attribute creation
WANG Cong: > From what you are saying, Smalltalk picks a way similar to setattr() in > Python? addInstVarName is a method on ClassDescription objects. > Because you mentioned 'addInstVarName' which seems to be a > method or a builtin function. If so, that is my point, as I mentioned > earlier, switching to setattr() by default, instead of using assignments > by default. :) No, that was only one of the problems you enumerated. You want to easily distinguish adding instance variables (possibly also other things) since these are 'metaprogramming'. Once setattr is available, it can be called anywhere a function can be called, so you cannot determine whether any particular statement, including o.f = 1, adds a new instance variable. You would have to change Python a lot more to be able to determine easily from inspection whether a given statement is 'metaprogramming'. The main problem I had was that you were saying that adding an instance variable is a special form of programming when it's just an ordinary part of programming in most languages. > Hmm, although this is off-topic, I am interested in this too. C++ does > have metaprogramming, but that is actually static metaprogramming (using > templates), not dynamic metaprogramming here. C++ is still extremely limited. Can you show a C++ example where a metaprogram modifies an existing class to add an instance variable? Neil -- http://mail.python.org/mailman/listinfo/python-list
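A small illustration of the point about inspection: plain assignment and `setattr` with a name computed at runtime both add a new instance variable, and nothing in the statement `p.label = ...` marks it as different from updating an existing attribute. The class `Point` and the attribute names are hypothetical.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.label = "origin"       # ordinary assignment creates a new instance variable
name = "colour"          # attribute name only known at runtime
setattr(p, name, "red")  # same effect as p.colour = "red"
print(sorted(vars(p)))   # ['colour', 'label', 'x', 'y']
```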
Re: Python dynamic attribute creation
WANG Cong: > 4) Also, this will _somewhat_ violate the OOP princples, in OOP, > this is and should be implemented by inherence. Most object oriented programming languages starting with Smalltalk have allowed adding attributes (addInstVarName) to classes at runtime. Low level OOPLs like C++ and Delphi did not implement this for efficiency reasons. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Encoding troubles
JB: > as hypens (–) and apostrophes (’) are in an odd encoding. When passed > to the database using sqlalchemy they appear as – and other > characters. The encoding is UTF-8. Normally the best way to handle encodings is to convert to Unicode strings (unicode(s, "UTF-8")) as soon as possible and perform most processing in Unicode. Neil -- http://mail.python.org/mailman/listinfo/python-list
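A sketch of the same advice in Python 3 terms, where `bytes.decode("utf-8")` takes the place of `unicode(s, "UTF-8")`: decode once at the input boundary and keep `str` values internally. It also shows one common way '–' and '’' get garbled: UTF-8 bytes decoded as Windows-1252 (`cp1252`) turn each of those characters into three. Whether that is the exact path in JB's SQLAlchemy setup is an assumption.

```python
# Bytes as they might arrive from a file, socket or database driver.
raw = "It\u2019s a dash \u2013 here".encode("utf-8")  # "It’s a dash – here"

# Decode once, as early as possible, and work with str from then on.
text = raw.decode("utf-8")

# Decoding the same bytes with the wrong codec produces mojibake:
# each of ’ and – becomes a three-character sequence.
wrong = raw.decode("cp1252")  # "Itâ€™s a dash â€“ here"

# Encode again only at the output boundary.
out = text.encode("utf-8")
```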
Re: Download Visual Studio Express 2008 now
Martin v. Loewis: > Python 2.6, 2.7, and 3.1 are all built with that release (i.e. 2008). > Because of another long tradition, Python extension modules must be > built with the same compiler version (more specifically, CRT version) as > Python itself. So to build extension modules for any of these releases, > you need to have a copy of VS 2008 or VS 2008 Express. Is it too late for Python 2.7 to update to using Visual Studio 2010? It is going to be much easier for people to find and install the current version of VS than the previous. There is still more than 2 months left before 2.7 is planned to be released. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: msvcr90.dll is MIA?
Filip: > But what's so special about msvcr and visual studio compiler? Python > compiles fine with gcc under unixes, so is it a problem to compile > python interpreter with mingw and get rid of the proprietary runtime > dependecies? MinGW uses an older version of Microsoft's runtime MSVCRT.DLL. While MSVCRT.DLL is present in all commonly used versions of Windows, the particular version varies and, unless you have licensed an older version of Visual C++, you probably do not have the right to redistribute it. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Generating a rainbow?
Me: >You should use different variables for the two loops. Actually it is closing the divs that makes it work in FireFox:
import colorsys
sat = 1
value = 1
length = 1000
for h in range(0, length + 1):
    hue = h / float(length)
    color = list(colorsys.hsv_to_rgb(hue, sat, value))
    for x in range(3):
        color[x] = int(color[x] * 255)
    hexval = ("#%02x%02x%02x" % tuple(color)).upper()
    print( "" "" % hexval)
-- http://mail.python.org/mailman/listinfo/python-list
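The HTML inside the `print` call did not survive in the archive, so the version below is a reconstruction under stated assumptions: each hue becomes one empty `<div>` that is explicitly closed, and the `background-color` style and 1-pixel height are guesses for illustration. It also gives the outer and inner loops distinct variable names, as the quoted advice suggests. `rainbow_divs` is a hypothetical name.

```python
import colorsys

def rainbow_divs(length=1000, sat=1.0, value=1.0):
    """Yield one explicitly closed <div> per hue step around the colour wheel."""
    for step in range(length + 1):
        hue = step / float(length)
        r, g, b = colorsys.hsv_to_rgb(hue, sat, value)
        # Scale each channel from 0..1 to 0..255; `c` never collides with `step`.
        hexval = "#%02X%02X%02X" % tuple(int(c * 255) for c in (r, g, b))
        # Assumed markup: open and close each div.
        yield '<div style="background-color: %s; height: 1px"></div>' % hexval

for line in rainbow_divs(length=4):
    print(line)
```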
Re: Generating a rainbow?
Tobiah: > for x in range(0, length + 1): > ... > for x in range(3): You should use different variables for the two loops. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Python unicode and Windows cmd.exe
Guillermo: > 2) My script gets output from a Popen call (to execute a Powershell > script [new Windows shell language] from Python; it does make sense!). > I suppose changing the Windows codepage for a single Popen call isn't > straightforward/possible? You could try SetConsoleOutputCP and SetConsoleCP. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Python unicode and Windows cmd.exe
Guillermo: > Is this an enforced convention under Windows, then? My head's aching > after so much pulling at my hair, but I have the feeling that the > problem only arises when text travels through the dos console... The console commonly uses Code Page 437, which is most compatible with old DOS programs since it can display line drawing characters. You can change the code page to UTF-8 with "chcp 65001". Now, running "type m.txt" on the original BOM-less file should be OK. You may also need to change the console font to one that is Unicode compatible like Lucida Console. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Python unicode and Windows cmd.exe
Guillermo: > I then open the file m.txt with notepad, and I see "mañana" normally. > I save (again, no actual modifications), go back to the dos prompt, do > type m.txt and this time it works! I get "mañana". When notepad opens > the file, the encoding is already UTF-8, so short of a UTF-8 bom being > added to the file, That is what happens: the file now starts with a BOM \xEF\xBB\xBF as you can see with a hex editor. > I don't know what happens when I save the > unmodified file. Also, I would think that the python script should > save a valid utf-8 file in the first place... It's just as valid UTF-8 without a BOM. People have different opinions on this but, for compatibility, I think it is best to always start UTF-8 files with a BOM. Neil -- http://mail.python.org/mailman/listinfo/python-list
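In Python, the `utf-8-sig` codec handles this: writing with it prepends the BOM (bytes EF BB BF, also available as `codecs.BOM_UTF8`), and reading with it strips a BOM if one is present, so it accepts both kinds of file. `write_with_bom` is a hypothetical helper name.

```python
import codecs
import os
import tempfile

def write_with_bom(path, text):
    """Write text as UTF-8 preceded by a byte order mark."""
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)

text = "ma\u00f1ana"  # "mañana"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "m.txt")
    write_with_bom(path, text)
    with open(path, "rb") as f:
        data = f.read()  # raw bytes, BOM included
    with open(path, encoding="utf-8-sig") as f:
        round_trip = f.read()  # BOM removed on reading

# "utf-8-sig" also decodes BOM-less UTF-8, so it can read either kind of file.
plain = text.encode("utf-8")
```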
Re: file seek is slow
Metalone: > As it turns out each call is only > 646 nanoseconds slower than 'C'. > However, that is still 80% of the time to perform a file seek, > which I would think is a relatively slow operation compared to just > making a system call. A seek may not be doing much beyond setting a current offset value. It is likely that fseek(f1, 0, SEEK_SET) isn't even doing a system call. An implementation of fseek will often return relatively quickly when the new position is within the current buffer; see line 192 in http://www.google.com/codesearch/p?hl=en#XAzRy8oK4zA/libc/stdio/fseek.c&q=fseek&sa=N&cd=1&ct=rc Neil -- http://mail.python.org/mailman/listinfo/python-list
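Python's own buffered files show the same effect. In the sketch below, `CountingRaw` is a hypothetical in-memory raw stream that counts `seek` calls, standing in for the system call. With CPython's C implementation of `io.BufferedReader` (an implementation detail, not a documented guarantee), a seek that lands inside the already-filled buffer only moves the buffer position, while a seek outside it reaches the raw stream.

```python
import io

class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts how often seek() reaches it."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0
        self.seek_calls = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def readinto(self, buffer):
        chunk = self._data[self._pos:self._pos + len(buffer)]
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)

    def seek(self, offset, whence=io.SEEK_SET):
        self.seek_calls += 1  # stands in for an lseek() system call
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        else:
            self._pos = len(self._data) + offset
        return self._pos

    def tell(self):
        return self._pos  # overridden so tell() is not counted as a seek

raw = CountingRaw(bytes(range(256)) * 64)  # 16 KiB of sample data
f = io.BufferedReader(raw, buffer_size=4096)
f.read(10)        # fills the 4 KiB buffer
f.seek(0)         # target is inside the buffer
inside = raw.seek_calls
f.seek(10000)     # target is beyond the buffer
outside = raw.seek_calls
print(inside, outside)
```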
Re: Passing FILE * types using ctypes
Zeeshan Quireshi: > Hello, I'm using ctypes to wrap a library i wrote. I am trying to pass > it a FILE *pointer, how do i open a file in Python and convert it to a > FILE *pointer. For this to work, your library should have been compiled with the same compiler as Python and possibly the same compiler options such as choice of runtime library. Otherwise, they may differ in the content and layout of FILE and also in behaviour. On Unix, this may not be a problem because of the shared runtime but on Windows it can cause crashes. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Spam from gmail
Steve Holden: > Spam is, at least from my point of view, UCE: unsolicited commercial > e-mail. Spam is more commonly defined as UBE (Unsolicited Bulk Email) of which UCE is a large subset. It's just as much spam if it's pushing a political party or charity even though there may be no commercial advantage to the poster. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: TABS in the CPython C source code
Aahz: > BTW, in case anyone is confused, it's "svn blame" vs "cvs annotate". Possibly earlier versions of SVN only supported "blame" but the variants "annotate", "ann", and "praise" all work with the version of SVN (1.6.5) I have installed. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: TABS in the CPython C source code
Alf P. Steinbach: > The size-8 tabs look really bad in an editor configured with tab size 4, > as is common in Windows. I'm concluding that the CPython programmers > configure their Visual Studio's to *nix convention. Most of the core developers use Unix. > Anyways, I would suggest converting all those tabs to spaces, as e.g. > the Boost library project does -- no tabs allowed. This would damage the usefulness of source control histories (svn annotate) as all of the converted lines would show this recent cosmetic change rather than the previous change which is likely to be a substantive modification. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: PEP 3147 - new .pyc format
John Roth: > 4. I'm in favor of putting the source in the .pyr directory as well, > but that's got a couple more issues. One is tool support, which is > likely to be worse for source, and the other is some kind of algorithm > for identifying which source goes with which object. Many tools recurse into every directory except hidden ones, so they would return both the copy of the source in the repository and the original source. If you want to do this then the repository directory should be hidden by starting with ".". Neil -- http://mail.python.org/mailman/listinfo/python-list
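A sketch of the convention many recursive tools follow, using `os.walk` and pruning directory names that start with "."; `walk_visible` is a hypothetical helper and `.pyr` is simply the directory name proposed in this thread. A repository directory named `.pyr` is skipped, while an ordinary package directory is still searched.

```python
import os
import tempfile

def walk_visible(root):
    """Yield file paths under root, skipping any directory whose name starts with '.'."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into hidden dirs.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            yield os.path.join(dirpath, name)

with tempfile.TemporaryDirectory() as root:
    for rel in ("mod.py", os.path.join(".pyr", "mod.py"), os.path.join("pkg", "util.py")):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()  # create an empty file
    found = sorted(os.path.relpath(p, root) for p in walk_visible(root))
print(found)
```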
Re: [OT] Perl 6 [was Re: myths about python 3]
Looks to me like the problem with Perl 6 was that it was too ambitious, wanting to fix all perceived problems with the language. Python 3 is much more limited in scope: at its core it's Python with Unicode fixed and old junk removed. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: myths about python 3
Carl Banks: > There is also no hope someone will fork Python 2.x and continue it in > perpetuity. Well, someone might try to fork it, but they won't be > able to call it Python. Over time there may be more desire from those unable or unwilling to upgrade to 3.x to work on improvements to 2.x, perhaps leading to a version 2.8. One of the benefits of open source is that you are not trapped into following vendor decisions like Microsoft abandoning classic VB in favour of VB.NET. It would be unreasonable for the core developers to try to block this. Refusing use of the Python trademark for a version that was reasonably compatible in both directions would be particularly petty. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: chr(12) Form Feed in Notepad (Windows)
W. eWatson wrote: > I am writing a txt file. It's up to the user to print it using Notepad > or some other tool. WordPad will interpret chr(12) as you want. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Python 3.1 cx_Oracle 5.0.2 "ImportError: DLL load failed: The specified module could not be found."
André: > Apparently the error is caused by cx_Oracle not being able to find the > Oracle client DLLs (oci.dll and others). The client home path and the > client home path bin directory are in the PATH System Variable and > oci.dll is there. Open the cx_Oracle extension with Dependency Walker (http://www.dependencywalker.com/) to get a better idea about what the problem is in more detail. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Recommended number of threads? (in CPython)
mk: > I found that when using more than several hundred threads causes weird > exceptions to be thrown *sometimes* (rarely actually, but it happens > from time to time). If you are running on a 32-bit environment, it is common to run out of address space with many threads. Each thread allocates a stack and this allocation may be as large as 10 Megabytes on Linux. With a 4 Gigabyte 32-bit address space this means that the maximum number of threads will be 400. In practice, the operating system will further subdivide the address space so only 200 to 300 threads will be possible. On Windows, I think the normal stack allocation is 1 Megabyte. The allocation is only of address space, not memory since memory can be mapped into this space when it is needed and many threads do not need very much stack. Neil -- http://mail.python.org/mailman/listinfo/python-list
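In CPython the per-thread reservation can be lowered with `threading.stack_size()`, which applies to threads started after the call; the 256 KiB below is an arbitrary illustrative value, and a stack that is too small can crash deeply recursive code. `run_many_threads` is a hypothetical helper. As arithmetic: 4096 MiB / 10 MiB allows about 409 threads, while 4096 MiB / 0.25 MiB allows about 16,000 before other uses of the address space are counted.

```python
import threading

def run_many_threads(count, stack_bytes=256 * 1024):
    """Start `count` worker threads with a smaller stack reservation and join them."""
    results = [None] * count

    def worker(i):
        results[i] = i * i  # trivial work; a real worker must fit in the smaller stack

    previous = threading.stack_size(stack_bytes)  # applies to threads started from now on
    try:
        threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    return results

size_before = threading.stack_size()
squares = run_many_threads(300)
```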
Re: umlauts
The server is sniffing the User-Agent header to decide whether to send UTF-8 or ISO-8859-1. Try this code:
import urllib2
r = urllib2.Request("http://www.google.de/ig/api?weather=Muenchen", None, {"User-Agent": "Mozilla/5.0"})
f = urllib2.urlopen(r)
i = f.info()
print(i)
xml = f.read()
f.close()
print(xml)
Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: save windows clipboard content temporarily and restore later
kakarukeys: > I followed your hints, and wrote the following code. It works for most > clipboard formats except files. Selecting and copying a file, followed > by backup() and restore() throw an exception: For some formats the handle stored on the clipboard may not be a memory handle so may not be retrieved as memory. You could try using a list of formats to include or exclude or just pass over the exception. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: save windows clipboard content temporarily and restore later
kakarukeys: > Restoring the data with that format could result in information loss, > for example when HTML text is saved in ordinary text format. There is > no format that could preserve 100% of any kind of clipboard content. > > Does anyone has a brilliant solution? Enumerate all the clipboard formats with EnumClipboardFormats and grab the contents in each format then put them all back when finished. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: unicode issue
Dave Angel: > I know that the clipboard has type tags, but I haven't looked at them in > so long that I forget what they look like. For text, is it just ASCII > and Unicode? Or are there other possible encodings that the source and > sink negotiate? The normal thing seen is that the clipboard differentiates between Unicode text and locale-dependent 8 bit text. Depending on platform Unicode text may be in UTF-8 (Linux) or UTF-16 (Windows). The encoding of 8-bit text strings is not well defined and is normally assumed to be compatible with whatever is currently in the document or the current user interface encoding. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Signing extensions
Roger Binns: > The Windows Python distribution is signed by PGP and the normal Microsoft > way using a Verisign class 3 cert. (If you read their issuer statement it > ultimately says the cert isn't worth the bits it is printed on :-) One of > those certs is $500 per year which is out of the question for me. Code signing certificates that will be valid for Windows Authenticode cost $129 per year through CodeProject: http://www.codeproject.com/services/certificates/index.aspx > Does anyone have any other suggestions? Has the PSF considered running a > certificate authority for extension developers, and other Python developers > for that matter? I'd like to see a certificate authority for open source projects based mainly on project reputation and longevity. There may need to be some payment to avoid flooding the CA with invalid requests - say $30 per year. It would be great if this CA was recognised by Microsoft and Apple as well as Linux and BSD distributions. There are some issues about identity here. Should the certificate be for the project, an individual, or an individual within a project? You want to know that PyExt1 comes from the genuine Ext1 project but the build will commonly be initiated by an individual who may later be found to be malicious. The Ext1 project should be able to revoke "Mal Icious of Ext1" and have future releases signed by "Trust Worthy of Ext1". Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Open file on remote linux server
The Bear: > Hi I'm looking to do something like this > > f = f.openfileobj(remotefileloc, localfilelikeobj) > > my remote files are on a solaris box that i can access using ssh (could > prehap request othe protocols if necessary) You could look into GIO which is a virtual file system API used in GTK+. I was a bit put off by it (necessarily) exposing the asynchronous nature of remote file operations. It's fun to write a small amount of asynchronous file I/O code but ensuring that all of your code handles all the potential problems with remote connections is tedious. Base library: http://library.gnome.org/devel/gio/stable/ Python bindings: http://library.gnome.org/devel/pygobject/stable/ Before committing to this, you should double check that these are the currently supported APIs. There was an earlier API GnomeVFS that has been deprecated for several years now and I don't follow this area closely. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: How do I post to the wxPython mailing list?
PythonAB: > No, but it means that more of my data goes into the same company. > There's no way to use my own email accounts from my own domain, > and I don't have a choice anymore. I just checked and it allowed me to use an account from my domain so I expect it will work with yours. > In other words, if i want to be able to get the wxPython list mail, I'm > forced to use a google account, am I not? Yes, just as you set up an account with a Mailman server when subscribing to a Mailman list. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: How do I post to the wxPython mailing list?
PythonAB: > I dont want to register with a google account, > is there any way to use a non-gmail account? A Google account does not mean you have to use gmail. The Google account is used to handle your interaction with Google services and can be used in conjunction with arbitrary email accounts. Just create a Google account and set the email address to your preferred address. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: An assessment of the Unicode standard
Chris Jones: > Is the implication that the principal usefulness of such languages as > Hindi and "other Indian languages" is us selling "things" to them..? Unicode was developed by a group of US corporations: Xerox, Apple, Sun, Microsoft, ... The main motivation was to avoid dealing with multiple character set encodings since this was difficult, time consuming and expensive. > I > am not from these climes but all the same, I do find you tone of voice > rather offensive, considering that you are referring to a culture that's > about 3000 years older and 3000 richer than ours and certainly deserves > our respect. Eh? Was Unicode developed in India? China? What precisely is disrespectful here? Is there a significant population that regards Unicode as their 'holy patrimony' that will suffer distress due to my post? > Maybe you didn't notice, but our plants shut down many years ago.. They > are selling _us_ their wares. Maybe your plants shut down but some of the plants I have worked at (such as the steelworks at Port Kembla) are still successfully exporting to Asia. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: An assessment of the Unicode standard
Benjamin Peterson: > Like Sanskrit or Snowman language? Sanskrit is mostly written in Devanagari these days which is also useful for selling things to people who speak Hindi and other Indian languages. Not sure if you are referring to the ☃ snowman character or Arctic region languages like Canadian Aboriginal syllabic writing like ᐲᐦᒑᔨᕽ which were added to Unicode 8 years after the initial version. I'd guess that was added from political rather than marketing motives. ☃ was required since it was present in Japanese character sets. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: An assessment of the Unicode standard
r: > Unicode (*puke*) seems nothing more than a brain fart of morons. And > sadly it was created by CS majors who i assumed used logic and > deductive reasoning but i must be wrong. Why should the larger world > keep supporting such antiquated languages and character sets through > Unicode? What purpose does this serve? Are we merely trying to make > everyone happy? A sort of Utopian free-language-love-fest-kinda- > thing? Wow, I like this world you live in: all that altruism! Unicode was developed by corporations from the US left coast in order to sell their products in foreign markets at minimal cost. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Annoying octal notation
Steven D'Aprano: > Obviously I can't speak for Ken Thompson's motivation in creating this > feature, but I'm pretty sure it wasn't to save typing or space on > punchcards. The original implementation of UNIX was on a PDP-7, which was an 18-bit machine. Octal takes 3 bits at a time, which evenly divides an 18-bit word, whereas the 4 bits of a hexadecimal digit do not. Early implementations of B were (according to Wikipedia) on the PDP-7, PDP-11 (a 16-bit machine) and Honeywell 36-bit mainframes. Octal was widely used on the PDP-11. DEC's PDP-11 Assembler defaulted to octal and didn't even support hexadecimal. The prefixes used in MACRO-11 for explicit radixes were ^D, ^O, and ^B. http://computer-refuge.org/bitsavers/pdf/dec/pdp11/rsx11/RSX11M_V2/DEC-11-OIMRA-A-D_MACRO_75.pdf Neil -- http://mail.python.org/mailman/listinfo/python-list
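The word-size argument is plain arithmetic: an octal digit carries 3 bits and a hex digit 4, so 18 bits split into exactly six octal digits but four and a half hex digits. A small Python sketch of that arithmetic (digits_for_word is a made-up helper for illustration; note that Python 3 spells octal literals with a 0o prefix):

```python
def digits_for_word(word_bits, bits_per_digit):
    """Return (full_digits, leftover_bits) when a word is split into digits."""
    return divmod(word_bits, bits_per_digit)

# PDP-7 (18-bit), PDP-11 (16-bit) and Honeywell (36-bit) word sizes.
for word_bits in (18, 16, 36):
    print(word_bits, "bits -> octal", digits_for_word(word_bits, 3),
          "hex", digits_for_word(word_bits, 4))

largest_18 = 2**18 - 1
print(format(largest_18, "o"))   # 777777: six full octal digits
print(format(largest_18, "x"))   # 3ffff: the leading hex digit only uses 2 bits
print(0o777777 == largest_18)    # True
```

On the 16-bit PDP-11 octal leaves a single spare bit, which is why the largest word is written 177777; a 36-bit word happens to divide evenly by both 3 and 4.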
Re: Read C++ enum in python
AggieDan04: > file_data = open(filename).read() > # Remove comments and preprocessor directives > file_data = ' '.join(line.split('//')[0].split('#')[0] for line in > file_data.splitlines()) > file_data = ' '.join(re.split(r'\/\*.*\*\/', file_data)) For some headers I tried it didn't work until the .* was changed to a non-greedy .*? to avoid removing from the start of the first comment to the end of the last comment. file_data = ' '.join(re.split(r'\/\*.*?\*\/', file_data)) Neil -- http://mail.python.org/mailman/listinfo/python-list
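Here is a self-contained sketch of the same approach with the non-greedy fix applied. It also passes re.DOTALL so a /* ... */ comment spanning several lines is removed, which the line-based original does not attempt. With a greedy .* the Color line below would lose everything from its first /* to its last */, including the enum keyword. The function names are hypothetical and the parser is deliberately naive: it ignores comment markers inside string literals, only handles `enum Name { ... }` blocks, and only accepts initializers that Python's int(x, 0) can parse.

```python
import re

def strip_c_comments(source):
    """Remove /* */ and // comments and preprocessor lines (naive sketch)."""
    # Non-greedy .*? stops at the first */; DOTALL lets a comment span lines.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    kept = []
    for line in source.splitlines():
        line = line.split("//")[0]
        if not line.lstrip().startswith("#"):
            kept.append(line)
    return "\n".join(kept)

def parse_enums(source):
    """Return {enum_name: {member: value}} for 'enum Name { ... }' blocks."""
    enums = {}
    pattern = r"enum\s+(\w+)\s*\{(.*?)\}"
    for name, body in re.findall(pattern, strip_c_comments(source), flags=re.DOTALL):
        members, next_value = {}, 0
        for item in body.split(","):
            item = item.strip()
            if not item:
                continue  # tolerate a trailing comma
            if "=" in item:
                key, expr = (part.strip() for part in item.split("=", 1))
                next_value = int(expr, 0)  # plain integer literals such as 5 or 0x1
            else:
                key = item
            members[key] = next_value
            next_value += 1
        enums[name] = members
    return enums

header = """
#define UNUSED 1
/* first */ enum Color { RED, /* mid */ GREEN = 5, BLUE }; // trailing
/* a comment
   over two lines */
enum Flag { A = 0x1, B, };
"""
print(parse_enums(header))
```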
Re: Is python buffer overflow proof?
Thorsten Kampe: > You cannot create "your own" buffer overflow in Python as you can in C > and C++ but your code could still be vulnerable if the underlying Python > construct is written in C. Python's standard library now includes unsafe constructs.

    import ctypes
    x = '1234'
    # Munging byte 1 OK
    ctypes.memset(x, 1, 1)
    print(x)
    # Next line writes beyond end of variable and crashes
    ctypes.memset(x, 1, 2)
    print(x)

Neil -- http://mail.python.org/mailman/listinfo/python-list
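The memset calls above write through a raw pointer into an immutable string object, and ctypes does no bounds checking. A more careful pattern, sketched here as an assumption rather than anything from the original post, is to allocate a mutable buffer with ctypes.create_string_buffer and compare the requested length against ctypes.sizeof before writing; checked_memset is a hypothetical helper.

```python
import ctypes

def checked_memset(buf, value, count):
    """Hypothetical helper: fill the first count bytes of a ctypes buffer,
    refusing to write past its end."""
    size = ctypes.sizeof(buf)
    if not 0 <= count <= size:
        raise ValueError(f"count {count} is outside a {size}-byte buffer")
    ctypes.memset(buf, value, count)

buf = ctypes.create_string_buffer(b"1234")  # 4 bytes plus a trailing NUL = 5
checked_memset(buf, 1, 1)                   # overwrite byte 0 only
print(buf.raw)                              # b'\x01234\x00'
try:
    checked_memset(buf, 1, 6)               # 6 > 5: rejected instead of overflowing
except ValueError as exc:
    print("refused:", exc)
```

This only guards this one call; any other code that hands ctypes a raw pointer is still unchecked.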
Re: No PyPI search for 3.x compatible packages
Francesco Bochicchio : > Are you sure? I note that for example pygtk has as language tags both > C and python. So maybe a C extension > for python3 would have both C and python 3 as language tags. > > I suspect that the 109 packages you found are the only ones obf the > 4829 which works with python3 (but I hope > to be wrong ). There are 523 packages marked with Python :: [2, 2.1, ...] and I hope more than that work with Python 2.x. What I would like to encourage is adding Python :: 3 or a more specific minimum supported version classifier to packages that support Python 3. I wouldn't want to go to the level of adding a classifier for each version of Python supported, since a package will often stay valid when a new Python is released. Neil -- http://mail.python.org/mailman/listinfo/python-list
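A sketch of how a tool might read such classifiers, assuming they arrive as the usual 'Programming Language :: Python :: X.Y' strings. declared_python3_versions is a hypothetical helper; it returns the declared Python 3 versions as sorted tuples, so a bare "3" sorts ahead of a more specific "3.1" that marks the minimum supported version.

```python
PREFIX = "Programming Language :: Python :: "

def declared_python3_versions(classifiers):
    """Return the Python 3 versions a classifier list declares, oldest first."""
    versions = []
    for classifier in classifiers:
        if not classifier.startswith(PREFIX):
            continue  # e.g. 'Programming Language :: C'
        parts = classifier[len(PREFIX):].split(".")
        if parts[0] == "3" and all(p.isdigit() for p in parts):
            versions.append(tuple(int(p) for p in parts))
    return sorted(versions)

# Hypothetical classifiers for a C extension that supports Python 3.1 and later.
classifiers = [
    "Programming Language :: C",
    "Programming Language :: Python :: 2.6",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.1",
]
print(declared_python3_versions(classifiers))  # [(3,), (3, 1)]
```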
No PyPI search for 3.x compatible packages
There appears to be no way to search PyPI for packages that are compatible with Python 3.x. There are classifiers for 'Programming Language', including 'Programming Language :: Python :: 3', but these seem to describe the implementation language, since so many packages specify C. A total of 109 packages are classified with Python :: [3, 3.0, 3.1] out of 4829 packages. http://pypi.python.org/pypi?:action=browse&show=all&c=214&c=533 The search box appears to match any of the words entered, so a search like "xml 3.0" or "xml AND 3.0" does not help. Some packages include version information in the Py Version column of their download lists or embedded in the download file names. Projects are often constrained to a particular set of Python versions, so they need to choose packages that will work with those versions. It would be helpful if PyPI made this information more visible and searchable. Neil -- http://mail.python.org/mailman/listinfo/python-list
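Lacking such a search on PyPI itself, the filter being asked for is easy to express over a local list of package records. In this sketch the index entries and package names are hypothetical; the accepted classifiers are the three counted in the post, and a keyword must appear in the summary (an AND rather than the search box's any-word match).

```python
PY3_CLASSIFIERS = {
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.0",
    "Programming Language :: Python :: 3.1",
}

def py3_packages(index, keyword=None):
    """Names of packages declaring Python 3.x, optionally requiring keyword
    in the summary as well."""
    hits = []
    for name, summary, classifiers in index:
        if PY3_CLASSIFIERS.isdisjoint(classifiers):
            continue  # no Python 3 classifier at all
        if keyword and keyword.lower() not in summary.lower():
            continue
        hits.append(name)
    return sorted(hits)

# Hypothetical index snapshot: (name, summary, classifiers).
index = [
    ("spam-xml", "XML toolkit", ["Programming Language :: Python :: 3"]),
    ("old-xml", "XML parser", ["Programming Language :: Python :: 2.5"]),
    ("fastmath", "Fast math", ["Programming Language :: C",
                               "Programming Language :: Python :: 3.1"]),
]
print(py3_packages(index))         # ['fastmath', 'spam-xml']
print(py3_packages(index, "xml"))  # ['spam-xml']
```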
Re: Mutable Strings - Any libraries that offer this?
Mark Lawrence: > If my sleuthing is correct the problem is with these lines > > ilow *= self->itemSize; > ihigh *= self->itemSize; > > in GapBuffer_slice being computed before ilow and ihigh are compared to > anything. This particular bug occurred because ihigh is the maximum 32-bit integer, 2147483647, so multiplying it by the item size (4) overflowed. Adding an extra check fixes this:

    if (ihigh > self->lengthBody / self->itemSize)
        ihigh = self->lengthBody / self->itemSize;

I've committed a new version, 1.02, and new downloads are available from Google Code. http://code.google.com/p/gapbuffer/downloads/list Neil -- http://mail.python.org/mailman/listinfo/python-list
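Python integers never overflow, so reproducing the bug needs a helper that wraps results to 32 bits the way a C int typically does on two's-complement hardware (strictly, signed overflow is undefined behaviour in C). In this sketch wrap_int32 and item_range_to_bytes are hypothetical helpers whose parameters mirror ilow, ihigh, lengthBody and itemSize: the unclamped product wraps to -4, while clamping ihigh first, as the fix does, keeps it in range.

```python
def wrap_int32(n):
    """Reduce n to the value a 32-bit two's-complement int would hold."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n & 0x80000000 else n

def item_range_to_bytes(ilow, ihigh, length_body, item_size):
    """Convert an item slice to byte offsets, clamping ihigh before scaling."""
    max_items = length_body // item_size
    if ihigh > max_items:  # the extra check from the fix
        ihigh = max_items
    return wrap_int32(ilow * item_size), wrap_int32(ihigh * item_size)

INT_MAX = 2147483647
print(wrap_int32(INT_MAX * 4))                 # unclamped: -4
print(item_range_to_bytes(0, INT_MAX, 40, 4))  # clamped: (0, 40)
```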
Re: If Scheme is so good why MIT drops it?
milanj: > and all of them use native threads (python still use green threads ?) Python uses native threads. Neil -- http://mail.python.org/mailman/listinfo/python-list
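Python's threading module runs on operating-system threads. One way to see that, assuming Python 3.8 or later for threading.get_native_id(), is to have several threads report their OS-level thread IDs while all of them are alive at once; native_thread_ids is a hypothetical helper.

```python
import threading

def native_thread_ids(n=3):
    """Start n threads and collect the OS-level ID each one reports."""
    barrier = threading.Barrier(n)
    lock = threading.Lock()
    ids = []

    def worker():
        with lock:
            ids.append(threading.get_native_id())
        barrier.wait()  # no thread exits until all have recorded their ID

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids

ids = native_thread_ids()
print("main thread:", threading.get_native_id())
print("workers:    ", ids)  # one distinct OS thread ID per worker
```

The barrier matters: without it a finished thread's ID could be reused by the OS for a later one.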
Re: Mutable Strings - Any libraries that offer this?
casebash: > I have searched this list and found out that Python doesn't have a > mutable string class (it had an inefficient one, but this was removed > in 3.0). Are there any libraries outside the core that offer this? I wrote a gap buffer implementation for Python 2.5 allowing character, Unicode character and integer elements. http://code.google.com/p/gapbuffer/ It hasn't seen much use or any maintenance, so it is unlikely to work with Python 3.x. Neil -- http://mail.python.org/mailman/listinfo/python-list
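For readers unfamiliar with the data structure, here is a minimal pure-Python gap buffer sketch. It is not the gapbuffer extension linked above (which is written in C), and GapBuffer and its methods are hypothetical names. Text lives in a list with a run of unused slots (the gap) at the editing position, so repeated inserts at one spot just fill the gap instead of shifting the whole tail.

```python
class GapBuffer:
    """Minimal gap buffer sketch: a list holding text with a movable gap."""

    def __init__(self, text="", gap=16):
        self._buf = list(text) + [None] * gap
        self._gap_start = len(text)       # first unused slot
        self._gap_end = len(self._buf)    # first used slot after the gap

    def __len__(self):
        return len(self._buf) - (self._gap_end - self._gap_start)

    def __str__(self):
        return "".join(self._buf[:self._gap_start] + self._buf[self._gap_end:])

    def _move_gap(self, pos):
        """Shift characters across the gap so that the gap starts at pos."""
        if pos < self._gap_start:
            count = self._gap_start - pos
            self._buf[self._gap_end - count:self._gap_end] = self._buf[pos:self._gap_start]
            self._gap_start -= count
            self._gap_end -= count
        elif pos > self._gap_start:
            count = pos - self._gap_start
            self._buf[self._gap_start:pos] = self._buf[self._gap_end:self._gap_end + count]
            self._gap_start += count
            self._gap_end += count

    def _ensure_gap(self, needed):
        """Widen the gap (at least doubling storage) if it is too small."""
        size = self._gap_end - self._gap_start
        if size < needed:
            extra = max(needed - size, len(self._buf))
            self._buf[self._gap_end:self._gap_end] = [None] * extra
            self._gap_end += extra

    def insert(self, pos, text):
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        self._move_gap(pos)
        self._ensure_gap(len(text))
        for ch in text:
            self._buf[self._gap_start] = ch
            self._gap_start += 1

    def delete(self, pos, count):
        if pos < 0 or count < 0 or pos + count > len(self):
            raise IndexError(pos)
        self._move_gap(pos)
        self._gap_end += count  # deleted characters simply join the gap

b = GapBuffer("hello world")
b.insert(5, ",")
b.delete(0, 1)
b.insert(0, "H")
print(str(b))  # Hello, world
```

Moving the gap costs time proportional to the distance moved, so the structure suits edits clustered around a cursor, as in a text editor.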
Re: pxssh submit su commands = very very slow
gert: This works but after the su command you have to wait like 2 minutes before each command gets executed ? s.sendline ('su') s.expect('Password:') A common idiom seems to be to omit the start of the expected reply since it may not be grabbed quickly enough; if it is missed, expect has to wait until it times out. Try s.expect('assword:') Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: finding icons for Apps
Sanoski: Where can I find icons to use with my programs? http://sourceforge.net/projects/icon-collection/ Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Any other UI kits with text widget similar to that in Tk?
Kenneth McDonald: I'm wondering if any of the other GUI kits have a text widget that is similar in power to the one in Tk. The main text widget in GTK+ was modeled after Tk but I don't know how well it succeeded in implementing this. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: MESSAGE RESPONSE
ajaksu: Me too. That is, until I tried to Google Belcan and Blubaugh together. Or google for "Blubaugh, David" or similar. Repeating a message you object to actually increases its visibility and includes you in its footprint. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Confused about Boost.Python & bjam
Till Kranz: > I tried to get started with Boost.Python. unfortunately I never used the > bjam build system before. As it is explained in the documentation I > tried to adapt the the files form the examples directory. I copied > 'Jamroot', 'boost_build.jam' and 'extending.cpp' to '~/test/'. But I am > lost as to what to do now. Yes, Boost.Python is difficult to use and the documentation isn't that clear. I've pretty much given up for now but I do have a project called SinkWorld that works with Boost.Python. You can access the CVS at http://scintilla.cvs.sourceforge.net/scintilla/sinkworld/ and the jam config files are in the tentacle/python directory as http://scintilla.cvs.sourceforge.net/scintilla/sinkworld/tentacle/python/Jamfile?revision=1.6&view=markup http://scintilla.cvs.sourceforge.net/scintilla/sinkworld/tentacle/python/boost-build.jam?revision=1.2&view=markup http://scintilla.cvs.sourceforge.net/scintilla/sinkworld/tentacle/python/Jamrules?revision=1.2&view=markup There is a mailing list which may be able to help: http://mail.python.org/mailman/listinfo/c++-sig Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Which way to access Scintilla
Alex: > I also want to embed Scintilla in Tkinter-created window (create the > rest of the GUI in Tkinter), or rather, I want to know if that's > possible at all. Any suggestions are appreciated. While it may be possible with sufficient dedication, it is unlikely to be simple. If you really want to use Tkinter then you are probably better off using some existing code that uses its text widget from Python such as Idle. Neil -- http://mail.python.org/mailman/listinfo/python-list
Re: Why the HELL has nobody answered my question !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Steve Holden wrote: > ... > Look guys, I thought we'd agreed that the PSU was no longer to be How did Steve manage to click send again after the para -- http://mail.python.org/mailman/listinfo/python-list
Re: Looping through the gmail dot trick
Martin Marcher: > are you saying that when i have 2 gmail addresses > > "[EMAIL PROTECTED]" and > "[EMAIL PROTECTED]" > > they are actually treated the same? That is plain wrong and would break a > lot of mail addresses as I have 2 that follow just this pattern and they > are delivered correctly! This is a feature of some mail services such as Gmail, not of email addresses generically. One use is to provide a set of addresses given one base address. '+' works as well as '.' so when I sign up to service monty I give them the address [EMAIL PROTECTED] Then when I receive spam at nyamatongwe+monty, I know who to blame and what to block. Neil -- http://mail.python.org/mailman/listinfo/python-list
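A sketch of the Gmail-specific rules described above: dots in the local part are ignored and anything after '+' is a tag. The rules are applied only to gmail.com here, an assumption that matches the point that other mail services do not behave this way. canonical_gmail and dot_variants are hypothetical helpers, and the addresses are made-up examples; dot_variants enumerates the dotted spellings that the thread's "looping" idea would produce.

```python
from itertools import product

GMAIL_DOMAINS = {"gmail.com"}  # assumption: only these get Gmail's rules

def canonical_gmail(address):
    """Return (canonical_address, tag). Other domains come back unchanged."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if domain not in GMAIL_DOMAINS:
        return address, None
    base, _, tag = local.partition("+")
    return base.replace(".", "") + "@" + domain, tag or None

def dot_variants(local):
    """Yield every way of inserting dots between the characters of local."""
    if not local:
        return
    for dots in product(("", "."), repeat=len(local) - 1):
        yield local[0] + "".join(d + c for d, c in zip(dots, local[1:]))

print(canonical_gmail("some.one+monty@gmail.com"))  # ('someone@gmail.com', 'monty')
print(canonical_gmail("a.b+x@example.com"))         # ('a.b+x@example.com', None)
print(sorted(dot_variants("abc")))                  # ['a.b.c', 'a.bc', 'ab.c', 'abc']
```

A name of n characters has 2**(n - 1) dotted spellings, all delivered to the same Gmail inbox.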